Deep learning-based computer vision has plenty of real-life applications. In particular, Convolutional Neural Network (CNN) architectures are commonly used across multiple CV tasks because of their high accuracy. But this strength comes with a weakness: a small perturbation to an input image or frame can make them fail. Is it possible to interpret why a CNN failed when given an adversarial sample as input? Several interpretability techniques exist for structured or semi-structured data, but only a few are specialized for unstructured data such as visual content. In this mini-tutorial, I am going to briefly showcase one of them with reference to adversarial samples in computer vision.

Let’s consider the two images in figure 1:

Figure 1 – A legit image (on the left) and an adversarial sample (on the right) derived from it

To our eyes they look the same, but they aren’t: the one on the right is an adversarial sample created by adding some noise to the original on the left. Let’s consider a CNN architecture pretrained on ImageNet. Submitting the image on the left to this CNN for classification gives the results shown in figure 2:

Figure 2 – Top 5 classification results for the legit image

Submitting the image on the right to the same network, however, gives different results (see figure 3):

Figure 3 – Top 5 classification results for the adversarial sample
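The top 5 listings in figures 2 and 3 come from ranking the class probabilities returned by the network. A minimal sketch of that ranking step, in plain Python, could look like the following; `top_k`, the label names, and the probability values are all hypothetical and are not the values shown in the figures.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, for illustration only.
labels = ["tabby", "tiger cat", "Egyptian cat", "lynx", "Persian cat", "plastic bag"]
probabilities = [0.41, 0.32, 0.15, 0.06, 0.04, 0.02]

for label, prob in top_k(probabilities, labels):
    print(f"{label}: {prob:.2f}")
```

With a real model, the probabilities would be one row of the array returned by the prediction call, and the labels would be the ImageNet class names.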

Why is this happening? How can we interpret the classification result from the network? The original image shows a tabby cat, and the classification with the highest probability for it (even if that probability isn’t very high) is “tabby cat”. For the adversarial sample, instead, the model returns “tiger cat” (still a cat, but with a different coat pattern) as the top classification result, with a very high probability. Staring at the adversarial image doesn’t give us any information to interpret the result.

One way to obtain a local explanation is the anchor algorithm, introduced in the paper “Anchors: High-Precision Model-Agnostic Explanations” by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. This algorithm provides model-agnostic, human-interpretable explanations suitable for classification models applied to images and other types of data. An anchor explanation consists of if-then rules, called anchors, that are sufficient to fix the prediction locally and that try to maximize the area for which the explanation holds. This means that as long as the anchor holds, the prediction should remain the same regardless of the values of the features not present in the anchor.

No need to reinvent the wheel to apply this algorithm to your images: a stable, open source implementation exists as part of the Python package alibi, available on PyPI. Let’s see how it works in practice. Assuming model is our pre-trained multi-class classification model, let’s define the prediction function for it:

predict_fn = lambda x: model.predict(x)
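Before wiring the prediction function into the explainer, the anchor idea can be made concrete with a toy sketch. It is not alibi's implementation: `toy_model`, `estimate_precision`, `p_perturb`, and the eight superpixels are hypothetical, chosen only to show that when the anchored superpixels are held fixed, the prediction survives random changes to everything else.

```python
import random


def toy_model(mask):
    """Hypothetical classifier: 'tabby' only while superpixels 2 and 5 are intact.

    mask[i] is True when superpixel i is untouched, False when it is perturbed.
    """
    return "tabby" if mask[2] and mask[5] else "other"


def estimate_precision(predict, anchor, n_superpixels, p_perturb=0.5,
                       n_samples=2000, seed=0):
    """Fraction of perturbed samples whose prediction matches the original one.

    Superpixels in `anchor` are always kept; every other superpixel is
    perturbed independently with probability `p_perturb`.
    """
    rng = random.Random(seed)
    original = predict([True] * n_superpixels)
    same = 0
    for _ in range(n_samples):
        mask = [i in anchor or rng.random() >= p_perturb
                for i in range(n_superpixels)]
        same += predict(mask) == original
    return same / n_samples


full_anchor = estimate_precision(toy_model, {2, 5}, n_superpixels=8)
half_anchor = estimate_precision(toy_model, {2}, n_superpixels=8)
print(full_anchor)  # 1.0: holding both superpixels fixes the prediction
print(half_anchor)  # roughly 0.5: superpixel 5 is still perturbed half the time
```

In this toy setting, only the set {2, 5} would qualify as an anchor under a high precision requirement, because it is the only one of the two candidates that keeps the prediction stable.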

Once we specify the shape of the images expected by the model (for the architecture in this example it is 224×224 pixels, with 3 color channels):

image_shape = (224, 224, 3)

we can initialize an anchor image explainer:

from alibi.explainers import AnchorImage
segmentation_fn = 'slic'
kwargs = {'n_segments': 15, 'compactness': 20, 'sigma': .5}
explainer = AnchorImage(predict_fn, image_shape, segmentation_fn=segmentation_fn, 
                        segmentation_kwargs=kwargs, images_background=None)
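The segmentation_fn argument above is given as the name of a built-in algorithm, but a callable that returns a label for every pixel can be supplied in its place. The sketch below shows what such a label map looks like, using a deliberately crude grid split as a stand-in for a real superpixel algorithm. The function `grid_segmentation` and its `rows` and `cols` parameters are hypothetical, and whether this exact signature is accepted by your installed version of alibi is an assumption to check against its documentation.

```python
import numpy as np


def grid_segmentation(image, rows=4, cols=4):
    """Label each pixel with the index of the grid cell it falls into.

    Returns an integer array of shape (height, width). This is a crude
    stand-in for a superpixel algorithm, meant only to show the output format.
    """
    height, width = image.shape[:2]
    row_idx = np.arange(height) * rows // height  # grid row of each pixel row
    col_idx = np.arange(width) * cols // width    # grid column of each pixel column
    return row_idx[:, None] * cols + col_idx[None, :]


dummy_image = np.zeros((224, 224, 3), dtype=np.float32)
segments = grid_segmentation(dummy_image)
print(segments.shape)            # (224, 224)
print(np.unique(segments).size)  # 16
```

A grid ignores the image content entirely, so the resulting anchors would be much less meaningful than those obtained with a content-aware method such as slic.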

This explainer supports three built-in superpixel segmentation functions and also accepts a custom implementation. Here we are choosing the slic segmentation, which uses k-means clustering in a combined color and image-coordinate space. Some of its parameters can be set: in the above example we are specifying the approximate number of labels in the segmented output image (n_segments), the compactness, which balances color proximity against spatial proximity and whose best value depends on the image contrast and on the shapes of the objects in the image, and the width of the Gaussian smoothing kernel used to pre-process each dimension of the image (sigma). Depending on the specific context, some preliminary tuning of these parameters may be needed.

Load the input image and transform it into a numpy array (use any Python method you like for this), and then use the explainer instance to produce an explanation:

import numpy as np

# img_array is assumed to hold the loaded image as a numpy array of shape image_shape
explanation = explainer.explain(img_array, threshold=.95, p_sample=.5, tau=0.25)

where threshold defines the minimum fraction of samples for a candidate anchor that need to lead to the same prediction as the original instance, p_sample determines the fraction of superpixels that are changed to either the average value of the superpixel or the pixel value of the superimposed image, and tau determines when we assume convergence. A bigger value for tau means faster convergence but also looser anchor restrictions; the default value for this parameter is 0.15.

The explain function returns an explanation object which contains information about the anchor. Starting with the legit image, it is possible to display the generated superpixels:

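import matplotlib.pyplot as plt

# attribute names as exposed by alibi's AnchorImage explanation object
plt.imshow(explanation.segments)

and also display the anchor:

plt.imshow(explanation.anchor)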


Repeating the same for the adversarial sample, the anchor looks like this:

Not only are the anchors for the two images different, but the one for the adversarial sample doesn’t include any superpixel containing the forehead “M” pattern typical of tabby cats; it contains instead other superpixels that lead the model towards the “tiger cat” classification. In the top 5 classification results for the legit image, you have probably noticed that “tiger cat” was the second prediction from the model. Is it just a coincidence that the adversarial image classification produces “tiger cat” as the top prediction? No, because the adversarial image was created from the original through a targeted adversarial attack, whose goal is to slightly modify the source image so that the derived image is classified as a specified target class, rather than just to make the model fail.

Some questions are still unanswered: how can adversarial images be created? Is it possible to set up a defense strategy to mitigate the effect of adversarial attacks on a computer vision system? Are transformers more robust than CNNs to adversarial attacks? If you want to find answers to these questions and learn more, please attend my upcoming workshop at ODSC Europe 2021, “Adversarial Attacks and Defence in Computer Vision 101.”


About the author/ODSC Europe 2021 speaker on Adversarial Image Explanation:

Guglielmo is a Computer Vision specialist working in biotech manufacturing. He comes from other Machine Learning and Deep Learning experiences across diverse industries. He is a frequent speaker at local meetups and international conferences and author of a book about distributed Deep Learning on Apache Spark. He is passionate about ML/AI, XR, and Open Source. In his spare time, he enjoys cooking, swimming, and reading comics.


Twitter: @guglielmoiozzia