XAI: Explainable A.I. by Justification and Introspection

Gulfaraz Rahman
7 min read · Jan 30, 2019

Deep Learning is used to solve a wide range of problems by providing a neural network with samples of input data along with the expected outputs. The learning is driven by concrete mathematical rules that systematically adjust the weights of the network; think of these as a collection of numerical knobs tuned to produce the desired output. What the network actually learns are these numbers, which transform an input question into an output answer. What do these numbers signify? Do they represent knowledge that can be interpreted and understood by humans?

Bombe — Wikimedia.

Let us begin with two of the largest problem spaces where deep learning has proved itself: Vision and Language.

Using Neural Networks for Image Classification

Image Classification. Source

Image classification is a task for which neural networks are used extensively, and their performance on this task is well established across many datasets [1]. The images in these datasets range from handwritten digits to flora, fauna, people, cars, bikes, aeroplanes and other classes of objects. To classify an image, we pass it through a trained network, where a series of transformations maps the input to a single output class.
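As a concrete illustration (not the classifier used in this work), here is a minimal sketch of classifying a single image with a pretrained ResNet-18 from torchvision; the file name bird.jpg is a placeholder.

```python
# Minimal sketch: classify one image with a pretrained CNN (torchvision ResNet-18).
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)
model.eval()

image = Image.open("bird.jpg").convert("RGB")           # placeholder file name
batch = preprocess(image).unsqueeze(0)                  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)                               # shape: (1, 1000)
print(logits.argmax(dim=1).item())                      # index of the predicted class
```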

Using Neural Networks for Text Generation

Another popular application of neural networks is in natural language. Neural networks excel at translation, language modelling, text classification, question answering, named entity recognition and dependency parsing. In late 2016, Google replaced its traditional NLP methods with an LSTM-based method for its translation service [2]. This free service is available worldwide and is used by over 200 million people daily.

Text Generation using a Recurrent Neural Network.
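To make the text-generation setup concrete, below is a minimal character-level LSTM sketch in PyTorch (an assumed framework; the article does not prescribe one). The model is untrained, so its output is gibberish; the point is the generation loop, where the last sampled character is fed back in at every step.

```python
# Minimal sketch of character-level text generation with an (untrained) LSTM.
import torch
import torch.nn as nn

vocab = list("abcdefghijklmnopqrstuvwxyz ")
stoi = {ch: i for i, ch in enumerate(vocab)}
itos = {i: ch for ch, i in stoi.items()}

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM(len(vocab))
model.eval()

# Start from a seed character and sample one character at a time.
x = torch.tensor([[stoi["t"]]])
state, generated = None, "t"
with torch.no_grad():
    for _ in range(40):
        logits, state = model(x, state)
        probs = torch.softmax(logits[:, -1], dim=-1)     # distribution over next char
        next_id = torch.multinomial(probs, num_samples=1)
        generated += itos[next_id.item()]
        x = next_id                                      # feed the sample back in
print(generated)
```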

The Need for Explainable A.I.

Although neural networks have been used to solve many problems, what a network actually learns remains largely unknown. To give us an idea of the acquired knowledge, attempts have been made to visualize the learned weights. This is still open research; we are yet to decipher the meaning of these tuned knobs. Without understanding what knowledge is actually acquired, it is risky to use these networks in real problems.

The legendary tank-detection network is a cautionary tale that warns researchers to watch for biases in the dataset. One version of the legend is quoted from [3] as follows:

…However, when a new set of photographs were used, the results were horrible. At first the team was puzzled. But after careful inspection of the first two sets of photographs, they discovered a very simple explanation. The photos with tanks in them were all taken on sunny days, and those without the tanks were taken on overcast days. The network had not learned to identify tank like images; instead, it had learned to identify photographs of sunny days and overcast days.

We cannot afford mistakes such as these in critical domains like healthcare, where real lives are at stake. Neural networks need to become more transparent or, at the very least, justify the predictions they make.

Explainable A.I. can be viewed through various lenses. The approach depends on how we, as humans, are convinced. One may feel more confident about a decision if multiple models arrive at the same conclusion. Alternatively, the network could generate reasoning to support the prediction. Others may prefer that the network ground the prediction by highlighting the parts of the input which helped make the decision; these could be words in an input sentence or regions in an input image. We will discuss two major types of XAI systems:

Justification Systems

Generating Visual Explanations [4] proposes a ‘justification’ explanation system: a model that produces sentences explaining why a predicted label is appropriate for a given image. The image classification network is trained together with an additional text generation network whose task is to explain the predictions made by the classifier. When classifying unseen images, a sentence is generated to justify the class prediction for the given image.

Justifications generated by the model.
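The sketch below is a rough, simplified take on such an architecture, not the actual model from [4]: a linear classifier and an LSTM explanation generator share the same image features, and the generator is additionally conditioned on the class scores so that the generated sentence can justify the prediction. All layer sizes are arbitrary.

```python
# Rough architectural sketch (not the model from [4]): a classifier and an
# explanation generator share image features; the generator also sees the
# class scores so the sentence can justify the predicted class.
import torch
import torch.nn as nn

class JustificationModel(nn.Module):
    def __init__(self, feat_dim=512, num_classes=200, vocab_size=1000, hidden=256):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.word_embed = nn.Embedding(vocab_size, hidden)
        # Condition the LSTM on image features + class scores at every step.
        self.lstm = nn.LSTM(hidden + feat_dim + num_classes, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)

    def forward(self, image_feats, caption_tokens):
        class_logits = self.classifier(image_feats)              # (B, C)
        words = self.word_embed(caption_tokens)                  # (B, T, H)
        T = words.size(1)
        context = torch.cat([image_feats, class_logits], dim=1)  # (B, F + C)
        context = context.unsqueeze(1).expand(-1, T, -1)         # repeat per step
        out, _ = self.lstm(torch.cat([words, context], dim=2))
        return class_logits, self.word_head(out)                 # next-word logits

# Dummy forward pass with random features and a 5-token caption prefix.
model = JustificationModel()
feats = torch.randn(2, 512)
tokens = torch.randint(0, 1000, (2, 5))
class_logits, word_logits = model(feats, tokens)
print(class_logits.shape, word_logits.shape)  # (2, 200), (2, 5, 1000)
```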

Introspection Systems

Grad-CAM [5] uses gradient information to produce a coarse localization map that highlights the regions of the input image that led to the predicted class. The network's decision is introspectively traced from the last layer back towards the input to visually explain which features of the image contributed to the prediction.

Grad-CAM highlights obtained by tracing gradients from the last layer to the input image.
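As a simplified sketch of the idea (not the exact implementation from [5]), the snippet below hooks the last convolutional block of a pretrained ResNet-18, pools the gradients of the predicted class score per channel, and uses them to weight the activations. A random tensor stands in for a preprocessed image.

```python
# Simplified Grad-CAM sketch: weight the last conv block's activations by
# the channel-wise pooled gradients of the predicted class score.
import torch
from torchvision import models

model = models.resnet18(pretrained=True)
model.eval()

activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["value"] = output                       # (1, 512, 7, 7)

def backward_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]                 # (1, 512, 7, 7)

model.layer4.register_forward_hook(forward_hook)
model.layer4.register_full_backward_hook(backward_hook)

batch = torch.randn(1, 3, 224, 224)                     # stand-in for a preprocessed image
logits = model(batch)
class_idx = logits.argmax(dim=1).item()
logits[0, class_idx].backward()                         # gradient of the top class score

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)    # pool gradients per channel
cam = torch.relu((weights * activations["value"]).sum(dim=1))  # (1, 7, 7)
cam = cam / (cam.max() + 1e-8)                                 # normalize to [0, 1]
print(cam.shape)  # upsample to the image size to overlay as a heatmap
```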

We combine these complementary explanation systems to generate justifications that are supported by an introspective understanding of the network's decision. This allows the model to generate justifications that are less class-specific and rely more on the features actually found in the input image.

The CUB Dataset

Why do we choose a collection of bird images? A fine-grained dataset of bird images [6] was used for this task because each bird can be assigned to a unique class. This property makes the Caltech-UCSD Birds (CUB) dataset ideal for testing both class-specific general understanding and image-specific discriminative understanding.

How do these models fare in an adversarial setting? Can these systems be fooled by hackers? Consistent with other adversarial experiments, our models fail when subjected to such attacks.

Weakness to Adversarial Attacks

Effect of adversarial attacks on explanations.

In an adversarial attack, small perturbations are made to the input image or activations; these are invisible to the human eye but lead the network to a wrong prediction. The Fast Gradient Sign Method, as described in [7], was used to generate the adversarial images. The caption generated by the explainable model for the perturbed bird image is completely undescriptive of the bird's features. [8] is an excellent blog post for a deeper explanation of adversarial attacks.
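A minimal FGSM sketch follows, using a pretrained ResNet-18 with a random tensor and a made-up label as stand-ins for a real preprocessed image and its ground truth: each pixel is nudged by a small step epsilon in the direction that increases the loss of the true label.

```python
# Minimal Fast Gradient Sign Method (FGSM) sketch [7].
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(pretrained=True)
model.eval()

batch = torch.rand(1, 3, 224, 224)          # stand-in for a preprocessed image
label = torch.tensor([14])                  # stand-in ground-truth class index

batch.requires_grad_(True)
loss = F.cross_entropy(model(batch), label)
loss.backward()                             # gradient of the loss w.r.t. the pixels

epsilon = 0.03                              # small, near-invisible step size
adversarial = (batch + epsilon * batch.grad.sign()).clamp(0, 1).detach()

# Compare predictions on the clean and perturbed inputs.
print(model(batch).argmax(dim=1).item(), model(adversarial).argmax(dim=1).item())
```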

How can these explanation systems be used in the real world? We illustrate one application that constructs an explanation of why a given bird does not belong to a certain bird family.

Application: Generate Counter-factual Explanations

Given two images of birds from distinct classes, a counter-factual explanation describes why the bird in one image does not belong to the class of the bird in the other. These discriminative explanations are useful for face matching, fingerprint searching, image-based location detection and similar tasks. spaCy's dependency parser is applied to the generated explanations, and the extracted attributes are compared to discriminate between the bird features found in the image pair.

Bird images with explanations (Top Row). Counter-factual explanations — describe why the bird in the image does not belong to the class above (Bottom Row).
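The snippet below is an illustrative sketch of that comparison step, not the exact procedure used in the project: spaCy's dependency parser extracts (adjective, noun) attribute pairs from two generated explanations (the example sentences are made up), and the counter-factual statement lists the attributes of the other class that the first bird lacks.

```python
# Illustrative sketch: compare attributes extracted from two generated
# explanations with spaCy's dependency parser (requires en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

def attributes(sentence):
    """Collect (modifier, noun) pairs such as ('red', 'beak')."""
    doc = nlp(sentence)
    pairs = set()
    for token in doc:
        if token.dep_ == "amod" and token.head.pos_ == "NOUN":
            pairs.add((token.text.lower(), token.head.text.lower()))
    return pairs

explanation_a = "This bird has a red beak and black wings."    # made-up examples
explanation_b = "This bird has a yellow beak and white wings."

missing = attributes(explanation_b) - attributes(explanation_a)
print("This bird does not belong to the other class because it lacks:",
      ", ".join(f"{adj} {noun}" for adj, noun in sorted(missing)))
```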

The discussed techniques are yet to fully solve the problems for which they were created. As mentioned earlier, the scientific community is actively looking for a deeper understanding of deep networks. The questions investigated here aim to highlight the open issues that need to be addressed before we can achieve stable global adoption.

Explainable A.I. is essential to establish TRUST in Deep Learning solutions. Transparency is needed to guarantee fairness in the predictions made by models. Interpretability of the learned knowledge enables accountability for A.I.-driven decisions. A more ambitious motivation is the possibility of uncovering new insights when looking into these magical black boxes.

Research conducted at the University of Amsterdam with Samarth Bhargav, Daniel Daza and Christina Winkler under the supervision of Prof. Zeynep Akata.

References

  1. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 2012 (pp. 1097–1105).
  2. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144. 2016 Sep 26.
  3. Dhar V, Stein R. Intelligent decision support methods: the science of knowledge work.
  4. Hendricks LA, Akata Z, Rohrbach M, Donahue J, Schiele B, Darrell T. Generating visual explanations. In European Conference on Computer Vision 2016 Oct 8 (pp. 3–19). Springer, Cham.
  5. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 2017 Oct 22 (pp. 618–626). IEEE.
  6. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200–2011 Dataset. Technical Report CNS-TR-2011–001, California Institute of Technology (2011).
  7. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples (2014). arXiv preprint arXiv:1412.6572.
  8. Chatel G. Adversarial examples in deep learning. Towards Data Science blog post.
