Output from different explanation methods on the input: A 'German shepherd dog' that is occluded by paper
Output from different explanation methods on the input: A 'Dalmatian' that is occluded by a baby
Pixel importance from high to low by CET for explaining 'bus', with partial occlusion
Pixel importance from high to low by DeepCover for explaining 'bus', with partial occlusion
Anomaly Input Images:
'airship'
'malinois'
'flamingo'
'bobsled'
DeepCover Output:
CET Output:
We plant occlusions (a.k.a. “photobombers”) into ImageNet images and record the occluded pixels so that we can measure the intersection between each explanation and the occlusion. Examples (with the corresponding CET explanations) from the Photo Bombing dataset are shown below.
The occlusions planted in the photo-bombing images can be regarded as ground truth that an explanation should not overlap with. Thanks to this, we ran experiments with different tools and confirmed that the intersection between the occlusion and the explanation is smallest for CET among all eight tools evaluated. This is complemented by the measured explanation size (i.e., the portion of pixels of the original input image needed to restore the original decision), which shows that CET's explanations are also consistently smaller than those of the other tools. The results from the other explanation tools are a bit mixed (please see below).
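The two metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the evaluation code used in the experiments: it assumes both the explanation and the recorded occlusion are given as boolean pixel masks of the same shape, and the function and variable names are ours.

```python
import numpy as np

def occlusion_overlap(explanation_mask: np.ndarray, occlusion_mask: np.ndarray) -> float:
    """Fraction of explanation pixels that fall inside the planted occlusion.

    Both inputs are boolean H x W arrays: explanation_mask marks the pixels
    a tool selects as its explanation, occlusion_mask marks the recorded
    photobomber pixels. Lower is better, since a good explanation should
    avoid the occlusion.
    """
    explanation = explanation_mask.astype(bool)
    occlusion = occlusion_mask.astype(bool)
    n_expl = explanation.sum()
    if n_expl == 0:
        return 0.0
    return float((explanation & occlusion).sum() / n_expl)

def explanation_size(explanation_mask: np.ndarray) -> float:
    """Explanation size as the fraction of image pixels selected."""
    mask = explanation_mask.astype(bool)
    return float(mask.sum() / mask.size)

# Toy 4x4 example: the occlusion covers the top-left 2x2 block.
occ = np.zeros((4, 4), dtype=bool)
occ[:2, :2] = True
expl = np.zeros((4, 4), dtype=bool)
expl[0, 0] = True   # one explanation pixel inside the occlusion
expl[3, 3] = True   # one outside

print(occlusion_overlap(expl, occ))  # 0.5
print(explanation_size(expl))        # 0.125
```

In practice the explanation mask would be derived from a tool's pixel ranking (e.g., the top-k pixels), with the same masks reused for both metrics.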
To use the Photo Bombing dataset: download link
While CET targets the explanation of partially occluded images, it also maintains high performance on normal images (without occlusions). We tested this using the ``roaming panda'' data set from DeepCover, a set of ImageNet images with explanation ground truth, and recorded the percentage (the number in parentheses) of ground-truth explanations that were successfully detected by each tool: RAP (91.0%) > DeepCover (76.7%) > CET (72.3%) > Extremal (70.7%) > RISE (55.8%) > LRP (53.8%) > GBP (20.8%) > IG (12.2%). In contrast to the results on partial-occlusion images, RAP delivers the best performance on the ``roaming panda'' data set; however, CET still outperforms most other tools. We observe that the presence or absence of occlusion does impact the performance of explanation tools, and that CET achieves overall better performance than the other tools when validated with these complementary metrics.
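One plausible way to decide whether a tool "successfully detects" a ground-truth explanation is a coverage test: the detection counts if the tool's explanation covers at least some threshold fraction of the ground-truth pixels. The exact criterion used in the ``roaming panda'' evaluation is not spelled out here, so the threshold and function below are our assumptions for illustration only.

```python
import numpy as np

def detects_groundtruth(explanation_mask: np.ndarray,
                        groundtruth_mask: np.ndarray,
                        threshold: float = 0.5) -> bool:
    """Assumed detection criterion (not necessarily DeepCover's exact one):
    the tool detects the ground-truth explanation if at least `threshold`
    of the ground-truth pixels fall inside the tool's explanation."""
    gt = groundtruth_mask.astype(bool)
    covered = (explanation_mask.astype(bool) & gt).sum()
    return bool(covered / gt.sum() >= threshold)

# Toy 4x4 example: the ground-truth explanation is the top row.
gt = np.zeros((4, 4), dtype=bool)
gt[0, :] = True
expl = np.zeros((4, 4), dtype=bool)
expl[0, :3] = True   # covers 3 of the 4 ground-truth pixels

print(detects_groundtruth(expl, gt))  # True
```

The reported percentage for a tool would then be the fraction of dataset images on which this check succeeds.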
CET explanations on the `roaming panda', using VGG16 and MobileNet