This would go into section 6 of the paper. Here we analyze the impact of choosing between a numerical and a binary tabular representation. This choice is another SECA parameter that could affect the trade-off between cost, correctness and informativeness.
We performed additional experiments using a numerical tabular representation, in which the concept associated with a bounding box is encoded by the average pixel saliency inside that bounding box. The goal was to investigate how beneficial such a numerical representation is compared to the binary representation, given that it is more costly because it requires drawing precise bounding boxes.
Because some learning tasks produce saliency maps with compact salient areas while others produce more extended areas with asymmetrical shapes, we use the following proxy to study, within the current SECA set-up, the impact of the precision of the salient-area drawing process and of the numerical representation: we compare the interpretations obtained on tasks with different types of saliency maps while keeping the bounding box shape fixed (rectangles). The pedestrian and bias scenarios exhibit compact salient areas (the salient concepts have round shapes), whereas the fish and vehicle scenarios exhibit more asymmetrical shapes (e.g. a shark fin is not square) and a larger range of saliency values. Hence, we hypothesize that in the latter scenarios the numerical representation should yield more accurate interpretations than the binary one, while the two representations should give similar results in the former scenarios.
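To make concrete why a fixed rectangular shape suits compact areas better than asymmetrical ones, the sketch below computes a hypothetical "fill ratio": the share of an area's tight bounding rectangle that is actually salient. Assuming, for simplicity, a saliency of 1 inside the salient area and 0 outside, this ratio equals the average box saliency used by the numerical encoding. The toy shapes are made up for illustration and do not come from the experiments.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (x, y) coordinates of a salient pixel


def tight_box(pixels: Set[Pixel]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned rectangle [x0, x1) x [y0, y1) covering the pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def box_fill_ratio(pixels: Set[Pixel]) -> float:
    """Fraction of the rectangle's pixels that belong to the salient area."""
    x0, y0, x1, y1 = tight_box(pixels)
    return len(pixels) / ((x1 - x0) * (y1 - y0))


# Two toy areas with the same number of salient pixels (16).
compact = {(x, y) for x in range(4) for y in range(4)}  # 4x4 block
slanted = {(i, i) for i in range(8)} | {(i + 1, i) for i in range(8)}  # thin diagonal strip

print(box_fill_ratio(compact))            # 1.0
print(round(box_fill_ratio(slanted), 3))  # 0.222
```

The rectangle around the compact area contains only salient pixels, whereas the rectangle around the slanted strip is mostly background, so its average saliency is strongly diluted by how the box is drawn.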
We observe the expected behavior.
For the pedestrian scenarios, we obtain the same top concepts and typicality values with both encodings. For the other two scenarios, we observe small differences: most top concepts remain the same, but their rankings differ and their typicality values are lower with the numerical encoding.
In these scenarios, the numerical representation brings more precise information, as it accounts for both concept-frequency differences between classes and saliency scores. For instance, the numerical representation indicates that the shark head shape is more indicative of the shark class than the shark body shape, and that the specific shark mouth shape is highly important, whereas the binary representation suggested the opposite (the mouth shape did not even appear in its top 10). This is probably because the mouth does not appear in all images but is very salient when it does, whereas other concepts such as the shark body shape are more frequent but possibly less salient: it might be the combination of shape and color that is actually indicative of the shark (the pair of shark body shape and gray texture is correctly found to be as typical as the body shape alone).
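The ranking change can be illustrated with a deliberately simplified score. The sketch below does not implement SECA's typicality measure; it assumes a hypothetical per-concept score equal to the column mean over the images of one class. With binary rows this score is the concept frequency, and with numerical rows it is the frequency weighted by the saliency of the concept when it appears. The shark values are made up to mimic the situation described above.

```python
from typing import Dict, List

Row = Dict[str, float]


def mean_scores(rows: List[Row]) -> Dict[str, float]:
    """Hypothetical per-concept score: column mean over one class's images."""
    concepts = rows[0].keys()
    return {c: sum(r[c] for r in rows) / len(rows) for c in concepts}


def ranking(scores: Dict[str, float]) -> List[str]:
    """Concepts sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy "shark" class with 8 images (illustrative values only):
# the mouth is visible in half of the images but very salient when present,
# the body is visible in every image but only weakly salient.
numerical_rows = [{"mouth": 1.0 if i < 4 else 0.0, "body": 0.25} for i in range(8)]
binary_rows = [{c: 1.0 if v > 0 else 0.0 for c, v in r.items()} for r in numerical_rows]

print(ranking(mean_scores(binary_rows)))     # ['body', 'mouth']
print(ranking(mean_scores(numerical_rows)))  # ['mouth', 'body']
```

Under the binary encoding the frequent body dominates, while the numerical encoding lets the rare but highly salient mouth move ahead, which is the kind of reordering observed for the shark concepts.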
We recommend first briefly exploring some saliency maps. When the range of saliency values is small and the salient shapes are compact, a binary representation with simple bounding boxes should be preferred, as the cost of annotating finer-grained areas would not improve explanation efficiency. In other cases, the trade-off between cost and correctness is more pronounced: more precise annotations combined with the numerical representation cost more but can yield more correct explanations.
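The recommendation above could be turned into a quick triage step over a few explored saliency maps. The sketch below is hypothetical: the function names, the use of the fill ratio of each salient area's tight bounding rectangle as a compactness measure, and both threshold values are illustrative assumptions, not calibrated settings from our experiments.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) coordinates of a salient pixel


def fill_ratio(area: Set[Pixel]) -> float:
    """Share of the area's tight bounding rectangle that is salient (compactness proxy)."""
    xs = [x for x, _ in area]
    ys = [y for _, y in area]
    box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(area) / box_area


def suggest_representation(
    saliency_values: List[float],
    salient_areas: List[Set[Pixel]],
    max_range: float = 0.3,  # hypothetical threshold, not calibrated
    min_fill: float = 0.6,   # hypothetical threshold, not calibrated
) -> str:
    """Suggest an encoding from a small sample of explored saliency maps."""
    value_range = max(saliency_values) - min(saliency_values)
    mean_fill = sum(fill_ratio(a) for a in salient_areas) / len(salient_areas)
    if value_range <= max_range and mean_fill >= min_fill:
        return "binary"  # small range and compact shapes: simple boxes suffice
    return "numerical"   # otherwise finer annotations may pay off


square = {(x, y) for x in range(3) for y in range(3)}
strip = {(i, i) for i in range(6)}
print(suggest_representation([0.7, 0.8, 0.9], [square]))  # binary
print(suggest_representation([0.1, 0.9], [strip]))        # numerical
```

Both conditions must hold for the cheaper binary encoding to be suggested, mirroring the recommendation that it suits cases with a small saliency range and compact shapes.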
This also depends on the desired concept granularity. The finer-grained the "element" concepts to be annotated, the less additional effort is required to obtain correct explanations, because their salient shapes are more compact and the bounding boxes fit them better.