WWW'22 - Scalpel-HS

What Should You Know? A Human-In-the-Loop Approach to Unknown Unknowns Characterization in Image Recognition

Companion Page

Abstract: Unknown unknowns represent a major challenge in reliable image recognition. Existing methods mainly focus on unknown unknowns identification, leveraging human intelligence to gather images that are potentially difficult for the machine. To drive a deeper understanding of unknown unknowns and more effective identification and treatment, this paper focuses on unknown unknowns characterization. We introduce a human-in-the-loop, semantic analysis framework for characterizing unknown unknowns at scale. Humans are engaged in two tasks that specify what a machine should know and describe what it really knows, respectively, both at the conceptual level, supported by information extraction and machine learning interpretability methods. Data partitioning and sampling techniques are employed to scale up human contributions in handling large data. Through extensive experimentation on scene recognition tasks, we show that our approach provides a rich, descriptive characterization of unknown unknowns and allows for more effective and cost-efficient detection than the state of the art.

The overall architecture of the Scalpel-HS framework

DEMO:

SHOULD-KNOW:

http://145.38.206.91/sigir/scene?t=1

REALLY-KNOWS:

http://145.38.206.91/sigir/scene?t=2

Scene Graph Encoder:

For scene graph generation (SGG), we used the Neural Motifs model trained on the Visual Genome dataset, with a Faster R-CNN model integrated as the object-detection backbone. We followed the hyperparameters suggested by the authors of Neural Motifs. The figure below illustrates the architecture of our proposed Scene Graph Encoder:

The architecture of the Scene Graph Encoder

Semantic Space Partitioning:

We used a genetic algorithm with the following hyperparameters:

Population size: 100
Crossover rate: 0.7
Mutation rate: 0.005
Elitism: 20%
Termination criteria: 1000 generations
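
As a rough illustration of how the hyperparameters above fit together, the following is a minimal sketch of a generic genetic algorithm loop, not the authors' implementation. Several parts are assumptions made only for this example: a chromosome is encoded as a 0/1 mask over candidate images, selection is a binary tournament, crossover is single-point, and `toy_fitness` is a placeholder that stands in for the actual fitness function, which is not reproduced here. The names `run_ga`, `tournament`, and `toy_fitness` are hypothetical.

```python
import random

# Hyperparameters as listed above.
POPULATION_SIZE = 100
CROSSOVER_RATE = 0.7
MUTATION_RATE = 0.005
ELITISM_FRACTION = 0.20
MAX_GENERATIONS = 1000


def toy_fitness(chromosome):
    # Placeholder (assumption): prefers masks that select exactly 5 images.
    # It is NOT the fitness function used by Scalpel-HS.
    return -abs(sum(chromosome) - 5)


def tournament(ranked, rng):
    # Binary tournament (assumption). `ranked` is sorted best-first,
    # so the candidate with the smaller index wins.
    i, j = rng.randrange(len(ranked)), rng.randrange(len(ranked))
    return ranked[min(i, j)]


def run_ga(num_images, fitness, generations=MAX_GENERATIONS, seed=0):
    """Return (best chromosome, best fitness per generation)."""
    rng = random.Random(seed)
    # Random initial population of 0/1 masks over the candidate images.
    population = [[rng.randint(0, 1) for _ in range(num_images)]
                  for _ in range(POPULATION_SIZE)]
    n_elite = round(ELITISM_FRACTION * POPULATION_SIZE)
    history = []

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Elitism: the top 20% are copied unchanged into the next generation.
        next_gen = [c[:] for c in ranked[:n_elite]]
        while len(next_gen) < POPULATION_SIZE:
            child = tournament(ranked, rng)[:]
            if rng.random() < CROSSOVER_RATE:
                # Single-point crossover with a second selected parent.
                other = tournament(ranked, rng)
                cut = rng.randrange(1, num_images)
                child = child[:cut] + other[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < MUTATION_RATE else g
                     for g in child]
            next_gen.append(child)
        population = next_gen

    return max(population, key=fitness), history
```

Calling `run_ga(num_images=20, fitness=toy_fitness, generations=30)` returns the best mask found and the best fitness value per generation. Because the elite are carried over unchanged, that per-generation best value never decreases.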

We initialize the genetic algorithm by constructing a population of random chromosomes P = {p1, p2, ...}, where each chromosome consists of representative images that can be sampled. The optimization process is guided by the following fitness function:

The Human Computation Tasks:

The procedure of the SHOULD-KNOW task:

Step 1: Validating whether the description of the relation correctly corresponds to the bounding boxes

Step 2: Specifying how relevant the relation is for identifying the given label (e.g., bedroom) of the scene

Step 3: Validating the given label of the scene

Step 4: Adding new concepts (objects or relations) if they are missing

Step 5: Specifying a minimal set of relationships that can sufficiently identify the scene

Step 6: Choosing all objects that are specifically relevant to the given scene

The procedure of the REALLY-KNOWS task:

Step 1: Drawing bounding boxes to annotate objects highlighted by the heatmap

Step 2: Naming objects and assigning attributes

Step 3: Defining relations using the objects annotated

Step 4: Adding all the objects and relations highlighted by the heatmap

Step 5: Rating the relevancy of objects/relations for identifying the given scene label

Source code:

https://github.com/shahinsharifi/Scalpel-HS

Induced Biases:

We create False Positive unknown unknowns by removing concepts from the training images of all classes except the class of interest. As a result, the model strongly associates the now-spurious concept with the class of interest and makes wrong predictions for test images of other classes that contain it. Similarly, we create False Negative unknown unknowns by removing concepts from the training images of the class of interest only (not from other classes). To make sure the concepts are distributed over several classes, we take the 15 most frequent concepts (objects and relations) and keep those that occur in at least three classes. The figures below present the co-occurrence matrices we used for choosing the concepts to be removed:

The co-occurrence matrix between objects and scene categories

The co-occurrence matrix between relations and scene categories
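
The selection rule and the two removal schemes described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the released code: the co-occurrence matrix is represented as a nested dictionary (concept, then scene class, then count), each training image is reduced to a label plus a set of annotated concepts, and "removing" a concept is modelled as deleting it from that set. How removal is carried out on the actual images is not specified here. The function names `select_concepts` and `induce_bias` are hypothetical.

```python
def select_concepts(cooccurrence, top_k=15, min_classes=3):
    """Take the top_k most frequent concepts, then keep those that
    occur in at least min_classes scene classes."""
    totals = {c: sum(per_class.values())
              for c, per_class in cooccurrence.items()}
    # Sort by descending frequency; ties are broken alphabetically.
    most_frequent = sorted(totals, key=lambda c: (-totals[c], c))[:top_k]
    return [c for c in most_frequent
            if sum(1 for n in cooccurrence[c].values() if n > 0)
            >= min_classes]


def induce_bias(images, concept, class_of_interest, kind):
    """Return a modified copy of the training set.

    kind="false_positive": remove the concept from every class except
                           the class of interest.
    kind="false_negative": remove the concept from the class of
                           interest only.
    """
    if kind not in ("false_positive", "false_negative"):
        raise ValueError("kind must be 'false_positive' or 'false_negative'")
    result = []
    for image in images:
        is_target = image["label"] == class_of_interest
        remove = (not is_target) if kind == "false_positive" else is_target
        concepts = set(image["concepts"])  # copy; the input is not mutated
        if remove:
            concepts.discard(concept)
        result.append({"label": image["label"], "concepts": concepts})
    return result
```

For example, with a concept such as "sofa" and the class of interest "living room", the `false_negative` setting strips "sofa" from living-room training images only, while the `false_positive` setting strips it from every other class and leaves the living-room images untouched.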

Examples of detected unknown unknowns:

Induced unknown-unknowns:

Example of induced unknown unknowns characterized and detected by Scalpel-HS: (upper row) False Negative <Living room, (-) sofa, Dorm room> and (lower row) False Negative <Conference room, (+) person, Kindergarden>. For each case, we show the sampled representative image with relevant concepts on the left and an additional similar unknown unknown image on the right. All images are shown together with a corresponding saliency map showing where the model attends when making the incorrect prediction. Note that in the False Negative case, the sofa leads to a False Negative w.r.t. Living Room yet a False Positive w.r.t. Dorm Room. For each case, an incorrect characterization is shown in the red box.

Natural unknown-unknowns:

Example of natural unknown unknowns characterized and detected by Scalpel-HS: (upper row) False Negative <Hospital Room, (+) sink | (+) counter, Bathroom> and (lower row) False Negative <Conference room, (-) chair at table, Kitchen>. For each case, we show the sampled representative image with relevant concepts on the left and an additional similar unknown unknown image on the right. All images are shown together with a corresponding saliency map showing where the model attends when making the incorrect prediction. For each case, an incorrect characterization is shown in the red box.
