We use the pretrained ResNet50 model provided by PyTorch and apply DFauLo to the ILSVRC2012 validation set.
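A minimal sketch of this setup is shown below, assuming the torchvision pretrained weights and that the ILSVRC2012 validation archive is already extracted under a local `./ILSVRC2012` directory (the path, batch size, and the per-sample loss used as a simple suspiciousness signal are our own assumptions; DFauLo's actual Offline features are not reproduced here).

```python
import torch
import torchvision
from torchvision.models import resnet50, ResNet50_Weights

# Pretrained ResNet50 shipped with torchvision (ImageNet-1k weights).
weights = ResNet50_Weights.IMAGENET1K_V1
model = resnet50(weights=weights).eval()

# ILSVRC2012 validation set; the root is assumed to already contain the
# extracted validation images and the devkit.
val_set = torchvision.datasets.ImageNet(
    root="./ILSVRC2012", split="val", transform=weights.transforms()
)
loader = torch.utils.data.DataLoader(val_set, batch_size=64, num_workers=4)

# Per-sample cross-entropy loss is one simple signal an offline ranking
# could start from; it is only illustrative, not DFauLo's feature set.
criterion = torch.nn.CrossEntropyLoss(reduction="none")
per_sample_loss = []
with torch.no_grad():
    for images, labels in loader:
        per_sample_loss.append(criterion(model(images), labels))
per_sample_loss = torch.cat(per_sample_loss)
```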
2: the label matches the image.
1: the label is basically consistent with the image, but the image is hard to identify.
0: not sure.
-1: the label and the image are largely inconsistent: the image is mostly irrelevant to the label, the image could be labeled with multiple labels at the same time, or the image may need to be corrected to another label.
-2: the label and the image do not match at all: the image is completely irrelevant to the label, the image should be labeled with multiple labels at the same time, or the image must be corrected to another label.
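For illustration only, the 5-point rubric can be encoded as a simple mapping; the names `SCORE_RUBRIC` and `is_negative` are hypothetical and only the sign of a score matters for the fault decision described later.

```python
# Hypothetical encoding of the rubric handed to the crowdsourcing workers.
SCORE_RUBRIC = {
     2: "label matches the image",
     1: "label basically consistent, image hard to identify",
     0: "not sure",
    -1: "label and image largely inconsistent",
    -2: "label and image do not match at all",
}

def is_negative(score: int) -> bool:
    # Only negative scores (-1, -2) count toward the fault decision.
    return score < 0
```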
Exclusion Fault: the image should correspond to another label in the label set.
Multiple Choice Fault: the image corresponds to multiple labels in the label set.
Irrelevant Fault: the image does not belong to any label in the label set.
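These three categories can be tallied per image from the workers' votes, as in the hedged sketch below (the `FaultType` enum and `diagnose` helper are illustrative names, not part of DFauLo).

```python
from collections import Counter
from enum import Enum

class FaultType(Enum):
    # The three fault categories reported in the Fault Diagnosis column.
    EXCLUSION = "image should correspond to another label in the label set"
    MULTIPLE_CHOICE = "image corresponds to multiple labels in the label set"
    IRRELEVANT = "image does not belong to any label in the label set"

def diagnose(worker_votes):
    """Count how many crowdsourcing workers flagged each fault category.
    worker_votes holds one FaultType (or None) per worker."""
    return Counter(v for v in worker_votes if v is not None)
```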
For Round 1, we took the top 1% of the data from the Offline ranking results for scoring.
For Round 2, we used the Round 1 scoring results to predict and re-rank the data remaining from Round 1's ranking. Specifically, a sample was considered a data fault if 3 or more crowdsourcing workers gave it a negative score. We then took the top 1% of the re-ranked data for scoring.
For Round 3, we used the scoring results of Rounds 1 and 2 to predict and re-rank the data remaining from Round 2's ranking, and again took the top 1% of the re-ranked data for scoring.
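A rough sketch of this three-round loop under the majority-vote rule is given below; `collect_scores` and `rerank` are hypothetical placeholders for the crowdsourcing step and the feedback-based re-ranking, and do not reproduce DFauLo's internals.

```python
def is_fault(worker_scores, min_negative=3):
    # A sample is treated as a data fault when 3 or more of the 5
    # crowdsourcing workers gave it a negative score (-1 or -2).
    return sum(s < 0 for s in worker_scores) >= min_negative

def run_rounds(offline_ranking, collect_scores, rerank, n_rounds=3, frac=0.01):
    """offline_ranking: sample indices ordered most-suspicious-first by the
    Offline phase. collect_scores(idx) returns the 5 worker scores for one
    sample; rerank(remaining, feedback) re-orders the unreviewed samples
    using all feedback gathered so far."""
    budget = max(1, int(len(offline_ranking) * frac))   # 1% of the full set
    remaining = list(offline_ranking)
    feedback = {}                                       # index -> fault flag
    for _ in range(n_rounds):
        batch, remaining = remaining[:budget], remaining[budget:]
        for idx in batch:
            feedback[idx] = is_fault(collect_scores(idx))
        # Rounds 2 and 3 rank the remaining data with the feedback so far.
        remaining = rerank(remaining, feedback)
    return feedback
```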
The results for each round are shown below in increasing order of score (only images that received at least one negative score are shown).
Fault Diagnosis shows that x of the 5 crowdsourcing workers consider this image an Exclusion, Multiple Choice, or Irrelevant fault.
Overall information: 193 images were shown for Round 1, with an average score of 0.9
Overall information: 236 images were shown for Round 2, with an average score of 0.8
Overall information: 112 images were shown for Round 3, with an average score of 0.7