In a further analysis of the data prioritized by DFauLo, we examined whether it identifies data faults that had not been discovered previously.
As the reference point, we used the previous study at https://labelerrors.com/.
We compared the data selected by DFauLo against the samples covered by previous work (https://github.com/cleanlab/label-errors/blob/main/mturk/imagenet_mturk.json) to identify which ones fell outside its scope. These data samples, which CleanLab considered to have a lower likelihood of being faulty, were not manually inspected in previous work.
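
The comparison described above amounts to a set difference between the samples DFauLo selected and the samples already reviewed. A minimal sketch follows; it is not the script used for this analysis. It assumes the reviewed file parses to a list of JSON objects that each carry an identifier under a key named `id`. The function names, the default key, and the example identifiers are hypothetical and should be adapted to the actual file layout.

```python
import json


def load_records(path):
    """Load a JSON file that is assumed to contain a list of objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def reviewed_ids(records, key="id"):
    """Collect the identifiers of samples that previous work already reviewed.

    `key` is an assumption about the record layout; change it if the
    file uses a different field name.
    """
    return {rec[key] for rec in records}


def find_unreviewed(selected, reviewed):
    """Return the selected identifiers absent from the reviewed set.

    The order of `selected` (for example, the priority order) is preserved.
    """
    return [sample_id for sample_id in selected if sample_id not in reviewed]


# Made-up identifiers, for illustration only.
previous = [{"id": "sample_b"}, {"id": "sample_d"}]
selected_by_tool = ["sample_a", "sample_b", "sample_c"]

new_candidates = find_unreviewed(selected_by_tool, reviewed_ids(previous))
print(new_candidates)  # ['sample_a', 'sample_c']
```

With the real file, `previous` would come from `load_records("imagenet_mturk.json")`, and the resulting candidates would then go to manual annotation.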
The table below presents all data points that were marked negatively by at least one annotator but were not included in previous work.