Statistical test of the de-confusion ability of EIF vs. IF

Statistical test of H1: d_EIF(p) - d_IF(p) > 0, for both groups of confusion pairs and individual confusion pairs

We test the alternative hypothesis that d_EIF(p) - d_IF(p) > 0, i.e. that up-weighting the helpful samples and down-weighting the harmful samples identified by our EIF increases the distance between the confusion pair(s) MORE than the original influence function (IF) does.

We test the hypothesis both on groups of confusion pairs (we identify the top-10 most-confused classes and, for each class, collect all of its confusion pairs) and on 50 individual confusion pairs. We carry out the test on CUB200, CARS196, InShop, and SOP with two approaches, ProxyNCA++ and SoftTriple (click on the different sheets and scroll to see more).
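The notes do not name the specific test used, so as an illustration only, here is how a one-sided paired test of H1: d_EIF(p) - d_IF(p) > 0 could be run over a set of confusion pairs. The data below is synthetic; the actual per-pair distances live in the spreadsheet sheets.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: for each of 50 confusion pairs p,
# the pair distance after the IF edit and after the EIF edit.
# (Illustrative random data, NOT the paper's actual values.)
rng = np.random.default_rng(0)
d_if = rng.normal(1.0, 0.2, size=50)            # d_IF(p)
d_eif = d_if + rng.normal(0.1, 0.05, size=50)   # d_EIF(p), slightly larger

# One-sided paired t-test of H1: d_EIF(p) - d_IF(p) > 0.
t_stat, p_value = stats.ttest_rel(d_eif, d_if, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4g}")
```

A non-parametric alternative with the same one-sided setup would be `stats.wilcoxon(d_eif - d_if, alternative="greater")`, which drops the normality assumption on the paired differences.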

** Definition of a confusion pair: two samples with different class labels whose embeddings are each other's nearest neighbor.
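This mutual-nearest-neighbor definition can be sketched directly; the helper name and the Euclidean metric below are assumptions for illustration (the embedding-space metric is not stated here).

```python
import numpy as np

def confusion_pairs(embeddings, labels):
    """Return index pairs (i, j) that are mutual nearest neighbors
    in embedding space but carry different class labels."""
    emb = np.asarray(embeddings, dtype=float)
    # Pairwise Euclidean distances; exclude self-matches via +inf diagonal.
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)  # nearest neighbor of each sample
    pairs = []
    for i, j in enumerate(nn):
        # Mutual NN (each is the other's nearest) with differing labels.
        if i < j and nn[j] == i and labels[i] != labels[j]:
            pairs.append((i, j))
    return pairs

# Tiny example: samples 0 and 1 are each other's nearest neighbor
# but belong to different classes, so they form a confusion pair.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = [0, 1, 1, 1]
print(confusion_pairs(emb, labels))  # -> [(0, 1)]
```

Samples 2 and 3 are also mutual nearest neighbors, but they share a label, so they are not reported.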

Metric learning