DistXplore is a targeted test sample generation technique, and we noted in the paper that it is not always possible to optimize the test suites for generating indistinguishable errors. Due to space limitations, the paper only shows the average results over all/successfully optimized test suites. Here we list the defense results of each truth-target pair, compared with targeted baseline techniques (i.e., BIM, PGD, and C&W).
For each dataset, there are 10 truth labels and 9 target labels per truth label, i.e., 90 experiment settings in total. To present the defense results clearly, we show the results for Dissector, Attack-as-Defense, and Data Transformation in turn, starting with Dissector.
As shown in the Dissector table, DistXplore achieves better defense evasion results on each dataset in most experiment settings. It obtains the best AUC score in all 90 settings on MNIST, and in 83, 84, and 88 settings on Fashion-MNIST, CIFAR-10, and SVHN, respectively.
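For reference, the AUC score used above summarizes how well a defense's detection scores separate generated (adversarial) samples from benign inputs; a lower AUC means better evasion. The sketch below is a minimal, dependency-free illustration of computing AUC via the rank-based (Mann-Whitney) formulation; the function name and the label/score encoding are our own assumptions, not part of the DistXplore artifact.

```python
def auc_score(labels, scores):
    """AUC via the Mann-Whitney formulation (illustrative, not from the artifact).

    labels: 1 = sample the defense should flag (generated), 0 = benign.
    scores: the defense's suspiciousness score per sample (higher = more suspicious).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A defense that ranks every generated sample above every benign one scores 1.0;
# successful evasion pushes the AUC toward (or below) 0.5.
print(auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect detection -> 1.0
print(auc_score([1, 1, 0, 0], [0.2, 0.3, 0.8, 0.9]))  # fully evaded -> 0.0
```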
The results of Attack-as-Defense are shown in the following table; DistXplore achieves better defense evasion results on each dataset.
The results of Data Transformation are shown in the following table; DistXplore achieves better defense evasion results on each dataset.