This section discusses the repairing effectiveness of ArchRepair. We evaluate ArchRepair on four DNN models (i.e., ResNet-18, ResNet-50, ResNet-101, and DenseNet-121) and two popular datasets (i.e., CIFAR-10 and Tiny-ImageNet), and compare the results with six SOTA repairing methods. The results show that ArchRepair outperforms the other SOTA repairing methods in all experimental settings, confirming its stronger repairing capability. In addition, we test the repaired DNN models on corrupted datasets and find that the accuracy of models repaired by ArchRepair does not decrease on most of the corrupted datasets and even increases on some of them, which demonstrates that ArchRepair can also enhance the robustness of a DNN model.
We compare ArchRepair with six existing repairing methods (i.e., MODE, Apricot, Arachne, SENSEI, Few-Shot, and DeepRepair) on four DNN models (i.e., ResNet-18, ResNet-50, ResNet-101, and DenseNet-121) and two popular datasets (i.e., CIFAR-10 and Tiny-ImageNet). For each pre-trained DNN model, we first merge the failure cases from the validation set with the entire training set to form a new dataset (denoted as the repairing dataset). We then apply ArchRepair and each of the other repairing methods to the same model on this repairing dataset, and finally compute the accuracy of each repaired model on the testing set.
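The data preparation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: a model is represented by a hypothetical `predict` callable, a dataset by a list of `(input, label)` pairs, and the names `failure_cases`, `build_repairing_dataset`, and `accuracy` are introduced here for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a sample is an (input, label) pair and a model is any callable
# that maps an input to a predicted label.
Sample = Tuple[object, int]
Predictor = Callable[[object], int]


def failure_cases(predict: Predictor, validation: Sequence[Sample]) -> List[Sample]:
    """Return the validation samples that the pre-trained model misclassifies."""
    return [(x, y) for x, y in validation if predict(x) != y]


def build_repairing_dataset(
    predict: Predictor,
    train: Sequence[Sample],
    validation: Sequence[Sample],
) -> List[Sample]:
    """Merge the entire training set with the validation failure cases."""
    return list(train) + failure_cases(predict, validation)


def accuracy(predict: Predictor, dataset: Sequence[Sample]) -> float:
    """Fraction of samples in `dataset` that `predict` classifies correctly."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(1 for x, y in dataset if predict(x) == y)
    return correct / len(dataset)
```

In this sketch, each repairing method would receive the same output of `build_repairing_dataset`, and `accuracy` would then be called on the held-out testing set for every repaired model, so that all methods are compared on identical data.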
To better illustrate the effectiveness of ArchRepair, we also evaluate the repaired DNN models on two corruption datasets (i.e., CIFAR-10-C and Tiny-ImageNet-C; all 15 corruption types and their abbreviations are listed in the figure above). The goal is to investigate whether the repairing methods harm a model's performance on other datasets, especially on corrupted ones. The accuracy of the repaired models on the original and corruption datasets is listed in the following tables.
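One way to check whether repairing harms performance on corrupted data is to compare per-corruption accuracy before and after repair. The sketch below assumes the accuracies have already been measured and stored in dictionaries keyed by corruption name; the function names and the `tolerance` parameter are hypothetical and are not taken from the evaluated methods.

```python
from typing import Dict, List


def accuracy_change(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-corruption accuracy difference (after repair minus before repair)."""
    mismatched = set(before) ^ set(after)
    if mismatched:
        raise ValueError(f"corruption types differ: {sorted(mismatched)}")
    return {name: after[name] - before[name] for name in before}


def degraded(changes: Dict[str, float], tolerance: float = 0.0) -> List[str]:
    """Corruption types whose accuracy dropped by more than `tolerance`."""
    return sorted(name for name, delta in changes.items() if delta < -tolerance)
```

An empty list returned by `degraded` would indicate that the repaired model is at least as accurate as the original on every corruption type considered.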
In summary, ArchRepair consistently achieves the highest accuracy on the original datasets, reaching over 90% on ResNet-101 and DenseNet-121, which demonstrates its state-of-the-art repairing ability. On the corrupted datasets, the repaired models also achieve higher accuracy than the original ones. On ResNet-18, ResNet-50, and ResNet-101, the model repaired by ArchRepair has the highest accuracy on more than half of the 15 corruption datasets. On DenseNet-121, the model repaired by ArchRepair does not perform as well as the one repaired by Apricot; however, the difference is less than 1% on most of the corrupted datasets (e.g., Shot Noise, Brightness, and Contrast). These experimental results demonstrate that ArchRepair not only improves a model's accuracy on the original dataset but also enhances its robustness.
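The two summary statistics used above (the number of corruption types on which a method is best, and the accuracy gap between two methods) can be computed from a results table as sketched below. The nested-dictionary layout and the names `count_wins` and `gap` are assumptions made for illustration; ties are counted as wins in this sketch.

```python
from typing import Dict

# Assumption: results[method][corruption] holds the measured accuracy of the
# model repaired by `method` on the given corruption type.
Results = Dict[str, Dict[str, float]]


def count_wins(results: Results, method: str) -> int:
    """Number of corruption types on which `method` has the highest accuracy."""
    wins = 0
    for corruption, score in results[method].items():
        best = max(scores[corruption] for scores in results.values())
        if score >= best:
            wins += 1
    return wins


def gap(results: Results, method: str, rival: str) -> Dict[str, float]:
    """Per-corruption accuracy of `rival` minus that of `method`."""
    return {c: results[rival][c] - results[method][c] for c in results[method]}
```

A method wins "on more than half" of the corruption types when `count_wins` exceeds half the number of corruption types in the table.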