Experiments and Reproduction

This page presents more detailed settings and experimental results for our paper.

Our code is available here [code.tar.gz] and covers the defect injection, fault localization, and fault repair processes.

All code is implemented with PyTorch, and all experiments are conducted on a server with an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 256 GB of system memory, and 8 NVIDIA TITAN Xp GPUs with 12 GB of memory each.

More evaluation results of RQ1 (Section Ⅳ.B Features of BDefects4NN Database)

Benign models: we first show the clean accuracy of benign models in Table 7.

Infected models: The performance of infected models across four quantity levels on CIFAR-100 and GTSRB is shown in Tables 8 and 9, respectively.

Consistent with the results presented in our paper, SRA exhibits a notable decline in clean accuracy (CA) as the sub-network level increases, a trend observed on all three datasets.

Performance of infected models across four quantity levels and four attacks on three datasets. (full visualization of Fig. 3)

More evaluation results of RQ2 (Section Ⅳ.C Performance of Localization Criteria)

Hyper-parameter configurations of localization methods.

[1] Backdoorbench: A comprehensive benchmark of backdoor learning.

Effectiveness (average): The average effectiveness results on the CIFAR-100 and GTSRB datasets are shown in Tables 10 and 11, respectively. Column "Mean" represents the average value across the corresponding row.

In general, the average ranking of localization effectiveness remains consistent: criteria that emphasize neuron weight (ANP and CLP) outperform general localization (SLICER and deepmufl), with activation-based criteria (NC and FP) ranking the lowest.

Effectiveness: The results for each specific attack, architecture, and quantity level are as follows.

(1) Criteria based on neuron weight (CLP and ANP). Across three datasets, CLP exhibits significant declines from narrow to large levels, whereas ANP remains more stable.

(2) General localization (SLICER and deepmufl). Overall, their effectiveness increases as the defect level increases. Specifically, SLICER performs well on the models (especially the VGG series) injected by the SRA backdoor attack across three datasets, occasionally even surpassing ANP/CLP. This can be attributed to the SRA injection method, which isolates the infected neurons from clean predictions.

(3) Activation-based criteria (NC and FP). In general, NC is effective on narrow-level injected sub-networks but fails to identify infected neurons at larger levels, possibly because having fewer infected neurons leads to greater activation differences. Conversely, FP consistently exhibits weaker performance.

Effectiveness of six localization methods for each specific attack, architecture, and quantity level on the CIFAR-10 dataset. (full visualization of Fig. 5)

Effectiveness of six localization methods for each specific attack, architecture, and quantity level on the CIFAR-100 dataset.

Effectiveness of six localization methods for each specific attack, architecture, and quantity level on the GTSRB dataset.

Efficiency: The average efficiency results on the CIFAR-100 and GTSRB datasets are shown in Tables 12 and 13, respectively.

For FP, ANP, and CLP, efficiency differs only slightly across datasets. For NC, the processing time is determined by the number of classes in the dataset. For deepmufl and SLICER, the time consumed is influenced by both the network architecture (different numbers of neurons) and the dataset (different inference times).

More evaluation results of RQ3 (Section Ⅳ.D Repair Performance)

Repair setup: For neuron fine-tuning, we fine-tune the localized neurons for 10 epochs, using 5% of the accessible clean data. The learning rate is set to 0.01 with cosine annealing, and we use an SGD optimizer with a momentum of 0.9 and a weight decay of 0.0005.
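The following is a minimal sketch of the fine-tuning schedule described above, written in plain Python rather than taken from our released code. It assumes a closed-form cosine annealing schedule stepped once per epoch with a minimum learning rate of 0 (neither detail is stated above), and it simplifies each neuron to a single scalar weight. The names `cosine_lr` and `sgd_step` are hypothetical helpers introduced only for illustration.

```python
import math

# Values taken from the repair setup above.
BASE_LR = 0.01
EPOCHS = 10
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=EPOCHS, min_lr=0.0):
    """Closed-form cosine annealing. min_lr=0.0 is an assumption."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)


def sgd_step(weights, grads, velocity, lr, localized):
    """One SGD step with momentum and weight decay.

    Only the neurons whose indices are in `localized` are updated;
    all other entries are left untouched (simplified: one scalar per neuron).
    """
    for i in localized:
        g = grads[i] + WEIGHT_DECAY * weights[i]   # L2 weight decay
        velocity[i] = MOMENTUM * velocity[i] + g   # momentum buffer
        weights[i] -= lr * velocity[i]
    return weights, velocity


# Learning rate used in each of the 10 epochs.
schedule = [cosine_lr(e) for e in range(EPOCHS)]

# Example: update only neuron 1 with the first-epoch learning rate.
weights, velocity = sgd_step(
    weights=[1.0, 2.0, 3.0],
    grads=[0.5, 0.5, 0.5],
    velocity=[0.0, 0.0, 0.0],
    lr=schedule[0],
    localized=[1],
)
```

Restricting the update to the localized indices is what distinguishes neuron fine-tuning from fine-tuning the whole model: neurons that were not flagged keep their original parameters.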

Repair performance: The average results of neuron pruning and neuron fine-tuning on three datasets are as follows.

The general trends in localization effectiveness and repair performance are consistent: CLP and ANP effectively remove the backdoor, followed by SLICER and deepmufl, while NC and FP exhibit the lowest effectiveness. Note that inaccurate localization (e.g., NC and FP) may result in a significant decline in clean accuracy when employing neuron fine-tuning. We speculate that the primary reason is that fine-tuning clean neurons that were mistakenly localized leads to catastrophic forgetting [2] of the benign model's capabilities.

[2] Overcoming catastrophic forgetting in neural networks.
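As a rough illustration of the neuron pruning repair discussed above, the sketch below zeroes the parameters of localized neurons in a single fully connected layer. This is an assumption about how pruning can be realized, not a description of our released code: the helper names `prune_neurons` and `forward` are hypothetical, and in a convolutional layer a "neuron" would typically correspond to an output channel instead of a row.

```python
def prune_neurons(weight, bias, localized):
    """Return copies of a dense layer's parameters with localized neurons zeroed.

    weight: list of rows, one row of incoming weights per output neuron.
    bias: list with one bias per output neuron.
    localized: indices of neurons flagged by a localization method.
    """
    flagged = set(localized)
    pruned_w = [
        [0.0] * len(row) if i in flagged else list(row)
        for i, row in enumerate(weight)
    ]
    pruned_b = [0.0 if i in flagged else b for i, b in enumerate(bias)]
    return pruned_w, pruned_b


def forward(weight, bias, x):
    """Dense layer followed by ReLU."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weight, bias)
    ]


# Toy layer with three neurons; neuron 1 is treated as the localized one.
weight = [[0.2, -0.1], [1.5, 2.0], [0.3, 0.4]]
bias = [0.1, -0.5, 0.0]

pruned_w, pruned_b = prune_neurons(weight, bias, localized=[1])
out = forward(pruned_w, pruned_b, [1.0, 1.0])
```

After pruning, the flagged neuron always outputs zero while the remaining neurons behave as before. This also shows why localization accuracy matters: flagging a clean neuron removes its contribution to clean predictions.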

Repair performance on the CIFAR-10 dataset: The average repair results of neuron fine-tuning are shown in Table 14. Column "Mean" represents the average value across the corresponding row.

Repair performance on the CIFAR-100 dataset: The average repair results of neuron pruning and neuron fine-tuning are shown in Tables 15 and 16, respectively. Column "Mean" represents the average value across the corresponding row.

Repair performance on the GTSRB dataset: The average repair results of neuron pruning and neuron fine-tuning are shown in Tables 17 and 18, respectively. Column "Mean" represents the average value across the corresponding row.

More results of Discussion (Section Ⅴ)

Beyond the localization performance shown in the paper, we present results for additional attacks, architectures, and an additional dataset, including their injection performance and the corresponding repair performance.