Average absolute changes in accuracy~($\Delta$Acc) and ASR~($\Delta$ASR) after fine-tuning CodeBERT, GraphCodeBERT and CodeT5. The adversarial test cases used to evaluate $\Delta$Acc and $\Delta$ASR are generated from the originally trained models and are not used in fine-tuning. ASR* denotes the ASR of the final retrained model after tuning on a separate set of adversarial examples that does not overlap with the test data. Up~(down) arrows indicate an increase~(decrease) relative to the previous values. A larger ASR decline indicates a greater robustness improvement, and a larger accuracy increase indicates a greater performance improvement.
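To make the quantities in this caption concrete, the Python sketch below computes an ASR value and the averaged changes. It assumes ASR is the fraction of originally correct predictions that an adversarial example flips, and that $\Delta$Acc and $\Delta$ASR are means over the models of the after-minus-before difference. The function names `attack_success_rate` and `average_deltas` and the input layout are hypothetical illustrations, not the authors' code.

```python
from statistics import mean


def attack_success_rate(orig_correct, adv_correct):
    """Assumed ASR definition: among examples the model originally got right,
    the fraction whose adversarial counterpart it gets wrong."""
    # Only examples that were correct before the attack count as attack targets.
    attacked = [adv for orig, adv in zip(orig_correct, adv_correct) if orig]
    if not attacked:
        return 0.0
    return sum(1 for adv in attacked if not adv) / len(attacked)


def average_deltas(before, after):
    """before/after: hypothetical dicts mapping model name -> (accuracy, asr),
    both as fractions; every model must appear in both dicts.
    Returns the mean over models of (after - before) for accuracy and ASR."""
    models = list(before)
    d_acc = mean(after[m][0] - before[m][0] for m in models)
    d_asr = mean(after[m][1] - before[m][1] for m in models)
    return d_acc, d_asr
```

Under these assumptions, a negative $\Delta$ASR corresponds to a downward arrow, meaning attacks succeed less often after fine-tuning.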
Average ASR on CodeBERT, GraphCodeBERT and CodeT5 before and after adversarial tuning.
Methods marked with * use code-based substitution.
Average ASR on CodeBERT. Methods marked with * use code-based substitution.
Average ASR on GraphCodeBERT. Methods marked with * use code-based substitution.
Average ASR on CodeT5. Methods marked with * use code-based substitution.