Answers to RQ1: Existing AIGC detectors generally perform better on natural language data than on code data, indicating that detecting ChatGPT-generated code is a more challenging task. Although commercial detectors outperform open-source ones, they still face difficulties in detecting ChatGPT-generated code.
In general, the results in Table 3 show that detecting ChatGPT-generated content remains very challenging: the average AUC, FPR, and FNR are 0.58, 0.32, and 0.44 on NLCD-Test, and 0.46, 0.32, and 0.66 on CCD-Test. Specifically, the AUC is higher and the FNR is lower on NLCD-Test than on CCD-Test, while the FPR is the same on both. These results suggest that detecting ChatGPT-generated code is even more difficult than detecting natural language content, potentially because existing detectors are trained on more natural language data than code data.
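For readers less familiar with the error rates reported above, the following minimal sketch shows how FPR and FNR can be computed from detector scores at a fixed decision threshold. It assumes that label 1 denotes ChatGPT-generated content and label 0 human-written content, and that a score at or above the threshold counts as an "AI-generated" prediction. The function name `fpr_fnr`, the threshold of 0.5, and the toy data are illustrative assumptions, not the evaluation code or results of this study.

```python
from typing import Sequence, Tuple


def fpr_fnr(labels: Sequence[int], scores: Sequence[float],
            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (FPR, FNR) at the given threshold.

    Assumed convention: label 1 = ChatGPT-generated, 0 = human-written.
    """
    fp = fn = pos = neg = 0
    for y, s in zip(labels, scores):
        predicted_ai = s >= threshold
        if y == 1:
            pos += 1
            if not predicted_ai:
                fn += 1  # generated sample missed by the detector
        else:
            neg += 1
            if predicted_ai:
                fp += 1  # human-written sample flagged as generated
    if pos == 0 or neg == 0:
        raise ValueError("need at least one sample of each class")
    return fp / neg, fn / pos


# Toy, made-up scores for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.2, 0.1, 0.7, 0.3, 0.2, 0.8]
fpr, fnr = fpr_fnr(labels, scores)
print(f"FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Under this convention, a high FNR such as the one observed on CCD-Test means that a large share of ChatGPT-generated samples is classified as human-written.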
We further study the performance of detectors on code written in different programming languages. The comparative detection results on the Doc2Code-GPT dataset (covering six programming languages) are presented in Table 4. Our analysis reveals that detector performance varies considerably across programming languages. For instance, GPTZero performs well in detecting Java, JavaScript, Python, and Ruby code, with AUC scores of 0.90, 0.90, 0.92, and 0.75, respectively, but does not perform well on Go and PHP (only 0.59 and 0.55). Conversely, AI-TextClassifier shows excellent results in detecting Go code (AUC score of 0.95) but performs poorly on the remaining languages. It is worth noting that while GPTZero performs well on Java code in the Doc2Code-GPT dataset, it performs poorly on the CONCODE-GPT dataset (AUC score of 0.50), which is also written in Java. This indicates that detector performance is sensitive to datasets with different distributions, even when they are written in the same programming language.
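A per-language breakdown of this kind can be sketched as follows. The example groups detector outputs by programming language and computes the AUC of each group as the probability that a randomly chosen generated sample receives a higher score than a randomly chosen human-written one, with ties counted as one half. The helper names `auc` and `auc_by_language`, the record layout, and the toy scores are assumptions made for illustration; they do not reproduce the values in Table 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; label 1 = generated, 0 = human-written."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Count pairs where the generated sample is ranked above the human one.
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_language(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, float]:
    """Group (language, label, score) records and compute AUC per language."""
    grouped: Dict[str, Tuple[List[int], List[float]]] = defaultdict(
        lambda: ([], []))
    for lang, label, score in records:
        grouped[lang][0].append(label)
        grouped[lang][1].append(score)
    return {lang: auc(ys, ss) for lang, (ys, ss) in grouped.items()}


# Toy, made-up records for illustration only.
records = [
    ("Python", 1, 0.9), ("Python", 1, 0.8),
    ("Python", 0, 0.3), ("Python", 0, 0.1),
    ("Go", 1, 0.4), ("Go", 1, 0.6),
    ("Go", 0, 0.5), ("Go", 0, 0.7),
]
per_language = auc_by_language(records)
for lang, value in per_language.items():
    print(f"{lang}: AUC={value:.2f}")
```

Because AUC is threshold-free, an AUC near 0.50 for one language or dataset indicates that the detector's scores barely separate generated from human-written samples there, regardless of where the decision threshold is placed.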