Answers to RQ1: Existing AIGC detectors generally perform better on natural language data than on code data, indicating that detecting ChatGPT-generated code is a more challenging task. Although commercial detectors outperform open-source ones, they still struggle to detect ChatGPT-generated code.
Exploring RQ1 at the language level: We further study how the detectors perform on code written in different programming languages. Table 4 presents the comparative detection results on the Doc2Code-LLM dataset, which covers 6 programming languages. Our analysis shows that the detectors exhibit similar detection performance across programming languages, and no single detector is consistently best. For example, Scribbr performs relatively better on Go and Java, whereas GPTZero is better on JavaScript and Ruby, but the differences between them are not pronounced.
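As a rough illustration of this kind of language-level breakdown, the sketch below groups detection outcomes by detector and programming language and computes a score for each pair. It is a minimal sketch under stated assumptions, not the evaluation pipeline used for Table 4: the function name `accuracy_by_language`, the record layout, the detector names, and the choice of plain accuracy as the metric are all hypothetical, and the toy records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy for each (detector, language) pair.

    `records` is an iterable of (detector, language, true_label, predicted_label)
    tuples. This layout is an assumption made for the example.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for detector, language, truth, predicted in records:
        key = (detector, language)
        total[key] += 1
        if truth == predicted:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy, made-up records for illustration only; these are NOT results from the paper.
toy_records = [
    ("detector_a", "Go", "ai", "ai"),
    ("detector_a", "Go", "human", "ai"),
    ("detector_b", "Go", "ai", "ai"),
    ("detector_b", "Go", "human", "human"),
    ("detector_a", "Ruby", "ai", "human"),
    ("detector_a", "Ruby", "human", "human"),
]

scores = accuracy_by_language(toy_records)
for (detector, language), acc in sorted(scores.items()):
    print(f"{detector:12s} {language:6s} accuracy={acc:.2f}")
```

Laying the scores out per detector and per language in this way makes it easy to check whether any one detector leads in every language or whether the ranking changes from language to language.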