RQ1&2 Effectiveness

RQ1: How effective are existing detectors at detecting LLM-generated content? We evaluate the performance of 13 detectors on the CCD-Test and NLCD-Test datasets.

Answer to RQ1: Existing AIGC detectors generally perform better on natural language data than on code data, indicating that detecting ChatGPT-generated code is the more challenging task. Although commercial detectors perform better than open-source ones, they still have difficulty detecting ChatGPT-generated code.

Exploring RQ1 at the language level: We further study how the detectors perform on code written in different programming languages. The comparative detection results on the Doc2Code-LLM dataset (covering 6 programming languages) are presented in Table 4. Our analysis shows that the detectors exhibit similar detection performance across programming languages, and no single detector is consistently better than the others. For example, Scribbr performs relatively better on Go and Java, while GPTZero performs better on JavaScript and Ruby, but the differences between them are small.
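A per-language comparison of this kind can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the record layout, the detector names "DetectorA" and "DetectorB", and the toy labels are all hypothetical, and accuracy is used only as an assumed stand-in for whatever metric Table 4 reports.

```python
from collections import defaultdict


def per_language_accuracy(records):
    """Group predictions by (detector, language) and compute accuracy.

    records: iterable of (detector, language, is_ai_true, is_ai_pred).
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for detector, language, truth, pred in records:
        key = (detector, language)
        total[key] += 1
        correct[key] += int(truth == pred)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical toy records; these are NOT results from the study.
toy_records = [
    ("DetectorA", "Go", True, True),
    ("DetectorA", "Go", False, True),
    ("DetectorA", "Ruby", True, False),
    ("DetectorA", "Ruby", False, False),
    ("DetectorB", "Go", True, True),
    ("DetectorB", "Go", False, False),
]

scores = per_language_accuracy(toy_records)
for (detector, language), acc in sorted(scores.items()):
    print(f"{detector:10s} {language:6s} accuracy={acc:.2f}")
```

Laying the scores out by detector and language makes it easy to check whether one detector leads on every language or only on some of them.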

Reproduce the RQ1
