Answers to RQ2: Fine-tuning on the collected ChatGPT-generated content can significantly improve the detection performance of the detectors. Furthermore, models fine-tuned on one dataset generalize to some extent to other datasets.
The results show that fine-tuning significantly enhances the performance of the existing detectors on both natural language and code data. On their corresponding test datasets, the fine-tuned models achieved high AUC scores ranging from 0.9 to 1.0 and low FPR/FNR values, most below 0.1, except on Wiki-GPT and APPS-GPT. The weaker results on these two datasets are expected, as the models were not fine-tuned on Wiki-GPT or APPS-GPT data. Overall, the results highlight the importance of fine-tuning for improving detector performance across domains.
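As a point of reference for the metrics reported above, the sketch below shows one standard way to compute AUC, FPR, and FNR from a detector's outputs. The label convention (1 = ChatGPT-generated, 0 = human-written) and the toy data are illustrative assumptions, not values from the study.

```python
def fpr_fnr(y_true, y_pred):
    """False positive rate and false negative rate from hard predictions.

    Convention (assumed): 1 = ChatGPT-generated, 0 = human-written.
    FPR = FP / (FP + TN): fraction of human texts flagged as generated.
    FNR = FN / (FN + TP): fraction of generated texts missed.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


def auc(y_true, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive receives a higher score than a randomly
    chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: a detector that ranks generated texts above human ones
# achieves AUC = 1.0.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2]
print(auc(labels, scores))  # 1.0
```

In this framing, an AUC near 1.0 with FPR/FNR below 0.1, as reported for the fine-tuned models, indicates both strong ranking ability and well-calibrated hard decisions at the chosen threshold.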