Answers to RQ2: Fine-tuning on the collected ChatGPT-generated content can significantly improve the detection performance of the detectors. Furthermore, models fine-tuned on one dataset generalize to some extent to other datasets.
The results show that fine-tuning significantly enhances the performance of the existing detectors on both natural language and code data. On their corresponding test datasets, the fine-tuned models achieved high AUC scores ranging from 0.9 to 1.0 and low FPR/FNR values, most below 0.1, except on Wiki-GPT and APPS-GPT. The weaker results on these two datasets are expected, as the models were not fine-tuned on Wiki-GPT or APPS-GPT data. Overall, the results highlight the importance of fine-tuning for improving detector performance across domains.
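As a point of reference for the metrics reported above, the sketch below shows one standard way to compute AUC, FPR, and FNR from a detector's outputs. The label convention (1 = ChatGPT-generated, 0 = human-written) and the toy data are illustrative assumptions, not values from the study.

```python
def fpr_fnr(y_true, y_pred):
    """False positive rate and false negative rate from hard predictions.

    Convention (assumed): 1 = ChatGPT-generated, 0 = human-written.
    FPR = FP / (FP + TN): fraction of human texts flagged as generated.
    FNR = FN / (FN + TP): fraction of generated texts missed.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


def auc(y_true, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive receives a higher score than a randomly
    chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: a detector that ranks generated texts above human ones
# achieves AUC = 1.0.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2]
print(auc(labels, scores))  # 1.0
```

In this framing, an AUC near 1.0 with FPR/FNR below 0.1, as reported for the fine-tuned models, indicates both strong ranking ability and well-calibrated hard decisions at the chosen threshold.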