Conclusion 

This study reveals a critical vulnerability: fake profile detection systems trained on manually created data are ineffective against modern, LLM-generated adversaries. Models that were reliable in traditional scenarios were easily bypassed by profiles generated with GPT-3.5 and GPT-4, producing false acceptance rates exceeding 50%.

However, incorporating synthetically generated profiles into training, particularly from multiple LLM sources, significantly improved robustness. Adversarial retraining with GPT-3.5 and GPT-4 examples drastically reduced the False Acceptance Rate (FAR) to 1.34%, outperforming both GPT-4 itself and human annotators as detectors. The most effective model integrated both textual and behavioral features and was well calibrated.
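To make that defensive recipe concrete, the sketch below shows one way such a detector could be assembled with scikit-learn: LLM-generated fake profiles are folded into the training set, textual and behavioral features are combined, the classifier's probabilities are calibrated, and the false acceptance rate is measured on held-out fakes. The column names, label convention, and choice of classifier are illustrative assumptions, not the exact pipeline used in this study.

```python
# Minimal sketch of the adversarial-retraining idea, assuming hypothetical
# profile columns ("bio", "posts_per_day", ...) and labels 1 = fake, 0 = genuine.
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

BEHAVIORAL = ["posts_per_day", "followers", "following", "account_age_days"]


def build_detector() -> CalibratedClassifierCV:
    """Combine textual and behavioral features, with calibrated probabilities."""
    features = ColumnTransformer([
        ("text", TfidfVectorizer(max_features=5000, ngram_range=(1, 2)), "bio"),
        ("behavior", StandardScaler(), BEHAVIORAL),
    ])
    base = Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])
    # Platt scaling (sigmoid) so the output probabilities are well calibrated.
    return CalibratedClassifierCV(base, method="sigmoid", cv=5)


def adversarial_retrain(human_labeled: pd.DataFrame,
                        llm_generated: pd.DataFrame) -> CalibratedClassifierCV:
    """Augment the original training set with LLM-generated fake profiles
    (ideally drawn from several generator models) before fitting."""
    llm_generated = llm_generated.assign(label=1)  # all synthetic profiles are fake
    train = pd.concat([human_labeled, llm_generated], ignore_index=True)
    model = build_detector()
    model.fit(train.drop(columns="label"), train["label"])
    return model


def false_acceptance_rate(model, fake_profiles: pd.DataFrame) -> float:
    """Share of known-fake profiles that the detector accepts as genuine."""
    preds = model.predict(fake_profiles.drop(columns="label", errors="ignore"))
    return float((preds == 0).mean())
```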

These findings underscore a crucial point: future detection pipelines must acknowledge and adapt to attackers' access to powerful generative models. Simply curating more manually created fake profiles is no longer a sufficient defense. Without adversarial diversity in training, systems will remain vulnerable in real-world applications.

Future Work

LLMs are evolving rapidly. Defensive strategies must therefore not only keep pace but also proactively anticipate how these generative techniques will be exploited to compromise identity integrity at scale.