Conclusion
This study reveals a critical vulnerability: fake profile detection systems trained on manually created data are ineffective against modern, LLM-generated adversaries. Models that were reliable in traditional scenarios were easily bypassed by GPT-3.5- and GPT-4-generated profiles, with false acceptance rates exceeding 50%.
However, incorporating synthetically generated profiles into training, particularly from multiple LLM sources, significantly improved robustness. Adversarial retraining with GPT-3.5 and GPT-4 examples drastically reduced the False Acceptance Rate (FAR) to 1.34%, outperforming both GPT-4 acting as a detector and human annotators. The most effective model integrated both textual and behavioral features and was carefully calibrated.
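For illustration, a minimal sketch of such an adversarial retraining and calibration pipeline is given below, assuming Python with scikit-learn. The feature matrices, class sizes, classifier choice, and calibration method are placeholder assumptions rather than the study's actual configuration; the sketch only shows how LLM-generated fakes can be folded into the training set and how FAR is then measured on held-out fakes.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

# Synthetic placeholder features; a real pipeline would combine text
# representations (e.g., TF-IDF or embeddings of profile bios) with
# behavioral statistics (posting rate, follower ratios, account age).
rng = np.random.default_rng(0)
X_real = rng.normal(0.0, 1.0, size=(1000, 32))   # genuine profiles
X_manual = rng.normal(1.5, 1.0, size=(500, 32))  # manually created fakes
X_llm = rng.normal(0.4, 1.0, size=(500, 32))     # LLM-generated fakes

# Adversarial retraining: fold LLM-generated fakes into the positive
# class alongside the manually created ones.
X = np.vstack([X_real, X_manual, X_llm])
y = np.concatenate([np.zeros(len(X_real)),                 # 0 = genuine
                    np.ones(len(X_manual) + len(X_llm))])  # 1 = fake

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Calibrate the classifier so its scores behave like probabilities,
# which also enables principled decision thresholds downstream.
clf = CalibratedClassifierCV(GradientBoostingClassifier(),
                             method="isotonic", cv=3)
clf.fit(X_tr, y_tr)

# False Acceptance Rate: fraction of fake profiles accepted as genuine.
pred = clf.predict(X_te)
fakes = y_te == 1
far = np.mean(pred[fakes] == 0)
print(f"FAR on held-out fakes: {far:.2%}")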
These findings underscore a crucial point: future detection pipelines must acknowledge and adapt to attackers' access to powerful generative models. Simply curating more manually created fake profiles is no longer a sufficient defense. Without adversarial diversity in training, systems will remain vulnerable in real-world applications.
Future Work
Broader LLM Variants: Expand adversarial retraining to include profiles generated by other advanced LLMs like Claude, Gemini, and various open-source models (e.g., LLaMA, Mixtral).
Cross-Platform Generalization: Evaluate the robustness of these models when applied to other platforms (e.g., GitHub, Twitter) without requiring retraining.
Real-Time Detection: Integrate the detection mechanisms into live systems and evaluate their latency, throughput, and resilience to real-time adversarial evasion.
Longitudinal Threat Evolution: Monitor how LLM-generated fake profiles evolve after deployment in response to published defense strategies.
Human-in-the-Loop Systems: Investigate hybrid detection pipelines that combine automated model predictions with expert human oversight to minimize false rejections and enhance accuracy (a minimal routing sketch follows this list).
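As a concrete illustration of the last item, the sketch below shows one way such a hybrid pipeline could route profiles, assuming the detector outputs calibrated fake-probabilities; the thresholds, names, and routing rule are hypothetical assumptions, not a prescribed design.

from dataclasses import dataclass

# Hypothetical routing rule: profiles whose calibrated fake-probability
# falls in an uncertain middle band are deferred to human review rather
# than auto-accepted or auto-rejected. Thresholds are assumptions.
ACCEPT_BELOW = 0.10  # confidently genuine -> auto-accept
REJECT_ABOVE = 0.90  # confidently fake    -> auto-reject

@dataclass
class Decision:
    profile_id: str
    p_fake: float
    action: str  # "accept", "reject", or "human_review"

def route(profile_id: str, p_fake: float) -> Decision:
    if p_fake < ACCEPT_BELOW:
        return Decision(profile_id, p_fake, "accept")
    if p_fake > REJECT_ABOVE:
        return Decision(profile_id, p_fake, "reject")
    return Decision(profile_id, p_fake, "human_review")

# Example: only the ambiguous middle band consumes reviewer time,
# which is how such a design would aim to cut false rejections.
for pid, p in [("u1", 0.03), ("u2", 0.55), ("u3", 0.97)]:
    print(route(pid, p))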
LLMs are undergoing rapid evolution. Therefore, defensive strategies must not only keep pace but also proactively anticipate how these generative techniques will be exploited to compromise identity integrity at scale.