The project code and data are available at the link.
Large Language Models (LLMs) have exhibited extraordinary capabilities in a range of applications, from story generation to writing assistance. However, the potential misuse of LLMs to generate harmful content, such as spear-phishing emails, poses a growing threat. Spear-phishing emails, which lure individuals into clicking malicious URLs through highly targeted content, can result in the leakage of personal information. This paper introduces SpearBot, a framework for generating spear-phishing emails with LLMs that comprises two primary components. The first is a series of jailbreak prompts designed to bypass the security policies of LLMs, thereby enabling the generation of spear-phishing emails. The second optimizes these emails through feedback from LLM critics: if a critic identifies an email as phishing, the email is refined based on the critic's reasoning until it is no longer recognized as such.
To assess the efficacy of SpearBot, we deploy various machine-based defenders and evaluate their ability to detect the generated phishing emails. The experimental results indicate that our attacks are notably difficult for these defenders to detect, demonstrating the deceptive quality of the generated emails. Furthermore, we conduct manual evaluations of the generated emails through questionnaires to gauge their readability and deceptiveness. The findings confirm that our framework can produce highly convincing phishing emails, highlighting the substantial potential for harm that such LLM capabilities pose. We also propose a mixup training strategy for defenders, i.e., incorporating a small number of LLM-generated phishing emails into defender training, which can mitigate the threat posed by LLMs.