While Large Language Models (LLMs) power numerous applications, their reliability risks remain understudied, particularly in security-critical contexts. In this paper, we investigate recurrent meltdown, a novel vulnerability in which models enter repetitive generation loops, causing severe latency degradation (exceeding a 30× increase). This behavior opens a new attack surface: when exploited maliciously, it poses a significant risk of Denial-of-Service (DoS) against LLM-based applications. To demonstrate the vulnerability, we present EvoGen, a black-box evolutionary algorithm that induces DoS conditions in LLMs without access to model weights. Using EvoGen, we discovered 2,631 triggers inducing recurrent meltdown across ten leading LLMs (including Llama-3 and OpenAI o1). We further evaluated EvoGen on 21 real-world applications, successfully triggering recurrent meltdown in all of them. To mitigate this attack, we propose RecurrentDetector, a method for real-time detection of recurrent meltdown. Our approach builds on an analysis of LLM activation patterns, which reveals a key insight: meltdown events produce rapidly increasing self-similarity in model activations. RecurrentDetector achieves 95.24% accuracy (F1 = 0.87) with a low false positive rate of 2.59%, effectively countering EvoGen-based attacks.
The figure above compares normal model generation with Recurrent Meltdown.
Recurrent Meltdown is an undesirable behavior pattern of LLMs. Triggered by certain malicious prompts, the model repeatedly produces highly similar content without terminating until it reaches a predefined limit, such as GPU memory constraints or a hard-coded max_new_tokens setting. The behavior is analogous to an infinite loop in programming.
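The looping behavior described above can be illustrated with a simple text-level heuristic that flags output whose n-gram repetition rate is abnormally high. This is only a sketch of the phenomenon, not the paper's detection method (which operates on model activations); the function name, window size, and threshold are all illustrative choices.

```python
def looks_recurrent(text: str, n: int = 8, threshold: float = 0.5) -> bool:
    """Flag text whose n-gram repetition rate exceeds a threshold.

    Illustrative heuristic only: real recurrent-meltdown output keeps
    emitting near-identical spans, so most of its n-grams are duplicates.
    """
    tokens = text.split()
    if len(tokens) < n:
        return False
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    repetition_rate = 1.0 - len(set(ngrams)) / len(ngrams)
    return repetition_rate > threshold
```

On a normal, varied sentence the repetition rate stays near zero, while a phrase repeated dozens of times drives it close to one.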
Because LLM-based services are resource-sensitive, Recurrent Meltdown can severely degrade their responsiveness (over a 30× increase in delay), opening a new attack surface. When exploited maliciously, this phenomenon poses a significant risk of Denial-of-Service (DoS) against LLM-based applications.
Using an evolution-based algorithm, EvoGen finds adversarial prompts that trigger recurrent generation in LLMs at least 335% faster.
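The black-box evolutionary search behind EvoGen can be sketched as follows. The population sizes, mutation rate, and selection scheme here are illustrative assumptions, not the paper's configuration; `fitness` stands in for any black-box score observable without weight access, such as the length of the model's generated output.

```python
import random

def evolve_trigger(fitness, vocab, pop_size=8, generations=20,
                   prompt_len=6, seed=0):
    """Minimal black-box evolutionary search in the spirit of EvoGen.

    fitness(prompt) is any externally observable score (e.g., output
    length); no model weights are needed. All hyperparameters are
    illustrative, not the paper's actual settings.
    """
    rng = random.Random(seed)
    # Initial population: random token sequences drawn from the vocabulary.
    pop = [[rng.choice(vocab) for _ in range(prompt_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]            # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, prompt_len)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:                  # point mutation
                child[rng.randrange(prompt_len)] = rng.choice(vocab)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

With a toy fitness that rewards occurrences of a particular token, the search quickly concentrates the population around high-scoring prompts.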
RecurrentDetector is a real-time tool designed to preemptively detect and stop recurrent generation. It achieves an accuracy rate of 95.2%.
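The key insight behind RecurrentDetector, that recurrent meltdown drives up self-similarity in model activations, can be sketched as a check over per-step activation vectors. The cosine measure, window size, and threshold below are illustrative assumptions, not the paper's implementation.

```python
import math

def self_similarity_alarm(activations, window=4, threshold=0.95):
    """Flag generation whose recent activation vectors are near-duplicates.

    Simplified stand-in for the paper's insight: during recurrent
    meltdown, consecutive per-step activations become highly self-similar.
    activations is a list of vectors (one per generated token).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    recent = activations[-window:]
    if len(recent) < window:
        return False
    sims = [cosine(recent[i], recent[i + 1]) for i in range(window - 1)]
    # Alarm only if every consecutive pair in the window is near-identical.
    return min(sims) > threshold
```

Because the check only inspects the most recent window, it could run at every decoding step and halt generation as soon as the alarm fires, which is the real-time, preemptive usage described above.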