LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models
Software vulnerabilities remain ubiquitous, even in the era of AI-powered code assistants, advanced static analysis tools, and extensive testing frameworks. It has become apparent that we must not only prevent these bugs but also eliminate them quickly and efficiently. Yet human code intervention is slow, costly, and can itself introduce further security vulnerabilities, especially in legacy codebases. The advent of highly capable Large Language Models (LLMs) has opened the possibility of patching many software defects automatically. We propose LLM4CVE, an LLM-based iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We evaluate our pipeline with state-of-the-art LLMs, including GPT-3.5, GPT-4o, Llama 3 8B, and Llama 3 70B, achieving a human-verified quality score of 8.51/10 and a 20% increase in ground-truth code similarity with Llama 3 70B.
Human developers are prone to costly mistakes when designing software systems. Often these are not simple errors but fundamental misunderstandings of cybersecurity concepts. The resulting vulnerable programs are targets for criminals seeking to steal credentials, assets, and other valuables from end users. Moreover, the frequency of such attacks is steadily increasing. As a result, the ability to quickly and efficiently rectify software bugs has become more critical than ever. The figure above illustrates how our pipeline can mitigate the risks posed by abandoned or poorly maintained legacy code.
As an increasing amount of software governs critical real-world systems, the importance of program maintenance has grown drastically. The proportion of engineers devoted to maintaining legacy codebases has risen significantly, yet the average time-to-fix for software vulnerabilities continues to grow. This presents a mounting threat to end users, especially when bugs may take far longer to be patched in downstream code. An overview of common vulnerability repair practices, and how LLM4CVE is positioned in this cycle, is shown above.
LLM4CVE is an iterative pipeline that automatically rectifies common software vulnerabilities through the use of augmented Large Language Models. Below, we describe the structural and theoretical motivations behind the pipeline's implementation. A visualization of the LLM4CVE pipeline is given in the figure above.
The LLM4CVE pipeline implements a "guided with feedback" configuration, which employs both prompt engineering and iterative generation to synthesize higher-quality candidate vulnerability patches. A complete description of our pipeline's iterative generation step is provided above.
We report results for all three pipeline configurations: "unguided" (zero-shot), "guided" (one-shot), and "guided+feedback" (few-shot); the semantic similarity scores are presented above. Notably, the full LLM4CVE configuration, "guided+feedback" (few-shot), demonstrates a marked performance improvement across all models, with Llama 3 70B peaking at a 20% increase in semantic similarity.
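The exact semantic similarity metric is not reproduced in this excerpt; as a simple illustration of comparing a candidate patch against the ground-truth fix, the snippet below uses `difflib.SequenceMatcher`, a lexical stand-in chosen here for self-containedness rather than the paper's metric.

```python
# Illustrative patch-vs-ground-truth comparison. difflib's ratio() is
# a simple lexical similarity (0..1), used here purely as a stand-in
# for the paper's semantic similarity metric.
import difflib

def similarity(candidate: str, ground_truth: str) -> float:
    """Return a similarity ratio in [0, 1] between two code strings."""
    return difflib.SequenceMatcher(None, candidate, ground_truth).ratio()

patched = "if (len < sizeof(buf)) memcpy(buf, src, len);"
truth   = "if (len < sizeof(buf)) memcpy(buf, src, len);"
print(round(similarity(patched, truth), 2))  # identical strings -> 1.0
```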
Anonymous repository access has been provided to reviewers during the review process using the following link:
https://anonymous.4open.science/r/LLM4CVE/README.md
We provide the LoRA files for these two models for download using the following link: https://drive.google.com/file/d/1XOPTGhi7AdM0lUqfzXe6YGgHyv8qeclN/view?usp=sharing