Open source software (OSS) vulnerabilities are increasingly prevalent, emphasizing the importance of security patches. However, in widely used security platforms like NVD, a substantial number of CVE records still lack trace links to patches. Although rank-based approaches have been proposed for security patch tracing, they heavily rely on handcrafted features in a single-step framework, which limits their effectiveness.
In this paper, we propose PatchFinder, a two-phase framework with end-to-end correlation learning for tracing security patches more effectively. In the initial retrieval phase, we employ a patch retriever that accounts for both lexical and semantic matching on the raw source code. This efficient and powerful information retrieval step narrows the search space by extracting, as candidates, the commits most similar to the CVE description. Afterward, in the re-ranking phase, we design an end-to-end architecture under the supervised fine-tuning paradigm to learn the semantic correlations between CVE descriptions and commits. In this way, we can automatically rank the candidates by their correlation scores while keeping computation overhead low. We evaluated our system on 4,789 CVEs from 532 OSS projects. The results are highly promising: PatchFinder achieves a Recall@10 of 80.63% and a Mean Reciprocal Rank (MRR) of 0.7951. Moreover, the Manual Effort@10 required is reduced to 2.77, a 2.03-fold improvement over current leading methods.
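For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Recall@K and MRR can be computed from the rank at which the true patch appears for each CVE. The function names and the `None` convention for "patch not found" are assumptions made for illustration, not the repository's API. The `manual_effort_at_k` definition (commits inspected before reaching the patch, capped at K) is one plausible reading of the metric and may differ from the exact definition used in the paper.

```python
from typing import List, Optional

# Each entry is the 1-based rank of the true patch for one CVE,
# or None if the patch is absent from the ranked list (assumed convention).
Ranks = List[Optional[int]]


def recall_at_k(ranks: Ranks, k: int) -> float:
    """Fraction of CVEs whose patch is ranked within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Ranks) -> float:
    """Average of 1/rank; a missing patch contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def manual_effort_at_k(ranks: Ranks, k: int) -> float:
    """Assumed definition: commits checked per CVE, capped at k."""
    return sum(k if r is None else min(r, k) for r in ranks) / len(ranks)


if __name__ == "__main__":
    example = [1, 2, None, 12]  # hypothetical ranks for four CVEs
    print(recall_at_k(example, 10))
    print(mean_reciprocal_rank(example))
    print(manual_effort_at_k(example, 10))
```

Lower Manual Effort@K and higher Recall@K and MRR all indicate that the true patch tends to sit nearer the top of the ranked list.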
When applying PatchFinder in practice, we initially identified 533 patch commits (average rank 1.65) and submitted them to the official CVE Numbering Authorities (CNAs), which have confirmed 482 of them. We have open-sourced the implementation of PatchFinder at https://github.com/MarkLee131/PatchFinder.
Figure 1. Overview of our approach.
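The two-phase flow in the overview can be illustrated with a small, self-contained sketch. Everything here is a simplified stand-in and not the actual implementation: `lexical_score` uses plain term-count cosine similarity in place of the patch retriever (which also performs semantic matching), and the `scorer` argument of `rerank` is a hypothetical placeholder for the fine-tuned re-ranking model.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def lexical_score(query: str, doc: str) -> float:
    """Cosine similarity over raw term counts (stand-in for the retriever)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def retrieve(cve_desc: str, commits: Dict[str, str], top_k: int) -> List[str]:
    """Phase 1: keep the top_k commits most similar to the CVE description."""
    ranked = sorted(
        commits, key=lambda cid: lexical_score(cve_desc, commits[cid]), reverse=True
    )
    return ranked[:top_k]


def rerank(
    cve_desc: str,
    commits: Dict[str, str],
    candidates: List[str],
    scorer: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Phase 2: order candidates by a correlation score (hypothetical scorer)."""
    scored = [(cid, scorer(cve_desc, commits[cid])) for cid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical commit texts (message plus diff would be used in practice).
    commits = {
        "a": "fix buffer overflow in parse_header",
        "b": "update readme",
        "c": "refactor parse logic for header overflow",
    }
    cve = "buffer overflow in parse_header allows remote attackers"
    candidates = retrieve(cve, commits, top_k=2)
    print(rerank(cve, commits, candidates, scorer=lexical_score))
```

Splitting the work this way means the expensive scorer only runs on the small candidate set rather than on every commit in the repository, which is how the re-ranking phase can stay cheap.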
NEW!🎉 Exciting News as of May 20th, 2024! We submitted a total of 700 patch commits to various CVE Numbering Authorities (CNAs). We are delighted to announce that 600 of these submissions have been officially confirmed and acknowledged by esteemed CNAs, including MITRE, RedHat, Oracle, GitHub, ICS-CERT, Jenkins Project, etc.
🎯 By leveraging PatchFinder, we are committed to enhancing cybersecurity by ensuring the integrity and reliability of these critical patches.