Abstract
Keeping open-source software (OSS) up to date is one way to avoid known vulnerabilities. However, upgrading requires frequent and costly testing and may introduce compatibility issues. Consequently, developers often choose to backport security patches to the vulnerable versions instead. Manual backporting is time-consuming, especially for large OSS projects such as the Linux kernel, so automating the process is urgently needed. Existing automated approaches for backporting patches rely on either automatic patch generation or automatic patch migration. However, these methods are often ineffective and error-prone: because they operate only at the syntactic level, they fail to locate the precise patch locations or to generate the correct patch.
In this paper, we propose a patch type-sensitive approach to automatically backport OSS security patches, guided by the patch type and patch semantics. Specifically, our approach identifies patch locations with the aid of program dependency graph-based matching at the semantic level. It then applies fine-grained patch migration and fine-tuning based on patch types. We have implemented our approach in a tool named TSBPORT and evaluated it on a large-scale dataset consisting of 1,815 pairs of real-world security patches for the Linux kernel. The evaluation results show that TSBPORT successfully backported 1,587 (87.44%) of the patches, significantly outperforming state-of-the-art approaches; 556 of these patches (30.63% of the dataset) could not be backported by any state-of-the-art approach. In addition, experiments show that TSBPORT generalizes to other OSS projects, where it backports patches with a success rate of 88.33%.
Approach
This figure gives a high-level overview of TSBPORT, which comprises three modules. The Patch Analyzing module receives a security patch and the corresponding software program as inputs and identifies the patch type of each hunk. The Target Patch Localization module takes the target version of the software program and the patch information obtained from the first module, and performs syntactic- and semantic-level mapping to determine the location of the patch in the target version. Finally, the Type Sensitive Patch Generation module uses fine-grained patch migration and fine-tuning based on the patch types to generate a patch for the target version.
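To make the three-stage flow concrete, the following is a minimal sketch and not TSBPORT's implementation. All names (`Hunk`, `classify_hunk`, `locate_hunk`, `backport`) and the patch-type labels are hypothetical. Localization here uses plain string similarity from Python's `difflib` as a stand-in for the syntactic and program dependency graph-based matching described above, and the generation step simply inserts the added lines without any type-based fine-tuning.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Hunk:
    context: List[str]  # lines preceding the change in the original version
    added: List[str]    # lines the patch adds


def classify_hunk(hunk: Hunk) -> str:
    """Stage 1 (hypothetical labels): tag a hunk by what it adds."""
    if any(line.lstrip().startswith("if") for line in hunk.added):
        return "check"
    return "other"


def locate_hunk(hunk: Hunk, target: List[str],
                threshold: float = 0.6) -> Optional[int]:
    """Stage 2 stand-in: index of the target line most similar to the
    hunk's last context line, or None if nothing exceeds the threshold."""
    anchor = hunk.context[-1].strip()
    best_idx, best_score = None, threshold
    for i, line in enumerate(target):
        score = SequenceMatcher(None, anchor, line.strip()).ratio()
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def backport(hunk: Hunk, target: List[str]) -> Optional[List[str]]:
    """Stage 3 stand-in: insert the added lines after the located anchor."""
    idx = locate_hunk(hunk, target)
    if idx is None:
        return None
    return target[: idx + 1] + hunk.added + target[idx + 1:]


# Toy example: the target version names the variable `length`, not `len`.
hunk = Hunk(
    context=["len = get_len(skb);"],
    added=["if (len > MAX_LEN)", "    return -EINVAL;"],
)
target = [
    "int parse(struct sk_buff *skb)",
    "{",
    "    length = get_len(skb);",
    "    copy(buf, skb, length);",
    "}",
]
result = backport(hunk, target)
```

In this toy case the anchor is found despite the renamed variable, but the inserted check still refers to `len`. Adapting the migrated lines to the target version is the kind of work the sketch leaves out.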
Datasets
We collected patch pairs consisting of an original patch on the mainline and a manually backported patch on a target (older) version. Initially, we included 350 bug-fixing patch pairs (Linux-Bug-350) for Linux kernel bugs that were shared by FIXMORPH.
Additionally, we gathered more patch pairs (Linux-CVE-1465) for vulnerabilities in the Linux kernel, where the vulnerabilities are recorded in the CVE database.
Finally, we randomly selected 10 vulnerabilities and their corresponding patches from six different OSS projects (WireShark, FFmpeg, QEMU, OpenSSL, OpenJPEG, and ImageMagick), totaling 60 cases.
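As a quick consistency check on the numbers reported in this README, the short script below recomputes the dataset totals and success rates. It assumes that the numeric suffix in each Linux dataset name is its number of patch pairs, and the count of 53 successful cases on the six other projects is inferred from the reported 88.33% of 60 rather than stated in the text.

```python
# Dataset sizes; the Linux-CVE-1465 size is read off the dataset name (assumption).
LINUX_DATASETS = {"Linux-Bug-350": 350, "Linux-CVE-1465": 1465}
OTHER_PROJECTS = ["WireShark", "FFmpeg", "QEMU", "OpenSSL", "OpenJPEG", "ImageMagick"]
CASES_PER_PROJECT = 10

linux_total = sum(LINUX_DATASETS.values())
other_total = len(OTHER_PROJECTS) * CASES_PER_PROJECT


def rate(successes: int, total: int) -> float:
    """Success rate as a percentage, rounded to two decimals."""
    return round(100 * successes / total, 2)


print(linux_total, other_total)      # dataset sizes
print(rate(1587, linux_total))       # patches TSBPORT backported
print(rate(556, linux_total))        # patches no other approach backported
print(rate(53, other_total))         # inferred count on the other projects
```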
Evaluation Results
We compared TSBPORT with state-of-the-art approaches on the Linux-Bug-350 and Linux-CVE-1465 datasets. First, we considered FIXMORPH, as it is the most relevant approach. Second, we evaluated VRepair and VulRepair, which are designed to generate patches using deep-learning techniques. Third, we examined the performance of widely used deep-learning models, namely CodeBERT, GraphCodeBERT, and ChatGPT.
Then, to investigate the applicability of TSBPORT beyond the Linux kernel, we evaluated its effectiveness on six different OSS projects (WireShark, FFmpeg, QEMU, OpenSSL, OpenJPEG, and ImageMagick).
The full set of details is given below.
Tools
We release our tool, TSBPORT, as a Docker image at https://drive.google.com/file/d/1SfLCG4F8_xvApDcrSnheBkafFY2pYji5/view?usp=sharing.