Vulnerability-Affected Versions Identification: How Far Have We Come?

This website provides the supplementary materials for the paper.

Abstract

Identifying which software versions are affected by a vulnerability is critical for patching, risk mitigation, and downstream security tasks. Although numerous tools have been proposed, existing evaluations are limited in scope: they focus primarily on early SZZ variants, neglect state-of-the-art techniques, and rely on small-scale datasets. This leaves open questions about tool accuracy, robustness, and design limitations. In this paper, we present the first comprehensive empirical study of vulnerability-affected versions identification. We construct a high-quality benchmark of 1,128 real-world C/C++ vulnerabilities and systematically evaluate 12 representative tools in terms of accuracy, robustness, and error root causes. Our findings reveal low overall accuracy (best < 45%), driven by over-reliance on heuristics, limited semantic modeling, and poor handling of complex patches. While modular recomposition and ensemble strategies yield up to 10.1% improvement, accuracy remains below 60.0%. Our study offers actionable insights to guide tool development, combination strategies, and future research in this critical area.

Overview of our study

RQ1: Effectiveness Analysis. How effective are current tools in identifying affected versions of vulnerabilities?
RQ2: Root Cause Analysis. What are the primary causes of the false positives (FPs) and false negatives (FNs) produced by existing tools?
RQ3: Patch-Type Sensitivity Analysis. How does identification performance vary across different patch types?
RQ4: Optimal Combination Analysis. Can combining existing tools improve the overall effectiveness?
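
To make the terms in RQ2 and RQ4 concrete, here is a minimal sketch of how version-level FPs and FNs could be counted for a single vulnerability, and how the outputs of several tools could be combined by majority vote. It is an illustration under stated assumptions, not the metric or the ensemble strategy used in the paper: the function names, the tool outputs, and the version strings are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def confusion(predicted: set[str], truth: set[str]) -> dict[str, int]:
    """Count version-level outcomes for one vulnerability.

    A false positive is a version reported as affected that is not;
    a false negative is an affected version the tool missed.
    """
    return {
        "tp": len(predicted & truth),
        "fp": len(predicted - truth),
        "fn": len(truth - predicted),
    }


def majority_vote(predictions: list[set[str]]) -> set[str]:
    """Keep a version only if more than half of the tools report it.

    This is an assumed, simple combination rule chosen for illustration.
    """
    votes = Counter(version for pred in predictions for version in pred)
    return {v for v, n in votes.items() if 2 * n > len(predictions)}


# Hypothetical ground truth and tool outputs for one vulnerability.
truth = {"1.0", "1.1", "1.2"}
tool_a = {"1.0", "1.1"}          # misses 1.2 (one FN)
tool_b = {"1.1", "1.2", "2.0"}   # misses 1.0 (one FN), wrongly adds 2.0 (one FP)
tool_c = {"1.0", "1.1", "1.2"}   # matches the ground truth

combined = majority_vote([tool_a, tool_b, tool_c])

print(confusion(tool_a, truth))
print(confusion(tool_b, truth))
print(confusion(combined, truth))
```

In this toy input the vote removes the errors of the individual tools, because each mistake is made by only one tool. When several tools share the same blind spot, a vote of this kind cannot correct it.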

Contribution

We constructed a large-scale and representative benchmark dataset comprising 1,128 vulnerabilities across 132 vulnerability types from diverse C/C++ projects. This manually curated dataset, built over 1.5 months, supports reproducible and reliable evaluation of identification tools.
We performed a comprehensive empirical evaluation of 12 representative tools from two major categories, offering an in-depth assessment of their effectiveness across multiple dimensions.
We identified key technical challenges and root causes behind tool limitations, and provided practical insights for tool improvement, ensemble design, and future research directions.
