As mentioned in our paper, we constructed our evaluation benchmark by combining four representative existing benchmarks with a newly constructed BNB Benchmark.
To achieve this, we reorganized the existing benchmarks and relabeled their test cases with the vulnerability types in our proposed taxonomy.
Importantly, we excluded test cases from the four peer-reviewed benchmarks that fell outside our scope. For instance, the SolidiFI Benchmark included 50 contract files associated with "Unhandled Exceptions." We excluded these from our analysis because our taxonomy classifies them as code quality issues rather than vulnerabilities. The detailed number of vulnerabilities is shown as follows.
Note that we excluded from the original datasets any smart contracts whose only vulnerabilities fell outside our designated scope. Specifically, we removed 6 files from Not-So-Smart-Contracts, as they are instances of the "honeypot", an outdated vulnerability type. We also excluded 50 contract files from the SolidiFI Benchmark because they contain only "unhandled exceptions", which pertain solely to code quality. Additionally, we discarded one contract file from the original Smartbugs-curated dataset that was associated with "short addresses", an obsolete vulnerability type that we already address in our paper.
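The exclusion rule above (drop a file only when all of its labels are out of scope) can be sketched as follows. This is a minimal illustration, not the script we used: the `ContractFile` record, the label strings, and the sample file paths are hypothetical, and the actual label names follow the taxonomy in the paper.

```python
from dataclasses import dataclass

# Hypothetical label strings for the three out-of-scope types named above.
OUT_OF_SCOPE = frozenset({"honeypot", "unhandled exceptions", "short addresses"})


@dataclass(frozen=True)
class ContractFile:
    """Hypothetical record for one benchmark file and its vulnerability labels."""
    benchmark: str
    path: str
    labels: frozenset


def in_scope(contract, out_of_scope=OUT_OF_SCOPE):
    """Return True if at least one label on the file is within scope.

    A file whose labels are all out of scope (or that has no labels) is dropped.
    """
    return any(label not in out_of_scope for label in contract.labels)


def filter_benchmark(files):
    """Keep only the files that carry at least one in-scope vulnerability."""
    return [f for f in files if in_scope(f)]


# Made-up sample entries, for illustration only.
sample = [
    ContractFile("SolidiFI", "a.sol", frozenset({"unhandled exceptions"})),
    ContractFile("SolidiFI", "b.sol", frozenset({"reentrancy"})),
    ContractFile("Smartbugs-curated", "c.sol",
                 frozenset({"short addresses", "reentrancy"})),
]
kept = filter_benchmark(sample)
print([f.path for f in kept])
```

In this sketch, a file that mixes in-scope and out-of-scope labels (such as `c.sol`) is kept, which matches the rule that only files harboring exclusively out-of-scope vulnerabilities are removed.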
To construct a ground truth for our 2,941 BNB projects, we collaborated with our industrial partner (an anonymous Web 3.0 security auditing company) and engaged 3 security audit experts. We identified 120 vulnerabilities affecting 113 contracts in these BNB projects.
Feel free to contact our co-author Yue Xue with any questions regarding the benchmark construction.