The source code of a prototype illustrating the workflow of SpecGen can be accessed here.
The dataset involved in the evaluation can be accessed here.
We evaluate the performance of SpecGen on two datasets. The first dataset is collected from the Java category of the SV-COMP benchmark. Due to the unexpected situations (mostly compilation failures) that occurred to some baselines, some class definitions within the benchmark are discarded. The rest of them are collected and processed with necessary modifications, as mentioned in the paper.
For a deeper insight into the relationship between the performance of SpecGen and different kinds of programs, we constructed a dataset containing 120 Java programs, with manually written verifiable JML specifications. Among the 120 programs, 20 are from the GitHub repository Java-JML, and the rest of the programs are collected from LeetCode.