RQ1: Comparison with Baselines

The detailed statistics of this RQ are listed in the table below. The table shows the performance of each evaluated method on all test cases. The metric for LLM-based methods is the success probability, whereas it is a single state of success/failure for Houdini and Daikon. We marked success probabilities in different intervals with different colors: green (80%~100%), yellow (50%~80%), orange (1%~50%), and red (failed).

SpecGen-RQ1

In summary, SpecGen significantly outperforms other baseline methods in terms of verifiable specification generation. This is due to the code comprehension ability of LLMs, and the error-correcting ability of mutation-based specification generation process.

Page updated

Google Sites

Report abuse