AutoSpec is a technique for automatically generating program specifications, enabling fully automated, cost-effective, and practical deductive verification.
Formal program verification provides a rigorous approach to ensuring software reliability. Yet constructing the formal specifications (e.g., ACSL annotations) needed for a complete proof relies on domain expertise and substantial manual effort. AutoSpec alleviates this burden by using Large Language Models (LLMs), guided by static analysis, to synthesize specifications in a hierarchical manner. However, achieving comprehensive verification requires addressing broader challenges in correctness scope, robustness, and scalability. We therefore present an extended version of AutoSpec that significantly expands its capability and usability, overcoming the limitations of the original prototype and offering a more robust pipeline for automated verification. The extended AutoSpec addresses these challenges in four ways: (1) automating total correctness proofs through loop-variant synthesis, (2) adopting a model-ensemble strategy that leverages multiple LLMs for more reliable and diverse specification synthesis, (3) supporting multi-file verification to enable the analysis of C codebases spanning multiple modules, and (4) providing a containerized deployment via Docker to ensure reproducibility and ease of use. Together, these extensions make AutoSpec a practical, fully automated solution for transforming unannotated C code into provably correct software.
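As a concrete illustration, the fragment below is a hand-written sketch of the kind of ACSL function contract that AutoSpec aims to synthesize automatically; the function `max_value` and its annotations are illustrative examples, not tool output.

/* Illustrative ACSL contract: the ensures and assigns clauses state the
   postcondition and frame condition that Frama-C/WP checks against the
   implementation. */
/*@ assigns \nothing;
    ensures \result >= a && \result >= b;
    ensures \result == a || \result == b;
*/
int max_value(int a, int b)
{
    if (a >= b)
        return a;
    return b;
}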
We envision a fully automatic static verification pipeline with LLM-powered specification generation, as depicted in the following figure. The pipeline takes as input source code annotated with the property assertions to be verified. It automatically generates specifications and leverages existing auto-active verification tools to complete the verification task.
We have released the implementation and all associated data to encourage comparable, evidence-based studies on specification generation for deductive verification.
Package download: https://doi.org/10.5281/zenodo.17826752. The package includes all raw experimental data.
Furthermore, all source code and deployment tutorials have been open-sourced on GitHub: https://github.com/Xidian-ICTT-GZ/AutoSpec
# 1. Directory Structure
# 2. Installation and Deployment
## docker image downloads
docker pull junjiehu1905/autospec:latest
docker pull junjiehu1905/termination_analysis:latest
## clone repo
git clone git@github.com:Xidian-ICTT-GZ/Autospec.git
## run docker image and mount local files
docker run -it --rm -v $(pwd):/app junjiehu1905/autospec:latest /bin/bash
## install veri-clang
cd LLM4Veri
bash ./script/intall/install_veri-clang.sh
## test clang and veri-clang
clang --version
veri-clang --version
# 3. Getting Started
## set your API_KEY (for example)
echo 'export OPENAI_API_KEY=""' >> ~/.bashrc
source ~/.bashrc
## Usage
The script `main.py` executes AutoSpec on a user-specified set of C source files.
The parameter `-f` accepts one or more `.c` files, separated by commas.
This is useful when verifying a small component of a project in which multiple C files depend on each other.
Header files (`.h`) do not need to be specified; they are automatically discovered and indexed through Frama-C's multi-file support.
Users select the LLM backend models with the `-m` parameter.
python3 main.py -f file1.c,file2.c,... -o output-dir -m model1,model2,...
For large-scale benchmark evaluation, `auto_run.py` supports batch verification by processing all C files within a directory specified by `-i`. It also enables the model-ensemble strategy via `-m`, leveraging parallel synthesis across multiple LLMs to improve verification success.
python3 auto_run.py -i input-dir -o output-dir -m model1,model2,...
After partial correctness has been established using either of the above modes, `generate_variant.py` performs termination analysis. It synthesizes ranking functions and invokes Frama-C's WP plugin to validate termination, thereby upgrading verification results from partial correctness to total correctness.
The script accepts either a list of C files (`-f`) or a directory (`-i`), matching the earlier modes.
python3 generate_variant.py -f file1.c,file2.c,... -o output-dir -m model1,model2,...
For example, to verify `dataset/case4test/1.c`, run:
`python3 main.py -f ./dataset/case4test/1.c -o ./out -m your_model`
The annotated file is then written to `out/1_0001/1_merged.c` and looks like:
void main()
{
    int x = 1;
    int y = 1;
    /*@
      loop invariant y == x;
      loop invariant 1 <= y;
      loop assigns y;
      loop assigns x;
    */
    while(unknown1()) {
        int t1 = x;
        int t2 = y;
        x = t1 + t2;
        y = t1 + t2;
    }
    //@ assert y>=1;
}
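Going a step further, `generate_variant.py` upgrades results like this from partial to total correctness by synthesizing ranking functions, expressed in ACSL as `loop variant` clauses. The fragment below is a hand-written sketch of such a termination annotation on a simple counting loop; `count_up` is an illustrative example, not tool output.

/* Illustrative only: the loop variant is a non-negative expression that
   strictly decreases on every iteration, which Frama-C/WP uses to prove
   that the loop terminates. */
/*@ requires n >= 0;
    assigns \nothing;
    ensures \result == n;
*/
int count_up(int n)
{
    int i = 0;
    /*@
      loop invariant 0 <= i <= n;
      loop assigns i;
      loop variant n - i;
    */
    while (i < n) {
        i = i + 1;
    }
    return i;
}

WP discharges the termination obligation by checking that the variant expression remains non-negative and strictly decreases on every loop iteration.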
If the verification process has been completed and you wish to regenerate the specification, simply delete the `dataset/case4test/1.c.pickle` file and rerun `main.py`.