PropertyGPT

PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

Abstract

Formal verification is a technique that can prove the correctness of a system with respect to a certain specification or property. It is es- pecially valuable for security-sensitive smart contracts that manage billions in cryptocurrency assets. Although existing research has developed various static provers for smart contracts, a key missing component is the automated generation of comprehensive proper- ties, including invariants, pre-/post-conditions, and rules. Hence, industry-leading players like Certora have to rely on their own or crowdsourced experts to manually write properties case by case.

With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new prop- erty for a given code. While this basic process is relatively straight- forward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) runtime-verifiable presents challenges. To address (i), we use the compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated prop- erties. We have implemented these strategies into a novel system called PropertyGPT, with 623 human-written properties collected from 23 Certora projects. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared to the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, resulting in $8,256 bug bounty rewards.

Research Questions & Experiment Results

RQ1:(PropertyGeneration)How accurately does PropertyGPT generate properties for smart contracts? RQ1-Experiment

RQ2: (Vulnerability Detection) How effectively does PropertyGPT discover smart contract vulnerabilities? Can PropertyGPT achieve state-of-the-art results? RQ2-CVE RQ2-SmartInv

RQ3: (Ablation Study) What factors influence the performance of PropertyGPT? RQ3-PropertyFix RQ3-Top-K

RQ4: (Impact) How well does PropertyGPT find zero-day vulnerabilities in real-world smart contract projects? RQ4

Page updated

Google Sites

Report abuse