3rd International Verification of Neural Networks Competition (VNN-COMP'22)
News / Updates
February 2023: the 4th VNN-COMP for 2023 website is here: https://sites.google.com/view/vnn2023
Congratulations to the following teams:
alpha-beta-crown: highest overall score
MN-BaB: second highest overall score
VeriNet: third highest overall score
Results are available here, please help audit in case of any issues: https://github.com/ChristopherBrix/vnncomp2022_results
Recording from presentation: https://drive.google.com/file/d/1JQ_pmTeJCuRhMT_ubgbQnwgruCJaohad/view?usp=sharing
Slides from presentation: https://drive.google.com/file/d/1nnRWSq3plsPvOT3V-drAF5D8zWGu02VF/view?usp=sharing
Update: Report is now online here.
Benchmark list: https://github.com/ChristopherBrix/vnncomp2022_benchmarks
Execution framework (automating underlying execution on AWS): https://vnncomp.christopher-brix.de/
Refer to the GitHub issues for up-to-date discussions: https://github.com/stanleybak/vnncomp2022/issues
Presentation of VNN-COMP'22 results at the FLoC'22 event with FoMLAS on July 31 at 14:00 local time: https://easychair.org/smart-program/FLoC2022/FoMLAS-index.html
February 23, 2022: website updated; please register tools for the competition in this Google form: https://forms.gle/RbGbKvfvn7bvTR1M9
VNN-COMP'22 benchmarks and rules discussion: https://github.com/stanleybak/vnncomp2022/issues
February 3, 2022: Much of the content here is still from last year's VNN-COMP'21 website; this site will be updated soon for the 2022 edition. VNN-COMP'22 is planned to be held with FoMLAS again as part of CAV'22/FLoC'22, with dates and other details to be posted soon.
VNN-COMP'21 report: https://arxiv.org/abs/2109.00498
VNN-COMP'21 benchmarks: https://github.com/stanleybak/vnncomp2021
The competition will be held jointly with the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS) affiliated with CAV 2022
The 2022 Verification of Neural Networks Competition (VNN-COMP'22), to be held with the 5th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS) at CAV 2022 on July 31 and August 1, 2022, in Haifa, Israel, aims to bring together researchers interested in methods and tools that provide guarantees about the behaviors of neural networks and systems built from them.
Introduction and Background
Methods based on machine learning are increasingly being deployed for a wide range of problems, including recommender systems, machine vision, autonomous driving, and beyond. While machine learning has made significant contributions to such applications, concerns remain about the lack of methods and tools to provide formal guarantees about the behaviours of the resulting systems.
In particular, for data-driven methods to be usable in safety-critical applications, including autonomous systems, robotics, cybersecurity, and cyber-physical systems, it is essential that the behaviours generated by neural networks are well understood and can be predicted at design time. For systems that learn at run-time, it is desirable that any change to the underlying system respects a given safety envelope for the system.
While the literature on verification of traditionally designed systems is wide and successful, comparatively few results and efforts addressed the verification of neural networks until recently. The competition intends to bring together researchers working on techniques for the verification of neural networks. We anticipate an organization and process similar to the International Competition on Verifying Continuous and Hybrid Systems (ARCH-COMP), where a categorization based on system expressiveness and problem formulation exists. In the context of VNN, this could, for instance, mean different problem categories depending on whether a network is feedforward or recurrent (RNNs). Within these broad categories, further categorization may exist based on whether verification approaches support only certain layers (e.g., many approaches allow ReLUs, but relatively fewer allow other nonlinear activations such as tanh). If you are interested, please contact the organizers. We anticipate analysis on existing benchmarks and challenge problems, such as ACAS-Xu, MNIST, and CIFAR-10, but are also open to new challenge problems and benchmarks. We will follow procedures similar to those of the 2nd VNN-COMP.
Intention to participate: March 18, 2022
Finalization of the rules: April 15, 2022
Submission of benchmarks: June 14, 2022 (updated; originally May 31, 2022)
Participants finalize tool scripts and organizers begin running tools: July 14, 2022 (updated; originally June 30, 2022)
Workshop, presentation of VNN-COMP results and report: July 31 / August 1, 2022
How To Participate
Given the infancy of the area, the competition will be friendly in the sense that there are no rankings in this iteration. It is primarily a mechanism to share and standardize relevant benchmarks to enable easier progress within the domain, and to better understand which methods are most effective for which problems, along with their current limitations. Subject to sponsorship, we may offer a "best competition results" award with a process that will involve community feedback for selection.
Register by March 18, 2022. You may continue to join after this date, but please register sooner rather than later. If you are interested in learning more, participating, and helping guide the direction of the competition, please use this Google form.
We welcome participation from all, and particularly have considered the following possible participants:
Verification / Robustness Tool Developers: you have a tool/method for proving properties of neural networks
Benchmark / Challenge Problem Proposers: you have neural networks and properties you would like to check for them, and can publicly share both
Sponsors and Others: you are interested in the area, but do not want to participate with a tool or benchmark/challenge problem, and/or would be interested in sponsoring a "best competition results" award
Participation will be done remotely in advance of the workshop, with summary results presented at the workshop. Attendance at the workshop by VNN-COMP participants will NOT be required, but of course you would be welcome to attend. Coordination mechanisms are open to discussion, but likely would be facilitated via, e.g., Git repositories, Slack, and/or forums such as Google Groups.
The mechanisms of the competition are community driven, along the lines of prior related competitions for hybrid systems verification (ARCH-COMP, https://cps-vo.org/group/ARCH/FriendlyCompetition ) and software verification (SV-COMP, https://sv-comp.sosy-lab.org/ ), and we welcome any suggestions for organization. Current plans anticipate some subdivisions of the competition into categories, such as by what layers/activations different tools and methods allow, whether they perform exact (sound and complete) or over-approximative (sound but incomplete) analysis, or training/synthesis vs. verification.
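To make the distinction between exact and over-approximative analysis concrete, the following is a minimal sketch of interval bound propagation, one common over-approximative technique. It is an illustration only and not the method of any participating tool. The function names, the toy network, and the thresholds are all hypothetical, and the sketch ignores floating-point rounding, which a genuinely sound implementation would have to account for.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weights, bias), hypothetical layout


def affine_bounds(weights: List[List[float]], bias: List[float],
                  box: List[Interval]) -> List[Interval]:
    """Propagate an input box through y = W x + b with interval arithmetic."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(row, box):
            # A non-negative weight maps lower to lower; a negative one swaps them.
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                lo += w * x_hi
                hi += w * x_lo
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]


def output_bounds(layers: List[Layer], box: List[Interval]) -> List[Interval]:
    """ReLU after every layer except the last one."""
    for i, (w, b) in enumerate(layers):
        box = affine_bounds(w, b, box)
        if i < len(layers) - 1:
            box = relu_bounds(box)
    return box


def check_upper(layers: List[Layer], box: List[Interval], threshold: float) -> str:
    """'holds' if output 0 is provably <= threshold, otherwise 'unknown'."""
    _, hi = output_bounds(layers, box)[0]
    return "holds" if hi <= threshold else "unknown"


# Toy network: y = relu(x) - relu(x), which is exactly 0 for every input x.
TOY_LAYERS: List[Layer] = [
    ([[1.0], [1.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
TOY_BOX: List[Interval] = [(-1.0, 1.0)]

if __name__ == "__main__":
    print(output_bounds(TOY_LAYERS, TOY_BOX))     # computed bounds on y
    print(check_upper(TOY_LAYERS, TOY_BOX, 1.0))  # loose property
    print(check_upper(TOY_LAYERS, TOY_BOX, 0.5))  # tighter property
```

On this toy network the true output is always 0, yet the interval analysis loses the correlation between the two hidden neurons and only bounds the output within [-1, 1]. The property y <= 0.5 is therefore true but reported as "unknown": the analysis never claims a false property holds (sound) but cannot prove every true one (incomplete). An exact method would instead reason about each ReLU's active and inactive cases, typically at higher computational cost.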
Anticipated benchmarks include ACAS-Xu, MNIST classifiers, CIFAR classifiers, etc., with various parameterizations (initial states, specifications, robustness bounds, etc.), and the mechanism for selection will be community driven, similar to the benchmark jury selection in related competitions. Participants are welcome to propose other benchmarks.
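As one illustration of how a robustness bound can parameterize a benchmark instance, the sketch below builds the set of inputs within a given L-infinity distance of a reference input and states what would count as a violation for a classifier. This is a hedged example, not the competition's specification format: the helper names, the assumed input range of [0, 1], and the convention that ties count as violations are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def linf_box(x: Sequence[float], eps: float,
             lo: float = 0.0, hi: float = 1.0) -> List[Tuple[float, float]]:
    """Per-feature bounds for all inputs within L-infinity distance eps of x.

    Bounds are clipped to [lo, hi], the assumed valid input range.
    """
    return [(max(lo, v - eps), min(hi, v + eps)) for v in x]


def is_counterexample(outputs: Sequence[float], label: int) -> bool:
    """True if some other class scores at least as high as the expected label."""
    return any(score >= outputs[label]
               for i, score in enumerate(outputs) if i != label)


if __name__ == "__main__":
    reference = [0.0, 0.5, 1.0]       # hypothetical three-pixel "image"
    print(linf_box(reference, 0.25))  # input region to be verified
    print(is_counterexample([0.1, 0.7, 0.2], label=1))
    print(is_counterexample([0.1, 0.7, 0.9], label=1))
```

Varying `eps` yields a family of instances of increasing difficulty from a single network and reference input, which is one way the "various parameterizations" mentioned above could arise.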
Some goals of this initiative are to encourage greater standardization of benchmarks and model formats (ONNX, etc.), which has helped lead to advances in other areas (e.g., SMT-LIB, http://smtlib.cs.uiowa.edu/ ), such as by making progress on the VNN-LIB initiative (http://www.vnnlib.org/ ), as well as to get a scientific sense of the current landscape of methods and their applicability to different problems. Eventually, we hope the initiative will lead to better comparisons among methods. Depending on levels of participation, categories, etc., the outcome of the competition may be a series of competition reports and a repeatability evaluation.
Thank you for your consideration, and we hope you will participate. Please let us know if you have any questions, would like to discuss before making a decision, or have any suggestions for the organization of this initiative, as we believe it will be most successful if driven actively by the community.