SpecTest: Specification-based Compiler Fuzzing

SpecTest (Specification-based Compiler Fuzzing) is a research project granted under the NSOE-TSS Grant Call 2019: Trustworthy Software Systems – Core Technologies issued by the National Satellite of Excellence in Trustworthy Software Systems (NSoE-TSS) and funded by the National Research Foundation (NRF).


Currently, we have multiple open postdoc positions for research fellows, who wish to join the project team. If you have a PhD in Computer Science or related areas and experience in program analysis/verification/testing, please contact Sun Jun for details.


Abstract:

Compilers are a key technology of software development. They are relevant for not only general purpose programming languages (like C/Java) but also many domain specific languages. Compilers are error-prone, especially concerning less-used language features. Existing compiler testing techniques often rely on weak test oracles which prevents them from finding deep semantic errors. In this project, we aim to develop a novel specification-based fuzzing method named SpecTest for compilers. SpecTest has three components: an executable specification of the language, a fuzzing engine which generates test cases for programs in the language, and a code mutator which generates new programs for testing the compiler. SpecTest identifies compiler bugs by comparing the abstract execution of the specification and concrete execution of compiled program. Furthermore, with the mutator, SpecTest can systematically test those less-used language features. SpecTest will be evaluated with the newly developed Solidity compiler and more mature compilers like GCC/JVM.


Project Scope and Goals:

Compilers are an essential tool for software development. They are relevant for not only general purpose programming languages (like C/Java) but also many domain specific languages. Their main function is to convert source code to executable machine code. In addition, they may also provide features, like code optimisation or debug utilities. The first compiler was implemented by Grace Hopper in 1952 and was called the A-0 system [1]. Since then, a variety of compilers has been written for countless languages. Modern compilers like GCC and LLVM are overwhelmingly complicated and thus are inevitably subject to bugs. Although some of them (e.g., GCC) have been used for decades, they may still be buggy (e.g., [2, 3]), especially concerning those less commonly used language features. For instance, the team led by Prof. Su from ETH managed to detect more than 1600 bugs in GCC and LLVM in a period of about 2 years. For another instance, the uninitialized storage pointers vulnerability in Solidity threatens many smart contracts deployed on Ethereum network. A honey pot named OpenAddressLottery was deployed that used this vulnerability to collect ether (i.e., digital money in Ethereum) from some would-be hackers. This compiler bug is later fixed in the latest version of Solidity compiler.

Overall, automatic compiler testing is still an open research field, especially so for newly developed programming languages (like Rust, Solidity, and numerous domain specific languages), since the semantics of a new language are often not completely defined (e.g., subject to modification, refinement and introduction of new features) and there are no pre-existing compilers that can serve as a reference. Automatic compiler testing must systematically address at least the following three problems.

1. The test automation problem (i.e., how to automatically run the test cases)

2. The test generation problem (i.e., what test cases do we generate, among an infinite number of possible ones).

3. The oracle problem (i.e., what are regarded as bugs).

While the first problem is relatively easy to solve (e.g. with the help of frameworks like jUnit), the remaining two are known to be challenging. In this work, we aim to develop a novel specification-based fuzzing method called SpecTest for compiler testing. In particular, we would like to achieve the following objectives.

First, SpecTest should be fully automatic. In other words, SpecTest should systematically and automatically answer the above three problems. Given the size of compilers (e.g. GCC has more than 15M lines of code) and the rapid development of new languages and compilers (e.g. the Solidity compiler for the Solidity language), testing compilers through manually crafted test cases has its limitations and thus it is important that SpecTest is fully automatic so that it can be potentially applied to a wide range of compilers.

Second, SpecTest must be able to detect semantic errors, i.e. bugs that are related to the language semantics (rather than the syntax). To achieve this objective, we need a test oracle that can decide if a test passes or fails. Note that a test case for a compiler is a program, together with the program input. There are various types of oracles, e.g. some weak oracles might only be able to check whether the program is compilable or not; a strong oracle can check whether the output of the program (with the program input) is correct or not. Existing approaches are rather limited. They focus on parsing errors in the compiler or simple semantic errors, using weak semantic oracles which only detects whether unexpected exceptions occur while compiling the program or whether simple algebraic properties are satisfied or not, but are not able to detect errors like “1+2” resulting in “4”. While those approaches are useful in their own right, SpecTest aims to thoroughly test that the language semantics is properly captured by the compiler.

Third, SpecTest must be applicable to a variety of compilers. Given that the rapid development of new compilers, SpecTest will be much more useful if it is not particular to one particular language or one particular compiler. We plan to apply SpecTest for testing at least two compilers. The first and most important target for SpecTest is the Solidity compiler, which is a relatively new compiler undergoing rapid development. Solidity is the most popular language for smart contracts, which are programs that specify rules and terms for an exchange of assets between some involved parties in the source code. The advantage of smart contracts is that they run on the block chain (on a network of distrusting nodes), which eliminates the need for a central trusted party. The correctness of smart contracts is essential, since they are concerned with transactions that can involve huge amounts of monetary value, and it has been shown that security vulnerabilities can cause losses of millions of dollars [4]. Testing the Solidity compiler for smart contracts is especially important, since a bug in the compiler might result in security vulnerabilities in all the smart contracts which were compiled with it. For instance, the uninitialized storage pointers bug in the Solidity compiler resulted in multiple honeypot contracts (e.g. OpenAddressLottery and CryptoRoulette) which aim to steal ether through exploiting the bug. Additionally, SpecTest will be implemented for a more mature compiler, such as GCC/LLVM or JVM, to show its generality.


Project Team:


Project Start: October 2019


Project Duration: 2.5 years


References:

  1. B.H. Bunch and A. Hellemans. The Timetables of Technology: A Chronology of the Most Important People and Events in the History of Technology. Hudson group book. Simon & Schuster, 1993. ISBN: 9780671769185.
  2. Chengnian Sun, Vu Le, and Zhendong Su. “Finding and analyzing compiler warning defects”. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016. Ed. by Laura K. Dillon, Willem Visser, and Laurie Williams. ACM, 2016, pp. 203–213.
  3. Chengnian Sun, Vu Le, and Zhendong Su. “Finding compiler bugs via live code mutation”. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016. Ed. by Eelco Visser and Yannis Smaragdakis. ACM, 2016, pp. 849–863.
  4. Nicola Atzei, Massimo Bartoletti, and Tiziana Cimoli. “A Survey of Attacks on Ethereum Smart Contracts (SoK)”. In: Principles of Security and Trust - 6th International Conference, POST 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings. Ed. by Matteo Maffei and Mark Ryan.