Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of the algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, and it is very difficult to find programs that can serve as benchmarks with highly reproducible results.
We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools. Our approach uses stochastic parse trees, where language grammar production rules are randomly instantiated to generate programs that meet overall program configuration goals. We implemented our tool for Java and applied it to generate benchmarks with which we evaluated different program analysis and testing tools.
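To illustrate the idea of stochastic parse trees, the sketch below expands a toy expression grammar by choosing production rules at random until a depth budget, a stand-in for an overall program configuration goal, is exhausted. All names in the sketch are hypothetical; it is a minimal illustration of grammar-based random generation, not code from the RUGRAT implementation.

import java.util.Random;

// Hypothetical sketch of stochastic grammar expansion (not RUGRAT code).
public class StochasticExpressionGenerator {

    private final Random random = new Random();

    // Expand the toy production  Expr ::= literal | "(" Expr op Expr ")"
    // by picking one of its alternatives at random. The depth budget plays
    // the role of a configuration goal that bounds the generated program.
    private String expr(int depthBudget) {
        if (depthBudget <= 0 || random.nextBoolean()) {
            return Integer.toString(random.nextInt(100)); // Expr -> literal
        }
        String[] ops = {"+", "-", "*"};
        String op = ops[random.nextInt(ops.length)];
        return "(" + expr(depthBudget - 1) + " " + op + " " + expr(depthBudget - 1) + ")";
    }

    public static void main(String[] args) {
        StochasticExpressionGenerator gen = new StochasticExpressionGenerator();
        // Each run prints a different randomly generated code fragment.
        System.out.println("int result = " + gen.expr(4) + ";");
    }
}

RUGRAT applies the same principle to the Java language grammar, steering the random rule choices so that the generated programs meet the configured size and structure goals.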
Technical paper: In the proceedings of the 10th International Workshop on Dynamic Analysis (WODA), 2012 [PDF]
Slides: From the talk given at the 10th International Workshop on Dynamic Analysis (WODA), 2012 [PPTX] [PDF]
Our experiments were done in two different setups:
We used a Sun HotSpot 1.6.0_24 JVM running on Windows XP with a 2.33 GHz 64-bit Intel Xeon processor and 32 GB RAM to run the experiments.
RQ1. How similar are RUGRAT-generated applications to third-party applications?
RQ2. How do program analysis tools behave while analyzing RUGRAT-generated applications?
RQ3. Can the RUGRAT-generated applications find defects in the program analysis tools?
Preliminary Experimental Results for RUGRAT4Load:
We used a Sun HotSpot 1.6.0_24 64-bit JVM running on a Windows 7 system with a 2.4 GHz 64-bit Intel i5 processor and 4 GB RAM to run the experiments.
RQ4. How do the dynamic characteristics (e.g., memory and CPU usage, disk I/O) of the generated programs compare with those of widely used benchmark applications?
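As a minimal sketch of the kind of measurement RQ4 refers to, the following program samples heap usage and system load using only the standard java.lang.management API. It is illustrative only and is not the instrumentation used in the RUGRAT4Load experiments.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sampler for dynamic characteristics (heap usage, system load).
public class RuntimeStatsSampler {

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // Take five samples, one second apart.
        for (int i = 0; i < 5; i++) {
            long usedHeapMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            double load = os.getSystemLoadAverage(); // returns -1.0 if unavailable (e.g., on Windows)
            System.out.printf("sample %d: heap = %d MB, system load = %.2f%n", i, usedHeapMb, load);
            Thread.sleep(1000);
        }
    }
}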
Here is a list of the programs we used in our experiments:
1. RUGRAT (Mandatory)
2. RUGRAT4Load (Optional)
3. Benchmark programs (Optional, can be generated using RUGRAT and configuration files; see the hypothetical configuration sketch after this list)
4. Tools and Libraries (Mandatory)
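For illustration only, a configuration file for the benchmark programs in item 3 could state program-shape goals along the following lines. The parameter names below are hypothetical and do not necessarily match the configuration format shipped with RUGRAT; consult the distributed configuration files for the actual format.

# Hypothetical configuration sketch (not the actual RUGRAT format)
numberOfClasses = 100
methodsPerClass = 10-30
linesOfCodePerMethod = 20-100
inheritanceDepth = 1-5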
This material is based upon work supported by the National Science Foundation under Grants No. 0916139, 1017633, 1017305, and 1117369, as well as Accenture. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
RUGRAT is a collaboration between the University of Texas at Arlington, Accenture Technology Labs, the University of Illinois at Chicago, Georgia Tech, and North Carolina State University.
E-mail: ishtiaque.hussain [at] mavs [dot] uta [dot] edu / ihussain [at] psu [dot] edu
Affiliation: University of Texas at Arlington / Pennsylvania State University - Abington
E-mail: csallner [at] uta [dot] edu
Affiliation: University of Texas at Arlington
E-mail: drmark [at] uic [dot] edu
Affiliation: Accenture Technology Labs and University of Illinois at Chicago
E-mail: chen.fu [at] accenture [dot] com / chenfucs [at] gmail [dot] com
Affiliation: Accenture Technology Labs / Alibaba Group
E-mail: qing.xie [at] accenture [dot] com / xieqing2000 [at] gmail [dot] com
Affiliation: Accenture Technology Labs / Tex-X
E-mail: sangminp [at] cc [dot] gatech [dot] edu
Affiliation: Georgia Tech / Two Sigma
E-mail: ktaneja [at] ncsu [dot] edu / taneja [dot] kunal [at] gmail [dot] com
Affiliation: North Carolina State University / Google
E-mail: bhossa2 [at] uic [dot] edu / mainul [at] iit [dot] du [dot] ac [dot] bd
Affiliation: University of Illinois at Chicago / University of Dhaka