eXpress: Guided Path Exploration for Regression Testing

PROJECT SUMMARY

Software programs evolve throughout their lifetime undergoing various changes. While making these changes, software developers may introduce regression faults. It is desirable to detect these faults as quickly as possible to reduce the cost involved in fixing them. One existing solution is continuous testing, which runs an existing test suite to quickly find regression faults as soon as code changes are saved. However, the effectiveness of continuous testing depends on the capability of the existing test suite for finding behavioral differences across versions.

To address the issue, we propose an approach, called eXpress, that conducts efficient regression test generation based on a path-exploration-based test generation (PBTG) technique, such as dynamic symbolic execution. eXpress prunes various irrelevant paths with respect to detecting behavioral differences to optimize the search strategy of a PBTG technique. As a result, the PBTG technique focuses its efforts on regression test generation. In addition, eXpress leverages the existing test suite (if available) for the original version to efficiently execute the changed regions of the program and infect program states. Experimental results on 67 versions (in total) of four programs (two from the subject infrastructure repository and two from real-world open source projects) show that, using eXpress, a state-of-the-art PBTG technique, called Pex,

requires about 36% less amount of time (on average) to detect behavioral differences than without using eXpress. In addition, Pex with eXpress detects nine behavioral differences that could not be detected without eXpress (within a time bound). Furthermore, our approach requires 67% less amount of time to find behavioral differences by exploiting an existing test suite than exploration without using the test suite.

This project is part of the collaborative project on improving automation in developer testing with Microsoft Research.

PEOPLE

Faculty

Tao Xie (Principal Investigator)

Students

Kunal Taneja (PhD Student)

Collaborators

Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte (Microsoft Research)

EVALUATIONS

We conducted experiments on four programs and their 67 versions (in total) collected from three different sources. In our experiments, we try to address the following research questions:

RQ1. How high percentage of paths explored by Pex belong to the 3 categories of irrelevant paths (P¬P , P¬E, and P¬I as described in Section 4 of attached pdf) being pruned by eXpress?

RQ2. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavioral differences with the assistance of eXpress?|

RQ3. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavior differences when the program exploration is seeded with the existing test suite?

SUBJECTS USED

1. Replace: 32 versions

2. Siena : 17 versions

3. Silverlight String-To-PathGeometry Converter (STPG): 2 versions

4. Structorian : 21 versions

EXPERIMENTAL SETUP

For replace and siena, we conduct regression test generation between the original version and each version v2 synthesized from the available faults in the SIR. We use eXpress and the default search strategy in Pex to conduct regression test generation. In addition to the versions synthesized by seeding faults, we also conduct regression test generation between each successive versions of siena (versions 1.8 through 1.15) available in SIR, using eXpress and the default search strategy in Pex. For STPG and structorian, we conduct regression test generation between two successive pairs of versions that we collected.

To address RQ1, we categorize all the irrelevant paths explored by Pex (without eXpress) as one of the 3 categories described in Section 4 and measure the percentage of paths in each category. The percentage reflects the benefits that eXpress achieves in pruning irrelevant paths. To address RQ2, we compare the number of runs and the amount of time required by Pex with the number of runs required by Pex incorporated with eXpress (referred to as Pex+eXpress in the rest of this paper) to find behavioral differences between two versions of a program under test. To address RQ3, we compare the number of DSE runs and the amount of time required by Pex (and Pex+eXpress) to find behavioral differences with and without seeding the program exploration (with the existing test suite).

Currently, we have not automated our code-instrumentation technique. We instrument each version manually for our experiments. In future work, we plan to automate the technique. The rest of the approach is fully automated and is implemented in a tool as an extension14 to Pex . We developed its components to statically find irrelevant branches as a .NET Reflector15 AddIn. To find behavioral differences between two versions, we execute the tests generated for a new version on the original version. Behavioral differences are detected by a test if an assertion in the test fails.

Experimental Results

RQ1:

How high percentage of paths explored by Pex belong to the 3 categories of irrelevant paths (P¬P , P¬E, and P¬I as described in Section 4 of attached pdf) being pruned by eXpress?

#BP : The average number of branches in the set BP (as described in Section 4).

#BE : The number of average branches in the set BE (as described in Section 4).

#T: The average number of branches in the CFG

P1: The percentage of irrelevant paths (on average) in Category P¬P (as described in Section 4) among all the paths explored by Pex.

P2: The percentage of irrelevant paths (on average) in Category P¬E (as described in Section 4) among all the paths explored by Pex.

P3: The percentage of irrelevant paths (on average) in Category P¬I (as described in Section 4) among all the paths explored by Pex.

T(Irr) : The percentage of irrelevant paths (on average) among all the paths explored by Pex.

Summary: Among the total paths explored by Pex, 55% (on average) are irrelevant. P¬P , P¬E, and P¬I respectively contain 14%, 26%, and 6% of the total paths explored by Pex.

RQ2. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavioral differences with the assistance of eXpress?|

E-Pex: The total number of DSE runs required Pex for satisfying E (i.e., execution of a changed region)

E-Red: Percentage reductions in the number of DSE runs using eXpress to satisfy E.

I-Pex: The total number of DSE runs required by Pex for satisfying I (i.e., Program State Infection)

I-Red: Percentage reductions in the number of DSE runs using eXpress for satisfying I.

P-Pex: The total number of DSE runs required by Pex for satisfying P (i.e., finding behavioral differences)

P-Red: Percentage reductions in the number of DSE runs using eXpress for satisfying P.

Summary: Pex+eXpress requires 36% less amount of time (on

average) to detect behavioral differences than without using eXpress

RQ3:

How many fewer DSE runs and how much fewer amount of time does Pex require to find behavior differences when the program exploration is seeded with the existing test suite?

Summary: Our incremental approach requires 67% less amount of time to find behavioral differences by exploiting an existing test suite than exploration without using the test suite.

PUBLICATIONS

  1. Kunal Taneja, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. eXpress: Guided Path Exploration for Regression Test Generation. In Proceedings of the 2011 International Symposium on Software Testing and Analysis
  2. (ISSTA 2011), Toronto, Canada, July 2011. [PDF] (A previous short version appeared as ICSE 2009 NIER track short paper below).
    1. Kunal Taneja, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Guided Path Exploration for Regression Test Generation. In Companion Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results, Vancouver, Canada, pp. 311-314, May 2009. [PDF]

SPONSORS

d
g
ss

This material is based upon work supported by the National Science Foundation and was originally developed under Grant No CCF-0725190 and CNS-0958235. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.