PROJECT SUMMARY
Software programs evolve throughout their lifetime undergoing various changes. While making these changes, software developers may introduce regression faults. It is desirable to detect these faults as quickly as possible to reduce the cost involved in fixing them. One existing solution is continuous testing, which runs an existing test suite to quickly find regression faults as soon as code changes are saved. However, the effectiveness of continuous testing depends on the capability of the existing test suite for finding behavioral differences across versions . To address the issue, we propose an approach, called eXpress, that conducts efficient regression test generation based on a pathexplorationbased test generation (PBTG) technique, such as dynamic symbolic execution. eXpress prunes various irrelevant paths with respect to detecting behavioral differences to optimize the search strategy of a PBTG technique. As a result, the PBTG technique focuses its efforts on regression test generation. In addition, eXpress leverages the existing test suite (if available) for the original version to efficiently execute the changed regions of the program and infect program states. Experimental results on 67 versions (in total) of four programs (two from the subject infrastructure repository and two from realworld open source projects) show that, using eXpress, a stateoftheart PBTG technique, called Pex, requires about 36% less amount of time (on average) to detect behavioral differences than without using eXpress. In addition, Pex with eXpress detects nine behavioral differences that could not be detected without eXpress (within a time bound). Furthermore, our approach requires 67% less amount of time to find behavioral differences by exploiting an existing test suite than exploration without using the test suite. PEOPLE
Faculty Tao Xie (Principal Investigator) Students Kunal Taneja (PhD Student)
Collaborators Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte (Microsoft Research)
EVALUATIONS We conducted experiments on four programs and their 67 versions (in total) collected from three different sources. In our experiments, we try to address the following research questions:
RQ1. How high percentage of paths explored by Pex belong to the 3 categories of irrelevant paths (P¬P , P¬E, and P¬I as described in Section 4 of attached pdf) being pruned by eXpress? RQ2. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavioral differences with the assistance of eXpress?
RQ3. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavior differences when the program exploration is seeded with the existing test suite? SUBJECTS USED
1. Replace: 32 versions 2. Siena : 17 versions 3. Silverlight StringToPathGeometry Converter (STPG): 2 versions 4. Structorian : 21 versions EXPERIMENTAL SETUP
For replace and siena, we conduct regression test generation between the original version and each version v2 synthesized from the available faults in the SIR. We use eXpress and the default search strategy in Pex to conduct regression test generation. In addition to the versions synthesized by seeding faults, we also conduct regression test generation between each successive versions of siena (versions 1.8 through 1.15) available in SIR, using eXpress and the default search strategy in Pex. For STPG and structorian, we conduct regression test generation between two successive pairs of versions that we collected. To address RQ1, we categorize all the irrelevant paths explored by Pex (without eXpress) as one of the 3 categories described in Section 4 and measure the percentage of paths in each category. The percentage reflects the benefits that eXpress achieves in pruning irrelevant paths. To address RQ2, we compare the number of runs and the amount of time required by Pex with the number of runs required by Pex incorporated with eXpress (referred to as Pex+eXpress in the rest of this paper) to find behavioral differences between two versions of a program under test. To address RQ3, we compare the number of DSE runs and the amount of time required by Pex (and Pex+eXpress) to find behavioral differences with and without seeding the program exploration (with the existing test suite). Currently, we have not automated our codeinstrumentation technique. We instrument each version manually for our experiments. In future work, we plan to automate the technique. The rest of the approach is fully automated and is implemented in a tool as an extension14 to Pex . We developed its components to statically find irrelevant branches as a .NET Reflector15 AddIn. To find behavioral differences between two versions, we execute the tests generated for a new version on the original version. Behavioral differences are detected by a test if an assertion in the test fails. Experimental Results
RQ1: How high percentage of paths explored by Pex belong to the 3 categories of irrelevant paths (P¬P , P¬E, and P¬I as described in Section 4 of attached pdf) being pruned by eXpress?
#BP : The average number of branches in the set BP (as described in Section 4). #BE : The number of average branches in the set BE (as described in Section 4). #T: The average number of branches in the CFG P1: The percentage of irrelevant paths (on average) in Category P¬P (as described in Section 4) among all the paths explored by Pex. P2: The percentage of irrelevant paths (on average) in Category P¬E (as described in Section 4) among all the paths explored by Pex. P3: The percentage of irrelevant paths (on average) in Category P¬I (as described in Section 4) among all the paths explored by Pex. T(Irr) : The percentage of irrelevant paths (on average) among all the paths explored by Pex. Subject  #Bp  #Be  #T  P1(%)  P2(%)  P3(%)  T(Irr) 
 replace  3.6  84.4  206  22.5  25  7.5  55  siena  9  29  157  6  18  5  29  STPG  2  13  145  0  12  1  13  structorian (SL)  1  13  69  0  11  13  24  structorian (SP)  21  49  447  13  28  0  41 








Summary: Among the total paths explored by Pex, 55% (on average) are irrelevant. P¬P , P¬E, and P¬I respectively contain 14%, 26%, and 6% of the total paths explored by Pex.
RQ2. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavioral differences with the assistance of eXpress?
EPex: The total number of DSE runs required Pex for satisfying E (i.e., execution of a changed region) ERed: Percentage reductions in the number of DSE runs using eXpress to satisfy E. IPex: The total number of DSE runs required by Pex for satisfying I (i.e., Program State Infection) IRed: Percentage reductions in the number of DSE runs using eXpress for satisfying I. PPex: The total number of DSE runs required by Pex for satisfying P (i.e., finding behavioral differences) PRed: Percentage reductions in the number of DSE runs using eXpress for satisfying P.

 Execution  Infection  Propagation  Subject  Version  E_Pex  E_Red  Median Reduction  I_Pex  I_Red  Median Reduction  P_Pex  P_Red  Median Reduction  Time (Pex)  Time (eXpress)  Reduction in Time  












 replace  32  1668  49.1  0  2740  42  34  9812  63  37  711  305  57
 siena  17  286  42  13  284  39.4  13  6914  33  11  1011  628  29  STPG  2  341  26  26  378  23  23  378  32  32  353  286  19 













 structorian 
 SL  29  102  26.5    102  26.5                SL  9139  102  26.5    102  26.5    2988  66    99  69.3  68  SL  139150  102  26.5 
 102  26.5    761  69    26  7.5  71  SL  150169  53  13.2    53  13.2    299  52    7.4  3.9  47  SL  169174  55  12.7    55  12.7    478  32.2    14.2  8.4  41  SL  174175  102  26.5    102  26.5                Total(SL) 
 516  24  26.5  516  26.5  24  4526  62  52  146.6  89.1  39 
 SP  25  10000*  81    10000*  74    10000*  74    1hr  35min  41.7  SP  56  10000*  74    10000*  74    10000*  74    1hr  32min  47  SP  913  10000*  81    10000*  81    10000*  81    1hr  27min  55  SP  3739  6  0    3699  77    3699  77    26min  22min  15  SP  3940  2  0                      SP  5062  6188  82.9 
 6188  82.9    6188  82.9    35 min  21  40  SP  4547                          SP  4750  2  0    2  0                SP  62124  6  0    10000*  28    10000*  28    1hr*  58min  2  SP  4045                          Total(SL) 
 36206  79  74  49889  68  77  49889  68  77  5hr  3.25hr  35 
Summary: Pex+eXpress requires 36% less amount of time (on average) to detect behavioral differences than without using eXpress RQ3: How many fewer DSE runs and how much fewer amount of time does Pex require to find behavior differences when the program exploration is seeded with the existing test suite? C  V  Pex  Pex+Seeding  eXpress  eXpress+seeding  SP  25  10000/1hr^{*}  10000/1hr^{*}  2381/35min  181/17min  SP  3739  3699/26m  60/1m  851/22m  47/11m  SP  3940  10000/1hr^{*}  304/2m  10000/1hr^{*}  251/12m  SP  4547  10000/1hr^{*}  10000/1hr^{*}  10000/1hr^{*}  10000/1hr^{*}  SP  4750  10000/1hr^{*}  81/1m  10000/1hr^{*}  64/10m  SP  62124  10000/1hr^{*}  59/1m  7228/58m  41/10m  SL  169174  478/1m  324/1m  34/1m  18/1m  SL  150169  299/1m  37/1m  52/1m  29/1m  SL  9139  2988/2m  69/1m  1002/1m  52/1m  Total 
 64476/6.5hr  20934/2hr8m  41568/5hr9m  10683/2hr3m

Summary: Our incremental approach requires 67% less amount of time to find behavioral differences by exploiting an existing test suite than exploration without using the test suite. PUBLICATIONS
 Kunal Taneja, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. eXpress: Guided Path Exploration for Regression Test Generation. In Proceedings of the 2011 International Symposium on Software Testing and Analysis
(ISSTA 2011), Toronto, Canada, July 2011. [PDF] (A previous short version appeared as ICSE 2009 NIER track short paper below).
 Kunal Taneja, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Guided Path Exploration for Regression Test Generation. In Companion Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results, Vancouver, Canada, pp. 311314, May 2009. [PDF]
SPONSORS 

Updating...
Ċ Kunal Taneja, Jun 15, 2011, 8:05 AM
