PROJECT SUMMARY
Regression test generation aims at generating a test suite that can detect behavioral differences between the original and the new versions of a program. Regression test generation can be automated by using Dynamic Symbolic Execution (DSE), a state-of-the-art test generation technique. DSE explores all feasible paths in the program but exploration of all these paths can often be expensive. However, if our aim is to detect behavioral differences between two versions of a program, we do not need to explore all these paths in the program, since not all these paths are relevant for detecting behavioral differences. In this paper, we propose an approach on guided path exploration that avoids exploring irrelevant paths in terms of detecting behavioral differences. Hence, behavioral differences are more likely to be detected earlier in path exploration. In addition, our approach leverages the existing test suite (if available) for the original version to efficiently execute the changed regions of the program and infect program states. Experimental results on 67 versions (in total) of four programs show that our approach requires about 36% fewer amount of time (on average) to detect behavioral differences than exploration without using our approach. In addition, our approach detects 6 behavioral difference that could not be detected by exploration without using our approach (within a time bound). Furthermore, our approach requires 67% fewer amount of time to find behavioral differences by exploiting an existing test suite than exploration without using the test suite.
PEOPLE
Faculty
Tao Xie (Principal Investigator)
Students
Kunal Taneja (PhD Student)
Collaborators
Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte (Microsoft Research)
EVALUATIONS
We conducted experiments on four programs and their 67 versions (in total) collected from three different sources. In our experiments, we try to address the following research questions:
RQ1. How many fewer DSE runs does Pex require to execute the changed regions between the two versions of a program with the assistance of eXpress?
RQ2. How many fewer DSE runs does Pex require to infect the program states with the assistance of eXpress?
RQ3. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavioral differences with the assistance of eXpress?|
RQ4. How many fewer DSE runs and how much fewer amount of time does Pex require to find behavior differences when the program exploration is seeded with the existing test suite?
SUBJECTS USED
1. Replace: 32 versions
2. Siena : 17 versions
3. Silverlight String-To-PathGeometry Converter (STPG): 2 versions
4. Structorian : 21 versions
EXPERIMENTAL SETUP
For replace and siena, we conduct regression test generation between the original version and each version v2 synthesized from the available faults in the SIR. We use eXpress and the default search strategy in Pex to conduct regression test generation. In addition to the versions synthesized by seeding faults, we also conduct regression test generation between each successive versions of siena (versions 1.8 through 1.15) available in SIR, using eXpress and the default search strategy in Pex. For STPG and structorian, we conduct regression test generation between two successive pairs of versions that we collected.
To address RQ1, we compare the number of runs of DSE required by the default search strategy in Pex (in short as Pex) with the number of runs required by Pex enhanced with eXpress (in short as Pex+eXpress) to execute a changed region. To address RQ2, we compare the number of runs required by Pex with the number of runs required by Pex+eXpress to infect the program states. To address RQ3, we compare the number of tests that cover a changed region generated by Pex with the number of such tests generated by Pex+eXpress. If more tests are generated that cover a changed region, it is easier for developers (or testers) to debug the program under test (if the changes are faulty) and gives more confidence to developers that the changes they made do not introduce any unintended side effects. To address RQ4, we compare the number of tests that infect the program state after the execution of changed region generated by Pex with the number of such tests generated by Pex+eXpress. To address RQ5, we compare the number of DSE runs required by Pex (and Pex+eXpress) to cover all the blocks in all the changed regions with and without seeding the program exploration (with the existing test suite). Our experimental results for the four subjects are shown below.
To find behavioral differences between two versions, we execute the tests generated for a new version on the original version. Behavioral differences are detected by a test if an assertion in the test fails.
Experimental Results for Replace and STPG
| Execution | Infection |
| Subject | | E-Pex | E-eXpress | Reduction (%) | Ne-Pex | Ne-eXpres | Increment (%) | I-Pex | I-eXpress | Reduction (%) | Ni-Pex | Ni-eXpress | Increment (%) |
| replace | 1 | 7 | 7 | 0 | 107 | 153 | 43 | 16 | 14 | 12.5 | 21 | 32 | 52.4 |
| replace | 2 | 7 | 7 | 0 | 107 | 153 | 43 | 79 | 65 | 17.7 | 19 | 27 | 42.1 |
| replace | 3 | 28 | 28 | 0 | 50 | 132 | 164 | - | - | - | - | - | |
| replace | 4 | 28 | 28 | 0 | 50 | 132 | 164 | 133 | 133 | 0 | 14 | 30 | 114.3 |
| replace | 5 | 240 | 150 | 37.5 | 12 | 25 | 108.3 | 240 | 158 | 34.2 | 5 | 12 | 140 |
| replace | 6 | 317 | 82 | 73.8 | 10 | 18 | 80 | 317 | 83 | 73.8 | 7 | 7 | 0 |
| replace | 7 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 128 | 58 | 54.7 | 4 | 7 | 75 |
| replace | 8 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 132 | 132 | 0 | 4 | 6 | 50 |
| replace | 9 | 13 | 13 | 0 | 41 | 213 | 419.5 | 243 | 102 | 58 | 11 | 31 | 181.8 |
| replace | 10 | 13 | 13 | 0 | 41 | 213 | 419.5 | 152 | 111 | 27 | 13 | 32 | 146.1 |
| replace | 11 | 13 | 13 | 0 | 41 | 213 | 419.5 | 224 | 138 | 38.4 | 15 | 24 | 60 |
| replace | 14 | - | - | - | | | | - | - | - | - | | - |
| replace | 15 | 4 | 4 | 0 | 91 | 211 | 131.9 | 4 | 4 | 0 | 91 | 105 | 15.4 |
| replace | 16 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 126 | 123 | 2.4 | 4 | 8 | 100 |
| replace | 17 | 20 | 20 | 0 | 45 | 45 | 0 | 15 | 15 | 0 | 4 | 23 | 475 |
| replace | 18 | - | - | - | - | - | - | - | - | - | - | - | - |
| replace | 20 | 15 | 15 | 0 | 3 | 71 | 2266.7 | 15 | 15 | 0 | 3 | 71 | 2266.7 |
| replace | 22 | 6 | 6 | 0 | 128 | 151 | 18 | - | - | - | - | - | - |
| replace | 23 | 6 | 6 | 0 | 128 | 151 | 18 | 6 | 6 | 0 | 101 | 103 | 2 |
| replace | 24 | 6 | 6 | 0 | 128 | 151 | 18 | 15 | 15 | 0 | 27 | 39 | 44.4 |
| replace | 25 | 425 | 95 | 77.6 | 2 | 7 | 250 | 465 | 132 | 71.6 | 1 | 3 | 200 |
| replace | 26 | 425 | 95 | 77.6 | 2 | 7 | 250 | 469 | 133 | 71.6 | 1 | 3 | 200 |
| replace | 27 | - | - | - | - | - | - | - | - | - | - | - | |
| replace | 28 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 182 | 123 | 32.42 | 3 | 3 | 0 |
| replace | 29 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 182 | 123 | 32.42 | 6 | 8 | 33.3 |
| replace | 30 | 60 | 32 | 46.7 | 11 | 18 | 63.6 | 60 | 32 | 46.7 | 4 | 5 | 25 |
| replace | 31 | - | - | - | - | - | - | - | - | - | - | - | - |
| replace | 32 | 13 | 13 | 0 | 51 | 95 | - | - | - | - | - | - | - |
| Total | | 1946 | 789 | 59.4 | 1183 | 2249 | 90 | 3203 | 1716 | 46.4 | 358 | 579 | 62 |
| STPG | 1 | 141 | 125 | 11.3 | 21 | 25 | 19 | 178 | 127 | 28.7 | 7 | 9 | 28.6 |
| STPG | 2 | 200 | 125 | 37.5 | 17 | 23 | 35.3 | 200 | 128 | 36 | 3 | 4 | 33.3 |
| Total | | 341 | 250 | 26.1 | 38 | 48 | 26.3 | 378 | 255 | 32.4 | 10 | 13 | 30 |
Experimental Results for Siena
| Execution | Infection |
Versions (Revions) | E-Pex | E-eXpress | Reduction (%) | Ne-Pex | Ne-eXpres | Increment (%) | I-Pex | I-eXpress | Reduction (%) | Ni-Pex | Ni-eXpress | Increment (%) |
| v1-v2 | 5 | 5 | 0 | 19 | 43 | 111.1 | 30 | 27 | 10 | 12 | 18 | 50 |
| v2-v3 | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| v3-v4(d) | 50 | 12 | 76 | 121 | 316 | 161.2 | 50 | 12 | 76 | 115 | 292 | 153.9 |
| v3-v4(e) | 3 | 3 | 0 | 36 | 98 | 172.2 | 3 | 3 | 0 | 28 | 88 | 214.3 |
| v4-v5 | 8 | 8 | 0 | 113 | 116 | 2.7 | NR | NR | - | - | - | - |
| v5-v6 | 23 | 17 | 26.1 | 25 | 81 | 224 | 33 | 24 | 27.3 | 12 | 23 | 91.67 |
| v6-v7 | - | - | - | - | - | - | - | - | - | - | - | - |
| v7-v8 | 21 | 3 | 85.7 | 13 | 45 | 246.2 | 21 | 3 | 85.7 | 13 | 45 | 246.2 |
|
| v0-v9 | 60 | 35 | 41.7 | 3 | 10 | 233 | 60 | 35 | 41.7 | 3 | 10 | 233 |
| v0-v10 | 33 | 27 | 18.2 | 11 | 18 | 63.6 | NA | NA | - | - | - | - |
| v0-v11 | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| v0-v12 | 20 | 7 | 65 | 5 | 8 | 60 | 20 | 7 | 65 | 5 | 5 | 60 |
| v0-v13 | 8 | 8 | 0 | 113 | 116 | 2.65 | 8 | 8 | 0 | 113 | 116 | 2.65 |
| v0-v14 | 17 | 17 | 0 | 56 | 60 | 5.3 | 21 | 19 | 9.5 | 20 | 25 | 25 |
| v0-v15 | 8 | 8 | 0 | 30 | 30 | 0 | 8 | 8 | 0 | 3 | 3 | 0 |
| v0-v16 | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR | NR |
| v0-v17 | 30 | 26 | 13.3 | 10 | 283 | 2730 | 30 | 26 | 13.3 | 10 | 283 | 2730 |
| Total | 286 | 166 | 42 | 549 | 1214 | 121.1 | 284 | 172 | 39.4 | 336 | 908 | 170.2 |
Experimental Results for Structorian
| Execution | Infection |
| Class | Versions (Revions) | E-Pex | E-eXpress | Reduction (%) | Ne-Pex | Ne-eXpres | Increment (%) | I-Pex | I-eXpress | Reduction (%) | Ni-Pex | Ni-eXpress | Increment (%) |
| StructLexer | 2-9 | 102 | 75 | 26.5 | 24 | 38 | 58.3 | 102 | 75 | 26.5 | 24 | 38 | 58.3 |
| StructLexer | 9-139 | 102 | 75 | 26.5 | 24 | 38 | 58.3 | 152 | 107 | 29.6 | 8 | 11 | 37.5 |
| StructLexer | 139-150 | 102 | 75 | 26.5 | 24 | 38 | 58.3 | 102 | 75 | 26.5 | 13 | 18 | 38.5 |
| StructLexer | 150-169 | 53 | 46 | 13.2 | 20 | 25 | 25 | 53 | 46 | 13.2 | 20 | 25 | 25 |
| StructLexer | 169-174 | 55 | 48 | 12.7 | 323 | 411 | 21.4 | 55 | 48 | 12.7 | 230 | 281 | 22.2 |
| StructLexer | 174-175 | 102 | 75 | 26.5 | 24 | 38 | 58.3 | - | - | - | - | - | - |
| StructLexer | 175-184 | 19 | 15 | 21.1 | 41 | 48 | 17.1 | 21 | 21 | 0 | 13 | 17 | 30.8 |
| Total | | 535 | 396 | 26 | 480 | 636 | 24.5 | 485 | 372 | 23.3 | 308 | 390 | 26.6 |
| BaseLexer | 45-174 | 2 | 2 | 0 | 999 | 999 | 0 | 3 | 3 | 0 | 243 | 265 | 9.1 |
| BaseLexer | 174-175 | 2 | 2 | 0 | 999 | 999 | 0 | 3 | 3 | 0 | 243 | 265 | 9.1 |
|
| StructParser | 2-5 | - | 1866 | - | - | - | - | - | 2587 | - | - | - | - |
| StructParser | 5-6 | - | 2614 | - | - | - | - | - | 2614 | - | - | - | - |
| StructParser | 9-13 | - | 1866 | - | - | - | - | - | 1866 | - | - | - | - |
| StructParser | 37-39 | 6 | 6 | 0 | 128 | 165 | 28.9 | - | 851 | - | - | 5 | - |
| StructParser | 39-40 | 2 | 2 | 0 | 150 | 167 | 11.3 | - | - | - | - | - | - |
| StructParser | 50-62 | 6188 | 1053 | 82.9 | - | - | - | 6188 | 1053 | 82.9 | - | - | |
| StructParser | 45-47 | 2 | 2 | 0 | 43 | 53 | 23.3 | 2 | 2 | 0 | 43 | 53 | 23.3 |
| StructParser | 47-50 | 2 | 2 | 0 | 43 | 53 | 23.3 | 2 | 2 | 0 | 43 | 53 | 23.3 |
| StructParser | 62-124 | 2 | 2 | 0 | 43 | 53 | 23.3 | 2 | 2 | 0 | 43 | 53 | 23.3 |
| StructParser | 124-125 | 2 | 2 | 0 | 43 | 53 | 23.3 | 2 | 2 | 0 | 43 | 53 | 23.3 |
| StructParser | 125-166 | - | 7452 | - | - | - | - | - | 7452 | - | - | | - |
| StructParser | 40-45 | - | 8214 | - | - | - | - | - | 8276 | - | - | - | - |
E-Pex: The total number of DSE runs required Pex for satisfying E (i.e., execution of a changed region)
E-eXpress: The total number of DSE runs required by Pex+eXpress for satisfying E.
Ne-Pex: The total number of tests that execute a changed region, generated by Pex
Ne-eXpres: The total number of tests that execute a changed region, generated by eXpress
I-Pex: The total number of DSE runs required by Pex for satisfying I (i.e., Program State Infection)
I-eXpress: The total number of DSE runs required by Pex+eXpress for satisfying I.
N-Pex: The total number of tests generated by Pex that infect the program state.
Ne-eXpress: The total number of tests generated by Pex+eXpress, that infect the program state.
The results of our seeding technique are as follows:
| C | V | Pex | Pex+Seeding | eXpress | eXpress+seeding |
| SP | 2-5 | 10000/1hr* | 10000/1hr* | 2381/35min | 181/17min |
| SP | 37-39 | 3699/26m | 60/1m | 851/22m | 47/11m |
| SP | 39-40 | 10000/1hr* | 304/2m | 10000/1hr* | 251/12m |
| SP | 45-47 | 10000/1hr* | 10000/1hr* | 10000/1hr* | 10000/1hr* |
| SP | 47-50 | 10000/1hr* | 81/1m | 10000/1hr* | 64/10m |
| SP | 62-124 | 10000/1hr* | 59/1m | 7228/58m | 41/10m |
| SL | 169-174 | 478/1m | 324/1m | 34/1m | 18/1m |
| SL | 150-169 | 299/1m | 37/1m | 52/1m | 29/1m |
| SL | 9-139 | 2988/2m | 69/1m | 1002/1m | 52/1m |
| Total | | 64476/6.5hr | 20934/2hr8m | 41568/5hr9m | 10683/2hr3m
|
PUBLICATIONS
- Kunal Taneja, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Guided Path Exploration for Regression Test Generation. In Companion Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results, Vancouver, Canada, pp. 311-314, May 2009.
Download: [PDF] - Kunal Taneja, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. eXpress: Guided Path Exploration for Regression Test Generation. Submitted for publication.
SPONSORS