TestSelect

Test Selection via Mining Common Operational Models

Project Summary

In automated testing, especially test generation in the absence of specifications, a large amount of manual effort is spent on test-result inspection. Test selection helps to reduce this effort by selecting a small subset of tests that are likely to reveal faults. A promising test-selection approach is to dynamically mine operational models as potential test oracles and then select tests that violate them. Existing work adopting this approach mines operational models based on dynamic invariant detection. In this paper, we propose to mine common operational models, which hold in most, but not necessarily all, of the observed traces, from a (potentially large) set of unverified tests. Specifically, our approach collects branch coverage and data value bounds at runtime and then mines implication relationships between branches and constraints on data values as potential operational models after running all the tests. Our approach then selects tests that violate the mined common operational models for result inspection. We have evaluated our approach on a set of programs, comparing it with previous code-coverage-based, clustering-based, dynamic-invariant-based, and random selection approaches. The experimental results show that our approach can more effectively reduce the number of tests for result inspection while still revealing most of the faults.
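
To make the control-rule part concrete, the following MATLAB fragment is a minimal sketch of mining branch-implication rules from a test-by-branch coverage matrix and selecting the tests that violate them; the matrix layout, the 95% support threshold, and the function name are illustrative assumptions rather than the released implementation.

    % Minimal sketch of mining common branch-implication rules and selecting
    % the tests that violate them. Assumes cov is a (numTests x numBranches)
    % logical matrix with cov(t, b) == true iff test t covers branch b; the
    % layout, the 95% support threshold, and the function name are
    % illustrative assumptions, not the released implementation.
    function selected = select_by_control_rules(cov, support)
        if nargin < 2
            support = 0.95;              % a rule must hold on >= 95% of tests
        end
        [numTests, numBranches] = size(cov);
        selected = false(numTests, 1);
        for i = 1:numBranches
            pre = cov(:, i);             % tests that take branch i
            if ~any(pre)
                continue;                % branch never taken: no rule
            end
            for j = 1:numBranches
                if i == j
                    continue;
                end
                % Candidate rule: covering branch i implies covering branch j.
                holds = ~pre | cov(:, j);
                if mean(holds) >= support && any(~holds)
                    % Common but not universal: the few violating tests
                    % behave unusually and are selected for inspection.
                    selected = selected | ~holds;
                end
            end
        end
    end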

Subject Programs

The subjects used in our evaluation are listed below:

    • Siemens suite: 7 programs (print_tokens, print_tokens2, replace, schedule, schedule2, tcas, tot_info), 170-540 LOC, 1052-5542 tests
    • Space: 9564 LOC, 13585 tests
    • grep: 13358 LOC, 809 tests

The characteristics of the subject programs are summarized above. Many of the faults in these subjects are trivial, in that they cause more than 5% of the tests to fail; such faults are less likely to be seen in practical settings. To better evaluate the potential effectiveness of test selection approaches in practice, we conduct additional experiments on only the nontrivial faults.
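
As a concrete illustration of this filtering step, the MATLAB fragment below is a minimal sketch; the failure-matrix layout and the variable names are assumptions for the example, while the 5% threshold is the one stated above.

    % Minimal sketch of separating trivial from nontrivial faults. Assumes
    % fails is a (numTests x numFaults) logical matrix with fails(t, f) == true
    % iff test t fails on faulty version f; the layout and names are
    % assumptions for the example, the 5% threshold is the one used above.
    failRate   = mean(fails, 1);          % failure rate of each faulty version
    nontrivial = find(failRate <= 0.05);  % faults failing on at most 5% of tests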

All of these subjects and their test suites were downloaded from the Subject Infrastructure Repository.

Implementation

We collect the execution traces using the Cooperative Bug Isolation (CBI) tools. Specifically, we use two instrumentation schemes, "branches" and "bounds", to collect branch coverage and data value bounds, respectively.

We compare our approach with the following existing test selection approaches:

    • Random Selection
    • In random selection, programmers are assumed to select tests for result inspection randomly without replacement.
    • Code-coverage-based Approach
    • The code-coverage-based approach attempts to cover as many program elements of a given type as the original test suite does, using as few tests as possible. It can be implemented by greedily selecting the test that covers the largest number of elements not yet covered by the previously selected tests (see the sketch after this list).
    • Clustering-based Approach
    • The clustering-based approach uses agglomerative hierarchical clustering to cluster the tests and then selects one test from each cluster (see the sketch after this list).
    • Dynamic-invariant-based Approach
    • Harder et al. proposed the operational difference approach, which selects tests based on Daikon. It starts with an empty test suite and repeatedly adds a new test if it violates the invariants inferred from the previously selected tests. To control the number of tests, the algorithm terminates when n consecutive tests (n=50 in their experiments) have been considered and rejected.
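
The following MATLAB fragment is a minimal sketch of the greedy code-coverage-based baseline described above; the coverage-matrix layout and the function name are illustrative assumptions, not the released implementation.

    % Minimal sketch of greedy code-coverage-based selection. Assumes cov is
    % a (numTests x numElements) logical matrix with cov(t, e) == true iff
    % test t covers element e; layout and names are illustrative assumptions.
    function order = greedy_coverage_selection(cov)
        [numTests, numElements] = size(cov);
        covered   = false(1, numElements);  % elements covered so far
        remaining = true(numTests, 1);      % tests not yet selected
        order     = zeros(numTests, 1);     % selected tests, in selection order
        k         = 0;
        while true
            % How many still-uncovered elements each remaining test would add.
            gain = sum(cov(:, ~covered), 2);
            gain(~remaining) = -1;
            [bestGain, best] = max(gain);
            if bestGain <= 0
                break;                      % no remaining test adds coverage
            end
            k               = k + 1;
            order(k)        = best;
            covered         = covered | cov(best, :);
            remaining(best) = false;
        end
        order = order(1:k);
    end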
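
Similarly, the fragment below is a minimal sketch of the clustering-based baseline, assuming a test-by-feature matrix of execution profiles and using the agglomerative hierarchical clustering routines from MATLAB's Statistics and Machine Learning Toolbox; the profile layout, the linkage and distance choices, and the function name are illustrative assumptions.

    % Minimal sketch of clustering-based selection. Assumes profiles is a
    % (numTests x numFeatures) matrix of execution profiles and k (at most
    % the number of tests) is the number of tests to select; the linkage
    % and distance choices and the names are illustrative assumptions.
    function selected = clustering_selection(profiles, k)
        Z      = linkage(pdist(profiles), 'average');  % agglomerative clustering
        labels = cluster(Z, 'maxclust', k);            % cut the tree into k clusters
        selected = zeros(k, 1);
        for c = 1:k
            members     = find(labels == c);
            selected(c) = members(1);                  % one test per cluster
        end
    end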

We implement our approach, random selection, the code-coverage-based approach, and the clustering-based approach in MATLAB. For the (dynamic-invariant-based) operational difference approach, we adopt the results presented by Harder et al., who reported results on the Siemens suite and the Space program. We do not re-implement the approach or run it on the grep program, because in our preliminary experiments, repeatedly checking and updating dynamic invariants took a prohibitive amount of time.

Source Code: Our Approach, Random Selection, Code-coverage-based Approach, Clustering-based Approach.

Example Inputs (execution traces of the 1st version of print_tokens): trace_branches.mat, trace_bounds.mat, run_result.

Evaluation Results

Overall Results

Figure 1: Results of all the faults. (a) Results for the Siemens suite; (b) results for the Space program; (c) results for the grep program.

Figure 2: Results of the nontrivial faults. (a) Results for the Siemens suite; (b) results for the Space program; (c) results for the grep program.

Our approach mines two kinds of rules: control rules (implication relationships between branches) and data rules (constraints on data values). In the figures, "Control Rules" indicates the effectiveness of our approach when only control rules are considered, and "Data Rules" indicates its effectiveness when only data rules are considered.
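
To illustrate the data-rule side, the MATLAB fragment below is a minimal sketch that mines, for each monitored variable, value bounds holding on a large majority of the tests and selects the tests falling outside them; the matrix layout, the support threshold, and the names are illustrative assumptions rather than the released code.

    % Minimal sketch of mining common data-value bounds and selecting the
    % tests that violate them. Assumes vals is a (numTests x numVars) matrix
    % with one row per test and one column per monitored variable; the
    % layout, the 95% support threshold, and the function name are
    % illustrative assumptions, not the released code.
    function selected = select_by_data_rules(vals, support)
        if nargin < 2
            support = 0.95;           % a bound must hold on >= 95% of tests
        end
        [numTests, numVars] = size(vals);
        selected = false(numTests, 1);
        for v = 1:numVars
            col = sort(vals(:, v));
            % Common bounds: the range spanned by the central 'support'
            % fraction of the observed values of this variable.
            loIdx = max(1, floor((1 - support) / 2 * numTests));
            hiIdx = min(numTests, ceil((1 + support) / 2 * numTests));
            lo = col(loIdx);
            hi = col(hiIdx);
            % Tests outside the common bounds violate the mined model
            % and are selected for result inspection.
            selected = selected | vals(:, v) < lo | vals(:, v) > hi;
        end
    end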

Detailed Results

Efficiency (seconds)

All of the experiments were carried out on a 1.7GHz Intel Pentium Dual-Core PC with 1.5GB physical memory, running Fedora Core 6.

Future Work

    • Integration with automatic test generation tools such as Pex
    • Experiments on large scale programs
    • Domain-specific common operational models

Thank you for your interest in our work. Please contact Wujie Zheng for any additional information.