Evoif: Recovering Fitness Gradients for Interprocedural Flags

in Search Based Testing

1. Overall Description

In the field of Search-Based Software Testing (SBST), many metaheuristic searching algorithms are applied such as genetic algorithms, simulated annealing and hill climbing search algorithm, etc to guide the searching process. Therefore, most problems on test generation can be transformed to optimisation problems where tests are iteratively evolved by these effective metaheuristic searching algorithms. In such a scenario, test generation is guided by a problem-specific fitness function whose role is to guide the search towards better solutions from potentially infinite search space, within a practical time limit. Given a Boolean conditional statement, a popular fitness function will estimate how close a test case is to reach this uncovered condition (i.e., true or false), which is defined as branch distance. However, the branch distance does not provide gradient for the search since a Boolean value can only be evaluated to true or false (e.g., if (x && y)). This dilemma constitutes a well-known domain in SBST called flag problem. Flag problems can be addressed by replacing the Boolean flags with numeric comparisons which provide clearer guidance for the search (e.g., if (5 * 2 > 9 && 2 + 6 < 3)).

Unfortunately, in an interprocedural flag problem (ipf) case where Boolean flags are passed around as parameters and return values, it is not straightforward to define an semantics-preserving transformation technique which can be universally applied to all ipf cases. Thus, it is not yet supported by state-of-the-art test generators. This project is developed based on the insight that the fitness gradients of ipf can be recovered by making use of runtime information. Given an uncovered interprocedural flag branch, this approach calculates context-sensitive branch distance for all control flows potentially returning the required flag in the called method, and recursively aggregates these distances into a continuous value to guide the search. This approach is implemented on top of the state-of- the-art automatic test suite generation tool for Java EvoSuite framework, and we empirically compare it with its testability transformations on 807 non-trivial methods suffering from ipf problems, sampled from 150 open source Java projects. Our experiment demonstrates that this approach achieves higher overall branch coverage on the investigated methods with statistical significance and acceptable runtime overheads.

2. Experiment Details:

Evoif is a tool built on top of Evosuite (http://www.evosuite.org/) , with the support of solving interprocedural flag problem. Apart from supporting interprocedural flag fitness, Evoif also provides facility to run customized scripts to (1) run data set in a batch mode and (2) support remote debugging during the batch experiment.

This website is used for anonymous review for ISSTA 2020. We present runtime parameter for evosuite framework, google drive links for source, binary, and scripts, all subject methods, as well as a brief tutorial of replicating the experiment.

  • Runtime Parameter for Evosuite Framework:
    • For Evoif (i.e., fbranch criterion):

-Dsearch_budget 100 -Dinstrument_context true -Dp_test_delete 0 -Dp_test_change 0.9 -Dp_test_insert 0.3 -Dp_change_parameter 0.6 -Dp_functional_mocking 0 -Dmock_if_no_generator false -Dfunctional_mocking_percent 0 -Dprimitive_reuse_probability 0 -Dmin_initial_tests 10 -Dmax_initial_tests 20 -Ddse_probability 0 -Dinstrument_libraries true -Dinstrument_parent true -Dmax_attempts 100 -Dassertions false -Delite 10 -Ddynamic_pool 0.0 -Dprimitive_pool 0.0 -Djunit_check false

    • For Evosuite (i.e., branch critierion):

-Dsearch_budget 100 -Dinstrument_context true -Dp_test_delete 0 -Dp_test_change 0.9 -Dp_test_insert 0.3 -Dp_change_parameter 0.6 -Dp_functional_mocking 0 -Dmock_if_no_generator false -Dfunctional_mocking_percent 0 -Dprimitive_reuse_probability 0 -Dmin_initial_tests 10 -Dmax_initial_tests 20 -Ddse_probability 0 -Dinstrument_libraries true -Dinstrument_parent true -Dmax_attempts 100 -Dassertions false -Delite 10 -Ddynamic_pool 0.0 -Dprimitive_pool 0.0 -Djunit_check false

  • Tool Download:
  • Data Download:
    • The subject projects: link
    • The overall comparison on methods suffering interprocedural flag problem: link
    • The overall statistics on Interprocedural methods and filtering criteria: link
    • Script examples for running our tool: link

2. Replicating the Experiment:

  • Tutorial of using Evoif and replicating the experiment:
    • Prerequisite: install java 8 (not jre, as Evosuite requires tools.jar library on runtime)
    • Step1: Extract the subject projects( which can download from here: link) into a folder, and put the EvosuiteTest-1.0.7-SNAPSHOT.jar (which can downloaded from jar file here: link ) under the same root directory. Attached screenshot is the hierarchy of the experiment setup.

Step1

    • Step2: The subject projects consist of 150 projects, following the format in SF100 benchmark (i.e. index_projectName). The first 100 projects come from SF100. The rest are adopted from popular open source Java projects from maven central library and sourceforge.

Step2

    • Step3: Set up a target method list for running the experiment, an example is here: link. Therefore Evoif can automatically pick up the target methods from the projects provided and execute the experiment for them correspondingly.

Step 3

    • Step4: Open up the script file and specify the report folder path by setting FOLDER_NAME option inside the script provided for executing, (e.g., FOLDER_NAME=report). Evoif will automatically generate a folder called report to store the test result.

Step4

    • Step5: Choose a mode to run. The scripts support running the batch experiment in two modes: running mode and debug mode. In most cases, you only need to use running mode by default. The tool will read from the target method list and run all the methods in a batch way. Nevertheless, you can link the binary with the source code of Evoif imported into IDE like eclipse and debug the program with source code. If you want to do this, you can enable EVOTEST_DEBUG option.

Step5

    • Step6: Start running the script. The UI is essential command line provided by Evosuite. The results in form of excel report will be generated in the folder you specify. The format is (FOLDER_NAME_strategy). Strategy branch is the report generated by Evosuite while fbranch is the report by Evoif. The estimated log on successful run with look like the screenshot.

Step6

Step6

3. Filter rules for achieving 807 target methods:

    • 1. The target method must have at least one parameter of primitive type (java.lang.String is also considered as primitive type).
    • 2. The parameters of target method cannot belong to interface or abstract class.
    • 3. The branch must contain a branch with flag problem (i.e., the branch is TRUE or FALSE, the bytecode of the branch instruction is either IFEQ or IFNE)
    • 4. Following 3, next instruction followed by IFEQ or IFNE needs either be
      • 1) invoke method instruction or
      • 2) access field instruction (e.g., field use, local variable use, load array instruction) to become potential Inter-procedural Flag Problem (IPF).
    • 5. The target flag method should:
      • 1) have at least one parameter of primitive type (java.lang.String is also considered as primitive type)
      • 2) be instrumentable (uninstrumentable classes include but not limit to package java., com.sun., evosuite., e.g., java.lang.Object equals() is excluded here)
    • This filter rule can be found at EvosuiteTest/src/evosuite/shell/listmethod/PrimitiveBasedFlagMethodFilter.java and also in our paper

4. Qualitative Examples on How Evoif Work

First, We show two examples on how EvoIf can improve Evosuite:

Example 1

In this example, the target method is org.databene.jdbacl.SQLUtil#mutatesStructure(String sql) from 13_jdbacl project (SF100), which contains two inter-procedural flag (a.k.a, IPF) methods isDDL(String sql) and isProcedureCall(String sql). The target method returns a Boolean value based on the type of input string. Insider both IPF methods, the method normalizeSQL(String sql) is invoked and also revoke external library org.databene.commons.StringUtil.

    • Challenge: the challenge lie in that randomly generated string can hardly meet the requirement of sql syntex, e.g., "select * from ...."
    • Evoif' remedy: Evoif can generated the valid SQL string by recursively unrolling the method calls. The branch distance in line 191 and line 192 can force Evoif to generate valid string conforming to SQL syntax.
    • Result: Evoif outperforms Evosuite with the coverage of 0.975 and 0.5 and with the time used of 26 and 101 seconds correspondingly on average.

Example.1

Example 2

In this example, the target method is org.javacc.parser.NfaState#getFirstValidPos(String s, int i, int len) from 159_javacc project (Extra 50) suffers from the inter-procedural flag method CanMoveUsingChar(char c).

    • Challenge: the challenge lies in that a correct input for line 792, i.e., the generated string needs to contains a char satisfying the condition of "moving" (see the method CanMoveUsingChar(char c)).
    • Evoif' remedy: Evoif investigate the unrolled interprocedural flag methods and generate a string contains char meeting the condition on line 168 in CanMoveUsingChar(char c).
    • Result: Evoif achieves an overall coverage of 1, compared to that of 0.66 in Evosuite. With merely an execution of 8 seconds, Evoif is able to cover all the branches inside the target method.

Example 2