Bounded Exhaustive Search of Alloy Specification Repairs

Replication Package

Replicating experiments

Using Docker

Requirements

  • Docker, by either installing Docker Desktop for Windows or macOS; or by installing Docker Engine (not available for Windows).

  • Pulling image drstein/beafix:2.1.2 by executing docker pull drstein/beafix:2.1.2 or by using Docker Desktop.

  • Our experiments were run with 16Gb of memory as maximum memory available for BeAFix.

Running paper's experiments

To run the docker, the command is docker run -it drstein/beafix:2.1.2.

Our benchmarks are found in /home/beafix/replicationPackage/benchmarks; running script and BeAFix jar is found in /home/beafix/replicationPackage; and to run a particular study case the following command must be executed:

./runBeAFix.sh BeAFixCLI-2.1.2.jar <path to study case> <depth> <pruning> <timeout> <results folder>.

Where path to study case is the full path to an .als file inside the benchmarks folder; depth is how many mutations to apply to each buggy expression; pruning enables (1) or disables (0) the use of pruning techniques; timeout (in minutes) states how much time is allowed to find a fix, the script adds 5 extra minutes to forcefully terminate BeAFix if it didn't finished by itself in the allotted timeout; and results folder is where results will be stored (if the folder doesn't exist it will be created, but if it exists it should be empty).

On our experiments we used depth 2 to 5 (only running cases with an increased depth for cases that were not repaired), and we used 60 minutes as timeout.

MacOS and Linux

Requirements

  • Java 8 (we used Oracle's version, can be downloaded from here).

  • GNU Bash 4.4.20+ (although it might work with older versions).

    • We used some commands in ubuntu that they may work slightly different on MacOS: timeout, grep, and sed.

  • Our experiments were run with 16Gb of memory as maximum memory available for BeAFix.

  • Our replication package containing BeAFixCLI-2.1.2.jar, runBeAFix.sh, astryker.properties, and the benchmarks folder.

    BeAFix requires MiniSat to be used as a SAT solver. The CLI version found in the replication package's docker has MiniSat already selected. If you use the CLI version of BeAFix outside the docker, you will need to first use either AlloyAnalyzer or the GUI version of BeAFix and set MiniSat; the latest version of BeAFix have MiniSat already set for the CLI version.


Running paper's experiments

Inside the folder containing the replication package contents, the following command must be executed:

./runBeAFix.sh BeAFixCLI-2.1.2.jar <path to study case> <depth> <pruning> <timeout> <results folder>.

Where path to study case is the full path to an .als file inside the benchmarks folder; depth is how many mutations to apply to each buggy expression; pruning enables (1) or disables (0) the use of pruning techniques; timeout (in minutes) states how much time is allowed to find a fix, the script adds 5 extra minutes to forcefully terminate BeAFix if it didn't finished by itself in the allotted timeout; and results folder is where results will be stored (if the folder doesn't exist it will be created, but if it exists it should be empty).

On our experiments we used depth 2 to 5 (only running cases with an increased depth for cases that were not repaired), and we used 60 minutes as timeout.

Script's results

After running an experiment, the results folder defined when calling the BeAFix script will contain the following:

  • <model's name>.summary a sequence of values separated with `;` that starts with the model's name, followed by:

    • TO from script if BeAFix didn't end in the allotted time and was finished by the script. In this case no other value will be present in the summary.

    • FAILED WITH CODE followed by an exit code, if an error occurred. In this case no other value will be present in the summary.

    • repaired (true/false) stating if BeAFix was able to find a fix or not.

    • time (ms) the running time of BeAFix.

    • spuriosity (real/spurious) a check only done when having a model with oracles that will only be used to validate a fix (none of our experiments use this) this will always be real for this replication package.

    • the repair as the modifications done by BeAFix to repair the model (separated by the symbol £).

    • repair verification this last value was only used as a debug helper and should be ignored.

  • <model's name>_repair.als a repaired model (if a repair was found), this model can sometimes have some issues because the candidate writer was only used as a debug helper. This file can sometimes stay in the same path as the original model.

  • <model's name>_repair.verification a side product of the last verification process.

  • <model's name>_summaryLog.log if BeAFix ended with an error code, this file contains the last 300 lines of some log files generated by BeAFix, this log files are deleted by the cleanFiles function in runBeAFix.sh, you can avoid this cleaning by commenting the cleanFiles "$scase" line using sed command.

Running all models per case

To run whole cases, all Alloy4Fun 2 bugs classroom cases for example, the following command can be used:

find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} <depth> <pruning> <timeout> <results folder>

We recommend using meaningful and separate results folder for each run, depth, and pruning configuration, we give a list of commands for each case and pruning configuration using depth 2:

Alloy4Fun 1 bug cases
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} <depth> <pruning> <timeout> <results folder>

Alloy4Fun 2 bugs

Classroom
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 CLASSROOM-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 CLASSROOM-2B-PRUNING-DEPTH2

CV

find benchmarks/A4F-MB | awk '/CV_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 CV-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/CV_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 CV-2B-PRUNING-DEPTH2

Graphs

find benchmarks/A4F-MB | awk '/Graphs_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 GRAPHS-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/Graphs_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 GRAPHS-2B-PRUNING-DEPTH2

LTS

find benchmarks/A4F-MB | awk '/LTS_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 LTS-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/LTS_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 LTS-2B-PRUNING-DEPTH2

Production

find benchmarks/A4F-MB | awk '/Production_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 PRODUCTION-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/Production_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 PRODUCTION-2B-PRUNING-DEPTH2

Trash

find benchmarks/A4F-MB | awk '/Trash/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 TRASH-2B-NOPRUNING-DEPTH2

find benchmarks/A4F-MB | awk '/Trash/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 TRASH-2B-PRUNING-DEPTH2

ARepair 1 bug

find benchmarks/ARepair-1B | awk '/\.als/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 AREPAIR-1B-DEPTH2

ARepair multiple bugs

find benchmarks/ARepair-MB/depth1granularity | awk '/\.als/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 AREPAIR-2B-GRAN1-DEPTH2

Merging summaries and running next depth experiments

Assuming a results folder (called results-depth2 for simplicity), Graphs 2 bugs cases, and pruning:

Merging : find results-depth2 | awk '/\.summary$/' | xargs -I {} cat {} >> results.summary
Obtaining not repaired cases: cat results.summary | grep -v '^.*; true' | sed 's|;.*$||g' >> notRepaired
Running next depth: cat notRepaired | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar benchmarks/A4F-MB/{}.als 3 1 60 GRAPHS-2B-PRUNING-DEPTH3

Benchmarks

This replication package contains the benchmarks used in the paper. These can all be found in the benchmarks folder.

  1. A4F-1B, 1936 models from the Alloy4Fun benchmark , each having one bug.

  2. A4F-MB, 273 models derived from the Alloy4Fun benchmark , each having multiple bugs.

  3. ARepair-1B, 16 models from the ARepair benchmark , each having one bug.

  4. ARepair-MB, 15 models from the ARepair benchmark , each having multiple bugs.

    1. depth1granularity, the 15 models have the faulty expression marked at the "line" level.

    2. depth2granularity, 13 of the 15 models with the faulty expression marked with one more level inside of the original expression. For example, if the expression marked at line level was c in (a + b), and the fault is in b then the marked expression would be a + b.

    3. depth3granularity, 5 of the 13 models with the faulty expression marked with one more level.

    4. depth4granularity, 3 of the 5 models with the faulty expression marked with one more level.

granularity didn't have a great impact on the reparability