Bounded Exhaustive Search of Alloy Specification Repairs
Replication Package
Replicating experiments
Using Docker
Requirements
Docker, by either installing Docker Desktop for Windows or macOS; or by installing Docker Engine (not available for Windows).
Pulling image drstein/beafix:2.1.2 by executing docker pull drstein/beafix:2.1.2 or by using Docker Desktop.
Our experiments were run with 16Gb of memory as maximum memory available for BeAFix.
Running paper's experiments
To run the docker, the command is docker run -it drstein/beafix:2.1.2.
Our benchmarks are found in /home/beafix/replicationPackage/benchmarks; running script and BeAFix jar is found in /home/beafix/replicationPackage; and to run a particular study case the following command must be executed:
./runBeAFix.sh BeAFixCLI-2.1.2.jar <path to study case> <depth> <pruning> <timeout> <results folder>.
Where path to study case is the full path to an .als file inside the benchmarks folder; depth is how many mutations to apply to each buggy expression; pruning enables (1) or disables (0) the use of pruning techniques; timeout (in minutes) states how much time is allowed to find a fix, the script adds 5 extra minutes to forcefully terminate BeAFix if it didn't finished by itself in the allotted timeout; and results folder is where results will be stored (if the folder doesn't exist it will be created, but if it exists it should be empty).
On our experiments we used depth 2 to 5 (only running cases with an increased depth for cases that were not repaired), and we used 60 minutes as timeout.
MacOS and Linux
Requirements
Java 8 (we used Oracle's version, can be downloaded from here).
GNU Bash 4.4.20+ (although it might work with older versions).
We used some commands in ubuntu that they may work slightly different on MacOS: timeout, grep, and sed.
Our experiments were run with 16Gb of memory as maximum memory available for BeAFix.
Our replication package containing BeAFixCLI-2.1.2.jar, runBeAFix.sh, astryker.properties, and the benchmarks folder.
BeAFix requires MiniSat to be used as a SAT solver. The CLI version found in the replication package's docker has MiniSat already selected. If you use the CLI version of BeAFix outside the docker, you will need to first use either AlloyAnalyzer or the GUI version of BeAFix and set MiniSat; the latest version of BeAFix have MiniSat already set for the CLI version.
Running paper's experiments
Running paper's experiments
Inside the folder containing the replication package contents, the following command must be executed:
./runBeAFix.sh BeAFixCLI-2.1.2.jar <path to study case> <depth> <pruning> <timeout> <results folder>.
Where path to study case is the full path to an .als file inside the benchmarks folder; depth is how many mutations to apply to each buggy expression; pruning enables (1) or disables (0) the use of pruning techniques; timeout (in minutes) states how much time is allowed to find a fix, the script adds 5 extra minutes to forcefully terminate BeAFix if it didn't finished by itself in the allotted timeout; and results folder is where results will be stored (if the folder doesn't exist it will be created, but if it exists it should be empty).
On our experiments we used depth 2 to 5 (only running cases with an increased depth for cases that were not repaired), and we used 60 minutes as timeout.
Script's results
After running an experiment, the results folder defined when calling the BeAFix script will contain the following:
<model's name>.summary a sequence of values separated with `;` that starts with the model's name, followed by:
TO from script if BeAFix didn't end in the allotted time and was finished by the script. In this case no other value will be present in the summary.
FAILED WITH CODE followed by an exit code, if an error occurred. In this case no other value will be present in the summary.
repaired (true/false) stating if BeAFix was able to find a fix or not.
time (ms) the running time of BeAFix.
spuriosity (real/spurious) a check only done when having a model with oracles that will only be used to validate a fix (none of our experiments use this) this will always be real for this replication package.
the repair as the modifications done by BeAFix to repair the model (separated by the symbol £).
repair verification this last value was only used as a debug helper and should be ignored.
<model's name>_repair.als a repaired model (if a repair was found), this model can sometimes have some issues because the candidate writer was only used as a debug helper. This file can sometimes stay in the same path as the original model.
<model's name>_repair.verification a side product of the last verification process.
<model's name>_summaryLog.log if BeAFix ended with an error code, this file contains the last 300 lines of some log files generated by BeAFix, this log files are deleted by the cleanFiles function in runBeAFix.sh, you can avoid this cleaning by commenting the cleanFiles "$scase" line using sed command.
Running all models per case
To run whole cases, all Alloy4Fun 2 bugs classroom cases for example, the following command can be used:
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} <depth> <pruning> <timeout> <results folder>
We recommend using meaningful and separate results folder for each run, depth, and pruning configuration, we give a list of commands for each case and pruning configuration using depth 2:
Alloy4Fun 1 bug cases
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} <depth> <pruning> <timeout> <results folder>
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} <depth> <pruning> <timeout> <results folder>
Alloy4Fun 2 bugs
Classroom
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 CLASSROOM-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/Classroom/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 CLASSROOM-2B-PRUNING-DEPTH2
CV
find benchmarks/A4F-MB | awk '/CV_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 CV-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/CV_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 CV-2B-PRUNING-DEPTH2
Graphs
find benchmarks/A4F-MB | awk '/Graphs_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 GRAPHS-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/Graphs_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 GRAPHS-2B-PRUNING-DEPTH2
LTS
find benchmarks/A4F-MB | awk '/LTS_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 LTS-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/LTS_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 LTS-2B-PRUNING-DEPTH2
Production
find benchmarks/A4F-MB | awk '/Production_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 PRODUCTION-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/Production_/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 PRODUCTION-2B-PRUNING-DEPTH2
Trash
find benchmarks/A4F-MB | awk '/Trash/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 TRASH-2B-NOPRUNING-DEPTH2
find benchmarks/A4F-MB | awk '/Trash/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 1 60 TRASH-2B-PRUNING-DEPTH2
ARepair 1 bug
find benchmarks/ARepair-1B | awk '/\.als/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 AREPAIR-1B-DEPTH2
ARepair multiple bugs
find benchmarks/ARepair-MB/depth1granularity | awk '/\.als/' | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar {} 2 0 60 AREPAIR-2B-GRAN1-DEPTH2
Merging summaries and running next depth experiments
Assuming a results folder (called results-depth2 for simplicity), Graphs 2 bugs cases, and pruning:
Merging : find results-depth2 | awk '/\.summary$/' | xargs -I {} cat {} >> results.summary
Obtaining not repaired cases: cat results.summary | grep -v '^.*; true' | sed 's|;.*$||g' >> notRepaired
Running next depth: cat notRepaired | xargs -I {} ./runBeAFix.sh BeAFixCLI-2.1.2.jar benchmarks/A4F-MB/{}.als 3 1 60 GRAPHS-2B-PRUNING-DEPTH3
Benchmarks
This replication package contains the benchmarks used in the paper. These can all be found in the benchmarks folder.
A4F-1B, 1936 models from the Alloy4Fun benchmark , each having one bug.
A4F-MB, 273 models derived from the Alloy4Fun benchmark , each having multiple bugs.
ARepair-1B, 16 models from the ARepair benchmark , each having one bug.
ARepair-MB, 15 models from the ARepair benchmark , each having multiple bugs.
depth1granularity, the 15 models have the faulty expression marked at the "line" level.
depth2granularity, 13 of the 15 models with the faulty expression marked with one more level inside of the original expression. For example, if the expression marked at line level was c in (a + b), and the fault is in b then the marked expression would be a + b.
depth3granularity, 5 of the 13 models with the faulty expression marked with one more level.
depth4granularity, 3 of the 5 models with the faulty expression marked with one more level.
granularity didn't have a great impact on the reparability