Data

We report here all the data used, extracted, and generated in our study:

Bug-Fixes
Datasets & Predictions
Idioms
Manual Evaluation
BLEU Score Tests
Clustering - Silhouette

Bug-Fixes

Bug-Fixing Commits

Metadata of the bug-fixing commits extracted during mining. The CSV file contains the following fields:

ID : Commit HASH ID
Repo_URL : GitHub URL of the repository
Commit_URL : GitHub URL of the bug-fixing commit
Message : Commit message of the bug-fixing commit

Data

Download CSV file (900 MB)
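The fields listed above can be read with Python's standard csv module. What follows is a minimal sketch, assuming the header row uses exactly the field names listed (the page does not show the header) and that commit messages may span several lines. The helper names `iter_commits` and `commits_per_repo` are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumption: the CSV header row uses exactly these field names.
FIELDS = ("ID", "Repo_URL", "Commit_URL", "Message")

def iter_commits(stream):
    """Yield one dict per bug-fixing commit, reading row by row.

    Streaming avoids loading the whole (~900 MB) file into memory.
    Open the real file with newline="" so quoted multi-line commit
    messages are parsed as a single field.
    """
    reader = csv.DictReader(stream)
    missing = set(FIELDS) - set(reader.fieldnames or ())
    if missing:
        raise ValueError(f"unexpected header, missing: {sorted(missing)}")
    yield from reader

def commits_per_repo(stream):
    """Count bug-fixing commits per repository URL."""
    return Counter(row["Repo_URL"] for row in iter_commits(stream))

if __name__ == "__main__":
    # Tiny made-up sample in the assumed format (not real data).
    sample = io.StringIO(
        "ID,Repo_URL,Commit_URL,Message\n"
        'a1,https://github.com/o/p,https://github.com/o/p/commit/a1,"fix NPE\nin parser"\n'
        "b2,https://github.com/o/p,https://github.com/o/p/commit/b2,fix off-by-one\n"
    )
    print(commits_per_repo(sample))
```

For the real file, something like `with open(path, newline="", encoding="utf-8") as f: counts = commits_per_repo(f)` would apply, where `path` is wherever the download was saved.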

Code from Bug-Fixes

Raw source code extracted from the bug-fixing commits.

Each bug-fixing commit is represented by a folder named after the commit hash ID. Each folder contains two sub-folders:

P_DIR: Java source code files before the bug-fixing commit
F_DIR: Java source code files after the bug-fixing commit

Data

Download data (15 GB)
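The P_DIR/F_DIR layout above lets each Java file be paired with its post-fix version. The following is a minimal sketch under an assumption the page does not confirm: a file keeps the same relative path inside P_DIR and F_DIR. Files present on only one side are reported with `None`. The function names are hypothetical.

```python
from pathlib import Path

def _java_files(folder):
    """Map each .java file's path (relative to folder) to its full path."""
    if not folder.is_dir():
        return {}
    return {p.relative_to(folder): p for p in folder.rglob("*.java")}

def iter_file_pairs(root):
    """Yield (commit_id, relative_path, before_path, after_path) tuples.

    Assumes <root>/<commit hash>/P_DIR and <root>/<commit hash>/F_DIR, and
    that a file keeps the same relative path on both sides. A side with no
    matching file (e.g. a file added or deleted by the fix) yields None.
    """
    for commit_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        before = _java_files(commit_dir / "P_DIR")
        after = _java_files(commit_dir / "F_DIR")
        # Union of relative paths seen before and after the fix.
        for rel in sorted(before.keys() | after.keys()):
            yield commit_dir.name, rel, before.get(rel), after.get(rel)
```

Using `rglob` means the sketch works whether the Java files sit directly in P_DIR/F_DIR or in nested package folders.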

Extracted Bug-Fix Pairs (BFP)

Method pairs extracted from the bug-fixing commits.

Each bug-fix is represented by a folder named after the corresponding commit hash ID. Inside each bug-fix folder, a first level of folders represents the changed files and a second level represents the changed methods. Each method folder contains the following files:

before.java : Method's source code before the fix
after.java : Method's source code after the fix
operations.txt : AST operations performed on the method as extracted by GumTreeDiff
signature.txt : Fully qualified signatures of the method before/after the fix

Data

Download data (7 GB)
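The three-level layout (commit, file, method) with four files per method folder maps naturally onto one record per bug-fix pair. The following is a minimal loader sketch. It assumes operations.txt lists one GumTreeDiff operation per line and keeps signature.txt as raw text, since the page does not specify either file's internal format. `BugFixPair` and `iter_bfps` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class BugFixPair:
    commit_id: str    # commit hash (first-level folder)
    file_key: str     # second-level folder name (one per changed file)
    method_key: str   # third-level folder name (one per changed method)
    before: str       # contents of before.java
    after: str        # contents of after.java
    operations: list  # non-empty lines of operations.txt (assumed one op per line)
    signature: str    # raw contents of signature.txt, whitespace-trimmed

def _read(path):
    return path.read_text(encoding="utf-8", errors="replace")

def iter_bfps(root):
    """Yield a BugFixPair for every <commit>/<file>/<method>/ folder under root."""
    for before_file in sorted(Path(root).glob("*/*/*/before.java")):
        method_dir = before_file.parent
        file_dir = method_dir.parent
        commit_dir = file_dir.parent
        ops_text = _read(method_dir / "operations.txt")
        yield BugFixPair(
            commit_id=commit_dir.name,
            file_key=file_dir.name,
            method_key=method_dir.name,
            before=_read(before_file),
            after=_read(method_dir / "after.java"),
            operations=[line for line in ops_text.splitlines() if line.strip()],
            signature=_read(method_dir / "signature.txt").strip(),
        )
```

Anchoring the glob on `before.java` means only complete method folders are picked up. A folder missing one of the other three files raises `FileNotFoundError` instead of being silently skipped.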

Datasets & Predictions

We share the datasets and the predictions of the models. Each dataset is partitioned into 80% training, 10% validation, and 10% test sets. The predictions of the models are the mutants generated for the test set.
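To illustrate the 80/10/10 proportions, the following sketches a seeded random split. This is only an assumption for illustration: the page does not say whether the original partition was random, stratified, or seeded, so this should not be read as the procedure used to build the shared datasets. `split_80_10_10` is a hypothetical name.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle a copy of items and cut it into (train, validation, test).

    Integer arithmetic keeps the cut points exact: train gets floor(80%),
    validation floor(10%), and test receives whatever remains.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG: reproducible, no global state
    n = len(items)
    n_train = n * 8 // 10
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Giving any rounding remainder to the test set keeps the three parts disjoint and exhaustive for any input size.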

Idioms

Manual Evaluation

We share the code sample and the evaluation performed to assess the characteristics of the mutants generated by the models.

Code Sample

Download data - each text file contains the fixed code followed by the mutant generated by the model.

Results

Spreadsheet - contains the evaluation performed by the judges.

BLEU Score Tests

We share the source code and the logs for the BLEU score tests.

Source Code

Download code - used to run the statistical tests comparing the models against the baseline.

Logs

Download logs - output logs of the tests.

Clustering - Silhouette

Download data - script to compute the silhouette values, as well as the resulting distributions (boxplots).
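For reference, the silhouette value of a sample i is the standard s(i) = (b - a) / max(a, b). Here a is the mean distance from i to the other members of its own cluster, and b is the smallest mean distance from i to the members of any other cluster. The following is a minimal sketch of that definition over a precomputed distance matrix. It is not the shared script: the page does not state which representation or distance was clustered. Singleton clusters get 0, matching scikit-learn's `silhouette_samples` convention.

```python
from collections import defaultdict
from statistics import mean

def silhouette_values(dist, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b).

    dist[i][j] is the distance between samples i and j; labels[i] is the
    cluster of sample i. Samples in singleton clusters get 0.0.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:
            values.append(0.0)  # singleton cluster
            continue
        a = mean(dist[i][j] for j in own)  # cohesion
        b = min(  # separation: nearest other cluster
            mean(dist[i][j] for j in idxs)
            for other, idxs in members.items() if other != lab
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values
```

Values close to 1 indicate a sample sits well inside its cluster, and negative values suggest it is closer to another cluster. Per-sample values like these are what a boxplot of the silhouette distribution would summarize.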
