
We report here all the data used, extracted, and generated in our study:

  • Bug-Fixes

  • Datasets & Predictions

  • Idioms

  • Manual Evaluation

  • BLEU Score Tests

  • Clustering and Silhouette


Bug-Fixing Commits

Metadata of the bug-fixing commits extracted during mining. The CSV file contains the following fields:

  • ID: commit hash ID

  • Repo_URL: GitHub URL of the repository

  • Commit_URL: GitHub URL of the bug-fixing commit

  • Message: commit message of the bug-fixing commit


Download CSV file (900 MB)
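Because the CSV file is about 900 MB, it is easier to stream it row by row than to load it at once. The following is a minimal Python sketch, assuming the file has a header row with exactly the field names listed above. The helper names (iter_commits, commits_per_repo) are illustrative and not part of the released data.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO

# Column names as documented above; assumed to appear as the CSV header row.
FIELDS = ("ID", "Repo_URL", "Commit_URL", "Message")


def iter_commits(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per bug-fixing commit without loading the whole file."""
    reader = csv.DictReader(handle)
    missing = set(FIELDS) - set(reader.fieldnames or ())
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield {name: row[name] for name in FIELDS}


def commits_per_repo(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count bug-fixing commits per repository URL."""
    return Counter(row["Repo_URL"] for row in rows)


if __name__ == "__main__":
    # Tiny inline sample; the second message is quoted because it contains a comma.
    sample = io.StringIO(
        "ID,Repo_URL,Commit_URL,Message\n"
        "abc,https://github.com/o/r,https://github.com/o/r/commit/abc,Fix NPE\n"
        'def,https://github.com/o/r,https://github.com/o/r/commit/def,"Fix, again"\n'
    )
    print(commits_per_repo(iter_commits(sample)))
```

On the downloaded file, open it with open(path, newline="", encoding="utf-8") as the csv module recommends; the UTF-8 encoding is an assumption.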

Code from Bug-Fixes

Raw source code extracted from the bug-fixing commits.

Each bug-fixing commit is represented by a folder named after the commit hash ID. Each folder contains two sub-folders:

  • P_DIR: Java source code files before the bug-fixing commit

  • F_DIR: Java source code files after the bug-fixing commit


Download data (15 GB)
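A minimal Python sketch for pairing each Java file before the fix with its version after the fix, assuming that a file keeps the same relative path inside P_DIR and F_DIR. Files that exist on only one side (added or deleted by the fix) are skipped. The function name iter_file_pairs is hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_file_pairs(root: Path) -> Iterator[Tuple[str, Path, Path]]:
    """Yield (commit_hash, before_file, after_file) for Java files present in both sub-folders."""
    for commit_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        before_dir = commit_dir / "P_DIR"  # code before the bug-fixing commit
        after_dir = commit_dir / "F_DIR"   # code after the bug-fixing commit
        if not (before_dir.is_dir() and after_dir.is_dir()):
            continue  # skip folders that do not follow the documented layout
        for before in sorted(before_dir.rglob("*.java")):
            # Assumption: the same file sits at the same relative path in F_DIR.
            after = after_dir / before.relative_to(before_dir)
            if after.is_file():
                yield commit_dir.name, before, after
```

Usage: for commit, before, after in iter_file_pairs(Path("bug-fix-code")): ..., where "bug-fix-code" is a placeholder for the folder where the archive was extracted.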

Extracted Bug-Fix Pairs (BFP)

Method pairs extracted from the bug-fixing commits.

Each bug-fix is represented by a folder named after the corresponding commit hash ID. Inside each bug-fix folder, a first level of folders represents the files, and a second level of folders represents the methods. Each method folder contains the following files:

  • : Method's source code before the fix

  • : Method's source code after the fix

  • operations.txt : AST operations performed on the method as extracted by GumTreeDiff

  • signature.txt : Fully qualified signatures of the method before/after the fix


Download data (7 GB)
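The three-level layout (commit, file, method) can be traversed as in the minimal Python sketch below. It relies only on the folder hierarchy and on operations.txt as described above; reading operations.txt as one AST operation per line is an assumption about the GumTreeDiff output format, and the names MethodPair, iter_method_pairs, and read_operations are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, NamedTuple


class MethodPair(NamedTuple):
    commit: str  # bug-fix folder (commit hash ID)
    file: str    # first-level folder: a file changed by the fix
    method: str  # second-level folder: a method changed by the fix
    path: Path   # folder containing operations.txt, signature.txt, ...


def iter_method_pairs(root: Path) -> Iterator[MethodPair]:
    """Walk the commit/file/method folder hierarchy."""
    for commit_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for file_dir in sorted(p for p in commit_dir.iterdir() if p.is_dir()):
            for method_dir in sorted(p for p in file_dir.iterdir() if p.is_dir()):
                yield MethodPair(commit_dir.name, file_dir.name, method_dir.name, method_dir)


def read_operations(pair: MethodPair) -> List[str]:
    """Return the non-empty lines of operations.txt (assumed: one AST operation per line)."""
    text = (pair.path / "operations.txt").read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

Counting len(read_operations(pair)) per method gives a rough size of each fix, under the same one-operation-per-line assumption.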

Datasets & Predictions

We share the datasets and the models' predictions. Each dataset is partitioned into 80% training, 10% validation, and 10% test sets. The predictions are the mutants generated by the models for the test set.
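How the 80/10/10 partition was drawn is not stated here. For readers who want to re-split other data in the same proportions, the following is a minimal Python sketch that assumes a random shuffle with a fixed seed; for replication, use the released splits rather than this function.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of items and cut it into 80% train, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_train = len(shuffled) * 8 // 10      # integer arithmetic avoids float rounding
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder, so no item is lost to rounding
    return train, val, test
```

Assigning the rounding remainder to the test set keeps the three parts disjoint and jointly exhaustive for any dataset size.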



Manual Evaluation

We share the code sample and the evaluation performed to assess the characteristics of the mutants generated by the models.

Code Sample

Download data - each text file contains the fixed code followed by the mutant generated by the model.


Spreadsheet - contains the evaluation performed by the judges.

BLEU Score Tests

We share the source code and the logs for the BLEU score tests.

Source Code

Download code - used to run the statistical tests comparing the models against the baseline.


Download logs - output logs of the tests.
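BLEU scores a candidate token sequence against a reference by combining modified n-gram precisions with a brevity penalty. As an aid to reading the logs, the following is a minimal pure-Python sketch of sentence-level BLEU-4 with uniform weights and no smoothing. Whitespace tokenization and the absence of smoothing are assumptions; the released scripts may compute BLEU differently (for example at corpus level or through a library).

```python
import math
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: Sequence[str], candidate: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU: geometric mean of clipped n-gram precisions times brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Usage: sentence_bleu(reference_code.split(), mutant_code.split()), where both strings are placeholders for the two code snippets being compared.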

Clustering - Silhouette

Download data - script to compute the silhouette values, together with the resulting distributions (boxplots).
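The silhouette value of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is the lowest mean distance to the members of any other cluster. The following is a minimal pure-Python sketch, assuming Euclidean distance (math.dist) by default and the common convention s(i) = 0 for singleton clusters. The features and distance used in the study are not stated here, so the dist parameter is left pluggable.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence

Point = Sequence[float]


def silhouette_values(
    points: Sequence[Point],
    labels: Sequence[Hashable],
    dist: Callable[[Point, Point], float] = math.dist,
) -> List[float]:
    """Return the silhouette value of every point, in input order."""
    clusters: Dict[Hashable, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values
```

Values close to 1 indicate well-separated clusters and negative values suggest misassigned points; the per-point values are what a boxplot of the distribution would summarize.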