Data

We report here all the data used, extracted, and generated in our study:

  • Bug-Fixes

  • Datasets & Predictions

  • Idioms

  • Manual Evaluation

  • BLEU Score Tests

  • Clustering and Silhouette

Bug-Fixes

Bug-Fixing Commits

Metadata of the bug-fixing commits collected during mining. The CSV file contains the following fields:

  • ID : Commit hash ID

  • Repo_URL : GitHub URL of the repository

  • Commit_URL : GitHub URL of the bug-fixing commit

  • Message : Commit message of the bug-fixing commit
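A minimal sketch of how these fields could be read in Python. It assumes the CSV has a header row with exactly the four field names listed above; the `read_commits` helper and the sample row are hypothetical and only illustrate the layout.

```python
import csv
import io

# Field names as documented above. Whether the real file carries a header
# row with these exact names is an assumption of this sketch.
FIELDS = ["ID", "Repo_URL", "Commit_URL", "Message"]


def read_commits(handle):
    """Yield one dict per bug-fixing commit from an open CSV handle."""
    reader = csv.DictReader(handle)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield {field: row[field] for field in FIELDS}


# Tiny in-memory stand-in for the real file (made-up values).
sample = io.StringIO(
    "ID,Repo_URL,Commit_URL,Message\n"
    "abc123,https://github.com/example/repo,"
    "https://github.com/example/repo/commit/abc123,Fix null check\n"
)
commits = list(read_commits(sample))
print(commits[0]["Message"])
```

Because `read_commits` is a generator, the real file can be streamed row by row from a handle opened with `open(path, newline="", encoding="utf-8")` rather than loaded into memory at once.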

Data

Download CSV file (900 MB)


Code from Bug-Fixes

Raw source code extracted from the bug-fixing commits.

Each bug-fixing commit is represented by a folder named after its commit hash ID. Each folder contains two sub-folders:

  • P_DIR: Java source code files before the bug-fixing commit

  • F_DIR: Java source code files after the bug-fixing commit
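The layout above can be traversed with a few lines of Python. This is a sketch under assumptions: it supposes the files end in `.java` and may be nested at any depth inside `P_DIR` and `F_DIR`; the `java_files` helper and the demo commit folder are hypothetical.

```python
from pathlib import Path
import tempfile


def java_files(commit_dir):
    """Return (before, after): sorted .java paths relative to P_DIR and F_DIR."""
    commit_dir = Path(commit_dir)

    def collect(sub):
        root = commit_dir / sub
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.java"))

    return collect("P_DIR"), collect("F_DIR")


# Build a throwaway commit folder that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    commit = Path(tmp) / "abc123"  # hypothetical commit hash
    for sub in ("P_DIR", "F_DIR"):
        (commit / sub / "src").mkdir(parents=True)
        (commit / sub / "src" / "Foo.java").write_text("class Foo {}\n")
    before, after = java_files(commit)

print(before, after)
```

Comparing the two returned lists gives a quick view of which files exist on both sides of a commit.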

Data

Download data (15 GB)


Extracted Bug-Fix Pairs (BFP)

Method pairs extracted from the bug-fixing commits.

Each bug-fix is represented by a folder named after the corresponding commit hash ID. Inside each bug-fix folder, a first level of folders represents the files and a second level represents the methods. Each method folder contains the following files:

  • before.java : Method's source code before the fix

  • after.java : Method's source code after the fix

  • operations.txt : AST operations performed on the method as extracted by GumTreeDiff

  • signature.txt : Fully qualified signatures of the method before/after the fix
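As an illustration, the three-level layout (commit, file, method) could be walked as follows. The `iter_bfps` helper is hypothetical, and skipping method folders that lack one of the four files is a choice made for this sketch, not something the dataset description states.

```python
from pathlib import Path
import tempfile

METHOD_FILES = ("before.java", "after.java", "operations.txt", "signature.txt")


def iter_bfps(root):
    """Yield (commit, file, method, contents) for each complete method folder."""
    for method_dir in sorted(Path(root).glob("*/*/*")):
        if not method_dir.is_dir():
            continue
        paths = {name: method_dir / name for name in METHOD_FILES}
        if not all(p.is_file() for p in paths.values()):
            continue  # sketch-level choice: ignore incomplete method folders
        commit, file_name, method = method_dir.parts[-3:]
        contents = {n: p.read_text(encoding="utf-8") for n, p in paths.items()}
        yield commit, file_name, method, contents


# Throwaway tree with one bug-fix pair (all names are made up).
with tempfile.TemporaryDirectory() as tmp:
    method_dir = Path(tmp) / "abc123" / "Foo.java" / "bar"
    method_dir.mkdir(parents=True)
    for name in METHOD_FILES:
        (method_dir / name).write_text(f"content of {name}\n", encoding="utf-8")
    pairs = list(iter_bfps(tmp))

print(pairs[0][:3])
```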

Data

Download data (7 GB)

Datasets & Predictions

We share the datasets and the predictions of the models. Each dataset is partitioned into 80% training, 10% validation, and 10% test sets. The predictions of the models are the mutants generated for the 10% test set.
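For readers who want to reproduce a partition with the same proportions on their own data, here is a minimal sketch. The seeded random shuffle is an assumption of this example; the procedure actually used to assign pairs to each set is not described here.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items reproducibly and split them 80/10/10."""
    items = list(items)
    random.Random(seed).shuffle(items)  # assumed: random assignment
    n_train = len(items) * 8 // 10
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


train, valid, test = split_80_10_10(range(100))
print(len(train), len(valid), len(test))
```

Integer arithmetic is used for the cut points so that the sizes do not depend on floating-point rounding.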

Idioms

Idioms

Manual Evaluation

We share the code sample and the evaluation carried out to assess the characteristics of the mutants generated by the models.

Code Sample

Download data - each text file contains the fixed code followed by the mutant generated by the model.

Results

Spreadsheet - contains the evaluation performed by the judges.

BLEU Score Tests

We share the source code and the logs for the BLEU score tests.

Source Code

Download code - used to run the statistical tests between the models and the baseline.

Logs

Download logs - output logs of the tests.

Clustering and Silhouette

Download data - script to compute the silhouette values, together with the resulting distributions (boxplots).
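For reference, the standard silhouette value of a point is s = (b - a) / max(a, b), where a is its mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below implements that textbook definition with Euclidean distance. It is not the shared script: the distance metric, the `silhouette_values` helper, and the toy points are all assumptions made for illustration.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette values using Euclidean distance (assumed metric)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Two well-separated toy clusters (made-up coordinates).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
print(scores)
```

Values close to 1 indicate points that sit well inside their cluster, while values near 0 or below indicate points on or across a cluster boundary.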