We report here all the data used, extracted, and generated in our study:
Bug-Fixes
Datasets & Predictions
Idioms
Manual Evaluation
BLEU Score tests
Clustering and Silhouette
Metadata of the bug-fixing commits collected during mining. The CSV file contains the following fields:
ID : Commit hash ID
Repo_URL : GitHub URL of the repository
Commit_URL : GitHub URL of the bug-fixing commit
Message : Commit message of the bug-fixing commit
Data
Download CSV file (900 MB)
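The fields listed above can be read with a few lines of Python. This is a minimal sketch, not part of the released material: it assumes the CSV has a header row whose column names match the field names exactly (ID, Repo_URL, Commit_URL, Message), and the function names are hypothetical. Rows are streamed one at a time because the file is large.

```python
import csv
from collections import Counter
from typing import Dict, Iterator, TextIO

# Field names as listed in the description; that the CSV header uses
# exactly these spellings is an assumption.
FIELDS = ("ID", "Repo_URL", "Commit_URL", "Message")


def iter_commits(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per bug-fixing commit, keeping only the documented fields."""
    for row in csv.DictReader(handle):
        # A missing column yields an empty string instead of raising.
        yield {name: (row.get(name) or "") for name in FIELDS}


def count_commits_per_repo(handle: TextIO) -> Counter:
    """Count bug-fixing commits per repository URL without loading the whole file."""
    return Counter(commit["Repo_URL"] for commit in iter_commits(handle))


# Usage (the file name is hypothetical):
# with open("bugfixes.csv", newline="", encoding="utf-8") as f:
#     print(count_commits_per_repo(f).most_common(10))
```

Opening the file with `newline=""` lets the `csv` module handle commit messages that contain line breaks inside quoted fields.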
Raw source code extracted from the bug-fixing commits.
Each bug-fixing commit is represented by a folder named after its commit hash ID. Each folder contains two sub-folders:
P_DIR: Java source code files before the bug-fixing commit
F_DIR: Java source code files after the bug-fixing commit
Data
Download data (15 GB)
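The folder layout described above can be traversed as follows. This is a sketch under stated assumptions: it assumes that a file keeps the same relative path inside P_DIR and F_DIR, which the description does not confirm, and files present on only one side are skipped. The function name is hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_file_pairs(root: Path) -> Iterator[Tuple[str, Path, Path]]:
    """Yield (commit_hash, before_file, after_file) for Java files found on both sides."""
    for commit_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        before_root = commit_dir / "P_DIR"  # files before the bug-fixing commit
        after_root = commit_dir / "F_DIR"   # files after the bug-fixing commit
        if not (before_root.is_dir() and after_root.is_dir()):
            continue
        for before in sorted(before_root.rglob("*.java")):
            # Assumption: the same relative path identifies the file after the fix.
            after = after_root / before.relative_to(before_root)
            if after.is_file():
                yield commit_dir.name, before, after


# Usage (the directory name is hypothetical):
# for commit, before, after in iter_file_pairs(Path("bugfix_files")):
#     print(commit, before.name, after.stat().st_size - before.stat().st_size)
```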
Method pairs extracted from the bug-fixing commits.
Each bug-fix is represented by a folder named after the corresponding commit hash ID. Each bug-fix folder contains a first level of folders representing the files, then a second level of folders representing the methods. Each method folder contains the following files:
before.java : Method's source code before the fix
after.java : Method's source code after the fix
operations.txt : AST operations performed on the method as extracted by GumTreeDiff
signature.txt : Fully qualified signatures of the method before/after the fix
Data
Download data (7 GB)
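The three-level layout described above (commit, file, method) can be loaded with a short script. This is a minimal sketch, assuming the four file names are exactly as listed and that every method folder has a before.java; the `MethodPair` class and function names are hypothetical and not part of the released material.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class MethodPair:
    commit: str      # commit hash ID (top-level folder name)
    file: str        # name of the file-level folder
    method: str      # name of the method-level folder
    before: str      # contents of before.java
    after: str       # contents of after.java
    operations: str  # contents of operations.txt (GumTreeDiff output)
    signature: str   # contents of signature.txt


def _read_or_empty(path: Path) -> str:
    """Return the file's text, or an empty string if the file is missing."""
    if not path.is_file():
        return ""
    return path.read_text(encoding="utf-8", errors="replace")


def iter_method_pairs(root: Path) -> Iterator[MethodPair]:
    """Walk <commit>/<file>/<method>/ folders and yield one record per method."""
    for before_path in sorted(root.glob("*/*/*/before.java")):
        method_dir = before_path.parent
        file_dir = method_dir.parent
        yield MethodPair(
            commit=file_dir.parent.name,
            file=file_dir.name,
            method=method_dir.name,
            before=_read_or_empty(before_path),
            after=_read_or_empty(method_dir / "after.java"),
            operations=_read_or_empty(method_dir / "operations.txt"),
            signature=_read_or_empty(method_dir / "signature.txt"),
        )


# Usage (the directory name is hypothetical):
# for pair in iter_method_pairs(Path("method_pairs")):
#     print(pair.commit, pair.method, len(pair.operations.splitlines()))
```

Missing files are returned as empty strings so that one incomplete method folder does not stop the traversal.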
We share the code sample and the evaluation performed to assess the characteristics of the mutants generated by the models.
Code Sample
Download data - each text file contains the fixed code followed by the mutant generated by the model.
Results
Spreadsheet - contains the evaluation performed by the judges.
We share the source code and the logs for the BLEU score tests.
Source Code
Download code - used to run the statistical tests between the models and the baseline.
Logs
Download logs - output logs of the tests.
Download data - script to compute the silhouette values, along with the resulting distributions (boxplots).