Data

We report here all the data used, extracted, and generated in our study:

  • Bug-Fixes

  • Datasets

  • Predictions

  • Idioms

Bug-Fixes

Bug-Fixing Commits

Metadata of the bug-fixing commits collected during mining. The CSV file contains the following fields:

  • ID : Commit hash ID

  • Repo_URL : GitHub URL of the repository

  • Commit_URL : GitHub URL of the bug-fixing commit

  • Message : Commit message of the bug-fixing commit

Data

Download CSV file (900 MB)
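
The fields listed above can be read with Python's standard csv module. This is a minimal sketch, not part of the released tooling: it assumes the CSV starts with a header row whose column names match the field names above exactly, and the function name read_commits is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator

# Field names as listed above. That they appear verbatim in a header row
# is an assumption about the file, not something the page states.
FIELDS = ("ID", "Repo_URL", "Commit_URL", "Message")


def read_commits(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per bug-fixing commit from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        # Keep only the documented fields.
        yield {f: row[f] for f in FIELDS}
```

Because the file is large (900 MB), it is probably best streamed rather than loaded whole: open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_commits`. Opening with `newline=""` lets the csv module handle commit messages that span several lines.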


Code from Bug-Fixes

Raw source code extracted from the bug-fixing commits.

Each bug-fixing commit is represented by a folder named after its commit hash ID. Each folder contains two sub-folders:

  • P_DIR: Java source code files before the bug-fixing commit

  • F_DIR: Java source code files after the bug-fixing commit

Data

Download data (15 GB)
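
The folder layout described above can be inspected with a short script. The sketch below lists the Java files on each side of one commit folder; the function name java_files is hypothetical, and how files are nested inside P_DIR and F_DIR is not specified here, so the code simply searches them recursively.

```python
from pathlib import Path
from typing import Dict, List


def java_files(commit_dir: Path) -> Dict[str, List[str]]:
    """Return the relative paths of the .java files found under P_DIR
    (before the fix) and F_DIR (after the fix) of one commit folder."""
    result: Dict[str, List[str]] = {}
    for side in ("P_DIR", "F_DIR"):
        root = commit_dir / side
        if root.is_dir():
            # Recursive search: the nesting depth is an assumption.
            result[side] = sorted(
                p.relative_to(root).as_posix() for p in root.rglob("*.java")
            )
        else:
            result[side] = []
    return result
```

Comparing the two lists for a commit gives a quick view of which files exist on only one side of the fix.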


Extracted Bug-Fix Pairs (BFP)

Method pairs extracted from the bug-fixing commits.

Each bug-fix is represented by a folder named after the corresponding commit hash ID. Each bug-fix folder contains a first level of folders representing the files, then a second level of folders representing the methods. Each method folder contains the following files:

  • before.java : Method's source code before the fix

  • after.java : Method's source code after the fix

  • operations.txt : AST operations performed on the method as extracted by GumTreeDiff

  • signature.txt : Fully qualified signatures of the method before/after the fix

Data

Download data (7 GB)
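
The three-level layout above (commit, file, method) can be traversed as follows. This is a sketch under the assumption that every method folder sits exactly three levels below the extraction root; BugFixPair, load_pair, and iter_pairs are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class BugFixPair:
    commit: str      # commit hash ID (name of the top-level folder)
    before: str      # contents of before.java
    after: str       # contents of after.java
    operations: str  # contents of operations.txt
    signature: str   # contents of signature.txt


def load_pair(method_dir: Path, commit: str) -> BugFixPair:
    """Read the four files of one method folder."""
    def read(name: str) -> str:
        return (method_dir / name).read_text(encoding="utf-8", errors="replace")

    return BugFixPair(commit, read("before.java"), read("after.java"),
                      read("operations.txt"), read("signature.txt"))


def iter_pairs(root: Path) -> Iterator[BugFixPair]:
    """Walk root/<commit>/<file>/<method>/ and yield every bug-fix pair."""
    for before in sorted(root.glob("*/*/*/before.java")):
        method_dir = before.parent
        commit = method_dir.relative_to(root).parts[0]
        yield load_pair(method_dir, commit)
```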

Datasets

Datasets of Bug-Fix Pairs for small and medium methods. Each dataset consists of training (80%), validation (10%), and test (10%) sets, as well as the combined files. Each set contains the buggy code and the corresponding fixed code, aligned line by line (the i-th line of the buggy file corresponds to the i-th line of the fixed file). The original source code of the methods is also available. The mapping file makes it possible to map the abstracted code back to the original source code.

Download data
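
The line-by-line alignment described above means a buggy file and a fixed file can be read in lockstep. The sketch below pairs the i-th lines of two inputs and fails loudly if their lengths differ; the function name aligned_pairs is hypothetical, and the actual file names inside the archive are not assumed.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

_MISSING = object()  # sentinel marking the end of the shorter input


def aligned_pairs(buggy: Iterable[str],
                  fixed: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (buggy, fixed) pairs, taking the i-th line from each input."""
    for i, (b, f) in enumerate(zip_longest(buggy, fixed, fillvalue=_MISSING)):
        if b is _MISSING or f is _MISSING:
            raise ValueError(
                f"inputs are not aligned: the shorter one ends after {i} lines"
            )
        yield b.rstrip("\n"), f.rstrip("\n")
```

Passing two open file objects streams both files without loading them into memory.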

Predictions

Predictions of the models for small and medium methods. For each model, there is one folder per beam size, holding the predictions made on the test set of the corresponding dataset (see above).

Each beam size folder contains the following files:

  • prediction.beam.mul.txt : the k predictions generated by the model, separated by the <SEP> token. The i-th prediction refers to the i-th buggy code in the test set;

  • prediction.beam.vis.txt : the same predictions, displayed one per line;

  • perfect.beam.mul.txt : the perfect fixes generated by the model. The file shows first the buggy code, then the fixed code, then the predicted code (equal to fixed);

  • pred.operations.txt : fine-grained AST operations emulated by the model when performing the perfect fixes.
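
The prediction files described above can be split into individual candidates as sketched below. The sketch assumes that each line of prediction.beam.mul.txt holds all k candidates for one buggy method, and that a perfect fix is a candidate equal to the fixed code once whitespace is normalized; both points are assumptions, and split_predictions and first_perfect are hypothetical names.

```python
from typing import List, Optional

SEP = "<SEP>"  # separator token named in the file description


def split_predictions(line: str) -> List[str]:
    """Split one line into its candidate fixes, dropping empty pieces."""
    return [p.strip() for p in line.split(SEP) if p.strip()]


def first_perfect(predictions: List[str], fixed: str) -> Optional[int]:
    """Return the 0-based rank of the first candidate equal to the fixed
    code, or None if no candidate matches. Runs of whitespace are
    collapsed before comparing (an assumption about the matching rule)."""
    target = " ".join(fixed.split())
    for rank, pred in enumerate(predictions):
        if " ".join(pred.split()) == target:
            return rank
    return None
```

Combined with the i-th line of the fixed test file, this gives a simple way to recount perfect fixes for a given beam size.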

Idioms