Data

We report here all the data used, extracted, and generated in our study:

  • Code Changes (Pull Requests)

  • Datasets

  • Predictions

  • Frequent Identifiers & Literals

Code Changes

Code changes mined and extracted from Pull Requests (PRs). We provide the data at different stages of mining and extraction.

Pull Requests

Pull Request metadata extracted during mining. The CSV file contains the following fields:

  • ID : Unique ID of the Pull Request

  • URL : URL of the Pull Request

  • Repo : Repository where the Pull Request was performed

  • Title : Title of the Pull Request

  • Date & Time : Date and Time when the Pull Request was opened
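
As a rough illustration of how the metadata could be consumed, the sketch below reads the CSV into one dictionary per Pull Request. It assumes that the header row uses the field names listed above verbatim (ID, URL, Repo, Title, Date & Time); the actual column names and date format in the file may differ. The function name load_pull_requests and the sample row are hypothetical.

```python
import csv
import io

def load_pull_requests(csv_file):
    """Return one dict per Pull Request row, keyed by the CSV header names."""
    return list(csv.DictReader(csv_file))

# In-memory sample standing in for the downloaded file.
# ASSUMPTION: header names match the field list above; the row is made up.
sample = io.StringIO(
    "ID,URL,Repo,Title,Date & Time\n"
    "1,https://example.org/pr/1,example/repo,Fix null check,2018-01-01 10:00:00\n"
)
rows = load_pull_requests(sample)
print(rows[0]["Repo"], rows[0]["Title"])
```

For the real file, the same function can be called on a handle opened with open(path, newline="", encoding="utf-8").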

Data

Download CSV file


Code from Pull Requests

Raw source code extracted from the PRs.

Each PR is represented by a folder with a numerical ID (<PR_ID>). In each PR folder there are two folders:

  • P_DIR: Java source code files before the PR

  • F_DIR: Java source code files after the PR is accepted and merged
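
The folder layout above can be traversed with a few lines of Python. This is a minimal sketch, assuming the archive has been unpacked into a root directory whose children are the <PR_ID> folders; whether the Java files sit directly in P_DIR and F_DIR or in nested sub-folders is not stated here, so the search is recursive. The helper names are hypothetical.

```python
from pathlib import Path
import tempfile

def java_files(folder):
    """Relative paths of all .java files under folder (empty if it is missing)."""
    if not folder.is_dir():
        return []
    return sorted(p.relative_to(folder).as_posix() for p in folder.rglob("*.java"))

def java_files_by_pr(root):
    """Map each <PR_ID> folder name to (files before the PR, files after the PR)."""
    return {
        pr_dir.name: (java_files(pr_dir / "P_DIR"), java_files(pr_dir / "F_DIR"))
        for pr_dir in sorted(Path(root).iterdir())
        if pr_dir.is_dir()
    }

# Build a throwaway tree with one made-up PR to show the expected shape.
with tempfile.TemporaryDirectory() as tmp:
    for side in ("P_DIR", "F_DIR"):
        target = Path(tmp, "42", side, "src")
        target.mkdir(parents=True)
        (target / "Foo.java").write_text("class Foo {}")
    demo = java_files_by_pr(tmp)

print(demo)
```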

Data

  • Android

  • Google

  • Ovirt


Extracted Method Pairs

Method pairs extracted from the PRs.

Each PR is represented by a folder with a numerical ID (<PR_ID>). In each PR folder there is a first level of folders representing the files, then a second level of folders representing the methods. In each method folder there are the following files:

  • before.java : Method's source code before the PR

  • after.java : Method's source code after the PR is accepted and merged

  • operations.txt : AST operations performed on the method as extracted by GumTreeDiff

  • signature.txt : Fully qualified signatures of the method before/after
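
A single method folder could be loaded as sketched below. The four file names come from the list above; everything else (the function name, the dictionary keys, and the folder names used in the demo) is hypothetical. The contents of operations.txt and signature.txt are returned as raw text, since their exact layout is not described here.

```python
from pathlib import Path
import tempfile

# File names as listed above, mapped to hypothetical dictionary keys.
PAIR_FILES = {
    "before": "before.java",
    "after": "after.java",
    "operations": "operations.txt",
    "signature": "signature.txt",
}

def load_method_pair(method_dir):
    """Read the four files of one method folder into a dict of raw strings."""
    method_dir = Path(method_dir)
    return {
        key: (method_dir / name).read_text(encoding="utf-8")
        for key, name in PAIR_FILES.items()
    }

# Throwaway folder <PR_ID>/<file>/<method> with made-up names and contents.
with tempfile.TemporaryDirectory() as tmp:
    method_dir = Path(tmp, "7", "Foo.java", "bar")
    method_dir.mkdir(parents=True)
    for name in PAIR_FILES.values():
        (method_dir / name).write_text("content of " + name, encoding="utf-8")
    pair = load_method_pair(method_dir)

print(sorted(pair))
```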

Data

  • Android

  • Google

  • Ovirt

Datasets

Dataset of method pairs used to train and evaluate the NMT models. The data is split into training (80%), validation (10%), and test (10%) sets. We provide several representations of the data.

Source Code

Method Pairs' source code before/after the PR. Each PR <ID> is in before/<ID>.java which maps to after/<ID>.java.
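
The before/after mapping can be checked with a short script. This sketch assumes a dataset directory that contains the before and after folders side by side, and it treats the file name without the .java extension as the <ID>. The function name paired_ids is hypothetical.

```python
from pathlib import Path
import tempfile

def paired_ids(dataset_dir):
    """IDs that have both before/<ID>.java and after/<ID>.java."""
    before = {p.stem for p in Path(dataset_dir, "before").glob("*.java")}
    after = {p.stem for p in Path(dataset_dir, "after").glob("*.java")}
    return sorted(before & after)

# Throwaway dataset: ID 1 is complete, ID 2 lacks its "after" file.
with tempfile.TemporaryDirectory() as tmp:
    for side, ids in (("before", ["1", "2"]), ("after", ["1"])):
        folder = Path(tmp, side)
        folder.mkdir()
        for pair_id in ids:
            (folder / (pair_id + ".java")).write_text("void m() {}")
    ids_found = paired_ids(tmp)

print(ids_found)
```

IDs are compared as strings, so the result is sorted lexicographically rather than numerically.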

Data

  • Android

  • Google

  • Ovirt


Source Code - Abstract Code

Method Pair's source code and abstracted code. The format of the files is the following:

====
<abstracted code before>
<abstracted code after>
----
<source code before>
----
<source code after>
(empty line)
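
A file in this format could be parsed along the following lines. This is a sketch under stated assumptions, not the tooling used in the study: it assumes that "====" and "----" appear only as separator lines on their own, that each abstracted method occupies one line, and that the source code may span several lines. The function names and the sample text are hypothetical.

```python
def to_record(body):
    """Turn the lines between two '====' separators into a dict."""
    first = body.index("----")
    second = body.index("----", first + 1)
    # The two abstracted methods are the non-empty lines before the first '----'.
    abstract = [line for line in body[:first] if line.strip()]
    return {
        "abstract_before": abstract[0],
        "abstract_after": abstract[1],
        "source_before": "\n".join(body[first + 1:second]).strip(),
        "source_after": "\n".join(body[second + 1:]).strip(),
    }

def parse_records(lines):
    """Split an iterable of lines into records, one per '====' separator."""
    bodies, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "====":
            if current is not None:
                bodies.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        bodies.append(current)
    return [to_record(body) for body in bodies]

# Made-up sample with two records; the real abstraction vocabulary may differ.
sample = (
    "====\n"
    "abstract before 1\n"
    "abstract after 1\n"
    "----\n"
    "void a() {}\n"
    "----\n"
    "void a() { return; }\n"
    "\n"
    "====\n"
    "abstract before 2\n"
    "abstract after 2\n"
    "----\n"
    "int b() {\n"
    "  return 0;\n"
    "}\n"
    "----\n"
    "int b() {\n"
    "  return 1;\n"
    "}\n"
    "\n"
)
records = parse_records(sample.splitlines())
print(len(records), records[0]["abstract_after"])
```

Because parse_records accepts any iterable of lines, an open file handle can be passed directly instead of sample.splitlines().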

Data

  • Android

  • Google

  • Ovirt


Abstract Code for NMT

Abstracted code changes used to train and evaluate the NMT models, provided for each repository and for small and medium methods.

Data

  • Android

  • Google

  • Ovirt

  • All

Predictions

Predictions of the NMT models on the test set, with different beam sizes (number of predictions for each input).

Data

  • Android

  • Google

  • Ovirt

  • All

Frequent Identifiers & Literals

Idioms