Data
We report here all the data used, extracted, and generated in our study:
Code Changes (Pull Requests)
Datasets
Predictions
Frequent Identifiers & Literals
Code Changes
Code changes mined and extracted from Pull Requests (PRs). We provide the data at different stages of mining and extraction.
Pull Requests
Pull Request metadata extracted during mining. The CSV file contains the following fields:
ID : Unique ID of the Pull Request
URL : URL of the Pull Request
Repo : Repository where the Pull Request was performed
Title : Title of the Pull Request
Date & Time : Date and time when the Pull Request was opened
Data
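As a minimal sketch of how the metadata could be loaded, the snippet below reads the CSV with Python's standard `csv` module and counts PRs per repository. It assumes the header row uses exactly the field names listed above (`ID`, `URL`, `Repo`, `Title`, `Date & Time`); the actual file may name or order its columns differently, and the function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def load_pull_requests(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse PR metadata rows from CSV lines (e.g. an open file object)."""
    # Assumption: the first line is a header naming the columns.
    return list(csv.DictReader(lines))


def count_by_repo(rows: List[Dict[str, str]]) -> Counter:
    """Count how many PRs each repository contributes."""
    # Assumption: the repository column is called "Repo".
    return Counter(row["Repo"] for row in rows)
```

With a real file, something like `with open(path, newline="", encoding="utf-8") as f: rows = load_pull_requests(f)` should work, where `path` points at the downloaded CSV.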
Code from Pull Requests
Raw source code extracted from the PRs.
Each PR is represented by a folder with a numerical ID (<PR_ID>). In each PR folder there are two folders:
P_DIR: Java source code files before the PR
F_DIR: Java source code files after the PR is accepted and merged
Android
Google
Ovirt
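The folder layout described above can be traversed with a few lines of Python. This is only a sketch: it assumes that a file keeps the same relative path inside `P_DIR` and `F_DIR`, which the description does not guarantee, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def java_files(root: Path) -> Dict[str, Path]:
    """Map each .java file under root to its path relative to root."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*.java")}


def changed_file_pairs(pr_dir: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (before, after) paths for files present in both P_DIR and F_DIR."""
    before = java_files(pr_dir / "P_DIR")
    after = java_files(pr_dir / "F_DIR")
    # Assumption: files are matched by identical relative path.
    for rel in sorted(before.keys() & after.keys()):
        yield before[rel], after[rel]
```

Files that exist on only one side (added or deleted by the PR) are skipped by this sketch.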
Extracted Method Pairs
Method pairs extracted from the PRs.
Each PR is represented by a folder with a numerical ID (<PR_ID>). In each PR folder there is a first level of folders representing the files, then a second level of folders representing the methods. Each method folder contains the following files:
before.java : Method's source code before the PR
after.java : Method's source code after the PR is accepted and merged
operations.txt : AST operations performed on the method as extracted by GumTreeDiff
signature.txt : Fully qualified signatures of the method before/after
Android
Google
Ovirt
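A possible way to load the extracted method pairs is sketched below. It assumes the three-level layout described above (`<PR_ID>/<file folder>/<method folder>/`) and that all four files are present in every method folder; the `MethodPair` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class MethodPair:
    pr_id: str
    file_name: str
    method_name: str
    before: str
    after: str
    operations: str
    signature: str


def _read(path: Path) -> str:
    # Tolerate unexpected byte sequences in mined source files.
    return path.read_text(encoding="utf-8", errors="replace")


def iter_method_pairs(root: Path) -> Iterator[MethodPair]:
    """Yield one MethodPair per <PR_ID>/<file>/<method> folder under root."""
    for before_path in sorted(root.glob("*/*/*/before.java")):
        method_dir = before_path.parent
        file_dir = method_dir.parent
        # Assumption: the sibling files exist; a missing one raises FileNotFoundError.
        yield MethodPair(
            pr_id=file_dir.parent.name,
            file_name=file_dir.name,
            method_name=method_dir.name,
            before=_read(before_path),
            after=_read(method_dir / "after.java"),
            operations=_read(method_dir / "operations.txt"),
            signature=_read(method_dir / "signature.txt"),
        )
```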
Datasets
Dataset of method pairs used to train and evaluate the NMT models. The data is split into training (80%), validation (10%), and test (10%) sets. We provide several representations of the data.
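The provided files are already partitioned, so no re-splitting is needed to use them. For readers who want to build a comparable partition from the raw method pairs, the sketch below shows one common way to obtain an 80/10/10 split: a seeded random shuffle followed by slicing. This is an assumption for illustration only; the page does not state how the original split was produced, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items reproducibly and split them into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```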
Source Code
Method pairs' source code before/after the PR. Each PR <ID> is in before/<ID>.java, which maps to after/<ID>.java.
Android
Google
Ovirt
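To inspect what changed in a given pair, the before/after files can be matched by name and rendered as a unified diff with Python's standard `difflib`. The sketch assumes the `before/` and `after/` folders sit side by side in one dataset directory; `dataset_dir` and the function names are hypothetical.

```python
import difflib
from pathlib import Path
from typing import Iterator, Tuple


def unified_change(before: str, after: str, file_id: str) -> str:
    """Render the edit from before/<ID>.java to after/<ID>.java as a unified diff."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"before/{file_id}.java",
            tofile=f"after/{file_id}.java",
        )
    )


def iter_changes(dataset_dir: Path) -> Iterator[Tuple[str, str]]:
    """Yield (ID, diff) for every before/<ID>.java that has a matching after file."""
    for before_path in sorted((dataset_dir / "before").glob("*.java")):
        after_path = dataset_dir / "after" / before_path.name
        if after_path.is_file():
            yield before_path.stem, unified_change(
                before_path.read_text(encoding="utf-8", errors="replace"),
                after_path.read_text(encoding="utf-8", errors="replace"),
                before_path.stem,
            )
```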
Source Code - Abstract Code
Method pairs' source code and abstracted code. The files have the following format:
====
<abstracted code before>
<abstracted code after>
----
<source code before>
----
<source code after>
(empty line)
Android
Google
Ovirt
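The record format above can be parsed with a small state-free splitter. The sketch below assumes that each abstracted method occupies exactly one line, that the source code may span several lines, and that no source line consists solely of `====` or `----`; these are assumptions read off the format description, and the `Record` type and function names are hypothetical.

```python
from typing import Iterator, List, NamedTuple


class Record(NamedTuple):
    abstract_before: str
    abstract_after: str
    source_before: str
    source_after: str


def _parse_block(lines: List[str]) -> Record:
    """Turn the lines between two '====' markers into a Record."""
    if len(lines) < 3 or lines[2] != "----":
        raise ValueError("expected two abstracted lines followed by '----'")
    rest = lines[3:]
    sep = rest.index("----")  # raises ValueError if the second separator is missing
    source_before = "\n".join(rest[:sep])
    # Drop the empty line that terminates each record.
    source_after = "\n".join(rest[sep + 1:]).rstrip("\n")
    return Record(lines[0], lines[1], source_before, source_after)


def parse_records(text: str) -> Iterator[Record]:
    """Yield every record found in the text of one dataset file."""
    block: List[str] = []
    for line in text.splitlines():
        if line == "====":
            if block:
                yield _parse_block(block)
            block = []
        else:
            block.append(line)
    if block:
        yield _parse_block(block)
```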
Abstract Code for NMT
Datasets of abstracted code changes, for each repository and for small and medium methods, used to train and evaluate the NMT models.
Android
Google
Ovirt
All
Predictions
Predictions of the NMT models on the test set, with different beam sizes (number of predictions for each input).
Android
Google
Ovirt
All
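Since a beam size of k yields k predictions per input, a flat list of predictions can be regrouped per input and compared against the expected code. The sketch below is illustrative only: it assumes the predictions for one input are stored consecutively, one per line, and that the target for each input is available as a single string. Neither the file layout nor the scoring rule is specified on this page, and the function names are hypothetical.

```python
from typing import List, Sequence


def group_by_beam(predictions: Sequence[str], beam_size: int) -> List[List[str]]:
    """Group a flat list of predictions into chunks of beam_size per input."""
    if beam_size <= 0 or len(predictions) % beam_size != 0:
        raise ValueError("prediction count must be a positive multiple of beam_size")
    return [
        list(predictions[i:i + beam_size])
        for i in range(0, len(predictions), beam_size)
    ]


def exact_match_rate(groups: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Fraction of inputs whose target appears among its candidate predictions."""
    if not targets or len(groups) != len(targets):
        raise ValueError("need one non-empty group of candidates per target")
    hits = sum(
        1
        for candidates, target in zip(groups, targets)
        if target.strip() in {c.strip() for c in candidates}
    )
    return hits / len(targets)
```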