Code Representations

We use the following representations of the code:

Identifiers: a stream of identifiers and constants used in the code;
AST: a stream of the node types that make up the code's abstract syntax tree (AST);
Bytecode: a stream of bytecode mnemonic opcodes (e.g., iload, invokevirtual) forming the compiled code;
CFG: the control-flow graph (CFG) of the code fragment.
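To make the Identifiers representation concrete, the following is a minimal Python sketch that turns a code fragment into a stream of identifiers and constants. It assumes Java source (suggested by the JVM opcodes listed above) and uses a naive regex tokenizer with a partial keyword list, both hypothetical. It is an illustration only, not the extraction tool used to build the dataset, and it ignores comments, char literals and less common numeric literal forms.

```python
import re

# Hypothetical, partial list of Java keywords to skip; NOT the dataset's list.
JAVA_KEYWORDS = {
    "abstract", "boolean", "break", "byte", "case", "catch", "char", "class",
    "continue", "default", "do", "double", "else", "extends", "final", "finally",
    "float", "for", "if", "implements", "import", "instanceof", "int",
    "interface", "long", "new", "package", "private", "protected", "public",
    "return", "short", "static", "super", "switch", "this", "throw", "throws",
    "try", "void", "while",
}

# Naive token pattern: string literals, identifiers, integer/decimal numbers.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?')


def identifier_stream(source: str) -> list[str]:
    """Return identifiers and constants in order of appearance, skipping keywords."""
    return [tok for tok in TOKEN_RE.findall(source) if tok not in JAVA_KEYWORDS]


if __name__ == "__main__":
    print(identifier_stream("public int add(int a) { return a + 1; }"))
```

For the one-line method in the usage example, the keywords public, int and return are dropped, leaving the method name, the parameter (twice) and the constant 1.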

Representations for Projects

The zipped file contains one folder for each project in the Projects dataset. Each project folder contains one subfolder per representation, and each representation subfolder contains two text files: methods.src for the method representations and types.src for the class representations. Each line of a representation file represents a single artifact (i.e., a method or a class). The signatures of the artifacts are stored in the project-level .key files (methods.src.key and types.src.key), whose lines correspond line by line to the lines of the representation files. The following summarizes the structure of the dataset:

<project>
- <representation>
  - methods.src
  - types.src
- methods.src.key
- types.src.key
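The line-by-line correspondence between a .key file and a representation file can be exploited with a small loader. The Python sketch below pairs each signature with its token stream for one project, representation and level. It assumes the dataset has been extracted to disk, that files are UTF-8, and that tokens on a representation line are whitespace-separated; the token separator and the exact representation folder names (e.g. "bytecode") are assumptions, not documented here. The function name is hypothetical.

```python
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    # Text-mode iteration splits on newlines only (after universal-newline translation).
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def load_project_representation(project_dir, representation, level="methods"):
    """Return (signature, tokens) pairs for one project.

    Assumed layout (from the structure listed above):
      <project>/<representation>/<level>.src   one artifact per line
      <project>/<level>.src.key                one signature per line, same order
    Assumption: tokens within a line are whitespace-separated.
    """
    if level not in ("methods", "types"):
        raise ValueError("level must be 'methods' or 'types'")
    project = Path(project_dir)
    src_path = project / representation / f"{level}.src"
    key_path = project / f"{level}.src.key"
    reps = _read_lines(src_path)
    keys = _read_lines(key_path)
    # The two files are only usable together if they are line-aligned.
    if len(reps) != len(keys):
        raise ValueError(
            f"{src_path} has {len(reps)} lines but {key_path} has {len(keys)}"
        )
    return [(key, rep.split()) for key, rep in zip(keys, reps)]
```

Because every .key line is paired positionally with a .src line, the loader refuses files of different lengths instead of silently misaligning signatures and representations. A call such as load_project_representation(path, "bytecode", "types") would return the class-level bytecode streams of one project, under the folder-name assumption above.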

Download Representations for Projects

Representations for Libraries

The zipped file contains the bytecode representation of all 47 libraries in the Library dataset. The representation is available at class level only and is aggregated across all libraries into a single pair of files. The following summarizes the structure of the dataset:

types.src
types.key
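Since the library representation is a single line-aligned pair of files, it can also be read straight from the archive without extracting it. The Python sketch below does this with the standard zipfile module. It assumes UTF-8 files, whitespace-separated opcodes, and that exactly one archive member is named types.src and one types.key (their folder inside the zip is not documented, so members are matched by file name). The function names are hypothetical.

```python
import io
import zipfile


def _member_lines(zf: zipfile.ZipFile, basename: str) -> list[str]:
    # Assumption: exactly one archive member has this file name, in any folder.
    matches = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == basename]
    if len(matches) != 1:
        raise KeyError(f"expected one member named {basename}, found {matches}")
    with io.TextIOWrapper(zf.open(matches[0]), encoding="utf-8") as text:
        return [line.rstrip("\r\n") for line in text]


def load_library_types(zip_path):
    """Return (class signature, opcode list) pairs from the aggregated library files."""
    with zipfile.ZipFile(zip_path) as zf:
        keys = _member_lines(zf, "types.key")
        reps = _member_lines(zf, "types.src")
    if len(keys) != len(reps):
        raise ValueError(f"types.key has {len(keys)} lines, types.src has {len(reps)}")
    # A list of pairs (not a dict) keeps duplicate class names from different libraries.
    return [(key, rep.split()) for key, rep in zip(keys, reps)]
```

Returning a list rather than a dictionary is a deliberate choice: because all libraries are aggregated into one file, two libraries could in principle contain classes with the same signature, and a dictionary would silently keep only one of them.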

Download Representations for Libraries
