The datasets can be downloaded here.

The datasets comprise two benchmarks, listed below.

BigCloneBench [1] is a benchmark widely used to evaluate code clone detection; 98% of its clones are Type-III/Type-IV. It is mined from IJaDataset 2.0, and its clones were confirmed by three experts. For comparison, we use the version of BigCloneBench used by Wei and Li [2], which contains 9,134 code fragments.

Modified-BigCloneBench is an improved dataset derived from BigCloneBench that is more appropriate for validating and comparing approaches to detecting Type-III/Type-IV clones.

[1] Jeffrey Svajlenko and Chanchal K. Roy. 2015. Evaluating clone detection tools with BigCloneBench. In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 131–140.

[2] Huihui Wei and Ming Li. 2017. Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code. In IJCAI. 3034–30
