RQ4: Are DL-based models applicable for detecting clones among different projects?

The goal of this research question is two-fold. On the one hand, we want to instantiate our approach in a realistic usage scenario in which only one code representation is available. On the other hand, we want to show that Deep Learning-based models can be used to identify inter-project clones. The latter addresses one of the major limitations of previous work, where, given the potentially large vocabulary of identifier-based corpora, the approach was evaluated only on intra-project clones.

RQ 4.1 - Duplicated Code

The zipped file contains the duplicated code identified among the Apache Commons libraries. For each pair of libraries, we report the list of identified candidates as well as the actual source code. The archive is structured as follows:

  • <library1>-<library2>

    • candidates.csv

    • <library1>

      • <class1>.java

      • ...

    • <library2>

      • <class1>.java

      • ...
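To inspect the archive once it is extracted, the layout above can be traversed with a short script. The following is a minimal sketch, not part of the replication package: the function name `summarize_pairs` is hypothetical, and because the column layout of `candidates.csv` is not described here, the script only counts non-empty rows (including a header row, if one exists) rather than interpreting them.

```python
import csv
from pathlib import Path


def summarize_pairs(root):
    """Summarize each <library1>-<library2> directory found under root.

    Returns a dict mapping the pair directory name to the number of
    non-empty rows in candidates.csv and the .java files per library.
    """
    summary = {}
    for pair_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        candidates = pair_dir / "candidates.csv"
        if not candidates.is_file():
            continue  # skip anything that is not a library-pair directory
        with candidates.open(newline="", encoding="utf-8") as handle:
            # Assumption: the CSV schema is unknown, so rows are only counted.
            n_rows = sum(1 for row in csv.reader(handle) if row)
        # Each subdirectory holds the source files of one library of the pair.
        sources = {
            lib_dir.name: sorted(f.name for f in lib_dir.rglob("*.java"))
            for lib_dir in sorted(pair_dir.iterdir())
            if lib_dir.is_dir()
        }
        summary[pair_dir.name] = {"rows": n_rows, "sources": sources}
    return summary
```

Calling `summarize_pairs` on the directory where the archive was extracted returns one entry per library pair, which gives a quick check that every `candidates.csv` is accompanied by the source files of both libraries.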

RQ 4.2 - Imported Classes

The zipped file contains the shared classes identified from the libraries imported in commons-weaver-1.3. The classes are organized by imported library. We also include the pom.xml files of commons-weaver-1.3 in which the imported libraries are referenced.
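To cross-check the imported libraries against the included pom.xml files, the declared dependencies can be listed with the Python standard library. This is a minimal sketch under stated assumptions: the function name `list_dependencies` is hypothetical, and the files are assumed to use the standard Maven POM 4.0.0 namespace. Versions may be empty or appear as property placeholders when they are inherited or defined elsewhere.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace (assumed to be declared in the pom.xml files).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def list_dependencies(pom_path):
    """Return (groupId, artifactId, version) for each <dependency> element.

    Note: this matches every <dependency> in the file, including those
    under <dependencyManagement> or inside plugin declarations.
    """
    root = ET.parse(str(pom_path)).getroot()
    deps = []
    for dep in root.iterfind(".//m:dependency", POM_NS):
        # Missing child elements (e.g. an inherited version) become "".
        deps.append(tuple(
            dep.findtext(f"m:{tag}", default="", namespaces=POM_NS).strip()
            for tag in ("groupId", "artifactId", "version")
        ))
    return deps
```

The resulting tuples can then be compared by hand with the per-library directories in the archive.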