One of the major drawbacks of DL-based models is their long training time compared with other techniques. This cost could be amortized if a trained model could be reused across projects belonging to different domains. The main factor that hinders such reuse is the possible variability of the vocabulary in new, unseen projects.
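To make the vocabulary concern concrete, the sketch below measures how many tokens of a new project fall outside the vocabulary built from the training project (the out-of-vocabulary rate). It is only an illustration under stated assumptions: it assumes the model's vocabulary is a plain set of identifier tokens, and the token lists and the helper names `build_vocabulary` and `oov_rate` are hypothetical, not taken from the study.

```python
from typing import Iterable, List, Set


def build_vocabulary(tokens: Iterable[str]) -> Set[str]:
    # Hypothetical helper: the vocabulary is simply the set of tokens
    # observed in the training project.
    return set(tokens)


def oov_rate(vocabulary: Set[str], tokens: Iterable[str]) -> float:
    # Hypothetical helper: fraction of target-project tokens that the
    # model never saw during training.
    token_list: List[str] = list(tokens)
    if not token_list:
        return 0.0
    unseen = sum(1 for t in token_list if t not in vocabulary)
    return unseen / len(token_list)


# Toy, illustrative token lists (not data from the study).
source_tokens = ["IndexWriter", "addDocument", "close", "Term"]
target_tokens = ["Session", "save", "close", "Transaction"]

vocab = build_vocabulary(source_tokens)
rate = oov_rate(vocab, target_tokens)
print(f"OOV rate: {rate:.2f}")
```

A high rate suggests that much of the new project's input would be unknown to the reused model, which is the variability that can limit cross-project reuse.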
In this Research Question, we show that an AST model trained on one project can be successfully reused to detect similar code fragments in a different project.
The following zip files contain all the method- and class-level candidates extracted using a reused AST model (i.e., a model trained on the Lucene project and executed on all the other projects). These candidates have been compared with the original list of candidates available in RQ1.
The following zip file contains the training and test sets of the reused CloneDetector model. In particular, the training set consists only of the manually validated instances from a single project (hibernate), while the test set contains all the instances from the remaining 9 projects in the Projects dataset.
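The cross-project split described above (train on one project, test on all the others) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `Instance` record, its fields, and the toy project names other than hibernate and lucene are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    project: str    # project the candidate pair was mined from
    pair_id: str    # hypothetical identifier of the candidate pair
    is_clone: bool  # manually validated label


def cross_project_split(
    instances: List[Instance], train_project: str
) -> Tuple[List[Instance], List[Instance]]:
    # Training set: only the instances of the chosen project.
    train = [i for i in instances if i.project == train_project]
    # Test set: every instance from all the remaining projects.
    test = [i for i in instances if i.project != train_project]
    return train, test


# Toy data; "project_x" is a hypothetical placeholder project.
data = [
    Instance("hibernate", "h1", True),
    Instance("hibernate", "h2", False),
    Instance("lucene", "l1", True),
    Instance("project_x", "x1", False),
]

train, test = cross_project_split(data, "hibernate")
print(len(train), len(test))
```

Splitting by project rather than at random keeps every test instance from a project the model never saw during training, which is what the reuse experiment needs.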