CodeSearchNet
CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language. This research is a continuation of some ideas presented in this blog post and is a joint collaboration between GitHub and the Deep Program Understanding group at Microsoft Research - Cambridge.
Here is the original dataset website.
In our experiment, we use the same preprocessed dataset provided at the CodeBERT link.
CodeKG
The dataset contains 1,288,071 methods, of which 218,737 have documentation. Among the documented methods, 21,974 have a method caller and 56,461 have a method callee (all callers and callees can be found in the dataset). In addition, 122,119 methods contain field information, 121,186 contain co-location information, and 218,737 contain class (declaration) information.
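To put these counts in proportion, the short sketch below computes what share of the methods falls into each group. The numbers are copied from the paragraph above. The variable and function names are illustrative only and are not part of the dataset or its tooling. It also assumes that the field, co-location, and class counts refer to documented methods, which the matching class count (218,737) suggests but the description does not state outright.

```python
# Counts copied from the CodeKG description above.
TOTAL_METHODS = 1_288_071
DOCUMENTED = 218_737

# Methods carrying each kind of context.
# Assumption: all of these are counted among the documented methods.
context_counts = {
    "caller": 21_974,
    "callee": 56_461,
    "field": 122_119,
    "co-location": 121_186,
    "class": 218_737,
}


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of all methods that have documentation.
documented_share = share(DOCUMENTED, TOTAL_METHODS)

# Share of documented methods that carry each kind of context.
context_shares = {
    name: share(count, DOCUMENTED) for name, count in context_counts.items()
}

print(f"documented: {documented_share}% of all methods")
for name, pct in context_shares.items():
    print(f"{name}: {pct}% of documented methods")
```

Under that assumption, caller and callee information is the sparsest kind of context, while class information is available for every documented method.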
Here is the original dataset website.
This is the link to the preprocessed version we use for our experiment.
Funcom
Here is the original dataset website.
This is the link to the preprocessed version we use for our experiment.
CosBench
Here is the original dataset website.
This is the link to the preprocessed version we use for our experiment.