Dataset

CodeSearchNet

CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language. This research is a continuation of some ideas presented in this blog post and is a joint collaboration between GitHub and the Deep Program Understanding group at Microsoft Research - Cambridge.

Here is the original dataset website.

In our experiment, we use the same preprocessed dataset provided at the CodeBERT link.

CodeKG

The dataset contains 1,288,071 methods, of which 218,737 have documentation. Among the documented methods, 21,974 have a method caller and 56,461 have a method callee (all callers and callees can be found in the dataset). In addition, 122,119 methods contain field information, 121,186 contain co-location information, and 218,737 contain class information (declaration information).
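To put these counts in proportion, here is a minimal sketch that turns them into coverage ratios. It assumes that every context count (caller, callee, field, co-location, class) refers to a subset of the 218,737 documented methods; the description does not say so explicitly for the last three. The names `coverage`, `documented_share`, and `subset_shares` are illustrative and not part of the dataset.

```python
# Counts copied from the CodeKG description above.
TOTAL_METHODS = 1_288_071
DOCUMENTED_METHODS = 218_737

# Assumption: each count below is a subset of the documented methods.
DOCUMENTED_SUBSETS = {
    "caller": 21_974,
    "callee": 56_461,
    "field": 122_119,
    "co-location": 121_186,
    "class": 218_737,
}


def coverage(count: int, total: int) -> float:
    """Return count as a fraction of total."""
    return count / total


# Share of all methods that come with documentation.
documented_share = coverage(DOCUMENTED_METHODS, TOTAL_METHODS)

# Share of documented methods that carry each kind of context.
subset_shares = {
    name: coverage(count, DOCUMENTED_METHODS)
    for name, count in DOCUMENTED_SUBSETS.items()
}

print(f"documented: {documented_share:.2%} of all methods")
for name, share in subset_shares.items():
    print(f"{name}: {share:.2%} of documented methods")
```

Under that assumption, roughly one method in six is documented, and caller information is the sparsest kind of context among the documented methods.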

Here is the original dataset website.

This is the link to the preprocessed version that we use in our experiment.

Funcom

Here is the original dataset website.

This is the link to the preprocessed version that we use in our experiment.

CosBench

Here is the original dataset website.

This is the link to the preprocessed version that we use in our experiment.
