We crawled the 100 most-starred Java projects on GitHub in September 2020. We then extracted abstract syntax trees and performed static analysis on the extracted code snippets.
The dataset contains 1,288,071 methods, of which 218,737 have documentation.
Among the documented methods, 21,974 have a method caller and 56,461 have a method callee (every caller and callee can be found in the dataset). In addition, 122,119 methods have field information, 121,186 have co-location information, and 218,737 have class information (declaration information).
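The breakdown above amounts to a single pass over the method records. The following is a minimal sketch of such a pass, assuming each method is a dictionary with the hypothetical keys `doc`, `callers`, `callees`, `fields`, `colocated`, and `class_info`; these names are illustrative only, and the released data may use a different schema.

```python
from collections import Counter

# Hypothetical field names; the released data may use different keys.
CONTEXT_KEYS = ("callers", "callees", "fields", "colocated", "class_info")


def context_statistics(methods):
    """Count all methods, documented methods, and, among the documented
    ones, how many carry each kind of context information."""
    stats = Counter()
    for method in methods:
        stats["total"] += 1
        if not method.get("doc"):
            continue  # undocumented methods do not count toward context totals
        stats["documented"] += 1
        for key in CONTEXT_KEYS:
            if method.get(key):  # present and non-empty
                stats[key] += 1
    return dict(stats)


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the dataset.
    sample = [
        {"doc": "Returns x.", "callers": ["A.run"], "class_info": "class A"},
        {"doc": "", "callers": ["B.run"]},
    ]
    print(context_statistics(sample))
```

Context counts are taken only over documented methods, which matches the way the figures above are reported.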
Link1: the raw data crawled from GitHub can be downloaded from this link.
Link2: the code used to extract the code knowledge graph from the raw data can be downloaded from this link.
Link3: the code knowledge graph in JSON format can be downloaded from this link.
Link4: the code knowledge graph can be visualized with Neo4j, as shown below. The Neo4j project resources and deployment instructions can be downloaded from this anonymous link.