We collected papers from top software engineering conferences, including ICSE, FSE, ASE, and ISSTA. To ensure timeliness, relevance, and quality, we considered only papers that were published in the past two years, are at least 10 pages long, and contain the keyword "GitHub" or "open source" in their titles. This filtering yielded 24 papers.
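The automatic filtering step amounts to a conjunction of four predicates over each paper's metadata. The sketch below is illustrative only and makes several assumptions not stated above. The `Paper` record and the `passes_filter`/`select` helpers are hypothetical. The two-year window is passed in as an explicit `min_year`. Keyword matching is case-insensitive and treats "open-source" and "open source" as equivalent.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: the venue set is limited to the four conferences named in the text.
VENUES = {"ICSE", "FSE", "ASE", "ISSTA"}
# Assumption: keywords are matched case-insensitively as substrings of the title.
KEYWORDS = ("github", "open source")
MIN_PAGES = 10


@dataclass
class Paper:
    """Hypothetical metadata record for one candidate paper."""
    title: str
    venue: str
    year: int
    pages: int


def passes_filter(paper: Paper, min_year: int) -> bool:
    """Return True if the paper meets all four automatic criteria."""
    # Normalize hyphens so that "Open-Source" also matches "open source".
    title = paper.title.lower().replace("-", " ")
    return (
        paper.venue in VENUES
        and paper.year >= min_year          # publication-time window (assumed encoding)
        and paper.pages >= MIN_PAGES        # at least 10 pages
        and any(k in title for k in KEYWORDS)
    )


def select(papers: Iterable[Paper], min_year: int) -> List[Paper]:
    """Keep only the papers that pass the automatic filter."""
    return [p for p in papers if passes_filter(p, min_year)]
```

The remaining manual step, which checks whether a paper states explicit dataset-construction criteria, requires human judgment and is deliberately not modeled here.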
We then manually inspected these papers and excluded 9 that did not state explicit criteria for selecting the open-source software used to construct their datasets. The remaining 15 papers were retained for the metrics analysis.