Study Design
We acknowledge the threat to validity posed by selecting repositories from a single source. To measure this threat, we conducted a study on vcpkg.io (a C/C++ dependency manager from Microsoft, which curates more than 2000 commonly used C TPLs). For each TPL, we identified its source from the homepage listed in its project information.
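The homepage-based source lookup can be sketched as follows. This is a minimal illustration, not the script used in the study: the function name `classify_source`, the host-to-category table, and the example URLs are assumptions introduced here, and the category labels simply mirror the source categories used in this study.

```python
from urllib.parse import urlparse

# Hypothetical mapping from a homepage domain to a source category.
# The exact hosts checked in the study are not stated; these are assumptions.
KNOWN_HOSTS = [
    ("github.com", "GitHub"),
    ("gitlab.com", "GitLab"),
    ("sourceforge.net", "SourceForge"),
    ("freedesktop.org", "FreeDesktop"),
    ("kernel.org", "kernel.org"),
    ("savannah.gnu.org", "savannah.gnu.org"),
]


def classify_source(homepage):
    """Return the source category for a TPL homepage URL."""
    if not homepage or not homepage.strip():
        return "no information"
    url = homepage.strip()
    # urlparse only fills in the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower()
    for domain, label in KNOWN_HOSTS:
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return label
    return "individual site"


if __name__ == "__main__":
    for page in ["https://github.com/example/lib", "www.example.org/lib", ""]:
        print(repr(page), "->", classify_source(page))
```

Matching on the parsed host rather than on a substring of the URL avoids misclassifying an individual site whose path merely mentions another hosting platform.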
Result Analysis
Of the 2369 projects, 259 did not provide source information. The sources of the remaining 2110 projects are GitHub (73.74%, 1747 cases), GitLab (1.06%, 25 cases), SourceForge (2.07%, 49 cases), FreeDesktop (2.41%, 57 cases), kernel.org (0.13%, 3 cases), savannah.gnu.org (0.13%, 3 cases), and individual sites (9.54%, 226 cases), where each percentage is relative to all 2369 projects. We further inspected the 363 non-GitHub TPLs. Of these, 90.35% (328 cases) have a similar repository, or a repository from the same software family, on GitHub that is included in our repository list. Therefore, although relying on a single data source misses some repositories, it has limited influence on data coverage and diversity.
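The arithmetic behind these figures can be checked with a short sketch. The counts are copied from the paragraph above; the variable names are ours, and the choice of all 2369 projects as the denominator is inferred from the reported percentages rather than stated explicitly.

```python
# Counts as reported in the result analysis.
TOTAL_PROJECTS = 2369
NO_INFORMATION = 259
COVERED_ON_GITHUB = 328  # non-GitHub TPLs with a related GitHub repository

source_counts = {
    "GitHub": 1747,
    "GitLab": 25,
    "SourceForge": 49,
    "FreeDesktop": 57,
    "kernel.org": 3,
    "savannah.gnu.org": 3,
    "individual site": 226,
}

# Projects with a known source must equal the total minus those without information.
known_total = sum(source_counts.values())
assert known_total == TOTAL_PROJECTS - NO_INFORMATION

# Shares are computed over all projects, including those without information.
shares = {name: 100 * n / TOTAL_PROJECTS for name, n in source_counts.items()}

# Everything with a known source other than GitHub.
non_github = known_total - source_counts["GitHub"]
coverage_ratio = COVERED_ON_GITHUB / non_github

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {source_counts[name]} cases, {share:.2f}%")
    print(f"non-GitHub TPLs: {non_github}")
    print(f"covered by a related GitHub repository: {coverage_ratio:.1%}")
```

The category counts sum to 2110, which matches the 2369 projects minus the 259 without information, and the non-GitHub categories sum to the 363 cases inspected further.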