Dataset collection
We collected representative open-source Java programs of various sizes in the following two steps.
(1) We first collected open-source Java programs from repositories that are built with Ant, Maven, or Gradle, since such programs are more likely to be packaged successfully. This yielded an initial list of 3,500 programs.
(2) We then selected representative programs from this list by applying two further criteria:
① at least one other package must depend on the program's package, and ② at least one new package must have started depending on it within the last three years. This left 1,049 programs that can be packaged successfully; for each, we used the latest version available as of August 2022.
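The selection in step (2) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Package` record, its `dependent_release_dates` field, and the function names are hypothetical, and we assume that "new packages relying on them" means dependents first published inside the three-year window ending at the collection date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Package:
    """Hypothetical metadata record for one candidate program."""
    name: str
    # First-release dates of the packages that depend on this one (assumed field).
    dependent_release_dates: List[date] = field(default_factory=list)


def years_before(d: date, years: int) -> date:
    """Return the date `years` years before `d`, clamping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 mapped onto a non-leap year
        return d.replace(year=d.year - years, day=28)


def is_representative(pkg: Package, cutoff: date, window_years: int = 3) -> bool:
    # Criterion 1: at least one other package depends on this one.
    if not pkg.dependent_release_dates:
        return False
    # Criterion 2: at least one dependent appeared within the recent window.
    window_start = years_before(cutoff, window_years)
    return any(window_start <= d <= cutoff for d in pkg.dependent_release_dates)


def select_representative(pkgs: List[Package], cutoff: date) -> List[Package]:
    """Keep only the packages that satisfy both criteria."""
    return [p for p in pkgs if is_representative(p, cutoff)]
```

With a cutoff of `date(2022, 8, 31)`, a package whose only dependent was released in 2015 is dropped by criterion ②, and a package with no dependents is dropped by criterion ①.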
Experimental setup
We ran the experiments on a server with 80 vCPUs (2 × Intel® Xeon® Gold 6248 CPU @ 2.50 GHz), 188 GB of RAM, and 64-bit Ubuntu 18.04 (GNU/Linux) as the operating system.
To speed up the experiments and avoid interference from cloud-disk I/O, we copied the programs to a RAM disk before running the performance analysis; the RAM operates at 2,933 MT/s.
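Staging a program on a RAM disk might look like the sketch below. It is not the script used in the experiments: the function name `stage_in_ram` is hypothetical, and the default mount point `/dev/shm` is an assumption (it is a tmpfs mount on most Linux distributions, but the actual RAM disk location is not stated here).

```python
import shutil
from pathlib import Path


def stage_in_ram(src, ram_root=Path("/dev/shm")):
    """Copy the program directory `src` under `ram_root` and return the new path.

    `ram_root` is assumed to be a RAM-backed (tmpfs) mount point.
    """
    src = Path(src)
    dest = Path(ram_root) / src.name
    # Remove any stale copy so each run starts from a clean tree.
    if dest.exists():
        shutil.rmtree(dest)
    shutil.copytree(src, dest)
    return dest
```

The analysis would then be pointed at the returned path, so that every read and write during the run hits memory rather than the network-attached disk.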