Details of Research Questions
Evaluation of OSSScope and RClassifier
Pre-study of quantifying θ
The threshold θ controls the extent of filtering. We evaluated θ at 5%, 10%, 15%, 20%, and 25%. The detailed SCA results are listed above; they are not included in the paper.
With θ set to 15%, the F1 score reaches its highest value, 89.36%.
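The selection of θ described above can be sketched as a small sweep: compute F1 from the true positive, false positive, and false negative counts at each candidate θ, then keep the θ with the highest score. This is a minimal sketch, not the study's actual tooling. The function names and the counts in `example_counts` are hypothetical placeholders chosen for illustration; they are not the measured results.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(results: Dict[float, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (theta, f1) for the candidate theta with the highest F1 score."""
    scored = {theta: f1_from_counts(*counts) for theta, counts in results.items()}
    theta = max(scored, key=scored.get)
    return theta, scored[theta]


# Hypothetical (TP, FP, FN) counts per theta, for illustration only.
# These are NOT the numbers measured in the study.
example_counts = {
    0.05: (80, 30, 20),
    0.10: (85, 20, 15),
    0.15: (90, 10, 10),
    0.20: (85, 8, 15),
    0.25: (75, 5, 25),
}

theta, f1 = best_threshold(example_counts)
print(f"best theta = {theta:.0%}, F1 = {f1:.2%}")
```

To reproduce the actual selection, replace the placeholder counts with the true positive, false positive, and false negative values from the results listed above.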
RQ1: Effectiveness
When the same SCA algorithm is applied, the function feature database provided by OSSScope performs significantly better on the ground-truth dataset than the original feature library selection method: precision and recall increase by 0.87% and 22.65%, respectively. Although the outcome is still limited by the judgment criteria of the SCA algorithm, OSSScope improves SCA results by improving data quality.
RQ2: Ablation Study
The ablation study shows how each step affects the SCA results. The metrics filtering step covers most of the TPL repositories. The supplementary closure step then restores repositories that were mistakenly filtered out. Finally, the repository filtering step avoids adding too many duplicated or useless features, which improves accuracy and efficiency. With these three steps, OSSScope significantly enhances the quality of the feature database while preserving detection efficiency.
The detailed SCA results are listed above. The true positive, false positive, and false negative counts and the F1 score are not listed in the paper.
RQ3: Characteristics
OSSScope includes repositories containing between 20 and 1,392 functions. These repositories are actively maintained and meet a minimum level of popularity, which is consistent with the basic metrics. Excluded repositories, which often have high function counts or specialized content, are unsuitable as TPLs because their content is either too specific or duplicates other repositories.
These data can be found in final-repo-list and filtered-by-low-value-repo.
RQ4: Generalization
Based on the results of OSSScope, we propose a lightweight model named RClassifier. When migrated to other language ecosystems, RClassifier can still identify a high-quality TPL feature scope for SCA detection. The scope created by RClassifier reaches 84.61% precision and 92.54% recall on the manually labeled ground truth. Compared with OSSScope, it greatly reduces time overhead and computing resource usage.
The detailed SCA results are listed above. The true positive, false positive, and false negative counts and the F1 score are not listed in the paper.
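For readers who want to relate the RQ4 precision and recall to an F1 score, the harmonic-mean formula can be applied directly. The sketch below is plain arithmetic on the two rounded percentages quoted above. The helper name is hypothetical, and because the inputs are rounded, the result may differ slightly from an F1 computed from the raw counts.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Inputs are the rounded RQ4 figures quoted in the text (84.61% and 92.54%).
# The derived value is an approximation, not a figure reported by the authors.
rclassifier_f1 = f1_from_precision_recall(0.8461, 0.9254)
print(f"approximate F1 = {rclassifier_f1:.2%}")
```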