Details of Research Questions

Evaluation of OSSScope and RClassifier

Pre-study of quantifying θ

The threshold θ controls the extent of filtering. We evaluated θ at 5%, 10%, 15%, 20%, and 25%. The detailed SCA results for each value, which are not included in the paper, are listed above.

With θ set to 15%, the F1 score reaches its highest value, 89.36%.
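The selection of θ by F1 score can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper names `prf1` and `best_threshold` are hypothetical, and the counts in `example_counts` are placeholders chosen only to make the example runnable. They are not the values from SCA-result.xlsx.

```python
from typing import Dict, Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(counts: Dict[float, Tuple[int, int, int]]) -> float:
    """Pick the threshold whose (TP, FP, FN) counts give the highest F1."""
    return max(counts, key=lambda theta: prf1(*counts[theta])[2])


# Placeholder (TP, FP, FN) counts per theta; NOT the study's data.
example_counts = {
    0.05: (80, 30, 20),
    0.10: (85, 20, 15),
    0.15: (90, 10, 10),
    0.20: (85, 8, 15),
    0.25: (75, 5, 25),
}

if __name__ == "__main__":
    for theta, (tp, fp, fn) in example_counts.items():
        p, r, f1 = prf1(tp, fp, fn)
        print(f"theta={theta:.0%}  precision={p:.2%}  recall={r:.2%}  F1={f1:.2%}")
    print("best theta:", best_threshold(example_counts))
```

Replacing the placeholder counts with the per-threshold True Positive, False Positive, and False Negative values from the results table would reproduce the comparison across the five θ values.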

RQ1: Effectiveness

When the same SCA algorithm is applied, the function feature database provided by OSSScope performs significantly better on the ground-truth dataset than the original feature library selection method: precision and recall increase by 0.87% and 22.65%, respectively. Although the results remain limited by the judgment criteria of the SCA algorithm, OSSScope improves them through better data quality.

SCA-result.xlsx

RQ2: Ablation Study

The ablation study shows how each step affects the SCA results. The metrics filtering step covers most of the TPL repositories. The supplementary closure step then restores repositories that were mistakenly filtered out. Finally, the repository filtering step avoids adding too many duplicated or useless features, which improves accuracy and efficiency. Together, these three steps let OSSScope significantly enhance the quality of the feature database while preserving detection efficiency.

The detailed SCA results are listed above. The true positive, false positive, and false negative counts and the F1 score are not listed in the paper.

RQ3: Characteristics

The repositories included by OSSScope contain between 20 and 1,392 functions and show active maintenance and at least minimal popularity, which aligns with the basic metrics. Excluded repositories, which often have high function counts or specialized content, are unsuitable as TPLs because their content is too specific or duplicates other repositories.

These data can be found in final-repo-list and filtered-by-low-value-repo.

RQ4: Generalization

Based on the results of OSSScope, we propose a lightweight model named RClassifier. When migrated to other language ecosystems, RClassifier can still identify a high-quality TPL feature scope for SCA detection. The scope it creates reaches 84.61% precision and 92.54% recall on the manually labeled ground truth. Compared to OSSScope, it greatly reduces time overhead and computing resource usage.
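For reference, the F1 score implied by the precision and recall quoted above can be derived as their harmonic mean. The snippet below is a small sketch with a hypothetical helper name (`f1_from_pr`); it assumes both figures are computed over the same set of detections, in which case the result is roughly 88.4%.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# Precision and recall as reported for the RClassifier scope.
rclassifier_f1 = f1_from_pr(0.8461, 0.9254)

if __name__ == "__main__":
    print(f"Implied F1: {rclassifier_f1:.2%}")
```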

The detailed SCA results are listed above. The true positive, false positive, and false negative counts and the F1 score are not listed in the paper.
