Supplementary material
Authors
Ilias Kalouptsoglou, Miltiadis Siavvas, Apostolos Ampatzoglou, Dionysios Kehagias, and Alexander Chatzigeorgiou
Abstract
Software security is now considered a major aspect of software quality, as the number of vulnerabilities discovered in software products keeps growing. Vulnerability Prediction is a mechanism that identifies the components of a software product that may contain security vulnerabilities. It is considered beneficial in practice, since it helps software engineers prioritize their testing and inspection efforts for detecting and fixing security flaws in the source code. This paper describes the results of a Systematic Mapping Study of 180 primary studies in the field of Vulnerability Prediction, placing particular emphasis on the investigation of: a) the main goals of the Vulnerability Prediction-related studies; b) the data collection processes and the types of datasets that exist in the literature; c) the most thoroughly examined techniques for the design and development of the prediction models and their input features; and d) the most widely used evaluation techniques. To the best of our knowledge, this is the first thorough Systematic Mapping Study on Vulnerability Prediction.
The results of our study suggest that: i) there are two major study types: primarily the prediction of vulnerable software components and, secondarily, the time series forecasting of the evolution of vulnerabilities in a software system; ii) most of the studies construct their own real-world vulnerability-related dataset, retrieving information from vulnerability databases; iii) there is a growing interest in Deep Learning-based predictors; iv) along with the rise of Deep Learning, there is also a trend of representing the source code as text and, especially, as graphs (e.g., Control Flow Graphs and Data Flow Graphs); v) there are several metrics suitable for the evaluation of Vulnerability Prediction models, with the F1-score being the most decisive factor during either a train-test dataset separation or a cross-validation process; and vi) most studies focus on project-specific (i.e., within-project) evaluation, overlooking the real-world scenario of cross-project prediction.
Index Terms
Systematic Mapping Study, Software Security, Vulnerability Prediction, Machine Learning