Abstract
Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, the performance of these approaches remains restricted because they rely directly on a limited number of labeled warnings to develop a classifier. Recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success on various code-related tasks, have emerged as a potential way to circumvent this problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs to AWI. Through an extensive evaluation on 10K+ SpotBugs warnings from 10 large-scale open-source projects, we observe that all studied PTMs are consistently 9.85%~21.12% better than the state-of-the-art ML-based AWI approaches. In addition, we investigate the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) of the typical PTM-based AWI workflow. Further, we identify the reasons for PTMs' underperformance on AWI. Based on our findings, we provide several practical guidelines to enhance PTM-based AWI in future work.
Overview of our study
Dataset
Research questions
RQ1: Effectiveness of PTM-based AWI. How do PTMs perform on AWI compared with the SOTA ML-based AWI approach?
RQ2: Analysis of warning context and abstraction in data preprocessing. How do different data preprocessing choices affect the performance of PTMs on AWI?
RQ3: Analysis of pre-training and fine-tuning in model training. How do the model training components affect the performance of PTMs on AWI?
RQ4: Analysis of within- and cross-project AWI scenarios in model prediction. How do the model prediction scenarios affect the performance of PTMs on AWI?
RQ5: Further analysis of incorrect predictions in PTM-based AWI. What are the root causes of incorrect predictions in the current PTM-based AWI approach?
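The within-project and cross-project scenarios named in RQ4 can be illustrated with a minimal sketch. It is not the study's actual splitting code: the record layout (a dict with a "project" key), the function names, and the 80/20 ratio are assumptions made only for illustration. In the within-project scenario, training and test warnings come from the same project; in the cross-project scenario, the model is trained on warnings from the other projects and tested on a held-out project.

```python
from collections import defaultdict


def within_project_split(warnings, train_ratio=0.8):
    """Per project, split that project's warnings into (train, test).

    Hypothetical helper: the 80/20 default ratio is an assumption,
    not a value taken from the study.
    """
    by_project = defaultdict(list)
    for warning in warnings:
        by_project[warning["project"]].append(warning)

    splits = {}
    for project, items in by_project.items():
        cut = int(len(items) * train_ratio)
        splits[project] = (items[:cut], items[cut:])
    return splits


def cross_project_split(warnings, target_project):
    """Train on every project except the target; test on the target only."""
    train = [w for w in warnings if w["project"] != target_project]
    test = [w for w in warnings if w["project"] == target_project]
    return train, test
```

Under this sketch, a cross-project evaluation over 10 projects would call cross_project_split once per project, holding each one out in turn.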