LLMPrior
Redefining Crowdsourced Test Report Prioritization:
An Innovative Approach with Large Language Model
Crowdsourced testing has become increasingly popular in software testing due to its openness, which brings the diversity needed to tackle fragmentation issues in mobile app testing. Nonetheless, this openness also poses challenges for app developers: manually reviewing the large number of submitted reports is inevitable, time-consuming, and labor-intensive. To address this problem, crowdsourced test report prioritization has been proposed to improve review efficiency. However, despite continuous development and improvement, current prioritization approaches still lack a deep understanding of semantic information when analyzing the textual descriptions in crowdsourced test reports.
In this paper, we propose LLMPrior, a novel crowdsourced test report prioritization approach based on large language models (LLMs). Because LLMs frequently fail to respond with a complete prioritization result when the number of reports is large, LLMPrior adopts an indirect prioritization strategy consisting of two components: bug categorization and report prioritization. LLMPrior first instructs an LLM to analyze the textual descriptions of all crowdsourced test reports and categorize them, in a prescribed format, according to the types of bugs they reveal. LLMPrior then parses the LLM's output and applies a recurrent selection algorithm to the parsing result, producing a prioritized test report sequence. We conduct an empirical experiment to evaluate the effectiveness of our approach. The results demonstrate that LLMPrior not only outperforms the current state-of-the-art approach but is also more feasible, efficient, and reliable, thanks to prompt engineering techniques and our indirect prioritization strategy.
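The recurrent selection step is not spelled out above, so the following is only a minimal sketch of how such a step could work, assuming it cycles over the parsed bug categories in round-robin fashion and takes one not-yet-selected report per category on each pass, so that reports revealing different bug types surface early in the prioritized sequence. The `recurrent_select` function, the category labels, and the report IDs are illustrative assumptions, not LLMPrior's actual implementation.

```python
from collections import OrderedDict

def recurrent_select(categorized):
    """Produce a prioritized report sequence from categorized reports.

    `categorized` maps a bug-category label (as parsed from the LLM's
    prescribed output format) to the list of report IDs it contains.
    Assumption: selection cycles over categories round-robin, taking one
    unselected report per category each pass.
    """
    queues = OrderedDict((cat, list(reports)) for cat, reports in categorized.items())
    prioritized = []
    while queues:
        for cat in list(queues):
            prioritized.append(queues[cat].pop(0))
            if not queues[cat]:  # category exhausted, drop it
                del queues[cat]
    return prioritized

# Hypothetical parsing result of the LLM's categorization output.
example = {
    "Crash": ["R3", "R7"],
    "UI layout": ["R1"],
    "Performance": ["R5", "R2", "R9"],
}
print(recurrent_select(example))  # ['R3', 'R1', 'R5', 'R7', 'R2', 'R9']
```

Under this assumption, the round-robin pass is what spreads distinct bug types across the front of the sequence instead of exhausting one category before moving to the next.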
We construct a dataset containing 1,417 crowdsourced test reports from 20 mobile apps.
We provide code for reproduction here.
RQ1: How effective is LLMPrior when compared with the state-of-the-art baseline?
RQ2: How does LLMPrior enhance its effectiveness through the indirect prioritization strategy and prompt engineering techniques?
RQ3: How does LLMPrior's indirect prioritization strategy impact token efficiency when interacting with the LLM?
We provide LLMPrior's detailed prioritization results for each iteration here.
We provide four illustrative examples of LLMPrior's conversations with the LLM here.