Quality assurance of mobile application (app) GUIs has become increasingly important, as the GUI serves as the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches with diverse strategies have been developed, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than the completion of specific testing scenarios, often missing coverage of critical functionalities. Inspired by the manual testing process, which treats business-logic-driven testing scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models (LLMs) to comprehend the semantics expressed in app GUIs and their contextual relevance to given testing scenarios. Building upon this capability, we propose ScenGen, a novel LLM-guided scenario-based GUI testing framework that employs a multi-agent collaboration mechanism to simulate and automate the phases of manual testing.
Specifically, ScenGen integrates five agents: the Observer, Decider, Executor, Supervisor, and Recorder. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts, thereby interpreting the semantic information presented in the GUI. This information is passed to the Decider, which makes scenario-driven decisions under the guidance of LLMs to identify target widgets and determine appropriate actions toward fulfilling specific testing goals. The Executor then performs the decided operations on the app, while the Supervisor verifies whether the execution results align with the expected scenario progress, ensuring traceability and consistency in test generation and execution. Finally, the Recorder logs the executed GUI operations into the context memory as a knowledge base for subsequent decision-making and concurrently monitors for runtime bugs. Comprehensive evaluations demonstrate that ScenGen effectively generates scenario-based GUI tests guided by LLM collaboration, achieving higher relevance to app business logic and improving the completeness of automated GUI testing.
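To make the agent pipeline concrete, the following minimal Python sketch shows how the five agents could be orchestrated in a single scenario-driven loop. All class names, method signatures, and the step budget are illustrative assumptions, not ScenGen's actual implementation.

```python
# A minimal sketch of the five-agent loop described above. All class names,
# method signatures, and the step budget are illustrative assumptions, not
# ScenGen's actual implementation.
from dataclasses import dataclass

@dataclass
class Action:
    widget: str        # identifier of the target widget
    operation: str     # e.g., "tap", "input", "scroll"
    payload: str = ""  # text for input operations

def run_scenario(scenario: str, app, observer, decider, executor,
                 supervisor, recorder, max_steps: int = 30) -> bool:
    """Drive `app` toward `scenario`, one GUI operation per iteration."""
    memory: list[Action] = []                    # Recorder's context memory
    for _ in range(max_steps):
        state = observer.perceive(app)           # structured widgets + layout
        if supervisor.scenario_done(state, scenario, memory):
            return True                          # scenario judged complete
        action = decider.decide(state, scenario, memory)  # LLM-guided choice
        executor.perform(app, action)            # apply on device/emulator
        recorder.log(action, memory)             # extend memory; the Recorder
                                                 # also watches for runtime bugs
    return False                                 # step budget exhausted
```

Consulting the Supervisor before each new decision lets the loop terminate as soon as the scenario is judged complete, mirroring the traceability check described above.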
We propose ScenGen, a novel LLM-guided, scenario-based GUI testing framework that integrates multi-agent collaboration with GUI semantic understanding. The framework enables automated test generation aligned with high-level testing scenarios rather than low-level event exploration.
We design a GUI understanding and widget localization mechanism that combines computer vision techniques with multi-modal LLMs to achieve comprehensive GUI comprehension and support semantic test generation (illustrated by the sketch following this list).
We construct an app benchmark dataset containing ten representative testing scenarios and empirically evaluate ScenGen. The results demonstrate its effectiveness and efficiency in generating semantically meaningful, scenario-based GUI tests.
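As a rough illustration of the widget localization mechanism in the second contribution, the sketch below fuses widget candidates detected by computer vision with a multi-modal LLM query. The `Box` type, the `mllm.ask` client, and the prompt format are hypothetical placeholders under our own assumptions, not the paper's API.

```python
# A hedged sketch of widget localization: candidate widgets detected by
# computer vision are fused with a multi-modal LLM query. The Box type,
# the mllm.ask client, and the prompt format are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Box:
    label: str                          # OCR text or detected widget class
    bounds: tuple[int, int, int, int]   # (left, top, right, bottom) pixels

def localize(target_desc: str, boxes: list[Box],
             screenshot_b64: str, mllm) -> "Box | None":
    """Pick the detected widget that best matches a natural-language
    target description such as 'the send button'."""
    menu = "\n".join(f"{i}: {b.label} @ {b.bounds}" for i, b in enumerate(boxes))
    prompt = (f"A screenshot is attached. Candidate widgets:\n{menu}\n"
              f"Reply with only the index of the widget matching: {target_desc}")
    reply = mllm.ask(prompt, image=screenshot_b64)  # hypothetical MLLM client
    try:
        return boxes[int(reply.strip())]
    except (ValueError, IndexError):
        return None  # caller may fall back to pure-vision grounding
```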
RQ1: How effective is the scenario-based test generation of ScenGen?
RQ2: How effective is the bug detection capability of ScenGen?
RQ3: How effective is the widget localization of ScenGen?
RQ4: How effective is the logical decision-making of ScenGen?
RQ5: How efficient is the scenario-based test generation of ScenGen in terms of token overhead and time overhead?