The rapid proliferation and increasing complexity of software demand robust quality assurance, with graphical user interface (GUI) testing playing a pivotal role. Crowdsourced testing has proven effective in this context by leveraging the diversity of human testers to achieve rich, scenario-based coverage across varied devices, user behaviors, and usage environments. In parallel, automated testing, particularly with the advent of large language models (LLMs), offers significant advantages in controllability, reproducibility, and efficiency, enabling scalable and systematic exploration. However, automated approaches often lack the behavioral diversity characteristic of human testers, limiting their capability to fully simulate real-world testing dynamics. To address this gap, we present PersonaTester, a novel personified-LLM-based framework designed to automate crowdsourced GUI testing. By injecting representative personas, defined along three orthogonal dimensions (testing mindset, exploration strategy, and interaction habit), into LLM-based agents, PersonaTester enables the simulation of diverse human-like testing behaviors in a controllable and repeatable manner. Experimental results demonstrate that PersonaTester faithfully reproduces the behavioral patterns of real crowdworkers, exhibiting strong intra-persona consistency and clear inter-persona variability (117.86% to 126.23% improvement over the baseline). Moreover, persona-guided testing agents consistently generate more effective test events and trigger more crashes (100+) and functional bugs (11) than the baseline without personas, thus substantially advancing the realism and effectiveness of automated crowdsourced GUI testing.
This paper presents PersonaTester, a novel framework that enables personified LLM agents to simulate diverse and realistic human-like GUI testing behaviors; to our knowledge, it is the first work to apply the persona concept to software testing.
This paper introduces a structured three-dimensional persona schema that systematically models testing mindsets, exploration strategies, and interaction habits for test exploration.
Empirical experiments show that persona-guided LLM agents enhance test diversity and effectiveness, outperforming the non-personified baseline in bug triggering.
Persona B (Path 1)
Testing Mindset A. sequential & coherent, Exploration Strategy b. core function focused, Interaction Habit ii. valid & short input
Persona C (Path 2)
Testing Mindset A. sequential & coherent, Exploration Strategy c. input oriented, Interaction Habit iii. invalid input
Persona E (Path 3)
Testing Mindset B. divergent & non-linear, Exploration Strategy a. click oriented, Interaction Habit ii. valid & short input
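The persona combinations above can be encoded as a structured schema. The following sketch is a hypothetical Python rendering of the three-dimensional persona model; the enum labels mirror the examples listed (e.g., Persona B = A/b/ii), but the class names, field names, and `to_prompt` helper are illustrative assumptions, not PersonaTester's actual API.

```python
from dataclasses import dataclass
from enum import Enum

# Dimension 1: how the tester thinks about the app under test.
class TestingMindset(Enum):
    SEQUENTIAL_COHERENT = "A. sequential & coherent"
    DIVERGENT_NONLINEAR = "B. divergent & non-linear"

# Dimension 2: how the tester chooses what to explore next.
class ExplorationStrategy(Enum):
    CLICK_ORIENTED = "a. click oriented"
    CORE_FUNCTION_FOCUSED = "b. core function focused"
    INPUT_ORIENTED = "c. input oriented"

# Dimension 3: how the tester fills in text inputs.
class InteractionHabit(Enum):
    VALID_SHORT_INPUT = "ii. valid & short input"
    INVALID_INPUT = "iii. invalid input"

@dataclass(frozen=True)
class Persona:
    name: str
    mindset: TestingMindset
    strategy: ExplorationStrategy
    habit: InteractionHabit

    def to_prompt(self) -> str:
        """Render the persona as a system-prompt fragment for an LLM agent."""
        return (
            f"You are crowdworker {self.name}. "
            f"Testing mindset: {self.mindset.value}. "
            f"Exploration strategy: {self.strategy.value}. "
            f"Interaction habit: {self.habit.value}."
        )

# Persona B from the listing above: sequential & coherent,
# core function focused, valid & short input.
persona_b = Persona(
    "B",
    TestingMindset.SEQUENTIAL_COHERENT,
    ExplorationStrategy.CORE_FUNCTION_FOCUSED,
    InteractionHabit.VALID_SHORT_INPUT,
)
print(persona_b.to_prompt())
```

Because the three dimensions are orthogonal, every combination of enum values yields a distinct, well-formed persona that can be injected as a system-prompt fragment.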
RQ1.1 Intra-Cluster Cohesion
RQ1.2 Inter-Cluster Separation
RQ2.1 General Test Generation Effectiveness
RQ2.2 Input Test Generation Effectiveness
RQ3.1 Crash Bug Triggering Capability
RQ3.2 Functional Bug Triggering Capability
Inter-Cluster Separation between Random-Strategy Exploration and Personified LLM Agents
(random-strategy exploration cannot complete specific tasks and can only randomly explore the apps under test, so its exploration trend differs markedly)
We replace the GPT models with DeepSeek models and keep all other configurations the same (1 non-personified agent and 9 distinct personified agents). The results show similar patterns, although the similarity drops slightly, because the DeepSeek models are somewhat weaker than the GPT models on our tasks. Nevertheless, this illustrates that the design of PersonaTester achieves the intended crowdworker simulation and expresses the persona distinctions across different backbone LLMs.
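The backbone-swap setup described above can be sketched as a configuration that holds the agent line-up (1 non-personified + 9 personified agents) fixed while varying only the underlying model. The function and key names below are illustrative assumptions for exposition, not PersonaTester's actual configuration format, and the model identifiers are placeholders.

```python
def make_experiment(backbone: str) -> dict:
    """Build an experiment config: one baseline agent plus nine persona agents,
    all driven by the same LLM backbone."""
    agents = [{"id": "baseline", "persona": None, "model": backbone}]
    agents += [
        {"id": f"persona-{i}", "persona": f"P{i}", "model": backbone}
        for i in range(1, 10)
    ]
    return {"backbone": backbone, "agents": agents}

# Swapping the backbone changes only the "model" field; the 1 + 9
# agent structure is identical across both runs.
gpt_run = make_experiment("gpt-backbone")
deepseek_run = make_experiment("deepseek-backbone")
assert len(gpt_run["agents"]) == len(deepseek_run["agents"]) == 10
```

Keeping everything except the backbone constant is what lets the comparison isolate the model's contribution to simulation fidelity.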