Alongside the recent excitement surrounding generative AI, there has been a surge of interest in using large language models (LLMs) as simulated research participants. Researchers and technologists have begun experimenting with LLMs to simulate various forms of human participation in research, from conducting interviews [1] to generating responses in surveys [2] or text annotation tasks [3]. Far from being a speculative exercise, LLMs are now being applied across a range of research [4,5,6] and industry settings [7,8]. The motivations behind these substitution proposals are varied and include increasing the speed and scale of research, reducing costs, augmenting the diversity of collected data, and protecting participants from harm [9].
While LLMs may be championed for their pragmatic value, their use invites epistemological questions that must first be understood and addressed by the CHI community. Simulating human participants pushes the boundaries of what it means to produce knowledge in HCI, a discipline that has long prioritized contextually situated inquiries into human experiences, behaviors, and interactions with technology, particularly through the core HCI practice of user experience (UX) research. LLMs trained on vast datasets can produce 'believable' and fluent human-like responses, but they lack the embodied lived experiences that human participants bring to research [10]. What are the limits of a model's ability to produce knowledge rooted in specific cultural, social, and material conditions, given its disembodiment and inherently partial representation of human experience? These questions demand a critical reevaluation of what counts as 'valid' data in HCI, the ethics of using LLM-generated data, and a reckoning with the ways that knowledge generated from simulated interactions differs from that derived from human experiences with technology.
We must also contend with the ethical implications of using LLMs in place of human subjects in HCI research. Most current LLMs are trained on public content extracted from the Internet [11], often without explicit consent from data subjects. This lack of informed consent may undermine the autonomy of data subjects, which is central to research ethics. Moreover, the opaque nature of LLM training processes makes it difficult to trace specific pieces of data back to their authors, as much of the training data lacks any link between text and author [12], complicating efforts to obtain retroactive consent or provide meaningful transparency. Beyond these ethical concerns, synthetically generated research data may also lack validity. Consider how a model trained predominantly on Western-centric datasets may not accurately simulate experiences from non-Western contexts, limiting its utility in research that seeks to understand perspectives from communities in the Global South. Qualitative researchers increasingly surface concerns about using LLMs either as simulated research participants [13] or as analysis tools [14]: in these roles, LLM outputs can lack depth and contextual nuance, flattening the richness of human perspective and insight.
Different models undergo different alignment processes depending on the goals and values of their developers. For instance, models like OpenAI's ChatGPT are aligned to avoid generating certain harmful or offensive content; on the other hand, platforms like Gab.ai, which market themselves as 'uncensored and unfiltered,' or xAI's Grok, might produce less moderated outputs that align more closely with conservative ideologies. Given recent work showing the risks of generating language intended to represent minority groups [15], researchers should exercise caution when drawing inferences from simulated interview data.
In considering the use of LLMs as simulated research participants in HCI, it is important to acknowledge and address the complexity of determining if, when, and how they can be applied appropriately across different research contexts. HCI, as a multidisciplinary field, engages with diverse methodologies and research goals, ranging from usability testing and design exploration to critical inquiries into human-technology interaction. The decision to incorporate LLMs requires nuanced consideration of the specific research objectives and types of knowledge being sought. For example, in user experience research, some might find LLMs useful for pilot testing interview protocols, allowing researchers to iterate quickly and refine questions before engaging real participants. On the other hand, the appropriateness of using LLMs might be contested when the research seeks to explore deeply situated human experiences, cultural practices, or sensitive topics. Here, the absence of embodied, lived experience in LLM-generated responses can introduce risks of misrepresentation, oversimplification, or even harm, particularly when studying marginalized or vulnerable populations [16].
This workshop seeks to foster critical dialogue around the situational and context-specific considerations of when and how to employ LLMs as simulated research participants in HCI. Through a series of discussions and collaborative activities, participants will explore the benefits and limitations of using LLMs in various HCI contexts, paying particular attention to issues of validity, ethics, and equity. Our goal is to develop guidelines that address the nuances of incorporating LLMs into research, ensuring that their use aligns with the epistemological commitments and ethical standards of HCI.
This workshop will serve as the first step toward developing guidelines, standards, and documentation practices for LLM use in HCI human subjects research. Workshop participants will be invited to submit reflections on suggested criteria for using LLM-generated data in UX research, which will provide the basis for a first draft of guidelines to be discussed at the workshop. Based on engagement in the workshop, participants will be invited to join an extended collaboration as coauthors to synthesize and finalize these guidelines. After publication, we will work with conferences like CHI, educators, and standards organizations to determine whether parts of our guidelines can be useful to their missions. The outputs of our workshop will be broadly applicable not just to academic research, but also to industry user experience and human subjects research.
We invite perspectives, extended abstracts, and position papers between one and four pages (excluding references) on LLM use in UX research. We encourage authors to address some or all of the following questions in their position papers:
How should we taxonomize LLM use in UX research?
When should we use LLMs in UX research?
How should we document LLM use in UX research?
How should we validate LLM use in UX research?
And other topics you think are important!
All submissions will be reviewed, and accepted submissions will be posted on our website. Workshop organizers will make accepted position papers, along with a summary synthesizing areas of agreement and contention, available to authors prior to the workshop. After the workshop, authors will be invited to collaborate on a position paper on standards for the use, documentation, and validation of LLMs in UX research. Submissions are due February 20th AoE.
Please submit via https://forms.gle/ULoQ7AR8iKbVwdw6A
For any questions, please contact wagnew[at]andrew[dot]cmu[dot]edu
William Agnew, wagnew[at]andrew[dot]cmu[dot]edu
Shivani Kapania
Marianne Aubin Le Quéré
Sarah Fox
Hoda Heidari