Human-Centered AI (HCAI) is an interdisciplinary field at the intersection of AI and Human-Computer Interaction; some would add ethics to the list of disciplines. HCAI encompasses many important research topics, including Human-Centered Explanatory AI, the alignment of AI outcomes with human and societal values, the design of human work in relation to AI, and social justice.
Background: Generative AI in Software Engineering
During 2019-2020, we began to explore generative AI as a research topic. We proposed an analytic language to describe human and AI actions in human-AI collaborations (Muller et al., 2020). We tested and refined this vocabulary in a formal brainstorming experiment (Muller, Houde, et al., 2024; see below), and we revised it further to analyze tensions between exploration and optimization in human-AI collaborations (Muller et al., 2025). We reused concepts from this work in the first general set of design principles for generative AI applications (Weisz et al., 2024).
Because we worked in a software organization, our first projects were to understand how software engineers might consider AI assistance in programming (Weisz et al., 2021, 2022). These exploratory experiences led to an early implementation of an LLM-based "Programmer's Assistant" in an interactive development environment, which proved quite popular (Ross et al., 2023). We also explored graphical sticky-note interfaces through which a group of engineers could interact with an LLM-based agent (Gonzalez et al., 2024; He et al., 2024).
We had motivated parts of this work in terms of human-AI co-creativity, and we took a mixed-methods approach to evaluating our prototypes. However, we realized that our quantitative evaluations were based on how close a human+AI pair could come to a known correct answer. We wanted to expand our thinking to more creative and co-creative approaches.
Human-AI Co-Creativity
I led a major part of IBM Research's work in human-AI co-creativity. We asked whether a Large Language Model (LLM) could be used in conceptual design, such as an exploration of analogies between well-known human concepts and computing concepts - e.g., "If a database is a toolbox for data, what data-tools would we find in that toolbox?" (Muller, Candello, & Weisz, 2023). The LLM readily provided detailed, plausible design extensions for each proposed analogy. Based on those successes, we next asked the LLM to generate its own metaphors (e.g., "what is a good metaphor for a human who is using a computer...?"). The LLM generated design concepts and metaphors, and provided detailed rationales for its design proposals.
The LLM's metaphor became a way of framing the design problem. Following work on framing and reframing as "moves" in creativity (MacNeil et al., 2021; Silk et al., 2021), we next conducted an informal experiment in which we rejected the LLM's frame, requiring it to reframe the problem. Again, the LLM was capable of each design-support activity (Muller & Weisz, 2023). I summarized this research program in a keynote address to the AI Ethics workshop at the International Joint Conference on AI (Muller, 2023). More recent work continued these ideas in the form of human-AI creative brainstorming (Muller, He, & Weisz, 2024), story-writing by multiple AI agents (Weisz et al., 2025), and an initial exploration of an LLM as a "creative muse" (Richards et al., 2025).
In all cases, we organized the work as an activity of a skilled human with tool(s), and in these cases the tools were LLMs. We were inspired by Schön's discussions of "conversations with the materials in a design situation" (Schön, 1992) and of the tools used on those materials (Schön, 1987). Of course, an AI and its outcomes were more conversational than Schön's architectural drawings and therapy plans. What was similar across our approaches was that, in all cases, a knowledgeable human was guiding the work. We explicitly refused to structure the project as a formal workflow (Muller, He, & Weisz, 2023) or as a then-conventional "human-in-the-loop" scenario in which a human responded to the needs of an AI (Muller & Weisz, 2022). Schön (1987, 1992) had argued that materials and tools could "talk back" in a design situation, and of course an LLM usually talks! In common with Schön's experiences, we focused on the human, and on how the materials and tools might be (a) useful and usable for the human's intentions, while being informative to the human through both (b) their compliance with and (c) their resistance and/or friction against the human's initial assumptions. However, unlike Schön's examples from the 1980s-1990s, our materials and tools were far more articulate. See Marchand (2016) and Sennett (2009) for related accounts of tools and materials in interaction with human craftworkers.
Assigning a Co-Creative Role to an AI
In our next experiments, we created an LLM-based AI agent in a Slack channel (Muller, Houde, et al., 2024). We called the agent "Koala." We conducted two formal experiments in which a group of humans and Koala brainstormed together in the Slack channel, first in divergent generation of many ideas, and then in convergent selection of the three "final" ideas for an imaginary client. For a formal and statistical analysis, we reused the Mixed Initiative Generative AI action vocabulary that we had developed earlier (Muller et al., 2020). We learned that, when humans chose the three "final" brainstormed ideas for the client, they accepted 30% of the ideas that had been initiated by Koala. We also observed that ideas were more likely to become "final" if they had been endorsed or critiqued by either human or Koala, and that ideas that received attention from both human and Koala were even more likely to be chosen as "final" ideas. In these terms, we observed collaborative idea-generation and idea-refinement by humans and AI together.
However, we also observed problems with the AI agent. In different sessions, we compared two different versions of Koala (Houde et al., 2025). Based on an analysis by McComb et al. (2023), we designed Reactive-Koala to contribute ideas only when asked by a human. By contrast, we designed Proactive-Koala to compute the importance of a potential brainstorming contribution, and to add that contribution spontaneously to the Slack conversation if it exceeded an importance threshold. During sessions with Proactive-Koala, humans complained that the agent was generating too many ideas, and described many of those ideas as having low value. In overall ratings, all humans preferred working with an AI over the no-AI control condition. However, humans strongly preferred the Reactive AI to the Proactive AI.
Human participants tried to use instructions (prompts) in the chat to reduce the Proactive agent's overwhelming stream of contributions. In a few cases, humans tried to elicit more contributions from the Reactive agent. Therefore, in a second experiment, we provided several features that allowed the human participants to adjust the degree of proactivity of the agent. Humans readily adopted these features, redesigning the agent while interacting with it.
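To make the mechanism concrete, the following minimal Python sketch shows importance-threshold gating together with user-adjustable proactivity controls. The names (ProactivitySettings, should_contribute) and the specific parameter values are hypothetical illustrations, not Koala's actual implementation, and the model that scores importance is not shown.

    from dataclasses import dataclass

    @dataclass
    class ProactivitySettings:
        """Controls that participants could adjust mid-session (hypothetical names)."""
        enabled: bool = True      # turn spontaneous contributions on or off
        threshold: float = 0.7    # minimum importance score for a spontaneous post
        max_per_minute: int = 2   # rate limit, to keep the agent from flooding the chat

    def should_contribute(importance: float, addressed_to_agent: bool,
                          posts_this_minute: int,
                          settings: ProactivitySettings) -> bool:
        """Decide whether the agent posts a candidate idea to the channel.

        Reactive behavior: contribute only when a human directly asks.
        Proactive behavior: also contribute spontaneously when the scored
        importance of the candidate exceeds the user-set threshold.
        """
        if addressed_to_agent:                    # reactive path: a human asked
            return True
        if not settings.enabled:                  # proactivity switched off
            return False
        if posts_this_minute >= settings.max_per_minute:  # respect the rate limit
            return False
        return importance >= settings.threshold   # the importance-threshold gate

Under a framing like this one, "redesigning the agent while interacting with it" reduces to ordinary parameter changes: raising the threshold quiets the agent, and disabling proactivity recovers purely Reactive behavior.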
These experiments extend four areas of ongoing theory development.
First, as described, we contributed new insights to Schön's conception of designerly conversations with materials and tools (1987, 1992). Schön's metaphorical conversations became literal conversations in our experiments. Those literal conversations moved from Schön's informal and incompletely specified criteria to more measurable behaviors and outcomes based on the articulations of both humans and tools (i.e., through quantitative as well as qualitative analyses).
Second, our experiments showed how humans can adapt LLMs while adopting them - an extension of the more cyclic design-then-use processes described in earlier studies of technology appropriation (e.g., Muller et al., 2016).
Third, we refined concepts of reactivity and proactivity beyond the reactive-vs.-proactive binary offered as fixed designs by McComb et al. (2023), to a set of dimensions that users can modify dynamically while interacting with the agent.
Finally, our experiments open new possibilities in Participatory Design. Classical studies of PD described two separate phases: activities of design are usually distinct from activities of work, which follow after the co-designed concepts have been implemented (e.g., Bjerknes et al., 1987; Muller & Kuhn, 1993; Simonsen & Robertson, 2012). By contrast, our second experiment with Koala showed that the human participants could collaboratively redesign the agent in real time, while doing the work. In this way, activities of design and activities of work become merged, and the processes of PD take on a new immediacy and dynamism.
Refining the Role of the AI
In our experiments with Koala, we assumed that a co-creative AI would behave similarly to a human partner. However, there are good reasons to define an AI agent as being explicitly different from a human - an entity rather than a simulated human (Milička, 2024; Perkins, 2024). We had shown that an AI agent could be instructed to act as a "creative muse" to advise and challenge a human creator (Richards et al., 2025). In unpublished work, we explored how multiple, distinct AI agents could provide contrasting views on values-oriented topics, helping a human decision-maker to explore diverse perspectives and potential outcomes (Morrison, under revision; Muller et al., under revision). We designed our experiments in the latter work as a contrast to the dominant idea of AI as an oracle or "artificial moral advisor" (e.g., Giubilini & Savulescu, 2018). We plan to make improvements suggested by reviewers, and to resubmit this work later in 2026.
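The pluralistic pattern behind these experiments can be sketched briefly: pose the same question to several persona-prompted agents and return all of the answers, leaving the judgment to the human. In the Python sketch below, the persona names are hypothetical and the complete() parameter stands in for whatever LLM client a caller supplies; this is an illustration of the pattern, not the agents from our studies.

    from typing import Callable, Dict

    # Hypothetical persona prompts; the personas in our studies differed.
    PERSPECTIVES = {
        "consequentialist": "You weigh outcomes and their likelihoods above all else.",
        "deontologist": "You reason from duties, rights, and rules.",
        "care_ethicist": "You foreground relationships and the needs of the vulnerable.",
    }

    def contrasting_views(question: str,
                          complete: Callable[[str, str], str]) -> Dict[str, str]:
        """Ask the same values-oriented question of several persona-prompted agents.

        complete(system_prompt, user_prompt) stands in for any LLM client.
        The human decision-maker receives all of the answers, rather than
        a single aggregated recommendation.
        """
        return {name: complete(persona, question)
                for name, persona in PERSPECTIVES.items()}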
In 2025 work, we explored configurations of AI agents to support user experience researchers (UXRs) in product groups (Muller, Candello, & He, in preparation). These groups sometimes have to complete a new project each week. We provided simple scripts that a UXR could choose among to conduct one or more types of basic pluralistic thematic analysis (e.g., combinations of bottom-up, top-down, and step-by-step thematic analysis methods) with UXR-selectable quality checks (e.g., none, show-evidence, provide confidence-ratings). We also allowed UXRs to conduct multiple types of pluralistic content analysis (e.g., qualitative, quantitative, and discourse content analyses). Scripts were written in the same prompt language that a UXR would otherwise write by hand to produce an analysis. Thereby, scripts could potentially be used as collaboration aids, in which a seasoned UXR could write a script for use by a less-experienced UXR. (We had observed similar approaches to sharing resources in organization-scale social media applications - Geyer et al., 2008; Muller et al., 2009, 2010.) Scripts could become community resources, available for incremental improvement and specialized adaptations. During the execution of a script, the user could interrupt or abandon the script, replacing it with a human-to-AI dialog as needed.
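A minimal sketch of how such a script might be assembled appears below, with hypothetical method and quality-check names (the options we implemented differed). Because the assembled script is ordinary prompt text, a UXR can read it, edit it, share it, or abandon it mid-run in favor of a free-form dialog.

    # Hypothetical sketch of a selectable analysis script; the method and
    # quality-check options are illustrative, not the ones we implemented.
    METHODS = {
        "bottom_up": "Derive themes inductively from the transcripts below.",
        "top_down": "Code the transcripts below against this codebook: {codebook}",
        "step_by_step": ("First extract notable quotes, then group them into "
                         "themes, then name and define each theme."),
    }
    QUALITY_CHECKS = {
        "none": "",
        "show_evidence": "For each theme, quote the supporting passages verbatim.",
        "confidence_ratings": ("For each theme, report a confidence rating from "
                               "1 (speculative) to 5 (strongly supported)."),
    }

    def build_script(method: str, quality_check: str,
                     transcripts: str, codebook: str = "") -> str:
        """Assemble a thematic-analysis prompt from UXR-selected options."""
        parts = [METHODS[method].format(codebook=codebook),
                 QUALITY_CHECKS[quality_check],
                 "Transcripts:",
                 transcripts]
        return "\n\n".join(p for p in parts if p)

    # A seasoned UXR could share this one-line call as a reusable script:
    prompt = build_script("bottom_up", "show_evidence", transcripts="...")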
These experiments again used our strategy to provide a pluralistic set of outcomes, requiring a human UXR to use their own knowledge, experience, and discretion to choose among those multiple outcomes. We deliberately configured the scripts to present the human user with choices, rather than with a single recommendation (e.g., as proposed by commercial vendors of LLM-based qualitative analysis products). Our design was also sensitive to the changing needs of UXRs, who could choose the depth of their analysis based on available time, deadlines, and competing projects. In these ways, we extended Ehn's concepts from Work-Oriented Design of Computer Artifacts (Ehn, 1988) to a more contemporary, flexible worker-oriented design of user-research affordances.
These research projects open new possibilities of creating useful distinctions between human and AI, while supporting their collaborations through shared conversations and (where appropriate) common tasks and activities during those collaborations (Muller et al., 2020; Muller, He, & Weisz, 2024, 2025).
Conclusion
Our research strategy emphasizes the human user as a responsible, accountable, and independent actor who can choose to invoke LLM-based technologies under their own human control. We built prototypes and demos of these human-controlled configurations, and in some cases we collected user evaluations and counter-proposals to redesign those configurations, including the ability of business users to redefine the LLM's behavior in real time. Further expansion of this work shows promise to contribute to concepts of the nature of LLM-based agents, the adoption of such agents, the proactivity of agents, and real-time participatory design.
References
Bjerknes, G., Ehn, P., & Kyng, M. (1987). Computers and democracy: A Scandinavian challenge. Gower Publishing.
Candello, H., Geyer, W., Kunde, S., Muller, M., Sarkar, D., He, J., ... & Pelletier, L. (2025). The Emerging Use of GenAI for UX Research in Software Development: Challenges and Opportunities. arXiv preprint arXiv:2512.15944.
Ehn, P. (1988). Work-oriented design of computer artifacts. Lawrence Erlbaum.
Geyer, W., Dugan, C., DiMicco, J., Millen, D. R., Brownholtz, B., & Muller, M. (2008). Use and reuse of shared lists as a social content type. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1545-1554).
Giubilini, A., & Savulescu, J. (2018). The artificial moral advisor: The "ideal observer" meets artificial intelligence. Philosophy & Technology, 31(2), 169-188.
Gonzalez, G. E., Moran, D. A. S., Houde, S., He, J., Ross, S. I., Muller, M. J., ... & Weisz, J. D. (2024). Collaborative Canvas: A Tool for Exploring LLM Use in Group Ideation Tasks. In IUI Workshops.
He, J., Houde, S., Gonzalez, G. E., Silva Moran, D. A., Ross, S. I., Muller, M., & Weisz, J. D. (2024). AI and the Future of Collaborative Work: Group Ideation with an LLM in a Virtual Canvas. In Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work (pp. 1-14).
Houde, S., Brimijoin, K., Muller, M., Ross, S. I., Silva Moran, D. A., Gonzalez, G. E., ... & Weisz, J. D. (2025). Controlling AI Agent Participation in Group Conversations: A Human-Centered Approach. In Proceedings of the 30th International Conference on Intelligent User Interfaces (pp. 390-408).
Houde, S., Ross, S. I., Muller, M., Agarwal, M., Martinez, F., Richards, J., ... & Weisz, J. D. (2022, March). Opportunities for generative AI in UX modernization. In Joint Proceedings of the ACM IUI Workshops.
MacNeil, S., Ding, Z., Quan, K., Parashos, T. J., Sun, Y., & Dow, S. P. (2021, June). Framing creative work: Helping novices frame better problems through interactive scaffolding. In Proceedings of the 13th Conference on Creativity and Cognition (pp. 1-10).
Marchand, T. (2016). Craftwork as problem-solving: Ethnographic studies of design and making. Routledge.
McComb, C., Boatwright, P., & Cagan, J. (2023). Focus and modality: Defining a roadmap to future AI-human teaming in design. Proceedings of the Design Society, 3, 1905-1914.
Milička, J. (2024). Theoretical and Methodological Framework for Studying Texts Produced by Large Language Models. arXiv preprint arXiv:2408.16740.
Morrison, K., et al. (under revision). The Impact of AI Theory of Mind on Decision Making in Value-Diverse Human-AI Teams.
Muller, M. (2023). Exploring Human-AI Co-Creativity under Human Control: Framing, Reframing, Brainstorming, and Future Challenges. In International Joint Conference on Artificial Intelligence.
Muller, M., Candello, H., & He, J. (in preparation). Scripted UI to LLM-Supported Thematic Analysis.
Muller, M., Candello, H., & Weisz, J. (2023). Interactional co-creativity of human and AI in analogy-based design. In International Conference on Computational Creativity.
Muller, M. J., Millen, D. R., & Feinberg, J. (2009). Information curators in an enterprise file-sharing service. In ECSCW 2009 (pp. 403-410). Springer London.
Muller, M., Millen, D. R., & Feinberg, J. (2010). Patterns of usage in an enterprise file-sharing service: publicizing, discovering, and telling the news. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 763-766).
Muller, M., & Weisz, J. (2023). Analogies-based design using a generative AI application: A play in three acts. In ACM Conference on Designing Interactive Systems.
Muller, M., Weisz, J. D., & Geyer, W. (2020). Mixed initiative generative AI interfaces: An analytic framework for generative AI applications. In Proceedings of the Workshop The Future of Co-Creative Systems-A Workshop on Human-Computer Co-Creativity of the 11th International Conference on Computational Creativity (ICCC 2020) (p. 3). Association for Computational Creativity (ACC).
Muller, M., He, J., & Weisz, J. D. (2025). Exploration and Optimization of Generative Variability in Future Work: A Mixed-Initiative Analysis. In Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work (pp. 1-13).
Muller, M., He, J., & Weisz, J. D. (2023). The Trouble with AI-Based Workflows. In workshop proceedings, ACM CHI Conference on Human Factors in Computing Systems, https://www.researchgate.net/publication/368242833_The_Trouble_with_AI-Based_Workflows
Muller, M., He, J., & Weisz, J. (2024). Workplace Everyday-Creativity through a Highly-Conversational UI to Large Language Models. In workshop proceedings, ACM CHI Conference on Human Factors in Computing Systems.
Muller, M., Houde, S., Gonzalez, G., Brimijoin, K., Ross, S. I., Moran, D. A. S., & Weisz, J. D. (2024). Group brainstorming with an AI agent: Creating and selecting ideas. In International Conference on Computational Creativity (p. 10).
Muller, M., & Kuhn, S. (1993). Participatory design. Communications of the ACM, 36(6), 24-28.
Muller, M., Morrison, K., and Varshney, K. (under revision). AI Pseudo-Agents in Ethical Conversations with One Another and a Human.
Muller, M., Neureiter, K., Verdezoto, N., Krischkowsky, A., Al Zubaidi-Polli, A. M., & Tscheligi, M. (2016). Collaborative appropriation: How couples, teams, groups and communities adapt and adopt technologies. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (pp. 473-480).
Muller, M., & Seaborn, K. (2025). Stepford Twins and Potemkin Engineering: A Critique of Synthetic Personas in the Age of Generative AI. In workshop proceedings, Aarhus Conference.
Muller, M., & Weisz, J. (2022). Extending a human-AI collaboration framework with dynamism and sociality. In Proceedings of the 1st Annual Meeting of the Symposium on Human-Computer Interaction for Work (pp. 1-12).
Muller, M., Weisz, J. D., Houde, S., & Ross, S. I. (2024). Drinking chai with your (AI) programming partner: Value tensions in the tokenization of future human-AI collaborative work. In Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work (pp. 1-15).
Perkins, M. (2024). Artificial Intelligence: The Human, the Multiple, and the Cartel. Available at SSRN 4925940.
Richards, J. T., Martino, J., Bellamy, R. K., & Muller, M. (2025). Musings on AI Muses: Support for Human Creativity. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Creative AI Track: Humanity.
Ross, S. I., Martinez, F., Houde, S., Muller, M., & Weisz, J. D. (2023). The programmer’s assistant: Conversational interaction with a large language model for software development. In Proceedings of the 28th International Conference on Intelligent User Interfaces (pp. 491-514).
Schön, D. A. (1992). Designing as reflective conversation with the materials of a design situation. Knowledge-based systems, 5(1), 3-14.
Schön, D. A. (1987). Educating the reflective practitioner: Toward a new design for teaching and learning in the professions. Jossey-Bass.
Sennett, R. (2009). The craftsman. Yale University Press.
Silk, E. M., Rechkemmer, A. E., Daly, S. R., Jablokow, K. W., & McKilligan, S. (2021). Problem framing and cognitive style: Impacts on design ideation perceptions. Design studies, 74, 101015.
Simonsen, J., & Robertson, T. (2012). Routledge international handbook of participatory design. Routledge.
Weisz, J. D., He, J., Muller, M., Hoefer, G., Miles, R., & Geyer, W. (2024). Design principles for generative AI applications. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (pp. 1-22).
Weisz, J. D., Kumar, S. V., Muller, M., Browne, K. E., Goldberg, A., Heintze, K. E., & Bajpai, S. (2025). Examining the use and impact of an AI code assistant on developer productivity and experience in the enterprise. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (pp. 1-13).
Weisz, J. D., Muller, M., Houde, S., Richards, J., Ross, S. I., Martinez, F., ... & Talamadupula, K. (2021). Perfection not required? Human-AI partnerships in code translation. In Proceedings of the 26th International Conference on Intelligent User Interfaces (pp. 402-412).
Weisz, J. D., Muller, M., Ross, S. I., Martinez, F., Houde, S., Agarwal, M., ... & Richards, J. T. (2022). Better together? An evaluation of AI-supported code translation. In Proceedings of the 27th International Conference on Intelligent User Interfaces (pp. 369-391).
Weisz, J. D., Muller, M., & Varshney, K. R. (2025). Story Arena: A Multi-Agent Environment for Envisioning the Future of Software Engineering. arXiv preprint arXiv:2511.05410.