Workshop at the 13th International Conference on Human-Agent Interaction
Yokohama, Japan
* Sorted by presentation title.
Adapting Dialogue Strategies to Learners: A Mentoring System for Inquiry-Based Learning
Hinano Koga, Kohei Okuoka and Masahiko Osawa
Educator mentoring is important in inquiry-based learning. However, mentoring each learner individually increases educators’ burden. In inquiry-based learning, learners are expected to set their own problems, so stimulating their thinking about problem-setting is indispensable. Yet most existing studies have focused on supporting the exploration of solutions to already defined problems, and little attention has been paid to encouraging learners’ reflection on problem-setting. To address this issue, this study proposes a mentoring system that switches dialogue strategies based on learner responses. The system incorporates the Design of Questions framework into large language model (LLM) prompts to stimulate reflection, while engaging in active listening to encourage expression when learners’ utterances are less elaborate. A between-subjects experiment was conducted in a university course where students defined problems using the Objectives and Key Results (OKR) framework. Participants were assigned to either the proposed condition or the Active Listening Only condition. Evaluation employed pre- and post-questionnaires and dialogue log analysis. Results showed that mentoring with the proposed system significantly increased learners’ confidence in their objectives compared with the Active Listening Only condition. For depth of thinking, no significant difference was observed between conditions, but scores improved significantly after mentoring in both conditions, suggesting that mentoring stimulated learners’ reflection. Participant comments indicated that their understanding of their problem-setting became clearer under the proposed condition. In contrast, comments from the Active Listening Only condition often referred to the motivation behind learners’ problem-setting, suggesting that encouraging subjective expression helped clarify personal motivation. These findings suggest that both conditions promoted learners’ thinking, albeit from different perspectives: the proposed condition appeared to encourage clarification of problem-setting, whereas the Active Listening Only condition seemed to stimulate reflection on personal motivation. Future work should therefore analyze in detail how each dialogue strategy supports deeper thinking from these different perspectives.
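For illustration, the strategy-switching logic described above might look like the following minimal sketch. The `call_llm` helper, the prompt texts, and the token-count elaborateness heuristic are assumptions for exposition, not the authors' actual implementation.

```python
# Minimal sketch of switching between a reflection-oriented Design of
# Questions prompt and an active-listening prompt, based on a crude
# elaborateness heuristic. Prompts and `call_llm` are hypothetical.

DESIGN_OF_QUESTIONS_PROMPT = (
    "You are a mentor. Ask one question that prompts the learner to reflect "
    "on how they framed their problem (scope, assumptions, stakeholders)."
)
ACTIVE_LISTENING_PROMPT = (
    "You are a mentor. Paraphrase the learner's utterance and invite them "
    "to say more, without evaluating or redirecting."
)

def is_elaborate(utterance: str, min_tokens: int = 15) -> bool:
    """Crude proxy for how elaborate a reply is; the paper's actual
    criterion is not specified in the abstract."""
    return len(utterance.split()) >= min_tokens

def mentor_reply(utterance: str, call_llm) -> str:
    """Pick the dialogue strategy per learner response, then query the LLM."""
    system = (DESIGN_OF_QUESTIONS_PROMPT if is_elaborate(utterance)
              else ACTIVE_LISTENING_PROMPT)
    return call_llm(system=system, user=utterance)
```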
Adaptive Embodiment of an Agent for Modifying User's Attention
Ryosuke Kawashima
Embodied agents, such as robots and signage agents, are increasingly employed in sales promotions such as product recommendations. These agents attract overt visual attention through their presence and can influence purchasing decisions through interactive behavior. However, the presence of an embodied agent in public spaces can present challenges related to user attention, notably embarrassment due to perceived social evaluation. Furthermore, users may focus excessively on the agent’s embodied form, which could reduce the impact of the sales promotion. One possible cause of these problems is that existing agents remain continuously embodied and interact with users without regard to the current interaction phase. To address this, we define the interaction as comprising three distinct phases in product recommendation scenarios: (1) the Introduction Phase, where the agent captures user attention; (2) the Recommendation Phase, where the agent provides product information; and (3) the Dialog Phase, where the user responds. This study hypothesizes that adapting the agent’s embodiment to each of these phases can reduce user discomfort and enhance engagement. An in-person experiment was conducted in a public setting to test this hypothesis. A new agent design, termed "Adaptive Embodiment," was developed to improve user engagement and reduce discomfort by leveraging the strengths of existing embodied agents for each phase of interaction. In the experimental setup, Adaptive Embodiment served as the target condition, whereas Embodiment and Disembodiment served as comparative conditions. The experiment indicated that during the Recommendation Phase, where the agent primarily listens, the agent’s presence tends to draw more overt visual attention from bystanders, potentially increasing user embarrassment and reducing the amount of time spent gazing at the recommended product. The findings suggest that Adaptive Embodiment could be an effective solution to mitigate these issues.
Analyzing the Relationship between Conversational Language Style and Outcomes of Speed-Dating
Yuriko Kikuchi
We study whether linguistic synchrony in transcripts of Japanese speed-dating dialogues relates to willingness to exchange contact information, and whether it can be used to model this outcome. The outcome is a binary, self-reported label indicating whether the evaluator said they would like to exchange contact information with their partner immediately after the speed-dating session, analyzed separately for female-to-male and male-to-female evaluations. Building on prior findings that language style matching (LSM) tracks social cohesion and task performance, we hypothesized that greater synchrony would align with willingness to exchange contact information and improve predictive accuracy. We operationalize LSM as stylistic similarity in function-word use (particles, auxiliary verbs, pronouns, conjunctions, etc.): we compute a similarity score for each category and average across categories, yielding one LSM value per dialogue. To move beyond a global summary, we add a directional, response-side synchrony measure that compares a responder to the immediately preceding partner turn (rLSM), and a rolling-window rLSM (rw.rLSM) that aggregates short windows before computing rLSM to stabilize short turns and highlight local bursts of alignment. All measurements use the LIWC 2015 Japanese dictionary. Analytically, we run two-sample, two-sided Welch’s t-tests within each rater direction, comparing conversations whose evaluators answered “yes” versus “no” to “Would you like to exchange contact information?” In female-to-male evaluations, the willing group exhibited lower synchrony on several response-side and short-window measures, suggesting that avoiding over-matching may be positively received. In male-to-female evaluations, we observe no clear differences. For prediction, we train separate L2-regularized logistic-regression classifiers by rater direction to classify “would exchange” vs. “would not exchange” from language-only synchrony features, using rater-held-out cross-validation with macro-F1-based thresholding. We observe near-chance performance, indicating that synchrony alone is insufficient and motivating semantic alignment and multimodal extensions.
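As a concrete reading of the measures above, the following sketch computes dialogue-level LSM and turn-level rLSM from per-speaker function-word rates. The 1 - |a - b| / (a + b + 0.0001) form is the standard LSM definition; the category list and input format are placeholders.

```python
# Illustrative LSM / rLSM computation from function-word rates (percent of
# tokens per LIWC category, one rate vector per speaker or per turn).

CATEGORIES = ["particles", "auxiliary_verbs", "pronouns", "conjunctions"]

def lsm_category(a: float, b: float) -> float:
    """Standard per-category LSM similarity between two usage rates."""
    return 1.0 - abs(a - b) / (a + b + 0.0001)

def dialogue_lsm(rates_a: dict, rates_b: dict) -> float:
    """One global LSM value per dialogue: per-category similarity, averaged."""
    scores = [lsm_category(rates_a[c], rates_b[c]) for c in CATEGORIES]
    return sum(scores) / len(scores)

def rlsm(turns: list) -> float:
    """Response-side rLSM: compare each responder turn to the immediately
    preceding partner turn, then average over turn pairs. Each turn is a
    dict like {"speaker": "A", "rates": {...}}."""
    pair_scores = []
    for prev, cur in zip(turns, turns[1:]):
        if prev["speaker"] != cur["speaker"]:
            s = [lsm_category(prev["rates"][c], cur["rates"][c]) for c in CATEGORIES]
            pair_scores.append(sum(s) / len(s))
    return sum(pair_scores) / len(pair_scores) if pair_scores else float("nan")
```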
Can LLMs Read Intent Across Cultures?
Taiga Sumi, Ayu Iida, Yasuhito Hosaka, Isabelle Lavelle, Masako Kohama, Sonoko Moriyama and Masahiko Osawa
The authors aim to develop Large Language Models (LLMs) capable of generating responses that reflect human intentions and have proposed integrating LLMs with cognitive models. The first method is LLM embedded in Cognitive Model (LEC), and the second is Cognitive Model embedded in LLM (CEL). Their evaluation experiments demonstrate that integrating LLMs with cognitive models enables intention reading. However, all prompts in those experiments were written in Japanese, and the behavior of LLMs is known to vary significantly depending on the prompt language. This study investigates the impact of prompt language on the performance of communication tasks that require responses reflecting the speaker’s intention when integrating LLMs with cognitive models. Specifically, prompts were created in six languages (Japanese, English, German, Italian, French, and Chinese), and performance was compared across languages. The tasks were identical to those used in the prior research and involved three scenarios with discrepancies between utterances and their intended meaning: Sarcasm, TSUNDERE, and Cautious Coaching. The LLM used in this study was gpt-4.1. The results showed that, in the Sarcasm and Cautious Coaching scenarios, integrating LLMs with cognitive models achieved high performance across all six languages. In contrast, in the TSUNDERE scenario, responses based on the literal meaning of utterances were frequently observed in Italian, French, and Chinese, indicating performance differences across languages.
Effect of Shading on the Impression of Minimal-Design Robots
Chue Cho, Ryoichi Nakashima and Masahiko Osawa
Currently, a wide variety of visually designed social robots are being developed: some are closely human-like, and others are deliberately human-unlike. The latter category includes minimal-design robots, which eliminate unnecessary functions and decorative elements as much as possible. Minimal-design robots have the advantage of being less susceptible to issues that induce communication failure, such as the adaptation gap and the uncanny valley effect, which human-like robots usually confront. On the other hand, minimal-design robots have the disadvantage of conveying emotions less effectively due to their lack of facial expression, whereas robots with human-like appearances can produce facial expressions by moving facial components. To address this challenge, we propose a method that lets minimal-design robots express emotional valence (i.e., positive/negative emotion), inspired by studies of Noh masks, which revealed that the perceived impression of a mask changes depending on its shading. Since a Noh mask has facial parts but lacks movement, we expect these insights to apply to minimal-design robots. In this study, we examine whether and how shading influences the impressions of minimal-design robots’ faces. We employ a robot with one camera embedded in a spherical face. To generate and manipulate the shading, we arrange six light sources in front of the robot: three positioned above and three below the frontal axis, all equidistant from the center of the sphere. For each lighting condition, we generate one image of the shaded robot. In the experiment, we plan to ask participants to view each image and rate the impression on a seven-point positive–negative scale. Based on the experimental results, we discuss the possible effectiveness and applications of expressing emotions in minimal-design robots with shading.
Endogenous Formation of Collective Behavioral Patterns of Human-Autonomous Mobility Interactions in Mixed-Use Traffic Environments
Kaori Nakamura
This study explores the safe and efficient coexistence of autonomous vehicles (AVs) and pedestrians in mixed-use traffic environments. Using an evolutionary game-theoretic framework in which the two populations follow imitation and best-response dynamics, respectively, and assuming heterogeneous risk perceptions among pedestrians, we analyze the spontaneous collective behavioral patterns that can form through their interactions. Numerical results indicate that fully autonomous decision-making vehicles may better avoid social dilemmas such as the "Freezing Robot Problem" and promote cooperation. The study emphasizes the importance of aligning AV behavior with suitable environmental and traffic-rule conditions for the successful integration of AVs into public spaces.
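As a toy illustration of the imitation-dynamics side of such a model, consider the sketch below. The crossing game, payoffs, and Fermi update rule are illustrative assumptions, not the study's actual specification; the heterogeneous `risks` vector stands in for pedestrians' differing risk perceptions.

```python
# Toy imitation dynamics for a pedestrian population choosing between
# "cross" and "yield" when facing AVs. Payoffs are illustrative only.
import math
import random

def payoffs(frac_cross: float, risk: float) -> dict:
    """Crossing pays off when few others cross (AVs yield), and is
    discounted more heavily for more risk-averse pedestrians."""
    return {"cross": 1.0 - risk * frac_cross, "yield": 0.5}

def imitation_step(pop: list, risks: list, beta: float = 2.0) -> list:
    """One round: each agent compares payoffs with a random peer and
    adopts the peer's strategy with a Fermi (logistic) probability."""
    frac_cross = sum(s == "cross" for s in pop) / len(pop)
    new_pop = pop[:]
    for i, s_i in enumerate(pop):
        j = random.randrange(len(pop))
        pi_i = payoffs(frac_cross, risks[i])[s_i]
        pi_j = payoffs(frac_cross, risks[j])[pop[j]]
        if random.random() < 1.0 / (1.0 + math.exp(-beta * (pi_j - pi_i))):
            new_pop[i] = pop[j]
    return new_pop
```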
Enhancing Robot Expressiveness with Augmented Reality Avatar
Zejun Yu
This paper explores a novel interaction paradigm in which an expressive Augmented Reality (AR) avatar is physically embodied by a concealed robotic arm, enabling it to manipulate real-world objects. The central challenge in making this paradigm viable is maintaining the user’s sense of presence by ensuring the actuator remains completely concealed. We designed a new interaction system and propose a key occlusion algorithm that effectively conceals the physical robotic arm from view, allowing users to perceive only the expressive virtual avatar. This approach demonstrates the potential of a new Human-Robot Interaction paradigm and opens up possibilities for everyday service robots, educational applications, and remote presence systems where low-cost and highly customizable expressiveness is essential.
Exploring Ethical Autonomy in Human-AI Interactions
Rafik Hadfi
The gap between human judgment and AI autonomy is rapidly narrowing. This shift forces us to face the ethical implications of allowing AI agents to influence our decisions. Should AI agents be entrusted with consequential decisions? And if so, which ones, and under what conditions? To address these questions, I will first argue that true autonomy requires more than independence from external constraints, and demands coherent reasoning. I will then present a framework combining deontological ethics and decision theory to model how autonomy operates in humans and AI agents. The core insight is that design choices about memory and determinism create critical thresholds. Below these thresholds, AI agents can genuinely enhance human autonomy. Above them, agents begin substituting their own logic for human judgment, often imperceptibly. The framework also accounts for cognitive adaptation: how AI agents learn to navigate or even circumvent imposed constraints. I examine when this adaptation supports ethical accountability and when it undermines it. The practical outcome of this research is a set of design principles for building AI agents that respect human autonomy rather than merely emulating it. These principles matter as we rapidly deploy autonomous agents without adequate tools to assess whether they enhance or curtail human autonomy.
Exploring Robot-Mediated Remote Reminiscence Conversations
Ryunosuke Ito
This presentation explores a robot-mediated approach to facilitate remote reminiscence conversations between two people in different locations. Reminiscence therapy has been widely used to help older adults recall personal memories through conversation, yet most existing methods require participants to be in the same physical space. To address this limitation, we developed a remote conversation support system that utilizes a shoulder-mounted robot to share live outdoor video with a remote participant in real time. The system integrates a large language model that generates context-aware questions based on both the captured scene and the conversation flow, following the principles of reminiscence therapy. This enables the remote therapy recipient to engage in natural, memory-triggering dialogue, as if visiting the place together with the local companion. To evaluate the system, we conducted an experiment comparing three conditions: (1) facilitation with reminiscence-based question generation, (2) facilitation without reminiscence prompts, and (3) no facilitation. The results showed that facilitation support encouraged more active and continuous conversations, and that the reminiscence-based approach effectively enhanced memory recall for the remote participant. This study contributes to understanding how robot-mediated dialogue systems can support remote interpersonal communication, reduce barriers to social connection, and promote cognitive engagement in aging and therapeutic contexts.
Expressive Augmentation with Environmental Projection
Hibiki Harano
Recent advances in generative AI have enabled visual systems that support communication by generating or displaying images relevant to conversations. However, prior studies have identified two major drawbacks: users often become distracted by AI-generated visuals, and such systems may unintentionally reduce human agency or creative diversity. To address these issues, this research proposes the concept of Expressive Augmentation, which redefines virtual backgrounds as an environmental medium that implicitly extends human nonverbal expression into the surrounding space. Instead of requiring manual operation or explicit prompts, the system interprets users’ utterances in real time using a large language model (LLM) and dynamically projects visual representations behind the speaker. Three prototypes were developed to explore different contexts: (1) Casual conversation, where adaptive visuals reflected the tone and content of dialogue, increasing topic diversity and engagement; (2) Presentation support, where background colors, particles, and facial expressions enhanced emotional communication for speakers with limited expressiveness; and (3) Human–Agent interaction, where we are currently developing and studying a system that applies Expressive Augmentation to the human–agent co-creation process. The system aims to create shared immersion and support creative exploration through gradual background transitions that reflect the ongoing conversation. Across these implementations, results suggest that adaptive visuals can enrich communicative experience and stimulate creative thinking, though excessive visual change may cause distraction. Expressive Augmentation seeks to balance adaptivity and focus, empowering both humans and agents through implicit, human-centered visual support. By treating the background not as decoration but as a dynamic communicative medium, this approach envisions a new form of interaction design that integrates generative AI into everyday human and agent communication environments.
Investigating Human Interest Dynamics Through Web Browsing Behavior
Nilupul Heshan and Randika Kodikara
Maintaining focus during web-based learning poses a significant challenge for learners, as they are constantly exposed to distracting, goal-irrelevant information sources. This study examines how the semantic context of accessed content influences the preservation or disruption of goal-directed behavior in digital environments. To address this issue, we developed an analysis framework that monitors browsing behavior by capturing the semantics of observed content. In an experiment with university students performing online research tasks, we tracked browsing sequences and extracted semantic embeddings of observed web materials using CLIP. Our method evaluates two types of similarity scores: (1) between the materials and the task objective, reflecting goal-oriented interest, and (2) between consecutive materials, capturing whether learners sustain focus or drift away from assigned goals. Quantitative analyses demonstrated that the proposed method can capture diverse patterns of interest dynamics. We also confirmed that similarity scores for relevant task–material pairs were significantly higher than those for irrelevant pairs used as a baseline (p < 0.01), highlighting the framework's ability to detect interest fluctuations. A notable strength of this method is that it relies solely on observed browsing materials without requiring special devices to infer internal cognitive states. This characteristic provides a computationally efficient approach to understanding and supporting sustained engagement in digital learning. Future applications include adaptive support systems that dynamically respond to learners' cognitive states, offering timely task recommendations to mitigate attention drift and enhance task persistence.
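The two similarity scores reduce to cosine similarities over CLIP embeddings. Here is a minimal sketch, assuming embeddings for the task objective and each observed page have already been extracted with an off-the-shelf CLIP model; the function names are placeholders.

```python
# Sketch of the two interest-dynamics scores: (1) each page vs. the task
# objective (goal-oriented interest) and (2) consecutive pages (sustained
# focus vs. drift). Inputs are precomputed CLIP embedding vectors.
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (assumed nonzero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def interest_dynamics(task_emb: np.ndarray, page_embs: list):
    """Returns per-page goal similarity and consecutive-page similarity."""
    goal_sim = [cos(task_emb, e) for e in page_embs]
    drift_sim = [cos(a, b) for a, b in zip(page_embs, page_embs[1:])]
    return goal_sim, drift_sim
```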
Locally-Coherent Dialogue with Multitasker: A Case Study
Tomoyuki Maekawa
I would like to discuss the challenges and solutions of multitasking during group conversations. When people perform another task (e.g., using a laptop) while talking with others, they sometimes miss important information and have difficulty catching up. Our goal is to develop an assistive agent that provides useful information to help them engage in the conversation smoothly. To observe human interaction, we conducted a case study with three participants: a facilitator, a presenter, and a multitasker. The presenter explained a paper they had read to the facilitator. The facilitator listened to the presenter and could ask the multitasker questions at any time. The multitasker had a text chat with an AI chatbot and responded to the facilitator's questions only when asked. We found that the facilitator tended to provide a brief summary of the preceding conversation to the multitasker before asking a question. The multitasker responded in a locally coherent manner; the responses were superficially consistent with the questions, but not with the whole conversation. These findings suggest that agents need to provide multitaskers with context-rich information to help them understand the meaning of others' utterances.
Simulating the Role of Emotions in Cooperative Behavior through Cognitive Modeling
Kawaji Ruiki and Morita Junya
Emotions evolved to facilitate quick decision-making for survival and significantly influence the formation and maintenance of cooperation. Recently, artifacts that recognize and express emotion have been developed. However, the impact of artificial emotions on human cooperation is not well understood. To address this issue, this study uses cognitive modeling to simulate the role of emotions in cooperative behavior. We adopted the two-player version of "Hanabi" as a microworld to instantiate cooperative situations and incorporated the effects of emotions into instance-based learning within the ACT-R cognitive architecture. Specifically, emotional valence was modeled as a mood-congruency effect and arousal as a memory activation modulator. Additionally, we implemented decision-making that bypasses the instance-based process as heuristics to represent situations in which cooperative protocols are institutionalized. In the simulations, we manipulated task difficulty, emotional conditions (positive, neutral, negative, and dynamically fluctuating), and the degree of institutionalization. The results showed that when institutionalization was low, emotions negatively influenced performance, especially under positive conditions with high task difficulty. In these conditions, a decrease in performance due to repeated interactions was observed. In contrast, when the level of institutionalization was high, the influence of emotions was limited. Furthermore, improvement in cooperative performance through repeated interaction was observed under the neutral condition when emotions fluctuated in response to environmental feedback. In conclusion, we demonstrated that the role of emotion in cooperative behavior depends on the type of situation and that dynamic emotional adjustment contributes to the success of cooperation. At the same time, we identified a mechanism through which institutionalized heuristics suppress risk-seeking behavior driven by positive emotional conformity. These findings are important for designing artificial emotional systems that cooperate with humans.
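To make the two emotion effects concrete, here is an illustrative retrieval sketch in the spirit of ACT-R instance-based learning: valence acts as a mood-congruency boost on matching instances, and arousal modulates retrieval noise. Parameter values and the exact noise mapping are assumptions, not the paper's.

```python
# Toy instance-based retrieval with emotion-modulated activation.
# Each instance is a dict: {"uses": [timestamps], "valence": "positive"|...}.
import math
import random

def activation(instance: dict, t_now: float, mood: str, arousal: float,
               d: float = 0.5) -> float:
    """ACT-R-style base-level activation plus two illustrative emotion
    terms. Assumes at least one prior use with timestamp < t_now."""
    base = math.log(sum((t_now - t) ** -d for t in instance["uses"]))
    congruency = 0.5 if instance["valence"] == mood else 0.0  # mood congruency
    noise = random.gauss(0.0, 0.25 / max(arousal, 0.1))  # arousal sharpens retrieval
    return base + congruency + noise

def retrieve(instances: list, t_now: float, mood: str, arousal: float) -> dict:
    """Return the instance with the highest (noisy) activation."""
    return max(instances, key=lambda m: activation(m, t_now, mood, arousal))
```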
The Role of a Critical Tongue Dialogue Strategy in Stimulating Emotion Regulation: An Interaction Model and Video-Based Study
Keisuke Magara
This study proposes a dialogue strategy called “Critical Tongue Dialogue Strategy” (CTDS) for dialogue agents that respond to users experiencing anxiety or tension by delivering blunt, non-conformist remarks while maintaining a non-empathetic stance. This strategy employs balance theory principles to deliberately avoid acknowledging users’ negative emotions, instead utilizing interpersonal emotion regulation strategies. By creating emotional imbalance, it aims to trigger users’ own emotion regulation processes, thereby facilitating cognitive changes that enable them to reframe their anxieties and tensions in a more positive light. In this paper, we present the interaction modeling of this dialogue strategy and the results of psychological evaluations from a scenario-based and video-based study comparing CTDS agents with those employing empathetic dialogue strategies (n=108). Experimental results demonstrate that in terms of cognitive change assessment metrics, CTDS significantly outperformed the empathetic dialogue strategy.
Towards LLM-Based Agents Inferring Hidden Intentions
Ayu Iida, Kohei Okuoka, Takashi Omori, Ryoichi Nakashima and Masahiko Osawa
Large Language Models (LLMs) have achieved remarkable progress in recent years. However, LLMs often perform poorly in complex communications that require inferring the hidden intention behind a partner’s utterances. To improve LLMs’ performance in these types of communications, we proposed LLM-based agents that integrate LLMs with a Mental Model of Others (MMO). In this research, the MMO is defined as a cognitive model designed to predict and interpret the intentions of others. The evaluation results showed that our proposed agents generated appropriate utterances as if inferring the partner’s intentions. To advance this research topic further, we focused on two challenges in the study described above and investigated them in follow-up studies. First, because the evaluations were conducted via a WebUI, we did not systematically examine the influence of sampling parameters such as Temperature and Top_P. Our follow-up study manipulated these parameters and showed that our proposed agents performed robustly regardless of their settings. Second, because the agents’ performance was evaluated by a human evaluator, maintaining intra-rater consistency becomes a concern when a very large number of cases must be evaluated. Another follow-up study examined whether an LLM can evaluate the generated utterances in the same way as human evaluators, demonstrating that the LLM’s evaluations and the human evaluator’s were similar. In summary, we showed the effectiveness and robustness of LLM-based agents that integrate LLMs and MMOs in communication tasks requiring the inference of hidden intentions, and suggested the feasibility of evaluating the agents’ performance with LLMs.
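The sampling-parameter sweep from the first follow-up study could be scripted as below, using the OpenAI Python client. The system prompt, test utterance, parameter grid, and model name (borrowed from the related talk above; this abstract does not name its model) are all assumptions.

```python
# Sketch of sweeping Temperature and Top_P for an MMO-integrated agent.
# SYSTEM_PROMPT stands in for the (unpublished) MMO-conditioned prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = ("Respond to the user while inferring the hidden intention "
                 "behind their utterance.")

def generate(user_utterance: str, temperature: float, top_p: float) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1",  # assumed model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_utterance},
        ],
        temperature=temperature,
        top_p=top_p,
    )
    return resp.choices[0].message.content

# Illustrative grid over the two sampling parameters.
for t in (0.0, 0.5, 1.0):
    for p in (0.5, 0.9, 1.0):
        print(t, p, generate("You're late again. Great job.", t, p))
```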
Verification and Future Perspectives of Teachable Agent Research Utilizing Group Instructional Formats
Kazuma Ichikawa
Learning-by-teaching has been proposed as an effective method for enhancing learners’ motivation by enabling them to deepen their understanding through teaching others. Prior research has demonstrated that this effect extends to scenarios in which learners teach virtual agents instead of humans, underscoring the educational potential of virtual agents. Consequently, designing agent-based environments that effectively support learners’ motivation has emerged as a key challenge. This work focuses on the number of teachable agents as an environmental factor influencing learners’ experiences. We developed a virtual classroom in which participants delivered lessons to either a single agent student or 20 agent students. Drawing on cognitive appraisal theory, we examined whether the stress induced by an increased number of agents functions as eustress (a positive form of stress that enhances performance) or distress (a negative form that impairs it). We further investigated how these types of stress influence learners’ psychological states and behavioral outcomes. An experiment was conducted with 16 university students under two conditions: teaching a single agent or teaching 20 agents. The participants’ levels of eustress and distress were measured via standardized questionnaires, and learning outcomes were evaluated via subjective reports, test scores, and teaching durations.