Workshop at the 13th International Conference on Human-Agent Interaction
Yokohama, Japan
The information on this page is tentative and subject to change.
* Sorted by the presentation title.
Adapting Dialogue Strategies to Learners: A Mentoring System for Inquiry-Based Learning
Hinano Koga, Kohei Okuoka and Masahiko Osawa
Educator mentoring is important in inquiry-based learning. However, mentoring each learner individually increases educators’ burden. In inquiry-based learning, learners are expected to set their own problems, and thus stimulating thoughts on problem-setting is indispensable. However, most existing studies have mainly focused on supporting the exploration of solutions to already defined problems, and little attention has been paid to encouraging learners’ reflection on problem-setting. To address this issue, this study proposes a mentoring system that switches dialogue strategies based on learner responses. The system incorporates the Design of Questions framework into large language model (LLM) prompts to stimulate reflection, while also engaging in active listening to encourage expression when learners’ utterances are less elaborate.
A between-subjects experiment was conducted in a university course where students defined problems using the Objectives and Key Results (OKR) framework. Participants were assigned to either the proposed condition or the Active Listening Only condition. Evaluation employed pre- and post-questionnaires and dialogue log analysis.
Results showed that mentoring with the proposed system significantly increased learners’ confidence in their objectives compared with the Active Listening Only condition. For depth of thinking, no significant difference was observed between conditions, but scores improved significantly after mentoring in both conditions, suggesting that mentoring stimulated learners’ reflection. Participant comments indicated that their understanding of their problem-setting became clearer under the proposed condition. In contrast, comments from the Active Listening Only condition often referred to the motivation behind learners’ problem-setting, suggesting that encouraging subjective expressions helped clarify personal motivation. These findings suggest that both conditions promoted learners’ thinking, albeit from different perspectives: the proposed condition appeared to encourage clarification of problem-setting, whereas the Active Listening Only condition seemed to stimulate reflection on personal motivation. Future work should therefore analyze in detail how each dialogue strategy supports deeper thinking from these different perspectives.
Adaptive Embodiment of Agent
Ryosuke Kawashima
Embodied agents, such as robots and signage agents, are increasingly employed in sales promotions such as product recommendations. These agents attract overt visual attention through their presence and can influence purchasing decisions through interactive behavior. However, the presence of an embodied agent in public spaces can create challenges related to user attention, in particular embarrassment arising from perceived social evaluation. Furthermore, users may focus excessively on the agent’s embodied form, which could weaken the impact of the sales promotion. One possible cause of these problems is that the interaction phase is not taken into account: the agent remains continuously embodied throughout the interaction. To address this, we define the interaction in product recommendation scenarios as comprising three distinct phases: (1) the Introduction Phase, where the agent captures user attention; (2) the Recommendation Phase, where the agent provides product information; and (3) the Dialog Phase, where the user responds.
This study hypothesizes that adapting the agent’s embodiment to each of these phases can reduce user discomfort and enhance engagement. An in-person experiment was conducted in a public setting to test this hypothesis. A new agent design, termed “Adaptive Embodiment,” was developed to improve user engagement and reduce discomfort by leveraging the strengths of existing embodied agents in each phase of the interaction. In the experimental setup, Adaptive Embodiment served as the target condition, with Embodiment and Disembodiment as comparative conditions. The experiment indicated that during the Recommendation Phase, where the agent primarily listens, the agent’s presence tends to draw more overt visual attention from bystanders, which increases user embarrassment and reduces the time users spend gazing at the recommended product. The findings suggest that Adaptive Embodiment could be an effective way to mitigate these issues.
Can LLMs Read Intent Across Cultures?
Taiga Sumi, Ayu Iida, Yasuhito Hosaka, Isabelle Lavelle, Masako Kohama, Sonoko Moriyama and Masahiko Osawa
The authors aim to develop Large Language Models (LLMs) capable of generating responses that reflect human intentions, and have proposed integrating LLMs with cognitive models. The first method is the LLM embedded in Cognitive Model (LEC), and the second is the Cognitive Model embedded in LLM (CEL). Their evaluation experiments demonstrate that integrating LLMs with cognitive models enables intention reading. However, it is important to note that all prompts in those experiments were written in Japanese.
The behavior of LLMs is known to vary significantly depending on the language used in the prompts. This study investigates the impact of prompt language on the performance of communication tasks that require responses reflecting the speaker’s intention when integrating LLMs with cognitive models. Specifically, prompts were created in six languages—Japanese, English, German, Italian, French, and Chinese—and performance was compared across languages. The tasks were identical to those used in prior research and involved three scenarios with discrepancies between utterances and their intended meaning: Sarcasm, TSUNDERE, and Cautious Coaching. The LLM used in this study was gpt-4.1. The results showed that, in the Sarcasm and Cautious Coaching scenarios, integrating LLMs with cognitive models achieved high performance across all six languages. In contrast, in the TSUNDERE scenario, responses based on the literal meaning of utterances were frequently observed in Italian, French, and Chinese, indicating performance differences across languages.
Effect of Shading on the Impression of Minimal-Design Robots
Chue Cho, Ryoichi Nakashima and Masahiko Osawa
Currently, a wide variety of visually designed social robots are being developed: some closely human-like, others deliberately human-unlike. The latter includes minimal-design robots, which eliminate unnecessary functions and decorative elements as much as possible. Minimal-design robots have the advantage of being less susceptible to causes of communication failure, such as the adaptation gap and the uncanny valley effect, which human-like robots commonly confront. On the other hand, minimal-design robots have the disadvantage that they struggle to convey emotions effectively because they lack facial expressions, whereas robots with human-like appearances can produce facial expressions by moving facial components.
To address this challenge, we propose a method to make minimal-design robots express their emotional valences (i.e., positive/negative emotions), inspired by studies about Noh masks, which revealed that the perceived impression of the mask changes depending on the shading. Since a Noh mask has facial parts but lacks movement, we think that the insights can be applied to a minimal-design robot.
In this study, we examine whether and how shading influences the impression of a minimal-design robot’s face. We employ a robot with one camera embedded in a spherical face. To generate and manipulate the shading, we arrange six light sources in front of the robot: three positioned above and three below the frontal axis, all equidistant from the center of the sphere. For each lighting condition, we generate one image of the shaded robot. In the experiment, we plan to ask participants to view each image and rate its impression on a seven-point positive–negative scale.
Based on the experimental results, we discuss the possible effectiveness and applications of expressing emotions in minimal-design robots with shading.
Endogenous Formation of Collective Behavioral Patterns of Human-Autonomous Mobility Interactions in Mixed-Use Traffic Environments
Kaori Nakamura
This study explores the safe and efficient coexistence of autonomous vehicles (AVs) and pedestrians in mixed-use traffic environments. Using an evolutionary game theory framework with imitation and best-response dynamics, and assuming heterogeneous risk perceptions among pedestrians, the spontaneous collective behavioral patterns that can form through their interactions are analyzed. Numerical results indicate that fully autonomous decision-making vehicles may better avoid social dilemmas such as the “Freezing Robot Problem” and promote cooperation. The study emphasizes the importance of aligning AV behavior with suitable environmental and traffic-rule conditions for the successful integration of AVs into public spaces.
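As a rough, hypothetical illustration of the kind of imitation dynamics such frameworks use (the payoffs, strategies, and update rule here are placeholders, not the study's model), a pairwise Fermi imitation rule can be sketched as:

```python
import math
import random

def fermi_imitation_prob(payoff_self, payoff_other, beta=1.0):
    """Probability that an agent copies another agent's strategy;
    grows with the payoff difference (Fermi rule), beta = selection strength."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_other - payoff_self)))

def imitation_step(strategies, payoffs, beta=1.0, rng=random):
    """One round of pairwise imitation in a well-mixed population:
    each agent compares itself with a random peer and may copy it."""
    new_strategies = list(strategies)
    for i in range(len(strategies)):
        j = rng.randrange(len(strategies))
        if j != i and rng.random() < fermi_imitation_prob(payoffs[i], payoffs[j], beta):
            new_strategies[i] = strategies[j]
    return new_strategies
```

Iterating such a step over a population of pedestrian and vehicle agents is one standard way collective behavioral patterns emerge endogenously in evolutionary game models.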
Enhancing Robot Expressiveness with Augmented Reality Avatar
Zejun YU
This paper explores a novel interaction paradigm where an expressive Augmented Reality (AR) avatar is physically embodied by a concealed robotic arm, enabling it to manipulate real-world objects. The central challenge in making this paradigm viable is maintaining the user's sense of presence by ensuring the actuator remains completely hidden. We designed a new interaction system and propose a key occlusion algorithm that effectively hides the physical robotic arm from view, allowing users to perceive only the expressive virtual avatar. This approach demonstrates the potential of a new human-robot-avatar interaction paradigm and opens up possibilities for low-cost, highly customizable robot expressiveness.
Exploring Ethical Autonomy in Human-AI Interactions
Rafik Hadfi
AI systems are becoming more autonomous, yet their independence raises difficult questions of ethics and trust. This project develops a framework for exploring ethical autonomy in human-AI interactions. Drawing on moral philosophy, information theory, and game theory, we model bounded rational agents who face social dilemmas and negotiate within the constraints of memory, risk, and environmental factors. Autonomy is quantified not as unlimited freedom but as a balance between determinism and strategic adaptation. Simulation results demonstrate how various design choices impact this balance and highlight conditions under which autonomy supports, rather than undermines, human decisions. The framework aims to provide concrete tools for reasoning about autonomy in AI while staying connected to human values.
Expressive Augmentation with Environmental Projection: Towards More Empathic and Immersive Communication
Hibiki Harano
In contemporary communication, the inner emotions and images that a speaker wishes to convey are often not fully expressed through language alone, resulting in gaps in understanding and empathy between interlocutors. This tendency is particularly pronounced for individuals who are not eloquent or who find it difficult to express themselves through facial expressions and gestures.
With the advancement of affective computing and emotion visualization technologies, research has progressed in supporting the recognition and transmission of emotional states. However, such approaches often remain limited to emotion classification and simple labeling, and effective methods for expressing and extending a speaker’s multilayered and fluid inner experiences have not yet been sufficiently developed.
Furthermore, recent methods of visualizing speech content through web-based presentation systems and augmented reality (AR) technologies have been primarily designed with a focus on audience experience and the completeness of the presentation. In contrast, few approaches have been developed that center on the speaker’s intrinsic motivation to express and the subjective experience itself.
To address these challenges, this study proposes a new communication support concept, “Expressive Augmentation.” Expressive Augmentation is a form of human augmentation technology that seeks to support self-expression and foster empathic dialogue experiences by complementing and extending the emotional nuances and semantic content a speaker attempts to convey through real-time background projection. The aim of this study is to explore its effectiveness and potential.
Investigating Human Interest Dynamics Through Web Browsing Behavior
Nilupul Heshan and Randika Kodikara
Maintaining focus during web-based learning poses a significant challenge for learners, as they are constantly exposed to distracting, goal-irrelevant information sources. This study examines how the semantic context of accessed content influences the preservation or disruption of goal-directed behavior in digital environments. To address this issue, we developed an analysis framework that monitors browsing behavior by capturing the semantics of observed content.
In an experiment with university students performing online research tasks, we tracked browsing sequences and extracted semantic embeddings of observed web materials using CLIP. Our method evaluates two types of similarity scores: (1) between the materials and the task objective, reflecting goal-oriented interest, and (2) between consecutive materials, capturing whether learners sustain focus or drift away from assigned goals.
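The two similarity scores can be sketched as plain cosine similarities over embedding vectors; the vectors below stand in for CLIP embeddings, which this sketch assumes are computed elsewhere, and the function names are illustrative rather than the authors' actual code:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def interest_scores(material_embs, task_emb):
    """Two score families for a browsing sequence:
    (1) goal similarity: each observed material vs. the task objective;
    (2) drift similarity: each material vs. the immediately preceding one."""
    goal = [cosine_similarity(m, task_emb) for m in material_embs]
    drift = [cosine_similarity(material_embs[i - 1], material_embs[i])
             for i in range(1, len(material_embs))]
    return goal, drift
```

Low goal similarity combined with high consecutive similarity would then suggest a sustained drift away from the assigned task.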
Quantitative analyses demonstrated that the proposed method can capture diverse patterns of interest dynamics. We also confirmed that similarity scores for relevant task–material pairs were significantly higher than those for irrelevant pairs used as a baseline (p < 0.01), highlighting the framework's ability to detect interest fluctuations.
A notable strength of this method is that it relies solely on observed browsing materials without requiring special devices to infer internal cognitive states. This characteristic provides a computationally efficient approach to understanding and supporting sustained engagement in digital learning. Future applications include adaptive support systems that dynamically respond to learners' cognitive states, offering timely task recommendations to mitigate attention drift and enhance task persistence.
Language Synchrony and Contact Exchange in Japanese First-Meeting Dialogues: rLSM and Rolling-Window Analyses
Yuriko Kikuchi
Human–AI Interaction increasingly relies on conversational adaptation, yet evidence on linguistic style matching in Japanese is scarce. We present a work-in-progress study asking whether linguistic synchrony measured from Japanese speed-dating conversations predicts contact-exchange outcomes. Building on the MMSD corpus introduced by Ishii et al., which collected two in-person rounds per dyad along with post-interaction outcomes including whether contact information was exchanged, we analyze the same dataset but focus on conversational transcripts rather than pre-obtainable traits.
Using the Japanese LIWC2015 dictionary, we quantify function-word usage per turn and derive three families of measures: conversation-level LSM, reciprocal LSM that captures dyadic asymmetries, and rolling-window rLSM to stabilize short turns and highlight local bursts of alignment. We additionally examine synchrony trajectories across Round 1 and Round 2 and over within-conversation deciles, and we handle backchannels and very short turns following recent recommendations in rolling-window work.
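The LSM-family measures can be sketched with the standard formula from the language style matching literature; the category names, rates, and window size below are arbitrary illustrations, not the study's actual specification:

```python
def lsm_category(p1, p2, eps=0.0001):
    """Per-category LSM score: 1 when both speakers use the function-word
    category at identical rates, approaching 0 as usage diverges."""
    return 1.0 - abs(p1 - p2) / (p1 + p2 + eps)

def lsm(profile1, profile2):
    """Conversation-level LSM: mean of per-category scores over
    dictionaries mapping category names to usage rates (%)."""
    return sum(lsm_category(profile1[c], profile2[c])
               for c in profile1) / len(profile1)

def rolling_lsm(rates1, rates2, window=5):
    """Rolling-window rLSM for one category: compare windowed means of
    per-turn usage rates, which stabilizes very short turns."""
    scores = []
    for i in range(len(rates1) - window + 1):
        m1 = sum(rates1[i:i + window]) / window
        m2 = sum(rates2[i:i + window]) / window
        scores.append(lsm_category(m1, m2))
    return scores
```

Trajectories across rounds or deciles can then be read off the rolling scores rather than a single conversation-level value.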
Analytically, we compare exchangers versus non-exchangers using Welch’s t-tests with effect sizes and confidence intervals, and we report robustness checks controlling for word counts, talk-time ratio, and gender composition. As a positioning contribution, we contrast our conversational synchrony approach with prior MMSD studies that relied on profiles, facial features, and psychometric scales to predict post-date impressions, clarifying how in-dialogue coordination complements pre-obtainable information.
Because the data contain sensitive personal information, the dataset itself will not be released; instead, we will share analysis code, detailed measurement specifications, and synthetic examples to ensure transparency and reproducibility. We will present descriptive statistics, effect sizes, and ablation-style robustness, and we invite feedback on window sizes, trajectory definitions, and evaluation protocols before scaling to cross-validated predictive models for the contact-exchange task.
LLM-Based Evaluation of Utterances with Implicature Understanding: A Preliminary Study
Ayu Iida
Although Large Language Models (LLMs) have recently shown remarkable performance in many language comprehension tasks, they struggle to perform adequately in communicative contexts involving implicature. In our previous study, we proposed LLM-based agents by integrating LLMs with cognitive models. In three dialogue scenarios, these agents generated appropriate utterances as if inferring the speaker’s intentions (i.e., implicature). Further investigation of the agents’ performances requires an examination of their utterances in a large number of scenarios. In addition, it is also important to consistently evaluate the agents' utterances. Thus, this study proposes a method in which LLMs evaluate agents’ generated utterances in the same way as human evaluators. Using our pilot prompt, we demonstrated that the evaluations of LLMs and human evaluators were similar.
Locally-Coherent Dialogue with Multitasker: A Case Study
Tomoyuki Maekawa
I would like to discuss the challenges and solutions of multitasking during group conversations. When people perform another task (e.g., using a laptop) while talking with others, they sometimes miss important information and have difficulty catching up. Our goal is to develop an assistive agent that provides useful information to help them engage in the conversation smoothly.
To observe human interaction, we conducted a case study with three participants: a facilitator, a presenter, and a multitasker. The presenter explained a paper they had read to the facilitator. The facilitator listened to the presenter and could ask the multitasker questions at any time. The multitasker had a text chat with an AI chatbot and responded to the facilitator's questions only when asked.
We found that the facilitator tended to provide a brief summary of the previous conversation to the multitasker before asking a question. The multitasker responded in a locally coherent manner; the responses were superficially consistent with the questions, but not with the whole conversation. These findings suggest that agents need to provide multitaskers with context-rich information to help them understand the meaning of others' utterances.
RIMER: A Shoulder-Mounted Remote Dialogue Facilitation Robot for Situated Reminiscence Therapy
Ryunosuke Ito
This paper proposes RIMER, a dialogue facilitation system specifically designed to support situation-aware remote reminiscence therapy using a shoulder-mounted robot. Reminiscence therapy is a psychological intervention that promotes cognitive improvement by encouraging individuals to recall and talk about their past experiences. Traditional reminiscence therapy typically involves face-to-face conversations guided by pre-selected photographs. In contrast, RIMER dynamically captures a local companion's current surroundings through a camera mounted on the robot and generates situated questions based on the observed scene and dialogue history. This allows a remote therapy recipient to naturally recall memories associated with what they are presently seeing and recent conversations, enabling more spontaneous and contextually relevant reminiscence. The situated therapy is conducted remotely, without the need for physical co-presence. To evaluate the effectiveness of RIMER, we conducted a comparative study with three groups: one using RIMER with reminiscence support, one with facilitation but without reminiscence, and one with no facilitation. We measured the amount of dialogue during the sessions and analyzed the effects of each condition on facilitation and memory recall. Results showed that the group using RIMER exhibited increased conversation volume and enhanced memory recall compared to the other groups.
Simulating the Role of Emotions in Cooperative Behavior through Cognitive Modeling
Kawaji Ruiki
Emotions evolved to facilitate quick decision-making for survival and significantly influence the formation and maintenance of cooperation. Recently, artifacts that recognize and express emotion have been developed. However, the impact of artificial emotions on human cooperation is not well understood. To address this issue, this study uses cognitive modeling to simulate the role of emotions in cooperative behavior.
We adopted the two-player version of "Hanabi" as a microworld to instantiate cooperative situations and incorporated the effects of emotions into instance-based learning within the ACT-R cognitive architecture. Specifically, emotional valence was modeled as a mood-congruency effect and arousal as a memory activation modulator. Additionally, we implemented decision-making that bypasses the instance-based process as heuristics to represent situations in which cooperative protocols are institutionalized. In the simulations, we manipulated task difficulty, emotional conditions (positive, neutral, negative, and dynamically fluctuating), and the degree of institutionalization.
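The two modulation mechanisms can be sketched against ACT-R's base-level learning equation; the arousal and mood terms below are hypothetical simplifications for illustration, not the study's implementation, and the parameter values are arbitrary:

```python
import math

def base_level_activation(lags, d=0.5):
    """ACT-R base-level learning: activation of an instance retrieved at
    the given time lags, with power-law decay d."""
    return math.log(sum(t ** (-d) for t in lags))

def arousal_modulated_activation(lags, arousal=1.0, d=0.5):
    """Illustrative arousal modulation: arousal scales the activation an
    instance receives, so high arousal sharpens retrieval competition."""
    return arousal * base_level_activation(lags, d)

def mood_congruent_activation(instance_valence, mood_valence, base_act, w=0.5):
    """Illustrative mood-congruency effect: instances whose stored valence
    matches the current mood receive an activation bonus."""
    return base_act + w * (1.0 - abs(instance_valence - mood_valence))
```

A heuristic layer representing institutionalized protocols would then simply bypass this retrieval process and emit the prescribed cooperative action directly.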
The results showed that when institutionalization was low, emotions negatively influenced performance, especially under positive conditions with high task difficulty. In these conditions, a decrease in performance due to repeated interactions was observed. In contrast, when the level of institutionalization was high, the influence of emotions was limited. Furthermore, improvement in cooperative performance through repeated interaction was observed under the neutral condition when emotions fluctuated in response to environmental feedback.
In conclusion, we demonstrated that the role of emotion in cooperative behavior depends on the type of situation and that dynamic emotional adjustment contributes to the success of cooperation. At the same time, we identified a mechanism through which institutionalized heuristics suppress risk-seeking behavior driven by positive emotional conformity. These findings are important for designing artificial emotional systems that cooperate with humans.
The Role of a Critical Tongue Dialogue Strategy in Stimulating Emotion Regulation: An Interaction Model and Video-Based Study
Keisuke Magara
This study proposes a dialogue strategy called "Critical Tongue Dialogue Strategy" (CTDS) for dialogue agents that respond to users experiencing anxiety or tension by delivering blunt, non-conformist remarks while maintaining a non-sympathetic stance. This strategy employs balance theory principles to deliberately avoid acknowledging users' negative emotions, instead utilizing interpersonal emotion regulation strategies. By creating emotional imbalance, it aims to trigger users' own emotion regulation processes, thereby facilitating cognitive shifts that enable them to reframe their anxieties and tensions in a more positive light. In this paper, we present the interaction modeling of this dialogue strategy and the results of psychological evaluations from a scenario- and video-based study comparing CTDS agents with those employing empathetic dialogue strategies (n = 108). Experimental results demonstrate that in terms of cognitive change assessment metrics, CTDS significantly outperformed the empathetic dialogue strategy.
Verification and Future Perspectives of Teachable Agent Research Utilizing Group Instructional Formats
Kazuma Ichikawa
Learning-by-Teaching (LbT) has been proposed as an effective approach to enhance learners’ motivation and deepen their understanding by having them teach others. Recent studies have demonstrated that this effect extends to situations in which learners teach virtual agents, highlighting the educational potential of such systems. Building on this foundation, the present study implemented a Learning-by-Teaching system in which learners conducted group lessons for multiple teachable agents. To evaluate the system’s effectiveness, we first conducted a field experiment with elementary and junior high school students using a system in which learners taught lessons to 20 agent-students. Questionnaires were used to capture both their regular study attitudes and their evaluations of the system. The results showed that the system was highly valued, and many students reported experiencing a sense of tension while teaching. Moreover, we found a positive correlation between system suitability and two motivational factors: task value and learning control beliefs. These findings suggest that the system is particularly effective for students who perceive high value in the learning task and who hold strong beliefs in their ability to regulate their own learning. To further examine the psychological mechanisms underlying these effects, we developed two versions of the system—one with a single agent and another with twenty agents—and conducted a comparative experiment with 16 university students. Drawing on cognitive appraisal theory, we examined whether the increased number of agents induced stress that functioned as eustress (a positive stress that enhances performance) or distress (a negative stress that impairs it). Participants’ psychological states were assessed using standardized questionnaires, while learning outcomes were evaluated through subjective reports, test performance, and teaching duration. 
Taken together, these findings provide empirical insights into how the number of teachable agents influences learners’ motivation, stress responses, and behavioral outcomes, demonstrating the feasibility of group-based LbT environments and their potential for future educational applications.