ERR@HRI 2.0: Multimodal Detection of Errors and Failures in Human-Robot Conversations
When: TBD
ACM-MM25, Dublin, Ireland
While the integration of large language models (LLMs) into conversational robots has made human-robot conversations more flexible, LLM-powered conversational robots remain prone to errors, such as misinterpreting user intent, interrupting users, or failing to respond. Detecting and addressing these failures is critical to preventing conversational breakdowns and task disruptions, and to maintaining user trust. To tackle this challenge, ERR@HRI 2.0 provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to identify when robot failures occur. The dataset features multimodal interaction data, including facial, speech, and head features extracted from 15 hours of dyadic human-robot conversation recordings, annotated with labels indicating the presence or absence of robot errors and of user intent to correct a mismatch between robot behavior and user expectations. We invite participants to collaborate in teams to develop and submit multimodal ML models designed to detect conversational robot failures, which will be benchmarked on various performance metrics, including accuracy, precision, recall, F1 score, and timing-based metrics.
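As a rough illustration of the headline metrics named above (not the official evaluation protocol, which will be released with the baseline code), the sketch below scores binary error-detection predictions with accuracy, precision, recall, and F1 using scikit-learn. The per-window 0/1 label format and the function name are assumptions made for this example.

```python
# Minimal scoring sketch (assumed per-window binary labels; the official
# ERR@HRI 2.0 evaluation protocol and timing-based metrics may differ).
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def score_predictions(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute accuracy, precision, recall, and F1 for one sub-challenge."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Example: ground truth vs. model output for a short interaction segment.
y_true = np.array([0, 0, 1, 1, 1, 0, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
print(score_predictions(y_true, y_pred))
```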
Important dates
Registration opening: March 15, 2025
Training and development sets available: April 1, 2025
Baseline code available: May 1, 2025
Test sets available: June 15, 2025
Final code and results submission: June 30, 2025
Notification of acceptance: July 7, 2025
Paper submission deadline: July 14, 2025
Camera-ready paper: August 26, 2025
Challenge day: TBD
Challenge Tasks
The ERR@HRI 2.0 Challenge will consist of two sub-challenges:
(1) Detection of robot failure during human-robot conversations (e.g., user intention recognition errors, interrupting the user, not responding to the user).
(2) Detection of user intention to correct a mismatch between robot behavior and user expectations (i.e., user-initiated disruptive interruptions). An illustrative framing of both tasks is sketched below.
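Purely as an illustration, both sub-challenges can be cast as binary detection over time-aligned multimodal features. The sketch below shows one possible late-fusion setup combining facial, speech, and head features with a linear classifier; the feature names, dimensions, synthetic data, and choice of classifier are assumptions for this example, not the official baseline.

```python
# Illustrative late-fusion detector (assumed feature shapes and synthetic data;
# not the official ERR@HRI 2.0 baseline).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assumed per-window features extracted from the interaction recordings.
n_windows = 200
face = rng.normal(size=(n_windows, 35))    # e.g., facial action units
speech = rng.normal(size=(n_windows, 40))  # e.g., prosodic/acoustic features
head = rng.normal(size=(n_windows, 6))     # e.g., head pose angles and velocities

# Sub-challenge 1 target: 1 if a robot failure occurs in the window, else 0.
labels = rng.integers(0, 2, size=n_windows)

# Simple late fusion: concatenate modality features, then train a linear classifier.
X = np.concatenate([face, speech, head], axis=1)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("windows flagged as failures:", int(clf.predict(X).sum()))
```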
Challenge Dataset
Information about the dataset can be found at this page. The training and development sets will be made public on April 1, 2025.
Participating in the Challenge
Results and Submissions
Participants should use the training and development sets to develop their detection models and submit their final models via email to errathri@gmail.com. The organizers will evaluate the final results on the test set, and all participants will be ranked based on their test-set results. See this page for more details.
Challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset. These papers will be peer-reviewed and, once accepted, will appear in the ACM Multimedia 2025 Challenge/Workshop Proceedings. Papers must follow the same formatting requirements as the ACM Multimedia 2025 main conference.
Baseline and Code
Baseline code will be published on May 1, 2025.
Previous Edition
The first edition of this challenge was organised in conjunction with the 2024 ACM International Conference on Multimodal Interaction (ICMI 2024) in San José, Costa Rica. The corresponding website can be found here.
Contact us
errathri@gmail.com (via email)