ERR@HRI: Multimodal Detection of Errors and Failures in Human-Robot Interactions

4th November 2024
ICMI 2024, San José, Costa Rica

Human-Robot Interaction (HRI) research is placing a growing emphasis on developing autonomous robots that can be deployed in real-world scenarios, in order to understand the implications of integrating such robots into our lives. However, past literature has shown that autonomous robots often make mistakes, for example interrupting people or taking a very long time to respond. Such robot failures may disrupt the interaction and negatively impact people's perception of the robot. To overcome this problem, robots should be able to detect HRI failures.

The ERR@HRI challenge aims to address the problem of failure detection in HRI by providing the community with a benchmark for mono-modal vs. multi-modal robot failure detection.

Upon participants' acceptance of the ERR@HRI terms and conditions by signing the End User Licence Agreement (EULA), we will share a dataset that includes multimodal non-verbal feature statistics (i.e., facial, speech, and pose features), together with labels, for interaction clips in which individuals interact with a robotic coach delivering positive psychology exercises. Audio-video recordings will not be provided due to anonymity and ethical requirements. The feature statistics and labels are to be used to train and evaluate the predictive models. The dataset has been annotated as a time-series with the following binary labels ((0) absent, (1) present): robot mistake (e.g., the robot interrupts or does not respond), user awkwardness (e.g., the coachee feels uncomfortable interacting with the robot in the absence of any robot mistake), and interaction rupture (i.e., the user displays cues of awkwardness towards the robot and/or the robot makes a mistake).
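As a rough illustration of how such a release might be consumed, the sketch below loads per-window feature statistics and binary labels and aligns them for training. The file names and column names (e.g., window_id) are hypothetical; the actual data format will be specified with the dataset release.

```python
import pandas as pd

# Hypothetical file names; the actual format is specified with the dataset release.
FEATURE_FILE = "features/session_001.csv"  # per-window facial, speech, and pose statistics
LABEL_FILE = "labels/session_001.csv"      # per-window binary labels

features = pd.read_csv(FEATURE_FILE)       # one row per time window
labels = pd.read_csv(LABEL_FILE)

# Each label column is binary: 0 = absent, 1 = present.
label_columns = ["robot_mistake", "user_awkwardness", "interaction_rupture"]

# Align features and labels on the shared (hypothetical) time index before training.
data = features.merge(labels, on="window_id")
X = data.drop(columns=label_columns + ["window_id"]).to_numpy()
y = data["interaction_rupture"].to_numpy()  # e.g., the interaction-rupture sub-task
```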

We invite participants to collaborate in teams and submit their multi-modal ML models for evaluation. Submissions will be benchmarked on various performance metrics for detecting robot failures, including accuracy, precision, recall, F1 score, and timing-based metrics [10, 11].
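To illustrate the frame-level metrics, the sketch below computes accuracy, precision, recall, and F1 with scikit-learn. The onset-latency function is only a hypothetical stand-in for the timing-based metrics of [10, 11], not the official evaluation script.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def frame_level_metrics(y_true, y_pred):
    """Standard per-window detection metrics."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }

def onset_latency(y_true, y_pred):
    """Illustrative timing measure: number of windows between the first
    annotated failure and the first predicted one (negative = early).
    The official timing-based metrics follow [10, 11]."""
    true_onsets = np.flatnonzero(y_true)
    pred_onsets = np.flatnonzero(y_pred)
    if len(true_onsets) == 0 or len(pred_onsets) == 0:
        return None
    return int(pred_onsets[0]) - int(true_onsets[0])
```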

Important dates

Challenge Tasks

The ERR@HRI Challenge will consist of the following three sub-tasks, one per annotated label:

Task 1: Detection of robot mistakes
Task 2: Detection of user awkwardness
Task 3: Detection of interaction ruptures

Baseline Models: We will provide a multi-modal deep-learning model as a baseline for each of the three tasks, following [6] and [5] (where we reported results for interaction rupture prediction). The baselines will be released as scheduled (see Baseline and Code below).
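For orientation only, here is a minimal PyTorch sketch of a common multi-modal late-fusion design: one encoder per modality, with encodings concatenated into a binary classification head. It is an assumption about the general shape of such a model, not the official ERR@HRI baseline, and the per-modality input dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MultimodalFusionSketch(nn.Module):
    """Illustrative late-fusion detector over facial, speech, and pose
    feature statistics. A sketch of a common design, not the official
    ERR@HRI baseline."""

    def __init__(self, dims=(128, 64, 32), hidden=64):
        super().__init__()
        # One small encoder per modality (hypothetical input dims).
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in dims
        )
        # Concatenated encodings -> one logit for absent (0) / present (1).
        self.head = nn.Linear(hidden * len(dims), 1)

    def forward(self, face, speech, pose):
        encoded = [enc(x) for enc, x in zip(self.encoders, (face, speech, pose))]
        return self.head(torch.cat(encoded, dim=-1)).squeeze(-1)

# Train with a binary cross-entropy loss on the 0/1 labels, e.g.:
# loss = nn.BCEWithLogitsLoss()(model(face, speech, pose), labels.float())
```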

Challenge Dataset

Information about the dataset can be found at this page.

Participating in the Challenge 


Results and Submissions

Participants should use the training and validation sets to develop their detection models and submit their final models via email to errathri@gmail.com. The organisers will evaluate the final results on the test set, and all participants will be ranked based on their test-set results. Please see this page for more details.

Challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset. These papers will be peer-reviewed and, once accepted, will appear in the ICMI 2024 Challenge/Workshop Proceedings. Papers must follow the same formatting requirements as the ICMI 2024 main conference.

Baseline and Code

To be published on the 22nd of May.

Contact us

errathri@gmail.com (via email)

References

[1] Spitale, Micol, et al. "Longitudinal Evolution of Coachees' Behavioural Responses to Interaction Ruptures in Robotic Positive Psychology Coaching." IEEE RO-MAN 2023.

[2] Spitale, Micol, Minja Axelsson, and Hatice Gunes. "Robotic mental well-being coaches for the workplace: An in-the-wild study on form." ACM/IEEE HRI 2023. 

[3] Kontogiorgos, Dimosthenis, et al. "Behavioural responses to robot conversational failures." ACM/IEEE HRI 2020. 

[4] Kontogiorgos, Dimosthenis, et al. "A systematic cross-corpus analysis of human reactions to robot conversational failures." ACM ICMI 2021. 

[5] Spitale, Micol, Minja Axelsson, and Hatice Gunes. "VITA: A Multi-modal LLM-based System for Longitudinal, Autonomous, and Adaptive Robotic Mental Well-being Coaching." arXiv preprint arXiv:2312.09740 (2023). 

[6] Bremers, Alexandra, et al. "The Bystander Affect Detection (BAD) Dataset for Failure Detection in HRI." arXiv preprint arXiv:2303.04835 (2023).

[7] Kalatzis, Apostolos, et al. "A Multimodal Approach to Investigate the Role of Cognitive Workload and User Interfaces in Human-robot Collaboration." ACM ICMI 2023. 

[8] Tan, Xiang Zhi, et al. "Group Formation in Multi-Robot Human Interaction During Service Scenarios." ACM ICMI 2022. 

[9] Stiber, Maia, et al. "On Using Social Signals to Enable Flexible Error-Aware HRI." ACM/IEEE HRI 2023.

[10] Parreira, Maria Teresa, et al. "Robot Duck Debugging: Can Attentive Listening Improve Problem Solving?" ACM ICMI 2023.

[11] de Kok, Iwan, and Dirk K.J. Heylen. "A Survey on Evaluation Metrics for Backchannel Prediction Models." Interdisciplinary Workshop on Feedback Behaviors in Dialog, 2012.