These are the official rules that govern how the first “Multimodal Detection of Errors and Failures in Human-Robot Interactions” challenge (ERR@HRI 2025), to be held in conjunction with the 33rd ACM International Conference on Multimedia (ACM-MM25), henceforth simply referred to as the challenge, will operate.
This is a skill-based contest and chance plays no part in the determination of the winner(s). Each participating team is encouraged to develop a machine learning framework that can detect errors during human-robot interaction.
Participants should base their detection models on the following definitions:
Robot Mistake (0-absent, 1-present): The robot makes a mistake such as interrupting or not responding to the user, or responding with an error message or an utterance that is not appropriate for what the user has just said.
Detection of user intention to correct for a mismatch between robot behavior and their expectation (0-absent, 1-present): The user displays behavior (verbal or non-verbal) that signals an intention to correct for a mismatch between the robot's behavior and their expectation, such as a user-initiated disruptive interruption. A disruptive interruption is defined as one in which the listener challenges the speaker's control and disrupts the conversational flow to express an opposing opinion, take the floor, change the subject, or summarize the speaker's point in order to end the turn and avoid unwanted information. This behavior suggests that there is a mismatch between the user's expectation of the robot and the robot's behavior.
There are two tracks, or tasks, associated with this contest, as described below:
(1) Detection of robot failure during human-robot conversations (e.g., user intention recognition errors, interrupting the user, not responding to the user).
(2) Detection of user intention to correct for a mismatch between robot behavior and their expectation (i.e., user-initiated disruptive interruptions)
Input: temporal facial action units, head pose, speech features.
Output: presence of robot mistakes or user intention: (0) absent; (1) present
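For illustration only, the sketch below shows one possible shape of a detector that maps flattened windows of the provided features to per-sample binary predictions. The feature dimensionality, the data, and the classifier choice are hypothetical placeholders, not the official baseline or the actual ERR@HRI 2.0 schema.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder feature windows: each row stands for a flattened temporal window
# of facial action units, head pose, and speech features (64 is an arbitrary
# illustrative dimensionality, not the real feature size).
X_train = rng.normal(size=(200, 64))
y_train = rng.integers(0, 2, size=200)   # 0 = absent, 1 = present

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

X_val = rng.normal(size=(50, 64))
y_pred = clf.predict(X_val)              # per-sample binary predictions
print(y_pred[:10])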
The methods for the two tasks will be evaluated based on the metrics described below:
accuracy,
precision,
recall,
F1 score, and
time-tolerant metrics (Kok and Heylen 2012; Parreira et al. 2023). These metrics are calculated by admitting a tolerance of [-1, 1] samples around each prediction made by the model. They comprise tolerant versions of metrics 1-4 (accuracy, precision, recall, F1 score). By construction, the tolerant metrics are at least as high as their "exact sample matching" counterparts.
Metrics 1-4 will be calculated using the sklearn.metrics library.
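As an illustration, the sketch below computes the exact-match metrics with sklearn.metrics and shows one possible reading of the ±1-sample tolerance; the authoritative definitions of the time-tolerant metrics are those of Kok and Heylen (2012) and Parreira et al. (2023), and the labels here are placeholders.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([0, 0, 1, 1, 0, 0, 1, 0])   # placeholder ground truth
y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 1])   # placeholder predictions

# Metrics 1-4: exact sample matching.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, zero_division=0))
print("recall   :", recall_score(y_true, y_pred, zero_division=0))
print("f1       :", f1_score(y_true, y_pred, zero_division=0))

def snap_within_tolerance(y_true, y_pred, tol=1):
    # One possible reading of the time-tolerant metrics: a prediction that
    # disagrees with the label at its own position but matches a label within
    # +/- tol samples is scored as if it matched. This can only raise (never
    # lower) the exact-match scores.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    snapped = y_pred.copy()
    for i, p in enumerate(y_pred):
        lo, hi = max(0, i - tol), min(len(y_true), i + tol + 1)
        if p != y_true[i] and p in y_true[lo:hi]:
            snapped[i] = y_true[i]
    return snapped

y_pred_tol = snap_within_tolerance(y_true, y_pred, tol=1)
print("tolerant f1:", f1_score(y_true, y_pred_tol, zero_division=0))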
Participants are required to submit their developed model and weights. Specifically, during the development stage, participants need to submit their results, and during the test stage, they need to submit their model and weights. The ranking of the submitted models competing in the Challenge relies on the metrics mentioned above.
The registered participants will be notified by email of any change in the following tentative schedule. Please check the ERR@HRI 2025 challenge website for updated information:
Registration opening: March 15, 2025
Training and development sets available: April 1, 2025
Baseline code available: May 16, 2025
Test sets available: June 15, 2025
Final code and results submission: July 7, 2025
Notification of acceptance: July 10, 2025
Paper submission deadline: July 20, 2025
Camera-ready paper: August 26, 2025 (hard deadline)
Challenge day: TBD
You are eligible to enter this contest if you meet the following requirements:
You are an individual or a team of people desiring to contribute to the tasks of the challenge and accepting to follow its rules;
You are employed by a non-profit organisation or academic research institution;
You are not involved in any part of the administration and execution of this contest;
You are not an immediate family (parent, sibling, spouse, or child) or household member of a person involved in any part of the administration and execution of this contest.
This contest is void wherever prohibited by law. If you choose to submit an entry, but are not qualified to enter the contest, this entry is voluntary, and any entry you submit is governed by the remainder of these contest rules; the organisers of the challenge reserve the right to evaluate it for scientific purposes. If you are not qualified to submit a contest entry and still choose to submit one, under no circumstances will such entries qualify for sponsored prizes, if any.
To be eligible for judging, an entry must meet the following content/technical components:
During the period of the challenge, participants are required to submit their results via email (development stage) and their code and trained models via email (test stage). At a later stage, defined in the competition schedule, they are required to share their code with complete instructions to enable reproducibility of the results. Participants are required to publicly release their code to be eligible as winners.
To participate, participants are required to fill in the registration form on the ERR@HRI official website.
The data provided for this challenge (henceforth referred to as the ERR@HRI dataset) is the property of Johns Hopkins University. The dataset contains temporal non-verbal features extracted as statistics. The ERR@HRI 2.0 data is freely available to challenge participants after a formal data request, under the licence terms provided in the End User Licence Agreements (EULAs) of the ERR@HRI 2.0 dataset.
Participants will receive the EULAs and License after filling in the registration form on the ERR@HRI 2025 official website, along with instructions on how to submit the filled-in EULAs and License. As described in the EULAs and License, the data are available only for non-commercial research and educational purposes, within the scope of the challenge. Participants may only use the ERR@HRI 2.0 dataset for the purpose of participating in this challenge. The copyright of the ERR@HRI 2.0 dataset and the underlying datasets remains the property of Johns Hopkins University. By downloading and making use of the ERR@HRI 2.0 dataset, you accept full responsibility for using the data and accept the rules specified in the EULAs and License of the underlying dataset. You shall defend and indemnify the challenge organisers and affiliated organisations against any and all claims arising from your use of the ERR@HRI 2.0 dataset. You agree not to transfer, redistribute, or broadcast the ERR@HRI 2.0 dataset or portions thereof in any way, and to comply with the EU/UK General Data Protection Regulations (GDPR). Users may use portions or the totality of the ERR@HRI 2.0 dataset provided they acknowledge such usage in their publications by citing the baseline paper and the dataset papers. By signing the License and downloading the ERR@HRI 2.0 dataset, you commit to strictly respecting the conditions set therein.
We divided the ERR@HRI 2.0 dataset into training, validation, and testing sets using a subject-independent split (i.e., the training, validation, and testing sets do not include data from the same subjects). This resulted in a training set of 75 interactions (45 with the voice assistant and 30 with the social robot), a validation set of 15 interactions (9 with the voice assistant and 6 with the social robot), and a test set of 20 interactions (12 with the voice assistant and 8 with the social robot). Please check more details about the dataset here.
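As an illustration of what a subject-independent split means in practice (this is not the organisers' exact procedure), a split such as the one sketched below keeps each subject's data inside a single partition; the array names, sizes, and subject counts are hypothetical placeholders.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 16))                 # placeholder features
y = rng.integers(0, 2, size=n)               # placeholder labels
subject_id = rng.integers(0, 110, size=n)    # placeholder subject identifiers

# First hold out test subjects, then split the remainder into train/validation,
# so that no subject contributes data to more than one partition.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trainval_idx, test_idx = next(gss.split(X, y, groups=subject_id))

gss_val = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
tr, va = next(gss_val.split(X[trainval_idx], y[trainval_idx],
                            groups=subject_id[trainval_idx]))
train_idx, val_idx = trainval_idx[tr], trainval_idx[va]

# No subject appears in two partitions.
assert not set(subject_id[train_idx]) & set(subject_id[val_idx])
assert not set(subject_id[train_idx]) & set(subject_id[test_idx])
assert not set(subject_id[val_idx]) & set(subject_id[test_idx])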
Participants will submit their entries online via email (code, weights, and results during the test stage). Participants will receive quick feedback on the validation data released for practice during the development phase.
Participants will also receive quick feedback on test results throughout the testing period. Keep in mind that performance on the test data will be examined during a code-verification step once the challenge is over. Additionally, the limit for submissions per participant during the test stage is set at three.
Participants are not permitted to open more than one account to submit more than one entry. Any suspicious submissions that do not adhere to this rule may be excluded by the organisers. The final list of winning techniques will only include entries that pass code verification.
We are not asserting any ownership rights over your entry other than what is stated below.
In exchange for the chance to participate in the competition and potential prize payouts, you're granting us an irrevocable, worldwide right and licence to:
Use, review, evaluate, test, and otherwise assess results provided or produced by your code and other materials provided by you in connection with this competition and any upcoming research or contests sponsored by us;
Agree to sign any paperwork that may be necessary for us and our designees to make use of the rights you granted above;
Use your entry and all of its content in connection with the marketing of this contest in all media (now known or subsequently developed);
If you do not want to grant us these rights to your entry, please do not enter this contest.
Based on the test results and code verification, the competition winners will be chosen. We will nominate judges who are experts in causality, statistics, machine learning, computer vision, or related disciplines, as well as experts in challenge organisation. All judges will be prohibited from participating in the competition. A list of the judges will be provided on request. The judges will evaluate all qualifying submissions and choose up to three winners for each track based on the metrics defined in the Evaluation section. The judges will check that the winners followed the requirements.
We will contact the participants via email for any communications. Participants who have registered will receive notification via the email address they supplied upon registration if there are any changes to the data, schedule, participation instructions, or rules.
We reserve the right to cancel, modify, or suspend this contest if an unforeseeable or unexpected event (such as, but not limited to: cheating; a virus, bug, or catastrophic event corrupting data or the submission platform; someone discovering a flaw in the data or modalities of the challenge) affects the fairness and/or integrity of the contest. This is known as a "force majeure" event. Regardless of whether a mistake was made by a human or a machine, this right is reserved.
The personal data required to fill in the registration form will be stored and processed in accordance with the EU GDPR for the purpose of participating in the challenge; it is meant for internal use only and will not be shared with third parties. We will use this information to verify the participants' eligibility and to contact them throughout the challenge period and the subsequent workshop. The organisers will retain the provided information for as long as needed to proceed with the challenge and subsequent workshop.
Note that the participants' data needed to request formal access to the underlying datasets is considered a different set of personal data from the personal data described above, and as such it follows different rules and a different lawful basis of data processing. The right to information regarding such data is described in the respective EULAs and/or Licence.
DISCLAIMER
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED “AS-IS”. THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL CHALEARN AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.