Rules, Terms, and Conditions

April ERR@HRI 2024 Challenge Contest Rules

These are the official rules that govern how the first “Multimodal Detection of Errors and Failures in Human-Robot Interactions” challenge (ERR@HRI 2024), to be held in conjunction with the 26th ACM International Conference on Multimodal Interaction (ICMI24), henceforth simply referred to as the challenge, will operate.

1. Challenge Description

This is a skill-based contest and chance plays no part in the determination of the winner(s). Each participant team is encouraged to develop a Machine Learning framework that can detect errors during human-robot interaction.

Participants should rely their detection models on the following definitions:

Robot Mistake: The robot makes a mistake such as interrupting or not responding to the coachee, or responding with an utterance that is not appropriate for what the coachee has just said.
User Awkwardness: The coachee displays behaviours that signal the interaction is awkward — they may look confused, uncertain, distressed or uncomfortable.
Interaction Rupture: We define an interaction rupture as either the presence of user awkwardness, a robot mistake, or both.

There are three tracks, or tasks, associated to this contest as described below:

1.1. Task 1. Detection of robot mistakes

This task aims to develop a machine learning model that takes the coachee temporal non-verbal features as the input, and detect whether there was a robot mistakes (e.g., interrupting or not responding to the coachee), formulating the task as a sequential binary classification model (robot mistake present (1) or absent (0)).

Input: temporal facial action units, distance and velocity of body joints, speech features.

Output: presence of robot mistakes: (0) absent; (1) present

1.2. Task 2. Detection of user awkwardness

This task aims to develop a machine learning model that takes the coachee temporal non-verbal features as the input, and detect whether the coachees display cues of awkwardness towards the robot (e.g., when the coachee feels uncomfortable interacting with the robot without any robot mistakes), formulating the task as a sequential binary classification model (user awkwardness present (1) or absent (0)).

Input: temporal facial action units, distance and velocity of body joints, speech features.

Output: presence of user awkwardness: (0) absent; (1) present

1.3. Task 3. Detection of interaction ruptures

This task aims to develop a machine learning model that takes the coachee temporal non-verbal features as the input, and detect whether there was a interaction rupture (i.e. when the robot makes mistakes described in task 1 and when user displays awkwardness towards the robot described in task 2), formulating the task as a sequential binary classification model (robot mistake present (1) or absent (0)).

Input: temporal facial action units, distance and velocity of body joints, speech features.

Output: presence of interaction ruptures: (0) absent; (1) present

1.3. Evaluation

The method for the three tasks will be evaluated based on metrics described below:

accuracy,
precision,
recall,
F1 score, and
time-tolerant metrics (Kok and Heylen 2012; Parreira et al. 2023).
These metrics are calculated by admitting a tolerance of [-1,1] samples for each prediction made by the model. They include metrics 1-4 (accuracy precision, recall, F1-score). Tolerant metrics are bound to be higher than "exact sample matching" metrics.

Metrics 1-4 are going to be calculated using the sklearn.metrics library.

Participants are required to submit their developed model and weights. Specifically, during the development stage, participants need to submit their results, and then during the test stage, they need to submit their model and weights. The ranking of the submitted model competing in the Challenge relies on the metrics mentioned above.

2. Tentative Contest Schedule

The registered participants will be notified by email of any change in the following tentative schedule. Please check the ERR@HRI2024 challenge website for updated information:

Registration opening: April 1, 2024
Training and development sets available: April 14, 2024
Baseline code available: May 22, 2024
Test sets available: June 12, 2024
Final code and results submission: June 23, 2024
Notification of acceptance: July 5, 2024
Paper submission deadline: July 21, 2024
Camera ready paper: August 6, 2024
Challenge day: November 4, 2024

3. Eligibility

You are eligible to enter this contest if you meet the following requirements:

You are an individual or a team of people desiring to contribute to the tasks of the challenge and accepting to follow its rules;
You are employed by a non-profit organisation or academic research institution;
You are not involved in any part of the administration and execution of this contest;
You are not an immediate family (parent, sibling, spouse, or child) or household member of a person involved in any part of the administration and execution of this contest.

This contest is void wherever prohibited by law. If you choose to submit an entry, but are not qualified to enter the contest, this entry is voluntary, and any entry you submit is governed by the remainder of these contest rules; the organisers of the challenge reserve the right to evaluate it for scientific purposes. If you are not qualified to submit a contest entry and still choose to submit one, under no circumstances will such entries qualify for sponsored prizes, if any.

4. Entry

To be eligible for judging, an entry must meet the following content/technical components:

4.1. Entry contents

During the period of the challenge, challenge participants are required to submit their results via email (development stage) and their code and trained models via email (test stage). At a later stage, defined in the competition schedule they are required to share their code with complete instructions to enable reproducibility of the results. Participants are required to publicly release their code to be eligible as winners.

4.2. Prerequisites

To participate, participants are required to fill-in the registration form on the ERR@HRI official website.

4.3. Use of data provided

The data provided for this challenge (henceforth referred to ERR@HRI dataset) is proprietary of the AFAR team lab and University of Cambridge. The dataset contains temporal non-verbal statistics extracted features. The ERR@HRI data is freely available to the challenge participants after formal data request under licence terms provided in the End User Licence Agreements (EULA) of the Multimodal Social HRI Ruptures dataset.

Participants will receive the EULAs and License after filling in the registration form on the ERR@HRI 2024 official website, along with instructions on how to submit the filled-in EULAs and License. As described in the EULAs and License, the data are available only for non-commercial research and educational purposes, within the scope of the challenge. Participants may only use the ERR@HRI 2024 dataset for the purpose of participating in this challenge. The copyright of the Multimodal Social HRI Ruptures dataset and underlying datasets remains in property of the University of Cambridge. By downloading and making use of the Multimodal Social HRI Ruptures dataset, you accept full responsibility for using the data and accept the rules specified in the EULAs and License of the underlying dataset. You shall defend and indemnify the challenge organisers and affiliated organisations against any and all claims arising from your use of the ERR@HRI dataset. You agree not to transfer, redistribute, or broadcast the Multimodal Social HRI Ruptures dataset or portions thereof in any way, and to comply with the EU/UK General Data Protection Regulations (GDPR). Users may use portions or the totality of the Multimodal Social HRI Ruptures dataset provided they acknowledge such usage in their publications by citing the baseline paper and the dataset papers. By signing the License and downloading Multimodal Social HRI Ruptures dataset, you engage to strictly respect the conditions set therein.

4.3.1. Training, development, and testing data

The employed dataset is the Multimodal Social HRI Ruptures dataset. We involved 23 employees. The robotic coaches conducted four positive psychology exercises. Please check the paper [2] for more detail about the study. During the interaction, we collected video recordings (coachee’s face and a side view of the interaction) and audio recordings (both the coachee’s and robot’s speech) using two cameras (a frontal video camera and a lateral GoPro) and a Jabra microphone. We used off-the-shelf state-of-the-art methods to extract multimodal behavioural features from the audio-visual data collected from the side-view of the camera in the study:

1) Facial Features: We used the OpenFace 2.2.0 toolkit to extract the presence and intensity of 17 facial action units (AUs), in a total of 34 facial features per frame.

2) Audio Features: We used the openSMILE toolbox and extracted interpretable speech features, namely loudness and pitch per window-size (e.g., for MFCC the toolbox creates frames of 25ms length every 10ms) .

3) Body Features: We used the OpenPose toolbox and extracted the 25-2D body key points per frame to estimate the movement of the torso, hands, arms, and head.

We divided the datasets into training, test and validation sets. Specifically, we split the datasets with a subject-independent strategy (i.e., the same subject was never included in the train and test sets). This results in 55 training clips, 16 validation clips and 18 test clips. Participants may use other third-party datasets to train their solutions, in addition to the training set provided.

4.4. Submission

The entries of the participants will be submitted online via email (code, weights, and results during the test stage). Participants will get quick feedback on validation data released for practice during the development phase.

The participants will get quick feedback on the test results throughout the testing period. Keep in mind that the performances on test data will be examined once the challenge is over during a step of code verification. Additionally, the limit for submissions per participant during the test stage will be set at three.

It is not permitted for participants to open more than one account to submit more than one entry. Any suspicious submissions that do not adhere to this criteria may be excluded by the organizers. The final list of winning techniques will only include entries that pass the code verification.

5. Potential use of the entries

We are not asserting any ownership rights over your entry other than what is stated below.

In exchange for the chance to participate in the competition and potential prize payouts, you're granting us an irrevocable, worldwide right and licence to:

Use, review, evaluate, test, and otherwise assess results provided or produced by your code and other materials provided by you in connection with this competition and any upcoming research or contests sponsored by;
Accept to sign any paperwork that may be necessary for us and our designees to use the rights you granted above;
Use your entry and all of its content in connection with the marketing of this contest in all media (now known or subsequently developed);

If you do not want to grant us these rights to your entry, please do not enter this contest.

6. Judging the entries

Based on the test results and code verification score, the competition winners will be chosen. We will nominate judges who are experts in causality, statistics, machine learning, computer vision, or related disciplines, as well as the experts in challenge organization. All judges will be prohibited from participating in the competition. On request, a list of the judges will be provided. The judges will evaluate all qualifying submissions and choose up to three winners for each track based on the metrics defined in the Evaluation section. The judges will check that the winners followed the requirements.

7. Notifications

We will contact the participants via email for any communications. Participants who have registered will receive notification via the email address they supplied upon registration if there are any changes to the data, schedule, participation instructions, or rules.

8. Unforeseen event

We reserve the right to cancel, modify, or suspend this contest if an unforeseeable or unexpected event (such as, but not limited to: cheating; a virus, bug, or catastrophic event corrupting data or the submission platform; someone discovering a flaw in the data or modalities of the challenge) affects the fairness and/or integrity of the contest. This is known as a "force majeure" event. Regardless of whether a mistake was made by a human or a machine, this right is reserved.

9. Privacy

The personal data required to fill-in the registration form will be stored and processed in accordance with EU/GDPR for the purpose of participating in the challenge, is meant for internal use only and will not be shared with third parties. We will use this information to verify the participants eligibility and contact them throughout the challenge period and subsequent workshop. The organisers will retain the provided information for as long as needed to proceed with the challenge and subsequent workshop.

Note that the participants data needed to request formal dataset access to the underlying datasets is considered a different set of personal data from the personal data described above, and as such it follows different rules and lawful basis of data processing. The right of information of such data is described in the respective EULAs and/or Licence.

DISCLAIMER

ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED “AS-IS”. THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL CHALEARN AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.