Terms and Conditions

REACT2024 Challenge Contest Rules

These are the official rules that govern how the second “Multiple Appropriate Facial Reaction Generation in Dyadic Interactions” challenge (REACT2024), to be held in conjunction with the Automatic Face and Gesture Recognition (FG) 2024, henceforth simply referred to as the challenge, will operate.

1. Challenge Description

This is a skill-based contest and chance plays no part in the determination of the winner(s). Each participant team is encouraged to develop a Machine Learning framework that can generate multiple appropriate spatio-temporal facial reactions from each input speaker behaviour.

There are two (2) tracks, or tasks, associated to this contest as described below:

1.1. Task 1. Offline Appropriate Facial Reaction Generation

This task aims to develop a machine learning model that takes the entire speaker behaviour sequence as the input, and generates multiple appropriate and realistic / naturalistic spatio-temporal facial reactions, consisting of AUs, facial expressions, valence and arousal state representing the predicted facial reaction. As a result, facial reactions are required to be generated for the task given each input speaker's behaviour.

Input: the entire speaker audio-visual clip (each clip is 30s long).

Output: (i) three 25-channel time-series representing three predicted 30s facial reactions, where each time-series consists of the occurrence (0 or 1) of 15 facial action units (i.e., AU1, AU2, AU4, AU6, AU7, AU9, AU10, AU12, AU14, AU15, AU17, AU23, AU24, AU25 and AU26), 2 facial affects - valence and arousal intensities (range from -1 to 1) - and the probabilities (range from 0 to 1) of eight categorical facial expressions (i.e., Neutral, Happy, Sad, Surprise, Fear, Disgust, Anger and Contempt); and (ii) visualisation of the generated facial reactions (2D face sequences).

1.2. Task 2. Online Appropriate Facial Reaction Generation

This task aims to develop a machine learning model that estimates each frame, rather than taking all frames into consideration. The model is expected to gradually generate all facial reaction frames to form multiple appropriate and realistic / naturalistic spatio-temporal facial reactions consisting of AUs, facial expressions, valence and arousal state representing the predicted facial reaction. As a result, facial reactions are required to be generated for the task given each input speaker's behaviour.

Input: the speaker audio-visual behaviours expressed prior and equal to the time t.

Output: (i) frame-level facial attributes (15 AUs’ occurrences, valence and arousal intensities, and the probabilities of eight categorical facial expressions) representing the predicted t_{th} facial reaction frame; (ii) Once all frames of a speaker audio-visual clip are fed to the model, all predicted facial reaction frames are combined as a 25-channel time-series representing a predicted 30s facial reaction, where three facial reactions are required to be predicted from each speaker behaviour clip; and (iii) visualisation of the generated facial reactions (2D face sequences).

Appropriate real facial reactions:

We provide three matrices (for training, validation and testing) to define appropriate real facial reactions of each speaker behaviour in files neighbour_emotion_val.npy, neighbour_emotion_train.npy, and neighbour_emotion_test.npy (that you can find the the folder shared with useful materials), where M(i,j) = ‘True’ denotes the (j+1)th facial behaviour is an appropriate reaction for responding to (i+1)th behaviour (e.g., the facial behaviour displayed by the 7th clip is an appropriate reaction in response to the audio-visual behaviour displayed by the 5th clip ). The three matrices, and the relationship between each index in the matrix and the clip name in the dataset are provided here.

1.3. Evaluation

The method for both tasks will be evaluated based on metrics described below.

We follow [1] to evaluate four aspects of the facial reactions generated by participant models:

(i) their Appropriateness that are measured by two metrics: Dynamic Time Warping (DTW) and Concordance Correlation Coefficient (CCC) between the generated facial reactions and their most similar appropriate real facial reaction, which are named as FRDist and FRCorr, respectively;

(ii) their inter-condition and inter-frame Divers

ities that are measured by FRVar, FRDiv and FRDvs metrics defined in [1], respectively;

(iii) their Realism that are measured by Fréchet Inception Distance (FID) between the distribution of the generated facial reactions and the distribution of the corresponding appropriate real facial reactions (named as FRRea);

(iv) their Synchrony with the corresponding speaker behaviour, which is measured by Time Lagged Cross Correlation (TLCC), (named as FRSyn).

Participants are required to submit their developed model and weights. Specifically, during the development stage, participants need to submit their results, and then during the test stage, they need to submit their model and weights. The ranking of the submitted model competing in the Challenge relies on the two metrics: Appropriate facial reaction distance (FRDist) and facial reactions' diverseness FRDiv, for both sub-challenges [1].

We provide a matrix M (‘Files/Approprirate_facial_reaction.npy’) to define appropriate real facial reactions of each speaker behaviour, where M(i,j) = ‘True’ denotes the (j+1)th facial behaviour is an appropriate reaction for responding to (i+1)th behaviour (e.g., the facial behaviour displayed by the 7th clip is an appropriate reaction in response to the audio-visual behaviour displayed by the 5th clip ). The relationship between each index in the matrix and the clip name in the dataset is provided in ‘Files/data_indices.csv’.

2. Tentative Contest Schedule

The registered participants will be notified by email of any change in the following tentative schedule. Please check the REACT2024 challenge website for updated information:

Launching Challenge website and call for participation poster: November 2, 2023
Registration open: November 5, 2023
Training and validation sets released: November 14, 2023
Baseline paper and code released: December 31, 2023
Final result and model submission: February 28, 2024
Paper submission deadline: March 10, 2024
Paper acceptance notification: March 29, 2024
Camera ready paper submission deadline: April 15, 2024

3.Eligibility

You are eligible to enter this contest if you meet the following requirements:

You are an individual or a team of people desiring to contribute to the tasks of the challenge and accepting to follow its rules;
You are employed by a non-profit organisation or academic research institution;
You are not involved in any part of the administration and execution of this contest;
You are not an immediate family (parent, sibling, spouse, or child) or household member of a person involved in any part of the administration and execution of this contest.

This contest is void wherever prohibited by law. If you choose to submit an entry, but are not qualified to enter the contest, this entry is voluntary, and any entry you submit is governed by the remainder of these contest rules; the organisers of the challenge reserve the right to evaluate it for scientific purposes. If you are not qualified to submit a contest entry and still choose to submit one, under no circumstances will such entries qualify for sponsored prizes, if any.

4. Entry

To be eligible for judging, an entry must meet the following content/technical components:

4.1. Entry contents

During the period of the challenge, challenge participants are required to submit their results via email (development stage) and their code and trained models via email (test stage). At a later stage, defined in the competition schedule they are required to share their code with complete instructions to enable reproducibility of the results. Participants are required to publicly release their code to be eligible as winners.

4.2. Prerequisites

To participate, participants are required to fill-in the registration form on the REACT 2024 official website.

4.3. Use of data provided

The data provided for this challenge (henceforth referred to REACT 2024 data) is proprietary of the original dataset authors and affiliated organisations. The data contains segments from three (3) publicly available datasets for research purposes (NOXI, RECOLA), along with automatically extracted action units, valence and arousal. The REACT 2023 data is freely available to the challenge participants after formal data request under licence terms provided in the End User Licence Agreements (EULA) of NOXI and RECOLA datasets, and the License of UDIVA dataset. Participants will receive the EULAs and License after filling in the registration form on the REACT 2024 official website, along with instructions on how to submit the filled-in EULAs and License. As described in the EULAs and License, the data are available only for non-commercial research and educational purposes, within the scope of the challenge. Participants may only use the REACT 2024 data for the purpose of participating in this challenge. The copyright of the REACT 2024 data and underlying datasets remains in property of their respective owners. By downloading and making use of the REACT 2024 data, you accept full responsibility for using the data and accept the rules specified in the EULAs and License of the underlying datasets. You shall defend and indemnify the challenge organisers and affiliated organisations against any and all claims arising from your use of the REACT 2024 data. You agree not to transfer, redistribute, or broadcast the REACT 2024 data or portions thereof in any way, and to comply with the EU/UK General Data Protection Regulations (GDPR). Users may use portions or the totality of the REACT 2024 data provided they acknowledge such usage in their publications by citing the baseline paper and NOXI, RECOLA dataset papers. By signing the UDIVA License and downloading RECOLA, and NOXI datasets, you engage to strictly respect the conditions set therein.

4.3.1. Training, development, and testing data

The employed dataset relies on three corporas: NoXi [2], and RECOLA [4] datasets. We segmented the audio-video data of all the three datasets into 30-seconds long clips. Then, we cleaned the dataset by selecting only the dyadic interaction with complete data of both conversational partners (where both faces were in the frame of the camera). This resulted in 4308 pairs of audio-video dyadic interaction clips (8616 clips) of 30 seconds each .

We divided the datasets into training, test and validation sets. Specifically, we split the datasets with a subject-independent strategy (i.e., the same subject was never included in the train and test sets). This results in 2624 pairs of training clips, 839 pairs of validation clips and 845 pairs of test clips. The detailed data split is defined by .txt files in ‘Files/data split/’. Participants may use other third-party datasets to train their solutions, in addition to the training set provided.

4.4. Submission

The entries of the participants will be submitted online via email (code, weights, and results during the test stage). Participants will get quick feedback on validation data released for practice during the development phase. The participants will get quick feedback on the test results throughout the testing period. Keep in mind that the performances on test data will be examined once the challenge is over during a step of code verification. Additionally, the limit for submissions per participant during the test stage will be set at three. It is not permitted for participants to open more than one account to submit more than one entry. Any suspicious submissions that do not adhere to this criteria may be excluded by the organizers. The final list of winning techniques will only include entries that pass the code verification.

For task 2, participants are only allowed to use information from the past. This behaviour will be checked in the code verification stage, and solutions that break this rule will be disqualified.

5. Potential use of the entries

We are not asserting any ownership rights over your entry other than what is stated below.

In exchange for the chance to participate in the competition and potential prize payouts, you're granting us an irrevocable, worldwide right and licence to:

Use, review, evaluate, test, and otherwise assess results provided or produced by your code and other materials provided by you in connection with this competition and any upcoming research or contests sponsored by;
Accept to sign any paperwork that may be necessary for us and our designees to use the rights you granted above;
Use your entry and all of its content in connection with the marketing of this contest in all media (now known or subsequently developed);

If you do not want to grant us these rights to your entry, please do not enter this contest.

6. Judging the entries

Based on the test results and code verification score, the competition winners will be chosen. We will nominate judges who are experts in causality, statistics, machine learning, computer vision, or related disciplines, as well as the experts in challenge organization. All judges will be prohibited from participating in the competition. On request, a list of the judges will be provided. The judges will evaluate all qualifying submissions and choose up to three winners for each track based on the metrics defined in the Evaluation section. The judges will check that the winners followed the requirements.

7. Notifications

We will contact the participants via email for any communications. Participants who have registered will receive notification via the email address they supplied upon registration if there are any changes to the data, schedule, participation instructions, or rules.

8. Unforeseen event

We reserve the right to cancel, modify, or suspend this contest if an unforeseeable or unexpected event (such as, but not limited to: cheating; a virus, bug, or catastrophic event corrupting data or the submission platform; someone discovering a flaw in the data or modalities of the challenge) affects the fairness and/or integrity of the contest. This is known as a "force majeure" event. Regardless of whether a mistake was made by a human or a machine, this right is reserved.

9. Privacy

The personal data required to fill-in the registration form will be stored and processed in accordance with EU/GDPR for the purpose of participating in the challenge, is meant for internal use only and will not be shared with third parties. We will use this information to verify the participants eligibility and contact them throughout the challenge period and subsequent workshop. The organisers will retain the provided information for as long as needed to proceed with the challenge and subsequent workshop.

Note that the participants data needed to request formal dataset access to the underlying datasets is considered a different set of personal data from the personal data described above, and as such it follows different rules and lawful basis of data processing. The right of information of such data is described in the respective EULAs and/or Licence.

DISCLAIMER

ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED “AS-IS”. THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL CHALEARN AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.

References

[1] Siyang Song, Micol Spitale, Yiming Luo, Batuhan Bal, and Hatice Gunes. 2023. Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How? arXiv e-prints (2023), arXiv–2302

[2] Angelo Cafaro, Johannes Wagner, Tobias Baur, Soumia Dermouche, Mercedes Torres Torres, Catherine Pelachaud, Elisabeth André, and Michel Valstar. 2017. The NoXi database: multimodal recordings of mediated novice-expert interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction. 350–359.

[4] Fabien Ringeval, Andreas Sonderegger, Juergen Sauer, and Denis Lalanne. 2013. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG). IEEE, 1–8