REACT 2024
The Second REACT Challenge @ IEEE FG 2024

In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, and multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Developing a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from a previously unseen speaker behaviour is therefore a challenging task. Following the successful organisation of the first REACT challenge (REACT2023), we propose the second REACT Challenge, which focuses on developing generative models that can automatically output multiple appropriate, diverse, realistic and synchronised facial reactions under both online and offline settings. Different from the first edition, the second REACT Challenge encourages participants to generate realistic images and video clips as part of their submission. Participants will develop and benchmark ML models that generate appropriate facial reactions given an input stimulus under various dyadic video conference settings, using two state-of-the-art datasets, namely NoXi and RECOLA. As part of the challenge, we will provide participants with the REACT Challenge Dataset, a compilation of NoXi and RECOLA recordings segmented into 30-second interaction video clips (pairs of videos), together with baseline PyTorch code (including a well-developed dataloader). We will then invite the participating groups to submit their developed/trained ML models for evaluation, which will be benchmarked in terms of the appropriateness, diversity, realism and synchrony of the generated facial reactions.
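As a rough illustration of the diversity criterion only (the official appropriateness, diversity, realism and synchrony metrics are defined in the baseline paper), the hypothetical snippet below computes the variance across the multiple reactions generated for a single speaker clip; the function name and the assumed feature dimensions are ours and are not part of the challenge protocol.

import torch

def cross_sample_variance(reactions: torch.Tensor) -> torch.Tensor:
    """Rough illustration of a diversity-style score (NOT the official challenge
    metric): variance across the multiple reactions generated for one speaker clip.

    reactions: (num_samples, seq_len, feat_dim) predicted for a single input clip.
    """
    return reactions.var(dim=0, unbiased=False).mean()

# Ten generated reactions for one 30-second clip (750 frames x 25 features, assumed dims).
fake_reactions = torch.randn(10, 750, 25)
print(cross_sample_variance(fake_reactions))  # higher value -> more diverse candidates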

Important Dates

The First Edition (REACT23 @ ACM-MM)

The first edition of the REACT challenge was held in conjunction with ACM Multimedia (ACM-MM) 2023 in Ottawa, Canada.

As a result of the first edition, we released the baseline code in this GitHub repository and the corresponding paper. The call for participation attracted registrations from 11 teams from 6 countries, with 10 teams participating in each of the Offline and Online sub-challenges. The top 3 teams successfully submitted valid models, results and papers for the challenge, with each paper submission being assigned two reviewers.

Information about the previous edition can be found on this website.

Challenge Tasks

Given the spatio-temporal behaviours expressed by a speaker during a given time period, the REACT 2024 Challenge consists of the following two sub-challenges, whose theoretical underpinnings have been defined and detailed in this paper.

Task 1 - Offline Appropriate Facial Reaction Generation

This task aims to develop a machine learning model that takes the entire speaker behaviour sequence as input and generates multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of AUs, facial expressions, and valence and arousal states representing the predicted facial reaction. As a result, multiple facial reactions are required to be generated for each input speaker behaviour.
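For illustration only, the sketch below shows one way such an offline generator interface could be organised in PyTorch: the whole speaker clip is encoded once, and several reaction sequences are sampled from a latent variable. All module names, architecture choices and feature dimensions (e.g., 25 attributes per frame, 750 frames per 30-second clip) are our assumptions and do not describe the official baseline.

import torch
import torch.nn as nn

# Illustrative sketch only: dimensions and architecture are assumptions,
# not the official REACT 2024 baseline.
FRAME_DIM = 25      # e.g. AU occurrences + expression probabilities + valence/arousal
SEQ_LEN = 750       # 30-second clip at an assumed 25 fps


class OfflineReactionGenerator(nn.Module):
    """Takes the full speaker behaviour sequence and samples multiple
    listener facial reaction sequences of the same length."""

    def __init__(self, d_model: int = 256, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.GRU(FRAME_DIM, d_model, batch_first=True)
        self.to_latent = nn.Linear(d_model, 2 * latent_dim)   # mean and log-variance
        self.decoder = nn.GRU(FRAME_DIM + latent_dim, d_model, batch_first=True)
        self.head = nn.Linear(d_model, FRAME_DIM)

    def forward(self, speaker: torch.Tensor, num_samples: int = 10):
        # speaker: (batch, SEQ_LEN, FRAME_DIM)
        _, h = self.encoder(speaker)                      # summarise the whole clip
        mu, logvar = self.to_latent(h[-1]).chunk(2, dim=-1)
        reactions = []
        for _ in range(num_samples):                      # one latent sample per reaction
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            z_seq = z.unsqueeze(1).expand(-1, speaker.size(1), -1)
            out, _ = self.decoder(torch.cat([speaker, z_seq], dim=-1))
            reactions.append(self.head(out))
        # (batch, num_samples, SEQ_LEN, FRAME_DIM)
        return torch.stack(reactions, dim=1)


# Example: generate 10 candidate reactions for a batch of two speaker clips.
model = OfflineReactionGenerator()
speaker_clip = torch.randn(2, SEQ_LEN, FRAME_DIM)
reactions = model(speaker_clip, num_samples=10)
print(reactions.shape)  # torch.Size([2, 10, 750, 25])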

Task 2 - Online Appropriate Facial Reaction Generation

This task aims to develop a machine learning model that generates each facial reaction frame as the speaker behaviour unfolds, rather than taking all frames into consideration at once. The model is expected to gradually generate all facial reaction frames so as to form multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of AUs, facial expressions, and valence and arousal states representing the predicted facial reaction. As a result, multiple facial reactions are required to be generated for each input speaker behaviour.
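A minimal sketch of the online setting is given below, assuming a simple recurrent model that consumes one speaker frame at a time and emits the corresponding reaction frame, so that no future frames are used. Again, the architecture, names and dimensions are illustrative assumptions, not the official baseline.

import torch
import torch.nn as nn

# Illustrative sketch of the *online* setting: the reaction frame at time t
# is produced from speaker frames 0..t only. Dimensions and architecture are
# assumptions, not the official baseline.
FRAME_DIM = 25


class OnlineReactionGenerator(nn.Module):
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.cell = nn.GRUCell(FRAME_DIM, d_model)
        self.head = nn.Linear(d_model, FRAME_DIM)
        self.d_model = d_model

    @torch.no_grad()
    def generate(self, speaker: torch.Tensor, noise_scale: float = 0.1):
        # speaker: (batch, seq_len, FRAME_DIM), consumed one frame at a time
        batch, seq_len, _ = speaker.shape
        h = speaker.new_zeros(batch, self.d_model)
        frames = []
        for t in range(seq_len):
            h = self.cell(speaker[:, t], h)              # update state with the new speaker frame
            noise = noise_scale * torch.randn_like(h)    # stochasticity -> multiple distinct reactions
            frames.append(self.head(h + noise))          # reaction frame for time t
        return torch.stack(frames, dim=1)                # (batch, seq_len, FRAME_DIM)


# Example: sample 5 different reaction sequences for the same speaker clip.
model = OnlineReactionGenerator()
speaker_clip = torch.randn(1, 750, FRAME_DIM)
candidates = [model.generate(speaker_clip) for _ in range(5)]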

Challenge Datasets

The second REACT challenge relies on two video conference corpora: RECOLA [3] and NoXi [4]. Specifically, we first segmented each audio-video clip in both datasets into 30-second long clips. We then cleaned the data by keeping only the dyadic interactions with complete data for both conversational partners (i.e., where both faces were within the camera frame). This resulted in 5919 clips of 30 seconds each (71.8 hours of audio-video recordings), specifically 5870 clips (49 hours) from the NoXi dataset and 54 clips (0.4 hours) from the RECOLA dataset. We divided the data into training, validation and test sets using a subject-independent strategy (i.e., the same subject was never included in both the training and test sets).
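The official dataloader ships with the baseline code; purely to illustrate how speaker/listener clip pairs could be loaded, the sketch below assumes a hypothetical file layout with one feature array per 30-second clip. The directory structure and file format are assumptions for this example only.

import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical file layout, for illustration only: one .npy feature array per
# 30-second clip, stored under <root>/<split>/<speaker|listener>/<clip_id>.npy.
# The official dataloader is provided with the baseline code.
class DyadicClipDataset(Dataset):
    def __init__(self, root: str, split: str = "train"):
        self.speaker_dir = os.path.join(root, split, "speaker")
        self.listener_dir = os.path.join(root, split, "listener")
        self.clip_ids = sorted(os.listdir(self.speaker_dir))

    def __len__(self):
        return len(self.clip_ids)

    def __getitem__(self, idx):
        clip_id = self.clip_ids[idx]
        speaker = np.load(os.path.join(self.speaker_dir, clip_id))    # (seq_len, feat_dim)
        listener = np.load(os.path.join(self.listener_dir, clip_id))  # (seq_len, feat_dim)
        return torch.from_numpy(speaker).float(), torch.from_numpy(listener).float()


# Example usage with the assumed layout:
# loader = DataLoader(DyadicClipDataset("react_data", "train"), batch_size=8, shuffle=True)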

Participating in the Challenge

Result and Paper Submissions

Participants should use the training and validation sets to develop their facial reaction models, and submit their final models via email to reactmultimodalchallenge@gmail.com. The final results will be evaluated by the organizers on the test set, and all participants will be ranked based on their test-set results. Please see this page for more details.

Challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset; these will be peer-reviewed and, once accepted, will appear in the FG 2024 Challenge/Workshop Proceedings. Papers must follow the same formatting requirements as the FG 2024 main conference (4 pages excluding references).

Baseline code and paper

The baseline code is available in this GitHub repository, and the corresponding paper can be found here.

Organizers

Contact Us

Feel free to contact us at this email address: reactmultimodalchallenge@gmail.com

References

[1]  Song, S., Spitale, M., Luo, Y., Bal, B., and Gunes, H. "Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?." arXiv preprint arXiv:2302.06514 (2023).

[2] Song, S., Spitale, M., Luo, C., Barquero, G., Palmero, C., Escalera, S., ... & Gunes, H. (2023, October). REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 9620-9624).

[3] Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013, April). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG) (pp. 1-8). IEEE.

[4] Cafaro, A., Wagner, J., Baur, T., Dermouche, S., Torres Torres, M., Pelachaud, C., ... & Valstar, M. (2017, November). The NoXi database: multimodal recordings of mediated novice-expert interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (pp. 350-359).

[5] Song, S., et al. "REACT2023: The First Multi-modal Multiple Appropriate Facial Reaction Generation Challenge." arXiv preprint arXiv:2306.06583 (2023).