REACT 2024
The Second REACT Challenge @ IEEE FG24
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, and multiple different facial reactions might be appropriate in response to a specific speaker behaviour. Developing a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions in response to a previously unseen speaker behaviour is therefore a challenging task. Following the successful organisation of the first REACT challenge (REACT2023), we propose the second REACT Challenge, which focuses on developing generative models that can automatically output multiple appropriate, diverse, realistic and synchronised facial reactions under both online and offline settings. Different from the first edition, the second REACT Challenge encourages participants to submit realistic generated images and video clips. Participants will develop and benchmark ML models that can generate appropriate facial reactions given an input stimulus under various dyadic video conference settings, using two state-of-the-art datasets, namely NoXi and RECOLA. As part of the challenge, we will provide participants with the REACT Challenge Dataset, a compilation of NoXi and RECOLA recordings segmented into 30-second interaction video clips (pairs of videos), together with baseline PyTorch code (including a well-developed dataloader). We will then invite the participating groups to submit their developed / trained ML models for evaluation, which will be benchmarked in terms of the appropriateness, diversity, realism and synchrony of the generated facial reactions.
Important Dates
Launching Challenge website and call for participation poster: November 2, 2023
Registration open: November 5, 2023
Training and validation sets released: November 14, 2023
Baseline paper and code released: January 10, 2024 (updated from December 31, 2023)
Test set released: March 1, 2024
Final result and model submission: March 15, 2024
Paper submission deadline: March 29, 2024
Paper acceptance notification: April 5, 2024
Camera ready paper submission deadline: April 11, 2024
The First Edition (REACT23 @ ACM-MM)
The first edition of the REACT challenge was held in conjunction with the ACM Multimedia (ACM-MM) 2023 conference in Ottawa, Canada.
As a result of the first edition, we released the baseline code in this GitHub repository and the corresponding paper. The call for participation attracted registrations from 11 teams from 6 countries, with 10 teams participating in the Offline and Online sub-challenges. The top 3 teams successfully submitted valid models, results and papers for the challenge, with each paper submission being assigned two reviewers.
The information about the previous edition can be found on this website.
Challenge Tasks
Given the spatio-temporal behaviours expressed by a speaker during a given time period, the proposed REACT 2024 Challenge consists of the following two sub-challenges, whose theoretical underpinnings have been defined and detailed in this paper.
Task 1 - Offline Appropriate Facial Reaction Generation
This task aims to develop a machine learning model that takes the entire speaker behaviour sequence as input and generates multiple appropriate and realistic / naturalistic spatio-temporal facial reactions, each represented by action units (AUs), facial expressions, and valence and arousal states. For each input speaker behaviour, multiple facial reactions are required to be generated.
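To make the offline setting concrete, the sketch below shows one possible model interface in PyTorch: the whole speaker sequence is encoded at once, and repeated sampling produces multiple diverse reactions. The module name, feature dimensions, and the use of additive noise are illustrative assumptions, not the official baseline.

```python
# Hypothetical sketch of an offline reaction generator; module name,
# tensor shapes, and the noise-based sampling are illustrative
# assumptions, not the official REACT baseline.
import torch
import torch.nn as nn

class OfflineReactionGenerator(nn.Module):
    """Maps a full speaker behaviour sequence to one sampled facial reaction."""

    def __init__(self, feat_dim=128, out_dim=25, hidden=256):
        super().__init__()
        # Offline setting: the encoder sees the entire sequence at once.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        # Per-frame reaction attributes (e.g. AUs + expression probabilities
        # + valence + arousal = 25 values per frame in this sketch).
        self.decoder = nn.Linear(hidden, out_dim)

    def forward(self, speaker_seq):
        # speaker_seq: (batch, frames, feat_dim)
        h, _ = self.encoder(speaker_seq)
        # Inject noise so repeated calls yield diverse (multiple) reactions.
        h = h + 0.1 * torch.randn_like(h)
        return self.decoder(h)  # (batch, frames, out_dim)

gen = OfflineReactionGenerator()
clip = torch.randn(2, 750, 128)  # a 30-second clip at 25 fps in this sketch
reactions = [gen(clip) for _ in range(3)]  # three candidate reactions
```

Sampling the model several times for the same input is one simple way to satisfy the "multiple appropriate reactions" requirement; stochastic decoders (e.g. VAEs or diffusion models) are common alternatives.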
Task 2 - Online Appropriate Facial Reaction Generation
This task aims to develop a machine learning model that processes the speaker behaviour frame-by-frame rather than taking all frames into consideration at once. The model is expected to gradually generate all facial reaction frames, forming multiple appropriate and realistic / naturalistic spatio-temporal facial reactions, each represented by action units (AUs), facial expressions, and valence and arousal states. For each input speaker behaviour, multiple facial reactions are required to be generated.
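The online setting differs from the offline one mainly in the model interface: each speaker frame must be consumed as it arrives, with a recurrent state carrying context forward. The sketch below illustrates this under the same assumed dimensions as above; it is not the official baseline.

```python
# Hypothetical sketch of the online setting: the model sees speaker
# frames one at a time and emits a reaction frame immediately; names
# and shapes are illustrative assumptions, not the official baseline.
import torch
import torch.nn as nn

class OnlineReactionGenerator(nn.Module):
    def __init__(self, feat_dim=128, out_dim=25, hidden=256):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden)  # one step per call
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, frame, state=None):
        # frame: (batch, feat_dim) -- only the current speaker frame is visible.
        state = self.cell(frame, state)
        return self.head(state), state

gen = OnlineReactionGenerator()
state = None
outputs = []
for t in range(750):  # stream 30 seconds of frames at 25 fps
    frame = torch.randn(2, 128)  # stand-in for real speaker features
    out, state = gen(frame, state)
    outputs.append(out)
reaction = torch.stack(outputs, dim=1)  # (batch, frames, out_dim)
```

The key design constraint is causality: at frame t the model may use only frames up to t, which rules out the bidirectional or full-attention encoders available in the offline task.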
Challenge Datasets
The second REACT challenge relies on two video conference corpora: RECOLA [3] and NoXi [4]. Specifically, we first segmented each audio-video recording in the two datasets into 30-second clips. Then, we cleaned the dataset by selecting only the dyadic interactions with complete data for both conversational partners (where both faces were within the frame of the camera). This resulted in 5919 clips of 30 seconds each (71.8 hours of audio-video clips), specifically: 5870 clips (49 hours) from the NoXi dataset and 54 clips (0.4 hours) from the RECOLA dataset. We divided the data into training, validation and test sets using a subject-independent strategy (i.e., the same subject was never included in both the train and test sets).
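A subject-independent split assigns whole subjects, not clips, to partitions. Since each clip here involves two partners, a clip can only be kept if both partners land in the same partition. The sketch below illustrates this logic; the clip record fields (`"speaker"`, `"listener"`) and split fractions are hypothetical, not the challenge's actual protocol.

```python
# Minimal sketch of a subject-independent split for dyadic clips,
# assuming each clip is tagged with the IDs of its two partners;
# field names and fractions are hypothetical.
import random

def subject_independent_split(clips, train_frac=0.6, val_frac=0.2, seed=0):
    """Assign whole subjects to splits, then keep only clips whose BOTH
    partners fall in the same split, so no subject leaks across sets."""
    subjects = sorted({s for c in clips for s in (c["speaker"], c["listener"])})
    random.Random(seed).shuffle(subjects)
    n = len(subjects)
    n_train = int(n * train_frac)
    n_val = int(n * (train_frac + val_frac))
    groups = {"train": set(subjects[:n_train]),
              "val": set(subjects[n_train:n_val]),
              "test": set(subjects[n_val:])}

    def bucket(clip):
        pair = {clip["speaker"], clip["listener"]}
        for name, group in groups.items():
            if pair <= group:
                return name
        return None  # partners fell into different splits: drop the clip

    return {name: [c for c in clips if bucket(c) == name] for name in groups}
```

Dropping clips whose partners straddle two partitions is the price of strict subject independence; in practice splits are usually drawn at the session level so that few clips are lost.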
The REmote COLlaborative and Affective interactions (RECOLA) database contains 9.5 hours of audio, visual, and physiological recordings of online dyadic interactions between 46 French-speaking participants collaborating on a task.
NoXi (NOvice eXpert Interaction) is a database of screen-mediated face-to-face interactions recorded during an information exchange task, covering multiple languages, multiple topics, and the occurrence of unexpected situations.
Participating in the Challenge
Result and Paper Submissions
Participants should use the training and validation sets to develop their facial reaction models and submit the final models via email to reactmultimodalchallenge@gmail.com. The organizers will evaluate the final results on the test set, and all participants will be ranked based on their test-set results. Please look at this page for more details.
The challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset -- these will be peer-reviewed and, once accepted, will appear in the FG 2024 Challenge/Workshop Proceedings. Papers must follow the same format requirements as the FG 2024 main conference (4 pages excluding the references).
Organizers
Dr Micol Spitale*, Politecnico di Milano, Italy & University of Cambridge, Cambridge, United Kingdom
Dr Siyang Song*, University of Leicester & University of Cambridge, United Kingdom
Cheng Luo, Monash University, Australia
Cristina Palmero, Universitat de Barcelona, Barcelona, Spain
German Barquero, Universitat de Barcelona, Barcelona, Spain
Prof Sergio Escalera, Universitat de Barcelona, Barcelona, Spain
Prof Michel Valstar, University of Nottingham, Nottingham, United Kingdom
Dr Tobias Baur, University of Augsburg, Augsburg, Germany
Dr Fabien Ringeval, Université Grenoble Alpes, Grenoble, France
Prof Elisabeth André, University of Augsburg, Augsburg, Germany
Prof Hatice Gunes, University of Cambridge, Cambridge, United Kingdom
Contact Us
Feel free to contact us at this email: reactmultimodalchallenge@gmail.com
References
[1] Song, S., Spitale, M., Luo, Y., Bal, B., and Gunes, H. "Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?." arXiv preprint arXiv:2302.06514 (2023).
[2] Song, S., Spitale, M., Luo, C., Barquero, G., Palmero, C., Escalera, S., ... & Gunes, H. (2023, October). REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 9620-9624).
[3] Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013, April). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG) (pp. 1-8). IEEE.
[4] Cafaro, A., Wagner, J., Baur, T., Dermouche, S., Torres Torres, M., Pelachaud, C., ... & Valstar, M. (2017, November). The NoXi database: multimodal recordings of mediated novice-expert interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (pp. 350-359).
[5] Song, Siyang, et al. "REACT2023: the first Multi-modal Multiple Appropriate Facial Reaction Generation Challenge." arXiv preprint arXiv:2306.06583 (2023).