According to the Stimulus-Organism-Response theory, individuals may react differently to the same external stimulus depending on their internal state and external contextual factors at a given point in time. Analogously, in dyadic interactions, a broad spectrum of human facial reactions may be appropriate responses to a specific speaker behaviour. Following the successful organisation of the REACT 2023 and REACT 2024 challenges, a body of generative deep learning (DL) models has been investigated for the problem of multiple appropriate facial reaction generation (MAFRG). However, these two challenges were built on manually segmented 30-second dyadic interaction clips originally recorded for personality and emotion recognition purposes, and thus their appropriateness labels were not reliably obtained during data collection. We therefore propose the REACT 2025 challenge, which encourages the development and benchmarking of Machine Learning (ML) models that generate multiple appropriate, diverse, realistic and synchronised human-style facial reactions expressed by human listeners in response to an input stimulus (i.e., the audio-visual behaviours expressed by their corresponding speakers). As a key component of the challenge, we will provide participants with the first natural and large-scale multi-modal MAFRG dataset (called MARS), which records 137 human-human dyadic interactions comprising a total of 3,105 interaction sessions covering five different topics. We will then invite the participating teams to submit their developed/trained ML models for evaluation, which will be benchmarked in terms of the appropriateness, diversity, realism and synchronisation of their generated facial reactions.
Launching Challenge website and call for participation poster: March 10, 2025
Registration open: March 10, 2025
Training and validation sets released: March 31, 2025
Baseline paper and code released: May 22, 2025
Model submission opening: May 26, 2025
Final result and model submission deadline: July 5, 2025 (extended from June 26, 2025)
Paper submission deadline: July 20, 2025 (extended from June 30, 2025)
Paper acceptance notification: July 24, 2025
Camera-ready paper submission deadline: August 26, 2025
Challenge workshop: October 2025 (TBD)
The first edition of the REACT challenge was held in conjunction with ACM Multimedia (ACM-MM) 2023 in Ottawa, Canada.
As a result of the first edition, we released the baseline code in this GitHub repository and the corresponding paper. The call for participation attracted registrations from 11 teams across 6 countries, with 10 teams participating in the Offline and Online sub-challenges. The top 3 teams successfully submitted valid models, results and papers for the challenge, with each paper submission assigned two reviewers.
The information about the previous edition can be found on this website.
The second edition of the REACT challenge was held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) in Istanbul, Turkey.
As a result of the second edition, we released the baseline code in this GitHub repository and the corresponding paper. The call for participation attracted registrations from 13 teams across 6 countries, with 13 teams participating in the Offline and 12 teams in the Online sub-challenge. The top 3 teams successfully submitted valid models, results and papers for the challenge, with each paper submission assigned two reviewers.
The information about the previous edition can be found on this website.
Given the spatio-temporal behaviours expressed by a speaker over a given time period, the REACT 2025 Challenge consists of the following two sub-challenges, whose theoretical underpinnings are defined and detailed in this paper.
This task aims to develop a machine learning model that takes the entire speaker behaviour sequence as input and generates multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of AUs, facial expressions, and valence and arousal states representing the predicted facial reaction. As a result, multiple facial reactions are required to be generated for each input speaker behaviour.
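For illustration, below is a minimal sketch of what an offline model's interface might look like. It is not the official baseline architecture: the feature dimensions (e.g., a 25-dimensional per-frame reaction vector covering AUs, expression probabilities, valence and arousal), module choices and class names are all assumptions made only to show that the whole speaker sequence is available to the model and that repeated sampling yields multiple distinct reactions.

```python
# Illustrative sketch only: a hypothetical offline MAFRG interface, NOT the official baseline.
# All dimensions and names are assumptions for demonstration purposes.
import torch
import torch.nn as nn


class OfflineReactionGenerator(nn.Module):
    def __init__(self, speaker_dim=128, reaction_dim=25, hidden_dim=256):
        super().__init__()
        # Bidirectional encoder: the entire speaker sequence is visible to the model.
        self.encoder = nn.GRU(speaker_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(hidden_dim * 2 + reaction_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, reaction_dim)

    def forward(self, speaker_seq, num_samples=10):
        # speaker_seq: (batch, time, speaker_dim) -- the *entire* speaker behaviour sequence.
        context, _ = self.encoder(speaker_seq)  # (batch, time, 2 * hidden_dim)
        reactions = []
        for _ in range(num_samples):
            # Inject noise so repeated sampling yields multiple distinct reaction sequences.
            noise = torch.randn(speaker_seq.size(0), speaker_seq.size(1), self.head.out_features)
            out, _ = self.decoder(torch.cat([context, noise], dim=-1))
            reactions.append(self.head(out))  # (batch, time, reaction_dim)
        return torch.stack(reactions, dim=1)  # (batch, num_samples, time, reaction_dim)


if __name__ == "__main__":
    model = OfflineReactionGenerator()
    fake_speaker = torch.randn(2, 750, 128)  # e.g., 30 s at 25 fps with hypothetical features
    print(model(fake_speaker, num_samples=3).shape)  # torch.Size([2, 3, 750, 25])
```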
This task aims to develop a machine learning model that estimates each facial reaction frame online, i.e., from the speaker behaviour observed up to that frame, rather than taking all frames into consideration. The model is expected to gradually generate all facial reaction frames to form multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of AUs, facial expressions, and valence and arousal states representing the predicted facial reaction. As a result, multiple facial reactions are required to be generated for each input speaker behaviour.
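By contrast with the offline setting, an online model consumes the speaker behaviour frame by frame. The sketch below, again with hypothetical dimensions and names rather than the official baseline, shows how each reaction frame can be produced from only the speaker frames seen so far and the previously generated reaction frames.

```python
# Illustrative sketch only: a hypothetical online (frame-by-frame) MAFRG model, NOT the
# official baseline. All dimensions and names are assumptions for demonstration purposes.
import torch
import torch.nn as nn


class OnlineReactionGenerator(nn.Module):
    def __init__(self, speaker_dim=128, reaction_dim=25, hidden_dim=256):
        super().__init__()
        self.cell = nn.GRUCell(speaker_dim + reaction_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, reaction_dim)

    def forward(self, speaker_seq):
        # speaker_seq: (batch, time, speaker_dim), consumed one frame at a time.
        batch, time, _ = speaker_seq.shape
        h = speaker_seq.new_zeros(batch, self.cell.hidden_size)
        prev_reaction = speaker_seq.new_zeros(batch, self.head.out_features)
        frames = []
        for t in range(time):
            # Only the current speaker frame and previously generated reaction frames are used;
            # no future speaker frames are accessed.
            h = self.cell(torch.cat([speaker_seq[:, t], prev_reaction], dim=-1), h)
            prev_reaction = self.head(h)
            frames.append(prev_reaction)
        return torch.stack(frames, dim=1)  # (batch, time, reaction_dim)


if __name__ == "__main__":
    model = OnlineReactionGenerator()
    print(model(torch.randn(2, 750, 128)).shape)  # torch.Size([2, 750, 25])
```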
Unlike the previous challenges, this year we propose a new dataset named Multi-modal Multiple Appropriate Reaction in Social Dyads (MARS). MARS is the first multi-modal dataset specifically collected for MAFRG tasks. It comprises 137 human-human dyadic interaction audio-visual-EEG clips recorded from 23 speakers and 137 listeners, with participants taking one of two roles: speaker or listener. Each clip captures the audio, face and EEG behaviours of a speaker-listener pair in separate files, resulting in 270 multi-modal recordings whose durations range from 20 to 35 minutes. Each multi-modal recording contains 23 distinct sessions covering five main topics conducted in a fixed order: cultural differences, movie scene sharing, policy changes, quizzes and games, and scenario-based interviews. As a result, all recordings were split into 3,105 multi-modal session pairs (6,210 sessions).

During each recording, the speaker and listener were situated in separate rooms and interacted with each other on Microsoft Teams through screens. Beforehand, two volunteers assisted both the speaker and the listener in setting up their cameras, microphones and EEG sensors (MUSE-2), and adjusted the webcams to optimally capture each participant within the recording frame. The speaker was then instructed to start the conversation once the audio, video and EEG recordings commenced; each speaker was responsible for initiating and directing the discussion with their corresponding listener, maintaining consistent semantic contexts through the pre-designed conversational topics. Consequently, a set of diverse verbal and non-verbal reactions expressed by different listeners under a consistent context can be considered appropriate in response to each speaker behaviour that shapes the designed semantic context. Throughout each conversation session, the two volunteers monitored the interaction from a third room to ensure it followed the designed interaction protocol and to immediately handle unexpected interruptions (e.g., network interruptions). When the conversation concluded, the volunteers returned to the recording rooms to stop the recordings and help participants remove their wearable equipment. All recordings of the MARS dataset were collected primarily from students and staff at the University of Leicester, United Kingdom, between June and October 2024.
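For participants planning their data pipeline, the following is a hypothetical sketch of how speaker and listener session recordings could be paired into MAFRG training examples. The directory layout, file names and helper function are assumptions made purely for illustration; the actual MARS file structure and official data loader are provided with the baseline code.

```python
# Illustrative sketch only: a hypothetical pairing of speaker/listener session files.
# The layout below is an assumption, NOT the official MARS data format.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SessionPair:
    speaker_video: Path
    speaker_audio: Path
    listener_video: Path
    listener_audio: Path


def collect_session_pairs(root: Path) -> list:
    """Pair speaker and listener files that share the same (hypothetical) session folder name."""
    pairs = []
    for speaker_dir in sorted((root / "speaker").iterdir()):
        listener_dir = root / "listener" / speaker_dir.name
        if not listener_dir.is_dir():
            continue  # skip sessions without a matching listener recording
        pairs.append(SessionPair(
            speaker_video=speaker_dir / "video.mp4",
            speaker_audio=speaker_dir / "audio.wav",
            listener_video=listener_dir / "video.mp4",
            listener_audio=listener_dir / "audio.wav",
        ))
    return pairs
```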
Participants should use the training and validation sets to develop their facial reaction models and submit their final models via email to s.song@exeter.ac.uk. The final results will be evaluated by the organisers on the test set, and all participants will be ranked based on their test-set results. Please look at this page for more details.
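As a rough illustration of one evaluation dimension (diversity), the toy measure below computes the variance across a model's generated reaction samples for a single speaker input. This is not the official challenge metric; the official metrics and protocol are those defined by the organisers and the baseline paper.

```python
# Illustrative sketch only: a toy diversity measure (variance across generated samples).
# NOT the official REACT evaluation metric; it merely shows the shape of the quantities
# compared when multiple reactions are generated per speaker input.
import torch


def sample_diversity(reactions: torch.Tensor) -> torch.Tensor:
    # reactions: (num_samples, time, reaction_dim) generated for one speaker input.
    return reactions.var(dim=0, unbiased=False).mean()


if __name__ == "__main__":
    print(sample_diversity(torch.randn(10, 750, 25)))  # larger value = more diverse samples
```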
The challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset -- these will be peer-reviewed and, once accepted, will appear in the ACM Multimedia 2025 Challenge/Workshop Proceedings. The paper format follows the same requirements as the ACM-MM 2025 main conference (4 pages excluding references). Please look at this page for more details.
The baseline paper: https://arxiv.org/pdf/2505.17223
The baseline code: https://github.com/reactmultimodalchallenge/baseline_react2025
Dr Siyang Song, University of Exeter, Exeter, United Kingdom
Dr Micol Spitale, Politecnico di Milano, Milan, Italy
Xiangyu Kong, University of Exeter, Exeter, United Kingdom
Hengde Zhu, University of Leicester, Leicester, United Kingdom
Cheng Luo, King Abdullah University of Science and Technology, Saudi Arabia
Dr Cristina Palmero, King’s College London, London, United Kingdom
German Barquero, Universitat de Barcelona, Barcelona, Spain
Prof Sergio Escalera, Universitat de Barcelona, Barcelona, Spain
Prof Michel Valstar, University of Nottingham, Nottingham, United Kingdom
Prof Mohamed Daoudi, IMT Nord Europe, Villeneuve d’Ascq, France
Dr Tobias Baur, University of Augsburg, Augsburg, Germany
Dr Fabien Ringeval, Université Grenoble Alpes, Grenoble, France
Prof Andrew Howes, University of Exeter, Exeter, United Kingdom
Prof Elisabeth André, University of Augsburg, Augsburg, Germany
Prof Hatice Gunes, University of Cambridge, Cambridge, United Kingdom
Feel free to contact us at this email: reactmultimodalchallenge@gmail.com
[1] S. Song, M. Spitale, Y. Luo, B. Bal, and H. Gunes, "Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?", arXiv preprint arXiv:2302.06514, 2023.
[2] S. Song, M. Spitale, C. Luo, G. Barquero, C. Palmero, S. Escalera, et al., "REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge", in Proceedings of the 31st ACM International Conference on Multimedia, pp. 9620-9624, 2023.
[3] S. Song, et al., "REACT2023: The First Multi-modal Multiple Appropriate Facial Reaction Generation Challenge", arXiv preprint arXiv:2306.06583, 2023.
[4] C. Luo, et al., "ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions", IEEE Transactions on Visualization and Computer Graphics, 2024.
[5] S. Song, et al., "REACT 2024: The Second Multiple Appropriate Facial Reaction Generation Challenge", in 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), IEEE, 2024.
[6] M.-D. Nguyen, et al., "Vector Quantized Diffusion Models for Multiple Appropriate Reactions Generation", in 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), IEEE, 2024.
[7] H. Zhu, et al., "PerFRDiff: Personalised Weight Editing for Multiple Appropriate Facial Reaction Generation", in Proceedings of the 32nd ACM International Conference on Multimedia, 2024.
[8] D.-K. Nguyen, et al., "Multiple Facial Reaction Generation Using Gaussian Mixture of Models and Multimodal Bottleneck Transformer", in 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), IEEE, 2024.
[9] X. Hoque, et al., "BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions", in Proceedings of the 31st ACM International Conference on Multimedia, 2023.
[10] J. Yu, et al., "Leveraging the Latent Diffusion Models for Offline Facial Multiple Appropriate Reactions Generation", in Proceedings of the 31st ACM International Conference on Multimedia, 2023.
[11] Z. Liu, et al., "One-to-Many Appropriate Reaction Mapping Modeling with Discrete Latent Variable", in 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), IEEE, 2024.