REACT 2023 Multimodal Challenge
@ACM-MM23
Human behavioural responses are stimulated by the environment (or context): people inductively process a stimulus and modify their interactions to produce an appropriate response [1]. When facing the same stimulus, different facial reactions may be triggered not only across different subjects but also in the same subject under different contexts. The Multimodal Multiple Appropriate Facial Reaction Generation Challenge (REACT 2023) is a satellite event of ACM MM 2023 (Ottawa, Canada, October 2023). It aims to compare multimedia processing and machine learning methods for automatic human facial reaction generation under different dyadic interaction scenarios. The goal of the Challenge is to provide the first benchmark test set for multimodal information processing and to bring together the audio, visual and audio-visual affective computing communities, in order to compare the relative merits of different approaches to automatic appropriate facial reaction generation under well-defined conditions.
Important Dates
Registration opening: April 3, 2023
Training and development sets available: April 10, 2023
Baseline code available: May 22, 2023
Test sets available: June 12, 2023
Final code and results submission: July 3, 2023 (extended from June 30, 2023)
Top-3 teams notification of acceptance: July 8, 2023
Paper submission deadline: July 14, 2023
Camera ready paper: July 31, 2023
Workshop day: October 29, 2023 (TBD)
Challenge Tasks
Given the spatio-temporal behaviours expressed by a speaker over a given time period, the REACT 2023 Challenge consists of the following two sub-challenges, whose theoretical underpinnings are defined and detailed in [1, 5].
Task 1 - Offline Appropriate Facial Reaction Generation
This task aims to develop a machine learning model that takes the entire speaker behaviour sequence as input and generates multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of frame-level AUs, facial expressions, and valence and arousal states representing the predicted reaction. In other words, multiple facial reactions must be generated for each input speaker behaviour. A minimal interface sketch is given below.
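As an illustration only, the sketch below shows one possible offline interface in PyTorch: it consumes a full, already-extracted speaker feature sequence and emits several candidate reaction sequences at once. All names and dimensions are assumptions for the sketch (e.g. `feat_dim`, the fixed number of candidate reactions, and a 25-dimensional per-frame output if one assumes 15 AU occurrences, 8 expression probabilities, valence and arousal); they do not describe the official baseline or the required submission format.

```python
# Hypothetical sketch of an offline reaction generator interface (names/dims assumed).
import torch
import torch.nn as nn

class OfflineReactionGenerator(nn.Module):
    """Maps a full speaker behaviour sequence to several candidate listener reactions.

    Assumed per-frame output: AU occurrences, expression probabilities, valence, arousal.
    """

    def __init__(self, feat_dim=256, hidden_dim=512, out_dim=25, num_reactions=10):
        super().__init__()
        # Bidirectional encoder: the offline setting may look at the whole sequence.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, out_dim) for _ in range(num_reactions)]
        )

    def forward(self, speaker_feats):  # (batch, T, feat_dim): fused audio-visual features
        context, _ = self.encoder(speaker_feats)             # (batch, T, 2*hidden_dim)
        # Each head proposes one appropriate reaction sequence for the same speaker input.
        reactions = [head(context) for head in self.heads]   # list of (batch, T, out_dim)
        return torch.stack(reactions, dim=1)                 # (batch, num_reactions, T, out_dim)
```

The multi-head output is just one way to realise "multiple appropriate reactions"; sampling-based generative decoders are an equally valid design choice.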
Task 2 - Online Appropriate Facial Reaction Generation
This task aims to develop a machine learning model that generates each facial reaction frame from the speaker behaviour frames observed so far, rather than taking the entire sequence into consideration. The model is expected to gradually produce all facial reaction frames, forming multiple appropriate and realistic/naturalistic spatio-temporal facial reactions, each consisting of frame-level AUs, facial expressions, and valence and arousal states representing the predicted reaction. As with Task 1, multiple facial reactions must be generated for each input speaker behaviour. A causal counterpart of the sketch above is shown below.
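For contrast, here is a minimal causal sketch of the online setting: it steps through the speaker frames one at a time, so each generated reaction frame depends only on current and past input. Again, module names and dimensions are hypothetical and do not reflect the official baseline.

```python
# Hypothetical sketch of an online (frame-by-frame) reaction generator (names/dims assumed).
import torch
import torch.nn as nn

class OnlineReactionGenerator(nn.Module):
    """Emits one reaction frame per incoming speaker frame, using only past context."""

    def __init__(self, feat_dim=256, hidden_dim=512, out_dim=25):
        super().__init__()
        self.cell = nn.GRUCell(feat_dim, hidden_dim)   # causal: no access to future frames
        self.head = nn.Linear(hidden_dim, out_dim)     # assumed AUs, expressions, valence, arousal

    def forward(self, speaker_feats):  # (batch, T, feat_dim)
        batch, T, _ = speaker_feats.shape
        h = speaker_feats.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(T):                             # one reaction frame per speaker frame
            h = self.cell(speaker_feats[:, t], h)
            outputs.append(self.head(h))
        return torch.stack(outputs, dim=1)             # (batch, T, out_dim)
```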
Challenge Datasets
The REACT 2023 Multimodal Challenge Dataset is a compilation of recordings from the following three publicly available datasets for studying dyadic interactions: UDIVA [2], RECOLA [3], and NOXI [4].
UDIVA (Understanding Dyadic Interactions from Video and Audio signals) contains 188 dyadic interaction clips between 147 voluntary participants, totalling 90.5 hours of recordings. Each clip contains two audio-visual files that record the dyadic interaction between a pair of participants who converse about five tasks.
The REmote COLlaborative and Affective interactions (RECOLA) database contains 9.5 hours of audio, visual, and physiological recordings of online dyadic interactions between 46 French-speaking participants collaborating on a task.
NOXI (NOvice eXpert Interaction) is a database of screen-mediated face-to-face interactions recorded during an information retrieval task, covering multiple languages, multiple topics, and the occurrence of unexpected situations.
Participating in the Challenge
Participants will be required to join the REACT challenge on CodaLab later in the challenge timeline (around May 2023).
Result and Paper Submissions
Participants should use the training and validation sets to develop their facial reaction models and submit the final models via email to reactmultimodalchallenge@gmail.com. The organizers will evaluate the final results on the test set, and all participants will be ranked by their test-set results. Please see this page for more details.
The challenge participants will be invited to submit a workshop-style paper describing their ML solutions and results on the dataset. These papers will be peer-reviewed and, once accepted, will appear in the ACM Multimedia 2023 Challenge/Workshop Proceedings. Papers must follow the same formatting requirements as the ACM MM 2023 main conference (4 pages, excluding references). Please see this page for more details.
Organizers
Dr Micol Spitale*, University of Cambridge, Cambridge, United Kingdom
Dr Siyang Song*, University of Leicester & University of Cambridge, United Kingdom
Cristina Palmero, Universitat de Barcelona, Barcelona, Spain
Prof Sergio Escalera, Universitat de Barcelona, Barcelona, Spain
Prof Michel Valstar, University of Nottingham, Nottingham, United Kingdom
Dr Tobias Baur, University of Augsburg, Augsburg, Germany
Dr Fabien Ringeval, Université Grenoble Alpes, Grenoble, France
Prof Elisabeth André, University of Augsburg, Augsburg, Germany
Prof Hatice Gunes, University of Cambridge, Cambridge, United Kingdom
Sponsor
We would like to thank our sponsor BlueSkeye AI for supporting the REACT2023 challenge!
Contact Us
Feel free to contact us at this email: reactmultimodalchallenge@gmail.com
References
[1] Song, S., Spitale, M., Luo, Y., Bal, B., & Gunes, H. (2023). Multiple appropriate facial reaction generation in dyadic interaction settings: What, why and how? arXiv preprint arXiv:2302.06514.
[2] Palmero, C., Selva, J., Smeureanu, S., Junior, J., Jacques, C. S., Clapés, A., ... & Escalera, S. (2021). Context-aware personality inference in dyadic scenarios: Introducing the UDIVA dataset. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1-12).
[3] Ringeval, F., Sonderegger, A., Sauer, J., & Lalanne, D. (2013). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG) (pp. 1-8). IEEE.
[4] Cafaro, A., Wagner, J., Baur, T., Dermouche, S., Torres Torres, M., Pelachaud, C., ... & Valstar, M. (2017). The NoXi database: Multimodal recordings of mediated novice-expert interactions. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (pp. 350-359).
[5] Song, S., et al. (2023). REACT2023: The first multi-modal multiple appropriate facial reaction generation challenge. arXiv preprint arXiv:2306.06583.