The 1st International Workshop on Multimodal Foundation Models for 3D/4D Facial Expression Analysis and Synthesis (MFM-FE 2026) will be held in conjunction with the 20th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2026), taking place in Kyoto, Japan. This workshop aims to bring together researchers and practitioners from computer vision, affective computing, and multimodal AI to explore the latest advances in foundation models for facial behavior understanding and generation. With the rapid emergence of large-scale pre-trained models that integrate vision, language, and temporal dynamics, new opportunities have arisen for analyzing and synthesizing complex 3D/4D facial expressions across diverse contexts and applications. MFM-FE 2026 will serve as a focused forum for discussing novel methodologies, datasets, and applications, fostering interdisciplinary collaboration and advancing the state of the art in facial expression analysis and synthesis within the broader FG research community.
Facial expression analysis has long been central to understanding human affect, behavior, and communication. The emergence of foundation models spanning vision, language, and multimodal learning has transformed how subtle and dynamic facial behaviors can be modeled, interpreted, and generated. Traditional CNN- or RNN-based approaches, while effective in constrained settings, struggle to generalize across identities, cultures, and real-world variability. In contrast, large-scale pre-trained multimodal architectures offer scalable, transferable, and interpretable representations for 3D/4D facial dynamics, micro- and macro-expression recognition, and text-guided expression synthesis. This workshop aims to explore how multimodal and foundation model paradigms can advance facial expression research, moving beyond static emotion recognition toward dynamic, context-aware, and linguistically grounded understanding of human affect. It seeks to bring together researchers from affective computing, multimodal learning, behavioral signal processing, and generative modeling to define the next generation of human-centered AI for expressive behavior.
Topics of interest include, but are not limited to:
Multimodal foundation models for facial expression analysis
Vision-language models for emotion understanding
3D and 4D facial dynamics learning and modeling
Micro- and macro-expression recognition with pre-trained models
Self-supervised and few-shot learning for facial behavior analysis
Text-driven facial expression synthesis and editing
Diffusion and transformer-based generative models for expressions
Cross-modal fusion of facial, vocal, and physiological signals
Context-aware and explainable affective computing
Temporal and dynamic modeling of facial expressions
Domain adaptation and generalization across datasets
Applications in mental health and affective disorder detection
Deception detection and negotiation behavior analysis
Emotion understanding for human–robot interaction
Benchmark datasets and evaluation protocols for multimodal expression analysis
Ethical, privacy, and bias considerations in facial expression modeling
Interpretability and trustworthy AI for affective systems
Multiview and cross-domain facial representation learning
Text-guided 3D/4D facial animation and synthesis
Integration of large pre-trained models in social and behavioral computing
Workshop paper submission deadline: April 05, 2026
Notification of acceptance: April 15, 2026
Camera-ready deadline: April 21, 2026 (aligned with FG 2026 main conference deadline)
Submissions via the FG 2026 CMT portal: https://cmt3.research.microsoft.com/FG2026 (select the corresponding Workshop track)
Review process: single-blind peer review.
Submissions may be up to 8 pages plus references, following the main conference format.
Please follow the submission instructions and paper format (Overleaf/LaTeX/Word templates) posted on the main IEEE FG 2026 website.
Accepted papers will be published in the IEEE FG 2026 Workshop Proceedings.
Workshop Lead Organizer: Dr. Muzammil Behzad, Assistant Professor, King Fahd University of Petroleum and Minerals, Saudi Arabia. Email: muzammil.behzad@kfupm.edu.sa.
Workshop Co-Organizer: Dr. Yante Li, Postdoctoral Researcher, Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland. Email: yante.li@oulu.fi.