ACII 2025 Workshop
Multilingual and Multimodal Affective Computing (MMAC)
Understanding Emotion Across Cultures and Contexts
Morning of October 11, 2025 @ Canberra, Australia
How we express and interpret emotion is shaped by the languages we speak, the cultures we belong to, and the social contexts we inhabit. However, current affective computing models, datasets, and annotation frameworks often rely on culturally homogeneous populations. This narrow focus risks misrepresenting the rich diversity of emotional expression found across global communities, resulting in inaccurate interpretations, cultural bias, and limited applicability in real-world scenarios.
The MMAC workshop aims to broaden the scope of affective computing by convening researchers from psychology, linguistics, human-computer interaction, machine learning, and AI ethics. Together, we will explore how emotional expression varies across languages, cultures, and communicative modalities. A central focus will be on integrating speech, facial expressions, body language, and gestures within diverse linguistic and social environments.
This workshop will be held at the 13th International Conference on Affective Computing and Intelligent Interaction (https://acii-conf.net/2025/). ACII 2025 will be an in-person event held at the Hotel Realm conference venue in Canberra, Australia.
We welcome submissions addressing the following topics:
Cross-cultural and multilingual emotion expression (Psychology & linguistics perspective)
E.g.
Analyze how gestures, speech, and body language vary across subcultures (e.g., regional Asian groups, immigrant communities).
Study emotion recognition in multilingual and code-switching contexts (e.g., dialectal variations, phonetic cues in emotional speech).
Multimodal Emotion Recognition in Dynamic Social Interactions (Computational perspective)
E.g.
Explore the interplay of speech, facial, and bodily expressions.
Model multi-person dynamics in social scenarios, including professional settings and informal communication.
Context-aware Data Resources and Annotation (Data-centric perspective)
E.g.
Build datasets capturing social norms and context-specific expressions.
Design annotation frameworks for linguistic diversity and ethical considerations (privacy, representation).
Inclusive AI (Ethics & social responsibility perspective)
E.g.
Develop strategies for cross-cultural model adaptation (e.g., transfer learning, meta-learning).
Establish benchmarks & evaluation metrics for AI’s ability to handle diverse expression styles.
The time zone for the deadlines below is Anywhere on Earth (AOE).
Paper submission deadline for workshops: 7 July 2025 (extended from 30 June 2025)
Workshop papers decision notification: 28 July 2025
Workshop camera-ready deadline: 20 August 2025
Workshop date and time: 11 October 2025 (morning)
We invite submissions of short papers (max. 5 pages: 4 pages + 1 page for references) focusing on one of the four key topics mentioned above. The workshop encourages both theoretical papers (establishing new directions for context- and culture-aware AI) and empirical papers (presenting new datasets, annotation frameworks, analytical methods, or computational models).
Submissions should be anonymized for double-blind review, follow the official ACII 2025 submission guidelines, and clearly state which of the four key topics they most closely align with. Each paper will be reviewed by at least two expert reviewers from the relevant key areas, with one of the organizers assigned as editor.
Accepted papers will be published in the ACII 2025 Workshop Proceedings. At least one author must register for the workshop and one conference day.
Please submit your manuscript via the EasyChair platform.
October 11
08:10 Opening
Session 1 Keynotes & Paper Presentations
08:15 Keynote 1 Embracing the Complexities in Emotion Recognition: Ambiguity, Dynamics and Measurement (Vidhyasaharan Sethu)
08:50 Keynote 2 Multimodal Deepfake Detection Across Cultures and Languages (Abhinav Dhall) (Online)
09:25 Short Paper Presentations
Each talk will be 7 minutes including Q&A.
All papers will be uploaded to Google Drive and will be accessible to all participants.
10:00 - 10:30 Coffee Break
10:30 Keynote 3 Personalising emotion recognition with explanations (Leimin Tian)
Session 2 Group Discussion
11:05 Introduction of topics and group allocation
11:15 Self-introduction within groups
11:25 Group discussion
12:10 Closing remarks
12:15 Lunch (self-arranged, opportunity for more discussion)
Vidhyasaharan Sethu, UNSW
Title: Embracing the Complexities in Emotion Recognition: Ambiguity, Dynamics and Measurement
Abstract: Emotion recognition systems have made significant strides in recent years, yet they fall short of mimicking the nuances of human perception of emotions. This talk addresses three critical aspects that are essential for advancing the field toward more human-like emotion recognition systems: handling the inherent ambiguity of perceived affect, modelling temporal dynamics, and quantifying performance in the presence of uncertainty. It will present the view that affective computing systems must embrace the inherent ambiguity in emotion perception at all stages to better emulate human interactions. Additionally, it will explain how tracking this ambiguity over time can enhance affect recognition by enabling the system to incorporate additional constraints and knowledge, such as the expectation that the degree of ambiguity will not fluctuate rapidly during a natural conversation. Finally, this talk will address the critical yet often overlooked aspect of quantifying prediction accuracy in the presence of ambiguity.
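To make the ambiguity point concrete, here is a minimal sketch (not taken from the talk itself): treat the reference for each utterance as the distribution of annotator labels rather than a single majority label, and score a model's predicted distribution against it with a distributional distance such as Jensen-Shannon. The label set, function names, and example values below are illustrative assumptions.

```python
# Illustrative sketch: evaluating emotion predictions against an ambiguous,
# multi-annotator reference instead of a single "ground truth" label.
import numpy as np
from scipy.spatial.distance import jensenshannon

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]  # assumed label set

def annotator_distribution(labels):
    """Convert the raw annotator labels for one utterance into a probability vector."""
    counts = np.array([labels.count(e) for e in EMOTIONS], dtype=float)
    return counts / counts.sum()

def ambiguity_aware_score(predicted, annotator_labels):
    """Jensen-Shannon distance between the model's predicted distribution and the
    annotator label distribution (0 = the full distribution is matched exactly)."""
    target = annotator_distribution(annotator_labels)
    return jensenshannon(np.asarray(predicted, dtype=float), target)

# Example: three of five annotators heard anger, two heard sadness.
print(ambiguity_aware_score([0.55, 0.05, 0.35, 0.05],
                            ["anger", "anger", "anger", "sadness", "sadness"]))
```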
Abhinav Dhall, Monash University
Title: Multimodal Deepfake Detection Across Cultures and Languages
Abstract: The growing accessibility of Generative AI-based image and video manipulation tools has made the creation of deepfakes easier. This poses significant challenges for content verification and facilitates the spread of misinformation. In this talk, we explore multimodal approaches, inspired by user behavior, for detecting and localizing manipulations in time. A key focus of our work is on the multilingual and multicultural aspects of deepfake detection. Our research draws on user studies, including those focusing on multicultural deepfakes, which provide insights into how different audiences perceive and interact with manipulated media. I will also discuss LLM-based approaches to multimodal dataset creation for affect tasks.
Leimin Tian, CSIRO
Title: Personalising emotion recognition with explanations
Abstract: Automatic facial expression recognition (FER) is key to developing emotion-aware intelligent agents and systems. However, key challenges remain in real-world applications due to the diversity of emotional expressions in different social, cultural, and personal contexts. We developed a novel explanation method that utilizes Facial Action Units (FAUs) to explain the output of a FER model through both textual and visual modalities. Our user study showed that combined visual and textual explanations resulted in better user understanding of, and appropriate trust in, the FER model. Further, we demonstrated that the same model used to generate the FAU-based explanations is effective as a sampling approach for personalising the FER model, by selecting FAU-activation-matched re-training samples from a collected dataset.
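As a rough illustration of the sampling idea described in the abstract (a hypothetical sketch, not the speaker's implementation): given a target user's average Facial Action Unit activations, select the candidate re-training samples whose FAU activation vectors lie closest to them. The function name, Euclidean distance choice, and array shapes are assumptions.

```python
# Hypothetical sketch of FAU-activation-matched sample selection for
# personalising an FER model (illustrative only).
import numpy as np

def select_personalisation_samples(user_fau, pool_fau, k=50):
    """Return indices of the k pool samples whose FAU activation vectors are
    closest (Euclidean distance) to the target user's average FAU activations."""
    user_fau = np.asarray(user_fau, dtype=float)   # shape: (n_aus,)
    pool_fau = np.asarray(pool_fau, dtype=float)   # shape: (n_samples, n_aus)
    distances = np.linalg.norm(pool_fau - user_fau, axis=1)
    return np.argsort(distances)[:k]

# Example with 17 AUs and a pool of 1000 candidate re-training images.
rng = np.random.default_rng(0)
idx = select_personalisation_samples(rng.random(17), rng.random((1000, 17)), k=5)
print(idx)
```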