The Micro-Action Analysis Grand Challenge focuses on computer vision and machine learning methods for the automatic analysis of human behavior based on whole-body micro-actions, which are closely bound up with psychological, mental, and emotional states. Micro-actions are spontaneous body movements that reveal a person's true feelings and latent intentions, yet recognizing, distinguishing, and understanding them is challenging because they are subtle and, compared to normal actions, last only milliseconds. To address these challenges, we successfully organized the 1st Micro-Action Analysis Grand Challenge (MAC 2024@ACM MM 2024) and the 2nd Micro-Action Analysis Grand Challenge (MAC 2025@ACM MM 2025), which together attracted over 100 teams worldwide and demonstrated the growing interest in this emerging field. Building on this momentum, we are excited to announce the 3rd Micro-Action Analysis Grand Challenge (MAC 2026).
This challenge aims to foster innovative research in this emerging domain and provide benchmark evaluations to stimulate new approaches for utilizing whole-body micro-actions in human behavior understanding. Ultimately, our goal is to promote technological advancements in deep psychological assessment and emotional state analysis, and to inspire interdisciplinary collaboration within the research community.
MAC 2026 focuses on the recognition, detection, and understanding of micro-actions. The challenge aims to develop and benchmark models capable of human micro-action recognition (MAR), multi-label micro-action detection (MMAD), and fine-grained micro-action understanding (FMAU), in preparation for exploring the relationship between micro-actions and human emotions.
Micro-Action Recognition (MAR) aims to recognize and distinguish subtle body actions that typically occur in a brief instant. The MAR task is similar to conventional action recognition, as it involves using video instances as input and requires precise and efficient algorithms. However, it is uniquely complex due to the presence of low-amplitude fluctuations in gestures and postures.
Since human micro-actions co-occur, i.e., the same micro-action may repeat over time and different micro-actions may occur simultaneously, Multi-label Micro-Action Detection (MMAD) is necessary for a deeper understanding of human bodily behavior. MMAD is the task of identifying and localizing all micro-actions in a given untrimmed, densely annotated video, determining their start and end times as well as their categories. The task takes an entire video as input and requires a model that accurately captures both long-term and short-term action relationships to detect and localize multiple micro-actions. Designing a model for MMAD is especially challenging because of the brief duration and small magnitude of micro-actions.
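To make the detection setting concrete, the sketch below shows how predicted temporal segments might be matched to ground truth via temporal intersection-over-union (tIoU), a standard criterion in temporal action detection. The `(start, end, label)` segment format and the greedy matching scheme are illustrative assumptions, not the official MMAD evaluation protocol.

```python
# Hypothetical sketch: matching predicted micro-action segments to ground truth
# with temporal IoU. Segment format (start, end, label) is an assumption,
# not the official MMAD annotation format.

def temporal_iou(seg_a, seg_b):
    """Temporal intersection-over-union of two (start, end) intervals."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thresh=0.5):
    """Greedily match each prediction to an unused ground-truth segment
    of the same label whose tIoU meets the threshold.

    preds, gts: lists of (start, end, label). Returns the true-positive count.
    """
    used = set()
    tp = 0
    for p_start, p_end, p_label in preds:
        best, best_iou = None, iou_thresh
        for i, (g_start, g_end, g_label) in enumerate(gts):
            if i in used or g_label != p_label:
                continue
            iou = temporal_iou((p_start, p_end), (g_start, g_end))
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            used.add(best)
            tp += 1
    return tp
```

In practice, challenge leaderboards typically average such matching over several tIoU thresholds (e.g., 0.2 to 0.7) to produce a mean average precision score.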
Inspired by the rapid advancement of multimodal large language models (MLLMs), which demonstrate strong capabilities in visual understanding and reasoning, Fine-grained Micro-Action Understanding (FMAU) aims to evaluate whether MLLMs can perceive, compare, and reason about subtle micro-actions. Formulated as a video question answering task, it takes a micro-action video and a query as input, requiring answers in multiple formats (e.g., multiple-choice, Yes/No, or open-ended). The track is further organized into three tiers—perceptual recognition, relational comprehension, and interpretive reasoning—and is the most challenging of the three, as it requires deeper reasoning over subtle motion cues and complex dependencies.
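To illustrate the three FMAU tiers and answer formats described above, the sketch below shows one hypothetical query per tier. All field names, file names, and question wordings are illustrative assumptions, not the official annotation schema.

```python
# Hypothetical FMAU-style queries, one per tier. Field names and content
# are illustrative assumptions, not the official challenge schema.

queries = [
    {   # Tier 1, perceptual recognition: multiple-choice
        "video": "clip_0001.mp4",
        "question": "Which micro-action occurs in this clip?",
        "options": ["scratching head", "shaking legs", "touching nose"],
        "answer_type": "multiple-choice",
    },
    {   # Tier 2, relational comprehension: Yes/No
        "video": "clip_0002.mp4",
        "question": "Does the head movement occur before the hand movement?",
        "answer_type": "yes/no",
    },
    {   # Tier 3, interpretive reasoning: open-ended
        "video": "clip_0003.mp4",
        "question": "What might the repeated leg shaking suggest "
                    "about the person's emotional state?",
        "answer_type": "open-ended",
    },
]
```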
Mar. 7, 2026: Our Grand Challenge proposal has been accepted at ACM MM 2026!
Contact information is listed as follows:
• For general questions, please get in touch with guodan@hfut.edu.cn
• For questions regarding the challenge, please contact both kunli.hfut@gmail.com and guodan@hfut.edu.cn