Micro-Action Analysis Grand Challenge
A Challenge in Conjunction with ACM MM'24
Introduction
The Micro-Action Analysis Grand Challenge (MAC 2024) focuses on computer vision and machine learning methods for automatic human behavior analysis based on whole-body micro-actions, which are closely tied to psychological, mental, and emotional states. Micro-actions are spontaneous body movements that indicate a person's true feelings and potential intentions, yet recognizing, distinguishing, and understanding them is challenging because, compared with normal actions, they are subtle and appear for only milliseconds. The challenge comprises two tasks: single micro-action recognition (MAR) and multi-label micro-action detection (MMAD). For MAC 2024 we collected two challenge datasets, MA-52 and MMA-52, each annotated with 52 micro-action categories along with seven body parts. The goal of this Grand Challenge is to foster innovative research in this new direction, to provide benchmark evaluations that inspire new ways of using whole-body micro-actions for human behavior understanding, and to advance technologies in the deep psychological assessment and human emotion state analysis communities.
Datasets
To support the Micro-Action Analysis Grand Challenge (MAC 2024), we collected two new datasets of spontaneous whole-body micro-action videos, covering 52 action categories, through psychological interviews.
Dataset Collection
205 volunteers, with a male-to-female ratio close to 1:1 and a relatively even age distribution
Professional face-to-face psychological interviews
Under the guidance of the Symptom Checklist 90 (SCL90) test (a comprehensive 90-item psychological assessment questionnaire)
A high-resolution 1920×1080 camera to capture authentic and spontaneous micro-behaviors.
To collect authentic spontaneous micro-actions, we aimed to minimize interference with the respondents’ bodily action expressions and capture as many varied and rich micro-actions as possible.
To ensure participants' comfort and relaxation, they remained seated during the interview.
Annotations of the two new datasets
The Micro-Action-52 (MA-52) dataset consists of trimmed video samples with single micro-action (MA) annotations. The duration of these samples ranges from 1 second to 7 seconds, with an average duration of 1.9 seconds and a total duration of 12.29 hours. When annotating MA-52, we marked the start and end of each single micro-action occurrence and avoided annotating multiple micro-actions within the same time period. This keeps the annotations clear and accurate, resulting in a consistent and reliable dataset.
The Multi-Label Micro-Action-52 (MMA-52) dataset includes video samples containing multiple micro-actions (MMA) whose occurrences may overlap in time. The duration of these samples ranges from 5 seconds to 15 seconds, with an average duration of 10.61 seconds and a total duration of 30.44 hours. In the MMA-52 dataset, the edited video segments may contain some unannotated content, but at least two micro-actions are annotated per video clip.
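The constraints described above for MMA-52 clips (5 to 15 seconds long, at least two annotated micro-actions, occurrences that may overlap) can be illustrated with a small sketch. The record layout, the field names `label`, `start`, and `end`, the zero-based label indexing, and both helper functions are assumptions made for illustration; they are not the released annotation format.

```python
from __future__ import annotations
from dataclasses import dataclass

NUM_CATEGORIES = 52  # stated in the text: 52 micro-action categories


@dataclass(frozen=True)
class Segment:
    """One annotated micro-action (hypothetical layout, not the official format)."""
    label: int    # category index in [0, NUM_CATEGORIES); indexing is an assumption
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


def validate_clip(duration: float, segments: list[Segment]) -> list[str]:
    """Return a list of problems; an empty list means the clip fits the description."""
    problems = []
    if not 5.0 <= duration <= 15.0:
        problems.append("clip duration outside 5-15 s")
    if len(segments) < 2:
        problems.append("fewer than two annotated micro-actions")
    for s in segments:
        if not 0 <= s.label < NUM_CATEGORIES:
            problems.append(f"label {s.label} out of range")
        if not 0.0 <= s.start < s.end <= duration:
            problems.append(f"segment ({s.start}, {s.end}) not inside clip")
    return problems


def overlapping_pairs(segments: list[Segment]) -> list[tuple[int, int]]:
    """Index pairs of segments whose time spans intersect."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            a, b = segments[i], segments[j]
            if max(a.start, b.start) < min(a.end, b.end):
                pairs.append((i, j))
    return pairs


# Made-up example clip: two overlapping actions, then a repeat of the first one.
clip = [Segment(3, 0.5, 2.0), Segment(17, 1.5, 4.0), Segment(3, 6.0, 7.2)]
print(validate_clip(10.0, clip))
print(overlapping_pairs(clip))
```

The example clip is invented; it only shows that the same category can recur in time while different categories overlap.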
Challenges
MAC 2024 focuses on the recognition and detection of micro-actions. The challenge aims to develop and benchmark models capable of human micro-action recognition (MAR) and multi-label micro-action detection (MMAD), in preparation for exploring the relationship between micro-actions and human emotions. Detailed tasks are defined in this paper.
Track 1: Micro-Action Recognition (MAR)
Micro-Action Recognition (MAR) aims to recognize and distinguish subtle body actions that typically occur in a brief instant. The MAR task is similar to conventional action recognition, as it involves using video instances as input and requires precise and efficient algorithms. However, it is uniquely complex due to the presence of low-amplitude fluctuations in gestures and postures.
Track 2: Multi-label Micro-Action Detection (MMAD)
Human micro-actions co-occur: the same micro-action may be repeated over time, and different micro-actions may occur at the same time. Multi-label Micro-Action Detection (MMAD) is therefore necessary for a deeper understanding of human bodily behavior. MMAD is the task of identifying and localizing all micro-actions in a given untrimmed, densely annotated video, determining their start and end times as well as their categories. The task takes an entire video as input and requires a model that accurately captures both long-term and short-term action relationships in order to detect and locate multiple micro-actions. Designing a model for MMAD is especially challenging because micro-actions are brief and small in magnitude.
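Since MMAD outputs a category together with a start and end time, predictions are typically compared with ground truth by temporal overlap. The sketch below shows temporal intersection-over-union (tIoU) and a greedy, label-aware matching step. This page does not state the official evaluation protocol, so the 0.5 threshold, the tuple layouts, and the function names are assumptions for illustration only.

```python
def temporal_iou(pred, gt):
    """tIoU of two (start, end) intervals given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, gts, threshold=0.5):
    """Greedy matching of predictions to ground truth.

    preds: list of (label, start, end, score)
    gts:   list of (label, start, end)
    A prediction is a true positive if it has the same label as an unused
    ground-truth segment and their tIoU is at least `threshold` (assumed value).
    """
    used = set()
    true_positives = 0
    # Visit predictions from most to least confident.
    for label, start, end, _score in sorted(preds, key=lambda p: -p[3]):
        best_j, best_iou = None, threshold
        for j, (g_label, g_start, g_end) in enumerate(gts):
            if j in used or g_label != label:
                continue
            iou = temporal_iou((start, end), (g_start, g_end))
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)  # each ground-truth segment is matched at most once
            true_positives += 1
    return true_positives


# Made-up example: two overlapping ground-truth actions and three predictions.
gts = [(3, 0.0, 2.0), (17, 1.0, 3.0)]
preds = [(3, 0.1, 2.0, 0.9), (17, 2.5, 4.0, 0.8), (3, 0.0, 1.9, 0.5)]
print(count_true_positives(preds, gts))
```

Matching per label lets overlapping ground-truth segments of different categories each be credited independently, which is what the multi-label setting requires.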
News and Updates
Feb. 19: We released the homepage.
Contacts
Contact information:
• For general questions, please get in touch with guodan@hfut.edu.cn
• For questions regarding the challenge, please contact both kunli.hfut@gmail.com and guodan@hfut.edu.cn