Task Introduction
Micro-Action Recognition (MAR) aims to recognize and distinguish subtle body actions that typically occur in a brief instant. Like conventional action recognition, MAR takes video instances as input and requires precise and efficient algorithms. It is harder, however, because the gestures and postures involved vary only with low amplitude.
Human micro-actions co-occur: the same micro-action may be repeated over time, and different micro-actions may occur at the same time. Multi-label Micro-Action Detection (MMAD) is therefore necessary for a deeper understanding of human bodily behavior. MMAD is the task of identifying and localizing all micro-actions in a given untrimmed, densely annotated video, that is, determining the start time, end time, and category of each one. The task takes an entire video as input and requires a model that captures both long-term and short-term action relationships in order to detect multiple micro-actions accurately.
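As an illustration of the MMAD output described above, the following minimal Python sketch represents each detected micro-action as a (start, end, category) instance and checks for the two kinds of co-occurrence: repetition over time and overlap in time. The class name, helper functions, time unit (seconds), and category names are all hypothetical and chosen for illustration only; they are not taken from any dataset or official toolkit.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MicroActionInstance:
    """One detected micro-action (hypothetical representation)."""
    start: float  # start time in seconds (assumed unit)
    end: float    # end time in seconds (assumed unit)
    label: str    # category name (illustrative, not from a real label set)


def overlaps(a: MicroActionInstance, b: MicroActionInstance) -> bool:
    """True if the two time intervals share any positive duration."""
    return a.start < b.end and b.start < a.end


def co_occurring_pairs(instances):
    """Pairs of instances with different labels that overlap in time."""
    return [
        (a, b)
        for a, b in combinations(instances, 2)
        if a.label != b.label and overlaps(a, b)
    ]


def repeated_labels(instances):
    """Labels that appear more than once in the same video."""
    counts = {}
    for inst in instances:
        counts[inst.label] = counts.get(inst.label, 0) + 1
    return {label for label, n in counts.items() if n > 1}


# Illustrative annotations for one untrimmed video (made-up values).
example = [
    MicroActionInstance(0.0, 2.5, "nodding"),
    MicroActionInstance(1.0, 3.0, "touching face"),
    MicroActionInstance(6.0, 7.5, "nodding"),
]

pairs = co_occurring_pairs(example)
repeats = repeated_labels(example)
```

In this made-up example, the first two instances overlap in time with different categories, and one category appears twice. A detector for this task would be expected to return every such instance rather than a single label per video.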