Distortion-aware Motion Calibrator

A Self-Supervised Approach on Motion Calibration
for Enhancing Physical Plausibility in Text-to-Motion

Gahyeon Shim, Soogeun Park, and Hyemin Ahn

Code: [GitHub] / Paper: [arXiv]

Plug-and-Play Post-hoc Motion Refinement

Distortion-aware Motion Calibrator (DMC) is a self-supervised, model-agnostic post-hoc module that improves the physical plausibility of text-to-motion generation.

DMC is trained in a data-driven manner using synthetically distorted motions, allowing it to learn how to correct common physical artifacts without explicit physics modeling.

To support different application needs, we provide two variants: a WGAN-based DMC for fast refinement and improved perceptual quality, and a denoising-based DMC for finer-grained physical correction through iterative refinement.

[Video: dmc_video.mp4]

Self-Supervised Motion Refinement Pipeline

During training, physically plausible ground-truth motions (m_gt) are intentionally distorted with a vertical bias and temporal smoothing to create artifact-laden motions (m_d). DMC learns to recover refined motions (m_r) from the distorted motions (m_d), conditioned on the original textual description.
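
The distortion step can be illustrated with a minimal sketch. It is not the authors' implementation: the motion layout (a list of frames, each a list of (x, y, z) joints), the choice of y as the up axis, the constant bias, the moving-average window, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z); y is ASSUMED to be the up axis
Motion = List[List[Joint]]           # frames x joints


def add_vertical_bias(motion: Motion, bias: float) -> Motion:
    """Shift every joint along the assumed up axis by a constant offset."""
    return [[(x, y + bias, z) for (x, y, z) in frame] for frame in motion]


def temporal_smooth(motion: Motion, window: int = 3) -> Motion:
    """Moving average over time; the window is clipped at the sequence ends."""
    half = window // 2
    n = len(motion)
    smoothed: Motion = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        span = motion[lo:hi]
        frame = []
        for j in range(len(motion[t])):
            # Average each coordinate of joint j over the frames in the window.
            frame.append(tuple(sum(f[j][k] for f in span) / len(span) for k in range(3)))
        smoothed.append(frame)
    return smoothed


def distort(motion: Motion, bias: float = 0.05, window: int = 3) -> Motion:
    """Hypothetical m_gt -> m_d mapping: vertical bias, then temporal smoothing."""
    return temporal_smooth(add_vertical_bias(motion, bias), window)


# Toy ground-truth motion: one joint moving along x at ground height.
m_gt = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
m_d = distort(m_gt, bias=0.5, window=3)
```

In this toy setup the bias produces a floating artifact, while the smoothing blurs frame-to-frame motion. Pairs of (m_d, m_gt) built this way would serve as self-supervised training examples.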
At inference time, DMC is applied as a post-hoc refinement step to motions generated by any pre-trained text-to-motion model (m_gen).
Without modifying the original generation pipeline, DMC corrects physical artifacts (e.g., foot skating, floating, clipping, and ground penetration), producing refined motions (m_r) that are both physically plausible and semantically consistent.
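
The plug-and-play usage at inference time can be sketched as a thin wrapper around an existing generator. Everything named here is hypothetical: `generate_and_refine`, `toy_generator`, and `toy_calibrator` are illustrative stand-ins, and the toy calibrator simply clamps joints to the ground plane; it is not the learned DMC.

```python
from typing import Callable, List, Tuple

Joint = Tuple[float, float, float]            # (x, y, z); y is ASSUMED to be the up axis
Motion = List[List[Joint]]                    # frames x joints
Generator = Callable[[str], Motion]           # text -> m_gen
Calibrator = Callable[[Motion, str], Motion]  # (motion, text) -> m_r


def generate_and_refine(text: str, generator: Generator, calibrator: Calibrator) -> Motion:
    """Run an unmodified generator, then apply a post-hoc calibrator to its output."""
    m_gen = generator(text)          # the generator is used as-is
    return calibrator(m_gen, text)   # refinement is conditioned on the same text


def toy_generator(text: str) -> Motion:
    """Stand-in for a pre-trained text-to-motion model (hypothetical output)."""
    return [[(0.0, -0.1, 0.0)], [(0.1, 0.2, 0.0)]]


def toy_calibrator(motion: Motion, text: str) -> Motion:
    """Stand-in for DMC: clamps joints to y >= 0 to remove ground penetration."""
    return [[(x, max(y, 0.0), z) for (x, y, z) in frame] for frame in motion]


m_r = generate_and_refine("a person walks forward", toy_generator, toy_calibrator)
```

Because the wrapper only consumes the generator's output, swapping in a different text-to-motion model requires no change to the calibrator interface.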

Abstract

Generating semantically aligned human motion from textual descriptions has made rapid progress, but ensuring both semantic and physical realism in motion remains a challenge. In this paper, we introduce the Distortion-aware Motion Calibrator (DMC), a post-hoc module that refines physically implausible motions (e.g., foot floating) while preserving semantic consistency with the original textual description. Rather than relying on complex physical modeling, we propose a self-supervised, data-driven approach in which DMC learns to produce physically plausible motions when given an intentionally distorted motion and the original textual description as inputs. We evaluate DMC as a post-hoc module on motions obtained from various text-to-motion generation models and demonstrate its effectiveness in improving physical plausibility while enhancing semantic consistency. The experimental results show that DMC reduces the FID score by 42.74% on T2M and 13.20% on T2M-GPT, while also achieving the highest R-Precision. When applied to high-quality models like MoMask, DMC improves the physical plausibility of motions by reducing penetration by 33.0% and by bringing floating artifacts closer to the ground-truth reference. These results highlight that DMC can serve as a promising post-hoc motion refinement framework for any text-to-motion model by incorporating textual semantics and physical plausibility.
