MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics

Xinchen Yan1  Akash Rastogi1  Ruben Villegas1
Kalyan Sunkavalli2  Eli Shechtman2  Sunil Hadap2  Ersin Yumer3  Honglak Lee1,4

1 University of Michigan, Ann Arbor; 2 Adobe Research; 3 Argo AI; 4 Google Brain

Long-term human motion can be represented as a series of motion modes (motion sequences that capture short-term temporal dynamics) with transitions between them. We leverage this structure and present a novel Motion Transformation Variational Auto-Encoder (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes (from which the motion sequence can be reconstructed) and a feature transformation that represents the transition from one motion mode to the next. Our model can generate multiple diverse and plausible future motion sequences from the same input. We apply our approach to both facial and full-body motion, and demonstrate applications such as analogy-based motion transfer and video synthesis.
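To make the idea of a learned transition between motion-mode embeddings more concrete, here is a minimal, hypothetical sketch in plain Python. It assumes, for illustration only, that the transformation acts additively in embedding space, that the latent code z is sampled with the standard VAE reparameterization, and that the transformation decoder is a caller-supplied function; in a trained model the encoders and decoder would be learned networks. All function names (`reparameterize`, `predict_future_embedding`, `transformation_loss`, `analogy_transfer`) and the `beta` weight are made up for this sketch and are not taken from the authors' code, and the loss shown is a generic VAE-style objective rather than the paper's exact training loss.

```python
import math
import random

# Hypothetical toy sketch of a latent "motion transformation".
# Embeddings and latent codes are plain lists of floats; a real model
# would produce them with learned encoder/decoder networks.


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I) (standard VAE trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def predict_future_embedding(e_past, z, decode_transform):
    """Assumed additive transition: e_future = e_past + T(z)."""
    return add(e_past, decode_transform(z))


def transformation_loss(e_pred, e_target, mu, log_var, beta=1.0):
    """Generic VAE-style objective (not the paper's exact loss):
    squared reconstruction error of the future embedding plus beta * KL."""
    recon = sum((p - t) ** 2 for p, t in zip(e_pred, e_target))
    return recon + beta * kl_to_standard_normal(mu, log_var)


def analogy_transfer(e_a, e_b, e_c):
    """Apply the transition a -> b to c: e_d = e_c + (e_b - e_a)."""
    return add(e_c, sub(e_b, e_a))


if __name__ == "__main__":
    rng = random.Random(0)
    e_past = [0.5, -1.0, 2.0]
    identity = lambda z: list(z)  # toy decoder: T(z) = z
    # Different prior samples of z give different futures for one input.
    for _ in range(3):
        z = [rng.gauss(0.0, 1.0) for _ in e_past]
        print(predict_future_embedding(e_past, z, identity))
    print(analogy_transfer([0.0, 0.0], [1.0, 2.0], [5.0, 5.0]))  # [6.0, 7.0]
```

Sampling different z for the same observed embedding yields different predicted futures, which mirrors the multimodal generation described above. Under the additive assumption, `analogy_transfer` shows how a transition extracted from one pair of sequences could be applied to another.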
* Paper (Published at ECCV 2018)
Key Results
Video Generation on Human3.6M
Given 16 observed frames (green), the model predicts 64 future frames (red).