Hierarchical Style-based Networks for Motion Synthesis

Jingwei Xu1 Huazhe Xu2 Bingbing Ni1 Xiaokang Yang1 Xiaolong Wang3 Trevor Darrell2

Shanghai Jiao Tong University1 UC Berkeley2 UC San Diego3

[Paper] [Code]


Generating diverse and natural human motion is a long-standing goal for creating intelligent characters in the animated world. In this paper, we propose a self-supervised method for generating long-range, diverse, and plausible behaviors that reach a specified goal location. Our method learns to model human motion by decomposing the long-range generation task in a hierarchical manner. Given the starting and ending states, a memory bank is used to retrieve motion references as source material for short-range clip generation. We first propose to explicitly disentangle the provided motion material into style and content components via bilinear transformation modeling, where diverse synthesis is achieved by freely combining the two components. The short-range clips are then connected to form a long-range motion sequence. Without ground-truth annotation, we propose a parameterized bi-directional interpolation scheme to guarantee the physical validity and visual naturalness of the generated results. On a large-scale skeleton dataset, we show that the proposed method synthesizes long-range, diverse, and plausible motion, and generalizes to motion data unseen during training. Moreover, we demonstrate that the generated sequences are useful as subgoals for actual physical execution in the animated world.


Hierarchical style-based motion synthesis framework. (a) Reference Motion Search: given the starting and ending states, we search the training dataset for reference subsequences; (b) Short-range Motion Generation: from each reference subsequence, we generate a novel subsequence via motion style transfer; (c) Long-range Motion Generation: all synthesized subsequences are connected in time with bi-directional modeling.
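Step (a) can be illustrated with a minimal retrieval sketch. The pose representation (flattened joint-coordinate vectors) and the Euclidean endpoint-matching score below are illustrative assumptions, not the paper's exact retrieval design:

```python
import numpy as np

def search_reference(start_pose, end_pose, memory_bank):
    """Retrieve the stored subsequence whose first/last frames best match
    the query start/end poses (L2 distance on the endpoints).

    memory_bank: list of (T, D) arrays, each a stored motion subsequence.
    """
    def score(clip):
        return (np.linalg.norm(clip[0] - start_pose)
                + np.linalg.norm(clip[-1] - end_pose))
    return min(memory_bank, key=score)

# Toy usage with 1-D "poses": three stored clips with different endpoints.
bank = [np.linspace(0.0, 1.0, 8)[:, None],
        np.linspace(0.0, 5.0, 8)[:, None],
        np.linspace(2.0, 3.0, 8)[:, None]]
ref = search_reference(np.array([0.1]), np.array([0.9]), bank)  # picks the first clip
```

In the full method, the retrieved clip is not used verbatim but serves as the source material for the style-transfer stage in step (b).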

Synthesis Results

Final results: For each gif, we present three synthesized motion sequences conditioned on the same starting and ending states. Facilitated by the proposed hierarchical framework, we are able to synthesize long-range, diverse and visually natural motion sequences.

Connecting short-range clips via motion interpolation. For each gif, we present three interpolation results under different configurations: transition with (a) a right turn, (b) a sharp turn, (c) a mild turn, and (d) straight movement. Our model synthesizes visually natural transition sequences under all presented configurations.

(a) transition with right turn

(b) transition with sharp turn

(c) transition with mild turn

(d) transition with straight move
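The idea behind these transitions can be sketched as blending a forward prediction from the preceding clip with a backward prediction from the following clip, using a time-varying weight. The linear blend below is an illustrative stand-in for the paper's learned, parameterized bi-directional interpolation scheme:

```python
import numpy as np

def blend_transition(forward, backward):
    """Blend two (T, D) trajectories covering the same transition window:
    `forward` predicted from the preceding clip, `backward` from the
    following clip. The result starts on `forward` and ends on `backward`."""
    T = forward.shape[0]
    w = np.linspace(1.0, 0.0, T)[:, None]  # weight shifts from forward to backward
    return w * forward + (1.0 - w) * backward

# Toy usage: the preceding clip holds at the origin, the following at (1, 1).
fwd = np.tile(np.array([0.0, 0.0]), (5, 1))
bwd = np.tile(np.array([1.0, 1.0]), (5, 1))
trans = blend_transition(fwd, bwd)  # smoothly moves from (0, 0) to (1, 1)
```

The learned scheme additionally has to keep the blend physically valid (e.g. consistent footsteps), which a fixed linear weight alone does not guarantee.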

Short-range clip synthesis via motion style transfer: in each gif, the left sequence provides the style, the right sequence provides the content, and the middle sequence is the composed motion.
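The style/content composition rests on a bilinear transformation, where each output feature is a bilinear form over a style code and a content code. The dimensions and random weight tensor below are placeholders for illustration; the paper learns these jointly with the disentanglement:

```python
import numpy as np

rng = np.random.default_rng(0)
ds, dc, dout = 4, 6, 8               # style, content, output feature sizes (assumed)
W = rng.normal(size=(dout, ds, dc))  # bilinear weight tensor

def compose(style, content):
    """Bilinear combination: output unit k is style^T W[k] content."""
    return np.einsum('s,ksc,c->k', style, W, content)

style_code = rng.normal(size=ds)     # e.g. extracted from the left sequence
content_code = rng.normal(size=dc)   # e.g. extracted from the right sequence
motion_feat = compose(style_code, content_code)
```

Because style and content enter as separate factors, swapping in a different style code while keeping the content code fixed yields a new composed motion, which is the free-form combination the abstract refers to.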



Citation

@inproceedings{xu2020hierarchical,
  author={Jingwei Xu and Huazhe Xu and Bingbing Ni and Xiaokang Yang and Xiaolong Wang and Trevor Darrell},
  title={Hierarchical Style-based Networks for Motion Synthesis},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}