Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
ABSTRACT
This paper presents E3D2, an Exploratory 3D Dance generation framework designed to address the limited exploration capability of existing music-conditioned 3D dance generation models. Because they lack the ability to explore, current models often generate monotonous, simplistic dance sequences that misalign with human preferences. E3D2 trains a reward model from automatically-ranked dance demonstrations and uses it to guide a reinforcement learning process, encouraging the agent to explore and generate high-quality, diverse dance movement sequences. The soundness of the reward model is validated both theoretically and experimentally. Empirical results show that E3D2 outperforms existing approaches, achieving state-of-the-art performance on the AIST++ dataset.
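To make the core idea concrete, the following is a minimal sketch of how a reward model can be trained from automatically-ranked demonstrations using a pairwise Bradley-Terry objective. Everything here is an illustrative assumption, not the paper's actual setup: the reward model is a simple linear function of precomputed motion features, the "automatic ranking" is simulated by a hidden quality direction, and the optimizer is plain gradient ascent.

```python
# Hedged sketch: fitting a reward model to automatically-ranked demonstrations
# with a pairwise Bradley-Terry ranking loss. The linear reward, the feature
# dimension, and the synthetic ranking are all illustrative assumptions.
import numpy as np

def train_reward_model(ranked_feats, lr=0.1, epochs=300):
    """ranked_feats: (n, d) array of demonstration features, rows sorted
    from worst to best by the automatic ranking. Returns reward weights w
    such that r(x) = w @ x tends to score higher-ranked demonstrations higher."""
    n, d = ranked_feats.shape
    w = np.zeros(d)
    n_pairs = n * (n - 1) / 2
    for _ in range(epochs):
        grad = np.zeros(d)
        for i in range(n):
            for j in range(i + 1, n):  # row j is ranked above row i
                better, worse = ranked_feats[j], ranked_feats[i]
                # Bradley-Terry: P(better preferred) = sigmoid(r_better - r_worse)
                margin = w @ better - w @ worse
                p = 1.0 / (1.0 + np.exp(-margin))
                grad += (1.0 - p) * (better - worse)  # ascend the log-likelihood
        w += lr * grad / n_pairs
    return w

# Toy demonstrations: a hidden "quality" direction defines the automatic ranking.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, -0.5, 2.0])
feats = rng.normal(size=(8, 3))
ranked = feats[np.argsort(feats @ true_dir)]  # worst -> best
w = train_reward_model(ranked)
scores = ranked @ w  # learned reward should increase along the ranking
```

In the full framework, the scalar reward from such a model would then be fed to a reinforcement learning algorithm as the training signal for the dance-generation policy; this sketch covers only the reward-model stage.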
METHODOLOGY
Visual Comparisons
Comparison between Bailando and the proposed E3D2 on the AIST++ test set
The avatar on the left (depicted in red) is generated by the proposed E3D2, while the avatar on the right (depicted in blue) is generated by Bailando. The two videos below show the corresponding "Matchstick Men" dance sequences for the videos above.
The avatar videos are produced from the results of inverse kinematics (IK) processing and may therefore exhibit some unusual twisting movements. When this occurs, refer to the corresponding "Matchstick Men" dance sequences below for a clearer view of the motion. Note also that the floor in the avatar videos serves only as a visual reference and does not represent the actual ground position.
*Because objective metrics do not always align with subjective perception, the visualizations use a different checkpoint than the one used to report the objective metrics.