Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
ABSTRACT
This paper presents E3D2, an Exploratory 3D Dance generation framework designed to address the limited exploration capability of existing music-conditioned 3D dance generation models. Because they lack the ability to explore, current models often generate monotonous, simplistic dance sequences that misalign with human preferences. E3D2 trains a reward model from automatically-ranked dance demonstrations and uses it to guide a reinforcement learning process, encouraging the agent to explore and generate high-quality, diverse dance movement sequences. The soundness of the reward model is validated both theoretically and experimentally. Empirical results show that E3D2 outperforms existing approaches, achieving state-of-the-art performance on the AIST++ dataset.
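To make the core idea concrete, the following is a minimal sketch of how a reward model can be trained from automatically-ranked demonstrations using a pairwise Bradley-Terry objective. Everything here is an illustrative assumption, not the paper's actual setup: the reward model is a simple linear function of precomputed motion features, the "automatic ranking" is simulated by a hidden quality direction, and the optimizer is plain gradient ascent.

```python
# Hedged sketch: fitting a reward model to automatically-ranked demonstrations
# with a pairwise Bradley-Terry ranking loss. The linear reward, the feature
# dimension, and the synthetic ranking are all illustrative assumptions.
import numpy as np

def train_reward_model(ranked_feats, lr=0.1, epochs=300):
    """ranked_feats: (n, d) array of demonstration features, rows sorted
    from worst to best by the automatic ranking. Returns reward weights w
    such that r(x) = w @ x tends to score higher-ranked demonstrations higher."""
    n, d = ranked_feats.shape
    w = np.zeros(d)
    n_pairs = n * (n - 1) / 2
    for _ in range(epochs):
        grad = np.zeros(d)
        for i in range(n):
            for j in range(i + 1, n):  # row j is ranked above row i
                better, worse = ranked_feats[j], ranked_feats[i]
                # Bradley-Terry: P(better preferred) = sigmoid(r_better - r_worse)
                margin = w @ better - w @ worse
                p = 1.0 / (1.0 + np.exp(-margin))
                grad += (1.0 - p) * (better - worse)  # ascend the log-likelihood
        w += lr * grad / n_pairs
    return w

# Toy demonstrations: a hidden "quality" direction defines the automatic ranking.
rng = np.random.default_rng(0)
true_dir = np.array([1.0, -0.5, 2.0])
feats = rng.normal(size=(8, 3))
ranked = feats[np.argsort(feats @ true_dir)]  # worst -> best
w = train_reward_model(ranked)
scores = ranked @ w  # learned reward should increase along the ranking
```

In the full framework, the scalar reward from such a model would then be fed to a reinforcement learning algorithm as the training signal for the dance-generation policy; this sketch covers only the reward-model stage.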
METHODOLOGY
Visual Comparisons
Comparison between Bailando and the proposed E3D2 on the AIST++ test set
The avatar on the left (depicted in red) is generated by the proposed E3D2, while the avatar on the right (depicted in blue) is generated by Bailando. The two videos below show the corresponding "Matchstick Men" dance sequences for the videos above.
The avatar videos are produced from the results of inverse kinematics (IK) processing and may therefore exhibit some unusual twisting movements. When this occurs, refer to the corresponding "Matchstick Men" dance sequences below for a clearer view of the motion. Note also that the floor in the avatar videos serves only as a visual reference and does not represent the actual ground position.
*Because objective metrics do not always align with subjective perception, the visualizations use a different checkpoint than the one used to report the objective metrics.