We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural-looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer.
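As a rough illustration of what the reuse step could look like in code, the sketch below assumes a frozen low-level skill decoder pre-trained by MoCap imitation and exposed through a latent command interface, with a separate task policy trained to output latent commands for the downstream task. All names (SkillDecoder, TaskPolicy, the checkpoint path, and the layer sizes) are hypothetical and not taken from the paper.

# Hypothetical sketch of skill-module reuse (illustrative names only).
import torch
import torch.nn as nn

class SkillDecoder(nn.Module):
    """Frozen low-level module: maps a latent command + proprioception to joint targets."""
    def __init__(self, latent_dim, proprio_dim, num_joints):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, num_joints),
        )

    def forward(self, latent, proprio):
        return self.net(torch.cat([latent, proprio], dim=-1))

class TaskPolicy(nn.Module):
    """High-level policy trained on the downstream task: outputs latent commands."""
    def __init__(self, obs_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, task_obs):
        return self.net(task_obs)

decoder = SkillDecoder(latent_dim=32, proprio_dim=60, num_joints=12)
decoder.load_state_dict(torch.load("skill_module.pt"))  # weights from MoCap imitation (hypothetical path)
for p in decoder.parameters():
    p.requires_grad = False                              # keep the skill prior fixed during reuse

policy = TaskPolicy(obs_dim=90, latent_dim=32)           # only this part is trained on the new task

def act(task_obs, proprio):
    """Deployment-time action: task policy picks a latent, frozen decoder produces joint targets."""
    with torch.no_grad():
        z = policy(task_obs)
        return decoder(z, proprio)

Because only the small task policy is optimized while the decoder stays fixed, the MoCap prior continues to regularize the motions produced for the new task.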
The inset shows the original dog MoCap reference clip.
The inset shows the original human MoCap reference clip.
The inset shows the original dog MoCap reference clip.
The robot follows a fixed velocity command until it reaches the end of the workspace, after which it turns on the spot until it faces the center and walks forward again.
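A minimal sketch of the kind of command logic this demonstration could use, assuming a hypothetical walking-controller interface that accepts (forward, lateral, yaw) velocity commands; the workspace radius, speeds, and function names are illustrative assumptions, not values from the paper.

# Hypothetical command loop for the fixed-velocity demo (illustrative constants).
import numpy as np

WORKSPACE_RADIUS = 2.0   # metres, assumed workspace size
FORWARD_SPEED = 0.5      # m/s, assumed forward command
TURN_RATE = 0.5          # rad/s, assumed turn-in-place command

def velocity_command(robot_xy, robot_yaw):
    """Walk forward until the workspace edge, then turn on the spot to face the centre."""
    to_centre = -robot_xy
    heading_error = np.arctan2(to_centre[1], to_centre[0]) - robot_yaw
    heading_error = np.arctan2(np.sin(heading_error), np.cos(heading_error))  # wrap to [-pi, pi]

    if np.linalg.norm(robot_xy) > WORKSPACE_RADIUS and abs(heading_error) > 0.2:
        return (0.0, 0.0, np.sign(heading_error) * TURN_RATE)  # turn in place toward the centre
    return (FORWARD_SPEED, 0.0, 0.0)                           # walk straight ahead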
The robot receives forward, lateral and yaw velocity commands from a user via an off-camera handheld joystick.
The robot receives forward, lateral and yaw velocity commands from a user via an off-camera handheld joystick.
A tracking controller generates velocity commands which are then executed by the walking controller in order to follow a slalom trajectory. Midway through the trial we introduce an obstacle that the robot has to walk over, in order to test robustness.
The robot is required to dribble the ball towards a shifting target location as indicated by the center of the red disc. The agent has learned to use both front and hind legs to control the ball.
The robot is required to dribble the ball towards a shifting target location as indicated by the center of the red disc. The controller is able to control the ball quite well, even though little effort went into modelling the contact dynamics of the ball.
The robot is required to dribble the ball towards a shifting target location as indicated by the center of the red disc. The agent has learned to strafe around the ball in order to kick it in the right direction.
Sampling latent commands from the prior for the trained skill module results in temporally-extended behavior, with the robot maintaining balance and walking around randomly.
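One way such latent sampling could be implemented is an autoregressive draw from the skill module's Gaussian prior at every control step, so that consecutive latent commands are correlated in time; the AR(1) form and its coefficient below are assumptions for illustration, not taken from the paper.

# Hypothetical AR(1) sampling of latent commands from a unit-Gaussian prior.
import numpy as np

def sample_latent_trajectory(num_steps, latent_dim, alpha=0.95, rng=None):
    """Temporally correlated latents: z_t = alpha * z_{t-1} + sqrt(1 - alpha^2) * eps_t."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(latent_dim)
    latents = []
    for _ in range(num_steps):
        eps = rng.standard_normal(latent_dim)
        z = alpha * z + np.sqrt(1.0 - alpha**2) * eps  # stationary marginal stays N(0, I)
        latents.append(z.copy())
    return np.stack(latents)

# Each latent in the trajectory would be fed to the frozen skill decoder at its control step,
# producing the temporally-extended random walking behavior described above.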
We use procedurally generated terrain to improve the robustness of the walking controller to small obstacles and slopes. The target velocities are randomly sampled according to the process described in the text.
Optimizing only for the task reward results in erratic and inefficient behavior that, while effective at solving the task, is not suited for deployment on hardware.