Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models
Jacob Levy*, Tyler Westenbroek*, David Fridovich-Keil
*Equal Contribution
We present a novel framework for learning predictive models for contact-rich systems that seamlessly integrates structured first-principles modeling techniques with black-box autoregressive models. This semi-structured approach enables us to make accurate predictions far into the future with substantially fewer training samples than prior methods. We leverage this capability to push the sample-complexity boundary for real-world model-based reinforcement learning. We validate our approach through real-world experiments with a Unitree Go1 quadruped robot, learning dynamic gaits from scratch on both hard and soft surfaces with just minutes of data.
Goal: Combine the benefits of black-box modeling and structured sysID to push the sample-complexity boundary in model-based RL (MBRL) for contact-rich systems.
Key Benefits:
More sample efficient
Uses only on-board observations; doesn’t require privileged information
No added exploration noise while acting in the real world
Outcome: We train a quadruped to walk entirely from scratch, using only 3 minutes of real-world data.
Prior RL methods are general-purpose, but:
Model-free RL (MFRL) is sample inefficient
Black-box models in MBRL struggle to generalize beyond the training data
Added exploration noise required in MFRL and MBRL can damage actuators
First-principles sysID leverages known structure, but:
Prior Lagrangian-informed models do not scale to the complexities of contact-rich systems
Existing techniques require reconstructing the state of the ground, which is impractical
Question: How can we leverage known structure in a way that is practical for learning contact-rich control in the real world?
Task: Learn to walk at maximum speed from scratch, entirely in the real world
Assumptions:
Only proprioceptive observations are available
The state and location of the ground are unknown
Lagrangian-based dynamics are known analytically
Our Approach:
I. Learn external torque and noise estimators:
Condition estimates on an encoding of a history of observations to overcome partial observability
Model ensembles are trained with a multi-step negative log-likelihood (NLL) loss
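A multi-step negative log-likelihood loss of this kind can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal Gaussian predictive distribution, and `predict` is a hypothetical stand-in for one ensemble member that maps an observation history to a predicted mean and variance. The history length and horizon are likewise placeholders.

```python
import math

def gaussian_nll(target, mean, var):
    """NLL of `target` under a diagonal Gaussian with the given mean and variance."""
    return sum(
        0.5 * (math.log(2.0 * math.pi * v) + (t - m) ** 2 / v)
        for t, m, v in zip(target, mean, var)
    )

def multi_step_nll(observations, predict, horizon, history_len):
    """Average NLL over an autoregressive rollout of `horizon` steps.

    `predict(history)` is a hypothetical model returning (mean, var)
    for the next observation given the current history buffer.
    """
    history = [list(o) for o in observations[:history_len]]
    total = 0.0
    for k in range(horizon):
        mean, var = predict(history)
        target = observations[history_len + k]
        total += gaussian_nll(target, mean, var)
        # Feed the prediction (not the ground truth) back into the history,
        # so training sees the same compounding errors as a long rollout.
        history = history[1:] + [list(mean)]
    return total / horizon
```

Scoring predictions several steps ahead, rather than only one step, penalizes models whose errors accumulate when their own outputs are fed back in.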
II. Generate synthetic rollouts:
Integrate torque predictions through the Lagrangian dynamics
Add learned noise estimates to generate a prediction of the next observation
Autoregressively generate synthetic rollouts by feeding predicted observations back into the observation history buffer
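The rollout procedure in step II can be illustrated on a toy system. The sketch below is an assumption-laden example, not the quadruped model: it uses a 1-DoF pendulum as the known Lagrangian dynamics, a semi-implicit Euler integrator, and treats the observation as the full state (q, dq). The functions `policy`, `ext_torque`, and `obs_noise` are hypothetical placeholders for the policy and the learned estimators.

```python
import math

def pendulum_step(q, dq, tau_motor, tau_ext, dt=0.01, m=1.0, l=1.0, g=9.81):
    """One semi-implicit Euler step of m*l^2*ddq + m*g*l*sin(q) = tau_motor + tau_ext."""
    inertia = m * l * l
    ddq = (tau_motor + tau_ext - m * g * l * math.sin(q)) / inertia
    dq_next = dq + dt * ddq
    q_next = q + dt * dq_next
    return q_next, dq_next

def synthetic_rollout(history, policy, ext_torque, obs_noise, steps):
    """Autoregressively roll out observations (q, dq) from an observation history."""
    history = [tuple(o) for o in history]
    rollout = []
    for _ in range(steps):
        q, dq = history[-1]
        tau_motor = policy(history)      # commanded torque (hypothetical policy)
        tau_ext = ext_torque(history)    # learned external-torque estimate
        # Known physics handles everything except the external torque.
        q_next, dq_next = pendulum_step(q, dq, tau_motor, tau_ext)
        nq, ndq = obs_noise(history)     # learned noise estimate
        obs = (q_next + nq, dq_next + ndq)
        rollout.append(obs)
        # Feed the predicted observation back into the history buffer.
        history = history[1:] + [obs]
    return rollout
```

Only the external torque and noise terms are learned here; the inertia and gravity terms come from the analytical model, which is the sense in which the model is semi-structured.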
III. Use both synthetic and real rollouts with model-free RL:
Collect real-world rollouts using a deterministic policy
Synthetic rollouts branch off from real-world rollouts
Add policy exploration noise for synthetic rollouts only
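The data flow in step III can be sketched as below. This is a simplified illustration under stated assumptions: observations and actions are scalars, `env_step` and `model_step` are hypothetical stand-ins for the real robot and the learned model, and Gaussian action noise with scale `sigma` is one possible form of exploration noise. The model-free RL update that consumes these transitions is omitted.

```python
import random

def collect_real_rollout(env_step, policy_mean, obs0, steps):
    """Act deterministically on the real system: apply the policy mean, no added noise."""
    obs, traj = obs0, [obs0]
    for _ in range(steps):
        obs = env_step(obs, policy_mean(obs))
        traj.append(obs)
    return traj

def branch_synthetic(real_traj, model_step, policy_mean, sigma, horizon, n_branches, rng):
    """Start short model rollouts from observations visited in the real rollout,
    adding Gaussian exploration noise to actions inside the model only."""
    transitions = []
    for _ in range(n_branches):
        obs = rng.choice(real_traj)  # branch point taken from real data
        for _ in range(horizon):
            action = policy_mean(obs) + rng.gauss(0.0, sigma)
            next_obs = model_step(obs, action)
            transitions.append((obs, action, next_obs))
            obs = next_obs
    return transitions
```

Because noisy actions are only ever applied to the model, the actuators see the deterministic policy while the RL algorithm still receives exploratory data.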
We perform training from scratch in the real world with a Unitree Go1 quadruped robot. After only 3 minutes of interaction with the environment, the quadruped learns to walk straight and achieves an average velocity of 0.98 m/s.
Compared to prior work, we achieve:
Order-of-magnitude increase in sample efficiency
Significantly higher walking speeds
We additionally train from scratch on memory foam to demonstrate the versatility of our approach. When walking on this surface, the robot's feet sink deeply, which makes training more difficult. Nonetheless, the quadruped achieves an average velocity of 0.53 m/s with only 3 minutes of real-world training data, despite the significantly different contact dynamics.
Learned external torque estimates are smoothed versions of the real external torques, leading to accurate long-horizon predictions (see also below). The plot to the right shows the predicted and real external vertical force acting on the robot base over one second of real-world data.
Using structured dynamics models results in order-of-magnitude better policy performance compared to black-box models.
Structured dynamics models generate synthetic rollouts that are 2.3x more accurate than those of black-box models in unseen environments, indicating better generalization.