Learning to Walk from Three Minutes of Real-World Data with
Semi-structured Dynamics Models

Jacob Levy*, Tyler Westenbroek*, David Fridovich-Keil

*Equal Contribution

We present a novel framework for learning predictive models for contact-rich systems which seamlessly integrates structured first-principles modeling techniques with black-box autoregressive models. This semi-structured approach enables us to make accurate predictions far into the future with substantially fewer training samples than prior methods. We leverage this capability to push the sample-complexity boundary for real-world model-based reinforcement learning. We validate our approach through real-world experiments with a Unitree Go1 quadruped robot, learning dynamic gaits -- from scratch -- on both hard and soft surfaces with just minutes of data.

Overview

Goal: Combine the benefits of black-box modeling and structured system identification (sysID) to push the sample-complexity boundary in model-based RL (MBRL) for contact-rich systems.

Key Benefits:

More sample efficient
Uses only on-board observations; doesn’t require privileged information
No added exploration noise while acting in the real world

Outcome: We train a quadruped to walk entirely from scratch, using only 3 minutes of real-world data.

Challenges

Prior RL methods are general-purpose, but:

Model-free RL (MFRL) is sample inefficient
Black-box models in MBRL struggle to generalize beyond the training data
Added exploration noise required in MFRL and MBRL can damage actuators

First-principles sysID leverages known structure, but:

Prior Lagrangian-informed models do not scale to the complexities of contact-rich systems
Existing techniques require reconstructing the state of the ground, which is impractical

Question: How can we leverage known structure in a way that is practical for learning contact-rich control in the real world?

Semi-structured RL (SSRL)

Task: Learn to walk at maximum speed from scratch, entirely in the real world

Assumptions:

Only proprioceptive observations are available
The state and location of the ground are unknown
Lagrangian-based dynamics are known analytically

Our Approach:

I. Learn external torque and noise estimators:

Condition estimates on an encoding of a history of observations to overcome partial observability
Model ensembles are trained with a multi-step negative log-likelihood (NLL) loss
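As a rough illustration of what a multi-step NLL loss can look like, the sketch below rolls a model forward autoregressively and averages a diagonal-Gaussian negative log-likelihood over the horizon. The `predict_fn` interface, the Gaussian output assumption, and the choice to feed the predicted mean back into the history are all assumptions made for this example, not details confirmed by the text above.

```python
import numpy as np

def gaussian_nll(target, mean, log_std):
    """Diagonal-Gaussian negative log-likelihood, summed over dimensions."""
    var = np.exp(2.0 * log_std)
    per_dim = (target - mean) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi)
    return 0.5 * np.sum(per_dim, axis=-1)

def multi_step_nll(predict_fn, obs_history, actions, targets):
    """Average NLL over an H-step autoregressive rollout.

    predict_fn(history, action) -> (mean, log_std) of the next observation
    (hypothetical model interface, assumed for this sketch).
    obs_history: (K, obs_dim) array of the K most recent observations
    actions:     (H, act_dim) array of recorded actions
    targets:     (H, obs_dim) array of recorded next observations
    """
    history = obs_history.copy()
    total = 0.0
    for action, target in zip(actions, targets):
        mean, log_std = predict_fn(history, action)
        total += gaussian_nll(target, mean, log_std)
        # Feed the prediction (not the ground truth) back into the history,
        # so errors compound during training as they would at rollout time.
        history = np.vstack([history[1:], mean])
    return total / len(actions)
```

Training on several steps at once penalizes compounding error directly, which is one plausible reason a multi-step objective helps long-horizon accuracy.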

II. Generate synthetic rollouts:

Integrate torque predictions through the Lagrangian dynamics
Add learned noise estimates to generate a prediction of the next observation
Autoregressively generate synthetic rollouts by feeding predicted observations back into the observation history buffer
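The three steps above can be sketched as follows. This is a minimal example under stated assumptions: the dynamics are written in the generic form M(q) q̈ + h(q, q̇) = τ_motor + τ_ext, the integrator is semi-implicit Euler, the observation is simply the concatenation of q and q̇, and the policy outputs motor torques directly. The names `mass_matrix`, `bias_forces`, `torque_model`, and `noise_model` are hypothetical placeholders for the analytic terms and the learned estimators.

```python
import numpy as np

def lagrangian_step(q, qd, tau_motor, tau_ext, mass_matrix, bias_forces, dt):
    """One semi-implicit Euler step of M(q) qdd + h(q, qd) = tau_motor + tau_ext."""
    M = mass_matrix(q)          # analytic mass matrix (assumed known)
    h = bias_forces(q, qd)      # Coriolis, centrifugal and gravity terms
    qdd = np.linalg.solve(M, tau_motor + tau_ext - h)
    qd_next = qd + dt * qdd
    q_next = q + dt * qd_next   # uses the updated velocity
    return q_next, qd_next

def synthetic_rollout(q, qd, history, policy, torque_model, noise_model,
                      mass_matrix, bias_forces, dt, horizon):
    """Autoregressively generate `horizon` predicted observations."""
    observations = []
    for _ in range(horizon):
        tau_motor = policy(history[-1])
        # Learned external torque, conditioned on the observation history.
        tau_ext = torque_model(history, tau_motor)
        q, qd = lagrangian_step(q, qd, tau_motor, tau_ext,
                                mass_matrix, bias_forces, dt)
        # Learned noise estimate added to the structured prediction.
        obs = np.concatenate([q, qd]) + noise_model(history, tau_motor)
        # Feed the predicted observation back into the history buffer.
        history = np.vstack([history[1:], obs])
        observations.append(obs)
    return np.stack(observations)
```

Only the external torque and noise terms are learned here; everything else comes from the analytic model, which is what makes the approach semi-structured.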

III. Use both synthetic and real rollouts with model-free RL:

Collect real-world rollouts using a deterministic policy
Synthetic rollouts branch off from real-world rollouts
Add policy exploration noise for synthetic rollouts only
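A minimal sketch of this data-collection scheme is shown below, assuming a policy that exposes a deterministic mean action and Gaussian exploration noise. The function names, the `env_step` and `model_step` callables, and the uniform sampling of branch points are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np

def collect_real_rollout(env_step, policy_mean, obs, steps):
    """Act deterministically on the real system: no exploration noise."""
    trajectory = [obs]
    for _ in range(steps):
        obs = env_step(obs, policy_mean(obs))
        trajectory.append(obs)
    return np.stack(trajectory)

def branch_synthetic_rollouts(real_obs, model_step, policy_mean, noise_std,
                              horizon, num_branches, rng):
    """Start short model rollouts from observations seen in real data."""
    # Branch points are sampled uniformly from the real trajectory (assumption).
    starts = real_obs[rng.integers(0, len(real_obs), size=num_branches)]
    transitions = []
    for obs in starts:
        for _ in range(horizon):
            mean = policy_mean(obs)
            # Exploration noise is injected only inside the learned model.
            action = mean + noise_std * rng.standard_normal(mean.shape)
            next_obs = model_step(obs, action)
            transitions.append((obs, action, next_obs))
            obs = next_obs
    return transitions
```

The resulting synthetic transitions can be mixed with the real ones in the replay buffer of an off-the-shelf model-free learner, so the hardware never has to execute noisy actions.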

Real-world Results

We perform training from scratch in the real world with a Unitree Go1 quadruped robot. After only 3 minutes of interaction with the environment, the quadruped learns to walk straight and achieves an average velocity of 0.98 m/s.

Compared to prior work, we achieve:

Order-of-magnitude increase in sample efficiency
Significantly higher walking speeds

We additionally train from scratch on memory foam to demonstrate the versatility of our approach. When walking on this surface, the robot's feet sink deeply, which makes training more difficult. Nonetheless, the quadruped achieves an average velocity of 0.53 m/s with only 3 minutes of real-world training data, despite the significantly different contact dynamics.

Learned external torque estimates are smoothed versions of the real external torques, leading to accurate long-horizon predictions (see also below). The plot to the right shows the predicted and real external vertical force acting on the robot base over one second of real-world data.

Simulated Results

Forward rollouts are simulated using Brax.

Using structured dynamics models results in order-of-magnitude better policy performance compared to black-box models.

Structured dynamics models generate synthetic rollouts that are 2.3x more accurate than those from black-box models in unseen environments, indicating better generalization.
