Legged Robots that Keep on Learning:
Fine-Tuning Locomotion Policies in the Real World

Laura Smith, J. Chase Kew, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

[Code][Paper]

Overview

Legged robots are physically capable of traversing a wide range of challenging environments, but designing controllers that can handle this diversity is difficult. Reinforcement learning (RL) allows us to automate the controller design process and has produced remarkably robust controllers when trained in a suitable range of environments. But what if, instead of training controllers that are robust enough to handle any eventuality, we enable the robot to keep learning in whatever setting it finds itself in? We propose a practical robot RL system for fine-tuning locomotion policies and demonstrate that a modest amount of real-world training can substantially improve performance during deployment. This system enables a real A1 quadrupedal robot to autonomously fine-tune multiple locomotion skills in a range of environments.

Example of our system: First, we pre-train skills (in the example above, forward/backward pacing and reset) in simulation using RL. We then deploy the policies in the real world. The robot executes forward or backward pacing depending on which will bring it closer to the origin. After each episode, it automatically runs its reset policy in preparation for the next. We continue to update the policies with the data collected in the real world using the same RL method to facilitate perpetual improvement.
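The deployment loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released implementation: the 1-D position `x`, the helper names `select_skill`, `run_episode`, and `fine_tune`, the `ReplayBuffer` class, and the simulated dynamics are all hypothetical stand-ins, and the policy update itself is left as a comment.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplayBuffer:
    """Hypothetical per-skill store for transitions collected on the robot."""
    transitions: list = field(default_factory=list)

    def add(self, episode):
        self.transitions.extend(episode)


def select_skill(x):
    """Pick the pacing direction that brings the robot closer to the origin.

    Assumes a 1-D position x along the pacing axis, where 'forward'
    increases x and 'backward' decreases it.
    """
    return "forward" if x < 0.0 else "backward"


def run_episode(skill, x, rng, steps=10):
    """Stand-in for a real rollout: returns (transitions, final position)."""
    direction = 1.0 if skill == "forward" else -1.0
    transitions = []
    for _ in range(steps):
        dx = direction * 0.1 + rng.gauss(0.0, 0.02)  # made-up dynamics
        transitions.append((x, skill, x + dx))
        x += dx
    return transitions, x


def fine_tune(num_episodes=6, x0=-0.5, seed=0):
    """Alternate skill episodes and resets, storing data for each skill."""
    rng = random.Random(seed)
    buffers = {name: ReplayBuffer() for name in ("forward", "backward", "reset")}
    x, log = x0, []
    for _ in range(num_episodes):
        skill = select_skill(x)
        episode, x = run_episode(skill, x, rng)
        buffers[skill].add(episode)
        # A real system would update the chosen skill's policy here,
        # using the same RL method as in simulation pre-training.
        # The reset policy runs after every episode; its data is stored too.
        buffers["reset"].add([(x, "reset", x)])
        log.append(skill)
    return log, buffers
```

Calling `fine_tune(num_episodes=4)` returns the sequence of skills chosen and the filled buffers. Keeping one buffer per skill is a design choice of this sketch so that each policy is updated only with data from its own episodes.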

Reset Controller

Resets are critical for learning policies via RL. In the real world, resetting the robot must be handled explicitly: prior work either requires a human or uses a hand-designed controller to right the robot. Beyond allowing training with minimal human supervision, the ability to recover from failure is a crucial component of a robust robotic locomotion system. In this work, we learn a robust, efficient reset controller that rights the robot in case of failure. We find our controller to be highly efficient on a variety of real-world terrains, even on those where the hand-designed controller is unreliable.

Comparison to the Robot's Built-in Controller

Carpet

Memory Foam

Examples of Usage During Real-World Training

Grass

Carpet

Memory Foam

Doormat

Robustness

Real World Fine-Tuning

Grass

Carpet

Memory Foam

Doormat

Supplementary Video