Learning to Walk via Deep Reinforcement Learning

Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, Sergey Levine

Google / UC Berkeley

We develop a stable variant of the soft actor-critic deep reinforcement learning algorithm that requires minimal hyperparameter tuning and only a modest number of trials to learn multilayer neural network policies. The algorithm is based on the framework of maximum entropy reinforcement learning, and trades off exploration against exploitation by automatically adjusting a temperature parameter that determines the stochasticity of the policy. We demonstrate that this algorithm can be used to learn locomotion gaits on a real-world Minitaur quadrupedal robot in about two hours.
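The core of the automatic temperature adjustment can be sketched as a gradient step on the temperature itself: the temperature shrinks when the policy's entropy exceeds a target and grows when it falls below. The snippet below is a minimal illustration of this idea, not the authors' implementation; the function name, `target_entropy` value, and learning rate are all illustrative assumptions.

```python
import math

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.1):
    """One gradient step on log(alpha), the log of the temperature.

    Illustrative sketch (not the paper's code): minimizes
    J(alpha) = E[-alpha * (log pi(a|s) + target_entropy)]
    over sampled action log-probabilities `log_probs`.
    """
    alpha = math.exp(log_alpha)
    mean_log_prob = sum(log_probs) / len(log_probs)
    # dJ/d(log_alpha) = -alpha * (mean log-prob + target entropy)
    grad = -alpha * (mean_log_prob + target_entropy)
    return log_alpha - lr * grad

# Policy entropy is -mean(log_prob). With entropy 2.0 above a target of 1.0,
# the update lowers alpha (less exploration); with entropy 0.5, it raises alpha.
high_entropy = update_log_alpha(0.0, log_probs=[-2.0], target_entropy=1.0)
low_entropy = update_log_alpha(0.0, log_probs=[-0.5], target_entropy=1.0)
```

In this sketch, optimizing the log of the temperature keeps the temperature positive without an explicit constraint, a common choice in practice.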

We trained the Minitaur robot to walk in about two hours.

Even though the policy was trained on flat terrain, it generalizes surprisingly well to unseen terrains.