Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Nikita Rudin, David Hoeller, Marko Bjelonic and Marco Hutter
Robotic Systems Lab, ETH Zurich & NVIDIA

Local navigation and locomotion of legged robots are commonly split into separate modules.
In this work, we propose to combine them by training an end-to-end policy with deep reinforcement learning.
Training a policy in this way opens up a larger set of possible solutions, which allows the robot to learn more complex behaviors.

The locomotion controller is usually tasked with accurately tracking a commanded velocity. However, this limits the robot's capabilities.

With our approach, the robot needs to reach a target position within a provided time. The task's success is only evaluated at the end of an episode, meaning that the policy does not need to reach the target as fast as possible. It is free to select its path and the locomotion gait.
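The difference between the two task formulations can be sketched as two reward functions. This is a minimal illustration, not the reward used in the work: the function names, the exponential tracking kernel, the `sigma` and `tolerance` values, and the binary success signal are all assumptions made for the example.

```python
import math


def velocity_tracking_reward(v_cmd, v_actual, sigma=0.25):
    """Dense reward, paid at every step (hypothetical form).

    The policy is penalised whenever its planar velocity deviates from
    the command, so it has little freedom in how it moves.
    """
    err_sq = sum((c - a) ** 2 for c, a in zip(v_cmd, v_actual))
    return math.exp(-err_sq / sigma)


def position_task_reward(step, episode_len, base_pos, target_pos, tolerance=0.25):
    """Sparse reward, paid only on the last step of the episode (hypothetical form).

    Nothing is rewarded or penalised before the final step, so the path
    taken and the gait used are left entirely to the policy.
    """
    if step < episode_len - 1:
        return 0.0
    dist = math.dist(base_pos, target_pos)
    return 1.0 if dist <= tolerance else 0.0


# Midway through the episode the position task gives no signal at all,
# regardless of where the robot is or how fast it moves.
mid_episode = position_task_reward(50, 100, (0.3, 2.0), (1.0, 0.0))

# On the final step only the distance to the target matters.
on_target = position_task_reward(99, 100, (1.0, 0.1), (1.0, 0.0))
off_target = position_task_reward(99, 100, (0.0, 0.0), (1.0, 0.0))
```

Under this sketch, a policy that detours around an obstacle or pauses before a jump earns the same final reward as one that walks straight to the target, provided both arrive within the allotted time.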

This simple change enables the training of new behaviours:

Jumping over gaps

Climbing on boxes

Navigating obstacles

Climbing steep slopes

We transfer all policies to the real robot

Policies trained with velocity tracking (V) usually end up learning a trotting gait.

Our policies (Pf) learn a three-phase gait, which proves to be more energy-efficient than trotting.
