Resilient Legged Local Navigation: 

Learning to Traverse with Compromised Perception End-to-End

Jin Jin* and Chong Zhang*, Jonas Frey, Nikita Rudin, Matías Mattamala, Cesar Cadena, Marco Hutter

(*Equal contribution, contact: jinjin@ethz.ch, chozhang@ethz.ch)

To appear in ICRA 2024, finalist for Best Paper Award in Cognitive Robotics

Our navigation policy guides the robot to the local target whether or not perception is available.

Motivation

In harsh environments or in corner cases of the perception module, the sensors may fail to interpret the environment correctly, as shown in (A). Such perception failures can leave obstacles or pits invisible to the robot.

Perception failures pose a high risk to robot navigation. (B) When there is no perception failure, the navigation task is trivial. However, when an obstacle becomes invisible to the robot, (C) classical planners typically get stuck because they cannot react to it. In this work, we provide an RL-based solution (D): the robot learns reactive motions to reach the target despite noisy sensor data, instantaneous impacts, and complex dynamics, and the approach scales to different failure cases.

Results

Our policy can work even when the robot is fully blind to the obstacles.

A classical planner cannot react to the invisible obstacle and gets stuck.

Our policy can also function with a reliable map.

Our policy overcomes an invisible pit.

Our policy reacts to collisions on different body parts in different directions:

Base (front)

Base (side)

Thigh

Foot

Leg

Multi-contact

Such collision reactions are non-trivial for legged robots, due to noisy sensor data, instantaneous impacts, complex dynamics, and the variability of collisions. For example, it is difficult to detect collisions using only the IMU, even for collisions on the base.

A test example in simulation, showcasing that collision detection is non-trivial even on the base, not to mention on other body parts in different directions:

Method

We model perception failures as obstacles and pits that are invisible to the robot's exteroception. Perception failures pose a high risk to the navigation module, as they can lead to task failure if not handled properly.
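As a minimal sketch of this idea (not the exact simulator code), an obstacle or pit can be kept in the physics engine while being masked out of the exteroceptive observation fed to the policy; the function and argument names below are hypothetical:

```python
import numpy as np

def apply_perception_failure(height_scan, failure_mask, ground_height=0.0):
    """Hide selected obstacles/pits from the exteroceptive observation.

    height_scan:  (N,) true terrain heights sampled around the robot
    failure_mask: (N,) boolean, True where an 'invisible' obstacle or pit lies
    The physics simulation still uses the true terrain; only the observation
    given to the policy is corrupted, so collisions and falls can still occur.
    """
    corrupted = height_scan.copy()
    corrupted[failure_mask] = ground_height  # obstacle/pit appears as flat ground
    return corrupted
```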

We apply an asymmetric actor-critic scheme. Locomotion-level observations are also part of our high-level inputs, enabling the navigation policy to sense and react to perception failures. An LSTM layer recurrently infers the environment state, and its latent is regularized towards the critic's latent space to implicitly reconstruct the true scene information.
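A minimal PyTorch-style sketch of this architecture, assuming placeholder layer sizes and class names (not the exact implementation): the actor is recurrent over possibly corrupted observations, the critic sees privileged true-scene information during training only, and the actor's latent is pulled towards the critic's latent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NavigationActor(nn.Module):
    """Recurrent actor over (possibly corrupted) observations."""
    def __init__(self, obs_dim, latent_dim=128, action_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, latent_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(latent_dim, 128), nn.ELU(),
                                  nn.Linear(128, action_dim))

    def forward(self, obs_seq, hidden=None):
        latent, hidden = self.lstm(obs_seq, hidden)   # (B, T, latent_dim)
        return self.head(latent), latent, hidden

class PrivilegedCritic(nn.Module):
    """Critic with privileged (true) scene information, used only in training."""
    def __init__(self, priv_obs_dim, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(priv_obs_dim, 256), nn.ELU(),
                                     nn.Linear(256, latent_dim))
        self.value = nn.Linear(latent_dim, 1)

    def forward(self, priv_obs):
        z = self.encoder(priv_obs)
        return self.value(z), z

def latent_regularization_loss(actor_latent, critic_latent):
    """Pull the actor's recurrent latent toward the critic's privileged latent,
    so the actor implicitly reconstructs the true scene information."""
    return F.mse_loss(actor_latent, critic_latent.detach())
```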

We design a terrain curriculum to facilitate learning. From easy to hard, the terrains become rougher and the number of obstacles or pits increases. During training, a robot is promoted to a higher difficulty level if it reaches the target position within the episode, and demoted otherwise.
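A hedged sketch of this promotion/demotion rule at episode reset (function and variable names are illustrative, not the exact training code):

```python
import numpy as np

def update_terrain_levels(levels, reached_target, max_level):
    """Curriculum update applied when episodes reset.

    levels:         (num_envs,) int, current terrain difficulty per robot
    reached_target: (num_envs,) bool, True if the robot reached the goal
    Robots that reached the goal move up one level; the rest move down one.
    """
    levels = np.where(reached_target, levels + 1, levels - 1)
    return np.clip(levels, 0, max_level)
```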

We randomize the collision body of the base during training so that the policy learns to react to collisions on different body parts in different directions. For example, with the "Tiny" body, most collisions are foot and thigh collisions, while with the "Huge" body, base collisions dominate.
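As a hedged sketch of this randomization, the base collision box could be resampled per environment between a small and a large preset; the preset values and names below are illustrative assumptions, not the exact training configuration:

```python
import numpy as np

# Illustrative extremes for the randomized base collision box (half-extents in m).
COLLISION_BODY_PRESETS = {
    "Tiny": np.array([0.05, 0.05, 0.05]),  # small base box: foot/thigh collisions dominate
    "Huge": np.array([0.60, 0.40, 0.25]),  # enlarged base box: base collisions dominate
}

def sample_base_collision_body(rng,
                               low=COLLISION_BODY_PRESETS["Tiny"],
                               high=COLLISION_BODY_PRESETS["Huge"]):
    """Sample base collision-box half-extents uniformly between the two presets."""
    return rng.uniform(low, high)

rng = np.random.default_rng(0)
half_extents = sample_base_collision_body(rng)  # one sample per environment reset
```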