Rapid Exploration for Open-World Navigation with Latent Goal Models
Berkeley Artificial Intelligence Research
Oral Talk at Conference on Robot Learning (CoRL) 2021
London, UK
Oral Talk at Workshop on Never-Ending Reinforcement Learning (NERL) at ICLR 2021
Abstract
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory. We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration. Trained on a large offline dataset of prior experience, the model acquires a representation of visual goals that is robust to task-irrelevant distractors. We demonstrate our method on a mobile ground robot in open-world exploration scenarios. Given an image of a goal that is up to 80 meters away, our method leverages its representation to explore and discover the goal in under 20 minutes, even amidst previously unseen obstacles and weather conditions.
Summary Video
Idea
Use a conditional variational autoencoder (C-VAE) to learn an uncertainty-aware, context-conditioned goal representation together with a short-horizon distance function and policy: trained across many environments, this representation quickly adapts to novel scenes.
Use the learned latent goal model for exploration by sampling "imaginary goals" in the local neighborhood and attempting to reach them.
Store environment interactions in a non-parametric memory (a topological graph); see the sketch after this list. The graph can also be used to obtain long-horizon policies and to maintain visitation counts for frontier-based exploration.
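To make the last point concrete, here is a minimal Python sketch of such a topological memory using networkx. The `distance_fn` callable (standing in for the learned distance model), the `EDGE_THRESHOLD` value, and the method names are illustrative assumptions, not the paper's implementation:

```python
import networkx as nx

EDGE_THRESHOLD = 10.0  # assumed: max predicted temporal distance for an edge

class TopologicalMemory:
    """Non-parametric memory: nodes hold observations, edges hold predicted distances."""

    def __init__(self, distance_fn):
        self.graph = nx.Graph()
        self.distance_fn = distance_fn  # learned distance model: (obs, obs) -> float

    def add_node(self, obs):
        node = self.graph.number_of_nodes()
        self.graph.add_node(node, obs=obs, visits=0)
        # Connect to existing nodes that the model predicts are nearby.
        for other in range(node):
            d = self.distance_fn(obs, self.graph.nodes[other]["obs"])
            if d < EDGE_THRESHOLD:
                self.graph.add_edge(node, other, weight=d)
        return node

    def plan(self, start, goal):
        # Long-horizon behavior: a shortest path of subgoal nodes through the graph.
        return nx.shortest_path(self.graph, start, goal, weight="weight")

    def least_visited_frontier(self, frontier):
        # Visitation counts drive frontier-based exploration.
        return min(frontier, key=lambda n: self.graph.nodes[n]["visits"])
```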
Graphical Model of Goals, Actions and Distances
Our model uses images of goals and current observations to obtain a latent state-goal representation that summarizes the goal for the purpose of predicting the action and the temporal distance to the goal.
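A minimal PyTorch sketch of this latent goal model is below. Feature dimensions, layer sizes, and the bottleneck weight `beta` are illustrative assumptions; the image encoders are omitted, and observations and goals are treated as feature vectors:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentGoalModel(nn.Module):
    def __init__(self, obs_dim=256, z_dim=32, act_dim=2):
        super().__init__()
        # q(z | o_t, o_g): encode the goal relative to the current observation.
        self.encoder = nn.Sequential(
            nn.Linear(2 * obs_dim, 256), nn.ReLU(), nn.Linear(256, 2 * z_dim))
        # Decoders condition on the observation and the latent goal z.
        self.dist_head = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 1))
        self.act_head = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, obs, goal):
        mu, log_std = self.encoder(torch.cat([obs, goal], -1)).chunk(2, -1)
        q = torch.distributions.Normal(mu, log_std.exp())
        z = q.rsample()  # reparameterized sample of the latent goal
        ctx = torch.cat([obs, z], -1)
        return self.dist_head(ctx), self.act_head(ctx), q

def elbo_loss(model, obs, goal, action, dist, beta=0.01):
    d_hat, a_hat, q = model(obs, goal)
    # Information bottleneck: a KL penalty to a unit-Gaussian prior compresses
    # the goal representation.
    prior = torch.distributions.Normal(
        torch.zeros_like(q.loc), torch.ones_like(q.scale))
    kl = torch.distributions.kl_divergence(q, prior).sum(-1).mean()
    return (F.mse_loss(d_hat.squeeze(-1), dist)
            + F.mse_loss(a_hat, action)
            + beta * kl)
```

Because the decoders condition on the current observation, sampling z from the prior and decoding yields the feasible "imaginary goals" used for exploration.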
Exploring Open-World Environments with RECON
Combining the latent goal model with the topological graph, RECON can quickly discover user-specified goals in new environments and navigate to them reliably. Our method consists of three stages:
Prior Experience: The goal-conditioned distance and action model is trained on experience from previously visited environments. Supervision is obtained by using time steps as a proxy for distance, combined with a hindsight relabeling scheme (sketched in the first code block after this list).
Exploring a Novel Environment: When dropped into a new environment, RECON discovers a visual target by combining frontier-based exploration with latent goal sampling from the learned model (see the exploration sketch after this list). The learned model is also finetuned to the new environment.
Navigating an Explored Environment: Given an explored environment, represented by a topological graph G, and the learned model, RECON navigates to a goal image by planning a path of subgoals through G.
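A minimal sketch of the hindsight relabeling supervision from the first stage, assuming a trajectory is a list of (observation, action) pairs; the `max_offset` cap is an assumed hyperparameter:

```python
import random

def relabel(trajectory, max_offset=20):
    """Build (obs, goal, action, distance) training tuples from one trajectory.

    The observation k steps ahead is relabeled as the goal; the elapsed time k
    is the proxy distance, and the action actually taken supervises the policy.
    """
    examples = []
    for t in range(len(trajectory) - 1):
        k = random.randint(1, min(max_offset, len(trajectory) - 1 - t))
        obs, action = trajectory[t]
        goal, _ = trajectory[t + k]
        examples.append((obs, goal, action, float(k)))
    return examples
```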
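And a rough sketch of the second stage's exploration loop, building on the model and memory sketches above. The `robot` interface (`observe`/`act`), the arrival threshold, and the degree-based frontier test are hypothetical stand-ins for the real system:

```python
import torch

def explore_step(model, memory, robot, z_dim=32, horizon=20, arrive=1.0):
    obs = robot.observe()  # feature vector, shape (obs_dim,)
    node = memory.add_node(obs)
    memory.graph.nodes[node]["visits"] += 1
    z = torch.randn(z_dim)  # "imaginary goal": a sample from the latent prior
    with torch.no_grad():
        for _ in range(horizon):
            ctx = torch.cat([robot.observe(), z], -1)
            robot.act(model.act_head(ctx))  # reactive short-horizon policy
            if model.dist_head(ctx).item() < arrive:
                return []  # imagined goal reached; nothing left to plan
    # Otherwise fall back to frontier-based exploration over the graph.
    frontier = [n for n in memory.graph.nodes
                if memory.graph.degree(n) <= 2]  # assumed frontier heuristic
    if frontier:
        target = memory.least_visited_frontier(frontier)
        return memory.plan(node, target)  # subgoal path for the policy to follow
    return []
```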
Example Environments
Robustness to Distractors
RECON plans over a compressed representation that ignores distractors in the environment, while its learned low-level policy remains reactive. This lets it explore a non-stationary environment and still discover and navigate to the visually specified goal. Because the learned representation and the topological graph are robust to visual distractors, RECON reliably navigates to the goal despite previously unseen obstacles and a variety of lighting and weather conditions.
BibTeX
@inproceedings{shah2021rapid,
  title={{Rapid Exploration for Open-World Navigation with Latent Goal Models}},
  author={Dhruv Shah and Benjamin Eysenbach and Nicholas Rhinehart and Sergey Levine},
  booktitle={5th Annual Conference on Robot Learning},
  year={2021},
  url={https://openreview.net/forum?id=d_SWJhyKfVw}
}