Offline Reinforcement Learning for Visual Navigation

Oral Talk at the Conference on Robot Learning (CoRL) 2022

Auckland, New Zealand

Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online trial-and-error learning on real-world robots is logistically challenging, and methods that can instead utilize existing datasets of robotic navigation could be significantly more scalable and enable broader generalization. In this paper, we present ReViND, the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real world. We evaluate our system for off-road navigation without any additional data collection or fine-tuning, and show that it can navigate to distant goals using only offline training on previously collected data, and that it exhibits behaviors that qualitatively differ based on the user-specified reward function.

Key Idea

Offline RL is great for learning custom behaviors, but doesn't scale to long horizons. Topological graphs help, but how do we get the distances between nodes?
Using (offline) learned values as distances, we infer the graph connectivity for planning!

Value graphs for planning + learned policies for control → custom behaviors in the real world from offline data (sketched below)
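
A minimal sketch of this idea in Python, assuming a goal-conditioned value function V(s, g) trained offline with a reward of -1 per step, so that -V(s, g) estimates the number of steps from s to g. The names value_fn and nodes and the connectivity threshold MAX_DIST are illustrative assumptions, not ReViND's actual interface:

import heapq

MAX_DIST = 20.0  # assumed cutoff: value-estimated edges longer than this are treated as disconnected

def build_value_graph(nodes, value_fn):
    """Infer connectivity: keep edge (i, j) iff the learned value function
    says node j is reachable from node i within MAX_DIST steps."""
    graph = {i: [] for i in range(len(nodes))}
    for i, s in enumerate(nodes):
        for j, g in enumerate(nodes):
            if i != j:
                dist = -value_fn(s, g)  # offline value -> distance estimate
                if dist < MAX_DIST:
                    graph[i].append((j, dist))
    return graph

def dijkstra(graph, start, goal):
    """Shortest path over the value-derived edge distances."""
    pq, visited = [(0.0, start, [start])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph[node]:
            if nxt not in visited:
                heapq.heappush(pq, (cost + weight, nxt, path + [nxt]))
    return None, float("inf")

The learned policy then executes the plan edge by edge, driving toward each node on the path as its next subgoal.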

Method Overview
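
The planner above only measures distance; the custom behaviors come from the reward used during offline training. Below is a hypothetical sketch of how a user-specified reward can reshape the planned path, assuming a second value head reward_value_fn(s, g) trained offline on the user's reward (e.g., a penalty for driving on grass). The name, the weight alpha, and the linear blend are illustrative assumptions, not ReViND's exact edge-weighting rule:

def edge_cost(s, g, value_fn, reward_value_fn, alpha=1.0):
    """Blend estimated steps-to-go with the estimated user-reward cost;
    lower is better for the shortest-path search above."""
    dist = -value_fn(s, g)            # estimated steps from s to g
    penalty = -reward_value_fn(s, g)  # estimated cost under the user's reward
    return dist + alpha * penalty

Re-weighting the graph's edges with edge_cost and re-running the search yields routes that change qualitatively with the user's reward function, without any new data collection or fine-tuning.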



BibTeX

@inproceedings{shah2022offline,
  title     = {Offline Reinforcement Learning for Visual Navigation},
  author    = {Dhruv Shah and Arjun Bhorkar and Hrishit Leen and Ilya Kostrikov and Nicholas Rhinehart and Sergey Levine},
  booktitle = {6th Annual Conference on Robot Learning},
  year      = {2022},
  url       = {https://openreview.net/forum?id=uhIfIEIiWm_}
}