Michelle A. Lee*, Carlos Florensa*, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, and Dieter Fox
Paper Link: https://arxiv.org/abs/2005.10872
Abstract: Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, learning-based approaches, such as Reinforcement Learning (RL), can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interaction with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a local RL policy can be learned directly from raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing tight-fitting peg insertion.
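The switching idea described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code: the function and variable names (`in_uncertain_region`, `model_based_policy`, `rl_policy`, `goal_estimate`, `goal_uncertainty`) are assumptions, and the uncertain region is modeled here as a simple ball around the estimated goal pose.

```python
import numpy as np

def in_uncertain_region(state, goal_estimate, goal_uncertainty):
    """Return True if the current state lies in the region where the goal
    could plausibly be, i.e., where perception error makes the
    model-based policy unreliable. (Illustrative: a ball of radius
    `goal_uncertainty` around the estimated goal.)"""
    return np.linalg.norm(state - goal_estimate) <= goal_uncertainty

def model_based_policy(state, goal_estimate):
    """Nominal controller that moves the end-effector toward the
    estimated goal; assumed reliable outside the uncertain region."""
    raise NotImplementedError  # placeholder for a planner/controller

def rl_policy(observation):
    """Local policy trained with RL directly from raw sensory inputs,
    used only inside the uncertain region."""
    raise NotImplementedError  # placeholder for a learned policy

def select_action(state, observation, goal_estimate, goal_uncertainty):
    """Gate between the two policies based on the perception uncertainty."""
    if in_uncertain_region(state, goal_estimate, goal_uncertainty):
        return rl_policy(observation)               # learned local policy
    return model_based_policy(state, goal_estimate)  # model-based guidance
```

In this sketch, the model-based policy guides the robot toward the estimated goal from far away, and control hands over to the locally learned RL policy once the state enters the region where the pose estimate can no longer be trusted.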
ICRA 2020 Talk
Video