Visual Reinforcement Learning with Imagined Goals

Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine

*Equal Contribution

Abstract

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.

Preprint available on arXiv.
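
As a rough illustration of the approach described in the abstract, the sketch below shows how a single VAE latent space can serve all three roles: imagining goals by sampling from the prior, encoding raw image observations, and computing a goal-reaching reward as a latent-space distance, together with retroactive goal relabeling. This is a simplified stand-in rather than the actual rlkit implementation; the network sizes, the 48x48 image resolution, and all helper names are illustrative assumptions.

```python
# Minimal sketch of the three roles of the learned latent space (not the
# authors' rlkit code). Layer sizes and helper names are illustrative.
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.LazyLinear(latent_dim)
        self.fc_logvar = nn.LazyLinear(latent_dim)
        self.latent_dim = latent_dim

    def encode(self, img):
        h = self.encoder(img)
        return self.fc_mu(h), self.fc_logvar(h)

vae = ConvVAE()

def imagine_goal(batch_size=1):
    # Self-supervised practice: sample a goal latent from the VAE prior N(0, I).
    return torch.randn(batch_size, vae.latent_dim)

def encode_observation(img):
    # Structured transformation of raw pixels: use the posterior mean as the state.
    mu, _ = vae.encode(img)
    return mu

def latent_reward(z_obs, z_goal):
    # Goal-reaching reward: negative Euclidean distance in latent space.
    return -torch.norm(z_obs - z_goal, dim=-1)

def relabel_transition(transition, future_z):
    # Retroactive goal relabeling: swap in a latent actually reached later in
    # the trajectory and recompute the reward for that relabeled goal.
    z_obs, action, _, old_goal = transition
    return (z_obs, action, latent_reward(z_obs, future_z), future_z)

# Example: one step of an imagined-goal rollout.
obs_img = torch.rand(1, 3, 48, 48)      # raw image observation
z_goal = imagine_goal()                  # imagined goal for practice
z_obs = encode_observation(obs_img)      # encode pixels into the latent space
r = latent_reward(z_obs, z_goal)         # reward without ground-truth state
```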

Code

Our environments are available at https://github.com/vitchyr/multiworld, and our algorithm implementation is available at https://github.com/vitchyr/rlkit.

Videos

Videos are included below. In each clip, the bottom image shows the goal image, and the video above it shows the final trained policy attempting to reach that goal. Our RL method learns behavior directly from images: it does not receive ground-truth state information through either the observation or the reward.

Train time: goals generated by a VAE.

Test time: goals come from the environment.

Variable-object Experiment

In this experiment, the agent always sees two objects during training. At test time, the agent may be given 0, 1, or 2 objects to push to the correct locations.

Pick and Place

Video: pickplace.mp4

Door Opening/Closing

Video: door.mp4