Visual Reinforcement Learning with Imagined Goals
Ashvin Nair*, Vitchyr Pong*, Murtaza Dalal, Shikhar Bahl, Steven Lin, Sergey Levine
*Equal Contribution
Abstract
For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
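The three roles of the learned representation and the relabeling step can be illustrated with a minimal sketch. It works directly on latent vectors and leaves out the image encoder entirely. The unit-Gaussian goal prior, the negative Euclidean distance reward, the dictionary layout of a transition, and all function names here are assumptions made for illustration; they are not taken from the released rlkit code.

```python
import math
import random


def sample_imagined_goal(rng, dim):
    """Imagine a goal by sampling a latent vector.

    Assumption: goals are drawn from a unit-Gaussian prior over the latent space.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def latent_reward(z, z_goal):
    """Reward computed in latent space rather than from ground-truth state.

    Assumption: the reward is the negative Euclidean distance to the goal latent.
    """
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_goal)))


def relabel(transition, new_goal):
    """Return a copy of a stored transition with its goal swapped out.

    The reward is recomputed for the new goal, so off-policy data can be
    reused retroactively. The dictionary keys are hypothetical.
    """
    relabeled = dict(transition)
    relabeled["goal"] = list(new_goal)
    relabeled["reward"] = latent_reward(transition["next_latent"], new_goal)
    return relabeled


rng = random.Random(0)
goal = sample_imagined_goal(rng, dim=4)

# A hypothetical transition collected while practicing toward the imagined goal.
transition = {
    "latent": [0.0] * 4,
    "next_latent": [0.1] * 4,
    "goal": goal,
    "reward": latent_reward([0.1] * 4, goal),
}

# Relabel with the latent that was actually reached: the distance becomes zero.
hindsight = relabel(transition, transition["next_latent"])
```

Relabeling with the reached latent is only one possible choice of new goal; a freshly imagined goal from `sample_imagined_goal` could be passed to `relabel` in the same way.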
Code
Our environments are available at https://github.com/vitchyr/multiworld and our algorithm implementation is available at https://github.com/vitchyr/rlkit
Videos
Videos are included below. In each pair, the bottom image shows the goal, and the video above it shows the final trained policy trying to reach that goal. Our RL method learns behavior from images alone: it does not receive ground-truth state information through either the observation or the reward.

Train time: goals generated by a VAE.
Test time: goals come from the environment.






Variable-object Experiment
In this experiment, the agent always sees two objects during training. At test time, the agent may be given 0, 1, or 2 objects to push to the correct locations.

Pick and Place

Door Opening/Closing
