Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

Vitchyr Pong*, Murtaza Dalal*, Steven Lin*, Ashvin Nair, Shikhar Bahl, Sergey Levine

*Equal Contribution

Abstract

Reinforcement learning can enable an agent to acquire a large repertoire of skills. However, each new skill requires a manually designed reward function, which typically takes considerable effort and engineering. Self-supervised goal setting has the potential to automate this process, enabling an agent to propose its own goals and acquire skills that achieve these goals. However, such methods typically rely on manually designed goal distributions or heuristics to encourage the agent to explore a wide range of states. In this work, we propose a formal objective for exploration when training an autonomous goal-reaching policy that maximizes state coverage, and show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal-reaching performance. We present an algorithm called Skew-Fit for learning such a maximum-entropy goal distribution, and show that our method converges to a uniform distribution over the set of possible states, even when we do not know this set beforehand. When combined with existing goal-conditioned reinforcement learning algorithms, Skew-Fit allows self-supervised agents to autonomously explore their entire state space faster than prior work, across a variety of simulated and real robotic tasks.
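To give some intuition for how a goal distribution can be pushed toward uniform, the following is a minimal sketch rather than the released implementation. It assumes a small discrete state space, assumes that sampling a goal always leads to visiting exactly that state, and assumes importance weights of the form p(s)^alpha with alpha in [-1, 0). The names skew_fit_step and entropy, and the toy starting distribution, are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def skew_fit_step(p, alpha=-0.5, n_samples=5000, rng=None):
    """One illustrative skew-then-fit iteration on a discrete state space.

    Assumption: visited states are drawn directly from the current goal
    distribution p, i.e. the policy reaches every goal it is given.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Collect states by sampling from the current distribution.
    states = rng.choice(len(p), size=n_samples, p=p)

    # Skew: weight each visited state by p(s)^alpha. With alpha < 0,
    # rarely visited states receive larger weights.
    weights = p[states] ** alpha
    weights /= weights.sum()

    # Fit: here the "model" is just a weighted histogram of the samples.
    new_p = np.bincount(states, weights=weights, minlength=len(p))
    return new_p / new_p.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.7, 0.2, 0.05, 0.05])  # toy, highly non-uniform start
    for i in range(5):
        print(f"iteration {i}: entropy = {entropy(p):.3f}")
        p = skew_fit_step(p, rng=rng)
```

In this toy setting the entropy of the fitted distribution should rise toward log(4), the entropy of the uniform distribution over four states. In an image-based setting the histogram would be replaced by a learned generative model, which this sketch does not attempt to reproduce.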

arXiv preprint

algorithm code

environments code

Summary Video

Environments

Our reinforcement learning method learns directly from images: it does not receive ground-truth state information through either the observation or the reward. The videos below show each environment from three perspectives. The video on the left is a 3D view of the robot as it completes the task. The top-right video shows the robot's perspective; the agent receives a down-sampled version of this image. Finally, the bottom-right video shows the goal image the robot is attempting to achieve, which is also down-sampled before being given to the agent.
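The down-sampling step mentioned above could look roughly like the sketch below. The text here does not state the resolution or the resampling method, so block averaging, the helper name downsample, and the factor argument are all assumptions made for illustration.

```python
import numpy as np


def downsample(image, factor):
    """Down-sample an H x W x C image by averaging factor x factor blocks.

    Assumption: simple block averaging; the actual preprocessing used for
    the agent's observations and goal images may differ.
    """
    h, w, c = image.shape
    # Crop so that height and width are exact multiples of the factor.
    h_crop, w_crop = h - h % factor, w - w % factor
    blocks = image[:h_crop, :w_crop].reshape(
        h_crop // factor, factor, w_crop // factor, factor, c
    )
    # Average over the two within-block axes.
    return blocks.mean(axis=(1, 3))
```

The same function would be applied to both the camera observation and the goal image so that the agent sees them at matching resolution.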

Simulation: Visual Door

door_final.mpeg

Simulation: Visual Pusher

pusher_final.mpeg

Simulation: Visual Pickup

pickup_final.mpeg

Real World: Visual Door

final_real_door_hard.mpeg