Self-Supervised Correspondence
in Visuomotor Policy Learning
Paper: link
Video: link
Source code:
- Policy learning code: will be released soon.
- Vision code: https://github.com/RobotLocomotion/pytorch-dense-correspondence (an updated branch will be released soon)
Abstract
In this paper, we explore using self-supervised correspondence to improve the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense visual correspondence training, and show that it enables visuomotor policy learning with surprisingly high generalization performance from modest amounts of data: using imitation learning, we demonstrate extensive hardware validation on challenging manipulation tasks with as few as 50 demonstrations. Our learned policies can generalize across classes of objects, react to deformable object configurations, and manipulate textureless symmetrical objects in a variety of backgrounds, all with closed-loop, real-time vision-based policies. Simulated imitation learning experiments suggest that correspondence training offers sample complexity and generalization benefits compared to autoencoding and end-to-end training.
Examples of autonomous policies (left) and learned correspondences (right). See the paper for higher-resolution versions.
Video
Model
Overview of our model (right) and a factorization of common visuomotor policies (left).
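To make the idea of dense correspondence training more concrete, here is a minimal sketch of a pixelwise contrastive loss in the style of the Dense Object Nets work listed under Related Work. It is an illustration under stated assumptions, not the authors' implementation: the function names, the nested-list descriptor layout, and the margin value are all hypothetical, and a real pipeline would compute this on GPU tensors with descriptors produced by a network. The loss pulls descriptors of matching pixels together and pushes non-matching pixels at least a margin apart.

```python
import math


def descriptor_distance(d1, d2):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d1, d2)))


def pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Illustrative pixelwise contrastive loss (hypothetical helper).

    desc_a, desc_b: descriptor images as nested lists, indexed [row][col],
        where each entry is a list of D floats.
    matches, non_matches: lists of ((row_a, col_a), (row_b, col_b)) pairs.
    margin: assumed minimum distance enforced between non-matching pixels.
    """
    # Matching pixels: penalize squared descriptor distance.
    match_loss = 0.0
    for (ra, ca), (rb, cb) in matches:
        match_loss += descriptor_distance(desc_a[ra][ca], desc_b[rb][cb]) ** 2
    match_loss /= max(len(matches), 1)

    # Non-matching pixels: penalize only if closer than the margin.
    non_match_loss = 0.0
    for (ra, ca), (rb, cb) in non_matches:
        dist = descriptor_distance(desc_a[ra][ca], desc_b[rb][cb])
        non_match_loss += max(0.0, margin - dist) ** 2
    non_match_loss /= max(len(non_matches), 1)

    return match_loss + non_match_loss


# Toy 1x2 "images" with 2-dimensional descriptors (made-up values).
desc_a = [[[0.0, 0.0], [1.0, 0.0]]]
desc_b = [[[0.0, 0.0], [1.0, 0.0]]]
matches = [((0, 0), (0, 0)), ((0, 1), (0, 1))]
non_matches = [((0, 0), (0, 1))]

loss_perfect = pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches)

# Perturb one descriptor in image b so that a matching pair no longer agrees.
desc_b_shifted = [[[0.3, 0.4], [1.0, 0.0]]]
loss_shifted = pixelwise_contrastive_loss(desc_a, desc_b_shifted, matches, non_matches)
```

In this toy setup the loss is zero when matching descriptors coincide and non-matches are farther apart than the margin, and it grows once a matching descriptor drifts. In a visuomotor pipeline of the kind described above, the trained descriptors would then serve as the visual representation consumed by the policy.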
Team
MIT CSAIL, Robot Locomotion Group
Related Work (see the paper for the full Related Work)
Peter Florence*, Lucas Manuelli*, Russ Tedrake. "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation." Conference on Robot Learning (CoRL), 2018 [pdf] [video]
Tanner Schmidt, Richard Newcombe, Dieter Fox. "Self-supervised Visual Descriptor Learning for Dense Correspondence." Robotics and Automation Letters (RA-L), 2017 [pdf] [video]
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. "End-to-End Training of Deep Visuomotor Policies." Journal of Machine Learning Research (JMLR), 2016 [pdf] [video]