Contrastive Pre-training and Data Augmentation for Efficient Robot Learning (CoDER)

Albert Zhan*, Ruihan Zhao*, Lerrel Pinto, Pieter Abbeel, Misha Laskin

Code: [GitHub] Paper: [arXiv]

*Previously named "A Framework for Efficient Robotic Manipulation"

Abstract

Data-efficient learning of manipulation policies from visual observations is an outstanding challenge for real-robot learning. While deep reinforcement learning (RL) algorithms have shown success in learning policies from visual observations, they still require an impractical number of real-world data samples to learn effective policies. However, recent advances in unsupervised representation learning and data augmentation have significantly improved the sample efficiency of training RL policies on common simulated benchmarks. Building on these advances, we present Contrastive Pre-training and Data Augmentation for Efficient Robot Learning (CoDER), which utilizes unsupervised contrastive pre-training and data augmentation to achieve extremely sample-efficient training of sparse-reward robotic manipulation policies. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer, in just 15-50 minutes of real-world training time.
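To make the two ingredients named above concrete, here is a minimal sketch of CURL-style contrastive pre-training on demonstration frames together with the random-crop augmentation applied to RL samples. This is an illustration under assumptions, not the released implementation (see the GitHub code for that); all names such as `ConvEncoder`, `random_crop`, and the tensor shapes are ours.

```python
# Minimal sketch of CoDER's two ingredients (illustrative, not the released code):
# CURL-style contrastive pre-training on demo frames + random-crop augmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_crop(imgs, out=84):
    """Randomly crop a batch of images (B, C, H, W) to (B, C, out, out)."""
    b, c, h, w = imgs.shape
    xs = torch.randint(0, h - out + 1, (b,)).tolist()
    ys = torch.randint(0, w - out + 1, (b,)).tolist()
    return torch.stack([img[:, i:i + out, j:j + out]
                        for img, i, j in zip(imgs, xs, ys)])

class ConvEncoder(nn.Module):
    """Small CNN encoder mapping 84x84 images to latent features."""
    def __init__(self, in_ch=3, feat_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten())
        with torch.no_grad():
            n = self.net(torch.zeros(1, in_ch, 84, 84)).shape[1]
        self.fc = nn.Linear(n, feat_dim)

    def forward(self, x):
        return self.fc(self.net(x))

def contrastive_pretrain_step(encoder, W, frames, opt):
    """One InfoNCE step: two random crops of the same frame form a positive
    pair; crops of other frames in the batch are negatives. (The paper uses
    a momentum-averaged key encoder; stop-gradient keys shown for brevity.)"""
    q = encoder(random_crop(frames))        # query embeddings
    with torch.no_grad():
        k = encoder(random_crop(frames))    # key embeddings
    logits = q @ W @ k.T                    # bilinear similarity scores
    labels = torch.arange(q.shape[0])       # positives lie on the diagonal
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example wiring (demo_frames: a (B, 3, 100, 100) tensor of demo images):
#   encoder = ConvEncoder()
#   W = nn.Parameter(torch.eye(50))  # learned similarity matrix
#   opt = torch.optim.Adam(list(encoder.parameters()) + [W], lr=1e-3)
#   contrastive_pretrain_step(encoder, W, demo_frames, opt)
```

After pre-training, the encoder initializes the RL agent's image encoder, and the same random-crop augmentation is applied to minibatches sampled from the replay buffer during the subsequent RL phase.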

Video

icra21_video.mp4

Sparse Reward Skills from Pixels in Minutes

Shown below are example rollouts of the learned policies during evaluation.

Reach


Pickup


Move


Pull


Light Switch


Drawer Open


Generalization and Robustness

Unseen Locations

We evaluate our light-switch policy on unseen switch locations that appear neither in the demonstrations nor during training.

Since our policies take raw images as input, they generalize to these locations.

Adversarial Perturbation

Policy behavior when a human perturbs the block location using a stick.

Unseen Objects

Our policy generalizes to a different object shape, never seen in training or in the demonstrations.

Outperforming Behavior Cloning

The cloned policy does not generalize: it is unable to reach the block in order to pick it up.

Even when the arm happens to reset close to the block, the policy is not proficient and fails to pick it up.


We train a behavior-cloned pickup policy using the same 10 demonstrations fed into CoDER.

To give it the same expressiveness as CoDER, we use the same network architecture as well as the same unsupervised pre-training.
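For concreteness, a minimal sketch of such a baseline setup, reusing the `ConvEncoder` from the earlier snippet; the action dimension, MLP sizes, and tensor shapes here are assumptions, not the exact values we used.

```python
# Illustrative behavior-cloning baseline: same pre-trained encoder, same demos.
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    """Pre-trained image encoder followed by an MLP action head."""
    def __init__(self, encoder, feat_dim=50, act_dim=4):
        super().__init__()
        self.encoder = encoder  # initialized from contrastive pre-training
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.head(self.encoder(obs))

def bc_train(policy, demo_obs, demo_act, epochs=200, lr=1e-3):
    """Regress demonstrated actions with a mean-squared-error loss.
    demo_obs: (N, 3, 84, 84) image tensor; demo_act: (N, act_dim) actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(demo_obs), demo_act)
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```

Despite sharing the encoder, architecture, and demonstrations, this purely supervised baseline has no mechanism to recover from states outside the narrow demonstration distribution, which is consistent with the failures shown above.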