ARIEL: Leveraging Prior Data to Automate Robotic Reinforcement Learning

Homer Walke, Jonathan Yang, Albert Yu, Aviral Kumar, Jedrzej Orbik, Avi Singh, Sergey Levine

UC Berkeley, UT Austin, Google

Overview

Usage Demo

Motivation

Training a robot to accomplish a new task is challenging and labor-intensive. In this work, we demonstrate that utilizing large prior datasets spanning a diverse set of tasks simultaneously addresses several key problems in real-world robotic learning:

  • Sample-inefficiency: Reinforcement learning (RL) algorithms typically require a substantial amount of data, which may be time-consuming to collect on hardware.

  • Need for manual resets: Returning an environment to its initial state requires laborious human intervention and limits the ability to continually collect data.

  • Failure to generalize: Robotic RL policies often fail when deployed beyond the carefully controlled setting in which they were learned.

Our robotic policies map high-dimensional RGB images and robot states to a 7-dimensional action space. Rewards are provided in a sparse (+1/0) fashion by an object detection network.
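As a rough illustration of this interface, a minimal policy and reward sketch might look as follows; the convolutional encoder, layer sizes, and the detector success predicate are assumptions for illustration, not the exact network or detector used in our system.

import torch
import torch.nn as nn

class ImagePolicy(nn.Module):
    # Minimal sketch: map an RGB image plus the robot state to a
    # 7-dimensional action (e.g., 6-DoF end-effector delta + gripper).
    # Layer sizes are illustrative, not the architecture from the paper.
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(                 # small conv encoder for 64x64 RGB input
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),    # actions normalized to [-1, 1]
        )

    def forward(self, image, state):
        feat = self.encoder(image)                    # image: (B, 3, 64, 64), state: (B, state_dim)
        return self.head(torch.cat([feat, state], dim=-1))

def sparse_reward(image, detector):
    # Hypothetical detector predicate: +1 when the object detection network
    # reports task success (e.g., object inside the container), 0 otherwise.
    return 1.0 if detector(image) else 0.0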

Method

In the offline stage (Phase 1), we train a multi-task policy that captures prior knowledge from an offline dataset of previously experienced tasks. Optionally, we also collect a small number (~40) of human demonstrations of the new downstream task we want to learn (Phase 0.5) and add them to the offline dataset used to train the policy in Phase 1.

Then, in the online stage (Phase 2), this multi-task policy is used to initialize learning for the new task, providing both a forward policy and a backward (reset) skill, which improves learning speed and generalization.

This approach leads to sample-efficient learning of generalizable policies with a significant reduction in the need for manual interventions (i.e., environment resets).
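A minimal sketch of the overall pipeline is given below; the helper functions (pretrain_multitask, update_policy, policy.act, detector) and the strict forward/backward alternation schedule are placeholder assumptions rather than the exact procedure used in our experiments.

def run_ariel(prior_dataset, new_task_demos, env, detector,
              num_trials=600, max_steps=100):
    # Phase 0.5 + Phase 1: fold the (~40) demos of the new task into the
    # prior multi-task dataset and pretrain a multi-task policy offline.
    offline_data = list(prior_dataset) + list(new_task_demos)
    policy = pretrain_multitask(offline_data)          # hypothetical offline pretraining step

    # Phase 2: the pretrained policy provides both a forward (task) skill and
    # a backward (reset) skill; alternating them lets the robot reset the
    # scene itself instead of relying on manual environment resets.
    forward_task, backward_task = "place_in_container", "remove_from_container"
    replay = list(offline_data)

    for trial in range(num_trials):
        task = forward_task if trial % 2 == 0 else backward_task
        obs = env.get_observation()
        for _ in range(max_steps):
            action = policy.act(obs, task)             # task-conditioned action selection
            obs = env.step(action)
            reward = 1.0 if detector(obs, task) else 0.0   # sparse +1/0 reward
            replay.append((obs, action, reward, task))
            if reward > 0:
                break
        policy = update_policy(policy, replay)         # hypothetical online RL update
    return policy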

Dataset Collection

We collected data using scripted policies that pick up objects and place them into a container, as well as scripted policies that open a drawer and place an object inside (a generic version of such a script is sketched below). The photo shows the diverse objects and containers we used to construct the container pick-and-place tasks in our experiments.

Upper: containers and objects used in the offline data for pre-training.

Lower: test-time containers and objects used as part of new tasks for online fine-tuning.
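The scripted pick-and-place policies mentioned above can be as simple as a fixed waypoint sequence. The sketch below is a generic illustration, where env.move_to is a hypothetical low-level controller call, not the exact script used to collect our dataset.

import numpy as np

def scripted_pick_and_place(env, object_pos, container_pos, lift_height=0.15):
    # Generic waypoint-based pick-and-place script (illustrative only):
    # hover over the object, grasp, lift, carry to the container, release.
    object_pos = np.asarray(object_pos, dtype=float)
    container_pos = np.asarray(container_pos, dtype=float)
    lift = np.array([0.0, 0.0, lift_height])
    waypoints = [
        (object_pos + lift,    "open"),   # hover above the object
        (object_pos,           "open"),   # descend to the object
        (object_pos,           "close"),  # close the gripper to grasp
        (object_pos + lift,    "close"),  # lift the object
        (container_pos + lift, "close"),  # carry it above the container
        (container_pos + lift, "open"),   # open the gripper to drop it in
    ]
    for target, gripper in waypoints:
        env.move_to(target, gripper=gripper)  # hypothetical Cartesian controller interface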

Autonomous Fine-tuning

Videos: policy rollouts at 0, 100, and 600 trials of autonomous fine-tuning.

Zero-Shot Generalization

Videos: policy trained with only single-task data vs. ARIEL (ours).

Policies initialized with multi-task data (ARIEL) demonstrate zero-shot generalization to objects not seen in the prior dataset, whereas policies trained on only single-task data do not.

Cite

@inproceedings{walkeariel,
  title={Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning},
  author={Walke, Homer Rich and Yang, Jonathan Heewon and Yu, Albert and Kumar, Aviral and Orbik, J{\k{e}}drzej and Singh, Avi and Levine, Sergey},
  booktitle={Workshop on Learning from Diverse, Offline Data}
}