Training a robot to accomplish a new task is challenging and labor-intensive. In this work, we demonstrate that utilizing large prior datasets spanning many diverse tasks simultaneously addresses several key problems in real-world robotic learning:
Sample-inefficiency: Reinforcement learning (RL) algorithms typically require a substantial amount of data, which may be time-consuming to collect on hardware.
Need for manual resets: Returning an environment to its initial state requires laborious human intervention and limits the ability to continually collect data.
Failure to generalize: Robotic RL policies often fail when deployed beyond the carefully controlled setting in which they were learned.
Our robotic policies map high-dimensional RGB images and robot states to a 7-dimensional action space. Rewards are provided in a sparse (+1/0) fashion by an object detection network.
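To make this interface concrete, the sketch below shows one way such a policy could be structured: a convolutional encoder for the RGB image, an MLP head that fuses image features with the robot state and outputs a 7-dimensional action, and a sparse +1/0 reward driven by a success detector. The image resolution, state dimension, layer sizes, and the `detect_success` helper are illustrative assumptions, not the exact architecture or detector used in this work.

```python
# Minimal sketch of an image + state policy with a sparse detector-based reward.
# Shapes, layer sizes, and detect_success are assumptions for illustration only.
import torch
import torch.nn as nn


class ImageStatePolicy(nn.Module):
    """Maps an RGB image and a robot state vector to a 7-dimensional action."""

    def __init__(self, state_dim: int = 10, action_dim: int = 7):
        super().__init__()
        # Small convolutional encoder for the RGB observation (assumed 64x64 input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 3, 64, 64)).shape[1]
        # MLP head that fuses image features with the proprioceptive state.
        self.head = nn.Sequential(
            nn.Linear(feat_dim + state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded 7-dim action
        )

    def forward(self, image: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        features = self.encoder(image)
        return self.head(torch.cat([features, state], dim=-1))


def sparse_reward(image, detect_success) -> float:
    """+1/0 reward, where detect_success stands in for the object detection network."""
    return 1.0 if detect_success(image) else 0.0
```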
In the offline stage (Phase 1), we train a multi-task policy that captures prior knowledge from an offline dataset of previously experienced tasks. Optionally, we collect a small number (~40) of human demonstrations of the downstream new task we want to learn (Phase 0.5) and combine this data with the offline dataset when training our policy in Phase 1.
Then, in the online stage (Phase 2), this multi-task policy is used to initialize learning for a new task, providing both a forward policy and a backward (reset) skill, and improving learning speed and generalization.
This approach leads to sample-efficient learning of generalizable policies with a significant reduction in the need for manual interventions (i.e., environment resets).
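The schematic training loop below summarizes the two phases: offline pre-training on the multi-task prior data (optionally including the new-task demonstrations), followed by online fine-tuning that alternates the forward task with the backward (reset) skill. The function names, the `env`/`buffer` interfaces, and the strict forward/reset alternation are assumptions made for illustration; they sketch the structure of the pipeline rather than the exact algorithm or hyperparameters.

```python
# Schematic of the two-phase pipeline. Interfaces (env.observe, env.step,
# buffer.add, update_fn) and the alternation schedule are illustrative assumptions.

def offline_pretrain(policy, prior_dataset, demos, update_fn, num_steps=100_000):
    """Phase 1: fit a multi-task policy to the prior dataset, optionally
    combined with a handful of demonstrations of the new downstream task."""
    data = list(prior_dataset) + list(demos)  # demos may be an empty list
    for _ in range(num_steps):
        update_fn(policy, data)  # e.g. one gradient step on a sampled batch
    return policy


def online_finetune(policy, env, buffer, update_fn, num_trials=600):
    """Phase 2: alternate the forward task with a backward (reset) skill so the
    robot keeps practicing with few manual environment resets."""
    for trial in range(num_trials):
        task = "forward" if trial % 2 == 0 else "reset"  # backward skill undoes the task
        obs, done = env.observe(), False
        while not done:
            action = policy(obs, task)               # task-conditioned action
            obs, reward, done = env.step(action)     # sparse +1/0 reward from the detector
            buffer.add(obs, action, reward, done, task)
        update_fn(policy, buffer)                    # off-policy update after each trial
    return policy
```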
We collected data using scripted policies that pick up objects and place them into containers, as well as policies that open a drawer and place an object inside. The photo shows the diverse objects and containers we used to construct the container pick-and-place tasks in our experiments; a sketch of such a scripted routine follows the caption below.
Upper: containers and objects used in the offline data for pre-training.
Lower: test-time containers and objects used as part of new tasks for online fine-tuning.
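A scripted collection policy of this kind can be written as a short sequence of primitive motions. The `env.move_to` helper, the waypoint offsets, and the noise magnitude below are hypothetical and only illustrate the general structure of such a script, not the exact one used to collect our dataset.

```python
# Illustrative scripted pick-and-place routine for data collection.
# The env interface and waypoint values are assumptions, not the actual script.
import numpy as np


def scripted_pick_and_place(env, object_pos, container_pos, noise=0.01):
    """Hypothetical scripted routine: grasp an object and drop it into a container.
    Small waypoint noise makes the collected trajectories more varied."""
    object_pos = np.asarray(object_pos, dtype=float)
    container_pos = np.asarray(container_pos, dtype=float)

    def noisy(p):
        return p + np.random.uniform(-noise, noise, size=3)

    # (target position, gripper command) waypoints for a simple pick-and-place motion.
    waypoints = [
        (noisy(object_pos + [0, 0, 0.10]),    "open"),   # hover above the object
        (noisy(object_pos),                   "open"),   # descend to the object
        (noisy(object_pos),                   "close"),  # grasp
        (noisy(object_pos + [0, 0, 0.15]),    "close"),  # lift
        (noisy(container_pos + [0, 0, 0.15]), "close"),  # carry over the container
        (noisy(container_pos + [0, 0, 0.15]), "open"),   # release into the container
    ]

    trajectory = []
    for target, gripper in waypoints:
        obs, action = env.move_to(target, gripper)  # assumed helper on the env wrapper
        trajectory.append((obs, action))
    return trajectory
```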
Policy rollouts shown after 0, 100, and 600 trials of online fine-tuning.
Policies initialized with multi-task data (ARIEL) generalize zero-shot to objects not seen in the prior dataset, whereas policies trained only on single-task data do not.