End-to-End Robotic Reinforcement Learning without Reward Engineering

Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine

University of California, Berkeley


To appear in Robotics: Science and Systems, 2019

Motivation

Real-world tasks often involve high-dimensional observations, such as images.
Obtaining rewards from pixels is difficult and often requires task-specific engineering.
Our goal is to solve real-world robotics tasks from pixel-level observations in an end-to-end fashion:
- Without task-specific systems to compute rewards
- With minimal human intervention

Method

1. User provides examples of successful outcomes

The success examples are provided only as images: there are no demonstrations and no additional guidance on how to solve the task. Three of the 80 user-provided examples are shown below.

2. Learn a reward function on images using a success classifier

We use a convolutional neural network to learn a success classifier from image data.
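
As a rough illustration of what the success classifier learns, the sketch below fits p(success | image) with logistic regression on flattened, mean-centered pixels, using NumPy. This is a deliberately simplified stand-in for the convolutional network, not the method from the paper: the function names (`train_success_classifier`, `success_probability`) and the hyperparameters are hypothetical. User-provided success images are labeled 1 and non-success images are labeled 0.

```python
import numpy as np


def _flatten(images):
    # (N, H, W, C) -> (N, H*W*C) as float64
    return images.reshape(len(images), -1).astype(np.float64)


def train_success_classifier(positives, negatives, lr=0.1, steps=300):
    """Logistic-regression stand-in for a CNN success classifier.

    positives: images of successful outcomes (label 1).
    negatives: images that are not successes (label 0).
    Returns (w, b, mean); mean is the per-pixel training mean used for centering.
    """
    x = np.concatenate([_flatten(positives), _flatten(negatives)])
    y = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
    mean = x.mean(axis=0)
    x = x - mean  # centering keeps plain gradient descent well conditioned
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        err = p - y  # gradient of mean binary cross-entropy w.r.t. the logits
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b, mean


def success_probability(params, images):
    """Predicted probability that each image depicts a success."""
    w, b, mean = params
    logits = (_flatten(images) - mean) @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 8x8 RGB "images": bright = success, dark = not a success.
    pos = rng.uniform(0.6, 1.0, size=(20, 8, 8, 3))
    neg = rng.uniform(0.0, 0.4, size=(20, 8, 8, 3))
    params = train_success_classifier(pos, neg)
    print(success_probability(params, pos).min(), success_probability(params, neg).max())
```

A CNN would replace the linear map over raw pixels with learned convolutional features, while the training signal (binary cross-entropy on success vs. non-success images) would typically stay the same.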

3. Run RL with this reward

We use the log-probabilities produced by the success classifier as the reward for reinforcement learning. Negative examples for the classifier come from the data collected by the policy over the course of learning.
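
A minimal sketch of this reward, assuming the classifier outputs a logit for each observation: the reward is log p(success | observation) = log sigmoid(logit), computed in a numerically stable way. The batch helper mirrors how the classifier's training data is assembled, with user-provided success images labeled 1 and observations collected by the policy labeled 0. Sampling a balanced batch, and every name below, are our own assumptions rather than details from the page.

```python
import math
import random


def log_sigmoid(logit):
    """Numerically stable log(sigmoid(logit))."""
    if logit >= 0:
        return -math.log1p(math.exp(-logit))
    return logit - math.log1p(math.exp(logit))


def reward_from_logit(logit):
    """Reward = log p(success | observation), where p = sigmoid(logit)."""
    return log_sigmoid(logit)


def classifier_batch(success_examples, policy_observations, batch_size, rng):
    """Balanced batch (assumed): user-provided successes labeled 1,
    observations visited by the policy labeled 0."""
    half = batch_size // 2
    positives = rng.choices(success_examples, k=half)
    negatives = rng.choices(policy_observations, k=batch_size - half)
    return [(obs, 1) for obs in positives] + [(obs, 0) for obs in negatives]


if __name__ == "__main__":
    rng = random.Random(0)
    print(reward_from_logit(0.0))  # log(0.5) ≈ -0.693
    batch = classifier_batch(["success_img"], ["policy_img_1", "policy_img_2"], 4, rng)
    print(batch)
```

Because log-probabilities are never positive, the reward is at most 0 and approaches 0 as the classifier becomes confident that an observation is a success.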

4. Actively query the human user

Here, we show example queries made by the algorithm and the corresponding labels provided by a human user. These labeled images are fed back into the classifier as additional training data. Three of the 75 queried examples are shown below.
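
The page does not say how the queried images are chosen. The sketch below assumes one plausible rule: query the observations the classifier currently rates most likely to be successes, since a false positive there is the kind of error a policy could exploit. `select_queries`, `incorporate_labels`, and the `ask_user` callback (returning True for a success) are hypothetical names.

```python
def select_queries(observations, success_probs, num_queries):
    """Return the num_queries observations with the highest predicted
    success probability (hypothetical selection rule)."""
    ranked = sorted(
        range(len(observations)), key=lambda i: success_probs[i], reverse=True
    )
    return [observations[i] for i in ranked[:num_queries]]


def incorporate_labels(queries, ask_user, positives, negatives):
    """Ask the user about each query and add it to the matching training set.

    ask_user(obs) -> bool is a hypothetical callback: True means success.
    """
    for obs in queries:
        if ask_user(obs):
            positives.append(obs)
        else:
            negatives.append(obs)


if __name__ == "__main__":
    obs = ["frame_a", "frame_b", "frame_c", "frame_d"]
    probs = [0.1, 0.9, 0.4, 0.7]
    queries = select_queries(obs, probs, 2)  # ["frame_b", "frame_d"]
    positives, negatives = [], []
    incorporate_labels(queries, lambda o: o == "frame_b", positives, negatives)
    print(positives, negatives)  # ['frame_b'] ['frame_d']
```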

Real-World Experiments

We evaluated our method on three complex vision-based tasks: pushing a mug onto a coaster, draping a cloth over a box, and inserting a book into a shelf between other books.

Visual Bookshelf

The goal is to insert a book into one of several empty slots in the bookshelf.

Visual Draping

The goal is to drape a cloth over an object.

Visual Pusher

The goal is to push a mug onto a coaster. The initial position of the mug is randomized.

Simulated Experiments

Visual Door Opening

The goal is to open a cabinet door by 45 degrees. Initially, the door is either fully closed (with probability 0.5) or open by up to 15 degrees.
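
The initial-state distribution for this task can be written as a small sampler. Only the 0.5 closed probability, the 15-degree bound, and the 45-degree goal come from the description; drawing the open angle uniformly and treating any angle of at least 45 degrees as success are assumptions, and the names are hypothetical.

```python
import random

CLOSED_PROB = 0.5            # from the task description
MAX_INITIAL_OPEN_DEG = 15.0  # from the task description
GOAL_DEG = 45.0              # from the task description


def sample_initial_door_angle(rng):
    """Initial door angle in degrees: fully closed with probability 0.5,
    otherwise open by up to 15 degrees (uniform draw is an assumption)."""
    if rng.random() < CLOSED_PROB:
        return 0.0
    return rng.uniform(0.0, MAX_INITIAL_OPEN_DEG)


def is_success(angle_deg):
    # Assumed success criterion: door opened to at least the goal angle.
    return angle_deg >= GOAL_DEG


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_initial_door_angle(rng) for _ in range(1000)]
    print(sum(a == 0.0 for a in samples) / len(samples))
```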

Visual Picker

The goal is to pick up a tennis ball from a table and hold it at a particular spot 20 cm above the table. The initial position of the tennis ball on the table is randomized.

Visual Pusher

The goal is to push a mug onto a coaster, with a randomized initial position of the mug.
