End-to-End Robotic Reinforcement Learning without Reward Engineering
Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, Sergey Levine
University of California, Berkeley
paper | github | blog post
To appear in Robotics: Science and Systems, 2019
Motivation
- Real-world tasks often involve high-dimensional observations, such as images
- Obtaining rewards from pixels is difficult, and often requires task-specific engineering
- Our goal is to solve real-world robotic tasks from pixel-level observations in an end-to-end fashion:
  - Without task-specific systems to compute rewards
  - With minimal human intervention
Method
1. User provides examples of successful outcomes
These success examples are provided as images, without any demonstrations, and with no additional guidance on how to solve the task. Three of the 80 user-provided examples are shown below.
2. Learn a reward function on images using a success classifier
We train a convolutional neural network as a success classifier on image observations.
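The paper's classifier is a deep CNN trained on full camera images; as a rough illustration of the idea only (not the authors' architecture), the sketch below builds a toy single-filter classifier in numpy: one convolution, a ReLU, global average pooling, and a sigmoid head that outputs p(success | image). All function names and shapes here are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def success_probability(image, kernel, w, b):
    """Conv -> ReLU -> global average pool -> linear -> sigmoid."""
    features = np.maximum(conv2d(image, kernel), 0.0)  # conv + ReLU
    pooled = features.mean()                           # global average pooling
    logit = w * pooled + b                             # scalar linear head
    return 1.0 / (1.0 + np.exp(-logit))                # p(success | image)
```

A real implementation would stack several convolutional layers and train the weights with binary cross-entropy; this toy version only shows how an image is mapped to a success probability.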
3. Run RL with this reward
We use the log-probabilities obtained from the success classifier as the reward for reinforcement learning. Negative examples for the classifier come from the data collected by the policy over the course of training.
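The two ingredients of this step can be sketched as follows: the RL reward is the classifier's log-probability of success, and the classifier's training batch combines the user's success images (positives) with images the policy collected during training (treated as negatives). This is a minimal illustration under those assumptions, not the authors' actual training code.

```python
import numpy as np

def classifier_reward(prob_success, eps=1e-8):
    """RL reward = log p(success | image) under the learned classifier.
    eps avoids log(0) when the classifier is fully confident of failure."""
    return np.log(prob_success + eps)

def make_classifier_batch(user_success_images, policy_images):
    """Positives: user-provided success examples (label 1).
    Negatives: images collected by the policy during training (label 0),
    assumed to be mostly unsuccessful states."""
    xs = np.concatenate([user_success_images, policy_images], axis=0)
    ys = np.concatenate([np.ones(len(user_success_images)),
                         np.zeros(len(policy_images))])
    return xs, ys
```

Because the reward is a log-probability, states the classifier rates as more likely successful receive strictly higher reward, which is what drives the policy toward the user's success examples.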
4. Actively query the human user
Here, we show some example queries made by the algorithm, along with the corresponding labels provided by a human user. This labeled data is fed back into the classifier. Three of the 75 queried examples are shown below.
Real-World Experiments
We evaluated our method on three complex vision-based tasks: pushing a mug onto a coaster, draping a cloth over a box, and inserting a book onto a shelf between other books.
Visual Bookshelf
The goal is to insert a book into one of several empty slots on the bookshelf.
Visual Draping
The goal is to drape a cloth over an object.
Visual Pusher
The goal is to push a mug onto a coaster. The initial position of the mug is randomized.
Simulated Experiments
Visual Door Opening
The goal is to open a cabinet door by 45 degrees. At the start of each episode, the door is either completely closed (with probability 0.5) or open by up to 15 degrees.
Visual Picker
The goal is to pick up a tennis ball from a table and hold it at a particular spot 20 cm above the table. The initial position of the tennis ball on the table is randomized.
Visual Pusher
The goal is to push a mug onto a coaster, with a randomized initial position of the mug.