These success examples are provided as images, without any demonstrations or additional guidance on solving the task. Three of the 80 user-provided examples are shown below.
We use a convolutional neural network for learning a success classifier on image data.
We use log-probabilities obtained from the success classifier as the reward for reinforcement learning. We obtain negative examples for the classifier from the data collected by the policy over the course of learning.
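As a rough illustration of the two sentences above, the sketch below turns a classifier output into a reward and assembles a labeled training batch. It assumes the classifier emits a single scalar logit for "success"; the names `log_sigmoid`, `classifier_reward`, and `build_classifier_batch` are hypothetical and not taken from the original implementation.

```python
import math


def log_sigmoid(logit: float) -> float:
    """Numerically stable log(sigmoid(logit))."""
    if logit >= 0:
        return -math.log1p(math.exp(-logit))
    return logit - math.log1p(math.exp(logit))


def classifier_reward(success_logit: float) -> float:
    """Reward = log-probability of success under the classifier.

    Assumption: the classifier outputs one scalar logit per image.
    """
    return log_sigmoid(success_logit)


def build_classifier_batch(success_examples, policy_states):
    """Label user-provided images 1 and policy-collected states 0."""
    examples = list(success_examples) + list(policy_states)
    labels = [1] * len(success_examples) + [0] * len(policy_states)
    return examples, labels
```

Under this assumption the reward is always non-positive and approaches zero as the classifier becomes confident that the image shows success.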
Here, we show some example queries made by the algorithm, along with the corresponding labels provided by a human user. These labeled examples are fed back into the classifier. Three of the 75 queried examples are shown below.
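The query-and-label loop described above could look roughly like the following sketch. The selection rule here (query the recent state that the classifier scores highest) is an assumption made for illustration, since the text above does not say how queries are chosen; `query_and_update` and `ask_user` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def query_and_update(
    recent_states: Sequence[str],
    success_probs: Sequence[float],
    ask_user: Callable[[str], bool],
    positives: List[str],
    negatives: List[str],
) -> Tuple[str, bool]:
    """Ask the user to label one state and add it to the classifier data.

    Assumption: the queried state is the one with the highest predicted
    success probability among the recently visited states.
    """
    if not recent_states:
        raise ValueError("no states to query")
    if len(recent_states) != len(success_probs):
        raise ValueError("need one probability per state")

    best = max(range(len(recent_states)), key=lambda i: success_probs[i])
    state = recent_states[best]
    is_success = ask_user(state)  # human-provided binary label

    # Feed the labeled example back into the classifier's training pools.
    (positives if is_success else negatives).append(state)
    return state, is_success
```

States are represented as strings only to keep the sketch self-contained; in practice they would be camera images.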
We evaluated our method on three complex vision-based tasks: pushing a mug onto a coaster, draping a cloth over a box, and inserting a book onto a shelf between other books.
The goal is to insert a book in one of the multiple empty slots in the bookshelf.
The goal is to drape a cloth over an object.
The goal is to push a mug onto a coaster. The initial position of the mug is randomized.
The goal is to open a cabinet door by 45 degrees. Initially, the door is completely closed with probability 0.5, and otherwise open by up to 15 degrees.
The goal is to pick up a tennis ball from a table and hold it at a particular spot 20 cm above the table. The initial position of the tennis ball on the table is randomized.
The goal is to push a mug onto a coaster, with a randomized initial position of the mug.