University of California, Berkeley
Paper: https://arxiv.org/abs/1810.00482
Code: https://github.com/anxie/meta_classifier
Abstract: Reinforcement learning and planning methods require an objective or reward function that encodes the desired behavior. Yet, in practice, there is a wide range of scenarios where an objective is difficult to provide programmatically, such as manipulation tasks with visual observations involving unknown object positions or deformable objects. In these cases, prior methods use engineered problem-specific solutions, e.g., by instrumenting the environment with additional sensors to measure a proxy for the objective. Such solutions require a significant engineering effort on a per-task basis, and make it impractical for robots to continuously learn complex skills outside of laboratory settings. We aim to find a more general and scalable solution for specifying goals for robot learning in unconstrained environments. To that end, we formulate the few-shot objective learning problem, where the goal is to learn a task objective from only a few example images of successful end states for that task. We propose a simple solution to this problem: meta-learn a classifier that can recognize new goals from a few examples. We show how this approach can be used with both model-free reinforcement learning and visual model-based planning, for manipulating ropes from images in simulation and moving objects into user-specified configurations on a real robot.
Our method succeeds in the tasks below, whereas the other methods naively try to match the provided examples of success. In the comparisons for the first task, the arm pushes the blue bowl to its absolute position in the example on the left but fails to complete the task. In the second task, the DSAE distance method moves the purple notebook to its position in the example on the left, while the pixel distance method tries to match the arm's location in the example and inadvertently succeeds as a result.
[Videos, first task: 5 Examples of Success | FLO (Our Method) | DSAE Distance | Pixel Distance]
[Videos, second task: 5 Examples of Success | FLO (Our Method) | DSAE Distance | Pixel Distance]
Our method also enables the planner to complete tasks in the presence of distractors. As in the setting above, the other methods tend to naively match the nearest provided example. The planner using DSAE distance ignores the objects entirely and tries to match the arm's position in the examples on the left, while the planner using the pixel distance metric struggles to find good actions.
[Videos, two tasks with distractors, each showing: 5 Examples of Success | FLO (Our Method) | DSAE Distance | Pixel Distance]
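As a rough illustration of why a pixel distance objective tends to match the nearest provided example, the sketch below scores an image by its mean squared pixel error to the closest success example. The function name `pixel_distance_cost`, the toy `make_frame` helper, and the 5x5 single-object frames are all hypothetical and are not taken from the released code; the baseline used in the experiments may compute its distance differently.

```python
import numpy as np


def pixel_distance_cost(image, success_examples):
    """Mean squared pixel error to the closest success example (lower is better).

    Hypothetical sketch of a nearest-example pixel objective, not the
    authors' implementation.
    """
    image = np.asarray(image, dtype=np.float64)
    examples = np.asarray(success_examples, dtype=np.float64)
    if examples.shape[1:] != image.shape:
        raise ValueError("image and examples must share the same shape")
    # Broadcast the image against every example and average the squared error.
    per_example = ((examples - image) ** 2).reshape(len(examples), -1).mean(axis=1)
    # Only the nearest example determines the cost.
    return float(per_example.min())


def make_frame(object_col, size=5):
    """Toy grayscale frame with a single bright 'object' pixel in row 2."""
    frame = np.zeros((size, size))
    frame[2, object_col] = 1.0
    return frame


# Two toy success examples with the object at absolute columns 0 and 1.
examples = np.stack([make_frame(c) for c in (0, 1)])

matching = make_frame(1)  # reproduces an example pixel for pixel
shifted = make_frame(4)   # same object, different absolute position

cost_matching = pixel_distance_cost(matching, examples)
cost_shifted = pixel_distance_cost(shifted, examples)
print(cost_matching, cost_shifted)
```

In this toy setup, a frame that reproduces an example exactly receives zero cost, while a frame with the object elsewhere is penalized whether or not the underlying task is solved. This is the kind of absolute-position matching described in the comparisons above.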
[Videos, Task 1: 5 Examples of Success | FLO (Our Method)]
[Videos, Task 2: 5 Examples of Success | FLO (Our Method)]
[Videos, four further video pairs, each showing: 5 Examples of Success | FLO (Our Method)]