Few-Shot Goal Inference for Visuomotor Learning and Planning

Annie Xie, Avi Singh, Sergey Levine, Chelsea Finn

University of California, Berkeley

Paper: https://arxiv.org/abs/1810.00482

Code: https://github.com/anxie/meta_classifier

Abstract: Reinforcement learning and planning methods require an objective or reward function that encodes the desired behavior. Yet, in practice, there is a wide range of scenarios where an objective is difficult to provide programmatically, such as manipulation tasks with visual observations involving unknown object positions or deformable objects. In these cases, prior methods use engineered problem-specific solutions, e.g., by instrumenting the environment with additional sensors to measure a proxy for the objective. Such solutions require significant engineering effort on a per-task basis, and make it impractical for robots to continuously learn complex skills outside of laboratory settings. We aim to find a more general and scalable solution for specifying goals for robot learning in unconstrained environments. To that end, we formulate the few-shot objective learning problem, where the goal is to learn a task objective from only a few example images of successful end states for that task. We propose a simple solution to this problem: meta-learn a classifier that can recognize new goals from a few examples. We show how this approach can be used with both model-free reinforcement learning and visual model-based planning, for manipulating ropes from images in simulation and moving objects into user-specified configurations on a real robot.
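The core idea can be sketched in a few lines: a classifier, conditioned on a few example images of success, scores new observations, and that score serves as the reward for RL or planning. The toy NumPy snippet below is a minimal illustration only; it uses a hand-written prototype comparison in place of the meta-learned CNN classifier in the actual FLO implementation (see the linked repository for the real code).

```python
import numpy as np

def embed(image):
    # Placeholder feature extractor; the real method uses a CNN whose
    # weights are meta-learned across many training tasks.
    return image.reshape(-1)

def few_shot_reward(success_examples, observation):
    """Score an observation against a few example images of success.

    A stand-in for the meta-learned classifier: here we simply compare
    the observation to the mean embedding of the success examples and
    squash the distance into (0, 1].
    """
    prototype = np.mean([embed(x) for x in success_examples], axis=0)
    dist = np.linalg.norm(embed(observation) - prototype)
    return 1.0 / (1.0 + dist)  # higher means "more successful"

# Toy usage: 5 example "success" images near a target configuration.
rng = np.random.default_rng(0)
target = np.ones((4, 4))
examples = [target + 0.01 * rng.standard_normal((4, 4)) for _ in range(5)]

good = few_shot_reward(examples, target)           # observation at the goal
bad = few_shot_reward(examples, np.zeros((4, 4)))  # observation far from it
assert good > bad
```

The resulting `few_shot_reward` plays the role of the reward function that a model-free RL algorithm or a visual model-based planner would otherwise need to be given by hand.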

Supplementary Video Results

Pushing, without distractors

Goal: Rearrange the pair of objects to match the relative positioning shown in the examples of success.

Our method succeeds in the tasks below, while the other methods naively try to match the provided examples of success. In the comparisons for the first task below, the arm pushes the blue bowl to its absolute position in the example on the left, but fails to complete the task. In the second task, the DSAE distance method moves the purple notebook to its position in the example on the left, and the pixel distance method tries to match the arm's location in the example and inadvertently succeeds as a result.
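The failure mode of the distance-based baselines is easy to see from how such a cost is computed. The sketch below is an illustrative raw-pixel baseline (the paper's exact baseline formulation may differ): because the cost operates on raw pixels, it rewards matching an example's absolute arm and object positions rather than the relative configuration that actually defines the goal.

```python
import numpy as np

def pixel_distance_cost(examples, observation):
    # Cost = mean squared pixel difference to the nearest success example.
    # Illustrative only: any visual difference is penalized, including
    # arm position and absolute object location.
    return min(float(np.mean((observation - e) ** 2)) for e in examples)

# An observation that matches an example pixel-for-pixel scores better
# than one achieving the same goal in a different absolute configuration.
example = np.zeros((8, 8)); example[2, 2] = 1.0          # object at one spot
obs_same_spot = example.copy()                           # exact pixel match
obs_shifted = np.zeros((8, 8)); obs_shifted[5, 5] = 1.0  # same goal, moved
assert pixel_distance_cost([example], obs_same_spot) < \
       pixel_distance_cost([example], obs_shifted)
```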

5 Examples of Success

FLO (Our Method)

DSAE Distance

Pixel Distance

5 Examples of Success

FLO (Our Method)

DSAE Distance

Pixel Distance

Pushing, with distractors

Goal: Rearrange the pair of objects to match the relative positioning shown in the examples of success while ignoring the distractor.

Our method also enables the planner to complete tasks in the presence of distractors. As in the setting above, the other methods tend to naively match the nearest provided example: the planner using DSAE distance ignores the objects entirely and tries to match the arm's position in the examples on the left, while the planner struggles to find good actions with the pixel distance metric.

5 Examples of Success

FLO (Our Method)

DSAE Distance

Pixel Distance

5 Examples of Success

FLO (Our Method)

DSAE Distance

Pixel Distance

Cascade of Classifiers

Goal: Perform two rearrangement tasks in succession.
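One way to sequence tasks is to give the planner one classifier per task and advance to the next classifier once the current one reports success. The control flow below is an illustration of that idea, not the authors' exact mechanism; the threshold value and the stand-in classifiers are assumptions.

```python
def cascade_rewards(classifiers, threshold=0.9):
    """Chain per-task reward functions into one.

    `classifiers` is a list of callables mapping an observation to a
    success score in [0, 1]. Once the active task's score crosses the
    threshold, the cascade advances to the next task's classifier.
    """
    stage = 0
    def reward(observation):
        nonlocal stage
        score = classifiers[stage](observation)
        if score >= threshold and stage < len(classifiers) - 1:
            stage += 1
        return score
    return reward

# Toy usage: two "tasks" whose classifiers fire on different observations.
r = cascade_rewards([lambda obs: float(obs == "a"),
                     lambda obs: float(obs == "b")])
assert r("a") == 1.0  # task 1 solved -> cascade advances to task 2
assert r("a") == 0.0  # task 2's classifier now scores observations
assert r("b") == 1.0  # task 2 solved
```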

Task 1: 5 Examples of Success

FLO (Our Method)

Task 2: 5 Examples of Success

FLO (Our Method)

Simulated Rope Manipulation

Goal: Manipulate the rope to match the shape shown in the examples of success.

5 Examples of Success

FLO (Our Method)

5 Examples of Success

FLO (Our Method)

Simulated Visual Navigation

Goal: Navigate to the target object shown in the examples of success.

5 Examples of Success

FLO (Our Method)

5 Examples of Success

FLO (Our Method)