Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning
Spotlight Paper in Workshop on Lifelong Learning for Home Robots @ CoRL 2024
University of Pennsylvania
Abstract
We propose an object-centric recovery (OCR) policy framework to address out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on broad labeled data coverage and fail in unfamiliar spatial states. Without collecting any extra data, our approach learns a recovery policy built from an inverse policy that follows the gradient of the object keypoint manifold estimated from the original training data. The recovery policy serves as a simple, method-agnostic add-on to any base visuomotor BC policy, guiding the system back toward the training distribution to ensure task success even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real-robot experiments, achieving a 77.7% improvement over the base policy in OOD scenarios.
The Problem
Base Policy
In-Distribution
Most visuomotor policies handle in-distribution data very well. All the demonstration data are on the left side of the white line; the bottle never appears on the right side.
Base Policy
Out-of-Distribution :(
However, the base policy fails to execute the task when the bottle is placed on the right side (OOD). What if we could always bring the task-relevant object back into the training distribution without collecting any new data?
The OCR Framework
The OCR Framework augments a base policy, trained via BC, by returning task-relevant objects to their training manifold, where the base policy takes over. First, we model the distribution of object keypoints in the training data using a Gaussian Mixture Model (GMM). At test time, we compute the gradient of the GMM to derive object-recovery vectors, which are used to plan a recovery trajectory. This trajectory is then converted into robot actions through a Keypoint Inverse Policy, trained solely on the base dataset. Finally, the base policy and the recovery policy are combined into a joint policy, allowing seamless interaction between recovery and task execution.
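To make the recovery-vector step concrete, here is a minimal Python sketch (not the released implementation): it fits a scikit-learn GaussianMixture to object keypoints from the training demonstrations and ascends the gradient of the GMM log-density until the keypoints return to a high-density region. The function names, the finite-difference gradient, the step size, and the density threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_keypoint_gmm(train_keypoints, n_components=8):
    """Fit a GMM to (N, D) flattened object keypoints from the training demos."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(train_keypoints)
    return gmm


def recovery_vector(gmm, keypoints, eps=1e-4):
    """Finite-difference gradient of the GMM log-density w.r.t. a (D,) keypoint vector.
    Moving along this vector pushes the object toward higher training density."""
    keypoints = np.asarray(keypoints, dtype=float)
    grad = np.zeros_like(keypoints)
    for i in range(keypoints.size):
        delta = np.zeros_like(keypoints)
        delta[i] = eps
        hi = gmm.score_samples((keypoints + delta)[None])[0]
        lo = gmm.score_samples((keypoints - delta)[None])[0]
        grad[i] = (hi - lo) / (2 * eps)
    return grad


def plan_recovery_trajectory(gmm, keypoints, step=0.01, n_steps=50, density_thresh=-5.0):
    """Iteratively follow recovery vectors until the keypoints are back in-distribution."""
    traj = [np.asarray(keypoints, dtype=float)]
    for _ in range(n_steps):
        if gmm.score_samples(traj[-1][None])[0] > density_thresh:
            break  # keypoints are back on the training manifold; hand over to the base policy
        g = recovery_vector(gmm, traj[-1])
        traj.append(traj[-1] + step * g / (np.linalg.norm(g) + 1e-8))
    return np.stack(traj)
```

In the framework described above, the resulting keypoint trajectory would then be converted into robot actions by the Keypoint Inverse Policy, with execution handed back to the base policy once the object is in-distribution.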
Experiments
Push-T
Base Policy
In-Distribution
All the demonstration data are on the left side of the space. The T shape never appears on the right side.
Base Policy
Out-of-Distribution
The base policy fails to execute the task when the T shape is placed on the right side.
Base + Recovery
Out-of-Distribution
(Ours)
Square
Base Policy
In-Distribution
All the demonstration data are on the right side of the space. The square never appears on the left side.
Base Policy
Out-of-Distribution
The base policy fails to execute the task when the square is placed on the left side.
Base + Recovery
Out-of-Distribution
(Ours)
Real Robot
Base Policy
In-Distribution
All the demonstration data are on the left side of the white line. The bottle never appears on the right side.
Base Policy
Out-of-Distribution
The base policy fails to execute the task when the bottle is placed on the right side.
Base + Recovery
Out-of-Distribution
(Ours)