Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning

Spotlight Paper in Workshop on Lifelong Learning for Home Robots @ CoRL 2024

University of Pennsylvania


Abstract

We propose an object-centric recovery policy framework to address the challenge of out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on broad data coverage and fail in unfamiliar spatial states. Without collecting any extra data, our approach learns a recovery policy built around an inverse policy that follows the gradient of the object-keypoint manifold estimated from the original training data. The recovery policy serves as a simple add-on to any base visuomotor BC policy, agnostic to the specific method, guiding the system back toward the training distribution so the task succeeds even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real-robot experiments, achieving a 77.7% improvement over the base policy in OOD scenarios.

The Problem

Base Policy: In-Distribution

Most visuomotor policies handle in-distribution data very well. All demonstration data lie on the left side of the white line; the bottle never appears on the right side.

Base Policy: Out-of-Distribution :(

However, the base policy fails to execute the task when the bottle is placed on the right side (OOD). What if we could always bring the task-relevant object back into the training distribution without collecting any new data?

The OCR Framework

The Object-Centric Recovery (OCR) framework augments a base policy, trained via BC, by returning task-relevant objects to their training manifold, where the base policy takes over. First, we model the distribution of object keypoints in the training data with a Gaussian Mixture Model (GMM). At test time, we compute the gradient of the GMM to derive object-recovery vectors, which are used to plan a recovery trajectory. This trajectory is then converted into robot actions by a Keypoint Inverse Policy trained solely on the base dataset. Finally, the base policy and the recovery policy are combined into a joint policy, allowing seamless interaction between recovery and task execution.
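As a rough illustration of the recovery-vector idea, here is a minimal sketch (not the authors' code) of modeling keypoints with a GMM and following the gradient of its log-density back toward the training manifold. It assumes keypoints are flat vectors (e.g., 2D positions), uses scikit-learn's GaussianMixture, and the function names, the step size, and the density threshold are all hypothetical choices for illustration; converting the resulting waypoints to robot actions with the Keypoint Inverse Policy is not shown.

```python
# Minimal sketch of GMM-gradient-based object recovery (illustrative only).
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_keypoint_gmm(train_keypoints: np.ndarray, n_components: int = 5) -> GaussianMixture:
    """Model the training-data distribution of object keypoints with a GMM.

    train_keypoints: (N, D) array of keypoint positions from the demonstrations.
    """
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(train_keypoints)
    return gmm


def recovery_vector(gmm: GaussianMixture, x: np.ndarray) -> np.ndarray:
    """Gradient of the GMM log-density at keypoint x, i.e. the direction that
    moves the object keypoint back toward high-density (in-distribution) regions."""
    gamma = gmm.predict_proba(x[None, :])[0]            # responsibilities gamma_k(x), shape (K,)
    grad = np.zeros_like(x, dtype=float)
    for k in range(gmm.n_components):
        prec_k = np.linalg.inv(gmm.covariances_[k])      # Sigma_k^{-1}
        grad += gamma[k] * prec_k @ (gmm.means_[k] - x)  # sum_k gamma_k * Sigma_k^{-1} (mu_k - x)
    return grad


def plan_recovery_trajectory(gmm, x0, step_size=0.01, n_steps=200, density_thresh=-5.0):
    """Follow the log-density gradient from an OOD keypoint x0 until the keypoint
    is back in a high-density region (hypothetical threshold), then stop."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        if gmm.score_samples(x[None, :])[0] > density_thresh:
            break                                        # back in distribution: hand off to the base policy
        x = x + step_size * recovery_vector(gmm, x)
        traj.append(x.copy())
    return np.stack(traj)                                # keypoint waypoints for the inverse policy
```

In this sketch, the same density check that terminates the recovery rollout would also serve as the switch in the joint policy: while the keypoint density is low, the recovery policy acts; once the object is back on the training manifold, the base policy resumes task execution.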

Experiments

Push-T

Base Policy: In-Distribution

All demonstration data lie on the left side of the workspace; the T shape never appears on the right side.

Base Policy: Out-of-Distribution

The base policy fails to execute the task when the T shape is placed on the right side.

Base + Recovery (Ours): Out-of-Distribution

Square

Base Policy: In-Distribution

All demonstration data lie on the right side of the workspace; the square never appears on the left side.

Base Policy: Out-of-Distribution

The base policy fails to execute the task when the square is placed on the left side.

Base + Recovery (Ours): Out-of-Distribution

Real Robot

Base Policy: In-Distribution

All demonstration data lie on the left side of the white line; the bottle never appears on the right side.

Base Policy: Out-of-Distribution

The base policy fails to execute the task when the bottle is placed on the right side.

Base + Recovery (Ours): Out-of-Distribution