Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks


The use of broad datasets has proven crucial for generalization in a wide range of fields. However, how to effectively use diverse multi-task data for novel downstream tasks remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, combined with online fine-tuning guided by subgoals in a learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decompose the original task into easier problems. Learned from broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant context that hinders generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experience from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs and without any manual reward engineering.
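The planning step described above can be sketched in a few lines. The following is a minimal illustration, not the actual FLAP implementation: the encoder, affordance model, and greedy planner below are random stand-ins whose names (`encode`, `affordance_samples`, `plan_subgoals`) are our own, chosen only to show how an affordance model can chain lossy latents into a subgoal sequence toward a goal latent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned lossy encoder phi(s) -> z:
# a random linear projection from a 32-dim state to an 8-dim latent.
W_enc = rng.normal(size=(8, 32))

def encode(state):
    """Lossy representation phi(s): compresses the state to a latent."""
    return W_enc @ state

def affordance_samples(z, n=64):
    """Sample candidate subgoal latents reachable from z (placeholder:
    a real affordance model would be learned from the broad data)."""
    return z + 0.5 * rng.normal(size=(n, z.shape[0]))

def plan_subgoals(z_start, z_goal, horizon=4):
    """Greedy latent-space planner: at each step, keep the reachable
    candidate closest to the goal latent, yielding a subgoal sequence."""
    subgoals, z = [], z_start
    for _ in range(horizon):
        cands = affordance_samples(z)
        z = cands[np.argmin(np.linalg.norm(cands - z_goal, axis=1))]
        subgoals.append(z)
    return subgoals

z0 = encode(rng.normal(size=32))   # latent of the current observation
zg = encode(rng.normal(size=32))   # latent of the novel task goal
plan = plan_subgoals(z0, zg)
# Distances from each planned subgoal to the goal latent; the greedy
# plan should end closer to the goal than it started.
dists = [np.linalg.norm(z - zg) for z in [z0] + plan]
```

In the actual method the policy is then conditioned on each latent subgoal in turn, and distances in this space can shape rewards during fine-tuning; the sketch only conveys the decomposition idea.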

Real-World Experiments

Three multi-stage target tasks are evaluated in the real world. The objects in these target tasks are unseen in the prior data collected in the target domain.

Analysis of Learned Lossy Representations

We visualize the learned lossy representations using t-SNE. The subgoals planned by FLAP for Task B are projected onto the plot.

Below are 9 demo trajectories, each projected as a curve on the plot. Semantically similar trajectories lie close to each other in the learned lossy representation space.
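A visualization like this can be reproduced with off-the-shelf tools. The sketch below uses synthetic stand-in latents (the real ones come from the learned encoder) and scikit-learn's t-SNE; the cluster structure and dimensions are assumptions for illustration only.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in data: 9 trajectories of 20 steps each in a hypothetical
# 8-dim lossy latent space. Trajectories sharing a "task" drift along
# a common direction, mimicking semantically similar behavior.
n_traj, n_steps, dim = 9, 20, 8
directions = rng.normal(size=(3, dim))            # 3 task clusters
latents = np.stack([
    np.cumsum(0.1 * rng.normal(size=(n_steps, dim)), axis=0)
    + np.linspace(0, 1, n_steps)[:, None] * directions[i % 3]
    for i in range(n_traj)
])                                                # shape (9, 20, 8)

# Embed all latent states jointly with t-SNE, then split the 2-D
# embedding back into per-trajectory curves for plotting.
flat = latents.reshape(-1, dim)
emb = TSNE(n_components=2, perplexity=15, init="pca",
           random_state=0).fit_transform(flat)
curves = emb.reshape(n_traj, n_steps, 2)
# Each row of `curves` is one trajectory's 2-D curve; plot with e.g.
# matplotlib: for c in curves: plt.plot(c[:, 0], c[:, 1])
```

Embedding all states jointly (rather than per trajectory) is what lets nearby curves in the plot indicate nearby trajectories in the latent space.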

Simulated Experiments

Three multi-stage tasks are evaluated in simulation. In each target task, the robot must strategically interact with the environment.