Olivia Y. Lee¹, Annie Xie¹, Kuan Fang², Karl Pertsch¹˒³, and Chelsea Finn¹
¹Stanford University, ²Cornell University, ³UC Berkeley
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Robotics: Science & Systems 2024 Workshops: Task Specification & Lifelong Robot Learning
We leverage the ability of modern VLMs to reason about affordances zero-shot to define dense shaping rewards that improve online reinforcement learning.
We present 🔑 KAGI: Keypoint-based Affordance Guidance for Improvements.
KAGI consists of two main components:
(1) We use a VLM to select from a set of candidate affordance keypoints and then generate a waypoint sequence forming a trajectory towards the goal.
(2) For each frame in the episode replay buffer, we compute a per-timestep reward: a dense reward with respect to the waypoint sequence and a sparse reward derived from a success classifier. During online RL, the dense reward is used whenever the sparse reward is 0; otherwise the sparse reward is used (a minimal sketch of this computation follows below).
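To make the reward switching in (2) concrete, below is a minimal Python sketch. It is illustrative only: the function name compute_reward, the tracked keypoint_xy input, and the exponential distance shaping are assumptions of this sketch rather than the released implementation; only the dense-vs-sparse switching rule comes from the description above.

import numpy as np

def compute_reward(keypoint_xy, waypoints, success_prob,
                   success_threshold=0.5, temperature=10.0):
    """Per-timestep reward for one frame in the episode replay buffer.

    keypoint_xy  : (2,) tracked position of the task-relevant keypoint.
    waypoints    : (N, 2) VLM-generated waypoint sequence toward the goal.
    success_prob : score from the sparse success classifier for this frame.
    """
    # Sparse reward: 1 if the success classifier fires, else 0.
    sparse = float(success_prob > success_threshold)
    if sparse > 0.0:
        # Once the task is classified as complete, use the sparse reward.
        return sparse

    # Otherwise use the dense shaping reward: distance from the keypoint to
    # the nearest waypoint, squashed into (0, 1]. The exponential form is one
    # plausible choice, assumed for this sketch.
    dists = np.linalg.norm(waypoints - keypoint_xy, axis=-1)
    return float(np.exp(-temperature * dists.min()))

In this form, rewards lie in [0, 1], and the dense term only influences learning before the success classifier triggers, matching the switching rule above.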
On four real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion within 30K online fine-tuning steps.
We compare our approach to MOKA, which executes actions planned zero-shot by a VLM. This comparison shows that online fine-tuning is advantageous: KAGI learns policies that are robust to environment perturbations.
(Figure: success rates on Cloth Covering / Almond Sweeping / Spatula Pick-Place / Cube Stacking for the three compared settings — 90% / 100% / 70% / 60%, 50% / 65% / 30% / 15%, and 80% / 80% / 65% / 45%.)
Furthermore, we reduce the number of in-domain demonstrations used in the pipeline by 5x. Even with fewer demonstrations, KAGI can recover comparable performance with more fine-tuning. KAGI's dense rewards improve the system's robustness to reduced demonstrations, facilitating scalability to new tasks.
(Figure: success rates with 5x fewer demonstrations on Cloth Covering / Almond Sweeping / Spatula Pick-Place / Cube Stacking for the three compared settings — 45% / 40% / 10% / 5%, 50% / 55% / 30% / 15%, and 75% / 80% / 60% / 40%.)
In simulation experiments, we verify that both dense and sparse rewards are critical for task success, and that KAGI is robust to reductions in the quantity of task demonstrations.
@article{lee2025affordanceguidedrl,
author = {Olivia Y. Lee and Annie Xie and Kuan Fang and Karl Pertsch and Chelsea Finn},
title = {Affordance-Guided Reinforcement Learning via Visual Prompting},
journal = {arXiv preprint arXiv:2407.10341},
year = {2024}
}