Olivia Y. Lee¹, Annie Xie¹, Kuan Fang², Karl Pertsch¹˒³, and Chelsea Finn¹
¹Stanford University, ²Cornell University, ³UC Berkeley
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Robotics: Science & Systems 2024 Workshops: Task Specification & Lifelong Robot Learning
We leverage the ability of modern VLMs to reason about affordances zero-shot to define dense shaping rewards that improve online reinforcement learning.
We present 🔑 KAGI: Keypoint-based Affordance Guidance for Improvements.
KAGI consists of two main components:
(1) We use a VLM to select from a set of candidate affordance keypoints and then generate a waypoint sequence forming a trajectory towards the goal.
(2) For each frame in the episode replay buffer, we compute a per-timestep reward: a dense reward with respect to the waypoint sequence and a sparse reward derived from a success classifier. During online RL, the dense reward is used whenever the sparse reward is 0; otherwise the sparse reward is used (a minimal sketch of this computation follows below).
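To make the reward switching in (2) concrete, below is a minimal Python sketch. It is illustrative only: the function name compute_reward, the tracked keypoint_xy input, and the exponential distance shaping are assumptions of this sketch rather than the released implementation; only the dense-vs-sparse switching rule comes from the description above.

import numpy as np

def compute_reward(keypoint_xy, waypoints, success_prob,
                   success_threshold=0.5, temperature=10.0):
    """Per-timestep reward for one frame in the episode replay buffer.

    keypoint_xy  : (2,) tracked position of the task-relevant keypoint.
    waypoints    : (N, 2) VLM-generated waypoint sequence toward the goal.
    success_prob : score from the sparse success classifier for this frame.
    """
    # Sparse reward: 1 if the success classifier fires, else 0.
    sparse = float(success_prob > success_threshold)
    if sparse > 0.0:
        # Once the task is classified as complete, use the sparse reward.
        return sparse

    # Otherwise use the dense shaping reward: distance from the keypoint to
    # the nearest waypoint, squashed into (0, 1]. The exponential form is one
    # plausible choice, assumed for this sketch.
    dists = np.linalg.norm(waypoints - keypoint_xy, axis=-1)
    return float(np.exp(-temperature * dists.min()))

In this form, rewards lie in [0, 1], and the dense term only influences learning before the success classifier triggers, matching the switching rule above.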
On four real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion within 30K online fine-tuning steps.
We compare our approach to MOKA, which executes actions planned zero-shot by a VLM. This comparison shows that online fine-tuning is advantageous: KAGI learns policies that are robust to environment perturbations.
(Figure: success rates on Cloth Covering / Almond Sweeping / Spatula Pick-Place / Cube Stacking for the three compared settings — 90% / 100% / 70% / 60%, 50% / 65% / 30% / 15%, and 80% / 80% / 65% / 45%.)
Furthermore, we reduce the number of in-domain demonstrations used in the pipeline by 5x. Even with fewer demonstrations, KAGI can recover comparable performance with more fine-tuning. KAGI's dense rewards improve the system's robustness to reduced demonstrations, facilitating scalability to new tasks.
(Figure: success rates with 5x fewer demonstrations on Cloth Covering / Almond Sweeping / Spatula Pick-Place / Cube Stacking for the three compared settings — 45% / 40% / 10% / 5%, 50% / 55% / 30% / 15%, and 75% / 80% / 60% / 40%.)
In simulation experiments, we verify that both dense and sparse rewards are critical for task success, and that KAGI is robust to reductions in the quantity of task demonstrations.
@article{lee2025affordanceguidedrl,
author = {Olivia Y. Lee and Annie Xie and Kuan Fang and Karl Pertsch and Chelsea Finn},
title = {Affordance-Guided Reinforcement Learning via Visual Prompting},
journal = {arXiv preprint arXiv:2407.10341},
year = {2024}
}