FuRL: Visual-Language Models as Fuzzy Rewards for RL

Yuwei Fu¹²*    Haichao Zhang²    Di Wu¹    Wei Xu²    Benoit Boulet¹

¹McGill University               ²Horizon Robotics

*Work done during an internship at Horizon Robotics

ICML 2024

[PDF] [Code]


Abstract

In this work, we investigate how to leverage pre-trained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse-reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying a VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Specifically, we enhance the performance of SAC/DrQ baseline agents on sparse-reward tasks by fine-tuning VLM representations and using relay RL to avoid local minima. Extensive experiments on Meta-World benchmark tasks demonstrate the efficacy of the proposed method.
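In the VLM-as-Reward setup that FuRL builds on, the fuzzy reward is typically the similarity between the rendered observation and the textual task description in a shared embedding space. Below is a minimal sketch of that generic recipe (not the full FuRL pipeline, which additionally fine-tunes the VLM representations), assuming a CLIP backbone loaded through Hugging Face transformers; the checkpoint name and reward scale are illustrative.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-style VLM with image/text embeddings works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def vlm_fuzzy_reward(frame, task_description):
    """Cosine similarity between the rendered frame and the task description."""
    inputs = processor(text=[task_description], images=frame,
                       return_tensors="pt", padding=True)
    outputs = model(**inputs)
    img = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()
```

Used directly as a dense reward, this similarity is "fuzzy": it is informative but not guaranteed to track true task progress, which is exactly the misalignment issue discussed next.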


Reward Misalignment


The VLM-as-Reward framework suffers from reward misalignment: rewards computed directly from pre-trained VLM embeddings do not always track true task progress, which can mislead the RL agent.

Framework

FuRL contains two major, interacting components: (i) reward alignment, which fine-tunes the VLM representations so that the fuzzy reward better reflects task progress, and (ii) relay RL, which helps the agent escape the local minima induced by an imperfect VLM reward. A sketch of how these pieces could fit together is given below.
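The following sketch shows one plausible way the two components interact during data collection: a relay schedule hands control from an exploration policy to the policy trained with the VLM-shaped reward, and the fuzzy reward is added on top of the sparse environment reward. The switching rule, the reward scale `lam`, and the gymnasium-style environment API are assumptions for illustration, not the exact FuRL algorithm.

```python
def collect_relay_episode(env, explore_policy, vlm_policy, fuzzy_reward_fn,
                          task_text, relay_steps=50, lam=0.1, max_steps=500):
    """Collect one episode with a simple relay schedule.

    The exploration policy acts for the first `relay_steps` steps, after which
    the policy trained with the VLM-shaped reward takes over.  Each stored
    transition uses the sparse environment reward plus a scaled fuzzy VLM
    reward computed from the rendered frame and the task description.
    """
    obs, _ = env.reset()
    transitions = []
    for t in range(max_steps):
        policy = explore_policy if t < relay_steps else vlm_policy
        action = policy(obs)
        next_obs, sparse_r, terminated, truncated, _ = env.step(action)
        # Fuzzy reward: similarity between the rendered frame and the task text.
        shaped_r = sparse_r + lam * fuzzy_reward_fn(env.render(), task_text)
        transitions.append((obs, action, shaped_r, next_obs, terminated))
        obs = next_obs
        if terminated or truncated:
            break
    return transitions
```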

Performance Comparison

FuRL outperforms a number of baseline methods on the sparse-reward Meta-World benchmark.



Related Publications and Resources

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

ICML 2024

[PDF] [Code]


@inproceedings{furl,
  title     = {{FuRL}: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning},
  author    = {Yuwei Fu and Haichao Zhang and Di Wu and Wei Xu and Benoit Boulet},
  booktitle = {International Conference on Machine Learning},
  year      = {2024}
}