RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li,

Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

CUHK, SmartMore, PKU, BAAI

Paper Link

GitHub (coming soon)

Abstract

Large Language Models (LLMs) excel at using tools through code but struggle with complex logic and precise control. In embodied tasks, high-level planning is well suited to direct coding, while low-level actions often require task-specific refinement, such as Reinforcement Learning (RL). To integrate both, we propose RL-GPT, a two-level hierarchical framework with a slow agent that analyzes actions and a fast agent that executes them. This decomposition lets each agent focus on a specific role, which improves efficiency. Our approach outperforms traditional RL methods and existing GPT agents. In Minecraft, it obtains diamonds within a day on an RTX 3090 and achieves state-of-the-art (SOTA) performance across MineDojo tasks.

Method Overview

Overview of RL-GPT. The overall framework consists of a slow agent (orange) and a fast agent (green). The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.
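The division of labor between the two agents can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SubAction`, `slow_agent`, `fast_agent`, and `stub_llm`, the `name:mode` reply format, and the sub-action names are all hypothetical, and the LLM is replaced by a stub that returns a fixed string so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubAction:
    name: str
    mode: str  # "code" (scripted directly) or "rl" (learned with RL)


def slow_agent(task: str, llm: Callable[[str], str]) -> List[SubAction]:
    """Decompose a task and tag each sub-action as 'code' or 'rl'.

    Assumes (hypothetically) that the LLM replies with one 'name:mode' per line.
    """
    reply = llm(f"Decompose the task '{task}' into lines of the form name:mode")
    plan = []
    for line in reply.strip().splitlines():
        name, mode = (part.strip() for part in line.split(":"))
        if mode not in ("code", "rl"):
            raise ValueError(f"unknown mode {mode!r} for sub-action {name!r}")
        plan.append(SubAction(name, mode))
    return plan


def fast_agent(action: SubAction) -> dict:
    """Produce either a code stub or an RL configuration for one sub-action."""
    if action.mode == "code":
        # Placeholder source; a real fast agent would ask the LLM to write it.
        return {"kind": "code", "source": f"def {action.name}(env):\n    pass\n"}
    # Placeholder RL settings, not values from the paper.
    return {"kind": "rl_config", "subtask": action.name, "total_steps": 100_000}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; the reply is fixed and illustrative."""
    return "navigate_to_tree:code\nattack_tree:rl\ncraft_planks:code"


plan = slow_agent("harvest a log", stub_llm)
outputs = [fast_agent(action) for action in plan]
for action, out in zip(plan, outputs):
    print(action.name, "->", out["kind"])
```

Keeping the decision ("which actions to learn") separate from the generation step means each prompt stays narrow, which is the efficiency argument made in the abstract.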

To learn a subtask, the LLM can generate environment configurations (task, observation, reward, and action space) to instantiate RL. In particular, by reasoning about the agent behavior needed to solve the subtask, the LLM generates code that provides higher-level actions in addition to the original environment actions, which improves the sample efficiency of RL.
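One way to picture such a generated configuration is sketched below. It is an assumption-laden illustration, not the paper's code or the MineDojo API: the `RLConfig` class, its field names, the action names, and the `attack_10` macro are hypothetical. The point it shows is that a coded higher-level action sits in the action space next to the primitive actions and expands into a sequence of primitives when selected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RLConfig:
    """Hypothetical container for an LLM-generated RL setup."""
    task: str
    observation_keys: List[str]
    reward: Callable[[dict], float]
    primitive_actions: List[str]
    # Higher-level actions written as code: name -> sequence of primitives.
    coded_actions: Dict[str, List[str]] = field(default_factory=dict)

    def action_space(self) -> List[str]:
        """Primitive actions followed by the coded higher-level actions."""
        return self.primitive_actions + list(self.coded_actions)

    def expand(self, action: str) -> List[str]:
        """Translate a chosen action into the primitives the environment runs."""
        if action in self.coded_actions:
            return list(self.coded_actions[action])
        if action in self.primitive_actions:
            return [action]
        raise KeyError(f"unknown action: {action}")


config = RLConfig(
    task="harvest a log",
    observation_keys=["rgb", "inventory"],
    # Illustrative reward: number of logs reported in the step info.
    reward=lambda info: float(info.get("log", 0)),
    primitive_actions=["forward", "turn_left", "turn_right", "attack"],
    # Illustrative macro: hold 'attack' for ten consecutive steps.
    coded_actions={"attack_10": ["attack"] * 10},
)

print(config.action_space())
print(len(config.expand("attack_10")))
```

With a macro like this, the RL policy makes one decision where it would otherwise need ten correct consecutive ones, which is one plausible reading of why coded actions help sample efficiency.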

Challenging long-horizon tasks in MineDojo.

Citation

@article{liu2024rlgpt,
  title={{RL-GPT}: Integrating Reinforcement Learning and Code-as-policy},
  author={Liu, Shaoteng and Yuan, Haoqi and Hu, Minda and Li, Yanwei and Chen, Yukang and Liu, Shu and Lu, Zongqing and Jia, Jiaya},
  journal={arXiv preprint arXiv:2402.19299},
  year={2024},
}
