Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration
Abstract
With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty has attracted increasing attention, the redundant effort caused by exploration without proper guidance remains a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, that channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) into Efficient Multi-Agent Exploration. Specifically, we ground the LLM's linguistic knowledge into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design the Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to record transitions between key states in a specific task for organized exploration. By diminishing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
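The hindsight reward-densification idea behind SHIR can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes key states are given as boolean predicates over a symbolic state (sub)space, and the function name, bonus, and discount are illustrative choices.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # illustrative symbolic state, e.g. {"x": 2}

def shir_rewards(
    trajectory: List[State],
    key_state_preds: List[Callable[[State], bool]],
    bonus: float = 1.0,
    discount: float = 0.9,
) -> List[float]:
    """Hindsight intrinsic rewards: after an episode ends, find the first
    step at which each key-state predicate is satisfied, grant a bonus
    there, and propagate a discounted shaping signal to earlier steps,
    densifying an otherwise sparse reward."""
    rewards = [0.0] * len(trajectory)
    for pred in key_state_preds:
        # First timestep at which this key state is reached, if any.
        hit = next((t for t, s in enumerate(trajectory) if pred(s)), None)
        if hit is None:
            continue
        rewards[hit] += bonus
        # Credit earlier steps in hindsight, decayed by temporal distance.
        for t in range(hit - 1, -1, -1):
            rewards[t] += bonus * discount ** (hit - t)
    return rewards
```

For a three-step trajectory whose final state satisfies a single key-state predicate, the sketch yields rewards of 0.81, 0.9, and 1.0, i.e., a dense signal pointing toward the key state.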
LEMAE Pipeline Video
Contributions
We build a bridge between LLM and RL to facilitate efficient multi-agent exploration by developing a systematic approach dubbed LEMAE.
We devise a computationally efficient inference strategy that channels task-specific information from the LLM to discriminate key states critical for task fulfillment, which serve as subgoals for targeted exploration.
We introduce the Key State Memory Tree to organize exploration according to historical key-state transitions, and devise the Subspace-based Hindsight Intrinsic Reward to guide agents toward key states by increasing reward density.
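The KSMT described above can be sketched as a simple prefix tree over key-state labels: each episode's ordered sequence of reached key states is inserted as a root-to-leaf path, and the tree then answers which key states have been observed to follow a given prefix. This is an illustrative reconstruction under those assumptions, not the paper's code; class and method names are hypothetical.

```python
from typing import Dict, List

class KSMT:
    """Key State Memory Tree: records observed orderings of key states so
    that exploration can be organized along previously discovered routes."""

    def __init__(self) -> None:
        self.children: Dict[str, "KSMT"] = {}  # key-state label -> subtree

    def record(self, key_state_sequence: List[str]) -> None:
        """Insert one episode's ordered key-state labels as a path."""
        node = self
        for label in key_state_sequence:
            node = node.children.setdefault(label, KSMT())

    def known_successors(self, prefix: List[str]) -> List[str]:
        """Key states observed to directly follow the given prefix."""
        node = self
        for label in prefix:
            if label not in node.children:
                return []
            node = node.children[label]
        return list(node.children)
```

For example, after recording the key-state sequences `["A", "B", "C"]` and `["A", "D"]`, querying the successors of `["A"]` returns both `B` and `D`, so an explorer at key state `A` can branch along either known route or try something new.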
Experiments
LEMAE
(i) consistently outperforms the state-of-the-art (SOTA) baselines, achieving a 10x acceleration in certain scenarios;
(ii) achieves performance comparable to the baseline with human-designed reward in sparse reward scenarios;
(iii) exhibits potential to generalize to brand-new, non-symbolic tasks.
These observations confirm the effectiveness of our design in reducing redundant exploration and improving exploration efficiency, showing promise for real-world deployment in scenarios requiring efficient exploration.
Significant Reduction in Redundant Exploration
Superior Performance in Challenging Benchmarks
Multiple-Particle Environment (MPE)
StarCraft Multi-Agent Challenge (SMAC)
StarCraft Multi-Agent Challenge v2~(SMAC v2)
Algorithm Agnostic
Compatible with Various Algorithms
Applicable to Single-Agent Settings
Scalability and Generalization
A brand-new, non-symbolic task, termed River