In expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty has attracted increasing attention, exploration without proper guidance is often redundant, which remains a practical obstacle for the community. This paper introduces LEMAE, a systematic approach that channels informative task-relevant guidance from a Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design the Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to record transitions between key states within a task and thereby organize exploration. By reducing redundant exploration, LEMAE outperforms existing state-of-the-art (SOTA) approaches on challenging benchmarks (e.g., MPE, SMAC, and MuJoCo) by a large margin, achieving a 10x acceleration in certain scenarios.
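For concreteness, a minimal Python sketch of the key-state discrimination step is given below. The prompt template, the `query_llm` client, and the JSON response schema are illustrative assumptions for exposition, not LEMAE's exact prompts or interface; the point is that a single discriminative query per task suffices, keeping LLM inference costs low.

```python
import json

# Hypothetical prompt template; LEMAE's actual prompts are task-specific.
KEY_STATE_PROMPT = """\
You are assisting multi-agent exploration. Given the task description and
the symbolic state variables, list the key states that must be reached to
fulfill the task, each as a predicate over the state variables.

Task: {task}
State variables: {state_vars}

Respond with a JSON list of {{"name": ..., "predicate": ...}} objects."""


def discriminate_key_states(task, state_vars, query_llm):
    """One discriminative LLM call that grounds linguistic task knowledge
    into symbolic key-state predicates (one query per task, so inference
    cost stays low). `query_llm` is an assumed text-in/text-out client."""
    prompt = KEY_STATE_PROMPT.format(task=task, state_vars=", ".join(state_vars))
    return json.loads(query_llm(prompt))
```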
We build a bridge between LLMs and RL to facilitate efficient multi-agent exploration, developing a systematic approach dubbed LEMAE.
We devise a computationally efficient inference strategy that channels task-specific information from the LLM to discriminate task-crucial key states, enabling coordinated and targeted exploration.
We introduce the Key State Memory Tree (KSMT) to organize exploration around historical key states, and devise the Subspace-based Hindsight Intrinsic Reward (SHIR) to promote guided exploration; a minimal sketch of both components follows this list.
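The sketch below illustrates both components in Python, under the simplifying assumptions that each key state is a named boolean predicate over the joint state and that progress toward a key state can be measured by a caller-supplied distance in that key state's subspace; all names (`KeyStateMemoryTree`, `shir`, `subspace_dist`) are illustrative, not the paper's exact formulation.

```python
from collections import defaultdict, namedtuple

# A key state is assumed to be a name plus a predicate: state -> bool.
KeyState = namedtuple("KeyState", ["name", "predicate"])


class KeyStateMemoryTree:
    """Records observed transitions between key states as parent -> children
    edges, so exploration can be organized along previously reached chains."""

    def __init__(self):
        self.children = defaultdict(set)

    def record(self, chain):
        # `chain` is the ordered list of key-state names hit in one episode.
        for parent, child in zip(["ROOT"] + chain, chain):
            self.children[parent].add(child)

    def frontier(self, reached):
        # Candidate next key states, given the key states reached so far.
        return self.children[reached[-1] if reached else "ROOT"]


def shir(trajectory, key_states, subspace_dist, scale=1.0):
    """Hindsight relabeling: after an episode, locate the first step at which
    each key state was satisfied, then densify the reward by crediting
    progress toward that key state measured in its own state subspace."""
    intrinsic = [0.0] * len(trajectory)
    for ks in key_states:
        hit = next((t for t, s in enumerate(trajectory) if ks.predicate(s)), None)
        if hit is None:
            continue  # key state never reached; no hindsight credit
        for t in range(hit):
            # Reward the reduction in subspace distance to the key state.
            progress = subspace_dist(trajectory[t], ks) - subspace_dist(trajectory[t + 1], ks)
            intrinsic[t] += scale * progress
    return intrinsic
```

In this sketch, KSMT memorizes which key states have followed which, so a subsequent episode can target an unexplored branch of the tree, while SHIR turns the sparse task reward into a denser signal pointing at key states actually reached in hindsight.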
LEMAE:
(i) consistently outperforms SOTA baselines, with a 10x acceleration in certain scenarios;
(ii) achieves, under sparse rewards, performance comparable to baselines trained with human-designed dense rewards;
(iii) exhibits the potential to generalize to tasks previously unseen by the LLM or involving non-symbolic states.
These results validate LEMAE's ability to reduce redundant exploration and improve efficiency in MARL, and suggest promise for broader scenarios that require efficient multi-agent exploration.
Multi-Agent Particle Environment (MPE)
StarCraft Multi-Agent Challenge (SMAC) v1 and v2
Compatible with Various Algorithms and LLMs
Applicable to Single-Agent Settings
A Brand-New Task: River
Extending LEMAE Beyond Symbolic Tasks
Applications in Robotics Control