Multi-Agent LLM Actor-Critic Framework for Social Robot Navigation
Weizheng Wang, Obi Ike, and Byung-Cheol Min
Submitted to IROS 2025
[Paper] [LLM-Actor Textual_Demo] [LLM-Critic Textual_Demo] [Re-Query Textual_Demo]
Abstract
Recent advances in robotics and large language models (LLMs) have sparked growing interest in human-robot collaboration and embodied intelligence. To enable the broader deployment of robots in human-populated environments, socially-aware robot navigation (SAN) has become a key research area. While deep reinforcement learning approaches that integrate human-robot interaction (HRI) with path planning have demonstrated strong benchmark performance, they often struggle to adapt to new scenarios and environments. LLMs offer a promising avenue for zero-shot navigation through commonsense inference. However, most existing LLM-based frameworks rely on centralized decision-making, lack robust verification mechanisms, and face inconsistencies in translating macro-actions into precise low-level control signals. To address these challenges, we propose SAMALM, a decentralized multi-agent LLM actor-critic framework for multi-robot social navigation. In this framework, parallel LLM actors, each reflecting a distinct robot personality or configuration, directly generate control signals. These actions undergo a two-tier verification process via a global critic that evaluates group-level behaviors and individual critics that assess each robot’s context. An entropy-based score fusion mechanism further enhances self-verification and re-query, improving both robustness and coordination. Experimental results confirm that SAMALM effectively balances local autonomy with global oversight, yielding socially compliant behaviors and strong adaptability across diverse multi-robot scenarios.
Architecture of SAMALM
SAMALM architecture: SAMALM is a decentralized multi-agent LLM actor-critic framework designed for multi-robot social navigation. In SAMALM, parallel LLM actors generate low-level control signals for their respective robots. These actions are then evaluated by the corresponding LLM critics from both team-level and agent-level perspectives, which either confirm the actions or prompt a re-query with critic feedback. Once the actions pass the evaluation threshold, they are executed by the system’s executors in the multi-robot environment.
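A minimal sketch of this decision loop is given below. The actor/critic interfaces, the acceptance threshold, and the retry budget are hypothetical stand-ins for illustration, not the released implementation.

```python
# Minimal sketch of one SAMALM decision step (illustrative only; the
# actor/critic objects are hypothetical stand-ins for LLM calls).

def samalm_step(observations, actors, local_critics, global_critic,
                fuse_scores, threshold=0.7, max_requeries=3):
    """One decentralized decision step for all robots.

    observations: {robot_id: local world-model observation}
    actors / local_critics: {robot_id: LLM wrapper}
    fuse_scores: callable merging a global and a local critic score
    """
    # Each parallel LLM actor proposes a low-level control signal
    # (e.g., linear and angular velocity) from its local observation.
    actions = {rid: actors[rid].propose(obs)
               for rid, obs in observations.items()}

    for _ in range(max_requeries):
        # Two-tier verification: the global critic scores group-level
        # behavior; each local critic scores its own robot's context.
        g_scores = global_critic.score(observations, actions)   # per-robot dict
        l_scores = {rid: local_critics[rid].score(observations[rid], act)
                    for rid, act in actions.items()}
        fused = {rid: fuse_scores(g_scores[rid], l_scores[rid])
                 for rid in actions}

        rejected = [rid for rid, q in fused.items() if q < threshold]
        if not rejected:
            break  # every action passed the evaluation threshold
        # Re-query only the rejected actors, feeding critic text back in.
        for rid in rejected:
            fb = local_critics[rid].feedback(actions[rid])
            actions[rid] = actors[rid].propose(observations[rid], feedback=fb)
    return actions  # handed to the executors in the multi-robot environment
```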
Multi-Agent LLM Actor-Critic Framework
Multi-Agent LLM Actor-Critic Framework: SAMALM facilitates multi-robot social navigation using a set of parallel LLM actors that extract semantic correlations from local world model observations and work in tandem with both global and local critics. The global critic assesses multi-robot behaviors by considering both inter-group and intra-group dynamics, while local critics evaluate individual actions based on long-term and short-term factors. Ultimately, the global and local critic scores are integrated via an entropy-based fusion mechanism that accounts for the level of disagreement among the critics, enabling self-verification and re-query with critic feedback.
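The exact fusion rule is defined in the paper; the sketch below shows one plausible entropy-based variant, assuming each critic emits a scalar score in [0, 1] and measuring critic disagreement with the Jensen-Shannon divergence between the scores viewed as Bernoulli distributions.

```python
import math

def _H(p, eps=1e-8):
    """Entropy (nats) of a Bernoulli distribution with parameter p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_fused_score(global_score, local_score):
    """Fuse two critic scores in [0, 1], discounting by disagreement.

    Disagreement is the Jensen-Shannon divergence between the critics'
    scores viewed as Bernoulli "action is acceptable" distributions;
    it is 0 when the critics agree and ln(2) when they fully conflict.
    (One plausible rule, not the paper's exact formulation.)
    """
    m = 0.5 * (global_score + local_score)
    jsd = _H(m) - 0.5 * (_H(global_score) + _H(local_score))
    agreement = 1.0 - jsd / math.log(2)   # 1 = consensus, 0 = conflict
    return agreement * m

# Consensus keeps the score; conflict suppresses it toward re-query:
print(round(entropy_fused_score(0.8, 0.8), 3))  # 0.8
print(round(entropy_fused_score(0.9, 0.1), 3))  # 0.234
```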
Multi-Robot World Model Representation
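As a rough structural sketch of what a per-robot local world-model observation might contain before serialization into an actor prompt (every field and method here is an assumption, not the paper's schema):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Pose and velocity of one human or robot in the scene (assumed fields)."""
    x: float
    y: float
    vx: float
    vy: float

@dataclass
class LocalWorldModel:
    """Hypothetical per-robot observation fed to an LLM actor as text."""
    robot: AgentState
    goal: tuple[float, float]
    humans: list[AgentState] = field(default_factory=list)
    teammates: list[AgentState] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Serialize the structured state into the natural-language
        # world-model description an LLM actor can reason over.
        lines = [f"Robot at ({self.robot.x:.1f}, {self.robot.y:.1f}), "
                 f"goal at ({self.goal[0]:.1f}, {self.goal[1]:.1f})."]
        for i, h in enumerate(self.humans):
            lines.append(f"Human {i}: pos ({h.x:.1f}, {h.y:.1f}), "
                         f"vel ({h.vx:.1f}, {h.vy:.1f}).")
        for i, t in enumerate(self.teammates):
            lines.append(f"Teammate {i}: pos ({t.x:.1f}, {t.y:.1f}).")
        return "\n".join(lines)
```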
Comparison Simulation Experiments and Trajectory Illustrations
More Trajectory Results
SAMALM Implementation Details
(1). LLM-Actor Input and Output
(Input) Environmental Configuration Prompt Engineering:
(Input) World Model Representation Prompt Engineering:
(Input) Auto-CoT Prompt Engineering:
(Output) LLM-Actor Inference Output:
The Entire Text Demo of LLM-Actor Input-Output
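The linked demo shows the full prompt text; as a structural summary, the snippet below sketches how the three input blocks above might be concatenated and the actor's reply parsed into a control signal. The block order and the JSON output schema are assumptions for illustration.

```python
import json

def build_actor_prompt(env_config: str, world_model: str, auto_cot: str) -> str:
    """Concatenate the three LLM-actor input blocks (order assumed)."""
    return "\n\n".join([env_config, world_model, auto_cot])

def parse_actor_output(reply: str):
    """Parse the actor's reply into a low-level control signal.

    Assumes the actor is instructed to end its chain-of-thought with a
    JSON object such as {"v": 0.6, "w": -0.2} (linear and angular
    velocity); this schema is a hypothetical stand-in for the demo's
    actual output format.
    """
    start, end = reply.rfind("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no control JSON found in actor reply")
    cmd = json.loads(reply[start:end + 1])
    return float(cmd["v"]), float(cmd["w"])
```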
(2). LLM-Critic Input and Output
(Input) Evaluation Rules and Environmental Configuration Prompt Engineering:
(Input) Critic Observation Prompt Engineering:
(Input) Auto-CoT Prompt Engineering:
(Input) Global-Critic Prompt Engineering:
(Output) Local LLM-Critic Inference Output [Step-1 in Critic-CoT]:
(Output) Global-Critic Inference Output [Step-1 in Critic-CoT]:
(Output) All LLM-Critics Inference Output [Step-3 in Critic-CoT]:
[Note: Only inference information from CoT's Step-3 will be fed into LLM-Actors as evaluation feedback.]
(Output) All LLM-Critics Q-Value Output [Step-4 in Critic-CoT]:
The Entire Text Demo of LLM-Critic Input-Output
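Below is a hedged sketch of how a critic's labeled CoT reply could be split into the Step-3 feedback (fed back to the actors) and the Step-4 Q-value. The step labels, regexes, and example text are illustrative assumptions, not the demo's exact format.

```python
import re

def parse_critic_reply(reply: str):
    """Split a critic's CoT reply into actor feedback and a Q-value.

    Assumes the critic labels its chain-of-thought steps, with
    "Step-3:" carrying the natural-language feedback returned to the
    LLM-actors and "Step-4:" ending in a scalar Q-value in [0, 1].
    """
    fb = re.search(r"Step-3:\s*(.*?)(?=Step-4:|\Z)", reply, re.S)
    qv = re.search(r"Step-4:.*?([01](?:\.\d+)?)", reply, re.S)
    if fb is None or qv is None:
        raise ValueError("critic reply missing Step-3/Step-4 sections")
    return fb.group(1).strip(), float(qv.group(1))

# Example reply shape (content invented for illustration):
demo = ("Step-1: the action keeps a safe distance...\n"
        "Step-3: acceptable, but slow down near Human 2.\n"
        "Step-4: Q-value = 0.82")
feedback, q_value = parse_critic_reply(demo)  # -> ("acceptable, ...", 0.82)
```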
(3). Multi-LLM Actor-Critic Re-Query Mechanism
Actor-Critic Re-Query Object Example:
(Input) Re-Query LLM-Actor Environmental Configuration Prompt Engineering:
(Input) Re-Query LLM-Actor Observation Prompt Engineering:
(Input) Re-Query LLM-Actor Feedback (from Critic) Prompt Engineering:
(Input) Re-Query LLM-Actor CoT Prompt Engineering:
(Output) Re-Query LLM-Actor Inference w.r.t. Critic Feedback Output:
Actor-Critic Re-Query Procedure Example:
The Entire Text Demo of LLM-Actor Re-Query Procedure
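As a structural sketch of one re-query round combining the four input blocks listed above (all interfaces and prompt wording here are hypothetical; see the linked demo for the exact text):

```python
def requery_round(actor, observation, rejected_action, critic_feedback):
    """One re-query round for a single robot (hypothetical interfaces).

    The re-query prompt reuses the environmental-configuration and
    observation blocks, then injects the critic's Step-3 feedback and a
    CoT instruction before asking the actor for a revised action.
    """
    prompt = "\n\n".join([
        actor.env_config_prompt(),              # (Input) env configuration
        actor.observation_prompt(observation),  # (Input) observation
        "Previous action " + str(rejected_action) +
        " was rejected. Critic feedback:\n" + critic_feedback,
        actor.cot_prompt(),                     # (Input) CoT instruction
    ])
    return actor.query(prompt)                  # revised control signal
```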
Related Socially Aware Navigation Works from SMART-LAB
[1]. (ICRA-2025) Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems
https://arxiv.org/pdf/2409.11561
[2]. (ICRA-2025) Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
https://arxiv.org/pdf/2409.13573
[3]. (ICRA-2024) Multi-Robot Cooperative Socially-Aware Navigation Using Multi-Agent Reinforcement Learning
https://arxiv.org/pdf/2309.15234
[4]. (IROS-2023) NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning
https://arxiv.org/pdf/2304.05979
[5]. (IROS-2022) FAPL: Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation
https://ieeexplore.ieee.org/document/9981616