Multi-Agent LLM Actor-Critic Framework for Social Robot Navigation
Weizheng Wang, Obi Ike, and Byung-Cheol Min
Submitted to IROS 2025
[Paper] [LLM-Actor Textual_Demo] [LLM-Critic Textual_Demo] [Re-Query Textual_Demo]
Abstract
Recent advances in robotics and large language models (LLMs) have sparked growing interest in human-robot collaboration and embodied intelligence. To enable the broader deployment of robots in human-populated environments, socially-aware robot navigation (SAN) has become a key research area. While deep reinforcement learning approaches that integrate human-robot interaction (HRI) with path planning have demonstrated strong benchmark performance, they often struggle to adapt to new scenarios and environments. LLMs offer a promising avenue for zero-shot navigation through commonsense inference. However, most existing LLM-based frameworks rely on centralized decision-making, lack robust verification mechanisms, and face inconsistencies in translating macro-actions into precise low-level control signals. To address these challenges, we propose SAMALM, a decentralized multi-agent LLM actor-critic framework for multi-robot social navigation. In this framework, parallel LLM actors, each reflecting a distinct robot personality or configuration, directly generate control signals. These actions undergo a two-tier verification process via a global critic that evaluates group-level behaviors and individual critics that assess each robot’s context. An entropy-based score fusion mechanism further enhances self-verification and re-query, improving both robustness and coordination. Experimental results confirm that SAMALM effectively balances local autonomy with global oversight, yielding socially compliant behaviors and strong adaptability across diverse multi-robot scenarios.
Architecture of SAMALM
SAMALM architecture: SAMALM is a decentralized multi-agent LLM actor-critic framework designed for multi-robot social navigation. In SAMALM, parallel LLM actors generate low-level control signals for their respective robots. These actions are then evaluated by the corresponding LLM critics from both team-level and agent-level perspectives, which either confirm the actions or prompt a re-query with critic feedback. Once the actions pass the evaluation threshold, they are executed by the system’s executors in the multi-robot environment.
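A minimal sketch of this decision loop is given below. The actor/critic interfaces, the acceptance threshold, and the retry budget are hypothetical stand-ins for illustration, not the released implementation.

```python
# Minimal sketch of one SAMALM decision step (illustrative only; the
# actor/critic objects are hypothetical stand-ins for LLM calls).

def samalm_step(observations, actors, local_critics, global_critic,
                fuse_scores, threshold=0.7, max_requeries=3):
    """One decentralized decision step for all robots.

    observations: {robot_id: local world-model observation}
    actors / local_critics: {robot_id: LLM wrapper}
    fuse_scores: callable merging a global and a local critic score
    """
    # Each parallel LLM actor proposes a low-level control signal
    # (e.g., linear and angular velocity) from its local observation.
    actions = {rid: actors[rid].propose(obs)
               for rid, obs in observations.items()}

    for _ in range(max_requeries):
        # Two-tier verification: the global critic scores group-level
        # behavior; each local critic scores its own robot's context.
        g_scores = global_critic.score(observations, actions)   # per-robot dict
        l_scores = {rid: local_critics[rid].score(observations[rid], act)
                    for rid, act in actions.items()}
        fused = {rid: fuse_scores(g_scores[rid], l_scores[rid])
                 for rid in actions}

        rejected = [rid for rid, q in fused.items() if q < threshold]
        if not rejected:
            break  # every action passed the evaluation threshold
        # Re-query only the rejected actors, feeding critic text back in.
        for rid in rejected:
            fb = local_critics[rid].feedback(actions[rid])
            actions[rid] = actors[rid].propose(observations[rid], feedback=fb)
    return actions  # handed to the executors in the multi-robot environment
```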
Multi-Agent LLM Actor-Critic Framework
Multi-Agent LLM Actor-Critic Framework: SAMALM facilitates multi-robot social navigation using a set of parallel LLM actors that extract semantic correlations from local world model observations and work in tandem with both global and local critics. The global critic assesses multi-robot behaviors by considering both inter-group and intra-group dynamics, while local critics evaluate individual actions based on long-term and short-term factors. Ultimately, the global and local critic scores are integrated via an entropy-based fusion mechanism that accounts for the level of disagreement among the critics, enabling self-verification and re-query with critic feedback.
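The exact fusion rule is defined in the paper; the sketch below shows one plausible entropy-based variant, assuming each critic emits a scalar score in [0, 1] and measuring critic disagreement with the Jensen-Shannon divergence between the scores viewed as Bernoulli distributions.

```python
import math

def _H(p, eps=1e-8):
    """Entropy (nats) of a Bernoulli distribution with parameter p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def entropy_fused_score(global_score, local_score):
    """Fuse two critic scores in [0, 1], discounting by disagreement.

    Disagreement is the Jensen-Shannon divergence between the critics'
    scores viewed as Bernoulli "action is acceptable" distributions;
    it is 0 when the critics agree and ln(2) when they fully conflict.
    (One plausible rule, not the paper's exact formulation.)
    """
    m = 0.5 * (global_score + local_score)
    jsd = _H(m) - 0.5 * (_H(global_score) + _H(local_score))
    agreement = 1.0 - jsd / math.log(2)   # 1 = consensus, 0 = conflict
    return agreement * m

# Consensus keeps the score; conflict suppresses it toward re-query:
print(round(entropy_fused_score(0.8, 0.8), 3))  # 0.8
print(round(entropy_fused_score(0.9, 0.1), 3))  # 0.234
```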
Multi-Robot World Model Representation
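As a rough structural sketch of what a per-robot local world-model observation might contain before serialization into an actor prompt (every field and method here is an assumption, not the paper's schema):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Pose and velocity of one human or robot in the scene (assumed fields)."""
    x: float
    y: float
    vx: float
    vy: float

@dataclass
class LocalWorldModel:
    """Hypothetical per-robot observation fed to an LLM actor as text."""
    robot: AgentState
    goal: tuple[float, float]
    humans: list[AgentState] = field(default_factory=list)
    teammates: list[AgentState] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Serialize the structured state into the natural-language
        # world-model description an LLM actor can reason over.
        lines = [f"Robot at ({self.robot.x:.1f}, {self.robot.y:.1f}), "
                 f"goal at ({self.goal[0]:.1f}, {self.goal[1]:.1f})."]
        for i, h in enumerate(self.humans):
            lines.append(f"Human {i}: pos ({h.x:.1f}, {h.y:.1f}), "
                         f"vel ({h.vx:.1f}, {h.vy:.1f}).")
        for i, t in enumerate(self.teammates):
            lines.append(f"Teammate {i}: pos ({t.x:.1f}, {t.y:.1f}).")
        return "\n".join(lines)
```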
Comparison Simulation Experiments and Trajectory Illustrations
More Trajectory Results
SAMALM Implementation Details
(1). LLM-Actor Input and Output
(Input) Environmental Configuration Prompt Engineering:
(Input) World Model Representation Prompt Engineering:
(Input) Auto-CoT Prompt Engineering:
(Output) LLM-Actor Inference Output:
The Entire Text Demo of LLM-Actor Input-Output
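The linked demo shows the full prompt text; as a structural summary, the snippet below sketches how the three input blocks above might be concatenated and the actor's reply parsed into a control signal. The block order and the JSON output schema are assumptions for illustration.

```python
import json

def build_actor_prompt(env_config: str, world_model: str, auto_cot: str) -> str:
    """Concatenate the three LLM-actor input blocks (order assumed)."""
    return "\n\n".join([env_config, world_model, auto_cot])

def parse_actor_output(reply: str):
    """Parse the actor's reply into a low-level control signal.

    Assumes the actor is instructed to end its chain-of-thought with a
    JSON object such as {"v": 0.6, "w": -0.2} (linear and angular
    velocity); this schema is a hypothetical stand-in for the demo's
    actual output format.
    """
    start, end = reply.rfind("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no control JSON found in actor reply")
    cmd = json.loads(reply[start:end + 1])
    return float(cmd["v"]), float(cmd["w"])
```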
(2). LLM-Critic Input and Output
(Input) Evaluation Rules and Environmental Configuration Prompt Engineering:
(Input) Critic Observation Prompt Engineering:
(Input) Auto-CoT Prompt Engineering:
(Input) Global-Critic Prompt Engineering:
(Output) Local LLM-Critic Inference Output [Step-1 in Critic-CoT]:
(Output) Global-Critic Inference Output [Step-1 in Critic-CoT]:
(Output) All LLM-Critics Inference Output [Step-3 in Critic-CoT]:
[Note: Only inference information from CoT's Step-3 will be fed into LLM-Actors as evaluation feedback.]
(Output) All LLM-Critics Q-Value Output [Step-4 in Critic-CoT]:
The Entire Text Demo of LLM-Critic Input-Output
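Below is a hedged sketch of how a critic's labeled CoT reply could be split into the Step-3 feedback (fed back to the actors) and the Step-4 Q-value. The step labels, regexes, and example text are illustrative assumptions, not the demo's exact format.

```python
import re

def parse_critic_reply(reply: str):
    """Split a critic's CoT reply into actor feedback and a Q-value.

    Assumes the critic labels its chain-of-thought steps, with
    "Step-3:" carrying the natural-language feedback returned to the
    LLM-actors and "Step-4:" ending in a scalar Q-value in [0, 1].
    """
    fb = re.search(r"Step-3:\s*(.*?)(?=Step-4:|\Z)", reply, re.S)
    qv = re.search(r"Step-4:.*?([01](?:\.\d+)?)", reply, re.S)
    if fb is None or qv is None:
        raise ValueError("critic reply missing Step-3/Step-4 sections")
    return fb.group(1).strip(), float(qv.group(1))

# Example reply shape (content invented for illustration):
demo = ("Step-1: the action keeps a safe distance...\n"
        "Step-3: acceptable, but slow down near Human 2.\n"
        "Step-4: Q-value = 0.82")
feedback, q_value = parse_critic_reply(demo)  # -> ("acceptable, ...", 0.82)
```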
(3). Multi-LLM Actor-Critic Re-Query Mechanism
Actor-Critic Re-Query Object Example:
(Input) Re-Query LLM-Actor Environmental Configuration Prompt Engineering:
(Input) Re-Query LLM-Actor Observation Prompt Engineering:
(Input) Re-Query LLM-Actor Feedback (from Critic) Prompt Engineering:
(Input) Re-Query LLM-Actor CoT Prompt Engineering:
(Output) Re-Query LLM-Actor Inference w.r.t. Critic Feedback Output:
Actor-Critic Re-Query Procedure Example:
The Entire Text Demo of LLM-Actor Re-Query Procedure
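As a structural sketch of one re-query round combining the four input blocks listed above (all interfaces and prompt wording here are hypothetical; see the linked demo for the exact text):

```python
def requery_round(actor, observation, rejected_action, critic_feedback):
    """One re-query round for a single robot (hypothetical interfaces).

    The re-query prompt reuses the environmental-configuration and
    observation blocks, then injects the critic's Step-3 feedback and a
    CoT instruction before asking the actor for a revised action.
    """
    prompt = "\n\n".join([
        actor.env_config_prompt(),              # (Input) env configuration
        actor.observation_prompt(observation),  # (Input) observation
        "Previous action " + str(rejected_action) +
        " was rejected. Critic feedback:\n" + critic_feedback,
        actor.cot_prompt(),                     # (Input) CoT instruction
    ])
    return actor.query(prompt)                  # revised control signal
```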
Related Socially Aware Navigation Works from SMART-LAB
[1]. (ICRA-2025) Hypergraph-based Coordinated Task Allocation and Socially-aware Navigation for Multi-Robot Systems
https://arxiv.org/pdf/2409.11561
[2]. (ICRA-2025) Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
https://arxiv.org/pdf/2409.13573
[3]. (ICRA-2024) Multi-Robot Cooperative Socially-Aware Navigation Using Multi-Agent Reinforcement Learning
https://arxiv.org/pdf/2309.15234
[4]. (IROS-2023) NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning
https://arxiv.org/pdf/2304.05979
[5]. (IROS-2022) FAPL: Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation
https://ieeexplore.ieee.org/document/9981616