Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model

Anonymous Authors

Anonymous University

Submitted to ICRA 2026

Abstract

Social robot navigation increasingly relies on large language models (LLMs) for reasoning and path planning, enabling movement in dynamic human spaces. However, this approach can be unpredictable, difficult to control, and sometimes unsafe due to inherent LLM limitations. In this work, we propose NaviWM, a socially-aware robot Navigation World Model that represents human-robot interaction (HRI) features in line with the robot's inner understanding paradigm. NaviWM also leverages deductive CoT (Chain-of-Thought) reasoning to enhance the reasoning ability of LLMs, thereby supporting socially-aware navigation with logical proofs. Prior works mainly use prompting or fine-tuning for social navigation, but their outputs often fail due to methodological limits and incomplete task control. NaviWM consists of two main components: (1) a dynamically populated world model derived from the environment, designed to model entities within the space, and (2) a deductive CoT reasoning approach that enables language models to systematically reason about the action space. We conducted extensive experiments to compare the performance of NaviWM against other baselines.

Core Contributions of NaviWM

1. We propose a novel LLM-based social navigation planner named NaviWM. To address the lack of physical knowledge in existing LLMs, NaviWM leverages a world model either to capture complex environmental dynamics and HRI features as a spatial-temporal graph within a human-designed environment, or to exemplify the inner reasoning paradigm of the LLM.

2. We convert the comprehensive decision-making problem of social navigation tasks into a multi-step logical reasoning procedure. NaviWM introduces a natural deductive multi-step CoT (Chain-of-Thought) framework to enhance LLM reasoning ability, involving logical inference, verification, and other operations.

3. We combine the world model with the deductive CoT to strengthen the integration of the LLM with the physical environment, addressing the multi-variable optimization problem. Our approach achieves strong performance, with notable improvements over the other baselines and ablation methods.

Architecture of NaviWM

World model illustration for the social robot navigation scenario: NaviWM constructs the world model from local observation to capture both agent vertex information and environmental semantic information with respect to spatial-temporal HRI features.

The architecture of NaviWM: (1) NaviWM constructs a world model to represent the environmental description from local observation; (2) the deductive CoT algorithm is encoded as a prompt to guide LLM reasoning; (3) the inference chains are generated step by step, following the logical procedure and a self-validation step; (4) the robot action is obtained in the final step of the deductive CoT.

Simulation Scenario

We design a gym-based simulator environment that illustrates a socially-aware navigation scenario in which a mobile robot (shown as a yellow marker) must move from its starting position toward a designated destination (the red star). Along the way, the robot must account for the presence of multiple pedestrians, represented as green circles with directional arrows indicating their walking trajectories.

Inspired by [1], we introduce a new representation of social distance based on observed human activities. In our environments, the typical activities, which are generated uniformly in the simulation, are categorized as walking (default distance = 0.35 m), using a personal device (preferred distance = 0.85 m, requiring greater separation), and group interaction (default distance = 0.5 m), as shown in the figure on the right.

[1] The psychophysical representation of proxemics by Hall (1963), which predicts the sensations an agent would likely experience in different physical proxemic configurations within the psychological distance zones.
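
The activity-dependent distances above can be read as a per-human threshold check. The following is a minimal sketch, not the simulator's actual code: the three distance values come from the text, while the names `SOCIAL_DISTANCE_M`, `Human`, and `violates_social_distance` are hypothetical, and measuring the gap centre to centre (ignoring agent radii) is an assumption.

```python
from dataclasses import dataclass
from math import hypot

# Activity-dependent social distances in metres, as listed in the text.
# The dictionary keys are illustrative labels, not identifiers from the simulator.
SOCIAL_DISTANCE_M = {
    "walking": 0.35,
    "using_device": 0.85,
    "group_interaction": 0.50,
}


@dataclass
class Human:
    x: float
    y: float
    activity: str  # one of the keys of SOCIAL_DISTANCE_M


def violates_social_distance(robot_xy, human):
    """Return True if the robot is closer to the human than the activity allows.

    Assumption: the gap is the centre-to-centre Euclidean distance.
    """
    gap = hypot(robot_xy[0] - human.x, robot_xy[1] - human.y)
    return gap < SOCIAL_DISTANCE_M[human.activity]
```

With the robot 0.6 m away, the check flags a person using a device but not a person who is walking, which matches the remark that device use requires greater separation.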

NaviWM: Socially-aware Robot Navigation World Model Illustration

NaviWM introduces a socially-aware robot navigation world model that represents environmental dynamics as a spatial-temporal graph, involving the Robot Node, Human Node, Robot Temporal Edge, Human Temporal Edge, and Human-Robot Spatial Edge. The robot observation text augmented by the world model is the input to the LLM, which generates the robot actions.
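
One way to picture this graph is as a small data structure that is serialized into text before it reaches the LLM. The sketch below is an illustration under stated assumptions, not the NaviWM implementation: the class names, the choice to store temporal edges as a list of past positions, the use of distance as the spatial-edge attribute, and the wording produced by `to_text` are all hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Node:
    """A robot or human vertex: identity plus current kinematic state."""
    name: str
    position: tuple  # (x, y) in metres
    velocity: tuple  # (vx, vy) in metres per second


@dataclass
class WorldModel:
    robot: Node
    humans: list = field(default_factory=list)
    # Temporal edges (assumed form): each agent's earlier positions, oldest first.
    history: dict = field(default_factory=dict)

    def step(self, robot, humans):
        """Archive the current states as temporal edges, then store the new ones."""
        for node in [self.robot, *self.humans]:
            self.history.setdefault(node.name, []).append(node.position)
        self.robot, self.humans = robot, list(humans)

    def spatial_edges(self):
        """Human-robot spatial edges as centre-to-centre distances."""
        rx, ry = self.robot.position
        return {h.name: hypot(h.position[0] - rx, h.position[1] - ry)
                for h in self.humans}

    def to_text(self):
        """Serialize the graph into observation text for an LLM prompt."""
        lines = [f"Robot at {self.robot.position}, velocity {self.robot.velocity}, "
                 f"past positions {self.history.get(self.robot.name, [])}."]
        for name, dist in self.spatial_edges().items():
            lines.append(f"{name} is {dist:.2f} m from the robot, "
                         f"past positions {self.history.get(name, [])}.")
        return "\n".join(lines)
```

Calling `step` once per simulation tick keeps the temporal edges growing, and `to_text` yields one line for the robot and one line per human.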

The world model illustration of the Robot Node, Robot Temporal Edge, and Human-Robot Spatial Edge in NaviWM.

The world model illustration of the Human Node and Human Temporal Edge in NaviWM.

Other baselines without world model illustration

The environmental dynamics representation used in the experiments by the other baselines, which have no world model.

Comparison Simulation Experiments and Trajectory Illustrations

TestCase-1

NaviWM [Success]

GPT4o_WM_CoT [Collision]

GPT4o_CoT [Collision]

GPT4o_WM [Collision]

GPT4o [Collision]

LLaMA405B_WM [Collision]

LLaMA405B [Collision]

LLaMA8B [Collision]

DeepSeek_WM [Collision]

DeepSeek [Collision]

GPT3.5 [Collision]

GPT4 [Collision]

TestCase-2

NaviWM [Success]

GPT4o_WM_CoT [Collision]

GPT4o_CoT [Collision]

GPT4o_WM [Collision]

GPT4o [Collision]

LLaMA405B_WM [Collision]

LLaMA405B [Collision]

LLaMA8B [TimeOut]

DeepSeek_WM [Collision]

DeepSeek [Collision]

GPT3.5 [TimeOut]

GPT4 [Collision]

TestCase-3

NaviWM [Collision]

GPT4o_WM_CoT [Collision]

GPT4o_CoT [Collision]

GPT4o_WM [Collision]

GPT4o [Collision]

LLaMA405B_WM [Collision]

LLaMA405B [Collision]

LLaMA8B [TimeOut]

DeepSeek_WM [Collision]

DeepSeek [Collision]

GPT3.5 [TimeOut]

GPT4 [TimeOut]

TestCase-4

NaviWM [Success]

GPT4o_WM_CoT [Success]

GPT4o_CoT [Collision]

GPT4o_WM [Success]

GPT4o [Success]

LLaMA405B_WM [TimeOut]

LLaMA405B [Collision]

LLaMA8B [TimeOut]

DeepSeek_WM [Collision]

DeepSeek [Collision]

GPT3.5 [TimeOut]

GPT4 [Collision]

TestCase-5

NaviWM [Success]

GPT4o_WM_CoT [Collision]

GPT4o_CoT [Success]

GPT4o_WM [Collision]

GPT4o [Collision]

LLaMA405B_WM [Collision]

LLaMA405B [Collision]

LLaMA8B [TimeOut]

DeepSeek_WM [Collision]

DeepSeek [Collision]

GPT3.5 [Collision]

GPT4 [Collision]
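
The outcome labels of the five illustrated test cases can be tallied per method. The sketch below only transcribes the labels listed above (S = Success, C = Collision, T = TimeOut, in TestCase-1 to TestCase-5 order); the names `OUTCOMES` and `success_counts` are hypothetical. These five cases are illustrations and should not be read as the aggregate results of the full experiments.

```python
# Outcomes for TestCase-1..5, in order, transcribed from the labels above.
# S = Success, C = Collision, T = TimeOut.
OUTCOMES = {
    "NaviWM":       "SSCSS",
    "GPT4o_WM_CoT": "CCCSC",
    "GPT4o_CoT":    "CCCCS",
    "GPT4o_WM":     "CCCSC",
    "GPT4o":        "CCCSC",
    "LLaMA405B_WM": "CCCTC",
    "LLaMA405B":    "CCCCC",
    "LLaMA8B":      "CTTTT",
    "DeepSeek_WM":  "CCCCC",
    "DeepSeek":     "CCCCC",
    "GPT3.5":       "CTTTC",
    "GPT4":         "CCTCC",
}


def success_counts(outcomes):
    """Count successful test cases per method."""
    return {method: seq.count("S") for method, seq in outcomes.items()}


if __name__ == "__main__":
    for method, n in success_counts(OUTCOMES).items():
        print(f"{method}: {n}/5 successes")
```

On these five cases NaviWM succeeds in four, the four GPT4o variants succeed in one each, and the remaining methods succeed in none.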

LLM Inference Procedure (NaviWM)

NaviWM

NaviWM Inference Example (Input)

Note: In the NaviWM input, an additional logical-form prompt is added to support the subsequent inference.

NaviWM Gentzen Logical Tree Guidance (D1-CoT)

GPT4o-WM-CoT Inference Example (D1_Output)

After D1 inference, if the verification step generates '[NO]', the D1-event has failed. The LLM is then guided to query, in order, the D2-event, D3-event, and D4-event, which are weaker conditions than the D1-event.

The whole inference text of one time-step is accessible via the link: [NaviWM Inference Text]

https://drive.google.com/file/d/1nwL9WX8vmgcS3ShWKDPczOLlu7eJ0yl8/view?usp=sharing

(Note: In this LLM inference example, we forced every stage from D1 to D4 to fail in order to illustrate the full inference procedure.)
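
The D1-to-D4 fallback described above amounts to trying events from the strongest condition to the weakest and stopping at the first one whose verification does not return '[NO]'. The sketch below is a hedged illustration: the function name, the `verify` callback (which stands in for an LLM query plus its self-validation step), the '[YES]' verdict used in the stub, and the choice to return `None` when every event fails are assumptions, not details given in the text.

```python
def deductive_fallback(events, verify):
    """Try events from strongest to weakest; return the first that is not rejected.

    events: ordered event labels, strongest condition first (e.g. D1..D4).
    verify: callable mapping an event label to its verification verdict string;
            hypothetical stand-in for the LLM inference and self-validation step.
    Returns (accepted_event_or_None, trace of (event, verdict) pairs).
    """
    trace = []
    for event in events:
        verdict = verify(event)
        trace.append((event, verdict))
        if verdict != "[NO]":  # '[NO]' marks a failed event in the text above
            return event, trace
    return None, trace  # assumption: the caller decides what to do if all fail


if __name__ == "__main__":
    # Stubbed verdicts for illustration only; '[YES]' is an assumed label.
    stub = {"D1": "[NO]", "D2": "[NO]", "D3": "[YES]", "D4": "[YES]"}
    accepted, trace = deductive_fallback(["D1", "D2", "D3", "D4"], stub.get)
    print(accepted, trace)
```

Returning the trace alongside the accepted event keeps the failed stages available for inspection, in the spirit of the linked inference text.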

LLM Inference Procedure (Baselines)

Baseline Inference

GPT-3.5 Inference Example

The LLM prompt input is composed of the general environmental description, the robot observation information, and a CoT (chain-of-thought) prompt.

The inference output is taken directly from the LLM.

Baseline Inference

GPT-4 Inference Example

Baseline Inference

GPT-4o Inference Example

Baseline Inference

LLaMA Inference Example

LLM Inference Procedure (Ablation Models)

Ablation Model Inference

GPT4o-WM Inference Example

Ablation Model Inference

GPT4o-woWM-CoT Inference Example (Input)

GPT4o-woWM-CoT Inference Example (Output)

Ablation Model Inference

GPT4o-WM-CoT Inference Example (Input)

GPT4o-WM-CoT Inference Example (Output)

Trajectory Comparison

TestCase-1

TestCase-2

Experiment Results

Experiment Result Table

(SR: Average Success Rate; NP: Average Navigation Path Length; NT: Average Navigation Time; UI: Total Number of Uncomfortable Interactions; HA: Total Human Activity Preference Noncompliance)
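
The five metrics can be aggregated from per-episode records roughly as follows. This is a sketch under assumptions, not the evaluation code behind the table: the `Episode` fields, the units, and the choice to average NP and NT over all episodes (rather than over successful ones only) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    outcome: str              # "Success", "Collision" or "TimeOut"
    path_length: float        # assumed unit: metres
    nav_time: float           # assumed unit: seconds
    uncomfortable: int        # uncomfortable interactions in this episode
    activity_violations: int  # human activity preference noncompliance events


def summarize(episodes):
    """Aggregate SR, NP, NT (averages) and UI, HA (totals) over episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return {
        "SR": sum(e.outcome == "Success" for e in episodes) / n,
        "NP": sum(e.path_length for e in episodes) / n,
        "NT": sum(e.nav_time for e in episodes) / n,
        "UI": sum(e.uncomfortable for e in episodes),
        "HA": sum(e.activity_violations for e in episodes),
    }
```

SR, NP, and NT are means because the legend calls them averages, while UI and HA are sums because the legend calls them totals.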

References

[1] Mead, Ross, and Maja J. Matarić. "Autonomous human–robot proxemics: socially aware navigation based on interaction potential." Autonomous Robots 41.5 (2017): 1189-1201.
