SRLM: Human-in-Loop Interactive Social Robot Navigation with Large Language Model and Deep Reinforcement Learning
Weizheng Wang,
To be submitted.
[Paper]
Abstract
An interactive social robotic assistant must provide services in complex and crowded spaces while adapting its behavior based on real-time human language commands or feedback. In this paper, we propose a novel hybrid approach called Social Robot Planner (SRLM), which integrates Large Language Models (LLM) and Deep Reinforcement Learning (DRL) to navigate through human-filled public spaces and provide multiple social services. SRLM infers global planning from human-in-loop commands in real time, and encodes social information into an LLM-based large navigation model (LNM) for low-level motion execution. Moreover, a DRL-based planner is designed to maintain benchmark performance; it is blended with the LNM by a large feedback model (LFM) to address the instability of current text- and LLM-driven navigation. Finally, SRLM demonstrates outstanding performance in extensive experiments.
Architecture of SRLM
The SRLM framework mainly comprises the following three submodels: (1) the language navigation model (LNM); (2) the reinforcement learning navigation model (RLNM); and (3) the language feedback model (LFM).
SRLM is implemented as a human-in-loop interactive social robot navigation framework that executes human commands by incorporating an LM-based planner, a feedback-based planner, and a DRL-based planner. First, users' requests or real-time feedback are processed, or replanned, into high-level task guidance for the three action executors via the LLM. Then, an image-to-text encoder and a spatio-temporal graph HRI encoder convert the robot's local observations into features that serve as LNM and RLNM inputs, which in turn generate the RL-based, LM-based, and feedback-based actions. Lastly, these three actions are adaptively fused by a low-level execution decoder to produce the robot behavior output of SRLM.
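As a concrete illustration of the fusion step, the sketch below blends the three candidate actions with a learned gating network. The module name, dimensions, and softmax-gated blending are illustrative assumptions, not the released SRLM implementation.

```python
import torch
import torch.nn as nn

class ExecutionDecoder(nn.Module):
    """Fuses RL-based, LM-based, and feedback-based actions into one command."""

    def __init__(self, action_dim: int = 2, feat_dim: int = 64):
        super().__init__()
        # Gating network: maps the concatenated candidate actions to
        # softmax weights, one per planner (an assumed fusion scheme).
        self.gate = nn.Sequential(
            nn.Linear(3 * action_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 3),
        )

    def forward(self, a_rl, a_lm, a_fb):
        # Each a_*: (batch, action_dim) candidate action from one planner.
        candidates = torch.stack([a_rl, a_lm, a_fb], dim=1)        # (B, 3, A)
        weights = torch.softmax(self.gate(candidates.flatten(1)), dim=-1)
        # Weighted blend of the three candidate actions.
        return (weights.unsqueeze(-1) * candidates).sum(dim=1)    # (B, A)

decoder = ExecutionDecoder()
action = decoder(torch.randn(1, 2), torch.randn(1, 2), torch.randn(1, 2))
print(action.shape)  # torch.Size([1, 2])
```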
LNM Prompt Details
The prompt engineering of the LNM comprises the task description, global guidance, data annotation, initialization, historical data, additional information, and the encoded state, and directly generates low-level robot actions.
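For illustration, the sketch below assembles these components into a single prompt string. The section labels, state encoding, and JSON action format are assumptions about a plausible template, not the paper's exact prompt.

```python
# Hypothetical LNM prompt template covering the components listed above.
LNM_PROMPT_TEMPLATE = """\
[Task Description] You are a social robot navigating a crowded public space.
[Global Guidance] {guidance}
[Data Annotation] Actions are (v, w): linear and angular velocity in SI units.
[Initialization] Robot starts at {start}; the goal is {goal}.
[Historical Data] Last {k} state-action pairs: {history}
[Additional Information] {extra}
[Encoded State] {state}
Respond with a single low-level action as JSON: {{"v": <m/s>, "w": <rad/s>}}.
"""

prompt = LNM_PROMPT_TEMPLATE.format(
    guidance="Guide the user to the information desk.",
    start="(0.0, 0.0)", goal="(5.0, 3.0)", k=4,
    history="[((0.1, 0.2), (0.4, 0.0)), ...]",
    extra="Two pedestrians approaching from the left.",
    state="robot: (1.2, 0.8, 0.3 m/s); humans: [(2.0, 1.1), (2.5, 0.4)]",
)
print(prompt)
```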
RLNM Details
The RLNM is composed of two parts: 1) a spatial-temporal graph transformer block, and 2) a multi-modal transformer block. These blocks abstract environmental dynamics and human-robot interactions into an ST-graph for safe path planning in crowd-filled environments. The spatial transformer captures hybrid spatial interactions and generates spatial attention maps, while the temporal transformer represents long-term temporal dependencies and creates temporal attention maps. The multi-modal transformer adapts to the uncertainty of multi-modal crowd movements, aggregating all heterogeneous spatial and temporal features. Finally, the planner generates the next-timestep action through a decoder.
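The sketch below illustrates the factored attention pattern described above: spatial attention across agents within each timestep, followed by temporal attention along each agent's trajectory. The layer sizes, the use of PyTorch MultiheadAttention, and the omission of the multi-modal block are simplifying assumptions.

```python
import torch
import torch.nn as nn

class STGraphTransformer(nn.Module):
    """Spatial attention over agents per timestep, temporal attention per agent."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (T, N, d) -- T timesteps, N agents (robot + humans), d features.
        # Spatial attention: agents attend to each other within each timestep
        # (T acts as the batch dimension, N as the sequence).
        s, _ = self.spatial(x, x, x)                  # (T, N, d)
        # Temporal attention: each agent attends over its own trajectory
        # (N acts as the batch dimension, T as the sequence).
        t_in = s.transpose(0, 1)                      # (N, T, d)
        t, _ = self.temporal(t_in, t_in, t_in)        # (N, T, d)
        return t.transpose(0, 1)                      # (T, N, d)

enc = STGraphTransformer()
feats = enc(torch.randn(8, 5, 64))  # 8 timesteps, 5 agents
print(feats.shape)  # torch.Size([8, 5, 64])
```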
LFM Details
The LFM reconciles the outputs of the LNM and RLNM to stabilize the final mixed action. Its graph-of-thought (GoT) construction evaluates and scores the two candidate executions, generating additional evidence and chains of intermediate steps from different perspectives.
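As a minimal sketch of this arbitration step, the function below blends the two candidate actions using confidence scores such as a GoT-style evaluator might assign. The score range, the linear blending rule, and the fallback behavior are illustrative assumptions.

```python
def blend_actions(a_lnm, a_rlnm, score_lnm: float, score_rlnm: float):
    """Blend two 2D actions by evaluator confidence scores (each in [0, 1])."""
    total = score_lnm + score_rlnm
    if total == 0:  # assumed fallback: keep the DRL action if both are rejected
        return a_rlnm
    w = score_lnm / total
    return tuple(w * x + (1 - w) * y for x, y in zip(a_lnm, a_rlnm))

# Example: the evaluator trusts the DRL action more in a dense crowd.
action = blend_actions((0.5, 0.1), (0.3, -0.2), score_lnm=0.4, score_rlnm=0.8)
print(action)  # approximately (0.3667, -0.1)
```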
Learning Curve
Comparison Simulation Experiments and Trajectory Illustrations
More Learning Curves and Experiments of SRLM with Different Seeds
More Testcase Visualization
Simulation Experiments and Real-world User Study Video