Abstract
In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment crafted using Dec-POSMDP and multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment.
Related Socially Aware Navigation Works from SMART-LAB
[1]. (IROS-2023) NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning
https://ieeexplore.ieee.org/document/10341395
[2]. (Under Review) SRLM: Human-in-Loop Interactive Social Robot Navigation with Large Language Model and Deep Reinforcement Learning
https://arxiv.org/pdf/2403.15648
[3]. (IROS-2022) FAPL: Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation
https://ieeexplore.ieee.org/document/9981616
Architecture of SAMARL
The framework of SAMARL is composed of four blocks: 1) Local Observation; 2) ST-graph Social Interaction Encoder; 3) Multi Robot Strategy Executor; 4) Multi Robot System Output.
Each robot feeds its local observations into the ST-graph social interaction encoder, which is built on a hybrid spatial-temporal transformer, to create spatial-temporal representations of the human-robot interaction (HRI) and robot-robot interaction (RRI) states. Within the multi-robot strategy executor block, the robot then uses these environmental dynamics features, together with the MAPPO trainer and the social norm reward function, to learn cooperative multi-robot navigation policies that adhere to social norms. Finally, the generated macro-action (MA) and local-action (LA) guide the robots in the environment.
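MAPPO builds on the clipped surrogate objective of PPO, so the policy update inside the strategy executor can be illustrated with a minimal sketch of that objective for one robot's batch of samples. The function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions, not taken from the SAMARL implementation; the centralized critic, advantage estimation, and the social norm reward are omitted.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage; clipping only takes effect once the policy moves more than `clip_eps` away from the data-collecting policy.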
Framework of Spatial-Temporal Transformer and Multi-Modal Transformer [1]
Spatial-Temporal Transformer and Multi-Modal Transformer neural network framework: (a) the Spatial Transformer applies a multi-head attention layer and a graph convolution network along the time dimension to represent spatial attention features and spatial relational features; (b) the Temporal Transformer uses multi-head attention layers to capture each individual agent’s long-term temporal attention dependencies; and (c) the Multi-Modal Transformer fuses the heterogeneous spatial and temporal features via a multi-head cross-modal transformer block [3] and a self-transformer block [4] to capture the uncertainty of multimodal crowd movements.
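The three attention stages can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions, not the SAMARL network: it uses a single head, no learned projections, and no graph convolution, and the toy `states` tensor and all variable names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention; returns (outputs, weights)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(dim)])
        weights.append(w)
    return outputs, weights

# Hypothetical toy input: states[t][n] is a 2-D feature of agent n at timestep t.
states = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]],  # t = 0
    [[0.1, 0.0], [0.9, 0.1], [0.0, 1.8]],  # t = 1
]
num_steps, num_agents = len(states), len(states[0])

# (a) Spatial attention: at each timestep, every agent attends over all agents.
spatial = [attention(frame, frame, frame)[0] for frame in states]

# (b) Temporal attention: each agent attends over its own history.
temporal = []
for n in range(num_agents):
    track = [states[t][n] for t in range(num_steps)]
    temporal.append(attention(track, track, track)[0])

# (c) Cross-modal attention: an agent's spatial features query its temporal features.
fused = []
for n in range(num_agents):
    spatial_track = [spatial[t][n] for t in range(num_steps)]
    fused.append(attention(spatial_track, temporal[n], temporal[n])[0])
```

Here `fused[n][t]` is the fused feature of agent `n` at timestep `t`. The only difference between the stages is which axis supplies the queries, keys, and values: agents within a timestep, timesteps within an agent, or one modality querying the other.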
Attention Map and Attention Matrix Visualization [1]
Attention Maps
An illustration of spatial attention maps and temporal attention maps: sub-figures (a), (b), and (c) exhibit the spatial attention maps from different agents at the same timestep; (d), (e), and (f) present the temporal attention maps from different timesteps in the same agent’s view. The radius of each circle represents the importance level from the perspective of the agent represented by the red circle.
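One possible way to turn attention weights into circle radii for such a map is a linear rescaling. The sketch below is an assumption for illustration only: the function name `attention_to_radii` and the radius bounds `r_min` and `r_max` are hypothetical, and the scaling used for the figures is not stated here.

```python
def attention_to_radii(weights, r_min=0.1, r_max=0.5):
    """Linearly map attention weights to circle radii in [r_min, r_max]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All agents are equally important: draw every circle at mid size.
        return [(r_min + r_max) / 2.0 for _ in weights]
    scale = (r_max - r_min) / (hi - lo)
    return [r_min + (w - lo) * scale for w in weights]
```

With this mapping, the agent that receives the largest attention weight is drawn with radius `r_max` and the least attended agent with `r_min`.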
Attention Matrix
An illustration of attention matrices: visualization of an example group of NaviSTAR attention weights, consisting of the spatial and temporal attention matrices output by the spatial and temporal transformers, the cross-attention matrices at the final layer of the multi-modal transformer, and the final attention matrix from the self-attention transformer.
Comparison Simulation Experiments and Trajectory Illustrations
Open Space Simulator
FoV-90° Simulator
Policy: SAMARL-PPO
Policy: SAMARL (ours)
Policy: SAMARL-SRNN
Policy: SAMARL (ours)
Learning Curve
We trained two ablation models: SAMARL-PPO, which pairs the hybrid spatial-temporal transformer with the PPO algorithm, and SAMARL-SRNN, which replaces the transformer with SRNN. Both share the same training hyperparameters as SAMARL.
More Learning Curves and Experiments of SAMARL with Different Seeds
We have also trained SAMARL with five different random seeds, each within 3e7 timesteps.
Success Rate Table based on 500 random test cases.
Social Score Table based on 500 random test cases.
Trajectory Demo (Policy: SAMARL)
Trajectory Demo2 (Policy: SAMARL)
Trajectory Demo3 (Policy: SAMARL)
Trajectory Demo4 (Policy: SAMARL)
More Testcase Visualization
Panels: test cases under the SAMARL-PPO, SAMARL-SRNN, and SAMARL policies.
Simulation Experiments and Real-world User Study Video
We have conducted real-world tests with several physical robots using the SAMARL planner, as follows.
References
[1] W. Wang, R. Wang, L. Mao and B.-C. Min, "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning," 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 11348-11355, doi: 10.1109/IROS55552.2023.10341395.
[2] R. Wang, W. Wang and B.-C. Min, "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 11336-11343, doi: 10.1109/IROS47612.2022.9981616.
[3] Y.-H. H. Tsai et al., "Multimodal Transformer for Unaligned Multimodal Language Sequences," Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 2019.
[4] A. Vaswani et al., "Attention Is All You Need," Advances in Neural Information Processing Systems 30, 2017.