SAMARL: Multi Robot Socially-aware Navigation with Multi-agent Reinforcement Learning
Weizheng Wang, Le Mao, Ruiqi Wang, and Byung-Cheol Min
To appear in ICRA 2024
[Paper]
Abstract
In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment crafted using Dec-POSMDP and multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment.
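To make the Dec-POSMDP setting mentioned in the abstract concrete, below is a minimal Python sketch of a multi-robot environment interface in which each robot receives only a partial local observation and selects macro-actions that a low-level controller unrolls over several simulation ticks. All class and method names, observation sizes, and the transition logic are hypothetical placeholders, not the paper's released environment.

```python
import numpy as np

class MultiRobotSocialNavEnv:
    """Hypothetical Dec-POSMDP-style environment interface (sketch only)."""

    def __init__(self, num_robots=3, num_humans=5):
        self.num_robots = num_robots
        self.num_humans = num_humans

    def _local_observation(self, robot_id):
        # Placeholder: own pose/velocity/goal plus nearby agents' states,
        # limited to this robot's sensing range (partial observability).
        return np.zeros(9 + 5 * self.num_humans)

    def reset(self):
        # One partial observation per robot; no shared global state.
        return [self._local_observation(i) for i in range(self.num_robots)]

    def step(self, macro_actions):
        # Each macro-action (e.g., a local waypoint) is unrolled by a
        # low-level controller for several ticks before the next decision
        # point, matching the macro-action structure of a Dec-POSMDP.
        observations = [self._local_observation(i) for i in range(self.num_robots)]
        rewards = [0.0 for _ in range(self.num_robots)]
        done = False
        return observations, rewards, done, {}
```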
Related Socially Aware Navigation Works from SMART-LAB
[1]. (IROS-2023) NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning
https://ieeexplore.ieee.org/document/10341395
[2]. (Under Review) SRLM: Human-in-Loop Interactive Social Robot Navigation with Large Language Model and Deep Reinforcement Learning
https://arxiv.org/pdf/2403.15648
[3]. (IROS-2022) FAPL: Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation
https://ieeexplore.ieee.org/document/9981616
Architecture of SAMARL
The framework of SAMARL is composed of four blocks: 1) Local Observation; 2) ST-graph Social Interaction Encoder; 3) Multi Robot Strategy Executor; 4) Multi Robot System Output.
Each robot feeds its local observations into the hybrid spatial-temporal transformer-based ST-graph social interaction encoder to create spatial-temporal state representations of human-robot interaction (HRI) and robot-robot interaction (RRI) states. The robot then leverages these environmental dynamics features to execute cooperative multi-robot navigation policies and adhere to social norms, using the MAPPO trainer and the social norm reward function within the Multi Robot Strategy Executor block. Finally, the generated macro-action (MA) and local-action (LA) guide the robots in the environment.
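As a hedged illustration of the kind of per-step shaping a social norm reward function might use, the sketch below combines progress toward the goal, a hard collision penalty, and a personal-space discomfort penalty. The weights and distance thresholds are illustrative assumptions, not the paper's coefficients; `prev_dist` lets the caller difference successive goal distances for the progress term.

```python
import numpy as np

def social_norm_reward(robot_pos, goal_pos, prev_dist, human_positions,
                       collision_radius=0.3, discomfort_radius=0.45):
    # Progress term: positive when the robot moves closer to its goal.
    dist = np.linalg.norm(goal_pos - robot_pos)
    reward = prev_dist - dist
    for h in human_positions:
        d = np.linalg.norm(h - robot_pos)
        if d < collision_radius:
            # Hard collision penalty (assumed magnitude).
            reward -= 10.0
        elif d < discomfort_radius:
            # Social-norm penalty: linear in personal-space intrusion depth.
            reward -= 0.5 * (discomfort_radius - d)
    return reward, dist
```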
Framework of Spatial-Temporal Transformer and Multi-Modal Transformer [1]
Spatial-Temporal Transformer and Multi-Modal Transformer neural network framework: (a) the Spatial Transformer leverages a multi-head attention layer and a graph convolution network at each timestep to represent spatial attention features and spatial relational features; (b) the Temporal Transformer utilizes multi-head attention layers to capture each individual agent's long-term temporal attention dependencies; and (c) the Multi-Modal Transformer fuses heterogeneous spatial and temporal features via a multi-head cross-modal transformer block [3] and a self-attention transformer block [4] to capture the uncertainty of multimodal crowd movements.
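A minimal PyTorch sketch of the blocks described in (a) and (b) follows. The layer sizes are assumptions, and the graph convolution is simplified to a degree-normalized adjacency propagation rather than the paper's exact operator.

```python
import torch
import torch.nn as nn

class SpatialTransformerBlock(nn.Module):
    """Per-timestep block: multi-head attention + simplified graph convolution."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gcn = nn.Linear(dim, dim)  # shared weights for graph propagation

    def forward(self, x, adj):
        # x: (batch, num_agents, dim) at one timestep; adj: (batch, N, N).
        attn_out, _ = self.attn(x, x, x)                   # spatial attention features
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        gcn_out = torch.relu(self.gcn((adj / deg) @ x))    # spatial relational features
        return attn_out + gcn_out

class TemporalTransformerBlock(nn.Module):
    """Per-agent block: attention over that agent's observation history."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, timesteps, dim) for one agent's trajectory.
        out, _ = self.attn(x, x, x)                        # long-term temporal dependencies
        return out
```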
Attention Map and Attention Matrix Visualization [1]
An illustration of spatial attention maps and temporal attention maps: sub-figures (a), (b), and (c) exhibit the spatial attention maps of different agents at the same timestep; (d), (e), and (f) present the temporal attention maps at different timesteps from the same agent's view. The radius of each circle represents the importance level from the perspective of the agent marked by the red circle.
An illustration of attention matrices: visualization of an example NaviSTAR attention weight group consisting of the spatial and temporal attention matrices at the output of the spatial and temporal transformers, the cross-attention matrices at the final layer of the multi-modal transformer, and the final attention matrix from the self-attention transformer.
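For readers reproducing such plots, a short matplotlib sketch for rendering an attention matrix as a labeled heatmap is shown below; the weights here are random placeholders standing in for an (N, N) matrix exported from a trained model.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_attention(attn, labels, title):
    # attn: (N, N) row-stochastic attention weights; labels: N agent names.
    fig, ax = plt.subplots()
    im = ax.imshow(attn, cmap="viridis")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_title(title)
    fig.colorbar(im, ax=ax, label="attention weight")
    fig.tight_layout()
    plt.show()

rng = np.random.default_rng(0)
attn = rng.random((6, 6))
attn /= attn.sum(axis=1, keepdims=True)  # row-normalize, like softmax output
plot_attention(attn, [f"agent {i}" for i in range(6)], "Spatial attention (placeholder)")
```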
Comparison Simulation Experiments and Trajectory Illustrations
Learning Curve
More Learning Curves and Experiments of SAMARL with Different Seeds
More Testcase Visualization
References
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017.