SAMARL: Multi-Robot Socially-Aware Navigation with Multi-Agent Reinforcement Learning


Weizheng Wang, Le Mao, Ruiqi Wang, and Byung-Cheol Min

 SMART Lab, Purdue University

To appear in ICRA 2024 

[Paper]


Abstract

In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment, formulated as a Dec-POSMDP and built for multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment.

Related Socially Aware Navigation Work from SMART Lab

[1] (IROS 2023) NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning

https://ieeexplore.ieee.org/document/10341395


[2] (IROS 2022) FAPL: Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation

https://ieeexplore.ieee.org/document/9981616


Architecture of SAMARL

The framework of SAMARL is composed of four blocks: 1) Local Observation; 2) ST-Graph Social Interaction Encoder; 3) Multi-Robot Strategy Executor; and 4) Multi-Robot System Output.

Each robot feeds its individual local observations into the hybrid spatial-temporal transformer-based ST-graph social interaction encoder to create spatial-temporal state representations of human-robot interaction (HRI) and robot-robot interaction (RRI) states. The robot then leverages these environmental dynamics features to execute cooperative multi-robot navigation policies that adhere to social norms, using the MAPPO trainer and the social norm reward function within the multi-robot strategy executor block. Finally, the generated macro-action (MA) and local-action (LA) guide the robots in the environment.
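The sketch below illustrates this per-robot data flow under simplifying assumptions: the module names, tensor shapes, attention settings, and action dimensions are placeholders chosen for illustration, not the released SAMARL implementation.

```python
# Minimal sketch of the SAMARL per-robot data flow (illustrative only).
import torch
import torch.nn as nn

class SocialInteractionEncoder(nn.Module):
    """Stand-in for the hybrid spatial-temporal transformer encoder."""
    def __init__(self, obs_dim: int, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(obs_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, num_observed_agents, obs_dim) local HRI/RRI observations
        x = self.proj(obs)
        fused, _ = self.attn(x, x, x)   # pairwise interaction features
        return fused.mean(dim=1)        # pooled spatial-temporal state

class StrategyExecutor(nn.Module):
    """Maps the fused state to a macro-action (MA) and a local-action (LA)."""
    def __init__(self, d_model: int = 64, n_macro: int = 4, la_dim: int = 2):
        super().__init__()
        self.macro_head = nn.Linear(d_model, n_macro)  # discrete subgoal choice
        self.local_head = nn.Linear(d_model, la_dim)   # continuous velocity command

    def forward(self, state: torch.Tensor):
        return self.macro_head(state), self.local_head(state)

encoder, executor = SocialInteractionEncoder(obs_dim=9), StrategyExecutor()
obs = torch.randn(1, 6, 9)             # one robot observing six agents
ma_logits, la = executor(encoder(obs))
```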

Framework of Spatial-Temporal Transformer and Multi-Modal Transformer [1]

Spatial-Temporal Transformer and Multi-Modal Transformer neural network framework: (a) the Spatial Transformer leverages a multi-head attention layer and a graph convolution network along the time dimension to represent spatial attention features and spatial relational features; (b) the Temporal Transformer utilizes multi-head attention layers to capture each individual agent’s long-term temporal attention dependencies; and (c) the Multi-Modal Transformer fuses heterogeneous spatial and temporal features via a multi-head cross-modal transformer block [3] and a self-attention transformer block [4] to abstract the uncertainty of multimodal crowd movements.
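As a rough illustration of this attention pattern (not the NaviSTAR architecture itself), the sketch below applies multi-head attention across agents within each timestep, across timesteps for each agent, and then fuses the two streams with a cross-attention layer; all dimensions and the fusion rule are assumptions.

```python
# Compact sketch of spatial, temporal, and cross-modal attention over
# trajectory features of shape (time, agents, features). Illustrative only.
import torch
import torch.nn as nn

T, N, D = 8, 5, 32                       # timesteps, agents, feature dim
x = torch.randn(T, N, D)                 # trajectory features for one scene

spatial = nn.MultiheadAttention(D, 4, batch_first=True)
temporal = nn.MultiheadAttention(D, 4, batch_first=True)
cross = nn.MultiheadAttention(D, 4, batch_first=True)

# (a) Spatial attention: agents attend to each other within each timestep.
h_s, _ = spatial(x, x, x)                # batch axis = time, sequence = agents

# (b) Temporal attention: each agent attends over its own history.
xt = x.permute(1, 0, 2)                  # (agents, time, features)
h_t, _ = temporal(xt, xt, xt)
h_t = h_t.permute(1, 0, 2)               # back to (time, agents, features)

# (c) Cross-modal fusion: spatial features query temporal features,
# standing in for the cross-modal transformer block of [3].
fused, _ = cross(h_s, h_t, h_t)
print(fused.shape)                       # torch.Size([8, 5, 32])
```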

Attention Map and Attention Matrix Visualization [1]

Attention Maps

An illustration of spatial attention maps and temporal attention maps: sub-figures (a), (b), and (c) show the spatial attention maps of different agents at the same timestep; (d), (e), and (f) present the temporal attention maps at different timesteps from the same agent’s view. The radius of each circle represents the importance level from the perspective of the ego agent, shown as the red circle.
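A minimal matplotlib sketch of this style of attention map follows; the agent positions and attention weights below are random placeholders, not model outputs.

```python
# Toy spatial attention map: circle radius encodes the ego agent's
# attention weight toward each agent; the ego agent is drawn in red.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
pos = rng.uniform(-4, 4, size=(6, 2))          # 2D positions of 6 agents
w = rng.dirichlet(np.ones(6))                  # ego agent's attention weights

fig, ax = plt.subplots()
for (x, y), wi in zip(pos, w):
    ax.add_patch(plt.Circle((x, y), radius=0.2 + 2.0 * wi,
                            color="tab:blue", alpha=0.5))
ax.add_patch(plt.Circle(tuple(pos[0]), radius=0.2, color="red"))  # ego agent
ax.set_xlim(-5, 5); ax.set_ylim(-5, 5); ax.set_aspect("equal")
ax.set_title("Spatial attention map (radius = attention weight)")
plt.show()
```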

Attention Matrix

An illustration of attention matrices: visualization of an example NaviSTAR attention weight group, consisting of the spatial and temporal attention matrices at the outputs of the spatial and temporal transformers, the cross-attention matrices at the final layer of the multi-modal transformer, and a final attention matrix from the self-attention transformer.
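Such an attention weight group can be rendered as a row of heatmaps, as in the small sketch below; the four random matrices stand in for the spatial, temporal, cross-modal, and final attention weights, and no real model output is used.

```python
# Toy visualization of an attention-matrix group as row-normalized heatmaps.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
names = ["spatial", "temporal", "cross-modal", "final"]
mats = [rng.random((6, 6)) for _ in names]
mats = [m / m.sum(axis=-1, keepdims=True) for m in mats]  # rows sum to 1

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, name, m in zip(axes, names, mats):
    im = ax.imshow(m, cmap="viridis", vmin=0, vmax=1)
    ax.set_title(name)
fig.colorbar(im, ax=axes, shrink=0.8)
plt.show()
```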

Comparison Simulation Experiments and Trajectory Illustrations

Open Space Simulator

FoV-90° Simulator

Policy: SAMARL-PPO

Policy: SAMARL (ours)

Policy: SAMARL-SRNN

Policy: SAMARL (ours)

Learning Curve

For ablation, we introduced the PPO algorithm into the hybrid spatial-temporal transformer (SAMARL-PPO) and, separately, replaced the transformer with an SRNN (SAMARL-SRNN). Both ablation models share the same training hyperparameters as SAMARL.
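A hedged sketch of how such variants could be assembled from one shared hyperparameter set is shown below; every name and value here is an illustrative assumption, not the released training configuration.

```python
# Toy variant factory: swap the trainer (MAPPO vs. PPO) or the encoder
# (spatial-temporal transformer vs. SRNN) while reusing one shared
# hyperparameter dict, mirroring the ablation design described above.
SHARED_HPARAMS = {"lr": 3e-4, "gamma": 0.99, "episodes": 20_000}  # assumed values

def build_model(variant: str) -> dict:
    encoder = "srnn" if variant == "SAMARL-SRNN" else "st-transformer"
    trainer = "ppo" if variant == "SAMARL-PPO" else "mappo"
    return {"encoder": encoder, "trainer": trainer, **SHARED_HPARAMS}

for v in ("SAMARL", "SAMARL-PPO", "SAMARL-SRNN"):
    print(v, build_model(v))
```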

More Learning Curves and Experiments of SAMARL with Different Seeds

We trained SAMARL with five different random seeds for 20,000 episodes each; within every episode, reward accumulation continues until the maximum timestep is reached rather than stopping at early termination.
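The following sketch mirrors that fixed-horizon protocol; the episode length, reward model, and environment are placeholders rather than the actual training setup.

```python
# Illustrative fixed-horizon protocol: five seeds, 20,000 episodes each,
# rewards accumulated over the full episode horizon.
import numpy as np

SEEDS, EPISODES, MAX_T = range(5), 20_000, 100  # MAX_T is an assumed horizon

def run_episode(rng: np.random.Generator) -> float:
    # Placeholder rollout: reward accrues for all MAX_T steps,
    # never cut short by an early termination condition.
    return float(rng.normal(size=MAX_T).sum())

curves = {}
for seed in SEEDS:
    rng = np.random.default_rng(seed)
    curves[seed] = [run_episode(rng) for _ in range(EPISODES)]
print({s: float(np.mean(c)) for s, c in curves.items()})
```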

Success Rate Table based on 500 random test cases.

Social Score Table based on 500 random test cases.
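Both tables follow the same evaluation recipe; a toy sketch of computing a success rate over 500 random test cases is given below, with `run_test_case` as a hypothetical stand-in for a full navigation rollout.

```python
# Toy evaluation loop: aggregate a success rate over 500 random test cases.
import random

def run_test_case(seed: int) -> bool:
    # Hypothetical stand-in: returns True when the rollout ends in success.
    return random.Random(seed).random() > 0.1

results = [run_test_case(i) for i in range(500)]
success_rate = sum(results) / len(results)
print(f"Success rate over {len(results)} cases: {success_rate:.1%}")
```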

Trajectory Demo (Policy: SAMARL)

Trajectory Demo2 (Policy: SAMARL)

More Test Case Visualizations

Policy: SAMARL-PPO

Policy: SAMARL-SRNN

Policy: SAMARL

Policy: SAMARL-PPO

Policy: SAMARL-SRNN

Policy: SAMARL

Policy: SAMARL-PPO

Policy: SAMARL-SRNN

Policy: SAMARL

Simulation Experiments and Real-World User Study Videos

We conducted real-world tests with several physical robots using the SAMARL planner, as shown in the videos below.

[Video: 1.mp4]
[Video: 2.mp4]
[Video: peter11.mp4]

References

[1] W. Wang, R. Wang, L. Mao, and B.-C. Min, "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning," 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 11348-11355, doi: 10.1109/IROS55552.2023.10341395.

[2] R. Wang, W. Wang, and B.-C. Min, "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 11336-11343, doi: 10.1109/IROS47612.2022.9981616.

[3] R. Wang et al., "Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition," arXiv preprint arXiv:2209.15182, 2022.

[4] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.