Best Paper Award at ICRA 2025 Workshop on Advances in Social Navigation
Under review at IEEE Transactions on Automation Science and Engineering (T-ASE)
Shuijing Liu1, Haochen Xia*,2, Fatemeh Cheraghi Pouria*,2, Kaiwen Hong2,
Neeloy Chakraborty2, Zichao Hu1, Joydeep Biswas1, and Katherine Driggs-Campbell2
1UT Austin, 2University of Illinois Urbana-Champaign, * Equal contribution
Indoor deployment with dense crowds
Outdoor deployment with varying crowd densities
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment and propose a heterogeneous spatio-temporal (st) graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change.
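As a rough illustration of the heterogeneous st-graph described above, here is a minimal Python sketch (not the released code; all field names, shapes, and edge conventions are assumptions): typed nodes for the robot, humans, and obstacles, connected by typed edges for the distinct interaction types.

# Illustrative sketch of a heterogeneous st-graph at one time step:
# typed nodes (robot, humans, obstacle points) and typed edges for
# robot-human (RH), human-human (HH), and obstacle-agent (OA) interactions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HeteroSTGraph:
    robot_state: List[float]                     # e.g., [px, py, vx, vy, gx, gy]
    human_states: List[List[float]]              # one [px, py, vx, vy] per human
    obstacle_points: List[Tuple[float, float]]   # sampled points on static obstacles
    edges: List[Tuple[str, int, int]] = field(default_factory=list)  # (type, src, dst)

    def build_edges(self):
        """Connect the robot to every human (RH), humans to each other (HH),
        and every obstacle point to every agent (OA)."""
        self.edges.clear()
        n_humans = len(self.human_states)
        for i in range(n_humans):
            self.edges.append(("RH", 0, i))           # robot -> human i
            for j in range(n_humans):
                if i != j:
                    self.edges.append(("HH", i, j))   # human i -> human j
        for k in range(len(self.obstacle_points)):
            self.edges.append(("OA", k, 0))           # obstacle point k -> robot
            for i in range(n_humans):
                self.edges.append(("OA", k, i))       # obstacle point k -> human i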
Main contributions:
A structured input representation that splits humans and static obstacles;
A heterogeneous spatio-temporal graph transformer (HEIGHT) as the robot policy network.
Below, let's see why these two components matter for robot navigation in crowded and constrained environments.
Question 1: Why do we split humans and static obstacles in the input representation?
Let's see what happens when previous works don't do this:
Methods that mix human and obstacle input representations
In contrast, we split humans and obstacles into different representations, which improves the results (a minimal sketch of this split follows the videos below):
Methods that distinguish human and obstacle input representations
HEIGHT (ours), HH
(Detour & Success)
HEIGHT (ours)
(Success)
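For illustration, here is a minimal sketch of what such a split observation could look like; the field names and shapes are assumptions, not the paper's exact interface. Humans stay as per-agent state vectors, while static obstacles are kept as a separate point set instead of being mixed into one flat vector.

import numpy as np

# Hypothetical structured observation: humans and static obstacles are kept
# in separate arrays with different shapes and semantics, rather than mixed
# into a single flat vector.
def make_observation(robot, humans, obstacle_points):
    return {
        "robot":     np.asarray(robot, dtype=np.float32),            # (6,)   px, py, vx, vy, gx, gy
        "humans":    np.asarray(humans, dtype=np.float32),           # (N, 4) px, py, vx, vy per human
        "obstacles": np.asarray(obstacle_points, dtype=np.float32),  # (M, 2) sampled obstacle points
    }

obs = make_observation(
    robot=[0.0, 0.0, 0.0, 0.0, 4.0, 4.0],
    humans=[[1.0, 1.5, -0.2, 0.0], [2.5, 0.5, 0.0, 0.3]],
    obstacle_points=[[3.0, 1.0], [3.0, 1.2], [3.0, 1.4]],
)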
Question 2: Why do we use three different sub-networks for human-robot, human-human, and obstacle-agent interactions?
Again, let's compare previous works and ablation models that use a single network for all interactions against ours (a sketch of the three sub-networks follows the videos below):
Learning-based methods that do not distinguish different types of interactions
Learning-based methods that distinguish different types of interactions
No attn
(Timeout)
RH
(Human collision)
HH
(Human collision)
Ours
(Success)
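To make the comparison concrete, here is a minimal PyTorch sketch of a policy with three separate attention sub-networks, one per interaction type, followed by a recurrent unit that tracks the scene over time. The dimensions, module names, and attention ordering below are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn

class HeteroAttentionPolicy(nn.Module):
    """Sketch: one attention block per interaction type, then a GRU over time."""
    def __init__(self, d_model=64, n_heads=4, action_dim=2):
        super().__init__()
        self.robot_emb = nn.Linear(6, d_model)   # robot state -> embedding
        self.human_emb = nn.Linear(4, d_model)   # per-human state -> embedding
        self.obst_emb = nn.Linear(2, d_model)    # per-obstacle point -> embedding
        # Separate attention modules instead of a single shared one.
        self.hh_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.oa_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.rh_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)  # temporal tracking
        self.policy_head = nn.Linear(d_model, action_dim)

    def forward(self, robot, humans, obstacles, hidden=None):
        r = self.robot_emb(robot).unsqueeze(1)     # (B, 1, d)
        h = self.human_emb(humans)                 # (B, N, d)
        o = self.obst_emb(obstacles)               # (B, M, d)
        h, hh_w = self.hh_attn(h, h, h)            # human-human interactions
        r, oa_w = self.oa_attn(r, o, o)            # obstacle-robot interactions
        r, rh_w = self.rh_attn(r, h, h)            # robot-human interactions
        out, hidden = self.gru(r, hidden)          # track changes over time
        action = self.policy_head(out[:, -1])
        return action, hidden, {"HH": hh_w, "RH": rh_w, "OA": oa_w}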
Visualization of attention weights of HEIGHT:
HH attention with values > 0.1
RH attention with values > 0.1
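The clips above only draw attention edges whose weights exceed 0.1. A hypothetical snippet (reusing the attention weights returned by the sketch above) could filter them like this:

import torch

def strong_edges(attn_weights, threshold=0.1):
    """Return (query_idx, key_idx, weight) for attention weights above the threshold."""
    idx = (attn_weights > threshold).nonzero(as_tuple=False)
    return [(int(q), int(k), float(attn_weights[q, k])) for q, k in idx]

# Example with a 3x3 human-human attention matrix.
hh_w = torch.tensor([[0.70, 0.25, 0.05],
                     [0.08, 0.80, 0.12],
                     [0.30, 0.05, 0.65]])
print(strong_edges(hh_w))   # keeps only the edges drawn in the visualization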
Failure cases of ours in simulation:
The black human suddenly turns toward the robot
Humans and obstacles block almost all paths to the goal
A playlist with all videos can be found at https://www.youtube.com/playlist?list=PLL4IPhbfiY3ZjXE6wwfg0nffFr_GLtwee.
In a narrow hallway:
Works with both dynamic and static humans
Traverses through extremely narrow passageways
In a crowded lounge:
Works with a person who overreacts to the robot, and with continuous human flows
Works with aggressive people who ignore the robot, and with continuous human flows
The robot can throw away trash for you!
Failure cases of ours in real-world:
Fails when all paths are blocked by humans
Fails with adversarial humans and undetectable obstacles
@article{liu2024height,
title={HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments},
author={Shuijing Liu and Haochen Xia and Fatemeh Cheraghi Pouria and Kaiwen Hong and Neeloy Chakraborty and Katherine Driggs-Campbell},
journal={arXiv preprint arXiv:2411.12150},
year={2024}
}