Best Paper Award at ICRA 2025 Workshop on Advances in Social Navigation
Under review at IEEE Transactions on Automation Science and Engineering (T-ASE)
Shuijing Liu1, Haochen Xia*,2, Fatemeh Cheraghi Pouria*,2, Kaiwen Hong2,
Neeloy Chakraborty2, Zichao Hu1, Joydeep Biswas1, and Katherine Driggs-Campbell2
1UT Austin, 2University of Illinois Urbana-Champaign, * Equal contribution
Indoor deployment with dense crowds
Outdoor deployment with varying crowd densities
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment and propose a heterogeneous spatio-temporal (st) graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change.
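As a rough illustration of the heterogeneous st-graph described above, here is a minimal Python sketch (not the released code; all field names, shapes, and edge conventions are assumptions): typed nodes for the robot, humans, and obstacles, connected by typed edges for the distinct interaction types.

# Illustrative sketch of a heterogeneous st-graph at one time step:
# typed nodes (robot, humans, obstacle points) and typed edges for
# robot-human (RH), human-human (HH), and obstacle-agent (OA) interactions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HeteroSTGraph:
    robot_state: List[float]                     # e.g., [px, py, vx, vy, gx, gy]
    human_states: List[List[float]]              # one [px, py, vx, vy] per human
    obstacle_points: List[Tuple[float, float]]   # sampled points on static obstacles
    edges: List[Tuple[str, int, int]] = field(default_factory=list)  # (type, src, dst)

    def build_edges(self):
        """Connect the robot to every human (RH), humans to each other (HH),
        and every obstacle point to every agent (OA)."""
        self.edges.clear()
        n_humans = len(self.human_states)
        for i in range(n_humans):
            self.edges.append(("RH", 0, i))           # robot -> human i
            for j in range(n_humans):
                if i != j:
                    self.edges.append(("HH", i, j))   # human i -> human j
        for k in range(len(self.obstacle_points)):
            self.edges.append(("OA", k, 0))           # obstacle point k -> robot
            for i in range(n_humans):
                self.edges.append(("OA", k, i))       # obstacle point k -> human i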
Main contributions:
A structured input representation that splits humans and static obstacles;
A heterogeneous spatio-temporal graph transformer (HEIGHT) as the robot policy network.
Below, let's see why these two components matter for robot navigation in crowded and constrained environments.
Question 1: Why do we split humans and static obstacles in the input representation?
Let's see what happens when previous works don't do this:
Methods that mix human and obstacle input representations
In contrast, we split humans and obstacles into different representations, which improves the results (a minimal sketch of this split follows the videos below):
Methods that distinguish human and obstacle input representations
HEIGHT (ours), HH
(Detour & Success)
HEIGHT (ours)
(Success)
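For illustration, here is a minimal sketch of what such a split observation could look like; the field names and shapes are assumptions, not the paper's exact interface. Humans stay as per-agent state vectors, while static obstacles are kept as a separate point set instead of being mixed into one flat vector.

import numpy as np

# Hypothetical structured observation: humans and static obstacles are kept
# in separate arrays with different shapes and semantics, rather than mixed
# into a single flat vector.
def make_observation(robot, humans, obstacle_points):
    return {
        "robot":     np.asarray(robot, dtype=np.float32),            # (6,)   px, py, vx, vy, gx, gy
        "humans":    np.asarray(humans, dtype=np.float32),           # (N, 4) px, py, vx, vy per human
        "obstacles": np.asarray(obstacle_points, dtype=np.float32),  # (M, 2) sampled obstacle points
    }

obs = make_observation(
    robot=[0.0, 0.0, 0.0, 0.0, 4.0, 4.0],
    humans=[[1.0, 1.5, -0.2, 0.0], [2.5, 0.5, 0.0, 0.3]],
    obstacle_points=[[3.0, 1.0], [3.0, 1.2], [3.0, 1.4]],
)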
Question 2: Why do we use three different sub-networks for human-robot, human-human, and obstacle-agent interactions?
Again, let's compare previous works and ablation models that use a single network for all interactions against ours (a sketch of the three sub-networks follows the videos below):
Learning-based methods that do not distinguish different types of interactions
Learning-based methods that distinguish different types of interactions
No attn
(Timeout)
RH
(Human collision)
HH
(Human collision)
Ours
(Success)
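To make the comparison concrete, here is a minimal PyTorch sketch of a policy with three separate attention sub-networks, one per interaction type, followed by a recurrent unit that tracks the scene over time. The dimensions, module names, and attention ordering below are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn

class HeteroAttentionPolicy(nn.Module):
    """Sketch: one attention block per interaction type, then a GRU over time."""
    def __init__(self, d_model=64, n_heads=4, action_dim=2):
        super().__init__()
        self.robot_emb = nn.Linear(6, d_model)   # robot state -> embedding
        self.human_emb = nn.Linear(4, d_model)   # per-human state -> embedding
        self.obst_emb = nn.Linear(2, d_model)    # per-obstacle point -> embedding
        # Separate attention modules instead of a single shared one.
        self.hh_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.oa_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.rh_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)  # temporal tracking
        self.policy_head = nn.Linear(d_model, action_dim)

    def forward(self, robot, humans, obstacles, hidden=None):
        r = self.robot_emb(robot).unsqueeze(1)     # (B, 1, d)
        h = self.human_emb(humans)                 # (B, N, d)
        o = self.obst_emb(obstacles)               # (B, M, d)
        h, hh_w = self.hh_attn(h, h, h)            # human-human interactions
        r, oa_w = self.oa_attn(r, o, o)            # obstacle-robot interactions
        r, rh_w = self.rh_attn(r, h, h)            # robot-human interactions
        out, hidden = self.gru(r, hidden)          # track changes over time
        action = self.policy_head(out[:, -1])
        return action, hidden, {"HH": hh_w, "RH": rh_w, "OA": oa_w}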
Visualization of attention weights of HEIGHT:
HH attention with values > 0.1
RH attention with values > 0.1
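The clips above only draw attention edges whose weights exceed 0.1. A hypothetical snippet (reusing the attention weights returned by the sketch above) could filter them like this:

import torch

def strong_edges(attn_weights, threshold=0.1):
    """Return (query_idx, key_idx, weight) for attention weights above the threshold."""
    idx = (attn_weights > threshold).nonzero(as_tuple=False)
    return [(int(q), int(k), float(attn_weights[q, k])) for q, k in idx]

# Example with a 3x3 human-human attention matrix.
hh_w = torch.tensor([[0.70, 0.25, 0.05],
                     [0.08, 0.80, 0.12],
                     [0.30, 0.05, 0.65]])
print(strong_edges(hh_w))   # keeps only the edges drawn in the visualization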
Failure cases of ours in simulation:
The black human suddenly turns toward the robot
Humans and obstacles block almost all paths to the goal
A playlist with all videos can be found at https://www.youtube.com/playlist?list=PLL4IPhbfiY3ZjXE6wwfg0nffFr_GLtwee.
In a narrow hallway:
Works with both dynamic and static humans
Traverses through extremely narrow passageways
In a crowded lounge:
Works with a person who overreacts to the robot, and with continuous human flows
Works with aggressive people who ignore the robot, and with continuous human flows
The robot can throw away trash for you!
Failure cases of ours in real-world:
Fails when all paths are blocked by humans
Fails with adversarial humans and undetectable obstacles
@article{liu2024height,
title={HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments},
author={Shuijing Liu and Haochen Xia and Fatemeh Cheraghi Pouria and Kaiwen Hong and Neeloy Chakraborty and Katherine Driggs-Campbell},
journal={arXiv preprint arXiv:2411.12150},
year={2024}
}