Safe and efficient multi-agent navigation in dynamic environments remains inherently challenging, particularly when real-time decision-making is required on resource-constrained platforms. Ensuring collision-free trajectories while adapting to uncertainties without relying on pre-built maps further complicates real-world deployment. To address these challenges, we propose LSTP-Nav, a lightweight end-to-end policy framework for multi-agent navigation that enables map-free collision avoidance in complex environments by directly mapping raw LiDAR point clouds to motion commands. At the core of this framework lies LSTP-Net, an efficient network that processes raw LiDAR data with a GRU architecture enhanced by attention mechanisms, allowing it to focus dynamically on critical environmental features while minimizing computational overhead. In addition, a novel HS reward optimizes collision avoidance by incorporating angular velocity, prioritizing obstacles along the predicted heading, and improving training stability. To narrow the sim-to-real gap, we develop PhysReplay-Simlab, a physics-realistic multi-agent simulator that employs localized replay to mine near-failure experiences. Relying solely on LiDAR, LSTP-Nav achieves efficient zero-shot sim-to-real transfer on a CPU-only robotic platform, enabling robust navigation in dynamic environments while maintaining computation frequencies above 40 Hz. Extensive experiments demonstrate that LSTP-Nav outperforms baselines with a 9.58% higher success rate and a 12.30% lower collision rate, underscoring its practicality and robustness for real-world applications.
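For illustration, a minimal PyTorch-style sketch of an LSTP-Net-like policy is shown below. It only mirrors the ingredients named above (per-beam attention over a raw LiDAR scan, a GRU over consecutive scans, and a velocity-command head); the layer sizes, the goal/velocity side inputs, and the two-command output are assumptions for this sketch, not the authors' exact architecture.

```python
# Hypothetical sketch of an LSTP-Net-style policy (not the paper's exact network):
# per-beam embedding, self-attention over beams, a GRU over consecutive scans,
# and a head that outputs a linear/angular velocity command.
import torch
import torch.nn as nn

class LSTPNetSketch(nn.Module):
    def __init__(self, num_beams=720, hidden=128):
        super().__init__()
        self.beam_embed = nn.Linear(1, 32)                        # embed each range reading
        self.attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
        self.scan_proj = nn.Linear(32, hidden)                    # pooled per-scan feature
        self.gru = nn.GRU(hidden + 4, hidden, batch_first=True)   # + relative goal (2) + velocity (2), assumed inputs
        self.head = nn.Linear(hidden, 2)                          # linear and angular velocity command

    def forward(self, scans, goal_vel):
        # scans: (B, T, num_beams); goal_vel: (B, T, 4)
        B, T, N = scans.shape
        x = self.beam_embed(scans.reshape(B * T, N, 1))           # (B*T, N, 32)
        x, _ = self.attn(x, x, x)                                 # attend to critical beams
        x = self.scan_proj(x.mean(dim=1)).reshape(B, T, -1)       # (B, T, hidden)
        h, _ = self.gru(torch.cat([x, goal_vel], dim=-1))         # temporal fusion across scans
        return self.head(h[:, -1])                                # (B, 2) motion command

policy = LSTPNetSketch()
cmd = policy(torch.rand(1, 3, 720), torch.zeros(1, 3, 4))         # -> tensor of shape (1, 2)
```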
Framework of LSTP-Nav.
We demonstrate the performance of the LSTP-Nav framework, with agents and obstacles randomly placed in a 10 m × 10 m area.
1 agent & 40 obstacles
1 agent & 50 obstacles
1 agent & 60 obstacles
1 agent & 70 obstacles
10 agents & 15 obstacles
10 agents & 25 obstacles
10 agents & 35 obstacles
15 agents & 15 obstacles
1 agent & 25 obstacles
3 agents & 25 obstacles
6 agents & 25 obstacles
The green robot employs the LSTP-Net policy, while the yellow agent (controlled by a human operator) serves as the dynamic goal. The other red agents are deployed with a random walk policy, ORCA, the MLP policy, the GRU policy, the CNN policy [23], or the LSTP-Net policy.
A single agent navigates in an environment with 5 other agents and 40 obstacles.
A single agent navigates in an environment with 30 other agents and 5 obstacles.
We compare the CNN policy [23] and ablated variants (MLP and GRU policies) against the LSTP-Net policy in both single-agent and multi-agent scenarios.
CNN policy [23]
MLP policy
GRU policy
LSTP-Net policy
CNN policy [23]
MLP policy
GRU policy
LSTP-Net policy
Two-agent scenario
Four-agent scenario
Three-agent scenario
Four-agent scenario: the person walks randomly and blocks the agent's path.
[23] T. Fan, P. Long, W. Liu, and J. Pan, “Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,” The International Journal of Robotics Research, 2020.