Safe and efficient multi-agent navigation in dynamic environments remains inherently challenging, particularly when real-time decision-making is required on resource-constrained platforms. Ensuring collision-free trajectories while adapting to uncertainties without relying on pre-built maps further complicates real-world deployment. To address these challenges, we propose LSTP-Nav, a lightweight end-to-end policy framework for multi-agent navigation that enables map-free collision avoidance in complex environments by directly mapping raw LiDAR point clouds to motion commands. At the core of this framework lies LSTP-Net, an efficient network that processes raw LiDAR data with a GRU architecture enhanced by attention mechanisms, allowing it to focus dynamically on critical environmental features while minimizing computational overhead. In addition, a novel HS reward optimizes collision avoidance by incorporating angular velocity, prioritizing obstacles along the predicted heading, and improving training stability. To narrow the sim-to-real gap, we develop PhysReplay-Simlab, a physics-realistic multi-agent simulator that employs localized replay to mine near-failure experiences. Relying solely on LiDAR, LSTP-Nav achieves efficient zero-shot sim-to-real transfer on a CPU-only robotic platform, enabling robust navigation in dynamic environments while maintaining computation frequencies above 40 Hz. Extensive experiments demonstrate that LSTP-Nav outperforms baselines with a 9.58% higher success rate and a 12.30% lower collision rate, underscoring its practicality and robustness for real-world applications.
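For illustration, a minimal PyTorch-style sketch of an LSTP-Net-like policy is shown below. It only mirrors the ingredients named above (per-beam attention over a raw LiDAR scan, a GRU over consecutive scans, and a velocity-command head); the layer sizes, the goal/velocity side inputs, and the two-command output are assumptions for this sketch, not the authors' exact architecture.

```python
# Hypothetical sketch of an LSTP-Net-style policy (not the paper's exact network):
# per-beam embedding, self-attention over beams, a GRU over consecutive scans,
# and a head that outputs a linear/angular velocity command.
import torch
import torch.nn as nn

class LSTPNetSketch(nn.Module):
    def __init__(self, num_beams=720, hidden=128):
        super().__init__()
        self.beam_embed = nn.Linear(1, 32)                        # embed each range reading
        self.attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
        self.scan_proj = nn.Linear(32, hidden)                    # pooled per-scan feature
        self.gru = nn.GRU(hidden + 4, hidden, batch_first=True)   # + relative goal (2) + velocity (2), assumed inputs
        self.head = nn.Linear(hidden, 2)                          # linear and angular velocity command

    def forward(self, scans, goal_vel):
        # scans: (B, T, num_beams); goal_vel: (B, T, 4)
        B, T, N = scans.shape
        x = self.beam_embed(scans.reshape(B * T, N, 1))           # (B*T, N, 32)
        x, _ = self.attn(x, x, x)                                 # attend to critical beams
        x = self.scan_proj(x.mean(dim=1)).reshape(B, T, -1)       # (B, T, hidden)
        h, _ = self.gru(torch.cat([x, goal_vel], dim=-1))         # temporal fusion across scans
        return self.head(h[:, -1])                                # (B, 2) motion command

policy = LSTPNetSketch()
cmd = policy(torch.rand(1, 3, 720), torch.zeros(1, 3, 4))         # -> tensor of shape (1, 2)
```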
Framework of LSTP-Nav.
We demonstrate the performance of the LSTP-Nav framework, with agents and obstacles randomly placed in a 10 m × 10 m area.
1 agent & 40 obstacles
1 agent & 50 obstacles
1 agent & 60 obstacles
1 agent & 70 obstacles
10 agents & 15 obstacles
10 agents & 25 obstacles
10 agents & 35 obstacles
15 agents & 15 obstacles
1 agent & 25 obstacles
3 agents & 25 obstacles
6 agents & 25 obstacles
The green robot employs the LSTP-Net policy, while the yellow agent (controlled by a human operator) serves as the dynamic goal. The other red agents are deployed with a random walk policy, ORCA, the MLP policy, the GRU policy, the CNN policy [23], or the LSTP-Net policy.
A single agent navigates in an environment with 5 other agents and 40 obstacles.
A single agent navigates in an environment with 30 other agents and 5 obstacles.
We compare the CNN policy [23] and ablated variants (MLP and GRU policies) against the LSTP-Net policy in both single-agent and multi-agent scenarios.
CNN policy [23]
MLP policy
GRU policy
LSTP-Net policy
CNN policy [23]
MLP policy
GRU policy
LSTP-Net policy
Two-agent scenario
Four-agent scenario
Three-agent scenario
Four-agent scenario: the person walks randomly and blocks the agent's path.
[23] T. Fan, P. Long, W. Liu, and J. Pan, “Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios,” The International Journal of Robotics Research, 2020.