MAPF-GPT-DDG: Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, in which multiple homogeneous robots move simultaneously in a shared environment. Solving MAPF optimally has been proven to be NP-hard, yet scalable and efficient solvers are vital for real-world applications such as logistics and search-and-rescue. To meet this need, decentralized suboptimal MAPF solvers that leverage machine learning have emerged. Building on the success of the recently introduced MAPF-GPT, a pure imitation learning solver, we introduce MAPF-GPT-DDG. This novel approach effectively fine-tunes the pre-trained MAPF model using centralized expert data. Leveraging a novel delta-data generation mechanism, MAPF-GPT-DDG accelerates training while significantly improving performance at test time. Our experiments demonstrate that MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, in solution quality across many testing scenarios. Remarkably, it can handle MAPF instances involving up to 1 million agents in a single environment, setting a new milestone for scalability in the MAPF domain.

General Scheme

This figure illustrates how the Delta Data Generation (DDG) method works in two stages:

Data Generation: MAPF instances are solved using the current policy. When a degradation in solution quality is detected from the difference in solution cost, an expert solver is run on the instance, starting from the state with the poorest solution quality. The collected observation-action pairs are added to the training data.
Policy Improvement: The model is fine-tuned on both the original and newly collected data to avoid forgetting and improve generalization.
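The data-generation stage depends on locating where a policy rollout goes wrong. The following minimal Python sketch rests on stated assumptions. The names `delta_profile` and `pick_restart_state` are hypothetical. The expert's cost-to-go is assumed to be known for every state the policy visits, for example by running the expert from each of them. The degradation score ("delta") of a state is assumed to be the cost the policy has already paid plus the expert's cost-to-go from that state, measured relative to the expert solving the instance from the start. This is one plausible reading of the description above, not the authors' implementation.

```python
from typing import List, Optional, Sequence


def delta_profile(incurred: Sequence[float],
                  expert_ctg: Sequence[float]) -> List[float]:
    """Degradation score ("delta") for each state of a policy rollout.

    Assumed inputs (hypothetical, for illustration):
      incurred[t]   - cost the policy has already paid on reaching state t
      expert_ctg[t] - cost of the expert's solution starting from state t
    delta[t] measures how much worse "follow the policy up to t, then hand
    over to the expert" is than letting the expert act from the start.
    """
    if not incurred or len(incurred) != len(expert_ctg):
        raise ValueError("need one incurred cost and one expert cost per state")
    baseline = incurred[0] + expert_ctg[0]
    return [c + g - baseline for c, g in zip(incurred, expert_ctg)]


def pick_restart_state(deltas: Sequence[float],
                       threshold: float = 0.0) -> Optional[int]:
    """Index of the first state with the largest delta, or None if no degradation."""
    worst = max(deltas)
    if worst <= threshold:
        return None  # the policy matches the expert here; nothing to collect
    return list(deltas).index(worst)
```

Take `incurred = [0, 4, 8, 12, 16]` and `expert_ctg = [20, 16, 14, 10, 6]` as an example. The deltas are `[0, 0, 2, 2, 2]`, so the policy lost 2 units of cost between steps 1 and 2. The sketch therefore restarts the expert from step 2, the first state at which the degradation reaches its maximum.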

The loop repeats, gradually making the model more robust.
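The two stages and the repeat loop can be sketched as an illustrative Python skeleton, not the authors' training code. Several names are hypothetical stand-ins:

- `collect_pairs` runs the current policy on an instance and returns expert observation-action pairs when a degradation is found.
- `train_step` performs one fine-tuning update.
- `collected_share` sets the fraction of each batch drawn from the new data.

Mixing original and newly collected pairs in every batch is one simple way to realize fine-tuning "on both the original and newly collected data".

```python
import random
from typing import Callable, List, Sequence, Tuple

# An (observation, action) pair; concrete types are left open in this sketch.
Pair = Tuple[object, object]


def mixed_batch(original: Sequence[Pair], collected: Sequence[Pair],
                batch_size: int, collected_share: float,
                rng: random.Random) -> List[Pair]:
    """Sample a batch (with replacement) mixing original and collected pairs.

    Keeping original pairs in every batch is meant to limit forgetting;
    collected_share is a hypothetical knob, not a value from the paper.
    """
    n_new = round(batch_size * collected_share) if collected else 0
    batch = (rng.choices(list(collected), k=n_new)
             + rng.choices(list(original), k=batch_size - n_new))
    rng.shuffle(batch)
    return batch


def active_fine_tuning(instances: Sequence[object],
                       collect_pairs: Callable[[object], List[Pair]],
                       train_step: Callable[[List[Pair]], None],
                       original: Sequence[Pair],
                       iterations: int, steps_per_iter: int,
                       batch_size: int = 32, collected_share: float = 0.5,
                       seed: int = 0) -> List[Pair]:
    """Alternate data generation and policy improvement; return all collected pairs."""
    rng = random.Random(seed)
    collected: List[Pair] = []
    for _ in range(iterations):
        # Stage 1 (data generation): roll out the current policy on each
        # instance; collect_pairs returns expert pairs only where it degrades.
        for instance in instances:
            collected.extend(collect_pairs(instance))
        # Stage 2 (policy improvement): fine-tune on original + collected data.
        for _ in range(steps_per_iter):
            train_step(mixed_batch(original, collected, batch_size,
                                   collected_share, rng))
    return collected
```

`collect_pairs` is called again after each round of `train_step` updates. As a result, every round of data generation uses the most recently fine-tuned policy, which mirrors the loop described above.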

Experimental Evaluation

Citation

@article{andreychuk2025advancing,
  title={Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning},
  author={Anton Andreychuk and Konstantin Yakovlev and Aleksandr Panov and Alexey Skrynnik},
  journal={arXiv preprint arXiv:2506.23793},
  year={2025},
  url={https://arxiv.org/abs/2506.23793}
}
