Multi-agent Multi-target Path Planning in Markov Decision Processes

August 2020 - December 2021

Inspired by autonomous vehicles operating underwater, we consider the problem of visiting a set of targets in minimum time by a team of non-communicating agents in a Markov decision process (MDP). The single-agent problem is at least NP-complete by reducing it to a Hamiltonian path problem. Even if the optimal solution to the single-agent problem is known, the multi-agent problem is NP-hard.

Nawaz, Farhad, and Melkior Ornik. "Multi-agent, multi-target path planning in Markov decision processes." IEEE Transactions on Automatic Control (2023).

4 agents visiting 40 targets in a 20x20 grid-world

CSL_poster_21_logo_A3.pdf

Preliminary work won the best poster award at the 16th CSL student conference, 2021, UIUC

Key takeaways and Contributions

An optimal algorithm for the single-agent problem based on Bellman’s optimality equation is exponential in the number of target states.
We proposed a sub-optimal algorithm that is polynomial at each time step.
Our algorithm generates optimal policies for certain classes of MDPs.
We proposed a target partitioning algorithm that approximately minimizes the expected time to visit the targets for the multi-agent scenario.
Our partitioning algorithm generates optimal partitions for clustered target scenarios.
Our algorithms perform much faster than the optimal procedure and more optimal than the currently available heuristic on random MDPs and gridworld environments inspired by ocean dynamics.

Page updated

Google Sites

Report abuse

Multi-agent Multi-target Path Planning in Markov Decision Processes

August 2020 - December 2021

4 agents visiting 40 targets in a 20x20 grid-world

Preliminary work won the best poster award at the 16th CSL student conference, 2021, UIUC

An optimal algorithm for the single-agent problem based on Bellman’s optimality equation is exponential in the number of target states.

We proposed a sub-optimal algorithm that is polynomial at each time step.

Our algorithm generates optimal policies for certain classes of MDPs.

We proposed a target partitioning algorithm that approximately minimizes the expected time to visit the targets for the multi-agent scenario.

Our partitioning algorithm generates optimal partitions for clustered target scenarios.

Our algorithms perform much faster than the optimal procedure and more optimal than the currently available heuristic on random MDPs and gridworld environments inspired by ocean dynamics.