Xinglong Zhang, Wei Pan, Cong Li, Xin Xu, Xiangke Wang, Ronghua Zhang, Dewen Hu
Motivation and Contributions
(A) In nonlinear DMPC, the optimization problems are usually solved with nonlinear programming solvers, which can be computationally intensive, especially for large-scale MRSs. (B) Our approach generates closed-loop DMPC policies through distributed policy learning; the learned policies are parameterized functions that can be trained and deployed online.
Distributed model predictive control (DMPC) is a promising means of achieving optimal cooperative control in multi-robot systems (MRSs). However, real-time DMPC implementations rely on numerical optimization tools to periodically compute local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRSs.
In this article, we propose a novel distributed learning-based predictive control (DLPC) framework for scalable multi-robot control. Unlike conventional DMPC methods that compute open-loop control sequences, our approach centers on a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRSs without using numerical solvers. Policy learning is executed incrementally and forward in time within each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with a closed-loop stability guarantee. The learned control policies can be deployed online to MRSs of varying scales, enhancing scalability and transferability for large-scale MRSs.
Furthermore, we extend our methodology to tackle the multi-robot safe learning challenge through a force-field-inspired policy learning approach. We validate the effectiveness, scalability, and efficiency of our approach through extensive experiments on cooperative tasks with large-scale wheeled robots and multirotor drones. Our results demonstrate rapid learning and deployment of DMPC policies for MRSs with up to 10,000 robots.
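As a concrete illustration of the force-field idea, the snippet below superposes an inverse-square repulsive term, in the spirit of classical artificial potential fields, on a learned policy's output whenever a neighbor enters a safety radius. This is a minimal sketch: the function name, gains, and safety radius are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def repulsion(i, x, r_safe=0.6, gain=0.5):
    """Repulsive force on robot i from any neighbour inside the safety radius."""
    f = np.zeros(2)
    for j in range(len(x)):
        if j == i:
            continue
        diff = x[i] - x[j]
        dist = np.linalg.norm(diff)
        if dist < r_safe:  # force field is only active inside the safety radius
            f += gain * (1.0 / dist - 1.0 / r_safe) * diff / dist**2
    return f

x = np.array([[0.0, 0.0], [0.3, 0.0], [2.0, 1.0]])  # robot 1 is too close to robot 0
u_learned = np.zeros(2)                             # stand-in for the learned policy output
u_safe = u_learned + repulsion(0, x)                # pushes robot 0 away from robot 1
print(u_safe)
```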
Technical approach
The DMPC optimization problem over each prediction horizon is decomposed into sequential subproblems and solved through policy learning. The control policy of each robot is a parameterized function of its neighbors' states and is updated incrementally and forward in time via an efficient distributed actor-critic implementation.
A: A sketch of the distributed actor-critic learning algorithm over the prediction interval [k, k+N-1], applied to the formation control of wheeled robots or multirotor drones. B: The learned control policy has an explicit structure, and a policy generated with 2 robots can be deployed online to 1,000 robots through weight sharing.
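To make the idea in panel A concrete, here is a minimal, self-contained Python sketch of a receding-horizon distributed actor-critic for three single-integrator robots holding a triangle formation. The linear policy u_i = W_i e_i on the neighbor error, the quadratic critic V_i(e) = eᵀP_i e, the cost weights, and the neighbor-frozen one-step prediction are all simplifying assumptions made for illustration; this is not the released implementation.

```python
import numpy as np

np.random.seed(0)
n, dt, N, rounds = 3, 0.1, 8, 50                    # robots, step, horizon, closed-loop steps
d = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])  # desired triangle formation
Q, R, lr_a, lr_c = 1.0, 0.1, 0.02, 0.01             # cost weights, actor/critic rates
x = d + 0.5 * np.random.randn(n, 2)                 # perturbed initial positions
W = [0.1 * np.eye(2) for _ in range(n)]             # actor of robot i:  u_i = W_i @ e_i
P = [np.eye(2) for _ in range(n)]                   # critic of robot i: V_i(e) = e @ P_i @ e

def err(i, x):
    """Formation error of robot i w.r.t. its neighbours (complete graph here)."""
    return np.mean([(x[j] - d[j]) - (x[i] - d[i]) for j in range(n) if j != i], axis=0)

for k in range(rounds):                             # receding-horizon outer loop
    xt = x.copy()
    for t in range(N):                              # learn forward in time over [k, k+N-1]
        for i in range(n):                          # distributed: each robot updates locally
            e = err(i, xt)
            u = W[i] @ e
            e_next = e - dt * u                     # one-step prediction, neighbours frozen
            # critic: TD-style semi-gradient update of the quadratic value weights
            td = Q * e @ e + R * u @ u + e_next @ P[i] @ e_next - e @ P[i] @ e
            P[i] += lr_c * td * np.outer(e, e)
            # actor: descend (stage cost + predicted cost-to-go) w.r.t. W_i
            dJ_du = 2 * R * u - 2 * dt * P[i] @ e_next
            W[i] -= lr_a * np.outer(dJ_du, e)
        xt = xt + dt * np.array([W[i] @ err(i, xt) for i in range(n)])
    # receding horizon: apply only the first control of the interval, then shift
    x = x + dt * np.array([W[i] @ err(i, x) for i in range(n)])

print("final formation error:", sum(np.linalg.norm(err(i, x)) for i in range(n)))
```

Each robot improves its policy forward in time over the horizon, and only the first control of each interval is applied before the window shifts, mirroring the receding-horizon scheme described above.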
Online learning and deployment tests
We have shown that our approach can learn DMPC policies online for MRSs with up to 10,000 robots. To the best of our knowledge, no optimization-based control approach has realized distributed control for MRSs at such a scale. Notably, our learned policies, trained at small scales, can be directly deployed to mobile wheeled vehicles at scales up to 1,000, and to multirotor drones in Gazebo at scales up to 40.
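Because each policy reads only local neighbor states, deploying to a larger team amounts to copying the learned weights to every robot. Below is a hedged sketch of this weight-sharing step; the random formation slots, the 4-nearest-neighbor communication graph, and the stand-in weights are illustrative assumptions, not the paper's setup.

```python
import numpy as np

np.random.seed(1)
W_shared = np.array([[0.8, 0.0], [0.0, 0.8]])  # stand-in for weights learned at small scale
n, dt = 1000, 0.1
slots = np.random.rand(n, 2) * 30              # illustrative target formation slots
x = slots + 0.5 * np.random.randn(n, 2)        # perturbed start positions
# fixed graph: each robot listens to its 4 nearest formation neighbours (computed once)
nbrs = [np.argsort(np.linalg.norm(slots - slots[i], axis=1))[1:5] for i in range(n)]

def err(i, x):
    rel = (x - slots) - (x[i] - slots[i])      # relative formation error
    return rel[nbrs[i]].mean(axis=0)

for _ in range(100):                           # pure deployment: no further learning
    x = x + dt * np.array([W_shared @ err(i, x) for i in range(n)])

rel = (x - slots) - (x - slots).mean(axis=0)   # formation error up to a common translation
print("mean formation error:", np.mean(np.linalg.norm(rel, axis=1)))
```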
Online policy learning with up to 10,000 robots
Online policy deployment from 2 robots to 1,000
Video material for learning convergence and stability verification in the 2-robot scenario, including parameter settings, code execution, and results.
Experimental runs and numerical results for online policy learning with 4, 8, 200, 1,000, and 10,000 robots. The experiments were run with randomly initialized weights and random initial state conditions.
Visualization and animation of policy learning with 200, 1,000, and 10,000 robots.
Experimental Results
Multirotor drones in Gazebo
Our approach realizes formation control and formation transformation for teams of 6, 18, and 40 multirotor drones, and outperforms the baseline algorithm in formation performance.
Side-by-side video comparison (our approach vs. baseline):
Formation of 40 multirotor drones following large-curvature paths
Formation and transformation of 40 multirotor drones
Real-world experiments on mobile wheeled robots
Our real-world experiments verified two significant features of our approach. First, control policies learned in simulation exhibit strong sim-to-real transferability. Second, the learned policies can be directly deployed to real-world MRSs across different scales, enabling scalable optimization-based control of large-scale MRSs.
Formation, collision avoidance, and transformation of three robots
Formation, collision avoidance, and transformation of two robots
Source code is available at https://github.com/xinglongzhangnudt/policy-learning-for-distributed-mpc/ under a GPL license.