Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning

Sumeet Batra*, Zhehui Huang*, Aleksei Petrenko*, Tushar Kumar, Artem Molchanov, Gaurav Sukhatme

Conference on Robot Learning (CoRL), 2021



We demonstrate the possibility of learning drone swarm controllers via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful transfer of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors.

Baseline experiments

The video below demonstrates the learned behaviors of quadrotors in different scenarios. The results are shown for a policy trained with 8-drone swarms. The drones are able to maintain dense formations and reach moving targets while effectively avoiding collisions.

All videos are captured in real time, i.e., the physics of the agile nano-quadrotors is simulated at 1.0x timescale.

Obstacle avoidance

The video below shows the performance of our drone teams trained and evaluated in scenarios containing a moving spherical obstacle. The quadrotors robustly avoid collisions with each other while also avoiding the obstacle.

Larger drone teams

After additional training, the quadrotor policies are able to adapt to larger swarms (up to N=32 simulated quadrotors). Note that the number of neighboring drones observed by each agent (K=6) remains constant, which means that our method can adapt to larger swarms without an increase in per-drone computational demands.
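To illustrate why a fixed K keeps the policy input the same size regardless of swarm size, here is a minimal sketch of a K-nearest-neighbor observation. It is not the authors' implementation: the function names `k_nearest_neighbors` and `neighbor_observation` are hypothetical, and using relative positions alone as neighbor features is an assumption made for brevity.

```python
import math


def k_nearest_neighbors(positions, self_index, k=6):
    """Return the positions of the k closest drones, relative to drone `self_index`.

    `positions` is a list of (x, y, z) tuples, one per drone (assumed layout).
    """
    own = positions[self_index]
    others = [p for i, p in enumerate(positions) if i != self_index]
    # Sort the other drones by Euclidean distance to this drone.
    others.sort(key=lambda p: math.dist(own, p))
    # Keep only the k closest and express them relative to this drone.
    return [tuple(b - a for a, b in zip(own, p)) for p in others[:k]]


def neighbor_observation(positions, self_index, k=6):
    """Flatten the k nearest relative positions into one fixed-length vector."""
    obs = []
    for rel in k_nearest_neighbors(positions, self_index, k):
        obs.extend(rel)
    return obs


if __name__ == "__main__":
    # Two toy swarms of different sizes, laid out on a line at 1 m altitude.
    swarm_8 = [(float(i), 0.0, 1.0) for i in range(8)]
    swarm_32 = [(float(i), 0.0, 1.0) for i in range(32)]
    # Both observations have 3 * K entries, independent of swarm size.
    print(len(neighbor_observation(swarm_8, 0)))
    print(len(neighbor_observation(swarm_32, 0)))
```

In this sketch the vector passed to the policy has 3 * K entries whether the swarm holds 8 or 32 drones, so the network's input size does not change. The neighbor selection itself still scans all other drones here; a real system might limit that step by sensing or communication range.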

Real world experiments

We ran real-world experiments on the Crazyflie 2.0 platform in several scenarios, such as "swarm-vs-swarm" and "same goal".



author = {Sumeet Batra and
          Zhehui Huang and
          Aleksei Petrenko and
          Tushar Kumar and
          Artem Molchanov and
          Gaurav S. Sukhatme},
title = {Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning},
booktitle = {5th Conference on Robot Learning, CoRL 2021, 8-11 November 2021, London, England, {UK}},
series = {Proceedings of Machine Learning Research},
publisher = {{PMLR}},
year = {2021},
url = {}