A Reinforcement Learning Approach for Decentralized Multi-UAV Collision Avoidance Under Imperfect Sensing
Dawei Wang, Tingxiang Fan, Tao Han, Jia Pan
Different from autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs) have a more complicated state space and more degrees of freedom to plan over. Moreover, the larger state space makes it harder, and often impractical, to explicitly model the uncertainty and noise in the environment.
In this paper, we propose a reinforcement learning-based multi-UAV collision avoidance approach that does not explicitly model the uncertainty and noise in the environment. Our goal is to train a policy that plans collision-free trajectories from noisy local observations. However, training such a reinforcement learning (RL) model remains challenging: RL has no fixed training set or ground-truth targets, which makes RL methods hard to reproduce. To address this issue, we introduce a two-stage training method for the reinforcement learning-based collision avoidance policy. In the first stage, we optimize the model with a supervised training method, using ``the distance to the ORCA plane'' as the loss function. In the second stage, we refine the model with a policy gradient method. We validate our policy in a variety of simulated scenarios, and the results show that it generates time-efficient and collision-free paths under imperfect sensing. Our extensive experiments further demonstrate that the policy can handle noisy local observations with unknown noise levels.
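To make the two-stage scheme concrete, the following is a minimal PyTorch sketch, not the authors' implementation: stage one pre-trains a policy network with a hinge loss on the distance of the predicted velocity to an ORCA half-plane, and stage two refines it with a REINFORCE-style policy gradient. The `PolicyNet` architecture, the `orca_violation` form of the distance, and the dummy batches are all illustrative assumptions; in the real pipeline the ORCA planes would come from an ORCA solver run on the simulator state, and the returns from simulated rollouts.

```python
import torch
import torch.nn as nn

# Hypothetical policy network mapping noisy local observations to a
# Gaussian over velocity commands (dimensions are illustrative).
class PolicyNet(nn.Module):
    def __init__(self, obs_dim=24, act_dim=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU())
        self.mean = nn.Linear(128, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.backbone(obs)
        return self.mean(h), self.log_std.exp()

def orca_violation(v, normal, offset):
    """Hinge distance of velocity v to an assumed ORCA half-plane
    {v : normal . v <= offset}; zero when v satisfies the constraint."""
    return torch.relu((v * normal).sum(-1) - offset)

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Stage 1: supervised pre-training against ORCA planes
# (random tensors stand in for one batch of real training data).
obs = torch.randn(256, 24)
normal = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)
offset = torch.rand(256)
mean, _ = policy(obs)
stage1_loss = orca_violation(mean, normal, offset).mean()
opt.zero_grad(); stage1_loss.backward(); opt.step()

# Stage 2: REINFORCE-style policy-gradient refinement
# (dummy actions/returns stand in for data collected in simulation).
actions = torch.randn(256, 3)
returns = torch.randn(256)
mean, std = policy(obs)
dist = torch.distributions.Normal(mean, std)
logp = dist.log_prob(actions).sum(-1)
stage2_loss = -(logp * returns).mean()   # maximize expected return
opt.zero_grad(); stage2_loss.backward(); opt.step()
```

The key design point the sketch illustrates is that both stages update the same network with the same optimizer, so the supervised stage simply provides a well-shaped initialization for the higher-variance policy-gradient stage.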