Our project aims to develop optimal navigation policies for UAVs using Deep Q-Learning. These policies are ultimately to be deployed on UAVs for tasks such as agricultural monitoring and structural inspection. Such UAVs could also serve as first responders during crises or disasters in inaccessible areas.
This project was inspired by the thesis of Nahush Gondhalekar [1], which describes the implementation of a naive Reinforcement Learning algorithm with Gaussian Processes for infrastructure and environmental monitoring.
We defined a goal position in AirSim's Neighborhood environment, which we expected the drone to learn to reach on its own through trial and error.
We carried out a number of simulations, both successful and unsuccessful, in AirSim, a simulator by Microsoft. We defined our state space as the depth images from the drone's front camera, and the available actions were to move the drone forward, move it backward, rotate left, and rotate right.
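As a concrete illustration, the sketch below maps such a discrete action set onto AirSim's Python API. The speed, yaw rate, and duration values are placeholder assumptions rather than our exact tuning, and "forward" is taken along the world x-axis for simplicity.

```python
import airsim

FORWARD_SPEED = 2.0   # m/s (assumed)
YAW_RATE = 30.0       # degrees per second (assumed)
DURATION = 1.0        # seconds each action is applied for (assumed)

def execute_action(client, action):
    """Execute one of the four discrete actions described above."""
    if action == 0:      # move forward
        client.moveByVelocityAsync(FORWARD_SPEED, 0, 0, DURATION).join()
    elif action == 1:    # move backward
        client.moveByVelocityAsync(-FORWARD_SPEED, 0, 0, DURATION).join()
    elif action == 2:    # rotate left
        client.rotateByYawRateAsync(-YAW_RATE, DURATION).join()
    else:                # rotate right
        client.rotateByYawRateAsync(YAW_RATE, DURATION).join()
```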
Our implementation of the Deep Q-Network agent was adapted from "Human-level control through deep reinforcement learning" (Mnih et al., 2015, Nature 518) [2]. The structure of this network was predefined in the CNTK library.
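For reference, a Mnih-style convolutional Q-network can be expressed with CNTK's layers API roughly as follows. The filter sizes and layer widths below follow the Nature paper and are illustrative; they are not necessarily the exact predefined structure we used.

```python
import cntk as C

def build_q_network(state_shape, num_actions):
    """Convolutional Q-network roughly following Mnih et al. (2015)."""
    model = C.layers.Sequential([
        C.layers.Convolution2D((8, 8), 32, strides=4, activation=C.relu),
        C.layers.Convolution2D((4, 4), 64, strides=2, activation=C.relu),
        C.layers.Convolution2D((3, 3), 64, strides=1, activation=C.relu),
        C.layers.Dense(512, activation=C.relu),
        C.layers.Dense(num_actions, activation=None)  # one Q-value per action
    ])
    state = C.input_variable(state_shape)             # e.g. a stack of depth images
    return model(state), state
```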
This neural network estimated the Q-values, which were in turn used to improve the policy toward the optimal one.
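Concretely, the estimated Q-values feed into the standard one-step Bellman target used in DQN. A minimal sketch, assuming a discount factor gamma of 0.99:

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, terminal, gamma=0.99):
    """Standard DQN targets: bootstrap from the best next-state Q-value,
    except when the episode ended (e.g. on a collision)."""
    return rewards + gamma * np.max(next_q_values, axis=1) * (1.0 - terminal)
```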
The reward function was defined as a combination of the distance from the drone's current position to the goal position and the drone's velocity. As the drone moved closer to the goal, the reward increased. Conversely, whenever the drone collided with an obstacle, it was penalized and its reward dropped drastically. The velocity term was included to ensure the drone did not learn that staying stationary is a way to avoid obstacles.
The reward function also imposed a strong penalty if the drone wandered too far from the goal position.
This reward function was formulated to teach the drone's policy to take actions that guide it toward the goal position while avoiding obstacles, thereby learning how to navigate through the environment.
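A minimal sketch of this reward is shown below; the penalty magnitude, distance threshold, and velocity weight are illustrative assumptions rather than the exact values we used.

```python
import numpy as np

COLLISION_PENALTY = -100.0   # assumed penalty magnitude
MAX_DISTANCE = 50.0          # assumed "wandered too far" threshold, in metres
VELOCITY_WEIGHT = 0.5        # assumed weight on the velocity term

def compute_reward(position, velocity, goal, collided):
    dist_to_goal = np.linalg.norm(goal - position)
    if collided or dist_to_goal > MAX_DISTANCE:
        # Colliding, or straying too far from the goal, is heavily penalized.
        return COLLISION_PENALTY
    # Reward grows as the drone closes in on the goal; the velocity term
    # discourages hovering in place merely to avoid obstacles.
    return -dist_to_goal + VELOCITY_WEIGHT * np.linalg.norm(velocity)
```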
After running the simulation for more than 350 episodes, which took almost 8 hours, we plotted the total reward collected per episode. We observed that, as the number of episodes increased, the reward per episode showed a slight upward trend.
Considering that other naive implementations of similar ideas took almost six thousand episodes to converge, we estimate that ours would require considerably more than that. Unfortunately, we did not have a GPU powerful enough to handle that load.
Here we show a couple of episodes carried out in the simulator and how the drone returns to its start position after every collision. In the command line on the right, the penalty imposed for each collision can be seen. The image in the bottom-left corner shows the input to the neural network: the depth images captured by the drone's on-board front camera.
The drone was controlled through AirSim's Python API, built on msgpack-rpc, which communicated with the simulator over a TCP connection.
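For reference, connecting to the simulator and grabbing a depth frame looks roughly like the following with the AirSim Python package; the camera name and image type may differ slightly between AirSim versions.

```python
import airsim

# Connect to the simulator; the client talks msgpack-rpc over TCP
# (127.0.0.1:41451 by default).
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

# Request a depth image from the front camera -- the network's input.
responses = client.simGetImages([
    airsim.ImageRequest("front_center", airsim.ImageType.DepthPerspective,
                        pixels_as_float=True)
])
depth = airsim.list_to_2d_float_array(
    responses[0].image_data_float, responses[0].width, responses[0].height)
```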