My project was titled "Drone Collision Avoidance Navigation with Reinforcement Learning." For this summer work I received the Bell Labs Summer Intern Award for Outstanding Innovation, signed by Nishant Batra, Chief Strategy and Technology Officer, on August 12, 2021.
I had a valuable opportunity to intern with the Mathematics & Algorithms Research Group at Bell Labs Research in Murray Hill, New Jersey, USA, where I was advised by Dr. Matthew Andrews, Karina Palyutina, and Máté Hell. During this summer 2021 internship, I had a tremendous learning experience interacting with various researchers and attending lab talks. The focus of my internship was applying deep reinforcement learning algorithms to autonomous drone navigation for obstacle avoidance in Microsoft AirSim, a physics-based simulator.
Reinforcement learning (RL) algorithms typically learn sequential decision-making policies by training in a simulator. Over the last several years, deep reinforcement learning (DRL) has produced breakthroughs in games, robotics, healthcare, and many other applications. In the early days of DRL, AlphaGo famously defeated Go world champion Lee Sedol in 2016, and the field has since come far, providing insights for healthcare applications such as AlphaFold2's work on protein folding. Robotics problems are high-dimensional and complex, yet DRL has recently shown tremendous promise for both stationary and mobile robots; much potential remains untapped, which opens up promising research directions such as improving sample efficiency and building end-to-end systems.
I focused on a problem related to Unmanned Aerial Vehicles (UAVs), or drones: mobile robots with six degrees of freedom. I worked with Microsoft AirSim, a physics-based simulator that provides a near-real-world drone equipped with many sensors, including a first-person camera view, depth images, collision sensing with obstacles, and other realistic sensory information.
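As a rough illustration of how this sensory information can be read, the sketch below uses AirSim's Python client to grab a depth image and the collision state. The camera name and image parameters here are generic assumptions, not the exact configuration from my project.

```python
import numpy as np
import airsim

# Connect to the running AirSim simulator and take API control of the drone.
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

# Request a depth image from the front camera ("0").
# pixels_as_float=True returns per-pixel depth values in meters.
responses = client.simGetImages([
    airsim.ImageRequest("0", airsim.ImageType.DepthPerspective, True, False)
])
depth = airsim.list_to_2d_float_array(
    responses[0].image_data_float, responses[0].width, responses[0].height
)

# Collision sensing: has_collided becomes True when the drone hits an obstacle.
collision = client.simGetCollisionInfo()
print("min depth ahead (m):", np.min(depth), "| collided:", collision.has_collided)
```

In an RL setup, readings like these become the agent's observation at every step, and the collision flag can be used to terminate an episode.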
My problem statement was to address obstacle collision avoidance in a given environment using RL algorithms. The problem is high-dimensional: the environment's state space is vast, and the drone has six degrees of freedom of motion, which makes this a challenging research problem to tackle. Additionally, RL algorithms rely on reward functions to learn good sequential decision-making policies, so reward shaping and design is yet another challenge to overcome. During this internship, I experimented with various sensors, such as depth images, to tackle the vast state space, and with various distance-based reward criteria to tackle the reward-shaping challenge, while keeping the drone's motion simple in order to focus on these sub-problems. One of the autonomous solutions is shown in the drone's first-person-view video playing on the left side of this post.
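To give a flavor of what a distance-based reward might look like, here is a minimal sketch; the specific weights, the safety threshold, and the goal-progress and clearance terms are illustrative assumptions rather than the exact criteria used in the project.

```python
def distance_based_reward(prev_dist_to_goal, dist_to_goal, min_obstacle_depth, collided,
                          collision_penalty=-100.0, progress_weight=10.0,
                          clearance_threshold=2.0, clearance_weight=1.0):
    """One illustrative distance-based reward for collision-avoidance navigation.

    prev_dist_to_goal / dist_to_goal: distance to the goal before / after the action (m).
    min_obstacle_depth: smallest value in the current depth image (m), i.e. the
                        distance to the nearest visible obstacle.
    collided: collision flag reported by the simulator.
    """
    if collided:
        return collision_penalty  # large terminal penalty for hitting an obstacle

    # Reward progress toward the goal (positive when the drone got closer).
    reward = progress_weight * (prev_dist_to_goal - dist_to_goal)

    # Penalize flying too close to obstacles, scaled by how far inside
    # the safety threshold the drone currently is.
    if min_obstacle_depth < clearance_threshold:
        reward -= clearance_weight * (clearance_threshold - min_obstacle_depth)

    return reward
```

In a training loop, the goal distances would be computed from the drone's position before and after each action, and the obstacle depth from the depth image, as in the earlier sketch.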
Apart from my core research, I also took part in activities such as Friday lunches and technical talks hosted by Bell Labs, which were very insightful and useful. My mentors at Nokia Bell Labs were resourceful, responsive, and supportive, and really made the work environment fun!