Drones have become increasingly useful in a wide range of applications, including critical situations that are dangerous or inaccessible for humans. However, their pervasive use has been limited by the high level of skill required to navigate a drone accurately. Training for a drone pilot certification is stringent, so to fully and effectively integrate drone use into society, it is crucial to design safe and intuitive ways to support human-drone interaction.
What if piloting a drone were as simple as giving a voice command? We believe that drones are for everyone, not just enthusiasts and trained experts. Our project aims to make drone navigation safe and easy by using an object detection algorithm to recognize a specific target in the environment. The user specifies this target through a speech-to-text engine, and the drone then navigates to the selected object automatically.
Real-time object detection is crucial for this application, and since Convolutional Neural Network (CNN) based object detection is computationally demanding, we propose accelerating detection by implementing the CNN framework on a GPU. The CNN we propose to use is Fast YOLO (You Only Look Once), a state-of-the-art real-time object detection system. Autonomous navigation combined with a simple voice-based UI has the potential to improve ease of use in disaster relief, search-and-rescue operations, and jobsite safety inspections.
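To make the proposed pipeline concrete, the following is a minimal sketch of the command-to-navigation flow: a speech transcript is matched against the detector's known class labels, the most confident matching bounding box is selected, and its offset from the frame center is computed as a steering signal. All names here (`Detection`, `parse_command`, `select_target`, `steer_toward`) are illustrative placeholders, not the project's actual API; the real system would feed the transcript from a speech-to-text engine and the boxes from Fast YOLO running on the GPU.

```python
from dataclasses import dataclass
from typing import Optional, List, Set, Tuple

@dataclass
class Detection:
    label: str          # class name predicted by the detector (e.g. "person")
    confidence: float   # detection score in [0, 1]
    cx: float           # box center x, normalized to [0, 1] across the frame
    cy: float           # box center y, normalized to [0, 1] down the frame

def parse_command(transcript: str, known_labels: Set[str]) -> Optional[str]:
    """Pick the first known object label mentioned in a speech transcript."""
    for word in transcript.lower().split():
        if word in known_labels:
            return word
    return None

def select_target(detections: List[Detection], label: str) -> Optional[Detection]:
    """Among boxes matching the requested label, keep the most confident one."""
    matches = [d for d in detections if d.label == label]
    return max(matches, key=lambda d: d.confidence) if matches else None

def steer_toward(target: Detection) -> Tuple[float, float]:
    """Horizontal/vertical offsets of the target from the frame center; a
    flight controller would translate these into yaw and altitude commands."""
    return target.cx - 0.5, target.cy - 0.5

# Example: the user says "fly to the person"; the detector returns two boxes.
labels = {"person", "car", "dog"}
dets = [Detection("car", 0.9, 0.2, 0.5), Detection("person", 0.8, 0.7, 0.4)]
target_label = parse_command("fly to the person", labels)
target = select_target(dets, target_label)
dx, dy = steer_toward(target)  # positive dx means the target is right of center
```

This keeps the perception, language, and control stages decoupled, so the keyword-matching parser and the center-offset controller above could each be replaced by more sophisticated components without changing the overall structure.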