Demo 1: Natural Language-Based Drone Control
This demo showcases a system in which a DJI Tello drone is controlled entirely through natural language commands. Leveraging OpenAI's GPT-4, the system interprets user input such as "take off and hover for 5 seconds" or "move forward and then land," breaking it down into actionable control steps that are executed through the djitellopy API. The result is seamless human-drone interaction with no joystick input or pre-programmed sequences.
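The sketch below illustrates one way this pipeline could be wired together: GPT-4 is prompted to return the control steps as JSON, which are validated against a whitelist and dispatched to djitellopy calls. The JSON step schema, the `ALLOWED_ACTIONS` whitelist, and the prompt wording are illustrative assumptions, not the demo's actual implementation.

```python
# Minimal sketch: translate a natural language command into djitellopy calls.
import json
import time

from djitellopy import Tello
from openai import OpenAI  # requires OPENAI_API_KEY in the environment

client = OpenAI()

# Hypothetical whitelist of actions the model is allowed to request.
ALLOWED_ACTIONS = {"takeoff", "land", "hover", "move_forward", "move_back", "rotate_clockwise"}

PROMPT = (
    "Translate the drone command into a JSON list of steps. Each step is an "
    'object like {"action": "move_forward", "value": 50}. Allowed actions: '
    + ", ".join(sorted(ALLOWED_ACTIONS))
    + ". Use centimeters for distances, degrees for rotations, and seconds "
    "for hover. Reply with JSON only.\nCommand: "
)

def plan_steps(command: str) -> list[dict]:
    """Ask GPT-4 to break a natural language command into control steps."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT + command}],
    )
    return json.loads(response.choices[0].message.content)

def execute(tello: Tello, steps: list[dict]) -> None:
    """Dispatch each validated step to the corresponding djitellopy call."""
    for step in steps:
        action, value = step["action"], step.get("value")
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"Model requested a disallowed action: {action}")
        if action == "takeoff":
            tello.takeoff()
        elif action == "land":
            tello.land()
        elif action == "hover":
            time.sleep(value)              # hold position for `value` seconds
        elif action == "move_forward":
            tello.move_forward(value)      # distance in cm
        elif action == "move_back":
            tello.move_back(value)
        elif action == "rotate_clockwise":
            tello.rotate_clockwise(value)  # angle in degrees

if __name__ == "__main__":
    tello = Tello()
    tello.connect()
    execute(tello, plan_steps("take off and hover for 5 seconds, then land"))
```

Constraining the model to a fixed action vocabulary keeps a mis-parsed command from triggering an unsupported or unsafe maneuver; anything outside the whitelist is rejected before it reaches the drone.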
Demo 2: Object-Aware Drone Commands with Vision-Language Models
Building on the basic command control of Demo 1, Demo 2 integrates computer vision into the natural language interface. The drone captures images of its surroundings, and a Vision-Language Model (VLM) such as BLIP generates contextual captions for them. GPT-4 then analyzes these captions to determine whether the object mentioned in the user's command is present. If it is (e.g., the user asks to "take a picture of the red bottle" and the caption mentions a red bottle), the drone autonomously captures a photo, enabling intelligent, object-aware drone operation.
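A minimal sketch of that loop follows: a frame from the Tello video stream is captioned with BLIP (via the Hugging Face `transformers` checkpoint `Salesforce/blip-image-captioning-base`), and GPT-4 is asked a yes/no question about the caption. The `is_object_present` helper and its prompt are illustrative assumptions; the original demo may match objects differently.

```python
# Sketch of the object-aware loop: caption a drone frame with BLIP,
# then ask GPT-4 whether the requested object appears in the caption.
import cv2
from djitellopy import Tello
from openai import OpenAI
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

client = OpenAI()
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_frame(frame_bgr) -> str:
    """Generate a BLIP caption for a BGR frame from the Tello video stream."""
    image = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    inputs = processor(image, return_tensors="pt")
    output_ids = blip.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

def is_object_present(caption: str, target: str) -> bool:
    """Hypothetical helper: ask GPT-4 whether the caption mentions the target object."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f'Caption: "{caption}". Does it mention a {target}? Answer yes or no.',
        }],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

if __name__ == "__main__":
    tello = Tello()
    tello.connect()
    tello.streamon()                      # start the video feed
    frame = tello.get_frame_read().frame  # latest frame as a numpy array
    caption = caption_frame(frame)
    if is_object_present(caption, "red bottle"):
        cv2.imwrite("red_bottle.jpg", frame)  # save the photo on a match
    tello.streamoff()
```

Routing the match through the caption rather than a dedicated detector keeps the pipeline model-agnostic: swapping BLIP for another captioning VLM requires no change to the GPT-4 side of the loop.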