Natural language control of autonomous systems is one of the most compelling real-world applications of Large Language Models (LLMs). However, existing approaches face fundamental deployment challenges: cloud dependency creates privacy vulnerabilities, ongoing API costs scale prohibitively with usage, internet connectivity requirements limit operational flexibility, and even "edge" solutions may require desktop hardware that constrains true field portability.
Arial-Piloting addresses these challenges through an agentic architecture designed for real-world deployment. By combining a specialized dual-model architecture with a comprehensive autonomous decision-making system, we demonstrate that sophisticated LLM-powered robotics can operate entirely at the edge while delivering enhanced capabilities.
Designed to run smoothly on mid- to high-tier consumer laptops, Arial-Piloting combines robust processing, autonomous corrections, and adaptive LLM piloting. The reimagined system gives users an intuitive framework for piloting drones while resolving many of the issues inherent in earlier LLM-based autonomous systems that were cloud dependent or required powerful workstations.
Building on the Typefly architecture and its MiniSpec drone flight language, Arial-Piloting reworks the design to run locally with five specialized models, new MiniSpec commands, new vision detection techniques, and a truly autonomous edge-based LLM piloting system. The most important changes are the integration of VLMs, automatic replanning and completion assessment, a two-part MiniSpec generation architecture with specialized reasoning and writing models, improved vision recognition, and a focus on lightweight LLMs that enable true edge-server performance.
By pairing a more adaptive program with carefully calibrated models, Arial-Piloting delivers a robust piloting system. Through artificial dataset generation, fine-tuning, and prompt engineering, it recreates and improves upon the capabilities demonstrated by Typefly's original cloud-dependent architecture, marking an achievement for small-model edge deployment and demonstrating the value of fine-tuned edge models over large few-shot models such as GPT-4.
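As a rough illustration of the artificial dataset generation step, the sketch below pairs task descriptions with target MiniSpec programs and writes them out as fine-tuning data for the writing model. The field names, seed tasks, placeholder programs, and file name are illustrative assumptions, not the project's actual training format.

```python
# Minimal sketch of synthetic dataset generation for the MiniSpec writing model:
# pair task descriptions with target MiniSpec programs and dump them as JSONL.
# Field names, seed tasks, and the placeholder programs are assumptions.
import json

seed_pairs = [
    # (task description, target MiniSpec program) -- real programs elided here
    ("Take off and hover for five seconds.", "<MiniSpec program>"),
    ("Find a person and follow them for 30 seconds.", "<MiniSpec program>"),
]

with open("writer_finetune.jsonl", "w") as f:
    for task, program in seed_pairs:
        f.write(json.dumps({"instruction": task, "output": program}) + "\n")
```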
Demo 1: Following a person in a windy environment. Demonstrates both automatic adjustment of the drone's flight and smart pathing for persistently following a person. User prompt: “Follow the person for 30 seconds”
Demo 1: https://youtu.be/tjkAhXaK8CI
Demo 2: Searching a moderately lit room for a person who is not initially visible. Demonstrates the system's persistence, replanning capability, and task assessment, and simulates dynamic environments where objects may not always be visible. User prompt: “Look for a person, go to them if found otherwise scan again until a person is found.”
Demo 2: https://youtu.be/Vsehu-O4UBI
Demo 3: A direct comparison using the same user prompt as a Typefly demo meant to test handling of complex tasks. Here, however, the testing environment was purposely made chaotic, with several items and uneven lighting, to show Arial-Piloting's robustness to both complex environments and complex tasks. User prompt: “Can you find something for me to eat? If you can, go for it and return. Otherwise, find and go to something drinkable.”
Demo 3: https://youtu.be/NxF4KRd12Wg
Demo 4: Another direct comparison with a Typefly demo, this time meant to demonstrate mastery over conditional statements. In this demo the lighting was again dimmed, the initial distance was increased, and more distractors were added. At the end, the auto-correction recognizes that the drone has flown too far forward to keep the books in view and moves back so they remain visible. User prompt: "If you can see more than one chair behind you, then turn and go to the one with books on it.”
Demo 4: https://youtu.be/Nrf4vzC4H_w
The core innovation of Arial-Piloting lies in its multi-model architecture, which enables high-level reasoning and writing to run entirely on an edge server. Through systematic experimentation across 130+ fine-tuned model iterations, it was found that separating reasoning (Qwen 3-4B) from code generation (Qwen 3-1.7B) eliminates the tendency of unified models to over-fixate on the environment rather than the task and to be overly sensitive to changes in user prompts, and allows the system to generalize to a wider variety of tasks. The complete system orchestrates five specialized models: a classifier model for abstract object categories, dual reasoning and writing models for flight planning, a VLM for environment probing, and a dedicated assessment model for replanning decisions. Each model underwent either domain-specific fine-tuning or substantial prompt engineering rather than relying on general-purpose capabilities. Careful optimization and quantization enable the entire system to run its LLMs exclusively on GPU with as little as 8 GB of VRAM.
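To make the two-stage split concrete, the sketch below chains a reasoning model and a writing model, assuming a llama-cpp-python runtime; the GGUF file names and prompt templates are illustrative and not taken from the Arial-Piloting codebase.

```python
# Hedged sketch of the two-stage MiniSpec generation described above.
# The runtime (llama-cpp-python), file names, and prompts are assumptions.
from llama_cpp import Llama

# Reasoning model (Qwen 3-4B): turns the user request plus the current scene
# description into a short natural-language flight plan.
reasoner = Llama(model_path="qwen3-4b-reasoning.gguf", n_gpu_layers=-1, n_ctx=4096)

# Writing model (Qwen 3-1.7B): translates that plan into a MiniSpec program.
writer = Llama(model_path="qwen3-1.7b-writer.gguf", n_gpu_layers=-1, n_ctx=2048)

def generate_minispec(user_prompt: str, scene_description: str) -> str:
    plan = reasoner(
        f"Scene: {scene_description}\nTask: {user_prompt}\n"
        "Describe, step by step, how the drone should complete the task.",
        max_tokens=256,
    )["choices"][0]["text"]

    program = writer(
        f"Flight plan:\n{plan}\n"
        "Write the MiniSpec program that executes this plan:",
        max_tokens=128,
    )["choices"][0]["text"]
    return program.strip()
```

Keeping the writer's job narrow (plan in, program out) is what lets the smaller 1.7B model handle code generation without being distracted by the raw scene.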
The autonomous decision framework represents a fundamental advancement over traditional drone control systems. Basic automatic recentering occurs after every flight execution to maintain target visibility and optimal positioning, while a continuous replanning system using a dedicated Qwen 3-1.7B model enables real-time task assessment and adaptive behavior. For complex scene understanding beyond basic object detection, the system employs VLM-powered environment probing that can reason about abstract concepts and spatial relationships.
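The loop below sketches how execution, recentering, completion assessment, and replanning could fit together. The drone interface (describe_scene, execute, recenter) is a hypothetical stand-in for the real components, the assessment prompt is illustrative, and generate_minispec refers to the sketch above.

```python
# Hedged sketch of the assess-and-replan loop; only the overall
# execute -> recenter -> assess -> replan structure mirrors the text.
from llama_cpp import Llama

# Dedicated assessment model (Qwen 3-1.7B) used for replanning decisions.
assessor = Llama(model_path="qwen3-1.7b-assessor.gguf", n_gpu_layers=-1, n_ctx=2048)

def assess_complete(task: str, scene: str) -> bool:
    """Ask the assessment model whether the task is finished."""
    answer = assessor(
        f"Task: {task}\nCurrent scene: {scene}\n"
        "Has the task been completed? Answer yes or no:",
        max_tokens=4,
    )["choices"][0]["text"]
    return "yes" in answer.lower()

def run_task(task: str, drone, max_replans: int = 5) -> bool:
    for _ in range(max_replans):
        scene = drone.describe_scene()            # VLM environment probe (hypothetical API)
        program = generate_minispec(task, scene)  # two-stage planning from the sketch above
        drone.execute(program)                    # fly the generated MiniSpec program
        drone.recenter()                          # automatic recentering after execution
        if assess_complete(task, drone.describe_scene()):
            return True                           # assessment model judged the task complete
    return False                                  # bounded replanning budget exhausted
```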
Edge computing optimization eliminates cloud dependencies entirely while achieving competitive performance. The complete local processing pipeline delivers drone flight commands in 10-15 seconds on a laptop with 8 GB of VRAM, and generated commands can continue executing for minutes depending on the task. The result is a truly autonomous drone piloting system capable of real-time processing, autonomous replanning, and complex tasks, all running entirely on an edge server.
Memory efficiency is achieved through quantized models and the use of the Typefly study's MiniSpec drone control language, which minimizes the number of output tokens required. Real-world testing in challenging conditions including wind, poor lighting, and cluttered environments validates the system's robustness as a basis for practical deployment.
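As a small illustration of the memory budget, one can sanity-check that the quantized weights of all five models fit comfortably under the available VRAM, leaving headroom for KV caches. The file names below are hypothetical; only the five-model breakdown comes from the description above.

```python
# Rough sanity check of the memory budget: sum the on-disk sizes of the
# quantized model files and compare against the 8 GB VRAM figure.
# File names are illustrative assumptions, not the repository's artifacts.
import os

MODEL_FILES = [
    "qwen3-4b-reasoning-q4_k_m.gguf",   # reasoning model
    "qwen3-1.7b-writer-q4_k_m.gguf",    # MiniSpec writing model
    "qwen3-1.7b-assessor-q4_k_m.gguf",  # replanning / completion assessment
    "classifier-q4_k_m.gguf",           # abstract object-category classifier
    "vlm-q4_k_m.gguf",                  # vision-language environment probe
]

total_gb = sum(os.path.getsize(path) for path in MODEL_FILES) / 1e9
print(f"Quantized weights: {total_gb:.1f} GB (budget: 8 GB VRAM, minus KV caches)")
```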
This work demonstrates that edge-native autonomous systems can exceed the capabilities of cloud-based alternatives. The key finding is that cognitive separation—splitting reasoning from code generation—and extensive optimization of small models can enable them to outperform large unified models in autonomous robotics tasks. Arial-Piloting eliminates the traditional trade-offs between capability, privacy, and deployment flexibility. The system enables autonomous drone operation in environments where cloud connectivity is unavailable or undesirable, while maintaining sophisticated reasoning and real-time adaptation capabilities.
This work is based on the Typefly architecture and the MiniSpec language developed by its authors.
The complete architecture and model weights are available on the Arial-Piloting Github: https://github.com/VincentYChia/Arial-Piloting