DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control
Conference on Robot Learning 2023 (Oral)
Kevin Huang*, Rwik Rana*, Alex Spitzer*, Guanya Shi**, Byron Boots*
*University of Washington, **Carnegie Mellon University
Abstract
Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds on a novel feedforward-feedback-adaptive control structure trained in simulation using reinforcement learning. When deployed on real hardware, DATT is augmented with a disturbance estimator using L1 adaptive control in closed loop, without any fine-tuning. DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields, including challenging scenarios where baselines completely fail. Moreover, DATT can efficiently run online with an inference time of less than 3.2 ms, less than 1/4 of the adaptive nonlinear model predictive control baseline.
Overview and Approach
Commanding quadrotors in unknown environments with arbitrary trajectories is challenging due to nonlinear dynamics, actuation constraints, and potential infeasibility. Two common control strategies for accurate trajectory following are nonlinear control based on differential flatness and model predictive control (MPC).
We introduce DATT, a neural-network-based quadrotor controller trained using reinforcement learning. The policy is conditioned on a feedforward embedding, similar to classical controllers, but uses only future reference positions, because the higher derivatives of an arbitrary trajectory may be undefined. The network is also conditioned on the force disturbance experienced by the quadrotor. During training, a random constant force disturbance is applied; during flight, an online estimate obtained through L1 adaptive control is used for conditioning.
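The conditioning described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the horizon length, the uniform disturbance range, and the reduced state (velocity only, no attitude) are hypothetical and are not taken from the DATT implementation.

```python
import random


def sample_training_disturbance(max_force, rng):
    """Draw one constant force vector, to be held fixed for a whole episode.

    The uniform range is an assumption; the source only says the
    disturbance is random and constant.
    """
    return [rng.uniform(-max_force, max_force) for _ in range(3)]


def build_policy_input(position, velocity, reference, t_index, horizon, disturbance):
    """Concatenate velocity, relative future reference positions, and disturbance.

    Only reference *positions* are used, so no trajectory derivatives are needed.
    Past the end of the reference, the last point is repeated.
    """
    last = len(reference) - 1
    future = []
    for k in range(horizon):
        ref = reference[min(t_index + k, last)]
        # Express each future reference point relative to the current position.
        future.extend(r - p for r, p in zip(ref, position))
    return list(velocity) + future + list(disturbance)
```

In simulation the `disturbance` argument would be the sampled training force, while on hardware it would be replaced by the online estimate, so the policy sees the same input layout in both settings.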
The controller outputs mass-normalized thrust and body rates, which are converted to motor thrusts by an onboard PID controller.
Why use DATT?
This controller can track arbitrary reference trajectories, even infeasible ones, and adapt to environmental disturbances like wind.
Nonlinear control based on differential flatness is limited to trajectories that are smooth and satisfy the actuation constraints.
An MPC-based quadrotor controller can track infeasible trajectories well, but it relies on accurate models and efficient solvers, which may be costly.
DATT attains better tracking than MPC, while taking only a fraction of the computation time, making it much more feasible to deploy on cheaper computers.
Trajectory Tracking from Handwritten GUI input
We show an illustrative example of how DATT can be used to track arbitrary trajectories. Using a GUI (top-left), we draw an arbitrary trajectory writing out the letters "RL" and "NW", which the drone then follows. With this, a human (or any planner) can command the drone to follow any desired trajectory in real time using just position waypoints. All graphs shown here are taken from flights on a real drone (see paper for hardware details).
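As a rough illustration of how drawn waypoints could become a position-only reference, the sketch below resamples a polyline at constant speed, emitting one reference position per control step. The function name `resample_waypoints`, the constant-speed assumption, and the `speed` and `dt` parameters are hypothetical; the source does not describe how the GUI output is converted.

```python
import math


def resample_waypoints(waypoints, speed, dt):
    """Walk along a polyline at constant speed, emitting one point every dt seconds."""
    step = speed * dt                 # distance covered per control step
    samples = [tuple(waypoints[0])]
    carried = 0.0                     # distance already travelled since the last sample
    for a, b in zip(waypoints, waypoints[1:]):
        seg = math.dist(a, b)
        if seg == 0.0:
            continue                  # skip repeated points
        d = step - carried            # distance into this segment of the next sample
        while d <= seg:
            s = d / seg
            samples.append(tuple(x + s * (y - x) for x, y in zip(a, b)))
            d += step
        carried = seg - (d - step)    # leftover distance after the last sample
    return samples


# An L-shaped stroke sampled every 0.5 m.
path = resample_waypoints([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)], speed=1.0, dt=0.5)
```

The result contains positions only, with sharp corners left intact, which is the kind of reference a derivative-based feedforward term cannot use directly.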
Complex Trajectory Tracking with external disturbances
By conditioning our policy on the force disturbance experienced by the drone, estimated online using L1 adaptive control, DATT also allows for complex trajectory tracking in unknown environmental conditions. When dealing with aggressive, infeasible trajectories, standard L1 adaptive control fails. DATT also attains better performance than MPC with L1 adaptive control while taking far less computation time.
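To make the idea of an online disturbance estimate concrete, here is a minimal single-axis sketch of an L1-style estimator: a velocity state predictor, a piecewise-constant adaptation law, and a first-order low-pass filter. The class name, the gains `a_s` and `alpha`, the Euler integration, and the restriction to one axis are illustrative assumptions, not the structure or values used in the paper. The disturbance is expressed in mass-normalized units (acceleration).

```python
import math


class L1DisturbanceEstimator:
    """Single-axis L1-style estimator of a mass-normalized force disturbance."""

    def __init__(self, a_s=-1.0, dt=0.01, alpha=0.1):
        self.a_s = a_s        # predictor error gain, must be negative (assumed value)
        self.dt = dt          # control period in seconds (assumed value)
        self.alpha = alpha    # low-pass filter coefficient in (0, 1] (assumed value)
        self.v_hat = 0.0      # predicted velocity
        self.d_hat = 0.0      # filtered disturbance estimate
        decay = math.exp(a_s * dt)
        # Piecewise-constant adaptation gain for a scalar error dynamic.
        self.gain = a_s * decay / (decay - 1.0)

    def update(self, v_measured, accel_cmd):
        """Advance one step given the measured velocity and the known acceleration."""
        err = self.v_hat - v_measured
        d_new = -self.gain * err                          # adaptation law
        self.d_hat += self.alpha * (d_new - self.d_hat)   # low-pass filter
        # State predictor, integrated with a forward Euler step.
        self.v_hat += self.dt * (accel_cmd + self.d_hat + self.a_s * err)
        return self.d_hat


# Toy check: a point mass pushed by a constant unknown acceleration of 2.0.
estimator = L1DisturbanceEstimator()
velocity = 0.0
for _ in range(2000):
    estimate = estimator.update(velocity, accel_cmd=0.0)
    velocity += 0.01 * 2.0
```

In this toy setting the estimate settles close to the applied value. A full implementation would run the predictor on all three axes and include gravity and the rotated thrust in `accel_cmd`.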
We test in an environment with multiple fans, as well as with a soft cardboard plate attached to the drone. This creates highly time- and state-dependent external disturbances on the drone. These disturbances are unseen during training, where we model only constant force disturbances. Still, DATT achieves better tracking performance than our baselines, even under this zero-shot domain transfer.
Example infeasible trajectory with wind and plate:
An example where the drone tracks a zig-zag trajectory with multiple sharp turns, which is infeasible. Throughout, it flies in a highly dynamic environment with fans and a swinging cardboard plate. We see that classical L1 adaptive control fails, while DATT is able to track the trajectory well.
5-pointed Star (no environmental disturbances):
Tracking a 5-pointed star: another example of an infeasible trajectory. In this example, there are no environmental disturbances added (no wind or plate). As expected, the addition of adaptive control (left) in this case makes little difference to performance. However, DATT achieves better tracking performance than the classical baselines. (All graphs shown here are taken from flights on a real drone.)
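A short sketch can show why a star of this kind is infeasible. It builds the corners of a 5-pointed star (visiting every second point on a circle) and measures how sharply the heading changes at a tip. The function names and the unit radius are assumptions for illustration; the actual reference used in the flights may be generated differently.

```python
import math


def star_vertices(radius=1.0, points=5):
    """Corners of a star, visited by skipping every other point on a circle."""
    order = [(2 * k) % points for k in range(points + 1)]  # closed loop
    return [
        (
            radius * math.cos(math.pi / 2 + 2 * math.pi * i / points),
            radius * math.sin(math.pi / 2 + 2 * math.pi * i / points),
        )
        for i in order
    ]


def corner_turn_angle_deg(a, b, c):
    """Absolute change of heading, in degrees, when passing through corner b."""
    h_in = math.atan2(b[1] - a[1], b[0] - a[0])
    h_out = math.atan2(c[1] - b[1], c[0] - b[0])
    # Wrap the difference into [-pi, pi) before taking the magnitude.
    turn = (h_out - h_in + math.pi) % (2 * math.pi) - math.pi
    return abs(math.degrees(turn))


corners = star_vertices()
turn = corner_turn_angle_deg(corners[0], corners[1], corners[2])
```

Each tip demands a heading change of about 144 degrees at a single point. Followed at nonzero speed, that would require an instantaneous change in velocity, which no bounded thrust can deliver, so the reference cannot be tracked exactly.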
5-pointed Star (with wind and plate):
Tracking a 5-pointed star in the presence of wind and the cardboard plate. DATT with adaptive control achieves better tracking performance than the classical baselines. (Taken from flights on a real drone.)
A long exposure shot of the drone performing the 5-pointed star using DATT.