Our Motivation

  1. AVOID complex controller tuning.
  2. AVOID complex model-specific system identification.
  3. AVOID unsafe data collection.

Our Objective

  1. To find a control policy that stabilizes ANY quadrotor and moves it to a goal position.
  2. The control policy should be robust to external disturbances.
  3. The control policy should be capable of trajectory tracking.

Our Approach

Sim-to-real with reinforcement learning

  • Train entirely in simulation.
  • Train entirely from scratch.
  • Train without any pre-tuned controller to aid the learning process, in contrast to previous work.

Neural network policy

  • Inputs
    • Position error (relative to the hovering position, vector of 3 elements: Px, Py, Pz)
    • Velocity error (vector of 3 elements: Vx, Vy, Vz)
    • Orientation represented by a rotation matrix (3x3 matrix)
    • Angular velocity (vector of 3 elements: ωx, ωy, ωz)
  • Outputs
    • Normalized rotor thrust forces
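
The inputs listed above add up to an 18-element vector (3 + 3 + 9 + 3), and the output has one value per rotor. The sketch below only illustrates that layout. The function names are hypothetical, the row-major flattening of the rotation matrix is an assumption, and so is the [0, 1] range used for the normalized thrusts; none of this is taken from the authors' code.

```python
from typing import List, Sequence


def build_observation(
    pos_err: Sequence[float],
    vel_err: Sequence[float],
    rotation: Sequence[Sequence[float]],
    ang_vel: Sequence[float],
) -> List[float]:
    """Flatten the four policy inputs into one 18-element vector."""
    for name, vec in (("pos_err", pos_err), ("vel_err", vel_err), ("ang_vel", ang_vel)):
        if len(vec) != 3:
            raise ValueError(f"{name} must have 3 elements")
    if len(rotation) != 3 or any(len(row) != 3 for row in rotation):
        raise ValueError("rotation must be a 3x3 matrix")
    # Assumption: the rotation matrix is flattened row by row.
    flat_rotation = [value for row in rotation for value in row]
    return [*pos_err, *vel_err, *flat_rotation, *ang_vel]


def clip_thrusts(raw_outputs: Sequence[float]) -> List[float]:
    """Clamp four raw policy outputs to an assumed normalized range of [0, 1]."""
    if len(raw_outputs) != 4:
        raise ValueError("expected one output per rotor (4 values)")
    return [min(1.0, max(0.0, float(value))) for value in raw_outputs]


# Example: hovering 0.5 m below the goal, level attitude, no motion.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
observation = build_observation([0.0, 0.0, 0.5], [0.0, 0.0, 0.0], identity, [0.0, 0.0, 0.0])
```

A trained policy would map `observation` to four raw values, which `clip_thrusts` would then bound before they are scaled to each motor's maximum thrust.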


We experiment with randomizing the quadrotor model parameters to test whether it aids transfer. We define a generalized quadrotor model as shown in the image. We randomize the quadrotor dynamics by:

  • randomizing the geometric parameters and densities of 5 types of components (a baselink, a payload, 4 arms, 4 motors, 4 propellers), then computing mass and inertia from them.
  • randomizing the thrust-to-weight and thrust-to-torque ratios.
  • randomizing motor dynamics parameters.
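
The first two randomization steps above can be sketched roughly as follows. This is a deliberately simplified illustration, not the authors' generalized model: it uses only a box-shaped body plus four point masses standing in for the motors and propellers, it omits the payload, arms, thrust-to-torque ratio, and motor dynamics, and every numeric range is a made-up placeholder. `QuadParams` and `sample_quad` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple

GRAVITY = 9.81  # m/s^2


@dataclass
class QuadParams:
    mass: float                           # kg
    inertia: Tuple[float, float, float]   # (Ixx, Iyy, Izz) in kg*m^2
    thrust_to_weight: float               # dimensionless
    max_thrust_per_motor: float           # N


def sample_quad(rng: random.Random) -> QuadParams:
    """Sample one simplified X-shaped quadrotor; all ranges are placeholders."""
    body_size = rng.uniform(0.03, 0.10)         # side of a square body, m
    body_height = rng.uniform(0.01, 0.03)       # m
    body_density = rng.uniform(500.0, 1500.0)   # kg/m^3
    arm_length = rng.uniform(0.04, 0.12)        # body centre to motor, m
    motor_mass = rng.uniform(0.002, 0.02)       # motor + propeller lumped, kg
    thrust_to_weight = rng.uniform(1.8, 3.0)

    # Mass follows from geometry and density.
    body_mass = body_density * body_size ** 2 * body_height
    mass = body_mass + 4.0 * motor_mass

    # Solid box about its centre, plus point masses at (+-d, +-d, 0).
    d = arm_length / math.sqrt(2.0)
    ixx = body_mass * (body_size ** 2 + body_height ** 2) / 12.0 + 4.0 * motor_mass * d ** 2
    iyy = ixx  # symmetric about both horizontal axes in this simplified layout
    izz = body_mass * (2.0 * body_size ** 2) / 12.0 + 4.0 * motor_mass * arm_length ** 2

    # Thrust-to-weight ratio = total maximum thrust / weight.
    max_thrust_per_motor = thrust_to_weight * mass * GRAVITY / 4.0
    return QuadParams(mass, (ixx, iyy, izz), thrust_to_weight, max_thrust_per_motor)


quad = sample_quad(random.Random(0))
```

Sampling a fresh `QuadParams` at the start of each training episode is one way such randomization could be wired into a simulator.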

We make three assumptions:

X shape

We only consider X-shaped quadrotors, because this configuration is more convenient for camera placement than the + shape.

Access to the state

We assume access to reasonably accurate estimates of the quadrotor's position, velocity, orientation, and angular velocity.

Standard quadrotors

We only consider quadrotors within a wide but bounded range of physical parameters.


All experiments are shown with what we call a "baseline" policy. Please read our paper for more details. We use three different quadrotor platforms.

Crazyflie 2.0

Weight: 33g

Body width: 65mm

Thrust-to-weight ratio: ~2.0

Small Quad

Weight: 73g

Body width: 85mm

Thrust-to-weight ratio: ~2.0

Medium Quad

Weight: 124g

Body width: 90mm

Thrust-to-weight ratio: ~2.7

Trajectory tracking


Small Quad

Medium Quad


Crazyflie pushes and spins

Crazyflie throws and slaps

Crazyflie oscillations due to IMU filter delays

Small Quad throws

Medium Quad throws

Medium Quad pushes

Future work

Different network structures

We will explore RNNs and other network structures for in-flight system identification. This should improve adaptation to system-specific and potentially time-dependent parameters, such as the thrust-to-weight ratio.

Closing sim2real loop

Now that we have a safe controller, we can collect data (trajectories) from a specific platform and incorporate it into training. This should help us further improve performance.

Train to handle failure cases

We have seen how remarkably good the neural network policies are at handling disturbances. We want to extend their capabilities toward recognizing failures and performing safe emergency landings.

Latest results

We will be adding some interesting results as we keep working on the project.

Large Quad

Weight: 500g

Body width: 150mm

Thrust-to-weight ratio: ~2.7

We recently added a large quadrotor to our swarm and tested it with our NN baseline controller. Here is the figure-eight flight.