Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads

Goal

Our goal is to enable a quadcopter with a cable tether and magnetic payload attachment to pick up and transport different ground payloads. Specifically, the quadcopter's task is to control the payload in the image space of an external camera, without any motion capture system or ground-truth state estimation. The control output is the velocity command sent to the quadcopter, and the objective is for the payload to follow a target trajectory in image space.

Challenge

Transporting suspended payloads with aerial vehicles is challenging because the payload can cause significant and unpredictable changes to the robot's dynamics. These changes can lead to suboptimal flight performance or even catastrophic failure.

Idea

We therefore investigate a model-based reinforcement learning algorithm that uses meta-learning to adapt to the payload online.

Approach

At the core of our approach is a neural network dynamics model that takes as input the current state and action and predicts the next state. The model is trained on time-series data of length T to optimize the neural network weight parameters. Because the payload parameters are unknown, we represent them with a latent variable described by learned distributional parameters. We train on K different payload tasks, which enables the model to adapt to different payloads.
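
To make this concrete, below is a minimal sketch of such a latent-conditioned dynamics model, assuming a PyTorch implementation; the class name, argument names, and network sizes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class LatentDynamicsModel(nn.Module):
    """Predicts the next state from the current state, the action, and a latent
    variable z that summarizes the unknown payload parameters."""

    def __init__(self, state_dim, action_dim, latent_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, action, z):
        # Concatenate state, action, and latent, then predict the change in state.
        x = torch.cat([state, action, z], dim=-1)
        return state + self.net(x)
```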

Our algorithm consists of a training phase and a test phase.

In the training phase, we first gather data by manually piloting the quadcopter along random trajectories with a variety of different payloads. We then run meta-training to learn the shared dynamics model parameters and the per-payload adaptation parameters.
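
A minimal sketch of this meta-training procedure, under the same assumptions as the model sketch above (shared network weights plus per-payload latent distribution parameters, optimized jointly by gradient descent), might look like the following; the loss, optimizer, and hyperparameters here are our choices, not the authors':

```python
import torch

def meta_train(model, task_datasets, latent_dim, epochs=100, lr=1e-3):
    # One (mu, log_std) pair per payload task; these are the adaptation parameters.
    K = len(task_datasets)
    mus = [torch.zeros(latent_dim, requires_grad=True) for _ in range(K)]
    log_stds = [torch.zeros(latent_dim, requires_grad=True) for _ in range(K)]
    opt = torch.optim.Adam(list(model.parameters()) + mus + log_stds, lr=lr)

    for _ in range(epochs):
        for k, (states, actions, next_states) in enumerate(task_datasets):
            # Sample a latent for payload k via the reparameterization trick.
            z = mus[k] + torch.exp(log_stds[k]) * torch.randn(latent_dim)
            z = z.expand(*states.shape[:-1], latent_dim)
            pred = model(states, actions, z)
            loss = torch.mean((pred - next_states) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
```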

At test time, the robot keeps the learned dynamics model parameters fixed and infers the latent variable online from all of the data collected so far in the current task. The dynamics model is then used by a model-based controller to plan and execute actions that follow the desired path. As the robot flies, it continues to store data, re-infer the latent variable parameters, and replan until the task is complete.
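
As a rough illustration of the online inference step, the sketch below keeps the meta-trained model weights frozen and fits only the latent distribution parameters to the data gathered so far; function names, hyperparameters, and the squared-error objective are assumptions rather than the authors' implementation.

```python
import torch

def infer_latent(model, replay, latent_dim, steps=50, lr=1e-2):
    # Freeze the meta-trained dynamics model; only the latent parameters adapt.
    for p in model.parameters():
        p.requires_grad_(False)

    mu = torch.zeros(latent_dim, requires_grad=True)
    log_std = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([mu, log_std], lr=lr)

    states, actions, next_states = replay  # all transitions from the current task
    for _ in range(steps):
        z = mu + torch.exp(log_std) * torch.randn(latent_dim)
        z = z.expand(states.shape[0], latent_dim)
        loss = torch.mean((model(states, actions, z) - next_states) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach()
```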

Experiments

Here we show our method running at test time. The task is to follow the specified trajectory, shown in red, as closely as possible. The model-based planner evaluates multiple candidate trajectories, shown in white, and selects the best one, shown in blue. As the quadcopter flies, it continually updates its estimate of the latent variable.
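
The planning step in this demonstration can be sketched as a simple random-shooting model-predictive controller: sample candidate action sequences, roll them out with the learned model conditioned on the inferred latent, score them against the target image-space trajectory, and execute the first action of the best sequence. The cost function, sampling scheme, and state layout below are illustrative assumptions.

```python
import torch

def plan(model, state, z, target_traj, action_dim, horizon=10, num_candidates=1000):
    with torch.no_grad():
        # Sample candidate velocity-command sequences uniformly at random.
        actions = torch.empty(num_candidates, horizon, action_dim).uniform_(-1.0, 1.0)
        states = state.expand(num_candidates, -1)
        z_batch = z.expand(num_candidates, -1)
        cost = torch.zeros(num_candidates)
        for t in range(horizon):
            states = model(states, actions[:, t], z_batch)
            # Assumed state layout: the first two entries are the payload's pixel coordinates.
            cost = cost + torch.sum((states[:, :2] - target_traj[t]) ** 2, dim=-1)
        best = torch.argmin(cost)
    return actions[best, 0]  # execute the first action, then replan at the next step
```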

We evaluated our approach on a series of suspended payload control tasks and compared it with a model-based reinforcement learning baseline in which the state includes a history of past payload positions and quadcopter actions.

Our approach enables a quadcopter to follow desired trajectories more closely.

A key property of our meta-learning algorithm is that performance improves as it adapts: the more data from the current task is used to infer the latent variable, the more closely the payload follows the desired trajectory.

Our approach also enables several applications involving suspended payloads, such as

  • obstacle avoidance

  • payload pick-up, transport, and drop-off

  • intuitive "wand" control

  • target following