A meta-reinforcement learning method for adaptive payload transportation with variations

Jingyu Chen, Ruidong Ma, Meng Xu, Fethi Candan, Lyudmila Mihaylova, John Oyekan

Institute of Software, Chinese Academy of Sciences, Beijing, China

Professor Lyudmila Mihaylova is with the Department of Automatic Control and Systems Engineering, The University of Sheffield

School of Information Technology & Management, University of International Business and Economics, Beijing, China

Dr John Oyekan is with the Department of Computer Science, University of York

Abstract

The safe transport of cable-suspended payloads by a group of Unmanned Aerial Vehicles (UAVs) depends on their capacity to respond effectively to fluctuations in the dynamics caused by external variations, such as wind gusts. For group transportation with obstacles, internal variations, such as changes in formation, can also alter the space occupancy of the system, which affects collision detection. However, traditional adaptive learning methods struggle to adapt to these two variations. In this paper, we present a learning-based method for collision-free dual-UAV-payload transportation in the presence of varied wind force and formation change. It consists of an adaptive trajectory tracking controller, based on meta-model-based reinforcement learning with online adaptation and a novel correction policy, and a path planner that uses a meta-collision predictor to sample collision-free goal states of the system for the controller. The simulation results demonstrate that the proposed trajectory tracking controller outperforms state-of-the-art model-free, model-based, and variational inference methods in terms of payload tracking error reduction and robustness when dealing with the variations mentioned above. Specifically, the proposed controller reduces the average payload tracking error to less than 0.1 metres in most tasks without obstacles. Furthermore, by following the adapted paths generated by the path planner, the trajectory tracking controller can effectively track the payload while keeping the dual-UAV-payload system collision-free during navigation among obstacles. The success rate of the proposed method is more than 80% in all scenarios with obstacles. The project website is at https://sites.google.com/view/meta-payload-fly/ and the source code is available at https://github.com/wawachen/Meta-load-fly.


Main methodology

We propose a design that utilises a virtual leader to train a dual-UAV-payload system in meta-model-based reinforcement learning with offline human demonstrations. The learned meta-dynamics model can be adapted online for downstream payload tracking or full system tracking tasks.
We incorporate a model-free action correction policy into the meta-model-based reinforcement learning. It compensates, in a control-oriented way, for inaccuracies in the adapted meta-model and so improves the trajectory tracking performance.
We develop a path planner that combines a meta-collision predictor with the RRT algorithm to sample collision-free goal states under varied space occupancy of the system. Collision-free payload transportation is achieved with a high success rate by tracking the sampled goal states using our trajectory tracking controller.
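To make the planner idea concrete, here is a minimal 2D sketch of RRT sampling in which every candidate goal state is screened by a collision check that depends on the UAV spacing. It is not the released implementation: `footprint_collides` is a hypothetical geometric stand-in for the learned meta-collision predictor, and the workspace bounds, step size, goal bias and obstacle layout are all assumed values chosen for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Obstacle = Tuple[float, float, float]  # (x, y, radius)


def footprint_collides(p: Point, neighbour_distance: float,
                       obstacles: List[Obstacle]) -> bool:
    """Hypothetical stand-in for the meta-collision predictor.

    Assumption: the system is approximated by a disc whose radius is half
    the UAV spacing, so a wider formation occupies more space.
    """
    radius = neighbour_distance / 2.0
    return any(math.hypot(p[0] - ox, p[1] - oy) <= radius + r
               for ox, oy, r in obstacles)


def rrt(start: Point, goal: Point, collides: Callable[[Point], bool],
        bounds: Tuple[float, float] = (0.0, 10.0), step: float = 0.5,
        goal_tol: float = 0.5, max_iters: int = 5000,
        seed: int = 0) -> Optional[List[Point]]:
    """Basic RRT: returns a list of collision-free goal states, or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample the goal 10% of the time (assumed goal bias), else uniformly.
        if rng.random() < 0.1:
            target = goal
        else:
            target = (rng.uniform(*bounds), rng.uniform(*bounds))
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        scale = min(step, d) / d
        new = (near[0] + scale * (target[0] - near[0]),
               near[1] + scale * (target[1] - near[1]))
        # Only the new state is checked here, not the edge leading to it.
        if collides(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

Calling `rrt((1.0, 1.0), (9.0, 9.0), lambda p: footprint_collides(p, 1.2, [(5.0, 5.0, 1.0)]))` would plan around a single central obstacle for a formation with 1.2 m spacing. Passing the collision check in as a callable keeps the sampler unchanged when the predictor is re-adapted to a new formation.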

Collecting the data for the meta-dynamics model and collision predictor in four tasks

Training task 1:

Wind force is 0.0 N

Neighbour distance is 0.6 m

Training task 2:

Wind force is 0.3 N

Neighbour distance is 1.0 m

Training task 3:

Wind force is 0.5 N

Neighbour distance is 0.8 m

Training task 4:

Wind force is 0.8 N

Neighbour distance is 1.2 m
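The four training tasks above can be written down as a small configuration table. The sketch below is only an illustration of one way to encode them: the class name, field names and the `task_ranges` helper are hypothetical, and only the numeric values come from this page.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TrainingTask:
    """One data-collection task (field names are assumptions)."""
    wind_force_n: float          # wind force in newtons
    neighbour_distance_m: float  # spacing between the two UAVs in metres


# Values as listed for training tasks 1 to 4.
TRAINING_TASKS = (
    TrainingTask(wind_force_n=0.0, neighbour_distance_m=0.6),
    TrainingTask(wind_force_n=0.3, neighbour_distance_m=1.0),
    TrainingTask(wind_force_n=0.5, neighbour_distance_m=0.8),
    TrainingTask(wind_force_n=0.8, neighbour_distance_m=1.2),
)


def task_ranges(tasks: Sequence[TrainingTask]
                ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (min, max) of wind force and of neighbour distance."""
    winds = [t.wind_force_n for t in tasks]
    dists = [t.neighbour_distance_m for t in tasks]
    return (min(winds), max(winds)), (min(dists), max(dists))
```

The ranges returned by `task_ranges(TRAINING_TASKS)` show the span of wind force and formation width covered by the training data, which is useful when judging whether a test condition lies inside or outside that span.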

Performance of the adaptive tracking tasks in figure8 and square paths

Trajectory tracking with figure8 trajectory

Trajectory tracking with square trajectory

The left video shows tracking of the figure8 trajectory and the right video shows tracking of the square trajectory.
The two white balls are the two UAVs and the pink ball is the centre of mass of the payload. The green point marks the goal trajectory that the UAV-payload system needs to follow.
The proposed meta-adaptation and correction method is compared with other model-based, model-free and meta-learning methods. The tracking errors are reported in Table 3.
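As a rough illustration of how an average payload tracking error could be computed, the sketch below assumes the error is the mean Euclidean distance between the payload position and the reference point at each time step. The exact metric used for the reported results is not stated on this page, and the figure8 parametrisation (a lemniscate of Gerono with assumed size and height) and both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def figure8_reference(n: int, a: float = 1.0, height: float = 1.0) -> List[Vec3]:
    """Hypothetical figure8 reference: x = a sin t, y = a sin t cos t."""
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        points.append((a * math.sin(t), a * math.sin(t) * math.cos(t), height))
    return points


def average_tracking_error(actual: Sequence[Vec3],
                           reference: Sequence[Vec3]) -> float:
    """Mean Euclidean distance (metres) between matched samples."""
    if not actual or len(actual) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, r) for p, r in zip(actual, reference))
    return total / len(actual)
```

For example, a payload that follows the reference exactly but hangs 0.05 m too high at every step has an average error of 0.05 m under this definition.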

Performance of collision-free transportation in cross and square scenarios

In an environment with obstacles, the UAV-payload system needs to follow the collision-free trajectories sampled by the path planner to avoid the obstacles. In the cross scenario, there is only one obstacle in the centre. In the square scenario, there are two obstacles. Two more challenging scenarios, Crowd1 and Crowd2, are also demonstrated. Videos and screenshots of the different scenarios in Gazebo and RViz are shown below.

Cross scenarios

Square scenarios

The final merged screenshots of the trajectories of the UAVs and payload are shown below.

Cross scenario

Square scenario

The performance of the proposed path planner with a collision predictor compared with other baselines is shown in Table 5.

Crowd1 and Crowd2 scenarios

Tracking errors in Crowd1 and Crowd2 scenarios

Acknowledgement

We acknowledge the support of the Engineering and Physical Sciences Research Council (EPSRC) funding for this work: DigiCORTEX (EP/W014688/1).
