Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division

Dafni Antotsiou, Carlo Ciliberto and Tae-Kyun Kim

Deep Imitation Learning requires a large number of expert demonstrations, which are not always easy to obtain, especially for complex tasks. A way to overcome this shortage of labels is through data augmentation. However, this cannot be easily applied to control tasks due to the sequential nature of the problem. In this work, we introduce a novel augmentation method which preserves the success of the augmented trajectories. To achieve this, we introduce a semi-supervised correction network that aims to correct distorted expert actions. To adequately test the abilities of the correction network, we develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts. Additionally, we introduce a metric to measure diversity in trajectory datasets. Experiments show that our data augmentation strategy can improve accuracy and convergence time of adversarial imitation while preserving the diversity between the generated and real trajectories.


Demo of the MAPS network on the MT10 task set. It shows how the network automatically divides the tasks window-open, drawer-close, and drawer-open across 3 modules. In the video, module 9 has learnt the behaviour "reach and grab the handle", which is shared by all 3 tasks. Similarly, module 5 has learnt the behaviour "reach towards target moving forward"; it is shared between window-open and drawer-close but is not appropriate for drawer-open. To avoid negative transfer, drawer-open instead uses the task-specific module 7, which has learnt the behaviour "reach target moving backwards".
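The module sharing described in the demo can be written down as a small bookkeeping sketch. This is only an illustration, not the MAPS implementation: the task names and module indices (9, 5, 7) come from the description above, while `TASK_MODULES`, `module_usage`, and `split_shared_and_specific` are hypothetical names introduced here.

```python
from collections import defaultdict

# Task -> modules it uses, as reported in the demo description.
# The dictionary itself is a hypothetical representation; the actual
# network learns this assignment rather than having it hard-coded.
TASK_MODULES = {
    "window-open": {9, 5},
    "drawer-close": {9, 5},
    "drawer-open": {9, 7},
}


def module_usage(task_modules):
    """Invert the mapping: module id -> set of tasks that use it."""
    usage = defaultdict(set)
    for task, modules in task_modules.items():
        for module in modules:
            usage[module].add(task)
    return dict(usage)


def split_shared_and_specific(task_modules):
    """Separate modules used by several tasks from task-specific ones."""
    usage = module_usage(task_modules)
    shared = {m: sorted(tasks) for m, tasks in usage.items() if len(tasks) > 1}
    specific = {m: sorted(tasks)[0] for m, tasks in usage.items() if len(tasks) == 1}
    return shared, specific


shared, specific = split_shared_and_specific(TASK_MODULES)
print("shared modules:", shared)
print("task-specific modules:", specific)
```

Under this reading, modules 9 and 5 come out as shared and module 7 as specific to drawer-open, which matches the division shown in the video.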

Bibtex:

@inproceedings{antotsiou2022maps,
  title={Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division},
  author={Antotsiou, Dafni and Ciliberto, Carlo and Kim, Tae-Kyun},
  booktitle={Proceedings of the 2022 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2022}
}
