Autonomous Learning Group, Max Planck Institute for Intelligent Systems, Germany
Robotic Systems Lab, ETH Zurich, Switzerland
ICRA 2023
Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. However, these methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing the discriminability of the learned skills. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge alongside successful task fulfillment. Finally, the resulting universal policies are tested on the agile quadruped robot Solo 8 and faithfully replicate the diverse skills encoded in the demonstrations.
In this work, we present a novel adversarial imitation learning method named Cooperative Adversarial Self-supervised Skill Imitation (CASSI).
Given an unlabeled dataset, the system trains an imitation discriminator that learns the state transition patterns sampled from the demonstrations. Meanwhile, at the beginning of each episode, a latent skill variable is sampled that conditions the policy to generate distinct motions. A skill discriminator is trained to decode the sampled skill from these motions, and its decoding performance is rewarded to encourage distinguishable skill executions.
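The interplay of the two discriminators can be illustrated with a per-step reward. The sketch below is not the CASSI implementation; it assumes a discrete latent skill drawn once per episode from a uniform prior, an AMP-style least-squares imitation reward, a DIAYN-style skill reward log q(z | s, s') - log p(z), and a simple weighted sum with hypothetical weights `w_imit` and `w_skill`. All function names are illustrative.

```python
import math
import random
from typing import Sequence


def sample_skill(num_skills: int, rng: random.Random) -> int:
    """Sample a latent skill index once per episode from a uniform prior p(z)."""
    return rng.randrange(num_skills)


def imitation_reward(d_score: float) -> float:
    """Reward from the imitation discriminator score for a transition (s, s').

    Assumes a least-squares discriminator trained towards +1 on demonstration
    transitions and -1 on policy transitions (an AMP-style choice).
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


def log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over discriminator logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def skill_reward(skill_logits: Sequence[float], skill: int) -> float:
    """DIAYN-style reward log q(z | s, s') - log p(z) with a uniform prior."""
    log_prior = -math.log(len(skill_logits))
    return log_softmax(skill_logits)[skill] - log_prior


def combined_reward(d_score, skill_logits, skill, w_imit=0.5, w_skill=0.5):
    """Hypothetical weighted sum of the imitation and skill rewards."""
    return w_imit * imitation_reward(d_score) + w_skill * skill_reward(skill_logits, skill)


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_skill(6, rng)  # kept fixed for the whole episode
    logits = [0.0] * 6
    logits[z] = 3.0  # the skill discriminator favours the sampled skill
    print(combined_reward(0.8, logits, z))
```

With a uniform prior, the skill reward is zero when the skill discriminator is maximally confused and grows as it recovers the sampled skill more reliably, which is one way to turn "discriminating performance is rewarded" into a concrete signal.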
CASSI Overview
CASSI allows learning agents to extract individual behaviors from datasets containing diverse, unknown state transition patterns.
We evaluate CASSI in simulation and on the real system using Solo 8, an open-source research quadruped robot capable of a wide range of physical motions.
For evaluation, we construct an unlabeled dataset of mixed motions, including crawl, leap, stilt, trot, walk and wave at various speeds.
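A minimal sketch of how such an unlabeled transition dataset could be assembled, assuming the motion clips are available as lists of state vectors grouped by motion name. The grouping is used only to iterate over the clips; the labels are discarded, and only consecutive state pairs (s_t, s_{t+1}) within a clip are pooled for the imitation discriminator. The names `build_unlabeled_transitions` and `sample_demo_batch` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, State]


def build_unlabeled_transitions(
    clips: Dict[str, List[Sequence[Sequence[float]]]]
) -> List[Transition]:
    """Pool consecutive (s_t, s_{t+1}) pairs from every clip, dropping the labels."""
    pool: List[Transition] = []
    for _label, trajectories in clips.items():  # the motion label is discarded
        for traj in trajectories:
            # Only pairs inside one clip are kept; no transitions across clips.
            for s, s_next in zip(traj[:-1], traj[1:]):
                pool.append((tuple(s), tuple(s_next)))
    return pool


def sample_demo_batch(
    pool: List[Transition], batch_size: int, rng: random.Random
) -> List[Transition]:
    """Draw a minibatch (with replacement) of demonstration transitions."""
    return rng.choices(pool, k=batch_size)
```

Sampling with replacement keeps minibatch sizes independent of how many transitions each motion contributes; a real pipeline might instead balance or reweight clips, which this sketch does not attempt.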
The result is a versatile policy that generates consistently distinguishable state transition patterns. It enables active control of the skills extracted from the unlabeled dataset, with seamless transitions from one skill to another.
New behaviors are discovered in the form of interpolations between reference motions. The composition of these novel skills is quantified using an oracle classifier trained on the original dataset with ground-truth motion labels.
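One plausible way to quantify the components of a discovered skill is to run the oracle classifier on every transition of a rollout and average its predicted class probabilities. The sketch below assumes the oracle returns one logit per reference motion in a fixed order; `skill_composition` and the toy oracle in the usage note are hypothetical stand-ins, not the classifier used in the paper.

```python
import math
from typing import Callable, Dict, Sequence

# Reference motions in the dataset; the order is an assumed oracle output order.
SKILLS = ["crawl", "leap", "stilt", "trot", "walk", "wave"]


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def skill_composition(
    transitions: Sequence[object],
    oracle_logits: Callable[[object], Sequence[float]],
) -> Dict[str, float]:
    """Average the oracle's class probabilities over all transitions of a rollout."""
    if not transitions:
        raise ValueError("need at least one transition")
    totals = [0.0] * len(SKILLS)
    for tr in transitions:
        logits = oracle_logits(tr)
        if len(logits) != len(SKILLS):
            raise ValueError("oracle must return one logit per reference motion")
        for i, p in enumerate(softmax(logits)):
            totals[i] += p
    n = len(transitions)
    return {name: total / n for name, total in zip(SKILLS, totals)}
```

For example, a rollout whose first half the oracle classifies confidently as trot and whose second half as walk yields a composition of roughly 0.5 trot and 0.5 walk. Averaging probabilities rather than counting argmax votes keeps partial, in-between motions visible in the result.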
The learned policy is tested on the real Solo 8 robot for various tasks.
We provide the implementation details of CASSI below.