Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions

Autonomous Learning Group, Max Planck Institute for Intelligent Systems, Germany

Robotic Systems Lab, ETH Zurich, Switzerland

ICRA 2023

Abstract

Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. However, these methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained universal policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.


CASSI

In this work, we present a novel adversarial imitation learning method named Cooperative Adversarial Self-supervised Skill Imitation (CASSI).

Overview

Given an unlabeled dataset, the system trains an imitation discriminator that learns the state transition patterns sampled from the demonstrations. Meanwhile, at the beginning of each episode, a latent skill variable is sampled and motivates the policy to generate distinct motions. A skill discriminator is trained to decode the original skill from these motions. Its discriminating performance is promoted, which encourages the executions of different skills to remain distinguishable from one another.
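The interplay of the two discriminators can be sketched as a reward computation. This is a minimal illustration and not the authors' implementation: the discrete skill set of size `NUM_SKILLS`, the uniform skill prior, the least-squares style imitation reward, the log-likelihood-ratio skill reward, and the weights `w_imitation` and `w_skill` are all assumptions made for the example. The discriminator outputs are passed in as plain numbers in place of trained networks.

```python
import math
import random

NUM_SKILLS = 6  # assumption: size of a discrete latent skill set


def sample_skill(rng, num_skills=NUM_SKILLS):
    """Sample one latent skill index uniformly at the start of an episode."""
    return rng.randrange(num_skills)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_reward(disc_score):
    """Assumed least-squares style reward: 1 when the imitation
    discriminator scores a transition as demonstration-like (score 1),
    falling to 0 as the score approaches -1."""
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def skill_reward(skill_logits, skill, num_skills=NUM_SKILLS):
    """Assumed skill reward: log q(z | transition) - log p(z) with a
    uniform prior p(z). Positive when the skill discriminator decodes
    the sampled skill better than chance."""
    q = softmax(skill_logits)
    return math.log(q[skill]) - math.log(1.0 / num_skills)


def total_reward(disc_score, skill_logits, skill,
                 w_imitation=1.0, w_skill=1.0):
    """Weighted sum of the two terms (the weights are hypothetical)."""
    return (w_imitation * imitation_reward(disc_score)
            + w_skill * skill_reward(skill_logits, skill))


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_skill(rng)            # held fixed for the whole episode
    logits = [0.0] * NUM_SKILLS
    logits[z] = 2.0                  # discriminator favors the true skill
    print(total_reward(0.8, logits, z))
```

Under these assumptions the policy gains reward both for producing transitions that resemble the demonstrations and for making its current skill easy to decode, which is one way to read the cooperative aspect described above.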

CASSI Overview

Summary

CASSI allows learning agents to extract and obtain individual behaviors from datasets containing diverse unknown state transition patterns.

Evaluation

We evaluate CASSI on the Solo 8 robot, an open-source research quadruped robot that performs a wide range of physical actions, both in simulation and on the real system.

Unlabeled Mixed Motions

For evaluation, we construct an unlabeled dataset of mixed motions, including crawl, leap, stilt, trot, walk, and wave, at various speeds.

Versatile Policy

The resulting policy is versatile and generates consistently distinguishable state transition patterns. It enables active control of the skills extracted from the unlabeled dataset, with seamless transitions from one to another.

Novel Behaviors

New behaviors are discovered in the form of reference interpolations. The components of these novel skills are quantified using an oracle classifier, which is trained on the original dataset with ground-truth knowledge.
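One way to picture this quantification is as a breakdown of the oracle's predictions over a rollout. The sketch below is hypothetical: it assumes the oracle classifier emits one motion label per transition and that a skill's composition is reported as the fraction of transitions assigned to each reference motion. The function name `skill_composition` and this aggregation rule are illustrative assumptions, not the procedure used in the paper.

```python
from collections import Counter

# Reference motion labels taken from the dataset description.
MOTIONS = ["crawl", "leap", "stilt", "trot", "walk", "wave"]


def skill_composition(predicted_labels, motions=MOTIONS):
    """Return the fraction of transitions assigned to each reference motion.

    predicted_labels: one oracle-predicted label per transition of a rollout
    (assumed output format of the oracle classifier).
    """
    if not predicted_labels:
        raise ValueError("rollout contains no transitions")
    unknown = set(predicted_labels) - set(motions)
    if unknown:
        raise ValueError(f"unknown motion labels: {sorted(unknown)}")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {m: counts.get(m, 0) / total for m in motions}


if __name__ == "__main__":
    # Made-up predictions for a rollout that mixes trot and walk.
    rollout = ["trot", "trot", "walk", "trot"]
    print(skill_composition(rollout))
```

A skill whose composition is spread over two or more reference motions would, under this reading, correspond to an interpolation between them.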

Policy Deployment

The learned policy is tested on the real Solo 8 robot for various tasks.

Implementation Details

We attach the implementation details of CASSI below.