Co-GAIL

Learning Diverse Strategies for Human-Robot Collaboration


Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese

Stanford University

Supplementary Video

CoGAIL.mp4

Abstract

We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle the diverse human behaviors shown in the demonstrations and remain robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop.

Co-GAIL

We first propose a co-policy model that learns to output both human and robot actions. The model is trained with the GAIL (Generative Adversarial Imitation Learning) algorithm on the collected human-human collaboration data. To uncover the diverse human behaviors, we further introduce a latent human strategy representation z. This representation is trained with two learning objectives that enforce the forward and inverse mappings between the strategy space and human behaviors.
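The structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the dimensions, the one-layer tanh "networks" with random weights, the `CoPolicy` class and its method names, and the squared-error form of the two objectives. The GAIL discriminator and the actual training loop are omitted. The sketch only shows how a strategy z could condition the human action, how an encoder could map observed behavior back to z, and how the robot action could depend on that estimate.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; not values from the paper.
STATE_DIM, ACTION_DIM, Z_DIM = 4, 2, 2


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained network (hypothetical)."""
    return [[random.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Compute tanh(W x), a one-layer stand-in for a neural network."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class CoPolicy:
    """Hypothetical co-policy that outputs both human and robot actions."""

    def __init__(self):
        self.human_w = linear(STATE_DIM + Z_DIM, ACTION_DIM)    # strategy -> behavior
        self.encoder_w = linear(STATE_DIM + ACTION_DIM, Z_DIM)  # behavior -> strategy
        self.robot_w = linear(STATE_DIM + Z_DIM, ACTION_DIM)    # robot reacts to estimated z

    def human_action(self, state, z):
        return apply(self.human_w, state + z)

    def encode(self, state, human_act):
        return apply(self.encoder_w, state + human_act)

    def robot_action(self, state, z_hat):
        return apply(self.robot_w, state + z_hat)


policy = CoPolicy()
state = [0.1, -0.3, 0.5, 0.0]

# Objective A (assumed form): a sampled strategy should be recoverable
# from the human behavior it generates.
z = [random.uniform(-1.0, 1.0) for _ in range(Z_DIM)]
human_act = policy.human_action(state, z)
z_hat = policy.encode(state, human_act)
recover_loss = mse(z, z_hat)

# Objective B (assumed form): a demonstrated human action, once encoded into
# a strategy, should be reproducible from that strategy.
demo_act = [0.2, -0.1]
z_demo = policy.encode(state, demo_act)
reconstruct_loss = mse(demo_act, policy.human_action(state, z_demo))

# The robot never sees z directly; it acts on the estimate.
robot_act = policy.robot_action(state, z_hat)
```

In an actual training setup, both losses would be minimized together with the adversarial imitation objective; here they are only computed once to show the data flow.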

Evaluation with a real human

During evaluation, a real human operator controls the humanoid in the simulation. The robot first estimates the human's hidden strategy and then reacts accordingly.
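The estimate-then-react loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `estimate_strategy` and `run_episode`, the sliding-window averaging of per-step estimates, and the toy encoder and robot policy are all hypothetical stand-ins for the trained models.

```python
from collections import deque


def estimate_strategy(encoder, history):
    """Average per-step strategy estimates over the recent history (assumed scheme)."""
    estimates = [encoder(state, action) for state, action in history]
    dim = len(estimates[0])
    return [sum(e[i] for e in estimates) / len(estimates) for i in range(dim)]


def run_episode(human_steps, encoder, robot_policy, window=5):
    """Replay (state, human_action) pairs and return the robot's reaction at each step."""
    history = deque(maxlen=window)  # only the most recent human behavior is kept
    robot_actions = []
    for state, human_action in human_steps:
        history.append((state, human_action))
        z_hat = estimate_strategy(encoder, history)        # estimate the hidden strategy
        robot_actions.append(robot_policy(state, z_hat))   # react to the estimate
    return robot_actions


# Toy stand-ins for the learned models (hypothetical).
def toy_encoder(state, human_action):
    # The "strategy" is the direction in which the human is moving.
    return [1.0 if human_action[0] >= 0 else -1.0]


def toy_robot_policy(state, z_hat):
    # The robot moves opposite to the human's estimated direction.
    return [-z_hat[0]]


# The operator switches direction halfway through the episode.
steps = [([0.0], [0.5]), ([0.1], [0.4]), ([0.2], [-0.6]), ([0.1], [-0.7]), ([0.0], [-0.8])]
actions = run_episode(steps, toy_encoder, toy_robot_policy, window=2)
```

With a short window, the robot's estimate follows the operator's change of direction within a couple of steps. A longer window would smooth the estimate but adapt more slowly when the human changes strategy.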

Experiment setups

Tasks

We test our method and baselines in three task domains:

  • A low-dimensional 2D collaborative game, Fetch-Quest (2D-Fetch-Quest).

  • A high-dimensional human-robot handover task (HR-Handover).

  • A multi-stage human-robot sequential manipulation task (HR-SeqManip).

2D-Fetch-Quest

HR-Handover

HR-SeqManip

Data collection

We aim to learn human-robot collaborative behaviors from human-human collaboration data. Our first step is to collect the human-human collaboration demonstrations.

For 2D-Fetch-Quest, the data is collected with two people using a pair of joysticks.

2D-Fetch-Quest (Joystick control)

For HR-Handover and HR-SeqManip, we use the phone teleoperation platform RoboTurk. Each of the two operators holds a phone to control the end-effector of either the humanoid or the robot in the simulation, and together they complete a collaboration task.

HR-Handover & HR-SeqManip (Phone teleoperation control)

Experiment results (real human evaluation)

Here we show the real-human evaluation results in the three task domains with different human operators. For the other experimental results (replay evaluation and interpolation), please refer to our paper and the supplementary video (top of this page).

2D-Fetch-Quest: MA-InfoGAIL, DIAYN, Co-GAIL (ours)

HR-Handover: MA-InfoGAIL, Co-GAIL (ours)

HR-SeqManip: MA-InfoGAIL, Co-GAIL (ours)