Co-GAIL
Learning Diverse Strategies for Human-Robot Collaboration
Chen Wang, Claudia Pérez-D'Arpino, Danfei Xu, Li Fei-Fei, C. Karen Liu, Silvio Savarese
Stanford University
Paper: available on arXiv
Code: available on GitHub
Supplementary Video
Abstract
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle the diverse human behaviors shown in the demonstrations and remain robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop.
Co-GAIL
We first propose a co-policy model that outputs both human and robot actions. The model is trained with generative adversarial imitation learning (GAIL) on the collected human-human collaboration data. To capture the diversity of human behaviors, we further introduce a latent human strategy representation z, trained with two learning objectives that enforce a forward and an inverse mapping between the strategy space and human behaviors.
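As a rough illustration of these components, the sketch below shows a co-policy conditioned on a strategy code z, a GAIL discriminator over joint state-action pairs, and an inverse mapping that recovers z from observed human behavior. This is a minimal PyTorch sketch; the class names, network sizes, and dimensions are our own assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    # Small fully connected network; sizes are illustrative.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )

class CoPolicy(nn.Module):
    """Outputs human and robot actions conditioned on state and a latent strategy z."""
    def __init__(self, state_dim, z_dim, human_act_dim, robot_act_dim):
        super().__init__()
        self.human_head = mlp(state_dim + z_dim, human_act_dim)
        self.robot_head = mlp(state_dim + z_dim, robot_act_dim)

    def forward(self, state, z):
        inp = torch.cat([state, z], dim=-1)
        return self.human_head(inp), self.robot_head(inp)

class Discriminator(nn.Module):
    """GAIL discriminator: scores demonstration vs. generated state-action pairs."""
    def __init__(self, state_dim, human_act_dim, robot_act_dim):
        super().__init__()
        self.net = mlp(state_dim + human_act_dim + robot_act_dim, 1)

    def forward(self, state, a_human, a_robot):
        return self.net(torch.cat([state, a_human, a_robot], dim=-1))

class InverseMap(nn.Module):
    """Inverse objective: recover the strategy code z from observed human behavior,
    so that distinct codes correspond to distinguishable behaviors."""
    def __init__(self, state_dim, human_act_dim, z_dim):
        super().__init__()
        self.net = mlp(state_dim + human_act_dim, z_dim)

    def forward(self, state, a_human):
        return self.net(torch.cat([state, a_human], dim=-1))
```

One way to wire these together, in the style of the InfoGAIL family of methods, is to use the discriminator output as the imitation reward for the co-policy and the inverse map's error in reconstructing the sampled z as the diversity term.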
Evaluation with real human
During evaluation, the humanoid in the simulation is controlled by a real human operator; the robot first estimates the hidden strategy of the human from the ongoing interaction and reacts accordingly.
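One simple way to realize this online estimation is a recurrent network that consumes the interaction history so far and outputs a strategy estimate, which the robot policy then conditions on. The sketch below is illustrative only; the GRU, its sizes, and the control-loop names are assumptions rather than the paper's exact estimator.

```python
import torch
import torch.nn as nn

class StrategyEstimator(nn.Module):
    """Estimates the human's latent strategy z from the interaction history."""
    def __init__(self, obs_dim, z_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, z_dim)

    def forward(self, history):
        # history: (batch, T, obs_dim) of observed states and human motions.
        out, _ = self.gru(history)
        return torch.tanh(self.head(out[:, -1]))  # estimate from the latest step

# Hypothetical control loop: re-estimate z at every step and condition the
# robot action on the current estimate (co_policy as in the sketch above).
#   z_hat = estimator(history)
#   a_robot = co_policy.robot_head(torch.cat([state, z_hat], dim=-1))
```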
Experiment setups
Tasks
We test our method and baselines in three task domains:
A 2D low-dimensional collaborative game Fetch-Quest. (2D-Fetch-Quest)
A high-dimensional human-robot handover. (HR-Handover)
A multi-stage human-robot sequence manipulation task. (HR-SeqManip)
Task demonstration videos: 2D-Fetch-Quest, HR-Handover, HR-SeqManip
Data collection
We aim to learn human-robot collaborative behaviors from human-human collaboration data, so our first step is to collect human-human demonstrations.
For 2D-Fetch-Quest, we collect data from two people playing the game together with a pair of joysticks.
2D-Fetch-Quest (Joystick control)
For HR-Handover and HR-SeqManip, we use the phone teleoperation platform RoboTurk. Two operators each hold a phone to control the end-effectors of a humanoid and a robot in simulation and complete the collaboration task together.
HR-Handover & HR-SeqManip (Phone teleoperation control)
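Concretely, each demonstration can be logged as synchronized triples of state, human action, and partner action. The loop below is a hypothetical recording sketch; the two-agent env and controller interfaces and the field names are assumptions, not the released data pipeline.

```python
import numpy as np

def record_demo(env, operator_a, operator_b):
    """Roll out one human-human teleoperation episode and log paired actions.
    env, operator_a, and operator_b are hypothetical interfaces."""
    traj = {"states": [], "human_actions": [], "partner_actions": []}
    state, done = env.reset(), False
    while not done:
        a_h = operator_a.get_action(state)  # e.g., joystick or phone pose command
        a_p = operator_b.get_action(state)
        traj["states"].append(state)
        traj["human_actions"].append(a_h)
        traj["partner_actions"].append(a_p)
        state, done = env.step(a_h, a_p)    # two-agent step (assumed interface)
    return {k: np.asarray(v) for k, v in traj.items()}
```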
Experiment results (real human evaluation)
Here we show the real-human evaluation results across the three task domains, each with different human operators. For the remaining experiments (replay evaluation and interpolation), please refer to our paper and the supplementary video (top of this page).
Comparison videos:
2D-Fetch-Quest: MA-InfoGAIL, DIAYN, Co-GAIL (ours)
HR-Handover: MA-InfoGAIL, Co-GAIL (ours)
HR-SeqManip: MA-InfoGAIL, Co-GAIL (ours)