Eliciting Compatible Demonstrations for
Multi-Human Imitation Learning

Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh

ABSTRACT

Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation. While the ideal dataset for imitation learning is homogeneous and low-variance (reflecting a single, optimal method for performing a task), natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task. This multimodality is inconsequential to human users, with task variations manifesting as subconscious choices; for example, reaching down, then across to grasp an object, versus reaching across, then down. Yet, this mismatch presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations. To combat this problem of demonstrator incompatibility, this work designs an approach for 1) measuring the compatibility of a new demonstration given a base policy, and 2) actively eliciting more compatible demonstrations from new users. Across two simulation tasks requiring long-horizon, dexterous manipulation and a real-world "food plating" task with a Franka Emika Panda arm, we show that we can both identify incompatible demonstrations via post-hoc filtering, and apply our compatibility measure to actively elicit compatible demonstrations from new users, leading to improved task success rates across simulated and real environments.
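One plausible way to operationalize the compatibility measure described above is to score a new demonstration by the likelihood of its actions under the base policy. The sketch below is a minimal illustration of that idea, not the paper's exact method: it assumes the base policy exposes a predicted mean action per state (the `policy` callable and the fixed `std` are assumptions for this sketch), and scores a demonstration by the mean per-step Gaussian log-likelihood of its actions.

```python
# Hedged sketch of a demonstration-compatibility score: mean log-likelihood
# of a demo's actions under a base policy with an assumed Gaussian action
# head. This is an illustration, not the authors' exact formulation.
import numpy as np


def gaussian_log_prob(action, mean, std):
    """Log-density of `action` under an isotropic Gaussian with the given mean/std."""
    var = std ** 2
    return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                        - (action - mean) ** 2 / (2 * var)))


def compatibility_score(demo, policy, std=0.1):
    """Score a demonstration against a base policy.

    `demo` is a list of (state, action) pairs; `policy(state)` is assumed to
    return the base policy's predicted mean action for that state. Higher
    scores indicate the demo is more compatible with the base policy; a
    threshold on this score could drive post-hoc filtering or online feedback.
    """
    per_step = [gaussian_log_prob(a, policy(s), std) for s, a in demo]
    return float(np.mean(per_step))


# Toy usage: a "base policy" that always predicts the zero action.
base_policy = lambda s: np.zeros(2)
compatible_demo = [(None, np.zeros(2)) for _ in range(5)]      # matches the policy
incompatible_demo = [(None, np.ones(2)) for _ in range(5)]     # conflicting style
assert compatibility_score(compatible_demo, base_policy) > \
       compatibility_score(incompatible_demo, base_policy)
```

In an interactive setting, the same score could be computed online as a user demonstrates, so that low-scoring (incompatible) demonstrations trigger corrective feedback rather than being silently added to the dataset.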

Interface for Collection

practice-5x.mp4

The user explores the controls and completes the task three times.

learn-5x.mp4

The user sees five expert demonstrations and is asked to mimic the style of the expert.

teach2-3x.mp4

The user performs the task and receives online visual feedback about their actions.

feedback-5x.mp4

Corrective feedback is shown to the user if a demonstration is rejected.

Active Elicitation Results

Success rates (mean/std across users) for the user studies, comparing both naive and informed demonstration collection against base users.

Policy Rollouts for Active Elicitation vs Naive Collection

user-1-sharp-compatible-success.mov

(a) Successful rollout of a policy trained on demonstrations from an informed operator

user-1-naive-sideways-tilt-fail.mov

(b) Failed rollout of a policy trained on demonstrations from a naive operator

Evaluation trajectories for the Informed (a) and Naive (b) conditions. In (b), the user provides conflicting demos (a "sideways tilt," moving laterally and rotating sideways) compared to the initial demos ("vertical plating"). The resulting policy moves sideways and forward before failing.