HITL-TAMP
Human-In-The-Loop Task and Motion Planning for Imitation Learning
HITL-TAMP decomposes a robot manipulation task (such as making coffee) into planning-based and learning-based control segments. The planning-based segments are handled by a TAMP planner and the learning-based segments are handled by a human demonstrator (during data collection) and a policy trained on demonstrations (during policy deployment).
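The handoff described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HITL-TAMP implementation: the `Segment` class, the `run_episode` function, the segment names for the coffee task, and the three stub controllers are all hypothetical and only show how control alternates between the planner and the human (or the trained policy).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    name: str  # hypothetical segment label
    mode: str  # "tamp" for planning-based, "learned" for learning-based


# Hypothetical decomposition of a coffee task into alternating segments.
COFFEE_SEGMENTS = [
    Segment("reach_pod", "tamp"),
    Segment("grasp_and_insert_pod", "learned"),
    Segment("reach_lid", "tamp"),
    Segment("close_lid", "learned"),
]


def run_episode(
    segments: List[Segment],
    planner: Callable[[Segment], str],
    learner: Callable[[Segment], str],
) -> List[str]:
    """Run each segment with the controller that owns it and return a trace."""
    trace = []
    for seg in segments:
        if seg.mode == "tamp":
            handler = planner
        elif seg.mode == "learned":
            handler = learner
        else:
            raise ValueError(f"unknown segment mode: {seg.mode}")
        trace.append(handler(seg))
    return trace


# Stub controllers (assumptions): each just reports who handled the segment.
def tamp_planner(seg: Segment) -> str:
    return f"TAMP executes {seg.name}"


def human_teleop(seg: Segment) -> str:
    return f"human teleoperates {seg.name}"


def trained_policy(seg: Segment) -> str:
    return f"policy executes {seg.name}"


# Data collection: the human handles the learned segments.
collection_trace = run_episode(COFFEE_SEGMENTS, tamp_planner, human_teleop)
# Deployment: the trained policy takes the human's place.
deployment_trace = run_episode(COFFEE_SEGMENTS, tamp_planner, trained_policy)
```

The only difference between data collection and deployment in this sketch is which callable is passed as `learner`; the planning-based segments are handled the same way in both cases.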
We develop HITL-TAMP, an efficient data-collection system for long-horizon and contact-rich manipulation tasks that synergistically combines and trades off control between a TAMP system and a human teleoperator.
HITL-TAMP contains novel components including (1) a mechanism that allows TAMP to learn planning conditions from a small number of demonstrations and (2) a queueing system that allows an operator to manage a fleet of parallel data collection sessions.
We conduct a user study with 15 users to compare HITL-TAMP with a conventional teleoperation system. Given the same time budget, users collectively gathered over 1.4K demos with HITL-TAMP, more than 3x as many as with the conventional system. Proficient agents (over 75% success) could be trained from just 10 minutes of non-expert teleoperation data.
We collected an additional 2.1K demos with HITL-TAMP across 12 contact-rich and long-horizon tasks and show that HITL-TAMP often produces near-perfect agents.
Below, we show several uncut videos of HITL-TAMP agent performance on real-world manipulation.
Real-World Tool Hang Agent (64% success and 88% frame insertion success from 50 HITL-TAMP demos)
Real-World Stack Three Agent (62% success from 100 HITL-TAMP demos)
Real-World Coffee Broad Agent (66% success from 100 HITL-TAMP demos)
Real-World Coffee Agent (74% success from 100 HITL-TAMP demos)
HITL-TAMP's queueing system allows human operators to manage fleets of asynchronously-running data collection sessions. This allows each operator to more effectively use their time and scale the amount of data that they collect. We collected 2.1K+ demos using the system across these tasks.
Square
Square Broad
Three Piece Assembly
Three Piece Assembly Broad
Coffee
Coffee Broad
Coffee Preparation
Tool Hang
Tool Hang Broad
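To illustrate why the queueing system described above helps an operator use their time, here is a toy discrete-time simulation. It is a sketch under assumptions, not the HITL-TAMP queueing system: the `Session` class, the `simulate` function, the fixed per-segment durations, and the first-in-first-out ordering are all hypothetical. Each session alternates between a TAMP segment that runs without the operator and a human segment that needs the operator; sessions waiting for the operator sit in a queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Session:
    sid: int
    tamp_left: int  # ticks until TAMP hands control to the human
    demos: int = 0  # completed demonstrations in this session


def simulate(num_sessions: int, tamp_steps: int, human_steps: int, total_ticks: int):
    """Return (total demos collected, ticks the operator spent teleoperating)."""
    sessions = [Session(i, tamp_steps) for i in range(num_sessions)]
    planning = list(sessions)  # sessions currently controlled by TAMP
    waiting = deque()          # sessions waiting for the operator (FIFO)
    active = None              # session the operator is controlling
    human_left = 0
    busy_ticks = 0

    for _ in range(total_ticks):
        # TAMP segments advance in parallel, without the operator.
        still_planning = []
        for s in planning:
            s.tamp_left -= 1
            if s.tamp_left <= 0:
                waiting.append(s)
            else:
                still_planning.append(s)
        planning = still_planning

        # The operator serves one waiting session at a time.
        if active is None and waiting:
            active = waiting.popleft()
            human_left = human_steps
        if active is not None:
            busy_ticks += 1
            human_left -= 1
            if human_left == 0:
                # Demo finished: the session resets and TAMP plans again.
                active.demos += 1
                active.tamp_left = tamp_steps
                planning.append(active)
                active = None

    return sum(s.demos for s in sessions), busy_ticks


single = simulate(num_sessions=1, tamp_steps=2, human_steps=2, total_ticks=12)
fleet = simulate(num_sessions=3, tamp_steps=2, human_steps=2, total_ticks=12)
```

With a single session, the operator idles whenever TAMP is planning or executing. With several sessions running in parallel, another session is usually already waiting when the operator finishes a segment, so the operator spends more of the same time budget teleoperating and collects more demos.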
Each policy below was trained on 200 demos from HITL-TAMP. Several policies are near-perfect.
Square (100%)
Square Broad (100%)
Three Piece Assembly (100%)
Three Piece Assembly Broad (85%)
Coffee (100%)
Coffee Broad (99%)
Coffee Preparation (96%)
Tool Hang (81%)
Tool Hang Broad (49%)
The policies below were trained on just 10 minutes of operator data from an operator with little to no experience with teleoperation. We compare policies trained on 10 minutes of HITL-TAMP data and 10 minutes of conventional teleoperation data.
Coffee (HITL-TAMP) (100%)
Coffee (Conventional) (28%)
Square Broad (HITL-TAMP) (84%)
Square Broad (Conventional) (0%)
Three Piece Assembly Broad (HITL-TAMP) (22%)
Three Piece Assembly Broad (Conventional) (0%)
Square
Square Broad
Three Piece Assembly
Three Piece Assembly Broad
Coffee
Coffee Broad
Coffee Preparation
Tool Hang
Tool Hang Broad
Stack Three Real
Tool Hang Real
Coffee Real
Coffee Broad Real
The videos below are from the HITL-TAMP datasets that we used to train our real world agents. The video resolution matches the image resolution used for policy training.
Stack Three Real (100 demos)
Tool Hang Real (50 demos)
Coffee Real (100 demos)
Coffee Broad Real (100 demos)
Real Tool Hang Task
Real Stack Three Task
Real Coffee Broad Task
Real Coffee (Left) Task
Real Coffee (Right) Task
Simulation Demonstrations
Simulation Initial States