Transformers are Adaptable Task Planners
6th Conference on Robot Learning (CoRL 2022)
Vidhi Jain*^, Yixin Lin*, Eric Undersander*, Yonatan Bisk^, Akshara Rai*
Abstract
Every home is different, and every person likes things done in their particular way. Therefore, home robots of the future need to both reason about the sequential nature of day-to-day tasks and generalize to users' preferences. To this end, we propose a Transformer Task Planner (TTP) that learns high-level actions from demonstrations by leveraging object attribute-based representations. TTP can be pre-trained on multiple preferences and generalizes to unseen preferences using a single demonstration as a prompt in a simulated dishwasher-loading task. Further, we demonstrate real-world dish rearrangement with TTP on a Franka Panda robotic arm, prompted using a single human demonstration.
Figure: the representation pipeline, from Scene to Instances, and from Instances to Prediction.
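As a rough sketch of this attribute-based instance representation, each object in the scene can be embedded into a single token from its discrete attributes and pose. The attribute vocabularies, dimensions, and module names below are illustrative assumptions, not the exact features used in the paper.

import torch
import torch.nn as nn

# Hypothetical attribute vocabularies; the paper's exact feature set may differ.
CATEGORIES = ["plate", "bowl", "cup", "tray"]
LOCATIONS = ["counter", "bottom_rack", "top_rack"]

class InstanceEncoder(nn.Module):
    """Embeds one object instance from discrete attributes plus a 3D position."""
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.category_emb = nn.Embedding(len(CATEGORIES), d_model)
        self.location_emb = nn.Embedding(len(LOCATIONS), d_model)
        self.pose_proj = nn.Linear(3, d_model)  # x, y, z position

    def forward(self, category: torch.Tensor, location: torch.Tensor,
                pose: torch.Tensor) -> torch.Tensor:
        # Sum the attribute embeddings into a single instance token.
        return (self.category_emb(category)
                + self.location_emb(location)
                + self.pose_proj(pose))

# A scene becomes a set of instance tokens, one per object.
enc = InstanceEncoder()
category = torch.tensor([0, 2])         # plate, cup
location = torch.tensor([0, 0])         # both on the counter
pose = torch.randn(2, 3)                # placeholder positions
tokens = enc(category, location, pose)  # shape: (2, 128)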
Prompt-Situation Architecture
The prompt encoder takes one demonstration, called a "prompt", and outputs a learned representation of the preference it exhibits.
These output prompt tokens are passed to a situation decoder, which also receives the current state of the environment, referred to as a "situation".
The decoder is trained to predict the action the expert would choose in the given situation while adhering to the preference shown in the prompt.
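A minimal sketch of this prompt-conditioned setup, assuming standard PyTorch transformer modules; the dimensions, layer counts, and action head below are illustrative rather than the exact architecture.

import torch
import torch.nn as nn

class TransformerTaskPlanner(nn.Module):
    """Prompt encoder + situation decoder, in the spirit of TTP."""
    def __init__(self, d_model: int = 128, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        # Encodes the prompt demonstration (a sequence of instance/action
        # tokens) into a learned representation of the preference.
        self.prompt_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        # Cross-attends from the current situation tokens to the prompt tokens.
        self.situation_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        # Scores each situation token as the next object/placement to act on.
        self.action_head = nn.Linear(d_model, 1)

    def forward(self, prompt_tokens, situation_tokens):
        preference = self.prompt_encoder(prompt_tokens)
        decoded = self.situation_decoder(situation_tokens, preference)
        # One logit per situation token; the argmax picks the expert-like action.
        return self.action_head(decoded).squeeze(-1)

# Usage: one prompt demonstration conditions predictions for a new situation.
model = TransformerTaskPlanner()
prompt = torch.randn(1, 12, 128)     # 12 tokens from the prompt demonstration
situation = torch.randn(1, 8, 128)   # 8 candidate tokens in the current scene
logits = model(prompt, situation)    # shape: (1, 8)
next_action = logits.argmax(dim=-1)

In this sketch, the decoder's cross-attention lets the situation tokens attend to the encoded prompt, so the same trained weights can express a different preference at test time simply by swapping in a different prompt demonstration.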
Simulation experiment
Preference shown in the prompt: "plates and trays on the bottom rack, then cups and bowls on the top rack."
Videos: the prompt demonstration and the corresponding situation rollout.
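To make the ordering in this preference concrete, a small checker over a placement sequence can be written as follows; the category names and rack labels are simplified assumptions about the simulated task, not code from the paper.

# Hypothetical checker for the prompted preference: plates/trays go to the
# bottom rack first, then cups/bowls go to the top rack.
BOTTOM_FIRST = {"plate", "tray"}
TOP_LATER = {"cup", "bowl"}

def follows_preference(placements: list[tuple[str, str]]) -> bool:
    """placements: ordered (category, rack) pairs, e.g. ('plate', 'bottom')."""
    seen_top_phase = False
    for category, rack in placements:
        if category in BOTTOM_FIRST:
            if rack != "bottom" or seen_top_phase:
                return False  # wrong rack, or placed after the top phase began
        elif category in TOP_LATER:
            if rack != "top":
                return False
            seen_top_phase = True
    return True

assert follows_preference([("plate", "bottom"), ("tray", "bottom"),
                           ("cup", "top"), ("bowl", "top")])
assert not follows_preference([("cup", "top"), ("plate", "bottom")])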
Real-world experiments
Videos shown at 8x speed.
Trial 1: 4 out of 4 successfully placed.
Trial 2: 4 out of 4 successfully placed.
Trial 3: 3 out of 4 successfully placed. The dark blue plate toppled and was not detected.
Trial 4: 3 out of 4 successfully placed. The pink bowl toppled and was not detected.