Overview
Framework
We design a unified Transformer-based architecture that interprets multi-modal data and outputs pick-and-place actions together with a task-completion prediction. We introduce a visible connectivity graph to handle the complex configurations and dynamics of deformable objects.
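To make the idea of a visible connectivity graph more concrete, here is a minimal sketch. This page does not specify how the graph is built, so everything below is an assumption for illustration: nodes are taken to be the currently visible surface points of the object, and an edge links two visible points that lie within a fixed radius of each other. The function name `build_visible_graph` and the parameters `points`, `visible`, and `radius` are hypothetical and are not taken from the authors' code.

```python
import math
from itertools import combinations


def build_visible_graph(points, visible, radius):
    """Illustrative only: link visible points that lie within `radius`.

    points  -- list of (x, y, z) tuples, one per sampled surface point
    visible -- list of booleans, True if the point is seen by the camera
    radius  -- assumed distance threshold for connecting two points
    """
    # Keep only the indices of points that are currently visible.
    node_ids = [i for i, is_visible in enumerate(visible) if is_visible]

    # Connect every pair of visible points that are close enough.
    edges = []
    for i, j in combinations(node_ids, 2):
        if math.dist(points[i], points[j]) <= radius:
            edges.append((i, j))
    return node_ids, edges


# Toy example: four points on a cloth, one of them occluded.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.1, 0.1, 0.0)]
visible = [True, True, False, True]
nodes, edges = build_visible_graph(points, visible, radius=0.12)
print(nodes, edges)
```

In this sketch the occluded point is dropped from the graph entirely, so the graph describes only what the camera can observe. A learned model might instead predict which edges exist; the radius rule is used here only as a simple stand-in.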
Examples of language-conditioned deformable object manipulation tasks
Seen instructions, unseen instructions, and unseen tasks are marked in black, grey, and red, respectively.
Videos of robot executions in the real-world experiments
Task: corner folding
Task: triangle folding
Task: half folding
Task: T-shirt folding
Task: trousers folding
If you have any questions, please feel free to contact us at:
mok21@mails.tsinghua.edu.cn
yuhongdeng@u.nus.edu