Foldsformer: Learning Sequential Multi-Step Cloth Manipulation
with Space-Time Attention
Kai Mo, Chongkun Xia, Xueqian Wang, Yuhong Deng, Xuehai Gao and Bin Liang
Tsinghua University
[IEEE Manuscript] | [ArXiv] | [Code]
Abstract
Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation: the robot must perceive the cloth state and plan a sequence of chained actions leading to the desired state. Most previous works address this problem in a goal-conditioned way, where a goal observation must be given for each specific task and cloth configuration, which is neither practical nor efficient. Thus, we present a novel multi-step cloth manipulation planning framework named Foldsformer. Foldsformer can complete similar tasks given only a general demonstration, and it uses a space-time attention mechanism to capture the instruction information behind this demonstration. We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks and show that it significantly outperforms state-of-the-art approaches in simulation. Foldsformer can complete multi-step cloth manipulation tasks even when the cloth configuration (e.g., size and pose) differs from the configurations in the general demonstrations. Furthermore, our approach can be transferred from simulation to the real world without additional training or domain randomization. Despite being trained only on rectangular cloths, our approach also generalizes to unseen cloth shapes (T-shirts and shorts).
Framework
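The core of the framework is space-time attention over the demonstration frames and the current observation. As a rough illustration of how such a mechanism can be realized, below is a minimal PyTorch sketch of divided space-time attention (temporal attention followed by spatial attention, in the style of video transformers). The class name SpaceTimeBlock, the dimensions, and the tensor layout are illustrative assumptions, not the authors' actual implementation; see the linked code for the real architecture.

```python
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """Divided space-time attention over patch tokens (hypothetical sketch).

    Input x has shape (B, T, N, D): batch, frames, patches per frame, dim.
    Temporal attention mixes the same patch location across frames; spatial
    attention then mixes patches within each frame.
    """
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        B, T, N, D = x.shape
        # Temporal attention: fold patches into the batch, attend across T frames.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        h = self.norm1(t)
        t = t + self.time_attn(h, h, h, need_weights=False)[0]
        x = t.reshape(B, N, T, D).permute(0, 2, 1, 3)
        # Spatial attention: fold frames into the batch, attend across N patches.
        s = x.reshape(B * T, N, D)
        h = self.norm2(s)
        s = s + self.space_attn(h, h, h, need_weights=False)[0]
        return s.reshape(B, T, N, D)

# Example: 4 demonstration frames + 1 current observation, 64 patch tokens each.
tokens = torch.randn(1, 5, 64, 256)
out = SpaceTimeBlock()(tokens)  # -> torch.Size([1, 5, 64, 256])
```

Factorizing attention this way lets each patch track how its location evolves across the demonstration sequence before pooling spatial context within a frame, which is what allows a single general demonstration to condition the predicted actions.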
Experimental Setup
Our robot system (a) consists of a 7-DOF Franka Emika Panda robot arm with a standard two-finger Panda gripper. We use three square cloths of different sizes, a T-shirt, and a pair of shorts:
(b) a 20 × 20 cm cloth
(c) a 30 × 30 cm cloth
(d) a 35 × 35 cm cloth
(e) a T-shirt (50 × 35 cm, unseen cloth shape during training)
(f) a pair of shorts (40 × 32 cm, unseen cloth shape during training)
Videos of robot executions in the real world
Task: Double Triangle
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (20 × 20 cm, rotate 0°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (35 × 35 cm, rotate 30°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (30 × 30 cm, rotate 45°)
Pick and place actions are visualized as red arrows
Task: Double Straight
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (20 × 20 cm, rotate 0°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (35 × 35 cm, rotate 30°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (30 × 30 cm, rotate 45°)
Pick and place actions are visualized as red arrows
Task: All Corners Inward
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (20 × 20 cm, rotate 0°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (35 × 35 cm, rotate 30°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (30 × 30 cm, rotate 45°)
Pick and place actions are visualized as red arrows
Task: Corners Edges Inward
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (20 × 20 cm, rotate 0°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (35 × 35 cm, rotate 30°)
Pick and place actions are visualized as red arrows
Demonstration Images (30 × 30 cm, rotate 0°)
Actions & Achieved (30 × 30 cm, rotate 45°)
Pick and place actions are visualized as red arrows
Task: T-shirt Folding
The T-shirt is an unseen cloth shape during training; we train only on rectangular cloths in simulation.
Demonstration Images
Actions & Achieved
Pick and place actions are visualized as red arrows
Task: Shorts Folding
Demonstration Images
Actions & Achieved
Pick and place actions are visualized as red arrows
If you have any questions, please feel free to contact me at mok21@tsinghua.org.cn.
Acknowledgement
This work was supported by the National Key R&D Program of China (2022YFB4701400/4701402), the National Natural Science Foundation of China (No. U1813216, U21B6002, 62203260), the Shenzhen Science Fund for Distinguished Young Scholars (RCJC20210706091946001), the Guangdong Young Talent with Scientific and Technological Innovation program (2019TQ05Z111), and the China Postdoctoral Science Foundation (2022M711823).
If you find this code useful in your research, please consider citing:
@ARTICLE{mo2022foldsformer,
author={Mo, Kai and Xia, Chongkun and Wang, Xueqian and Deng, Yuhong and Gao, Xuehai and Liang, Bin},
journal={IEEE Robotics and Automation Letters},
title={Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention},
year={2023},
volume={8},
number={2},
pages={760-767},
doi={10.1109/LRA.2022.3229573}
}