Foldsformer: Learning Sequential Multi-Step Cloth Manipulation with Space-Time Attention

Kai Mo, Chongkun Xia, Xueqian Wang, Yuhong Deng, Xuehai Gao and Bin Liang

Tsinghua University

Abstract

Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation: the robot must perceive the cloth state and plan a sequence of chained actions that leads to the desired state. Most previous works address this problem in a goal-conditioned way, where a goal observation must be given for each specific task and cloth configuration, which is neither practical nor efficient. We therefore present a novel multi-step cloth manipulation planning framework named Foldsformer. Foldsformer can complete similar tasks given only a general demonstration, and it uses a space-time attention mechanism to capture the instruction information behind this demonstration. We experimentally evaluate Foldsformer on four representative sequential multi-step manipulation tasks and show that it significantly outperforms state-of-the-art approaches in simulation. Foldsformer can complete multi-step cloth manipulation tasks even when the cloth configuration (e.g., size and pose) differs from the configuration in the general demonstration. Furthermore, our approach can be transferred from simulation to the real world without additional training or domain randomization. We also show that, despite being trained only on rectangular cloths, our approach generalizes to unseen cloth shapes (T-shirts and shorts).
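This page names a space-time attention mechanism without describing it. As an illustration only, the sketch below shows one common form of space-time attention, the divided scheme popularized by TimeSformer: each patch first attends to the same patch location across all frames (temporal attention), then to all patches within its own frame (spatial attention), each followed by a residual add. It assumes the demonstration and observation images have already been split into T frames of N patch embeddings of dimension D. Whether Foldsformer uses exactly this factorization is not stated here, and the function names (`attend`, `temporal_attention`, `spatial_attention`, `divided_space_time_block`) are hypothetical. Identity query/key/value projections and the omission of layer norm, multiple heads, and the MLP are simplifications, so this is not the Foldsformer implementation.

```python
import math
from typing import List

Vector = List[float]
Tokens = List[List[Vector]]  # tokens[t][n] is the D-dim embedding of patch n in frame t


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a set of keys.

    Simplification (assumption): queries, keys and values are the raw
    embeddings (identity projections); a real model learns W_q, W_k, W_v.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def temporal_attention(x: Tokens) -> Tokens:
    """Each patch attends to the same patch location across all frames."""
    T, N = len(x), len(x[0])
    out: Tokens = [[[] for _ in range(N)] for _ in range(T)]
    for n in range(N):
        column = [x[t][n] for t in range(T)]
        for t in range(T):
            out[t][n] = attend(x[t][n], column)
    return out


def spatial_attention(x: Tokens) -> Tokens:
    """Each patch attends to all patches within its own frame."""
    return [[attend(tok, frame) for tok in frame] for frame in x]


def residual(a: Tokens, b: Tokens) -> Tokens:
    """Element-wise sum of two token grids with identical shape."""
    return [[[ai + bi for ai, bi in zip(va, vb)] for va, vb in zip(fa, fb)]
            for fa, fb in zip(a, b)]


def divided_space_time_block(x: Tokens) -> Tokens:
    """Temporal attention, then spatial attention, each with a residual add."""
    x = residual(x, temporal_attention(x))
    x = residual(x, spatial_attention(x))
    return x


# Usage: T=3 frames, N=2 patches per frame, D=2 features per patch.
frames = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.5, 0.5], [1.0, 1.0]],
          [[0.0, 0.0], [0.2, 0.8]]]
y = divided_space_time_block(frames)
print(len(y), len(y[0]), len(y[0][0]))  # 3 2 2
```

Factorizing attention this way keeps the (T, N, D) token shape while letting every patch gather information both over time and over space, at lower cost than full joint attention over all T·N tokens.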

Framework

Experimental Setup

Our robot system (a) consists of a 7-DOF Franka Emika Panda robot arm with a standard two-finger Panda gripper. We use three square cloths of different sizes, a T-shirt, and a pair of shorts:

(b) a 20 × 20 cm cloth

(c) a 30 × 30 cm cloth

(d) a 35 × 35 cm cloth

(e) a T-shirt (50 × 35 cm, unseen cloth shape during training)

(f) a pair of shorts (40 × 32 cm, unseen cloth shape during training)

Videos of robot executions in the real world

Task: Double Triangle


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (20 × 20 cm, rotate 0°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (35 × 35 cm, rotate 30°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (30 × 30 cm, rotate 45°)

Pick and place actions are visualized as red arrows

Task: Double Straight


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (20 × 20 cm, rotate 0°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (35 × 35 cm, rotate 30°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (30 × 30 cm, rotate 45°)

Pick and place actions are visualized as red arrows

Task: All Corners Inward


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (20 × 20 cm, rotate 0°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (35 × 35 cm, rotate 30°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (30 × 30 cm, rotate 45°)

Pick and place actions are visualized as red arrows

Task: Corners Edges Inward


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (20 × 20 cm, rotate 0°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (35 × 35 cm, rotate 30°)

Pick and place actions are visualized as red arrows


Demonstration Images (30 × 30 cm, rotate 0°)

Actions & Achieved (30 × 30 cm, rotate 45°)

Pick and place actions are visualized as red arrows

Task: T-shirt Folding

The T-shirt is unseen during training; we use only rectangular cloths for training in simulation.


Demonstration Images

Actions & Achieved

Pick and place actions are visualized as red arrows

Task: Shorts Folding


Demonstration Images

Actions & Achieved

Pick and place actions are visualized as red arrows

If you have any questions, please feel free to contact me via mok21@tsinghua.org.cn

Acknowledgement

This work was supported by the National Key R&D Program of China (2022YFB4701400/4701402), the National Natural Science Foundation of China (No. U1813216, U21B6002, 62203260), the Shenzhen Science Fund for Distinguished Young Scholars (RCJC20210706091946001), Guangdong Young Talent with Scientific and Technological Innovation (2019TQ05Z111), and the China Postdoctoral Science Foundation (2022M711823).

If you find this code useful in your research, please consider citing:

@ARTICLE{mo2022foldsformer,
  author={Mo, Kai and Xia, Chongkun and Wang, Xueqian and Deng, Yuhong and Gao, Xuehai and Liang, Bin},
  journal={IEEE Robotics and Automation Letters},
  title={Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention},
  year={2023},
  volume={8},
  number={2},
  pages={760-767},
  doi={10.1109/LRA.2022.3229573}
}