Abstract
Cloth folding is a challenging problem in robot manipulation: robots must fold diverse fabrics into different configurations according to human intentions. Most previous approaches address it in a vision- or language-goal-conditioned way. Because they rely on substantial expert demonstrations for training and on precise subgoals at inference time, they lack inherent multi-step reasoning ability and struggle to generalize to novel cloth appearances and tasks. To tackle these problems, our key insight is to incorporate the common-sense reasoning ability of Large Language Models (LLMs) into cloth manipulation while addressing the limitations of LLMs in manipulating deformable objects, which requires an effective grounding module and a rational planning hierarchy. To this end, we present PolyFold, a novel language-conditioned bimanual cloth folding framework that uses a parameterized polygon model as an effective abstraction and grounding module for cloth representation. Moreover, PolyFold lets the LLM infer an intermediate-level action, specifically the symmetrical fold line, and delegates the pick-and-place calculations to a fold-line-guided downstream policy that is learned through self-supervision on random data. Experiments on 70 cloth folding tasks and 4 cloth types show that PolyFold excels in zero-shot generalization and inherent multi-step reasoning, while also operating in a sample-efficient manner without expert demonstrations, surpassing previous SOTA vision-conditioned and language-conditioned methods. Our method can also be directly deployed in real-world scenarios.
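As geometric intuition for why a symmetrical fold line is a useful intermediate action, the sketch below mirrors a pick point across a fold line to obtain an idealized place point. This is only an illustration under simplifying assumptions (a flat 2D cloth and an exact mirror fold). It is not the learned fold-line-guided policy described above, and the function name `reflect_across_line` is hypothetical.

```python
import numpy as np

def reflect_across_line(point, line_a, line_b):
    """Mirror `point` across the infinite 2D line through `line_a` and `line_b`.

    Hypothetical helper for illustration only; the actual pick-and-place
    computation in PolyFold is done by a learned downstream policy.
    """
    p = np.asarray(point, dtype=float)
    a = np.asarray(line_a, dtype=float)
    b = np.asarray(line_b, dtype=float)
    direction = b - a
    length = np.linalg.norm(direction)
    if length == 0:
        raise ValueError("fold line endpoints must be distinct")
    direction = direction / length
    # Foot of the perpendicular dropped from the point onto the fold line.
    foot = a + np.dot(p - a, direction) * direction
    # The mirrored point lies the same distance on the other side of the line.
    return 2 * foot - p

# Unit square folded along its diagonal from (0, 0) to (1, 1):
# the bottom-right corner is picked and mirrored across the fold line.
pick = (1.0, 0.0)
place = reflect_across_line(pick, (0.0, 0.0), (1.0, 1.0))
```

Under these assumptions, a single fold line determines the place point for any pick point on the cloth, which is one way to see why the fold line is a compact action for an LLM to reason about.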
System Pipeline
Real-World Experiments
Here we show real-world experiments for different types of tasks. Task types are denoted as <Cloth Type>-<Folding Type>-Folding. For the cloth type, 'S', 'R', 'T', and 'P' refer to square, rectangle, t-shirt, and pant cloth, respectively. None of the evaluated cloth objects or tasks have been seen before.
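The naming convention above can be read mechanically. The following is a minimal sketch of a parser for these labels; the function name `parse_task_type` and the `CLOTH_TYPES` table are hypothetical and only mirror the convention stated in the text.

```python
# Cloth type codes as described in the text above.
CLOTH_TYPES = {"S": "square", "R": "rectangle", "T": "t-shirt", "P": "pant"}
SUFFIX = "-Folding"

def parse_task_type(label):
    """Split a label such as 'R-Edge-to-Middle-Folding' into (cloth, folding type).

    Hypothetical helper for illustration; not part of the PolyFold code.
    """
    if not label.endswith(SUFFIX):
        raise ValueError(f"label must end with {SUFFIX!r}: {label!r}")
    # The cloth code is everything before the first hyphen.
    code, _, rest = label.partition("-")
    if code not in CLOTH_TYPES:
        raise ValueError(f"unknown cloth type code: {code!r}")
    # Strip the trailing '-Folding' to leave the folding type.
    folding_type = rest[: -len(SUFFIX)]
    if not folding_type:
        raise ValueError(f"missing folding type: {label!r}")
    return CLOTH_TYPES[code], folding_type

cloth, folding = parse_task_type("S-Corner-Folding")
```

Combined headings such as "T-Sleeve-Folding & T-Half-Folding" would need to be split on " & " before parsing each part.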
S-Corner-Folding
"Fold all corners of the square cloth into the center."
"Bring all corners of the square towards the center."
"Fold both the top right and bottom left corners of the square towards the center."
S-Triangle-Folding
"Fold the square into a shape whose area is one fourth of its original area. The achieved shape is a triangle."
"Converge the top-right corner towards bottom-left corner and then bring the top-left corner down to meet bottom-right corner."
"Bring the bottom-right corner up to meet the top-left corner, and then fold the bottom-left corner up to meet the top-right corner."
R-Edge-to-Middle-Folding
"Fold the bottom edge of the rectangle cloth upward to the horizontal middle line, and then repeat the same for the top edge."
"Bring the top edge of the rectangle cloth downward to the horizontal middle line, and then repeat the same for the bottom edge."
"Position the right edge of the rectangular cloth leftwards, meeting it with the vertical center line, then do the same for left edge."
R-Edge-to-Opposite-Folding
"Fold the rectangular cloth in half from bottom to top and then fold it in half from left to right."
"Take the top edge of the rectangle and fold it towards the bottom edge, then fold the left edge towards the right edge."
*A square cloth is a special type of rectangular cloth, so we also use some square cloths to evaluate the rectangle folding tasks.
"Fold the square so that it remains a square, but with side lengths half of the original."
T-Sleeve-Folding & T-Half-Folding
"Converge the left and right sleeves of t-shirt in half, letting the sleeve edges meet the armpit-shoulder lines."
"Fold the sleeves of the t-shirt inward. However, the sleeves are too long, you cannot fold them inward directly as they will exceed the main body of the garment. "
"Fold the t-shirt in half from left to right."
T-Block-Folding
"Organize this t-shirt by folding it into a rectangular block in three steps. "
"Fold the t-shirt into a neat rectangle. "
"Fold both sleeves of the t-shirt towards the center, followed by folding the t-shirt in half from bottom to top."
P-Half-Folding & P-Block-Folding
"Fold the pant into a rectangular block in two steps. "
"Fold the right leg of the pant in half from bottom to top. Then do the same for the left leg."
*Because the length of the pant legs exceeds the operation space of the ABB robot, we place the pant as shown. Before inputting the image into the algorithm, we rotate it 90 degrees clockwise so that it matches the language description.
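The 90-degree clockwise rotation mentioned in the note above can be sketched with NumPy as follows. This assumes the camera image is available as an array of shape H x W or H x W x C; the function name `rotate_clockwise_90` is hypothetical, and the actual preprocessing code may differ.

```python
import numpy as np

def rotate_clockwise_90(image):
    """Return `image` (H x W or H x W x C) rotated 90 degrees clockwise.

    Hypothetical helper for illustration only.
    """
    # np.rot90 rotates counterclockwise for positive k, so k=-1 is clockwise.
    return np.rot90(image, k=-1, axes=(0, 1))

# Small 2 x 3 stand-in for a camera frame.
frame = np.arange(6).reshape(2, 3)
rotated = rotate_clockwise_90(frame)
```

After the rotation, the height and width of the image are swapped, so any pixel coordinates predicted on the rotated image would need the inverse rotation before being sent to the robot.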
"Fold the pant in half from left to right and then fold the bottom edge of the pant upwards to meet the top."
Appendix: Evaluated Tasks
Figure. The 70 evaluated tasks in the SoftGym simulator.
Appendix: Polygon Model Fitting
Figure. Polygon model fitting process for flattened and folded cloth, consisting of an initialization stage, orientation optimization, symmetry optimization, and, optionally, asymmetry optimization.
Appendix: Detailed Quantitative and Qualitative Analysis
Note that because the website has limited space for displaying images, we recommend that readers download the images from our Google Drive or refer to the PDF of our paper for better visualization.
Figure. Evaluation results of our proposed PolyFold and baselines on single-step/multi-step tasks conducted in the unseen cloth + unseen task setting.
Figure. More detailed simulation results for PolyFold and four baselines, evaluated in the unseen cloth + unseen task setting. The final goal (for our method) and the subgoals (for the baselines) are shown in the left column of each task. The orange arrow represents the pick-and-place action of one robot arm, and the green arrow represents the symmetrical fold line, which is the intermediate action representation inferred by the LLM in PolyFold.
Acknowledgements
We would like to thank the authors of previous work on deformable object manipulation for their inspiration and open-source projects, including A Geometric Approach to Robotic Laundry Folding, Foldsformer, Language-Deformable, Flingbot, and ClothFunnels.