Zhejiang University
Cloth folding is a challenging problem in robot manipulation: robots must fold diverse fabrics into different configurations according to human intentions. Previous work in this area falls into three main categories: imitation learning, reinforcement learning, and geometric model-based planning. While each paradigm has its merits, these methods generally lack inherent multi-step reasoning ability and struggle to generalize to novel cloth appearances and tasks. To tackle these problems, our key insight is to bring the commonsense reasoning and generalization abilities of Large Language Models (LLMs) into cloth manipulation, while addressing the limitations of LLMs in manipulating deformable objects through an effective grounding module and a rational planning hierarchy. To this end, we present PolyFold, a novel language-conditioned bimanual cloth folding framework that uses a parameterized polygon model as an effective abstraction and grounding module for cloth representation. Moreover, PolyFold enables LLMs to infer an intermediate-level action, namely the symmetrical fold line, while delegating the pick-and-place computation to a fold-line-guided downstream policy that is learned through self-supervision on random data. Experiments on 70 cloth folding tasks and 4 cloth types show that PolyFold excels in zero-shot generalization and inherent multi-step reasoning, while operating in a sample-efficient, expert-demonstration-free manner and surpassing previous state-of-the-art (SOTA) vision-conditioned and language-conditioned methods. Our method can also be deployed directly in real-world scenarios.
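To make the fold-line abstraction concrete, here is a minimal sketch of the underlying geometry, assuming the cloth is represented by the 2D vertices of a polygon and a fold line is given by two distinct points `a` and `b`. The function names and the rule "vertices to the right of the directed line `a -> b` move" are hypothetical choices for illustration. In particular, `naive_pick_place` is a simple geometric heuristic (pick a vertex, place it at its mirror image). It is not PolyFold's learned fold-line-guided policy.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross_side(p: Point, a: Point, b: Point) -> float:
    """Signed cross product (b - a) x (p - a).

    Negative: p lies to the right of the directed fold line a -> b.
    Positive: p lies to the left. Zero: p lies on the line.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def reflect_point(p: Point, a: Point, b: Point) -> Point:
    """Mirror p across the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm2 = dx * dx + dy * dy
    if norm2 == 0:
        raise ValueError("fold line endpoints must be distinct")
    # Project p onto the line to get the foot of the perpendicular.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / norm2
    foot = (a[0] + t * dx, a[1] + t * dy)
    # The mirror image lies the same distance beyond the foot.
    return (2 * foot[0] - p[0], 2 * foot[1] - p[1])


def naive_pick_place(vertices: List[Point], a: Point, b: Point) -> List[Tuple[Point, Point]]:
    """Hypothetical heuristic: for each polygon vertex on the moving side
    (right of a -> b), pick it and place it at its reflection across the line.
    Vertices on the line or on the left side stay put."""
    return [(v, reflect_point(v, a, b)) for v in vertices if cross_side(v, a, b) < 0]


# Unit square folded in half from bottom to top along the horizontal middle line.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(naive_pick_place(square, (0.0, 0.5), (1.0, 0.5)))
```

In this example, the two bottom corners are picked and placed onto the top corners, which matches the intent of a bottom-to-top half fold. A fold line expressed this way is symmetric by construction: every moved vertex lands at its mirror image across the line.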
Here we show real-world experiments on several types of tasks. Task types are denoted <Cloth Type>-<Folding Type>-Folding, where the cloth type codes 'S', 'R', 'T', and 'P' refer to square cloth, rectangular cloth, t-shirts, and pants, respectively. None of the evaluated cloth objects or tasks were seen before evaluation.
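The task-type naming scheme can be parsed mechanically. The sketch below assumes that a name always starts with a one-letter cloth code and ends with the literal suffix `Folding`, and that everything in between (possibly hyphenated, as in `Edge-to-Middle`) is the folding type. The function name `parse_task_type` is hypothetical.

```python
from typing import Dict, Tuple

# Cloth type codes as listed in the text above.
CLOTH_TYPES: Dict[str, str] = {"S": "square", "R": "rectangle", "T": "t-shirt", "P": "pant"}


def parse_task_type(name: str) -> Tuple[str, str]:
    """Split '<Cloth Type>-<Folding Type>-Folding' into (cloth type, folding type)."""
    parts = name.split("-")
    if len(parts) < 3 or parts[-1] != "Folding":
        raise ValueError(f"expected <Cloth Type>-<Folding Type>-Folding, got {name!r}")
    code = parts[0]
    if code not in CLOTH_TYPES:
        raise ValueError(f"unknown cloth type code {code!r}")
    # The folding type may itself contain hyphens, e.g. 'Edge-to-Middle'.
    return CLOTH_TYPES[code], "-".join(parts[1:-1])


print(parse_task_type("R-Edge-to-Middle-Folding"))
```

Joining the middle tokens back with hyphens keeps multi-word folding types such as `Edge-to-Opposite` intact.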
S-Corner-Folding
"Fold all corners of the square cloth into the center."
"Bring all corners of the square towards the center."
"Fold both the top right and bottom left corners of the square towards the center."
S-Triangle-Folding
"Fold the square into a shape whose area is one fourth of its original area. The achieved shape is a triangle."
"Converge the top-right corner towards bottom-left corner and then bring the top-left corner down to meet bottom-right corner."
"Bring the bottom-right corner up to meet the top-left corner, and then fold the bottom-left corner up to meet the top-right corner."
R-Edge-to-Middle-Folding
"Fold the bottom edge of the rectangle cloth upward to the horizontal middle line, and then repeat the same for the top edge."
"Bring the top edge of the rectangle cloth downward to the horizontal middle line, and then repeat the same for the bottom edge."
"Position the right edge of the rectangular cloth leftwards, meeting it with the vertical center line, then do the same for left edge."
R-Edge-to-Opposite-Folding
"Fold the rectangular cloth in half from bottom to top and then fold it in half from left to right."
"Take the top edge of the rectangle and fold it towards the bottom edge, then fold the left edge towards the right edge."
*Square cloth is a special type of rectangular cloth, so we also use some square cloths to evaluate rectangle folding tasks.
"Fold the square so that it remains a square, but with side lengths half of the original."
T-Sleeve-Folding & T-Half-Folding
"Converge the left and right sleeves of t-shirt in half, letting the sleeve edges meet the armpit-shoulder lines."
"Fold the sleeves of the t-shirt inward. However, the sleeves are too long, you cannot fold them inward directly as they will exceed the main body of the garment. "
"Fold the t-shirt in half from left to right."
T-Block-Folding
"Organize this t-shirt by folding it into a rectangular block in three steps. "
"Fold the t-shirt into a neat rectangle. "
"Fold both sleeves of the t-shirt towards the center, followed by folding the t-shirt in half from bottom to top."
P-Half-Folding & P-Block-Folding
"Fold the pant into a rectangular block in two steps. "
"Fold the right leg of the pant in half from bottom to top. Then do the same for the left leg."
*Because the pant legs are longer than the operating space of the ABB robot, we lay the pants out as shown. Before feeding the image to the algorithm, we rotate it 90 degrees clockwise so that its orientation matches the language description.
"Fold the pant in half from left to right and then fold the bottom edge of the pant upwards to meet the top."
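The 90-degree clockwise rotation mentioned in the pants note can be sketched without any imaging library by treating an image as an H x W grid of pixel values. A second helper maps a pixel chosen in the rotated image back to the original image. This is useful if, as assumed here (the note does not say), downstream pick or place points predicted on the rotated image must be converted back to the camera frame. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rotate_cw(image: Sequence[Sequence[T]]) -> List[List[T]]:
    """Rotate an H x W grid 90 degrees clockwise; the result is W x H.

    Reversing the row order and then transposing is equivalent to a
    clockwise quarter turn.
    """
    return [list(row) for row in zip(*image[::-1])]


def rotated_to_original(r: int, c: int, height: int) -> Tuple[int, int]:
    """Map pixel (r, c) of the clockwise-rotated image back to (row, col)
    in the original image of the given height."""
    return height - 1 - c, r


img = [[1, 2, 3],
       [4, 5, 6]]          # 2 x 3 image
rot = rotate_cw(img)       # 3 x 2 image
print(rot)
```

In practice, an image library's built-in rotation would do the same job. The explicit index mapping shows which original pixel each rotated pixel comes from.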
Figure. The 70 evaluated tasks in the SoftGym simulator.