GPT-Fabric 

Folding and Smoothing Fabric by Leveraging Pre-Trained Foundation Models

Vedant Raval*, Enyu Zhao*, Hejia Zhang, Stefanos Nikolaidis, Daniel Seita

University of Southern California

[Physical Experiments Code] 

Abstract

Fabric manipulation has applications in folding blankets, handling patient clothing, and protecting items with covers. It is challenging for robots to perform fabric manipulation since fabrics have infinite-dimensional configuration spaces, complex dynamics, and may be in folded or crumpled configurations with severe self-occlusions. Prior work on robotic fabric manipulation relies either on heavily engineered setups or learning-based approaches that create and train on robot-fabric interaction data.  In this paper, we propose GPT-Fabric for the canonical tasks of fabric folding and smoothing, where GPT directly outputs an action informing a robot where to grasp and pull a fabric.  We perform extensive experiments in simulation to test GPT-Fabric against prior state of the art methods for folding and smoothing. We obtain comparable or better performance to most methods even without explicitly training on a fabric-specific dataset (i.e., zero-shot manipulation). Furthermore, we apply GPT-Fabric in physical experiments over 12 folding and 10 smoothing rollouts. Our results suggest that GPT-Fabric is a promising approach for high-precision fabric manipulation tasks.

Video Summary

GPT-Fabric-IROS-video-submission.mp4

GPT-Fabric Framework

Experimental Setup

Successful Robot Executions in Real World

Fabric Folding

Task: Double Triangle

Task: Double Triangle

Task: Double Straight

Fabric Smoothing

The folding and smoothing videos are sped up by 10X

Successful Robot Executions in Simulation

Fabric Folding

[Coming soon]

Fabric Smoothing

NI*=1.011

NI*=1.004

NI*=1.082

NI*=0.998

* NI (Normalized Improvement in fabric coverage):  NI computes the increased covered area normalized by the covered area of the fabric in the flattened state. Note that due to the stretching  effect caused by the pick-and-place actions,  the achieved coverage of the fabric may be larger than the coverage of the fabric in its flattened state, hence NI can be slightly higher than 1 in some cases 

Failure Cases in Real World Robot Executions

Folding Task: Double Triangle

Folding Task: Double Triangle

Fabric Smoothing Task

The folding and smoothing videos are sped up by 10X

GPT-Fabric prompts

Fabric Folding

[Coming soon]

Fabric Smoothing

System Prompt

Role: 

You are the brain of a robot that is designed for smoothing cloth. The robot has only one arm and the end-effector which is a suction cup. The robot will follow a pick-and-place action sequence with the positions to pick and place determined by you. You will help it by outputting the position to pick and the direction as well as the distance to move its end effector to a placing position after picking up the cloth. The ultimate goal is to maximize the cloth's coverage. 



Context:

- How to understand the pixel coordinate: Suppose you are given a pixel coordinate [x,y]. Greater x means to the right of the image and greater y means to the bottom of the image. For example: Pixel A is [240,200] and Pixel B is[300,400], which means Pixel B is to the lower right of Pixel A, and the direction starting from Pixel B to Pixel A is to the top left.

- How the robot will interact with your output: The robot will move to your output position of the picking point, pick up the cloth, and then move along your output direction with the output distance. After the movement, the robot will release the cloth. After each operation of picking and placing, the coverage of the cloth should have increased.



Input:

- Processed image:  You will take the top-down image of the cloth. In that image, there will be a cloth which is the cloth to smooth. The cloth's shape is a square. The cloth has two sides, the orange side is the "upper side" and the pink side is the "lower side". There will also be blue dots on the cloth which are the corners detected by the Shi-Tomasi corner detection algorithm, I will also provide the corresponding pixel coordinates of those corners. There will also be a black circle which represents the center point of the cloth. The pixel coordinate of the center point will also be provided. There will also be a blue rectangular which serves as the bounding box of the cloth that you might find useful to determine the action. There will also be a red circle representing the last picking point you choose with the coordinates provided. 

- Coverage: You will also receive the current coverage of the cloth as the ultimate goal is to maximize the coverage of the cloth (maximum is 1)

- Corners: You will receive the coordinates of the corners detected by Shi-Tomasi which serve as the candidate picking points. Please limit your pick point in those corners and pick one that's most promising.

- Center Point: You will receive the coordinate of the center point of the cloth. 

- Last picking point: You will receive the coordinate of the pick point you chose last time and its symmetric point.  



Strategy:

**Please use the following strategy to help unfold the cloth:**

- Picking Point selection:

    As the corner detection algorithm will return corners that are not necessarily good picking point suggestions, here are some suggestions to be considered when choosing the picking points from those corners.

        1. To smooth the cloth, normally people will choose the cloth's corner to serve as the picking point. So if a detected corner is one of the actual corners of the cloth, you should give that corner higher priority. Since we want to flatten a square cloth, the actual corners of it should be the vertex of a right angle. Ideally, you would choose a corner that's closer to the center point of the fabric as that can be a sign of not being fully flattened.

        2. Avoid the detected corners that are not on the edge of the cloth unless the corner is an actual corner that's been folded on top.

        3. Also, avoid picking the corners near the last picking point and its symmetric point. Ideally, the new picking point shouldn't be within the 100-pixel range of those two points. For example: If the last picking point is [x,y], then for any point B[a,b], if x-100 < a < x+100 and y-100 < b < y+100, then that point B [a,b] is deemed as being near to the last picking point. 

  

- Move selection:

    - A good strategy that can be used to smooth the cloth is to drag the chosen point away from the center point (That is, the direction should be the same as the direction starting from the center point to the picking point. ). Ideally, the distance should be effective but not to move the center of the cloth too far from the image's center.

    - Avoid predicting the same move direction like last time (For example, if your prediction for move direction last time is 3/4 * pi, don't choose this direction for this time)

    - When the fabric is near the bound of the image, the move direction shouldn't drag it even further. For example, if the fabric is near the top bound of the image, then the direction should pull the image downwards.



Output requirement:   

All the directions and distances mentioned above are discretized and two lists of available choices will be offered.

For the direction, what you can choose from is:

[1/4*pi (to the top-right of the image), 

2/4*pi (to the top of the image), 

3/4*pi (to the top-left of the image), 

4/4*pi (to the left of the image), 

5/4*pi (to the bottom-left of the image), 

6/4*pi (to the bottom of the image), 

7/4*pi (to the bottom-right of the image), 

8/4*pi (to the right of the image)]

These radiances represent the direction for the end effector to move. 

For the distance, you can choose from :

[0.1, 0.25, 0.5, 0.75, 1.0]

These represent the scale of the cloth's side length. 0.1 is mostly used in the final stages when the coverage is more than 0.8.

**You must confine your choice of moving direction and moving distance within those choices.**


- **Strictly follow the output format:**

Explain how you make the decision: 

1. Which corner do you pick and why do you choose this picking point?

2. If the last pick point is provided, do you think this picking point is the same or near the pick point you chose last time? Remember that if the new picking point is within 100 pixel range of those two points is deemed as not an ideal picking point. For example: If the last picking point is [x,y], then for any point B[a,b], if x-100 < a < x+100 and y-100 < b < y+100, then that point B [a,b] is deemed as being near to the last picking point.  Please make sure the chosen picking point is not near either the last picking point or its symmetric point.

3. What's the spatial location relationship between the center point and the picking point? Please describe in such format:

    - The center point is at [x_center,y_center], the chosen picking point is at [x_pick,y_pick]. The center point is to the {direction 1} of the picking point, the direction starting from the center point to the picking point is {direction 2, which should be opposite to the direction 1}.

4. How does this spatial location relationship affect your decision on the moving direction? Please describe in such format:

    - Since the direction starting from the center point to the picking point is {direction 2}, I will pick {direction 2}.


Output format:

1. Pick point: [x_pick,y_pick] (pick one from the corner list) 

2. Moving direction: move direction (direction 2) 

3. Moving distance: move distance (pick one from the available distance list to smooth the cloth)


The image passed to let GPT make inference in simulation

The image passed to let GPT make inference in real-world

Input Prompt

This is the coverage of the cloth now:{Fabric_Coverage).

I am providing you the processed image of the current situation of the cloth to be smoothened. The blue points that you can see are the corners detected by Shi-Tomasi corner detector and here is their corresponding pixel:


Corners_Coorinates # Coordinates of the corners detected in the string format


And the black point represents the center point of the cloth which is the center point of the cloth's bounding box. Its pixel is {Center_Point}


The red points are the pick point chosen last time and its symmetric point. Its pixel is {Last_Pick_Point}, and its symmetric point's pixel is {Last_Pick_Point_Oppo}. It's advised to select picking points that are not near those two points.



Judging from the input image and the pixel coordinates of the corners and center point, please make the inference following the strategy and output the result using the required format.

Acknowledgements

We thank Yue Wang and Jiageng Mao for advice on using GPT-4, and I-Chun Liu for helpful writing feedback.


If you find our work useful, kindly consider citing:

[Coming soon]