Smoothing and Folding Fabric by Leveraging Pre-Trained Foundation Models

Vedant Raval*, Enyu Zhao*, Hejia Zhang, Stefanos Nikolaidis, Daniel Seita

University of Southern California

International Symposium of Robotics Research (ISRR) 2024

[GPT-Fabric Folding Code] [GPT-Fabric Smoothing Code] [Physical Experiments Code]

Abstract

Fabric manipulation has applications in folding blankets, handling patient clothing, and protecting items with covers. It is challenging for robots because fabrics have infinite-dimensional configuration spaces and complex dynamics, and may be in folded or crumpled configurations with severe self-occlusions. Prior work on robotic fabric manipulation relies either on heavily engineered setups or on learning-based approaches that create and train on robot-fabric interaction data. In this paper, we propose GPT-Fabric for the canonical tasks of fabric folding and smoothing, where GPT directly outputs an action informing a robot where to grasp and pull a fabric. We perform extensive experiments in simulation to test GPT-Fabric against prior state-of-the-art methods for folding and smoothing. We obtain performance comparable to or better than most methods, even without explicitly training on a fabric-specific dataset (i.e., zero-shot manipulation). Furthermore, we apply GPT-Fabric in physical experiments over 10 smoothing and 12 folding rollouts. Our results suggest that GPT-Fabric is a promising approach for high-precision fabric manipulation tasks.

Video Summary

GPT-Fabric-IROS-video-submission.mp4

GPT-Fabric Framework

Experimental Setup

We set up a workstation with a piece of 1-inch-thick foam measuring 60 cm by 105 cm.
We mount an Intel RealSense D435i RGB-D camera at a height of about 1 meter.
We use a 6-DOF Kinova Jaco robot manipulator with a KG-3 gripper, where we remove one of the fingers to make it similar to a standard parallel-jaw gripper as used in prior work on fabric manipulation.
For fabrics, we use a square 30.5 cm by 30.5 cm grey dish cloth.
Each action involves the robot going to a “pre-pick” pose 5 mm above the target, then lowering and grasping. Then, the robot lifts by 5 mm, moves to the “pre-place” pose, then lowers and releases its grip.
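
The pick-and-place motion described above can be sketched as a short list of waypoints. This is only an illustrative sketch, not the code used in the physical experiments: the names `Waypoint` and `pick_and_place_waypoints` are hypothetical, coordinates are assumed to be in meters in the robot base frame, and the "pre-place" pose is assumed to sit at the same 5 mm clearance above the place target (the text does not state its height).

```python
from typing import List, NamedTuple, Tuple

CLEARANCE_M = 0.005  # 5 mm clearance, as stated in the setup description


class Waypoint(NamedTuple):
    name: str
    xyz: Tuple[float, float, float]
    gripper_closed: bool  # gripper state after reaching this waypoint


def pick_and_place_waypoints(
    pick: Tuple[float, float, float],
    place: Tuple[float, float, float],
    clearance: float = CLEARANCE_M,
) -> List[Waypoint]:
    """Return the ordered waypoints for one pick-and-place action."""
    px, py, pz = pick
    qx, qy, qz = place
    return [
        Waypoint("pre-pick", (px, py, pz + clearance), False),
        Waypoint("pick", (px, py, pz), True),    # lower, then grasp
        Waypoint("lift", (px, py, pz + clearance), True),
        # Assumption: pre-place uses the same clearance as the lift.
        Waypoint("pre-place", (qx, qy, qz + clearance), True),
        Waypoint("place", (qx, qy, qz), False),  # lower, then release
    ]


if __name__ == "__main__":
    for wp in pick_and_place_waypoints((0.10, 0.20, 0.0), (0.30, 0.40, 0.0)):
        print(wp)
```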

Successful Robot Executions in Real World

Fabric Smoothing

Fabric Folding

Task: Double Triangle

Task: Double Straight

The smoothing and folding videos are sped up by 10X

Successful Robot Executions in Simulation

Fabric Smoothing

NI*=1.011

NI*=1.004

NI*=1.082

NI*=0.998

Fabric Folding

Task: Double Triangle

Task: All Corners Inward

Task: Double Straight

* NI (Normalized Improvement in fabric coverage): NI computes the increased covered area normalized by the covered area of the fabric in the flattened state. Because the pick-and-place actions can stretch the fabric, the achieved coverage may exceed the coverage of the fabric in its flattened state, so NI can be slightly higher than 1 in some cases.
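
The page describes NI only in words. The sketch below assumes the formulation common in prior fabric-smoothing work, where the coverage gain is divided by the maximum possible gain (flattened coverage minus initial coverage). That denominator is an assumption on our part and is not stated on this page; the function name `normalized_improvement` is hypothetical.

```python
def normalized_improvement(initial_cov: float, final_cov: float, flat_cov: float) -> float:
    """NI under the assumed definition: achieved gain / maximum possible gain.

    All three arguments are covered areas in the same unit (e.g. pixels or m^2).
    """
    max_gain = flat_cov - initial_cov
    if max_gain <= 0:
        raise ValueError("initial coverage must be below the flattened coverage")
    return (final_cov - initial_cov) / max_gain


if __name__ == "__main__":
    # A slightly stretched fabric covers more area than its flattened state,
    # which pushes NI just above 1.
    print(normalized_improvement(initial_cov=0.40, final_cov=1.006, flat_cov=1.0))
```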

Failure Cases in Real World Robot Executions

Fabric Smoothing Task

Folding Task: Double Triangle

The smoothing and folding videos are sped up by 10X

GPT-Fabric prompts

Fabric Smoothing

System Prompt

Role:

You are the brain of a robot that is designed for smoothing cloth. The robot has only one arm and the end-effector which is a suction cup. The robot will follow a pick-and-place action sequence with the positions to pick and place determined by you. You will help it by outputting the position to pick and the direction as well as the distance to move its end effector to a placing position after picking up the cloth. The ultimate goal is to maximize the cloth's coverage.

Context:

- How to understand the pixel coordinate: Suppose you are given a pixel coordinate [x,y]. Greater x means to the right of the image and greater y means to the bottom of the image. For example: Pixel A is [240,200] and Pixel B is[300,400], which means Pixel B is to the lower right of Pixel A, and the direction starting from Pixel B to Pixel A is to the top left.

- How the robot will interact with your output: The robot will move to your output position of the picking point, pick up the cloth, and then move along your output direction with the output distance. After the movement, the robot will release the cloth. After each operation of picking and placing, the coverage of the cloth should have increased.

Input:

- Processed image: You will take the top-down image of the cloth. In that image, there will be a cloth which is the cloth to smooth. The cloth's shape is a square. The cloth has two sides, the orange side is the "upper side" and the pink side is the "lower side". There will also be blue dots on the cloth which are the corners detected by the Shi-Tomasi corner detection algorithm, I will also provide the corresponding pixel coordinates of those corners. There will also be a black circle which represents the center point of the cloth. The pixel coordinate of the center point will also be provided. There will also be a blue rectangular which serves as the bounding box of the cloth that you might find useful to determine the action. There will also be a red circle representing the last picking point you choose with the coordinates provided.

- Coverage: You will also receive the current coverage of the cloth as the ultimate goal is to maximize the coverage of the cloth (maximum is 1)

- Corners: You will receive the coordinates of the corners detected by Shi-Tomasi which serve as the candidate picking points. Please limit your pick point in those corners and pick one that's most promising.

- Center Point: You will receive the coordinate of the center point of the cloth.

- Last picking point: You will receive the coordinate of the pick point you chose last time and its symmetric point.

Strategy:

**Please use the following strategy to help unfold the cloth:**

- Picking Point selection:

As the corner detection algorithm will return corners that are not necessarily good picking point suggestions, here are some suggestions to be considered when choosing the picking points from those corners.

1. To smooth the cloth, normally people will choose the cloth's corner to serve as the picking point. So if a detected corner is one of the actual corners of the cloth, you should give that corner higher priority. Since we want to flatten a square cloth, the actual corners of it should be the vertex of a right angle. Ideally, you would choose a corner that's closer to the center point of the fabric as that can be a sign of not being fully flattened.

2. Avoid the detected corners that are not on the edge of the cloth unless the corner is an actual corner that's been folded on top.

3. Also, avoid picking the corners near the last picking point and its symmetric point. Ideally, the new picking point shouldn't be within the 100-pixel range of those two points. For example: If the last picking point is [x,y], then for any point B[a,b], if x-100 < a < x+100 and y-100 < b < y+100, then that point B [a,b] is deemed as being near to the last picking point.

- Move selection:

- A good strategy that can be used to smooth the cloth is to drag the chosen point away from the center point (That is, the direction should be the same as the direction starting from the center point to the picking point. ). Ideally, the distance should be effective but not to move the center of the cloth too far from the image's center.

- Avoid predicting the same move direction like last time (For example, if your prediction for move direction last time is 3/4 * pi, don't choose this direction for this time)

- When the fabric is near the bound of the image, the move direction shouldn't drag it even further. For example, if the fabric is near the top bound of the image, then the direction should pull the image downwards.

Output requirement:

All the directions and distances mentioned above are discretized and two lists of available choices will be offered.

For the direction, what you can choose from is:

[1/4*pi (to the top-right of the image),

2/4*pi (to the top of the image),

3/4*pi (to the top-left of the image),

4/4*pi (to the left of the image),

5/4*pi (to the bottom-left of the image),

6/4*pi (to the bottom of the image),

7/4*pi (to the bottom-right of the image),

8/4*pi (to the right of the image)]

These radiances represent the direction for the end effector to move.

For the distance, you can choose from :

[0.1, 0.25, 0.5, 0.75, 1.0]

These represent the scale of the cloth's side length. 0.1 is mostly used in the final stages when the coverage is more than 0.8.

**You must confine your choice of moving direction and moving distance within those choices.**

- **Strictly follow the output format:**

Explain how you make the decision:

1. Which corner do you pick and why do you choose this picking point?

2. If the last pick point is provided, do you think this picking point is the same or near the pick point you chose last time? Remember that if the new picking point is within 100 pixel range of those two points is deemed as not an ideal picking point. For example: If the last picking point is [x,y], then for any point B[a,b], if x-100 < a < x+100 and y-100 < b < y+100, then that point B [a,b] is deemed as being near to the last picking point. Please make sure the chosen picking point is not near either the last picking point or its symmetric point.

3. What's the spatial location relationship between the center point and the picking point? Please describe in such format:

- The center point is at [x_center,y_center], the chosen picking point is at [x_pick,y_pick]. The center point is to the {direction 1} of the picking point, the direction starting from the center point to the picking point is {direction 2, which should be opposite to the direction 1}.

4. How does this spatial location relationship affect your decision on the moving direction? Please describe in such format:

- Since the direction starting from the center point to the picking point is {direction 2}, I will pick {direction 2}.

Output format:

1. Pick point: [x_pick,y_pick] (pick one from the corner list)

2. Moving direction: move direction (direction 2)

3. Moving distance: move distance (pick one from the available distance list to smooth the cloth)

The image passed to let GPT make inference in simulation

The image passed to let GPT make inference in real-world
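
The system prompt asks GPT to drag the picked corner away from the cloth center, using one of eight discretized directions (k/4*pi for k = 1..8) and one of five distance scales. A minimal sketch of that geometry follows. It assumes image coordinates where x grows to the right and y grows downward, as the prompt describes. The helper names and the `side_px` parameter (the cloth side length in pixels) are hypothetical and not taken from the released code.

```python
import math
from typing import Tuple

Pixel = Tuple[float, float]

DIRECTION_NAMES = {
    1: "top-right", 2: "top", 3: "top-left", 4: "left",
    5: "bottom-left", 6: "bottom", 7: "bottom-right", 8: "right",
}
DISTANCE_SCALES = (0.1, 0.25, 0.5, 0.75, 1.0)


def outward_direction_index(center: Pixel, pick: Pixel) -> int:
    """Snap the center-to-pick direction to the nearest k in 1..8 (angle = k/4*pi)."""
    dx = pick[0] - center[0]
    dy = center[1] - pick[1]  # flip y so that "up" in the image is positive
    if dx == 0 and dy == 0:
        raise ValueError("pick point coincides with the center point")
    angle = math.atan2(dy, dx) % (2 * math.pi)
    k = round(angle / (math.pi / 4)) % 8
    return 8 if k == 0 else k


def place_pixel(pick: Pixel, k: int, distance_scale: float, side_px: float) -> Pixel:
    """Estimated place pixel after moving distance_scale * side_px along direction k."""
    if k not in DIRECTION_NAMES or distance_scale not in DISTANCE_SCALES:
        raise ValueError("direction and distance must come from the allowed lists")
    theta = k * math.pi / 4
    step = distance_scale * side_px
    return (pick[0] + step * math.cos(theta), pick[1] - step * math.sin(theta))


if __name__ == "__main__":
    k = outward_direction_index(center=(200, 200), pick=(300, 100))
    print(k, DIRECTION_NAMES[k], place_pixel((300, 100), k, 0.5, side_px=200))
```

Snapping to the nearest multiple of pi/4 is one simple way to map a continuous direction onto the eight choices. The page does not specify how its own direction check defines the "acceptable range".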

User Prompt

This is the coverage of the cloth now:{Fabric_Coverage}.

I am providing you the processed image of the current situation of the cloth to be smoothened. The blue points that you can see are the corners detected by Shi-Tomasi corner detector and here is their corresponding pixel:

Corners_Coorinates # Coordinates of the corners detected in the string format

And the black point represents the center point of the cloth which is the center point of the cloth's bounding box. Its pixel is {Center_Point}

The red points are the pick point chosen last time and its symmetric point. Its pixel is {Last_Pick_Point}, and its symmetric point's pixel is {Last_Pick_Point_Oppo}. It's advised to select picking points that are not near those two points.

Judging from the input image and the pixel coordinates of the corners and center point, please make the inference following the strategy and output the result using the required format.

Evaluation Prompt

I am providing you the visualization result of your predicted pick-and-place action. In the image, you can see a green circle which is your predicted picking point and a green arrow which points to your predicted move direction and a purple circle at the end of that arrow denoting the estimated placing point.

By calculation, the chosen picking point is not near the last picking point or its symmetric point, you can stick with this picking point. (This sentence will be provided to GPT if the result passes proximity check)

By calculation, the chosen picking point is near the last picking point's symmetric point. The chosen picking point is [{pick_pixel[0]},{pick_pixel[1]}] and the last picking point's symmetric point is [{last_pick_point_oppo[0]},{last_pick_point_oppo[1]}] so the pick point is within 100-pixel range of that point, please choose another point to pick. (This sentence will be provided to GPT if the chosen picking point is near last picking point's symmetric point.)

By calculation, the chosen picking point is near the last picking point. The chosen picking point is [{pick_pixel[0]},{pick_pixel[1]}] and the last picking point is [{last_pick_point[0]},{last_pick_point[1]}] so the pick point is within the 100-pixel range of that point, please choose another point to pick. (This sentence will be provided to GPT if the chosen picking point is near the last picking point.)

By calculating the pick point you choose and the center point, the direction starting from the center point to the picking point is roughly "+str_direction+". The direction you predicted falls in the acceptable range. (This sentence will be provided to GPT if the result passes the proximity check and direction check)

The picking point is an acceptable choice as it's not near to the last picking point or its symmetric point. But by calculating the pick point you choose and the center point, the direction starting from the center point to the picking point is roughly "+str_direction+". The direction you predicted doesn't fall in the acceptable range. Please use "+str_direction+" as the moving direction if you want to pick the same picking point. (This sentence will be provided to GPT if the result passes the proximity check and fails to pass the direction check)

The picking point is not an acceptable choice as it's near to the last picking point or its symmetric point. The predicted moving direction is also incorrect. (This sentence will be provided to GPT if the result fails both the proximity check and the direction check)

Based on the assistance of the previous calculation, do you think your predicted move will help flatten the fabric? If so, you can repeat your answer. If you don't think this move will help flatten the fabric, you should give a new prediction following the same output format.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The evaluation prompt (the correction message) contains two result-explanation sections: one for the proximity check result, and one for the combined direction check and proximity check result.

From each of these two sections, only one sentence is sent to GPT, selected according to the outcome of the proximity check and the direction check.
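
The proximity check that selects among these sentences can be sketched directly from the rule in the system prompt: a point [a,b] is near [x,y] when x-100 < a < x+100 and y-100 < b < y+100. The function names and the returned labels below are hypothetical, and checking the last picking point before its symmetric point is an arbitrary ordering chosen for this sketch.

```python
from typing import Optional, Tuple

Pixel = Tuple[float, float]


def is_near(point: Pixel, ref: Pixel, radius: float = 100) -> bool:
    """True when point lies strictly inside the square of half-width radius around ref."""
    return abs(point[0] - ref[0]) < radius and abs(point[1] - ref[1]) < radius


def proximity_feedback(
    pick: Pixel,
    last_pick: Optional[Pixel],
    last_pick_oppo: Optional[Pixel],
) -> str:
    """Label telling which proximity sentence would apply to the chosen pick point."""
    if last_pick is not None and is_near(pick, last_pick):
        return "near_last_pick"
    if last_pick_oppo is not None and is_near(pick, last_pick_oppo):
        return "near_symmetric_point"
    return "ok"


if __name__ == "__main__":
    print(proximity_feedback((250, 250), (200, 200), (400, 400)))  # near_last_pick
    print(proximity_feedback((50, 50), (200, 200), (400, 400)))    # ok
```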

Fabric Folding

System Prompt to GPT-4V for analyzing sub-goal images

I will be providing you with two images. In each image, you will see a background that's divided into four quadrants with alternating shades of gray. Think of this background as a flat surface on which a cloth is kept. This cloth could be seen in the centre of these images as a geometric shape coloured with orange and pink.

There is also a black arrow in the first image, which essentially represents an action where someone would pick a point on the cloth corresponding to the black dot from where the arrow originates. This would be represented as the picking point. On the other hand, the point where the tip of the black arrow is located corresponds to the location where the chosen picking point is placed. This is referred to as the placing point.

This sequence of action of picking a point on the cloth and place it somewhere results in a fold, whose result can be seen in the next image. So basically we are folding the cloth in the first image to get to the second image. I want you to describe the instructions for the folding step that someone could follow to achieve the same fold.

Look at the relative location of the tip of the arrow with respect to the center of the image. Depending on whether this is near the center or a diagonally opposite point or a point along the given edge, choose your placing point as the center or a diagonally opposite point or a point along the given edge respectively.

IMPORTANT: INLCUDE THE PICKING AND PLACING POINT INFORMATION IN THE RESPONSE. YOU MUST SPECIFY WHERE SHOULD THE PLACING POINT BE.

RETURN YOUR OUTPUT IN THE BELOW FORMAT ONLY:

- Instructions: The instructions for the given folding step.

- Explanation: Why did you choose this pair of picking and placing points.

Note that the above prompt was divided into the following five major components on which prompt-ablation experiments were carried out.

1. A brief explanation of the visual context represented by the subgoal images

2. A description of pick and place actions facilitated by the addition of arrows in sub-goal images.

3. A reasoning logic to determine possible place location relative to the cloth center.

4. An explicit requirement to generate folding instructions in terms of the pick and place point

5. An explicit output format specification

The short portion of the prompt without any highlighted text is the minimum needed to guide GPT-4V toward producing a folding instruction, and it is not counted as a core component of the above prompt. See the supplementary material for more details on the ablation studies. No separate user prompt is needed here.

System Prompt to GPT-4/GPT-3.5 for getting executable action

Role:

You are the brain of a cloth folding robot. The robot would pick one spot on the cloth (referred to as the "pick point"), lift it by a small amount, drag it over to another spot (referred to as "the place point"), and finally release it.

Inputs:

- Method of folding: A description of how the cloth should be folded

- Cloth corners: The robot sees the cloth lying on a table from the top and gets a depth image for it. This depth image is then processed to extract the corner points for the cloth. The pixel co-ordinates for the corners will be given to you as an input. The format of each pixel coordinate would be [x-coordinate, y-coordinate]

- Cloth Center: The robot will be given the [x-coordinate, y-coordinate] pair corresponding to the center of the initial cloth configuration

Task:

- Thought Process: Note down possible ways of picking and placing the cloth and their potential effects

- Planning: Provide a pair of pick and place point from the cloth corners provided as input for folding the cloth.

Output:

- Planning (MOST IMPORTANT): Pick Point = (x1, y1) and Place Point = (x2, y2)

- Thought Process: Why did you choose these points and not something else?

PLEASE OUTPUT THE PICK POINT AND THE PLACE POINT FIRST AND THEN OUTPUT THE THOUGHT PROCESS INVOLVED

User Prompt to GPT-4/GPT-3.5 for getting executable action

{Folding instruction generated by GPT-4V}

- Cloth corners: {Corners detected by Harris corner detection on the current image}

- Cloth center: {Pixel coordinates of the initial cloth center}
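
Because the system prompt fixes the planning line to the form "Pick Point = (x1, y1) and Place Point = (x2, y2)", a reply can be turned into an executable action with a small parser. The sketch below is an assumption about how such parsing might look and is not taken from the released code. The name `parse_pick_place` is hypothetical, and the parser returns `None` when the expected pattern is missing.

```python
import re
from typing import Optional, Tuple

Pixel = Tuple[float, float]

_NUM = r"(-?\d+(?:\.\d+)?)"
_PATTERN = re.compile(
    rf"Pick Point\s*=\s*\(\s*{_NUM}\s*,\s*{_NUM}\s*\)\s*and\s*"
    rf"Place Point\s*=\s*\(\s*{_NUM}\s*,\s*{_NUM}\s*\)",
    re.IGNORECASE,
)


def parse_pick_place(reply: str) -> Optional[Tuple[Pixel, Pixel]]:
    """Extract ((x1, y1), (x2, y2)) from a model reply, or None if not found."""
    match = _PATTERN.search(reply)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    return (x1, y1), (x2, y2)


if __name__ == "__main__":
    reply = (
        "- Planning: Pick Point = (120, 85) and Place Point = (40, 30)\n"
        "- Thought Process: The bottom right corner folds onto the top left."
    )
    print(parse_pick_place(reply))
```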

Example Folding Instructions generated

Example instructions generated for Double Triangle fold in a single rollout

Step 1: Pick the cloth at the bottom right corner (the picking point) and fold it diagonally across to the top left corner (the placing point) (correct)

Step 2: Pick up the cloth from the black dot at the bottom right corner of the triangle and fold it diagonally towards the top left corner of the background, ensuring that the picking point aligns with the placing point at the top left corner (correct)

Example instructions generated for Double Straight fold in a single rollout

Step 1: Pick the cloth at the point indicated by the black dot on the top edge and fold it downwards to align with the bottom edge of the cloth. Ensure the picked point touches the bottom edge directly below its original position, creating a vertical fold (correct)

Step 2: Pick the cloth at the point marked by the black dot on the orange section, and fold it towards the right edge of the background, aligning the picked point with the right edge to create the fold (correct)

Step 3: Pick the cloth at the point indicated by the black dot and fold it towards the right edge of the background, aligning the picking point with the edge, effectively folding the cloth in half (correct)

Example instructions generated for All Corners Inward fold in a single rollout

Step 1: Pick a point on the cloth at the corner where the black dot is located. Fold the cloth by bringing this point to the center of the image, aligning it with the center of the background quadrants (correct)

Step 2: Pick the cloth at the point indicated by the black dot at the bottom right corner of the orange area. Fold the cloth by bringing this point to the center of the image, aligning it with the tip of the black arrow (correct)

Step 3: Pick the cloth at the point indicated by the black dot and fold it diagonally across to the opposite edge of the cloth, aligning the picking point with the edge (incorrect)

Step 4: Pick the cloth at the point indicated by the black dot and fold it towards the center of the image, aligning the picking point with the center (correct but wouldn't matter as the previous instruction was incorrect)

Example instructions generated for Corners Edges Inward fold in a single rollout

Step 1: Pick the cloth at the point indicated by the black dot and fold it diagonally across to the opposite corner of the cloth, placing the picked point at the center of the cloth (correct)

Step 2: Pick the cloth at the point indicated by the black dot at the bottom right corner of the cloth. Fold the cloth by bringing this picking point to the center of the image, aligning it with the center of the quadrants on the background (correct)

Step 3: Pick the cloth at the point indicated by the black dot at the bottom edge of the orange section. Fold the cloth by bringing this point to the center of the image, aligning it with the tip of the black arrow. Ensure that the fold creates a diagonal crease across the orange section, resulting in a triangular shape with the pink section on top (incorrect)

Step 4: Pick the cloth at the point indicated by the black dot at the bottom right corner of the cloth. Fold the cloth by bringing this picking point to the center of the image, aligning it with the center of the cloth's edge that is currently at the center of the image (incorrect)

Acknowledgements

We thank Yue Wang and Jiageng Mao for advice on using GPT-4, and I-Chun Liu for helpful writing feedback.

If you find our work useful, kindly consider citing:

@inproceedings{GPTFabric2024,
  title = {GPT-Fabric: Smoothing and Folding Fabric by Leveraging Pre-Trained Foundation Models},
  author = {Vedant Raval and Enyu Zhao and Hejia Zhang and Stefanos Nikolaidis and Daniel Seita},
  booktitle = {The International Symposium of Robotics Research (ISRR)},
  year = {2024}
}
