Deep Imitation Learning of Sequential Fabric Smoothing From an Algorithmic Supervisor

Daniel Seita, Aditya Ganapathi, Ryan Hoque, Minho Hwang, Edward Cen, Ajay Kumar Tanwani, Ashwin Balakrishna,
Brijen Thananjeyan, Jeffrey Ichnowski, Nawid Jamali, Katsu Yamane, Soshi Iba, John Canny, Ken Goldberg

International Conference on Intelligent Robots and Systems (IROS), 2020


Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone.

IROS 2020 Presentation Video


The code for the project is located across these repositories:

The repositories have a fair amount of documentation, but there are many moving pieces to tie together. If you have questions about the code, email me at <> or open a GitHub issue with details about what you are trying to do, and I will do my best to help you out.

Excitingly, there have been multiple follow-up works that have used our code:


March 2020 update: I generated an RGBD dataset for the IROS 2020 paper. You can find it here for Tiers 1, 2, and 3, respectively:

These correspond to the following file names (dated February 09 and February 10):


I strongly suggest using the above dataset if you are interested in using data from this work.

The older offline demonstrator data (with color alone or depth alone) can be found HERE (warning: 5.2 GB). Run the following command to extract it:

$ tar -zxvf offline-demo-data.tar.gz
demos-2019-08-28-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-tier1_epis_2000_COMBINED.pkl
demos-2019-08-28-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-tier2_epis_2000_COMBINED.pkl
demos-2019-08-28-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-tier3_epis_2000_COMBINED.pkl
demos-2019-08-30-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-depthimg-False-tier1_epis_2000_COMBINED.pkl
demos-2019-08-30-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-depthimg-False-tier2_epis_2000_COMBINED.pkl
demos-2019-08-30-pol-oracle-seed-1337_to_1341-clip_a-True-delta_a-True-obs-blender-depthimg-False-tier3_epis_2000_COMBINED.pkl

This data is technically not needed for DAgger, but it is useful for pre-training the learner policy into a good configuration before running DAgger. The first three files are for depth images, and the last three are for color images ("depthimg" is False in those file names). All are pickle files that store a single list, where each item contains information about one trajectory.

Videos (Simulation)

The videos below are taken from rendering software that we use to visualize the simulator. We use the rendering software for taking videos of the simulator and debugging, but not for domain randomization. For that, we export our cloth meshes to Blender. For simulated videos, the fabric plane is blue. (In the real setup, the fabric plane is white foam rubber.)
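Conceptually, domain randomization amounts to sampling fresh rendering parameters (fabric color, camera pose, lighting, image noise) before each episode is rendered in Blender. A minimal sketch, with parameter names and ranges that are purely illustrative, not the values used in the paper:

```python
import random

def sample_render_params(rng=random):
    """Sample one set of randomized rendering parameters (illustrative)."""
    return {
        # Fabric color, sampled uniformly over RGB space.
        "fabric_rgb": [rng.uniform(0.0, 1.0) for _ in range(3)],
        # Small jitter of the camera position, in meters.
        "camera_dx": rng.uniform(-0.03, 0.03),
        "camera_dy": rng.uniform(-0.03, 0.03),
        # Scale factor on lighting intensity.
        "light_energy": rng.uniform(0.5, 1.5),
        # Standard deviation of per-pixel Gaussian image noise.
        "gauss_noise": rng.uniform(0.0, 0.02),
    }
```

Each rendered episode would use one such sample, so the learned policy never sees the exact same appearance twice.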

Note: it sometimes looks like the pulls are pulling "past" the corner but that's an artifact of our renderer and camera angles.


Oracle Corner Pulling Policy on a Tier 1 Starting State: It completes the trajectory in one shot, which is typical for tier 1.


Oracle Corner Pulling Policy on a Tier 2 Starting State: The first two actions pull the top layer above the corner furthest from the target; in both cases the policy is targeting the "upper right" fabric corner.


Oracle Corner Pulling Policy on a Tier 2 Starting State: Here's another example of a tier 2 starting state.


Oracle Corner Pulling Policy on a Tier 3 Starting State: It sufficiently smooths the fabric with four actions in this case.
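For reference, the oracle behavior in these clips can be sketched as: with access to the true fabric state, pick the corner farthest from its flat-plane target and pull it toward that target. The snippet below is a simplified illustration of that idea, not the exact implementation (which, for example, also handles occluded corners and clips pull lengths):

```python
import numpy as np

def oracle_corner_pull(corners, targets, scale=1.0):
    """Pick point and pull vector for the corner farthest from its target.

    corners: (4, 2) array of current fabric corner positions.
    targets: (4, 2) array of corner positions for a perfectly flat fabric.
    Returns (pick_point, pull_vector).
    """
    deltas = targets - corners              # displacement each corner needs
    dists = np.linalg.norm(deltas, axis=1)  # distance of each corner from target
    i = int(np.argmax(dists))               # choose the farthest corner
    return corners[i], scale * deltas[i]
```

Applied repeatedly, this greedy rule produces the sequential pulls seen in the videos above.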

Videos (Real)

Here are videos of the physical setup. I arrange this by tier, listing tiers 1, 2, and 3, respectively, and show the performance of policies that use color (RGB), depth (D), or RGBD image inputs. We show many videos for completeness, but I recommend looking at the videos below showing the RGBD policy on tier 3 starting configurations.

Tier 1 Starting Configurations


Color Policy on Tier 1: I observed this behavior frequently. In some of the 20 trajectories, the policy was able to get to the coverage threshold in just one action. (The same thing happens in simulation.)


Depth Policy on Tier 1: Here's another tier 1 starting state, this time with the depth based policy. In general, depth based policies don't do as good a job of "finishing" trajectories in one action as the color policies on tier 1 starting states. This video is sped up by 4x.

Tier 2 Starting Configurations


Color Policy on Tier 2: The upper left fabric corner is initially occluded and slightly underneath the fabric. The color-trained policy (as in simulation) pulls above it and then toward the upper left fabric plane target. It "over-pulls," but the next actions compensate, resulting in great coverage. This is a common pattern I've observed: slightly over-pulling early can be beneficial later, because corners that are folded underneath end up closer to their targets on the fabric plane (i.e., the foam rubber).


Depth Policy on Tier 2: Here's the depth policy. The main takeaway is that it takes some reasonable actions, but the ninth (second-to-last) action is poor and decreases coverage. Also, notice how it misses the fabric a few times before the next action touches it; this may be largely because depth images are somewhat less consistent across time steps than color images.

Tier 3 Starting Configurations


Color Policy on Tier 3: Despite a highly wrinkled starting state, the learned policy gets excellent coverage. This kind of "back and forth" motion is often helpful to later let the policy fine-tune the fabric by pulling at exposed corners.


Depth Policy on Tier 3: The actions are somewhat reasonable, though not ideal. The depth policy is particularly susceptible to missing the fabric.


RGBD Policy on Tier 3: The policy got excellent coverage here.


RGBD Policy on Tier 3: The policy got excellent coverage here.

Here are additional (successful) RGBD policy example rollouts on Tier 3 configurations, at 4X speed, with the camera movement partially stabilized compared to earlier videos.


Success in 9 actions.


Success in 6 actions.


Success in 5 actions.


Success in 7 actions.

Failure Case Example

A common failure case and limitation is when the policy slightly misses grabbing the fabric, as shown in the video below on the left. As a partial remedy, we measure the structural similarity of the images before and after the action. If the two images are nearly identical, the next action moves closer to the center, which is usually sufficient for our purposes. This may take several actions, since there is no guarantee that the next action will touch fabric. An alternative would be to map the pick point to the nearest fabric pixel, though that is also subject to calibration error.
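A hedged sketch of this check: the snippet below uses a simplified single-window SSIM in NumPy as a stand-in for the structural similarity measure, then moves the pick point a fraction of the way toward the fabric center when the before/after images are nearly identical. The threshold and step size are illustrative, not the values we used:

```python
import numpy as np

def global_ssim(img1, img2, data_range=255.0):
    """Simplified single-window SSIM over whole grayscale images.

    This is a rough, global stand-in for structural similarity
    (no sliding window), good enough to flag "nothing changed."
    """
    x = img1.astype(np.float64)
    y = img2.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def adjust_pick(pick, center, before, after, thresh=0.95, step=0.25):
    """If the action changed almost nothing (grasp likely missed),
    move the next pick point a fraction of the way toward the center."""
    if global_ssim(before, after) > thresh:
        pick = pick + step * (center - pick)
    return pick
```

Because the adjustment is a fractional step, several consecutive misses walk the pick point progressively closer to the fabric.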

The two videos below show back-to-back actions in the same episode. These were recorded with our older setup, where we taped some paper in the background; the newer setup used for experiments has a flat piece of paper with a cutout so the foam rubber is visible.


Calibration Video

The following (sped-up!) video shows how we calibrated the robot: we command it to the corners of a checkerboard and visually inspect whether the positioning is accurate enough.
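One generic way to turn such checkerboard correspondences into a pixel-to-workspace calibration is a least-squares affine fit. The NumPy sketch below illustrates that idea; it is not necessarily the exact procedure we used:

```python
import numpy as np

def fit_pixel_to_robot(pixels, robot_xy):
    """Least-squares affine map from (u, v) pixels to robot (x, y) coordinates.

    pixels:   (N, 2) checkerboard corner locations in the image.
    robot_xy: (N, 2) corresponding robot workspace positions.
    Returns a (3, 2) matrix A such that [u, v, 1] @ A ~= [x, y].
    """
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])  # homogeneous pixels
    A, *_ = np.linalg.lstsq(homog, robot_xy, rcond=None)
    return A

def pixel_to_robot(A, uv):
    """Apply the fitted affine map to one pixel coordinate."""
    return np.array([uv[0], uv[1], 1.0]) @ A
```

With enough well-spread corner correspondences, residuals of this fit also give a quick quantitative check on calibration quality, complementing the visual inspection above.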

The dVRK robot is often difficult to use. Here are some papers from our lab at ICRA 2018 (arXiv), ISMR 2020 (arXiv), and IEEE RA-L 2020 (arXiv) that discuss dVRK calibration and usage issues in more detail.




@inproceedings{seita_fabrics_2020,
    author = {Daniel Seita and Aditya Ganapathi and Ryan Hoque and Minho Hwang and Edward Cen and Ajay Kumar Tanwani and Ashwin Balakrishna and Brijen Thananjeyan and Jeffrey Ichnowski and Nawid Jamali and Katsu Yamane and Soshi Iba and John Canny and Ken Goldberg},
    title = {{Deep Imitation Learning of Sequential Fabric Smoothing From an Algorithmic Supervisor}},
    booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year = {2020},
}



This research was performed at the AUTOLAB at UC Berkeley in affiliation with Honda Research Institute USA, the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS “People and Robots” (CPAR) Initiative, and by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, NSF National Robotics Initiative Award 1734633. The authors were supported in part by Siemens, Google, Amazon Robotics, Toyota Research Institute, Autodesk, ABB, Samsung, Knapp, Loccioni, Intel, Comcast, Cisco, Hewlett-Packard, PhotoNeo, NVidia, and Intuitive Surgical. Daniel Seita is supported by a National Physical Science Consortium Fellowship. We thank Jackson Chui, Michael Danielczuk, Shivin Devgon, and Mark Theis.