VisuoSpatial Foresight (VSF) for Physical Sequential Fabric Manipulation
Ryan Hoque*, Daniel Seita*, Ashwin Balakrishna, Adi Ganapathi, Ajay Tanwani, Nawid Jamali, Katsu Yamane, Soshi Iba, Ken Goldberg
Paper + Appendix: [Link] | Code: [Data Collection, Training, Simulated Experiments] [Physical Experiments]
Robotic fabric manipulation has applications in home robotics, textiles, senior care, and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We build upon the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different sequential fabric manipulation tasks with a single goal-conditioned policy. We extend our earlier work on VisuoSpatial Foresight (VSF), which learns visual dynamics on domain-randomized RGB images and depth maps simultaneously and completely in simulation. In this earlier work, we evaluated VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at training or test time. A key finding was that depth sensing significantly improves performance: RGBD data yields an 80% improvement in fabric folding success rate in simulation over pure RGB data. In this work, we vary 4 components of VSF: data generation, the choice of visual dynamics model, the cost function, and the optimization procedure. Results suggest that training visual dynamics models using longer, corner-based actions can improve the efficiency of fabric folding by 76% and enable, with 90% reliability, a physical sequential fabric folding task that VSF could not previously perform.
[Video: rssHQv2.mp4]
The VSF algorithm consists of two phases: (1) learning a video prediction model on random interaction episodes of simulated RGBD data (left) and (2) planning over that model toward some goal configuration provided as an image (right).
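The planning phase can be illustrated with a minimal sketch. It assumes the cross-entropy method (CEM) as the optimizer and a pixel-wise squared L2 distance to the goal image as the cost; both are illustrative choices here, since this page does not specify which optimizer or cost function is used. The function `toy_dynamics` is a hypothetical stand-in for the learned video prediction model: it treats a frame as a flat list of floats and an action as a scalar shift, which is far simpler than predicting RGBD frames.

```python
import random


def toy_dynamics(frame, action):
    """Hypothetical stand-in for the learned video prediction model.

    A "frame" is a flat list of floats and an action shifts every value by
    the same amount. The real model predicts RGBD frames from pick-and-pull
    actions; this toy only keeps the planner runnable.
    """
    return [value + action for value in frame]


def l2_cost(frame, goal):
    """Squared L2 distance between a predicted frame and the goal image."""
    return sum((f - g) ** 2 for f, g in zip(frame, goal))


def rollout_cost(frame, plan, goal):
    """Roll the model forward through a plan and score the final frame."""
    for action in plan:
        frame = toy_dynamics(frame, action)
    return l2_cost(frame, goal)


def cem_plan(frame, goal, horizon=3, pop=64, elites=8, iters=5, seed=0):
    """Search for an action sequence with the cross-entropy method (assumed)."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iters):
        # Sample candidate plans from the current Gaussian and score each one.
        plans = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)
        ]
        plans.sort(key=lambda plan: rollout_cost(frame, plan, goal))
        best = plans[:elites]
        # Refit the sampling distribution to the lowest-cost (elite) plans.
        mean = [sum(p[t] for p in best) / elites for t in range(horizon)]
        std = [
            max(1e-3, (sum((p[t] - mean[t]) ** 2 for p in best) / elites) ** 0.5)
            for t in range(horizon)
        ]
    return mean


start = [0.0, 0.0, 0.0, 0.0]
goal = [1.5, 1.5, 1.5, 1.5]
plan = cem_plan(start, goal)
```

In a model-predictive control loop, one would typically execute only the first action of `plan`, observe the new image, and replan; whether VSF does exactly this is not stated on this page.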
We apply VSF to the smoothing task, in which we wish to maximize fabric coverage of the plane. To do this, we provide the policy a goal image of a fully smooth fabric.
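As a rough illustration of the coverage objective, the sketch below computes coverage as the fraction of workspace pixels occupied by fabric. It assumes a binary mask (1 for fabric, 0 for background) over the region that a fully smooth fabric would fill; the function name `coverage` and the mask representation are hypothetical, and the coverage measurement used in the experiments may differ.

```python
def coverage(mask):
    """Fraction of workspace pixels covered by fabric.

    mask: 2D list of 0/1 values, where 1 marks a fabric pixel (assumed input).
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    covered = sum(1 for row in mask for pixel in row if pixel)
    return covered / total


# A 2x2 workspace with three fabric pixels.
example = coverage([[1, 1], [0, 1]])
```

Under this definition, a fully smooth fabric gives a coverage of 1.0, which corresponds to the goal image supplied to the policy.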
We roll out the learned policy in simulation and compare against an Imitation Learning (IL) agent trained on smoothing demonstrations. We find that even on highly wrinkled starting states, VSF rivals the IL agent despite not seeing any explicit smoothing during training or explicitly optimizing for coverage.
Here is an example rollout of the VisuoSpatial Foresight policy smoothing a fabric with the da Vinci Research Kit (dVRK) surgical robot. Physical performance rivals that of the IL agent even though the VSF policy is trained on random data purely in simulation.
[Video: VF-01-29-t3_random_e004_15acts_good.mp4]
We test with goal images of fabric in folded configurations using the same VSF policy. New results improve upon the folding performance in the prior work.
An RSS 2020 folding episode in simulation.
An RSS 2020 double folding episode in simulation.
A new folding episode in simulation.
A new double folding episode in simulation.
Here we provide videos of 3 successful rollouts (Trials 1-3 below) and 1 unsuccessful rollout (Trial 4) of the VSF folding policy deployed on real fabric. Nine of 10 trials succeed; the single failure is due to an unintended pick of two layers of the fabric, which is not modeled in the simulator. When a pick misses the fabric, it is corrected by moving the pick point to the nearest point on the color mask of the fabric.
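The miss correction described above can be sketched as a nearest-neighbor search over the fabric mask. This is a minimal brute-force version, assuming the color mask is available as a 2D list of 0/1 values in pixel coordinates; the function name `snap_to_mask` and the use of Euclidean pixel distance are assumptions, not details taken from the physical experiment code.

```python
def snap_to_mask(pick, mask):
    """Return the fabric pixel (row, col) closest to the pick point.

    pick: (row, col) pixel chosen by the planner.
    mask: 2D list of 0/1 values, where 1 marks a fabric pixel (assumed input).
    Returns None when the mask contains no fabric pixels.
    """
    pick_row, pick_col = pick
    best, best_dist = None, None
    for r, row in enumerate(mask):
        for c, pixel in enumerate(row):
            if not pixel:
                continue
            # Squared Euclidean distance is enough for comparing candidates.
            dist = (r - pick_row) ** 2 + (c - pick_col) ** 2
            if best_dist is None or dist < best_dist:
                best, best_dist = (r, c), dist
    return best


fabric = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]
corrected = snap_to_mask((0, 0), fabric)
```

A pick that already lies on the fabric is returned unchanged, so the correction only affects picks that would otherwise miss.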
Trial 1: [Video: IMG_2448.MOV]
Trial 2: [Video: IMG_2449.MOV]
Trial 3: [Video: IMG_2450.MOV]
Trial 4: [Video: IMG_2437.MOV]