VisuoSpatial Foresight (VSF) for Physical Sequential Fabric Manipulation

Ryan Hoque*, Daniel Seita*, Ashwin Balakrishna, Adi Ganapathi, Ajay Tanwani, Nawid Jamali, Katsu Yamane, Soshi Iba, Ken Goldberg

Abstract

Robotic fabric manipulation has applications in home robotics, textiles, senior care, and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We build upon the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different sequential fabric manipulation tasks with a single goal-conditioned policy. We extend our earlier work on VisuoSpatial Foresight (VSF), which learns visual dynamics on domain-randomized RGB images and depth maps simultaneously and entirely in simulation. In that work, we evaluated VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot, without any demonstrations at train or test time. A key finding was that depth sensing significantly improves performance: RGBD data yields an 80% improvement in fabric folding success rate in simulation over pure RGB data. In this work, we vary 4 components of VSF: data generation, the visual dynamics model, the cost function, and the optimization procedure. Results suggest that training visual dynamics models on longer, corner-based actions can improve the efficiency of fabric folding by 76% and enable, with 90% reliability, a physical sequential fabric folding task that VSF could not previously perform.

RSS 2020 Video Presentation

rssHQv2.mp4

Approach

The VSF algorithm consists of two phases: (1) learning a video prediction model on random interaction episodes of simulated RGBD data (left) and (2) planning over that model toward some goal configuration provided as an image (right).

Comparison of ground truth trajectories against predicted trajectories at test time, given a single context frame (not shown) and action sequence. Notice the domain randomization in color, camera angle, and shading.
Visualization of the planning algorithm during a simulated rollout. Here we show the top 5 trajectories during the Cross Entropy Method (CEM), where we minimize the L2 cost between the final predicted image and the goal image. Pick-and-pull actions are projected onto the images as black arrows.
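The planning loop above can be sketched as follows. This is a minimal CEM planner, not the paper's implementation: the `dynamics` callable is a hypothetical stand-in for the learned video prediction model (mapping an observation and a candidate action sequence to a predicted final RGBD frame), and the population size, horizon, and iteration counts are illustrative defaults.

```python
import numpy as np

def cem_plan(dynamics, obs, goal, horizon=5, action_dim=4,
             pop_size=200, elite_frac=0.1, iters=3, seed=0):
    """Plan a pick-and-pull action sequence with the Cross Entropy Method.

    dynamics(obs, actions) -> predicted final image after applying the
    candidate action sequence (stand-in for the learned visual dynamics).
    Returns the first action of the best-fit action sequence.
    """
    rng = np.random.default_rng(seed)
    n_elite = max(1, int(pop_size * elite_frac))
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))

    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = rng.normal(mu, sigma, size=(pop_size, horizon, action_dim))
        samples = np.clip(samples, -1.0, 1.0)
        # Score each candidate by L2 distance between the predicted
        # final frame and the goal image.
        costs = np.array([np.linalg.norm(dynamics(obs, a) - goal)
                          for a in samples])
        # Refit the sampling distribution to the lowest-cost elites.
        elites = samples[np.argsort(costs)[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]  # execute only the first action, then replan (MPC-style)
```

Replanning after every executed action, as in the rollouts shown here, keeps the policy closed-loop even though each plan covers a multi-step horizon.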

Smoothing

We apply VSF to the smoothing task, in which we wish to maximize fabric coverage of the plane. To do this, we provide the policy with a goal image of a fully smooth fabric.
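Coverage here can be computed as the fraction of the underlying plane occluded by the fabric. A minimal sketch, assuming binary segmentation masks for the fabric and the plane region (the mask names and shapes are illustrative, not the paper's exact metric code):

```python
import numpy as np

def coverage(fabric_mask, plane_mask):
    """Fraction of the plane region covered by fabric.

    fabric_mask, plane_mask: HxW boolean arrays, e.g. from color
    segmentation of the observation image.
    """
    covered = np.logical_and(fabric_mask, plane_mask).sum()
    return covered / plane_mask.sum()
```

A fully smooth, centered fabric maximizes this value, which is why a smooth goal image implicitly drives coverage upward even though VSF never optimizes coverage directly.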

Simulation

We roll out the learned policy in simulation and compare against an Imitation Learning (IL) agent trained on smoothing demonstrations. We find that even on highly wrinkled starting states, VSF rivals the IL agent, despite seeing no explicit smoothing during training and never explicitly optimizing for coverage.

Graph comparing coverage metrics for VSF against IL over 200 trajectories across three tiers of difficulty. VSF performs very similarly to IL.
Simulated smoothing rollout for a Tier 3 starting state. Here we show a time-lapse of observations after each action is taken and the cloth has settled. Resolution is low as these are the actual 56x56 frames we provide to the network.

Physical

Here is an example rollout of the VisuoSpatial Foresight policy smoothing a fabric with the da Vinci Research Kit (dVRK) surgical robot. Physical performance rivals that of the IL agent, even though the VSF policy was trained purely in simulation on random interaction data.

VF-01-29-t3_random_e004_15acts_good.mp4

Folding

We test with goal images of fabric in folded configurations using the same VSF policy. New results improve upon the folding performance reported in the prior work.

Simulation

An RSS 2020 folding episode in simulation.

An RSS 2020 dual folding episode in simulation.

A new folding episode in simulation.

A new double folding episode in simulation.

Physical

Here we provide videos of 3 successful rollouts (Trials 1-3 below) and 1 unsuccessful rollout (Trial 4) of the VSF folding policy deployed on real fabric. 9 of 10 trials succeed; the single failure is due to an unintended pick of two layers of the fabric, which the simulator does not model. Missed picks are corrected by moving to the nearest point on the color mask of the fabric.
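The miss-correction step above can be sketched as a nearest-neighbor snap onto the fabric's color mask. This is an illustrative implementation, not the deployed code: `fabric_mask` is assumed to be an HxW boolean array from color segmentation, and pick points are `(x, y)` pixel coordinates.

```python
import numpy as np

def correct_pick(pick_xy, fabric_mask):
    """Snap a pick point that misses the fabric onto the nearest
    masked pixel; leave on-fabric picks unchanged."""
    x, y = pick_xy
    if fabric_mask[y, x]:          # mask is indexed [row, col] = [y, x]
        return pick_xy
    ys, xs = np.nonzero(fabric_mask)
    # Nearest masked pixel by squared Euclidean distance in image space.
    d2 = (xs - x) ** 2 + (ys - y) ** 2
    i = int(np.argmin(d2))
    return (int(xs[i]), int(ys[i]))
```

Snapping to the mask guards against small calibration or prediction errors, but it cannot prevent the double-layer pick seen in Trial 4, since the color mask carries no layer information.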

IMG_2448.MOV

Trial 1

IMG_2449.MOV

Trial 2

IMG_2450.MOV

Trial 3

IMG_2437.MOV

Trial 4