Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data

Priya Sundaresan, Jennifer Grannen, Brijen Thananjeyan, Ashwin Balakrishna,

Michael Laskey, Kevin Stone, Joseph E. Gonzalez, Ken Goldberg

Paper + Appendix: [Link] | Code: rope simulation/rendering and sim-to-real post-processing [Link]


Robotic manipulation of deformable 1D objects such as ropes, cables, and threads is challenging due to the lack of analytic models and large configuration spaces. Furthermore, learning end-to-end manipulation policies directly from images and physical interaction requires significant interaction time on a physical robot and can fail to generalize across tasks. We address these challenges by learning interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot manipulation. This facilitates the design of interpretable and transferable geometric policies built on top of the learned representations, decoupling visual reasoning and control. We present an approach that learns point-pair correspondences between rope configurations, which implicitly encode geometric structure, entirely in simulation from synthetic depth images. We demonstrate that the learned representation can be used to manipulate a real rope into a variety of arrangements, either by learning from demonstrations or by using intuitive geometric policies. In 50 trials of a knot-tying task with the ABB YuMi robot, the system achieves a 66% knot-tying success rate from previously unseen configurations.

ICRA 2020 Conference Presentation

ICRA 2020 Video Submission

Descriptor Learning

A descriptor mapping is learned that outputs a descriptor for each pixel in an image of a rope, such that the descriptors are invariant to rope deformations. We build on prior work on dense object descriptors.

Descriptors are trained only on synthetic depth images of a braided rope. The rope is rendered using Blender, an open-source 3D modeling and animation engine.
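The core training signal for dense object descriptors is a pixelwise contrastive loss: descriptors at corresponding pixels of two renderings are pulled together, while descriptors at non-corresponding pixels are pushed at least a margin apart. The sketch below illustrates that loss; the function name, margin value, and `(u, v)` pixel convention are illustrative, not the paper's exact implementation.

```python
import numpy as np

def descriptor_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Pixelwise contrastive loss over two dense descriptor maps.

    desc_a, desc_b: (H, W, D) descriptor maps for two renderings of the rope.
    matches: list of ((ua, va), (ub, vb)) pixel pairs known to correspond.
    non_matches: list of pixel pairs known NOT to correspond.
    """
    match_loss = 0.0
    for (ua, va), (ub, vb) in matches:
        d = np.linalg.norm(desc_a[va, ua] - desc_b[vb, ub])
        match_loss += d ** 2                              # pull matches together
    non_match_loss = 0.0
    for (ua, va), (ub, vb) in non_matches:
        d = np.linalg.norm(desc_a[va, ua] - desc_b[vb, ub])
        non_match_loss += max(0.0, margin - d) ** 2       # push non-matches apart
    return (match_loss / max(len(matches), 1)
            + non_match_loss / max(len(non_matches), 1))
```

Because ground-truth pixel correspondences are known exactly in simulation, match and non-match pairs come for free from the renderer, which is what makes training entirely on synthetic depth data practical.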

Sim to Real Processing

Synthetic data is processed to resemble real depth data from our sensor.
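A minimal sketch of this kind of post-processing is below: per-pixel noise and random dropped returns make clean rendered depth look more like real sensor output. The noise model and parameter values here are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def sim_to_real_depth(depth, rng, noise_std=0.002, dropout_prob=0.005):
    """Post-process a synthetic depth image to mimic a real depth sensor.

    depth: (H, W) array of depth values in meters.
    rng: a numpy random Generator, for reproducibility.
    """
    out = depth.copy()
    out += rng.normal(0.0, noise_std, size=out.shape)   # per-pixel Gaussian noise
    drop = rng.random(out.shape) < dropout_prob         # random missing returns
    out[drop] = 0.0                                     # sensors report 0 for holes
    return out
```

Applying the same transform to every training image keeps the descriptor network from overfitting to artifacts that only exist in clean synthetic renders.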

Policy Design

Algorithm 1: One Shot Visual Imitation

We provide the robot with frames of a human manipulating a rope into various planar and non-planar configurations. The robot treats each frame of the demonstration as a subgoal and uses the discrepancy between correspondences generated from descriptors to plan actions.

The greedy policy plans each action as follows:

1) Sample pixels on the rope mask in the current workspace configuration.

2) Find predicted correspondences for each sampled pixel in the current subgoal image.

3) Identify the corresponding pixel pair with the greatest L2 distance.

4) Perform a grasp and pull to align the furthest correspondences.

Iterate until the IoU between the current workspace configuration and the goal configuration exceeds a threshold, or until the maximum number of attempts for the current subgoal has been exhausted.
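The greedy step above can be sketched in a few lines. This is a minimal illustration, assuming correspondences have already been computed via nearest-neighbor lookup in descriptor space; the function names are hypothetical.

```python
import numpy as np

def plan_greedy_action(curr_pixels, goal_pixels):
    """Pick the grasp/pull pair with the largest correspondence discrepancy.

    curr_pixels: (N, 2) pixels sampled on the rope mask in the current image.
    goal_pixels: (N, 2) their predicted correspondences in the subgoal image.
    Returns (grasp_pixel, place_pixel).
    """
    dists = np.linalg.norm(curr_pixels - goal_pixels, axis=1)  # L2 discrepancy
    i = int(np.argmax(dists))               # most-misaligned correspondence
    return curr_pixels[i], goal_pixels[i]

def iou(mask_a, mask_b):
    """Intersection over union of two boolean rope masks (stopping criterion)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0
```

Aligning the single worst correspondence at each step makes the policy interpretable: every grasp-and-pull can be traced back to one visibly misplaced point on the rope.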

Example Action Sequence Plans (Planar):


Algorithm 2: Descriptor Parameterized Knot-Tying

Physical Experiments

We use the descriptors to parameterize actions for a knot-tying task. The descriptors are used to semantically define the actions necessary to tie a knot in terms of the rope geometry. The robot is able to tie knots from unseen rope configurations.
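One way to realize this parameterization is to annotate grasp and drop keypoints once on a reference rope image, then locate them in any new configuration by nearest-neighbor search in descriptor space. The sketch below illustrates that lookup; the function name and indexing convention are assumptions for illustration, not the paper's exact code.

```python
import numpy as np

def transfer_keypoint(ref_desc_map, ref_pixel, new_desc_map, new_mask):
    """Locate a hand-annotated keypoint from a reference image in a new image.

    ref_desc_map, new_desc_map: (H, W, D) descriptor images.
    ref_pixel: (row, col) of the annotated keypoint in the reference image.
    new_mask: (H, W) boolean rope mask restricting the search.
    """
    target = ref_desc_map[ref_pixel]            # D-dim descriptor at the keypoint
    rows, cols = np.nonzero(new_mask)
    cands = new_desc_map[rows, cols]            # (M, D) candidates on the rope
    i = int(np.argmin(np.linalg.norm(cands - target, axis=1)))
    return int(rows[i]), int(cols[i])
```

Because the descriptors are deformation-invariant, the same one-time annotation defines the knot-tying action for every rope configuration, which is what allows tying knots from unseen starting states.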

Video: 1.75x speed