Stabilize to Act: Learning to Coordinate for Bimanual Manipulation


Jennifer Grannen, Yilin Wu, Brandon Vu, Dorsa Sadigh

Conference on Robot Learning, 2023, Oral Presentation

Paper  |  Video

BUDS zips many different jackets with a dynamic stabilizing policy.

BUDS can precisely cap markers of varied brands and appearances.

BUDS dexterously cuts 3 vegetables with challenging dynamics.

Abstract

The ability to coordinate meaningfully between two hands unlocks a variety of skills for humans. Bimanual robotics opens the door to accomplishing a similarly large set of manipulation tasks. However, constructing control policies for dual-arm autonomous systems brings inherent difficulties: the high dimensionality of the bimanual action space adds complexity to both model-based and data-driven methods. We counteract this challenge by drawing inspiration from humans to propose a novel role assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task. We instantiate this framework with BimanUal Dexterity from Stabilization (BUDS), which uses a learned restabilizing classifier to alternate between updating a learned stabilization position to keep the environment unchanged and accomplishing the task with an acting policy learned from demonstrations.

We evaluate BUDS on four dexterous bimanual tasks on real-world robots and present experiments highlighting overall task success and policy generalizability across objects within a class. Given 20 demonstrations, BUDS achieves 76.9% success on a wide variety of physical bimanual tasks, and 52.7% success when generalizing to out-of-distribution objects. BUDS is 56.0% more successful than an unstructured baseline that instead learns a BC stabilizing policy, owing to the precision these complex tasks require.

BUDS learns to coordinate by assigning roles to each arm: a stabilizing arm holds a point stationary for a period of time while an acting arm performs the task. The stabilizing position to hold stationary is learned as a keypoint on an overhead image and is instantiated with a ResNet architecture. Then, a noncompliant controller holds this point stable while the acting arm rolls out a policy learned from 20 expert single-arm demonstrations with a BC-RNN architecture. These two actions comprise a bimanual action at a single time step. Finally, we inject flexibility into the stabilizing policy with a restabilizing classifier, which determines via visual feedback at every time step when a stabilizing position is no longer effective and a new point should be detected. This final piece allows BUDS to tackle more challenging tasks, such as making multiple cuts moving back up the length of a vegetable, as shown here.
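The control loop below is a minimal sketch of this role assignment, not the released implementation: the environment interface (env.hold_position_action, env.step) and the three learned components (keypoint_model, restab_classifier, acting_policy) are hypothetical names standing in for the ResNet keypoint model, the restabilizing classifier, and the BC-RNN acting policy described above.

def buds_rollout(env, keypoint_model, restab_classifier, acting_policy, max_steps=500):
    """One BUDS episode: pin a learned stabilizing point in place while the
    acting arm runs its BC-RNN policy, re-detecting the point whenever the
    restabilizing classifier flags it as no longer effective."""
    obs = env.reset()
    acting_policy.reset_hidden()  # the BC-RNN carries recurrent state across steps

    # Learned keypoint model: overhead image -> stabilizing position.
    stab_point = keypoint_model(obs["overhead_image"])

    for _ in range(max_steps):
        # Visual feedback at every time step: detect a new stabilizing
        # point once the current one stops keeping the environment fixed.
        if restab_classifier(obs["overhead_image"], stab_point):
            stab_point = keypoint_model(obs["overhead_image"])

        # Stabilizing arm: a noncompliant controller holds the point stationary.
        stab_action = env.hold_position_action(stab_point)

        # Acting arm: policy learned from 20 expert single-arm demonstrations.
        act_action = acting_policy(obs)

        # Together, the two per-arm commands form one bimanual action.
        obs, done = env.step(stab_action, act_action)
        if done:
            break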

Experiments

We present experiments on four complex and diverse bimanual tasks that all require dynamic stabilizing policies and high-precision acting policies. Together, this set represents a wide variety of bimanual tasks. In these experiments, we visualize task performance on in-distribution objects seen during training. Later, we stress test BUDS's generalizability on out-of-distribution objects (below).

Pepper Grinder (4X)

Jacket Zip (4X)

Marker Cap (4X)

Cut Vegetable (4X)

We compare to a BC-Stabilizer baseline, where each arm is controlled by a policy learned via imitation learning. In other words, this baseline replaces BUDS's keypoint model with a stabilizing policy learned with a BC-RNN architecture, as sketched below.
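For contrast, here is the baseline loop under the same hypothetical interfaces as the BUDS sketch above: the keypoint model and restabilizing classifier are simply replaced by a second learned policy.

def bc_stabilizer_rollout(env, stab_policy, acting_policy, max_steps=500):
    """Baseline episode: both arms are driven by learned BC-RNN policies,
    with no keypoint detection and no restabilizing classifier."""
    obs = env.reset()
    stab_policy.reset_hidden()
    acting_policy.reset_hidden()
    for _ in range(max_steps):
        obs, done = env.step(stab_policy(obs), acting_policy(obs))
        if done:
            break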

BC-Stabilizer (4X)

BUDS (4X)

BC-Stabilizer (8X)

BUDS (4X)

BC-Stabilizer (8X)

BUDS (4X)

Generalizing to OOD Objects

We stress test the ability of BUDS to generalize to out-of-distribution (OOD) objects completely unseen during training. BUDS is able to generalize across OOD objects with highly varied visual appearances, materials, geometries, and dynamics.  

Jacket Zip OOD (4X)

Marker Cap OOD (4X)

Cut Vegetable OOD (4X)

Citation

To cite this work, please use the following BibTeX entry:
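@inproceedings{grannen2023stabilize,
  title     = {Stabilize to Act: Learning to Coordinate for Bimanual Manipulation},
  author    = {Grannen, Jennifer and Wu, Yilin and Vu, Brandon and Sadigh, Dorsa},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2023}
}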