Deep Dynamics Models for Learning Dexterous Manipulation
Anusha Nagabandi, Kurt Konolige, Sergey Levine, Vikash Kumar
Dexterous multi-fingered hands can provide robots with the ability to flexibly perform a wide range of manipulation skills. However, many of the more complex behaviors are also notoriously difficult to control: performing in-hand object manipulation, executing finger gaits to move objects, and exhibiting precise fine motor skills such as writing all require finely balancing contact forces, breaking and reestablishing contacts repeatedly, and maintaining control of unactuated objects. Learning-based techniques offer the appealing possibility of acquiring these skills directly from data. However, current learning approaches either require large amounts of data and produce task-specific policies, or they have not yet been shown to scale up to more complex and realistic tasks requiring fine motor skills. In this work, we demonstrate that our method of online planning with deep dynamics models (PDDM) addresses both of these limitations; we show that improvements in learned dynamics models, together with improvements in online model-predictive control, can indeed enable efficient and effective learning of flexible, contact-rich dexterous manipulation skills -- even on a 24-DoF anthropomorphic hand in the real world, using just 2-4 hours of purely real-world data to learn to simultaneously coordinate multiple free-floating objects.
METHOD OVERVIEW:
At a high level, our method of online planning with deep dynamics models iterates between (a) running a model-predictive controller that selects actions using predictions from the learned dynamics model, and (b) training the dynamics model to fit the data collected so far. With recent improvements in both modeling procedures and control schemes that use these high-capacity learned models, we are able to demonstrate efficient and autonomous learning of complex dexterous manipulation tasks.
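As a rough illustration of this loop, the sketch below alternates between fitting a model to logged transitions and planning through it with reward-weighted (MPPI-style) action sampling. The 1-D point-mass system, the linear stand-in for the deep network, and all hyperparameters are toy assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real system: 1-D point mass, state = [pos, vel].
def true_step(s, a):
    pos, vel = s
    vel = vel + 0.1 * float(a)
    return np.array([pos + 0.1 * vel, vel])

def reward(s):
    return -abs(s[0] - 1.0)          # drive the position toward 1.0

class LinearModel:
    """Least-squares (s, a) -> s' model standing in for the deep network."""
    def __init__(self):
        self.W = np.zeros((3, 2))
    def fit(self, S, A, S_next):
        X = np.hstack([S, A])
        self.W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    def step(self, s, a):
        return np.hstack([s, a]) @ self.W

def plan(model, s0, horizon=10, n_cand=64):
    """MPC action selection: sample candidate action sequences, roll them
    out through the learned model, reward-weight them, return the first
    action of the weighted average (MPPI-style)."""
    seqs = rng.normal(0.0, 1.0, size=(n_cand, horizon))
    returns = np.zeros(n_cand)
    for i in range(n_cand):
        s = s0.copy()
        for t in range(horizon):
            s = model.step(s, seqs[i, t])
            returns[i] += reward(s)
    w = np.exp(10.0 * (returns - returns.max()))   # softmax weights
    return float(w @ seqs[:, 0] / w.sum())

# Outer loop: (b) fit the model on all data so far, (a) act via MPC, repeat.
model = LinearModel()
S, A, S_next = [], [], []
s = np.zeros(2)
for step in range(300):
    a = rng.normal() if step < 50 else plan(model, s)   # random warm-up
    s2 = true_step(s, a)
    S.append(s); A.append([a]); S_next.append(s2)
    s = s2
    if step >= 49:
        model.fit(np.array(S), np.array(A), np.array(S_next))
```

Because the toy dynamics are linear, the least-squares model becomes essentially exact; in PDDM the same loop structure is used with a neural network model and a real (or simulated) robot in place of `true_step`.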
BAODING BALLS:
Baoding balls, also referred to as Chinese relaxation balls, are a pair of spheres that must be simultaneously rotated around each other in the hand. This task requires both dexterity and coordination, which is why it is commonly used for improving finger coordination, relaxing muscular tension, and recovering muscle strength and motor skills after surgery. In this work, we put our PDDM algorithm to the test by learning this Baoding balls task with no simulation, using roughly 2 hours worth of real-world data.
TRAINING SETUP:
In our experiments, we use the ShadowHand, a 24-DoF five-fingered anthropomorphic hand. In addition to the hand's built-in proprioceptive sensing at each joint, we separately trained and integrated a dilated CNN-based RGB tracker that produces 3D position estimates for the external objects (the Baoding balls), using a 280x180 RGB stereo image pair from a calibrated camera rig.
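While the tracker itself is a learned CNN, the final step of converting a matched pixel pair from the calibrated rig into a 3D position is standard stereo triangulation. The sketch below assumes a rectified stereo pair; the intrinsics and baseline in the example are illustrative values, not the paper's calibration.

```python
import numpy as np

def stereo_to_3d(uv_left, uv_right, fx, fy, cx, cy, baseline):
    """Triangulate a 3D point from a matched pixel pair on a calibrated,
    rectified stereo rig. uv_left/uv_right are the (u, v) pixel coordinates
    of the ball centre in each image; fx, fy, cx, cy are the left camera's
    intrinsics and baseline is the inter-camera distance in metres."""
    disparity = uv_left[0] - uv_right[0]          # horizontal shift, pixels
    z = fx * baseline / disparity                 # depth from disparity
    x = (uv_left[0] - cx) * z / fx                # back-project into the
    y = (uv_left[1] - cy) * z / fy                # left-camera frame
    return np.array([x, y, z])
```

For example, with assumed intrinsics fx = fy = 300, principal point (140, 90) for a 280x180 image, and a 6 cm baseline, the pixel pair (170, 78) / (134, 78) triangulates to (0.05, -0.02, 0.5) metres.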
To enable continuous experimentation in the real world, we developed an automated reset mechanism consisting of a ramp and an additional robotic arm: the ramp funnels the dropped Baoding balls to a specific position and then triggers a 7-DoF Franka Emika arm to pick them up with its parallel-jaw gripper and return them to the ShadowHand's palm to resume training. An episode terminates when the 10-second task horizon elapses or when the hand drops either ball, which again triggers the automatic reset procedure.
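The episode-and-reset logic described above can be sketched as a simple loop. The sensor and robot interfaces below (`get_ball_heights`, `act`, `reset`) are hypothetical placeholders rather than real drivers, and the drop threshold and control rate are assumed values.

```python
HORIZON_S = 10.0     # task horizon described above
CONTROL_HZ = 20      # assumed control rate

def run_episode(get_ball_heights, act, reset, palm_height=0.05):
    """Run one training episode: stop after the 10 s horizon or as soon as
    either Baoding ball falls below the palm, then trigger the automated
    reset (ramp + arm pick-up). Returns (steps run, termination reason)."""
    steps = int(HORIZON_S * CONTROL_HZ)
    for t in range(steps):
        act(t)                                   # one control step
        if min(get_ball_heights()) < palm_height:
            reset()                              # ball dropped: reset early
            return t, "dropped"
    reset()                                      # horizon elapsed
    return steps, "timeout"
```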
TRAINING PROGRESS:
PDDM's sample efficiency makes it possible to train complex behaviors directly from real-world experience on physical hardware, without sim-to-real transfer or prior system- or environment-specific information. On the Baoding balls task, approximately 2 hours worth of real-world data suffices to build a rich dynamics model and plan through it to achieve complex, dynamic, and contact-rich behaviors. Additional challenges in this real-world setting include sensor noise, communication delays, unknown object properties, deformable materials, and difficult-to-model details such as friction properties.
The system can very reliably perform 90-degree turns, and somewhat reliably perform 180-degree turns.
SIMULATED TASKS:
In order to develop the PDDM algorithm itself (as used above), we first designed a suite of simulated tasks with which to study the general challenges of contact-rich dexterous manipulation. Some of the main challenges involve the high dimensionality of the hand, the prevalence of complex contact forces that must be utilized and balanced to manipulate free-floating objects, and the potential for failure from dropping objects in the hand. We identify a set of experimental tasks that specifically highlight these challenges, requiring delicate, precise, and coordinated movement.
The results of online planning through our learned models for these designed tasks are shown below, followed by some of the benefits of this approach, comparisons to other approaches, and the effect of various design decisions.
[Simulated task results: the tasks required roughly 20 min, 1 hour, 1 hour, 1-2 hours, and 30-60 min worth of data, respectively.]
Model-Reuse:
We find that models learned via PDDM can be re-purposed, sometimes even without additional training, to perform related tasks. For example, the model trained for the Baoding task of performing counterclockwise rotations (left) can be re-purposed to move a single ball to a goal location in the hand (middle) or to perform clockwise rotations (right) instead of the learned counterclockwise ones.
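One way to picture this reuse: in sampling-based MPC, the learned model is task-agnostic and only the reward function changes per task. The sketch below uses a toy one-line model and made-up point-goal rewards purely to illustrate the separation; it is not the paper's model or reward design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned model: a black-box step function; in PDDM this
# would be the trained neural dynamics model, reused across tasks.
def model_step(s, a):
    return s + 0.1 * a             # toy dynamics: state drifts with action

def plan_first_action(s0, reward_fn, horizon=8, n_cand=128):
    """Same sampling-based planner, different reward: swapping reward_fn
    is all that changes when re-purposing one learned model for a new task."""
    seqs = rng.normal(size=(n_cand, horizon, s0.shape[0]))
    returns = np.zeros(n_cand)
    for i in range(n_cand):
        s = s0.copy()
        for t in range(horizon):
            s = model_step(s, seqs[i, t])
            returns[i] += reward_fn(s)
    w = np.exp(5.0 * (returns - returns.max()))    # reward weighting
    return (w[:, None] * seqs[:, 0]).sum(0) / w.sum()

# Two tasks, one model -- only the reward changes (goals are made up):
reach_goal_A = lambda s: -np.linalg.norm(s - np.array([0.5, 0.5]))
reach_goal_B = lambda s: -np.linalg.norm(s - np.array([-0.5, 0.0]))
```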
Task-Flexibility:
We study the flexibility of PDDM by experimenting with handwriting, where the base of the hand is fixed and arbitrary characters need to be written through the coordinated movement of the fingers and wrist. Although even writing a fixed trajectory is challenging, we see that writing arbitrary trajectories requires a degree of flexibility and coordination that is exceptionally challenging for prior methods. PDDM's separation of modeling and task-specific control allows for generalization across behaviors, as opposed to discovering and memorizing the answer to a specific task/movement. Below, we render PDDM's handwriting results that were trained on random paths for the green dot but then tested in a zero-shot fashion on numerical digits.
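This separation shows up concretely in the handwriting reward: a per-step tracking cost is indifferent to which path it is asked to follow, which is what makes zero-shot transfer from random training paths to digits plausible. The helper below is a hypothetical illustration of such a reward, not the paper's exact cost.

```python
import numpy as np

def tracking_reward(point_xy, target_path, t):
    """Per-step handwriting reward: negative distance between the controlled
    point (the 'green dot') and the desired path at timestep t. Any
    target_path -- random scribbles at training time, numerical digits at
    test time -- plugs into the same planner unchanged."""
    return -np.linalg.norm(np.asarray(point_xy) - target_path[t])
```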
Comparisons:
We compare our method to the following state-of-the-art model-based and model-free RL algorithms:
Nagabandi et al.: learns a deterministic neural network dynamics model, combined with a random-shooting MPC controller.
PETS: learns an ensemble of probabilistic neural network dynamics models, combined with a sampling-based MPC controller.
NPG: a model-free natural policy gradient method.
SAC: soft actor-critic, an off-policy model-free method.
MBPO: model-based policy optimization, which trains a model-free policy on rollouts generated by a learned model.
On our simulated suite of dexterous manipulation tasks, PDDM consistently outperforms these prior methods both in terms of learning speed and final performance, often solving flexible tasks that prior methods cannot.
Analysis of Design Decisions:
Here we present the impact of various design decisions on our model and our online planning method. We use the Baoding balls task for these experiments, though we observed similar trends on other tasks.
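One such design decision is the use of an ensemble of dynamics models rather than a single one. The sketch below illustrates the idea with bootstrap-fit linear models standing in for the neural networks; the member count and fitting procedure here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

class Ensemble:
    """Ensemble of simple dynamics models. Each member is fit on a
    bootstrap resample of the data; the planner can use the mean of the
    members' predictions, and their spread signals model uncertainty."""
    def __init__(self, n_members):
        self.members = [None] * n_members

    def fit(self, X, Y):
        n = X.shape[0]
        for i in range(len(self.members)):
            idx = rng.integers(0, n, size=n)        # bootstrap resample
            W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
            self.members[i] = W

    def predict(self, x):
        preds = np.stack([x @ W for W in self.members])
        return preds.mean(0), preds.std(0)          # mean + disagreement
```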
VIDEOS:
Comprehensive Videos
Hardware Results Overview
Method & Results Overview
Citation
@INPROCEEDINGS{PDDM,
  AUTHOR    = {Anusha Nagabandi AND Kurt Konolige AND Sergey Levine AND Vikash Kumar},
  TITLE     = "{Deep Dynamics Models for Learning Dexterous Manipulation}",
  BOOKTITLE = {Conference on Robot Learning (CoRL)},
  YEAR      = {2019},
}