Bi-Manual Manipulation and Attachment via Sim-to-Real Reinforcement Learning

Satoshi Kataoka, Seyed Kamyar Seyed Ghasemipour, Daniel Freeman, Igor Mordatch

Abstract

Most successes in robotic manipulation have been restricted to single-arm robots, which limits the range of solvable tasks to pick-and-place, insertion, and object rearrangement. In contrast, dual- and multi-arm robot platforms unlock a rich diversity of problems, such as laundry folding and executing cooking skills. However, developing controllers for multi-arm robots is complicated by a number of unique challenges, such as the need for coordinated bimanual behaviors and collision avoidance between arms. Given these challenges, in this work we study how to solve bi-manual tasks using reinforcement learning (RL) trained in simulation, such that the resulting policies can be executed on real robotic platforms. Our RL approach results in significant simplifications: we use real-time (4 Hz) joint-space control and pass unfiltered observations directly to neural network policies. We also extensively discuss modifications to our simulated environment that lead to effective training of RL policies. Beyond designing control algorithms, a key challenge is designing fair evaluation tasks for bi-manual robots that stress bimanual coordination while removing orthogonal complicating factors such as high-level perception. To this end, we design a “Connect Task”, in which two robot arms must pick up and attach two blocks with magnetic connection points. We validate our approach with two xArm6 robots and 3D-printed blocks with magnetic attachments, and find that our system achieves a 100% success rate at picking up blocks and a 65% success rate at the “Connect Task”.
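To make the control setup above concrete, the following is a minimal sketch of a 4 Hz joint-space control loop that feeds raw observations to a policy network. All interface names here (robot.get_observation, robot.command_joint_positions, the policy callable) are hypothetical assumptions for illustration, not the paper's actual API.

    import time
    import numpy as np

    CONTROL_HZ = 4.0          # real-time control frequency stated in the paper
    PERIOD = 1.0 / CONTROL_HZ

    def run_episode(robot, policy, horizon_sec=25.0):
        """Send joint-space targets to both arms at 4 Hz, passing
        unfiltered observations directly to the policy."""
        start = time.time()
        while time.time() - start < horizon_sec:
            step_start = time.time()
            obs = robot.get_observation()          # raw joint angles + object poses
            action = policy(np.asarray(obs))       # e.g. 12-dim: 6 joints per xArm6
            robot.command_joint_positions(action)  # joint-space command, no filtering
            # Sleep out the remainder of the 250 ms control period.
            time.sleep(max(0.0, PERIOD - (time.time() - step_start)))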


Bimanual Attachment

We define bimanual attachment as a connect task: two blocks with magnetic connection points are placed on the ground, and two robotic arms must pick up the blocks and magnetically attach them. We designed this task as a minimal configuration that stresses bi-arm coordination and object manipulation. Despite its minimalism, increasing the number of magnetic blocks supports the creation of arbitrarily complex composed structures, opening many intriguing avenues for future research. In our “Connect Task”, each trial lasts 25 seconds. The success criterion differs between simulation and the real world. In simulation, a trial is successful if the two magnets end within 1 mm of each other and within 0.05 radians in relative orientation. In the real world, we define success at three levels of magnet separation (1 mm, 5 mm, and 10 mm) and additionally require a visual check of relative orientation: a human operator inspects the distance between the two magnets and the relative orientation between the two blocks at the end of each trial.
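The simulated success check can be expressed compactly. Below is a minimal sketch using the thresholds stated above; the pose representation (positions as 3-vectors, orientations as unit quaternions) is an assumption, since the paper does not specify it here.

    import numpy as np

    DIST_THRESH = 0.001   # 1 mm between magnet centers
    ANGLE_THRESH = 0.05   # 0.05 rad relative orientation difference

    def quat_angle(q1, q2):
        """Angle of the relative rotation between two unit quaternions."""
        dot = np.clip(abs(np.dot(q1, q2)), 0.0, 1.0)
        return 2.0 * np.arccos(dot)

    def connect_success(pos_a, pos_b, quat_a, quat_b):
        close = np.linalg.norm(pos_a - pos_b) < DIST_THRESH
        aligned = quat_angle(quat_a, quat_b) < ANGLE_THRESH
        return close and aligned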

[Figure: Produced joint trajectories.]

Importance of Large-Scale Training

We train our MLP agent (Section VI) for 3.2 billion environment steps to observe training patterns that may arise over a long period of training. We compare training RL policies with and without the simulator modifications that were incorporated to make learned policies transferable to real-world robots. As the training curves show, for both tasks the simulator modifications make training require significantly more iterations to reach a high success rate. For the “Pickup Task”, the success rate exceeds 90% after 100 million steps without simulator modifications, versus 190 million steps with them. For the “Connect Task”, the success rate exceeds 90% after 280 million steps without simulator modifications, versus 1.8 billion steps with them. Although agents train faster without the simulator modifications, they learn less robust policies that do not transfer well to real-world robots.
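The paper's specific simulator modifications are detailed elsewhere; as an illustrative sketch only, sim-to-real modifications of this kind commonly include observation noise and control latency, which can be layered onto a simulated environment as a wrapper. Everything below (the wrapper class, env.step signature, parameter values) is an assumption for illustration.

    import collections
    import numpy as np

    class SimToRealWrapper:
        """Hypothetical wrapper adding sensor noise and action delay."""

        def __init__(self, env, obs_noise_std=0.01, action_delay_steps=1):
            self.env = env
            self.obs_noise_std = obs_noise_std
            # Queue of pending actions to emulate control latency.
            self.action_queue = collections.deque(
                [np.zeros(env.action_dim)] * action_delay_steps)

        def step(self, action):
            self.action_queue.append(action)
            delayed = self.action_queue.popleft()
            obs, reward, done = self.env.step(delayed)
            noisy_obs = obs + np.random.normal(0.0, self.obs_noise_std, obs.shape)
            return noisy_obs, reward, done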

[Figure: Training progress.]

Future Directions

Concurrent to this work, we have been studying bimanual assembly via blueprint-assembly environments with simulated direct actuation. Briefly, that research demonstrated training a single agent that can simultaneously solve all seen and unseen assembly tasks via a combination of large-scale RL, structured policies, and multi-task training. Our current efforts in this direction can be viewed at the link below, along with a video of our results.