IndustReal: Transferring Contact-Rich Assembly Tasks
from Simulation to Reality

Updates:

1/4/24: IndustRealLib has been released! This release contains our code to deploy policies on a real-world Franka. Thank you for your patience! We experienced a few months of delays due to recent administrative and logistical changes on our team.

9/22/23: IndustRealSim has been released! This release contains our simulation environments and implementations of SAPU, SDF-based reward, and SBC for policy training. See the corresponding code and documentation within the IsaacGymEnvs repo.

8/22/23: IndustRealKit has been released! This release contains the CAD models and meshes for the assemblies used in IndustReal.

7/12/23: IndustReal was presented at RSS in Daegu, Korea. Thank you for stopping by or tuning in online!

Short Video (7.5 min)

Long Video (14 min)

This video has audio and is best viewed with sound turned on.

Abstract

Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on assembly. We present IndustReal, a set of algorithms, systems, and tools that solve assembly tasks in simulation with reinforcement learning (RL) and successfully achieve policy transfer to the real world. Specifically, we propose 1) simulation-aware policy updates, 2) signed-distance-field rewards, and 3) sampling-based curricula for robotic RL agents. We use these algorithms to enable robots to solve contact-rich pick, place, and insertion tasks in simulation. We then propose 4) a policy-level action integrator to minimize error at policy deployment time. We build and demonstrate a real-world robotic assembly system that uses the trained policies and action integrator to achieve repeatable performance in the real world. Finally, we present hardware and software tools that allow other researchers to fully reproduce our system and results.

Policy Learning in Simulation

For policy learning in simulation, we propose three methods that allow RL agents to solve contact-rich tasks:

Simulation-aware Policy Update (SAPU)


When simulating contact-rich tasks, spurious interpenetrations between assets are unavoidable, especially when executing in real time. Unfortunately, in simulation-based RL, an agent can exploit inaccurate collision dynamics to maximize reward, learning policies that are unlikely to transfer to the real world. Thus, we propose our first algorithm, a simulation-aware policy update (SAPU), which encourages the agent to learn policies that avoid interpenetrations.


As shown in the diagram below, for a given environment, the module takes as input the plug and socket meshes and their associated 6-DOF poses. The module samples N = 1000 points on/inside the plug mesh, transforms the points into the socket frame, computes their distances to the socket mesh, and returns the maximum interpenetration depth (Algorithm 1). This procedure is performed over each horizon, and the depth is used to weight the cumulative reward during the policy update.
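Below is a minimal sketch of this penetration check, assuming trimesh for area-weighted surface sampling and pysdf (also used for our SDF-based reward below) for distance queries. For simplicity, the sketch samples only surface points, and the sampling count, tolerance, and linear reward-weighting rule are illustrative placeholders rather than the exact values and scaling used in IndustRealSim.

```python
import numpy as np
import trimesh
from pysdf import SDF  # SDF queries against a triangle mesh


def max_interpenetration_depth(plug_mesh, socket_mesh, plug_pose, socket_pose, num_points=1000):
    """Estimate the deepest interpenetration of the plug into the socket.

    plug_pose / socket_pose are 4x4 homogeneous transforms (object frame -> world frame).
    """
    # Sample points on the plug surface (area-weighted), expressed in the plug frame.
    points, _ = trimesh.sample.sample_surface(plug_mesh, num_points)

    # Transform the sampled points from the plug frame into the socket frame.
    plug_to_socket = np.linalg.inv(socket_pose) @ plug_pose
    points_h = np.hstack([points, np.ones((num_points, 1))])
    points_in_socket = (plug_to_socket @ points_h.T).T[:, :3]

    # Query the socket SDF; pysdf returns positive values for points inside the mesh.
    socket_sdf = SDF(socket_mesh.vertices, socket_mesh.faces)
    signed_dists = socket_sdf(points_in_socket)

    # Deepest point inside the socket, or 0 if there is no penetration.
    return max(float(signed_dists.max()), 0.0)


def sapu_reward_weight(depth, tolerance=0.001, max_depth=0.01):
    """Illustrative weighting: full reward below a small tolerance, linearly scaled down above it."""
    if depth <= tolerance:
        return 1.0
    return max(1.0 - (depth - tolerance) / (max_depth - tolerance), 0.0)
```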

Signed Distance Field (SDF)-Based Reward


To provide a dense reward for contact-rich assembly, we propose a signed distance field (SDF)-based dense reward for RL agent training. As shown in the following diagram, for each plug and socket, we generate SDFs using pysdf. For each object, area-weighted sampling is used to select N = 1000 points on the surface of its mesh. For a given environment, SDF values are queried at these points using the SDF of the other object. This procedure is performed at each timestep and is used to generate a dense reward signal.
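The sketch below illustrates one way to realize this computation, again assuming trimesh and pysdf. The symmetric averaging over both query directions and the tanh mapping to a bounded reward are illustrative choices, not necessarily the exact formulation used in IndustRealSim.

```python
import numpy as np
import trimesh
from pysdf import SDF


def sdf_distance(mesh_a, mesh_b, a_to_b, num_points=1000):
    """Mean |SDF| of points sampled on mesh A, queried against mesh B's SDF.

    a_to_b is the 4x4 transform taking points from A's frame into B's frame.
    """
    # Area-weighted sampling of surface points on mesh A.
    points, _ = trimesh.sample.sample_surface(mesh_a, num_points)
    points_h = np.hstack([points, np.ones((num_points, 1))])
    points_in_b = (a_to_b @ points_h.T).T[:, :3]

    # Query mesh B's SDF at the transformed points.
    sdf_b = SDF(mesh_b.vertices, mesh_b.faces)
    return float(np.abs(sdf_b(points_in_b)).mean())


def sdf_dense_reward(plug_mesh, socket_mesh, plug_pose, socket_pose, scale=50.0):
    """Symmetric SDF distance between plug and socket, mapped to a reward in (0, 1]."""
    plug_to_socket = np.linalg.inv(socket_pose) @ plug_pose
    socket_to_plug = np.linalg.inv(plug_to_socket)
    dist = 0.5 * (sdf_distance(plug_mesh, socket_mesh, plug_to_socket)
                  + sdf_distance(socket_mesh, plug_mesh, socket_to_plug))
    return 1.0 - float(np.tanh(scale * dist))
```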

Sampling-Based Curriculum (SBC)


Curriculum learning is an established approach for solving long-horizon problems; as the agent learns, the difficulty of the task is gradually increased. To avoid overfitting to any single curriculum stage, we developed a Sampling-Based Curriculum (SBC), whereby the agent is exposed to the entire range of initial-state distributions from the start of the curriculum.
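The sketch below illustrates the sampling scheme, using the plug's initial height relative to the socket opening as the curriculum variable. The specific bounds, success threshold, and update step are illustrative placeholders, not the exact values used in IndustRealSim.

```python
import numpy as np


class SamplingBasedCurriculum:
    """Sample initial plug heights over a range whose easy end shrinks as the agent improves."""

    def __init__(self, easiest_height=-0.01, hardest_height=0.05,
                 success_threshold=0.8, step=0.005):
        self.lower_bound = easiest_height      # e.g., plug starts partially engaged in the socket
        self.hardest_height = hardest_height   # e.g., plug starts well above the socket opening
        self.success_threshold = success_threshold
        self.step = step

    def sample_initial_heights(self, num_envs):
        # The agent always sees the full range of difficulties, up to the hardest initial state.
        return np.random.uniform(self.lower_bound, self.hardest_height, size=num_envs)

    def update(self, recent_success_rate):
        # Remove the easiest initial states once recent performance is good enough.
        if recent_success_rate >= self.success_threshold:
            self.lower_bound = min(self.lower_bound + self.step, self.hardest_height)
```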


Here we visualize 3 different curriculum settings:

Policy Deployment in Reality

For sim-to-real transfer, we propose a policy-level action integrator (PLAI), which reduces steady-state error in the presence of unmodeled dynamics.
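The translation-only sketch below captures the core idea, assuming the policy outputs small incremental pose targets for a task-space (e.g., impedance) controller; orientation handling and the controller interface are omitted, and the names are illustrative.

```python
import numpy as np


class PolicyLevelActionIntegrator:
    """Accumulate the policy's incremental actions into a commanded target pose.

    Instead of adding each action to the robot's measured pose (where a small
    residual error may never be corrected), the action is added to the previously
    commanded target, so repeated small actions integrate over time, analogous to
    the integral term of a classical controller, and drive down steady-state error.
    """

    def __init__(self, initial_target_pos):
        self.target_pos = np.asarray(initial_target_pos, dtype=float)

    def step(self, delta_pos_action):
        # Integrate the incremental action into the running target position.
        self.target_pos = self.target_pos + np.asarray(delta_pos_action, dtype=float)
        return self.target_pos  # command sent to the low-level controller
```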

Evaluations

Simulation


We proposed three algorithms for improving learning of contact-rich Insert policies: SAPU, SDF-based reward, and SBC.


As a final evaluation, we comprehensively evaluated all three techniques in tandem (shown in Table 1). When training and testing with moderate state randomization (plug and hole randomization of ±10 mm and ±10 cm, respectively) and observation noise (±1 mm), the Pegs and Holes assembly Insert policy achieved success and engagement rates of 88.6% and 96.6%, respectively, whereas the Gears and Gearshafts assembly Insert policy achieved 82.0% and 85.2%.

Table 1. Joint evaluation of SAPU, SDF-Based Reward, and SBC. (A) Pegs and Holes assembly Insert policy. (B) Gears and Gearshafts assembly Insert policy. Engage denotes partial insertion. This table evaluates in-distribution and out-of-distribution performance. Each test was executed on 5 seeds, with 1000 trials each.

Real


After developing and validating our algorithms, we performed comprehensive experiments and demonstrations to evaluate our real-world system. Five types of evaluations were executed: Pick, Place, Insert, Pick-Place-Insert, and Sort.


We show quantitative evaluations for Pick, Place, Insert, and Pick-Place-Insert in Figure 1 and Table 2 below. The Sort demo is shown in our videos.

Figure 1. Place evaluation in Real.

Table 2. Real-world experimental results for Pick, Insert, and Pick-Place-Insert (PPI).