IndustReal: Transferring Contact-Rich Assembly Tasks
from Simulation to Reality

Updates:

1/4/24: IndustRealLib has been released! This release contains our code to deploy policies on a real-world Franka. Thank you for your patience! We experienced a few months of delays due to recent administrative and logistical changes on our team.

9/22/23: IndustRealSim has been released! This release contains our simulation environments and implementations of SAPU, SDF-based reward, and SBC for policy training. See the corresponding code and documentation within the IsaacGymEnvs repo.

8/22/23: IndustRealKit has been released! This release contains the CAD models and meshes for the assemblies used in IndustReal.

7/12/23: IndustReal was presented at RSS in Daegu, Korea. Thank you for stopping by or tuning in online!

Short Video (7.5 min)

Long Video (14 min)

This video has audio and is best viewed with sound turned on.

Abstract

Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on assembly. We present IndustReal, a set of algorithms, systems, and tools that solve assembly tasks in simulation with reinforcement learning (RL) and successfully achieve policy transfer to the real world. Specifically, we propose 1) simulation-aware policy updates, 2) signed-distance-field rewards, and 3) sampling-based curricula for robotic RL agents. We use these algorithms to enable robots to solve contact-rich pick, place, and insertion tasks in simulation. We then propose 4) a policy-level action integrator to minimize error at policy deployment time. We build and demonstrate a real-world robotic assembly system that uses the trained policies and action integrator to achieve repeatable performance in the real world. Finally, we present hardware and software tools that allow other researchers to fully reproduce our system and results.

Policy Learning in Simulation

For policy learning in simulation, we propose three methods that allow RL agents to solve contact-rich tasks:

Simulation-aware Policy Update (SAPU)


When simulating contact-rich tasks, spurious interpenetrations between assets are unavoidable, especially when executing in real time. Unfortunately, in simulation-based RL, an agent can exploit inaccurate collision dynamics to maximize reward, learning policies that are unlikely to transfer to the real world. Thus, we propose our first algorithm, a simulation-aware policy update (SAPU), which encourages the agent to learn policies that avoid interpenetrations.


As shown in the diagram below, for a given environment, the module takes as input the plug and socket meshes and their associated 6-DOF poses. The module samples N = 1000 points on/inside the plug mesh, transforms the points into the socket frame, computes their distances to the socket mesh, and returns the maximum interpenetration depth (Algorithm 1). This procedure is performed over each horizon, and the depth is used to weight the cumulative reward during the policy update.
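Below is a minimal sketch of this penetration check, assuming trimesh for area-weighted surface sampling and pysdf (also used for our SDF-based reward below) for distance queries. For simplicity, the sketch samples only surface points, and the sampling count, tolerance, and linear reward-weighting rule are illustrative placeholders rather than the exact values and scaling used in IndustRealSim.

```python
import numpy as np
import trimesh
from pysdf import SDF  # SDF queries against a triangle mesh


def max_interpenetration_depth(plug_mesh, socket_mesh, plug_pose, socket_pose, num_points=1000):
    """Estimate the deepest interpenetration of the plug into the socket.

    plug_pose / socket_pose are 4x4 homogeneous transforms (object frame -> world frame).
    """
    # Sample points on the plug surface (area-weighted), expressed in the plug frame.
    points, _ = trimesh.sample.sample_surface(plug_mesh, num_points)

    # Transform the sampled points from the plug frame into the socket frame.
    plug_to_socket = np.linalg.inv(socket_pose) @ plug_pose
    points_h = np.hstack([points, np.ones((num_points, 1))])
    points_in_socket = (plug_to_socket @ points_h.T).T[:, :3]

    # Query the socket SDF; pysdf returns positive values for points inside the mesh.
    socket_sdf = SDF(socket_mesh.vertices, socket_mesh.faces)
    signed_dists = socket_sdf(points_in_socket)

    # Deepest point inside the socket, or 0 if there is no penetration.
    return max(float(signed_dists.max()), 0.0)


def sapu_reward_weight(depth, tolerance=0.001, max_depth=0.01):
    """Illustrative weighting: full reward below a small tolerance, linearly scaled down above it."""
    if depth <= tolerance:
        return 1.0
    return max(1.0 - (depth - tolerance) / (max_depth - tolerance), 0.0)
```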

Signed Distance Field (SDF)-Based Reward


To provide a dense reward for contact-rich assembly, we propose a signed distance field (SDF)-based dense reward for RL agent training. As shown in the following diagram, for each plug and socket, we generate SDFs using pysdf. For each object, area-weighted sampling is used to select N = 1000 points on the surface of its mesh. For a given environment, SDF values are queried at these points using the SDF of the other object. This procedure is performed at each timestep and is used to generate a dense reward signal.
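The sketch below illustrates one way to realize this computation, again assuming trimesh and pysdf. The symmetric averaging over both query directions and the tanh mapping to a bounded reward are illustrative choices, not necessarily the exact formulation used in IndustRealSim.

```python
import numpy as np
import trimesh
from pysdf import SDF


def sdf_distance(mesh_a, mesh_b, a_to_b, num_points=1000):
    """Mean |SDF| of points sampled on mesh A, queried against mesh B's SDF.

    a_to_b is the 4x4 transform taking points from A's frame into B's frame.
    """
    # Area-weighted sampling of surface points on mesh A.
    points, _ = trimesh.sample.sample_surface(mesh_a, num_points)
    points_h = np.hstack([points, np.ones((num_points, 1))])
    points_in_b = (a_to_b @ points_h.T).T[:, :3]

    # Query mesh B's SDF at the transformed points.
    sdf_b = SDF(mesh_b.vertices, mesh_b.faces)
    return float(np.abs(sdf_b(points_in_b)).mean())


def sdf_dense_reward(plug_mesh, socket_mesh, plug_pose, socket_pose, scale=50.0):
    """Symmetric SDF distance between plug and socket, mapped to a reward in (0, 1]."""
    plug_to_socket = np.linalg.inv(socket_pose) @ plug_pose
    socket_to_plug = np.linalg.inv(plug_to_socket)
    dist = 0.5 * (sdf_distance(plug_mesh, socket_mesh, plug_to_socket)
                  + sdf_distance(socket_mesh, plug_mesh, socket_to_plug))
    return 1.0 - float(np.tanh(scale * dist))
```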

Sampling-Based Curriculum (SBC)


Curriculum learning is an established approach for solving long-horizon problems; as the agent learns, the difficulty of the task is gradually increased. To avoid overfitting to any single curriculum stage, we developed a Sampling-Based Curriculum (SBC), whereby the agent is exposed to the entire range of initial-state distributions from the start of the curriculum.
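The sketch below illustrates the sampling scheme, using the plug's initial height relative to the socket opening as the curriculum variable. The specific bounds, success threshold, and update step are illustrative placeholders, not the exact values used in IndustRealSim.

```python
import numpy as np


class SamplingBasedCurriculum:
    """Sample initial plug heights over a range whose easy end shrinks as the agent improves."""

    def __init__(self, easiest_height=-0.01, hardest_height=0.05,
                 success_threshold=0.8, step=0.005):
        self.lower_bound = easiest_height      # e.g., plug starts partially engaged in the socket
        self.hardest_height = hardest_height   # e.g., plug starts well above the socket opening
        self.success_threshold = success_threshold
        self.step = step

    def sample_initial_heights(self, num_envs):
        # The agent always sees the full range of difficulties, up to the hardest initial state.
        return np.random.uniform(self.lower_bound, self.hardest_height, size=num_envs)

    def update(self, recent_success_rate):
        # Remove the easiest initial states once recent performance is good enough.
        if recent_success_rate >= self.success_threshold:
            self.lower_bound = min(self.lower_bound + self.step, self.hardest_height)
```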


Here we visualize 3 different curriculum settings:

Policy Deployment in Reality

For sim-to-real transfer, we propose a policy-level action integrator (PLAI), which reduces steady-state error in the presence of unmodeled dynamics.
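The translation-only sketch below captures the core idea, assuming the policy outputs small incremental pose targets for a task-space (e.g., impedance) controller; orientation handling and the controller interface are omitted, and the names are illustrative.

```python
import numpy as np


class PolicyLevelActionIntegrator:
    """Accumulate the policy's incremental actions into a commanded target pose.

    Instead of adding each action to the robot's measured pose (where a small
    residual error may never be corrected), the action is added to the previously
    commanded target, so repeated small actions integrate over time, analogous to
    the integral term of a classical controller, and drive down steady-state error.
    """

    def __init__(self, initial_target_pos):
        self.target_pos = np.asarray(initial_target_pos, dtype=float)

    def step(self, delta_pos_action):
        # Integrate the incremental action into the running target position.
        self.target_pos = self.target_pos + np.asarray(delta_pos_action, dtype=float)
        return self.target_pos  # command sent to the low-level controller
```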

Evaluations

Simulation


We proposed three algorithms for improving learning of contact-rich Insert policies: SAPU, SDF-based reward, and SBC.


As a final evaluation, we comprehensively evaluated all three techniques in tandem (shown in Table 1). When training and testing with moderate state randomization (plug and hole randomization of ±10 mm and ±10 cm, respectively) and observation noise (±1 mm), the Pegs and Holes assembly Insert policy achieved success and engagement rates of 88.6% and 96.6%, respectively, whereas the Gears and Gearshafts assembly Insert policy achieved 82.0% and 85.2%.

Table 1. Joint evaluation of SAPU, SDF-Based Reward, and SBC. (A) Pegs and Holes assembly Insert policy. (B) Gears and Gearshafts assembly Insert policy. Engage denotes partial insertion. This table evaluates in-distribution and out-of-distribution performance. Each test was executed on 5 seeds, with 1000 trials each.

Real


After developing and validating our algorithms, we performed comprehensive experiments and demonstrations to evaluate our real-world system. Five types of evaluations were executed: Pick, Place, Insert, Pick-Place-Insert, and Sort.


We show quantitative evaluations for Pick, Place, Insert, and Pick-Place-Insert in Figure 1 and Table 2 below. The Sort demo is shown in our videos.

Figure 1. Place evaluation in Real.

Table 2. Real-world experimental results for Pick, Insert, and Pick-Place-Insert (PPI).