Efficient Sim-to-real Transfer of Contact-Rich Manipulation Skills with Online Admittance Residual Learning

Abstract

Learning contact-rich manipulation skills is essential to robotic applications. Such skills require robots to interact with the environment using feasible manipulation trajectories and suitable compliance control parameters to enable safe and stable contact. However, learning these skills is challenging due to data inefficiency in the real world and the sim-to-real gap in simulation. In this paper, we introduce a hybrid offline-online framework to learn robust manipulation skills. We employ model-free reinforcement learning in the offline phase to obtain the robot motion and compliance control parameters in simulation. Subsequently, in the online phase, we learn the residual of the compliance control parameters to maximize robot performance-related criteria with force sensor measurements in real time. To demonstrate the effectiveness and robustness of our approach, we provide comparative results against existing methods on assembly and pivoting tasks.

Framework Overview

We propose a framework to learn robot manipulation skills that transfer to the real world. The framework contains two phases: skill learning in simulation and admittance adaptation on the real robot. We use model-free RL with domain randomization to learn the robot's motion, which enhances robustness for direct transfer. The compliance control parameters are learned at the same time and serve as the initialization for online admittance learning. During online execution, we iteratively learn the residual of the admittance control parameters by optimizing a criterion that combines future trajectory smoothness and task completion. We conduct real-world experiments on two typical contact-rich manipulation tasks: assembly and pivoting. Our proposed framework achieves efficient transfer from simulation to the real world and shows excellent generalization to tasks with different kinematic or dynamic properties.
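To make the online phase concrete, the sketch below illustrates one residual-learning step under simplifying assumptions of our own (a diagonal virtual mass-damper model, a random-search optimizer, and placeholder cost terms); it is not the exact implementation used in the paper.

```python
import numpy as np

# Illustrative sketch of one online admittance residual-learning step.
# The diagonal mass-damper model, random-search optimizer, and cost terms
# are simplifying assumptions, not the paper's exact implementation.

def rollout_cost(params, recorded_wrench, x_goal, w, dt=0.002):
    """Replay recorded external forces through a diagonal admittance model
    and return the weighted smoothness / task-completion cost."""
    m, d = params[:3], params[3:]                 # per-axis virtual mass and damping
    x, v = np.zeros(3), np.zeros(3)
    accels = []
    for f_ext in recorded_wrench:                 # forces recorded from the last execution
        a = (f_ext - d * v) / m                   # m * xdd + d * xd = f_ext
        v = v + a * dt
        x = x + v * dt
        accels.append(a)
    smoothness = np.mean(np.sum(np.square(accels), axis=1))   # penalize oscillation
    task = float(np.sum((x - x_goal) ** 2))                   # penalize missing the goal
    return (1.0 - w) * smoothness + w * task

def residual_update(base_params, recorded_wrench, x_goal, w=0.4,
                    n_samples=64, sigma=0.1, seed=0):
    """Sample admittance residuals around the offline-learned parameters and
    keep the candidate with the lowest predicted cost."""
    rng = np.random.default_rng(seed)
    best_residual, best_cost = np.zeros_like(base_params), np.inf
    for _ in range(n_samples):
        cand = rng.normal(0.0, sigma, size=base_params.shape)
        params = np.clip(base_params + cand, 1e-3, None)       # keep gains positive
        cost = rollout_cost(params, recorded_wrench, x_goal, w)
        if cost < best_cost:
            best_residual, best_cost = cand, cost
    return base_params + best_residual            # gains applied on the next horizon
```

In this simplified form, the update runs at a fixed interval during execution and is warm-started by the compliance parameters learned in simulation.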

Learned Policies in Simulation

Assembly

Learned Policy in Simulation. Success rate: 100%

Pivoting

Learned Policy in Simulation. Success rate: 100%

Real World Experiments: Sim-to-real Transfer

direct_sim2real.mp4

Direct Transfer

Assembly Square Peg. Success Rate: 3/10

manual_sim2real.mp4

Manual Tune

Assembly Square Peg. Success Rate: 10/10

proposed_sim2real.mp4

Proposed

Assembly Square Peg. Success Rate: 10/10

pivot_wood_square_direct_transfer_edit.mp4

Direct Transfer

Pivoting Wood Square. Success Rate: 0/10

pivot_wood_square_manually_tuned_edit.mp4

Manual Tune

Pivoting Wood Square. Success Rate: 10/10

pivot_w_0_4_edit.mp4

Proposed

Pivoting Wood Square.  Success Rate: 9/10

Real World Experiments: Assembly Tasks Generalization

direct_triangle.mp4

Direct Transfer

Assembly Triangle Peg.  Success Rate: 0/10

manual_triangle.mp4

Manual Tune

Assembly Triangle Peg.  Success Rate: 8/10

proposed_triangle.mp4

Proposed

Assembly Triangle Peg.  Success Rate: 10/10

direct_pentagon.mp4

Direct Transfer

Assembly Pentagon Peg.  Success Rate: 1/10

manual_pentagon.mp4

Manual Tune

Assembly Pentagon Peg.  Success Rate: 9/10

proposed_pentagon.mp4

Proposed

Assembly Pentagon Peg.  Success Rate: 10/10

direct_ethernet.mp4

Direct Transfer

Assembly Ethernet Connector.  Success Rate: 0/10

manual_ethernet.mp4

Manual Tune

Assembly Ethernet Connector.  Success Rate: 1/10

proposed_ethernet.mp4

Proposed

Assembly Ethernet Connector.  Success Rate: 9/10

direct_waterproof.mp4

Direct Transfer

Assembly Waterproof Connector.  Success Rate: 0/10

manual_waterproof.mp4

Manual Tune

Assembly Waterproof Connector.  Success Rate: 0/10

proposed_waterproof.mp4

Proposed

Assembly Waterproof Connector.  Success Rate: 9/10

Real World Experiments: Pivoting Tasks Generalization 


adapter_learned_edit.mp4

Direct Transfer

Pivoting Adapter.  Success Rate: 0/10

pivot_adapter_manually_tuned_edit.mp4

Manual Tune

Pivoting Adapter.  Success Rate: 0/10

adapter_proposed_edit.mp4

Proposed

Pivoting Adapter.  Success Rate: 8/10

eraser_learned_edit.mp4

Direct Transfer

Pivoting Eraser.  Success Rate: 0/10

pivot_eraser_manually_tuned_edit.mp4

Manual Tune

Pivoting Eraser.  Success Rate: 10/10

eraser_proposed_edit.mp4

Proposed

Pivoting Eraser.  Success Rate: 9/10

pivot_poki_short_side_learned_edit.mp4

Direct Transfer

Pivoting Pocky Box (short).  Success Rate: 0/10

pivot_poki_short_side_manually_tuned_edit.mp4

Manual Tune

Pivoting Pocky Box (short).  Success Rate: 1/10

pivot_poki_short_side_proposed_edit.mp4

Proposed

Pivoting Pocky Box (short).  Success Rate: 8/10

poki_learned_edit.mp4

Direct Transfer

Pivoting Pocky Box (long).  Success Rate: 0/10

pivot_poki_long_side_manually_tunned_edit.mp4

Manual Tune

Pivoting Pocky Box (long).  Success Rate: 1/10

poki_proposed_edit.mp4

Proposed

Pivoting Pocky Box (long).  Success Rate: 7/10

Ablation on Weight Parameter Selection

We study the effect of the weight parameter in online admittance residual learning. We find that smaller weights prioritize trajectory smoothness, whereas larger weights prioritize task completion.
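For reference, a plausible form of the weighted objective is shown below, assuming a simple linear trade-off (the exact cost terms used in the paper may differ):

```latex
J(\Delta\theta) = (1 - w)\, J_{\mathrm{smooth}}(\Delta\theta) + w\, J_{\mathrm{task}}(\Delta\theta), \qquad 0 \le w \le 1
```

Here \(\Delta\theta\) denotes the admittance residual, \(J_{\mathrm{smooth}}\) penalizes oscillatory motion, and \(J_{\mathrm{task}}\) penalizes deviation from the task goal; w = 0 optimizes smoothness only and w = 1 optimizes task completion only.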

w=.0_part2.mp4

w=0.0

Assembly Task.  Success Rate: 8/10

w=.2.mp4

w=0.2

Assembly Task.  Success Rate: 9/10

proposed_sim2real.mp4

w=0.4

Assembly Task.  Success Rate: 10/10

w=.6.mp4

w=0.6

Assembly Task.  Success Rate: 10/10

w=.8.mp4

w=0.8

Assembly Task.  Success Rate: 10/10

w=1.mp4

w=1.0

Assembly Task.  Success Rate: 10/10

pivot_w_0_0_edit.mp4

w=0.0

Pivoting Task.  Success Rate: 10/10

pivot_w_0_2_edit.mp4

w=0.2

Pivoting Task.  Success Rate: 9/10

pivot_w_0_4_edit.mp4

w=0.4

Pivoting Task.  Success Rate: 9/10

pivot_w_0_6_edit.mp4

w=0.6

Pivoting Task.  Success Rate: 7/10

pivot_w_0_8_edit.mp4

w=0.8

Pivoting Task.  Success Rate: 9/10

pivot_w_1_0_edit.mp4

w=1.0

Pivoting Task.  Success Rate: 6/10

Additional Experiments: Screwing

screw_1.mp4
screw.mp4

We additionally evaluate the effectiveness of the framework on a screwing task. We applied the policy learned for the square peg-in-hole task, together with the proposed online admittance learning method, to an M10 bolt-nut assembly task and found that it robustly aligns the bolt with the nut. Therefore, instead of retraining from scratch, we reused this policy and added a rotation primitive for screwing. Throughout the entire process, online admittance learning continuously optimizes the admittance controller. With the proposed approach, the robot robustly aligns the nut with the bolt and screws it on stably.
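The composition described above might look roughly like the following sketch; the robot, policy, and learner interfaces are illustrative placeholders of our own, not the paper's actual API.

```python
import numpy as np

def screw_bolt(robot, policy, online_learner, n_turns=5):
    # Phase 1: align bolt and nut using the peg-in-hole policy learned in
    # simulation, while the admittance gains are adapted online from the
    # recorded wrench measurements.
    while not robot.is_aligned():
        robot.step(policy(robot.observation()))
        online_learner.update(robot.recorded_wrench())

    # Phase 2: rotation primitive for screwing, with the same online adaptation
    # running throughout.
    for _ in range(n_turns):
        robot.rotate_tool(angle=2 * np.pi)
        online_learner.update(robot.recorded_wrench())
```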


Ablation on Smaller Clearances

Learned policy on 1 mm clearance

Testing on 0.5 mm clearance

small_clearance_proposed.mp4

Testing with the proposed approach

Ablation on Reward Function Design

Original reward

Distance-based reward

dist_reward.mp4

Distance-based reward on the real robot

Ablation on Model-Free RL Algorithms

DDPG

ddpg_direct.mp4

Direct transfer

ddpg_proposed.mp4

Proposed method

TD3

td3_direct.mp4

Direct transfer

td3_proposed.mp4

Proposed method

DDPG for pivoting

TD3 for pivoting

Comparison of Contact Force

Here we compare the contact forces in simulation and the real world. We command the robot to move downward at the same speed in both settings until it makes contact with the table. The figure on the left shows the difference in contact force, which illustrates the sim-to-real gap.

Additional Study: Comparison of External Force Estimation 

Here we compare different external force estimation/modeling methods. The task is for an admittance-controlled robot to move toward a table and establish contact. The objective function consists purely of the FITAVE term, which encourages trajectory smoothness and reduces oscillation. We find that the 'record & replay' strategy performs better than fitting a force model to online data.
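As a rough illustration of the two strategies (the per-axis linear spring model and the helper functions below are our own simplifying assumptions), record & replay reuses the measured wrench sequence directly when evaluating candidate admittance parameters, whereas model fitting regresses a contact model from the same data and predicts forces from it:

```python
import numpy as np

def replay_forces(recorded_wrench, positions):
    """Record & replay: reuse the measured force sequence as-is."""
    return recorded_wrench

def fitted_forces(recorded_wrench, positions):
    """Model fitting: regress f = k * x + b per axis from the online data,
    then predict forces from the fitted model."""
    preds = np.zeros_like(recorded_wrench)
    for axis in range(recorded_wrench.shape[1]):
        A = np.stack([positions[:, axis], np.ones(len(positions))], axis=1)
        coef, *_ = np.linalg.lstsq(A, recorded_wrench[:, axis], rcond=None)
        preds[:, axis] = A @ coef
    return preds
```

Either force sequence can then be fed into the same admittance rollout used for the residual update; in the study below, only the record & replay variant yields stable contact.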

Model_fitting_edit.mp4

Force Model Fitting

With a poor initialization, the robot oscillates upon contacting the table. Here we fit a force model online and use it in our online admittance learning; however, the robot cannot stabilize itself.


Record_Replay_edit.mp4

Record & Replay

In comparison, with the same setup but using the record & replay strategy, online admittance residual learning efficiently reduces oscillation and establishes stable contact.

Additional Study: Utilizing Online Admittance Learning for Obtaining Optimal Admittance/Impedance Control Parameters

Here we demonstrate the effectiveness of using online admittance learning alone, starting from randomly initialized admittance control parameters. We let the robot contact tables of different materials. With randomly initialized parameters, the robot oscillates and cannot stabilize itself. With online admittance learning, the robot quickly finds suitable control parameters and eliminates the oscillation within one step.
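For context, a minimal discrete update of the kind of admittance controller being tuned here is sketched below; diagonal gains and explicit Euler integration are simplifying assumptions on our part.

```python
import numpy as np

def admittance_step(x, v, x_ref, f_ext, M, D, K, dt=0.002):
    """One discrete admittance update, M*xdd + D*xd + K*(x - x_ref) = f_ext.
    Poorly chosen M, D, K produce the bouncing seen in the baseline videos;
    the online learner adjusts these gains to damp the contact."""
    a = (f_ext - D * v - K * (x - x_ref)) / M
    v = v + a * dt
    x = x + v * dt
    return x, v
```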

Experiment 2 (Plastic Board Contact)_ Constant Gain Baseline (CGIC).mp4

Random Gain

Plastic Surface. Bouncing on the surface

Experiment 2 (Plastic Board Contact)_ Safe OnGO-VIC.mp4

Online Admittance Learning

Plastic Surface. Instantly eliminates oscillations.

Experiment 2 (Metal Board Contact)_ Constant Gain Baseline (CGIC).mp4

Random Gain

Metal Surface. Bouncing on the surface

Experiment 2 (Metal Board Contact)_ Safe OnGO-VIC.mp4

Online Admittance Learning

Metal Surface. Instantly eliminates oscillations.

Experiment 2 (Wood Chair Contact)_ Constant Gain Baseline.mp4

Random Gain

Wood Surface. Bouncing on the surface

Experiment 2 (Wood Chair Contact)_ Safe OnGO-VIC.mp4

Online Admittance Learning

Wood Surface. Instantly eliminates oscillations.