Abstract
Learning contact-rich manipulation skills is essential to many robotic applications. Such skills require the robot to interact with the environment using feasible manipulation trajectories and suitable compliance control parameters that enable safe and stable contact. However, learning these skills is challenging due to data inefficiency in the real world and the sim-to-real gap in simulation. In this paper, we introduce a hybrid offline-online framework for learning robust manipulation skills. In the offline phase, we employ model-free reinforcement learning in simulation to obtain the robot motion and compliance control parameters. In the online phase, we learn the residual of the compliance control parameters from real-time force sensor measurements to maximize task-performance criteria on the real robot. To demonstrate the effectiveness and robustness of our approach, we provide comparative results against existing methods on assembly and pivoting tasks.
Framework Overview
We propose a framework for learning robot manipulation skills that transfer to the real world. The framework consists of two phases: skill learning in simulation and admittance adaptation on the real robot. We use model-free RL with domain randomization to learn the robot's motion, which enhances robustness for direct transfer. The compliance control parameters are learned at the same time and serve as the initialization for online admittance learning. During online execution, we iteratively learn the residual of the admittance control parameters by optimizing a criterion combining future trajectory smoothness and task completion. We conduct real-world experiments on two typical contact-rich manipulation tasks: assembly and pivoting. Our framework achieves efficient transfer from simulation to the real world and shows excellent generalization to tasks with different kinematic or dynamic properties.
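To make the online phase concrete, below is a minimal sketch of one admittance residual update under a 1-DoF admittance model and a recorded ("record & replay") force window. The model, the cost terms, the Nelder-Mead optimizer, and all names (rollout_admittance, learn_residual, DT, the example gains and force window) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one online admittance residual update (assumed implementation).
import numpy as np
from scipy.optimize import minimize

DT = 0.002  # control period [s]; assumed value

def rollout_admittance(m, d, k, f_ext, x0=0.0, v0=0.0, x_ref=0.0):
    """Forward-simulate a 1-DoF admittance law  m*a + d*v + k*(x - x_ref) = f_ext
    over a replayed external-force window f_ext (the 'record & replay' idea)."""
    x, v = x0, v0
    traj = np.empty(len(f_ext))
    for t, f in enumerate(f_ext):
        a = (f - d * v - k * (x - x_ref)) / m
        v += a * DT
        x += v * DT
        traj[t] = x
    return traj

def objective(residual, base_params, f_ext, x_goal, w):
    """Weighted sum of a trajectory-smoothness cost and a task-completion cost."""
    m, d, k = np.maximum(base_params + residual, 1e-3)  # keep parameters positive
    traj = rollout_admittance(m, d, k, f_ext)
    smooth = np.sum(np.abs(np.diff(traj, n=2)))   # penalize oscillation
    task = abs(traj[-1] - x_goal)                 # distance to goal at the horizon end
    return smooth + w * task

def learn_residual(base_params, f_ext, x_goal, w=10.0):
    """Optimize the residual of the simulation-learned admittance parameters."""
    result = minimize(objective, x0=np.zeros(3),
                      args=(np.asarray(base_params, dtype=float), f_ext, x_goal, w),
                      method="Nelder-Mead")
    return result.x

# Example: refine assumed simulation-learned gains against a recorded contact-force window.
recorded_forces = np.concatenate([np.zeros(50), -5.0 * np.ones(200)])  # synthetic example data
delta = learn_residual(base_params=[2.0, 10.0, 400.0], f_ext=recorded_forces, x_goal=-0.01)
```

In practice the same idea extends to the full set of admittance parameters along each axis; a single dimension is used here only to keep the sketch short.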
Learned Policies in Simulation
Learned Policy in Simulation. Success Rate: 100%
Learned Policy in Simulation. Success Rate: 100%
Real World Experiments: Sim-to-real Transfer
Assembly Square Peg. Success Rate: 3/10
Assembly Square Peg. Success Rate: 10/10
Assembly Square Peg. Success Rate: 10/10
Pivoting Wood Square. Success Rate: 0/10
Pivoting Wood Square. Success Rate: 10/10
Pivoting Wood Square. Success Rate: 9/10
Real World Experiments: Assembly Tasks Generalization
Assembly Triangle Peg. Success Rate: 0/10
Assembly Triangle Peg. Success Rate: 8/10
Assembly Triangle Peg. Success Rate: 10/10
Assembly Pentagon Peg. Success Rate: 1/10
Assembly Pentagon Peg. Success Rate: 9/10
Assembly Pentagon Peg. Success Rate: 10/10
Assembly Ethernet Connector. Success Rate: 0/10
Assembly Ethernet Connector. Success Rate: 1/10
Assembly Ethernet Connector. Success Rate: 9/10
Assembly Waterproof Connector. Success Rate: 0/10
Assembly Waterproof Connector. Success Rate: 0/10
Assembly Waterproof Connector. Success Rate: 9/10
Real World Experiments: Pivoting Tasks Generalization
Pivoting Adapter. Success Rate: 0/10
Pivoting Adapter. Success Rate: 0/10
Pivoting Adapter. Success Rate: 8/10
Pivoting Eraser. Success Rate: 0/10
Pivoting Eraser. Success Rate: 10/10
Pivoting Eraser. Success Rate: 9/10
Pivoting Pocky Box (short). Success Rate: 0/10
Pivoting Pocky Box (short). Success Rate: 1/10
Pivoting Pocky Box (short). Success Rate: 8/10
Pivoting Pocky Box (long). Success Rate: 0/10
Pivoting Pocky Box (long). Success Rate: 1/10
Pivoting Pocky Box (long). Success Rate: 7/10
Ablation on Weight Parameter Selection
We study the effect of the weight parameter in online admittance residual learning. We find that smaller weights tend to prioritize trajectory smoothness, whereas larger weights prioritize task completion.
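For reference, one plausible form of the weighted objective behind this ablation, assuming a single scalar weight w, with Δθ denoting the admittance-parameter residual and τ(Δθ) the predicted future trajectory (our notation, not necessarily the paper's exact formulation):

```latex
J(\Delta\theta) = C_{\mathrm{smooth}}\!\left(\tau(\Delta\theta)\right) + w\, C_{\mathrm{task}}\!\left(\tau(\Delta\theta)\right)
```

A small w emphasizes the smoothness term, while a large w emphasizes the task-completion term.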
Assembly Task. Success Rate: 8/10
Assembly Task. Success Rate: 9/10
Assembly Task. Success Rate: 10/10
Assembly Task. Success Rate: 10/10
Assembly Task. Success Rate: 10/10
Assembly Task. Success Rate: 10/10
Pivoting Task. Success Rate: 10/10
Pivoting Task. Success Rate: 9/10
Pivoting Task. Success Rate: 9/10
Pivoting Task. Success Rate: 7/10
Pivoting Task. Success Rate: 9/10
Pivoting Task. Success Rate: 6/10
Additional Experiments: Screwing
We additionally evaluate the effectiveness of the framework on a screwing task. We directly applied the policy learned for the square peg-in-hole task, together with the proposed online admittance learning, to an M10 bolt-nut assembly task and found that it robustly aligns the bolt with the nut. Therefore, instead of retraining from scratch, we reused this policy and added a rotation primitive for screwing. Throughout the entire process, online admittance learning continuously optimizes the admittance controller. With the proposed approach, the robot robustly aligns the nut and bolt and screws them together stably.
Ablation on Smaller Clearances
Ablation on Reward Function Design
Ablation on Model-free RL Algorithms
Comparison of Contact Force
Here we compare the contact forces in simulation and the real world. We command the robot to move downward at the same speed in both settings to make contact with the table. The figure on the left shows the difference in contact force, which illustrates the sim-to-real gap.
Additional Study: Comparison of External Force Estimation
Here we compare the performance of different external force estimation/modeling methods. The task is for an admittance-controlled robot to move toward a table and establish contact. The objective function is purely the FITAVE objective, which promotes trajectory smoothness and reduces oscillation. We found that the 'record & replay' strategy performs better than fitting a force model from online data (both strategies are sketched at the end of this section).
With a bad initialization, the robot oscillates upon contacting the table. Here we fit a force model online and use it in our online admittance learning; however, the robot cannot stabilize itself.
In comparison, with the same setup but using the record & replay strategy, online admittance residual learning efficiently reduces the oscillation and establishes stable contact.
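For clarity, here is a minimal sketch of the two strategies compared above; the linear spring contact model and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two external-force strategies compared above; the linear spring
# contact model and all function names are illustrative assumptions.
import numpy as np

def fit_contact_model(penetrations, forces):
    """'Fit a force model online': least-squares fit of f ~ k_c * penetration from the
    samples collected so far; the returned model predicts forces for candidate motions."""
    coef, *_ = np.linalg.lstsq(np.asarray(penetrations, dtype=float).reshape(-1, 1),
                               np.asarray(forces, dtype=float), rcond=None)
    return lambda penetration: float(coef[0]) * penetration

def replayed_forces(recorded_forces):
    """'Record & replay': reuse the measured force window verbatim when evaluating
    candidate admittance parameters, instead of predicting forces from a model."""
    return np.asarray(recorded_forces, dtype=float).copy()
```

One plausible reason for the difference seen in the clips is that replaying the measured forces keeps the optimization from compounding force-prediction errors, whereas a model fitted from a short, oscillatory contact window can be inaccurate.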
Additional Study: Utilizing Online Admittance Learning for Obtaining Optimal Admittance/Impedance Control Parameters
Here we demonstrate the effectiveness of using online admittance learning alone, starting from randomly initialized admittance control parameters. We let the robot contact tables made of different materials. With randomly initialized parameters, the robot oscillates and cannot stabilize itself. With online admittance learning, the robot instantly finds suitable control parameters and eliminates the oscillation within one optimization step (a closed-form intuition for this one-step behaviour is sketched after the surface comparisons below).
Plastic Surface. Bouncing on the surface.
Plastic Surface. Instantly eliminates oscillations.
Metal Surface. Bouncing on the surface.
Metal Surface. Instantly eliminates oscillations.
Wood Surface. Bouncing on the surface.
Wood Surface. Instantly eliminates oscillations.
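As a closed-form intuition for the one-step behaviour shown above (an illustrative stand-in using a critical-damping heuristic, not the paper's optimizer): once the surface stiffness can be estimated from the recorded contact window, a non-oscillatory damping gain follows directly.

```python
# Illustrative stand-in for the one-step behaviour above (a critical-damping heuristic,
# not the paper's method): with an estimated surface stiffness k_c, a damping gain that
# critically damps the coupled dynamics  m*a + d*v + (k + k_c)*x = 0  is closed form.
import numpy as np

def critically_damped_gain(m, k, k_c):
    """Damping d that makes the coupled robot-environment dynamics critically damped."""
    return 2.0 * np.sqrt(m * (k + k_c))

# Example with assumed values: virtual mass 2 kg, virtual stiffness 400 N/m, and an
# estimated surface stiffness of 5e4 N/m (roughly a stiff tabletop).
d_new = critically_damped_gain(m=2.0, k=400.0, k_c=5.0e4)
print(f"critically damped gain: {d_new:.1f} N*s/m")
```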