Transferring Tactile-based Continuous Force Control Policies from Simulation to Robot

Luca Lach, Robert Haschke, Davide Tateo, Jan Peters, Helge Ritter, Júlia Borràs, Carme Torras

Abstract

The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks. An important line of research in this regard is grasp force control, which aims to manipulate objects safely by limiting the force exerted on them. While prior works have either hand-modeled their force controllers, relied on model-based approaches, or not demonstrated sim-to-real transfer, we propose a model-free deep reinforcement learning approach that is trained in simulation and then transferred to the robot without further fine-tuning.

To this end, we present a simulation environment that produces realistic normal forces, which we use to train continuous force control policies. An evaluation against a baseline, together with an ablation study, shows that our approach outperforms the hand-modeled baseline, and that our proposed inductive bias and domain randomization facilitate sim-to-real transfer.

Code | Paper* | Models | CAD

* Points to the version published at the NeurIPS 2023 Workshop on Touch Processing. A conference preprint will be made available upon acceptance.

Simulation Environment

A schematic overview of the grasping scenario we consider is shown on the left (upper); it details all parameters needed to define the scenario. The gripper is depicted in its fully open state, with an object located between the fingers somewhere along the grasping axis.

W and O refer to the world and object frames, where W is considered fixed w.r.t. the gripper base and centered between the fingertips. If the offset oy is non-zero, the symmetric object is displaced w.r.t. this center, causing one fingertip to touch it earlier than the other. The object width is defined by wo, and the maximum penetration depth (or object deformation) is given by dp. Softer objects can be deformed more and thus have larger values of dp.

As the controller should learn to maintain the object's position during grasping, we sample oy at episode initialization, exposing the learner to different object-gripper alignments. wo is also varied so that the policy does not implicitly assume all objects to be of equal width.
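For illustration, a minimal sketch of how such per-episode randomization could look in a reset function (the parameter ranges and the inclusion of kappa here are placeholders, not the exact values used in our environment):

```python
import numpy as np

# Hypothetical randomization ranges; the actual values used in our
# environment differ and are documented in the paper and code.
OY_RANGE = (-0.015, 0.015)   # object offset along the grasping axis [m]
WO_RANGE = (0.02, 0.06)      # object width [m]
KAPPA_RANGE = (0.1, 1.0)     # object softness (scales max. deformation dp)

def sample_episode_params(rng: np.random.Generator) -> dict:
    """Draw a fresh object configuration at the start of each episode."""
    return {
        "oy": rng.uniform(*OY_RANGE),        # displaces the object towards one finger
        "wo": rng.uniform(*WO_RANGE),        # keeps the policy from assuming a fixed width
        "kappa": rng.uniform(*KAPPA_RANGE),  # varies how much the object can deform
    }

# Example: re-sample on every episode reset
rng = np.random.default_rng(seed=0)
params = sample_episode_params(rng)
```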

We then performed a series of robot experiments to calibrate the MuJoCo simulation, depicted on the left (lower), so that it reproduces realistic joint and sensor behavior. The results can be seen in the figures below.

Variation of actuator parameter b2

Variation of solver impedance

Variation of force scaling and impedance
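As a rough illustration of how such parameters can be perturbed via the MuJoCo Python bindings (the model path, geom and actuator names, and the mapping of b2 to an actuator bias term are assumptions for this sketch, not our exact setup):

```python
import mujoco

# Hypothetical scene file; our actual gripper model is provided via the Models/CAD links.
model = mujoco.MjModel.from_xml_path("gripper_scene.xml")
data = mujoco.MjData(model)

# Vary the contact solver impedance of a fingertip geom (solimp[0:2] = dmin, dmax).
geom_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_GEOM, "fingertip_left")
model.geom_solimp[geom_id, 0] = 0.9   # dmin
model.geom_solimp[geom_id, 1] = 0.95  # dmax

# Vary an actuator bias parameter (assumed here to correspond to b2,
# i.e. the velocity/damping term of a position actuator).
act_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_ACTUATOR, "finger_left_joint")
model.actuator_biasprm[act_id, 2] = -50.0

mujoco.mj_step(model, data)
print(data.sensordata)  # raw sensor readings, e.g. contact forces before any rescaling
```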

Experimental Evaluation

We use Proximal Policy Optimization (PPO) to train the grasping policies. We first evaluate our proposed method in simulation, and then apply the policies to the real robot and compare them in terms of force reward and object movements.

We run training for a total of 4M steps with an episode length of 150 steps. At the beginning of each episode, all randomization parameters are sampled anew. Our network consists of two fully connected layers with 50 neurons each and ReLU activations. The output layer has two neurons, one for each finger's desired position delta.
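A minimal sketch of this training setup using Stable-Baselines3 (the environment ID is a placeholder; the two-neuron output layer follows from the environment's 2-D continuous action space, and the full hyperparameters are in the linked code):

```python
import torch as th
from stable_baselines3 import PPO

# Placeholder environment ID; our actual MuJoCo grasping environment is in the linked code.
env_id = "TactileGrasping-v0"

policy_kwargs = dict(
    net_arch=[50, 50],         # two fully connected hidden layers, 50 neurons each
    activation_fn=th.nn.ReLU,  # ReLU activations
)

model = PPO(
    "MlpPolicy",
    env_id,
    policy_kwargs=policy_kwargs,
    n_steps=150,               # matches the episode length of 150 steps
    verbose=1,
)
model.learn(total_timesteps=4_000_000)  # 4M environment steps in total
```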

In a first evaluation, we compare our policy against a hand-crafted baseline model and perform an ablation study over two components of our method. Results are shown on the left (upper), where all models are evaluated on different object softness values kappa. All models perform well except for the one trained without domain randomization.

To assess whether our policies are general enough to be transferred to the real robot, we evaluate them on TIAGo using six test objects of varying stiffness. We perform 20 grasping trials per object and method, yielding 6 × 4 × 20 = 480 trials in total. In each trial, the object is offset towards one finger; in half of the trials it is placed closer to the left finger, in the other half closer to the right. The object is placed on millimeter paper, allowing us to measure its movement during the grasp. The policy is then commanded to perform a grasp, and after 6 seconds (150 steps at 25 Hz) the gripper is automatically opened again. After the reward is computed and the traveled distance measured, the process repeats. Note that the reward is not directly comparable to the simulation results, as it only includes the force reward. The object movement penalty would require measuring the object velocity at each time step; since this information is not available to us, we instead measure the total object displacement after each trial.
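For clarity, the per-trial procedure can be summarized as in the following sketch (the robot and policy interface functions are hypothetical placeholders; only the timing reflects our protocol):

```python
import time

CONTROL_HZ = 25
EPISODE_STEPS = 150  # 150 steps at 25 Hz = 6 s per grasp

def run_trial(policy, robot):
    """One real-robot grasping trial; all helper methods are hypothetical."""
    robot.open_gripper()
    obs = robot.get_observation()
    force_reward = 0.0
    for _ in range(EPISODE_STEPS):
        action = policy.predict(obs)             # desired position delta per finger
        robot.apply_position_delta(action)
        time.sleep(1.0 / CONTROL_HZ)
        obs = robot.get_observation()
        force_reward += robot.force_reward(obs)  # force term only; no movement penalty
    robot.open_gripper()
    displacement = robot.measure_displacement()  # read off the millimeter paper
    return force_reward, displacement
```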

Table I shows the results of the real-world evaluation. πIB shows the strongest overall performance with an average reward of 109, while the baseline achieves a slightly lower average reward of 107. Our experiments show that the inductive bias indeed facilitates sim-to-real performance, as evidenced by the lower average reward and larger object movements of πNO-IB. Furthermore, the (mostly) poor performance of πNO-RAND shows that domain randomization is a vital ingredient for both generalization over different objects and sim-to-real transfer. πNO-RAND was nevertheless able to perform well on rather soft objects, likely because they behave similarly to the simulated object configuration it was trained on with kappa = 0.5, and because our default choice for b2 closely mirrors the robot's behavior. Note that all models exhibit slightly worse performance in terms of object movement for the Sponge compared to the other objects. This is due to the Sponge being so light that the sensors sometimes fail to detect first contact, a phenomenon also reported in other works.

The evaluation clearly shows that domain randomization is crucial for successful zero-shot policy transfer, and that domain knowledge in the form of inductive biases further facilitates the transfer. Without domain randomization, policies overfit to their narrow training distribution and fail to generalize. Our proposed simulation environment has been shown to generate realistic forces, making the transfer of continuous control policies possible.

Video: force_ctrl_rl_icra2024_small.mp4

If you have any questions, feel free to contact us!