To assess whether our method transfers to the real world without fine-tuning, we evaluate several policies on TIAGo using ten test objects of varying stiffness. We exclude πNO-CURR, as it did not learn any rewarding behavior, and πNO-PEN, since it was clearly unable to minimize object movements. πNO-RAND, on the other hand, is included in the real-world evaluation to test whether it performs well on a specific object type that is similar to the environment configuration it was trained on. We perform 20 grasping trials per object and method, yielding 20×10×4 = 800 real-world trials in total.
The goal force f_goal is sampled randomly; the upper and lower bounds of the sampling interval were determined from the real-world experiment results in Sec. III-C. In each trial, the object is offset toward one finger: in half of the trials it is placed closer to the left finger, and in the other half closer to the right. The policy is then commanded to perform a grasp, and after 6 seconds (150 steps at 25 Hz) the gripper is automatically opened again. Once the reward is computed and the traveled distance is measured, the process repeats.
Note that the force reward is not comparable between objects, since it depends on the object's softness and width, which determine for how many time steps a force reward can be achieved. Wider objects come into contact with the fingers earlier, so force rewards are generated in more time steps than for narrower objects. The softer an object is, the more slowly the force builds up, leading to a reduced reward.
Instead of measuring and integrating the object velocity to calculate the total object displacement for each trial, we placed the objects on millimeter paper, annotated the start and end positions, and calculated the difference. No displacements are reported for the Mug, as it is almost as wide as the gripper opening and would therefore not move during a grasp regardless of the policy.
Because millimeter paper was used, the reported real-world object displacements are less precise than those measured in simulation, so the two cannot be compared directly.
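
The bookkeeping behind this protocol can be illustrated with a minimal Python sketch. It is not the code used in the experiments. The policy and object names are hypothetical placeholders, the ordering of left and right offsets within a block of 20 trials is an assumption (the text only states that each side is used in half of the trials), and the displacement is assumed to be the Euclidean distance between two annotated (x, y) positions in millimeters.

```python
import math
from itertools import product

CONTROL_RATE_HZ = 25    # control frequency stated in the text
GRASP_DURATION_S = 6    # grasp duration before the gripper reopens
TRIALS_PER_PAIR = 20    # trials per object and method


def steps_per_grasp(duration_s=GRASP_DURATION_S, rate_hz=CONTROL_RATE_HZ):
    """Number of control steps executed during one grasp."""
    return duration_s * rate_hz


def build_schedule(policies, objects, trials=TRIALS_PER_PAIR):
    """Enumerate (policy, object, offset_side) tuples for all trials.

    Assumption: the first half of each block is offset to the left finger
    and the second half to the right; the real ordering is not specified.
    """
    schedule = []
    for policy, obj in product(policies, objects):
        for i in range(trials):
            side = "left" if i < trials // 2 else "right"
            schedule.append((policy, obj, side))
    return schedule


def displacement_mm(start, end):
    """Distance between annotated start and end positions (assumed 2D, in mm)."""
    return math.hypot(end[0] - start[0], end[1] - start[1])


# Hypothetical placeholder names; the actual policies and objects differ.
policies = [f"policy_{i}" for i in range(4)]
objects = [f"object_{i}" for i in range(10)]
schedule = build_schedule(policies, objects)

print(len(schedule))                                 # 20 * 10 * 4 = 800
print(steps_per_grasp())                             # 6 s * 25 Hz = 150
print(displacement_mm((10.0, 20.0), (13.0, 24.0)))   # 5.0
```

Since the annotations are read off millimeter paper, the resulting distance is only as precise as that reading, which is why the real-world values are coarser than the simulated ones.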
For the complete discussion of the evaluation, please refer to the paper (link at the top of the page).