DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning
Supplementary File
Supplementary file with detailed application settings and more experiments: link
Abstract
Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp uses a geometric surface-matching grasp quality metric. It adopts a gradient-based optimization scheme on the metric that also considers parallel sampling and collision handling. This not only drastically accelerates the grasp search process over the object surface but also makes it differentiable. We apply DiPGrasp to three applications, namely grasp dataset construction, mask-conditioned planning, and pose refinement. For dataset generation, as a standalone planner, DiPGrasp has clear advantages in speed and quality in comparison with several classic planners. For mask-conditioned planning, it can turn a 3D perception model into a 3D grasp detection model instantly. As a pose refiner, it can optimize the coarse grasp prediction from the neural network, as well as the neural network parameters. Finally, we conduct real-world experiments with the Barrett hand and Schunk SVH 5-finger hand.
Method
DiPGrasp takes a point cloud with normals as input. It first samples locations on the point cloud (red dot) and initializes the poses
accordingly. Then it runs the differentiable optimization process to generate the grasps.
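The sample-then-optimize process above can be sketched as batched gradient descent over many candidate poses at once. The snippet below is a minimal, illustrative analogue in NumPy, not the paper's actual implementation: poses are reduced to 3-DoF translations and the surface-matching metric to the squared distance to the nearest surface point, whereas the real planner optimizes full gripper poses and joint angles with a richer metric and collision handling.

```python
import numpy as np

def optimize_parallel(poses, surface_pts, lr=0.1, steps=100):
    """Batched gradient descent: every sampled pose is refined in parallel.
    Toy metric: squared distance from each pose to its nearest surface point
    (the analytic gradient of ||p - q||^2 is 2 * (p - q))."""
    for _ in range(steps):
        # pairwise distances between the B poses and the N surface points
        d = np.linalg.norm(poses[:, None, :] - surface_pts[None, :, :], axis=-1)
        nearest = surface_pts[d.argmin(axis=1)]   # (B, 3) closest point per pose
        poses = poses - lr * 2.0 * (poses - nearest)
    return poses
```

Because every pose in the batch is updated with the same vectorized operations, the search over many sampled locations maps naturally onto a GPU.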
Applications
Grasp Dataset Generation
Extremely fast planning: For a Schunk SVH hand, it takes 2.5 s to search over 80 locations (depending on GPU memory), resulting in 21 valid grasps, i.e., 118 ms per valid grasp on average. For a Barrett hand, it takes 30 ms per valid grasp on average.
Physics-based simulation evaluation: We evaluate the grasp poses in a Unity-based simulator. Note that grasps validated by the planning algorithm may still be rejected by the simulation evaluation; this is common, since the contact models of the planning algorithm and the physics-based simulation differ.
To enable parallel grasp evaluation in simulation:
1. We load the object and the corresponding grasp pose; the initial object pose is adjusted to a hand-centered pose.
2. The gripper tries to grasp the object and lift it 20 cm.
3. We change the gravity direction multiple times.
If the object is still in the gripper's hand, we consider it a valid grasp after the physics-based simulation evaluation.
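The validity check above can be sketched as follows. Here `attempt_grasp` is a hypothetical stand-in for one simulation episode (grasp, lift 20 cm, apply the given gravity direction), and the particular set of gravity directions is illustrative; neither is part of the paper's actual API.

```python
import numpy as np

# Illustrative gravity directions to test; the exact set used in the
# paper's evaluation is not specified here.
GRAVITY_DIRS = [
    np.array([0.0, 0.0, -1.0]),   # downward
    np.array([0.0, 0.0, 1.0]),    # upward
    np.array([1.0, 0.0, 0.0]),    # sideways
    np.array([-1.0, 0.0, 0.0]),
]

def is_valid_grasp(attempt_grasp, lift_height=0.20):
    """A grasp passes only if the object stays in the gripper's hand
    after the 20 cm lift, for every tested gravity direction.
    `attempt_grasp(gravity_dir, lift_height)` -> bool is a stand-in
    for one physics-simulation episode."""
    return all(attempt_grasp(g, lift_height) for g in GRAVITY_DIRS)
```

Requiring the grasp to survive several gravity directions filters out poses that merely balance the object rather than constrain it.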
Scene Construction
Finally, we put the object models into a scene and generate the IR-based scene point cloud, 3D annotations, and grasp annotations for later use in mask-conditioned planning.
Mask-Conditioned Planning
Standalone & differentiable: We can use DiPGrasp as a plug-and-play module for a pure perception model like Mask3D. When DiPGrasp is appended to Mask3D, it can directly conduct grasp planning on each segmented object point cloud, turning a 3D instance segmentation framework into a 3D dexterous grasp detection framework.
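The plug-and-play wiring can be sketched as a simple loop: the segmentation model yields per-instance masks, and the planner runs on each cropped object cloud. Function names below are illustrative, not Mask3D's or DiPGrasp's actual API.

```python
import numpy as np

def plan_per_instance(points, instance_masks, planner):
    """Run a grasp planner independently on each segmented object.
    points: (N, 3) scene point cloud; instance_masks: iterable of (N,)
    boolean masks from an instance segmentation model; planner: any
    callable mapping an object point cloud to grasp poses."""
    grasps = {}
    for i, mask in enumerate(instance_masks):
        obj_pts = points[mask]        # crop the scene cloud to one instance
        if len(obj_pts) == 0:         # skip empty segments
            continue
        grasps[i] = planner(obj_pts)
    return grasps
```

Since the planner only needs a per-object point cloud, any segmentation backbone that produces instance masks can be upgraded to a grasp detector this way.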
Pose Refinement
Differentiable: We can refine the predicted coarse grasp poses by appending DiPGrasp to a simple neural network regressor. Even when the neural network prediction is extremely coarse, as in the case below, we can still progressively turn it into a better grasp.
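Because the refinement step is itself differentiable, its gradient can also flow back into the network weights. The toy example below makes that concrete under strong simplifications, and is an illustrative sketch rather than the paper's training setup: a linear "regressor" W predicts a 3-DoF pose from features x, one refinement step pulls the pose toward a fixed target q (standing in for the surface-matching optimum), and the post-refinement loss updates W via the chain rule.

```python
import numpy as np

def refine(p, q, lr=0.3):
    # one gradient step on the toy energy E(p) = 0.5 * ||p - q||^2
    return p - lr * (p - q)

def train_step(W, x, q, lr_pose=0.3, lr_net=0.1):
    """Hypothetical sketch: predict a coarse pose, refine it once, and
    backpropagate the post-refinement loss into the 'network' W."""
    p = W @ x                        # coarse prediction from the regressor
    p_ref = refine(p, q, lr_pose)    # differentiable refinement step
    loss = 0.5 * np.sum((p_ref - q) ** 2)
    # chain rule through the refinement: p_ref - q = (1 - lr_pose) * (p - q),
    # so dL/dp = (1 - lr_pose)^2 * (p - q), and dL/dW = dL/dp * x^T
    grad_W = np.outer((1 - lr_pose) ** 2 * (p - q), x)
    return W - lr_net * grad_W, loss
```

The point of the sketch is the gradient path: the loss is measured after refinement, yet it still improves the predictor, which is what lets DiPGrasp optimize both the coarse pose and the network parameters.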
Real-World Experiments
Real-world experiments are based on the mask-conditioned planning model trained on data generated by the grasp dataset generation pipeline.
For the real-world experiments, please see the video in the supplementary material.