Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering

TL;DR: A re-interpretation of robotic grasping as neural surface rendering for learning global and local representations that enable effective any-view grasping.

Our method, NeuGraspNet, takes just a single random-view depth input, encodes the scene in an implicit feature volume, and uses multi-level rendering to select relevant features and predict grasping functions. NeuGraspNet generalizes to random-view mobile manipulation grasping scenarios.

Summary: 

We introduce a novel, fully implicit 6DoF grasp detection method, NeuGraspNet, that re-interprets robotic grasping as surface rendering and predicts high-fidelity grasps from a single, random viewpoint of a scene. Our method exploits a learned implicit geometric scene representation to perform global and local surface rendering. This enables effective grasp candidate generation (using global features) and grasp quality prediction (using local features from a shared feature space). Our local neural surface rendering allows the model to encode the interaction between the robot's end-effector and the objects' surface geometry. NeuGraspNet outperforms existing implicit and semi-implicit baselines in the literature. We demonstrate its real-world applicability with a mobile manipulator robot grasping in open, cluttered spaces: the robot renders the scene, reasons about graspable areas of different objects, and selects grasps likely to succeed without colliding with the environment.
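To make the local rendering idea concrete, below is a minimal PyTorch sketch of per-grasp surface rendering from a learned occupancy field: rays defined in the gripper's frame are marched through the scene, the surface is taken as the first point where occupancy crosses 0.5, and features for the hit points are queried from the shared feature volume. All names here (`render_local_surface`, `occ_fn`, `feat_fn`) and the fixed-step ray marching are illustrative assumptions, not NeuGraspNet's exact implementation.

```python
import torch

def render_local_surface(occ_fn, feat_fn, grasp_R, grasp_t,
                         ray_origins, ray_dirs, n_steps=32, max_depth=0.08):
    """Hypothetical per-grasp surface rendering via occupancy queries.

    occ_fn:  callable, world points (N, 3) -> occupancy probabilities (N,)
    feat_fn: callable, world points (N, 3) -> features (N, C) from the shared volume
    grasp_R: (3, 3) grasp rotation; grasp_t: (3,) grasp translation
    ray_origins, ray_dirs: (R, 3) rays defined in the gripper's local frame
    """
    # Express the gripper-frame rays in world coordinates: p_w = R @ p_l + t.
    origins_w = ray_origins @ grasp_R.T + grasp_t
    dirs_w = ray_dirs @ grasp_R.T

    # Uniformly sample depths along each ray and query occupancy.
    depths = torch.linspace(0.0, max_depth, n_steps)                          # (S,)
    pts = origins_w[:, None, :] + depths[None, :, None] * dirs_w[:, None, :]  # (R, S, 3)
    occ = occ_fn(pts.reshape(-1, 3)).reshape(pts.shape[:2])                   # (R, S)

    # The surface is the first sample where occupancy crosses 0.5 on each ray.
    crossed = occ > 0.5
    hit = crossed.any(dim=1)              # rays that actually hit a surface
    first = crossed.float().argmax(dim=1) # index of the first crossing per ray
    surf_w = pts[torch.arange(pts.shape[0]), first][hit]                      # (H, 3)

    # Query features for the rendered points from the shared volume, and map
    # the points back into the gripper frame for the quality network.
    feats = feat_fn(surf_w)                                                   # (H, C)
    surf_local = (surf_w - grasp_t) @ grasp_R
    return surf_local, feats
```

Expressing the rendered points in the gripper frame is what lets the quality network reason about the gripper-surface interaction directly, independent of where the grasp sits in the scene.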


NeuGraspNet: A single-view 3D Truncated Signed Distance Field (TSDF) grid is processed by a convolutional occupancy network to reconstruct the scene. The occupancy network performs global, scene-level rendering, and the rendered scene is used to generate grasp candidates in SE(3). We re-interpret grasping as rendering of local surface points and query their features from the shared 3D feature volume. The local points, their features, and the 6DoF grasp pose are passed to a grasping PointNet to predict per-grasp quality. NeuGraspNet effectively learns the interaction between the objects' geometry and the gripper to detect high-fidelity grasps.
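The caption above lists the pipeline's stages in order; the sketch below shows one plausible way to wire them together in PyTorch. Every component name and signature is a hypothetical placeholder for the corresponding stage described above, not NeuGraspNet's released code.

```python
import torch
import torch.nn as nn

class NeuGraspPipelineSketch(nn.Module):
    """Illustrative wiring of the described stages (all submodules are assumed)."""

    def __init__(self, encoder, scene_renderer, candidate_sampler,
                 local_renderer, quality_head):
        super().__init__()
        self.encoder = encoder                      # TSDF grid -> shared 3D feature volume
        self.scene_renderer = scene_renderer        # global, scene-level surface rendering
        self.candidate_sampler = candidate_sampler  # rendered surface -> SE(3) candidates
        self.local_renderer = local_renderer        # per-grasp surface points + features
        self.quality_head = quality_head            # PointNet-style per-grasp quality

    def forward(self, tsdf):
        # Encode the single-view TSDF into an implicit feature volume.
        feats = self.encoder(tsdf)

        # Global rendering reconstructs the scene surface, on which
        # 6DoF grasp candidates are sampled.
        scene_points = self.scene_renderer(feats)
        grasp_poses = self.candidate_sampler(scene_points)   # (G, 4, 4) poses in SE(3)

        # Local rendering per candidate: surface points the gripper would
        # interact with, with features queried from the same shared volume.
        scores = []
        for pose in grasp_poses:
            pts, pt_feats = self.local_renderer(feats, pose)
            scores.append(self.quality_head(pts, pt_feats, pose))
        return grasp_poses, torch.stack(scores)
```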

Video demonstration: