Summary and Conclusions

This project was motivated by the need for target pose-aware grasping that generalizes across object categories and works for arbitrary initial and target poses. The approach mirrors the sequence of actions humans follow when they grasp an item in a previously unseen configuration and place it at an arbitrary target pose.

Rather than treating pose estimation and grasping as isolated problems, we used object descriptors to capture the difference between the initial and target poses through continuous, differentiable energy fields, and formulated the search for the desired grasp as an energy optimization problem. For this, we employed Neural Descriptor Fields (NDF), an SE(3)-equivariant feature descriptor.

We also aimed to remove demonstrations from the pipeline. Instead, we used a grasping network to provide a strong prior for the energy optimization. Specifically, we showed that the Volumetric Grasping Network (VGN) produces kinematically feasible grasps in the target configuration, which can then be used to determine the desired grasp in the source configuration. Because VGN generalizes readily to novel object categories, it also mitigates the category-specific grasping limitation of the original NDF implementation.

In the near future, we will further prune the selected grasps based on their functional use. Longer term, we aim to combine the various networks into a more optimized, modular end-to-end network that takes the RGB-D image of the initial scene, the RGB-D image of the desired target configuration, and the robot URDF, and outputs a set of kinematically feasible grasps in the initial scene that guarantee the desired target pose. We also aim for the optimization to run on real robots in real time with much lower computational overhead.

While we do not claim that our solution solves the challenges of generalized target pose-aware grasping, we are happy with the progress we made and believe this work is a step in the right direction toward that goal.
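
To make the energy-optimization formulation concrete, here is a minimal planar sketch. It is not the project's implementation. The descriptor `toy_descriptor` is a hypothetical stand-in for an NDF: it is just the list of distances from a query point to each object point, and it assumes the object points are listed in the same order in both scenes, which a real NDF does not need. The names `energy` and `compass_search`, the SE(2) pose parameterization `(theta, tx, ty)`, and all numeric values are likewise assumptions made for illustration. The query points play the role of points attached to a grasp proposed in the target configuration.

```python
import math

def transform(pose, pt):
    """Apply a planar rigid transform pose = (theta, tx, ty) to a 2D point."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    x, y = pt
    return (c * x - s * y + tx, s * x + c * y + ty)

def toy_descriptor(query, cloud):
    """Hypothetical stand-in for an NDF: distances from the query to each object point.
    Moving the query and the cloud by the same rigid transform leaves it unchanged."""
    return [math.dist(query, p) for p in cloud]

def energy(pose, query_pts, source_cloud, target_descs):
    """L1 gap between descriptors of the transformed query points (evaluated
    against the source cloud) and the descriptors recorded in the target scene."""
    total = 0.0
    for q, d_target in zip(query_pts, target_descs):
        d = toy_descriptor(transform(pose, q), source_cloud)
        total += sum(abs(a - b) for a, b in zip(d, d_target))
    return total

def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=10000):
    """Derivative-free descent: probe +/- step along each coordinate and keep
    only moves that lower f; halve the step when no probe improves."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step *= 0.5
    return x, fx

# Toy scene (all values are made up for illustration).
target_cloud = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (0.0, 1.5)]
true_pose = (0.4, 0.3, -0.2)  # maps the target scene onto the source scene
source_cloud = [transform(true_pose, p) for p in target_cloud]

# Points attached to a grasp proposed in the target configuration.
query_pts = [(0.5, 0.2), (0.5, 0.8)]
target_descs = [toy_descriptor(q, target_cloud) for q in query_pts]

def objective(pose):
    return energy(pose, query_pts, source_cloud, target_descs)

e_start = objective((0.0, 0.0, 0.0))
e_true = objective(true_pose)
best_pose, e_best = compass_search(objective, (0.0, 0.0, 0.0))
print("start energy:", e_start, "final energy:", e_best, "pose:", best_pose)
```

The energy vanishes at the pose that carries the target-scene grasp points onto the matching location in the source scene, which is why minimizing it recovers the source grasp. The compass search is used here only because it needs no gradients and never accepts a worse pose. Because the energy is not convex in the rotation angle, a local search like this can stall in a local minimum, which is one reason a strong prior from a grasping network is helpful.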

A summary of the various sub-components used in the pipeline
