In its current state, the FPA graspnet is computationally intensive and not optimized for real-time operation on a physical robot.
Due to time constraints, we do not yet have a working implementation that prunes the set of kinematically feasible grasps down to those that are also functional. We plan to incorporate this component in the near future, along with fully integrating VGN into our implementation.
Our implementation combines the results from multiple pipelines, but not in an end-to-end manner: each neural network remains agnostic to the outputs of the others. A sketch of this structure is given below.
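As an illustration of this non-end-to-end structure, the following Python sketch combines the pipeline outputs by simple sequential filtering. All names here (`Grasp`, `propose_grasps`, `is_kinematically_feasible`, `is_functional`) are hypothetical stand-ins for the respective networks and planners, not our actual API; the functional-pruning step in particular is the component we have not yet implemented.

```python
from dataclasses import dataclass

@dataclass
class Grasp:
    pose: tuple   # 6-DoF grasp pose, e.g. (x, y, z, roll, pitch, yaw)
    score: float  # confidence from the proposal network

# Trivial placeholder stand-ins for the independent networks; each is
# agnostic to the others, mirroring the current non-end-to-end structure.
def propose_grasps(scene_rgbd):
    return [Grasp(pose=(0.1, 0.0, 0.2, 0.0, 0.0, 0.0), score=0.9)]

def is_kinematically_feasible(grasp, robot_model):
    return True  # would call an IK / feasibility checker in practice

def is_functional(grasp, target_rgbd):
    return True  # planned pruning step; not yet implemented

def combine_pipelines(scene_rgbd, target_rgbd, robot_model):
    candidates = propose_grasps(scene_rgbd)
    # Keep grasps that are kinematically feasible for this robot ...
    feasible = [g for g in candidates
                if is_kinematically_feasible(g, robot_model)]
    # ... and, once implemented, prune to those that are also functional
    # for reaching the desired target configuration.
    return [g for g in feasible if is_functional(g, target_rgbd)]
```

Because each stage filters independently, no gradient or learned signal flows between the networks, which is precisely the limitation the end-to-end direction below aims to remove.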
A future direction of this work is to combine the various networks into a more optimized and modular end-to-end network that takes the initial scene RGB-D image, the RGB-D image of the desired target configuration, and the robot URDF, and outputs a set of kinematically feasible grasps in the initial scene that guarantee the desired target pose.
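A minimal sketch of what such an end-to-end interface might look like is given below, assuming a PyTorch implementation. The module name, encoder layout, output parameterization (position plus quaternion per grasp), and the idea of pre-embedding the URDF into a fixed-size vector are all illustrative assumptions rather than a committed design.

```python
import torch
import torch.nn as nn

class EndToEndGraspNet(nn.Module):
    """Illustrative end-to-end interface (assumed design, not implemented):
    maps (initial RGB-D, target RGB-D, robot description) to grasp poses."""

    def __init__(self, urdf_embed_dim=64, num_grasps=16):
        super().__init__()
        # Shared convolutional encoder for the 4-channel RGB-D images.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # The robot URDF is assumed to be pre-embedded into a fixed vector.
        self.head = nn.Sequential(
            nn.Linear(64 * 2 + urdf_embed_dim, 256), nn.ReLU(),
            nn.Linear(256, num_grasps * 7),  # 7 = position (3) + quaternion (4)
        )
        self.num_grasps = num_grasps

    def forward(self, initial_rgbd, target_rgbd, urdf_embedding):
        z = torch.cat([self.encoder(initial_rgbd),
                       self.encoder(target_rgbd),
                       urdf_embedding], dim=-1)
        return self.head(z).view(-1, self.num_grasps, 7)

if __name__ == "__main__":
    net = EndToEndGraspNet()
    grasps = net(torch.randn(1, 4, 128, 128),   # initial scene RGB-D
                 torch.randn(1, 4, 128, 128),   # desired target RGB-D
                 torch.randn(1, 64))            # pre-embedded URDF vector
    print(grasps.shape)  # torch.Size([1, 16, 7])
```

Sharing one image encoder across both RGB-D inputs is one of several possible design choices; it keeps the module small, but separate encoders for the scene and target would be equally plausible.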
Stretch Goal: The Neural Descriptor Field is trained on the ShapeNet dataset and requires RGB-D images as input. It is also currently agnostic to the feasibility of the path taken by the robot from the initial to the target configuration. It would be interesting to explore how video datasets of humans performing similar tasks could be leveraged for this purpose. We believe this direction makes sense given the wide and easy availability of video data and the potential to infer kinematic feasibility from human grasps.