Evo-NeRF: Evolving NeRF for Sequential Robot Grasping of Transparent Objects

Justin Kerr, Letian Fu, Huang Huang, Yahav Avigal, Matthew Tancik, Jeffrey Ichnowski, Angjoo Kanazawa, Ken Goldberg

Bibtex

@inproceedings{kerr2022evo,
  title={Evo-nerf: Evolving nerf for sequential robot grasping of transparent objects},
  author={Kerr, Justin and Fu, Letian and Huang, Huang and Avigal, Yahav and Tancik, Matthew and Ichnowski, Jeffrey and Kanazawa, Angjoo and Goldberg, Ken},
  booktitle={6th Annual Conference on Robot Learning},
  year={2022}
}

Abstract

Perceiving and handling transparent objects is critical in many industrial and household scenarios; however, their visual properties make perceiving their geometry difficult. Neural Radiance Fields (NeRFs) are a promising representation of scene geometry for downstream robotics tasks; however, they have previously required hours of computation, making them impractical in robotics. We leverage recent advancements in NeRF which have made rapid training possible, and further extend the training regime to incrementally optimize over a stream of images as they are captured. The capture and optimization are terminated when a sufficient task confidence is achieved. Additional regularizations motivated by geometry graspability improve performance in rapid capture settings. These captures can still lead to unreliable geometry from NeRF, causing out-of-the-box grasp planners such as Dex-Net to struggle. To mitigate this distribution shift, we train a network adapted to NeRF's appearance. To obtain training data, we simulate photo-realistic scenes and train NeRF on them to produce a dataset of NeRF-rendered geometry and corresponding grasps. This method achieves an 89% grasp success rate over 27 trials on single objects, with early capture termination providing a 41% speed improvement with no loss in reliability. We additionally report results on a practical decluttering task.

Single object extraction with early stop

(a) Robot capture trajectory with early stop, where point 1 denotes waypoints near the beginning of the trajectory and point 3 denotes the early stop point. (b) NeRF optimization using the images captured at each of the three points, with the timestamp shown at the bottom left and the Evo-NeRF confidence output shown at the bottom. (c) Robot execution of the grasp given by Evo-NeRF at the early stop point. Evo-NeRF confidence increases with more captures and rises above the threshold of 0.7 at the early stop point. The robot successfully grasps the object using the given grasp.
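The early-stop behavior described in this caption can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function name and the two callbacks (`capture_and_update`, `grasp_confidence`) are hypothetical placeholders for image capture plus incremental NeRF optimization and for the grasp network's confidence output. Only the 0.7 threshold comes from the caption.

```python
from typing import Callable, Iterable, Optional, Tuple


def capture_with_early_stop(
    waypoints: Iterable[int],
    capture_and_update: Callable[[int], None],
    grasp_confidence: Callable[[], float],
    threshold: float = 0.7,  # threshold value taken from the caption
) -> Tuple[Optional[int], float]:
    """Visit waypoints in order and stop once confidence exceeds the threshold.

    Returns the waypoint at which capture stopped (None if the threshold was
    never exceeded) together with the last confidence value observed.
    """
    confidence = 0.0
    for waypoint in waypoints:
        # Hypothetical callback: capture an image at this waypoint and
        # continue optimizing the NeRF on all images seen so far.
        capture_and_update(waypoint)
        # Hypothetical callback: confidence of the best grasp predicted
        # from the current reconstruction.
        confidence = grasp_confidence()
        if confidence > threshold:
            return waypoint, confidence
    return None, confidence
```

With a confidence sequence such as 0.2, 0.5, 0.8, the loop would stop at the third waypoint and skip the rest of the trajectory, which is where the time saving from early termination would come from.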

Sequential Object Removal

(a) The YuMi robot with a camera on one arm and a parallel jaw gripper on the other, with a scene of household objects in the center. The red arrows indicate the full capture trajectory of the robot. (b) Robot grasping an object enabled by Evo-NeRF. (c) Example camera capture trajectory to update the NeRF reconstruction. (d) Evo-NeRF first reconstructs the whole scene (step 1) with the camera capture trajectory shown in (a), then progressively updates the scene with small camera captures (shown in (c)) as objects are removed.

Sim2Real Dataset Generation

Each scene includes a subset of the training objects in simulation. Top: Grasp generation starts by sampling grasps on the object meshes and projecting them to a top-down view. Bottom: We render multiple views of each scene using Blender, then train Instant-NGP and render a top-down depth image. We accumulate the NeRF depth renderings and projected grasps into a dataset.
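To make the "projecting to a top-down view" step concrete, the sketch below maps a 3D grasp to image coordinates. It rests on assumptions not stated in the source: an orthographic camera looking straight down the z-axis, a grasp described by a center point and a gripper closing axis, and a workspace origin at (`x_min`, `y_min`). The function and all parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def project_grasp_top_down(
    center_xyz: Sequence[float],
    axis_xyz: Sequence[float],
    x_min: float,
    y_min: float,
    meters_per_pixel: float,
    camera_height: float,
) -> Tuple[int, int, float, float]:
    """Project a 3D grasp into an assumed orthographic top-down image.

    Returns (row, col, angle, depth): the pixel of the grasp center, the
    in-plane angle of the closing axis in radians, and the distance from
    the camera plane to the grasp center.
    """
    x, y, z = center_xyz
    ax, ay, _ = axis_xyz  # the vertical component vanishes in a top-down view
    col = math.floor((x - x_min) / meters_per_pixel)
    row = math.floor((y - y_min) / meters_per_pixel)
    angle = math.atan2(ay, ax)
    depth = camera_height - z
    return row, col, angle, depth
```

Pairing each projected grasp with the top-down depth image rendered from the trained NeRF would yield one training example of the kind the caption describes.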