Dex-Net AR: Distributed Deep Grasp Planning Using a Commodity Cellphone and Augmented Reality App


Harry Zhang, Jeffrey Ichnowski, Yahav Avigal, Joseph Gonzalez, Ion Stoica, and Ken Goldberg

University of California, Berkeley

Paper in proceedings (ICRA 2020, Paris, France): [Link]
Code (point cloud collector and depth map transformation): [Link]

Abstract

Consumer demand for augmented reality (AR) in mobile phone applications is growing, as evidenced by frameworks such as Apple's ARKit. Such applications have the potential to expand access to robot grasp planning systems such as Dex-Net. AR apps use structure-from-motion methods to compute a point cloud from a sequence of RGB images taken by the camera as it is moved around an object. However, the resulting point clouds are often noisy due to estimation errors. We present a distributed pipeline, Dex-Net AR, in which point clouds are uploaded to a server in our lab, cleaned, and evaluated by the Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object. We implement Dex-Net AR using an iPhone and ARKit and compare results with those generated by high-performance depth sensors. On harder, adversarial objects, the grasp success rates with AR point clouds are higher than those with traditional depth images.
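The abstract describes a client-server flow: the app uploads the AR point cloud, the server cleans it, runs the Dex-Net grasp planner, and returns a grasp axis for overlay. Below is a minimal sketch of what the client side of such an exchange could look like; the endpoint URL, JSON schema, and helper names are illustrative placeholders, not the project's actual API.

```python
# Hypothetical sketch of the client side of the Dex-Net AR pipeline:
# upload an ARKit point cloud to a lab server and receive a planned
# grasp axis to overlay on the object. Endpoint and schema are
# illustrative placeholders only.
import numpy as np
import requests

SERVER_URL = "https://example-lab-server/dexnet-ar/plan_grasp"  # placeholder


def request_grasp(points: np.ndarray) -> dict:
    """Send an (N, 3) ARKit point cloud and return the planned grasp.

    The server is assumed to clean the cloud, render a depth map, run the
    Dex-Net grasp planner, and reply with a grasp axis (two 3D endpoints)
    plus a quality score.
    """
    payload = {"points": points.tolist()}
    response = requests.post(SERVER_URL, json=payload, timeout=60)
    response.raise_for_status()
    # e.g. {"grasp_axis": [[x0, y0, z0], [x1, y1, z1]], "q_value": 0.87}
    return response.json()


if __name__ == "__main__":
    cloud = np.loadtxt("object_pointcloud.txt")  # N x 3 points from ARKit
    grasp = request_grasp(cloud)
    print("Planned grasp axis:", grasp["grasp_axis"])
```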

ICRA 2020 Conference Presentation


ICRA 2020 Video Submission


Dex-Net AR System Pipeline

Point Cloud Cleaning Algorithm

Via k-Nearest Neighbors and Random Sample Consensus (RANSAC)
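A minimal sketch of this cleaning step, assuming an Open3D-based implementation: a k-nearest-neighbor statistical filter drops stray AR points, and RANSAC plane fitting removes the supporting surface so only the object remains. The parameter values are illustrative, not the ones used in the paper.

```python
# Sketch of point cloud cleaning: KNN statistical outlier removal
# followed by RANSAC plane segmentation (assumes Open3D >= 0.10).
import numpy as np
import open3d as o3d


def clean_point_cloud(points: np.ndarray) -> o3d.geometry.PointCloud:
    """Clean an (N, 3) AR point cloud with KNN outlier removal + RANSAC."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)

    # 1. Statistical outlier removal: drop points whose mean distance to
    #    their k nearest neighbors is far from the global average.
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

    # 2. RANSAC plane segmentation: fit the dominant plane (the table or
    #    ground) and keep everything that is NOT on it.
    _, plane_inliers = pcd.segment_plane(distance_threshold=0.005,
                                         ransac_n=3,
                                         num_iterations=1000)
    return pcd.select_by_index(plane_inliers, invert=True)
```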

Multi-View Grasps

The PhoXi depth camera can only capture top-down depth maps, which limits the available grasp degrees of freedom (DoF) and can reduce grasp robustness.

With AR point clouds, we can leverage the full 3D geometry of the scene and render depth maps from arbitrary viewpoints, which can lead to better physical grasps.
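A rough sketch of how a depth map could be rendered from the cleaned point cloud at an arbitrary virtual camera pose, using a simple pinhole projection with a z-buffer. The intrinsics, pose handling, and resolution here are assumptions for illustration; the depth map transformation code linked above is the authoritative version.

```python
# Sketch: project a world-frame point cloud into a virtual camera's
# depth image via a pinhole model and a per-pixel z-buffer.
import numpy as np


def render_depth(points: np.ndarray,
                 T_world_cam: np.ndarray,
                 fx: float, fy: float, cx: float, cy: float,
                 height: int, width: int) -> np.ndarray:
    """Render an (N, 3) world-frame cloud as a depth image.

    T_world_cam is the 4x4 pose of the virtual camera in the world frame.
    """
    # Transform points into the camera frame.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Pinhole projection to pixel coordinates.
    u = np.round(fx * pts_cam[:, 0] / pts_cam[:, 2] + cx).astype(int)
    v = np.round(fy * pts_cam[:, 1] / pts_cam[:, 2] + cy).astype(int)
    z = pts_cam[:, 2]

    depth = np.full((height, width), np.inf)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        depth[vi, ui] = min(depth[vi, ui], zi)  # z-buffer: keep nearest point
    depth[np.isinf(depth)] = 0.0  # pixels with no points get zero depth
    return depth
```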

However, unlike top-down grasps, which rarely risk hitting the ground, grasps planned from arbitrary viewpoints must be checked for collisions between the gripper and the ground plane.
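One way such a check could look, assuming the ground plane model (a, b, c, d) comes from the RANSAC step above: reject any grasp whose jaw endpoints come within a clearance distance of the plane. The function name and clearance value are illustrative, not the project's actual collision check.

```python
# Sketch of a ground-collision filter for multi-view grasp candidates.
import numpy as np


def collides_with_ground(grasp_endpoints: np.ndarray,
                         plane: np.ndarray,
                         clearance: float = 0.01) -> bool:
    """Return True if either jaw endpoint is within `clearance` of the plane.

    grasp_endpoints: (2, 3) array of the two gripper jaw tip positions.
    plane: (4,) array [a, b, c, d] with ax + by + cz + d = 0 and the object
           lying on the positive side of the plane.
    """
    normal = plane[:3]
    norm = np.linalg.norm(normal)
    # Signed distance of each endpoint to the plane.
    dists = (grasp_endpoints @ normal + plane[3]) / norm
    return bool(np.any(dists < clearance))
```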