Our proposed visual-tactile learning framework VTacO and its extended version VTacOH can reconstruct both rigid and non-rigid in-hand objects. They also support refining the mesh incrementally.
Abstract
Tactile sensing is one of the modalities humans rely on heavily to perceive the world. Working with vision, it refines local geometric structure, measures deformation at the contact area, and indicates the hand-object contact state.
This paper introduces VTacO, a novel visual-tactile in-hand object reconstruction framework. VTacO combines partial point cloud data with tactile images, leveraging neural networks and the Winding Number Field to represent complex geometries, including open and thin structures. We further extend VTacO to Active VTacO (A-VTacO), which incorporates a policy network to minimize the number of touches required for efficient geometry recovery.
Overview
We consider the problem of 3D reconstruction of in-hand objects from a visual-tactile perspective. We propose VTacO, a novel deep-learning-based visual-tactile framework that combines global features from the point cloud with local features from tactile images to reconstruct objects based on the winding number field (WNF). We also extend the model to VTacOH for hand-object interaction by modeling and reconstructing the MANO hand model. Building on this, we explore how to actively reconstruct 3D shapes while reducing the number of contacts required, and propose A-VTacO.
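As a rough illustration of this pipeline, the sketch below shows a hypothetical WNF decoder that concatenates a global point-cloud feature and a per-query local tactile feature with the query coordinates, predicts a winding number, and finally extracts a mesh by thresholding a dense WNF grid. Module names, feature dimensions, and the fusion scheme are our own assumptions, not the released VTacO implementation.

```python
# Minimal sketch (not the released VTacO code): fuse a global point-cloud
# feature with local tactile features to predict the winding number field.
import torch
import torch.nn as nn
from skimage.measure import marching_cubes


class WNFDecoder(nn.Module):
    def __init__(self, global_dim=256, local_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + global_dim + local_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # winding number at each query point
        )

    def forward(self, query_xyz, global_feat, local_feat):
        # query_xyz: (N, 3); global_feat: (global_dim,); local_feat: (N, local_dim)
        g = global_feat.unsqueeze(0).expand(query_xyz.shape[0], -1)
        return self.mlp(torch.cat([query_xyz, g, local_feat], dim=-1)).squeeze(-1)


def extract_mesh(wnf_grid, spacing=(1.0, 1.0, 1.0)):
    # The WNF is ~1 deep inside the surface and ~0 far outside, so the
    # 0.5 iso-level recovers a surface via marching cubes.
    verts, faces, _, _ = marching_cubes(wnf_grid, level=0.5, spacing=spacing)
    return verts, faces
```

Note that winding numbers take fractional values near open and thin structures, which is precisely what makes WNF a suitable representation for them; the fixed 0.5 threshold above is a simplification for illustration.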
Dataset
We use the following two object benchmarks:
YCB Object Models
AKB-48 Object Models
To generate the simulated dataset, we present VT-Sim, a Unity-based simulation environment that efficiently produces training samples of hand-object interaction with ground-truth WNF, visual depth images, tactile signals, and sensor poses.
We use GraspIt! for hand-pose acquisition. In Unity, we model rigid objects with a RigidBody and mesh collider, and deformable objects with Obi, an XPBD-based physics engine. When we move the MANO hand with its sensors to the retargeted pose, the sensors collide with the object and form the grasp.
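For the WNF ground truth itself, one common recipe is to sample query points around the object mesh and evaluate their winding numbers with libigl. The sketch below follows that recipe under our own assumptions (mesh export format, sampling strategy, and point counts are illustrative and not necessarily what VT-Sim does):

```python
# Sketch: ground-truth WNF labels for sampled query points, assuming the
# grasped object is exported from VT-Sim as a triangle mesh.
import igl
import numpy as np


def wnf_labels(mesh_path, n_uniform=2048, n_near=2048, noise=0.01):
    v, f = igl.read_triangle_mesh(mesh_path)

    # Uniform samples in the bounding box, plus samples concentrated
    # near the surface, where the field changes fastest.
    lo, hi = v.min(axis=0), v.max(axis=0)
    uniform = np.random.uniform(lo, hi, size=(n_uniform, 3))
    near = (v[np.random.randint(len(v), size=n_near)]
            + np.random.normal(scale=noise, size=(n_near, 3)))
    queries = np.vstack([uniform, near])

    # Winding number: ~1 inside, ~0 outside, fractional near open or
    # thin geometry, which is what the WNF representation exploits.
    wnf = igl.winding_number(v, f, queries)
    return queries, wnf
```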
Results
Code Release
Code and instructions for training VTacO and VTacOH are available at VTacO.
Code for Active VTacO is available at A-VTacO.
Datasets for VTacO are available at VTacO_Dataset.