Im2Contact: Vision-Based Contact Localization Without Touch or Force Sensing
Leon Kim, Yunshuang Li, Michael Posa, Dinesh Jayaraman

University of Pennsylvania

CoRL 2023

Paper

Code

Abstract

Contacts play a critical role in most manipulation tasks, yet robots today lack a reliable way to sense contacts in general settings. Force-torque and touch sensing are limited by how little of the world they can sense and by sensor drift, so contact perception approaches built on them require restrictive assumptions or prior knowledge of object geometries, as well as frequent re-calibration. We propose a challenging vision-based extrinsic contact localization task: with only a single RGB-D camera view of a robot workspace, identify when and where an object held by the robot contacts the rest of the environment. We show that careful task-attuned design is critical for a neural network trained in simulation to discover solutions that transfer well to a real robot. Our final approach, im2contact, demonstrates the promise of versatile, general-purpose contact perception from vision alone, performing well at localizing various contact types (point, line, or planar; sticking, sliding, or rolling; single or multiple), even under occlusions in its camera view.

Highlight Videos of im2contact

We find that im2contact can localize extrinsic contacts across a variety of real-world settings involving rich, dynamic contact interactions with an unknown grasped object manipulated by a Franka Panda arm.

Legend: ground-truth contact annotations (pink cross), predicted contact probability map (green indicates higher probability than blue), and contact predictions (green circle).
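The legend distinguishes the predicted contact probability map from the discrete contact predictions, which suggests a post-processing step between the two. Purely as an illustration (the exact procedure, threshold, and window size below are assumptions, not taken from the paper), here is a minimal sketch of how per-pixel contact probabilities could be converted into discrete predicted contact points via thresholding and local-maximum detection:

```python
# Illustrative sketch only: one plausible way to turn a per-pixel contact
# probability map into discrete contact predictions. The threshold and window
# size are made-up values, not the paper's actual post-processing.
import numpy as np
from scipy.ndimage import maximum_filter


def extract_contact_points(prob_map, threshold=0.5, window=5):
    """Return (row, col) pixel locations of local maxima above `threshold`."""
    # A pixel is a local maximum if it equals the max over its window x window neighborhood.
    local_max = prob_map == maximum_filter(prob_map, size=window)
    candidates = np.argwhere(local_max & (prob_map >= threshold))
    return [(int(r), int(c)) for r, c in candidates]


# Example: a synthetic heatmap with a single strong peak.
heatmap = np.zeros((96, 96), dtype=np.float32)
heatmap[40, 60] = 0.9
print(extract_contact_points(heatmap))  # -> [(40, 60)]
```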

Moderately Occluded Scene

Non-convex Objects

Deformable Objects (not annotated)

Refer to the Videos subpages (dropdown menu in the upper right) for videos of all figures presented in the paper, and more.

Zero-shot Evaluation on Human Demonstrations

Anecdotally, we find that when we transfer our method directly (same approach and training data) to human demonstrations, its predictions remain reasonable.

Note: human demonstrations have not been annotated.

Bookshelf Insertion

Scooping Objects in a Bowl

Dishrack Insertion