CORN: Contact-based Object Representation for Nonprehensile Manipulation of General Unseen Objects

GitHub

Abstract

Nonprehensile manipulation is essential for manipulating objects that are too thin, large, or otherwise ungraspable in the wild. To sidestep the difficulty of contact modeling in conventional approaches, reinforcement learning (RL) has recently emerged as a promising paradigm. However, previous approaches either lack the ability to generalize over diverse object shapes or use simple action primitives that limit the diversity of robot motions. Furthermore, applying RL to diverse object geometries is challenging due to the high cost of training a policy that takes high-dimensional sensory inputs. To tackle this, we propose a novel contact-based object representation and pre-training pipeline. To enable massively parallel training, we leverage a lightweight patch-based transformer architecture for our encoder that processes point clouds, thus scaling our training across thousands of environments. Compared to learning from scratch or other shape-representation baselines, our representation facilitates efficient learning in terms of both computation and data, without loss of performance. We validate the efficacy of our overall system by showing that the resulting policy can zero-shot transfer to unseen real-world objects.
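
For concreteness, the sketch below shows one way a lightweight patch-based point-cloud transformer encoder of this kind could be written in PyTorch. The patching scheme (random center sampling with k-nearest-neighbor grouping) and all names, such as PatchPointEncoder and group_patches, are illustrative assumptions rather than the exact architecture used in CORN.

```python
# Minimal sketch of a patch-based point-cloud transformer encoder.
# Hypothetical names and patching scheme; not the paper's exact architecture.
import torch
import torch.nn as nn


def group_patches(points: torch.Tensor, num_patches: int, patch_size: int):
    """Split a point cloud (B, N, 3) into (B, num_patches, patch_size, 3) patches.

    Random center sampling + nearest neighbors is used here for brevity;
    farthest-point sampling is a common alternative.
    """
    B, N, _ = points.shape
    idx = torch.randint(0, N, (B, num_patches), device=points.device)
    centers = torch.gather(points, 1, idx[..., None].expand(-1, -1, 3))  # (B, P, 3)
    dist = torch.cdist(centers, points)                                   # (B, P, N)
    knn = dist.topk(patch_size, largest=False).indices                    # (B, P, K)
    patches = torch.gather(
        points[:, None].expand(-1, num_patches, -1, -1),
        2,
        knn[..., None].expand(-1, -1, -1, 3),
    )                                                                     # (B, P, K, 3)
    # Express each patch relative to its center; the center position is
    # re-added later as a separate embedding.
    return patches - centers[:, :, None, :], centers


class PatchPointEncoder(nn.Module):
    def __init__(self, num_patches=16, patch_size=32, dim=128, depth=2, heads=4):
        super().__init__()
        self.num_patches, self.patch_size = num_patches, patch_size
        self.patch_embed = nn.Sequential(
            nn.Linear(patch_size * 3, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.center_embed = nn.Linear(3, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        patches, centers = group_patches(points, self.num_patches, self.patch_size)
        tokens = self.patch_embed(patches.flatten(2)) + self.center_embed(centers)
        return self.transformer(tokens)  # (B, num_patches, dim) patch features


# Usage: encode a batch of 512-point clouds into 16 patch tokens.
enc = PatchPointEncoder()
feats = enc(torch.randn(8, 512, 3))  # -> (8, 16, 128)
```

Keeping the token count small (a handful of patch tokens per object rather than one token per point) is what makes it cheap enough to run the encoder alongside thousands of parallel simulation environments.
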

Overview

Attention visualization

To interpret which parts of the object geometry the policy prioritizes, we visualize the scores from the cross-attention layers, summed over all heads. We colorize the attention for each patch, then project the patchwise colors onto the surface of the object for visualization.

Colormap for attention visualization. Blue and yellow correspond to low and high attention values, respectively.
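
The snippet below sketches one way such a coloring could be computed, assuming per-patch cross-attention scores and a precomputed point-to-patch assignment are available. The function name, data layout, and use of matplotlib's viridis colormap (which runs roughly from blue to yellow) are illustrative assumptions, not the exact visualization code.

```python
# Minimal sketch of mapping per-patch attention scores to per-point colors.
import numpy as np
from matplotlib import cm


def attention_to_point_colors(attn: np.ndarray, point_to_patch: np.ndarray) -> np.ndarray:
    """Map per-patch attention scores to per-point RGB colors.

    attn:           (num_heads, num_patches) cross-attention scores.
    point_to_patch: (num_points,) index of the patch each surface point belongs to.
    Returns:        (num_points, 3) RGB colors in [0, 1].
    """
    # Sum attention over all heads, then normalize to [0, 1] for the colormap.
    patch_attn = attn.sum(axis=0)
    rng = patch_attn.max() - patch_attn.min()
    patch_attn = (patch_attn - patch_attn.min()) / (rng + 1e-8)
    # viridis runs roughly from blue (low) to yellow (high), as in the figure.
    patch_colors = cm.viridis(patch_attn)[:, :3]  # (num_patches, 3)
    # Project patchwise colors onto the surface: each point inherits its patch color.
    return patch_colors[point_to_patch]


# Usage with random data: 4 heads, 16 patches, 512 surface points.
colors = attention_to_point_colors(
    np.random.rand(4, 16), np.random.randint(0, 16, size=512)
)  # -> (512, 3)
```
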

Attention visualizations are shown for four representative motions: pushing, toppling, pivoting, and rolling.