Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects
Michael Danielczuk*, Ashwin Balakrishna*,
Daniel S. Brown, Shivin Devgon, Ken Goldberg
There has been significant recent work on data-driven algorithms for learning general-purpose grasping policies. However, these policies can consistently fail on challenging objects that are far outside the distribution of objects in the training data or that have very few high-quality grasps. Motivated by such objects, we propose a novel problem setting, Exploratory Grasping, for efficiently discovering reliable grasps on an unknown polyhedral object via sequential grasping, releasing, and toppling. We formalize Exploratory Grasping as a Markov Decision Process in which we assume that the robot can (1) distinguish stable poses of a polyhedral object of unknown geometry, (2) generate grasp candidates on these poses and execute them, (3) determine whether each grasp is successful, and (4) release the object into a random new pose after a grasp success or topple the object after a grasp failure. We study the theoretical complexity of Exploratory Grasping in the context of reinforcement learning and present an efficient bandit-style algorithm, Bandits for Online Rapid Grasp Exploration Strategy (BORGES), which leverages the structure of the problem to efficiently discover high-performing grasps for each stable pose of the object. BORGES can complement any general-purpose grasping algorithm with any grasp modality (parallel-jaw, suction, multi-fingered, etc.) to learn policies for objects on which that algorithm exhibits persistent failures. Simulation experiments suggest that BORGES can significantly outperform both general-purpose grasping pipelines and two other online learning algorithms, achieving performance within 5% of the optimal policy within 1000 and 8000 timesteps on average across 46 challenging objects from the Dex-Net adversarial and EGAD! object datasets, respectively. Initial physical experiments suggest that BORGES can improve grasp success rate by 45% over a Dex-Net baseline with just 200 grasp attempts in the real world.
CoRL 2020 Video Submission
Inspired by infants who repeatedly attempt to grasp a toy until they learn reliable ways to grasp it, we consider a novel problem: Exploratory Grasping, in which a robot is presented with an unknown object and learns to reliably grasp it by repeatedly attempting grasps while the object pose evolves based on grasp outcomes. The objective is for the robot to explore grasps across different object poses so that it can reliably grasp the object from any of its stable resting poses. We formalize Exploratory Grasping as an MDP in which the robot attempts grasps in the current stable pose; if a grasp is successful, the robot lifts and releases the object to sample a new random stable pose. If a grasp is unsuccessful, the object either remains in the same stable pose or topples into a new stable pose as a result of the perturbation from the failed grasp.
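The interaction loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: the `env` and `policy` interfaces (`current_pose`, `execute_grasp`, `release`, `topple`, `select_grasp`, `update`) are hypothetical names standing in for the four capabilities assumed in the problem statement.

```python
def exploratory_grasping_episode(policy, env, horizon=1000):
    """Sketch of the Exploratory Grasping MDP interaction loop.

    Assumes a hypothetical `env` exposing the four assumed capabilities:
    stable-pose identification, grasp execution, binary outcome detection,
    and release/topple dynamics. Returns the empirical grasp success rate.
    """
    successes = 0
    pose = env.current_pose()              # (1) identify the current stable pose
    for _ in range(horizon):
        grasp = policy.select_grasp(pose)  # (2) choose a candidate grasp for this pose
        success = env.execute_grasp(grasp)  # (3) observe the binary grasp outcome
        policy.update(pose, grasp, success)
        if success:
            pose = env.release()           # (4a) lift and release into a random new pose
            successes += 1
        else:
            pose = env.topple()            # (4b) a failed grasp may topple the object
    return successes / horizon
```

Note that the pose only changes through the release and topple dynamics, which is what gives the problem its sequential, non-i.i.d. structure.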
Bandits for Online Rapid Grasp Exploration Strategy (BORGES)
BORGES leverages the insight that most objects have a small finite set of stable poses and maintains a separate grasping policy for each stable pose to accelerate grasp exploration. We provide theoretical guarantees for BORGES and show that the grasping policy on each pose can be any no-regret algorithm for the multi-armed bandit problem. In experiments, we instantiate BORGES with two popular such algorithms: UCB1 and Thompson sampling.
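The per-pose bandit structure can be sketched as follows. This is a simplified illustration of the Thompson-sampling instantiation under standard Bernoulli-Beta assumptions, with hypothetical class and method names; the paper's actual implementation may differ.

```python
import random

class BORGESSketch:
    """Per-pose Thompson-sampling bandits, one per stable pose (illustrative)."""

    def __init__(self):
        # Beta(alpha, beta) posterior over the success probability of each
        # (pose, grasp) pair. Priors could instead be seeded from a
        # general-purpose grasp quality estimator such as Dex-Net.
        self.posteriors = {}  # pose_id -> {grasp_id: [alpha, beta]}

    def register_pose(self, pose_id, grasp_ids):
        # Uniform Beta(1, 1) prior for each candidate grasp on a new pose.
        if pose_id not in self.posteriors:
            self.posteriors[pose_id] = {g: [1.0, 1.0] for g in grasp_ids}

    def select_grasp(self, pose_id):
        # Thompson sampling: draw one sample from each grasp's posterior
        # and execute the grasp with the highest sampled success probability.
        samples = {g: random.betavariate(a, b)
                   for g, (a, b) in self.posteriors[pose_id].items()}
        return max(samples, key=samples.get)

    def update(self, pose_id, grasp_id, success):
        # Conjugate Bernoulli-Beta update from the observed binary outcome.
        self.posteriors[pose_id][grasp_id][0 if success else 1] += 1.0
```

Because each stable pose gets its own independent bandit, grasp-outcome evidence gathered in one pose never dilutes the estimates for another, which is what lets any no-regret bandit algorithm slot into this role.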
We evaluate BORGES in extensive simulation experiments, comparing it with Dex-Net, a state-of-the-art general-purpose grasping system that is fine-tuned online based on grasp outcomes, and UCRL2, a well-known tabular RL algorithm. We evaluate three variants of BORGES: BORGES-UCB, which uses UCB1 for grasp exploration; BORGES-TS, which uses Thompson sampling with a uniform prior distribution; and BORGES-TSP, which uses Thompson sampling seeded with initial grasp quality estimates from Dex-Net. We find that BORGES significantly outperforms prior algorithms and quickly approaches the performance of the optimal policy, which knows the best grasp on each pose in advance. Notably, BORGES-TSP converges very quickly to near-optimal performance, indicating that BORGES can efficiently leverage priors from general-purpose grasping systems while rapidly converging to significantly higher grasp success rates.
In initial physical experiments, we find that BORGES’ ability to explore online and rapidly learn from grasp outcomes allows it to significantly outperform Dex-Net on two challenging objects, yielding a 45% improvement in grasp success rate with just 200 grasp attempts on a real robot.