To Err is Robotic: Rapid Value-Based
Trial-and-Error during Deployment
Anonymous Authors | CoRL 2024 Submission
Base Policy
The trained base policy is not well-equipped to handle mistakes
Base Policy + Bellman-Guided Retrials
Our method, Bellman-Guided Retrials, explicitly equips the base policy with the ability to detect mistakes, recover, and retry

Bellman-Guided Retrials Approach
Goal: Robots should be able to adapt quickly to a novel situation by making initial mistakes, recovering, and trying new things.
Problem: Expert datasets used to train robots may not show how to detect and correct mistakes properly.
Insight: Expert datasets always show how fast an expert would accomplish a similar task, and we can use this expectation to evaluate the viability of a robot's current strategy and make corrections if necessary.
Approach: Train a value function on expert demonstrations, then use the value function's self-consistency (its Bellman error) to detect suboptimal strategies at deployment time. When suboptimality is detected, we recover the robot and query the base policy for a different strategy. This equips any trained base policy with explicit capacities to evaluate, recover, and retry; a minimal sketch of this loop is shown below.
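To make the retrial loop concrete, here is a minimal Python sketch of deployment-time monitoring with a learned value function. The names and thresholds (`env`, `base_policy`, `value_fn`, `recover`, `ERROR_THRESHOLD`) are illustrative assumptions, not the implementation used in the paper; the key idea is that accumulated Bellman error triggers recovery and a fresh attempt.

```python
# Minimal sketch of Bellman-Guided Retrials at deployment time.
# Assumptions (not from the paper's code): a gym-style `env`, a stochastic
# `base_policy(obs)`, a learned `value_fn(obs)` trained on expert data, and a
# `recover(env)` routine that returns the robot to a safe pose.

GAMMA = 0.99            # discount factor used when training the value function
ERROR_THRESHOLD = 1.0   # accumulated Bellman error that flags a failing strategy
MAX_RETRIES = 3         # number of fresh attempts allowed per episode


def bellman_error(value_fn, obs, reward, next_obs, gamma=GAMMA):
    """One-step self-consistency error of the learned value function."""
    return abs(reward + gamma * value_fn(next_obs) - value_fn(obs))


def run_with_retrials(env, base_policy, value_fn, recover, horizon=200):
    obs = env.reset()
    for attempt in range(MAX_RETRIES + 1):
        accumulated_error = 0.0
        for _ in range(horizon):
            action = base_policy(obs)           # sample a (possibly new) strategy
            next_obs, reward, done, _ = env.step(action)
            accumulated_error += bellman_error(value_fn, obs, reward, next_obs)
            obs = next_obs
            if done:
                return True                     # task completed
            if accumulated_error > ERROR_THRESHOLD:
                obs = recover(env)              # detected suboptimality: recover...
                break                           # ...and retry with the base policy
        else:
            break                               # horizon exhausted without detection
    return False
```

Because the value function is fit to expert trajectories, a growing Bellman error signals that the current rollout is deviating from expert-like progress, which is the deviation the threshold is meant to catch.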
Real Robot Experiments
For simulation results, quantitative analysis, and further discussion, please refer to the submitted paper.
RealObjectLift
For RealObjectLift, we evaluate on train, test, and adversarial splits. The adversarial split is created by adding low-friction film to the robot's grippers and to the objects, which reduces the number of viable grasps. We show representative evaluation rollouts below, with recovery periods highlighted:
Train: Carrot (Ours)
Test: Cheese (Ours)
Test: Pepper (Ours)
Test: Olive Oil Bottle (Ours)
Adversarial: Slippery Lime 1 (Ours)
Adversarial: Slippery Lime 2 (Ours)
We compare against baselines, including the base policy shown below. The base policy was trained only on expert data and struggles to recover from mistakes. On average, Bellman-Guided Retrials improves the base policy's success rate by over 50%.
DoorOpening
We test Bellman-Guided Retrials on a complicated long-horizon task as a proof of concept that our method works in a wide range of task settings beyond object grasping. The DoorOpening task requires the robot to open a twist-lock tool cabinet by reaching for the handle, pushing upwards to disengage the lock, re-grasping the handle, and pulling outwards to open the door. The base policy commonly fails by pushing too close to the pivot of the lock and getting stuck. Adding Bellman-Guided Retrials more than doubles the base policy's performance. Qualitatively, we also observe that the robot switches strategies after a failure.
Base Policy
Base Policy + Bellman-Guided Retrials