To Err is Robotic: Rapid Value-Based
Trial-and-Error during Deployment
Anonymous Authors | CoRL 2024 Submission
Base Policy
The trained base policy is not well-equipped to handle mistakes
Base Policy + Bellman-Guided Retrials
Our method, Bellman-Guided Retrials, explicitly equips the base policy with the ability to detect mistakes, recover, and retry

Bellman-Guided Retrials Approach
Goal: Robots should be able to adapt quickly to a novel situation by making initial mistakes, recovering, and trying new things.
Problem: Expert datasets used to train robots may not show how to detect and correct mistakes properly.
Insight: Expert datasets always show how fast an expert would accomplish a similar task, and we can use this expectation to evaluate the viability of a robot's current strategy and make corrections if necessary.
Approach: Train a value function on expert demonstrations, then use the value function's self-consistency (its Bellman error) to detect suboptimal strategies at deployment time. When suboptimality is detected, we recover the robot and query the base policy for a different strategy. This equips any trained base policy with explicit capacities to evaluate, recover, and retry; a minimal sketch of this loop is shown below.
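To make the retrial loop concrete, here is a minimal Python sketch of deployment-time monitoring with a learned value function. The names and thresholds (`env`, `base_policy`, `value_fn`, `recover`, `ERROR_THRESHOLD`) are illustrative assumptions, not the implementation used in the paper; the key idea is that accumulated Bellman error triggers recovery and a fresh attempt.

```python
# Minimal sketch of Bellman-Guided Retrials at deployment time.
# Assumptions (not from the paper's code): a gym-style `env`, a stochastic
# `base_policy(obs)`, a learned `value_fn(obs)` trained on expert data, and a
# `recover(env)` routine that returns the robot to a safe pose.

GAMMA = 0.99            # discount factor used when training the value function
ERROR_THRESHOLD = 1.0   # accumulated Bellman error that flags a failing strategy
MAX_RETRIES = 3         # number of fresh attempts allowed per episode


def bellman_error(value_fn, obs, reward, next_obs, gamma=GAMMA):
    """One-step self-consistency error of the learned value function."""
    return abs(reward + gamma * value_fn(next_obs) - value_fn(obs))


def run_with_retrials(env, base_policy, value_fn, recover, horizon=200):
    obs = env.reset()
    for attempt in range(MAX_RETRIES + 1):
        accumulated_error = 0.0
        for _ in range(horizon):
            action = base_policy(obs)           # sample a (possibly new) strategy
            next_obs, reward, done, _ = env.step(action)
            accumulated_error += bellman_error(value_fn, obs, reward, next_obs)
            obs = next_obs
            if done:
                return True                     # task completed
            if accumulated_error > ERROR_THRESHOLD:
                obs = recover(env)              # detected suboptimality: recover...
                break                           # ...and retry with the base policy
        else:
            break                               # horizon exhausted without detection
    return False
```

Because the value function is fit to expert trajectories, a growing Bellman error signals that the current rollout is deviating from expert-like progress, which is the deviation the threshold is meant to catch.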
Real Robot Experiments
For simulation results, quantitative analysis, and further discussion, please refer to the submitted paper.
RealObjectLift
For RealObjectLift, we evaluate on train, test, and adversarial splits. The adversarial split is created by adding low-friction film to the robot's grippers and to the objects, which reduces the number of viable grasps. We show representative evaluation rollouts below, with recovery periods highlighted:
Train: Carrot (Ours)
Test: Cheese (Ours)
Test: Pepper (Ours)
Test: Olive Oil Bottle (Ours)
Adversarial: Slippery Lime 1 (Ours)
Adversarial: Slippery Lime 2 (Ours)
We compare against baselines, including the base policy shown below. The base policy was trained only on expert data and struggles to recover from mistakes. On average, Bellman-Guided Retrials improves the base policy's success rate by over 50%.
DoorOpening
We test Bellman-Guided Retrials on a complicated long-horizon task as a proof of concept that our method works in a wide range of task settings beyond object grasping. The DoorOpening task requires the robot to open a twist-lock tool cabinet by reaching for the handle, pushing upwards to disengage the lock, re-grasping the handle, and pulling outwards to open the door. The base policy commonly fails by pushing too close to the pivot of the lock and getting stuck. Adding Bellman-Guided Retrials more than doubles the base policy's performance. Qualitatively, we also observe that the robot switches strategies after a failure.
Base Policy
Base Policy + Bellman-Guided Retrials