DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

Yanjiang Guo*, Yen-Jen Wang*, Lihan Zha*, Jianyu Chen

IROS 2024

Abstract: In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, we leverage LLMs to play a dual role, aiding not only in high-level planning but also generating constraints that can indicate misalignment during execution. Then vision language models (VLMs) are utilized to detect constraint violations continuously. Our pipeline can monitor the low-level execution and enable timely recovery if certain plan-execution misalignment occurs. Experiments on various complex tasks including robot arms and humanoid robots demonstrate that our method can lead to higher task success rates and shorter task completion times.
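The detect-and-recover loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_plan` stands in for the LLM planner, `toy_violated` stands in for the VLM constraint check, and the skill names, constraint text, and tick counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    skill: str               # low-level skill chosen by the planner
    constraints: List[str]   # conditions that must hold while the skill runs


def run_with_monitoring(
    plan: Callable[[str, List[str]], List[Step]],
    violated: Callable[[str, int], bool],
    skill_ticks: Dict[str, int],
    task: str,
    max_replans: int = 5,
) -> List[str]:
    """Run a plan, checking every constraint at every tick of every skill."""
    log: List[str] = []
    history: List[str] = []
    steps = plan(task, history)
    replans = 0
    clock = 0
    while steps:
        step = steps.pop(0)
        failure: Optional[str] = None
        for _ in range(skill_ticks[step.skill]):
            clock += 1
            broken = [c for c in step.constraints if violated(c, clock)]
            if broken:
                failure = broken[0]
                break  # interrupt the skill immediately, do not wait for it to end
        if failure is None:
            history.append(f"{step.skill} done")
            log.append(f"t={clock}: {step.skill} done")
            continue
        log.append(f"t={clock}: '{failure}' violated during {step.skill}")
        if replans == max_replans:
            log.append("gave up")
            break
        replans += 1
        history.append(f"{step.skill} failed: {failure}")
        steps = plan(task, history)  # re-plan from the current situation
    return log


# Hypothetical stand-ins for the LLM planner and the VLM detector.
def toy_plan(task: str, history: List[str]) -> List[Step]:
    return [
        Step("pick_box", []),
        Step("carry_box", ["robot is holding the box"]),
        Step("place_box", []),
    ]


def toy_violated(constraint: str, clock: int) -> bool:
    # Pretend the box is dropped exactly once, at tick 3.
    return constraint == "robot is holding the box" and clock == 3


log = run_with_monitoring(
    toy_plan, toy_violated,
    {"pick_box": 2, "carry_box": 4, "place_box": 1},
    task="move the box",
)
for line in log:
    print(line)
```

In this toy run the drop is flagged on the first tick of `carry_box`, the planner is called again right away, and the second attempt completes. In a real system the two stand-in functions would be replaced by calls to an LLM and a VLM.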

Experiments - Real-World Humanoid Robot!

We finetune the BLIP-2 VLM on the XiaoXing robot from RobotEra so that the VLM can detect accidents immediately!

Stack block:

The VLM detects that the blocks have collapsed, and the robot recovers.

Prepare food:

The VLM detects that the food has been dropped, and the robot recovers.

Experiments - Simulated Humanoid Robot

1. Task 1: Go forward with unexpected obstacles

(2x speed)

Baseline

Delayed replanning leads to failure.

DoReMi (Ours)

Immediate re-plan and recovery lead to success.

2. Task 2: Move box

with random drop

(2x speed)

Baseline:

Replans only when the previous skill has finished.

Complete the task in 68s.

DoReMi (Ours): efficient

Immediate re-plan and recovery.

Complete the task in 43s.
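The gap between the two completion times can be read through a simple timing model, sketched below. The model is an assumption made for illustration, not something measured on the robot: the baseline is taken to notice a failure only when the running skill ends, while a monitored run notices it at the first periodic check after it happens. The skill duration, failure time, and check period are hypothetical; only the 68s and 43s figures come from the results above.

```python
import math


def wasted_time_baseline(skill_duration: float, failure_time: float) -> float:
    """Time spent executing a failed skill when failure is noticed only at its end."""
    return skill_duration - failure_time


def wasted_time_monitored(failure_time: float, check_period: float) -> float:
    """Time until the first periodic check at or after the failure."""
    next_check = math.ceil(failure_time / check_period) * check_period
    return next_check - failure_time


# Hypothetical numbers: a 20 s carry skill, a drop at 5.2 s, a check every 0.5 s.
baseline_waste = wasted_time_baseline(skill_duration=20.0, failure_time=5.2)
monitored_waste = wasted_time_monitored(failure_time=5.2, check_period=0.5)

# Relative reduction in completion time for the reported 68 s and 43 s runs.
reported_reduction = (68 - 43) / 68

print(f"baseline waste:  {baseline_waste:.1f} s")
print(f"monitored waste: {monitored_waste:.1f} s")
print(f"reported reduction: {reported_reduction:.1%}")
```

Under this model the wasted time per failure shrinks from the remainder of the skill to at most one check period. The reported runs correspond to a reduction of roughly 37% in completion time.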

3. Task 3: Prepare food (Complicated task!)

with pick failure and random drop

Collect 5 demonstrations in simple scenarios with only fruit objects and plain backgrounds (as shown on the right).

Finetune the vision-language model on them.

Test with unseen objects (e.g., vegetables, junk food, and seafood) and unseen backgrounds (e.g., random background colors)!

Unseen objects and backgrounds!

Finetuning even benefits unseen tasks: the box drop is discovered more quickly!

Zero-shot transferred VLM

Does not discover the drop until the box has disappeared from view.

Few-shot finetuned VLM

Discovers the drop immediately!

Experiments - Robot Arm

Task 1: Pick and place with random drops

Baseline: Longer execution time.

Re-plans only after the previous trajectory has finished.

DoReMi (Ours): Shorter execution time.

Immediate detection and re-plan.

Task 2: Stack blocks in order with placement noise

Baseline

Repeating the previous step leads to failure.

DoReMi (Ours)

Immediate re-plan and recovery from collapse lead to success.
