DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

Abstract: In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, we leverage LLMs to play a dual role: they not only aid in high-level planning but also generate constraints that can indicate misalignment during execution. Vision-language models (VLMs) are then used to detect constraint violations continuously. Our pipeline can monitor the low-level execution and enable timely recovery when a plan-execution misalignment occurs. Experiments on various complex tasks with robot arms and humanoid robots demonstrate that our method leads to higher task success rates and shorter task completion times.
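The loop described in the abstract can be illustrated with a minimal sketch. This is not the DoReMi implementation: `Step`, `run_with_monitoring`, `ToyWorld`, and `toy_planner` are hypothetical names, the planner stands in for an LLM, and `is_violated` stands in for a VLM query. The sketch only shows the control flow: each planned step carries constraints, the constraints are checked on every observation while the skill runs, and a violation interrupts the skill and triggers a re-plan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Observation = Dict[str, bool]


@dataclass
class Step:
    skill: str              # low-level skill to run
    constraints: List[str]  # conditions that must hold while the skill runs


def run_with_monitoring(
    goal: str,
    plan_fn: Callable[[str, List[str]], List[Step]],         # stand-in for the LLM
    execute_skill: Callable[[str], Iterable[Observation]],   # yields one observation per tick
    is_violated: Callable[[Observation, str], bool],         # stand-in for the VLM
    max_replans: int = 3,
) -> Tuple[bool, List[str]]:
    log: List[str] = []
    replans = 0
    plan = list(plan_fn(goal, log))
    while plan:
        step = plan.pop(0)
        violated = None
        for obs in execute_skill(step.skill):
            # Check every constraint on every observation, not only at the end.
            violated = next((c for c in step.constraints if is_violated(obs, c)), None)
            if violated is not None:
                break  # interrupt the skill as soon as a violation is seen
        if violated is None:
            log.append(f"done: {step.skill}")
            continue
        log.append(f"violated during {step.skill}: {violated}")
        if replans == max_replans:
            return False, log
        replans += 1
        plan = list(plan_fn(goal, log))  # re-plan with the violation in the history
    return True, log


class ToyWorld:
    """Hypothetical environment: the block is dropped once, mid-transport."""

    def __init__(self, drop_at_tick: int = 2) -> None:
        self.holding = False
        self.drop_at_tick = drop_at_tick

    def execute_skill(self, skill: str) -> Iterable[Observation]:
        if skill == "pick up the block":
            self.holding = True
            yield {"holding": self.holding}
            return
        for tick in range(5):
            if tick == self.drop_at_tick:
                self.holding = False
                self.drop_at_tick = None  # the scripted drop happens only once
            yield {"holding": self.holding}

    def is_violated(self, obs: Observation, constraint: str) -> bool:
        # The toy world has a single constraint, so the text is not parsed.
        return not obs["holding"]


def toy_planner(goal: str, log: List[str]) -> List[Step]:
    # A real planner would condition on the goal and the history; this one is fixed.
    return [
        Step("pick up the block", []),
        Step("move the block to the bowl", ["the gripper is holding the block"]),
    ]


world = ToyWorld()
success, log = run_with_monitoring(
    "put the block in the bowl", toy_planner, world.execute_skill, world.is_violated
)
```

In this toy run the drop is noticed on the tick where it happens, the transport skill is abandoned, and the second plan picks the block up again before moving it.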

Method:

Experiments - Robot Arm      

Task 1: Pick and place with random drops

Baseline: Longer execution time.

Re-plans only after the previous trajectory has finished.

DoReMi (Ours): Shorter execution time.

Immediate detection and re-planning.
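The difference between the two captions above can be made concrete with a small timing model. It is an illustrative assumption, not a measurement from the experiments: `time_until_replan`, `skill_duration`, `failure_time`, and `check_period` are hypothetical names, and the numbers in the example are made up. The baseline reacts only when the skill ends, while a monitor that checks periodically reacts at the first check after the failure.

```python
import math
from typing import Optional


def time_until_replan(
    skill_duration: float,
    failure_time: float,
    check_period: Optional[float] = None,
) -> float:
    """Seconds from skill start until a failure at `failure_time` triggers a re-plan.

    check_period=None models a baseline that only checks when the skill ends.
    Otherwise checks happen at check_period, 2 * check_period, ... (hypothetical model).
    """
    if check_period is None:
        return skill_duration
    first_check_after_failure = math.ceil(failure_time / check_period) * check_period
    return min(skill_duration, first_check_after_failure)


# Made-up numbers: a 20 s transport skill with a drop 3.2 s after it starts.
baseline_delay = time_until_replan(20.0, 3.2)
monitored_delay = time_until_replan(20.0, 3.2, check_period=0.5)
```

Under this model the time lost to a failure is bounded by the check period rather than by the remaining length of the skill, which is why earlier detection shortens the overall execution.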

Task 2: Stack blocks in order with placement noise


Baseline

Repeating the previous step leads to failure.

DoReMi (Ours)

Immediate re-planning and recovery from the collapse lead to success.

Experiments - Humanoid Robot

 (2x speed)



Baseline

Delayed replanning leads to failure.

DoReMi (Ours)

Immediate re-planning and recovery lead to success.

with random drop

 (2x speed)



Baseline: 

Re-plans only when the previous skill has finished.

Completes the task in 68 s.

DoReMi (Ours): efficient

Immediate re-planning and recovery.

Completes the task in 43 s.

with pick failure and random drop

Collect 5 demonstrations in simple scenarios with only fruit objects and plain backgrounds (as shown on the right).

Fine-tune the vision-language model on them.

Unseen objects and backgrounds!



Zero-shot transferred VLM

Does not detect the drop until the box has disappeared from view.

Few-shot finetuned VLM

Detects the drop immediately!