vlm-planner

Completeness-Aware Task Planning and Execution using Pretrained Vision-Language Models in Manipulation Control

This page contains visualisations (images and videos) for Reliable Completeness-Aware Task Planning using VLMs.

We developed a framework that uses pretrained VLMs to generate reliable and feasible task plans for manipulation control. Specifically, we construct a Planner and an Evaluator that iteratively generate and evaluate the robot's action plan given a human instruction and a visual observation. The Planner analyzes the query and plans the sequence of actions, along with the perception and grounding needed to execute them. The Evaluator then assesses the plan, detecting errors such as format, semantic, or geometric violations that could prevent successful execution, and returns feedback so that the Planner can revise the plan. This iterative process continues until the Evaluator approves the plan, which the Executor then performs to accomplish the task. In addition, the framework runs a completeness check after the initial plan execution, which allows it to handle long-horizon, semantically rich tasks and increases the success rate of the overall execution.
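The control flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the framework's actual code: the names `Feedback`, `plan_until_approved`, `run_task`, `max_rounds`, and `max_attempts` are hypothetical, and the Planner, Evaluator, Executor, and completeness check are passed in as plain callables standing in for the VLM-backed components.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

Plan = List[str]  # assumption: a plan is an ordered list of action strings


@dataclass
class Feedback:
    """Hypothetical Evaluator verdict: approval flag plus detected issues."""
    approved: bool
    issues: List[str] = field(default_factory=list)


def plan_until_approved(
    instruction: str,
    observation: Any,
    planner: Callable[[str, Any, Optional[Feedback]], Plan],
    evaluator: Callable[[Plan, Any], Feedback],
    max_rounds: int = 5,
) -> Plan:
    """Alternate Planner and Evaluator until the Evaluator approves a plan."""
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        # The Planner sees the previous feedback (None on the first round).
        plan = planner(instruction, observation, feedback)
        feedback = evaluator(plan, observation)
        if feedback.approved:
            return plan
    raise RuntimeError("no approved plan within max_rounds")


def run_task(
    instruction: str,
    observe: Callable[[], Any],
    planner: Callable[[str, Any, Optional[Feedback]], Plan],
    evaluator: Callable[[Plan, Any], Feedback],
    executor: Callable[[Plan], None],
    is_complete: Callable[[str, Any], bool],
    max_attempts: int = 3,
) -> bool:
    """Plan, execute, then check completeness; replan if the task is unfinished."""
    for _ in range(max_attempts):
        plan = plan_until_approved(instruction, observe(), planner, evaluator)
        executor(plan)
        # Completeness check on a fresh observation taken after execution.
        if is_complete(instruction, observe()):
            return True
    return False
```

The bounds `max_rounds` and `max_attempts` are an added assumption so that the sketch always terminates; the description above does not state whether or how the iteration is capped.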

Evaluation

kitchen_collision.mp4

Kitchen Collision

We intentionally create a collision with the arm to trigger failure recovery and replanning.

kitchen_insert.mp4

Kitchen Dynamics

We intentionally tamper with the environment during execution, triggering the task-completeness assessment and replanning.

cans_collision.mp4

Cans Collision

Cans_Insert.mp4

Cans Dynamics

cubes_collision.mp4

Cubes and Balls Collision

cubes_insert.mp4

Cubes and Balls Dynamics
