Robot task planning is an important problem for autonomous robots performing challenging long-horizon tasks. As large pre-trained models have demonstrated superior planning ability, recent research investigates utilizing large models to achieve autonomous planning for robots across diverse tasks. However, since large models are pre-trained on Internet data and lack knowledge of real task scenes, using them as planners may lead to unsafe decisions that harm the robot and the surrounding environment. To address this challenge, we propose a novel Safe Planner framework, which instills safety awareness in large pre-trained models to accomplish safe and executable planning. In this framework, we develop a safety prediction module to guide the high-level large-model planner; this safety module, trained in a simulator, can be effectively transferred to real-world tasks. The proposed Safe Planner framework is evaluated in both simulated environments and on real robots. The experimental results demonstrate that Safe Planner not only achieves state-of-the-art task success rates but also substantially improves safety during task execution.
In this framework, the natural language task description is translated into PDDL goals with an LLM. Taking the PDDL instruction, the current observation, and the safety prediction from the safety module as inputs, a VLM task planner outputs a selected skill in the form of PDDL operators, and the operator is executed with low-level skills in the environment. After the skill execution completes or exceeds a preset timestep budget, the VLM planner replans in a closed-loop manner.
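The closed-loop cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation; `llm_translate`, `vlm_plan`, `safety_module`, and `execute_skill` are hypothetical interfaces standing in for the framework's components.

```python
MAX_SKILL_STEPS = 200  # assumed per-skill timestep budget


def run_episode(instruction, env, llm_translate, vlm_plan, safety_module, execute_skill):
    """Translate the instruction once, then plan/execute/replan until the goal holds."""
    goal = llm_translate(instruction)          # natural language -> PDDL goal
    while not env.goal_satisfied(goal):
        obs = env.observe()                    # current observation
        safety = safety_module(obs)            # per-object safety ranking
        skill = vlm_plan(goal, obs, safety)    # next PDDL operator from the VLM
        for _ in range(MAX_SKILL_STEPS):       # low-level execution with timeout
            done = execute_skill(skill, env)
            if done:
                break
        # the loop continues: the VLM replans from the new observation
```

The per-skill timeout mirrors the "preset timestep" mentioned above: whether the skill finishes or stalls, control returns to the planner.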
The real-world experiments are conducted on a fixed-base 7-DoF Franka Panda robot arm. As collision data on real robots is difficult to obtain, the safety prediction module trained on simulated data is transferred directly to the real-world setting in a zero-shot manner, without fine-tuning.
To achieve a thorough evaluation, the table is set with different levels of clutter: in the easy mode, there are 3 objects on the table, so accomplishing the Pick-Place task without disturbing the surrounding objects is relatively easy. In the medium-difficulty setting there are 5 objects on the table, and in the hard one there are 7.
In these tasks, the white box is designated as a safe zone. Before manipulating the strawberry box, the robot should first transfer any obstructing objects to the safe zone.
3 objects on the table, and the task instruction is to 'put the strawberry box in the blue box.'
5 objects on the table, and the task instruction is to 'put the strawberry box in the blue box.'
7 objects on the table, and the task instruction is to 'put the strawberry box in the blue box.'
The simulated experiments are conducted in the Habitat 2.0 environment, where a mobile manipulator (Fetch robot) equipped with two RGBD cameras mounted on its head and arm is instructed to do housework.
The experiments in the Habitat 2.0 environment involve three target task scenes: Chair, Counter and Table.
As a thorough evaluation of Safe Planner, we design two task modes of different difficulty in each target task scene: in the easy mode, there are 3 objects in total in the manipulation areas; in the hard mode, there are no fewer than 5 objects, as more objects lead to more complex obstacle geometry and make collisions more likely.
In these tasks, the table and sofa are designated as safe zones. Before manipulating the target objects, the robot is expected to move obstacle objects to the safe zone to ensure safe manipulation.
7 objects on the chair, and the task instruction is to 'put the Cracker box on the table.'
5 objects on the chair, and the task instruction is to 'put the tomato can on the table.'
5 objects on the kitchen counter, and the task instruction is to 'put the Pudding box on the table.'
3 objects on the kitchen counter, and the task instruction is to 'put the Pudding box on the table.'
7 objects on the table, and the task instruction is to 'put the mug on the chair.'
3 objects on the table, and the task instruction is to 'put the mug on the chair.'
You are a robot system planner interacting with a human. Your task is to translate the natural language instruction into a PDDL task definition, so that the robot system can then execute the instruction.
The PDDL types and constants are as follows:
"""
types:
static_obj_type:
- art_receptacle_entity_type
- obj_type
obj_type:
- movable_entity_type
- goal_entity_type
"""
Predicates explanation:
"""
- in(X,Y): Is object X in container Y ?
- holding(X): Is the robot holding object X?
- at(X,Y): Is entity X within interacting distance of Y? (It may indicate that object X is in a container Y. If all 'at' predicates of X are false, X is not in any container.)
"""
PDDL definition of all predicates:
"""
predicates:
- name: in
args:
- name: obj
expr_type: obj_type
- name: receptacle
expr_type: art_receptacle_entity_type
set_state:
obj_states:
obj: receptacle
- name: holding
args:
- name: obj
expr_type: movable_entity_type
- name: robot_id
expr_type: robot_entity_type
set_state:
robot_states:
robot_id:
holding: obj
- name: not_holding
args:
- name: robot_id
expr_type: robot_entity_type
set_state:
robot_states:
robot_id:
should_drop: True
- name: robot_at
args:
- name: Y
expr_type: static_obj_type
- name: robot_id
expr_type: robot_entity_type
set_state:
robot_states:
robot_id:
pos: Y
- name: at
args:
- name: obj
expr_type: movable_entity_type
- name: at_entity
expr_type: static_obj_type
set_state:
obj_states:
obj: at_entity
"""
Below is an example of generating a PDDL task definition:
given the object/coordinate name list:
"""
objects:
- name: cup
expr_type: movable_entity_type
- name: TARGET_cup
expr_type: goal_entity_type
- name: robot_0
expr_type: robot_entity_type
"""
and the instruction:
"""
Please help me put the cup on the desk.
"""
output the following task definition:
"""
expr_type: AND
sub_exprs:
- at(cup,TARGET_cup)
- not_holding(robot_0)
"""
Now given the object list:
{objects}
please translate the following instruction into a PDDL-format task definition:
{instruction}
The task definition expression (YAML format, as shown in the examples above) should be bracketed in triple braces "{{{}}}" like:
{{{
expr_type: AND
sub_exprs:
- at(cup,TARGET_cup)
- not_holding(robot_0)
}}}
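Since the prompt instructs the model to bracket the YAML definition in triple braces, downstream code can recover it with a simple non-greedy regex even when the response contains extra reasoning text. A minimal sketch; the function name is an assumption for illustration, not part of the framework:

```python
import re


def extract_task_definition(response: str) -> str:
    """Pull the YAML task definition out of the LLM response.

    Matches the first '{{{ ... }}}' block; re.DOTALL lets '.' span the
    newlines inside the block.
    """
    match = re.search(r"\{\{\{(.*?)\}\}\}", response, re.DOTALL)
    if match is None:
        raise ValueError("no '{{{ ... }}}' block found in response")
    return match.group(1).strip()
```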
In the prompt above, the fields enclosed in curly braces '{}' will be dynamically replaced with the corresponding strings during program execution. Below are descriptions and examples of these fields.
1. {objects}: This string lists the objects in the environment and their respective types.
Example:
objects:
- name: meat_can
expr_type: movable_entity_type
- name: fish_can
expr_type: movable_entity_type
- name: tomato_can
expr_type: movable_entity_type
- name: pudding_box
expr_type: movable_entity_type
- name: strawbery_box
expr_type: movable_entity_type
- name: bowl
expr_type: movable_entity_type
- name: cracker_box
expr_type: movable_entity_type
- name: TARGET_white_box
expr_type: goal_entity_type
- name: TARGET_blue_box
expr_type: goal_entity_type
- name: robot_0
expr_type: robot_entity_type
2. {instruction}: This string is a natural language description of the task.
Example:
"Put the strawberry box in the blue box."
You are a robot system planner. You need to plan the skill sequence the robot should take to complete the goal safely, according to the environment states. The environment states, task goals, and skills are all described in PDDL (Planning Domain Definition Language).
An RGB picture from the robot's first-person perspective in the simulation environment will be provided to you as a planning reference.
A safety ranking for pick options is given; if you want to pick an object up, you should take the safety ranking into consideration.
Predicates explanation:
"""
- in(X,Y): Is object X in container Y ?
- holding(X): Is the robot holding object X?
- at(X,Y): Is entity X within interacting distance of Y?
"""
The PDDL types and constants are as follows:
"""
types:
static_obj_type:
- art_receptacle_entity_type
- obj_type
obj_type:
- movable_entity_type
- goal_entity_type
"""
Below is an example of a PDDL task description and its complete solution:
"""
objects:
- name: cup
expr_type: movable_entity_type
- name: TARGET_cup
expr_type: goal_entity_type
- name: robot_0
expr_type: robot_entity_type
goal:
expr_type: AND
sub_exprs:
- at(cup,TARGET_cup)
- not_holding(robot_0)
solution:
- pick(cup, robot_0)
- place(cup, TARGET_cup, robot_0)
"""
The PDDL definition of all the skills you can use is below:
"""
- name: pick
parameters:
- name: obj
expr_type: movable_entity_type
- name: robot
expr_type: robot_entity_type
precondition:
expr_type: AND
sub_exprs:
- robot_at(obj, robot)
postcondition:
- holding(obj, robot)
- name: place
parameters:
- name: place_obj
expr_type: movable_entity_type
- name: obj
expr_type: goal_entity_type
- name: robot
expr_type: robot_entity_type
precondition:
expr_type: AND
sub_exprs:
- holding(place_obj, robot)
- robot_at(obj, robot)
postcondition:
- not_holding(robot)
- at(place_obj, obj)
"""
Notes for AI:
- The skills used in the solution should exist in the documentation.
- Your skill arguments should only contain the objects/entities/constants given below.
- Check that all preconditions are true before taking the next skill. Try to infer deductively and update the predicates after each skill; think step by step. Show me your thinking process.
- Do NOT assume anything; the truth is given in the predicates, so make your plan according to the given facts.
Notes for safety:
- {safe_zone} is a safe zone. Before picking the target object and placing it at the target location, you need to place all objects with a safety ranking lower than the target object's into the safe zone, and then operate on the target object.
Now the following information are given:
1. objects/entities/constants(These indicate the current location information, which can be regarded as the location coordinate variable of these entities.):
"""
{all_entities}
"""
2. goal:
"""
{goal}
"""
3. true predicates:
"""
{true_preds}
"""
4. false predicates:
"""
{false_preds}
"""
5. safety ranking:
"""
{safety_rank}
"""
According to the above information, please plan the next sequence of skills using PDDL. The final sequence should be bracketed in '{{{}}}', without any prefix (e.g. '-') on each line, so that it can easily be extracted from the whole response.
Output example:
"""
First, I will lay out my thought process:
1. We have the goal of placing the `meat_can` at `TARGET_blue_box` and ensuring `robot_0` is not holding anything.
2. According to the safety ranking, `tomato_can` (1) is ranked safer than `meat_can` (2). Therefore, `tomato_can` must be placed in the `TARGET_white_box` first before handling the `meat_can`.
3. Initial true predicates include the robot's presence near various objects and the absence of objects in the target locations.
Let's deduce the sequence of skills step by step:
### Step 1: Place `tomato_can` in `TARGET_white_box`
- Preconditions for `pick(tomato_can, robot_0)`:
- `robot_at(tomato_can, robot_0)` must be true (given).
- `in(tomato_can, recep)` must be false for all `recep` if they are closed (`closed_cab(recep)`), but this isn't explicitly given.
Therefore, the preconditions for `pick(tomato_can, robot_0)` are satisfied.
- Postcondition:
- The robot will be holding `tomato_can`.
Next, the robot should place the `tomato_can` in `TARGET_white_box`.
- Preconditions for `place(tomato_can, TARGET_white_box, robot_0)`:
- `holding(tomato_can, robot_0)` must be true (will be true after completing pick).
- `robot_at(TARGET_white_box, robot_0)` is implicitly assumed true since no mobility constraints are given.
Therefore, the preconditions for `place(tomato_can, TARGET_white_box, robot_0)` are satisfied.
- Postcondition:
- The robot will not be holding `tomato_can`.
- `tomato_can` will be in `TARGET_white_box`.
### Step 2: Transfer `meat_can` to `TARGET_blue_box`
- Preconditions for `pick(meat_can, robot_0)` (same check as above for `tomato_can`):
- Preconditions already checked and are true.
- Postcondition:
- The robot will be holding `meat_can`.
Next, the robot should place the `meat_can` in `TARGET_blue_box`.
- Preconditions for `place(meat_can, TARGET_blue_box, robot_0)`:
- `holding(meat_can, robot_0)` is true.
- `robot_at(TARGET_blue_box, robot_0)` is implicitly assumed true.
Therefore, the preconditions for `place(meat_can, TARGET_blue_box, robot_0)` are satisfied.
- Postcondition:
- The robot will not be holding `meat_can`.
- `meat_can` will be at `TARGET_blue_box`.
Considering all the deductions, the final sequence is:
{{{
pick(tomato_can, robot_0)
place(tomato_can, TARGET_white_box, robot_0)
pick(meat_can, robot_0)
place(meat_can, TARGET_blue_box, robot_0)
}}}
"""
In the prompt above, the fields enclosed in curly braces '{}' will be dynamically replaced with the corresponding strings during program execution. Below are descriptions and examples of these fields.
1. {all_entities}: This string lists the objects in the environment and their respective types.
Example:
objects:
- name: meat_can
expr_type: movable_entity_type
- name: fish_can
expr_type: movable_entity_type
- name: tomato_can
expr_type: movable_entity_type
- name: pudding_box
expr_type: movable_entity_type
- name: strawbery_box
expr_type: movable_entity_type
- name: bowl
expr_type: movable_entity_type
- name: cracker_box
expr_type: movable_entity_type
- name: TARGET_white_box
expr_type: goal_entity_type
- name: TARGET_blue_box
expr_type: goal_entity_type
- name: robot_0
expr_type: robot_entity_type
2. {goal}: This string is the result obtained from Task Translation by the LLM.
Example:
expr_type: AND
sub_exprs:
- at(cup,TARGET_cup)
- not_holding(robot_0)
3. {true_preds}: This string contains PDDL predicates that evaluate to true.
Example:
['robot_at(meat_can, robot_0)','robot_at(fish_can, robot_0)','robot_at(tomato_can, robot_0)','robot_at(pudding_box, robot_0)','robot_at(strawbery_box, robot_0)', 'robot_at(bowl, robot_0)', 'robot_at(cracker_box, robot_0)']
4. {false_preds}: This string contains PDDL predicates that evaluate to false.
Example:
['holding(robot_0)','at(meat_can,TARGET_blue_box)', 'at(fish_can,TARGET_blue_box)', 'at(tomato_can,TARGET_blue_box)', 'at(pudding_box,TARGET_blue_box)', 'at(strawbery_box,TARGET_blue_box)', 'at(bowl,TARGET_blue_box)', 'at(cracker_box,TARGET_blue_box)','at(meat_can,TARGET_white_box)', 'at(fish_can,TARGET_white_box)', 'at(tomato_can,TARGET_white_box)', 'at(pudding_box,TARGET_white_box)', 'at(strawbery_box,TARGET_white_box)', 'at(bowl,TARGET_white_box)', 'at(cracker_box,TARGET_white_box)']
5. {safety_rank}: This string contains the rankings from the safety module for the predicted collision counts of operable objects, ordered from low to high. The lower an object's rank number, the fewer its predicted collisions, indicating that operating it is safer.
Example:
[
'tomato_can: 1.0',
'meat_can: 2.0',
'strawbery_box: 3.0',
'fish_can: 4.0',
'bowl: 5.0',
'pudding_box: 6.0',
'cracker_box: 7.0',
]
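The '{safety_rank}' strings above could be produced from the safety module's predicted collision counts roughly as follows; the helper name and the dict input are assumptions for illustration, not the paper's interface. Rank 1 corresponds to the fewest predicted collisions.

```python
def format_safety_ranking(collision_counts):
    """Turn {object_name: predicted_collisions} into '{name}: {rank}' strings.

    Objects are sorted by predicted collision count, ascending, so the
    safest object to operate receives rank 1.0.
    """
    ordered = sorted(collision_counts.items(), key=lambda kv: kv[1])
    return [f"{name}: {float(rank)}" for rank, (name, _) in enumerate(ordered, start=1)]
```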
6. {safe_zone}: This string specifies the safety zone within the task, indicating where obstacles should be placed before the robot moves the target object. This string corresponds to the names of objects listed in 'all_entities'.
Example:
TARGET_white_box