IROS 2024 (Oral)
Best Conference Paper/Best Student Paper Awards Finalist
Xiaolin Fang1 , Caelan Reed Garrett2, Clemens Eppner2,
Tomás Lozano-Pérez1, Leslie Pack Kaelbling1, Dieter Fox2
MIT CSAIL1 NVIDIA2
Generative models, such as diffusion models, excel at capturing high-dimensional distributions over diverse input modalities, e.g., robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are well suited to planning multi-step autonomous robot manipulation, but they can be difficult to apply to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by composing diffusion models using a TAMP system. We use learned components for constraints and samplers that are difficult to engineer in the planning model, and use a TAMP solver to search for a task plan with constraint-satisfying action parameter values. To tractably make predictions for unseen objects in the environment, we define the learned samplers and TAMP operators on a learned latent embedding of changing object states. We evaluate our approach in a simulated articulated-object manipulation domain and show how the combination of classical TAMP, generative modeling, and latent embedding enables multi-step constraint-based reasoning. We also apply the learned sampler directly in the real world.
Diffusion Models as Samplers for Task and Motion Planning
Compose diffusion models and engineered samplers (e.g. RRT) through the TAMP framework
Adopt diffusion models as a generative representation of TAMP constraints.
✔️ Trajectory/skill modeling capability of diffusion models
✔️ Planning and reasoning capability of Task and Motion Planners
✔️ TAMP from raw perception input
Each task plan defines a Constraint-Satisfaction Problem (CSP)
Solve the CSP using learned and given samplers
Alternate between task-plan search and CSP solving (sketched below)
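A minimal Python sketch of this alternation, using hypothetical interfaces (enumerate_task_plans, to_csp, sampler, satisfied) in place of the actual TAMP solver: each candidate task plan induces a CSP over its free action parameters, the samplers attached to those parameters propose values, and failure falls back to the next candidate plan.

def solve(task_planner, max_csp_attempts=50):
    """Alternate between task-plan search and CSP solving (illustrative sketch)."""
    for skeleton in task_planner.enumerate_task_plans():
        # Each candidate task plan induces a CSP over its free action parameters.
        csp = skeleton.to_csp()
        for _ in range(max_csp_attempts):
            bindings = {}
            for var in csp.variables:
                # Draw a value from the sampler attached to this variable:
                # a learned diffusion sampler or an engineered one such as RRT,
                # conditioned on values already bound earlier in the plan.
                bindings[var] = csp.sampler(var).sample(bindings)
            if csp.satisfied(bindings):
                return skeleton.bind(bindings)  # fully grounded, executable plan
        # This skeleton could not be satisfied; try the next task plan.
    return None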
We model both the object state and the robot state (action) [Diffuser; Janner*, Du*, et al.].
Motivation: constraints are often defined on object states. Modeling the object state allows us to add or check constraints on a skill at test time.
The trajectory model encodes general constraints on the object and robot states.
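A minimal PyTorch sketch of such a trajectory model (not the paper's exact architecture): the object-state latent and robot state are concatenated per timestep, and the network is trained to predict the diffusion noise over the whole trajectory. The dimensions and names below are illustrative assumptions.

import torch
import torch.nn as nn

OBJ_LATENT_DIM = 32   # learned object-state embedding size (assumption)
ROBOT_DIM = 7         # robot configuration / action size (assumption)
HORIZON = 16          # trajectory length (assumption)

class TrajectoryDenoiser(nn.Module):
    """Predicts the noise added to a [object latent, robot state] trajectory."""
    def __init__(self, feat_dim=OBJ_LATENT_DIM + ROBOT_DIM, hidden=256):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.net = nn.Sequential(
            nn.Linear(HORIZON * feat_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, HORIZON * feat_dim))

    def forward(self, noisy_traj, t):
        # noisy_traj: (B, HORIZON, feat_dim); t: (B,) integer diffusion step
        b = noisy_traj.shape[0]
        temb = self.time_embed(t.float().unsqueeze(-1))
        x = torch.cat([noisy_traj.reshape(b, -1), temb], dim=-1)
        return self.net(x).reshape(b, HORIZON, -1)

Because the object latent is part of the modeled state, constraints on object outcomes (e.g., a door-closed test) can be checked or imposed on the sampled trajectory at planning time.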
To draw samples that satisfy certain constraints during planning, such as trajectories that close the microwave, we use classifier-guided conditional sampling of the diffusion model.
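A minimal sketch of classifier-guided sampling, assuming a trained denoiser as above, a constraint classifier constraint_logit that scores whether the predicted trajectory satisfies the goal (e.g., the microwave door ends up closed), and a 1-D tensor alphas_cumprod of cumulative noise-schedule products; the guidance scale and the DDIM-style update are illustrative choices, not necessarily the paper's.

import torch

@torch.no_grad()
def guided_sample(denoiser, constraint_logit, alphas_cumprod, shape, guidance_scale=1.0):
    x = torch.randn(shape)                        # start from pure noise
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        t_batch = torch.full((shape[0],), t)

        eps = denoiser(x, t_batch)                # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean trajectory

        # Constraint guidance: nudge the predicted trajectory toward satisfying
        # the constraint by following the classifier's log-probability gradient.
        with torch.enable_grad():
            x0_g = x0.detach().requires_grad_(True)
            logp = torch.nn.functional.logsigmoid(constraint_logit(x0_g)).sum()
            grad = torch.autograd.grad(logp, x0_g)[0]
        x0 = x0 + guidance_scale * grad

        # DDIM-style deterministic step from t to t-1 using the guided x0.
        eps_g = (x - a_t.sqrt() * x0) / (1 - a_t).sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps_g
    return x

Within the planner, such a guided sampler plays the role of a conditional sampler for one constraint of the CSP.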
Different skills may require different state and action representations. DiMSam can compose heterogeneous skills as long as constraints can be defined on the state representations used by those skills.
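One way to picture this composition (a hypothetical sketch, not the paper's interface, with made-up samplers and a made-up constraint test): two skills with different action representations are chained, and the shared currency between them is the object-state latent on which the linking constraint is defined.

def open_then_grasp(open_door_sampler, grasp_sampler, door_open_test):
    # Skill 1: a trajectory diffusion sampler over (object latent, robot state).
    z_after, open_traj = open_door_sampler.sample()
    # Constraint defined on the shared object-state latent.
    if not door_open_test(z_after):
        return None
    # Skill 2: a grasp sampler with its own action representation (a single
    # grasp pose rather than a trajectory), conditioned on the same latent.
    grasp_pose = grasp_sampler.sample(condition=z_after)
    return [("open_door", open_traj), ("grasp", grasp_pose)]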
@inproceedings{fang2024dimsam,
title={{DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability}},
author={Xiaolin Fang and Caelan Reed Garrett and Clemens Eppner and Tomás Lozano-Pérez and Leslie Pack Kaelbling and Dieter Fox},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2024},
}
Simulation Environment
Real World Observation
Diffusion prediction error: the microwave door is not fully closed
Planning (non-diffusion component) and execution error: collision detected in the pre-pushing trajectory
Execution uncertainty: stick slips in the gripper
Partial observability: sparsity of the point cloud