Abstract
The performance of optimization-based robot motion planning algorithms is highly dependent on the initial solutions, commonly obtained by running a sampling-based planner to produce a collision-free path. However, these methods can be slow in high-dimensional and complex scenes and tend to produce non-smooth solutions. Given previously solved path-planning problems, it is highly desirable to learn their distribution and use it as a prior for new, similar problems. Several works propose using this prior to bootstrap the motion planning problem, either by sampling initial solutions from it or by using its distribution in a maximum-a-posteriori formulation for trajectory optimization. In this work, we introduce Motion Planning Diffusion (MPD), an algorithm that learns trajectory distribution priors with diffusion models. These generative models have shown increasing success in encoding multimodal data and have desirable properties for gradient-based motion planning, such as cost guidance. Given a motion planning problem, we construct a cost function and sample from the posterior distribution by combining the learned prior with the cost function gradients during the denoising process. Instead of learning the prior over all trajectory waypoints, we propose learning a lower-dimensional representation of a trajectory using linear motion primitives, in particular B-spline curves. This parametrization guarantees that the generated trajectory is smooth, can be interpolated at higher frequencies, and needs fewer parameters than a dense waypoint representation. We demonstrate our method on tasks ranging from simple 2D environments to more complex scenes with a 7-DoF robot arm manipulator. In addition to learning from simulated data, we also learn from human demonstrations on a real-world pick-and-place task. The experiments show that MPD achieves higher success rates than an uninformed prior and than a baseline that first samples from the diffusion prior and then optimizes the cost, while maintaining diversity in the generated trajectories.
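To illustrate the cost-guided denoising idea described above, the following minimal sketch shows one reverse-diffusion step in which the posterior mean predicted by the learned prior is shifted along the negative gradient of a differentiable planning cost. All names here (`guided_denoising_step`, `cost_fn`, `guide_scale`) are hypothetical and the noise-schedule handling is a standard DDPM simplification, not the actual MPD implementation.

```python
import torch

@torch.no_grad()
def guided_denoising_step(model, x_t, t, cost_fn, alphas, alphas_bar, guide_scale=1.0):
    """One reverse-diffusion step with cost guidance (conceptual sketch).

    x_t      : noisy trajectory parameters, e.g. B-spline control points, shape (batch, dim)
    cost_fn  : differentiable planning cost (collision, smoothness, ...), returns (batch,)
    alphas, alphas_bar : standard DDPM noise-schedule tensors
    """
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)

    # 1) Predict the noise with the learned prior (denoiser network).
    eps = model(x_t, t_batch)

    # 2) Standard DDPM posterior mean for x_{t-1}.
    a_t, ab_t = alphas[t], alphas_bar[t]
    mean = (x_t - (1.0 - a_t) / torch.sqrt(1.0 - ab_t) * eps) / torch.sqrt(a_t)

    # 3) Cost guidance: shift the mean along the negative cost gradient,
    #    biasing samples toward low-cost (e.g., collision-free) regions.
    with torch.enable_grad():
        x_req = x_t.detach().requires_grad_(True)
        grad = torch.autograd.grad(cost_fn(x_req).sum(), x_req)[0]
    mean = mean - guide_scale * grad

    # 4) Add noise except at the final step (t == 0).
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(1.0 - a_t) * noise
```

Running this step from t = T down to t = 0 yields samples that remain close to the learned prior while being pushed toward low-cost trajectories, which is the behavior contrasted with the "sample first, optimize later" baseline in the figures below.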
Denoising process in the EnvSimple2D-RobotPointMass2D task
Training environment
Additional objects
Inference pipeline and architecture
The input to MPD is the current joint configuration and a desired target configuration (either a joint position or an end-effector pose). The output is a batch of trajectories sampled from the prior distribution and biased toward low-cost regions via denoising cost guidance. Instead of diffusing trajectory waypoints, we parametrize trajectories with B-splines and diffuse only the control points, a lower-dimensional representation that guarantees smooth solutions.
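As a small illustration of the B-spline parametrization, the sketch below evaluates a smooth joint-space trajectory from a set of control points using a clamped uniform knot vector. The degree, number of control points, and function name are illustrative assumptions, not necessarily the choices made in MPD.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_trajectory(control_points, degree=5, num_eval=200):
    """Evaluate a smooth trajectory from B-spline control points (illustrative).

    control_points : (n_ctrl, dof) array, e.g. the denoised output for a 7-DoF arm.
    Returns joint positions sampled at `num_eval` points along the curve.
    """
    n_ctrl, dof = control_points.shape
    # Clamped uniform knot vector: the curve starts/ends at the first/last control point.
    knots = np.concatenate([
        np.zeros(degree),
        np.linspace(0.0, 1.0, n_ctrl - degree + 1),
        np.ones(degree),
    ])
    spline = BSpline(knots, control_points, degree, axis=0)
    t = np.linspace(0.0, 1.0, num_eval)
    return spline(t)  # (num_eval, dof) smooth joint-space trajectory

# Example: a random 2-DoF trajectory defined by 10 control points.
traj = bspline_trajectory(np.random.rand(10, 2))
```

Because the curve is defined continuously by a few control points, it can be resampled at any frequency required by the robot controller, which is why this representation needs far fewer diffused parameters than a dense waypoint trajectory.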
A 2D example showing the difference between sampling from the prior and then optimizing the cost, versus sampling from the posterior with MPD.
Demonstrations
Diffusion prior + Cost optimization
MPD
Generated samples in the EnvWarehouse-RobotPanda task (in simulation)
Training environment
Additional objects
Generated samples in the EnvWarehouse-RobotPanda task (on the real robot)
Human demonstrations via kinesthetic teaching in the EnvWarehouse-RobotPanda task
The diffusion prior generates trajectories that lead to collisions with new obstacles
MPD generates trajectories that are close to the prior but avoid new obstacles