In Submission
Learning to plan for multi-step, multi-manipulator tasks is notoriously difficult because of the large search space and the complex constraint satisfaction problems these tasks involve. We present Generative Factor Chaining (GFC), a composable generative model for planning. GFC represents a planning problem as a spatial-temporal factor graph, in which nodes represent the objects and robots in the scene, spatial factors capture the distributions of valid relationships among nodes, and temporal factors represent the distributions of skill transitions. Each factor is implemented as a modular diffusion model, and these models are composed at inference time to generate feasible long-horizon plans through bi-directional message passing. We show that GFC can solve complex bimanual manipulation tasks and exhibits strong generalization to unseen planning tasks with novel combinations of objects and constraints.
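The abstract states that factor-level diffusion models are composed at inference through bi-directional message passing, but it does not spell out the update. The sketch below assumes one common way to compose diffusion models: the score estimates of all factors that share a variable are added, which corresponds to multiplying their densities, and samples are drawn with Langevin dynamics. It is an illustration under that assumption, not GFC's implementation. Analytic Gaussian factors stand in for trained diffusion models, and `Factor`, `gaussian_factor`, `offset_factor`, `composed_score`, and `langevin_sample` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# A score function maps a factor's variable values to d/dx log p(x) for each
# of those variables. Hypothetical stand-in for a trained diffusion model's
# score estimate at a single, fixed noise level.
ScoreFn = Callable[[Dict[str, float]], Dict[str, float]]


@dataclass
class Factor:
    name: str
    variables: List[str]
    score: ScoreFn


def gaussian_factor(name: str, var: str, mean: float, std: float) -> Factor:
    """Toy unary factor: a 1-D Gaussian over one variable, with analytic score."""
    def score(values: Dict[str, float]) -> Dict[str, float]:
        return {var: -(values[var] - mean) / std**2}
    return Factor(name, [var], score)


def offset_factor(name: str, a: str, b: str, offset: float, std: float) -> Factor:
    """Toy pairwise factor: (b - a) is Gaussian around `offset` (a relative pose)."""
    def score(values: Dict[str, float]) -> Dict[str, float]:
        r = (values[b] - values[a] - offset) / std**2
        return {a: r, b: -r}
    return Factor(name, [a, b], score)


def composed_score(factors: List[Factor], values: Dict[str, float]) -> Dict[str, float]:
    """Sum every factor's score into the variables it touches.

    Adding scores is the same as multiplying the factor densities
    (a product-of-experts composition)."""
    total = {k: 0.0 for k in values}
    for f in factors:
        local = f.score({v: values[v] for v in f.variables})
        for v, s in local.items():
            total[v] += s
    return total


def langevin_sample(factors: List[Factor], init: Dict[str, float], step: float = 0.05,
                    n_steps: int = 2000, noise: bool = True, seed: int = 0) -> Dict[str, float]:
    """Unadjusted Langevin dynamics on the composed score.

    With noise=False this reduces to gradient ascent toward the joint mode."""
    rng = random.Random(seed)
    values = dict(init)
    for _ in range(n_steps):
        grad = composed_score(factors, values)  # computed once per step
        for k in values:
            eps = rng.gauss(0.0, 1.0) if noise else 0.0
            values[k] += step * grad[k] + math.sqrt(2 * step) * eps
    return values


if __name__ == "__main__":
    # x is constrained by a unary factor; y must sit about 1.0 away from x.
    factors = [gaussian_factor("x_prior", "x", 0.0, 1.0),
               offset_factor("x_to_y", "x", "y", 1.0, 1.0)]
    print(langevin_sample(factors, {"x": 3.0, "y": -2.0}))
```

Because scores add, a variable shared by several factors receives information from all of them at every step. For example, two unit-variance factors on `x` with means 0 and 2 pull the noise-free update to `x ≈ 1`, the mode of their product.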
Factor graph for a multi-arm coordination task. Our factor-graph-based planning formulation solves for a sequence of spatial factor graphs, from the initial state to a goal factor, by chaining them with temporal skill factors. The figure illustrates how a factor graph evolves over time as single or multiple skills are executed sequentially or in parallel. The hammer starts grasped by the left arm, and the nail is out of reach of the right arm. The goal is to hand the hammer over to the right arm, after which the left arm picks up the nail. Finally, both arms coordinate to move to a position from which the hammer can strike the nail. The subscript on each node denotes its temporal evolution.
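The chaining described in this caption can be sketched as a small data structure. The sketch assumes that a node's subscript advances only when a skill factor acts on that node, that skills executed in parallel within one step act on disjoint nodes, and that spatial factors attach to whichever node versions are current. `PlanGraph`, `SkillFactor`, and the skill and relation names (`handover`, `in_hand`, and so on) are hypothetical and do not come from the GFC code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SkillFactor:
    skill: str
    inputs: List[str]   # node versions before the skill, e.g. "hammer_0"
    outputs: List[str]  # node versions after the skill, e.g. "hammer_1"


@dataclass
class PlanGraph:
    versions: Dict[str, int]  # current subscript of each node
    skill_factors: List[SkillFactor] = field(default_factory=list)
    spatial_factors: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)

    def node(self, name: str) -> str:
        """Name of the current version of a node, e.g. 'left_2'."""
        return f"{name}_{self.versions[name]}"

    def add_spatial(self, relation: str, *names: str) -> None:
        """Attach a spatial factor to the current versions of the given nodes."""
        self.spatial_factors.append((relation, tuple(self.node(n) for n in names)))

    def apply_step(self, skills: List[Tuple[str, List[str]]]) -> None:
        """Execute one or more skills in parallel; each advances the nodes it touches."""
        touched = [n for _, names in skills for n in names]
        if len(touched) != len(set(touched)):
            raise ValueError("parallel skills must act on disjoint nodes")
        for skill, names in skills:
            inputs = [self.node(n) for n in names]
            for n in names:
                self.versions[n] += 1
            outputs = [self.node(n) for n in names]
            self.skill_factors.append(SkillFactor(skill, inputs, outputs))


if __name__ == "__main__":
    # Hammer-and-nail example from the caption.
    g = PlanGraph({"left": 0, "right": 0, "hammer": 0, "nail": 0})
    g.add_spatial("in_hand", "hammer", "left")                    # initial state
    g.apply_step([("handover", ["left", "right", "hammer"])])
    g.apply_step([("pick", ["left", "nail"])])
    g.apply_step([("strike", ["left", "right", "hammer", "nail"])])
    g.add_spatial("aligned", "hammer", "nail")                    # goal factor
    for sf in g.skill_factors:
        print(sf.skill, sf.inputs, "->", sf.outputs)
```

In this example the right arm is untouched by the `pick` step, so it enters the final `strike` factor as `right_1`, while the left arm has already advanced to `left_2`. This matches the caption's point that each node carries its own subscript.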
The hammer must be picked up by the left arm and handed over to the right arm. The right arm then places the hammer in the box.
The hammer must be picked up by the left arm and handed over to the right arm. The left arm then picks up the nail, and both arms coordinate to perform a successful strike (hammering).
The left and right arms must pick up the pink and green mugs, respectively. Both arms must then coordinate to move the mugs so that the contents can be poured from the pink mug into the green mug.
The left and right arms must coordinate to pick up the pot. The task is to rotate the pot to a specified target reorientation angle.