Generative Factor Chaining: Coordinated Manipulation with Diffusion-based Factor Graph

In Submission

In long-horizon planning tasks involving multiple manipulators, existing methods either scale poorly computationally or require an impractical amount of training data. To address these limitations, we present Generative Factor Chaining (GFC), a novel approach based on modularized generative models for learning and composing skills in complex tasks. Our method treats a long-horizon planning task in a complex scene as a spatial-temporal factor graph, where nodes represent objects in the scene and factors denote the constraints or skills that connect different objects. Using the diffusion model framework, the different factors can be learned from individual skill data, which is readily obtainable. During inference, these factors can be flexibly composed, possibly with additional constraints, to achieve long-horizon planning. The modular design of GFC enables generalization to unseen planning tasks. We showcase the advantages of our method through real-world experiments.
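To make the idea of composing factors at inference time concrete, the following is a minimal toy sketch, not the GFC implementation. It assumes each factor exposes a score function (the gradient of its log-density) over a shared variable vector and that composition amounts to summing those scores. The Gaussian offset and anchor factors, the function names, and the simple Langevin-style update are all hypothetical stand-ins for learned, noise-conditioned diffusion models and a proper reverse-diffusion schedule.

```python
import numpy as np

def make_offset_factor(i, j, offset):
    """Toy 'skill' factor (assumption): prefers x[j] - x[i] == offset."""
    def score(x):
        # Gradient of log p(x) = -0.5 * (x[j] - x[i] - offset)**2
        g = np.zeros_like(x)
        r = x[j] - x[i] - offset
        g[i] += r
        g[j] -= r
        return g
    return score

def make_anchor_factor(i, value):
    """Toy additional constraint (assumption): prefers x[i] == value."""
    def score(x):
        g = np.zeros_like(x)
        g[i] = -(x[i] - value)
        return g
    return score

def composed_score(factors, x):
    # Composing factors multiplies their densities, so their scores add.
    return sum(f(x) for f in factors)

def sample(factors, dim, steps=2000, step_size=0.1, noise_scale=0.0, seed=0):
    """Langevin-style updates; noise_scale=0 reduces to gradient ascent."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    for _ in range(steps):
        x = x + step_size * composed_score(factors, x)
        x = x + noise_scale * np.sqrt(2 * step_size) * rng.normal(size=dim)
    return x

# Two chained skill factors plus one extra constraint on the initial state.
factors = [
    make_offset_factor(0, 1, 1.0),  # skill A links x0 and x1
    make_offset_factor(1, 2, 2.0),  # skill B links x1 and x2
    make_anchor_factor(0, 0.0),     # constraint added at inference time
]
plan = sample(factors, dim=3)
```

Each factor only touches the variables in its own scope, so adding or removing a factor changes the plan without retraining the others. That is the modularity the abstract describes, shown here in a deliberately simplified setting.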

Factor graph for a multi-arm coordination task. Our factor graph-based planning formulation solves for a sequence of spatial factor graphs, from the initial state to a goal factor, by chaining them with temporal skill factors. The figure above illustrates the temporal evolution of a factor graph as single or multiple skills are executed sequentially or in parallel. Given a hammer grasped by the left arm and a nail out of reach of the right arm, the goal is to hand over the hammer to the right arm and then pick up the nail with the left arm. Finally, both arms coordinate to move to a position from which the hammer can strike the nail. The subscript on each node denotes its temporal evolution.
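The chaining described in this caption can be sketched as a small data structure. This is an illustrative sketch only: the class names, the node names, and the rule that a skill advances the subscript of exactly the nodes in its scope are assumptions made for the example, not details confirmed by the source.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TemporalFactor:
    """A skill linking node states before and after its execution."""
    skill: str
    inputs: tuple   # labels such as "hammer_0"
    outputs: tuple  # labels such as "hammer_1"

@dataclass
class FactorChain:
    index: dict = field(default_factory=dict)   # node name -> current subscript
    factors: list = field(default_factory=list)

    def add_nodes(self, *names):
        for name in names:
            self.index[name] = 0

    def apply(self, skill, *names):
        # Assumption: only the nodes a skill touches advance in time.
        inputs = tuple(f"{n}_{self.index[n]}" for n in names)
        for n in names:
            self.index[n] += 1
        outputs = tuple(f"{n}_{self.index[n]}" for n in names)
        factor = TemporalFactor(skill, inputs, outputs)
        self.factors.append(factor)
        return factor

def build_hammer_nail_chain():
    chain = FactorChain()
    chain.add_nodes("left_arm", "right_arm", "hammer", "nail")
    chain.apply("handover", "left_arm", "right_arm", "hammer")
    chain.apply("pick", "left_arm", "nail")
    chain.apply("strike", "left_arm", "right_arm", "hammer", "nail")
    return chain

chain = build_hammer_nail_chain()
for f in chain.factors:
    print(f.skill, f.inputs, "->", f.outputs)
```

Because each node carries its own subscript, two skills with disjoint scopes would leave each other's nodes untouched, which is one way to read the caption's "in parallel" case.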

Evaluation on Coordination Tasks

hammer_place_final.mp4
hammer_nail_final.mp4

The hammer must be picked up by the left arm and handed over to the right arm. The right arm then places the hammer in the box.

The hammer must be picked up by the left arm and handed over to the right arm. The left arm then picks up the nail. Both arms coordinate to perform a successful strike (hammering).

pour_cup_final.mp4
bimanual_reorientation_final.mp4

The left and right arms must pick up the pink and green mugs, respectively. Both arms must then coordinate to move the mugs so that the contents can be poured from the pink mug into the green mug.

The left and right arms must coordinate to pick up the pot. The task is to rotate the pot to a specified target angle.