Abstract
Autonomous systems that efficiently utilize tools can assist humans in completing many common tasks such as cooking and cleaning. However, current systems fall short of human-level intelligence when adapting to novel tools. Prior work based on affordances often makes strong assumptions about the environment and cannot scale to more complex, contact-rich tasks. In this work, we tackle this challenge and explore how agents can learn to use previously unseen tools to manipulate deformable objects. We propose to learn a generative model of tool-use trajectories as a sequence of tool point clouds, which generalizes across different tool shapes. Given any novel tool, we first generate a tool-use trajectory and then optimize the sequence of tool poses to align with the generated trajectory. We train a single model on four challenging deformable object manipulation tasks, using demonstration data from only one tool per task. The model generalizes to various novel tools, significantly outperforming baselines. We further test our trained policy in the real world with unseen tools, where it achieves performance comparable to humans.
Method Overview
We leverage the trajectory generation module to generate an ideal tool trajectory that accomplishes the task. Then, we align the selected tool with the generated trajectory via sequential pose optimization to extract the pose of the selected tool at each step, and finally, we use inverse kinematics to obtain the actions for the agent to execute.
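The pose-optimization step above can be sketched as a per-frame rigid registration: for each generated tool point cloud in the trajectory, find the rotation and translation that best align the selected tool's point cloud to it. The sketch below uses the Kabsch (SVD-based) least-squares solution and assumes the two clouds are in point-to-point correspondence; the function names and the correspondence assumption are illustrative, not the paper's exact formulation, which may use a correspondence-free objective such as Chamfer distance.

```python
import numpy as np

def fit_rigid_pose(tool_pts, target_pts):
    """Least-squares rigid transform (Kabsch algorithm) mapping tool_pts
    onto target_pts. Assumes the (N, 3) clouds are in correspondence."""
    mu_a = tool_pts.mean(axis=0)
    mu_b = target_pts.mean(axis=0)
    A = tool_pts - mu_a
    B = target_pts - mu_b
    H = A.T @ B                        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_b - R @ mu_a
    return R, t

def align_trajectory(tool_pts, generated_traj):
    """Fit one pose per generated frame, yielding a sequence of (R, t)
    that the downstream inverse-kinematics solver can consume."""
    return [fit_rigid_pose(tool_pts, frame) for frame in generated_traj]
```

In practice such a per-frame fit is typically warm-started from the previous frame's pose so the sequence of poses stays temporally smooth.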
Qualitative Results
Real World Results
Below, we showcase ToolGen's trajectories as it successfully accomplishes various tasks using unseen test tools by transferring the simulation-trained policy to the real world.
ToolGen (Ours)
Goal 1
Goal 2
Rolling Tool 1
Rolling Tool 2
ToolGen (Ours)
Goal 1
Goal 2
Cutting Tool 1
Cutting Tool 2
ToolGen (Ours)
Goal 1
Goal 2
Scooping Tool 1
Scooping Tool 2
Simulation Results
Below, we showcase ToolGen's trajectories as it successfully accomplishes various tasks using unseen test tools, all with a single model trained across all tasks in simulation. For comparison, we also include results from the TFN-Traj baseline.
ToolGen (Ours)
Goals
Average performance: 0.80
Baseline (TFN-Traj)
Average performance: 0.35
ToolGen (Ours)
Goals
Average performance: 0.82
Baseline (TFN-Traj)
Average performance: 0.29
ToolGen (Ours)
Goals
Average performance: 0.75
Baseline (TFN-Traj)
Average performance: 0.70