FUNCTO: Function-Centric One-Shot Imitation Learning
for Tool Manipulation
Abstract: Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to achieve this. A key challenge lies in establishing functional correspondences between demonstration and test tools, given the significant geometric variations among tools with the same function (i.e., intra-function variations). To address this challenge, we propose FUNCTO, an OSIL method that establishes function-centric correspondences with a 3D functional keypoint representation that is functionally meaningful and physically grounded. Using this formulation, we factorize FUNCTO into three stages: (1) functional keypoint extraction, (2) function-centric correspondence establishment, and (3) functional keypoint-based action planning. We evaluate FUNCTO against existing modular OSIL methods and end-to-end behavioral cloning methods through real-robot experiments across diverse tool manipulation tasks. The results show that FUNCTO outperforms these baselines when generalizing to novel tools with intra-function geometric variations.
Presentation Video
Overview
FUNCTO establishes functional correspondences between demonstration and test tools using 3D functional keypoints. With a single human demonstration video, FUNCTO generalizes the demonstrated tool manipulation skill to novel tools, even with significant intra-function geometric variations.
Pipeline
An overview of the FUNCTO framework. The pipeline consists of three stages: (1) Functional keypoint extraction, where functional keypoints and their trajectories are extracted from the human demonstration video; (2) Function-centric correspondence establishment, where function-centric correspondences between demonstration and test tools are established using functional keypoints; and (3) Functional keypoint-based action planning, where the test tool trajectory is synthesized and executed to accomplish a functionally equivalent task.
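To make the idea behind stages (2) and (3) concrete, the following is a minimal sketch of one way a keypoint trajectory could be transferred from a demonstration tool to a test tool. It is not confirmed to be FUNCTO's actual procedure. It assumes, purely for illustration, that each tool is described by three non-collinear 3D keypoints, that the correspondence between demonstration and test keypoints is already known (same ordering), and that the test tool should move rigidly so that the coordinate frame spanned by its keypoints follows the frame spanned by the demonstration keypoints. The names `keypoint_frame`, `transfer_point`, and `plan_test_trajectory` are hypothetical.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def add(a, b):
    return [a[i] + b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    if n < 1e-12:
        raise ValueError("degenerate keypoints (coincident or collinear)")
    return [c / n for c in v]

def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]

def keypoint_frame(p0, p1, p2):
    """Build a right-handed orthonormal frame from three keypoints.

    Assumption for this sketch: p0 is the frame origin, p1 fixes the
    x axis, and p2 fixes the x-y plane. Returns (R, origin), where the
    columns of R are the frame axes expressed in world coordinates.
    """
    x = normalize(sub(p1, p0))
    z = normalize(cross(x, sub(p2, p0)))
    y = cross(z, x)
    R = [[x[i], y[i], z[i]] for i in range(3)]
    return R, list(p0)

def transfer_point(point, src_frame, dst_frame):
    """Rigidly move `point` so that src_frame lands on dst_frame."""
    R_s, o_s = src_frame
    R_d, o_d = dst_frame
    local = matvec(transpose(R_s), sub(point, o_s))  # coordinates in src frame
    return add(matvec(R_d, local), o_d)

def plan_test_trajectory(demo_keypoint_traj, test_keypoints):
    """For each demo time step, return where the test keypoints should be.

    demo_keypoint_traj: list of [p0, p1, p2] demo keypoints per time step.
    test_keypoints: [q0, q1, q2] keypoints of the test tool at rest.
    """
    test_frame = keypoint_frame(*test_keypoints)
    plan = []
    for demo_kps in demo_keypoint_traj:
        demo_frame = keypoint_frame(*demo_kps)
        plan.append([transfer_point(q, test_frame, demo_frame)
                     for q in test_keypoints])
    return plan
```

Under these assumptions, the first test keypoint is placed exactly where the first demonstration keypoint was at every time step, while the other two are aligned in direction only, so a test tool with different dimensions keeps its own geometry. A real system would additionally need to convert the planned keypoint positions into gripper poses and check them for reachability and collisions.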
Real-Robot Experiments (pour)
All robot videos are played at 4x speed.
Real-Robot Experiments (cut, scoop, brush, pound)
All robot videos are played at 4x speed.
Qualitative Results
Citation
@article{tang2025functo,
title={FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation},
author={Tang, Chao and Xiao, Anxing and Deng, Yuhong and Hu, Tianrun and Dong, Wenlong and Zhang, Hanbo and Hsu, David and Zhang, Hong},
journal={arXiv preprint arXiv:2502.11744},
year={2025}
}