CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation
CVPR 2025 Highlight
Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong
CFCS, School of Computer Science, Peking University
PKU-Agibot Lab
Abstract
The correct use of electrical appliances has significantly improved the quality of human life. Unlike simple tools that can be manipulated with common sense, the parts of electrical appliances have specific functions defined by their manufacturers. If we want a robot to heat bread in a microwave, we should enable it to read the microwave's manual first. From the manual, the robot can learn the appliance's component functions, interaction methods, and representative task steps. However, previous manual-related works remain limited to question-answering tasks, while existing manipulation research overlooks the important role of manuals and its methods fail to comprehend multi-page manuals.
In this paper, we propose CheckManual, the first manual-based appliance manipulation benchmark. Specifically, we design a data generation pipeline, assisted by large models and revised by humans, that creates manuals based on CAD appliance models. With these manuals, we establish novel manual-based manipulation challenges, metrics, and simulator environments for evaluating model performance. Furthermore, we propose ManualPlan, the first manual-based manipulation planning model, to establish a set of baselines for the CheckManual benchmark.
CheckManual Data Collection
Figure 1: Generation workflow of the CheckManual dataset. In the leftmost part, we analyze real manuals to learn their formats and collect CAD models of different appliance categories. The middle part shows the creation of manual materials, including appliance creation, task generation, and figure design; humans verify every step to guarantee correctness. Based on these materials, the rightmost part uses LaTeX to generate appliance manuals in diverse formats.
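The rightmost stage of Figure 1 turns human-verified materials into LaTeX manuals with varied formats. The Python sketch below illustrates that idea under stated assumptions: the `Part` and `ManualMaterials` structures, the `render_manual` function, and the specific layout choices (one vs. two columns, `itemize` vs. `description` lists) are hypothetical and are not taken from the CheckManual pipeline; the `verified` flag merely stands in for the human-review step.

```python
import random
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    function: str  # description of what the part does (hypothetical field)

@dataclass
class ManualMaterials:
    appliance: str
    parts: list[Part]
    tasks: dict[str, list[str]]  # task name -> ordered steps
    verified: bool = False       # stand-in for human review; must be True to render

# A subset of LaTeX special characters and their escaped forms
_LATEX_SPECIAL = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
}

def escape_latex(text: str) -> str:
    # Translate character by character so inserted escapes are never re-escaped
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in text)

def render_manual(m: ManualMaterials, seed: int = 0) -> str:
    """Render verified materials as a LaTeX document with a randomized layout."""
    if not m.verified:
        raise ValueError("materials must be human-verified before rendering")
    rng = random.Random(seed)  # seeded so each format is reproducible
    # Hypothetical format choices that make manuals differ in appearance
    columns = rng.choice(["onecolumn", "twocolumn"])
    part_env = rng.choice(["itemize", "description"])
    lines = [
        rf"\documentclass[{columns}]{{article}}",
        r"\begin{document}",
        rf"\section*{{{escape_latex(m.appliance)} User Manual}}",
        r"\subsection*{Components}",
        rf"\begin{{{part_env}}}",
    ]
    for p in m.parts:
        label, desc = escape_latex(p.name), escape_latex(p.function)
        if part_env == "description":
            lines.append(rf"\item[{label}] {desc}")
        else:
            lines.append(rf"\item \textbf{{{label}}}: {desc}")
    lines.append(rf"\end{{{part_env}}}")
    # Each task becomes its own subsection with numbered steps
    for task, steps in m.tasks.items():
        lines.append(rf"\subsection*{{{escape_latex(task)}}}")
        lines.append(r"\begin{enumerate}")
        lines.extend(rf"\item {escape_latex(s)}" for s in steps)
        lines.append(r"\end{enumerate}")
    lines.append(r"\end{document}")
    return "\n".join(lines)

if __name__ == "__main__":
    microwave = ManualMaterials(
        appliance="Microwave",
        parts=[Part("Door handle", "Pull to open the door"),
               Part("Start button", "Press to begin heating")],
        tasks={"Heat bread": ["Open the door", "Place the bread inside",
                              "Close the door", "Press the start button"]},
        verified=True,
    )
    print(render_manual(microwave, seed=1))
```

Changing `seed` re-renders the same materials in a different layout, which is one simple way to obtain format diversity without altering the verified content.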
Created Manual Examples
Manual-based Manipulation Planning Method -- ManualPlan
Figure 2: The framework of our ManualPlan model. It is composed of Manual Resolution, Manipulation Planning, and Part Alignment modules. ManualPlan makes high-level plans that control either CAD-assisted primitive actions or an open-vocabulary manipulation large model (e.g., VoxPoser) to use the appliance; these configurations serve as the baseline models for the CheckManual benchmark.
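To make the data flow of Figure 2 concrete, here is a minimal Python sketch under loud assumptions: ManualPlan relies on large models inside its modules, whereas this stand-in uses simple keyword matching only to show how Manual Resolution, Manipulation Planning, and Part Alignment hand results to one another before primitive actions are executed. The function names (`resolve_manual`, `plan`, `align`, `manual_plan`), the primitive set, and part IDs such as `link_0` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Action:
    primitive: str  # hypothetical primitive name, e.g. "press" or "rotate"
    part_id: str    # hypothetical ID of the part in the simulated appliance

PRIMITIVES = ("press", "pull", "push", "rotate", "open", "close")

def resolve_manual(manual: dict[str, list[str]]) -> dict[str, list[str]]:
    """Manual Resolution stand-in: normalize task names and steps to lowercase."""
    return {task.lower(): [s.lower() for s in steps] for task, steps in manual.items()}

def plan(instruction: str, tasks: dict[str, list[str]]) -> list[str]:
    """Manipulation Planning stand-in: pick the task sharing the most words with the instruction."""
    words = set(instruction.lower().split())
    best = max(tasks, key=lambda t: len(words & set(t.split())))
    return tasks[best]

def align(step: str, parts: dict[str, str]) -> Action:
    """Part Alignment stand-in: map a step to a primitive and a part ID by keyword matching."""
    primitive = next((p for p in PRIMITIVES if step.startswith(p)), None)
    part = next((pid for name, pid in parts.items() if name.lower() in step), None)
    if primitive is None or part is None:
        raise ValueError(f"cannot ground step: {step!r}")
    return Action(primitive, part)

def manual_plan(instruction: str, manual: dict[str, list[str]],
                parts: dict[str, str]) -> list[Action]:
    """Chain the three stand-in modules into a sequence of primitive actions."""
    tasks = resolve_manual(manual)
    return [align(step, parts) for step in plan(instruction, tasks)]

if __name__ == "__main__":
    manual = {
        "Heat food": ["Open the door", "Close the door",
                      "Rotate the timer knob", "Press the start button"],
        "Defrost meat": ["Press the defrost button"],
    }
    parts = {"door": "link_0", "timer knob": "link_1",
             "start button": "link_2", "defrost button": "link_3"}
    for action in manual_plan("heat bread in the microwave", manual, parts):
        print(action)
```

Separating grounding (`align`) from planning mirrors the figure's split: the planner reasons over manual text, while alignment ties each step to a concrete part that a primitive controller can act on.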
Please cite our paper if you find it helpful :)