Jieyi Zhang, Wenqiang Xu, Zhenjun Yu, Pengfei Xie, Tutian Tang and Cewu Lu
Abstract
This study introduces DiffuTOG, a language-guided, diffusion-based learning framework for task-oriented grasping (TOG) with dexterous hands. Whereas conventional TOG focuses mainly on two-finger grippers, dexterous TOG poses additional challenges: the optimal grasp pose under a given task constraint is not unique, so the system must account for multiple valid grasps, and grasp planning takes place in a high-degree-of-freedom configuration space. Given a natural language task description, a 3D observation of the object, and a hand model, DiffuTOG predicts grasp poses by iteratively adding and removing noise in the hand configuration space. To support the framework, a new dataset, DexTOG-80K, was developed using a Shadow robot hand to perform various tasks on 80 objects from 5 categories, showcasing the dexterity and multi-tasking capabilities of the robotic hand. Together, the framework, the dataset, and the simulation validation establish a new benchmark for dexterous TOG in robotic manipulation research.
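The idea of adding and removing noise in the hand configuration space can be illustrated with a generic DDPM-style loop. The following is only a minimal sketch under stated assumptions, not DiffuTOG's implementation: the schedule values, the 28-dimensional pose vector, and every function name are hypothetical, and the learned, language- and geometry-conditioned network is replaced by a toy oracle that already knows the target pose.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (hypothetical values, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: corrupt a clean hand configuration x0 to step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def toy_noise_predictor(xt, t, condition, alpha_bars):
    """Stand-in for a learned denoiser (assumption).

    A real model would be conditioned on the task description and the object
    observation. Here `condition` is simply the target pose, so the function
    returns the exact noise that maps that pose to xt.
    """
    return (xt - np.sqrt(alpha_bars[t]) * condition) / np.sqrt(1.0 - alpha_bars[t])


def sample_grasp(condition, dim, schedule, rng):
    """Reverse process: start from Gaussian noise and denoise step by step."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(dim)
    for t in reversed(range(len(betas))):
        eps_hat = toy_noise_predictor(x, t, condition, alpha_bars)
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No fresh noise is injected at the final step.
        noise = rng.standard_normal(dim) if t > 0 else np.zeros(dim)
        x = mean + np.sqrt(betas[t]) * noise
    return x


POSE_DIM = 28  # arbitrary placeholder for wrist pose plus joint angles
```

With the oracle predictor, sampling recovers the target pose, which makes the loop easy to check in isolation. A trained network would instead produce different valid grasps from different noise seeds, which is how a diffusion model can represent the multiple valid grasps mentioned above.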
Method
Data Generation Pipeline
Qualitative Results with RL Policy