GraspGPT: Leveraging Semantic Knowledge from

a Large Language Model for Task-Oriented Grasping

(RA-L 2023)

Chao Tang1,2, Dehao Huang1,2, Wenqi Ge1,2, Weiyu Liu3, Hong Zhang1,2

[1] Shenzhen Key Laboratory of Robotics and Computer Vision, SUSTech, Shenzhen, China.

[2] Department of Electronic and Electrical Engineering, SUSTech, Shenzhen, China.

[3]  Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, United States.

Abstract: Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, such semantic knowledge is typically constructed from closed-world concept sets, restricting generalization to novel concepts outside the pre-defined sets. To address this issue, we propose GraspGPT, a large language model (LLM) based TOG framework that leverages the open-ended semantic knowledge from an LLM to achieve zero-shot generalization to novel concepts. We conduct experiments on the Language Augmented TaskGrasp (LA-TaskGrasp) dataset and demonstrate that GraspGPT outperforms existing TOG methods under different held-out settings when generalizing to novel concepts outside the training set. The effectiveness of GraspGPT is further validated in real-robot experiments.

Overview:

Language descriptions connect a novel concept to related concepts described during training, enabling task-oriented grasping skills to generalize from known concepts to novel ones.

ICRA Presentation Video:

Pipeline:

(a) An overview of the GraspGPT framework: when presented with a novel concept, such as a novel object class or task, in the natural language instruction, GraspGPT first prompts an LLM to acquire a set of language description paragraphs of the concept. It then evaluates the task compatibility of grasp candidates according to the visual input from the sensors and the linguistic input from the LLM.
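To make the description-acquisition step in (a) concrete, below is a minimal, hypothetical Python sketch of prompting an LLM for description paragraphs of a novel object class or task. The function names, prompt wording, and the `query_llm` placeholder are illustrative assumptions, not the released GraspGPT code.

```python
# Hypothetical sketch of the description-acquisition step in part (a).
# `query_llm` stands in for any LLM completion API (assumption).

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call; replace with a real API."""
    raise NotImplementedError

def describe_concept(concept: str, kind: str, n_paragraphs: int = 3) -> list[str]:
    """Prompt an LLM for several description paragraphs of a novel concept.

    kind is either "object" (e.g., "saucepot") or "task" (e.g., "saute").
    """
    if kind == "object":
        prompt = (f"Describe the shape, parts, and typical use of a {concept} "
                  f"in one short paragraph.")
    else:
        prompt = (f"Describe how the task '{concept}' is performed and how an "
                  f"object should be held to perform it, in one short paragraph.")
    # Query multiple times to obtain a set of diverse description paragraphs.
    return [query_llm(prompt) for _ in range(n_paragraphs)]

# Example usage for a held-out object class and task:
# object_descs = describe_concept("saucepot", kind="object")
# task_descs = describe_concept("saute", kind="task")
```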

(b) The detailed structure of the task-oriented grasp evaluator: the module is a customized transformer decoder that injects semantic knowledge from an LLM into the natural language instruction.
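The PyTorch sketch below illustrates the idea in (b): instruction tokens cross-attend to the LLM description tokens through a transformer decoder, and the fused feature is combined with a grasp/object feature to score task compatibility. The dimensions, module layout, and pooling choice are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a task-oriented grasp evaluator in the spirit of part (b).
import torch
import torch.nn as nn

class TaskGraspEvaluator(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.score_head = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, instr_tokens, desc_tokens, grasp_feat):
        # instr_tokens: (B, L_i, d) embedded natural language instruction
        # desc_tokens:  (B, L_d, d) embedded LLM description paragraphs
        # grasp_feat:   (B, d) fused object point cloud + grasp pose feature
        fused = self.decoder(tgt=instr_tokens, memory=desc_tokens)  # inject LLM knowledge
        pooled = fused.mean(dim=1)                                  # (B, d)
        logit = self.score_head(torch.cat([pooled, grasp_feat], dim=-1))
        return torch.sigmoid(logit)  # task-compatibility score per grasp candidate
```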

Perception Experiments:

Qualitative results of perception experiments: GraspGPT and the state-of-the-art GCNGrasp are evaluated under both held-out class and held-out task settings on the LA-TaskGrasp dataset. Results on self-collected objects (without ground-truth annotations) are also presented. Grasp poses are colored by their confidence scores (green is higher).

Real-Robot Experiments (Grasping):

"use the tongs to clip"

"hold the mug in a drinking position"

"to spray, take hold of the bottle"

"perform cleaning with the brush"

"make sure you have a handover-friendly grip on the box"

"help me use the saucepot to saute"

"obtain the ladle for dispensing"

"find the spatula and then scoop"

"mix with a whisk"

"if you want to screw, hold the screwdriver"

"ensure you grasp the hammer in a way that allows for handover"

"grip the mug to dirnk"

Real-Robot Experiments (Manipulation):

"scoop the coffee bean with a spoon"

"handover the hammer to me"

"pour with the sauecepot"

"just use the spoon to scoop"

Experiment Video:

Authors:

Citation:

@article{tang2023graspgpt,

  title={GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping},

  author={Tang, Chao and Huang, Dehao and Ge, Wenqi and Liu, Weiyu and Zhang, Hong},

  journal={arXiv preprint arXiv:2307.13204},

  year={2023}

}