Human-in-the-loop Robotic Grasping using BERT Scene Representation

Yaoxian Song, Penglei Sun, Pengfei Fang, Linyi Yang, Yanghua Xiao, Yue Zhang*


Paper Code Dataset Bibtex

Overview

Natural language processing techniques have been widely applied across domains. In this paper, we propose a human-in-the-loop framework for robotic grasping in cluttered scenes, investigating a language interface to the grasping process that allows the user to intervene with natural language commands. The framework is built on a state-of-the-art grasping baseline, in which we replace the scene-graph representation with a text representation of the scene encoded by BERT. Experiments in both simulation and on a physical robot show that the proposed method outperforms conventional object-agnostic and scene-graph-based methods in the literature. In addition, we find that performance can be significantly improved with human intervention.
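The core idea, replacing a structured scene graph with a textual scene description that a BERT-style encoder can consume alongside a user command, can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: the object labels, relation vocabulary, and the `describe_scene` / `apply_user_command` helpers are hypothetical names for the example.

```python
# Minimal sketch: linearize a cluttered scene into a sentence that a
# BERT-style encoder could consume, instead of a scene-graph structure.
# Object labels and spatial relations below are illustrative assumptions.

def describe_scene(objects, relations):
    """Turn detected objects and pairwise spatial relations into text.

    objects:   list of object labels, e.g. ["mug", "box"]
    relations: list of (subject, relation, object) triples
    """
    parts = ["The scene contains " + ", ".join(objects) + "."]
    for subj, rel, obj in relations:
        parts.append(f"The {subj} is {rel} the {obj}.")
    return " ".join(parts)

def apply_user_command(description, command):
    """Append a natural-language intervention to the scene description,
    so the human-in-the-loop command is encoded jointly with the scene."""
    return description + " Instruction: " + command

scene_text = describe_scene(
    ["mug", "box", "banana"],
    [("mug", "on top of", "box"), ("banana", "next to", "box")],
)
full_input = apply_user_command(scene_text, "Grasp the mug first.")
print(full_input)
```

In the full system, text like `full_input` would be tokenized and encoded by BERT to produce the scene feature that conditions grasp prediction; this sketch only shows the text-linearization step that makes human intervention a simple string operation rather than a graph edit.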

Method

Results

Physical Experiment

Bibtex

@inproceedings{song-etal-2022-human,
    title = "Human-in-the-loop Robotic Grasping Using {BERT} Scene Representation",
    author = "Song, Yaoxian and
      Sun, Penglei and
      Fang, Pengfei and
      Yang, Linyi and
      Xiao, Yanghua and
      Zhang, Yue",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.265",
    pages = "2992--3006",
}