We extend the Visual Manipulation Relationship Dataset (VMRD) by manually adding text descriptions and scene graphs to its scenes, resulting in a new dataset, L-VMRD.
Each example in our dataset is organized as a 6-tuple:
[ image, language descriptions, scene graph, object bounding box, grasping bounding box, surface ]
L-VMRD contains 4,676 examples, including 112,965 scene object relationship expressions and 21,713 surface attributes paired with grasp bounding boxes.
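As a rough illustration of the 6-tuple above, one example could be held in a small record type like the sketch below. The class name, field names, box convention, and sample values are all hypothetical and chosen only for illustration; the actual on-disk layout is the one documented in README.md.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention for this sketch: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]
# Assumed triplet convention: (subject, relation, object).
Triplet = Tuple[str, str, str]


@dataclass
class LVMRDExample:
    """One example as a 6-tuple; every field name here is hypothetical."""
    image_path: str            # RGB scene image
    descriptions: List[str]    # language descriptions of the scene
    scene_graph: List[Triplet] # scene graph as relationship triplets
    object_boxes: List[Box]    # object bounding boxes
    grasp_boxes: List[Box]     # grasping bounding boxes
    surfaces: List[str]        # surface attributes, assumed one per grasp box

    def as_tuple(self):
        """Return the fields in the order listed in the 6-tuple."""
        return (self.image_path, self.descriptions, self.scene_graph,
                self.object_boxes, self.grasp_boxes, self.surfaces)

    def grasp_surface_pairs(self):
        """Pair each grasp box with its surface attribute (assumed parallel lists)."""
        return list(zip(self.grasp_boxes, self.surfaces))


# Made-up values, not taken from the dataset.
example = LVMRDExample(
    image_path="scene.jpg",
    descriptions=["a cup on a box"],
    scene_graph=[("cup", "on", "box")],
    object_boxes=[(0.0, 0.0, 10.0, 10.0)],
    grasp_boxes=[(1.0, 1.0, 5.0, 3.0)],
    surfaces=["flat"],
)
```

Keeping the grasp boxes and surface attributes as parallel lists mirrors the statement that surface attributes are paired with grasp bounding boxes, but that pairing scheme is an assumption of this sketch.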
Image_object_caption: contains the RGB scene image; the scene language description; and each object's name, index, bounding box, and stacking tree from the original VMRD.
Grasp_axis_align: contains axis-aligned grasping bounding boxes with rotation angle labels.
Scene_graph: contains scene triplet data for the scene graph generation model and the selected grasping object labels, based on the scene graph, for the RGCN in our paper. We also provide the wrapped object bounding boxes used in our scene-based model.
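For the Grasp_axis_align entries, an axis-aligned box plus a rotation angle can be turned back into the four corners of an oriented grasp rectangle. The sketch below is one way to do that; it assumes the angle is in degrees and that the box is rotated about its own center with the standard 2D rotation matrix. The function name and both conventions are assumptions, so check README.md for the conventions the dataset actually uses.

```python
import math


def rotated_corners(xmin, ymin, xmax, ymax, angle_deg):
    """Rotate an axis-aligned box about its center (hypothetical helper).

    Assumes angle_deg is in degrees and applies the standard rotation
    matrix [[cos, -sin], [sin, cos]] to each corner offset.
    """
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in corners
    ]
```

With an angle of 0 the function returns the original corners unchanged, which is a quick sanity check when verifying the angle convention against a few labeled images.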
The file structure is described in README.md.
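The scene triplet data in Scene_graph can be thought of as labeled edges between objects. As a minimal sketch, assuming each triplet is a (subject, relation, object) tuple of strings, the helper below maps names to integer ids and produces a typed edge list of the kind a relational graph network typically consumes. The function name, triplet format, and sample labels are hypothetical and are not taken from the dataset or the paper's code.

```python
def triplets_to_edges(triplets):
    """Convert (subject, relation, object) triplets into integer edges.

    Returns (node_ids, relation_ids, edges), where edges is a list of
    (subject_id, relation_id, object_id). Ids are assigned in order of
    first appearance.
    """
    node_ids = {}
    relation_ids = {}
    edges = []
    for subj, rel, obj in triplets:
        s = node_ids.setdefault(subj, len(node_ids))
        o = node_ids.setdefault(obj, len(node_ids))
        r = relation_ids.setdefault(rel, len(relation_ids))
        edges.append((s, r, o))
    return node_ids, relation_ids, edges


# Made-up triplets for illustration only.
nodes, relations, edges = triplets_to_edges(
    [("cup", "on", "box"), ("box", "under", "cup")]
)
```

Assigning ids by first appearance keeps the mapping deterministic for a given triplet order, which makes it easy to map predicted node indices back to object names.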