GraspClutter6D - DATASET

GraspClutter6D Dataset & Resources

Overview

GraspClutter6D is a comprehensive dataset for 1) 6-DoF grasp detection, 2) 6D object pose estimation, and 3) instance segmentation in highly cluttered real-world scenes.

Key Statistics:

1,000 cluttered scenes across 75 environment setups (bins, shelves, tables)
52,000 RGB-D images from 4 cameras (RealSense D415, RealSense D435, Azure Kinect, Zivid)
200 unique objects with an average of 14.1 objects per scene
736K 6D object pose annotations and 9.3B 6-DoF grasp poses
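These numbers imply 52 images per scene (52,000 / 1,000), and the directory layout in the Dataset Format section assigns image IDs to cameras in round-robin order (IDs 1, 5, 9, ... for the D415; 2, 6, 10, ... for the D435; and so on). A minimal sketch of that mapping, assuming the ID pattern shown in the layout holds for every scene; the names `camera_for_image` and `image_ids_for_camera` are hypothetical helpers, not part of the official API:

```python
# Camera order as listed in the directory layout (image IDs 1, 2, 3, 4 in each scene).
CAMERAS = ("RealSense D415", "RealSense D435", "Azure Kinect", "Zivid")
IMAGES_PER_SCENE = 52_000 // 1_000  # 52 images per scene, i.e. 13 per camera


def camera_for_image(im_id: int) -> str:
    """Return the camera that captured the 1-based image `im_id` within a scene."""
    if not 1 <= im_id <= IMAGES_PER_SCENE:
        raise ValueError(f"image ID must be in 1..{IMAGES_PER_SCENE}, got {im_id}")
    # IDs cycle through the four cameras: 1 -> D415, 2 -> D435, 3 -> Kinect, 4 -> Zivid, 5 -> D415, ...
    return CAMERAS[(im_id - 1) % len(CAMERAS)]


def image_ids_for_camera(camera: str) -> list[int]:
    """Return all image IDs in a scene that belong to one camera."""
    start = CAMERAS.index(camera) + 1
    return list(range(start, IMAGES_PER_SCENE + 1, len(CAMERAS)))
```

For example, `camera_for_image(52)` returns `"Zivid"`, and `image_ids_for_camera("RealSense D415")` returns the 13 IDs 1, 5, ..., 49.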

Resources

Dataset [Link] - Data and annotations (compatible with the GraspNet-1B and BOP dataset formats)
GraspClutter6D API [Link] - Official API for loading and evaluating 6-DoF grasps
Annotation Tools [Link] - 6D object pose annotation toolkit
Objects & Furniture [Link] - Purchase the physical objects and furniture used in our dataset
Pose Estimation Tools [coming soon] - Toolkits for 6D object pose estimation
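Because the annotations follow the BOP convention (poses and models in millimetres) while the dataset also ships GraspNet-1B-style models in metres (`models_m/`, `models_obj_m/`), poses often need a unit conversion when moving between the two toolchains. A minimal sketch, assuming the standard BOP pose fields `cam_R_m2c` (nine rotation values, row-major) and `cam_t_m2c` (translation in mm); `bop_pose_to_matrix` is a hypothetical helper, not a function of the GraspClutter6D API:

```python
def bop_pose_to_matrix(cam_R_m2c, cam_t_m2c, to_meters=False):
    """Build a 4x4 model-to-camera transform (nested lists) from BOP-style pose fields.

    cam_R_m2c: 9 rotation values in row-major order.
    cam_t_m2c: 3 translation values in millimetres (BOP convention).
    to_meters: if True, divide the translation by 1000 so the pose matches
               meter-unit models such as those in models_m/.
    """
    if len(cam_R_m2c) != 9 or len(cam_t_m2c) != 3:
        raise ValueError("expected 9 rotation values and 3 translation values")
    # Split the flat row-major rotation into three rows.
    R = [[float(v) for v in cam_R_m2c[i:i + 3]] for i in range(0, 9, 3)]
    # Only the translation carries units; the rotation is unitless.
    t = [float(v) / 1000.0 if to_meters else float(v) for v in cam_t_m2c]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]
```

Only the translation column is rescaled; the rotation block is identical in both conventions.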

Dataset Format

After downloading and extracting the dataset, you'll find the following directory structure:

-----

GraspClutter6D/
├── split_info/ # Dataset splits for different evaluation setups
│   ├── grasp_train_scene_ids.json # Train scene IDs for cross-object setup (YCB-HOPE objects)
│   ├── grasp_test_scene_ids.json # Test scene IDs for cross-object setup (non-YCB-HOPE objects)
│   ├── ycbv_train_scene_ids.json # Train scene IDs for intra-object setup (YCB objects)
│   ├── ycbv_test_scene_ids.json # Test scene IDs for intra-object setup (YCB objects)
│   └── obj_ids_per_scene.json # Metadata for custom split creation
│
├── scenes/ # Scene data organized by scene ID
│   ├── 000000/ # Scene 000000
│   │   ├── rgb/ # RGB images from multiple cameras (8-bit PNG files)
│   │   │   ├── 000001.png # Camera 1: RealSense D415 (IDs: 1,5,9,...)
│   │   │   ├── 000002.png # Camera 2: RealSense D435 (IDs: 2,6,10,...)
│   │   │   ├── ... # Camera 3: Azure Kinect (IDs: 3,7,11,...)
│   │   │   └── 000052.png # Camera 4: Zivid (IDs: 4,8,12,...)
│   │   │
│   │   ├── depth/ # Depth images (16-bit unsigned short, in mm)
│   │   │
│   │   ├── mask/ # Amodal masks for each instance (BOP format)
│   │   │   ├── 000001_000000.png # Binary mask for 0th instance in image 000001
│   │   │   ├── 000001_000001.png # Binary mask for 1st instance in image 000001
│   │   │   └── ...
│   │   │
│   │   ├── visible_mask/ # Visible masks for each instance (BOP format)
│   │   │
│   │   ├── label/ # Semantic masks (GraspNet-1B format)
│   │   │   ├── 000001.png # Semantic mask (0: background, 1-200: object ID)
│   │   │   └── ...
│   │   │
│   │   ├── scene_camera.json # Camera poses, intrinsic parameters, depth scale (BOP format)
│   │   ├── scene_gt.json # 6D object poses (BOP format)
│   │   └── scene_gt_info.json # Metadata for 6D object metric calculation
│   │
│   ├── 000001/ # Scene 000001
│   └── ... # Scenes 000002 through 000999
│
├── models/ # High-quality 3D object models in mm units (PLY format, BOP compatible)
│   ├── obj_000001.ply
│   └── ...
│
├── models_eval/ # Simplified 3D object models for evaluation (PLY format, mm units)
│
├── models_obj/ # High-quality 3D object models (OBJ format, mm units)
│
├── models_obj_eval/ # Simplified 3D object models for evaluation (OBJ format, mm units)
│
├── models_m/ # High-quality 3D object models (PLY format, meter units, GraspNet-1B format)
│
├── models_obj_m/ # 3D object models (OBJ format, meter units, GraspNet-1B compatible)
│
├── grasp_label/ # 6-DoF grasp pose labels (GraspNet-1B format)
│   ├── obj_000001_labels.npz
│   └── ...
│
└── dex_models/ # Pre-processed models for fast evaluation (optional but recommended)
    ├── obj_000001.pkl
    └── ...
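The layout above can be navigated with a few path helpers. A minimal sketch, assuming (1) each `split_info/*_scene_ids.json` file holds a flat JSON list of integer scene IDs, (2) `depth/` uses the same six-digit file names as `rgb/` and `label/`, and (3) the standard BOP convention that `scene_gt.json` and `scene_camera.json` are keyed by image ID as strings, with the order of annotations in each list defining the instance index used in mask file names. All function names here are hypothetical and not part of the official GraspClutter6D API:

```python
import json
from pathlib import Path


def scene_dir(root, scene_id):
    """Path to scenes/<6-digit scene ID>/ under the dataset root."""
    return Path(root) / "scenes" / f"{scene_id:06d}"


def load_split(root, split_name):
    """Load scene IDs for a split such as "grasp_train" or "ycbv_test".

    ASSUMPTION: the file is a JSON list of integer scene IDs.
    """
    with open(Path(root) / "split_info" / f"{split_name}_scene_ids.json") as f:
        return [int(s) for s in json.load(f)]


def image_files(root, scene_id, im_id):
    """RGB, depth, and semantic-label paths for one image of one scene."""
    d = scene_dir(root, scene_id)
    name = f"{im_id:06d}.png"
    # ASSUMPTION: depth/ shares file names with rgb/ and label/.
    return {"rgb": d / "rgb" / name, "depth": d / "depth" / name, "label": d / "label" / name}


def instances(root, scene_id, im_id, scene_gt=None):
    """Yield (inst_idx, obj_id, amodal_mask_path, visible_mask_path) for one image.

    scene_gt: an already-loaded scene_gt.json dict; read from disk if None.
    """
    d = scene_dir(root, scene_id)
    if scene_gt is None:
        with open(d / "scene_gt.json") as f:
            scene_gt = json.load(f)
    # BOP keys images by string ID; list position is the instance index.
    for inst_idx, ann in enumerate(scene_gt[str(im_id)]):
        mask_name = f"{im_id:06d}_{inst_idx:06d}.png"
        yield inst_idx, ann["obj_id"], d / "mask" / mask_name, d / "visible_mask" / mask_name


def depth_scale(root, scene_id, im_id):
    """Read depth_scale for one image from scene_camera.json (BOP format)."""
    with open(scene_dir(root, scene_id) / "scene_camera.json") as f:
        return float(json.load(f)[str(im_id)]["depth_scale"])
```

Under the BOP convention, multiplying a raw depth value by `depth_scale` gives depth in millimetres.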

License

All data and resources are licensed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).