GraspClutter6D Dataset & Resources
GraspClutter6D is a comprehensive dataset for 1) 6-DoF grasp detection, 2) 6D object pose estimation, and 3) instance segmentation in highly cluttered real-world scenes.
1,000 cluttered scenes across 75 environment setups (bins, shelves, tables)
52,000 RGB-D images from 4 cameras (RealSense D415, RealSense D435, Azure Kinect, Zivid)
200 unique objects with an average of 14.1 objects per scene
736K 6D object pose annotations and 9.3B 6-DoF grasp poses
Dataset [Link] - Data and annotations (compatible with the GraspNet-1B and BOP dataset formats)
GraspClutter6D API [Link] - Official API for loading and evaluating 6-DoF grasps
Annotation Tools [Link] - 6D object pose annotation toolkit
Objects & Furniture [Link] - Purchase the physical objects and furniture used in our dataset
Pose Estimation Tools [coming soon] - Toolkits for 6D object pose estimation
After downloading and extracting the dataset, you'll find the following directory structure:
-----
GraspClutter6D/
├── split_info/ # Dataset splits for different evaluation setups
│ ├── grasp_train_scene_ids.json # Train scene IDs for cross-object setup (YCB-HOPE objects)
│ ├── grasp_test_scene_ids.json # Test scene IDs for cross-object setup (non-YCB-HOPE objects)
│ ├── ycbv_train_scene_ids.json # Train scene IDs for intra-object setup (YCB objects)
│ ├── ycbv_test_scene_ids.json # Test scene IDs for intra-object setup (YCB objects)
│ └── obj_ids_per_scene.json # Metadata for custom split creation
│
├── scenes/ # Scene data organized by scene ID
│ ├── 000000/ # Scene 000000
│ │ ├── rgb/ # RGB images from multiple cameras (8-bit PNG files)
│ │ │ ├── 000001.png # Camera 1: RealSense D415 (IDs: 1,5,9,...)
│ │ │ ├── 000002.png # Camera 2: RealSense D435 (IDs: 2,6,10,...)
│ │ │ ├── ... # Camera 3: Azure Kinect (IDs: 3,7,11,...)
│ │ │ └── 000052.png # Camera 4: Zivid (IDs: 4,8,12,...)
│ │ │
│ │ ├── depth/ # Depth images (16-bit unsigned short, in mm)
│ │ │
│ │ ├── mask/ # Amodal masks for each instance (BOP format)
│ │ │ ├── 000001_000000.png # Binary mask for 0th instance in image 000001
│ │ │ ├── 000001_000001.png # Binary mask for 1st instance in image 000001
│ │ │ └── ...
│ │ │
│ │ ├── visible_mask/ # Visible masks for each instance (BOP format)
│ │ │
│ │ ├── label/ # Semantic masks (GraspNet-1B format)
│ │ │ ├── 000001.png # Semantic mask (0: background, 1-200: object ID)
│ │ │ └── ...
│ │ │
│ │ ├── scene_camera.json # Camera poses, intrinsic parameters, depth scale (BOP format)
│ │ ├── scene_gt.json # 6D object poses (BOP format)
│ │ └── scene_gt_info.json # Metadata for 6D object metric calculation
│ │
│ ├── 000001/ # Scene 000001
│ └── ... # Scenes 000002 through 000999
│
├── models/ # High-quality 3D object models in mm units (PLY format, BOP compatible)
│ ├── obj_000001.ply
│ └── ...
│
├── models_eval/ # Simplified 3D object models for evaluation (PLY format, mm units)
│
├── models_obj/ # High-quality 3D object models (OBJ format, mm units)
│
├── models_obj_eval/ # Simplified 3D object models for evaluation (OBJ format, mm units)
│
├── models_m/ # High-quality 3D object models (PLY format, meter units, GraspNet-1B format)
│
├── models_obj_m/ # 3D object models (OBJ format, meter units, GraspNet-1B compatible)
│
├── grasp_label/ # 6-DoF grasp pose labels (GraspNet-1B format)
│ ├── obj_000001_labels.npz
│ └── ...
│
└── dex_models/ # Pre-processed models for fast evaluation (optional but recommended)
  ├── obj_000001.pkl
  └── ...
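The layout above implies a few conventions that are easy to get wrong in a loader: image IDs are 1-based and cycle through the four cameras, and scene and image names are zero-padded to six digits. The following is a minimal sketch of those conventions using only the Python standard library; it is not the official GraspClutter6D API. The function names (`camera_for_image`, `frame_paths`, `instance_mask_path`, `pose_to_matrix`) are hypothetical. It also assumes that files in `depth/` and `visible_mask/` are named like their `rgb/` and `mask/` counterparts, and that `scene_gt.json` uses the standard BOP keys `cam_R_m2c` (row-major 3x3 rotation) and `cam_t_m2c` (translation in mm).

```python
import json
from pathlib import Path

# Camera order taken from the directory tree: IDs 1,5,9,... are the D415,
# IDs 2,6,10,... the D435, IDs 3,7,11,... the Azure Kinect, IDs 4,8,12,... the Zivid.
CAMERAS = ("RealSense D415", "RealSense D435", "Azure Kinect", "Zivid")


def camera_for_image(image_id: int) -> str:
    """Return the camera that captured a 1-based image ID."""
    if image_id < 1:
        raise ValueError("image IDs start at 1")
    return CAMERAS[(image_id - 1) % len(CAMERAS)]


def frame_paths(root, scene_id: int, image_id: int) -> dict:
    """Build per-image paths for one frame.

    Assumption: depth/ and label/ use the same file names as rgb/.
    """
    scene = Path(root) / "scenes" / f"{scene_id:06d}"
    name = f"{image_id:06d}.png"
    return {
        "rgb": scene / "rgb" / name,
        "depth": scene / "depth" / name,
        "label": scene / "label" / name,
    }


def instance_mask_path(root, scene_id: int, image_id: int,
                       instance_idx: int, visible: bool = False) -> Path:
    """Path of one instance mask, named <image_id>_<instance_idx>.png.

    Assumption: visible_mask/ follows the same naming as mask/.
    """
    folder = "visible_mask" if visible else "mask"
    scene = Path(root) / "scenes" / f"{scene_id:06d}"
    return scene / folder / f"{image_id:06d}_{instance_idx:06d}.png"


def pose_to_matrix(entry: dict) -> list:
    """Convert one BOP-style ground-truth entry to a 4x4 model-to-camera matrix.

    Assumption: standard BOP keys, rotation stored row-major, translation in mm.
    """
    r, t = entry["cam_R_m2c"], entry["cam_t_m2c"]
    return [
        [r[0], r[1], r[2], t[0]],
        [r[3], r[4], r[5], t[1]],
        [r[6], r[7], r[8], t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def load_poses(root, scene_id: int, image_id: int) -> list:
    """Read (obj_id, 4x4 pose) pairs for one image from scene_gt.json."""
    gt_file = Path(root) / "scenes" / f"{scene_id:06d}" / "scene_gt.json"
    with open(gt_file) as f:
        scene_gt = json.load(f)
    # BOP files key each image by its ID written as a string without padding.
    return [(inst["obj_id"], pose_to_matrix(inst))
            for inst in scene_gt[str(image_id)]]
```

For example, `camera_for_image(7)` resolves to the Azure Kinect, and `frame_paths("GraspClutter6D", 0, 1)["rgb"]` points at `GraspClutter6D/scenes/000000/rgb/000001.png`. Because the poses are in millimeters, they pair with the models in `models/` rather than the meter-unit models in `models_m/`.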
All data and resources are for non-commercial use only and are licensed under the Creative Commons Attribution 4.0 Non-Commercial License (BY-NC-SA).