Download Links

Training set: 

Validation set: 

Test set: 

Additional resources: 

README

HouseCat6D is an indoor dataset focusing mainly on two tasks: object pose estimation and grasp pose estimation. The structure of each scene is as follows:

- 📁 scene01

   - 📁 camera_pose # each camera pose in the trajectory

     - 📄 000000.txt ... 000290.txt # 4x4 matrices

   - 📁 depth/instance/nocs/normal/pol/RGB # depth maps in D435 style / instance masks / NOCS correspondence maps / surface normals of objects / polarization images (4 sub-images each) / RGB images

     - 📄 000000.png ... 000290.png

   - 📁 grasps (optional) # grasp labels for several scenes

     - 📁 polarization # grasps under the camera frame

       - 📄 000000.h5 ... 000290.h5

     - 📄 grasps_info_base.h5 # grasps under the tracker frame

   - 📁 labels # pose labels

     - 📄 000000_label.pkl ... 000290_label.pkl

   - 📁 obj_pose_final # object poses under the tracker frame (constant for each scene)

     - 📄 box-kleenex.txt ... remote-grey.txt

   - 📁 occlusion # occlusion information for each object at each camera pose

     - 📄 000000.json ... 000290.json

   - 📄 intrinsics.txt # camera intrinsics

   - 📄 meta.txt # class information
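
For orientation, here is a minimal sketch of loading one camera pose and the intrinsics with NumPy. It assumes both .txt files store whitespace-delimited matrices (a 4x4 pose and a 3x3 intrinsic matrix), which the listing above does not confirm.

```python
# Minimal sketch: load one camera pose and the intrinsics
# (assumed to be whitespace-delimited matrices in the .txt files).
import numpy as np

scene = "/path/to/scene01"

cam_pose = np.loadtxt(f"{scene}/camera_pose/000000.txt")  # assumed 4x4 pose
K = np.loadtxt(f"{scene}/intrinsics.txt")                 # assumed 3x3 intrinsics

print(cam_pose.shape, K.shape)
```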

Object Pose Information 

The dataset contains 34 scenes for training, 2 scenes for validation, and 5 scenes for testing. The object pose annotations follow the NOCS [1] convention. We provide ground-truth bounding boxes and masks for all splits.

📁 labels: Each label file contains ground-truth information under the current camera frame:

- instance_ids: object instance IDs.

- class_ids: object class IDs.

- bboxes: the 2D bounding box of each object.

- translations: the translation of each object.

- rotations: the rotation of each object.

- gt_scales: the sizes of the 3D bounding boxes.
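
As a quick illustration, a label file can be read as in the sketch below; it assumes each .pkl stores a Python dict keyed by the field names above, which is an assumption rather than a documented guarantee.

```python
# Minimal sketch: read one pose label (assumed to be a pickled dict
# keyed by the field names listed above).
import pickle

with open("/path/to/scene01/labels/000000_label.pkl", "rb") as f:
    label = pickle.load(f)

print(label["instance_ids"])   # per-object instance IDs
print(label["rotations"])      # per-object rotations
print(label["translations"])   # per-object translations
print(label["gt_scales"])      # 3D bounding-box sizes
```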

📄 meta.txt: Each line contains 1) the instance id in this scene, 2) the class id in the whole dataset, and 3) the textual name of the object, e.g., "box-kleenex".
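
A meta.txt line can then be parsed as below; the sketch assumes the three fields are whitespace-separated, which the description above suggests but does not state.

```python
# Minimal sketch: parse meta.txt, assuming whitespace-separated lines of
# the form "<instance_id> <class_id> <object_name>".
def read_meta(path):
    objects = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:  # skip blank/malformed lines
                objects.append((int(parts[0]), int(parts[1]), parts[2]))
    return objects

for inst_id, cls_id, name in read_meta("/path/to/scene01/meta.txt"):
    print(inst_id, cls_id, name)
```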

Grasp Pose Information

We curate grasp labels for 14 scenes (01, 02, 03, 04, 05, 09, 19, 21, 22, 23, 24, 25, 33, 34) as the training set, 1 scene (val_scene1) as the validation set, and 1 scene (test_scene1) as the test set. The dataset format and the gripper model are kept as close as possible to those of ACRONYM [2], DA2 [3], and MonoGraspNet [4].

Every grasp file is in .h5 format under the 📁 grasps folder of each scene. The file 📄 grasps_info_base.h5 contains grasps under the tracker base (world frame). The subfolders, e.g., 📁 polarization, contain grasps mapped to each polarization camera frame in the trajectory. Below is a demonstration of what a grasp file contains:

>> h5ls -r /path/to/scene/grasps/grasps_info_base.h5

/                        Group

/cup_stanford            Group

/cup_stanford/angles     Dataset {110, 1}

/cup_stanford/axis       Dataset {110, 3}

/cup_stanford/centers    Dataset {110, 3}

/cup_stanford/end_points Dataset {110, 2, 3}

/cup_stanford/grasp_points Dataset {110, 2, 3}

/cup_stanford/qualities  Group

/cup_stanford/qualities/object_in_gripper Dataset {110, 1}

/cup_stanford/transforms Dataset {110, 4, 4}

...


/glass_green_bottle      Group

/glass_green_bottle/angles Dataset {428, 1}

/glass_green_bottle/axis Dataset {428, 3}

/glass_green_bottle/centers Dataset {428, 3}

/glass_green_bottle/end_points Dataset {428, 2, 3}

/glass_green_bottle/grasp_points Dataset {428, 2, 3}

/glass_green_bottle/qualities Group

/glass_green_bottle/qualities/object_in_gripper Dataset {428, 1}

/glass_green_bottle/transforms Dataset {428, 4, 4}

/cup_stanford: the object name (one group per object).

/axis: directions of finger closing.

/angles: angles of rotation around the axis.

/centers: centers of the two jaws.

/end_points: coordinates of the end fingers of the two jaws (wide open).

/grasp_points: contact positions between the fingers and the object.

/qualities/object_in_gripper: success labels checked in Isaac Gym.

/transforms: 6-DoF poses of the two jaws.
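
For example, the grasp annotations can be loaded with h5py as sketched below, following the layout shown in the h5ls listing above.

```python
# Minimal sketch: iterate over the per-object grasp groups in a grasp file,
# following the layout shown in the h5ls listing above.
import h5py

with h5py.File("/path/to/scene/grasps/grasps_info_base.h5", "r") as f:
    for obj_name, grp in f.items():                      # one group per object
        transforms = grp["transforms"][:]                # (N, 4, 4) jaw poses
        success = grp["qualities/object_in_gripper"][:]  # (N, 1) success labels
        print(obj_name, transforms.shape, int(success.sum()), "successful")
```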

Object Mesh Information

We also provide object meshes for reference (see the Download Links). Each folder represents one category and contains a mesh file in .obj format, a material file in .mtl format (optional), a 2D rendered image (optional), and a URDF file for simulation. The folder 📁 collision contains convex decompositions of the meshes generated with V-HACD for potential use on simulation platforms, e.g., PyBullet.
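
As a usage sketch, a provided URDF can be dropped into PyBullet as below; the path and file name model.urdf are hypothetical placeholders, not the dataset's actual naming.

```python
# Minimal sketch: load one object URDF into PyBullet (headless).
# "box-kleenex/model.urdf" is a hypothetical path, not the dataset's naming.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                    # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("plane.urdf")                               # ground plane
obj_id = p.loadURDF("/path/to/meshes/box-kleenex/model.urdf")
print(p.getBodyInfo(obj_id))
p.disconnect()
```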

References

[1] Wang, H., Sridhar, S., Huang, J., Valentin, J., Song, S., & Guibas, L. J. (2019). Normalized object coordinate space for category-level 6D object pose and size estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2642-2651).

[2] Eppner, C., Mousavian, A., & Fox, D. (2021). ACRONYM: A large-scale grasp dataset based on simulation. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 6222-6227). IEEE.

[3] Zhai, G., Zheng, Y., Xu, Z., Kong, X., Liu, Y., Busam, B., ... & Zhang, Z. (2022). DA2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping. IEEE Robotics and Automation Letters, 7(4), 8941-8948.

[4] Zhai, G., Huang, D., Wu, S. C., Jung, H., Di, Y., Manhardt, F., ... & Busam, B. (2023). MonoGraspNet: 6-DoF grasping with a single RGB image. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 1708-1714). IEEE.