COB-3D-v2 Dataset Format

This page describes the format of the COB-3D-v2 dataset.

The tar file contains the following directory structure:

v2/
|-- dset.json
|-- scenes/
|   |-- {scene_id}.npz
|-- meshes/
|   |-- {mesh_id}.stl

The train/val split is stored in v2/dset.json, a JSON file in the following format:

{
    "train": [
        "scene_id_1",
        "scene_id_2",
        ...
    ],
    "val": [
        "scene_id_X",
        ...
    ]
}

There are 6955 scenes in total (6259 train, 696 val).
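As a minimal sketch of reading the split, the snippet below loads dset.json and builds the path to a scene file. The helper names `load_split` and `scene_path` and the `dataset_root` argument (the path to the extracted v2/ directory) are hypothetical, introduced only for illustration.

```python
import json
from pathlib import Path


def load_split(dataset_root):
    """Return (train_ids, val_ids) as listed in <dataset_root>/dset.json."""
    with open(Path(dataset_root) / "dset.json") as f:
        split = json.load(f)
    return split["train"], split["val"]


def scene_path(dataset_root, scene_id):
    """Build the path to the NPZ file of one scene."""
    return Path(dataset_root) / "scenes" / f"{scene_id}.npz"
```

On the full dataset, `len(train_ids)` and `len(val_ids)` should come out as 6259 and 696, which makes a quick sanity check after extraction.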

Each scene is contained in an NPZ file: v2/scenes/{scene_id}.npz
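A quick way to inspect a scene file is to list the arrays it holds. The sketch below uses `numpy.load` and reports whatever array names the file contains. How nested fields are keyed inside the archive (for example as slash-separated names) is not stated on this page, so the code does not assume any particular key. The function name `summarize_scene` is hypothetical.

```python
import numpy as np


def summarize_scene(npz_path):
    """Map each array name in a scene NPZ file to its (shape, dtype)."""
    summary = {}
    # allow_pickle is left at NumPy's default (False). If a field was saved
    # as an object array, reading it would require allow_pickle=True; whether
    # any field in this dataset needs that is an open assumption.
    with np.load(npz_path) as scene:
        for name in scene.files:
            arr = scene[name]
            summary[name] = (arr.shape, arr.dtype)
    return summary
```

Printing the returned dictionary for one scene gives a direct comparison against the shapes and dtypes documented on this page.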

The structure of each NPZ file is outlined below. The mesh_ids, obj_poses, and voxel_grid fields were added after the v1 release to enable training shape completion models.

v2/scenes/{scene_id}.npz:
|-- rgb: The rendered RGB image.
|       Shape (3, H, W), dtype float32, values scaled to [0, 1].
|-- intrinsic: The camera intrinsics.
|       Shape (3, 3), dtype float32.
|-- depth_map: The rendered depth map corresponding to `rgb`.
|       Shape (H, W), dtype float32.
|-- normal_map: The rendered normal map corresponding to `rgb`.
|       Shape (3, H, W), dtype float32.
|-- near_plane: The minimum depth value of the scene's working volume.
|       Scalar, float32.
|-- far_plane: The maximum depth value of the scene's working volume.
|       Scalar, float32.
|-- segm/
|   |-- boxes: 2D bounding boxes for each object in the scene.
|   |       Shape (N_objects, 4), dtype float32.
|   |       These are pixel coordinates relative to `rgb`.
|   |       The box format is `[x_low, y_low, x_high, y_high]`.
|   |-- masks: Binary masks for each object in the scene.
|   |       Shape (N_objects, H, W), dtype bool.
|   |-- amodal_masks: Amodal instance masks for each object.
|   |       Shape (N_objects, H, W), dtype bool.
|-- bbox3d/
|   |-- poses: The pose of each object's 3D bounding box, as a 4x4 matrix.
|   |       This is the transform from the bbox frame to the camera frame.
|   |       Shape (N_objects, 4, 4), dtype float32.
|   |-- dimensions: The dimensions of each 3D bounding box.
|   |       Shape (N_objects, 3), dtype float32.
|   |-- corners: The corner points of each 3D bounding box, in the camera frame.
|   |       Shape (N_objects, 8, 3), dtype float32.
|-- mesh_ids: The mesh_id of each object.
|       These correspond to the STL files in `v2/meshes/{mesh_id}.stl`.
|       List[str], length N_objects.
|-- obj_poses/
|   |-- poses: The pose of each mesh, as a 4x4 matrix.
|   |       This is the transform from the mesh frame to the camera frame.
|   |       Note that the mesh frame does not necessarily equal the bbox frame!
|   |       Shape (N_objects, 4, 4), dtype float32.
|   |-- scales: The scale of each mesh.
|   |       Shape (N_objects, 3), dtype float32.
|-- voxel_grid/
|   |-- voxels: The surface of each mesh, extracted into a voxel grid.
|   |       Shape (N_objects, n_voxels, n_voxels, n_voxels), dtype bool.
|   |-- extents: The extents of each object's voxel grid.
|   |       `voxel_grid/voxels[i]` spans the cuboid `[-extents[i], extents[i]]`
|   |       in the object frame `obj_poses/poses[i]`.
|   |       Shape (N_objects, 3), dtype float32.
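The `depth_map` and `intrinsic` fields can be combined to recover a camera-frame point for each pixel. The sketch below is one way to do that under assumptions this page does not confirm: a pinhole camera with no distortion, depth measured along the optical (z) axis rather than along the viewing ray, and pixel coordinates (u, v) equal to (column, row). The function name `backproject_depth` is hypothetical.

```python
import numpy as np


def backproject_depth(depth_map, intrinsic):
    """Lift an (H, W) depth map to an (H, W, 3) array of camera-frame points.

    Assumes a pinhole model with z-depth; these are assumptions, not facts
    taken from the dataset description.
    """
    h, w = depth_map.shape
    fx, fy = intrinsic[0, 0], intrinsic[1, 1]
    cx, cy = intrinsic[0, 2], intrinsic[1, 2]
    # v indexes rows, u indexes columns.
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = (u - cx) * depth_map / fx
    y = (v - cy) * depth_map / fy
    return np.stack([x, y, depth_map], axis=-1)
```

If the depth values turn out to be ray distances instead, each point would need to be rescaled along its viewing ray, so it is worth checking the result against `bbox3d/corners` on a known scene.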
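To make the relationship between `voxel_grid/voxels`, `voxel_grid/extents`, and `obj_poses/poses` concrete, the sketch below converts the occupied voxels of one object into camera-frame points. It assumes that the three voxel array axes map to the object-frame x, y, z axes in that order, that each point is taken at the center of its cell, and that `obj_poses/scales` does not need to be applied because the grid is already expressed in the object frame. None of these details is confirmed on this page, and the function name `voxel_centers_in_camera` is hypothetical.

```python
import numpy as np


def voxel_centers_in_camera(voxels, extents, pose):
    """Return (M, 3) camera-frame centers of the occupied voxels of one object.

    voxels:  (n, n, n) bool grid spanning [-extents, extents] in the object frame.
    extents: (3,) half-size of the grid along each axis.
    pose:    (4, 4) transform from the object frame to the camera frame.
    """
    extents = np.asarray(extents, dtype=np.float64)
    n = np.array(voxels.shape, dtype=np.float64)
    idx = np.argwhere(voxels)                    # (M, 3) integer grid indices
    cell = 2.0 * extents / n                     # cell size along each axis
    centers_obj = -extents + (idx + 0.5) * cell  # cell centers, object frame
    # Apply the rigid transform: rotate, then translate.
    return centers_obj @ pose[:3, :3].T + pose[:3, 3]
```

For object `i`, the inputs would be `voxels[i]`, `extents[i]`, and `poses[i]` from the corresponding fields of a scene.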