COB-3D-v2 Dataset Format
This page describes the format of the COB-3D-v2 dataset (download).
The tar file contains the following directory structure:
v2/
|-- dset.json
|-- scenes/
|   |-- {scene_id}.npz
|-- meshes/
    |-- {mesh_id}.stl
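The sketch below shows one way to work with this layout in Python. It assumes the tar file has been extracted so that `root` points at the `v2/` directory, and it recovers scene and mesh ids from the file names. `list_assets`, `scene_path` and `mesh_path` are hypothetical helper names, not part of any official loader.

```python
from pathlib import Path


def list_assets(root):
    """Return (scene_ids, mesh_ids) found under an extracted v2/ directory.

    Assumes the layout v2/scenes/{scene_id}.npz and v2/meshes/{mesh_id}.stl;
    each id is the file name without its final extension.
    """
    root = Path(root)
    scene_ids = sorted(p.stem for p in (root / "scenes").glob("*.npz"))
    mesh_ids = sorted(p.stem for p in (root / "meshes").glob("*.stl"))
    return scene_ids, mesh_ids


def scene_path(root, scene_id):
    # Path of the NPZ file holding one scene.
    return Path(root) / "scenes" / f"{scene_id}.npz"


def mesh_path(root, mesh_id):
    # Path of the STL file holding one mesh.
    return Path(root) / "meshes" / f"{mesh_id}.stl"
```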
The dataset split is stored in v2/dset.json, a JSON file in the following format:
{
    "train": [
        "scene_id_1",
        "scene_id_2",
        ...
    ],
    "val": [
        "scene_id_X",
        ...
    ]
}
There are 6955 scenes in total (6259 train, 696 val).
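A minimal sketch for reading the split with Python's standard `json` module. It assumes `root` points at the extracted `v2/` directory. It also checks that the two splits are disjoint; that is a sanity check added here, not a documented guarantee. `load_split` and `split_paths` are hypothetical names.

```python
import json
from pathlib import Path


def load_split(root):
    """Load v2/dset.json and return {"train": [...], "val": [...]}.

    Raises ValueError if any scene id appears in both splits.
    """
    with open(Path(root) / "dset.json") as f:
        split = json.load(f)
    train, val = list(split["train"]), list(split["val"])
    overlap = set(train) & set(val)
    if overlap:
        raise ValueError(f"{len(overlap)} scene ids appear in both splits")
    return {"train": train, "val": val}


def split_paths(root, split, name):
    # NPZ paths for every scene in one split ("train" or "val").
    return [Path(root) / "scenes" / f"{sid}.npz" for sid in split[name]]
```

On the full release, `len(split["train"])` and `len(split["val"])` should come out as 6259 and 696.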
Each scene is contained in an NPZ file: v2/scenes/{scene_id}.npz
The structure of each NPZ file is outlined below. The mesh_ids, obj_poses, and voxel_grid fields were added after the v1 release to support training shape completion models.
v2/scenes/{scene_id}.npz:
|-- rgb: The rendered RGB image.
|       Shape (3, H, W), dtype float32, values scaled to [0, 1].
|-- intrinsic: The camera intrinsics.
|       Shape (3, 3), dtype float32.
|-- depth_map: The rendered depth map corresponding to `rgb`.
|       Shape (H, W), dtype float32.
|-- normal_map: The rendered normal map corresponding to `rgb`.
|       Shape (3, H, W), dtype float32.
|-- near_plane: The minimum depth value of the scene's working volume.
|       Scalar, float32.
|-- far_plane: The maximum depth value of the scene's working volume.
|       Scalar, float32.
|-- segm/
|   |-- boxes: 2D bounding boxes for each object in the scene.
|   |       Shape (N_objects, 4), dtype float32.
|   |       These are pixel coordinates relative to `rgb`.
|   |       The box format is `[x_low, y_low, x_high, y_high]`.
|   |-- masks: Binary masks for each object in the scene.
|   |       Shape (N_objects, H, W), dtype bool.
|   |-- amodal_masks: Amodal instance masks for each object.
|           Shape (N_objects, H, W), dtype bool.
|-- bbox3d/
|   |-- poses: The pose of each object's 3D bounding box, as a 4x4 matrix.
|   |       This is the transform from the bbox frame to the camera frame.
|   |       Shape (N_objects, 4, 4), dtype float32.
|   |-- dimensions: The dimensions of each 3D bounding box.
|   |       Shape (N_objects, 3), dtype float32.
|   |-- corners: The corner points of each 3D bounding box, in the camera frame.
|           Shape (N_objects, 8, 3), dtype float32.
|-- mesh_ids: The mesh_id of each object.
|       These correspond to the STL files `v2/meshes/{mesh_id}.stl`.
|       List[str], length N_objects.
|-- obj_poses/
|   |-- poses: The pose of each mesh, as a 4x4 matrix.
|   |       This is the transform from the mesh frame to the camera frame.
|   |       Note that the mesh frame does not necessarily equal the bbox frame!
|   |       Shape (N_objects, 4, 4), dtype float32.
|   |-- scales: The scale of each mesh.
|           Shape (N_objects, 3), dtype float32.
|-- voxel_grid/
    |-- voxels: The surface of each mesh, extracted into a voxel grid.
    |       Shape (N_objects, n_voxels, n_voxels, n_voxels), dtype bool.
    |-- extents: The extents of each object's voxel grid.
            `voxel_grid/voxels[i]` spans the cuboid `[-extents[i], extents[i]]`
            in the object frame, i.e. the mesh frame defined by `obj_poses/poses[i]`.
            Shape (N_objects, 3), dtype float32.
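A minimal sketch for reading one scene file with NumPy. It assumes the nested fields are stored under flat keys such as `segm/boxes` and `voxel_grid/voxels`. The `voxel_grid/voxels[i]` notation in the description suggests this layout, but the description does not state it outright. `load_scene` and `rgb_to_hwc_uint8` are hypothetical helper names.

```python
import numpy as np


def load_scene(path):
    """Read one v2/scenes/{scene_id}.npz into a plain dict of NumPy arrays.

    Assumption: nested fields use flat keys such as "segm/boxes".
    allow_pickle=True is only needed if a field (e.g. mesh_ids) was saved
    as an object array; enable it only for files you trust.
    """
    with np.load(path, allow_pickle=True) as npz:
        scene = {key: npz[key] for key in npz.files}
    # mesh_ids is documented as List[str]; normalize it to a Python list.
    if "mesh_ids" in scene:
        scene["mesh_ids"] = [str(m) for m in scene["mesh_ids"]]
    return scene


def rgb_to_hwc_uint8(rgb):
    # (3, H, W) float32 in [0, 1]  ->  (H, W, 3) uint8 in [0, 255] for display.
    hwc = np.transpose(rgb, (1, 2, 0))
    return np.round(np.clip(hwc, 0.0, 1.0) * 255.0).astype(np.uint8)
```

For example, `load_scene("v2/scenes/{scene_id}.npz")["bbox3d/poses"]` should give the (N_objects, 4, 4) box poses. NumPy disables `allow_pickle` by default because unpickling an untrusted file can execute arbitrary code.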
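`depth_map` and `intrinsic` together are enough to lift a scene into a camera-frame point cloud. This sketch relies on several assumptions that the format description does not state:

- `depth_map` holds z-depth along the optical axis, not distance along the ray.
- `intrinsic` is a standard pinhole matrix `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` (skew is ignored).
- Pixels are sampled at integer coordinates.
- Depths outside `[near_plane, far_plane]` can be discarded.

`depth_to_points` is a hypothetical name.

```python
import numpy as np


def depth_to_points(depth_map, intrinsic, near=None, far=None):
    """Back-project a (H, W) depth map into camera-frame 3D points.

    Assumes z-depth and a pinhole intrinsic matrix (skew ignored).
    Drops non-finite or non-positive depths, and depths outside
    [near, far] when those bounds are given.
    Returns (points of shape (M, 3), (v, u) pixel indices of each point).
    """
    H, W = depth_map.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    z = depth_map.astype(np.float64)
    valid = np.isfinite(z) & (z > 0)
    if near is not None:
        valid &= z >= near
    if far is not None:
        valid &= z <= far
    fx, fy = intrinsic[0, 0], intrinsic[1, 1]
    cx, cy = intrinsic[0, 2], intrinsic[1, 2]
    x = (u - cx) * z / fx  # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy  # pinhole model: v = fy * y / z + cy
    points = np.stack([x[valid], y[valid], z[valid]], axis=-1)
    return points, (v[valid], u[valid])
```

With `scene = np.load(path)`, a call would look like `depth_to_points(scene["depth_map"], scene["intrinsic"], float(scene["near_plane"]), float(scene["far_plane"]))`. Leave out the near/far bounds if that filtering assumption does not hold for your use.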
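Comparing `segm/masks` with `segm/amodal_masks` gives a simple per-object occlusion measure. This sketch assumes that `masks` are the visible (modal) masks. It counts only visible pixels that also fall inside the amodal mask, and returns NaN for objects whose amodal mask is empty. `visible_fraction` is a hypothetical name.

```python
import numpy as np


def visible_fraction(masks, amodal_masks):
    """Per-object fraction of the amodal mask that is visible.

    masks, amodal_masks: (N_objects, H, W) boolean arrays.
    Returns an (N_objects,) float array; NaN where the amodal mask is empty.
    """
    amodal = np.asarray(amodal_masks, dtype=bool)
    visible = np.logical_and(masks, amodal).sum(axis=(1, 2)).astype(np.float64)
    total = amodal.sum(axis=(1, 2)).astype(np.float64)
    out = np.full(total.shape, np.nan)
    np.divide(visible, total, out=out, where=total > 0)
    return out
```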
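The `bbox3d` fields are redundant in principle: the corners can be recomputed from `poses` and `dimensions`, then projected into `rgb` with `intrinsic`. This is only a sketch. It assumes the bbox frame is centered on the box with axes along its edges, and that `dimensions` are full side lengths rather than half-extents. The corner ordering used by `bbox3d/corners` is not documented, so the check below is order-independent. All function names are hypothetical.

```python
import itertools

import numpy as np


def box_corners_from_pose(pose, dimensions):
    """Compute the 8 camera-frame corners (8, 3) of one 3D box.

    Assumes a box-centered bbox frame and full side lengths in `dimensions`.
    """
    half = 0.5 * np.asarray(dimensions, dtype=np.float64)
    signs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=np.float64)
    local = signs * half                           # (8, 3) in the bbox frame
    local_h = np.hstack([local, np.ones((8, 1))])  # homogeneous coordinates
    # Row-vector form of p_cam = pose @ p_bbox.
    return (local_h @ np.asarray(pose, dtype=np.float64).T)[:, :3]


def box_corners_match(a, b, atol=1e-3):
    # Order-independent check: every corner of `a` has a close corner in `b`
    # and vice versa.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return bool(np.all(d.min(axis=1) < atol) and np.all(d.min(axis=0) < atol))


def project(points, intrinsic):
    # Pinhole projection of camera-frame points (M, 3) to pixel coordinates (M, 2).
    uvw = points @ np.asarray(intrinsic, dtype=np.float64).T
    return uvw[:, :2] / uvw[:, 2:3]
```

Running `box_corners_match(box_corners_from_pose(poses[i], dimensions[i]), corners[i])` on real data is a quick way to test the box-centered assumption before relying on it.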
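To place a mesh from `v2/meshes/{mesh_id}.stl` in the scene, its vertices must be mapped into the camera frame using `obj_poses`. The description does not say how `obj_poses/scales` combines with `obj_poses/poses`. This sketch assumes a per-axis scale applied in the mesh frame before the rigid transform, i.e. `x_cam = R @ (scale * v) + t`. The vertices can come from any STL reader; for example, with the third-party trimesh package, `trimesh.load(path).vertices`. `mesh_to_camera` is a hypothetical name.

```python
import numpy as np


def mesh_to_camera(vertices, pose, scale):
    """Map mesh vertices (V, 3) from the STL's frame to the camera frame.

    Assumption: x_cam = R @ (scale * v) + t, with R and t taken from the
    4x4 obj_poses/poses[i] and scale from obj_poses/scales[i].
    """
    pose = np.asarray(pose, dtype=np.float64)
    scaled = np.asarray(vertices, dtype=np.float64) * np.asarray(scale, dtype=np.float64)
    return scaled @ pose[:3, :3].T + pose[:3, 3]
```

If the transformed vertices do not line up with `depth_map`, try applying the scale after the rotation, or not at all.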
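Given the `extents` convention, each occupied voxel in `voxel_grid/voxels[i]` can be turned into a 3D point. This is a sketch under three assumptions:

- Grid axes 0, 1 and 2 map to object-frame x, y and z.
- The grid splits `[-extents[i], extents[i]]` into equal cells, with each point placed at a cell center.
- The extents are already in the units of the frame defined by `obj_poses/poses[i]`. Whether `obj_poses/scales` must also be applied is not documented.

The function names are hypothetical.

```python
import numpy as np


def voxel_centers(voxels, extent):
    """Return object-frame centers (K, 3) of the occupied cells of one grid.

    Assumes axes 0, 1, 2 -> x, y, z and equal cells over [-extent, extent].
    """
    voxels = np.asarray(voxels, dtype=bool)
    extent = np.asarray(extent, dtype=np.float64)
    n = np.array(voxels.shape, dtype=np.float64)
    idx = np.argwhere(voxels).astype(np.float64)  # (K, 3) integer indices
    cell = 2.0 * extent / n                       # cell edge length per axis
    return -extent + (idx + 0.5) * cell


def voxels_to_camera(voxels, extent, obj_pose):
    # Object frame -> camera frame via obj_poses/poses[i].
    # Assumption: no extra obj_poses/scales factor is needed here.
    pts = voxel_centers(voxels, extent)
    obj_pose = np.asarray(obj_pose, dtype=np.float64)
    return pts @ obj_pose[:3, :3].T + obj_pose[:3, 3]
```

Overlaying these points on the output of a depth back-projection for the same object is a cheap way to confirm or reject the scale assumption.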