Conq Hose Manipulation Dataset

About the Dataset

This dataset was collected by Peter Mitrano and contributed to the Open X-Embodiment dataset. The goal was to add mobile manipulation data, in the hope that the robotics transformer (RT) models could be generalized to handle mobile manipulation. Initially, these models could only control a stationary robot arm.

Downloading the Dataset

The dataset is large (~10 GB) and can be downloaded from this public AWS S3 bucket:

mkdir -p ~/tensorflow_datasets/conq-hose-manipulation-dataset

aws s3 sync s3://conq-hose-manipulation-dataset/1.15.0/ ~/tensorflow_datasets/conq-hose-manipulation-dataset

To visualize the downloaded dataset, use the script visualize_dataset.py. Please see the README for more detailed instructions.

https://github.com/UM-ARM-Lab/conq_hose_manipulation_dataset_builder/blob/main/visualize_dataset.py
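
After syncing, the dataset can also be loaded directly with tensorflow_datasets. The snippet below is a minimal sketch: it assumes the synced files land in ~/tensorflow_datasets/conq-hose-manipulation-dataset (the directory containing dataset_info.json, per the sync command above) and that a 'train' split exists; check builder.info.splits for the actual split names.

    import os

    import tensorflow_datasets as tfds

    # Directory populated by the `aws s3 sync` command above (contains dataset_info.json).
    builder_dir = os.path.expanduser('~/tensorflow_datasets/conq-hose-manipulation-dataset')

    # Build a dataset object directly from the prepared files on disk.
    builder = tfds.builder_from_directory(builder_dir)
    print(builder.info)  # features, splits, and episode counts

    ds = builder.as_dataset(split='train')  # 'train' is an assumed split name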


If you use this dataset, please cite us!

@misc{ConqHoseManipData,
  author={Peter Mitrano and Dmitry Berenson},
  title={Conq Hose Manipulation Dataset, v1.15.0},
  year={2024},
  howpublished={https://sites.google.com/view/conq-hose-manipulation-dataset}
}


Contents of the Dataset

The dataset consists of a few dozen trajectories. Each trajectory has several hundred time steps, and each time step contains the following features:

    'steps': tfds.features.Dataset({
        'observation': tfds.features.FeaturesDict({
            'hand_color_image': tfds.features.Image(
                shape=(480, 640, 3),
                dtype=np.uint8,
                encoding_format='png',
                doc='Hand camera RGB observation.',
            ),
            'frontleft_fisheye_image': tfds.features.Image(
                shape=(726, 604, 3),
                dtype=np.uint8,
                encoding_format='png',
                doc='Front Left RGB observation.',
            ),
            'frontright_fisheye_image': tfds.features.Image(
                shape=(726, 604, 3),
                dtype=np.uint8,
                encoding_format='png',
                doc='Front Right RGB observation.',
            ),
            'state': tfds.features.Tensor(
                shape=(66,),
                dtype=np.float32,
                doc='Concatenation of [joint states (2x: 20), body vel in vision (3 lin, 3 ang),'
                    'is_holding_item (1), estimated_end_effector_force_in_hand (3),'
                    'foot states (4x: (3 pos, 1 contact)))].'
                    'See bosdyn protos for details.',
            ),
        }),
        'action': tfds.features.Tensor(
            shape=(7,),
            dtype=np.float32,
            doc='[xyz,rpy delta pose of hand in current hand frame, 1 gripper].',
        ),
        ...
        'is_terminal': tfds.features.Scalar(
            dtype=np.bool_,
            doc='True on last step of the episode if it is a terminal step, True for demos.'
        ),
        'language_instruction': tfds.features.Text(
            doc='Language Instruction.'
        ),
        'language_embedding': tfds.features.Tensor(
            shape=(512,),
            dtype=np.float32,
            doc='Kona language embedding. '
                'See https://tfhub.dev/google/universal-sentence-encoder-large/5'
        ),
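
As a rough guide to consuming these features, here is a minimal sketch that iterates over one episode and unpacks each step. The split name and the state slicing are assumptions read off the doc strings above (20 + 20 joint values, 6 body velocities, 1 is_holding_item flag, 3 force components, and 4 x 4 foot states = 66); consult the bosdyn protos and the dataset builder repo for the authoritative layout.

    import os

    import tensorflow_datasets as tfds

    builder = tfds.builder_from_directory(
        os.path.expanduser('~/tensorflow_datasets/conq-hose-manipulation-dataset'))
    ds = builder.as_dataset(split='train')  # assumed split name

    for episode in ds.take(1):
        for step in episode['steps']:
            hand_rgb = step['observation']['hand_color_image'].numpy()  # (480, 640, 3) uint8
            state = step['observation']['state'].numpy()                # (66,) float32

            # Assumed slicing, following the 'state' doc string above:
            joint_states = state[0:40]                 # 2 x 20 joint values
            body_vel = state[40:46]                    # 3 linear + 3 angular, in the vision frame
            is_holding_item = state[46]
            ee_force = state[47:50]                    # estimated end-effector force in hand frame
            foot_states = state[50:66].reshape(4, 4)   # 4 feet x (3 pos, 1 contact), assumed layout

            action = step['action'].numpy()            # (7,) float32
            delta_xyz, delta_rpy, gripper = action[0:3], action[3:6], action[6]

            instruction = step['language_instruction'].numpy().decode('utf-8')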


The data was collected with VR teleoperation, using this script:

https://github.com/UM-ARM-Lab/conq_python/blob/master/scripts/generate_data_from_vr.py

Then, we convert the data into the TensorFlow Datasets (TFDS) format required by the Open X-Embodiment dataset. The code for that conversion is here:

https://github.com/UM-ARM-Lab/conq_hose_manipulation_dataset_builder