The dataset is hosted on HuggingFace: link
For an example of how to generate all the experts and modalities from an RGB video using the data-pipeline, see: link