dataset

The DISC2021 (aka DISC21) dataset is provided as zip files on AWS S3.

For the larger subsets, we split the data into several zip files of around 8 GiB each, which makes them easier to download in batches.
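Because each archive is large, it can be worth sanity-checking a download before extracting it. The following is a minimal sketch using only Python's standard `zipfile` module. The function name `summarize_zip` and the list of image extensions are illustrative assumptions, not something specified on this page.

```python
import zipfile


def summarize_zip(path, extensions=(".jpg", ".jpeg", ".png")):
    """Check a zip archive and return (image_count, total_uncompressed_bytes).

    The extension list is an assumption; adjust it to match the files
    actually found in the archive.
    """
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the name of the first
        # corrupt one, or None if all CRCs match. This is slow on large files.
        bad_member = zf.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in {path}: {bad_member}")
        images = [
            info for info in zf.infolist()
            if info.filename.lower().endswith(extensions)
        ]
        return len(images), sum(info.file_size for info in images)
```

For example, `summarize_zip("dev_queries.zip")` would return the number of image entries and their total uncompressed size, which can be compared against the image counts listed on this page.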

Image files

Development query images (50k images): dev_queries.zip

Final query images (50k images): final_queries.zip

Reference images (20 × 50k images):

Training images (20 × 50k images):

Metadata files are provided in CSV format. We expect the format to be self-explanatory.

Development queries ground truth: dev_ground_truth.csv

Final queries ground truth: final_ground_truth.csv

Attributions of all images (Flickr user that created each image): disc21_yfcc_attributions.csv, disc21_testset_yfcc_attributions.csv

Metadata for the image manipulation process: metadata_final_10k.csv
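Since the column layout of these CSV files is not described here, a quick way to get oriented is to look at the first few rows of a file. The sketch below uses Python's standard `csv` module. The helper name `peek_csv` is hypothetical, and treating the first row as a header is an assumption that should be checked against the actual files.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (first_row, next_n_rows) from a CSV file.

    Assumption: the first row is a header. If the file has no header,
    the first returned value is simply the first data row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows
```

Calling `peek_csv("dev_ground_truth.csv")` returns the first row and a handful of following rows, which is usually enough to see how the columns are laid out before loading the full file.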
