Supported Datasets

Although authors may use any datasets they choose in their submissions, the organizers are also providing three datasets to support generic video search and specific instance search research. All the provided datasets have been or are being used at the annual TRECVID (TREC Video Retrieval Evaluation) and VBS (Video Browser Showdown) benchmarks.

Vimeo Creative Commons (V3C)

The V3C1 dataset (drawn from the larger V3C video dataset) is composed of 7,475 Vimeo videos (1.3 TB, 1,000 h) with Creative Commons licenses and a mean duration of 8 min. All videos have some metadata available in JSON files, e.g., title, keywords, and description. The dataset has been segmented into 1,082,657 short video segments according to the provided master shot boundary files. In addition, keyframes and thumbnails have been extracted for each video segment and are available.

  1. Please fill in, sign, and submit the data permission form by email.
  2. After your request has been processed, you will be sent the information needed to download the data.
  3. The master shot boundary reference is available here.
  4. A small set of six queries and their ground truth, resulting from the use of V3C1 at the most recent Video Browser Showdown (VBS 2019), is available.
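Because every V3C1 segment is defined by the master shot boundaries, a common first step is mapping a timestamp in a video to the shot that contains it. The sketch below assumes the boundaries have already been loaded as a sorted list of non-overlapping shots with start and end times in seconds; the `Shot` type, the shot IDs, and the time units are hypothetical, so check the actual master shot boundary files for their real layout.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shot_id: str   # hypothetical ID; the real files define their own naming
    start: float   # shot start in seconds (assumed unit)
    end: float     # shot end in seconds, exclusive (assumed unit)


def find_shot(shots: list[Shot], t: float) -> Shot | None:
    """Return the shot containing timestamp t, or None if t falls outside every shot.

    Assumes `shots` is sorted by start time and the shots do not overlap.
    """
    starts = [s.start for s in shots]
    # Index of the last shot whose start is <= t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and shots[i].start <= t < shots[i].end:
        return shots[i]
    return None
```

For example, with shots covering 0.0 to 4.2 s and 4.2 to 10.0 s, a timestamp of 5.0 s falls in the second shot. Binary search keeps lookups logarithmic, which matters when a video has hundreds of shots.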

Internet Archive Creative Commons (IACC.3)

The IACC.3 dataset consists of approximately 4,600 Internet Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations ranging from 6.5 min to 9.5 min and a mean duration of almost 7.8 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.

  1. Please fill in, sign, and submit the data permission form by email.
  2. After your request has been processed, you will be sent the information needed to download the data.
  3. The master shot boundary reference and keyframes are available here. The provided ground truth uses these shot boundaries to judge whether a shot in a video is relevant or non-relevant to a query.
  4. The 90 ad-hoc textual queries used at TRECVID from 2016 to 2018 are available (2016: queries 501-530, 2017: queries 531-560, 2018: queries 561-590).
  5. The ground truth data is available for the 2016, 2017, and 2018 queries. A readme file describing the ground truth format is available here.
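When working with the IACC.3 queries, it helps to know which TRECVID year a query ID belongs to and which shots are judged relevant for it. The sketch below encodes the query ID ranges listed above and parses ground truth lines under an assumed four-column layout (`<query_id> <ignored> <shot_id> <judgment>`, similar to trec_eval qrels); the authoritative format is the one described in the readme, and the shot IDs in the example are hypothetical.

```python
from __future__ import annotations

# Query ID ranges for the TRECVID ad-hoc queries, as listed above.
QUERY_YEARS = {
    2016: range(501, 531),
    2017: range(531, 561),
    2018: range(561, 591),
}


def query_year(query_id: int) -> int | None:
    """Return the TRECVID year a query ID was used in, or None if unknown."""
    for year, ids in QUERY_YEARS.items():
        if query_id in ids:
            return year
    return None


def parse_ground_truth(lines) -> dict[int, set[str]]:
    """Collect relevant shot IDs per query.

    Assumes each line has four whitespace-separated columns:
    query ID, an unused column, shot ID, and a judgment (> 0 means relevant).
    """
    relevant: dict[int, set[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _unused, shot_id, judgment = parts
        if int(judgment) > 0:
            relevant.setdefault(int(qid), set()).add(shot_id)
    return relevant
```

Usage: `parse_ground_truth(open(path))` (with `path` pointing at a downloaded ground truth file) yields a dictionary keyed by query ID; shots judged non-relevant are simply absent from the sets.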

BBC EastEnders

The BBC EastEnders dataset consists of approximately 244 video files (300 GB and 464 h in total) with associated metadata, each file containing a week's worth of BBC EastEnders programs in MPEG-4/H.264 format.

  1. Please fill in, sign, and submit the data permission form by email.
  2. After your request has been processed, you will be sent the information needed to download the data.
  3. The master shot boundary reference is available here. The provided ground truth uses these shot boundaries to judge whether a shot in a video is relevant or non-relevant to a query.
  4. The 90 instance search queries (find a specific person X in a specific location Y) used at TRECVID from 2016 to 2018 are available.
  5. The 90 instance search queries (find a specific person, object, or location) used at TRECVID from 2013 to 2015 are available.
  6. The ground truth created at NIST for TRECVID 2013 to 2018 is available. A readme file describing the ground truth format is available here.
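Once the relevant shots for a query are known, a ranked list of shots returned by a search system can be scored against them. The sketch below computes plain non-interpolated average precision for a single query; it is an illustration only, and the official TRECVID scoring tools may differ (for example, they may estimate precision from sampled judgments). The function name and shot IDs are hypothetical.

```python
def average_precision(ranked_shots, relevant):
    """Non-interpolated average precision of a ranked shot list for one query.

    `ranked_shots` is ordered best-first; `relevant` is the set of shot IDs
    judged relevant in the ground truth. Duplicate shots are counted once,
    at their first position.
    """
    if not relevant:
        return 0.0
    ranked = list(dict.fromkeys(ranked_shots))  # drop duplicates, keep order
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)
```

Dividing by the total number of relevant shots (not just those retrieved) penalizes a system for every relevant shot it misses; averaging this value over all queries gives a mean average precision.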