Large Scale Movie Description Challenge (LSMDC)

Dataset download

Before accessing the data, please complete the required registration steps (if not done previously).

Note that the download links below require a username and password.

NEW! Updates for the 2021 challenge

  • We have revised some of the annotations; the links below now point to the corrected annotations.

  • We have modified the evaluation protocol for Tasks 2 and 3. The new evaluation is described in detail in our recent paper.

  • We have added the baseline code for Task 2 below.

LSMDC v2: Multi-sentence movie description with character IDs

  • Task 1: Training, validation, and test annotations where character names are replaced with "SOMEONE" [link].

  • Task 2: Training, validation, and test annotations with character IDs "blanked out", plus training and validation ground-truth IDs (global and local; evaluation will be carried out according to local IDs) [link]; the test set IDs are withheld (blind).

  • Task 3: Training and validation annotations with character IDs present (global and local; evaluation will be carried out according to local IDs) [link]; the test set IDs are withheld (blind).

  • Segmentation into sets of 5 clips that will be used in the evaluation [link].

  • Download script for the video clips [link].

  • Additionally: training and validation annotations with original character names [link] and character meta information [link] (not necessary for the challenge, but provided for your convenience).

    • Character meta-information is tab-separated with the fields <NAME MENTION>\t<MAIN NAME MENTION>\t<GENDER>\t<ID>. The <MAIN NAME MENTION> is either identical to the <NAME MENTION> or refers to a designated "main" name mention. <GENDER> is annotated as M (male) or F (female); G (group) and X (not a person) are also used occasionally. A minimal parsing sketch is given below.
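
A minimal sketch for parsing the character meta-information file, assuming one tab-separated record per line and no header row; the file name in the usage example is hypothetical:

    # Parse the character meta-information described above.
    # Assumptions: tab-separated fields, no header row, hypothetical file name.
    import csv

    def load_character_meta(path):
        """Map each <NAME MENTION> to its main mention, gender and ID."""
        characters = {}
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f, delimiter="\t"):
                if len(row) < 4:
                    continue  # skip malformed or empty lines
                mention, main_mention, gender, char_id = row[:4]
                characters[mention] = {
                    "main_mention": main_mention,
                    "gender": gender,  # M, F, G (group) or X (not a person)
                    "id": char_id,
                }
        return characters

    # Example usage (hypothetical file name):
    # meta = load_character_meta("LSMDC_character_meta.tsv")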

Precomputed visual features

  • We share precomputed I3D features [link] (4.7G), pretrained on ImageNet and Kinetics, and ResNet-152 features [link] (9.9G), pretrained on ImageNet. Each zip file contains one directory per movie, and each directory contains one numpy file per video clip of that movie. Frames were extracted at 25 fps and then uniformly subsampled to at most 200 frames per video. I3D features are numpy arrays of size [num_of_segments, 1024], and ResNet features are of size [num_of_segments, 2048]; for a given clip, both arrays have the same num_of_segments. A minimal loading sketch is given below.
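
A minimal sketch for loading the features of a single clip, assuming one .npy file per clip inside a per-movie directory; the exact file naming is an assumption, so check the extracted archives:

    # Load the precomputed I3D and ResNet-152 features of one clip.
    # The directory layout and file naming below are assumptions.
    import os
    import numpy as np

    def load_clip_features(i3d_root, resnet_root, movie, clip):
        """Return a [num_of_segments, 3072] array for a single clip."""
        i3d = np.load(os.path.join(i3d_root, movie, clip + ".npy"))        # [N, 1024]
        resnet = np.load(os.path.join(resnet_root, movie, clip + ".npy"))  # [N, 2048]
        # Both arrays share the same number of segments for a given clip.
        assert i3d.shape[0] == resnet.shape[0]
        # One simple clip-level representation: concatenate per-segment features.
        return np.concatenate([i3d, resnet], axis=1)

Concatenation is only one option; mean-pooling over the segment dimension is another common way to obtain a fixed-size clip vector.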

Baselines

  • Baseline code for Task 2 [link]

Standard LSMDC: Movie description / Movie annotation and retrieval / Movie fill-in-the-blank

  • Details about the dataset and download procedure are given in the README.txt.

  • Download script for the movie description challenge data [link]

Movie annotation and retrieval

Training data:

  • The video clips and original annotations can be downloaded using the script in the Movie description section above. This data can be used to train joint visual-language models for the movie multiple-choice test and movie retrieval challenge tracks.

  • Details about the paraphrase data are given in the README_PP.txt

  • Download script for the paraphrase data [link]

Challenge data:

  • Movie Retrieval track: this track will be evaluated on a random subset of 1,000 test clips [link]; an illustrative scoring sketch is given after this list.

  • Movie Multiple-Choice Test track:

    • Details about the multiple-choice test are given in the README_MC.txt

    • Download script for the multiple-choice test data [link]

    • This track will be evaluated on 10,053 test clips (LSMDC16_multiple_choice_test_randomized.csv), which are provided by the download script above.
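
An illustrative sketch of Recall@k scoring for the Movie Retrieval track above, assuming a precomputed [num_queries, num_clips] similarity matrix in which query i corresponds to clip i. The metric choice and the similarity matrix are assumptions about a common text-to-video retrieval setup, not the official evaluation script; consult the README and the evaluation paper for the authoritative protocol.

    # Recall@k for text-to-video retrieval over a similarity matrix.
    # Assumption: query i matches clip i; not the official LSMDC evaluation code.
    import numpy as np

    def recall_at_k(similarity, ks=(1, 5, 10)):
        """Fraction of queries whose matching clip ranks within the top k."""
        order = np.argsort(-similarity, axis=1)       # best-scoring clip first
        gt = np.arange(similarity.shape[0])[:, None]  # ground-truth clip indices
        ranks = np.argmax(order == gt, axis=1)        # rank of the true clip (0 = best)
        return {k: float(np.mean(ranks < k)) for k in ks}

    # Example: random similarities over 1,000 clips give Recall@1 near 0.001.
    # print(recall_at_k(np.random.rand(1000, 1000)))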

Movie fill-in-the-blank

  • The video clips can be downloaded using the script in the Movie description section above.

  • Details about the annotation format and download procedure are given in the README_FIB.txt

  • Download script for the movie fill-in-the-blank challenge data [link]

If you have any problems using the data or find any issues, please contact "arohrbach at mpi-inf.mpg.de", "torabi.atousa at gmail.com", and "tegan.maharaj at polymtl.ca".

Citations

If you intend to publish results that use the data and resources provided by this challenge, please include the following references:

Movie description dataset paper:

@article{lsmdc,
  author  = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title   = {Movie Description},
  journal = {International Journal of Computer Vision},
  year    = {2017},
  url     = {http://link.springer.com/article/10.1007/s11263-016-0987-1?wt_mc=Internal.Event.1.SEM.ArticleAuthorOnlineFirst}
}

Movie annotation and retrieval paper:

@article{lsmdc2016MovieAnnotationRetrieval,
  author  = {Torabi, Atousa and Tandon, Niket and Sigal, Leonid},
  title   = {Learning Language-Visual Embedding for Movie Understanding with Natural-Language},
  journal = {arXiv:1609.08124},
  year    = {2016},
  url     = {http://arxiv.org/pdf/1609.08124v1.pdf}
}

Movie Fill-in-the-Blank paper:

@inproceedings{maharaj2017dataset,
  author    = {Maharaj, Tegan and Ballas, Nicolas and Rohrbach, Anna and Courville, Aaron C. and Pal, Christopher Joseph},
  title     = {A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2017},
  url       = {http://openaccess.thecvf.com/content_cvpr_2017/papers/Maharaj_A_Dataset_and_CVPR_2017_paper.pdf}
}