Before accessing the data, perform the following steps (if not done previously):
request access to the MPII Movie Description dataset (MPII-MD)
UPDATE: we are re-working the access system; there might be issues in the next few days, stay tuned!
you will receive a username/password, which will allow you to access the data
Note that the following download links require a username/password.
We have revised some of the annotations; the links below now point to the corrected versions.
We have modified the evaluation protocol for Tasks 2 and 3. The new evaluation is described in detail in our recent paper.
We have added the baseline code for Task 2 below.
Task 1: Training, validation and test annotations with "SOMEONE"-s [link].
Task 2: Training, validation and test annotations with character IDs "blanked" plus training and validation ground-truth IDs (global and local; evaluation will be carried out according to local IDs) [link]; test set IDs are considered blind.
Task 3: Training and validation annotations with character IDs present (global and local; evaluation will be carried out according to local IDs) [link]; test set IDs are considered blind.
Segmentation into sets of 5 clips that will be used in the evaluation [link] (an illustrative local-ID scoring sketch follows this list).
Download script for the video clips [link].
Additionally: training and validation annotations with original character names [link] and character meta-information [link] (not necessary for the challenge, but provided for your convenience).
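Since Tasks 2 and 3 are scored on local IDs within each 5-clip set, here is a minimal sketch of one plausible way such scoring can work: for every pair of blanked mentions in a set, check whether the prediction agrees with the ground truth on "same character vs. different character". This is only an illustration under that assumption; the official protocol is defined in the paper and the baseline code, and the function and data layout here are hypothetical.

from itertools import combinations

def pairwise_id_accuracy(pred_ids, gt_ids):
    """Hypothetical helper: fraction of blank pairs within one 5-clip set
    where the prediction agrees with the ground truth on whether the two
    blanks refer to the same character. Not the official protocol."""
    pairs = list(combinations(range(len(gt_ids)), 2))
    if not pairs:
        return 1.0
    correct = sum(
        (pred_ids[i] == pred_ids[j]) == (gt_ids[i] == gt_ids[j])
        for i, j in pairs
    )
    return correct / len(pairs)

# Example: three blanked mentions in one set; prediction groups them correctly.
print(pairwise_id_accuracy(pred_ids=[1, 1, 2], gt_ids=[7, 7, 9]))  # 1.0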
Character meta-information is given as <NAME MENTION>\t<MAIN NAME MENTION>\t<GENDER>\t<ID>. The <MAIN NAME MENTION> is either identical to the <NAME MENTION> or refers to a certain "main" name mention. <GENDER> is annotated as M (male) or F (female); G (group) and X (not a person) are also used occasionally.
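Since the meta file is tab-separated, it can be read with a few lines of Python; the file name below is a placeholder for whatever the meta-information link downloads.

import csv

# "character_meta.tsv" is a placeholder name for the downloaded meta file.
with open("character_meta.tsv", newline="", encoding="utf-8") as f:
    for mention, main_mention, gender, char_id in csv.reader(f, delimiter="\t"):
        # <GENDER> is M/F, with occasional G (group) and X (not a person).
        print(f"{mention!r} -> main mention {main_mention!r}, gender {gender}, id {char_id}")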
We share precomputed I3D features [link] (4.7G), pretrained on ImageNet and Kinetics, and ResNet-152 features [link] (9.9G), pretrained on ImageNet. The zip files include a directory for each movie; each directory contains a numpy file for each video clip of that movie. Frames were extracted at 25 fps and then uniformly subsampled to at most 200 frames per video. I3D features are numpy arrays of shape [num_of_segments, 1024], and ResNet features of shape [num_of_segments, 2048]; both arrays have the same num_of_segments for a given video clip.
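For example, a clip's features can be loaded directly with numpy; the movie and clip names below are placeholders, so substitute the actual directory and file names from the zip files.

import numpy as np

# Placeholder paths: the zips contain <movie_dir>/<clip>.npy as described above.
i3d = np.load("I3D/some_movie/some_clip.npy")           # [num_of_segments, 1024]
resnet = np.load("resnet152/some_movie/some_clip.npy")  # [num_of_segments, 2048]

# Both feature types share the same number of segments per clip.
assert i3d.shape[0] == resnet.shape[0]
print(i3d.shape, resnet.shape)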
We provide the baseline code for Task 1 (generating descriptions with "SOMEONE") here: https://github.com/jamespark3922/lsmdc-baseline
We provide the baseline code for Task 2 (filling-in the IDs) here: https://github.com/jamespark3922/lsmdc-fillin
Details about the dataset and download procedure are given in the README.txt.
Download script for the movie description challenge data [link]
Training data:
The video clips and original annotations can be downloaded using the script in the Movie description section above. This data can be used to train joint visual-language models for the movie multiple-choice test and movie retrieval challenge tracks.
Details about the paraphrase data are given in the README_PP.txt
Download script for paraphrase data [link]
Challenge data:
Movie Retrieval track: this track will be evaluated on a random subset of 1,000 test clips [link]
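Retrieval over a 1,000-clip test set is commonly reported with Recall@k and median rank; the snippet below is a generic sketch of that computation from a caption-by-video similarity matrix, not the official evaluation script.

import numpy as np

def retrieval_metrics(sim):
    """Generic sketch, not the official script: sim[i, j] is the similarity
    of caption i to video j, with the matching pair on the diagonal
    (1000 x 1000 for this track)."""
    order = np.argsort(-sim, axis=1)  # video indices, best match first
    ranks = np.argmax(order == np.arange(len(sim))[:, None], axis=1) + 1
    metrics = {f"R@{k}": float(np.mean(ranks <= k)) for k in (1, 5, 10)}
    metrics["MedR"] = float(np.median(ranks))
    return metrics

# Toy example: a near-diagonal similarity matrix scores perfectly.
print(retrieval_metrics(np.eye(1000) + 0.01 * np.random.rand(1000, 1000)))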
Movie Multiple-Choice Test track:
Details about the multiple-choice test are given in the README_MC.txt
Download script for multiple-choice test data [link]
This track will be evaluated on 10,053 test clips (LSMDC16_multiple_choice_test_randomized.csv), provided via the download script above (see the scoring sketch below).
The video clips can be downloaded using the script in the Movie description section above.
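In this test, a model scores the candidate captions for each clip (five per clip) and is judged by the fraction of clips where the highest-scoring candidate is the correct one; see README_MC.txt for the exact file format. A minimal accuracy sketch with hypothetical score and ground-truth arrays:

import numpy as np

# Hypothetical inputs: scores[i, c] is the model's score for candidate
# caption c of test clip i; correct_idx[i] is the ground-truth index.
scores = np.random.rand(10053, 5)          # placeholder model scores
correct_idx = np.zeros(10053, dtype=int)   # placeholder ground truth

accuracy = float((scores.argmax(axis=1) == correct_idx).mean())
print(f"multiple-choice accuracy: {accuracy:.4f}")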
Movie Fill-in-the-Blank track:
Details about the annotation format and download procedure are given in the README_FIB.txt
Download script for the movie fill-in-the-blank challenge data [link]
If you have any problems using the data or find any issues, please contact "arohrbach at mpi-inf.mpg.de", "torabi.atousa at gmail.com" and "tegan.maharaj at polymtl.ca".
If you intend to publish results that use the data and resources provided by this challenge, please include the following references:
Movie description dataset paper:
@article{lsmdc,
  author  = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title   = {Movie Description},
  journal = {International Journal of Computer Vision},
  year    = {2017},
  url     = {http://link.springer.com/article/10.1007/s11263-016-0987-1?wt_mc=Internal.Event.1.SEM.ArticleAuthorOnlineFirst}
}

Movie annotation and retrieval paper:

@article{lsmdc2016MovieAnnotationRetrieval,
  author  = {Torabi, Atousa and Tandon, Niket and Sigal, Leon},
  title   = {Learning Language-Visual Embedding for Movie Understanding with Natural-Language},
  journal = {arXiv:1609.08124},
  year    = {2016},
  url     = {http://arxiv.org/pdf/1609.08124v1.pdf}
}

Movie Fill-in-the-Blank paper:

@inproceedings{maharaj2017dataset,
  title     = {A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering},
  author    = {Maharaj, Tegan and Ballas, Nicolas and Rohrbach, Anna and Courville, Aaron C. and Pal, Christopher Joseph},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year      = {2017},
  url       = {http://openaccess.thecvf.com/content_cvpr_2017/papers/Maharaj_A_Dataset_and_CVPR_2017_paper.pdf}
}