LSMDC 2015
Challenge winner
Based on the human evaluation (details below), we have determined the challenge winner:
Video Captioning with Recurrent Networks Based on Frame- and Video-Level Features and Visual Content Classification. Rakshith Shetty, Jorma Laaksonen
Human evaluation
We have carried out a human evaluation of the four competing systems submitted to the challenge. Human judges were asked to rank four generated sentences and a reference sentence from 1 to 5 (lower is better) with respect to the following criteria.
- Grammar: judge the fluency and readability of the sentence (independently of the correctness with respect to the video).
- Correctness: judge how correct the sentence's content is with respect to the video (regardless of whether it is complete, i.e. describes everything), independent of grammatical correctness.
- Relevance: which sentence contains the most salient (i.e. relevant, important) events/objects of the video?
- Helpful for blind (additional criterion): how helpful would the sentence be for a blind person to understand what is happening in this movie snippet?
While the first three criteria are well established in the literature, we also asked the judges to provide rankings for the additional Helpful-for-blind criterion. We evaluated 1,200 randomly selected sentences; the results are shown below.
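To make the ranking protocol concrete, a minimal sketch of how per-system rankings could be aggregated into a mean rank per criterion is shown below. The data, system names, and the `mean_ranks` helper are hypothetical illustrations; the challenge's actual aggregation procedure may differ.

```python
# Hypothetical sketch: average the 1-5 ranks assigned by human judges
# (lower is better) to get a mean rank per system for one criterion.
from collections import defaultdict


def mean_ranks(judgments):
    """judgments: list of dicts mapping system name -> rank (1-5, lower is better)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for judgment in judgments:
        for system, rank in judgment.items():
            totals[system] += rank
            counts[system] += 1
    return {system: totals[system] / counts[system] for system in totals}


# Two illustrative judgments for the "Correctness" criterion.
judgments = [
    {"reference": 1, "systemA": 2, "systemB": 3},
    {"reference": 2, "systemA": 1, "systemB": 3},
]
print(mean_ranks(judgments))  # e.g. {'reference': 1.5, 'systemA': 1.5, 'systemB': 3.0}
```

The same aggregation would be repeated independently for each of the four criteria.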
For the results of the automatic evaluation, please see the competition leaderboard.
[1] The Long-Short Story of Movie Description. Anna Rohrbach, Marcus Rohrbach, Bernt Schiele. GCPR'15
[2] Sequence to Sequence – Video to Text. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko. ICCV'15
[3] Video Captioning with Recurrent Networks Based on Frame- and Video-Level Features and Visual Content Classification. Rakshith Shetty, Jorma Laaksonen
[4] Describing Videos by Exploiting Temporal Structure. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville
Challenge prize
The winner of the challenge will be awarded a Titan X GPU, provided by NVIDIA.
Submission opened!
We have started the second phase of the challenge, which will be evaluated on the Blind Test set. The videos and annotations for the Blind Test set (except the sentences) are available now.
Evaluation server is here.
Leaderboard with the current submissions is here.
Dataset
Prior to accessing the data, you should perform these steps (if not done previously):
- request access to MPII Movie Description dataset (MPII-MD)
- sign the access form for the Montreal Video Annotation Dataset (M-VAD), scan it, and email it to "torabi.atousa at gmail.com"
- you will then receive two username/password pairs, which will allow you to access the data
Details about the dataset and download procedure are given in the README.txt (use username/password provided by MPII-MD).
- Download script for the main challenge* data [link] (use username/password provided by MPII-MD)
* Text generation using a single video clip.
If you have any problems using the data or find any issues, please contact "arohrbach at mpi-inf.mpg.de" and "torabi.atousa at gmail.com".
Citations
If you intend to publish results that use the data and resources provided by this challenge, please include the following references:
@article{lsmdc2015,
  author  = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title   = {Movie Description},
  journal = {arXiv preprint arXiv:1605.03705},
  year    = {2016},
  url     = {https://arxiv.org/pdf/1605.03705.pdf}
}
Organizers
Anna Rohrbach
Max Planck Institute for Informatics
Atousa Torabi
Université de Montréal
Marcus Rohrbach
ICSI and UC Berkeley
Christopher Pal
École Polytechnique de Montréal
Hugo Larochelle
Université de Sherbrooke
Aaron Courville
Université de Montréal
Bernt Schiele
Max Planck Institute for Informatics