LSMDC 2015
Challenge winner
Based on the human evaluation (details below), we have determined the challenge winner:
Video Captioning with Recurrent Networks Based on Frame- and Video-Level Features and Visual Content Classification. Rakshith Shetty, Jorma Laaksonen
Human evaluation
We have carried out a human evaluation of the four competing systems submitted to the challenge. Human judges were asked to rank four generated sentences and a reference sentence from 1 to 5 (lower is better) with respect to the following criteria.
- Grammar: judge the fluency and readability of the sentence (independently of the correctness with respect to the video).
- Correctness: judge how correct the sentence's content is with respect to the video (regardless of whether it is complete, i.e. describes everything), independent of grammatical correctness.
- Relevance: which sentence contains the most salient (i.e. relevant, important) events/objects of the video?
- Helpful for blind (additional criterion): how helpful would the sentence be for a blind person to understand what is happening in this movie snippet?
While the first three criteria are well established in the literature, we also asked the judges to provide rankings for the additional Helpful-for-blind criterion. We evaluated 1,200 randomly selected sentences; the results are shown below.
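To make the ranking protocol concrete, a minimal sketch of how per-system rankings could be aggregated into a mean rank per criterion is shown below. The data, system names, and the `mean_ranks` helper are hypothetical illustrations; the challenge's actual aggregation procedure may differ.

```python
# Hypothetical sketch: average the 1-5 ranks assigned by human judges
# (lower is better) to get a mean rank per system for one criterion.
from collections import defaultdict


def mean_ranks(judgments):
    """judgments: list of dicts mapping system name -> rank (1-5, lower is better)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for judgment in judgments:
        for system, rank in judgment.items():
            totals[system] += rank
            counts[system] += 1
    return {system: totals[system] / counts[system] for system in totals}


# Two illustrative judgments for the "Correctness" criterion.
judgments = [
    {"reference": 1, "systemA": 2, "systemB": 3},
    {"reference": 2, "systemA": 1, "systemB": 3},
]
print(mean_ranks(judgments))  # e.g. {'reference': 1.5, 'systemA': 1.5, 'systemB': 3.0}
```

The same aggregation would be repeated independently for each of the four criteria.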
For the results of the automatic evaluation, please see the competition leaderboard.
[1] The Long-Short Story of Movie Description. Anna Rohrbach, Marcus Rohrbach, Bernt Schiele. GCPR'15
[2] Sequence to Sequence – Video to Text. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko. ICCV'15
[3] Video Captioning with Recurrent Networks Based on Frame- and Video-Level Features and Visual Content Classification. Rakshith Shetty, Jorma Laaksonen
[4] Describing Videos by Exploiting Temporal Structure. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville
Challenge prize
The winner of the challenge will be awarded a Titan X GPU, provided by NVIDIA.
Submission opened!
We have started the second phase of the challenge, which will be evaluated on the Blind Test set. The videos and annotations for the Blind Test set (except the sentences) are available now.
Evaluation server is here.
Leaderboard with the current submissions is here.
Dataset
Prior to accessing the data, you should perform these steps (if not done previously):
- request access to MPII Movie Description dataset (MPII-MD)
- sign the access form for the Montreal Video Annotation Dataset (M-VAD), scan it, and email it to "torabi.atousa at gmail.com"
- you will then receive two username/password pairs, which will allow you to access the data
Details about the dataset and download procedure are given in the README.txt (use username/password provided by MPII-MD).
- Download script for the main challenge* data [link] (use username/password provided by MPII-MD)
* Text generation using a single video clip.
If you have any problems using the data or find any issues, please contact "arohrbach at mpi-inf.mpg.de" and "torabi.atousa at gmail.com".
Citations
If you intend to publish results that use the data and resources provided by this challenge, please include the following references:
@article{lsmdc2015,
  author  = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title   = {Movie Description},
  journal = {arXiv preprint arXiv:1605.03705},
  year    = {2016},
  url     = {https://arxiv.org/pdf/1605.03705.pdf}
}
Organizers
Anna Rohrbach
Max Planck Institute for Informatics
Atousa Torabi
Université de Montréal
Marcus Rohrbach
ICSI and UC Berkeley
Christopher Pal
École Polytechnique de Montréal
Hugo Larochelle
Université de Sherbrooke
Aaron Courville
Université de Montréal
Bernt Schiele
Max Planck Institute for Informatics