Movie Fill-in-the-Blank

Introduction

Question answering (QA) has become a popular task with many practical applications (e.g. dialogue systems). It is easy to interpret and to evaluate quantitatively, and its simple setup makes it relatively easy to generate the very large datasets that work well for deep learning. Large-scale datasets for visual QA (images paired with natural-language questions) have pushed that field forward rapidly, and with this challenge dataset we hope to extend that progress to video.

Challenge description

Given a video clip and a sentence with a blank in it, the task is to fill in the blank with the correct word.

Each blank is a single word drawn from a vocabulary of about 3000 words, including nouns, verbs, adjectives, and adverbs; each word occurs as a blank 50-3000 times in the training set. There are almost 300 000 training examples from 100 000 clips. We provide a separate, non-overlapping validation set of 21 000 examples (from 7500 clips), and evaluation will be on a test set of 30 000 examples (from 10 000 clips).
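To make the task format concrete, here is a minimal Python sketch of one way an example could be represented, together with a text-only baseline that always predicts the most frequent training answer. The FibExample class, its field names, the BLANK_TOKEN marker, and the three toy sentences are all assumptions made up for illustration; they are not the layout of the released files.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed placeholder for the blank; the released files may use another marker.
BLANK_TOKEN = "_____"


@dataclass(frozen=True)
class FibExample:
    """One fill-in-the-blank example (hypothetical layout, not the official format)."""
    clip_id: str   # identifier of the video clip
    sentence: str  # description with the blank marked by BLANK_TOKEN
    answer: str    # the single word that fills the blank


def fill_blank(sentence: str, word: str) -> str:
    """Return the sentence with its first blank replaced by `word`."""
    return sentence.replace(BLANK_TOKEN, word, 1)


def most_frequent_answer(train: list[FibExample]) -> str:
    """Text-only baseline: the most common answer word in the training set."""
    counts = Counter(example.answer for example in train)
    return counts.most_common(1)[0][0]


# Toy data invented for this sketch; not taken from the dataset.
train = [
    FibExample("clip_0001", "She _____ the door and walks out.", "opens"),
    FibExample("clip_0002", "He _____ the door behind him.", "closes"),
    FibExample("clip_0003", "Someone _____ the window.", "opens"),
]

baseline_word = most_frequent_answer(train)
filled = fill_blank(train[1].sentence, baseline_word)
print(baseline_word)
print(filled)
```

A baseline like this ignores the video entirely, so it can serve as a floor against which models that use the clip are compared.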

In the evaluation, we will also report performance by part of speech (nouns, verbs, adjectives, and adverbs), as well as by human-defined categories based on visual information, to be announced soon (e.g. kissing/affection, anger/stress, fast motion, etc.). The intention of this evaluation is to examine how different models perform in different areas, as well as to look at how well our evaluation metrics correspond with intuitive categorizations.
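A per-category breakdown of this kind could be computed along the following lines. This is a sketch only: it assumes the metric is exact-match accuracy (case-insensitive) and that each example carries a single category label such as its part of speech. The function name, the input layout, and the toy values are hypothetical and do not describe the official evaluation script.

```python
from collections import defaultdict


def accuracy_by_category(predictions, answers, categories):
    """Return (overall accuracy, {category: accuracy}).

    Assumes exact-match scoring, ignoring case, with one category per example.
    """
    if not (len(predictions) == len(answers) == len(categories)):
        raise ValueError("predictions, answers and categories must have equal length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, gold, category in zip(predictions, answers, categories):
        total[category] += 1
        if predicted.lower() == gold.lower():
            correct[category] += 1

    n_examples = sum(total.values())
    overall = sum(correct.values()) / n_examples if n_examples else 0.0
    per_category = {category: correct[category] / total[category] for category in total}
    return overall, per_category


# Toy values invented for this sketch.
predictions = ["opens", "runs", "red", "slowly"]
answers = ["opens", "walks", "red", "quickly"]
parts_of_speech = ["verb", "verb", "adjective", "adverb"]

overall, per_category = accuracy_by_category(predictions, answers, parts_of_speech)
print(overall)
print(per_category)
```

The same function works for the human-defined visual categories if the category list is swapped for those labels, assuming each example belongs to exactly one category.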

Download

Data can be downloaded here.

Submission server

You can submit here.

Citations

Movie FiB paper:

@inproceedings{maharaj2017dataset,
  title={A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering.},
  author={Maharaj, Tegan and Ballas, Nicolas and Rohrbach, Anna and Courville, Aaron C and Pal, Christopher Joseph},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  url={http://openaccess.thecvf.com/content_cvpr_2017/papers/Maharaj_A_Dataset_and_CVPR_2017_paper.pdf}
}

Movie description dataset paper:

@article{lsmdc,
  author={Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title={Movie Description},
  journal={International Journal of Computer Vision},
  year={2017},
  url={http://link.springer.com/article/10.1007/s11263-016-0987-1?wt_mc=Internal.Event.1.SEM.ArticleAuthorOnlineFirst}
}