Natural language-based video and image search has been a long-standing topic of research in the information retrieval, multimedia, and computer vision communities. Several existing online platforms (e.g. YouTube) rely on massive human curation efforts, manually assigned tags, click counts, and surrounding text to match largely unstructured search phrases and retrieve a ranked list of relevant videos from a stored library. However, as the amount of unlabeled video content grows with the advent of inexpensive mobile recording devices (e.g. smartphones), the focus is rapidly shifting to automated understanding, tagging, and search. In this challenge, we would like to explore a variety of joint language-visual learning models for video annotation and retrieval tasks.
The majority of the LSMDC captions describe human activities. The main goal of this challenge is to evaluate how well different visual-language models can annotate and search videos based on natural sentences covering a variety of human activities. Our challenge has two main tracks, described below; participants can take part in either one or both tracks:
Our movie dataset has only one description per video. We provide new complete/simplified descriptions for a subset of the training data and the whole public test data, based on paraphrases (i.e. summaries or the main aspect of what is described in the original long description), which could potentially be used as additional training data.
Data can be downloaded here.
If you have any questions about the movie retrieval and movie multiple-choice challenges, please contact torabi.atousa@gmail.com.
If you intend to publish results that use the data and resources provided by this challenge, please include the following references:
Movie annotation and retrieval paper:
@article{lsmdc2016MovieAnnotationRetrieval,
  author  = {Torabi, Atousa and Tandon, Niket and Sigal, Leonid},
  title   = {Learning Language-Visual Embedding for Movie Understanding with Natural-Language},
  journal = {arXiv preprint},
  year    = {2016},
  url     = {http://arxiv.org/pdf/1609.08124v1.pdf}
}

Movie dataset paper:
@article{lsmdc,
  author  = {Rohrbach, Anna and Torabi, Atousa and Rohrbach, Marcus and Tandon, Niket and Pal, Chris and Larochelle, Hugo and Courville, Aaron and Schiele, Bernt},
  title   = {Movie Description},
  journal = {International Journal of Computer Vision},
  year    = {2017},
  url     = {http://link.springer.com/article/10.1007/s11263-016-0987-1?wt_mc=Internal.Event.1.SEM.ArticleAuthorOnlineFirst}
}