Workshop at ICCV'15

This workshop was held in conjunction with ICCV 2015, Santiago, Chile.

The workshop was a success, thanks to all for participating!

Program (December 12th, AM)

MARRIOTT Hotel, Grand Ballroom DEF

Invited speakers

Sanja Fidler

University of Toronto

Jeffrey Siskind

Purdue University

Jason Corso

University of Michigan

Posters

[Example clips from the datasets with their Audio Descriptions: "He plants a tender kiss on her shoulder." / "His vanity license plate reads 732." / "SOMEONE sits on her roommate bed."]

Automatically describing open-domain videos with rich natural sentences is among the most challenging tasks in computer vision, natural language processing and machine learning. To stimulate research on this topic, we propose the Describing and Understanding Videos Workshop & LSMDC Challenge, which features a unified version of the recently published large-scale movie datasets (M-VAD and MPII-MD). These datasets have been built from Audio Description (AD) / Descriptive Video Service (DVS) resources for the visually impaired, which are transcribed and aligned to the videos.

The goal of this workshop is thus to bring together researchers working on diverse topics in computer vision and natural language processing in order to obtain a better understanding of the existing challenges and new research directions of open-domain video description with natural sentences.

Call for papers and challenge participation

We invite submissions on topics pertaining to video description and understanding, including:

  • Generating descriptions for videos.
  • Generating Audio Descriptions for movies.
  • Using textual descriptions as weak supervision for video understanding.
  • Using dialogs and/or audio for video understanding.
  • Understanding plots.
  • Recognizing characters in TV series / movies.
  • Novel tasks based on the Audio Description / DVS datasets.

Paper submission

We invite paper submissions in any standard conference format (e.g. ICCV, CVPR, NIPS), non-blind, 2-8 pages. Papers will not be included in the ICCV 2015 proceedings and will not be published in any form. Both novel and previously published works are welcome. Papers will be presented as part of a poster session.

Email submissions to: lsmdc2015 at gmail.com. Please mention whether the work is already published or accepted for publication.

LSMDC Challenge

A unified challenge based on the M-VAD and MPII-MD datasets has been set up. We will provide a blind test set (i.e. without descriptions). Winners will be selected based on a human evaluation of the challenge submissions. Additional automatic evaluation scripts will be made available for development. We host two challenge settings:

  • Text generation using a single video clip.
  • Text generation using a single video clip as well as its surrounding context (i.e. surrounding video clips, their ground-truth descriptions, character names, and dialogues).

The first setting is the standard one that most current work on image and video description addresses. The second setting goes beyond most existing work by including context and longer-term reasoning. For details concerning the challenge, please see the challenge page.
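The official automatic evaluation scripts will be released with the challenge data; purely as an illustration, the Python sketch below shows one common way to score generated clip descriptions against reference sentences with corpus-level BLEU-4 using NLTK. The function name and toy sentences are made up for this example and are not part of the official evaluation.

    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    def bleu4(references, hypotheses):
        # references: one list of reference token lists per clip
        # hypotheses: one generated token list per clip
        smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
        return corpus_bleu(references, hypotheses, smoothing_function=smooth)

    # Hypothetical toy example (not taken from the actual test set):
    refs = [[["someone", "sits", "on", "her", "roommate", "bed"]]]
    hyps = [["someone", "sits", "on", "a", "bed"]]
    print("BLEU-4: %.3f" % bleu4(refs, hyps))

Note that automatic metrics such as BLEU are only a rough proxy for description quality, which is why the challenge winners are determined by human evaluation.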

Important Dates

Paper submission early deadline: October 1st (extended to October 5th)

Paper submission final deadline: November 16th (extended to November 22nd)

Notification of paper acceptance: within 2 weeks after the deadline

Challenge submission opens: September 18th

Challenge submission deadline: November 16th (extended to November 22nd)

Workshop: half day (only AM), December 12th

Contact

Please direct your inquiries to: lsmdc2015 at gmail.com

Organizers

Anna Rohrbach

Max Planck Institute for Informatics

Atousa Torabi

Université de Montréal

Marcus Rohrbach

ICSI and UC Berkeley

Christopher Pal

École Polytechnique de Montréal

Hugo Larochelle

Université de Sherbrooke

Aaron Courville

Université de Montréal

Bernt Schiele

Max Planck Institute for Informatics