
Location: Sala Pasinetti (Palazzo del Cinema)

Organizers

Mohamed Elhoseiny, Postdoctoral Researcher, Facebook AI Research, elhoseiny<at>fb.com

Devi Parikh, Assistant Professor at Georgia Tech, parikh<at>gatech.edu                    

Leonid Sigal, Senior Research Scientist at Disney Research, lsigal<at>disneyresearch.com

Manohar Paluri, Research Lead, Facebook Research, mano<at>fb.com

Margaret Mitchell, Senior Research Scientist, Google Research, margarmitchell<at>gmail.com

Ishan Misra, PhD Student at Carnegie Mellon University

Ahmed Elgammal, Professor at Rutgers University, elgammal<at>cs.rutgers.edu

The scope of this workshop lies at the boundary of Computer Vision and Natural Language Processing. In recent years, there has been increasing interest in the intersection of the two fields. Researchers have studied several interesting tasks, including generating text descriptions from images and videos, language embedding of images, and predicting visual classifiers from unstructured text. More recent work has further extended the scope of this area to combining videos and language, learning to solve non-visual tasks using visual cues, visual question answering, visual dialog, and others. In this workshop, we aim to provide a full day focused on these research areas, helping to bolster communication and shared knowledge across the tasks and approaches in this area, and to provide a space to discuss the future and impact of vision-language technology. We will also have a panel discussion focused on how to develop useful datasets and benchmarks that are suitable to the various tasks in this area.

Workshop Theme

While the workshop serves as an umbrella for research at the intersection of vision and language, this year we introduce a special theme focused on “Visual Storytelling”. Visual Storytelling is the task of creating a short story based on a sequence of images, and we plan to host a competition on this topic. We also plan to have a different special theme in each future edition of the CLVL workshop, helping to further advance and shed light on different aspects of vision-language research.

Organizers

The workshop brings together a diverse team of organizers from academia (Georgia Tech, Rutgers, and CMU) and industry (Facebook AI Research, Disney Research, and Google Research).

Sponsors


Call for Papers

We solicit 2-page extended abstracts, which will be considered for presentation at the workshop. Accepted papers will be presented in the workshop poster session, and a portion of them will be presented orally. Extended abstracts will not be included in the Proceedings of ICCV 2017 and will not be published in any form. Topics of this workshop include the areas described above, such as generating text descriptions from images and videos, language embedding of images, predicting visual classifiers from unstructured text, visual storytelling, visual question answering, and visual dialog.

Intended audience

The intended audience of this workshop consists of scientists working at the intersection of vision and language.