Workshop on Integrating Language and Vision

Workshop date: Friday, December 16, 2011

Location:  Sierra Nevada, Spain (held at NIPS 2011)

Organizers: Trevor Darrell, Raymond J. Mooney, Kate Saenko


NEW! The V&L Net Best Poster Prize has been awarded to:
Desmond Elliott and Frank Keller, "A Treebank of Visual and Linguistic Data."

Workshop schedule is now available.

The workshop date has been set: Friday, December 16th.

V&L Net is sponsoring a Best Poster Prize.


A growing number of researchers in computer vision have started to explore how language accompanying images and video can be used to aid interpretation and retrieval, as well as to train object and activity recognizers. Simultaneously, an increasing number of computational linguists have begun to investigate how visual information can be used to aid language learning and interpretation, and to ground the meaning of words and sentences in perception. However, there has been very little direct interaction between researchers in these two disciplines. Consequently, researchers in each area have a limited understanding of the methods used in the other, and do not fully exploit the latest ideas and techniques from both disciplines when developing systems that integrate language and vision. The goal of this workshop is to bring together researchers in computer vision and natural-language processing (NLP) to interact, collaborate, and discuss issues and future directions in integrating language and vision.

Traditional machine learning for both computer vision and NLP requires manually annotating images, video, text, or speech with detailed labels, parse trees, segmentations, etc. Methods that integrate language and vision hold the promise of greatly reducing such manual supervision by letting naturally co-occurring text and images/video supervise each other.

Many important real-world applications require integrating vision and language, including but not limited to image and video retrieval, human-robot interaction, medical image processing, human-computer interaction in virtual worlds, and computer graphics generation.