The workshop will be held at the Marriott Ballroom ABC

12/07/2015 Extended Abstracts for the oral session are added to the CLVL Program 
12/01/2015 Presentation instructions are now available link
11/30/2015 Notification of acceptance
10/30/2015 Deadline for abstract submission extended to Nov 20th, 2015
10/30/2015 Program is now available!

The scope of this workshop lies at the boundary of Computer Vision and Natural Language Processing. In recent years, there has been increasing interest in the intersection of the two fields. Research has addressed several interesting tasks, including generating text descriptions from images and videos, language embedding of images, and predicting visual classifiers from unstructured text. More recent work further extends the scope of this area to combining videos and language, learning to solve non-visual tasks using visual cues, and question answering by visual verification of relation phrases. In this workshop, we aim to cover all these aspects, which benefit from jointly modelling visual and semantic concepts, and to discuss the area's future and impact. We will also have a panel discussion focused on how to develop datasets and benchmarks suitable for the various tasks in this area.

To achieve this goal, the workshop program will include three to four invited talks by leading researchers covering the area's diverse aspects. There will also be a call for extended abstracts focused on this area.


The workshop is co-located with ICCV 2015 in Santiago, Chile.



Facebook Event Page

Call for papers: Submitted extended abstracts will be considered for presentation. Accepted papers will be presented in the workshop poster session, and a portion of them will be presented orally. We solicit 2-page extended abstracts. Extended abstracts will not be included in the ICCV 2015 proceedings and will not be published in any form. Topics of this workshop include:

  • learning to solve non-visual tasks using visual cues
  • question answering by visual verification
  • novel problems in vision and language
  • visual sense disambiguation
  • deep learning methods for vision and language
  • visual reasoning on language problems
  • language based visual abstraction
  • text as weak labels for image or video classification
  • image/video annotation and natural language description generation
  • text-to-scene generation
  • transfer learning for vision and language
  • jointly learning to parse and perceive (text+image, text+video)
  • multimodal clustering and word sense disambiguation
  • unstructured text search for visual content
  • visually grounded language acquisition and understanding
  • language-based image and video search
  • linguistic descriptions of spatial relations
  • auto-illustration
  • natural language grounding & learning by watching  
  • learning knowledge from the web
  • language as a mechanism to structure and reason about visual perception
  • language as a learning bias to aid vision in both machines and humans
  • dialog as means of sharing knowledge about visual perception
  • stories as means of abstraction
  • understanding the relationship between language and vision in humans

Intended audience
The intended audience of this workshop is researchers working at the intersection of vision and language.

Workshop Program Chairs
Ahmed Elgammal Rutgers University

Leonid Sigal            Disney Research Pittsburgh


Mohamed Elhoseiny (Workshop Organizer, PhD Candidate at Rutgers University)

Ahmed Elgammal (Workshop Organizer and Program Chair, Rutgers University)

Leonid Sigal (Workshop Organizer and Program Chair, Disney Research Pittsburgh)