Paper Submissions
We invite 4-page abstracts in ICCV format (including references) describing new or previously published work on the topics outlined below. (Previously published work need not be re-formatted.) Accepted submissions will be made available on our website as non-archival reports, and peer-reviewed novel submissions may also appear in the ICCV workshop proceedings. Accepted works will be presented in the poster session, and some will be selected for oral presentation.
CMT Website: https://cmt3.research.microsoft.com/ICCV2019CLVL
Format: The submission should follow the ICCV 2019 submission instructions.
Novel problems in vision and language
Learning to solve non-visual tasks using visual cues
Language guided visual understanding (objects, relationships)
Visual dialog and question answering by visual verification
Visual question generation
Visually grounded conversation
Visual sense disambiguation
Deep learning methods for vision and language
Visual reasoning on language problems
Text-to-image generation
Language based visual abstraction
Text as weak labels for image or video classification
Image/video annotation and natural language description generation
Transfer learning for vision and language
Jointly learning to parse and perceive (text+image, text+video)
Multimodal clustering and word sense disambiguation
Unstructured text search for visual content
Visually grounded language acquisition and understanding
Referring expression comprehension
Language-based image and video search/retrieval
Linguistic descriptions of spatial relations
Auto-illustration
Natural language grounding & learning by watching
Learning knowledge from the web
Language as a mechanism to structure and reason about visual perception
Language as a learning bias to aid vision in both machines and humans
Dialog as a means of sharing knowledge about visual perception
Stories as a means of abstraction
Understanding the relationship between language and vision in humans
Humanistic, subjective, or expressive vision-to-language
Visual storytelling
Generating audio descriptions for movies
Multi-sentence descriptions for images and videos
Visual question answering for video
Visual fill-in-the-blank tasks
Language as supervision for video understanding
Using dialogs and/or audio for video understanding
Understanding videos and plots
Limitations of existing vision and language datasets
Limitations of existing vision and language approaches