Location: 317A, COEX Convention Center, Seoul, South Korea. http://iccv2019.thecvf.com/

Audience Interaction: 

Mobile App: An app is also available; enter event code #5325 to access both the live Q&A and the poll.



Opening Remarks

Workshop Organizers
9:00-9:30 Invited Talk: Learning Representations of Vision and Language Jin-Hwa Kim
9:30-10:00 Spotlight Presentation 1
see list of presentations below

10:00-10:30 Invited Talk: A Critical Look at Visual Grounding Svetlana Lazebnik
10:30-11:00 Posters + Coffee Break
11:00-11:30 Invited Talk: Can’t Close the Loop without Commonsense Models Yejin Choi


11:30-12:00 Spotlight Presentation 2
see list of presentations below


12:00-2:00 Lunch and Posters
2:00-2:30 Invited Talk: Audio captioning and knowledge-grounded conversation Gunhee Kim
2:30-3:15 VATEX Challenge: presentation of challenge & oral talks from winning teams Xin Wang
3:15-4:00 LSMDC Challenge: presentation of challenge & oral talks from winning teams Jae Sung Park
4:00-4:30 Coffee Break
4:30-5:00 Invited Talk: V&L --> V ∪ L: Breaking away from task- and dataset-specific vision+language Devi Parikh and Jiasen Lu

Panel Discussion: Devi Parikh, Jiasen Lu, Yejin Choi, Marcus Rohrbach, Leonid Sigal, Dhruv Batra, Svetlana Lazebnik

Invited Speakers

Jin-Hwa Kim
T-Brain, SK Telecom 

Svetlana Lazebnik
University of Illinois at Urbana-Champaign

Yejin Choi
University of Washington & AI2

Gunhee Kim
Seoul National University

Devi Parikh
Georgia Tech & FAIR

Jiasen Lu
Georgia Tech

Spotlight and Poster presentation
All spotlights are also presented as posters in the poster sessions 10:30-11:00 and 12:00-2:00

9:30-10:00 Spotlight Presentation 1
https://www.dropbox.com/s/0r9q56dcdxjv165/0001.pdf?dl=0 Are we asking the right questions in MovieQA?, Bhavan Jasani (Robotics Institute, Carnegie Mellon University)*; Rohit Girdhar (Carnegie Mellon University); Deva Ramanan (Carnegie Mellon University)
Poster Number


3. https://www.dropbox.com/s/btedaep5pgxtavm/VTC_paper_CLVL.pdf?dl=0 (Poster only) Video-Text Compliance: Activity Verification based on Natural Language Instructions, Mayoore Jaiswal (IBM)*; Frank Liu (IBM Research); Anupama Jagannathan (IBM); Anne Gattiker (IBM); Inseok Hwang (IBM); Jinho Lee (Yonsei University); Matt Tong (IBM); Sahil Dureja (IBM); Soham Shah (IBM); Peter Hofstee (IBM); Valerie Chen (Yale University); Suvadip Paul (Stanford University); Rogerio Feris (IBM Research AI, MIT-IBM Watson AI Lab)

4. https://www.dropbox.com/s/btedaep5pgxtavm/VTC_paper_CLVL.pdf?dl=0 SUN-Spot: An RGB-D Dataset With Spatial Referring Expressions, Cecilia Mauceri (University of Colorado Boulder)*; Christoffer Heckman (University of Colorado); Martha S Palmer (University of Colorado), supp https://www.dropbox.com/s/qchioicfph90lnt/0004-supp.pdf?dl=0 86
5. https://www.dropbox.com/s/2npqat04crw3nba/camera_ready.pdf?dl=0 Evaluating Text-to-Image Matching using Binary Image Selection (BISON), Hexiang Hu (USC)*; Ishan Misra (Facebook AI Research); Laurens van der Maaten (Facebook), supp https://www.dropbox.com/s/xd91r0szu8n698n/supp.pdf?dl=0 87
13. https://www.dropbox.com/s/vttv2r94mypoxp3/main.pdf?dl=0 Visual Storytelling via Predicting Anchor Word Embeddings in the Stories, Bowen Zhang (University of Southern California)*; Hexiang Hu (USC); Fei Sha (Google Research)
14. https://www.dropbox.com/s/5wscdx3wrz1mx0h/Prose_for_a_Painting__ICCV_%20%282%29.pdf?dl=0 Prose for a Painting, Prerna Kashyap (Columbia University)*; Samrat H Phatale (Columbia University); Iddo Drori (Columbia University and Cornell)
16. https://www.dropbox.com/s/vsvnxf3smr5172a/vSTS_clvl2017.camera.pdf?dl=0 Why Does a Visual Question Have Different Answers?, Danna Gurari (University of Texas at Austin)* 90
17. https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0 Analysis of diversity-accuracy tradeoff in image captioning, Ruotian Luo (Toyota Technological Institute at Chicago)*; Greg Shakhnarovich (TTI-Chicago)
19. https://www.dropbox.com/s/56lea9vvm9ybyhs/nocaps%20%281%29.pdf?dl=0 nocaps: novel object captioning at scale, Harsh Agrawal (Georgia Institute of Technology)*; Karan Desai (University of Michigan); Yufei Wang (Macquarie University); Xinlei Chen (Facebook AI Research); Rishabh Jain (Georgia Tech); Mark Johnson (Macquarie University); Dhruv Batra (Georgia Tech & Facebook AI Research); Devi Parikh (Georgia Tech & Facebook AI Research); Stefan Lee (Oregon State University); Peter Anderson (Georgia Tech)
20. https://www.dropbox.com/s/dhskhm016yblgt6/Unpaired_Caption_Data.pdf?dl=0 Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach, Dong-Jin Kim (KAIST)*; Jinsoo Choi (KAIST); Tae-Hyun Oh (MIT CSAIL); In So Kweon (KAIST)
21. https://www.dropbox.com/s/vsvnxf3smr5172a/vSTS_clvl2017.camera.pdf?dl=0 Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering, Soravit Changpinyo (Google AI)*; Bo Pang; Piyush Sharma (Google Research); Radu Soricut (Google)

11:30-12:00 Spotlight Presentation 2
22. https://www.dropbox.com/s/yaopiqq863x66c8/MULE_cam.pdf?dl=0 MULE: Multimodal Universal Language Embedding, Donghyun Kim (Boston University)*; Kuniaki Saito (Boston University); Kate Saenko (Boston University); Stan Sclaroff (Boston University); Bryan Plummer (Boston University)
23. https://www.dropbox.com/s/c8izb928nwvkdw5/PID6085667.pdf?dl=0 Incorporating 3D Information into Visual Question Answering, Yue Qiu (National Institute of Advanced Industrial Science and Technology (AIST), University of Tsukuba)*; Yutaka Satoh (National Institute of Advanced Industrial Science and Technology (AIST)); Kazuma Asano (National Institute of Advanced Industrial Science and Technology (AIST); University of Tsukuba); Kenji Iwata (National Institute of Advanced Industrial Science and Technology (AIST)); Ryota Suzuki (National Institute of Advanced Industrial Science and Technology (AIST)); Hirokatsu Kataoka (National Institute of Advanced Industrial Science and Technology (AIST))

24. https://www.dropbox.com/s/e4kelkdhskojbl1/0024.pdf?dl=0 Multimodal Differential Network for Visual Question Generation, Badri Patro (IIT Kanpur)*; Sandeep Kumar (IIT Kanpur); Vinod Kumar Kurmi (IIT Kanpur); Vinay P Namboodiri (IIT Kanpur)
25. https://www.dropbox.com/s/u8zqaff4wo4rjlm/0025.pdf?dl=0 Learning Semantic Sentence Embeddings using Pair-wise Discriminator, Badri Patro (IIT Kanpur)*; Vinod Kumar Kurmi (IIT Kanpur); Sandeep Kumar (IIT Kanpur); Vinay P Namboodiri (IIT Kanpur)
26. https://www.dropbox.com/s/ams143zaqud38zz/seqcave_camera_ready.pdf?dl=0 Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning, Jyoti Aneja (University of Illinois, Urbana-Champaign)*; Harsh Agrawal (Georgia Institute of Technology)
27. https://www.dropbox.com/s/g2kerei332mhiqn/ICCV2019CLVL%20%281%29.pdf?dl=0 Reinforcing an Image Caption Generator using Off-line Human Feedback, Paul Hongsuck Seo (POSTECH)*; Piyush Sharma (Google Research); Tomer Levinboim (Google); Bohyung Han (Seoul National University); Radu Soricut (Google)
28. https://www.dropbox.com/s/u334ylme6pbnyqv/028_Yang%20Liu%20-%200028.pdf?dl=0 Use What You Have: Video retrieval using representations from collaborative experts, Yang Liu (University of Oxford)*; Samuel Albanie (University of Oxford); Arsha Nagrani (Oxford University); Andrew Zisserman (University of Oxford)
29. https://www.dropbox.com/s/5fe88xkd4v3qk9e/STVQA_ICCV_Workshop.pdf?dl=0 ICDAR 2019 Competition on Scene Text Visual Question Answering, Ali Furkan Biten (Computer Vision Center); Rubèn Tito (Computer Vision Center); Andrés Mafla (Computer Vision Centre); Lluis Gomez (Universitat Autónoma de Barcelona)*; Marçal Rusiñol (Computer Vision Center, UAB); Minesh Mathew (CVIT, IIIT-Hyderabad); C.V. Jawahar (IIIT-Hyderabad); Ernest Valveny (Universitat Autónoma de Barcelona); Dimosthenis Karatzas (Computer Vision Centre)
30. https://www.dropbox.com/s/dru7u9htgpnts3i/ICCV_CLVL19_30.pdf?dl=0 Recognizing and Characterizing Natural Language Descriptions of Visually Complex Images, Ziyan Yang (University of Virginia)*; Yangfeng Ji (University of Virginia); Vicente Ordonez (University of Virginia) 103
31. https://www.dropbox.com/s/advksyg68hy4ld5/paper.pdf?dl=0 Adversarial Learning of Semantic Relevance in Text to Image Synthesis, Miriam Cha (Harvard University)*; Youngjune Gwon (Samsung SDS); H.T. Kung (Harvard University) 104
32. https://arxiv.org/abs/1905.02925 (Poster only) ShapeGlot: Learning Language for Shape Differentiation, Panos Achlioptas, Judy Fan, Robert Hawkins, Noah Goodman, Leonidas Guibas. Conference paper, International Conference on Computer Vision, 2019, Seoul
VATEX Challenge Presentations 

Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning
Ziqi Zhang*, Yaya Shi*, Jiutong Wei*, Chunfeng Yuan, Bing Li, Weiming Hu

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

Multi-View Features and Hybrid Reward Strategies for VATEX Video Captioning Challenge 2019
Xinxin Zhu*, Longteng Guo*, Peng Yao*, Jing Liu, Hanqing Lu