CLVL Program

Location: Sala Pasinetti (Palazzo del Cinema)

 8:30am   Opening 
 8:40am - 9:20am Invited Talk: Larry Zitnick, Facebook AI Research


 9:20am - 10:00am Invited Talk: Dhruv Batra, Georgia Tech and Facebook AI Research

Visual Dialog: Towards AI agents that can see, talk, and act

 10:00am - 10:30am Coffee Break (posters may be hung during this time and discussed now or during the poster session) 
 10:30am - 11:20am Invited Talk: Trevor Darrell, UC Berkeley

 11:20am - 12:00pm Spotlights for papers presented in the workshop (3 minutes each):
  1. Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre, UPV/EHU University of the Basque Country
  2. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning, Abhishek Das*, Satwik Kottur*, José M.F. Moura, Stefan Lee, Dhruv Batra
  3. Investigation of the correlations between CNN visual features and word embeddings, Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi, Graduate School of System Informatics, Kobe University
  4. Vision and Language Integration: Objects and beyond, Ravi Shekhar, Sandro Pezzelle, Aurélie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi, University of Trento, Trento, Italy
  5. Self-view Grounding Given a Narrated 360 Video, Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan; Microsoft Research, Beijing, China
  6. Leveraging off-the-shelf models for entry-level tag prediction and ranking, Jorge Sánchez, Agustín Caverzasi



  7. Story Learning from Kids Videos with Successive Event Order Embedding, Min-Oh Heo, Kyung-Min Kim, Byoung-Tak Zhang, Seoul National University

  8. Re-evaluating automatic metrics for image captioning, Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem, Hacettepe University Computer Vision Lab
  9. Fine-Grained Video Retrieval for Multi-Clip Video, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, NAIST, Osaka University, Tampere University of Technology, University of Oulu

  10. VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation, Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

  11. An Analysis of Visual Question Answering Algorithms, Kushal Kafle and Christopher Kanan, Rochester Institute of Technology


 12:00pm - 1:00pm Lunch break 
 1:00pm - 2:00pm Poster Session 

 2:00pm - 2:35pm Invited Talk: Justin Johnson, Stanford

Visual Storytelling

 2:35pm - 3:10pm Invited Talk: Chris Kanan, Rochester Institute of Technology

What does solving VQA mean?
 3:15pm - 3:50pm Invited Talk: Jia Deng, University of Michigan 
Teaching Computers to See and Think

 3:50pm - 4:00pm Break 
 4:00pm - 4:45pm Closing Keynote Talk: Antonio Torralba, MIT

 4:45pm - 5:30pm (canceled) Panel Discussion