Location: Sala Pasinetti (Palazzo del Cinema)
8:30am              Opening
8:40am - 9:20am     Invited Talk: Larry Zitnick, Facebook AI Research
9:20am - 10:00am    Invited Talk: Dhruv Batra, Georgia Tech and Facebook AI Research
                    "Visual Dialog: Towards AI agents that can see, talk, and act"
10:00am - 10:30am   Coffee Break (posters can be hung at this time and discussed now or later during the poster session)
10:30am - 11:20am   Invited Talk: Trevor Darrell, UC Berkeley
11:20am - 12:00pm   Spotlights for papers presented in the workshop (3 minutes each):
  - Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset. Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre; UPV/EHU University of the Basque Country (abstract)
  - Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. Abhishek Das*, Satwik Kottur*, José M.F. Moura, Stefan Lee, Dhruv Batra (abstract)
  - Investigation of the correlations between CNN visual features and word embeddings. Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi; Graduate School of System Informatics, Kobe University (abstract)
  - Vision and Language Integration: Objects and beyond. Ravi Shekhar, Sandro Pezzelle, Aurélie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi; University of Trento, Trento, Italy (abstract)
  - Self-view Grounding Given a Narrated 360° Video. Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun; Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan; Microsoft Research, Beijing, China (abstract)
  - Leveraging off-the-shelf models for entry-level tag prediction and ranking. Jorge Sánchez, Agustín Caverzasi; CIEM-CONICET, FaMAF-UNC, DeepVision (abstract)
  - Story Learning from Kids Videos with Successive Event Order Embedding. Min-Oh Heo, Kyung-Min Kim, Byoung-Tak Zhang; Seoul National University (abstract)
  - Re-evaluating automatic metrics for image captioning. Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem; Hacettepe University Computer Vision Lab (reference)
  - Fine-Grained Video Retrieval for Multi-Clip Video. Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä; NAIST, Osaka University, Tampere University of Technology, University of Oulu (abstract)
  - VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation. Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong (abstract, reference: https://arxiv.org/pdf/1708.04686.pdf)
  - An Analysis of Visual Question Answering Algorithms. Kushal Kafle and Christopher Kanan; Rochester Institute of Technology (abstract)
12:00pm - 1:00pm    Lunch break
1:00pm - 2:00pm     Poster Session
2:00pm - 2:35pm     Invited Talk: Justin Johnson, Stanford
                    "Visual Storytelling"
2:35pm - 3:10pm     Invited Talk: Chris Kanan, Rochester Institute of Technology
                    "What does solving VQA mean?"
3:15pm - 3:50pm     Invited Talk: Jia Deng, University of Michigan
                    "Teaching Computers to See and Think"
3:50pm - 4:00pm     Break
4:00pm - 4:45pm     Closing Keynote Talk: Antonio Torralba, MIT
                    Title: TBD
4:45pm - 5:30pm     Panel Discussion (canceled)