Location: Sala Pasinetti (Palazzo del Cinema)
8:30am              Opening
8:40am - 9:20am     Invited Talk: Larry Zitnick, Facebook AI Research
9:20am - 10:00am    Invited Talk: Dhruv Batra, Georgia Tech and Facebook AI Research
                    "Visual Dialog: Towards AI agents that can see, talk, and act"
10:00am - 10:30am   Coffee Break (posters can be hung at this time and discussed now or later during the poster session)
10:30am - 11:20am   Invited Talk: Trevor Darrell, UC Berkeley
11:20am - 12:00pm   Spotlights for papers presented in the workshop (3 minutes each):
  - Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset. Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre; UPV/EHU University of the Basque Country (abstract)
  - Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. Abhishek Das*, Satwik Kottur*, José M.F. Moura, Stefan Lee, Dhruv Batra (abstract)
  - Investigation of the correlations between CNN visual features and word embeddings. Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi; Graduate School of System Informatics, Kobe University (abstract)
  - Vision and Language Integration: Objects and beyond. Ravi Shekhar, Sandro Pezzelle, Aurélie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi; University of Trento, Trento, Italy (abstract)
  - Self-view Grounding Given a Narrated 360° Video. Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun; Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan; Microsoft Research, Beijing, China (abstract)
  - Leveraging off-the-shelf models for entry-level tag prediction and ranking. Jorge Sánchez, Agustín Caverzasi; CIEM-CONICET, FaMAF-UNC, DeepVision (abstract)
  - Story Learning from Kids Videos with Successive Event Order Embedding. Min-Oh Heo, Kyung-Min Kim, Byoung-Tak Zhang; Seoul National University (abstract)
  - Re-evaluating automatic metrics for image captioning. Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem; Hacettepe University Computer Vision Lab (reference)
  - Fine-Grained Video Retrieval for Multi-Clip Video. Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä; NAIST, Osaka University, Tampere University of Technology, University of Oulu (abstract)
  - VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation. Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong (abstract, reference: https://arxiv.org/pdf/1708.04686.pdf)
  - An Analysis of Visual Question Answering Algorithms. Kushal Kafle and Christopher Kanan; Rochester Institute of Technology (abstract)
12:00pm - 1:00pm    Lunch break
1:00pm - 2:00pm     Poster Session
2:00pm - 2:35pm     Invited Talk: Justin Johnson, Stanford
                    "Visual Storytelling"
2:35pm - 3:10pm     Invited Talk: Chris Kanan, Rochester Institute of Technology
                    "What does solving VQA mean?"
3:15pm - 3:50pm     Invited Talk: Jia Deng, University of Michigan
                    "Teaching Computers to See and Think"
3:50pm - 4:00pm     Break
4:00pm - 4:45pm     Closing Keynote Talk: Antonio Torralba, MIT
                    Title: TBD
4:45pm - 5:30pm     Panel Discussion (canceled)