Accepted Full-Papers
(Presented as 4 min spotlight talks and posters in poster session 2)
1: MoQA - A Multi-Modal Question Answering Architecture
Monica Haurilet, Ziad Al-Halah, Rainer Stiefelhagen
2: Knowing Where to Look? Analysis on Attention of Visual Question Answering System
Wei Li, Zehuan Yuan, Changhu Wang
3: Pre-gen metrics: Predicting caption quality metrics without generating captions
Marc Tanti, Albert Gatt, Adrian Muscat
4: Quantifying the amount of visual information used by neural caption generators
Marc Tanti, Albert Gatt, Kenneth Camilleri
5: Distinctive-attribute Extraction for Image Captioning
Boeun Kim, Young Han Lee, Hyedong Jung, Choongsang Cho
6: Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data
Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen
7: How Do End-to-End Image Description Systems Generate Spatial Relations?
Mohammad Mehdi Ghanimifard, Simon Dobnik
8: How clever is the FiLM model, and how clever can it be?
Alexander Kuhnle, Huiyuan Xie, Ann Copestake
9: Image-sensitive language modeling for automatic speech recognition
Kata Naszadi, Dietrich Klakow
10: Improving Context Modelling in Multimodal Dialogue Generation
Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser
11: Adding Object Detection Skills to Visual Dialogue Agents
Gabriele Bani, Tim Baumgärtner, Aashish Venkatesh, Davide Belli, Gautier Dagan, Alexander Geenen, Andrii Skliar, Elia Bruni, Raquel Fernandez
Accepted Extended-Abstracts
(Presented as posters in poster session 1.) Download their PDFs here
1: Video Object Segmentation with Language Referring Expressions
Anna Khoreva, Anna Rohrbach, Bernt Schiele
2: Semantic Action Discrimination in Movie Description Dataset
Andrea Amelio Ravelli, Lorenzo Gregori, Lorenzo Seidenari
3: Learning to see from experience: But which experience is more propaedeutic?
Ravi Shekhar, Ece Takmaz, Nikos Kondylidis, Claudio Greco, Aashish Venkatesh, Raffaella Bernardi, Raquel Fernandez
4: Visual Dialogue Needs Symmetry, Goals, and Dynamics: The Example of the MeetUp Task
David Schlangen, Nikolai Ilinykh, Sina Zarrieß
5: Building Common Ground in Visual Dialogue: The PhotoBook Task and Dataset
Janosch Haber, Elia Bruni, Raquel Fernandez
6: Entity-Grounded Image Captioning
Annika Lindh, Robert Ross, John Kelleher
7: Modular Mechanistic Networks for Computational Modeling of Spatial Descriptions
Simon Dobnik, John D. Kelleher (will not be presented)
8: Visual Question Answering as a Meta Learning Task
Damien Teney, Anton van den Hengel
9: An Evaluative Look at the Evaluation of VQA
Shailza Jolly, Sandro Pezzelle, Tassilo Klein, Moin Nabi
10: The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR
Mateusz Malinowski, Carl Doersch
11: Make up Your Mind: Towards Consistent Answer Predictions in VQA Models
Arijit Ray, Giedrius Burachas, Karan Sikka, Anirban Roy, Avi Ziskind, Yi Yao, Ajay Divakaran
12: Visual speech language models
Helen L Bear
13: Be Different to Be Better: Toward the Integration of Vision and Language
Sandro Pezzelle, Claudio Greco, Aurelie Herbelot, Tassilo Klein, Moin Nabi, Raffaella Bernardi
14: Towards Speech to Sign Language Translation
Amanda Cardoso Duarte, Gorkem Camli, Jordi Torres, Xavier Giro-i-Nieto
15: The overlooked role of self-agency in artificial systems
Matthew D Goldberg, Justin Brody, Timothy Clausner, Donald Perlis
16: Women also Snowboard: Overcoming Bias in Captioning Models
Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach
17: Estimating Visual Fidelity in Image Captions
Pranava Madhyastha, Josiah Wang, Lucia Specia
18: Object Hallucination in Image Captioning
Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko
19: From Entailment to Generation
Somayeh Jafaritazehjani, Albert Gatt