Workshop on Shortcomings in Vision and Language (SiVL)

ECCV 2018, Munich, Germany

September 8, 2018

Program Schedule

Accepted Full-Papers

1: MoQA - A Multi-Modal Question Answering Architecture

Monica Haurilet, Ziad Al-Halah, Rainer Stiefelhagen

2: Knowing Where to Look? Analysis on Attention of Visual Question Answering System

Wei Li, Zehuan Yuan, Changhu Wang

3: Pre-gen metrics: Predicting caption quality metrics without generating captions

Marc Tanti, Albert Gatt, Adrian Muscat

4: Quantifying the amount of visual information used by neural caption generators

Marc Tanti, Albert Gatt, Kenneth Camilleri

5: Distinctive-attribute Extraction for Image Captioning

Boeun Kim, Young Han Lee, Hyedong Jung, Choongsang Cho

6: Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data

Alina Roitberg, Manel Martinez, Monica Haurilet, Rainer Stiefelhagen

7: How Do End-to-End Image Description Systems Generate Spatial Relations?

Mohammad Mehdi Ghanimifard, Simon Dobnik

8: How clever is the FiLM model, and how clever can it be?

Alexander Kuhnle, Huiyuan Xie, Ann Copestake

9: Image-sensitive language modeling for automatic speech recognition

Kata Naszadi, Dietrich Klakow

10: Improving Context Modelling in Multimodal Dialogue Generation

Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser

11: Adding Object Detection Skills to Visual Dialogue Agents

Gabriele Bani, Tim Baumgärtner, Aashish Venkatesh, Davide Belli, Gautier Dagan, Alexander Geenen, Andrii Skliar, Elia Bruni, Raquel Fernandez

Accepted Extended-Abstracts

1: Video Object Segmentation with Language Referring Expressions

Anna Khoreva, Anna Rohrbach, Bernt Schiele

2: Semantic Action Discrimination in Movie Description Dataset

Andrea Amelio Ravelli, Lorenzo Gregori, Lorenzo Seidenari

3: Learning to see from experience: But which experience is more propaedeutic?

Ravi Shekhar, Ece Takmaz, Nikos Kondylidis, Claudio Greco, Aashish Venkatesh, Raffaella Bernardi, Raquel Fernandez

4: Visual Dialogue Needs Symmetry, Goals, and Dynamics: The Example of the MeetUp Task

David Schlangen, Nikolai Ilinykh, Sina Zarrieß

5: Building Common Ground in Visual Dialogue: The PhotoBook Task and Dataset

Janosch Haber, Elia Bruni, Raquel Fernandez

6: Entity-Grounded Image Captioning

Annika Lindh, Robert Ross, John Kelleher

7: Modular Mechanistic Networks for Computational Modeling of Spatial Descriptions

Simon Dobnik, John D. Kelleher (will not be presented)

8: Visual Question Answering as a Meta Learning Task

Damien Teney, Anton Van Den Hengel

9: An Evaluative Look at the Evaluation of VQA

Shailza Jolly, Sandro Pezzelle, Tassilo Klein, Moin Nabi

10: The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Mateusz Malinowski, Carl Doersch

11: Make up Your Mind: Towards Consistent Answer Predictions in VQA Models

Arijit Ray, Giedrius Burachas, Karan Sikka, Anirban Roy, Avi Ziskind, Yi Yao, Ajay Divakaran

12: Visual speech language models

Helen L Bear

13: Be Different to Be Better: Toward the Integration of Vision and Language

Sandro Pezzelle, Claudio Greco, Aurelie Herbelot, Tassilo Klein, Moin Nabi, Raffaella Bernardi

14: Towards Speech to Sign Language Translation

Amanda Cardoso Duarte, Gorkem Camli, Jordi Torres, Xavier Giro-i-Nieto

15: The overlooked role of self-agency in artificial systems

Matthew D Goldberg, Justin Brody, Timothy Clausner, Donald Perlis

16: Women also Snowboard: Overcoming Bias in Captioning Models

Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach

17: Estimating Visual Fidelity in Image Captions

Pranava Madhyastha, Josiah Wang, Lucia Specia

18: Object Hallucination in Image Captioning

Anna Rohrbach, Lisa Anne Hendricks, Kaylee Burns, Trevor Darrell, Kate Saenko

19: From entailment to Generation

Somayeh Jafaritazehjani, Albert Gatt