Programme and schedule

[Photo: Artur Tarwacki, Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10505289]

(The conference will be held both in person and fully online; see the registration page.)

Location: Humanisten J335, Renströmsgatan 6, Gothenburg, Sweden

Initial proceedings link for the archival papers: https://gupea.ub.gu.se/handle/2077/73614 (ACL Anthology link to follow)

Sept 15

8:30-9:15 - Arrival, registration, and seating (registration desk at the Humanisten main entrance, across from the service centre; coffee served at 8:50).

9:10-9:20 - Welcome and introductory remarks from the General Chair (Asad Sayeed)

9:20-9:30 - Opening remarks from the CLASP Director (Shalom Lappin)

9:30-10:00 - A small but informed and diverse model: The case of the multimodal GuessWhat!? guessing game (Claudio Greco, Alberto Testoni, Raffaella Bernardi and Stella Frank) [long paper, archival]

10:00-10:25 - A Cross-lingual Comparison of Human and Model Relative Word Importance (Felix Morger, Stephanie Brandl, Lisa Beinborn and Nora Hollenstein) [late-breaking long paper, archival]

10:25-10:50 - ACT-Thor: A Controlled Benchmark for Embodied Action Understanding in Simulated Environments (Michael Hanna, Federico Pedeni, Alessandro Suglia, Alberto Testoni and Raffaella Bernardi) [late-breaking long paper, non-archival]

10:50-11:10 - Break

11:10-12:00 - Keynote 1: Magnus Sahlgren - The Singleton Fallacy: why current critiques of language models miss the point

  • There is currently a lively debate about the semantic (in)capabilities of language models: do language models really understand language, or are they simply stochastic parrots? Are we wasting our time in the pursuit of bigger and bigger models, and should we instead be climbing some other hill in the NLP landscape? This talk provides an overview of the different positions in the debate and attempts to disentangle it by pointing out an argumentation error referred to as the singleton fallacy.

12:00-13:30 - Lunch (Restaurang Näckrosen, Humanisten Level 4)

13:30-13:50 - Dispatcher: A Message-Passing Approach To Language Modelling (Alberto Cetoli) [short paper, archival]

13:50-14:20 - In search of meaning and its representations for computational linguistics (Simon Dobnik, Robin Cooper, Adam Ek, Bill Noble, Staffan Larsson, Nikolai Ilinykh, Vladislav Maraev and Vidya Somashekarappa) [long paper, archival]

14:20-14:40 - Can We Use Small Models to Investigate Multimodal Fusion Methods? (Lovisa Hagström, Tobias Norlund and Richard Johansson) [short paper, archival]

14:40-15:00 - Break

15:00-15:50 - Keynote 2: Afra Alishahi - Getting closer to reality: Grounding and interaction in models of human language acquisition

  • Humans learn to understand speech from weak and noisy supervision: they manage to extract structure and meaning from speech simply by being exposed to utterances situated and grounded in their daily sensory experience. Emulating this remarkable skill has been the goal of numerous studies; however, researchers have often used severely simplified settings in which the language input, the extralinguistic sensory input, or both are small-scale and symbolically represented. I present a series of studies on modelling visually grounded language understanding.

15:50-16:00 - Charon: a FrameNet Annotation Tool for Multimodal Corpora (Frederico Belcavello, Marcelo Viridiano, Ely Matos and Tiago Timponi Torrent) [short paper, non-archival]

16:00-16:20 - Language Modelling with Pixels (Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux and Desmond Elliott) [late-breaking long paper, non-archival]

16:30-22:00 - EVENING BANQUET - Gunnebo Slott (departure and return by bus)


Sept 16

9:30-10:00 - Embodied Interaction in Mental Health Consultations: Some Observations on Grounding and Repair (Jing Hui Law, Patrick Healey and Rosella Galindo Esparza) [long paper, archival]

10:00-10:20 - The Case for Perspective in Multimodal Datasets (Marcelo Viridiano, Tiago Timponi Torrent, Oliver Czulo, Arthur Lorenzi, Ely Matos and Frederico Belcavello) [late-breaking long paper, non-archival]

10:20-10:50 - Effects of Task and Visual Context on Referring Expressions using Natural Scenes (Andreas Mädebach, Ekaterina Torubarova, Eleonora Gualdoni and Gemma Boleda) [late-breaking long paper, non-archival]

10:50-11:15 - Break

11:15-11:40 - Woman or tennis player? Visual typicality and lexical frequency affect variation in object naming (Eleonora Gualdoni, Thomas Brochhagen, Andreas Mädebach and Gemma Boleda) [student paper, non-archival]

11:40-12:00 - One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones (Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao and Yu Su) [late-breaking long paper, non-archival]

12:05-13:30 - Lunch (Restaurang Näckrosen, Humanisten Level 4)

13:30-14:00 - Norm Participation Grounds Language (David Schlangen) [long paper, archival]

14:00-14:50 - Keynote 3: Felix Hill - Three studies that show that artificial models of general intelligence learn better with language

  • Having and using language makes humans, as a species, better learners and better able to solve hard problems. I'll present three studies that demonstrate how this is also the case for artificial models of general intelligence. In the first, I show that agents with access to visual and linguistic semantic knowledge explore their environment more effectively than non-linguistic agents, enabling them to learn more about the world around them. In the second, I demonstrate how an agent embodied in a simulated 3D world can be enhanced by learning from explanations: answers to the question "why?" expressed in language. Agents that learn from explanations solve harder cognitive challenges than those trained with reinforcement learning alone, and can also better learn to make interventions in order to uncover the causal structure of their world. Finally, I'll present evidence that the skewed and bursty distribution of natural language may explain how large language models can be prompted to rapidly acquire new skills or behaviours. Together with other recent literature, this suggests that modelling language may make a neural network better able to acquire new cognitive capacities quickly, even when those capacities are not necessarily explicitly linguistic.

14:50-15:10 - Break

15:10-15:35 - Where am I and where should I go? Grounding positional and directional labels in a disoriented human balancing task (Sheikh Mannan and Nikhil Krishnaswamy) [late-breaking long paper, archival]

15:35-16:05 - From speed to car and back. An exploratory study about associations between abstract nouns and images (Ludovica Cerini, Eliana Di Palma and Alessandro Lenci) [long paper, archival]

16:05-16:35 - Free-form open discussion on grounding, meaning, and language modeling (Asad Sayeed)

16:35-16:45 - Closing remarks, farewell