Swedish Dialogue Workshop 2019

Time and place

The Swedish Dialogue Workshop (Svensk dialogverkstad) 2019 is held at the University of Gothenburg on Thursday 13th June, 2019.

The location is the Department of Philosophy, Linguistics and Theory of Science, Olof Wijksgatan 6, Room T219.

Preliminary schedule

  • 09.30-10.00: incremental arrival; coffee and sandwiches
  • 10.00-12.00: talks & discussions (4 talks): Skantze, Noble, Boye, Dobnik
  • 12.00-13.30: lunch at Världskulturmuseet
  • 13.30-15.00: talks (3 talks): Larsson, Berman & Hjelm; Ekstedt; Axelsson
  • 15.00-15.30: coffee
  • 15.30-17.00: talks & discussion (2 talks + discussion): Saget, Edlund, discussion (briefing about ongoing activities and plans, ...?)
  • 17.00-?: drinks & dinner somewhere in Gothenburg

Abstracts

Nils Axelsson (KTH): Modelling art presentations with behaviour trees

Behaviour trees have traditionally been used in robotics and video games, but are not commonly applied to social robotics. I propose that they can be used to model attention, hearing, understanding, and acceptance in a joint action ladder, and that this can be used to create a system that presents a poster or an object in an adaptive way. I also present results from an experiment with a system designed with behaviour trees and joint project theory as its basis.
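
As a concrete illustration (a minimal sketch, not Axelsson's actual system; the node and state names here are invented), a behaviour tree can gate a presentation on the joint action ladder by ticking one condition per level, so the robot only proceeds while the listener attends, hears, understands, and accepts:

    # Minimal behaviour-tree sketch; check names are hypothetical.
    from enum import Enum

    class Status(Enum):
        SUCCESS = 1
        FAILURE = 2
        RUNNING = 3

    class Sequence:
        """Ticks children in order; stops at the first non-SUCCESS child."""
        def __init__(self, *children):
            self.children = children
        def tick(self, state):
            for child in self.children:
                status = child.tick(state)
                if status != Status.SUCCESS:
                    return status
            return Status.SUCCESS

    class Condition:
        """Leaf node wrapping a boolean check on the interaction state."""
        def __init__(self, check):
            self.check = check
        def tick(self, state):
            return Status.SUCCESS if self.check(state) else Status.FAILURE

    # One condition per level of the joint action ladder.
    ladder = Sequence(
        Condition(lambda s: s["attending"]),
        Condition(lambda s: s["hearing"]),
        Condition(lambda s: s["understanding"]),
        Condition(lambda s: s["accepting"]),
    )
    print(ladder.tick({"attending": True, "hearing": True,
                       "understanding": False, "accepting": False}))  # Status.FAILURE

In a full system the leaves would query perception and dialogue modules rather than a static dictionary, and RUNNING would cover presentation actions that span several ticks.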

Johan Boye (KTH): Neural spatial reference resolution

Adding interactive capabilities to pedestrian wayfinding systems in the form of spoken dialogue will make them more natural to humans. Such an interactive wayfinding system needs to continuously understand and interpret pedestrians' utterances referring to the spatial context. Achieving this requires the system to identify exophoric referring expressions in the utterances and to link these expressions to geographic entities in the vicinity. This exophoric spatial reference resolution problem is difficult, as there are often several dozen candidate referents. We present a neural network-based approach for identifying pedestrians' references (using a network called RefNet) and resolving them to the appropriate geographic objects (using a network called SpaceRefNet). Both methods show promising results, beating the respective baselines and earlier results reported in the literature.
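
The published architectures are more involved, but the core ranking step can be sketched as follows (an illustration only, not the actual RefNet/SpaceRefNet; all dimensions and feature choices are invented): score every nearby geographic entity against an encoding of the referring expression and pick the argmax.

    # Hypothetical candidate-ranking sketch for exophoric reference resolution.
    import torch
    import torch.nn as nn

    class CandidateScorer(nn.Module):
        def __init__(self, utt_dim=128, cand_dim=32, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(utt_dim + cand_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, utterance, candidates):
            # utterance: (utt_dim,) encoding of the referring expression
            # candidates: (n, cand_dim) features per entity, e.g. distance,
            # bearing relative to the pedestrian, and a type embedding
            n = candidates.size(0)
            pairs = torch.cat([utterance.expand(n, -1), candidates], dim=1)
            return self.mlp(pairs).squeeze(1)  # one score per candidate

    scorer = CandidateScorer()
    scores = scorer(torch.randn(128), torch.randn(40, 32))  # 40 nearby entities
    print(scores.argmax().item())  # index of the predicted referent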

Simon Dobnik (CLASP, GU): Language, action and perception

Situated agents interact both with the physical environment they are located in and with their conversational partners. In this talk we examine the interaction between language, action and perception, using spatial descriptions as an example, and look at how this interaction has been addressed in computational models: both in terms of how different modalities contribute to meaning representations and in terms of how to handle continuously changing dialogue and perceptual contexts.

Jens Edlund (KTH): Swedish speech tech resources at Språkbanken Tal in 2020

In 2018, the time-honoured "Språkbanken" in Göteborg was reorganized into "Nationella språkbanken", a Swedish national research infrastructure funded by the Swedish Research Council and by its dozen or so partners. The new organization consists of "Språkbanken Text" (the new name for the "old" Språkbanken); "Språkbanken Sam", hosted by the Language Council of Sweden and focused on societal issues; and "Språkbanken Tal", hosted by KTH Speech, Music and Hearing, with speech science and speech technology infrastructure as its domain.

The new parts of this infrastructure are still in their build-up phase, but we are gearing up this autumn, and at some point in 2020 the first deliverables will be in place: freely available speech-to-text (ASR), text-to-speech (TTS), and forced alignment for Swedish. This presentation gives a quick walk-through of the work ahead and of what we will offer.

Erik Ekstedt (KTH): Unsupervised and Representation Learning for (Turn Taking in) Spoken Dialogue Systems

For spoken dialog systems, the coordination of turns between interlocutors is vital for comfortable and fluent dialog. This coordination is complex to learn and depends on many aspects of both semantic and acoustic information. In addition, it is difficult to separate turn taking from other aspects of a conversation, which makes turn-taking models difficult to evaluate. A reductionist view of the problem could therefore hinder development in this area of research, and more refined models might be necessary. Taking inspiration from other successful areas such as text generation and image recognition, we discuss how unsupervised and representation learning could be a valid step forward in the area of spoken dialog systems.
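
One generic way to realize this idea, sketched below under invented dimensions and architecture (not Ekstedt's model), is self-supervised next-frame prediction over dialog feature streams: no turn-taking labels are needed, and the learned hidden states can later be probed for turn shifts.

    # Self-supervised representation learning over dialog frames (toy sketch).
    import torch
    import torch.nn as nn

    class NextFramePredictor(nn.Module):
        def __init__(self, feat_dim=40, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, feat_dim)

        def forward(self, frames):
            # frames: (batch, time, feat_dim), e.g. acoustic features per frame
            states, _ = self.rnn(frames)
            return self.head(states)  # predicted frame t+1 at each step t

    model = NextFramePredictor()
    x = torch.randn(8, 200, 40)       # a batch of dialog feature snippets
    loss = nn.functional.mse_loss(model(x)[:, :-1], x[:, 1:])
    loss.backward()                   # trains without any turn-taking labels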

Staffan Larsson, Alex Berman & David Hjelm (GU, CLASP, Talkamatic): Towards Negotiative Dialogue for the Talkamatic Dialogue Manager

We describe a number of dialogue phenomena associated with negotiative dialogue, as implemented in a development version of the Talkamatic Dialogue Manager (TDM). This implementation is an initial step towards full coverage of general features of negotiative dialogue in TDM.

Bill Noble (GU, CLASP): Towards a formal model of word meaning negotiation

Word meaning negotiation (WMN) is a conversational routine in which speakers explicitly discuss the meaning of a lexical item. WMNs occur when one participant disagrees with or doesn’t understand what a speaker meant by a particular word or phrase. Such a discrepancy represents a breakdown in the alignment of the participants’ lexico-semantic resources.

WMN has been studied as an aspect of language acquisition (Varonis and Gass, 1985; Clark, 2007) and appears in psycholinguistic research on semantic alignment (Brennan and Clark, 1996; Metzing and Brennan, 2003). Recently, Myrendal (2015) has taken a more in-depth look at the phenomenon, analyzing WMNs in Swedish online discussion forums. We present a work-in-progress formalization and annotation schema based on this work.

The proposed model represents the WMN as a graph structure, where the nodes are semantic anchors and the edges are proposed semantic links between them. Anchors are meaningful units agents use to “triangulate” the meaning of the term in question. They may include explications (definitions), examples, and possibly other terms. Links are semantic relations between anchors, as proposed (or accepted) by the speakers involved.
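
In code, the proposed structure might be rendered like this (our reading of the abstract, not the authors' implementation; the anchor kinds and the example are invented):

    # Minimal WMN graph: anchors as nodes, semantic links as edges.
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Anchor:
        kind: str          # "explication", "example", "term", ...
        content: str

    @dataclass
    class Link:
        source: Anchor
        target: Anchor
        relation: str              # the proposed semantic relation
        status: str = "proposed"   # becomes "accepted" on uptake

    @dataclass
    class WMNGraph:
        anchors: set = field(default_factory=set)
        links: list = field(default_factory=list)

        def propose(self, source, target, relation):
            self.anchors |= {source, target}
            self.links.append(Link(source, target, relation))

    g = WMNGraph()
    term = Anchor("term", "nätmobbning")
    g.propose(term, Anchor("explication", "bullying that happens online"), "is-a")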

This model seeks to shed light on how speakers recover from such breakdowns, and how meaning negotiation contributes to lasting semantic change. Further, we hope that the semantic negotiation strategies employed by humans will suggest methods by which dialogue systems can recover from similar breakdowns in semantic alignment.

Sylvie Saget (GU, CLASP): Dialogue management and memory

While the majority of studies in dialogue address dialogue management from a process perspective, our project aims to develop an original approach to dialogue management from the perspective of information, epistemic states and memory architecture. We will briefly detail our approach.

Gabriel Skantze (KTH): Learning symbol grounding through dialog

A fundamental requirement for any intelligent system that is situated in a physical environment – and that should also be able to reason symbolically about this environment or communicate about it using symbolic language – is that it can understand the relationship between these symbols, the objects or phenomena they denote, and their properties.

As humans, we typically learn these relationships through dialog with other humans who share the same symbol system. For a robot or intelligent system to be able to interact with humans in their language, it is important that it can learn to adopt their symbol system and how it is grounded. Children typically learn language implicitly by observing other agents communicate in a situated environment, or by taking part in interaction with other agents and gradually adopting their language use. Learning, however, can be made more effective by combining implicit learning with explicit teaching acts, such as pointing at an object and stating its name, or providing a linguistic explanation of an object or a concept.
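
A toy sketch of this combination (the generic idea only, not the model from the study below; all names and weights are invented) treats implicit learning as counting word/object co-occurrences across situations, with explicit teaching acts contributing higher-weight observations:

    # Cross-situational word learning with weighted explicit teaching (toy).
    from collections import defaultdict

    cooc = defaultdict(lambda: defaultdict(float))

    def observe(words, objects, weight=1.0):
        """Credit every word/object pair present in the same situation."""
        for w in words:
            for o in objects:
                cooc[w][o] += weight

    def ground(word):
        """Most strongly associated object for a word, if any."""
        return max(cooc[word], key=cooc[word].get, default=None)

    observe(["pass", "me", "the", "mug"], ["mug", "plate"])      # implicit
    observe(["this", "is", "a", "mug"], ["mug"], weight=5.0)     # explicit teaching
    print(ground("mug"))  # -> "mug"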

In this talk, I will present an initial study on how this process can be modeled, which was presented at EMNLP 2018. In this study, we explored how a computer could learn to understand the semantics of referring language by observing humans interact with each other in a collaborative task, and how the language used by the two interlocutors converges after repeated interactions. By adapting the model to the participants’ specific language use, the reference resolution model improved.

I will also talk briefly about our plans for a newly funded WASP project, called "Robot learning of symbol grounding in multiple contexts through dialog". We are currently setting up a data-collection framework based on the game Codenames Pictures, where players can connect online and play the game together.