BABBLE project: domain-general methods for learning natural spoken dialogue systems

The BABBLE project will provide foundations and impetus for the rapid development of a next generation of naturally interactive conversational interfaces with deep language understanding, in areas as diverse as healthcare, human-robot interaction, wearables, home automation, education, games, and assistive technologies.

Future conversational speech interfaces should allow users to interact with machines using everyday spontaneous language to achieve everyday needs. A commercial example with quite basic capabilities is Apple's Siri. However, even today's limited speech interfaces are difficult and time-consuming to develop for new applications: their key components currently need to be tailor-made by experts for specific application domains, relying either on hand-written rules or on statistical methods that depend on large amounts of expensive, domain-specific, human-annotated dialogue data. The components thus produced are of little or no use in any new application domain, resulting in expensive and time-consuming development cycles.

One key reason for this status quo is that general, scalable methods for natural language understanding (NLU), dialogue management (DM), and natural language generation (NLG) are not yet available for spoken dialogue. Current domain-general methods for language processing are sentence-based: they perform fairly well on written text, but quickly run into difficulties with spoken dialogue, because ordinary conversation is highly fragmentary and incremental. It proceeds word by word rather than sentence by sentence, through half-starts, suggested add-ons, pauses, interruptions, and corrections, without respecting sentence boundaries. And it is precisely these properties that create the feeling of being engaged in a normal, natural conversation, which current state-of-the-art speech interfaces fail to produce.
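To make the contrast concrete, the toy Python sketch below shows word-by-word processing (an invented illustration only, not the project's DS-TTR parser; the class and marker names are hypothetical): a partial interpretation is available after every word, so a self-repair can be handled the moment it occurs, rather than after the sentence ends.

    class IncrementalParser:
        """Consumes one word at a time, keeping a partial state, so the
        system can react mid-utterance (backchannels, completions, repairs)
        instead of waiting for a sentence boundary."""

        def __init__(self):
            self.partial_state = []   # grows (and shrinks) word by word

        def feed(self, word):
            if word == "--":   # toy self-repair marker: retract the last word
                if self.partial_state:
                    self.partial_state.pop()
            else:
                self.partial_state.append(word)
            return " ".join(self.partial_state)   # partial hypothesis so far

    parser = IncrementalParser()
    for token in "the red -- blue square".split():
        print(parser.feed(token))
    # prints: the / the red / the / the blue / the blue square;
    # a sentence-based pipeline would produce nothing until the utterance ends.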

We propose to solve these two problems together by, for the first time:

(1) combining domain-general, incremental, and scalable approaches to NLU, DM, and NLG;

(2) developing machine learning algorithms to automatically create working speech interfaces from data, using (1).

We propose a new method, "BABBLE", by which speech systems can be trained to interact naturally with humans, much like a child who experiments with new combinations of words to discover their usefulness (though the system does this offline, so as not to annoy real users in the process!).
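As a rough sketch of this babbling idea (illustrative only: a single toy target utterance and tabular Q-learning stand in for the project's incremental grammar and its actual learning setup, and every name and number below is an assumption of this example), an agent extends an utterance word by word, ill-formed continuations are rejected outright, and reward arrives only when the completed utterance achieves the goal:

    import random
    from collections import defaultdict

    LEXICON = ["hello", "which", "colour", "shape", "is", "it"]
    GOAL = ("hello", "which", "colour", "is", "it")   # toy target utterance

    def licensed(seq):
        """Stand-in for an incremental grammar: is seq a well-formed prefix?
        (In the project, a grammar such as DS-TTR licenses continuations.)"""
        return seq == GOAL[:len(seq)]

    Q = defaultdict(lambda: 1.0)   # optimistic initial values drive exploration
    ALPHA, GAMMA = 0.5, 0.9

    def episode():
        """One offline babbling episode: extend the utterance word by word."""
        state = ()
        while len(state) < len(GOAL):
            word = max(LEXICON, key=lambda w: Q[(state, w)])   # greedy pick
            nxt = state + (word,)
            if not licensed(nxt):                # grammar rejects: abandon
                Q[(state, word)] += ALPHA * (0.0 - Q[(state, word)])
                return state
            reward = 1.0 if nxt == GOAL else 0.0
            best_next = 0.0 if nxt == GOAL else max(Q[(nxt, w)] for w in LEXICON)
            Q[(state, word)] += ALPHA * (reward + GAMMA * best_next
                                         - Q[(state, word)])
            state = nxt
        return state

    for _ in range(500):   # train offline, away from real users
        episode()
    print(episode())       # a greedy rollout now reproduces GOAL

Because ill-formed continuations are abandoned immediately, the grammar prunes most of the space the learner would otherwise have to explore; this is what makes it feasible to bootstrap working systems from minimal data (see the publications on bootstrapping below).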

Publications

  1. Christine Howes and Arash Eshghi, "Feedback relevance spaces: the organisation of increments in conversation", Proceedings of the 12th International Conference on Computational Semantics (IWCS), Montpellier, 2017 [pdf]
  2. Arash Eshghi, Igor Shalyminov, and Oliver Lemon, "Bootstrapping incremental dialogue systems from minimal data: linguistic knowledge or machine learning?", Proceedings of EMNLP, 2017 [pdf]
  3. Igor Shalyminov, Arash Eshghi, and Oliver Lemon, "Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena", Proceedings of SemDial, 2017 [pdf]
  4. Yanchao Yu, Arash Eshghi, Gregory Mills and Oliver Lemon, "The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings", Proceedings of the Sixth EACL workshop on Vision and Language, Valencia, 2017.
  5. Arash Eshghi, Igor Shalyminov, and Oliver Lemon, "Bootstrapping dialogue systems: using a semantic model of dialogue to generalise from minimal data", Proceedings of the Conference on Logic and Machine Learning in Natural Language (LaML), 2017
  6. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings", Proceedings of Robo-NLP workshop at ACL 2017, ** BEST PAPER AWARD ** [pdf]
  7. Patrick G. T. Healey, Gregory J. Mills, and Arash Eshghi, "Real-time semantic repairs: Is misunderstanding the engine of co-ordination?", Topics in Cognitive Science (topiCS), under review
  8. Arash Eshghi and Oliver Lemon, "Grammars as Mechanisms for Interaction: The Emergence of Language Games", Theoretical Linguistics, 43(1-2), pp. 129-133, 2017. doi:10.1515/tl-2017-0010
  9. Arash Eshghi, Igor Shalyminov, and Oliver Lemon, "Interactional Dynamics and the Emergence of Language Games", FADLI 2017
  10. Ruth Kempson, Eleni Gregoromichelaki, Arash Eshghi, and Julian Hough, "Ellipsis in Dynamic Syntax", in Oxford Handbook of Ellipsis, Oxford University Press, 2017
  11. Dimitris Kalatzis, Arash Eshghi, and Oliver Lemon, "Bootstrapping incremental dialogue systems: using linguistic knowledge to learn from minimal data", NIPS workshop on Learning Methods for Dialogue, 2016 [arXiv]
  12. Oliver Lemon, Arash Eshghi, and Yanchao Yu, "Learning how to learn: grounding word meanings through conversation with humans", Machine Intelligence - Human-Like Computing (MI20-HLC), 2016 [pdf]
  13. Heriberto Cuayahuitl, Simon Keizer, and Oliver Lemon, "Strategic Dialogue Management via Deep Reinforcement Learning", NIPS workshop on Deep Reinforcement Learning, 2015 [arXiv]
  14. Arash Eshghi, "DS-TTR: An incremental, semantic, contextual parser for dialogue", SemDial 2015 (demonstration system)
  15. Oliver Lemon and Arash Eshghi, "Deep Reinforcement Learning for constructing meaning by 'babbling'", International Conference on Computational Semantics (IWCS), 2015
  16. Arash Eshghi, Christine Howes, Eleni Gregoromichelaki, Julian Hough, and Matthew Purver, "Feedback in Conversation as Incremental Semantic Update", International Conference on Computational Semantics (IWCS), 2015
  17. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "Comparing attribute classifiers for interactive language grounding", EMNLP workshop on Vision and Language, 2015
  18. Yanchao Yu, Oliver Lemon, and Arash Eshghi, "Interactive Learning through Dialogue for Multimodal Language Grounding", SemDial, 2015
  19. Ruth Kempson, Ronnie Cann, Arash Eshghi, Eleni Gregoromichelaki, and Matthew Purver, "Ellipsis", in Shalom Lappin and Chris Fox, editors, Handbook of Contemporary Semantic Theory, Wiley, 2nd edition, 2015
  20. Yanchao Yu, Oliver Lemon, and Arash Eshghi, "Comparing dialogue strategies for learning grounded language from human tutors", Proceedings of SemDial, 2016
  21. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "An Incremental Dialogue System for Learning Visually Grounded Language", Proceedings of SemDial, 2016 (demonstration system)
  22. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "Incremental Generation of Visually Grounded Language in Situated Dialogue" (demonstration system), INLG 2016
  23. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "Training an adaptive dialogue policy for interactive learning of visually grounded word meanings", Proceedings of SIGDIAL, 2016 [pdf] [video demonstration]
  24. Yanchao Yu, Arash Eshghi, and Oliver Lemon, "VOILA: An Optimised Dialogue Agent for Interactively Learning Visually-Grounded Word Meanings" (demonstration system), Proceedings of SIGDIAL, 2017

Invited talks

  1. Oliver Lemon, Facebook AI Faculty Summit, New York, October 2017
  2. Oliver Lemon, keynote talk, SIGDIAL/SemDial 2017, Saarbrücken, August 2017 [slides]
  3. Arash Eshghi, University of Gothenburg, 5th November 2015 [slides]
  4. Oliver Lemon, Dundee University, 11th May 2016
  5. Oliver Lemon, "Domain-general learning methods for Conversational Interfaces", International Workshop on Domain Adaptation for Dialog Agents (DADA), Riva del Garda, Italy, 23rd September 2016 [slides] [video]
  6. Arash Eshghi, Queen Mary University of London, November 2016 [slides]
  7. Arash Eshghi, Bielefeld University, December 2016 [slides]

Posters

  1. Heriberto Cuayahuitl, Simon Keizer, and Oliver Lemon, "Strategic Dialogue Management via Deep Reinforcement Learning", Alan Turing Institute Deep Learning workshop, Edinburgh, 2015
  2. Heriberto Cuayahuitl, Simon Keizer, and Oliver Lemon, "Situated Dialogue Management using Deep Reinforcement Learning", EPSRC workshop on Human-Like Computing, Bristol, 2016

Data: Human-Human Dialogue Corpus

We present a new, freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data was collected using a novel, character-by-character variant of the DiET Experimental Toolkit (Healey et al., 2003; Mills and Healey, submitted) with a novel task, in which a learner must acquire invented visual attribute words (such as "burchak" for square) from a tutor. The text-based interactions therefore closely resemble face-to-face conversation, and contain many of the linguistic phenomena of natural, spontaneous dialogue: self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. The data also exhibits a variety of concept learning/teaching strategies. See Yu et al. (2017) for more details [pdf]. The corpus, containing both the original and the cleaned-up dialogues, is available as a .zip file below.
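To give a concrete picture of the learning task (a deliberately simplified sketch: the systems in Yu et al. (2017) train visual attribute classifiers on images, whereas this toy version merely counts symbolic attributes, and all names in it are invented), a learner can accumulate evidence about an invented word across the situations in which the tutor uses it:

    from collections import Counter, defaultdict

    word_attribute_counts = defaultdict(Counter)

    def observe(word, attributes):
        """Tutor uses `word` while the learner sees an object with these
        attributes: credit every co-present attribute (cross-situational)."""
        word_attribute_counts[word].update(attributes)

    def meaning(word):
        """Current best guess: the attribute most often co-present with word."""
        counts = word_attribute_counts[word]
        return counts.most_common(1)[0][0] if counts else None

    # Tutor points at objects and names them with invented words:
    observe("burchak", {"square", "red"})
    observe("burchak", {"square", "blue"})
    observe("suzuli", {"circle", "blue"})

    print(meaning("burchak"))   # -> square

After the two "burchak" episodes, "square" is the only attribute present in both, so it becomes the best guess; the corpus dialogues additionally contain the questions, confirmations, and corrections that a full learning system can exploit.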

People

Principal Investigator: Oliver Lemon

Researcher Co-Investigator: Arash Eshghi

PhD Students: Yanchao Yu and Igor Shalyminov