
MaDrIgAL: Multi-Dimensional Interaction Management and Adaptive Learning


 
Funder: EPSRC (https://www.epsrc.ac.uk)

Duration: 1 June 2016 - 31 May 2019
Investigators: Dr. Verena Rieser and Dr. Simon Keizer
Institute: Interaction Lab, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh (UK)
Collaborators: Prof. Harry Bunt (University of Tilburg, Netherlands), Dr. Norbert Pfleger (SemVox GmbH, Germany)

Summary

As tech giants like Google, Facebook, Apple and Microsoft continue to invest in speech technology, the global voice recognition market is projected to reach a value of $133 billion by 2017 (companiesandmarkets.com, 2015). Speech-enabled interactive systems in particular, such as Apple's Siri and Microsoft's Cortana, are starting to show significant economic impact, with the Virtual Personal Assistant (VPA) market estimated to grow from $352 million in 2012 to over $3 billion in 2020 (Grand View Research, 2014).

Although such commercial systems allow consumers to use their voice to interact with their devices and services, the user experience remains constrained by the lack of naturalness of the conversations and the limited social intelligence of the VPA. Moreover, the quality of these user interfaces relies on large, carefully crafted rule sets, making development labour-intensive and hard to scale to new application domains. With the emergence of the Internet of Things and voice control in the smart home, there is a huge demand for scalable development of natural conversational interfaces across task domains.

MaDrIgAL will develop a radically new approach to building interactive spoken language interfaces by exploiting the multi-dimensional nature of natural language conversation: in addition to carrying out the underlying task or activity, participants in a dialogue simultaneously address several other aspects of communication, such as giving and eliciting feedback and adhering to social conventions. By analogy with the interweaving voices in a madrigal, simultaneous processes for each dimension operate in harmony to produce multifunctional, natural utterances. Consider the two alternative responses S2a and S2b in the following example:

U1: Hello, I would like to book a flight to London.

S2a: Which date did you have in mind?

S2b: Okay, flying to London on what date?


Whereas S2a only asks for the next piece of information needed to book the flight (uni-dimensional), S2b also gives feedback about the destination city, allowing the user to correct any recognition errors (multi-dimensional). We aim to develop a principled multi-dimensional modelling and learning framework that covers a wide range of such phenomena, including the implicit confirmation in S2b.
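To make the contrast concrete, the sketch below represents each system response as a set of dialogue acts, one per dimension, loosely in the spirit of multi-dimensional dialogue act theory. All class, dimension, and slot names here are illustrative assumptions, not the project's actual formalism:

```python
from dataclasses import dataclass, field

@dataclass
class DialogueAct:
    """One communicative act in one dimension (names are hypothetical)."""
    dimension: str            # e.g. "task", "auto-feedback", "social"
    act_type: str             # communicative function, e.g. "set-question"
    content: dict = field(default_factory=dict)  # semantic content

# S2a: "Which date did you have in mind?" -- task dimension only.
s2a = [DialogueAct("task", "set-question", {"slot": "date"})]

# S2b: "Okay, flying to London on what date?" -- multifunctional:
# it asks the task question AND implicitly confirms the destination.
s2b = [
    DialogueAct("task", "set-question", {"slot": "date"}),
    DialogueAct("auto-feedback", "implicit-confirm",
                {"destination": "London"}),
]

def dimensions(acts):
    """Return the set of dimensions an utterance addresses."""
    return {a.dimension for a in acts}

print(dimensions(s2a))  # uni-dimensional: only the task
print(dimensions(s2b))  # multi-dimensional: task plus feedback
```

Modelling the response as a set of per-dimension acts, rather than a single act, is what lets independent processes (and, later, independently learned policies) contribute to one surface utterance.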

This multi-dimensional approach will not only allow us to build systems that support more natural and effective interactions with users, but will also enable cost-effective development of such interfaces for a variety of domains by learning transferable conversational skills (e.g., selecting actions in domain-independent dimensions). We will therefore demonstrate our approach by building interactive spoken language interfaces for multiple application domains in a home automation scenario, allowing users to interact with, for example, their Smart TV or heating control system. We will closely collaborate with the industrial partner SemVox to explore this scenario.

The project will bring together expertise in statistical machine learning approaches to state-of-the-art spoken dialogue systems and natural language generation, as well as linguistic theories of multi-dimensional dialogue modelling (collaborating in particular with academic partner Prof. Bunt). MaDrIgAL will develop Next Generation Interaction Technologies relevant to Health Technology and Assisted Living, as well as tackle the question of a common user interface to the Internet of Things and Big Data.
