Statistical Learning for Natural Conversational Interfaces (Nuance invited talk, May 2013)

[slides attached below!]

Abstract:

I discuss new data-driven methods for designing multimodal conversational interfaces, using combinations of speech, spatial information, and visual information, to produce systems which are increasingly robust, adaptive, and `natural'. The key challenges in building natural communication skills into conversational interfaces include handling uncertainty (e.g. from ASR, parsers, location sensors, and vision systems), optimising the selection of communicative actions, incremental processing, and presenting information adaptively for different users and contexts. Statistical Machine Learning provides a suite of tools with which to address these challenges. I illustrate some of our recent advances in these areas, and present our research involving an intelligent speech-enabled tourist guide (`SpaceBook': using 3D city models and question-answering), a socially intelligent robot bartender (`JAMES': using unsupervised learning and computer vision), and incremental dialogue systems (`PARLANCE'). Future directions involve combining statistical approaches (such as POMDP belief tracking) with existing NLP techniques.