Talks

Speaker: Maxine Eskenazi

Title: Human > User in the Loop

Abstract: Most of the work on intelligent agents in the past has centered on the agent itself, ignoring the needs and opinions of the user. We will show that it is essential to include the user in agent development and assessment. There is a significant advantage to relying on real users as opposed to paid users, who are the most prevalent at present. We then present a study that assessed system generation using the user’s following utterance, which gives a more realistic picture of the appropriateness of a system utterance. This leads to a discussion of user-centric evaluation, in which two novel metrics, USR and FED, are introduced. Finally, we present an interactive challenge with real users, held as a track of DSTC9.
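
For readers unfamiliar with the setup, here is a minimal sketch of the next-utterance idea (the cue phrases and function names are invented for illustration; this is not the USR or FED metric): the user’s follow-up turn serves as evidence of whether the system’s utterance was appropriate.

    # Toy illustration: use the user's follow-up utterance as evidence of
    # response quality. The breakdown cues below are invented for the example;
    # a real metric would learn this signal rather than match phrases.
    BREAKDOWN_CUES = ("what?", "that's not what i asked", "you misunderstood", "no,")

    def followup_signal(system_response: str, next_user_utterance: str) -> float:
        """Return 1.0 if the follow-up suggests the response was appropriate,
        0.0 if it signals a dialogue breakdown."""
        utt = next_user_utterance.strip().lower()
        if any(cue in utt for cue in BREAKDOWN_CUES):
            return 0.0
        return 1.0

    # The follow-up reveals a problem the response text alone would hide.
    print(followup_signal("Booked a table for two at 7pm.", "No, I asked for 8pm."))  # 0.0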

Speaker: Milica Gašić

Title: On the track of multi-domain dialogue models

Abstract: Current dialogue models are unnatural, narrow in domain, and frustrating for users. Ultimately, we would like to converse with continuously evolving, human-like dialogue models that are at ease with large and growing domains. Central to this challenge are the limitations of the dialogue state tracking module, which maintains all the information about what has happened in the dialogue so far. Its ability to extend its domain of operation is directly related to how natural the user perceives the system to be. I will talk about some of the latest research from the HHU Dialogue Systems and Machine Learning group that addresses this question.
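
For context, a dialogue state is typically a structured record of slot-value pairs per domain. A minimal sketch of what such a tracker maintains (illustrative only, not the HHU group’s models):

    # Illustrative multi-domain dialogue state: domain -> slot -> value.
    # "Extending the domain of operation" means handling new domains and
    # slots like the taxi entry below without retraining from scratch.
    from collections import defaultdict

    state = defaultdict(dict)

    def update_state(state, domain, slot, value):
        """Overwrite the slot with the most recently grounded value."""
        state[domain][slot] = value
        return state

    update_state(state, "restaurant", "food", "italian")
    update_state(state, "restaurant", "area", "centre")
    update_state(state, "taxi", "destination", "the restaurant")  # a second domain
    print(dict(state))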

Speaker: Larry Heck

Title: Master-Apprentice Learning

Abstract: I will present my recent research on expanding the AI skills of digital assistants through explicit human-in-the-loop dialogue and demonstrations. Digital assistants learn from other digital assistants, with each assistant initially trained through human interaction in the style of a “Master and Apprentice”. For example, when a digital assistant does not know how to complete a requested task, rather than responding “I do not know how to do this yet”, it responds with an invitation to the human: “Can you teach me?”. Apprentice-style learning is powered by a combination of modalities: natural language conversations; non-verbal modalities including gestures, touch, robot manipulation and motion, and gaze; images/videos; and speech prosody. The new apprentice learning model is always helpful and always learning in an open world, as opposed to current commercial digital assistants, which are sometimes helpful, trained exclusively offline, and function over a closed world of “walled garden” knowledge. Master-Apprentice learning has the potential to yield exponential growth in the collective intelligence of digital assistants.
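
A schematic of the interaction loop described above (all names and structure invented for illustration; this is not Heck’s implementation):

    # Schematic "can you teach me?" loop: unknown requests become teaching
    # sessions, and the demonstrated steps are stored as a new skill.
    skills = {}  # task name -> list of demonstrated steps

    def handle_request(task: str, demonstrate) -> str:
        if task in skills:
            return f"Executing {task}: {skills[task]}"
        # Instead of "I do not know how to do this yet", invite teaching.
        print(f"I don't know how to {task} yet. Can you teach me?")
        skills[task] = demonstrate()  # human demonstration, in any modality
        return f"Learned {task} from demonstration."

    print(handle_request("dim the lights", lambda: ["open lights app", "set level 30%"]))
    print(handle_request("dim the lights", None))  # second time: executes the skill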

Speaker: Percy Liang

Title: Semantic Parsing for Natural Language Interfaces

Abstract: Natural language promises to be the ultimate interface for interacting with computers, allowing users to effortlessly tap into the wealth of digital information and extract insights from it. Today, virtual assistants such as Alexa, Siri, and Google Assistant have given a glimpse into how this long-standing dream can become a reality, but there is still much work to be done. In this talk, I will discuss building natural language interfaces based on semantic parsing, which converts natural language into programs that can be executed by a computer. There are multiple challenges in building semantic parsers: how to acquire data without requiring laborious annotation, how to represent the meaning of sentences, and perhaps most importantly, how to widen the domains and capabilities of a semantic parser. Finally, I will talk about a promising new paradigm for tackling these challenges based on learning interactively from users.
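
As a toy illustration of the core idea (the grammar and database here are invented for the example), a semantic parser maps an utterance to an executable program:

    # Toy semantic parser: map an utterance to a small Python expression and run it.
    CITIES = {"Austin": 974000, "Houston": 2310000, "Dallas": 1300000}

    def parse(utterance: str) -> str:
        """Convert a natural-language question into an executable expression."""
        if utterance == "how many cities are there?":
            return "len(CITIES)"
        if utterance.startswith("what is the population of "):
            city = utterance[len("what is the population of "):].rstrip("?").strip()
            return f"CITIES[{city.title()!r}]"
        raise ValueError("utterance not covered by this toy grammar")

    program = parse("what is the population of austin?")
    print(program)                            # CITIES['Austin']
    print(eval(program, {"CITIES": CITIES}))  # 974000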

Speaker: Ankur Parikh

Title: Towards High Precision Text Generation

Abstract: Despite large advances in neural text generation in terms of fluency, existing generation techniques are prone to hallucination and often produce output that is unfaithful or irrelevant to the source text. In this talk, we take a multi-faceted approach to this problem from three aspects: data, evaluation, and modeling. From the data standpoint, we propose ToTTo, a table-to-text dataset with high-quality, annotator-revised references that we hope can serve as a benchmark for the high-precision text generation task. While the dataset is challenging, existing n-gram-based evaluation metrics are often insufficient to detect hallucinations. To this end, we propose BLEURT, a fully learned, end-to-end metric based on transfer learning that can quickly adapt to measure specific evaluation criteria. Finally, we propose a model based on confidence decoding to mitigate hallucinations.

Collaborators: This is joint work with Thibault Sellam, Ran Tian, Xuezhi Wang, Sebastian Gehrmann, Shashi Narayan, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, and Dipanjan Das.
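
BLEURT is released as open source; a usage sketch along the lines of the project README (the checkpoint name and path are whatever you have downloaded, e.g. BLEURT-20):

    # pip install git+https://github.com/google-research/bleurt.git
    # and download a checkpoint, e.g. BLEURT-20, before running.
    from bleurt import score

    scorer = score.BleurtScorer("BLEURT-20")  # path to the unpacked checkpoint
    scores = scorer.score(
        references=["The cat sat on the mat."],
        candidates=["A cat was sitting on the mat."],
    )
    print(scores)  # one float per candidate; higher means closer to the reference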

Speaker: Alexander I Rudnicky

Title: Creating socialbots with human-like conversational abilities

Abstract: We have two different communities in spoken language interaction, one focused on goal-oriented dialog systems, the other on open-domain conversational agents. The latter has allowed us to focus on the mechanics of conversation and on the role of social behaviors. This talk describes some of our recent work on conversation systems.

Speaker: Gokhan Tur

Title: Past, Present, Future of Conversational AI

Abstract: Recent advances in deep learning methods for language processing, especially self-supervised learning, have generated new excitement about building more sophisticated Conversational AI systems. While this is partially true for social chatbots and retrieval-based applications, the underlying skeleton of goal-oriented systems has remained unchanged: most language understanding models still rely on supervised methods with manually annotated datasets, even though the resulting performances are significantly better with much less data. In this talk I will cover two directions we are exploring to break away from this. The first aims to incorporate multimodal information for better understanding and semantic grounding. The second introduces an interactive self-supervision method to gather immediate, actionable user feedback, converting frictional moments into learning opportunities for interactive learning.
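
A hedged illustration of the friction-as-supervision idea (the heuristic and names below are invented for the example): when a turn fails and the user’s immediate rephrase succeeds, the successful interpretation becomes a label for the original utterance, with no human annotator involved.

    # Toy version of turning a frictional moment into a training example:
    # pair the failed utterance with the interpretation that later succeeded.
    training_pool = []

    def log_turn(utterance, predicted_intent, succeeded, previous_failed_utterance=None):
        if succeeded and previous_failed_utterance is not None:
            # Self-supervised pair harvested from the interaction itself.
            training_pool.append((previous_failed_utterance, predicted_intent))

    log_turn("play the thing from yesterday", "unknown", succeeded=False)
    log_turn("resume yesterday's podcast", "resume_podcast", succeeded=True,
             previous_failed_utterance="play the thing from yesterday")
    print(training_pool)  # [('play the thing from yesterday', 'resume_podcast')]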

Speaker: Jason Weston

Title: (Towards) Learning from Conversing

Abstract:

Speaker: Zhou Yu

Title: Augment Intelligence with Multimodal Information

Abstract: