NLP for Conversational AI

ACL 2019 Workshop in Florence, Italy


Ever since the invention of the intelligent machine, thousands of mathematicians, linguists, and computer scientists have dedicated their careers to empowering human-machine communication in natural language. Although this vision finally seems within reach, thanks to the proliferation of virtual personal assistants such as Siri, Alexa, Google Assistant, and Cortana, developing these conversational agents remains difficult, and plenty of questions and challenges remain unanswered.

Conversational AI is hard because it is an interdisciplinary subject. Initiatives have been launched across research communities, from the Dialog State Tracking Challenges to the NIPS Conversational Intelligence Challenge live competition and the Amazon Alexa Prize. However, many fields within the NLP community, such as semantic parsing, coreference resolution, sentiment analysis, question answering, and machine reading comprehension, have seldom been evaluated or applied in the context of conversational AI.

The goal of this workshop is to bring together NLP researchers and practitioners from different fields, alongside experts in speech and machine learning, to discuss the current state of the art and new approaches, to share insights and challenges, to bridge the gap between academic research and real-world product deployment, and to shed light on future directions. “NLP for Conversational AI” will be a one-day workshop including keynotes, spotlight talks, posters, and panel sessions. In the keynote talks, senior technical leaders from industry and academia will share insights on the latest developments in the field. An open call for papers will be announced to encourage researchers and students to share their perspectives and latest findings. The panel discussion will focus on the challenges and future directions of conversational AI research, on bridging the gap between research and industrial practice, and on audience-suggested topics.

Important Dates

    • Paper Submission: April 26, 2019 (23:59 PST)
    • Notification of Acceptance: May 24, 2019
    • Camera-ready Paper Due: June 3, 2019
    • Workshop Date: August 1, 2019

Invited Speakers

Matt’s background is in statistical methods for conversational language understanding. He is the lead scientist of PolyAI, a spin-out from Steve Young’s lab at Cambridge University, where he did his PhD. PolyAI is building a machine learning platform for spoken dialogue. After his PhD, he worked on neural network methods for speech synthesis with Heiga Zen at Google Research London, before moving to Ray Kurzweil’s language understanding group at Google Research in Mountain View. There he was technical lead for the Smart Reply research team, inventing a new method for modelling email response suggestions that allowed the feature to scale from Inbox to all of Gmail. He was also the principal data scientist at Carousell, where he launched image-caption and category-suggestion, chat-reply, and question-answering features.

Keynote: TBD

Verena Rieser is a Professor in Computer Science at Heriot-Watt University, Edinburgh, where she is affiliated with the Interaction Lab. Verena holds a PhD from Saarland University (2008) and worked as a postdoctoral researcher at the University of Edinburgh (2008-11). Her research focuses on machine learning techniques for spoken dialogue systems and language generation, where she has authored almost 100 peer-reviewed papers. She has served as area chair for ACL for both generation and dialogue. For the past two years, Verena and her group were the only UK team to make it through to the finals of the Amazon Alexa Prize.

Keynote: Should Conversational AI use neural response generation?

Neural methods are a powerful tool for learning language models from large amounts of data, which can be used for text generation. But can they be used to accurately convey meaning in spoken dialogue systems?

In this talk I will discuss this question in light of recent results from two large-scale studies on response generation in dialogue. First, I will summarise results from the End-to-End NLG Challenge on presenting information in task-based dialogue systems. Second, I will report our experience from experimenting with these models for generating responses in open-domain social dialogue as part of the Amazon Alexa Prize challenge.

Steve Young is Emeritus Professor of Information Engineering at Cambridge University and a Senior Member of Technical Staff at Apple. His main research interests lie in the area of statistical spoken language systems, including speech recognition, speech synthesis, and dialogue management. He has more than 300 publications in the field, and he is the recipient of a number of awards, including an IEEE Signal Processing Society Technical Achievement Award, a EURASIP Technical Achievement Award, the IEEE James L. Flanagan Speech and Audio Processing Award, and an ISCA Medal for Scientific Achievement. He is a Fellow of the IEEE, ISCA, and the IET, and a Fellow of the UK Royal Academy of Engineering. In addition to his academic career, he was Senior Pro-Vice-Chancellor at Cambridge University from 2009 to 2015, and he has founded a number of successful start-ups in the speech and language technology area.

Keynote: TBD

Jason Weston is a research scientist at Facebook, NY and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2001, he was a researcher at Biowulf Technologies. From 2002 to 2003, he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009, he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014, he was a research scientist at Google, NY. His interests lie in statistical machine learning, with a focus on reasoning, memory, perception, interaction, and communication. Jason has published over 100 papers, including best paper awards at ICML and ECML, and a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", ICML 2008 (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as the 16th most influential machine learning scholar by AMiner and one of the top 50 authors in Computer Science in Science.

Keynote: Putting together the threads of conversational AI?

Maybe we don't have enough threads yet to knit together the whole, but let's try anyway! We present our view of what is necessary for conversational AI, and the pieces we have worked on so far to get there. In particular: software (ParlAI, a unified platform for dialogue research); various neural architectures for memory, reasoning, retrieval, generation, and interactive learning; tasks for employing personality (PersonaChat), knowledge (Wizard of Wikipedia), and perception (Image-Chat); evaluation studies and techniques (Dialogue NLI); and a recent competition we ran (ConvAI2) that unfortunately shows how far we still have to go.

Jianfeng Gao is Partner Research Manager at Microsoft Research AI, Redmond. He leads the development of AI systems for machine reading comprehension (MRC), question answering (QA), social bots, goal-oriented dialogue, and business applications. From 2014 to 2017, he was Partner Research Manager of the Deep Learning Technology Center at Microsoft Research, Redmond, where he led research on deep learning for text and image processing. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a Research Lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 2000 to 2005, he was a Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where he and his colleagues developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IMEs), which were the leading products in the market, and the natural language platform for Microsoft Windows. He is an IEEE Fellow.

Keynote: The design and implementation of XiaoIce, an empathetic social chatbot

In this talk, I will describe the development of Microsoft XiaoIce, the most popular social chatbot in the world. XiaoIce is uniquely designed as an AI companion with an emotional connection to satisfy the human need for communication, affection, and social belonging. We take into account both intelligence quotient (IQ) and emotional quotient (EQ) in the system design, cast human-machine social chat as decision-making over Markov Decision Processes (MDPs), and optimize XiaoIce for long-term user engagement, measured in expected Conversation-turns Per Session (CPS). We detail the system architecture and key components, including the dialogue manager, core chat, skills, and an empathetic computing module. We show how XiaoIce dynamically recognizes human feelings and states, understands user intent, and responds to user needs throughout long conversations. Since its release in 2014, XiaoIce has communicated with over 660 million users and succeeded in establishing long-term relationships with many of them. Analysis of large-scale online logs shows that XiaoIce has achieved an average CPS of 23, significantly higher than that of other chatbots and even human conversations.
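The CPS metric and the MDP framing described in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy example, not XiaoIce's actual implementation: the states, actions, and value estimates are invented for illustration only.

```python
# Toy illustration of two ideas from the abstract (hypothetical values,
# NOT the actual XiaoIce system):
#   1. CPS: Conversation-turns Per Session, the engagement metric.
#   2. A dialogue manager choosing actions to maximize expected
#      long-term engagement, as in an MDP with a greedy policy.

def cps(session_turn_counts):
    """Average number of conversation-turns per session."""
    return sum(session_turn_counts) / len(session_turn_counts)

# Hypothetical action-value estimates Q[(state, action)]: the expected
# number of remaining turns if the dialogue manager hands this turn to
# general "core chat" vs. a task-specific "skill".
Q = {
    ("chitchat_state", "core_chat"): 12.0,
    ("chitchat_state", "skill"): 4.0,
    ("task_state", "core_chat"): 3.0,
    ("task_state", "skill"): 9.0,
}

def select_action(state):
    """Greedy policy: pick the action with the highest estimated
    long-term engagement (expected remaining turns)."""
    actions = [a for (s, a) in Q if s == state]
    return max(actions, key=lambda a: Q[(state, a)])

print(cps([20, 26, 23]))             # -> 23.0 (turns per session)
print(select_action("chitchat_state"))  # -> core_chat
print(select_action("task_state"))      # -> skill
```

Under this framing, optimizing for CPS rather than per-turn response quality is what pushes the system toward responses that keep the conversation going over the long run.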

To be continued ...