NLP for Conversational AI

ACL 2019 Workshop at Florence, Italy

Program

Schedule (1 August 2019)

Verena Rieser is a Professor in Artificial Intelligence at Heriot-Watt University, Edinburgh, where she is affiliated with the Interaction Lab and the Edinburgh Center for Robotics. Verena holds a PhD from Saarland University (2008) and from 2008-2011 she was a postdoctoral researcher at the University of Edinburgh.

The ongoing theme of Verena's research is to develop intelligent conversational systems, such as chatbots and virtual personal assistants, where she investigates machine learning techniques for building these systems automatically from data. She has authored over 100 peer-reviewed papers in this area. For example, Verena was one of the first researchers to introduce Reinforcement Learning to optimise task-based dialogue and Natural Language Generation (NLG). In 2017 and 2018, Verena's team was the only one from a UK university to reach the finals of the prestigious Amazon Alexa Prize Challenge, which aims to build open-domain chatbots.

Verena holds a number of leadership roles within the research community: she serves on advisory committees for shared tasks (DSTC, WOCHAT), special interest groups (SigDial, SemDial) and steering committees (SIGGEN), sits on editorial boards, organises international workshops and conferences, acts as programme chair for major ACL conferences (e.g. ACL two years in a row), and is a regular invited speaker and panel member.

Verena is actively involved in setting new research agendas for the community. For example, she organised the highly subscribed E2E NLG shared task, which attracted 62 system submissions, about a third of them from industry. This work was nominated for a SigDial 2017 best paper award.

Verena's work has also been recognised with a number of prizes, including the Dr-Eduard-Martin Prize for outstanding research, first place in the SemEval 2016 Shared Task on Arabic Sentiment Analysis, and runner-up finishes in the Amazon Alexa Prize 2017 and 2018. In 2019 alone, Verena was invited to give keynotes at 7 international conferences and workshops. Her outreach activities have been recognised with the EPSRC Public Engagement Award 2018 and a Herald Higher Education Award 2018 for Heriot-Watt's "Year of Robotics", as well as with nominations as one of "12 Women Shaping AI" by NESTA, "One to Watch 2019" by FutureScot, and "Pioneering Women in AI" by The Telegraph (2019).

Keynote: Should Conversational AI use neural response generation?

Neural methods are a powerful tool for learning language models from large amounts of data, which can be used for text generation. But can they be used to accurately convey meaning in spoken dialogue systems?

In this talk I will discuss this question in the light of recent results from two large-scale studies on response generation in dialogue: First, I will summarise results from the End-to-End NLG Challenge for presenting information in task-based dialogue systems. Second, I will report our experience from experimenting with these models for generating responses in open-domain social dialogue as part of the Amazon Alexa Prize challenge.

Matt's background is in statistical methods for conversational language understanding. He is the lead scientist at PolyAI, a spin-out from Steve Young's lab at the University of Cambridge, where he did his PhD. PolyAI is building a machine learning platform for spoken dialogue. After his PhD, he worked on neural network methods for speech synthesis with Heiga Zen at Google Research London, before moving to Ray Kurzweil's language understanding group at Google Research in Mountain View. There he was the technical lead for the Smart Reply research team, inventing a new method of modelling email response suggestion that allowed the feature to scale from Inbox to all of Gmail. He was also principal data scientist at Carousell, where he launched image caption and category suggestion, chat reply, and question answering features.

Keynote: Neural Models of Response Selection for Bootstrapping Dialogue Systems

Unsupervised pre-training followed by in-domain fine-tuning has proven to be a powerful technique for natural language processing in recent years. This talk motivates response selection as a semi-supervised pre-training task, leveraging large corpora of natural conversational data to learn useful representations specifically for task-based dialogue systems. Dual encoder models are fast to train on this task, and can be far more computationally efficient than general models of language such as BERT. This talk presents how we have applied this paradigm at PolyAI to rapidly develop task-based dialogue systems.
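To make the recipe concrete, the sketch below illustrates the general dual-encoder idea with in-batch negatives in PyTorch. It is a minimal, hypothetical example (the bag-of-embeddings encoders and the in_batch_loss helper are stand-ins chosen for brevity), not PolyAI's actual model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DualEncoder(nn.Module):
        """Encode context and response independently and score them by dot
        product, so candidate response vectors can be pre-computed and
        response selection stays fast at inference time."""

        def __init__(self, vocab_size, dim=256):
            super().__init__()
            # Bag-of-embeddings encoders stand in for the real sentence encoders.
            self.context_enc = nn.EmbeddingBag(vocab_size, dim)
            self.response_enc = nn.EmbeddingBag(vocab_size, dim)

        def forward(self, context_ids, response_ids):
            c = F.normalize(self.context_enc(context_ids), dim=-1)
            r = F.normalize(self.response_enc(response_ids), dim=-1)
            return c @ r.t()  # (batch, batch) similarity matrix

    def in_batch_loss(scores):
        # Each context's true response sits on the diagonal of the score
        # matrix; the other responses in the batch act as random negatives.
        targets = torch.arange(scores.size(0), device=scores.device)
        return F.cross_entropy(scores, targets)

In the spirit of the pre-train/fine-tune paradigm described in the abstract, such a model would first be trained on large general conversational corpora and then fine-tuned on the in-domain candidate responses of a particular task-based system.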

Jianfeng Gao is Partner Research Manager at Microsoft Research AI, Redmond. He leads the development of AI systems for machine reading comprehension (MRC), question answering (QA), social bots, goal-oriented dialogue, and business applications. From 2014 to 2017, he was Partner Research Manager at the Deep Learning Technology Center at Microsoft Research, Redmond, where he led research on deep learning for text and image processing. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a Research Lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 2000 to 2005, he was a Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where he and his colleagues developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IMEs) which were the leading products in the market, and the natural language platform for Microsoft Windows. He is an IEEE Fellow.

Keynote: The design and implementation of XiaoIce, an empathetic social chatbot

In this talk, I will describe the development of the Microsoft XiaoIce system, the most popular social chatbot in the world. XiaoIce is uniquely designed as an AI companion with an emotional connection to satisfy the human need for communication, affection, and social belonging. We take into account both intelligent quotient (IQ) and emotional quotient (EQ) in system design, cast human-machine social chat as decision-making over Markov Decision Processes (MDPs), and optimize XiaoIce for long-term user engagement, measured in expected Conversation-turns Per Session (CPS). We detail the system architecture and key components, including the dialogue manager, core chat, skills, and an empathetic computing module. We show how XiaoIce dynamically recognizes human feelings and states, understands user intent, and responds to user needs throughout long conversations. Since its release in 2014, XiaoIce has communicated with over 660 million users and succeeded in establishing long-term relationships with many of them. Analysis of large-scale online logs shows that XiaoIce has achieved an average CPS of 23, which is significantly higher than that of other chatbots and even human conversations.
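As a concrete reading of the engagement metric, CPS is simply the average number of conversation-turns per session. The toy snippet below reproduces that arithmetic; the session data layout is assumed purely for illustration.

    from statistics import mean

    def conversation_turns_per_session(sessions):
        """Average number of conversation-turns per session (CPS).
        `sessions` is assumed to be a list of dialogues, each a list of
        (user_utterance, bot_response) turn pairs; the schema is illustrative."""
        return mean(len(turns) for turns in sessions)

    # Toy example: three sessions of 20, 25 and 24 turns average out to CPS = 23.
    print(conversation_turns_per_session([[None] * 20, [None] * 25, [None] * 24]))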


Invited Talk Details

Yejin Choi is an associate professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a senior research manager at AI2, overseeing the Mosaic project. Her research interests include language grounding with vision, physical and social commonsense knowledge, language generation with long-term coherence, conversational AI, and AI for social good. She was a recipient of the Borg Early Career Award (BECA) in 2018, was among the IEEE's AI Top 10 to Watch in 2015, was a co-recipient of the Marr Prize at ICCV 2013, and was a faculty advisor for the Sounding Board team that won the inaugural Alexa Prize Challenge in 2017. Her work on detecting deceptive reviews, predicting literary success, and interpreting bias and connotation has been featured by numerous media outlets, including NBC News for New York, NPR, the New York Times, and Bloomberg Businessweek. She received her Ph.D. in Computer Science from Cornell University.

Keynote: The Curious Case of Degenerate Neural Conversation

Despite considerable advances in deep neural language models, the enigma of neural text degeneration persists when these models are used as text generators, especially for conversation and, more broadly, for open-ended long-form text generation.

In this talk, I will share our recent results that attack three major roadblocks toward neural conversation with long-term coherence: (1) neural text generation with long-term coherence (GROVER with Nucleus Sampling), (2) neural representations of social commonsense intelligence (COMET on ATOMIC), and (3) QA benchmarks for conversation understanding (DREAM and QuAC).
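For readers unfamiliar with the first of these, Nucleus Sampling (top-p sampling) truncates the model's next-token distribution to the smallest set of tokens whose cumulative probability exceeds p and renormalises before sampling, which avoids both the blandness of greedy decoding and the incoherence of sampling from the full tail. The snippet below is a minimal NumPy sketch of that procedure, not the authors' implementation; the function name and parameters are illustrative.

    import numpy as np

    def nucleus_sample(logits, p=0.9, rng=None):
        """Sample one token id with nucleus (top-p) sampling."""
        rng = rng or np.random.default_rng()
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        order = np.argsort(-probs)                                 # most probable first
        cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1   # nucleus size
        nucleus = order[:cutoff]
        # Renormalise within the nucleus and sample a single token id.
        return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))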

Jason Weston is a research scientist at Facebook, New York, and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2001, he was a researcher at Biowulf Technologies. From 2002 to 2003, he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009, he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014, he was a research scientist at Google, New York. His interests lie in statistical machine learning, with a focus on reasoning, memory, perception, interaction, and communication. Jason has published over 100 papers, including best paper awards at ICML and ECML, and received a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning" (ICML 2008, with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as the 16th most influential machine learning scholar by AMiner and as one of the top 50 authors in computer science by Science.

Keynote: Putting together the threads of conversational AI?

Maybe we don't have enough threads yet to knit together the whole, but let's try anyway! We present our view of what is necessary for conversational AI, and the pieces we have worked on so far to get there. In particular: software (ParlAI, a unified platform for dialogue research); various neural architectures for memory, reasoning, retrieval, generation, and interactive learning; tasks for employing personality (PersonaChat), knowledge (Wizard of Wikipedia) and perception (Image-Chat); evaluation studies and techniques (dialogue NLI); and a recent competition (ConvAI2) we ran that shows, unfortunately, how far we still have to go.

Ruhi Sarikaya has been Director of Applied Science at Amazon Alexa since 2016, where he built the Alexa Brain organization largely from the ground up. His team has been building core AI capabilities and launching features around ranking, relevance, natural language understanding, dialog management, contextual understanding, personalization, and end-to-end offline/online metrics and learning for Alexa. Prior to that, he was a principal science manager and the founder of the language understanding and dialog systems group at Microsoft between 2011 and 2016. His group built the language understanding and dialog management capabilities of Cortana and Xbox One, and the underlying platform supporting both first and third parties. Before Microsoft, he was a research staff member and team lead in the Human Language Technologies Group at IBM T.J. Watson Research Center for ten years. Prior to IBM, he worked as a researcher at the Center for Spoken Language Research (CSLR) at the University of Colorado at Boulder for two years. He received his Ph.D. degree in electrical and computer engineering from Duke University, NC in 2001. He has published over 120 technical papers in refereed journal and conference proceedings and is the inventor of 70 issued or pending patents. He has received a number of prestigious awards for his work, including two Outstanding Technical Achievement Awards (2005 and 2008), two Research Division Awards (2005 and 2007), and a best paper award (ASRU 2013). Dr. Sarikaya has served on the IEEE SLTC, as general co-chair of IEEE SLT'12, as publicity chair of IEEE ASRU'05, and as associate editor of IEEE Transactions on Audio, Speech and Language Processing and IEEE Signal Processing Letters. He also gave a tutorial at Interspeech 2007, and he regularly gives keynotes at major AI, Web, speech and language technology conferences.

Keynote: Enabling Scalable, Natural, Self-Learning Contextual Conversational Systems

We are in the midst of an AI revolution. This revolution is enabled by several disruptive changes, including a dramatic increase in compute power, the mobile internet, and advances in machine learning. The next decade is expected to be about the proliferation of Internet-of-Things (IoT) devices and sensors, which will give rise to new forms of interaction with these systems. These systems will generate exponentially larger amounts of data and pave the way for a 'collective intelligence' of IoT systems. From the end-user's perspective, though, while new forms of contextual interactions will emerge, the interaction complexity will also increase. Users will interact with these systems under increasingly rich context (e.g. physical space/location/vehicle, date/time, end-point, device type, and input/output modality). Conversational AI has a critical role to play in this evolution, but only if it delivers on its promise of enabling natural, frictionless interaction in any context the user is in, while hiding the complexity of these systems. However, current commercial conversational AI systems are primarily trained with a supervised learning paradigm, which is difficult, if not impossible, to scale by curating data for an increasingly complex set of contextual conditions. Inherent ambiguity in natural language further complicates the problem. We need to devise new forms of learning that will scale to this complexity. In this talk, we present some early steps we are taking to 1) interpret users' natural language requests contextually, 2) provide contextually and personally relevant answers, and 3) self-learn by relying on user interactions for supervision.
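To make the third point concrete: relying on user interactions for supervision means turning implicit signals (for example, whether the user barged in or immediately rephrased the request) into weak labels that can retrain the interpretation models without manual annotation. The sketch below is purely illustrative; the dictionary schema and heuristics are assumptions, not Alexa's actual signals.

    def weak_label(turn, next_turn=None):
        """Turn implicit user feedback into a weak training label for the
        interpretation hypothesis chosen at this turn. The keys
        ('interpretation', 'user_interrupted', 'user_rephrased') and the
        heuristics are hypothetical."""
        failed = turn["user_interrupted"] or (next_turn and next_turn["user_rephrased"])
        # 0 = likely misinterpretation (negative example), 1 = request handled (positive).
        return turn["interpretation"], 0 if failed else 1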