ICLR-21 Workshop on Neural Conversational AI:
Bridging the gap between research and real world

7 May 2021

Joining workshop

On the day of workshop, please enter sessions through this link: https://iclr.cc/virtual/2021/workshop/2133


Every day, millions of people use natural language interfaces in virtual digital assistants such as Amazon Alexa, Apple’s Siri, Google Assistant, Microsoft Cortana, Samsung’s Bixby and Facebook Portal via in-home devices or phones. At the same time, interest among the NLP research community in conversational systems has blossomed to the extent that Dialogue and Interactive Systems is consistently among the top three tracks at NLP conferences, receiving record numbers of submissions. Today’s industrial conversational AI systems are built using the traditional NLP pipeline, i.e., natural language understanding, dialog state tracking, dialog policy, and natural language generation. Despite its success, this pipeline fundamentally limits the performance, humanness, and scaling of conversational AI systems. To overcome these challenges, dialog researchers have started embracing end-to-end neural approaches for the next generation of conversational AI systems, as such approaches have been setting state-of-the-art performance records on several NLP tasks. However, neural conversational AI systems are still far from shippable in the real world. We identify the following main outstanding questions to bridge this gap.

  • Grounding in external systems: How can neural conversational AI assistants ground conversations against external systems, e.g. databases, services, and other information modalities? What is the right architecture for ingesting such information and ensuring factual correctness? What is the right representation for external information sources to enable efficient integration with the dialog context?

  • Safety/integrity/robustness: How can we prevent neural conversational AI assistants from making inappropriate or even harmful remarks? How should we think about ensuring the safety, ethics, and integrity of human-facing neural conversational agents? How should we make them robust against adversarial users?

  • Continual learning: How can neural conversational AI systems learn from their mistakes, adapt to user feedback, and continue to update with new information to become more effective and personalized assistants throughout their lifetimes? What is the right representation for such user preferences?

The goal of this workshop is to bring together machine learning researchers and dialog researchers from academia and industry to encourage knowledge transfer and collaboration aimed at bridging the gap between research and real-world use cases in neural approaches to Conversational AI. The ideal outcome of the workshop is to identify a set of concrete research directions for the research community (both the NLP and representation learning communities) to enable neural conversational AI systems in the real world. We will make the findings from this workshop broadly available to the research community.


Please contact the organizing committee at neuralconvai-iclr2021@googlegroups.com if you have any questions.

Invited Speakers & Panelists

Safety for Open-Domain Dialogue Agents

Emily Dinan (Facebook AI)

Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior and unwanted biases. In this talk, I will discuss some recent work investigating methods for mitigating these issues in the context of open-domain generative dialogue models. Among other methods, I will introduce a new human-and-model-in-the-loop framework for both training safer models and evaluating them. Finally, I will discuss some limitations of this work, as well as next steps for this line of research.

Emily Dinan is a Research Engineer at Facebook AI Research in New York. Her research interests include conversational AI, natural language processing, and fairness and responsibility in these fields. Recently she has focused on methods for preventing conversational agents from reproducing biased, toxic, or otherwise harmful language. Prior to joining FAIR, she received her master's degree in Mathematics from the University of Washington.

Robust conversational AI with grounded text generation

Jianfeng Gao (Microsoft Research)

In this talk, I present a hybrid approach based on a Grounded Text Generation (GTG) model to building robust task bots at scale. GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone, combined with symbol-manipulation modules for knowledge base inference and prior knowledge encoding, to generate responses grounded in dialog belief state and real-world knowledge for task completion. GTG is pre-trained on large amounts of raw text and human conversational data, and can be fine-tuned to complete a wide range of tasks. The hybrid approach and its variants are being developed simultaneously by multiple research teams. The preliminary results reported on task-oriented dialog benchmarks are very promising, demonstrating the big potential of this approach. I provide an overview of this progress and discuss related methods and technologies that can be incorporated for building robust conversational AI systems.

Jianfeng Gao is a Distinguished Scientist and Vice President of Microsoft. He is the manager of the Deep Learning (DL) group at Microsoft Research, leading the development of AI systems for natural language processing, Web search, vision-language understanding, dialogue, and business applications. He is an IEEE Fellow. From 2014 to 2017, he was Partner Research Manager at the Deep Learning Technology Center at Microsoft Research, Redmond, where he led research on deep learning for text and image processing. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a Research Lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 2000 to 2005, he was a Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where he and his colleagues developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IMEs), which were the leading products in the market, and the natural language platform for Microsoft Windows.

Conversational AI Research @ Facebook AI

Alborz Geramifard (Facebook AI)

The goal of the Conversational AI team at Facebook AI Applied Research is to create AI-driven dialog capabilities with a focus on augmented/virtual reality products. This talk provides an overview of our recent efforts on data collection, multimodal dialog, pipelined model-based policies, and end-to-end architectures.

Alborz Geramifard is a senior research manager at Facebook AI supporting the Conversational AI team. Prior to joining Facebook, he led the conversational AI team at Amazon Alexa and created more than a dozen NLU models shipped into production. Prior to Amazon, he was a postdoctoral fellow at MIT’s Laboratory for Information & Decision Systems. He received his PhD from MIT in 2011 and his MSc from the University of Alberta in 2008, both with a focus on reinforcement learning. Alborz was a recipient of an NSERC postgraduate scholarship (2010-2012). He has contributed to the community in various roles, including guest editor for the Machine Learning Journal and AI Magazine and Area Chair for EMNLP and ACL.

Conversational Machines: Bringing (Un)Structured World Knowledge to Task-Oriented Conversations

Dilek Hakkani-Tür (Amazon Alexa AI)

In the past decade, we have seen significant advances in language understanding and conversational systems. However, the chasm between the treatment of task-oriented and open-domain social conversations still remains, and task-oriented dialogue systems are restricted to the limited coverage of APIs related to the set of tasks considered in the application domain. Users of conversational systems often have domain-related requests that are not covered by these APIs, even for their task-focused intents. To enable natural conversational interactions with virtual agents, we propose to expand the coverage of task-oriented dialogue systems by incorporating external, unstructured knowledge sources, such as web documents related to the task domain. We recently introduced an augmented version of the MultiWOZ multi-domain task-oriented dialogue corpus, which includes sub-dialogues of out-of-API-coverage turns and responses grounded on external knowledge sources, and organized an associated track at DSTC-9. In this talk, I’ll first discuss challenging issues towards task completion, then present a description of how we integrate knowledgeable responses into task-oriented conversations, and finally summarize our learnings from the DSTC-9 track, our findings since then, and challenges for future research that motivate our new track in DSTC-10.

Dilek Hakkani-Tür is a senior principal scientist at Amazon Alexa AI and a Visiting Distinguished Professor at UC Santa Cruz, focusing on enabling natural dialogues with machines. Prior to joining Amazon, she held research scientist positions at Google, Microsoft Research, the International Computer Science Institute, and AT&T Labs-Research. She received her BSc degree from Middle East Technical Univ., and her MSc and PhD degrees from Bilkent Univ., in Computer Science. Her research interests include conversational AI, natural language and speech processing, spoken dialogue systems, and machine learning for language processing. She has over 80 granted patents and has co-authored more than 300 papers in natural language and speech processing. She served as an associate editor of IEEE Transactions on Audio, Speech and Language Processing, a member of the IEEE Speech and Language Technical Committee, an area editor for speech and language processing for Elsevier's Digital Signal Processing Journal and IEEE Signal Processing Letters, and a member of the ISCA Advisory Council. She is currently the Editor-in-Chief of the IEEE/ACM Transactions on Audio, Speech and Language Processing, and a Fellow of the IEEE and ISCA.

Learning to Summarize Visits from Doctor Patient Conversations

Zachary Chase Lipton (CMU)

Following each patient visit, physicians must draft a detailed clinical summary called a SOAP note. Moreover, with electronic health records, these notes must be digitized. Despite the benefits of this documentation, their creation remains an onerous process, contributing to increasing physician burnout. In this talk, I present the first study to evaluate complete pipelines that train summarization models to generate these notes from conversations between physicians and patients. We benefit from a dataset that, along with transcripts and paired SOAP notes, contains annotations marking the noteworthy utterances that support each summary sentence. We decompose the problem into extractive and abstractive subtasks, exploring a spectrum of approaches according to how much they demand from each component. We observe that performance improves as we shift the burden to the extractive subtask. Our best performing method first (i) extracts noteworthy utterances via multi-label classification, assigning each to summary section(s); (ii) clusters noteworthy utterances on a per-section basis; and (iii) generates summary sentences by conditioning on the corresponding cluster and the SOAP note subsection to be generated.
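The three-stage decomposition described in this abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the section names, keyword rules, and concatenation step are hypothetical stand-ins for the trained multi-label classifier, clustering step, and conditional generator.

```python
# Sketch of the extract -> cluster -> abstract pipeline for SOAP note generation.
# All names and rules below are illustrative stand-ins, not the actual models.

SECTIONS = ["subjective", "assessment"]  # hypothetical SOAP subsections

def extract(utterances):
    """Stage (i): tag each noteworthy utterance with the section(s) it supports.
    Stand-in: keyword rules in place of a trained multi-label classifier."""
    keywords = {"subjective": ["pain", "feel"], "assessment": ["likely", "diagnosis"]}
    tagged = []
    for utt in utterances:
        labels = [s for s, kws in keywords.items()
                  if any(k in utt.lower() for k in kws)]
        if labels:  # utterances matching no section are treated as not noteworthy
            tagged.append((utt, labels))
    return tagged

def cluster(tagged):
    """Stage (ii): group noteworthy utterances per section; each cluster is meant
    to support one summary sentence (here simplified to one cluster per section)."""
    clusters = {s: [] for s in SECTIONS}
    for utt, labels in tagged:
        for s in labels:
            clusters[s].append(utt)
    return clusters

def abstract(clusters):
    """Stage (iii): produce one summary sentence conditioned on each cluster and
    its section. Stand-in: concatenation in place of a conditional generator."""
    return {s: f"[{s}] " + " / ".join(utts) for s, utts in clusters.items() if utts}

transcript = [
    "I've been having chest pain for two days.",
    "It's likely musculoskeletal, not cardiac.",
    "The weather has been nice lately.",
]
note = abstract(cluster(extract(transcript)))
```

Note how shifting the burden to the extractive stage, as the abstract reports, shows up even in this toy version: once the right utterances are routed to the right sections, the generation step has far less to do.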

Zachary Chase Lipton is the BP Junior Chair Assistant Professor of Operations Research and Machine Learning at Carnegie Mellon University and a Visiting Scientist at Amazon AI. His research spans core machine learning methods and their social impact and addresses diverse application areas, including clinical medicine and natural language processing. Current research focuses include robustness under distribution shift, breast cancer screening, the effective and equitable allocation of organs, and the intersection of causal thinking and the messy high-dimensional data that characterizes modern deep learning applications. He is the founder of the Approximately Correct blog (approximatelycorrect.com) and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks. Find him on Twitter (@zacharylipton) or GitHub (@zackchase).

Continual Learning Dialogue Systems - Learning after Model Deployment

Bing Liu (UIC)

In existing dialogue system applications, once a dialogue system is built and deployed, it is fixed. It does not learn during conversations with users to improve itself. This is a serious limitation. We humans learn a great deal of our knowledge from our daily conversations. In this talk, I will discuss some of our recent work that tries to endow dialogue systems with the ability to continuously learn new factual knowledge and new language expressions during interactive conversations with human users. Over time, the dialogue system becomes increasingly knowledgeable and better at conversing. We call this learning paradigm learning after model deployment, or learning on the job.

Bing Liu is a distinguished professor at the University of Illinois at Chicago (UIC). He received his Ph.D. in AI from the University of Edinburgh. Before joining UIC, he was a faculty member at the National University of Singapore. His current research interests include lifelong/continual learning, dialogue systems, sentiment analysis, natural language processing (NLP), machine learning, and data mining. He has published extensively in top conferences and journals, and authored four books: one on lifelong machine learning, two on sentiment analysis, and one on Web mining. Three of his papers received Test-of-Time awards (two from KDD and one from WSDM), and another received a Test-of-Time honorable mention award, also from WSDM. Some of his work has also been widely reported in the press, including a front-page article in the New York Times. He served as the Chair of ACM SIGKDD from 2013 to 2017, as program chair of many leading data mining conferences, including KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, and as area/track chair or senior PC member of numerous NLP, AI, Web, and data mining conferences. He is the recipient of the 2018 ACM SIGKDD Innovation Award, and is a Fellow of ACM, AAAI and IEEE.

What's wrong with SotA in Conversational AI: Data, Models and Metrics

Verena Rieser (Heriot Watt University)

Current NLP benchmarks are well known to be misleading: they contain noisy data, allow spurious correlations, and encourage overfitting through IID test sets.

In this talk, I will point out some additional challenges for Conversational AI, which I will illustrate using three case studies: our work on visual dialogue (Agarwal et al., ACL’20), data-to-text (e.g. Dusek et al., 2020) and open-domain social conversation, e.g. (Xu et al., EMNLP’18; Curry & Rieser, SigDial’19).

In particular, I will argue that we need more reliable metrics, valid datasets and better task-formulations, as well as more principled human evaluations in order to measure true progress in this field. I will also briefly present our latest work on controllable neural models, including commonsense grounding in open-domain conversation (Sevegnani et al. 2021) and planning-based data-to-text generation (Xu et al. 2021).

Verena Rieser is a professor at Heriot-Watt University in Edinburgh, where she leads research on Natural Language Generation and Spoken Dialogue Systems. She is also a co-founder of the Conversational AI company Alana. Verena was recently awarded a Leverhulme Trust Senior Research Fellowship by the Royal Society, and she is PI of several publicly funded research projects and industry awards. Verena’s team entered the prestigious Amazon Alexa Challenge two years in a row, reaching the final three both times. Her research has featured in the BBC's documentary The Joy of AI, BBC’s Tomorrow's World, and in national and international news. Verena’s current research interests include ethics for open-domain conversational systems, data-to-text and text-to-text generation, as well as multimodal dialogue.