Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, and Goran Glavaš

Abstract: Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for task-oriented dialog. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DOMAINCC and DOMAINREDDIT – resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters – additional parameter-light layers in which we encode the domain knowledge. Our experiments with two prominent TOD tasks – dialog state tracking (DST) and response retrieval (RR) – encompassing five domains from the MULTIWOZ TOD benchmark demonstrate the effectiveness of our domain specialization approach. Moreover, we show that the lightweight adapter-based specialization (1) performs comparably to full fine-tuning in single-domain setups and (2) is particularly suitable for multi-domain specialization, in which, besides an advantageous computational footprint, it can offer better downstream performance.


Submitted: 15.10.2021

RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models

Soumya Barikeri, Anne Lauscher, Ivan Vulić, Goran Glavaš

Abstract: Text representation models are prone to exhibit a range of societal biases, reflecting the non-controlled and biased nature of the underlying pretraining data, which consequently leads to severe ethical issues and even bias amplification. Recent work has predominantly focused on measuring and mitigating bias in pretrained language models. Surprisingly, the landscape of bias measurements and mitigation resources and methods for conversational language models is still very scarce: it is limited to only a few types of bias, artificially constructed resources, and completely ignores the impact that debiasing methods may have on the final performance in dialog tasks, e.g., conversational response generation. In this work, we present RedditBias, the first conversational data set grounded in the actual human conversations from Reddit, allowing for bias measurement and mitigation across four important bias dimensions: gender, race, religion, and queerness. Further, we develop an evaluation framework which simultaneously 1) measures bias on the developed RedditBias resource, and 2) evaluates model capability in dialog tasks after model debiasing. We use the evaluation framework to benchmark the widely used conversational DialoGPT model along with the adaptations of four debiasing methods. Our results indicate that DialoGPT is biased with respect to religious groups and that some debiasing techniques can remove this bias while preserving downstream task performance.


Submitted: 07.06.2021

Accepted for ACL21

Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems

Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo M. Ponti, Anna Korhonen, Ivan Vulić

Abstract: In task-oriented dialogue (ToD), a user holds a conversation with an artificial agent to complete a concrete task. Although this technology represents one of the central objectives of AI and has been the focus of ever more intense research and development efforts, it is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of languages (e.g., English, Chinese). This work provides an extensive overview of existing methods and resources in multilingual ToD as an entry point to this exciting and emerging field. We find that the most critical factor preventing the creation of truly multilingual ToD systems is the lack of datasets in most languages for both training and evaluation. In fact, acquiring annotations or human feedback for each component of modular systems or for data-hungry end-to-end systems is expensive and tedious. Hence, state-of-the-art approaches to multilingual ToD mostly rely on (zero- or few-shot) cross-lingual transfer from resource-rich languages (almost exclusively English), either by means of machine translation or multilingual representations. These approaches are currently viable only for typologically similar languages and languages with parallel / monolingual corpora available. On the other hand, their effectiveness beyond these boundaries is doubtful or hard to assess due to the lack of linguistically diverse benchmarks (especially for natural language generation and end-to-end evaluation). To overcome this limitation, we draw parallels between components of the ToD pipeline and other NLP tasks, which can inspire solutions for learning in low-resource scenarios. Finally, we list additional challenges that multilinguality poses for related areas (such as speech and human-centred evaluation), and indicate future directions that hold promise to further expand language coverage and dialogue capabilities of current ToD systems.


Submitted: 03.06.2021

About the Research Project Partners

The consortium of the Multi2ConvAI research project consists of the University of Mannheim and two SMEs based in Karlsruhe, inovex GmbH and Neohelden GmbH. The three partners share their expertise within the project, hoping to learn and grow from the resulting synergies.


If you have questions or suggestions, we are available at any time.


The project "Multilingual and Cross-Domain Conversational AI" is supported by the state of Baden-Württemberg as a participant in the "KI-Innovationswettbewerb" (AI Innovation Competition). The funding aims to support AI value creation and AI application in small and medium-sized enterprises across industries. Joint research and development projects between research institutions and medium-sized companies are intended to lay the foundation for new and improved AI-based products and services.