2nd Deep Learning Inside Out (DeeLIO)

Knowledge Extraction and Integration for

Deep Learning Architectures


Workshop@NAACL-HLT 2021

The second DeeLIO workshop was held online on June 10, 2021, in conjunction with the NAACL-HLT conference.


The DeeLIO 2021 proceedings are available on the ACL Anthology.



We had three amazing keynote speakers at DeeLIO 2021.

  • Vered Shwartz (Allen Institute for AI (AI2) and University of Washington)


Title: A bookworm barely graduating from the University of Life: On Language Models and Commonsense Knowledge


Abstract: Pre-trained language models are used by deep learning models not only as a representation layer but also as a source of world knowledge. While they capture factual knowledge memorized from their training data, such as Dante's birthplace, how well do they capture commonsense knowledge, such as that it's unwise to tell a talking parrot your secrets?


In this talk I will present methods for extracting commonsense knowledge from language models. I will then discuss the limitations of relying on language models as a source of commonsense knowledge. Finally, we will look into methods for incorporating knowledge from external sources into language models.




  • Sebastian Riedel (Facebook AI Research and University College London)

Title: Probably Asked Questions and Parametric vs Non-Parametric Knowledge


Abstract: In this talk I look at factual knowledge as a function from questions to answers, and frame knowledge-intensive tasks as distributions over “probably asked questions.” I will consider two paradigms: parametric models, which approximate the above function by optimising a fixed number of parameters using key/value pairs as a training set; and non-parametric models, which memorise key/value pairs explicitly. I show that parametric models are promising solutions when based on pre-trained LMs, but remain relatively poor approximators compared to their non-parametric counterparts. I will also illustrate that traditional knowledge bases/graphs can be seen as non-parametric models optimised for very particular “probable question” distributions, with additional “training pairs” generated/curated in advance. We translate this paradigm to modern open-domain QA question distributions by synthetically generating PAQ, a dataset of 60M+ likely question/answer pairs, and introducing RePAQ, a non-parametric model we train with PAQ. RePAQ enables us to readily build systems that are either very fast (1000 q/s) and quite accurate, very small and quite accurate (winning two NeurIPS competition tracks for their minimal memory footprint), or very accurate and still quite fast (more accurate and 2x faster than a SOTA model). Critically, RePAQ is always good at knowing what it doesn’t know.


  • Lena Voita (University of Edinburgh and University of Amsterdam)


Title: Neural Machine Translation Inside Out (Blog post)


Abstract: In the last decade, machine translation shifted from traditional statistical approaches (SMT) to end-to-end neural ones (NMT). While traditional approaches split the translation task into several components and use various hand-crafted features, NMT learns the translation task directly from data, without splitting it into subtasks. The main question of this talk is how NMT manages to do this, and I will try to answer it keeping the traditional paradigm in mind. First, I will show that NMT components can take on roles corresponding to the features modelled explicitly in SMT. Then I will explain how NMT balances two different types of context: the source sentence and the prefix of the target sentence. Finally, we will see that NMT training consists of stages in which it focuses on competences mirroring three core SMT components: target-side language modeling, lexical translation, and reordering.