Deep Learning Inside Out (DeeLIO)

Knowledge Extraction and Integration for Deep Learning Architectures


Workshop@ACL 2022 - May 27, 2022

Keynote Speakers

Tal Linzen

New York University



Talk at 9:30am Dublin time

Title:


Beyond Probing Classifiers: Deconstructing the Function and Structure of Vector Representations


Abstract:


The success of artificial neural networks in language processing tasks has underscored the need to understand how they accomplish their behavior, and, in particular, how that behavior is supported by their internal vector representations. Probing classifiers, which are often used to address this question, rely on a strong, and arguably questionable, assumption: if a classifier can decode a particular piece of information from a vector, we can conclude that that piece of information affects the model's behavior. My talk will present two methods that go beyond this paradigm. The first method, AlterRep, modifies a contextualized representation such that it encodes a particular value of a linguistic feature (e.g., whether the word occurs inside an embedded clause), and examines how that perturbation affects the model's behavior. Using this method, I will show that syntactic information can be decoded from all layers of BERT, but only plays a functional role in a subset of them (the middle layers). The second, DISCOVER, aims to identify linear compositional structure in the vector: it tests the hypothesis that the vector can be fully decomposed into a sum of symbol-representing vectors (filler-role bindings). This hypothesis is borne out to a remarkable extent for networks trained on highly compositional tasks, and to a lesser extent for BERT's embeddings.
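
To make the first method concrete, here is a minimal NumPy sketch of an AlterRep-style intervention. It is an illustrative reconstruction of the idea rather than the talk's actual implementation: the hypothetical vector `probe_w` stands in for a linear probe trained to decode a binary syntactic feature from hidden states, the feature component is projected out of a representation, and a counterfactual value is re-inserted.

```python
import numpy as np

rng = np.random.default_rng(0)

def alter_representation(vec, probe_w, target_sign, alpha=1.0):
    """Project the probe direction out of `vec`, then re-insert it with a
    counterfactual value, so the altered vector encodes the opposite feature."""
    w = probe_w / np.linalg.norm(probe_w)       # unit normal of the probe's boundary
    vec_without_feature = vec - (vec @ w) * w   # remove the decoded feature component
    return vec_without_feature + target_sign * alpha * w

hidden = rng.normal(size=768)    # stand-in for a BERT hidden state (hypothetical)
probe_w = rng.normal(size=768)   # stand-in for a trained probe's weight vector (hypothetical)
altered = alter_representation(hidden, probe_w, target_sign=-1.0)
# In the method described above, the altered vector would be passed back through
# the upper layers and the change in the model's predictions would be measured.
```

The second method can be sketched in a similarly simplified way: given assumed role and filler annotations for a set of toy vectors, fit one embedding per (role, filler) binding by least squares and measure how much of each vector the additive filler-role decomposition explains. Low reconstruction error would indicate the kind of linear compositional structure the talk describes; the toy data below is random, so the error will be high.

```python
import numpy as np

rng = np.random.default_rng(1)
n_seqs, n_roles, n_fillers, dim = 500, 4, 20, 64
fillers = rng.integers(0, n_fillers, size=(n_seqs, n_roles))  # toy filler annotations
targets = rng.normal(size=(n_seqs, dim))                      # toy vectors to be explained

# Design matrix with one indicator column per (role, filler) binding.
X = np.zeros((n_seqs, n_roles * n_fillers))
for r in range(n_roles):
    X[np.arange(n_seqs), r * n_fillers + fillers[:, r]] = 1.0

coef, *_ = np.linalg.lstsq(X, targets, rcond=None)            # one vector per binding
rel_error = np.linalg.norm(targets - X @ coef) / np.linalg.norm(targets)
print(f"relative reconstruction error: {rel_error:.3f}")      # low error => additive structure
```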




Yejin Choi

Allen Institute for AI (AI2) and University of Washington


Talk at 3:30pm Dublin time

Title:


Knowledge is Power: Symbolic Knowledge Distillation and Commonsense Morality


Abstract:


Scale appears to be the winning recipe in today's AI leaderboards. And yet, extreme-scale neural models are still brittle, making errors that are often nonsensical and even counterintuitive. In this talk, I will argue for the importance of knowledge, especially commonsense knowledge, and demonstrate how smaller models developed in academia can still have an edge over larger industry-scale models when powered with knowledge.


First, I will introduce "symbolic knowledge distillation", a new framework for distilling larger neural language models into smaller commonsense models, which leads to a machine-authored knowledge base (KB) that, for the first time, outperforms a human-authored KB on all criteria: scale, accuracy, and diversity. Next, I will present an experimental conceptual framework toward computational social norms and commonsense morality, so that neural language models can learn to reason that “helping a friend” is generally a good thing to do, but “helping a friend spread fake news” is not.
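
As a rough illustration of the distillation loop described above (not the talk's actual code, models, or prompts), the sketch below over-generates candidate commonsense inferences with a placeholder "teacher" and keeps only those a placeholder "critic" accepts. In the real framework the teacher would be a large prompted language model, the critic a classifier trained on human acceptability judgments, and a smaller student model would then be trained on the surviving triples; the stub functions and relation names here are purely illustrative.

```python
def teacher_generate(event: str, relation: str, n: int = 3) -> list[str]:
    """Placeholder teacher: the real framework prompts a large LM with few-shot
    examples; this stub just returns canned candidate inferences."""
    return [f"{relation} inference {i} for '{event}'" for i in range(n)]

def critic_accepts(event: str, relation: str, inference: str) -> bool:
    """Placeholder critic: the real framework uses a classifier trained on human
    acceptability judgments; this stub keeps an arbitrary subset."""
    return hash((event, relation, inference)) % 2 == 0

def distill_kb(events: list[str], relations: list[str]) -> list[tuple[str, str, str]]:
    """Over-generate candidate triples, filter them with the critic, and return
    the machine-authored KB a smaller student model would be fine-tuned on."""
    kb = []
    for event in events:
        for relation in relations:
            for inference in teacher_generate(event, relation):
                if critic_accepts(event, relation, inference):
                    kb.append((event, relation, inference))
    return kb

print(distill_kb(["X helps a friend move"], ["xIntent", "xEffect"]))
```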




Allyson Ettinger

University of Chicago



Talk at 4:40pm Dublin time

Title:


"Understanding" and prediction: Controlled examinations of meaning sensitivity in pre-trained models


Abstract:


In recent years, pre-trained models in NLP have driven seemingly incredible progress, with models even surpassing human performance on many benchmarks. How should we interpret these advances? Have these models achieved language "understanding"? Operating on the premise that "understanding" will necessarily involve the capacity to extract and deploy meaning information, in this talk I will discuss a series of projects leveraging targeted tests to examine pre-trained models' ability to capture meaning in a systematic fashion. I will first discuss work probing model representations for compositional meaning, with a particular focus on disentangling compositional information from encoding of lexical properties. I'll then explore models' ability to extract and use meaning information when executing the basic pre-training task of word prediction in context. In all cases, these investigations apply tests that prioritize control of unwanted cues, so as to target the desired model capabilities with greater precision. The results of these studies suggest that although models show a good deal of sensitivity to word-level information, and to certain semantic and syntactic distinctions, when subjected to controlled tests these models show little sign of representing higher-level compositional meaning, or of being able to retain and deploy such information robustly during word prediction. Instead, models show signs of heuristic predictive strategies that are unsurprising given their training, but that differ critically from systematic understanding of meaning. I will discuss potential implications of these findings with respect to the goals of achieving "understanding" with currently dominant pre-training paradigms.
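
As a small, hypothetical example of the kind of controlled test described above (the stimuli and candidate completions are invented for illustration, and the snippet assumes the Hugging Face `transformers` package and a downloadable `bert-base-uncased` checkpoint): the two contexts share nearly all of their words but reverse the role assignments, so a model relying only on lexical cues would score the candidates similarly in both, while a model tracking who-did-what-to-whom should not.

```python
from transformers import pipeline  # assumes the `transformers` package is installed

fill = pipeline("fill-mask", model="bert-base-uncased")

# Role-reversed minimal pair: same content words, different "who did what to whom".
contexts = [
    "The customer that the waiter served ordered the [MASK].",
    "The waiter that the customer served ordered the [MASK].",
]

for ctx in contexts:
    # Score a fixed set of candidate completions (invented for this illustration).
    for pred in fill(ctx, targets=["food", "bill"]):
        print(f"{ctx}  ->  {pred['token_str']}: {pred['score']:.4f}")
```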