NLPhD Speaker Series
This is an NLP speaker series organized by PhD students and featuring PhD students. The talks are open to everyone!
The talks are held via MS Teams. You can join via this link. If you don't have a Saarland University email address, you are still very welcome, but you might encounter issues with joining. Please contact the organizers and we will add you.
Mor Geva (Tel Aviv University)
Date/Time: 14 September at 14:30 (CET)
Title: Down the Rabbit Hole: Debugging the Inner Workings of Transformer Models
The NLP field is dominated by transformer-based language models that are pre-trained on huge amounts of text. A large body of research has focused on understanding what knowledge and language features are encoded in the representations of these models. However, very little is known about how the models reason over the knowledge they encode. In this talk, I'll share two recent findings on the inner workings of transformer models and show how they can be harnessed for interpretability of model predictions. First, we will analyze one of the fundamental yet underexplored components in transformers, the feed-forward layers, and show their role in constructing predictions. Then, I will describe an interesting phenomenon of emergent behaviour in multi-task transformer models, which can be harnessed for interpretability and extrapolation of skills.
Maria Ryskina (CMU)
Date/Time: 22 June at 14:30 (CET)
Title: Unsupervised decipherment of informal romanization
Informal romanization is an idiosyncratic way of typing non-Latin-script languages in the Latin alphabet, commonly used in online communication. Although the character substitution choices vary between users, they are typically grounded in shared notions of visual and phonetic similarity between characters. In this talk, I will focus on the task of converting such romanized text into its native orthography for Russian, Egyptian Arabic, and Kannada, showing how a similarity-encoding inductive bias helps in the absence of parallel data. I'll also share some insights into the behaviors of the unsupervised finite-state and seq2seq models for this task and discuss how combining them can leverage their different strengths.
Jean-Baptiste Cordonnier (EPFL)
Date/Time: 01 June at 14:00 (CET)
Title: Transformers for Vision
The Transformer architecture has become the de facto standard for natural language processing tasks. In vision, recent trends of incorporating attention mechanisms have led researchers to reconsider the supremacy of convolutional layers as the primary building block. In this talk, we will explore the continuum of architectures mixing convolution and attention. I will share what is specific to the image domain and what lessons could be transferred to NLP.
Ben Peters (Instituto de Telecomunicações)
Date/Time: 04 May at 14:30 (CET)
Title: Down with softmax: Sparse sequence-to-sequence models
Sequence-to-sequence models are everywhere in NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. One alternative is sparse sequence-to-sequence models, which replace softmax with a sparse function from the 𝛼-entmax family. In this talk, we show how these models can shrink the search space by assigning zero probability to bad hypotheses, reducing length bias and sometimes enabling exact decoding.
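To make the idea concrete, here is a minimal numpy sketch of sparsemax, the α = 2 member of the α-entmax family mentioned in the abstract: unlike softmax, it can assign exactly zero probability to low-scoring outputs. This is an illustrative implementation of the standard simplex-projection algorithm, not the speaker's code.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.

    Unlike softmax, the result is sparse: entries below a data-dependent
    threshold tau receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in decreasing order
    cssv = np.cumsum(z_sorted) - 1.0     # cumulative sums shifted by 1
    k = np.arange(1, len(z) + 1)
    support = k * z_sorted > cssv        # which entries stay nonzero
    k_max = k[support][-1]               # size of the support
    tau = cssv[k_max - 1] / k_max        # threshold
    return np.maximum(z - tau, 0.0)

p = sparsemax([1.5, 1.0, -1.0])
# -> [0.75, 0.25, 0.0]: the implausible third option gets zero mass,
# whereas softmax would still give it a strictly positive probability.
```

In a sparse seq2seq model, the same transformation replaces softmax in the attention weights and the output layer, which is what lets the model prune hypotheses with exactly zero probability during search.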
Lieke Gelderloos (Tilburg University)
Date/Time: 13 April at 14:30 (CET)
Title: Active word learning through self-supervision
Models of cross-situational word learning typically characterize the learner as a passive observer. However, a language learning child can actively participate in verbal and non-verbal communication. In a computational study of cross-situational word learning, we investigate whether a curious word learner which actively selects input has an advantage over a learner which has no influence over the input it receives. We present a computational model that learns to map words to objects in images through word comprehension and production. The productive and receptive parts of the model can operate independently, but can also feed into each other. This introspective quality enables the model to learn through self-supervision, and also to estimate its own word knowledge, which is the basis for curious selection of input. We examine different curiosity metrics for input selection, and analyze the impact of each method on the learning trajectory. A formulation of curiosity which relies both on subjective novelty and plasticity yields faster learning, robust convergence, and best eventual performance.
Brielen Madureira (University of Potsdam)
Date/Time: 23 March at 15:00 (CET)
Title: Incremental Processing in the Age of non-Incremental Encoders
While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). In this talk, I will present the results of an investigation into how they behave under incremental interfaces, where partial output must be provided based on the partial input seen up to a certain time step, as may happen in interactive systems. We tested five models on various NLU datasets and compared their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality.
Shauli Ravfogel (Bar-Ilan University)
Date/Time: 26 January at 14:30 (CET)
Title: Identifying and manipulating concept subspaces
While LM representations are highly nonlinear in the input text, the probing literature has demonstrated that linear classifiers can recover various human-interpretable concepts from the representations, from notions of gender to part-of-speech. I will present Iterative Nullspace Projection (INLP), a method to identify subspaces within the representation that correspond to arbitrary concepts. The method is data-driven and identifies those subspaces by training multiple orthogonal classifiers to predict the concept in focus. I will give an overview of some of our recent work, which demonstrates the utility of these concept subspaces for different goals: mitigating social bias in static and contextualized embeddings; assessing the influence of concepts on the model's behavior; and identifying syntactic features that explain the internal organization of the representation space with respect to specific phenomena, such as relative clause structures.
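The core loop of INLP can be sketched in a few lines of numpy. This is a toy illustration, not the speaker's implementation: where INLP trains a linear classifier at each step, the sketch uses the class-mean difference as the concept direction, then projects the representations onto that direction's nullspace and repeats.

```python
import numpy as np

def nullspace_projection(w):
    """Projection matrix onto the nullspace of direction w: I - ww^T / ||w||^2."""
    w = w / np.linalg.norm(w)
    return np.eye(len(w)) - np.outer(w, w)

def inlp_sketch(X, y, n_iters=3):
    """Iteratively remove linearly encoded concept information from X.

    Each round estimates a concept direction (here: the difference of the
    two class means, standing in for a trained linear classifier) and
    composes a projection that zeroes it out.
    """
    P = np.eye(X.shape[1])
    for _ in range(n_iters):
        Xp = X @ P
        w = Xp[y == 1].mean(axis=0) - Xp[y == 0].mean(axis=0)
        if np.linalg.norm(w) < 1e-8:   # no linear signal left to remove
            break
        P = P @ nullspace_projection(w)
    return P

# Toy "representations" whose features correlate with a binary concept y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 16)) + 2.0 * y[:, None]

P = inlp_sketch(X, y)
Xp = X @ P
# After projection the class means coincide, so no linear classifier on the
# mean-difference direction can separate the concept any more.
gap = np.linalg.norm(Xp[y == 1].mean(axis=0) - Xp[y == 0].mean(axis=0))
```

The composed projection `P` is what makes the method iterative: after each round, any remaining linear predictor must be orthogonal to the directions already removed.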
Shauli will be available for meetings after the talk. Please reach out to Marius (see organizers) for scheduling meetings.