NLPhD Speaker Series
This is an NLP speaker series organized by PhD students and featuring PhD students. The talks are open to everyone!
The talks are held via MS Teams. You can join via this link. If you don't have a Saarland University email address, you are still very welcome, but you might encounter issues with joining. Please contact the organizers and we will add you.
Katerina Margatina (University of Sheffield)
Date/Time: 12 April at 14:30 (CET)
Title: Data Efficient NLP with Active Learning
Recent Active Learning (AL) approaches in Natural Language Processing (NLP) have proposed using off-the-shelf pretrained language models (LMs). In this work, we argue that these LMs are not adapted effectively to the downstream task in the low-resource data setting of AL, and we explore ways to address this issue. Furthermore, we also focus on the data acquisition step of the AL pipeline, aiming to investigate methods based on uncertainty and diversity sampling. Leveraging the best of both worlds, we propose an acquisition function, CAL, that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet for which the model outputs maximally different predictive likelihoods. We show that both our contributions, in the model training and data acquisition parts of the iterative AL loop, provide large improvements over the baselines in a variety of natural language understanding tasks.
Rabeeh Karimi Mahabadi (EPFL and Idiap Research Institute)
Date/Time: 08 March at 14:30 (CET)
Title: Prompt-free and Efficient Language Model Fine-Tuning
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task, to convert examples into a cloze format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by factors of roughly 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. We will release our code publicly to facilitate future work.
Mostafa Abdou (University of Copenhagen (CoASTaL NLP))
Date/Time: 08 February at 14:30 (CET)
Title: Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge bases, e.g. (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically, and the extent to which models implicitly reflect topological structure that is grounded in the world, such as perceptual structure, is unknown. To explore this question, we conduct a thorough case study on color. Namely, we employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric. Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence. Analyzing the differences in alignment across the color spectrum, we find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones, suggesting an intriguing connection to findings from recent work on efficient communication in color naming. Further analysis suggests that differences in alignment are, in part, mediated by collocationality and differences in syntactic usage, posing questions as to the relationship between color perception and usage and context.
Elizabeth Salesky (Center for Language and Speech Processing, Johns Hopkins University)
Date/Time: 25 January at 14:30 (CET)
Title: Looking beyond Unicode for Open-Vocabulary Text Representations
Models of text typically have finite vocabularies, and commonly use subword segmentation techniques to achieve an 'open vocabulary'. This approach relies on consistent and correct underlying Unicode sequences, and experiences degradation when presented with common types of noise, variation, and rare forms. In this talk, I will discuss an alternate approach in which we render text as images and learn open-vocabulary representations along with the downstream task, which in our work is machine translation. We show that models using visual text representations approach or match the performance of traditional Unicode-based models for several language pairs and scripts, with significantly greater robustness. I will also discuss several open questions and avenues for future work.
William N. Havard (LSCP laboratory at ENS Ulm)
Date/Time: 14 December at 14:30 (CET)
Title: Lexical Emergence from Context: Exploring the Role of Attention
In recent years, deep learning methods have allowed the creation of neural models that are able to process several modalities at once. Neural models of Visually Grounded Speech (VGS) are one such kind of model and are able to jointly process a spoken input and a matching visual input. Such models have sparked interest among linguists and cognitive scientists as they are able to model complex interactions between two modalities (speech and vision) and can be used to simulate child language acquisition and, more specifically, lexical acquisition. In this talk, I present how an RNN-based model of VGS uses its attention mechanisms to highlight key segments in the speech signal, and how this compares to child language acquisition. In order to gain a better understanding of the learning patterns of VGS models, I present the results of experiments conducted on two typologically different languages, English and Japanese.
Ratish Puduppully (University of Edinburgh)
Date/Time: 23 November at 14:30 (CET)
Title: Data-to-text Generation with Neural Planning
Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate fluent (but often imprecise) text and perform quite poorly at selecting appropriate content and ordering it coherently. In this talk, I will discuss my attempts at overcoming these issues by integrating content planning with neural models. I will first present work on integrating fine-grained or micro plans with data-to-text generation. Such micro plans can take the form of a sequence of records highlighting which information should be mentioned and in which order. I will then discuss how coarse-grained or macro plans can be beneficial for data-to-text generation. Macro plans represent the high-level organization of important content such as entities, events, and their interactions; they are learnt from data and given as input to the generator. In conclusion, I show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents, and reduces redundancy in the output text.
Ece Takmaz (University of Amsterdam)
Date/Time: 27 October at 14:30 (CET)
Title: Generating image descriptions guided by sequential speaker-specific human gaze
When we describe an image, certain visual and linguistic processes take place, acting in concert with each other. For instance, we tend to look at an object just before mentioning it, but this may not always be the case. Inspired by these processes, we have developed the first models of image description generation informed by the sequential cross-modal alignment between language and human gaze. We build our models on a state-of-the-art image captioning model, which itself was inspired by the visual processes in humans. Our results show that aligning gaze with language production would help generate more diverse and more natural descriptions that are sequentially and semantically more similar to human descriptions. In addition, such findings could also help us shed light on human cognitive processes by comparing different ways of encoding the gaze modality and aligning it with language production.
Mor Geva (Tel Aviv University)
Date/Time: 14 September at 14:30 (CET)
Title: Down the Rabbit Hole: Debugging the Inner Workings of Transformer Models
The NLP field is dominated by transformer-based language models that are pre-trained on huge amounts of text. A large body of research has focused on understanding what knowledge and language features are encoded in the representations of these models. However, very little is known about how the models reason over the knowledge they encode. In this talk, I'll share two recent findings on the inner workings of transformer models, and show how they can be harnessed for interpretability of model predictions. First, we will analyze one of the fundamental yet underexplored components in transformers, the feed-forward layers, and show their role in constructing predictions. Then, I will describe an interesting phenomenon of emergent behaviour in multi-task transformer models that can be harnessed for interpretability and extrapolation of skills.
Maria Ryskina (CMU)
Date/Time: 22 June at 14:30 (CET)
Title: Unsupervised decipherment of informal romanization
Informal romanization is an idiosyncratic way of typing non-Latin-script languages in the Latin alphabet, commonly used in online communication. Although the character substitution choices vary between users, they are typically grounded in shared notions of visual and phonetic similarity between characters. In this talk, I will focus on the task of converting such romanized text into its native orthography for Russian, Egyptian Arabic, and Kannada, showing how similarity-encoding inductive bias helps in the absence of parallel data. I'll also share some insights into the behaviors of the unsupervised finite-state and seq2seq models for this task and discuss how their combinations can leverage their different strengths.
Jean-Baptiste Cordonnier (EPFL)
Date/Time: 01 June at 14:00 (CET)
Title: Transformers for Vision
The Transformer architecture has become the de facto standard for natural language processing tasks. In vision, recent trends of incorporating attention mechanisms have led researchers to reconsider the supremacy of convolutional layers as a primary building block. In this talk, we will explore the continuum of architectures mixing convolution and attention. I will share what is specific to the image domain and what lessons could be transferred to NLP.
Ben Peters (Instituto de Telecomunicações)
Date/Time: 04 May at 14:30 (CET)
Title: Down with softmax: Sparse sequence-to-sequence models
Sequence-to-sequence models are everywhere in NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. One alternative is sparse sequence-to-sequence models, which replace softmax with a sparse function from the 𝛼-entmax family. In this talk, we show how these models can shrink the search space by assigning zero probability to bad hypotheses, reducing length bias and sometimes enabling exact decoding.
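As a rough illustration of the contrast drawn in the abstract, the sketch below compares softmax with sparsemax, which is the 𝛼 = 2 member of the 𝛼-entmax family. It is a minimal pure-Python sketch rather than the speaker's implementation; the function names and the example scores are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Dense baseline: every output probability is strictly positive."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    This is alpha-entmax with alpha = 2; low-scoring entries receive
    exactly zero probability.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The k-th largest score stays in the support while 1 + k*z > cumsum.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k  # threshold for a support of size k
        else:
            break
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical scores for three candidate outputs.
scores = [1.0, 0.5, -1.0]
dense = softmax(scores)
sparse = sparsemax(scores)
print(dense)
print(sparse)
```

On these example scores, softmax gives the third candidate a small but nonzero probability, whereas sparsemax assigns it exactly zero, which is the property that lets sparse models prune hypotheses from the search space.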
Lieke Gelderloos (Tilburg University)
Date/Time: 13 April at 14:30 (CET)
Title: Active word learning through self-supervision
Models of cross-situational word learning typically characterize the learner as a passive observer. However, a language-learning child can actively participate in verbal and non-verbal communication. In a computational study of cross-situational word learning, we investigate whether a curious word learner which actively selects input has an advantage over a learner which has no influence over the input it receives. We present a computational model that learns to map words to objects in images through word comprehension and production. The productive and receptive parts of the model can operate independently, but can also feed into each other. This introspective quality enables the model to learn through self-supervision, and also to estimate its own word knowledge, which is the basis for curious selection of input. We examine different curiosity metrics for input selection, and analyze the impact of each method on the learning trajectory. A formulation of curiosity which relies on both subjective novelty and plasticity yields faster learning, robust convergence, and the best eventual performance.
Brielen Madureira (University of Potsdam)
Date/Time: 23 March at 15:00 (CET)
Title: Incremental Processing in the Age of non-Incremental Encoders
While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). In this talk, I will present the results of an investigation into how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We tested five models on various NLU datasets and compared their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality.
Shauli Ravfogel (Bar-Ilan University)
Date/Time: 26 January at 14:30 (CET)
Title: Identifying and manipulating concept subspaces
While LM representations are highly nonlinear in the input text, the probing literature has demonstrated that linear classifiers can recover various human-interpretable concepts from the representations, from notions of gender to part-of-speech. I will present Iterative Nullspace Projection (INLP), a method to identify subspaces within the representation that correspond to arbitrary concepts. The method is data-driven and identifies those subspaces by training multiple orthogonal classifiers to predict the concept in focus. I will give an overview of some of our recent work, which demonstrates the utility of these concept subspaces for different goals: mitigating social bias in static and contextualized embeddings; assessing the influence of concepts on the model's behavior; and identifying syntactic features that explain the internal organization of representation space with respect to specific phenomena, such as relative clause structures.
Shauli will be available for meetings after the talk. Please reach out to Marius (see organizers) for scheduling meetings.