The gap between what language models say and what they know
Abstract: Alignment efforts to make large language models (LLMs) trustworthy and safe are often easy to bypass, as it is possible to steer models away from their safe behavior to generate biased, harmful, or incorrect information. This raises the question of what information LLMs capture in their hidden representations versus in the text they generate. In the first part of the talk, we will show that it is possible to estimate how knowledgeable a model is about a given subject from its hidden representations alone, using a simple, lightweight probe called KEEN. While KEEN correlates with model factuality, question-answering performance, and hedging behavior, analyzing its scores after model fine-tuning reveals a gap between the model’s inner knowledge and the knowledge it expresses in its outputs. Next, we will consider the problem of unlearning and leverage “parametric knowledge traces” for evaluation. We will see that while existing unlearning methods succeed at standard behavioral evaluations, they fail to erase the concept from the model parameters and instead suppress its generation during inference.
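For readers unfamiliar with probing hidden representations, the sketch below illustrates the general idea of estimating a model's knowledge about a subject from its internal states. The model name, pooling choice (last token, final layer), and ridge regressor are illustrative assumptions, not the actual KEEN implementation described in the talk.

```python
# Minimal sketch: predict per-subject "knowledgeability" from hidden states.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

model_name = "gpt2"  # stand-in; any LM exposing hidden states would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def subject_representation(subject: str) -> torch.Tensor:
    """Hidden state of the subject's last token at the final layer."""
    inputs = tokenizer(subject, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

# Hypothetical supervision: subjects paired with the fraction of questions
# about them that the model answers correctly (values here are made up).
subjects = ["Marie Curie", "Tel Aviv", "photosynthesis"]
qa_accuracy = [0.8, 0.6, 0.4]

X = torch.stack([subject_representation(s) for s in subjects]).numpy()
probe = Ridge(alpha=1.0).fit(X, qa_accuracy)

# The probe's score for a new subject estimates how much the model "knows"
# about it, using only its hidden representation.
print(probe.predict(subject_representation("Alan Turing").numpy()[None, :]))
```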
Bio: Mor Geva is an Assistant Professor (Senior Lecturer) at the School of Computer Science at Tel Aviv University and a Visiting Researcher at Google. Her research focuses on understanding the inner workings of large language models, to increase their transparency and efficiency, control their operation, and improve their reasoning abilities. Mor completed a Ph.D. in Computer Science and a B.Sc. in Bioinformatics at Tel Aviv University, and was a postdoctoral researcher at Google DeepMind and the Allen Institute for AI. She was nominated as one of the MIT Rising Stars in EECS and is a laureate of the Séphora Berrebi Scholarship in Computer Science. She was awarded the Dan David Prize for graduate students in the field of AI and received an Outstanding Paper Award at EACL 2023.
Generalisation in LLMs – and beyond
Abstract: "Good generalisation" is often mentioned as a desirable property for NLP models. For LLMs, in the light of the sheer training corpora, among other things, it becomes more and more challenging to understand if our models generalise, and how important that still is. In this presentation, I briefly discuss generalisation in NLP on a higher level, and then move on to discussing it specifically for LLMs. What types of generalisation are still important, how would we evaluate it, and is it possible to evaluate it independently from the training corpus? I will – hopefully – answer some of your questions, but also raise a lot more!
Bio: Dieuwke Hupkes is a research scientist at Meta. Among other things, she works on better understanding how (large) language models generalise, what they (don't) understand and what that even means, and more generally on how they can reasonably be evaluated. She is excited about the new opportunities such models bring us and the new scientific challenges that go hand in hand with that.
Latent Space Exploration for Safe and Trustworthy AI
Abstract: Recent advances in the performance of deep neural network models have substantially increased their dissemination across vast application areas. Given this widespread adoption, ensuring the safety and trustworthiness of AI models is more critical than ever. A standard way of assessing AI model performance is extrinsic evaluation on a set of downstream tasks. While effective for advancing the state of the art, these evaluations offer limited insight into how models learn and solve a task. In this talk, I advocate for a deeper exploration of model internals, particularly their latent space, to fully test their capabilities, build better models, and increase trust in them. I will present a few use cases to support this stance. For instance, the intrinsic dimensionality trend of models explains the robustness-generalization tradeoff during adversarial training, informing the design of robust and scalable adversarial methods that do not compromise generalization. Moreover, studying the structure and representation of knowledge within the latent space is effective for evaluating the language comprehension capabilities of models and enables interpretation of their predictions.
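As a pointer to what measuring intrinsic dimensionality of a latent space can look like, here is a small sketch using the TwoNN estimator on a synthetic set of representations. The choice of estimator and the synthetic data are assumptions for illustration; the talk does not prescribe a specific method.

```python
# Minimal sketch: TwoNN maximum-likelihood intrinsic dimension estimate
# from the ratio of each point's two nearest-neighbor distances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(X: np.ndarray) -> float:
    nn = NearestNeighbors(n_neighbors=3).fit(X)
    dist, _ = nn.kneighbors(X)          # column 0 is the point itself
    r1, r2 = dist[:, 1], dist[:, 2]     # first and second nearest neighbors
    mu = r2 / np.clip(r1, 1e-12, None)
    return len(X) / np.sum(np.log(mu))

# Synthetic stand-in for model activations: a 5-dimensional latent structure
# linearly embedded in a 768-dimensional space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
embedding = latent @ rng.normal(size=(5, 768))
print(twonn_intrinsic_dimension(embedding))  # should come out close to 5
```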
Bio: Hassan Sajjad is an Associate Professor in the Faculty of Computer Science at Dalhousie University, Canada, and the director of the HyperMatrix lab. His research focuses on natural language processing (NLP) and safe and trustworthy AI, particularly text generation, robustness, generalization, alignment, interpretation, and explainability of NLP models. His work has been recognized at several prestigious venues such as NeurIPS, ICLR, and ACL, and has been featured in prominent outlets including MIT News.
Efficiency as an Inductive Bias: Towards Tokenizer-free and Dynamically Sparse Language Models
Abstract: Efficiency in language models is often hailed as a way to democratise access to AI technology and to make it more environmentally sustainable. In this talk, I emphasise an additional and sometimes neglected advantage of efficiency: namely, providing an inductive bias for language use and acquisition. Firstly, I will explore how dynamically compressing token representations and/or the key-value cache in Transformer LLMs boosts memory and time efficiency. This process also discovers abstractions from raw data and results in tokenizer-free models. Secondly, I will demonstrate how fine-tuning subnetworks in LLMs allows for adapting them with limited memory and parameter budgets. Learning how to route information through such neural pathways also leads to better generalisation to new tasks.
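To make the subnetwork idea concrete, the sketch below fine-tunes only a small fraction of a model's weights under a fixed parameter budget. Selecting the subnetwork by weight magnitude, the toy model, and the 5% budget are illustrative assumptions; the routing and selection strategies discussed in the talk may differ.

```python
# Minimal sketch: update only a sparse subnetwork during fine-tuning.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
budget = 0.05  # train only 5% of the weights in each parameter tensor

# Binary mask per parameter tensor: keep the largest-magnitude weights.
masks = {}
for name, p in model.named_parameters():
    k = max(1, int(budget * p.numel()))
    threshold = p.detach().abs().flatten().topk(k).values.min()
    masks[name] = (p.detach().abs() >= threshold).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))  # toy batch

for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Zero out gradients outside the subnetwork so only masked weights update.
    for name, p in model.named_parameters():
        p.grad.mul_(masks[name])
    optimizer.step()
```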
Bio: Edoardo M. Ponti is an assistant professor in natural language processing at the University of Edinburgh and a visiting professor at NVIDIA. Previously, he was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow at Mila and McGill University in Montreal. In 2021, he obtained a PhD in computational linguistics from the University of Cambridge, St John’s College. His main research foci are modular deep learning, efficient neural architectures, and computational typology. His research earned him a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. He is a (terrible) violinist, football and tennis player, and an aspiring practitioner of heroic viticulture.