WINTER 2024

Friday, 31 May, 2024

Thomas Bury

Recording : TBD



Deep learning for early warning signals of tipping points


Tipping points — abrupt changes in the state of a dynamical system — can occur in systems ranging from the Earth’s climate to the human heart, often with dire consequences. An abundance of research has focused on the development of early warning signals for tipping points based on generic properties of dynamical bifurcations, such as critical slowing down. In our work, we have trained a deep learning classifier to predict tipping points using a massive library of randomly generated dynamical systems. I will show how the classifier generalises to predicting tipping points in real ecological and cardiac systems, and does so with greater accuracy than conventional indicators. Finally, I will present recent work on using reinforcement learning to discover triggers for cardiac arrhythmia by interacting with a mathematical model for cardiac tissue. This talk will highlight the utility of combining deep learning with dynamical systems to better understand natural systems on which humanity relies.
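
As a concrete point of reference, the "conventional indicators" the classifier is compared against are typically rolling-window statistics that rise under critical slowing down. The sketch below is illustrative only (not the speaker's code; the window length and the toy drifting series are arbitrary choices) and computes the two most common ones, variance and lag-1 autocorrelation:

    import numpy as np
    import pandas as pd

    def conventional_ews(series, window=100):
        """Rolling variance and lag-1 autocorrelation, the classic
        early warning indicators that increase under critical slowing down."""
        x = pd.Series(series)
        variance = x.rolling(window).var()
        lag1_autocorr = x.rolling(window).apply(lambda w: w.autocorr(lag=1), raw=False)
        return variance, lag1_autocorr

    # Toy usage: a noisy series drifting towards a fold bifurcation
    t = np.linspace(0, 10, 2000)
    signal = np.sqrt(np.maximum(10 - t, 0.1)) + 0.1 * np.random.randn(t.size)
    var, ac1 = conventional_ews(signal)

A deep learning classifier of the kind described above takes the time series itself as input rather than such hand-crafted statistics.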

Friday, 17 May, 2024

Parisa Kordjamshidi

Recording : TBD



Compositional Reasoning for Natural Language Comprehension and Grounding Leveraging Neuro-Symbolic AI 


Recent research indicates that large language models lack consistent reliability in tasks requiring complex reasoning. While they may impress us with fluently written articles prompted by user input, they can easily disappoint us by displaying shortcomings in basic reasoning skills, such as understanding that "left" is the opposite of "right." To address real-world problems, computational models often need to involve multiple interdependent learners, along with significant levels of composition and reasoning based on additional knowledge beyond available data. In this talk, I will discuss our findings and novel models for compositional reasoning over complex linguistic structures, grounding language in visual perception, and combining multiple modalities of information. I will highlight our efforts in neuro-symbolic modeling to integrate explicit symbolic knowledge and enhance the compositional generalization of neural learning models. Additionally, I will introduce DomiKnowS, our library that facilitates neuro-symbolic modeling. The DomiKnowS framework exploits both symbolic and sub-symbolic representations to solve complex, AI-complete problems. It seamlessly integrates domain knowledge in the form of logical constraints into deep models through various underlying algorithms.
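
For readers unfamiliar with how logical constraints can be injected into deep models, the fragment below is a generic illustration of one common mechanism: relaxing a rule into a differentiable penalty added to the task loss. It deliberately does not use the DomiKnowS API (which is declarative); the function name, the t-norm relaxation, and the weighting factor are illustrative assumptions only:

    import torch

    def implication_penalty(p_antecedent, p_consequent):
        """Soft penalty for violating the rule A -> B under a product t-norm
        relaxation: violation grows when A is predicted true but B false."""
        return (p_antecedent * (1.0 - p_consequent)).mean()

    # Toy usage: add the constraint penalty to an ordinary task loss
    logits_a = torch.randn(8, requires_grad=True)
    logits_b = torch.randn(8, requires_grad=True)
    task_loss = torch.zeros(())          # stand-in for cross-entropy, etc.
    loss = task_loss + 0.1 * implication_penalty(torch.sigmoid(logits_a),
                                                 torch.sigmoid(logits_b))
    loss.backward()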

Friday, 3 May, 2024

Herve Lombaert

Recording : here


Spectral Geometry of Graphs and Learning - Examples on Brain Surfaces


How can we analyze shapes with complex geometry, such as the highly folded surface of the brain? This talk will show how spectral shape analysis can benefit general learning problems where data fundamentally lives on surfaces. We exploit spectral coordinates derived from the Laplacian eigenfunctions of shapes. Spectral coordinates have the advantage over Euclidean coordinates of being geometry-aware, invariant to isometric deformations, and of parameterizing surfaces explicitly. This change of paradigm, from Euclidean to spectral representations, enables a classifier to be applied *directly* on surface data, via spectral coordinates. Brain matching and learning of surface data will be shown as examples. The talk will focus, first, on the spectral representations of shapes, with an example on brain surface matching; second, on the basics of geometric deep learning; and finally, on the learning of surface data, with an example on automatic brain surface parcellation.
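
To make "spectral coordinates derived from the Laplacian eigenfunctions" concrete, here is a minimal sketch for a surface represented as a graph: the eigenvectors of the graph Laplacian associated with the smallest non-zero eigenvalues serve as the coordinates. The dense eigendecomposition and the toy cycle graph are for illustration only; real brain meshes would use mesh adjacency and sparse solvers.

    import numpy as np

    def spectral_coordinates(adjacency, k=3):
        """First k non-trivial eigenvectors of the graph Laplacian, used as
        geometry-aware coordinates that are invariant to isometric deformations."""
        A = np.asarray(adjacency, dtype=float)
        L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian
        eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
        return eigvecs[:, 1:k + 1]              # drop the constant eigenvector

    # Toy usage: a cycle graph standing in for a mesh adjacency matrix
    n = 50
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1
    A = A + A.T
    coords = spectral_coordinates(A, k=3)       # (n, 3) spectral embedding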




Friday, 26 April, 2024

Jack Stanley

Recording : here


LLM EXPLAINABILITY IN BIOMEDICINE


Large Language Models (LLMs) are now revolutionizing natural language processing of text and other sequential data. By enabling new regimes of transfer learning and task-specific fine-tuning, pre-trained LLMs have the potential to be especially useful for deriving insights in data-poor domains, such as healthcare and biomedicine. However, a major barrier hindering the adoption of LLMs as inference tools in biomedicine, and other areas of high-stakes decision making, is the seemingly impenetrable black-box nature of the models' internals. Ideally, we would be able to adequately identify the specific semantic elements present in the input that drive LLM decision making, so that we could draw mechanistic conclusions about biological systems and conclusions actionable in real-world settings. In this talk, we present a case study of our recent work attempting to chart avenues for understanding how LLMs operate internally. Specifically, we fine-tune a pre-trained LLM on free-form health reports written by experienced autism clinicians, with the goal of predicting autism diagnoses solely from these text reports. We show how architecture specialization, achieved by recasting the unit of investigation and inference, together with the incorporation of established external knowledge sources, can support mechanistic interpretability in state-of-the-art LLMs.
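
The fine-tuning setup described above (predicting a diagnosis label from free-form clinical text) follows the standard sequence-classification recipe; the fragment below is a minimal, generic sketch of that recipe using Hugging Face Transformers, with a placeholder backbone and made-up reports, not the authors' model or data:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    backbone = "bert-base-uncased"   # placeholder; any pre-trained encoder LLM works here
    tokenizer = AutoTokenizer.from_pretrained(backbone)
    model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=2)

    reports = ["Free-form clinical report text ...", "Another report ..."]  # illustrative
    labels = torch.tensor([1, 0])    # diagnosis labels (1 = autism, 0 = not)

    batch = tokenizer(reports, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()          # one fine-tuning step; optimizer omitted for brevity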




Friday, 19 April, 2024

Leon Bottou

Recording : here


BORGES AND AI


Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence. 




Friday, 5 April, 2024

Ross Goroshin

Recording : here


Course Correcting Koopman Representations


Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS. This work was a joint collaboration between Google DeepMind and MILA, and will be presented at ICLR 2024. 
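
A minimal sketch of the Periodic Reencoding idea as described in the abstract: advance a latent code with a linear operator, but periodically decode to state space and re-encode so that errors accumulated during the latent rollout do not compound. The encoder/decoder/operator interfaces below are assumptions made for illustration, not the authors' implementation.

    import numpy as np

    def rollout_with_periodic_reencoding(x0, encode, decode, K, horizon, period):
        """Predict future states by iterating a linear latent map K (Koopman step),
        re-encoding the decoded state every `period` steps."""
        z = encode(x0)
        trajectory = []
        for t in range(horizon):
            z = K @ z                      # linear dynamics in latent space
            x = decode(z)
            trajectory.append(x)
            if (t + 1) % period == 0:
                z = encode(x)              # periodic reencoding corrects latent drift
        return np.stack(trajectory)

    # Toy usage: identity encoder/decoder and a rotation as the latent operator
    theta = 0.1
    K = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    traj = rollout_with_periodic_reencoding(np.array([1.0, 0.0]),
                                            encode=lambda x: x, decode=lambda z: z,
                                            K=K, horizon=100, period=10)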




Friday, 22 March, 2024

Hinrich Schuetze

Recording : TBD


Quality data for LLMs : Challenges and opportunities 


The recent breakthroughs in AI owe much to scaling and to new algorithms and architectures.  But data quality has also turned out to be critical.  I will argue that going forward, data innovations may be as important as model innovations. Specifically, I will talk about two examples of the importance of data: quality data for training Glot500, a highly multilingual language model, and quality data for LongForm-C, an instruction tuning dataset created by reverse instructions for longform text generation.  Glot500 achieves good performance on a large number of low-resource languages, many of which were not covered by open models before. Models instruction-tuned on LongForm-C generate good-quality longform text on a wide variety of generation tasks. LongForm-C can be effectively combined with existing instruction-tuning datasets.
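
The "reverse instructions" construction mentioned for LongForm-C can be pictured as follows: start from an existing human-written document and ask a model to produce the instruction that could have elicited it. The sketch below is only a schematic of that idea; ask_llm is a placeholder for any text-generation call, not a specific API:

    def reverse_instruction_pair(document, ask_llm):
        """Turn an existing long-form text into an (instruction, output) pair by
        generating the instruction after the fact from the document itself."""
        prompt = ("Read the following text and write the instruction that a user "
                  "could have given to obtain it:\n\n" + document)
        instruction = ask_llm(prompt)            # placeholder generation call
        return {"instruction": instruction, "output": document}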




Friday, 15 March, 2024

Krikamol Muandet

Recording : TBD


On Imprecise Generalisation: From Invariance to Heterogeneity


The ability to generalise knowledge across diverse environments stands as a fundamental aspect of both biological and artificial intelligence (AI). In recent years, significant advances have been made in out-of-domain (OOD) generalisation, including new algorithmic tools, theoretical developments, and the creation of large-scale benchmark datasets. However, unlike in-domain (IID) generalisation, OOD generalisation lacks a precise definition, leading to ambiguity in learning objectives.


In this presentation, I aim to clarify this ambiguity by arguing that OOD generalisation is challenging because it involves not only learning from empirical data but also deciding among various notions of generalisation. The intersection of learning and decision-making poses new challenges in modern machine learning, where distinct roles exist between machine learners (e.g., ML engineers) and model operators (e.g., doctors). 


To address these challenges, I will introduce the concept of imprecise learning, drawing connections to imprecise probability, and discuss our initial work in the context of domain generalisation (DG) problems. By exploring the synergy between learning algorithms and decision-making processes, this talk aims to shed light on the complexities of OOD generalisation and pave the way for future advancements in the field.




Friday, 1 March, 2024

Eva Portelance

Recording : here


Cognitive science and AI : neural network models for studying people


This talk will explore how AI models as objects of study can be useful tools for understanding the human mind. It will present two approaches for studying human behaviour using different types of neural networks and experimental designs. As a case study, I will consider how these different modeling approaches can be used to help us understand how humans learn language. Understanding how humans learn is an important problem for cognitive science and a window into how our minds work. Additionally, human learning is in many ways the most efficient and effective algorithm there is for learning language; understanding how humans learn can help us design better AI models in the future.




Friday, 23 February, 2024

Naila Murray

Recording : here


Distilling self-supervised ViT for few-shot classification and segmentation


We address the task of weakly-supervised few-shot image classification and segmentation, by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model is able to effectively learn to perform classification and segmentation in the absence of pixel-level labels during training, using only image-level labels. To do this, it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with “mixed” supervision, where a small number of training images contains ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer that was trained using the available ground-truth pixel-level labels. Experiments on Pascal-5^i and COCO-20^i demonstrate significant performance gains in a variety of supervision settings, and in particular when little-to-no pixel-level labels are available.
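
As a rough illustration of how attention maps from a frozen self-supervised ViT can serve as pixel-level pseudo-labels, the sketch below correlates a global token with patch tokens, upsamples the resulting map to image resolution, and thresholds it into a mask. The shapes, the normalisation, and the threshold are illustrative assumptions, not the paper's implementation:

    import torch
    import torch.nn.functional as F

    def attention_pseudo_labels(patch_tokens, global_token, image_size, threshold=0.5):
        """patch_tokens: (N, D) features from a frozen self-supervised ViT;
        global_token: (D,) image-level token. Returns a binary pseudo-mask."""
        scores = patch_tokens @ global_token / global_token.norm()     # (N,)
        n = int(patch_tokens.shape[0] ** 0.5)                          # assume a square patch grid
        attn = scores.reshape(1, 1, n, n)
        attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)  # normalise to [0, 1]
        mask = F.interpolate(attn, size=image_size, mode="bilinear", align_corners=False)
        return (mask > threshold).float()                              # pixel-level pseudo-label

    # Toy usage with random features standing in for ViT outputs
    tokens, cls = torch.randn(196, 384), torch.randn(384)              # 14x14 grid, 384-d features
    pseudo_mask = attention_pseudo_labels(tokens, cls, image_size=(224, 224))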



Friday, 16 February, 2024

Elvis Dohmatob

Recording : here


DEMYSTIFYING NEURAL SCALING LAWS


Given a compute budget of X dollars, how would you optimally choose the size of the model and the size of the dataset on which to train it? Neural Scaling Laws, as demonstrated by Kaplan et al. (OpenAI) and Hoffmann et al. (DeepMind), have been proposed to offer a systematic approach to such questions. These laws empirically describe how the performance of LLMs scales, expressed as a sum of reciprocals of powers of scalable quantities such as dataset size and model size. While these insights are very useful for compute allocation, and even for explaining the emergence (?) of skills in LLMs, it is crucial to establish their origin and validity. In this presentation, I will discuss our recent work [1], which establishes precise scaling laws for associative memory models, contributing to our understanding of LLMs. If time permits, I'll also sketch how scaling laws provide a satisfactory explanation [2, 3] of the recently popularized phenomenon of "model collapse", whereby model performance catastrophically declines due to pollution of data by older generations.
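
For reference, the "sum of reciprocals of powers" form is usually written, in the Hoffmann et al. (Chinchilla) parameterisation, as

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is the number of model parameters, D the number of training tokens, E the irreducible loss, and A, B, α, β fitted constants; minimising L under a compute budget C ≈ 6ND gives the compute-optimal split between model size and dataset size.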

Joint work with Vivien Cabannes (Meta), Alberto Bietti (Simons Institute), Julia Kempe (NYU and Meta), Yunzhen Feng (NYU), François Charton (Meta), and Pu Yang (Peking University).



Friday, 9 February, 2024

Jamelle Watson-Daniels

Recording : here


The Roads not Taken: Model Multiplicity in Machine Learning


At times, decision making is complicated by the existence of multiple equally good options. For example, consider a person deciding between roads to take while travelling. In terms of the fastest route, there might be one road that beats the others. Or there may be two roads with equal travel time. In this case, other factors need to be taken into consideration, such as whether there is a gas station on the route or the likelihood of increased traffic. In terms of predictive models, we consider the question: what happens when there exist multiple equally good options for machine learning models? In applied machine learning, this is referred to as model multiplicity. Several recent works demonstrate that many ML tasks admit a large Rashomon Set of competing models that can disagree on a significant fraction of individual predictions (predictive multiplicity). In ML-supported decision-making, the selection of a model from the Rashomon Set without regard for predictive multiplicity can lead to unjustified or unfair decisions. I will present an overview of my ongoing research in this area.
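
A minimal sketch of how predictive multiplicity can be quantified in practice: fit several near-equally-accurate models, keep those within a small accuracy margin of the best one (an empirical Rashomon Set), and measure the fraction of individuals on which they disagree. The model family, the bootstrap resampling, and the 1% margin are illustrative choices, not a prescribed procedure:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Competing models: same family, trained on different bootstrap resamples
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(10):
        idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)
        model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
        preds.append(model.predict(X_te))
    preds = np.stack(preds)                               # (n_models, n_test)

    # Empirical Rashomon Set: models within 1% accuracy of the best one
    accs = (preds == y_te).mean(axis=1)
    rashomon = preds[accs >= accs.max() - 0.01]

    # Ambiguity: fraction of test points on which the competing models disagree
    ambiguity = (rashomon != rashomon[0]).any(axis=0).mean()
    print(f"{len(rashomon)} competing models, ambiguity = {ambiguity:.2%}")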




Friday, 2 February, 2024

Emmanuel Noutahi

Recording : here

Autonomous AI Agents & Multimodal Models: A New Era for Drug Discovery


This talk will explore the transformative role of machine learning in drug discovery, addressing current challenges and highlighting how these are being tackled. We will further focus on the potential of autonomous AI agents to revolutionize the field, discussing both theoretical foundations and practical implications. A prime example of this innovation is LOWE, which exemplifies how AI agents can industrialize drug discovery processes.


Friday, 26 January, 2024 

Fabio Viola

Recording : here

THE PRACTICE OF RESEARCH ENGINEERING


This talk will reflect the perspective that AI research is not a purely theoretical pursuit but is about building artefacts that act in the real world, and as such benefits strongly from adopting an engineering mindset from the outset.

I will discuss a number of research engineering practical lessons I have picked up over years of work in industry, and share tips on how good Research Engineering practice can help you in your {MSc, PhD, job}.


Friday, 19 January, 2024

Steven Harnad

Recording : here

LANGUAGE WRIT LARGE : LLMs, ChatGPT, MEANING AND UNDERSTANDING


Apart from what (little) OpenAI may be concealing from us, we all know (roughly) how ChatGPT works (its huge text database, its statistics, its vector representations, and their huge number of parameters, its next-word training, etc.). But none of us can say (hand on heart) that we are not surprised by what ChatGPT has proved able to do with these resources. It has even driven some of us to conclude that it actually understands. It’s not true that it understands. But it is also not true that we understand how it can do what it can do. 

I will suggest some hunches about benign “biases” that emerge at LLM-scale that may be helping ChatGPT do so much better than we would have expected. These biases are inherent in the nature of language itself, at LLM-scale, and they are closely linked to what it is that ChatGPT lacks, which is direct sensorimotor grounding, connecting its words to their referents and its propositions to their meanings. 

These benign biases are related to (1) the parasitism of indirect verbal grounding on direct sensorimotor grounding, (2) the circularity of verbal definition, (3) the “mirroring” of language production and comprehension, (4) iconicity in propositions at LLM-scale, (5) computational counterparts of human “categorical perception” in category learning by neural nets, and perhaps also (6) a conjecture by Chomsky about the laws of thought. 


Discussion with ChatGPT about this : https://generic.wordpress.soton.ac.uk/skywritings/2024/01/14/language-writ-large-llms-chatgpt-meaning-and-understanding/



Friday, 12 January, 2024 

Ankit Anand

Recording : here

AI for proving and disproving conjectures in Mathematics and Theoretical Computer Science


AI has made great strides in multiple domains, including robotics, game playing, biology and climate science. In this talk, we will describe some of our attempts to use AI for automated theorem proving and for developing new lower bounds for open mathematical conjectures.

Firstly, I will describe how we adapted the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proofs. We show that provers trained in this way can outperform previous machine learning approaches and compete with the state-of-the-art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40. The proofs generated by our algorithm are also almost always significantly shorter than E's proofs.
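
A minimal sketch of the hindsight-relabelling idea transferred to theorem proving: even when an attempt fails to derive the target, every intermediate clause it did derive can be treated retroactively as a goal, turning the failed trajectory into positive training data. The data structures below are illustrative, not the authors' system:

    def hindsight_relabel(trajectory, target_clause):
        """trajectory: list of (state, inference_step, derived_clause) tuples from an
        unsuccessful proof attempt. Each derived clause becomes an alternative goal."""
        examples = []
        for t, (state, step, derived) in enumerate(trajectory):
            if derived == target_clause:
                continue                       # the real goal needs no relabelling
            # Every prefix of the attempt is a successful proof of `derived`
            examples.append({
                "goal": derived,
                "proof_steps": [s for (_, s, _) in trajectory[: t + 1]],
            })
        return examples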

Secondly, I will describe our recent work studying a central extremal graph theory problem inspired by a 1975 conjecture of Erdős. We formulate the graph generation problem as a sequential decision-making problem and compare AlphaZero, a neural-network-guided tree search, with tabu search, a heuristic local search method. Using a curriculum, we improve the state-of-the-art lower bounds for several sizes of this problem. I will also draw close connections between this problem and drug discovery, as well as GFlowNets.