WINTER 2023

Friday, 16 June, 2023 

Nicolas Le Roux

Recording : TBD

Deep language networks


We view large language models (LLMs) as stochastic language layers in a deep network, where the learnable parameters are the natural language prompts at each layer. We stack multiple such layers, feeding the output of one layer to the next. We call the stacked architecture a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-layer language network (DLN-1), i.e., an LLM. We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learnt. We consider the output of the first layer as a latent variable over which we need to marginalize, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to GPT-4, even when each LLM in the network is smaller and less powerful.
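
As a rough illustration (not the authors' code), a two-layer DLN forward pass might look like the sketch below, where `call_llm` is a hypothetical wrapper around any LLM completion API and the prompts `p1`, `p2` are the learnable parameters:

```python
def call_llm(prompt: str, x: str, n_samples: int = 1) -> list[str]:
    """Hypothetical wrapper standing in for any LLM completion API."""
    raise NotImplementedError  # plug in an actual LLM client here

def dln2_predict(x: str, p1: str, p2: str, n_latent: int = 4) -> list[str]:
    # Layer 1: the intermediate text h is a latent variable; we draw several
    # samples to approximate the marginalization over it.
    hs = call_llm(p1, x, n_samples=n_latent)
    # Layer 2: each sampled intermediate text is fed, with the second prompt, to the LLM.
    return [call_llm(p2, h, n_samples=1)[0] for h in hs]
```

Prompt training then amounts to searching over `p1` and `p2` (for instance, by asking another LLM to propose rewrites) so as to maximize a variational lower bound on the probability of the correct output, with the sampled intermediate texts playing the role of the latent variable.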

Friday, 9 June, 2023 

Jun Ding

Recording : TBD

Unagi reconstructs the cellular dynamics in pulmonary fibrosis and identifies repurposed drugs


Idiopathic pulmonary fibrosis (IPF) is a terminal chronic lung disease causing lung scarring and a progressive decline in lung function. Current medications for this disease are limited to Pirfenidone and Nintedanib. Emerging single-cell sequencing technologies can track the cellular dynamics of IPF progression and thus provide unrivaled opportunities to identify more effective therapeutic targets and drugs. In this paper, we have profiled the cellular states across different IPF stages using single-nuclei RNA-seq. Furthermore, we have developed a unified and computationally efficient drug repurposing framework called UNAGI (computational approach driven repurposed drugs for idiopathic pulmonary fibrosis), which reconstructs the cellular dynamics from the IPF single-nuclei RNA-seq data and identifies candidate drugs for the disease. UNAGI employs a deep generative adversarial variational autoencoder with graph embedding to iteratively learn cellular dynamic graphs of IPF progression and to suggest a list of potential therapeutic targets, drawn from the reconstructed gene regulatory network, that modulate the disease progression. UNAGI empowers in-silico exploration of intervention strategies to restore the healthy status of dynamic cell populations during disease progression, producing a short list of target pathways, potential repurposed drugs, and novel compounds against IPF. The UNAGI platform successfully identifies Nintedanib as an efficacious IPF drug and identifies several other potential compounds previously reported to repress induced pulmonary fibrosis. We have also systematically examined the top pathways identified by the model, which are significantly associated with pulmonary fibrosis as documented in the existing literature. Together, these results demonstrate the effectiveness of the UNAGI platform.
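
To make the modelling backbone concrete, here is an illustrative PyTorch-style skeleton of the kind of adversarial variational autoencoder described above; the layer sizes, the discriminator, and the omitted graph-embedding regularizer are placeholders rather than UNAGI's actual implementation:

```python
import torch
import torch.nn as nn

class AdversarialVAE(nn.Module):
    """Illustrative skeleton only; dimensions and the omitted graph-embedding
    term are placeholders, not UNAGI's implementation."""
    def __init__(self, n_genes: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_genes))
        # Discriminator distinguishing real expression profiles from reconstructions,
        # providing the adversarial part of the objective.
        self.discriminator = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(),
                                           nn.Linear(128, 1))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar
```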

Friday, 2nd June, 2023 

Pouya Bashivan

Recording : TBD

DEEP NETS AND THE BRAIN : ARE WE THERE YET?


Deep neural networks are constantly improving at performing complex behaviours. They can perform many more tasks than before, while also generalizing better in the ways we humans expect them to. This progress motivated us to ask whether such performance improvements are being translated into internal representations that are more similar to neuronal activity in the brain. We investigated this question in the context of vision by comparing various vision DNNs with neural activity from the macaque visual cortex, and came to a surprising conclusion: performance on none of the typical OOD recognition benchmarks is predictive of a model's brain similarity. However, we found that most adversarially robust models had much more brain-like internal activity. Inspired by this observation, we then asked what factors may contribute to neuronal robustness in the absence of any apparent adversarial training procedure in brains. We pursued a usual suspect in biology: lateral connectivity among neighbouring neurons. We simulated such connections in CNNs with a simple operation we call kernel average pooling (KAP), and found that not only were CNNs with KAP more robust to adversarial attacks, but these networks also learned topographically organized kernels that qualitatively mimic those in the primate visual cortex.
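
The paper defines KAP precisely; the following is only a rough sketch of the idea, in which each channel's response is replaced by the average response of its neighbouring channels (the window size and padding are illustrative choices, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def kernel_average_pooling(feats: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Rough sketch of kernel average pooling (KAP): each channel's response is
    replaced by the mean response of its k neighbouring channels, a cheap stand-in
    for lateral connectivity among nearby kernels."""
    b, c, h, w = feats.shape
    # Treat channels as a 1-D axis and average over a sliding window of k kernels.
    x = feats.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)       # (B*H*W, 1, C)
    x = F.avg_pool1d(x, kernel_size=k, stride=1, padding=k // 2)
    return x.reshape(b, h, w, c).permute(0, 3, 1, 2)
```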

Friday, 28 April, 2023 

Sven Gowal

Recording : here

GENERALIZATION AND ADAPTABILITY USING ADVERSARIALLY ROBUST MODELS


Enabling models to generalize robustly to adversarial and natural distribution shifts is a fundamental problem in machine learning. In this talk, we explore the generalization capabilities of models trained to be robust against lp-norm bounded adversarial perturbations. First, we introduce the concept of adversarial training, enumerate its key challenges, and demonstrate how we can leverage generative models to bypass these challenges and reach state-of-the-art performance on various adversarial benchmarks. We then shift our focus towards the ability of these adversarially trained models to generalize to in- and out-of-distribution settings. Finally, we demonstrate that ensembles of robustly trained models can be adapted on the fly, without retraining, allowing us to control the trade-off between standard and robust performance and to adapt to new out-of-distribution settings quickly.
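
For readers unfamiliar with adversarial training, the sketch below shows the standard l-infinity PGD inner loop (Madry et al.) that this line of work builds on; the generative-model-augmented recipe mentioned above is not reflected here, and the step sizes are conventional defaults rather than the talk's settings:

```python
import torch

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard l-infinity PGD: the canonical inner maximization of adversarial
    training, shown only for context."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()               # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)          # project onto the eps-ball
        x_adv = x_adv.clamp(0, 1)                         # stay in the image range
    return x_adv.detach()

# Adversarial training then minimizes the loss on these worst-case inputs:
# loss = cross_entropy(model(pgd_attack(model, x, y)), y)
```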

Friday, 21 April, 2023 

Matthew Taylor

Recording : here

HUMAN AND AGENT COOPERATIVE LEARNING


Reinforcement learning has had incredible successes in both controlled and applied settings. Rather than focusing on single agents, this talk will focus on the multi-agent case, where agents can teach each other, agents can learn from humans, and humans can learn from agents. This talk will summarize recent work in this area and highlight multiple open questions.

Friday, 31 March, 2023 

Changjian Shui

Recording : here

Algorithmic fairness through group calibration: theories and applications in medical image analysis


Developing a responsible learning system is essential for automated clinical decision support. One important component is ensuring that the model is not biased, i.e., that it does not perform poorly for particular population subgroups, thereby enabling fair clinical decisions. In machine learning, various fairness notions and algorithms have been developed. However, are these algorithms or notions suitable for healthcare? In this talk, we will discuss a strategy to mitigate biases in model calibration (i.e., group calibration), an issue in healthcare that has recently been uncovered (Obermeyer et al., Science 2019). We will present its theoretical properties that enable fair and accurate predictions, and further demonstrate how meta-learning can provably control calibration bias. Additionally, we discuss the unique challenges of bias mitigation in medical image analysis, and propose a simple yet effective approach to mitigate calibration bias for medical image datasets.
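
As a concrete illustration of what group calibration measures, the sketch below computes a standard expected calibration error separately for each subgroup (binary classification assumed); the binning scheme is an arbitrary choice and this is not the talk's estimator:

```python
import numpy as np

def group_ece(probs, labels, groups, n_bins=10):
    """Per-group expected calibration error: a standard way to quantify
    the calibration gap across subgroups."""
    results = {}
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for g in np.unique(groups):
        p, y = probs[groups == g], labels[groups == g]
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (p > lo) & (p <= hi)
            if mask.any():
                # weight each bin's |confidence - accuracy| gap by its occupancy
                ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
        results[g] = ece
    return results  # a large spread across groups signals calibration bias
```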

Friday, 3 March, 2023 

Fred Hamprecht

Recording : here

Dimension reduction: from t-SNE to UMAP with contrastive learning


Visualizing high-dimensional data is useful in exploratory data analysis, and also a good idea before embarking on supervised learning on tabular data. The best currently known methods are t-SNE and UMAP. Motivated from entirely different viewpoints, their loss functions appear to be unrelated. In practice, they yield starkly differing visualizations that can suggest conflicting interpretations of the same data.


We uncover a conceptual connection between t-SNE and UMAP in terms of contrastive learning, and provide a mathematical characterization of the distortion introduced by UMAP's negative sampling. We exploit this new conceptual connection to propose and implement a generalization of negative sampling, allowing us to interpolate between (and extrapolate beyond) t-SNE and UMAP and their respective embeddings.
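
As a rough illustration of the negative-sampling view, a UMAP-style contrastive loss on low-dimensional distances can be sketched as follows; the kernel parameters and the exact normalization (which is precisely where the t-SNE-like and UMAP-like regimes differ in the paper) are simplified here:

```python
import torch

def neg_sampling_loss(d_pos, d_neg, a=1.0, b=1.0):
    """Sketch of a UMAP-style negative-sampling objective on low-dimensional
    distances: attract sampled neighbour pairs, repel randomly sampled non-neighbours.
    q(d) = 1 / (1 + a * d^(2b)) is the low-dimensional similarity kernel."""
    q_pos = 1.0 / (1.0 + a * d_pos.pow(2 * b))
    q_neg = 1.0 / (1.0 + a * d_neg.pow(2 * b))
    attraction = -torch.log(q_pos + 1e-9).mean()
    repulsion = -torch.log(1.0 - q_neg + 1e-9).mean()
    return attraction + repulsion
```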


I may also give in to the temptation to tell you a little about Branched Optimal Transport and about generalizations of shortest path semirings.


Joint work with Sebastian Damrich, Jan Niklas Böhm and Dmitry Kobak.

Friday, 24 February, 2023 

Irina Rish

Recording : here


Scaling Laws for Foundation Models in Deep Learning


Modern AI systems have achieved impressive results in many application domains; however, until recently, such systems remained “narrow specialists” incapable of generalizing to a wide range of diverse tasks without being specifically trained on them. In the past several years, however, the situation has been changing rapidly thanks to advances in large-scale self-supervised models pre-trained on large amounts of diverse data (a.k.a. “foundation models”). Scaling such models leads to the emergence of impressive few-shot generalization capabilities on broad sets of novel tasks. Furthermore, predicting the performance of such models at scale is important in order to identify the methods most likely to stand the test of time; thus, studying neural scaling laws as a function of model, data, and compute size has become a rapidly growing area of research in the past couple of years. In this talk, we will present Broken Neural Scaling Laws, which allow one to extrapolate the downstream (and upstream) performance of large-scale vision, language, audio, video, diffusion generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution generalization, continual learning, arithmetic, unsupervised/self-supervised learning, and reinforcement learning (single-agent and multi-agent) tasks, assuming one has enough training runs before a break to extrapolate what happens until the next sharp break.
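
As an illustration of the kind of functional form involved, the sketch below fits a simplified single-break "smoothly broken power law" to observed (scale, error) pairs and extrapolates; the published BNSL form allows multiple breaks and additional terms, so this should be read only as a toy version:

```python
import numpy as np
from scipy.optimize import curve_fit

def smoothly_broken_power_law(x, a, b, c0, c1, d1, f1):
    """Simplified one-break scaling law: a power law whose exponent changes
    smoothly from c0 to c0 + c1 around the scale d1, with sharpness set by f1."""
    return a + b * x**(-c0) * (1.0 + (x / d1)**(1.0 / f1))**(-c1 * f1)

# Fit to (scale, error) pairs observed before a break, then extrapolate:
# params, _ = curve_fit(smoothly_broken_power_law, x_train, y_train, maxfev=20000)
# y_pred = smoothly_broken_power_law(x_test, *params)
```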


While the recent developments in the foundation models area are truly exciting, they also pose a new challenge to academic and non-profit AI research organizations, which may not have access to the amount of compute that industry has. This motivated us – a rapidly growing international collaboration across several universities and non-profit organizations – to join forces and initiate an effort towards developing common objectives and tools for advancing the field of large-scale foundation models. Our long-term, overarching goal is to develop a wide international collaboration united by the objective of building foundation models that are increasingly more powerful, while at the same time safe, robust, and aligned with human values. Such models aim to serve as the foundation for numerous AI applications, from industry to healthcare to scientific discovery - i.e., AI-powered applications of great societal value. We aim to avoid the accumulation of the most advanced AI technology in a small set of large companies, while jointly advancing the field of AI and keeping it open (“democratization” of AI). Obtaining access to large-scale computational resources would greatly facilitate the development of open AI research worldwide, and ensure a collaborative, collective solution to the challenge of making AI systems of the future not only highly advanced but also maximally beneficial for the whole of humanity.


Friday, 17 February, 2023 

Dhanya Sridhar

Recording : here


Learning causal variables with machine learning


Science and decision-making require us to infer the effects of interventions. Does knocking out a given gene suppress a function of interest? Does a proposed tax actually change some behavior of interest? Causal models provide a language to model interventions, and help us derive assumptions that yield valid causal inference. Despite the role causality plays in the sciences, the applications of causal inference have been limited, often restricted to questions where all the variables are carefully measured. In contrast, the field of machine learning (ML) has arguably succeeded at extracting task-relevant information from unstructured inputs such as text and images, inputs that implicitly capture abstract variables. Nevertheless, variables inferred using ML may not be substitutes for the underlying but unknown causal variables: ML methods may entangle the underlying causal variables, or neglect to capture them, biasing downstream causal inference. In this talk, I'll discuss two approaches to learning causally relevant variables. First, I'll introduce causally sufficient text embeddings, a general method that leverages causal model structure to learn causal variables from text data. Next, I'll discuss recent work, inspired by biological tasks, that exploits evolution in the causal mechanism mapping inputs to a target of interest to learn causal variables. Finally, I'll conclude by highlighting ongoing and open research to address the challenges of causal reasoning with ML. 
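
As a toy illustration of why learned representations matter for causal estimation, the sketch below performs a generic plug-in backdoor adjustment that treats text embeddings as a stand-in for confounders; this is not the causally sufficient text embedding method from the talk, and it is only valid if the embeddings actually capture the relevant confounding information:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_ate(embeddings, treatment, outcome):
    """Generic plug-in backdoor adjustment: regress the outcome on treatment plus
    learned embeddings, then contrast predictions under treatment = 1 vs 0."""
    X = np.column_stack([treatment, embeddings])
    model = LinearRegression().fit(X, outcome)
    X1 = np.column_stack([np.ones_like(treatment), embeddings])
    X0 = np.column_stack([np.zeros_like(treatment), embeddings])
    # Average treatment effect estimate; biased if the embeddings entangle or
    # miss the underlying causal variables, which is exactly the failure mode discussed above.
    return (model.predict(X1) - model.predict(X0)).mean()
```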

Friday, 10 February, 2023 

Yongyi Mao

Recording : here


Towards understanding the generalization behaviour of deep learning systems


The stunning power of deep learning remains a great mystery to date. Specifically, despite their extremely high expressive power, deep neural networks trained with Stochastic Gradient Descent (SGD) appear to generalize well, contradicting the conventional wisdom of statistical learning theory. In this talk, I will highlight a theme of research carried out in my group on understanding the generalization of deep neural networks. Specifically, we develop information-theoretic upper bounds for the generalization error of networks trained with SGD. Our results suggest that two correlated measures, namely the leakage of information about the training sample along the training trajectory and the stability of the found solution, impact the generalization of the learned network. These bounds also inspire some simple and practicable training techniques that improve generalization. Time permitting, I will also touch on ongoing work that improves upon the information-theoretic bounds in the super-sample setting of Steinke and Zakynthinou. The talk is based on joint work with my PhD student Ziqiao Wang.
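
For context, the classical mutual-information bound of Xu and Raginsky (2017), which this line of work builds on, states that for a sigma-sub-Gaussian loss and a training sample S of size n, the expected generalization gap of the learned hypothesis W satisfies

E[gen(W, S)] <= sqrt( 2 * sigma^2 * I(S; W) / n ),

so the less information the training process leaks about the sample, the smaller the gap; the talk's bounds are in this vein but tailored to the information leaked along SGD trajectories.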

Friday, 3 February, 2023 

Connor Coley

Recording : here

AI for chemical space exploration and synthesis


The identification and synthesis of molecules that exhibit a desired function is an essential part of addressing contemporary problems in science and technology. Small molecules are the predominant solution to challenges in the development of medicines, chemical probes, specialty polymers, and organocatalysts, among others. The typical discovery paradigm is an iterative process of designing candidate compounds, synthesizing those compounds, and testing their performance experimentally, where each repeat of this cycle can require weeks or months. There are a variety of techniques used for prioritizing experiments based on the predicted property profiles of molecules. We will talk about some of the complexities of molecular representation learning and the limitations of graph-based models.


We will also discuss a primary consideration of molecular design workflows: the chemical space that comprises the search space for a molecular screening/optimization campaign. That is, the manner in which the search is constrained to a finite library of molecules or, following an increasingly popular trend, the manner in which the search navigates a virtually infinite space of molecules. Important considerations include whether the search space is constrained or unconstrained in terms of synthesizability (or commercial availability), which impacts the ease of experimental validation, as well as sample efficiency. 

Friday, 20 January, 2023 

Diane Bouchacourt

Recording : here

Learning structured representations for generalization under distribution shifts


Machine learning models commonly assume that the training and test data are independent and identically distributed (IID). In such settings, state-of-the-art (SOTA) architectures, potentially pre-trained with self-supervision, achieve impressive results. Issues arise when the test distribution differs from the training distribution, in which case SOTA models incur considerable degradation in performance. This setting is known as out-of-distribution (OOD) generalization. The brittleness of SOTA models prevents them from being reliably deployed, for example leading to potentially unfair model behaviors. The reasons for this brittleness are multiple: (self-)supervised learning criteria ensure some degree of global invariance, but they do not aim to induce any structure beyond that. Thus, spurious and robust features are entangled in the representation, and models in turn rely on easy-to-learn spurious correlations during training. Furthermore, while machine learning challenges ultimately require models deployed on real datasets, most existing works in the OOD literature are evaluated on synthetic, toy datasets where we have complete control over, and knowledge of, the variations in the data. Our approach towards solving these challenges is three-fold. First, to shed light on the brittleness of SOTA models to changes in the data's factors of variation, we perform large experimental studies of the invariance properties of SOTA architectures, as well as their robustness to changes in these factors. Second, we design learning criteria that enforce some structure in the representation that is expected to bring OOD robustness. To do so, we induce disentanglement in the representation, or learn transformation operators that act on the representation space. Finally, we develop realistic datasets and benchmarks that allow us to test and analyze the robustness of SOTA models, e.g. by annotating the ImageNet validation set with its factors of variation.

Friday, 13 January, 2023 

Christian Weilbach

Recording : here

General-purpose amortized inference through structured diffusion models


Recently, diffusion models (DMs) have demonstrated superior performance on many generative modelling and inference tasks. Last year we introduced a flexible video diffusion model that can generate very long sequences of coherent video by incorporating amortized marginalization into its attention mechanism. In this work we further extend DMs to amortize conditioning in graphical models and solve an illustrative and diverse set of hard inference problems, namely Sudoku solving, binary continuous matrix factorization, and sorting. We do so by sparsely structuring the attention mechanism of the DM, yielding much better theoretical and empirical scaling properties than standard DMs and reducing the gap to established algorithms. In combination, our work provides a versatile framework for compiling probabilistic programs, e.g. stochastic scientific simulators, into amortized inference artifacts.
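
As a rough sketch of what sparsely structuring the attention mechanism can mean, the snippet below builds an attention mask from a graphical model's adjacency matrix so that each variable's token attends only to itself and its neighbours; the actual structuring used in the paper is richer than this:

```python
import torch

def graphical_attention_mask(adjacency: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch: derive a sparse attention mask from a graphical model.
    `adjacency` is an (n, n) 0/1 matrix over the model's variables."""
    n = adjacency.shape[0]
    mask = adjacency.bool() | torch.eye(n, dtype=torch.bool)
    # Non-edges receive -inf before the softmax inside attention.
    return torch.where(mask, torch.zeros(n, n), torch.full((n, n), float("-inf")))

# scores = (q @ k.transpose(-2, -1)) / d**0.5 + graphical_attention_mask(adj)
# attn = scores.softmax(dim=-1)
```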