Amir Globerson: Understanding Complex Processing in Temporal Models - From Implicit Biases to Mechanistic Interpretability
Recent temporal models such as transformers and RNN variants can effectively capture complex structure in text. However, it remains largely unknown how this is achieved. The talk will discuss our work on this problem. First, I will discuss work demonstrating implicit biases of RNNs, showing that they have a bias towards learning “simple” rules that correspond to dynamical systems with a low-dimensional state. I will then present several works on understanding how transformers solve complex problems. The first dissects how transformers extract information about the world (e.g., “Paris is the capital of France”), highlighting several information-processing streams that underlie this process. Finally, I will discuss our analysis of how transformers achieve in-context learning, and the internal hypothesis space used in this process.
Antti Oulasvirta: Towards More Robust Collaborative AI with Simulation Intelligence
For AI to be able to collaborate with human partners, it needs to be able to predict the consequences of its actions on them. This problem is incredibly hard, fundamentally because human behavior is complex and every collaborative situation is unique to a certain degree. In this talk, I discuss the design of more robust approaches. I present a vision of user models that do inference, learning, and prediction via simulator-based models of behavior that are driven by theoretical principles originating in psychology. I present first results on “artificial users” and discuss open challenges.
Cédric Archambeau: Beyond SHAP: Explaining Probabilistic Models with Distributional Values
In this talk, I will revisit traditional game-theoretic explainable AI by assuming games have probabilistic rather than scalar payoffs. I will generalise cooperative games and value operators by framing marginal contributions of players to coalitions as differences between two random variables. I will introduce distributional values, random variables that track changes in the model output (e.g., the flipping of the predicted class) and replace the scalar attributions of traditional game-theoretic explainable AI methods. Finally, I will show examples of the application of this framework to vision and language models.
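As a toy illustration of the probabilistic-payoff idea (not taken from the talk itself): in the hypothetical Python sketch below, the payoff of a coalition is the model's full class-probability vector, a player's marginal contribution is the difference between two such vectors, and Shapley-style averaging over permutations yields a vector-valued attribution together with the probability that adding the feature flips the predicted class. The masking scheme, the toy model, and all function names are illustrative assumptions, not the paper's formulation or API.

    # Illustrative sketch: coalition payoffs are class-probability vectors, and a
    # player's contribution to a coalition is the difference of two such vectors.
    from itertools import permutations
    import numpy as np

    def payoff(model, x, baseline, coalition):
        """Class-probability vector when only the features in `coalition` are revealed."""
        masked = baseline.copy()
        idx = list(coalition)
        masked[idx] = x[idx]
        return model(masked)

    def distributional_value(model, x, baseline, player, n_features):
        """Shapley-weighted change in the output distribution from adding `player`."""
        diffs, flips = [], []
        for perm in permutations(range(n_features)):
            before = set(perm[:perm.index(player)])          # coalition preceding the player
            p_without = payoff(model, x, baseline, before)
            p_with = payoff(model, x, baseline, before | {player})
            diffs.append(p_with - p_without)                 # difference of two distributions
            flips.append(np.argmax(p_with) != np.argmax(p_without))
        return np.mean(diffs, axis=0), np.mean(flips)

    # Toy two-class logistic model over three features, explained against a zero baseline.
    w = np.array([2.0, -1.0, 0.5])

    def model(z):
        p = 1.0 / (1.0 + np.exp(-w @ z))                     # probability of class 1
        return np.array([1.0 - p, p])

    x, baseline = np.ones(3), np.zeros(3)
    for f in range(3):
        delta, flip_prob = distributional_value(model, x, baseline, f, n_features=3)
        print(f"feature {f}: mean shift in class probabilities = {delta}, "
              f"P(predicted class flips) = {flip_prob:.2f}")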
Jeremias Knoblauch: Post-Bayesian Machine Learning
In this talk, I provide my perspective on the machine learning community's efforts to develop inference procedures with Bayesian characteristics that go beyond Bayes' Rule as an epistemological principle. I will explain why these efforts are needed and provide some recent success stories at the interface between robustness and computation.
Peter Grünwald: E is the new P
How much evidence do the data give us about one hypothesis versus another? The standard way to measure evidence is still the p-value, despite a myriad of problems surrounding it. We present the e-value, a notion of evidence which overcomes some of these issues. While e-values were only given a name as recently as 2019, interest in them has since exploded, with papers in the Annals, JRSS B, and Biometrika, and two international workshops. It is the main topic of the ERC Advanced Grant I was awarded earlier this year. In simple cases, e-values coincide with Bayes factors. But if the null is composite or nonparametric, or an alternative cannot be explicitly formulated, e-values and Bayes factors become distinct. Unlike the Bayes factor, e-values always allow for tests with strict frequentist Type-I error control under optional continuation of data collection and combination of data from different sources. E-values are also the basic building blocks of anytime-valid confidence intervals that remain valid under continuous monitoring and optional stopping. In parametric settings they tend to be strictly wider than, hence consistent with, Bayesian credible intervals. This led to the development of the e-posterior, a robust analogue of the Bayesian posterior that *gets wider rather than wrong* if the prior is chosen badly.
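As a brief numerical illustration of the error-control property mentioned above (a sketch under standard assumptions, not material from the talk): a likelihood ratio against a point null is an e-value, products of e-values over successive batches remain e-values, and rejecting once the running product exceeds 1/alpha keeps the frequentist Type-I error below alpha even when data collection stops adaptively.

    # Simulate optional continuation under the null N(0,1): keep multiplying the
    # batch likelihood ratio of N(0.3, 1) vs N(0, 1) (an e-value) and stop as soon
    # as the running product exceeds 1/alpha. By Ville's inequality the chance of
    # ever rejecting is at most alpha, regardless of the stopping rule.
    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 0.05
    n_sims, max_batches, batch_size = 20000, 10, 25
    false_rejections = 0

    for _ in range(n_sims):
        e = 1.0
        for _ in range(max_batches):
            x = rng.normal(0.0, 1.0, size=batch_size)        # data generated under the null
            e *= np.exp(np.sum(0.3 * x - 0.5 * 0.3 ** 2))    # batch likelihood ratio
            if e >= 1 / alpha:                               # optional continuation: stop when "significant"
                false_rejections += 1
                break

    print(f"empirical Type-I error under optional continuation: "
          f"{false_rejections / n_sims:.3f} (guaranteed <= {alpha})")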
Silvia Chiappa: Graph-based Statistical Causality for Decision Making
In this talk, I will first discuss at a high level how graph-based statistical causality can be used to achieve more efficient decision making with respect to the number of interventions. I will then describe two recent works using graph-based statistical causality in sequential decision-making settings: one extending causal Bayesian optimization to general intervention types, and one relaxing the assumption of a known causal graph in causal bandits.
Sinead Williamson: Posterior Uncertainty Quantification in Neural Networks using Data Augmentation
We approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) is a fundamentally mis-specified model class, since it assumes that future data are supported on existing observations only -- a situation rarely encountered in practice. To address this limitation, we propose MixupMP, a method that constructs a more realistic predictive distribution using popular data augmentation techniques. MixupMP operates as a drop-in replacement for deep ensembles, where each ensemble member is trained on a random simulation from this predictive distribution. Grounded in the recently proposed framework of martingale posteriors (Fong et al., 2023), MixupMP returns samples from an implicitly defined Bayesian posterior. Our empirical analysis shows that MixupMP achieves superior predictive performance and uncertainty quantification on various image classification datasets when compared with existing Bayesian and non-Bayesian approaches.
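To make the "train each ensemble member on a random simulation from the predictive distribution" idea concrete, here is a minimal numpy sketch assuming a user-supplied training routine; the names mixup_resample and train_member are illustrative, and the details (random pairing, Beta mixing weights) follow standard mixup rather than the paper's exact construction.

    # Sketch: each ensemble member is fit to an independent mixup pseudo-dataset,
    # and the ensemble's averaged predictions stand in for the posterior predictive.
    import numpy as np

    def mixup_resample(X, y_onehot, alpha=1.0, rng=None):
        """Draw one pseudo-dataset by convexly mixing random pairs of observations."""
        rng = np.random.default_rng() if rng is None else rng
        n = X.shape[0]
        i = rng.integers(0, n, size=n)          # first member of each random pair
        j = rng.integers(0, n, size=n)          # second member of each random pair
        lam = rng.beta(alpha, alpha, size=n)    # Beta(alpha, alpha) mixing weights
        X_mix = lam[:, None] * X[i] + (1 - lam[:, None]) * X[j]
        y_mix = lam[:, None] * y_onehot[i] + (1 - lam[:, None]) * y_onehot[j]
        return X_mix, y_mix

    def ensemble_predict(train_member, X, y_onehot, X_test, n_members=5, seed=0):
        """Average predictions of members trained on independent mixup pseudo-datasets.

        `train_member(X, y)` is assumed to fit a model and return a callable that
        maps inputs to class-probability predictions.
        """
        rng = np.random.default_rng(seed)
        preds = []
        for _ in range(n_members):
            X_mix, y_mix = mixup_resample(X, y_onehot, rng=rng)
            predictor = train_member(X_mix, y_mix)
            preds.append(predictor(X_test))
        return np.mean(preds, axis=0)           # approximate posterior predictive mean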
Tamara Broderick: An Automatic Finite-Sample Robustness Check: Can Dropping a Little Data Change Conclusions?
Practitioners will often analyze a data sample with the goal of applying any conclusions to a new population. For instance, if economists conclude that microcredit is effective at alleviating poverty based on observed data, policymakers might decide to distribute microcredit in other locations or future years. Typically, the original data is not a perfect random sample from the population where policy is applied -- but researchers might feel comfortable generalizing anyway so long as deviations from random sampling are small, and the corresponding impact on conclusions is small as well. Conversely, researchers might worry if a very small proportion of the data sample was instrumental to the original conclusion. We therefore propose a method to assess the sensitivity of statistical conclusions to the removal of a very small fraction of the data set. Manually checking all small data subsets is computationally infeasible, so we propose an approximation based on the classical influence function. Our method is automatically computable for common estimators. We provide finite-sample error bounds on approximation performance and a low-cost exact lower bound on sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, does not disappear asymptotically, and is not explained by misspecification. Empirically, we find that many data analyses are robust, but the conclusions of several influential economics papers can be changed by removing (much) less than 1% of the data.
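For intuition, the sketch below specialises the influence-function approximation to OLS: it estimates how much dropping each observation would move a chosen coefficient, then greedily checks whether removing a small fraction of points could flip that coefficient's sign. This is an illustrative numpy rendering of the general approach under simplifying assumptions, not the authors' implementation, and any candidate subset it flags should be verified by actually refitting without those points.

    # Influence-function approximation for OLS: dropping observation i shifts the
    # coefficient vector by roughly -(X^T X)^{-1} x_i * residual_i, and the effect
    # of dropping a set of points is approximated by summing the individual shifts.
    import numpy as np

    def ols_drop_influence(X, y, coef_index):
        """First-order effect on beta[coef_index] of dropping each single observation."""
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        resid = y - X @ beta
        per_obs_shift = -(X @ XtX_inv)[:, coef_index] * resid
        return beta[coef_index], per_obs_shift

    def smallest_sign_flipping_fraction(X, y, coef_index):
        """Approximate the smallest data fraction whose removal flips the coefficient's sign."""
        est, shift = ols_drop_influence(X, y, coef_index)
        # Greedily drop the points that push the estimate hardest toward zero.
        ordered = np.sort(shift) if est > 0 else np.sort(shift)[::-1]
        moved = est + np.cumsum(ordered)
        flipped = np.nonzero(np.sign(moved) != np.sign(est))[0]
        if flipped.size == 0:
            return None                      # no subset flips the sign under this approximation
        return (flipped[0] + 1) / X.shape[0]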
Vincent Fortuin: Use Cases for Bayesian Deep Learning in the Age of Foundation Models
Many researchers have pondered the same existential questions since the release of ChatGPT: Is scale really all you need? Will the future of machine learning rely exclusively on foundation models? Should we all drop our current research agendas and work on the next large language model instead? In this talk, I will try to make the case that the answer to all these questions should be a firm “no”, and that now, maybe more than ever, is the time to focus on fundamental questions in machine learning again. I will provide evidence for this by presenting three modern use cases of Bayesian deep learning in the areas of interpretable additive modeling, neural network sparsification, and subspace inference for fine-tuning. Together, these will show that the research field of Bayesian deep learning is very much alive and thriving, and that its potential for valuable real-world impact is only just unfolding.
Yingzhen Li: Towards Causal Deep Generative Models for Sequential Data
One of my research dreams is to build a high-resolution video generation model that enables granular control over, e.g., the scene appearance and the interactions between objects. I tried, and then realised that the need to invent new deep learning tricks for this goal stems from the non-identifiability of my sequential deep generative models. In this talk I will discuss our research towards developing identifiable deep generative models for sequence modelling, and share some recent and ongoing work on switching dynamical models. Throughout the talk I will highlight the balance between the causality "Theorist" and the deep learning "Alchemist", and discuss my opinions on the future of causal deep generative modelling research.