July 3rd, 2025
UCL Bloomsbury Campus
While this event is focused on ICML research, we welcome all posters that fit the ICML topics of interest.
See photos of the event here.
Arrive at Lecture Theatre 1, Cruciform Building. See directions here.
1:00pm - 1:30pm
Welcome & Registration
Picking up name badges
Meeting and greeting
1:30pm - 2:50pm
Talk Session A
Keynote 1 (1:30pm BST; 25min).
Valentin De Bortoli. Distributional Diffusion Models with Scoring Rules.
Diffusion models generate high-quality synthetic data. They operate by defining a continuous-time forward process which gradually adds Gaussian noise to data until fully corrupted. The corresponding reverse process progressively "denoises" a Gaussian sample into a sample from the data distribution. However, generating high-quality outputs requires many discretization steps to obtain a faithful approximation of the reverse process. This is expensive and has motivated the development of many acceleration methods. We propose to accomplish sample generation by learning the posterior distribution of clean data samples given their noisy versions, instead of only the mean of this distribution. This allows us to sample from the probability transitions of the reverse process on a coarse time scale, significantly accelerating inference with minimal degradation of the quality of the output. This is accomplished by replacing the standard regression loss used to estimate conditional means with a scoring rule. We validate our method on image and robot trajectory generation, where we consistently outperform standard diffusion models at few discretization steps.
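For readers unfamiliar with scoring rules, the sketch below shows one common strictly proper scoring rule, the energy score, used as a training loss for a distributional denoiser. The specific rule, interface, and toy data are assumptions for illustration, not necessarily what the talk uses.

```python
import numpy as np

def energy_score_loss(samples_a, samples_b, x0, beta=1.0):
    """Empirical energy score: E||X - x0||^beta - 0.5 * E||X - X'||^beta.

    samples_a, samples_b: two independent batches of model samples of the
        clean data given its noisy version, shape (batch, dim).
    x0: the true clean data, shape (batch, dim).
    Minimising this pulls the whole sampled distribution, not just its mean,
    towards the posterior of clean data given the noisy input.
    """
    fidelity = np.linalg.norm(samples_a - x0, axis=-1) ** beta
    diversity = np.linalg.norm(samples_a - samples_b, axis=-1) ** beta
    return np.mean(fidelity - 0.5 * diversity)

# Toy usage with stand-in samples (a real model would map the noisy input and
# two independent latent draws to two clean-data samples).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 16))
samples_a = x0 + 0.1 * rng.normal(size=(64, 16))
samples_b = x0 + 0.1 * rng.normal(size=(64, 16))
print(energy_score_loss(samples_a, samples_b, x0))
```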
Keynote 2 (1:55pm BST; 25min).
Leena Chennuru Vankadara. Towards a theory of scaling in Deep Learning.
Scaling computational resources is essential for advancing modern deep learning, enabling both quantitative improvements and qualitative emergence of new behaviors. This talk introduces recent theoretical progress on using scaling limits to derive principled scaling rules, such as the muP² rule for Sharpness-Aware Minimization (SAM), demonstrating their impact on stability, feature learning, and hyperparameter predictability. We further address discrepancies between infinite-width theory and practice under standard parameterization (SP), revealing a novel "controlled divergence" regime that explains empirical phenomena including stable feature evolution and learning rate scaling behavior.
Student Spotlight Talk 1 (2:20pm BST; 15min).
Aya Kayal. Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds.
Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework we develop tighter performance guarantees. Specifically, we derive regret bounds of $\mathcal{O}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain (a kernel-specific complexity term) and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO, achieved with richer feedback models, are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
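For reference, the Bradley-Terry-Luce feedback model referred to above is typically written as follows; this is the standard form from the preference-based bandit literature, and the paper's exact notation may differ.

```latex
% BTL preference feedback: at step t the learner submits a pair (x_t, x_t')
% and observes a binary comparison y_t ~ Bernoulli(P(x_t \succ x_t')), with
\[
  \mathbb{P}\left(x \succ x'\right)
  = \frac{\exp\left(f(x)\right)}{\exp\left(f(x)\right) + \exp\left(f(x')\right)}
  = \sigma\!\left(f(x) - f(x')\right),
\]
% where f is the unknown utility function over actions and \sigma is the
% logistic function.
```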
Student Spotlight Talk 2 (2:35pm BST; 15min).
Hugh Dance. Efficiently Vectorized MCMC on Modern Accelerators. See recording here.
With the advent of automatic vectorization tools (e.g., JAX’s vmap), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem—loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like vmap, by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.
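The synchronization problem is easy to see in a toy sketch: vectorizing a single-chain transition that has a data-dependent inner loop with jax.vmap makes every iteration of the loop wait for the slowest chain. The code below is a minimal illustration of that effect under assumed toy dynamics, not the authors' FSM implementation.

```python
import jax
import jax.numpy as jnp

def chain_step(key, x):
    """One toy MCMC-style transition that keeps proposing until acceptance."""
    def cond(state):
        _, _, accepted = state
        return jnp.logical_not(accepted)

    def body(state):
        key, _, _ = state
        key, k_prop, k_acc = jax.random.split(key, 3)
        proposal = x + jax.random.normal(k_prop)
        # Toy acceptance rule; the number of loop iterations is random.
        accepted = jax.random.uniform(k_acc) < jnp.exp(-0.5 * proposal ** 2)
        return key, proposal, accepted

    _, x_new, _ = jax.lax.while_loop(cond, body, (key, x, jnp.array(False)))
    return x_new

keys = jax.random.split(jax.random.PRNGKey(0), 8)  # 8 chains
xs = jnp.zeros(8)

# Vectorizing is one line, but under vmap the while_loop iterates until
# *every* chain has accepted, so fast chains idle while waiting for the
# slowest one; this is the synchronization overhead an FSM formulation avoids.
new_xs = jax.vmap(chain_step)(keys, xs)
print(new_xs)
```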
15-minute break.
3:05pm - 4:00pm
Talk Session B
Keynote 3 (3:05pm BST; 25min).
Jeremias Knoblauch. Near-Optimal Approximations for Bayesian Inference in Function Space.
We propose a scalable inference algorithm for Bayes posteriors defined on a reproducing kernel Hilbert space (RKHS). Given a likelihood function and a Gaussian random element representing the prior, the corresponding Bayes posterior measure can be obtained as the stationary distribution of an RKHS-valued Langevin diffusion. We approximate the infinite-dimensional Langevin diffusion and show that it generalises the sparse variational Gaussian process (SVGP) approach (see Titsias, 2009) to the case where variational families are not parametrically constrained.
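As a rough sketch (using the standard form of preconditioned function-space Langevin dynamics from the infinite-dimensional MCMC literature; the paper's exact RKHS construction may differ), if the posterior has density proportional to $\exp(-\Phi(f))$ with respect to the Gaussian prior $\mathcal{N}(0, C)$, the relevant diffusion is

```latex
% Preconditioned Langevin diffusion on function space (standard form,
% assumed here for illustration):
\[
  \mathrm{d}F_t = -\bigl(F_t + C\,\nabla\Phi(F_t)\bigr)\,\mathrm{d}t
                + \sqrt{2}\,\mathrm{d}W_t^{C},
\]
% where C is the prior covariance operator, \Phi(f) = -\log L(f) is the
% negative log-likelihood, and W^C is a C-Wiener process. Its stationary
% distribution is the Bayes posterior
% \mu(\mathrm{d}f) \propto \exp(-\Phi(f)) \, \mathcal{N}(0, C)(\mathrm{d}f).
```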
Student Spotlight Talk 3 (3:30pm BST; 15min).
Euodia Dodd. Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods.
Membership inference attacks (MIAs) are widely used to empirically assess privacy risks in machine learning models, both providing model-level vulnerability metrics and identifying the most vulnerable training samples. State-of-the-art methods, however, require training hundreds of shadow models with the same architecture as the target model. This makes the computational cost of assessing the privacy of models prohibitive for many practical applications, particularly when used iteratively as part of the model development process and for large models. We propose a novel approach for identifying the training samples most vulnerable to membership inference attacks by analyzing artifacts naturally available during the training process. Our method, Loss Trace Interquartile Range (LT-IQR), analyzes per-sample loss trajectories collected during model training to identify high-risk samples without requiring any additional model training. Through experiments on standard benchmarks, we demonstrate that LT-IQR achieves 92% precision@k=1% in identifying the samples most vulnerable to state-of-the-art MIAs. This result holds across datasets and model architectures, with LT-IQR outperforming both traditional vulnerability metrics, such as loss, and lightweight MIAs using few shadow models. We also show that LT-IQR accurately identifies points vulnerable to multiple MIA methods, and we perform ablation studies. We believe LT-IQR enables model developers to identify vulnerable training samples, for free, as part of the model development process. Our results emphasize the potential of artifact-based methods to efficiently evaluate privacy risks.
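A minimal sketch of the idea, under the assumption that a sample's risk score is simply the interquartile range of its recorded loss trace and that larger scores indicate higher risk (the exact trace collection and thresholds are not specified above):

```python
import numpy as np

def lt_iqr_scores(loss_traces):
    """loss_traces: array of shape (n_samples, n_epochs) of per-sample
    training losses recorded during the normal training run."""
    q75 = np.percentile(loss_traces, 75, axis=1)
    q25 = np.percentile(loss_traces, 25, axis=1)
    return q75 - q25  # interquartile range of each sample's loss trace

def top_k_vulnerable(loss_traces, k_frac=0.01):
    """Indices of the k_frac fraction of samples with the largest LT-IQR."""
    scores = lt_iqr_scores(loss_traces)
    k = max(1, int(k_frac * len(scores)))
    return np.argsort(scores)[-k:][::-1]

# Toy usage: 1,000 samples with 50 epochs of recorded losses.
rng = np.random.default_rng(0)
traces = rng.gamma(shape=2.0, scale=0.5, size=(1000, 50))
print(top_k_vulnerable(traces, k_frac=0.01))
```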
Student Spotlight Talk 4 (3:45pm BST; 15min).
Yassine Abbahaddou. Graph Neural Network Generalization with Gaussian Mixture Model Based Augmentation. See recording here.
Graph Neural Networks (GNNs) have shown great promise in tasks like node and graph classification, but they often struggle to generalize, particularly to unseen or out-of-distribution (OOD) data. These challenges are exacerbated when training data is limited in size or diversity. To address these issues, we introduce a theoretical framework using Rademacher complexity to compute a regret bound on the generalization error and then characterize the effect of data augmentation. This framework informs the design of GRATIN, an efficient graph data augmentation algorithm leveraging the capability of Gaussian Mixture Models (GMMs) to approximate any distribution. Our approach not only outperforms existing augmentation techniques in terms of generalization but also offers improved time complexity, making it highly suitable for real-world applications.
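As a rough illustration of the GMM-based augmentation idea: fit a Gaussian mixture per class to learned embeddings and sample synthetic points from it as additional training data. Which representations are fitted and how samples re-enter training are assumptions here, not GRATIN's actual pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for embeddings produced by a GNN encoder (e.g. one vector per graph).
embeddings = rng.normal(size=(500, 32))
labels = rng.integers(0, 2, size=500)

def gmm_augment(embeddings, labels, n_new=100, n_components=5, seed=0):
    """Fit one GMM per class and sample synthetic embeddings from it."""
    new_x, new_y = [], []
    for c in np.unique(labels):
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        gmm.fit(embeddings[labels == c])
        sampled, _ = gmm.sample(n_new)
        new_x.append(sampled)
        new_y.append(np.full(n_new, c))
    return np.concatenate(new_x), np.concatenate(new_y)

aug_x, aug_y = gmm_augment(embeddings, labels)
print(aug_x.shape, aug_y.shape)  # (200, 32) (200,)
```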
Move to Tavistock Room, Woburn House. See directions here.
4:30pm - 5:45pm
Poster Session
Registration + Talk Sessions (1:00 PM - 4:00 PM)
Address: Cruciform Building, Gower Street, London, WC1E 6BT.
Room: Lecture Theatre 1
Google Maps link here.
Poster Session + Networking/Snacks (4:30 PM - 7:00 PM)
Address: Woburn House, Tavistock Square, London WC1H 9HQ
Room: Tavistock Room
Google Maps link here.
Email: preicml@gmail.com