A public ELLIS reading group exploring the interplay between the mathematical foundations of deep learning and the practical challenge of making ML efficient — from optimization theory to hardware-aware training.
13. April 2026 @ 5pm CEST / 11am EDT / 8am PDT [timezone converter]
Sustainable Development and Energy Efficiency in Deep Learning
Raphael Fischer, TU Dortmund and Lamarr Institute, Germany
Abstract: With the growing environmental impact of modern deep learning, researchers need to establish reporting standards that go beyond predictive performance and explicitly account for sustainability. However, quantifying and reporting the energy efficiency of models and systems remains hard. The talk explores methods and experimental insights for understanding and balancing model performance in a multi-dimensional way, thus paving the way for sustainable development in the field.
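To make the reporting problem concrete, here is a minimal, hypothetical sketch of logging a coarse energy estimate (assumed average power draw times measured runtime) next to predictive performance. The power figure, the placeholder training loop, and all numbers are illustrative assumptions rather than the talk's methodology; dedicated measurement tools give far more accurate readings than this rough estimate.

```python
# Hypothetical sketch: report an energy estimate alongside accuracy.
# AVG_GPU_POWER_WATTS, the placeholder training loop, and all numbers
# are illustrative assumptions, not results from the talk.

import time

AVG_GPU_POWER_WATTS = 250.0  # assumed average power draw of the training device

def run_training():
    """Stand-in for a real training loop; returns a validation accuracy."""
    time.sleep(2.0)          # pretend compute
    return 0.91              # illustrative accuracy

start = time.perf_counter()
accuracy = run_training()
elapsed_s = time.perf_counter() - start

# energy [J] = power [W] * time [s]; 1 kWh = 3.6e6 J
energy_kwh = AVG_GPU_POWER_WATTS * elapsed_s / 3.6e6
print(f"accuracy={accuracy:.3f}  runtime={elapsed_s:.1f}s  energy~{energy_kwh:.6f} kWh")
```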
27. April 2026 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task
Hannah Pinson, Eindhoven University of Technology, Netherlands
Abstract: Our theoretical understanding of neural networks lags behind their empirical success. One important unexplained phenomenon is why and how, during training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. We investigate the mechanism by which gradient descent achieves this by analyzing the learning dynamics at the level of individual neurons in single-hidden-layer ReLU networks. We identify three dynamical principles (mutual alignment, unlocking, and racing) that together explain why we can often successfully reduce capacity after training through the merging of equivalent neurons or the pruning of low-norm weights. In particular, we explain the mechanism behind the lottery ticket conjecture: why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
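The "pruning of low-norm weights" mentioned in the abstract can be illustrated with a small, self-contained sketch. The setup below (random stand-in weights, the per-neuron score, the pruning threshold) is an assumption for illustration, not the speaker's code: each hidden unit of a one-hidden-layer ReLU network is scored by the product of its incoming and outgoing weight norms, and the lowest-scoring half is dropped.

```python
# Illustrative sketch (assumed setup, not the talk's code): pruning low-norm
# hidden neurons in a one-hidden-layer ReLU network.

import numpy as np

rng = np.random.default_rng(0)
d, h, k = 10, 64, 1                                      # input dim, hidden width, output dim
W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(h, d))     # stand-in for trained first-layer weights
w2 = rng.normal(scale=1.0 / np.sqrt(h), size=(h, k))     # stand-in for trained readout weights

def forward(x, W1, w2):
    return np.maximum(W1 @ x, 0.0) @ w2                  # ReLU hidden layer, linear readout

# Score each hidden unit by the product of its incoming and outgoing weight norms.
neuron_norm = np.linalg.norm(W1, axis=1) * np.linalg.norm(w2, axis=1)
keep = neuron_norm > np.median(neuron_norm)              # drop the lowest-norm half

x = rng.normal(size=d)
full = forward(x, W1, w2)
pruned = forward(x, W1[keep], w2[keep])
print(f"kept {keep.sum()}/{h} neurons, output change {np.abs(full - pruned).item():.4f}")
```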
11. May 2026 @ 5pm CEST / 11am EDT / 8am PDT [timezone converter]
Finite-Time Lyapunov Exponents of Deep Neural Networks
Bernhard Mehlig, Department of Physics, University of Gothenburg, Sweden
Abstract: We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep feed-forward networks and dynamical systems, where the growth or decay of local perturbations is characterized by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualize the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
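For readers unfamiliar with the quantity in the title, the sketch below shows one common way to compute the maximal finite-time Lyapunov exponent of a network's input-to-output map at an input x: accumulate the layer-wise Jacobian of a deep ReLU network (layers playing the role of time steps) and take (1/L) times the log of its largest singular value. The depth, width, and initialization are illustrative assumptions, not the speaker's setup.

```python
# Minimal sketch (assumed setup, not the speaker's code): maximal finite-time
# Lyapunov exponent of a deep ReLU network at input x, computed from the
# Jacobian J = D_L W_L ... D_1 W_1 as lambda = (1/L) * log sigma_max(J).

import numpy as np

rng = np.random.default_rng(0)
L, n = 8, 100                                        # depth and width (illustrative)
weights = [rng.normal(scale=np.sqrt(2.0 / n), size=(n, n)) for _ in range(L)]

def max_ftle(x, weights):
    J = np.eye(len(x))
    h = x
    for W in weights:
        pre = W @ h
        D = np.diag((pre > 0).astype(float))         # ReLU derivative at this layer
        J = D @ W @ J                                 # chain rule through the layer
        h = np.maximum(pre, 0.0)
    sigma_max = np.linalg.svd(J, compute_uv=False)[0] # largest singular value
    return np.log(sigma_max) / len(weights)           # growth rate per layer

x = rng.normal(size=n)
print(f"maximal finite-time Lyapunov exponent at x: {max_ftle(x, weights):.3f}")
```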
25. May 2026 @ 5pm CEST / 11am EDT / 8am PDT [timezone converter]
Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant
Eugenia Iofinova, Institute of Science and Technology Austria
Andrej Jovanovic, University of Cambridge, UK
Abstract: The availability of powerful open-source large language models (LLMs) opens up exciting use cases, such as using personal data to fine-tune these models to imitate a user's unique writing style. Two key requirements for such assistants are personalization, in the sense that the assistant should recognizably reflect the user's own writing style, and privacy: users may justifiably be wary of uploading extremely personal data, such as their email archive, to a third-party service. In this paper, we present a new design and evaluation for such an automated assistant, for the specific use case of email generation, which we call Panza. Panza's personalization features are based on a combination of fine-tuning with a variant of the Reverse Instructions technique and Retrieval-Augmented Generation (RAG). We demonstrate that this combination allows us to fine-tune an LLM to reflect a user's writing style using limited data, while executing on extremely limited resources, e.g. on a free Google Colab instance. Our key methodological contribution is the first detailed study of evaluation metrics for this personalized writing task, and of how different choices of system components (the use of RAG and of different fine-tuning approaches) impact the system's performance. Additionally, we demonstrate that very little data (under 100 email samples) is sufficient to create models that convincingly imitate humans. This finding highlights a previously unknown attack vector in language models: access to a small number of writing samples can allow a bad actor to cheaply create generative models that imitate a target's writing style.
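As a rough illustration of the RAG side of such an assistant (not Panza's actual pipeline, prompts, or models; a toy hash-based embedding stands in for a real sentence encoder so the example stays self-contained), one can retrieve the user's most similar past emails and splice them into the generation prompt:

```python
# Hedged sketch of retrieval-augmented prompt assembly for a personal email
# assistant. The toy_embed function, the example emails, and the prompt wording
# are illustrative assumptions, not Panza's implementation.

import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Deterministic toy embedding; a real system would use a sentence encoder.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

past_emails = [
    "Thanks for the draft, I left a few comments inline.",
    "Can we move our meeting to Thursday afternoon?",
    "Attached are the slides for tomorrow's reading group.",
]
index = np.stack([toy_embed(e) for e in past_emails])    # (num_emails, dim)

def build_prompt(instruction: str, k: int = 2) -> str:
    scores = index @ toy_embed(instruction)               # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]                    # indices of the k most similar emails
    context = "\n".join(f"- {past_emails[i]}" for i in top)
    return (f"Here are examples of the user's past emails:\n{context}\n\n"
            f"Write a new email in the same style. Instruction: {instruction}")

print(build_prompt("Ask to reschedule the reading group to next week."))
```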
8. June 2026 @ 5pm CEST / 11am EDT / 8am PDT [timezone converter]
Revisiting Glorot Initialization for Long-Range Linear Recurrences
Mariia Seleznova, Ludwig Maximilian University of Munich, Germany
Abstract: Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, designed to ensure stable signal propagation—but derived under the infinite-width, fixed-length regime—an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length t = O(sqrt(n)), where n is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.
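A quick way to see the phenomenon described above is to iterate a purely linear recurrence with a Glorot-initialized recurrent matrix and track the hidden-state norm over sequence lengths around and beyond sqrt(n). The sketch below does this and compares against a generically rescaled matrix whose spectral radius sits slightly below one; the width, the step counts, and the specific rescaling factor are illustrative assumptions, and the paper's dimension-aware rescaling may differ.

```python
# Illustrative sketch (not the paper's code): iterating h_{t+1} = W h_t with a
# Glorot-initialized W and watching the hidden-state norm. The rescaled variant
# is a generic stand-in for a dimension-aware correction.

import numpy as np

rng = np.random.default_rng(0)
n = 512
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))   # Glorot: Var = 2/(n + n) = 1/n

def norm_after(W, steps, h0):
    h = h0.copy()
    for _ in range(steps):
        h = W @ h
    return np.linalg.norm(h)

h0 = rng.normal(size=n)
h0 /= np.linalg.norm(h0)

rho = np.max(np.abs(np.linalg.eigvals(W)))             # empirical spectral radius
W_rescaled = W * (1.0 - 1.0 / np.sqrt(n)) / rho        # push the radius slightly below one

for steps in (int(np.sqrt(n)), 4 * int(np.sqrt(n)), 16 * int(np.sqrt(n))):
    print(f"t={steps:4d}  ||h_t|| Glorot: {norm_after(W, steps, h0):10.3e}"
          f"   rescaled: {norm_after(W_rescaled, steps, h0):10.3e}")
```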
30. March 2026 @ 5pm CEST — ▶️ YouTube
s1: Simple test-time scaling
Niklas Muennighoff, Stanford University, Allen Institute for AI, Contextual AI, USA
arXiv: https://arxiv.org/abs/2501.19393
16. March 2026 @ 5pm CET — ▶️ YouTube
Procedural Pretraining: Warming Up Language Models with Abstract Data
Liangze Jiang, EPFL and Idiap Research Institute, Switzerland
Zachary Shinnick, Australian Institute for Machine Learning (AIML), Adelaide University, Australia
arXiv: https://arxiv.org/pdf/2601.21725
9. March 2026 @ 5pm CET — ▶️ YouTube
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen, Stanford University, USA
arXiv: https://arxiv.org/abs/2211.05729
2. March 2026 @ 5pm CET — ▶️ YouTube
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter, CISPA Helmholtz Center for Information Security, Germany
arXiv: https://arxiv.org/pdf/2510.14231
9. February 2026 @ 5pm CET — ▶️ YouTube
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
Yedi Zhang, Gatsby Computational Neuroscience Unit, University College London, UK
arXiv: https://arxiv.org/pdf/2512.20607
The paper on Muon that Yedi mentioned in the talk is now on arXiv: https://arxiv.org/abs/2603.00742
19. January 2026 @ 5pm CET — ▶️ YouTube
Fast Video Generation (multiple papers)
Rahim Entezari, Wayve.ai
12. January 2026 @ 5pm CET — ▶️ YouTube
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Ting Han, Lamarr Institute, TU Dortmund, and Institute for AI in Medicine, University Hospital Essen, Germany
OpenReview: https://openreview.net/pdf?id=lbtOctHDQ3
Contact us with questions or suggestions at efficientml@gmail.com.
Self-nominations to present your published work in the reading group are welcome.