A public ELLIS reading group exploring the interplay between the mathematical foundations of deep learning and the practical challenge of making ML efficient — from optimization theory to hardware-aware training.
30. March 2026 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
s1: Simple test-time scaling
Niklas Muennighoff, Stanford University, Allen Institute for AI, Contextual AI, USA
Abstract: Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI’s o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24.
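The budget-forcing idea in the abstract can be sketched as a decoding loop. This is a minimal illustration, not the s1 implementation: `budget_force`, `toy_model`, and the token names are hypothetical stand-ins for a real language model's generate/stop interface.

```python
# Sketch of budget forcing: cap the thinking budget, and when the model
# tries to stop before a minimum number of reasoning steps, append "Wait"
# to lengthen its generation. `model` is any callable text -> next chunk.

def budget_force(model, prompt, min_thinking_steps, max_thinking_steps,
                 end_token="</think>", nudge=" Wait"):
    """Generate reasoning chunks, forcing the step count into [min, max]."""
    text = prompt
    steps = 0
    while steps < max_thinking_steps:
        chunk = model(text)
        if chunk == end_token:
            if steps >= min_thinking_steps:
                break                 # budget satisfied: allow the stop
            text += nudge             # too early: suppress the stop
        else:
            text += chunk
            steps += 1
    return text

# Toy stand-in "model": emits one reasoning step, then tries to stop.
def toy_model(text):
    return "</think>" if text.endswith(" step") else " step"

out = budget_force(toy_model, "Q:", min_thinking_steps=3, max_thinking_steps=8)
```

With the toy model above, the loop injects “ Wait” twice before allowing the model to stop after its third reasoning step.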
13. April 2026 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Sustainable Development and Energy Efficiency in Deep Learning
Raphael Fischer, TU Dortmund and Lamarr Institute, Germany
Abstract: With the growing environmental impact of modern deep learning, researchers need to establish reporting standards that go beyond predictive performance and explicitly account for sustainability. However, quantifying and reporting the energy efficiency of models and systems remains hard. The talk explores methods and experimental insights for understanding and balancing model performance along multiple dimensions, paving the way for sustainable development in the field.
27. April 2026 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task
Hannah Pinson, Eindhoven University of Technology, Netherlands
Abstract: Our theoretical understanding of neural networks lags behind their empirical success. One important unexplained phenomenon is why and how, during training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. Here we investigate the mechanism by which gradient descent achieves this by analyzing the learning dynamics at the level of individual neurons in single-hidden-layer ReLU networks. We identify three dynamical principles -- mutual alignment, unlocking, and racing -- that together explain why we can often successfully reduce capacity after training by merging equivalent neurons or pruning low-norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, i.e. why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
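The post-training compression step the abstract mentions (merging equivalent neurons, pruning low-norm weights) can be illustrated for a single-hidden-layer ReLU network. This is a hedged sketch, not the paper's method: the thresholds and the `compress` helper are illustrative choices, using the fact that for ReLU, scaling an incoming weight vector by c > 0 scales the neuron's output by c.

```python
# Sketch: compress a single-hidden-layer ReLU net f(x) = sum_i a_i * relu(w_i @ x)
# by (1) pruning neurons with near-zero incoming weight norm and (2) merging
# neurons whose incoming weights point in the same direction, folding the
# scale difference into the outgoing weight. Thresholds are illustrative.
import numpy as np

def compress(W_in, w_out, align_thresh=0.99, norm_thresh=1e-3):
    """W_in: (hidden, inputs) incoming weights; w_out: (hidden,) outgoing."""
    keep_W, keep_out = [], []
    for w, a in zip(W_in, w_out):
        norm = np.linalg.norm(w)
        if norm < norm_thresh:
            continue                          # prune low-norm neuron
        merged = False
        for i, kw in enumerate(keep_W):
            cos = w @ kw / (norm * np.linalg.norm(kw))
            if cos > align_thresh:            # equivalent direction: merge
                keep_out[i] += a * norm / np.linalg.norm(kw)
                merged = True
                break
        if not merged:
            keep_W.append(w)
            keep_out.append(a)
    return np.array(keep_W), np.array(keep_out)

# Three neurons: two aligned (directions [1,0] and [2,0]) plus one tiny-norm.
W_in = np.array([[1.0, 0.0], [2.0, 0.0], [1e-4, 0.0]])
w_out = np.array([1.0, 0.5, 3.0])
W_c, out_c = compress(W_in, w_out)
```

Here the three hidden neurons collapse to one, with outgoing weight 1.0 + 0.5 * 2 = 2.0, leaving the network function essentially unchanged.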
11. May 2026 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Finite-Time Lyapunov Exponents of Deep Neural Networks
Bernhard Mehlig, Department of Physics, University of Gothenburg, Sweden
Abstract: We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep feed-forward networks and dynamical systems, where the growth or decay of local perturbations is characterized by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualize the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
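The dynamical-systems analogy in the abstract can be sketched numerically: treat each layer as one time step and estimate the maximal finite-time Lyapunov exponent from the growth of a small input perturbation. The architecture, random weights, and probe-based estimator below are illustrative assumptions, not the talk's setup.

```python
# Sketch: maximal finite-time Lyapunov exponent of a deep tanh network,
# estimated as lambda = (1/L) * log(||f(x + eps*u) - f(x)|| / eps),
# maximized over random perturbation directions u. Weights are random.
import numpy as np

rng = np.random.default_rng(0)
L, d = 10, 20                                    # depth and width
Ws = [rng.normal(0, 1.5 / np.sqrt(d), (d, d)) for _ in range(L)]

def forward(x):
    for W in Ws:
        x = np.tanh(W @ x)
    return x

def max_ftle(x, eps=1e-6, n_probes=50):
    """Finite-difference estimate of the maximal exponent at input x."""
    best = -np.inf
    for _ in range(n_probes):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                   # unit perturbation direction
        growth = np.linalg.norm(forward(x + eps * u) - forward(x)) / eps
        best = max(best, np.log(growth) / L)
    return best

lam = max_ftle(rng.normal(size=d))
```

A positive `lam` means the network locally expands input perturbations at that point; mapping `lam` over a grid of inputs reveals the ridge structures the abstract describes.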
16. March 2026 @ 5pm CET — ▶️ YouTube
Procedural Pretraining: Warming Up Language Models with Abstract Data
Liangze Jiang, EPFL and Idiap Research Institute, Switzerland
Zachary Shinnick, Australian Institute for Machine Learning (AIML), Adelaide University, Australia
arXiv: https://arxiv.org/pdf/2601.21725
9. March 2026 @ 5pm CET — ▶️ YouTube
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen, Stanford University, USA
arXiv: https://arxiv.org/abs/2211.05729
2. March 2026 @ 5pm CET — ▶️ YouTube
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter, CISPA Helmholtz Center for Information Security, Germany
arXiv: https://arxiv.org/pdf/2510.14231
9. February 2026 @ 5pm CET — ▶️ YouTube
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
Yedi Zhang, Gatsby Computational Neuroscience Unit, University College London, UK
arXiv: https://arxiv.org/pdf/2512.20607
The paper on Muon that Yedi mentioned in the talk is now on arXiv: https://arxiv.org/abs/2603.00742
19. January 2026 @ 5pm CET — ▶️ YouTube
Fast Video Generation (multiple papers)
Rahim Entezari, Wayve.ai
12. January 2026 @ 5pm CET — ▶️ YouTube
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Ting Han, Lamarr Institute, TU Dortmund and Institute for AI in Medicine, University Hospital Essen (UK Essen), Germany
OpenReview: https://openreview.net/pdf?id=lbtOctHDQ3
Contact us with questions or suggestions at efficientml@gmail.com.
Self-nominations to present your published work in the reading group are welcome.