Today’s world needs orders-of-magnitude more efficient ML to address environmental and energy crises, optimize resource consumption, and improve sustainability. With the end of Moore’s Law and Dennard scaling, we can no longer expect more and faster transistors for the same cost and power budget. This is particularly problematic given the growing data volumes collected by widely deployed sensors and systems, the ever-larger models we train, and the fact that many ML models have to run on edge devices to minimize latency, preserve privacy, and save energy. The algorithmic efficiency of deep learning therefore becomes essential for achieving the desired speedups, alongside efficient hardware implementations and compiler optimizations for common math operations. ML efficiency is being actively investigated in many research communities. This reading group aims to help onboard young scientists interested in the topic and offers researchers at all levels a platform for open dialog, to foster collaboration and stay up to date with rapid developments in the field of efficient ML. We welcome and discuss fresh research findings published as preprints or recently presented at research venues. The list of topics includes but is not limited to:
Low-resource and low-data ML, scaling laws
Model ensembling, model merging, MoEs, efficient inference
Sparsity, efficient network architectures, NAS, meta-learning, transfer learning
Multi-modal and multi-task learning, ensembling, mixtures of experts
Invariance, equivariance, data augmentation, generalization and robustness
Continual learning, test time adaptation, reconfiguration, on-device learning
Benchmarking ML training and inference methods, training efficiency
Distributed and collaborative learning, edge computing
New resource-efficient ML paradigms
Subscribe to the Efficient ML mailing list / import the Efficient ML Events calendar to receive information on how to join the virtual talks.
2. June 2025 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, EcoCloud, EPFL, Switzerland
Abstract: The increasing size of deep neural networks (DNNs) necessitates effective model compression to reduce their computational and memory footprints. Sparsity and quantization are two prominent compression methods that have been shown to reduce DNNs’ computational and memory footprints significantly while preserving model accuracy. However, how these two methods interact when combined remains a key question for developers, as many tacitly assume that they are orthogonal, meaning that their combined use does not introduce additional errors beyond those introduced by each method independently. In this paper, we provide the first mathematical proof that sparsity and quantization are non-orthogonal. We corroborate these results with experiments spanning a range of large language models, including the OPT and LLaMA model families (with 125M to 8B parameters), and vision models like ViT and ResNet. We show that the order in which we apply these methods matters because applying quantization before sparsity may disrupt the relative importance of tensor elements, which may inadvertently remove significant elements from a tensor. More importantly, we show that even if applied in the correct order, the compounded errors from sparsity and quantization can significantly harm accuracy. Our findings extend to the efficient deployment of large models on resource-constrained compute platforms to reduce serving cost, offering insights into best practices for applying these compression methods to maximize hardware resource efficiency without compromising accuracy.
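For readers new to the topic, here is a minimal NumPy sketch (not the authors' code) of the ordering effect described in the abstract: with coarse quantization, quantizing before magnitude pruning can change which elements survive, because rounding may reorder their relative magnitudes. The bit width, pruning ratio, and tensor below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=12).astype(np.float32)  # toy weight tensor

def quantize(x, n_bits=3):
    """Uniform symmetric quantization (illustrative, not the paper's scheme)."""
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale) * scale

def prune(x, sparsity=0.5):
    """Magnitude pruning: zero out the smallest-magnitude fraction of elements."""
    out = x.copy()
    out[np.argsort(np.abs(x))[: int(sparsity * x.size)]] = 0.0
    return out

# Sparsity-then-quantization vs. quantization-then-sparsity: with coarse
# quantization the two orders can zero out different elements.
s_then_q = quantize(prune(w))
q_then_s = prune(quantize(w))
print("pruned (sparsify->quantize):", np.flatnonzero(s_then_q == 0))
print("pruned (quantize->sparsify):", np.flatnonzero(q_then_s == 0))
```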
16. June 2025 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Photon: Federated LLM Pre-Training
Lorenzo Sani, University of Cambridge and Flower Labs, UK
Abstract: Scaling large language models (LLMs) demands extensive data and computing resources, which are traditionally constrained to data centers by the high-bandwidth requirements of distributed training. Low-bandwidth methods like federated learning (FL) could enable collaborative training of larger models across weakly connected GPUs if they can effectively be used for pre-training. To achieve this, we introduce Photon, the first complete system for federated end-to-end LLM training, leveraging cross-silo FL for global-scale training with minimal communication overheads. Using Photon, we train the first federated family of decoder-only LLMs from scratch. We show that: (1) Photon can train model sizes up to 7B in a federated fashion while reaching an even better perplexity than centralized pre-training; (2) Photon model training time decreases with available compute, achieving a similar compute-time trade-off to centralized training; and (3) Photon outperforms the wall-clock time of baseline distributed training methods by 35% while communicating 64x-512x less. Our proposal is robust to data heterogeneity and converges twice as fast as previous methods like DiLoCo. This surprising data efficiency stems from a unique approach combining small client batch sizes with extremely high learning rates, enabled by federated averaging's robustness to hyperparameters. Photon thus represents the first economical system for global, internet-wide LLM pre-training.
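As background for the talk, the toy NumPy sketch below shows the federated-averaging pattern the abstract builds on: each client takes several local SGD steps on small batches, and the server averages the client models once per communication round. The linear-regression task, client data, and all hyperparameters are illustrative and unrelated to the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, rounds, local_steps, lr = 16, 4, 20, 8, 0.2
w_true = rng.normal(size=d)
clients = []
for _ in range(n_clients):                      # toy, homogeneous client data
    X = rng.normal(size=(32, d))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=32)))

w_global = np.zeros(d)
for _ in range(rounds):                         # one communication per round
    updates = []
    for X, y in clients:
        w = w_global.copy()
        for _ in range(local_steps):            # local SGD on small batches
            i = rng.choice(len(X), size=8, replace=False)
            w -= lr * X[i].T @ (X[i] @ w - y[i]) / len(i)
        updates.append(w)
    w_global = np.mean(updates, axis=0)         # server-side model averaging
print("distance to ground truth:", np.linalg.norm(w_global - w_true))
```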
23. June 2025 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, ETH Zurich, Switzerland
Abstract: This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nyström initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nyström initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA) commonly used for finetuning foundation models. Our approach, NoRA, i.e., LoRA with Nyström initialization, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
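To give a flavour of the setting, here is a rough NumPy sketch of scaled gradient descent on a symmetric low-rank factorization with a Nyström-style start X0 = M @ Omega for a Gaussian sketch Omega; the exact initialization, step size, and the asymmetric case treated in the paper may differ from this toy version.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
U = rng.normal(size=(n, r))
M = U @ U.T                                   # ground-truth rank-r PSD matrix

Omega = rng.normal(size=(n, r))
X = M @ Omega                                 # Nystrom-style initialization
eta = 0.5                                     # illustrative step size
for _ in range(50):
    G = (X @ X.T - M) @ X                     # gradient of 0.25*||XX^T - M||_F^2
    X = X - eta * G @ np.linalg.inv(X.T @ X)  # scaled (preconditioned) step
print("relative error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```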
30. June 2025 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Improving Neural Network Accuracy by Concurrently Training with a Twin Network
Benjamin Vandersmissen, IDLab, University of Antwerp, Belgium
Abstract: Recently within Spiking Neural Networks, a method called Twin Network Augmentation (TNA) has been introduced. This technique claims to improve the validation accuracy of a Spiking Neural Network simply by training two networks in conjunction and matching the logits via the Mean Squared Error loss. In this paper, we validate the viability of this method on a wide range of popular Convolutional Neural Network (CNN) benchmarks and compare this approach to existing Knowledge Distillation schemes. Next, we conduct an in-depth study of the different components that make up TNA and determine that its effectiveness does not stem solely from the increase in trainable parameters, but rather from the training methodology itself. Finally, we analyse the representations learned by networks trained with TNA and highlight their superiority in a number of tasks, thus empirically proving the applicability of Twin Network Augmentation to CNN models.
OpenReview: https://openreview.net/pdf?id=TEmE9PSC65
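For context, here is a condensed PyTorch sketch of the recipe as we read the abstract above: two networks are trained jointly on the same batch, each with a cross-entropy loss, plus an MSE penalty pulling their logits together. The architecture, data, and loss weight alpha are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

def make_net():                                   # stand-in for a CNN backbone
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                         nn.ReLU(), nn.Linear(256, 10))

net_a, net_b = make_net(), make_net()
opt = torch.optim.SGD(list(net_a.parameters()) + list(net_b.parameters()), lr=0.01)
ce, mse, alpha = nn.CrossEntropyLoss(), nn.MSELoss(), 1.0

x = torch.randn(8, 3, 32, 32)                     # stand-in for a CIFAR batch
y = torch.randint(0, 10, (8,))
logits_a, logits_b = net_a(x), net_b(x)
loss = ce(logits_a, y) + ce(logits_b, y) + alpha * mse(logits_a, logits_b)
opt.zero_grad()
loss.backward()
opt.step()              # at inference, either network can be used on its own
```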
14. July 2025 @ 5pm CEST / 11am EST / 8am PST [timezone converter]
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang, Graz University of Technology, Austria
Abstract: We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers, significantly reducing the model size without the need for fine-tuning or access to training data. Unlike existing methods, model folding preserves data statistics during compression by leveraging k-means clustering, and using novel data-free techniques to prevent variance collapse or explosion. Our theoretical framework and experiments across standard benchmarks, including ResNet18 and LLaMA-7B, demonstrate that model folding achieves comparable performance to data-driven compression techniques and outperforms recently proposed data-free methods, especially at high sparsity levels. This approach is particularly effective for compressing large-scale models, making it suitable for deployment in resource-constrained environments.
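As a rough illustration of the core idea (not the authors' implementation), the sketch below clusters the neurons of one layer with k-means, keeps the cluster centroids, and folds the merge into the next layer by summing its corresponding input weights; the paper's data-free repair of activation statistics is omitted, and all shapes are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 32))        # layer 1: 64 neurons with 32 inputs
W2 = rng.normal(size=(10, 64))        # layer 2 consumes the 64 activations
k = 16                                # target number of merged neurons

km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(W1)
W1_folded = km.cluster_centers_       # (k, 32): each cluster kept as its centroid
W2_folded = np.zeros((W2.shape[0], k))
for j, c in enumerate(km.labels_):    # route each removed neuron's outgoing
    W2_folded[:, c] += W2[:, j]       # weights to its cluster representative

x = rng.normal(size=32)
full = W2 @ np.maximum(W1 @ x, 0)                 # original two-layer ReLU MLP
folded = W2_folded @ np.maximum(W1_folded @ x, 0) # folded approximation
print("output drift:", np.linalg.norm(full - folded))
```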
12. May 2025 @ 5pm CEST — ▶️ YouTube
Do Deep Neural Network Solutions Form a Star Domain?
Ankit Sonthalia, Tübingen AI Center, Universität Tübingen, Germany
arXiv: https://arxiv.org/abs/2403.07968
14. April 2025 @ 5pm CEST — ▶️ YouTube
Leveraging the True Depth of LLMs
Ramón Calvo González, University of Geneva, Switzerland
arXiv: https://arxiv.org/pdf/2502.02790
7. April 2025 @ 5pm CEST — ▶️ YouTube
Mobile Video Diffusion
Denis Korzhenkov, Qualcomm AI Research, Netherlands
arXiv: https://arxiv.org/pdf/2412.07583
17. March 2025 @ 5pm CET — ▶️ YouTube
The Curse of Depth in Large Language Models
Shiwei Liu, Mathematics Institute, University of Oxford, UK
arXiv: https://arxiv.org/abs/2412.13795
3. February 2025 @ 5pm CET — ▶️ YouTube
HydraViT: Stacking Heads for a Scalable ViT
Janek Haberer and Ali Hojjat, Kiel University, Germany
arXiv: https://arxiv.org/pdf/2409.17978
16. December 2024 @ 4pm CET
Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks
Wenying Duan, Nanchang University, China and Xiaoxi He, University of Macau, China
Link: https://dl.acm.org/doi/10.1145/3637528.3671912
2. December 2024 @ 5pm CET — ▶️ YouTube
Towards Meta-Pruning via Optimal Transport
Alexander Theus and Olin Geimer, ETH Zurich, Switzerland
arXiv: https://arxiv.org/abs/2402.07839
11. November 2024 @ 5pm CET — ▶️ YouTube
The Uncanny Valley: Exploring Adversarial Robustness from a Flatness Perspective
Nils Philipp Walter, CISPA Helmholtz Center for Information Security, Germany
Linara Adilova, Ruhr University Bochum, Germany
arXiv: https://arxiv.org/abs/2405.16918
28. October 2024 @ 5pm CET — ▶️ YouTube
Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models
Eldar Kurtic, Institute of Science and Technology Austria (ISTA) and NeuralMagic, Austria
arXiv: https://arxiv.org/pdf/2406.12572
21. October 2024 @ 5pm CEST — ▶️ YouTube
Toward Greener Matrix Operations by Lossless Compressed Formats
Francesco Tosoni, University of Pisa, Italy
arXiv: https://arxiv.org/pdf/2409.18620
16. September 2024 @ 5pm CEST — ▶️ YouTube
Localizing Task Information for Improved Model Merging and Compression
Ke Wang and Nikolaos Dimitriadis, EPFL, Switzerland
arXiv: https://arxiv.org/pdf/2405.07813
2. September 2024 @ 5pm CEST — ▶️ YouTube
Expand-and-Cluster: Parameter Recovery of Neural Networks
Flavio Martinelli, EPFL, Switzerland
arXiv: https://arxiv.org/abs/2304.12794
22. July 2024 @ 5pm CEST — ▶️ YouTube
Efficiency for Free: Ideal Data Are Transportable Representations
Peng Sun, Zhejiang University, China
arXiv: https://arxiv.org/pdf/2405.14669
15. July 2024 @ 5pm CEST — ▶️ YouTube
Subspace-Configurable Networks
Dong Wang, Graz University of Technology, Austria
arXiv: https://arxiv.org/pdf/2305.13536
17. June 2024 @ 5pm CEST — ▶️ YouTube
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao, Beijing University of Posts and Telecommunications, China
arXiv: https://arxiv.org/pdf/2405.08553
27. May 2024 @ 5pm CEST
LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
Youbing Hu, AIoT Lab, Harbin Institute of Technology, China and Yun Cheng, Swiss Data Science Center, Switzerland
arXiv: https://arxiv.org/pdf/2402.00033.pdf
13. May 2024 @ 5pm CEST — ▶️ YouTube
94% on CIFAR-10 in 3.29 Seconds on a Single GPU
Keller Jordan, Independent Researcher, USA
arXiv: https://arxiv.org/pdf/2404.00498.pdf
29. April 2024 @ 5pm CEST
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Rahim Entezari, Stability AI, Austria
arXiv: https://arxiv.org/pdf/2403.03206.pdf
15. April 2024 @ 5pm CEST — ▶️ YouTube
Just Say the Name: Online Continual Learning with Category Names Only via Data Generation
Diganta Misra, Max Planck Institute, Tübingen, Germany
arXiv: https://arxiv.org/pdf/2403.10853.pdf
8. April 2024 @ 5pm CEST
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jimenez, EPFL, Switzerland
arXiv: https://arxiv.org/pdf/2305.12827.pdf
18. March 2024 @ 5pm CET
Sit Back and Relax: Learning to Drive Incrementally in All Weather Conditions
M. Jehanzeb Mirza, Graz University of Technology, Austria
arXiv: https://arxiv.org/abs/2305.18953
26. February 2024 @ 5pm CET
Less is More – Towards parsimonious multi-task models using structured sparsity
Richa Upadhyay, Luleå University of Technology, Sweden
OpenReview: https://openreview.net/pdf?id=0VU6Vlh0zy
19. February 2024 @ 5pm CET
How to Prune Your Language Model: Recovering Accuracy on the “Sparsity May Cry” Benchmark
Eldar Kurtic, Institute of Science and Technology Austria (ISTA), Austria
arXiv: https://arxiv.org/abs/2312.13547
13. February 2024 @ 5pm CET
Efficient Continual and On-Device Learning for Edge Computing Platforms
Young D. Kwon, University of Cambridge and Samsung AI, UK
arXiv: https://arxiv.org/abs/2311.11420, https://arxiv.org/abs/2307.09988
29. January 2024 @ 5pm CET
HRBP: Hardware-friendly Regrouping towards Block-based Pruning for Sparse CNN Training
Haoyu Ma, University of California, Irvine, USA
OpenReview: https://openreview.net/pdf?id=VP1Xrdz0Bp
22. January 2024 @ 5pm CET
A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
Alicia Curth and Alan Jeffares, University of Cambridge, UK
arXiv: https://arxiv.org/abs/2310.18988
18. December 2023 @ 5pm CET
MosaicML: A Three-Year Retrospective of the Journey and Technology
Jonathan Frankle, MosaicML / Databricks, USA
11. December 2023 @ 5pm CET
Dual Algorithmic Reasoning
Danilo Numeroso, University of Pisa, Italy
arXiv: https://arxiv.org/abs/2302.04496
27. November 2023 @ 5pm CET
Privacy Side Channels in Machine Learning Systems
Edoardo Debenedetti, ETH Zurich, Switzerland
arXiv: https://arxiv.org/abs/2309.05610
17. November 2023 @ 3pm CET
Layer-wise Linear Mode Connectivity
Linara Adilova, Ruhr University Bochum, Germany
arXiv: https://arxiv.org/abs/2307.06966
6. November 2023 @ 5pm CET
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin, Institute of Computer Graphics and Vision, Graz University of Technology, Austria
arXiv: https://arxiv.org/abs/2303.08914
30. October 2023 @ 5pm CET
Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko, EPFL, Switzerland
arXiv: https://arxiv.org/abs/2310.04415
27. October 2023 @ 3pm CET
Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
Xinyue Ma, UNIST, South Korea
arXiv: https://arxiv.org/abs/2308.06053
18. September 2023 @ 5pm CET
Localised Adaptive Spatial-Temporal Graph Neural Network
Xiaoxi He, University of Macau, China
arXiv: https://arxiv.org/abs/2306.06930
11. September 2023 @ 5pm CET
AutoCoreset: An Automatic Practical Coreset Construction Framework
Alaa Maalouf, Computer Science and Artificial Intelligence Lab (CSAIL), MIT, USA
arXiv: https://arxiv.org/abs/2305.11980
4. September 2023 @ 5pm CET
SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
Elias Frantar, Institute of Science and Technology Austria (ISTA), Austria
arXiv: https://arxiv.org/abs/2301.00774
10. July 2023 @ 5pm CET
Information Plane Analysis for Dropout Neural Networks
Linara Adilova, Ruhr University Bochum, Germany
arXiv: https://arxiv.org/abs/2303.00596
26. June 2023 @ 5pm CET
MiniLearn: On-Device Learning for Low-Power IoT Devices
Marc-Andre Schümann and Olaf Landsiedel, Kiel University, Germany & Chalmers University of Technology, Sweden
PDF: https://ewsn2022.pro2future.at/paper/sessions/ewsn2022-final3.pdf
ACM Digital Library: https://dl.acm.org/doi/10.5555/3578948.3578949
5. June 2023 @ 5pm CET
A Modern Look at the Relationship Between Sharpness and Generalization
Maksym Andriushchenko, EPFL
arXiv: https://arxiv.org/abs/2302.07011
22. May 2023 @ 5pm CET
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
Shiwei Liu, VITA group and the Institute for Foundations of Machine Learning (IFML) at UT Austin, USA
arXiv: https://arxiv.org/abs/2303.02141
15. May 2023 @ 5pm CET
Continual Pre-Training Mitigates Forgetting in Language and Vision
Andrea Cossu, University of Pisa, Italy
arXiv: https://arxiv.org/abs/2205.09357
17. April 2023 @ 5pm CET
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan, Hive AI, USA
arXiv: https://arxiv.org/abs/2211.08403
3. April 2023 @ 5pm CET
Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Mansheej Paul, Stanford, USA
arXiv: https://arxiv.org/abs/2210.03044
2nd Workshop on Efficient Machine Learning, 2022, Vienna, Austria
Contact us with questions or suggestions (efficientml@gmail.com). Self-nominations to present your freshly published work in the reading group are welcome.