FLOW Talk #128
May 21, 2025 @ 1pm Coordinated Universal Time (UTC)
Federated Learning for Secure and Scalable Distributed Systems: Addressing Vulnerabilities and Attacks
Abstract: Federated Learning (FL) is transforming how we train machine learning models across decentralized devices while keeping data private. Yet FL faces tough challenges, from privacy leaks to vulnerabilities such as poisoning attacks that can threaten model integrity. Many current defenses rely on external datasets or prior assumptions, which limits scalability and real-world applicability. In this talk, we will explore these issues and present a new privacy-preserving defense framework, powered by a Conditional GAN (cGAN), that generates synthetic data at the server to verify client updates, eliminating the need for external data. Our approach is designed to be scalable and easy to integrate into FL frameworks such as FEDn, making it practical for real-world deployments. We will walk through how it compares to popular defenses like Krum and Multi-Krum, and how it performs across challenging attack scenarios. Beyond that, we will highlight open research opportunities, especially for students, showing where the next breakthroughs in federated learning might come from. Whether you are just starting out or already deep into FL research, this talk aims to inspire ideas that push the field forward.
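To make the verification idea concrete, here is a minimal sketch (not the authors' implementation) of server-side filtering of client updates against server-generated synthetic data: each update is scored on the synthetic set and only updates whose score stays close to the median are averaged. The helper names (`synthetic_loss`, `filtered_fedavg`), the linear model, the median-based keep/reject rule, and the `tol` threshold are illustrative assumptions; the paper's cGAN generator and scoring rule are described in the reference below. Krum and Multi-Krum, by contrast, select updates using pairwise distances between client updates, without any reference data.

```python
# Minimal sketch (not the authors' implementation) of server-side update
# filtering with synthetic data. Assumptions: models are flat parameter
# vectors, `synthetic_loss` stands in for evaluating an update on data
# produced by a server-side generator (e.g., a cGAN), and the keep/reject
# rule is a simple deviation-from-median test chosen for illustration.
import numpy as np

def synthetic_loss(params, synth_X, synth_y):
    """Toy surrogate: mean squared error of a linear model on synthetic data."""
    return float(np.mean((synth_X @ params - synth_y) ** 2))

def filtered_fedavg(global_params, client_updates, synth_X, synth_y, tol=2.0):
    """Average only the client updates whose loss on the server's synthetic
    data is not much worse than the median client loss."""
    losses = np.array([synthetic_loss(global_params + u, synth_X, synth_y)
                       for u in client_updates])
    med = np.median(losses)
    keep = [u for u, l in zip(client_updates, losses) if l <= tol * med]
    if not keep:                       # fall back to no update if all rejected
        return global_params
    return global_params + np.mean(keep, axis=0)

# Tiny usage example with one obviously poisoned update.
rng = np.random.default_rng(0)
X, w_true = rng.normal(size=(64, 5)), np.ones(5)
y = X @ w_true
g = np.zeros(5)
updates = [0.5 * (w_true - g) + 0.01 * rng.normal(size=5) for _ in range(4)]
updates.append(10.0 * rng.normal(size=5))           # poisoned client
print(filtered_fedavg(g, updates, X, y))
```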
Relevant Papers:
Usama Zafar, André Teixeira, Salman Toor. Robust Federated Learning Against Poisoning Attacks: A GAN-Based Defense Framework, arXiv 2025.
Morgan Ekmefjord, Addi Ait-Mlouk, Sadi Alawadi, Mattias Åkesson, Prashant Singh, Ola Spjuth. Scalable federated machine learning with FEDn, 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022.
FLOW Talk #127
April 30, 2025 @ 1pm Coordinated Universal Time (UTC)
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
Relevant Papers:
Alexander Tyurin. Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics, ICLR 2025.
FLOW Talk #126
April 16, 2025 @ 1pm Coordinated Universal Time (UTC)
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
Abstract: Asynchronous Stochastic Gradient Descent (Asynchronous SGD) is a cornerstone method for parallelizing learning in distributed machine learning. However, its performance suffers under arbitrarily heterogeneous computation times across workers, leading to suboptimal time complexity and inefficiency as the number of workers scales. While several Asynchronous SGD variants have been proposed, recent findings by Tyurin & Richtárik (NeurIPS 2023) reveal that none achieve optimal time complexity, leaving a significant gap in the literature. In this paper, we propose Ringmaster ASGD, a novel Asynchronous SGD method designed to address these limitations and tame the inherent challenges of Asynchronous SGD. We establish, through rigorous theoretical analysis, that Ringmaster ASGD achieves optimal time complexity under arbitrarily heterogeneous and dynamically fluctuating worker computation times. This makes it the first Asynchronous SGD method to meet the theoretical lower bounds for time complexity in such scenarios.
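As a rough illustration of why delay management matters in asynchronous SGD, below is a toy, single-process simulation (not the authors' code) of a delay-aware ASGD loop that simply discards gradients whose staleness exceeds a threshold `R`. The quadratic objective, the worker-speed model, and the value of `R` are illustrative assumptions; how the threshold should be chosen, and why the resulting method attains the time-complexity lower bound, is the subject of the paper.

```python
# Toy, single-process simulation (not the authors' code) of a delay-aware
# asynchronous SGD loop: gradients computed at an old model version are
# discarded when their staleness exceeds a threshold R. Names such as
# `grad`, `R`, and the worker-speed model are illustrative assumptions.
import heapq
import numpy as np

def grad(w, rng):
    """Unbiased stochastic gradient of f(w) = 0.5 * ||w||^2."""
    return w + 0.1 * rng.normal(size=w.shape)

def delay_aware_asgd(n_workers=8, steps=200, lr=0.1, R=4, seed=0):
    rng = np.random.default_rng(seed)
    w, version = np.ones(10), 0
    speeds = rng.uniform(0.1, 5.0, size=n_workers)       # heterogeneous workers
    # Each heap entry: (finish_time, worker_id, model_version used, gradient).
    heap = [(speeds[i], i, version, grad(w, rng)) for i in range(n_workers)]
    heapq.heapify(heap)
    while version < steps:
        t, i, v, g = heapq.heappop(heap)
        if version - v < R:              # fresh enough: apply the update
            w -= lr * g
            version += 1
        # Either way, worker i immediately restarts on the current model.
        heapq.heappush(heap, (t + speeds[i], i, version, grad(w, rng)))
    return w

print(np.linalg.norm(delay_aware_asgd()))
```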
Relevant Papers:
Artavazd Maranjyan, Alexander Tyurin, Peter Richtárik. Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity, arXiv 2025.
FLOW Talk #125
April 9, 2025 @ 1pm Coordinated Universal Time (UTC)
Abstract: The design of efficient parallel/distributed optimization methods and the tight analysis of their theoretical properties are important research endeavors. While minimax complexities are known for sequential optimization methods, the theory of parallel optimization methods is surprisingly much less explored, especially in the presence of data, compute, and communication heterogeneity.
In the first part of the talk [1], we establish the first optimal time complexities for parallel optimization methods (Rennala SGD and Malenia SGD) that have access to an unbiased stochastic gradient oracle with bounded variance, under the assumption that the workers compute stochastic gradients with different speeds, i.e., we assume compute heterogeneity. We prove lower bounds and develop optimal algorithms that attain them, both in the data homogeneous and heterogeneous regimes.
In the second part of the talk [2], we establish the first optimal time complexities for parallel optimization methods (Shadowheart SGD) that have access to an unbiased stochastic gradient oracle with bounded variance, under the assumption that the workers compute stochastic gradients with different speeds, as before, but under the further assumption that the worker-to-server communication times are nonzero and heterogeneous. We prove lower bounds and develop optimal algorithms that attain them, in the data homogeneous regime only.
Time permitting, I may briefly outline some further recent results [3, 4, 5, 6, 7].
Our results have surprising consequences for the literature on asynchronous optimization methods: in contrast with prior attempts to tame compute heterogeneity via "complete/wild" compute and update asynchronicity, our methods alternate fast asynchronous computation of a minibatch of stochastic gradients with infrequent synchronous update steps.
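A minimal simulation of this alternating pattern, assuming a toy quadratic objective and a simple worker-speed model (not the papers' implementation): workers asynchronously contribute stochastic gradients computed at the current iterate until a batch of size B has been collected, after which the server takes one synchronous step. Faster workers naturally contribute more gradients per batch.

```python
# Minimal simulation (not the papers' code) of the pattern described above:
# workers asynchronously contribute stochastic gradients computed at the
# *current* iterate until a batch of size B has been collected, then the
# server takes one synchronous step. B, the speed model, and the quadratic
# objective are illustrative assumptions.
import heapq
import numpy as np

def stochastic_grad(w, rng):
    """Unbiased stochastic gradient of f(w) = 0.5 * ||w||^2."""
    return w + 0.1 * rng.normal(size=w.shape)

def async_minibatch_sgd(n_workers=8, B=16, rounds=100, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(10)
    speeds = rng.uniform(0.1, 5.0, size=n_workers)        # compute heterogeneity
    for _ in range(rounds):
        # Phase 1: asynchronously collect B gradients at the current iterate w.
        heap = [(speeds[i], i) for i in range(n_workers)]
        heapq.heapify(heap)
        batch = np.zeros_like(w)
        for _ in range(B):
            t, i = heapq.heappop(heap)
            batch += stochastic_grad(w, rng)               # gradient at current w
            heapq.heappush(heap, (t + speeds[i], i))       # fast workers pop more often
        # Phase 2: one synchronous update with the averaged minibatch.
        w -= lr * batch / B
    return w

print(np.linalg.norm(async_minibatch_sgd()))
```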
Relevant Papers:
[1] Alexander Tyurin and Peter Richtárik. Optimal time complexities of parallel stochastic optimization methods under a fixed computation model, Advances in Neural Information Processing Systems 36 (NeurIPS 2023).
[2] Alexander Tyurin, Marta Pozzi, Ivan Ilin, and Peter Richtárik. Shadowheart SGD: Distributed asynchronous SGD with optimal time complexity under arbitrary computation and communication heterogeneity, Advances in Neural Information Processing Systems 37 (NeurIPS 2024).
[3] Alexander Tyurin, Kaja Gruntkowska, and Peter Richtárik. Freya PAGE: First optimal time complexity for large-scale nonconvex finite-sum optimization with heterogeneous asynchronous computations, Advances in Neural Information Processing Systems 37 (NeurIPS 2024).
[4] Alexander Tyurin and Peter Richtárik. On the optimal time complexities in decentralized stochastic asynchronous optimization, Advances in Neural Information Processing Systems 37 (NeurIPS 2024).
[5] Artavazd Maranjyan, Omar Shaikh Omar, and Peter Richtárik. MindFlayer: Efficient asynchronous parallel SGD in the presence of heterogeneous and random worker compute times, NeurIPS 2024 Workshop: Optimization for Machine Learning (OPT 2024).
[6] Artavazd Maranjyan, Alexander Tyurin, and Peter Richtárik. Ringmaster ASGD: The first asynchronous SGD with optimal time complexity, arXiv:2501.16168, 2025.
[7] Artavazd Maranjyan, El Mehdi Saad, Peter Richtárik, and Francesco Orabona. ATA: Adaptive task allocation for efficient resource management in distributed machine learning, arXiv:2502.00775, 2025.
FLOW Talk #124
March 26, 2025 @ 1pm Coordinated Universal Time (UTC)
Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning
Abstract: In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We highlight differences in convergence rates between several unbiased compression operators that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis.
To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected Hölder regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning.
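To fix ideas, here is a toy example (not the paper's experimental setup) of SGD for least-squares regression with one unbiased compression operator, rand-k sparsification, which satisfies E[C(x)] = x and E||C(x) - x||^2 <= omega * ||x||^2 with omega = d/k - 1. The data model, step size, and k are illustrative assumptions; the paper's point is precisely that compressors satisfying the same variance condition can nonetheless yield different convergence rates.

```python
# Toy illustration (not the paper's setup) of compressed SGD for least-squares
# regression with an unbiased compression operator. `rand_k` satisfies
# E[C(x)] = x and E||C(x) - x||^2 <= omega * ||x||^2 with omega = d/k - 1;
# the step size, data model, and k are illustrative assumptions.
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates, scale by d/k."""
    d = x.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * x

def compressed_sgd_lsr(X, y, k=2, lr=0.01, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                      # one random data point
        g = (X[i] @ w - y[i]) * X[i]             # stochastic LSR gradient
        w -= lr * rand_k(g, k, rng)              # compress, then update
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=200)
w_hat = compressed_sgd_lsr(X, y)
print(np.linalg.norm(w_hat - w_true))
```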
Relevant Papers:
C. Philippenko and A. Dieuleveut. Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning, JMLR 2024.