Federated Learning One World Seminar

Archive of Talks: FLOW Season 2024

FLOW Talk #116

July 10, 2024 @ 1pm Coordinated Universal Time (UTC) 

Fast Proximal-Point methods for Federated Optimization

host: Sebastian Stich

[slides]

Abstract: In developing efficient optimization algorithms, it is crucial to account for communication constraints, a significant challenge in modern federated learning settings. In this talk, I will first revisit DANE, a distributed proximal-point algorithm, and show that it can exploit second-order dissimilarity and achieve the desired communication reduction under such conditions. However, its local computation efficiency is sub-optimal. I will then introduce a novel distributed algorithm, S-DANE. This method adopts a more stabilized prox-center in the proximal step and matches DANE’s communication complexity. Moreover, the accuracy requirement for solving its subproblem is weaker than that of DANE, leading to enhanced local computation efficiency. Finally, I will show how to accelerate S-DANE and demonstrate that the resulting algorithm achieves the best-known communication complexity among all existing methods for convex distributed optimization, with the same improved local computation efficiency as S-DANE.
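
For orientation, here is a minimal sketch of the local proximal-point step that DANE-style methods build on; the notation ($f_i$ for client $i$'s local objective, $x^t$ for the current iterate, $\mu$ for the prox parameter, $n$ for the number of clients) is assumed for illustration, and the exact form used in the talk may differ:

\[
x_i^{t+1} \approx \arg\min_{x} \Big\{ f_i(x) + \big\langle \nabla f(x^t) - \nabla f_i(x^t),\, x \big\rangle + \frac{\mu}{2}\,\|x - x^t\|^2 \Big\},
\qquad
x^{t+1} = \frac{1}{n}\sum_{i=1}^{n} x_i^{t+1}.
\]

In this sketch, S-DANE would replace the prox-center $x^t$ in the quadratic term with a separate, more stable sequence of points, which is what permits coarser (and hence cheaper) approximate solutions of the local subproblem.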

FLOW Talk #115

March 27, 2024 @ 1pm Coordinated Universal Time (UTC)

Byzantine Robustness and Partial Participation Can Be Achieved Simultaneously: Just Clip Gradient Differences

host: Samuel Horvath

[slides]

Abstract: Distributed learning has emerged as a leading paradigm for training large machine learning models. However, in real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models. Byzantine fault tolerance mechanisms have been proposed to address these issues, but they often assume full participation from all clients, which is not always practical due to the unavailability of some clients or communication constraints. In our work, we propose the first distributed method with client sampling and provable tolerance to Byzantine workers. The key idea behind the developed method is the use of gradient clipping to control stochastic gradient differences in recursive variance reduction. This allows us to bound the potential harm caused by Byzantine workers, even during iterations when all sampled clients are Byzantine. Furthermore, we incorporate communication compression into the method to enhance communication efficiency. Under quite general assumptions, we prove convergence rates for the proposed method that match the existing state-of-the-art (SOTA) theoretical results.
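
As a rough illustration of the clipping idea described above (a sketch only, not the authors' exact algorithm; the estimator form, the threshold tau, and the helper names are assumptions), a recursive variance-reduced estimator can clip each stochastic gradient difference before adding it, so that the per-round contribution of any single worker is bounded:

```python
import numpy as np

def clip(v, tau):
    """Shrink v so that its Euclidean norm is at most tau (identity if already small)."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)

def clipped_vr_estimate(g_prev, grad_new, grad_old, tau):
    """One client's recursive variance-reduced gradient estimate with a clipped
    gradient difference (illustrative sketch, not the paper's exact update).

    g_prev   : previous gradient estimate
    grad_new : stochastic gradient at the current iterate
    grad_old : stochastic gradient at the previous iterate
    tau      : clipping threshold
    """
    return g_prev + clip(grad_new - grad_old, tau)
```

Since every increment has norm at most tau, the harm a Byzantine worker can cause in a single round is bounded, which is the property the abstract refers to even in rounds where all sampled clients are Byzantine.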

Paper:  

FLOW Talk #114

March 13, 2024 @ 5pm Coordinated Universal Time (UTC)

Mariel Werner (Berkeley)

Provably Personalized and Robust Federated Learning 

[slides]

host: Samuel Horvath

Abstract: Federated learning is a powerful distributed optimization framework in which multiple clients collaboratively train a global model without sharing their raw data. In this work, we tackle the personalized version of the federated learning problem. In particular, we ask: throughout the training process, can each client in a federated system identify a subset of similar clients and collaboratively train with just those clients? We answer this in the affirmative: we formalize the problem as a stochastic optimization problem and achieve optimal convergence rates for a large class of loss functions. We propose simple iterative algorithms which identify clusters of similar clients and train a personalized model per cluster, using local client gradients and flexible constraints on the clusters. The convergence rates of our algorithms asymptotically match those obtained if we knew the true underlying clustering of the clients, and they are provably robust in the Byzantine setting where some fraction of the clients are malicious.
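
A minimal sketch of the clustering idea from the abstract, under assumed details (cosine similarity of local gradients as the similarity measure and a greedy assignment with a fixed threshold; the paper's actual criterion and constraints may differ):

```python
import numpy as np

def cluster_by_gradients(grads, threshold=0.9):
    """Greedily group clients whose local gradients point in similar directions.

    grads     : list of 1-D numpy arrays, one local gradient per client
    threshold : minimum cosine similarity to join an existing cluster (assumed value)
    Returns a list of clusters, each a list of client indices.
    """
    clusters = []  # each entry: (unit-norm representative gradient, [client indices])
    for i, g in enumerate(grads):
        g_unit = g / (np.linalg.norm(g) + 1e-12)
        placed = False
        for rep, members in clusters:
            if float(rep @ g_unit) >= threshold:  # cosine similarity with the representative
                members.append(i)
                placed = True
                break
        if not placed:
            clusters.append((g_unit, [i]))
    return [members for _, members in clusters]
```

Each cluster would then run its own aggregation step, yielding a personalized model per cluster as described in the abstract.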

Paper:  

FLOW Talk #113

March 6, 2024 @ 5pm Coordinated Universal Time (UTC)

Xinyi Xu (NUS)

Fairness and Incentives for Data Sharing and Collaborative Learning

[slides]

host: Samuel Horvath

Abstract: Data sharing and collaborative (machine) learning arise in use-cases where individual "agents" (e.g., researchers, organizations) have limited capabilities or resources for large-scale data collection and thus turn to each other for collaboration. A motivating example is medicine (NEJM, Nature Journal), where data are extremely valuable and costly to obtain, and are subject to stringent privacy regulations. Hence, we formally study the problem of data sharing and/or collaborative learning, analyze the desiderata in light of these practical considerations, and propose principled solutions to achieve them. We identify several important desiderata, especially fairness of the collaboration: each agent incurs a non-trivial (and sometimes significant) cost from procuring, acquiring, or otherwise collecting its data, so it is imperative that this effort is fairly recognized and rewarded in the form of specific incentives. In this talk, I will present some precise formalizations of fairness (e.g., the Shapley value), show how they are applied to incentivize data sharing and collaborative learning in specific learning contexts (e.g., federated learning), and discuss some future directions.
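
For reference, the Shapley value mentioned in the abstract assigns to agent $i$ its average marginal contribution over all coalitions of the agent set $N$, where $v(S)$ denotes the value (e.g., the quality of a model trained on the pooled data) attainable by coalition $S$:

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\big(|N|-|S|-1\big)!}{|N|!}\,\Big( v\big(S \cup \{i\}\big) - v(S) \Big).
\]

How $v$ is instantiated in a particular learning context, and how the resulting values are turned into concrete incentives, is part of what the talk discusses.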

Paper:  

FLOW Talk #112

February 21, 2024 @ 5pm Coordinated Universal Time (UTC)

How to Make Federated Learning Work with Challenging Client Participation Patterns?

[slides]

host: Samuel Horvath

Abstract: A main challenge in many practical scenarios of federated learning is that the clients are only intermittently available to participate in learning. In this talk, I will present our recent results on understanding and overcoming this challenge. I will first explain the importance of aggregation weight adaptation and introduce a new algorithm that improves federated averaging (FedAvg) by adaptively weighting the client updates. The adaptation is based on online estimates of the optimal weights, where the statistics of client participation are heterogeneous and unknown a priori. Then, for the case with very infrequently participating clients, I will present an "amplification" mechanism that is applied to the model updates. The talk will cover both theoretical and empirical findings on this topic and also discuss further insights.
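
A minimal sketch of aggregation-weight adaptation in a FedAvg-style round, under assumed specifics (here the weights are inversely proportional to each client's empirically estimated participation frequency; the estimator presented in the talk differs in its details):

```python
import numpy as np

def aggregate_round(global_model, client_updates, participation_counts, t):
    """One FedAvg-style aggregation step with participation-aware weights.

    global_model         : current global parameters (1-D numpy array)
    client_updates       : dict {client_id: model update (delta) computed this round}
    participation_counts : dict {client_id: rounds participated so far, incl. this one}
    t                    : current round index (1-based)
    The weighting rule is illustrative: inverse of the estimated participation
    probability, normalized over the clients that showed up this round.
    """
    weights = {}
    for cid in client_updates:
        p_hat = participation_counts[cid] / t      # estimated participation probability
        weights[cid] = 1.0 / max(p_hat, 1e-6)      # rarely available clients count more
    total = sum(weights.values())
    aggregated = sum((w / total) * client_updates[cid] for cid, w in weights.items())
    return global_model + aggregated
```

Reweighting in this direction compensates for under-represented clients; the "amplification" mechanism mentioned in the abstract targets the extreme case of very infrequently participating clients.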

FLOW Talk #111

February 7, 2024 @ 1pm Coordinated Universal Time (UTC)

Variance Reduction for Byzantine-Robust Distributed Optimization

[slides]

host: Samuel Horvath

Abstract: Byzantine robustness has been gaining a lot of attention due to the growing interest in collaborative and federated learning. To address this challenge, different Byzantine-robust mechanisms have been proposed in recent years. In this talk, I will focus on approaches based on variance reduction and, in particular, present our recent paper, in which variance reduction allows us to significantly improve prior convergence guarantees.
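
As a generic illustration of how variance reduction is combined with robust aggregation (a sketch under assumed choices, not the specific method of the paper: a SAGA-style client estimator and a coordinate-wise median aggregator):

```python
import numpy as np

def saga_style_estimate(grad_new, stored_grad, table_mean):
    """Variance-reduced gradient estimate for one honest client (illustrative):
    current stochastic gradient, corrected by the stored past gradient for the
    same sample and the running mean of the stored gradient table."""
    return grad_new - stored_grad + table_mean

def robust_aggregate(estimates):
    """Coordinate-wise median of the clients' estimates. Variance reduction
    drives honest estimates toward the true gradient, which is what lets a
    robust aggregator like this tolerate Byzantine inputs more effectively."""
    return np.median(np.stack(estimates), axis=0)
```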

Paper: