Federated Learning One World Seminar
Archive of Talks: FLOW Season 2023
FLOW Talk #111
November 15, 2023 @ 1pm Coordinated Universal Time (UTC)
The Future of Consumer Edge-AI Computing
host: Samuel Horváth
Abstract: Deep Learning has proliferated dramatically across consumer devices in less than a decade, but has been largely powered by hardware acceleration within isolated devices. Nonetheless, clear signals exist that the next decade of consumer intelligence will require levels of resources, a mixing of modalities and a collaboration of devices that will demand a significant pivot beyond hardware alone. To accomplish this, we believe a new Edge-AI paradigm will be necessary for this transition to be possible in a sustainable manner, without compromising user privacy or hurting quality of experience.
Paper:
Stefanos Laskaridis, Stylianos I. Venieris, Alexandros Kouris, Rui Li, Nicholas D. Lane. The Future of Consumer Edge-AI Computing. arXiv:2210.10514, 2022.
FLOW Talk #110
October 4, 2023 @ 1pm Coordinated Universal Time (UTC)
DualFL: A duality-based federated learning algorithm with communication acceleration in the general convex regime
[slides]
host: Sebastian Stich
Abstract: In this talk, we propose a novel training algorithm called DualFL (Dualized Federated Learning), for solving a distributed optimization problem in federated learning. Our approach is based on a specific dual formulation of the federated learning problem. DualFL achieves communication acceleration under various settings of smoothness and strong convexity of the problem. Moreover, it theoretically guarantees the use of inexact local solvers, preserving its optimal communication complexity even with inexact local solutions. DualFL is the first federated learning algorithm that achieves communication acceleration even when the cost function is either nonsmooth or non-strongly convex.
This is a joint work with Jinchao Xu.
Paper:
FLOW Talk #109
August 30, 2023 @ 5pm Coordinated Universal Time (UTC)
FedTree: A Federated Learning System For Trees
[slides]
host: Sebastian Stich
Abstract: While the quality of machine learning services largely relies on the volume of training data, data regulations such as the General Data Protection Regulation (GDPR) impose stringent requirements on data transfer. Federated learning has emerged as a popular approach for enabling collaborative machine learning without sharing raw data. To facilitate the rapid development of federated learning, efficient and user-friendly federated learning systems are essential. Despite many existing federated learning systems designed for deep learning, tree-based federated learning systems have not been well explored. This paper presents a tree-based federated learning system under a histogram-sharing scheme, named FedTree, that supports both horizontal and vertical federated training of gradient boosting decision trees (GBDTs) with configurable privacy protection techniques. Our extensive experiments show that FedTree achieves competitive accuracy to centralized training while incurring much less computational cost than the other generic federated learning systems.
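To make the histogram-sharing idea concrete, here is a minimal NumPy sketch of split finding for a horizontally partitioned feature: each client bins its gradients and Hessians using pre-agreed feature bins, and the server sums the histograms before scoring candidate splits. This is an illustration only, not FedTree's implementation; vertical training and the configurable privacy protections are omitted, and all names and values are made up.

```python
import numpy as np

def local_histograms(grad, hess, feat_bin, n_bins):
    """Client side: bin gradients/Hessians by the pre-agreed feature bins."""
    g_hist = np.bincount(feat_bin, weights=grad, minlength=n_bins)
    h_hist = np.bincount(feat_bin, weights=hess, minlength=n_bins)
    return g_hist, h_hist

def best_split(g_hist, h_hist, lam=1.0):
    """Server side: standard GBDT split gain on the aggregated histogram."""
    G, H = g_hist.sum(), h_hist.sum()
    gl, hl = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]
    gr, hr = G - gl, H - hl
    gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
    return int(np.argmax(gain)), float(gain.max())

# Toy horizontal setting: two clients hold different rows of the same feature.
rng = np.random.default_rng(0)
n_bins = 16
client_hists = []
for _ in range(2):
    grad, hess = rng.normal(size=100), np.ones(100)
    feat_bin = rng.integers(0, n_bins, size=100)
    client_hists.append(local_histograms(grad, hess, feat_bin, n_bins))

g_sum = sum(g for g, _ in client_hists)
h_sum = sum(h for _, h in client_hists)
print(best_split(g_sum, h_sum))   # best bin boundary and its gain
```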
Paper:
Qinbin Li, Zhaomin Wu, Yanzheng Cai, Yuxuan Han, Ching Man Yung, Tianyuan Fu and Bingsheng He. FedTree: A Federated Learning System For Trees. MLSys Conference, Miami Beach, FL, USA, 2023.
FLOW Talk #108
August 16, 2023 @ 1pm Coordinated Universal Time (UTC)
On-Device Training under 256KB of Memory
[slides]
host: Dan Alistarh
Abstract: On-device training enables the model to adapt to new data collected from the sensors. Users can benefit from customized AI models without having to transfer the data to the cloud, preserving privacy. However, the training memory footprint is prohibitive for IoT devices. I'll present "Tiny Transfer Learning" (NeurIPS'20) and "On-Device Training under 256KB Memory" (NeurIPS'22) to solve this issue. I'll first analyze the memory bottleneck, showing that we should reduce the activations, not just the trainable parameters, for efficient on-device learning. I'll then introduce Quantization-Aware Scaling (QAS) to calibrate the gradient scales and stabilize 8-bit quantized training, and "sparse update" to skip the gradient computation of less important layers and sub-tensors to save activation memory. These algorithmic innovations are implemented in a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads the runtime auto-differentiation to compile time. Deployed on an STM32H746 microcontroller, our framework uses less than 1/1000 of the training memory of TensorFlow and PyTorch while matching the accuracy. Our study enables IoT devices to not only perform inference but also continuously adapt to new data for on-device lifelong learning.
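The sparse-update idea can be sketched in a few lines of (desktop) PyTorch: freeze early layers so they never receive gradients and optimize only a chosen subset of parameters. This is a conceptual sketch with a made-up toy model; the actual work selects layers and sub-tensors by contribution analysis and runs on the compile-time Tiny Training Engine rather than PyTorch.

```python
import torch
from torch import nn

# Toy model standing in for an MCU-scale network (illustrative only).
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# "Sparse update": freeze the first layer so its gradients are never computed
# (the paper chooses which layers/sub-tensors to skip via contribution analysis).
for p in model[0].parameters():
    p.requires_grad = False

# Optimize only the remaining (trainable) parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.01)

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # frozen parameters get no .grad
opt.step()
opt.zero_grad()
```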
FLOW Talk #107
August 2, 2023 @ 5pm Coordinated Universal Time (UTC)
Can Public Large Language Models Help Private Cross-device Federated Learning?
host: Virginia Smith
Abstract: We study (differentially) private federated learning (FL) of language models. The language models in cross-device FL are relatively small and can be trained with meaningful formal user-level differential privacy (DP) guarantees when massive parallelism in training is enabled by the participation of a moderate number of users. Recently, public data has been used to improve privacy-utility trade-offs for both large and small language models. In this talk, we will cover our systematic study of using large-scale public data and LLMs to help differentially private training of on-device FL models, and further improve the privacy-utility trade-off through distillation techniques. Moreover, we propose a novel distribution matching algorithm with theoretical grounding to sample public data close to the private data distribution, which significantly improves the sample efficiency of (pre-)training on public data. The proposed method is efficient and effective for training private models by taking advantage of public data, especially for customized on-device architectures that do not have ready-to-use pre-trained models.
Paper:
FLOW Talk #106
On Noisy Evaluation in Federated Hyperparameter Tuning
Abstract: Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks, as issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to faithfully evaluate the performance of various hyperparameters. In this work we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can have a significant impact on tuning methods—reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.
Paper:
K. Kuo, P. Thaker, M. Khodak, J. Nguyen, D. Jiang, A. Talwalkar, V. Smith. On Noisy Evaluation in Federated Hyperparameter Tuning. Conference on Machine Learning and Systems (MLSys), 2023.
FLOW Talk #105
June 7, 2023 @ 1pm Coordinated Universal Time (UTC)
On the 5th Generation of Local Training Methods in Federated Learning
[slides]
host: Samuel Horváth
Abstract: I will outline the history of the theoretical development of the local training "trick" employed in virtually all successful federated learning algorithms. In particular, I will identify five distinct generations of methods and results: 1) heuristic, 2) homogeneous, 3) sublinear, 4) linear and 5) accelerated. The 5th generation, initiated by the ProxSkip algorithm by Mishchenko et al. (ICML 2022), finally led to the proof that local training, if carefully executed, leads to provable acceleration of communication complexity, without requiring any data homogeneity assumptions. Because these latest advances are very new, there are many opportunities to develop the 5th generation of local training methods further. I will give a brief overview of what we know now, and what problems still remain open.
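For readers new to the 5th-generation methods, the NumPy sketch below runs a ProxSkip/Scaffnew-style update on toy quadratic local objectives: every round each client takes a shifted local gradient step, and only with probability p are the iterates averaged (the prox of the consensus constraint), with control variates absorbing heterogeneity. This is a simplified reading of the algorithm with arbitrary illustrative constants, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 10                        # clients, dimension
A = [rng.normal(size=(20, d)) for _ in range(n)]
b = [rng.normal(size=20) for _ in range(n)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])   # local gradient of 0.5*||A_i x - b_i||^2

gamma, p, T = 1e-2, 0.2, 2000       # stepsize, communication probability, rounds
x = np.zeros((n, d))                # one iterate per client
h = np.zeros((n, d))                # control variates

for t in range(T):
    # shifted local gradient step on every client
    x_hat = np.stack([x[i] - gamma * (grad(i, x[i]) - h[i]) for i in range(n)])
    if rng.random() < p:            # communicate: prox of consensus = averaging
        x_new = np.tile(x_hat.mean(axis=0), (n, 1))
    else:                           # skip communication, keep local iterates
        x_new = x_hat
    h = h + (p / gamma) * (x_new - x_hat)   # control-variate update
    x = x_new
```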
Papers:
Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich and Peter Richtárik. ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ICML 2022
Abdurakhmon Sadiev, Dmitry Kovalev and Peter Richtárik. Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox. NeurIPS 2022
Grigory Malinovsky, Kai Yi and Peter Richtárik. Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning. NeurIPS 2022
Laurent Condat and Peter Richtárik. RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates. ICLR 2023
Artavazd Maranjyan, Mher Safaryan and Peter Richtárik. GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity. arXiv:2210.16402, 2022
Laurent Condat, Ivan Agarský and Peter Richtárik. Provably Doubly Accelerated Federated Learning: The First Theoretically Successful Combination of Local Training and Compressed Communication. arXiv:2210.13277, 2022
Michał Grudzień, Grigory Malinovsky and Peter Richtárik. Can 5th Generation Local Training Methods Support Client Sampling? Yes! AISTATS 2023
Laurent Condat, Grigory Malinovsky and Peter Richtárik. TAMUNA: Accelerated Federated Learning with Local Training and Partial Participation. arXiv:2302.09832, 2023
FLOW Talk #104
May 31, 2023 @ 1pm Coordinated Universal Time (UTC)
Federated Averaging Made Asynchronous and Communication-Efficient
[slides]
host: Samuel Horváth
Abstract: In this work, we take steps towards addressing two of the main practical challenges when scaling federated optimization to large node counts: the need for tight synchronization between the central authority and individual computing nodes, and the large communication cost of transmissions between the central server and clients. Specifically, we present a new variant of the classic federated averaging (FedAvg) algorithm, which supports both asynchronous communication and communication compression. We provide a new analysis technique showing that, in spite of these system relaxations, our algorithm can provide similar convergence to FedAvg in some parameter regimes. Experimental results on the LEAF benchmark with setups of up to 300 nodes show that our algorithm ensures fast convergence for standard federated tasks, improving upon prior quantized and asynchronous approaches.
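Since the paper's exact algorithm is not reproduced here, the sketch below only illustrates the two system relaxations in a generic way: the server applies quantized client deltas from a small asynchronous buffer, so clients neither synchronize tightly nor send full-precision vectors. The buffer size, int8-style quantizer, and stand-in local update are all assumptions for illustration, not the method analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, buffer_size, server_lr = 10, 20, 4, 1.0
x_server = np.zeros(d)
client_targets = rng.normal(size=(n_clients, d))   # stand-in local objectives

def quantize(v, levels=256):
    """Uniform int8-style quantization of a client delta (illustrative)."""
    scale = np.abs(v).max() / (levels // 2 - 1) + 1e-12
    return np.round(v / scale).astype(np.int8), scale

buffer = []
for step in range(200):
    i = rng.integers(n_clients)           # whichever client reports next
    local_copy = x_server.copy()          # in a real async run this may be stale
    delta = -0.5 * (local_copy - client_targets[i])   # stand-in local update
    q, scale = quantize(delta)            # compressed transmission to the server
    buffer.append(q.astype(np.float64) * scale)
    if len(buffer) == buffer_size:        # server applies a buffered average
        x_server += server_lr * np.mean(buffer, axis=0)
        buffer.clear()
```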
Paper:
Hossein Zakerinia, Shayan Talaei, Giorgi Nadiradze, Dan Alistarh. https://arxiv.org/abs/2206.10032
FLOW Talk #103
May 24, 2023 @ 5pm Coordinated Universal Time (UTC)
On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data
[slides]
host: Samuel Horváth
Abstract: Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. In order to explain the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions, this paper introduces the client consensus hypothesis: on some federated datasets, the average of client model updates starting from the optimum is very small and close to zero. We prove that under the client consensus hypothesis, data heterogeneity can have no negative impact on the convergence of FedAvg. Moreover, we show that the client consensus hypothesis holds on a simple quadratic problem and many naturally heterogeneous datasets (such as FEMNIST and StackOverflow). Therefore, the hypothesis is realistic and can lead to a better understanding of the empirical success of FedAvg.
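The hypothesis is easy to test numerically. The toy NumPy check below builds heterogeneous quadratic clients, runs a few local gradient steps from the global optimum on each, and measures the norm of the average client update; in this symmetric example it is (near) zero, which is exactly the condition under which heterogeneity does not hurt FedAvg. The setup is made up for illustration and is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, local_steps, lr = 10, 5, 5, 0.1

# Heterogeneous quadratic clients: f_i(x) = 0.5 * ||x - c_i||^2
c = rng.normal(size=(n, d))
x_star = c.mean(axis=0)            # global optimum of (1/n) * sum_i f_i

def local_update(x0, ci):
    """Run a few local GD steps from x0 and return the client delta."""
    x = x0.copy()
    for _ in range(local_steps):
        x -= lr * (x - ci)
    return x - x0

deltas = np.stack([local_update(x_star, c[i]) for i in range(n)])
print("norm of average client update at the optimum:",
      np.linalg.norm(deltas.mean(axis=0)))   # ~0: the hypothesis holds here
```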
Papers:
Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang. On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data. arXiv:2206.04723, 2022
FLOW Talk #102
April 19, 2023 @ 1pm Coordinated Universal Time (UTC)
CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning
host: Aurélien Bellet
Abstract: Federated Learning (FL) is a setting for training machine learning models in distributed environments where the clients do not share their raw data but instead send model updates to a server. However, model updates can be subject to attacks and leak private information. Differential Privacy (DP) is a leading mitigation strategy which involves adding noise to clipped model updates, trading off performance for strong theoretical privacy guarantees. Previous work has shown that the threat model of DP is conservative and that the obtained guarantees may be vacuous or may overestimate information leakage in practice. In this paper, we aim to achieve a tighter measurement of the model exposure by considering a realistic threat model. We propose a novel method, CANIFE, that uses canaries, samples carefully crafted by a strong adversary, to evaluate the empirical privacy of a training round. We apply this attack to vision models trained on CIFAR-10 and CelebA and to language models trained on Sent140 and Shakespeare. In particular, in realistic FL scenarios, we demonstrate that the empirical per-round epsilon obtained with CANIFE is 4-5x lower than the theoretical bound.
Paper:
Samuel Maddock, Alexandre Sablayrolles, Pierre Stock. CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning. ICLR 2023
FLOW Talk #101
April 12, 2023 @ 1pm Coordinated Universal Time (UTC)
FLECS: A Federated Learning Second-Order Framework via Compression and Sketching
host: Samuel Horváth
Abstract: Inspired by the recent work FedNL (Safaryan et al., FedNL: Making Newton-Type Methods Applicable to Federated Learning), we propose a new communication-efficient second-order framework for federated learning, namely FLECS. The proposed method reduces the high memory requirements of FedNL by using an L-SR1 type update for the Hessian approximation, which is stored on the central server. A low-dimensional 'sketch' of the Hessian is all that is needed by each device to generate an update, so that memory costs as well as the number of Hessian-vector products for the agent are low. Biased and unbiased compressions are utilized to make communication costs also low. Convergence guarantees for FLECS are provided in both the strongly convex and nonconvex cases, and local linear convergence is also established under strong convexity. Numerical experiments confirm the practical benefits of this new FLECS algorithm.
Paper:
Artem Agafonov, Dmitry Kamzolov, Rachael Tappenden, Alexander Gasnikov, Martin Takáč. FLECS: A Federated Learning Second-Order Framework via Compression and Sketching, arXiv:2206.02009.
FLOW Talk #100
March 29, 2023 @ 4pm Coordinated Universal Time (UTC)
EIFFeL: Ensuring Integrity for Federated Learning
[slides]
host: Aurélien Bellet
Abstract: Federated learning (FL) enables clients to collaborate with a server to train a machine learning model. To ensure privacy, the server performs secure aggregation of model updates from the clients. Unfortunately, this prevents verification of the well-formedness (integrity) of the updates as the updates are masked. Consequently, malformed updates designed to poison the model can be injected without detection. In this talk, I will formalize the problem of ensuring both update privacy and integrity in FL and present a new system, EIFFeL, that enables secure aggregation of verified updates. EIFFeL is a general framework that can enforce arbitrary integrity checks and remove malformed updates from the aggregate, without violating privacy. Further, EIFFeL is practical for real-world usage. For instance, with 100 clients and 10% poisoning, EIFFeL can train an MNIST classification model to the same accuracy as that of a non-poisoned federated learner in just 2.4s per iteration.
Paper:
Amrita Roy Chowdhury, Chuan Guo, Somesh Jha, Laurens van der Maaten. EIFFeL: Ensuring Integrity for Federated Learning. CCS 2022.
FLOW Talk #99
March 22, 2023 @ 5pm Coordinated Universal Time (UTC)
Sparse Random Networks for Communication-Efficient Federated Learning
host: Peter Richtárik
Abstract: One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial random values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a stochastic binary mask to find the optimal sparse random network within the original one. At the end of the training, the final model is a sparse network with random weights – or a sub-network inside the dense random network. We show improvements in accuracy, communication (less than 1 bit per parameter (bpp)), convergence speed, and final model size (less than 1 bpp) over relevant baselines on MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime.
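A minimal PyTorch sketch of the core building block, assuming a mask-training layer in the spirit of the approach described above: the weights are frozen at random values and each weight gets a trainable score; the forward pass samples a Bernoulli mask from the sigmoid of the scores with a straight-through gradient. Clients would train only the scores and exchange (near-)binary masks; the federated aggregation and bitrate accounting from the paper are omitted, and the names below are illustrative.

```python
import torch

class MaskedLinear(torch.nn.Module):
    """Linear layer with frozen random weights and trainable per-weight scores."""

    def __init__(self, d_in, d_out):
        super().__init__()
        w = torch.randn(d_out, d_in) / d_in ** 0.5
        self.register_buffer("weight", w)                  # frozen random weights
        self.scores = torch.nn.Parameter(torch.zeros(d_out, d_in))  # trained

    def forward(self, x):
        probs = torch.sigmoid(self.scores)
        hard = torch.bernoulli(probs).detach()             # sampled binary mask
        mask = hard + probs - probs.detach()               # straight-through grad
        return torch.nn.functional.linear(x, self.weight * mask)

layer = MaskedLinear(16, 4)
out = layer(torch.randn(8, 16))
out.sum().backward()
print(layer.scores.grad.shape)   # gradients flow to the scores, not the weights
```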
Paper:
Berivan Isik, Francesco Pase, Deniz Gunduz, Tsachy Weissman, Michele Zorzi. Sparse Random Networks for Communication-Efficient Federated Learning. ICLR 2023.
FLOW Talk #98
March 15, 2023 @ 5pm Coordinated Universal Time (UTC)
Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
[slides]
host: Samuel Horváth
Abstract: An oft-cited challenge of federated learning (FL) is the presence of heterogeneity. The data at different clients may follow very different distributions, giving rise to data heterogeneity, and client devices may have very different capabilities (compute, memory, network bandwidth), giving rise to system heterogeneity. The predominant training paradigm is local-update methods such as Federated Averaging, and several modifications have been proposed to address sources of heterogeneity. Empirical evaluations in these studies usually start federated training from a random initialization. However, in many practical applications of FL, the server may have access to some proxy data for the task that can be used to pre-train a model before starting federated training. We empirically study the impact of starting from a pre-trained model in FL. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models than is possible when starting from a random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. This study raises several questions for further work on understanding the role of heterogeneity and initialization in federated training.
Paper:
John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat. Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning, ICLR 2023.
FLOW Talk #97
March 8, 2023 @ 5pm Coordinated Universal Time (UTC)
TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels
host: Sebastian Stich
Abstract: State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is simultaneously performant for all clients, current federated optimization methods fail to converge to a comparable solution. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. Our technique yields accuracy improvements of up to +36% on FMNIST and +37% on CIFAR10 when clients have dissimilar data.
(Joint work with Alexander Wei, Sai Praneeth Karimireddy, Yi Ma, and Michael I. Jordan)
Paper:
FLOW Talk #96
March 1, 2023 @ 5pm Coordinated Universal Time (UTC)
Federated Automatic Differentiation
[slides]
host: Peter Richtárik
Abstract: Federated learning (FL) is a general framework for learning across heterogeneous clients while preserving data privacy, under the orchestration of a central server. FL methods often compute gradients of loss functions purely locally (i.e., entirely at each client or entirely at the server), typically using automatic differentiation (AD) techniques. We propose a federated automatic differentiation (FAD) framework that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology. In other words, FAD computes derivatives across communication boundaries. We show, in analogy with traditional AD, that FAD may be implemented using various accumulation modes, which introduce distinct computation-communication trade-offs and systems requirements. Further, we show that a broad class of federated computations is closed under these various modes of FAD, implying in particular that if the original computation can be implemented using privacy-preserving primitives, its derivative may be computed using only these same primitives. We then show how FAD can be used to create algorithms that dynamically learn components of the algorithm itself. In particular, we show that FedAvg-style algorithms can exhibit significantly improved performance by using FAD to adjust the server optimization step automatically, or by using FAD to learn weighting schemes for computing weighted averages across clients.
Paper:
Keith Rush, Zachary Charles, Zachary Garrett. Federated Automatic Differentiation. arXiv:2301.07806, 2023
FLOW Talk #95
February 22, 2023 @ 1pm Coordinated Universal Time (UTC)
Combining federated learning and split learning, and a distributed machine learning framework with strict access control techniques for privacy and security
host: Samuel Horváth
Abstract: Federated learning (FL) and split learning (SL) provide default data privacy by following a model-to-data scenario; clients train and test machine learning models without sharing raw data. For faster model training in a resource-constrained environment with several clients, FL and SL need to be blended to leverage their advantages jointly. In this regard, we present SplitFed learning (SFL). Moreover, we further discuss the comparative training performance of FL, SL and SFL under real-world device settings, e.g., Raspberry Pi. FL, SL and SFL are suitable for model development on data that is highly sensitive, illegal to possess, or psychologically harmful; however, additional measures are required within the machine learning framework, including strict control, monitoring, and examination of all activities involved, such as communication, execution, and the release of algorithms, datasets, outputs, and results. Thus, we present a new multi-zoned framework called MaLFraDA. MaLFraDA has soft air gaps between its zones to isolate and control communication in and out of the framework.
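For readers new to split learning, here is a toy NumPy version of the vanilla protocol for one client and a two-layer regression model: the client computes the forward pass up to the cut layer and sends only activations; the server finishes the forward/backward pass and returns the gradient with respect to those activations so the client can update its own layers. SplitFed additionally runs many clients in parallel and averages the client-side parts, and MaLFraDA's access controls are not modeled; the sizes and data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(32, 8)), rng.normal(size=(32, 1))
W1 = rng.normal(size=(8, 16)) * 0.1    # client-side weights (up to the cut layer)
W2 = rng.normal(size=(16, 1)) * 0.1    # server-side weights
lr = 0.01

for step in range(100):
    # Client: forward pass up to the cut layer; only activations A are sent.
    A = np.maximum(X @ W1, 0.0)
    # Server: completes the forward pass, computes the loss gradient,
    # updates its own weights, and sends back the gradient w.r.t. A.
    pred = A @ W2
    err = pred - Y                         # d(MSE)/d(pred), up to a constant
    gW2 = A.T @ err / len(X)
    gA = err @ W2.T
    W2 -= lr * gW2
    # Client: finishes backprop locally using the returned activation gradient.
    gW1 = X.T @ (gA * (A > 0)) / len(X)
    W1 -= lr * gW1
```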
Paper:
Chandra Thapa, Mahawaga Arachchige Pathum Chamikara, Seyit Camtepe, Lichao Sun. Splitfed: When federated learning meets split learning. AAAI 2022
Yansong Gao, Minki Kim, Chandra Thapa, Sharif Abuadbba, Zhi Zhang, Seyit A Camtepe, Hyoungshick Kim, Surya Nepal. Evaluation and Optimization of Distributed Machine Learning Techniques for Internet of Things. IEEE Transactions on Computers 2022
Chandra Thapa, Seyit Camtepe, Raj Gaire, Surya Nepal, Seung Ick Jang. Demo-MaLFraDA: A Machine Learning Framework with Data Airlock. ACM CCS 2022
FLOW Talk #94
February 15, 2023 @ 1pm Coordinated Universal Time (UTC)
Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes
host: Samuel Horváth
Abstract: In this work, we consider the problem of minimizing the sum of Moreau envelopes of given functions, which has previously appeared in the context of meta-learning and personalized federated learning. In contrast to the existing theory that requires running subsolvers until a certain precision is reached, we only assume that a finite number of gradient steps is taken at each iteration. As a special case, our theory allows us to show the convergence of First-Order Model-Agnostic Meta-Learning (FO-MAML) to the vicinity of a solution of the Moreau objective. We also study a more general family of first-order algorithms that can be viewed as a generalization of FO-MAML. Our main theoretical contribution is an improvement upon the inexact SGD framework. In particular, our perturbed-iterate analysis allows for tighter guarantees that improve the dependency on the problem's conditioning. In contrast to the related work on meta-learning, ours does not require any assumptions on Hessian smoothness, and can leverage smoothness and convexity of the reformulation based on Moreau envelopes. Furthermore, to fill the gaps in the comparison of FO-MAML to the Implicit MAML (iMAML), we show that the objective of iMAML is neither smooth nor convex, implying that it has no convergence guarantees based on the existing theory.
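To make the objective concrete, the NumPy sketch below minimizes the average of Moreau envelopes of toy quadratic task losses, approximating each proximal point with a fixed, small number of inner gradient steps. This is the kind of inexact first-order scheme the analysis covers (FO-MAML corresponds to a particular choice of the inner procedure); all constants and the quadratic tasks are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 8, 5, 1.0
C = rng.normal(size=(n, d))                 # task losses f_i(z) = 0.5 * ||z - C[i]||^2

def approx_prox(x, ci, inner_steps=3, inner_lr=0.2):
    """A few gradient steps on f_i(z) + ||z - x||^2 / (2*lam): an inexact prox."""
    z = x.copy()
    for _ in range(inner_steps):
        z -= inner_lr * ((z - ci) + (z - x) / lam)
    return z

x, outer_lr = np.zeros(d), 0.5
for t in range(200):
    # Gradient of the Moreau envelope of f_i is (x - prox_i(x)) / lam;
    # here the prox is replaced by its finite-step approximation.
    g = np.mean([(x - approx_prox(x, C[i])) / lam for i in range(n)], axis=0)
    x -= outer_lr * g
```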
Paper:
K. Mishchenko, S. Hanzely, P. Richtárik. Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes. arXiv:2301.06806, 2023.
FLOW Talk #93
February 8, 2023 @ 1pm Coordinated Universal Time (UTC)
FedPop: A Bayesian Approach for Personalised Federated Learning
host: Aurélien Bellet
Abstract: Personalised federated learning (FL) aims at collaboratively learning a machine learning model tailored to each client. Although promising advances have been made in this direction, most existing approaches do not allow for uncertainty quantification, which is crucial in many applications. In addition, personalisation in the cross-device setting still involves important issues, especially for new clients or those with a small number of observations. This paper aims at filling these gaps. To this end, we propose a novel methodology coined FedPop by recasting personalised FL into the population modeling paradigm where clients' models involve fixed common population parameters and random effects, aiming at explaining data heterogeneity. To derive convergence guarantees for our scheme, we introduce a new class of federated stochastic optimisation algorithms which relies on Markov chain Monte Carlo methods. Compared to existing personalised FL methods, the proposed methodology has important benefits: it is robust to client drift, practical for inference on new clients, and above all, enables uncertainty quantification under mild computational and memory overheads. We provide non-asymptotic convergence guarantees for the proposed algorithms and illustrate their performances on various personalised federated learning tasks.
Paper:
Nikita Kotelevskii, Maxime Vono, Eric Moulines and Alain Durmus. FedPop: A Bayesian Approach for Personalised Federated Learning. NeurIPS 2022.
FLOW Talk #92
February 1, 2023 @ 1pm Coordinated Universal Time (UTC)
Decentralized Constrained Optimization, Double Averaging and Gradient Projection
host: Samuel Horváth
Abstract: We consider a generic decentralized constrained optimization problem over static, directed communication networks, where each agent has exclusive access to only one convex, differentiable, local objective term and one convex constraint set. For this setup, we propose a novel decentralized algorithm, called DAGP (Double Averaging and Gradient Projection), based on local gradients, projection onto local constraints, and local averaging. We achieve global optimality through a novel distributed tracking technique we call distributed null projection. Further, we show that DAGP can also be used to solve unconstrained problems with non-differentiable objective terms, by employing the so-called epigraph projection operators (EPOs). In this regard, we introduce a new fast algorithm for evaluating EPOs. We study the convergence of DAGP and establish O(1/√K) convergence in terms of feasibility, consensus, and optimality. For this reason, we forego the difficulties of selecting Lyapunov functions by proposing a new methodology of convergence analysis in optimization problems, which we refer to as aggregate lower-bounding. To demonstrate the generality of this method, we also provide an alternative convergence proof for the gradient descent algorithm for smooth functions. Finally, we present numerical results demonstrating the effectiveness of our proposed method in both constrained and unconstrained problems.
Paper:
Firooz Shahriari-Mehr and Ashkan Panahi. Decentralized Constrained Optimization, Double Averaging and Gradient Projection. arXiv:2210.03232, 2022.
FLOW Talk #91
January 18, 2023 @ 5pm Coordinated Universal Time (UTC)
Leveraging Spatial and Temporal Correlations in Distributed Learning
host: Samuel Horváth
Abstract: Distributed mean estimation is a central component of federated learning. In this talk, I will present work on the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for the nodes to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments to show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
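The decoder-side idea is simple to demonstrate. In the toy NumPy example below, each node sends a rand-k sparsified vector; a naive server decoder treats unsent coordinates as zero, while a correlation-aware decoder falls back to the previous round's estimate, exploiting temporal correlation. The exact estimators and guarantees in the paper differ; this sketch only illustrates why modifying the decoding alone can help, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10, 100, 10                 # nodes, dimension, coordinates sent per node

# Temporally correlated data: this round's vectors are close to last round's.
prev = rng.normal(size=(n, d))
curr = prev + 0.05 * rng.normal(size=(n, d))

naive_est = np.zeros(d)
corr_est = np.zeros(d)
prev_mean = prev.mean(axis=0)         # server's estimate from the previous round

for i in range(n):
    idx = rng.choice(d, size=k, replace=False)        # rand-k sparsification
    # Naive decoding: unsent coordinates are treated as zero (unbiased scaling).
    naive_i = np.zeros(d)
    naive_i[idx] = curr[i, idx] * d / k
    naive_est += naive_i / n
    # Correlation-aware decoding: unsent coordinates fall back to the previous
    # round's estimate instead of zero (only the server-side decoder changes).
    corr_i = prev_mean.copy()
    corr_i[idx] = curr[i, idx]
    corr_est += corr_i / n

true_mean = curr.mean(axis=0)
print("naive decoder error:            ", np.linalg.norm(naive_est - true_mean))
print("correlation-aware decoder error:", np.linalg.norm(corr_est - true_mean))
```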
Paper:
Divyansh Jhunjhunwala, Ankur Mallick, Advait Gadhikar, Swanand Kadhe and Gauri Joshi. Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation. Advances in Neural Information Processing Systems (NeurIPS), 2021.