All times are in Pacific Time (GMT-7).
08:45 Opening Remarks
09:00 Himanshu Tyagi
Information-Constrained Optimization: Can Adaptive Processing of Gradients Help?
We derive lower bounds on the convergence rate of stochastic optimization with noisy gradients when only limited information about those gradients is available. Examples of such information constraints include local privacy constraints, communication constraints, and constraints on the number of gradient coordinates that can be computed. Existing lower bounds for this problem apply only to convex functions and do not allow information about the gradients to be obtained adaptively. In contrast, our lower bounds cover adaptive procedures for obtaining information about the gradients, which lets us examine whether adaptivity helps. We show that for both convex and strongly convex functions, and for all three information constraints above, adaptivity does not improve the convergence rate. Nevertheless, we exhibit an interesting structured least-squares problem where adaptivity does lead to significantly faster optimization.
This talk is based on joint work with Jayadev Acharya, Clément Canonne, and Prathamesh Mayekar.
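As a toy illustration of one information constraint mentioned above (not the talk's lower-bound construction), the sketch below runs SGD with a non-adaptive oracle that reveals only k coordinates of each gradient, rescaled so the update stays unbiased. All names, step sizes, and the quadratic objective are illustrative.

```python
import numpy as np

def coord_limited_grad(grad_fn, w, k, rng):
    """Information-constrained oracle: reveal only k randomly chosen
    coordinates of the gradient, with non-adaptive (uniform) selection."""
    g = np.zeros_like(w)
    idx = rng.choice(w.size, size=k, replace=False)
    full = grad_fn(w)
    g[idx] = full[idx] * (w.size / k)   # rescale so E[g] = full gradient
    return g

rng = np.random.default_rng(1)
w = np.ones(10)
grad_fn = lambda w: w                   # gradient of f(w) = 0.5 * ||w||^2
for _ in range(500):
    w -= 0.05 * coord_limited_grad(grad_fn, w, k=2, rng=rng)
```

Revealing fewer coordinates per step slows convergence, which is the kind of trade-off the lower bounds in the talk quantify.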
Himanshu Tyagi: Workshop on Communication Efficient Distributed Optimization
10:00 Martin Jaggi
Communication-Efficient Distributed Deep Learning
We discuss gradient compression methods for communication-efficient distributed training, in centrally coordinated as well as fully decentralized settings. In particular, we demonstrate that low-rank linear compressors applied to model differences allow fast encoding and decoding, as well as efficient aggregation between workers, while maintaining training and test accuracy. The key building blocks are the linearity of the power iteration, applied to the fast-evolving gradient matrix, paired with error feedback. We present empirical results showing reduced training times for many neural network architectures with our open-source code, as well as theoretical convergence rates for the methods, which also apply to heterogeneous data and are asymptotically independent of the network topology and compression ratio.
This talk is based on joint work with Sai Praneeth Karimireddy & Thijs Vogels.
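A minimal sketch of the idea described above, in the spirit of the speakers' open-source PowerSGD work (the exact algorithm may differ): one power-iteration step, which is linear in the gradient matrix and so aggregates cheaply across workers, compresses the matrix to a rank-1 pair, and error feedback carries the dropped residual into the next round. Shapes and the warm-started factor are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4))     # stand-in for one layer's gradient matrix
error = np.zeros_like(G)            # error-feedback memory

M = G + error                       # add back the residual dropped last round
q = rng.standard_normal(4)          # warm-started right factor
p = M @ q                           # one power-iteration step: left factor
p /= np.linalg.norm(p)              # normalize; p and q are all that is sent
q = M.T @ p                         # right factor
approx = np.outer(p, q)             # rank-1 reconstruction of the update
error = M - approx                  # remember what compression dropped
```

Because `p = M @ q` is linear in M, workers can average their factors directly instead of exchanging full gradient matrices.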
11:00 Poster Session
13:00 Virginia Smith
Heterogeneity Meets Communication-Efficiency: Challenges and Opportunities
A defining trait of federated learning is the presence of heterogeneity, i.e., that data and systems characteristics may differ significantly across the network. In this talk I show that the challenge of heterogeneity pervades the machine learning process in federated settings, affecting issues such as optimization, modeling, and fairness. In terms of optimization, I discuss distributed optimization methods that offer robustness to systems and statistical heterogeneity. I then explore the role that heterogeneity plays in delivering models that are accurate and fair to all users/devices in the network. Finally, I consider scenarios where heterogeneity may in fact afford benefits to distributed learning, through recent work in one-shot federated clustering.
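One concrete form the heterogeneity-robust optimization mentioned above can take is a proximal local update in the style of FedProx (a plausible illustration, not a confirmed outline of the talk): each client adds a term pulling its local iterate back toward the global model, limiting client drift. The quadratic client objective and all constants below are toy choices.

```python
import numpy as np

def local_update(w_global, grad_fn, lr=0.05, mu=0.1, steps=10):
    """Run local steps on the proximal objective
    f_k(w) + (mu/2) * ||w - w_global||^2 for one client."""
    w = w_global.copy()
    for _ in range(steps):
        w -= lr * (grad_fn(w) + mu * (w - w_global))
    return w

# toy client: quadratic loss with a client-specific optimum (heterogeneity)
target = np.array([1.0, -2.0])
grad_fn = lambda w: w - target          # gradient of 0.5 * ||w - target||^2

w_global = np.zeros(2)
w_local = local_update(w_global, grad_fn)
w_local_strong = local_update(w_global, grad_fn, mu=5.0)  # stronger pull
```

A larger mu keeps the local solution closer to the global model, trading personalization for stability under statistical heterogeneity.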
14:00 Aryan Mokhtari
Towards Communication-Efficient Personalized Federated Learning via Representation Learning and Meta-Learning
In Federated Learning (FL), we aim to train models across multiple computing units (users), while users can only communicate with a shared central server without exchanging their data samples. This mechanism exploits all users' computational power and allows users to obtain a richer model, as their models are trained over a larger set of data points. However, traditional FL schemes often produce a single common model for all users and do not yield a personalized model for each user. This is an important missing feature, especially given the heterogeneity of the underlying data distribution across users. In this talk, we address this issue by leveraging tools from (i) representation learning and (ii) meta-learning. In the first part of the talk, we study the case in which users' data share a common representation. In particular, we discuss a novel and communication-efficient federated learning framework for learning a shared data representation across clients and unique local heads for each client. Our result shows provable sample complexity benefits for FL in the presence of data heterogeneity, both for original users and for new users entering after a representation has been learned. In the second part of the talk, we focus on settings in which there is either no common structure between the given training tasks or the structure is unknown. In such cases, our goal is to obtain a global model for all users that can be adapted to each user's task with a few computationally cheap local operations. We show how this goal can be achieved by exploiting ideas from Model-Agnostic Meta-Learning (MAML).
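The second part's goal can be illustrated with a toy MAML-style update on quadratic tasks (an illustrative sketch, not the speakers' exact method): each task adapts the shared initialization with one cheap gradient step, and the meta-gradient of the post-adaptation loss updates that initialization. Task optima and learning rates below are made up for the example.

```python
import numpy as np

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One MAML outer update for toy tasks f_i(w) = 0.5 * ||w - c_i||^2."""
    meta_grad = np.zeros_like(w)
    for c in tasks:
        grad = w - c                        # inner (task) gradient
        w_adapt = w - inner_lr * grad       # one cheap local adaptation step
        # meta-gradient of the post-adaptation loss; for these quadratics the
        # Hessian is the identity, so the correction is just (1 - inner_lr)
        meta_grad += (1 - inner_lr) * (w_adapt - c)
    return w - outer_lr * meta_grad / len(tasks)

tasks = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]  # heterogeneous optima
w = np.zeros(2)
for _ in range(200):
    w = maml_step(w, tasks)
```

For these tasks the learned initialization settles between the task optima, so each user reaches its own optimum after a single cheap local step.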