Research

Federated Learning with Buffered Asynchronous Aggregation.

John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Esmaeili Malek, and Dzmitry Huba. arXiv preprint arXiv:2106.06639, 2021 (FULL VERSION, under review)

John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Esmaeili Malek, and Dzmitry Huba. In International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with International Conference on Machine Learning 2021 (FL-ICML’21), 2021 (WORKSHOP VERSION)

ABSTRACT Federated Learning (FL) trains a shared model across distributed devices while keeping the training data on the devices. Most FL schemes are synchronous: they perform a synchronized aggregation of model updates from individual devices. Synchronous training can be slow because of late-arriving devices (stragglers). On the other hand, completely asynchronous training makes FL less private because of incompatibility with secure aggregation. In this work, we propose a model aggregation scheme, FedBuff, that combines the best properties of synchronous and asynchronous FL. Similar to synchronous FL, FedBuff is compatible with secure aggregation. Similar to asynchronous FL, FedBuff is robust to stragglers. In FedBuff, clients train asynchronously and send updates to the server. The server aggregates client updates in a private buffer until K updates have been received, at which point a server model update is immediately performed. We provide theoretical convergence guarantees for FedBuff in a non-convex setting. Our theoretical results yield insights into different stages of the optimization process, and trade-offs between communication and computation under asynchronicity.

Empirically, FedBuff converges up to 3.8X faster than previous proposals for synchronous FL (e.g., FedAvgM), and up to 2.5X faster than previous proposals for asynchronous FL (e.g., FedAsync). We show that FedBuff is robust to different staleness distributions and is more scalable than synchronous FL techniques.
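
A minimal server-side sketch of the buffered aggregation loop described above, assuming model weights are a NumPy array and that clients push (update, staleness) pairs into a thread-safe queue as they finish local training; the function name, the queue interface, and the polynomial staleness down-weighting are illustrative assumptions, not the paper's reference implementation.

import numpy as np
from queue import Queue

def fedbuff_server(weights, update_queue: Queue, K=10, server_lr=1.0, num_server_steps=100):
    # Aggregate client updates in a buffer; apply a server step every K arrivals.
    for _ in range(num_server_steps):
        buffer = np.zeros_like(weights)
        for _ in range(K):
            delta, staleness = update_queue.get()       # blocks until a client finishes
            buffer += delta / (1.0 + staleness) ** 0.5  # down-weight stale updates (one possible choice)
        weights = weights + server_lr * buffer / K      # one server model update
    return weights

Note that, unlike fully asynchronous schemes, individual updates are only ever consumed as a sum over K clients, which is what keeps the scheme compatible with secure aggregation.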

Convex Latent Effect Logit Model via Sparse and Low-rank Decomposition.

ABSTRACT In this paper, we propose a convex formulation for learning a logistic regression model (logit) with latent heterogeneous effects across sub-populations. In transportation, logistic regression and its variants are often interpreted as discrete choice models under utility theory (McFadden, 2001). Two prominent applications of logit models in the transportation domain are traffic accident analysis and choice modeling. In these applications, researchers often want to understand and capture the individual variation under the same accident or choice scenario. The mixed effect logistic regression (mixed logit) is a popular model employed by transportation researchers. To estimate the distribution of mixed logit parameters, a non-convex optimization problem with nested high-dimensional integrals needs to be solved. Simulation-based optimization is typically applied to solve the mixed logit parameter estimation problem. Despite its popularity, the mixed logit approach for learning individual heterogeneity has several downsides. First, the parametric form of the distribution requires domain knowledge and assumptions imposed by users, although this issue can be addressed to some extent by using a non-parametric approach. Second, the optimization problems arising from parameter estimation for mixed logit and its non-parametric extensions are non-convex, which leads to unstable model interpretation. Third, the simulation size in simulation-assisted estimation lacks finite-sample theoretical guarantees and is chosen somewhat arbitrarily in practice. To address these issues, we develop a formulation that models latent individual heterogeneity while preserving convexity and avoiding the need for simulation-based approximation. Our setup is based on decomposing the parameters into a sparse homogeneous component shared by the population and low-rank heterogeneous parts for each individual.
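
To make the last sentence concrete, one convex formulation in this spirit (the notation below is ours, for illustration) stacks the per-individual effects into a matrix and relaxes sparsity and low rank with the l1 and nuclear norms, so the whole objective stays convex:

% Illustrative sparse-plus-low-rank convex relaxation (notation assumed, not from the paper).
% beta: homogeneous effect shared across the population (l1 norm promotes sparsity);
% Delta = [delta_1, ..., delta_n]: per-individual heterogeneous effects, with the
% nuclear norm ||Delta||_* as a convex surrogate for low rank.
\min_{\beta,\;\Delta}\;
  \sum_{i=1}^{n} \log\bigl(1 + \exp\bigl(-y_i\, x_i^{\top}(\beta + \delta_i)\bigr)\bigr)
  \;+\; \lambda_1 \|\beta\|_1 \;+\; \lambda_{*} \|\Delta\|_{*}

Because the logistic loss and both penalties are convex, a formulation of this kind has no spurious local minima and needs no simulation-based approximation of an integral over a mixing distribution.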

Evaluating Lottery Tickets under Distributional Shifts.

Shrey Desai, Hongyuan Zhan, and Ahmed Aly. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 153–162, 2019

ABSTRACT The Lottery Ticket Hypothesis (Frankle and Carbin, 2019) suggests that large, over-parameterized neural networks contain small, sparse subnetworks that can be trained in isolation to reach similar (or better) test accuracy. However, the initialization and generalizability of the obtained sparse subnetworks have recently been called into question. Our work focuses on evaluating the initialization of sparse subnetworks under distributional shifts. Specifically, we investigate the extent to which a sparse subnetwork obtained in a source domain can be re-trained in isolation in a dissimilar target domain. In addition, we examine the effects of different initialization strategies at transfer time. Our experiments show that sparse subnetworks obtained through lottery ticket training do not simply overfit to particular domains, but rather reflect an inductive bias of deep neural networks that can be exploited in multiple domains.
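
A PyTorch-style sketch of this evaluation protocol: prune in the source domain, choose an initialization strategy for the surviving weights, and re-train the sparse subnetwork in isolation in the target domain. Here train is a stand-in for an ordinary training loop (assumed to keep pruned weights at zero when given masks), and all names are illustrative.

import copy
import torch

def magnitude_mask(model, sparsity):
    # Global magnitude pruning: keep the largest-magnitude (1 - sparsity) fraction of weights.
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    threshold = torch.quantile(scores, sparsity)
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters()}

def transfer_ticket(model, source_loader, target_loader, train, sparsity=0.9, init="rewind"):
    init_state = copy.deepcopy(model.state_dict())    # theta_0, saved for rewinding
    train(model, source_loader)                       # train to convergence in the source domain
    masks = magnitude_mask(model, sparsity)           # the "winning ticket" structure

    if init == "rewind":                              # re-use the original initialization
        model.load_state_dict(init_state)
    elif init == "random":                            # or re-initialize the surviving weights
        for module in model.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()

    with torch.no_grad():                             # zero out the pruned weights
        for name, p in model.named_parameters():
            p.mul_(masks[name])
    train(model, target_loader, masks=masks)          # re-train in isolation in the target domain
    return model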

Efficient Online Hyperparameter Learning for Traffic Flow Prediction.

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, and Kesheng Wu. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pages 164–169. IEEE, 2018

Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction.

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, and Kesheng Wu. arXiv preprint arXiv:1811.00620, 2018 (Extended Version)

ABSTRACT Computational efficiency is an important consideration for deploying machine learning models for time series prediction in an online setting. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. Hyperparameters can significantly impact prediction accuracy. Traffic measurements, typically collected online by sensors, are serially correlated. Moreover, the data distribution may change gradually. A typical adaptation strategy is periodically re-tuning the model hyperparameters, at the cost of computational burden. In this work, we present an efficient and principled online hyperparameter optimization algorithm for kernel ridge regression applied to traffic prediction problems. In tests with real traffic measurement data, our approach requires as little as one-seventh of the computation time of other tuning methods, while achieving better or similar prediction accuracy.
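
A toy sketch of the general idea (not the paper's exact algorithm): after each new observation arrives, take a finite-difference gradient step on the one-step-ahead squared error with respect to the log-hyperparameters of kernel ridge regression, rather than periodically re-running a full grid search. The squared-exponential kernel, the sliding window, and all names are assumptions made for this example.

import numpy as np

def rbf_kernel(X, Z, lengthscale):
    # Squared-exponential kernel between row-wise data matrices X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def krr_predict(X_train, y_train, x_new, lengthscale, lam):
    # Standard kernel ridge regression prediction at a single new point.
    K = rbf_kernel(X_train, X_train, lengthscale)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return (rbf_kernel(x_new[None, :], X_train, lengthscale) @ alpha)[0]

def online_tune(stream, window=50, lr=0.05, eps=1e-3):
    # stream yields (features, target) pairs in arrival order.
    log_ls, log_lam = 0.0, -2.0
    X, y = [], []
    for x_t, y_t in stream:
        if len(X) >= 5:
            Xw, yw = np.array(X[-window:]), np.array(y[-window:])
            def loss(lls, llam):
                pred = krr_predict(Xw, yw, x_t, np.exp(lls), np.exp(llam))
                return (pred - y_t) ** 2
            base = loss(log_ls, log_lam)
            log_ls -= lr * (loss(log_ls + eps, log_lam) - base) / eps   # finite-difference
            log_lam -= lr * (loss(log_ls, log_lam + eps) - base) / eps  # gradient steps
        X.append(x_t)
        y.append(y_t)
    return np.exp(log_ls), np.exp(log_lam)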

Consensus Ensemble System for Traffic Flow Prediction.

Hongyuan Zhan, Gabriel Gomes, Xiaoye S Li, Kamesh Madduri, Alex Sim, and Kesheng Wu. IEEE Transactions on Intelligent Transportation Systems, 19(12):3903–3914, 2018

ABSTRACT Traffic flow prediction is a key component of an intelligent transportation system. Accurate traffic flow prediction provides a foundation for other tasks such as signal coordination and travel time forecasting. There are many known methods in the literature for the short-term traffic flow prediction problem, but their efficacy depends heavily on the traffic characteristics. It is difficult, if not impossible, to pick a single method that works well over time. In this work, we present an automated framework to address this practical issue. Instead of selecting a single method, we combine predictions from multiple methods to generate a consensus traffic flow prediction. We propose an ensemble learning model that exploits the temporal characteristics of the data, and balances the accuracy of individual models and their mutual dependence through a covariance-regularizer. We additionally use a pruning scheme to remove anomalous individual predictions. We apply our proposed model to multi-step-ahead arterial roadway flow prediction. In tests, our method consistently outperforms recently published ensemble prediction methods based on Ridge Regression and Lasso. Our method also produces steady results even when the standalone models and other ensemble methods make wildly exaggerated predictions.
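
An illustrative sketch of a covariance-regularized consensus combination (the paper's exact estimator, pruning scheme, and multi-step extension are not reproduced here; the objective and names below are assumptions): choose ensemble weights that trade off each standalone model's recent error against the covariance of errors between models, then form the consensus prediction as a weighted sum.

import numpy as np

def consensus_weights(errors, gamma=1.0, steps=200, lr=0.1):
    # errors: (T, m) matrix of past residuals from m standalone predictors.
    mse = (errors ** 2).mean(axis=0)        # individual accuracy term
    C = np.cov(errors, rowvar=False)        # mutual-dependence (covariance) regularizer
    m = errors.shape[1]
    w = np.full(m, 1.0 / m)
    for _ in range(steps):
        grad = mse + 2.0 * gamma * C @ w
        w = np.clip(w - lr * grad, 0.0, None)   # clip-and-renormalize heuristic,
        w /= max(w.sum(), 1e-12)                # not an exact simplex projection
    return w

def consensus_predict(preds, w):
    # preds: (m,) current predictions from the standalone models.
    return preds @ w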

HIGHLIGHTS Methods developed in this paper are used by the California Department of Transportation (Caltrans) in the Connected Corridors project. Media coverage: 1, 2, 3