Xiaohan Wei

Biography

I am a Staff Research Scientist at Meta working on Ads ML modeling and large-scale distributed training. I received my Ph.D. in Electrical Engineering from the University of Southern California (USC), co-advised by Michael J. Neely (Department of EE) and Stanislav Minsker (Department of Mathematics). I obtained my B.S. from the University of Science and Technology of China (USTC) in 2012 and my M.S. from USC in 2014. My undergraduate advisor was Qing Ling. My main research interests are robust statistics and stochastic optimization.

Google Scholar

Research Areas

Theoretical Machine Learning & Compressed Sensing:

Recoverability of high-dimensional structured information with heavy-tailed measurements
Stein’s method and its application to compressed sensing with nonlinear transformations
Robust covariance matrix estimation and U-statistics under weak moment assumptions

Algorithm Design & Convergence Analysis:

Near-optimal algorithms for constrained Markov Decision Processes (MDPs) and bandit systems
Convergence time analysis for constrained stochastic optimization, renewal optimization and online MDPs
Applications to data center server provisioning, wireless file transmission and sensor network scheduling

PhD Thesis:

Part I: ASYNCHRONOUS OPTIMIZATION OVER WEAKLY COUPLED RENEWAL SYSTEMS. PDF
Part II: HIGH DIMENSIONAL ESTIMATION UNDER WEAK MOMENT ASSUMPTIONS: STRUCTURED RECOVERY AND MATRIX ESTIMATION. PDF

Conference papers:

Provably efficient generalized Lagrangian policy optimization for safe multi-agent reinforcement learning, Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, and Mihailo R. Jovanovic, 5th Annual Conference on Learning for Dynamics & Control, 2023.
AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models, Fan Lai, Wei Zhang, Rui Liu, William Tsai, Xiaohan Wei, Yuxi Hu, Sabin Devkota, Jianyu Huang, Jongsoo Park, Xing Liu, Zeliang Chen, Ellie Wen, Paul Rivera, Jie You, Jason Chen, Mosharaf Chowdhury, OSDI 2023.
Gradient-Variation Bound for Online Convex Optimization with Constraints, Shuang Qiu, Xiaohan Wei, Mladen Kolar, AAAI 2023.
Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits, Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan, ICLR 2022.
Hierarchical Training: Scaling Deep Recommendation Models on Large CPU Clusters, Yuzhen Huang, Xiaohan Wei, Xing Wang, Jiyan Yang, Bor-Yiing Su, Shivam Bharuka, Dhruv Choudhary, Zewei Jiang, Hai Zheng, Jack Langman, KDD 2021.
Fast Distributed Training of Deep Neural Networks: Dynamic Communication Thresholding for Model and Data Parallelism, Vipul Gupta, Dhruv Choudhary, Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Michael W. Mahoney, Kannan Ramchandran, KDD 2021. [arXiv]
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions, Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang, International Conference on Machine Learning (ICML), 2021. [Long talk, top 15% of accepted papers]
Byzantine-resilient Distributed Learning under Constraints, Dongsheng Ding, Xiaohan Wei, Hao Yu, Mihailo R. Jovanovic, American Control Conference (ACC), 2021.
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization, Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanovic, AISTATS 2021. [Oral presentation, 48 of 455 accepted papers]
Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions, Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang, NeurIPS 2020.
Robust One-Bit Recovery via ReLU Generative Networks: Near Optimal Statistical Rates and Global Landscape Analysis, Shuang Qiu*, Xiaohan Wei*, Zhuoran Yang, International Conference on Machine Learning (ICML), 2020. (*=Equal Contribution) arXiv
- A short version was selected as a long talk with discussion at the NeurIPS'19 Workshop on Solving Inverse Problems with Deep Networks.
Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual optimization, Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanovic, NeurIPS'19 OptRL. arXiv
Online Primal-Dual Mirror Descent under Stochastic Constraints, Xiaohan Wei, Hao Yu, Michael J. Neely, ACM SIGMETRICS, 2020. Slides
Distributed Robust Statistical Learning: Byzantine Dual Averaging, Dongsheng Ding, Xiaohan Wei, Mihailo R. Jovanovic, IEEE Conference on Decision and Control (CDC), 2019.
On the Statistical Rate of Nonlinear Recovery in Generative Models with Heavy-tailed Data, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, International Conference on Machine Learning (ICML), 2019.
Solving Non-smooth Constrained Programs with Lower Complexity than O(1/ε): A Primal-Dual Homotopy Smoothing Approach, Xiaohan Wei, Hao Yu, Qing Ling, Michael J. Neely, 32nd Conference on Neural Information Processing Systems (NIPS), 2018.
Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study, Xiaohan Wei, Hao Yu, Michael J. Neely, ACM SIGMETRICS, 2018. Abstract Slides
Estimation of Covariance Structure of Heavy-tailed Distributions, Stanislav Minsker, Xiaohan Wei (alphabetical order), 31st Conference on Neural Information Processing Systems (NIPS), 2017.
Online Convex Optimization with Stochastic Constraints, Hao Yu, Michael J. Neely, Xiaohan Wei, 31st Conference on Neural Information Processing Systems (NIPS), 2017.
Robust Group LASSO Over Decentralized Networks, Manxi Wang, Yongcheng Li, Xiaohan Wei, Qing Ling, IEEE Proceedings GlobalSIP (Oral), 2016.
Delay Optimal Power Aware Opportunistic Scheduling with Mutual Information Accumulation, Xiaohan Wei, Michael J. Neely, IEEE Proceedings WiOpt, 2016.
Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach, Xiaohan Wei, Michael J. Neely, IEEE Proceedings WiOpt, 2014.

Journal papers:

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale, Zhaoxia Summer Deng, Jongsoo Park, Ping Tak Peter Tang, Haixin Liu, Jie Yang, Hector Yuen, Jianyu Huang, Daya S Khudia, Xiaohan Wei, Ellie Wen, Dhruv Choudhary, Raghuraman Krishnamoorthi, Carole-Jean Wu, Nadathur Satish, Changkyu Kim, Maxim Naumov, Sam Naghshineh, Misha Smelyanskiy, IEEE Micro, doi: 10.1109/MM.2021.3081981.
Online Primal-Dual Mirror Descent under Stochastic Constraints, Xiaohan Wei, Hao Yu, Michael J. Neely, Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2020. arXiv
- A short version was invited for presentation at IOS 2020.
Moment inequalities for matrix-valued U-statistics of order 2, Stanislav Minsker, Xiaohan Wei (alphabetical order), Electronic Journal of Probability (EJP), 2019. arXiv
Robust Modifications of U-statistics and Applications to Covariance Estimation Problems, Stanislav Minsker, Xiaohan Wei (alphabetical order), Bernoulli, 2019. arXiv
Robust Group Lasso: Model and Recoverability, Xiaohan Wei, Qing Ling, Zhu Han, Linear Algebra and Its Applications (LAA), 2018.
Structured Signal Recovery from Non-linear and Heavy-tailed Measurements, Larry Goldstein, Stanislav Minsker, Xiaohan Wei (alphabetical order), IEEE Transactions on Information Theory, 2018.
Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study, Xiaohan Wei, Hao Yu, Michael J. Neely, Proceedings of the ACM on Measurement and Analysis of Computing Systems 2.1 (2018): 12.
Asynchronous Optimization over Weakly Coupled Renewal Systems, Xiaohan Wei, Michael J. Neely, INFORMS Stochastic Systems, to appear, 2017.
Non-Gaussian Observations in Nonlinear Compressed Sensing via Stein Discrepancies, Larry Goldstein, Xiaohan Wei (alphabetical order), Information and Inference: A Journal of the IMA, iay006, 2018.
Data Center Server Provision: Distributed Asynchronous Control for Coupled Renewal Systems, Xiaohan Wei, Michael J. Neely, IEEE/ACM Trans. Networking, 25(4), pp. 2180-2194, 2017.
Power Aware Wireless File Downloading: A Lyapunov Indexing Approach to a Constrained Restless Bandit Problem, Xiaohan Wei, Michael J. Neely, IEEE/ACM Trans. Networking, 24(4), pp. 2264-2277, 2016.
DOA Estimation Using a Greedy Block Coordinate Descent Algorithm, Xiaohan Wei, Yabo Yuan, Qing Ling, IEEE Trans. Signal Processing, 60(12), pp. 6382-6394, 2012.
DOA Estimation Based on Sparse Signal Recovery Utilizing Weighted L1-Norm Penalty, Xu Xu, Xiaohan Wei, Zhongfu Ye, IEEE Signal Processing Letters, 19(3), pp. 155-158, 2012.

Preprints:

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction, Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen, submitted, 2022.
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning, Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang, submitted, 2020.
Opportunistic Scheduling over Time Varying Renewal Systems: An Empirical Method, Xiaohan Wei, Michael J. Neely, submitted, 2019.
Primal-Dual Frank-Wolfe for Constrained Stochastic Programs with Convex and Non-convex Objectives, Xiaohan Wei, Michael J. Neely, submitted, 2018. arXiv
Structured Recovery with Heavy-tailed Measurements: A Thresholding Procedure and Optimal Rates, Xiaohan Wei, submitted, 2018. arXiv
A Probabilistic Sample Path Convergence Time Analysis of Drift-Plus-Penalty for Stochastic Optimization, Xiaohan Wei, Hao Yu, Michael J. Neely, Advances in Applied Probability, minor revision, 2016.

Professional Experiences

Research Scientist, Meta, Menlo Park, Aug. 2019 - present

Ads ML modeling
Large-scale distributed training

Software Engineering Intern, Facebook, Menlo Park, Sep. 2018 - Dec. 2018

Designed and implemented a new budget pacing system for multiple use cases in marketplace intelligence.
Rebuilt parts of the marketplace service framework to support user targeting on top of budget pacing.
Analyzed the pacing algorithm theoretically and proved its asymptotic stability via Lyapunov theory.

Research Scientist Intern, Tencent AI Lab, Seattle, Jun. 2018 - Aug. 2018

Designed a new algorithm for variational inference and reinforcement learning in partially observed Markov decision processes (POMDPs).
Applied the algorithm to self-localization and navigation in 2D mazes.

Honors and Awards

IPAM Fellow, UCLA, Spring 2018.
Ming Hsieh Scholar (one of the top 4 Ph.D. students in the EE department), USC, 2017-18.
NIPS Travel Award, Long Beach, 2017.
Ming Hsieh Master Honor Program (top 20 in the EE master's program), USC, 2014.
Outstanding Undergraduate Research Award (top 6 in the EE department), USTC, 2012.
Outstanding Undergraduate Student Scholarship, USTC, 2008-11.

Talks

Online Primal-Dual Mirror Descent under Stochastic Constraints, SIGMETRICS'20, virtual, Jun. 2020.
Robust One-Bit Recovery via ReLU Generative Networks: Near Optimal Statistical Rates and Global Landscape Analysis, ICML'20, virtual, Jun. 2020.
On the Statistical Rate of Nonlinear Recovery in Generative Models under Weak Moment Assumptions, ICML'19, Long Beach, Jun. 2019.
Online Learning in Weakly Coupled Markov Decision Processes, SIGMETRICS'18, UC Irvine, Jun. 2018.
Online Learning with Stochastic Constraints, Department of Automation, USTC, Jan. 2018.
Robust Estimation in High Dimensional Spaces, Ming Hsieh Institute (MHI), USC, Sep. 2017.
Estimation of Covariance Structure of Heavy-tailed Distributions, Information Sciences Institute (ISI), USC, Jun. 2017.
Non-linear compressed sensing with heavy-tailed measurements, Southern California Applied Mathematics Symposium (SOCAMS), UC Irvine, Jun. 2017.
Structured signal recovery from non-linear and heavy-tailed measurements, Conference on Big Data in Economics, USC, Oct. 2016.
Delay Optimal Power Aware Opportunistic Scheduling with Mutual Information Accumulation, IEEE Symposium WiOpt, Arizona State University, May 2016.
Power Aware Wireless File Downloading: A Constrained Restless Bandit Approach, IEEE Symposium WiOpt, Hammamet, Tunisia, May 2014.

Posters

Estimation of Covariance Structure of Heavy-tailed Distributions, SoCal Machine Learning Symposium 2017. Abstract

Misc notes

[Summer 2015] Notes on Malliavin calculus.
[Fall 2017] Functional analysis summary.
[Fall 2020] A scaling law on the second moment of quantization error of heavy-tailed distributions. link
[Spring 2022] Effectiveness of EMA teacher through the lens of online learning. link