Learning from Non-IID Data: Centralized vs. Federated Learning
Jun Wu, Wenxuan Bao, Jingrui He
(junwu3@illinois.edu; wbao4@illinois.edu; jingrui@illinois.edu)
Location: IJCAI 2023, Macao S.A.R., China
Time and Date: 11:00 - 12:30, August 20, 2023 (China Standard Time/CST, UTC +8)
ABSTRACT
A common assumption in data mining is that training and test data are independent and identically distributed (IID). This assumption, however, is often violated in real-world scenarios where data is collected from multiple sources. This motivates the study of learning from non-IID data. To this end, this tutorial provides a comprehensive review of state-of-the-art theoretical analysis and algorithms for learning from non-IID data. We start by introducing the non-IID problems commonly seen in real-world applications. In addition to standard centralized learning with shared data from different sources, we also consider privacy-aware federated learning scenarios with private data held by individual clients. We then review recent advances in characterizing the knowledge transferability of non-IID data and in exploring adversarial robustness under distribution shift. Finally, we discuss open problems under the non-IID assumption and shed light on future directions for learning from non-IID data.
TUTORIAL SLIDES
OUTLINE
Introduction
What is the Non-IID Assumption?
What are the Unique Challenges of Non-IID Learning?
Why does Non-IID Learning Matter in Real World Applications (Centralized vs. Federated Learning)?
Part I: Transferability of Non-IID Data
Part I-1: Centralized Transferability
How to Transfer?
When to Transfer?
Part I-2: Federated Transferability
What is Federated Learning?
How to Learn from Multiple Clients in FL?
Part II: Adversarial Robustness of Non-IID Data
Part II-1: Centralized Adversarial Robustness
How to Maliciously Attack Transferability?
How to Improve the Adversarial Robustness of Transferability?
Part II-2: Federated Adversarial Robustness
How is FL Vulnerable to Malicious Attacks?
How to Defend Against Attacks in FL?
Part III: Conclusion and Future Directions
SPEAKERS
Jun Wu is currently a Ph.D. candidate in the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests focus on statistical machine learning, transfer learning, and adversarial learning, with applications in image recognition, network mining, and agriculture analysis. He has published multiple articles in top peer-reviewed conferences and journals (e.g., NeurIPS, KDD, CIKM, AAAI, IJCAI, IEEE BigData). He has also served as a program committee member for major machine learning and artificial intelligence conferences (e.g., KDD, IJCAI, AAAI).
Wenxuan Bao is currently a Ph.D. student in the Department of Computer Science, University of Illinois Urbana-Champaign. He received the B.Eng. degree in Automation from Tsinghua University in 2020. His research interests focus on improving the effectiveness, robustness, and fairness of federated learning. He has served as a program committee member or reviewer for major conferences (e.g., CIKM, KDD, ICLR).
Jingrui He is a Professor in the School of Information Sciences, University of Illinois Urbana-Champaign. Her research focuses on heterogeneous machine learning, rare category analysis, active learning, and semi-supervised learning, with applications in social network analysis, healthcare, and manufacturing processes. Dr. He has more than 100 publications at major conferences (e.g., IJCAI, AAAI, KDD, ICML, NeurIPS) and in journals (e.g., TKDE, TKDD, DMKD), and is the author of two books. Her papers have received the Distinguished Paper Award at FAccT 2022, as well as Best of the Conference selections at ICDM 2016, ICDM 2010, and SDM 2010. For more information, please refer to her personal website at https://ischool.illinois.edu/people/jingrui-he.