Learning from Non-IID Data: Centralized vs. Federated Learning

Location: IEEE BigData 2022, Osaka, Japan (Hybrid)

Time and Date: 14:00 - 16:00, December 20, 2022 (Japan Standard Time/JST, UTC +9)


A common assumption in data mining is that training and test data are independent and identically distributed (IID). This assumption, however, is often violated in real scenarios where data are collected from multiple sources, which motivates the study of learning from non-IID data. To this end, this tutorial provides a comprehensive review of state-of-the-art theoretical analysis and algorithms for learning from non-IID data. We start by introducing the non-IID problems commonly seen in real-world applications. In addition to standard centralized learning with shared data from different sources, we also consider privacy-aware federated learning scenarios with private data from individual clients. We then review recent advances in characterizing the knowledge transferability of non-IID data and in exploring adversarial robustness under distribution shift. Finally, we discuss open problems under the non-IID assumption and shed light on future directions for learning from non-IID data.


BigData Tutorial Non-IID.pdf



Jun Wu is currently a Ph.D. candidate in the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests focus on statistical machine learning, transfer learning, and adversarial learning, with applications in image recognition, network mining, and agriculture analysis. He has published multiple articles in top peer-reviewed conferences and journals (e.g., NeurIPS, KDD, CIKM, AAAI, IJCAI, IEEE BigData). He has also served as a program committee member for major machine learning and artificial intelligence conferences (e.g., KDD, IJCAI, AAAI).

Wenxuan Bao is currently a Ph.D. student in the Department of Computer Science, University of Illinois Urbana-Champaign. He received the B.Eng. degree in Automation from Tsinghua University in 2020. His research interests focus on improving the effectiveness, robustness, and fairness of federated learning. He has served as a program committee member or reviewer for major conferences (e.g., CIKM, KDD, ICLR).

Jingrui He is a Professor in the School of Information Sciences, University of Illinois Urbana-Champaign. Her research focuses on heterogeneous machine learning, rare category analysis, active learning, and semi-supervised learning, with applications in social network analysis, healthcare, and manufacturing processes. Dr. He has more than 100 publications at major conferences (e.g., IJCAI, AAAI, KDD, ICML, NeurIPS) and in journals (e.g., TKDE, TKDD, DMKD), and is the author of two books. Her papers have received the Distinguished Paper Award at FAccT 2022, as well as Bests of the Conference at ICDM 2016, ICDM 2010, and SDM 2010.