Exploring Rare Categories on Graphs:

Representation, Inference, and Generalization

ABSTRACT

Rare category analysis refers to the problem of studying rare examples from under-represented minority groups in imbalanced data. Such examples arise in a variety of high-impact applications, such as fraud detection, emerging trend detection, rare disease diagnosis, and de novo drug discovery. This tutorial provides a concise review of recent developments in rare category characterization, inference, and generalization on graph-structured data. In particular, we will start with the background and the unique challenges of rare category analysis on graphs, including (1) the highly skewed class-membership distribution; (2) the non-separability of the rare categories from the majority classes; and (3) the data and task heterogeneity, e.g., the multi-modal representation of examples and the analysis of similar rare categories across multiple related tasks. Then we will introduce the recent developments in this problem setting, from representing rare patterns in a salient embedding space to the inference step that characterizes rare examples in a compact representation, and from interpreting the prediction results to generating high-quality synthetic rare category examples for downstream applications, such as data augmentation and novelty discovery. In the end, we will discuss the key challenges in rare category analysis and future directions.

Tutorial Slides

[PDF]

Outline

  • Introduction (10 mins)

    • What are rare categories

    • What are the key challenges in rare category analysis

    • What are some real-world applications of rare category analysis

  • Part I: Rare Category Representation (20 mins)

    • Unsupervised representation learning

    • Supervised representation learning

  • Part II: Rare Category Inference (20 mins)

    • Inferences for prediction

    • Inferences for interpretation

  • Part III: Rare Category Generalization (30 mins)

    • Data augmentation

    • Novelty discovery

  • Part IV: An End-to-End Rare Category Analysis Prototype (5 mins)

  • Part V: Conclusion and Future Directions (5 mins)

  • Part VI: Q&A (5 mins)

Speakers

Dawei Zhou is currently a Ph.D. student at the Department of Computer Science, University of Illinois at Urbana-Champaign. His current research interests include rare category analysis, active learning, and semi-supervised learning, with applications in financial fraud detection and social network analysis. Dawei Zhou has worked on rare category analysis for 4 years, resulting in 14 publications at major conferences (e.g., IJCAI, AAAI, KDD, WWW, CIKM, SDM, ICDM) and 7 journals (e.g., TKDD, DMKD).

Lecheng Zheng is currently a Ph.D. student at the Department of Computer Science, University of Illinois at Urbana-Champaign. His current research interests include graph generation, heterogeneous learning, and semi-supervised learning. Lecheng Zheng has worked on graph generation for two years, resulting in 4 publications at major conferences (e.g., AAAI, KDD, WWW, SDM) and 1 journal.

Jingrui He is an associate professor at the School of Information Sciences, University of Illinois at Urbana-Champaign. She received her Ph.D. degree from the Machine Learning Department, Carnegie Mellon University, in 2010. Her research interests include rare category analysis, heterogeneous learning, semi-supervised learning, and active learning, with applications in semiconductor manufacturing, social media analysis, traffic prediction, healthcare, etc. Dr. He has worked on the topic of rare category analysis for over 10 years, resulting in more than 30 publications at major conferences (e.g., NIPS, IJCAI, AAAI, KDD, ICML, ICDM) and journals (e.g., TKDE, TKDD, DMKD) and 1 book.