Gold Panning from the Mess:

Rare Category Exploration, Exposition, Representation and Interpretation

ABSTRACT

In contrast to the massive volume of data, it is often the rare categories that are of great importance in many high-impact domains, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, and from spam image detection in social media to rare disease diagnosis in medical decision support systems. The unique challenges of rare category analysis include (1) the highly skewed class-membership distribution; (2) the non-separable nature of the rare categories from the majority classes; and (3) data and task heterogeneity, e.g., the multi-modal representation of examples and the analysis of similar rare categories across multiple related tasks. This tutorial aims to provide a concise review of state-of-the-art techniques for complex rare category analysis, where the majority classes have a smooth distribution while the minority classes exhibit a compactness property in the feature space. In particular, we start with the context, problem definition, and unique challenges of complex rare category analysis. We then present a comprehensive overview of recent advances designed for this problem setting, from rare category exploration without any label information to the exposition step that characterizes rare examples in a compact representation, and from representing rare patterns in a salient embedding space to interpreting the prediction results and providing relevant clues for the end users’ investigation. Finally, we discuss the potential challenges and shed light on future directions of complex rare category analysis.

Tutorial Slides

Outline

  • Introduction (20 mins)
    • Background
    • Problem definition
    • Challenges
  • Part I: Rare Category Exploration (60 mins)
    • Homogeneous rare category detection
    • Heterogeneous rare category detection
  • Part II: Rare Category Exposition (60 mins)
    • Attributed data
    • Time series data
    • Graph data
  • Part III: Rare Category Representation & Interpretation (50 mins)
    • Rare Category Representation
    • Rare Category Interpretation
  • Part IV: An End-to-End Rare Category Analysis Prototype (10 mins)
  • Part V: Conclusion and Future Directions (10 mins)

Speakers' Bio

Dawei Zhou is currently a Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign. His current research interests include rare category analysis, active learning, and semi-supervised learning, with applications in financial fraud detection and social network analysis. Dawei Zhou has worked on rare category analysis for 4 years, resulting in 10 publications at major conferences (e.g., IJCAI, AAAI, KDD, SDM, ICDM) and in journals (e.g., TKDD, DMKD).

Jingrui He is an associate professor at the School of Information Sciences, University of Illinois at Urbana-Champaign. She received her Ph.D. degree from the Machine Learning Department, Carnegie Mellon University, in 2010. Her research interests include rare category analysis, heterogeneous learning, semi-supervised learning, and active learning, with applications in semiconductor manufacturing, social media analysis, traffic prediction, healthcare, etc. Dr. He has worked on the topic of rare category analysis for over 10 years, resulting in more than 30 publications at major conferences (e.g., NIPS, IJCAI, AAAI, KDD, ICML, ICDM) and in journals (e.g., TKDE, TKDD, DMKD), as well as 1 book.