Abstract
Data heterogeneity plays a pivotal role in determining the performance of machine learning (ML) systems. Traditional algorithms, which are typically designed to optimize average performance, often overlook the intrinsic diversity within datasets. This oversight can lead to a range of problems, including unreliable decision-making, poor generalization across domains, unfair outcomes, and false scientific inferences. A nuanced approach to modeling data heterogeneity is therefore essential for building dependable, data-driven systems. In this tutorial, we present a thorough exploration of heterogeneity-aware machine learning, a paradigm that systematically integrates considerations of data heterogeneity throughout the entire ML pipeline, from data collection and model training to model evaluation and deployment. By applying this approach to a variety of critical fields, including healthcare, agriculture, finance, and recommendation systems, we demonstrate the substantial benefits and potential of heterogeneity-aware ML. These applications show how a deeper understanding of data diversity can enhance model robustness, fairness, and reliability, and can aid in model diagnosis and improvement.
Outline
Background & Preliminaries
Data Collection: Introduces and defines the concept of predictive heterogeneity, demonstrating its utility in exploring and modeling dataset structures, with specific examples from healthcare and agriculture
Model Training: Discusses the integration of data heterogeneity into the model training process, highlighting its critical role, particularly in applications involving graph data and recommendation systems
Model Evaluation: Explores the advantages of incorporating data heterogeneity into model evaluation practices, including its use in improving the evaluation of face recognition models
Model Deployment: Discusses how a better understanding of data heterogeneity helps address and rectify model failures after deployment, complemented by an in-depth case study in finance
Conclusion & Future Work
Speakers