Data Analytics in Healthcare: Problems, Challenges and Future Directions

Health informatics refers to the use of information technologies to improve the quality of healthcare delivery. In recent years, the application of data analytics to healthcare has attracted considerable interest in both the data analysis and medical communities. In this tutorial, we introduce healthcare data and popular analytic problems, and point out challenges and future research directions.

The goal of this tutorial is to provide a concise review of how state-of-the-art data analytics technologies can be applied in healthcare, to point out the challenges, and to identify future research directions.

Slides Available


1. Introduction: data analytics and healthcare

  • The current status of healthcare
  • What is health informatics
  • Examples of health informatics problems
    • Predictive Modeling
    • Care coordination
    • Comparative effectiveness research
  • Healthcare data
    • EMR data
    • Imaging data
    • Drug data
    • Genotype data
  • What is data analytics
  • The role of data analytics in healthcare
2. Feature Construction and Representation in Health Informatics
  • Feature Construction
    • Structured EMR Data
      • Storage Unification: Universal Feature Model
      • Representation Unification: Knowledge Driven Quantization
    • Unstructured EMR Data
      • Raw feature extraction with NLP Systems
      • Context analysis
    • Imaging Data
      • Photometric features
      • Geometric features
    • Drug Data
      • Chemical compounds
      • Protein targets
      • Therapeutic indications
      • Side-effects
    • Genotype Data
      • Gene expression
      • DNA sequence
      • Protein network
  • Feature Representation
    • Vector based representation
    • Sequence based representation
    • Matrix based representation
    • Tensor based representation
3. Examples of Healthcare Analytics Problems
  • Predictive Modeling
    • Problem setting
    • Predictive modeling pipeline
    • Case study on Congestive Heart Failure
    • Parallel platform
  • Patient Similarity
    • Problem setting
    • Potential applications in personalized medicine
    • An example on supervised patient similarity evaluation
    • Visualization interface
  • Risk Stratification
    • Problem setting
    • Traditional scoring based methods
    • (Semi-supervised) clustering method
    • A bilinear model towards better interpretability
    • Predictive Modeling
  • Disease Progression Modeling
    • Problem setting
    • Optimization approach with case study on Alzheimer’s disease
    • Probabilistic approach with case study on Chronic Obstructive Pulmonary Disease
4. Conclusions and Future Directions

All researchers and practitioners engaged in data analytics and health informatics are welcome.
No prior knowledge of specific algorithms is assumed.