SDM16 Tutorial: Biomedical Data Mining with Matrix Models

In the last decade, advances in high-throughput technologies, the growth of clinical data warehouses, and the rapid accumulation of biomedical knowledge have provided unprecedented opportunities and challenges to researchers in biomedical informatics. One distinct approach to conducting big data analytics efficiently for biomedical problems is the application of matrix computation and factorization methods such as non-negative matrix factorization, joint matrix factorization, and tensor factorization. Compared to probabilistic and information-theoretic approaches, matrix-based methods are fast and easy to understand and implement. In this tutorial, we review recent advances in matrix-based algorithms and methods and their potential applications in biomedical informatics. We survey related articles from data mining venues as well as biomedical informatics venues to share with the audience key problems and trends in matrix computation research, along with novel applications such as drug repositioning, personalized medicine, and electronic phenotyping.

Outline
  • Introduction: matrices in biomedical data mining
    • Where are the matrices 
      • Traditional data: documents, images 
      • Matrices in biomedical informatics 
    • What are the typical problems 
      • Traditional learning: classification, clustering, semi-supervised learning 
      • Problems in biomedical informatics where matrix models could be useful 
        • Risk factor identification 
        • Risk stratification and comparative effectiveness research 
        • Complex longitudinal clinical event pattern discovery 
        • Disease progression 
  • An overview of the recent advances in applied matrix models 
    • Low-rank approximation 
      • Nonnegative matrix factorization and its variants 
      • (Truncated) nuclear norm 
    • Sparse learning 
      • The ℓ1 norm, ℓ1,p norm, elastic net, fused LASSO penalty norm 
      • Tree/graph-structured sparse penalty norm 
      • Non-redundancy vs. groupwise selection 
    • Temporal learning 
      • Convolutional/Evolutionary matrix factorization 
      • Online/streaming technologies 
  • Matrix models in biomedical informatics 
    • Electronic Phenotyping 
      • Vector-based representation 
      • Matrix-based representation 
      • Graph-based representation 
    • Precision Medicine 
      • Patient similarity evaluation 
      • Drug similarity evaluation 
      • Heterogeneous random walk 
    • Drug Development 
      • Drug repositioning 
      • Drug-drug interaction detection
  • Challenges and Opportunities 
    • Complexity 
    • Security 
    • Scalability
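
As a concrete illustration of the "Nonnegative matrix factorization" item in the outline, the following is a minimal sketch of the standard multiplicative-update scheme for the Frobenius loss. It is not taken from the tutorial material: the function name `nmf`, its parameters, and the toy patient-by-feature matrix `X` are all assumptions chosen for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (m x n) as W @ H with W, H >= 0.

    Illustrative sketch using multiplicative updates for the
    Frobenius-norm loss ||X - W H||_F^2.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization so no entry starts locked at zero.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: rows are patients, columns are counts of clinical codes.
X = np.array([[5, 4, 0, 0],
              [4, 5, 0, 1],
              [0, 0, 3, 4],
              [0, 1, 4, 5]], dtype=float)

W, H = nmf(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch, each row of `H` can be read as a latent group of co-occurring features and each row of `W` as a patient's loading on those groups, which is the kind of interpretation that makes non-negative factors attractive for applications such as electronic phenotyping.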