
Computational Phenotyping from Massive Clinical Data

Tutorial in 2016 IEEE International Conference on Healthcare Informatics (ICHI)

Slides: Part I, Part II

Abstract: In recent years, advances in data analytics and their application to the large amounts of healthcare data collected on a daily basis have opened numerous new opportunities and challenges in the field of health informatics. By definition, health informatics refers to the process of leveraging information technologies to improve the quality of healthcare. Many researchers are pursuing this goal through basic and translational research, either by proposing novel data analytics techniques or by applying and adapting state-of-the-art ones to the vast amounts of recently collected data. The recent adoption of Electronic Health Records (EHRs) opens additional opportunities for data analytics, as we are able to access structured and unstructured data that are systematically collected for each event in the healthcare system or even contributed by the patients themselves.

Modern EHRs are composed of a diverse array of data, including structured information (e.g., diagnoses, medications, and lab results), molecular sequences, unstructured clinical progress notes, and social network information. There is mounting evidence that EHRs are a rich resource for clinical research, but they are notoriously difficult to leverage because of their orientation to healthcare business operations, heterogeneity across commercial systems, and high levels of missing or erroneous entries. Moreover, the interactions among different data sources within an EHR are challenging to model, hampering our ability to leverage traditional analytic frameworks. In recognition of this problem, various efforts have been undertaken to transform EHR data into concise and meaningful concepts, or phenotypes. Many previous phenotyping efforts have been ad hoc and labor intensive, resulting in specific phenotypes for specific environments. In recent years, many computational models, usually referred to as computational phenotyping methods, have been proposed to automate the phenotyping process. In this tutorial, we will systematically introduce the concept of computational phenotyping, its challenges, state-of-the-art methodologies, and future research directions. No specific prior knowledge is required, since the tutorial is self-contained and the most fundamental concepts will be introduced during the presentation.

Main Topics

  • Introduction: Computational phenotyping and healthcare informatics
    • What is healthcare informatics
    • What is computational phenotyping
    • The role of computational phenotyping in healthcare informatics
    • Examples of computational phenotyping applications in healthcare informatics 
  • Data representation in Electronic Health Records
    • Unstructured data
    • Structured data
  • Computational Phenotyping with Unstructured Data in EHR
    • Keyword search and rule-based systems
    • Statistical natural language processing in computational phenotyping
    • Graph mining for computational phenotyping
  • Computational Phenotyping with Structured Data in EHR
    • Supervised methods
    • Unsupervised methods
    • Semi-supervised methods
  • Challenges and Opportunities
    • Scalability
    • Interoperability
    • Privacy

Fei Wang is an Assistant Professor in the Health Informatics Division, Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University. He received his Ph.D. degree from the Department of Automation, Tsinghua University, in 2008. After that, he spent one year as a postdoc in the School of Computing and Information Science, Florida International University, and another year as a postdoc in the Department of Statistical Science, Cornell University. His research interests include semi-supervised learning, clustering, relational learning, optimization, social network analysis, and healthcare data analytics. He has published over 150 papers at leading conferences such as SIGKDD, SIGIR, ICML, IJCAI, AAAI, SDM, ICDM, and AMIA. He also serves as a referee for many distinguished journals, including IEEE TPAMI, IEEE TKDE, DMKD, ACM TKDD, JBI, and JAMIA, and as a senior program committee member for many international conferences, including KDD, ICDM, and SDM. He was the program co-chair for the system track at ICHI 2015. His personal website is here.

Jyotishman Pathak is a Professor and Chief of the Division of Health Informatics, Department of Healthcare Policy and Research, Weill Cornell Medical College, Cornell University. Before joining Weill Cornell in October 2015, Dr. Pathak was Professor of Biomedical Informatics at the Mayo Clinic College of Medicine and served as the director of Biomedical Informatics at the Mayo Clinic Center for Clinical and Translational Sciences and the director of clinical informatics at the Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery. Dr. Pathak's current research focuses on secondary uses of electronic health record data for clinical and health care delivery research, integration of genomic data within electronic health records, and clinical decision support systems for personalized therapeutics. Dr. Pathak has played important leadership roles in two large National Institutes of Health-Department of Health and Human Services initiatives, the Strategic Health IT Research Project (SHARP) and the Electronic Medical Records and Genomics (eMERGE) project, which have pioneered techniques for high-throughput phenotyping from electronic health records. He currently holds several major national grants from the National Institutes of Health, the Agency for Healthcare Research and Quality, and private foundations. Dr. Pathak has published over 150 papers, including many book chapters and invited reviews. His personal website is here.