Abstract:
The dream of precision health is a data-driven, continuous learning system in which new health information is instantly incorporated to optimize care delivery and accelerate biomedical discovery. The confluence of technological advances and social policies has led to rapid digitization of multimodal, longitudinal patient journeys, such as electronic medical records (EMRs), imaging, and multiomics. Our overarching research agenda lies in advancing multimodal generative AI for precision health: we harness real-world data to pretrain powerful multimodal patient embeddings, which can serve as digital twins for patients. This enables us to synthesize multimodal, longitudinal information for millions of cancer patients and to apply population-scale real-world evidence to advance precision oncology, in deep partnership with real-world stakeholders such as large health systems and pharmaceutical companies.
Bio:
Hoifung Poon is the General Manager of Health Futures at Microsoft Research and an affiliated faculty member at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize care delivery and accelerate discovery for precision health. His team and collaborators are among the first to explore large language models and multimodal generative AI in health applications, producing popular open-source foundation models such as PubMedBERT, BioGPT, BiomedCLIP, LLaVA-Med, and GigaPath. He has led successful research partnerships with large health providers and life science companies, creating AI systems in daily use for applications such as molecular tumor boards and clinical trial matching. His prior work has been recognized with Best Paper Awards from premier AI venues such as NAACL, EMNLP, and UAI. He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.
Summary:
Focus: Precision Health
Challenge: Medicine today is imprecise
Example: Cancer
First-generation treatments
High toxicity
Low precision
Targeted therapy: far more precise and effective treatment
Immunotherapy: e.g. Keytruda is very effective but works for only a minority of patients
Finding the right treatment for each patient's individual type of cancer is critical
Vision: Continuous Learning Health System
Insight Consumer (pharma, payor, regulator)
Data Producer (researchers, labs)
Emphasis on Real-World Evidence, based on data collected from real-world treatments
“Population-scale free lunch”
Patient embedding
Generative models can use diverse data on a patient
E.g. treatment notes, imaging
Growing array of info-rich modalities
Each modality is informative, but seen in isolation it is very limited
Want to learn a function f(sensor data) => disease progression, treatment response (a minimal sketch follows after this list)
Challenge:
patient journey is longitudinal
extremely sparse
Data are noisy, and the noise may be biased (e.g. the choice of treatment is correlated with patient attributes)
Generative models can create patient embeddings from the observed information, even when it is sparse
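A minimal sketch of what learning such an f could look like: a transformer over the sparse event sequence, pretrained with masked-event prediction, whose pooled state serves as the patient embedding. All module names, vocabulary sizes, and dimensions below are illustrative assumptions, not the speaker's architecture.

```python
import torch
import torch.nn as nn

class PatientEncoder(nn.Module):
    """Illustrative sketch: embed a sparse, longitudinal event sequence
    (dx/rx/lab codes plus timestamps) into a single patient vector."""
    def __init__(self, n_event_types=10_000, d=256, n_layers=4):
        super().__init__()
        self.event_emb = nn.Embedding(n_event_types, d)  # discrete event codes
        self.time_proj = nn.Linear(1, d)                 # days since first visit
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, n_event_types)          # masked-event prediction

    def forward(self, events, days, pad_mask):
        # events: (B, T) code ids; days: (B, T, 1) floats; pad_mask: (B, T) bool, True = pad
        h = self.encoder(self.event_emb(events) + self.time_proj(days),
                         src_key_padding_mask=pad_mask)
        valid = (~pad_mask).unsqueeze(-1).float()
        patient_vec = (h * valid).sum(1) / valid.sum(1).clamp(min=1)  # mean pool
        return patient_vec, self.head(h)  # patient embedding + per-event logits
```

Pretraining would mask a fraction of the events and train the logits with cross-entropy; the pooled vector then serves as the patient embedding, from which downstream heads can predict disease progression or treatment response.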
Real-world data -> Pretraining: patient embedding = digital twin => Biomedical Foundation Model
Biomedical Foundation Model -> Reasoning: "patient-like-me" at population scale -> Real-World Evidence
=> Improve patient care and emergent capabilities
Discover what works (improve patient care) and what doesn't (accelerate discovery); a toy "patient-like-me" retrieval sketch follows below
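A toy version of the "patient-like-me" reasoning step, assuming patient embeddings have already been pretrained (function and variable names are hypothetical):

```python
import numpy as np

def patient_like_me(query_vec, cohort_vecs, k=100):
    """Rank a population of pretrained patient embeddings by cosine
    similarity to one query patient and return the k nearest."""
    q = query_vec / np.linalg.norm(query_vec)
    C = cohort_vecs / np.linalg.norm(cohort_vecs, axis=1, keepdims=True)
    return np.argsort(-(C @ q))[:k]  # indices of the k most similar patients
```

Comparing the observed outcomes of the retrieved cohort under each candidate treatment is one way such population-scale real-world evidence can inform an individual care decision.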
Generative AI -> New Patterns
Universal structuring -> Scale real world evidence
Universal translator -> Rethink interoperability
Universal annotator -> Scale dataset / evaluation
Universal reasoning -> Talk to data
MedPrompt: Generalist AI
GPT-4 can do well on medical tasks when given medical text in the prompt
Example: feed clinical trial inclusion/exclusion criteria into GPT-4 and get back a clear, structured explanation
Example: structure patient records; take in raw notes and output clean, structured notes that follow annotation guidelines (a hedged sketch follows below)
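A minimal sketch of the structuring example using the OpenAI Python client; the model name, system prompt, and criteria text are illustrative, not the actual MedPrompt pipeline:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA = """Inclusion: age >= 18; ECOG 0-1; measurable disease per RECIST 1.1.
Exclusion: prior anti-PD-1 therapy; active brain metastases."""

resp = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "You are a clinical research assistant. Restate the trial "
                    "eligibility criteria below as a JSON list of atomic rules."},
        {"role": "user", "content": CRITERIA},
    ],
)
print(resp.choices[0].message.content)  # machine-readable eligibility rules
```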
Multimodal GenAI: Growth Area
Challenging modalities
Structured data
Clinical notes
Radiology
Digital pathology
Genomics
Spatial transcriptomics
…
Example: Digital Pathology
Case study: immunotherapy
Today: simple rules to characterize the tumor
Need to model the tumor’s microenvironment
Wanted: whole slide modeling
Transformer models characterize pathology images
Can capture arbitrarily complex dependencies across the slide
Challenge:
Passing information across all pairs of pixels is extremely computationally expensive
Whole-slide images are gigapixel, much higher resolution than typical web images
Approach: dilated attention, where local message passing operates on high-resolution patches and long-distance messages are passed between coarsened representations (a simplified sketch follows below)
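A simplified, single-branch sketch of dilated attention; it assumes the sequence length is divisible by segment_len and segment_len by dilation, and real systems such as LongNet mix several segment/dilation branches across heads rather than leaving unselected positions zero:

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, segment_len, dilation):
    """One dilated-attention branch: tokens attend only within fixed
    segments, subsampled at the given dilation, so cost grows roughly
    linearly with sequence length. q, k, v: (B, T, D)."""
    B, T, D = q.shape
    # keep every `dilation`-th token inside each segment
    idx = torch.arange(T).view(-1, segment_len)[:, ::dilation].reshape(-1)
    S = segment_len // dilation
    qs, ks, vs = (t[:, idx].view(B, -1, S, D) for t in (q, k, v))
    out = F.scaled_dot_product_attention(qs, ks, vs)  # attention per segment
    y = torch.zeros_like(q)
    y[:, idx] = out.reshape(B, -1, D)  # unselected positions stay zero here
    return y
```

In the whole-slide setting the "tokens" are tile embeddings of a gigapixel image, so coarse branches with large dilation carry the long-distance messages while dense branches handle local context.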
Created a Cancer Foundation Model
Application: cancer subtyping
Improved on the state of the art (local patch analysis) across 6 cancer types
Application: gene mutation prediction
Created new benchmark, improved upon state-of-the-art
Can even achieve state-of-the-art performance with zero-shot inference
Multimodal
Unimodal data: established encoder->decoder architectures
Multimodal data creates a combinatorial explosion of inter-mode interactions
Different data sources include different modalities, so each captures a different subset of interactions
Each type of data provides different information and influences different regions of the embedding space
Leveraging lessons from language translation:
All languages refer to common reality
Idea: convert each language into a common representation (e.g. English), then translate from there
Approach: convert all modalities into a common text modality, since most datasets include text plus a few other modalities (an illustrative sketch follows below)
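An illustrative, deliberately trivial "universal translator" step under this approach: serialize a structured modality into text so a text-pretrained model can consume it alongside notes. The record schema is hypothetical.

```python
def labs_to_text(record):
    """Serialize a structured lab panel into plain text."""
    lines = [f"{name}: {value} {unit}" for name, value, unit in record["labs"]]
    return f"Labs on {record['date']}:\n" + "\n".join(lines)

example = {"date": "2021-03-04",
           "labs": [("hemoglobin", 10.2, "g/dL"), ("creatinine", 1.4, "mg/dL")]}
print(labs_to_text(example))
```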
LLaVA-Med: a first attempt in this direction
Radiology images paired with text reports
Image encoder + text encoder -> latent state ->
Image decoder
Mask decoder (identifies important features); a toy layout sketch follows below
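A toy layout of the encoder/decoder pipeline sketched above; every module here is a placeholder standing in for real pretrained components, not the published LLaVA-Med architecture:

```python
import torch
import torch.nn as nn

class ImageTextSketch(nn.Module):
    """Placeholder image + text encoders feeding a shared latent, with a
    toy 'mask' readout that scores image patches by text attention."""
    def __init__(self, d=512, vocab=30_000, patch_dim=768):
        super().__init__()
        self.img_enc = nn.Linear(patch_dim, d)   # stands in for a vision backbone
        self.txt_enc = nn.Embedding(vocab, d)    # stands in for a text encoder
        self.fuse = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

    def forward(self, patch_feats, report_tokens):
        zi = self.img_enc(patch_feats)           # (B, Ni, d) image patches
        zt = self.txt_enc(report_tokens)         # (B, Nt, d) report tokens
        latent, attn = self.fuse(zt, zi, zi, need_weights=True)
        patch_importance = attn.mean(1)          # (B, Ni): which patches matter
        return latent, patch_importance
```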
Real-world applications
Given patient embeddings, other entities (e.g. interventions) can be embedded in the same universal space
Use-case: clinical trial matching (a toy eligibility check follows after this list)
In-silico clinical trial simulation
Finding matches for clinical trials
TrialScope
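A toy eligibility check for trial matching, applying structured rules (like those produced by the structuring step earlier) to a structured patient record; all field names are hypothetical, and real systems such as TrialScope handle far richer criteria extracted from free text:

```python
def matches_trial(patient, rules):
    """Return True if a structured patient record satisfies
    structured trial eligibility rules."""
    return (patient["age"] >= rules["min_age"]
            and patient["ecog"] <= rules["max_ecog"]
            and not set(patient["prior_therapies"]) & set(rules["excluded_therapies"]))

patient = {"age": 62, "ecog": 1, "prior_therapies": ["carboplatin"]}
rules = {"min_age": 18, "max_ecog": 1, "excluded_therapies": ["anti-PD-1"]}
print(matches_trial(patient, rules))  # True
```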
Productivity gains (instruction following) vs. creativity gains (instruction learning)