Abstract:
The dream of precision health is a data-driven, continuous learning system in which new health information is instantly incorporated to optimize care delivery and accelerate biomedical discovery. The confluence of technological advances and social policies has led to rapid digitization of multimodal, longitudinal patient journeys, such as electronic medical records (EMRs), imaging, and multiomics. Our overarching research agenda lies in advancing multimodal generative AI for precision health: we harness real-world data to pretrain powerful multimodal patient embeddings, which can serve as digital twins for patients. This enables us to synthesize multimodal, longitudinal information for millions of cancer patients and to apply population-scale real-world evidence to advance precision oncology, in deep partnership with real-world stakeholders such as large health systems and pharmaceutical companies.
Bio:
Hoifung Poon is the General Manager of Health Futures at Microsoft Research and an affiliated faculty member at the University of Washington Medical School. He leads biomedical AI research and incubation, with the overarching goal of structuring medical data to optimize care delivery and accelerate discovery for precision health. His team and collaborators are among the first to explore large language models and multimodal generative AI in health applications, producing popular open-source foundation models such as PubMedBERT, BioGPT, BiomedCLIP, LLaVA-Med, and GigaPath. He has led successful research partnerships with large health providers and life science companies, creating AI systems in daily use for applications such as molecular tumor boards and clinical trial matching. His prior work has been recognized with Best Paper Awards from premier AI venues such as NAACL, EMNLP, and UAI. He received his PhD in Computer Science and Engineering from the University of Washington, specializing in machine learning and NLP.
Summary:
Focus: Precision Health
Challenge: Medicine today is imprecise
Example: Cancer
First-generation treatments
High toxicity
Low precision
Targeted therapy: far more precise and effective treatment
Immunotherapy: e.g. Keytruda is very effective but works for only a minority of patients
Finding the right treatment for each patient's individual type of cancer is critical
Vision: Continuous Learning Health System
Insight Consumer (pharma, payor, regulator)
Data Producer (researchers, labs)
Emphasis on Real-World Evidence, based on data collected from real-world treatments
“Population-scale free lunch”
Patient embedding
Generative models can use diverse data on a patient
E.g. treatment notes, imaging
Growing array of info-rich modalities
Each modality is informative, but seen in isolation it is very limited
Want to learn a function f(sensor data) => disease progression, treatment response (a minimal sketch follows after this list)
Challenge:
patient journey is longitudinal
extremely sparse
Data are noisy, and the noise may be biased (e.g. the choice of treatment is correlated with patient attributes)
Generative models can create patient embeddings from the observed information, even when it is sparse
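A minimal sketch of what learning such an f could look like: a transformer over the sparse event sequence, pretrained with masked-event prediction, whose pooled state serves as the patient embedding. All module names, vocabulary sizes, and dimensions below are illustrative assumptions, not the speaker's architecture.

```python
import torch
import torch.nn as nn

class PatientEncoder(nn.Module):
    """Illustrative sketch: embed a sparse, longitudinal event sequence
    (dx/rx/lab codes plus timestamps) into a single patient vector."""
    def __init__(self, n_event_types=10_000, d=256, n_layers=4):
        super().__init__()
        self.event_emb = nn.Embedding(n_event_types, d)  # discrete event codes
        self.time_proj = nn.Linear(1, d)                 # days since first visit
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, n_event_types)          # masked-event prediction

    def forward(self, events, days, pad_mask):
        # events: (B, T) code ids; days: (B, T, 1) floats; pad_mask: (B, T) bool, True = pad
        h = self.encoder(self.event_emb(events) + self.time_proj(days),
                         src_key_padding_mask=pad_mask)
        valid = (~pad_mask).unsqueeze(-1).float()
        patient_vec = (h * valid).sum(1) / valid.sum(1).clamp(min=1)  # mean pool
        return patient_vec, self.head(h)  # patient embedding + per-event logits
```

Pretraining would mask a fraction of the events and train the logits with cross-entropy; the pooled vector then serves as the patient embedding, from which downstream heads can predict disease progression or treatment response.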
Real-world data -> Pretraining: patient embedding = digital twin => Biomedical Foundation Model
Biomedical Foundation Model -> Reasoning: "patient-like-me" at population scale -> Real-World Evidence
=> Improve patient care and emergent capabilities
Discover what works (improve patient care) and what doesn't (accelerate discovery); a toy "patient-like-me" retrieval sketch follows below
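A toy version of the "patient-like-me" reasoning step, assuming patient embeddings have already been pretrained (function and variable names are hypothetical):

```python
import numpy as np

def patient_like_me(query_vec, cohort_vecs, k=100):
    """Rank a population of pretrained patient embeddings by cosine
    similarity to one query patient and return the k nearest."""
    q = query_vec / np.linalg.norm(query_vec)
    C = cohort_vecs / np.linalg.norm(cohort_vecs, axis=1, keepdims=True)
    return np.argsort(-(C @ q))[:k]  # indices of the k most similar patients
```

Comparing the observed outcomes of the retrieved cohort under each candidate treatment is one way such population-scale real-world evidence can inform an individual care decision.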
Generative AI -> New Patterns
Universal structuring -> Scale real world evidence
Universal translator -> Rethink interoperability
Universal annotator -> Scale dataset / evaluation
Universal reasoning -> Talk to data
MedPrompt: Generalist AI
GPT-4 can do well on medical tasks when given medical text in the prompt
Example: feed clinical trial inclusion/exclusion criteria into GPT-4 and get back a clear, structured explanation
Example: structure patient records; take in raw notes and output clean, structured notes that follow annotation guidelines (a hedged sketch follows below)
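A minimal sketch of the structuring example using the OpenAI Python client; the model name, system prompt, and criteria text are illustrative, not the actual MedPrompt pipeline:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA = """Inclusion: age >= 18; ECOG 0-1; measurable disease per RECIST 1.1.
Exclusion: prior anti-PD-1 therapy; active brain metastases."""

resp = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system",
         "content": "You are a clinical research assistant. Restate the trial "
                    "eligibility criteria below as a JSON list of atomic rules."},
        {"role": "user", "content": CRITERIA},
    ],
)
print(resp.choices[0].message.content)  # machine-readable eligibility rules
```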
Multimodal GenAI: Growth Area
Challenging modalities
Structured data
Clinical notes
Radiology
Digital pathology
Genomics
Spatial transcriptomics
…
Example: Digital Pathology
Case study: immunotherapy
Today: simple rules to characterize the tumor
Need to model the tumor’s microenvironment
Wanted: whole slide modeling
Transformer models characterize pathology images
Can capture arbitrarily complex dependencies across the slide
Challenge:
Passing information across all pairs of pixels is extremely computationally expensive
Whole-slide images are gigapixel, much higher resolution than typical web images
Approach: dilated attention, where local message passing operates on high-resolution patches and long-distance messages are passed between coarsened representations (a simplified sketch follows below)
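A simplified, single-branch sketch of dilated attention; it assumes the sequence length is divisible by segment_len and segment_len by dilation, and real systems such as LongNet mix several segment/dilation branches across heads rather than leaving unselected positions zero:

```python
import torch
import torch.nn.functional as F

def dilated_attention(q, k, v, segment_len, dilation):
    """One dilated-attention branch: tokens attend only within fixed
    segments, subsampled at the given dilation, so cost grows roughly
    linearly with sequence length. q, k, v: (B, T, D)."""
    B, T, D = q.shape
    # keep every `dilation`-th token inside each segment
    idx = torch.arange(T).view(-1, segment_len)[:, ::dilation].reshape(-1)
    S = segment_len // dilation
    qs, ks, vs = (t[:, idx].view(B, -1, S, D) for t in (q, k, v))
    out = F.scaled_dot_product_attention(qs, ks, vs)  # attention per segment
    y = torch.zeros_like(q)
    y[:, idx] = out.reshape(B, -1, D)  # unselected positions stay zero here
    return y
```

In the whole-slide setting the "tokens" are tile embeddings of a gigapixel image, so coarse branches with large dilation carry the long-distance messages while dense branches handle local context.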
Created a Cancer Foundation Model
Application: cancer subtyping
Improved on the state of the art (local patch analysis) across 6 cancer types
Application: gene mutation prediction
Created new benchmark, improved upon state-of-the-art
Can even achieve state-of-the-art performance with zero-shot inference
Multimodal
Unimodal data: established encoder->decoder architectures
Multimodal data creates a combinatorial explosion of inter-mode interactions
Different data sources include different modalities, so each captures a different subset of interactions
Each type of data provides different information and influences different regions of the embedding space
Leveraging lessons from language translation:
All languages refer to common reality
Idea: convert each language into a common representation (e.g. English), then translate from there
Approach: convert all modalities into a common text modality, since most datasets include text plus a few other modalities (an illustrative sketch follows below)
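An illustrative, deliberately trivial "universal translator" step under this approach: serialize a structured modality into text so a text-pretrained model can consume it alongside notes. The record schema is hypothetical.

```python
def labs_to_text(record):
    """Serialize a structured lab panel into plain text."""
    lines = [f"{name}: {value} {unit}" for name, value, unit in record["labs"]]
    return f"Labs on {record['date']}:\n" + "\n".join(lines)

example = {"date": "2021-03-04",
           "labs": [("hemoglobin", 10.2, "g/dL"), ("creatinine", 1.4, "mg/dL")]}
print(labs_to_text(example))
```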
LLaVA-Med: a first attempt in this direction
Radiology images paired with text reports
Image encoder + text encoder -> latent state ->
Image decoder
Mask decoder (identifies important features); a toy layout sketch follows below
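A toy layout of the encoder/decoder pipeline sketched above; every module here is a placeholder standing in for real pretrained components, not the published LLaVA-Med architecture:

```python
import torch
import torch.nn as nn

class ImageTextSketch(nn.Module):
    """Placeholder image + text encoders feeding a shared latent, with a
    toy 'mask' readout that scores image patches by text attention."""
    def __init__(self, d=512, vocab=30_000, patch_dim=768):
        super().__init__()
        self.img_enc = nn.Linear(patch_dim, d)   # stands in for a vision backbone
        self.txt_enc = nn.Embedding(vocab, d)    # stands in for a text encoder
        self.fuse = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

    def forward(self, patch_feats, report_tokens):
        zi = self.img_enc(patch_feats)           # (B, Ni, d) image patches
        zt = self.txt_enc(report_tokens)         # (B, Nt, d) report tokens
        latent, attn = self.fuse(zt, zi, zi, need_weights=True)
        patch_importance = attn.mean(1)          # (B, Ni): which patches matter
        return latent, patch_importance
```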
Real-world applications
Given patient embeddings, other entities (e.g. interventions) can be embedded in the same universal space
Use-case: clinical trial matching (a toy eligibility check follows after this list)
In-silico clinical trial simulation
Finding matches for clinical trials
TrialScope
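A toy eligibility check for trial matching, applying structured rules (like those produced by the structuring step earlier) to a structured patient record; all field names are hypothetical, and real systems such as TrialScope handle far richer criteria extracted from free text:

```python
def matches_trial(patient, rules):
    """Return True if a structured patient record satisfies
    structured trial eligibility rules."""
    return (patient["age"] >= rules["min_age"]
            and patient["ecog"] <= rules["max_ecog"]
            and not set(patient["prior_therapies"]) & set(rules["excluded_therapies"]))

patient = {"age": 62, "ecog": 1, "prior_therapies": ["carboplatin"]}
rules = {"min_age": 18, "max_ecog": 1, "excluded_therapies": ["anti-PD-1"]}
print(matches_trial(patient, rules))  # True
```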
Productivity gains (instruction following) vs. creativity gains (instruction learning)