MICCAI 2024
Date: October 6, 2024
Diamant Room at Palmeraie Palace
8:00AM - 12:30PM
Generative AI and large-scale self-supervised foundation models are poised to have a profound impact on human decision making across occupations. Healthcare is one such area, where these models have the capacity to affect patients, clinicians, and other care providers. This tutorial, structured as a combination of lectures and demonstrations, provides participants with a comprehensive guide to harnessing vision and large language models in the healthcare domain. We then describe methodologies tailored to clinical tasks and present application examples from various imaging domains that reflect research interests across the wider MICCAI community.
Medical imaging has always been a challenging field in which to test ideas developed in AI. The latest work in AI centers on foundation models for language, vision, and other modalities. Building healthcare-specific foundation models is relevant to our community: past experience shows that standard deep learning models still require substantial adaptation before they become useful for medical imaging. Learning these techniques in a timely fashion will help MICCAI community members not only accelerate their adoption in our field but also advance the science of AI by articulating the requirements such systems must meet.
This is an emerging topic, with few systematic courses offered at universities. The tutorial presented here is modeled after the speakers' respective courses at their institutions, including CS277/BIODS271 at Stanford University.
The detailed breakdown of topics over the 4-hour tutorial window, including a 30-minute coffee break, is as follows:
1. Introduction to Foundation Models -- 8:00 to 8:40 AM
a. Evolution of machine learning models
b. Definition of Foundation models
c. What makes a model foundational?
d. Examples of foundation models
e. Frameworks: Self-supervised learning, contrastive learning, masked auto-encoders
2. Vision-language models -- 8:40 to 9:20 AM
a. Zero-shot inference for classification (see the first sketch following this schedule)
b. Vision-language models for medical imaging (e.g., embedding domain knowledge)
c. Captioning
3. Fine-tuning foundation models -- 9:20 to 10:00 AM
a. Prompt learning
b. Adapters
c. Linear-probing baselines
d. Parameter-efficient fine-tuning (e.g., low-rank approximation; see the second sketch following this schedule)
e. Transductive inference for VLMs
Coffee break: 10:00 to 10:30 AM
4. Foundation models for segmentation -- 10:30 to 11:10 AM
a. SAM and other foundation models for medical imaging (see the third sketch following this schedule)
b. Generalist vs. domain-specialized?
c. Fine-tuning for segmentation: spatial adapters, parameter-efficient fine-tuning, constrained transductive inference
5. Overview of Vision and Language Models (vLLMs) -- 11:10 to 11:50 AM
a. Expanding Large Language Models to Vision: LLaVA
b. vLLMs in Medicine
c. vLLMs in Pathology
d. vLLMs in Radiology
6. Enhancing vLLM utilization -- 11:50 AM to 12:30 PM
a. Prompting for chest X-ray report generation and disease diagnosis
b. Instruction tuning
c. Retrieval-augmented generation (see the final sketch following this schedule)
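The short Python sketches below give a flavour of the hands-on demonstrations. They are illustrative only: checkpoint names, file paths, class prompts, and coordinates are placeholders rather than the tutorial's actual materials.

The first sketch shows zero-shot classification with a CLIP-style vision-language model (Part 2a): class names are wrapped in text prompts, and the image is assigned to the prompt with the highest image-text similarity.

```python
# Zero-shot classification with a CLIP-style vision-language model.
# The checkpoint, prompts, and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"   # assumed general-domain checkpoint
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

# Class names are wrapped in natural-language prompts (simple prompt engineering).
class_names = ["normal chest X-ray", "chest X-ray with pneumonia"]
prompts = [f"a photo of a {c}" for c in class_names]

image = Image.open("example_cxr.png")         # hypothetical input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    # logits_per_image holds image-text similarities scaled by the learned temperature.
    probs = outputs.logits_per_image.softmax(dim=-1)

for name, p in zip(class_names, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```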
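The second sketch illustrates parameter-efficient fine-tuning by low-rank approximation (Part 3d): the pretrained weights are frozen and only a small low-rank update is trained. This is a plain-PyTorch illustration of the idea, not the specific recipe used in the tutorial.

```python
# LoRA-style low-rank adaptation of a frozen linear layer (illustrative sketch).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank correction B @ A.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Toy usage: wrap one projection layer of a (hypothetical) frozen backbone.
layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")    # only the low-rank factors are trained
```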
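The third sketch shows promptable segmentation with SAM (Part 4a), using Meta's segment_anything package and a single positive click as the prompt. The checkpoint file, image path, and click coordinates are placeholders; as discussed in Part 4c, out-of-the-box SAM usually needs domain adaptation to perform well on medical images.

```python
# Point-prompted segmentation with SAM (illustrative; paths and coordinates are placeholders).
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # assumed local weights
predictor = SamPredictor(sam)

image = np.array(Image.open("example_slice.png").convert("RGB"))
predictor.set_image(image)

# One positive click (label 1) placed roughly inside the structure of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[120, 160]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]           # keep the highest-scoring candidate mask
print("mask shape:", best_mask.shape, "score:", scores.max())
```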
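The final sketch outlines retrieval-augmented generation (Part 6c): candidate report snippets are ranked by embedding similarity to a query, and the top matches are pasted into the prompt that would then be passed to a vision-language LLM. The embedder, toy corpus, and prompt wording are assumptions made for illustration.

```python
# Retrieval-augmented prompt construction (illustrative; corpus and query are toy examples).
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed general-purpose embedder

corpus = [
    "No focal consolidation, effusion, or pneumothorax.",
    "Right lower lobe opacity concerning for pneumonia.",
    "Cardiomegaly with mild pulmonary vascular congestion.",
]                                                    # toy knowledge base of prior report sentences
corpus_emb = encoder.encode(corpus, convert_to_tensor=True)

query = "findings suggestive of pneumonia"
query_emb = encoder.encode(query, convert_to_tensor=True)

# Rank snippets by cosine similarity and keep the top two as context.
scores = util.cos_sim(query_emb, corpus_emb)[0]
top_idx = scores.argsort(descending=True)[:2]
context = "\n".join(corpus[int(i)] for i in top_idx)

prompt = (
    "Use the retrieved context to draft the impression of a chest X-ray report.\n"
    f"Context:\n{context}\n\nImpression:"
)
print(prompt)   # this prompt would then be passed to the generator model
```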
Familiarity with machine learning principles at a graduate level is expected of the participants.
To become familiar with the latest foundation models and learn how they are relevant to multimodal medical imaging research.
To gain hands-on experience in using these models for standard tasks in healthcare.
Building a clear understanding of the main strengths and weaknesses of several state-of-the-art approaches and learning how to apply them to a range of medical imaging problems.
Acquiring basic knowledge of how to implement some of these solutions in a case-study example.
Ismail Ben Ayed
Full Professor at ÉTS Montréal
Jose Dolz
Associate Professor at ÉTS Montréal
Julio Silva Rodriguez
Post-doc at ÉTS Montréal
Tanveer Syeda-Mahmood
IBM Fellow, Chief Scientist
Akshay Chaudhari
Assistant Professor at Stanford
Yuyin Zhou
Assistant Professor at University of California, Santa Cruz (UCSC)
Yunsoo Kim
PhD student at UCL
Honghan Wu
Associate Professor at UCL
Jinge Wu
PhD student at UCL
Ismail Ben Ayed
Full Professor at ÉTS Montréal
Jose Dolz
Associate Professor at ÉTS Montréal
Julio Silva Rodriguez
Post-doc at ÉTS Montréal
Yuyin Zhou
Assistant Professor at University of California, Santa Cruz (UCSC)
Tanveer Syeda-Mahmood
IBM Fellow, Chief Scientist
Akshay Chaudhari
Assistant Professor at Stanford
James Zou
Associate Professor at Stanford
Yusuf Abdulle
Research Assistant at UCL
Yunsoo Kim
PhD student at UCL
Honghan Wu
Associate Professor at UCL
Jinge Wu
PhD student at UCL