MICCAI 2025
Date: September 23rd, 2025
Daejeon Convention Center, Room DCC1-1F-112
8:00 AM - 5:30 PM
Generative AI and large-scale self-supervised foundation models are poised to have a profound impact on human decision making across occupations. Healthcare is one area where these models can affect patients, clinicians, and other care providers. Medical imaging stands to benefit from these technologies in applications ranging from phantom models to precision AI for interventional imaging. Much of the latest work in AI centers on foundation models for language, vision, and other modalities. Building healthcare-specific foundation models is relevant to our community: experience has shown that standard deep learning models require substantial adaptation before they become useful for medical imaging. Learning these techniques in a timely fashion will help MICCAI community members accelerate not only their adoption in our field but also advance the science of AI by articulating clear requirements for such systems. Because this is an emerging topic with few systematic courses offered at universities, the tutorial will benefit our MICCAI community members.
More recent advances in medical multimodal large language models (MLLMs), such as MedGemini, have transformed clinical AI by integrating diverse data types (clinical reports, medical images, and graph representations) within a unified framework. These models show significant potential for enhancing diagnostics, personalizing care, and fostering human-AI collaboration in clinical decision making. By incorporating clinician inputs such as eye gaze and textual prompts, MLLMs can improve diagnostic accuracy, streamline workflows, and promote data-driven healthcare solutions. The second half of the tutorial covers these advanced topics: it begins with an introduction to open-source MLLMs and their applications in medical image analytics, after which participants engage in practical demonstrations across three key tasks.
The morning session covers foundation models, including CLIP-based vision-language models (VLMs) and generative models for medical image analytics. We will present the fundamentals underlying these models alongside live exercises on the basic concepts, so the audience gains hands-on experience; a minimal sketch of CLIP-style zero-shot classification follows below. The afternoon session focuses on the latest VLMs built on large language models. By covering advanced recent work, mostly from radiology, attendees will learn to leverage open-source multimodal large language models and to implement techniques such as knowledge-graph infusion through both lectures and practical demonstrations.
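To make the CLIP material concrete, here is a minimal zero-shot classification sketch using the Hugging Face transformers CLIP API. This is an illustrative sketch, not tutorial material: the general-domain checkpoint openai/clip-vit-base-patch32 stands in for whichever medical VLM the session uses, and chest_xray.png and the two text prompts are hypothetical placeholders.

```python
# Minimal CLIP-style zero-shot classification sketch.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chest_xray.png")  # hypothetical local image file
prompts = [
    "a chest X-ray showing pneumonia",
    "a normal chest X-ray",
]

# Encode the image and candidate text prompts together.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, normalized into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```

Medical CLIP variants (e.g., biomedical checkpoints) follow the same prompt-scoring pattern, though loading details vary by checkpoint.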
To become familiar with the latest foundation models, covering both pre-training and adaptation (e.g., parameter-efficient fine-tuning; see the LoRA sketch after these objectives).
To learn how foundation models can be relevant for multimodal medical imaging research.
To gain hands-on experience using the models for standard tasks in healthcare.
Develop a comprehensive understanding of Foundation Models (FMs) and Multimodal Large Language Models (MLLMs) and their transformative impact on medical image analytics.
Acquire practical skills in deploying open-source FMs and MLLMs for medical image analytics.
Implement advanced techniques, such as leveraging eye gaze and knowledge graphs, to enhance model performance.
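As a pointer for the adaptation objective above, here is a minimal parameter-efficient fine-tuning sketch using LoRA via the Hugging Face peft library. The gpt2 backbone is an assumed stand-in for whichever foundation model the tutorial uses, and the rank, scaling, and target-module values are illustrative choices, not prescribed settings.

```python
# Minimal LoRA sketch: wrap a backbone so only low-rank adapters train.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in backbone

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the LoRA update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Training then proceeds with a standard optimizer loop or the transformers Trainer; because gradients flow only to the adapters, fine-tuning stays cheap enough for domain adaptation to medical imaging data.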
Ismail Ben Ayed
Full Professor at ÉTS Montréal
Tanveer Syeda-Mahmood
IBM Fellow, Chief Scientist
Razi Mahmood
PhD student at Rensselaer Polytechnic Institute
Yunsoo Kim
PhD candidate at University College London
Weidi Xie
Associate Professor at Shanghai Jiao Tong University
Sophie Ostmeier
Postdoctoral Researcher at Stanford University
Luping Zhou
Associate Professor at University of Sydney
Curtis Langlotz
Professor at Stanford University
Chaoyi Wu
PhD candidate at Shanghai Jiao Tong University
Honghan Wu
Professor at University of Glasgow