CVPR 2026 Tutorial
Xi Li, Manling Li, Muchao Ye
University of Alabama at Birmingham, Northwestern University, The University of Iowa
Time: June 3, 2026
Location: Mile High 3C
Slides: TBD
Modern multi-modal learning leverages large-scale models, such as large language models (LLMs), to integrate diverse data sources (e.g., text, images, audio, and video), enabling enhanced perception, reasoning, and decision-making. However, the increased complexity of multi-modal systems introduces safety challenges that fundamentally differ from those studied in traditional uni-modal learning, calling for new safety solutions. Motivated by this shift, this tutorial examines the evolving safety landscape of multi-modal learning from both the threat-modeling and the safety-solution perspective. We first present a taxonomy of emerging safety threats rooted in multi-modal characteristics, including compromised modality integration, modality misalignment, and fused vulnerabilities, and illustrate representative threat patterns such as adversarial attacks, data poisoning, jailbreak exploits, and hallucinations. We then analyze how these threats reshape threat models and invalidate common safety assumptions inherited from uni-modal systems. Building on this analysis, the tutorial reviews emerging safety solutions, discusses open challenges and future directions, and sheds light on paths toward more reliable multi-modal systems. We aim to provide a structured foundation for researchers and practitioners to advance trustworthy multi-modal AI systems.
Part 1: Introduction and Background (20 mins)
Uniqueness of Multi-Modal Learning
Revisiting Safety Studies
Safety Challenges of Multi-Modal Learning
Part 2: Emerging Multi-Modal Safety Threats (50 mins)
Compromised Modality Integration
Modality Misalignment
Fused Safety Risks
Break (10 mins)
Part 3: Evolving Multi-Modal Safety Landscape (20 mins)
How Multi-Modal Threats Reshape the Threat Model
When Traditional Safety Assumptions Break
Break (10 mins)
Part 4: Recent Advances in Safe Multi-Modal Learning (50 mins)
Supervised Safety Fine-Tuning
Preference-Based Optimization
Training-Free Solutions
Part 5: Conclusion (10 mins)
Open Challenges and Future Research Directions
Q&A (20 mins)
Dr. Xi Li is an Assistant Professor in the Department of Computer Science at the University of Alabama at Birmingham. She received her Ph.D. from the Department of Computer Science and Engineering at Pennsylvania State University. She earned her B.E. from Southeast University and her M.S. from Pennsylvania State University. Her research focuses on trustworthy AI and its integration into areas such as digital health and mental health. Her work has been published in conferences such as ICCV, AAAI, and ACL, as well as in journals such as IEEE Transactions on Knowledge and Data Engineering. She has given tutorials on trustworthy AI in the era of large-scale models at SDM 2025 and on safe multi-modal learning at ICCV 2025. Additional information is available at her website.
Dr. Manling Li is an Assistant Professor at Northwestern University. She was a postdoctoral researcher at Stanford University and obtained her Ph.D. in Computer Science from the University of Illinois Urbana-Champaign in 2023. She works at the intersection of language, vision, and robotics. Her work won the ACL'24 Outstanding Paper Award, the ACL'20 Best Demo Paper Award, and the NAACL'21 Best Demo Paper Award, among others. She was a recipient of the Microsoft Research PhD Fellowship in 2021, an EECS Rising Star in 2022, and a DARPA Riser in 2022. She has served on the organizing committees of ACL'25, NAACL'25, and EMNLP'24, and has delivered tutorials on multimodal knowledge at AAAI'25, IJCAI'24, CVPR'23, NAACL'22, AAAI'21, and ACL'21. Additional information is available at her website.
Dr. Muchao Ye is an Assistant Professor in Computer Science at the University of Iowa. He received his Ph.D. in Informatics from Pennsylvania State University in 2024. Before that, he obtained his Bachelor of Engineering degree in Information Engineering from South China University of Technology. His research interests lie at the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspectives of adversarial robustness and human-understandable explainability. His research has been published in top venues including CVPR, NeurIPS, KDD, ACL, and AAAI. He has given tutorials at premier conferences including KDD'21, SDM'25, and ICCV'25. Additional information is available at his website.