CVPR 2026 Tutorial
Xi Li, Manling Li, Muchao Ye
University of Alabama at Birmingham, Northwestern University, The University of Iowa
Time: June 3, 2026
Location: Mile High 3C
Slides: TBD
Modern multi-modal learning leverages large-scale models, such as large language models (LLMs), to integrate diverse data sources (e.g., text, images, audio, and video), enabling enhanced perception, reasoning, and decision-making. However, the increased complexity of multi-modal systems introduces safety challenges that fundamentally differ from those studied in traditional uni-modal learning, calling for new safety solutions. Motivated by this shift, this tutorial examines the evolving safety landscape of multi-modal learning from both the threat-modeling and the safety-solution perspective. We first present a taxonomy of emerging safety threats rooted in multi-modal characteristics, including compromised modality integration, modality misalignment, and fused vulnerabilities, and illustrate representative threat patterns such as adversarial attacks, data poisoning, jailbreak exploits, and hallucinations. We then analyze how these threats reshape threat models and invalidate common safety assumptions inherited from uni-modal systems. Building on this analysis, the tutorial reviews emerging safety solutions, discusses open challenges and future directions, and sheds light on paths toward more reliable multi-modal systems. We aim to provide a structured foundation for researchers and practitioners to advance trustworthy multi-modal AI systems.
Part 1: Introduction and Background (20 mins)
Uniqueness of Multi-Modal Learning
Revisiting Safety Studies
Safety Challenges of Multi-Modal Learning
Part 2: Emerging Multi-Modal Safety Threats (50 mins)
Compromised Modality Integration
Modality Misalignment
Fused Safety Risks
Break (10 mins)
Part 3: Evolving Multi-Modal Safety Landscape (20 mins)
How Multi-Modal Threats Reshape the Threat Model
When Traditional Safety Assumptions Break
Break (10 mins)
Part 4: Recent Advances in Safe Multi-Modal Learning (50 mins)
Supervised Safety Fine-Tuning
Preference-Based Optimization
Training-Free Solutions
Part 5: Conclusion (10 mins)
Open Challenges and Future Research Directions
Q&A (20 mins)
Dr. Xi Li is an Assistant Professor in the Department of Computer Science at the University of Alabama at Birmingham. She received her Ph.D. from the Department of Computer Science and Engineering at Pennsylvania State University. She earned her B.E. from Southeast University and her M.S. from Pennsylvania State University. Her research focuses on trustworthy AI and its integration into areas such as digital health and mental health. Her work has been published in conferences such as ICCV, AAAI, and ACL, as well as in journals such as IEEE Transactions on Knowledge and Data Engineering. She has given tutorials on trustworthy AI in the era of large-scale models at SDM 2025 and on safe multi-modal learning at ICCV 2025. Additional information is available at her website.
Dr. Manling Li is an Assistant Professor at Northwestern University. She was a postdoctoral researcher at Stanford University and obtained her Ph.D. in Computer Science from the University of Illinois Urbana-Champaign in 2023. She works at the intersection of language, vision, and robotics. Her work won the ACL'24 Outstanding Paper Award, the ACL'20 Best Demo Paper Award, and the NAACL'21 Best Demo Paper Award, among others. She was a recipient of the Microsoft Research PhD Fellowship in 2021, an EECS Rising Star in 2022, and a DARPA Riser in 2022. She has served on the organizing committees of ACL'25, NAACL'25, and EMNLP'24, and has delivered tutorials on multimodal knowledge at AAAI'25, IJCAI'24, CVPR'23, NAACL'22, AAAI'21, and ACL'21. Additional information is available at her website.
Dr. Muchao Ye is an Assistant Professor in Computer Science at the University of Iowa. He received his Ph.D. in Informatics from Pennsylvania State University in 2024. Before that, he obtained his Bachelor of Engineering degree in Information Engineering from South China University of Technology. His research interests lie at the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspectives of adversarial robustness and human-understandable explainability. His research has been published in top venues including CVPR, NeurIPS, KDD, ACL, and AAAI. He has given tutorials at premier conferences including KDD'21, SDM'25, and ICCV'25. Additional information is available at his website.