Towards Safe Multi-Modal Learning:

Unique Challenges and Future Directions

ICCV 2025 Tutorial

Xi Li, Manling Li, Muchao Ye

University of Alabama at Birmingham, Northwestern University, The University of Iowa

Time: 1:00 - 5:00 PM HST, October 20, 2025

Location: Hawai'i Convention Center, 326B

Slides: https://drive.google.com/file/d/1dT8OUzAeanYQWxFqa1Au28Sucj4l9WjH/view?usp=sharing

Abstract:

Modern multi-modal learning leverages large models, such as large language models (LLMs), to integrate diverse data sources (e.g., text, images, audio, and video) and enhance understanding and decision-making. However, the inherent complexities of multi-modal learning introduce unique safety challenges that existing frameworks, primarily designed for uni-modal models, fail to address. This tutorial explores the emerging safety risks in multi-modal learning and provides insights into future research directions. We begin by examining the unique characteristics of multi-modal learning -- modality integration, alignment, and fusion. We then review existing safety studies across adversarial attacks, data poisoning, jailbreak exploits, and hallucinations. Next, we analyze emerging safety threats exploiting multi-modal challenges, including risks from additional modalities, modality misalignment, and fused representations. Finally, we discuss potential directions for enhancing the safety of multi-modal learning. As multi-modal learning expands, addressing its safety risks is crucial. This tutorial lays the foundation for understanding these challenges and fostering discussions on trustworthy systems.

Outline:

Part 1: Introduction and Background (30 mins)
- Uniqueness of Multi-Modal Learning
- Revisiting Safety Studies
- Safety Challenges of Multi-Modal Learning
Part 2: Modality Integration Exploitation (50 mins)
- Adversarial Perturbations in Image Modality
- Jailbreak Prompts in Text Modality
Break (15 mins)
Part 3: Modality Misalignment (50 mins)
- Hallucinations Caused by Modality Misalignment
- Attacks Exploiting Embedding Misalignment
Break (15 mins)
Part 4: Fused Vulnerabilities (30 mins)
- Poisoning Attacks via Cross-Modal Fusion
- Cross-Modal Threat Switching
Part 5: Conclusion and Future Directions (30 mins)
- Limitations of Existing Solutions
- Future Directions in Safe Multi-Modal Learning
Q&A (20 mins)

Presenters:

Dr. Xi Li is an Assistant Professor in the Department of Computer Science at the University of Alabama at Birmingham. She received her Ph.D. from the Department of Computer Science and Engineering at Pennsylvania State University. She earned her B.E. from Southeast University and her M.S. from Pennsylvania State University. Her research focuses on trustworthy AI and its integration in areas such as education and healthcare. Her work has contributed to several publications in conferences such as ACL, AAAI, ICCV, and ICML, as well as in journals like IEEE Transactions on Knowledge and Data Engineering and Neurocomputing. She had tutorial on trustworthy AI in the era of large-scale models at SDM. Additional information is available at her website.

Dr. Manling Li is an Assistant Professor at Northwestern University. She was a postdoc at Stanford University and obtained her PhD in Computer Science at University of Illinois Urbana-Champaign in 2023. She works on the intersection of language, vision, and robotics. Her work won the ACL’24 Outstanding Paper Award, ACL'20 Best Demo Paper Award, and NAACL'21 Best Demo Paper Award, etc. She was a recipient of Microsoft Research PhD Fellowship in 2021, an EE CS Rising Star in 2022, a DARPA Riser in 2022, etc. She served as Organizing Committee of ACL 25, NAACL 25, EMNLP 24, and delivered tutorials about multimodal knowledge at AAAI'25, IJCAI'24, CVPR'23, NAACL'22, AAAI'21, ACL'21, etc. Additional information is available at her website.

Dr. Muchao Ye is an Assistant Professor in Computer Science at the University of Iowa. He received his PhD in Informatics at the Pennsylvania State University in 2024. Before that, he obtained his Bachelor of Engineering degree in Information Engineering at South China University of Technology. His research interests lie in the intersection of AI, security, and healthcare, with a focus on improving AI safety from the perspective of adversarial robustness and human-understandable explainability. His research works have been published in top venues including CVPR, NeurIPS, KDD, ACL, and AAAI. He has given tutorials on premier conferences including KDD'21 and SDM'25. Additional information is available at his website.

Page updated

Google Sites

Report abuse