Multimodal data offers a more comprehensive and natural form of information representation and communication in the real world. Our digital world is multimodal, combining data of different modalities such as text, audio, images, videos, animations, drawings, depth, 3D, biometrics, and interactive content. Multimodal data analytics algorithms often outperform their single-modal counterparts on many real-world problems.
Big Data technology has emerged as a key driver of the new industrial revolution. With the rapid advancement of Big Data technologies and their wide-ranging applications across various sectors, recent research has increasingly focused on multimodal data analysis. In this context, the integration of multimodal AI-driven Big Data has become a highly relevant and timely area of study.
This workshop aims to generate momentum around this topic of growing interest, and to encourage interdisciplinary interaction and collaboration between Natural Language Processing (NLP), computer vision, signal processing, machine learning, robotics, Human-Computer Interaction (HCI), bioinformatics, healthcare, and geospatial computing communities. It serves as a forum to bring together active researchers and practitioners from academia and industry to share their recent advances in this promising area.
This is an open call for papers soliciting original contributions on recent findings in theory, methodologies, and applications in the field of multimodal AI. Topics of interest include, but are not limited to:
Multimodal data modeling
Multimodal learning
Cross-modal learning
Multimodal Large Language Models (LLMs)
Multimodal data analytics
Multimodal big data infrastructure and management
Multimodal scene understanding
Multimodal data fusion and data representation
Multimodal perception and interaction
Multimodal benchmark datasets and evaluations
Multimodal information tracking, retrieval, and identification
Multimodal object detection, classification, recognition, and segmentation
Multimodal AI Generation (text to image, image to text, video to text, text to video, etc.)
Language, vision, and sound (e.g., image/video searching and captioning, visual question answering, visual scene understanding, etc.)
Biometrics data mining (e.g., face recognition, behavior recognition, eye retina and movement, palm vein and print, etc.)
Multimodal applications (autonomous driving, cybersecurity, smart cities, intelligent transportation systems, industrial inspection, medical diagnosis, healthcare, social media, arts, etc.)
Sep. 29, 2025: Submission of full papers (8-10 pages including references & appendices)
Oct. 6, 2025: Submission of short papers (5-7 pages including references & appendices)
Oct. 13, 2025: Submission of poster papers (3-4 pages including references)
Oct. 27, 2025: Notification of paper acceptance
Nov. 12, 2025: Camera-ready copies of accepted papers due
Dec. 8-11, 2025: Workshop (Hybrid: In-person & Online)
Please follow the IEEE manuscript templates (Overleaf or US Letter) and the IEEE reference guide to format your paper, then submit it directly to the IEEE Big Data paper submission site.
Submissions must be in PDF format, written in English, formatted according to the IEEE camera-ready publication style, and anonymized (no author list), as all papers undergo double-blind peer review.
Accepted papers will be published in the IEEE Big Data proceedings.
For MMAI 2025, we are introducing a dual-workshop mode:
Online workshop at MMAI@IEEE Big Data 2025
In-person workshop at MMAI@IEEE ICDM 2025
Authors are encouraged to select their preferred venue when submitting their papers. Please visit both workshop pages for more details and submission instructions. Papers accepted by MMAI@IEEE ICDM 2025 are also eligible for complimentary online presentation at MMAI@IEEE Big Data 2025 on Dec. 8, 2025.
Program Chairs:
Chair: Lindi Liao, George Mason University, USA
Co-Chair: Yanjia Zhang, Baptist Health South Florida, USA
Co-Chair: Kaiqun Fu, South Dakota State University, USA
Program Committee Members:
Zhiqian Chen, Mississippi State University, USA
Naresh Erukulla, Macy's Inc., USA
Maryam Heidari, George Mason University, USA
Ge Jin, Purdue University, USA
Ashwin Kannan, Amazon, USA
Achin Kulshrestha, Google Inc., USA
Kevin Lybarger, George Mason University, USA
Abhimanyu Mukerji, Amazon, USA
Chen Shen, Google Inc., USA
Arpit Sood, Meta, USA
Gregory Joseph Stein, George Mason University, USA
Alex Wong, Yale University, USA
Marcos Zampieri, George Mason University, USA
If you are interested in serving on the workshop program committee or in reviewing papers, please contact the Workshop Chair.
This group serves as a forum for notices and announcements of interest to the Multimodal AI (MMAI) community. This includes news, events, calls for papers, calls for collaborations between academia and industry, dataset releases, employment-related announcements, etc.
You are welcome to subscribe to the Multimodal AI group.