2nd Collaboration and Evolution of Foundation and Specialized Models Workshop
Chicago, USA
Workshop at ACM International Conference on Multimedia Retrieval 2025 (June 30 - July 3, 2025)
The CEFSW Workshop 2025
Foundation models possess cognitive computing capabilities for handling complex, general-purpose tasks and are typically deployed in the cloud. Specialized models, by contrast, are designed to be lightweight, goal-oriented, quick to respond, and quick to iterate, which makes them potentially suitable for deployment on resource-constrained end devices. However, mainstream cloud-centric learning paradigms fall short in real-time performance, personalization, load cost, and privacy and security, and they often neglect the capabilities of device-based specialized models. In recent years, a growing number of researchers have studied the collaboration and evolution between cloud-based foundation models and device-based specialized models, proposing innovative solutions that stand to benefit a wide range of applications.
This workshop is closely aligned with the mission and scope of ACM ICMR, since the collaboration and evolution of multimedia foundation models and specialized models can be applied to numerous multimedia and visual application problems (e.g., search engines and recommender systems). Multimedia retrieval has always demanded computational efficiency, adaptability, and scalability. Foundation models, known for their extensive capacity and cognitive-like reasoning, have achieved remarkable success in tasks such as large-scale visual understanding, natural language processing, and multimodal retrieval. In certain contexts, however, these strengths come with notable drawbacks, such as heavy computational load, reliance on cloud infrastructure, and potential latency issues. Specialized models, on the other hand, excel at on-device inference with minimal computational overhead. By combining the two types of models effectively, ICMR’s goals of improving retrieval accuracy, personalization, and responsiveness can be addressed simultaneously.
For instance, consider real-time video search in security or healthcare scenarios. Instead of sending every video frame to a remote cloud-based foundation model, a specialized model can run on the local device to quickly sift through frames, identify candidate segments of interest, and forward only those segments, potentially accompanied by compact descriptors or embeddings, to the more computationally powerful foundation model. This division of labor can substantially reduce network load and latency, so that insights are delivered promptly and resources are conserved.
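The device-side step of the video search example can be illustrated with a minimal sketch. It assumes a hypothetical lightweight on-device model has already produced a relevance score and an embedding for each frame; the names `Segment`, `select_segments`, and `forwarded_frame_count`, as well as the threshold value, are illustrative and not taken from any specific system. Consecutive frames scoring at or above the threshold are grouped into candidate segments, and each segment is summarized by a compact descriptor (here, simply the mean of its frame embeddings) that would be sent to the cloud-based foundation model instead of the raw frames.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    start: int               # index of the first frame in the segment (inclusive)
    end: int                 # index of the last frame in the segment (inclusive)
    descriptor: List[float]  # compact summary: mean of the segment's frame embeddings


def _make_segment(start: int, end: int,
                  embeddings: Sequence[Sequence[float]]) -> Segment:
    """Average the embeddings of frames start..end into one descriptor."""
    frames = embeddings[start:end + 1]
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return Segment(start, end, mean)


def select_segments(scores: Sequence[float],
                    embeddings: Sequence[Sequence[float]],
                    threshold: float) -> List[Segment]:
    """Group consecutive frames whose (hypothetical) on-device score is
    >= threshold into candidate segments to forward to the cloud."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new candidate segment begins
        elif score < threshold and start is not None:
            segments.append(_make_segment(start, i - 1, embeddings))
            start = None  # the current segment has ended
    if start is not None:  # close a segment that runs to the last frame
        segments.append(_make_segment(start, len(scores) - 1, embeddings))
    return segments


def forwarded_frame_count(segments: Sequence[Segment]) -> int:
    """Number of frames covered by the forwarded segments."""
    return sum(s.end - s.start + 1 for s in segments)


if __name__ == "__main__":
    scores = [0.1, 0.8, 0.9, 0.2, 0.7]
    embeddings = [[1, 0], [0, 1], [0, 3], [1, 1], [2, 2]]
    segs = select_segments(scores, embeddings, threshold=0.5)
    for s in segs:
        print(s.start, s.end, s.descriptor)  # prints "1 2 [0.0, 2.0]" then "4 4 [2.0, 2.0]"
    print(forwarded_frame_count(segs), "of", len(scores), "frames forwarded")
```

In this toy run only 3 of the 5 frames are covered by forwarded segments, and each segment reaches the cloud as a single descriptor rather than as raw frames. A real deployment would substitute an actual on-device model for the hard-coded scores and would likely choose a richer descriptor than a plain mean.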
The objective of this workshop is to forge an interdisciplinary forum that advances conceptual, theoretical, and practical understanding of how foundation models (e.g., large language models, multimodal large language models) and specialized models can co-evolve, complement each other, and collaboratively improve multimedia retrieval tasks. By bringing together researchers and practitioners from academia, industry, and related fields, we aim to delineate a roadmap for research and development that moves beyond existing approaches.
The CEFSW Workshop will be held on June 30, 2025.
Organizers
is a Professor in the Department of Computing, Hong Kong Polytechnic University.
is an Assistant Professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University.
is a ZJU 100 Young Professor in the School of Software Technology, Zhejiang University.
Yuqing Zhang
is a final-year Ph.D. student in the College of Computer Science and Technology, Zhejiang University.