June 30, 2025

Large Multimodal Models for Pixel-level Scene Understanding

at IEEE ICME 2025

A half-day workshop at IEEE ICME 2025

Call For Papers

The workshop invites original research contributions on all aspects of multimodal segmentation. Topics of interest include, but are not limited to:

Foundation models for image/video/3D segmentation
Open-vocabulary/open-world image/video/3D segmentation
Semantic/instance/panoptic image/video/3D segmentation
Universal image/video/3D segmentation models
Diffusion models for segmentation
Training-free segmentation models
Real-world segmentation applications in medical imaging, autonomous driving, robotics, and visual navigation

Submission Site: https://cmt3.research.microsoft.com/ICMEW2025/Submission/Index

Formatting Requirements:

Papers must not exceed 6 pages, including text, figures, and references.
Workshop papers should follow the same formatting guidelines as main conference papers; please refer to the main conference instructions. All submissions must adhere to the double-blind review process in accordance with the conference guidelines.

Important Dates

Paper Submission Due Date: April 1, 2025

Notification of Acceptance/Rejection: April 25, 2025

Camera-Ready Due Date: May 15, 2025

Fast Track: We offer a fast track for papers rejected from the main conference or other top-tier conferences, provided the reviews are uploaded with the submission. Acceptance notifications will be sent in mid-April.
No Proceedings (Non-archival): We also accept work previously accepted at the main ICME conference or other top-tier conferences, provided it is relevant to the workshop topic. To present such work, please email your submission to lmm.psu.organizers@gmail.com.

Organizers

Rao Muhammad Anwer, MBZUAI, UAE

Jorma Laaksonen, Aalto University, Finland

Wenguan Wang, Zhejiang University, China

Hisham Cholakkal, MBZUAI, UAE

Yutong Xie, University of Adelaide, Australia; MBZUAI, UAE

Jiale Cao, Tianjin University, China

A summit bringing together bold new ideas and trends in multimodal pixel-level scene understanding in the era of Generative AI

Agenda

Monday, June 30th, 2025
2:00 PM – 5:00 PM

Workshop W4 - Room GH

2:00 – 2:15 PM: Opening Remarks
Jorma Laaksonen (Aalto University)

2:15 – 3:00 PM: Keynote Talk
Fahad Khan (MBZUAI / Linköping University)

3:00 – 3:25 PM: Oral Talk 1
LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
Yixiao Liu, Yizhou Yang, Jinwen Li, Jun Tao, Ruoyu Li, Xiangkun Wang, Min Zhu, Junlong Cheng

3:25 – 3:50 PM: Oral Talk 2
RIASA: Enhancing Reasoning Industrial Anomaly Segmentation via Large Vision-Language Models
Zongyun Zhang, Xian Gao, Jiacheng Ruan, Ting Liu, Yuzhuo Fu

3:50 – 5:00 PM: Poster Session and Networking
RIASA: Enhancing Reasoning Industrial Anomaly Segmentation via Large Vision-Language Models
Zongyun Zhang, Xian Gao, Jiacheng Ruan, Ting Liu, Yuzhuo Fu

A Twin-network Architecture for RGB-based Panoramic Semantic Segmentation
Jingguo Liu, Jiayao Liu, Yujie Wang, Shigang Li, Jianfeng Li

LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
Yixiao Liu, Yizhou Yang, Jinwen Li, Jun Tao, Ruoyu Li, Xiangkun Wang, Min Zhu, Junlong Cheng

360-Degree Full-view Image Segmentation by Spherical Convolution compatible with Large-scale Planar Pre-trained Models
Jingguo Liu, Han Yu, Shigang Li, Jianfeng Li

Program Committee

Jean Lahoud, MBZUAI, UAE

Lei Huang, Beihang University, China

Ashmal Vayani, University of Central Florida, USA

Munawar Hayat, Qualcomm, USA

Mustansar Fiaz, IBM, USA

Muzammal Nasser, Khalifa University, UAE

Yaxing Wang, Nankai University, China

Aditya Arora, York University, Canada

Akshita Gupta, University of Guelph, Canada

K J Joseph, Adobe Research, India

Sanath Narayan, TII, UAE

Hanoona Bangalath, MBZUAI, UAE

Sara Ghaboura, MBZUAI, UAE

The Venue

La Cité Nantes Congress Centre

Nantes, France

ICME 2025 will be held at La Cité Nantes Congress Centre, a world-class convention center in the heart of the city, within walking distance of the train station, hotels, and historic sites.
