Accepted papers can also be read on OpenReview.
Archival track:
End-to-End RAW Synergy for Elevated Vision-Language Reasoning
Kepeng Xu, Tong Qiao, Zhenyang Liu, Gang He
Kareem Elgohary, Ali Ayadi, Marzena Kawczynski, Agnes Bloch-Zupan, Cédric Wemmert
MemeBLIP2: A Novel Lightweight Multimodal System to Detect Harmful Memes
Ran Tong, Jiaqi Liu, Aowei Shen, Shuzheng Li, Changlin Yang, Lisha Xu
Non-Archival track:
Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models
MMDU-Bench: Multi-modal Deep Unlearning Benchmark
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
Nilay Pande, Sahiti Yerramilli, Jayant Sravan Tamarapalli, Rynaa Grover
HueManity: Probing Fine-Grained Visual Perception in MLLMs
Rynaa Grover, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Nilay Pande
Commonsense Storage Reasoning in Domestic Scenes: A Challenge for Vision-Language Models
Michaela Levi Richter, Oren Glickman, Reuth Mirsky
ManeuverVLM: A Novel Multimodal Fusion of Scene Images and Temporal Signals for Maneuver Prediction
Roksana Yahyaabadi, Soodeh Nikan
Yihong Tang, Ao Qu, Zhaokai Wang, Dingyi Zhuang, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao