Schedule
9:00 am - 9:10 am Introduction and Opening Remarks
9:10 am - 10:05 am Oral Paper Session (12 min talk + 3 min Q&A)
End-to-End RAW Synergy for Elevated Vision-Language Reasoning
Commonsense Storage Reasoning in Domestic Scenes: A Challenge for Vision-Language Models
Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models
10:05 am - 10:10 am Best Reviewer Award and Best Paper Award
10:10 am - 11:10 am Poster Session
11:10 am - 12:10 pm Poster Session
MemeBlip2: A Novel Light Weight Multimodal System to Detect Harmful Memes
GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
ManeuverVLM: A Novel Multimodal Fusion of Scene Images and Temporal Signals for Maneuver Prediction
12:10 pm - 2:00 pm Lunch Break
2:00 pm - 2:45 pm Keynote by Professor Sai Rajeswar
2:45 pm - 3:30 pm Keynote by Professor Liang Zhao
3:30 pm - 4:00 pm End and Coffee Break