08:30 - Start and welcome speech
08:45 - Oral 1: Audio-Visual State Space Model for Question Answering with BIMBA
Varun Shanmugam, Abduljalil Radman, Jorma Laaksonen. Aalto University, Finland.
09:05 - Oral 2: Multimodal LLMs for Visual Style Evaluation
Rubén Pascual Casas, Izaskun Aguirre, Mikel Sesma-Sara, Aranzazu Jurio, Daniel Paternain, Mikel Galar. Universidad Pública de Navarra, Spain.
09:25 - Oral 3: Unified Spatio-Temporal Model for Fair Facial Attribute Classification
Madhurika Patil, Ajita Rattani. University of North Texas, USA.
10:00 - Coffee break
10:30 - Keynote: Arun Ross - Foundation Models in Biometrics: From Matching to Explainability
11:15 - Oral 4: Evaluating Attention Reuse in Dynamic 3D Gaussian Reconstruction
M Uzzwal Reddy, Sowmya Kamath S. National Institute of Technology Karnataka, India.
T N Rickesh. Nanyang Technological University, Singapore.
11:35 - Oral 5: Are Pretrained VideoLLMs Enough for Action Recognition? A Zero-Shot Evaluation on Charades
Andrea Lagorio, Giuseppe A. Trunfio, Matteo Poddighe, Massimo Tistarelli, Pietro Ruiu. University of Sassari, Italy.
12:00 - Lunch break
13:00 - Oral 6: Adaptive Query-Conditional Fusion for Egocentric Natural Language Query Grounding
Enmin Zhong, Carlos R. del-Blanco, Fernando Jaureguizar, Narciso García. Universidad Politécnica de Madrid, Spain.
13:20 - Oral 7: Confidence Scores in Open-Vocabulary Detection Are a Biased Mixture of Scale and Semantics
Yi Tang Soon, Jun-Wei Hsieh. National Yang-Ming Chiao-Tung University, Taiwan.
13:40 - Closing remarks