When: Tuesday, 28th October 2025.
Where: Co-located with ACM Multimedia 2025 in Dublin, Ireland.
Invited Speakers
Allie Tran, Dublin City University
Lifelogging: from POV images to multi-perspective videos -- Lifelogging began as a personal quest: the idea that wearable cameras could capture one's daily life for reflection and retrieval. Today, we are entering a new phase: multi-perspective lifelogging, in which several individuals share overlapping lifelogs within a shared environment, producing rich, time-linked, multi-view video and sensor data. The CASTLE dataset is a first step toward that vision, capturing the complex, social, and contextual dimensions of everyday life. This keynote reflects on the evolution of lifelogging systems from single-user capture to collaborative memory infrastructures. I will discuss what this transition reveals about the nature of personal data, multimodal understanding, and the balance between privacy and utility. Looking ahead, I will outline how lifelogging can inform broader multimedia challenges, from egocentric vision to embodied AI, as we move from recording life to understanding lived experience.
Silvia Rossi, Centrum Wiskunde & Informatica
From Individual Immersion to Shared Experiences in Social XR -- Extended Reality (XR) is rapidly evolving from isolated, single-user applications towards shared, multi-user experiences. This shift opens new research questions about how groups interact, collaborate, and experience virtual environments. In this talk, I will outline my research, which has followed a similar path: from modelling and predicting individual user behaviour in immersive spaces to designing and evaluating social XR experiences. I will present my recent work on cultural and collaborative applications enabled by social VR platforms, as well as new evaluation methods that go beyond user experience, including studies on social density effects and shared mental models. Finally, I will discuss emerging research directions at the intersection of human behaviour, system design, and social interaction.
Samir Sadok, Inria @ Univ. Grenoble Alpes
Bridging Multimodal Representation Learning and Generation through Masked Modeling -- This talk presents recent advances in multimodal speech processing using masked modeling. I will show how learned audiovisual representations can support emotion recognition and how masked representations can be leveraged for precise analysis, control, and generation of speech. These developments highlight the potential of masked modeling to unify representation learning and generative tasks, offering new opportunities for interpretable and controllable speech technologies.