Invited Speakers
Jiahuan Pei, Social AI Group, Vrije Universiteit Amsterdam, The Netherlands
Multimodal AI Agents: Model Performance Meets User Experience. Multimodal AI agents are evolving into systems capable of understanding text, vision, and spatial environments while collaborating directly with humans. This talk explores how we should evaluate these agents from two complementary perspectives: model performance (how effectively they reason, perceive, and execute complex tasks) and user experience (how they influence human creativity, learning, and well-being). Through applications in mixed reality guidance, physical assembly benchmarking, virtual co-building, and gaze interpretation, we examine the opportunities and limitations of current multimodal systems. Although these agents show strong potential for transforming learning and creativity across virtual and physical environments, advances in spatial reasoning and long-horizon task execution are still needed before they can become reliable real-world assistants.
Livio de Luca, CNRS/MAP, France
From Multimedia Heritage Data to Knowledge Retrieval: Structuring, Correlating and Interpreting the Scientific Memory of Notre-Dame de Paris. Heritage sites are increasingly documented through large and heterogeneous multimedia corpora, combining 3D surveys, photographs, spatial annotations, historical sources, scientific observations, material analyses and interpretative records. Yet the main challenge is no longer only to acquire or visualise these data, but to make them searchable, interoperable and meaningful across disciplines, scales and temporalities.
Drawing on the scientific work conducted around the restoration of Notre-Dame de Paris, this talk presents an approach to heritage documentation in which digital models become structured environments for knowledge organisation and retrieval. It focuses on how heterogeneous multimedia data can be correlated along four complementary dimensions: space, form, time and knowledge domains. Through semantic annotation, geometric and visual descriptors, temporal tracking of transformations, and the formalisation of research practices, the objective is to move from isolated digital resources to dynamic knowledge ecosystems.
Particular attention will be given to the role of platforms such as Aïoli, Quasi.modo and Dür.air, which support collaborative acquisition, annotation, structuring and exploration of complex heritage datasets. These tools are conceived not merely as repositories or visualisation systems, but as socio-technical environments where scientific observations, interpretative hypotheses and material evidence can be progressively interlinked.
For the multimedia and information retrieval communities, this case study opens broader questions: How can multimodal heritage data be indexed and queried? How can spatial, visual, semantic and temporal structures be jointly exploited? And how can retrieval systems account not only for data content, but also for the scientific gestures and interpretative processes through which knowledge is produced? By addressing these questions, the presentation aims to position heritage science as a challenging and fertile field for multimedia structuring, multimodal correlation and knowledge-based retrieval.