Research seminars
This site compiles the information about the invited research seminars and talks in the
Master in Robotics, Graphics and Computer Vision - Universidad de Zaragoza
All seminars will be hosted (in person or through online streaming) at the usual classroom (A07) unless stated differently.
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
Lorenzo Mur-Labadia, PhD. Multiverse Computing.
(Work done while a Research Scientist intern at Meta)
April 28th @ 13h - A07
Bio: Lorenzo Mur-Labadia is currently a researcher at Multiverse Computing. He obtained his PhD in Deep Learning and Computer Vision at the University of Zaragoza (Spain), supervised by Prof. Rubén Martínez-Cantín and Prof. Josechu Guerrero. His research focuses on video understanding, video–language learning, and 4D scene representations, with applications to embodied and egocentric perception. In the last months of his PhD, Lorenzo was a Research Scientist intern at Meta AI (FAIR) in Paris, where he worked with Adrien Bardes and Yann LeCun on large-scale self-supervised video understanding.
Abstract: Lorenzo will present his recent work, V-JEPA 2.1, a family of self-supervised models that learn dense, high-quality visual representations for both images and videos while retaining strong global scene understanding. The approach combines four key components. First, a dense predictive loss uses a masking-based objective in which both visible and masked tokens contribute to the training signal, encouraging explicit spatial and temporal grounding. Second, deep self-supervision applies the self-supervised objective hierarchically across multiple intermediate encoder layers to improve representation quality. Third, multi-modal tokenizers enable unified training across images and videos. Finally, the model benefits from effective scaling in both model capacity and training data. Together, these design choices produce representations that are spatially structured, semantically coherent, and temporally consistent.
Empirically, V-JEPA 2.1 achieves state-of-the-art performance on several challenging benchmarks, including 7.71 mAP on Ego4D for short-term object-interaction anticipation and 40.8 Recall@5 on EPIC-KITCHENS for high-level action anticipation, as well as a 20-point improvement in real-robot grasping success rate over V-JEPA-2 AC. The model also demonstrates strong performance in robotic navigation (5.687 ATE on TartanDrive), depth estimation (0.307 RMSE on NYUv2 with a linear probe), and global recognition (77.7 on Something-Something-V2). These results show that V-JEPA 2.1 significantly advances the state of the art in dense visual understanding and world modeling.
Neural Nanophotonics for Physical AI
Ethan Tseng - CTO of Cephia; prev. PhD student at Princeton University
April 13th @ 17h - Online (https://meet.google.com/ykf-nwxd-zut)
How to give a good (research) talk
Diego Gutierrez - Full Professor, Dept. Informática e Ingeniería de Sistemas, Universidad de Zaragoza
April 10th @ 12h - A07
Publish or Perish, Part 1: Why, When, Where, How much?
Juan D. Tardós. Full Professor, Dept. Informática e Ingeniería de Sistemas, Universidad de Zaragoza
March 13th @ 12h - A07
Perception-Based Techniques to Enhance User Experience in Virtual Reality
Colin Groth. Immersive Computing Lab @ NYU, New York, USA.
March 4th @ 15h - Online - Streamed at A07
Planning and Control of Biped Climbing Robots
Marc Fabregat, PhD. Robotics Expert @ BSH, Zaragoza, Spain.
February 20th @12h - A07 classroom, Ada Byron.
Camera Calibration in Sports
Floriane Magera, Innovation Engineer at EVS Broadcast Equipment. Researcher at Univ. of Liège (Belgium).
December 18th @ 15h - A07
Perceptually Inspired Learning Models for Intuitive Authoring of Material Appearance (PhD Defense)
Julia Guerrero-Viu, Graphics and Imaging Lab, Universidad de Zaragoza, Spain.
February 2nd @ 15h - Sala de Conferencias I3A - Edificio i+D+I