EPFL - Swiss Federal Institute of Technology Lausanne
Vision-Language Models for Embodied AI
I will present recent advances in robot learning for object localization and manipulation, with a focus on improving generalization, data efficiency, and real-world deployment. The talk will explore how to leverage both egocentric and exocentric perception. First, I will introduce an open-vocabulary architecture that uses a vision-language model (VLM) to estimate the 6D pose of previously unseen objects from natural language prompts. Next, I will discuss a two-stage framework for object-arrangement tasks, which uses a minimal number of demonstrations to train a VLM for zero-shot generalization. Finally, I will cover approaches to human-to-robot handover and show how generative models can be leveraged to train a robot policy entirely in simulation. This method eliminates the need for real-world data collection and provides a foundation for creating personalized embodied assistants.
Politecnico di Torino
Learning human skills from egocentric videos: a path for human-level humanoids
Notwithstanding decades of research, robots still struggle to perform daily living activities. Recent advances in vision-language-action models (VLAs) are pushing the limits of what current manipulators can do, but these models remain largely confined to short-horizon tasks and are not applicable when textual descriptions are insufficient. In our research, we investigate alternative solutions that use videos of human activities collected from a first-person perspective as a rich source of knowledge to i) capture the nuances of human planning for long-horizon procedures, ii) provide proper physical grounding, and iii) learn actionable policies for executing long-horizon (procedural) tasks. To do this, we take inspiration from the inherent hierarchical structure of human cognitive processing and foster the development of architectures that expose and highlight hierarchical representations of human activities, which we can use to better understand (and replicate) human behaviour in daily living activities.
Università Mercatorum
Deep understanding of shopper behaviors and interactions: research priorities for moving from insights to simulation
Advances in artificial intelligence and computer vision are driving a significant shift in the retail industry. At the core of this revolution is a deeper understanding of human behavior: how consumers engage with products, navigate stores, and make decisions in increasingly blended physical-digital environments. This talk will cover the most recent developments in third-person computer vision for action detection, trajectory analysis with vision and other sensing modalities, and expert-system automation of shelf inspection. Drawing on research and real-world deployments, the discussion will highlight the challenges of bridging the gap between technical innovation and human-centric design, and will offer a vision for the next generation of intelligent retail spaces.
KPMG
AI-Powered Retail: From Innovation to Value Creation
Vincenzo Martinese is a Partner in KPMG Italy's Advisory practice and currently serves as Head of the Consumer & Retail Sector. With over 25 years of experience, Vincenzo has led transformative initiatives across strategy, finance, and IT, helping major retail and consumer goods companies navigate complex change. His leadership is defined by a deep understanding of market dynamics, a commitment to innovation, and a focus on delivering sustainable growth. Vincenzo is recognized for his ability to align strategic vision with operational execution, making him a trusted advisor to both national and international clients in the Italian retail landscape.