Event-driven vision to unlock low-latency real-time applications
Abstract
Biological sensory systems have evolved to capture those properties of surrounding objects and the environment that are useful for acting in the world. The physical properties of tactile, visual and auditory sensory organs, and the way neurons encode the characteristics of each stimulus, allow our brain to make sense of the world and make appropriate decisions about how to behave.
This is achieved by a highly efficient system that economizes on every bit of information, avoiding unnecessary energy consumption for each single action. As such, artificial systems have much to learn from biology in order to develop inexpensive solutions that can run on very small devices at minimal energy cost.
Since the first prototypes of neuromorphic vision sensors and computing devices, part of the community has focused its efforts on deploying neuromorphic devices in practical applications, to exploit their intrinsic data compression, low latency, high temporal resolution and high dynamic range.
The quest for the best strategy to exploit neuromorphic engineering is still open, but a lot of progress has been made. In this talk, I’ll describe approaches that focus on low-latency, real-time and lightweight implementations of event-driven vision, with an example on eye tracking.
Biography
Chiara Bartolozzi is a Researcher at the Italian Institute of Technology. She earned a degree in Engineering at the University of Genova (Italy) and a Ph.D. in Neuroinformatics at ETH Zurich, developing analog subthreshold circuits for emulating biophysical neuronal properties in silicon and modelling selective attention on hierarchical multi-chip systems. She currently leads the Event-Driven Perception for Robotics group, with the aim of applying the "neuromorphic" engineering approach to the design of robotic platforms as an enabling technology towards the design of autonomous machines.
Towards Robust Eye Tracking for Everyday Smart Eyewear Use
Abstract
Smart eyewear introduces unique challenges for eye tracking, from power constraints and device slippage to the variability of real-world environments and the shifting demands of everyday tasks. This talk will examine strategies to address these issues, beginning with conventional camera-based approaches and extending to low-power, high-frequency alternatives such as photosensors. I will discuss how the spatiotemporal dynamics of eye movements and multi-rate sensor fusion can contribute to robust and fast eye tracking in realistic settings. The talk will also consider existing datasets that support research in this direction, such as OpenSFEDS, and briefly reflect on how context-dependent strategies may shape future developments.
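As a rough illustration of the multi-rate fusion idea mentioned above (a generic sketch, not the method presented in the talk), the snippet below fuses a low-rate, absolute camera-based gaze estimate with high-rate incremental photosensor readings using a simple complementary filter; the sampling rates, signal names and filter weight are assumptions.

```python
# Illustrative sketch (not the speaker's method): fuse a low-rate camera-based
# gaze estimate with high-rate photosensor readings via a complementary filter.
# Sampling rates, signal names and the filter weight are hypothetical.
import numpy as np

CAM_HZ, SENSOR_HZ = 30, 1000      # assumed sampling rates of the two channels
ALPHA = 0.98                      # weight kept on the integrated high-rate signal

def fuse_gaze(cam_gaze, sensor_delta):
    """cam_gaze: (N_cam, 2) absolute gaze samples at CAM_HZ.
       sensor_delta: (N_sensor, 2) incremental gaze changes at SENSOR_HZ."""
    ratio = SENSOR_HZ // CAM_HZ
    fused = np.zeros((len(sensor_delta), 2), dtype=float)
    estimate = np.asarray(cam_gaze[0], dtype=float)
    for i, delta in enumerate(sensor_delta):
        estimate = estimate + delta        # fast, drift-prone high-rate update
        if i % ratio == 0 and i // ratio < len(cam_gaze):
            # slow, absolute correction from the camera channel
            estimate = ALPHA * estimate + (1 - ALPHA) * cam_gaze[i // ratio]
        fused[i] = estimate
    return fused
```

In this scheme the photosensor channel provides fast updates between camera frames, while each camera frame pulls the estimate back towards an absolute reference, limiting drift.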
Biography
Cristina Palmero is a research associate at the Social AI and Robotics Lab at King’s College London. She holds a PhD in Engineering and Applied Sciences from the University of Barcelona, where she focused on gaze estimation through spatiotemporal and multimodal deep learning. Cristina also holds an MSc in Artificial Intelligence and a BEng in Audiovisual Telecommunications Systems from the Polytechnic University of Catalonia. With more than a decade of experience across academia and industry (e.g. Computer Vision Center, Noldus IT, Meta), her research lies at the intersection of computer vision and machine learning for context-aware human behaviour understanding, analysis, and synthesis. She also leads the UDIVA dataset initiative within the HuPBA group at the University of Barcelona, to support context-aware modeling of social interactions.
Toward Everyday Sensing and Modelling of Gaze Behavior
Abstract
Smart eyewear and ubiquitous gaze sensing have the potential to revolutionize our understanding of human attentive behavior in everyday settings, enabling several novel applications. In this talk, I will start by discussing our work using computer vision to sense visual attention on mobile devices in the wild, showcasing the behavioral insights pervasive gaze sensing could uncover. Afterward, leveraging lower-level gaze features, I will demonstrate how we can build machine learning models to predict higher-level cognitive functions, such as recallability — the ability to remember specific information — with a focus on information visualizations. Finally, recognizing that smart eyewear is not always widely available, I will discuss the need for computational models of gaze behavior to bridge this gap.
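To make the idea of predicting a higher-level cognitive outcome from lower-level gaze features concrete, here is a minimal, hypothetical sketch (not the models presented in the talk): a few aggregate gaze statistics are fed to a standard classifier, using purely synthetic placeholder data.

```python
# Hedged sketch: predicting whether an item will later be recalled from simple
# aggregate gaze features. Feature names, the classifier, and the data are
# illustrative assumptions; the data below is synthetic, not a real dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Each row: [mean fixation duration (ms), fixation count, mean saccade amplitude (deg)]
X = rng.normal(loc=[250, 12, 4.0], scale=[60, 4, 1.5], size=(200, 3))
y = rng.integers(0, 2, size=200)          # 1 = later recalled, 0 = not recalled

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())   # chance-level here, since the data is random
```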
Biography
Mihai Bâce is an Assistant Professor leading the Computational Interaction for Intelligent Systems (CIIS) group. He is part of the Human-Computer Interaction research unit and an affiliated member of the e-Media research laboratory. Mihai's background is in computer science, with a PhD from ETH Zurich (Switzerland), a master's degree from EPFL (Switzerland), and a bachelor's from the Technical University of Cluj-Napoca (Romania). Mihai is an AI researcher working at the intersection of machine learning and human-computer interaction. His main goal is to develop new methods to better sense, understand, and computationally model human behaviours when interacting with complex (AI) systems.
OpenEdgeETH Glasses: A Fully Open, Ultra-Efficient On-Device AI Platform for Smart Eyewear
Abstract
This talk presents OpenEdgeETH Glasses, a new open platform that integrates dual ARM and RISC-V processors for ultra-efficient on-device AI within a milliwatt power envelope. The system incorporates multiple camera types, including low-power RGB and event-based cameras, as well as various sensors for comprehensive environmental perception. Leveraging these hardware capabilities, we introduce TinyssimoYOLO, a kilobyte-scale, highly optimized object-detection algorithm specifically tailored for resource-constrained devices, alongside other energy-efficient AI models. By combining state-of-the-art processor architectures and sensor technology, OpenEdgeETH Glasses aims to push the boundaries of wearable edge AI, delivering real-time data processing at minimal power budgets. This talk will delve into the platform’s design, discuss the challenges of integrating ARM and RISC-V cores in a wearable form factor, and explore how our optimized AI solutions can enable next-generation smart-glasses applications in fields such as assistive technologies, augmented reality, and beyond.
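To give a sense of how small a kilobyte-scale detector can be, the following sketch shows a tiny YOLO-style convolutional network with a per-cell box-prediction head. It is not the actual TinyssimoYOLO architecture; the layer widths, grayscale input and single output class are illustrative assumptions.

```python
# Minimal sketch of a tiny YOLO-style detector for a microcontroller-class budget.
# NOT the actual TinyssimoYOLO architecture; all sizes here are assumptions.
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    def __init__(self, num_classes=1, boxes_per_cell=1):
        super().__init__()
        out_ch = boxes_per_cell * (5 + num_classes)   # x, y, w, h, objectness, class scores
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),    # grayscale input
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, out_ch, 1)          # per-grid-cell box predictions

    def forward(self, x):
        return self.head(self.backbone(x))            # (B, out_ch, H/8, W/8)

model = TinyDetector()
params = sum(p.numel() for p in model.parameters())
print(f"{params} parameters (~{params / 1024:.1f} KiB with 8-bit weights)")
```

With 8-bit weights, a network of this size lands in the single-digit-kilobyte range, which is the regime the abstract refers to for resource-constrained wearables.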
Biography
Michele Magno is a Senior Scientist at the Department of Information Technology and Electrical Engineering at ETH Zurich. He holds a Privatdozent habilitation, and his group includes over 10 PhD students and senior researchers. He focuses on Edge AI and energy-efficient smart sensor systems for robotics, wearable, biomedical, IoT, and ambient intelligence applications. His work spans machine learning on low-power microcontrollers, sensor interfaces, energy harvesting, and ultra-low-power communication (including wake-up radios). Dr. Magno has published over 300 papers in peer-reviewed journals and conferences, along with multiple book chapters, solidifying his reputation as a leading voice in the field. He is a Senior Member of the IEEE, a member of the ACM, and actively collaborates with both academic and industrial partners to drive the development of next-generation wearable and IoT solutions. He is a visiting full professor at Mid Sweden University, Sweden, and at IT:U Linz, Austria.
Listen and watch what matters: research projects with smart glasses at AImageLab
Abstract
The talk presents three research projects focused on smart glasses carried out at AImageLab - University of Modena and Reggio Emilia. In particular, we will see how eye-tracking glasses can be used in the automotive field to learn, and then predict with AI, where it is best to look while driving. Next, a human pose and gaze estimation project will be shown, enabled by the collection of a dataset acquired with smart glasses. Finally, some details will be given on the Auralys project, currently under development, which uses microphones mounted on glasses as an aid for better speech understanding, intended for people with particular needs or pathologies.
Biography
Roberto Vezzani graduated in Computer Engineering in 2002 and received his PhD in Information Engineering in 2007 from the University of Modena and Reggio Emilia, Italy. Since 2016 he has been an Associate Professor at the Dipartimento di Ingegneria "Enzo Ferrari" of the University of Modena and Reggio Emilia. His research interests mainly concern video surveillance systems, with particular focus on head, hand and body pose estimation and vision-based HCI. He is also specializing in IoT, tinyML and multi-sensory systems, spanning thermal cameras, depth sensors, event-based cameras and other sensors.
Deploying AI vision models on edge devices: a case report
Abstract
In this talk we report on our experience in deploying tiny vision models on edge devices that can efficiently perform object detection and tracking tasks. Specifically, we focus on the Sony IMX-500, discussing a couple of implementation set-ups and our prototype implementations. We then focus on CLIDE, a framework that supports a distillation process based on the teacher-student approach, leading to an effective methodology for obtaining a tiny model tailored to the scenario of interest.
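As background for the teacher-student approach mentioned above, the snippet below shows a generic knowledge-distillation loss (an illustration of the general technique, not the CLIDE framework itself): the student is trained against both the teacher's softened outputs and the ground-truth labels; the temperature and loss weighting are assumptions.

```python
# Generic teacher-student distillation sketch; not the CLIDE framework.
# Temperature T and the weighting alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Combine soft-target KL divergence with the standard hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage: the teacher runs without gradients; only the student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```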
Biography
Daniele Nardi has been a full Professor since 2000 at Sapienza University of Rome, Faculty of Information Engineering, Informatics, and Statistics, Department of Computer, Control and Management Engineering "A. Ruberti", where he taught Artificial Intelligence courses for about 20 years. At Sapienza University, he was chair of the curricula in Computer Engineering, started the Master's degree in Artificial Intelligence and Robotics (2009), chaired the PhD programme in Engineering in Computer Science and the National PhD programme in Artificial Intelligence, and served as faculty vice-dean for International Relations.
Daniele Nardi is the leader of the DIAG research lab "Artificial Intelligence and Robotics". His research on AI and Robotics has been supported by national and international grants in several application domains: Elderly Care, Disaster Response, Cultural Heritage, Precision Agriculture, and RoboCup soccer-playing robots. Daniele Nardi has a rich publication record in AI and Robotics (H-index > 53), including several Best Paper awards. He is a EurAI Fellow, former President of the RoboCup Federation (2011-2014), and former Director of the Laboratory on Artificial Intelligence and Intelligent Systems (AIIS) of the National Consortium for Informatics (CINI) (2022-2024).
Egocentric Vision as a Bridge for Human-AI Collaboration
Abstract
A critical barrier to creating effective Human-AI collaborative systems is the AI's lack of understanding of the physical world from the human perspective. Traditional computer vision observes the world from an external, third-person viewpoint, struggling to grasp the context, focus, and intent behind human actions. In this direction, egocentric vision has the potential to serve as a bridge for Human-AI collaboration. In this talk, I will first discuss how understanding human-object relations can provide solid foundations on which to build such a bridge. The talk will then explore how the ability to predict the future is key for egocentric vision systems. Finally, I’ll discuss methods that aim to assist humans in their contexts, providing personalized assistance in their daily activities or in the execution of procedural activities.
Biography
Antonino Furnari is a tenure-track Assistant Professor at the University of Catania. He earned his PhD in Mathematics and Computer Science in 2017 at the University of Catania. He is part of the Image Processing Laboratory and the FPV@IPLAB group, a co-founder of Next Vision, and a member of the EPIC-KITCHENS, EGO4D, and EGO-EXO4D teams. His research focuses on understanding human activity and intent from egocentric video to develop assistive AI systems.
What are event-based cameras good for?
Abstract
Event-based cameras have been part of the technological landscape for several decades, offering unique advantages in various computer vision applications. Despite this, their functioning remains enigmatic to many. This talk aims to answer the following questions: What practical applications do they serve? How can we make the best use of them?
To answer these questions, I will delve into the extensive experience that Prophesee has garnered in deploying event-based cameras across various applications, such as robotics, computational photography, and eye tracking. We will explore the distinct benefits these cameras bring, such as high dynamic range, low latency, and efficient data processing. Additionally, I will address the current challenges faced in this field and highlight the promising opportunities on the horizon.
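To give a concrete sense of what efficient data processing means for an event camera, the sketch below accumulates an asynchronous stream of (x, y, timestamp, polarity) events into signed event-count frames that downstream vision models can consume. The array layout and the 10 ms window are assumptions; this is not a Prophesee API.

```python
# Illustrative sketch of accumulating an event stream into frame-like tensors.
# The structured-array layout and the 10 ms window are assumptions.
import numpy as np

def events_to_frame(events, height, width, window_us=10_000):
    """events: structured array with fields 'x', 'y', 't' (microseconds), 'p' (+1/-1).
       Returns one signed event-count frame per time window."""
    t0, t1 = events["t"].min(), events["t"].max()
    n_frames = int(np.ceil((t1 - t0 + 1) / window_us))
    frames = np.zeros((n_frames, height, width), dtype=np.int16)
    idx = (events["t"] - t0) // window_us            # which window each event falls into
    np.add.at(frames, (idx, events["y"], events["x"]), events["p"])
    return frames
```

Because only pixels that change produce events, such frames are typically sparse, which is one reason event data can be processed at low latency and low power.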
Biography
Daniele Perrone is Algorithm Director at Prophesee, where he leads research initiatives in computer vision and machine learning. His work focuses on applications for mobile phones, XR devices, industrial systems, and IoT. Over the course of nearly a decade at Prophesee, Daniele has made significant contributions to the advancement of event-based vision technology and holds several patents in this field. He earned his Ph.D. in Computer Vision from the University of Bern, Switzerland.
AI Sensors for Wearables
Abstract
This talk presents AI sensors for wearable devices, with a focus on smart and AR glasses as a natural form factor for everyday AI. First, it introduces key use cases such as object and text detection, gesture recognition, and context-aware features enabled by always-on sensing. Then, it reviews system architecture and explains why traditional camera pipelines fall short in terms of power and latency. Next, it presents AI sensor solutions, including digital pixel sensors with high dynamic range and fully integrated AI sensors with on-sensor compute. Near-sensor compute options using co-packaged processors are also discussed. The talk outlines important sensor requirements such as low power, HDR, small size, and support for multiple imaging modes. It compares on-sensor and near-sensor compute, highlighting trade-offs in power, flexibility, and integration. Finally, it discusses system-level optimization, including ML model size, memory, and software support.
Biography
Raffaele Capoccia was born in Nardò, Italy, in 1990. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Polytechnic University of Turin, Turin, Italy, in 2012 and 2014, respectively. He completed his Ph.D. in 2020 at the École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, under the supervision of Prof. C. Enz. During his PhD, he focused on the design and modeling of CMOS image sensors for ultra-low light conditions. Since 2020, he has been with Meta Reality Labs Research as a research scientist, working on the design of custom and smart image sensors for AR, VR, and smart glasses applications.