Event-based Vision

This is a topic that I started investigating when I moved to Zurich to work at the Robotics and Perception Group (UZH and ETH).

Event cameras, such as the Dynamic Vision Sensor (DVS), are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, minimal motion blur, and latency on the order of microseconds. However, because the output is a sequence of asynchronous events rather than intensity images, traditional vision algorithms cannot be applied directly; new algorithms that exploit the high temporal resolution and the asynchronous nature of the sensor are required.
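To make the sensing principle concrete, here is a minimal sketch of an idealized event generation model. It assumes that a pixel fires one event each time its log intensity changes by a fixed contrast threshold C since that pixel's last event. The function name and the frame-based input are illustrative assumptions: real sensors are asynchronous and noisy, and this sketch stamps all events triggered between two frames with the later frame's time instead of interpolating.

```python
import numpy as np

def generate_events(frames, timestamps, contrast=0.2, eps=1e-6):
    """Idealized event model (hypothetical helper): returns a list of (t, x, y, polarity).

    A pixel emits one event per multiple of `contrast` by which its log intensity
    has changed since that pixel's last event.
    """
    log_ref = np.log(np.asarray(frames[0], dtype=float) + eps)  # per-pixel reference level
    events = []
    for img, t in zip(frames[1:], timestamps[1:]):
        diff = np.log(np.asarray(img, dtype=float) + eps) - log_ref
        n_cross = np.floor(np.abs(diff) / contrast).astype(int)  # threshold crossings
        ys, xs = np.nonzero(n_cross)
        for y, x in zip(ys, xs):
            pol = 1 if diff[y, x] > 0 else -1
            count = int(n_cross[y, x])
            events.extend([(t, int(x), int(y), pol)] * count)
            # Advance the reference by the change that the emitted events account for
            log_ref[y, x] += pol * count * contrast
    return events
```

With `contrast=0.2`, a pixel whose intensity grows by a factor of e^0.5 (a log change of 0.5) emits two positive events; the remaining 0.1 stays below threshold until further change accumulates.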

Bio-inspired robotic eyes that better estimate motion - Advanced Science News. Event cameras mimic the human eye to allow robots to navigate their environment, and a new approach helps minimize computational costs.

Silicon retinas to help robots navigate the world - Advanced Science News. Event cameras, or silicon retinas, fuse data from multiple, moving cameras, simulating how the human eye perceives the world in 3D.

Event cameras: a completely new way of taking photographs (Eventkameras: eine völlig neue Art zu fotografieren). The Cluster of Excellence Science of Intelligence at TU Berlin is researching novel "event cameras", which could be used in robots and in our smartphones in the future.

Making the invisible visible: Schlieren photography with event cameras (Das Unsichtbare sichtbar machen: Schlierenfotografie mit Eventkameras). A completely new concept in photography can significantly improve the imaging of flows in transparent media.

Penguins in a State of Ecstasy. TU Berlin researchers use novel event cameras to shed light on a strange behavior shown by Antarctic penguins.

SCIoI @ECCV 2024 – Advancing robotic vision with event cameras and intelligent motion tracking - scienceofintelligence.de. At the European Conference on Computer Vision (ECCV) 2024, held this year in Milan from 29 September to 4 October, thousands of scientists, engineers, and ...

Fourier‐Based Action Recognition for Wildlife Behavior Quantification with Event Cameras. This article investigates the detection of oscillatory motion in nature, such as in a penguin colony in Antarctica, by combining neuromorphic cameras with frequency-domain techniques. The pixels of t...

Tracking the untrackable: A method that works when video fails - scienceofintelligence.de. New method developed by researchers from Science of Intelligence (SCIoI), TU Berlin and the University of Pennsylvania lets us follow fast motion where ...

Team of Researchers led by SCIoI’s Guillermo Gallego Receive IEEE Honorable Mention for Work on Robot Vision Stabilization - scienceofintelligence.de. SCIoI PI Guillermo Gallego has received a major recognition for his work on how robots see and interpret the world. His recent research paper, “On the ...

SCIoI’s Guillermo Gallego to co-organize 5th International Workshop on Event-based Vision at CVPR 2025 in Nashville - scienceofintelligence.de. On 12 June, 2025, Science of Intelligence PI Guillermo Gallego will once again co-organize the 5th International Workshop on Event-based Vision, to be held in ...

When machines learn to see differently, and artists start watching - scienceofintelligence.de. What if the future of seeing doesn’t lie in seeing more, but in seeing differently: How a robot lab, a media artist, and a novel camera technology joined ...

Survey paper (IEEE TPAMI 2022) and List of Event-based Vision Resources

After the success of the First International Workshop on Event-based Vision at ICRA'17, where we saw a large and growing number of people interested in event-based cameras, we started a List of Event-based Vision Resources and wrote a Survey paper. The paper is a comprehensive introduction to the topic. The list collects links to event camera devices as well as papers, videos, code, presentations, etc., describing the algorithms and systems developed using this exciting technology. We hope the list will help us, as well as anyone interested in this technology, stay aware of past and recent developments by pointing to the appropriate references, which are organized by topic, as shown in the Table of Contents at the top of the list.

Workshops on Event-based Vision

CVPR'25: Fifth International Workshop on Event-based Vision, Nashville, USA.
CVPR'23: Fourth International Workshop on Event-based Vision, Vancouver, Canada. Videos of the talks are available online!
CVPR'21: Third International Workshop on Event-based Vision. Videos of the talks are available online!
ICRA'20: Workshop on Sensing, Estimating and Understanding the Dynamic World
CVPR'19: Second International Workshop on Event-based Vision and Smart Cameras, Long Beach, USA. The slides and videos of the talks are available online!
ICRA'17: First International Workshop on Event-based Vision, Singapore. Video recordings and slides are available!

2017 Misha Mahowald Prize for Neuromorphic Engineering

Our research on event cameras for robotic applications wins the 2017 Misha Mahowald Prize! The award recognizes outstanding achievement in the field of neuromorphic engineering.

Recent Papers & Statistics

Here are some recent papers on event-based vision published in computer vision and robotics venues. As we can see, it is an emerging topic that attracts more and more researchers. This list is compiled from this repository.

If you use this spreadsheet in a publication, please cite it as: G. Gallego, "Recent papers on event-based vision", (Online) [copy URL here].

Papers on Event-based vision

2025_WACVW_EvGen_Gallego.pdf

WACV'25 Workshop EvGEN: Event-based Vision in the Era of Generative AI

2025-03-07 Event-based Vision - GaTech.pdf

DCL Seminar at Georgia Tech

2024_IROSW_Gallego.pdf

IROS'24 Workshop on Embodied Neuromorphic AI for Robotic Perception and Control

2024_ECCV_NeVi-W_Gallego.pdf

ECCV'24 NeVi Workshop. Invited Keynote presentation

Event-based SLAM

G. Gallego, J. Hidalgo-Carrió, D. Scaramuzza

Chapter 10: Event-based SLAM

SLAM Handbook. From Localization and Mapping to Spatial Intelligence, 2025.

@inbook{sh-ch10-event,
title = {Event-based {SLAM}},
author = {Guillermo Gallego and Javier Hidalgo-Carri{\'{o}} and Davide Scaramuzza},
booktitle = {{SLAM Handbook.} From Localization and Mapping to Spatial Intelligence},
publisher = {Cambridge University Press},
editor = {Luca Carlone and Ayoung Kim and Frank Dellaert and Timothy Barfoot and Daniel Cremers},
year = 2025
}

PDF, Project page

Simultaneous Motion And Noise Estimation with Event Cameras (ESMD)

Event cameras are emerging vision sensors whose noise is challenging to characterize. Existing denoising methods for event cameras consider other tasks such as motion estimation separately (i.e., sequentially after denoising). However, motion is an intrinsic part of event data, since scene edges cannot be sensed without motion. This work proposes, to the best of our knowledge, the first method that simultaneously estimates motion in its various forms (e.g., ego-motion, optical flow) and noise. The method is flexible, as it allows replacing the 1-step motion estimation of the widely-used Contrast Maximization framework with any other motion estimator, such as deep neural networks. The experiments show that the proposed method achieves state-of-the-art results on the E-MLB denoising benchmark and competitive results on the DND21 benchmark, while showing its efficacy on motion estimation and intensity reconstruction tasks. We believe that the proposed approach contributes to strengthening the theory of event-data denoising and to practical denoising use cases, as we release the code upon acceptance.
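ESMD builds on the Contrast Maximization (CMax) framework. As a rough illustration of CMax alone (not of the joint motion-and-noise method), the sketch below warps events along a candidate constant optical flow, accumulates them into an image of warped events (IWE), and scores the candidate by the IWE variance: the correct flow aligns the events and yields the sharpest, highest-contrast image. The helper names, the nearest-pixel accumulation, and the brute-force grid search (instead of gradient-based optimization) are simplifying assumptions.

```python
import numpy as np

def warp_events(xs, ys, ts, flow, t_ref=0.0):
    """Move events to time t_ref along a constant optical flow (vx, vy) in px/s."""
    vx, vy = flow
    dt = ts - t_ref
    return xs - vx * dt, ys - vy * dt

def image_of_warped_events(xw, yw, shape):
    """Accumulate warped events into an image by nearest-pixel voting."""
    H, W = shape
    xi = np.round(xw).astype(int)
    yi = np.round(yw).astype(int)
    valid = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
    iwe = np.zeros(shape)
    np.add.at(iwe, (yi[valid], xi[valid]), 1.0)  # handles repeated indices correctly
    return iwe

def iwe_variance(flow, xs, ys, ts, shape):
    """CMax objective: variance (contrast) of the IWE for a candidate flow."""
    xw, yw = warp_events(xs, ys, ts, flow)
    return image_of_warped_events(xw, yw, shape).var()

def best_flow_grid_search(xs, ys, ts, shape, candidates):
    """Pick the candidate flow that maximizes the IWE contrast."""
    return max(candidates, key=lambda f: iwe_variance(f, xs, ys, ts, shape))
```

For a vertical edge moving at 20 px/s, the candidate (20, 0) collapses all events of each row onto one pixel, which maximizes the variance because the total event count is fixed.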

Reference:

S. Shiba, Y. Aoki, G. Gallego,

Simultaneous Motion And Noise Estimation with Event Cameras

IEEE International Conference on Computer Vision (ICCV), 2025.

PDF, Project page

Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras (E2FAI)

Event cameras rely on motion to obtain information about scene appearance. This means that appearance and motion are inherently linked: either both are present and recorded in the event data, or neither is captured. Previous works treat the recovery of these two visual quantities as separate tasks, which does not fit with the above-mentioned nature of event cameras and overlooks the inherent relations between them. We propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance) using a single network. From the data generation model, we newly derive the event-based photometric error as a function of optical flow and image intensity. This error is further combined with the contrast maximization framework to form a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Exhaustive experiments show our method's state-of-the-art performance: in optical flow estimation, it reduces EPE by 20% and AE by 25% compared to unsupervised approaches, while delivering competitive intensity estimation results, particularly in high dynamic range scenarios. Our method also achieves shorter inference time than all other optical flow methods and many of the image reconstruction methods, even though those methods output only one quantity.
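The link between flow and intensity can be seen in the linearized event generation model: over a short interval Δt, the brightness increment recorded by events at a pixel is approximately -∇L · v Δt, where L is the log intensity and v the optical flow. The sketch below compares the increment measured from event polarities with this prediction to form a simple photometric error. It assumes a known contrast threshold and a single constant flow for the whole image (a per-pixel flow array would broadcast as well); it is a simplification for illustration, not the paper's loss, and all names are hypothetical.

```python
import numpy as np

def brightness_increment_from_events(xs, ys, pols, shape, contrast=0.2):
    """Measured log-brightness increment: contrast times the sum of polarities per pixel."""
    dL = np.zeros(shape)
    np.add.at(dL, (ys, xs), contrast * np.asarray(pols, dtype=float))
    return dL

def predicted_increment(L, flow, dt):
    """Linearized model: dL ~= -(grad L . v) * dt (brightness constancy)."""
    gy, gx = np.gradient(L)  # derivatives along rows (y) and columns (x)
    vx, vy = flow
    return -(gx * vx + gy * vy) * dt

def photometric_error(L, flow, dt, dL_events):
    """Sum of squared differences between measured and predicted increments."""
    r = dL_events - predicted_increment(L, flow, dt)
    return float(np.sum(r ** 2))
```

For a log-intensity ramp of slope 0.1 per pixel moving at 2 px/s, each pixel should record one negative event (with C = 0.2) per second, and the error vanishes only for the matching flow.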

Reference:

S. Guo, F. Hamann, G. Gallego,

Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras

IEEE International Conference on Computer Vision (ICCV), 2025.

PDF, Project page

DERD-Net: Learning Depth from Event-based Ray Densities

Event cameras offer a promising avenue for multi-view stereo depth estimation and Simultaneous Localization And Mapping (SLAM) due to their ability to detect blur-free 3D edges at high speed and over broad illumination conditions. However, traditional deep learning frameworks designed for conventional cameras struggle with the asynchronous, stream-like nature of event data, as their architectures are optimized for discrete, image-like inputs. We propose a scalable, flexible and adaptable framework for pixel-wise depth estimation with event cameras in both monocular and stereo setups. The 3D scene structure is encoded into disparity space images (DSIs), representing spatial densities of rays obtained by back-projecting events into space via known camera poses. Our neural network processes local subregions of the DSIs, combining 3D convolutions and a recurrent structure to recognize valuable patterns for depth prediction. Local processing enables fast inference with full parallelization and ensures constant ultra-low model complexity and memory costs, regardless of camera resolution. Experiments on standard benchmarks (MVSEC and DSEC datasets) demonstrate unprecedented effectiveness: (i) using purely monocular data, our method achieves results comparable to existing stereo methods; (ii) when applied to stereo data, it strongly outperforms all state-of-the-art (SOTA) approaches, reducing the mean absolute error by at least 42%; (iii) our method also increases depth completeness more than threefold while still reducing the median absolute error by at least 30%. Given its remarkable performance and effective processing of event data, our framework holds strong potential to become a standard approach for using deep learning for event-based depth estimation and SLAM.
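The DSI that DERD-Net takes as input can be illustrated with a classical space-sweep vote: each event is back-projected as a viewing ray using its camera pose, and the ray increments the DSI cells it crosses on a set of depth planes of a reference view. The sketch below assumes a pinhole camera that only translates (no rotation), nearest-cell voting, and a per-pixel argmax over depth as a crude read-out; in DERD-Net that read-out is replaced by a neural network operating on local DSI subregions. Function names are hypothetical.

```python
import numpy as np

def build_dsi(events_px, cam_positions, f, cx, cy, depths, shape):
    """Ray-density DSI of shape (len(depths), H, W) in a reference view at the origin.

    events_px: list of (u, v) pixel coordinates; cam_positions: matching (3,) camera
    centers. All cameras are assumed to share the reference orientation.
    """
    H, W = shape
    dsi = np.zeros((len(depths), H, W))
    for (u, v), c in zip(events_px, cam_positions):
        d = np.array([(u - cx) / f, (v - cy) / f, 1.0])  # ray direction, z-component 1
        for j, Z in enumerate(depths):
            X = c + (Z - c[2]) * d          # point on the ray lying on depth plane Z
            ur = int(round(f * X[0] / X[2] + cx))
            vr = int(round(f * X[1] / X[2] + cy))
            if 0 <= ur < W and 0 <= vr < H:
                dsi[j, vr, ur] += 1.0       # one vote per ray and depth plane
    return dsi

def depth_from_dsi(dsi, depths):
    """Crude read-out: the depth plane with the highest ray density per pixel."""
    return np.asarray(depths)[np.argmax(dsi, axis=0)]
```

Rays from events generated by the same 3D point intersect at that point, so the votes concentrate in a single DSI cell at the true depth.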

Reference:

D. de Oliveira Hitzges, S. Ghosh, G. Gallego,

DERD-Net: Learning Depth from Event-based Ray Densities

(under review), 2025.

PDF, Project page

ETAP: Event-based Tracking of Any Point

Tracking any point (TAP) recently shifted the motion estimation paradigm from focusing on individual salient points with local templates to tracking arbitrary points with global image contexts. However, while research has mostly focused on driving the accuracy of models in nominal settings, addressing scenarios with difficult lighting conditions and high-speed motions remains out of reach due to the limitations of the sensor. This work addresses this challenge with the first event camera-based TAP method. It leverages the high temporal resolution and high dynamic range of event cameras for robust high-speed tracking, and the global contexts in TAP methods to handle asynchronous and sparse event measurements. We further extend the TAP framework to handle event feature variations induced by motion -- thereby addressing an open challenge in purely event-based tracking -- with a novel feature-alignment loss which ensures the learning of motion-robust features. Our method is trained with data from a new data generation pipeline and systematically ablated across all design decisions. Our method shows strong cross-dataset generalization and performs 136% better on the average Jaccard metric than the baselines. Moreover, on an established feature tracking benchmark, it achieves a 20% improvement over the previous best event-only method and even surpasses the previous best events-and-frames method by 4.1%. Our code is available.
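For reference, the average Jaccard (AJ) metric can be computed roughly as in the TAP-Vid benchmark, as we understand it: for each pixel threshold δ in {1, 2, 4, 8, 16}, a prediction is a true positive if both the prediction and the ground truth are visible and the predicted point lies within δ pixels; Jaccard is TP / (TP + FP + FN), and AJ averages it over the thresholds. This is a hypothetical helper; the official evaluation code (e.g., resolution rescaling and per-video averaging) may differ in its details.

```python
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis, thresholds=(1, 2, 4, 8, 16)):
    """pred_xy, gt_xy: (N, 2) point positions; pred_vis, gt_vis: (N,) boolean visibility."""
    dist = np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy), axis=-1)
    scores = []
    for d in thresholds:
        close = dist < d
        tp = np.sum(pred_vis & gt_vis & close)
        fp = np.sum(pred_vis & ~(gt_vis & close))   # predicted visible but wrong or occluded
        fn = np.sum(gt_vis & ~(pred_vis & close))   # visible point that was missed
        denom = tp + fp + fn
        scores.append(tp / denom if denom > 0 else float("nan"))
    return float(np.mean(scores))
```

A visible point predicted as visible but too far away counts as both a false positive and a false negative, so AJ penalizes localization and visibility errors jointly.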

Reference:

F. Hamann, D. Gehrig, F. Febryanto, K. Daniilidis, G. Gallego,

ETAP: Event-based Tracking of Any Point

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
Highlight

PDF, Poster, Project page
SCIoI Excellence Cluster News (04.2025)

Iterative Event-based Motion Segmentation by Variational Contrast Maximization

Event cameras provide rich signals that are suitable for motion estimation since they respond to changes in the scene. As any visual change in the scene produces event data, it is paramount to classify the data into different motions (i.e., motion segmentation), which is useful for various tasks such as object detection and visual servoing. We propose an iterative motion segmentation method that classifies events into background (e.g., dominant motion hypothesis) and foreground (independent motion residuals), thus extending the Contrast Maximization framework. Experimental results demonstrate that the proposed method successfully classifies event clusters on both public and self-recorded datasets, producing sharp, motion-compensated edge-like images. The proposed method achieves state-of-the-art accuracy on moving object detection benchmarks with an improvement of over 30%, and shows potential for application to more complex and noisy real-world scenes. We hope this work broadens the sensitivity of Contrast Maximization with respect to both motion parameters and input events, thus contributing to theoretical advancements in event-based motion segmentation.
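A much simpler relative of CMax-based segmentation illustrates the idea of classifying events by motion: warp all events with each candidate flow, and assign every event to the candidate whose image of warped events (IWE) is brightest at that event's warped location, i.e., where the event contributes to a sharp edge. This hard-assignment sketch with fixed candidate flows is an assumption for illustration only; it is not the paper's iterative background/foreground scheme, and the names are hypothetical.

```python
import numpy as np

def warp(xs, ys, ts, flow, t_ref=0.0):
    """Move events to time t_ref along a constant flow (vx, vy) in px/s."""
    vx, vy = flow
    return xs - vx * (ts - t_ref), ys - vy * (ts - t_ref)

def segment_events(xs, ys, ts, flows, shape):
    """Return, for each event, the index of the candidate flow that explains it best."""
    H, W = shape
    scores = []
    for flow in flows:
        xw, yw = warp(xs, ys, ts, flow)
        xi, yi = np.round(xw).astype(int), np.round(yw).astype(int)
        valid = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
        iwe = np.zeros(shape)
        np.add.at(iwe, (yi[valid], xi[valid]), 1.0)
        score = np.zeros(xs.shape)            # events warped outside the image score 0
        score[valid] = iwe[yi[valid], xi[valid]]
        scores.append(score)
    return np.argmax(np.stack(scores), axis=0)
```

Events belonging to a motion pile up on a few pixels when warped with that motion, so their IWE value there is high, whereas the wrong motion smears them out.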

Reference:

R. Yamaki, S. Shiba, G. Gallego, Y. Aoki,

Iterative Event-based Motion Segmentation by Variational Contrast Maximization

IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2025.

PDF, YouTube, Project page

Event-based Continuous Color Video Decompression from Single Frames

We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image and an event camera stream. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are ideal sensors to solve this problem because they encode compressed change information at high temporal resolution. In this work, we tackle the problem of event-based continuous color video decompression, pairing single static color frames and event data to reconstruct temporally continuous videos. Our approach combines continuous long-range motion modeling with a neural synthesis model, enabling frame prediction at arbitrary times within the events. Our method requires only an initial image, which increases robustness to sudden motions and light changes, minimizes prediction latency, and decreases bandwidth usage. We also introduce a novel single-lens beamsplitter setup that acquires aligned images and events, and a novel and challenging Event Extreme Decompression Dataset (E2D2) that tests the method under various lighting and motion profiles. We thoroughly evaluate our method by benchmarking color frame reconstruction, outperforming the baseline methods by 3.61 dB in PSNR and with a 33% decrease in LPIPS, as well as showing superior results on two downstream tasks.

Reference:

Z. Wang, F. Hamann, K. Chaney, W. Jiang, G. Gallego, K. Daniilidis,

Event-based Continuous Color Video Decompression from Single Frames

IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2025.

PDF, Project page

Combined Physics and Event Camera Simulator for Slip Detection

Robot manipulation is a common task in fields like industrial manufacturing. Detecting when objects slip from a robot's grasp is crucial for safe and reliable operation. Event cameras, which register pixel-level brightness changes at high temporal resolution (called "events"), offer an elegant feature when mounted on a robot's end effector: since they only detect motion relative to their viewpoint, a properly grasped object produces no events, while a slipping object immediately triggers them. To research this feature, representative datasets are essential, both for analytic approaches and for training machine learning models. Most current research on slip detection with event-based data relies on real-world scenarios and manual data collection, as well as additional setups for data labeling. This can significantly increase the time required for data collection, limit flexibility in scene setups, and make experiments complex to repeat. This paper presents a simulation pipeline for generating slip data using the described camera-gripper configuration in a robot arm, and demonstrates its effectiveness through initial data-driven experiments. Once set up, a simulator can reduce the time spent on data collection, allow the setup to be altered at any time, and simplify both the repetition of experiments and the generation of arbitrarily large datasets. Two distinct datasets were created and validated through visual inspection and artificial neural networks (ANNs). Visual inspection confirmed photorealistic frame generation and accurate slip modeling, while three ANNs trained on this data achieved high validation accuracy and demonstrated good generalization capabilities on a separate test set, along with initial applicability to real-world data.

Reference:

T. Reinold, S. Ghosh, G. Gallego,

Combined Physics and Event Camera Simulator for Slip Detection

IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025, pp. 935-943.

PDF, Project page

ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras

Event-based visual odometry is a specific branch of visual Simultaneous Localization and Mapping (SLAM) techniques, which aims at solving tracking and mapping subproblems (typically in parallel) by exploiting the special working principles of neuromorphic (i.e., event-based) cameras. Due to the motion-dependent nature of event data, explicit data association (i.e., feature matching) under large-baseline viewpoint changes is difficult to establish, making direct methods a more rational choice. However, state-of-the-art direct methods are limited by the high computational complexity of the mapping sub-problem and the degeneracy of camera pose tracking in certain degrees of freedom (DoF) in rotation. In this paper, we tackle these issues by building an event-based stereo visual-inertial odometry system on top of a direct pipeline. Specifically, to speed up the mapping operation, we propose an efficient strategy for sampling contour points according to the local dynamics of events. The mapping performance is also improved in terms of structure completeness and local smoothness by merging the temporal stereo and static stereo results. To circumvent the degeneracy of camera pose tracking in recovering the pitch and yaw components of general 6-DoF motion, we introduce IMU measurements as motion priors via pre-integration. To this end, a compact back-end is proposed for continuously updating the IMU bias and predicting the linear velocity, enabling an accurate motion prediction for camera pose tracking. The resulting system scales well with modern high-resolution event cameras and leads to better global positioning accuracy in large-scale outdoor environments. Extensive evaluations on five publicly available datasets featuring different resolutions and scenarios demonstrate the superior performance of the proposed system against five state-of-the-art methods.
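The rotational part of IMU pre-integration, which ESVO2 uses to provide motion priors, can be sketched as composing bias-corrected gyroscope increments on SO(3): ΔR = Π_k Exp((ω_k - b_g) Δt_k). The helper names are hypothetical, the gyroscope bias is treated as known and constant over the interval, and noise propagation, the velocity and position terms, and the system's back-end bias estimation are all omitted.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues formula: rotation vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])     # skew-symmetric matrix of the unit axis
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def preintegrate_rotation(gyro, dts, bias):
    """Delta R between two keyframes from gyroscope samples (rad/s) and time steps (s)."""
    dR = np.eye(3)
    for w, dt in zip(gyro, dts):
        dR = dR @ so3_exp((np.asarray(w, dtype=float) - bias) * dt)  # right-composition
    return dR
```

For example, 100 samples of 1 rad/s... more precisely, a bias-corrected rate of π/2 rad/s about z integrated over one second yields a 90° rotation about z.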

Reference:

J. Niu, S. Zhong, X. Lu, S. Shen, G. Gallego, Y. Zhou,

ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras

IEEE Transactions on Robotics (TRO), 2025.

doi, PDF, Project page

Event-based Stereo Depth Estimation: A Survey

This survey aims to provide a comprehensive view of the topic of event-based stereo depth estimation. We cover the entire spectrum of approaches with key insights, highlighting their advantages and limitations. We categorize and discuss both instantaneous stereo methods and long-term methods suitable for simultaneous localization and mapping (SLAM). We track the research evolution of the field, comparing and contrasting techniques both theoretically and empirically. This is the first survey to extensively cover the ever-growing learning-based literature on this topic. It is also the first survey that not only discusses existing datasets and benchmarks for event-based stereo, but also provides practical suggestions for establishing new benchmarks to advance the field. We discuss the major advantages and challenges faced by event-based stereo depth estimation. We identify several gaps and propose future research directions. Despite the large body of work on this topic, our study reveals that we are still far from reaching optimal performance, not only in terms of accuracy but also efficiency, which is one of the cornerstones of event-based computing. We hope this survey inspires future research in this area, by serving as an accessible entry point for newcomers, as well as a practical guide for seasoned researchers in the community.
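Stereo methods of both families ultimately rest on triangulation. For a rectified stereo pair with focal length f (in pixels) and baseline b (in meters), a disparity d (in pixels) corresponds to depth Z = f·b/d. The sketch below applies this standard relation to a disparity map; the function name and the choice to map non-positive disparities to infinite depth are illustrative assumptions.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Depth (m) from disparity (px) for a rectified stereo pair: Z = f * b / d."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full_like(d, np.inf)          # no valid disparity -> treated as far away
    valid = d > min_disp
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```

Because Z varies as 1/d, a one-pixel disparity error matters much more for distant points (small d) than for nearby ones, which is one reason depth accuracy degrades with range.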

Reference:

S. Ghosh and G. Gallego,

Event-based Stereo Depth Estimation: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

doi, PDF, Table of Methods, Table of Datasets, GitHub

On the Benefits of Visual Stabilization for Frame- and Event-based Perception

Vision-based perception systems are typically exposed to large orientation changes in different robot applications. In such conditions, their performance might be compromised due to the inherent complexity of processing data captured under challenging motion. Integrating mechanical stabilizers to compensate for camera rotation is not always possible due to robot payload constraints. This paper presents a processing-based stabilization approach to compensate for the camera's rotational motion both on events and on frames (i.e., images). Assuming that the camera's attitude is available, we evaluate the benefits of stabilization in two perception applications: feature tracking and estimating the translation component of the camera's ego-motion. The validation is performed using synthetic data and sequences from well-known event-based vision datasets. The experiments show that stabilization can improve the accuracy of feature tracking and camera ego-motion estimation by 27.37% and 34.82%, respectively. Concurrently, stabilization can reduce the processing time of computing the camera's linear velocity by at least 25%.
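Assuming that the camera attitude is available, rotation can be compensated for in software with the infinite homography H = K R K^(-1), which maps a pixel observed under orientation R (camera-to-reference) to where the same viewing direction appears in the reference orientation. The sketch below applies H to event coordinates; the function name and the choice of reference orientation are assumptions, and the paper's pipeline may differ in its details.

```python
import numpy as np

def stabilize_events(xy, R_cam, K):
    """Map (N, 2) event pixels seen under orientation R_cam to the reference orientation."""
    xy = np.asarray(xy, dtype=float)
    H = K @ R_cam @ np.linalg.inv(K)                 # rotation-only (infinite) homography
    pts = np.column_stack([xy, np.ones(len(xy))])    # homogeneous pixel coordinates
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]            # back to inhomogeneous pixels
```

Since only rotation is removed, the stabilized events still carry the translational motion, which is the part that the downstream ego-motion estimation is meant to recover.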

Reference:

J.P. Rodríguez-Gómez, J.R. Martínez-de Dios, A. Ollero, G. Gallego,

On the Benefits of Visual Stabilization for Frame- and Event-based Perception

IEEE Robotics and Automation Letters (RA-L), 2024.
Honorable Mention (one of only 5 papers among more than 1500 papers in RA-L during 2024)

doi, Project page

Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation

Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories and propose an efficient solution to the high-dimensional assignment problem between non-linear trajectories and events. Their effectiveness is demonstrated in two scenarios: in dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%; in optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark.

Reference:

F. Hamann, Z. Wang, I. Asmanis, K. Chaney, G. Gallego, K. Daniilidis,

Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation

European Conference on Computer Vision (ECCV), 2024.

PDF, Poster, Project page
Invited talk at NeuroPAC

Event-based Photometric Bundle Adjustment (EPBA)

We tackle the problem of bundle adjustment (i.e., simultaneous refinement of camera poses and scene map) for a purely rotating event camera. Starting from first principles, we formulate the problem as a classical non-linear least squares optimization. The photometric error is defined using the event generation model directly in the camera rotations and the semi-dense scene brightness that triggers the events. We leverage the sparsity of event data to design a tractable Levenberg-Marquardt solver that handles the very large number of variables involved. To the best of our knowledge, our method, which we call Event-based Photometric Bundle Adjustment (EPBA), is the first event-only photometric bundle adjustment method that works on the brightness map directly and exploits the space-time characteristics of event data, without having to convert events into image-like representations. Comprehensive experiments on both synthetic and real-world datasets demonstrate EPBA's effectiveness in decreasing the photometric error (by up to 90%), yielding results of unparalleled quality. The refined maps reveal details that were hidden using prior state-of-the-art rotation-only estimation methods. The experiments on modern high-resolution event cameras show the applicability of EPBA to panoramic imaging in various scenarios (without map initialization, at multiple resolutions, and in combination with other methods, such as IMU dead reckoning or previous event-based rotation estimation methods).

Reference:

S. Guo and G. Gallego,

Event-based Photometric Bundle Adjustment

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

doi, PDF, Project page and ECRot Dataset

Event-based Mosaicing Bundle Adjustment (EMBA)

We tackle the problem of mosaicing bundle adjustment (i.e., simultaneous refinement of camera orientations and scene map) for a purely rotating event camera. We formulate the problem as a regularized non-linear least squares optimization. The objective function is defined using the linearized event generation model in the camera orientations and the panoramic gradient map of the scene. We show that this BA optimization has an exploitable block-diagonal sparsity structure, so that the problem can be solved efficiently. To the best of our knowledge, this is the first work to leverage such sparsity to speed up the optimization in the context of event-based cameras, without the need to convert events into image-like representations. We evaluate our method, called EMBA, on both synthetic and real-world datasets to show its effectiveness (50% photometric error decrease), yielding results of unprecedented quality. In addition, we demonstrate EMBA using high spatial resolution event cameras, yielding delicate panoramas in the wild, even without an initial map.
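In this rotation-only setting, every event can be related to a point of the panoramic map through the camera orientation. A minimal sketch of that forward mapping is shown below, assuming a pinhole intrinsic matrix K, an orientation R that maps camera coordinates to world coordinates, and an equirectangular panorama (our assumption; the paper's map parametrization may differ). The bundle adjustment itself, i.e., the sparse least-squares refinement of orientations and map, is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def event_to_panorama(u, v, R, K, pano_w, pano_h):
    """Map pixel (u, v) seen under orientation R to equirectangular panorama coordinates."""
    bearing = R @ np.linalg.solve(K, np.array([u, v, 1.0]))  # viewing ray in world frame
    x, y, z = bearing / np.linalg.norm(bearing)
    lon = np.arctan2(x, z)    # azimuth in (-pi, pi], 0 along the world +z axis
    lat = np.arcsin(y)        # elevation in [-pi/2, pi/2]; +y points down, as in images
    return (lon / (2 * np.pi) + 0.5) * pano_w, (lat / np.pi + 0.5) * pano_h
```

With the identity orientation, the principal point lands at the panorama center; after a 90° rotation about the vertical axis it moves a quarter of the panorama width to the side.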

Reference:

S. Guo and G. Gallego,

Event-based Mosaicing Bundle Adjustment

European Conference on Computer Vision (ECCV), 2024.

PDF, Poster, Project page and ECRot Dataset

Motion and Structure from Event-based Normal Flow

Recovering the camera motion and scene geometry from visual data is a fundamental problem in computer vision. Its success in conventional (frame-based) vision is attributed to the maturity of feature extraction, data association and multi-view geometry. The emergence of asynchronous (event-based) cameras calls for new approaches that use raw event data as input to solve this fundamental problem. State-of-the-art solutions typically infer data association implicitly by iteratively reversing the event data generation process. However, the nonlinear nature of these methods limits their applicability in real-time tasks, and the constant-motion assumption leads to unstable results under agile motion. To address this, we reformulate the problem in a way that aligns better with the differential working principle of event cameras. We show that event-based normal flow can be used, via the proposed geometric error term, as an alternative to the full (optical) flow in solving a family of geometric problems that involve instantaneous first-order kinematics and scene geometry. Furthermore, we develop a fast linear solver and a continuous-time nonlinear solver on top of the proposed geometric error term. Experiments on both synthetic and real data show the superiority of our linear solver in terms of accuracy and efficiency, and its practicality as an initializer for previous nonlinear solvers. Moreover, our continuous-time nonlinear solver exhibits exceptional capabilities in accommodating sudden variations in motion since it does not rely on the constant-motion assumption.
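Normal flow is the component of the optical flow along the local image-gradient (edge-normal) direction, which is all that a local edge observation constrains (the aperture problem). Each measurement gives one linear constraint n · u = m on the full flow u, so several normal flows with different directions that share the same flow determine it by least squares. The sketch below is only this textbook illustration with hypothetical names; the paper's geometric error term relates normal flow directly to camera motion and scene geometry, which is not reproduced here.

```python
import numpy as np

def normal_flow(flow, grad):
    """Signed normal-flow magnitudes and unit normals for gradients of shape (N, 2)."""
    n = grad / np.linalg.norm(grad, axis=-1, keepdims=True)  # unit edge normals
    magnitude = np.sum(flow * n, axis=-1)                    # projection of flow onto n
    return magnitude, n

def flow_from_normal_flows(magnitudes, normals):
    """Least-squares full flow u from constraints n_i . u = m_i (needs >= 2 distinct normals)."""
    u, *_ = np.linalg.lstsq(normals, magnitudes, rcond=None)
    return u
```

A single normal flow leaves the flow component along the edge undetermined; two or more non-parallel normals make the linear system well posed.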

Reference:

Z. Ren, B. Liao, D. Kong, J. Li, P. Liu, L. Kneip, G. Gallego, Y. Zhou,

Motion and Structure from Event-based Normal Flow

European Conference on Computer Vision (ECCV), 2024.

PDF, Poster, Project page

ES-PTAM: Event-based Stereo Parallel Tracking and Mapping

Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by their sensors' capability. Event cameras are novel visual sensors that offer advantages to overcome the limitations of standard cameras, enabling robots to expand their operating range to challenging scenarios, such as high-speed motion and high dynamic range illumination. We propose a novel event-based stereo VO system by combining two ideas: a correspondence-free mapping module that estimates depth by maximizing ray density fusion and a tracking module that estimates camera poses by maximizing edge-map alignment. We evaluate the system comprehensively on five real-world datasets, spanning a variety of camera types (manufacturers and spatial resolutions) and scenarios (driving, flying drone, hand-held, egocentric, etc.). The quantitative and qualitative results demonstrate that our method outperforms the state of the art by a margin in the majority of the test sequences, e.g., reducing trajectory error by 45% on the RPG dataset, 61% on the DSEC dataset, and 21% on the TUM-VIE dataset. To benefit the community and foster research on event-based perception systems, we release the source code and results.

Reference:

S. Ghosh, V. Cavinato, G. Gallego,

ES-PTAM: Event-based Stereo Parallel Tracking and Mapping

European Conf. on Computer Vision Workshops (ECCVW), 2024. Oral Spotlight.

PDF, Poster, Project Page and Code

MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice

Enabled by large annotated datasets, tracking and segmentation of objects in videos has made remarkable progress in recent years. Despite these advancements, algorithms still struggle under degraded conditions and during fast movements. Event cameras are novel sensors with high temporal resolution and high dynamic range that offer promising advantages to address these challenges. However, annotated data for developing learning-based mask-level tracking algorithms with events is not available. To this end, we introduce: (i) a new task termed space-time instance segmentation, similar to video instance segmentation, whose goal is to segment instances throughout the entire duration of the sensor input (here, the input consists of quasi-continuous events and, optionally, aligned frames); and (ii) MouseSIS, a dataset for the new task, containing aligned grayscale frames and events. It includes annotated ground-truth labels (pixel-level instance segmentation masks) of a group of up to seven freely moving and interacting mice. We also provide two reference methods, which show that leveraging event data can consistently improve tracking performance, especially when used in combination with conventional cameras. The results highlight the potential of event-aided tracking in difficult scenarios. We hope our dataset opens the field of event-based video instance segmentation and enables the development of robust tracking algorithms for challenging conditions.

Reference:

F. Hamann, H. Li, P. Mieske, L. Lewejohann, G. Gallego

MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice

European Conf. on Computer Vision Workshops (ECCVW), 2024.

PDF, Poster, Project Page and Code

Hamann et al., SIS-Challenge: Event-based Spatio-temporal Instance Segmentation Challenge at the CVPR 2025 Event-based Vision Workshop, IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2025

PDF, Webpage

Low-power, Continuous Remote Behavioral Localization with Event Cameras

Researchers in natural science need reliable methods for quantifying animal behavior. Recently, numerous computer vision methods have emerged to automate the process. However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage. Event cameras offer unique advantages for battery-dependent remote monitoring due to their low power consumption and high dynamic range capabilities. We use this novel sensor to quantify a behavior in Chinstrap penguins called ecstatic display. We formulate the problem as a temporal action detection task, determining the start and end times of the behavior. For this purpose, we recorded a colony of breeding penguins in Antarctica over several weeks and labeled event data on 16 nests. The developed method consists of a generator of candidate time intervals (proposals) and a classifier of the actions within them. The experiments show that the event cameras' natural response to motion is effective for continuous behavior monitoring and detection, reaching a mean average precision (mAP) of 58% (which increases to 63% in good weather conditions). The results also demonstrate the robustness against various lighting conditions contained in the challenging dataset. The low power consumption of the event camera allows recording three times longer than with a conventional camera.
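The evaluation above reports mAP for temporal action detection. A minimal sketch of how such a score is commonly computed, assuming greedy matching of scored proposals `(start, end, score)` to ground-truth intervals at a temporal IoU threshold of 0.5 and non-interpolated average precision; the thresholds and interpolation used in the paper may differ, and all function names are hypothetical:

```python
def temporal_iou(a, b):
    """Temporal IoU of two intervals (start, end), e.g. in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_proposals(proposals, ground_truth, iou_thr=0.5):
    """Greedily match scored proposals (start, end, score) to ground-truth intervals.

    Each ground-truth interval is matched at most once. Returns true-positive
    flags in descending score order."""
    used = set()
    flags = []
    for start, end, _score in sorted(proposals, key=lambda p: p[2], reverse=True):
        best_idx, best_iou = None, iou_thr
        for i, gt in enumerate(ground_truth):
            if i in used:
                continue
            iou = temporal_iou((start, end), gt)
            if iou >= best_iou:
                best_idx, best_iou = i, iou
        if best_idx is None:
            flags.append(False)
        else:
            used.add(best_idx)
            flags.append(True)
    return flags


def average_precision(flags, num_gt):
    """Non-interpolated AP: precision accumulated at each true positive, over num_gt."""
    tp, ap = 0, 0.0
    for k, is_tp in enumerate(flags, start=1):
        if is_tp:
            tp += 1
            ap += tp / k
    return ap / num_gt if num_gt else 0.0
```

For example, proposals `[(0, 2, 0.9), (5, 6, 0.8), (10, 12, 0.7)]` against ground truth `[(0, 2), (10, 11)]` give flags `[True, False, True]` and an AP of 5/6. Averaging AP over IoU thresholds (and over classes, if there are several) yields mAP.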

Reference:

F. Hamann, S. Ghosh, I. Juárez-Martínez, T. Hart, A. Kacelnik, G. Gallego

(Event Penguins) Low-power, Continuous Remote Behavioral Localization with Event Cameras

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

doi, PDF, Poster, Project page
Press release at TU Berlin "Penguins in a state of Ecstasy" (ENG) / (DEU)
SCIoI news (04.2024)
SCIoI Excellence Cluster News (03.2022)

F. Hamann, S. Ghosh, I. Juárez-Martínez, T. Hart, A. Kacelnik, G. Gallego

Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras

Advanced Intelligent Systems, 2024.

doi, PDF

CMax-SLAM: Event-based Rotational-Motion Bundle Adjustment and SLAM System using Contrast Maximization

This paper considers the problem of rotational motion estimation using event cameras. Several event-based rotation estimation methods have been developed in the past decade, but their performance has not been evaluated and compared under unified criteria yet. In addition, these prior works do not consider a global refinement step. To this end, we conduct a systematic study of this problem with two objectives in mind: summarizing previous works and presenting our own solution. First, we compare prior works both theoretically and experimentally. Second, we propose the first event-based rotation-only bundle adjustment (BA) approach. We formulate it leveraging the state-of-the-art Contrast Maximization (CMax) framework, which is principled and avoids the need to convert events into frames. Third, we use the proposed BA to build CMax-SLAM, the first event-based rotation-only SLAM system comprising a front-end and a back-end. Our BA is able to run both offline (trajectory smoothing) and online (CMax-SLAM back-end). To demonstrate the performance and versatility of our method, we present comprehensive experiments on synthetic and real-world datasets, including indoor, outdoor and space scenarios. We discuss the pitfalls of real-world evaluation and propose a proxy for the reprojection error as the figure of merit to evaluate event-based rotation BA methods. We release the source code and novel data sequences to benefit the community. We hope this work leads to a better understanding and fosters further research on event-based ego-motion estimation.
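Rotation-only CMax methods warp each event according to a rotation hypothesis before measuring how well the warped events align. A minimal sketch of that warp, assuming a calibrated pinhole camera with intrinsics `K_cam`, a constant angular velocity `omega` over the event packet, and one particular sign convention (bearings rotated by the axis-angle vector `omega * (t_ref - t)`); the paper's front-end and back-end formulations are not reproduced, and the function names are hypothetical:

```python
import numpy as np


def rodrigues(rotvec):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(rotvec, dtype=float) / theta
    k_hat = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * k_hat + (1.0 - np.cos(theta)) * (k_hat @ k_hat)


def warp_events_rotation(xy, t, omega, K_cam, t_ref):
    """Warp pixel coordinates xy (N, 2), triggered at times t (N,), to time t_ref.

    Assumes a constant angular velocity omega (rad/s, 3-vector) and pinhole intrinsics K_cam."""
    # Back-project pixels to bearing vectors on the z = 1 plane
    bearings = np.column_stack([xy, np.ones(len(xy))]) @ np.linalg.inv(K_cam).T
    warped = np.empty_like(bearings)
    for i, (b, ti) in enumerate(zip(bearings, t)):
        warped[i] = rodrigues(np.asarray(omega, dtype=float) * (t_ref - ti)) @ b
    # Re-project the rotated bearings onto the image plane
    proj = warped @ K_cam.T
    return proj[:, :2] / proj[:, 2:3]
```

A contrast objective evaluated on the warped coordinates then scores each angular-velocity hypothesis.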

Reference:

S. Guo and G. Gallego

CMax-SLAM: Event-based Rotational-Motion Bundle Adjustment and SLAM System using Contrast Maximization

IEEE Transactions on Robotics (TRO), 2024.

doi, PDF, Poster, Project page with Code, and ECRot Dataset

Event-based Background-Oriented Schlieren

Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require cameras with both high spatial and high temporal resolution, which imposes limitations in the form of bright illumination and expensive computation. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This paper presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event camera data reveal that the proposed method enables event cameras to obtain results on par with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens up a new range of event camera applications, and we publish the source code as well as the first schlieren dataset with high-quality frame and event data.

Reference:

S. Shiba, F. Hamann, Y. Aoki, G. Gallego

Event-based Background-Oriented Schlieren

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.

doi, PDF, Poster, Project page and Data
Press release at TU Berlin "Unveiling the Invisible: Schlieren Photography With Event Cameras" (ENG) / (DEU)
SCIoI Excellence Cluster News

Formulating Event-based Image Reconstruction as a Linear Inverse Problem with Deep Regularization using Optical Flow

Event cameras are novel bio-inspired sensors that measure per-pixel brightness differences asynchronously. Recovering brightness from events is appealing since the reconstructed images inherit the high dynamic range (HDR) and high-speed properties of events; hence they can be used in many robotic vision applications and to generate slow-motion HDR videos. However, state-of-the-art methods tackle this problem by training an event-to-image Recurrent Neural Network (RNN), which lacks explainability and is difficult to tune. In this work we show, for the first time, how tackling the combined problem of motion and brightness estimation leads us to formulate event-based image reconstruction as a linear inverse problem that can be solved without training an image reconstruction RNN. Instead, classical and learning-based regularizers are used to solve the problem and remove artifacts from the reconstructed images. The experiments show that the proposed approach generates images with visual quality on par with state-of-the-art methods despite only using data from a short time interval. State-of-the-art results are achieved using an image denoising Convolutional Neural Network (CNN) as the regularization function. The proposed regularized formulation and solvers have a unifying character because they can also be applied to reconstruct brightness from the second derivative. Additionally, the formulation is attractive because it can be naturally combined with super-resolution, motion segmentation and color demosaicing. Code is available at https://github.com/tub-rip/event_based_image_rec_inverse_problem
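The linear structure can be made concrete with the linearized event generation model, ΔL ≈ -(∇L · v) Δt, which is linear in the log-brightness image L once the flow v is fixed. A minimal sketch, assuming a known constant flow, a brightness-increment image B already formed from the events, and a simple quadratic gradient penalty in place of the classical and CNN regularizers used in the paper; dense matrices restrict it to tiny images, and all names are hypothetical:

```python
import numpy as np


def forward_diff_1d(n):
    """Forward-difference matrix with a zero last row (Neumann boundary)."""
    D = np.zeros((n, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D


def gradient_operators(h, w):
    """x- and y-derivative matrices for an h-by-w image flattened in row-major order."""
    Dx = np.kron(np.eye(h), forward_diff_1d(w))
    Dy = np.kron(forward_diff_1d(h), np.eye(w))
    return Dx, Dy


def reconstruct_log_image(B, flow, dt, lam=1e-3):
    """Solve min_L ||A L - B||^2 + lam * ||grad L||^2, where A L = -dt * (grad L . flow).

    B: brightness-increment image (e.g. contrast threshold times the summed event
    polarities per pixel over dt); flow = (vx, vy) in px/s, assumed constant."""
    h, w = B.shape
    Dx, Dy = gradient_operators(h, w)
    A = -dt * (flow[0] * Dx + flow[1] * Dy)  # linearized event generation model
    M = np.vstack([A, np.sqrt(lam) * Dx, np.sqrt(lam) * Dy])
    rhs = np.concatenate([B.ravel(), np.zeros(2 * h * w)])
    L, *_ = np.linalg.lstsq(M, rhs, rcond=None)  # minimum-norm least-squares solution
    return L.reshape(h, w)
```

A constant offset of L is invisible to both the data term and the regularizer, so `lstsq` returns the minimum-norm representative; in practice that offset has to be fixed by other means.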

Reference:

Z. Zhang, A. Yezzi, G. Gallego

Formulating Event-based Image Reconstruction as a Linear Inverse Problem with Deep Regularization using Optical Flow

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 7, July 2023.

doi, PDF, Source Code

Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion

In this work we tackle the problem of event-based stereo 3D reconstruction for SLAM. Most event-based stereo methods try to exploit the camera's high temporal resolution and event simultaneity across cameras to establish matches and estimate depth. By contrast, we investigate how to estimate depth without explicit data association by fusing Disparity Space Images (DSIs) originating from efficient monocular methods. We develop fusion theory and apply it to design multi-camera 3D reconstruction algorithms that produce state-of-the-art results, as we confirm by comparing against four baseline methods and testing on a variety of available datasets.
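The fusion idea can be illustrated on ray-density volumes. A minimal sketch, assuming each camera's DSI has already been built by back-projecting its events and resampled into a common reference view, so that all DSIs share the shape (depth planes, H, W); the fusion functions and the argmax depth selection are illustrative choices, not necessarily those of the paper, and the names are hypothetical. The harmonic mean is small whenever any camera assigns few rays to a voxel, so it suppresses voxels supported by only one camera:

```python
import numpy as np


def fuse_dsis(dsis, mode="harmonic", eps=1e-6):
    """Voxel-wise fusion of DSIs stacked as (n_cams, n_depths, H, W) ray counts."""
    d = np.asarray(dsis, dtype=float)
    if mode == "arithmetic":
        return d.mean(axis=0)
    if mode == "geometric":
        return np.exp(np.log(d + eps).mean(axis=0))
    if mode == "harmonic":
        return d.shape[0] / (1.0 / (d + eps)).sum(axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")


def depth_from_dsi(dsi, depths, min_score=1.0):
    """Per-pixel depth at the DSI maximum; pixels with a weak maximum become NaN."""
    idx = dsi.argmax(axis=0)
    depth_map = np.asarray(depths, dtype=float)[idx]
    depth_map[dsi.max(axis=0) < min_score] = np.nan
    return depth_map
```

For a pixel where camera 1 counts rays `[4, 0, 9]` over three depth planes and camera 2 counts `[4, 0, 0]`, the arithmetic mean picks the third plane (seen by one camera only), whereas the harmonic mean picks the first plane, on which both cameras agree.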

References:

S. Ghosh and G. Gallego

MC-EMVS: Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion

Advanced Intelligent Systems (AISY), 4: 2200221, Sep. 2022.

doi, PDF, Poster, Project Page and Code,
Presentation at IEEE MFI workshop 2022 (YouTube), Slides
Presentation at the GRASP Laboratory (UPenn) seminar (YouTube)

S. Ghosh and G. Gallego

Event-based Stereo Depth Estimation from Ego-motion using Ray Density Fusion

European Conf. on Computer Vision Workshops (ECCVW) Ego4D, 2022.

PDF, Project Page and Code

S. Ghosh and G. Gallego

Event-based Stereo Depth for SLAM in Autonomous Driving

Behavior-driven Autonomous Driving in Unstructured Environments (BADUE) Workshop at IROS 2022.

Presentation (YouTube), Project Page and Code

Secrets of Event-Based Optical Flow, Depth and Ego-motion Estimation by Contrast Maximization

Event cameras respond to scene dynamics and offer advantages to estimate motion. Following recent image-based deep-learning achievements, optical flow estimation methods for event cameras have rushed to combine those image-based methods with event data. However, this requires several adaptations (data conversion, loss function, etc.) because events and images have very different properties. We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone. We investigate key elements: how to design the objective function to prevent overfitting, how to warp events to deal better with occlusions, and how to improve convergence with multi-scale raw events. With these key elements, our method ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark. Moreover, our method allows us to expose the issues of the ground truth flow in those benchmarks, and produces remarkable results when it is transferred to unsupervised learning settings. We release the code open source.
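The core of Contrast Maximization fits in a few lines: warp the events with a motion hypothesis, accumulate them into an image of warped events (IWE), and score how sharp that image is. A minimal sketch, assuming a single constant optical flow for the whole event packet, the IWE variance as objective, and a brute-force grid search instead of gradient-based optimization; the objective design, occlusion-aware warping and multi-scale scheme studied in the paper are not included, and the function names are hypothetical:

```python
import numpy as np


def image_of_warped_events(xs, ys, ts, flow, shape, t_ref=0.0):
    """Move each event to t_ref along a constant flow (vx, vy) in px/s and
    accumulate the warped events with bilinear voting."""
    h, w = shape
    xw = xs - (ts - t_ref) * flow[0]
    yw = ys - (ts - t_ref) * flow[1]
    x0 = np.floor(xw).astype(int)
    y0 = np.floor(yw).astype(int)
    fx, fy = xw - x0, yw - y0
    img = np.zeros(shape)
    for dx, dy, wgt in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                        (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        xi, yi = x0 + dx, y0 + dy
        inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        np.add.at(img, (yi[inside], xi[inside]), wgt[inside])
    return img


def contrast(img):
    """Variance of the IWE: better-aligned events give a sharper, higher-variance image."""
    return float(np.var(img))


def cmax_flow_grid_search(xs, ys, ts, shape, candidates):
    """Return the candidate flow whose image of warped events has maximal contrast."""
    return max(candidates,
               key=lambda v: contrast(image_of_warped_events(xs, ys, ts, v, shape)))
```

On a synthetic vertical edge moving at 20 px/s to the right, the candidate `(20, 0)` maps all events back onto a single column and therefore wins the search.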

Reference:

S. Shiba, Y. Klose, Y. Aoki, G. Gallego

Secrets of Event-based Optical Flow, Depth and Ego-motion Estimation by Contrast Maximization
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. doi, PDF, Poster

Secrets of Event-Based Optical Flow
European Conference on Computer Vision (ECCV), Oct. 2022. PDF
Oral Presentation. Acceptance rate: 2.7%

YouTube, Poster, Project page and Code
Presentation at the PRG Seminar Series U. Maryland (Video)
Presentation at the GRASP Laboratory (UPenn) seminar (YouTube)
Presentation at ViEW 2022: "Optical Flow Estimation Using Event Cameras: What Is Motion?" (title translated from Japanese)

A Fast Geometric Regularizer to Mitigate Event Collapse in the Contrast Maximization Framework

Event cameras are emerging vision sensors whose advantages make them suitable for various applications such as autonomous robots. Contrast maximization (CMax), which provides state-of-the-art accuracy on motion estimation using events, may suffer from an overfitting problem called event collapse. Prior works are computationally expensive or cannot alleviate the overfitting, which undermines the benefits of the CMax framework. We propose a novel, computationally efficient regularizer based on geometric principles to mitigate event collapse. The experiments show that the proposed regularizer achieves state-of-the-art accuracy results, while its reduced computational complexity makes it two to four times faster than previous approaches. To the best of our knowledge, our regularizer is the only effective solution for event collapse without trading off runtime. We hope our work opens the door to future applications that unlock the advantages of event cameras.

Reference:

S. Shiba, Y. Aoki, G. Gallego

A Fast Geometric Regularizer to Mitigate Event Collapse in the Contrast Maximization Framework

Advanced Intelligent Systems (AISY), 5: 2200251, Jan. 2023.

doi, PDF, Project page

Event Collapse in Contrast Maximization Frameworks

Contrast maximization (CMax) is a framework that provides state-of-the-art results on several event-based computer vision tasks, such as ego-motion or optical flow estimation. However, it may suffer from a problem called event collapse, which is an undesired solution where events are warped into too few pixels. As prior works have largely ignored the issue or proposed workarounds, it is imperative to analyze this phenomenon in detail. Our work demonstrates event collapse in its simplest form and proposes collapse metrics by using first principles of space-time deformation based on differential geometry and physics. We experimentally show on publicly available datasets that the proposed metrics mitigate event collapse and do not harm well-posed warps. To the best of our knowledge, regularizers based on the proposed metrics are the only effective solution against event collapse in the experimental settings considered, compared with other methods. We hope that this work inspires further research to tackle more complex warp models.
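Event collapse can be detected from how the warp deforms the image plane. A minimal sketch in that spirit, assuming a dense flow field v(x) and the warp x' = x - dt v(x): the Jacobian determinant of the warp measures local area change, and values near zero mean that a region is being squeezed into very few pixels. It illustrates the geometric idea only and is not claimed to match the metrics defined in the paper; the function names are hypothetical:

```python
import numpy as np


def warp_area_ratio(vx, vy, dt):
    """Jacobian determinant of the warp x' = x - dt * v(x) on a pixel grid.

    Values near 0 signal collapse (area shrinking to nothing); 1 means no deformation."""
    dvx_dy, dvx_dx = np.gradient(vx)  # np.gradient returns (d/d rows, d/d cols)
    dvy_dy, dvy_dx = np.gradient(vy)
    j11 = 1.0 - dt * dvx_dx
    j12 = -dt * dvx_dy
    j21 = -dt * dvy_dx
    j22 = 1.0 - dt * dvy_dy
    return j11 * j22 - j12 * j21


def divergence(vx, vy):
    """Divergence of the flow field; under this warp convention, positive
    divergence contracts the warped events."""
    return np.gradient(vx, axis=1) + np.gradient(vy, axis=0)
```

For the expanding field v = (x, 0) and dt = 1, every event in a row is mapped onto column 0: the determinant is 0 everywhere while the divergence is 1. A uniform field gives a determinant of 1, i.e., a well-posed warp that a penalty on these quantities leaves untouched.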

Reference:

S. Shiba, Y. Aoki, G. Gallego

Event Collapse in Contrast Maximization Frameworks

Sensors 2022, 22(14):5190.

doi, PDF, Project page

Fast Event-based Optical Flow Estimation by Triplet Matching

Event cameras are novel bio-inspired sensors that offer advantages over traditional cameras (low latency, high dynamic range, low power, etc.). Optical flow estimation methods that work on packets of events trade off speed for accuracy, while event-by-event (incremental) methods have strong assumptions and have not been tested on common benchmarks that quantify progress in the field. Towards applications on resource-constrained devices, it is important to develop optical flow algorithms that are fast, light-weight and accurate. This work leverages insights from neuroscience, and proposes a novel optical flow estimation scheme based on triplet matching. The experiments on publicly available benchmarks demonstrate its capability to handle complex scenes with results comparable to those of prior packet-based algorithms. In addition, the proposed method achieves the fastest execution time (> 10 kHz) on standard CPUs, as it requires only three events per estimate. We hope that our research opens the door to real-time, incremental motion estimation methods and applications in real-world scenarios.
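The idea that three events suffice for a flow estimate can be illustrated as follows. A simplified, hypothetical sketch, assuming that a triplet (e_a, e_b, e_c) is accepted when e_b lies midway between e_a and e_c in both space and time, as expected for an edge moving at constant velocity; the neighbourhood radius `r_max` and tolerances `xy_tol`, `t_tol` are made-up parameters, and the paper's actual matching criteria are not reproduced:

```python
import numpy as np


def triplet_flow(new_event, past_events, r_max=2.0, xy_tol=0.5, t_tol=1e-3):
    """Estimate flow (px/s) for new_event = (x, y, t) from earlier events (N, 3).

    Returns the mean velocity over all accepted triplets, or None if none is found."""
    past = np.asarray(past_events, dtype=float)
    xc, yc, tc = new_event
    velocities = []
    for xb, yb, tb in past:
        if tb >= tc or np.hypot(xc - xb, yc - yb) > r_max:
            continue
        # Where and when e_a must be for a constant velocity through e_b and e_c
        xa, ya, ta = 2 * xb - xc, 2 * yb - yc, 2 * tb - tc
        hits = np.flatnonzero((np.abs(past[:, 0] - xa) <= xy_tol)
                              & (np.abs(past[:, 1] - ya) <= xy_tol)
                              & (np.abs(past[:, 2] - ta) <= t_tol)
                              & (past[:, 2] < tb))
        if hits.size:
            xa, ya, ta = past[hits[0]]
            velocities.append(((xc - xa) / (tc - ta), (yc - ya) / (tc - ta)))
    if not velocities:
        return None
    return tuple(np.mean(velocities, axis=0))
```

For events at (10, 5, 0.00 s), (11, 5, 0.01 s) and a new event at (12, 5, 0.02 s), the sketch returns a flow of about (100, 0) px/s. Each estimate touches only a small buffer of recent events, which is what makes event-by-event processing cheap.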

Reference:

S. Shiba, Y. Aoki, G. Gallego

Fast Event-based Optical Flow Estimation by Triplet Matching

IEEE Signal Processing Letters (SPL), vol. 29, pp. 2712-2716, 2022.

doi, PDF

Stereo Co-capture System for Recording and Tracking Fish with Frame- and Event Cameras

This work introduces a co-capture system for multi-animal visual data acquisition using conventional cameras and event cameras. Event cameras offer multiple advantages over frame-based cameras, such as a high temporal resolution and temporal redundancy suppression, which enable us to efficiently capture the fast and erratic movements of fish. We furthermore present an event-based multi-animal tracking algorithm, which proves the feasibility of the approach and sets the baseline for further exploration of combining the advantages of event cameras and conventional cameras for multi-animal tracking.

Reference:

F. Hamann, G. Gallego

Stereo Co-capture System for Recording and Tracking Fish with Frame- and Event Cameras

26th International Conference on Pattern Recognition (ICPR), Visual observation and analysis of Vertebrate And Insect Behavior (VAIB) Workshop, Montreal, Canada, 2022.

PDF

EDS: Event-aided Direct Sparse Odometry

We introduce EDS, a direct monocular visual odometry method using events and frames. Our algorithm leverages the event generation model to track the camera motion in the blind time between frames. The method follows a direct probabilistic approach based on observed brightness increments. Per-pixel brightness increments are predicted using a sparse number of selected 3D points and are compared to the events via the brightness increment error to estimate camera motion. The method recovers a semi-dense 3D map using photometric bundle adjustment. EDS is the first method to perform 6-DOF VO using events and frames with a direct approach. By design it overcomes the problem of changing appearance in indirect methods. We also show that, for a target error performance, EDS can work at lower frame rates than state-of-the-art frame-based VO solutions. This opens the door to low-power motion-tracking applications where frames are sparingly triggered "on demand" and our method tracks the motion in between. We release code and datasets to the public.
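EDS compares events with brightness increments predicted from the linearized event generation model, ΔL ≈ -(∇L · v) Δt. A minimal sketch of that comparison, assuming the optical flow v is given directly (in EDS it is induced by the camera motion and the depth of the selected 3D points) and normalizing both increment images so that the unknown contrast threshold C cancels; the normalization, the value `C=0.2`, and all names are choices made for this illustration:

```python
import numpy as np


def observed_increment(xs, ys, polarities, shape, C=0.2):
    """Brightness increment from events: contrast threshold times summed polarities."""
    img = np.zeros(shape)
    np.add.at(img, (ys, xs), C * np.asarray(polarities, dtype=float))
    return img


def predicted_increment(log_img, flow, dt):
    """Linearized event generation model: dL = -(grad L . v) * dt."""
    gy, gx = np.gradient(log_img)
    return -(gx * flow[0] + gy * flow[1]) * dt


def increment_error(obs, pred, eps=1e-12):
    """Squared distance between unit-norm increment images (independent of C)."""
    o = obs / (np.linalg.norm(obs) + eps)
    p = pred / (np.linalg.norm(pred) + eps)
    return float(np.sum((o - p) ** 2))
```

On a horizontal log-intensity ramp that produced negative events everywhere, the flow hypothesis (5, 0) px/s yields an error of about 0, while the opposite hypothesis (-5, 0) yields the maximum value of 4. Minimizing such an error over the motion parameters is what drives the tracking.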

Reference:

J. Hidalgo-Carrió, G. Gallego, D. Scaramuzza

Event-aided Direct Sparse Odometry

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 5771-5780.
Oral Presentation. Acceptance rate: 4.2%

PDF, Poster, YouTube, CVPR Video, Project page and Dataset, Code

Stabilizing Event Data on Flapping-wing Robots for Simpler Perception

We propose a stabilization method for event cameras mounted onboard flapping-wing robots. Unlike frame-based cameras, event cameras do not suffer from the motion blur that typically occurs due to strong changes in the camera orientation. The method intends to offer an alternative to heavy gimbals mounted on ornithopters. It has been tested on event data acquired by a large-scale ornithopter (1.5 m wingspan).

Reference:

J.P. Rodríguez-Gómez, G. Gallego, J. R. Martínez-de Dios, A. Ollero

Stabilizing Event Data on Flapping-wing Robots for Simpler Perception

Workshop on Challenges of Flapping-wing aerial robots of the IEEE International Conference on Robotics and Automation (ICRA), 2022.

PDF, Slides

ESL: Event-based Structured Light

Event cameras are bio-inspired sensors providing significant advantages over standard cameras such as low latency, high temporal resolution, and high dynamic range. We propose a novel structured-light system using an event camera to tackle the problem of accurate and high-speed depth sensing. Our setup consists of an event camera and a laser-point projector that uniformly illuminates the scene in a raster scanning pattern during 16 ms. Previous methods match events independently of each other, and so they deliver noisy depth estimates at high scanning speeds in the presence of signal latency and jitter. In contrast, we optimize an energy function designed to exploit event correlations, called spatio-temporal consistency. The resulting method is robust to event jitter and therefore performs better at higher scanning speeds. Experiments demonstrate that our method can deal with high-speed motion and outperform state-of-the-art 3D reconstruction methods based on event cameras, reducing the RMSE by 83% on average, for the same acquisition time.
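The abstract contrasts ESL with methods that match events independently of each other. To make that baseline concrete, a minimal sketch of per-event matching for a rectified camera-projector pair, assuming the projector lights its pixels row by row at constant speed over the scan period and that camera and projector share the same focal length in rectified coordinates; the projector resolution defaults are hypothetical, and this is the noise-prone independent matching, not ESL's spatio-temporal consistency optimization:

```python
def projector_pixel_from_time(t, scan_period=16e-3, proj_w=1920, proj_h=1080):
    """Map a timestamp (s, relative to scan start) to the projector pixel being lit,
    assuming a row-by-row raster scan at constant speed."""
    n = proj_w * proj_h
    idx = min(int(t / scan_period * n), n - 1)
    return idx % proj_w, idx // proj_w  # (column, row)


def depth_from_disparity(x_cam, x_proj, focal_px, baseline_m):
    """Depth for a rectified camera-projector pair: z = f * b / d."""
    d = x_cam - x_proj  # the sign convention depends on the rig geometry
    if d <= 0:
        return float("nan")
    return focal_px * baseline_m / d
```

With a 1920 x 1080 projector scanned in 16 ms, each projector pixel is lit for well under 10 ns, so timestamp jitter of a few microseconds maps to errors of hundreds of projector columns; this is the kind of sensitivity that motivates exploiting correlations between events instead.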

Reference:

M. Muglikar, G. Gallego, D. Scaramuzza

ESL: Event-based Structured Light

IEEE International Conference on 3D Vision (3DV), 2021, pp. 1165-1174.

PDF, Poster, YouTube, Project page, Dataset and Code

Event-based Motion Segmentation with Spatio-Temporal Graph Cuts

Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages to overcome such limitations. They report pixel-wise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization one involving the fitting of multiple motion models. We jointly solve two subproblems, namely event cluster assignment (labeling) and motion model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and number of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open source license to foster research in the emerging topic of event-based motion segmentation.

Reference:

Y. Zhou, G. Gallego, X. Lu, S. Liu, S. Shen

Event-based Motion Segmentation with Spatio-Temporal Graph Cuts

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), vol. 34, no. 8, pp. 4868-4880, Aug. 2023.

doi, PDF, Source Code, Project page and Data

The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data

Event cameras, inspired by biological vision systems, provide a natural and data efficient representation of visual information. Visual information is acquired in the form of events that are triggered by local brightness changes. However, because most brightness changes are triggered by relative motion of the camera and the scene, the events recorded at a single sensor location seldom correspond to the same world point. To extract meaningful information from event cameras, it is helpful to register events that were triggered by the same underlying world point. In this work we propose a new model of event data that captures its natural spatio-temporal structure. We start by developing a model for aligned event data. That is, we develop a model for the data as though it has been perfectly registered already. In particular, we model the aligned data as a spatio-temporal Poisson point process. Based on this model, we develop a maximum likelihood approach to registering events that are not yet aligned. That is, we find transformations of the observed events that make them as likely as possible under our model. In particular, we extract the camera rotation that leads to the best event alignment. We show new state-of-the-art accuracy for rotational velocity estimation on the DAVIS 240C dataset. In addition, our method is also faster and has lower computational complexity than several competing methods.

Reference:

C. Gu, E. Learned-Miller, D. Sheldon, G. Gallego, P. Bideau

The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data

IEEE International Conference on Computer Vision (ICCV), 2021, pp. 13475-13484.

PDF, YouTube, Project page, Code

Event-based Stereo Visual Odometry

Event-based cameras are bio-inspired vision sensors whose pixels work independently from each other and respond asynchronously to brightness changes, with microsecond resolution. Their advantages make it possible to tackle challenging scenarios in robotics, such as high-speed and high dynamic range scenes. We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig. Our system follows a parallel tracking-and-mapping approach, where novel solutions to each subproblem (3D reconstruction and camera pose estimation) are developed with two objectives in mind: being principled and efficient, for real-time operation with commodity hardware. To this end, we seek to maximize the spatio-temporal consistency of stereo event-based data while using a simple and efficient representation. Specifically, the mapping module builds a semi-dense 3D map of the scene by fusing depth estimates from multiple local viewpoints (obtained by spatio-temporal consistency) in a probabilistic fashion. The tracking module recovers the pose of the stereo rig by solving a registration problem that naturally arises due to the chosen map and event data representation. Experiments on publicly available datasets and on our own recordings demonstrate the versatility of the proposed method in natural scenes with general 6-DoF motion. The system successfully leverages the advantages of event-based cameras to perform visual odometry in challenging illumination conditions, such as low-light and high dynamic range, while running in real-time on a standard CPU. We release the software and dataset under an open source license to foster research in the emerging topic of event-based SLAM.
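The abstract does not name the "simple and efficient representation"; the ESVO paper builds on time surfaces, i.e., per-pixel maps of the most recent event timestamp passed through an exponential decay. A minimal, generic sketch, assuming events are given as integer pixel arrays with timestamps not later than `t_now`, polarity ignored, and a hypothetical decay constant `tau`:

```python
import numpy as np


def time_surface(xs, ys, ts, shape, t_now, tau=0.03):
    """Exponentially decayed map of the most recent event timestamp per pixel.

    Recently active pixels are close to 1; pixels that never fired are 0."""
    last = np.full(shape, -np.inf)
    # Keep the latest timestamp when a pixel fires more than once
    np.maximum.at(last, (ys, xs), ts)
    return np.exp(-(t_now - last) / tau)
```

Because a moving edge leaves a smooth temporal gradient behind it, such a map behaves like an image that can be compared across the two cameras of the stereo rig, which is what makes it convenient for both mapping and tracking.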

References:

Y. Zhou, G. Gallego, S. Shen

Event-based Stereo Visual Odometry

IEEE Transactions on Robotics (TRO), vol. 37, no. 5, pp. 1433-1450, Oct. 2021.

doi, PDF, YouTube, Source Code, Project page and Datasets,
Results on DSEC dataset, and Tutorial

Event-based Vision: A Survey

Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
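The working principle described above, an event whenever the log brightness at a pixel changes by a contrast threshold, can be simulated for a single pixel. A minimal, idealized sketch, assuming a noise-free pixel with the same threshold `C` for both polarities, no refractory period, and log intensity linearly interpolated between samples; real sensors add noise, latency and per-pixel threshold mismatch, and the function name is hypothetical:

```python
import numpy as np


def simulate_pixel_events(times, intensity, C=0.2, eps=1e-6):
    """Emit (timestamp, polarity) events whenever the log intensity has moved by C
    from its level at the last event. Crossing times are linearly interpolated."""
    log_i = np.log(np.asarray(intensity, dtype=float) + eps)
    ref = log_i[0]
    events = []
    for k in range(1, len(times)):
        l0, l1 = log_i[k - 1], log_i[k]
        while abs(l1 - ref) >= C:
            pol = 1 if l1 > ref else -1
            ref += pol * C
            # Fraction of the sampling interval at which the new level is crossed
            frac = (ref - l0) / (l1 - l0)
            events.append((times[k - 1] + frac * (times[k] - times[k - 1]), pol))
    return events
```

A pixel whose brightness grows from 1 to e^0.5 within one second, with C = 0.2, emits two positive events at about 0.4 s and 0.8 s; the remaining change of 0.1 stays below threshold until more brightness change accumulates.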

Reference:

G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. Davison, J. Conradt, K. Daniilidis, D. Scaramuzza

Event-based Vision: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 44, no. 1, pp. 154-180, 1 Jan. 2022.

doi, PDF

Focus Is All You Need: Loss Functions for Event-based Vision

Event cameras are novel vision sensors that output pixel-level brightness changes ("events") instead of traditional video frames. These asynchronous sensors offer several advantages over traditional cameras, such as high temporal resolution, very high dynamic range, and no motion blur. To unlock the potential of such sensors, motion compensation methods have been recently proposed. We present a collection and taxonomy of twenty-two objective functions to analyze event alignment in motion compensation approaches. We call them focus loss functions since they have strong connections with functions used in traditional shape-from-focus applications. The proposed loss functions allow bringing mature computer vision tools to the realm of event cameras. We compare the accuracy and runtime performance of all loss functions on a publicly available dataset, and conclude that the variance, the gradient and the Laplacian magnitudes are among the best loss functions. The applicability of the loss functions is shown on multiple tasks: rotational motion, depth and optical flow estimation. The proposed focus loss functions help unlock the outstanding properties of event cameras.
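Three of the focus losses singled out above can be written directly on an image of warped events (IWE). A minimal sketch, assuming the IWE is a 2D NumPy array, central differences for the gradient, and a 5-point Laplacian with edge-replicated borders; the paper's exact definitions (normalization, squared versus plain magnitudes) may differ, and the function names are hypothetical:

```python
import numpy as np


def focus_variance(iwe):
    """Variance of the image of warped events (IWE)."""
    return float(np.var(iwe))


def focus_gradient_magnitude(iwe):
    """Mean squared gradient magnitude of the IWE (central differences)."""
    gy, gx = np.gradient(iwe)
    return float(np.mean(gx ** 2 + gy ** 2))


def focus_laplacian_magnitude(iwe):
    """Mean squared 5-point Laplacian of the IWE, with edge-replicated borders."""
    p = np.pad(iwe, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * iwe
    return float(np.mean(lap ** 2))
```

All three rank a sharp IWE (nine events in one pixel) above a blurred one (the same nine events spread over a 3 x 3 block), which is the property a motion-compensation objective needs: the best-aligned warp yields the sharpest image.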

References:

G. Gallego, M. Gehrig, D. Scaramuzza

Focus Is All You Need: Loss Functions for Event-based Vision

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 12272-12281.

doi, PDF, Poster, YouTube

Event-Based Motion Segmentation by Motion Compensation

In contrast to traditional cameras, whose pixels have a common exposure time, event-based cameras are novel bio-inspired sensors whose pixels work independently and asynchronously output intensity changes (called "events"), with microsecond resolution. Since events are caused by the apparent motion of objects, event-based cameras sample visual information based on the scene dynamics and are, therefore, a more natural fit than traditional cameras to acquire motion, especially at high speeds, where traditional cameras suffer from motion blur. However, distinguishing between events caused by different moving objects and by the camera's ego-motion is a challenging task. We present the first per-event segmentation method for splitting a scene into independently moving objects. Our method jointly estimates the event-object associations (i.e., segmentation) and the motion parameters of the objects (or the background) by maximization of an objective function, which builds upon recent results on event-based motion compensation. We provide a thorough evaluation of our method on a public dataset, outperforming the state of the art by as much as 10%. We also show the first quantitative evaluation of a segmentation algorithm for event cameras, yielding around 90% accuracy at 4 pixels relative displacement.

References:

T. Stoffregen, G. Gallego, T. Drummond, L. Kleeman, D. Scaramuzza

Event-Based Motion Segmentation by Motion Compensation

IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7243-7252.

doi, PDF (animations best viewed with Acrobat Reader), YouTube

Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization

Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes, called "events", instead of traditional video images. These asynchronous sensors naturally respond to motion in the scene with very low latency (in the order of microseconds) and have a very high dynamic range. These features, along with a very low power consumption, make event cameras an ideal sensor for fast robot localization and wearable applications, such as AR/VR and gaming. Considering these applications, we present a method to track the 6-DOF pose of an event camera in a known environment, which we contemplate to be described by a photometric 3D map (i.e., intensity plus depth information) built via classic dense 3D reconstruction algorithms. Our approach uses the raw events, directly, without intermediate features, within a maximum-likelihood framework to estimate the camera motion that best explains the events via a generative model. We successfully evaluate the method using both simulated and real data, and show improved results over the state of the art. We release the datasets to the public to foster reproducibility and research in this topic.

References:

S. Bryner, G. Gallego, H. Rebecq, D. Scaramuzza

Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization

IEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 325-331.

doi, PDF, YouTube, Poster, MS Thesis, Dataset, Source Code

Asynchronous, Photometric Feature Tracking using Events and Frames

We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with low latency. Event cameras are novel sensors that output pixel-level brightness changes, called "events". They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide low latency updates. In contrast to previous works, which are based on heuristics, this is the first principled method that uses raw intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are both more accurate (subpixel accuracy) and longer than the state of the art, across a wide variety of scenes.
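
To illustrate the generative model, here is a minimal sketch under stated assumptions. Brightness constancy predicts a log-brightness increment over a patch of roughly -∇L · v Δτ, while the events provide an observed increment (at each pixel, the sum of event polarities times the contrast threshold). In this sketch both patches are normalized to unit length, which cancels the unknown overall scale (contrast threshold, speed, time window), so only the flow direction is compared. The function names, the unit-norm comparison, and the brute-force search over angles are assumptions for illustration; the actual tracker also optimizes the feature's warp parameters.

```python
import math

def predicted_increment(grad, angle):
    """Brightness increment -grad(L) . v for a unit flow direction v at the given angle."""
    vx, vy = math.cos(angle), math.sin(angle)
    return [-(gx * vx + gy * vy) for gx, gy in grad]

def normalize(vals):
    """Scale a patch (flattened list) to unit Euclidean norm; all-zero patches are returned as is."""
    n = math.sqrt(sum(v * v for v in vals))
    return [v / n for v in vals] if n > 0 else vals

def best_flow_direction(grad, observed, num_angles=360):
    """Flow angle whose normalized predicted increment best matches the observed one (SSD)."""
    obs = normalize(observed)

    def cost(angle):
        pred = normalize(predicted_increment(grad, angle))
        return sum((o - p) ** 2 for o, p in zip(obs, pred))

    angles = [2 * math.pi * k / num_angles for k in range(num_angles)]
    return min(angles, key=cost)
```

The gradients in the patch must not all be parallel; otherwise the aperture problem leaves the flow direction ambiguous and several angles fit equally well.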

References:

D. Gehrig, H. Rebecq, G. Gallego, D. Scaramuzza

EKLT: Asynchronous, Photometric Feature Tracking using Events and Frames

International Journal of Computer Vision (IJCV), vol. 128, pp. 601-618, 2020.

doi, PDF, Poster, YouTube, Tracking Code, Evaluation Code

D. Gehrig, H. Rebecq, G. Gallego, D. Scaramuzza

Asynchronous, Photometric Feature Tracking using Events and Frames

European Conference on Computer Vision (ECCV), 2018, pp. 766-781.
Oral Presentation. Acceptance rate: 2.4%

doi, PDF, Poster, YouTube, Oral presentation, Tracking Code, Evaluation Code

Semi-Dense 3D Reconstruction with a Stereo Event Camera

This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or on prior knowledge about the scene. Experiments demonstrate that our method can deal with both texture-rich and sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations.
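
As a rough illustration of spatio-temporal consistency across two views, the sketch below builds exponentially decaying time surfaces (each pixel stores exp(-(t - t_last)/tau) of its most recent event) for one rectified image row per camera, then picks the disparity whose patches agree best. This is plain block matching over time surfaces, swapped in here for clarity; it is not the paper's energy function or its probabilistic depth fusion, and tau, the patch half-width, and the function names are hypothetical.

```python
import math

def time_surface(last_timestamps, t, tau):
    """Decayed time surface for one image row; None marks a pixel that never fired."""
    return [math.exp(-(t - ts) / tau) if ts is not None else 0.0 for ts in last_timestamps]

def match_disparity(left, right, x, half=2, max_disp=10):
    """Disparity d minimizing the SSD between patches left[x+i] and right[x-d+i]."""
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        # Stop once the patch would leave the row.
        if x - d - half < 0 or x + half >= len(left):
            break
        cost = sum((left[x + i] - right[x - d + i]) ** 2 for i in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

For a rectified pair with focal length f (pixels) and baseline b, a disparity d corresponds to depth f·b/d.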

References:

Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li, D. Scaramuzza

Semi-Dense 3D Reconstruction with a Stereo Event Camera

European Conference on Computer Vision (ECCV), 2018, pp. 242-258.

doi, PDF, Poster, YouTube, Dataset

Continuous-Time Visual-Inertial Odometry for Event Cameras

In this paper, we leverage a continuous-time framework to perform trajectory estimation by fusing visual data from a moving event camera with inertial data from an IMU. This framework allows direct integration of the asynchronous events with microsecond accuracy and the inertial measurements at high frequency. The pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines. This formulation significantly reduces the number of variables in trajectory estimation problems. We evaluate our method on real data from several scenes and compare the results against ground truth from a motion-capture system. We show superior performance of the proposed technique compared to non-batch event-based algorithms. We also show that both the map orientation and scale can be recovered accurately by fusing events and inertial data. To the best of our knowledge, this is the first work on visual-inertial fusion with event cameras using a continuous-time framework.
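
A uniform cubic B-spline makes the "smooth curve" concrete: the position at any event timestamp is a weighted blend of four neighboring control points, so a densely sampled trajectory is parameterized by comparatively few control points. The sketch below handles only a translation in R^n; a trajectory in the space of rigid-body motions needs a formulation on SE(3), which is not shown. The knot convention and the name bspline_position are assumptions.

```python
import math

def bspline_position(ctrl, t, t0, dt):
    """Evaluate a uniform cubic B-spline with control points ctrl (tuples) at time t.

    Assumed convention: knot interval i covers [t0 + i*dt, t0 + (i+1)*dt)
    and blends control points ctrl[i-1], ..., ctrl[i+2].
    """
    s = (t - t0) / dt
    i = int(math.floor(s))
    u = s - i  # normalized time inside the interval, in [0, 1)
    if i < 1 or i + 2 >= len(ctrl):
        raise ValueError("t is outside the valid range of the spline")
    # Uniform cubic B-spline basis functions; they sum to 1 for every u.
    w = (
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    )
    pts = ctrl[i - 1 : i + 3]
    return tuple(sum(wk * p[d] for wk, p in zip(w, pts)) for d in range(len(pts[0])))
```

Equally spaced, collinear control points reproduce constant-velocity motion exactly, which is a quick sanity check of the weights.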

References:

E. Mueggler, G. Gallego, H. Rebecq, D. Scaramuzza

Continuous-Time Visual-Inertial Odometry for Event Cameras

IEEE Transactions on Robotics (TRO), vol. 34, no. 6, pp. 1425-1440, Dec. 2018.

doi, PDF

A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation

We present a unifying framework to solve several computer vision problems with event cameras: motion, depth and optical flow estimation. The main idea of our framework is to find the point trajectories on the image plane that are best aligned with the event data by maximizing an objective function: the contrast of an image of warped events. Our method implicitly handles data association between the events, and therefore, does not rely on additional appearance information about the scene. In addition to accurately recovering the motion parameters of the problem, our framework produces motion-corrected edge-like images with high dynamic range that can be used for further scene analysis. The proposed method is simple and, more importantly, to the best of our knowledge, it is the first method that can be successfully applied to such a diverse set of important vision tasks with event cameras.
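
A minimal sketch of contrast maximization for the simplest warp, a constant optical flow over a short time window: warp every event (t, x, y) back to t = 0, accumulate the warped events into an image, and score that image by its variance. A brute-force search over candidate flows stands in for a proper optimizer, pixels are snapped by rounding, and the default sensor size of 240 x 180 pixels is an assumption; all names are hypothetical.

```python
from collections import defaultdict

def iwe(events, flow):
    """Image of warped events: move each (t, x, y) to t = 0 along flow (vx, vy), count per pixel."""
    vx, vy = flow
    img = defaultdict(int)
    for t, x, y in events:
        img[(round(x - vx * t), round(y - vy * t))] += 1
    return img

def contrast(img, num_pixels):
    """Variance of the IWE over num_pixels pixels; pixels without events count as zero.
    (This simplified version also counts events warped outside the sensor.)"""
    mean = sum(img.values()) / num_pixels
    return sum(v * v for v in img.values()) / num_pixels - mean * mean

def max_contrast_flow(events, candidates, num_pixels=240 * 180):
    """Brute-force search for the candidate flow whose IWE has the highest contrast."""
    return max(candidates, key=lambda flow: contrast(iwe(events, flow), num_pixels))

# Usage: a vertical edge at x = 10 moving at 20 px/s along x, observed in rows 0..9.
events = [(k * 0.01, 10 + 20 * k * 0.01, y) for k in range(50) for y in range(10)]
candidates = [(float(vx), 0.0) for vx in range(0, 41, 5)]
best = max_contrast_flow(events, candidates)
```

Under the true flow (20.0, 0.0), every event of the edge lands on the same column, giving the sharpest (highest-variance) image among the candidates; any other flow smears the edge across several columns.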

References:

G. Gallego, H. Rebecq, D. Scaramuzza

A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3867-3876.
Spotlight Presentation.

doi, PDF, Poster, YouTube, Presentation

Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars

Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the best out of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large scale event-camera dataset (~1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g. challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.

References:

A.I. Maqueda, A. Loquercio, G. Gallego, N. Garcia, D. Scaramuzza

Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 5419-5427.

doi, PDF, Poster, YouTube, Code

Event-based, 6-DOF Camera Tracking from Photometric Depth Maps

This paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. Our method is the first work addressing and demonstrating event-based pose tracking in six degrees-of-freedom (DOF) motions in realistic and natural scenes. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.

References:

G. Gallego, J. E.A. Lund, E. Mueggler, H. Rebecq, T. Delbruck, D. Scaramuzza

Event-based, 6-DOF Camera Tracking from Photometric Depth Maps

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 40, no. 10, pp. 2402-2412, Oct. 2018.

doi, PDF, YouTube, Datasets

EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time

We introduce the problem of event-based multi-view stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (1) its ability to respond to scene edges, which naturally provide semi-dense geometric information without any preprocessing operation, and (2) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps, without requiring any explicit data association or intensity estimation. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a CPU. We release the source code.
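
The core of a space-sweep approach can be sketched in a toy 1D setting (assumed here, not the paper's full pinhole geometry): every event is back-projected as a ray, the ray's intersections with a set of depth planes are projected into a reference view, and each hit casts a vote in a disparity space image (DSI). Scene edges show up as depth planes where many rays meet, so each reference pixel takes the depth with the most votes. The published method applies further filtering to the vote maxima, which this sketch reduces to a fixed vote threshold; the names build_dsi, depth_map, cam_x, and min_votes are hypothetical.

```python
from collections import defaultdict

def build_dsi(events, cam_x, f, ref_x, depths):
    """Vote back-projected event rays into a DSI of a reference view.

    Assumed toy geometry: 1D pinhole cameras with focal length f (pixels) looking
    along Z and translating along X; cam_x(t) is the camera's X when event (t, x) fired.
    """
    dsi = defaultdict(int)
    for t, x in events:
        for zi, z in enumerate(depths):
            # The ray through pixel x meets depth plane z at X = cam_x(t) + x * z / f;
            # projecting that point into the reference camera gives x_ref.
            x_ref = x + f * (cam_x(t) - ref_x) / z
            dsi[(round(x_ref), zi)] += 1
    return dsi

def depth_map(dsi, depths, min_votes):
    """Per reference pixel, the depth plane with most votes, if it has at least min_votes."""
    best = {}
    for (px, zi), votes in dsi.items():
        if votes >= min_votes and (px not in best or votes > best[px][1]):
            best[px] = (zi, votes)
    return {px: depths[zi] for px, (zi, _) in best.items()}
```

No event is ever matched to another event: depth emerges only from where rays from different camera positions pile up, which is why the approach needs no explicit data association.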

References:

H. Rebecq, G. Gallego, E. Mueggler, D. Scaramuzza

EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time

International Journal of Computer Vision (IJCV), vol. 126, no. 12, pp. 1394-1414, Dec. 2018.
Special Issue with best (extended) papers from BMVC 2016.

doi, PDF, YouTube, Code

EVO: Event-based, 6-DOF Parallel Tracking and Mapping in Real-Time

We present EVO, an Event-based Visual Odometry algorithm. Our algorithm successfully leverages the outstanding properties of event cameras to track fast camera motions while recovering a semi-dense 3D map of the environment. The implementation runs in real-time on a standard CPU and outputs up to several hundred pose estimates per second. Due to the nature of event cameras, our algorithm is unaffected by motion blur and operates very well in challenging, high dynamic range conditions with strong illumination changes. To achieve this, we combine a novel, event-based tracking approach based on image-to-model alignment with a recent event-based 3D reconstruction algorithm in a parallel fashion. Additionally, we show that the output of our pipeline can be used to reconstruct intensity images from the binary event stream, though our algorithm does not require such intensity information. We believe that this work makes significant progress in SLAM by unlocking the potential of event cameras. This allows us to tackle challenging scenarios that are currently inaccessible to standard cameras.

References:

H. Rebecq, T. Horstschaefer, G. Gallego, D. Scaramuzza

EVO: A Geometric Approach to Event-based 6-DOF Parallel Tracking and Mapping in Real-time

IEEE Robotics and Automation Letters (RA-L), vol. 2, no. 2, pp. 593-600, Apr. 2017.

doi, PDF, Poster, YouTube, Slides, Code

Accurate Angular Velocity Estimation with an Event Camera

We present an algorithm to estimate the rotational motion of an event camera. In contrast to traditional cameras, which produce images at a fixed rate, event cameras have independent pixels that respond asynchronously to brightness changes, with microsecond resolution. Our method leverages the type of information conveyed by these novel sensors (that is, edges) to directly estimate the angular velocity of the camera, without requiring optical flow or image intensity estimation. The core of the method is a contrast maximization design. The method performs favorably against ground truth data and gyroscopic measurements from an Inertial Measurement Unit, even in the presence of very high-speed motions (close to 1000 deg/s).
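
The same contrast-maximization idea applies to rotation. To keep the sketch short, it assumes the camera spins only about its optical axis, so each event (t, x, y) is rotated back to t = 0 about the image center by the candidate angular velocity times t; estimating a full 3D angular velocity is not shown. A grid search over candidates (in rad/s) replaces a proper optimizer, and all names are hypothetical.

```python
import math
from collections import defaultdict

def derotate(events, omega, center):
    """Rotate each event (t, x, y) back to t = 0 about the image center by angle -omega * t."""
    cx, cy = center
    out = []
    for t, x, y in events:
        a = -omega * t
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(a) - dy * math.sin(a),
                    cy + dx * math.sin(a) + dy * math.cos(a)))
    return out

def sharpness(points):
    """Sum of squared pixel counts of the warped-event image. For a fixed number of
    events and pixels this ranks candidates exactly like the image variance."""
    img = defaultdict(int)
    for x, y in points:
        img[(round(x), round(y))] += 1
    return sum(v * v for v in img.values())

def estimate_omega(events, candidates, center):
    """Candidate angular velocity (rad/s) that yields the sharpest derotated image."""
    return max(candidates, key=lambda w: sharpness(derotate(events, w, center)))
```

For a radial edge spinning at 2 rad/s around the center, the candidate 2.0 maps all events back onto the edge's initial position, which no other candidate on the grid achieves.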

References:

G. Gallego and D. Scaramuzza

Accurate Angular Velocity Estimation with an Event Camera

IEEE Robotics and Automation Letters (RA-L), vol. 2, no. 2, pp. 632-639, Apr. 2017.

doi, PDF, Poster, YouTube, Slides

The Event Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM

We present the world's first collection of datasets with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is several orders of magnitude faster than standard cameras, a latency of microseconds, and a high dynamic range of 130 decibels. These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released both as text files and binary (i.e., rosbag) files. Find out more on the dataset website!
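
For working with the text release, a small parser is often all that is needed. The sketch assumes each line of the events file holds "timestamp x y polarity", with the timestamp in seconds and polarity 1 for ON and 0 for OFF; the dataset website remains the authoritative description of the format. The Event type and the function names are hypothetical.

```python
import bisect
from typing import Iterable, List, NamedTuple

class Event(NamedTuple):
    t: float  # timestamp in seconds
    x: int    # pixel column
    y: int    # pixel row
    p: int    # polarity mapped to +1 (ON) or -1 (OFF)

def parse_events(lines: Iterable[str]) -> List[Event]:
    """Parse 'timestamp x y polarity' lines; blank or malformed lines are skipped."""
    events = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        t, x, y, pol = fields
        events.append(Event(float(t), int(x), int(y), 1 if int(pol) == 1 else -1))
    return events

def events_in_window(events: List[Event], t_start: float, t_end: float) -> List[Event]:
    """Events with t_start <= t < t_end, assuming the list is sorted by timestamp."""
    times = [e.t for e in events]
    return events[bisect.bisect_left(times, t_start):bisect.bisect_left(times, t_end)]
```

For example, `with open("events.txt") as fh: events = parse_events(fh)` reads a whole file (the file name is an assumption), and events_in_window then slices a time window by binary search instead of scanning the list.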

References:

E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, D. Scaramuzza

The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM

International Journal of Robotics Research (IJRR), vol. 36, no. 2, pp. 142-149, Feb. 2017.

doi, PDF, YouTube, Dataset page

EMVS: Event-based Multi-View Stereo

We introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (i) its ability to respond to scene edges, which naturally provide semi-dense geometric information without any preprocessing operation, and (ii) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a laptop CPU and even on a smartphone processor. We release the source code.

References:

H. Rebecq, G. Gallego, D. Scaramuzza

EMVS: Event-based Multi-View Stereo

British Machine Vision Conference (BMVC), York, UK, Sep. 19-22, 2016.
Best Industry Paper Award (sponsored by NVIDIA and BMVA). Oral Talk: acceptance rate 7%

doi, PDF, YouTube, Code

Low-Latency Visual Odometry using Event-based Feature Tracks

We develop an event-based feature tracking algorithm for the DAVIS sensor and show how to integrate it in an event-based visual odometry pipeline. Features are first detected in the grayscale frames and then tracked asynchronously using the stream of events. The features are then fed to an event-based visual odometry pipeline that tightly interleaves robust pose optimization and probabilistic mapping. We show that our method successfully tracks the 6-DOF motion of the sensor in natural scenes.

References:

B. Kueng, E. Mueggler, G. Gallego, D. Scaramuzza

Low-Latency Visual Odometry using Event-based Feature Tracks

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea (South), 2016, pp. 16-23.
Best Application Paper Award Finalist. Highlight talk: acceptance rate 2.5%

doi, PDF, YouTube

D. Tedaldi, G. Gallego, E. Mueggler, D. Scaramuzza

Feature Detection and Tracking with the Dynamic and Active-pixel Vision Sensor (DAVIS)

IEEE International Conference on Event-Based Control, Communication, and Signal Processing (EBCCSP), Krakow, Poland, 2016, pp. 1-7.

doi, PDF, YouTube

Continuous-Time Trajectory Estimation for Event-based Vision Sensors

In this paper, we address ego-motion estimation for an event-based vision sensor using a continuous-time framework to directly integrate the information conveyed by the sensor. The DVS pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines and it is optimized according to the observed events. We evaluate our method using datasets acquired from sensor-in-the-loop simulations and onboard a quadrotor performing flips. The results are compared to the ground truth, showing good performance of the proposed technique.

References:

E. Mueggler, G. Gallego, D. Scaramuzza

Continuous-Time Trajectory Estimation for Event-based Vision Sensors

Robotics: Science and Systems XI (RSS), Rome, Italy, July 13-17, 2015.

doi, PDF

Event-based Camera Pose Tracking using a Generative Event Model

We tackle the problem of event-based camera localization in a known environment, without additional sensing, using a probabilistic generative event model in a Bayesian filtering framework. Our main contribution is the design of the likelihood function used in the filter to process the observed events. Based on the physical characteristics of the sensor and on empirical evidence of the Gaussian-like distribution of spiked events with respect to the brightness change, we propose to use the contrast residual as a measure of how well the estimated pose of the event-based camera and the environment explain the observed events. The filter allows for localization in the general case of six degrees-of-freedom motions.
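
A toy version of such a filter may help: a hypothetical 1D camera at position s sees world point s + x at pixel x of a known log-intensity map, and the filter keeps a histogram over candidate positions. For an event at pixel x with polarity ±1, the brightness change predicted since the pixel last fired (at reference position s_ref) is compared with ±C, and the residual is scored with a Gaussian. The 1D geometry, the reading of the contrast residual as polarity × C minus the predicted change, the histogram filter, and the values of C and sigma are all assumptions of this sketch, not the paper's 6-DOF filter.

```python
import math

def likelihood(s, s_ref, x, polarity, log_map, C, sigma):
    """Gaussian likelihood of the contrast residual for one event at pixel x.

    Assumed toy model: predicted change = log_map(s + x) - log_map(s_ref + x), where
    s is a candidate camera position and s_ref the position when pixel x last fired.
    """
    predicted = log_map(s + x) - log_map(s_ref + x)
    residual = polarity * C - predicted
    return math.exp(-0.5 * (residual / sigma) ** 2)

def update(grid, prior, s_ref, event, log_map, C=0.1, sigma=0.05):
    """One Bayes update of a histogram filter over the candidate positions in grid."""
    x, polarity = event
    post = [p * likelihood(s, s_ref, x, polarity, log_map, C, sigma)
            for s, p in zip(grid, prior)]
    total = sum(post)
    return [p / total for p in post]
```

Each event sharpens the posterior toward positions at which the map would have produced a brightness change of exactly one contrast step, which is the intuition behind scoring events by their contrast residual.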

References:

G. Gallego, C. Forster, E. Mueggler, D. Scaramuzza

Event-based Camera Pose Tracking using a Generative Event Model

arXiv:1510.01972, 2015.

PDF

Lifetime Estimation of Events from Dynamic Vision Sensors

We develop an algorithm that augments each event with its "lifetime", which is computed from the event's velocity on the image plane. The generated stream of augmented events gives a continuous representation of events in time, hence enabling the design of new algorithms that outperform those based on the accumulation of events over fixed, artificially-chosen time intervals. A direct application of this augmented stream is the construction of sharp gradient (edge-like) images at any time instant. We successfully demonstrate our method in different scenarios, including high-speed quadrotor flips, and compare it to standard visualization methods.
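
One way to obtain the velocity needed for the lifetime is a local plane fit to the timestamps of neighboring events, assumed here: if t ≈ a·x + b·y + c around the event, the edge moves along (a, b) at 1/|(a, b)| pixels per second, so it needs |(a, b)| seconds to cross one pixel. An event is then drawn at a query time T only while t_event ≤ T < t_event + lifetime. The helper names are hypothetical, and a plane fit recovers only the motion component normal to the edge.

```python
import math

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_time_plane(samples):
    """Least-squares plane t = a*x + b*y + c through (x, y, t) samples, via the
    normal equations solved with Cramer's rule."""
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    st = sum(t for _, _, t in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sxt = sum(x * t for x, _, t in samples)
    syt = sum(y * t for _, y, t in samples)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxt, syt, st]
    d = _det3(A)
    if abs(d) < 1e-12:  # arbitrary tolerance for a degenerate neighborhood
        raise ValueError("degenerate neighborhood")
    sol = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        sol.append(_det3(m) / d)
    return tuple(sol)  # (a, b, c)

def lifetime(samples):
    """Seconds the edge needs to travel one pixel: |(a, b)| for the fitted plane."""
    a, b, _ = fit_time_plane(samples)
    return math.hypot(a, b)

def is_active(t_event, tau, t_query):
    """Whether an event with lifetime tau should appear in an image rendered at t_query."""
    return t_event <= t_query < t_event + tau
```

For an edge moving at 20 px/s, the fitted plane has |(a, b)| = 0.05 s, so each event stays visible for 50 ms, just long enough for the edge to reach the next pixel.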

References:

E. Mueggler, C. Forster, N. Baumli, G. Gallego, D. Scaramuzza

Lifetime Estimation of Events from Dynamic Vision Sensors

IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 2015, pp. 4874-4881.

doi, PDF, Code
