Event-based Vision
This is a topic that I started to develop when I moved to Zurich to work at the Robotics and Perception Group (UZH and ETH).
Event cameras, such as the Dynamic Vision Sensor (DVS), are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, minimal motion blur, and a latency in the order of microseconds. However, because the output is a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, and new algorithms that exploit the high temporal resolution and the asynchronous nature of the sensor are required.
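To fix ideas, each event is a tuple (x, y, t, p): the pixel location, the timestamp (microsecond resolution) and the polarity (sign) of the brightness change. The minimal Python sketch below, using randomly generated stand-in events, accumulates such tuples into an image-like histogram, which is one of the simplest ways to hand event data to a conventional algorithm; most of the methods listed further down avoid this lossy conversion and operate on the raw, asynchronous stream instead.

```python
import numpy as np

# Toy stand-in for a real event stream: each event is (x, y, t, polarity),
# i.e., pixel location, timestamp (in seconds) and sign of the brightness change.
H, W, N = 180, 240, 10_000
rng = np.random.default_rng(0)
events = np.column_stack([
    rng.integers(0, W, N),             # x
    rng.integers(0, H, N),             # y
    np.sort(rng.uniform(0, 0.05, N)),  # t (events arrive ordered in time)
    rng.choice([-1, 1], N),            # polarity
])

def accumulate(events, shape):
    """Sum event polarities per pixel to obtain a simple 'event frame'."""
    img = np.zeros(shape)
    x, y, p = events[:, 0].astype(int), events[:, 1].astype(int), events[:, 3]
    np.add.at(img, (y, x), p)
    return img

event_frame = accumulate(events, (H, W))
print(event_frame.shape, event_frame.min(), event_frame.max())
```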
Press release at TU Berlin
After the success of the First International Workshop on Event-based Vision at ICRA'17, where we saw a large and growing number of people interested in event-based cameras, we started a List of Event-based Vision Resources and wrote a Survey paper. The paper is a comprehensive introduction to the topic. The list collects links to event camera devices as well as papers, videos, code, presentations, etc. describing the algorithms and systems developed with this exciting technology. We hope the list helps us, and anyone interested in this technology, keep track of past and recent developments by pointing to the appropriate references, which are organized by topic in the Table of Contents at the top of the list.
Workshops on Event-based Vision
CVPR'23: Fourth International Workshop on Event-based Vision, Vancouver, Canada. Videos of the talks are available online!
CVPR'21: Third International Workshop on Event-based Vision. Videos of the talks are available online!
ICRA'20: Workshop on Sensing, Estimating and Understanding the Dynamic World
CVPR'19: Second International Workshop on Event-based Vision and Smart Cameras, Long Beach, USA. The slides and videos of the talks are available online!
ICRA'17: First International Workshop on Event-based Vision, Singapore. Video recordings and slides are available!
2017 Misha Mahowald Prize for Neuromorphic Engineering
Our research on event cameras for robotic applications wins the 2017 Misha Mahowald Prize! The award recognizes outstanding achievement in the field of neuromorphic engineering.
Here are some recent papers on event-based vision published in computer vision and robotics venues. As the list shows, this is an emerging topic that more and more researchers are joining. The list is compiled from this repository.
ECCV'24 NeVi Workshop. Invited Keynote presentation:
On the Benefits of Visual Stabilization for Frame- and Event-based Perception
Vision-based perception systems are typically exposed to large orientation changes in different robot applications. In such conditions, their performance might be compromised due to the inherent complexity of processing data captured under challenging motion. Integration of mechanical stabilizers to compensate for the camera rotation is not always possible due to the robot payload constraints. This paper presents a processing-based stabilization approach to compensate for the camera's rotational motion both on events and on frames (i.e., images). Assuming that the camera's attitude is available, we evaluate the benefits of stabilization in two perception applications: feature tracking and estimating the translation component of the camera's ego-motion. The validation is performed using synthetic data and sequences from well-known event-based vision datasets. The experiments unveil that stabilization can improve feature tracking and camera ego-motion estimation accuracy by 27.37% and 34.82%, respectively. Concurrently, stabilization can reduce the processing time of computing the camera's linear velocity by at least 25%.
Reference:
J.P. Rodríguez-Gómez, J.R. Martínez-de Dios, A. Ollero, G. Gallego,
On the Benefits of Visual Stabilization for Frame- and Event-based Perception
IEEE Robotics and Automation Letters (RA-L), 2024.
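Since the paper assumes the camera's attitude is available, the core operation is purely geometric: each event's pixel coordinates are mapped through the rotation to a reference orientation. Below is a minimal, generic sketch of that idea (not the paper's implementation); the intrinsic parameters and the rotation convention are assumptions made for the example.

```python
import numpy as np

# Assumed pinhole intrinsics of a 240x180 event camera (fx, fy, cx, cy are made up).
K = np.array([[200.0,   0.0, 120.0],
              [  0.0, 200.0,  90.0],
              [  0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)

def stabilize_events(xy, R):
    """Map event pixel coordinates to a reference (stabilized) orientation.

    xy : (N, 2) event pixel coordinates
    R  : (3, 3) rotation taking directions from the current camera frame
         to the reference camera frame (obtained from the attitude estimate)
    For a purely rotating camera the mapping is the homography K @ R @ K^-1.
    """
    pts = np.column_stack([xy, np.ones(len(xy))])   # homogeneous pixel coordinates
    warped = (K @ R @ K_inv @ pts.T).T
    return warped[:, :2] / warped[:, 2:3]

# Example: undo a 10-degree roll about the optical axis.
a = np.deg2rad(-10.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
xy = np.array([[120.0, 90.0], [200.0, 40.0], [30.0, 150.0]])
print(stabilize_events(xy, R))
```

The same warp can be applied to frame pixels, which is why a single attitude estimate can stabilize both modalities.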
Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories and propose an efficient solution to solve the high-dimensional assignment problem between non-linear trajectories and events. Their effectiveness is demonstrated in two scenarios: In dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%. In optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark.
Reference:
F. Hamann, Z. Wang, I. Asmanis, K. Chaney, G. Gallego, K. Daniilidis,
Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
European Conference on Computer Vision (ECCV), 2024.
Event-based Mosaicing Bundle Adjustment
We tackle the problem of mosaicing bundle adjustment (i.e., simultaneous refinement of camera orientations and scene map) for a purely rotating event camera. We formulate the problem as a regularized non-linear least squares optimization. The objective function is defined using the linearized event generation model in the camera orientations and the panoramic gradient map of the scene. We show that this BA optimization has an exploitable block-diagonal sparsity structure, so that the problem can be solved efficiently. To the best of our knowledge, this is the first work to leverage such sparsity to speed up the optimization in the context of event-based cameras, without the need to convert events into image-like representations. We evaluate our method, called EMBA, on both synthetic and real-world datasets to show its effectiveness (50% photometric error decrease), yielding results of unprecedented quality. In addition, we demonstrate EMBA using high spatial resolution event cameras, yielding delicate panoramas in the wild, even without an initial map.
Reference:
S. Guo and G. Gallego
Event-based Mosaicing Bundle Adjustment
European Conference on Computer Vision (ECCV), 2024.
PDF, Poster, Project page and ECRot Dataset
Motion and Structure from Event-based Normal Flow
Recovering the camera motion and scene geometry from visual data is a fundamental problem in computer vision. Its success in conventional (frame-based) vision is attributed to the maturity of feature extraction, data association and multi-view geometry. The emergence of asynchronous (event-based) cameras calls for new approaches that use raw event data as input to solve this fundamental problem. State-of-the-art solutions typically infer data association implicitly by iteratively reversing the event data generation process. However, the nonlinear nature of these methods limits their applicability in real-time tasks, and the constant-motion assumption leads to unstable results under agile motion. To address these issues, we reformulate the problem in a way that aligns better with the differential working principle of event cameras. We show that event-based normal flow can be used, via the proposed geometric error term, as an alternative to the full (optical) flow in solving a family of geometric problems that involve instantaneous first-order kinematics and scene geometry. Furthermore, we develop a fast linear solver and a continuous-time nonlinear solver on top of the proposed geometric error term. Experiments on both synthetic and real data show the superiority of our linear solver in terms of accuracy and efficiency, and its practicality as an initializer for previous nonlinear solvers. In addition, our continuous-time nonlinear solver exhibits exceptional capabilities in accommodating sudden variations in motion since it does not rely on the constant-motion assumption.
Reference:
Z. Ren, B. Liao, D. Kong, J. Li, P. Liu, L. Kneip, G. Gallego, Y. Zhou,
Motion and Structure from Event-based Normal Flow
European Conference on Computer Vision (ECCV), 2024.
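The key observation is that each event only constrains the component of optical flow along the local brightness gradient (the normal flow), yet many such constraints are linear in the camera's instantaneous motion. As a simplified, rotation-only illustration (not the paper's solver, which handles full kinematics and scene depth), the sketch below stacks one linear equation per normal-flow measurement and recovers the angular velocity by least squares; the synthetic data and the measurement model are assumptions made for the demo.

```python
import numpy as np

def B(x, y):
    """Rotational part of the differential motion field at normalized
    image coordinates (x, y):  flow u = B(x, y) @ omega."""
    return np.array([[x * y, -(1.0 + x * x), y],
                     [1.0 + y * y, -x * y, -x]])

def solve_omega(points, normals, normal_flow):
    """Least-squares angular velocity from normal-flow measurements.

    points      : (N, 2) normalized coordinates where events occur
    normals     : (N, 2) unit gradient directions (direction of the normal flow)
    normal_flow : (N,)   measured flow magnitude along each normal
    """
    A = np.stack([n @ B(px, py) for (px, py), n in zip(points, normals)])
    omega, *_ = np.linalg.lstsq(A, normal_flow, rcond=None)
    return omega

# Synthetic check: generate normal flow from a known rotation and recover it.
rng = np.random.default_rng(1)
omega_true = np.array([0.1, -0.2, 0.5])        # rad/s
pts = rng.uniform(-0.5, 0.5, size=(200, 2))
nrm = rng.normal(size=(200, 2))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
full_flow = np.stack([B(px, py) @ omega_true for px, py in pts])
nflow = np.sum(nrm * full_flow, axis=1)        # project the flow onto the normals
print(solve_omega(pts, nrm, nflow))            # ~ [0.1, -0.2, 0.5]
```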
ES-PTAM: Event-based Stereo Parallel Tracking and Mapping
Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by their sensors' capability. Event cameras are novel visual sensors that offer advantages to overcome the limitations of standard cameras, enabling robots to expand their operating range to challenging scenarios, such as high-speed motion and high dynamic range illumination. We propose a novel event-based stereo VO system by combining two ideas: a correspondence-free mapping module that estimates depth by maximizing ray density fusion and a tracking module that estimates camera poses by maximizing edge-map alignment. We evaluate the system comprehensively on five real-world datasets, spanning a variety of camera types (manufacturers and spatial resolutions) and scenarios (driving, flying drone, hand-held, egocentric, etc.). The quantitative and qualitative results demonstrate that our method outperforms the state of the art in the majority of the test sequences, e.g., with trajectory error reductions of 45% on the RPG dataset, 61% on the DSEC dataset, and 21% on the TUM-VIE dataset. To benefit the community and foster research on event-based perception systems, we release the source code and results.
Reference:
S. Ghosh, V. Cavinato, G. Gallego
ES-PTAM: Event-based Stereo Parallel Tracking and Mapping
European Conf. on Computer Vision Workshops (ECCVW), 2024.
MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice
Enabled by large annotated datasets, tracking and segmentation of objects in videos has made remarkable progress in recent years. Despite these advancements, algorithms still struggle under degraded conditions and during fast movements. Event cameras are novel sensors with high temporal resolution and high dynamic range that offer promising advantages to address these challenges. However, annotated data for developing learning-based mask-level tracking algorithms with events is not available. To this end, we introduce: (i) a new task termed space-time instance segmentation, similar to video instance segmentation, whose goal is to segment instances throughout the entire duration of the sensor input (here, the input is quasi-continuous events and, optionally, aligned frames); and (ii) MouseSIS, a dataset for the new task, containing aligned grayscale frames and events. It includes annotated ground-truth labels (pixel-level instance segmentation masks) of a group of up to seven freely moving and interacting mice. We also provide two reference methods, which show that leveraging event data can consistently improve tracking performance, especially when used in combination with conventional cameras. The results highlight the potential of event-aided tracking in difficult scenarios. We hope our dataset opens the field of event-based video instance segmentation and enables the development of robust tracking algorithms for challenging conditions.
Reference:
F. Hamann, H. Li, P. Mieske, L. Lewejohann, G. Gallego
MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice
European Conf. on Computer Vision Workshops (ECCVW), 2024.
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Researchers in natural science need reliable methods for quantifying animal behavior. Recently, numerous computer vision methods emerged to automate the process. However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage. Event cameras offer unique advantages for battery-dependent remote monitoring due to their low power consumption and high dynamic range capabilities. We use this novel sensor to quantify a behavior in Chinstrap penguins called ecstatic display. We formulate the problem as a temporal action detection task, determining the start and end times of the behavior. For this purpose, we recorded a colony of breeding penguins in Antarctica for several weeks and labeled event data on 16 nests. The developed method consists of a generator of candidate time intervals (proposals) and a classifier of the actions within them. The experiments show that the event cameras' natural response to motion is effective for continuous behavior monitoring and detection, reaching a mean average precision (mAP) of 58% (which increases to 63% in good weather conditions). The results also demonstrate the robustness against various lighting conditions contained in the challenging dataset. The low-power capabilities of the event camera allow it to record three times longer than a conventional camera.
Reference:
F. Hamann, S. Ghosh, I. Juárez-Martínez, T. Hart, A. Kacelnik, G. Gallego
(Event Penguins) Low-power, Continuous Remote Behavioral Localization with Event Cameras
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
PDF, Poster, Project page
Press release at TU Berlin "Penguins in a state of Ecstasy" (ENG) / (DEU)
SCIoI news (04.2024)
SCIoI Excellence Cluster News (03.2022)
F. Hamann, S. Ghosh, I. Juárez-Martínez, T. Hart, A. Kacelnik, G. Gallego
Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras
Advanced Intelligent Systems, 2024.
CMax-SLAM: Event-based Rotational-Motion Bundle Adjustment and SLAM System using Contrast Maximization
This paper considers the problem of rotational motion estimation using event cameras. Several event-based rotation estimation methods have been developed in the past decade, but their performance has not been evaluated and compared under unified criteria yet. In addition, these prior works do not consider a global refinement step. To this end, we conduct a systematic study of this problem with two objectives in mind: summarizing previous works and presenting our own solution. First, we compare prior works both theoretically and experimentally. Second, we propose the first event-based rotation-only bundle adjustment (BA) approach. We formulate it leveraging the state-of-the-art Contrast Maximization (CMax) framework, which is principled and avoids the need to convert events into frames. Third, we use the proposed BA to build CMax-SLAM, the first event-based rotation-only SLAM system comprising a front-end and a back-end. Our BA is able to run both offline (trajectory smoothing) and online (CMax-SLAM back-end). To demonstrate the performance and versatility of our method, we present comprehensive experiments on synthetic and real-world datasets, including indoor, outdoor and space scenarios. We discuss the pitfalls of real-world evaluation and propose a proxy for the reprojection error as the figure of merit to evaluate event-based rotation BA methods. We release the source code and novel data sequences to benefit the community. We hope this work leads to a better understanding and fosters further research on event-based ego-motion estimation.
Reference:
S. Guo and G. Gallego
CMax-SLAM: Event-based Rotational-Motion Bundle Adjustment and SLAM System using Contrast Maximization
IEEE Transactions on Robotics (TRO), 2024.
doi, PDF, Poster, Project page with Code, and ECRot Dataset
Event-based Background-Oriented Schlieren
Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require cameras with both high spatial and high temporal resolution, which imposes limitations in terms of bright illumination and expensive computation. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This paper presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event-camera data reveal that the proposed method enables event cameras to obtain results on par with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens a new stack of event camera applications, as we publish the source code as well as the first schlieren dataset with high-quality frame and event data.
Reference:
S. Shiba, F. Hamann, Y. Aoki, G. Gallego
Event-based Background-Oriented Schlieren
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
doi, PDF, Poster, Project page and Data
Press release at TU Berlin "Unveiling the Invisible: Schlieren Photography With Event Cameras" (ENG) / (DEU)
SCIoI Excellence Cluster News
Formulating Event-based Image Reconstruction as a Linear Inverse Problem with Deep Regularization using Optical Flow
Event cameras are novel bio-inspired sensors that measure per-pixel brightness differences asynchronously. Recovering brightness from events is appealing since the reconstructed images inherit the high dynamic range (HDR) and high-speed properties of events; hence they can be used in many robotic vision applications and to generate slow-motion HDR videos. However, state-of-the-art methods tackle this problem by training an event-to-image Recurrent Neural Network (RNN), which lacks explainability and is difficult to tune. In this work we show, for the first time, how tackling the combined problem of motion and brightness estimation leads us to formulate event-based image reconstruction as a linear inverse problem that can be solved without training an image reconstruction RNN. Instead, classical and learning-based regularizers are used to solve the problem and remove artifacts from the reconstructed images. The experiments show that the proposed approach generates images with visual quality on par with state-of-the-art methods despite only using data from a short time interval. State-of-the-art results are achieved using an image denoising Convolutional Neural Network (CNN) as the regularization function. The proposed regularized formulation and solvers have a unifying character because they can also be applied to reconstruct brightness from the second derivative. Additionally, the formulation is attractive because it can be naturally combined with super-resolution, motion segmentation and color demosaicing. Code is available at https://github.com/tub-rip/event_based_image_rec_inverse_problem
Reference:
Z. Zhang, A. Yezzi, G. Gallego
Formulating Event-based Image Reconstruction as a Linear Inverse Problem with Deep Regularization using Optical Flow
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 45, no. 7, July 2023.
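The key modeling step is the linearized event generation model: over a short time window, the brightness increment accumulated at a pixel satisfies dL(x) ~ -(grad L(x) . v(x)) dt, so with the optical flow v known, the unknown brightness L enters linearly and can be recovered with an off-the-shelf regularized least-squares solver. The sketch below illustrates that structure on a toy image with a constant flow and a plain Tikhonov regularizer; it is not the paper's solver (and in particular not its learned, CNN-based regularizer), and the image, flow and weights are assumptions made for the demo.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def diff_matrix(n):
    """(n x n) forward-difference matrix; the last row is zero (no wrap-around)."""
    main = np.r_[-np.ones(n - 1), 0.0]
    upper = np.ones(n - 1)
    return sp.diags([main, upper], [0, 1], format="csr")

def gradient_operators(h, w):
    """Sparse operators computing d/dx and d/dy of a row-major flattened image."""
    Dx = sp.kron(sp.eye(h), diff_matrix(w), format="csr")
    Dy = sp.kron(diff_matrix(h), sp.eye(w), format="csr")
    return Dx, Dy

# Toy ground-truth brightness and a constant optical flow (demo assumptions).
h, w = 32, 32
yy, xx = np.mgrid[0:h, 0:w]
L_true = np.sin(xx / 4.0) + np.cos(yy / 5.0)
vx, vy, dt = 2.0, 1.0, 0.1

Dx, Dy = gradient_operators(h, w)
A = -dt * (vx * Dx + vy * Dy)      # linearized model: dL = -(grad L . v) dt
b = A @ L_true.ravel()             # "measured" brightness increments (from events)

# Tikhonov-regularized least squares: min ||A L - b||^2 + lam * ||grad L||^2
lam = 1e-2
A_reg = sp.vstack([A, np.sqrt(lam) * Dx, np.sqrt(lam) * Dy])
b_reg = np.concatenate([b, np.zeros(2 * h * w)])
L_rec = lsqr(A_reg, b_reg)[0].reshape(h, w)

# The reconstruction is defined up to components the events cannot observe.
corr = np.corrcoef(L_rec.ravel(), L_true.ravel())[0, 1]
print("correlation with ground truth:", round(float(corr), 3))
```

In the paper, the role of the simple Tikhonov term above is played by classical priors or by an image-denoising CNN, as described in the abstract.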
Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion
In this work we tackle the problem of event-based stereo 3D reconstruction for SLAM. Most event-based stereo methods try to exploit the camera's high temporal resolution and event simultaneity across cameras to establish matches and estimate depth. By contrast, we investigate how to estimate depth without explicit data association by fusing Disparity Space Images (DSIs) obtained with efficient monocular methods. We develop fusion theory and apply it to design multi-camera 3D reconstruction algorithms that produce state-of-the-art results, as we confirm by comparing against four baseline methods and testing on a variety of available datasets.
References:
S. Ghosh and G. Gallego
MC-EMVS: Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion
Advanced Intelligent Systems (AISY), 4: 2200221, Sep. 2022.
doi, PDF, Project Page and Code,
Presentation at IEEE MFI workshop 2022 (YouTube), Slides
Presentation at the GRASP Laboratory (UPenn) seminar (YouTube)
S. Ghosh and G. Gallego
Event-based Stereo Depth Estimation from Ego-motion using Ray Density Fusion
European Conf. on Computer Vision Workshops (ECCVW) Ego4D, 2022.
S. Ghosh and G. Gallego
Event-based Stereo Depth for SLAM in Autonomous Driving
Behavior-driven Autonomous Driving in Unstructured Environments (BADUE) Workshop at IROS 2022.
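To give a flavor of the fusion idea: each camera produces a Disparity Space Image (DSI), a voxel grid that counts back-projected event rays, and the per-voxel fusion should keep only locations supported by all cameras. The sketch below uses a harmonic-mean fusion rule on synthetic volumes as one such choice; it is a toy illustration of the concept, not the released code, and the sizes and thresholds are arbitrary.

```python
import numpy as np

def fuse_dsi_harmonic(dsi_a, dsi_b, eps=1e-6):
    """Fuse two ray-density volumes (DSIs) with a per-voxel harmonic mean.

    The harmonic mean is small whenever either input is small, so 3D locations
    supported by only one camera (likely noise or outliers) are suppressed."""
    return 2.0 * dsi_a * dsi_b / (dsi_a + dsi_b + eps)

def depth_from_dsi(dsi, depths, min_support):
    """Per pixel, pick the depth plane with the largest ray density and
    discard pixels whose best support is too low (confidence mask)."""
    best = dsi.argmax(axis=0)
    depth_map = depths[best]
    depth_map[dsi.max(axis=0) < min_support] = np.nan
    return depth_map

# Toy volumes: D depth planes over an H x W image (random stand-ins for real DSIs).
rng = np.random.default_rng(0)
D, H, W = 50, 60, 80
depths = np.linspace(0.5, 5.0, D)
dsi_a = rng.poisson(1.0, size=(D, H, W)).astype(float)
dsi_b = rng.poisson(1.0, size=(D, H, W)).astype(float)
dsi_a[20, 30, 40] += 20; dsi_b[20, 30, 40] += 20   # structure seen by both cameras
dsi_a[5, 10, 10] += 20                             # spurious peak in one camera only

fused = fuse_dsi_harmonic(dsi_a, dsi_b)
print("strongest fused voxel:", np.unravel_index(fused.argmax(), fused.shape))
print("depth at that pixel:", depth_from_dsi(fused, depths, min_support=10.0)[30, 40])
```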
Secrets of Event-Based Optical Flow, Depth and Ego-motion Estimation by Contrast Maximization
Event cameras respond to scene dynamics and offer advantages to estimate motion. Following recent image-based deep-learning achievements, optical flow estimation methods for event cameras have rushed to combine those image-based methods with event data. However, this requires several adaptations (data conversion, loss function, etc.) because events and frames have very different properties. We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone. We investigate key elements: how to design the objective function to prevent overfitting, how to warp events to deal better with occlusions, and how to improve convergence with multi-scale raw events. With these key elements, our method ranks first among unsupervised methods on the MVSEC benchmark, and is competitive on the DSEC benchmark. Moreover, our method allows us to expose the issues of the ground truth flow in those benchmarks, and produces remarkable results when it is transferred to unsupervised learning settings. We release the code open source.
Reference:
S. Shiba, Y. Klose, Y. Aoki, G. Gallego
Secrets of Event-based Optical Flow, Depth and Ego-motion Estimation by Contrast Maximization
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. doi, PDF, Poster
Secrets of Event-Based Optical Flow
European Conference on Computer Vision (ECCV), Oct. 2022. PDF
Oral Presentation. Acceptance rate: 2.7%
YouTube, Poster, Project page and Code
Presentation at the PRG Seminar Series U. Maryland (Video)
Presentation at the GRASP Laboratory (UPenn) seminar (YouTube)
Presentation at ViEW 2022: "Optical Flow Estimation with Event Cameras: What is Motion?" (in Japanese)
A Fast Geometric Regularizer to Mitigate Event Collapse in the Contrast Maximization Framework
Event cameras are emerging vision sensors whose advantages make them suitable for various applications, such as autonomous robots. Contrast maximization (CMax), which provides state-of-the-art accuracy on motion estimation using events, may suffer from an overfitting problem called event collapse. Prior works are computationally expensive or cannot alleviate the overfitting, which undermines the benefits of the CMax framework. We propose a novel, computationally efficient regularizer based on geometric principles to mitigate event collapse. The experiments show that the proposed regularizer achieves state-of-the-art accuracy results, while its reduced computational complexity makes it two to four times faster than previous approaches. To the best of our knowledge, our regularizer is the only effective solution for event collapse without trading off runtime. We hope our work opens the door for future applications that unlock the advantages of event cameras.
Reference:
S. Shiba, Y. Aoki, G. Gallego
A Fast Geometric Regularizer to Mitigate Event Collapse in the Contrast Maximization Framework
Advanced Intelligent Systems (AISY), 5: 2200251, Jan. 2023.
Event Collapse in Contrast Maximization Frameworks
Contrast maximization (CMax) is a framework that provides state-of-the-art results on several event-based computer vision tasks, such as ego-motion or optical flow estimation. However, it may suffer from a problem called event collapse, which is an undesired solution where events are warped into too few pixels. As prior works have largely ignored the issue or proposed workarounds, it is imperative to analyze this phenomenon in detail. Our work demonstrates event collapse in its simplest form and proposes collapse metrics by using first principles of space-time deformation based on differential geometry and physics. We experimentally show on publicly available datasets that the proposed metrics mitigate event collapse and do not harm well-posed warps. To the best of our knowledge, regularizers based on the proposed metrics are the only effective solution against event collapse in the experimental settings considered, compared with other methods. We hope that this work inspires further research to tackle more complex warp models.
Reference:
S. Shiba, Y. Aoki, G. Gallego
Event Collapse in Contrast Maximization Frameworks
Sensors 2022, 22(14):5190.
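The intuition behind the collapse metrics in these two papers can be illustrated with the divergence of the warp's flow field: a warp that squeezes events into fewer pixels has negative divergence (area contraction), whereas well-posed warps such as pure translation do not. The sketch below computes such a divergence-based penalty for a candidate dense flow field; it captures the intuition only and is not the exact metrics or regularizers proposed in the papers.

```python
import numpy as np

def divergence(flow_x, flow_y):
    """Divergence of a dense flow field (H x W arrays), via central differences."""
    dudx = np.gradient(flow_x, axis=1)
    dvdy = np.gradient(flow_y, axis=0)
    return dudx + dvdy

def collapse_penalty(flow_x, flow_y):
    """Penalize contracting warps: only negative divergence (area shrinkage)
    contributes, so well-posed (e.g., translational) warps are not affected."""
    div = divergence(flow_x, flow_y)
    return np.mean(np.maximum(0.0, -div))

H, W = 64, 64
yy, xx = np.mgrid[0:H, 0:W].astype(float)

# A collapse-prone warp: all flow vectors point toward the image center.
cx, cy = W / 2.0, H / 2.0
fx, fy = cx - xx, cy - yy
print("contracting warp penalty:", collapse_penalty(fx, fy))     # > 0

# A benign warp: constant translation everywhere.
print("translation warp penalty:", collapse_penalty(np.full((H, W), 3.0),
                                                     np.full((H, W), -1.0)))  # = 0
```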
Fast Event-based Optical Flow Estimation by Triplet Matching
Event cameras are novel bio-inspired sensors that offer advantages over traditional cameras (low latency, high dynamic range, low power, etc.). Optical flow estimation methods that work on packets of events trade off speed for accuracy, while event-by-event (incremental) methods have strong assumptions and have not been tested on common benchmarks that quantify progress in the field. Towards applications on resource-constrained devices, it is important to develop optical flow algorithms that are fast, lightweight and accurate. This work leverages insights from neuroscience, and proposes a novel optical flow estimation scheme based on triplet matching. The experiments on publicly available benchmarks demonstrate its capability to handle complex scenes with results comparable to prior packet-based algorithms. In addition, the proposed method achieves the fastest execution time (> 10 kHz) on standard CPUs as it requires only three events per estimate. We hope that our research opens the door to real-time, incremental motion estimation methods and applications in real-world scenarios.
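The underlying observation is that events triggered by the same moving edge lie (approximately) on a straight line in space-time, so a triplet of collinear events directly yields a velocity estimate. The brute-force sketch below illustrates this idea on a synthetic moving edge; the search strategy, thresholds and scoring are our own simplifications for readability, not the paper's (much faster) matching scheme.

```python
import numpy as np

def triplet_flow(events, radius=3.0, dt_max=0.01, tol=1.0):
    """Toy triplet-matching flow: for each event e3, pick a recent neighbour e2,
    hypothesize a constant velocity from (e2, e3), and accept it only if a third
    event e1 is found where that velocity predicts it (collinear in space-time).

    events: (N, 3) array of (x, y, t), sorted by t. Returns a list of
    (x, y, t, vx, vy) flow estimates. Brute-force search, for clarity only."""
    flows = []
    for i3 in range(2, len(events)):
        x3, y3, t3 = events[i3]
        for i2 in range(i3 - 1, -1, -1):
            x2, y2, t2 = events[i2]
            dt32 = t3 - t2
            if dt32 > dt_max:
                break                      # events are time-sorted: stop searching
            if dt32 <= 0 or np.hypot(x3 - x2, y3 - y2) > radius:
                continue
            vx, vy = (x3 - x2) / dt32, (y3 - y2) / dt32
            # Predict where/when the first event of the triplet should be.
            x1p, y1p, t1p = x2 - vx * dt32, y2 - vy * dt32, t2 - dt32
            older = events[:i2]
            if len(older) == 0:
                continue
            d = np.hypot(older[:, 0] - x1p, older[:, 1] - y1p) \
                + np.abs(older[:, 2] - t1p) / dt_max
            if d.min() < tol:              # found a consistent triplet
                flows.append((x3, y3, t3, vx, vy))
                break
    return flows

# Synthetic edge moving at 100 px/s to the right: events on a line in space-time.
t = np.linspace(0.0, 0.05, 60)
events = np.column_stack([100.0 * t, np.full_like(t, 20.0), t])
est = triplet_flow(events)
print(len(est), "estimates; first velocity:", est[0][3:], "(true ~ (100, 0))")
```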
Reference:
Stereo Co-capture System for Recording and Tracking Fish with Frame- and Event Cameras
This work introduces a co-capture system for multi-animal visual data acquisition using conventional cameras and event cameras. Event cameras offer multiple advantages over frame-based cameras, such as a high temporal resolution and temporal redundancy suppression, which enable us to efficiently capture the fast and erratic movements of fish. We furthermore present an event-based multi-animal tracking algorithm, which proves the feasibility of the approach and sets the baseline for further exploration of combining the advantages of event cameras and conventional cameras for multi-animal tracking.
Reference:
F. Hamann, G. Gallego,
Stereo Co-capture System for Recording and Tracking Fish with Frame- and Event Cameras
26th International Conference on Pattern Recognition (ICPR), Visual observation and analysis of Vertebrate And Insect Behavior (VAIB) Workshop, Montreal, Canada, 2022.
EDS: Event-aided Direct Sparse Odometry
We introduce EDS, a direct monocular visual odometry method using events and frames. Our algorithm leverages the event generation model to track the camera motion in the blind time between frames. The method formulates a direct probabilistic approach based on observed brightness increments. Per-pixel brightness increments are predicted using a sparse number of selected 3D points and are compared to the events via the brightness increment error to estimate camera motion. The method recovers a semi-dense 3D map using photometric bundle adjustment. EDS is the first method to perform 6-DOF VO using events and frames with a direct approach. By design it overcomes the problem of changing appearance in indirect methods. We also show that, for a target error performance, EDS can work at lower frame rates than state-of-the-art frame-based VO solutions. This opens the door to low-power motion-tracking applications where frames are sparingly triggered "on demand" and our method tracks the motion in between. We release code and datasets to the public.
Reference:
J. Hidalgo-Carrió, G. Gallego, D. Scaramuzza
Event-aided Direct Sparse Odometry
IEEE Conference of Computer Vision and Pattern Recognition (CVPR), 2022, pp. 5771-5780.
Oral Presentation. Acceptance rate: 4.2%
PDF, Poster, YouTube, CVPR Video, Project page and Dataset, Code
Stabilizing Event Data on Flapping-wing Robots for Simpler Perception
We propose a stabilization method for event cameras mounted onboard flapping-wing robots. Differently from frame-based cameras, event cameras do not suffer from the motion blur that typically occurs due to strong changes in the camera orientation. The method aims to offer an alternative to heavy gimbals mounted on ornithopters. It has been tested on event data acquired by a large-scale ornithopter (1.5 m wingspan).
Reference:
ESL: Event-based Structured Light
Event cameras are bio-inspired sensors providing significant advantages over standard cameras such as low latency, high temporal resolution, and high dynamic range. We propose a novel structured-light system using an event camera to tackle the problem of accurate and high-speed depth sensing. Our setup consists of an event camera and a laser-point projector that uniformly illuminates the scene in a raster scanning pattern over 16 ms. Previous methods match events independently of each other, and so they deliver noisy depth estimates at high scanning speeds in the presence of signal latency and jitter. In contrast, we optimize an energy function designed to exploit event correlations, called spatio-temporal consistency. The resulting method is robust to event jitter and therefore performs better at higher scanning speeds. Experiments demonstrate that our method can deal with high-speed motion and outperform state-of-the-art 3D reconstruction methods based on event cameras, reducing the RMSE by 83% on average, for the same acquisition time.
Reference:
M. Muglikar, G. Gallego, D. Scaramuzza
ESL: Event-based Structured Light
IEEE International Conference on 3D Vision (3DV), 2021, pp. 1165-1174.
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts
Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages to overcome such limitations. They report pixel-wise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization one involving the fitting of multiple motion models. We jointly solve two subproblems, namely event cluster assignment (labeling) and motion model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and number of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open source license to foster research in the emerging topic of event-based motion segmentation.
Reference:
Y. Zhou, G. Gallego, X. Lu, S. Liu, S. Shen
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), vol. 34, no. 8, pp. 4868-4880, Aug. 2023.
The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data
Event cameras, inspired by biological vision systems, provide a natural and data-efficient representation of visual information. Visual information is acquired in the form of events that are triggered by local brightness changes. However, because most brightness changes are triggered by relative motion of the camera and the scene, the events recorded at a single sensor location seldom correspond to the same world point. To extract meaningful information from event cameras, it is helpful to register events that were triggered by the same underlying world point. In this work we propose a new model of event data that captures its natural spatio-temporal structure. We start by developing a model for aligned event data. That is, we develop a model for the data as though it has been perfectly registered already. In particular, we model the aligned data as a spatio-temporal Poisson point process. Based on this model, we develop a maximum likelihood approach to registering events that are not yet aligned. That is, we find transformations of the observed events that make them as likely as possible under our model. In particular we extract the camera rotation that leads to the best event alignment. We show new state-of-the-art accuracy for rotational velocity estimation on the DAVIS 240C dataset. In addition, our method is also faster and has lower computational complexity than several competing methods.
Reference:
C. Gu, E. Learned-Miller, D. Sheldon, G. Gallego, P. Bideau
The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data
IEEE International Conference on Computer Vision (ICCV), 2021, pp. 13475-13484.
Event-based Stereo Visual Odometry
Event-based cameras are bio-inspired vision sensors whose pixels work independently from each other and respond asynchronously to brightness changes, with microsecond resolution. Their advantages make it possible to tackle challenging scenarios in robotics, such as high-speed and high dynamic range scenes. We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig. Our system follows a parallel tracking-and-mapping approach, where novel solutions to each subproblem (3D reconstruction and camera pose estimation) are developed with two objectives in mind: being principled and efficient, for real-time operation with commodity hardware. To this end, we seek to maximize the spatio-temporal consistency of stereo event-based data while using a simple and efficient representation. Specifically, the mapping module builds a semi-dense 3D map of the scene by fusing depth estimates from multiple local viewpoints (obtained by spatio-temporal consistency) in a probabilistic fashion. The tracking module recovers the pose of the stereo rig by solving a registration problem that naturally arises due to the chosen map and event data representation. Experiments on publicly available datasets and on our own recordings demonstrate the versatility of the proposed method in natural scenes with general 6-DoF motion. The system successfully leverages the advantages of event-based cameras to perform visual odometry in challenging illumination conditions, such as low-light and high dynamic range, while running in real-time on a standard CPU. We release the software and dataset under an open source license to foster research in the emerging topic of event-based SLAM.
References:
Y. Zhou, G. Gallego, S. Shen
Event-based Stereo Visual Odometry
IEEE Transactions on Robotics (TRO), vol. 37, no. 5, pp. 1433-1450, Oct. 2021.
doi, PDF, YouTube, Source Code, Project page and Datasets,
Results on DSEC dataset, and Tutorial
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
Reference:
Focus Is All You Need: Loss Functions for Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness changes ("events") instead of traditional video frames. These asynchronous sensors offer several advantages over traditional cameras, such as high temporal resolution, very high dynamic range, and no motion blur. To unlock the potential of such sensors, motion compensation methods have been recently proposed. We present a collection and taxonomy of twenty-two objective functions to analyze event alignment in motion compensation approaches. We call them focus loss functions since they have strong connections with functions used in traditional shape-from-focus applications. The proposed loss functions allow bringing mature computer vision tools to the realm of event cameras. We compare the accuracy and runtime performance of all loss functions on a publicly available dataset, and conclude that the variance, the gradient and the Laplacian magnitudes are among the best loss functions. The applicability of the loss functions is shown on multiple tasks: rotational motion, depth and optical flow estimation. The proposed focus loss functions allow us to unlock the outstanding properties of event cameras.
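The three losses highlighted above (variance, gradient magnitude and Laplacian magnitude) are straightforward to compute on an image of warped events (IWE). Below is a minimal sketch of these three scores on synthetic IWEs; the event counts and image sizes are placeholders, and the implementations are textbook definitions rather than the code released with the paper.

```python
import numpy as np

def iwe(warped_xy, shape):
    """Image of Warped Events (IWE): per-pixel count of already-warped events."""
    img = np.zeros(shape)
    x = np.clip(warped_xy[:, 0].round().astype(int), 0, shape[1] - 1)
    y = np.clip(warped_xy[:, 1].round().astype(int), 0, shape[0] - 1)
    np.add.at(img, (y, x), 1.0)
    return img

def variance_loss(img):
    return np.var(img)

def gradient_magnitude_loss(img):
    gy, gx = np.gradient(img)
    return np.mean(gx ** 2 + gy ** 2)

def laplacian_magnitude_loss(img):
    gy, gx = np.gradient(img)
    gyy, _ = np.gradient(gy)
    _, gxx = np.gradient(gx)
    return np.mean((gxx + gyy) ** 2)

# A well-aligned (sharp) IWE should score higher than a misaligned (blurred) one.
rng = np.random.default_rng(0)
aligned = iwe(np.column_stack([np.full(500, 32.0), rng.uniform(0, 64, 500)]), (64, 64))
misaligned = iwe(rng.uniform(0, 64, size=(500, 2)), (64, 64))
for loss in (variance_loss, gradient_magnitude_loss, laplacian_magnitude_loss):
    print(loss.__name__, float(loss(aligned)) > float(loss(misaligned)))
```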
References:
Event-Based Motion Segmentation by Motion Compensation
In contrast to traditional cameras, whose pixels have a common exposure time, event-based cameras are novel bio-inspired sensors whose pixels work independently and asynchronously output intensity changes (called "events"), with microsecond resolution. Since events are caused by the apparent motion of objects, event-based cameras sample visual information based on the scene dynamics and are, therefore, a more natural fit than traditional cameras to acquire motion, especially at high speeds, where traditional cameras suffer from motion blur. However, distinguishing between events caused by different moving objects and by the camera's ego-motion is a challenging task. We present the first per-event segmentation method for splitting a scene into independently moving objects. Our method jointly estimates the event-object associations (i.e., segmentation) and the motion parameters of the objects (or the background) by maximization of an objective function, which builds upon recent results on event-based motion-compensation. We provide a thorough evaluation of our method on a public dataset, outperforming the state-of-the-art by as much as 10%. We also show the first quantitative evaluation of a segmentation algorithm for event cameras, yielding around 90% accuracy at 4 pixels relative displacement.
References:
T. Stoffregen, G. Gallego, T. Drummond, L. Kleeman, D. Scaramuzza
Event-Based Motion Segmentation by Motion Compensation
IEEE International Conference on Computer Vision (ICCV), 2019, pp. 7243-7252.
doi, PDF (animations best viewed with Acrobat Reader), YouTube
Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes, called "events", instead of traditional video images. These asynchronous sensors naturally respond to motion in the scene with very low latency (in the order of microseconds) and have a very high dynamic range. These features, along with a very low power consumption, make event cameras an ideal sensor for fast robot localization and wearable applications, such as AR/VR and gaming. Considering these applications, we present a method to track the 6-DOF pose of an event camera in a known environment, which we assume to be described by a photometric 3D map (i.e., intensity plus depth information) built via classic dense 3D reconstruction algorithms. Our approach uses the raw events, directly, without intermediate features, within a maximum-likelihood framework to estimate the camera motion that best explains the events via a generative model. We successfully evaluate the method using both simulated and real data, and show improved results over the state of the art. We release the datasets to the public to foster reproducibility and research in this topic.
References:
Asynchronous, Photometric Feature Tracking using Events and Frames
We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with low latency. Event cameras are novel sensors that output pixel-level brightness changes, called "events". They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide low latency updates. In contrast to previous works, which are based on heuristics, this is the first principled method that uses raw intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are both more accurate (subpixel accuracy) and longer than the state of the art, across a wide variety of scenes.
References:
D. Gehrig, H. Rebecq, G. Gallego, D. Scaramuzza
EKLT: Asynchronous, Photometric Feature Tracking using Events and Frames
International Journal of Computer Vision (IJCV), vol. 128, pp. 601-618, 2020.
D. Gehrig, H. Rebecq, G. Gallego, D. Scaramuzza
Asynchronous, Photometric Feature Tracking using Events and Frames
European Conference on Computer Vision (ECCV), 2018, pp. 766-781.
Oral Presentation. Acceptance rate: 2.4%
doi, PDF, Poster, YouTube, Oral presentation, Tracking Code, Evaluation Code
Semi-Dense 3D Reconstruction with a Stereo Event Camera
This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or on prior knowledge about the scene. Experiments demonstrate our method can deal with both texture-rich scenes as well as sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations.
References:
Continuous-Time Visual-Inertial Odometry for Event Cameras
In this paper, we leverage a continuous-time framework to perform trajectory estimation by fusing visual data from a moving event camera with inertial data from an IMU. This framework allows direct integration of the asynchronous events with micro-second accuracy and the inertial measurements at high frequency. The pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines. This formulation significantly reduces the number of variables in trajectory estimation problems. We evaluate our method on real data from several scenes and compare the results against ground truth from a motion-capture system. We show superior performance of the proposed technique compared to non-batch event-based algorithms. We also show that both the map orientation and scale can be recovered accurately by fusing events and inertial data. To the best of our knowledge, this is the first work on visual-inertial fusion with event cameras using a continuous-time framework.
References:
A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation
We present a unifying framework to solve several computer vision problems with event cameras: motion, depth and optical flow estimation. The main idea of our framework is to find the point trajectories on the image plane that are best aligned with the event data by maximizing an objective function: the contrast of an image of warped events. Our method implicitly handles data association between the events, and therefore, does not rely on additional appearance information about the scene. In addition to accurately recovering the motion parameters of the problem, our framework produces motion-corrected edge-like images with high dynamic range that can be used for further scene analysis. The proposed method is not only simple, but more importantly, it is, to the best of our knowledge, the first method that can be successfully applied to such a diverse set of important vision tasks with event cameras.
References:
G. Gallego, H. Rebecq, D. Scaramuzza
A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3867-3876, 2018.
Spotlight Presentation.
doi, PDF, Poster, YouTube, Presentation
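To make the recipe concrete, here is a self-contained toy example of contrast maximization: warp the events according to a candidate motion, accumulate them into an image of warped events, and score the candidate by the image contrast (variance). The example restricts the motion to a rotation about the optical axis and uses a brute-force 1D search on synthetic events; the warp model, the data and the search strategy are simplifications for illustration, not the method as published.

```python
import numpy as np

def warp_events(xy, t, omega_z, center):
    """Rotate each event about `center` back to time t=0, assuming a constant
    angular velocity omega_z (rad/s) about the optical axis (a toy warp model)."""
    ang = -omega_z * t                         # undo the rotation accrued since t=0
    c, s = np.cos(ang), np.sin(ang)
    d = xy - center
    return np.column_stack([c * d[:, 0] - s * d[:, 1],
                            s * d[:, 0] + c * d[:, 1]]) + center

def contrast(xy, shape):
    """Variance of the image of warped events: sharper image -> higher contrast."""
    img = np.zeros(shape)
    x = np.clip(xy[:, 0].round().astype(int), 0, shape[1] - 1)
    y = np.clip(xy[:, 1].round().astype(int), 0, shape[0] - 1)
    np.add.at(img, (y, x), 1.0)
    return img.var()

# Synthetic events from a radial edge rotating at 2 rad/s about the image center.
H, W = 128, 128
center = np.array([64.0, 64.0])
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 0.5, 5000)
r = rng.uniform(10.0, 50.0, 5000)
theta = 2.0 * t                                # true angular velocity: 2 rad/s
xy = center + np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Contrast maximization by a brute-force 1D search over candidate velocities.
candidates = np.linspace(-5.0, 5.0, 201)
scores = [contrast(warp_events(xy, t, w, center), (H, W)) for w in candidates]
best = candidates[int(np.argmax(scores))]
print("estimated omega_z:", round(float(best), 2), "rad/s (true: 2.0)")
```

In practice the same objective is optimized with gradient-based solvers over richer warp models (optical flow, depth, 6-DOF motion), as the abstract above describes.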
Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars
Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the best out of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large scale event-camera dataset (~1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g. challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.
References:
Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
This paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. Our method is the first work addressing and demonstrating event-based pose tracking in six degrees-of-freedom (DOF) motions in realistic and natural scenes. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
References:
EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time
We introduce the problem of event-based multi-view stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (1) its ability to respond to scene edges - which naturally provide semi-dense geometric information without any preprocessing operation - and (2) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps, without requiring any explicit data association or intensity estimation. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a CPU. We release the source code.
References:
H. Rebecq, G. Gallego, E. Mueggler, D. Scaramuzza
EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time
International Journal of Computer Vision (IJCV), vol. 126, no. 12, pp. 1394-1414, Dec. 2018.
Special Issue with best (extended) papers from BMVC 2016.
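The core of EMVS is easy to state: back-project every event as a viewing ray through a Disparity Space Image (a voxel grid attached to a reference view) and count the rays per voxel; local maxima of this ray density mark semi-dense scene structure. The sketch below shows this voting scheme for a camera translating along its x-axis and a single 3D point; the intrinsics, the motion and the brute-force loops are assumptions made for clarity, not the real-time implementation released with the paper.

```python
import numpy as np

# Assumed pinhole intrinsics (fx, fy, cx, cy) of a 240x180 event camera.
fx, fy, cx, cy = 200.0, 200.0, 120.0, 90.0

def build_dsi(events_xy, cam_x, depths, shape):
    """Count back-projected event rays in a Disparity Space Image (DSI).

    Each event (pixel u, v), observed from a camera displaced by cam_x along
    the x-axis, is back-projected to every candidate depth; the resulting 3D
    point is projected into a reference camera at the origin and votes there."""
    H, W = shape
    dsi = np.zeros((len(depths), H, W))
    for (u, v), tx in zip(events_xy, cam_x):
        for k, Z in enumerate(depths):
            X = (u - cx) / fx * Z + tx        # back-project, then shift to the
            Y = (v - cy) / fy * Z             # reference frame (translation in x)
            ur = int(round(fx * X / Z + cx))  # re-project into the reference view
            vr = int(round(fy * Y / Z + cy))
            if 0 <= ur < W and 0 <= vr < H:
                dsi[k, vr, ur] += 1.0
    return dsi

# Toy scene: a single 3D point at 2 m depth, observed from 100 camera positions.
P = np.array([0.2, -0.1, 2.0])
cam_x = np.linspace(0.0, 0.3, 100)            # the camera slides 30 cm along x
u = fx * (P[0] - cam_x) / P[2] + cx           # pixel where each camera sees P
v = np.full_like(u, fy * P[1] / P[2] + cy)
depths = np.linspace(1.0, 4.0, 31)

dsi = build_dsi(np.column_stack([u, v]), cam_x, depths, (180, 240))
k, vr, ur = np.unravel_index(dsi.argmax(), dsi.shape)
print("recovered depth:", round(float(depths[k]), 2), "m at pixel", (ur, vr))
```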
EVO: Event-based, 6-DOF Parallel Tracking and Mapping in Real-Time
We present EVO, an Event-based Visual Odometry algorithm. Our algorithm successfully leverages the outstanding properties of event cameras to track fast camera motions while recovering a semi-dense 3D map of the environment. The implementation runs in real-time on a standard CPU and outputs up to several hundred pose estimates per second. Due to the nature of event cameras, our algorithm is unaffected by motion blur and operates very well in challenging, high dynamic range conditions with strong illumination changes. To achieve this, we combine a novel, event-based tracking approach based on image-to-model alignment with a recent event-based 3D reconstruction algorithm in a parallel fashion. Additionally, we show that the output of our pipeline can be used to reconstruct intensity images from the binary event stream, though our algorithm does not require such intensity information. We believe that this work makes significant progress in SLAM by unlocking the potential of event cameras. This allows us to tackle challenging scenarios that are currently inaccessible to standard cameras.
References:
Accurate Angular Velocity Estimation with an Event Camera
We present an algorithm to estimate the rotational motion of an event camera. In contrast to traditional cameras, which produce images at a fixed rate, event cameras have independent pixels that respond asynchronously to brightness changes, with microsecond resolution. Our method leverages the type of information conveyed by these novel sensors (that is, edges) to directly estimate the angular velocity of the camera, without requiring optical flow or image intensity estimation. The core of the method is a contrast maximization design. The method performs favorably against ground truth data and gyroscopic measurements from an Inertial Measurement Unit, even in the presence of very high-speed motions (close to 1000 deg/s).
References:
Event-based Data for Pose Estimation, Visual Odometry, and SLAM
We present the world's first collection of datasets with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is several orders of magnitude faster than standard cameras, a latency of microseconds, and a high dynamic range of 130 decibels. These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released both as text files and binary (i.e., rosbag) files. Find out more on the dataset website!
References:
E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, D. Scaramuzza
The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM
International Journal of Robotics Research (IJRR), vol. 36, no. 2, pp. 142-149, Feb. 2017.
EMVS: Event-based Multi-View Stereo
We introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (i) its ability to respond to scene edges --which naturally provide semi-dense geometric information without any preprocessing operation-- and (ii) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a laptop CPU and even on a smartphone processor. We release the source code.
References:
Low-Latency Visual Odometry using Event-based Feature Tracks
We develop an event-based feature tracking algorithm for the DAVIS sensor and show how to integrate it in an event-based visual odometry pipeline. Features are first detected in the grayscale frames and then tracked asynchronously using the stream of events. The features are then fed to an event-based visual odometry pipeline that tightly interleaves robust pose optimization and probabilistic mapping. We show that our method successfully tracks the 6-DOF motion of the sensor in natural scenes (see video above).
References:
B. Kueng, E. Mueggler, G. Gallego, D. Scaramuzza
Low-Latency Visual Odometry using Event-based Feature Tracks
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea (South), 2016, pp. 16-23.
Best Application Paper Award Finalist. Highlight talk: acceptance rate 2.5%
Continuous-Time Trajectory Estimation for Event-based Vision Sensors
In this paper, we address ego-motion estimation for an event-based vision sensor using a continuous-time framework to directly integrate the information conveyed by the sensor. The DVS pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines and it is optimized according to the observed events. We evaluate our method using datasets acquired from sensor-in-the-loop simulations and onboard a quadrotor performing flips. The results are compared to the ground truth, showing the good performance of the proposed technique.
References:
Event-based Camera Pose Tracking using a Generative Event Model
We tackle the problem of event-based camera localization in a known environment, without additional sensing, using a probabilistic generative event model in a Bayesian filtering framework. Our main contribution is the design of the likelihood function used in the filter to process the observed events. Based on the physical characteristics of the sensor and on empirical evidence of the Gaussian-like distribution of spiked events with respect to the brightness change, we propose to use the contrast residual as a measure of how well the estimated pose of the event-based camera and the environment explain the observed events. The filter allows for localization in the general case of six degrees-of-freedom motions.
Reference:
G. Gallego, C. Forster, E. Mueggler, D. Scaramuzza
Event-based Camera Pose Tracking using a Generative Event Model
arXiv:1510.01972, 2015.
Lifetime Estimation of Events from Dynamic Vision Sensors
We develop an algorithm that augments each event with its "lifetime", which is computed from the event's velocity on the image plane. The generated stream of augmented events gives a continuous representation of events in time, hence enabling the design of new algorithms that outperform those based on the accumulation of events over fixed, artificially-chosen time intervals. A direct application of this augmented stream is the construction of sharp gradient (edge-like) images at any time instant. We successfully demonstrate our method in different scenarios, including high-speed quadrotor flips, and compare it to standard visualization methods.
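In other words, the lifetime of an event is roughly the time the edge that generated it needs to travel one pixel, i.e., the reciprocal of the event's image-plane speed. The sketch below augments events with such a lifetime and renders only the events that are "alive" at a query time; it assumes the per-event velocity is already given (the paper estimates it from the events themselves), and all sizes and thresholds are placeholders.

```python
import numpy as np

def augment_with_lifetime(events, velocities, pixel_pitch=1.0, max_lifetime=0.1):
    """Attach to each event the time its generating edge needs to cross one pixel.

    events     : (N, 4) array of (x, y, t, polarity)
    velocities : (N, 2) per-event image-plane velocity in pixels/second
    Returns an (N, 5) array (x, y, t, polarity, lifetime)."""
    speed = np.linalg.norm(velocities, axis=1)
    lifetime = np.where(speed > 0, pixel_pitch / np.maximum(speed, 1e-9), max_lifetime)
    return np.column_stack([events, np.minimum(lifetime, max_lifetime)])

def sharp_edge_image(aug_events, t_query, shape):
    """Render only the events that are still 'alive' at time t_query,
    yielding a sharp edge map at an arbitrary time instant."""
    x, y, t, _, life = aug_events.T
    alive = (t <= t_query) & (t_query <= t + life)
    img = np.zeros(shape)
    np.add.at(img, (y[alive].astype(int), x[alive].astype(int)), 1.0)
    return img

# Toy example: events from an edge moving at 200 px/s -> a 5 ms lifetime each.
rng = np.random.default_rng(0)
N = 1000
events = np.column_stack([rng.uniform(0, 240, N), rng.uniform(0, 180, N),
                          rng.uniform(0, 0.1, N), rng.choice([-1.0, 1.0], N)])
vel = np.tile([200.0, 0.0], (N, 1))
aug = augment_with_lifetime(events, vel)
print(aug[0, 4], "s lifetime;",
      int(sharp_edge_image(aug, 0.05, (180, 240)).sum()), "events alive at t=0.05")
```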
References: