This is a topic that I started to develop when I moved to Zurich to work at the Robotics and Perception Group (UZH and ETH).
Event cameras, such as the Dynamic Vision Sensor (DVS), are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the output is composed of a sequence of asynchronous events rather than actual intensity images, traditional vision algorithms cannot be applied, so that new algorithms that exploit the high temporal resolution and the asynchronous nature of the sensor are required.
After the success of the First International Workshop on Event-based Vision at ICRA'17, where we saw a large and growing number of people interested in event-based cameras, we started a List of Event-based Vision Resources. The list collects links to event camera devices as well as papers, videos, code, presentations, etc. describing the algorithms and systems developed using this exciting technology.
We hope the list will help us as well as people interested in this technology to be more aware of past and recent developments by directing them to the appropriate references, which are organized by topics, as shown in the Table of Contents at the top of the list.
Workshops on Event-based Vision
CVPR'19: Second International Workshop on Event-based Vision and Smart Cameras, Long Beach, USA. The slides and videos of the talks are available online!
ICRA'17: First International Workshop on Event-based Vision, Singapore. Video recordings and slides are available!
2017 Misha Mahowald Prize for Neuromorphic Engineering
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
EDS: Event-aided Direct Sparse Odometry
We introduce EDS, a direct monocular visual odometry using events and frames. Our algorithm leverages the event generation model to track the camera motion in the blind time between frames. The method formulates a direct probabilistic approach of observed brightness increments. Per-pixel brightness increments are predicted using a sparse number of selected 3D points and are compared to the events via the brightness increment error to estimate camera motion. The method recovers a semi-dense 3D map using photometric bundle adjustment. EDS is the first method to perform 6-DOF VO using events and frames with a direct approach. By design it overcomes the problem of changing appearance in indirect methods. We also show that, for a target error performance, EDS can work at lower frame rates than state-of-the-art frame-based VO solutions. This opens the door to low-power motion-tracking applications where frames are sparingly triggered "on demand'' and our method tracks the motion in between. We release code and datasets to the public.
ESL: Event-based Structured Light
Event cameras are bio-inspired sensors providing significant advantages over standard cameras such as low latency, high temporal resolution, and high dynamic range. We propose a novel structured-light system using an event camera to tackle the problem of accurate and high-speed depth sensing. Our setup consists of an event camera and a laser-point projector that uniformly illuminates the scene in a raster scanning pattern during 16 ms. Previous methods match events independently of each other, and so they deliver noisy depth estimates at high scanning speeds in the presence of signal latency and jitter. In contrast, we optimize an energy function designed to exploit event correlations, called spatio-temporal consistency. The resulting method is robust to event jitter and therefore performs better at higher scanning speeds. Experiments demonstrate that our method can deal with high-speed motion and outperform state-of-the-art 3D reconstruction methods based on event cameras, reducing the RMSE by 83% on average, for the same acquisition time.
Event-based Motion Segmentation with Spatio-Temporal Graph Cuts
Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages to overcome such limitations. They report pixel-wise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization one involving the fitting of multiple motion models. We jointly solve two subproblems, namely event cluster assignment (labeling) and motion model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and number of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open source license to foster research in the emerging topic of event-based motion segmentation.
The Spatio-Temporal Poisson Point Process: A Simple Model for the Alignment of Event Camera Data
Event cameras, inspired by biological vision systems, provide a natural and data efficient representation of visual information. Visual information is acquired in the form of events that are triggered by local brightness changes. However, because most brightness changes are triggered by relative motion of the camera and the scene, the events recorded at a single sensor location seldom correspond to the same world point. To extract meaningful information from event cameras, it is helpful to register events that were triggered by the same underlying world point. In this work we propose a new model of event data that captures its natural spatio-temporal structure. We start by developing a model for aligned event data. That is, we develop a model for the data as though it has been perfectly registered already. In particular, we model the aligned data as a spatio-temporal Poisson point process. Based on this model, we develop a maximum likelihood approach to registering events that are not yet aligned. That is, we find transformations of the observed events that make them as likely as possible under our model. In particular we extract the camera rotation that leads to the best event alignment. We show new state of the art accuracy for rotational velocity estimation on the DAVIS 240C dataset. In addition, our method is also faster and has lower computational complexity than several competing methods.
Event-based Stereo Visual Odometry
Event-based cameras are bio-inspired vision sensors whose pixels work independently from each other and respond asynchronously to brightness changes, with microsecond resolution. Their advantages make it possible to tackle challenging scenarios in robotics, such as high-speed and high dynamic range scenes. We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig. Our system follows a parallel tracking-and-mapping approach, where novel solutions to each subproblem (3D reconstruction and camera pose estimation) are developed with two objectives in mind: being principled and efficient, for real-time operation with commodity hardware. To this end, we seek to maximize the spatio-temporal consistency of stereo event-based data while using a simple and efficient representation. Specifically, the mapping module builds a semi-dense 3D map of the scene by fusing depth estimates from multiple local viewpoints (obtained by spatio-temporal consistency) in a probabilistic fashion. The tracking module recovers the pose of the stereo rig by solving a registration problem that naturally arises due to the chosen map and event data representation. Experiments on publicly available datasets and on our own recordings demonstrate the versatility of the proposed method in natural scenes with general 6-DoF motion. The system successfully leverages the advantages of event-based cameras to perform visual odometry in challenging illumination conditions, such as low-light and high dynamic range, while running in real-time on a standard CPU. We release the software and dataset under an open source license to foster research in the emerging topic of event-based SLAM.
Focus Is All You Need: Loss Functions for Event-based Vision
Event cameras are novel vision sensors that output pixel-level brightness changes ("events") instead of traditional video frames. These asynchronous sensors offer several advantages over traditional cameras, such as, high temporal resolution, very high dynamic range, and no motion blur. To unlock the potential of such sensors, motion compensation methods have been recently proposed. We present a collection and taxonomy of twenty two objective functions to analyze event alignment in motion compensation approaches. We call them focus loss functions since they have strong connections with functions used in traditional shape-from-focus applications. The proposed loss functions allow bringing mature computer vision tools to the realm of event cameras. We compare the accuracy and runtime performance of all loss functions on a publicly available dataset, and conclude that the variance, the gradient and the Laplacian magnitudes are among the best loss functions. The applicability of the loss functions is shown on multiple tasks: rotational motion, depth and optical flow estimation. The proposed focus loss functions allow to unlock the outstanding properties of event cameras.
Event-Based Motion Segmentation by Motion Compensation
In contrast to traditional cameras, whose pixels have a common exposure time, event-based cameras are novel bio-inspired sensors whose pixels work independently and asynchronously output intensity changes (called "events"'), with microsecond resolution. Since events are caused by the apparent motion of objects, event-based cameras sample visual information based on the scene dynamics and are, therefore, a more natural fit than traditional cameras to acquire motion, especially at high speeds, where traditional cameras suffer from motion blur. However, distinguishing between events caused by different moving objects and by the camera's ego-motion is a challenging task. We present the first per-event segmentation method for splitting a scene into independently moving objects. Our method jointly estimates the event-object associations (i.e., segmentation) and the motion parameters of the objects (or the background) by maximization of an objective function, which builds upon recent results on event-based motion-compensation. We provide a thorough evaluation of our method on a public dataset, outperforming the state-of-the-art by as much as 10%. We also show the first quantitative evaluation of a segmentation algorithm for event cameras, yielding around 90% accuracy at 4 pixels relative displacement.
Event-based, Direct Camera Tracking from a Photometric 3D Map using Nonlinear Optimization
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes, called "events", instead of traditional video images. These asynchronous sensors naturally respond to motion in the scene with very low latency (in the order of microseconds) and have a very high dynamic range. These features, along with a very low power consumption, make event cameras an ideal sensor for fast robot localization and wearable applications, such as AR/VR and gaming. Considering these applications, we present a method to track the 6-DOF pose of an event camera in a known environment, which we contemplate to be described by a photometric 3D map (i.e., intensity plus depth information) built via classic dense 3D reconstruction algorithms. Our approach uses the raw events, directly, without intermediate features, within a maximum-likelihood framework to estimate the camera motion that best explains the events via a generative model. We successfully evaluate the method using both simulated and real data, and show improved results over the state of the art. We release the datasets to the public to foster reproducibility and research in this topic.
Asynchronous, Photometric Feature Tracking using Events and Frames
We present EKLT, a feature tracking method that leverages the complementarity of event cameras and standard cameras to track visual features with low latency. Event cameras are novel sensors that output pixel-level brightness changes, called "events". They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency in the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide low latency updates. In contrast to previous works, which are based on heuristics, this is the first principled method that uses raw intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are both more accurate (subpixel accuracy) and longer than the state of the art, across a wide variety of scenes.
Semi-Dense 3D Reconstruction with a Stereo Event Camera
This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or on prior knowledge about the scene. Experiments demonstrate our method can deal with both texture-rich scenes as well as sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations.
Continuous-Time Visual-Inertial Odometry for Event Cameras
In this paper, we leverage a continuous-time framework to perform trajectory estimation by fusing visual data from a moving event camera with inertial data from an IMU. This framework allows direct integration of the asynchronous events with micro-second accuracy and the inertial measurements at high frequency. The pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines. This formulation significantly reduces the number of variables in trajectory estimation problems. We evaluate our method on real data from several scenes and compare the results against ground truth from a motion-capture system. We show superior performance of the proposed technique compared to non-batch event-based algorithms. We also show that both the map orientation and scale can be recovered accurately by fusing events and inertial data. To the best of our knowledge, this is the first work on visual-inertial fusion with event cameras using a continuous-time framework.
A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation
We present a unifying framework to solve several computer vision problems with event cameras: motion, depth and optical flow estimation. The main idea of our framework is to find the point trajectories on the image plane that are best aligned with the event data by maximizing an objective function: the contrast of an image of warped events. Our method implicitly handles data association between the events, and therefore, does not rely on additional appearance information about the scene. In addition to accurately recovering the motion parameters of the problem, our framework produces motion-corrected edge-like images with high dynamic range that can be used for further scene analysis. The proposed method is not only simple, but more importantly, it is, to the best of our knowledge, the first method that can be successfully applied to such a diverse set of important vision tasks with event cameras.
G. Gallego, H. Rebecq, D. Scaramuzza
A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3867-3876.
Event-based Vision meets Deep Learning on Steering Prediction for Self-driving Cars
Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the best out of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large scale event-camera dataset (~1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g. challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.
Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
This paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. Our method is the first work addressing and demonstrating event-based pose tracking in six degrees-of-freedom (DOF) motions in realistic and natural scenes. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time
We introduce the problem of event-based multi-view stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (1) its ability to respond to scene edges - which naturally provide semi-dense geometric information without any preprocessing operation - and (2) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps, without requiring any explicit data association or intensity estimation. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a CPU. We release the source code.
H. Rebecq, G. Gallego, E. Mueggler, D. Scaramuzza
EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time
International Journal of Computer Vision (IJCV), vol. 126, no. 12, pp. 1394-1414, Dec. 2018.
Special Issue with best (extended) papers from BMVC 2016.
EVO: Event-based, 6-DOF Parallel Tracking and Mapping in Real-Time
We present EVO, an Event-based Visual Odometry algorithm. Our algorithm successfully leverages the outstanding properties of event cameras to track fast camera motions while recovering a semi-dense 3D map of the environment. The implementation runs in real-time on a standard CPU and outputs up to several hundred pose estimates per second. Due to the nature of event cameras, our algorithm is unaffected by motion blur and operates very well in challenging, high dynamic range conditions with strong illumination changes. To achieve this, we combine a novel, event-based tracking approach based on image-to-model alignment with a recent event-based 3D reconstruction algorithm in a parallel fashion. Additionally, we show that the output of our pipeline can be used to reconstruct intensity images from the binary event stream, though our algorithm does not require such intensity information. We believe that this work makes significant progress in SLAM by unlocking the potential of event cameras. This allows us to tackle challenging scenarios that are currently inaccessible to standard cameras.
Accurate Angular Velocity Estimation with an Event Camera
We present an algorithm to estimate the rotational motion of an event camera. In contrast to traditional cameras, which produce images at a fixed rate, event cameras have independent pixels that respond asynchronously to brightness changes, with microsecond resolution. Our method leverages the type of information conveyed by these novel sensors (that is, edges) to directly estimate the angular velocity of the camera, without requiring optical flow or image intensity estimation. The core of the method is a contrast maximization design. The method performs favorably against ground truth data and gyroscopic measurements from an Inertial Measurement Unit, even in the presence of very high-speed motions (close to 1000 deg/s).
We present the world's first collection of datasets with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is several orders of magnitude faster than standard cameras, a latency of microseconds, and a high dynamic range of 130 decibels. These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released both as text files and binary (i.e., rosbag) files. Find out more on the dataset website!
EMVS: Event-based Multi-View Stereo
We introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our EMVS solution elegantly exploits two inherent properties of an event camera: (i) its ability to respond to scene edges --which naturally provide semi-dense geometric information without any preprocessing operation-- and (ii) the fact that it provides continuous measurements as the sensor moves. Despite its simplicity (it can be implemented in a few lines of code), our algorithm is able to produce accurate, semi-dense depth maps. We successfully validate our method on both synthetic and real data. Our method is computationally very efficient and runs in real-time on a laptop CPU and even on a smartphone processor. We release the source code.
Low-Latency Visual Odometry using Event-based Feature Tracks
We develop an event-based feature tracking algorithm for the DAVIS sensor and show how to integrate it in an event-based visual odometry pipeline. Features are first detected in the grayscale frames and then tracked asynchronously using the stream of events. The features are then fed to an event-based visual odometry pipeline that tightly interleaves robust pose optimization and probabilistic mapping. We show that our method successfully tracks the 6-DOF motion of the sensor in natural scenes (see video above).
B. Kueng, E. Mueggler, G. Gallego, D. Scaramuzza
Low-Latency Visual Odometry using Event-based Feature Tracks
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, Oct. 9-14, 2016.
Best Application Paper Award Finalist. Highlight talk: acceptance rate 2.5%
Continuous-Time Trajectory Estimation for Event-based Vision Sensors
In this paper, we address ego-motion estimation for an event-based vision sensor using a continuous-time framework to directly integrate the information conveyed by the sensor. The DVS pose trajectory is approximated by a smooth curve in the space of rigid-body motions using cubic splines and it is optimized according to the observed events. We evaluate our method using datasets acquired from sensor-in-the-loop simulations and onboard a quadrotor performing flips. The results are compared to the ground truth, showing the good performance of the proposed technique.
Event-based Camera Pose Tracking using a Generative Event Model
We tackle the problem of event-based camera localization in a known environment, without additional sensing, using a probabilistic generative event model in a Bayesian filtering framework. Our main contribution is the design of the likelihood function used in the filter to process the observed events. Based on the physical characteristics of the sensor and on empirical evidence of the Gaussian-like distribution of spiked events with respect to the brightness change, we propose to use the contrast residual as a measure of how well the estimated pose of the event-based camera and the environment explain the observed events. The filter allows for localization in the general case of six degrees-of-freedom motions.
Lifetime Estimation of Events from Dynamic Vision Sensors
We develop an algorithm that augments each event with its "lifetime", which is computed from the event's velocity on the image plane. The generated stream of augmented events gives a continuous representation of events in time, hence enabling the design of new algorithms that outperform those based on the accumulation of events over fixed, artificially-chosen time intervals. A direct application of this augmented stream is the construction of sharp gradient (edge-like) images at any time instant. We successfully demonstrate our method in different scenarios, including high-speed quadrotor flips, and compare it to standard visualization methods.