Publications

Journals

Yi Zhou, Guillermo Gallego, Xiyuan Lu, Siqi Liu, and Shaojie Shen

Event-based Motion Segmentation with Spatio-Temporal Graph Cuts

IEEE Transactions on Neural Networks and Learning Systems (2021)

[pdf] [project page] [code]

Abstract:

Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages to overcome such limitations. They report pixelwise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization one involving the fitting of multiple motion models. We jointly solve two subproblems, namely event-cluster assignment (labeling) and motion model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and numbers of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open source license to foster research in the emerging topic of event-based motion segmentation.


Yi Zhou, Guillermo Gallego, and Shaojie Shen

Event-based Stereo Visual Odometry

IEEE Transactions on Robotics (2021)

[pdf] [project page] [code]

Abstract:

Event-based cameras are bio-inspired vision sensors whose pixels work independently from each other and respond asynchronously to brightness changes, with microsecond resolution. Their advantages make it possible to tackle challenging scenarios in robotics, such as high-speed and high dynamic range scenes. We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig. Our system follows a parallel tracking-and-mapping approach, where novel solutions to each subproblem (3D reconstruction and camera pose estimation) are developed with two objectives in mind: being principled and efficient, for real-time operation with commodity hardware. To this end, we seek to maximize the spatio-temporal consistency of stereo event-based data while using a simple and efficient representation. Specifically, the mapping module builds a semi-dense 3D map of the scene by fusing depth estimates from multiple local viewpoints (obtained by spatio-temporal consistency) in a probabilistic fashion. The tracking module recovers the pose of the stereo rig by solving a registration problem that naturally arises due to the chosen map and event data representation. Experiments on publicly available datasets and on our own recordings demonstrate the versatility of the proposed method in natural scenes with general 6-DoF motion. The system successfully leverages the advantages of event-based cameras to perform visual odometry in challenging illumination conditions, such as low-light and high dynamic range, while running in real-time on a standard CPU. We release the software and dataset under an open source license to foster research in the emerging topic of event-based SLAM.
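The probabilistic depth fusion the mapping module performs can be illustrated with a minimal sketch. Assuming, purely for illustration, that each per-viewpoint depth estimate is modelled as an independent Gaussian (the function name is made up, not the paper's API):

```python
def fuse_depth(mu1, var1, mu2, var2):
    """Fuse two Gaussian depth estimates (mean, variance) into one.
    The product of two Gaussians is again Gaussian: precisions
    (inverse variances) add, and the mean is the precision-weighted
    average of the two input means."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var
```

Fusing estimates from many local viewpoints this way drives the variance down monotonically, which is one reason multi-view fusion both densifies and stabilises a semi-dense map.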


Yi Zhou, Laurent Kneip, Hongdong Li

Canny-VO: Visual Odometry With RGB-D Cameras Based on Geometric 3-D–2-D Edge Alignment

IEEE Transactions on Robotics (Volume 35, Issue 1, Feb. 2019)

[pdf] [CannySLAM Video]

Abstract:

This paper reviews the classical problem of free-form curve registration and applies it to an efficient RGB-D visual odometry system called Canny-VO, as it efficiently tracks all Canny edge features extracted from the images. Two replacements for the distance transformation commonly used in edge registration are proposed: approximate nearest neighbor fields and oriented nearest neighbor fields. 3-D-2-D edge alignment benefits from these alternative formulations in terms of both efficiency and accuracy. It removes the need for the more computationally demanding paradigms of data-to-model registration, bilinear interpolation, and subgradient computation. To ensure robustness of the system in the presence of outliers and sensor noise, the registration is formulated as a maximum a posteriori problem and the resulting weighted least-squares objective is solved by the iteratively reweighted least-squares method. A variety of robust weight functions are investigated and the optimal choice is made based on the statistics of the residual errors. Efficiency is furthermore boosted by an adaptively sampled definition of the nearest neighbor fields. Extensive evaluations on public SLAM benchmark sequences demonstrate state-of-the-art performance and an advantage over classical Euclidean distance fields.
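The MAP-to-IRLS machinery the abstract describes is a standard robust-estimation pattern. A minimal, self-contained sketch, using Huber weights as one of the robust weight functions such systems compare (parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def irls(A, b, c=1.345, iters=20):
    """Robust linear regression min_x sum_i rho(r_i), with r = A x - b,
    solved by iteratively reweighted least squares (IRLS)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # plain least-squares start
    for _ in range(iters):
        r = A @ x - b
        # robust scale estimate via the median absolute deviation (MAD)
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)      # Huber weight function
        sw = np.sqrt(w)
        # re-solve the weighted least-squares problem with current weights
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x
```

Swapping in another weight function (Tukey, Cauchy, ...) only changes the `w` line, which is what makes comparing robust weight functions against residual statistics straightforward.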



Conferences

Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, and Davide Scaramuzza.


Semi-dense 3D reconstruction with a stereo event camera

In Proceedings of the European Conference on Computer Vision (ECCV), pp. 235-251. 2018.

[pdf]

Abstract

Event cameras are bio-inspired sensors that offer several advantages, such as low latency, high-speed and high dynamic range, to tackle challenging scenarios in computer vision. This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or on prior knowledge about the scene. Experiments demonstrate our method can deal with both texture-rich scenes as well as sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations.


Yi Zhou, Laurent Kneip, Hongdong Li

Semi-Dense Visual Odometry for RGB-D Cameras Using Approximate Nearest Neighbour Fields

The 2017 IEEE International Conference on Robotics and Automation (ICRA).

[arXiv version] [Video]


Abstract:

This paper presents a robust and efficient semi-dense visual odometry solution for RGB-D cameras. The core of our method is a 2D-3D ICP pipeline which estimates the pose of the sensor by registering the projection of a 3D semi-dense map of a reference frame with the 2D semi-dense region extracted in the current frame. The processing is sped up by efficiently implemented approximate nearest neighbour fields under the Euclidean distance criterion, which permits the use of compact Gauss-Newton updates in the optimization. The registration is formulated as a maximum a posteriori problem to deal with outliers and sensor noise, and the equivalent weighted least-squares problem is consequently solved by the iteratively reweighted least-squares method. A variety of robust weight functions are tested and the optimum is determined based on the probabilistic characteristics of the sensor model. Extensive evaluation on publicly available RGB-D datasets shows that the proposed method predominantly outperforms existing state-of-the-art methods.
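The nearest-neighbour-field idea can be sketched with a two-pass chamfer-style sweep; this is an illustrative approximation, not the paper's implementation. Each grid cell stores the coordinates of a nearby edge pixel rather than only a distance, so 2D-3D registration residuals can be formed directly:

```python
import numpy as np

def ann_field(edge_mask):
    """Approximate nearest-neighbour field over a binary edge image:
    propagate the coordinates of a nearby edge pixel with a forward
    and a backward chamfer-style sweep (8-connected)."""
    H, W = edge_mask.shape
    nn = np.full((H, W, 2), -1, dtype=np.int64)   # nearest edge pixel (y, x)
    d2 = np.full((H, W), np.iinfo(np.int64).max, dtype=np.int64)
    ys, xs = np.nonzero(edge_mask)
    nn[ys, xs, 0], nn[ys, xs, 1] = ys, xs
    d2[ys, xs] = 0

    def relax(i, j, ni, nj):
        # adopt the neighbour's nearest edge pixel if it is closer to (i, j)
        if 0 <= ni < H and 0 <= nj < W and nn[ni, nj, 0] >= 0:
            cy, cx = nn[ni, nj]
            cand = (i - cy) ** 2 + (j - cx) ** 2
            if cand < d2[i, j]:
                d2[i, j] = cand
                nn[i, j] = (cy, cx)

    for i in range(H):                            # forward sweep
        for j in range(W):
            relax(i, j, i - 1, j); relax(i, j, i, j - 1)
            relax(i, j, i - 1, j - 1); relax(i, j, i - 1, j + 1)
    for i in range(H - 1, -1, -1):                # backward sweep
        for j in range(W - 1, -1, -1):
            relax(i, j, i + 1, j); relax(i, j, i, j + 1)
            relax(i, j, i + 1, j + 1); relax(i, j, i + 1, j - 1)
    return nn
```

Because the field stores correspondences instead of interpolated distances, residuals are differentiable point-to-point offsets, which is what permits the compact Gauss-Newton updates mentioned above.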

Yi Zhou, Laurent Kneip, Cristian Rodriguez, Hongdong Li

Divide and Conquer: Efficient Density-Based Tracking of 3D Sensors in Manhattan Worlds

The 13th Asian Conference on Computer Vision (ACCV 2016), Oral presentation

[PDF] [Supplementary Material] [Video1] [Video2] [Sample Code]

Abstract

3D depth sensors such as LIDARs and RGB-D cameras have become a popular choice for indoor localization and mapping. However, due to the lack of direct frame-to-frame correspondences, the tracking traditionally relies on the iterative closest point technique which does not scale well with the number of points. In this paper, we build on top of more recent and efficient density distribution alignment methods, and notably push the idea towards a highly efficient and reliable solution for full 6-DoF motion estimation with only depth information. We propose a divide-and-conquer technique during which the estimation of the rotation and the three degrees of freedom of the translation are all decoupled from one another. The rotation is estimated absolutely and drift-free by exploiting the orthogonal structure in man-made environments. The underlying algorithm is an efficient extension of the mean-shift paradigm to manifold-constrained multiple-mode tracking. Dedicated projections subsequently enable the estimation of the translation through three simple 1D density alignment steps that can be executed in parallel. An extensive evaluation on both simulated and publicly available real datasets comparing several existing methods demonstrates outstanding performance at low computational cost.
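The decoupled per-axis translation step can be illustrated with a toy 1-D density alignment. This is a simplified sketch under illustrative names, not the paper's method: project the points onto one axis, histogram them, and search for the shift that best aligns the two densities (one such call per axis):

```python
import numpy as np

def align_1d_density(p_ref, p_cur, bins=100, rng=(-5.0, 5.0), max_shift=25):
    """Estimate a 1-D translation between two point sets by sliding the
    current projected density histogram over the reference one and
    keeping the integer bin shift with the largest overlap."""
    h_ref, _ = np.histogram(p_ref, bins=bins, range=rng)
    h_cur, _ = np.histogram(p_cur, bins=bins, range=rng)
    bin_w = (rng[1] - rng[0]) / bins
    best_s, best_score = 0, -1.0
    for s in range(-max_shift, max_shift + 1):
        # overlap score between h_ref and h_cur shifted back by s bins
        score = float(np.sum(h_ref * np.roll(h_cur, -s)))
        if score > best_score:
            best_s, best_score = s, score
    return best_s * bin_w
```

Since the three axes are handled independently, the three 1-D searches are trivially parallelizable, which is the efficiency argument made above.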


Yi Zhou, Laurent Kneip, Hongdong Li

Real-Time Rotation Estimation for Dense Depth Sensors in Piece-wise Planar Environments

Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on

[PDF] [Video]

Abstract

Low-drift rotation estimation is a crucial part of any accurate odometry system. In this paper, we focus on the problem of 3D rotation estimation with dense depth sensors in environments that consist of piece-wise planar structures, such as corridors and office rooms. An efficient mean-shift paradigm is developed to extract and track planar modes in the surface normal vector distribution on the unit sphere. Robust and piecewise drift-free behavior is achieved by registering the bundle of planar modes from the current frame with respect to a reference frame using a general ℓ1-norm regression scheme. We furthermore add a memory scheme to the regular birth and death of modes, which further compensates accumulated rotational drift when previously discovered modes are revisited. We discuss the robustness issue and evaluate our algorithm on both custom synthetic as well as real publicly available datasets. Our experimental results demonstrate high robustness and effectiveness of the proposed algorithm.
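The planar-mode extraction can be sketched as a mean-shift iteration on the unit sphere. The following is an illustrative flat-kernel version with made-up parameter names, not the paper's exact algorithm: average the unit normals inside an angular window around the current estimate, re-project onto the sphere, and repeat:

```python
import numpy as np

def spherical_mean_shift(normals, seed, bandwidth_deg=15.0, iters=30):
    """Seek a mode of a distribution of unit surface normals on S^2.
    normals: (N, 3) array of unit vectors; seed: initial direction."""
    m = seed / np.linalg.norm(seed)
    cos_bw = np.cos(np.radians(bandwidth_deg))
    for _ in range(iters):
        in_window = normals @ m > cos_bw       # flat kernel on the sphere
        if not in_window.any():
            break
        m_new = normals[in_window].mean(axis=0)
        m_new /= np.linalg.norm(m_new)         # re-project onto the sphere
        if np.dot(m_new, m) > 1.0 - 1e-9:      # converged
            return m_new
        m = m_new
    return m
```

Running this from several seeds yields the bundle of planar modes; tracking them across frames is then a registration problem between two small sets of directions.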


Yi Zhou, Laurent Kneip, Hongdong Li

A Revisit of Methods for Determining the Fundamental Matrix with Planes

Digital Image Computing: Techniques and Applications (DICTA 2015), 2015 International Conference on

Note: The proof of Property 1 in this version differs slightly from the conference version.

[PDF] [Code]

Abstract

Determining the fundamental matrix from a collection of inter-frame homographies (more than two) is a classical problem. The compatibility relationship between the fundamental matrix and any of the ideally consistent homographies can be used to compute the fundamental matrix. Using the direct linear transformation (DLT), the compatibility equation can be translated into a least squares problem and easily solved via singular value decomposition (SVD). However, this solution is extremely susceptible to noise and motion inconsistencies, hence rarely used. Inspired by the normalized eight-point algorithm, we show that a relatively simple but non-trivial two-step normalization of the input homographies achieves the desired effect, and the results are at least comparable to the less attractive hallucinated points method. The algorithm is theoretically justified and verified by experiments on both synthetic and real data.
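The compatibility relation referred to above is the standard constraint H^T F + F^T H = 0 between a fundamental matrix F and a homography H consistent with it. A toy DLT solver for the noise-free case, deliberately without the paper's two-step normalization (illustrative only):

```python
import numpy as np

def fundamental_from_homographies(Hs):
    """DLT on the compatibility constraint H^T F + F^T H = 0.
    Each homography yields 6 linear equations in the 9 entries of F
    (the constraint matrix is symmetric); F is recovered, up to scale,
    as the null vector of the stacked system via SVD."""
    rows = []
    for H in Hs:
        for i in range(3):
            for j in range(i, 3):
                # equation: sum_k H[k,i]*F[k,j] + F[k,i]*H[k,j] = 0
                r = np.zeros((3, 3))
                r[:, j] += H[:, i]
                r[:, i] += H[:, j]
                rows.append(r.ravel())
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    F = Vt[-1].reshape(3, 3)
    return F / np.linalg.norm(F)
```

On noisy, inconsistent homographies this plain DLT degrades quickly, which is exactly the weakness the paper's normalization of the input homographies addresses.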


Laurent Kneip, Yi Zhou, Hongdong Li

SDICP: Semi-Dense Tracking based on Iterative Closest Points

The 26th British Machine Vision Conference (BMVC 2015), Swansea, UK

[PDF] [SDICP(BMVC)] [Video]

Abstract

This paper introduces a novel strategy for real-time monocular camera tracking over the recently introduced, efficient semi-dense depth maps. We employ a geometric iterative closest point technique instead of a photometric error criterion, which has the conceptual advantage of requiring neither isotropic enlargement of the employed semi-dense regions, nor pyramidal subsampling. We outline the detailed concepts leading to robustness and efficiency even for large frame-to-frame disparities. We demonstrate successful real-time processing over very large view-point changes and significantly corrupted semi-dense depth-maps, thus underlining the validity of our geometric approach.
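The geometric ICP idea can be illustrated with a toy 2-D point-set version (a simplified sketch, not the paper's semi-dense pipeline): alternate nearest-neighbour association with a closed-form rigid alignment until the sets coincide:

```python
import numpy as np

def icp_2d(src, dst, iters=20):
    """Toy geometric ICP in 2D: alternate brute-force nearest-neighbour
    association and closed-form rigid alignment (Kabsch), returning a
    copy of src aligned to dst."""
    src = src.copy()
    for _ in range(iters):
        # nearest-neighbour correspondences, brute force
        d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[np.argmin(d, axis=1)]
        # closed-form rigid transform between centred point sets (Kabsch)
        cs, cm = src.mean(axis=0), matched.mean(axis=0)
        M = (src - cs).T @ (matched - cm)
        U, _, Vt = np.linalg.svd(M)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:       # guard against reflections
            Vt[-1] *= -1.0
            R = Vt.T @ U.T
        t = cm - R @ cs
        src = src @ R.T + t            # apply the update and iterate
    return src
```

The geometric residual here is a point-to-point distance rather than a photometric error, which is why no intensity pyramid or region enlargement is needed for convergence.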



Previous Works

  • Chao-Lei Wang, Tian-Miao Wang, Jian-Hong Liang, Yi-Cheng Zhang, Yi Zhou. Bearing-only Visual SLAM for Small Unmanned Aerial Vehicles in GPS-denied Environments. International Journal of Automation and Computing (IJAC), pp. 387-396, 2013. [pdf]

  • Han Gao, Tianmiao Wang, Jianhong Liang, Yi Zhou. Model Adaptive Gait Scheme Based on Evolutionary Algorithm. Industrial Electronics and Applications (ICIEA), 2013 8th IEEE Conference on, pp. 316 - 321, 2013. [pdf] [video]

  • Chaolei Wang, Tianmiao Wang, Jianhong Liang, Yicheng Zhang, Yi Zhou. Research on monocular visual FastSLAM for a small unmanned helicopter. Chinese High Technology Letters, pp. 1061-1067, 2013.

  • Yi Zhou, Tianmiao Wang, Jianhong Liang, Chaolei Wang, Yicheng Zhang. Structural target recognition algorithm for visual guidance of small unmanned helicopters. Robotics and Biomimetics (ROBIO), 2012 IEEE International Conference on, pp. 908 - 913, 2012. [pdf] [video]

  • Yicheng Zhang, Tianmiao Wang, Jianhong Liang, Chaolei Wang, Yang Chen, Yi Zhou, Yubao Luan, Han Gao. An implement of RPV control system for small unmanned helicopters. Robotics and Biomimetics (ROBIO), 2012 IEEE International Conference on, pp. 1141 - 1145, 2012. [pdf]