Research

We work on various problems in
geometric and semantic computer vision and machine learning,
with applications to mobile robots, automobiles, and augmented/virtual reality.

Visual SLAM / Visual Inertial Odometry

ROVO: Robust Omnidirectional Visual Odometry

In this paper we propose a robust visual odometry system for a wide-baseline camera rig with wide field-of-view (FOV) fisheye lenses, which provides full omnidirectional stereo observations of the environment. For more robust and accurate ego-motion estimation, we add three components to the standard VO pipeline: 1) a hybrid projection model for improved feature matching, 2) a multi-view P3P RANSAC algorithm for pose estimation, and 3) online updating of the rig extrinsic parameters. The proposed system is extensively evaluated on synthetic datasets with ground truth and on real sequences captured in highly dynamic environments, and its superior performance is demonstrated.
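
The multi-view P3P RANSAC step above follows the standard hypothesize-and-verify pattern: sample a minimal set of correspondences, solve for a candidate pose, and keep the hypothesis with the most inliers. The sketch below illustrates that generic RANSAC loop on a deliberately simple 2D line model rather than the actual P3P pose solver; the function names and data are invented for illustration only.

```python
import random

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    """Generic RANSAC loop: sample a minimal set, fit a candidate model,
    count inliers, and keep the best-scoring hypothesis."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample (2 points)
        if x1 == x2:
            continue  # degenerate sample, skip
        a = (y2 - y1) / (x2 - x1)  # slope of the candidate line
        b = y1 - a * x1            # intercept
        inliers = [p for p in points if abs(p[1] - (a * p[0] + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Synthetic data: 20 points on y = 2x + 1 plus two gross outliers.
pts = [(x / 10, 2 * (x / 10) + 1) for x in range(20)] + [(0.5, 9.0), (1.2, -5.0)]
model, inliers = ransac_line(pts)
```

In the paper's setting, the minimal solver is multi-view P3P and the inlier test is a reprojection-error threshold, but the surrounding loop has the same shape.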

Visual Inertial Odometry Using Coupled Nonlinear Optimization

Visual inertial odometry (VIO) has recently gained considerable interest for efficient and accurate ego-motion estimation of robots and automobiles. With a monocular camera and a rigidly attached inertial measurement unit (IMU), VIO aims to estimate the 3D pose trajectory of the device in a global metric space. We propose a novel visual inertial odometry algorithm that directly optimizes the camera poses over noisy IMU data and visual feature locations.
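
The coupled optimization above can be pictured as minimizing one joint cost that mixes IMU-derived relative-motion residuals with visual residuals. The toy sketch below does this for a 1D trajectory with simple gradient descent; the weights, noise values, and measurement model are all invented for illustration and are far simpler than the paper's actual formulation.

```python
def vio_cost_grad(x, imu_rel, vis_abs, w_imu=1.0, w_vis=0.5):
    """Gradient of the coupled cost
    J(x) = w_imu * sum_k (x[k+1] - x[k] - imu_rel[k])^2   (IMU relative motion)
         + w_vis * sum_k (x[k] - vis_abs[k])^2            (visual measurement)
    """
    n = len(x)
    g = [0.0] * n
    for k in range(n - 1):
        r = x[k + 1] - x[k] - imu_rel[k]
        g[k + 1] += 2 * w_imu * r
        g[k] -= 2 * w_imu * r
    for k in range(n):
        g[k] += 2 * w_vis * (x[k] - vis_abs[k])
    return g

def optimize(imu_rel, vis_abs, iters=2000, step=0.05):
    x = list(vis_abs)  # initialize poses from the visual measurements
    for _ in range(iters):
        g = vio_cost_grad(x, imu_rel, vis_abs)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# True 1D positions are 0,1,2,3,4; both sensors are slightly noisy.
imu_rel = [1.02, 0.98, 1.01, 0.99]
vis_abs = [0.0, 1.1, 1.9, 3.05, 3.95]
x = optimize(imu_rel, vis_abs)
```

Because both residual types appear in one cost, the recovered trajectory is smoother than the raw visual measurements while staying metrically anchored, which is the core idea of coupling the two sensors.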

Online Environment Mapping (Metric-topological map)

Building a map of the environment for localization and navigation is critical for scene understanding and robot operation. We propose a metric-topological mapping approach that combines the benefits of both metric maps and topological maps.

Camera Motion Estimation using Points and Lines

Point features are commonly used for structure from motion and ego-motion estimation. We investigated faster and more robust ways to use line features for motion estimation of a stereo camera rig.

Depth Estimation / 3D Modeling

Robust stereo matching using adaptive random walk with restart algorithm

In this paper, we propose a robust dense stereo reconstruction algorithm based on a random walk with restart. Pixel-wise matching costs are aggregated into superpixels, and a modified random walk with restart algorithm updates the matching costs for all possible disparities between superpixels.
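
The cost-update step above builds on the plain random walk with restart iteration, r ← (1 - c) Wᵀ r + c r₀, where W is a row-stochastic transition matrix over superpixels and r₀ is the restart distribution. The sketch below shows that basic iteration on a toy 4-node graph; it is not the paper's modified variant, and the graph and restart constant are invented for illustration.

```python
def random_walk_with_restart(W, r0, c=0.15, iters=100):
    """Plain RWR: r <- (1 - c) * W^T r + c * r0, with W row-stochastic.
    r converges to the steady-state relevance of each node to the seed r0."""
    n = len(r0)
    r = list(r0)
    for _ in range(iters):
        new_r = [c * r0[j] for j in range(n)]  # restart term
        for i in range(n):
            for j in range(n):
                new_r[j] += (1 - c) * W[i][j] * r[i]  # diffusion term
        r = new_r
    return r

# Toy 4-node chain graph with a row-normalized transition matrix.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
r0 = [1.0, 0.0, 0.0, 0.0]  # restart distribution seeded at node 0
r = random_walk_with_restart(W, r0)
```

In the stereo setting, nodes are superpixels, edge weights encode appearance similarity, and the converged distribution smooths the matching costs across similar regions while the restart term keeps them anchored to the original data costs.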

Visual Object Tracking

Tracking Persons-of-Interest via Adaptive Discriminative Features

Multi-face tracking in unconstrained videos is a challenging problem, as the faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective at identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs).

Visual Tracking Benchmark

Over the decades, many visual trackers have been proposed, but little effort has been made to quantitatively measure and compare their performance. In this work we provide a dataset of common test videos with hand-labeled ground truth. A tracker library with a standardized interface for large-scale evaluation enables researchers to easily test and compare their trackers against the state of the art.

Deep Learning

DFT-based Transformation Invariant Pooling Layer for Visual Classification

We propose a DFT-based pooling layer for convolutional neural networks. The proposed DFT magnitude pooling satisfies translation-invariance and shape-preserving properties: based on the shift theorem, it pools the DFT magnitude of the last convolutional feature map. Convolutional neural networks with the proposed layer improve performance on various visual classification tasks, and we validate its transformation invariance through extensive experiments.
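
The shift theorem behind this pooling says that a circular shift of the input changes only the phase of its DFT, leaving the magnitude spectrum unchanged; pooling magnitudes therefore gives translation invariance. The pure-Python sketch below demonstrates this in 1D with a naive DFT (the paper applies a 2D DFT to feature maps); the example signal is invented for illustration.

```python
import cmath

def dft_magnitude(x):
    """Naive 1D DFT, returning the magnitudes |X[k]| of the spectrum."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

signal = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
shifted = signal[3:] + signal[:3]  # circular shift by 3 samples
mag_a = dft_magnitude(signal)
mag_b = dft_magnitude(shifted)
# By the shift theorem, mag_a and mag_b agree up to rounding error:
# the shift only multiplies each X[k] by a unit-magnitude phase factor.
```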

Research Project Pages