Members:
Mauricio Hess-Flores, Mark A. Duchaineau, Michael J. Goldman, Kenneth I. Joy
Abstract
A novel method to detect and correct inaccuracies in a set of unconstrained dense correspondences between two images is presented. Starting with a robust, general-purpose dense correspondence algorithm, an initial pose estimate and dense 3D scene reconstruction are obtained and bundle-adjusted. Reprojection errors are then computed for each correspondence pair, which is used as a metric to distinguish high and low-error correspondences. An affine neighborhood-based coarse-to-fine iterative algorithm for search is then applied only on the high-error correspondences to correct their positions. Such an error detection and correction mechanism is not possible for constrained dense correspondences, obtained for example through epipolar geometry-based guided matching, and is novel for unconstrained algorithms. Results indicate that correspondences in regions with issues such as occlusions, repetitive patterns and moving objects can be identified and corrected, such that a more accurate set of dense correspondences results from the feedback-based process, as proven by more accurate pose and structure estimates.
Overview
The computation of dense image correspondences is important in many modern applications, such as dense scene reconstruction, segmentation of background and moving objects, image stitching and data compression. These suffer from inaccuracies which don't affect sparse approaches, due mainly to occlusions, moving objects and repetitive patterns. We present a novel method to detect and correct inaccurate unconstrained dense correspondences, without any need for ground-truth knowledge.
The process can be summarized as follows. Per-pixel reprojection errors are determined after bundle-adjusting pose and structure computed from a set of input dense correspondences (1). An unconstrained dense correspondence solver [Duchaineau07] was used for all tests, while the SBA software package [Lourakis00] was used to perform sparse bundle adjustment. Higher errors are shown in lighter colors, and are present mainly in regions with the 'aperture problem', such as train tracks and highways (red circle), near occlusion edges (green circle) and near the edges of the image (blue circle). High-error correspondences (outliers) are then detected by thresholding reprojection error values (2). An appropriate threshold is the mean plus 1.5 standard deviations of all the reprojection errors. Correction is applied using a brute-force local affine registration algorithm (3). An overall more-accurate set of correspondences (4) results in better pose and structure estimates (5).
Figure 1: Summary of the error detection and correction process, performed between a pair of images (top) from the 'Walnut Creek' dataset.
The correction mechanism works as follows. For a given outlier correspondence pair, the objective is to correct the position of the match in the second (target) image to the information in the first (source) image, while keeping the position in the first image fixed, to find a better match than the one currently available. The algorithm works on a coarse-to-fine resolution pyramid, where a fixed amount of iterations (typically hundreds) is applied per resolution level, such that the pixel count doubles at each level. After constructing the hierarchy, a sub-pixel accurate iterative, three-phase algorithm is applied at successively finer levels. Each iteration consists of perturbation, matching (based on gradient descent) and affine-fitting phases. The resulting transformation for level i of the hierarchy is used as a starting prediction at level i+1.
Starting at the coarsest level, a fixed-size image chip from the source image is centered at the start position on the target image. In testing, it was determined that an 11 x 11 chip size provided the best combination of accuracy and processing time. The first phase of one iteration, perturbation, consists of adding noise to the source image chip in order to avoid local minima which could possibly occur in the next phase, which is based on gradient descent. In this matching phase, for each pixel of the source image chip, a local gradient is computed at its current position in the target image. This gradient is used to make a linear prediction of the direction and distance to move the source pixel in the target image to match its intensity. Each pixel moves independently in this phase. For robustness, the movement step size is only a fraction of a pixel, and further modified according to the magnitude of the gradient. As the gradient magnitude becomes small (as determined by an adaptive threshold), the gradient direction becomes more noise than signal, and such pixels are eliminated from use in the next phase. In the final phase, affine-fitting, a least-squares fit is applied to find an affine transformation to be applied to the source image chip. Only those pixels inside the chip that were not eliminated during the matching phase are used. Eliminated pixels The three-phase process is iterated a number of times at this coarsest level first and then at successively higher resolutions, resulting in a new and more accurate correspondence position in the target image once completed.
The process is illustrated in Figure 2. For an aerial view of a small section of a road with vehicles in the Walnut Creek dataset, the upper left image shows the initial position of the image chip, where gradients are color-coded such that the largest gradients are displayed in lighter colors. Results of the three-phase algorithm are also illustrated for a given iteration: the lower left image shows the result of noise perturbation followed by matching, where the image depicts (via tilts in the pixels) the direction and also the movement of each individual pixel, and the lower right image shows the affine fit computed from this information. The upper right image shows marked with a green 'X' those pixels that were eliminated in the matching phase.
Figure 2: Dense correspondence affine correction mechanism, with higher gradients in lighter colors. Starting position (upper left), perturbation plus matching (lower left), affine fitting (lower right) and eliminated pixels in green (upper right).
Results
For the synthetic Coneland dataset, the pose error ΔR for rotation and ΔT for translation (in degrees) were obtained with respect to ground-truth values, using the original versus corrected dense correspondences. Results show an improvement in the pose. Outlier percentage, average outlier error μE (pixels) before correction and error improvement percentage ΔE were obtained for a number of test datasets, further showing that the final set of correspondences is more accurate after applying our method; please refer to the paper for detailed numerical results. Visual results obtained for the Coneland synthetic dataset are shown in Figure 3.
Figure 3: Reprojection errors between two frames from the 'Coneland' dataset (bottom left), detected and corrected outliers (bottom middle), and errors for the resulting set of correspondences (bottom right). Input images are shown along the top row.
Future Work
Future work includes the implementation of adaptive neighborhood sizes to improve accuracy in texture-less regions. Also, the generalization to multiple views, in order to prevent error propagation in sequential processing scenarios. Hardware solutions (such as the use of GPUs) for expensive processes such as bundle adjustment and outlier correction will also be investigated.
Related Publications
Mauricio Hess-Flores, Mark A. Duchaineau, Michael J. Goldman, Kenneth I. Joy. Iterative Dense Correspondence Error Correction Through Bundle Adjustment Feedback-Based Error Detection. VISAPP (1) 2010: 400-405.
[Duchaineau07] Duchaineau, M., Cohen, J., and Vaidya, S. (2007). Toward Fast Computation of Dense Image Correspondence on the GPU. In Proceedings of HPEC 2007, High Performance Embedded Computing, Eleventh Annual Workshop, pages 91-92, Lincoln Laboratory, Massachusetts Institute of Technology.
[Lourakis00] Lourakis, M. and Argyros, A. (2000). The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm. Technical Report 340, Institute of Computer Science - FORTH, Heraklion, Crete, Greece.