Members:
Shawn Recker, Mauricio Hess-Flores, Mark A. Duchaineau, Kenneth I. Joy
Abstract
We present the general idea of using common tools from the field of scientific visualization to aid in the design, implementation, and testing of computer vision algorithms, as a complementary and educational component to purely mathematics-based algorithms and results. Interaction between these two broad disciplines has been largely absent from the literature, and through initial work we have been able to demonstrate the benefits of merging visualization techniques into vision. Specific examples and results are discussed, such as uncertainty and sensitivity analysis based on scalar fields, along with a discussion of future work along these lines.
I. Introduction
The field of computer vision has seen great advances in recent years, in areas such as object detection and tracking and the multi-view reconstruction of scenes. Algorithms such as the Scale-Invariant Feature Transform (SIFT) [Lowe04] have allowed for very accurate feature detection and tracking, the main components behind such applications. For an excellent overview of many classical vision algorithms, the reader is referred to Hartley and Zisserman [HartleyZisserman04]. One drawback of current computer vision methods is that many rely on mathematical optimization of initial parameter estimates to achieve accurate results. Though such optimization is provably necessary, as in the case of the well-known bundle adjustment [Triggs00] in structure-from-motion, little attention has been paid to how individual values and their patterns affect the total cost.
The main objective of our work is to introduce visualization techniques to the vision community as a powerful educational and algorithm design/test tool, allowing for visually aided numerical exploration of the solution space in a number of applications. Visualization as a field aims to provide renderings of volumes and surfaces, including time-dependent ones, to graphically illustrate scientific data and aid its understanding. Concrete applications and results are discussed in Section II, such as the use of scalar fields in understanding scene reconstruction uncertainty and parameter sensitivity, as well as feature track summaries. Proposed future work is discussed in Section III, with conclusions in Section IV. A short paper summarizing our work in this area to date is provided in Hess-Flores et al. [HessFlores13].
II. Initial Applications
A. Visualization of Scene Structure Uncertainty
One application of visualization in computer vision is a novel tool we have designed for analyzing scene structure uncertainty and its sensitivity to different multi-view scene reconstruction parameters [Recker12]. Multi-view scene reconstruction is an important sub-field of computer vision, which aims to extract a 3D point cloud representing a scene; detailed analysis and comparisons between methods are available in the literature [Seitz06]. Our tool creates a scalar field volume rendering that provides insight into the structural uncertainty of a given 3D point cloud position, displaying the error pattern for sampled neighbors enclosed in a bounding box. A screenshot of the tool is shown in Figure 1. The combined statistical, visual, and isosurface information in the right-hand panel gives the user insight into the uncertainty of computed structure given the stages leading up to its computation, such as frame decimation, feature tracking, and self-calibration, where larger regions of low uncertainty indicate robustness. Sensitivity is defined as the change in scalar field values as a specific reconstruction parameter's value changes. The scalar field is created at a user-specified size and resolution from angular error measurements. For a reconstruction from the images in Figure 2(a), a selected position (in red) enclosed by a green bounded region is shown in Figure 2(b) and magnified in Figure 2(c), with camera positions rendered in blue. The scalar field corresponding to the bounded region, shown in Figure 2(d), depicts lower positional uncertainties in red and higher ones in yellow and green. The red user-defined isosurface, which contains the ground-truth structure position, depicts the region of lowest uncertainty. The visible column-like shape indicates that lower uncertainty occurs along the cameras' lines of sight.
User interaction allows for uncertainty and sensitivity analysis in ways that have traditionally been achieved mathematically, without any visual aid.
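As a concrete sketch of the scalar field construction described above, the snippet below samples an angular-error cost on a regular grid inside a bounding box around a reconstructed point. The cost used here (mean angle between each camera-to-sample ray and the observed unit viewing direction) is a simplified stand-in for the angular error measure of [Recker12]; the function names and the brute-force grid sampling are illustrative assumptions, not the tool's actual implementation.

```python
import numpy as np

def angular_error(point, cam_centers, obs_dirs):
    """Mean angle (radians) between the ray from each camera center to
    `point` and that camera's observed unit viewing direction."""
    total = 0.0
    for c, d in zip(cam_centers, obs_dirs):
        ray = point - c
        ray = ray / np.linalg.norm(ray)
        total += np.arccos(np.clip(np.dot(ray, d), -1.0, 1.0))
    return total / len(cam_centers)

def scalar_field(center, half_size, res, cam_centers, obs_dirs):
    """Sample the cost on a res x res x res grid inside a bounding box
    of half-width `half_size` around `center` (the selected 3D point)."""
    axis = np.linspace(-half_size, half_size, res)
    field = np.empty((res, res, res))
    for i, x in enumerate(axis):
        for j, y in enumerate(axis):
            for k, z in enumerate(axis):
                p = center + np.array([x, y, z])
                field[i, j, k] = angular_error(p, cam_centers, obs_dirs)
    return field
```

The resulting volume is what a visualization tool would render directly, e.g. with a color transfer function and user-selected isosurfaces; the cost is zero exactly where all viewing rays intersect, and grows slowly along the lines of sight, producing the column-like low-uncertainty region described above.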
Figure 1: Our user-interactive tool for uncertainty and sensitivity analysis in multi-view scene reconstruction, based on a scalar field analysis adopted from the visualization literature.
Figure 2: Our uncertainty visualization framework. From input images in (a), an initial point cloud reconstruction is computed in (b), with estimated camera positions as blue dots. A position on the point cloud is chosen, highlighted in red in (c) and surrounded by a bounding box for scalar-field analysis. The resulting field is shown in (d).
B. Tracking Summaries
Another application is the creation of feature tracking summaries [Recker13]. A feature track is the set of pixel positions representing a scene point tracked over a set of images. A summary can be created by stacking frames vertically and observing the 'path' taken by a track. Given a 3D position computed from multi-view stereo, its reprojection error with respect to its corresponding feature track is the only available metric for assessing accuracy in the absence of ground truth. However, the total reprojection error in itself does not allow the researcher to attribute the error to a specific cause, for example an individual bad match. We present a novel visualization technique that encodes all individual reprojection errors in one such summary rendering, which provides insight into track degeneration patterns over time (in sequential reconstructions) as well as information about highly inaccurate individual track positions, all of which adversely affect camera pose estimation and structure computation. These summaries allow the user to infer the cause of a bad track, which the total error alone cannot provide, letting researchers analyze errors in ways previously unachieved. Figure 3 shows an example.
Figure 3: Feature track summary for Oxford VGG's Dinosaur dataset. Low individual reprojection errors are depicted in blue, with higher ones in green (a). Overall values are much smaller after bundle adjustment (b).
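To make the per-frame quantity behind these summaries concrete, the sketch below computes the individual reprojection errors of a single feature track, assuming known 3x4 camera projection matrices. These per-frame values are what a summary rendering would color-map (e.g., blue for low error, green for high); the helper names are our own illustrative choices, not the published implementation.

```python
import numpy as np

def reproject(P, X):
    """Project a homogeneous 3D point X with a 3x4 camera matrix P,
    returning inhomogeneous pixel coordinates."""
    x = P @ X
    return x[:2] / x[2]

def track_errors(point3d, cameras, track):
    """Per-frame reprojection errors for one feature track.
    `cameras` is a list of 3x4 projection matrices and `track` the
    observed pixel positions of the same scene point in each frame."""
    X = np.append(point3d, 1.0)  # homogenize
    return [np.linalg.norm(reproject(P, X) - np.asarray(obs))
            for P, obs in zip(cameras, track)]
```

Summing the squares of these values gives the usual total reprojection error, but keeping them separate is what lets a summary expose, say, one badly matched frame in an otherwise clean track.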
III. Future Work
We are excited about the wealth of possibilities in applying visualization techniques to computer vision problems beyond those described above. One such application is `visualizing' fields of covariance matrices, analyzing the covariance descriptors of all pixels at once. Such matrices are fundamental in computer vision, and yet the numerical data they provide is not easy to summarize and interpret. By visualizing, for example, the change over time of parameters related to such matrices, insight can be gained into specific values and trends. Looking further, object detection summaries, video summaries, and bundle adjustment convergence visualization are also possible applications.
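As a rough illustration of what such a per-pixel covariance field might contain, the sketch below builds, for every interior pixel, the covariance matrix of a simple feature vector [x, y, intensity, |dI/dx|, |dI/dy|] over a local window, in the spirit of region covariance descriptors. The feature choice, window size, and function name are assumptions for illustration only; the open question raised above is how to visualize the resulting field of matrices, not how to compute it.

```python
import numpy as np

def covariance_field(img, win=5):
    """One 5x5 covariance descriptor per pixel of a grayscale image,
    computed over a win x win window (borders left as zero matrices)."""
    h, w = img.shape
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, cols
    ys, xs = np.mgrid[0:h, 0:w]
    # Per-pixel feature vector: [x, y, intensity, |dI/dx|, |dI/dy|]
    feats = np.stack([xs, ys, img, np.abs(gx), np.abs(gy)], axis=-1).astype(float)
    r = win // 2
    field = np.zeros((h, w, 5, 5))
    for i in range(r, h - r):
        for j in range(r, w - r):
            window = feats[i - r:i + r + 1, j - r:j + r + 1].reshape(-1, 5)
            field[i, j] = np.cov(window, rowvar=False)
    return field
```

Each descriptor is a symmetric positive semi-definite matrix, so tensor-field visualization techniques (e.g., glyphs derived from the eigendecomposition) are natural candidates for rendering the field as a whole.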
Related Publications
[Recker13] Shawn Recker, Mauricio Hess-Flores, Kenneth I. Joy, "Feature Track Summary Visualization for Multi-View Reconstruction", in "IEEE Applied Imagery Pattern Recognition (AIPR) Workshop", 2013.
[HessFlores13] Mauricio Hess-Flores, Shawn Recker, Kenneth I. Joy, Mark A. Duchaineau, "Visualization Methods for Computer Vision Analysis", in "Fifth International Conference on Pervasive Patterns and Applications (PATTERNS 2013)", Valencia, Spain, 2013.
[Recker12] Shawn Recker, Mauricio Hess-Flores, Mark A. Duchaineau, Kenneth I. Joy, "Visualization of Scene Structure Uncertainty in a Multi-View Reconstruction Pipeline", in "Vision, Modeling and Visualization Workshop (VMV 2012)", pp 183-190, 2012.
[HartleyZisserman04] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.
[Lowe04] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[Seitz06] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms," in CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, DC, USA: IEEE Computer Society, 2006, pp. 519-528.
[Triggs00] B. Triggs, P. McLauchlan, R. I. Hartley, and A. Fitzgibbon, "Bundle Adjustment - A Modern Synthesis," in ICCV '99: Proceedings of the International Workshop on Vision Algorithms. London, UK: Springer-Verlag, 2000, pp. 298-372.