Course Description

Camera geometry estimation is a crucial task in computer vision with many applications, e.g., in structure-from-motion (SfM) (and, by extension, areas relying on SfM as input, e.g., neural radiance fields), visual navigation, augmented and mixed reality, self-driving cars, large-scale 3D reconstruction, and visual localization. Due to the presence of noise and outliers in the input data, e.g., pixel-level correspondences, the predominant approach to camera geometry estimation is a hypothesize-and-verify framework such as RANSAC. In RANSAC-like methods, two different types of solvers are used: (1) one for fitting a model to a minimal sample, and (2) one for refining a model on (a non-minimal subset of) all inliers, e.g., for final refinement or local optimization. For (1), the main objective is to solve the problem using as few correspondences as possible, since the number of RANSAC iterations (and hence the run-time) grows exponentially with the number of correspondences required for model estimation.
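
To make this exponential dependence concrete, the standard RANSAC stopping criterion computes how many random minimal samples are needed to draw at least one outlier-free sample with a given confidence. A minimal sketch (function and variable names are ours):

    import math

    def ransac_iterations(confidence: float, inlier_ratio: float, sample_size: int) -> int:
        """Number of samples needed to draw at least one all-inlier minimal
        sample with the given confidence (standard RANSAC bound)."""
        # Probability that a single random minimal sample is outlier-free.
        p_good = inlier_ratio ** sample_size
        if p_good <= 0.0:
            return int(1e9)  # effectively unbounded
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

    # With 50% inliers and 99% confidence, a 5-point solver needs about 146
    # samples, while an 8-point sample would already need about 1177.
    print(ransac_iterations(0.99, 0.5, 5), ransac_iterations(0.99, 0.5, 8))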

Minimal problems often result in complex systems of polynomial equations in several variables. The introduction of algebraic methods, i.e., Gröbner basis and resultant-based methods for generating efficient polynomial solvers, into the computer vision community led to solutions to many previously unsolved problems, e.g., the relative and absolute pose problems for cameras with unknown focal length, unknown radial distortion, rolling shutter, and generalized and semi-generalized cameras. In addition, these methods resulted in solvers that can exploit the local image geometry of features, e.g., SIFT or affine features, use lines instead of points, combine lines and points or 2D-2D and 2D-3D matches, or specialize in certain types of camera motion (planar motion, etc.).
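
As a concrete, well-known example: in the calibrated relative pose (5-point) problem, five point correspondences give five linear epipolar constraints x'^T E x = 0 on the essential matrix E, which must additionally satisfy the polynomial constraints

    det(E) = 0,    2 E E^T E - tr(E E^T) E = 0.

After parameterizing E by the four-dimensional null space of the linear constraints, these become ten cubic equations in three unknowns; systems of exactly this kind are what Gröbner basis and resultant-based solver generators are designed to handle.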

While significant progress has been made on minimal and non-minimal solvers over the last decade, current applications surprisingly still rely on classical solvers (e.g., the well-known 5-point relative pose and P3P solvers) and do not use more modern ones. For example, the widely used SfM system COLMAP does not estimate focal length and radial distortion parameters during relative or absolute pose estimation but rather in a post-processing step. As a result, COLMAP (like many other SfM pipelines) struggles in the presence of strong radial distortion.

The aim of this tutorial is to raise awareness of the tools (solvers and solver generators) that are nowadays at the disposal of 3D vision researchers and practitioners, and of which problems can and cannot be efficiently solved at the moment. To this end, the tutorial will discuss current state-of-the-art minimal and non-minimal solvers, explain how to implement them and how to use them in practice, and give examples of their use in applications. The tutorial has three goals: 1) Provide a comprehensive overview of the current state of the art. At the same time, the tutorial will function as an introduction to the field, e.g., for first- and second-year students. 2) Have experts teach the tricks of the trade to more experienced PhD students and engineers who want to refine their knowledge. 3) Highlight current open problems, thereby outlining what current algorithms can and cannot do. Throughout the tutorial, we provide links to publicly available source code for the discussed approaches.

Part I: Introduction and overview of current camera geometry estimation solvers

We will briefly introduce the most common camera geometry estimation problems, including relative and absolute pose problems for calibrated, uncalibrated, and partially calibrated cameras. Starting with a short historical overview, we will then discuss the current state of the art for these problems. This includes highlighting the challenges faced when aiming for efficient and robust solutions for camera geometry estimation.

Part II: Classification of solvers

Part II.1 (Solvers for different types of cameras) While the introduction focuses on "standard" perspective cameras, this part will introduce and discuss more advanced camera models. We will explain how to model multi-camera systems or reconstructed image sequences as so-called generalized cameras and how to estimate absolute and relative poses for them. A highly relevant special case of generalized relative pose estimation is when one of the two cameras is a perspective camera. This task is, e.g., part of the SfM and visual localization problems. We will introduce efficient solutions, including for the case where the scene is partially planar. Finally, we will discuss absolute pose estimation solvers for cameras with rolling shutter. Rolling shutter cameras are practically everywhere, e.g., in smartphones, and lead to very complex camera models and very challenging estimation problems. We will introduce efficient solutions that combine iterative methods with concepts from algebraic geometry.
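
As a minimal illustration of the generalized camera model (a sketch with our own function names, assuming a calibrated multi-camera rig), every image observation is lifted to a 3D ray with an origin and a direction expressed in a common rig frame; generalized pose solvers then operate on such rays instead of pixel coordinates:

    import numpy as np

    def pixel_to_generalized_ray(x, K, R_k, t_k):
        """Lift pixel x observed by camera k of a rig to a ray in the rig frame.
        K: intrinsics of camera k; (R_k, t_k): rig-to-camera pose, i.e.,
        X_cam = R_k @ X_rig + t_k."""
        d_cam = np.linalg.inv(K) @ np.array([x[0], x[1], 1.0])  # bearing in camera frame
        origin = -R_k.T @ t_k                                   # camera center in rig frame
        direction = R_k.T @ d_cam
        return origin, direction / np.linalg.norm(direction)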

Part II.2 (Solvers for different types of inputs) This part will focus on minimal and non-minimal solvers for estimating camera geometry, e.g., relative pose and homography, using various types of image features. Specifically, we will discuss solvers that leverage 2D line-to-line correspondences and their coincidences, e.g., vanishing points and junctions, to estimate geometric transformations. We will also cover solvers that use partially or fully affine covariant features (and how to make them affine covariant in practice) to estimate two-view geometry. This is an important concept since recent feature types (e.g., SuperPoint) can also be easily made affine covariant, which can improve robustness and accuracy in challenging environments. Furthermore, we will explore solvers that use multiple data modalities, which can help to further improve estimation accuracy and robustness. We will provide practical examples and guidelines for using these solvers effectively in real-world applications.
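
As one small, simplified illustration of making features affine covariant in practice (our own function name; real pipelines estimate the full affine shape, whereas this sketch uses only scale and orientation), two matched scale- and orientation-covariant keypoints already determine a similarity approximation of the local affine transformation between the images:

    import numpy as np

    def similarity_from_keypoints(scale1, angle1, scale2, angle2):
        """Approximate the 2x2 local affine transformation mapping the
        neighborhood of the first keypoint to that of the second by a
        similarity transform; angles are in radians."""
        s = scale2 / scale1
        da = angle2 - angle1
        c, si = np.cos(da), np.sin(da)
        return s * np.array([[c, -si], [si, c]])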

Part II.3 (Non-minimal solvers and non-linear refinement) This part of the tutorial will consider the problem of refitting the model once an initial inlier set has been established. Since minimal solvers fit the input measurements exactly, any noise in the measurements is directly absorbed into the estimated model. The current paradigm in robust estimation is therefore to refit the model to the inlier set found with the minimal solver. This happens both during RANSAC, i.e., as part of local optimization (LO-RANSAC), and as a post-processing step. We will discuss the state of the art in so-called non-minimal solvers, which can leverage larger input sizes. We will also give a short crash course in non-linear refinement methods and discuss implementation details (e.g., robust loss functions, efficient Jacobian computations, rotation manifolds) that are relevant for camera pose estimation problems.
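
To make the refinement step concrete, the following purely pedagogical sketch (our own function names; it uses numerical Jacobians, whereas production code uses analytic ones) performs damped Gauss-Newton refinement of an absolute pose under a Huber-robustified reprojection error, with an axis-angle update on the rotation manifold:

    import numpy as np

    def so3_exp(w):
        """Exponential map from an axis-angle vector to a rotation matrix."""
        theta = np.linalg.norm(w)
        if theta < 1e-12:
            return np.eye(3)
        k = w / theta
        K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def huber_weights(r, delta=1.0):
        """IRLS weights for the Huber loss, given per-point residual norms r."""
        a = np.abs(r)
        w = np.ones_like(a)
        w[a > delta] = delta / a[a > delta]
        return w

    def reproject(R, t, X, K):
        """Project 3D points X (Nx3) with pose (R, t) and intrinsics K."""
        Xc = X @ R.T + t
        x = Xc @ K.T
        return x[:, :2] / x[:, 2:3]

    def refine_pose(R, t, X, x_obs, K, iters=10, eps=1e-6):
        """Damped Gauss-Newton on a robust reprojection error with a local
        axis-angle + translation parameterization of the pose update."""
        for _ in range(iters):
            def residuals(delta):
                return (reproject(so3_exp(delta[:3]) @ R, t + delta[3:], X, K) - x_obs).ravel()
            r0 = residuals(np.zeros(6))
            # Numerical Jacobian of the residuals w.r.t. the 6-dim local update.
            J = np.zeros((r0.size, 6))
            for i in range(6):
                d = np.zeros(6)
                d[i] = eps
                J[:, i] = (residuals(d) - r0) / eps
            # Robust weights per point, applied to both the x and y residuals.
            w = np.repeat(huber_weights(np.linalg.norm(r0.reshape(-1, 2), axis=1)), 2)
            A = J.T @ (w[:, None] * J) + 1e-9 * np.eye(6)
            delta = np.linalg.solve(A, -(J.T @ (w * r0)))
            R, t = so3_exp(delta[:3]) @ R, t + delta[3:]
        return R, t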

Part III: Introduction to automatic generators: Theory and Practice

Most of the solvers discussed in the tutorial were created using automated generators. These generators take as input a set of polynomial equations with symbolic coefficients and produce code for efficiently computing solutions to the input system for any non-degenerate coefficients. This part will first describe the underlying theoretical concepts from algebraic geometry, including Gröbner bases and resultant-based approaches, methods for simplifying systems, and homotopy continuation methods, and how they are used to generate efficient polynomial solvers. We will then show a practical example of applying a modern generator to a concrete camera geometry problem.
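
To give a feeling for the kind of input such generators work on, the following toy polynomial system (chosen for illustration only, not a camera problem) is solved with SymPy's Gröbner basis routine; generated solvers apply the same elimination ideas but emit specialized numerical code that recovers the solutions via an action matrix and an eigenvalue problem:

    from sympy import symbols, groebner, solve

    x, y = symbols('x y')
    eqs = [x**2 + y**2 - 5, x*y - 2]

    # A lexicographic Groebner basis "triangularizes" the system: it contains a
    # univariate polynomial in y, so all solutions can be found by solving that
    # polynomial and back-substituting.
    G = groebner(eqs, x, y, order='lex')
    print(G)
    print(solve(eqs, [x, y]))  # four real solutions: (1, 2), (2, 1), (-1, -2), (-2, -1)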

Part IV: Using modern solvers inside RANSAC

This part will delve into the practical use of minimal and non-minimal solvers in the presence of noise and outliers (i.e., points inconsistent with the sought model). The most popular approaches for robust estimation are based on the RANSAC-like hypothesize-and-verify framework. Therefore, we will focus on using these solvers within recent RANSAC variants to increase robustness and accuracy when dealing with outliers and noise.

We will start by explaining the benefits of using RANSAC for robust estimation, and how it can be used in combination with minimal and non-minimal solvers to obtain accurate estimates of camera geometry. Then, we will explore the best practices for using these solvers inside RANSAC to increase robustness and accuracy. This will include a discussion of different solver and RANSAC configurations and their relative strengths and limitations. Finally, we will explore advanced topics such as the use of hybrid robust estimators that combine multiple feature types to improve accuracy even further. Throughout this part, we will provide practical tips and guidelines for choosing the proper method for a given problem.
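
To illustrate how the pieces fit together, the following schematic LO-RANSAC loop (a simplified sketch with our own interface; real implementations add many refinements such as better sampling and preemptive verification) takes the minimal solver, the non-minimal solver, and a residual function as parameters:

    import numpy as np

    def lo_ransac(data, minimal_solver, nonminimal_solver, residual_fn,
                  sample_size, threshold, confidence=0.99, max_iters=10000):
        """Hypothesize-and-verify with local optimization: hypotheses come from a
        minimal solver, models are scored by inlier counting, and the best model
        is refit with a non-minimal solver on its inliers. `data` is an (N x d)
        array of correspondences; `minimal_solver` may return several candidate
        models (e.g., the roots of a polynomial system)."""
        rng = np.random.default_rng(0)
        n = len(data)
        best_model, best_inliers = None, np.zeros(n, dtype=bool)
        i, iters = 0, max_iters
        while i < iters:
            i += 1
            sample = rng.choice(n, size=sample_size, replace=False)
            for model in minimal_solver(data[sample]):
                inliers = residual_fn(model, data) < threshold
                if inliers.sum() <= best_inliers.sum():
                    continue
                # Local optimization: refit the model on all of its inliers.
                refined = nonminimal_solver(data[inliers])
                refined_inliers = residual_fn(refined, data) < threshold
                if refined_inliers.sum() > inliers.sum():
                    model, inliers = refined, refined_inliers
                best_model, best_inliers = model, inliers
                # Adapt the number of iterations to the current inlier ratio.
                p_good = best_inliers.mean() ** sample_size
                if p_good >= 1.0:
                    iters = i
                elif p_good > 0.0:
                    iters = min(max_iters, int(np.ceil(
                        np.log(1.0 - confidence) / np.log(1.0 - p_good))))
        return best_model, best_inliers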

Part V: Applications: (Privacy-preserving) visual localization

The final part will discuss two applications of camera geometry estimation problems: (1) Applying solvers for the semi-generalized relative pose problem and the problem of estimating the camera pose from 2D-3D line correspondences in the context of privacy-preserving localization in order to avoid exposing private data of users. (2) Using semi-generalized relative pose estimation solvers for structure-less visual localization, i.e., for the problem of accurate pose estimation without the need to build and maintain a 3D map of the scene.