3. Image-Based Localization

Speaker: Torsten Sattler

Description: In the second half of the tutorial, we focus on the image-based localization problem, i.e., the problem of accurately determining the position and orientation from which a given query image was taken. We consider the case where a 3D model of the scene, obtained using Structure-from-Motion, is available. This setting allows us to establish 2D-3D matches between local features extracted in the query image and the 3D points in the model using image descriptors. These 2D-3D matches can then be used to estimate the camera pose using an n-point-pose solver inside a RANSAC loop.
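To give a feel for why the choice of n-point-pose solver matters inside RANSAC, the following minimal sketch evaluates the standard RANSAC iteration-count formula, k = log(1 - p) / log(1 - w^n), where w is the inlier ratio among the 2D-3D matches, n is the number of matches the solver needs, and p is the desired probability of drawing at least one all-inlier sample. The function name and the example values are illustrative assumptions, not part of the tutorial material.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of RANSAC samples needed to draw at least one all-inlier
    sample with probability `confidence` (hypothetical helper)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Probability that one random sample consists of inliers only.
    p_good = inlier_ratio ** sample_size
    if p_good >= 1.0:
        return 1
    # log1p(-p_good) == log(1 - p_good), but stays accurate for tiny p_good.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))


# Assumed example: 50% of the 2D-3D matches are correct.
print(ransac_iterations(0.5, 3))  # 3-match sample, as for a P3P-style solver: 35
print(ransac_iterations(0.5, 6))  # 6-match sample, as for a DLT-style solver: 293
```

Under these assumptions, a solver that needs fewer matches per sample requires far fewer RANSAC iterations at the same inlier ratio, which is one reason minimal solvers are attractive for pose estimation.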

This part of the tutorial is structured in three sub-parts:

    • In the first part, we briefly explain how we can obtain the 2D-3D matches through nearest neighbor search and review basic acceleration schemes (kd-trees and hierarchical k-means trees). We then briefly review RANSAC and how to estimate the camera pose using different n-point-pose solvers. We conclude this part by introducing the standard datasets used for image-based localization.

    • The main focus of this half of the tutorial is on the second part, where we explain image-based localization approaches based on descriptor matching in detail. We show that the matching direction (2D-to-3D vs. 3D-to-2D matching) matters and how to perform prioritized search for both directions in order to obtain efficient localization methods. We then detail how to combine both search directions in order to obtain state-of-the-art localization effectiveness, i.e., to localize as many query images as possible. Furthermore, we show how to integrate visibility information obtained from the Structure-from-Motion process into the resulting pipelines.

    • The image-based localization approaches introduced in the second part have two main issues preventing their scalability to larger datasets: the memory consumption induced by their need to store image descriptors, and the fact that the ratio test employed to reject ambiguous matches starts to reject more and more correct matches as the dataset grows. We sketch individual solutions for both problems and then show that both can be solved simultaneously by using place recognition techniques to limit the search space of descriptor matching.
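The ratio test mentioned above, and the way it can start rejecting correct matches as the model grows, can be illustrated with a minimal sketch. It assumes one descriptor per 3D point, uses exhaustive search in place of a kd-tree or hierarchical k-means tree, and picks a threshold of 0.7 and toy 2D descriptors purely for illustration; the function name is hypothetical.

```python
import math


def ratio_test_matches(query_descs, point_descs, ratio=0.7):
    """Return (query_index, point_index) pairs that pass the ratio test.

    A match is kept only if the nearest descriptor is clearly closer
    than the second nearest one (d1 < ratio * d2).
    """
    matches = []
    for qi, q in enumerate(query_descs):
        # Exhaustive 2-nearest-neighbor search; an approximate search
        # structure would replace this step in a real system.
        dists = sorted((math.dist(q, d), pi) for pi, d in enumerate(point_descs))
        if len(dists) < 2:
            continue
        (d1, nearest), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, nearest))
    return matches


# Toy data (assumed): one query feature and a small model with two 3D points.
query = [(1.0, 0.0)]
small_model = [(0.9, 0.0), (0.0, 1.0)]
# A larger model that also contains a point with a very similar descriptor.
large_model = small_model + [(0.88, 0.0)]

print(ratio_test_matches(query, small_model))  # [(0, 0)]
print(ratio_test_matches(query, large_model))  # []
```

In the small model the match to point 0 is unambiguous and accepted. In the larger model the same match is rejected because a similar descriptor makes the two nearest distances nearly equal. Restricting the search to a subset of the model, for example via place recognition, is one way to keep this ambiguity low.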