CVPR 2014 Tutorial on Large-Scale Visual Place Recognition and Image-Based Localization

Monday, June 23rd - Half Day (1pm - 5pm) - Room C216

Torsten Sattler, Akihiko Torii

Tutorial Description
The tutorial consists of two parts covering the general problem of visual place recognition and the more specific task of image-based localization. Throughout the tutorial, we provide links to publicly available source code for the discussed approaches, as well as to the corresponding datasets.

The first part of the tutorial, covering visual place recognition, looks at an application scenario in which the scene is represented by a set of geo-tagged images. The aim of visual place recognition approaches is to approximate the position of the viewer by identifying the place visible in the query image using (image) retrieval methods. We will discuss several improvements to the standard retrieval pipeline that detect and remove confusing features, exploit the known spatial relations between the images, incorporate priors on the viewer’s position, and enable place recognition systems to handle the repetitive structures prevalent in urban environments. Finally, we present techniques aiming to better distinguish between different places, e.g., by learning the appearance of different parts of the scene or by identifying structures unique to certain areas.
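To make the retrieval-based pipeline mentioned above concrete, the following is a minimal sketch of place recognition via tf-idf-weighted bag-of-visual-words ranking. All names, the tiny vocabulary, and the toy descriptors are illustrative assumptions; a real system would use a large vocabulary (e.g., learned by k-means over SIFT descriptors) and an inverted file for efficient scoring.

```python
import numpy as np

def assign_words(desc, vocab):
    # Hard-assign each local descriptor to its nearest visual word (L2 distance).
    d2 = ((desc[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def tf_vector(desc, vocab):
    # Normalized histogram of visual-word occurrences (term frequency).
    h = np.bincount(assign_words(desc, vocab), minlength=len(vocab)).astype(float)
    return h / h.sum()

def rank_places(db_desc_list, query_desc, vocab):
    # Build tf-idf vectors for each geo-tagged database image.
    tfs = np.stack([tf_vector(d, vocab) for d in db_desc_list])
    # idf weighting down-weights words occurring in many database images
    # (a simple form of suppressing uninformative / confusing features).
    df = (tfs > 0).sum(axis=0)
    idf = np.log(len(db_desc_list) / np.maximum(df, 1))
    db = tfs * idf
    q = tf_vector(query_desc, vocab) * idf
    # Rank database images by cosine similarity to the query.
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)
```

The geo-tag of the top-ranked database image then serves as the approximate position of the viewer.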

Assuming that the scene is represented by a 3D Structure-from-Motion model, the full pose of the query image, i.e., its position and orientation, can be estimated very precisely. State-of-the-art approaches for image-based localization compute the pose from 2D-3D correspondences between 2D features in the query image and 3D points in the model, which are determined through descriptor matching. In the second part of the tutorial, we first introduce the standard data structures for descriptor matching as well as different approaches to estimate the camera pose from the 2D-3D matches. We then detail the prioritized matching schemes that enable state-of-the-art localization systems to efficiently handle 3D models consisting of millions of 3D points. We thereby focus on the details required for an efficient implementation of such systems. This includes discussing how to exploit existing visibility information between 3D points in the model and the database images used to reconstruct the scene for both the matching and pose estimation stages of localization. After explaining how to reduce the memory requirements by using only a subset of all 3D points, these direct matching methods are then compared to more scalable image-based localization approaches based on image retrieval techniques.
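As a concrete illustration of the pose-estimation step, the sketch below recovers a camera's projection matrix from 2D-3D correspondences with the Direct Linear Transform (DLT). The function names are ours, not from any specific system discussed in the tutorial; state-of-the-art localization pipelines would instead run a calibrated minimal solver (e.g., three-point pose) inside RANSAC to handle outlier matches.

```python
import numpy as np

def estimate_pose_dlt(pts3d, pts2d):
    """Linear pose estimation from n >= 6 2D-3D correspondences (DLT).

    Each correspondence (X, x) with x ~ P X contributes two rows to a
    homogeneous system A p = 0; the least-squares solution for the 3x4
    projection matrix P (up to scale) is the right singular vector of A
    with the smallest singular value.
    """
    n = pts3d.shape[0]
    A = np.zeros((2 * n, 12))
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(pts3d, pts2d)):
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v]
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def reproject(P, pts3d):
    # Project 3D points with P and dehomogenize to pixel coordinates.
    Xh = np.hstack([pts3d, np.ones((pts3d.shape[0], 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```

With noise-free correspondences, the reprojections of the estimated matrix match the observed 2D features up to numerical precision; in practice, DLT typically serves as an initialization that is refined by non-linear reprojection-error minimization.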

  • Introduction [1:00pm - 1:10pm, 10 min, Torsten]
  • Visual Place Recognition [1:10pm - 2:40pm, 90 min, Akihiko]
  • Questions [2:40pm - 2:50pm, 10 min, Akihiko]
  • Coffee Break [2:50pm - 3:10pm, 20 min]
  • Image-Based Localization [3:10pm - 4:40pm, 90 min, Torsten]
  • Questions & Closing Remarks [4:40pm - 5:00pm, 20 min, Akihiko & Torsten]