CVPR 2019 Workshop on

Long-Term Visual Localization under Changing Conditions


  • When: June 16th or 17th, 2019
  • Where: TBA
  • Time: half day, TBA
  • Schedule: TBA


  • Challenge submission opens: TBA
  • Challenge submission deadline: TBA
  • Notification: TBA

Note that the workshop focuses on the submissions to the challenges. There will be no contributed papers.


Visual localization is the problem of (accurately) estimating the position and orientation, i.e., the camera pose, from which an image was taken with respect to some scene representation. Visual localization is a vital component in many interesting Computer Vision and Robotics scenarios, including autonomous vehicles such as self-driving cars and other robots, Augmented / Mixed / Virtual Reality, Structure-from-Motion, and SLAM.

There are multiple approaches to solve the visual localization problem: Structure-based methods establish matches between local features found in a query image and 3D points in a Structure-from-Motion (SfM) point cloud. These matches are then used to estimate the camera pose by applying a n-point-pose solver inside a RANSAC loop. Localization techniques based on scene coordinate regression replace the feature extraction and matching stage through machine learning by directly predicting the 3D point corresponding to a pixel patch. The resulting 2D-3D matches are then used for classical, RANSAC-based pose estimation. Camera pose regression techniques such as PoseNet replace the full localization pipeline with a CNN that learns to regress the 6DOF pose from a single image. While these approaches aim to estimate a highly accurate pose, image retrieval-based approaches aim to provide a coarser prior. Using compact image-level descriptors, they are typically much more scalable than method that represent the scene either explicitly via a SfM point cloud or implicitly via a CNN.

Common to all ways to approach the visual localization problem is that they generate a representation of the scene from a set of training images. Also common to all these approaches is that they (implicitly) assume that the set of training images covers all relevant viewing conditions, i.e., that the test images are taken under similar conditions as the training images. In practice however, the set of training images will only depict the scene under a subset of all possible viewpoints and illumination conditions. Moreover, many scenes are dynamic over time. For example, the geometry and appearance of outdoor scenes changes significantly over time.

While a substantial amount of work has focused on making visual localization algorithms more robust to viewpoint changes between training and test images, there is comparably little work on handling changes in scene appearance over time. Part of this is due to a lack of suitable benchmark datasets, which have only started to become available recently. Yet, changes over time, e.g., due to seasonal changes in outdoor scenes or changes in furniture in indoor scenes, pose very significant problems as they often lead to changes in both scene appearance (captured in the images) and scene geometry.

The purpose of this workshop is to serve as a benchmark for the current state of visual localization under changing conditions, i.e., over long periods of time. To encourage work on this topic, the workshop consist of both presentations by experts in the field of (long-term) visual localization (both from academia and industry) and challenges designed to highlight the problems encountered during long-term operation.