ECCV 2020 Workshop on

Long-Term Visual Localization under Changing Conditions

Overview

When: August 28th, 2020
Where: Virtual
Time: 10:00 - 12:00 UTC+1, 18:00 - 20:00 UTC+1
Schedule:
- 10:00 - 10:05 Introduction by the organizers
- 10:05 - 10:35 Invited Talk: Iro Armeni
- 10:35 - 11:05 Invited Talk: Yubin Kuang
- 11:05 - 12:00 Talks by Challenge winners and runners-up
  - 11:05 - 11:25 1st place all challenges: Paul-Edouard Sarlin: Hierarchical Localization with hloc and SuperGlue
  - 11:25 - 11:40 2nd place Handheld Devices Challenge: Shuang Gao, Huanhuan Fan, Yuhao Zhou, Xudong Zhang, Ang Li, Jijunnan Li, Yandong Guo: RLOCS: Retrieval and Localization with Observation Constraints
  - 11:40 - 11:55 2nd place Local Feature Challenge: Iaroslav Melekhov, Gabriel J. Brostow, Juho Kannala, Daniyar Turmukhambetov: Image Stylization for Robust Features

- 18:00 - 18:05 Introduction by the organizers
- 18:05 - 18:35 Invited Talk: Nathan Jacobs
- 18:35 - 19:05 Retrospective: Akihiko Torii
- 19:05 - 19:35 Invited Talk: Angela Dai
- 19:35 - 20:00 Talks by Challenge winners and runners-up
  - 19:35 - 19:50 2nd place Autonomous Vehicle Challenge: Martin Humenberger, Yohann Cabon, Nicolas Guerin, Julien Morat, Jérôme Revaud, Philippe Rerole, Noé Pion, Cesar de Souza, Gabriela Csurka: Late Fusion of Global Image Descriptors for Visual Localization (plus Kapture Demo)

Deadlines

Challenge submission opens: July 15th
Challenge submission deadline: August 18th
Notification: August 19th

Note that the workshop focuses on the submissions to the challenges. There will be no contributed papers.

Note that the Mapillary Place Recognition Challenge has separate deadlines. See the challenges page for details.

Abstract

Visual localization is the problem of estimating the position and orientation from which an image was taken. It is a vital component in many Computer Vision and Robotics scenarios, including autonomous vehicles, Augmented / Mixed / Virtual Reality, Structure-from-Motion, and SLAM. Due to its central role, visual localization is currently receiving significant interest from both academia and industry. Of special practical importance are long-term localization algorithms that generalize to unseen scene conditions, including illumination changes and the effects of seasons on scene appearance and geometry. The purpose of this workshop is to benchmark the current state of visual localization under changing conditions and to encourage new work on this challenging problem. The workshop consists of both presentations by experts in the field (from academia and industry) and challenges designed to highlight the currently unsolved problems.

Detailed Description

Visual localization is the problem of (accurately) estimating the position and orientation, i.e., the camera pose, from which an image was taken with respect to some scene representation. Visual localization is a vital component in many interesting Computer Vision and Robotics scenarios, including autonomous vehicles such as self-driving cars and other robots, Augmented / Mixed / Virtual Reality, Structure-from-Motion, and SLAM.

There are multiple approaches to solving the visual localization problem. Structure-based methods establish matches between local features found in a query image and 3D points in a Structure-from-Motion (SfM) point cloud. These 2D-3D matches are then used to estimate the camera pose by applying an n-point-pose solver inside a RANSAC loop. Localization techniques based on scene coordinate regression replace the feature extraction and matching stages with a learned model that directly predicts the 3D point corresponding to a pixel patch. The resulting 2D-3D matches are then used for classical, RANSAC-based pose estimation. Camera pose regression techniques such as PoseNet replace the full localization pipeline with a CNN that learns to regress the 6DOF pose from a single image. While these approaches aim to estimate a highly accurate pose, image retrieval-based approaches aim to provide a coarser prior. Because they use compact image-level descriptors, they are typically much more scalable than methods that represent the scene either explicitly via an SfM point cloud or implicitly via a CNN.
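To illustrate the retrieval-based idea described above, the following is a minimal sketch, not an implementation of any particular method. It assumes that each database image already has a compact image-level descriptor and a known camera position. The function names, the dictionary layout, the file names, and the toy 4-dimensional descriptors are all hypothetical. The query is compared to every database descriptor by cosine similarity, and the pose of the best match serves as a coarse prior.

```python
import math


def l2_normalize(vector):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def retrieve(query_descriptor, database, top_k=3):
    """Rank database images by cosine similarity to the query descriptor.

    `database` maps an image name to a (descriptor, position) pair.
    This layout is an assumption made for the sketch.
    """
    query = l2_normalize(query_descriptor)
    scored = []
    for name, (descriptor, position) in database.items():
        candidate = l2_normalize(descriptor)
        similarity = sum(a * b for a, b in zip(query, candidate))
        scored.append((similarity, name, position))
    # Most similar database image first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy data: hypothetical descriptors and camera positions (x, y, z).
database = {
    "db_summer.jpg": ([0.9, 0.1, 0.0, 0.2], (10.0, 2.0, 0.0)),
    "db_winter.jpg": ([0.1, 0.8, 0.3, 0.0], (55.0, -4.0, 0.0)),
    "db_night.jpg": ([0.0, 0.2, 0.9, 0.1], (120.0, 7.5, 0.0)),
}
query = [0.8, 0.2, 0.1, 0.2]

ranked = retrieve(query, database, top_k=2)
best_similarity, best_name, coarse_position = ranked[0]
print(best_name, coarse_position)
```

The position of the top-ranked image only approximates the query pose. In a full pipeline, such a prior could be refined with one of the more accurate approaches described above, for example by restricting 2D-3D matching to the part of the scene seen in the retrieved images.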

All of these approaches generate a representation of the scene from a set of training images. They also (implicitly) assume that the set of training images covers all relevant viewing conditions, i.e., that the test images are taken under conditions similar to those of the training images. In practice, however, the set of training images will only depict the scene under a subset of all possible viewpoints and illumination conditions. Moreover, many scenes are dynamic. For example, the geometry and appearance of outdoor scenes change significantly over time.

While a substantial amount of work has focused on making visual localization algorithms more robust to viewpoint changes between training and test images, there is comparatively little work on handling changes in scene appearance over time. This is partly due to a lack of suitable benchmark datasets, which have only recently started to become available. Yet changes over time, e.g., due to seasonal changes in outdoor scenes or changes in furniture in indoor scenes, pose significant problems, as they often affect both scene appearance (as captured in the images) and scene geometry.