ICCV 2021 Workshop on
Long-Term Visual Localization under Changing Conditions
When: October 17th, afternoon (EDT/Montreal Time)
Where: Virtual, join via the ICCV website or via YouTube
Time: 2pm-5.30pm (EDT/Montreal Time)
Schedule: (EDT/Montreal Time)
14:00-14:15 | Introduction by organizers & overview of the challenges
14:15-14:45 | Invited Talk - Margarita Chli
14:45-15:15 | Invited Talk - Johanna Wald
15:15-15:30 | Coffee Break
15:30-16:00 | Talks by winner and runner-up of Challenges 1&2 [30 min]
Winner: MegLoc - Shuxue Peng, Zihang He, Haotian Zhang, Ran Yan, Chuting Wang, Qingtian Zhu, Yikang Ding, Liangtao Zheng, Xiao Liu
Runner-up: OSpace - Shuang Gao, Xudong Zhang, Yuchen Yang, Yishan Ping, Jixiang Wan, Kun Jiang, Xiaohu Nan, Yulin Sun, Jijunnan Li, Yandong Guo, "Visual Localization with RLOCS and Point-Line Optimization"
16:00-16:20 | Talks by winner and runner-up of Challenge 3 [20 min]
16:20-16:50 | Invited Talk - Hongdong Li
16:50-17:20 | Invited Talk - Michael Milford
Visual localization is the problem of estimating the position and orientation from which an image was taken. It is a vital component in many Computer Vision and Robotics scenarios, including autonomous vehicles, Augmented / Mixed / Virtual Reality, Structure-from-Motion, and SLAM. Due to its central role, visual localization is currently receiving significant interest from both academia and industry. Of special practical importance are long-term localization algorithms that generalize to unseen scene conditions, including illumination changes and the effects of seasons on scene appearance and geometry. The purpose of this workshop is to benchmark the current state of visual localization under changing conditions and to encourage new work on this challenging problem. The workshop consists of both presentations by experts in the field (from academia and industry) and challenges designed to highlight the currently unsolved problems.
Visual localization is the problem of (accurately) estimating the position and orientation, i.e., the camera pose, from which an image was taken with respect to some scene representation. Visual localization is a vital component in many interesting Computer Vision and Robotics scenarios, including autonomous vehicles such as self-driving cars and other robots, Augmented / Mixed / Virtual Reality, Structure-from-Motion, and SLAM.
There are multiple approaches to solving the visual localization problem: Structure-based methods establish matches between local features found in a query image and 3D points in a Structure-from-Motion (SfM) point cloud. These matches are then used to estimate the camera pose by applying an n-point-pose solver inside a RANSAC loop. Localization techniques based on scene coordinate regression replace the feature extraction and matching stage with machine learning, directly predicting the 3D point corresponding to a pixel patch. The resulting 2D-3D matches are then used for classical, RANSAC-based pose estimation. Camera pose regression techniques such as PoseNet replace the full localization pipeline with a CNN that learns to regress the 6DOF pose from a single image. While these approaches aim to estimate a highly accurate pose, image retrieval-based approaches aim to provide a coarser prior. Using compact image-level descriptors, they are typically much more scalable than methods that represent the scene either explicitly via an SfM point cloud or implicitly via a CNN.
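To make the structure-based pipeline's final step concrete, the sketch below estimates a camera projection matrix from 2D-3D matches by running a DLT solver inside a RANSAC loop, using only NumPy. This is a simplified, uncalibrated stand-in for the calibrated n-point-pose solvers (e.g. P3P, as in OpenCV's `cv2.solvePnPRansac`) used in practice; the function names and thresholds here are illustrative, not from any particular system.

```python
import numpy as np

def dlt_pose(pts3d, pts2d):
    """Fit a 3x4 projection matrix P from >= 6 2D-3D matches via the
    Direct Linear Transform: each match contributes two linear equations."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # The null vector of A (last right-singular vector) gives P up to scale.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def reproj_errors(P, pts3d, pts2d):
    """Pixel distance between projected 3D points and their 2D matches."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]  # perspective division
    return np.linalg.norm(proj - pts2d, axis=1)

def ransac_pose(pts3d, pts2d, iters=200, thresh=2.0, seed=0):
    """RANSAC loop: sample minimal sets, fit, score by inlier count,
    then refit on the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_P, best_inliers = None, np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts3d), 6, replace=False)
        P = dlt_pose(pts3d[idx], pts2d[idx])
        inliers = reproj_errors(P, pts3d, pts2d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_P, best_inliers = P, inliers
    if best_inliers.sum() >= 6:
        best_P = dlt_pose(pts3d[best_inliers], pts2d[best_inliers])
    return best_P, best_inliers
```

Scene coordinate regression methods plug into exactly the same `ransac_pose` stage: only the source of the 2D-3D matches changes, from feature matching to a learned predictor.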
All of these approaches have in common that they generate a representation of the scene from a set of training images. They also all (implicitly) assume that the set of training images covers all relevant viewing conditions, i.e., that the test images are taken under similar conditions as the training images. In practice, however, the set of training images will only depict the scene under a subset of all possible viewpoints and illumination conditions. Moreover, many scenes are dynamic: the geometry and appearance of outdoor scenes, for example, change significantly over time.
While a substantial amount of work has focused on making visual localization algorithms more robust to viewpoint changes between training and test images, there is comparatively little work on handling changes in scene appearance over time. This is partly due to a lack of suitable benchmark datasets, which have only recently started to become available. Yet changes over time, e.g., due to seasonal changes in outdoor scenes or rearranged furniture in indoor scenes, pose very significant problems, as they often alter both scene appearance (captured in the images) and scene geometry.
The purpose of this workshop is to serve as a benchmark for the current state of visual localization under changing conditions, i.e., over long periods of time. To encourage work on this topic, the workshop consists of both presentations by experts in the field of (long-term) visual localization (from both academia and industry) and challenges designed to highlight the problems encountered during long-term operation.