The workshop consists of four challenges. The winner of each challenge will receive 1500 USD while the runner-up will receive 750 USD. The first three challenges are focused on accurate camera pose estimation:
visual localization for autonomous vehicles
visual localization for handheld devices
local features for long-term localization
The goal of the first two challenges is to develop full visual localization pipelines that take an image as input and estimate its camera pose with respect to the scene. They differ in the assumptions they can make: for the first challenge, the camera mostly undergoes planar motion and sequences of query images are available. For the second challenge, only single images are available as queries and the camera can undergo arbitrary 6DOF motion.
The local feature challenge is motivated by the observation that state-of-the-art approaches often still rely on local features. The goal of the challenge is to encourage work on robust local features without the need to develop a full localization pipeline. We provide code for the feature matching and camera pose estimation parts of a localization pipeline, so that participants can fully focus on devising new local features.
The last challenge focuses on the highly related problem of place recognition:
Mapillary Place Recognition Challenge
Note that the Mapillary Place Recognition Challenge has a separate set of rules and is handled through a service other than visuallocalization.net. Please see here for details.
The following rules, deadlines, and information apply only to the first three challenges. For details on the fourth challenge, please see here.
For each dataset and challenge, we evaluate the pose accuracy of a method. To this end, we follow [Sattler et al., Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions, CVPR 2018] and define a set of thresholds on the position and orientation errors of the estimated pose. For each (X meters, Y degrees) threshold, we report the percentage of query images localized within X meters and Y degrees of the ground truth pose.
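Following [Sattler et al., CVPR 2018], the position error is the Euclidean distance between the estimated and ground-truth camera centers, and the orientation error is the angle of the relative rotation between the two poses. The following is a minimal sketch of this evaluation; the official evaluation is performed by the visuallocalization.net server, the function names are illustrative, and the thresholds shown are examples in the style of the benchmark paper (the actual thresholds vary per dataset and condition).

import numpy as np

def pose_errors(R_est, c_est, R_gt, c_gt):
    """Position error (meters) and orientation error (degrees) of an estimated pose.

    R_est, R_gt are 3x3 rotation matrices; c_est, c_gt are camera centers in world coordinates.
    """
    position_error = np.linalg.norm(c_est - c_gt)
    # Angle of the relative rotation, clipped for numerical safety.
    cos_angle = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    orientation_error = np.degrees(np.arccos(cos_angle))
    return position_error, orientation_error

def localized_percentages(errors, thresholds=((0.25, 2.0), (0.5, 5.0), (5.0, 10.0))):
    """Percentage of query images localized within each (X meters, Y degrees) threshold."""
    errors = np.asarray(errors)  # shape (N, 2): position and orientation error per query
    return [100.0 * np.mean((errors[:, 0] <= t_pos) & (errors[:, 1] <= t_rot))
            for t_pos, t_rot in thresholds]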
For ranking the methods, we follow the Robust Vision Challenge from CVPR 2018: For each dataset and challenge, we will rank the submitted results based on these percentages. Afterwards, we rank all methods submitted to a challenge based on their ranks on the individual datasets. The rankings are computed using the Schulze Proportional Ranking method from [Markus Schulze, A new monotonic, clone-independent, reversal symmetric, and condorcet-consistent single-winner election method, Social Choice and Welfare 2011].
The Schulze Proportional Ranking method is based on pairwise comparison of results. If the results of a method are not available for a dataset, the comparison will assume that it performs worse than a method for which the results are available.
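To make the missing-results rule concrete, the sketch below builds a pairwise preference matrix over a few hypothetical methods and datasets, counting on how many datasets one method beats another and treating missing results as losses. For simplicity it assumes a single summary percentage per method and dataset; the actual per-dataset ranking uses the threshold percentages described above, and the final ranking is then obtained by running the Schulze method on the resulting matrix.

import numpy as np

# Hypothetical per-dataset summary scores (percentage of localized queries).
# A missing dataset entry means no results were submitted for that dataset.
scores = {
    "method_A": {"dataset_1": 71.2, "dataset_2": 55.0},
    "method_B": {"dataset_1": 68.4},                      # no results on dataset_2
    "method_C": {"dataset_1": 74.9, "dataset_2": 51.3},
}

methods = sorted(scores)
datasets = sorted({d for per_method in scores.values() for d in per_method})

# Pairwise preference matrix: d[i, j] = number of datasets on which method i
# is ranked above method j. A method without results on a dataset is treated
# as performing worse than any method with results on that dataset.
d = np.zeros((len(methods), len(methods)), dtype=int)
for dataset in datasets:
    for i, a in enumerate(methods):
        for j, b in enumerate(methods):
            if i == j:
                continue
            score_a = scores[a].get(dataset)
            score_b = scores[b].get(dataset)
            if score_a is not None and (score_b is None or score_a > score_b):
                d[i, j] += 1

for i, a in enumerate(methods):
    print(a, d[i])
# The final ranking is obtained by running the Schulze method (strongest-path
# computation) on this preference matrix, which is not reproduced here.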
A leaderboard for approaches participating in the challenge will be made available at a later point at https://www.visuallocalization.net .
By submitting to a workshop challenge, you agree to eventually publish a paper describing your approach. This can either be a regular conference / journal paper or a paper on arXiv. Prizes will only be awarded to winners if the paper describing their approach is available to us. The paper needs to be available by August 18th.
If you are an author of a paper related to the challenges, we strongly encourage you to evaluate your method on the challenge datasets and to submit your results to one or more of the challenges. If you already have results on the datasets (potentially publicly visible on visuallocalization.net), we strongly encourage you to also submit your results to the challenges. Besides novel work on the topic, we also encourage the following types of submissions:
Combinations of existing methods, e.g., using SuperPoint features in a localization approach implemented in COLMAP, using a state-of-the-art feature matching algorithm in combination with local features such as SuperPoint or D2-Net features, or exchanging the components of existing algorithms to boost performance.
Submissions showing that existing methods can outperform methods with results published on the benchmarks, e.g., by carefully tuning parameters or using a different training dataset.
Combining existing methods with pre- or post-processing approaches, e.g., using histogram equalization on the input images, building better 3D models (for example through model compression or the use of dense 3D models), or integrating an existing localization algorithm into a (visual) odometry / SLAM system.
Using matches obtained by an existing method for multi-image localization.
Showing that existing methods work well on our challenges, even though the community believes that they do not work.
We will not consider methods of the following type: reproducing results already published on visuallocalization.net by running someone else's code out of the box (if you are not a co-author of the underlying method) or using your own implementation. However, re-implementations that outperform the original implementation are explicitly encouraged.
Using additional data, e.g., for training, is explicitly permitted. For example, one could use other nighttime images from the RobotCar dataset (not used in the RobotCar Seasons dataset) to train descriptors. Training on the test images is explicitly forbidden. You will need to specify which data was used for training.
One member (or representative) of the winner and runner-up teams of each challenge needs to attend the workshop and give a talk about their approach.
Each team can update its challenge results until the deadline.
We explicitly encourage participation from industry.
Challenge submissions will be handled via the evaluation service set up at https://visuallocalization.net/ :
In order to submit results, you will need to create an account. You are only allowed to use a single account per team.
In the submission form, you will need to indicate that you are submitting results to one of the challenges and specify which challenge you are submitting to. You will be able to decide whether the results are publicly visible on the leaderboard of the challenge. If you initially decide not to show your results, you can make them publicly visible later.
In order to be considered for the evaluation, results need to be publicly visible, since we use a ranking-based approach to determine the winners.
More details will follow at a later point in time.
Challenge submission opens: July 15th
Challenge submission deadline: August 18th
Notification: August 19th
Note that the Mapillary Place Recognition Challenge has separate rules and deadlines. Most notably, winners will be decided after the workshop. Please see here for more details.
The following datasets will be used for the visual localization for autonomous vehicles challenge:
The following datasets will be used for the visual localization for handheld devices challenge:
The following dataset will be used for the local features for long-term localization challenge:
Aachen Day-Night v1.1
In order to make participation and data processing easier, all datasets are also provided in NAVER LABS Europe's kapture format, in addition to their original formats. kapture is a data format and tool set that facilitates integrating the various datasets into your processing environment.
To download each dataset in this format, please follow the instructions provided here.
More details, news, and updates about kapture can be found on the kapture website.
Code, specifications, and documentation can be found on GitHub.
For questions, suggestions, and contributions, please contact kapture@naverlabs.com.
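As a rough illustration of working with the kapture format, the sketch below loads a dataset with the kapture Python package (pip install kapture) and inspects its contents. The path "./my_dataset/mapping" is a placeholder, and the exact API should be checked against the kapture documentation linked above.

# Minimal sketch: loading a dataset stored in kapture format.
import kapture.io.csv as kcsv

kapture_data = kcsv.kapture_from_dir("./my_dataset/mapping")  # placeholder path

print(len(kapture_data.sensors), "sensors (cameras)")
if kapture_data.records_camera is not None:
    print(len(kapture_data.records_camera), "timestamps with image records")
if kapture_data.trajectories is not None:
    print(len(kapture_data.trajectories), "timestamps with reference poses")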
The following is provided for the challenge:
A script that performs feature matching between images and then uses the COLMAP Structure-from-Motion software for pose estimation.
A list of image pairs for matching.
An empty COLMAP database, in which your features will be imported.
See this GitHub repository for the code, data, and information on using both.
The workflow for submitting to the challenge is:
Use your own feature detector and descriptor to extract local features for all query images and their reference images; a sketch of one possible way to store these features is shown after this list.
Execute the provided code for importing the features, performing feature matching, 3D reconstruction, and camera pose estimation. If you want to use your own feature matcher and not the one provided by us (which finds mutual nearest neighbors), you will need to adapt this code.
Submit the output file at visuallocalization.net, indicating that you are submitting to the local feature challenge of this CVPR 2020 workshop.
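As referenced in step 1 above, the sketch below shows one possible way to extract and store per-image local features. The file naming (image path plus a method suffix) and the 'keypoints' / 'descriptors' keys follow a common convention (similar to how D2-Net features are stored); the authoritative format expected by the provided import code is documented in the GitHub repository linked above.

# Rough sketch of step 1: extracting and storing per-image local features.
import glob
import numpy as np

METHOD_NAME = "my-features"  # hypothetical method identifier

def extract_features(image_path):
    # Placeholder for your own detector / descriptor: must return keypoints
    # as an (N, 2) array of x, y image coordinates and descriptors as an
    # (N, D) array. Replace the dummy data with your actual extractor.
    keypoints = (np.random.rand(512, 2) * 1024.0).astype(np.float32)
    descriptors = np.random.rand(512, 128).astype(np.float32)
    return keypoints, descriptors

for image_path in glob.glob("images/**/*.jpg", recursive=True):
    keypoints, descriptors = extract_features(image_path)
    # One feature file per image, stored next to the image. Opening the file
    # explicitly prevents NumPy from appending an '.npz' extension.
    with open(image_path + "." + METHOD_NAME, "wb") as f:
        np.savez(f, keypoints=keypoints, descriptors=descriptors)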