Challenges

The workshop consists of two types of challenges. The winner of each challenge will receive 1000 USD, while the runner-up will receive 500 USD.

  • Visual Localization: The goal is to estimate, as accurately as possible, the 6DOF pose from which a given test image was taken. Given a set of reference images taken under a single condition / at one point in time, estimate the poses from which new images, taken under different conditions, were captured. Querying with multiple images (which requires estimating relative poses) is explicitly allowed, but not supported by all datasets. A reference 3D model of the scene will be available to the participants.
  • Local Features: The purpose of this challenge is to measure the impact of different types of local feature detectors and descriptors on camera pose estimation accuracy. To allow a fair comparison, we will fix the localization pipeline. For each query image, the participants will be provided with a set of relevant database images together with their camera poses and intrinsics. The participants will need to establish correspondences between the images in the set. Using the known poses of the database images, the correspondences will be used to generate a local SfM point cloud of the scene. This local point cloud will then be used to estimate the pose of the query image. We will provide source code for the reconstruction and pose estimation parts. The participants will thus only need to detect and match the features.

Concretely, there are three challenges:

  • End-to-End Visual Localization: Given an image or a set of images, as well as a set of reference images with known poses, estimate the 6DOF pose(s) of the query image(s). If the dataset provides query sequences, we encourage querying with multiple images instead of a single one. Since end-to-end trained methods have been shown to struggle with larger and / or more complex scenes, they have their own challenge.
  • Visual Localization: Given an image or a set of images, as well as a set of reference images with known poses, estimate the 6DOF pose(s) of the query image(s). If the dataset provides query sequences, we encourage querying with multiple images instead of a single one. All visual localization methods that are not trained end-to-end need to be submitted to this challenge. A minimal sketch of the underlying pose estimation step is shown after this list.
  • Local Feature Evaluation: Given a set of query images, a set of relevant reference images per query and their ground truth poses, establish feature correspondences between the query images and the reference images. These correspondences are then used to estimate the pose of the query image. For more details, see below.
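
To make concrete what estimating a 6DOF pose involves, the sketch below recovers a camera pose from 2D-3D correspondences with RANSAC and PnP via OpenCV. The correspondences, intrinsics, and all function and variable names are illustrative assumptions; they are not part of the challenge code.

```python
import cv2
import numpy as np

def estimate_pose(pts3d, pts2d, K):
    """Minimal 6DOF pose estimation from 2D-3D matches (RANSAC + PnP).

    pts3d: (N, 3) scene points, e.g., from the reference 3D model.
    pts2d: (N, 2) matching keypoint locations in the query image.
    K:     (3, 3) intrinsic matrix of the query camera.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        iterationsCount=10000, reprojectionError=8.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix, world -> camera
    c = -R.T @ tvec             # camera center in world coordinates
    return R, tvec.ravel(), c.ravel()
```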

Evaluation

For each dataset and challenge, we evaluate the pose accuracy of a method. To this end, we follow [Sattler et al., Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions, CVPR 2018] and define a set of thresholds on the position and orientation errors of the estimated pose. For each (X meters, Y degrees) threshold, we report the percentage of query images localized within X meters and Y degrees of the ground truth pose.
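
As an illustration of this metric, the sketch below computes the position error as the Euclidean distance between camera centers and the orientation error as the angle of the relative rotation, then reports the percentage of queries within a threshold. The function and variable names are our own, not part of the benchmark code:

```python
import numpy as np

def pose_errors(R_est, c_est, R_gt, c_gt):
    """Position error in meters, orientation error in degrees."""
    t_err = np.linalg.norm(c_est - c_gt)
    # The orientation error is the rotation angle of R_gt^T R_est.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

def percent_localized(errors, x_meters, y_degrees):
    """Percentage of queries within (X meters, Y degrees) of the ground truth."""
    hits = [t <= x_meters and r <= y_degrees for t, r in errors]
    return 100.0 * sum(hits) / len(hits)
```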

For ranking the methods, we follow the Robust Vision Challenge from CVPR 2018: For each dataset and challenge, we will rank the submitted results based on these percentages. Afterwards, we rank all methods submitted to a challenge based on their ranks on the individual datasets. The rankings are computed using the Schulze Proportional Ranking method from [Markus Schulze, A new monotonic, clone-independent, reversal symmetric, and condorcet-consistent single-winner election method, Social Choice and Welfare 2011].

The Schulze Proportional Ranking method is based on pairwise comparison of results. If the results of a method are not available for a dataset, the comparison will assume that it performs worse than a method for which the results are available.
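
For intuition, the sketch below implements the core of the basic Schulze method: pairwise comparisons across datasets followed by a widest-path computation. The proportional-ranking variant cited above is more involved, so this is only a simplified illustration; missing results are encoded as infinitely bad ranks, which makes them lose every pairwise comparison as described.

```python
import numpy as np

def schulze_order(ranks):
    """ranks[m][d]: rank of method m on dataset d (lower is better;
    use float('inf') for a missing result, which then loses every
    pairwise comparison)."""
    n = len(ranks)
    # d[a, b]: number of datasets on which a ranks strictly better than b.
    d = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(n):
            if a != b:
                d[a, b] = sum(ra < rb for ra, rb in zip(ranks[a], ranks[b]))
    # Strongest-path strengths via a Floyd-Warshall-style update.
    p = np.where(d > d.T, d, 0)
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if a != b and a != k and b != k:
                    p[a, b] = max(p[a, b], min(p[a, k], p[k, b]))
    # Order methods by the number of opponents they beat.
    wins = [sum(p[a, b] > p[b, a] for b in range(n)) for a in range(n)]
    return sorted(range(n), key=lambda a: -wins[a])
```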

Rules

  • By submitting to a workshop challenge, you agree to eventually publish a paper describing your approach. This can either be a regular conference / journal paper or a paper on arXiv. Prizes will only be awarded to winners if the paper describing their approach is available to us. The paper needs to be available by June 1st.
  • The method used for submission has to have some novel elements that distinguish it from previous work by other authors. If the method only applies an existing approach from other authors, the prize will be awarded to the original authors. This will be checked based on the papers describing the methods.
  • Using additional data, e.g., for training, is explicitly permitted. For example, one could use other nighttime images from the RobotCar dataset to train descriptors. Training on the test images is forbidden. You will need to specify which data was used for training.
  • One member (or representative) of the winner and runner-up teams of each challenge needs to attend the workshop and give a talk about their approach.
  • We will limit the number of submissions on the test sets to avoid "gradient descent" on the evaluation results, i.e., tuning methods to the test data.
  • Each team can update its challenge results until the deadline.
  • We explicitly encourage participation from industry.

Submission

Challenge submissions will be handled via the evaluation service set up at https://visuallocalization.net/ :

  • In order to submit results, you will need to create an account. You are only allowed to use a single account per team.
  • In the submission mask, you will need to indicate that you are submitting results for one of the workshop challenges and specify which one. You can decide whether the results will be publicly visible on the leaderboard of that challenge; results that are initially hidden can be made publicly visible later.
  • In order to be considered for the evaluation, results need to be publicly visible. This is because we use a ranking-based approach to determine the winners.

Deadlines

The deadline for submitting to any of the three challenges is June 1st, 23:59 (PST). In order to be eligible for a prize, you will need to notify us of the corresponding publication or arXiv paper by June 1st, 23:59 (PST) (contact Torsten Sattler at torsat@chalmers.se). We will notify the winners by June 4th.

Datasets

The following datasets will be used for the End-to-End Visual Localization and Visual Localization challenges:

The following datasets will be used for the Local Feature Evaluation challenge:

Details on the Local Feature Evaluation Challenge

The following is provided for the challenge:

  • A script that performs feature matching between images and then uses the COLMAP Structure-from-Motion software for pose estimation.
  • A list of image pairs for matching.
  • An empty COLMAP database, in which your features will be imported.

See this GitHub repository for the code, data, and information on using both.
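
To give a feeling for what the feature import involves, here is a minimal sketch of COLMAP's database conventions for storing matches. The provided scripts handle all of this for you; the function names below are our own.

```python
import sqlite3
import numpy as np

MAX_IMAGE_ID = 2147483647  # 2^31 - 1, used by COLMAP to encode image pairs

def image_pair_id(image_id1, image_id2):
    """COLMAP encodes each image pair as a single integer key."""
    if image_id1 > image_id2:
        image_id1, image_id2 = image_id2, image_id1
    return image_id1 * MAX_IMAGE_ID + image_id2

def import_matches(db_path, image_id1, image_id2, matches):
    """Write an (N, 2) array of feature index pairs into a COLMAP database."""
    matches = np.asarray(matches, dtype=np.uint32)
    db = sqlite3.connect(db_path)
    db.execute(
        "INSERT INTO matches (pair_id, rows, cols, data) VALUES (?, ?, ?, ?)",
        (image_pair_id(image_id1, image_id2),
         matches.shape[0], matches.shape[1], matches.tobytes()))
    db.commit()
    db.close()
```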

The workflow for submitting to the challenge is:

  • Use your own feature detector and descriptor to extract local features for all query images and their reference images.
  • Execute the provided code for importing the features, performing feature matching, 3D reconstruction, and camera pose estimation. If you want to use your own feature matcher rather than the one provided by us (which finds mutual nearest neighbors; see the sketch after this list), you will need to adapt this code.
  • Submit the output file at visuallocalization.net, indicating that you are submitting to the local feature challenge of this workshop.
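
To illustrate the participant's side of this workflow, the sketch below extracts SIFT features with OpenCV as a stand-in for your own detector / descriptor and computes mutual-nearest-neighbor matches. The on-disk format expected by the import script is defined in the repository above; the .npz layout used here is only an assumption.

```python
import cv2
import numpy as np

def extract_features(image_path):
    """Detect keypoints and compute descriptors (SIFT as a placeholder
    for your own local features)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(img, None)
    kpts = np.array([kp.pt for kp in kps], dtype=np.float32)  # (N, 2), x/y
    # Hypothetical layout; check the repository for the expected format.
    np.savez(image_path + '.features.npz', keypoints=kpts, descriptors=desc)
    return kpts, desc

def mutual_nn_matches(desc1, desc2):
    """Mutual nearest neighbors, i.e., the default matching strategy.

    For large descriptor sets, replace the brute-force distance matrix
    with an approximate nearest-neighbor library.
    """
    dists = np.linalg.norm(
        desc1[:, None, :] - desc2[None, :, :], axis=2)  # (N1, N2)
    nn12 = np.argmin(dists, axis=1)  # best match in desc2 for each desc1
    nn21 = np.argmin(dists, axis=0)  # best match in desc1 for each desc2
    ids1 = np.arange(len(desc1))
    mutual = nn21[nn12] == ids1      # keep pairs that agree both ways
    return np.stack([ids1[mutual], nn12[mutual]], axis=1)
```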