Local feature matching between images taken from very different viewpoints is a crucial task for practical applications such as SfM, image retrieval, and place recognition. A well-known approach to viewpoint-invariant representation is to approximate appearance changes as affine transformations. ASIFT [2] establishes a large number of correspondences by simulating affine warping for both feature detection and description. A shortcoming of this framework is that it requires additional computational cost for affine-warped view synthesis and matching. It also produces many mismatches because of the heuristic feature generation.
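The following is a minimal sketch of the ASIFT-style view synthesis referred to above, assuming OpenCV and NumPy; the tilt and angle values are illustrative examples, not the sampling schedule used by ASIFT [2].

```python
import cv2
import numpy as np

def synthesize_affine_view(image, tilt, angle_deg):
    """Warp `image` by an in-plane rotation followed by a horizontal tilt."""
    h, w = image.shape[:2]
    # In-plane rotation about the image center.
    R = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(image, R, (w, h))
    # Tilt: shrink along x to simulate an out-of-plane viewpoint change.
    return cv2.resize(rotated, None, fx=1.0 / tilt, fy=1.0,
                      interpolation=cv2.INTER_LINEAR)

def detect_on_synthesized_views(image, tilts=(1.0, 2.0), angles=(0.0, 45.0, 90.0)):
    """Detect and describe SIFT features on every synthesized view."""
    sift = cv2.SIFT_create()
    all_kp, all_desc = [], []
    for t in tilts:
        for a in angles:
            view = synthesize_affine_view(image, t, a)
            kp, desc = sift.detectAndCompute(view, None)
            if desc is not None:
                all_kp.append(kp)
                all_desc.append(desc)
    return all_kp, np.vstack(all_desc)
```

The extra cost mentioned above comes directly from this loop: every tilt/angle combination triggers another warp, detection, and description pass.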
We propose a new image matching framework that accounts for descriptor variation due to viewpoint changes. We first detect and describe features on multiple affine-warped images, in the same manner as ASIFT, to sample how descriptors vary under viewpoint changes. Our feature grouping process collects the varied descriptors that point to the same keypoint on the original image. We then learn how the descriptors vary by using the constructed feature groups. The learned variation is applied in the similarity measurement of the matching stage so that descriptor differences caused by viewpoint changes are discounted. Our image matching framework establishes sufficient and reliable correspondences between a pair of images under large viewpoint changes. Furthermore, our framework is clearly faster than ASIFT because view synthesis for the input image can be omitted: the descriptor variation due to viewpoint changes is predicted instead. Our framework is particularly useful when the features of the target image can be pre-processed off-line.
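A minimal sketch of this idea is given below, assuming NumPy and assuming the descriptors from the synthesized views have already been grouped by the original keypoint they point to (each group as an (n_i, 128) array). The paper's metric learning is simplified here to a single pooled covariance of within-group variation, used as a Mahalanobis metric at matching time; function names and the ratio threshold are illustrative.

```python
import numpy as np

def learn_descriptor_covariance(groups, eps=1e-3):
    """Estimate the covariance of within-group descriptor variation and return its inverse."""
    diffs = []
    for g in groups:                       # g: (n_i, 128) descriptors of one keypoint
        mean = g.mean(axis=0, keepdims=True)
        diffs.append(g - mean)             # variation caused by viewpoint change
    diffs = np.vstack(diffs)
    cov = diffs.T @ diffs / max(len(diffs) - 1, 1)
    cov += eps * np.eye(cov.shape[0])      # regularize for invertibility
    return np.linalg.inv(cov)              # precision matrix = Mahalanobis metric

def mahalanobis_match(desc_a, desc_b, precision, ratio=0.8):
    """Nearest-neighbor matching under the learned metric, with a ratio test on squared distances."""
    matches = []
    for i, d in enumerate(desc_a):
        diff = desc_b - d                                  # (m, 128)
        dist = np.einsum('ij,jk,ik->i', diff, precision, diff)
        order = np.argsort(dist)
        if len(order) > 1 and dist[order[0]] < ratio ** 2 * dist[order[1]]:
            matches.append((i, order[0]))
    return matches
```

Because the metric is learned off-line from the target image's synthesized views, matching a new input image only requires standard SIFT extraction plus the distance computation above.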
Effectiveness of our trained Mahalanobis metric for feature matching among images with large viewpoint changes. (Left) Images [3] capture the same object (a wall) from different viewpoints. The large appearance changes on the image plane prevent accurate feature matching. (Center) 2D PCA projection of SIFT descriptors [1] extracted from the two images on the left. Highlighted points indicate typical corresponding descriptors (circles: from the top image, squares: from the bottom image). The descriptors vary due to the appearance changes. (Right) 2D PCA projection of the descriptors projected into our trained Mahalanobis space. Our metric clearly decreases the relative distances between correctly corresponding descriptors.
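The visualization above can be reproduced along the following lines, assuming the precision matrix from the learning step: descriptors are whitened by a Cholesky factor of the metric (so Euclidean distance in the new space equals the Mahalanobis distance) and then projected to 2D with PCA. This is a sketch of the plotting procedure, not the paper's exact code.

```python
import numpy as np

def project_to_mahalanobis_space(descriptors, precision):
    """Map descriptors so that Euclidean distance equals the learned Mahalanobis distance."""
    L = np.linalg.cholesky(precision)      # precision = L @ L.T
    return descriptors @ L

def pca_2d(points):
    """Project points onto their two leading principal components."""
    centered = points - points.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```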
Typical matching results between images in [3] obtained by ASIFT (left) and our method (right). ASIFT yields 432 correct matches out of 582 matches (74.2%), whereas our method yields 105 correct matches out of 134 matches (78.4%). Our method obtains sufficient matches with high precision.
Typical matching results between images in [3] obtained by ASIFT (left) and our method (right). ASIFT yields 672 correct matches out of 725 matches (92.7%), whereas our method yields 57 correct matches out of 59 matches (96.6%). Our method obtains sufficient matches with high precision.
Hajime Taira, Akihiko Torii and Masatoshi Okutomi, "Robust Feature Matching by Learning Descriptor Covariance with Viewpoint Synthesis," Proceedings of the 23rd International Conference on Pattern Recognition (ICPR 2016), pp. 1954-1959, December 2016. [POSTER]
[1] Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.
[2] Morel, Jean-Michel, and Guoshen Yu. "ASIFT: A new framework for fully affine invariant image comparison." SIAM Journal on Imaging Sciences 2.2 (2009): 438-469.
[3] Mikolajczyk, Krystian, and Cordelia Schmid. "Scale & affine invariant interest point detectors." International Journal of Computer Vision 60.1 (2004): 63-86.