IntroductionRecognizing objects in images is one of the most important problems in computer vision. A common approach is to first extract the feature descriptions of the objects to be recognized from reference images, and store such descriptions in a database. When there is a new image, its feature descriptions are extracted and compared to the object descriptions in the database to see if the image contains any object we are looking for. In reallife applications, the objects in the images to be processed can differ from the reference images in many way:
Scaleinvariant feature transform (SIFT) is an algorithm for extracting stable feature description of objects call keypoints that are robust to changes in scale, orientation, shear, position, and illumination.
The AlgorithmThe SIFT algorithm contains
In this assignment, we exclude the last step descriptor generation as it is only a post processing of the keypoint information.Scale Space ConstructionIn SIFT, an image pyramid is built as the first step: the image is process with multiple Gaussian filters. The output of each filter is called a scale. Then the algorithm moves on to the next octave by downscale the image by a factor of 2, and produce another S Gaussian blurred images. This process continues until the image size reach certain limit. The scale space contributes to the scale invariance of SIFT. In the reference implementation, the scale space construction is separated into two steps. The first step is to build the octave bases. An image is downsampled into multiple octaves, each has half of the width and height of the octave at a lower level. An image is downsampled multiple times to create as many octaves as possible, but the top two octaves are discarded due to their small size. The number of octaves O = log2(min(width, height))  2. The second step is build the scale space from octave bases. Each octave is smoothed with multiple Gaussian filters of different sigma values to create multiple scales of the image. In the reference implementation, the scaleperoctave value S = 6, which means each octave will be smoothed by 6 different Gaussian kernels to create 6 different scales of the image. DifferenceofGaussianThe DifferenceofGaussian (DoG) images are taken from adjacent Gaussianblurred images per octave. The follow figure illustrate the idea of scale space construction and DoG calculation. For each octave, images of adjacent scales will subtract each other to create Difference of Gaussian (DoG) images. There will be Ox(S1) number of DoG images in total. Extreme Points Extractionkeypoints are identified as local minima/maxima of the DoG images across scales. This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and nine corresponding neighboring pixels in each of the neighboring scales. If the pixel value is the maximum or minimum among all compared pixels, it is selected as a candidate keypoint. Keypoint LocalizationExtreme points extraction usually produces too many keypoint candidates. The following two kinds of candidates are eliminated:
Orientation AssignmentThe orientation assignment stage calculates orientation of each keypoint base on gradient. The magnitude and direction calculations for the gradient are done for every pixel in a neighboring window around the keypoint in the Gaussianblurred image. Each pixel votes in a orientation histogram. In the provided reference implementation, the algorithm only picks the maximum orientation in the histogram as the orientation of a keypoint. More advanced implementations, e.g. sift++, lowe's sift, etc., consider all dominant orientations (>80% peak) as the orientations of a keypoint. In all these implementations, the orientation assignment is to ensure the keypoints are invariant to rotation. Reference ImplementationA reference implementation of the SIFT algorithm in C is available. You can get it in two ways: We recommend you to use the repository if you have an account on the server. SIFT related source files are in src/. Codes not directly related to SIFT are located in util/. You should focus on the codes in src/. The reference program produces two outputs that can be used for verification:
Remarks
Feel free to contact us if you find any problem in the reference code.
Useful Links

Assignments >