In this project we implement a highly parallel version of the Binary Robust Independent Elementary Features (BRIEF) feature descriptor using CUDA. The complete keypoint-matching pipeline runs in parallel on the GPU.
In the context of image processing, keypoints are pixel positions that are "significant" to the image. By significant we mean that the point is rich in unique information, so that if we translate the image, change its illumination, or rotate it, we can still identify the same point.
Corners, for instance, are a classic example of keypoints, because we can still identify them even if the brightness of the image changes or the corner moves to a different position in the image.
One typical rule of thumb to determine whether a point is a keypoint is to check whether there is "change" across all directions surrounding the pixel. The following image (taken from the 16720 CMU course slides) describes three cases typically considered in keypoint detectors. It is important to note that a point may be a keypoint without necessarily being a corner.
The image below illustrates the keypoints detected in the image of a helicopter:
Keypoints are used extensively in computer vision for different applications such as:
Object matching
Given the template of an object, determine whether it is present in an image and, if so, find its position.
Image homography estimation
Given two images, find out how to "go" from one image to the other, i.e. the translation, rotation, and scaling between the two. This can be used, for example, to stitch multiple images into a panoramic view.
Once the keypoints in an image have been identified, the next step consists of assigning a "descriptor" to each keypoint that enables us to match that same keypoint across different images.
There is a plethora of feature descriptors available, each providing different characteristics. Some of the most popular feature descriptors used in computer vision are:
The process of keypoint matching can be divided into the following stages. Our plan is to implement ALL of the stages of the pipeline in parallel using CUDA:
In this stage the input is an image and the output is a set of image coordinates containing the keypoints. There are many algorithms to find keypoints; we are using the Difference of Gaussians (DoG) detector, whose steps are described below:
1. Applying several (typically 8) Gaussian filters at different scales to the image.
2. (i) Stacking the N filtered images vertically
(ii) Subtracting each image from the "image above" it on the stack, resulting in a stack of N-1 difference images.
(iii) Marking every point that is a local maximum (across its neighbors in the same scale and in contiguous scales) as a candidate keypoint.
3. Eliminate edges: some of the candidates detected in the previous stage may correspond to edges rather than true keypoints. To eliminate them, we apply a test called the "curvature test": we compute the Hessian H of each candidate pixel and calculate the ratio R = Tr(H)^2 / Det(H). If R is above a given threshold, the point lies along an edge and we discard it.
The images below show the keypoints before and after applying the curvature test.
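To give an idea of how this stage could map to CUDA, below is a minimal sketch under our planned design. The kernel and helper names (dogKernel, passesCurvatureTest) and the memory layout are assumptions for illustration, not the final implementation: one thread per pixel computes one level of the DoG stack, and a device helper applies the curvature test using finite-difference estimates of the Hessian.

```cpp
// Illustrative sketch, not the final implementation.
// One thread per pixel builds one level of the DoG stack by subtracting
// two adjacent Gaussian-blurred versions of the image.
__global__ void dogKernel(const float* blurFine, const float* blurCoarse,
                          float* dog, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;
    dog[idx] = blurCoarse[idx] - blurFine[idx];
}

// Curvature ("edge") test: approximate the 2x2 Hessian of the DoG at a
// candidate with finite differences; edges have one large principal
// curvature, which makes Tr(H)^2 / Det(H) large, so we reject those points.
// Assumes the candidate is at least one pixel away from the image border.
__device__ bool passesCurvatureTest(const float* dog, int width,
                                    int x, int y, float rThresh) {
    float dxx = dog[y*width + x+1] - 2.0f*dog[y*width + x] + dog[y*width + x-1];
    float dyy = dog[(y+1)*width + x] - 2.0f*dog[y*width + x] + dog[(y-1)*width + x];
    float dxy = 0.25f * (dog[(y+1)*width + x+1] - dog[(y+1)*width + x-1]
                       - dog[(y-1)*width + x+1] + dog[(y-1)*width + x-1]);
    float tr  = dxx + dyy;
    float det = dxx * dyy - dxy * dxy;
    if (det <= 0.0f) return false;       // curvatures of opposite sign: reject
    return (tr * tr) / det < rThresh;    // keep only corner-like candidates
}
```

In practice the Gaussian filtering itself (stage 1) would also run on the GPU, for instance as a separable convolution kernel.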
After finding all of the keypoints in the image, we need to assign a "tag" that uniquely describes each keypoint. In this project we implement the BRIEF feature descriptor, which has the following characteristics:
Below we describe the steps used to create the descriptor for each keypoint using BRIEF:
1. (i) Select a patch of pixels around the keypoint (5x5 or 7x7 for example)
(ii) Number all of the pixels in the patch, except for the keypoint
2. Randomly select a set of pairs from the numbered pixels around the keypoint (typically 128 or 256 pairs). This preselected set is then used for EVERY keypoint in all images.
3. Let I(P) be the intensity of pixel 'P'. We then create a binary feature descriptor by applying the following test to each pair (P1, P2): the bit is 0 if I(P1) > I(P2), and 1 otherwise.
In other words, we check whether the intensity of the first member of the pair is greater than that of the second. If so, we set the bit for that pair to '0', and to '1' otherwise. This creates a bit vector describing the keypoint, with length equal to the number of preselected pairs.
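As a rough sketch of how the descriptor computation could be parallelized, the kernel below assigns one thread per keypoint and packs the pair tests into 32-bit words. The struct and kernel names (TestPair, briefKernel) and the layout of the preselected pairs are illustrative assumptions; the sketch also assumes keypoints lie far enough from the image border for every pair offset to be valid.

```cpp
// Illustrative sketch: one thread per keypoint builds its BRIEF descriptor.
struct TestPair { int dx1, dy1, dx2, dy2; };   // offsets of the two pixels

__global__ void briefKernel(const unsigned char* img, int width,
                            const int2* keypoints, int numKeypoints,
                            const TestPair* pairs, int numPairs,
                            unsigned int* descriptors) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= numKeypoints) return;
    int2 kp = keypoints[k];
    int words = numPairs / 32;                 // e.g. 256 pairs -> 8 words
    for (int w = 0; w < words; ++w) {
        unsigned int word = 0;
        for (int b = 0; b < 32; ++b) {
            TestPair p = pairs[w * 32 + b];
            unsigned char i1 = img[(kp.y + p.dy1) * width + (kp.x + p.dx1)];
            unsigned char i2 = img[(kp.y + p.dy2) * width + (kp.x + p.dx2)];
            if (i1 <= i2)                      // bit is 0 when I(P1) > I(P2)
                word |= (1u << b);
        }
        descriptors[k * words + w] = word;
    }
}
```

Because the same preselected pairs are used for every keypoint, they could also be placed in constant memory, which is well suited for data read uniformly by all threads.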
Now that each keypoint has a feature descriptor, we need to match keypoints between images. With BRIEF, this is usually done by computing the Hamming distance between the descriptor of a given keypoint in the first image and the descriptors of all keypoints in the second image, and selecting the one with the smallest distance.
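A brute-force version of this matching step maps naturally to CUDA: one thread per descriptor of the first image scans all descriptors of the second image, computing each Hamming distance with XOR and the __popc population-count intrinsic. The sketch below is illustrative only; the name matchKernel and the descriptor layout are our assumptions.

```cpp
// Illustrative sketch: for each descriptor of image 1, find the index of
// the descriptor of image 2 with the smallest Hamming distance.
__global__ void matchKernel(const unsigned int* desc1, int n1,
                            const unsigned int* desc2, int n2,
                            int wordsPerDesc, int* bestMatch) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n1) return;
    int best = -1;
    int bestDist = 1 << 30;                    // larger than any possible distance
    for (int j = 0; j < n2; ++j) {
        int dist = 0;
        for (int w = 0; w < wordsPerDesc; ++w)
            dist += __popc(desc1[i * wordsPerDesc + w] ^
                           desc2[j * wordsPerDesc + w]);
        if (dist < bestDist) { bestDist = dist; best = j; }
    }
    bestMatch[i] = best;
}
```

A common refinement is to also compare the best and second-best distances and reject matches where the two are too close, since such matches are ambiguous.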
We will implement the code in C++ with CUDA, offloading the heavy computations to the GPU. We chose C++ because it integrates easily with OpenCV, which is itself written in C++, and because C and C++ are considerably faster than most languages for tasks such as image processing while giving us finer control over memory.