cu-BRIEF
Abhinit Modi: abhinitm@andrew.cmu.edu Computer Science Department
Luis Fernando Fraga Gonzalez: lfragago@andrew.cmu.edu Computer Science Department
Implement an efficient keypoint detection and description algorithm and use it in an application that tracks logos on moving automobiles in real time (30 fps). Develop the entire pipeline from scratch on the GPU to allow custom optimizations, yielding a keypoint tracker that is largely independent of OpenCV.
Keypoints are spatial locations, or points in an image, that define what is interesting or what stands out. No matter how the image changes, whether it rotates, shrinks or expands, is translated, or is subject to distortion, we should be able to find the same keypoints. Matching keypoints is a common way to match two objects across images. The pipeline has three major stages: keypoint detection, keypoint description, and descriptor matching. Here is a high-level view of the algorithm we have adopted.
Detection
We use the difference-of-Gaussians (DoG) method to locate keypoints in an image: the image is smoothed with Gaussians at two nearby scales, the smoothed images are subtracted, and pixels where the response is a local extremum above a threshold are taken as keypoints.
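The detection stage can be sketched on the CPU as follows. This is a reference sketch, not the project's actual GPU code; function names, parameters, and the border-clamping policy are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// CPU reference sketch of difference-of-Gaussians keypoint detection.
// The GPU version performs the same steps, one thread per pixel.

using Image = std::vector<float>; // row-major W*H grayscale

static Image gaussian_blur(const Image& src, int W, int H, float sigma) {
    // Build a normalized 1D Gaussian kernel of radius ~3*sigma.
    int r = (int)std::ceil(3.0f * sigma);
    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-0.5f * i * i / (sigma * sigma));
        sum += k[i + r];
    }
    for (float& v : k) v /= sum;

    // Separable convolution: horizontal pass, then vertical pass.
    Image tmp(W * H, 0.0f), dst(W * H, 0.0f);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i) {
                int xi = std::min(std::max(x + i, 0), W - 1); // clamp at borders
                acc += k[i + r] * src[y * W + xi];
            }
            tmp[y * W + x] = acc;
        }
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (int i = -r; i <= r; ++i) {
                int yi = std::min(std::max(y + i, 0), H - 1);
                acc += k[i + r] * tmp[yi * W + x];
            }
            dst[y * W + x] = acc;
        }
    return dst;
}

// Keypoints are pixels where the DoG response is a strict local
// extremum in its 3x3 neighborhood and exceeds a threshold.
std::vector<std::pair<int, int>> detect_dog(const Image& img, int W, int H,
                                            float s1, float s2, float thresh) {
    Image a = gaussian_blur(img, W, H, s1), b = gaussian_blur(img, W, H, s2);
    Image dog(W * H);
    for (int i = 0; i < W * H; ++i) dog[i] = a[i] - b[i];

    std::vector<std::pair<int, int>> pts;
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            float v = dog[y * W + x];
            if (std::fabs(v) < thresh) continue;
            bool extremum = true;
            for (int dy = -1; dy <= 1 && extremum; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    float n = dog[(y + dy) * W + (x + dx)];
                    if ((v > 0 && n >= v) || (v < 0 && n <= v)) { extremum = false; break; }
                }
            if (extremum) pts.emplace_back(x, y);
        }
    return pts;
}
```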
Description
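Our descriptors are BRIEF-style binary strings: each bit records an intensity comparison between a fixed pair of offsets inside a patch around the keypoint. Below is a minimal CPU sketch; the random sampling pattern, helper names, and sizes are illustrative (real BRIEF uses a fixed precomputed pattern and smooths the image first).

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Sketch of BRIEF-style binary description: bit i of the descriptor is
// 1 iff the pixel at offset pair (dx1,dy1)[i] is darker than the pixel
// at (dx2,dy2)[i], both measured relative to the keypoint.

struct PairPattern { std::vector<int> dx1, dy1, dx2, dy2; };

// Generate a fixed pattern of `bits` offset pairs inside a
// (2*half+1) x (2*half+1) patch. Seeded so the pattern is reproducible.
PairPattern make_pattern(int bits, int half, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> d(-half, half);
    PairPattern p;
    for (int i = 0; i < bits; ++i) {
        p.dx1.push_back(d(rng)); p.dy1.push_back(d(rng));
        p.dx2.push_back(d(rng)); p.dy2.push_back(d(rng));
    }
    return p;
}

// Describe keypoint (x, y). The caller must ensure the keypoint lies at
// least `half` pixels away from the image border.
std::vector<uint64_t> describe(const std::vector<float>& img, int W,
                               int x, int y, const PairPattern& p) {
    int bits = (int)p.dx1.size();
    std::vector<uint64_t> desc((bits + 63) / 64, 0);
    for (int i = 0; i < bits; ++i) {
        float a = img[(y + p.dy1[i]) * W + (x + p.dx1[i])];
        float b = img[(y + p.dy2[i]) * W + (x + p.dx2[i])];
        if (a < b) desc[i / 64] |= (uint64_t)1 << (i % 64);
    }
    return desc;
}
```

Because the descriptor is just packed comparison bits, it is cheap to compute per keypoint and very cheap to match (see the Hamming-distance matcher below in the pipeline).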
Matching
We used four different benchmark images (containing different numbers of keypoints) at eight different scales: 32x32, 64x64, 128x128, 256x256, 512x512, 1024x1024, 2048x2048, and 4096x4096.
Accuracy was verified by comparing the keypoint coordinates obtained for the data set against OpenCV's results.
After profiling the baseline version and a crude implementation, we found that keypoint detection was the slowest stage and hence the bottleneck in the pipeline.
Premature Optimization
Pitfall: too many of the computations performed were redundant. Values that had already been computed and fetched were fetched and computed again instead of being reused.
Max Speed up: 1.6x.
An interesting observation is the drop in speedup for images larger than 512x512 pixels: these no longer fit in the cache and therefore require main-memory reads and writes.
Naïve implementation
Max Speed up: 14x
Leveraging locality
Max Speed up: 18x
Loop Unrolling
The convolution and some of the loops in the CUDA kernel were optimized using loop unrolling; an unroll factor of 2 was found to be the most efficient.
Max Speed up: 21x
Adopting the convolution optimizations used in Halide.
Max Speed up: 25x
Shared Memory
Workload imbalance
Max Speed up: 70x
Reused the CUDA feature matcher provided by OpenCV, which matches binary descriptors by Hamming distance.
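Conceptually, the matcher is a brute-force nearest-neighbor search under Hamming distance (XOR the packed descriptors, count the differing bits). The CPU sketch below mirrors that behavior; the function names are illustrative, not OpenCV's API.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

using Desc = std::vector<uint64_t>; // e.g. a 256-bit descriptor = 4 words

// Hamming distance between two packed binary descriptors of equal length.
int hamming(const Desc& a, const Desc& b) {
    int d = 0;
    for (size_t i = 0; i < a.size(); ++i)
        d += (int)std::bitset<64>(a[i] ^ b[i]).count(); // bits that differ
    return d;
}

// Brute-force match: for each query descriptor, the index of the train
// descriptor at minimum Hamming distance (-1 if train is empty).
std::vector<int> match(const std::vector<Desc>& query,
                       const std::vector<Desc>& train) {
    std::vector<int> best(query.size(), -1);
    for (size_t q = 0; q < query.size(); ++q) {
        int bestDist = 1 << 30;
        for (size_t t = 0; t < train.size(); ++t) {
            int d = hamming(query[q], train[t]);
            if (d < bestDist) { bestDist = d; best[q] = (int)t; }
        }
    }
    return best;
}
```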
Below are the frame rates we measured for logo detection on videos. The videos can be found here
Real-time processing requires 30 frames per second. Since we are able to process at 62 frames per second, we can potentially detect and process keypoints in HD videos in real time.
The above graphs depict the speedups obtained by incrementally adding optimization techniques. For this application, workload balancing gives the largest speedup.
https://github.com/opencv/opencv
https://github.com/opencv/opencv_contrib/
https://gilscvblog.com/2013/08/26/tutorial-on-binary-descriptors-part-1/
http://opencv.org/platforms/cuda.html
Numerous articles on key point detectors and descriptors.
Both members contributed equally to the project.