cu-BRIEF
Abhinit Modi: abhinitm@andrew.cmu.edu Computer Science Department
Luis Fernando Fraga Gonzalez: lfragago@andrew.cmu.edu Computer Science Department
In this project we implement a highly parallel GPU version of the Difference of Gaussians (DoG) keypoint detector and the Binary Robust Independent Elementary Features (BRIEF) descriptor. The complete keypoint matching pipeline (detection, description and matching) is implemented in parallel using CUDA. The final implementation will be used to track car logos in video recordings. We will evaluate our CUDA GPU version against the highly optimized serial OpenCV version.
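To make the description and matching stages concrete, here is a minimal serial C++ sketch of a BRIEF-style binary descriptor with brute-force Hamming matching. It is an illustration, not our CUDA code: the 256-bit length, the 15-pixel patch radius, the uniform random test pattern and all names (`Image`, `makePattern`, `describe`, `hamming`, `bestMatch`) are assumptions for the example, and the image smoothing that BRIEF applies before the intensity tests is omitted.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Minimal 8-bit grayscale image stored row-major.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// One binary test: compare the pixel at offset (px, py) with the pixel at
// offset (qx, qy), both relative to the keypoint.
struct PointPair {
    int px, py, qx, qy;
};

constexpr int kBits = 256;        // descriptor length in bits (assumed)
constexpr int kPatchRadius = 15;  // tests are sampled inside a 31x31 patch (assumed)

using Descriptor = std::bitset<kBits>;

// Draw the test pattern once; every keypoint reuses the same pattern so that
// bit i encodes the same comparison in every descriptor.
std::vector<PointPair> makePattern(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> offset(-kPatchRadius, kPatchRadius);
    std::vector<PointPair> pattern(kBits);
    for (PointPair& t : pattern) {
        t = {offset(rng), offset(rng), offset(rng), offset(rng)};
    }
    return pattern;
}

// Bit i is set when I(p_i) < I(q_i). The keypoint must lie at least
// kPatchRadius pixels away from every image border.
Descriptor describe(const Image& img, int kx, int ky, const std::vector<PointPair>& pattern) {
    assert(kx >= kPatchRadius && ky >= kPatchRadius &&
           kx + kPatchRadius < img.width && ky + kPatchRadius < img.height);
    Descriptor d;
    for (int i = 0; i < kBits; ++i) {
        const PointPair& t = pattern[i];
        d[i] = img.at(kx + t.px, ky + t.py) < img.at(kx + t.qx, ky + t.qy);
    }
    return d;
}

// Hamming distance: the number of bits in which two descriptors differ.
int hamming(const Descriptor& a, const Descriptor& b) {
    return static_cast<int>((a ^ b).count());
}

// Brute-force matching: index of the train descriptor closest to the query,
// or -1 if the train set is empty.
int bestMatch(const Descriptor& query, const std::vector<Descriptor>& train) {
    int best = -1;
    int bestDist = kBits + 1;
    for (int j = 0; j < static_cast<int>(train.size()); ++j) {
        int dist = hamming(query, train[j]);
        if (dist < bestDist) {
            bestDist = dist;
            best = j;
        }
    }
    return best;
}
```

Every descriptor, and every query/train distance, is computed independently of the others, which is why these two stages are natural candidates for spreading across GPU threads.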
The most computationally expensive part of the pipeline is keypoint detection, because it applies multiple filters to the entire image and searches a 3D (x, y, scale) pyramid, so our analysis focuses on this stage. For now, we report performance for this stage only.
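The serial C++ sketch below shows, under simplifying assumptions, why detection dominates the cost: every scale level needs a full-image Gaussian blur, and every pixel of every interior DoG level is compared with its 26 neighbours in the 3x3x3 (x, y, scale) cube. It covers a single octave and skips subpixel refinement and edge rejection; the names (`Gray`, `gaussianBlur`, `isExtremum`, `detectDoG`) and parameters (`sigma0`, `k`, `numScales`, `threshold`) are hypothetical and do not come from our CUDA code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Single-channel float image stored row-major.
struct Gray {
    int w;
    int h;
    std::vector<float> v;
    float at(int x, int y) const { return v[y * w + x]; }
};

struct Keypoint {
    int x, y, level;  // pixel position and DoG level index
};

static int clampInt(int a, int lo, int hi) { return a < lo ? lo : (a > hi ? hi : a); }

// Separable Gaussian blur (horizontal pass, then vertical), clamping at the borders.
Gray gaussianBlur(const Gray& in, float sigma) {
    const int r = static_cast<int>(std::ceil(3.0f * sigma));
    std::vector<float> kernel(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        kernel[i + r] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += kernel[i + r];
    }
    for (float& c : kernel) c /= sum;  // normalize so a constant image stays constant

    Gray tmp{in.w, in.h, std::vector<float>(in.v.size())};
    Gray out{in.w, in.h, std::vector<float>(in.v.size())};
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i) s += kernel[i + r] * in.at(clampInt(x + i, 0, in.w - 1), y);
            tmp.v[y * in.w + x] = s;
        }
    for (int y = 0; y < in.h; ++y)
        for (int x = 0; x < in.w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i) s += kernel[i + r] * tmp.at(x, clampInt(y + i, 0, in.h - 1));
            out.v[y * in.w + x] = s;
        }
    return out;
}

// True if dog[s](x, y) is strictly greater or strictly smaller than all 26
// neighbours in the 3x3x3 cube spanning levels s-1, s and s+1.
bool isExtremum(const std::vector<Gray>& dog, int s, int x, int y) {
    const float c = dog[s].at(x, y);
    bool isMax = true, isMin = true;
    for (int ds = -1; ds <= 1; ++ds)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (ds == 0 && dy == 0 && dx == 0) continue;
                const float n = dog[s + ds].at(x + dx, y + dy);
                if (n >= c) isMax = false;
                if (n <= c) isMin = false;
            }
    return isMax || isMin;
}

// One octave: blur with sigma0 * k^i for i = 0..numScales-1, subtract adjacent
// blurs to get numScales-1 DoG levels, then keep 3D extrema whose absolute
// response exceeds threshold. The first and last DoG levels only act as neighbours.
std::vector<Keypoint> detectDoG(const Gray& img, float sigma0, float k, int numScales, float threshold) {
    assert(numScales >= 4);  // at least three DoG levels are needed
    std::vector<Gray> blurred;
    for (int i = 0; i < numScales; ++i)
        blurred.push_back(gaussianBlur(img, sigma0 * std::pow(k, static_cast<float>(i))));

    std::vector<Gray> dog;
    for (int i = 0; i + 1 < numScales; ++i) {
        Gray d{img.w, img.h, std::vector<float>(img.v.size())};
        for (int p = 0; p < static_cast<int>(d.v.size()); ++p) d.v[p] = blurred[i + 1].v[p] - blurred[i].v[p];
        dog.push_back(d);
    }

    std::vector<Keypoint> keypoints;
    for (int s = 1; s + 1 < static_cast<int>(dog.size()); ++s)
        for (int y = 1; y + 1 < img.h; ++y)
            for (int x = 1; x + 1 < img.w; ++x)
                if (std::fabs(dog[s].at(x, y)) > threshold && isExtremum(dog, s, x, y))
                    keypoints.push_back({x, y, s});
    return keypoints;
}
```

Each blur pass and each extremum test reads only a small neighbourhood of the input, so both could map onto one GPU thread per pixel; the per-level blurs are also where SIMD-optimized OpenCV filters give the serial baseline its strength.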
The keypoint detector was run on an NVIDIA GTX 1080 GPU using CUDA, and a serial version was run on a virtual machine with two Intel i5 cores and AVX instructions. The serial version uses OpenCV's filtering functions, which are highly optimized with SIMD instructions. We tested the system on four typical benchmark images, each at sizes 32x32, 64x64, 128x128, 256x256, 512x512, 1024x1024, 2048x2048 and 4096x4096.
So far we have tried three different approaches; the results below are the best we have obtained to date.
We processed a short video and tracked a logo both with the CUDA version and with the "OpenCV non-free" contrib version running on a single Intel i7 core. For each version we measured the frame processing rate and generated an output video that simulates this frame rate. The videos below show the results:
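The report does not state exactly how the simulated videos were produced. One plausible approach, sketched below in C++ as an assumption, is sample-and-hold: compute the processing rate, then let each output frame (played back at the source frame rate) repeat the most recently processed source frame, so frames the tracker could not keep up with are skipped. The function names are hypothetical, and processing latency is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Frames per second achieved by a tracker that processed `frames` frames in `seconds`.
double processingRate(int frames, double seconds) {
    assert(frames > 0 && seconds > 0.0);
    return frames / seconds;
}

// For each frame of an output video played at sourceFps, return the index of the
// source frame shown there if the tracker only keeps up at processingFps.
// Sample-and-hold: a processed frame stays on screen until the next result is
// ready, and the source frames that arrive in between are dropped.
std::vector<int> simulatedFrameIndices(int numSourceFrames, double sourceFps, double processingFps) {
    assert(sourceFps > 0.0 && processingFps > 0.0);
    // Source frames that elapse while one frame is processed (never below 1).
    const double step = std::max(1.0, sourceFps / processingFps);
    std::vector<int> shown(numSourceFrames);
    for (int i = 0; i < numSourceFrames; ++i) {
        const double slot = std::floor(i / step);                // processing slot covering frame i
        shown[i] = static_cast<int>(std::floor(slot * step));   // frame grabbed when that slot began
    }
    return shown;
}
```

For example, a tracker running at 10 fps on a 30 fps video would show source frames 0, 0, 0, 3, 3, 3, 6, 6, 6, ..., while a tracker faster than the source rate shows every frame. The resulting index list could then be written out with any video writer, such as OpenCV's `cv::VideoWriter`.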
OpenCV version
CUDA version
Below is the same video processed using the GPU CUDA version.
Please see https://sites.google.com/view/15418-parallel-brief/final-report for the final results. Below is the speedup obtained after all optimizations were completed.