We implement template-based speed-limit-sign recognition on an FPGA using the Intel FPGA SDK for OpenCL and evaluate its performance against a GPU implementation based on the system presented in our previous study. We discuss the implementation differences between the FPGA and GPU systems and the optimizations applied in the FPGA version. We also present a methodology for comparing FPGA and GPU results, along with lessons learned from our experiments. In the course of the FPGA implementation, we built an efficient FFT engine for image processing on the FPGA that other developers can reuse for related tasks. We conclude that the FPGA implementation achieves lower power consumption at the same detection accuracy, while the GPU offers better programmer productivity.
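This summary does not specify how the template matching uses the FFT. For illustration only, here is a minimal pure-Python sketch of one common approach: circular cross-correlation computed as IFFT(FFT(image) · conj(FFT(template))), with the correlation peak giving the template's top-left offset. The names `fft`, `fft2`, and `match_template` are hypothetical and do not describe our FPGA kernel or its OpenCL interface. Image dimensions are assumed to be powers of two, and a practical sign detector would typically also normalize the correlation.

```python
import cmath


def fft(x, inverse=False):
    """Iterative radix-2 Cooley-Tukey FFT (illustrative; len(x) must be a power of two)."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Reorder inputs into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly stages of size 2, 4, ..., n.
    sign = 1 if inverse else -1
    size = 2
    while size <= n:
        w_step = cmath.exp(sign * 2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size <<= 1
    if inverse:
        a = [v / n for v in a]
    return a


def fft2(img, inverse=False):
    """2-D FFT by row-column decomposition: 1-D FFT on every row, then every column."""
    rows = [fft(r, inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def match_template(image, template):
    """Hypothetical helper: return (row, col) maximizing circular cross-correlation."""
    h, w = len(image), len(image[0])
    # Zero-pad the template to the image size.
    padded = [[0.0] * w for _ in range(h)]
    for y, row in enumerate(template):
        for x, v in enumerate(row):
            padded[y][x] = v
    fi = fft2(image)
    ft = fft2(padded)
    # Multiply by the conjugate spectrum: correlation in the spatial domain.
    prod = [[a * b.conjugate() for a, b in zip(ri, rt)] for ri, rt in zip(fi, ft)]
    corr = fft2(prod, inverse=True)
    _, best_y, best_x = max((corr[y][x].real, y, x) for y in range(h) for x in range(w))
    return best_y, best_x
```

For example, embedding a 2×2 template at offset (3, 5) in an otherwise empty 8×8 image makes `match_template` return `(3, 5)`. The row-column decomposition reuses one 1-D FFT routine for both passes, which is also the usual way to build a 2-D FFT from a 1-D engine.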
full paper: to be added...