Research

Many computer vision applications are complex enough that they do not run at real-time rates when implemented in software on traditional CPUs. The term real time is application specific, but it generally suggests that the algorithm can run at least 15 times per second. Accelerating vision applications involves multiple levels of optimization. The software implementation can be "tuned" using compiler flags. The algorithm itself can be adjusted to perform fewer operations, use approximations, work on downsampled data, operate at fewer scales, and so on. Additionally, the application can be partitioned across software and hardware.
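As a concrete example of working on downsampled data, the short C++ sketch below (an illustrative helper, not taken from any particular library) halves a grayscale frame's resolution by 2x2 box averaging, so that every per-pixel stage that follows touches only a quarter of the original pixels.

```cpp
// Illustrative sketch only: downsample an 8-bit grayscale frame by 2x2 box
// averaging so a downstream detector touches one quarter of the pixels.
#include <cstdint>
#include <vector>

std::vector<uint8_t> downsample2x(const std::vector<uint8_t>& src,
                                  int width, int height)
{
    const int dw = width / 2, dh = height / 2;
    std::vector<uint8_t> dst(static_cast<size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            // Average the 2x2 neighborhood in the source image.
            const int sx = 2 * x, sy = 2 * y;
            const int sum = src[sy * width + sx]       + src[sy * width + sx + 1]
                          + src[(sy + 1) * width + sx] + src[(sy + 1) * width + sx + 1];
            dst[y * dw + x] = static_cast<uint8_t>(sum / 4);
        }
    }
    return dst;
}
```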

When using hardware to accelerate vision applications, the predominant model is to use graphics processing units (GPUs). GPUs provide massive parallelism in a single program, multiple data (SPMD) paradigm, and with the rise of CUDA and OpenCL, using GPUs to accelerate vision applications has become much easier. Field Programmable Gate Arrays (FPGAs) are a less popular, though potentially more powerful, platform for accelerating vision applications. FPGAs are fully programmable, reconfigurable, ASIC-like devices that provide near-ASIC performance. They have no fixed data path or memory structure, which gives them the advantage of assuming whatever data path or memory structure the application requires.
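To make the SPMD model concrete, here is a minimal CUDA sketch in which every thread runs the same program on a different pixel, in this case a simple binary threshold. The kernel, launch geometry, and function names are illustrative only, not taken from any particular library.

```cuda
// Minimal CUDA sketch of the SPMD model: every thread runs the same program
// on a different pixel. Kernel and launch parameters are illustrative only.
#include <cstdint>

__global__ void threshold_kernel(const uint8_t* in, uint8_t* out,
                                 int width, int height, uint8_t t)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        const int idx = y * width + x;
        out[idx] = (in[idx] > t) ? 255 : 0;   // same code, different data
    }
}

// Host-side launch: one thread per pixel, 16x16 threads per block.
void threshold_gpu(const uint8_t* d_in, uint8_t* d_out,
                   int width, int height, uint8_t t)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    threshold_kernel<<<grid, block>>>(d_in, d_out, width, height, t);
}
```

Note that d_in and d_out are assumed to already reside in GPU memory; moving frame data there is exactly the latency cost discussed next.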

Each platform has its advantages and disadvantages. FPGAs are more difficult to program because their constructs are extremely low level (bit level) compared to GPU or CPU software. GPUs, on the other hand, are subject to high latency because video data must be acquired and moved into GPU memory before processing can begin. A comparison of CPUs, GPUs, and FPGAs is listed below.

Because each hardware platform has different characteristics, it makes sense to leverage the advantages of each when accelerating vision applications. To accomplish this, we are building a framework that combines CPU, GPU, and FPGA hardware to accelerate vision applications. We are calling it the Smart Frame Grabber.

                     CPU            FPGA                   GPU           Smart Frame Grabber
Processing units:    ~4             ~100                   ~1000         ~1000
Cycle frequency:     GHz            MHz (low)              MHz (high)    GHz
Programming:         High level     Low level              Low+ level    High level
Reusable libraries:  Countless      Few                    Few           Countless
Development cycle:   Short          Long                   Short         Short
Parallelism:         Opportunistic  Fine grain, dedicated  Massive       Massive, dedicated
Pipelining:          Sporadic       Fully                  Mostly        Fully
Scheduler:           Dynamic        Custom                 Dynamic       Custom
Memory:              Unlimited      Small                  Large         Unlimited
Memory latency:      Moderate       Low                    High          Low
Data access:         Fetch          Stream or fetch        Fetch         Stream or fetch
Data path:           Fixed          Custom                 Fixed         Custom
Computation model:   MPMD           MPMD                   SPMD          MPMD
Resource growth:     Slow           Fast                   Moderate      Fast
Reconfiguration:     Compile time   Run time               Compile time  Run time


The Smart Frame Grabber is a framework that runs on an off-the-shelf CPU running Linux and supports the addition of GPUs and FPGAs to accelerate functions in computer vision applications. It is tailored to vision applications through its reusable API (on the software side), GPU kernels, and FPGA IP cores. These reusable functions perform common vision tasks that are typically bottlenecks in vision applications; one example is sliding-window Haar feature extraction.
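To see why sliding-window Haar feature extraction becomes a bottleneck, consider that every window position requires several rectangle sums. The plain C++ sketch below is purely illustrative (it is not the framework's FPGA IP core or GPU kernel); it evaluates a single two-rectangle Haar feature in constant time from an integral image, and a full detector repeats this for thousands of windows and features per frame.

```cpp
// Illustrative CPU sketch of one two-rectangle Haar feature on an integral
// image. Not the framework's FPGA IP core or GPU kernel.
#include <cstdint>
#include <vector>

// Integral image with an extra zero row/column: ii[y][x] holds the sum of all
// pixels strictly above and to the left of (x, y).
std::vector<uint32_t> integral_image(const std::vector<uint8_t>& img,
                                     int w, int h)
{
    std::vector<uint32_t> ii(static_cast<size_t>(w + 1) * (h + 1), 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            ii[(y + 1) * (w + 1) + (x + 1)] =
                img[y * w + x]
                + ii[y * (w + 1) + (x + 1)]
                + ii[(y + 1) * (w + 1) + x]
                - ii[y * (w + 1) + x];
    return ii;
}

// Sum of the rectangle with top-left (x, y) and size rw x rh, in O(1)
// via four lookups into the integral image.
static uint32_t rect_sum(const std::vector<uint32_t>& ii, int stride,
                         int x, int y, int rw, int rh)
{
    return ii[(y + rh) * stride + (x + rw)] - ii[y * stride + (x + rw)]
         - ii[(y + rh) * stride + x]        + ii[y * stride + x];
}

// Two-rectangle (left minus right) Haar feature for one window position.
int haar_two_rect(const std::vector<uint32_t>& ii, int imgw,
                  int x, int y, int rw, int rh)
{
    const int stride = imgw + 1;
    const int left  = static_cast<int>(rect_sum(ii, stride, x,      y, rw, rh));
    const int right = static_cast<int>(rect_sum(ii, stride, x + rw, y, rw, rh));
    return left - right;
}
```

Each feature needs only a handful of additions and memory lookups, which is why the same computation maps naturally onto a deeply pipelined FPGA core or a grid of GPU threads.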

Smart Frame Grabber


The physical setup of the Smart Frame Grabber is illustrated in the diagram to the left. A camera is connected directly to the FPGA. The FPGA can process video as it is streamed in and send the CPU and/or GPU the video data and processing results. The GPU can process entire frames once they have been captured. The CPU drives the overall algorithm: it performs the dynamic calculations and acts as the glue between function calls to the GPU and FPGA. A PCIe bus connects the devices to the CPU and to each other.
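As a rough illustration of the CPU acting as glue, the loop below is hypothetical host-side code: every sfg_ name is invented for this sketch (and stubbed out so it runs) and is not part of the actual framework API. The idea is simply that the FPGA's streaming results arrive along with the frame, and the CPU decides, frame by frame, whether to invoke a heavier GPU stage.

```cpp
// Hypothetical sketch of the CPU-side "glue" loop; all sfg_ names below are
// illustrative, not the framework's actual API. The stubs stand in for the
// real FPGA and GPU paths.
#include <cstdio>

struct SfgFrame { int width = 640, height = 480; /* pixel data, FPGA results */ };

// Stub: in the real system the FPGA streams the frame (and any per-frame
// results it computed) to host memory over PCIe before this call returns.
static SfgFrame sfg_grab_frame() { return SfgFrame{}; }

// Stub: the FPGA-side detector flagged something worth a closer look.
static bool sfg_fpga_found_candidates(const SfgFrame&) { return true; }

// Stub: hand the captured frame to the GPU for heavier whole-frame work.
static void sfg_gpu_refine(const SfgFrame& f)
{
    std::printf("refining %dx%d frame on the GPU\n", f.width, f.height);
}

int main()
{
    for (int i = 0; i < 10; ++i) {                  // ten frames of the stream
        SfgFrame frame = sfg_grab_frame();          // FPGA already pre-processed it
        if (sfg_fpga_found_candidates(frame))
            sfg_gpu_refine(frame);                  // optional GPU stage
        // The CPU makes the dynamic decisions and stitches the results together.
    }
    return 0;
}
```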

Progress towards the Smart Frame Grabber framework will be described in the pages listed in the left navigation sidebar. Feel free to use the resources there to help accelerate your vision applications.