conv2

conv2 - 2-D convolution

Implementation

OpenCL

Usage

output = Convolution_ATI(input, kernel);
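For reference, the following plain-Python sketch shows what a 2-D convolution computes. It is not the OpenCL implementation. This page does not state the output size of `Convolution_ATI`, so the sketch assumes the "full" size that MATLAB's `conv2` uses by default: an M x N input and a P x Q kernel give an (M+P-1) x (N+Q-1) output. The name `conv2_full` is hypothetical.

```python
def conv2_full(inp, ker):
    """Full 2-D convolution of two row-major lists of lists (hypothetical reference).

    Output size is (M + P - 1) x (N + Q - 1) for an M x N input and P x Q kernel.
    """
    m, n = len(inp), len(inp[0])
    p, q = len(ker), len(ker[0])
    out = [[0.0] * (n + q - 1) for _ in range(m + p - 1)]
    # Scatter form: each input sample adds a kernel-weighted copy of itself
    # into the output, i.e. out[r][c] = sum inp[i][j] * ker[r - i][c - j].
    for i in range(m):
        for j in range(n):
            x = inp[i][j]
            for u in range(p):
                for v in range(q):
                    out[i + u][j + v] += x * ker[u][v]
    return out
```

For example, `conv2_full([[1, 2], [3, 4]], [[1, 1]])` returns `[[1.0, 3.0, 2.0], [3.0, 7.0, 4.0]]`. A simple reference like this is handy for checking GPU results on small inputs.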

Supported Data Types

The input, kernel, and output are all of type float (single precision).

Algorithm

The program processes 8 or 16 points along a column at a time. The entire convolution kernel is loaded into shared (local) memory once, up front. While each point along the column is computed, 512 points from one row of the input are loaded into shared memory. The input, output, and shared-memory buffers are accessed through the 4-component vector type (float4).

Memory clamping is used to remove "if" branches from the control flow, which improves performance. Parts of the code are also unrolled by hand for additional speed.
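To make two of these ideas concrete, the following pure-Python sketch (not the OpenCL kernel) computes the output in column strips of `block` points and reads the input only through clamped indices, so no "if" guards a load. The names `clamp`, `conv2_strip`, and `conv2_blocked`, the "full" output size, and the zero-padding mask are assumptions for illustration, not details taken from `Convolution_ATI`. Shared-memory staging, float4 vectorization, and manual unrolling are not modeled.

```python
BLOCK = 8  # outputs per column strip; the text mentions 8 or 16


def clamp(x, lo, hi):
    """Clamp x into [lo, hi]; OpenCL C provides this as the built-in clamp()."""
    return min(max(x, lo), hi)


def conv2_strip(inp, ker, r0, c, block=BLOCK):
    """Full-convolution outputs out[r0 .. r0 + block - 1][c] for one column strip.

    Every input read goes through clamped indices, so it is always in bounds
    and needs no "if" guard. A 0/1 mask (zero padding, assumed here) cancels
    the reads whose unclamped index fell outside the input.
    """
    m, n = len(inp), len(inp[0])
    p, q = len(ker), len(ker[0])
    acc = [0.0] * block
    for k in range(block):
        r = r0 + k
        for u in range(p):
            ri = r - u
            rc = clamp(ri, 0, m - 1)
            for v in range(q):
                ci = c - v
                cc = clamp(ci, 0, n - 1)
                mask = (0 <= ri < m) * (0 <= ci < n)  # 1 inside the input, else 0
                acc[k] += ker[u][v] * inp[rc][cc] * mask
    # Rows past the bottom of the output were computed harmlessly; drop them.
    rows_out = m + p - 1
    return acc[: max(0, min(block, rows_out - r0))]


def conv2_blocked(inp, ker, block=BLOCK):
    """Assemble the full (M+P-1) x (N+Q-1) output from column strips."""
    m, n = len(inp), len(inp[0])
    p, q = len(ker), len(ker[0])
    rows_out, cols_out = m + p - 1, n + q - 1
    out = [[0.0] * cols_out for _ in range(rows_out)]
    for c in range(cols_out):
        for r0 in range(0, rows_out, block):
            for k, val in enumerate(conv2_strip(inp, ker, r0, c, block)):
                out[r0 + k][c] = val
    return out
```

Because every read is clamped into range, the last strip may run past the bottom of the output without a bounds check; the extra rows are simply discarded. In OpenCL C the index clamp corresponds to the built-in `clamp()`, and the mask could be expressed with `select()` or a multiply.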