conv2
conv2 - 2-D convolution
Implementation
OpenCL
Usage
output = Convolution_ATI(input, kernel);
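As a rough reference for what the call computes, the sketch below implements a plain 2-D convolution on the CPU. It assumes "full" output semantics (output size = input size + kernel size - 1, as in MATLAB's default conv2); the page does not state which output shape Convolution_ATI returns, and the name conv2_full is hypothetical.

```python
def conv2_full(inp, kernel):
    """Reference 'full' 2-D convolution on lists of lists of floats.

    Assumption: 'full' output shape; the shape used by Convolution_ATI
    is not documented on this page.
    """
    in_rows, in_cols = len(inp), len(inp[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = in_rows + k_rows - 1
    out_cols = in_cols + k_cols - 1
    out = [[0.0] * out_cols for _ in range(out_rows)]
    # Scatter form: every input sample adds a scaled copy of the kernel
    # into the output, offset by the sample's position.
    for r in range(in_rows):
        for c in range(in_cols):
            for i in range(k_rows):
                for j in range(k_cols):
                    out[r + i][c + j] += inp[r][c] * kernel[i][j]
    return out


result = conv2_full([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
```

A slow reference like this can be useful for checking the output of the OpenCL version on small inputs.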
Data Type Supported
The output, input, and kernel are all of type float.
Algorithm
The program processes 8 or 16 points along a column at a time. The entire convolution kernel is loaded into shared memory at once. While computing each point along the column, the program loads 512 points from one row of the input data into shared memory. The input, output, and shared memory are all treated as 4-element vector data types. Memory clamping is used to eliminate "if" control flow and thereby improve performance. Some parts of the code are also manually unrolled for additional speed.
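The clamping idea can be illustrated with a small CPU sketch. This is one common way to remove a bounds check, not necessarily the one used in the OpenCL code: the row is stored with a zero guard cell on each side, and every read index is clamped into the padded range, so out-of-range reads land on a zero instead of being skipped by an "if". The function names and the guard-cell layout are assumptions made for illustration.

```python
def conv_row_branching(row, taps):
    """1-D 'full' convolution with an explicit bounds check per read."""
    n, k = len(row), len(taps)
    out = []
    for x in range(n + k - 1):
        acc = 0.0
        for j in range(k):
            src = x - j
            if 0 <= src < n:  # the branch that clamping removes
                acc += row[src] * taps[j]
        out.append(acc)
    return out


def conv_row_clamped(row, taps):
    """Same result, with index clamping instead of a bounds check.

    Assumption: one zero guard cell is stored on each side of the row.
    """
    n, k = len(row), len(taps)
    padded = [0.0] + list(row) + [0.0]
    out = []
    for x in range(n + k - 1):
        acc = 0.0
        for j in range(k):
            # Shift by 1 for the left guard cell, then clamp into
            # [0, n + 1] so out-of-range reads hit a zero guard cell.
            src = min(max(x - j + 1, 0), n + 1)
            acc += padded[src] * taps[j]
        out.append(acc)
    return out


row = [1.0, 2.0, 3.0]
taps = [1.0, 0.0, -1.0]
clamped = conv_row_clamped(row, taps)
```

On a GPU, the min/max pair typically maps to branch-free instructions, so all work-items follow the same instruction path at the cost of a few wasted multiplications by zero.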