imregionalmax

Implementation

OpenCL

Usage

output=imregionalmax_ATI(input)

Class Support

float is supported for both input and output.

Algorithm

Because the algorithm computes the maximum over each 3x3 neighborhood of input pixels, the kernel uses float4 as its data unit and stores the x and w components of each float4 in shared memory. Each thread reads one float4 from global memory, then reads the left neighbor's w component and the right neighbor's x component from shared memory, which reduces global memory accesses. Each thread also computes the results for several neighboring positions in the same column, which reduces global memory accesses further.
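
The access pattern described above can be sketched on the CPU. The following Python sketch is only an illustration and is not the OpenCL kernel itself: the function names (horizontal_max3_float4, max3x3), the use of plain lists in place of shared memory, and the border handling (out-of-image pixels are ignored) are all assumptions made for the example. It also assumes the image width is a multiple of 4.

```python
NEG_INF = float("-inf")  # assumed border value: out-of-image pixels never win the max


def horizontal_max3_float4(row):
    """Horizontal 3-pixel max of one row, processed in float4-style groups."""
    assert len(row) % 4 == 0, "sketch assumes the width is a multiple of 4"
    groups = [row[i:i + 4] for i in range(0, len(row), 4)]

    # Stand-in for shared memory: every "thread" publishes its x and w components.
    shared_x = [g[0] for g in groups]
    shared_w = [g[3] for g in groups]

    out = []
    for t, (x, y, z, w) in enumerate(groups):
        # Only the left neighbor's w and the right neighbor's x are needed.
        left_w = shared_w[t - 1] if t > 0 else NEG_INF
        right_x = shared_x[t + 1] if t + 1 < len(groups) else NEG_INF
        out += [max(left_w, x, y), max(x, y, z), max(y, z, w), max(z, w, right_x)]
    return out


def max3x3(image):
    """3x3 neighborhood maximum of a 2-D list of floats."""
    # Each row's horizontal result is computed once and reused by up to three
    # output rows, mirroring one thread serving several positions in a column.
    h_rows = [horizontal_max3_float4(r) for r in image]
    height = len(h_rows)
    result = []
    for r in range(height):
        window = h_rows[max(r - 1, 0):min(r + 2, height)]
        result.append([max(col) for col in zip(*window)])
    return result


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 9.0, 6.0, 7.0],
             [0.0, 1.0, 2.0, 8.0]]
    for line in max3x3(image):
        print(line)
```

In this sketch the horizontal pass needs only two extra values per float4 group, which is why storing just the x and w components in shared memory is sufficient.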