edge

Implementation

OpenCL

Usage

output = Edge_ATI(input)

Class Support

Both input and output are of type float.

Algorithm

Each output pixel is computed from a 3x3 neighborhood of input pixels.
The kernel uses float4 as its data unit and stores the x and w
components of each float4 in shared memory (local memory in OpenCL
terms). Each thread reads one float4 from global memory, then takes the
left neighbor's w and the right neighbor's x from shared memory instead
of reading them from global memory again. Each thread also computes the
outputs that several neighboring threads in the same column would
otherwise handle, which reduces global memory access further.
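
The access pattern described above can be sketched on the CPU in plain C. This is only an illustration under stated assumptions, not the Edge_ATI kernel itself: the source does not name the 3x3 operator, so a Sobel gradient with an |gx| + |gy| magnitude is assumed here, borders are assumed to be clamped, and the names float4, edge_xw, sobel3x3 and edge_float4 are hypothetical. The array reg stands in for per-thread registers and local for shared memory; the two inner loops over t correspond to the code before and after a work-group barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* CPU stand-in for the OpenCL float4 vector type. */
typedef struct { float x, y, z, w; } float4;

/* The two border components of a float4: all a neighbor ever needs. */
typedef struct { float x, w; } edge_xw;

/* ASSUMPTION: the operator is a Sobel gradient with |gx| + |gy| magnitude.
 * a, b, c point at three consecutive pixels of the top, middle, bottom row. */
static float sobel3x3(const float *a, const float *b, const float *c)
{
    float gx = (a[2] + 2.0f * b[2] + c[2]) - (a[0] + 2.0f * b[0] + c[0]);
    float gy = (c[0] + 2.0f * c[1] + c[2]) - (a[0] + 2.0f * a[1] + a[2]);
    return (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
}

/* in/out hold h rows of w4 float4 values (4 * w4 pixels per row).
 * Returns 0 on success, -1 if scratch memory cannot be allocated. */
int edge_float4(const float4 *in, float4 *out, size_t w4, size_t h)
{
    assert(in && out && w4 > 0 && h > 0);
    float4  *reg   = malloc(3 * w4 * sizeof *reg);   /* per-thread registers */
    edge_xw *local = malloc(3 * w4 * sizeof *local); /* shared memory stand-in */
    if (!reg || !local) { free(reg); free(local); return -1; }

    /* Each "thread" t walks down its column; the three-row window slides,
     * so after the first row only one new float4 per thread is loaded. */
    for (size_t r = 0; r < h; ++r) {
        for (size_t k = (r == 0 ? 0 : 2); k < 3; ++k) {
            size_t src = (r + k == 0) ? 0 : r + k - 1;  /* image row r-1+k */
            if (src > h - 1) src = h - 1;               /* clamp (assumption) */
            size_t s = (r + k) % 3;                     /* rotating slot */
            for (size_t t = 0; t < w4; ++t) {
                float4 v = in[src * w4 + t];            /* one global read */
                reg[s * w4 + t] = v;
                local[s * w4 + t] = (edge_xw){ v.x, v.w }; /* publish x, w */
            }
        }
        /* A work-group barrier would go here in the real kernel. */
        for (size_t t = 0; t < w4; ++t) {
            float win[3][6]; /* left.w, own x y z w, right.x for each row */
            for (size_t k = 0; k < 3; ++k) {
                size_t s = (r + k) % 3;
                float4 v = reg[s * w4 + t];
                win[k][0] = t > 0 ? local[s * w4 + t - 1].w : v.x;
                win[k][1] = v.x; win[k][2] = v.y;
                win[k][3] = v.z; win[k][4] = v.w;
                win[k][5] = t + 1 < w4 ? local[s * w4 + t + 1].x : v.w;
            }
            float4 o;
            o.x = sobel3x3(&win[0][0], &win[1][0], &win[2][0]);
            o.y = sobel3x3(&win[0][1], &win[1][1], &win[2][1]);
            o.z = sobel3x3(&win[0][2], &win[1][2], &win[2][2]);
            o.w = sobel3x3(&win[0][3], &win[1][3], &win[2][3]);
            out[r * w4 + t] = o;
        }
    }
    free(reg);
    free(local);
    return 0;
}
```

Only the x and w components are published because a 3x3 window centered on any of a thread's four pixels reaches at most one pixel into the float4 on either side. The y and z components are never needed by another thread, so they stay private.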