edge
Implementation
OpenCL
Usage
output = Edge_ATI(input)
Class Support
Both output and input are float types.
Algorithm
Each output pixel is computed from a 3x3 neighborhood of input pixels.
The kernel uses float4 as its data unit and stores the x and w
components of each float4 in shared memory. A thread therefore reads
its own float4 from global memory and takes the left neighbor's w and
the right neighbor's x from shared memory, which reduces global memory
accesses. Each thread also computes the outputs of several neighboring
positions in the same column, which reduces global memory accesses
further.
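
The data layout described above can be emulated on the CPU. The following is a minimal sketch in plain C, not the library's OpenCL kernel. The float4 struct, the names load_span, edge_span and edge_sketch, the 8-neighbor Laplacian used as the 3x3 operator, and the clamp-to-edge border handling are all assumptions made for illustration. The source does not say which edge operator Edge_ATI applies or how it treats image borders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for OpenCL's float4: four horizontally adjacent pixels. */
typedef struct { float x, y, z, w; } float4;

/* Build the 6-pixel span one emulated thread needs for one image row:
 * [left.w, c.x, c.y, c.z, c.w, right.x].
 * On the device, c would be the single global-memory read, and the two
 * neighbor scalars would come from shared memory. Here they are read
 * directly from the row. Borders replicate the edge pixel (assumption). */
static void load_span(const float4 *row, int i, int n, float span[6])
{
    float4 c = row[i];
    span[0] = (i > 0) ? row[i - 1].w : c.x;
    span[1] = c.x;
    span[2] = c.y;
    span[3] = c.z;
    span[4] = c.w;
    span[5] = (i < n - 1) ? row[i + 1].x : c.w;
}

/* Hypothetical 3x3 operator: 8 * center minus the sum of the 8 neighbors.
 * Produces the four outputs that belong to one float4. */
static float4 edge_span(const float top[6], const float mid[6],
                        const float bot[6])
{
    float o[4];
    for (int k = 0; k < 4; ++k) {
        float sum = top[k] + top[k + 1] + top[k + 2]
                  + mid[k]              + mid[k + 2]
                  + bot[k] + bot[k + 1] + bot[k + 2];
        o[k] = 8.0f * mid[k + 1] - sum;
    }
    float4 r = { o[0], o[1], o[2], o[3] };
    return r;
}

/* in and out hold height rows of n float4 values each.
 * Each iteration of the outer loop plays the role of one thread that
 * walks down its column and keeps a sliding window of three row spans,
 * so each input row is loaded once per output row instead of three times. */
void edge_sketch(const float4 *in, float4 *out, int n, int height)
{
    assert(in != NULL && out != NULL && n > 0 && height > 0);
    for (int i = 0; i < n; ++i) {
        float win[3][6];
        load_span(in, i, n, win[0]);   /* row -1, clamped to row 0 */
        load_span(in, i, n, win[1]);   /* row 0 */
        for (int y = 0; y < height; ++y) {
            int yb = (y + 1 < height) ? y + 1 : height - 1;
            load_span(in + (size_t)yb * (size_t)n, i, n, win[2]);
            out[(size_t)y * (size_t)n + (size_t)i] =
                edge_span(win[0], win[1], win[2]);
            /* Slide the window down by one row. */
            memcpy(win[0], win[1], sizeof win[0]);
            memcpy(win[1], win[2], sizeof win[1]);
        }
    }
}
```

Calling edge_sketch(in, out, n, height) with n float4 values per row fills out with the filtered image. With this stand-in operator, a constant input image yields zero at every output pixel.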