mean2

Implementation

OpenCL

Usage

output = mean2_ATI(input)

Class Support

input is of class float (single precision); output is of class double.

Algorithm

mean2 computes the average of all elements in the input.

Data is read on the device as float4 vectors. Each thread accumulates a number of float4 values along a column.
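The per-thread stage can be emulated on the CPU to see what each thread ends up holding. This is a minimal sketch, not the actual kernel: the function name accumulate_columns, the num_threads parameter, and the layout (float4s dealt out to threads column by column) are assumptions made for illustration.

```python
# CPU emulation of the per-thread accumulation stage.
# The name, parameters and data layout are hypothetical, not taken from mean2_ATI.

def accumulate_columns(data, num_threads):
    """Return one partial float4 sum (a 4-element list) per emulated thread.

    `data` is a flat list of floats whose length is a multiple of 4.
    It is viewed as rows of `num_threads` float4s; thread t sums column t.
    """
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4")
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")

    # Group the flat input into float4-sized chunks.
    float4s = [data[i:i + 4] for i in range(0, len(data), 4)]

    # Each emulated thread keeps a private float4 accumulator.
    partial = [[0.0, 0.0, 0.0, 0.0] for _ in range(num_threads)]
    for index, vec in enumerate(float4s):
        t = index % num_threads  # column owned by thread t
        for lane in range(4):
            partial[t][lane] += vec[lane]
    return partial
```

For example, accumulate_columns([float(i) for i in range(32)], 2) returns [[48.0, 52.0, 56.0, 60.0], [64.0, 68.0, 72.0, 76.0]]; the lanes of all accumulators together still add up to the sum of the input.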

A group of 64 threads (one wavefront on ATI hardware) then performs a simple reduction through shared memory (local memory in OpenCL terms).
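The text does not say which reduction pattern is used. One common choice is a tree reduction, sketched below as a CPU emulation; the function name reduce_group and the halving-stride pattern are assumptions, and only the group size of 64 comes from the description above.

```python
# CPU emulation of a 64-thread tree reduction through shared memory.
# The reduction pattern is an assumption; the source only says "simple reduction".

WAVEFRONT = 64  # threads per group, as stated in the text


def reduce_group(values):
    """Reduce 64 per-thread partial sums to a single group sum."""
    if len(values) != WAVEFRONT:
        raise ValueError("expected exactly 64 partial sums")

    shared = list(values)  # stands in for the group's shared memory
    stride = WAVEFRONT // 2
    while stride > 0:
        # On the GPU, threads with id < stride run this step in parallel,
        # and the group synchronises before the stride is halved.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]
```

For example, reduce_group([float(i) for i in range(64)]) returns 2016.0, the sum of 0 through 63, after six halving steps.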

The final computation is performed on the CPU, since only a small number of values remain after the GPU pass.
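The host-side step might look like the sketch below, assuming the GPU returns one partial sum per thread group and the host knows the total element count. The name finish_on_cpu and both parameters are hypothetical. Python floats are double precision, which matches the double output noted under Class Support.

```python
import math

# Host-side finish: add the few remaining partial sums and divide.
# Name and parameters are hypothetical, not taken from mean2_ATI.


def finish_on_cpu(group_sums, element_count):
    """Combine per-group partial sums into the mean of the whole input."""
    if element_count <= 0:
        raise ValueError("element_count must be positive")
    # math.fsum adds the values in double precision with reduced rounding error.
    return math.fsum(group_sums) / element_count
```

For example, finish_on_cpu([10.0, 20.0, 30.0], 12) returns 5.0.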