Mark Silberstein

Sum-product GPU kernel

The first step in enabling GPU-based inference in Bayesian networks is computing a tensor product of several functions and then summing (contracting) over their shared dimensions.
The challenge is to recognize the seemingly irregular memory access pattern at runtime and to prefetch the right data based on a pre-processing step.
The paper on this work was presented at ICS'08 in Greece.
Here is the lecture I gave on this subject at Microsoft Research, Redmond.
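
To make the operation concrete, here is a minimal sequential sketch in C++ of the smallest possible sum-product: two functions f(a, b) and g(b, c) are multiplied and the shared variable b is summed out, giving h(a, c). The `Table2D` type, the row-major layout and the name `sum_product` are assumptions made for this illustration only; they are not taken from the archive, and the sketch does none of the memory-access analysis or prefetching that the GPU kernel performs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout for illustration: a function of two discrete
// variables stored as a dense row-major table.
struct Table2D {
    std::size_t rows, cols;
    std::vector<double> data;  // element (r, c) lives at data[r * cols + c]
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Computes h(a, c) = sum over b of f(a, b) * g(b, c).
// b is the shared dimension: it indexes the columns of f and the rows of g.
Table2D sum_product(const Table2D& f, const Table2D& g) {
    assert(f.cols == g.rows);                  // shared dimension must agree
    assert(f.data.size() == f.rows * f.cols);
    assert(g.data.size() == g.rows * g.cols);

    Table2D h{f.rows, g.cols, std::vector<double>(f.rows * g.cols, 0.0)};
    for (std::size_t a = 0; a < f.rows; ++a) {
        for (std::size_t c = 0; c < g.cols; ++c) {
            double acc = 0.0;
            for (std::size_t b = 0; b < f.cols; ++b) {
                acc += f.at(a, b) * g.at(b, c);  // product, then summation over b
            }
            h.data[a * h.cols + c] = acc;
        }
    }
    return h;
}
```

With exactly two functions and one shared variable the operation coincides with a matrix product. The general case involves more functions, each defined over its own subset of variables, which is where the access pattern stops looking regular.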

The following archive contains the updated version of the sum-product kernel, implemented using NVIDIA CUDA and OpenMP.

Note that the code is experimental and is rarely updated.

Several key updates:
  • The unrolling problem experienced with the NVIDIA nvcc compiler has been fixed through extensive use of macros.
  • The incorrect selection of the number of threads in the OpenMP code has been fixed.
  • The kernel now handles sum-products that are not necessarily "homogeneous", i.e. that contain a different number of summation variables in different functions of the product.
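
As an illustration of the last point, the following sequential C++ sketch computes a sum-product in which the two functions contain different numbers of summation variables: f(a, b, c) contains both summed variables b and c, while g(b) contains only b. This is one reading of "non-homogeneous"; the function name `sum_product_mixed`, the flat row-major layout and the extents `na`, `nb`, `nc` are hypothetical and do not come from the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes h(a) = sum over b and c of f(a, b, c) * g(b).
// f is a dense row-major table with extents (na, nb, nc);
// g is a table with extent nb. Layout and names are illustrative only.
std::vector<double> sum_product_mixed(const std::vector<double>& f,
                                      const std::vector<double>& g,
                                      std::size_t na, std::size_t nb,
                                      std::size_t nc) {
    assert(f.size() == na * nb * nc);
    assert(g.size() == nb);

    std::vector<double> h(na, 0.0);
    for (std::size_t a = 0; a < na; ++a) {
        double acc = 0.0;
        for (std::size_t b = 0; b < nb; ++b) {
            // g does not depend on c, so sum f over c first
            // and multiply by g(b) once per value of b.
            double inner = 0.0;
            for (std::size_t c = 0; c < nc; ++c) {
                inner += f[(a * nb + b) * nc + c];
            }
            acc += inner * g[b];
        }
        h[a] = acc;
    }
    return h;
}
```

Because the functions do not share the same set of summation variables, each one is indexed by a different subset of the loop counters, so a single fixed indexing scheme for all functions no longer applies.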
Download.