The first step in enabling GPU-based inference in Bayesian networks is computing a tensor product of several functions, followed by summation (contraction) over their shared dimensions. The challenge is to recognize the seemingly irregular memory access pattern at runtime and to prefetch the right data based on pre-processing. The paper on this work was presented at ICS08 in Greece. Here is the lecture I gave on this subject at MS Research, Redmond. The following archive contains the updated version of the sum-product kernel implemented using NVIDIA CUDA and OpenMP. Note that the code is experimental and rarely updated. Several key updates: