Materials‎ > ‎Research Papers‎ > ‎

High Performance Discrete Fourier Transforms on Graphics Processors

posted Dec 18, 2008, 12:59 AM by Nicolas Pinto   [ updated Dec 31, 2008, 10:04 AM ]
Authors:
Naga Govindaraju  (Microsoft Corporation)
Brandon Lloyd  (Microsoft Corporation)
Yuri Dotsenko  (Microsoft Corporation)
Burton Smith  (Microsoft Corporation)
John Manferdelli  (Microsoft Corporation)
Papers Session Abstract:
We present novel algorithms for computing Fourier transforms with high performance on GPUs. We present hierarchical, mixed radix FFT algorithms for both power-of-two and non-power-of-two sizes. Our hierarchical FFT algorithms efficiently exploit shared memory on GPUs using a Stockham formulation. We reduce the memory transpose overheads in hierarchical algorithms by combining the transposes into a block-based multi-FFT algorithm. For non-power-of-two sizes, we use a combination of mixed radix FFTs of small primes and Bluestein's algorithm. We use modular arithmetic in Bluestein's algorithm to improve the accuracy. We implemented our algorithms using the NVIDIA CUDA API and compared their performance with NVIDIA's CUFFT library and an optimized CPU-implementation (Intel's MKL) on a high-end quad-core CPU. On an NVIDIA GPU, we obtained up to 250 GFlops, a 3x performance improvement over CUFFT, and 5-25x improvement over MKL on large sizes.
The full paper can be found in the IEEE Computer Society archive and ACM Digital Library
Comments