Parallel Fast Fourier Transform II
Parallel Fast Fourier Transform II
::: Home > Instruction > CMSC 280: Parallel Processing > Topic 12: Parallel Fast Fourier Transform II
The second FFT week delves into practical considerations, focusing on minimizing communication, especially when data exceeds local memory capacity (out-of-core FFT). Students will examine techniques for efficient data partitioning and communication scheduling across multiple stages of the butterfly network to achieve optimal performance on current high-performance computing (HPC) platforms.
The implementation and optimization strategies for parallel FFT, including message aggregation and latency hiding, are understood and evaluated.
The architectural differences between distributed-memory and shared-memory FFT implementations are analyzed in terms of scalability, cache coherence, and data locality.
The application of FFT to parallel convolution, filtering, and out-of-core computation is explained, emphasizing communication cost and I/O performance as dominant factors in scalability.
Handout: Parallel Fast Fourier Transform II*
When Performance Meets Architecture
Implementation Details and Communication Analysis
Optimization Techniques: Blocking & Hiding
Distributed vs. Shared Memory Trade-offs
Parallel Convolution and Filtering
Applications: Convolution and Filtering
Scaling Beyond Memory Limits (Out-of-Core)
From Transformation to Modern Frontiers
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
Optimization and Shared Memory FFT (FFTW)
Frigo, M., & Johnson, S. G. (2005). The design and implementation of FFTW3. Proceedings of the IEEE, 93(2), 216–231.
Parallel OOC and External Memory Algorithms
Vitter, J. S. (2001). External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys (CSUR), 33(2), 209–271. (Provides the foundational models for out-of-core computing, which applies directly to OOC-FFT).
Comprehensive Parallel FFT Analysis and Cost Modeling
Grama, A., Gupta, A., Karypis, G., & Kumar, V. (2003). Introduction to parallel computing. Addison-Wesley.
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).
::: Home > Instruction > CMSC 280: Parallel Processing > Topic 12: Parallel Fast Fourier Transform II