Parallel Dense Matrix Algorithms I
This topic introduces parallel algorithms essential for linear algebra, starting with fundamental operations like parallel matrix-vector multiplication and matrix transposition, focusing on data partitioning strategies (1D and 2D block-cyclic decompositions). Students will analyze the communication costs associated with these operations and contrast the efficiency of different mapping schemes on distributed-memory architectures.
The principles of matrix decomposition for parallel computation are understood, and the trade-offs among row-wise, column-wise, and block-cyclic data distributions are evaluated.
The design and performance characteristics of standard parallel matrix-multiplication algorithms, such as Cannon’s and SUMMA, are analyzed with respect to computation–communication overlap.
The relationship between structured numerical computation and scalable system performance is recognized through the formulation and interpretation of cost models.
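The row-wise and block-cyclic distributions named in the outcomes above can be made concrete with a small owner-mapping sketch. It is only an illustration under stated assumptions: the function names (`row_block_owner`, `block_cyclic_owner`, `entries_per_process`), the square `n x n` matrix, and the 0-based process-grid coordinates are hypothetical choices made here, not taken from the handout or from any library.

```python
def row_block_owner(i, n, p):
    """1D row-wise block distribution (assumed convention):
    process k owns a contiguous chunk of ceil(n / p) rows."""
    rows_per_proc = -(-n // p)  # ceiling division
    return i // rows_per_proc


def block_cyclic_owner(i, j, block, p_rows, p_cols):
    """2D block-cyclic distribution (assumed convention):
    the matrix is cut into block x block tiles, and tiles are dealt
    out round-robin over a p_rows x p_cols process grid."""
    return ((i // block) % p_rows, (j // block) % p_cols)


def entries_per_process(n, block, p_rows, p_cols):
    """Count how many entries of an n x n matrix each process owns
    under the 2D block-cyclic mapping above."""
    counts = {}
    for i in range(n):
        for j in range(n):
            owner = block_cyclic_owner(i, j, block, p_rows, p_cols)
            counts[owner] = counts.get(owner, 0) + 1
    return counts


if __name__ == "__main__":
    # An 8 x 8 matrix, 2 x 2 tiles, on a 2 x 2 process grid.
    for owner, count in sorted(entries_per_process(8, 2, 2, 2).items()):
        print(owner, count)
```

With these parameters every process owns the same number of entries, which is the load-balance property that motivates block-cyclic layouts. Changing `block` or the grid shape shows how the balance shifts when the matrix size is not a multiple of the tile size times the grid dimension.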
Handout: Parallel Dense Matrix Algorithms I*
When Data Becomes Geometry
Parallel Matrix Transposition and Vector-Matrix Operations
Matrix Data Decomposition
The Cost of Parallel Transposition
Standard Parallel Matrix Multiplication
Cannon's and SUMMA Algorithms
Optimizing Overlap and Granularity
Toward Systems of Equations
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
General Parallel Computing and Cost Models
Grama, A., Gupta, A., Karypis, G., & Kumar, V. (2003). Introduction to parallel computing. Addison-Wesley.
Cannon's Algorithm (Original Source)
Cannon, L. E. (1969). A cellular computer to implement the Kalman filter algorithm (Doctoral dissertation, Montana State University, Bozeman).
SUMMA and Scalable Linear Algebra Libraries
Dongarra, J. J., Walker, D. W., & van de Geijn, R. A. (1997). A look at scalable dense linear algebra libraries. The International Journal of High Performance Computing Applications, 11(4), 316–327.
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).