Advanced Topics and Current Trends
::: Home > Instruction > CMSC 280: Parallel Processing > Topic 13: Advanced Topics and Current Trends
This week shifts focus from theoretical models to contemporary heterogeneous architectures, in particular Graphics Processing Units (GPUs), which now carry much of the throughput-oriented parallel workload. The module introduces the CUDA and OpenCL programming models, compares the CPU and GPU memory hierarchies, and examines how previously covered algorithms (e.g., matrix multiplication) are optimized for GPU execution using thread blocks and warps.
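The mapping of matrix multiplication onto thread blocks and warps can be previewed without a GPU. The sketch below is a sequential Python emulation, not CUDA code: the names `matmul_kernel`, `warp_of`, and `launch` are hypothetical, the tuples only mirror CUDA's `blockIdx`, `threadIdx`, and `blockDim` (x first, then y), and the warp width of 32 is the value used on NVIDIA hardware. It shows the untiled version, where each logical thread computes one element of the output matrix.

```python
# Sequential emulation of a CUDA-style 2D launch (illustrative only;
# nothing here runs on a GPU, and all function names are hypothetical).
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def matmul_kernel(block_idx, thread_idx, block_dim, A, B, C, n):
    """Work of one logical thread: compute a single element C[row][col]."""
    col = block_idx[0] * block_dim[0] + thread_idx[0]  # global x index
    row = block_idx[1] * block_dim[1] + thread_idx[1]  # global y index
    if row < n and col < n:  # guard: the grid may overhang the matrix edge
        C[row][col] = sum(A[row][k] * B[k][col] for k in range(n))


def warp_of(thread_idx, block_dim):
    """Warp index of a thread inside its block (threads are linearized x-first)."""
    linear = thread_idx[1] * block_dim[0] + thread_idx[0]
    return linear // WARP_SIZE


def launch(A, B, n, block_dim=(16, 16)):
    """Visit every (block, thread) pair one at a time; a GPU would run them in parallel."""
    grid_dim = ((n + block_dim[0] - 1) // block_dim[0],   # ceil(n / block width)
                (n + block_dim[1] - 1) // block_dim[1])   # ceil(n / block height)
    C = [[0.0] * n for _ in range(n)]
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    matmul_kernel((bx, by), (tx, ty), block_dim, A, B, C, n)
    return C, grid_dim
```

With the assumed 16×16 block, each block holds 256 threads, or 8 warps of 32. For a 20×20 matrix the grid is 2×2 blocks, so most threads in the edge blocks fail the bounds check and do no work, which is why the guard is needed.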
Understand and analyze the hierarchical organization and execution model of GPU computing, including threads, warps, and memory hierarchies.
Explain strategies for computation offloading and data placement in heterogeneous systems in terms of latency, bandwidth, and arithmetic intensity.
Describe the principles of data and model parallelism in distributed deep learning, and recognize their relationship to classical parallel patterns and collective communication.
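The second outcome above ties offloading decisions to latency, bandwidth, and arithmetic intensity. One way to make that relationship concrete is a first-order cost model, sketched below. Everything in it is an assumption for illustration: the `Machine` fields, the function names, and the hardware figures are hypothetical round numbers, not measurements. The model also ignores overlap of transfer and compute, caching, and the CPU's own memory bandwidth limits.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical first-order description of a CPU+GPU node."""
    cpu_flops: float       # sustained CPU throughput, FLOP/s
    gpu_flops: float       # sustained GPU throughput, FLOP/s
    link_bandwidth: float  # host<->device bandwidth, bytes/s
    link_latency: float    # fixed cost per offload, seconds


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte transferred between host and device."""
    return flops / bytes_moved


def cpu_time(m, flops):
    return flops / m.cpu_flops


def offload_time(m, flops, bytes_moved):
    # Fixed latency + transfer time + GPU compute time (no overlap assumed).
    return m.link_latency + bytes_moved / m.link_bandwidth + flops / m.gpu_flops


def should_offload(m, flops, bytes_moved):
    return offload_time(m, flops, bytes_moved) < cpu_time(m, flops)


def matmul_cost(n, elem_bytes=4):
    """n x n matrix multiply: 2n^3 FLOPs; two inputs in, one result out."""
    return 2 * n**3, 3 * n * n * elem_bytes


def vector_add_cost(n, elem_bytes=4):
    """Length-n vector add: n FLOPs; two inputs in, one result out."""
    return n, 3 * n * elem_bytes


# Assumed round numbers, chosen only to make the trade-off visible.
node = Machine(cpu_flops=1e11, gpu_flops=5e12,
               link_bandwidth=1.6e10, link_latency=1e-5)

if __name__ == "__main__":
    for name, cost in [("matmul n=64", matmul_cost(64)),
                       ("matmul n=4096", matmul_cost(4096)),
                       ("vector add n=1e8", vector_add_cost(10**8))]:
        print(name, "intensity:", arithmetic_intensity(*cost),
              "offload:", should_offload(node, *cost))
```

Under these assumed figures the small matrix multiply stays on the CPU because the fixed latency alone exceeds the CPU run time, the large one is offloaded because its intensity grows as n/6 FLOPs per byte, and the vector add is never worth offloading because it performs only 1/12 FLOP per byte moved.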
Handout: Advanced Topics and Current Trends*
When Parallelism Meets the Present
Parallelism in Modern Architectures
GPU Computing: Threads, Warps, and Memory
Heterogeneous Systems: Data Placement and Offloading
Parallelism in Machine Learning
Distributed Deep Learning: Data vs. Model
ML Frameworks and Parallel Patterns
Beyond the Algorithm
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
GPU Programming and Architecture
Nickolls, J., & Dally, W. J. (2010). The GPU computing era. IEEE Micro, 30(2), 56–69.
Distributed Deep Learning and Parameter Servers
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q. V., ... & Ng, A. Y. (2012). Large scale distributed deep networks. Advances in Neural Information Processing Systems, 25.
Heterogeneous Computing Principles
Lee, K. (2014). Principles of high performance computing. The MIT Press. (Provides foundational knowledge on the architecture and programming challenges of combining CPUs and GPUs.)
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).