When Processors Talk:
The Hidden Conversations of Parallel Programs
In this topic, we explore how processors talk to each other. In parallel computing, communication is as important as computation. We learn the basic ways messages move—one-to-one and one-to-many—and how collective operations help groups of processors share and combine data efficiently.
We also study how latency and bandwidth shape communication cost, how synchronization can stall progress, and how techniques such as message aggregation and asynchronous transfers let us “hide” communication time behind computation. Understanding these patterns helps us build systems that don’t just compute together but think together.
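To get a first feel for what these conversations look like in code, here is a minimal sketch in C using MPI. The choice of MPI, the ranks involved, and the sample values are illustrative assumptions, not part of the lecture handout. Rank 0 first talks to a single partner with a point-to-point send/receive, then the whole group shares a parameter with a broadcast and combines results with a reduction.

```c
/* Minimal sketch (assuming MPI is available): point-to-point vs. collective.
 * Compile with mpicc, run with mpirun -np 4 ./a.out (for example). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point: rank 0 sends one integer to rank 1 only. */
    if (rank == 0 && size > 1) {
        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Collective: rank 0 broadcasts a parameter to every rank,
     * then every rank contributes to a global sum collected on rank 0. */
    double param = (rank == 0) ? 3.14 : 0.0;
    MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local = rank * param, total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", total);

    MPI_Finalize();
    return 0;
}
```

A single MPI_Bcast or MPI_Reduce call replaces a hand-written loop of sends and receives, and the library is free to schedule the transfers as a tree underneath, which is one reason collective operations are often both simpler to write and faster to run.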
At the end of this topic, you should be able to:
Explain the importance of communication in parallel programs.
Compare point-to-point and collective operations and describe their trade-offs.
Analyze communication costs and propose strategies to reduce them.
Guide questions:
Why can fast processors still run slowly when communication is poor?
How do collective operations like broadcast and reduce make programs simpler and faster?
How can we overlap computation with communication to improve efficiency?
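The last question points to one common answer: non-blocking transfers. Start the sends and receives, do work that does not depend on the incoming data, and only then wait. The C/MPI sketch below is a hedged illustration; the ring of neighbors, buffer sizes, and the “independent work” loop are hypothetical stand-ins for whatever your application actually computes.

```c
/* Sketch of overlapping communication with computation via non-blocking MPI.
 * The ring topology and buffer contents here are purely illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;        /* neighbors in a ring (assumed)   */
    int left  = (rank - 1 + size) % size;

    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    double *work    = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { sendbuf[i] = rank; work[i] = i; }

    MPI_Request reqs[2];

    /* 1. Start the transfers, but do not wait for them yet. */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. Compute on data that does not depend on the incoming message,
     *    so the transfer is "hidden" behind useful work. */
    double local_sum = 0.0;
    for (int i = 0; i < N; i++)
        local_sum += work[i] * work[i];

    /* 3. Only now block until both transfers have completed;
     *    code that needs recvbuf is safe to run after this point. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d: local sum = %f, first received value = %f\n",
           rank, local_sum, recvbuf[0]);

    free(sendbuf); free(recvbuf); free(work);
    MPI_Finalize();
    return 0;
}
```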
When Messages Move: The Basics of Communication
Latency and Bandwidth: The Two Enemies of Speed
Synchronous vs. Asynchronous Transfers
Collective Communication: When the Group Talks Together
Broadcast, Scatter, Gather, and Reduce
Implementation Trade-Offs and Topology Awareness
Cutting the Cost of Talking
Reducing Communication Overhead
Overlapping Communication and Computation
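To see why latency and bandwidth earn their billing as the two enemies of speed, and why reducing communication overhead so often means sending fewer, larger messages, it helps to keep a first-order cost model in mind. The model below is the standard textbook simplification (it ignores network contention and topology), with α the per-message start-up latency and β the time per byte:

```latex
% First-order (latency-bandwidth) cost model for one m-byte message,
% with \alpha the start-up latency and \beta the time per byte:
\begin{align}
  T(m) &= \alpha + \beta m \\
  % k separate m-byte messages pay the start-up latency k times:
  T_{\text{separate}}   &= k(\alpha + \beta m) = k\alpha + \beta k m \\
  % one aggregated message of km bytes pays it only once:
  T_{\text{aggregated}} &= \alpha + \beta k m \\
  % so message aggregation saves (k - 1) start-up latencies:
  T_{\text{separate}} - T_{\text{aggregated}} &= (k - 1)\alpha
\end{align}
```

On a typical cluster α is measured in microseconds while a floating-point operation takes nanoseconds, so a processor can do thousands of arithmetic operations in the time one tiny message needs just to get started. That gap is the short answer to the first guide question, and the saved (k − 1)α term is the motivation behind message aggregation.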
Current Lecture Handout
When Processors Talk: The Hidden Conversations of Parallel Programs, rev 2023*
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
References
Baker, C., Chaudhuri, A., & Kale, L. V. (2021). Dynamic load balancing in Charm++ for exascale applications. Concurrency and Computation: Practice and Experience, 33(21), e6379. https://doi.org/10.1002/cpe.6379
Becker, D., et al. (2022). Cray Slingshot: A unified low-latency network for exascale systems. IEEE Micro, 42(4), 46–57. https://doi.org/10.1109/MM.2022.3166052
Choquette, J., et al. (2021). NVIDIA A100 Tensor Core GPU architecture. IEEE Micro, 41(2), 46–55. https://doi.org/10.1109/MM.2021.3051625
Dinan, J., et al. (2017). Scalable collective communication for extreme-scale systems. The International Journal of High Performance Computing Applications, 31(4), 382–396. https://doi.org/10.1177/1094342016646848
Gropp, W., Lusk, E., & Skjellum, A. (1999). Using MPI-2: Advanced features of the message-passing interface. MIT Press.
Hummel, F., et al. (2020). Strong and weak scaling of molecular dynamics simulations on GPUs. Computer Physics Communications, 255, 107263. https://doi.org/10.1016/j.cpc.2020.107263
Plimpton, S., et al. (2020). Efficient molecular dynamics simulations with topology-aware task mapping. Computer Physics Communications, 256, 107437. https://doi.org/10.1016/j.cpc.2020.107437
Sergeev, A., & Del Balso, M. (2018). Horovod: Fast and easy distributed deep learning in TensorFlow. Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS).
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).