When Processors Talk:
The Hidden Conversations of Parallel Programs
In this topic, we explore how processors talk to each other. In parallel computing, communication is as important as computation. We learn the basic ways messages move—one-to-one and one-to-many—and how collective operations help groups of processors share and combine data efficiently.
We also study how latency and bandwidth shape communication cost, how synchronization can stall progress, and how techniques such as message aggregation and asynchronous transfers let us “hide” communication time behind computation. Understanding these patterns helps us build systems that don’t just compute together but think together.
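To get a first feel for what these conversations look like in code, here is a minimal sketch in C using MPI. The choice of MPI, the ranks involved, and the sample values are illustrative assumptions, not part of the lecture handout. Rank 0 first talks to a single partner with a point-to-point send/receive, then the whole group shares a parameter with a broadcast and combines results with a reduction.

```c
/* Minimal sketch (assuming MPI is available): point-to-point vs. collective.
 * Compile with mpicc, run with mpirun -np 4 ./a.out (for example). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point: rank 0 sends one integer to rank 1 only. */
    if (rank == 0 && size > 1) {
        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Collective: rank 0 broadcasts a parameter to every rank,
     * then every rank contributes to a global sum collected on rank 0. */
    double param = (rank == 0) ? 3.14 : 0.0;
    MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local = rank * param, total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", total);

    MPI_Finalize();
    return 0;
}
```

A single MPI_Bcast or MPI_Reduce call replaces a hand-written loop of sends and receives, and the library is free to schedule the transfers as a tree underneath, which is one reason collective operations are often both simpler to write and faster to run.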
At the end of this topic, you should be able to:
Explain the importance of communication in parallel programs.
Compare point-to-point and collective operations and describe their trade-offs.
Analyze communication costs and propose strategies to reduce them.
Guide questions:
Why can fast processors still run slowly when communication is poor?
How do collective operations like broadcast and reduce make programs simpler and faster?
How can we overlap computation with communication to improve efficiency?
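The last question points to one common answer: non-blocking transfers. Start the sends and receives, do work that does not depend on the incoming data, and only then wait. The C/MPI sketch below is a hedged illustration; the ring of neighbors, buffer sizes, and the “independent work” loop are hypothetical stand-ins for whatever your application actually computes.

```c
/* Sketch of overlapping communication with computation via non-blocking MPI.
 * The ring topology and buffer contents here are purely illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;        /* neighbors in a ring (assumed)   */
    int left  = (rank - 1 + size) % size;

    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));
    double *work    = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) { sendbuf[i] = rank; work[i] = i; }

    MPI_Request reqs[2];

    /* 1. Start the transfers, but do not wait for them yet. */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. Compute on data that does not depend on the incoming message,
     *    so the transfer is "hidden" behind useful work. */
    double local_sum = 0.0;
    for (int i = 0; i < N; i++)
        local_sum += work[i] * work[i];

    /* 3. Only now block until both transfers have completed;
     *    code that needs recvbuf is safe to run after this point. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d: local sum = %f, first received value = %f\n",
           rank, local_sum, recvbuf[0]);

    free(sendbuf); free(recvbuf); free(work);
    MPI_Finalize();
    return 0;
}
```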
When Messages Move: The Basics of Communication
Latency and Bandwidth: The Two Enemies of Speed
Synchronous vs. Asynchronous Transfers
Collective Communication: When the Group Talks Together
Broadcast, Scatter, Gather, and Reduce
Implementation Trade-Offs and Topology Awareness
Cutting the Cost of Talking
Reducing Communication Overhead
Overlapping Communication and Computation
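To see why latency and bandwidth earn their billing as the two enemies of speed, and why reducing communication overhead so often means sending fewer, larger messages, it helps to keep a first-order cost model in mind. The model below is the standard textbook simplification (it ignores network contention and topology), with α the per-message start-up latency and β the time per byte:

```latex
% First-order (latency-bandwidth) cost model for one m-byte message,
% with \alpha the start-up latency and \beta the time per byte:
\begin{align}
  T(m) &= \alpha + \beta m \\
  % k separate m-byte messages pay the start-up latency k times:
  T_{\text{separate}}   &= k(\alpha + \beta m) = k\alpha + \beta k m \\
  % one aggregated message of km bytes pays it only once:
  T_{\text{aggregated}} &= \alpha + \beta k m \\
  % so message aggregation saves (k - 1) start-up latencies:
  T_{\text{separate}} - T_{\text{aggregated}} &= (k - 1)\alpha
\end{align}
```

On a typical cluster α is measured in microseconds while a floating-point operation takes nanoseconds, so a processor can do thousands of arithmetic operations in the time one tiny message needs just to get started. That gap is the short answer to the first guide question, and the saved (k − 1)α term is the motivation behind message aggregation.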
Current Lecture Handout
When Processors Talk: The Hidden Conversations of Parallel Programs, rev 2023*
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
References
Baker, C., Chaudhuri, A., & Kale, L. V. (2021). Dynamic load balancing in Charm++ for exascale applications. Concurrency and Computation: Practice and Experience, 33(21), e6379. https://doi.org/10.1002/cpe.6379
Becker, D., et al. (2022). Cray Slingshot: A unified low-latency network for exascale systems. IEEE Micro, 42(4), 46–57. https://doi.org/10.1109/MM.2022.3166052
Choquette, J., et al. (2021). NVIDIA A100 Tensor Core GPU architecture. IEEE Micro, 41(2), 46–55. https://doi.org/10.1109/MM.2021.3051625
Dinan, J., et al. (2017). Scalable collective communication for extreme-scale systems. The International Journal of High Performance Computing Applications, 31(4), 382–396. https://doi.org/10.1177/1094342016646848
Gropp, W., Lusk, E., & Skjellum, A. (1999). Using MPI-2: Advanced features of the message-passing interface. MIT Press.
Hummel, F., et al. (2020). Strong and weak scaling of molecular dynamics simulations on GPUs. Computer Physics Communications, 255, 107263. https://doi.org/10.1016/j.cpc.2020.107263
Plimpton, S., et al. (2020). Efficient molecular dynamics simulations with topology-aware task mapping. Computer Physics Communications, 256, 107437. https://doi.org/10.1016/j.cpc.2020.107437
Sergeev, A., & Del Balso, M. (2018). Horovod: Fast and easy distributed deep learning in TensorFlow. Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS).
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).