Beyond the Basics:
How Smart Sorting Scales in Parallel Worlds
In this topic, we go deeper into how advanced sorting algorithms achieve both speed and scalability on modern parallel systems. We explore Sample Sort, Bucket Sort, Parallel Merge Sort, and Bitonic Sort, and examine how each balances computation and communication to perform well even on large distributed architectures.
We study not only the logic of these algorithms but also the engineering spirit behind them: how they manage trade-offs, how their behavior can be predicted through cost models, and how they adapt to the quirks of real hardware.
After working through this topic, you should be able to:
Implement and analyze advanced parallel sorting algorithms such as Sample Sort, Bucket Sort, and Bitonic Sort.
Evaluate how load balancing and communication efficiency affect overall scalability.
Choose appropriate sorting methods for specific architectures and workloads.
Keep these questions in mind as you read:
How does sampling improve balance and performance in sorting?
Why is Bitonic Sort favored for hardware-based systems?
How do we decide which sorting algorithm fits a particular platform?
Sampling Our Way to Balance
How Sample Sort Divides and Conquers
Communication in One Big Move
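As a preview of this part of the outline, here is a minimal serial sketch of Sample Sort's splitter-selection step. The sizes N and P, the single-array layout simulating P processes, and the helper cmp_int are illustrative assumptions for this sketch, not code from the handout; a real implementation would run these steps on p communicating processes and follow them with the single all-to-all exchange named above.

```c
/* A minimal serial sketch of regular sampling in Sample Sort,
   with P "processes" simulated as P blocks of one array. */
#include <stdio.h>
#include <stdlib.h>

#define N 32   /* total keys (assumption: divisible by P) */
#define P 4    /* simulated process count */

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    int keys[N], samples[P * (P - 1)], splitters[P - 1];
    srand(42);
    for (int i = 0; i < N; i++) keys[i] = rand() % 1000;

    /* Step 1: each "process" sorts its local block of N/P keys. */
    int local = N / P;
    for (int r = 0; r < P; r++)
        qsort(keys + r * local, local, sizeof(int), cmp_int);

    /* Step 2: each process contributes P-1 regularly spaced samples. */
    int s = 0;
    for (int r = 0; r < P; r++)
        for (int j = 1; j < P; j++)
            samples[s++] = keys[r * local + j * local / P];

    /* Step 3: sort the gathered samples and pick P-1 global splitters. */
    qsort(samples, s, sizeof(int), cmp_int);
    for (int j = 1; j < P; j++)
        splitters[j - 1] = samples[j * (P - 1)];

    /* Check balance: count how many keys fall into each bucket. */
    int count[P] = {0};
    for (int i = 0; i < N; i++) {
        int b = 0;
        while (b < P - 1 && keys[i] > splitters[b]) b++;
        count[b]++;
    }
    for (int b = 0; b < P; b++)
        printf("bucket %d gets %d keys\n", b, count[b]);
    return 0;
}
```

Because every process contributes the same number of regularly spaced samples, regular sampling bounds each bucket to roughly 2n/p keys (Shi & Schaeffer, 1992), which is what keeps the single all-to-all exchange balanced.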
Merge-Based and Network-Based Sorting
Parallel Merge Sort: Tree of Cooperation
Bitonic Sort: Sorting by the Rhythm
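Bitonic Sort's structure is easiest to see in code: its compare-exchange pattern is fixed in advance and independent of the data, which is why it maps so well to sorting networks and other hardware (Batcher, 1968). The sketch below is a serial version under the usual power-of-two input assumption; the names and the sample input are illustrative, not taken from the handout.

```c
/* A minimal serial sketch of Bitonic Sort. Every compare-exchange at a
   given depth is independent of the others, so each level of the
   recursion can run fully in parallel. */
#include <stdio.h>

#define ASC  1
#define DESC 0

static void compare_exchange(int *a, int i, int j, int dir) {
    /* Swap so that a[i] <= a[j] when dir == ASC, a[i] >= a[j] otherwise. */
    if ((a[i] > a[j]) == dir) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}

/* Merge a bitonic sequence of length cnt starting at lo into order dir. */
static void bitonic_merge(int *a, int lo, int cnt, int dir) {
    if (cnt <= 1) return;
    int k = cnt / 2;
    for (int i = lo; i < lo + k; i++)
        compare_exchange(a, i, i + k, dir);
    bitonic_merge(a, lo, k, dir);
    bitonic_merge(a, lo + k, k, dir);
}

/* Sort by building a bitonic sequence (ascending half followed by a
   descending half) and then merging it. */
static void bitonic_sort(int *a, int lo, int cnt, int dir) {
    if (cnt <= 1) return;
    int k = cnt / 2;
    bitonic_sort(a, lo, k, ASC);
    bitonic_sort(a, lo + k, k, DESC);
    bitonic_merge(a, lo, cnt, dir);
}

int main(void) {
    int a[8] = {7, 3, 9, 1, 6, 2, 8, 5};
    bitonic_sort(a, 0, 8, ASC);
    for (int i = 0; i < 8; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```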
Thinking Like Engineers: Cost and Scalability
The Cost Equation
Hardware Matters
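For the cost discussion, a first-order model of the kind developed in this part can be written down for Sample Sort. The symbols are conventional (p processes, n keys, t_s message startup latency, t_w per-word transfer time), but the exact breakdown below is a sketch under those assumptions, not the handout's own formula.

```latex
% First-order cost model for parallel Sample Sort (illustrative symbols,
% not the handout's own formula): p processes, n keys, t_s message
% startup latency, t_w per-word transfer time.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\[
  T_p \;\approx\;
  \underbrace{\frac{n}{p}\log\frac{n}{p}}_{\text{local sort}}
  \;+\;
  \underbrace{p^{2}\log p}_{\text{splitter selection}}
  \;+\;
  \underbrace{(p-1)\,t_s \;+\; t_w\,\frac{n}{p}}_{\text{all-to-all exchange}}
\]
\end{document}
```

A model like this predicts near-linear speedup while the local-sort term dominates, and flattening once the splitter cost or the message startup term takes over; that crossover is exactly where hardware characteristics start to matter.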
Current Lecture Handout
Beyond the Basics: How Smart Sorting Scales in Parallel Worlds, rev 2023*
Note: Links marked with an asterisk (*) lead to materials accessible only to members of the University community. Please log in with your official University account to view them.
References
Batcher, K. E. (1968). Sorting networks and their applications. Proceedings of the AFIPS Spring Joint Computer Conference, 32, 307–314. https://doi.org/10.1145/1468075.1468121
Dinan, J., et al. (2017). Scalable collective communication for extreme-scale systems. The International Journal of High Performance Computing Applications, 31(4), 382–396. https://doi.org/10.1177/1094342016646848
Govindaraju, N. K., Gray, J., Kumar, R., & Manocha, D. (2006). GPUTeraSort: High performance graphics co-processor sorting for large database management. Proceedings of the ACM SIGMOD International Conference on Management of Data, 325–336. https://doi.org/10.1145/1142473.1142511
Hammond, S. D., et al. (2020). Scaling sorting and sampling on exascale architectures. Concurrency and Computation: Practice and Experience, 32(12), e5662. https://doi.org/10.1002/cpe.5662
Marjanović, V., Gioiosa, R., Beltran, V., & Labarta, J. (2010). Overlapping communication and computation by using MPI task-aware runtime. IEEE International Conference on Cluster Computing, 1–10. https://doi.org/10.1109/CLUSTER.2010.19
Satish, N., Harris, M., & Garland, M. (2009). Designing efficient sorting algorithms for manycore GPUs. Proceedings of the IEEE International Symposium on Parallel & Distributed Processing, 1–10. https://doi.org/10.1109/IPDPS.2009.5161005
Shi, H., & Schaeffer, J. (1992). Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 14(4), 361–372. https://doi.org/10.1016/0743-7315(92)90047-M
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. (2010). Spark: Cluster computing with working sets. Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 1–7.
Access Note: Published research articles and books are linked to their respective sources. Some materials are freely accessible within the University network or when logged in with official University credentials. Others will be provided to enrolled students through the class learning management system (LMS).