Over the last decade, interest in parallel programming has grown tremendously, and hardware systems that contain multiple levels of parallelism have become mainstream. Chips that contain many processing cores, each capable of running multiple hardware threads, are now commonplace. At one end of the spectrum, it is common to find laptop and desktop systems that contain a small number (2-8) of these Shared-Memory Processor (SMP) chips; at the other, high-end computing systems now contain hundreds of these SMP chips, resulting in machines capable of running more than 1000 hardware threads simultaneously. As processor speeds stagnate, software developers are being forced to exploit the parallelism available in these systems in order to improve the performance of their applications.

Of course, these advances in the construction of large parallel machines, whether single-node SMPs or large distributed clusters, are made with the intention of providing more performance for the applications designed to run on these systems. Thus, it is imperative that we give software developers the means to exploit them. Programming models and languages are instrumental in allowing developers to build parallel applications efficiently and with suitable performance.

The performance of parallel applications relies heavily on the underlying synchronization primitives used for concurrency control, so it is necessary to study the performance implications of those primitives. Programming scalable, massively parallel applications using fine-grained locking is a challenging problem that requires significant expertise. Transactional Memory (TM) is emerging as a promising alternative to traditional lock-based synchronization: transactional programming is easier for programmers because much of the burden of concurrency control is handled by the underlying system. This will become increasingly important as the productivity of software developers continues to be stressed.
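The sketch below illustrates the contrast just described. It is a minimal, hypothetical example (the Account struct, the transfer functions, and the USE_GNU_TM macro are all invented for illustration): the lock-based version requires the programmer to manage per-object locks and acquisition order, while the transactional version, here written against GCC's experimental TM extension (compiled with -fgnu-tm), delegates conflict detection and rollback to the underlying system.

```cpp
// Hypothetical bank-transfer example: fine-grained locking vs. a
// transactional-memory atomic block.
#include <mutex>

struct Account {
    long balance = 0;
    std::mutex m;  // per-account lock for the fine-grained version
};

// Fine-grained locking: the programmer must acquire both locks without
// deadlocking; std::scoped_lock acquires them with a deadlock-avoidance
// algorithm. Assumes &from != &to.
void transfer_locked(Account& from, Account& to, long amount) {
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance   += amount;
}

// Transactional version: define USE_GNU_TM and compile with
// `g++ -fgnu-tm` (experimental GCC support). The TM runtime detects
// conflicting transactions and rolls them back; the programmer no
// longer reasons about locks or lock ordering.
#if defined(USE_GNU_TM)
void transfer_tm(Account& from, Account& to, long amount) {
    __transaction_atomic {
        from.balance -= amount;
        to.balance   += amount;
    }
}
#endif
```

Note the design trade-off the example makes visible: the lock-based version encodes the concurrency-control policy in the code itself, whereas the transactional version states only which operations must appear atomic.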
Adapted from Challenges for Parallel Computing. Co-Chairs: Kit Barton, IBM Canada Ltd.; Amy Wang, IBM Canada Ltd.; Steven Perron, IBM Canada Ltd.; Priya Unnikrishnan, IBM Canada Ltd.
It is important to raise awareness of the challenges practitioners may face when evolving sequential code to exploit multicore platforms, so that they are better prepared for future evolution.
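As one concrete illustration of such an evolution, the sketch below shows a common first step: rewriting a sequential reduction with a C++17 parallel execution policy. This is an assumed scenario (the data and sizes are invented), and availability of the parallel algorithms varies by toolchain (GCC's libstdc++, for example, typically requires linking against TBB).

```cpp
// Minimal sketch: evolving a sequential reduction into a parallel one
// using C++17 execution policies.
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> data(10'000'000, 1.0);

    // Sequential baseline.
    double s1 = std::accumulate(data.begin(), data.end(), 0.0);

    // Parallel version: std::reduce may reorder and regroup the
    // additions across cores, so it is only correct because '+' is
    // treated as associative for this data. Floating-point results
    // can therefore differ slightly from the sequential sum.
    double s2 = std::reduce(std::execution::par,
                            data.begin(), data.end(), 0.0);

    std::cout << "sequential: " << s1 << ", parallel: " << s2 << '\n';
    return 0;
}
```

Even this small step surfaces the kinds of pitfalls practitioners should be prepared for: hidden ordering assumptions in the sequential code, floating-point non-determinism, and toolchain-specific requirements for enabling parallel execution.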