Mohamed Ibrahim

Analyzing and Leveraging Remote-Core Bandwidth for Enhanced Performance in Graphics Processing Units

Computer Science | William & Mary

Co-Authors: H. Liu, O. Kayiran

Advisor: Adwait Jog

Abstract

Bandwidth achieved from local/shared caches and memory is a major determinant of performance in Graphics Processing Units (GPUs). These existing sources of bandwidth are often insufficient for optimal GPU performance. To improve performance further, we focus on efficiently unlocking an additional potential source of bandwidth, which we call remote-core bandwidth. It stems from the observation that a fraction of the data required by one GPU core (i.e., its L1 read misses) can also be found in the local (L1) caches of other GPU cores. In this paper, we propose to efficiently coordinate data movement across GPU cores to exploit this remote-core bandwidth. However, we find that detecting and utilizing it efficiently presents several challenges. To this end, we specifically address: a) which data is shared across cores, b) which cores hold the shared data, and c) how to obtain that data as quickly as possible. Our extensive evaluation across a wide set of general-purpose GPU applications shows that the additional bandwidth received from remote cores yields a significant performance improvement at a modest hardware cost.
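The observation behind remote-core bandwidth can be illustrated with a toy model. The sketch below is not the mechanism proposed in the paper. It only counts, for a given read trace, how many L1 read misses on one core could in principle have been served from another core's L1. The names `L1Cache` and `remote_hit_fraction`, the fully associative LRU policy, and the trace format are all assumptions made for illustration, and the model ignores the interconnect, latency, and coherence.

```python
from collections import OrderedDict


class L1Cache:
    """Toy fully associative LRU cache of line addresses (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def contains(self, addr):
        # Probe without updating the LRU order.
        return addr in self.lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and evict the LRU line if full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def remote_hit_fraction(trace, num_cores, capacity):
    """Fraction of L1 read misses whose line is present in some other core's L1.

    trace: iterable of (core_id, line_addr) read accesses (assumed format).
    """
    caches = [L1Cache(capacity) for _ in range(num_cores)]
    misses = 0
    remote_hits = 0
    for core, addr in trace:
        if not caches[core].access(addr):
            misses += 1
            # A local miss counts as a remote hit if any other core holds the line.
            if any(c.contains(addr) for i, c in enumerate(caches) if i != core):
                remote_hits += 1
    return remote_hits / misses if misses else 0.0


if __name__ == "__main__":
    # Two cores reading overlapping lines (made-up trace for illustration).
    trace = [(0, 10), (1, 10), (1, 20), (0, 20), (0, 10)]
    print(remote_hit_fraction(trace, num_cores=2, capacity=4))
```

In this made-up trace, two of the four L1 misses find their line in the other core's cache. This is the kind of sharing the abstract refers to, although real workloads and cache organizations would behave differently.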

Bio

Mohamed Assem Ibrahim is a Ph.D. candidate in the Department of Computer Science at William & Mary under the supervision of Professor Adwait Jog. Mohamed’s research interests lie in the broad area of computer architecture, with an emphasis on designing high-performance and energy-efficient GPU architectures. Before joining William & Mary, he received his bachelor’s and master’s degrees in Computer Engineering from Cairo University, Egypt.
