A multi-GPU system is a promising solution for providing the computing power demanded by today’s large-scale, data-driven applications. However, architectural research on multi-GPU systems is challenging due to the lack of a reliable, open-source, flexible, high-performance multi-GPU simulator. MGPUSim is an open-source tool that offers high flexibility, high performance, and high reliability for studying multi-GPU architectures. The simulator is developed in the Go programming language and models GPUs based on the AMD GCN3 ISA. MGPUSim can simulate unmodified GPU kernels compiled with the Radeon Open Computing Platform (ROCm) compiler. MGPUSim is one instance of a class of simulators built with the Akita project, a framework aimed at reducing the difficulty of developing a new computer architecture simulator. We validated MGPUSim against real GPU hardware, with an error as low as 5.5%. The value of MGPUSim is not limited to multi-GPU system simulation; it can also be used to drive studies on state-of-the-art single-GPU performance.
This tutorial introduces participants to new capabilities and advanced features that we have recently incorporated into the MGPUSim simulator as part of our recently accepted publication at HPCA 2020: “Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems”. We will engage the audience by letting them work with the simulator. In particular, we expect participants to gain a more in-depth understanding of MGPUSim by seeing: 1) how easy it is to configure, implement, and evaluate different multi-GPU system designs and inter-GPU networks; 2) how state-of-the-art memory communication mechanisms can be integrated into MGPUSim, and what the Non-Uniform Memory Access (NUMA) implications are for application performance; and 3) how Daisen, our new visual profiler integrated with MGPUSim, can be used for thorough performance analysis and identification of performance bottlenecks. By the end of the tutorial, participants will be able to use Akita and MGPUSim to support their research.
The major takeaways for participants will include:
1.- Model and simulate both single-GPU and multi-GPU architectures
2.- Configure and simulate different multi-GPU flavors
3.- Configure and simulate customized multi-GPU systems
4.- Configure and use Daisen for performance visualization
If you use MGPUSim to support your research, please use the following BibTeX entry to cite us: