Tutorial Overview
Abstract
GPUs are widely used to accelerate scientific applications, but their adoption in HPC clusters presents several drawbacks. First, in addition to raising acquisition costs, accelerators also increase maintenance and space costs. Second, energy consumption rises. Third, the GPUs in a cluster may have a low utilization rate. Consequently, virtualizing the GPUs of the cluster is an appealing strategy for addressing all these drawbacks simultaneously: cluster throughput increases, whereas costs and energy consumption are reduced.
In this tutorial we present the remote GPU virtualization technique and its benefits, and introduce one framework that implements it: rCUDA. We present the latest developments within this framework: improved bandwidth to remote GPUs, support for low-power processors, integration with job schedulers, use from virtual machines, etc. In the hands-on part of the tutorial we show how to install and use this freely available remote GPU virtualization solution. We demonstrate how, by using rCUDA over a high-performance interconnect, the overhead of remote GPU virtualization is reduced to negligible values. Finally, attendees will be able to practice with rCUDA by connecting to a real cluster located at the Technical University of Valencia. This cluster includes several nodes with GPUs and the rCUDA software.
Tutorial goals
The first goal of the proposed tutorial is to introduce remote GPU virtualization technologies to attendees, showing how these technologies can increase overall cluster throughput by making the use of GPUs more flexible. The second goal is to give attendees practical knowledge of how to use remote GPU virtualization in a cluster. Attendees will benefit from the tutorial by acquiring broad knowledge of this new virtualization technique, which may provide time and energy savings in the clusters of their home institutions or companies.
Key insights the tutorial intends to provide the audience with
General description of tutorial content
The tutorial includes two well-defined types of content. On the one hand, there is a theoretical presentation introducing the general concepts of the remote GPU virtualization technique as well as the rCUDA framework. This presentation also provides many insights into the features and performance of this framework. Notice, however, that the features discussed for rCUDA are general ones, available in every modern remote GPU virtualization framework. On the other hand, the tutorial includes a practical part, which first shows how to install and use the rCUDA framework and then proposes several exercises so that attendees can explore the framework by themselves. This practical part is carried out in a cluster located at the Technical University of Valencia. Notice that both parts are interleaved in order to build anticipation and hold attendees’ attention.
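As a flavor of the practical part, the client-side setup can be sketched as follows. This is only an illustrative sketch, assuming that an rCUDA client locates its remote GPUs through the environment variables RCUDA_DEVICE_COUNT and RCUDA_DEVICE_<n> (with values of the form server:gpu_index), as we understand the rCUDA user guide to describe; check the guide shipped with your rCUDA version for the exact names and syntax. The helper rcuda_client_env and the host names gpu-node1 and gpu-node2 are hypothetical and serve only as placeholders.

```python
def rcuda_client_env(remote_gpus):
    """Build the environment variables an rCUDA client is assumed to read.

    remote_gpus: list of (server, gpu_index) pairs, one per remote GPU
    that the application should see as a local CUDA device.
    """
    # Assumed variable: total number of remote GPUs exposed to the application.
    env = {"RCUDA_DEVICE_COUNT": str(len(remote_gpus))}
    # Assumed variables: one entry per virtual device, "server:gpu_index".
    for device, (server, gpu_index) in enumerate(remote_gpus):
        env[f"RCUDA_DEVICE_{device}"] = f"{server}:{gpu_index}"
    return env


if __name__ == "__main__":
    # Hypothetical servers: GPU 0 of gpu-node1 and GPU 1 of gpu-node2.
    env = rcuda_client_env([("gpu-node1", 0), ("gpu-node2", 1)])
    # Print shell export lines that could be pasted into a job script.
    for name, value in sorted(env.items()):
        print(f"export {name}={value}")
```

With such settings in place, an unmodified CUDA application started on a node without GPUs would be expected to see two CUDA devices, each one physically located in a different remote node.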