Detailed Outline

The tutorial program can be adapted to either a half-day or a full-day format. The outline for the full-day version is as follows:

  • Presentation of remote GPU virtualization techniques and rCUDA basic features (1 hour): This first part of the tutorial introduces and motivates the use of remote GPU virtualization technologies. To that end, the main drawbacks of current GPU clusters will be analyzed, along with ways to address them. The rCUDA framework will then be presented, together with some basic performance numbers showing that this technology is able to address the concerns previously raised.
  • Practical demonstration of how to install and use rCUDA (1 hour): Following the initial presentation of the rCUDA technology, the presenters will connect to our cluster at the Technical University of Valencia to show how to install the rCUDA software. The execution of several applications with the rCUDA framework will also be demonstrated.
  • Presentation of rCUDA advanced features (1 hour): Once the basic ideas of remote GPU virtualization and the rCUDA technology have been introduced and the use of the rCUDA framework has been demonstrated, the presenters will provide insights into more advanced features, such as the use of rCUDA with virtual machines, an in-depth analysis of execution time and energy consumption at the cluster level, bandwidth optimizations, etc.
  • Guided exercises for the audience using rCUDA in a remote cluster (1.5 hours): In this part of the tutorial the presenters will guide attendees through different exercises so that they can practice with rCUDA in our cluster at the Technical University of Valencia. By the end of this part, attendees will have practical knowledge of the remote GPU virtualization technique. We propose the following exercises:
  1. Local GPU vs. Remote GPU: simple applications to help attendees become familiar with the software environment (deviceQuery, bandwidthTest, etc.)
  2. Local GPU vs. Remote GPU: advanced applications (matrixMult, nbody, monteCarlo, etc.)
  3. Local GPU vs. Remote GPU: applications using CUDA libraries (CUBLAS, CUFFT, CURAND, etc.)
  4. Local GPU vs. Remote GPU: production codes (LAMMPS, CUDASW, GPU-BLAST, etc.)
  5. Local CPU vs. Remote GPU: demonstrate that using a remote GPU is faster than using a local CPU when running parallel applications
  6. Using multiple GPUs:
    • Environment configuration for using multiple remote GPUs
    • Aggregated bandwidth of 4 local GPUs vs. 6 remote GPUs
    • Multi-GPU application (e.g., monteCarloMultiGPU): 4 local GPUs vs. 6 remote GPUs
  7. Sharing GPUs: all attendees share the same GPU at the same time to observe the impact on performance
  • Time for attendees to freely experiment with rCUDA in the remote cluster (1.5 hours): In this last part of the tutorial attendees will spend time on more advanced practical exercises in our cluster at the Technical University of Valencia. Two options are offered. Attendees who prefer to be guided can work through several proposed exercises, while those who prefer to follow their own path can use this time to do so. In both cases the tutorial presenters will be available to answer questions and support attendees while they use rCUDA.
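
As an illustration of the environment configuration mentioned in exercise 6, the following is a minimal sketch of how a client might be pointed at two remote GPUs. It assumes the rCUDA client library reads the RCUDA_DEVICE_COUNT and RCUDA_DEVICE_<n> environment variables, as described in the rCUDA user guide; the host names gpunode1 and gpunode2 are hypothetical placeholders, not nodes of the tutorial cluster.

```shell
# Assumption: the rCUDA client library reads these variables at start-up.
# Host names "gpunode1" and "gpunode2" are hypothetical placeholders.
export RCUDA_DEVICE_COUNT=2        # number of remote GPUs visible to the application
export RCUDA_DEVICE_0=gpunode1:0   # device 0 maps to GPU 0 on server gpunode1
export RCUDA_DEVICE_1=gpunode2:0   # device 1 maps to GPU 0 on server gpunode2

# Print the resulting device mapping for a quick sanity check
env | grep '^RCUDA_DEVICE_' | sort
```

With such a configuration in place, an unmodified CUDA application linked against the rCUDA client library (for example, deviceQuery) would be expected to see the two remote GPUs as if they were local devices.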