OpenACC (http://openacc.org/) is a directive-based parallel programming standard designed to let C, C++, and Fortran programs use GPUs with minimal code changes. The OpenACC API (Application Program Interface) defines a collection of compiler directives that mark loops and regions of code to be offloaded from a host CPU to an attached accelerator (such as a GPU).
Want to start from basic C++ using acc pragmas? Visit this site.
Request a GPU node. On the class Markov cluster, set the Account (-A) and Partition (-p) flags appropriately.
srun -p gpu --gres=gpu:1 -N 1 -n 6 --pty /bin/bash
Load the NVIDIA HPC Suite (NVHPC). Check for the appropriate version with "module spider NVHPC".
module load NVHPC
You can get the CUDA driver and device information by issuing the command:
pgaccelinfo
Output:
PGI Compiler Option: cc75
Copy the C source below to a file named "calculate-pi.c" in your home directory. The only change to the serial C code is the #pragma acc directive, which runs the loop on the GPU. To avoid transferring data back and forth between the host and the device during computation, which degrades performance, wrap kernels and loops in a data region with data clauses (e.g. #pragma acc data copy(data1), create(data2) ). Refer to the references for details.
#include <stdio.h>
#define N 1000000
int main(void) {
double pi = 0.0; long i;
#pragma acc parallel loop reduction(+:pi)
for (i=0; i<N; i++) {
double t= (double)((i+0.5)/N);
pi +=4.0/(1.0+t*t);
}
printf("pi=%16.15f\n",pi/N);
return 0;
}
Compile:
nvc -Minfo=all -acc -gpu=cc75 calculate-pi.c -o test
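where,
-acc enables recognition of OpenACC pragmas and links the OpenACC runtime libraries
-gpu=cc75 generates code for GPUs of compute capability 7.5 (the value reported by pgaccelinfo above)
-Minfo=all prints compiler feedback during compilation
-o test names the output executable "test"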
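The compiler feedback from -Minfo=all reports what the compiler did with each loop, including whether it could be parallelized, which is useful when tuning performance. The exact messages depend on the compiler version and the target GPU; a sample: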
main:
6, Generating compute capability 2.0 binary
8, Loop is parallelizable
#pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 20 registers; 2056 shared, 48 constant, 0 local memory bytes; 100% occupancy
10, Sum reduction generated for pi
Executing:
./test
output: pi=3.141592653589877
Work in progress ...
References:
Practice exercises and solutions: /usr/local/doc/OPENACC (obtained from the OpenACC workshop in Pittsburgh, Oct 16-17, 2014).
Getting Started
OpenACC Specification Guide:
NVIDIA Resources:
CUDA/OPENACC: http://developer.nvidia.com/cuda/openacc
TIPS for Optimization: http://www.nvidia.com/docs/IO/117377/directives-tips-for-c.pdf