OpenACC

OpenACC (http://openacc.org/) is a parallel programming standard designed to let C, C++, and Fortran programs use GPUs easily. The OpenACC API (Application Program Interface) describes a collection of compiler directives that mark loops and regions of code to be offloaded from a host CPU to an attached accelerator (GPU device).

Important Notes

Want to start from basic C++ using acc pragmas? Visit this site.

Running OpenACC Jobs

Interactive Job Submission

Request a GPU node. On the Markov class cluster, set the account (-A) and partition (-p) flags appropriately.

srun -p gpu --gres=gpu:1 -N 1 -n 6 --pty /bin/bash

Load the NVIDIA HPC SDK (NVHPC). Check for the appropriate version using "module spider NVHPC".

module load NVHPC

You can get CUDA driver and device information by issuing the command below (recent NVHPC releases name the same tool nvaccelinfo):

pgaccelinfo

Output:

PGI Compiler Option: cc75

Copy the C source below to a file named "calculate-pi.c" in your home directory. The only OpenACC-specific change to the C source is the #pragma acc directive, which runs the loop on the GPU. To avoid transferring data back and forth between the host and the device during computation, which degrades performance, use a data directive (e.g. #pragma acc data copy(data1), create(data2)) before kernels and loops. Refer to the references for details.

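#include <stdio.h>

#define N 1000000

int main(void) {
    double pi = 0.0;
    long i;

    /* Offload the loop to the GPU; the reduction clause sums pi across threads. */
    #pragma acc parallel loop reduction(+:pi)
    for (i = 0; i < N; i++) {
        double t = (i + 0.5) / N;   /* midpoint of the i-th subinterval of [0,1] */
        pi += 4.0 / (1.0 + t * t);
    }

    printf("pi=%16.15f\n", pi / N);
    return 0;
}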
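For reference, the loop is a midpoint-rule approximation of a standard integral for pi. In LaTeX notation, with N subintervals of width 1/N:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx \frac{1}{N}\sum_{i=0}^{N-1} \frac{4}{1+t_i^2},
    \qquad t_i = \frac{i+0.5}{N}
```

This is why the program accumulates 4.0/(1.0+t*t) into pi and divides by N when printing.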

Compile:

nvc -Minfo=all -acc -gpu=cc75 calculate-pi.c -o test

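where:

-acc enables recognition of OpenACC pragmas and links the OpenACC runtime libraries

-gpu=cc75 targets compute capability 7.5, the value reported by pgaccelinfo above

-Minfo=all prints compiler feedback messages during compilation

-o test names the output executable test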

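The compiler feedback from -Minfo=all shows whether each loop could be parallelized and how it was mapped to the GPU, which matters for performance. The sample below appears to come from an older compiler release targeting compute capability 2.0, so line numbers and details will differ from what nvc prints for cc75: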

main:

6, Generating compute capability 2.0 binary

8, Loop is parallelizable

#pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */

CC 2.0 : 20 registers; 2056 shared, 48 constant, 0 local memory bytes; 100% occupancy

10, Sum reduction generated for pi

Executing:

./test

Output:

pi=3.141592653589877

Using both OpenACC & OpenMP

Work in progress ...

References:

Practice exercises and solutions: /usr/local/doc/OPENACC (obtained from the OpenACC workshop at Pittsburgh, Oct 16-17, 2014)
Getting Started:
- Videos: http://www.openacc.org/Getting_Started
- Guide: http://moss.csc.ncsu.edu/~mueller/cluster/arc/openACC_gs.pdf
OpenACC Specification Guide:
- http://www.openacc.org/sites/default/files/OpenACC.2.0a_1.pdf
NVIDIA Resources:
- CUDA/OpenACC: http://developer.nvidia.com/cuda/openacc
- Tips for optimization: http://www.nvidia.com/docs/IO/117377/directives-tips-for-c.pdf
