PGI Compilers

PGI (not available in HPC; use nvhpc instead)

The Portland Group, Inc. (PGI: http://www.pgroup.com/about/index.htm) supplies compilers and tools for parallel computing. PGI 2011 (http://www.pgroup.com/resources/accel.htm) and later releases also include the PGI Accelerator Fortran and C99 compilers for x64+NVIDIA systems running Linux, Mac OS X, and Windows. The PGFORTRAN and PGCC accelerator compilers are supported on all Intel and AMD x64 processor-based systems with CUDA-enabled NVIDIA GPUs.

Important Notes

Refer to the License Management section for license information.
PGI supports GPUs; refer to OpenACC.
Want to start from basic C++ using omp & acc pragmas? Visit this site.

Installed Versions

To list all available versions of PGI, issue the following command. The same approach works for other applications.

module avail pgi

output:

---------------------- /usr/local/share/modulefiles -------------------------

pgi/17.10 pgi/20.1 (D)

The default version is marked with "(D)" after the module name and can be loaded with:

module load pgi

The other versions of PGI can be loaded as:

module load pgi/<version>

Compiling a Code & Executing

Compiling a C or C++ Program

Interactive

Request the compute node:

srun -c 4 --mem=8g --pty /bin/bash

Load the PGI module

module load pgi

Copy the content below into a file named hello.c:

hello.c:

#include <stdio.h>

int main()

{

printf("Hello World\n");

return 0;

}

Compile the code to create "hello" as an executable

pgcc -o hello hello.c

Run the Executable

./hello

Your output will be: Hello World

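For C++ source files (.cc) such as the one shown below, use the PGI C++ compiler, for example "pgc++ -o hello hello.cc" (older PGI releases named this driver pgCC).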

#include <iostream>

using namespace std;

int main(int argc, char **argv)

{

std::cout << "Hello world!" << std::endl;

return 0;

}

Batch

Copy the content below into a job script (for example, job.slurm):

#!/bin/bash

#SBATCH --time=00:10:00

#SBATCH -c 4

#SBATCH --mem=8g

#SBATCH -J PGItest

module load pgi

cd $SLURM_SUBMIT_DIR

cp hello.c $TMPDIR

cd $TMPDIR

pgcc -o hello hello.c

./hello

cp * $SLURM_SUBMIT_DIR

Submit the job:

sbatch job.slurm

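Because the script does not set an output file, Slurm writes the job output to slurm-<jobid>.out in the directory from which the job was submitted.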

GPU Compiling

Request a GPU node

srun -p gpu --gres=gpu:1 -c 4 --mem=8g --time=10 --pty /bin/bash

Compile the code:

pgcc test.c -ta=nvidia -fast -Minfo -o output

For more information, visit OpenACC.

Compiling a FORTRAN Program

To compile the FORTRAN 77 source file "test.f" with the PGI FORTRAN 77 compiler, use the command

pgf77 -o test test.f

To compile the FORTRAN 90 source file "test.f90" with the PGI FORTRAN 90 compiler, use the command

pgf90 -o test test.f90

Note that the PGI FORTRAN 77 compiler is used for source files ending in ".f" and the PGI FORTRAN 90 compiler is used for source files ending in ".f90".

License Management

Check both PGI licenses by issuing the commands below. The license server and port are listed on the first line of /usr/local/pgi/license.dat. The matlab module is loaded here only because it provides the lmstat utility.

module load matlab

$MATLAB/etc/lmstat -a -c 27009@hpcmaster.priv.cwru.edu

You can check the number of licenses you have checked out by appending "| grep <caseID>" to the lmstat command above. Because the number of licenses is limited, keep your license use to a minimum.

Compilers & Flags

For example, to compile GPU code with PGI's C compiler, use:

srun -c 4 --mem=8g --pty /bin/bash

pgcc test.c -ta=nvidia -fast -Minfo -o output

An example of Level 2 BLAS (matrix <- vector x vector) GPU code, for use only with the PGI compiler:

#include <stdio.h>

#include <stdlib.h>

#include <assert.h>

//This code does a blas-level-2:

//Matrix <- vec x vec

int main(int argc, char *argv[]){

float *restrict gpu_vec1;

float *restrict gpu_vec2;

float *restrict gpu_result;

int cnt,i,j,N,n=1000; //i,j are row, column; N=matrix size; n=vector size; cnt=tmp. count variable

N=n*n;

gpu_vec1=(float*)malloc(n*sizeof(float));

gpu_vec2=(float*)malloc(n*sizeof(float));

gpu_result=(float*)malloc(N*sizeof(float));

for(cnt=0;cnt<n;cnt++){ gpu_vec1[cnt]=(float)cnt; gpu_vec2[cnt]=(float)cnt; }

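#pragma acc region //Runs on the GPU - legacy PGI-only directive; the OpenACC counterpart is "#pragma acc kernels"

{

for(i=0;i<n;i++){ for(j=0;j<n;j++){ gpu_result[(i+n*j)]=gpu_vec1[j]*gpu_vec2[i];} }

}

printf("gpu_result[N-1] = %f\n", gpu_result[N-1]); //Spot-check the last matrix entry

free(gpu_vec1); free(gpu_vec2); free(gpu_result);

return 0;

}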

Refer to HPC Guide to Compiling and Linking for more info.