OpenMP
OpenMP (https://computing.llnl.gov/tutorials/openMP/) is an API that enables direct multi-threaded, shared-memory parallelism.
Important Notes
Request a number of processors that matches the number of OpenMP threads you want to use.
If you need a large number of threads, look for available SMP nodes in HPC Resource View.
Want to start with basic C++ using omp pragmas? Visit this site.
Compiling OpenMP Code
The compilers available to you on the HPC cluster support OpenMP. Find the available compilers in the Software Guide and use the one appropriate to your source code (C/C++ or Fortran), as shown below:
Using GNU Compiler
gcc -fopenmp <your-c-code> -o <executable-name>
g++ -fopenmp <your-c++-code> -o <executable-name>
gfortran -fopenmp <your-fortran-code> -o <executable-name>
Using Intel Compiler
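icc -openmp <your-c/c++-code> -o <executable-name>
ifort -openmp <your-Fortran-code> -o <executable-name>
Newer Intel compiler releases deprecate -openmp in favor of -qopenmp; use -qopenmp if -openmp is rejected.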
Using PGI Compiler
The PGI module must be loaded before you can use the PGI compiler. Use the command "module avail pgi" to see the available versions. To load the default version, type:
module load pgi
Compiling:
pgcc -mp <your-c/c++-code> -o <executable-name>
pgf90 -mp <your-Fortran-code> -o <executable-name>
Running Batch Job
Example - hello.c using Intel Compiler:
Copy and paste the following hello.c file into your home directory.
#include <omp.h>
#include <stdio.h>   /* needed for printf */

int main(void) {
    int nthreads, tid;

    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */

    return 0;
}
Compile using the Intel compiler icc. The Intel module is loaded by default.
icc -openmp hello.c -o hello
After compiling the hello executable, create the following job.slurm script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -J openmp_test
#SBATCH --output=hello.txt
# OpenMP threads share a single node, so use the CPUs allocated to the task
nproc=$SLURM_CPUS_PER_TASK
echo $nproc threads
cp hello $PFSDIR
cd $PFSDIR
# Set the number of Threads and Execute
export OMP_NUM_THREADS=$nproc
module load intel
./hello
cp * $SLURM_SUBMIT_DIR
Submit the job:
sbatch job.slurm
Output file (hello.txt):
4 threads
Hello World from thread = 0
Hello World from thread = 1
Hello World from thread = 2
Number of threads = 4
Hello World from thread = 3
Scale-Up
Copy the C code "pi_trape.c", which calculates pi using trapezoidal integration. Compile the code and run it as a batch job as described above. Note the smaller elapsed time when the code runs with OpenMP threads.
More examples can be found at https://computing.llnl.gov/tutorials/openMP/exercise.html