Installation of GAMESS for OpenMP and MPI protocols
Multiprocessor parallelization is essential for high-speed computation in high performance computing (HPC). How well a parallel calculation scales depends heavily on how well the processor cores (CPUs) can communicate with each other: data moves between processors much faster when latency is low. Many high-speed interconnect technologies have been developed, such as Intel Omni-Path Architecture, Mellanox InfiniBand, and Ethernet. Programs running on these interconnects generally use the Message Passing Interface (MPI) to parallelize the computation.
Running quantum chemistry calculations in parallel with OpenMP and MPI becomes necessary as the size of the calculation grows. A huge calculation is impractical on a standalone personal computer (PC) or laptop, so HPC is the best solution. In this post, I explain step by step how to compile and install the GAMESS program package, one of the most popular quantum mechanics (QM) programs, with OpenMP and MPI. OpenMP is more convenient than MPI for general users. MPI, on the other hand, is the protocol most needed on HPC clusters, because it can span multiple nodes.
Terminology:
- Shared memory parallel (SMP) system: every processor can see and use the same memory. Memory is shared locally within one machine.
- Distributed memory parallel system: a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data; remote data has to be exchanged explicitly.
Shared memory parallel system (OpenMP protocol)
The following are step-by-step instructions for compiling GAMESS with OpenMP.
1. Run the config script to create the compilation configuration for GAMESS. An install.info file will be created in the GAMESS directory.
For more details on this step, please see Installation of GAMESS.
export GAMESS="/location/of/GAMESS/"
./config
2. Modify the install.info file. Open install.info with a text editor, e.g. vi.
vi install.info
3. In the GMS_OPENMP line, change false to true.
setenv GMS_OPENMP false
to
setenv GMS_OPENMP true
4. Then make the same change as in step 3 to the Makefile in the GAMESS directory.
GMS_OPENMP = false
to
GMS_OPENMP = true
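5. Run the following scripts in order to compile and link the program executable: compddi, compall, and lked. Each step must finish before the next one starts.
cd $GAMESS/ddi
# build DDI in the foreground; ddikick.x exists only after compddi has finished
./compddi >& compddi.log
mv ddikick.x ../
cd ../
# compile all GAMESS sources, then link the executable as version 00
./compall >& compall.log
./lked gamess 00 >& lked.log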
6. Compiling the standard configuration (GMS_OPENMP set to false) takes about 10 - 20 minutes, whereas the OpenMP version (GMS_OPENMP set to true) takes about 20 - 30 minutes.
7. When it finishes, the program executable, "gamess.00.x", will be in the GAMESS directory.
Voilà !
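Steps 3 and 4 can also be done from the command line instead of a text editor. The following is only a minimal sketch, assuming install.info and Makefile contain the GMS_OPENMP lines in the form shown in those steps; enable_openmp is a hypothetical helper name, not something shipped with GAMESS. It keeps a .bak copy of each file it edits.

```shell
#!/bin/bash
# enable_openmp is a hypothetical helper, not part of GAMESS.
# It switches GMS_OPENMP from false to true in install.info and Makefile
# inside the directory given as the first argument.
enable_openmp() {
    local dir="$1"
    # install.info uses csh syntax: "setenv GMS_OPENMP false"
    sed -i.bak 's/^\(setenv[[:space:]]\{1,\}GMS_OPENMP[[:space:]]\{1,\}\)false/\1true/' \
        "$dir/install.info" || return 1
    # Makefile uses make syntax: "GMS_OPENMP = false"
    sed -i.bak 's/^\(GMS_OPENMP[[:space:]]*=[[:space:]]*\)false/\1true/' \
        "$dir/Makefile" || return 1
    # Show the resulting lines so the change can be checked by eye.
    grep '^setenv[[:space:]]\{1,\}GMS_OPENMP' "$dir/install.info"
    grep '^GMS_OPENMP' "$dir/Makefile"
}

# Usage (not run here): enable_openmp "$GAMESS"
```

Both files must agree on the flag before compall is run, which is why the helper edits them together.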
Distributed memory parallel system (MPI protocol)
To install GAMESS for a distributed memory parallel system using MPI, the procedure follows the same pattern as the OpenMP installation above.
1. Run the config script to generate the install.info file, which contains the compilation configuration for GAMESS.
The following listing is the install.info file generated on my Intel Xeon cluster, which is equipped with the Intel Omni-Path interconnect. I use this install.info file to build GAMESS with Intel Parallel Studio XE 2018 Update 1.
#!/bin/csh
# compilation configuration for GAMESS
# generated on host
# generated at Sat Jul 7 00:48:07 CST 2018
setenv GMS_PATH /home/gamess-src
setenv GMS_BUILD_DIR /home/gamess-src
# machine type
setenv GMS_TARGET linux64
# FORTRAN compiler setup
setenv GMS_FORTRAN ifort
setenv GMS_IFORT_VERNO 18
# mathematical library setup
setenv GMS_MATHLIB mkl
setenv GMS_MATHLIB_PATH /opt/intel/2018_u1/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64
setenv GMS_MKL_VERNO 12
# parallel message passing model setup
setenv GMS_DDI_COMM mpi
setenv GMS_MPI_LIB impi
setenv GMS_MPI_PATH /opt/intel/2018_u1/compilers_and_libraries_2018.1.163/linux/mpi
# LIBCCHEM CPU/GPU code interface
setenv GMS_LIBCCHEM false
# Intel Xeon Phi build: none/knc/knl
setenv GMS_PHI none
# Shared memory type: sysv/posix
setenv GMS_SHMTYPE sysv
# SC17 SCF OpenMP support: true/false
setenv GMS_OPENMP false
# Please match your changes to the GMS_OPENMP
# flag in /home/u7/u31rkk00/gamess-src/Makefile
# before running make
2. After install.info is created, run the compddi script in the ddi folder and wait for a while. For an MPI build, ddikick.x is not generated, so there is nothing to move to the GAMESS directory.
3. Go back to the GAMESS directory, run the compall script to compile the program, and wait about 20 - 30 minutes.
4. Run the lked script to link the GAMESS executable with version "00".
5. The program executable, "gamess.00.x", will be generated in this directory.
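The MPI steps above can be sketched as one shell function. This is an illustration under assumptions, not the official build procedure: build_gamess_mpi is a hypothetical helper name, the log file names are my own choice, and the check on GMS_DDI_COMM assumes install.info has the csh form shown in the listing above.

```shell
#!/bin/bash
# build_gamess_mpi is a hypothetical helper, not part of GAMESS.
# Arguments: GAMESS directory, version string for lked (default 00).
build_gamess_mpi() {
    local dir="$1" ver="${2:-00}"

    # Step 1 check: install.info must select MPI as the DDI communication model.
    if ! grep -q '^setenv[[:space:]]\{1,\}GMS_DDI_COMM[[:space:]]\{1,\}mpi' "$dir/install.info"; then
        echo "GMS_DDI_COMM is not mpi in $dir/install.info" >&2
        return 1
    fi

    # Steps 2-4: run each script in the foreground and keep its output in a log.
    # No ddikick.x is moved, because an MPI build does not produce one.
    (cd "$dir/ddi" && ./compddi > compddi.log 2>&1) || echo "warning: compddi returned non-zero" >&2
    (cd "$dir" && ./compall > compall.log 2>&1) || echo "warning: compall returned non-zero" >&2
    (cd "$dir" && ./lked gamess "$ver" > lked.log 2>&1) || echo "warning: lked returned non-zero" >&2

    # Step 5: the exit status of the build scripts may not reveal compile errors,
    # so the presence of the executable is the check that decides success.
    if [ -x "$dir/gamess.$ver.x" ]; then
        echo "built $dir/gamess.$ver.x"
    else
        echo "gamess.$ver.x not found, see the log files" >&2
        return 1
    fi
}

# Usage (not run here): build_gamess_mpi "$GAMESS" 00
```

If the function reports a missing executable, compall.log and lked.log are the first places to look.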
Normally, the rungms script (provided by the developers) is used to run a GAMESS calculation. It works smoothly for OpenMP but, with my GAMESS/MPI build, it fails to run a calculation. I tried several times to modify the rungms script, but it still failed. So I decided to write my own rungms script, "rungms.MPI". This modified script can be downloaded from my GitHub repository, PBS-Submission/rungms.MPI.
Voilà! Running a GAMESS calculation with MPI through the rungms.MPI script is tricky, but it works for me.
Program Testing
- SMP - OpenMP protocol
I personally suggest testing the OpenMP build of GAMESS with omp-exam12.inp, which is available at $GAMESS/tests/openmp/parallel/omp-exam12.inp
This example is an RHF (and DFT) calculation using the OpenMP algorithm. Before running it, set the number of SMP threads with the following command
export OMP_NUM_THREADS=N
where N is a sensible number of SMP threads, such as 4, 8, 16, 20, or 32, and should not exceed the number of cores on the machine. With N busy threads, top reports a CPU utilization of about N*100 %.
To run GAMESS with 8 threads, use the following commands
export OMP_NUM_THREADS=8
rungms omp-exam12.inp 00 8 >& exam12.openmp.out &
00 is the version number specified when the program executable was linked.
8 is the number of SMP threads.
Use the top command to check that CPU utilization is about 800 %.
top
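One way to pick a sensible N is to cap the requested thread count at the number of CPUs the machine reports. The sketch below assumes getconf _NPROCESSORS_ONLN is available (it is on typical Linux and macOS systems); pick_threads is a hypothetical helper name. Note that this counts logical CPUs, so with hyper-threading it may be larger than the number of physical cores.

```shell
#!/bin/bash
# pick_threads is a hypothetical helper, not part of GAMESS.
# It prints the requested thread count, capped at the number of online CPUs.
pick_threads() {
    local want="$1" cores
    cores=$(getconf _NPROCESSORS_ONLN)
    if [ "$want" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$want"
    fi
}

# Ask for 8 threads; on a smaller machine this falls back to its CPU count.
OMP_NUM_THREADS=$(pick_threads 8)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The resulting value is the one to pass as the last argument of rungms, so that the thread count in the environment and on the command line agree.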
- Distributed memory parallel system - MPI protocol
Keep in mind that, for a pure MPI run, you should always set the number of SMP threads to 1, so that each MPI process uses exactly one core and the cores are not oversubscribed.
To run a GAMESS calculation with the rungms.MPI script, use the following commands.
export OMP_NUM_THREADS=1
rungms.MPI input.inp 00 N >& output.out
where N is the total number of processor cores, such as 16, 32, 64, 200, or 400.
Use the top command to check that N MPI processes are running.
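Since N is the total number of cores, on a cluster it is usually the number of nodes multiplied by the cores used per node. The following dry-run sketch only prints the command line it would use; mpi_command is a hypothetical helper name, and the 4 nodes with 16 cores each are made-up example values, not a description of any particular cluster.

```shell
#!/bin/bash
# mpi_command is a hypothetical helper; it only prints the command (dry run).
# Arguments: input file, version string, number of nodes, cores per node.
mpi_command() {
    local input="$1" version="$2" nodes="$3" cores_per_node="$4"
    local total=$((nodes * cores_per_node))
    echo "rungms.MPI $input $version $total"
}

# One thread per MPI process, as recommended for a pure MPI run.
export OMP_NUM_THREADS=1
# Example values: 4 nodes x 16 cores = 64 MPI processes.
mpi_command input.inp 00 4 16
```

With these example values the function prints `rungms.MPI input.inp 00 64`.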
Managing Jobs on HPC using Workload Manager
A GAMESS calculation can be run on an HPC system through a workload manager, such as PBS, Torque (PBS), PBS Professional, SGE, or Slurm. One of the most powerful workload managers is PBS Professional (PBS Pro). I wrote scripts that submit GAMESS jobs through PBS Pro to run in parallel using OpenMP and MPI. The scripts can be downloaded from this GitHub repository. For jobs on a shared memory parallel system (OpenMP), please use subgms; for a distributed memory parallel system (MPI), use subgmsmpi. These two scripts were written and tested against the server policy of the TAIWANIA cluster, NCHC, Taiwan. https://iservice.nchc.org.tw.
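To give an idea of what a PBS Pro job for an OpenMP run can look like, here is a generic sketch. It is not the subgms script: make_pbs_job is a hypothetical helper name, the one-hour walltime is a placeholder, and most sites also require a queue or project directive that is left out here because it is site-specific.

```shell
#!/bin/bash
# make_pbs_job is a hypothetical helper, not the author's subgms script.
# It prints a PBS Pro job script for a single-node OpenMP run.
# Arguments: input file name (ending in .inp), number of threads.
make_pbs_job() {
    local input="$1" threads="$2"
    local name="${input%.inp}"
    cat <<EOF
#!/bin/bash
#PBS -N ${name}
#PBS -l select=1:ncpus=${threads}
#PBS -l walltime=01:00:00
cd "\$PBS_O_WORKDIR"
export OMP_NUM_THREADS=${threads}
rungms ${input} 00 ${threads} > ${name}.out 2>&1
EOF
}

# Print the job script for an 8-thread run of water_hf.inp.
make_pbs_job water_hf.inp 8
```

The output can be redirected to a file and submitted with qsub; the walltime and any queue settings should first be adapted to the policy of the cluster.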
How to use subgms and subgmsmpi with the PBS Pro job manager:
- subgms
GAMESS 20180214 R1 Interactive Job Submission on TAIWANIA, NCHC, Taiwan
-----------------------------------------------------------------------
Usage: subgms input[.inp] [output[.out]] [-help]
Example: subgms water_hf.inp
subgms water_hf.inp water_hf_16cores.out
- subgmsmpi
GAMESS 20180214 R1 Interactive Job Submission on TAIWANIA, NCHC, Taiwan
-----------------------------------------------------------------------
Usage: subgmsmpi input[.inp] [output[.out]] [-help]
Example: subgmsmpi water_hf.inp
subgmsmpi water_hf.inp water_hf_16cores
Both subgms and subgmsmpi are free to download and modify. The standard rungms script provided by the GAMESS developers cannot be used with them: subgms requires a rungms.mod script, whereas subgmsmpi requires the rungms.MPI script, to run a GAMESS calculation.
Use the following commands to see each program's help page, which explains how to prepare rungms.mod and rungms.MPI for your cluster.
- subgms
subgms help
- subgmsmpi
subgmsmpi help
For the proper way of submitting GAMESS jobs using OpenMP and MPI, read the last page of the following article: An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel® Xeon Phi™ processor
Rangsiman Ketkaew