NWChem: Benchmark of MPI Parallelization Efficiency for DFT Calculations

Message Passing Interface (MPI) is a standardized, portable message-passing standard designed to work on a wide variety of parallel computing architectures. MPI implementations work with several compilers, including Intel Composer Suite, GCC, and PGI. In practice, MPI is a communication protocol for programming parallel computers, and NWChem's code architecture is also designed around MPI. The figure on the right shows a cluster of compute nodes connected by network cables, through which each compute node can recognize the memory and CPUs of the others.

  • Distributed Memory Model
  • Message Passing Style
  • More flexible & scalable

In this benchmark, NWChem 6.8.1 was run in parallel using MPI on an HPC cluster whose CPUs are Intel Xeon Gold 6148 (20 cores, 40 threads, 2.4 GHz). The compute nodes are connected by an Intel Omni-Path (100 Gbps) network. NWChem was compiled with Intel MPI and MKL 2018. The number of processor cores was varied from hundreds to thousands: 200 - 4000 cores.
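To relate these core counts to hardware, the arithmetic can be sketched as below. This is only an illustration: it assumes one MPI process per physical core (no hyper-threading) and uses the 20 cores per CPU stated above. The function name `cpus_needed` is hypothetical, and because the number of CPUs per node is not stated here, the sketch counts CPUs rather than nodes.

```python
import math

CORES_PER_CPU = 20  # Intel Xeon Gold 6148, as stated in the text


def cpus_needed(mpi_cores, cores_per_cpu=CORES_PER_CPU):
    """Return the minimum number of CPUs that provide `mpi_cores` physical cores.

    Assumption: one MPI process is placed on each physical core.
    """
    if mpi_cores <= 0:
        raise ValueError("mpi_cores must be positive")
    return math.ceil(mpi_cores / cores_per_cpu)


# Lower bound, an intermediate value, and upper bound of the benchmark range
for cores in (200, 2000, 4000):
    print(cores, "cores ->", cpus_needed(cores), "CPUs")
```

Under these assumptions the benchmark range spans from 10 CPUs at the low end to 200 CPUs at the high end.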

Another candidate is MPI/CASPER, in which the communication performance between processes of standard MPI is improved using the CASPER algorithm. More details on CASPER are available on its official website. The computational cost of standard MPI was therefore compared with that of MPI/CASPER. The computational details and benchmark results follow.

Single-point energy calculation of a C240 molecule.

  • PBE0/6-31G(d)
  • Basis functions: 3600
  • The fastest calculation used 2400 CPU cores.

Single-point energy calculation of a Ru(II)-C2H2-Re(I) complex.

  • B3LYP/6-31G(d)
  • Basis functions: 676
  • The fastest calculation used 2000 CPU cores.

Concluding remarks

  • MPI and MPI/CASPER both show good parallelization efficiency for both calculations when using 200 - 2000 CPU cores, but poor efficiency when the number of CPU cores exceeds 2600.
  • NWChem can be well exploited by MPI and MPI/CASPER, especially when using 1800 - 2400 CPU cores.
  • Both calculations clearly show that MPI with help from CASPER beats standard MPI.
  • CASPER can significantly improve the process communication of MPI for the DFT calculation module in NWChem.
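For readers who want to quantify "parallelization efficiency" from their own timings, one common definition is the speedup relative to the smallest run, divided by the increase in core count. The sketch below assumes that definition. The function names are hypothetical, and the wall times are made-up placeholders, not results from this benchmark.

```python
def speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference run taking t_ref."""
    return t_ref / t_n


def efficiency(n_ref, t_ref, n, t_n):
    """Relative parallel efficiency of a run on n cores vs. a reference run on n_ref cores.

    A value of 1.0 means ideal scaling from n_ref to n cores.
    """
    return speedup(t_ref, t_n) * (n_ref / n)


# Placeholder wall times in seconds (illustrative only, NOT measured data)
timings = {200: 1000.0, 400: 520.0, 800: 290.0}

n_ref = min(timings)      # smallest core count serves as the reference
t_ref = timings[n_ref]

for n in sorted(timings):
    s = speedup(t_ref, timings[n])
    e = efficiency(n_ref, t_ref, n, timings[n])
    print(f"{n:5d} cores  speedup {s:5.2f}  efficiency {e:6.1%}")

# Core count with the shortest wall time among the placeholder runs
fastest = min(timings, key=timings.get)
print("fastest run:", fastest, "cores")
```

Applying the same functions to the wall times of the standard MPI and MPI/CASPER runs would let the two be compared at each core count.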

Rangsiman Ketkaew