Gaussian: Benchmark of G16 and Nvidia Tesla P100 GPU Acceleration

Gaussian 16 (G16) supports general-purpose graphics processing unit (GPGPU) computing to accelerate quantum-chemical calculations. GPGPU in G16 works only with Nvidia Tesla-series GPUs, including the K20, K40, and P100 (as of this writing I am using G16 revision B.01). GPGPU acceleration is available for Hartree-Fock (HF) and density functional theory (DFT) methods, in particular gradient and frequency (Hessian) calculations. The G16 developers note that GPGPU is not effective for n-th-order Møller–Plesset (MPn) or coupled-cluster (CC) calculations, nor for small jobs. Thus G16-GPGPU should only be used for large calculations.

In this post, I present a benchmark of the GPU speedup of the state-of-the-art Nvidia Tesla P100 SXM 16GB using G16 revision B.01 compiled with GPGPU support, versus the regular CPU-only build of the same G16 release. The test calculation is a DFT geometry optimization of vomilenine in the gas phase.

Compute Node Specification

Preparation of G16 input for exploiting GPGPU

G16 is friendly even for new users, and preparing an input file that runs on both CPUs and GPUs is straightforward. First, you need to know how many CPU cores and GPUs your machine has. Use the command lscpu to list the available CPU cores, and use nvidia-smi to check the status of your machine's Nvidia GPUs.
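As a quick sketch, the two counts can also be grabbed from the command line; nproc (GNU coreutils) prints the number of usable cores, and nvidia-smi --list-gpus prints one line per GPU when the Nvidia driver is installed:

```shell
# Count usable CPU cores (lscpu gives the full topology; nproc just the count)
ncores=$(nproc)
echo "Usable CPU cores: $ncores"

# Count Nvidia GPUs if the driver is present; otherwise report zero
if command -v nvidia-smi >/dev/null 2>&1; then
    ngpus=$(nvidia-smi --list-gpus | wc -l)
else
    ngpus=0
fi
echo "Detected GPUs: $ngpus"
```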

In my case, I have 40 CPU cores and 4 GPUs. Four of the CPU cores will be dedicated to controlling the 4 GPUs, leaving 36 cores for the computation itself. In total, the calculation therefore uses 36 CPU cores and 4 GPUs. To set up a GPGPU input file with this allocation, I replace the %nprocshared=N line with the following lines

%CPU=0-39

%GPU=0-3=36-39
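For other node sizes, the two directives can be derived with a little shell arithmetic. The sketch below assumes, as above, that the last n_gpus cores are reserved as GPU control threads (40 cores and 4 GPUs here, matching my node):

```shell
# Derive G16 %CPU/%GPU directives from core and GPU counts,
# assuming the last n_gpus cores drive the GPUs
total_cores=40
n_gpus=4
first_ctrl=$(( total_cores - n_gpus ))
last_core=$(( total_cores - 1 ))
cpu_line="%CPU=0-${last_core}"
gpu_line="%GPU=0-$(( n_gpus - 1 ))=${first_ctrl}-${last_core}"
printf '%s\n%s\n' "$cpu_line" "$gpu_line"
```

For 40 cores and 4 GPUs this prints exactly the two lines shown above.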

I call this setup CPU36+GPU4: the calculation runs on CPU cores 0-39, with cores 36-39 assigned to control GPUs 0-3, so 36 cores are left for pure CPU work.

For more details please consult http://gaussian.com/gpu/.


My G16-GPU calculation is submitted with the command

 g16 < input > output 2>&1 

Here 2>&1 redirects stderr to the same place as stdout, so both error messages and normal output end up in the output file.
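The effect of the 2>&1 redirection can be demonstrated with a small self-contained sketch (no G16 needed): both streams of a command group land in the same file.

```shell
# Demonstrate that "> file 2>&1" merges stdout and stderr into one file
tmpfile=$(mktemp)
{ echo "to stdout"; echo "to stderr" >&2; } > "$tmpfile" 2>&1
merged=$(cat "$tmpfile")
echo "$merged"
rm -f "$tmpfile"
```

Note that the order matters: "2>&1 > file" would send stderr to the terminal instead.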

To make sure that your G16 calculation is actually using the GPUs, run the nvidia-smi utility and check the beginning of the output file. Below is an example of the nvidia-smi interface and an excerpt of the output file.

GPU utilization

Calculation using CPU8+GPU4 processors: 4 GPUs are used for GPGPU, and 4 of the 12 CPU cores are used to control those GPUs.

Computational details

The input file is available at https://pastebin.com/B6GfC1Kc.

Structure of Vomilenine

Benchmark Results

Benchmark-G16-GPGPU

Concluding remarks


Rangsiman Ketkaew