LAMMPS
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics program. Its capabilities include:
- distributed-memory message-passing parallelism (MPI)
- spatial decomposition of the simulation domain for parallelism
- easy extension with new features and functionality
- atomic, polymeric, biological, metallic, granular, or hybrid systems
- pairwise potentials: Lennard-Jones, Coulombic, Buckingham, Morse, Yukawa, frictional granular, tabulated, hybrid
- molecular potentials: bond, angle, dihedral, improper, class 2 (COMPASS)
- polymer potentials: all-atom, united-atom, bead-spring
- long-range Coulombics: Ewald and PPPM (similar to particle-mesh Ewald)
- CHARMM and AMBER force-field compatibility
- constant NVE, NVT, NPT integrators
- rRESPA hierarchical timestepping
- SHAKE bond and angle constraints
- parallel tempering (replica exchange)
- targeted molecular dynamics (TMD) constraints
- a variety of boundary conditions and constraints
Important Note
The name of the LAMMPS executable may differ between versions. Find it with the "module display lammps" command.
Installed Versions
All available versions of LAMMPS can be listed with the following command (the same approach works for other applications as well):
module spider LAMMPS
output:
LAMMPS: LAMMPS/23Jun2022-foss-2021b-kokkos-CUDA-11.4.1
-----------------------------------------------------------------------------------------------------------------
Description:
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively ..
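Once you have identified a version, load its module before running. The sketch below uses the module name from the example output above; substitute whatever "module spider" lists on your system:

```shell
# Load the LAMMPS module found via "module spider" (version name is the
# example from above; use the one reported on your system).
module load LAMMPS/23Jun2022-foss-2021b-kokkos-CUDA-11.4.1

# Show what the module sets up, including the name of the executable.
module display lammps
```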
Running LAMMPS in HPC
Copy the job files ("*.slurm") and the LAMMPS input file ("in.lj") from /usr/local/doc/LAMMPS to your working directory:
cp /usr/local/doc/LAMMPS/* .
Serial Job
Submit your serial job:
sbatch lammps.slurm
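If you need to write lammps.slurm yourself, a minimal serial script might look like the sketch below. The module name, executable name (lmp), and time limit are assumptions; verify them with "module display lammps" and your site's policies:

```shell
#!/bin/bash
#SBATCH --job-name=lammps-serial
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Module and executable names are assumptions; check "module display lammps".
module load LAMMPS/23Jun2022-foss-2021b-kokkos-CUDA-11.4.1

# Run LAMMPS serially on the copied input file.
lmp -in in.lj
```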
Find the output at slurm-<jobid>.out
cat slurm-<jobid>.out
output:
...
Performance: 10167.140 tau/day, 23.535 timesteps/s
81.6% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 723.2 | 723.2 | 723.2 | 0.0 | 85.10
Neigh | 88.351 | 88.351 | 88.351 | 0.0 | 10.40
Comm | 13.44 | 13.44 | 13.44 | 0.0 | 1.58
Output | 0.052433 | 0.052433 | 0.052433 | 0.0 | 0.01
Modify | 18.884 | 18.884 | 18.884 | 0.0 | 2.22
Other | | 5.865 | | | 0.69
Nlocal: 32000 ave 32000 max 32000 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 18872 ave 18872 max 18872 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 1.19999e+06 ave 1.19999e+06 max 1.19999e+06 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 1199991
Ave neighs/atom = 37.4997
Neighbor list builds = 1000
Dangerous builds not checked
Total wall time: 0:14:09
Multi-node/Multi-core Parallel Job
Submit your parallel job:
sbatch mpi-lammps.slurm
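If you need to write mpi-lammps.slurm yourself, a sketch matching the 4 MPI tasks x 4 OpenMP threads shown in the output below might look like this. The module name, executable name (lmp), and launcher (srun) are assumptions; adjust to your site:

```shell
#!/bin/bash
#SBATCH --job-name=lammps-mpi
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Module and executable names are assumptions; check "module display lammps".
module load LAMMPS/23Jun2022-foss-2021b-kokkos-CUDA-11.4.1

# One OpenMP thread per allocated CPU core in each MPI task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch 4 MPI tasks via the scheduler.
srun lmp -in in.lj
```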
Find the output at slurm-<jobid>.out
Performance: 170137.133 tau/day, 393.836 timesteps/s
383.7% CPU use with 4 MPI tasks x 4 OpenMP threads
....
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 24.732 | 25.055 | 25.813 | 8.9 | 49.34
Neigh | 3.9396 | 4.0063 | 4.1408 | 4.0 | 7.89
Comm | 18.723 | 19.751 | 20.74 | 19.7 | 38.89
Output | 0.038683 | 0.055631 | 0.069498 | 5.1 | 0.11
Modify | 1.2665 | 1.8835 | 3.0945 | 53.6 | 3.71
Other | | 0.03051 | | | 0.06
Nlocal: 8000 ave 8038 max 7964 min
Histogram: 1 0 0 1 0 1 0 0 0 1
Nghost: 8590 ave 8604 max 8573 min
Histogram: 1 0 1 0 0 0 0 0 0 2
Neighs: 300139 ave 302472 max 298510 min
Histogram: 2 0 0 0 0 0 1 0 0 1
Total # of neighbors = 1200557
Ave neighs/atom = 37.5174
Neighbor list builds = 1000
Dangerous builds not checked
Total wall time: 0:00:50
GPU Job
In /usr/local/doc/LAMMPS, mentioned in the section above, you will also find the input file (in-gpu.lj) and the job file (gpu-lammps.slurm) for a GPU job.
Now submit the GPU job:
sbatch gpu-lammps.slurm
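If you need to write gpu-lammps.slurm yourself, a sketch is shown below. The module name, executable name (lmp), and GPU request syntax are assumptions for your site; the -sf gpu and -pk gpu 1 flags enable the LAMMPS GPU package on one device:

```shell
#!/bin/bash
#SBATCH --job-name=lammps-gpu
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Module and executable names are assumptions; check "module display lammps".
module load LAMMPS/23Jun2022-foss-2021b-kokkos-CUDA-11.4.1

# -sf gpu applies the GPU suffix to supported styles;
# -pk gpu 1 tells the GPU package to use one device.
srun lmp -sf gpu -pk gpu 1 -in in-gpu.lj
```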
Find the output at slurm-<jobid>.out
...
- Using acceleration for lj/cut:
- with 1 proc(s) per device.
--------------------------------------------------------------------------
Device 0: Tesla K40m, 15 CUs, 11/11 GB, 0.74 GHZ (Mixed Precision)
--------------------------------------------------------------------------
...
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 16.118 | 16.118 | 16.118 | 0.0 | 64.31
Neigh | 0.00025239 | 0.00025239 | 0.00025239 | 0.0 | 0.00
Comm | 2.0493 | 2.0493 | 2.0493 | 0.0 | 8.18
Output | 0.014197 | 0.014197 | 0.014197 | 0.0 | 0.06
Modify | 6.1671 | 6.1671 | 6.1671 | 0.0 | 24.61
Other | | 0.7154 | | | 2.85
Nlocal: 32000 ave 32000 max 32000 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 18751 ave 18751 max 18751 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 0
Ave neighs/atom = 0
Neighbor list builds = 1000
Dangerous builds not checked
Total wall time: 0:00:25
Note: compare the wall times of your MPI and GPU jobs against the serial run.
Refer to HPC Guide to Molecular Modeling and Visualization and HPC Software Guide for more information.
References:
Instructions for writing LAMMPS input files can be found on the project website.
LAMMPS Documentation - https://lammps.sandia.gov/doc/Manual.html