LAMMPS-ICMS version 19 Nov 2010
Here are some simple comparisons (note the differing numbers of CPU cores) of results from lammps-icms binaries compiled for various machines on the NSF TeraGrid. They are mainly meant to give an idea of the performance that can be expected from hybrid OpenMP/MPI or GPU/MPI parallel execution of LAMMPS. The input files and raw output can be retrieved from the LAMMPS-ICMS git repository.
Machines
- NICS Kraken: Cray XT5, 2x 2.6 GHz 6-core AMD Opteron per node, SeaStar2
- TACC Ranger: Sun Constellation, 4x 2.3 GHz 4-core AMD Opteron per node, SDR Infiniband
- NCSA Abe: Dell Blade Cluster, 2x 2.33 GHz 4-core Intel Xeon (Clovertown) per node, DDR Infiniband
- TACC Longhorn: Dell r610 Cluster, 2x 2.53 GHz 4-core Intel Xeon (Nehalem) and 2x Nvidia Quadro FX 5800 per node, QDR Infiniband
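In every OpenMP/MPI run in the tables below, the number of MPI tasks times the OpenMP threads per task equals the number of allocated nodes times the cores per node (for example, 32 Kraken nodes x 12 cores = 384 = 192 tasks x 2 threads). The following is a minimal Python sketch of that bookkeeping, assuming the per-node core counts from the machine list above. The names `NODE_LAYOUT`, `total_cores` and `threads_to_fill` are hypothetical helpers for illustration and are not part of LAMMPS.

```python
# Per-node layout taken from the machine list: (sockets, cores per socket).
# Hypothetical helper data, not part of LAMMPS.
NODE_LAYOUT = {
    "NICS Kraken": (2, 6),
    "TACC Ranger": (4, 4),
    "NCSA Abe": (2, 4),
    "TACC Longhorn": (2, 4),
}


def cores_per_node(machine: str) -> int:
    """Number of CPU cores in one node of the given machine."""
    sockets, cores_per_socket = NODE_LAYOUT[machine]
    return sockets * cores_per_socket


def total_cores(machine: str, nodes: int) -> int:
    """CPU cores available in an allocation of `nodes` nodes."""
    return nodes * cores_per_node(machine)


def threads_to_fill(machine: str, nodes: int, mpi_tasks: int) -> int:
    """OpenMP threads per MPI task needed to occupy every allocated core.

    Raises ValueError if the cores cannot be split evenly among the tasks.
    """
    cores = total_cores(machine, nodes)
    if cores % mpi_tasks != 0:
        raise ValueError(f"{cores} cores cannot be split evenly over {mpi_tasks} tasks")
    return cores // mpi_tasks


if __name__ == "__main__":
    # 32 Kraken nodes with 128 MPI tasks, as in test system 1.
    print(threads_to_fill("NICS Kraken", 32, 128))  # prints 3
```

The pure-MPI Abe runs use 112 tasks on 16 nodes (128 cores), which this helper would reject as not filling the nodes evenly; that is expected, since those runs deliberately leave some cores idle.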
Test system 1: Lennard-Jones melt: 1,149,984 particles for 10,000 MD steps

| Machine | Nodes | MPI tasks | OpenMP threads/task | GPUs/task | Time (s) |
|---|---|---|---|---|---|
| NICS Kraken | 32 | 384 | - | - | 72.8 |
| NICS Kraken | 32 | 192 | 2 | - | 72.6 |
| NICS Kraken | 32 | 128 | 3 | - | 79.4 |
| NICS Kraken | 32 | 64 | 6 | - | 102.7 |
| TACC Ranger | 8 | 128 | - | - | 230.0 |
| TACC Ranger | 8 | 64 | 2 | - | 215.3 |
| TACC Ranger | 8 | 32 | 4 | - | 305.7 |
| TACC Longhorn | 16 | 128 | - | - | 103.0 |
| TACC Longhorn | 16 | 64 | 2 | - | 110.5 |
| TACC Longhorn | 16 | 32 | - | 1 | 51.9 |
| NCSA Abe | 16 | 112 | - | - | 193.9 |
| NCSA Abe | 16 | 64 | 2 | - | 201.1 |
| NCSA Lincoln | 16 | 32 | - | 1 | 74.7 |
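Because the machines use different node and core counts, wall-clock times are easier to compare once converted to a rate. The following minimal sketch assumes that throughput is measured in particle-steps per second and that speedup is a plain ratio of wall-clock times. The functions `throughput` and `speedup` are hypothetical helpers, and the numbers are taken from the two Longhorn rows of test system 1 that use 16 nodes without OpenMP threads.

```python
def throughput(particles: int, steps: int, seconds: float) -> float:
    """Particle-steps completed per second of wall-clock time."""
    return particles * steps / seconds


def speedup(reference_seconds: float, seconds: float) -> float:
    """How many times faster a run is than a reference run."""
    return reference_seconds / seconds


PARTICLES = 1_149_984  # Lennard-Jones melt, test system 1
STEPS = 10_000

# TACC Longhorn, 16 nodes: 128 MPI tasks on CPUs vs. 32 MPI tasks with 1 GPU each.
cpu_time, gpu_time = 103.0, 51.9

if __name__ == "__main__":
    print(f"CPU: {throughput(PARTICLES, STEPS, cpu_time):.3e} particle-steps/s")
    print(f"GPU: {throughput(PARTICLES, STEPS, gpu_time):.3e} particle-steps/s")
    print(f"GPU speedup: {speedup(cpu_time, gpu_time):.2f}x")
```

With these numbers, the GPU run on the same 16 nodes is roughly twice as fast as the pure-MPI CPU run.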
Test system 2: Gay-Berne droplet: 64,000 particles for 10,000 MD steps

| Machine | Nodes | MPI tasks | OpenMP threads/task | GPUs/task | Time (s) |
|---|---|---|---|---|---|
| NICS Kraken | 16 | 192 | - | - | 738.9 |
| NICS Kraken | 16 | 96 | 2 | - | 746.2 |
| NICS Kraken | 16 | 64 | 3 | - | 731.9 |
| NICS Kraken | 16 | 32 | 6 | - | 752.1 |
| TACC Ranger | 8 | 128 | - | - | 905.8 |
| TACC Ranger | 8 | 64 | 2 | - | 917.1 |
| TACC Ranger | 8 | 32 | 4 | - | 909.1 |
| TACC Longhorn | 16 | 128 | - | - | 510.1 |
| TACC Longhorn | 16 | 32 | 4 | - | 508.4 |
| TACC Longhorn | 16 | 32 | - | 1 | 93.4 |
| NCSA Abe | 16 | 112 | - | - | 1042.3 |
| NCSA Abe | 16 | 64 | 2 | - | 921.3 |
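To read the table by machine, it helps to pick out the fastest configuration on each one. The following minimal sketch restates the Gay-Berne rows as Python records, with `None` standing in for a "-" entry, and selects the fastest run per machine. The `Run` record and `fastest_per_machine` function are hypothetical helpers for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Run(NamedTuple):
    machine: str
    nodes: int
    mpi_tasks: int
    omp_threads: Optional[int]    # None where the table shows "-"
    gpus_per_task: Optional[int]  # None where the table shows "-"
    seconds: float


# Test system 2 (Gay-Berne droplet), copied from the table above.
RUNS: List[Run] = [
    Run("NICS Kraken", 16, 192, None, None, 738.9),
    Run("NICS Kraken", 16, 96, 2, None, 746.2),
    Run("NICS Kraken", 16, 64, 3, None, 731.9),
    Run("NICS Kraken", 16, 32, 6, None, 752.1),
    Run("TACC Ranger", 8, 128, None, None, 905.8),
    Run("TACC Ranger", 8, 64, 2, None, 917.1),
    Run("TACC Ranger", 8, 32, 4, None, 909.1),
    Run("TACC Longhorn", 16, 128, None, None, 510.1),
    Run("TACC Longhorn", 16, 32, 4, None, 508.4),
    Run("TACC Longhorn", 16, 32, None, 1, 93.4),
    Run("NCSA Abe", 16, 112, None, None, 1042.3),
    Run("NCSA Abe", 16, 64, 2, None, 921.3),
]


def fastest_per_machine(runs: List[Run]) -> Dict[str, Run]:
    """Map each machine name to its run with the lowest wall-clock time."""
    best: Dict[str, Run] = {}
    for run in runs:
        if run.machine not in best or run.seconds < best[run.machine].seconds:
            best[run.machine] = run
    return best


if __name__ == "__main__":
    for machine, run in fastest_per_machine(RUNS).items():
        print(machine, run.mpi_tasks, run.omp_threads, run.gpus_per_task, run.seconds)
```

For this system, the CPU configurations on Kraken and Ranger differ by only a few percent, while the GPU run on Longhorn is more than five times faster than any CPU run on the same nodes.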
Test system 3: SDS coarse grain monolayer: 250,240 particles for 10,000 MD steps

| Machine | Nodes | MPI tasks | OpenMP threads/task | GPUs/task | Time (s) |
|---|---|---|---|---|---|
| NICS Kraken | 8 | 96 | - | - | 254.0 |
| NICS Kraken | 8 | 48 | 2 | - | 230.9 |
| NICS Kraken | 8 | 32 | 3 | - | 248.9 |
| NICS Kraken | 8 | 16 | 6 | - | 344.9 |
| TACC Ranger | 8 | 128 | - | - | 294.7 |
| TACC Ranger | 8 | 64 | 2 | - | 193.5 |
| TACC Ranger | 8 | 32 | 4 | - | 262.0 |
| TACC Longhorn | 16 | 128 | - | - | 125.2 |
| TACC Longhorn | 16 | 64 | 2 | - | 108.3 |
| TACC Longhorn | 16 | 32 | - | 1 | 43.7 |
| NCSA Abe | 16 | 112 | - | - | 212.3 |
| NCSA Abe | 16 | 64 | 2 | - | 171.3 |
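For the SDS monolayer, the pure-MPI time and the best OpenMP/MPI time on the same nodes can be compared directly. The following minimal sketch assumes that "gain" means the pure-MPI time divided by the fastest hybrid time on the same machine. The dictionary restates only the CPU rows of the table, and `hybrid_gain` is a hypothetical helper. Note that the Abe pure-MPI run used 112 tasks, while the hybrid run occupies all 128 cores.

```python
from typing import Dict, Tuple

# CPU-only rows of test system 3: machine -> {OpenMP threads per task: seconds}.
# A thread count of 1 stands for the pure-MPI run ("-" in the table).
TIMES: Dict[str, Dict[int, float]] = {
    "NICS Kraken": {1: 254.0, 2: 230.9, 3: 248.9, 6: 344.9},
    "TACC Ranger": {1: 294.7, 2: 193.5, 4: 262.0},
    "TACC Longhorn": {1: 125.2, 2: 108.3},
    "NCSA Abe": {1: 212.3, 2: 171.3},
}


def hybrid_gain(times: Dict[int, float]) -> Tuple[int, float]:
    """Return (best thread count, pure-MPI time / best hybrid time)."""
    hybrid = {threads: secs for threads, secs in times.items() if threads > 1}
    best_threads = min(hybrid, key=hybrid.get)
    return best_threads, times[1] / hybrid[best_threads]


if __name__ == "__main__":
    for machine, times in TIMES.items():
        threads, gain = hybrid_gain(times)
        print(f"{machine}: {threads} threads/task, {gain:.2f}x vs. pure MPI")
```

On every machine, two threads per task gives the best hybrid time for this system, with the largest gain on Ranger (about 1.5x).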
Test system 4: Bulk CHARMM water: 84,000 atoms for 10,000 MD steps

| Machine | Nodes | MPI tasks | OpenMP threads/task | GPUs/task | Time (s) |
|---|---|---|---|---|---|
| NICS Kraken | 16 | 192 | - | - | 430.7 |
| NICS Kraken | 16 | 96 | 2 | - | 347.8 |
| NICS Kraken | 16 | 64 | 3 | - | 348.8 |
| NICS Kraken | 16 | 32 | 6 | - | 508.1 |
| TACC Ranger | 8 | 128 | - | - | 681.7 |
| TACC Ranger | 8 | 64 | 2 | - | 547.3 |
| TACC Ranger | 8 | 32 | 4 | - | 668.0 |
| TACC Longhorn | 16 | 128 | - | - | 323.9 |
| TACC Longhorn | 16 | 64 | 2 | - | 298.6 |
| TACC Longhorn | 16 | 32 | - | 1 | 211.1 |
| NCSA Abe | 16 | 112 | - | - | 527.1 |
| NCSA Abe | 16 | 64 | 2 | - | 430.3 |
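Because the runs use different numbers of CPU cores, one rough way to put them on a common footing is to multiply wall-clock time by the number of allocated cores (core-seconds, lower is better). The following minimal sketch works under that assumption. It uses the fastest CPU-only run per machine from the water table and the per-node core counts from the machine list. `core_seconds` is a hypothetical helper, and the metric ignores differences in clock speed, memory bandwidth and interconnect.

```python
from typing import Dict, Tuple

# Cores per node from the machine list (sockets x cores per socket).
CORES_PER_NODE: Dict[str, int] = {
    "NICS Kraken": 12,
    "TACC Ranger": 16,
    "TACC Longhorn": 8,
    "NCSA Abe": 8,
}

# Fastest CPU-only run per machine for test system 4: (nodes, seconds).
FASTEST_CPU: Dict[str, Tuple[int, float]] = {
    "NICS Kraken": (16, 347.8),
    "TACC Ranger": (8, 547.3),
    "TACC Longhorn": (16, 298.6),
    "NCSA Abe": (16, 430.3),
}


def core_seconds(machine: str, nodes: int, seconds: float) -> float:
    """Allocated CPU cores multiplied by wall-clock time."""
    return nodes * CORES_PER_NODE[machine] * seconds


# Machines ordered from cheapest to most expensive in core-seconds.
ranking = sorted(FASTEST_CPU, key=lambda m: core_seconds(m, *FASTEST_CPU[m]))

if __name__ == "__main__":
    for machine in ranking:
        nodes, seconds = FASTEST_CPU[machine]
        print(f"{machine}: {core_seconds(machine, nodes, seconds):,.0f} core-s")
```

By this rough measure, the Nehalem-based Longhorn nodes need the fewest core-seconds for the water system, and Ranger needs the most.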