On Ranger, Pople, and Abe
TG AUS Materials Science Project
Matt McKenzie
LONI
• Car-Parrinello Molecular Dynamics
▫ www.cpmd.org
• Parallelized plane wave / pseudopotential implementation of Density Functional Theory
• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
▫ Large systems: ~500 atoms
▫ Scales with the # of electrons, NOT the # of atoms
• Developers have done a lot of work here
• The Intel compiler is used in this study
• BLAS/LAPACK
▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
▫ Some level 2 (matrix-vector ops)
• Integrated optimized FFT Library
▫ Compiler flag: -DFFT_DEFAULT
• Nature of the modeled chemical system
▫ Solids, liquids, and interfaces require different parameters, stressing memory along the way
▫ Volume and # of electrons
• Choice of the pseudopotential (psp)
▫ Norm-conserving, ‘soft’, non-linear core correction (++memory)
• Type of simulation conducted
▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc…
▫ CPMD is a robust code
• Very chemical system specific
▫ Any one CPMD sim. cannot be easily compared to another
▫ However, THERE ARE TRENDS
• FOCUS: simple wave function optimization timing
▫ This is a common ab initio calculation
• For any ab initio calculation:
▫ Accuracy is proportional to the number of basis functions used
▫ These are stored in matrices, requiring increased RAM
▫ The energy cutoff determines the size of the plane-wave basis set:
  N_PW = (1/(2π²)) Ω E_cut^(3/2)
[Image from the CPMD user's manual: pseudopotential convergence behavior w.r.t. basis-set size (cutoff)]
NOTE: Choice of psp is important, i.e. a 'softer' psp = lower cutoff = loss of transferability
VASP specializes in soft psps; CPMD works with any psp
Ψ optimization: 63 Si atoms, SGS psp
• E_cut = 50 Ryd: N_PW ≈ 134,000, memory = 1.0 GB
• E_cut = 70 Ryd: N_PW ≈ 222,000, memory = 1.8 GB
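A quick sanity check (not part of the original benchmark): the 70 Ryd plane-wave count follows from the 50 Ryd value via the E_cut^(3/2) scaling above. A minimal Python sketch, using only the numbers quoted on this slide:

# Check the N_PW and memory growth when raising E_cut from 50 to 70 Ryd,
# assuming only the slide's relation N_PW ~ E_cut^(3/2) at fixed cell volume.
npw_50 = 134_000       # plane waves at E_cut = 50 Ryd (from the slide)
mem_50_gb = 1.0        # reported memory at 50 Ryd (from the slide)

scale = (70.0 / 50.0) ** 1.5     # E_cut^(3/2) factor, ~1.66
npw_70_pred = npw_50 * scale     # ~222,000, matching the slide
mem_70_pred = mem_50_gb * scale  # ~1.7 GB; the slide reports 1.8 GB, so memory
                                 # grows slightly faster than the plane-wave count

print(f"predicted N_PW(70 Ryd) = {npw_70_pred:,.0f}")
print(f"predicted memory(70 Ryd) = {mem_70_pred:.1f} GB")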
Well-known CPMD benchmarking model: www.cpmd.org
Results can be shown either by:
• Wall time = (n steps × iteration time/step) + network overhead
▫ Typical results/interpretations, nothing new here
• Iteration time = the fundamental unit, used throughout any given CPMD calculation
▫ It neglects the network, yet results are comparable
• Note: CPMD runs well on a few nodes connected with gigabit Ethernet
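To make the two reporting conventions concrete, a minimal sketch with purely hypothetical numbers (none of these values come from the benchmark itself):

# Relate the per-iteration time to the wall time using the formula above.
n_steps = 100              # hypothetical number of optimization steps
iter_time_s = 5.0          # hypothetical iteration time per step, in seconds
network_overhead_s = 20.0  # hypothetical network/startup overhead, in seconds

wall_time_s = n_steps * iter_time_s + network_overhead_s
print(f"wall time = {wall_time_s:.0f} s; iteration time = {iter_time_s:.1f} s/step")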
Two important factors affect CPMD performance:
• MEMORY BANDWIDTH
• FLOATING-POINT PERFORMANCE
[Plot: wave function optimization timing vs. number of cores (0-256); curves: Pople 50 & 70 Ryd, Abe 50 & 70 Ryd, Ranger 50 & 70 Ryd]
• All calculations ran no longer than 2 hours
• Ranger is not the preferred machine for CPMD
• Scales well between 8 and 96 cores
▫ This is a common CPMD trend
• CPMD is known to scale super-linearly above ~1000 processors
▫ Will look into this
▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner
• Pople and Abe gave the best performance
• IF a system requires more than 96 procs, Abe would be a slightly better choice
• Given the difficulties in benchmarking CPMD (psp, volume, system phase, simulation protocol), this benchmark is not a good representation of all possible uses of CPMD
▫ Only explored one part of the code
• How each system performs when taxed with additional memory requirements is a better indicator of CPMD’s performance
▫ To increase system accuracy, increase E_cut
• %Diff between 70 and 50 Ryd: %Diff = [(t_70 - t_50) / t_50] × 100
[Plot: %Diff (0-70%) vs. number of cores (0-256) for Pople, Abe, and Ranger]
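As a worked example of the %Diff metric (with hypothetical iteration times; the measured per-machine values are only shown in the plot):

# %Diff between the 70 Ryd and 50 Ryd runs, as defined above.
t_50 = 4.0   # hypothetical iteration time at E_cut = 50 Ryd, in seconds
t_70 = 6.0   # hypothetical iteration time at E_cut = 70 Ryd, in seconds

pct_diff = (t_70 - t_50) / t_50 * 100.0
print(f"%Diff = {pct_diff:.0f}%")   # 50% slowdown when the cutoff is raised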
RANGER
• Re-ran Ranger calculations
• Lower performance may be linked to the Intel compiler on AMD chips
▫ The PGI compiler could show an improvement
▫ Nothing over 5% is expected: Ranger would still be the slowest
▫ Wanted to use the same compiler/math libraries
ABE
• Possible super-linear scaling: t(Abe, 256 procs) < t(others, 256 procs)
• Memory-size effects hinder performance below 96 procs
POPLE
• Is the best system for wave function optimization
• Shows a (relatively) stable, modest speed decrease as the memory requirement is increased; it is the recommended system
• Half-node benchmarking
• Profiling Tools
• Test the MD part of CPMD
▫ Force calculations involving the non-local parts of the psp will increase memory
▫ Extensive level 3 BLAS & some level 2
▫ Many FFT all-to-all calls; now the network plays a role
▫ Memory > 2 GB
▫ A new variable! Monitor the fictitious electron mass
• Changing the model
▫ Metallic system (lots of electrons, change of psp and E_cut)
▫ Check super-linear scaling