
Preliminary CPMD Benchmarks

On Ranger, Pople, and Abe

TG AUS Materials Science Project

Matt McKenzie

LONI

What is CPMD?

• Car Parrinello Molecular Dynamics

▫ www.cpmd.org

• Parallelized plane wave / pseudopotential implementation of Density Functional Theory

• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions

▫ Large systems: ~500 atoms

 Scales w/ # of electrons, NOT # of atoms

Key Points in Optimizing CPMD

• Developers have done a lot of work here

• The Intel compiler is used in this study

• BLAS/LAPACK

▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops); see the sketch after this list

 Some level 2 (matrix-vector)

• Integrated optimized FFT Library

▫ Compiler flag: -DFFT_DEFAULT
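For orientation, the BLAS levels named above map onto familiar dense linear algebra operations. A minimal numpy sketch (numpy dispatches these calls to whatever optimized BLAS it is linked against, e.g. MKL with the Intel stack):

```python
import numpy as np

n = 512
x, y = np.random.rand(n), np.random.rand(n)
A, B = np.random.rand(n, n), np.random.rand(n, n)

alpha = x @ y  # level 1: vector-vector (ddot)
v = A @ x      # level 2: matrix-vector (dgemv)
C = A @ B      # level 3: matrix-matrix (dgemm), where most dense-algebra time goes
```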

Benchmarking CPMD is difficult because…

• Nature of the modeled chemical system

▫ Solids, liquids, interfaces

 Require different parameters, stressing memory along the way

▫ Volume and # electrons

• Choice of the pseudopotential (psp)

▫ Norm-conserving, ‘soft’, non-linear core correction (++memory)

• Type of simulation conducted

▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc…

▫ CPMD is a robust code

• Very chemical system specific

▫ Any one CPMD sim. cannot be easily compared to another

▫ However, THERE ARE TRENDS

FOCUS: simple wave function optimization timing

▫ This is a common ab initio calculation

Probing Memory Limitations

• For any ab initio calculation:

• Accuracy is proportional to the number of basis functions used

• These are stored in matrices, requiring increased RAM

• Energy cutoff determines the size of the plane-wave basis set:

N_PW = (1/(2π²)) Ω E_cut^(3/2)
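A quick way to see the scaling is to evaluate the formula directly. A minimal Python sketch (the 8000 bohr³ volume is an illustrative assumption, not the benchmark cell's actual Ω):

```python
import math

def n_pw(omega, e_cut):
    """Plane-wave count: N_PW = Omega * E_cut**1.5 / (2 * pi**2).
    omega: cell volume in bohr^3, e_cut: cutoff in Rydberg (assumed units)."""
    return omega * e_cut ** 1.5 / (2 * math.pi ** 2)

# Illustrative volume only, at the two cutoffs used in the next slide
for e_cut in (50, 70):
    print(e_cut, "Ryd ->", round(n_pw(8000, e_cut)), "plane waves")
```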

Model Accuracy & Memory Overview

[Image from the CPMD user's manual: pseudopotential convergence behavior w.r.t. basis set size (cutoff)]

NOTE: Choice of psp is important, i.e. a 'softer' psp = lower cutoff = loss of transferability

VASP specializes in soft psps; CPMD works with any psp

Memory Comparison

Ψ optimization, 63 Si atoms, SGS psp

• E_cut = 50 Ryd: N_PW ≈ 134,000, Memory = 1.0 GB

• E_cut = 70 Ryd: N_PW ≈ 222,000, Memory = 1.8 GB
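These two rows are consistent with the E_cut^(3/2) scaling from the previous slide; a one-line check using the slide's own numbers (the linear memory extrapolation is a rough assumption, which is why it undershoots the reported 1.8 GB):

```python
ratio = (70 / 50) ** 1.5           # plane-wave growth factor, ~1.66
print(round(134_000 * ratio))      # ~222,000, matching the 70 Ryd N_PW
print(round(1.0 * ratio, 2), "GB estimated if memory ~ N_PW, vs. 1.8 GB reported")
```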

Well-known CPMD benchmarking model: www.cpmd.org

Results can be shown either by:

• Wall time = (n steps × iteration time/step) + network overhead

▫ Typical results/interpretations, nothing new here

• Iteration time = the fundamental unit, used throughout any given CPMD calculation

▫ Neglects the network, yet results are comparable

Note: CPMD runs well on a few nodes connected with Gigabit Ethernet
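The two reporting choices are related by a single expression; a minimal sketch with placeholder numbers (not measured values):

```python
def wall_time(n_steps, t_iter, overhead=0.0):
    """Wall time = (n steps * iteration time/step) + network overhead."""
    return n_steps * t_iter + overhead

# Placeholder: 100 optimization steps at 4 s/step plus 20 s of network overhead
print(wall_time(100, 4.0, overhead=20.0))  # 420.0 seconds
```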

Two important factors affect CPMD performance:

MEMORY BANDWIDTH

FLOATING-POINT

Pople, Abe, Ranger CPMD Benchmarks

[Plot: timing (0–8) vs. number of cores (0–256); series: Pople 50 Ryd, Pople 70 Ryd, Abe 50 Ryd, Abe 70 Ryd, Ranger 50 Ryd, Ranger 70 Ryd]

Results I

• All calculations ran for no longer than 2 hours

• Ranger is not the preferred machine for CPMD

• Scales well between 8 and 96 cores

▫ This is a common CPMD trend

• CPMD is known to scale super-linearly above ~1000 processors

▫ Will look into this

▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner

Results II

• Pople and Abe gave the best performance

▫ IF a system requires more than 96 procs, Abe would be a slightly better choice

• Knowing the difficulties in benchmarking CPMD (psp, volume, system phase, sim. protocol), this benchmark is not a good representation of all the possible uses of CPMD

▫ Only explored one part of the code

• How each system performs when taxed with additional memory requirements is a better indicator of CPMD’s performance

▫ To increase system accuracy, increase E_cut

Percent Difference

between 70 and 50 Ryd

%Diff = [(t₇₀ − t₅₀) / t₅₀] × 100
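The same metric in Python, applied to hypothetical timings (the actual per-core values appear only in the plot below):

```python
def pct_diff(t70, t50):
    """Percent slowdown going from the 50 Ryd to the 70 Ryd cutoff."""
    return (t70 - t50) / t50 * 100

print(pct_diff(6.0, 4.0))  # 50.0 -> the 70 Ryd run would be 50% slower
```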

[Plot: %Diff (0–70) vs. number of cores (0–256) for Pople, Abe, and Ranger]

Conclusions

RANGER

• Re-ran Ranger calculations

• Lower performance may be linked to the Intel compiler on AMD chips

▫ PGI compiler could show an improvement

▫ Nothing over 5% is expected; Ranger would still be the slowest

▫ Wanted to use the same compiler/math libraries

ABE

• Possible super-linear scaling: t(Abe, 256 procs) < t(others, 256 procs)

• Memory size effects hinder performance below 96 procs

POPLE

• Is the best system for wave function optimization

• Shows a (relatively) stable, modest speed decrease as the memory requirement is increased; it is the recommended system

Future Work

• Half-node benchmarking

• Profiling Tools

• Test the MD part of CPMD

▫ Force calculations involving the non-local parts of the psp will increase memory

▫ Extensive level 3 BLAS & some level 2

▫ Many FFT all-to-all calls; now the network plays a role

▫ Memory > 2 GB

A new variable! Monitor the fictitious electron mass

• Changing the model

▫ Metallic system (lots of electrons; change of psp and E_cut)

▫ Check super-linear scaling
