DL_POLY: Software and Applications
I.T. Todorov & W. Smith
ARC Group & CC Group
CSED, STFC Daresbury Laboratory, Daresbury
Warrington WA4 1EP, Cheshire, England, UK
Where is Daresbury?
Molecular Dynamics: Definitions
• Theoretical tool for modelling the detailed microscopic behaviour
of many different types of systems, including gases, liquids, solids,
surfaces and clusters.
• In an MD simulation, the classical equations of motion governing
the microscopic time evolution of a many-body system are solved
numerically, subject to the boundary conditions appropriate for the
geometry or symmetry of the system.
• Can be used to monitor the microscopic mechanisms of energy
and mass transfer in chemical processes, and dynamical properties
such as absorption spectra, rate constants and transport properties
can be calculated.
• Can be employed as a means of sampling from a statistical
mechanical ensemble and determining equilibrium properties.
These properties include average thermodynamic quantities
(pressure, volume, temperature, etc.), structure, and free energies
along reaction paths.
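The numerical solution of the classical equations of motion mentioned above can be illustrated with a minimal sketch: velocity Verlet integration of a 1D harmonic oscillator standing in for a real force field. All names here are illustrative, not DL_POLY API.

```python
def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Integrate Newton's equations with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(nsteps):
        v += 0.5 * dt * f / mass   # half-kick
        x += dt * v                # drift
        f = force(x)               # force at the new position
        v += 0.5 * dt * f / mass   # half-kick
    return x, v

# Harmonic oscillator F = -k x, started at x = 1 with v = 0:
k, m = 1.0, 1.0
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, nsteps=1000)
energy = 0.5 * m * v**2 + 0.5 * k * x**2   # stays close to the initial 0.5
```

Velocity Verlet is attractive for MD because it is time-reversible and conserves energy well over long runs, visible here as the total energy staying near its initial value.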
DL_POLY Project Background
• General purpose parallel (classical) MD simulation software
• It was conceived to meet the needs of CCP5 - The Computer
Simulation of Condensed Phases (academic collaboration
community)
• Written in modularised Fortran90 (NagWare & FORCHECK
compliant) with MPI2 (MPI1 + MPI-I/O); fully self-contained
• 1994 – 2011: DL_POLY_2 (RD) by W. Smith & T.R. Forester
(funded for 6 years by EPSRC at DL) -> DL_POLY_CLASSIC
• 2003 – 2011: DL_POLY_3 (DD) by I.T. Todorov & W. Smith
(funded for 4 years by NERC at Cambridge) -> DL_POLY_4
• Over 11,000 licences taken out since 1994
• Over 1000 registered FORUM members since 2005
• Available free of charge (under licence) to University
researchers (provided as code) and at cost to industry
DL_POLY_DD Development Statistics
DL_POLY_DD Licence Statistics
DL_POLY Licence Statistics
DL_POLY Project Current State
• January 2011: DL_POLY_2 -> DL_POLY_CLASSIC under a BSD-type
licence (Bill Smith retired but still supporting the GUI and fixes)
• October 2010: DL_POLY_3 -> DL_POLY_4, still under STFC
licence; over 1300 licences taken out since November 2010
• Rigid Body dynamics
• Parallel I/O & netCDF I/O – NAG dCSE (IJB & ITT)
• CUDA+OpenMP port (source, ICHEC) & MS Windows port
(installers)
• SPME processor grid freed from 2^N decomposition –
NAG dCSE (IJB)
• Load Balancer development (LJE, finished 30/03/2011)
• Continuous Development of DL_FIELD (pdb to DLP I/O, CY)
Current Versions
• DL_POLY_4 (version 1.2)
– Dynamic Decomposition parallelisation, based on
domain decomposition but with dynamic load balancing –
limits up to ≈2.1×10⁹ atoms with inherent parallelisation
– Full force field and molecular description with rigid body
description
– Free format (flexible) reading with some fail-safe features
and basic reporting (but not fully fool-proofed)
• DL_POLY Classic (version 1.6)
– Replicated Data parallelisation, limits up to ≈30,000
atoms with good parallelisation up to 64 (system
dependent) processors (running on any processor count)
– Full force field and molecular description
– Hyper-dynamics: Temperature Accelerated Dynamics &
Biased Potential Dynamics, Solvation Dynamics – Spectral
Shifts, Metadynamics, Path Integral MD
– Free format reading but somewhat strict
Supported Molecular Entities
• Point ions and atoms
• Polarisable ions (core + shell)
• Flexible molecules
• Rigid bonds
• Rigid molecules
• Flexibly linked rigid molecules
• Rigid-bond-linked rigid molecules
Force Field Definitions – I
• particle: a rigid ion or atom (charged or not), a core or a shell of a
polarisable ion (with or without associated degrees of freedom), or a
massless charged site. A particle is a countable object and has a
global ID index.
• site: a particle prototype that serves to define the chemical &
physical nature (topology/connectivity/stoichiometry) of a particle
(mass, charge, frozen-ness). Sites are not atoms – they are
prototypes!
• Intra-molecular interactions: chemical bonds, bond angles,
dihedral angles, improper dihedral angles, inversions. Usually, the
members in a unit do not interact via an inter-molecular term.
However, this can be overridden for some interactions. These are
defined by site.
• Inter-molecular interactions: van der Waals, metal (EAM, Gupta,
Finnis-Sinclair, Sutton-Chen), Tersoff, three-body, four-body.
Defined by species.
Force Field Definitions – II
• Electrostatics: Standard Ewald*, Hautman-Klein (2D) Ewald*,
SPME (3D FFTs), Force-Shifted Coulomb, Reaction Field,
Fennell damped FSC+RF, Distance-dependent dielectric constant,
Fuchs correction for non-charge-neutral MD cells.
• Ion polarisation via Dynamic (Adiabatic) or Relaxed shell model.
• External fields: Electric, Magnetic, Gravitational, Oscillating &
Continuous Shear, Containing Sphere, Repulsive Wall.
• Intra-molecular-like interactions: tethers, core-shell units,
constraint and PMF units, rigid body units. These are also defined
by site.
• Potentials: parameterised analytical forms defining the
interactions. These are always spherically symmetric!
• THE CHEMICAL NATURE OF PARTICLES DOES NOT CHANGE
IN SPACE AND TIME!!!
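As a concrete illustration of one entry in the electrostatics list, here is a minimal sketch of a force-shifted Coulomb potential. It uses the common linear force-shift form in reduced units (4πε₀ = 1), which may differ in detail from DL_POLY's implementation.

```python
def fs_coulomb(qi, qj, r, rc):
    """Force-shifted Coulomb: both the potential and the force are
    shifted so they vanish smoothly at the cutoff rc.
    Reduced units: 4*pi*eps0 = 1."""
    if r >= rc:
        return 0.0, 0.0
    u = qi * qj * (1.0 / r - 1.0 / rc + (r - rc) / rc**2)   # energy
    f = qi * qj * (1.0 / r**2 - 1.0 / rc**2)                # -dU/dr
    return u, f

# just inside the cutoff both energy and force are essentially zero
u_cut, f_cut = fs_coulomb(1.0, -1.0, 10.0 - 1e-9, 10.0)
```

The shift removes the energy and force discontinuity at the cutoff, which is what makes truncated Coulomb sums usable without Ewald machinery in some systems.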
Force Field by Sums

$$
\begin{aligned}
V(\vec{r}_1,\ldots,\vec{r}_N) ={}
 & \sum_{i_{bond}=1}^{N_{bond}} U_{bond}\left(i_{bond},\vec{r}_a,\vec{r}_b\right)
 + \sum_{i_{angle}=1}^{N_{angle}} U_{angle}\left(i_{angle},\vec{r}_a,\vec{r}_b,\vec{r}_c\right) \\
 &+ \sum_{i_{dihed}=1}^{N_{dihed}} U_{dihed}\left(i_{dihed},\vec{r}_a,\vec{r}_b,\vec{r}_c,\vec{r}_d\right)
 + \sum_{i_{invers}=1}^{N_{invers}} U_{invers}\left(i_{invers},\vec{r}_a,\vec{r}_b,\vec{r}_c,\vec{r}_d\right) \\
 &+ \sum_{i_{tether}=1}^{N_{tether}} U_{tether}\left(i_{tether},\vec{r}_t,\vec{r}_{t=0}\right)
 + \sum_{i_{core\text{-}shell}=1}^{N_{core\text{-}shell}} U_{core\text{-}shell}\left(i_{core\text{-}shell},|\vec{r}_i-\vec{r}_j|\right) \\
 &+ \sum_{i,j}^{N'}\left[U_{pair}(|\vec{r}_i-\vec{r}_j|)
   + \frac{1}{4\pi\varepsilon_0}\frac{q_i q_j}{|\vec{r}_i-\vec{r}_j|}\right]
 + \sum_{i,j}^{N'}\left[V^{metal}_{pair}(|\vec{r}_i-\vec{r}_j|)
   + F\!\left(\sum_{j}^{N'}\rho_{ij}(|\vec{r}_i-\vec{r}_j|)\right)\right] \\
 &+ \sum_{i,j,k}^{N'} U_{Tersoff}(\vec{r}_i,\vec{r}_j,\vec{r}_k)
 + \sum_{i,j,k}^{N'} U_{3\text{-}body}(\vec{r}_i,\vec{r}_j,\vec{r}_k)
 + \sum_{i,j,k,n}^{N'} U_{4\text{-}body}(\vec{r}_i,\vec{r}_j,\vec{r}_k,\vec{r}_n) \\
 &+ \sum_{i=1}^{N} \Phi_{external}(\vec{r}_i)
\end{aligned}
$$

(Primed sums run over distinct pairs/triplets/quadruplets of particles only.)
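The pair term of the sum above (U_pair plus the Coulomb contribution) can be evaluated directly. This toy evaluator uses a 12-6 Lennard-Jones form for U_pair and reduced units (4πε₀ = 1), and deliberately ignores cutoffs and periodic boundaries; it is a sketch of the primed double sum, not DL_POLY code.

```python
import itertools, math

def pair_energy(positions, charges, eps, sigma):
    """Primed double sum over distinct pairs: 12-6 LJ U_pair plus the
    Coulomb term q_i q_j / r_ij (reduced units, 4*pi*eps0 = 1)."""
    u = 0.0
    for (i, ri), (j, rj) in itertools.combinations(enumerate(positions), 2):
        r = math.dist(ri, rj)
        sr6 = (sigma / r) ** 6
        u += 4.0 * eps * (sr6**2 - sr6) + charges[i] * charges[j] / r
    return u

# two neutral atoms at the LJ minimum r = 2^(1/6) sigma give U = -eps
u = pair_energy([(0.0, 0.0, 0.0), (2**(1/6), 0.0, 0.0)], [0.0, 0.0], 1.0, 1.0)
```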
Ensembles and Algorithms
Integration:
Available as velocity Verlet (VV) or leapfrog Verlet (LFV)
generating flavours of the following ensembles
• NVE
• NVT (Ekin) Evans
• NVT Andersen^, Langevin^, Berendsen, Nosé-Hoover
• NPT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^
• NσT/NPnAT/NPnγT Langevin^, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein^
Constraints & Rigid Body Solvers:
• VV dependent – RATTLE, No_Squish, QSHAKE*
• LFV dependent – SHAKE, Euler-Quaternion, QSHAKE*
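As an example of how one of the listed thermostats works, here is a minimal sketch of the Berendsen weak-coupling rescaling step in reduced units (k_B = 1). This is illustrative only, not DL_POLY's implementation.

```python
import math, random

def berendsen_scale(velocities, masses, t_target, dt, tau, k_b=1.0):
    """Berendsen weak-coupling thermostat: rescale velocities by
    lambda = sqrt(1 + (dt/tau) * (T_target/T - 1)).
    Returns the scaled velocities and the pre-scaling temperature."""
    n_dof = 3 * len(velocities)
    ekin = 0.5 * sum(m * (vx*vx + vy*vy + vz*vz)
                     for m, (vx, vy, vz) in zip(masses, velocities))
    t_inst = 2.0 * ekin / (n_dof * k_b)
    lam = math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))
    return [(lam*vx, lam*vy, lam*vz) for vx, vy, vz in velocities], t_inst

random.seed(1)
vels = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
        for _ in range(100)]
masses = [1.0] * 100
vels, t0 = berendsen_scale(vels, masses, t_target=2.0, dt=0.1, tau=0.5)
_, t1 = berendsen_scale(vels, masses, t_target=2.0, dt=0.1, tau=0.5)
# after one step the instantaneous temperature t1 is closer to the target
```

The weak coupling drives the instantaneous temperature exponentially toward the target with time constant τ, without forcing it there in a single step.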
Assumed Parallel Architecture
DL_POLY is designed for homogeneous
distributed parallel machines
[Diagram: eight processors P0–P7, each with its own local memory M0–M7, linked by an interconnect.]
Replicated Data
[Diagram: Replicated Data parallelism. Four processes A–D each run the full cycle Initialize → Forces → Motion → Statistics → Summary. The global force field (molecular force field definition) is replicated on every processor; each processor (P0, P1, P2, …) computes only its local force terms.]
Bonded Forces within RD
RD Scheme for long-ranged part of SPME
U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, H. Lee, L.G.
Pedersen, J. Chem. Phys., 103, 8577 (1995)
1. Calculate self interaction correction
2. Initialise FFT routine (3D FFT)
3. Calculate B-spline coefficients
4. Convert atomic coordinates to scaled fractional units
5. Construct B-splines
6. Construct charge array Q
7. Calculate FFT of Q array
8. Construct array G
9. Calculate FFT of G array
10. Calculate net Coulombic energy
11. Calculate atomic forces
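The B-spline steps of the scheme rest on cardinal B-splines, which SPME uses to spread point charges smoothly onto the FFT grid. A minimal sketch of the recursive definition (after Essmann et al., 1995):

```python
def bspline(n, u):
    """Cardinal B-spline M_n(u): the interpolation weight SPME uses to
    spread a point charge onto grid points. Defined by recursion from
    the order-2 (triangular) spline."""
    if n == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * bspline(n - 1, u) + (n - u) * bspline(n - 1, u - 1.0)) / (n - 1)

# spreading weights for a charge at fractional grid offset 0.3, order 4
weights = [bspline(4, 0.3 + k) for k in range(4)]
total = sum(weights)   # partition of unity: the weights sum to 1
```

The partition-of-unity property checked here is what guarantees the charge spread onto the grid sums back to the original point charge.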
Domain Decomposition
[Diagram: Domain Decomposition. The simulation cell is split into processor domains A–D; each processor (P0, P1, P2, …) holds only local atomic indices. Mapping the global force field (molecular force field definition) onto local atomic indices is the tricky part!]
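The core of domain decomposition is a cheap, communication-free rule assigning each particle to the processor that owns its region of space. A minimal sketch for an orthorhombic cell follows; the names are illustrative, not DL_POLY routines.

```python
def domain_of(position, cell, grid):
    """Map a particle to its owning MPI rank on a (px, py, pz) processor
    grid. Orthorhombic cell assumed for simplicity; positions are in
    [0, cell) along each axis."""
    ix = [min(int(grid[d] * (position[d] / cell[d])), grid[d] - 1)
          for d in range(3)]
    # linear rank from the 3D grid coordinates (x fastest)
    return (ix[2] * grid[1] + ix[1]) * grid[0] + ix[0]

grid = (2, 2, 2)                 # 8 MPI tasks, as in the diagram
cell = (10.0, 10.0, 10.0)
rank = domain_of((7.5, 2.5, 2.5), cell, grid)
```

Because every task can evaluate this rule locally, particles that drift across a domain boundary can be re-assigned with only nearest-neighbour communication.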
Bonded Forces within DD
DD Scheme for long-ranged part of SPME
U. Essmann, L. Perera, M.L. Berkowitz, T. Darden, H. Lee, L.G.
Pedersen, J. Chem. Phys., 103, 8577 (1995)
1. Calculate self interaction correction
2. Initialise FFT routine (IJB's DaFT: 3×M² 1D FFTs)
3. Calculate B-spline coefficients
4. Convert atomic coordinates to scaled fractional units
5. Construct B-splines
6. Construct partial charge array Q
7. Calculate FFT of Q array
8. Construct partial array G
9. Calculate FFT of G array
10. Calculate net Coulombic energy
11. Calculate atomic forces
I.J. Bush, I.T. Todorov, W. Smith, Comp. Phys. Commun., 175, 323
(2006)
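The "3×M² 1D FFTs" note reflects the separability of the 3D transform: a 3D DFT factorises into M² independent 1D transforms along each of the three axes, which is what lets a distributed FFT such as DaFT split the work across domains. A tiny pure-Python demonstration of the factorisation (naive DFTs, no distribution):

```python
import cmath, itertools

def dft1(seq):
    """Naive 1D DFT."""
    n = len(seq)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * j / n)
                for j, x in enumerate(seq)) for k in range(n)]

def dft3(a):
    """Reference: direct triple-sum 3D DFT of an n*n*n nested list."""
    n = len(a)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for kx, ky, kz in itertools.product(range(n), repeat=3):
        out[kx][ky][kz] = sum(
            a[x][y][z] * cmath.exp(-2j * cmath.pi * (kx*x + ky*y + kz*z) / n)
            for x, y, z in itertools.product(range(n), repeat=3))
    return out

def dft3_by_1d(a):
    """Same transform as n*n independent 1D DFTs along each axis in turn."""
    n = len(a)
    a = [[dft1(a[x][y]) for y in range(n)] for x in range(n)]   # along z
    for x in range(n):                                           # along y
        for z in range(n):
            col = dft1([a[x][y][z] for y in range(n)])
            for y in range(n):
                a[x][y][z] = col[y]
    for y in range(n):                                           # along x
        for z in range(n):
            col = dft1([a[x][y][z] for x in range(n)])
            for x in range(n):
                a[x][y][z] = col[x]
    return a

a = [[[float(x + 2*y + 4*z) for z in range(2)] for y in range(2)]
     for x in range(2)]
ref = dft3(a)
fac = dft3_by_1d(a)   # identical to the direct 3D transform
```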
Performance Weak Scaling on IBM p575
2005-2011
[Plot: Speed Gain vs Processor Count (up to 1024) for Solid Ar (32,000 atoms per CPU; 700,000 atoms per 1 GB/CPU; 33 million atoms at max load), NaCl (27,000 ions per CPU; 220,000 ions per 1 GB/CPU; 28 million ions at max load) and SPC Water (20,736 ions per CPU; 210,000 ions per 1 GB/CPU; 21 million ions at max load). All three systems stay between the perfect-parallelisation and good-parallelisation lines up to maximum load.]
Rigid Bodies versus Constraints
450,000 particles with DL_POLY_4
[Plot: steps per second (0–10) vs processor count Np (up to 600), comparing the rigid-body run (ICE7) with the constraint-bond run (ICE7_CB).]
I/O Weak Scaling on IBM p575
2005-2007
[Plot: Time (s) (0–800) vs Processor Count (up to 1024) for Solid Ar, NaCl and SPC Water; solid lines show start-up (read) times, dashed lines show shut-down (write) times.]
Benchmarking BG/L Jülich 2007
[Plot: Speed Gain vs processor count (2,000–16,000) for a 14.6 million particle Gd2Zr2O7 system; curves for MD step total, link cells, van der Waals, Ewald real and Ewald k-space against perfect scaling.]
Benchmarking XT4/5 UK 2010
[Plot: Speed Gain vs processor count (1,000–8,000) for a 14.6 million particle Gd2Zr2O7 system; curves for MD step total, link cells, van der Waals, Ewald real and Ewald k-space against perfect scaling.]
Benchmarking on Various Platforms
[Plot: Evaluations per second (0–9) vs processor count (up to 2,000) for a 3.8 million particle Gd2Zr2O7 system on CRAY XT4 (SC/DC), CRAY XT3 (SC/DC), IBM P6+, BG/L, BG/P, IBM p575 and 3 GHz Woodcrest DC.]
Importance of I/O - I
Types of MD studies most dependent on I/O
• Large length-scales (10⁹ particles), short time-scales, such as screw
deformations
• Medium-to-large length-scales (10⁶–10⁸ particles), medium time-scales
(ps–ns), such as radiation damage cascades
• Medium length-scales (10⁵–10⁶ particles), long time-scales (ns–μs), such as
membrane and protein processes
Types of I/O:

              portable   human readable   loss of precision   size
  ASCII          +              +                 –             –
  Binary         –              –                 +             +
  XDR Binary     +              –                 +             +
Importance of I/O - II
Example: a 15-million-particle system simulated with 2048 MPI tasks
MD time per timestep ~0.7 (2.7) seconds on Cray XT4 (BG/L)
Configuration read ~100 s (once during the simulation)
Configuration write ~600 s for 1.1 GB with the fastest I/O method –
MPI-I/O on Cray XT4 (parallel direct access on BG/L)
On BG/L with 16,000 MPI tasks the MD time per timestep is 0.5 s, while
writing a configuration frame takes ~18,000 s
I/O in native binary is only 3–5 times faster and 3–7 times smaller
Some unpopular solutions
• Saving only the important fragments of the configuration
• Saving only fragments that have moved more than a given distance
between two consecutive dumps
• Distributed dump – separated configuration in separate files for each
MPI task (CFD)
I/O Solutions in DL_POLY_4
1. Serial read and write (sorted/unsorted) – a single MPI task, the
master, handles all I/O: the remaining tasks send their data to it in
turn (on writing) or receive it by broadcast (on reading) while the
master completes a configuration frame of the time evolution.
2. Parallel write via direct access or MPI-I/O (sorted/unsorted) –
ALL or SOME MPI tasks print to the same file in an orderly manner so
that no overlapping occurs when using Fortran direct-access printing.
Note, however, that the behaviour of this method is not defined by the
Fortran standard; in particular we have experienced problems when the
disk cache is not coherent with memory.
3. Parallel read via MPI-I/O or Fortran
4. Serial NetCDF read and write using NetCDF libraries for
machine-independent data formats of array-based, scientific data
(widely used by various scientific communities).
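The sorted parallel write of option 2 works because fixed-width records let every task compute its own byte offset from the global particle index, so writers never overlap. A single-process Python sketch of the idea (the record layout here is illustrative, not DL_POLY's actual CONFIG format):

```python
import os, tempfile

RECORD = 73   # fixed-width record: 72 payload bytes + newline

def write_records(path, items):
    """Each (global_index, text) record lands at byte offset
    global_index * RECORD, so concurrent writers need no coordination
    and the file comes out sorted by global index."""
    with open(path, "r+b") as f:
        for gid, text in items:
            rec = text.encode().ljust(RECORD - 1) + b"\n"
            f.seek(gid * RECORD)
            f.write(rec)

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.truncate(3 * RECORD)          # pre-size the file for 3 atoms
# "task 1" writes its atoms before "task 0" – the order does not matter
write_records(path, [(2, "Na   1.0   2.0   3.0")])
write_records(path, [(1, "Cl   4.0   5.0   6.0"), (0, "Na   7.0   8.0   9.0")])
with open(path, "rb") as f:
    lines = [f.read(RECORD).decode().rstrip() for _ in range(3)]
os.remove(path)
```

In the real MPI-I/O case each rank issues its writes at the offsets of the atoms it owns, which is exactly why non-overlapping fixed-width records are essential.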
Performance for 216,000 Ions of NaCl on XT5
MPI-I/O Write Performance for 216,000 Ions of NaCl on XT5

Cores   I/O Procs   Time/s (3.09)   Time/s (3.10)   MByte/s (3.09)   MByte/s (3.10)
   32          32          143.30            1.27             0.44            49.78
   64          64           48.99            0.49             1.29           128.46
  128         128           39.59            0.53             1.59           118.11
  256         128           68.08            0.43             0.93           147.71
  512         256          113.97            1.33             0.55            47.60
 1024         256          112.79            1.20             0.56            52.47
 2048         512          135.97            0.95             0.46            66.39
MPI-I/O Read Performance for 216,000 Ions of NaCl on XT5

Cores   I/O Procs   Time/s (3.10)   Time/s (New)   MByte/s (3.10)   MByte/s (New)
   32          16            3.71           0.29            17.01          219.76
   64          16            3.65           0.30            17.28          211.65
  128          32            3.56           0.22            17.74          290.65
  256          32            3.71           0.30            16.98          213.08
  512          64            3.60           0.48            17.53          130.31
 1024          64            3.64           0.71            17.32           88.96
 2048         128            3.75           1.28            16.84           49.31
DL_POLY Project Background
• Rigid body dynamics and SPME freed from 2^N decomposition
• no topology and calcite potentials
• Fully parallel I/O: reading and writing in ASCII,
optionally including netCDF binary in AMBER format
• CUDA (ICHEC) and Windows ports
• New GUI (Bill Smith)
• Over 1,300 licences taken out since November 2010
• DL_FIELD force field builder (Chin Yong) – 300 licences
DL_FIELD
• AMBER & CHARMM to DL_POLY
• OPLS-AA & Dreiding to DL_POLY
[Diagram: xyz, PDB or protonated input → DL_FIELD ‘black box’ → FIELD & CONFIG files]
DL_POLY Roadmap
• August 2011 – March 2012: PRACE-1IP-WP7 funds effort by
ICHEC towards a CUDA+OpenMP port, by SC@WUT towards an
OpenCL+OpenMP port, and by FZ Jülich for FMP library testing
• October 2011 – October 2012: EPSRC’s dCSE funds effort by
NAG Ltd.
• OpenMP within MPI vanilla
• Beyond 2.1 billion particles
• October 2011 – September 2012: Two-Temperature Thermostat
Models, Fragmented I/O, On-the-Fly properties
• November 2011 – September 2013: MMM@HPC, Gentle
thermostat, Hyperdynamics
Acknowledgements
Thanks to
• Bill Smith (retired)
• Ian Bush (NAG Ltd.)
• Christos Kartsaklis (ORNL), Ruairi Nestor (ICHEC)
http://www.ccp5.ac.uk/DL_POLY/
Download