HECToR Capability Challenge
Simulating Protein Control of Crystal Nuclei

D. Quigley1, C. L. Freeman2, P. M. Rodger1 and J. H. Harding2
1. Dept. of Chemistry and Centre for Scientific Computing, University of Warwick.
2. Dept. of Engineering Materials, University of Sheffield.

16/03/2009 MARC-Sino Workshop on Computer Simulation

Contents
• Motivation
  – Biomineralisation.
  – Ovocleidin-17.
• Molecular dynamics
  – Influence of protein binding on mineral nanoparticle structure?
  – Timescale challenges and metadynamics.
  – Length-scale challenges and HECToR.
• Computational Issues
  – File IO and domain decomposition.
• Results
  – Ovocleidin-17 as a catalyst for crystal nucleation?

Biomineralisation?
[Image labels: Emiliania huxleyi coccoliths; sheets of aragonite tablets; mollusc shells; columns of calcite; calcite crystals grown without organic molecules.]
[Image sources: Henriksen et al., American Mineralogist 89, 1709-1716 (2004); Nudelman et al., Faraday Discuss. 136, 9-25 (2007); Reddy, M. M. and A. R. Hoch, Journal of Colloid and Interface Science 235, 365-370 (2001).]
• Morphological control by surface specific binding?
• Growth of minerals on biological membranes?
• Nucleation enhancement and polymorph selection?

Ovocleidin-17
• Avian eggshell growth is one of the fastest mineralisation processes in the natural world.
• Various proteins are present, with relative concentrations varying during the growth process. [Nys et al, C. R. Palevol 3, 549 (2004)]
• Ovocleidin-17 is found only in the shell of hen eggs. A crystal structure is available. [Lavelin et al, Poultry Science 79, 1014 (2000)]
• What role does this protein play in controlling the nucleation and growth of polycrystalline calcium carbonate?
• How can molecular simulation help in understanding this process?

Molecular Dynamics
Aim: Sample the ensemble of configurations available to a nanoparticle of calcium carbonate bound to ovocleidin-17 and hence identify and characterise the thermodynamically stable state.
• Protein contains 142 residues, 1599 atomic sites in AMBER united atom force-field.
• Nanoparticle contains 192 formula units of CaCO3, 1500 atomic sites modelled with Pavese / Freeman force-field.
• Explicit water essential. Known to mediate mineral-organic interactions through structuring at interface.
• Require ~20,000 TIP3P waters to eliminate image interactions.

Problems
• Sampling of crystallisation events.
  – Free energy cost to forming an order-disorder interface.
  – Negligibly rare on nanosecond timescale accessible to molecular dynamics.
  – Requires special methods such as umbrella sampling or metadynamics.
• Simulation size (~100,000 atomic sites)
  – Requires domain decomposition to treat efficiently (DL_POLY 3).
  – System decomposes into (8 x 8 x 8) or (8 x 8 x 4) interacting subdomains.
  – Investigate 4 plausible protein-nanoparticle binding geometries.
  – 1 CPU core per subdomain, requiring either 2048 or 1024 cores in total.
• Simulation length
  – Forces from metadynamics require short Δt to integrate accurately (0.5 fs).
  – Aim for 20 Δt per wall-clock second.
  – Gathering sufficient statistics requires 5 – 7 weeks continuous runtime.

Metadynamics
[Laio & Parrinello P.N.A.S. 99 12562 (2002)]
Small Gaussian bias potentials are added at the current location in order parameter space at intervals τG.
The system is pushed over free energy barriers into unexplored regions.
Provided motion of the order parameters is adiabatically slow, the free energy is recovered as the negative of the total bias.

Collective Variables
• Q4 Steinhardt “bond” orientation order parameter [Steinhardt et al. Phys. Rev. B 28, 784 – 805 (1983)]
  – k runs over all separation vectors (“bonds”) within a cut-off radius.
  – Contributions reinforce for an ordered system, and cancel for a disordered system.
• We characterise the nanoparticle using six collective variables.
  – Q4 Calcium-Calcium
  – Q4 Calcium-Carbon
  – Q4 Calcium-Oxygen
  – Q4 Carbon-Carbon
  – Q4 Carbon-Oxygen
  – Local nanoparticle potential energy
See Quigley and Rodger, Mol. Simul. in press (2009) for details.

HECToR Capability Challenge
• Cray XT4 scalar system opened for early access in September 2007.
• 11,328 CPU cores with Cray interconnect.
• Well suited to domain decomposed MD simulations.
• Parallel LUSTRE filesystem.
• 4 “capability challenge” projects began Jan 2008 sharing ~ 50% of allocated CPU time from January – August.
• Well placed to bid for this programme having adapted our code to HECToR during early access testing.

Domain Decomposition and IO
• Initial performance disappointing, 1.18 Δt per wall-clock second for ~100,000 atomic sites.
• Writing 22 MB coordinate snapshot to disk every 500 Δt.
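
The core counts and step rates quoted on these slides can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the four binding geometries run concurrently with one core per sub-domain (an interpretation that reproduces the quoted 2048 and 1024), and the function names and the 10 ns example target are hypothetical.

```python
# Back-of-envelope check of the figures quoted on the slides.
# Assumption: the four binding geometries run concurrently,
# each with one core per sub-domain.

SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def total_cores(grid, n_runs=4):
    """Cores needed for n_runs simulations with one core per sub-domain."""
    nx, ny, nz = grid
    return n_runs * nx * ny * nz


def ns_per_day(steps_per_second, timestep_fs=0.5):
    """Simulated nanoseconds covered in one day of wall-clock time."""
    return steps_per_second * timestep_fs * SECONDS_PER_DAY / FS_PER_NS


def weeks_for(simulated_ns, steps_per_second, timestep_fs=0.5):
    """Wall-clock weeks needed to cover simulated_ns at a given step rate."""
    return simulated_ns / ns_per_day(steps_per_second, timestep_fs) / 7.0


if __name__ == "__main__":
    for grid in [(8, 8, 8), (8, 8, 4)]:
        print(grid, "->", total_cores(grid), "cores")
    target_rate, initial_rate = 20.0, 1.18  # steps per wall-clock second
    print(f"target rate : {ns_per_day(target_rate):.3f} ns/day")
    print(f"initial rate: {ns_per_day(initial_rate):.3f} ns/day")
    # 10 ns is an arbitrary illustrative target, not a figure from the talk.
    print(f"10 ns at target rate : {weeks_for(10.0, target_rate):.1f} weeks")
    print(f"10 ns at initial rate: {weeks_for(10.0, initial_rate):.1f} weeks")
```

With a 0.5 fs timestep, the target of 20 steps per second corresponds to roughly 0.86 ns of simulated time per day, against roughly 0.05 ns per day at the initial 1.18 steps per second, which is why the IO bottleneck had to be removed before production runs.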
[Figure: execution profile showing time spent in computation routines and in file IO routines, plotted against time (minutes).]
• Profile dominated by file IO.
• IO strategy of “everything out through MPI rank 0.”
• Gather and sort of atomic co-ordinates timed at < 5 seconds.
• IO routines dominated by write to disk, which has low bandwidth.

Parallel IO Strategies
• Each sub-domain writes to the same file simultaneously.
  – Makes use of P times the original bandwidth.
  – Comms overhead in computing file offset for each atom site.
  – Written using Fortran direct access mode, incurs a head-seek latency.
  – Not defined in Fortran standard, so can lead to corruption.
• Each of the P sub-domains writes its own snapshot file in local index order.
  – Also makes use of P times the original bandwidth.
  – Uses contiguous writes.
  – Global snapshot data must be reconstructed offline for compatibility.
  – Suitable only for “small” systems, intractable with millions of sites.
• Solutions based on MPI-IO
  – Longer lead time to implement, but now standard in DL_POLY 3.
  – Avoid seek cost by writing in sub-domain order (breaks compatibility).
  – Alternatively have subset of “IO nodes” which sort and write in global index order.
See technical reports TR0707 and TR0806 by I. Todorov and I. J. Bush at http://www.hpcx.ac.uk/research/hpc/

Results – no protein
• Control calculations with no protein.
• 192 unit nanoparticle in water.
• Initialised as amorphous.
[Figure: projection of six dimensional free energy surface, G/kBT.]
• 52,113 steps with w=3.78 kBT (13 ns).
• 67,091 steps with w=1.0 kBT (15.5 ns).

Results - Protein
• Nanoparticle bound to a spatial concentration of arginine residues.
• Free energy barrier to crystallisation removed!
[Figure: projection of six dimensional free energy surface, G/kBT.]
Implies that ovocleidin-17 has a role in accelerating the transformation of amorphous calcium carbonate deposits into calcite nuclei.

Summary
• We have used molecular dynamics simulations to reveal a possible role for ovocleidin-17 in hen egg biomineralisation.
• Timescale issues associated with crystallisation events can be addressed with metadynamics.
• Simulations of this size and duration were only possible via the HECToR capability challenge initiative.
• Arginine-rich regions of ovocleidin-17 bind strongly to amorphous calcium carbonate and can accelerate crystallisation to calcite.

Acknowledgements
Bill Smith, Ilian Todorov, Ian Bush (STFC)
Martyn Foster, David Tanqueray (Cray)
All members of the EPSRC funded consortium “Modelling of the Biological Interface with Materials”
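
The free energy surfaces in the results rely on the metadynamics bookkeeping described earlier: Gaussians are deposited at regular intervals and the free energy is read off as the negative of the total bias. The sketch below illustrates this for a single collective variable. It is a toy model, not the six-variable DL_POLY calculation: the double-well free energy, the overdamped Langevin walker standing in for molecular dynamics, and every name and parameter value (BARRIER, height, width, stride, dt) are assumptions chosen for illustration.

```python
import math

import numpy as np

BARRIER = 5.0  # barrier height of the toy double well, in kBT (hypothetical)


def free_energy(s):
    """Toy free energy F(s) = BARRIER * (s^2 - 1)^2 with minima at s = -1, +1."""
    return BARRIER * (s * s - 1.0) ** 2


def free_energy_gradient(s):
    """dF/ds for the toy double well."""
    return 4.0 * BARRIER * s * (s * s - 1.0)


def bias_potential(s, centres, height, width):
    """Total bias at s (scalar or array): a sum of equal Gaussians."""
    s = np.asarray(s, dtype=float)
    d = s[..., None] - centres
    return height * np.exp(-0.5 * (d / width) ** 2).sum(axis=-1)


def bias_force(s, centres, height, width):
    """Minus the bias derivative at scalar s; pushes away from visited points."""
    d = s - centres
    return float(np.sum(height * d / width**2 * np.exp(-0.5 * (d / width) ** 2)))


def run_metadynamics(n_gaussians=800, stride=100, height=0.1, width=0.2,
                     dt=1.0e-3, seed=0):
    """Overdamped Langevin walker (kBT = 1, D = 1), one Gaussian per `stride` steps."""
    rng = np.random.default_rng(seed)
    centres = np.empty(n_gaussians)
    noise_scale = math.sqrt(2.0 * dt)
    s = -1.0  # start in the left-hand well
    for n in range(n_gaussians):
        noise = rng.standard_normal(stride)
        for step in range(stride):
            force = -free_energy_gradient(s) + bias_force(s, centres[:n], height, width)
            s += force * dt + noise_scale * noise[step]
        centres[n] = s  # deposit a Gaussian at the current location
    return centres


def estimated_free_energy(grid, centres, height=0.1, width=0.2):
    """Negative of the accumulated bias, shifted so its minimum on the grid is zero."""
    estimate = -bias_potential(grid, centres, height, width)
    return estimate - estimate.min()


if __name__ == "__main__":
    centres = run_metadynamics()
    grid = np.linspace(-1.5, 1.5, 7)
    for s, f_est in zip(grid, estimated_free_energy(grid, centres)):
        print(f"s = {s:+.2f}  estimate = {f_est:6.2f} kBT  "
              f"reference = {free_energy(s):6.2f} kBT")
```

Early Gaussians fill the starting well until the walker escapes over the barrier; once both wells are filled the bias keeps growing roughly uniformly, so its negative tracks the underlying free energy up to an additive constant. Smaller Gaussians or slower deposition reduce the residual error at the cost of a longer run, which is the same trade-off behind the two Gaussian heights w quoted in the control results.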