Molecular dynamics tutorial with applications to aqueous systems
Garold Murdachaew
1400-1600, 13-14 October 2015
Chemicum A122

Outline
  Why should I learn about molecular simulations?
  Why should I learn about aqueous systems?
  CP2K package for molecular simulations
  VMD package for molecular visualization and analysis
  gnuplot, bash scripts, and Fortran codes for analysis of MD trajectories
  Hands-on MD exercises at CSC (taito) and on your local Linux machine using CP2K and VMD

Why should I learn about molecular simulations?
  Another tool in your toolbox to study systems more complex than clusters
  Simulations are computer experiments
  Simulations allow one to see atomic detail and discover reaction mechanisms
  Simulations allow one to model difficult conditions or processes that are:
    not possible in the lab (e.g., high P, high T)
    too dangerous (e.g., deactivation/breakdown of nerve agents)
    too expensive
  Always keep in mind: “The purpose of computing is insight, not numbers.” – Richard Hamming, Numerical Methods for Scientists and Engineers

Why should I learn about aqueous systems?
  Water is ubiquitous
    Atmospheric and environmental chemistry (one example: molecular adsorption and chemical reactions on wet and icy surfaces can lead to ozone holes)
    Catalysis
    Astrochemistry
      Simulated production of biological precursors on ice grains in the interstellar medium: http://pubs.acs.org/doi/abs/10.1021/jp502738x (see picture 1)
  Water is necessary for life
    Biochemistry and biology
    Ion channels
    Protein folding to native structure (see picture 2): http://www0.cs.ucl.ac.uk/staff/d.jones/t42morph.html
  “Liquid water is not a bit player in the theatre of life, it’s the headline act.” – Martin Chaplin, London South Bank University, Water Structure and Science, http://www1.lsbu.ac.uk/water/

The phase diagram of water is complex
  http://www1.lsbu.ac.uk/water/water_phase_diagram.html

CP2K package for molecular simulations
  CP2K is a free, open-source (Fortran 2003), capable, and versatile package with a large, active user and developer base
  Some key parts of CP2K (we will use the bolded capabilities in the exercises)
    FIST: classical molecular mechanics
    Quickstep: density functional calculations
    QM/MM: quantum mechanics and classical mechanics
    Molecular dynamics, Monte Carlo, and much more
  See: http://www.cp2k.org/
  Science with CP2K: http://www.cp2k.org/science
  Upcoming CECAM workshop: http://www.cecam.org/workshop-1122.html
  Previous CECAM workshop: http://www.cecam.org/workshop-273.html
  Tutorials: http://www.cp2k.org/tutorials
  Exercises: http://www.cp2k.org/exercises
  Input manual: http://manual.cp2k.org/trunk/CP2K_INPUT.html
  Google Groups: https://groups.google.com/forum/#!forum/cp2k
  On taito:
    module load cp2k-env/2.5
    sbatch cp2k_script.bash

VMD package for molecular visualization & analysis
  VMD is free to download
  Can be used for visualization and also analysis (GPU acceleration possible)
  Can handle large systems and long trajectories in many formats (xyz, etc.)
  Can produce publication-quality snapshots and movies in many popular formats
  Can be run interactively or using a script
  See: http://www.ks.uiuc.edu/Research/vmd/
  Tutorials: http://www.ks.uiuc.edu/Research/vmd/current/docs.html#tutorials
  Documentation: http://www.ks.uiuc.edu/Research/vmd/current/docs.html
  Mailing list for questions: http://www.ks.uiuc.edu/Research/vmd/mailing_list/vmd-l/
  On taito:
    module load vmd
    vmd system.xyz   (or: vmd -e vmd_script.vmd)

Exercises
  Hands-on exercises at CSC (taito) and on your local Linux machine (ask if you wish to run locally) using CP2K and VMD. (Note that all examples are already equilibrated, but you should confirm this.)
  Structure and dynamics of ambient bulk liquid water using:
    Example 1: Classical potential (exercise4)
    Example 2: Density functional theory (exercise5)
    Calculate: internal energy (enthalpy); structure (RDFs); diffusion coefficient (Einstein relation); IR spectrum. Compare to experiment.
  Example 3: Rare instance of formic acid dissociation at the air-water interface studied with DFT (exercise6)
    Timescale of deprotonation; Grotthuss migration of the proton defect; mechanisms; RDFs.
  Extra examples (ask if interested): minimum energy structures of water clusters (H2O)n, n = 1-21, from density functional theory; sulfuric acid deprotonation on a wet quartz surface using DFT; etc.
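The diffusion coefficient listed under "Calculate" can be estimated from the Einstein relation: in three dimensions, D is the long-time slope of the mean squared displacement (MSD) versus time, divided by 6. Below is a minimal Python sketch of that estimate, assuming unwrapped coordinates in Å and a fixed time between saved frames. The function names and the skip_fraction parameter are illustrative assumptions and are not part of the course's Fortran analysis codes.

```python
def mean_squared_displacement(frames):
    """MSD relative to the first frame, averaged over all particles.

    frames: list of frames; each frame is a list of (x, y, z) tuples of
    UNWRAPPED coordinates (no periodic-image jumps), in Angstrom.
    """
    ref = frames[0]
    msd = []
    for frame in frames:
        total = 0.0
        for (x, y, z), (x0, y0, z0) in zip(frame, ref):
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        msd.append(total / len(ref))
    return msd


def linear_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def diffusion_coefficient(frames, dt_fs, skip_fraction=0.2):
    """Einstein relation in 3D: D = slope(MSD vs. time) / 6.

    dt_fs is the time between saved frames in fs. The first
    `skip_fraction` of the MSD curve is left out of the fit, on the
    assumption (illustrative choice) that it is not yet diffusive.
    Returns D in Angstrom^2/fs; multiply by 0.1 to get cm^2/s.
    """
    msd = mean_squared_displacement(frames)
    times = [i * dt_fs for i in range(len(msd))]
    start = int(skip_fraction * len(msd))
    return linear_slope(times[start:], msd[start:]) / 6.0
```

If the trajectory coordinates have been wrapped back into the periodic cell, they must be unwrapped before computing the MSD. A single time origin is used here for brevity; averaging over several time origins gives a smoother MSD curve.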

Important CP2K and theory references
  Quickstep: http://www.sciencedirect.com/science/article/pii/S0010465505000615 (paper1)
  Performance of BLYP-D2 for water and effectiveness in reproducing the hydrogen bond: http://pubs.acs.org/doi/abstract/10.1021/jp901990u (paper2); see also: https://en.wikipedia.org/wiki/Hydrogen_bond ; https://en.wikipedia.org/wiki/Water_model
  Grotthuss mechanism: http://www.sciencedirect.com/science/article/pii/000926149500905J (paper3); see also: https://en.wikipedia.org/wiki/Grotthuss_mechanism
  Grimme’s DFT-D2: http://onlinelibrary.wiley.com/doi/10.1002/jcc.20495/abstract (paper4) or see: https://en.wikipedia.org/wiki/London_dispersion_force
  Books:
    M. P. Allen, D. J. Tildesley, Computer Simulation of Liquids (1989)
    Donald McQuarrie, Statistical Mechanics (1976, 2000)
    Dominik Marx, Jürg Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (2009)
    Mark Tuckerman, Statistical Mechanics: Theory and Molecular Simulation (2010)
  View the wiki links; then download and start reading these papers, starting with the Quickstep paper (paper1), while you are waiting for calculations to finish. Finish the reading at home. Papers 5, 6, and 7 (see next page) may also be helpful. Some of you already have a background in these areas and some do not, so I included the wiki links to give a quick flavor.

Recent publications from Halonen group using CP2K
  Relevant papers:
    Relevant to Examples 1 and 2: simulated with a semiempirical method (NDDO): “Semiempirical Self-Consistent Polarization Description of Bulk Water, the Liquid-Vapor Interface, and Cubic Ice” http://pubs.acs.org/doi/abs/10.1021/jp110481m (paper5)
    Relevant to Example 3: simulated with DFT; shows acid deprotonation and the Grotthuss mechanism: “Dissociation of HCl into Ions on Wet Hydroxylated (0001) α-Quartz” http://pubs.acs.org/doi/abs/10.1021/jz4017969 (paper6)
    Relevant to Example 3: simulated with classical potentials; shows molecular scattering: “Nitrogen dioxide at the air–water interface: trapping, absorption, and solvation in the bulk and at the surface” http://pubs.rsc.org/en/content/articlehtml/2012/cp/c2cp42810e (paper7)
  Other papers:
    Ice slab and proton hopping example using DFT from Sampsa Riikonen: “Ionization of Acids on the Quasi-Liquid Layer of Ice” http://pubs.acs.org/doi/abs/10.1021/jp505627n
    Simulated with DFT; shows acid deprotonation and the Grotthuss mechanism: “First and second deprotonation of H2SO4 on wet hydroxylated (0001) α-quartz” http://pubs.rsc.org/en/content/articlehtml/2014/cp/c4cp02752c

CP2K example 1: Water with classical potential

  @SET BASE_NAME run
  @SET ID 01

  &GLOBAL
    PROJECT liq
    PREFERRED_FFT_LIBRARY FFTW
    PRINT_LEVEL LOW
    RUN_TYPE GEOMETRY_OPTIMIZATION
  &END GLOBAL

  &MOTION
    &GEO_OPT
      TYPE minimization
      OPTIMIZER BFGS
      MAX_ITER 400 ! 200 is default
    &END GEO_OPT
  &END MOTION

  &FORCE_EVAL
    METHOD FIST
    &MM
      &POISSON
        &EWALD
          EWALD_TYPE spme
          ALPHA .44
          GMAX 25 25 25
          O_SPLINE 6
        &END EWALD
      &END POISSON
      &FORCEFIELD
        EMAX_ACCURACY 500.0
        EMAX_SPLINE 1.0E15 ! 10000000000.0
        EPS_SPLINE 1.0E-9
        &SPLINE
        &END SPLINE
        &BEND
          ATOMS H O H
          K 0.
          THETA0 1.8
        &END BEND
        &BEND
          ATOMS O H H
          K 0.
          THETA0 1.8
        &END BEND
        &BOND
          ATOMS O H
          K 0.
          R0 1.8
        &END BOND
        &BOND
          ATOMS H H
          K 0.
          R0 1.8
        &END BOND
        &CHARGE
          ATOM O
          CHARGE -0.8476
        &END CHARGE
        &CHARGE
          ATOM H
          CHARGE 0.4238
        &END CHARGE
        &NONBONDED
          &LENNARD-JONES
            ATOMS O O
            EPSILON 78.198 ! in K; = 0.155 kcal/mol = 0.650 kJ/mol
            SIGMA 3.166
            RCUT 11.4
          &END LENNARD-JONES
          &LENNARD-JONES
            ATOMS O H
            EPSILON 0.0
            SIGMA 3.6705
            RCUT 11.4
          &END LENNARD-JONES
          &LENNARD-JONES
            ATOMS H H
            EPSILON 0.0
            SIGMA 3.30523
            RCUT 11.4
          &END LENNARD-JONES
        &END NONBONDED
      &END FORCEFIELD
    &END MM
    &SUBSYS
      &CELL
        ABC 12.4138 12.4138 12.4138
      &END CELL
      &COORD
        O 12.25967785390 1.34872474190 12.42975017890 H2O
        H 12.28658481340 1.45497852510 11.43794042330 H2O
        H 12.12685964540 2.28501721350 12.78165108500 H2O
        ...
        H 10.52064998830 9.65806143920 9.70630308870 H2O
      &END COORD
      &TOPOLOGY
        &GENERATE
          ! BONDLENGTH_MAX 2.0
          BONDPARM_FACTOR 0.9
        &END GENERATE
      &END TOPOLOGY
      &KIND O
        ELEMENT O
      &END KIND
      &KIND H
        ELEMENT H
      &END KIND
      &PRINT
        &CELL
        &END CELL
      &END PRINT
    &END SUBSYS
    &PRINT
      &GRID_INFORMATION
      &END GRID_INFORMATION
    &END PRINT
  &END FORCE_EVAL

  !&EXT_RESTART
  ! RESTART_FILE_NAME ./run-01.restart
  !&END EXT_RESTART

CP2K example 1: Water with classical potential
  As you can see, the CP2K input file can have four major sections (the order of the sections is not important). Note that "!" or "#" comments out the rest of the line.

  &GLOBAL
    PROJECT liq
    PREFERRED_FFT_LIBRARY FFTW
    PRINT_LEVEL LOW
    RUN_TYPE GEOMETRY_OPTIMIZATION
  &END GLOBAL

  &MOTION
    &GEO_OPT
      TYPE minimization
      OPTIMIZER BFGS
      MAX_ITER 400 ! 200 is default
    &END GEO_OPT
  &END MOTION

  &FORCE_EVAL
    METHOD FIST
    &MM
      &POISSON
        &EWALD
        …
    &END MM
    &SUBSYS
      &CELL
        ABC 12.4138 12.4138 12.4138
      &END CELL
      &COORD
        O 12.25967785390 1.34872474190 12.42975017890 H2O
        H 12.28658481340 1.45497852510 11.43794042330 H2O
        H 12.12685964540 2.28501721350 12.78165108500 H2O
        …
      &END COORD
      ….
  &END FORCE_EVAL

  !&EXT_RESTART
  ! RESTART_FILE_NAME ./run-01.restart
  !&END EXT_RESTART

Running example 1
  1. Log in to taito (you will be running calculations in the queue, so keep the user guide open in a web browser for reference: https://research.csc.fi/taito-user-guide)
  2. cd $WRKDIR
  3. cp -pr /wrk/murdacha/md_class .   (copy directories with Fortran analysis codes and examples to your WRKDIR)
  4. cd md_class/ANALYZE_PROGRAMS   (compile two simple Fortran 2003 analysis programs; later try to understand these programs since you may run them)
  5. module load gcc
  6. cd src-analyze-water
  7. make analyze.x
  8. cd ../src-rdf-water
  9. make rdf.x
  10. cd $WRKDIR/liq_spce   (this is the input we just went over = Exercise4 for the class)
  11. sbatch runit.bash
      1. But first: edit the input and script if needed; module load vmd; vmd geometry.xyz or vmd -e liq.vmd to see the starting geometry
  12. Examine the output:
      1. Use gnuplot on the *.ener file to check energy conservation (plot column 2 versus 4, then column 2 versus 5 and 6)
      2. Use vmd to view the trajectory: module load vmd; vmd run-01.xyz or use the vmd script (may need to edit)

Running example 1
  13. Now do the short MD NVE run, but first clean the directory (rm some files) and edit liq.inp:
      1. Replace RUN_TYPE GEOMETRY_OPTIMIZATION by RUN_TYPE MD (this means the GEO_OPT settings will be ignored)
      2. Add these lines (see file md_lines) after the line &END GEO_OPT. Note: as listed, the active ensemble is NVT and NVE appears only as a comment; for the short NVE run described here, use ENSEMBLE NVE.

         &MD
           ENSEMBLE NVT ! NVE
           STEPS 1000
           TIMESTEP 1.0
           TEMPERATURE 300.0
           &THERMOSTAT
             TYPE NOSE
             REGION MOLECULE
             &NOSE
               LENGTH 3
               YOSHIDA 3
               TIMECON 100
               MTS 2
             &END NOSE
           &END THERMOSTAT
           &PRINT ON
             &ENERGY
               &EACH
                 MD 1
               &END EACH
               FILENAME =${BASE_NAME}-${ID}.ener
             &END ENERGY
           &END PRINT
         &END MD
         &PRINT
           &TRAJECTORY ON
             &EACH
               MD 10
             &END EACH
             FILENAME =${BASE_NAME}-${ID}.xyz
             FORMAT XYZ
           &END TRAJECTORY
           &VELOCITIES ON
             &EACH
               MD 10
             &END EACH
             FILENAME =${BASE_NAME}-${ID}_vel.xyz
             FORMAT XYZ
           &END VELOCITIES
           &FORCES ON
             &EACH
               MD 10
             &END EACH
             FILENAME =${BASE_NAME}-${ID}_force.xyz
             FORMAT XYZ
           &END FORCES
           &RESTART_HISTORY
             &EACH
               MD 1000
             &END EACH
           &END RESTART_HISTORY
           &RESTART ON
             BACKUP_COPIES 1
             &EACH
               MD 1
             &END EACH
             FILENAME =${BASE_NAME}-${ID}.restart
           &END RESTART
         &END PRINT

      3. Do the run: sbatch runit.bash
      4. Examine the output:
         1. Use gnuplot on the *.ener file to check energy conservation (plot column 2 versus 4, then column 2 versus 5 and 6)
         2. Use vmd to view the trajectory: module load vmd; vmd run-01.xyz or use the vmd script (may need to edit)
         3. How does an MD run at 300 K differ from a GEO_OPT run (at 0 K)?

Running example 1
  14. Now do the MD NVT production run; first clean the directory (rm some files) and edit liq.inp, replacing:
      1. ENSEMBLE NVE by ENSEMBLE NVT
      2. STEPS 1000 by STEPS 100000 (100 ps run)
      3. VELOCITIES ON by VELOCITIES OFF
      4. FORCES ON by FORCES OFF
  15. Do the run and then examine the output:
      1. Use gnuplot on the *.ener file to check energy conservation (plot column 2 versus 4, then column 2 versus 5 and 6)
      2. Use vmd to view the trajectory: module load vmd; vmd run-01.xyz or use the vmd script (may need to edit)
      3. Is the energy conserved? This is the canonical ensemble (NVT). Should energy be conserved? Do you see oscillations?
      4. Is your water liquid? How can you tell? Is it equilibrated? When does equilibration occur?
      5. Obtain RDFs using vmd
      6. cd to the ANALYZE subdir, edit the *.in files, and do the analysis (use the bash script)
      7. How do your results (structures in the form of the RDFs, plotted against the Soper experimental RDFs; internal energy/enthalpy) compare to the literature? See for example: http://pubs.acs.org/doi/abs/10.1021/jp110481m
      8. The SPC/E potential you have used is from Berendsen et al., see: https://en.wikipedia.org/wiki/Water_model and https://dx.doi.org/10.1021%2Fj100308a038 Do you expect the results you obtained?

If you have time, you can use the end point of your (hopefully fully equilibrated) NVT trajectory to do an NVE run.
That can be analyzed in a similar way and also used to obtain dynamical quantities such as the diffusion coefficient, IR spectra, etc. Speak with me and I will help you out. Note that the SPC/E water molecule is rigid. We can do a run using flexible TIP3P-F water to see the intramolecular IR vibrations.

CP2K example 2: Water with DFT

  @SET BASE_NAME run
  @SET ID 01

  &GLOBAL
    PROJECT ${BASE_NAME}-${ID}
    RUN_TYPE MD
  &END GLOBAL

  &MOTION
    &MD
      ENSEMBLE NVT
      STEPS 20 ! DFT is now calculated on the fly, so this will be much slower
      TIMESTEP 0.5
      TEMPERATURE 300.0
      &THERMOSTAT
        TYPE NOSE
        REGION MASSIVE
        &NOSE
          LENGTH 3
          YOSHIDA 3
          TIMECON [wavenumber_t] 2300
          MTS 2
        &END NOSE
      &END THERMOSTAT
      &PRINT ON
        &ENERGY
          &EACH
            MD 1
          &END EACH
          FILENAME =${BASE_NAME}-${ID}.ener
        &END ENERGY
      &END PRINT
    &END MD
    &PRINT
      &TRAJECTORY ON
        &EACH
          MD 1
        &END EACH
        FILENAME =${BASE_NAME}-${ID}.xyz
        FORMAT XYZ
      &END TRAJECTORY
      &VELOCITIES ON
        &EACH
          MD 1
        &END EACH
        FILENAME =${BASE_NAME}-${ID}_vel.xyz
        FORMAT XYZ
      &END VELOCITIES
      &FORCES ON
        &EACH
          MD 1
        &END EACH
        FILENAME =${BASE_NAME}-${ID}_force.xyz
        FORMAT XYZ
      &END FORCES
      &RESTART ON
        &EACH
          MD 1
        &END EACH
        FILENAME =${BASE_NAME}-${ID}.restart
      &END RESTART
    &END PRINT
  &END MOTION

CP2K example 2: Water with DFT (note how sections in blue differ from classical potential example)

  &FORCE_EVAL
    METHOD QS
    &DFT
      POTENTIAL_FILE_NAME ./GTH_POTENTIALS
      BASIS_SET_FILE_NAME ./GTH_BASIS_SETS
      ! WFN_RESTART_FILE_NAME ./run-01-RESTART.wfn
      &MGRID
        CUTOFF 280
      &END MGRID
      &SCF
        MAX_SCF 20
        EPS_SCF 1.0E-7
        SCF_GUESS RESTART
        &OUTER_SCF
          EPS_SCF 1.0E-7
          MAX_SCF 20
        &END
        &OT T
          MINIMIZER DIIS
          N_DIIS 7
        &END OT
        &PRINT
          &RESTART ON
          &END RESTART
          &RESTART_HISTORY OFF
          &END RESTART_HISTORY
        &END PRINT
      &END SCF
      &QS
        EPS_DEFAULT 1.0E-12
        MAP_CONSISTENT
        EXTRAPOLATION ASPC
        EXTRAPOLATION_ORDER 3
      &END QS
      &XC
        &XC_GRID
          XC_SMOOTH_RHO NN10
          XC_DERIV SPLINE2_SMOOTH
        &END XC_GRID
        &XC_FUNCTIONAL BLYP
        &END XC_FUNCTIONAL
        &vdW_POTENTIAL
          DISPERSION_FUNCTIONAL PAIR_POTENTIAL
          &PAIR_POTENTIAL
            TYPE DFTD2
            REFERENCE_FUNCTIONAL BLYP
            R_CUTOFF 40.0
          &END PAIR_POTENTIAL
        &END vdW_POTENTIAL
      &END XC
    &END DFT
    &SUBSYS
      &CELL
        ABC 12.4138 12.4138 12.4138
      &END CELL
      &COORD
        O 1.2025696987709971E+01 1.2412376840360351E+00 1.1100847567157336E+01
        H 1.1959096889663195E+01 1.3409373770618183E+00 1.0106406672798471E+01
        H 1.1593234139420252E+01 2.0327876480659519E+00 1.1421274324532323E+01
        …
        O 1.2024298671712041E+01 9.9218625553065536E+00 9.2400384614568534E+00
        H 1.2053386790559529E+01 9.6994663967598260E+00 1.0223617621157310E+01
        H 1.1277449073604592E+01 9.4150658994176109E+00 8.9496605424081750E+00
      &END COORD
      &KIND O
        BASIS_SET TZV2P-GTH
        POTENTIAL GTH-BLYP-q6
      &END KIND
      &KIND H
        BASIS_SET TZV2P-GTH
        POTENTIAL GTH-BLYP-q1
      &END KIND
    &END SUBSYS
  &END FORCE_EVAL

  !&EXT_RESTART
  ! RESTART_FILE_NAME ./run-01.restart
  !&END EXT_RESTART

Running and analyzing example 2
  1. cd $WRKDIR/liq_blypd2_tzv2p_short   (this is the input we just went over = Exercise5 for the class)
  2. sbatch runit.bash
     1. But first: edit the input and script if needed; module load vmd; vmd geometry.xyz or vmd -e liq.vmd to see the starting geometry
  3. While the run is happening, continue the readings or ask questions
  4. Examine the output:
     1. Use gnuplot on the *.ener file to check energy conservation (plot column 2 versus 4, then column 2 versus 5 and 6)
     2. Use vmd to view the trajectory: module load vmd; vmd run-01.xyz or use the vmd script (may need to edit)
     3. We only did an extremely short run. Why? Compare timings in the *.ener file to the classical case. How many processor cores are we using now? How much more costly is Born-Oppenheimer MD with DFT than MD with a classical potential (two-body Lennard-Jones plus charges)?
  5. Since this is so costly, you only ran 20 steps to get a feel for DFT-MD. Now you will analyze a precomputed long trajectory:
  6. cd $WRKDIR/liq_blypd2_tzv2p   (this is the identical input but this run went longer)
  7. Examine the files as before. Use gnuplot, vmd, etc. You can cd to the ANALYZE subdir and do the analysis.
  8. Finally, compare the results of the classical simulation with the DFT one and also with experiment. You can use gnuplot to plot the RDFs obtained from SPC/E and BLYP-D2 together with the experimental ones (Soper files). How do the plots look? What about the enthalpy? Put some results together to show the whole class.

Running and analyzing example 3 (formic acid at air-water interface)
  1. cd $WRKDIR/water_slab_with_formic_acid_blypd2_dzvp_nve300_short . How does the input file compare to the one for DFT liquid water? (Hint: use the Linux sdiff command: 'sdiff -aw 192 file file2 | less'.) What does the system look like (use: 'vmd geometry.xyz')? What is the purpose of the vacuum? Of the constraints?
  2. Run it: sbatch runit.bash
  3. While the run is happening, continue the readings or ask questions
  4. Examine the output:
     1. Use gnuplot on the *.ener file to check energy conservation (plot column 2 versus 4, then column 2 versus 5 and 6)
     2. Use vmd to view the trajectory
     3. The formic acid starts to fall. How can we monitor its height above the water surface? (Hint: use 'grep C position_file > C', then use gnuplot. Ask me for a gnuplot file to make a good plot.)
     4. We only did an extremely short run. Why?
  5. Since this is so costly, you only ran 50 steps to get a feel for this problem. Now you will analyze a precomputed longer trajectory:
  6. cd $WRKDIR/water_slab_with_formic_acid_blypd2_dzvp_nve300   (this is the identical input but this run went longer, to 10 ps)
  7. Examine the files as before. Use gnuplot, vmd (use the scripts and try to understand them), etc. You can cd to the ANALYZE subdir and do the analysis (first do: 'ssh taito-gpu'; vmd will run faster on GPUs). Note that the analyze.x code called now is slightly different. (You may need to compile it.) Also, vmd is used for calculating RDFs.
  8. Is there any chemistry happening? If yes, what are the mechanisms and time scales? (Formic acid is a weak acid, so the deprotonation was not expected. Out of 50 trajectories, I only saw two deprotonate.) Make some nice vmd snapshots of the Grotthuss steps and present them to the class. Compare to this Lee et al. paper.
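
The height-monitoring hint in step 4.3 of example 3 (grep the carbon line from the position file, then plot it) can also be sketched in Python. This is a minimal sketch, assuming a standard XYZ trajectory (atom count, one comment line, then one "element x y z" line per atom), a single carbon atom in the system (formic acid has one), and a surface normal along z. The function names and the surface_z reference value are hypothetical; the position of the water surface has to be estimated separately, for example from the oxygen positions.

```python
def read_xyz_frames(text):
    """Parse XYZ-format text into a list of frames.

    Each frame is a list of (element, x, y, z) tuples. An incomplete
    final frame (e.g. a run that is still writing) is ignored.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        # lines[i + 1] is the comment line and is skipped
        atom_lines = lines[i + 2 : i + 2 + natoms]
        if len(atom_lines) < natoms:
            break
        frame = []
        for line in atom_lines:
            element, x, y, z = line.split()[:4]
            frame.append((element, float(x), float(y), float(z)))
        frames.append(frame)
        i += 2 + natoms
    return frames


def carbon_heights(frames, surface_z=0.0):
    """Height of the single carbon atom above surface_z in every frame.

    Assumes the slab normal is along z and that exactly one carbon
    atom is present; surface_z is a user-supplied reference height.
    """
    heights = []
    for frame in frames:
        carbons = [atom for atom in frame if atom[0] == "C"]
        if len(carbons) != 1:
            raise ValueError("expected exactly one carbon atom per frame")
        heights.append(carbons[0][3] - surface_z)
    return heights
```

With a trajectory file on disk, the text can be read with open(filename).read() and passed to read_xyz_frames; the resulting list of heights can be written out as two columns (frame index, height) and plotted with gnuplot as in the other exercises.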