Exploring the Sequence Dependent Structure and Dynamics of DNA with Molecular Dynamics Simulation
Sarah Harris
School of Physics and Astronomy, University of Leeds

Introduction
Calculations of the charge transport properties of complex biomolecules such as DNA are extremely difficult in general, owing to both the size and the flexibility of these systems.
Describe some relevant results from classical MD simulations:
i) Insight into sequence and environment dependent DNA structure.
ii) A description of the dynamic properties of DNA and the methods developed to quantify them.
iii) A preliminary series of MD simulations to show the effect of charged bases on DNA dynamics.

The Structure of Duplex DNA
[Figure: the Adenine-Thymine base pair (2 H-bonds) and the Guanine-Cytosine base pair (3 H-bonds)]

Higher Order DNA Structures
i) A triplex DNA structure.
ii) Guanine-rich DNA can form folded quadruplex structures.
iii) Certain quadruplexes are associated with a continuous channel of counterions.

The Flow of Genetic Information
[Figure: gene layout (promoter of transcription, start codon, coding region, stop codon, terminator of transcription). In the nucleus, RNA polymerase transcribes the DNA into nascent mRNA, which contains introns (non-coding) and exons (coding); splicing gives the mature mRNA. Translation of the mature mRNA at the ribosome, with tRNA, gives the protein.]

Importance of Charge Transport in DNA Damage
The genome is under continuous chemical attack (generally oxidative), which can result in dangerous mutations.
GG and GGG sequences are preferentially oxidised, even when the initial oxidation event occurs at a remote site in the sequence.
Such damage propagation has been observed in intact cell nuclei [1].
GGG rich motifs occur disproportionately at the termini of intron regions, ideally positioned to sacrificially protect the coding regions of genes [2].
1. Nunez M. E., Holmquist G. P. & Barton J. K. (2001) Biochemistry 40, 12465-12471.
2. Friedman K. A. & Heller A. (2001) J. Phys. Chem. B 105, 11859-11865.

Charge Transport in Solution
[Figure: DNA duplex with a tethered photooxidant and a distant GGG motif]
Excite the tethered photooxidant Rh(phi)2bpy3+.
Vary the DNA sequence, add binding proteins, etc.
A hole is injected into the DNA, which oxidises a distant GG or GGG.
Oxidative damage can occur up to 200 Å (~60 base pairs) from the site of hole injection.
The relative charge transport efficiency can be measured by detecting the amount of damage using biochemical methods.
Williams T. T., Odom D. T. & Barton J. K. (2000) J. Am. Chem. Soc. 122, 9048-9049.

DNA Dynamics and Charge Transport
The sequence dependence of charge transport efficiency remains poorly understood.
Suggested mechanisms for electron/hole transport include:
i) Superexchange (~3-4 base pairs)
ii) Thermally activated hopping
iii) Polaron hopping
iv) Conformationally gated hopping through "charge transport active domains" [1,2]
What role is played by the thermal fluctuations of the DNA, and which dynamic timescales are associated with the most important motions?
1. O'Neill M. & Barton J. K. (2004) J. Am. Chem. Soc. 126, 11471-11483.
2. Shao F., O'Neill M. & Barton J. K. (2004) Proc. Natl. Acad. Sci. USA 101, 17914-17919.

The Importance of Sequence Dependent Structure and Dynamics
The structure and flexibility of DNA must be highly sequence dependent, since DNA binding proteins must recognise specific binding sites to exert cellular control.
Although much work has been done on quantifying sequence dependent structure by X-ray and NMR, the sequence dependent dynamics of DNA remain poorly understood.
Much of the dynamic behaviour is not accessible theoretically, therefore computer simulation is required.

Sequence Specific Recognition by Proteins
[Figure: the TATA box protein-DNA complex]
[Figure: repair of a G-U mismatch. Barratt T. E. et al (1999) EMBO J. 18, 6599]

Sequence Dependent Structure
Different DNA sequences have subtly different structures.
For example, a run of AT bases gives the DNA a particularly narrow minor groove, which is responsible for the "spine of hydration".
The precise position of chemical groups (i.e. H-bonds) also depends on the DNA sequence.
[Figure: the spine of hydration in A-tract DNA]

Changes in Structure Due to DNA Environment
Canonical B-form DNA.
A-form DNA, present in water/methanol mixtures.
Left-handed Z-form DNA, present at very high salt.

The Hierarchy of Dynamic Timescales
Picosecond
  Type of internal motion: Local oscillations of groups of atoms with amplitudes 0.1 Å.
  Energy of activation: E = 0.6 kcal/mol. Source: external thermal reservoir.
  Experimental methods: NMR, Raman spectroscopy, X-ray.
  Theoretical methods: Molecular dynamics; harmonic analysis.
Nanosecond
  Type of internal motion: Bending and twisting motions of the double chain with amplitudes A = 5-7 Å.
  Energy of activation: E = 2-5 kcal/mol. Source: collisions with hot solvent molecules.
  Experimental methods: NMR, Raman spectroscopy, fluorescence.
  Theoretical methods: Molecular dynamics; harmonic analysis; rod-like model.
Microsecond
  Type of internal motion: Bending, winding and unwinding of the double helix; opening of base pairs of the DNA.
  Energy of activation: E = 5-20 kcal/mol. Source: changing of pH; increasing temperature; action of denaturation agents.
  Experimental methods: NMR, hydrogen exchange.
  Theoretical methods: Theory of helix-coil transition; non-linear mechanics.

The AMBER Forcefield
The molecule is considered as a collection of atoms interacting through simple, classical potential energy functions.
[Figure labels: electrostatic repulsion, van der Waals forces, covalent bonds]
The simple potential energy function is fitted empirically for each specific interaction through a series of constants: the AMBER force field parameters.
$$U_{Total} = \sum_{Bonds} K_r (r - r_{eq})^2 + \sum_{Angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{Dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{Atoms} \epsilon_{ij} \left[ \left( \frac{R_{ij}}{r_{ij}} \right)^{12} - \left( \frac{R_{ij}}{r_{ij}} \right)^{6} \right] + \sum_{Atoms} \frac{q_i q_j}{\epsilon r_{ij}}$$
Term labels on the slide: Bonds, Angles, Dihedrals, H-Bonds, Van der Waals, Electrostatics (partial charges).

Contents of the Simulation Cell
The simulation cell contains:
i) The DNA.
ii) Sufficient Na+ counterions to neutralise the system.
iii) Enough water molecules to surround the DNA.

A Molecular Dynamics Simulation of DNA
Obtain the positions of all atoms in the system over timescales from ~2 fs to 50 ns.
The most accurate simulations include water and counterions explicitly (~700 solute and ~3000 solvent atoms) and use PME to calculate long range electrostatics.

The Structure of DNA in a Vacuum
MD simulations of DNA in the gas phase, based on electrospray data, show that the DNA does not remain in its B-form configuration.
[Figure: in vacuo DNA structures after 100 ns of MD]
Rueda M. et al (2003) J. Am. Chem. Soc. 125, 8007-8014.

Principal Component Analysis (PCA)
Calculate the '3N × 3N' covariance matrix from the trajectory. This indicates how individual atomic motions were correlated during the simulation.
$$C_{p,q} = \frac{1}{M} \sum_{m=1}^{M} (X_{m,p} - \bar{X}_p)(X_{m,q} - \bar{X}_q)$$
Diagonalise the covariance matrix to find the set of '3N' eigenvectors and their corresponding eigenvalues. This finds the types of overall structural deformation that were independent during the trajectory, called components or modes.
$$C = u^{-1} \lambda u$$
Order the components in terms of their eigenvalues. The component with the highest eigenvalue has contributed the most to the system's dynamics.

Principal Component 1
The components with large eigenvalues are large-scale, quasiharmonic oscillations of the entire helix.
Typically, the 1st, 2nd and 3rd components contribute ~60% of the dynamics of the system.

The Dynamics of d(GGTAATTACC)2
The DNA helix has very simple mechanical properties which are sequence dependent.
[Figure panels: Bend at TA step 1; Bend at TA step 2; Helix twisting]
However, it can be difficult to obtain a quantitative comparison of flexibility between simulations using PCA, due to anharmonic effects.

MD Simulations of Charged Bases
Caution: Preliminary Results!!
Perform 5 ns classical MD on d(GAAAAAAAAC) including i) a neutral, ii) a positively charged and iii) a negatively charged thymine base.
(placed at position 15; partial charges calculated using HF and RESP fitting).
[Figure: neutral thymine nucleotide; positive thymine nucleotide; negative thymine nucleotide]
Does the presence of a charged base affect the dynamics of the DNA relative to the neutral system?

The Configurational Entropy
The entropy of a classical harmonic oscillator:
$$S = \frac{1}{2} k \sum_{i=1}^{3N-6} \ln \langle x_i^2 \rangle$$
A formula is required which gives identical results for large eigenvalues, but which also gives:
$$S \to 0 \quad \text{as} \quad \langle x_i^2 \rangle \to 0$$
This is true for a function of the form:
$$S = \frac{1}{2} k \sum_{i=1}^{3N-6} \ln \left[ 1 + \left( \frac{k T e^2}{\hbar^2} \right) m \langle x_i^2 \rangle \right]$$
The Schlitter Formula. Schlitter J. (1993) Chem. Phys. Lett. 217, No. 6, 617.

Entropy Convergence
A quantitative comparison of the flexibility of each sequence can be obtained by calculating the entropy from MD/PCA.
The entropy contains a hidden dependence on time due to the finite length of the trajectory.
[Plot: TS (kcal/mol) against length of sampling window (ps) for the neutral, positive and negative systems]
The presence of a singly charged base slightly increases the overall flexibility of the helix.
No simple dependence on key structural parameters (such as H-bond distances) has yet been detected.

Future Work
Use these simple MD simulations to investigate whether the local or global dynamic modes are influenced by the presence of the charged base.
Correctly optimise the geometry of these charged bases using DFT and perform equivalent simulations for comparison [1].
Construct semi-empirical QM/MM models of DNA including a charged base, using results from these classical calculations to optimise the system.
1. Smith D. M. A. & Adamowicz L. (2001) J. Phys. Chem. 105, 9345-9354.

Concluding Remarks
DNA structure and dynamics are exquisitely dependent on both the sequence and the environment.
DNA dynamics consist of high frequency, low amplitude local modes over ps timescales, combined with global quasiharmonic oscillations over ns timescales.
The flexibility of DNA is slightly increased in the presence of a charged base, which may be important in constructing models of transport processes.

Other Useful References
A general discussion of nucleic acid structure:
Nucleic Acid Structure and Recognition, Steve Neidle, OUP. www.oup.co.uk/molbiol2/na-structure/
References which discuss the counterion distribution around DNA:
Exploring the Counterion Atmosphere around DNA: What can be learned from Molecular Dynamics simulations? Rueda et al (2004) Biophys. J. 87, 800-811.
DNA and its Counterions: A Molecular Dynamics Study. Varnai P. & Zakrzewska K. (2004) Nucleic Acids Res. 32, 4269-4280.
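
Worked Sketch: Covariance, Leading Component and Schlitter Entropy
The analysis chain described under Principal Component Analysis and The Configurational Entropy (build the covariance matrix from the trajectory, find the dominant component, evaluate the Schlitter entropy) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the analysis code behind the results in this talk. The function names are hypothetical. Coordinates are assumed to be in metres and masses in kg, with one mass entry per Cartesian coordinate. Overall rotation and translation are not removed, and power iteration stands in for a full diagonalisation. The entropy is evaluated in determinant form, which equals the sum over eigenvalues of the mass-weighted covariance matrix.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
HBAR = 1.054571817e-34  # reduced Planck constant, J s


def covariance(frames):
    """C[p][q] = (1/M) * sum_m (x[m][p] - mean[p]) * (x[m][q] - mean[q])."""
    m, d = len(frames), len(frames[0])
    mean = [sum(f[p] for f in frames) / m for p in range(d)]
    dev = [[f[p] - mean[p] for p in range(d)] for f in frames]
    return [[sum(r[p] * r[q] for r in dev) / m for q in range(d)]
            for p in range(d)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its eigenvector, by power iteration.

    Assumes cov is non-zero and that the uniform start vector is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[p][q] * v[q] for q in range(d)) for p in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[p] * cov[p][q] * v[q]
                     for p in range(d) for q in range(d))
    return eigenvalue, v


def log_det_spd(a):
    """ln det(a) for a symmetric positive-definite matrix, via Cholesky."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    total = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total


def schlitter_entropy(cov, masses, temperature):
    """S = (k/2) ln det[1 + (k T e^2 / hbar^2) M^(1/2) C M^(1/2)], in J/K."""
    d = len(cov)
    pref = K_B * temperature * math.e ** 2 / HBAR ** 2
    a = [[(1.0 if p == q else 0.0)
          + pref * math.sqrt(masses[p] * masses[q]) * cov[p][q]
          for q in range(d)] for p in range(d)]
    return 0.5 * K_B * log_det_spd(a)
```

To reproduce a convergence curve like the one under Entropy Convergence, one would call `covariance` on the first n frames of each trajectory for increasing n and pass the result to `schlitter_entropy`. For a real trajectory a linear algebra library would normally replace the hand-written routines.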