Molecular Mechanics, Molecular Dynamics, and Docking
Michael Strong, PhD
National Jewish Health
University of Colorado, Denver

Foldit

Proteins are Dynamic Structures
Aquaporin: water traveling through the aquaporin pore
Control of the selectivity of the aquaporin water channel family by global orientational tuning. Tajkhorshid E, Nollert P, Jensen MØ, Miercke LJ, O'Connell J, Stroud RM, Schulten K. Science. 2002 Apr 19;296(5567):525-30.

Experimental Methods provide clues to less rigid regions
• X-ray crystallography
• NMR

Molecular Mechanics (MM): “The Physics of Proteins”
• Describe proteins in terms of the physicochemical properties of atoms and bonds
• Calculate the dynamics of a protein, and search for the minimum energy, by repeated integration of the forces acting on each atom
• The minimum-energy conformation in solution is assumed to be the native state (relevant to protein folding)

Molecular Mechanics
• A molecule is described by interacting spheres.
• Different types of spheres describe different types of atoms.
• The interaction between chemically bonded atoms is described by special bonding interaction terms.
• The interaction between atoms that are not chemically bonded is described by non-bonding interaction terms.
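
The "interacting spheres" picture above can be sketched as a small data model. This is only an illustration: the class name `Atom`, the type labels "OW"/"HW", and the radius and charge values are assumptions chosen for the example, not parameters from any particular force field. It also treats every pair without a direct bond as non-bonded, which is a simplification (real force fields typically also exclude or scale pairs separated by two or three bonds).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Atom:
    name: str       # label within the molecule, e.g. "O" or "H1"
    atom_type: str  # sphere type; hypothetical labels, not a real force field
    radius: float   # sphere radius in angstroms (illustrative value)
    charge: float   # partial charge in units of e (illustrative value)


# A water molecule as three typed spheres (values are assumptions).
atoms = [
    Atom("O", "OW", 1.52, -0.8),
    Atom("H1", "HW", 1.10, 0.4),
    Atom("H2", "HW", 1.10, 0.4),
]

# Chemically bonded pairs, given as indices into `atoms`.
bonds = {(0, 1), (0, 2)}


def nonbonded_pairs(n_atoms, bonds):
    """Return every atom pair that is not directly bonded.

    Simplification: only direct bonds are excluded here.
    """
    bonded = {tuple(sorted(b)) for b in bonds}
    return [p for p in combinations(range(n_atoms), 2) if p not in bonded]


pairs = nonbonded_pairs(len(atoms), bonds)
print("bonded pairs:", sorted(bonds))
print("non-bonded pairs:", pairs)
```

Bonded pairs would receive the bonding interaction terms and the remaining pairs the non-bonding terms; the atom type is what selects the parameters in each case.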

Energy Minimization
Many forces act on a protein:
- Hydrophobic: the inside of the protein avoids water
- Packing: atoms can’t be too close or too far apart
- Bond angle and length constraints
- Non-covalent (longer distance)
- Hydrogen bonds
- Ionic / salt bridges
All of these forces can be calculated and minimized, but doing so is computationally intensive.

Molecular Mechanics Pros/Cons
Pros:
• detailed stereochemical model that describes certain aspects of biomolecules very well
• conformational flexibility
• dynamic model (time dependence) is possible
• large systems (> 10^4 atoms) can be modeled
Cons:
• computationally demanding
• large-scale conformational changes are hard to model
• no electronic (quantum) description, no chemical reactions (bond breaking/forming), no excited states, …
• limited run times

Energy Function
• Target function that MM tries to optimize
• Describes the interaction energies of all atoms and molecules in the system
• Always an approximation
– Closer to real physics -> more realistic, but more computation time (i.e., smaller time steps and more interactions increase accuracy)

The energy equation (in simplistic terms)
Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
(The non-bonded term is the most computationally costly, because there are many such interactions.)
These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field (a potential energy function).

The energy model
• Proposed by Linus Pauling in the 1930s
• Bond angles and lengths are almost always the same
• Energy model broken up into two parts:
Covalent terms
• Bond distances
• Bond angles
• Dihedral angles
Non-covalent terms
• Forces at a distance between all non-bonded atoms

Bond length
• Spring-like term for energy based on distance
• kb is the spring constant of the bond. r0 is the bond length at equilibrium.
• Unique kb and r0 are assigned for each bond pair, e.g. C-C, O-H

Bond bend
• kθ is the spring constant of the bend. θ0 is the bond angle at equilibrium.
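
For reference, the spring-like stretching and bending terms described above are conventionally written in harmonic form. The sketch below assumes that convention: r is the current bond length, θ the current bond angle, k_b and r_0 are the bond parameters named on the slide, and k_θ and θ_0 are the bend parameters (the slide's "k" and "0", whose θ appears to have been lost in extraction). Some force fields include a factor of 1/2 in front of each constant.

```latex
E_{\text{stretch}} = \sum_{\text{bonds}} k_b \,(r - r_0)^2
\qquad
E_{\text{bend}} = \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
```

Each term is zero at the equilibrium geometry and grows quadratically as the bond or angle is distorted, which is what "spring-like" means here.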
• Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)

Torsion Energy
• Energy needed to rotate about bonds. Only relevant to single bonds.
• Commonly written as E = A [1 + cos(nτ − φ)]
• A controls the amplitude of the curve
• n controls its periodicity
• φ shifts the entire curve along the rotation angle axis (τ)
• The parameters are determined from curve fitting.
• Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)

Non-bonded Energy
• Van der Waals: preferred distance between atoms
• If atoms are polar, some will have partial electrostatic charges (attract if opposite, repel if same)
• Commonly written, for each pair of non-bonded atoms i and j at distance r, as E = −A/r^6 + B/r^12 + qi qj / r
• A and B are constants that depend on atom type.
• A determines the degree of attraction
• B determines the degree of repulsion
• q is the partial atomic charge

Energy minimization
• Given some energy function and initial conditions, we want to find the minimum energy conformation (e.g. with the steepest descent algorithm).
• Various programs: CHARMM and AMBER are the two most widely used (and packaged); also D. E. Shaw’s Desmond

Molecular Dynamics can be used to predict protein folding (based on the physical properties of the protein)
villin, FiP35
Folding proteins at x-ray resolution, showing comparison of x-ray structures (blue) and last frame of MD simulation (red): (A) simulation of villin, RMSD 1 Å; (B) simulation of FiP35
Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 15 October 2010: vol. 330 no. 6002 341-346

Why simulate motion?
• Predict structure
• Understand interactions
• Understand properties
• Experiment on what cannot be studied experimentally
• Solvation models: water and salt are very important to molecular behavior. In an explicit model, at least as many water atoms as protein atoms must be modeled (often more).

Molecular Dynamics
• Molecules, especially proteins, are not static.
– Dynamics can be important to function
– Molecules are allowed to interact for a period of time (in femtosecond steps)
– Consider the number of particles, the timestep, and the total duration: nanoseconds to microseconds (several CPU days to CPU years); even a nanosecond simulation requires millions of calculations
– 10 µs simulation -> 3 months
• Trajectories, not just the minimum energy state.
– MM ignores kinetic energy and considers only potential energy
– MD takes the same force model, but applies F = ma and calculates the velocities of all atoms (as well as positions)

Anton massively parallel supercomputer
• 512-node machine: 17,000 nanoseconds of simulated time per day for a protein-water system consisting of 23,558 atoms.
• In comparison, MD codes running on general-purpose parallel computers with hundreds or thousands of processor cores achieve simulation rates of up to a few hundred nanoseconds per day on the same chemical system.
• Enabled the first millisecond MD simulation (Science 2010), using a modified Amber force field
• Named after Anton van Leeuwenhoek, “the father of microscopy”

Folding@home: Distributed Computing Project
Stanford University (Vijay S. Pande)
As of April 9, 2009, the peak speed of the project overall had reached over 5.0 native PFLOPS (8.1 x86 PFLOPS) from around 400,000 active machines, including PS3 consoles. (Record)

Popular Molecular Dynamics Programs (Linux Based)
• AMBER (Peter Kollman, UCSF; David Case, Scripps)
• CHARMM (Martin Karplus, Harvard)
• GROMOS (Van Gunsteren, ETH, Zurich)

Docking
• Computation to assess binding affinity
• Looks for conformational and electrostatic "fit" between proteins and other molecules
• Optimization again: what position and orientation of the two molecules minimizes energy?
• Large computations, since there are many possible positions to check, and the energy for each position may involve many atoms

Docking
Similar equation
• A and B are constants that depend on atom type.
• A determines the degree of attraction
• B determines the degree of repulsion
• q is the partial atomic charge

Molecular Docking
• Start with PDB file, homology model, etc.
• Add hydrogens
• Select grid box
• Identify molecule to be docked
• >10 runs, >1 million evaluations
• Genetic algorithm

Molecular Docking (Example in TB)
[Figure, panels A-D: KatG dimer with 2 heme molecules; the KatG heme binding site, which is also the site of isoniazid activation; isoniazid docked into the KatG active site. Labeled residues: W107, H108, W321, R104, T314, S315, P136, A139, L205, P232, D282, A281, G316, I317.]

Steps:
1. Get crystal structure of protein from PDB
2. Get small molecule coordinates (DrugBank)
3. Use AutoDock
4. Add hydrogens to both structures
5. Identify potential binding site, specify GridBox (center on heme) (dimensions 40x40x40)
6. Dock using Genetic Algorithm, 10 runs, 2,500,000 evaluations

Virtual Screening
• Docking small ligands to proteins is a way to find potential drugs. Libraries
• A small region of interest (pharmacophore) can be identified, reducing computation
• Empirical scoring functions are not universal
• Various search methods:
– Rigid provides score for whole ligand (accurate)
– Flexible breaks ligands into pieces and docks them individually

Docking example
Biotin docking with Streptavidin, from the Olson lab at Scripps

Macromolecular docking
• Docking of proteins to proteins or to DNA
• Important to understanding macromolecular recognition, genetic regulation, etc.
• Conceptually similar to small molecule docking, but practically much more difficult
– Score function can't realistically compute energies
– Use either shape complementarity alone or some kind of mean field approximation

Docking Resources
• AutoDock http://autodock.scripps.edu/
• Dock http://www.cmpharm.ucsf.edu/kuntz/dock.html
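
As a rough illustration of the docking idea in these slides (score each ligand position with a van der Waals plus electrostatic term, then keep the lowest-energy position), here is a minimal sketch. It is not how AutoDock works internally: it swaps the genetic algorithm for an exhaustive translation-only search over a grid box, ignores rotation and flexibility, and uses arbitrary units. The function names (`pair_energy`, `score`, `grid_search`), the single A and B values shared by all atom pairs, and the one-atom "receptor" and "ligand" are all assumptions made for the example.

```python
import math


def pair_energy(r, A, B, qi, qj):
    """Pair energy -A/r^6 + B/r^12 + qi*qj/r (arbitrary units).

    A scales the attraction, B the repulsion, q are partial charges.
    """
    r = max(r, 1e-3)  # avoid division by zero when two atoms coincide
    return -A / r**6 + B / r**12 + qi * qj / r


def score(ligand, receptor, shift, A=1.0, B=1.0):
    """Sum the pair energy over all ligand-receptor atom pairs.

    Atoms are (x, y, z, q) tuples; `shift` translates the ligand.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand:
        moved = (lx + shift[0], ly + shift[1], lz + shift[2])
        for rx, ry, rz, rq in receptor:
            r = math.dist(moved, (rx, ry, rz))
            total += pair_energy(r, A, B, lq, rq)
    return total


def grid_search(ligand, receptor, center, half_width, step):
    """Try every ligand translation on a cubic grid; return the best one."""
    n = int(round(2 * half_width / step))
    axes = [[c - half_width + i * step for i in range(n + 1)] for c in center]
    best_shift, best_energy = None, math.inf
    for sx in axes[0]:
        for sy in axes[1]:
            for sz in axes[2]:
                e = score(ligand, receptor, (sx, sy, sz))
                if e < best_energy:
                    best_shift, best_energy = (sx, sy, sz), e
    return best_shift, best_energy


# Toy system: one negatively charged receptor atom at the origin and
# one positively charged ligand atom (values are illustrative only).
receptor = [(0.0, 0.0, 0.0, -0.5)]
ligand = [(0.0, 0.0, 0.0, 0.5)]

best_shift, best_energy = grid_search(
    ligand, receptor, center=(0.0, 0.0, 0.0), half_width=3.0, step=0.5
)
print("best shift:", best_shift, "energy:", best_energy)
```

Even this toy search evaluates 13 × 13 × 13 positions for a single pair of atoms, which hints at why real docking runs, with rotations, flexible ligands, and thousands of receptor atoms, rely on precomputed grids and stochastic search methods such as the genetic algorithm mentioned in the steps above.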