Molecular Modeling The compendium of methods for mimicking the behavior of molecules or molecular systems Points for Consideration Remember • Molecular modeling forms a model of the real world • Thus we are studying the model, not the world • A model is valid as long as it reproduces the real world Why Use Molecular Modeling? (and not deal directly with the real world?) Fast, accurate and relatively cheap way to: • • • • • Study molecular properties Rationalize and interpret experimental results Make predictions for yet unstudied systems Study hypothetical systems Design new molecules And • Understand reactants probe ? products Some Molecular Properties Energy • • • • Molecular Spectroscopy • NMR (Nuclear Magnetic Resonance) • IR (Infra Red) • UV (Ultra Violate) • MW (Microwave) Free energy (DG) Enthalpy (DH) Entropy (DS) Steric energy (DG) 3D Structure • Distances • Angles • Torsions Kinetics • Reaction Mechanisms • Rate constants Electronic Properties • Molecular orbitals • Charge Distributions • Dipole Moments Molecular Structure: Saccharin A single 1D structure C7H5NO3S O A single 2D structure N S O Many 3D structures O H Molecular Structure and Molecular Properties Structure Property Activity Cell Permeability Toxicity Descriptors 1D: e.g., Molecular weight 2D: e.g., # of rotatable bonds 3D: e.g., Molecular volume Molecular mechanics Introduction to Force Fields Molecular mechanics • Ball and spring description of molecules • Better representation of equilibrium geometries than plastic models • Able to compute relative strain energies • Cheap to compute • Lots of empirical parameters that have to be carefully tested and calibrated • Limited to equilibrium geometries • Does not take electronic interactions into account • No information on properties or reactivity • Cannot readily handle reactions involving the making and breaking of bonds Energy as a function of geometry • Polyatomic molecule: – N-degrees of freedom – N-dimensional potential energy surface http://www.chem.wayne.edu/~hbs/chm6440/PES.html A Molecule is a Collection of Atoms Held Together by Forces Intuitively forces act between bonded atoms Forces act to return structural parameters to their equilibrium values stretch bend torsion And More Forces… Forces also act between non-bonded atoms Cross terms couple the different types of interactions Stretch-bend Non-bonded Forces are Described by Potential Energy Functions k v(l ) (l l0 ) 2 2 k v( ) ( 0 ) 2 2 N Vn (1 cos( n )] n 0 2 v( ) And More Functions… 12 6 4 ij ij ij r N N rij ij v i 1 j i 1 qi q j 4 0 rij v(l1 , l2 , ) kl1l2 2 [(l1 l1, 0 ) (l2 l2, 0 )]( 0 ) A Force Field is a Collection of Potential Energy Functions A force field is defined by the functional forms of the energy functions and by the values of their parameters. ki ki Vn 2 2 ( l l ) ( ) (1 cos( n )) i i ,0 i i ,0 bonds 2 angles 2 torsions 2 N N 12 6 q q 4 ij ij ij i j cross terms rij rij 4 0 rij i 1 j i 1 V (r N ) Molecular Mechanics Energy (Steric Energy) is Calculated from the Force Field A molecular mechanics program will return an energy value for every conformation of the system. Steric energy is the energy of the system relative to a reference point. This reference point depends on the bonded interactions and is both force field dependent and molecule dependent. Thus, steric energy can only be used to compare the relative stabilities of different conformations of the same molecule and can not be used to compare the relative stabilities of different molecules. Further, all conformational energies must be calculated with the same force field. Steric Energy Steric Energy = Estretch + Ebend + Etorsion + EVdW + Eelectrostatic + Estretch-bend + Etorsion-stretch + … Estretch Ebend Etorsion EVdW Eelectrostatic Estretch-bend Etorsion-stretch Stretch energy (over all bonds) Bending energy (over all angles) Torsional (dihedral) energy (over all dihedral angles) Van Der Waals energy (over all atom pairs > 1,3) Electrostatic energy (over all charged atom pairs >1,3) Stretch-bend energy Torsion-stretch energy EVdW + Eelectrostatic are often referred to as non-bonded energies Force Field and Potential Energy Surface A force field defines for each molecule a unique PES. Each point on the PES represents a molecular conformation characterized by its structure and energy. Energy is a function of the coordinates. Coordinates are function of the energy. CH3 energy CH3 CH3 coordinates Moving on (Sampling) the PES Each point on the PES is represents a molecular conformation characterized by its structure and energy. By sampling the PES we can derive molecular properties. Sampling energy minima only (energy minimization) will lead to molecular properties reflecting the enthalpy only. Sampling the entire PES (molecular simulations) will lead to molecular properties reflecting the free energy. In both cases, molecular properties will be derived from the PES. Force Fields: General Features Force field definition • Functional form (usually a compromise between accuracy and ease of calculation. • Parameters (transferability assumed). Force fields are empirical • There is no “correct” form of a force field. • Force fields are evaluated based solely on their performance. Force field are parameterized for specific properties • Structural properties • Energy • Spectra Force Fields: Atom Types In molecular mechanics atoms are given types - there are often several types for each element. Atom types depend on: • Atomic number (e.g., C, N, O, H). • Hybridization (e.g., SP3, SP2, SP). • Environment (e.g., cyclopropane, cyclobutane). MM2 type 2 C O C C MM2 type 3 “Transferability” is assumed - for example that a C=O bond will behave more or less the same in all molecules. A General Force Field Calculation Input • Atom Types • Starting geometry • Connectivity Energy minimization / geometry optimization Calculate molecular properties at final geometry Output • • • • Molecular structure Molecular energy Dipole moments etc. etc. etc. A Simple Molecular Mechanics Force Field Calculation V (r N ) ki ki Vn 2 2 ( l l ) ( ) (1 cos( n )) i i ,0 i i ,0 bonds 2 angles 2 torsions 2 12 6 q q 4 ij ij ij i j i 1 j i 1 rij rij 4 0 rij N N Bonds • C-C x 2 • C-H x 8 Angles • C-C-C x 1 • C-C-H x 10 • H-C-H x 7 Torsions • H-C-C-H x 12 • H-C-C-C x 6 Non-bonded • H-H x 21 • H-C x 6 Bond Stretching Morse potential v(l ) De{1 exp[ a(l l0 )]}2 a 2 De E k De = Depth of the potential energy minimum l0 = Equilibrium bond length (v(l)=0) = Frequency of the bond vibration = Reduced mass k = Stretch constant • Accurate. • Computationally inefficient (form, parameters). • Catastrophic bond elongation. r Bond Stretching Harmonic potential (AMBER) k v(l ) (l l0 ) 2 2 E r • Inaccurate. • Coincides with the Morse potential at the bottom of the well. • Computationally efficient. Bond Stretching Cubic (MM2) and quadratic (MM3) potentials k1 k2 v(l ) (l l0 ) 2 (l l0 )3 2 2 k1 k 2 amber 3 k 3 2 v(l ) (l l0 ) (mm2 l l0 ) (l l0 ) 4 2 2 mm3 C-C bond2 600.0 harmonic cubic quadratic 500.0 energy 400.0 300.0 200.0 100.0 0.0 -100.0 1 1.2 1.4 1.6 bond leng th 1.8 2 2.2 Bond Stretching Parameters (MM2) -1 -2 l0 (A) k (kcal mol A ) Csp -Csp Csp3-Csp2 1.523 1.497 317 317 Csp2=Csp2 1.337 690 1.208 777 1.438 1.345 367 719 Bond 3 3 2 Csp =O 3 3 Csp -Nsp C-N (amide) Hard mode. Bond types correlate with l0 and k values. A 0.2Å deviation from l0 when k=300 leads to an energy increase of 12 kcal/mol. Angle Bending k v( ) ( 0 ) 2 2 k1 k2 v( ) ( 0 ) 2 ( 0 ) 6 MM2: 2 2 k1 k2 k3 k4 k5 3 4 5 6 ( ) ( ) ( ) ( ) MM3: v( ) ( 0 ) 2 amber 0 0 0 0 mm2 2 2 2 2 2 mm3 AMBER: C-C-C angle 25.0 MM3 MM2 AMBER 20.0 energy energy 15.0 10.0 5.0 0.0 -5.0 80 90 100 110 angle bond ang le 120 130 Angle Bending Parameters (MM2) q0 Angle -1 Csp3-Csp3-Csp3 109.47 0.0099 Csp3-Csp3-H 109.47 0.0079 H-Csp -H 109.47 Csp3-Csp2-Csp3 117.2 0.007 0.0099 3 3 2 3 2 Csp -Csp =Csp Csp -Csp =O 2 -1 k (kcal mol deg ) 121.4 0.0121 122.5 0.0101 Hard mode (softer than bond stretching) Torsional (Dihedral) Terms Reflect the existence of barriers to rotation around chemical bonds. Used to set the relative energies of the rotational minima and maxima. Together with the non-bonded terms are responsible for most of the structural and energetic changes (soft mode). Usually parameterized last. Soft mode. General Functional Form N Vn (1 cos( n )] n 0 2 v( ) : Torsional angle. n (multiplicity): Number of minima in a 360º cycle. Vn: Correlates with the barrier height. (phase factor): Determines where the torsion passes through its minimum value. . Vn = 4, n = 2, = 180 Torsion energy . . . Vn = 2, n = 3, = 60 . Torsion angle ( ) Functional Form: AMBER N Vn v( ) (1 cos( n )] n 0 2 Preference to single terms. Usage of general torsional parameters (i.e., torsional potential solely depends on the two central atoms of the torsion). • C-C-C-C = O-C-C-C = O-C-C-N = … Example: Butane The barrier to rotation around the C-C bond in butane is ~20kJ/mol. All 9 torsional interactions around the central C-C bond should be considered for an appropriate reproduction of the torsional barrier. rotational profile of butane # 1 4 4 21.0 energy 15.7 10.5 5.2 0.0 0 90 180 dihedral ang le 270 360 Type C-C-C–C C-C-C–H H-C-C–H V1 0.200 0.000 0.000 V2 0.270 0.000 0.000 V3 0.093 0.267 0.237 Out-of-Plane Bending Allows for non-planarity when required (e.g., cyclobutanone). Prevents inversion about chiral centers (e.g., for united atoms). Induces planarity when required. O O correct United Atoms Absorb non-polar hydrogen atoms into respective carbons. Greatly reduces computational cost. Out-of-Plane Bending j Wilson Angle l i Between ijk plane and i-l bond v( ) k ( 0 ) 2 k i Pyramidal Distance j Between atom i and jkl plane v(d ) k (d d 0 ) 2 Improper torsion 1-2-3-4 torsion v( ) k[1 cos( n 0 )] l k Cross Terms: Stretch-Bend Coupling between internal coordinates. Important for reproducing structures of unusual (e.g., highly strained ) systems and of vibrational spectra. Stretch-Bend: As a bond angle decreases, the adjacent bonds stretch to reduce the interaction between the 1,3 atoms. v(l1 , l2 , ) kl1l2 2 [(l1 l1, 0 ) (l2 l2, 0 )]( 0 ) Cross Terms: Stretch-Torsion Stretch-Torsion: For an A-B-C-D torsion, the central B-C bond elongates in response to eclipsing of the A-B and C-D terminal bonds. v(l , ) k (l l0 ) cos n Non-Bonded Interactions Operate within molecules and between molecules. Through space interactions. Modeled as a function of an inverse power of the distance. Soft mode. Divided into: • Electrostatic interactions. • VdW interactions. Electrostatic Interactions A B Vintermoleclar i 1 j 1 qi q j 4 0 rij A qi q j A Vintramoleclar 4 0 rij i 1 j i 1 qi, qj are point charges. Charge-charge interactions are long ranged (decay as r-1). When qi, qj are centered on the nuclei they are referred to as partial atomic charges. • Fit to known electric moments (e.g., dipole, quadrupole etc.) • Fit to thermodynamic properties. • Ab initio Calculations – Electrostatic potential Rapid Methods for Calculating Atomic Charges Partial Equalization of Orbital Electronegativity (Gasteiger and Marsili) http://www2.chemie.uni-erlangen.de/software/petra/manual/manual.pdf Electronegativity (Pauling): “The power of an atom to attract an electron to itself” Orbital electronegativity depends upon: • Valence state (e.g., sp > sp3). • Occupancy (e.g., empty > single > double). • Charges in other orbitals. Electrons flow from the less electronegative atoms to the more electronegative atoms thereby equalizing the electronegativities. Partial Equalization of Orbital Electronegativity For the dependency of orbital electronegativity on charge assume: A aA bAQA cAQA2 Theoretically correct for orbitals but formally applied to atoms. Values of a, b and c were obtained for common elements in their usual valence states (e.g., for atom types). Apply an interactive process: • Assign each atom its formal charge. • Calculate atomic electronegativities based on above assumption. • Calculate the electron charge transferred from atom A to the more electronegative atom B bonded to it by: Q (k ) B( k ) A( k ) k Dumping factor A Cation Electronegativity of the less electronegative atom Solvent Dielectric Models A qi q j A V 4 0 rij i 1 j i 1 0 = Dielectric constant of vacuum (1). For a given set of charges and distance, 0 determines the strength of the electrostatic interactions. Solvent effect dampen the electrostatic interactions and so can be modeled by varying 0: eff = 0r r(protein interior) = 2-4 r(water) = 80 Solvent Dielectric Models: Distance Dependence Dielectric constant increases as a function of distance A A eff eff (r ) r V i 1 qi q j 2 4 r j i 1 0 ij Smooth increase in eff form 1 to r as the distance increases eff eff r distance r 1 2 [(rS ) 2 2rS 2]e rS eff varies from 1 at zero separation to the bulk permitivity of the solvent at large separation. As the separation between the interacting atoms increases, more solvent can “penetrate” between them. Van Der Waals (VdW) Interactions Electrostatic interactions can’t account for all non-bonded interactions within a system (e.g., rare gases). VdW interactions: • Attractive (dispersive) contribution (London forces) – Instantaneous dipoles due to electron cloud fluctuations. – Decays as r6. • Repulsive contribution – Nuclei repulsion. – At short distances (r < 1) rises as 1/r. – At large distances decays as exp(-2r/a0); a0 the Bohr radius. Van Der Waals (VdW) Interactions The observed VdW potential results from a balance between the attractive and repulsive forces. Lennard-Jones Potential 12 6 v(r ) 4 r r • Rapid to calculate • Attractive part theoretically sound • Repulsive part easy to calculate but too steep Alternative forms rm 21 6 sum of VdW radii energy Modeling VdW Interactions rm separation v(r ) {( rm r )12 2(rm r ) 6 } v(r ) A r12 C r 6 A rm12 4 12 C 2rm6 4 6 Hydrogen Bonding H-bond geometry independent: A C v(r ) 12 10 r r • r is the distance between the H-bond donor and H-bond acceptor. H-bond geometry dependent (co-linearity with lp preferred): vHB A A 2 12 10 cos Don...H ... Acc cos 4 H ... Acc ...LP rH ... Acc rH ... Acc Acc H N Don Force Field Parameterization Choosing values for the parameters in the potential function equations to best reproduce experimental data. Parameterization techniques • Trial and error • Least square methods Types of parameters • • • • • • Stretch: natural bond length (l0) and force constants (k). Bend: natural bond angles (0) and force constants (k). Torsions: Vi’s. VdW: (, VdW radii). Electrostatic: Partial atomic charges. Cross-terms: Cross term parameters. Parameters - Quantity For N atom types require: • • • • N non-bonded parameters N*N stretch parameters N*N*N bend parameters N*N*N*N torsion parameters Example MacroModel MM2*: 39 atom types Stretch Bend Torsion Required 1521 58,319 2,313,441 Actual 164 357 508 Parameters - Source A force field parameterized according to data from one source (e.g., experimental gas phase, experimental solid phase, ab initio) will fit data from other sources only qualitatively. Experiment: geometries and non-bonded parameters • • • • X-Ray crystallography Electron diffraction Microwave spectroscopy Lattice energies Advantages • Real Disadvantages • Hard to obtain • Non-uniform • Limited availability Parameterization - Example O H O O H ab initio -0.69 ABMER -1.25 parameterized AMBER -0.68 5.00 Energy (kcal/mol) 5.00 Energy (kcal/mol) O relative energy (kcal/mol) 4.00 3.00 2.00 1.00 0.00 4.00 3.00 2.00 1.00 0.00 0 90 180 torsion 270 360 0 90 180 torsion 270 360 Transferability of Parameters Assumption: The same set of parameters can be used to model a related series of compounds: • Prediction • Missing parameters – Educated guess – Parameters derived from atomic properties (e.g., UFF) • Reducing number of parameters – Generalized torsions (e.g., depend only on two central atoms) – Generalized VdW parameters (same for SP, SP2, SP3) Above assumption breaks down for close interaction function groups: O X O O Existing Force Fields AMBER (Assisted Model Building with Energy Refinement) • Parameterized specifically for proteins and nucleic acids. • Uses only 5 bonding and non-bonding terms along with a sophisticated electrostatic treatment. • No cross terms are included. • Results can be very good for proteins and nucleic acids, less so for other systems. CHARMM (Chemistry at Harvard Macromolecular Mechanics) • Originally devised for proteins and nucleic acids. • Now used for a range of macromolecules, molecular dynamics, solvation, crystal packing, vibrational analysis and QM/MM studies. • Uses 5 valence terms, one of which is electrostatic term. • Basis for other force fields (e.g., MOIL). Existing Force Fields GROMOS (Gronigen molecular simulation) • Popular for predicting the dynamical motion of molecules and bulk liquids. • Also used for modeling biomolecules. • Uses 5 valence terms, one of which is an electrostatic term. MM1, 2, 3, 4 • General purpose force fields for (mono-functional) organic molecules. • MM2 was parameterized for a lot of functional groups. • MM3 is probably one of the most accurate ways of modeling hydrocarbons. • MM4 is very new and little is known about its performance. • The force fields use 5 to 6 valence terms, one of which is an electrostatic term and one to nine cross terms. Existing Force Fields MMFF (Merck Molecular Force Field) • General purpose force fields mainly for organic molecules. • MMFF94 was originally designed for molecular dynamics simulations but is also widely used for geometry optimization. • Uses 5 valence terms, one of which is an electrostatic term and one cross term. • MMFF was parameterized based on high level ab initio calculations. OPLS (Optimized Potential for Liquid Simulations) • Designed for modeling bulk liquids. • Has been extensively used for modeling the molecular dynamics of biomolecules. • Uses 5 valence terms, one of which is an electrostatic term but no cross terms. Existing Force Fields Tripos (SYBYL force field) • Designed for modeling organic and biomolecules. • Often used for CoMFA analysis (QSAR method). • Uses 5 valence terms, one of which is an electrostatic term. MOMEC • A force field for describing transition metals coordination compounds. • Originally parameterized to use 4 valence terms but not an electrostatic term. • Metal-ligand interactions consist of bond stretch only. • Coordination sphere is maintained by non-bonding interactions between ligands. • Work reasonable well for octahedrally coordinated compounds. Existing Force Fields UFF (Universal Force Field) • Designed for coverage of the entire periodic table. • All force field parameters are atomic based. • Parameters for specific interactions are derived from atomic parameters based on a series of mixing rules, thereby addressing the problem of parameter explosion. • Uses 4 valence terms but not an electrostatic term. YATI • Designed for the accurate representation of non-bonded interactions. • Most often used for modeling interactions between biomolecules and small substrate molecules. Existing Force Fields CVFF (Consistent Valence Force Field) • Parameterized for small organic (amides, carboxylic acids, etc.) crystals and gas phase structures. • Handles peptides, proteins, and a wide range of organic systems. • Primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies reasonably well. Existing Force Fields CFF, CFF91, CFF95 (Consistent Force Field) • Parameterized (based on quantum mechanics calculations and molecular simulations) for acetals, acids, alcohols, alkanes, alkenes, amides, amines, aromatics, esters, and ethers. • Useful for predicting: – Small molecules: gas-phase geometries, vibrational frequencies, conformational energies, torsion barriers, crystal structures. – Liquids: cohesive energy densities; for crystals: lattice parameters, RMS atomic coordinates, sublimation energies. – Macromolecules: protein crystal structures, polycarbonates and polysaccharides (CFF91,95). Existing Force Fields Dreiding • Parameters derived by a rule based approach. • Good coverage of the periodic table. • Good coverage for organic, biological and main-group inorganic molecules. • Only moderately accurate for geometries, conformational energies, intermolecular binding energies, and crystal packing. Energy Minimization Introduction A system of N atoms is defined by 3N Cartesian coordinates or 3N-6 internal coordinates. These define a multi-dimensional potential energy surface (PES). • Minima (stable conformations) • Maxima • Saddle points (transition states) energy A PES is characterized by stationary points: coordinates Classification of Stationary Points 1st Derivative 0 0 0 Type Minimum Maximum Saddle point 2nd Derivative* positive negative 1 negative 20.0 16.0 energy transition state 12.0 8.0 local minimum 4.0 global minimum 0.0 0 90 180 270 360 coordinate * Refers to the eigenvalues of the second derivatives (Hessian) matrix Population of Minima Active Structure Most populated minimum Global minimum Most minimization method can only go downhill and so locate the closest (downhill sense) minimum. No minimization method can guarantee the location of the global energy minimum. No method has proven the best for all problems. A General Minimization Scheme Starting point x0 Minimum? No Calculate xk+1 = f(xk) yes Stop Sequential Univariate Method For each coordinate: • Generate two new points (xi+dxi, xi+2dxi) and calculate their energies. • Fit a parabola to xi (1), xi+dxi (2), xi+2dxi (3) and locate its minimum (4). • Set the coordinate of xi to the parabola minimum. • When the change in all coordinates is small enough, stop. This method requires fewer function evaluations than the Simplex method but is still slow to converge. Summary Simplex, Sequential Univariate (memory requirements scale as N) • Robust (i.e., can be initiated from poor starting geometries). • Slow to converge even on quadratic surfaces. • Require many function evaluations (especially Simplex). Steepest Descent (memory requirements scale as 3N) • Robust. • Convergence guaranteed on quadratic surfaces. • Slow to converge especially near the minimum. Conjugate gradient, Quasi NR (memory requirements scale as (3N)2) • Converges in approximately N steps for N degrees of freedom. • Convergence guaranteed on quadratic surfaces. • Unstable away from the minimum (i.e., when surface not quadratic). • CG is the method of choice for large systems. Newton-Raphson (memory requirements scale as (3N)2) • Converges in one step on quadratic surfaces. • Unstable away from the minimum. • Computationally expensive.