Parameterisation of a custom amino-acid, PCA Contents • Force-fields and how they work – What kind of interactions do we need to parameterise? • Quantum Mechanical calculations – Where force-fields obtain basic information • CHARMM parameterisation process – The goal of developers => follow same goal • Worked example of pyroglutamic acid What this talk will cover • It will: – Teach the necessary background – Advise starting points for parameterisation, by analogy with existing compounds – Follow the process for a CHARMM style strategy • pyroglutamic acid under CHARMM27 - We will be following a published process from developers • It will not: • Teach you how to create a force-field from scratch – Full parameterisation is a long process, with many potential pit-falls – This applies to new atoms and geometries not seen in force-fields – If your molecule falls under this category, it’s best to ask the experts whether someone else has done it NB: read Vanommeasleagh's home page and tutorials: http://dogmans.umaryland.edu/~kenno/#CGenFF 1: Force fields • All force-fields have a purpose – There are several force-fields, each with their own goals (e.g. following developer’s research interest) – They are thus not fully compatible • All force-fields are made from simple components – They are not really black-boxes – Each terms can be fully understood – Therefore, researchers like us can make force-fields • I will describe CHARMM-ffs from above points Purpose of force-fields • Force-fields are used to replicate chemical or biological environments. – This means they should produce reasonable results compared to experimental data • Force-fields must also be derived from quantum mechanical (QM) simulations for molecular accuracy – i.e. FFs must approximate behaviour of electrons and atomic bonds – FFs must choose basic functions with which to do this • FF-developers must choose what kind of environment or interaction they will replicate • Thus, FFs must make choices – AMBER targets proteins conformations – OPLS-AA targets organic liquid environments – CHARMM targets hydrous solvent interactions and energies • .˙. Many force-fields can be used for task X, but some will perform better. Purpose of force-fields Due to these differences: • Force-fields have an optimal range of performances, and limits – E.g. GROMOS works particularly well for studies of lipid behaviour – Results may be surprising if ffs used in exotic conditions • Force-fields should not be mixed with each other • The detailed mechanics are different Analogy: TIP3 water do not melt or boil at expected temperatures – they were never meant to! CHARMM-CGenFF force field • Basis: General forcefield for all organic molecules – Match MM-properties to QM calculations (MP2/6-31G*) – Minimum necessary work, to allow parameters to be shared across many different molecules – Can use for, e.g. drug-binding studies in MD • Overall process for parametrisation – – – – – Charges <= Dipole moments, interaction with TIP3 water Bond and angle values <= QM Equilibrium geometry Bond and angle constraints <= Vibrational modes Dihedral constraints <= Potential energy scans Van der Waals <= Heat of solvation, molecular volume CHARMM force field for biomolecules • Basis: Energetically accurate set of parameters – Replicate observed experimental quantities specific to proteins, nucleic-acids, etc. – i.e. Aim is for CHARMM proteins/DNA which act exactly like their counterparts in real life • Overall process for parametrisation – Fit parameters like CGenFF. – Modify to fit with relevant experimental observations. This means heat of solvation, solvent/solute interactions, etc. – Operating conditions: This force-field is optimised for room temperature, liquid phase experiments – Similar process: CGenFF can combine with CHARMM Comparison: AMBER’s protein FF • Basis: To replicate protein behaviour by modelling amino acid conformations accurately – (subtle difference to CHARMM) • Process: – Point charges fit to higher QM calculations (B3LYP/cc-pVTZ/HF/631G**) at dielectric constant = 4 – Bonds and angles matched to X-ray crystals and vibrational spectra – Torsions fit to reproduce peptide conformations and phi-psi energy surfaces. • Performs similar to CHARMM by different means – Complete amino acids are parameterised, where as CHARMM uses fragments with known experimental data – More dependent on QM and biochemical observations, less on strict chemical data Inner-working of CHARMM • CHARMM and CGenFF (from Duan e.t al., 2003) : – Harmonic bonds – Harmonic angles (modified) – Harmonic impropers – Cosine dihedrals Inner-working of CHARMM Compositions Observations • Basic interactions between bonded atoms are harmonic potentials • Intra-molecular terms are very simple in nature • They must be fitted to approximate real covalently bonded molecules • QM-calculations form the basis of these target data • During parameterisation, we won’t usually need to modify vdW terms – This reduces computation cost by using simple functions and eliminating electrons from the equation • Electrostatics and vdW forces are preserved – They are essential to molecular systems! QM calculations: Gaussian • CHARMM developers use Gaussian to produce target data – Process documented for Gaussian – However, one can use other QM programs • I’ll explain what calculations are required and how to do them Using Gaussian with GaussView Gaussian: Scripting • Textual interface provides all input necessary – One file for one simulation • GaussView (GUI) is provided to assist process • The QM-level for parameterisation is MP2/6-31G* – Enough to describe common interactions – Probably not sufficient for certain organo-metal interactions Gaussian: Scripting Conditions and general purpose of this simulation Title Coordinates: different representations possible Detailed commands, follow-on simulations, etc. QM calculations required: 1: Equilibrium geometry 2: Vibrational spectra 3: Dihedral energy surfaces NB: CHARMM website provides a fully worked tutorial CHARMM parametrisation process • Priorities of different parameters – The energetics needs to be replicated ultimately to < 1 kcal/mol – Some parameters have wide-spread effects, other fill in important details • Flowchart – Parameterisation is iterative, and will take time • Illustrate process with pyroglutamic acid CHARMM parametrisation process • The priority of different parameters: – Charges – Equilibrium bond and angle values – Bonds and angles force constants – Torsions • This order will permeate throughout parameterisation • Each set of parameters depends on everything above • Hence, refinement of parameters follows: • Create/modify data • Refine, check results • repeat. Flowchart and notations • • • • The rest of this talk follows like so: 1 – Prepare entries and coordinates 2 – Optimise charges 3 – Optimise bonds and angles – 3.1 – equilibrium values by QM geometry – 3.2 – force constants by molecular vibrations • 4 – Optimise dihedrals – 4.1 – Generate Potential Energy Scans – 4.2 – Dihedral constants by matching and chemcial knowledge • 5 – Validation Individual Processes Set initial topology and geometry Set initial geometry Calculate QM vib. spec. in CHARMM QM-minim. geometry with water molecules QM-minim. geometry Modify bond/angle force const. Set charges and vdW Modify bonds and angles Calculate MM vib. spec. in CHARMM MM-minim. geometry with water molecules MM-minim. geometry Do they match? Do they match? charge and vdW fits complete bond and angle fits complete Do they match? bond/angle force const. complete ...etc. Validation • After the last fit to dihedrals is complete, all outputs need to be verified – Starting again from partial charges and water interaction simulations... • Any changes likely propagate itself downwards through flowchart • Main reason why parameterisation is timeconsuming First parametrisation Do charges Do bonds and angles Do dihedrals Validate all data all complete Worked example: • Pyroglutamic acid – As its name suggests, an amino acid – N-terminal only, cyclisation of glumatic acid or enzymatic activity – Used in proteinligand simulation with scorpion toxins. 1.1: prepare topology (idea) • Observe existing molecules in the database – What chemical groups are your molecules? • Cut and paste sections together, borrowing charges and groupings 1.1: prepare topology (details) • Adapt from existing residues. – PRO as base residue, referred to GLN and backbone values. • Consider its use and clashes with existing parameters • Right now, backbone angles and torsion parameters will be used (bad idea for cyclic molecule) • Change atom-typings to allow modification of important atom bonds and angles • NB: Creating new atom-types are not preferred, as it will be more difficult for future work 1.1: atom-typings • From existing ff: • neutral Ns in CHARMM – NH2 is primary amine – NH1 is secondary amine – N is tertiary amine • carboxyl Cs in CHARMM – CC/CD are proteins – CE1/CE2 are elementary alkanes • Use NH1/CC typing, no need to modify O and H. • Topology done! Return to ICs later. 1.1: prepare parameters • Work out required bonds, angles, dihedrals – running CHARMM can help, as it stops with a warning when bonds and angles are missing • Borrow known bond and angle values – PRO provides most of existing – CGenFF has same philosophy in making bonded parameters • 2PDO is a very similar molecule in CGenFF. Use its values for dihedrals as an initial guess. • TIP: Establish what you *need* to parameterise now, and change only them in future edits. 1.2: Create molecule for Gaussian • The starting geometry will affect results – We grab some experimental coordinates from, e.g. DrugBank or PDB – These coordinates can also be created with some chemical softwares • NB: CHARMM uses IC tables which it will use to generate the residues when coordinates are not given – Either paste by analogy or transfer from minimised geometry 1.2: QM Minimisation • Gaussian outputs in formats offering more accuracy than pdb – Outputs all bond, angle and dihedral data – file conversions may be necessary for visualisation 1.3: Create IC table CHARMM manual entry(!) • Each entry in the IC table (see below) lists 4 connected atoms, I, J, K and L,; for a normal IC table entry, the I-J bond length, R(IJ); the I-J-K bond angle, T(IJK); the dihedral angle I-J-KL, PHI; the bond angle T(JKL); and the K-L bond length, R(KL) are listed. Improper dihedral angles, which are used to keep sp2 atoms planar and sp3 atoms in a tetrahedral geometry, are marked with a star. The center atom of an improper dihedral angle is marked with a star. For an improper dihedral angle entry, the I-K bond length, R(IK); the I-K-J bond angle, T(IKJ); the dihedral angle I-J-K-L , PHI; the J-K-L bond angle T(JKL); and the KL bond length, R(KL) are listed. The atom entry "-99" indicates an undefined atom. • CHARMM begins with a seed of three atoms, then defines the rest of the molecule with these three – Protein conventions follow backbone N-CA-C • When you create an IC table, remember to base your first entries on these and ‘grow’ the molecule from there 1.3: Create IC table Atom set Bond Angle Dihedral Angle Bond I-J I-J-K I-J-K-L J-K-L K-L N CA C O 1.441 111.58 22.93 120.82 1.23 CB CA C O 1.545 110.65 -172.00 120.82 1.23 Atom set Bond Angle Dihedral Angle Bond I-K I-K-J I-J-K-L J-K-L K-L N C *CA HA 1.441 110.86 -122.40 109.09 1.102 N C *CA CB 1.441 110.86 113.74 110.65 1.545 1.3: IC table notes • CHARMM does not need every possible entry in a given IC table – Only the dihedrals are necessary • The command “IC PARAM” will fill in bonds/angles using existing parameters • Will save you a lot of time • Number of entries: about 1 less than the total number of atoms. 2: Optimise charges • In CHARMM ff, protein charges are not parameterised by QM calculations alone. • Consistent behaviour with other amino-acids means that PCA should obey similar charges, rather than QM data per-se – Obtained charges from analogy are “good enough” when compared with QM data – No unique chemical groups such that charge optimisation is required • This step is thus skipped for PCA • I will show you a mocked example 2.1: Optimise charges • Begin with charges from Gaussian/analogy • (can read from output) 2.1: Optimise charges • Run water interaction simulation and also check molecule dipole – i.e. imitate H-bonding interaction with charges • Modify charges to fit both data • NB: MM dipole needs to overestimate QM dipole by 30-50%, since QM data is in vacuum 3: Optimise bonds and angles • Begin by setting equilibrium values to QM or crystal values • Compare results and modify equilibrium values – Using an IC table by analogy gives the wrong ring conformation to CHARMM – I constructed an IC to start PCA near the other minimum. CHARMM then finds the equatorial conformer • It is important to check that your conformation agrees after dihedras are fitted 3.1: Bond/angle equilibrium values • CHARMM developers used 0.2 Å and 3° as the upper limit – Can usually do much better – Developers seek to use same parameters to describe many ligands, we do not need to 3.1: Bond/angle force constants • Method: Comparison of QM and MM vibrational spectra • Analogous residues are good starting points for force constants • However, perfect agreement is impossible – This is due to differences between MM and QM minima, and mixing with torsion parameters – May need to modify again during validation – A “general” agreement (about 10%) is good enough when all parameterisation is finished 3.1: Vibrational spectra • Vibrations can be collected into components – change in dipole determines IR absorption • Components are defined by motions of collective parts – bond stretches and angle bending – rocking, scissoring, wagging motions • Tip: revise IR and Raman spectroscopy dihedrals, out-of-plane motion N-C stretch C=O stretch C-H stretch N-H,O-H stretch 3.1: Vibrational notation • In CHARMM, you need to convert motions of atoms into bonds, angles and dihedrals (internal coordinates, IC) – Forms basis set of all the degrees of freedoms – # Vibrations = DoG = 3N-6 – CHARMM then uses these to fit vibrational spetra • Then you need to convert these ICs into vibrational modes – Read Pulay et. al. (1979) NB: I wrote a relatively simple tutorial on the CHARMM forums to run users through the Pulay conversion for water and propane. QM MM 3.1: Bond/angle force constants • Visualising these vibrations with GaussView and others will help you identify important vibrations • NB: Remember there is a distinction between fitting to QM calculations, and fitting to experimental spectra 4: Optimise dihedrals • Potential Energy Scans involve: fixing a dihedral at discrete points, minimising the rest of the geometry, and calculating absolute energy. • Determines conformational preferences of residues, especially important for packing 4: Optimise dihedrals • PCA with initial dihedrals favour the opposite ring conformer (top) • One will need to work out which parameter affect the rotations you need • After a series of fits and rationales for given parameters, a closer agreement can be obtained (bottom) Some rationales for the PCA case: • Fit only dihedrals that contains re-typed atoms – This includes NH1 and CC • Carboxyl backbone prefers equatorial over axial orientation w.r.t. ring – Fit a single 1-fold or 2-fold dihedral to CD-N-CA-C to express this • Keep all the 2PDO dihedrals as is, except where necessary – So, planar amide retains 2fold only due to symmetry – Ring-dihedrals contain only 3-fold – Modify numbers to produce correct energy surface • The rest is heuristic searching • Pointers for a good fit: • Single parameters of 1-fold,2-fold, 3-fold, 6-fold. • Accuracy to about 0.2 kcal/mol • Attention to barrier heights and relative minima positions • Using multiple dihedral parameters and no restrictions on phase, one can achieve a fit like this. • The energy surface is very close to QM. However, the parameters used are arguably unphysical 5: Iteration • Now that you have your molecule, re-run all the tests and check if discrepancies have arisen during the process • The molecular vibrational spectrum should look better – Dihedrals factor into lower vibrational modes • Adjust as necessary. 5: Comments • You can spend as much time as you wish tweaking the numbers, but keep in mind that: – Simulation accuracy is going to be larger than the residual errors in your parameters – Even experimental accuracy is 1 kcal/mol • If it is convenient, post the major results and check with the developers (be nice) Finished product • Once you are satisfied with the result, it’s time to test the new molecule in MD • As force-field are always in constant development... – if you did your parameterisation well, CHARMM developers may add it to the collection. • Good luck!