QMParam

Parameterisation of a custom
amino-acid, PCA
Contents
• Force-fields and how they work
– What kind of interactions do we need to
parameterise?
• Quantum Mechanical calculations
– Where force-fields obtain basic information
• CHARMM parameterisation process
– The goal of developers => follow same goal
• Worked example of pyroglutamic acid
What this talk will cover
• It will:
– Teach the necessary
background
– Advise starting points for
parameterisation, by
analogy with existing
compounds
– Follow the process for a
CHARMM style strategy
• pyroglutamic acid under
CHARMM27
- We will be following a
published process from
developers
• It will not:
• Teach you how to create
a force-field from scratch
– Full parameterisation is a
long process, with many
potential pit-falls
– This applies to new atoms
and geometries not seen in
force-fields
– If your molecule falls under
this category, it’s best to
ask the experts whether
someone else has done it
NB: read Vanommeslaeghe's home page and tutorials:
http://dogmans.umaryland.edu/~kenno/#CGenFF
1: Force fields
• All force-fields have a purpose
– There are several force-fields, each with their own
goals (e.g. following developer’s research interest)
– They are thus not fully compatible
• All force-fields are made from simple components
– They are not really black-boxes
– Each term can be fully understood
– Therefore, researchers like us can make force-fields
• I will describe the CHARMM force-fields in terms of the above points
Purpose of force-fields
• Force-fields are used to
replicate chemical or
biological environments.
– This means they should
produce reasonable results
compared to experimental data
• Force-fields must also be
derived from quantum
mechanical (QM) simulations
for molecular accuracy
– i.e. FFs must approximate
behaviour of electrons and
atomic bonds
– FFs must choose basic
functions with which to do this
• FF-developers must choose
what kind of environment or
interaction they will replicate
• Thus, FFs must make choices
– AMBER targets protein
conformations
– OPLS-AA targets organic
liquid environments
– CHARMM targets aqueous
solvent interactions and
energies
• ∴ Many force-fields can be
used for task X, but some will
perform better.
Purpose of force-fields
Due to these differences:
• Force-fields have an optimal
range of performance, and
limits
– E.g. GROMOS works
particularly well for studies of
lipid behaviour
– Results may be surprising if force-fields
are used in exotic conditions
• Force-fields should not be
mixed with each other
• The detailed mechanics are
different
Analogy: TIP3 water does not melt or boil at
expected temperatures – it was never
meant to!
CHARMM-CGenFF force field
• Basis: General forcefield for all organic molecules
– Match MM-properties to QM calculations (MP2/6-31G*)
– Minimum necessary work, to allow parameters to be shared
across many different molecules
– Can use for, e.g. drug-binding studies in MD
• Overall process for parametrisation
– Charges <= Dipole moments, interaction with TIP3 water
– Bond and angle values <= QM Equilibrium geometry
– Bond and angle force constants <= Vibrational modes
– Dihedral force constants <= Potential energy scans
– Van der Waals <= Heat of solvation, molecular volume
CHARMM force field for biomolecules
• Basis: Energetically accurate set of parameters
– Replicate observed experimental quantities specific to
proteins, nucleic-acids, etc.
– i.e. Aim is for CHARMM proteins/DNA which act exactly like
their counterparts in real life
• Overall process for parametrisation
– Fit parameters like CGenFF.
– Modify to fit with relevant experimental observations. This
means heat of solvation, solvent/solute interactions, etc.
– Operating conditions: This force-field is optimised for room
temperature, liquid phase experiments
– Because the process is similar, CGenFF can be combined with CHARMM
Comparison: AMBER’s protein FF
• Basis: To replicate protein behaviour by modelling amino
acid conformations accurately
– (subtle difference to CHARMM)
• Process:
– Point charges fit to higher QM calculations (B3LYP/cc-pVTZ//HF/6-31G**) at dielectric constant = 4
– Bonds and angles matched to X-ray crystals and vibrational
spectra
– Torsions fit to reproduce peptide conformations and phi-psi
energy surfaces.
• Performs similarly to CHARMM by different means
– Complete amino acids are parameterised, whereas CHARMM
uses fragments with known experimental data
– More dependent on QM and biochemical observations, less on
strict chemical data
Inner-working of CHARMM
• CHARMM and CGenFF (from Duan et al., 2003):
– Harmonic bonds
– Harmonic angles (modified)
– Harmonic impropers
– Cosine dihedrals
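The four bonded terms listed above can be written out in a few lines. This is a minimal sketch, assuming the usual CHARMM conventions: harmonic terms of the form K(x - x0)^2 with no factor of 1/2, and dihedral terms of the form K(1 + cos(n*chi - delta)). Reading "modified" as an optional Urey-Bradley 1-3 distance term added to the angle is an interpretation, not something the slide states. All function names are illustrative.

```python
import math


def harmonic(k, x, x0):
    """Harmonic term K*(x - x0)**2 (CHARMM-style, no factor of 1/2)."""
    return k * (x - x0) ** 2


def angle_energy(k_theta, theta_deg, theta0_deg, k_ub=0.0, s=0.0, s0=0.0):
    """Harmonic angle term plus an optional Urey-Bradley 1-3 distance term.

    Angles are given in degrees and converted to radians, so k_theta is
    in kcal/mol/rad^2. With k_ub = 0 this is a plain harmonic angle.
    """
    d_theta = math.radians(theta_deg - theta0_deg)
    return k_theta * d_theta ** 2 + harmonic(k_ub, s, s0)


def improper_energy(k_imp, phi_deg, phi0_deg=0.0):
    """Harmonic improper term (no wrap-around handling near +/-180 degrees)."""
    d_phi = math.radians(phi_deg - phi0_deg)
    return k_imp * d_phi ** 2


def dihedral_energy(terms, chi_deg):
    """Sum of cosine terms K*(1 + cos(n*chi - delta)).

    terms is an iterable of (K, n, delta_deg) tuples.
    """
    chi = math.radians(chi_deg)
    return sum(
        k * (1.0 + math.cos(n * chi - math.radians(delta_deg)))
        for k, n, delta_deg in terms
    )
```

For example, a single 2-fold term with a phase of 180 degrees, `dihedral_energy([(2.0, 2, 180.0)], chi)`, has its minima at 0 and 180 degrees and a barrier of 2K at 90 degrees.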
Inner-working of CHARMM
Compositions
• Basic interactions between
bonded atoms are harmonic
potentials
• Intra-molecular terms are
very simple in nature
– This reduces computation
cost by using simple functions
and eliminating electrons
from the equation
• Electrostatics and vdW
forces are preserved
– They are essential to
molecular systems!
Observations
• The bonded terms must be fitted to
approximate real covalently
bonded molecules
• QM-calculations form the
basis of these target data
• During parameterisation, we
won’t usually need to
modify vdW terms
QM calculations: Gaussian
• CHARMM developers
use Gaussian to
produce target data
– Process documented
for Gaussian
– However, one can use
other QM programs
• I’ll explain what
calculations are
required and how to
do them
Using Gaussian with GaussView
Gaussian: Scripting
• Textual interface
provides all input
necessary
– One file for one simulation
• GaussView (GUI) is
provided to assist
process
• The QM-level for
parameterisation is
MP2/6-31G*
– Enough to describe
common interactions
– Probably not sufficient for
certain organometallic
interactions
Gaussian: Scripting
Parts of an input file, top to bottom:
• Conditions and general purpose of this simulation
• Title
• Coordinates: different representations possible
• Detailed commands, follow-on simulations, etc.
QM calculations required:
• 1: Equilibrium geometry
• 2: Vibrational spectra
• 3: Dihedral energy surfaces
NB: CHARMM website
provides a fully worked tutorial
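Because each calculation is one text file, the inputs are easy to generate from a script. The sketch below assembles a Gaussian-style input in the order described above: route line, blank line, title, blank line, charge and multiplicity, Cartesian coordinates, closing blank line. The helper name `gaussian_input`, the checkpoint file name, and the water coordinates are illustrative assumptions; check the route keywords against the Gaussian manual for your version before running anything.

```python
def gaussian_input(title, atoms, route="# MP2/6-31G* Opt",
                   charge=0, multiplicity=1, checkpoint=None):
    """Return the text of a Gaussian-style input file.

    atoms is a list of (element, x, y, z) tuples in Angstrom.
    """
    lines = []
    if checkpoint:
        lines.append(f"%chk={checkpoint}")
    # Route section, then title, each followed by a blank line.
    lines += [route, "", title, "", f"{charge} {multiplicity}"]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")  # blank line terminates the molecule specification
    return "\n".join(lines) + "\n"


# Illustrative water geometry (approximate, not a minimised structure).
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]

opt_freq_text = gaussian_input(
    "water: geometry optimisation followed by frequencies",
    water,
    route="# MP2/6-31G* Opt Freq",
)
print(opt_freq_text)
```

Changing only the `route` argument gives one file per calculation type, which keeps the level of theory identical across the equilibrium geometry, vibrational and scan jobs.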
CHARMM parametrisation process
• Priorities of different parameters
– The energetics need to be replicated ultimately
to < 1 kcal/mol
– Some parameters have wide-spread effects, others
fill in important details
• Flowchart
– Parameterisation is iterative, and will take time
• Illustrate process with pyroglutamic acid
CHARMM parametrisation process
• The priority of
different parameters:
– Charges
– Equilibrium bond and
angle values
– Bonds and angles force
constants
– Torsions
• This order will
permeate throughout
parameterisation
• Each set of parameters
depends on everything
above
• Hence, refinement of
parameters follows:
• Create/modify data
• Refine, check results
• repeat.
Flowchart and notations
The rest of this talk follows like so:
• 1 – Prepare entries and coordinates
• 2 – Optimise charges
• 3 – Optimise bonds and angles
– 3.1 – equilibrium values by QM geometry
– 3.2 – force constants by molecular vibrations
• 4 – Optimise dihedrals
– 4.1 – Generate Potential Energy Scans
– 4.2 – Dihedral constants by matching and chemical
knowledge
• 5 – Validation
Individual Processes
Flowchart (three fitting loops, each repeated until QM and MM match):
• Charges and vdW: set initial topology and geometry → QM-minim. geometry with water molecules → set charges and vdW → MM-minim. geometry with water molecules → do they match? → charge and vdW fits complete
• Bonds and angles: set initial geometry → QM-minim. geometry → modify bonds and angles → MM-minim. geometry → do they match? → bond and angle fits complete
• Force constants: calculate QM vib. spec. in CHARMM → modify bond/angle force const. → calculate MM vib. spec. in CHARMM → do they match? → bond/angle force const. complete
• ...etc.
Validation
• After the last fit to
dihedrals is complete, all
outputs need to be
verified
– Starting again from
partial charges and water
interaction simulations...
• Any change is likely to
propagate
downwards through the
flowchart
• This is the main reason why
parameterisation is time-consuming
Flowchart: First parametrisation → Do charges → Do bonds and angles → Do dihedrals → Validate all data → all complete
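The validation loop can be pictured as repeated passes through the stages in priority order, stopping only when a whole pass changes nothing. The sketch below is schematic: `refit` is a hypothetical callback standing in for the real work of one stage (run QM and MM, compare, adjust), and `fake_refit` only simulates stages that need a fixed number of adjustments.

```python
from typing import Callable, List, Sequence

# Stages in the priority order used throughout the talk.
STAGES = (
    "charges",
    "bond/angle equilibrium values",
    "bond/angle force constants",
    "dihedrals",
)


def iterate_until_consistent(refit: Callable[[str], bool],
                             stages: Sequence[str] = STAGES,
                             max_passes: int = 10) -> List[List[str]]:
    """Repeat full passes until no stage needs changing.

    refit(stage) must return True if it had to modify parameters.
    Returns, for each pass, the list of stages that changed.
    """
    history = []
    for _ in range(max_passes):
        # Stages run top to bottom, so an early change is seen by later stages.
        changed = [stage for stage in stages if refit(stage)]
        history.append(changed)
        if not changed:
            return history
    raise RuntimeError("parameters did not settle within max_passes")


# Simulated example: each stage needs this many adjustments before it settles.
remaining = {
    "charges": 1,
    "bond/angle equilibrium values": 1,
    "bond/angle force constants": 2,
    "dihedrals": 2,
}


def fake_refit(stage: str) -> bool:
    if remaining[stage] > 0:
        remaining[stage] -= 1
        return True
    return False


history = iterate_until_consistent(fake_refit)
print(len(history), "passes:", history)
```

The final empty pass is the validation step: every fit is re-checked and none of them moves.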
Worked example:
• Pyroglutamic acid
– As its name suggests,
an amino acid
– N-terminal only, formed by
cyclisation of
glutamic acid or by
enzymatic activity
– Used in protein-ligand simulation
with scorpion toxins.
1.1: prepare topology (idea)
• Observe existing
molecules in the
database
– What chemical
groups make up your
molecule?
• Cut and paste
sections together,
borrowing charges
and groupings
1.1: prepare topology (details)
• Adapt from existing residues.
– PRO as base residue, with reference
to GLN and backbone values.
• Consider its use and clashes
with existing parameters
• Right now, backbone angles
and torsion parameters will
be used (bad idea for cyclic
molecule)
• Change atom-typings to allow
modification of important
atom bonds and angles
• NB: Creating new atom-types
is not preferred, as it makes
future work more difficult
1.1: atom-typings
• From existing ff:
• neutral Ns in CHARMM
– NH2 is primary amide
– NH1 is secondary amide
– N is tertiary amide
• sp2 Cs in CHARMM
– CC/CD are carbonyl carbons in proteins
– CE1/CE2 are elementary
alkenes
• Use NH1/CC typing, no need
to modify O and H.
• Topology done! Return to
ICs later.
1.1: prepare parameters
• Work out required bonds, angles,
dihedrals
– running CHARMM can help, as it stops
with a warning when bonds and angles
are missing
• Borrow known bond and angle
values
– PRO provides most of the existing values
– CGenFF has same philosophy in making
bonded parameters
• 2PDO is a very similar molecule in
CGenFF. Use its values for dihedrals
as an initial guess.
• TIP: Establish what you *need* to
parameterise now, and change only
those parameters in future edits.
1.2: Create molecule for Gaussian
• The starting geometry will affect results
– We grab some experimental coordinates from, e.g.
DrugBank or PDB
– These coordinates can also be created with
chemical-drawing software
• NB: CHARMM uses IC tables to
generate residue coordinates when none are
given
– Either paste by analogy or transfer from
minimised geometry
1.2: QM Minimisation
• Gaussian outputs in
formats offering more
accuracy than pdb
– Outputs all bond, angle
and dihedral data
– file conversions may be
necessary for
visualisation
1.3: Create IC table
CHARMM manual entry(!)
• Each entry in the IC table (see below)
lists 4 connected atoms, I, J, K and L;
for a normal IC table entry, the I-J
bond length, R(IJ); the I-J-K bond
angle, T(IJK); the dihedral angle I-J-K-L, PHI;
the bond angle T(JKL); and the
K-L bond length, R(KL) are listed.
Improper dihedral angles, which are
used to keep sp2 atoms planar and
sp3 atoms in a tetrahedral geometry,
are marked with a star. The center
atom of an improper dihedral angle is
marked with a star. For an improper
dihedral angle entry, the I-K bond
length, R(IK); the I-K-J bond angle,
T(IKJ); the dihedral angle I-J-K-L, PHI;
the J-K-L bond angle T(JKL); and the
K-L bond length, R(KL) are listed. The
atom entry "-99" indicates an
undefined atom.
• CHARMM begins with a
seed of three atoms,
then defines the rest of
the molecule with these
three
– Protein conventions
follow backbone N-CA-C
• When you create an IC
table, remember to
base your first entries
on these and ‘grow’ the
molecule from there
1.3: Create IC table
Normal IC entries:
Atom set (I J K L)    Bond I-J   Angle I-J-K   Dihedral I-J-K-L   Angle J-K-L   Bond K-L
N  CA C   O           1.441      111.58          22.93            120.82        1.23
CB CA C   O           1.545      110.65        -172.00            120.82        1.23

Improper IC entries (starred centre atom):
Atom set (I J *K L)   Bond I-K   Angle I-K-J   Dihedral I-J-K-L   Angle J-K-L   Bond K-L
N  C  *CA HA          1.441      110.86        -122.40            109.09        1.102
N  C  *CA CB          1.441      110.86         113.74            110.65        1.545
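When IC values are transferred from a minimised geometry, the five numbers in each row can be computed directly from four sets of Cartesian coordinates. The sketch below follows the column order of the tables above for both normal and improper entries. The function names are illustrative, and the dihedral uses the common convention in which cis is 0 degrees and trans is 180 degrees; it is worth checking the sign against a known CHARMM IC entry before relying on it.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond(a, b):
    """Distance between two points."""
    return _norm(_sub(a, b))


def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n2 = _cross(b2, b3)
    x = _dot(_cross(b1, b2), n2)
    y = _norm(b2) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def ic_entry(i, j, k, l, improper=False):
    """Return the five IC values for atoms I, J, K, L.

    Normal:   R(IJ), T(IJK), PHI, T(JKL), R(KL)
    Improper: R(IK), T(IKJ), PHI, T(JKL), R(KL)
    """
    phi = dihedral(i, j, k, l)
    if improper:
        return (bond(i, k), angle(i, k, j), phi, angle(j, k, l), bond(k, l))
    return (bond(i, j), angle(i, j, k), phi, angle(j, k, l), bond(k, l))


# Toy coordinates (not PCA): a right-angled chain with a 90 degree twist.
I, J, K, L = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
print(ic_entry(I, J, K, L))
print(ic_entry(I, J, K, L, improper=True))
```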
1.3: IC table notes
• CHARMM does not need
every possible entry in a
given IC table
– Only the dihedrals are
necessary
• The command “IC
PARAM” will fill in
bonds/angles using
existing parameters
– This will save you a lot of
time
• Number of entries:
about 1 less than the
total number of atoms.
2: Optimise charges
• In CHARMM ff, protein charges are not
parameterised by QM calculations alone.
• Consistent behaviour with other amino-acids
means that PCA should carry charges similar to theirs, rather
than follow QM data per se
– Charges obtained by analogy are “good enough”
when compared with QM data
– PCA has no unique chemical groups that would require
charge optimisation
• This step is thus skipped for PCA
• I will show you a mock example
2.1: Optimise charges
• Begin with charges
from Gaussian/analogy
• (can read from output)
2.1: Optimise charges
• Run water interaction
simulation and also
check molecule dipole
– i.e. imitate H-bonding
interaction with charges
• Modify charges to fit
both sets of data
• NB: MM dipole needs to
overestimate QM dipole
by 30-50%, since QM
data is in vacuum and fixed
charges must mimic
polarisation in solution
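The dipole check is simple to script once charges and coordinates are in hand. The sketch below computes a point-charge dipole in Debye and tests whether an MM dipole overestimates a QM dipole by the 30-50% window quoted above. The function names are illustrative, and the example uses the standard TIP3P water model rather than PCA.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in Debye


def dipole_debye(charges, coords):
    """Dipole vector (Debye) of point charges (e) at coords (Angstrom).

    For a neutral molecule the result does not depend on the origin.
    """
    return tuple(
        E_ANGSTROM_TO_DEBYE * sum(q * r[axis] for q, r in zip(charges, coords))
        for axis in range(3)
    )


def magnitude(vec):
    return math.sqrt(sum(v * v for v in vec))


def overestimates_enough(mm_dipole, qm_dipole, low=1.30, high=1.50):
    """True if mm/qm lies in the window quoted in the slide (30-50% above QM)."""
    ratio = mm_dipole / qm_dipole
    return low <= ratio <= high


# TIP3P water: O-H = 0.9572 Angstrom, H-O-H = 104.52 degrees.
half = math.radians(104.52 / 2.0)
r_oh = 0.9572
tip3p_coords = [
    (0.0, 0.0, 0.0),
    (r_oh * math.sin(half), r_oh * math.cos(half), 0.0),
    (-r_oh * math.sin(half), r_oh * math.cos(half), 0.0),
]
tip3p_charges = [-0.834, 0.417, 0.417]

mu = magnitude(dipole_debye(tip3p_charges, tip3p_coords))
print(f"TIP3P dipole: {mu:.3f} D")
```

The same two functions apply to a new residue: feed in the trial charges with the MM-minimised coordinates, and compare the magnitude with the dipole reported in the QM output.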
3: Optimise bonds and angles
• Begin by setting equilibrium values to
QM or crystal values
• Compare results and modify
equilibrium values
– Using an IC table by analogy gives the
wrong ring conformation to CHARMM
– I constructed an IC to start PCA near the
other minimum. CHARMM then finds
the equatorial conformer
• It is important to check that your
conformation still agrees after the dihedrals
are fitted
3.1: Bond/angle equilibrium values
• CHARMM developers
used 0.2 Å and 3° as
the upper limit
– Can usually do much
better
– Developers seek to
use the same parameters
to describe many
ligands; we do not
need to
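Comparing QM and MM equilibrium values is a bookkeeping task that is easy to automate. The sketch below lists the internal coordinates whose MM value deviates from the QM value by more than a chosen limit. The limits are passed in explicitly (here the upper limits quoted in the slide above), and the bond and angle values are made-up placeholders, not PCA data.

```python
def geometry_outliers(qm, mm, limit):
    """Return {name: mm - qm} for entries deviating by more than limit.

    qm and mm map an internal-coordinate label to its value
    (Angstrom for bonds, degrees for angles).
    """
    return {
        name: mm[name] - qm[name]
        for name in qm
        if abs(mm[name] - qm[name]) > limit
    }


# Placeholder values for illustration only.
qm_bonds = {"N-CA": 1.45, "CA-C": 1.52}
mm_bonds = {"N-CA": 1.46, "CA-C": 1.80}
qm_angles = {"N-CA-C": 111.0, "CA-C-O": 121.0}
mm_angles = {"N-CA-C": 115.5, "CA-C-O": 120.0}

bad_bonds = geometry_outliers(qm_bonds, mm_bonds, limit=0.2)    # Angstrom
bad_angles = geometry_outliers(qm_angles, mm_angles, limit=3.0)  # degrees
print(bad_bonds, bad_angles)
```

Tightening `limit` is how one would "do much better" for a single molecule, since nothing else in the check changes.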
3.2: Bond/angle force constants
• Method: Comparison of QM and MM
vibrational spectra
• Analogous residues are good starting points
for force constants
• However, perfect agreement is impossible
– This is due to differences between MM and QM
minima, and mixing with torsion parameters
– May need to modify again during validation
– A “general” agreement (about 10%) is good
enough when all parameterisation is finished
3.2: Vibrational spectra
• Vibrations can be
collected into components
– change in dipole
determines IR absorption
• Components are defined
by motions of collective
parts
– bond stretches and angle
bending
– rocking, scissoring,
wagging motions
• Tip: revise IR and Raman
spectroscopy
Figure: spectrum region labels: dihedrals and out-of-plane motion; N-C stretch; C=O stretch; C-H stretch; N-H, O-H stretch
3.2: Vibrational notation
• In CHARMM, you need to convert motions of
atoms into bonds, angles and dihedrals
(internal coordinates, IC)
– Forms basis set of all the degrees of freedom
– # Vibrations = DoF = 3N-6
– CHARMM then uses these to fit vibrational spectra
• Then you need to convert these ICs into
vibrational modes
– Read Pulay et al. (1979)
NB: I wrote a relatively simple tutorial on the CHARMM forums to run users through the
Pulay conversion for water and propane.
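Two small helpers make the bookkeeping concrete: the number of vibrational modes for a molecule of N atoms, and the percentage deviation between QM and MM frequencies. This is only a sketch. Pairing the frequencies by sorted order is a crude stand-in for the proper mode assignment obtained from the internal-coordinate analysis, and the function names are illustrative.

```python
def n_vibrational_modes(n_atoms, linear=False):
    """3N-6 vibrational modes for a non-linear molecule, 3N-5 if linear."""
    return 3 * n_atoms - (5 if linear else 6)


def percent_deviations(qm_freqs, mm_freqs):
    """Percentage deviation of MM from QM frequencies, paired by sorted order.

    Assumption: sorted pairing only approximates a real mode-by-mode match.
    """
    if len(qm_freqs) != len(mm_freqs):
        raise ValueError("QM and MM must have the same number of modes")
    return [
        100.0 * (mm - qm) / qm
        for qm, mm in zip(sorted(qm_freqs), sorted(mm_freqs))
    ]


print(n_vibrational_modes(3))    # water
print(n_vibrational_modes(11))   # propane, C3H8
```

A mode whose deviation is well outside the roughly 10% "general agreement" mentioned earlier is a candidate for a force-constant adjustment, once it has been assigned to a specific bond or angle.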
3.2: Bond/angle force constants
• Visualising these
vibrations with
GaussView and others
will help you identify
important vibrations
• NB: Remember
there is a
distinction
between fitting
to QM
calculations,
and fitting to
experimental
spectra
4: Optimise dihedrals
• Potential Energy Scans
involve: fixing a
dihedral at discrete
points, minimising the
rest of the geometry,
and calculating
absolute energy.
• Determines
conformational
preferences of
residues, especially
important for packing
4: Optimise dihedrals
• PCA with initial
dihedrals favours the
opposite ring
conformer (top)
• One will need to work
out which parameters
affect the rotations
you need
• After a series of fits
and rationales for
given parameters, a
closer agreement can
be obtained (bottom)
Some rationales for the PCA case:
• Fit only dihedrals that
contain re-typed atoms
– This includes NH1 and CC
• Carboxyl backbone
prefers equatorial over
axial orientation w.r.t.
ring
– Fit a single 1-fold or 2-fold
dihedral to CD-N-CA-C to
express this
• Keep all the 2PDO
dihedrals as is, except
where necessary
– So, planar amide retains 2-fold only due to symmetry
– Ring-dihedrals contain
only 3-fold
– Modify numbers to
produce correct energy
surface
• The rest is heuristic
searching
• Pointers for a good fit:
• Single parameters of 1-fold, 2-fold, 3-fold, 6-fold.
• Accuracy to about 0.2 kcal/mol
• Attention to barrier heights and relative minima
positions
• Using multiple dihedral parameters and no restrictions
on phase, one can achieve a fit like this.
• The energy surface is very close to QM. However, the
parameters used are arguably unphysical
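One way to stay on the physical side of this trade-off is to allow a single term per multiplicity and restrict each phase to 0 or 180 degrees. Under that restriction, and assuming the scan is evenly spaced over a full 360 degrees, each force constant can be read off by projecting the energy profile onto cos(n*phi). The sketch below does this with the standard library only. It is a starting guess for a single dihedral, not the developers' fitting procedure; a real scan has contributions from the rest of the force field, which should be subtracted first.

```python
import math


def dihedral_series(terms, phi_deg):
    """Sum of K*(1 + cos(n*phi - delta)) over (K, n, delta_deg) terms."""
    phi = math.radians(phi_deg)
    return sum(
        k * (1.0 + math.cos(n * phi - math.radians(delta)))
        for k, n, delta in terms
    )


def fit_cosine_terms(angles_deg, energies, multiplicities=(1, 2, 3, 6)):
    """Estimate one (K, n, delta) term per multiplicity by cosine projection.

    Assumptions: angles are evenly spaced and cover 360 degrees exactly once,
    and the number of points exceeds twice the largest multiplicity.
    A negative projection is expressed as a positive K with delta = 180.
    """
    n_pts = len(angles_deg)
    terms = []
    for n in multiplicities:
        a_n = (2.0 / n_pts) * sum(
            e * math.cos(n * math.radians(phi))
            for phi, e in zip(angles_deg, energies)
        )
        terms.append((abs(a_n), n, 0.0 if a_n >= 0.0 else 180.0))
    return terms


def rms_error(terms, angles_deg, energies):
    """RMS difference between model and target after removing the mean offset."""
    diffs = [dihedral_series(terms, phi) - e
             for phi, e in zip(angles_deg, energies)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


# Synthetic target profile (kcal/mol) built from known terms, scanned every 10 degrees.
angles = list(range(-180, 180, 10))
true_terms = [(1.5, 2, 180.0), (0.4, 3, 0.0)]
target = [dihedral_series(true_terms, phi) for phi in angles]

fitted = fit_cosine_terms(angles, target)
for k, n, delta in fitted:
    print(f"n={n}  K={k:.3f}  delta={delta:.0f}")
print(f"RMS error: {rms_error(fitted, angles, target):.4f} kcal/mol")
```

If the residual from `rms_error` stays well above the 0.2 kcal/mol target with these restricted terms, that is a signal to reconsider which dihedrals are being fitted rather than to free the phases.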
5: Iteration
• Now that you have your molecule, re-run all
the tests and check if discrepancies have arisen
during the process
• The molecular vibrational spectrum should
look better
– Dihedrals factor into lower vibrational modes
• Adjust as necessary.
5: Comments
• You can spend as much time as you wish
tweaking the numbers, but keep in mind that:
– Simulation error is going to be larger than the
residual errors in your parameters
– Even experimental accuracy is 1 kcal/mol
• If it is convenient, post the major results and
check with the developers (be nice)
Finished product
• Once you are satisfied
with the result, it’s time
to test the new molecule
in MD
• As force-fields are
in constant
development...
– if you did your
parameterisation well,
CHARMM developers may
add it to the collection.
• Good luck!