structural bioinformatics

MSc in Bioinformatics
Module 2: Core Bioinformatics
Lesson 7.3
Structural Bioinformatics
Molecular Modelling tools
Jean-Didier Maréchal
The Biotechnological Computational Chemistry Team
Department of Chemistry (UAB)
Course 2013-14
General information
• JeanDi…
– Email: jeandidier.marechal@gmail.com
– Webpage: gent.uab.cat/jdidier
– Room: C7/032 (chemistry building)
– Research:
• Enzyme design
• Drug design (novel approaches for HIV, Al, Metabolism)
• Peptide development
• Software development
– Past: Academia, big pharma and spin-off
– From computational chemistry to structural bioinformatics
While the teacher bores me…
- Boot your computer into Linux
- Download the daily build of UCSF Chimera
- Install it
Introduction to Molecular Modeling
• Many atomic-level properties of macromolecules cannot be assessed experimentally
• Molecular modeling tools are key elements of structural bioinformatics
• Molecular modeling aims to provide simulations that reproduce, and ideally predict, the behavior of molecular systems
• To do so, simulations are carried out with models that explicitly represent the atoms in the molecular system
Model I. Physics
• Molecular modeling studies rest on the physical models used for the atomic representation of the systems
• The quality of the results depends directly on the accuracy of the model
• By construction, the results provided by modeling cannot be exact
[Diagram: a physical model of a reality provides a set of mathematical equations defining the model; applied to a given system and solved through computation, they give an estimated behavior, which can be descriptive or predictive.]
Model II. Size does matter...
• Algorithms, computer resources and time intrinsically limit the size of the system that can be treated
• Hence, modeling can also involve reducing the number of structural variables
– Study of only a part of the real system
– Replacement of explicit solvent molecules by a continuum environment
– Coarse-grained approaches
– …
A model is... a model
• The validity of a molecular modeling calculation rests on its approximations
– Size
– Environment
– Physico-chemical conditions
– …
• Results have to be discussed within the framework in which the model applies:
– Do not over-criticize the results
– Do not overstate the outcomes
The early XXIst Century Modeller
• Many models can be used to simulate the atomic behavior of molecules
• Each technique relies on its own approximations, which define its field of applicability
• All of them are based on estimating the energy of a given spatial arrangement of atoms and searching for the stable, metastable and transition structures
The Potential Energy
• The potential energy, E, is a function of the coordinates (positions), R, of all the atoms in the system
• To a given geometry of the system corresponds a unique value of potential energy (if no electronic changes are involved)
• The entire map of the potential energy of a system as a function of its coordinates is called the Potential Energy Surface (PES)
• Because of the high dimensionality of the entire PES, studies and analyses are generally simplified to a reduced number of variables
• The question is how to calculate the energy and how to explore the PES
[Figures: conformational energy (kcal/mol) plotted against dihedral angles in degrees, as a one-dimensional profile and as a two-dimensional map; the axis labels include the dihedrals C(2)-C(4)-C(6)-C(11), N(7)-C(6)-C(5)-C(4) and C(5)-C(6)-N(7)-C(8).]
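As a minimal sketch of a PES reduced to a single variable, the snippet below scans one dihedral angle and reports the grid points that are local minima. The three-fold cosine potential and its barrier height `v3` are illustrative assumptions, not parameters taken from the slides.

```python
import math

def torsion_energy(phi_deg, v3=3.0):
    """Three-fold torsional potential in kcal/mol (v3 is an illustrative value)."""
    return 0.5 * v3 * (1.0 + math.cos(3.0 * math.radians(phi_deg)))

def scan_local_minima(energy, step=5):
    """Scan a periodic dihedral from -180 to 175 degrees and return local minima."""
    angles = list(range(-180, 180, step))
    values = [energy(a) for a in angles]
    n = len(angles)
    minima = []
    for k in range(n):
        # Compare with both neighbours, wrapping around because the angle is periodic.
        if values[k] < values[k - 1] and values[k] < values[(k + 1) % n]:
            minima.append(angles[k])
    return minima

minima = scan_local_minima(torsion_energy)
print(minima)
```

Even this one-dimensional profile has several wells, which is why the choice of starting geometry matters in the sections that follow.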
Some key questions for Molecular Modeling
• Characterization of the most stable conformations of a system
• Atomic description of the dynamical properties of the protein
• In silico determination of the structural features of a protein (e.g. homology modeling)
• Decoding the nature of the interactions between biomolecules
• Determination of catalytic processes
Some examples in this lesson
Calculation of the energy
Molecular events
[Diagram labels: pre-organization; fitting/binding; change in chemical state; affine chemical compound; product; good sampling; accurate electronic.]
The three main molecular modeling families
A - Quantum techniques
• Generally solve the Schrödinger equation…
– time independent
– in the Born-Oppenheimer approximation (electronic PES)
• Implies that the structure with the lowest energy is the most occupied over time

$$\hat{H}\,\Psi(x, y, z) = E\,\Psi(x, y, z)$$

where $\hat{H}$ is the Hamiltonian, $E$ the energy and $\Psi$ the wave function.

With the exact Hamiltonian (atomic units, $N$ nuclei and $n$ electrons):

$$\hat{H} = \underbrace{-\sum_{A=1}^{N}\frac{1}{2M_A}\nabla_A^2}_{T_n} + \underbrace{\sum_{A=1}^{N}\sum_{B>A}^{N}\frac{Z_A Z_B}{R_{AB}}}_{V_{nn}} \underbrace{-\sum_{i=1}^{n}\frac{1}{2}\nabla_i^2}_{T_e} + \underbrace{\sum_{i=1}^{n}\sum_{j>i}^{n}\frac{1}{r_{ij}}}_{V_{ee}} \underbrace{-\sum_{i=1}^{n}\sum_{A=1}^{N}\frac{Z_A}{r_{iA}}}_{V_{en}}$$
QM techniques
• Different levels:
– ab initio
– Hückel, extended Hückel, semi-empirical…
– Density Functional Theory
• Techniques used when aiming for high-quality results
• Necessary for processes that change the electronic nature of the system:
– Catalysis
– Changes in covalent bonds
– Changes in coordination bonds
• QM methods still have a high computational cost per atom (time/n_atom ratio)
B - Molecular Mechanics techniques
• The behavior of nuclei and electrons is incorporated in a potential
• The different kinds of atomic forces are parametrized
• Necessary to treat conformational changes of large molecular systems
• Cannot treat changes in chemical nature

$$V_{\text{bond}} = k_b\,(l - l_0)^2 \qquad V_{\text{angle}} = k_\theta\,(\theta - \theta_0)^2$$

$$V_{\text{torsion}} = A\,[1 + \cos(n\tau - \phi)]$$

$$V_{\text{electrostatic}} = \sum_{i}\sum_{j>i}\frac{q_i q_j}{r_{ij}}$$

$$V_{\text{VDW}} = 4\,\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$

[Figure labels: bond length l; partial charges δ+ and δ-.]
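The force-field terms above can be sketched as plain functions. This is an illustration under assumptions: all parameter values in the usage lines are made up, and the electrostatic term carries an explicit conversion constant (about 332.06 kcal·Å/(mol·e²)) so that charges in units of e and distances in Å give kcal/mol, a factor the slide formula leaves implicit.

```python
import math

def bond_energy(l, l0, kb):
    """Harmonic bond stretching: kb * (l - l0)^2."""
    return kb * (l - l0) ** 2

def angle_energy(theta, theta0, k_theta):
    """Harmonic angle bending: k_theta * (theta - theta0)^2 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2

def torsion_energy(tau, amplitude, n, phase):
    """Periodic torsion: A * [1 + cos(n * tau - phase)] (angles in radians)."""
    return amplitude * (1.0 + math.cos(n * tau - phase))

def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term: 4 * eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, qi, qj, k_e=332.06):
    """Pairwise electrostatics; k_e converts e^2/Angstrom to kcal/mol."""
    return k_e * qi * qj / r

# Illustrative values only: a pair of atoms with sigma = 3.4 A and eps = 0.24 kcal/mol.
r_min = 2.0 ** (1.0 / 6.0) * 3.4          # distance of the Lennard-Jones minimum
print(lennard_jones(r_min, 0.24, 3.4))     # well depth, close to -epsilon
print(coulomb(3.0, 0.4, -0.4))             # attraction between opposite partial charges
```

Every term is a cheap closed-form expression of the geometry, which is what makes MM fast, and none of them can describe bond breaking, which is why MM cannot treat changes in chemical state.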
QM vs. MM
• For the same number of atoms, conformational explorations are much faster with MM than with QM approaches
• This speed makes it easy to treat the system time-dependently with MM
• Some techniques allow large conformational motions to be explored
• MM cannot treat changes in the chemical state of the system
C – QM/MM methods
• Enzymes perform catalysis at their active site
• The protein environment has some impact on the active centre:
– Steric
– Electrostatic
– …
• Modeling the entire system requires both QM and MM approximations
• Hybrid QM/MM
– Part of the protein is treated with MM techniques
– The key region (for example, the catalytic center) is treated with QM
[Summary figure. Problems: folding, sampling, recognition, catalysis, motions, transition metals. Methods: docking, homology modeling, normal mode analysis, molecular dynamics, QM and QM/MM. Two poles along the axis "simplicity of calculation of E": fine electronics = quantum based; wide space = approximate energy.]
Exploration of the PES
A multidimensional problem
The number of local minima typically increases exponentially with the number of variables (degrees of freedom).
Possible conformations (3^n) for linear alkanes CH3(CH2)n+1CH3:

n      Possible conformations (3^n)
1      3
2      9
5      243
10     59,049
15     14,348,907
100    ?

• Combinatorial explosion problem
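The count in the table can be reproduced in a few lines, assuming three staggered states per rotatable torsion as the slide does. The function name is ours, chosen for illustration.

```python
def conformer_count(n_torsions, states_per_torsion=3):
    """Number of combinations when each torsion can adopt a fixed number of states."""
    return states_per_torsion ** n_torsions

for n in (1, 2, 5, 10, 15, 100):
    print(n, conformer_count(n))
```

For n = 100 the result is a 48-digit number (about 5 × 10^47), far beyond what any exhaustive enumeration could visit.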
Minimization
[Figure: energy E plotted against a coordinate r, showing a local minimum, a transition state and the global minimum, with question marks along the curve.]
Minimization: General Scheme
1. Start at an initial point and calculate E
2. Determine, according to a fixed rule, a direction of movement
3. Move in that direction to a (hopefully) lower-energy structure
4. At the new point, a new direction is determined and the same process is repeated
The primary difference between algorithms is the rule by which successive directions of movement are selected.
[Flowchart: starting from coordinates {x}0, the energy, gradient and Hessian feed a search algorithm that produces new coordinates {x}1; a "Converged?" test loops back on NO and ends with the optimized structure on YES.]
Optimizers main families
• Methods without derivatives (hard to know which way to move to reduce the function value); not efficient
– Simplex method
– Sequential univariate method
• Methods with derivatives (use them to move toward the minimum)
– Line optimization
• Golden section (golden mean) method
• Parabolic optimization
– First-derivative methods
• Steepest Descent
• Conjugate Gradient
– Second-derivative methods
• Newton-Raphson

The gradient can be estimated by finite differences, with a small displacement d along each unit vector e_k:

$$\nabla f(x) = \begin{pmatrix} \partial f/\partial e_1 \\ \partial f/\partial e_2 \\ \partial f/\partial e_3 \end{pmatrix} \approx \begin{pmatrix} [f(x + d\,e_1) - f(x)]/d \\ [f(x + d\,e_2) - f(x)]/d \\ [f(x + d\,e_3) - f(x)]/d \end{pmatrix}$$
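A minimal sketch of the forward-difference gradient estimate, checked on a test function whose analytic gradient is known. The test function and the displacement `d` are illustrative choices.

```python
def numerical_gradient(f, x, d=1e-6):
    """Forward-difference estimate of the gradient of f at point x (a list of floats)."""
    fx = f(x)
    grad = []
    for k in range(len(x)):
        shifted = list(x)
        shifted[k] += d                      # displace along unit vector e_k only
        grad.append((f(shifted) - fx) / d)
    return grad

def quadratic(x):
    """Illustrative test function with analytic gradient (2*x0, 4*x1, 6*x2)."""
    return x[0] ** 2 + 2.0 * x[1] ** 2 + 3.0 * x[2] ** 2

g = numerical_gradient(quadratic, [1.0, 1.0, 1.0])
print(g)   # close to the analytic gradient (2, 4, 6)
```

Each gradient component costs one extra energy evaluation, so analytic derivatives are preferred whenever the energy model provides them.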
Steepest Descent
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda\,h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i\,h_i$ and $i = i + 1$, then go to step 1
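The steps above can be sketched as follows. Two departures from the slide are assumptions of this sketch: the exact line minimization of Step 2 is replaced by a simple backtracking search (halve the step until the energy decreases enough), and the stopping test uses a small gradient-norm tolerance instead of an exact zero. The test function is illustrative.

```python
def steepest_descent(f, grad, x0, tol=1e-6, max_iter=10_000):
    """Minimize f from x0 by moving along the negative gradient at each iteration."""
    x = list(x0)
    for i in range(max_iter):
        g = grad(x)
        gnorm2 = sum(gk * gk for gk in g)
        if gnorm2 ** 0.5 < tol:                       # Step 1: gradient (almost) zero
            return x, i
        h = [-gk for gk in g]                         # search direction h_i = -grad f(x_i)
        fx = f(x)
        lam = 1.0
        # Step 2 (approximate): backtrack until a sufficient decrease is obtained.
        while lam > 1e-12 and f([xk + lam * hk for xk, hk in zip(x, h)]) > fx - 1e-4 * lam * gnorm2:
            lam *= 0.5
        x = [xk + lam * hk for xk, hk in zip(x, h)]   # Step 3: move to the new point
    return x, max_iter

def f(x):
    """Illustrative anisotropic quadratic well with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

x_opt, n_iter = steepest_descent(f, grad_f, [0.0, 0.0])
print(x_opt, n_iter)
```

On this narrow valley the method needs many small steps near the minimum, which illustrates why steepest descent is used mainly for the early relaxation of a structure.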
Conjugate Gradient
• The basic idea: decompose the n-dimensional quadratic problem into n one-dimensional problems
• This is done by exploring the function along “conjugate directions”
• CG finds the minimum of an N-dimensional quadratic function in at most N steps (in exact arithmetic). Non-quadratic functions take longer, but smooth functions are approximately quadratic near their minimum, so CG remains efficient
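The N-step property can be checked on a small quadratic function f(x) = ½ xᵀA x − bᵀx, whose minimum solves A x = b. The sketch below is the linear conjugate gradient method in plain Python; the 3×3 symmetric positive-definite matrix is an arbitrary example.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(a, b, x0, tol=1e-10):
    """Minimize 0.5*x.A.x - b.x for symmetric positive-definite A, in at most len(b) steps."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(a, x))]   # residual = minus the gradient
    p = list(r)                                          # first direction: steepest descent
    steps = 0
    while dot(r, r) ** 0.5 > tol and steps < len(b):
        ap = matvec(a, p)
        alpha = dot(r, r) / dot(p, ap)                   # exact line minimum along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)] # next direction, conjugate to the previous ones
        r = r_new
        steps += 1
    return x, steps

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_min, n_steps = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x_min, n_steps)
```

Each new direction is built so that it does not undo the progress made along the previous ones, which is what removes the zig-zagging typical of steepest descent.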
Practical aspects
• SD is generally applied at the beginning of the optimization and CG at the end.
• Other methods (e.g. Newton-Raphson) are even more efficient when the minimum has to be located accurately.
• In many cases, it is worth starting from several different structures.
• Verify with a frequency calculation that the final structure is indeed a minimum.
• For macromolecules:
– Minimization is used to relax the structure, not to find the exact absolute minimum
– The minimization generally ends in a local minimum
Exercise 1.
Let's get minimized
Minimization of cyclosporin A
From wells to wells
More than a structure
• Minimization only provides one structure: the minimum closest to a given starting point
• As the number of degrees of freedom of the system increases, the number of minima increases
• Exploring the PES is therefore not trivial
• Numerous methodologies aim at exploring the conformational space:
– To locate the best minimum
– To extract statistical data with thermodynamic meaning
Some common conformational exploration schemes
• Monte Carlo: applies random changes to the structure and evaluates their energetic cost. Low-energy structures are kept, and in the Metropolis scheme uphill moves are also accepted with a Boltzmann probability, so the resulting ensembles are statistically relevant
• Genetic Algorithms: structural changes mix random and evolution-guided moves. Only low-energy structures are kept, based on survival criteria
• Simulated Annealing: heat the system so that it can jump over barriers, then cool it down to find the lowest-energy structures
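A minimal sketch of Metropolis Monte Carlo on a one-dimensional double well illustrates the first scheme. The energy function, the thermal energy `kT`, the maximum displacement and the random seed are all illustrative assumptions.

```python
import math
import random

def double_well(x):
    """Illustrative energy with two minima at x = -1 and x = +1 and a barrier at x = 0."""
    return (x * x - 1.0) ** 2

def metropolis(energy, x0, n_steps, kT=0.3, max_move=0.5, seed=0):
    """Sample configurations with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_move, max_move)     # random structural change
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves with probability exp(-dE/kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)                                # a rejected move repeats the old state
    return samples

samples = metropolis(double_well, x0=-1.0, n_steps=20000)
fraction_right = sum(1 for s in samples if s > 0.0) / len(samples)
print(fraction_right)
```

Lowering `kT` gradually during the run instead of keeping it fixed turns the same loop into a simple simulated annealing scheme.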
Molecular Dynamics
• Calculates the motion of the atoms using Newtonian dynamics
• Determines the net force and acceleration experienced by each atom
• Several algorithms are used to calculate the displacements of the atoms over time (Verlet, leapfrog…)
• Like MC, MD allows statistical analysis
Time steps
• Knowledge of the atomic forces and masses can be used to solve for the position of each atom along a series of extremely small time steps (on the order of femtoseconds, 10^-15 seconds).
• The resulting series of snapshots of structural changes over time is called a trajectory.
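To make the time-stepping idea concrete, here is a minimal velocity Verlet integrator applied to a single particle on a harmonic spring. It is a sketch in reduced units (mass, force constant and time step are illustrative numbers, not femtoseconds or real atomic masses).

```python
def velocity_verlet(force, x0, v0, mass, dt, n_steps):
    """Propagate one particle in one dimension and return the trajectory [(t, x, v), ...]."""
    x, v = x0, v0
    f = force(x)
    trajectory = [(0.0, x, v)]
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass     # half-step velocity update with the old force
        x = x + dt * v_half                  # full-step position update
        f = force(x)                         # force at the new position
        v = v_half + 0.5 * dt * f / mass     # second half-step velocity update
        trajectory.append((step * dt, x, v))
    return trajectory

def total_energy(x, v, mass, k):
    """Kinetic plus harmonic potential energy."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

k, m = 1.0, 1.0                              # reduced units, illustrative values
traj = velocity_verlet(lambda x: -k * x, x0=1.0, v0=0.0, mass=m, dt=0.01, n_steps=1000)
energies = [total_energy(x, v, m, k) for _, x, v in traj]
print(min(energies), max(energies))
```

The total energy stays nearly constant along the trajectory as long as the time step is small compared with the fastest motion in the system, which is why atomistic simulations need steps of about a femtosecond.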
Time scale
Biological molecules exhibit a wide range of time scales over which specific processes occur; for example:
Local Motions (0.01 to 5 Å, 10^-15 to 10^-1 s)
– Atomic fluctuations
– Sidechain motions
– Loop motions
Rigid Body Motions (1 to 10 Å, 10^-9 to 1 s)
– Helix motions
– Domain motions (hinge bending)
– Subunit motions
Large-Scale Motions (> 5 Å, 10^-7 to 10^4 s)
– Helix-coil transitions
– Dissociation/association
– Folding and unfolding
The meaning of trajectories
• Vibrations in proteins vary widely in energy
• Low-frequency vibrations correspond to collective motions of the protein
• High-frequency vibrations correspond to localized motions
Relationship accuracy / time scale
• Computing numerous structures (energy, gradient, forces, etc.) becomes increasingly resource-demanding as the quality of the energetic model increases
• Force field approaches are simplified enough that calculations can be performed on a very wide conformational and chemical space
• Simulations can nowadays be performed on solvated systems and for long runs
Exercise 2
Molecular Dynamics of Cyclosporin A
Homology Modeling
The problem
• Most biochemical projects (drug design, enzyme design, etc.) require the three-dimensional structure of the physiological target
• Experimental structure determination (NMR or X-ray) is not always feasible
• Computational tools have been set up to produce models of proteins for further study
– Ab initio
– Comparative/homology modeling
The grounds of comparative modeling
• From the mid-80s, studies showed that:
– Proteins with SeqID above 80% mainly show differences in fold within the range of experimental error
– Down to 20-30% SeqID, proteins share a strong structural similarity
– Below this threshold, proteins may or may not be structurally related
• With a good enough SeqID and alignment, modeling can produce accurate models.
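Sequence identity is computed from an existing alignment. The sketch below uses one common convention, identical residues divided by the number of aligned (non-gap) columns; other denominators are in use, so the exact percentage depends on the definition. The sequences are made up for illustration.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Hypothetical target/template fragments, already aligned.
target = "MKT-AYIAKQR"
template = "MKSLAYLAKQR"
seq_id = sequence_identity(target, template)
print(round(seq_id, 1))
```

A value computed this way can then be compared with the 80% and 20-30% thresholds listed above to judge how reliable a template is likely to be.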
The general methodology
Required material:
• The structure of a “close” parent
• The sequence of the target protein
• A sequence alignment program (e.g. ClustalW, T-Coffee)
• A homology modeling program (e.g. Modeller)
A framework like Modeller
Step 1 and 2: search for templates and sequence alignments
1. template recognition
2. alignment
3. alignment correction
Step 3: generation of the main chain model
4. backbone generation
5. generation of canonical loops (data based)
Step 4 and 5: optimization of side chains and flexible parts
6. side chain generation plus optimisation
7. ab initio loop building (energy based)
Step 6: full relaxation
8. overall model optimisation (energy minimisation)
9. model verification with optional repeat of previous steps
The success and the limitations
• When SeqID is high, the method is generally highly efficient.
• In dark regions, or where structural assignment is difficult, homology modeling can be helped by secondary structure prediction programs
• Moreover, multiple alignments can be particularly useful
• HM methods are regularly updated with tools that improve the evaluation of model quality and better explore the conformational space of flexible regions (e.g. loops)
Post Modeling
• The accuracy of the model has to be checked.
• The checks are generally the same as those applied to experimental structures
– Procheck (http://biotech.embl-ebi.ac.uk:8400/)
• Check for protein stereochemistry
– MolProbity (http://molprobity.biochem.duke.edu/)
• Ramachandran plot, bond length etc
– Verify3D (http://www.doembi.ucla.edu/Services/Verify_3D/)
• Check sequence vs structure
Exercise 3
Not quite!
[Recap of the summary figure. Problems: folding, sampling, recognition, catalysis, motions, transition metals. Methods: docking, homology modeling, normal mode analysis, molecular dynamics, QM and QM/MM. Two poles along the axis "simplicity of calculation of E": fine electronics = quantum based; wide space = approximate energy.]