Homology modelling of proteins

Definition: Prediction of the three-dimensional structure of a target protein from its
amino acid sequence, using as a template a homologous protein for which an X-ray or NMR
structure is available.
Synonyms: Comparative modelling & Knowledge-based modelling.
Protein Structure Modelling
Three approaches to structure prediction:
a. Ab initio prediction
(no known homology with any sequence of known structure) Given only the
sequence, predict the 3D structure from “first principles”, based on
energetic or statistical principles.
b. Sequence-Structure Threading
Given the sequence, and a set of folds observed in PDB, see if the
sequence could adopt one of the known folds.
c. Homology Modelling
Given a sequence with > 25% sequence identity to a protein of known structure in PDB, use
the known structure as a template to create a 3D model from the sequence.
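The 25% guide above can be checked directly on a pairwise alignment. The following is a minimal sketch, assuming identity is counted over aligned (non-gap) positions only; other definitions of the denominator exist, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) positions at which the residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


# Hypothetical aligned sequences (not real proteins).
target = "MKV-LAAGIT"
template = "MRVALA-GLT"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity; above the 25% guide: {identity > 25.0}")
```

Here 6 of the 8 aligned positions match, so the sketch reports 75.0%.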
Various ways of homology modelling
• One structure as main template (I will illustrate this here).
• Fragment-based modelling: a protein structure can be built from a combination
of segments from other proteins. The program Composer depends on the
assembly of rigid fragments.
Ab initio modelling
There are two components to ab initio prediction:
• devising a scoring (i.e. energy) function that can distinguish correct structures
from incorrect ones
• a search method to explore the conformational space.
In many methods, the two components are coupled together such that a search
function drives, and is driven by, the scoring function to find native-like structures.
BUT
an adequate scoring function is difficult to formulate,
and searching conformational space with it requires formidable computational effort
BECAUSE
a fully descriptive energy function must consider interactions between all pairs of
atoms in the polypeptide chain. The number of such pairs grows quadratically with
the number of atoms, while the number of possible conformations grows exponentially with
the number of amino acids in the protein. A full model must also take into account
the vitally important interactions between the protein’s atoms and the environment, which give rise to the
so-called ‘hydrophobic effect’.
For practical reasons, simplifying assumptions must be made.
(you can predict a structure using ab initio techniques on
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.html)
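As a rough illustration of why simplifying assumptions are needed, the sketch below compares two counts: the number of atom pairs an all-pairs energy function must evaluate, which is n(n-1)/2 for n atoms, and the number of backbone conformations a search would have to cover. The figures of about 8 heavy atoms per residue and 3 conformational states per residue are assumptions chosen only to show the scaling.

```python
def atom_pairs(n_atoms: int) -> int:
    """Number of distinct atom pairs: n(n-1)/2, quadratic in n."""
    return n_atoms * (n_atoms - 1) // 2


def conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if each residue has a few discrete states."""
    return states_per_residue ** n_residues


for n_res in (10, 50, 100):
    n_atoms = 8 * n_res  # assumption: roughly 8 heavy atoms per residue
    print(n_res, atom_pairs(n_atoms), f"{conformations(n_res):.2e}")
```

The pair count stays manageable for a single structure; it is the number of structures to be scored that becomes prohibitive.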
Why does Homology (Comparative/knowledge based) modelling work?
Proteins have a limited number of folds: The structure of a new protein can
resemble a known fold even with no apparent sequence similarity.
Why a model?
A model is desirable when either X-ray crystallography or NMR cannot
determine the structure of a protein, in time or at all. Many structure-function
relationships can be deduced from a reasonable model. Indeed, sometimes a modelled
structure can be used for successful drug design.
The 3D structure of a protein can tell us much more about how individual
residues interact to form a functional entity. For example residues that are far away in
a 1D sequence can be very close together in the actual folded protein.
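A minimal sketch of how such sequence-distant but spatially close residues can be listed from a structure or model. It assumes one Cα coordinate per residue, an 8 Å distance cutoff and a minimum sequence separation of 10 residues; both thresholds and the toy hairpin coordinates are illustrative assumptions.

```python
import math


def long_range_contacts(ca_coords, cutoff=8.0, min_separation=10):
    """Return (i, j, distance) for residue pairs far apart in sequence but close in space."""
    contacts = []
    for i in range(len(ca_coords)):
        for j in range(i + min_separation, len(ca_coords)):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= cutoff:
                contacts.append((i, j, d))
    return contacts


# Toy 12-residue "hairpin": one strand out along x, the second strand back at y = 5 Å.
hairpin = [(3.8 * i, 0.0, 0.0) for i in range(6)] + \
          [(3.8 * (5 - i), 5.0, 0.0) for i in range(6)]

for i, j, d in long_range_contacts(hairpin):
    print(f"residues {i} and {j}: {d:.2f} Å apart")
```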
Models can be quite accurate: they form a rational basis for explaining experimental
observations and help in redesigning proteins to improve their function.
Models can be used as starting points in the determination of protein structure
by NMR or X-ray.
Post-genomics – structural genomics
The potential benefits of having a structural model have led to the concept that
the structures of all gene products should be either solved experimentally or
modelled. To model so many proteins, the techniques of producing
accurate alignments and building three-dimensional models from the alignments have
to be fully automated. Sanchez and Šali have automatically modelled a large fraction
of the yeast genome, using their program MODELLER (see later section). But the
process has been limited to only ORF (Open Reading Frame) sequences from yeast
that had a relatively high homology to a three-dimensional template structure.
Automation of techniques for lower sequence homology model building is a step that
still needs to be addressed and considerable effort is being put into this type of
research.
History.
The first homology modelling studies were done using wire and plastic models of
bonds and atoms as early as the 1960s. The models were constructed by taking the
coordinates of a known protein structure and modifying them by hand for those amino acids
that did not match the structure. In 1969 David Phillips, Browne and co-workers
published the first paper on homology modelling. They modelled α-lactalbumin based on the structure of hen egg-white lysozyme. The sequence identity
between these two proteins was 39%. In addition, both proteins contained an identical
pattern of cysteines, suggesting a similar arrangement of disulphide bonds. When the
structure of α-lactalbumin was solved by X-ray crystallography, it was compared to
the model and analysed. The model was essentially correct apart from the C-terminal
ends, which diverge in the structure in any case.
Method.
[Figure: flowchart illustrating the major steps of obtaining a structure from a sequence. Protein sequence; database searches, sequence alignment and secondary structure prediction; decision "Good structure homologue?" (No: improve alignment using secondary structure prediction; Yes: homology modelling); minimisation; check model; three-dimensional structure.]
Steps in molecular modelling:
1. Identification of structures that will form the template for the target structure
(model).
2. Alignment – the most important step. Alignment of low homology sequences
can be improved using secondary structure prediction (align-model-realign-remodel).
3. Transfer of the coordinates of structurally conserved regions (SCRs) from the
template(s) to the target, using either
- the many-fragment method, or
- a single structure.
4. Modelling variable regions
• Loops
• Insertions: search of a high-resolution fragment database
• Deletions: local minimisation often sufficient.
5. Modelling side chains (practically a virtual step)
6. Minimisation:
• Local – especially loop-hinge regions
• Global.
7. Molecular Dynamics: To study regional flexibility.
8. Checking the correctness of the model.
• Correctness of the overall fold by:
- Bad: non-polar side chains exposed to the solvent.
- Bad: buried ionizable groups.
- Conformational energy calculations – incorrect folds have high
solvation energy.
- Lüthy’s method.
• Stereochemical properties: PROCHECK
- Bond angles
- Bond lengths
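The two "bad" signs in the fold check above can be screened for automatically once a solvent accessibility value is available for each residue. The sketch below assumes per-residue relative accessibilities between 0 and 1 have already been computed by some other tool; the residue groupings and the 0.5 and 0.1 thresholds are illustrative assumptions, not values from any particular program.

```python
NON_POLAR = set("AVLIMFWC")  # assumption: one common grouping of non-polar residues
IONIZABLE = set("DEKRH")     # assumption: residues with ionizable side chains


def flag_suspect_residues(sequence, rel_accessibility, exposed=0.5, buried=0.1):
    """Flag exposed non-polar and buried ionizable residues (1-based positions)."""
    if len(sequence) != len(rel_accessibility):
        raise ValueError("need one accessibility value per residue")
    flags = []
    for pos, (res, acc) in enumerate(zip(sequence, rel_accessibility), start=1):
        if res in NON_POLAR and acc >= exposed:
            flags.append((pos, res, "exposed non-polar"))
        elif res in IONIZABLE and acc <= buried:
            flags.append((pos, res, "buried ionizable"))
    return flags


# Hypothetical five-residue stretch with made-up accessibilities.
print(flag_suspect_residues("LKDAF", [0.8, 0.6, 0.05, 0.2, 0.02]))
```

A flagged residue is a prompt to look at the model, not proof of an error: a buried charge that pairs with another charge, for example, can be perfectly reasonable.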
Modelling using the restraint-based method
• Distance restraints (Havel & Snow 1991)
• Structural feature restraints (Sali & Blundell 1993)
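To make the idea of a restraint concrete, the sketch below scores a set of coordinates against lower and upper distance bounds, such as might be derived from a template. It is a generic illustration under assumed inputs, not the algorithm of either paper cited above; the function name and the squared-violation penalty are choices made here.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared violations of distance bounds.

    restraints: list of (i, j, lower, upper), with distances in Å.
    A distance inside [lower, upper] contributes nothing.
    """
    total = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            total += (lower - d) ** 2
        elif d > upper:
            total += (d - upper) ** 2
    return total


# Three points on a line, 3.8 Å apart (hypothetical Cα positions).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
restraints = [(0, 1, 3.7, 3.9),   # satisfied
              (0, 2, 5.0, 7.0)]   # violated: 7.6 Å is 0.6 Å above the upper bound
print(restraint_violation(coords, restraints))
```

A model-building procedure would then adjust the coordinates to drive this score towards zero.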
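Modelling of Loops
[Figure: modelling a 5-residue insertion. A database is searched for 9-residue fragments, i.e. the insertion plus anchor points of 2 residues on each side; the chosen fragment is fitted at the anchor points and annealed.]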
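The database search for loop fragments can be sketched as follows. Each candidate is a 9-residue fragment (5 loop residues plus 2 anchor residues at each end), and candidates are ranked by how closely the distances between their four anchor Cα atoms match the distances between the anchor Cα atoms in the model. Comparing distances rather than superposing coordinates is a simplification chosen here; the fragment names and coordinates are invented.

```python
import math
from itertools import combinations


def anchor_distances(points):
    """All pairwise distances between a small set of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def anchor_mismatch(model_anchors, fragment, n_anchor=2):
    """RMS difference between inter-anchor distances in the model and in a fragment."""
    frag_anchors = fragment[:n_anchor] + fragment[-n_anchor:]
    dm = anchor_distances(model_anchors)
    df = anchor_distances(frag_anchors)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(dm, df)) / len(dm))


def best_fragment(model_anchors, fragments, n_anchor=2):
    """Name of the fragment whose anchors best match the model's anchors."""
    return min(fragments, key=lambda name: anchor_mismatch(model_anchors, fragments[name], n_anchor))


# Anchor Cα positions in the model: 2 residues before and 2 after the insertion.
model_anchors = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 6.0, 0.0), (0.0, 6.0, 0.0)]

# Hypothetical fragment database: name -> 9 Cα coordinates.
fragments = {
    "hairpin_like": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.5, 0.0), (9.5, 1.5, 0.0),
                     (10.5, 3.0, 0.0), (9.5, 4.5, 0.0), (7.0, 5.5, 0.0),
                     (3.8, 6.0, 0.0), (0.0, 6.0, 0.0)],
    "extended": [(3.8 * i, 0.0, 0.0) for i in range(9)],
}
print(best_fragment(model_anchors, fragments))
```

Distance matching alone cannot tell a fragment from its mirror image, so a real procedure would still superpose the chosen fragment on the anchors and refine it.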
Modelling of Side Chains
Side chains adopt distinct conformations that depend on the backbone structure.
This observation gave rise to ROTAMER libraries, which are used in modelling
procedures.
• Same side chain (S.C.): conformer taken from the template.
• Partial similarity: most of the S.C. built on the template.
• Substitution: built based on rotamer library & energetics.
Minimisation
• LOCAL: Minimise a fragment, usually a loop and its anchor regions, as these
often have bad geometries. First minimise without the influence of the surrounding
structure, then take the surrounding structure into account.
• GLOBAL: Minimise the whole protein (& H2O), mainly to relieve short contacts
and to rectify bad geometry, such as bond angles, peptide planarity etc. Problems
with minimisation are local minima (egg box) and approximations.
• Dynamics: often local, to study the movement of a particular loop and/or
improve its geometry.
[Figure: the local minima problem of minimisation (axis label: Energy).]
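The local minima problem can be reproduced on a one-dimensional toy energy function. The sketch below uses plain steepest descent on E(x) = x^4 - 3x^2 + x, a function invented for illustration with two minima of different depth; the step size and number of steps are arbitrary choices.

```python
def energy(x: float) -> float:
    """Toy energy surface with one shallow and one deep minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x: float) -> float:
    """Derivative of the toy energy with respect to x."""
    return 4 * x**3 - 6 * x + 1


def minimise(x: float, step: float = 0.01, n_steps: int = 5000) -> float:
    """Steepest descent: repeatedly move downhill along the gradient."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


left = minimise(-2.0)   # start on the left of the central barrier
right = minimise(2.0)   # start on the right of the central barrier
print(f"from -2.0: x = {left:.2f}, E = {energy(left):.2f}")
print(f"from +2.0: x = {right:.2f}, E = {energy(right):.2f}")
```

The two runs end near x = -1.30 and x = 1.13 respectively. The second is only a local minimum: the minimiser cannot climb the barrier between them, which is why the starting structure matters so much.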
Accuracy.
Generally, the accuracy of a model depends on the initial sequence alignment and
the percentage sequence identity of the target to the template. Most errors occur in the loop or
variable regions of the model.
Check structural integrity of model
• Check the correctness of the overall fold
Look at the distribution of polar (charged) and
hydrophobic residues on the surface and inside
the protein. Buried charges must interact.
• Detect local errors
• Check stereochemical parameters like bond lengths, bond angles and short
contacts.
Ramachandran plot.
PROCHECK.
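A Ramachandran plot is built from two backbone dihedral angles per residue: φ, defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ, defined by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle from four points using only the standard library; it is a generic geometric routine, not PROCHECK's code, and the test points are artificial.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Artificial test geometry: cis (0 degrees) and trans (180 degrees) arrangements.
cis = dihedral((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
trans = dihedral((1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, -1.0, 0.0))
print(round(abs(cis), 1), round(abs(trans), 1))
```

Applied to every residue of a model, the resulting (φ, ψ) pairs can be plotted and residues in unusual regions inspected by hand.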
Automatic modelling
Swiss-Model: free, Web and local. http://www.expasy.ch/swissmod/
ESyPred3D: free, Web. http://www.fundp.ac.be/urbm/bioinfo/esypred/
WhatIf: commercial ($$), local.
Modeller: Unix machines, quite difficult to learn.
How does Swiss-Model work? An introduction:
For a complete reference, see the web site documentation.
Step 1
Swiss-Model first does a database search for homologous proteins. Then it
superposes all the structures it finds.
Step 2
It generates a multiple alignment of the sequence to be modelled and all the
homologous structures.
Step 3
It generates a 3D framework for the target protein sequence.
• Atoms that occupy a similar spatial area and are aligned to the target
sequence are used to compute the averaged atomic positions of the
framework from which the target will be built.
• Side chains with incorrect geometries are removed.
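The averaging in Step 3 can be pictured with the sketch below. It assumes the templates have already been superposed and that each atom is keyed by the target alignment position and atom name; it takes a plain unweighted mean, whereas the server itself may weight templates differently. The data structure and coordinates are hypothetical.

```python
def average_framework(superposed_templates):
    """Average corresponding atom positions across superposed templates.

    Each template is a dict mapping (target_position, atom_name) -> (x, y, z).
    Atoms missing from a template are simply left out of that atom's average.
    """
    sums = {}
    for template in superposed_templates:
        for key, (x, y, z) in template.items():
            sx, sy, sz, n = sums.get(key, (0.0, 0.0, 0.0, 0))
            sums[key] = (sx + x, sy + y, sz + z, n + 1)
    return {key: (sx / n, sy / n, sz / n) for key, (sx, sy, sz, n) in sums.items()}


# Two hypothetical templates; the second also covers target position 3.
template_a = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (3.8, 0.0, 0.0)}
template_b = {(1, "CA"): (0.2, 0.0, 0.0), (2, "CA"): (3.6, 0.2, 0.0), (3, "CA"): (7.4, 0.0, 0.0)}

framework = average_framework([template_a, template_b])
for key in sorted(framework):
    print(key, framework[key])
```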
Step 4 Building of insertions or loops.
Swiss-Model uses two techniques:
The first is the fragment database search described earlier.
The second works from first principles; in other words, it searches conformational space to build
loops, where:
• it uses 7 allowed φ/ψ combinations
• adequate space is allocated for the loop
• space is allocated for each α-carbon
Both methods exclude loops in conflict with the structure.
Step 5 Side chain building
Swiss-Model uses a library of allowed side-chain rotamers.
First, the distorted but otherwise complete side chains are corrected.
Then the incomplete side chains are built with a probabilistic approach using the
rotamers. A van der Waals exclusion test and dihedral angle constraints can be used
to select the “best” side chain conformation.
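A minimal sketch of rotamer selection with a van der Waals style exclusion test. Each candidate rotamer carries a library probability and the coordinates of its side-chain atoms; the most probable rotamer that does not come too close to any surrounding atom is chosen. The single 3.0 Å cutoff, the data layout and the toy coordinates are assumptions made here, and the dihedral angle constraints mentioned above are not modelled.

```python
import math


def has_clash(atoms, environment, min_distance=3.0):
    """True if any side-chain atom lies closer than min_distance to an environment atom."""
    return any(math.dist(a, e) < min_distance for a in atoms for e in environment)


def choose_rotamer(rotamers, environment, min_distance=3.0):
    """Return the most probable (probability, atoms) rotamer that passes the clash test.

    Returns None if every rotamer clashes.
    """
    for prob, atoms in sorted(rotamers, key=lambda r: r[0], reverse=True):
        if not has_clash(atoms, environment, min_distance):
            return prob, atoms
    return None


# Hypothetical one-atom "side chains" pointing along x, y and z.
rotamers = [(0.6, [(1.5, 0.0, 0.0)]),
            (0.3, [(0.0, 1.5, 0.0)]),
            (0.1, [(0.0, 0.0, 1.5)])]
environment = [(3.0, 0.0, 0.0)]  # a neighbouring atom blocking the x direction

print(choose_rotamer(rotamers, environment))
```

In this toy case the most probable rotamer is rejected because it sits 1.5 Å from the neighbouring atom, so the second most probable one is returned.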
Step 6 Minimisation
Step 7
The correctness of the structure is checked by analysing the conformational space of
each residue energetically.
It is also checked by comparing the packing density of
the model with what is expected.
Automatic v Manual
Is our target protein homologous enough for an automatic procedure?
First, a template has to be found in a sequence search.
Otherwise you can use PDB-viewer with your own sequence/structure.
Even with some manual input, will we get a good enough structure?
Sometimes; at other times, only exhaustive manual modelling will do.
We need to decide what the model is going to be used for.
Is it to look at, e.g., mutations, or do we want to do docking of ligands?
Do we really want to use an averaged template structure?
Some structures can distort an averaged template.
Note:
Modelling is not the end of the “Experiment” - it is the means for further theoretical
studies.
It gives us a 3D representation of a sequence alignment with the gaps filled in.
It can be used further in structure-based ligand design if the model is accurate enough.
It can suggest residues to mutate and these mutations can be further studied both
theoretically and biochemically.
It can be used to understand the function of the protein better.
Further Reading & References:
General:
Protein Structure Prediction – A practical approach.
Ed: Michael J. E. Sternberg. IRL Press. 1996. ISBN: 0-19-963496-3.
Browne, W.J. et al. 1969. J. Mol. Biol. 42, 65.
Greer, J. 1981. J. Mol. Biol. 153, 1027.
Havel, T.F. & Snow, M.E. 1991. J. Mol. Biol. 217, 1.
Sali, A. & Blundell, T.L. 1993. J. Mol. Biol. 234, 779.
Finan, P., Koga, H., Zvelebil, M.J., Waterfield, M.D. & Kellie, S. 1996. J. Mol. Biol. 261, 173.