Protein Structures - the University of California, Davis

advertisement
Protein Structures:
Experiments and Modeling
Patrice Koehl
Structural Bioinformatics: Proteins
Proteins: Sources of Structure Information
Proteins: Homology Modeling
Proteins: Ab initio prediction
Proteins: Quality of Structures/Models
Structural Bioinformatics: Proteins
Proteins: Sources of Structure Information
Proteins: Homology Modeling
Proteins: Ab initio prediction
Proteins: Quality of Structures/Models
Proteins: Finding the Primary Structure
Methods for finding the sequence of a protein:
-Translating gene sequence
- For proteins from prokaryotes, direct translation
- For proteins from eukaryotes, we need the sequence of
mRNA or cDNA
-Edman degradation
limited to “small” proteins, up to 50 amino acids
for automated sequencer
-Mass spectrometry
Proteins: Finding the Primary Structure
(Phenyl isothiocyanate or PITC)
(Trifluoroacetic acid, TFA)
(http://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_sequence_determination_techniques)
Proteins: Finding the Tertiary Structure
Methods for finding the 3D structure of a protein:
- Circular Dichroism
(low resolution; provides information on secondary structure)
-X-ray crystallography
(high resolution; finds structure of a protein in a crystal)
- NMR spectroscopy
(high resolution; finds structure of a protein in solution)
Proteins: Circular Dichroism
Circular dichroism (CD) spectroscopy
measures differences in the absorption
of left-handed polarized light versus
right-handed polarized light which
arise due to structural asymmetry.
Different secondary structures in
proteins have different CD spectra
as they have different asymmetry.
CD therefore can detect secondary
structures in protein
Proteins: X-ray Crystallography
Bragg’s Law:
2dsin(q ) = nl
Intensity
From the “pattern of
diffraction”, i.e. the maximum
of intensities observed, we
can find the angles of
diffraction and for each
angle we get the corresponding d
using Bragg’s law
Angle
General principle of X-ray
crystallography applied to
proteins:
1) We need a crystal
2) From the diffraction pattern,
we get the crystal organization
3) From the diffraction intensities,
we get the electron densities
4) Once the electron density map
we fit a structure that matches
with this density
5) From the atomic model, we
can compute a theoretical
diffraction map; if it matches
with the experimental one,
we are done; otherwise refine
Getting the Diffraction Pattern
Rosalyn Franklin:
Diffraction pattern for DNA
From Diffraction to Electron Density Map
3D Fourier Transform
One hidden problem: diffraction patterns provide intensities; for
Fourier transform, need intensity and phase. A significant step in
X-ray crystallography is the solve the “phase problem”.
Fitting the structure:
Influence of the resolution
2.6 Å resolution
1.2 Å resolution
Resolution of X-ray structures
Resolution (Å) Meaning
>4.0
3.0 - 4.0
2.5 - 3.0
2.0 - 2.5
1.5 - 2.0
0.5 - 1.5
Individual coordinates meaningless
Fold possibly correct, but errors are very likely.
Fold likely correct except that some surface loops
might be mismodelled.
Many small errors can normally be detected.
Fold normally correct and number of errors in
surface loops is small.
Water molecules and small ligands become visible.
Many small errors can normally be detected.
Folds are extremely rarely incorrect, even in surface
loops.
In general, structures have almost no errors at this
resolution. geometry studies are made from these
structures.
(http://en.wikipedia.org/wiki/Resolution_(electron_density)
Large molecular assemblies:
X-ray crystallography and Cryo-EM
X-ray structure
(180 copies of
the same protein)
(Norwalk virus: http://www.bcm.edu/molvir/norovirus)
Large molecular assemblies:
X-ray crystallography and Cryo-EM
Cryo-EM:
-Microscopy technique;
as such, do not need crystal
(closer to physiological
conditions)
-Not high-resolution enough
to provide atomic details;
used in combination with
modeling
Structural Bioinformatics: Proteins
Proteins: Sources of Structure Information
Proteins: Homology Modeling
Proteins: Ab initio prediction
Proteins: Quality of Structures/Models
Why we need Homology Modeling?

Aim to solve the structure of all proteins: this is
too much work experimentally!

Solve enough structures so that the remaining
structures can be inferred from those
experimental structures

The number of experimental structures needed
depend on our abilities to generate a model.
Sequence Space
Structure Space
Why we need Homology Modeling?
Proteins
with
known
structures
Unknown proteins
Why does Homology Modeling Work?
High sequence
identity
High structure
similarity
Russell et al. (1997) J Mol Biol 269: 423-439
Homology Modeling:
How it works
Homology Modeling: How it works

Find template

Align target sequence
with template

Generate model:
- add loops
- add sidechains

Refine model
Homology Modeling: Input
The query, also called target sequence
The template structure and sequence
(need high quality structure)
The sequence alignment between query and template sequence
(Probably the most important input!)
Homology Modeling: Which program to use?
Wallner B, Elofsson A.
All are not equal: a benchmark of different homology modeling programs.
Protein Sci. 2005 14(5):1315-27.
Homology Modeling: Which program to use?
1) Web service: SwissModel
http://swissmodel.expasy.org/
3 modes:
- fully automatic
- “Alignment mode”: you provide your own target-template
alignment
- “Project mode”: provides an environment to edit alignment
2) Software: Modeller
http://www.salilab.org/modeller/
Probably the best maintained software the homology modeling
Structural Bioinformatics: Proteins
Proteins: Databases of Models
Do not start a homology modeling project
before checking…
– Swiss-Model repository
(http://swissmodel.expasy.org/repository/)
• Companion to the Swiss-Model tools – over 3.0 million
models of protein domains computed from over 2.2
million sequences.
– ModBase
(http://modbase.compbio.ucsf.edu/)
• Companion to Modeller
Structural Bioinformatics: Proteins
Proteins: Sources of Structure Information
Proteins: Homology Modeling
Proteins: Ab initio prediction
Proteins: Quality of Structures/Models
Secondary Structure Prediction

Given a protein sequence a1a2…aN, secondary structure
prediction aims at defining the state of each amino acid
ai as being either H (helix), E (extended=strand), or O
(other) (Some methods have 4 states: H, E, T for turns,
and O for other).

The quality of secondary structure prediction is
measured with a “3-state accuracy” score, or Q3. Q3 is
the percent of residues that match “reality” (X-ray
structure).
Secondary Structure Assignment
Determine Secondary Structure positions in known protein
structures using DSSP or STRIDE:
1. Kabsch and Sander. Dictionary of Secondary Structure in Proteins: pattern
recognition of hydrogen-bonded and geometrical features.
Biopolymer 22: 2571-2637 (1983) (DSSP)
2. Frischman and Argos. Knowledge-based secondary structure assignments.
Proteins, 23:566-571 (1995) (STRIDE)
Early methods for Secondary Structure
Prediction

Chou and Fasman
(Chou and Fasman. Prediction of protein conformation.
Biochemistry, 13: 211-245, 1974)

GOR
(Garnier, Osguthorpe and Robson. Analysis of the accuracy
and implications of simple methods for predicting the
secondary structure of globular proteins. J. Mol. Biol., 120:97120, 1978)
Chou and Fasman

Start by computing amino acids propensities
to belong to a given type of secondary
structure:
P(i / Helix )
P (i )
P(i / Beta )
P (i )
P(i / Turn)
P (i )
Propensities > 1 mean that the residue type I is likely to be found in the
Corresponding secondary structure type.
Chou and Fasman
Amino Acid
Ala
Cys
Leu
Met
Glu
Gln
His
Lys
Val
Ile
Phe
Tyr
Trp
Thr
Gly
Ser
Asp
Asn
Pro
Arg
-Helix
1.29
1.11
1.30
1.47
1.44
1.27
1.22
1.23
0.91
0.97
1.07
0.72
0.99
0.82
0.56
0.82
1.04
0.90
0.52
0.96
-Sheet
0.90
0.74
1.02
0.97
0.75
0.80
1.08
0.77
1.49
1.45
1.32
1.25
1.14
1.21
0.92
0.95
0.72
0.76
0.64
0.99
Turn
0.78
0.80
0.59
0.39
1.00
0.97
0.69
0.96
0.47
0.51
0.58
1.05
0.75
1.03
1.64
1.33
1.41
1.23
1.91
0.88
Favors
-Helix
Favors
-strand
Favors
turn
Chou and Fasman
Predicting helices:
- find nucleation site: 4 out of 6 contiguous residues with P()>1
- extension: extend helix in both directions until a set of 4 contiguous
residues has an average P() < 1 (breaker)
- if average P() over whole region is >1, it is predicted to be helical
Predicting strands:
- find nucleation site: 3 out of 5 contiguous residues with P()>1
- extension: extend strand in both directions until a set of 4 contiguous
residues has an average P() < 1 (breaker)
- if average P() over whole region is >1, it is predicted to be a strand
Chou and Fasman
f(i)
Position-specific parameters
for turn:
Each position has distinct
amino acid preferences.
Examples:
-At position 2, Pro is highly
preferred; Trp is disfavored
-At position 3, Asp, Asn and Gly
are preferred
-At position 4, Trp, Gly and Cys
preferred
f(i+1) f(i+2) f(i+3)
Chou and Fasman
Predicting turns:
- for each tetrapeptide starting at residue i, compute:
- PTurn (average propensity over all 4 residues)
- F = f(i)*f(i+1)*f(i+2)*f(i+3)
- if PTurn > P and PTurn > P and PTurn > 1 and F>0.000075
tetrapeptide is considered a turn.
Chou and Fasman prediction:
http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
The GOR method
Position-dependent propensities for helix, sheet or turn is calculated for
each amino acid. For each position j in the sequence, eight residues on
either side are considered.
j
A helix propensity table contains information about propensity for residues at
17 positions when the conformation of residue j is helical. The helix
propensity tables have 20 x 17 entries.
Build similar tables for strands and turns.
GOR simplification:
The predicted state of AAj is calculated as the sum of the positiondependent propensities of all residues around AAj.
GOR can be used at : http://abs.cit.nih.gov/gor/ (current version is GOR IV)
Accuracy

Both Chou and Fasman and GOR have been
assessed and their accuracy is estimated to be
Q3=60-65%.
Secondary Structure Prediction
-Available servers:
- JPRED : http://www.compbio.dundee.ac.uk/~www-jpred/
- PHD:
http://cubic.bioc.columbia.edu/predictprotein/
- PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
- NNPREDICT: http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
- Chou and Fassman: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
-Interesting paper:
- Rost and Eyrich. EVA: Large-scale analysis of secondary structure
prediction. Proteins 5:192-199 (2001)
Protein Structure Prediction
One popular model for protein folding assumes a
sequence of events:
◦ Hydrophobic collapse
◦ Local interactions stabilize secondary structures
◦ Secondary structures interact to form motifs
◦ Motifs aggregate to form tertiary structure
Protein Structure Prediction
A physics-based approach:
- find conformation of protein corresponding to a
thermodynamics minimum (free energy minimum)
- cannot minimize internal energy alone!
Needs to include solvent
- simulate folding…a very long process!
Folding time are in the ms to second time range
Folding simulations at best run 1 ns in one day…
The Folding @ Home initiative
(Vijay Pande, Stanford University)
http://folding.stanford.edu/
The Folding @ Home initiative
Folding @ Home: Results
100000
Experiments:
villin
10000
Predicted folding time
(nanoseconds)
villin:
Raleigh, et al,
SUNY, Stony Brook
BBAW
beta
hairpin
1000
100
BBAW:
Gruebele, et al, UIUC
beta hairpin:
Eaton, et al, NIH
alpha helix
10
alpha helix:
Eaton, et al, NIH
PPA
1
1
10
100
1000
10000
experimental measurement
(nanoseconds)
100000
PPA:
Gruebele, et al, UIUC
http://pande.stanford.edu/
Protein Structure Prediction
DECOYS:
Generate a large number
of possible shapes
DISCRIMINATION:
Select the correct, native-like
fold
Need good decoy structures
Need a good energy function
Structural Bioinformatics: Proteins
Proteins: Sources of Structure Information
Proteins: Homology Modeling
Proteins: Ab initio prediction
Proteins: Quality of Structures/Models
What is a “good” protein structure?
-The structure must have good stereochemistry
-Bond lengths and bond angles close to ideal values
-Backbone dihedral angles in “allowed” regions of the
Ramachandran plot
-Side-chains close to “rotamer” states
-The structure should be energetically favorable
- No “clashes” (such as two atoms too close to each other)
- Hydrophobic residues should be mostly buried
- Polar residues should be on the surface. No charges buried.
I advise running Procheck to check the quality of a protein structure:
-As a standalone program:
http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/
-Or through the PDBsum web portal:
http://www.ebi.ac.uk/pdbsum/
What have we learnt?

X-ray crystallography is the method of choice for the
elucidation of the high-resolution structure of a protein.

Alternative methods include NMR and Cryo-EM.

The mapping between protein sequence and protein
structure is not one-to-one: for one sequence, there is
(usually) one structure, but for one structure, there can be
many sequences
What have we learnt?

Homology modeling is a computational technique that predicts
the structure of a target protein based on a template structure
and an alignment between the target and template
sequences.

The quality of the alignment defines the quality of the
homology models.

Two good programs for homology modeling: Swiss-Model
(web-based) and Modeller (standalone software)

The swiss model repository and ModBase are useful databases
of protein structure models generated by homology modeling.

It is usually good practice to check the quality of a protein
structure or model. ProCheck is a useful tool for this: it checks
both the stereochemistry and energetics of the protein.
Download