Lab 7 BCB 444/544 Protein Structure Prediction Oct 11, 2007

BCB 444/544 F07
ISU Dobbs
- Lab 3 - BLAST
9/6/07
Chp 14 - Secondary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction
Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
  • H = α-helix
  • E = β-strand
  • C = coil (or loop)
  • (T = turn)
Secondary Structure Prediction Methods
• 1st Generation methods
  Ab initio - used relatively small dataset of structures available
  Chou-Fasman - based on amino acid propensities (3-state)
  GOR - also propensity-based (4-state)
• 2nd Generation methods
  based on much larger datasets of structures now available
  GOR II, III, IV, SOPM, GOR V, FDM
• 3rd Generation methods
  Homology-based & Neural network based
  PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM
• Meta-Servers
  combine several different methods
  Consensus & Ensemble based
  JPRED, PredictProtein, Proteus
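
The consensus idea behind meta-servers can be illustrated with a per-residue majority vote over several 3-state predictions. This is only a minimal sketch: the function name, the tie-breaking rule (fall back to coil) and the example strings are assumptions made for illustration, and the real meta-servers listed above use their own, more elaborate combination schemes.

```python
from collections import Counter


def consensus_prediction(predictions, tie_label="C"):
    """Combine equal-length H/E/C strings by per-residue majority vote.

    Ties are resolved to `tie_label` (an assumption of this sketch).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        best_label, best_count = counts[0]
        # A tie exists when the runner-up has the same count as the winner.
        tied = len(counts) > 1 and counts[1][1] == best_count
        consensus.append(tie_label if tied else best_label)
    return "".join(consensus)


# Three hypothetical predictions for the same 4-residue stretch.
print(consensus_prediction(["HHHC", "HHCC", "HECC"]))
```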
Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state)
in cross-validation experiments
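
As a small illustration of the Q3 score, the sketch below compares a predicted 3-state string with the observed one and reports the percentage of matching residues. The function name and the two example strings are made up for illustration; in practice the observed states would come from an experimentally determined structure.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical example: 17 residues, 15 of them predicted correctly.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHHCCCEEEECCC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```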
Best results? Meta-servers
• http://expasy.org/tools/ (scroll to secondary structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ (Rost, Columbia)
Best "individual" programs?
• CDM http://gor.bb.iastate.edu/cdm/ (Sen…Jernigan, ISU)
• FDM (not available separately as server) (Cheng…Jernigan, ISU)
• GOR V http://gor.bb.iastate.edu/ (Kloczkowski…Jernigan, ISU)
Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can
enhance performance by harnessing distinct advantages of both
methods; combines FDM & GOR V:
• FDM - Fragment Data Mining - exploits availability of sequence-similar
fragments in the PDB, which can lead to highly accurate
prediction - much better than GOR V - for such fragments, but such
fragments are not available for many cases
• GOR V - Garnier, Osguthorpe, Robson V - predicts secondary
structure of less similar fragments with good performance; these are
protein fragments for which FDM method cannot find suitable
structures
• For references & additional details: http://gor.bb.iastate.edu/cdm/
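
The premise of combining two complementary methods can be sketched as a per-residue switch: trust the fragment-based prediction where a sufficiently similar fragment was found, and fall back to the propensity-based prediction elsewhere. This is not the published CDM algorithm. The per-residue similarity score, the 0.5 threshold and the function name are all assumptions made for illustration; see the CDM references for the actual procedure.

```python
def cdm_like_consensus(fdm_pred, gor_pred, fragment_similarity, threshold=0.5):
    """Pick the FDM label where fragment similarity is high, else the GOR V label.

    fdm_pred, gor_pred   -- equal-length H/E/C strings from the two methods
    fragment_similarity  -- hypothetical per-residue score in [0, 1]
    threshold            -- hypothetical cutoff for trusting the FDM label
    """
    if not (len(fdm_pred) == len(gor_pred) == len(fragment_similarity)):
        raise ValueError("all inputs must have the same length")
    return "".join(
        fdm if score >= threshold else gor
        for fdm, gor, score in zip(fdm_pred, gor_pred, fragment_similarity)
    )


# Hypothetical inputs: similar fragments exist only for the first three residues.
print(cdm_like_consensus("HHHEE", "CCHHC", [0.9, 0.8, 0.7, 0.2, 0.1]))
```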
Secondary Structure Prediction:
for Different Types of Proteins/Domains
For Complete proteins:
Globular Proteins - use methods previously described
Transmembrane (TM) Proteins - use special methods
(next slides)
For Structural Domains: many under development:
Coiled-Coil Domains (Protein interaction domains)
Zinc Finger Domains (DNA binding domains),
others…
SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~ 30% of cellular proteins are
membrane-associated !
• Hard to determine experimentally, so prediction important
• TM domains are relatively 'easy' to predict!
Why? constraints due to hydrophobic environment
2 main classes of TM proteins:
• α-helical
• β-barrel
SS Prediction for TM -Helices
-Helical TM domains:
•
•
•
•
Helices are 17-25 amino acids long (span the membrane)
Predominantly hydrophobic residues
Helices oriented perpendicular to membrane
Orientation can be predicted using "positive inside" rule
Residues at cytosolic (inside or cytoplasmic) side of TM helix, near
hydrophobic anchor are more positively charged than those on lumenal
(inside an organelle in eukaryotes) or periplasmic side (space between
inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to
interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic
signal peptides (short hydrophobic sequences that target proteins to
the endoplasmic reticulum, ER)
•
Phobius - 94% accuracy - uses distinct HMM models for TM helices
& signal peptide sequences
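
The two properties above (a hydrophobic stretch of roughly 20 residues, and more positive charge on the cytosolic side) can be illustrated with a simple sliding-window sketch. Note that this is not how TMHMM, HMMTOP or Phobius work (those are HMM-based). The sketch swaps in the classic Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, which are commonly quoted values. The 15-residue flank length, the function names and the example sequence are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every window; raises KeyError on non-standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_helices(seq, window=19, threshold=1.6):
    """Return (start, end) of the best window in each run of windows above threshold."""
    helices, run = [], []
    # The -inf sentinel closes a run that reaches the end of the sequence.
    for i, mean in enumerate(window_hydropathy(seq, window) + [float("-inf")]):
        if mean >= threshold:
            run.append((mean, i))
        elif run:
            _, best_start = max(run)
            helices.append((best_start, best_start + window))
            run = []
    return helices


def positive_inside(seq, start, end, flank=15):
    """Guess which flank is cytosolic by counting Lys/Arg on each side of the helix."""
    n_side = sum(aa in "KR" for aa in seq[max(0, start - flank):start])
    c_side = sum(aa in "KR" for aa in seq[end:end + flank])
    if n_side == c_side:
        return "undetermined"
    return "N-terminal flank" if n_side > c_side else "C-terminal flank"


# Made-up sequence: charged N-terminus, 19 hydrophobic residues, polar C-terminus.
seq = "MKKRSD" + "LLIVALLVAILLGAVLLIA" + "SDEQNG"
for start, end in candidate_tm_helices(seq):
    print(start, end, seq[start:end], "inside:", positive_inside(seq, start, end))
```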
SS Prediction for TM -Barrels
-Barrel TM domains:
• -strands are amphipathic (partly hydrophobic, partly
hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
TBBPred - uses NN or SVM (more on these ML methods later)
Accuracy ?
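
The alternating pattern described above (every 2nd residue hydrophobic) can be scored directly. The sketch below is not what TBBPred does (it uses NN or SVM models). It simply measures how strongly hydrophobic residues are concentrated on one face of a candidate strand. The hydrophobic residue set, the 10-residue window, the 0.8 cutoff and the example peptides are assumptions made for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classification for this sketch


def alternation_score(segment):
    """Return 0..1: difference in hydrophobic fraction between the two strand faces."""
    if len(segment) < 2:
        raise ValueError("segment must contain at least two residues")
    flags = [aa in HYDROPHOBIC for aa in segment]
    even_face, odd_face = flags[0::2], flags[1::2]
    return abs(sum(even_face) / len(even_face) - sum(odd_face) / len(odd_face))


def candidate_beta_strands(seq, window=10, min_score=0.8):
    """Return (start, end) of every window whose alternation score passes the cutoff."""
    return [(i, i + window)
            for i in range(len(seq) - window + 1)
            if alternation_score(seq[i:i + window]) >= min_score]


# Made-up peptides: a perfectly alternating strand and a uniformly hydrophobic one.
print(alternation_score("VTLSVNLTVS"))
print(alternation_score("LLLLLLLLLL"))
```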
Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
Protein Tertiary Structure Prediction
3 Major Methods:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
• Ab Initio Protein Structural Prediction (really hard)
Comparative Modeling?
Comparative modeling - term is sometimes used
interchangeably with homology modeling, but is also
sometimes used to cover both homology modeling
and threading/fold recognition
Ab Initio Prediction
1. Develop energy function
• bond energy
• bond angle energy
• dihedral angle energy
• van der Waals energy
• electrostatic energy
2. Calculate structure by minimizing energy function
(usually Molecular Dynamics or Monte Carlo methods)

Ab initio prediction - impractical for most real (long) proteins
• Computationally? very expensive
• Accuracy? Usually poor for all except short peptides
 (but much improvement recently!)
Provides both folding pathway & folded structure
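
The two steps above (define an energy function, then minimize it) can be sketched with a toy Metropolis Monte Carlo search. This is only an illustration of the sampling idea, not a usable folding program: the energy here contains a single made-up dihedral term with its minimum at 180 degrees, whereas a real energy function would sum the bond, bond angle, dihedral, van der Waals and electrostatic terms listed above. The step size, temperature and starting angles are arbitrary assumptions.

```python
import math
import random


def toy_energy(angles):
    """Toy dihedral-only energy (arbitrary units); each angle is lowest at pi radians."""
    return sum(1.0 + math.cos(a) for a in angles)


def metropolis_minimize(energy, state, steps=5000, step_size=0.3,
                        temperature=0.1, seed=0):
    """Metropolis Monte Carlo: perturb one variable per step, keep the best state seen."""
    rng = random.Random(seed)
    current = list(state)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        delta = energy(trial) - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_current + delta
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start = [0.5, 1.0, -2.0]  # arbitrary starting dihedral angles in radians
angles, final_energy = metropolis_minimize(toy_energy, start)
print(f"start energy {toy_energy(start):.3f} -> best energy found {final_energy:.3f}")
```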
Comparative Modeling
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined
structures that are "homologous" or at least
structurally very similar to target
Provide folded structure only
Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose one with closest
   sequence to target as template
   (can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in
   corresponding positions of homologous structure & refine by
   "tweaking" modeled structure (energy minimization)

Homology modeling - works "well"
• Computationally? "relatively" inexpensive
• Accuracy? higher sequence identity → better model
  Requires ~30% sequence identity with sequence for
  which structure is known
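
Template selection (step 2) and the ~30% rule of thumb can be sketched as follows, assuming pairwise alignments between the target and each candidate template are already available (for example from a BLAST search). The function names, the template names and the aligned strings are hypothetical, and percent identity is computed here over aligned, non-gap positions, which is only one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over aligned (non-gap) columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def choose_template(alignments, min_identity=30.0):
    """Return (name, identity) of the closest template, or (None, identity) if too distant."""
    scored = {name: percent_identity(target, template)
              for name, (target, template) in alignments.items()}
    best = max(scored, key=scored.get)
    if scored[best] < min_identity:
        return None, scored[best]
    return best, scored[best]


# Hypothetical alignments of one target against two candidate templates.
alignments = {
    "template_A": ("MKT-AYIAK", "MKTQAYLAK"),
    "template_B": ("MKTAYIAK", "GSSPWLVE"),
}
print(choose_template(alignments))
```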
Threading - Fold Recognition
Identify “best” fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling

Threading - works "sometimes"
• Computationally? Can be expensive or cheap, depends on
energy function & whether "all atom" or "backbone only"
threading
• Accuracy? in theory, should not depend on sequence identity
(should depend on quality of template library & "luck")

Usually, higher sequence identity to protein of known
structure → better model
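
Steps 1-4 of the threading procedure can be sketched with a deliberately tiny example. Everything here is a toy assumption: each template is reduced to a string of per-position environments (B = buried, E = exposed), the "energy function" simply rewards hydrophobic residues in buried positions and polar residues in exposed ones, and the alignment is ungapped. Real threading programs use far richer energy functions and gapped sequence-to-structure alignment.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed residue classification for this sketch


def residue_energy(aa, environment):
    """Toy energy: -1 if residue type suits the environment (B or E), +1 otherwise."""
    suits = (aa in HYDROPHOBIC) == (environment == "B")
    return -1.0 if suits else 1.0


def thread(target, template_env):
    """Best ungapped placement of target on a template; returns (energy, offset) or None."""
    if len(target) > len(template_env):
        return None
    return min(
        (sum(residue_energy(aa, env)
             for aa, env in zip(target, template_env[offset:offset + len(target)])),
         offset)
        for offset in range(len(template_env) - len(target) + 1)
    )


def best_template(target, library):
    """Score the target against every template and return the lowest-energy hit."""
    results = {name: thread(target, env) for name, env in library.items()}
    results = {name: hit for name, hit in results.items() if hit is not None}
    name = min(results, key=lambda n: results[n][0])
    return name, results[name]


# Hypothetical template library and target peptide.
library = {"fold_1": "EEBEBEBEE", "fold_2": "BBBBBBBBB"}
print(best_template("LKVEL", library))
```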
Today's Lab:
• Homology Modeling - using SWISS-MODEL
• http://swissmodel.expasy.org//SWISS-MODEL.html
• Threading - using 3-D JURY (BioinfoBank, a METAserver)
• http://meta.bioinfo.pl/submit_wizard.pl
• Take a look at CASP contest:
• http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest in 2006
• http://www.predictioncenter.org/casp7/Casp7.html