#22 - Secondary & Tertiary Structure Prediction 10/12/07 BCB 444/544

BCB 444/544
Lecture 22
Secondary Structure Prediction
Tertiary Structure Prediction
#22_Oct10

Required Reading
(before lecture)
Mon Oct 8 - Lecture 20
Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213
Wed Oct 10 - Lecture 21
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
BCB 444/544 F07 ISU Dobbs #22 - Secondary & Tertiary Structure Prediction
10/12/07
Assignments & Announcements
ALL: HomeWork #3
√ Due: Mon Oct 8 by 5 PM
• HW544: HW544Extra #1
√ Due: Task 1.1 - Mon Oct 1 by noon
Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
• 444 "Project-instead-of-Final" students should also submit:
• HW544Extra #1
• √ Due: Task 1.1 - Mon Oct 8 by noon
• Due: Task 1.2 - Fri Oct 12 by 5 PM
<Task 2 NOT required for BCB444 students>

New Reading & Homework Assignment
ALL: HomeWork #4 (posted online today)
Due: Fri Oct 19 by 5 PM (one week from today)
Read:
Ginalski et al. (2005) Practical Lessons from Protein Structure
Prediction, Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
(PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein
structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see
HW#4 posted online & sent by email on Fri Oct 12

Seminars this Week - (yesterday)
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 11 Thurs
• Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope 2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
ReCombinatorics: Combinatorial Algorithms for Studying History
of Recombination in Populations 3:30 PM in Howe Hall Auditorium
http://www.cs.iastate.edu/~colloq/new/gusfield.shtml

Seminars this Week - Fri (today)
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 12 Fri
• Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
TBA: "Structural Biology" (see URL below) 2:10 PM in 102 Sci
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
• Dr. Srinivas Aluru (ECpE, ISU) - GDCB Seminar
Consensus Genetic Maps: A Graph Theoretic Approach
4:10 PM in 1414 MBB
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf
Chp 12 - Protein Structure Basics
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures
• Determination of Protein 3-Dimensional Structure
• Protein Structure DataBank (PDB)
For more info: http://en.wikipedia.org/wiki/Protein_structure

Experimental Determination of 3D Structure
2 Major Methods to obtain high-resolution structures
1. X-ray Crystallography (most PDB structures)
2. Nuclear Magnetic Resonance (NMR) Spectroscopy
Note Advantages & Limitations of each method
(See your lecture notes & textbook)
3. Other methods (usually lower resolution, at present):
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM)
• Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
• Circular Dichroism (CD), several other spectroscopic methods

"Best" Resolution of Protein Structures
• High-resolution methods
  • X-ray crystallography (< 1 Å)
  • NMR (~1 - 2.5 Å)
• Lower-resolution methods
  • Cryo-EM (~10-15 Å)
• Theoretical Models?
  • Usually low resolution, at present, but
  • Highly variable - & a few ~crystal data
Pevsner Fig 9.36
Baker & Sali (2000)

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison - later
• Protein Structure Classification
Protein Structure Classification
• SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships
http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
http://cathwww.biochem.ucl.ac.uk/latest/
• DALI - (recently moved to EBI & reorganized)
DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses….

Chp 14 - Secondary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction
Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
  • H = α-helix
  • E = β-strand
  • C = coil (or loop)
  • (T = turn)

Secondary Structure Prediction Methods
• 1st Generation methods
Ab initio - used relatively small dataset of structures available
Chou-Fasman - based on amino acid propensities (3-state)
GOR - also propensity-based (4-state)
• 2nd Generation methods
based on much larger datasets of structures now available
GOR II, III, IV, SOPM, GOR V, FDM
• 3rd Generation methods
Homology-based & Neural network based
PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM
• Meta-Servers
combine several different methods
Consensus & Ensemble based
JPRED, PredictProtein, Proteus
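
The consensus idea behind meta-servers can be illustrated with a per-residue majority vote. This is only a minimal sketch: the function name `consensus`, the tie-break to coil, and the toy prediction strings are assumptions made for illustration, and real meta-servers such as JPRED or PredictProtein use their own, more elaborate combination schemes.

```python
from collections import Counter


def consensus(predictions, tie_break="C"):
    """Per-residue majority vote over equal-length H/E/C strings."""
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        # If the two leading states are tied, fall back to coil (an assumption).
        if len(counts) > 1 and counts[1][1] == top_count:
            top_state = tie_break
        result.append(top_state)
    return "".join(result)


# Toy 3-state predictions from three hypothetical methods.
methods = ["CCHHHHCC", "CHHHHHCC", "CCHHEECC"]
print(consensus(methods))
```

A plain vote treats every method as equally reliable; weighting each method by its cross-validated accuracy would be a natural refinement.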
Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state)
in cross-validation experiments
Best results? Meta-servers
• http://expasy.org/tools/
(scroll for 2' structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ Rost, Columbia
Best "individual" programs? ??
• CDM http://gor.bb.iastate.edu/cdm/ Sen…Jernigan, ISU
• FDM (not available separately as server) Cheng…Jernigan, ISU
• GOR V http://gor.bb.iastate.edu/ Kloczkowski…Jernigan, ISU

Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can
enhance performance by harnessing distinct advantages of both
methods; combines FDM & GOR V:
• FDM - Fragment Data Mining - exploits availability of sequence-similar
fragments in the PDB, which can lead to highly accurate
prediction - much better than GOR V - for such fragments, but such
fragments are not available for many cases
• GOR V - Garnier, Osguthorpe, Robson V - predicts secondary
structure of less similar fragments with good performance; these are
protein fragments for which FDM method cannot find suitable
structures
• For references & additional details: http://gor.bb.iastate.edu/cdm/
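
The premise of CDM (use FDM where sequence-similar fragments exist, fall back to GOR V elsewhere) can be sketched as a per-residue switch. This is not the Jernigan group's implementation: the per-residue `fragment_similarity` scores, the `threshold` value, and the function name are hypothetical stand-ins for whatever criterion the real server applies.

```python
def combine_cdm_style(fdm_pred, gorv_pred, fragment_similarity, threshold=0.5):
    """Pick the FDM state where similar PDB fragments exist, else the GOR V state.

    fragment_similarity holds one hypothetical score in [0, 1] per residue,
    standing in for how sequence-similar the best PDB fragments are.
    """
    if not (len(fdm_pred) == len(gorv_pred) == len(fragment_similarity)):
        raise ValueError("inputs must cover the same residues")
    combined = []
    for fdm_state, gor_state, score in zip(fdm_pred, gorv_pred, fragment_similarity):
        # Trust FDM only where a sufficiently similar fragment was found.
        combined.append(fdm_state if score >= threshold else gor_state)
    return "".join(combined)


# Toy inputs: FDM has good fragments only for the second half of the sequence.
fdm = "CCCCEECC"
gor = "CHHHHHCC"
similarity = [0.1, 0.2, 0.2, 0.9, 0.9, 0.9, 0.8, 0.8]
print(combine_cdm_style(fdm, gor, similarity))
```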
Where to Find "Actual" Secondary Structure?
In the PDB
DSSP
Author

How Does Predicted Secondary Structure Compare?
e.g., from CDM
Query  MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V  CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM    CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM    CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
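
A comparison like this is usually summarized with the Q3 score, the percentage of residues whose predicted 3-state label matches the observed one. The sketch below assumes both strings use the H/E/C alphabet and have equal length; the example strings are made up, and in practice the observed string would come from an assignment such as DSSP for a PDB entry.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose 3-state label (H/E/C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be the same length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Hypothetical observed and predicted strings for a 10-residue peptide.
observed = "CCHHHHHHCC"
predicted = "CCCHHHHHHC"
print(q3_score(predicted, observed))  # 8 of 10 residues agree
```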
Secondary Structure Prediction:
for Different Types of Proteins/Domains
For Complete proteins:
Globular Proteins - use methods previously described
Transmembrane (TM) Proteins - use special methods
(next slides)
For Structural Domains: many under development:
Coiled-Coil Domains (Protein interaction domains)
Zinc Finger Domains (DNA binding domains),
others…

SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~ 30% of cellular proteins are
membrane-associated !
• Hard to determine experimentally, so prediction important
• TM domains are relatively 'easy' to predict!
Why? constraints due to hydrophobic environment
2 main classes of TM proteins:
α- helical
β- barrel
SS Prediction for TM α-Helices
α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule
Residues at cytosolic (inside or cytoplasmic) side of TM helix, near
hydrophobic anchor are more positively charged than those on lumenal
(inside an organelle in eukaryotes) or periplasmic side (space between
inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to
interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic
signal peptides (short hydrophobic sequences that target proteins to
the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices
& signal peptide sequences
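
The first two properties above (a 17-25 residue span of predominantly hydrophobic residues) are what a classic sliding-window hydropathy scan looks for. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6; the scale, window, and cutoff are choices made here for illustration, not something the lecture specifies, and this is far simpler than the HMM-based servers listed above. The helper `positive_charge` counts Lys and Arg in a flanking segment as a crude stand-in for the "positive inside" rule. The toy sequence is invented.

```python
# Kyte-Doolittle hydropathy scale (an assumption; the lecture names no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(sequence, window=19):
    """Mean hydropathy of every window of the given length."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(sequence, window=19, cutoff=1.6):
    """0-based starts of windows hydrophobic enough to be TM helix candidates."""
    return [
        i for i, mean in enumerate(hydropathy_windows(sequence, window))
        if mean >= cutoff
    ]


def positive_charge(segment):
    """Count Lys and Arg, the residues behind the "positive inside" rule."""
    return sum(segment.count(aa) for aa in "KR")


# Invented sequence: charged N-flank, 19 hydrophobic residues, acidic C-flank.
n_flank, core, c_flank = "MKRKS", "LIVALLAVGLLIAVFLAGL", "DESNE"
toy = n_flank + core + c_flank

means = hydropathy_windows(toy)
best_start = max(range(len(means)), key=means.__getitem__)
print("most hydrophobic window starts at index", best_start)
print("candidate window starts:", candidate_tm_starts(toy))

# The flank with more K/R is the one predicted to face the cytosol.
side = "N-terminal" if positive_charge(n_flank) > positive_charge(c_flank) else "C-terminal"
print("predicted cytosolic flank:", side)
```

Neighbouring windows overlap, so a real predictor would merge adjacent hits into one segment; a plain scan like this is also the kind of method that hydrophobic signal peptides can fool.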
SS Prediction for TM β-Barrels
β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly
hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
TBBPred - uses NN or SVM (more on these ML methods later)
Accuracy ?
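
The "every 2nd residue is hydrophobic" pattern can be scored directly on a candidate strand. This is a minimal sketch, assuming a simple fixed set of hydrophobic residues (the `HYDROPHOBIC` set is a choice made here, not one given in the lecture); it is not how TBBPred or any other server works.

```python
HYDROPHOBIC = set("AVILMFWC")  # a simple, assumed hydrophobic set


def alternation_score(segment):
    """Fraction of residues fitting an every-second-residue hydrophobic pattern."""
    if not segment:
        raise ValueError("segment must not be empty")
    flags = [aa in HYDROPHOBIC for aa in segment]
    # Register 1: hydrophobic residues sit on even indices, polar ones on odd.
    even_fit = sum(flag == (i % 2 == 0) for i, flag in enumerate(flags))
    # Register 2 is the exact complement: hydrophobic residues on odd indices.
    odd_fit = len(flags) - even_fit
    return max(even_fit, odd_fit) / len(flags)


print(alternation_score("VTLSVNASFT"))  # perfectly alternating toy strand
print(alternation_score("LLLLLLLLLL"))  # uniformly hydrophobic, helix-like
```

A score near 1.0 over a 10-22 residue window is consistent with an amphipathic β-strand, while a uniformly hydrophobic stretch scores 0.5.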
Prediction of Coiled-Coil Domains
Coiled-coils
• Superhelical protein motifs or domains, with two or more
interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions
'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
• 1 & 4 = hydrophobic (facing helical interface)
• 2,3,5,6,7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to manually detect
these, based on amino acid sequence
Servers?
Coils, Multicoil - probability-based methods
2Zip - for Leucine zippers = special type of CC in TFs:
characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L
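
Both sequence signatures above are easy to scan for. The sketch below turns the leucine zipper motif L-X(6)-L-X(6)-L-X(6)-L into a regular expression and scores one heptad register by the fraction of positions 1 and 4 that are hydrophobic. The hydrophobic set, the function names, and the toy sequence are assumptions made for illustration; Coils, Multicoil, and 2Zip use probability-based scoring rather than exact pattern matching.

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L, wrapped in a lookahead so overlapping hits are found.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

HYDROPHOBIC = set("AVILMFW")  # assumed hydrophobic set


def find_leucine_zippers(sequence):
    """Return 0-based start positions of leucine zipper motif matches."""
    return [m.start() for m in LEUCINE_ZIPPER.finditer(sequence)]


def heptad_score(sequence, offset=0):
    """Fraction of heptad positions 1 and 4 that are hydrophobic, for one register."""
    hits = total = 0
    for i, aa in enumerate(sequence):
        position = (i - offset) % 7 + 1  # heptad position 1..7
        if position in (1, 4):
            total += 1
            hits += aa in HYDROPHOBIC
    return hits / total if total else 0.0


# Toy sequence: four heptads with Leu at positions 1 and 4.
toy = "LKELEEK" * 4
print(find_leucine_zippers(toy))
print(max(heptad_score(toy, offset) for offset in range(7)))
```

Because the heptad register of a real sequence is unknown, the usage line tries all seven offsets and keeps the best score.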
Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Structural Genomics - Status & Goal
~ 20,000 "traditional" genes in human genome
(recall, this is fewer than earlier estimate of 30,000)
~ 2,000 proteins in a typical cell
> 4.9 million sequences in UniProt (Oct 2007)
> 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far
behind sequence determination!
Goal: Determine structures of "all" protein folds in
nature, using combination of experimental structure
determination methods (X-ray crystallography, NMR,
mass spectrometry) & structure prediction
Structural Genomics Projects
TargetDB: database of structural genomics targets
http://targetdb.pdb.org

Protein Sequence & Structure: Analysis
• Diamond STING Millennium - Many useful structure analysis
tools, including Protein Dossier
http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt)
Protein knowledgebase
http://us.expasy.org/sprot
• InterPro
Sequence analysis tools
http://www.ebi.ac.uk/interpro
Protein Structure Prediction
or Protein Folding Problem
"Major unsolved problem in molecular biology"
In cells:
spontaneous
assisted by enzymes
assisted by chaperones
In vitro:
many proteins can fold to their "native"
states spontaneously & without assistance
but, many do not!

Deciphering the Protein Folding Code
• Protein Structure Prediction
or "Protein Folding" Problem
Given the amino acid sequence
of a protein, predict its
3-dimensional structure (fold)
• "Inverse Folding" Problem
Given a protein fold, identify
every amino acid sequence
that can adopt that
3-dimensional structure
Protein Structure Prediction
Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determination of folding pathway
2- Prediction of tertiary structure from
sequence
Both still largely unsolved problems

Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa’s
(fast - msecs)
2- Molten globule - helices & sheets form, but "loose"
(slow - secs)
3- "Final" native folded state - compaction &
rearrangement of some 2' structures
Native state? - assumed to be lowest free energy
- may be an ensemble of structures
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins requires conformational
changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable"
(NOT evolved for maximum stability)
• Energy difference between native and denatured
state is very small (5-15 kcal/mol)
(this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy

Difficulty of Tertiary Structure Prediction
Folding or tertiary structure prediction problem can
be formulated as a search for minimum energy
conformation
• Search space is defined by psi/phi angles of
backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially
with number of residues
Computationally it is an exceedingly difficult problem!
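
A back-of-the-envelope count shows how quickly the search space grows. The sketch below assumes each phi and psi angle can take only 3 discrete values and that conformations could be sampled at 10^13 per second; both numbers are illustrative assumptions, and side-chain rotamers are ignored entirely, so the real space is larger still.

```python
def conformation_count(n_residues, states_per_angle=3):
    """Backbone conformations if each phi and psi angle takes a few discrete values."""
    # One phi and one psi per residue; end effects and side chains are ignored.
    return states_per_angle ** (2 * n_residues)


def years_to_enumerate(n_residues, rate_per_second=1e13, states_per_angle=3):
    """Time to visit every conformation at a fixed sampling rate, in years."""
    seconds = conformation_count(n_residues, states_per_angle) / rate_per_second
    return seconds / (365.25 * 24 * 3600)


for n in (10, 50, 100):
    count = conformation_count(n)
    print(n, f"{count:.2e} conformations", f"{years_to_enumerate(n):.2e} years")
```

Even under these generous simplifications, exhaustive enumeration is out of reach for a 100-residue protein, which is why prediction methods rely on homology, threading, or heuristic sampling instead.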
From Thursday's Lab:
• Homology Modeling - using SWISS-MODEL
• http://swissmodel.expasy.org//SWISS-MODEL.html
• Threading - using 3-D JURY (BioinfoBank, a METAserver)
• http://meta.bioinfo.pl/submit_wizard.pl
• Be sure to take a look at CASP contest:
• http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest in 2006
• http://www.predictioncenter.org/casp7/Casp7.html