#21 - Protein Secondary Structure Prediction 10/10/07 BCB 444/544

BCB 444/544
Lecture 21
Protein Structure Visualization, Classification & Comparison
→ Secondary Structure Prediction
#21_Oct10

Required Reading (before lecture)
Mon Oct 8 - Lecture 20
Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213
Wed Oct 10 - Lecture 21
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
BCB 444/544 F07 ISU Dobbs #21 - Protein Secondary Structure Prediction
10/10/07

Assignments & Announcements
ALL: HomeWork #3
√ Due: Mon Oct 8 by 5 PM
• HW544: HW544Extra #1
√ Due: Task 1.1 - Mon Oct 1 by noon
Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
• 444 "Project-instead-of-Final" students should also submit:
• HW544Extra #1
• √ Due: Task 1.1 - Mon Oct 8 by noon
• Due: Task 1.2 - Fri Oct 12 by 5 PM
<Task 2 NOT required for BCB444 students>

Seminars this Week - Thurs:
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 11 Thurs
• Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope 2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations 3:30 PM in Howe Hall Auditorium
http://www.cs.iastate.edu/~colloq/new/gusfield.shtml

Seminars this Week - Fri:
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 12 Fri
• Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
TBA: "Structural Biology" (see URL below) 2:10 PM in 102 Sci
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
• Dr. Srinivas Aluru (ECprE, ISU) - GDCB Seminar
Consensus Genetic Maps: A Graph Theoretic Approach
4:10 PM in 1414 MBB
http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf

Chp 12 - Protein Structure Basics
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures
• Determination of Protein 3-Dimensional Structure
• Protein Structure DataBank (PDB)

Protein Structure & Function
• Protein structure - primarily determined by sequence
• Protein function - primarily determined by structure
• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
→ Predicting protein structure and function can be very hard -- & fun!

6 Main Classes of Protein Structure
1) α-Domains
Bundles of helices connected by loops
2) β-Domains
Mainly antiparallel sheets, usually 2 sheets forming sandwich
3) α/β Domains
Mainly parallel sheets with intervening helices, mixed sheets
4) α+β Domains
Mainly segregated helices and sheets
5) Multidomain (α & β)
Containing domains from more than one class
6) Membrane & cell-surface proteins

Protein Structure Databases
PDB - Protein Data Bank
http://www.rcsb.org/pdb/
(RCSB) - THE protein structure database
MMDB - Molecular Modeling Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
(NCBI Entrez) - has "added" value
MSD - Molecular Structure Database
http://www.ebi.ac.uk/msd
Especially good for interactions & binding sites

PDB (RCSB) - recently "remediated"
http://www.rcsb.org/pdb

MMDB at NCBI
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

Structure at NCBI
http://www.ncbi.nlm.nih.gov/Structure

MMDB: Molecular Modeling Data Base
• Derived from PDB structure records
• "Value-added" to PDB records includes:
• Integration with other ENTREZ databases & tools
• Conversion to parseable ASN.1 data description language
• Data also available in mmCIF & XML (also true for PDB now)
• Correction of numbering discrepancies in structure vs sequence
• Validation
• Explicit chemical graph information (covalent bonds)
• Integrated tool for identifying structural neighbors
Vector Alignment Search Tool (VAST)
http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

MSD: Molecular Structure Database
http://www.ebi.ac.uk/msd/

wwPDB: World Wide PDB
http://www.wwpdb.org

Experimental Determination of 3D Structure
2 Major Methods to obtain high-resolution structures
1. X-ray Crystallography (most PDB structures)
2. Nuclear Magnetic Resonance (NMR) Spectroscopy
• Note Advantages & Limitations of each method
(See your lecture notes & textbook)
• For more info: http://en.wikipedia.org/wiki/Protein_structure
3. Other methods (usually lower resolution, at present):
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM)
• Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
• Circular Dichroism (CD), several other spectroscopic methods

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification

Protein Structure Visualization
RASMOL & descendants: PyMol, MolMol
http://www.umass.edu/microbio/rasmol/index2.htm
Cn3D - esp. good for structural alignments
http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
CHIME (Protein Explorer)
http://www.umass.edu/microbio/chime/getchime.htm
MolviZ.Org
http://www.umass.edu/microbio/chime
Deep View = Swiss-PDB Viewer
http://www.expasy.org/spdbv

Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

PyMol
http://pymol.sourceforge.net/

Cn3D: Structural Alignments

Cn3D: Displaying 3° Structures
[Figure labels: NADH, Chloroquine]

Protein Explorer (Chime)
http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm

Protein Structure Comparison Methods
We will skip this for now
3 Basic Approaches for Aligning Structures:
1. Intermolecular
2. Intramolecular
3. Combined
• DALI/FSSP (most commonly used)
• Fully automated structure alignments
• DALI server http://www.ebi.ac.uk/dali/index.html
DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start

Protein Structure Classification
• SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships
http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
http://cathwww.biochem.ucl.ac.uk/latest/
• DALI - (recently moved to EBI & reorganized)
DALI Database (fold classification)
http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses….

SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop/

CATH - Structure Classification
http://www.cathdb.info/latest/index.html

Chp 14 - Secondary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction

Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
• H = α-helix
• E = β-strand
• C = coil (or loop)
• (T = turn)

Secondary Structure Prediction Methods
• 1st Generation methods
Ab initio - used relatively small dataset of structures available
Chou-Fasman - based on amino acid propensities (3-state)
GOR - also propensity-based (4-state)
• 2nd Generation methods
based on much larger datasets of structures now available
GOR II, III, IV, SOPM
• 3rd Generation methods
Homology-based & Neural network based
PHD, PSIPRED, SSPRO, PROF, HMMSTR
• Meta-Servers
combine several different methods
Consensus & Ensemble based
JPRED, PredictProtein
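
To make the "consensus" idea behind meta-servers concrete, here is a minimal Python sketch of a per-residue majority vote over several 3-state (H/E/C) predictions. It is only an illustration: the function name and the tie-break to coil are assumptions, and real meta-servers such as JPRED or PredictProtein may combine methods quite differently.

```python
from collections import Counter


def consensus_prediction(predictions, tie_break="C"):
    """Column-wise majority vote over equal-length H/E/C prediction strings.

    `tie_break` (assumed here to be coil, "C") is used when the two most
    common states at a position have the same number of votes.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        # A tie between the top two states falls back to the tie-break state.
        if len(counts) > 1 and counts[1][1] == top_count:
            top_state = tie_break
        consensus.append(top_state)
    return "".join(consensus)


if __name__ == "__main__":
    # Three hypothetical predictions for the same 5-residue stretch.
    print(consensus_prediction(["HHHEC", "HHEEC", "HCEEC"]))
```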

Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state) in cross-validation experiments
Best results? Meta-servers
• http://expasy.org/tools/ (scroll for 2° structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ Rost, Columbia
Best individual programs? ??
• CDM http://gor.bb.iastate.edu/cdm/ Sen…Jernigan, ISU
• GOR V http://gor.bb.iastate.edu/ Kloczkowski…Jernigan, ISU

Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V:
• FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases
• GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which FDM method cannot find suitable structures
• For references & additional details: http://gor.bb.iastate.edu/cdm/
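
The Q3 score mentioned above (% of residues correctly predicted in 3 states) can be computed directly from a predicted and an observed state string. A minimal Python sketch, assuming both strings use the same H/E/C alphabet and are already aligned residue by residue (the function name is hypothetical):

```python
def q3_score(predicted, observed):
    """Percentage of positions where the predicted 3-state label (H/E/C)
    matches the observed (experimentally derived) label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Made-up example: 9 of 10 residues agree.
    print(q3_score("HHHHEEEECC", "HHHCEEEECC"))
```

Q3 is a per-residue measure only; it says nothing about whether whole helices or strands were placed correctly.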

Secondary Structure Prediction: for Different Types of Proteins/Domains
For Complete proteins:
Globular Proteins - use methods previously described
Transmembrane (TM) Proteins - use special methods (next slides)
For Structural Domains: many under development:
Coiled-Coil Domains (Protein interaction domains)
Zinc Finger Domains (DNA binding domains),
others…

SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~ 30% of cellular proteins are membrane-associated !
• TM domains are relatively 'easy' to predict!
• Hard to determine experimentally, so prediction important
Why? constraints due to hydrophobic environment
2 main classes of TM proteins:
α- helical
β- barrel
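
One simple way to exploit the hydrophobic environment of the membrane is a sliding-window hydropathy scan. The Python sketch below is an illustration only. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6; none of these come from the slides, and the function names are hypothetical. Dedicated TM prediction servers use more sophisticated models.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not taken from the slides).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown residue codes (e.g. X) contribute 0.0.
    """
    seq = seq.upper()
    return [
        sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows scoring >= threshold into
    (start, end) segments (0-based, end exclusive)."""
    segments = []
    for i, score in enumerate(hydropathy_windows(seq, window)):
        if score >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous run
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Made-up sequence: charged flanks around a 20-residue poly-Leu stretch.
    toy = "DDDDDKKKKK" + "L" * 20 + "KKKKKDDDDD"
    print(candidate_tm_segments(toy))
```

Because whole windows are merged, the reported segment is usually somewhat longer than the hydrophobic core itself.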

SS Prediction for TM α-Helices
α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule
Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences

SS Prediction for TM β-Barrels
β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
TBBPred - uses NN or SVM (more on these ML methods later)
Accuracy ?
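
The β-barrel rule above (every 2nd residue hydrophobic and facing the lipid bilayer, the others hydrophilic and facing the pore) can be turned into a crude alternation score. The Python sketch below is a toy illustration, not how TBBPred works. The hydrophobic residue set, the 10-residue window, and the 0.9 cutoff are all assumptions, and the function names are hypothetical.

```python
# Residues treated as hydrophobic (an assumed, simplified classification).
HYDROPHOBIC = set("AVILMFWYC")


def alternation_score(segment):
    """Fraction of residues that fit a strict hydrophobic/hydrophilic
    alternation, taking the better of the two possible phases."""
    segment = segment.upper()
    if not segment:
        return 0.0
    best = 0
    for phase in (0, 1):
        # A position "fits" if it is hydrophobic exactly when its index
        # falls on the chosen phase (even or odd).
        matches = sum(
            (aa in HYDROPHOBIC) == (i % 2 == phase)
            for i, aa in enumerate(segment)
        )
        best = max(best, matches)
    return best / len(segment)


def candidate_beta_strands(seq, window=10, min_score=0.9):
    """(start, end) windows (0-based, end exclusive) whose alternation
    score reaches min_score."""
    return [
        (i, i + window)
        for i in range(len(seq) - window + 1)
        if alternation_score(seq[i:i + window]) >= min_score
    ]


if __name__ == "__main__":
    print(alternation_score("VSVTLNVSAT"))  # perfectly alternating toy strand
    print(alternation_score("LLLLLLLLLL"))  # uniformly hydrophobic, no alternation
```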

Prediction of Coiled-Coil Domains
Coiled-coils
• Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions
'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
• 1 & 4 = hydrophobic (facing helical interface)
• 2,3,5,6,7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to manually detect these, based on amino acid sequence
Servers?
Coils, Multicoil - probability-based methods
2Zip - for Leucine zippers = special type of CC in TFs: characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L

Chp 15 - Tertiary Structure Prediction
SECTION V
STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
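
The leucine-zipper motif L-X(6)-L-X(6)-L-X(6)-L given for coiled-coils translates directly into a regular expression. A minimal Python sketch (the function name is hypothetical, and this plain pattern search is only an illustration, not a description of how 2Zip itself works):

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L, wrapped in a lookahead so overlapping hits are found.
LEU_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zipper_motifs(seq):
    """Return (start, matched_motif) for every occurrence of the
    Leu-rich motif; start is 0-based."""
    return [(m.start(), m.group(1)) for m in LEU_ZIPPER.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence with Leu at every 7th position.
    toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
    print(find_leucine_zipper_motifs(toy))
```

A pattern hit alone is weak evidence, since Leu is a common residue; it is best read alongside a coiled-coil prediction for the same region.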