#21 - Protein Secondary Structure Prediction
BCB 444/544 F07 ISU Dobbs - 10/10/07

BCB 444/544 Lecture 21 (#21_Oct10)
• Protein Structure Visualization, Classification & Comparison
• Secondary Structure Prediction

Required Reading (before lecture)
Mon Oct 8 - Lecture 20: Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213
Wed Oct 10 - Lecture 21: Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22: Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230

Assignments & Announcements
• ALL: HomeWork #3
  √ Due: Mon Oct 8 by 5 PM
• HW544: HW544Extra #1
  √ Due: Task 1.1 - Mon Oct 1 by noon
  Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
• 444 "Project-instead-of-Final" students should also submit HW544Extra #1:
  √ Due: Task 1.1 - Mon Oct 8 by noon
  Due: Task 1.2 - Fri Oct 12 by 5 PM
  (Task 2 NOT required for BCB444 students)

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 11 Thurs
  • Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
    The Computational Microscope
    2:10 PM in E164 Lagomarcino
    http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
  • Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
    ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations
    3:30 PM in Howe Hall Auditorium
    http://www.cs.iastate.edu/~colloq/new/gusfield.shtml
• Oct 12 Fri
  • Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
    TBA: "Structural Biology" (see URL below)
    2:10 PM in 102 Sci
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
  • Dr. Srinivas Aluru (ECprE, ISU) - GDCB Seminar
    Consensus Genetic Maps: A Graph Theoretic Approach
    4:10 PM in 1414 MBB
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf

Chp 12 - Protein Structure Basics
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures
• Determination of Protein 3-Dimensional Structure
• Protein Structure DataBank (PDB)

Protein Structure & Function
• Protein structure - primarily determined by sequence
• Protein function - primarily determined by structure
• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
Predicting protein structure and function can be very hard -- & fun!

6 Main Classes of Protein Structure
1) α-Domains: Bundles of helices connected by loops
2) β-Domains: Mainly antiparallel sheets, usually 2 sheets forming sandwich
3) α/β Domains: Mainly parallel sheets with intervening helices, mixed sheets
4) α+β Domains: Mainly segregated helices and sheets
5) Multidomain (α & β): Containing domains from more than one class
6) Membrane & cell-surface proteins

Protein Structure Databases
• PDB - Protein Data Bank (RCSB) - THE protein structure database
  http://www.rcsb.org/pdb/
• MMDB - Molecular Modeling Database (NCBI Entrez) - has "added" value
  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
• MSD - Molecular Structure Database - especially good for interactions & binding sites
  http://www.ebi.ac.uk/msd

PDB (RCSB) - recently "remediated"
http://www.rcsb.org/pdb

Structure at NCBI
http://www.ncbi.nlm.nih.gov/Structure

MMDB at NCBI
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

MMDB: Molecular Modeling Data Base
• Derived from PDB structure records
• "Value-added" to PDB records includes:
  • Integration with other ENTREZ databases & tools
  • Conversion to parseable ASN.1 data description language
  • Data also available in mmCIF & XML (also true for PDB now)
  • Correction of numbering discrepancies in structure vs sequence
  • Validation
  • Explicit chemical graph information (covalent bonds)
• Integrated tool for identifying structural neighbors:
  Vector Alignment Search Tool (VAST)
  http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

MSD: Molecular Structure Database
http://www.ebi.ac.uk/msd/

wwPDB: World Wide PDB
http://www.wwpdb.org

Experimental Determination of 3D Structure
2 Major Methods to obtain high-resolution structures:
1. X-ray Crystallography (most PDB structures)
2. Nuclear Magnetic Resonance (NMR) Spectroscopy
Note Advantages & Limitations of each method
• (See your lecture notes & textbook)
• For more info: http://en.wikipedia.org/wiki/Protein_structure
3. Other methods (usually lower resolution, at present):
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM), Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
  http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
• Circular Dichroism (CD), several other spectroscopic methods

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification

Protein Structure Visualization
• RASMOL & descendants: PyMol, MolMol
  http://www.umass.edu/microbio/rasmol/index2.htm
• Cn3D - esp. good for structural alignments
  http://www.biosino.org/mirror/www.ncbi.nlm.nih.gov/Structure/cn3d/
• CHIME (Protein Explorer)
  http://www.umass.edu/microbio/chime/getchime.htm
• MolviZ.Org
  http://www.umass.edu/microbio/chime
• Deep View = Swiss-PDB Viewer
  http://www.expasy.org/spdbv

Cn3D
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

PyMol
http://pymol.sourceforge.net/

Cn3D: Displaying 3° (tertiary) Structures
[Figure labels: NADH, Chloroquine]

Cn3D: Structural Alignments
[Figure]

Protein Explorer (Chime)
http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm

Protein Structure Comparison Methods
We will skip this for now
3 Basic Approaches for Aligning Structures:
1. Intermolecular
2. Intramolecular
3. Combined
• DALI/FSSP (most commonly used): fully automated structure alignments
  • DALI server http://www.ebi.ac.uk/dali/index.html
  • DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start

Protein Structure Classification
• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
• DALI - (recently moved to EBI & reorganized)
  DALI Database (fold classification)
  http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses….

SCOP - Structure Classification
http://scop.mrc-lmb.cam.ac.uk/scop/

CATH - Structure Classification
http://www.cathdb.info/latest/index.html

Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14 Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction

Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
  • H = α-helix
  • E = β-strand
  • C = coil (or loop)
  • (T = turn)

Secondary Structure Prediction Methods
• 1st Generation methods
  Ab initio - used relatively small dataset of structures available
  Chou-Fasman - based on amino acid propensities (3-state)
  GOR - also propensity-based (4-state)
• 2nd Generation methods
  based on much larger datasets of structures now available
  GOR II, III, IV, SOPM
• 3rd Generation methods
  Homology-based & Neural network based
  PHD, PSIPRED, SSPRO, PROF, HMMSTR
• Meta-Servers combine several different methods
  Consensus & Ensemble based
  JPRED, PredictProtein

Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state) in cross-validation experiments
Best results? Meta-servers
• http://expasy.org/tools/ (scroll for 2° (secondary) structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ (Rost, Columbia)
Best individual programs? ??

Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V:
  • FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases
  • GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which FDM method cannot find suitable structures
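
The Q3 score and the consensus idea behind meta-servers can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: predictions and the observed structure are given as equal-length strings over H/E/C, the function names `q3_score` and `majority_consensus` are hypothetical, and the per-residue majority vote is a generic stand-in, not the actual combination scheme used by JPred, PredictProtein, or CDM.

```python
from collections import Counter


def q3_score(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose 3-state label (H/E/C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_consensus(predictions: list[str], tie_state: str = "C") -> str:
    """Per-residue majority vote over several equal-length predictions.

    Ties fall back to `tie_state` (coil by default); this tie rule is an
    assumption made for the sketch, not taken from any real meta-server.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        state, votes = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == votes:
            state = tie_state  # no clear winner at this residue
        consensus.append(state)
    return "".join(consensus)


if __name__ == "__main__":
    # Made-up toy strings, not real prediction output.
    observed = "CHHHHHCCEEEEC"
    methods = ["CHHHHCCCEEEEC", "CCHHHHCCEEECC", "HHHHHHCCCEEEC"]
    for m in methods:
        print(m, round(q3_score(m, observed), 1))
    combined = majority_consensus(methods)
    print(combined, round(q3_score(combined, observed), 1))
```

In this toy case each method makes its errors at different residues, so the vote recovers the observed string; real methods often share errors, which is why the gain from combining them is usually smaller.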

Individual prediction programs:
• CDM http://gor.bb.iastate.edu/cdm/ (Sen…Jernigan, ISU)
• GOR V http://gor.bb.iastate.edu/ (Kloczkowski…Jernigan, ISU)
• For references & additional details on CDM: http://gor.bb.iastate.edu/cdm/

Secondary Structure Prediction: for Different Types of Proteins/Domains
For Complete proteins:
• Globular Proteins - use methods previously described
• Transmembrane (TM) Proteins - use special methods (next slides)
For Structural Domains: many under development:
• Coiled-Coil Domains (Protein interaction domains)
• Zinc Finger Domains (DNA binding domains), others…

SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~30% of cellular proteins are membrane-associated!
• Hard to determine experimentally, so prediction important
• TM domains are relatively 'easy' to predict!
  Why? constraints due to hydrophobic environment
2 main classes of TM proteins:
• α-helical
• β-barrel

SS Prediction for TM α-Helices
α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule
  Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor, are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences

SS Prediction for TM β-Barrels
β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
• TBBPred - uses NN or SVM (more on these ML methods later)
Accuracy ?

Prediction of Coiled-Coil Domains
Coiled-coils
• Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions
'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
  • 1 & 4 = hydrophobic (facing helical interface)
  • 2,3,5,6,7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to detect these manually, based on amino acid sequence
Servers?
• Coils, Multicoil - probability-based methods
• 2Zip - for Leucine zippers = special type of CC in TFs:
  characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L

Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
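
Returning to the coiled-coil slide: the leucine-zipper motif L-X(6)-L-X(6)-L-X(6)-L and the heptad rule (positions 1 & 4 hydrophobic) are simple enough to check by script. The following is a rough sketch, not how Coils, Multicoil, or 2Zip work internally (those are described above only as probability-based or zipper-specific servers). The function names, the hydrophobic residue set `AILMFVWY`, and the test peptide are all assumptions made for illustration.

```python
import re

# Assumption: one common choice of "hydrophobic" residues; other sets exist.
HYDROPHOBIC = set("AILMFVWY")

# L-X(6)-L-X(6)-L-X(6)-L written as a regular expression. The lookahead
# wrapper lets overlapping occurrences be reported as separate hits.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every motif occurrence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LEUCINE_ZIPPER.finditer(seq)]


def heptad_hydrophobic_fraction(seq: str, register: int = 0) -> float:
    """Fraction of heptad positions 1 and 4 that hold hydrophobic residues.

    `register` is the 0-based index of a residue assumed to sit at heptad
    position 1; positions 1 and 4 are then offsets 0 and 3 modulo 7.
    """
    seq = seq.upper()
    core = [seq[i] for i in range(len(seq)) if (i - register) % 7 in (0, 3)]
    if not core:
        raise ValueError("sequence too short for this register")
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(seq: str) -> tuple[int, float]:
    """Try all 7 possible registers and return the most hydrophobic one."""
    scores = [(r, heptad_hydrophobic_fraction(seq, r)) for r in range(7)]
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up peptide: four copies of the idealized heptad LEKIEEK.
    peptide = "LEKIEEK" * 4
    print(find_leucine_zippers(peptide))
    print(best_register(peptide))
```

A high fraction at one register only suggests heptad periodicity; a real predictor would also weigh helix propensity and compare against known coiled-coil sequences.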