#22 - Secondary & Tertiary Structure Prediction 10/12/07
BCB 444/544 F07 ISU Dobbs

BCB 444/544 Lecture 22
Secondary Structure Prediction
Tertiary Structure Prediction
#22_Oct10

Required Reading (before lecture)
Mon Oct 8 - Lecture 20
Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213
Wed Oct 10 - Lecture 21
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230

Assignments & Announcements
ALL: HomeWork #3
  √ Due: Mon Oct 8 by 5 PM
HW544: HW544Extra #1
  √ Due: Task 1.1 - Mon Oct 1 by noon
  Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM
444 "Project-instead-of-Final" students should also submit:
• HW544Extra #1
  √ Due: Task 1.1 - Mon Oct 8 by noon
  Due: Task 1.2 - Fri Oct 12 by 5 PM
  <Task 2 NOT required for BCB444 students>

New Reading & Homework Assignment
ALL: HomeWork #4 (posted online today)
  Due: Fri Oct 19 by 5 PM (one week from today)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91.
  http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Fri Oct 12

Seminars this Week - (yesterday)
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 11 Thurs
  • Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
    The Computational Microscope
    2:10 PM in E164 Lagomarcino
    http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
  • Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
    ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations
    3:30 PM in Howe Hall Auditorium
    http://www.cs.iastate.edu/~colloq/new/gusfield.shtml

Seminars this Week - Fri (today)
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 12 Fri
  • Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
    TBA: "Structural Biology" (see URL below)
    2:10 PM in 102 Sci
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
  • Dr. Srinivas Aluru (ECprE, ISU) - GDCB Seminar
    Consensus Genetic Maps: A Graph Theoretic Approach
    4:10 PM in 1414 MBB
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf

Chp 12 - Protein Structure Basics
SECTION V: STRUCTURAL BIOINFORMATICS
Xiong: Chp 12 Protein Structure Basics
• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures
• Determination of Protein 3-Dimensional Structure
• Protein Structure DataBank (PDB)

Experimental Determination of 3D Structure
2 Major Methods to obtain high-resolution structures
1. X-ray Crystallography (most PDB structures)
2. Nuclear Magnetic Resonance (NMR) Spectroscopy
Note Advantages & Limitations of each method
• (See your lecture notes & textbook)
• For more info: http://en.wikipedia.org/wiki/Protein_structure
3. Other methods (usually lower resolution, at present):
• Circular Dichroism (CD), several other spectroscopic methods
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM)
• Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
  http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf

"Best" Resolution of Protein Structures
• High-resolution methods
  • X-ray crystallography (< 1 Å)
  • NMR (~1 - 2.5 Å)
• Lower-resolution methods
  • Cryo-EM (~10-15 Å)
• Theoretical Models?
  • Usually low resolution, at present, but
  • Highly variable - & a few ~ crystal data
(Figures: Pevsner Fig 9.36; Baker & Sali (2000))

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V: STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison - later
• Protein Structure Classification

Protein Structure Classification
• SCOP = Structural Classification of Proteins
  Levels reflect both evolutionary and structural relationships
  http://scop.mrc-lmb.cam.ac.uk/scop
• CATH = Classification by Class, Architecture, Topology & Homology
  http://cathwww.biochem.ucl.ac.uk/latest/
• DALI - (recently moved to EBI & reorganized)
  DALI Database (fold classification)
  http://ekhidna.biocenter.helsinki.fi/dali/start
Each method has strengths & weaknesses….

Chp 14 - Secondary Structure Prediction
SECTION V: STRUCTURAL BIOINFORMATICS
Xiong: Chp 14 Protein Secondary Structure Prediction
• Secondary Structure Prediction for Globular Proteins
• Secondary Structure Prediction for Transmembrane Proteins
• Coiled-Coil Prediction

Secondary Structure Prediction
Has become highly accurate in recent years (>85%)
• Usually 3 (or 4) state predictions:
  • H = α-helix
  • E = β-strand
  • C = coil (or loop)
  • (T = turn)

Secondary Structure Prediction Methods
• 1st Generation methods
  Ab initio - used relatively small dataset of structures available
  Chou-Fasman - based on amino acid propensities (3-state)
  GOR - also propensity-based (4-state)
• 2nd Generation methods
  Based on much larger datasets of structures now available
  GOR II, III, IV, SOPM, GOR V, FDM
• 3rd Generation methods
  Homology-based & Neural network based
  PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM
• Meta-Servers
  Combine several different methods; Consensus & Ensemble based
  JPRED, PredictProtein, Proteus

Secondary Structure Prediction Servers
Prediction Evaluation?
• Q3 score - % of residues correctly predicted (3-state) in cross-validation experiments
Best results? Meta-servers
• http://expasy.org/tools/ (scroll for 2° structure prediction)
• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html
• JPred www.compbio.dundee.ac.uk/~www-jpred
• PredictProtein http://www.predictprotein.org/ (Rost, Columbia)
Best "individual" programs? ??
• CDM http://gor.bb.iastate.edu/cdm/ (Sen…Jernigan, ISU)
• FDM (not available separately as server) (Cheng…Jernigan, ISU)
• GOR V http://gor.bb.iastate.edu/ (Kloczkowski…Jernigan, ISU)

Consensus Data Mining (CDM)
• Developed by Jernigan Group at ISU
• Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V:
  • FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases
  • GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which the FDM method cannot find suitable structures
• For references & additional details: http://gor.bb.iastate.edu/cdm/

Where Find "Actual" Secondary Structure?
In the PDB (figure rows: DSSP, Author)

How Does Predicted Secondary Structure Compare?
e.g., from CDM (figure rows: Query, GOR V, FDM, CDM)
Query  MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V  CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM    CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM    CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC

Secondary Structure Prediction: for Different Types of Proteins/Domains
For Complete proteins:
• Globular Proteins - use methods previously described
• Transmembrane (TM) Proteins - use special methods (next slides)
For Structural Domains: many under development:
• Coiled-Coil Domains (Protein interaction domains)
• Zinc Finger Domains (DNA binding domains), others…

SS Prediction for Transmembrane Proteins
Transmembrane (TM) Proteins
• Only a few in the PDB - but ~30% of cellular proteins are membrane-associated!
• Hard to determine experimentally, so prediction important
• TM domains are relatively 'easy' to predict!
  Why? Constraints due to hydrophobic environment
2 main classes of TM proteins:
• α-helical
• β-barrel

SS Prediction for TM α-Helices
α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule:
  Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor, are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane
Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences

SS Prediction for TM β-Barrels
β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening
Servers? Harder problem, fewer servers…
• TBBPred - uses NN or SVM (more on these ML methods later)
  Accuracy ?
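
The two α-helical TM rules above (a 17-25 residue hydrophobic stretch, and the "positive inside" rule) can be illustrated with a simple sliding-window scan. This is only a minimal sketch, not how TMHMM, HMMTOP or Phobius work (those use HMMs). Assumptions not taken from the lecture: the Kyte-Doolittle hydropathy scale, a 19-residue window, a mean-hydropathy cutoff of 1.6, a 5-residue flank for counting K/R, and a made-up toy sequence.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the lecture does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full-length window along seq."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def best_window(seq, window=19):
    """Return (start, mean) of the most hydrophobic window (0-based start)."""
    means = window_means(seq, window)
    start = max(range(len(means)), key=means.__getitem__)
    return start, means[start]


def flank_positive_counts(seq, start, end, flank=5):
    """Count K/R in the residues just before and just after seq[start:end]."""
    before = seq[max(0, start - flank):start]
    after = seq[end:end + flank]
    count = lambda s: sum(aa in "KR" for aa in s)
    return count(before), count(after)


# Hypothetical toy sequence: charged N-flank, 19 hydrophobic residues, polar C-flank.
toy = "MKKRKR" + "LLIVALLAVILLGAVLLIA" + "DESGTNQS"

start, mean = best_window(toy)
is_candidate = mean >= 1.6          # assumed cutoff for a TM-like window
n_pos, c_pos = flank_positive_counts(toy, start, start + 19)
# Under the "positive inside" rule, the flank with more K/R is the likelier cytosolic side.
cytosolic_side = "N-terminal" if n_pos > c_pos else "C-terminal"
print(start, round(mean, 2), is_candidate, n_pos, c_pos, cytosolic_side)
```

A scan like this is also confused by hydrophobic signal peptides, which is the weakness noted above for TMHMM and HMMTOP and the reason Phobius models signal peptides separately.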
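
Returning to the Q3 score defined under "Prediction Evaluation?": it is simply the percentage of residues whose predicted 3-state label (H, E or C) matches the observed one. A minimal sketch follows; the observed and predicted strings are hypothetical toy data, not taken from the CDM example, and the per-state breakdown is an extra illustration rather than part of the Q3 definition.

```python
def q3_score(observed, predicted):
    """Percent of positions where the predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def per_state_accuracy(observed, predicted, states="HEC"):
    """Percent of observed residues of each state that were predicted correctly."""
    result = {}
    for state in states:
        total = sum(o == state for o in observed)
        if total == 0:
            continue  # state does not occur in the observed string
        correct = sum(o == state and p == state
                      for o, p in zip(observed, predicted))
        result[state] = 100.0 * correct / total
    return result


# Hypothetical toy data (e.g., observed = DSSP reduced to 3 states).
observed = "CCHHHHHHCCEEEECC"
predicted = "CCCHHHHHCCEEECCC"

print(q3_score(observed, predicted))            # 14 of 16 residues agree
print(per_state_accuracy(observed, predicted))
```

In practice the observed string would come from the structure itself (e.g., the DSSP row in the PDB), and the score would be averaged over a cross-validation set rather than a single protein.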

Prediction of Coiled-Coil Domains
Coiled-coils
• Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions
'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
  • 1 & 4 = hydrophobic (facing helical interface)
  • 2, 3, 5, 6, 7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to detect these manually, based on amino acid sequence
Servers?
• Coils, Multicoil - probability-based methods
• 2Zip - for Leucine zippers = special type of CC in TFs: characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L

Chp 15 - Tertiary Structure Prediction
SECTION V: STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Structural Genomics - Status & Goal
~ 20,000 "traditional" genes in human genome (recall, this is fewer than earlier estimate of 30,000)
~ 2,000 proteins in a typical cell
> 4.9 million sequences in UniProt (Oct 2007)
> 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Structural Genomics Projects
TargetDB: database of structural genomics targets
http://targetdb.pdb.org

Protein Sequence & Structure: Analysis
• Diamond STING Millennium - Many useful structure analysis tools, including Protein Dossier
  http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) - Protein knowledgebase
  http://us.expasy.org/sprot
• InterPro - Sequence analysis tools
  http://www.ebi.ac.uk/interpro

Protein Structure Prediction or Protein Folding Problem
• Protein Structure Prediction or "Protein Folding" Problem
  Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
• "Inverse Folding" Problem
  Given a protein fold, identify every amino acid sequence that can adopt that 3-dimensional structure

Deciphering the Protein Folding Code
"Major unsolved problem in molecular biology"
In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones
In vitro:
• many proteins can fold to their "native" states spontaneously & without assistance
• but, many do not!

Protein Structure Prediction
Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determination of folding pathway
2- Prediction of tertiary structure from sequence
Both still largely unsolved problems

Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of some 2° structures
Native state?
• assumed to be lowest free energy
• may be an ensemble of structures

Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins requires conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy

Difficulty of Tertiary Structure Prediction
Folding or tertiary structure prediction problem can be formulated as a search for minimum energy conformation
• Search space is defined by psi/phi angles of backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially with number of residues
Computationally it is an exceedingly difficult problem!

From Thursday's Lab:
• Homology Modeling - using SWISS-MODEL
  • http://swissmodel.expasy.org//SWISS-MODEL.html
• Threading - using 3-D JURY (BioinfoBank, a METAserver)
  • http://meta.bioinfo.pl/submit_wizard.pl
• Be sure to take a look at CASP contest:
  • http://predictioncenter.gc.ucdavis.edu/
  • CASP7 contest in 2006
  • http://www.predictioncenter.org/casp7/Casp7.html
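
To make the "search space is enormous" point under "Difficulty of Tertiary Structure Prediction" concrete, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions, not values from the lecture: each residue is restricted to just 3 discrete backbone conformations (side-chain rotamers ignored), and a hypothetical search samples 10^13 conformations per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformations(n_residues, states_per_residue=3):
    """Number of backbone conformations if every residue has a few discrete states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue=3,
                            rate_per_second=1e13):
    """Years needed to enumerate every conformation at an assumed sampling rate."""
    seconds = conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR


for n in (10, 50, 100):
    print(f"{n:>3} residues: {conformations(n):.2e} conformations, "
          f"{exhaustive_search_years(n):.2e} years to enumerate")
```

Even with this coarse discretization the count grows exponentially with chain length, which is why practical methods (homology modeling, threading, ab initio sampling) restrict or guide the search rather than enumerate it.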