#23 - Protein Tertiary Structure Prediction 10/15/07
BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction 10/17/07
BCB 444/544 Fall 07 Dobbs

BCB 444/544
Lecture 24
Protein Tertiary Structure Prediction
#24_Oct17

Required Reading (before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8
(Terribilini) RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112

New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read:
Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874
(PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Sat Oct 13

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdeve Sidhu (Genentech) Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA

Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
  • Homology Modeling (easiest!)
  • Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)

A Local Example: Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics; Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB; Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697

Steps in Threading
[Figure: target sequence ALKKGF…HFDTSE threaded onto structure templates]
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3.
Rank models based on energy scores
(Cao et al. 2004, Polymer 45:687-697)

Simplify: Template structure representation
• Template structure is represented by an $N \times N$ contact matrix $C$:
  $C_{ij} = 1$ if $r_{ij} \le 6.5$ Å (contact)
  $C_{ij} = 0$ otherwise (non-contact, or a neighbor in sequence)
(Yungok Ihm)

Simplify: Energy Function
• At residue level, pair-wise hydrophobic interaction is dominant
• Interaction "counts" only if two hydrophobic amino acid residues are in contact
  $E = \sum_{i,j} C_{ij} U_{ij}$
  $C_{ij}$: contact matrix
  $U_{ij} = U(\text{residue } i, \text{residue } j)$
• Choices for $U$:
  MJ: $U = U_{ij}$
  LTW: $U = Q_i Q_j$
  HP: $U = \{1, 0\}$
(Yungok Ihm)

Energy calculation: Contact energy
• Miyazawa-Jernigan (MJ) matrix: statistical potential, 210 parameters
  [Figure: excerpt of the MJ matrix for residues C, M, F, I, L, V, W]
• Li-Tang-Wingreen (LTW): 20 parameters
  $\tilde{M}_{ij} = C_2 \{ (q_i + \alpha)(q_j + \alpha) + \beta \}$, with $q_i$ ~ solubility
• Contact Energy:
  $E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$, with $Q_i = -q_i - \alpha$, $Q_i$ ~ hydrophobicity
(Yungok Ihm)

Summary of Ho Threading Procedure
• Template Structure -> Contact Matrix:
  $C_{ij} = 1$ if $r_{ij} \le 6.5$ Å; $C_{ij} = 0$ otherwise (a neighbor in sequence)
• Sequence (AVFMRIHNDIVYNDIANTTQ) -> Sequence Vector:
  $S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997, 0.9897, 1.1197, \ldots, 0.6497)$
• Scoring Function: $\alpha = -0.6797$, $\beta = -0.2604$
• Contact Energy: $E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$
(Yungok Ihm)

Can complexity be further reduced?
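As a rough illustration of the contact-energy score on the preceding slides, the sketch below builds a contact matrix from coordinates and evaluates $E_c = \sum_{i,j} (Q_i C_{ij} Q_j + \beta C_{ij})$. The 6.5 Å cutoff, β, and the four Q values are taken from the slides. Everything else is an assumption made for the example: the toy coordinates, the four-residue peptide AVFE, the function names, the reading of "a neighbor in sequence" as |i - j| < 2, and the assignment of 0.6497 to residue E. It is not the published implementation.

```python
from math import dist

# Values quoted on the slides.
CUTOFF = 6.5      # contact distance cutoff in angstroms
BETA = -0.2604    # constant per-contact term in the score

# Q values for the four residue types listed on the summary slide.
# Assumption: 0.6497 is the entry for E, the last residue type in the vector.
Q_VALUES = {"A": 0.7997, "V": 0.9897, "F": 1.1197, "E": 0.6497}


def contact_matrix(coords, cutoff=CUTOFF, min_separation=2):
    """Return C with C[i][j] = 1 when residues i and j are in contact.

    Assumption: pairs closer than `min_separation` positions along the chain
    (a residue with itself or with its sequence neighbor) never count.
    """
    n = len(coords)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= min_separation and dist(coords[i], coords[j]) <= cutoff:
                C[i][j] = 1
    return C


def contact_energy(sequence, C, beta=BETA):
    """Sum Q_i * C_ij * Q_j + beta * C_ij over all ordered pairs (i, j)."""
    Q = [Q_VALUES[res] for res in sequence]
    n = len(Q)
    return sum(
        Q[i] * C[i][j] * Q[j] + beta * C[i][j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical template: four residues on the corners of a 3.8 A square.
template_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
C = contact_matrix(template_coords)
energy = contact_energy("AVFE", C)
print(C)
print(round(energy, 4))
```

Because the sum runs over ordered pairs, each contact is counted twice. That does not change how templates rank against each other as long as every template is scored the same way.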
Represent contact matrix by its dominant eigenvector (1D profile)
Consider simplifying structure representation, too:
• Sequence - Structure (1D - 3D problem)
• Sequence - Contact Matrix (1D - 2D problem)
• Sequence - 1D Profile (1D - 1D problem)
• First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
• Higher ranking (rank > 4) eigenvectors are "sequence blind"
(Haibo Cao)

Threading Alignment Step - now fast!
• Align target sequence vector (1D) with eigenvector profile of template structure (1D)
  Sequence: ALKKGFG…HFDTSE
  1D Profile: $P = V_1$
• Maximize the overlap between the Sequence (S) and the profile (P), $S \cdot P$, allowing gaps
• New profile: $P = CP$
• Calculate contact energy $E_c$ using the alignment
(Haibo Cao; Cao et al Polymer 45 (2004))

Parameters for alignment?
• Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops
  Gap penalties apply to alignment score only, not to energy calculation
• Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized
  Size penalties apply to alignment score only, not to energy calculation
[Figure labels: Helix, Loop]
(Yungok Ihm)

How much better is this "fit" than random?
How incorporate secondary structure?
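The 1D reduction described above can be sketched as follows: take the dominant eigenvector of a contact matrix as the profile P, then slide a sequence vector S along it and keep the offset with the largest overlap S · P. This is a simplified sketch under stated assumptions. The 4 x 4 contact matrix is hypothetical, the Q values are the four quoted on the summary slide (with 0.6497 assumed to belong to E), the six-residue target AVFEAV is made up, and gaps and the helix/strand/size penalties are left out entirely. The shifted power iteration is a generic textbook method, not necessarily how the authors computed the eigenvector.

```python
from math import sqrt


def dominant_eigenvector(C, iterations=500):
    """Approximate the eigenvector of C with the largest eigenvalue.

    Uses power iteration on C + I; the identity shift keeps the iteration
    from oscillating when C also has an eigenvalue of equal size and
    opposite sign.
    """
    n = len(C)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def best_ungapped_overlap(S, P):
    """Slide profile P along sequence vector S; return (offset, overlap) of the best window."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(S) - len(P) + 1):
        # zip stops at the end of P, so each window has len(P) residues.
        score = sum(s * p for s, p in zip(S[offset:], P))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical symmetric contact matrix for a four-residue template.
C = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
profile = dominant_eigenvector(C)

# Q values quoted on the slides; the target sequence is made up.
Q_VALUES = {"A": 0.7997, "V": 0.9897, "F": 1.1197, "E": 0.6497}
target = "AVFEAV"
S = [Q_VALUES[res] for res in target]

offset, overlap = best_ungapped_overlap(S, profile)
print([round(x, 3) for x in profile])
print(offset, round(overlap, 3))
```

Residues with more contacts get larger profile entries, so the overlap is highest when residues with large Q values line up with well-connected template positions. A real threading alignment would add gapped dynamic programming on top of this 1D comparison.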
• Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
• N+ = total number of matches between predicted & actual secondary structure of template
• N- = total number of mismatches
• Ns = total number of residues selected in alignment
• "Global fitness": f = 1 + (N+ - N-) / Ns
• E score modified to reflect fit with predicted 2° structure: Emod = f * Ethreading
(Yungok Ihm)

Fit relative to random:
• Eshuffled: shuffled sequence vs structure = avg E score for same sequence shuffled (randomized) many times
• Erelative = Emod - Eshuffled
(Yungok Ihm)

Typical Results: "Blind Test" (well, actually, our BEST Results)
• HO = #1-Ranked CASP5 Prediction for this Target
• Target 174, PDB ID = 1MG7
[Figure: Predicted Structure (T174_1, T174_2) vs Actual Structure]

Performance Evaluation?
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
• Given: Amino acid sequence
• Goal: Predict 3-D structure (before experimental results published)

Overall Performance in CASP5 Contest
FR Fold Recognition (targets manually assessed by Nick Grishin)

Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00   9    12    Ginalski
2     21.64    7.00   12.00   7    12    Skolnick Kolinski
3     19.55    8.00   12.50   9    14    Baker
4     16.88    6.00   10.00   6    10    BIOINFO.PL
5     15.25    7.00    7.00   7     7    Shortle
6     14.56    6.50   11.50   7    13    BAKER-ROBETTA
7     13.49    4.00   11.00   4    11    Brooks
8     11.34    3.00    6.00   3     6    Ho-Kai-Ming
9     10.45    3.00    5.50   3     6    Jones-NewFold

FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
Cao, Ihm, Wang, Dobbs, Ho: ~8th out of 180 (M. Levitt, Stanford)

CASP - Check it out!
Critical Assessment of Protein Structure Prediction
• http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
  • http://www.predictioncenter.org/casp7/Casp7.html
  • Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
  • Protein Function Prediction (part of CASP)
  • CAPRI = Critical Assessment of Predicted Interactions
  • New: CASPM = CASP for M = Mutant proteins
    • Predict effects of small (point) mutations, e.g., SNPs

Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification

Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But, very active research area - many recent new methods
3 Popular Methods:
• DALI = Distance Matrix Alignment of Structures (Holm)
  • FSSP Database
• SSAP = Sequential Structure Alignment Program (Orengo)
  • CATH Database
• CE = Combinatorial Extension (Bourne)
  • VAST at NCBI
URLs: http://en.wikipedia.org/wiki/Structural_alignment_software

Another local example: Combining Structure Prediction, Machine Learning & "Real" (wet-lab) Experiments to Investigate the Lentiviral Rev Protein: A Step Toward New HIV Therapies
Susan Carpenter (Washington State Univ); Wendy Sparks, Yvonne Wannemuehler
Drena Dobbs, GDCB; Jae-Hyung Lee, Michael Terribilini
Kai-Ming Ho, Physics; Yungok Ihm, Haibo Cao, Cai-zhuang Wang
Gloria Culver, BBMB; Laura Dutca

Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV)
[Figure: Rev in the lentiviral life cycle (provirus, pre-mRNA, spliceosome, nucleus, cytoplasm, Tat, progeny RNA; Early: Regulatory Proteins; Late: Structural Proteins). Labeled interactions: RNA BINDING (protein-RNA), MULTIMERIZATION (protein-protein), NUCLEAR EXPORT (protein-protein), NUCLEAR IMPORT (protein-protein)]
(Susan Carpenter)

Rev is essential for lentiviral replication
• Rev is a small nucleocytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA: Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced viral RNAs from nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy

Problem: no high resolution Rev structure! Not even for HIV Rev, despite intense effort ($$)
• Why??
  • Rev aggregates at concentrations needed for NMR or X-ray crystallography
• What about insights from sequence comparisons?
  • "Undetectable" sequence similarity among Revs from different lentiviruses (e.g., EIAV vs HIV <10%)
• But:
  • We know that lentiviral Rev proteins are functionally "homologous" - even in highly diverse lentiviruses

Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
• Computationally model structures of lentiviral Rev proteins
  - using structural threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction
  - using machine learning algorithms (with Honavar et al)
• Test model and predictions
  - using genetic/biochemical approaches (with Carpenter & Culver)
  - using biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE

Predicted EIAV Rev Structure
Functional domains: EIAV vs HIV Rev
[Figure: EIAV Rev domain diagram (exon 1, exon 2; residues 1, 31, 165; RBM; "Folding?")]
Domain labels: EIAV Rev: NES, NLS, RRDRW, ERLE, KRRRK; HIV-1 Rev (1-116): NLS/RBM (RQARRNRRRRWR), NES
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
(Yungok Ihm)

Comparison of Predicted Rev Structures
[Figure: predicted structures of EIAV, FIV, SIV and HIV Rev; panels A, B, C; HIV dimer; dimer overlay]
(Yungok Ihm)

Predicted vs Experimental Structure of N-terminal region of HIV Rev
[Figure: predicted structure of HIV Rev N-terminus; NMR structure of HIV Rev N-terminal peptide (Battiste & Williamson); alignment of predicted & NMR structures]
(Yungok Ihm)

Location of functional residues EIAV Rev?
[Figure: predicted EIAV Rev structure with NES and putative RBM marked]
• Leu95 & Leu109
• Leu36, 45, 49: On surface, consistent with role in nuclear export
(Yungok Ihm)

Mutate hydrophobic residues predicted to be critical for helical packing in core
• L65 vs L95 & L109: buried in core, critical hydrophobic contacts for fold?
• Single mutants: Leu to Ala, Leu to Asp
• Double mutants: Leu to Ala
• Single Ala mutation (L ⇒ A): negligible effect on Rev activity
• Single Asp mutation (L ⇒ D), inserting a charged aa in the hydrophobic core: dramatic change in Rev activity?
• Double Ala mutation (L⇔L ⇒ A⇔A): reduction in Rev activity?
(Yungok Ihm)

Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
Activity of Rev Structural Mutants
[Figure: bar chart of Rev activity (axis ticks 50, 100, 150). Controls: RI, Sham, pcDNA3, HIV-1 Rev. Single Mutations: L65, L95, L109 (each as A and D). Double Mutations: L65A L95A, L65A L109A, L95A L109A]
(Wendy Sparks)

Functional domains: EIAV vs HIV Rev
[Figure: domain diagrams for EIAV Rev and HIV-1 Rev. Red - RNA interaction; Green - Protein interaction]

Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure
[Figure: predicted structure with NES, NLS, RRDRW, ERLE and KRRRK marked; '+' marks predicted RNA-binding residues along the sequence]
Sequence rows (position labels 61, 71, 81, 91 and 121, 131, 141, 151, 161):
ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
(Yungok Ihm, Michael Terribilini)

Express & purify MBP-ERev deletion mutants
[Figure: MBP-ERev constructs 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165 on the domain map (NES, RBM, NLS; "Folding?"); gel with marker and MBP]
(Jae-Hyung Lee)

MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking with 32P-RRE (labels: sense, antisense, no protein, MBP, BSA, 1-165, 31-165) and competition with cold RRE (labels: no cold RRE, cold RRE, WT, MBP, 31-165, 31-145, 57-165, 145-165, undigested 32P-RRE)]
(Jae-Hyung Lee)

EIAV Rev: Binding Predictions vs Experiments
[Figure: PREDICTED (from structure): protein binding residues and RNA binding residues; VALIDATED: protein binding residues and RNA binding residues; each shown as a '+' track along the sequence, with RRDRW, ERLE and KRRRK marked on the domain map (residues 31, 57, 125, 145, 165; NES, RBM, NLS/RBM; "FOLD?")]
Sequence rows (position labels 41, 51; 61, 71, 81, 91; 131, 141, 151, 161):
GPLESDQWCRVLRQSLPEEKISSQTCI
ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
(Jae-Hyung Lee)
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Rev RNA Binding Motifs: Predicted vs Experiment
[Figure: the same PREDICTED and VALIDATED '+' tracks along the EIAV Rev sequence; domain map with RBD regions (residues 1, 31, 57, 124, 146, 165), NES, NLS, RRDRW, ERLE, KRRRK]

Roles of Putative RNA Binding Motifs?
[Figure: motif mutants on the domain map: WT (RRDRW, ERLE, KRRRK), AADAA, AALA, ERDE, KAAAK, ∅]
(Jae-Hyung Lee)

Summary: Predictions vs Experiments
[Figure: domain map (residues 31, 57, 125, 145, 165; NES, RBM, NLS/RBM; "FOLD?") with motif mutants AADAA, ERDE, AALA, KAAAK, ∅ and the predicted and validated '+' tracks]
(Jae-Hyung Lee)

Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
• EIAV Rev has a bipartite RNA binding domain
• Two Arg-rich RBMs are critical
  • RRDRW in central region (but not ERLE)
  • KRRRK at C-terminus, overlapping the NLS
• Based on computational modeling, the RBMs are in close proximity within the 3-D structure of protein
• Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Future:
• Computational: Use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
• Experimental? Experimentally determine the structure of Rev-RRE complex !!!
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Building "Designer" Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation

RNA Function
• Storage/transfer of genetic information
• Newly discovered regulatory functions - RNAi pathways especially
• Catalytic

RNA types & functions
Types of RNAs: Primary Function(s)
• mRNA - messenger: translation (protein synthesis); regulatory
• rRNA - ribosomal: translation (protein synthesis)
• t-RNA - transfer: translation (protein synthesis)
• hnRNA - heterogeneous nuclear: precursors & intermediates of mature mRNAs & other RNAs
• scRNA - small cytoplasmic: signal recognition particle (SRP); tRNA processing <catalytic>
• snRNA - small nuclear: mRNA processing, poly A addition <catalytic>
• snoRNA - small nucleolar: rRNA processing/maturation/methylation
• regulatory RNAs (siRNA, miRNA, etc.): regulation of transcription and translation, other??

RNA Structure
• RNA forms complex 3D structures
• Mainly single stranded
• The single RNA strand can self-hybridize to form base paired regions
(Rob Knight, Univ Colorado)

Levels of RNA Structure
• Like proteins, RNA has primary, secondary, and tertiary structures
• Primary structure - base sequence
• Secondary structure - single stranded or base paired
• Tertiary structure - 3D structure

RNA Structure Prediction
• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure
• Given an RNA sequence, predict the secondary structure of the molecule
• Almost all methods ignore higher order secondary structures like pseudoknots

Base Pairing in RNA
• G-C, A-U, G-U ("wobble") & variants
• See: IMB Image Library of Biological Molecules
  http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs

Common structural motifs in RNA
• Helices
• Loops
  • Hairpin
  • Interior
  • Bulge
  • Multibranch
• Pseudoknots
(Fig 6.2 Baxevanis & Ouellette 2005)

RNA Secondary Structure Prediction Methods
• Two main types of methods
  • Ab initio - based on calculating the most energetically favorable secondary structure
  • Comparative approach - based on evolutionary comparison of multiple related RNA sequences

Ab Initio Prediction
• Only requires a single RNA sequence
• Calculates minimum free energy structure
• Base pairing lowers free energy of the structure, so methods attempt to find secondary structure with maximal base pairing

Ab Initio Energy Calculation
• Free energy is calculated based on parameters determined in the wet lab
• Known energy associated with each type of base pair
• Base pair formation is not independent - multiple base pairs adjacent to each other are more favorable than individual base pairs - cooperative
• Bulges and loops adjacent to base pairs have a free energy penalty

Ab Initio Prediction Method
• Search for all possible base-pairing patterns
• Calculate the total energy of the structure based on all stabilizing and destabilizing forces
(Fig 6.3 Baxevanis & Ouellette 2005)

Dot Matrices
• Can be used to find all possible base pair patterns
• Compare the input sequence to itself and put a dot anywhere there is a complementary base
(R Knight 2005)

Dynamic Programming
• Finding the best possible secondary structure is difficult - lots of possibilities
• Compare RNA sequence with itself
• Apply scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces
• Find path that represents the most energetically favorable secondary structure

Problem
• DP returns the SINGLE best structure
• There may be many structures with similar energies
• Also, your predicted secondary structure is only as good as the energy parameters used
• Solution - return multiple structures with near optimal energies

Popular Ab Initio Prediction Programs
• Mfold
  • Combines DP with thermodynamic calculations
  • Fairly accurate for short sequences, less accurate as sequence length increases
• RNAfold
  • Returns multiple structures near the optimal structure
  • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function

Comparative Approach
• Uses multiple sequence alignment
• Assumes related sequences fold into the same secondary structure

Covariation
• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base paired residue must be compensated for by a mutation in the base that it pairs with
• Comparative methods search for covariation patterns in MSA

Consensus Structures
• Predict secondary structure of each individual sequence
• Compare all structures and see if there is a most common structure

Popular Comparative Prediction Programs
• Two types
  • Require user to provide MSA
  • No MSA required

RNAalifold
• Requires user to provide the MSA
• Creates a scoring matrix combining minimum free energy and covariation information
• DP is used to select the minimum free energy structure

Foldalign
• User provides a pair of unaligned RNA sequences
• Foldalign constructs alignment then computes a commonly conserved structure
• Suitable only for short sequences

Dynalign
• User provides two input sequences
• Dynalign calculates possible secondary structures using algorithm similar to Mfold
• Dynalign compares multiple structures from both sequences to find a common structure

Performance Evaluation
• Ab initio methods achieve correlation coefficient of 20-60%
• Comparative approaches achieve correlation coefficient of 20-80%
• Programs that require user to supply MSA are more accurate
• Comparative programs are consistently more accurate than ab initio programs
• Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures! - Gutell, Pace
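To make the dynamic-programming idea from the ab initio slides concrete, here is a minimal sketch of the classic Nussinov recursion, which simply maximizes the number of base pairs (G-C, A-U, G-U). It is a stand-in for illustration only: Mfold and RNAfold score structures with experimentally derived free-energy parameters, stacking and loop penalties, none of which are modeled here. The function names, the minimum hairpin loop of 3 unpaired bases, and the example sequence are assumptions.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Fill table N where N[i][j] is the maximum number of pairs in seq[i..j]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                  # split into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N


def traceback(N, seq, min_loop=3):
    """Recover one optimal set of base pairs from the filled table."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to hold a pair
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and N[i][j] == N[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return pairs


def dot_bracket(seq, pairs):
    """Render pairs as a dot-bracket string."""
    out = ["."] * len(seq)
    for i, j in pairs:
        out[i], out[j] = "(", ")"
    return "".join(out)


seq = "GGGAAAUCCC"  # made-up hairpin-like example
table = nussinov(seq)
pairs = traceback(table, seq)
structure = dot_bracket(seq, pairs)
print(table[0][len(seq) - 1], structure)
```

Several different structures can reach the same maximum pair count for a sequence like this one, which mirrors the point on the "Problem" slide: DP returns a single structure even when others score equally well.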