BCB 444/544 Lecture 23 Protein Tertiary Structure Prediction #23_Oct15
BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction 10/15/07 1

Required Reading (before lecture)
Mon Oct 15 - Lecture 23: Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini): RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25: Gene Prediction
• Chp 8 - pp 97 - 112

New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91.
http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Sat Oct 13

Seminars Last Week
Dr.
Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope
2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Check out links on Schulten's website (videos, etc)
• http://www.ks.uiuc.edu/~kschulte/
• Great seminar - amazing simulations of dynamics in proteins and large macromolecular assemblies
• Very computationally intensive - very impressive demonstration of the power of computation to produce insights not attainable using only experimental approaches

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdev Sidhu (Genentech) Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA

Protein Sequence & Structure: Analysis
• Diamond STING Millennium - Many useful structure analysis tools, including Protein Dossier http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) Protein knowledgebase http://us.expasy.org/sprot
• InterPro Sequence analysis tools http://www.ebi.ac.uk/interpro

Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14 Protein Secondary Structure Prediction
• √ Secondary Structure Prediction for Globular Proteins
• √ Secondary Structure Prediction for Transmembrane Proteins
• √ Coiled-Coil Prediction

Where to Find "Actual" Secondary Structure?
In the PDB

How Does Predicted Secondary Structure Compare with Actual? (An example)
Actual - Calculated from PDB coordinates by DSSP or author
Predicted - Using 3 methods (from CDM server, Jernigan Group, ISU)
Row labels on slide: DSSP, Author, Query, GOR V, FDM, CDM
[Only the query sequence and three structure strings survive; which label each structure string belongs to is not recoverable]
MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIEN
CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHH
CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHH
CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHH

Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15 Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Structural Genomics - Status & Goal
~ 20,000 "traditional" genes in human genome (recall, this is fewer than earlier estimate of 30,000)
~ 2,000 proteins in a typical cell
> 4.9 million sequences in UniProt (Oct 2007)
> 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using a combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Structural Genomics Project
TargetDB: Database of Structural Genomics Targets http://targetdb.pdb.org

Database of Theoretical Structures?
PMDB: Protein Model Database http://mi.caspur.it/PMDB/help.php
also, via NAR's Molecular Biology Database Collection http://www.oxfordjournals.org/nar/database/summary/855
Theoretical structural models (predicted) are no longer accepted by the PDB (since 10/15/06), but it is possible to search for models deposited earlier: http://www.rcsb.org/pdb/search/searchModels.do

Protein Structure Prediction or Protein Folding Problem
"Major unsolved problem in molecular biology"
In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones
In vitro: many proteins can fold to their "native" states spontaneously & without assistance, but many do not!

Deciphering the Protein Folding Code
• Protein Structure Prediction or Protein Folding Problem: Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
• Inverse Folding Problem: Given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure

Protein Structure Prediction
Structure is largely determined by sequence, BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determine folding pathway
2- Predict tertiary structure from sequence
Both still largely unsolved problems

Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of 2' structures
Native state?
- assumed to be lowest free energy
- may be an ensemble of structures

Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins requires conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable" (NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol) (this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy

Difficulty of Tertiary Structure Prediction
Folding or tertiary structure prediction problem can be formulated as a search for the minimum energy conformation
• Search space is defined by psi/phi angles of backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially with number of residues
Computationally it is an exceedingly difficult problem!

Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
  • Homology Modeling (easiest!)
  • Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)

Comparative Modeling?
Comparative modeling - term is sometimes used interchangeably with homology modeling, but also sometimes used to mean both:
• homology modeling
• threading/fold recognition

Ab Initio Prediction
1. Develop energy function
  • bond energy
  • bond angle energy
  • dihedral angle energy
  • van der Waals energy
  • electrostatic energy
2.
Calculate structure by minimizing energy function
  • usually Molecular Dynamics (MD) or Monte Carlo (MC)
Ab initio prediction - impractical for most real (long) proteins
• Computationally? very expensive
• Accuracy? Usually poor for all except short peptides (but much improvement recently!)
Provides both folding pathway & folded structure

Comparative Modeling
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target
Provide folded structure only

Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose one with closest sequence to target as template (can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in corresponding positions on homologous structure & refine by "tweaking" modeled structure (energy minimization)
Homology modeling - works "well"
• Computationally? "relatively" inexpensive
• Accuracy? higher sequence identity → better model
Requires ~30% sequence identity with sequence for which structure is known

Threading - Fold Recognition
Identify "best" fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template in library & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling
Threading - works "sometimes"
• Computationally? Can be expensive or cheap, depends on energy function & whether "all atom" or "backbone only" threading is used
• Accuracy?
in theory, should not depend on sequence identity (should depend on quality of template library & "luck")
Usually, higher sequence identity to protein of known structure → better model

Threading: the Motivation
• Basic premise: The number of unique structural folds in nature is fairly small (probably 2000-3000)
• Statistics from Protein Data Bank (>46,000 structures): Prior to Structural Genomics Project, 90% of "new" structures submitted to PDB were similar to existing folds in PDB - suggesting that almost all folds in nature have been identified
• Thus, chances for a protein to have a native-like structural fold in PDB are quite good
• Note: Proteins with similar structural folds could be either homologs or analogs

Steps in Threading
Target Sequence: ALKKGF…HFDTSE
Structure Templates
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores

Threading Goal - & Issues
Find "correct" sequence-structure alignment of a target sequence with its native-like fold in template library (usually derived from PDB)
• Structure database - must be "complete"
  • Can't build a good model if there is no good template in library!
• Sequence-structure alignment algorithm:
  • Bad alignment → Bad score!
• Energy function or Scoring Scheme:
  • Must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments
  • Must distinguish "correct" fold from close decoys
• Prediction reliability assessment - How to determine whether predicted structure is correct (or even close)?
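
The three steps listed under "Steps in Threading" (align, score, rank) can be sketched in a few lines. This is only a toy illustration, not any published threading program: the alignment is gapless, the energy simply counts hydrophobic-hydrophobic contacts, and the hydrophobic alphabet, the two-template `LIBRARY`, the target sequence and all function names are hypothetical choices made for this example.

```python
# Toy threading sketch: gapless alignment + hydrophobic contact energy + ranking.
# All names, templates and the scoring rule are illustrative assumptions.

HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic class for a 2-letter (H/P) model


def contact_energy(segment, contacts):
    """Energy = -1 for each template contact occupied by two hydrophobic residues."""
    return -sum(
        1
        for i, j in contacts
        if segment[i] in HYDROPHOBIC and segment[j] in HYDROPHOBIC
    )


def thread(target, length, contacts):
    """Steps 1-2: slide a template of `length` positions along the target
    (gapless alignment) and return (best_energy, best_offset), or None if
    the target is shorter than the template."""
    placements = [
        (contact_energy(target[off:off + length], contacts), off)
        for off in range(len(target) - length + 1)
    ]
    return min(placements) if placements else None


def rank_templates(target, library):
    """Step 3: rank templates from lowest (best) to highest energy."""
    scored = []
    for name, (length, contacts) in library.items():
        result = thread(target, length, contacts)
        if result is not None:
            energy, offset = result
            scored.append((energy, name, offset))
    return sorted(scored)


# Hypothetical fold library: name -> (number of positions, contacting position pairs)
LIBRARY = {
    "hairpin": (6, [(0, 5), (1, 4)]),
    "helix_turn": (8, [(0, 4), (1, 5), (2, 6), (3, 7)]),
}

ranking = rank_templates("KVLEDGKIAV", LIBRARY)
for energy, name, offset in ranking:
    print(f"{name}: energy={energy}, offset={offset}")
```

A real threading method would replace the gapless scan with a gapped sequence-structure alignment and the contact count with a calibrated energy function; the align-score-rank loop itself keeps the same shape.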
Threading: Template database
• Build a database of structural templates, e.g., ASTRAL domain library derived from the PDB
• Sometimes, supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

Threading: Energy function
• Two main methods (& combinations of these)
  • Structural profile (environmental): physicochemical properties of amino acids
  • Contact potential (statistical): based on contact statistics from PDB; famous one: Miyazawa & Jernigan (ISU)

Protein Threading: Typical energy function
• Ep (pairwise term): What is "probability" that two specific residues are in contact?
• Es (singleton term): How well does a specific residue fit structural environment?
• Eg: Alignment gap penalty?
Total energy: Ep + Es + Eg
Goal: Find a sequence-structure alignment that minimizes energy function

Rapid Threading Approach for Protein Structure Prediction
A Local Example:
Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697

Motivations for & Assumptions of Ho Threading Algorithm
Goal: Develop a threading algorithm that:
• Is simple & rapid enough to be used in high throughput applications
• Is relatively "insensitive" to sequence similarity between target protein sequence & sequence of template structure (to enhance detection of remote homologs & structures that are similar due to convergent evolution)
• Can be used to answer questions such as:
  • What are predicted structures of all "unassigned" ORFs in Arabidopsis?
  • Does Arabidopsis have a protein with structure similar to mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
• Native state of a protein is lowest free energy state
• Hydrophobic interactions drive protein folding

Simplify: Template structure representation
Template structure (residues 1 … i … j … N) → contact matrix $C$ ($N \times N$):
$C_{ij} = 1$ if $r_{ij} < 6.5$ Å (contact)
$C_{ij} = 0$ otherwise, or if $j$ is a neighbor of $i$ in sequence (non-contact)
(Yungok Ihm)

Simplify: Target Sequence Representation
• Miyazawa-Jernigan (MJ) model: inter-residue contact energy M(i,j) is a quasi-chemical approximation based on pairwise contact statistics extracted from known protein structures in the PDB: symmetric 20 × 20 matrix = 210 values ("letters")
• Li-Tang-Wingreen (LTW): factorize the MJ interaction matrix to reduce the number of parameters associated with amino acids from 210 to 20 q values
• Hydrophobic-Polar (HP): represent amino acids as either hydrophobic (H) or polar (P); Dill et al demonstrated the utility of this simple binary alphabet representation: 2 values
Compare results with 210 vs 20 vs 2 letter representations
How low can we go?
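
To make the two simplifications above concrete, here is a minimal sketch of a contact matrix built with the 6.5 Å cutoff and of the 2-letter HP encoding. The coordinates are made up (one point per residue, laid out as a small hairpin), the choice of which amino acids count as hydrophobic is an assumption, and reading "a neighbor in sequence" as |i - j| = 1 is also an assumption; none of the function names come from the Ho group's code.

```python
import math

CUTOFF = 6.5  # Å, contact cutoff quoted on the slide


def contact_matrix(coords, cutoff=CUTOFF):
    """N x N matrix with C[i][j] = 1 if residues i and j are closer than
    `cutoff` and are not sequence neighbors (|i - j| > 1, an assumed reading
    of the slide); 0 otherwise."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # skip self and i+1 (sequence neighbor)
            if math.dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1
    return c


HYDROPHOBIC = set("AVLIMFWC")  # assumed H class; everything else is P


def hp_encode(seq):
    """2-letter representation: 1 for hydrophobic (H), 0 for polar (P)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq]


# Number of distinct entries in a symmetric 20 x 20 matrix (the MJ "letters")
N_MJ = 20 * 21 // 2

# Hypothetical coordinates in Å: three residues out, three residues back
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
C = contact_matrix(coords)
for row in C:
    print(row)
print(hp_encode("AVFMRI"), N_MJ)
```

The contact matrix depends only on the template structure, and the encoding depends only on the target sequence, which is what lets the two be prepared independently before any alignment is attempted.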
Simplify: Energy Function
• Interaction "counts" only if two hydrophobic amino acid residues are in contact
• At residue level, pair-wise hydrophobic interaction is dominant:
$$E = \sum_{i,j} C_{ij} U_{ij}$$
$C_{ij}$: contact matrix
$U_{ij} = U(\text{residue } i, \text{residue } j)$
  MJ: $U = U_{ij}$
  LTW: $U = Q_i Q_j$
  HP: $U = \{1, 0\}$
(Yungok Ihm)

Energy calculation: Contact energy
Miyazawa-Jernigan (MJ) matrix: Statistical potential, 210 parameters
[Table: excerpt of the MJ matrix for residues C, M, F, I, L, V, W; numeric entries not recoverable]
Li-Tang-Wingreen (LTW): 20 parameters
$$M_{ij} \approx C_2\{(q_i + \alpha)(q_j + \alpha) + \beta\}$$
with $q_i$ ~ solubility, $Q_i = q_i + \alpha$ ~ hydrophobicity, $C$ = contact matrix
Contact Energy:
$$E_c = \sum_{i,j=1}^{N} (Q_i C_{ij} Q_j + \beta C_{ij})$$
Constants listed on slide: 0.6797, 0.2604
[The Greek symbols and operators in these two LTW formulas did not survive extraction; $\alpha$, $\beta$ and the plus signs are a reconstruction]
(Yungok Ihm)

Summary of Ho Threading Procedure
Template Structure (residues 1 … i … j … N) → Contact Matrix:
$C_{ij} = 1$ if $r_{ij} < 6.5$ Å
$C_{ij} = 0$ otherwise (or a neighbor in sequence)
Sequence AVFMRIHNDIVYNDIANTTQ → Sequence Vector:
$S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997, 0.9897, 1.1197, \ldots, 0.6497)$
Scoring Function → Contact Energy:
$$E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$$
(Yungok Ihm)

Can complexity be further reduced?
Consider simplifying structure representation, too
ALKKGF…HFDTSE
• Sequence - Structure (1D - 3D problem)
• Sequence - Contact Matrix (1D - 2D problem)
• Sequence - 1D Profile (1D - 1D problem)
(Haibo Cao)

Examine eigenvectors of contact matrix
$$\text{Hydrophobic Contacts} = T^{\top} C\, T = T^{\top}\Big(\sum_{i=1}^{N} \lambda_i V_i V_i^{\top}\Big) T = \sum_{i=1}^{N} \lambda_i (V_i^{\top} T)^2 \approx \lambda_1 (V_1^{\top} T)^2$$
$C$: contact matrix
$\lambda_i$: i-th eigenvalue of $C$
$V_i$: i-th eigenvector
$V_1$: eigenvector with largest eigenvalue
$T$: protein sequence of the template structure
$r_i$: fraction of hydrophobic contacts from i-th eigenvector
$$r_i = \frac{\lambda_i (V_i^{\top} T)^2}{\sum_{i=1}^{N} \lambda_i (V_i^{\top} T)^2}$$
(Haibo Cao)

Represent contact matrix by its dominant eigenvector (1D profile)
• First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
• Higher ranking (rank > 4) eigenvectors are "sequence blind"
(Haibo Cao)

Threading Alignment Step - now fast!
Align target sequence vector (1D) with eigenvector profile of template structure (1D)
• 1D Profile: $P = V_1$
• Maximize the overlap between the Sequence (S) and the profile (P), $S \cdot P$, allowing gaps
• New profile: $P_{\text{new}} = C P$
• Calculate contact energy using the alignment: $E_c$
Cao et al Polymer 45 (2004)

Parameters for alignment?
• Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops
  Gap penalties apply to alignment score only, not to energy calculation
• Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized
  Size penalties apply to alignment score only, not to energy calculation
[Figure: target sequence ALKKGFG…HFDTSE aligned against loop and helix regions of a template]
(Yungok Ihm)

How to incorporate secondary structure?
• Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
N+ = total number of matches between predicted & actual secondary structure of template
N- = total number of mismatches
Ns = total number of residues selected in alignment
"Global fitness": f = 1 + (N+ - N-) / Ns
Emod = f * Ethreading
(Yungok Ihm)

How much better is this "fit" than random?
Eshuffled: Shuffled Sequence vs Structure
Erelative = Emod - Eshuffled
• Emod: E score modified to reflect fit with predicted 2' structure
• Eshuffled: Avg E score for same sequence shuffled (randomized) many times
(Yungok Ihm)

Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)

Typical Results (well, actually, our BEST Results): HO = #1-Ranked CASP5 Prediction for this Target
• Target 174
• PDB ID = 1MG7
[Figure: Predicted Structure vs Actual Structure; panels T174_1 and T174_2]
(Cao, Ihm, Wang, Dobbs, Ho)

Overall Performance in CASP5 Contest
~8th out of 180 (M.
Levitt, Stanford)
FR Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00   7.00   7     7     Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00   6.00   3     6     Ho-Kai-Ming
9     10.45    3.00   5.50   3     6     Jones-NewFold
-----------------------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models

CASP - Check it out!
Critical Assessment of Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
  • http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
  • Protein Function Prediction (part of CASP)
  • CAPRI = Critical Assessment of Predicted Interactions
  • New: CASPM = CASP for M = Mutant proteins
    • Predict effects of small (point) mutations, e.g., SNPs

Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software

Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13 Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification

Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But, very active research area - many recent new methods
3 Popular Methods:
• DALI = Distance Matrix Alignment of Structures (Holm)
  • FSSP Database
• SSAP = Sequential Structure Alignment Program (Orengo)
  • CATH Database
• CE = Combinatorial Extension (Bourne)
  • VAST at NCBI
URLs: http://en.wikipedia.org/wiki/Structural_alignment_software

Another local example: Combining Structure Prediction, Machine Learning & "Real" (wet-lab) Experiments to Investigate the Lentiviral Rev Protein: A Step Toward New HIV Therapies
Susan Carpenter (Washington State Univ): Wendy Sparks, Yvonne Wannemuehler
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini
Kai-Ming Ho, Physics: Yungok Ihm, Haibo Cao, Cai-zhuang Wang
Gloria Culver, BBMB: Laura Dutca

Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV)
[Figure: Rev in the lentiviral replication cycle (provirus, pre-mRNA, spliceosome, Tat, progeny RNA). Labeled interactions: RNA BINDING (protein-RNA), MULTIMERIZATION (protein-protein), NUCLEAR EXPORT (protein-protein), NUCLEAR IMPORT (protein-protein). Early: Regulatory Proteins; Late: Structural Proteins]
(Susan Carpenter)

Rev is essential for lentiviral replication
• Rev is a small nucleocytoplasmic shuttling protein (HIV Rev 115 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA: Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced viral RNAs from nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it an attractive
target for antiviral (AIDS) therapy

Problem: no high resolution Rev structure!
Not even for HIV Rev, despite intense effort ($$)
• Why??
  • Rev aggregates at concentrations needed for NMR or X-ray crystallography
• What about insights from sequence comparisons?
  • "undetectable" sequence similarity among Revs from different lentiviruses (e.g., EIAV vs HIV <10%)
• But:
  • We know that lentiviral Rev proteins are functionally "homologous" - even in highly diverse lentiviruses

Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
• Computationally model structures of lentiviral Rev proteins - using structural threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction - using machine learning algorithms (with Honavar et al)
• Test model and predictions
  - using genetic/biochemical approaches (with Carpenter & Culver)
  - using biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE

Functional domains: EIAV vs HIV Rev
[Figure: domain maps]
EIAV Rev (exon 1, exon 2; positions 1, 31, 165): RBM, Folding?, NES, NLS; motifs RRDRW, ERLE, KRRRK
HIV-1 Rev (1-116): NLS/RBM (RQARRNRRRRWR), NES
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif

Predicted EIAV Rev Structure
[Figure: predicted structure] (Yungok Ihm)

Comparison of Predicted Rev Structures
[Figure panels: EIAV, FIV, SIV Dimer, HIV, HIV Dimer] (Yungok Ihm)

Predicted vs Experimental Structure of N-terminal region of HIV Rev
A. Predicted Structure: HIV Rev N-terminus
B. NMR Structure: HIV Rev N-terminal Peptide (Battiste & Williamson)
C. Overlay: Alignment of Predicted & NMR Structures
(Yungok Ihm)

Location of functional residues in EIAV Rev?
• NES Leu36, 45, 49: On surface, consistent with role in nuclear export
• Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?
• Putative RBM
(Yungok Ihm)

Mutate hydrophobic residues predicted to be critical for helical packing in core
L65 vs L95 & L109
• Single mutants: Leu to Ala (L → A); Leu to Asp (L → D), which inserts a charged aa in the hydrophobic core
• Double mutants: Leu to Ala (LL → AA)
Predicted outcomes shown on slide (pairing with mutant classes not recoverable):
• Negligible effect on Rev activity
• Dramatic change in Rev activity?
• Reduction in Rev activity?
(Yungok Ihm)

Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
[Figure: bar chart, "Activity of Rev Structural Mutants"; axis ticks 50, 100, 150; labels RI, pcDNA3, Sham; groups: Controls; Single Mutations (L65, L95, L109, each as A and D); Double Mutations (L65A L95A, L65A L109A, L95A L109A)]
(Wendy Sparks)

Functional domains: EIAV vs HIV Rev
Red - RNA interaction
Green - Protein interaction
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
[Figure: domain maps]
EIAV Rev: RBM, Folding?, NES, NLS; motifs RRDRW, ERLE, KRRRK
HIV-1 Rev (1-116): NLS/RBM (RQARRNRRRRWR), NES

Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure
Motifs: RRDRW, ERLE, KRRRK (Yungok Ihm)
Predicted RNA-binding residues were marked "+" under the sequence (column alignment of the "+" marks not recoverable):
 61 ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
121 HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
(Michael Terribilini)

Express & purify MBP-ERev deletion mutants
[Figure: EIAV Rev domain map (positions 1, 31, 125, 146, 165; RBM, Folding?, NES, NLS) with MBP-ERev fusion constructs 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165; gel lanes: Marker, MBP, and each construct; numeric labels 57, 60, 42, 30, 22]
(Jae-Hyung Lee)

MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking with 32P-RRE; lanes: No protein, BSA, MBP, 1-165, 31-165; sense vs antisense; Competition: No cold RRE vs Cold RRE; Undigested 32P-RRE]
(Jae-Hyung Lee)

EIAV Rev: Binding Predictions vs Experiments
PREDICTED: Structure; Protein binding residues; RNA binding residues
VALIDATED: Protein binding residues; RNA binding residues
[Figure: annotated EIAV Rev sequence, "+" marks not recoverable; position labels 41-91 and 131-161]
GPLESDQWCRVLRQSLPEEKISSQTCI ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
Constructs tested: WT, 31-165, 31-145, 57-165, 145-165, MBP
Domain map (positions 31, 57, 125, 145, 165): RBM, NES, FOLD?, NLS/RBM; motifs RRDRW, ERLE, KRRRK
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
(Jae-Hyung Lee)

Roles of Putative RNA Binding Motifs?
[Figure: domain map (positions 1, 31, 57, 124, 146, 165; RBD, NES, RBD, NLS)]
Motif mutants: RRDRW → AADAA; ERLE → AALA, ERDE; KRRRK → KAAAK
(Jae-Hyung Lee)

Rev RNA Binding Motifs: Predicted vs Experiment
[Figure: same annotated sequence and domain map as in "EIAV Rev: Binding Predictions vs Experiments", comparing WT with motif mutants AADAA, AALA, ERDE, KAAAK]
(Jae-Hyung Lee)

Summary: Predictions vs Experiments
[Figure: same annotated sequence and domain map; motifs RRDRW, ERLE, KRRRK; positions 31, 57, 125, 145, 165; RBM, NES, FOLD, NLS/RBM]
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
• EIAV Rev has a bipartite RNA binding domain
• Two Arg-rich RBMs are critical
  • RRDRW in central region (but not ERLE)
  • KRRRK at C-terminus, overlapping the NLS
• Based on computational modeling, the RBMs are in close proximity within the 3-D structure of the protein
• Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Future:
• Computational: Use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
• Experimental? Experimentally determine the structure of Rev-RRE complex !!!
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Building "Designer" Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• Introduction
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation