Protein Intrinsic Disorder, Cell Signaling and Alternative Splicing
Center For Computational Biology and Bioinformatics

Outline of Talk
• Examples of intrinsically disordered proteins
• Prediction of natural disordered regions
• Disorder and cell signaling
• Disorder and molecular recognition
• Disorder and alternative splicing
• Protein isoforms and functional diversity via the linkage of alternative splicing and intrinsic disorder

Molecular Recognition Element (MoRE)
[Figure: p27kip1 bound to CDK and Cyclin A; 3D structure from Russo A et al., Nature 382:325-331 (1996)]

Disorder and Function
Category | Change | No. of examples | Descriptions
Molecular Recognition | DO | 113 | Inter- and intra-protein, ssDNA, dsDNA, tRNA, rRNA, mRNA, nRNA, bilayers, ligands, co-factors, metals
Protein Modification | Variable | 36 | Acetylation, fatty acylation, glycosylation, methylation, phosphorylation, ADP-ribosylation, ubiquitination, proteolytic digestion
Entropic Chains | Variable | 17 | Linkers, spacers, bristles, clocks, springs, detergents, self-transport
Dunker AK et al., Adv Protein Chem 62: 25-49 (2002)

Prediction of Disorder
Disordered Sequence Data → Attribute Selection or Extraction → Separate Training and Testing Sets → Predictor Training → Predictor Validation on Out-of-Sample Data → Prediction

Predictors of intrinsic disorder (www.disprot.org)
• DisEMBL™: Intrinsic Protein Disorder Prediction
• DISOPRED2: Disorder Prediction Server
• DRIPPRED: Web based predictor for disordered regions in proteins
• FoldIndex©: Estimate the fold probability of a protein
• GlobPlot 2: Intrinsic Protein Disorder, Domain & Globularity Prediction
• IUPred: Prediction of Intrinsically Unstructured Proteins
• PONDR®: Predictors of Natural Disordered Regions
• PreLink: Prediction of unfolded segments in a protein sequence based on amino acid composition
• RONN: Regional Order Neural Network
• VL2: DisProt Predictor of Intrinsically Disordered Regions
• VL3, VL3H, VL3E: DisProt Predictor of Intrinsically Disordered Regions

p53 MoREs
[Figure: PONDR® VL-XT score along p53, with MoREs marked; Oldfield et al., Biochemistry 44: 12454-12470 (2005)]

Protein Interaction Domains
http://www.mshri.on.ca/pawson/domains.html

GYF Domain and CD2 Chain B
[Figure: Freund et al., EMBO J. 21:5985-5995 (2002)]

GYF Domain of CD2 Binding Protein
[Figure: Freund et al., Nat. Struct. Biol. 6:656-660 (1999)]

CD2: Binding Partner of GYF Domain
• The consensus GYF-binding sequence is PPPPGHR.
• The peptide in the crystal structure has the sequence SHRPPPPGHRV.
Freund et al., Nat. Struct. Biol. 6:656-660 (1999)

Analysis of Signaling Interactions
• Examined each interaction on Pawson’s website.
• Almost all of the interactions involved ordered regions binding to disordered partners.
• Conclusion: if Pawson’s examples are typical, then a large proportion of protein-protein signaling interactions use disordered regions.

Parallel Paradigms
• Catalysis: AA seq → 3-D Structure → Function
• Signaling: AA seq → Disordered Ensemble → Function

Alternative Splicing and Intrinsic Disorder
• Find proteins with both ordered and disordered regions.
• Find mRNA alternative splicing information for these proteins and map it onto the ordered and disordered regions.
• Do alternatively spliced regions of mRNA code for ordered protein more often, or for disordered protein more often?
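The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each protein comes with a per-residue label string ("D" for disordered, "O" for ordered), for example from DisProt annotations or a disorder predictor. It also assumes AS regions are given as 0-based, half-open residue intervals on the protein sequence. The function names and the toy data are hypothetical. The sketch tallies disordered versus ordered residues inside the AS regions and bins each region's disorder content into five 20% intervals.

```python
# Minimal sketch: map AS regions onto order/disorder annotations.
# Assumed (hypothetical) input format, not taken from the talk:
#   labels     - one character per residue: "D" = disordered, "O" = ordered
#   as_regions - 0-based, half-open (start, end) residue intervals

BINS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]


def count_region(labels: str, start: int, end: int) -> tuple[int, int]:
    """Return (disordered, ordered) residue counts for labels[start:end]."""
    region = labels[start:end]
    if not region or set(region) - {"D", "O"}:
        raise ValueError(f"invalid AS region {start}-{end}")
    disordered = region.count("D")
    return disordered, len(region) - disordered


def bin_index(disordered: int, length: int) -> int:
    """Index into BINS; integer arithmetic avoids float edge cases.
    Bins are lower-inclusive, and 100% disorder falls in the last bin."""
    return min(5 * disordered // length, 4)


def summarize(proteins):
    """proteins: iterable of (labels, as_regions) pairs.
    Returns (relative frequency of AS regions per bin,
             total disordered residues, total ordered residues) in AS regions."""
    counts = [0] * len(BINS)
    total_d = total_o = 0
    for labels, as_regions in proteins:
        for start, end in as_regions:
            d, o = count_region(labels, start, end)
            counts[bin_index(d, d + o)] += 1
            total_d += d
            total_o += o
    n_regions = sum(counts)
    if n_regions == 0:
        raise ValueError("no AS regions")
    relative = {name: c / n_regions for name, c in zip(BINS, counts)}
    return relative, total_d, total_o


# Toy example with invented labels (not data from ASED or ASSP):
toy = [
    ("OOOOODDDDD", [(0, 5), (5, 10), (3, 8)]),
    ("DDDDOOOO", [(0, 4)]),
]
relative, n_disordered, n_ordered = summarize(toy)
print(relative)
print(n_disordered, n_ordered)
```

A raw disordered/ordered tally does not answer the question by itself. A fair comparison would set it against a baseline, such as the disorder content of the same proteins outside their AS regions.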
Alternative Splicing
[Figure, built up over several slides: a gene with 5’ UTR, coding sequence and 3’ UTR is transcribed into alternative mRNAs (mRNA 1, mRNA 2), which are translated into protein Isoform 1 and Isoform 2; the final build labels the AS region and its folding.]

Structural Studies of AS
[Figure panels: Disordered AS regions; Structured AS regions. Proteins shown: Pyrophosphorylase, RAC1, Tumor necrosis factor, Glutathione S-transferase, Sulphotransferase]

Studying the Relationship (IDAS)
• DisProt: database of proteins with experimentally determined structure and disorder (www.disprot.org)
• ASG (AS Gallery)
• SwissProt (VarSplic)
• ASED dataset: 46 proteins, 74 characterized AS regions, >19,000 characterized residues, 35% ID

Results on ASED
[Figure: distribution of structurally characterized AS regions]

Enlarging the Dataset
• PONDR® VSL1 ID predictor (>80% accuracy)
• Validation analysis: ASED dataset
• ASSP dataset: 558 human AS proteins from SwissProt, 1,266 AS regions

Global Results
[Figure: AS region disorder distributions in ASED and ASSP. Series: ASED experimental, ASED predicted, ASSP predicted. x-axis: disorder content of AS regions (0-20%, 20-40%, 40-60%, 60-80%, 80-100%); y-axis: relative frequency (0 to 0.7)]

Alternative Splicing and Disorder
• Ordered proteins: active-site residues are non-local in sequence and are brought together by protein folding.
• Disordered proteins and regions: functional residues are localized in sequence.
• Functional regions for signaling and regulation are located one after another along the sequence.
• Alternative splicing edits these sets of functional regions and thereby leads to regulatory and signaling diversity.

Breast Cancer Protein 1 (BRCA1)

Summary
• A high percentage of protein signaling interactions involve intrinsic disorder (ID).
• Alternative splicing (AS) often occurs in regions of pre-mRNA that code for intrinsic disorder.
• AS + ID facilitate regulatory and signaling diversity.
• Is AS + ID the critical combination for the evolution of multicellular organisms?

Acknowledgements
Indiana University Temple University Predrag Radivojac Pedro Romero Marc Cortese Gerard Go Amrita Mohan Jie Sun Siama Zaida Jack Yang Zoran Obradovic Slobodan Vucetic Vladimir Vacic Kang Peng University of Idaho Celeste J. Brown Chris Williams Molecular Kinetics Vladimir Uversky Yugong Cheng Rockefeller University Lilia Iakoucheva Sebat University of Wisconsin John Markley Chris Oldfield UCSF Ethan Garner PNNL Richard Smith Eric Ackerman

Support
• NSF CSE II 9711532
• NIH R01 LM007688
• USDA 2000 1740
• INGEN®, Lilly Endowment
• Molecular Kinetics