Recognition of the Pyrimidine-Tract of the pre-mRNA by U2AF and a Novel Splicing Factor PUF60

by

Patrick Schonleber McCaw

B.A. in Biology, Haverford College

Submitted to the Department of Biology in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology at The Massachusetts Institute of Technology

June 1998

© 1998 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Biology, May 29, 1998
Certified by: Dr. Phillip A. Sharp, Professor of Biology, Thesis Supervisor
Accepted by: Dr. Frank Solomon, Chair, Biology Graduate Committee

ABSTRACT

The genetic information required for encoding functional protein molecules is interrupted in eukaryotes by non-coding sequences. These non-coding sequences, called introns, must be removed from the transcribed RNA molecule by the spliceosome before the genetic information, encoded in exons, can be used by the cell. The process of removing introns and joining exons is called splicing. The splice site sequences are recognized multiple times during splicing of the intron, presumably as a mechanism to ensure high-fidelity splicing. Recognition of the intron by the spliceosome requires the interaction of many protein and snRNP components. The first intron recognition events are the recognition of the 5' splice site sequence by U1 snRNP, recognition of the branch sequence by SF1/BBP, and recognition of the pyrimidine-tract sequence by U2AF. The 5' splice site and the branch sequence are subsequently recognized by U5 and U6 snRNPs and by U2 snRNP, respectively. The PUF splicing activity was identified as a second pyrimidine-tract binding factor that is required for efficient splicing in vitro.
The PUF activity is required for efficient formation of the stable, ATP-independent U2 snRNP:pre-mRNA complex, A3'. U2AF is not required for splicing under certain conditions. However, the PUF activity is required for splicing even in the absence of U2AF. The PUF activity is composed of the previously described splicing factor p54 and the novel protein PUF60. PUF60 binds specifically to pyrimidine-tract RNA, is conserved between vertebrates and the invertebrate Drosophila, and forms SDS-resistant dimers. The SDS-resistant dimerization of PUF60 is mediated by the C-terminal domain, the PUMP domain. Several other proteins containing PUMP-domain homologies have been identified. Among these PUMP-domain homologies is the domain of the small subunit of U2AF which is required for the stable protein-protein interaction between the small subunit of U2AF and the large subunit of U2AF. The PUMP domain may generally mediate protein-protein interactions.

Thesis Supervisor: Phillip A. Sharp
Title: Salvador E. Luria Professor of Biology and Head, Department of Biology

ACKNOWLEDGMENTS

There are many people who have made this thesis possible. I would like to start by thanking the teachers. First and foremost I would like to thank Mr. Werner Feig, who taught critical thinking by way of American History at Scarsdale High School; among the many great teachers I have had, he remains the best. Dom Castillo taught chemistry in seventh and eighth grade and it is to him that I owe my interest in biochemistry. He was the first to teach thermodynamics and ΔG, and I thank him for having the courage to introduce the idea of Gibbs free energy to a pre-adolescent. Dom was followed by Mr. Moffett and Professor John Chesick in teaching me the mysteries of thermodynamics. Professor Chesick warned us often, "word to the wise, cudgel to the obtuse: study, practice, learn." An arrogant young biologist wannabe, I thought the gas laws a bit arcane, and did not take the wise word quite seriously enough.
The cudgel did not hit until David Baltimore made a passing remark that thermodynamics was the most important course for a biologist. Now, years later, I have no doubt he was right, and I hope that some of the work presented here reflects an attempt to understand thermodynamics in the context of splicing.

I have been generously invited to work in many labs and I owe many people thanks for teaching me the doing of science. I would like to thank Robert Wooley at the Albert Einstein College of Medicine for taking me in as a high school student and teaching me how to isolate nuclei from tumor samples, so that he could determine the ploidy of the tumor cells with a wonderful Rube Goldberg flow cytometer. It was from Karl Pfenninger and Marie-France Maylié-Pfenninger that I first really learned experimental biology. Karl drove in to work on the Bronx River Parkway at heart-stopping speeds, filling my sleep-addled brain (I had not yet discovered coffee--thanks Charlotte) with the power of good experimentation and the fun of doing good science. I only wish I had been more awake. Marie-France taught me perspective and embryology. Edie Abreu showed me how to use a pipetman and much more. David Baltimore and Kees Murre taught me how to take an idea and turn it into good science, a lesson I am still trying to learn.

The Baltimore lab, and the third floor of the Whitehead Institute, was a fantastic place to learn science and there are too many people to thank for my time there. I would especially like to thank the technicians with whom I worked: Carolyn Gorka, Anne Gifford, and Mike Paskind in the Baltimore lab; Annie Smith and Mitch Walkowicz in the Korman lab; Melissa Woodrow and Geoff Parsons in the Mulligan lab; and Lorene Lanier, Rebecca Riehl and Sallie Smith of the Weinberg lab.
I would like to thank my great friends and classmates Jonathan Loeb, Julie Segre, and Brenda Schulman; also, Chris Stipp and Mary Herndon; Neil Silverman, Greg Marcus, Rachel Kindt, Jen Mach, Annie Williamson, and Leisa Johnson. My good friends from my days in the fly labs: Charlotte Wang, Paul Kauffman and Kathy Collins, Mandy Hannaford, Lulu Fresco, Tau-Mu Yi (honorary member), Chris Seibel and Sima Misra, Fay Shemanski, Jan Carminati, and Dan Moore. With co-workers and friends like these it's a wonder we ever left the lab, and what a place it was to work and learn: from meiosis to nuclear Overhauser effect, rotisserie baseball to feminism, and genetics to gel filtration, all before lunch.

I would especially like to thank Phillip Sharp and Margarita Siafaca for all their support and patience. I would like to thank the many people in the Sharp lab who have made my time here an enjoyable one and have made this work possible in so many ways: Maggie Beddall, Ben Blencowe, Chris Burge, Helen Cargill, Karyn Cepek, Dan Chasman, John Crispino, Gene Hunh, Robbyn Issner, Jae-Sang Kim, Lee Lim, Andrew MacMillan, Joel Pomerantz, Barbara Panning, Rock Pulak, Yubin Qiu, Charles Query, Erica Reifenberg, Tom Tuschl and my baymates, Ben Shykind, Grace Jones and Akira Mitsui. Kevin Amonlirdviman worked with me in the summer of 1997 and without him the PUF60 binding studies would not have been done and the summer would not have been nearly as much fun.

I would like to thank my family, my parents Jack and Maura McCaw and John Schonleber for their support and understanding. Thanks also to my brothers and sisters: Chris, Michael, Kath, and Anna; and to my cousin and her husband Maura McMillin and Terefe Kerse. Finally, and most importantly, I would like to thank Andrea Page for making my life happy and complete.

TABLE OF CONTENTS

CHAPTER 1: RECOGNITION OF THE SPLICE SITE SEQUENCES IN pre-mRNA SPLICING AND THEIR ROLE IN MEDIATING SPLICEOSOME ASSEMBLY
    INTRODUCTION
    THE SPLICING REACTION
    INFORMATION AND THE SPLICE SITE DETERMINATION PROBLEM
    RECOGNITION OF THE 5' SPLICE SITE
        U1 snRNP
    RECOGNITION OF THE 3' SPLICE SITE
        Three distinct recognition elements at the 3' end of the intron
        U2AF
        SF1/BBP
    ASSEMBLY OF THE SPLICEOSOME
        E complex
        A complex
        The spliceosomal complexes B and C
    MY CONTRIBUTIONS TO THIS PROJECT
    REFERENCES
    FIGURES
        Figure 1. The splicing reaction is a two step transesterification reaction
        Figure 2. Splice site sequences do not alone determine splice sites
        Figure 3. Cartoon of the spliceosome assembly pathway

CHAPTER 2: IDENTIFICATION, PURIFICATION AND CHARACTERIZATION OF A NEW PYRIMIDINE-TRACT BINDING SPLICING ACTIVITY
    ABSTRACT
    INTRODUCTION
    RESULTS
        Depletion of NE for pyrimidine tract splicing
        Role for U2AF35
        Purification of the PUF activity
        Identification of PUF60 and p54 as the predominant proteins in the active fraction
        The PUF activity purifies as 400 kD complex
        PUF is required for efficient A3' complex assembly
        RNA crosslinking of the PUF fraction
        PUF does not detectably interact with the branch sequence or AG dinucleotide
        PUF and U2AF do not bind to the same pyrimidine-tract RNA
        Interaction with the branch sequence binding protein SF1/BBP
        Subcellular localization of PUF60
    DISCUSSION
        Splicing in vitro requires two pyrimidine-tract RNA binding activities
        The PUF proteins: p54 and PUF60
        Sub-cellular localization of PUF60
    METHODS AND MATERIALS
        Preparation of poly[U] depleted nuclear extract
        Purification of the PUF activity
        RNAs used in this study
        Splicing in vitro
        Complex assembly assays
        Gel shift
        Crosslinking
        Antibodies
        Immunoblotting
        Immunoprecipitations and Gst pull-downs
        Immunofluorescence
    ACKNOWLEDGMENTS
    REFERENCES
    FIGURE LEGENDS
        Figure 1. Poly[U]-depleted nuclear extract requires both U2AF and PUF
        Figure 2. Purification of the PUF activity
        Figure 3. PUF60 is ubiquitously expressed in humans as a 2.0 kb mRNA
        Figure 4. NEAU is depleted of both PUF60 and p54
        Figure 5. The PUF factor p54 can form complexes with PUF60
        Figure 6. PUF activity is required for efficient A3' spliceosome assembly
        Figure 7. The PUF fraction has three species that crosslink to pre-mRNA
        Figure 8. PUF does not bind the AG dinucleotide or the branch sequence
        Figure 9. PUF pyrimidine-tract binding in the presence of either U2AF or U2AF65
        Figure 10. PUF60 interacts with the branch sequence binding protein SF1/BBP
        Figure 11. PUF60 localizes to a non-speckle domain of the nucleus

CHAPTER 3: PUF60 A NOVEL PYRIMIDINE TRACT BINDING FACTOR WITH HOMOLOGY TO THE SPLICING FACTORS U2AF65 AND MUD2P DIMERIZES VIA ITS C-TERMINAL RRM-LIKE DOMAIN, THE PUMP DOMAIN
    ABSTRACT
    INTRODUCTION
    RESULTS
        PUF60, conserved in evolution, is related to the yeast splicing factor Mud2p
        The PUMP domain is a distinct subfamily of the large RRM domain family
        The PUMP domain is a protein-protein interaction domain
        The PUMP domain does not contribute to RNA binding
    DISCUSSION
        RNA binding activity of PUF60
        PUF60 is a U2AF65 homologue
        The PUMP domain is a subset of the RRM domain family
        The PUMP domain is a protein-protein interaction domain
        PUMP domain interactions: other proteins
    METHODS AND MATERIALS
        Identification of ESTs, sequencing, and alignments
        Expression and purification of His6PUF60 and His6PUF60ΔC
        Translation in vitro
        Dimerization assay
        RNA binding assay
    REFERENCES
    FIGURE LEGENDS
        Figure 1. Sequence of PUF60 and DPUF68; comparison of PUF60, U2AF65 and DPUF68
        Figure 2. The PUMP domain is a distinct subset of the RRM domain family
        Figure 3. HPUF60 forms SDS-resistant dimers
        Figure 4. RNA binding activity of His6PUF60

CHAPTER 4: SC35 MEDIATED RECONSTITUTION OF SPLICING IN U2AF-DEPLETED NUCLEAR EXTRACT
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
        RNA Transcription
        Nuclear Extracts
        U2AF and SC35 Preparation
        Splicing Assays
    RESULTS
        U2AF65 and SC35 Mediated Reconstitution of Splicing in U2AF-Depleted Extracts
        Factor Dependent Splicing in U1-Blocked Extracts
    DISCUSSION
    ACKNOWLEDGMENTS
    REFERENCES
    FIGURE LEGENDS
        Figure 1. SC35 functionally substitutes for U2AF65
        Figure 2. SC35 reconstitution of splicing in U2AF-depleted reactions is substrate specific
        Figure 3. SC35 reconstitutes pre-mRNA splicing dependent on the presence of U1 snRNP
        Figure 4. Three distinct pathways resulting in spliceosome assembly

CHAPTER 5: A MINIMAL SPLICEOSOMAL A COMPLEX RECOGNIZES BRANCH SITE AND POLYPYRIMIDINE TRACTS
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
        RNA transcription and synthesis of substrates
        Formation and native gel analysis of splicing complexes
        Nuclear extracts and purification of splicing factors
        Photo-cross-linking assays
    RESULTS
        A short oligonucleotide can form complexes with U2 snRNP
        Both branch sequence and polypyrimidine tract are required
        Similarities of Amin to complex A containing U2 snRNP
        Amin complex formation is ATP independent and undergoes an ATP-dependent dissociation
        Effects of 2'-H substitutions
    DISCUSSION
        A more sensitive system - 2'-OH and adenine interactions
        Amin complex forms independently of U1 snRNP and ATP
        An active mechanism of U2 snRNP removal
    ACKNOWLEDGMENTS
    REFERENCES
    LEGENDS TO FIGURES
        FIGURE 1. BS-PPT RNA forms an A-like complex with U2 snRNP
        FIGURE 2. Both branch sequence and polypyrimidine tract are required in cis
        FIGURE 3. Characteristics of Amin complex
        FIGURE 4. Amin complex formation in the presence of ATP
        FIGURE 5. Time course of complex assembly
        FIGURE 6. Summary of formation of complexes A (left) and Amin (right)
    TABLES
        TABLE 1. Relative yields for Amin complex formation of modified substrates

SPECULATIVE APPENDIX: A PROPOSED INTERACTION BETWEEN THE BRANCH ADENOSINE AND THE U5 LOOP NUCLEOTIDE URIDINE 4, A MECHANISM TO JUXTAPOSE THE SUBSTRATES OF FIRST STEP OF pre-mRNA SPLICING
    A NEW REPRESENTATION OF THE SPLICEOSOMAL SECONDARY STRUCTURE
    DESCRIPTION OF THE MODEL
    STRUCTURAL CONSIDERATIONS
    GENETIC EXPERIMENTS THAT ARE CONSISTENT WITH THE MODEL
    BIOCHEMICAL EVIDENCE
    EXPERIMENTS THAT TEST THE MODEL
    ACKNOWLEDGMENTS
    REFERENCES
    FIGURE LEGENDS
        Figure 1. Known base-pairing interactions of U1, U2, U5, and U6
        Figure 2. The rearranged base-pairing interactions
        Figure 3. The proposed base-pairing interaction between the branch adenosine and Uridine 4
        Figure 4. Photographs of the model of the proposed structure
        Figure 5. Proposed base-pairing interactions

AFTERWORD
    PUF60: a splicing factor?
    Experiments that address functional issues
    PUF60 immunolocalization: is the native protein detectable?
    Is PUF60 a dimer in the native state?

CHAPTER 1: RECOGNITION OF THE SPLICE SITE SEQUENCES IN pre-mRNA SPLICING AND THEIR ROLE IN MEDIATING SPLICEOSOME ASSEMBLY

INTRODUCTION

Without the ability to accurately and reproducibly access and manipulate information, the cell would be unable to function. Information in a biological context exists in many forms, and biological information can be carried by many kinds of biomolecules. Most famously, biological information is carried by DNA in the form of genes, but information can also be carried by proteins. For example, G proteins carry information about the state of membrane trafficking by carrying either GTP or GDP in their guanine nucleotide binding pocket (Boguski and McCormick, 1993). Other proteins carry information in their phosphorylation states: phosphorylation of the retinoblastoma protein is a marker of cell cycle state (Sherr, 1996). The pre-mRNA molecule carries information beyond the genetic information required for protein synthesis. This pre-mRNA-specific information is required for appropriate processing of the pre-mRNA such as splicing and polyadenylation.
Splicing removes introns from the pre-mRNA, a key step in the formation of the mRNA molecule. The unique gene structure of eukaryotes, first described over 20 years ago (Berget et al., 1977; Chow et al., 1977), in which the protein coding information is interrupted by non-coding sequence, requires that the pre-mRNA be accurately processed to remove each of the non-coding (intronic) sequences, leaving only the coding (exonic) sequences required for translation or other mRNA function. In the nucleus, pre-mRNA is processed to remove introns, yielding mRNA that is exported from the nucleus and is competent for translation by the ribosome.

Recognition of an intronic sequence element by a newly recognized splicing factor is the focus of this thesis. A description of the sequence elements required for intron/exon recognition and the trans-acting factors that recognize these elements will be the subject of this introductory chapter. Formation and constitution of the catalytically competent splicing enzyme, the spliceosome, will be more briefly described. This thesis concludes with a speculative appendix in which I discuss one aspect of the structure of the catalytic spliceosome. I shall make a distinction between the catalytically active spliceosome, the catalytic spliceosome, and the early spliceosomal complexes that appear to play a role in splice site sequence recognition and not in catalysis. These problems, the splice site definition problem and the catalytic problem, are distinct intellectually, and I would argue that they are distinct, though related, problems biochemically.

THE SPLICING REACTION

The splicing reaction is indicated diagrammatically in Figure 1. Exons are indicated by boxes and the intron by a line.
Removal of the intron is a two-step transesterification reaction, catalyzed by a large multi-subunit ribonucleoprotein particle, the spliceosome (Brody and Abelson, 1985; Frendewey and Keller, 1985; Grabowski et al., 1984; Grabowski et al., 1985; Padgett et al., 1984; Ruskin et al., 1984). In the first step, a 2' hydroxyl group of an intronic adenosine nucleotide, referred to as the branch adenosine, attacks the phosphodiester bond separating the 5' exon from the intron, referred to as the 5' splice site. This leads to the release of the 5' exon and the formation of a branched 2'-5' adenosine within the intron-3' exon fragment (Padgett et al., 1984; Ruskin et al., 1984; Wallace and Edmonds, 1983). This branched intron-3' exon first step product is often referred to as the lariat intron-3' exon because of its circular structure. In the second transesterification reaction the 3' hydroxyl of the 5' exon attacks the phosphodiester bond at the boundary between the intron and 3' exon, referred to as the 3' splice site, forming ligated exon product and free lariat intron product. How this chemical reaction is accomplished, both in terms of which groups are catalytically important and in terms of how those groups come to be arrayed about the splice sites, remains an important and unanswered question.

INFORMATION AND THE SPLICE SITE DETERMINATION PROBLEM

An important question arises from consideration of the information necessary to accomplish accurate, high-fidelity removal of the intronic sequence. Inaccurate splicing would be disastrous for the cell, as most genes (with the exception of S. cerevisiae genes) have multiple introns, and even a relatively accurate splicing reaction (90% per intron) would yield only about 35% accurately spliced mRNA molecules from a pre-mRNA with ten introns.
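The arithmetic behind the ten-intron estimate can be checked with a short sketch. It assumes, as a simplification that the text does not state explicitly, that every intron is spliced correctly with the same probability and independently of the others; the function name is illustrative only.

```python
def fraction_correctly_spliced(per_intron_accuracy: float, n_introns: int) -> float:
    """Fraction of transcripts in which every intron is removed correctly.

    Assumes each intron is spliced independently with the same accuracy,
    a simplification made only for this illustration.
    """
    if not 0.0 <= per_intron_accuracy <= 1.0:
        raise ValueError("per_intron_accuracy must lie between 0 and 1")
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    return per_intron_accuracy ** n_introns


if __name__ == "__main__":
    # 90% accuracy per intron, ten introns: 0.9 ** 10, roughly 35%
    print(f"{fraction_correctly_spliced(0.90, 10):.1%}")
```

Under the same assumption, the per-intron accuracy would have to be about 99% for roughly nine in ten transcripts of a ten-intron gene to be spliced correctly.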
Most of these inaccurately spliced molecules may be functionally unimportant, as they are subject to nonsense-mediated decay, but some would have severe functional consequences (Hodgkin et al., 1989; Pulak and Anderson, 1993). A typical mammalian pre-mRNA will have thousands of nucleotides, and hence thousands of phosphodiester bonds and 2' hydroxyls. How does the spliceosome know which ones to use? Sequences at the intron-exon boundary and the sequence near the branch adenosine, the branch sequence, are partially conserved. Are these sequences sufficient to uniquely define the splice sites? For the yeast Saccharomyces cerevisiae, the conserved sequences at the 5' splice site, the 3' splice site and the branch sequence are probably sufficient to define each intron uniquely, but for mammalian cells it is clear that there is insufficient information within these sequences to uniquely define intron position (Stephens and Schneider, 1992). In yeast the splice site sequences are precisely defined and little variation is found or tolerated in these sequences (Rymond and Rosbash, 1992). In vertebrates, as for the metazoans generally, the sequences are less well defined and variation of the sequence is better tolerated (see, for example, Reed and Maniatis, 1985; Senapathy et al., 1990). The splice site consensus sequences have recently been refined by C. Burge and P. A. Sharp (to be published in Burge and Sharp, 1998; also see Mount, 1982) from a database of over 1600 mammalian introns. For these introns, the 5' splice site sequence consensus has been determined to be aG/GURrG, the branch sequence consensus is YTrAy and the pyrimidine tract-3' splice site sequence consensus is yyyyyyyynCAG/r (where / marks the splice site phosphodiester bonds, A is the position of the branch adenosine, R is a conserved purine, Y is a conserved pyrimidine and lower case letters are less conserved).
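The consensus notation above can be read as a simple sequence pattern. The following is a minimal sketch, not a method used in this thesis, that translates such a consensus into a regular expression and lists every position in an RNA at which it matches. The function names, the choice to treat lower-case positions as optional only in a relaxed mode, the reading of n as any nucleotide, and the example sequence are all assumptions made for illustration.

```python
import re

# Nucleotide codes used in the consensus strings quoted above (RNA alphabet).
# T is treated as U so that consensus strings written with T can be used as-is.
CODES = {"A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}


def consensus_to_regex(consensus: str, strict: bool = True) -> str:
    """Translate a consensus such as 'aG/GURrG' into a regular expression.

    The '/' marking the splice site bond is dropped. Upper-case positions
    are always enforced; lower-case (less conserved) positions are enforced
    only when strict is True, otherwise they match any nucleotide.
    """
    parts = []
    for ch in consensus:
        if ch == "/":
            continue
        if ch.islower() and not strict:
            parts.append(CODES["N"])
        else:
            parts.append(CODES[ch.upper()])
    return "".join(parts)


def find_sites(rna: str, consensus: str, strict: bool = True) -> list:
    """Return 0-based start positions of all (overlapping) consensus matches."""
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus, strict) + ")")
    return [m.start() for m in pattern.finditer(rna.upper().replace("T", "U"))]


if __name__ == "__main__":
    toy_rna = "CCAGGUAAGUCCCUUAGGUGAGUAAC"  # made-up sequence, not real data
    print(find_sites(toy_rna, "aG/GURrG"))
```

In this made-up sequence the 5' splice site pattern matches at two positions, which illustrates in miniature why a match to the consensus cannot by itself say which site is used.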
The insufficiency of splice site sequences is illustrated in Figure 2, which shows the first 7210 nucleotides of the human CREB-RP transcript (chosen at random, Min et al., 1995). The 5' and 3' terminal nucleotides of the introns are indicated in bold. Note that the sequence of the fourth 5' splice site (underlined) is also found in the fifth intron (also underlined); in that position, however, the sequence is not used as a 5' splice site. How is the identical sequence used as a 5' splice site in one position but not at another? More information must be available to the spliceosome than is found in the splice site and branch sequences. This missing information is often referred to as the "context" of the splice site. Experimentally, the importance of context is demonstrated by the observation that mutation of a 5' or 3' splice site sequence activates cryptic splice sites. Activation of cryptic splice sites is seen in vivo (Treisman et al., 1983) and in vitro (Krainer et al., 1984). While cryptic splice site sequences fit the splice site sequence consensus, they are generally poorer matches to the consensus than the endogenous splice site sequence. Cryptic splice sites are not known to be used unless the endogenous splice site sequence is inactivated by mutation (Treisman et al., 1983). Many naturally occurring mutations of the globin loci have been identified, and mutations of splice sites cause both skipping and cryptic splice site activation (see, for example, Treisman et al., 1983; Faustino, 1998). This result argues that information defining a splice site is not constrained to the splice site sequences themselves, but can be encoded elsewhere, and hence context is important. The presence of information outside of the splice site sequence, the existence of splice site context, raises the question of whether the splice site sequence is not only not sufficient but also perhaps not necessary for splice site determination.
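The observation above that cryptic sites are poorer matches to the consensus than the endogenous site can be made concrete with a position-by-position count. This is a deliberately crude sketch: it weights every position equally, which is an assumption of this illustration and not the information-theoretic treatment cited earlier, and the names and the example sequence are hypothetical.

```python
# Nucleotides allowed at each position of the 5' splice site consensus
# aG/GURrG quoted earlier (RNA alphabet, '/' omitted).
FIVE_PRIME_CONSENSUS = ["A", "G", "G", "U", "AG", "AG", "G"]


def consensus_matches(site: str, consensus=FIVE_PRIME_CONSENSUS) -> int:
    """Count the positions of a candidate site that agree with the consensus."""
    site = site.upper().replace("T", "U")
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(nt in allowed for nt, allowed in zip(site, consensus))


def rank_candidates(rna: str, consensus=FIVE_PRIME_CONSENSUS) -> list:
    """Score every window of the sequence against the consensus.

    Returns (score, position, window) tuples, best score first and ties
    ordered by position.
    """
    rna = rna.upper().replace("T", "U")
    width = len(consensus)
    scored = [
        (consensus_matches(rna[i:i + width], consensus), i, rna[i:i + width])
        for i in range(len(rna) - width + 1)
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    toy_rna = "CCAGGUAAGUCCAGGUCAGU"  # made-up sequence, not real data
    for score, position, window in rank_candidates(toy_rna)[:3]:
        print(score, position, window)
```

In this toy example the window that matches all seven positions plays the role of the endogenous site, and the window with one mismatch plays the role of a cryptic site that might be used only if the better site were inactivated.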
In one sense, the splice site sequence is clearly necessary for splice site determination, as all splice sites identified bear some resemblance to the splice site sequence consensus¹, and so the splice site sequence must determine which phosphodiester bond is targeted by the spliceosome. In a broader sense it is not as clear that the splice site sequence is necessary for determining that a splice site exists in a given region, as the frequency with which these cryptic splice sites are predicted to occur is quite high. Put another way, the splice site sequence is necessary to define the precise phosphodiester bond used as the splice site, but the splice site sequence is not sufficient to identify a region of the pre-mRNA as having a splice site (Green, 1986; Maniatis and Reed, 1987). Also the splice site sequence is not often necessary to identify a region as having a splice site, as inactivation of the wild-type splice site by mutation of the splice site sequence leads to activation of other splice sites that would not otherwise be used (Treisman et al., 1983; Wieringa et al., 1983). The source of the information that is necessary for determining a region as having an exon-intron border, the context, is not known. For pre-mRNAs that have regulated introns or exons this information may be contained in enhancer elements that can be found in either the intron or the exon (Robberson et al., 1990; Talerico and Berget, 1994). For constitutive exons and introns, the situation is less clear and has not been as well studied, although there is recent work on trans-acting factors that may recognize context (Sun et al., 1993; Tacke et al., 1997; Tacke and Manley, 1995; Wang and Manley, 1997). However, there appears to be broad similarity between the constitutive and regulated splicing systems. It is not clear whether the information required for regulated splicing and context information is encoded primarily in the exonic or in the intronic sequence.
Rather it appears that location of the information varies between species: invertebrates may use information within introns (Talerico and Berget, 1994), while vertebrates use information primarily within exons (Robberson et al., 1990). Localization of the information may also vary from intron to intron as small mammalian exons have sequence elements in adjacent introns which regulate their recognition (see for example, Chan and Black, 1995), while others clearly have information within the exon (see for example, Lavigueur et al., 1993).

In the following pages I will discuss recognition of the splice site sequences, focusing on the recognition of the 5' splice site sequence by the U1 small nuclear ribonucleoprotein particle (U1 snRNP) and recognition of the pyrimidine tract by U2 snRNP auxiliary factor (U2AF). I will also discuss, more briefly, the increasing understanding we have of how the non-splice site sequence information is recognized and the protein factors that are implicated in this process. Assembly of the catalytic spliceosome will then be briefly discussed. The mammalian splicing system is the model for these discussions, but it is not possible to discuss splicing without reference to the yeast Saccharomyces cerevisiae. Whenever possible mammalian nomenclature will be used in preference to yeast nomenclature.

¹ Many of the exceptions to this statement are now known to be splice sites that are spliced by a new class of spliceosome, the U12 spliceosome (Hall and Padgett, 1996; Tarn et al., 1995).

RECOGNITION OF THE 5' SPLICE SITE

The 5' splice site sequence consists of the consensus sequence aG/GURrG and this sequence is complementary to the 5' end of U1 snRNA (Lerner et al., 1980; Rogers and Wall, 1980). The sequence is not well conserved in vertebrates and variations are readily used in vivo (see for example Treisman et al., 1983). In the yeast S.
cerevisiae, the 5' splice site consensus sequence is more strictly conserved, and mutations are more detrimental to splicing (Rymond and Rosbash, 1992).

U1 snRNP

U1 snRNP is the most abundant U snRNP in the cell, with approximately 10^6 copies per cell. U1 snRNP is a 17S protein-RNA complex (Reddy and Busch, 1988) consisting of U1 snRNA, a 164 nucleotide RNA transcribed by RNA polymerase II, and multiple proteins, some specific to U1 snRNP and others common to the snRNPs generally (Reddy and Busch, 1988). U1 snRNP specific protein factors include U1A, U1B, U1C, and U1 70k. The function of some of these factors is known and will be discussed below. The principal function of U1 snRNP is binding to 5' splice site sequences. This binding event likely identifies these sequences as potential targets of spliceosome complex formation. Interaction of U1 snRNP with the 5' splice site occurs via a base-pairing interaction between the 5' end of U1 snRNA and the 5' splice site. This interaction was first proposed based on the complementarity of the 5' splice site sequence and the 5' end of U1 snRNA (Lerner et al., 1980; Rogers and Wall, 1980) and binding was demonstrated fifteen years ago (Mount et al., 1983). While base-pairing has been demonstrated between the 5' splice site and the 5' end of U1 (the 5' splice site sequence complementarity region, Zhuang and Weiner, 1986), interaction of U1 snRNP with a 5' splice site RNA does not require the 5' splice site complementarity region (Rossi et al., 1996). This result suggests that protein components of U1 snRNP play a role in mediating pre-mRNA association, but that specificity and stability may require the U1 snRNA 5' end.
In support of this hypothesis, it has been found that the U1 snRNP-specific protein U1C is required for formation of the U1 snRNP-pre-mRNA complex, known as E, in U1 snRNP reconstitution experiments and is sufficient for mediating this interaction in the absence of the 5' splice site complementarity region (Heinrichs et al., 1990; Jamison et al., 1995; Will et al., 1996). U1C has additionally been shown to crosslink to RNA oligonucleotides containing a 5' splice site sequence, indicating a physical interaction between the pre-mRNA 5' splice site and U1C (Rossi et al., 1996). In yeast, the role that the U1C homologue, YU1C, plays in mediating pre-mRNA binding has been investigated genetically (Tang et al., 1997). YU1C is required for viability, and in the absence of YU1C formation of the U1 snRNP-containing complexes CC1 and CC2 is inhibited. Further, the 5' end of U1 snRNA is hypersensitive to RNase digestion in YU1C depleted strains (Tang et al., 1997). Other U1 snRNP-specific proteins have been tested for their role in mediating U1 snRNP association with the pre-mRNA. Neither the U1A nor the U1 70k protein is required for binding to the 5' splice site sequence (Jamison et al., 1995; Will et al., 1996) and a functional role for U1A remains elusive. U1A is highly homologous to the U2 snRNP specific protein U2B" and both have two RNA recognition motif (RRM) domains. U1A binds stem loop II of U1 snRNA via its N-terminal RRM domain (Lutz-Freyermuth et al., 1990; Query et al., 1989b). The interaction between this RNA and RRM domain is among the best studied RNA-protein interactions (Nagai, 1996). The biochemical function of the U1A C-terminal RRM domain is not known, but this domain is not thought to bind RNA (Lu and Hall, 1995). Yeast U1A was identified in a screen for enhancers of a U1 snRNA temperature sensitive mutation and is called MUD1 (Liao et al., 1992). MUD1 is not an essential gene, but mud1Δ is synthetic lethal with the U1 snRNA mutant.
Surprisingly, the C-terminal RRM domain of Mud1p is more conserved than the N-terminal domain, suggesting that it is functionally important (Tang and Rosbash, 1996). The decreased conservation of the N-terminal RRM compared to the C-terminal RRM may be due to the divergence of the U1 snRNA binding element rather than a difference in function. Mutations in MUD1 have only mild effects in vitro. Decreased splicing efficiency is observed only for pre-mRNA substrates that already splice inefficiently in vitro (Liao et al., 1992; Tang and Rosbash, 1996). The U1 70k protein binds to stem-loop I of U1 snRNA via its single RRM domain (Query et al., 1989a; Query et al., 1989b; Surowy et al., 1989). U1 70k has a C-terminal degenerate RS domain. RS domains are regions rich in the dipeptide arginine-serine or serine-arginine and are found on many splicing factors including both subunits of U2AF, p54 and U1 70k as well as the SR proteins. The RS domain of U1 70k, sometimes referred to as the RD/E domain, is unusual in that many of the serines are replaced by aspartate or glutamate in the dipeptide repeats. In this respect, the RS domain of U1 70k is similar to the RS domains of both U2AF and p54. Although U1 70k does not play an essential role in mediating U1 snRNP binding to the pre-mRNA, it may play a role in stabilizing that interaction. Several studies have shown that the RS domains of ASF/SF2, SC35 and other SR proteins bind the RS domain of U1 70k (Jamison et al., 1995; Kohtz et al., 1994; Wu and Maniatis, 1993). Subsequently, it has been shown that phosphorylation of ASF/SF2 is important in mediating the ASF/SF2-U1 70k association. ASF/SF2 binding to U1 70k does not disrupt U1 70k-U1 snRNA binding (Xiao and Manley, 1997). It has been shown that SR proteins stabilize the interaction of U1 snRNP with the pre-mRNA (Kohtz et al., 1994).
This interaction is specific for the SR protein and pre-mRNA and is presumably due to the interaction between U1 70k and an SR protein bound to the pre-mRNA (Zahler and Roth, 1995). U1 70k is well conserved in evolution; the RRM domain and RD/E domain are conserved to the invertebrate Drosophila (Mancebo et al., 1990). However, the yeast U1 70k homologue, SNP1, is not as well conserved. SNP1 has been found to be non-essential in yeast; however, snp1 cells are slow growing and exhibit a severe temperature sensitive phenotype (Hilleren et al., 1995). Remarkably, only the N-terminal domain, and not the RRM domain, is required to rescue these phenotypes (Hilleren et al., 1995). A partial deletion of the SNP1 open reading frame is lethal in at least one strain; this mutation can be rescued with a yeast-human chimeric protein, suggesting that the functional interactions of U1 70k/SNP1 are conserved between yeast and humans (Smith and Barrell, 1991). Yeast has two additional U1 snRNP specific genes, PRP39 (Lockhart and Rymond, 1994) and PRP40 (Kao and Siliciano, 1996). PRP39 is an essential gene; prp39 mutants show defects in splicing both in vivo and in vitro. Prp39p has no obvious structural similarities to other proteins. Absence of Prp39p from extracts prevents formation of spliceosomal complexes. Prp39p is associated with U1 snRNP and associates with spliceosomes as demonstrated by immunoprecipitation (Lockhart and Rymond, 1994). PRP40 was identified as a suppressor of a U1 snRNA mutation in the 5' splice site complementarity region and is an essential gene. PRP40 is stoichiometrically associated with U1 snRNP and is required for the first catalytic step of splicing (Kao and Siliciano, 1996). PRP40 is also known to interact with the branch sequence binding protein SF1/BBP and with MUD2, the probable U2AF homologue in yeast (Abovich and Rosbash, 1997).
Despite the clear role U1 snRNP plays in binding the 5' splice site and the requirement in yeast for several U1 snRNP specific factors for viability (U1 snRNA, YU1C, PRP39, and PRP40), the importance of U1 snRNP has been called into question by three observations. First, U1 snRNP appears to disassemble, or be destabilized, from the spliceosome before catalysis occurs and so is not present in the catalytic spliceosome (Konarska and Sharp, 1986; Michaud and Reed, 1993). Second, in yeast some mutations in the 5' splice site sequence lead to the activation of aberrant 5' splice site selection events, but compensatory mutagenesis of U1 snRNA does not correct this defect (Fouser and Friesen, 1986; Jacquier et al., 1985; Parker and Guthrie, 1985). This suggests that the 5' splice site sequence is recognized by two independent factors, U1 snRNP and a second factor that determines the position of the 5' splice site phosphodiester bond (Rymond and Rosbash, 1992). Third, splicing in vitro can occur in the absence of U1 snRNP. This has been demonstrated both by affinity depletion of U1 snRNP from HeLa nuclear extract (Crispino et al., 1994; Crispino et al., 1996; Crispino and Sharp, 1995) and by specific nuclease digestion of the 5' end of U1 snRNA (Tarn and Steitz, 1994). Most tested pre-mRNAs can only splice in U1-depleted or U1-knockout extracts in the presence of exogenously added SR proteins at high concentrations (Crispino et al., 1996). Cryptic 5' splice site sequences are activated in U1 snRNP depleted extract, suggesting that 5' splice site sequence determination in the absence of U1 snRNP is less specific (Crispino et al., 1996). These results clearly demonstrate that U1 snRNP is not required for formation of the catalytic spliceosome. Some molecule other than U1 snRNP must be responsible for recognition of the 5' splice site at the catalytic steps.
The second 5' splice site recognition event is believed to be mediated by U6 snRNP and by U5 snRNP (Kandels-Lewis and Séraphin, 1993; Lesser and Guthrie, 1993; Sawa and Abelson, 1992; Sawa and Shimura, 1992; Sontheimer and Steitz, 1993; Wyatt et al., 1992). These results suggest that U1 snRNP determines which 5' splice site sequence is chosen for spliceosome formation. Presumably, in the absence of U1 snRNP, 5' splice site sequences are chosen based on the specificity of the catalytic spliceosome and not on the splice site determination functions of the early spliceosomal complexes.

RECOGNITION OF THE 3' SPLICE SITE

Three distinct recognition elements at the 3' end of the intron

The 3' splice site consists of three RNA elements: the branch sequence, the pyrimidine tract and the AG dinucleotide. The vertebrate 3' splice site sequence elements are similar to those used in the invertebrate Drosophila (Mount et al., 1992). The branch sequence, canonically YTrAY, contains the branch adenosine (the A of the consensus), whose 2' hydroxyl is the nucleophile for the first catalytic step. In vertebrates, sequence requirements for recognition of the branch sequence are not as stringent as the requirements for recognition of the 5' splice site and the pyrimidine tract. In vertebrates, mutation of the branch sequence has little effect on splicing efficiency (Wieringa et al., 1983); however, splicing in vitro of these pre-mRNAs is less efficient than in vivo (Padgett et al., 1985). Mapping of the branch adenosine used in these mutant introns showed that the branch sequence used did not match the consensus, suggesting that the branch sequence was not an essential recognition element (Padgett et al., 1985). These data suggest that branch sequence recognition is not an essential feature of intron recognition in vertebrates.
There is, however, a clear constraint on the branch nucleotide in that the branch nucleotide must be an adenosine for both steps of splicing to be completed (Query et al., 1994). In the yeast S. cerevisiae, in contrast, recognition of the branch sequence is probably the critical recognition event in mediating spliceosome assembly, and the branch sequence is both highly conserved and very sensitive to mutation (Parker et al., 1987). There is probably sufficient information in the yeast branch sequence to uniquely identify all but the longest yeast introns (Rymond and Rosbash, 1992), strongly implicating the branch sequence in being the critical mediator of intron recognition in yeast. Mutation of the branch sequence in S. cerevisiae leads either to a block in splicing or to inefficient splicing (Cellini et al., 1986; Parker et al., 1987) and can also lead to the accumulation of first step products (Fouser and Friesen, 1986). The pyrimidine tract in vertebrates generally lies between the branch sequence and the AG dinucleotide. Recognition of a pyrimidine tract is important for splicing in vertebrates as mutations of the pyrimidine tract decrease or block splicing (Reed and Maniatis, 1985; Roscigno et al., 1993; Ruskin and Green, 1985a; Ruskin and Green, 1985b; Smith et al., 1989). In vertebrates, recognition of the pyrimidine tract appears to be the primary recognition event at the 3' end of the intron as the branch sequence is so poorly conserved and the AG dinucleotide, 3' splice site, appears to be defined in relation to the pyrimidine tract (Smith et al., 1989). The pyrimidine tract and AG dinucleotide consensus sequence of both C. elegans (Zhang and Blumenthal, 1996) and the yeast S. cerevisiae (Parker et al., 1987) is quite different from the vertebrate consensus. The C. elegans 3' splice site consensus is UUUCAG/R with no discernible branch sequence and no pyrimidine tract (Zhang and Blumenthal, 1996). 
The yeast 3' splice site consensus is quite different in that most of the sequence conservation is found at the branch sequence with no apparent pyrimidine tract. Mutations or deletions in the region between the branch sequence and the AG dinucleotide generally have little effect on splicing in yeast (Fouser and Friesen, 1987). The important sequence recognition elements of the 3' splice site region appear to have switched between yeast and vertebrate introns. For yeast the presence of a pyrimidine tract is not important, but the branch sequence is essential, while for vertebrates the pyrimidine tract is important while the branch sequence is not. The AG dinucleotide is not an essential recognition element in vertebrates, as deletion of the AG dinucleotide does not affect intron recognition. Both spliceosome assembly and first step catalysis can occur on pre-mRNAs that lack the AG dinucleotide (Anderson and Moore, 1997; Frendewey and Keller, 1985; Reed and Maniatis, 1985). The AG dinucleotide is, however, required for the second step, but it remains unclear whether the AG dinucleotide is selected by a scanning mechanism or by some other mechanism (Anderson and Moore, 1997) and whether this recognition event generally occurs before or after the first step. The following sections will discuss the pyrimidine tract recognition factor U2AF and the branch sequence binding protein SF1/BBP.

U2AF

U2AF (U2 snRNP auxiliary factor) was first identified in an experiment that demonstrated that a protein factor was essential for the stable ATP-dependent association of U2 snRNP with the branch sequence (Ruskin et al., 1988). U2AF is a heterodimeric protein splicing factor consisting of a small subunit, U2AF35, and a large subunit, U2AF65 (Zamore and Green, 1989). Both subunits of U2AF have been shown to be highly conserved through evolution and both subunits are found in the fission yeast S. pombe (Potashkin et al., 1993; Wentz and Potashkin, 1996), in the plant A.
thaliana (Accession number: AC002332) and in the invertebrates C. elegans (a small subunit orthologue has not been identified in C. elegans; Zorio et al., 1997) and D. melanogaster (Kanaar et al., 1993). U2AF was purified using a U2 snRNP complex assembly assay. This assay demonstrated the critical importance of the pyrimidine tract in mediating U2 snRNP association with the pre-mRNA (Zamore and Green, 1989). The 5' splice site is not required for the formation of this complex; however, U1 snRNP is required (Barabino et al., 1990).

U2AF65

U2AF was found to bind poly[U] RNA at high salt concentrations (Zamore and Green, 1989) and this allowed U2AF activity to be efficiently and specifically depleted from nuclear extract (Zamore and Green, 1991). U2AF65 was found to be necessary and sufficient to reconstitute in vitro splicing activity to these depleted extracts (Zamore and Green, 1991). The Drosophila large subunit of U2AF has been shown to be essential for viability and temperature sensitive mutations of the S. pombe homologue of the large subunit of U2AF are inviable (Kanaar et al., 1993; Potashkin et al., 1993). A gene with limited sequence similarity to U2AF65, MUD2, is found in the yeast S. cerevisiae; despite the limited sequence similarity this gene is likely to be the S. cerevisiae orthologue of U2AF65 due to the striking functional similarity of Mud2p to U2AF65. Mud2p will be discussed separately below. The domain structure of the large subunit of U2AF is conserved throughout these proteins. The N-terminal region of the protein consists of three parts: a short N-terminal peptide not conserved between invertebrates and vertebrates (Zorio et al., 1997), an RS-dipeptide rich region of variable length reminiscent of the RS domain of U1 70k in its high RE and RD content, and a "hinge" region which has been shown to interact with the small subunit of U2AF in both S. pombe and H. sapiens.
The C-terminal two thirds of U2AF65 is the RNA binding region of the protein and consists of two RRM domains and a third C-terminal RRM-like domain. The RRM-like domain is the most conserved portion of these proteins. This domain will be described in more detail in chapter 3. The C-terminal RRM-like domain of U2AF65 has additionally been shown to interact with the branch-sequence binding protein SF1/BBP. The RRM-like domain is required for the association of U2AF with SF1/BBP. Constructs missing 15 amino acids of the N-terminus of this domain cannot associate with SF1/BBP (Berglund et al., 1997). The RS domain of U2AF65 interacts weakly with SR proteins (Wu and Maniatis, 1993) and the RS domain may contact the branch sequence (Valcarcel et al., 1996). The interaction of the human U2AF subunits is stable to extremely high salt concentrations (at least 2M KCl; Zamore and Green, 1989). U2AF65 also contacts the DEAD-box protein UAP56 and recruits it to the assembling spliceosome (Fleckner et al., 1997). The N-terminal region of U2AF65, including the RS domain, the hinge domain, and a portion of the first RRM, can confer U2AF65 activity on a heterologous RNA binding protein, Sex lethal, in a splicing reaction with the U2AF depleted extract NEΔU2AF (Valcarcel et al., 1993).

The S. cerevisiae splicing factor Mud2p

MUD2 is a clear functional homologue to the U2AF large subunit family. MUD2 has, however, diverged in sequence, and the amino acid sequence similarity between U2AF65 and Mud2p is difficult to detect (Abovich et al., 1994). Nonetheless, Mud2p has the same domain structure and organization as U2AF65 and is the most similar molecule to U2AF65 in the S. cerevisiae genome (Abovich et al., 1994). Like U2AF65, Mud2p interacts with the branch sequence binding protein SF1/BBP (Berglund et al., 1998). Mud2p plays a role in the early steps of spliceosome assembly, as does U2AF65 (Abovich et al., 1994).
MUD2 has been shown to interact genetically, but not biochemically, with the yeast homologue of the U2 snRNP protein U2B". YU2B" is not essential and mutants do not show a growth defect; however, the U2 snRNP-containing spliceosome complex is compromised in yu2b" mutant extracts (Tang et al., 1996). Mud2p is present in the U1 snRNP complex CC2, but is not present in CC1. Mud2p does not interact with U1 snRNP in the absence of pre-mRNA (Abovich et al., 1994). Interaction of Mud2p with pre-mRNA is dependent on the presence of a consensus branch sequence, but only moderately affected by mutation of the 5' splice site. Mud2p has been shown to interact with PRP11 both by synthetic lethality and yeast two-hybrid analysis. PRP11 is the yeast counterpart of SAP62, a subunit of the U2 snRNP component SF3a, and associates with the pre-mRNA only in the presence of U1 and ATP (Abovich et al., 1994). Similarly, it has been shown that in the vertebrate equivalent of the yeast CC2 complex, the E complex, U2AF interacts with SAP62 and U2 snRNP (Hong et al., 1997).

U2AF35

The small subunit of U2AF is also conserved between the fission yeast S. pombe (Wentz and Potashkin, 1996) and Drosophila (Rudner et al., 1996) and consists of a conserved N-terminal domain and less well conserved C-terminal RS domain. The N-terminal region of the protein consists of a conserved PUMP domain (chapter 3) flanked by two regions of near identity, of about 60 and 20 amino acids each, between S. pombe, D. melanogaster, and humans (Wentz and Potashkin, 1996). Interaction with the large subunit of U2AF has been shown to be through this central domain. The U2AF small subunit has been shown to be required for viability in Drosophila (Rudner et al., 1996). Extracts depleted of U2AF, NEΔU2AF, require the addition of U2AF65, but not U2AF35, for reconstitution of splicing activity (Zamore and Green, 1991).
A role for U2AF35 in splicing in vitro was found by using extracts that had been co-depleted of U2AF65 and U2AF35 using polyclonal U2AF35 serum (Zuo and Maniatis, 1996). The remaining U2AF in the extract is approximately stoichiometric for both subunits. In contrast, there is approximately the same amount of U2AF35 remaining in NEΔU2AF as in the antibody depleted extract, but substantially less U2AF65 (Zuo and Maniatis, 1996). These antibody depleted extracts show partial restoration of in vitro splicing activity with addition of either U2AF65 or U2AF35 and substantially more splicing activity when both proteins are present. The function of U2AF35 in these extracts is suggested by experiments in which the association of U2AF65 with the pyrimidine tract of a pre-mRNA that required an enhancer sequence for in vitro splicing was monitored. U2AF65 was shown to interact with this pyrimidine tract only in the presence of SR proteins, which bound the enhancer, and U2AF35. This suggests that U2AF35 acts to bridge U2AF65 and SR proteins bound to the pre-mRNA (Zuo and Maniatis, 1996). This interpretation is supported by the observation that SR proteins interact preferentially with U2AF35, and not U2AF65, in a two-hybrid assay (Wu and Maniatis, 1993). The interaction between the SR proteins, bound to the pre-mRNA, and U2AF65, mediated by U2AF35, is paralleled by the interaction between SR proteins, bound to the pre-mRNA, and U1 snRNP, mediated by U1 70k. This interaction may be functionally significant in formation of the E complex and in splice site determination (Staknis and Reed, 1994b).

Paralogues² of U2AF

The previous section has described the probable orthologues of U2AF65 and U2AF35; both of these molecules also have paralogues. U2AF65 has two paralogues; the first is HCC, which was identified as a human autoimmune antigen in hepatocarcinoma patients (Imai et al., 1993); the second, PUF60, is the subject of chapters 2 and 3.
U2AF35 also has a paralogue, Urp or U2AFrsl (Tronchere et al., 1997). Urp is required for in vitro splicing and the Urp deficit cannot be complemented with U2AF35. Urp binds U2AF65 in the same region that U2AF35 does. The region of Urp that interacts with U2AF65 contains an RRM-like domain (discussed in chapter 3; Tronchere et al., 1997).

SF1/BBP

SF1 was first identified as a biochemical fraction that was required for spliceosomal complex assembly together with other protein splicing factors, including U2AF and the U2 snRNP associated factors SF3, and purified snRNP particles (Brosi et al., 1993; Kramer and Utans, 1991; Utans and Kramer, 1990). Subsequently it has been shown that SF1 has a branch sequence binding activity and is the orthologue of the yeast protein BBP (Abovich and Rosbash, 1997; Berglund et al., 1997). SF1/BBP interacts with Mud2p and the U1 snRNP protein PRP40 in the CC2 complex (Abovich and Rosbash, 1997).

ASSEMBLY OF THE SPLICEOSOME

Spliceosome assembly is a multi-step process in which five snRNPs assemble on the pre-mRNA in an ordered fashion. Numerous proteins are found in the spliceosomal complexes; some are specific for particular complexes while others are found in more than one. This discussion will focus on the assembly of the earliest complex and on the formation of transitional complexes to the catalytic spliceosome. More complete descriptions of spliceosome assembly, the protein factors found in spliceosomal complexes and the RNA secondary structures found in the spliceosomal complexes can be found elsewhere (Madhani and Guthrie, 1994; Moore et al., 1993; Reed, 1996). Spliceosome assembly is shown diagrammatically in figure 3.

² Paralogues are genes that show similarity in sequence, occur within the same genome and derive from a common ancestral gene (this definition is from Kendrew, 1994). The prototypical example of a paralogue is alpha and beta globin, which are paralogues arising from the duplication of an ancestral globin gene. Paralogues are contrasted to orthologues; while paralogues are observed within a single genome, within a single species, orthologues are observed between species. Orthologous genes are related by similarity between species and are related by descent to an ancestral molecule with no duplication events. Thus, the alpha globin genes of horse and human are orthologues of one another, but the alpha globin gene and the beta globin gene are not orthologues but paralogues.

E complex

The first stable splicing complex that can be identified is the E, or early, complex (Kohtz et al., 1994; Michaud and Reed, 1993) which is analogous to the commitment complexes, CC1 and CC2, seen in S. cerevisiae (Abovich and Rosbash, 1997; Séraphin and Rosbash, 1989; Séraphin and Rosbash, 1991). E complex forms in the absence of ATP. E complexes can form on both 5' and 3' half substrates; while not requiring an intact pre-mRNA, E does require the presence of splice site sequences. Surprisingly, exonic sequences are also important in E5' and E3' complex formation (Michaud and Reed, 1993). The yeast CC1 complex requires the 5' splice site, but does not require the branch sequence for formation (Séraphin and Rosbash, 1989; Séraphin and Rosbash, 1991). It is likely that U1 snRNP binds to the pre-mRNA at many 5' splice site sequence consensus sites that may not correspond to the 5' splice site and that only those U1 snRNP complexes that bind near to SR proteins bound to enhancer sequences are stabilized and can initiate formation of the ATP-dependent spliceosomal complexes. Consistent with this there appears to be a higher stoichiometry of U1 snRNP in E complex than there is for the other snRNAs in later complexes (Michaud and Reed, 1991; Michaud and Reed, 1993). In one case it has been shown that U1 snRNP binds to a nonfunctional 5' splice site sequence (Siebel et al., 1992).
However, in this case binding of U1 snRNP to this sequence is important in regulating this intron. It is also likely that U2AF binds to pyrimidine tracts, some of which may not be 3' splice site sequences. So although U1 snRNP and U2AF have clear roles in splice site sequence recognition, they are probably not sufficient, in themselves, to determine the splice site. This is consistent with the previous discussion arguing that the splice site sequences themselves are insufficient to determine splice sites. Transition from the E complex to the A complex, which contains a stably bound U2 snRNP, requires ATP (Konarska and Sharp, 1986); ATP hydrolysis may, then, be required for the splice site determination step. There are two candidate classes of ATPases that may be responsible for the ATP-dependent E to A complex transition. The first class of ATPases is the DEAD-box family of proteins, which are believed to be ATP-dependent helicases (Will and Lührmann, 1997). Although there are currently no helicases known to function at this step in spliceosome assembly, U2AF65 is known to bind to the UAP56 DEAD-box protein and this protein may play a role in this transition (Fleckner et al., 1997). The Prp5p DEAD-box protein may also be required for this step in yeast (Wiest et al., 1996). The second class of ATPases that may play a role in this process is the RS-domain kinases. Two RS-domain kinases have been purified based on substrate specificity (Colwill et al., 1996; Gui et al., 1994) and another RS-domain kinase activity associates with U1 snRNP (Tazi et al., 1993). DNA topoisomerase I is also a candidate RS kinase (Rossi et al., 1996). These kinases have not been demonstrated to have an essential role in splicing, but it is known that dephosphorylation is essential for splicing (Mermoud et al., 1994), suggesting that a phosphorylation-dephosphorylation cycle is important in splicing (Cao et al., 1997).
In these experiments phosphatase inhibitors blocked splicing; however, spliceosomal complexes still formed (Mermoud et al., 1994). Thiophosphorylation of the U1 70k protein (thiophosphorylation is known to inhibit phosphatases) also leads to accumulation of spliceosomal complexes and to a block to splicing (Tazi et al., 1993).

The SR proteins have been shown to be present in the E complex and to play a role in intron recognition (Staknis and Reed, 1994b). It is interesting in this respect to recall that exonic sequences are important for the formation of the E complex (Michaud and Reed, 1993). SR proteins generally have two N-terminal RNA recognition motif (RRM) domains that are important for both sequence-specific binding (Manley and Tacke, 1996) and intron-specific splicing (Chandler et al., 1997; Tacke and Manley, 1995). SR proteins have also been shown to bind splicing regulatory elements (Lynch and Maniatis, 1996) and are likely to bind enhancer elements more generally. SR proteins would, therefore, be well suited to act as mediators in recognition of the context information that is required in addition to the splice site sequence information.

These results suggest a model in which U1 snRNP and U2AF may be stabilized by interaction with SR proteins bound to nearby enhancer sequences. This interaction is likely to require phosphorylation of the SR domains of one or both components of the interaction (Xiao and Manley, 1997). The biochemical interaction between U1 snRNP or U2AF65 and the SR proteins would link splice site sequence information to context information, in the form of the enhancer sequences, and would also link ATP hydrolysis (phosphorylation of SR domains) to the transition from E to A complex. The complexes observed in the presence of phosphatase inhibitors and thiophosphorylated U1 70k may represent complexes that are blocked in the transition from the U1 snRNP-containing complexes to the later spliceosomal complexes.
A model in which factors that associate with the splice site sequences, such as U1 70k and U2AF, also interact with factors, such as SR proteins, associated with exonic or intronic sequence elements may allow for greater flexibility and fidelity than a model in which the factors bind to only one or two sequences. Herschlag argues that multiple, weak interactions can be more specific than a single, strong interaction (Herschlag, 1991). This hypothesis also predicts that phosphatases and kinases may play a role in regulating splicing. Interestingly, recent work suggests that a phosphatase interacts with PSF, a pyrimidine tract-associated splicing factor (Hirano et al., 1996; Patton et al., 1993). The PSF-associated protein PTB has been shown to play a role in inhibiting exon inclusion (Ashiya and Grabowski, 1997; Gooding et al., 1998).

It is interesting to contrast the splice site determination system in vertebrates with that found in yeast, where there are no clear orthologues of the SR proteins. SR proteins may not be required in yeast, as there is much more information present in the yeast splice site sequences than in vertebrate sequences, perhaps enough to uniquely define each intron (C. Burge, personal communication). In the case of yeast splice site determination, additional enhancer information, or context, may not be required, and all the splice site determination information may be found within the splice site sequences themselves. Notably, there are few long introns in yeast, and for these introns an RNA duplex formed between intronic sequences near the intron ends is thought to assist splicing efficiency (Charpentier and Rosbash, 1996). These duplex sequences may be the functional equivalent of the splicing enhancer sequences found in the more complicated vertebrate splicing systems.

A complex

U2 snRNP has recently been shown to associate with E complex; the SF3a component of 17S U2 snRNP is likely to be important for this interaction (Hong et al., 1997). The S.
cerevisiae equivalent of SF3a is also required for the binding of U2 snRNP to the CC complex (Ruby et al., 1993). SF3a is composed of three proteins, SAP61, SAP62 and SAP114, and these proteins are the orthologues of PRP9, PRP11 and PRP21 of S. cerevisiae (Behrens et al., 1993; Bennett and Reed, 1993; Brosi et al., 1993). The yeast PRP5 gene encodes a DEAD-box protein that interacts with PRP9, PRP11 and PRP21 and so may be responsible for mediating the ATP-dependent association of U2 snRNP with the branch sequence in yeast (Dalbadie-McFarland and Abelson, 1990; Ruby et al., 1993). The Amin complex may also represent an early stage in the association of U2 snRNP with the pre-mRNA (Query et al., 1997). Assembly of this complex requires the activity of the PUF fraction that is the subject of chapters 2 and 3 (Query et al., 1997).

U2 snRNP has been shown to base-pair with the branch sequence (Parker et al., 1987; Wu and Manley, 1989; Zhuang and Weiner, 1989), leaving the branch adenosine unpaired and bulged in A complex (see, for example, Query et al., 1994). The SF3a components have been shown to contact the pre-mRNA both 5' and 3' to the branch sequence in this complex. This interaction does not occur prior to the ATP-dependent stable association of U2 snRNP with the branch sequence (Gozani et al., 1996).

The spliceosomal complexes B and C

Stable binding of U2 snRNP to the pre-mRNA is rapidly followed by the association of the tri-snRNP complex U4/5/6 snRNP (Konarska and Sharp, 1986). Factors required for the association of the tri-snRNP with the pre-mRNA and the structure of this complex have been described elsewhere (Moore et al., 1993). The catalytically active spliceosome, C, is thought to result from a conformational change in the RNA and protein components present in the B complex (Moore et al., 1993). Some second-step-specific splicing factors may associate at this time as well (Umen and Guthrie, 1995).
The B and C complexes, and the protein factors and genes responsible for their assembly and function, have been reviewed elsewhere (Moore et al., 1993).

Few experiments address the question of how the transition from the A to the B/C complexes is accomplished. I will describe an unusual complex identified by Ast and colleagues that may represent an intermediate transitional complex between the A and the B/C complexes (Ast and Weiner, 1997a; Ast and Weiner, 1997b; Ast and Weiner, 1996). Ast and co-workers have shown that a U1/4/5 snRNP complex is detectable under some conditions (Ast and Weiner, 1996). This was unexpected, as both U1 and U4 are destabilized from the spliceosome before splicing catalysis (Blencowe et al., 1989; Cheng and Abelson, 1987; Lamond et al., 1988; Michaud and Reed, 1993; Pikielny et al., 1986; Yean and Lin, 1991), whereas U5 is associated with the spliceosome at the time of catalysis (Sontheimer and Steitz, 1993). A function for this complex seemed, at first, obscure.

In more detail, this work shows that addition of an oligo, BU5Ae, complementary to the 3' end of the U5 stem induces a conformational change in U5 snRNP (Ast and Weiner, 1997a). The conformational change leads to the formation of a U5 complex with U1 and U4 snRNPs, and this U1/4/5 complex has 5' splice site binding properties. Addition of a pyrimidine-tract 3' splice site oligo to the 5' splice site binding reaction increases association of the 5' splice site oligo 8-fold. Association of the pyrimidine tract occurs in a complex containing U2 and U6 snRNPs, which appears to be joined to the U1/4/5 complex by protein factors. The interaction of U2 and U6 snRNPs with the U1/4/5 complex is enhanced by the presence of pre-mRNA, suggesting that a factor mediates a bridging interaction (Ast and Weiner, 1996). Such a bridging interaction could be mediated by interaction with U2 via the pyrimidine tract and with U1 via U1 70k.
The formation of a U1/4/5 complex and its possible association with a U2/6 complex is suggestive of a transitional complex that may form between the commitment complex and the catalytic spliceosome. It is tempting to speculate that a U1/4-containing complex disassembles from the pre-mRNA in a concerted fashion as the U2/5/6 spliceosomal components bind to the pre-mRNA. The U1/4/5 complex may then represent the final complex formed before U5 binds the exonic region of the 5' splice site and U1 and U4 dissociate from the pre-mRNA.

The presence of the BU5Ae oligo is required for the formation of these complexes. While this oligo does not inhibit splicing and can remain associated with U5 snRNP in the spliceosome through both catalytic steps, it is not clear whether the observed changes in U5 snRNP caused by BU5Ae faithfully model the changes that occur normally during spliceosome complex assembly (Ast and Weiner, 1996). The nature of these changes, if shown to occur in the spliceosome, will be of great interest. It will also be of interest to determine whether the complexes observed in the presence of the BU5Ae oligo can be observed in its absence.

MY CONTRIBUTIONS TO THIS PROJECT:

Chapter two presents the identification, purification, and characterization of a new pyrimidine tract binding factor, poly[U] factor (PUF), that is required for efficient splicing in vitro. Chapter three describes a novel pyrimidine tract binding protein present in the PUF factor that associates with p54, an SR-protein splicing factor, and identifies a domain of that protein as a protein-protein interaction domain. Chapter four describes experiments by Andrew MacMillan, John Crispino and myself (MacMillan et al., 1997) that demonstrate that U2AF is not required for splicing in vitro.
John had previously demonstrated that U1 snRNP is not required for splicing in vitro, and Andrew determined, using the poly[U]-depleted splicing extract described in chapter 2 (NEAU), that U2AF was not required for splicing in vitro in the presence of high concentrations of added SC35, an SR protein. As described above, neither U1 snRNP nor U2AF is thought to be present in the catalytic spliceosome; these experiments and John Crispino's work (Crispino et al., 1994; Crispino and Sharp, 1995) suggest that the spliceosomes formed in the absence of U1 snRNP or U2AF have bypassed the splice site determination step. It is interesting to note that PUF is required in the U2AF-depleted SC35 reconstitution experiments; this suggests that PUF acts at a post-U2AF-dependent step in spliceosome assembly. Alternatively, under these conditions, PUF may substitute for U2AF function.

Chapter 5 describes the identification of a U2 snRNP complex that forms on a minimal RNA ligand (Query et al., 1997). Charles Query demonstrated that this complex uncovers sequence binding specificities of U2 snRNP that are masked in the U2 snRNP complexes that form on larger RNAs. These specificities are presumably masked when U2 snRNP binds to longer RNAs due to the RNA binding activities of U2 snRNP-associated proteins (Staknis and Reed, 1994a). Together, Charles and I showed that this U2 snRNP complex requires both the PUF and U2AF activities to form in NEAU. This demonstrates that PUF activity is required prior to, or coincident with, U2 snRNP complex formation at the branch sequence.

The thesis includes a speculative appendix that describes a model for the interaction of the branch adenosine with the conserved U5 snRNA loop. This base-pairing interaction juxtaposes the 5' splice site phosphodiester bond with the 2' hydroxyl nucleophile of the branch adenosine and so may be an interaction that aligns these substrates for the first chemical step.
The model is discussed in the context of the genetic and biochemical evidence describing the interaction between the U5 snRNA loop and the 5' splice site sequence (Newman, 1997; O'Keefe et al., 1996; Sontheimer and Steitz, 1993; Wyatt et al., 1992) and found to be consistent with this interaction. Experiments that test the model are suggested.

In an afterword, experiments are discussed that will help to clarify the many outstanding issues regarding the PUF factor and PUF60. Particular emphasis will be placed on determining if PUF60 is required for splicing in vitro.

REFERENCES

Abovich, N., Liao, X. C., and Rosbash, M. (1994). The yeast MUD2 protein: an interaction with PRP11 defines a bridge between commitment complexes and U2 snRNP addition. Genes Dev 8, 843-54.
Abovich, N., and Rosbash, M. (1997). Cross-intron bridging interactions in the yeast commitment complex are conserved in mammals. Cell 89, 403-12.
Anderson, K., and Moore, M. J. (1997). Bimolecular exon ligation by the human spliceosome. Science 276, 1712-6.
Ashiya, M., and Grabowski, P. J. (1997). A neuron-specific splicing switch mediated by an array of pre-mRNA repressor sites: evidence of a regulatory role for the polypyrimidine tract binding protein and a brain-specific PTB counterpart. RNA 3, 996-1015.
Ast, G., and Weiner, A. M. (1997a). Antisense oligonucleotide binding to U5 snRNP induces a conformational change that exposes the conserved loop of U5 snRNA. Nucleic Acids Res 25, 3508-13.
Ast, G., and Weiner, A. M. (1997b). A novel U1/U5 interaction indicates proximity between U1 and U5 snRNAs during an early step of mRNA splicing. RNA 3, 371-81.
Ast, G., and Weiner, A. M. (1996). A U1/U4/U5 snRNP complex induced by a 2'-O-methyl-oligoribonucleotide complementary to U5 snRNA. Science 272, 881-4.
Barabino, S. M. L., Blencowe, B. J., Ryder, U., Sproat, B. S., and Lamond, A. I. (1990). Targeted snRNP depletion reveals an additional role for mammalian U1 snRNP in spliceosome assembly. Cell 63, 293-302.
Behrens, S. E., Galisson, F., Legrain, P., and Luhrmann, R. (1993). Evidence that the 60-kDa protein of 17S U2 small nuclear ribonucleoprotein is immunologically and functionally related to the yeast PRP9 splicing factor and is required for the efficient formation of prespliceosomes. Proc Natl Acad Sci U S A 90, 8229-33.
Bennett, M., and Reed, R. (1993). Correspondence between a mammalian spliceosome component and an essential yeast splicing factor. Science 262, 105-8.
Berget, S. M., Moore, C., and Sharp, P. A. (1977). Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci U S A 74, 3171-3175.
Berglund, J. A., Abovich, N., and Rosbash, M. (1998). A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition. Genes Dev 12, 858-67.
Berglund, J. A., Chua, K., Abovich, N., Reed, R., and Rosbash, M. (1997). The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell 89, 781-7.
Birney, E., Kumar, S., and Krainer, A. R. (1993). Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res 21, 5803-16.
Blencowe, B. J., Sproat, B. S., Ryder, U., Barabino, S., and Lamond, A. I. (1989). Antisense probing of the human U4/U6 snRNP with biotinylated 2'-OMe RNA oligonucleotides. Cell 59, 531-539.
Boguski, M. S., and McCormick, F. (1993). Proteins regulating Ras and its relatives. Nature 366, 643-54.
Brody, E., and Abelson, J. (1985). The 'spliceosome': Yeast premessenger RNA associates with a 40 S complex in a splicing-dependent reaction. Science 228, 963-967.
Brosi, R., Groning, K., Behrens, S. E., Luhrmann, R., and Kramer, A. (1993). Interaction of mammalian splicing factor SF3a with U2 snRNP and relation of its 60-kD subunit to yeast PRP9. Science 262, 102-5.
Burge, C., and Sharp, P. A. (1998). Manuscript in preparation. In The RNA World II.
Cao, W., Jamison, S. F., and Garcia-Blanco, M. A. (1997).
Both phosphorylation and dephosphorylation of ASF/SF2 are required for pre-mRNA splicing in vitro. RNA 3, 1456-1467.
Cellini, A., Parker, R., McMahon, J., Guthrie, C., and Rossi, J. (1986). Activation of a cryptic TACTAAC box in the Saccharomyces cerevisiae actin intron. Mol Cell Biol 6, 1571-8.
Chan, R. C., and Black, D. L. (1995). Conserved intron elements repress splicing of a neuron-specific c-src exon in vitro. Mol Cell Biol 15, 6377-85.
Chandler, S. D., Mayeda, A., Yeakley, J. M., Krainer, A. R., and Fu, X. D. (1997). RNA splicing specificity determined by the coordinated action of RNA recognition motifs in SR proteins. Proc Natl Acad Sci U S A 94, 3596-601.
Charpentier, B., and Rosbash, M. (1996). Intramolecular structure in yeast introns aids the early steps of in vitro spliceosome assembly. RNA 2, 509-22.
Cheng, S. C., and Abelson, J. (1987). Spliceosome assembly in yeast. Genes Dev 1, 1014-1027.
Chow, L. T., Gelinas, R. E., Broker, T. R., and Roberts, R. J. (1977). An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 12, 1-8.
Colwill, K., Pawson, T., Andrews, B., Prasad, J., Manley, J. L., Bell, J. C., and Duncan, P. I. (1996). The Clk/Sty protein kinase phosphorylates SR splicing factors and regulates their intranuclear distribution. EMBO J 15, 265-75.
Crispino, J. D., Blencowe, B. J., and Sharp, P. A. (1994). Complementation by SR proteins of pre-mRNA splicing reactions depleted of U1 snRNP. Science 265, 1866-9.
Crispino, J. D., Mermoud, J. E., Lamond, A. I., and Sharp, P. A. (1996). Cis-acting elements distinct from the 5' splice site promote U1-independent pre-mRNA splicing. RNA 2, 664-73.
Crispino, J. D., and Sharp, P. A. (1995). A U6 snRNA:pre-mRNA interaction can be rate-limiting for U1-independent splicing. Genes Dev 9, 2314-2323.
Dalbadie-McFarland, G., and Abelson, J. (1990). PRP5: A helicase-like protein required for mRNA splicing in yeast. Proc Natl Acad Sci U S A 87, 4236-4240.
Faustino, P., Osorio-Almeida, L., Romao, L., Barbot, J., Fernandes, B., Justica, B., and Lavinha, J. (1998). Dominantly transmitted beta-thalassemia arising from the production of several aberrant mRNA species and one abnormal peptide. Blood 91, 685-90.
Fleckner, J., Zhang, M., Valcarcel, J., and Green, M. R. (1997). U2AF65 recruits a novel human DEAD box protein required for the U2 snRNP-branchpoint interaction. Genes Dev 11, 1864-72.
Fouser, L. A., and Friesen, J. D. (1987). Effects on mRNA splicing of mutations in the 3' region of the Saccharomyces cerevisiae actin intron. Mol Cell Biol 7, 225-30.
Fouser, L. A., and Friesen, J. D. (1986). Mutations in a yeast intron demonstrate the importance of specific conserved nucleotides for the two stages of nuclear mRNA splicing. Cell 45, 81-93.
Frendeway, D., and Keller, W. (1985). Stepwise assembly of a pre-mRNA splicing complex requires U-snRNAs and specific intron sequences. Cell 42, 355-367.
Gooding, C., Roberts, G. C., and Smith, C. W. (1998). Role of an inhibitory pyrimidine element and polypyrimidine tract binding protein in repression of a regulated alpha-tropomyosin exon. RNA 4, 85-100.
Gozani, O., Feld, R., and Reed, R. (1996). Evidence that sequence-independent binding of highly conserved U2 snRNP proteins upstream of the branch site is required for assembly of spliceosomal complex A. Genes Dev 10, 233-43.
Grabowski, P. J., Padgett, R. A., and Sharp, P. A. (1984). Messenger RNA splicing in vitro: An excised intervening sequence and a potential intermediate. Cell 37, 415-427.
Grabowski, P. J., Seiler, S. R., and Sharp, P. A. (1985). A multicomponent complex is involved in the splicing of messenger RNA precursors. Cell 42, 345-353.
Green, M. R. (1986). Pre-mRNA splicing. Annu Rev Genet 20, 671-708.
Gui, J. F., Tronchere, H., Chandler, S. D., and Fu, X. D. (1994). Purification and characterization of a kinase specific for the serine- and arginine-rich pre-mRNA splicing factors.
Proc Natl Acad Sci U S A 91, 10824-8.
Hall, S. L., and Padgett, R. A. (1996). Requirement of U12 snRNA for in vivo splicing of a minor class of eukaryotic nuclear pre-mRNA introns. Science 271, 1716-8.
Heinrichs, V., Bach, M., Winkelmann, G., and Luhrmann, R. (1990). U1-specific protein C needed for efficient complex formation of U1 snRNP with a 5' splice site. Science 247, 69-72.
Herschlag, D. (1991). Implications of ribozyme kinetics for targeting the cleavage of specific RNA molecules in vivo: more isn't always better. Proc Natl Acad Sci U S A 88, 6921-6925.
Hilleren, P. J., Kao, H. Y., and Siliciano, P. G. (1995). The amino-terminal domain of yeast U1-70K is necessary and sufficient for function. Mol Cell Biol 15, 6341-50.
Hirano, K., Erdodi, F., Patton, J. G., and Hartshorne, D. J. (1996). Interaction of protein phosphatase type 1 with a splicing factor. FEBS Lett 389, 191-4.
Hodgkin, J., Papp, A., Pulak, R., Ambros, V., and Anderson, P. (1989). A new kind of informational suppression in the nematode Caenorhabditis elegans. Genetics 123, 301-13.
Hong, W., Bennett, M., Xiao, Y., Feld Kramer, R., Wang, C., and Reed, R. (1997). Association of U2 snRNP with the spliceosomal complex E. Nucleic Acids Res 25, 354-61.
Imai, H., Chan, E. K., Kiyosawa, K., Fu, X. D., and Tan, E. M. (1993). Novel nuclear autoantigen with splicing factor motifs identified with antibody from hepatocellular carcinoma. J Clin Invest 92, 2419-26.
Jacquier, A., Rodriguez, J. R., and Rosbash, M. (1985). A quantitative analysis of the effects of 5' junction and TACTAAC box mutants and mutant combinations on yeast mRNA splicing. Cell 43, 423-30.
Jamison, S. F., Pasman, Z., Wang, J., Will, C., Luhrmann, R., Manley, J. L., and Garcia-Blanco, M. A. (1995). U1 snRNP-ASF/SF2 interaction and 5' splice site recognition: characterization of required elements. Nucleic Acids Res 23, 3260-7.
Kanaar, R., Roche, S. E., Beall, E. L., Green, M. R., and Rio, D. C. (1993).
The conserved pre-mRNA splicing factor U2AF from Drosophila: requirement for viability. Science 262, 569-73.
Kandels-Lewis, S., and Séraphin, B. (1993). Involvement of U6 snRNA in 5' splice site selection. Science 262, 2035-9.
Kao, H. Y., and Siliciano, P. G. (1996). Identification of Prp40, a novel essential yeast splicing factor associated with the U1 small nuclear ribonucleoprotein particle. Mol Cell Biol 16, 960-7.
Kendrew, J. (1994). The Encyclopedia of Molecular Biology (Oxford: Blackwell Science Ltd.).
Kohtz, J. D., Jamison, S. F., Will, C. L., Zuo, P., Luhrmann, R., Garcia-Blanco, M. A., and Manley, J. L. (1994). Protein-protein interactions and 5'-splice-site recognition in mammalian mRNA precursors. Nature 368, 119-24.
Konarska, M. M., and Sharp, P. A. (1986). Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46, 845-855.
Krainer, A. R., Maniatis, T., Ruskin, B., and Green, M. R. (1984). Normal and mutant human beta-globin pre-mRNAs are faithfully and efficiently spliced in vitro. Cell 36, 993-1005.
Kramer, A., and Utans, U. (1991). Three protein factors (SF1, SF3 and U2AF) function in pre-splicing complex formation in addition to snRNPs. EMBO J 10, 1503-9.
Lamond, A. I., Konarska, M. M., Grabowski, P. J., and Sharp, P. A. (1988). Spliceosome assembly involves binding and release of U4 small nuclear ribonucleoprotein. Proc Natl Acad Sci U S A 85, 411-415.
Lavigueur, A., LaBranche, H., Kornblihtt, A. R., and Chabot, B. (1993). A splicing enhancer in the human fibronectin alternate ED1 exon interacts with SR proteins and stimulates U2 snRNP binding. Genes Dev 7, 2405-2417.
Lerner, M. R., Boyle, J. A., Mount, S. M., Wolin, S. L., and Steitz, J. A. (1980). Are snRNPs involved in splicing? Nature 283, 220-224.
Lesser, C. F., and Guthrie, C. (1993). Mutations in U6 snRNA that alter splice site specificity: implications for the active site. Science 262, 1982-8.
Liao, X. C., Colot, H. V., Wang, Y., and Rosbash, M.
(1992). Requirements for U2 snRNP addition to yeast pre-mRNA. Nucleic Acids Res 20, 4237-4245.
Lockhart, S. R., and Rymond, B. C. (1994). Commitment of yeast pre-mRNA to the splicing pathway requires the novel U1 snRNP polypeptide, Prp39p. Mol Cell Biol, in press.
Lu, J., and Hall, K. B. (1995). An RBD that does not bind RNA: NMR secondary structure determination and biochemical properties of the C-terminal RNA binding domain from the human U1A protein. J Mol Biol 247, 739-52.
Lutz-Freyermuth, C., Query, C. C., and Keene, J. D. (1990). Quantitative determination that one of two potential RNA-binding domains of the A protein component of the U1 small nuclear ribonucleoprotein complex binds with high affinity to stem-loop II of U1 RNA. Proc Natl Acad Sci U S A 87, 6393-6397.
Lynch, K. W., and Maniatis, T. (1996). Assembly of specific SR protein complexes on distinct regulatory elements of the Drosophila doublesex splicing enhancer. Genes Dev 10, 2089-2101.
MacMillan, A. M., McCaw, P. S., Crispino, J. D., and Sharp, P. A. (1997). SC35-mediated reconstitution of splicing in U2AF-depleted nuclear extract. Proc Natl Acad Sci U S A 94, 133-6.
Madhani, H. D., and Guthrie, C. (1994). Dynamic RNA-RNA interactions in the spliceosome. Annu Rev Genet 28, 1-26.
Mancebo, R., Lo, P. C. H., and Mount, S. M. (1990). Structure and expression of the Drosophila melanogaster gene for the U1 small nuclear ribonucleoprotein particle 70K protein. Mol Cell Biol 10, 2492-2502.
Maniatis, T., and Reed, R. (1987). The role of small nuclear ribonucleoprotein particles in pre-mRNA splicing. Nature 325, 673-8.
Manley, J. L., and Tacke, R. (1996). SR proteins and splicing control. Genes Dev 10, 1569-79.
Mermoud, J. E., Cohen, P. T., and Lamond, A. I. (1994). Regulation of mammalian spliceosome assembly by a protein phosphorylation mechanism. EMBO J 13, 5679-88.
Michaud, S., and Reed, R. (1991). An ATP-independent complex commits pre-mRNA to the mammalian spliceosome assembly pathway.
Genes Dev 5, 2534-2546.
Michaud, S., and Reed, R. (1993). A functional association between the 5' and 3' splice site is established in the earliest prespliceosome complex (E) in mammals. Genes Dev 7, 1008-20.
Min, J., Shukla, H., Kozono, H., Bronson, S. K., Weissman, S. M., and Chaplin, D. D. (1995). A novel Creb family gene telomeric of HLA-DRA in the HLA complex. Genomics 30, 149-56.
Moore, M. J., Query, C. C., and Sharp, P. A. (1993). Splicing of precursors to mRNA by the spliceosome. In The RNA World, R. Gesteland and J. Atkins, eds. (New York: Cold Spring Harbor Laboratory Press), pp. 303-357.
Mount, S. M. (1982). A catalogue of splice junction sequences. Nucleic Acids Res 10, 459-72.
Mount, S. M., Burks, C., Hertz, G., Stormo, G. D., White, O., and Fields, C. (1992). Splicing signals in Drosophila: intron size, information content, and consensus sequences. Nucleic Acids Res 20, 4255-4262.
Mount, S. M., Pettersson, I., Hinterberger, M., Karmas, A., and Steitz, J. A. (1983). The U1 small nuclear RNA-protein complex selectively binds a 5' splice site in vitro. Cell 33, 509-18.
Nagai, K. (1996). RNA-protein complexes. Curr Opin Struct Biol 6, 53-61.
Newman, A. J. (1997). The role of U5 snRNP in pre-mRNA splicing. EMBO J 16, 5797-800.
O'Keefe, R. T., Norman, C., and Newman, A. J. (1996). The invariant U5 snRNA loop 1 sequence is dispensable for the first catalytic step of pre-mRNA splicing in yeast. Cell 86, 679-89.
Padgett, R. A., Konarska, M. M., Aebi, M., Hornig, H., Weissmann, C., and Sharp, P. A. (1985). Nonconsensus branch-site sequences in the in vitro splicing of transcripts of mutant rabbit β-globin genes. Proc Natl Acad Sci U S A 82, 8349-8353.
Padgett, R. A., Konarska, M. M., Grabowski, P. J., Hardy, S. F., and Sharp, P. A. (1984). Lariat RNA's as intermediates and products in the splicing of messenger RNA precursors. Science 225, 898-903.
Parker, R., and Guthrie, C. (1985).
A point mutation in the conserved hexanucleotide at a yeast 5' splice junction uncouples recognition, cleavage, and ligation. Cell 41, 107-18.
Parker, R., Siliciano, P. G., and Guthrie, C. (1987). Recognition of the TACTAAC box during mRNA splicing in yeast involves base pairing to the U2-like snRNA. Cell 49, 229-39.
Patton, J. G., Porro, E. B., Galceran, J., Tempst, P., and Nadal-Ginard, B. (1993). Cloning and characterization of PSF, a novel pre-mRNA splicing factor. Genes Dev 7, 393-406.
Pikielny, C. W., Rymond, B. C., and Rosbash, M. (1986). Electrophoresis of ribonucleoproteins reveals an ordered assembly pathway of yeast splicing complexes. Nature 324, 341-345.
Potashkin, J., Naik, K., and Wentz, H. K. (1993). U2AF homolog required for splicing in vivo. Science 262, 573-5.
Pulak, R., and Anderson, P. (1993). mRNA surveillance by the Caenorhabditis elegans smg genes. Genes Dev 7, 1885-97.
Query, C. C., Bentley, R. C., and Keene, J. D. (1989a). A common RNA recognition motif identified within a defined U1 RNA binding domain of the 70K U1 snRNP protein. Cell 57, 89-101.
Query, C. C., Bentley, R. C., and Keene, J. D. (1989b). A specific 31 nucleotide domain of U1 RNA directly interacts with the 70K U1 snRNP protein. Mol Cell Biol 9, 4872-4881.
Query, C. C., McCaw, P. S., and Sharp, P. A. (1997). A minimal spliceosomal complex A recognizes the branch site and polypyrimidine tract. Mol Cell Biol 17, 2944-53.
Query, C. C., Moore, M. J., and Sharp, P. A. (1994). Branch nucleophile selection in pre-mRNA splicing: evidence for the bulged duplex model. Genes Dev 8, 587-597.
Reddy, R., and Busch, H. (1988). Small Nuclear RNAs: RNA Sequences, Structure and Modifications. In Small Ribonucleoprotein Particles, M. L. Birnstiel, ed. (Berlin: Springer-Verlag), pp. 1-37.
Reed, R. (1996). Initial splice-site recognition and pairing during pre-mRNA splicing. Curr Opin Genet Dev 6, 215-20.
Reed, R., and Maniatis, T. (1985).
Intron sequences involved in lariat formation during pre-mRNA splicing. Cell 41, 95-105.
Robberson, B. L., Cote, G. J., and Berget, S. M. (1990). Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol 10, 84-94.
Rogers, J., and Wall, R. (1980). A mechanism for RNA splicing. Proc Natl Acad Sci U S A 77, 1877-1879.
Roscigno, R. F., Weiner, M., and Garcia-Blanco, M. A. (1993). A mutational analysis of the polypyrimidine tract of introns. Effects of sequence differences in pyrimidine tracts on splicing. J Biol Chem 268, 11222-9.
Rossi, F., Forne, T., Antoine, E., Tazi, J., Brunel, C., and Cathala, G. (1996). Involvement of U1 small nuclear ribonucleoproteins (snRNP) in 5' splice site-U1 snRNP interaction. J Biol Chem 271, 23985-91.
Rossi, F., Labourier, E., Forne, T., Divita, G., Derancourt, J., Riou, J. F., Antoine, E., Cathala, G., Brunel, C., and Tazi, J. (1996). Specific phosphorylation of SR proteins by mammalian DNA topoisomerase I. Nature 381, 80-2.
Ruby, S. W., Chang, T. H., and Abelson, J. (1993). Four yeast spliceosomal proteins (PRP5, PRP9, PRP11, and PRP21) interact to promote U2 snRNP binding to pre-mRNA. Genes Dev 7, 1909-25.
Rudner, D. Z., Kanaar, R., Breger, K. S., and Rio, D. C. (1996). Mutations in the small subunit of the Drosophila U2AF splicing factor cause lethality and developmental defects. Proc Natl Acad Sci U S A 93, 10333-7.
Ruskin, B., and Green, M. R. (1985a). Role of the 3' splice site consensus sequence in mammalian pre-mRNA splicing. Nature 317, 732-4.
Ruskin, B., and Green, M. R. (1985b). Specific and stable intron-factor interactions are established early during in vitro pre-mRNA splicing. Cell 43, 131-42.
Ruskin, B., Krainer, A. R., Maniatis, T., and Green, M. R. (1984). Excision of an intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell 38, 317-31.
Ruskin, B., Zamore, P. D., and Green, M. R. (1988).
A factor, U2AF, is required for U2 snRNP binding and splicing complex assembly. Cell 52, 207-19.
Rymond, B. C., and Rosbash, M. (1992). Yeast pre-mRNA splicing. In The Molecular and Cellular Biology of the Yeast Saccharomyces: Gene Expression, E. W. Jones, J. R. Pringle and J. R. Broach, eds. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press), pp. 143-192.
Sawa, H., and Abelson, J. (1992). Evidence for a base-pairing interaction between U6 snRNA and the 5' splice site during the splicing reaction in yeast. Proc Natl Acad Sci U S A 89, 11269-11273.
Sawa, H., and Shimura, Y. (1992). Association of U6 snRNA with the 5'-splice site region of pre-mRNA in the spliceosome. Genes Dev 6, 244-254.
Senapathy, P., Shapiro, M. B., and Harris, N. L. (1990). Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol 183, 252-278.
Séraphin, B., and Rosbash, M. (1989). Identification of functional U1 snRNP-pre-mRNA complexes committed to spliceosome assembly and splicing. Cell 59, 349-358.
Séraphin, B., and Rosbash, M. (1991). The yeast branchpoint sequence is not required for the formation of a stable U1 snRNP-pre-mRNA complex and is recognized in the absence of U2 snRNA. EMBO J 10, 1209-1216.
Sherr, C. J. (1996). Cancer cell cycles. Science 274, 1672-7.
Siebel, C. W., Fresco, L. D., and Rio, D. C. (1992). The mechanism of somatic inhibition of Drosophila P-element pre-mRNA splicing: multiprotein complexes at an exon pseudo-5' splice site control U1 snRNP binding. Genes Dev 6, 1386-1401.
Smith, C. W. J., Porro, E. B., Patton, J. G., and Nadal-Ginard, B. (1989). Scanning from an independently specified branch point defines the 3' splice site of mammalian introns. Nature 342, 243-247.
Smith, V., and Barrell, B. G. (1991). Cloning of a yeast U1 snRNP 70K protein homologue: functional conservation of an RNA-binding domain between humans and yeast. EMBO J 10, 2627-34.
Sontheimer, E.
J., and Steitz, J. A. (1993). The U5 and U6 small nuclear RNAs as active site components of the spliceosome. Science 262, 1989-96.
Staknis, D., and Reed, R. (1994a). Direct interactions between pre-mRNA and six U2 small nuclear ribonucleoproteins during spliceosome assembly. Mol Cell Biol 14, 2994-3005.
Staknis, D., and Reed, R. (1994b). SR proteins promote the first specific recognition of pre-mRNA and are present together with the U1 small nuclear ribonucleoprotein particle in a general splicing enhancer complex. Mol Cell Biol 14, 7670-82.
Stephens, R. M., and Schneider, T. D. (1992). Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites. J Mol Biol 228, 1124-36.
Sun, Q., Hampson, R. K., Krainer, A. T., and Rottman, F. M. (1993). General splicing factor SF2/ASF promotes alternative splicing by binding to an exonic splicing enhancer. Genes & Dev. 7, 2598-2608.
Surowy, C. S., van Santen, V. L., Scheib-Wixted, S. M., and Spritz, R. A. (1989). Direct, sequence-specific binding of the human U1-70K ribonucleoprotein antigen protein to loop I of U1 small nuclear RNA. Mol Cell Biol 9, 4179-86.
Tacke, R., Chen, Y., and Manley, J. L. (1997). Sequence-specific RNA binding by an SR protein requires RS domain phosphorylation: creation of an SRp40-specific splicing enhancer. Proc Natl Acad Sci U S A 94, 1148-53.
Tacke, R., and Manley, J. L. (1995). The human splicing factors ASF/SF2 and SC35 possess distinct, functionally significant RNA binding specificities. EMBO J 14, 3540-51.
Talerico, M., and Berget, S. M. (1994). Intron definition in splicing of small Drosophila introns. Mol Cell Biol 14, 3434-45.
Tang, J., Abovich, N., Fleming, M. L., Seraphin, B., and Rosbash, M. (1997). Identification and characterization of a yeast homolog of U1 snRNP-specific protein C. EMBO J 16, 4082-91.
Tang, J., Abovich, N., and Rosbash, M. (1996).
Identification and characterization of a yeast gene encoding the U2 small nuclear ribonucleoprotein particle B" protein. Mol Cell Biol 16, 2787-95.
Tang, J., and Rosbash, M. (1996). Characterization of yeast U1 snRNP A protein: identification of the N-terminal RNA binding domain (RBD) binding site and evidence that the C-terminal RBD functions in splicing. RNA 2, 1058-70.
Tarn, W. Y., and Steitz, J. A. (1994). SR proteins can compensate for the loss of U1 snRNP functions in vitro. Genes Dev 8, 2704-17.
Tarn, W. Y., Yario, T. A., and Steitz, J. A. (1995). U12 snRNA in vertebrates: evolutionary conservation of 5' sequences implicated in splicing of pre-mRNAs containing a minor class of introns. RNA 1, 644-56.
Tazi, J., Kornstadt, U., Rossi, F., Jeanteur, P., Cathala, G., Brunel, C., and Luhrmann, R. (1993). Thiophosphorylation of U1-70K protein inhibits pre-mRNA splicing. Nature 363, 283-6.
Treisman, R., Orkin, S. H., and Maniatis, T. (1983). Specific transcription and RNA splicing defects in five cloned beta-thalassaemia genes. Nature 302, 591-6.
Tronchere, H., Wang, J., and Fu, X. D. (1997). A protein related to splicing factor U2AF35 that interacts with U2AF65 and SR proteins in splicing of pre-mRNA. Nature 388, 397-400.
Umen, J. G., and Guthrie, C. (1995). The second catalytic step of pre-mRNA splicing. RNA 1, 869-885.
Utans, U., and Kramer, A. (1990). Splicing factor SF4 is dispensable for the assembly of a functional splicing complex and participates in the subsequent steps of the splicing reaction. EMBO J 9, 4119-26.
Valcarcel, J., Gaur, R. K., Singh, R., and Green, M. R. (1996). Interaction of U2AF65 RS region with pre-mRNA branch point and promotion of base pairing with U2 snRNA. Science 273, 1706-9.
Valcarcel, J., Singh, R., Zamore, P. D., and Green, M. R. (1993). The protein Sex-lethal antagonizes the splicing factor U2AF to regulate alternative splicing of transformer pre-mRNA. Nature 362, 171-5.
Wallace, J. C., and Edmonds, M. (1983).
Polyadenylated nuclear RNA contains branches. Proc. Natl. Acad. Sci. USA 80, 950-954.
Wang, J., and Manley, J. L. (1997). Regulation of pre-mRNA splicing in metazoa. Curr Opin Genet Dev 7, 205-11.
Wentz, H. K., and Potashkin, J. (1996). The small subunit of the splicing factor U2AF is conserved in fission yeast. Nucleic Acids Res 24, 1849-54.
Wieringa, B., Meyer, F., Reiser, J., and Weissmann, C. (1983). Unusual splice sites revealed by mutagenic inactivation of an authentic splice site of the rabbit beta-globin gene. Nature 301, 38-43.
Wiest, D. K., O'Day, C. L., and Abelson, J. (1996). In vitro studies of the Prp9.Prp11.Prp21 complex indicate a pathway for U2 small nuclear ribonucleoprotein activation. J Biol Chem 271, 33268-76.
Will, C. L., and Luhrmann, R. (1997). Protein functions in pre-mRNA splicing. Curr Opin Cell Biol 9, 320-8.
Will, C. L., Rumpler, S., Klein, G. J., van, V. W., and Luhrmann, R. (1996). In vitro reconstitution of mammalian U1 snRNPs active in splicing: the U1-C protein enhances the formation of early (E) spliceosomal complexes. Nucleic Acids Res 24, 4614-23.
Wu, J., and Manley, J. (1989). Mammalian pre-mRNA branch site selection by U2 snRNP involves base pairing. Genes & Dev. 3, 1553-1561.
Wu, J. Y., and Maniatis, T. (1993). Specific interactions between proteins implicated in splice site selection and regulated alternative splicing. Cell 75, 1061-70.
Wyatt, J. R., Sontheimer, E. J., and Steitz, J. A. (1992). Site-specific cross-linking of mammalian U5 snRNP to the 5' splice site before the first step of pre-mRNA splicing. Genes Dev 6, 2542-53.
Xiao, S. H., and Manley, J. L. (1997). Phosphorylation of the ASF/SF2 RS domain affects both protein-protein and protein-RNA interactions and is necessary for splicing. Genes Dev 11, 334-44.
Yean, S.-L., and Lin, R.-J. (1991). U4 small nuclear RNA dissociates from a yeast spliceosome and does not participate in the subsequent splicing reaction. Mol. Cell. Biol. 11, 5571-5577.
Zahler, A.
M., and Roth, M. B. (1995). Distinct functions of SR proteins in recruitment of U1 small nuclear ribonucleoprotein to alternative 5' splice sites. Proc Natl Acad Sci U S A 92, 2642-6.
Zamore, P. D., and Green, M. R. (1991). Biochemical characterization of U2 snRNP auxiliary factor: an essential pre-mRNA splicing factor with a novel intranuclear distribution. EMBO J 10, 207-14.
Zamore, P. D., and Green, M. R. (1989). Identification, purification, and biochemical characterization of U2 small nuclear ribonucleoprotein auxiliary factor. Proc Natl Acad Sci U S A 86, 9243-7.
Zhang, H., and Blumenthal, T. (1996). Functional analysis of an intron 3' splice site in Caenorhabditis elegans. RNA 2, 380-8.
Zhang, M., Zamore, P. D., Carmo, F. M., Lamond, A. I., and Green, M. R. (1992). Cloning and intracellular localization of the U2 small nuclear ribonucleoprotein auxiliary factor small subunit. Proc Natl Acad Sci U S A 89, 8769-73.
Zhuang, Y., and Weiner, A. M. (1989). A compensatory base change in human U2 snRNA can suppress a branch site mutation. Genes Dev. 3, 1545-1552.
Zhuang, Y., and Weiner, A. M. (1986). A compensatory base change in U1 snRNA suppresses a 5' splice site mutation. Cell 46, 827-35.
Zorio, D. A., Lea, K., and Blumenthal, T. (1997). Cloning of Caenorhabditis U2AF65: an alternatively spliced RNA containing a novel exon. Mol Cell Biol 17, 946-53.
Zuo, P., and Maniatis, T. (1996). The splicing factor U2AF35 mediates critical protein-protein interactions in constitutive and enhancer-dependent splicing. Genes Dev 10, 1356-68.

FIGURES

Figure 1. The splicing reaction is a two-step transesterification reaction. The exons are indicated by boxes and the introns by lines. The 2' hydroxyl that is the first step substrate is indicated by the :OH. The phosphodiester bonds that are the splice site substrates of the first and second steps are at the junction between exon and intron. The first and second steps are indicated.
[Figure 1 image. Labels: 5', 3', :OH, intron, exon 1, exon 2.]

Figure 2. Splice site sequences do not alone determine splice sites. The first 7210 nucleotides of the CREB-RS gene pre-mRNA are shown. The sequences of the terminal dinucleotides of each intron are indicated in bold. The 5' splice site sequence of the fourth intron is underlined. The identical sequence is found in the middle of the fourth intron; however, this sequence is not known to be used as a 5' splice site.

[Figure 2 image: nucleotide sequence of the CREB-RS pre-mRNA.]
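The point of Figure 2, that a sequence identical to a used 5' splice site can occur elsewhere in an intron without being used, can be illustrated with a simple motif scan. The following is a minimal sketch in Python. The toy transcript, the motif GUAAGU, and the names `find_motif`, `PRE_MRNA` and `ANNOTATED_SITE` are assumptions made for illustration and are not taken from the CREB-RS sequence or from this thesis.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 0-based start of every occurrence of motif, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical toy transcript, NOT the CREB-RS sequence.
EXON1 = "AUGGCCAAG"
INTRON = "GUAAGU" + "CUUACU" + "GUAAGU" + "UUCUAACUUUUUCCUUUUCAG"
EXON2 = "GCUGAA"
PRE_MRNA = EXON1 + INTRON + EXON2

MOTIF = "GUAAGU"             # illustrative 5' splice site sequence (assumed)
ANNOTATED_SITE = len(EXON1)  # index of the first intron nucleotide

# Every place the motif occurs, and the subset that is not the annotated junction.
hits = find_motif(PRE_MRNA, MOTIF)
unused = [p for p in hits if p != ANNOTATED_SITE]

if __name__ == "__main__":
    print("all matches:", hits)
    print("matches away from the annotated junction:", unused)
```

In this toy case the scan reports the motif both at the exon 1/intron junction and at an internal intron position, so a sequence match alone cannot distinguish the used site from the unused one.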
Figure 3. Cartoon of the spliceosome assembly pathway. Assembly of the spliceosome on the pre-mRNA is shown. The exons are shown as boxes, the intron as a line. The 5' GU dinucleotide; the 3' AG dinucleotide; the branch adenosine, A; and the pyrimidine tract, PYR, are indicated on the intron. E complex contains U1 snRNP, U2AF and the SR proteins and is the first spliceosomal complex to form on the pre-mRNA. U1 snRNP is shown base-paired to the 5' splice site sequence. The SR proteins are shown interacting via RS-domain interactions with U1 snRNP and U2AF. The SR proteins are not shown bound to the pre-mRNA, as binding can occur to either the intron or exon sequences. Following E complex is A complex, the first ATP-dependent complex that forms on the pre-mRNA. U1 snRNP and U2AF are believed to be dissociated or destabilized from the pre-mRNA at this step and are shown in gray. The tri-snRNP is shown bound to the assembling spliceosome and then base-pairing with the 5' splice site sequence in complexes B and C.

[Figure 3 image: complexes labeled E, A, and B,C.]

CHAPTER 2: IDENTIFICATION, PURIFICATION AND CHARACTERIZATION OF A NEW PYRIMIDINE-TRACT BINDING SPLICING ACTIVITY.

Patrick Schonleber McCaw, Andrew MacMillan, Charles Query, Barbara Panning, and Phillip A. Sharp

ABSTRACT

We have identified a new pyrimidine-tract binding splicing factor (PUF) that is required, together with U2AF, for the efficient reconstitution of in vitro splicing activity to a poly[U]-depleted nuclear extract. The activity has been purified to near homogeneity and found to consist of at least two proteins: PUF60 and the previously described splicing factor p54 (Zhang and Wu, 1996).
p54 and PUF60 form a complex in vitro and PUF60 interacts weakly, but detectably, with the branch sequence binding protein SF1/BBP. While PUF is required for the formation of the U2 snRNP-branch sequence complex, Amin (Query et al., 1997), it is not absolutely required for the formation of the U2 snRNP complex formed on a 3' half substrate, A3'. This A3' complex forms with diminished efficiency in the absence of the PUF activity. PUF binds to pyrimidine-tract RNAs, but does not appreciably interact with either the branch sequence or the AG dinucleotide of the 3' splice site. PUF60, unlike p54, appears to localize to speckle-like structures in the nucleus that are distinct from the nuclear speckle domain to which many splicing factors localize.

INTRODUCTION

Four sequence elements found in the mammalian intron are known to be essential for recognition of the intron. These are the 5' splice site sequence, the branch sequence, the pyrimidine tract and the 3' splice site AG dinucleotide. The 5' splice site sequence is bound by U1 snRNP early in spliceosome assembly, and ATP is not required for this interaction (Mount et al., 1983; Siliciano and Guthrie, 1988; Zhuang and Weiner, 1986). The branch sequence is recognized at least twice during spliceosome assembly, first by the branch sequence binding protein SF1/BBP and, subsequently, by U2 snRNP (Berglund et al., 1997). The stable binding of U2 snRNP to the branch sequence requires recognition of the pyrimidine tract by the splicing factor U2 snRNP Auxiliary Factor, U2AF, and ATP (Roscigno et al., 1993; Ruskin et al., 1988; Zamore and Green, 1989). U1 snRNP and U2AF appear to be the primary determinants of splice site sequence binding early in spliceosome assembly (Bennett et al., 1992; Reed, 1996).
The factors required for recognition of the 3' splice site AG dinucleotide are not known; however, spliceosome assembly and the first step of splicing can occur in the absence of an AG dinucleotide (Anderson and Moore, 1997; Frendeway and Keller, 1985; Ruskin and Green, 1985), indicating that recognition of the AG dinucleotide is not required for spliceosome assembly.

The pyrimidine-tract binding splicing factor U2AF was identified as an activity required for the association of U2 snRNP with the pre-mRNA. U2AF was purified and found to consist of two proteins of 65 and 35 kD (Zamore and Green, 1989). U2AF can be depleted from extracts by taking advantage of the very stable interaction that U2AF has with poly[U] RNA. U2AF binds poly[U] RNA in the presence of 1 M KCl and requires either 2 M guanidine HCl (Zamore and Green, 1989) or 3 M KCl (this work) to be dissociated from a poly[U] Sepharose column. The large subunit of U2AF, U2AF65, has been shown to be essential for the splicing of pre-mRNA in nuclear extract depleted by this method (Zamore and Green, 1991), while the small subunit appears to act as an enhancer of this activity (Zuo and Maniatis, 1996). U2AF65 is the pyrimidine-tract binding component of U2AF. U2AF is highly conserved in evolution: the invertebrates D. melanogaster (Kanaar et al., 1993) and C. elegans (Zorio et al., 1997) and the fission yeast S. pombe (Potashkin et al., 1993) have clear orthologs of the large subunit of U2AF, while the yeast S. cerevisiae gene MUD2 is a probable ortholog of U2AF65. The sequence, but not the function, of MUD2 has diverged sufficiently that this cannot be determined with certainty (Abovich et al., 1994). The plant A. thaliana has a similar protein identified in the sequence project (Rounsley et al., 1997). While the small subunit of U2AF may not be conserved in C. elegans, orthologs of this protein can be found in D. melanogaster (Rudner et al., 1996) and S.
pombe (Wentz and Potashkin, 1996) as well as vertebrates.

U2AF, bound to the pyrimidine tract, and U1 snRNP, bound to the 5' splice site, together with several other proteins, including SR proteins bound to exon sequences, form the E complex prior to the ATP-dependent spliceosome assembly step (Michaud and Reed, 1991; Staknis and Reed, 1994). Following the formation of E complex, U2 snRNP assembles on the branch sequence in an ATP-dependent step; the resulting complex is known as spliceosomal complex A. An analogue of A complex, known as A3', can form on a 3' half substrate pre-mRNA consisting of the branch sequence, pyrimidine tract, 3' splice site and 3' exon (Barabino et al., 1990; Zamore and Green, 1991). U1 snRNP and U2AF are dissociated or destabilized from the pre-mRNA upon formation of A complex. The pyrimidine tract may remain associated with other pyrimidine-tract splicing factors such as PTB and PSF. PSF is known to be required for the second chemical step of splicing (Patton et al., 1993). Whether the pyrimidine tract is recognized between these two steps is not known. Following U2 snRNP association with the pre-mRNA, the tri-snRNP, U4/5/6 snRNP, binds the pre-mRNA, the complete spliceosome is formed, and the chemical steps of splicing take place.

In order to better understand the substrate binding requirements for U2 snRNP binding to the pre-mRNA, a complex on a minimal U2 snRNP binding RNA has been described (Query et al., 1997). This U2 snRNP complex, Amin, shows additional sequence specificity requirements and is, therefore, an important tool in understanding the requirements for stable U2 snRNP binding. Amin forms many of the same protein-RNA contacts that complex A forms, as judged by photochemical crosslinking.

Splicing factors have been shown to localize to discrete structures of the nucleus known as nuclear speckles. Many splicing factors are known to localize to these domains, including the SR protein p54 (Chaudhary et al., 1991).
p54 is conserved between vertebrates and the invertebrates D. melanogaster (Kennedy and Berget, 1998) and C. elegans (McCombie et al., 1993). The function of p54 is not known, although like the general class of SR proteins it activates the cytoplasmic extract S100 for in vitro splicing (Zhang and Wu, 1996). Unlike the general class of SR proteins, p54 has an SR domain with sequence characteristics reminiscent of those of U1 70 kD and U2AF.

We have identified a new pyrimidine-tract binding splicing factor (PUF) that is required, together with U2AF, for the efficient reconstitution of in vitro splicing to a poly[U]-depleted nuclear extract. The activity has been purified to near homogeneity and found to consist of at least two proteins: PUF60 and the previously described splicing factor p54. This chapter compiles the current state of our understanding of the PUF splicing activity and the PUF60 protein. Several lines of evidence suggest that the PUF activity is required after the U2AF-dependent step in spliceosome assembly. However, we have been unable to conclusively demonstrate that p54, PUF60 or both proteins are responsible for the PUF splicing activity.

RESULTS

Depletion of NE for pyrimidine tract splicing

Splicing extracts can be depleted of the pyrimidine-tract binding factor, U2 snRNP Auxiliary Factor (U2AF), by passing the extract over a poly[U] Sepharose or oligo dT column at high salt concentrations. Extracts depleted in this way (NEΔU2AF) are unable to catalyze the splicing reaction and are blocked at an early step in spliceosome assembly. Spliceosome assembly and in vitro splicing activity can be restored to these extracts through the addition of purified U2AF or recombinant U2AF65 (Zamore and Green, 1991). Small modifications of this depletion protocol (discussed in the Methods section) lead to the co-depletion of a second pyrimidine-tract associated splicing activity.
Nuclear extract depleted using this method (NEΔU) is inactive for splicing in vitro (figure 1A, lane 2). Addition of either recombinant U2AF65 or the 2.0 M KCl eluate of the poly[U] column leads to a partial restoration of splicing activity (figure 1A, lanes 3 and 4). Efficient restoration of in vitro splicing activity requires the addition of both fractions (figure 1A, lane 5). The activity present in the 2.0 M KCl eluate is referred to as PUF (Poly[U] Factor).

To determine if the PUF activity was required for splicing of multiple introns or was specific to the PIP85a intron, two other pre-mRNAs were tested for their requirements for the PUF activity. We tested the in vitro splicing activity in the reconstituted system of the AD10 pre-mRNA and a PIPβG chimeric intron. Both introns required both PUF activity and U2AF65 for efficient reconstitution of in vitro splicing activity (figure 1B, compare lanes 4 with 5, and 18 with 19).

Role for U2AF35

Recombinant U2AF65 is sufficient to reconstitute splicing activity to extracts depleted of both U2AF65 and U2AF35 (Zamore and Green, 1991). To determine if the PUF activity substituted for U2AF35 in NEΔU reconstituted splicing, U2AF purified from nuclear extract, containing both U2AF35 and U2AF65, was compared to recombinant U2AF65 in the reconstituted splicing reaction. For each pre-mRNA tested, PUF activity stimulated splicing when compared to purified U2AF or recombinant U2AF65. In the case of PIP85a, purified U2AF had additional stimulatory activity when compared to recombinant U2AF65, presumably due to the presence of U2AF35 in this fraction. For the PIPβG and the AD10 pre-mRNAs, purified U2AF and recombinant U2AF65 had comparable activities either in the presence of the PUF activity (figure 1B, compare lane 5 to 7 and 19 to 21) or in its absence (compare lane 4 with 6 and 18 to 20).
In contrast to the results obtained with PIPβG and AD10, purified U2AF had more activity than recombinant U2AF65 on the PIP85a substrate in both the presence and the absence of the PUF activity (compare lanes 12 to 14 and 11 to 13). This result argues that PUF does not functionally substitute for U2AF35 in these reactions.

Purification of the PUF activity

The PUF activity was purified to near homogeneity using the reconstituted in vitro splicing reaction as an assay. The activity was purified by the scheme shown in figure 2A. The final purification step on S Sepharose is shown in figure 2B. The top panel shows the silver-stained gel of eluted fractions, while the bottom panel shows the activity of each eluted fraction. Quantitation of the PUF activity recovered during fractionation has been difficult, and we believe that this was due to three factors. First, the in vitro reconstituted splicing reaction had a limited linear range. Second, the presence of multiple poly[U] binding activities that copurified with the activity may have interfered with the assay by competing for pyrimidine tract binding. Finally, the presence of contaminating poly[U] RNA from the poly[U] Sepharose affinity chromatography may have competed for PUF and U2AF activity. A Coomassie-stained gel showing fraction 8 of this purification is shown in figure 2C. Bands present in this fraction are of apparent molecular weight 130 kD, 80 kD, 60 kD and 48 kD. All four bands were excised from the gel and digested with trypsin, and the peptides were sequenced.

Identification of PUF60 and p54 as the predominant proteins in the active fraction

The four proteins identified in figure 2C were digested with trypsin and the resulting peptides were sequenced. Three peptides were sequenced from the 60 kD band and were identified as the previously described splicing factor p54 (Chaudhary et al., 1991; Zhang and Wu, 1996).
Fourteen peptides were sequenced from the 130 kD band and found to be encoded by a previously undescribed gene. No tryptic peptides were identified from the 80 kD or 48 kD bands. cDNAs that encoded all fourteen peptides found in the 130 kD protein were identified in the EST database, and the cloning and characterization of this factor (PUF60) and its domain structure will be the subject of the next chapter. PUF60 forms SDS-resistant dimers with an apparent molecular weight of 130 kD, explaining the aberrant mobility of this protein on SDS-polyacrylamide gels (discussed in chapter 3). PUF60 is a ubiquitously expressed and abundant mRNA of 2.0 kb (figure 3), as would be expected of a splicing factor. The differences in expression between tissues seen on this blot mirror those seen for the nuclear-matrix associated splicing factor SRm160 (B1C8; Blencowe et al., 1995).

Rabbit polyclonal antibodies raised to PUF60 demonstrate that NEΔU is depleted of PUF60, as shown in figure 4A. Equal volumes of different batches of NEΔU (lanes 2, 4, and 5) and control extract, NE (lanes 3 and 7), were compared by immunoblotting. NEΔU was estimated to be at least 90% depleted of PUF60 by this method. Antibodies raised against the splicing factor p54 demonstrate that p54 is also depleted from NEΔU to a similar extent (figure 4B, lanes 1 and 2). It should be noted that p54 has an aberrantly slow mobility in SDS-PAGE gels (figure 4B, compare lane 2 with lane 4). When depleted extracts boiled in SDS were added to the purified or partially purified PUF activity (also boiled in SDS sample buffer), p54 comigrated with p54 found endogenously in the extract (figure 4B, lanes 2, 3 and 4). As both the extract and the p54-containing sample had been boiled in SDS prior to mixing, we do not believe that the change in mobility represents a covalent modification of p54, but rather that it represents a gel artifact. (The band marked x is a cross-reacting band that is not detected with other p54 antisera.)
p54 translated in vitro, in the presence of the proteins found in reticulocyte lysate, runs as a discrete band of 65 kD (figure 5, lane 1). However, when immunoprecipitated from the translation reaction, p54 migrates aberrantly as a diffuse band of approximately 65 kD, as well as bands at the interface between the stacking and resolving gel and at the origin (figure 5, lane 5). Presumably, the high protein concentration found in nuclear extract and in reticulocyte lysate acts as a carrier, allowing resolution of the protein in SDS-polyacrylamide gels. Similar aberrantly slow migration of proteins is observed for the highly charged protein PACT and for the SR proteins. PACT is known to behave aberrantly on SDS-polyacrylamide gels without prior covalent modification of the lysine residues of this protein (Simons et al., 1997). The arginine-rich SR proteins also migrate aberrantly slowly on SDS-polyacrylamide gels (Zahler et al., 1993), suggesting that aberrantly slow migration in SDS-polyacrylamide gels may be a common property of proteins with arginine- and lysine-rich domains.

To determine if PUF60 and p54 can form a complex in vitro, co-translation of PUF60 and p54 was performed in vitro. Antibody to p54 (Chaudhary et al., 1991) immunoprecipitates p54 (figure 5, lane 6). Immunoprecipitation of the co-translation reaction of p54 and PUF60 led to the immunoprecipitation of PUF60, whereas control immunoprecipitations with no p54 present did not lead to the precipitation of PUF60. The co-immunoprecipitation of PUF60 with p54 strongly suggests that p54 and PUF60 form a complex in vitro.

The PUF activity purifies as a 400 kD complex

Application of the purified PUF fraction to a gel filtration column allowed further purification of the splicing activity and resolution of the size of the active PUF complex. The peak of PUF activity eluted from the column with an estimated size of 400 kD (figure 2D, bottom panel lanes 5 and 6).
The proteins present in the adjacent fractions included the 130 kD band and the 60 kD band (figure 2D, top panel lanes 6, 7 and 8). Presumably, these represent PUF60 monomer and dimer and p54. Notably, the 48 kD protein does not co-purify with the activity; the 48 kD protein was found in fractions 20-22 and presumably is a distinct protein complex from the PUF activity (top panel, lanes 5 and 6). Other proteins of about 125 kD and 48 kD eluted earlier and later than the PUF activity.

PUF is required for efficient A3' complex assembly

U2AF-depleted extract, NEΔU2AF, is blocked prior to the first step in splicing and does not form spliceosomal complexes (Zamore and Green, 1991). NEΔU was blocked prior to the first step of splicing, as was NEΔU reconstituted with U2AF alone (figures 1A and 1B), suggesting that the PUF activity, like U2AF, acts early in spliceosome complex assembly. To determine if this was the case, spliceosomal complex formation on 3' half pre-mRNAs consisting of a branch sequence, pyrimidine tract, 3' splice site and 3' exon was tested. A time course of A3' complex assembly is shown in figure 6. While PUF activity is not absolutely required for formation of A3' complex (figure 6, lanes 5 and 6 have about 30% of the A3' found in lanes 11 and 12), it is required for efficient formation of this complex. In the experiments shown in figure 6, the formation of A3' was particularly robust in the absence of PUF. More generally, in the absence of PUF the A3' complex formed at about 10% of the levels seen in the presence of both factors, rather than the 30% shown here (not shown). In contrast to these results, formation of the ATP-independent U2 snRNP complex, Amin, is more dependent on the presence of PUF activity (Query et al., 1997).

RNA crosslinking of the PUF fraction

To determine which proteins in the active fraction contact RNA, a crosslinking assay was performed.
Purified PUF fraction was incubated with full length PIP85a pre-mRNA under splicing conditions and subsequently irradiated with ultraviolet light to crosslink the proteins to the RNA. Crosslinked bands are apparent at 130 kD, 60 kD and 48 kD (figure 7, lane 1), indicating that PUF60 and the 48 kD band contact RNA. It is not clear from this experiment whether p54 also contacts the RNA, as p54 runs as a diffuse band of approximately 60 kD at the protein concentrations used here and so may not be resolvable under these conditions. The PUF60 protein should be monomeric at the concentrations used in this experiment and so should migrate primarily as a 60 kD band; the 60 kD crosslinked product is presumed to be PUF60. To determine the specificity of PUF protein binding, the binding reaction was carried out in the presence of RNA homo- and heteropolymers. As expected, poly[U] RNA efficiently competed for the RNA binding activity of the PUF fraction. More surprisingly, poly[C] did not compete, but poly[G] did compete at high concentrations. Because p48 does not co-purify with the splicing activity but does crosslink to RNA, we suggest that p48 fortuitously co-purified with the PUF activity through its ability to bind poly[U] RNA.

PUF does not detectably interact with the branch sequence or AG dinucleotide

Both the UV crosslinking assay and PUF's affinity for poly[U] Sepharose strongly suggested that PUF activity bound to the pyrimidine tract. To determine whether the predominant RNA binding activity in the purified extract interacted with other 3' splice site sequences, mobility shift assays were performed. Purified PUF fractions form a discrete complex on pyrimidine-tract-containing RNAs (figure 8). To determine if the PUF activity interacted with the AG dinucleotide found at the 3' splice site, the affinities of PUF for a pyrimidine-tract RNA and for a pyrimidine-tract-AG dinucleotide RNA were compared (figure 8, compare lanes 9 to 14 with 15 to 20).
No difference in affinity was detected. Similarly, to determine if the presence of a branch sequence 5' to the pyrimidine tract increases the RNA-binding affinity of the purified PUF activity, an RNA with the branch sequence 5' to the pyrimidine tract was compared to a mutant RNA in which the branch sequence was placed 3' to the pyrimidine tract. No difference in RNA binding affinity was observed between these two RNAs (figure 8, compare lanes 1 to 4 with lanes 5 to 8). These results suggest that PUF interacts solely with the pyrimidine tract and not with other 3' splice site sequences.

PUF and U2AF do not bind to the same pyrimidine-tract RNA

To determine if the pyrimidine tract binding factors U2AF and PUF cooperate in binding the pyrimidine tract, binding reactions were performed in the presence of PUF and either U2AF65 or U2AF. Three identical binding titrations of PUF (from 2.15 nM to 1 µM) were compared: PUF alone (figure 9, lanes 12-20), PUF in the presence of U2AF65 (figure 9, lanes 2-10; U2AF65 alone is in lane 1) and PUF in the presence of U2AF (figure 9, lanes 22-30; U2AF alone is in lane 21). RNA alone is shown in lanes 11 and 31. Both the change in apparent binding affinity and the appearance of super-shifted bands were assessed. Neither a super-shifted band nor a change in apparent PUF affinity for the pyrimidine-tract RNA was observed, strongly suggesting that PUF and U2AF do not interact on pyrimidine-tract RNA incubated under splicing conditions. It was also noted that PUF bound to the RNA cooperatively, with a Hill coefficient of close to 2 (for a more complete description of the RNA binding activity of PUF60 see the next chapter), and that this RNA-binding activity was not affected by the presence of either U2AF65 or U2AF.

Interaction with the branch sequence binding protein SF1/BBP

To determine if PUF60 interacts with the branch sequence binding protein SF1/BBP, a GstSF1/BBP fusion protein was bound to glutathione-agarose beads and then incubated with nuclear extract.
The associated proteins were detected by immunoblotting. A distinct, but weak, PUF60 band was observed in the GstSF1/BBP pull-down from the nuclear extract (figure 10, top panel, lane 2). Although the signal observed with this pull-down was very faint when compared with the signal observed with a U2AF65 antibody on a duplicate immunoblot (figure 10, bottom panel, lane 2), we believe that the inefficient binding observed represents specific binding, as the result has been reproduced multiple times and as no PUF60 was observed in binding reactions with Gst (lane 8), GstPTB (lane 4) or GstU2AF65 (lane 6). Further, as a control for antibody specificity, each Gst fusion protein was incubated with NEΔU (odd numbered lanes); no PUF60 binding was observed in these lanes. The Gst fusion proteins (marked with arrowheads), present in vast excess in each of these binding reactions, cross-react with the PUF60 primary antibody. This weak, but specific, interaction between GstSF1/BBP and PUF60 is also observed with in vitro translated PUF60 (data not shown).

Subcellular localization of PUF60

To investigate the cellular localization of the PUF factors, immunofluorescence was performed both on a spontaneously transformed mouse cell line and on the human cell line 293 using PUF60 affinity-purified antiserum. p54 is known to co-localize with the splicing factors in nuclear speckle bodies (Chaudhary et al., 1991). Because p54 and PUF60 associate in vitro and co-purify, we expected that PUF60 would localize with p54 to nuclear speckles. To determine if PUF60 co-localized with these bodies, we co-stained with the nuclear-speckle marker antibody B1C8 (Blencowe et al., 1994; Wan et al., 1994). Typical results from such an experiment are shown in figure 11. Surprisingly, PUF60 does not co-localize with B1C8 in this assay (compare panels B and C, merged in panel E).
Instead, PUF60 localizes to a small number of discrete structures, only some of which overlap with or are adjacent to the speckle bodies. These PUF60-staining structures are substantially less abundant and of lower intensity than the nuclear speckles, and their identity is currently unknown. The antibodies used in these experiments are not known to recognize PUF60 in the native state (as determined by immunoprecipitation experiments, data not shown); it is possible, then, that the PUF60-staining structures represent denatured PUF60 and are not indicative of the subcellular localization of native PUF60 protein.

DISCUSSION

We have identified a new pyrimidine-tract binding splicing factor (PUF) that is required, together with U2AF, for the efficient reconstitution of in vitro splicing of multiple pre-mRNAs to a poly[U]-depleted nuclear extract. The activity has been purified to near homogeneity and is found to consist of at least two proteins, PUF60 and the previously described splicing factor p54 (Zhang and Wu, 1996).

Splicing in vitro requires two pyrimidine-tract RNA binding activities

PUF was purified to near homogeneity and found to consist of at least two proteins, the previously described splicing factor p54 (Chaudhary et al., 1991; Zhang and Wu, 1996) and a novel protein with striking similarity to U2AF65, PUF60 (see next chapter for a description of this protein). The PUF activity eluted from a gel filtration column with an apparent size equivalent to a 400 kD protein complex. The size of this complex suggests that the PUF60/p54 complex is a higher order oligomer in the purified fraction. It is also possible that this unexpectedly large complex is the result of oligomerization of the PUF activity through protein-poly[U] RNA interactions. RNA from the poly[U] Sepharose column co-elutes with the PUF activity. The gel filtration experiment suggests that PUF activity does not depend on the presence of the 48 kD protein found in the S Sepharose purified fraction.
However, as the 80 kD protein, also found in the S Sepharose purified fraction, was not detected in any of the gel filtration fractions, its importance to the activity could not be evaluated directly; it is unlikely to be an important component of the activity.

The PUF activity together with U2AF is required for the efficient splicing of several introns. For the PIPBG and AD10 pre-mRNAs the presence of U2AF35 was not required for maximal splicing activity. For the PIP85a substrate, maximal activity was obtained only in the presence of U2AF35. This is reminiscent of the U2AF35 splicing-enhancer activity detected in U2AF35-depleted nuclear extracts and is presumably due to interactions between U2AF35 and SR proteins bound to an enhancer element found in the pre-mRNA (Zuo and Maniatis, 1996).

PUF activity was required for the efficient assembly of A3' complex, the U2AF- and ATP-dependent U2 snRNP-containing complex that forms on branch-sequence, pyrimidine-tract, 3' exon RNAs. In contrast, PUF activity was more stringently required for the formation of the Amin complex (Query et al., 1997). This complex, a model U2 snRNP-branch sequence complex, is more sensitive to the sequence and functional groups of the branch sequence than A or A3', revealing specificities to the interaction of U2 snRNPs with the branch sequence that are masked when U2 snRNP binds to longer RNAs (Query et al., 1997). Factor requirements for U2 snRNP's interaction with the branch sequence that are not as apparent for the A or A3' complexes are also revealed in this complex: the PUF activity is more stringently required for Amin than for A3' or A complex formation. This may be due to the absence of binding sites for the SF3a factors on the pre-mRNA 5' to the branch sequence (Chiara et al., 1994; Gozani et al., 1996).

What role PUF plays in mediating spliceosome complex assembly is not known, but it is interesting to speculate that PUF may be required after the U2AF-dependent step.
Three arguments support this view. First, it is observed that U2AF or U2AF65 and PUF cannot bind to the same RNA molecules, and so it seems unlikely that they act at the same point in spliceosome assembly. Second, in the absence of PUF activity, A3' spliceosomal complexes can form, but they do not form in the absence of U2AF, suggesting that PUF may play a role in the transition from A or A3' to later spliceosomal complexes. Third, U2AF is not absolutely required for the splicing reaction, but PUF activity is required under these conditions (MacMillan et al., 1997). Furthermore, it is known that U2AF becomes destabilized from the pre-mRNA, possibly with U1 snRNP, at the transition between the commitment complex and formation of the spliceosome (Michaud and Reed, 1993). PUF could replace U2AF at the pyrimidine tract during this transition and might act as a U2AF-like factor for assembly of the tri-snRNP complex or other splicing factors on the pre-mRNA.

It also remains a possibility that PUF acts as an intron-specific splicing factor. At least for the limited number of pre-mRNAs tested, this does not appear to be the case.

The PUF proteins: p54 and PUF60

The purified PUF activity consists of at least four polypeptides, two of which, PUF60 and p54, have been identified in this work. We do not believe that the other two polypeptides, the 48 kD and 80 kD bands, are likely to be important for the PUF activity, as they did not co-purify with PUF activity upon gel filtration chromatography. p54 has been described previously as a protein that co-localizes with splicing factors, has an SR domain that is similar in sequence to the U1 70k SR domain, and reconstitutes splicing activity to an S100 extract (Chaudhary et al., 1991; Zhang and Wu, 1996).

PUF60 interacts weakly with the splicing factor SF1/BBP, the branch sequence binding protein. This interaction is substantially weaker than the interaction between U2AF65 and SF1/BBP.
This may be indicative of functional differences between PUF60 and U2AF65, either in regulating splice site use or in mediating spliceosome complex assembly.

PUF60 has been identified in unpublished work in two other experiments. The first experiment to identify PUF60 was a yeast two-hybrid experiment; PUF60 was identified as a protein that interacts with the mouse homologue of the Drosophila protein seven-in-absentia (D. Bowtell, personal communication; PUF60 appears in the Genbank database as SIAH-BP1 based on this work). Seven-in-absentia, sina, is known to target the product of the tramtrack (ttk) gene for degradation. Ttk is a transcription factor that prevents cell-fate determination, and its degradation is essential for cell fate determination of the R7 cell (Dickson, 1998; Li et al., 1997). Strikingly, ttk protein exists in two alternatively spliced forms, at least one of which has been determined to be degraded through the sina pathway (ttk88B; Li et al., 1997). It is interesting to speculate that the D. melanogaster homologue of PUF60, DPUF68 (chapter 3), may also interact with sina protein and so may regulate the ttk alternative splicing pathway.

The second experiment to identify PUF60 was an immunoprecipitation experiment; PUF60 was identified as a protein that co-immunoprecipitated with the human auto-immune antigen Ro (Pascal Bouffard, personal communication). The function of the Ro auto-antigen is not known, but Ro is known to localize to both nuclear and cytoplasmic compartments (Peek et al., 1993). Nuclear Ro may be associated with the nuclear speckle domain (Wahren et al., 1996). The functional significance of the association between Ro and PUF60 remains to be determined.

Sub-cellular localization of PUF60

Antibodies raised against PUF60 have proven unable to recognize the protein in native form in immunoprecipitation experiments (data not shown).
Although this has greatly hindered the biochemical characterization of this protein, the antibodies do recognize faint speckle-like structures in the nucleus. These structures are reminiscent of the PML-staining bodies found in the nucleus (Lin et al., 1998; Sternsdorf et al., 1997). PML, a protein commonly found as a fusion protein with the retinoic acid receptor in promyelocytic leukemia, has been shown to be covalently modified by ubiquitin-like molecules (Kamitani et al., 1998; Muller et al., 1998). It will be of great interest to determine if the PUF60-staining structures are related to the PML structures. It is intriguing that two avenues of study, the localization to PML-like bodies and the association with the degradation-targeting protein sina, suggest that PUF60 may be targeted for degradation or modification by ubiquitin or ubiquitin-like molecules.

As PUF60 antibodies do not recognize the native protein in immunoprecipitation assays, it is possible that the majority of the PUF60 immunostaining observed represents denatured protein or peptide fragments of PUF60 and that the bulk of the PUF60 remains undetected by the methods used. We are currently pursuing methods to detect the localization of the native protein.

METHODS AND MATERIALS

Preparation of poly[U] depleted nuclear extract:

HeLa cell nuclear extract was prepared using standard protocols (Dignam et al., 1983). Briefly, 15-30 liters of HeLa cells grown in suspension culture were harvested by centrifugation, washed in ice cold PBS, resuspended in five packed cell volumes of ice cold Buffer A (10 mM Hepes [from 1.0 M stock pH 7.9, with KOH at room temperature], 1.5 mM MgCl2, 10 mM KCl, 5 mM dithiothreitol) and incubated for 10 minutes on ice. Proteinase inhibitors were included in all buffers and included 50 µg/ml of PMSF and Leupeptin, Pepstatin A and Aprotinin (Boehringer Mannheim, used at the concentration recommended by the manufacturer).
All subsequent steps were carried out on ice or at 4 °C. Swollen cells were centrifuged at 1200 g, resuspended to two swollen cell pellet volumes of Buffer A and dounced 10 strokes with a Hamilton B dounce to lyse the cells. Lysed cells were centrifuged for 10 minutes at 1200 g and the supernatant was decanted off the loosely packed nuclei and discarded. Nuclei were packed at 25,000 g for 20 minutes and the supernatant was discarded. Nuclei were resuspended in 1.25 volumes Buffer C (20 mM Hepes pH 7.9, 0.42 M NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 5 mM dithiothreitol, 0.5 mM PMSF) per volume of packed cells with a Hamilton A dounce and incubated tumbling at 4 °C for 45 minutes. Extracted nuclei were pelleted at 20,000 g for 10 minutes and the pellet was discarded. Nuclear extract supernatant was twice dialyzed against 1.0 M KCl HENG10 (20 mM Hepes pH 7.9, 0.2 mM EDTA, 0.05% NP-40, 10% glycerol) for 1 hour each on ice, until the conductivity of the extract matched that of the dialysis buffer. Control nuclear extract (NE) dialyzed to 1.0 M KCl was removed and set aside at 4 °C during the depletion procedure.

The high salt nuclear extract was depleted by application of the extract to a poly[U] Sepharose column. The poly[U] Sepharose was resuspended in H2O on ice and washed several times in batch in H2O, followed by equilibration in 1.0 M KCl HENG10. A volume of poly[U] Sepharose approximately equal to that of the high salt extract was used for the depletion. Extract was applied to the column at approximately one column volume per hour. The flow-through of the column (NEΔU) was detected by Bradford assay, pooled and dialyzed immediately against three changes of 0.1 M KCl HENG20 (HENG10 but with 20% glycerol) on ice for a total of three hours. Extract was centrifuged at 25,000 g for 15 minutes to pellet insoluble material. The protein concentration of NEΔU was consistently one half to two thirds of the protein concentration of the control extract (NE) due to loss during depletion and dialysis.
Extract was frozen in small aliquots in liquid nitrogen and stored at -80 °C. The entire procedure was routinely performed in a single, long day for best results.

The critical difference between this protocol, which uncovers the PUF requirement, and the U2AF depletion protocol (Zamore and Green, 1991) lies in the dialysis of the nuclear extract directly into high salt buffer. If the extract is dialyzed into low salt buffer (0.1 M KCl) prior to dialysis into high salt, the extract is not depleted of PUF activity, nor is it depleted of the PUF60 and p54 proteins as determined by immunoblotting analysis (data not shown). The reason for this difference is not known, but it is interesting to speculate that the prolonged high-salt dialysis used in making NEΔU may disrupt an interaction with a protein or RNA that associates with the PUF activity under low salt conditions and prevents its interaction with poly[U] RNA. This interaction might not be disrupted if the extract were dialyzed first into low salt buffer.

Purification of the PUF activity

PUF activity eluted from the poly[U] Sepharose column with 2.0 M KCl HENG10. U2AF was eluted with 3.0 M KCl. Both factors eluted in a broad peak of 1.5 to 2 column volumes and were dialyzed against 0.1 M KCl HENG20. The 2.0 M KCl eluate was applied to a phosphocellulose (Whatman P11) column according to the manufacturer's instructions, washed at 0.1 M and 0.3 M KCl and eluted in batch at 0.6 and 1.0 M KCl. The 0.6 and 1.0 M KCl eluates were pooled, dialyzed to 0.9 M KCl with 1.0 M KCl HENG20, applied to a 5 ml poly[U] Sepharose (Pharmacia) column at 0.25 ml/min and eluted with 1.0 M KCl. The poly[U] Sepharose eluate (360 µg) was dialyzed against 0.1 M KCl HENG20 and applied to a 5 ml S Sepharose column (Pharmacia). The S Sepharose column was step eluted at 0.1 M KCl, 0.55 M KCl and 1.0 M KCl. Activity eluted at 0.55 M KCl.
Further purification of the activity was performed by gel filtration chromatography on Superose 6 (Pharmacia) by directly applying the S Sepharose eluate in 0.55 M KCl to the Superose column equilibrated in 0.55 M KCl HENG20 and run at 0.25 ml/min. Elution of the PUF activity was compared to that of markers (Boehringer Mannheim Biochemicals, HPLC markers, #1213 776).

RNAs used in this study

The PIP85a, PIPBG and AD10 RNA substrates were transcribed from plasmids pPIP85a (Moore and Sharp, 1992), pPIPBG (Crispino et al., 1996), and pAD10 (Konarska and Sharp, 1986) using T7 RNA polymerase (United States Biochemicals) and α-32P UTP (New England Nuclear) under standard conditions (Query et al., 1996). The 3' half substrate PIP85aΔRX was transcribed from pPIP85aΔRX, which is a deletion of PIP85a. pPIP85aΔRX was constructed by digesting pPIP85a with Eco RI and Xho I; the 5' overhang ends of the DNA were filled in with Klenow and the DNA was ligated. Pyrimidine-tract RNA, bs-ppt and ppt-bs RNAs and ppt-AG were previously described RNA oligomers (Query et al., 1997) and were labeled at their 5' ends.

Splicing in vitro

In vitro splicing reactions were performed under standard conditions (Grabowski et al., 1984) and were 24% NE or NEΔU supplemented with recombinant U2AF65 or the 3 M KCl eluate of the poly[U] Sepharose column and the PUF fraction in 60 to 100 mM KCl, 20 mM Hepes pH 7.9, 5 mM MgCl2, 2 mM ATP and 5 mM creatine phosphate for 1.5 hours at 30 °C. The reaction was stopped with the addition of 250 µl of 2x Proteinase K buffer (10 mM Tris pH 7.8, 10 mM MgCl2, 0.5% SDS), 2 µg of glycogen (Boehringer Mannheim Biochemicals) and 0.5 mg/ml Proteinase K (Boehringer Mannheim Biochemicals). Reactions were digested for 30 minutes at 65 °C and precipitated with the addition of 4 volumes of 95% ethanol, 1.5 M ammonium acetate, 5 mM MgCl2.
Pellets washed in 70% ethanol were resuspended in 5 µl 8 M Urea, 1x TBE, heated to 100 °C for three minutes and resolved on 20% Acrylamide (19:1)/50% Urea (Natural Diagnostics) gels in 1x TBE.

Complex assembly assays

A3' complex assembly was performed under standard splicing conditions (above) using an RNA transcribed from PIP85aΔRX (an Eco RI to Xho I deletion of PIP85a) and resolved on 4% Acrylamide (60:1) Tris-glycine gels (Konarska and Sharp, 1986). Binding reactions were stopped with the addition of heparin to 5 mg/ml.

Gel shift

Binding reactions with CQ58-19 RNA (Query et al., 1997), labeled at the 5' end, were carried out under splicing conditions at 80 mM KCl for fifteen minutes at 30 °C; heparin was then added to 0.7 mg/ml and reactions were incubated on ice for the remainder of the time course. Binding reactions were resolved on 8% acrylamide (60:1) Tris-glycine gels run at 10 V/cm for 2.5 h at room temperature.

Crosslinking

Binding reactions were carried out under splicing conditions for 10 minutes at 30 °C and crosslinking was carried out on ice for 15 minutes as previously described. The reaction was digested for 1 h at 37 °C with 10 µg of RNase A (Calbiochem), trichloroacetic acid precipitated in the presence of deoxycholate and resolved on a 7.5% polyacrylamide gel.

Antibodies

PUF60 antibodies were generated in two rabbits by inoculation of bacterially expressed PUF60(P) (Covance). PUF60(P) is the Pst I internal fragment of PUF60 inserted into the Pst I site of pQE31 (Qiagen) and expressed and purified in 6 M Urea according to the manufacturer's instructions. Protein was renatured on the Ni-NTA resin (Qiagen) according to the manufacturer's instructions. PUF60(P) protein was further purified on mono S Sepharose. For immunofluorescence experiments, PUF60 antibodies were affinity purified against PUF60(P) coupled to CNBr Sepharose (Pharmacia), eluted sequentially with 100 mM glycine pH 2.0 and 100 mM Triethylamine pH 11.5, and neutralized with Tris to pH 7.
Immunoblotting

The gel that was blotted for figure 4B was run in the presence of 2 M Urea to yield a sharper p54 band. All other gels were standard SDS 4-20% polyacrylamide gels (BioRad). p54 antibody (anti pep C) was the generous gift of Nilabh Chaudhary (Chaudhary et al., 1991). p54 immunoblotting was detected using 125I-protein A using standard techniques (Harlow and Lane). PUF60 immunoblot was detected using the ECL reagent (Amersham).

Immunoprecipitations and Gst pull-downs

GstSF1 and GstU2AF were prepared as previously described (Berglund et al., 1997). GstU2AF was prepared using standard techniques except that the protein was ammonium sulfate precipitated and then purified on S Sepharose (Pharmacia). GstPTB and Gst were the kind gifts of Anna Gil.

Immunofluorescence

Spontaneously transformed mouse fibroblasts (B. Panning, unpublished observations) were fixed as described (Lawrence et al., 1989), and immunofluorescence was carried out as described (Leonhardt et al., 1992).

ACKNOWLEDGMENTS:

I would like to thank Nilabh Chaudhary for generously contributing p54 antibodies and his substantial enthusiasm, Nadja Abovich for GstSF1 and SF1/BBP antisera, Helen Cargill, Margaret Beddall, and Yubin Qiu for expert technical assistance and Robbyn Issner for so much, including growing the cells used in the purification of PUF and especially for general lab sanity maintenance. GstU2AF was the kind gift of Anna Gil, Phil Zamore and Michael Green. The protein sequencing of p54 and PUF60 was expertly performed by MIT Biopolymers.

REFERENCES

Abovich, N., Liao, X. C., and Rosbash, M. (1994). The yeast MUD2 protein: an interaction with PRP11 defines a bridge between commitment complexes and U2 snRNP addition. Genes Dev 8, 843-54.
Anderson, K., and Moore, M. J. (1997). Bimolecular exon ligation by the human spliceosome. Science 276, 1712-6.
Barabino, S. M. L., Blencowe, B. J., Ryder, U., Sproat, B. S., and Lamond, A. I. (1990).
Targeted snRNP depletion reveals an additional role for mammalian U1 snRNP in spliceosome assembly. Cell 63, 293-302.
Bennett, M., Michaud, S., Kingston, J., and Reed, R. (1992). Protein components specifically associated with prespliceosome and spliceosome complexes. Genes Dev. 6, 1986-2000.
Berglund, J. A., Chua, K., Abovich, N., Reed, R., and Rosbash, M. (1997). The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell 89, 781-7.
Blencowe, B. B., Nickerson, J. A., Issner, R., Penman, S., and Sharp, P. A. (1994). Association of nuclear matrix antigens with exon-containing splicing complexes. J. Cell Biol. 127, 593-607.
Blencowe, B. J., Issner, R., Kim, J., McCaw, P., and Sharp, P. A. (1995). New proteins related to the Ser-Arg family of splicing factors. RNA 1, 852-65.
Chaudhary, N., McMahon, C., and Blobel, G. (1991). Primary structure of a human arginine-rich nuclear protein that colocalizes with spliceosome components. Proc. Natl. Acad. Sci. USA 88, 8189-8193.
Chiara, M. D., Champion-Arnaud, P., Buvoli, M., Nadal-Ginard, B., and Reed, R. (1994). Specific protein-protein interactions between the essential mammalian spliceosome-associated proteins SAP 61 and SAP 114. Proc Natl Acad Sci U S A 91, 6403-7.
Crispino, J. D., Mermoud, J. E., Lamond, A. I., and Sharp, P. A. (1996). Cis-acting elements distinct from the 5' splice site promote U1-independent pre-mRNA splicing. RNA 2, 664-73.
Dickson, B. J. (1998). Photoreceptor development: breaking down the barriers. Curr Biol 8, R90-2.
Dignam, J. D., Lebovitz, R. M., and Roeder, R. D. (1983). Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 11, 1475-1489.
Frendeway, D., and Keller, W. (1985). Stepwise assembly of a pre-mRNA splicing complex requires U-snRNAs and specific intron sequences. Cell 42, 355-367.
Gozani, O., Feld, R., and Reed, R. (1996).
Evidence that sequence-independent binding of highly conserved U2 snRNP proteins upstream of the branch site is required for assembly of spliceosomal complex A. Genes Dev 10, 233-43.
Grabowski, P. J., Padgett, R. A., and Sharp, P. A. (1984). Messenger RNA splicing in vitro: An excised intervening sequence and a potential intermediate. Cell 37, 415-427.
Kamitani, T., Nguyen, H. P., Kito, K., Fukuda-Kamitani, T., and Yeh, E. T. (1998). Covalent modification of PML by the sentrin family of ubiquitin-like proteins. J Biol Chem 273, 3117-20.
Kanaar, R., Roche, S. E., Beall, E. L., Green, M. R., and Rio, D. C. (1993). The conserved pre-mRNA splicing factor U2AF from Drosophila: requirement for viability. Science 262, 569-73.
Kennedy, C. F., and Berget, S. M. (1998). Direct Submission to GENBANK: accession AF055719. Unpublished.
Konarska, M. M., and Sharp, P. A. (1986). Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46, 845-855.
Lawrence, J. B., Singer, R. H., and Marselle, L. M. (1989). Highly localized tracks of specific transcripts within interphase nuclei visualized by in situ hybridization. Cell 57, 493-502.
Leonhardt, H., Page, A. W., Weier, H. U., and Bestor, T. H. (1992). A targeting sequence directs DNA methyltransferase to sites of DNA replication in mammalian nuclei. Cell 71, 865-73.
Li, S., Li, Y., Carthew, R. W., and Lai, Z. C. (1997). Photoreceptor cell differentiation requires regulated proteolysis of the transcriptional repressor Tramtrack. Cell 90, 469-78.
Lin, R. J., Nagy, L., Inoue, S., Shao, W., Miller, W. H., Jr., and Evans, R. M. (1998). Role of the histone deacetylase complex in acute promyelocytic leukaemia. Nature 391, 811-4.
MacMillan, A. M., McCaw, P. S., Crispino, J. D., and Sharp, P. A. (1997). SC35-mediated reconstitution of splicing in U2AF-depleted nuclear extract. Proc Natl Acad Sci U S A 94, 133-6.
McCombie, W. R., Kelley, J. M., Aubin, L., Goscoechea, M., FitzGerald, M. G., Wu, A., Adams, M.
D., Dubnick, M., Kerlavage, A. R., Venter, J. C., and Fields, C. A. (1993). Caenorhabditis elegans cDNAs. Genbank: T01856.
Michaud, S., and Reed, R. (1991). An ATP-independent complex commits pre-mRNA to the mammalian spliceosome assembly pathway. Genes & Dev. 5, 2534-2546.
Michaud, S., and Reed, R. (1993). A functional association between the 5' and 3' splice site is established in the earliest prespliceosome complex (E) in mammals. Genes Dev 7, 1008-20.
Moore, M. J., and Sharp, P. A. (1992). Site-specific modification of pre-mRNA: the 2' hydroxyl groups at the splice sites. Science 256, 992-997.
Muller, S., Matunis, M. J., and Dejean, A. (1998). Conjugation with the ubiquitin-related modifier SUMO-1 regulates the partitioning of PML within the nucleus. EMBO J 17, 61-70.
Patton, J. G., Porro, E. B., Galceran, J., Tempst, P., and Nadal-Ginard, B. (1993). Cloning and characterization of PSF, a novel pre-mRNA splicing factor. Genes & Dev. 7, 393-406.
Peek, R., Pruijn, G. J., van der Kemp, A. J., and van Venrooij, W. J. (1993). Subcellular distribution of Ro ribonucleoprotein complexes and their constituents. J Cell Sci 106, 929-35.
Potashkin, J., Naik, K., and Wentz, H. K. (1993). U2AF homolog required for splicing in vivo. Science 262, 573-5.
Query, C. C., McCaw, P. S., and Sharp, P. A. (1997). A minimal spliceosomal complex A recognizes the branch site and polypyrimidine tract. Mol Cell Biol 17, 2944-53.
Query, C. C., Strobel, S. A., and Sharp, P. A. (1996). Three recognition events at the branch-site adenine. EMBO J 15, 1392-402.
Reed, R. (1996). Initial splice-site recognition and pairing during pre-mRNA splicing. Curr. Opin. Genet. Dev. 6, 215-220.
Roscigno, R. F., Weiner, M., and Garcia-Blanco, M. A. (1993). A mutational analysis of the polypyrimidine tract of introns. Effects of sequence differences in pyrimidine tracts on splicing. J Biol Chem 268, 11222-9.
Rounsley, S. D., Lin, X., Ketchum, K. A., Crosby, M. L., Brandon, R. C., Sykes, S. M., Mason, T.
M., Kerlavage, A. R., Adams, M. D., Somerville, C. R., and Venter, J. C. (1997). Arabidopsis thaliana chromosome II BAC F4P9 genomic sequence. Unpublished AC002332.
Rudner, D. Z., Kanaar, R., Breger, K. S., and Rio, D. C. (1996). Mutations in the small subunit of the Drosophila U2AF splicing factor cause lethality and developmental defects. Proc Natl Acad Sci U S A 93, 10333-7.
Ruskin, B., and Green, M. R. (1985). Role of the 3' splice site consensus sequence in mammalian pre-mRNA splicing. Nature 317, 732-4.
Ruskin, B., Zamore, P. D., and Green, M. R. (1988). A factor, U2AF, is required for U2 snRNP binding and splicing complex assembly. Cell 52, 207-19.
Simons, A., Melamed-Bessudo, C., Wolkowicz, R., Sperling, J., Sperling, R., Eisenbach, L., and Rotter, V. (1997). PACT: cloning and characterization of a cellular p53 binding protein that interacts with Rb. Oncogene 14, 145-55.
Staknis, D., and Reed, R. (1994). SR proteins promote the first specific recognition of pre-mRNA and are present together with the U1 small nuclear ribonucleoprotein particle in a general splicing enhancer complex. Mol Cell Biol 14, 7670-82.
Sternsdorf, T., Grotzinger, T., Jensen, K., and Will, H. (1997). Nuclear dots: actors on many stages. Immunobiology 198, 307-31.
Wahren, M., Mellqvist, E., Vene, S., Ringertz, N. R., and Pettersson, I. (1996). Nuclear colocalization of the Ro 60 kDa autoantigen and a subset of U snRNP domains. Eur J Cell Biol 70, 189-97.
Wan, K., Nickerson, J. A., Krockmalnic, G., and Penman, S. (1994). The B1C8 protein is in the dense assemblies of the nuclear matrix and relocates to the spindle and pericentriolar filaments at mitosis. Proc. Natl. Acad. Sci. USA 91, 594-598.
Wentz, H. K., and Potashkin, J. (1996). The small subunit of the splicing factor U2AF is conserved in fission yeast. Nucleic Acids Res 24, 1849-54.
Zahler, A. M., Neugebauer, K. M., Stolk, J. A., and Roth, M. B. (1993). Human SR proteins and isolation of a cDNA encoding SRp75.
Mol Cell Biol 13, 4023-8.
Zamore, P. D., and Green, M. R. (1991). Biochemical characterization of U2 snRNP auxiliary factor: an essential pre-mRNA splicing factor with a novel intranuclear distribution. EMBO J 10, 207-14.
Zamore, P. D., and Green, M. R. (1989). Identification, purification, and biochemical characterization of U2 small nuclear ribonucleoprotein auxiliary factor. Proc Natl Acad Sci U S A 86, 9243-7.
Zhang, W. J., and Wu, J. Y. (1996). Functional properties of p54, a novel SR protein active in constitutive and alternative splicing. Mol Cell Biol 16, 5400-8.
Zorio, D. A., Lea, K., and Blumenthal, T. (1997). Cloning of Caenorhabditis U2AF65: an alternatively spliced RNA containing a novel exon. Mol Cell Biol 17, 946-53.
Zuo, P., and Maniatis, T. (1996). The splicing factor U2AF35 mediates critical protein-protein interactions in constitutive and enhancer-dependent splicing. Genes Dev 10, 1356-68.

FIGURE LEGENDS

Figure 1. Poly[U]-depleted nuclear extract requires both U2AF and PUF. A. In vitro splicing reaction on the PIP85a pre-mRNA. Nuclear extract, high salt treated but not depleted (NE), is compared to nuclear extract that has been depleted at high salt by passage over a poly[U] Sepharose column (NEΔU, compare lanes 1 and 2). Addition of either the 2.0 M KCl eluate fraction of the poly[U] depletion column (the PUF-containing fraction) or recombinant U2AF65 alone does not efficiently restore splicing activity to the depleted extract (lanes 3 and 4). Addition of both recombinant U2AF65 and PUF restores splicing activity to the extract. The positions of the pre-mRNA, mRNA and lariat product and intermediate are indicated schematically to the right of the figure. B. PUF activity is required for efficient splicing of at least three pre-mRNAs. U2AF purified from nuclear extract (the 3 M KCl eluate fraction of the poly[U] depletion column) was compared to U2AF65 in the reconstituted splicing system.
Splicing in vitro of PIPBG (lanes 1 to 7), PIP85a (lanes 8 to 14) and AD10 (lanes 15 to 21) with or without added purified PUF activity was compared in the presence of U2AF (lanes 6 and 7, 13 and 14, and 20 and 21) or recombinant U2AF65 (lanes 4 and 5, 11 and 12, and 18 and 19). The activity supplied by the PUF fraction does not substitute for U2AF35. If PUF activity substituted for U2AF35, then the PUF60-containing and U2AF35-containing fractions should have equivalent activities. While this may be true for PIP85a (compare lanes 12 and 13), it is not true for PIPBG or for AD10 (compare lane 5 with 6 and lane 19 with 20).

[Figure 1. Panel A: lanes 1 to 5 (NE, NEΔU, His6U2AF65, PUF). Panel B: PIPBG, PIP85a and AD10 substrates, lanes 1 to 21 (NE, NEΔU, PUF, U2AF65, U2AF).]

Figure 2. Purification of the PUF activity. A. Schematic diagram of the purification procedure, described in the Methods section. B. Chromatography of the PUF activity on S Sepharose. Top panel, silver stained protein gel of the fractionation. Bottom panel, reconstitution of in vitro splicing activity to NEΔU supplemented with recombinant U2AF65, PIP85a substrate. Load on the column (LD) and flowthrough (FT) are indicated, as are fraction numbers. C. Coomassie stained gel of the highly purified PUF activity fraction (fraction 8 in figure 2B, lane 2) and Perfect Protein markers (Novagen, lanes 1 and 3). D. Purification of the S Sepharose purified material by Superose 6 gel filtration chromatography. Top panel, silver stained gel of the even numbered column fractions; the peak fractions of the gel filtration standards (BioRad; 415 kD, fraction 23; 150 kD, fraction 27; 50 kD, fraction 33) are indicated above the gel. Bottom panel, reconstitution of in vitro splicing assay of odd numbered fractions.
[Figure 2. Panel A: purification scheme from NE through poly U Sepharose (NEΔU), phosphocellulose, poly U and S Sepharose (PUF) columns, with KCl step elutions and flowthrough (FT) fractions. Panels B to D: gels and splicing assays with fraction numbers, LD and FT lanes, and molecular weight markers.]

Figure 3. PUF60 is ubiquitously expressed in humans as a 2.0 kb mRNA. A northern blot of human tissues (tissues indicated, Clontech) was probed with an antisense PUF60 RNA probe. A ubiquitously expressed transcript of 2.0 kb is observed and shows expression levels across tissues similar to those of SRm160/B1C8 on the same blot (Blencowe et al., 1995).

[Figure 3. Tissue labels: placenta, lung, liver, muscle, kidney, pancreas, brain, heart.]

Figure 4. NEΔU is depleted of both PUF60 and p54. A. PUF60 immunoblotting analysis of three different NEΔU (lanes 2, 4 and 5) and the corresponding NE for two of those extracts (lanes 3 and 7). A cytoplasmic extract, S100, is also shown (lane 1). The 55 kD band is a non-specific band that appears in immunoblots developed with pre-immune serum. B. Immunoblotting analysis of NE (lane 2) and NEΔU (lane 1) detected with an antibody directed against a p54 peptide (Chaudhary et al., 1991). The p54 in the purified PUF fraction is of slower mobility than the p54 found in nuclear extract (compare lane 4 with lane 2).
This slower mobility of purified p54 is a gel artifact: mixing SDS-treated and boiled NEΔU with SDS-treated and boiled PUF fraction, followed by immediate loading on the gel, led to the comigration of the purified activity with the endogenous activity (compare lane 3 with lanes 2 and 4). x marks a cross-reacting species.

[Figure 4. Immunoblots; panel B lane labels include PUF, NEΔU + PUF, NE and NEΔU.]

Figure 5. The PUF factor p54 can form complexes with PUF60. In vitro translation products of p54, PUF60, and both p54 and PUF60 (lanes 1-3 respectively) were immunoprecipitated with p54 antibody and the pellets were resolved by SDS-PAGE (lanes 5-7). Immunoprecipitation of unprogrammed lysate (UN) is shown in lane 4. Control immunoprecipitation with no antibody is shown in lane 8.

[Figure 5. Lanes 1 to 8: total translation products and anti-p54 immunoprecipitates; marked positions: well, PUF60 dimer, p54, PUF60.]

Figure 6. PUF activity is required for efficient A3' spliceosome assembly. A time course (0 minutes, 10 minutes and 30 minutes) of complex assembly in NEΔU alone (lanes 1-3), in the presence of PUF activity (lanes 4-6), in the presence of recombinant U2AF65 (lanes 7-9), or in the presence of both activities (lanes 10-12). The complexes formed in NE (lanes 13-15) are shown for comparison.

[Figure 6. Lanes 1 to 15: NEΔU with or without His6U2AF65 and PUF, and NE, at 0, 10 and 30 minutes; marked complexes: A3' and H.]

Figure 7. The PUF fraction has three species that crosslink to pre-mRNA. PIP85a RNA was incubated with purified PUF fraction under splicing conditions except that ATP and creatine phosphate were omitted. The reaction was exposed to ultraviolet light and digested with RNase A, and cross-linked proteins were resolved by SDS-PAGE (lane 1). Specificity of this interaction was tested by including 0.5 and 5 µg of homo- or heteropolymeric RNAs in the binding reaction.
Poly[G] RNA competed poorly for PUF binding (lanes 2 and 3), poly[I] and poly[C] did not compete at either concentration tested (lanes 4 and 5, 8 and 9), poly[U] competed at both concentrations tested (lanes 6 and 7) and poly[ACU] competed only at the higher concentration tested (lanes 10 and 11).

[Figure 7. Lanes 1 to 11; 0, 0.5 or 5 µg competitor RNA per lane.]

Figure 8. PUF does not bind the AG dinucleotide or the branch sequence. An electrophoretic mobility shift analysis of a binding titration of PUF on pyrimidine-tract RNAs with or without an AG dinucleotide is shown (lanes 9-14 and lanes 15 to 20). A similar titration is shown with a pyrimidine-tract RNA containing either a 3' branch sequence or a 5' branch sequence (lanes 1 to 4 and lanes 5 to 8). No difference in PUF60 binding was observed for any of these RNAs.

[Figure 8. Lanes 1 to 20; probes ppt-bs, bs-ppt, ppt AG and ppt; bound and free RNA indicated.]

Figure 9. The binding of PUF to pyrimidine-tract RNA is unaffected by the presence of either U2AF or U2AF65. Serial dilutions of PUF (lanes 2 to 10, 12 to 20, and 22 to 30) were added to constant amounts of buffer alone (lanes 12 to 20), U2AF (lanes 2 to 11), or U2AF65 (lanes 22 to 31) and then mixed with pyrimidine-tract RNA probe. U2AF in the absence of PUF is shown in lane 11 and U2AF65 in the absence of PUF is shown in lane 31. Pyrimidine-tract RNA probe is shown in lanes 1 and 21.

[Figure 9. Lanes 1 to 31; U2AF, U2AF65 and PUF60 complexes and free probe indicated.]

Figure 10. PUF60 interacts with the branch sequence binding protein SF1/BBP. GstSF1/BBP fusion protein (lanes 1 and 2) was immobilized on glutathione agarose and incubated with NE (even numbered lanes) or NEΔU (odd numbered lanes), and associated proteins were detected by immunoblotting using antiserum to PUF60 (top panel) or to U2AF65 (bottom panel). As controls for specificity, the pyrimidine-tract binding proteins GstPTB (lanes 3 and 4, Anna Gil and P. A. S., unpublished) and GstU2AF65 (lanes 5 and 6) and the Gst protein alone (lanes 7 and 8) were used. Arrow indicates the PUF60 signal. Lanes marked TTL are the signal obtained from 20% of input NE.

[Figure 10. Lanes 1 to 8 plus TTL lanes; NE and NEΔU inputs; top panel anti-PUF60, bottom panel anti-U2AF65.]

Figure 11. PUF60 localizes to a non-speckle domain of the nucleus. A. DAPI stained nucleus of a spontaneously transformed mouse fibroblast cell. B. The same nucleus stained with the B1C8 antibody, which recognizes SRm160, a nuclear speckle component (Blencowe et al., 1994). C. Staining with anti-PUF60 antibody; the PUF60 signal localizes to a discrete nuclear body. None of the available PUF60 antibodies was observed to recognize native protein, and the PUF60 staining structures seen here may reflect only the non-native PUF60 protein present in the cell. D. Merge of the DAPI, B1C8 and PUF60 signals. E. Merge of the B1C8 and PUF60 signals, showing that some PUF60 staining bodies co-localize with B1C8-stained nuclear speckles.

CHAPTER 3: PUF60, A NOVEL PYRIMIDINE TRACT BINDING FACTOR WITH HOMOLOGY TO THE SPLICING FACTORS U2AF65 AND MUD2P, DIMERIZES VIA ITS C-TERMINAL RRM-LIKE DOMAIN, THE PUMP DOMAIN

Patrick Schonleber McCaw, Kevin Amonlirdviman, and Phillip A. Sharp
Center for Cancer Research
Massachusetts Institute of Technology

ABSTRACT

We have identified a new member of the U2AF65 family of splicing factors, PUF60.
PUF60 was identified in highly purified fractions that, together with U2AF, restored splicing activity to poly[U] depleted nuclear extracts (see chapter 2). Homologues of PUF60 are found in vertebrates (human and mouse) and in the invertebrate Drosophila, but not in C. elegans. The most similar protein in the budding yeast, S. cerevisiae, is Mud2p, which is the probable orthologue of U2AF65 based on functional and structural similarities. PUF60 has the unusual property of forming an SDS-resistant dimer. This dimerization is mediated by a C-terminal RRM-like domain, the PUMP domain. The dissociation constant for SDS-resistant dimerization by the PUMP domain is approximately 1 µM. The PUMP domain homology is found in several other proteins, some of which have been shown to mediate protein-protein interactions.

INTRODUCTION

Binding of many proteins to RNA occurs via the RNA recognition motif (RRM) domain. The RRM domain has been found in many splicing and polyadenylation factors, hnRNP proteins, and other proteins and is considered a hallmark of one class of RNA-binding proteins (Kenan et al., 1991; Nagai et al., 1995). While many RRM domains have been found to interact with RNA, others have no known RNA ligand. For instance, the U1A protein has two RRM domains: the first RRM mediates interaction with the U1 stem loop, while the second RRM domain has no known RNA binding activity. Although this second RRM domain has no known RNA ligand, it is the most conserved domain of the protein and has been shown to be functionally important in the yeast S. cerevisiae (Tang and Rosbash, 1996). It has been known for some time that some RRM domains can mediate protein-protein interactions. The best known example of this is the interaction of the first RRM of U2B" with U2A' (Scherly et al., 1990).
The RRM domain is a compact domain of 80 or more amino acids that is characterized by two conserved motifs: the N-terminal RNP2 or hexamer motif and the central RNP1 or octamer motif (Kenan et al., 1991). These motifs are characterized by the presence of conserved aromatic residues, but the RRM domain family is extremely diverse in sequence. Aside from the RNP2 and RNP1 motifs, the RRM domain has few conserved positions, although there are conserved hydrophobic residues that constitute the hydrophobic core of the domain (Nagai et al., 1995). The structure of the U1A RRM1 domain has been solved both bound to the U1 snRNA stem-loop (Oubridge et al., 1994) and unbound (Avis et al., 1996; Hoffman et al., 1991; Nagai et al., 1995). The RRM domain consists of a four-stranded antiparallel β sheet backed by two α helices in a β-α-β-β-α-β arrangement (Hoffman et al., 1991; Nagai et al., 1990). The tertiary structure of the RRM domain explains the high degree of conservation of the aromatic residues found in the RNP2 and RNP1 motifs. These residues are found on the surface of the protein in the β1 and β3 strands that compose the RNP motifs. The conserved aromatic residues of U1A are known to make base-stacking interactions with the stem loop I RNA to which U1A binds. Many other conserved positions are found to constitute the hydrophobic core (Hoffman et al., 1991; Nagai et al., 1990; Oubridge et al., 1994).

The pre-mRNA is recognized by the spliceosome through recognition of the 5' splice site, the branch sequence and the pyrimidine tract. Recognition of the 5' splice site is accomplished by means of base pairing to U1 snRNA, while recognition of the pyrimidine tract is accomplished by the binding of the pyrimidine-tract binding splicing factor U2AF (U2 snRNP Auxiliary Factor). U2AF is a heterodimer of 65 and 35 kD proteins and is highly conserved in all splicing organisms. U2AF65 is required for the stable, ATP-dependent association of U2 snRNP with the branch sequence.
However, neither U1 snRNP nor U2AF is required for splicing in vitro. Splicing in U1 snRNP and U2AF depleted extracts is dependent on the addition of large quantities of SR proteins and is also dependent on the presence of the PUF pyrimidine-tract binding factor (MacMillan et al., 1997). The PUF pyrimidine-tract binding factor has been purified and consists of the splicing factor p54 and at least one other protein, poly[U] binding factor 60 kD (PUF60; chapter 2).

PUF60 is a member of the U2AF/Mud2 family of pyrimidine tract binding proteins. In this chapter we show that PUF60 is conserved from insects to man. The Drosophila homologue of PUF60, DPUF68, has been cloned and sequenced. Like U2AF65 and Mud2p, PUF60 has a central domain containing two RRM domains and a C-terminal domain consisting of a degenerate RRM domain. We have found that this C-terminal domain of PUF60 mediates an unusually stable homodimerization reaction and does not contribute to the RNA-binding affinity of PUF60. This domain is also shown to be a distinct subfamily of the RRM domain homology group. Because of the unusual properties of this domain and its status as a distinct subfamily of the RRM domain homology, we have termed this domain the PUMP (PUF60, U2AF65 and Mud2p Protein-Protein interaction) domain.

RESULTS

PUF60, conserved in evolution, is related to the yeast splicing factor Mud2p

We have purified a complex of proteins from HeLa cell nuclear extracts that bind at high salt concentrations to poly[U] Sepharose and promote the efficiency of RNA splicing in vitro (chapter 2). A 130 kD protein was selected for tryptic digestion and peptide sequencing. Thirteen peptides were sequenced, yielding 169 amino acids of sequence. A cDNA in the human EST database was identified which encoded all thirteen sequenced peptides in a 559 amino acid open reading frame.
This cDNA encoded a predicted protein of 59.9 kD with overall similarity to the pyrimidine-tract binding protein U2AF65, the human auto-immune antigen HCC (Imai et al., 1993) and the S. cerevisiae splicing factor Mud2p. The protein encoded by this cDNA is referred to as PUF60 (poly[U] binding factor, 60 kD). A search of the EST database identified numerous mouse cDNA sequences, allowing reconstruction of the murine ORF from the sequences available in the database. Only 12 amino acids were found to differ between the human and murine PUF60 homologues. These differences consisted of an insertion of five alanines in the N-terminal poly-alanine repeat, and the remaining seven differences were conservative substitutions (data not shown).

A PUF60 homologue was identified in the Drosophila EST database. The Drosophila cDNA was sequenced and is 45.3% similar to PUF60 over its entire length (figure 1A). The Drosophila homologue (DPUF68) contains an ORF of 570 amino acids, encoding a predicted protein of 68 kD. Comparison of the hypothetical translation products of the human, mouse, and Drosophila homologues identified four domains of this protein (figure 1). The N-terminal domain is not well conserved between PUF60 and DPUF68, and DPUF68 contains four RS or SR dipeptides which are not present in the mammalian proteins. The conserved regions begin with a segment of approximately 40 amino acids immediately upstream of the first RNA recognition motif (RRM) domain. The central domain consists of two RRM domains with good matches to the RNP1 and RNP2 consensus motifs; these domains are similar to the RRM domains of U2AF65 and Mud2p. The C-terminal domain is preceded by a variable domain that, in Drosophila, contains a polyalanine motif similar to the polyalanine motif found near the N-terminus of the mammalian PUF60 homologue.
The C-terminal domain is similar to the RRM domain; however, it is unusual in that it is a poor match to the N-terminal RNP2 consensus motif but a good match to the central RNP1 consensus motif. This C-terminal domain is the most conserved part of the protein when compared among the PUF60 homologues U2AF65, Mud2p and HCC. Because this domain has both sequence features and biochemical activities (discussed below) that differ from the general class of RRM domains, it is referred to as the PUMP (PUF60, U2AF65, Mud2p protein-protein interaction) domain.

The PUMP domain is a distinct subfamily of the large RRM domain family

To determine if the unusual sequence characteristics of the PUMP domain of PUF60 and its homologues U2AF65 and Mud2p are found in other proteins, we performed a database search by building a motif block of the PUMP domains of these proteins using the BLOCKMAKER algorithm (Henikoff et al., 1995). The block was then used to search the non-redundant Genbank database for potential homologues using the MAST program (Bailey and Gribskov, 1998; Bailey and Gribskov, 1997). Several PUMP domain homologues were identified that had not previously been identified as RRM domains (for example, U2AF35), while others had previously been identified as potential RRM domains (for example, kis/PCIP2). Alignment of the potential PUMP domain homologues (using ClustalW; Thompson et al., 1994) with all of the RRM family of domains found in the PFAM database (Sonnhammer et al., 1998) clearly demonstrated that the PUMP domain is a distinct subset of the RRM domain family (figure 2). Notably, alignment of the PUMP domain with the RRM domain family showed that the conserved hydrophobic residues of the RRM domain were conserved in the PUMP domain as well. However, the RNP2 motif was absent and may be replaced instead by a cluster of hydrophobic amino acids.
In contrast to the PUMP RNP2 motif, an RNP1 motif was readily identifiable in the PUMP homology, but the RNP1 motif showed remarkable differences from that of the RRM domain family consensus. In the PUMP domains the conserved basic residue of RNP1 was replaced with a hydrophobic residue and the first aromatic residue was replaced with lysine, asparagine or arginine. C-terminal to the RNP1 motif, a conserved hydrophobic residue in the RRM family is found to be an aromatic residue, while C-terminal to the suspected position of the RNP2 motif there is often a clustering of acidic residues in the PUMP family that is not present in the RRM family. Together these features form a signature for the PUMP domain.

Proteins identified as containing PUMP homologies by this method include the cytoplasmic kinase P-CIP2, a protein previously identified as having homology with the C-terminus of U2AF65; U2AF35 and its homologue Urp; and Tat-SF1, a protein implicated in the transcriptional elongation activity of the HIV protein Tat. Several other examples were identified that are implicated in splicing, including the S. cerevisiae factor Cus2 (Wells et al., 1996) and the S. pombe U2AF interacting protein Uap2 (McKinney et al., 1997). Domains that are likely to be PUMP homologues based on these criteria include the splicing factor Prp24 and the yeast genes Not4, Nrd1 and Ylf1, which are involved in RNA metabolism.

The PUMP domain is a protein-protein interaction domain

Because PUF60 is encoded by an open reading frame of 559 amino acids, it is expected to have a mass of 59.9 kD and to migrate on an SDS-polyacrylamide gel at 60 kD. Instead, PUF60 was identified as a 130 kD product (chapter 2, figure 2C). The paradox of how a 130 kD protein purified from nuclear extract could be encoded by an open reading frame of 559 residues was resolved when PUF60 was produced in vitro by a translation reaction.
The predominant translation product migrated as a 60 kD polypeptide, but a small amount migrated as a 130 kD polypeptide. This suggested that the 60 kD primary translation product could form SDS-resistant dimers with an apparent mobility of a 130 kD polypeptide. To test whether the 130 kD form is a dimer of PUF60, full-length PUF60 and N-terminally truncated PUF60 (deleted from 1 to 76, PUF60ΔN) were co-translated (figure 3A). Translation in vitro of the full-length PUF60 produced a predominant 60 kD product and a small amount of a 130 kD product. Translation in vitro of PUF60ΔN produced a predominant band of approximately 55 kD and a small amount of a 120 kD species. Co-translation of both the full-length PUF60 and PUF60ΔN produced the expected 60 kD and 55 kD bands as well as bands of 130, 125, and 120 kD. These latter bands were present in a ratio of approximately 1:2:1, consistent with the hypothesis that the 130 kD and 120 kD species were homodimers of the input forms of PUF60 while the 125 kD species was a heterodimer of these two forms. Other experiments indicated that dimerization of PUF60 was stable in high concentrations of the reducing reagents dithiothreitol (100 mM) or 2-mercaptoethanol (280 mM) and was stable to the addition of 4-vinylpyridine, a thiol-alkylating reagent, when added in excess after reduction with 2-mercaptoethanol, indicating that dimerization was not dependent on disulfide bond formation (data not shown).

Consistent with the hypothesis that PUF60 forms SDS-resistant dimers, we observe in immunoblot experiments that the PUF60 protein in nuclear extract occurs in both monomer (60 kD) and dimer (130 kD) form (chapter 2, figure 3A). Bacterially expressed PUF60 also formed SDS-resistant dimers (figure 3B). Further, this dimerization was found to be concentration dependent; high concentrations of monomers formed dimers more efficiently than low concentrations.
When the concentration of PUF60 protein, which was synthesized in bacteria as a His6 fusion protein (His6PUF60), was increased in two-fold increments from 70 nM to 2.3 µM, the fraction of protein migrating as a dimer increased dramatically. It was estimated that approximately equal fractions of PUF60 were monomer and dimer at between 0.75 and 1.5 µM, suggesting a dissociation constant for the SDS-resistant form of about 1 µM.

The PUMP domain at the carboxyl terminus of PUF60 was necessary for formation of the dimer. PUF60ΔC protein, which lacks this domain (amino acids 516 to 559), does not form dimers even at high concentrations of protein. This C-terminal truncation deletes the last 44 amino acids, which contain the RNP1 motif of the PUMP domain. The PUF60ΔC version of the protein was produced as a His6 fusion protein in parallel with His6PUF60. Low and high concentrations of these proteins were incubated and then analyzed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) (Figure 6). His6PUF60 formed SDS-resistant dimers while His6PUF60ΔC did not. It should be noted that there are no cysteines in the deleted region of the protein which could stabilize dimerization by disulfide bond formation.

Formation of SDS-resistant dimers is a property of the PUMP domain alone. Different portions of PUF60 were expressed in translation reactions in vitro and in bacterial expression systems. These proteins were individually tested for formation of SDS-resistant dimers at high concentrations (figure 3D). Subregions of PUF60 that included the PUMP domain formed dimers, while subregions that did not include the PUMP domain did not form dimers. A Gst fusion protein with the C-terminal 94 amino acids of PUF60, encoding the entire PUMP domain, formed SDS-resistant dimers at approximately the same efficiency and concentration as the full-length protein.
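The dissociation constant estimated above for the SDS-resistant dimer can be checked against a simple monomer-dimer equilibrium. The Python sketch below is an illustration only, not the analysis used in this work. It assumes that dimerization behaves as a reversible 2M <-> D equilibrium with Kd = [M]^2/[D], and that the fractions refer to protomers. Under those assumptions, half of the protein is dimeric exactly when the total protomer concentration equals Kd, which is consistent with a Kd near 1 µM if the midpoint lies between 0.75 and 1.5 µM. The function name and the six-point two-fold series are hypothetical.

```python
import math


def dimer_mass_fraction(total_um: float, kd_um: float) -> float:
    """Fraction of protomers found in dimers for 2M <-> D, Kd = [M]^2 / [D].

    Assumption (not from the text): the SDS-resistant dimer is treated as a
    simple reversible equilibrium. Concentrations are in micromolar.
    """
    # Mass balance: total = [M] + 2[D] with [D] = [M]^2 / Kd, so
    # 2[M]^2 / Kd + [M] - total = 0; take the positive root.
    monomer = kd_um * (math.sqrt(1.0 + 8.0 * total_um / kd_um) - 1.0) / 4.0
    return 1.0 - monomer / total_um


# Hypothetical two-fold series starting at 70 nM (0.07 uM).
series_um = [0.07 * 2 ** k for k in range(6)]
fractions = [dimer_mass_fraction(c, kd_um=1.0) for c in series_um]

for conc, frac in zip(series_um, fractions):
    print(f"{conc:5.2f} uM total -> {frac:.2f} of protomers in dimer")
```

With Kd set to 1 µM, the dimer fraction rises steadily across the series and reaches one half at 1 µM total protein.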
The PUMP domain does not contribute to RNA binding

To test if the PUF60 PUMP domain contributed significantly to RNA-binding activity, the affinities of bacterially expressed PUF60 and PUF60ΔC for a pyrimidine-tract RNA were determined. A series of pyrimidine-tract RNAs of different lengths was synthesized using sequences from the PIP85a substrate, and RNA binding was evaluated using an electrophoretic mobility shift assay. Pyrimidine-tract RNAs of 14 (pyr14) and 23 (pyr23) nucleotides were bound to approximately the same extent by PUF60. In contrast, RNAs of 11 (pyr11) and 7 (pyr7) nucleotides were not bound by PUF60. This suggested that optimal binding required sequences longer than 11 nucleotides. The specificity of PUF60 binding to pyr23 was examined by competition with RNA homo- and heteropolymers. Pyr23 binding was competed only by RNA polymers containing uridine, i.e., by poly(U), poly(C,U), and poly(G,U). This binding could not be competed by the homopolymers poly[C], poly[G], poly[A] or poly[I].

When the binding activity of His6PUF60 was compared to that of His6PUF60ΔC for the substrate pyr23, the two proteins had indistinguishable affinities (Kd obs = 138 nM and 122 nM respectively). The Hill coefficients, which indicate the degree of cooperativity during binding, were evaluated (Creighton, 1984). The coefficients were found to be 2.8 and 1.8, respectively. This difference in Hill coefficients suggests that the PUMP domain of PUF60 mediates cooperative pyrimidine tract binding; however, as these proteins are not monodisperse (data not shown), the Hill coefficients are difficult to interpret. Under these same conditions a His6 fusion of U2AF65 had a Kd obs of 300 nM and a Hill coefficient of 1.1. Zamore et al. (1992) report values for GstU2AF65 binding ranging from 10 nM to 2 µM for wild-type pyrimidine tracts. These latter affinity measurements were made with RNAs that differ in two respects from pyr23.
First, they contain a branch sequence and 3' splice site, and second, the pyrimidine tracts tend to be shorter than those of either pyr23 or pyr14. These measurements were made with the GstU2AF65 fusion protein rather than the His6 fusions used here; glutathione-S-transferase dimerization may also have affected affinity measurements.

DISCUSSION

We have identified a new member of the U2AF65 family of pyrimidine-tract binding factors, PUF60. PUF60 was identified in highly purified fractions that, together with U2AF, restored splicing activity to poly[U] depleted nuclear extracts. Orthologues of PUF60 are found in vertebrates (human and mouse) and in the invertebrate Drosophila. The closest homologue in the budding yeast, S. cerevisiae, is Mud2p, which is structurally similar and approximately equally distant in sequence similarity to both PUF60 and U2AF65 (data not shown). PUF60 has the unusual property of forming SDS-resistant dimers. This dimerization is mediated by a C-terminal RRM-like domain, the PUMP domain. The dissociation constant for formation of the SDS-resistant dimers was approximately 1 µM. The PUMP domain homology is found in several other proteins, some of which have been shown to mediate protein-protein interactions.

RNA binding activity of PUF60

Like U2AF65, PUF60 was purified from nuclear extracts based on its ability to associate with poly[U] RNA at high salt concentrations. We subsequently tested the RNA binding activity of bacterially expressed His6PUF60 and found that this protein bound a pyrimidine-tract RNA of 23 nucleotides with high affinity. This affinity, 130 nM, was comparable to that of U2AF65, 300 nM, for the same RNA. PUF60 RNA binding to pyrimidine-tract RNAs of 14, 11 and 7 nucleotides was also tested. PUF60 requires pyrimidine-tract RNAs of greater than 11 nucleotides for high affinity binding.
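The affinities and Hill coefficients reported in the Results can be related to one another through the Hill equation. The Python sketch below is illustrative only. It assumes that Kd obs is the protein concentration giving half-maximal binding and that binding follows θ = [P]^n / (Kd^n + [P]^n); the fitting procedure actually used is not restated here. The function name and the 60 nM example concentration are hypothetical.

```python
def hill_fraction_bound(protein_nm: float, kd_nm: float, hill_n: float) -> float:
    """Fraction of RNA bound under the Hill equation (assumed functional form)."""
    ratio = (protein_nm / kd_nm) ** hill_n
    return ratio / (1.0 + ratio)


# Values quoted in the Results for binding to pyr23: (Kd obs in nM, Hill n).
reported = {
    "His6PUF60": (138.0, 2.8),
    "His6PUF60deltaC": (122.0, 1.8),
    "His6U2AF65": (300.0, 1.1),
}

for name, (kd_nm, hill_n) in reported.items():
    # At [P] = Kd obs the bound fraction is one half for any Hill coefficient.
    at_kd = hill_fraction_bound(kd_nm, kd_nm, hill_n)
    # Hypothetical protein concentration of 60 nM, below every Kd obs above.
    at_60 = hill_fraction_bound(60.0, kd_nm, hill_n)
    print(f"{name}: bound at Kd obs = {at_kd:.2f}, bound at 60 nM = {at_60:.3f}")
```

A larger Hill coefficient makes the binding curve steeper around Kd obs, so two proteins with similar Kd obs can still differ in the fraction bound well below or above that concentration.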
As the RNAs tested for binding were short oligonucleotides, it is not known what effect, if any, flanking RNA would have on PUF60's affinity for these short RNAs. U2AF65 has been reported to contact the branch sequence via its N-terminal SR domain. Interestingly, PUF60 does not have an SR domain and so may not be able to make similar contacts.

The RNA binding specificity of PUF60 for the pyrimidine tract was tested by competing PUF60 with RNA homopolymers and heteropolymers. Only polymers containing uridine could successfully compete for pyrimidine-tract binding. Notably, poly(C) could not compete for PUF60 binding although both poly(C,U) and poly(U) could. This result suggests that PUF60, like U2AF65 and unlike PTB, prefers binding uridine-rich sequences, although both appear to tolerate the presence of cytosine (Singh et al., 1995). This sequence specificity mimics the consensus sequence of the pyrimidine tract, which is composed predominantly of uridine (Roscigno et al., 1993).

PUF60 is a U2AF65 homologue

PUF60 is distantly similar to U2AF65; however, this homology extends across the entire length of these two proteins with the exception of the very N-terminal domain. Both proteins have a relatively non-conserved N-terminal domain, two central RRM domains and a C-terminal RRM-like PUMP domain. Comparison of the N-terminal domains is informative in that only PUF60, among these proteins, lacks SR dipeptide repeats. DPUF68, however, has five SR dipeptide repeats in this region. This is reminiscent of the variable number of SR dipeptide repeats observed between invertebrates and vertebrates in U2AF65, the large subunit of U2AF (Kanaar et al., 1993). Phylogenetic analysis of these proteins and their metazoan and yeast homologues showed that PUF60 and U2AF65 were approximately equally related to the yeast protein Mud2p. Mud2p, while not required for viability, does play a central role in 3' splice site recognition.
MUD2 interacts genetically with U1 snRNA, and Mud2p is present in the commitment complex. Mud2p and U2AF65 are both known to interact with the branch site interacting protein SF1/BBP (Berglund et al., 1998). PUF60 has also been shown to interact weakly with SF1/BBP (chapter 2). Mud2p is also known to interact with the splicing factor Prp11 (which joins the spliceosome coincident with U2 snRNP; Abovich et al., 1994), suggesting that Mud2p plays a role in bringing U2 snRNP to the branch sequence. Analysis of the complete yeast genome database demonstrates that there are no other PUF60 or U2AF65 homologues. It is interesting to speculate that PUF60 and U2AF65 may interact with SF1/BBP at different points of spliceosome assembly. Alternatively, PUF60 may function on a discrete set of introns and may play a regulatory role. We do not favor this model, as PUF60 and U2AF65 show similar binding affinities and specificities and PUF60 has been shown to be required for efficient splicing of each intron tested (chapter 2).

The PUMP domain is a subset of the RRM domain family

The highest degree of conservation between PUF60, U2AF65, and Mud2p is found in the C-terminal PUMP domain. This domain had previously been identified as an RRM-like domain but has also been recognized as being divergent from the RRM domain sequence (Birney et al., 1993). The PUF60, U2AF65 and Mud2p C-terminal domains were used to search the protein database to identify other possible homologues. Several other domains that previously had been identified as RRM domains, but are more closely related to the PUMP domain subfamily, were identified. The small subunit of U2AF, U2AF35, and its homologue U2AFbpl/Urp, in contrast, had not previously been identified as containing an RRM homology, but showed significant homology to the PUMP domain.
Additional PUMP domain homologues can be found among several proteins that have previously been identified as RRM homologues: kis/P-CIP2, a kinase that binds the cytoplasmic tail of the transmembrane protein PAM; Tat-SF1, which binds to the HIV tat-TAR complex; D111, an Arabidopsis protein implicated in DNA-damage repair; CUS2, a splicing factor in S. cerevisiae; and UAP2, which interacts with the S. pombe U2AF large subunit. Less clearly homologous are domains found in PRP24, a splicing factor; Not4, which has been shown to regulate basal and activated transcription; Nrd1, a regulator of pre-mRNA abundance in yeast (Steinmetz and Brow, 1996); Nop4 and RNA12, which are important for rRNA maturation; and Ylfl, which encodes a GTP binding protein and was identified as a polymerase III mutant suppressor. The function of these domains is unknown, but it is interesting to note that all known examples of PUMP domains are in some way associated with RNA metabolism in the nucleus, except for P-CIP2, which may be an accessory factor in Golgi shuttling.

Because the RRM domain is indicative of RNA binding activity, we tested whether the PUMP domain of PUF60 contributed to RNA binding activity. Surprisingly, deletion of the C-terminal half of this domain had no detectable effect on the RNA binding activity of PUF60, strongly arguing that the PUMP domain of PUF60 does not contribute to the RNA binding activity of the protein. The PUMP domain of U2AF65, in contrast, does contribute to RNA binding, as deletions of either RRM domain or the PUMP domain severely affected RNA-binding activity. The PUMP domain of U2AF65 has recently been shown to interact with SF1/BBP, suggesting that it may play multiple roles in 3' splice site recognition. RRM domains are generally found to be RNA binding domains, but exceptions to this rule have been identified.
Perhaps the best known example of an RRM that has no known RNA binding activity is the second RRM domain of U1A. While U1A is known to bind RNA, it is the first RRM domain of this protein that is sufficient for its known RNA-binding activities (Scherly et al., 1990). We have identified a family of RRM domains that form a distinct subset of the RRM domain class, the PUMP family. The PUMP domain of PUF60 is unlikely to have RNA binding activity and is a protein-protein interaction domain. This appears to be a general feature of the PUMP domain class. Since the second RRM of U1A does not bind RNA, it was of interest to determine if this domain is a PUMP domain; surprisingly, it is not (data not shown). This raises the question of whether RRM domains can generally be assumed to be RNA-binding domains.

The PUMP domain is a protein-protein interaction domain

The PUMP domain was shown by deletion analysis to be responsible for the unusual SDS-resistant dimerization of PUF60. The fact that dimerization is stable to the addition of SDS was unexpected but is not without precedent. Bacteriophage P22 tailspike endorhamnosidase is a protein that is resistant to SDS denaturation (Goldenberg et al., 1982). Like the RRM domain, and presumably the PUMP domain, the tailspike protein consists predominantly of β strands (Steinbacher et al., 1994). β-strand-containing proteins have been shown to be remarkably resistant to denaturants. Examples of this are found in amyloid plaques and prion protein, which are insoluble under most conditions and are composed predominantly of β sheets (Meyer et al., 1986; Pan et al., 1993). Denaturation of protein by SDS is believed to occur by disruption of the protein's hydrophobic core, and it is interesting to speculate that the affinity of the PUF60 PUMP domain dimerization may have only limited contribution by a hydrophobic pocket.
In support of this idea it has recently been shown that a β sheet can form stable, non-aggregating structures in the absence of a hydrophobic core (Pham et al., 1998).

PUMP domain interactions: other proteins

It is interesting to note that the regions of U2AF35 and Urp that are implicated in binding U2AF65 contain a PUMP domain (Tronchere et al., 1997; Wentz and Potashkin, 1996; Zhang et al., 1992). This domain has previously been described as the H2 homologous region of Urp and U2AF35. The heterodimerization reaction between U2AF65 and U2AF35 is known to be very stable. U2AF35 remains bound to U2AF65 in the presence of at least 2 M KCl and coelutes from poly[U] Sepharose with U2AF65 in 2 M guanidine. This heterodimerization reaction is different from the homodimerization reaction of PUF60 in that the U2AF65 PUMP domain is not involved. Instead, a small region of U2AF65, N-terminal to the first RRM, is required for this interaction. This heterodimerization interaction is reminiscent of the heterodimerization between the short cytoplasmic domain of PAM (Eipper et al., 1993), the P-CIP2 ligand, and the PUMP domain protein P-CIP2 (Alam et al., 1996). Deletion analysis of the C-terminal domain of PAM implicates a 35 amino acid region as the P-CIP2 binding site (Alam et al., 1996). Comparison of the region of the large subunit of U2AF from vertebrates, Drosophila and the fission yeast S. pombe required for small subunit binding together with the C-terminal domain of PAM, required for the interaction of the PUMP domain of P-CIP2 with PAM, suggests that the PUMP domains of the small subunits of U2AF and of P-CIP2 bind the peptide motif glycine, phenylalanine or tyrosine, glutamate or aspartate, X, valine or leucine, serine or threonine (G, F/Y, E/D, X, V/I, T/S). It will be of interest to determine if this is the PUMP domain interaction peptide of these proteins and if this motif is more generally found to be a PUMP domain interaction motif.
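A candidate interaction motif of this kind can be screened against protein sequences with a simple pattern search. The following is a minimal Python sketch, not a method used in this work. It encodes the one-letter form of the motif as written above (G, F/Y, E/D, X, V/I, T/S), treating X as any single residue. The function name `find_motif` and the test sequence are hypothetical and chosen only for illustration.

```python
import re

# One-letter form of the proposed motif as written in the text:
# G, F/Y, E/D, X, V/I, T/S. Assumption: X stands for any single residue.
PUMP_LIGAND_MOTIF = re.compile(r"G[FY][ED][A-Z][VI][TS]")


def find_motif(sequence, pattern=PUMP_LIGAND_MOTIF):
    """Return (1-based start position, matched peptide) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Made-up sequence for illustration only; it is not taken from any real protein.
toy_sequence = "MKAGFEQVTLLPGYDAISRR"
for start, peptide in find_motif(toy_sequence):
    print(start, peptide)
```

Because the prose form of the motif names leucine at the fifth position while the one-letter form gives I, the pattern is passed as a parameter: a broader expression such as `G[FY][ED][A-Z][VIL][TS]` can be supplied without changing the function.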
METHODS AND MATERIALS

Identification of ESTs, sequencing, and alignments

The human EST clone 33065 (accession numbers R43914 and R18804) was sequenced by oligo-directed automated dideoxy sequencing (ABI) at MIT Biopolymers. The Drosophila clone was identified by a BLAST search of the Drosophila EST database and clone CK00112 was sequenced. Management of the sequencing project was by DNA* (Lasergene). Murine EST sequences from the following accession numbers were aligned to generate the murine PUF60 sequence: AA139823, W33907, W29734, AA086867, AA068510, R75215, AA062267, W29345, AA023679, 11166178, AA002661, AA109731, W98367, W78451, AA073303, W14352, W64149, AA075822, AA184449, AA105954, W44199, AA003590, W34769, W82745, W65772, W65505, AA175197, AA182190, and R75214. Alignment of the PUF60, U2AF65, and Mud2p protein sequences was by Clustal W. Identification of new PUMP domains was achieved by identifying blocks of similarity in the C-terminal domains of PUF60, U2AF65, and Mud2p using BLOCKMAKER. The resulting blocks were used to search the non-redundant database using the MAST program. Alignment of the PUMP domains and the compiled sequences of the RRM domain (PFAM) was by Clustal W. The PUMP and PFAM seed RRM dendrogram was displayed by TreeViewPPC (Page, 1996).

Expression and purification of His6PUF60 and His6PUF60ΔC

pET15-His6PUF60 and pET15-His6PUF60ΔC were transformed into BL21(DE3) cells, which were grown to mid-log phase and induced with 1 mM IPTG. Induced cells were harvested after 7 hours by centrifugation. Cells were lysed in QBA (6 M guanidine HCl, 10 mM 2-mercaptoethanol, 100 mM NaPO4, 10 mM Tris, pH 8.0) and sonicated to shear the DNA. Lysate was spun at 25,000 g for 15 minutes and the pellet was discarded. Lysate was loaded onto Ni-NTA agarose (Qiagen) according to the manufacturer's instructions and washed with QBB (8.0 M urea, 10 mM 2-mercaptoethanol, 100 mM NaPO4, 10 mM Tris, pH 6.3 with HCl) and QBC (QBB at pH 4.5).
The protein was eluted in QBE (QBB at pH 3.7) and the pH was brought to 7.5 with the addition of Tris base. Eluted protein was refolded by applying it to fresh Ni-NTA agarose, washing the column with a linear gradient of 6 M urea to 1 M urea in QR (500 mM KCl, 20% glycerol, 20 mM HEPES, 10 mM 2-mercaptoethanol, pH 7.9) and eluting with 250 mM imidazole, 50 mM EDTA in 1 M urea QR. Protein was dialyzed against 100 mM KCl, 20 mM HEPES (pH 7.9), 0.2 mM EDTA, 20% glycerol, 40 mM DTT and stored frozen. Yields on refolding were 10% and 18%, respectively; 4.5 and 4.7 mg were obtained from 1 liter of cultured cells. The purified, refolded protein was polished on Mono Q Sepharose (Pharmacia) on a linear gradient of 350 mM KCl to 500 mM KCl in 20 mM Tris pH 7.5 (at room temperature), 0.2 mM EDTA and 10% glycerol. Peak fractions were dialyzed against 0.1 M KCl, 20 mM HEPES, 0.2 mM EDTA, 0.05% NP-40, 20% glycerol, pH 7.9, frozen in liquid nitrogen and stored at -80 °C.

Translation in vitro

Plasmids for translation in vitro were pCITE 4 with the following fragments of PUF60 cloned. Translation of PUF60 and the deletion series was in the coupled transcription and translation system, TnT (Promega), and products were labeled by incorporation of 35S-methionine.

Dimerization assay

All samples were made 1x with SDS-electrophoresis buffer (1% SDS, 62.5 mM Tris-Cl pH 6.8, 10% glycerol and 180 mM 2-mercaptoethanol) and boiled for six minutes before resolution on BioRad 4-15% ReadyGels. For figure 3B the proteins were diluted serially; quantitation of this and similar gels gave results similar to those shown in figure 3D. For figure 3D two quantities of His6PUF60 were used and the concentration was varied by changing the volume. This allowed better quantitation of the small amount of dimer formed at low concentrations. Protein concentration was determined by Bradford assay (BioRad). Gels were stained with SYPRO Red and scanned with a Storm 860 (Molecular Dynamics).
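The monomer and dimer fractions quantified from such gels are analyzed as a saturation curve of Hill form, theta = 1/(1 + (Kd/[A])^n). The following is a minimal Python sketch of that kind of fit, offered only as an illustration. It uses a brute-force least-squares grid search as a stand-in for the Kaleidagraph fitting routine, and the data points are synthetic values generated from the model itself (assumed Kd = 1 µM, n = 1.5), not measurements from this work. The function names `hill` and `fit_hill` are hypothetical.

```python
def hill(conc, kd, n):
    """Hill equation: fraction theta = 1 / (1 + (Kd / [A])**n)."""
    return 1.0 / (1.0 + (kd / conc) ** n)


def fit_hill(concs, fractions, kd_grid, n_grid):
    """Grid-search least-squares fit; returns (Kd, n, sum of squared errors)."""
    best = None
    for kd in kd_grid:
        for n in n_grid:
            sse = sum((hill(c, kd, n) - f) ** 2 for c, f in zip(concs, fractions))
            if best is None or sse < best[0]:
                best = (sse, kd, n)
    sse, kd, n = best
    return kd, n, sse


# Synthetic, illustrative data (NOT measured values): a two-fold dilution
# series from 0.25 to 8 micromolar, with fractions computed from the model.
true_kd, true_n = 1.0e-6, 1.5
concs = [0.25e-6 * 2 ** i for i in range(6)]
fractions = [hill(c, true_kd, true_n) for c in concs]

# Candidate parameters: Kd from 0.1 to 5.0 micromolar, n from 0.5 to 3.0.
kd_grid = [k * 1.0e-7 for k in range(1, 51)]
n_grid = [0.5 + 0.1 * j for j in range(26)]

kd_fit, n_fit, sse = fit_hill(concs, fractions, kd_grid, n_grid)
print(f"Kd ~ {kd_fit:.2e} M, n ~ {n_fit:.1f}")
```

A grid search is used here only because it needs no external libraries; with real gel data a nonlinear least-squares routine would normally replace it and would also report parameter errors.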
Binding curves were generated by determining the fraction of dimer and monomer at each concentration and plotting fraction of dimer and monomer as a function of protein concentration. Data were fit to the Hill equation using Kaleidagraph:

$\theta = \dfrac{1}{1 + (K_d/[A])^{n}}$

where $\theta$ is the fraction of dimer or monomer, $K_d$ is the apparent dissociation constant, $[A]$ is the protein concentration, and $n$ is the Hill coefficient.

RNA binding assay

RNA binding reactions were carried out in 83 mM KCl, 60 µM EDTA, 3 mM MgCl2, 26 mM HEPES pH 7.9, 2.5 mM dithiothreitol, 6% glycerol and 10 µg/ml BSA (New England Biolabs). Binding reactions were run out on an 8% (60:1) polyacrylamide gel in 50 mM Tris, 50 mM glycine, 3 mM Mg acetate at 10 V/cm. Dried gels were exposed to Phosphorimager plates (Molecular Dynamics), and binding was quantitated by determining the volume of each band, corrected for background. Fraction bound at each concentration was determined for three identical experiments. Protein concentration was determined by Bradford assay (BioRad). Binding curves were generated by determining the fraction bound at each concentration and plotting the fraction bound as a function of protein concentration. Data were fit to the Hill equation using Kaleidagraph as for the protein dimerization assay.

REFERENCES

Abovich, N., Liao, X. C., and Rosbash, M. (1994). The yeast MUD2 protein: an interaction with PRP11 defines a bridge between commitment complexes and U2 snRNP addition. Genes Dev 8, 843-54.
Alam, M. R., Caldwell, B. D., Johnson, R. C., Darlington, D. N., Mains, R. E., and Eipper, B. A. (1996). Novel proteins that interact with the COOH-terminal cytosolic routing determinants of an integral membrane peptide-processing enzyme. J Biol Chem 271, 28636-40.
Avis, J. M., Allain, F. H., Howe, P. W., Varani, G., Nagai, K., and Neuhaus, D. (1996). Solution structure of the N-terminal RNP domain of U1A protein: the role of C-terminal residues in structure stability and RNA binding.
J Mol Biol 257, 398-411.
Bailey, T. L., and Gribskov, M. (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48-54.
Bailey, T. L., and Gribskov, M. (1997). Score distributions for simultaneous matching to multiple motifs. J Comput Biol 4, 45-59.
Berglund, J. A., Abovich, N., and Rosbash, M. (1998). A cooperative interaction between U2AF65 and mBBP/SF1 facilitates branchpoint region recognition. Genes Dev 12, 858-67.
Birney, E., Kumar, S., and Krainer, A. R. (1993). Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res 21, 5803-16.
Creighton, T. E. (1984). Proteins: Structures and Molecular Principles (New York: W. H. Freeman and Company).
Eipper, B. A., Milgram, S. L., Husten, E. J., Yun, H. Y., and Mains, R. E. (1993). Peptidylglycine alpha-amidating monooxygenase: a multifunctional protein with catalytic, processing, and routing domains. Protein Sci 2, 489-97.
Goldenberg, D. P., Berget, P. B., and King, J. (1982). Maturation of the tail spike endorhamnosidase of Salmonella phage P22. J Biol Chem 257, 7864-71.
Henikoff, S., Henikoff, J. G., Alford, W. J., and Pietrokovski, S. (1995). Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163, GC17-26.
Hoffman, D. W., Query, C. C., Golden, B. L., White, S. W., and Keene, J. D. (1991). RNA-binding domain of the A protein component of the U1 small nuclear ribonucleoprotein analyzed by NMR spectroscopy is structurally similar to ribosomal proteins. Proc Natl Acad Sci U S A 88, 2495-9.
Imai, H., Chan, E. K., Kiyosawa, K., Fu, X. D., and Tan, E. M. (1993). Novel nuclear autoantigen with splicing factor motifs identified with antibody from hepatocellular carcinoma. J Clin Invest 92, 2419-26.
Kanaar, R., Roche, S. E., Beall, E. L., Green, M. R., and Rio, D. C. (1993).
The conserved pre-mRNA splicing factor U2AF from Drosophila: requirement for viability. Science 262, 569-73.
Kenan, D. J., Query, C. C., and Keene, J. D. (1991). RNA recognition: towards identifying determinants of specificity. Trends Biochem Sci 16, 214-20.
MacMillan, A. M., McCaw, P. S., Crispino, J. D., and Sharp, P. A. (1997). SC35-mediated reconstitution of splicing in U2AF-depleted nuclear extract. Proc Natl Acad Sci U S A 94, 133-6.
Meyer, R. K., McKinley, M. P., Bowman, K. A., Braunfeld, M. B., Barry, R. A., and Prusiner, S. B. (1986). Separation and properties of cellular and scrapie prion proteins. Proc Natl Acad Sci U S A 83, 2310-4.
Nagai, K., Oubridge, C., Ito, N., Avis, J., and Evans, P. (1995). The RNP domain: a sequence-specific RNA-binding domain involved in processing and transport of RNA. Trends Biochem Sci 20, 235-40.
Nagai, K., Oubridge, C., Ito, N., Jessen, T. H., Avis, J., and Evans, P. (1995). Crystal structure of the U1A spliceosomal protein complexed with its cognate RNA hairpin. Nucleic Acids Symp Ser, 1-2.
Nagai, K., Oubridge, C., Jessen, T. H., Li, J., and Evans, P. R. (1990). Crystal structure of the RNA-binding domain of the U1 small nuclear ribonucleoprotein A. Nature 348, 515-520.
Oubridge, C., Ito, N., Evans, P. R., Teo, C. H., and Nagai, K. (1994). Crystal structure at 1.92 Å resolution of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin. Nature 372, 432-8.
Page, R. D. M. (1996). TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 12, 357-358.
Pan, K. M., Baldwin, M., Nguyen, J., Gasset, M., Serban, A., Groth, D., Mehlhorn, I., Huang, Z., Fletterick, R. J., Cohen, F. E., et al. (1993). Conversion of alpha-helices into beta-sheets features in the formation of the scrapie prion proteins. Proc Natl Acad Sci U S A 90, 10962-6.
Pham, T. N., Koide, A., and Koide, S. (1998).
A stable single-layer beta-sheet without a hydrophobic core. Nat Struct Biol 5, 115-9.
Roscigno, R. F., Weiner, M., and Garcia-Blanco, M. A. (1993). A mutational analysis of the polypyrimidine tract of introns. Effects of sequence differences in pyrimidine tracts on splicing. J Biol Chem 268, 11222-9.
Scherly, D., Dathan, N. A., Boelens, W., van Venrooij, W. J., and Mattaj, I. W. (1990). The U2B" RNP motif as a site of protein-protein interaction. EMBO J 9, 3675-81.
Singh, R., Valcarcel, J., and Green, M. R. (1995). Distinct binding specificities and functions of higher eukaryotic polypyrimidine tract-binding proteins. Science 268, 1173-6.
Sonnhammer, E. L., Eddy, S. R., Birney, E., Bateman, A., and Durbin, R. (1998). PFAM: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res 26, 320-2.
Steinbacher, S., Seckler, R., Miller, S., Steipe, B., Huber, R., and Reinemer, P. (1994). Crystal structure of P22 tailspike protein: interdigitated subunits in a thermostable trimer. Science 265, 383-6.
Steinmetz, E. J., and Brow, D. A. (1996). Repression of gene expression by an exogenous sequence element acting in concert with a heterogeneous nuclear ribonucleoprotein-like protein, Nrd1, and the putative helicase Sen1. Mol Cell Biol 16, 6993-7003.
Tang, J., and Rosbash, M. (1996). Characterization of yeast U1 snRNP A protein: identification of the N-terminal RNA binding domain (RBD) binding site and evidence that the C-terminal RBD functions in splicing. RNA 2, 1058-70.
Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-80.
Tronchere, H., Wang, J., and Fu, X. D. (1997). A protein related to splicing factor U2AF35 that interacts with U2AF65 and SR proteins in splicing of pre-mRNA. Nature 388, 397-400.
Wells, S.
E., Neville, M., Haynes, M., Wang, J., Igel, H., and Ares, M., Jr. (1996). CUS1, a suppressor of cold-sensitive U2 snRNA mutations, is a novel yeast splicing factor homologous to human SAP 145. Genes Dev 10, 220-32.
Wentz, H. K., and Potashkin, J. (1996). The small subunit of the splicing factor U2AF is conserved in fission yeast. Nucleic Acids Res 24, 1849-54.
Zhang, M., Zamore, P. D., Carmo, F. M., Lamond, A. I., and Green, M. R. (1992). Cloning and intracellular localization of the U2 small nuclear ribonucleoprotein auxiliary factor small subunit. Proc Natl Acad Sci U S A 89, 8769-73.

FIGURE LEGENDS

Figure 1. Sequence of PUF60 and DPUF68 and comparison of PUF60, U2AF65 and DPUF68. A. Alignment of HPUF60 and DPUF68, with RRM and PUMP domain alignments of U2AF65 and Mud2p. The middle line shows identity (*) and similarity (:) between HPUF60 and DPUF68; underlined amino acid sequences are peptides identified by amino acid sequencing. B, C, and D. The two RRM domains and PUMP domain of HPUF60, DPUF68, Mud2p and HU2AF65 were aligned separately. Residues that are absolutely conserved between the four proteins are shown in white on a black background, while residues conserved between three of the proteins are shown with a gray background. The locations of the RNP1 and RNP2 motifs are indicated for each RRM and the PUMP domain.
[Figure 1, panels A-D: sequence alignment of HPUF60 and DPUF68 (panel A) and separate alignments of the two RRM domains and the PUMP domain of HPUF60, DPUF68, HU2AF65 and Mud2p, with the RNP1 and RNP2 motifs marked (panels B-D). The alignments are not recoverable from the extracted text.]

Figure 2. The PUMP domain is a distinct subset of the RRM domain family. A. Alignment of the PUMP domain. The top panel is an alignment of the best matches to the PUMP family consensus. The PUMP consensus and RRM consensus are shown at the bottom of the figure.
The sequence of the first RRM of U1A is shown on the bottom line. B. A dendrogram showing that the PUMP family represents a distinct subset of the RRM domain family. Forty RRM domains from the PFAM RRM seed database were aligned with the PUMP family sequences and displayed as a dendrogram. The PUMP family is boxed and is consistently found to form a distinct group from the set of RRMs in the alignment.

[Figure 2, panels A and B: PUMP domain alignment and dendrogram. Sequences in the alignment: HPUF60, DPUF68, ATD111, SPUAP2P, TAT-SF1, HU2AF65, DU2AF50, CeU2AF, SpU2AF, HU2AF35, DU2AF38, SPU2AF23, HURP, ScMud2p, SCCUS2P, YMC7_CA32, YHS7_YE159, NRD1_YE341, LU15_HU233, YAX9_SC137, NOP4_YE149, RN12_YE200 and YG3Q_YE22, followed by the PUMP consensus, the RRM consensus and the first RRM of U1A. The aligned residues and the dendrogram are not recoverable from the extracted text.]

Figure 3. HPUF60 forms SDS-resistant dimers. A. Translation in vitro of the full length PUF60 yields a predominant band of 60 kD and a band of 130 kD (lane 1); an N-terminal truncation of HPUF60 yields a 55 kD band and a 120 kD band (lane 3); co-translation of both yields the 60 and 55 kD products as well as products of 120, 125 and 130 kD (lane 2). B. Bacterially expressed His6PUF60 also forms SDS-resistant dimers. Serial two-fold dilutions of the protein (lanes 1-6) showed that monomeric His6PUF60 could be chased into dimeric form at high concentrations. A small C-terminal deletion of PUF60, His6PUF60ΔC, prepared identically to His6PUF60, does not form SDS-resistant dimers (lanes 9 and 10). C. Plot of the fraction of PUF60 in monomer and dimer forms upon boiling in SDS. In order to measure the concentration of dimer at low protein concentrations more accurately, two different quantities of protein were used in this experiment and the volume was varied to vary the concentration over the range shown. Similar results are obtained by using the method shown in A and B. D. The domain responsible for SDS-resistant dimerization was mapped by expression of PUF60 fragments, as indicated.
[Figure 3, panels A-D. Panel A: in vitro translated PUF60 and PUF60ΔN, with dimer and monomer bands and a diagram of the RRM1, RRM2 and PUMP domains. Panel B: His6PUF60 (lanes 1-8) and His6PUF60ΔC (lanes 9 and 10), with size markers at 150, 100, 75 and 50 kD. Panel C: fraction plotted against concentration of His6PUF60 (M), fit to y = m2/(1 + (1.5e-6/M0)^m1) with m1 = 1.5 ± 0.1, m2 = 0.97 ± 0.025, Chisq = 0.0057, R = 1. The gel images and plotted points are not recoverable from the extracted text.]

Figure 4. RNA binding activity of His6PUF60. A. Binding of His6PUF60 to pyrimidine-tract RNAs of different lengths was tested. Pyrimidine-tract RNAs of various lengths were incubated with a titration of PUF60 protein, and complexes formed were resolved by native gel electrophoresis. Complexes formed on the longer and not the shorter RNAs. B. Specificity of PUF60 RNA binding was determined by competing His6PUF60 binding with cold competitor RNA homo- and heteropolymers. The specificity of PUF60 binding was tested by binding PUF60 to pyr23 in the presence of varying concentrations of homo- and heteropolymeric RNAs. Amounts of added competitor RNA are given in units of 0.1 ng. C. Comparison of the binding of His6PUF60 with His6PUF60ΔC for three independent experiments. A plot of three replicates of a PUF60 binding reaction to pyr23.

[Figure 4, panels A-C. Panel A: His6PUF60 titrations on pyr23, pyr14, pyr11 and pyr7 (lanes 1-24), with bound and free RNA indicated. Panel B: competitor titrations of 0, 5, 50, 500, 5000 and 50000 (x 0.1 ng). Panel C: fraction bound (0 to 1) plotted against concentration (M) from 10^-8 to 10^-5. The gel images and plotted points are not recoverable from the extracted text.]

CHAPTER 4: SC35 MEDIATED RECONSTITUTION OF SPLICING IN U2AF-DEPLETED NUCLEAR EXTRACT

Andrew M. MacMillan, Patrick S. McCaw, John D. Crispino, and Phillip A. Sharp*
Center for Cancer Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307 USA

This chapter was published originally in the Proceedings of the National Academy of Sciences, volume 94, pp. 133-6.

ABSTRACT

Assembly of the mammalian spliceosome is known to proceed in an ordered fashion through several discrete complexes, but the mechanism of this assembly process may not be universal. In an early step, pre-mRNAs are committed to the splicing pathway through association with U1 snRNP and non-snRNP splicing factors including U2AF and members of the SR protein family. As a means of studying the steps of spliceosome assembly, we have prepared HeLa nuclear extracts specifically depleted of the splicing factor U2AF. Surprisingly, the SR protein SC35 can functionally substitute for U2AF65 in the reconstitution of pre-mRNA splicing in U2AF-depleted extracts. This reconstitution is substrate-specific and is reminiscent of the SC35 mediated reconstitution of splicing in extracts depleted of U1 snRNP. However, SC35 reconstitution of splicing in U2AF-depleted extracts is dependent on the presence of functional U1 snRNP. These observations suggest that there are at least three distinguishable mechanisms for the binding of U2 snRNP to the pre-mRNA, including U2AF dependent and independent pathways.

INTRODUCTION

Pre-mRNA splicing occurs via two sequential transesterification reactions in a 60S complex known as the spliceosome, which assembles on the pre-mRNA substrate in an ordered fashion through several discrete complexes (E, A, B, C; ref. 1).
The spliceosome includes the small nuclear ribonucleoprotein (snRNP) particles U1, U2, U4/6, and U5, as well as associated splicing factors (2-4). Commitment of a pre-mRNA substrate to assembly of the spliceosome involves the ATP-independent formation of the E (early) or commitment complex (5-8). This complex, in the mammalian system, contains U1 snRNP as well as non-snRNP protein factors including the U2 auxiliary factor, U2AF, and members of the SR protein family (8). SR proteins contain extensive serine/arginine (SR) repeats and a subset contain RNA recognition motifs (RRM); the predominant members of the family are conserved from Drosophila to humans. Many SR proteins are important splicing factors which function in both constitutive and alternative RNA splicing (9). SR proteins containing RRMs display modest affinity and sequence specificity in their association with RNA and probably bind cooperatively in association with other factors including other SR proteins (9). SR family members associate with pre-mRNAs early in spliceosomal assembly (10, 11) and this association may persist through the chemistry of splicing (12). It has been suggested that SR proteins are required for specific transitions during the course of spliceosomal assembly such as the progression from A to B complex (13). The 35 kD SR protein SC35 has been reported to stimulate E complex formation (8) and has been shown to be associated with a complex formed at the 3' end of the intron at early stages in spliceosome assembly (14). Conserved sequence elements at the 5' and 3' splice sites and in the branch sequence and pyrimidine tract of the pre-mRNA direct formation of the spliceosome (3, 4). In particular, recognition of the pyrimidine tract is required early in the formation of commitment complexes. A number of polypeptides have been reported to bind specifically to the pyrimidine tract including the U2 auxiliary factor, U2AF (15-18).
The splicing factor U2AF was first identified as an activity required for the stable association of U2 snRNP with the pre-mRNA branch site in the formation of the A complex (15). U2AF is a heterodimer consisting of a large (65 kD) and a small (35 kD) subunit (18). The large subunit, U2AF65, is an essential splicing factor containing an N-terminal SR domain and three C-terminal RRM domains. U2AF65 binds with avidity to the pyrimidine tract while U2AF35 is in turn tightly associated with the larger subunit through protein-protein interactions (17, 18).

The interactions between the various components of the commitment complex and their precise role in progression through spliceosomal assembly remain to be determined. U2AF has been detected in affinity selected commitment (E) complexes isolated by gel filtration (8). Although U2AF is clearly important for the transition from E to A complex, it has not been found to be stably associated with either the A or B/C complexes (19). In the commitment complex, U2AF is probably bound to the pyrimidine tract while U1 snRNP is associated with the 5' splice site. Far Western analysis has suggested that the SR protein SC35 is associated with both the U1 snRNP associated factor U1-70K and U2AF through interactions with the small subunit U2AF35 (20). Thus, SC35 may function as a bridge across the 5' and 3' splice sites.

In order to study the mechanism of spliceosomal assembly, we have prepared HeLa nuclear extracts depleted of the splicing factor U2AF and reconstituted splicing of pre-mRNA substrates by the combination of both column fractions and purified recombinant splicing proteins. Interestingly, the SR protein SC35 can functionally substitute for U2AF65 in this reconstitution in a manner which is both substrate specific and dependent on the presence of functional U1 snRNP.

MATERIALS AND METHODS

RNA Transcription

The PIP85.A pre-mRNA substrate was transcribed from plasmid pPIP85.A (22).
The PIPβG pre-mRNA substrate is a chimera consisting of the 5' portion of PIP85.A and the 3' portion of β-globin (23). It was transcribed from a template constructed by ligating the PCR product representing β-globin sequences between +252 and +386 (which include the last 92 nucleotides of the intron and the complete 3' exon) to pPIP85.A digested with Xba I and Hind III.

Nuclear Extracts

Nuclear extracts were prepared from HeLa cells as described by Dignam et al. (24). Extracts depleted of poly[U]-binding proteins including U2AF65 (P.S. McCaw & P.A. Sharp, chapter 2) were prepared by dialyzing nuclear extract directly into 1 M KCl/buffer D (20 mM Hepes, pH 7.9, 20% glycerol, 0.2 mM EDTA, 0.05% NP-40, 0.5 mM DTT). The resulting extract was passed over a poly[U]-Sepharose 4B column (Pharmacia) at 0.1 ml/min and subsequently dialyzed against 0.1 M KCl/buffer D. After washing this column with 1 M KCl/buffer D, the column was eluted with buffer D containing 2 M KCl and the eluate was dialyzed against 0.1 M KCl. Combination of the poly[U]-depleted nuclear extract with the 2 M KCl eluate gave an extract, ΔU2AF NE, specifically depleted of U2AF activity. The extent of U2AF depletion in this extract was assayed by Western analysis with α-U2AF65 antibody.

U2AF and SC35 Preparation

Recombinant His6-tagged U2AF65 was prepared from an insoluble E. coli lysate and was further purified by a 60% ammonium sulfate precipitation (P.S. McCaw & P.A. Sharp, chapter 2). SC35 was purified as described elsewhere (10) from baculovirus infected Hi5 insect cells. The protein recovered from phenyl-Sepharose chromatography was treated with micrococcal nuclease, concentrated by precipitation with 20 mM MgCl2, and then resuspended in buffer D.

Splicing Assays

Splicing reactions (25 µl) were performed under standard conditions (25) using 20% HeLa nuclear extract, incubated at 30°C, and resolved on 20% denaturing polyacrylamide gels.
Reactions containing U2AF-depleted extract, ΔU2AF NE, were supplemented with 500 ng of recombinant U2AF65 and/or 300 ng recombinant SC35. For the U1 blocking experiments, mock or U2AF-depleted nuclear extract was incubated in the presence of 7 µM (~15-fold excess over U1 snRNP; 26) α-U1 2'-OMe oligonucleotide (27) for 15 min at 30°C, followed by addition of substrate RNA and SR protein and incubation under splicing conditions for 60 min.

RESULTS

U2AF65 and SC35 Mediated Reconstitution of Splicing in U2AF-Depleted Extracts

In order to examine the mechanism of action of the splicing factor U2AF, HeLa nuclear extracts were depleted of this activity. HeLa nuclear extracts depleted of poly[U]-binding factors and supplemented with a 2 M KCl column fraction (P.S. McCaw & P.A. Sharp, chapter 2) were efficiently depleted of the splicing factor U2AF65 (400-fold by Western analysis; Fig. 1B and data not shown). Splicing of pre-mRNA substrates including PIPβG could be restored to the U2AF-depleted extract by the addition of either a 3 M KCl column fraction containing U2AF65 or recombinant U2AF65 (Fig. 1A and data not shown).

Combinations of U2AF-depleted extract, purified SR proteins, and recombinant U2AF65 were tested for reconstitution of splicing. Surprisingly, addition of SR proteins to nuclear extracts depleted of U2AF65 reconstituted splicing of the PIPβG pre-mRNA (data not shown). To examine this effect more closely, reconstitution reactions were carried out with specific SR proteins purified from baculovirus infected insect cells. Addition of recombinant SC35, purified from insect cells, restored splicing of the PIPβG pre-mRNA in U2AF-depleted extracts (Fig. 1A). This effect was substrate specific, since splicing of the PIP85.A pre-mRNA in depleted extract could not be reconstituted with the addition of SC35 although addition of recombinant U2AF65 resulted in reconstitution of PIP85.A splicing at the levels observed with the PIPβG pre-mRNA substrate (Fig. 2).
The SC35 reconstitution activity was not due to the presence of the insect homologue of U2AF; insect U2AF50 was detected by Western analysis in crude insect cell lysates but not in samples of purified SC35 (Fig. 1C).

Factor Dependent Splicing in U1-Blocked Extracts

SR proteins, specifically SC35, have been shown to reconstitute splicing in a substrate specific manner in reactions in which U1 snRNP has been either depleted by affinity selection (28, 23) or blocked by the pre-incubation of extract with an antisense-U1 oligonucleotide (26). Because of this observation and because U1 snRNP is a component of the commitment complex, we examined the role of U1 snRNP in the reconstitution of U2AF-depleted extracts.

U1 snRNP was inactivated in both mock and U2AF-depleted extracts by preincubation of the extracts with a 15-fold excess (over endogenous U1 snRNP; 26) of an antisense-U1 2'-OMe oligonucleotide. This blocking of U1 snRNP severely decreased the splicing of PIPβG pre-mRNA (Fig. 3; lanes 2 and 5). In accordance with previous observations (26), an excess of SC35 restored splicing activity in U1 blocked reactions in the presence of U2AF (Fig. 3; lanes 3 and 6).

More interestingly, SC35 could only restore splicing activity in U2AF-depleted reactions in the presence of functional U1 snRNP (Fig. 3; compare lanes 7 and 8). We have shown (Fig. 1) that SC35 functionally substitutes for U2AF65 in a U2AF-depleted extract (Fig. 3, lane 7). However, when U2AF-depleted extract was pre-incubated with an antisense-U1 oligonucleotide in the absence of added U2AF65, addition of recombinant SC35 alone did not restore splicing of the PIPβG pre-mRNA (Fig. 3, lane 8). Thus, SC35 can only functionally substitute for U2AF in the presence of U1 snRNP. These results distinguish a U2AF independent pathway from a U1 snRNP independent pathway (Fig. 4; ref.
28).

DISCUSSION

HeLa nuclear extracts depleted of the splicing factor U2AF were reconstituted for splicing of pre-mRNA substrates by the addition of purified recombinant SC35. Furthermore, the SR protein SC35 restored splicing in a U2AF-depleted extract in a manner that was both substrate specific and U1 snRNP dependent. This suggests that U2AF65 is not an essential factor in either spliceosome assembly or RNA splicing in the presence of high concentrations of SC35; it is likely that the enhanced concentrations of SR domains provided by the addition of SC35 complement the U2AF deficiency.

The pyrimidine tract binding protein U2AF65 has been shown previously to be required for the formation of the first stable complex formed between the pre-mRNA and U2 snRNP, the A complex (15). However, addition of purified SR proteins and more particularly recombinant SC35 can functionally substitute for U2AF65 in the reconstitution of splicing in U2AF-depleted HeLa nuclear extracts. This complementation required addition of SR protein to approximately a 10-fold excess over levels present in a typical extract.

Complementation by SC35 was substrate specific: the PIPβG pre-mRNA was efficiently spliced in an SC35 reconstituted extract while the PIP85.A pre-mRNA was not. The basis for this specificity is unclear, but it is intriguing in that it mirrors the specificity of the SC35 mediated reconstitution of splicing in extracts depleted of U1 snRNP (28, 29). There are no obvious specific SC35 binding sites (30) in the PIPβG substrate, but the specific interaction of SR proteins with the pre-mRNA substrate may only occur in the context of complex protein-RNA and RNA-RNA interactions (9).

Several lines of evidence suggest that the requirement of U2AF for in vitro splicing may not be stringent. First, it does not appear that U2AF is absolutely required for spliceosome assembly.
Green and coworkers have shown that either the SR domain of U2AF65 fused to a heterologous RNA binding domain or heterologous SR domains fused to the U2AF65 RNA binding domains can restore U2AF function in a depleted extract (31, 32). These results suggest that the function of U2AF is to position an SR domain in the vicinity of the branch region/pyrimidine tract. Second, U2AF does not appear to be present in catalytically active spliceosomes isolated by gel filtration and thus the chemistry of splicing probably occurs in the absence of U2AF (19). Finally, the reported Saccharomyces cerevisiae homolog of U2AF, MUD2, is not a required splicing factor, indicating that its function in U2 snRNP•pre-mRNA association is either not essential or redundant (33). Thus, although in the mammalian system U2AF is highly conserved, it is possible that there are a variety of mechanisms or factors responsible for directing U2 snRNP to the branch region of the pre-mRNA.

The mechanism of the SC35 complementation for U2AF65 deficiency is not clear. Members of the SR protein family have been proposed to recruit U2AF to the branch site via exon enhancers (34) and it is possible that trace amounts of U2AF in the depleted extract were recruited to a commitment complex by excess SC35. However, this seems unlikely for several reasons. First, U2AF cannot be detected in either the depleted extract or in the purified SC35 which complements the reactions; the upper limit of contamination is 1/400th of endogenous levels of U2AF (Fig. 1 and data not shown). The U2AF-depleted extracts contained at least 10-fold less U2AF than that required for restoration of splicing (as determined by adding mock depleted extract to depleted extract; data not shown).
Second, a recruitment mechanism would suggest that excess SC35 should reconstitute a U2AF-depleted extract even when U1 activity was blocked, in accordance with the observed SC35 reconstitution of U1 snRNP blocked (26) or depleted extracts (28, 29). This was not the case: pre-incubation of U2AF-depleted extract with antisense-U1 oligonucleotide effectively blocked the SC35 reconstitution (Fig. 3; see below).

It is likely that SC35 plays several roles in the reconstitution reactions including substitution for the activity of U2AF (Fig. 4). The mechanism of U2AF activity in recruitment of U2 snRNP to the branch region is not well understood but clearly involves recognition of the N-terminal SR region (18). While most SR domains are believed to be involved in protein-protein interactions, it has been suggested that the SR domain of U2AF functions as an RNA annealing activity (35). Most probably, SC35 complements the U2AF deficiency in depleted extracts by providing a surrogate SR domain required for critical interactions in the course of spliceosome assembly.

High levels of SR proteins, including SC35, promote the splicing of the PIPβG substrate in the absence of U1 snRNP (23). Under these conditions, the SR proteins facilitate the formation of the U2 snRNP containing A complex independent of U1 snRNA and the association of U6 snRNA with the substrate RNA can become rate-limiting (29). The U1 snRNP-bypass reaction does not occur with PIP85.A pre-mRNA. The observed substrate specificity is identical to that in the SC35-mediated U2AF bypass reaction. This intriguing observation might have reflected a common pathway in which the U1 snRNP independent reaction was also U2AF independent. However, this was not the case: interference with the activity of U1 snRNP inactivated splicing in U2AF-depleted extracts, even in the presence of high concentrations of SC35.
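
The qualitative outcomes reported above for the PIPβG substrate (Fig. 1 and Fig. 3) can be summarized as a small truth table. The sketch below is only an illustrative Boolean reduction, not part of the original analysis: the function name `splices_pip_beta_g` and its three arguments are hypothetical, "severely decreased" splicing is treated as no splicing, and "excess SC35" means SC35 added above endogenous levels. It does not apply to PIP85.A, which was not rescued by SC35.

```python
from itertools import product


def splices_pip_beta_g(u1_active: bool, u2af_present: bool, excess_sc35: bool) -> bool:
    """Qualitative splicing outcome for PIPbetaG (hypothetical helper).

    Encodes the three routes discussed in the text as Boolean conditions.
    """
    conventional = u1_active and u2af_present      # typical conditions
    u1_bypass = excess_sc35 and u2af_present       # U1-independent, U2AF-dependent
    u2af_bypass = excess_sc35 and u1_active        # U2AF-independent, U1-dependent
    return conventional or u1_bypass or u2af_bypass


if __name__ == "__main__":
    # Enumerate all eight combinations of the three inputs.
    for u1, u2af, sc35 in product([True, False], repeat=3):
        print(f"U1={u1!s:5} U2AF={u2af!s:5} SC35={sc35!s:5} -> "
              f"{splices_pip_beta_g(u1, u2af, sc35)}")
```

Under this encoding, excess SC35 can replace either U1 snRNP or U2AF but not both, which is the pattern seen when lanes 7 and 8 of Fig. 3 are compared.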
The common substrate specificities in the two reconstitutions perhaps reflect the sequence specificity of SC35-pre-mRNA interaction and not common splicing pathways. Thus, there are at least three different pathways to the formation of an active spliceosome: the conventional pathway requiring both U1 snRNP and U2AF65; a second pathway, which is independent of U1 snRNP but dependent on U2AF; and a third pathway which is independent of U2AF but dependent upon U1 snRNP (Fig. 4). Any one of these mechanisms could be operative for a particular intron under specific conditions in vivo. However, since the splicing of a typical intron requires both U1 snRNP and U2AF, both of these entities probably act at a common step in stabilizing the interaction of U2 snRNP with the pre-mRNA.

ACKNOWLEDGMENTS

We thank M. Green for the generous gift of U2AF65 antibody, R. Kanaar and D. Rio for the generous gift of Drosophila U2AF50 antibody, and B. Blencowe for the generous gift of anti-U1 oligonucleotide. We also thank L. Lim, J. Pomerantz, and C. Query for their critical reading of the manuscript; M. Siafaca for her ever-present assistance; M. Beddall and R. Issner for indispensable technical support. This work was supported by United States Public Health Service MERIT award R37-GM34277 and grant R01-AI32486 from the National Institutes of Health to P.A.S. and partially by a Cancer Center Support (core) grant P30-CA14051 from the National Cancer Institute. A.M.M. was supported by the Medical Research Council of Canada.

REFERENCES

1. Konarska, M.M. & Sharp, P.A. (1986) Cell 46, 845-855.
2. Guthrie, C. (1991) Science 253, 157-163.
3. Moore, M.J., Query, C.C., & Sharp, P.A. (1993) in The RNA World, eds. Gesteland, R.F. & Atkins, J.F. (Cold Spring Harbor, New York) pp. 303-357.
4. Krämer, A. (1995) in Pre-mRNA Processing, ed. Lamond, A.I. (R.G. Landes, Austin) pp. 35-64.
5. Ruby, S.R. & Abelson, J. (1988) Science 242, 1028-1035.
6. Seraphin, B. & Rosbash, M.
(1989) Cell 59, 349-358.
7. Michaud, S. & Reed, R. (1991) Genes Dev. 5, 2534-2546.
8. Staknis, D. & Reed, R. (1994) Mol. Cell. Biol. 14, 7670-7682.
9. Fu, X.-D. (1996) RNA 1, 663-680.
10. Fu, X.-D. (1993) Nature 365, 82-85.
11. MacMillan, A.M., Query, C.C., Allerson, C.R., Chen, S., Verdine, G.L., & Sharp, P.A. (1994) Genes Dev. 8, 3008-3020.
12. Blencowe, B.J., Nickerson, J.A., Issner, R., Penman, S., & Sharp, P.A. (1994) J. Cell Biol. 127, 583-607.
13. Roscigno, R.F. & Garcia-Blanco, M.A. (1995) RNA 1, 692-706.
14. Fu, X.-D. & Maniatis, T. (1992) Proc. Natl. Acad. Sci. USA 89, 1725-1729.
15. Ruskin, B., Zamore, P.D., & Green, M.R. (1988) Cell 52, 207-219.
16. Zamore, P.D. & Green, M.R. (1989) Proc. Natl. Acad. Sci. USA 86, 9243-9247.
17. Zamore, P.D. & Green, M.R. (1991) EMBO J. 10, 207-214.
18. Zamore, P.D., Patton, J.G., & Green, M.R. (1992) Nature 355, 609-614.
19. Bennett, M., Michaud, S., Kingston, J., & Reed, R. (1992) Genes Dev. 6, 1986-2000.
20. Wu, J.Y. & Maniatis, T. (1993) Cell 75, 1061-1070.
21. Robberson, B.L., Cote, G.J., & Berget, S.M. (1990) Mol. Cell. Biol. 10, 84-94.
22. Moore, M.J. & Sharp, P.A. (1992) Science 256, 992-997.
23. Crispino, J.D., Mermoud, J.E., Lamond, A.I., & Sharp, P.A. (1996) RNA 2, 664-673.
24. Dignam, J.D., Lebovitz, R.M., & Roeder, R.D. (1983) Nucleic Acids Res. 11, 1475-1489.
25. Grabowski, P.J., Padgett, R.A., & Sharp, P.A. (1984) Cell 37, 415-427.
26. Tarn, W.-Y. & Steitz, J.A. (1994) Genes Dev. 8, 2704-2717.
27. Barabino, S.M.L., Blencowe, B.J., Ryder, U., Sproat, B.S., & Lamond, A.I. (1990) Cell 63, 293-302.
28. Crispino, J.C., Blencowe, B.J., & Sharp, P.A. (1994) Science 265, 1866-1869.
29. Crispino, J.D. & Sharp, P.A. (1995) Genes Dev. 9, 2314-2323.
30. Tacke, R. & Manley, J.L. (1995) EMBO J. 14, 3540-3551.
31. Valcárcel, J., Singh, R., Zamore, P.D., & Green, M.R. (1993) Nature 362, 171-175.
32. Valcárcel, J., Gaur, R.J., Singh, R., & Green, M.R. (1996) Science 273, 1706-1709.
33. Abovich, N., Liao, X.C., & Rosbash, M.
(1994) Genes Dev. 8, 843-854.
34. Wang, Z., Hoffmann, H.M., & Grabowski, P.J. (1995) RNA 1, 21-35.
35. Lee, C.G., Zamore, P.D., Green, M.R., & Hurwitz, J. (1993) J. Biol. Chem. 268, 13,472-13,478.

FIGURE LEGENDS

Figure 1. SC35 functionally substitutes for U2AF65 in the reconstitution of pre-mRNA splicing in U2AF-depleted extracts. (A) Splicing of PIPβG pre-mRNA in mock depleted extract (lane 1); U2AF-depleted extract (lane 2); and U2AF65 reconstituted (lane 3) and SC35 reconstituted (lane 4) U2AF-depleted extract. (B) Western analysis with anti-U2AF65 antibody of mock depleted (lane 1) and U2AF-depleted (lane 2) extract. (C) Western analysis of: (left) crude lysate of SC35 overexpression with α-p50 (anti-Drosophila U2AF50 antibody) and anti-SC35 antibody (mAb 104); (right) purified SC35 with α-p50 and anti-SC35 antibody (mAb 104).

[Figure 1 image: panels A-C; lane headings NE and ΔU2AF NE; gel band labels IVS-E2, IVS, E1-IVS-E2, E1-E2; Western band labels U2AF65, p50, SC35.]

Figure 2. SC35 reconstitution of splicing in U2AF-depleted reactions is substrate specific. Splicing of PIPβG pre-mRNA in mock (lane 1), U2AF-depleted (lane 2), U2AF65 reconstituted (lane 3) and SC35 reconstituted (lane 4) extracts. Splicing of PIP85.A in mock (lane 5), U2AF-depleted (lane 6), U2AF65 reconstituted (lane 7) and SC35 reconstituted (lane 8) extracts.

[Figure 2 image: lanes 1-8; substrates PIPβG and PIP85.A; lane headings NE, ΔU2AF NE, U2AF65, SC35; gel band labels IVS-E2, IVS, E1-IVS-E2, E1-E2.]

Figure 3. SC35 reconstitutes pre-mRNA splicing in U2AF-depleted extracts dependent on the presence of functional U1 snRNP.
Splicing of PIPβG pre-mRNA in: nuclear extract (lane 1), nuclear extract pre-blocked with α-U1 oligonucleotide (lane 2), nuclear extract pre-blocked with α-U1 oligonucleotide and supplemented with SC35 (lane 3); U2AF-depleted extract supplemented with U2AF65 (lane 4), U2AF-depleted extract supplemented with U2AF65 and pre-blocked with α-U1 oligonucleotide (lane 5), U2AF-depleted extract supplemented with U2AF65, pre-blocked with α-U1 oligonucleotide, and supplemented with SC35 (lane 6); U2AF-depleted extract supplemented with SC35 (lane 7), U2AF-depleted extract blocked with α-U1 oligonucleotide and supplemented with SC35 (lane 8).

[Figure 3 image: lanes 1-8; row headings NE, ΔU2AF NE, U2AF65, α-U1, SC35; gel band labels IVS-E2, E1-IVS-E2.]

Figure 4. Three distinct pathways resulting in spliceosome assembly. Under typical splicing conditions, both U2AF and U1 snRNP are required for spliceosome assembly with a network of stabilizing interactions between U2AF, U1 snRNP, and SR proteins (B; 20). In a substrate specific manner, excess SC35 reconstitutes splicing in U2AF-depleted reactions in a U1 snRNP dependent pathway (A) and in U1 snRNP-depleted reactions in a U2AF dependent pathway (C).

[Figure 4 image: schematic of the three pathways (A, B, C) by which U2 snRNP joins the pre-mRNA to form the A complex.]

CHAPTER 5: A MINIMAL SPLICEOSOMAL A COMPLEX RECOGNIZES BRANCH SITE AND POLYPYRIMIDINE TRACTS

Charles C. Query, Patrick S. McCaw, and Phillip A. Sharp*
Center for Cancer Research and Department of Biology
Massachusetts Institute of Technology
Cambridge, MA 02139-4307

This chapter was originally published in the Journal of Molecular and Cellular Biology, vol. 17, pp. 2944-53.

ABSTRACT

The association of U2 snRNP with the pre-mRNA branch region is a critical step in the assembly of spliceosomal complexes.
We describe an assembly process that reveals both minimal requirements for formation of a U2 snRNP-substrate RNA complex, here designated Amin, and specific interactions with the branch-site adenosine. The substrate is a minimal RNA oligonucleotide, containing only a branch sequence and polypyrimidine tract. Interactions at the branch-site adenosine and requirements for polypyrimidine tract-binding proteins for Amin are the same as those of authentic pre-spliceosome complex A. Surprisingly, Amin formation does not require U1 snRNP or ATP, suggesting that these factors are not necessary for stable binding of U2 snRNP per se, but rather for accessibility of components on longer RNA substrates. Further, there is an ATP-dependent activity that releases or destabilizes U2 snRNP from branch sequences. The simplicity of the Amin complex will facilitate a detailed understanding of the assembly of pre-spliceosomes.

INTRODUCTION

The removal of introns from precursors to messenger RNA molecules (pre-mRNA) is catalyzed by the spliceosome, a dynamic 50-60S complex composed of small nuclear RNAs (snRNAs) U1, U2, U5, and U4/6, as well as protein components (for review, see 34, 39, 43, 46). Such intron excision proceeds by way of two sequential transesterification reactions. The spliceosome assembles de novo on each substrate pre-mRNA and several distinct intermediates in an assembly pathway can be observed in vitro. The E (early) or commitment complex contains U1 snRNP and non-snRNP protein factors (28, 42, 58). Complex A is generated by the stable binding of U2 snRNP to the branch region of the pre-mRNA; a larger complex, B, is formed by association of U4/5/6 tri-snRNP with complex A. Complex C follows B after significant rearrangements and contains splicing intermediates (29, 30, 43).

The branch region contains the nucleophile for the first chemical step of splicing, and its recognition is required early in splicing complex assembly.
U2 snRNP binds the pre-mRNA, in part, through U2 snRNA•branch region base-pairing (48, 69, 73), and the first step nucleophile is selected, in part, by virtue of being bulged from this duplex (51). Early branch site recognition in yeast requires U1 snRNP and a non-snRNP splicing factor, a component of which may be MUD2 (2, 55, 59). In mammals, the factors SF3a, SF3b (both of which join 12S U2 to form 17S U2 snRNP), SF1, U2AF65, U2AF35, U1 snRNP, and members of a family of proteins containing arginine-serine dipeptide repeats (SR proteins; for review see 23, 40, 66) are important for the stable association of U2 snRNP with the pre-mRNA (3, 6, 7, 9, 10, 33, 74). U2AF65 binds specifically to polypyrimidine tracts in early complexes (24, 42, 56, 70). Another factor, PUF-2 (poly[U]-binding factor-2), which contains two more polypyrimidine-binding proteins, a p54 SR protein (14, 71) and a p130, is also important for efficient complex A formation (41).

Some of the components of SF3 have been shown to cross-link to the pre-mRNA upstream of the branch region and are suggested to tether or stabilize U2 snRNP binding to the pre-mRNA (13, 25). Within the branch region, but not at the adenosine, two proteins, BPS72 and BPS70, have been cross-linked in E and A complexes, respectively (16, 52). At the branch-site adenosine itself, three proteins have been detected in complex A within 15 Å: p14, p35, and p150 (38); one of these, p14, can be directly photo-cross-linked to the branch-site adenosine (52).

ATP is required at numerous points during the splicing process and probably for multiple distinct functions, although it is not involved directly in either of the two transesterification reactions (45). Phosphorylation and dephosphorylation of SR proteins are believed to occur, as well as structural rearrangements of the snRNAs (reviewed in 23, 46, 65).
The earliest detected requirement for ATP is during the transition from E complex to complex A, when U2 snRNP joins the pre-mRNA (e.g., 15, 29, 32, 37, 42, 50). Although this has been generally interpreted to indicate that U2 snRNP binding requires ATP, the exact mechanism is unclear.

By analogy to known systems operative in the ribosome for the fidelity of translation, many steps of proofreading have been suggested during the splicing process (11). The yeast protein PRP16 may be part of a proofreading/discard pathway that examines the branched nucleotide after chemical step one, as mutant prp16 alleles increase the rate of progression to the second step of splicing of certain non-adenine branches (12, 17).

In the present study, we have determined the minimal substrate requirements for formation of complexes containing U2 snRNP. A variety of criteria indicates that this minimal complex, Amin, represents an accurate model for interactions with many factors influencing assembly of pre-spliceosome complex A. Amin formation is a more sensitive system, as it is more affected by subtle modifications than is complex A. Surprisingly, formation of Amin does not require ATP; however, the complex is subject to an ATP-dependent dissociation, which may reflect a fidelity mechanism normally operative at the time of pre-spliceosome assembly.

MATERIALS AND METHODS

RNA transcription and synthesis of substrates. pPIP85.B is a modification of pPIP85.A (44) that has only one adenosine in the branch region and encodes the following 234-nucleotide sequence:

5'-GGGCGAAUUCGAGCUCACUCUCUUCCGCAUCGCUGUCUGCGAGGUACCCUACCAG↓GUGAGUAUGGAUCCCUCUAAAAGCGGGCAUGACUUCUAGAGUAGUCCAGGGUUUCCGAGGGUUUCCGUCGACGAUGUCAGCUCGUCUCGAGGGUGCUGACUGGCUUCUUCUCUCUUUUUCCCUCAG↓GUCCUACACAACAUACUGCAGGACAAACUCUUCGCGGUCUCUGCAUGCAAGCU-3'

Arrows indicate the 5' and 3' splice sites. The branch site is the single adenosine within the branch region, A154. Nucleotides 146-179 constitute RNA(146-179), or BS-PPT RNA.
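
The coordinate convention used for the substrates, RNA(start-end) with 1-based inclusive numbering on the 234-nucleotide PIP85.B sequence, can be checked with a few lines of Python. This is a minimal sketch for orientation only; the helper name `rna` is hypothetical, and the sequence is copied from the text above with spacing removed.

```python
# PIP85.B transcript as given in the text (5' -> 3'), split only for readability.
PIP85B = (
    "GGGCGAAUUCGAGCUCACUCUCUUCCGCAUCGCUGUCUGCGAGGUACCCUACCAG"        # 5' exon
    "GU"                                                              # start of intron
    "GAGUAUGGAUCCCUCUAAAAGCGGGCAUGACUUCUAGAGUAGUCCAGGGUUUCCGAGGGUU"
    "UCCGUCGACGAUGUCAGCUCGUCUCGAGGGUGCUGACUGGCUUCUUCUCUCUUUUUC"
    "CCUCAG"                                                          # end of intron
    "GUCCUACACAACAUACUGCAGGACAAACUCUUCGCGGUCUCUGCAUGCAAGCU"          # 3' exon
)


def rna(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive), as in RNA(146-179)."""
    return seq[start - 1:end]


bs_ppt = rna(PIP85B, 146, 179)         # branch sequence + polypyrimidine tract
branch_decamer = rna(PIP85B, 146, 155)
ppt = rna(PIP85B, 156, 179)

if __name__ == "__main__":
    print(len(PIP85B), bs_ppt)
    print(branch_decamer, ppt)
```

With this numbering the intron begins with GU at positions 56-57 and ends with AG at positions 180-181, so RNA(146-179) stops two nucleotides short of the 3' splice site AG.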
Transcription of this full-length pre-mRNA and of other RNAs was performed under standard conditions (see 51).

Two-way RNA ligation reactions and gel purification of products were performed as described previously (44, 51). Briefly, oligo-ribonucleotides containing a branch sequence and polypyrimidine tract [BS-PPT RNA: RNA(146-179)] were prepared by joining a branch region decamer [RNA(146-155): 5'-GGGUGCUGAC-3'] and a 5'-32P-phosphorylated polypyrimidine tract [RNA(156-179): 5'-UGGCUUCUUCUCUCUUUUUCCCUC-3'] using T4 DNA ligase (USB) and a bridging oligonucleotide [cDNA(169-136): 5'-GAGAGAAGAAGCCAGTCAGCACCCTCGAGACGAG-3']. PPT-BS RNA [RNA(156-179, 146-155)] was prepared by joining RNA(156-179) and 5'-32P-phosphorylated branch region decamer [RNA(146-155)] using cDNA (5'-GTCAGCACCCGAGGGAAAAAGAGAGAAGAAGCC-3'). Ligation products were purified on 15% polyacrylamide (29:1), 8 M urea gels run in 1x TBE (89 mM Tris-borate, 2 mM EDTA).

All-RNA and 2,6-diaminopurine-containing branch region decamers were prepared by chemical synthesis as described (62). 2'-H substituted branch region decamers and polypyrimidine tract-containing RNA(156-179) were prepared by chemical synthesis on an Expedite 8909 oligonucleotide synthesizer (by M.J. Moore) and purified similarly. Branch region decamer containing a convertible adenosine for cross-linking experiments was described in (38).

Formation and native gel analysis of splicing complexes. To form splicing complexes, RNAs were incubated under standard splicing conditions (26) using nuclear extracts as described below; or, for ATP-depleted reactions, ATP and creatine phosphate were omitted from the mixes, which were preincubated for 15 min at 30°C to deplete endogenous ATP and, in some cases, then adjusted to 10 mM EDTA. RNAs were then added and incubated at 30°C for the times indicated.
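
The two bridging oligonucleotides used in the RNA ligations described above can be checked against the RNA pieces they are meant to align. The following is a minimal sketch, assuming simple Watson-Crick pairing between the DNA splint and the RNA; the helper `dna_to_target_rna` and the constant names are hypothetical, and the sequences are copied from the text.

```python
# DNA base -> pairing RNA base
PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def dna_to_target_rna(dna: str) -> str:
    """RNA strand (5'->3') fully complementary to a DNA oligo written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(dna))


BRANCH = "GGGUGCUGAC"                                   # RNA(146-155)
PPT = "UGGCUUCUUCUCUCUUUUUCCCUC"                        # RNA(156-179)
BRIDGE_BS_PPT = "GAGAGAAGAAGCCAGTCAGCACCCTCGAGACGAG"    # cDNA(169-136)
BRIDGE_PPT_BS = "GTCAGCACCCGAGGGAAAAAGAGAGAAGAAGCC"     # splint for PPT-BS RNA

target_bs_ppt = dna_to_target_rna(BRIDGE_BS_PPT)
target_ppt_bs = dna_to_target_rna(BRIDGE_PPT_BS)

if __name__ == "__main__":
    # The BS-PPT splint pairs with nucleotides 136-169 of PIP85.B: ten
    # nucleotides upstream of the decamer, the decamer, and the first 14
    # nucleotides of the polypyrimidine tract.
    print(target_bs_ppt)
    # The PPT-BS splint pairs with the tract (minus its first U) followed
    # by the decamer, i.e. the two pieces in swapped order.
    print(target_ppt_bs)
```

In both cases the ligation junction falls inside the region paired with the splint, with at least ten base pairs on either side of it.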
Reactions were adjusted to 0.5 mg/ml heparin and separated by electrophoresis in 50 mM Tris-glycine through non-denaturing 4% (80:1) polyacrylamide gels.

Nuclear extracts and purification of splicing factors. Nuclear extracts were prepared from HeLa cells as described (21). Extracts depleted of individual snRNPs were generous gifts from John Crispino, were prepared as described (6, 8), and were characterized in (18). Extracts depleted of poly[U]-binding proteins and U2AF65 were prepared by dialyzing nuclear extract directly into 1 M KCl/buffer D [20 mM Hepes (pH 7.9), 20% glycerol, 0.2 mM EDTA, 0.05% NP-40, 0.5 mM DTT]. The resulting extract was passed over a poly[U]-Sepharose 4B column (Pharmacia) at 0.1 ml/min and subsequently dialyzed against 0.1 M KCl/buffer D. After washing this column with 1 M KCl/buffer D, the column was eluted with buffer D containing 2 M KCl; the eluate was dialyzed against 0.1 M KCl to obtain the "2 M KCl fraction", or PUF-2, which contains the poly[U]-binding proteins p54 and p130 (41). Mock-depleted extract was prepared in parallel to depleted extract by dialyzing nuclear extract into 1 M KCl/buffer D and subsequently against 0.1 M KCl/buffer D. Recombinant His6-tagged U2AF65 was prepared from a 60% ammonium sulfate precipitate of a soluble E. coli lysate. This was loaded onto a Ni2+-NTA-agarose column (Qiagen), eluted with 250 mM imidazole/buffer D, and dialyzed into 0.1 M KCl/buffer D.

Photo-cross-linking assays. High specific activity substrate (10^7 c.p.m./reaction) containing an N6-ethylthiol-modified adenosine was reduced by treatment with 5 mM dithiothreitol (DTT) in 20 mM NaHCO3 at 30°C for 1 hr and then derivatized by reaction with 20 mM benzophenone maleimide (Molecular Probes) in 50% dimethyl formamide at room temperature for 1 hr (38). Reactions were extracted with phenol/chloroform and chloroform and then ethanol precipitated.
The RNA was incubated in HeLa nuclear extract as above except that RNasin was omitted, and was adjusted to 0.5 mg/ml heparin prior to UV irradiation with a 302-nm lamp (0.12 W/cm² at 1 cm; Ultraviolet Products) for 20 min on ice. Alternatively, cross-linking of 2,6-diaminopurine-containing RNA was performed on ice by irradiation with a 254-nm lamp (0.12 W/cm² at 1 cm; Ultraviolet Products) for 60 min. After either photo-cross-linking technique, reactions were separated on 4% (80:1) native polyacrylamide gels (29) and frozen; the individual complexes, visualized by autoradiography, were excised. These were incubated with 0.32 mg RNase A/ml of gel in 125 mM Tris-HCl (pH 6.8) at 37°C overnight, then incubated with SDS loading buffer at 37°C for 2 hours and 65°C for 5 min, and placed directly onto the stacking gel of a disassembled SDS 16% (200:1) polyacrylamide gel, which was reassembled and electrophoresed in 0.25 M Tris (pH 8.3), 0.192 M glycine, 0.1% SDS.

RESULTS

A short oligonucleotide can form complexes with U2 snRNP. U2 snRNP complexes form on full-length pre-mRNAs (complex A) as well as on 3' half RNAs that lack a 5' splice site (A3' complexes; 29). These RNAs contain a number of elements, illustrated in Figure 1A, upper, believed to contribute to complex A formation and stability. 5' to the branch site is a region that interacts with SF3a and SF3b components, which is believed to stabilize complex A (25). Surrounding the branch site is a region of U2 snRNA complementarity important for efficient complex formation (48, 69, 72, 73). The polypyrimidine tract interacts with several factors, including U2AF65 and PUF-2 (41, 56, 70), and exon enhancer sequences or downstream 5' splice sites interact with SR and other proteins or U1 snRNP to promote U2AF65 binding and complex formation (e.g., 27, 36, 61, 67).
In addition, binding of U1 snRNP and other factors to the 5' splice site probably stimulates complex formation; and, in a role that is not understood, U1 snRNP is also required for complex A formation independently of 5' splice site interaction (6). To establish minimal requirements for this process, shorter RNAs, made by deleting from both ends of a 234-nucleotide model pre-mRNA, PIP85.B RNA (which contains a well-defined branch site with only one adenosine in the region; 51), were tested for formation of A-like U2 snRNP complexes (data not shown). The shortest RNA efficiently forming a complex that co-migrated with complex A on native gels was RNA(146-179), a 34-nucleotide RNA containing only a branch sequence and polypyrimidine tract (BS-PPT RNA; Figures 1A, lower, and 1B). This complex is designated Amin, since it represents an A-like complex on a minimal substrate.

This RNA substrate notably lacks several elements discussed above that presumably contribute to efficient complex A formation. It does not contain the region thought to be the binding site for SF3a and SF3b (25). Nor does it contain any sequence 3' to the 3' splice site that could act as an exon enhancer element. In addition, it does not contain the 3' splice site AG: comparison of RNAs either containing or deleted of the 3' splice site AG or containing a mutated 3' splice site region did not show detectable differences in complex formation (data not shown). As indicated in Figure 1B and discussed in depth below, formation of Amin does not require ATP. In the analysis of truncated pre-mRNAs, RNAs containing additional sequences 3' to the BS-PPT region also formed A-like complexes in the absence of ATP (e.g., Figure 4C, lane 9), but RNAs containing additional sequences 5' to the BS-PPT region did not [RNA(1-234), RNA(64-179), and RNA(104-179); Figure 1B, lane 2, and data not shown].
Northern blot analysis of the Amin complex separated by native gel electrophoresis showed no detectable U1, U4, U5, or U6 snRNA in the complex; however, free 17S U2 snRNP (in the absence of BS-PPT RNA) migrates close to Amin in this gel system, making evaluation of the U2 snRNA content of Amin indeterminate (data not shown). To verify that Amin contained U2 snRNP, the snRNA composition was analyzed by Northern blot after streptavidin-agarose affinity selection using BS-PPT RNA containing 3'-terminal biotin [RNA(146-179, bio); Figure 1C]. The Amin complex was highly enriched for U2 snRNA (lane 2) as compared to all five snRNAs in spliceosomes formed on full-length pre-mRNA (lane 3). A small amount of U4, U5, and U6 snRNAs was selected (<5-10% of the level of U2 snRNA relative to full-length pre-mRNA); this may relate to larger, as yet uncharacterized, complexes sometimes observed after long incubations (e.g., Figure 3B, lanes 6 and 7, or Figure 4B, lane 7).

U1 snRNA was also selected using biotin-tagged BS-PPT RNA; this was not unexpected since formation of complex A on full-length pre-mRNA, as well as complex A3' on 3' partial RNA substrates, is dependent on both U1 and U2 snRNPs (6, 58). However, since the U1 snRNP association was not stoichiometric with U2 snRNP, the snRNP requirements for Amin complex formation were tested in extracts depleted of various snRNPs (6, 8). These extracts alone did not form mature spliceosomes on pre-mRNA, but when mixed they complement for spliceosome formation and for splicing (data not shown; for an analysis of these specific extracts, see 18). In particular, the extracts depleted of either U1 or U2 snRNP did not form complex A on pre-mRNA (see Figure 1 in ref 20). As expected, extracts depleted of U2 snRNP did not form Amin complex (Figure 1D, lane 3), and extracts depleted of U4/6 snRNP formed complexes just as well as mock-depleted extract (cf. lane 4 to 1).
Surprisingly, however, U1-depleted extracts also formed Amin complexes efficiently (cf. lane 2 to 1); thus, the binding of U2 snRNP to the branch region per se does not require U1 snRNP.

Both branch sequence and polypyrimidine tract are required. BS-PPT RNA contains two sequence elements - a branch sequence (i.e., U2 complementarity region; 5'-UGCUGAC-3', where the single A is the branch-site adenosine) and a polypyrimidine tract (5'-CUUCUUCUCUCUUUUUCCCUC-3') (Figure 1A, lower). To investigate the individual contributions of each of these elements, we tested RNAs containing mutations in each element (Figure 2). RNAs containing a mutated branch sequence (5'-...UGCUGAC...-3' → 5'-...GUCGUAC...-3') did not form Amin complex (Figure 2A, lane 3). Similarly, RNAs in which the polypyrimidine tract was replaced by 5'-...GACGGACAUGCAAUGCAACUC-3' did not form Amin complex (lane 2). Furthermore, RNAs containing shorter polypyrimidine tracts did not form complexes with U2 snRNP as efficiently. For example, removal of 7 or 14 pyrimidines from the 3' end [RNA(146-172) and RNA(146-165), respectively] or an internal deletion [RNA(146-155, 169-179)] in the polypyrimidine tract resulted in significantly less complex (data not shown). These data suggest that both sequence elements make specific contributions to Amin complex formation, as expected for an analog of complex A.

Both elements, the branch sequence and the polypyrimidine tract, were required in cis. As expected from the mutations tested above, neither sequence alone formed A-like complexes (Figure 2B, lanes 1 and 6). When added in trans they also could not form complexes: labeled branch sequence RNA mixed with unlabeled polypyrimidine-tract RNA did not form detectable complexes (Figure 2B, lanes 2-5); similarly, unlabeled branch sequence RNA mixed with labeled polypyrimidine-tract RNA also did not form detectable complexes (lanes 7-10).
We next tested the ability of each of the two RNAs to compete with BS-PPT RNA in complex formation. Although neither branch sequence RNA nor polypyrimidine-tract RNA formed a stable complex alone, polypyrimidine-tract RNA did compete with BS-PPT RNA (lanes 11-15). Branch sequence RNA competed with BS-PPT RNA only at the highest concentrations tested (1 µM; lanes 16-20). Therefore, although each element may interact with required factors independently at high concentrations, both elements are required in cis to form a stable complex. Furthermore, factors recognizing the polypyrimidine tract are either more limiting, required earlier in the binding process, or more critical than factors recognizing the branch sequence.

In addition, Amin complex will form only on RNAs in which the branch sequence and polypyrimidine tract elements are in the correct orientation. When the polypyrimidine tract was placed 5' of the branch sequence [PPT-BS RNA: RNA(156-179, 145-155)], no A-like complexes were detected (Figure 2C). Thus, interactions between factors binding to these two elements are sensitive to their relative positions, and the branch sequence must be 5' of the polypyrimidine tract in order to form correct interactions in making Amin.

Similarities of Amin to complex A containing U2 snRNP. In addition to the above, several lines of evidence suggest that Amin reflects many aspects of authentic complex A. For example, the ionic strength dependence of Amin complex formation corresponds to that required for splicing (Figure 3A). When assayed across a series of KCl concentrations, the optimum was 60 mM, as is found for splicing conditions (reviewed in 47). No complexes were observed at high ionic strength (>200 mM), which was previously found to stabilize the formation of pseudospliceosomes (31).
It should be noted that these high ionic strengths would be expected to stabilize simple duplexes, so destabilization of the Amin complex suggests that the latter is not simply due to RNA-RNA base-pairing. This is also supported by several other lines of evidence. When RNA-RNA pairing was enhanced by making the branch sequence perfectly complementary to U2 snRNA (5'-...UGCUGC...-3'), Amin complexes were reduced 96% (52); this contrasts with the stable binding of oligonucleotides for tagging or depletion that is via a much longer sequence complementarity to U2 snRNP (5, 35). Also, unlike the stabilizing effect of 2'-O-methyl sugars on simple hybridization, 2'-O-methylation across the branch region of BS-PPT RNA abrogated formation of Amin complexes (data not shown). Finally, when the melting temperatures of several RNA-RNA duplexes were measured in the absence of proteins, a branch sequence-U2 RNA duplex was not stable under these conditions (see 51). In contrast, after formation, Amin complex was stable to chase with excess cold competitor for greater than 4 hours at 30°C (in the absence of ATP, see below; Figure 3B, lanes 1-7). If added first, this level of competitor completely saturated the Amin complex-forming components (lanes 8-14), demonstrating that the maintenance of complexes in lanes 1-7 was not due to release and reformation. These data, together with the requirement for both branch and polypyrimidine tract sequences, argue strongly that Amin complex is not based principally on base-pairing interactions.

The factor requirements for Amin are similar to those for complex A. Assembly of U2 complexes on full-length pre-mRNA requires the presence of U2AF65 (56, 70) and is strongly stimulated by the presence of a factor PUF-2, which elutes from a poly[U] column at 2 M KCl. This factor contains primarily two polypyrimidine tract-binding proteins, p54 and p130 (41).
Extracts depleted of these factors did not support Amin formation (Figure 3C, lane 2), whereas a mock-treated extract did form Amin (lane 1). Addition of the PUF-2 fraction alone did not significantly restore activity (lane 3), and addition of recombinant U2AF65 restored only a low level of activity (lane 4). Addition of both PUF-2 and U2AF65 restored the ability to form Amin complex (lane 5), in keeping with the requirement of these protein factors for efficient formation of complex A and for splicing (41).

Previously, three proteins - p14, p35, and p150 - were photo-cross-linked to the branch site as components of both complexes A and A3', using a linker and photo-active agent that could sample distances up to 15 Å (38). The same benzophenone photo-reagent was placed site-specifically on the branch-site adenosine of BS-PPT RNA (Figure 3D). This modified RNA was incubated to form Amin complex and UV irradiated; the complexes were separated on a native gel, the Amin complexes excised and digested with RNase A, and the proteins cross-linked to the labeled RNA fragment analyzed on an SDS gel. The same three molecular weight proteins, p14, p35, and p150, were labeled within Amin as were observed within full complex A. Using direct UV irradiation, one of these three proteins, the p14, cross-linked directly to the branch-site nucleotide in Amin complex as it does in complex A (Figure 3E and 52). The other protein detected in this assay, p70, is cross-linked at another site within the branch region (see 52) and likely corresponds to BPS 70 previously observed to cross-link in complex A (16, 52). Thus, Amin contains similar components and similar interactions proximal to the branch-site adenosine as those detected in authentic complex A.

Amin complex formation is ATP independent and undergoes an ATP-dependent dissociation. Formation of complex A on full-length pre-mRNA, as well as A3' complexes on 3' partial RNAs, requires ATP (29).
Surprisingly, as was suggested in Figures 1B and 2, assembly of Amin complexes does not require ATP. Amin complex formed more efficiently (see below) in the absence of ATP (i.e., in extracts depleted of ATP; see Materials and Methods) compared to levels observed in the presence of ATP (Figure 4A, cf. lanes 8-14 to 1-7). As expected, A-type complexes did not form on full-length pre-mRNA in the absence of ATP (cf. lanes 24-25 to 22-23). Also, Amin complex formation was even more efficient, or stabilized, in the presence of EDTA (lanes 15-21); this increase may be due to many effects, including stabilization of the RNA from degradation or chelation of Mg2+ from trace levels of contaminating ATP (data not shown). Other studies have suggested that the presence of EDTA does not inhibit the formation of functional splicing complexes (1, 15). Furthermore, Amin complexes, but not A or A3' complexes, form at 4°C, albeit with slower kinetics than at 30°C, also suggesting that ATP hydrolysis is not required (data not shown).

The increase and subsequent decrease in Amin complexes in the presence of ATP (Figure 4A, lanes 1-7; and Figure 4D, curve a) suggests that two distinct processes are at work: both formation and dissociation. The increased level of Amin complexes observed in the absence of ATP (Figure 4A, lanes 8-14 or 15-21; 4D, cf. curves b to a) suggested that the dissociation process was ATP-dependent. To test whether this represented an active process, complexes were formed in the absence of ATP, challenged with excess cold competitor BS-PPT RNA, and re-incubated either with or without the addition of ATP. During this re-incubation, Amin complexes were dissociated in the presence of ATP, but not in the absence of ATP (Figure 4B upper, cf. lanes 8-13 to 2-7). This was not due to degradation of the RNA, which remained at similar levels throughout the incubations (Figure 4B lower, cf. lanes 8-13 to 2-7).
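The rise and fall of complex levels described above can be pictured with a deliberately simple kinetic sketch. It assumes, purely for illustration, that labeled RNA enters the complex in a single first-order step and leaves it in a second first-order step that operates only when ATP is present, and that released RNA does not re-enter the complex. The function name `complex_fraction` and the rate constants `K_FORM` and `K_DISS` are hypothetical and were not measured in this work.

```python
import math

def complex_fraction(t, k_form, k_diss):
    """Fraction of labeled RNA in complex at time t (min) for the
    assumed scheme: free RNA -> complex -> released RNA."""
    if k_diss == 0.0:
        # No dissociation step (ATP absent): simple saturating formation.
        return 1.0 - math.exp(-k_form * t)
    if math.isclose(k_form, k_diss):
        # Limiting form when both rate constants are equal.
        return k_form * t * math.exp(-k_form * t)
    # Standard solution for the intermediate of two sequential first-order steps.
    return k_form / (k_diss - k_form) * (
        math.exp(-k_form * t) - math.exp(-k_diss * t)
    )

# Hypothetical rate constants (per minute), chosen only to show the curve shapes.
K_FORM = 0.05
K_DISS = 0.03

times = [0, 5, 15, 30, 60, 120]
minus_atp = [complex_fraction(t, K_FORM, 0.0) for t in times]    # formation only
plus_atp = [complex_fraction(t, K_FORM, K_DISS) for t in times]  # formation and dissociation

for t, m, p in zip(times, minus_atp, plus_atp):
    print(f"{t:4d} min   -ATP {m:.2f}   +ATP {p:.2f}")
```

Under these assumptions the curve without ATP rises steadily, whereas the curve with ATP passes through a maximum and then declines, which is the qualitative behavior described for Figure 4D. The sketch is not a fit to the data.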
The dissociation of complexes required both magnesium cation and hydrolyzable nucleotide-triphosphate. For example, AMP-PCP, AMP-CPP, or AMP-PNP could not replace ATP, although other NTPs or dNTPs could (data not shown). Thus, Amin complex is a substrate for an NTP-dependent dissociation activity that results in rapid disassembly of U2 snRNP-containing complexes (Figure 4D, curve c). The level of complexes formed in the presence of ATP (curve a) is probably the sum of the two processes of complex formation without ATP (curve b) and of dissociation using ATP (curve c), indicating a dynamic assembly and disassembly of U2 snRNP complexes.

To test whether the presence of additional sequences would alter the susceptibility of the complex to the dissociation activity, the stability of Amin complexes was compared to complexes formed on RNA additionally containing a 3' splice site and 3' exon [RNA(146-234); Figure 4C]. Although this RNA ostensibly is similar to 3' half RNAs used to form A3' complex, it does not contain sequences 5' to the branch region and forms complexes in the absence of ATP, making this comparison possible. As before, preformed Amin complex dissociated rapidly when challenged with ATP and competitor BS-PPT RNA (lanes 2-8) compared to no chase (lane 1) or chase without added ATP (lane 2). In contrast, complexes containing the RNA with additional 3' sequences were relatively stable to this challenge with ATP and competitor RNA(146-234) (lanes 11-16) compared to no chase (lane 9) or chase without ATP (lane 10). Thus, the presence of additional 3' sequences stabilizes Amin complex from disassembly in the presence of ATP.

Effects of 2'-H substitutions. Formation of the Amin complex is exquisitely sensitive to branch-site modifications. In contrast, formation of complexes at the branch site of full-length pre-mRNAs is only minimally affected by branch site modifications (51, 52).
For example, a double 2'-deoxynucleotide (2'-H) substitution at the branch-site adenosine and immediately 5' to it (5'-UGCUG^HA^HC-3'; where the superscripted (^) letters indicate the 2' moiety) only slightly reduced U2 snRNP complex formation on a full-length pre-mRNA; rather, there was an accumulation of later complexes unable to undergo the first chemical step of splicing (51). In contrast, the same double substitution in BS-PPT RNA resulted in a 97% decrease in Amin complex formation relative to the all-ribose RNA (Figure 4, cf. lanes 8-14 to 1-7; Table 1). To test whether the large effect of the double 2'-H substitution at the branch site and adjacent nucleotide was specific to these positions, a similar double 2'-H substitution was prepared three and four residues 5' to the branch site (lanes 15-21). This resulted in a 70% decrease in the level of Amin complexes, significant but much less than the effect at the two positions above.

Single 2'-deoxynucleotide substitutions at the branch site or at the immediately 5' residue resulted in approximately 40% and 20% decreases in Amin complex formation, respectively (Table 1). These effects are comparable to that of a 2'-H placed four nucleotides 5'-distal to the branch site, which decreased complex formation by approximately 12%. The modest effects of single substitutions compared to the dramatic effect of two 2'-H substitutions are consistent with either position contributing an important contact (see Discussion). The strong effect of double substitutions at the branch site and 5' to it (97% decrease) is not due just to cumulative effects of individual substitutions, as two separated 2'-H substitutions resulted in only a 40% decrease (which is roughly cumulative of the individual effects) and the two adjacent substitutions discussed above inhibited only 70%. Nor are one or two 2'-H substitutions likely to alter the conformation of the branch sequence-U2 helix (4, 22, 49).
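The argument that the double substitution is more than cumulative can be checked with a short calculation. The sketch below assumes, as one simple reading of "cumulative", that independent substitutions each leave a fixed fraction of complex and that these fractions multiply; the function name `expected_decrease_if_independent` is hypothetical, and the percentages are the approximate values quoted above from Table 1.

```python
def expected_decrease_if_independent(single_decreases):
    """Expected fractional decrease in complex formation if each
    substitution acted independently, so that the remaining
    fractions of complex multiply."""
    remaining = 1.0
    for decrease in single_decreases:
        remaining *= 1.0 - decrease
    return 1.0 - remaining

# Approximate single-substitution decreases quoted in the text.
BRANCH_SITE = 0.40      # 2'-H at the branch-site adenosine
FIVE_PRIME_ADJ = 0.20   # 2'-H at the residue immediately 5' to it

expected = expected_decrease_if_independent([BRANCH_SITE, FIVE_PRIME_ADJ])
observed = 0.97         # double substitution at both positions

print(f"expected if independent: {expected:.0%}")
print(f"observed:                {observed:.0%}")
```

Under this assumption the two single effects would predict roughly a 52% decrease, well short of the 97% observed, which is consistent with the interpretation that the two positions do not act independently.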
Thus, the simplified Amin system revealed an important 2'-OH contact at the branch site.

DISCUSSION

Interactions of the pre-mRNA branch site with U2 snRNP have been studied using a minimal RNA sequence containing only the branch region and polypyrimidine tract. The Amin complex formed under these conditions is an accurate reflection of many interactions in the generation of complex A, as both are critically dependent upon the sequences of the branch region and polypyrimidine tract and both require U2AF65 and PUF-2 factors. Further, the adenine base and 2'-OH constituents of the branch site are important for formation of Amin. Finally, the protein-RNA contacts around the branch site in Amin are identical to those of complex A, as shown by two photo-cross-linking methods. These characteristics indicate that the engagement of Amin complex components with the branch site is the same as within complex A. Surprisingly, formation of Amin complex does not require ATP or U1 snRNP, indicating that these factors are not necessary for stable association of U2 snRNP with a branch sequence per se. In the absence of ATP, Amin-type complexes do not form with RNAs containing sequences upstream of the branch site, suggesting that accessibility of the branch site in these RNAs might be ATP-dependent. Finally, the association of U2 snRNP with the branch region on a minimal substrate is dynamic - rapid ATP-dependent turnover indicates the presence of an active mechanism that releases or destabilizes U2 snRNP from branch sequences.

A more sensitive system - 2'-OH and adenine interactions. The dramatic effects of 2'-deoxynucleotide and branch-site base substitutions on the formation of Amin demonstrate that this complex is more sensitive to subtle atomic changes than is pre-spliceosomal complex A.
If multiple interactions contribute to overall stability of complex A, then the absence of some of these interactions should result in complex formation being more critically dependent upon the remaining ones. In the case of the branch region, multiple weak interactions almost certainly contribute to the formation and stability of complex A (reviewed in 54). When some of these are absent - e.g., interactions at the 5' splice site, at the "U2 anchoring site" 5' to the branch region, and 3' to the polypyrimidine tract at the 3' splice site and exon enhancer sequences (e.g., 25, 36, 53, 60, 63, 66, 68, 74) - recognition of the branch site and polypyrimidine tract becomes more important. This increased sensitivity to the precise nature of chemical groups at the branch site has revealed 2'-OH and adenine contacts in Amin.

The large effect of double 2'-H substitutions at the branch and the 5' adjacent sites, unlike single substitutions that have modest effects, indicates that contacts with these two positions are critical and, in some way, cooperative. This may relate to the alternative bulging of these two positions described previously (51), which would allow either 2'-OH present to fill a similar position; or, interaction with either 2'-OH might be adequate for formation of a stable complex.

Recognition of the adenine base at the branch site is critical for Amin formation and previously was shown to contribute to complex A stability (52). Relative to complex A, Amin has enhanced dependency upon an exocyclic C6-NH2 group at the branch site, which contributes a significant positive effect; a C2-NH2 group has a significant negative effect and a C6-oxo/N1-H of guanine is strongly inhibitory. Thus, even at the time of initial U2 snRNP addition, and without any ATP-dependent transitions, there are specific contacts both with the adenine base and with 2'-OH groups in the ribose-phosphate backbone in the branch region.
Interestingly, recognition of these specific contacts strongly correlates with the direct cross-linking of a p14 protein. This protein is thus a good candidate for the component of complex A that directly recognizes the branch-site adenosine.

Amin complex forms independently of U1 snRNP and ATP. Formation of complexes in nuclear extracts depleted for individual snRNPs demonstrated that U2 snRNP, but neither U1 nor U4/6 snRNPs, is required for Amin formation. The U1 snRNP independence of Amin is surprising, since this snRNP is generally required for assembly of complexes on full-length pre-mRNAs (6). However, this requirement is not absolute, since both complex formation and splicing can sometimes occur in the absence of U1 snRNP (18-20, 64). Such U1 snRNP-independent splicing has been observed for a subset of pre-mRNAs exemplified by fushi tarazu (ftz) and with certain substrates in the presence of elevated concentrations of SR proteins; other substrates do not show U1-independent activity under any condition tested. The conditions reported here for Amin formation do not have elevated levels of SR proteins and the BS-PPT RNA is derived from a pre-mRNA that does not exhibit U1-independent splicing even with added SR proteins (18). Since Amin readily forms on the isolated branch sequence-polypyrimidine tract in the absence of U1 snRNP, the general requirement of U1 snRNP for complex A must be due to sequences external to these elements.

The mechanism requiring ATP during the formation of complex A on pre-mRNA is not known. The independence of Amin formation from this ATP requirement implies that ATP is not needed for U2 binding per se. This is consistent with previous results that suggested that U2 addition could occur without ATP in the background of a weakened U1-5' splice site interaction (37). The Amin complex does not form on RNA substrates with sequences 5' of the branch site; formation of complex A or A3' on these substrates requires ATP.
This suggests that there may be an ATP-dependent step required for exposure of the longer substrate RNA for the binding to U2 snRNP. For example, a helicase-type activity might unfold the substrate RNA for the subsequent binding of U2 snRNP; alternatively, a conformational change in U2 snRNP or another complex A factor might be necessary to allow interaction with sequences 5' to the branch region.

In the presence of ATP, the binding of U2 snRNP in complex A is probably stabilized by interactions with both U1 snRNP and SR proteins in a dynamic equilibrium. The strength of interactions with U1 snRNP and SR proteins summed with recognition of the branch region and polypyrimidine tract would determine the level of complex A. If the other interactions were weak, then the determinants for stable formation of Amin - a consensus branch region and extended polypyrimidine tract - would be critical. Thus, the sequence requirements of Amin probably reflect those of complex A at introns containing other weak splice site elements.

That formation of Amin is not as dependent upon the above dynamic processes is likely due to the simplicity of the short consensus substrate RNA. At the moment, it is conjectured that U2AF65 and the PUF-2 complex of proteins bind the short substrate RNA since these proteins tightly bind poly[U] tracts and are required for complex formation. The other components that subsequently bind the substrate are the 17S U2 snRNP complex and perhaps other factors (Figure 6). The simplicity of the Amin complex will facilitate a full analysis of the assembly of pre-spliceosomes.

An active mechanism of U2 snRNP removal. The Amin complex rapidly dissociates in the presence of ATP. This represents an active mechanism that removes or destabilizes U2 snRNP from branch sequences. There are several points in the spliceosome cycle at which such a mechanism may be required.
This process might reflect the pathway of U2 snRNP removal from excised lariat introns, although this seems unlikely as both the chemical nature of the RNA substrate (no branched nucleotide) and the snRNP complement (U2 vs. U2/6/5) are different. More likely, it could represent a destabilization of complex A interactions that normally occurs during formation of spliceosomes; this destabilization process could be a weakening of interactions needed to progress beyond complex A, or, alternatively, this dynamic process could act as a proof-reading step. We propose that the ATP-dependent step may test the fidelity or total stability of the U2 snRNP-RNA complex. If the U2 snRNP complex is not stabilized by interactions with components recognizing other splicing signals, such as U1 snRNP and SR proteins bound to nearby sequences, then U2 snRNP would dissociate (Figure 6, dashed arrow). This mechanism would preclude stable formation of an A-type complex on sequences fortuitously resembling branch sequences and polypyrimidine tracts (see 57) found within introns and exons of nuclear precursor mRNAs. The biochemical assay demonstrated here will allow characterization of this active mechanism and its role in the splicing process.

ACKNOWLEDGMENTS

We are grateful to J. Crispino for generous sharing of reagents and to M. Moore for synthesis of oligo-ribonucleotides. We thank D. Bartel, B. Blencowe, J. Crispino, L. Lim, M. Moore and R. Pulak for their critical review of this manuscript, M. Siafaca for her patience and ever-present assistance, and the members of the Sharp lab for their continued interest and support. C.C.Q. is supported by a Leukemia Society of America postdoctoral fellowship (3075-94). This work was supported by United States Public Health Service grants R01-GM34277 and R01-AI32486 from the National Institutes of Health to P.A.S. and partially by a Cancer Center Support (core) grant P30-CA14051 from the National Cancer Institute.

REFERENCES

1. Abmayr, S. M., R. Reed, and T. Maniatis. 1988. Identification of a functional mammalian spliceosome containing unspliced pre-mRNA. Proc. Natl. Acad. Sci. USA 85: 7216-7220.
2. Abovich, N., X. C. Liao, and M. Rosbash. 1994. The yeast MUD2 protein: an interaction with PRP11 defines a bridge between commitment complexes and U2 snRNP addition. Genes & Dev. 8: 843-854.
3. Arning, S., P. Grüter, G. Bilbe, and A. Krämer. 1996. Mammalian splicing factor SF1 is encoded by variant cDNAs and binds to RNA. RNA 2: 794-810.
4. Ban, C., B. Ramakrishnan, and M. Sundaralingam. 1994. A single 2'-hydroxyl group converts B-DNA to A-DNA. Crystal structure of the DNA-RNA chimeric decamer duplex d(CCGGC)r(G)d(CCGG) with a novel intermolecular G*C basepaired quadruplet. J. Mol. Biol. 236: 275-285.
5. Barabino, S. M., B. S. Sproat, U. Ryder, B. J. Blencowe, and A. I. Lamond. 1989. Mapping U2 snRNP-pre-mRNA interactions using biotinylated oligonucleotides made of 2'-OMe RNA. EMBO J. 8: 4171-4178.
6. Barabino, S. M. L., B. J. Blencowe, U. Ryder, B. S. Sproat, and A. I. Lamond. 1990. Targeted snRNP depletion reveals an additional role for mammalian U1 snRNP in spliceosome assembly. Cell 63: 293-302.
7. Behrens, S.-E., F. Galisson, P. Legrain, and R. Lührmann. 1993. Evidence that the 60-kDa protein of 17S U2 small nuclear ribonucleoprotein is immunologically and functionally related to the yeast PRP9 splicing factor and is required for the efficient formation of prespliceosomes. Proc. Natl. Acad. Sci. USA 90: 8229-8233.
8. Blencowe, B. J., and S. M. L. Barabino. 1995. Antisense affinity depletion of RNP particles: application to spliceosomal snRNPs. In M. J. Tymms (ed.), Methods in Molecular Biology, Vol. 37: In Vitro Transcription and Translation Protocols. Humana Press Inc.: Totowa, N.J. p. 67-76.
9. Brosi, R., K. Gröning, S.-E. Behrens, R. Lührmann, and A. Krämer. 1993. Interaction of mammalian splicing factor SF3a with U2 snRNP and relation of its 60-kD subunit to yeast PRP9. Science 262: 102-105.
10. Brosi, R., H.-P. Hauri, and A. Krämer. 1993. Separation of splicing factor SF3 into two components and purification of SF3a activity. J. Biol. Chem. 268: 17640-17646.
11. Burgess, S. M., and C. Guthrie. 1993. Beat the clock: paradigms in the maintenance of biological fidelity. Trends Biochem. Sci. 18: 381-384.
12. Burgess, S. M., and C. Guthrie. 1993. A mechanism to enhance mRNA splicing fidelity: the RNA-dependent ATPase Prp16 governs usage of a discard pathway for aberrant lariat intermediates. Cell 73: 1377-1391.
13. Champion-Arnaud, P., and R. Reed. 1994. The prespliceosome components SAP 49 and SAP 145 interact in a complex implicated in tethering U2 snRNP to the branch site. Genes & Dev. 8: 1974-1983.
14. Chaudhary, N., C. McMahon, and G. Blobel. 1991. Primary structure of a human arginine-rich nuclear protein that colocalizes with spliceosome components. Proc. Natl. Acad. Sci. USA 88: 8189-8193.
15. Cheng, S. C., and J. Abelson. 1987. Spliceosome assembly in yeast. Genes & Dev. 1: 1014-1027.
16. Chiara, M. D., O. Gozani, M. Bennett, P. Champion-Arnaud, L. Palandjian, and R. Reed. 1996. Identification of proteins that interact with exon sequences, splice sites, and the branchpoint sequence during each stage of spliceosome assembly. Mol. Cell. Biol. 16: 3317-3326.
17. Couto, J. R., J. Tamm, R. Parker, and C. Guthrie. 1987. A trans-acting suppressor restores splicing of a yeast intron with a branch point mutation. Genes & Dev. 1: 445-455.
18. Crispino, J. D., B. J. Blencowe, and P. A. Sharp. 1994. Complementation by SR proteins of pre-mRNA splicing reactions depleted of U1 snRNP. Science 265: 1866-1869.
19. Crispino, J. D., J. E. Mermoud, A. I. Lamond, and P. A. Sharp. 1996. Cis-acting elements distinct from the 5' splice site promote U1-independent pre-mRNA splicing. RNA 2: 664-673.
20. Crispino, J. D., and P. A. Sharp. 1995. A U6 snRNA:pre-mRNA interaction can be rate-limiting for U1-independent splicing. Genes & Dev. 9: 2314-2323.
21. Dignam, J. D., R. M. Lebovitz, and R. G. Roeder. 1983. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 11: 1475-1489.
22. Egli, M., N. Usman, and A. Rich. 1993. Conformational influence of the ribose 2'-hydroxyl group: crystal structures of DNA-RNA chimeric duplexes. Biochemistry 32: 3221-3237.
23. Fu, X. D. 1995. The superfamily of arginine/serine-rich splicing factors. RNA 1: 663-680.
24. Gaur, R. K., J. Valcárcel, and M. R. Green. 1995. Sequential recognition of the pre-mRNA branch point by U2AF65 and a novel spliceosome-associated 28-kDa protein. RNA 1: 407-417.
25. Gozani, O., R. Feld, and R. Reed. 1996. Evidence that sequence-independent binding of highly conserved U2 snRNP proteins upstream of the branch site is required for assembly of spliceosomal complex A. Genes & Dev. 10: 233-243.
26. Grabowski, P. J., R. A. Padgett, and P. A. Sharp. 1984. Messenger RNA splicing in vitro: An excised intervening sequence and a potential intermediate. Cell 37: 415-427.
27. Hoffman, B. E., and P. J. Grabowski. 1992. U1 snRNP targets an essential splicing factor, U2AF65, to the 3' splice site by a network of interactions spanning the exon. Genes & Dev. 6: 2554-2568.
28. Jamison, S. F., and M. A. Garcia-Blanco. 1992. An ATP-independent U2 small nuclear ribonucleoprotein particle/precursor mRNA complex requires both splice sites and the polypyrimidine tract. Proc. Natl. Acad. Sci. USA 89: 5482-5486.
29. Konarska, M. M., and P. A. Sharp. 1986. Electrophoretic separation of complexes involved in the splicing of precursors to mRNAs. Cell 46: 845-855.
30. Konarska, M. M., and P. A. Sharp. 1987. Interactions between small nuclear ribonucleoprotein particles in formation of spliceosomes. Cell 49: 763-774.
31. Konarska, M. M., and P. A. Sharp. 1988. Association of U2, U4, U5 and U6 small nuclear ribonucleoproteins in a spliceosome-type complex in absence of precursor RNA. Proc. Natl. Acad. Sci. USA 85: 5459-5462.
32. Krämer, A. 1988. Presplicing complex formation requires two proteins and U2 snRNP. Genes & Dev. 2: 1155-1167.
33. Krämer, A. 1995. The biochemistry of pre-mRNA splicing. In A. I. Lamond (ed.), Pre-mRNA Processing. R. G. Landes Company: Austin, TX. p. 35-64.
34. Krämer, A. 1996. The structure and function of proteins involved in mammalian pre-mRNA splicing. Ann. Rev. Biochem. 65: 367-409.
35. Lamond, A. I., B. Sproat, U. Ryder, and J. Hamm. 1989. Probing the structure and function of U2 snRNP with antisense oligonucleotides made of 2'-OMe RNA. Cell 58: 383-390.
36. Lavigueur, A., H. LaBranche, A. R. Kornblihtt, and B. Chabot. 1993. A splicing enhancer in the human fibronectin alternate ED1 exon interacts with SR proteins and stimulates U2 snRNP binding. Genes & Dev. 7: 2405-2417.
37. Liao, X. C., H. V. Colot, Y. Wang, and M. Rosbash. 1992. Requirements for U2 snRNP addition to yeast pre-mRNA. Nucleic Acids Res. 20: 4237-4245.
38. MacMillan, A. M., C. C. Query, C. R. Allerson, S. Chen, G. L. Verdine, and P. A. Sharp. 1994. Dynamic association of proteins with the pre-mRNA branch region. Genes & Dev. 8: 3008-3020.
39. Madhani, H. D., and C. Guthrie. 1994. Dynamic RNA-RNA interactions in the spliceosome. Annu. Rev. Genet. 28: 1-26.
40. Manley, J. L., and R. Tacke. 1996. SR proteins and splicing control. Genes & Dev. 10: 1569-1579.
41. McCaw, P. S., and P. A. Sharp. In preparation.
42. Michaud, S., and R. Reed. 1991. An ATP-independent complex commits pre-mRNA to the mammalian spliceosome assembly pathway. Genes & Dev. 5: 2534-2546.
43. Moore, M. J., C. C. Query, and P. A. Sharp. 1993. Splicing of precursors to mRNA by the spliceosome. In R. Gesteland and J. Atkins (ed.), The RNA World. Cold Spring Harbor Laboratory Press: New York. p. 303-357.
44. Moore, M. J., and P. A. Sharp. 1992. Site-specific modification of pre-mRNA: the 2' hydroxyl groups at the splice sites. Science 256: 992-997.
45. Moore, M. J., and P. A. Sharp. 1993. The stereochemistry of pre-mRNA splicing: evidence for two active sites in the spliceosome. Nature 365: 364-368.
46. Nilsen, T. W. 1994. RNA-RNA interactions in the spliceosome: unraveling the ties that bind. Cell 78: 1-4.
47. Padgett, R. A., P. J. Grabowski, M. M. Konarska, S. Seiler, and P. A. Sharp. 1986. Splicing of messenger RNA precursors. Ann. Rev. Biochem. 55: 1119-1150.
48. Parker, R., P. G. Siliciano, and C. Guthrie. 1987. Recognition of the TACTAAC box during mRNA splicing in yeast involves base pairing to the U2-like snRNA. Cell 49: 229-39.
49. Portmann, S., S. Grimm, C. Workman, N. Usman, and M. Egli. 1996. Crystal structures of an A-form duplex with single-adenosine bulges and a conformational basis for site-specific RNA self-cleavage. Chemistry & Biology 3: 173-184.
50. Pruzan, R., H. Furneaux, P. Lassota, G. Y. Hong, and J. Hurwitz. 1990. Assemblage of the prespliceosome complex with separated fractions isolated from HeLa cells. J. Biol. Chem. 265: 2804-2813.
51. Query, C. C., M. J. Moore, and P. A. Sharp. 1994. Branch nucleophile selection in pre-mRNA splicing: evidence for the bulged duplex model. Genes & Dev. 8: 587-597.
52. Query, C. C., S. A. Strobel, and P. A. Sharp. 1996. Three recognition events at the branch site adenine. EMBO J. 15: 1392-1402.
53. Ramchatesingh, J., A. M. Zahler, K. M. Neugebauer, M. B. Roth, and T. A. Cooper. 1995. A subset of SR proteins activates splicing of the cardiac troponin T alternative exon by direct interactions with an exonic enhancer. Mol. Cell. Biol. 15: 4898-4907.
54. Reed, R. 1996. Initial splice-site recognition and pairing during pre-mRNA splicing. Curr. Opin. Genet. Dev. 6: 215-220.
55. Ruby, S. W., and J. Abelson. 1988. An early hierarchic role of U1 small nuclear ribonucleoprotein in spliceosome assembly. Science 242: 1028-35.
56. Ruskin, B., P. D. Zamore, and M. R. Green. 1988. A factor, U2AF, is required for U2 snRNP binding and splicing complex assembly. Cell 52: 207-219.
57. Senapathy, P., M. B. Shapiro, and N. L. Harris.
1990. Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183: 252-278.
58. Séraphin, B., and M. Rosbash. 1989. Identification of functional U1 snRNP-pre-mRNA complexes committed to spliceosome assembly and splicing. Cell 59: 349-358.
59. Séraphin, B., and M. Rosbash. 1991. The yeast branchpoint sequence is not required for the formation of a stable U1 snRNP pre-mRNA complex and is recognized in the absence of U2 snRNA. EMBO J. 10: 1209-1216.
60. Staknis, D., and R. Reed. 1994. Direct interactions between pre-mRNA and six U2 small nuclear ribonucleoproteins during spliceosome assembly. Mol. Cell. Biol. 14: 2994-3005.
61. Staknis, D., and R. Reed. 1994. SR proteins promote the first specific recognition of pre-mRNA and are present together with the U1 small nuclear ribonucleoprotein particle in a general splicing enhancer complex. Mol. Cell. Biol. 14: 7670-7682.
62. Strobel, S. A., T. R. Cech, N. Usman, and L. Beigelman. 1994. The 2,6-diaminopurine riboside·5-methylisocytidine wobble base pair: an isoenergetic substitution for the study of G·U pairs in RNA. Biochemistry 33: 13824-13835.
63. Sun, Q., R. K. Hampson, A. T. Krainer, and F. M. Rottman. 1993. General splicing factor SF2/ASF promotes alternative splicing by binding to an exonic splicing enhancer. Genes & Dev. 7: 2598-2608.
64. Tarn, W. Y., and J. A. Steitz. 1994. SR proteins can compensate for the loss of U1 snRNP functions in vitro. Genes & Dev. 8: 2704-2717.
65. Umen, J. G., and C. Guthrie. 1995. The second catalytic step of pre-mRNA splicing. RNA 1: 869-885.
66. Valcárcel, J., and M. R. Green. 1996. The SR protein family: pleiotropic functions in pre-mRNA splicing. Trends Biochem. Sci. 21: 296-301.
67. Wang, Z., H. M. Hoffmann, and P. J. Grabowski. 1995. Intrinsic U2AF binding is modulated by exon enhancer signals in parallel with changes in splicing activity. RNA 1: 21-35.
68. Watakabe, A., K. Tanaka, and Y. Shimura. 1993. The role of exon sequences in splice site selection. Genes & Dev. 7: 407-418.
69. Wu, J., and J. Manley. 1989. Mammalian pre-mRNA branch site selection by U2 snRNP involves base pairing. Genes & Dev. 3: 1553-1561.
70. Zamore, P. D., J. G. Patton, and M. R. Green. 1992. Cloning and domain structure of the mammalian splicing factor U2AF. Nature 355: 609-614.
71. Zhang, W.-J., and J. Y. Wu. 1996. Functional properties of p54, a novel SR protein active in constitutive and alternative splicing. Mol. Cell. Biol. 16: 5400-5408.
72. Zhuang, Y., A. M. Goldstein, and A. M. Weiner. 1989. UACUAAC is the preferred branch site for mammalian mRNA splicing. Proc. Natl. Acad. Sci. USA 86: 2752-2756.
73. Zhuang, Y., and A. M. Weiner. 1989. A compensatory base change in human U2 snRNA can suppress a branch site mutation. Genes & Dev. 3: 1545-1552.
74. Zuo, P., and T. Maniatis. 1996. The splicing factor U2AF35 mediates critical protein-protein interactions in constitutive and enhancer-dependent splicing. Genes & Dev. 10: 1356-1368.

LEGENDS TO FIGURES

FIGURE 1. BS-PPT RNA forms an A-like complex with U2 snRNP.
(A) Schematic comparison of RNAs that form complexes A (upper) and Amin (lower). Regions that promote formation of complex A on pre-mRNA are bracketed. SS, splice site; BS, branch site adenosine; PPT, polypyrimidine tract; U1 compl., region complementary to U1 snRNA; SF3, binding site for SF3a and SF3b components; U2 compl., region complementary to U2 snRNA; exon seq., exon sequences that typically include enhancer elements, SR protein binding sites, and/or downstream 5' splice sites (none of which specifically are known to exist in this 234 nt pre-mRNA).
(B) Co-migration. BS-PPT RNA [RNA(146-179)] (lane 1) or full-length PIP85.B pre-mRNA (lanes 2 and 3) were incubated in nuclear extract at 30°C for 20 min, adjusted to 0.5 mg/ml heparin, and separated on a native 4% polyacrylamide gel. Amin, a minimal U2 snRNP complex; H, nonspecific complexes.
(C) snRNA composition.
Biotinylated RNAs were incubated in nuclear extract at 30°C for 20 min, bound to streptavidin-agarose beads, and washed. Bound complexes were digested with protease, eluted, separated on a 10% polyacrylamide (19:1) gel, transferred to Nytran, and probed with antisense RNA probes for U1, U2, U4, U5 and U6 snRNAs (30). Lane 1, beads alone; lane 2, biotinylated BS-PPT RNA [RNA(146-179, bio)]; lane 3, full-length PIP85.B pre-mRNA with biotin incorporated at random positions.
(D) Dependence on snRNPs. BS-PPT RNA was incubated in mock-depleted extract (lane 1) or extracts depleted of U1 snRNPs (lane 2), U2 snRNPs (lane 3), or U4/6 snRNPs (lane 4), and analyzed as in (A).

[Figure 1, panels A-D. Panel A schematic labels: full-length pre-mRNA (234 nt) and BS-PPT RNA (34 nt; GGGUGCUGACUGGCUUCUUCUCUCUUUUUCCCUC).]

FIGURE 2. Both branch sequence and polypyrimidine tract are required in cis.
(A) RNA(146-179) with wild-type branch sequence and polypyrimidine tract (BS-PPT RNA, lane 1), with a mutated polypyrimidine tract (5'-...CUUCUUCUCUCUUUUUCCCUC-3' → 5'-...GACGGACAUGCAAUGCAACUC-3', lane 2), or with scrambled branch sequence (5'-...UGCUGAC...-3' → 5'-...GUCGUAC...-3', lane 3) were incubated in nuclear extract at 30°C for 20 min and analyzed as in Figure 1B.
(B) Labeled branch sequence RNA (5'-GGGUGCUGAC-3', lanes 1-5), labeled polypyrimidine-tract RNA (5'-UGGCUUCUUCUCUCUUUUUCCCUC-3', lanes 6-10), or labeled BS-PPT RNA (lanes 11-20) were incubated in nuclear extract at 30°C for 20 min in the presence of 0 µM (lanes 1, 6, 11, 16), 0.001 µM (lanes 2, 7, 12, 17), 0.01 µM (lanes 3, 8, 13, 18), 0.1 µM (lanes 4, 9, 14, 19), or 1 µM (lanes 5, 10, 15, 20) cold competitor RNA. Competitor RNAs were either polypyrimidine-tract RNA (lanes 1-5, 11-15) or branch sequence RNA (lanes 6-10, 16-20).
Reactions were adjusted to 0.5 mg/ml heparin and analyzed as above.
(C) BS-PPT RNA [RNA(146-179)] (lane 1) or PPT-BS RNA [RNA(156-179, 146-155)] (lane 2) were incubated in nuclear extract at 30°C for 20 min and analyzed as above.

[Figure 2, panels A-C.]

FIGURE 3. Characteristics of Amin complex.
(A) Dependence on ionic strength. BS-PPT RNA was incubated in extracts adjusted to the KCl concentration indicated (mM), adjusted to 0.5 mg/ml heparin, and separated on a native 4% polyacrylamide gel.
(B) Stability. BS-PPT RNA was incubated in nuclear extracts depleted of ATP for 20 min to form Amin complexes, then challenged with 1 nmol/ml cold competitor RNA(146-179), reincubated for the time course indicated, and analyzed as above (lanes 1-7). Alternatively, the cold competitor BS-PPT RNA was incubated first for 20 min, and labeled BS-PPT RNA was then added and reincubated for the times indicated (lanes 8-14).
(C) Dependence on U2AF65 and PUF-2 (poly[U]-binding factor-2). BS-PPT RNA was incubated in mock-depleted extract (lane 1) or extract depleted of poly[U]-binding proteins (lane 2); or poly[U]-depleted extract supplemented with a PUF-2 fraction (lane 3), supplemented with recombinant U2AF65 (lane 4), or supplemented with both recombinant U2AF65 and the PUF-2 fraction (lane 5) and analyzed as in (A) above.
(D) Proteins that cross-link in Amin complexes using a 15 Å probe. BS-PPT RNA modified at the N6 position of the branch-site adenosine to contain benzophenone (38) was incubated to form Amin complexes, UV irradiated at 302 nm and separated on a native polyacrylamide gel. Complexes were isolated and treated with RNase, and the proteins subsequently separated on a 16% (200:1) polyacrylamide-SDS gel.
(E) Proteins that cross-link in Amin complexes within 2 Å of the branch site.
BS-PPT RNA modified at the branch site to contain 2,6-diaminopurine (52) was incubated to form Amin complexes, UV irradiated at 254 nm, and analyzed as in (D) above.

[Figure 3, panels A-E.]

FIGURE 4. Independence of Amin complex formation on the presence of ATP and dissociation of Amin complex in the presence of ATP.
(A) Time course of complex assembly for BS-PPT RNA (lanes 1-21) or for full-length PIP85.B pre-mRNA (lanes 22-27) in the presence of ATP (lanes 1-7, 22-23), absence of ATP (lanes 8-14, 24-25), or absence of ATP and presence of EDTA (lanes 15-21, 26-27). RNAs were incubated for the times indicated as described in Materials and methods, adjusted to 0.5 mg/ml heparin, and separated on a native 4% polyacrylamide gel. Amin, a minimal U2 snRNP complex; B/C, spliceosomal complexes B and C containing U2/4/5/6 snRNPs and pre-mRNA; A, pre-spliceosomal complex A containing U2 snRNP and pre-mRNA; H, nonspecific complexes; *, a faster migrating complex observed at low levels in the presence of ATP.
(B) upper. BS-PPT RNA was incubated in nuclear extract depleted of ATP for 20 min to form Amin complex (lane 1), then chased with 1 nmol/ml cold competitor RNA and reincubated for the time courses shown either in the absence (lanes 2-7) or presence of ATP (lanes 8-13). Reactions then were adjusted to 0.5 mg/ml heparin and loaded onto a native 4% polyacrylamide gel. lower. The RNA in samples from the above reactions was analyzed on a 15% (19:1) 8 M urea gel.
(C) BS-PPT RNA (lanes 1-8) or BS-PPT-3'Exon RNA [RNA(146-234); lanes 9-16] were incubated in nuclear extract depleted of ATP for 20 min to form Amin or Amin-like complexes, respectively.
Cold competitor RNA was added [cold BS-PPT RNA for lanes 1-8, or cold RNA(146-234) for lanes 9-16] and reactions were reincubated at 30°C in the presence of ATP for the time courses indicated (lanes 3-8 and 11-16) or in the absence of ATP for 60 min (lanes 2 and 10). Lanes 1 and 9, no reincubation. Reactions were then analyzed as above.
(D) Graph of kinetics of formation and dissociation of Amin complexes. For formation, complexes were assembled on BS-PPT RNA in the presence of ATP [curve a; as in (A) lanes 1-7] or in the absence of ATP [curve b; as in (A) lanes 15-21]. For dissociation, Amin complexes were first assembled by incubation for 30 min in nuclear extract depleted of ATP, and then reincubated in the presence of ATP [curve c; as in (B) lanes 8-13]. Relative complex formation was determined as the fraction of Amin complex relative to the input RNA. Polyacrylamide gels were quantitated using a Molecular Dynamics PhosphorImager and ImageQuant software version 3.22.

[Figure 4, panels A-D.]

FIGURE 5. Time course of complex assembly for RNA oligonucleotides containing 2'-H substitutions.
BS-PPT RNAs containing an all-ribose branch sequence (lanes 1-7), two 2'-deoxynucleotides at the branch site and immediately 5'-adjacent (UGCUGᴴAᴴC; lanes 8-14), or two 2'-deoxynucleotides 5'-distal to the branch position (UGᴴCᴴUGAC; lanes 15-21) were incubated for the times indicated as described in Materials and methods, adjusted to 0.5 mg/ml heparin, and loaded onto a native 4% polyacrylamide gel. See Table 1 for quantitations of effects of these and other substitutions.

[Figure 5.]

FIGURE 6. Summary of formation of complexes A (left) and Amin (right).
Schematic comparison of assembly of complexes A and Amin. At the top of the panels are shown diagrams of the full-length pre-mRNA and the minimal BS-PPT RNA substrates. Both complexes require branch sequence and polypyrimidine-tract RNA elements, U2 snRNP, and protein factors U2AF65 and PUF-2. Complex A requires U1 snRNP and ATP, which are not necessary for Amin formation. In both complexes A and Amin, three proteins are detected within 15 Å of the branch-site adenosine: p14, p35, and p150; p14 cross-links directly to the adenosine in both complexes. In the presence of ATP, U2 snRNP is released from Amin complexes (dashed arrow). Other components not shown, such as SR proteins (23, 40, 66) and SF1 (3, 31), are important for complex A formation in other systems and may also be required for Amin. The arrangement of proteins is illustrative only, and not meant to imply a known spatial order. SS, splice site; BS, branch site adenosine; PPT, polypyrimidine tract; PUF-2, poly[U]-binding factor-2; U1, U1 snRNP; U2, U2 snRNP; CC/E, commitment or early complex; A, pre-spliceosomal complex A containing U2 snRNP and pre-mRNA; Amin, a minimal substrate RNA-U2 snRNP complex.
[Figure 6.]

TABLES

TABLE 1. Relative yields for Amin complex formation of modified substrates.

Modification             Branch sequence(a)   Relative Amin complex formation(b)
all-ribose               UGCUGAC              1.0
2'-deoxyribose           UGCUGAᴴC             0.62
2'-deoxyribose           UGCUGᴴAC             0.80
2'-deoxyribose           UGᴴCUGAC             0.88
double 2'-deoxyribose    UGCUGᴴAᴴC            0.03
double 2'-deoxyribose    UGᴴCᴴUGAC            0.30
double 2'-deoxyribose    UGᴴCUGᴴAC            0.59

(a) Site-specific 2'-deoxyribose modifications at or near the branch site are indicated by the superscripted letters.
(b) Relative complex formation was determined as the fraction of Amin complex formed during a time course relative to the input RNA and normalized to the respective all-ribose-containing RNA. Polyacrylamide gels were quantitated using a Molecular Dynamics PhosphorImager and ImageQuant software version 3.22. For each band in every lane an individual background value was determined from the area in the same lane immediately above or below that band.

SPECULATIVE APPENDIX: A PROPOSED INTERACTION BETWEEN THE BRANCH ADENOSINE AND THE U5 LOOP NUCLEOTIDE URIDINE 4, A MECHANISM TO JUXTAPOSE THE SUBSTRATES OF THE FIRST STEP OF pre-mRNA SPLICING

Patrick Schonleber McCaw and Phillip A. Sharp

While much is known about the constituents and secondary structural elements that are found in the catalytic spliceosome, many questions remain about the chemical mechanism of splicing. It is not yet clear, for example, whether the splicing reaction is catalyzed by RNA or by protein. The mechanism by which the first step and second step substrates are positioned is also not known. A model for the structure of the active site has been proposed (Steitz, 1992), but it remains controversial. What is clear is that RNA secondary structural elements are important and conserved features of the spliceosome (Madhani and Guthrie, 1994; Moore et al., 1993).
These known secondary structural features do not suggest a mechanism for the juxtaposition of the 2' hydroxyl nucleophile of the branch adenosine with the phosphodiester bond of the 5' splice site (Figure 1, reprinted from Moore et al., 1993). Juxtaposition of the two substrates of the first step (the 2' hydroxyl of the branch adenosine and the phosphodiester bond of the 5' splice site) and the two substrates of the second step (the 3' hydroxyl of the 5' terminal exonic nucleotide and the phosphodiester bond of the 3' splice site) will be an essential feature of any successful model for the structure of the catalytic core of the spliceosome. This speculative appendix briefly describes a model that juxtaposes the branch adenosine 2' hydroxyl nucleophile and the phosphodiester bond of the 5' splice site. Before the model is described, a representation of the secondary structural elements that differs from the one shown in Figure 1 is presented.

A NEW REPRESENTATION OF THE SPLICEOSOMAL SECONDARY STRUCTURE

A rearrangement of the secondary structure diagram of the group I ribozyme proved to be a useful heuristic tool in understanding the structure of that ribozyme (Cate et al., 1996; Cech et al., 1994). I have arranged the standard secondary structure of the spliceosome (Madhani and Guthrie, 1994; Moore et al., 1993) in a way that mimics the method used for the group I ribozyme (Figure 2). The RNAs are indicated as vertical lines; the pre-mRNA is shown as a thick line and the snRNAs are shown as thin lines. Base pairs are indicated by horizontal lines. The rearrangement organizes the known helices in three columns. On the left are shown the 5' splice site helices with U5 snRNA (U5:5'ss) (Cortes et al., 1993; Newman and Norman, 1992) and U6 snRNA (Lesser and Guthrie, 1993; Sawa and Shimura, 1992).
In the center are shown the helices formed between U6 snRNA and U2 snRNA, both at the top (Madhani and Guthrie, 1992) and at the bottom (Sun and Manley, 1995) of the page. On the right is shown the U2 snRNA helix with the branch sequence (U2:BS). The branch adenosine is shown flipped out of the U2:BS helix. The branch adenosine is thought not to be base-paired to U2 snRNA for either chemical step (Query et al., 1994). The 2' hydroxyl and the phosphodiester bond of the 5' splice site are indicated. The positions of the introns present in U6 genes are indicated by arrows (Tani and Ohshima, 1989). These introns are suggested to be the result of a trans-splicing reaction into a U6 molecule that was subsequently incorporated into the genome, and so are thought to lie near the active center of the catalytic spliceosome. The interaction between U2 snRNA A25 and U6 snRNA G52 deduced from genetic co-variation (Madhani and Guthrie, 1992) is indicated by the gray line. Crosslinking detected between the pre-mRNA terminal nucleotide of the 5' exon and U5 snRNA is indicated by gray lines (Sontheimer and Steitz, 1993; Wyatt et al., 1992). Nucleotides that are important in yeast for the first step of splicing are in gray boxes and U2 (Fabrizio and Abelson, 1990).

The principal advantage of this secondary structural representation is that residues important for the first step of splicing are seen to cluster. This is even more apparent when the proposed interaction of Madhani and Guthrie is taken into account. This interaction would fold the top part of the U2:U6 helix back over the page and bring the boxed residues shown at the top of the page in Figure 2 close to the center of the page, where both the branch adenosine and the 5' splice site are found.
This new diagrammatic arrangement of the secondary structural elements does not itself solve the juxtaposition problem; the branch adenosine 2' hydroxyl is shown on the far right of the diagram, while the 5' splice site phosphodiester bond is shown on the left-hand side of the diagram.

DESCRIPTION OF THE MODEL

The proposed model for the secondary structure of the catalytic spliceosome requires that the branch adenosine (Ab) stack between the last nucleotide of the exon (G-1) and the first nucleotide of the intron (G1). That is, it stacks into the 5'ss:U5 helix. Based on the genetic evidence and crosslinking evidence, I propose that the branch adenosine base-pairs to Uridine 4 of the U5 snRNA loop. Base-pairing of the branch adenosine and Uridine 4 displaces the G-1:Uridine 4 base-pair, forcing G-1 to base-pair with Uridine 5. This interaction is shown in Figure 3, and the Ab:Uridine 4 base-pair is indicated in bold. This base-pair interaction juxtaposes the 2' hydroxyl and the splice site phosphodiester bond in a way that is consistent with the known stereochemistry of splicing (Maschloff and Padgett, 1992; Moore and Sharp, 1993) and aligns the 2' hydroxyl nucleophile for attack of the phosphodiester bond of the 5' splice site, as discussed in greater detail below. It should be noted that the U5 loop contains a series of four uridines (Uridine 4 to Uridine 7), and in some of the mutations of the U5 loop discussed below the register of the base-pairing scheme is not conserved. In these cases, the branch adenosine is proposed to base-pair either to Uridine 5 or to Uridine 6. Evidence that supports the 5'ss:U5 interaction in the context of the Ab:Uridine 4 base-pair will be discussed. The experiments performed to date do not directly test the model; however, the evidence is consistent with it. I will conclude with a brief description of experiments that may be done to test the model.

The branch adenosine is remarkably conserved.
It is perhaps the best conserved nucleotide in the vertebrate pre-mRNA, but surprisingly, it is not known to be base-paired to any of the snRNAs at any point in spliceosome assembly or catalysis. The branch adenosine conservation may be due to a required protein interaction; alternatively, a branch adenosine functional group could play a direct role in catalysis, or the branch adenosine could base-pair to another nucleotide. It seems unlikely that the branch adenosine base is directly involved in catalysis, as many of the substituents of the base can be changed and the first step of splicing can still occur (Query et al., 1996). Most other highly conserved nucleotides of the spliceosome have been shown to base-pair with other nucleotides; for example, the 5' splice site sequences base-pair to U1 snRNA and to U6 and U5 snRNA, and conserved nucleotides of U2 and U6 snRNAs are known to base-pair (Madhani and Guthrie, 1994). A role for branch nucleotide-protein interaction cannot be ruled out, and in fact, the branch nucleotide is in contact with several proteins during assembly and probably in the active spliceosome (MacMillan et al., 1994; Query et al., 1996). Notably, both guanosine and 2-aminopurine can act as the branch nucleophile, and both should be able to base-pair with Uridine 4.

STRUCTURAL CONSIDERATIONS

Unstacking an unpaired nucleotide from a helix is thermodynamically unfavorable, due in large part to the loss of base-stacking interactions. How might the cost of unstacking this base affect the model? It is not known what the energetic cost is of moving an unpaired nucleotide from one helix into another. For the branch adenosine unstacking from the U2-pre-mRNA helix and base-pairing with Uridine 4 while stacking into the 5'ss:U5 helix, the thermodynamics might be favorable, as two hydrogen bonds would be gained upon base-pairing with Uridine 4 and none would be lost by unstacking from the U2 helix.
Moving the phosphodiester backbones of the two helices close enough to allow this interaction will be energetically unfavorable; however, this approach must occur whether the branch adenosine is base-paired with Uridine 4 or not, as the 2' hydroxyl and its adjacent 3' and 5' phosphates must approach the phosphodiester bond of the 5' splice site. A model for a branch adenosine that is bulged from the BS:U2 helix is provided by Portmann et al., who solved the crystal structure of duplex RNA-DNA containing an unpaired adenosine (Portmann et al., 1996). In this structure, the helix from which the adenosine is bulged stacks normally; the adenosine is bulged from the helix and is stacked on a bulged adenosine of an adjacent helix. Although the crystal structure provides no measure of the thermodynamic cost of unstacking, it suggests that bulged adenosines can be removed from the helix. Furthermore, the presence of a single base on which to stack, and not a helix, may be sufficient to stabilize such a structure. The base-stacking interaction observed was believed to occur in solution as well as in the crystal (Portmann et al., 1996). This result argues that the branch adenosine can be unstacked and bulged; it could then stack into an adjacent helix. Portmann et al. also point out that the 2' hydroxyl is exposed upon unstacking and bulging, possibly facilitating its role as a nucleophile.

Why would the branch adenosine interact with Uridine 4? In order for the branch adenosine to base-pair with Uridine 4, the 5' exonic terminal nucleotide must shift register one nucleotide in the 3' direction on U5 snRNA. This means that G-1, the exonic nucleotide, will base-pair with Uridine 5 of U5 snRNA, stretching the phosphodiester bond at the 5' splice site. If the branch adenosine enters the 5'ss:U5 helix from the major groove, the 2' hydroxyl is brought into apposition with the phosphodiester bond stretched across this base.
The 2' hydroxyl nucleophile is arranged in this structure in a way that facilitates direct, in-line attack of the phosphodiester bond of the 5' splice site stretched across the branch adenosine base-pair position. This stretched phosphodiester is itself not unprecedented, as intercalating DNA dyes should stretch the helix by an equivalent distance. Further support can be found in the crystal structure of a four-stranded DNA molecule in which the two DNA helices are coaxial and the bases of each helix are intercalated with the bases of the other helix (Gehring et al., 1993).

GENETIC EXPERIMENTS THAT ARE CONSISTENT WITH THE MODEL

Is a base-pair between G-1 and Uridine 5 predicted or allowed by the known interactions of this base and U5 snRNA as determined by genetic and biochemical experiments? There is genetic evidence that the branch adenosine is base-paired with Uridine 4 in both yeast and mammalian systems, though the evidence is indirect. I will discuss the evidence presented by Newman and Norman (Newman and Norman, 1992). In these experiments Newman and Norman use an activated cryptic 5' splice site to identify the nucleotides of the U5 snRNA loop that are important in mediating splice site determination. U5 snRNA molecules that have a randomized U5 loop were selected and the sequence of the loops was determined (shown in Figure 5). Seven loop sequences were identified; the consensus sequence of these loops is 5' GCNNUAUYC 3'. Position 6, an invariant A (underlined), is interpreted to be sufficient for activating the cryptic 5' splice site used in these experiments, as Adenosine 6 would base-pair with the unusual exonic base U-1. The adjacent invariant position, Uridine 8, base-pairs with the 5' exon position A-2. Uridine 5 is described as base-pairing with G1 of the 5' splice site sequence. Why is Uridine 5 an invariant uridine and not cytosine in every U5 loop sequenced, if position 5 interacts with G1 and only G1 in these reactions?
The branch adenosine model predicts that position 5 must base-pair to both G1 and the branch adenosine, and so must be a uridine and cannot be a cytosine, as uridine and not cytosine can base-pair to both the branch adenosine and G1. Newman and Norman test mutants of the U5 loop sequence against mutations made in the 5' exon sequences of the cryptic splice site. Each of the splicing phenotypes of these mutations is consistent with a branch adenosine-Uridine 4 or Uridine 5 base-pairing interaction. None of the mutations made directly tests the proposed model.

Similar experiments have been performed by Cortes et al. in mammalian cells (Cortes et al., 1993). In this experiment mutations of the U5 loop were made, and the cryptic 5' splice site that was activated by the mutant U5 in the presence of a mutant 5' splice site was identified and sequenced. In only one case does the addition of the branch adenosine to the 5'ss:U5 helix decrease the number of hydrogen bonds in the helix; splicing to this cryptic 5' splice site is very weak. In the other cases, addition of the branch adenosine increases or does not change the total number of hydrogen bonds in this helix. In some cases where the U5 loop Uridine 4 was mutant, Uridine 6 was the base-pairing partner of the branch adenosine; more often it was Uridine 5.

Deletion of the U5 loop does not prevent splicing (O'Keefe et al., 1996). This unexpected result would seem to contradict the proposed model. However, the fact that a reaction is robust to the deletion of a well-established interaction (here, the interaction of the 5' splice site sequence with the U5 loop) should not be interpreted to mean that the interaction does not occur or that it is unimportant.
There are two examples of redundant function in splicing already described: first, the interaction of U1 snRNA with the 5' splice site is known to be redundant in splicing (Crispino et al., 1994) and, second, the interaction of U2AF with the pyrimidine tract is known to be redundant as well (chapter 4). The 5'ss:U5 helix has been demonstrated both genetically and biochemically (described below), in both vertebrates and yeast. This conserved interaction is likely to occur, though it may not be absolutely required; other factors, such as the protein p200/PRP8, may substitute for this interaction to align the 5'ss:U5 helix. The 5'ss:U5 helix, present during catalysis, is known not to inhibit splicing (Sontheimer and Steitz, 1993). Sontheimer and Steitz demonstrate that the terminal nucleotide of the exon can be crosslinked to the U5 loop early in the splicing reaction and that this crosslink does not inhibit the splicing reaction.

BIOCHEMICAL EVIDENCE

Site-specific crosslinking experiments also support a base-pairing interaction between the U5 loop and the branch adenosine. In the first experiment, Wyatt et al. placed a thio-uridine at the -2 position of the exon (tU-2) and observed a strong crosslink to Uridine 6 of the U5 loop and weaker crosslinks to adjacent residues of the U5 loop (Wyatt et al., 1992), again demonstrating the close proximity of the U5 loop and the 5' splice site sequence. These experiments were extended by positioning the thio-uridine at positions -1 and +2 of the 5' splice site and -1 of the 3' exon (Sontheimer and Steitz, 1993). Crosslinks of the 5' splice site -1 position were mapped to two positions in the U5 loop, Uridine 4 and Uridine 5. This is consistent with the model, in which G-1 is expected to base-pair with Uridine 4 prior to the branch adenosine-Uridine 4 base-pairing and subsequently to be displaced to Uridine 5.
Based on this model, we would expect that crosslinks formed between tU-1 and Uridine 5 will splice significantly more efficiently than crosslinks formed between tU-1 and Uridine 4, as crosslinks to Uridine 4 should inhibit branch adenosine intercalation with the 5' splice site sequence.

The stereochemistry of the first and second steps of splicing has been determined (Maschloff and Padgett, 1993; Moore and Sharp, 1993). Is the model consistent with the stereochemistry of the first step? It is known that the Rp phosphorothioate diastereomer, and not the Sp diastereomer, inhibits splicing. This has been interpreted to mean that the Rp diastereomer may be bound to Mg2+ (Moore and Sharp, 1993). A plastic model built to represent the U5 loop-5' splice site-branch adenosine helix demonstrated that the Sp-oxygen is directed into the center of the 5'ss:U5 helix and is unlikely to be bound by a metal due to steric constraints. In contrast, the Rp-oxygen faces out toward the U2 snRNA-branch sequence helix, where it is entirely possible that it interacts with a metal. The model is shown in figure 4. To distinguish the Rp from the Sp oxygens in the figure, the Sp oxygen is a small white ball and the Rp oxygen is indicated with an arrow.

EXPERIMENTS THAT TEST THE MODEL

Perhaps the easiest and most direct test of the model is to repeat the experiments of Newman and Norman described above (Newman and Norman, 1992), using a mutant branch residue. It is known that a G at the branch position will complete the first step of splicing in vertebrates (Query et al., 1995); if this is true for yeast as well, then it might be possible to demonstrate a compensatory mutagenesis between position 4 or 5 of the U5 loop and the branch position.
It may also be possible to use a 4-thiouridine-derivatized U5 loop to demonstrate the proposed base-pairing interaction; although tU favors RNA-RNA crosslinks in regions of non-Watson-Crick interaction, A:tU crosslinks have been observed (Sontheimer and Steitz, 1993).

ACKNOWLEDGMENTS

This speculation is the result of many long, productive conversations with Charles Query and Andrew MacMillan. They have been great teachers. I am forever grateful for their patience and kind criticism. While any credit due this speculation belongs equally to them, its inevitable shortcomings are a reflection of the author, and not the teachers. The errors in reasoning and judgment are solely my own.

REFERENCES

Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R., and Doudna, J. A. (1996). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science 273, 1678-85.
Cech, T. R., Damberger, S. H., and Gutell, R. R. (1994). Representation of the secondary and tertiary structure of group I introns. Nat Struct Biol 1, 273-80.
Cortes, J. J., Sontheimer, E. J., Seiwert, S. D., and Steitz, J. A. (1993). Mutations in the conserved loop of human U5 snRNA generate use of novel cryptic 5' splice sites in vivo. EMBO J 12, 5181-9.
Crispino, J. D., Blencowe, B. J., and Sharp, P. A. (1994). Complementation by SR proteins of pre-mRNA splicing reactions depleted of U1 snRNP. Science 265, 1866-9.
Fabrizio, P., and Abelson, J. (1990). Two domains of yeast U6 small nuclear RNA required for both steps of nuclear precursor messenger RNA splicing. Science 250, 404-409.
Gehring, K., Leroy, J. L., and Gueron, M. (1993). A tetrameric DNA structure with protonated cytosine.cytosine base pairs. Nature 363, 561-5.
Lesser, C. F., and Guthrie, C. (1993). Mutations in U6 snRNA that alter splice site specificity: implications for the active site. Science 262, 1982-8.
MacMillan, A. M., Query, C. C., Allerson, C. R., Chen, S., Verdine, G. L., and Sharp, P. A.
(1994). Dynamic association of proteins with the pre-mRNA branch region. Genes Dev 8, 3008-3020.
Madhani, H. D., and Guthrie, C. (1992). A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for catalytic activation of the spliceosome. Cell 71, 803-817.
Madhani, H. D., and Guthrie, C. (1994). Dynamic RNA-RNA interactions in the spliceosome. Annu Rev Genet 28, 1-26.
Maschloff, K. L., and Padgett, R. A. (1992). Phosphorothioate substitution identifies phosphate groups important for pre-mRNA splicing. Nucleic Acids Res 20, 1949-1957.
Maschloff, K. L., and Padgett, R. A. (1993). The stereochemical course of the first step of pre-mRNA splicing. Nucleic Acids Res 21, 5456-5462.
Moore, M. J., Query, C. C., and Sharp, P. A. (1993). Splicing of precursors to mRNA by the spliceosome. In The RNA World, R. Gesteland and J. Atkins, eds. (New York: Cold Spring Harbor Laboratory Press), pp. 303-357.
Moore, M. J., and Sharp, P. A. (1993). Evidence for two active sites in the spliceosome provided by stereochemistry of pre-mRNA splicing. Nature 365, 364-8.
Newman, A., and Norman, C. (1992). U5 snRNA interacts with exon sequences at 5' and 3' splice sites. Cell 68, 743-754.
O'Keefe, R. T., Norman, C., and Newman, A. J. (1996). The invariant U5 snRNA loop 1 sequence is dispensable for the first catalytic step of pre-mRNA splicing in yeast. Cell 86, 679-89.
Portmann, S., Grimm, S., Workman, C., Usman, N., and Egli, M. (1996). Crystal structures of an A-form duplex with single-adenosine bulges and a conformational basis for site-specific RNA self-cleavage. Chemistry & Biology 3, 173-184.
Query, C. C., Moore, M. J., and Sharp, P. A. (1994). Branch nucleophile selection in pre-mRNA splicing: evidence for the bulged duplex model. Genes Dev 8, 587-597.
Query, C. C., Strobel, S. A., and Sharp, P. A. (1995). The branch site adenosine is recognized differently for the two steps of pre-mRNA splicing. Nucleic Acids Symp Ser, 224-5.
Query, C. C., Strobel, S.
A., and Sharp, P. A. (1996). Three recognition events at the branch-site adenine. EMBO J 15, 1392-402.
Sawa, H., and Shimura, Y. (1992). Association of U6 snRNA with the 5'-splice site region of pre-mRNA in the spliceosome. Genes Dev 6, 244-254.
Sontheimer, E. J., and Steitz, J. A. (1993). The U5 and U6 small nuclear RNAs as active site components of the spliceosome. Science 262, 1989-96.
Steitz, J. A. (1992). Splicing takes a holliday. Science 257, 888-9.
Sun, J. S., and Manley, J. L. (1995). A novel U2-U6 snRNA structure is necessary for mammalian mRNA splicing. Genes Dev 9, 843-54.
Tani, T., and Ohshima, Y. (1989). The gene for the U6 small nuclear RNA in fission yeast has an intron. Nature 337, 87-90.
Wyatt, J. R., Sontheimer, E. J., and Steitz, J. A. (1992). Site-specific cross-linking of mammalian U5 snRNP to the 5' splice site before the first step of pre-mRNA splicing. Genes Dev 6, 2542-53.

FIGURE LEGENDS

Figure 1. Known base-pairing interactions of U1, U2, U5, and U6. The conventionally drawn base-pairing interactions of the pre-mRNA and U1, U2, U5 and U6 are shown. Reprinted from Moore et al. (1993).

Figure 2. The rearranged base-pairing interactions. The base-pairing interactions between the pre-mRNA and U1, U2, U5 and U6 are shown. The U2:U6 helices extend beyond the top and bottom of the page. The U6 stem-loop drawn in figure 1 is not shown, but begins at the top of the page.

Figure 3. The proposed base-pairing interaction between the branch adenosine and Uridine 4. The branch adenosine is shown base-pairing with Uridine 4 of the U5 stem-loop. Note that the exonic G (G-1) has been displaced one nucleotide 3' on the U5 loop to Uridine 5.

Figure 4. Photographs of the model of the proposed structure. A. The 5' splice site G-1/G1 U5 loop helix with the intercalated branch adenosine (Ab) is shown; the view is into the major groove. The pre-mRNA:U2 snRNA helix is shown to the left. The boxed region is shown in greater detail in B. B. Close-up view of the 5' splice site phosphodiester bond and the branch adenosine ribose sugar. The Rp oxygen is indicated; the Sp oxygen is facing out of the page.

Figure 5. Proposed base-pairing interactions of the selected U5 snRNA molecules in the experiment of Newman and Norman and selected 5' splice site sequences of Cortes et al. Selected U5 loop sequences are shown in the top half of the figure; the consensus of the selected sequences is indicated, and the sequence of the 5' splice site and stacked branch adenosine is shown. Adapted from Newman and Norman (1992), figure 2. Below the dashed line are the 5' splice site selections with mutant and wild-type U5 snRNA of Cortes et al. The branch adenosine is in outline font, exon sequences are underlined, and proposed base pairs are indicated by a dash; these base pairs include G:U pairs. In the Cortes experiment the mutant U5 bases are indicated as plain rather than bold text.
Figure 5, top half. pre-mRNA: 3' GAUGAUAUC 5'. Consensus: 5' GCNNUAUYC 3'. Selected U5 loops: GGUAUCC, ACUAUUC, UAUAUCC, CGUAUUC, UAUAUUC, CUUAUCC, CCUAUCC.

AFTERWORD

A mechanistic understanding of the process by which splice site sequence information and context information are read by the spliceosome requires an understanding of the proteins and spliceosomal complexes that form on the pre-mRNA and an understanding of the basic cell biology of splicing. The identification of a new pyrimidine tract binding factor, PUF, that is required for the efficient splicing of pre-mRNA in vitro suggests that the process of pyrimidine tract recognition is more complicated than has been anticipated. There are four outstanding issues to be resolved with regard to the PUF factor and the PUF protein PUF60.

1. Is PUF60 and/or p54 required for PUF activity? Is some other component of the PUF fraction required?
2. What is the function of the PUF activity? At what step in spliceosome assembly does it act?
3. What cytological structure is PUF60 associated with? Are the discrete nuclear bodies that stain for PUF60 PML bodies, coiled bodies, or some unidentified nuclear body? Is the staining that we observe representative of all or just a subset of the PUF60 in the cell?
4. Is the SDS-resistant dimerization representative of a native-state interaction? Is the proposed PUMP domain interaction motif G, F/Y, E/D, X, V/I, T/S responsible for the interaction of the U2AF35 PUMP domain with U2AF65?

PUF60: a splicing factor?
The principal question that needs to be answered is whether PUF60 is a splicing factor. A related question is whether the PUF60/p54 complex is responsible for the PUF activity that was purified in chapter 2. Without an antiserum that immunoprecipitates PUF60 or a reconstituting expressed PUF60/p54 fraction, it will be difficult to determine if PUF60 is a splicing factor. These two problems have been major stumbling blocks to progress. One approach that is being taken to solve this problem is to affinity-tag the PUF60 protein and express this protein in mammalian cells. This experiment is being pursued both here and at the University of Sherbrooke by Pascal Bouffard and Gilles Boire, and a collaboration has been established to share reagents. We hope to show, using this method, that the expressed, tagged protein restores splicing activity to NEAU. A large-scale immunoprecipitation (IP) with the weakly immunoprecipitating anti-p54 antibody can also be attempted, and the eluate of such an IP could be tested for reconstitution of NEAU activity. This has not previously been attempted due to the limiting quantity of this antibody (the gift of Nilabh Chaudhary).

Experiments that address functional issues

It will be interesting to determine what proteins associate with PUF60 and p54. Under what conditions do PUF60 and SF1/BBP interact? Current experiments show only a weak interaction, but this interaction may be stronger under some conditions. It will be important to determine, both on functional grounds and on pedagogical grounds, what cofactors or conditions enhance this activity. It will be of interest to determine what snRNAs and splicing protein components are present on pre-mRNAs that are incubated in NEAU in the presence of PUF or U2AF or both proteins. This experiment may most easily be done using tagged PUF and U2AF. Complexes formed can then be compared directly for snRNA content and protein content.
A similar experiment, performed using GstU2AF several years ago, demonstrated an association with both U2 and U1 snRNPs that was dependent on the presence of the 3' half pre-mRNA (P. S. M., Anna Gil and P. A. S., unpublished). A similar approach was taken in which pre-mRNAs associated with SRm160 were examined for proteins that crosslinked to the pyrimidine tract. This experiment may be revisited to determine a time course of the association of PUF60 and p54 with the pre-mRNA. More straightforward, short-term experiments include direct immunoprecipitation of tagged PUF60 and determination of what proteins associate with PUF60 in the cell, using a panel of antibodies directed against various nuclear proteins, including the splicing factors SRm160, SF1/BBP, SF3a components, U2AF65 and 35, Urp, PTB, Sm, Ro and p54. To demonstrate that the interaction between PUF60 and p54 is conserved in evolution, an immunoprecipitation using the weakly immunoprecipitating p54 antiserum will be attempted from insect cells, and the pellet will be tested for the presence of DPUF68 using the PUF60 antiserum. It is expected that, due to the high degree of conservation, the antiserum directed against the human proteins will cross-react with the Drosophila proteins.

The human autoimmune antigen Ro, of unknown function, immunoprecipitates PUF60 (Pascal Bouffard and Gilles Boire, personal communication). Does Ro immunoprecipitate the PUF splicing activity as well? Does PUF60-associated Ro associate with the pre-mRNA? It should be determined if the Ro IP pellet contains PUF activity. It is possible that Ro immunoprecipitates will contain p54 as well, and this should be tested. As Ro's function is unknown, any associated proteins that are suggestive of function are of interest.

p54 translated in vitro and added to NEAU, in the presence or absence of U2AF or in vitro translated PUF60, caused degradation of the pre-mRNA (data not shown).
Unprogrammed and control-programmed lysate did not have this effect. The mechanism of degradation is not known, but it is phenomenologically interesting.

PUF60 immunolocalization: is the native protein detectable?

PUF60, detected in situ by immunofluorescence experiments, does not localize to the expected nuclear structures. The p54 protein, shown to co-purify with and associate with PUF60, localizes to the speckle structures, as do most splicing factors. The lack of PUF60 staining of this structure and the unexpectedly faint staining that is observed suggest that the PUF60 epitopes are masked in situ, as they are in vitro. To determine if this is the case, fixed cells will be treated with denaturing agents, SDS and heat, and the cells will be stained for PUF60. We expect to see increased staining under these conditions. This experiment is being actively pursued. Tagged PUF60 will also be transfected into cells and the localization of the expressed protein determined using anti-tag antibodies.

Is PUF60 a dimer in the native state? Is the PUMP domain bound to the PUMP domain interaction motif?

One easy way to test this hypothesis is to transfect cells with tagged PUF60 and determine by immunoprecipitation whether this tagged protein associates with untagged endogenous protein. Native-state dimerization can also be tested using native gel electrophoresis, crosslinking, analytical ultracentrifugation and gel filtration chromatography. It will also be of interest to see if the Drosophila homolog of PUF60, DPUF68, is an SDS-resistant dimer. The U2AF35 PUMP domain may interact with a short peptide motif of U2AF65. This may be tested by mutagenesis and far western blotting, as performed by Zhang et al. (1992).