Identification of Novel Branch Points Reveals Insights into RNA Processing

by

Genevieve Michelle Gould

B.A. Molecular and Cell Biology with an emphasis in Genetics, Genomics, and Development
University of California, Berkeley (2009)

Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2015

© Massachusetts Institute of Technology 2015. All rights reserved.

Signature of Author
Department of Biology
August 31, 2015

Certified by
Christopher B. Burge
Professor of Biology
Thesis Supervisor

Accepted by
Michael Hemann
Associate Professor of Biology
Co-Chair, Biology Graduate Committee

Identification of Novel Branch Points Reveals Insights into RNA Processing

by

Genevieve Michelle Gould

Submitted to the Department of Biology on August 31, 2015 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology

Abstract

Pre-mRNA splicing is a ubiquitous process necessary for the production of functional eukaryotic mRNAs. The branch point (BP) sequence is one of three key nucleotide sequences required for pre-mRNA splicing; however, in metazoa it has been less comprehensively studied than the 5' splice site (5'SS) and 3' splice site (3'SS) because it is comparatively difficult to identify.
5'SS and 3'SS are readily identified by aligning spliced cDNAs, ESTs, or RNA-seq reads to the genome, while lower-throughput techniques such as primer extension are usually required to map BPs, with some exceptions. To understand how the BP affects splicing outcomes, we developed an experimental method to locate BPs on a genome-wide scale. Applying our method to Saccharomyces cerevisiae (S. cerevisiae), one of the only eukaryotes for which most BPs are known, allowed us to assess the sensitivity and specificity of our method. We enriched for RNA lariats by isolating RNA from debranching enzyme null yeast and purified circular RNAs (including lariats) from linear RNAs using a 2D PAGE gel. This was followed by a custom library preparation protocol that produced insert ends that identified the BP and 5'SS of individual lariats. Using this method, we located known BPs and discovered a substantial number of novel BPs both in annotated introns and other genomic regions. We attempted to verify these novel introns using RNA-seq and Lariat-seq and surprisingly observed considerable amounts of alternative splicing (AS) in S. cerevisiae beyond the previously known stress-regulated intron retention events and a handful of alternative splice sites. Additionally, we observed several introns with 2 BPs and one intron with 3 BPs. In the LSM2 transcript, we showed that alternative BP usage was associated with alternative splice site usage, where one of the mRNA isoforms contains a premature termination codon and leads to nonsense-mediated mRNA decay of the transcript. This suggests AS may control gene expression levels in yeast, as is known to be the case in metazoans. Preliminary application of our method to Drosophila melanogaster revealed recursive splicing, a phenomenon previously known to occur only in introns larger than 10 kb, in a 383 nt intron.

Thesis supervisor: Christopher B. Burge
Title: Professor of Biology

Acknowledgements

I’d like to begin by thanking my advisor, Chris Burge, for allowing me to join his lab and pursue
a risky project that let me combine my desire to perform both experimental and computational
biology research. The Burge lab has been a great environment for me to learn and grow. Thank
you Chris for being receptive to my requests over the years, agreeing to meet with me regularly
to discuss my research and allowing me to present my findings at several scientific venues.
To my committee members, Phil Sharp and Tom RajBhandary, thank you for all of your helpful
advice over the years. Also, Robin Reed, thank you for agreeing to serve on my thesis committee
and for providing me with the HeLa Nuclear Extracts that were essential to the success of my
research.
Next, thank you to all the members of the Burge lab, past and present, who have made the lab a
great environment for doing research. I appreciate all you have taught me through sharing your
own knowledge of techniques and through your efforts critiquing my presentations and writing
over the years. Special thanks to Nicole for encouraging me to purify yeast DBR1 protein which
was the key to getting my protocol to work, to Athma for patiently helping me learn R, Alex,
Jason, Noah, Charles, Maria, Peter F. and Peter S. for teaching me new Python tricks, Matt for
insightful suggestions on ways to plot data, Eric for initial ideas pertaining to my project, Jess for
talking some sense into me when trying to get last minute experiments to work the night before
group meeting, Reut for helpful conversations over her late-morning breakfast in the dry lab, Joe
for always being upbeat and being a wonderfully motivated guy to work with, and to
Jennifer, Dan, Caitlin, Razvan, Robin, Yarden, Albert, Rob, Vincent, Monica, Yevgenia, Chetan,
Abby, Cassie, Ritu, Dima, Daniel, Phil, and Brad for making my time in the lab so memorable.
Thank you to my collaborators Boris, Yuchun, and Joe for countless conversations and
questions; they have been some of the best parts of grad school.
I’d also like to thank all of my friends in the building, especially all of my 2nd and 3rd floor
neighbors for making the lab a lively place to do science, providing moral support, and
organizing fun extracurricular activities.
Thank you to my classmates. It’s been great bouncing ideas off of you and it has been
comforting to know I always have good friends nearby. I believe the bonds we have formed will
last a lifetime and I look forward to learning of everyone’s future accomplishments. Also, thank
you to my BBS friends. It’s been fun to observe the differences between the MIT and Harvard
Biology PhD programs over the years and it’s been wonderful having more friends in the area
who understand the time requirements of research. Also thank you to my roommates, past and
present, who have always been there for me when I needed to unwind at the end of the day.
Thanks to MIT’s extracurricular activities, I’ve been able to maintain a work-life balance. Thank
you to the friendly staff and volunteers at the MIT Sailing Pavilion, members of the MIT Figure
Skating Club, and volunteers at the MIT Rock Wall for creating positive outlets.
Thank you to my friends from home. Even though some of you admitted you probably wouldn’t
understand what I was studying, you were always willing to give it a try and wanted to catch up
anyway. Thank you to my college friends, especially the Cal Sailing Team, who still make the
time to get together even though we are now scattered across the globe. And to those Cal Sailors
whom I discuss scientific topics with from afar, I look forward to our future conversations about
scientific breakthroughs, and what the general public thinks of them.
Thank you to Mike Eisen for allowing me to experience what computational biology was all
about first hand. If I hadn’t worked in your lab, I wouldn’t have come to grad school. Thank you
to my additional mentors outside the lab, Kim Hamad-Schifferli, Frank Solomon, and Alan
Grossman, who have provided me with valuable advice over the years.
I would like to especially thank my best friend, Dr. Lauren Barclay, for always being there for
me. As we both know, grad school can be trying at times, and having my best friend nearby, who
was going through a PhD herself, was the best thing I could have asked for. Thanks for making
time to catch up and getting me out of the lab to enjoy New England!
I’d like to thank my high school biology teacher for instilling in me my love of biology. Mr. Van
Loo was an excellent teacher who really worked hard to make the subject matter he was teaching
interesting and memorable. I’ll never forget when he dressed up as a hockey player to demonstrate
the Calvin Cycle, bringing a puck of “carbon” in the open “stomata” door to show us where the
carbon went and what happened to it once it entered the “cell” classroom, or the time when he
had a student volunteer stand on a chair, hold a couple of branches, and try, to no avail, to drink
water through a long straw from a water bottle on the floor to demonstrate why transpiration was
important for plants to transport water from their roots to their branches. He made biology fun
and accessible. It was also through his course that I learned about the UC Davis Young Scholars
Program and ended up having my first of many research experiences.
I’d like to thank my extended family in the Boston area that made Cambridge a home away from
home for me. It’s been great spending time with you, especially since we lived so far apart while
I was growing up. I’ve really enjoyed all of our great meals together, Red Sox games, trips to the
Cape and other outings. Also, thank you for opening your home to me after the Boston Marathon
bombing. A special thank you to the officers who protect MIT, especially Officer Sean Collier.
To my grandparents, thank you for always wanting to hear about my latest endeavors. To my
“little” brother, thanks for being born after me, you would have been a tough act to follow. I’ve
enjoyed all of our fun East Coast visits and appreciate all your advice over the years. Finally, I
would like to thank my mom and dad for everything they have given me. Without them, there’s
no way I would be where I am today. They have always been there for me, from the endless hours of practicing vocabulary words and spelling in elementary school, to coaching my soccer teams, to caring for me after injuries from said soccer, to taking me on unforgettable family vacations. You taught me to be persistent and it has definitely paid off. Thank you for your invaluable love and support; I couldn’t have done it without you.

-Genny Gould

Table of Contents

Abstract
Acknowledgements
Table of Contents
Chapter 1: Introduction
  Overview
  Pre-mRNA splicing
    Spliceosomal splicing and self splicing
    Consequences of alternative splicing
    Unconventional intron removal
  Branch points
    Discovery
    BP identification
    BP characteristics: motifs and locations
    Functional roles of BPs: effects of location, mutations, and altered recognition
  Lariats
    Sources of lariats
    Lariat turnover: debranching
    RNAs processed from lariats
    Lariats versus circular RNAs
  Sequencing technologies
  Thesis overview
  References
Chapter 2: Identification of New Branch Points and Unconventional Introns in Saccharomyces cerevisiae
  Abstract
  Introduction
  Results
    Branch-seq accurately identifies locations of 75% of expressed, annotated BPs
    Branch-seq identifies novel BP and associated 5'SS
    Over 100 additional introns and splice sites in the yeast genome
    New splice sites have distinctive features and conservation
    AT-AC splice sites are used in yeast
    Multi-BP introns occur in at least twelve genes and can impact gene expression
    Changes in splicing among growth conditions
  Discussion
  Methods
  Data access
  Acknowledgements
  Author contributions
  Supplemental figures
  Tables
  References
Chapter 3: Conclusions
  Implications
  Future directions
    BP sequencing approaches
    Advice for future development of BP sequencing approaches
    Additional applications of BP sequencing
  Final remarks
  References
Appendix I: Branch-seq Protocol
  Part 1: Branch-seq protocol
    Pre-protocol steps:
    Branch-seq protocol:
  Part 2: Advice for future BP sequencing protocols
  Figures
  References
Appendix II: Supplemental Tables to Chapter 2
  Table II-S1. Branch-seq BP peaks paired 5'SS motifs.
  Table II-S2. GEM-BP and winBP peaks
  Table II-S3. GTATGT motif frequency at 5'SS and generally in introns.
  Table II-S4. Branch-seq CPMs.
  Table II-S5. SacCer 3 coordinates of lariat junction reads
  Table II-S6. Novel splice junctions with entropy ≥ 2 bits.
Appendix III: BP Identification in Metazoans
  Abstract
  Introduction
  Methods
  Results
    Knockdown of ldbr does not result in a noticeable accumulation of lariat RNA
    Fly Branch-seq reads largely do not map to the fly genome
    Fly Branch-seq reads identify the first recursive splice site in a short intron
  Discussion
  Supplemental note
  Acknowledgments
  References

Chapter 1: Introduction

Overview

In eukaryotes, most intron-containing pre-mRNAs require splicing in the nucleus before they are exported to the cytoplasm (Hocine, Singer, & Grünwald, 2010). Pre-mRNA splicing is the ubiquitous process by which intervening sequences, introns, are removed from pre-mRNAs and exonic sequences are joined together as part of the mRNA maturation process (Padgett, Konarska, Grabowski, Hardy, & Sharp, 1984). This is accomplished through two successive transesterification reactions that can produce constitutive or alternative splicing patterns. During alternative splicing, the exons of one pre-mRNA are joined together in different combinations to produce two or more distinct mRNAs, termed isoforms. mRNA isoforms may differ in their translation, stability, or localization (Hocine et al., 2010). These mRNA isoforms often code for different proteins, contributing greatly to the diversity of the proteome (Matlin, Clark, & Smith, 2005). Though the value of alternative splicing has been appreciated for some time, much remains to be understood about how splicing is regulated.

Pre-mRNA splicing

Spliceosomal splicing and self splicing

The branch point (BP) sequence is one of three key nucleotide sequences required for splicing of precursors to mRNAs. It is typically located near the 3' end of the intron, between the two other required sequences, the 5' splice site (5'SS) and the 3' splice site (3'SS), which identify the ends of the intron.
All three of these sequences are absolutely required for spliceosome-mediated splicing because they participate in the chemistry of splicing. The spliceosome is composed of an array of RNAs and proteins that assemble on the pre-mRNA in a step-wise manner. During assembly of the spliceosome, the 5'SS is recognized by the U1 small nuclear ribonucleoprotein (snRNP) through complementarity between the U1 small nuclear RNA (snRNA) and the 5'SS sequence. The BP is first recognized by the BP binding protein (BBP) (yeast) or splicing factor 1/mammalian BBP (SF1/mBBP) (mammals), and the polypyrimidine tract and 3'SS are recognized by U2AF2 and U2AF1, respectively (Fig. 1-1A). The U2 snRNP subsequently replaces SF1/BBP and the U2 snRNA base pairs with the BP sequence, forming a structure in which the BP nucleotide, typically an adenosine embedded inside the BP motif, is bulged from the RNA duplex (Langford & Gallwitz, 1983; Query, Moore, & Sharp, 1994; Wahl, Will, & Lührmann, 2009), preparing the RNA for the first transesterification reaction of splicing. The base pairing of the U2 snRNA with the BP is stabilized by the SF3a and SF3b complex components of the U2 snRNP (Gozani, Feld, & Reed, 1996). Next, the pre-assembled U4/U6.U5 tri-snRNP is recruited to the splicing complex and then the U1 and U4 snRNPs are released. Once this step occurs, the first splicing reaction creates an unusual 2'-5' RNA linkage between the 2' OH of the BP nucleotide and the 5'SS. This reaction results in the formation of a lariat structure attached to the downstream exon, leaving the upstream exon with a free 3' OH. The second transesterification reaction joins the two exons together and frees the lariat (Meyer, Plass, Pérez-Valle, Eyras, & Vilardell, 2011; Padgett et al., 1984).
The lariat is rapidly debranched and degraded in most cases (Chapman & Boeke, 1991; Corvelo, Hallegger, Smith, & Eyras, 2010; Folco & Reed, 2014; Ruskin & Green, 1985), making BP identification difficult. In contrast, splice site identification is relatively straightforward because spliced alignment of cDNAs to the genome reveals the locations of splice sites.

Figure 1-1: Intron removal. (A) Two steps of splicing for spliceosome-mediated splicing showing 5'SS, 3'SS, BP, U1 snRNP, U2 snRNP, BBP, and U2AF. Adapted from (Alberts et al., 2007). (B) Recursive splicing. Ratchet introns are involved in splicing of large Drosophila introns, including Ubx intron 1, kuz intron 3, and osp introns 1 and 2. Adapted from (Burnette, Miyamoto-Sato, Schaub, Conklin, & Lopez, 2005). (C) Nested intron splicing. Some mammalian introns, including introns in the human gene EPB41, contain nested introns. Adapted from (Parra, Tan, Mohandas, & Conboy, 2008). 5'SS: white dotted line. 3'SS: grey dotted line.

Not all introns are removed by the major spliceosome. First, the minor spliceosome often splices out introns that have /AT 5'SS and AC/ 3'SS (where “/” represents the boundary with exonic sequence), though it has been shown that both the major and minor spliceosomes can splice introns with /GT-AG/ or /AT-AC/ termini (Dietrich, Incorvaia, & Padgett, 1997). The only snRNP shared between the major and minor spliceosome is the U5 snRNP, with the minor spliceosome containing the U11, U12, U4atac, and U6atac snRNPs that are functionally analogous to the major spliceosomal snRNPs described above (reviewed by (Patel & Steitz, 2003)). Second, introns may be removed by self splicing, as is the case for Group I and Group II introns. Group I introns use a free nucleotide, typically a guanosine, as the nucleophile for the first step of splicing.
In contrast, Group II intron splicing is quite similar to that of spliceosomal introns in that the 2'OH of a nucleotide embedded in the intron sequence itself, often an adenosine, is used as the nucleophile in the first step of splicing (Bonen & Vogel, 2001). Additionally, Group II introns usually conform to a particular secondary structure that consists of an elaborate series of stem loops (Sharp, 1991). Group I and Group II introns can often self splice in vitro in the absence of proteins, but the efficiency may be augmented in vivo by specific proteins that are generally unrelated to spliceosomal proteins (Cech, 1990). A third type of intron removal occurs in eukaryotic and archaeal tRNAs, where introns can be removed by a series of RNA cleavage and ligation steps that differ from spliceosomal splicing and self splicing (reviewed by (Abelson, Trotta, & Li, 1998; Phizicky & Hopper, 2010)). Bacterial tRNA introns can be removed by self splicing (Biniszkiewicz, Cesnaviciene, & Shub, 1994; Kuhsel, Strickland, & Palmer, 1990; Reinhold-Hurek & Shub, 1992).

Consequences of alternative splicing

Alternative splicing of pre-mRNAs can have many different functional consequences at both the RNA and protein levels. For instance, SRP75 mRNA is destabilized by splicing in an extremely well conserved (“ultra-conserved”) exon (Lareau, Inada, Green, Wengrod, & Brenner, 2007; Ni et al., 2007). Localization of RNAs can be altered by splicing as well, as in the case of oskar in Drosophila, where splicing causes the mRNA to localize to the posterior pole of the oocyte, whereas the unspliced mRNA can be seen diffusely throughout the ooplasm (Hachet & Ephrussi, 2004). Similarly, splicing can change the localization of the protein, as in the case of Nop30, where splicing of the mRNA alters the C-terminus of the protein, changing the protein’s localization between the nucleus and cytoplasm (Stoss, Schwaiger, Cooper, & Stamm, 1999).
Alternative splicing is regulated at the tissue and organism level and is important for development. While gene expression is tissue-specific, alternative splicing is conserved in only a subset of tissues and is often organism- or lineage-specific (Barbosa-Morais et al., 2012; Merkin, Russell, Chen, & Burge, 2012). Additionally, the splicing of certain introns can contribute to the proper timing of gene expression that is critical for development, as is the case for Hes7 in mouse somite segmentation (Takashima, Ohtsuka, González, Miyachi, & Kageyama, 2011).

Unconventional intron removal

Though BPs are typically located near the 3' ends of introns, BPs located far away from the 3'SS have been observed in the cases of recursive splicing in flies and humans and nested intron splicing in humans (Burnette et al., 2005; Duff et al., 2015; Hatton, Subramaniam, & Lopez, 1998; Sibley et al., 2015). Recursive splicing is achieved by splicing the 5'SS to a sequence inside the intron that resembles a 3'SS immediately adjacent to a second 5'SS (Fig. 1-1B). Splicing continues in this fashion until the next exon is reached. To date, recursive splicing has only been observed in fly and human introns that are larger than 10 kbp in length (Burnette et al., 2005; Duff et al., 2015). In nested intron splicing, a central segment of a large intron is initially removed, followed by splicing of the remainder of the intron using the normal 5'SS and 3'SS (Fig. 1-1C) (Ott, Tamada, Bannai, Nakai, & Miyano, 2003; Parra et al., 2008).

Branch points

Discovery

In 1982, Wallace and Edmonds discovered branched RNA that contained a 2' to 5' phosphodiester bond. They observed that branching occurred in the nuclear RNA fraction as opposed to the cytoplasmic RNA fraction and observed that the branched nucleotide is often an adenosine (Wallace & Edmonds, 1983).
Shortly thereafter, in 1983, the BP motif in budding yeast was proposed after a detailed deletion analysis of the 3' end of the actin intron identified a region of the intron near the 3'SS that was necessary for splicing. Comparison to the three other budding yeast introns sequenced at the time revealed the presence of the same TACTAAC motif near the 3' ends of all four introns (Langford & Gallwitz, 1983). After the sequence of the Saccharomyces cerevisiae (S. cerevisiae) genome was released in 1996, researchers sought to comprehensively identify yeast introns and test those predictions (Davis, 2000; Spingola, Grate, Haussler, & Ares, 1999). Additionally, BPs were computationally predicted in annotated yeast introns based on a combination of their unusually strong motif (Fig. 1-2A) and location relative to the 3'SS (Davis, 2000; Meyer et al., 2011). Computational BP predictions in S. cerevisiae have been limited to annotated introns; however, additional yeast introns are still being discovered today using genome-wide assays (Kawashima, Douglass, Gabunilas, Pellegrini, & Chanfreau, 2014; Z. Zhang, Hesselberth, & Fields, 2007). Thus, any BPs that fell outside of the intron annotations at the time would have been missed.

BP identification

Historically, BPs have been much more challenging to identify than splice sites in a high-throughput manner. Splice site identification can be accomplished by aligning a cDNA back to its parent genome to determine the missing intronic sequence. BPs, on the other hand, are best identified from lariat RNAs. The short half-lives and unusual branched structure of lariats require additional methods to pinpoint BP locations. Traditionally, BPs have been experimentally verified using more laborious low-throughput techniques such as primer extension, in vitro splicing, and RT-PCR across the lariat 5'SS-BP junction (Padgett et al., 1985; Vogel, Hess, & Börner, 1997; Wahl et al., 2009).
To identify a BP using primer extension, a gene specific primer is designed to prime reverse transcription (RT) starting in the 3' exon. RT often stops at the branched nucleotide, revealing the location of the BP based on the product size and sequence (Fig. 1-2B). In vitro splicing can be used to splice a gene of interest and typically is combined with mutagenesis experiments of the presumptive BP region, or primer extension on the splicing products, to locate the BP nucleotide. RT-PCR across the lariat 5'SS-BP junction identifies a BP by the juxtaposition of the 5'SS sequence to the BP sequence in the PCR product (Fig. 1-2B). Because RT rarely crosses the 5'SS-BP junction, gene specific primers have traditionally been used to amplify such RT products for sequencing.

Figure 1-2: BP characteristics. (A) BP and SS motifs. Figure from (Lim & Burge, 2001). (B) Classical methods for experimental BP identification. (C) Re-splicing results in a BP inside of a CDS. Adapted from (Kameyama, Suzuki, & Mayeda, 2012). (D) Number of known 5'SS, 3'SS, BP based on estimates from Hg18 and (Gao, Masuda, Matsuura, & Ohno, 2008; Mercer et al., 2015; Taggart, DeSimone, Shih, Filloux, & Fairbrother, 2012). (E) Mutually exclusive splicing of α-tropomyosin as a result of unusual BP location near a 5'SS. (F) BP mutations in the XPC gene are associated with xeroderma pigmentosum (Khan et al., 2004).

Application of such techniques has allowed identification of dozens of human BPs and revealed discrepancies with computational BP predictions. In 2008, the first large scale experimental study identified ~100 human BPs using RT-PCR on 293T cell RNA (Gao et al., 2008). This approach targeted 52 introns using nested PCR primer pairs, similar to Figure 1-2B (top), and found that only 50% of their sequenced BPs agreed with those generated by a predictive algorithm, demonstrating the value of experimental BP validation (Gao et al., 2008).
While BP prediction algorithms commonly use proximity to the 3'SS as a predictive feature, a number of studies have found examples of distant BPs located more than 100 nucleotides away from the 3'SS (Grossman et al., 1998; Hallegger, Sobala, & Smith, 2010). Additionally, BPs located very far from the 3'SS have been observed in the cases of recursive splicing and nested intron splicing (see above). Existing algorithms would also fail to predict a BP if it were located in a coding sequence (CDS), as occurs in re-splicing of specific mRNAs in cancer (Kameyama et al., 2012) (Fig. 1-2C). Distant BPs, unannotated introns, CDS BPs, and poor agreement between predictive algorithms and experimentally validated BP locations support the utility of an untargeted experimental approach to identify BPs genome-wide.

Alternative high-throughput approaches to identify BP locations have only been developed recently. All of these approaches have been made possible by recent advances in sequencing technologies that allow routine sequencing of millions of short, heterogeneous cDNA fragments. When these fragments, termed “reads”, are generated from mRNAs, the collection of reads yields information about relative gene expression and splicing levels. This type of data, known as RNA sequencing (RNA-seq), is generated by selecting poly(A) tailed RNAs or depleting ribosomal RNAs from total RNA in order to isolate mRNAs. Fragmentation of the mRNAs followed by random hexamer priming creates cDNA fragments for sequencing. Once sequenced, the short reads are aligned back to the reference genome for downstream computational analyses. Generally, a small fraction of the reads will not align, or “map”, to the genome. These unmapped reads arise from a combination of technical and biological sources. In the last few years, new computational analyses of RNA-seq data have been used to identify BPs.
In 2012, Taggart and colleagues identified split reads that cross the 5'SS to BP junction in existing RNA-seq data from reads that do not map to the genome contiguously. This approach resulted in the identification of ~900 human BPs (Taggart et al., 2012). A drawback of this approach is the extremely low efficiency: out of 1.2 billion reads analyzed, only 2,118 (0.0002%) crossed the 5'SS to BP junction. Increasing the fraction of lariat junction reads relative to total reads would make this split read mapping approach more appealing for global identification of BPs. The following year, Awan and colleagues developed a method that addressed this enrichment problem. Their method, Lariat-seq, specifically sequences lariat RNAs. Using Lariat-seq, they discovered novel introns and splicing events in Schizosaccharomyces pombe (S. pombe) and identified ~900 BPs using a variation of the split read mapping strategy originally developed by Taggart et al. (Awan, Manfredo, & Pleiss, 2013). A year later, Bitton and colleagues came up with a variation on the computational split read mapping algorithm to find BPs, termed LaSSO. Applied to the human dataset used by Taggart et al., LaSSO found a largely different set of BPs than the study by Taggart and colleagues (Bitton et al., 2014). These discrepancies indicate that it is likely more BP locations remain to be gleaned from existing RNA-seq datasets through further development of computational algorithms.

More recently, a novel targeted BP sequencing approach found ~60,000 human BPs (Mercer et al., 2015). The success of this method was largely due to the strategies used to enrich for informative 5'SS to BP traversing reads. However, the targeted approach used in this study made use of oligonucleotide probes designed to map near annotated 5' and 3' ends of introns and thus was unlikely to find the unusual BPs discussed above.
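The split read idea behind these computational approaches can be illustrated with a minimal sketch. It is not the algorithm of Taggart et al., Awan et al., or LaSSO; it assumes a simplified read model in which a junction read consists of intron sequence ending at the BP followed directly by the sequence at the 5' end of the intron, and it requires exact matches, so it ignores the mismatches and skipped bases that RT often introduces at the BP. The function name, the `min_arm` parameter, and the toy intron are hypothetical.

```python
def find_branch_point(read, intron, min_arm=8):
    """Return the 0-based intron index of the putative BP, or None.

    Assumed read model: intron sequence ending at the BP, immediately
    followed by the sequence at the 5' end of the intron (the 5'SS).
    """
    for split in range(min_arm, len(read) - min_arm + 1):
        bp_arm, ss_arm = read[:split], read[split:]
        # The second arm must match the very start of the intron.
        if not intron.startswith(ss_arm):
            continue
        # The first arm must match further downstream and ends at the BP.
        hit = intron.find(bp_arm, len(ss_arm))
        if hit != -1:
            return hit + len(bp_arm) - 1
    return None


# Hypothetical toy intron containing a TACTAAC branch site.
toy_intron = "GTATGTTCATCGGAACCTTGGAATTTACTAACAAGTTCTTTCCATAG"
toy_bp = toy_intron.index("TACTAAC") + 5          # last A of TACTAAC
junction_read = toy_intron[toy_bp - 11:toy_bp + 1] + toy_intron[:12]
contiguous_read = toy_intron[5:29]                # ordinary intronic read
```

Calling `find_branch_point(junction_read, toy_intron)` returns the index of the branch adenosine in the toy intron, whereas the contiguous read yields `None` because no split of it places the start of the intron after a downstream segment.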
Based on the number of constitutive and alternative 5' and 3'SS known, it is certain that many tens of thousands of mammalian BPs remain to be discovered (Fig. 1-2D).

BP characteristics: motifs and locations

Years of study using both experimental and computational techniques have revealed consensus motifs of the three required splicing sequences. Among budding yeast, worms, flies, plants, and human, yeast has the strongest BP motif (Fig. 1-2A). The BP motif is highly constrained in S. cerevisiae with ~90% of annotated BPs matching the TACTAAC motif perfectly (Spingola et al., 1999), contrasted with metazoans and plants which have a highly degenerate BP motif (yUnAy) (where y = C or U and n = any base) (Chapman & Boeke, 1991; Folco & Reed, 2014; Gao et al., 2008; Lim & Burge, 2001; Ruskin & Green, 1985). Budding yeast also contain the most information in their 5'SS motifs and the least information at their 3'SS compared to these other organisms (Fig. 1-2A). These differences contribute to the accuracy of splicing predictions across different organisms. Previous work found that Drosophila melanogaster and Caenorhabditis elegans short introns contain most of the information necessary for their recognition by the splicing machinery. S. cerevisiae introns also contain much of this information, but not enough to clearly identify the 3'SS, whereas human and plant introns do not contain enough information in their splice site motifs to accurately predict splicing outcomes (Lim & Burge, 2001).

Known BPs are typically located near the 3' ends of introns. In S. cerevisiae, BPs are often easily found 20-45 nt upstream of the 3'SS due to the strong BP motif and short intron size (Meyer et al., 2011). These properties have allowed computational prediction of a BP in every S. cerevisiae intron, but not in other organisms.
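As a rough illustration of why the yeast motif lends itself to prediction, the sketch below scans an intron for TACTAAC and keeps hits whose branch adenosine lies 20-45 nt from the intron's 3' end, and separately lists every match to the degenerate yUnAy consensus. This is not the procedure used by Davis (2000) or Meyer et al. (2011); the function names, the DNA-letter encoding (U written as T), and the convention of measuring distance from the branch A to the end of the intron sequence are assumptions made for the example.

```python
import re

YEAST_BP = re.compile("TACTAAC")
# yUnAy in DNA letters (U written as T); lookahead allows overlapping hits.
YUNAY = re.compile("(?=[CT]T[ACGT]A[CT])")


def yeast_bp_candidates(intron, min_dist=20, max_dist=45):
    """List (branch_index, distance) for TACTAAC hits inside the window."""
    hits = []
    for match in YEAST_BP.finditer(intron):
        branch = match.start() + 5            # final A of TACTAAC
        distance = len(intron) - branch       # assumed distance convention
        if min_dist <= distance <= max_dist:
            hits.append((branch, distance))
    return hits


def yunay_branch_positions(intron):
    """List the index of the A in every yUnAy match."""
    return [match.start() + 3 for match in YUNAY.finditer(intron)]


# Hypothetical toy intron: 5'SS, filler, branch site, pyrimidine run, 3'SS.
toy_intron = "GTATGT" + "A" * 30 + "TACTAAC" + "T" * 25 + "AG"
```

In this toy intron both scans agree on the same adenosine, because the CTAAC at the end of TACTAAC is itself a yUnAy match. In a real metazoan intron the five-letter consensus would typically match at many positions, which is one way to see why motif matching alone is a weak predictor outside of yeast.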
Nevertheless, in 2010 a computational study predicted human BPs using sequence conservation, predicted U2 snRNA binding stability, and intronic position (Corvelo et al., 2010). This study found that BP strength and distance to the 3'SS correlate strongly with alternative splicing, suggesting a role for the BP in determining splicing outcomes. Interestingly, in budding yeast, when the BP to 3'SS distance is larger than ~45 nt there is typically secondary structure that reduces the effective distance between the BP and 3'SS (Meyer et al., 2011). For experimentally mapped human BPs, the BP tends to be close to the 3'SS (Gao et al., 2008; Mercer et al., 2015; Taggart et al., 2012), similar to the majority of yeast BPs.

Functional roles of BPs: effects of location, mutations, and altered recognition

While it is clear a BP is required for every splicing reaction, the degree to which BP selection determines alternative splicing outcomes has not been well studied. However, a few examples illustrate some of the functional consequences of BP usage. Work from our lab suggests that BP positioning plays a role in 3'SS selection for the special case where alternative 3'SS are 3 nucleotides apart, known as NAGNAGs (Bradley, Merkin, Lambert, & Burge, 2012). This work showed that the putative BP is located farther upstream in the intron when the upstream NAG is favored compared to the case when the downstream NAG is predominantly used. Additionally, steric effects have been shown to influence the outcome of splicing events, as in the case of α-tropomyosin. The BP upstream of the second mutually exclusive exon in α-tropomyosin is located very close to the 5'SS of the competing exon, preventing splicing of the intervening intron due to steric hindrance of splicing components (Smith & Nadal-Ginard, 1989) (Fig. 1-2E).

BP mutations can alter splicing events both in vivo and in vitro, implying constraint on what sequence can be selected as a BP in the intron.
For instance, yeast splicing reporters with mutated BPs show greatly reduced levels of splicing (Rain, 1997; Vijayraghavan et al., 1986). Similarly, in cases of genetic diseases, BP mutations have been shown to cause exon skipping or intron retention, defined as the events where a single exon is alternatively spliced out of the mRNA or a single intron is included in the mRNA, respectively. BP mutations have been linked to disease phenotypes in Fish-eye disease, X-linked hydrocephalus, Ehlers-Danlos syndrome, hemophilia B, xeroderma pigmentosum, tuberous sclerosis, familial hypercholesterolemia, Niemann-Pick disease, extrapyramidal movement disorder, and allele-dependent production of soluble DQ (Královicová, Lei, & Vorechovský, 2006). More specifically, a familial case of xeroderma pigmentosum (XP), an autosomal recessive disease associated with a 1000-fold increase of skin cancer frequency, is caused by BP mutations in the XPC gene. These mutations cause exon skipping that creates a non-functional DNA repair enzyme (Khan et al., 2004) (Fig. 1-2F).
Mutations in core splicing factors that recognize the BP and the 3' end of the intron have recently been observed in many cancers. In several blood cancers SF3B, which is involved in BP recognition, and U2AF, which is involved in polypyrimidine tract and 3'SS recognition, have been observed to be hotspots of mutations (Hahn & Scott, 2012). Independent studies have identified SF3B1 among the top genes containing somatic mutations in chronic lymphocytic leukemia (CLL) samples (Quesada et al., 2012; Wan & Wu, 2013; L. Wang et al., 2011; X. Wu, Tschumper, & Jelinek, 2013). In secondary acute myeloid leukemia (sAML), recurrent mutations in U2AF1 have been identified (Graubert et al., 2012). These and other studies have documented changes in pre-mRNA splicing in mutant samples, implicating pre-mRNA splicing in myelodysplastic syndromes (MDS) (DeBoever et al., 2015). Interestingly, the anti-tumor splicing drugs Spliceostatin A (SSA) and E7107 have been shown to interfere with normal functions of SF3B. More specifically, SSA and E7107 disrupt proper recognition of the BP by the U2 snRNP and U2 snRNA, respectively, and alter the outcome of splicing (Corrionero, Miñana, & Valcárcel, 2011; Folco, Coil, & Reed, 2011). SF3B is also the binding target of Pladienolide B, another anti-tumor compound that inhibits splicing and is structurally similar to E7107 (Effenberger et al., 2014; Kotake et al., 2007).

Lariats

The location of the BP is defined by the unusual 2'-5' linkage between the BP nucleotide and the 5'SS present in lariat RNA, making lariats the key to identifying BP locations.

Sources of lariats

There are several different sources of RNA lariats. For one, lariats can be produced in vitro using a deoxyribozyme. In this case, an in vitro synthesized linear RNA is mixed with a partially complementary DNA oligo. Pairing of the RNA and DNA facilitates branch formation by positioning various parts of the RNA near each other spatially so that a nucleophilic attack can occur (Y. Wang & Silverman, 2005). Second, in vitro self-splicing of a Group II intron can be used to produce a lariat by placing an in vitro transcribed RNA under the correct temperature and buffer conditions (Costa, Fontaine, Loiseaux-de Goër, & Michel, 1997). Third, in vitro splicing using HeLa nuclear extracts can be used to produce lariats spliced by the spliceosome (Folco & Reed, 2014; Padgett, Hardy, & Sharp, 1983).

In the cases of self-splicing and in vitro splicing, lariat RNA is not the only product of the splicing reaction; ligated exons and splicing intermediates are also produced. To remove the linear RNA products and leave lariat RNA intact, an exonuclease, such as RNase R, can be added to the reaction (Suzuki, 2006). RNase R is a processive 3' to 5' exonuclease that requires 7 nt of single-stranded RNA at the 3' end of an RNA to initiate digestion of its substrates (Vincent & Deutscher, 2006). In an in vitro splicing reaction, most of what is left after treatment with RNase R will be lariat RNA (Fig. I-1).

Lariat turnover: debranching

The debranching enzyme, DBR1, rapidly linearizes lariat RNA in vivo so that the RNA can be degraded and the nucleotides can be recycled.
The debranching enzyme was discovered in 1991 in a screen for factors required for Ty1 retrotransposition (Chapman & Boeke, 1991). The study found that the DBR1 gene is required for Ty1 transposition and inadvertently found DBR1 is required to debranch lariats. Characterization of the DBR1 gene revealed that one copy of DBR1 was sufficient to debranch the yeast actin lariat in vivo, but a homozygous dbr1 deletion resulted in accumulation of the lariat. Subsequently, DBR1 has been implicated in HIV replication (Ye, De Leon, Yokoyama, Naidu, & Camerini, 2005). In human cells, 80% knockdown of DBR1 did not significantly affect cell viability but did lead to a decrease in HIV cDNA and protein production.

Debranching enzyme is a highly conserved protein from yeast to human. It has been shown that ectopic expression of human DBR1 can complement S. cerevisiae and S. pombe dbr1 nulls (Kim et al., 2000). S. cerevisiae dbr1∆ mutants have slight growth defects (Chapman & Boeke, 1991) and S. pombe dbr1∆ mutants have filamentous growth defects (Mösch & Fink, 1997). Recently a homolog of DBR1, DRN1, has been reported to aid in the process of debranching lariats (Garrey et al., 2014).

Detailed studies of the debranching enzyme revealed its reaction condition requirements and target sequence preferences for fast debranching activity. DBR1 is optimally active between 30-37˚C, prefers purines at the 2' position relative to the BP nucleotide, and requires more than an H or OH group at the 3' position to debranch a lariat (Nam et al., 1994; Ooi et al., 2001). Low concentrations of divalent cations will enhance the activity of DBR1. However, it is not always necessary to add cations to the debranching reaction, perhaps because the enzyme tightly binds two metal ions that may remain bound during DBR1 purification. Inhibitors of debranching include high concentrations of KCl, RNasin, and yeast tRNA (Ooi et al., 2001).
Additionally, the catalytic residues of DBR1 have been identified (Findlay, Boyle, Hause, Klein, & Shendure, 2014; Khalid, Damha, Shuman, & Schwer, 2005).

RNAs processed from lariats

It is apparent that there are many varieties of functional RNAs that are derived from intronic lariat RNAs. For one, snoRNAs are often processed from debranched lariats by exonucleases (Bachellerie, Cavaillé, & Hüttenhofer, 2002; Kiss & Filipowicz, 1995; Tycowski, Shu, & Steitz, 1993). snoRNAs guide RNA modifications, including 2'-O methylation and pseudouridylation. Interestingly, the spacing between snoRNAs and BPs is critical for proper snoRNA processing (Hirose, Shu, & Steitz, 2003; Vincenti, De Chiara, Bozzoni, & Presutti, 2007).

Many additional types of RNAs are processed from introns, but the relevance of BP position in the maturation of these RNAs is less clear. For instance, sno-lncRNAs, long non-coding RNAs (lncRNAs) flanked on each end by a snoRNA, are processed out of introns, two of which have been implicated in the pathogenesis of Prader-Willi Syndrome (Yin et al., 2012). Additionally, microRNAs (miRNAs) can be processed out of debranched introns. These miRNAs/introns, called mirtrons, structurally look like pre-miRNAs once the lariat has been debranched, allowing the mirtron to enter the miRNA processing pathway without Drosha-mediated cleavage (Ruby, Jan, & Bartel, 2007). In another example, in the yeast Cryptococcus neoformans, stalled splicing coupled to lariat debranching has been shown to produce siRNAs that silence transposons (Dumesic & Madhani, 2014; Dumesic et al., 2013). As a final example, debranching is necessary for class switch recombination of the antibodies expressed by B cells because an intronic RNA processed from a lariat guides activation-induced cytidine deaminase (AID) to immunoglobulin switch region DNA (Zheng et al., 2015).

BP position might affect the stability of these intron derived RNAs. In S. cerevisiae, when lariats accumulate their tails are digested (Chapman & Boeke, 1991). If lariat tails are digested in metazoans as they are in S. cerevisiae, BP position could determine whether the intron encoded RNA will be protected in a lariat loop or subject to degradation in a lariat tail.

Lariats versus circular RNAs

Despite the circular nature of the lariat loop, there are important differences between a lariat RNA and a circular RNA. Circular RNAs are defined as having only 3' to 5' linkages of the sugar-phosphate backbone whereas lariat RNAs contain many 3' to 5' linkages, but also contain a single 2'-5' RNA linkage. The BP nucleotide in a lariat is attached to three other nucleotides, whereas every base in a truly circular RNA is only attached to two other nucleotides.
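The connectivity difference described above can be made concrete with a small sketch that represents each molecule as a list of bonds between numbered nucleotides and counts how many neighbours each nucleotide has. The representation, function names, and the toy sizes (10 nucleotides, BP at index 6) are illustrative assumptions, not a model taken from the literature.

```python
from collections import Counter


def circle_bonds(n):
    """n nucleotides joined only by 3'-5' bonds, closed into a ring."""
    return [(i, (i + 1) % n, "3'-5'") for i in range(n)]


def lariat_bonds(n, bp):
    """Linear chain of n nucleotides plus one 2'-5' bond from the BP
    nucleotide back to the first nucleotide of the intron (index 0)."""
    bonds = [(i, i + 1, "3'-5'") for i in range(n - 1)]
    bonds.append((bp, 0, "2'-5'"))
    return bonds


def neighbour_counts(bonds):
    """Count how many bonds each nucleotide participates in."""
    counts = Counter()
    for a, b, _ in bonds:
        counts[a] += 1
        counts[b] += 1
    return counts


circle = neighbour_counts(circle_bonds(10))
lariat = neighbour_counts(lariat_bonds(10, bp=6))
```

In the circle every nucleotide has exactly two neighbours. In the lariat the BP nucleotide has three, the nucleotides within the loop have two, and the last nucleotide of the tail has only one, which corresponds to the free 3' end.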
Topologically, lariats and circles seem quite similar, and though the literature has deceptively used the term “circular intronic RNA (ciRNA)” to describe lariat RNA (Y. Zhang et al., 2013), to enzymes circles and lariats are quite different. RT has difficulty traversing the 2'-5' RNA linkage at the BP, which often results in incorporation of a mismatched nucleotide at the BP or skipping of a base altogether at the BP (Bitton et al., 2014; Gao et al., 2008; Taggart et al., 2012).

Circular shaped RNAs have known functions. Circular RNA sponge for miR-7 (ciRS-7) binds the miRNA miR-7 (Hansen et al., 2013a), affecting the expression of many oncogenes (Hansen, Kjems, & Damgaard, 2013b). Similarly, the circular Sry transcript in mouse (Capel et al., 1993), produced from “head to tail” splicing of the Sry pre-mRNA, has been shown to interact with miR-138 (Hansen et al., 2013a). Additionally, ciRNAs have been shown to regulate the expression level of their parent transcript (Y. Zhang et al., 2013).

A variety of techniques can be employed to prove that an RNA is circular (Jeck & Sharpless, 2014). First, circular and lariat RNAs are resistant to RNase R digestion. Because RNase R cannot digest all linear RNAs, digestion by RNase R should not be the only evidence used to prove that a given RNA is circular. Another way to distinguish circular RNA from linear RNA is to perform an RNase H digestion on the RNA with an oligo somewhere in the middle of the circle. After digestion, circular RNAs will be linearized and appear as only one band on a gel whereas linear RNAs will be broken into two smaller RNA fragments. A third option is to look for retarded mobility of circular RNAs in a gel, since their shape makes the circular RNAs appear to run slower than their linear counterparts (Chapman & Boeke, 1991). Additionally, DBR1 should be able to linearize lariat RNAs but not circular RNAs. Sequence confirmation around the lariat/circle can also help to prove that a molecule is circular, especially if the sequence traverses the circle multiple times.

Sequencing technologies

Rapid developments in sequencing technology in the last decade have led to many advances in genomics research. The cost of sequencing is decreasing faster than Moore’s Law predicts (G. E. Moore, 1965), making quick adoption and wide use of sequencing technology feasible. This availability of sequencing technology has prompted the development of many assays to measure quantities and locations of nucleic acids in cells.
In the area of RNA biology, techniques have been developed to measure gene expression levels, relative splice isoform abundance (Pan, Shai, Lee, Frey, & Blencowe, 2008; E. T. Wang et al., 2008), locations of ribosomes on mRNAs (Ingolia, Ghaemmaghami, Newman, & Weissman, 2009), poly(A) site locations (Jan, Friedman, Ruby, & Bartel, 2011; Spies, Burge, & Bartel, 2013), transcript initiation site locations (Arribere & Gilbert, 2013), sites of RNA modification (Carlile et al., 2014), and nascent RNAs (Core, Waterfall, & Lis, 2008; Khodor et al., 2011; Paulsen et al., 2014), just to name a few. These data are often generated in large volumes and raise new computational challenges for analysis.

Today there are many different kinds of sequencing instruments (Pareek, Smoczynski, & Tretyn, 2011) that provide different read lengths and depths of sequencing. Illumina sequencers produce shorter reads, typically in the range of 40-250 nt, and sequence millions of fragments per run. The Ion Torrent instrument uses an alternative method for reading DNA bases and generates similar read lengths to the Illumina platforms (Salipante et al., 2014). Other technologies, such as machines developed by PacBio and Oxford Nanopore, sequence fewer fragments per run but allow for long read sequencing, averaging 5-15 Kb (Goodwin, Gurtowski, Ethe-Sayers, & Deshpande, 2015; PacBio, 2014), which will likely be important for accurately identifying full length mRNA isoforms and sequencing around circular RNAs to confirm their shape (Tilgner et al., 2015; You et al., 2015).

Thesis overview

When I began my PhD in 2009, many of the aforementioned sequencing technologies were in their infancy while others were becoming popular. Though there were several technical challenges to overcome, for the first time it seemed feasible to sequence BPs on a genome-wide scale.
This combination of factors, along with my interest in how sequence elements contribute to gene regulation and my desire to perform both experimental and computational research, led me to study the role of the BP sequence in regulation of splicing. Chapter II of this thesis describes my findings regarding yeast BPs and novel splicing events in S. cerevisiae. Chapter III contains suggested future applications of BP sequencing methods. In the first half of Appendix I, I describe the Branch-seq protocol in detail including secondary tips that will be helpful for anyone performing the protocol in the future. In the second half of Appendix I, I offer suggestions for further development of the current Branch-seq protocol and for development of alternative BP sequencing methods. Appendix II contains the supplemental tables from Chapter II. Finally, Appendix III describes my application of Branch-seq to metazoans, focusing on Drosophila, and the first report of recursive splicing in a short intron.

References

Abelson, J., Trotta, C. R., & Li, H. (1998). tRNA Splicing. Journal of Biological Chemistry, 273(21), 12685–12688. doi:10.1074/jbc.273.21.12685
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. (2007). Molecular Biology of the Cell. Garland Science.
Arribere, J. A., & Gilbert, W. V. (2013). Roles for transcript leaders in translation and mRNA decay revealed by transcript leader sequencing. Genome Research, 23(6), 977–987. doi:10.1101/gr.150342.112
Awan, A. R., Manfredo, A., & Pleiss, J. A. (2013). Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans. Proceedings of the National Academy of Sciences, 110(31), 12762–12767. doi:10.1073/pnas.1218353110
Bachellerie, J.-P., Cavaillé, J., & Hüttenhofer, A. (2002). The expanding snoRNA world. Biochimie, 84(8), 775–790. doi:10.1016/S0300-9084(02)01402-5
Barbosa-Morais, N. L., Irimia, M., Pan, Q., Xiong, H. Y., Gueroussov, S., Lee, L. J., et al. (2012).
The evolutionary landscape of alternative splicing in vertebrate species. Science (New York, N.Y.), 338(6114), 1587–1593. doi:10.1126/science.1230612
Biniszkiewicz, D., Cesnaviciene, E., & Shub, D. A. (1994). Self-splicing group I intron in cyanobacterial initiator methionine tRNA: evidence for lateral transfer of introns in bacteria. The EMBO Journal, 13(19), 4629.
Bitton, D. A., Rallis, C., Jeffares, D. C., Smith, G. C., Chen, Y. Y. C., Codlin, S., et al. (2014). LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Research, 24(7), 1169–1179. doi:10.1101/gr.166819.113
Bonen, L., & Vogel, J. (2001). The ins and outs of group II introns. TRENDS in Genetics, 17(6), 322–331. doi:10.1016/S0168-9525(01)02324-1
Bradley, R. K., Merkin, J., Lambert, N. J., & Burge, C. B. (2012). Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution. PLoS Biology, 10(1), e1001229. doi:10.1371/journal.pbio.1001229
Burnette, J. M., Miyamoto-Sato, E., Schaub, M. A., Conklin, J., & Lopez, A. J. (2005). Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics, 170(2), 661–674. doi:10.1534/genetics.104.039701
Capel, B., Swain, A., Nicolis, S., Hacker, A., Walter, M., Koopman, P., et al. (1993). Circular transcripts of the testis-determining gene Sry in adult mouse testis. Cell, 73(5), 1019–1030.
Carlile, T. M., Rojas-Duran, M. F., Zinshteyn, B., Shin, H., Bartoli, K. M., & Gilbert, W. V. (2014). Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature, 515(7525), 143–146. doi:10.1038/nature13802
Cech, T. R. (1990). Self-splicing of group I introns. Annual Review of Biochemistry, 59, 543–568. doi:10.1146/annurev.bi.59.070190.002551
Chapman, K. B., & Boeke, J. D. (1991). Isolation and characterization of the gene encoding yeast debranching enzyme. Cell, 65(3), 483–492. doi:10.1016/0092-8674(91)90466-C
Core, L. J., Waterfall, J. J., & Lis, J. T. (2008). Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science (New York, N.Y.), 322(5909), 1845–1848. doi:10.1126/science.1162228
Corrionero, A., Miñana, B., & Valcárcel, J. (2011). Reduced fidelity of branch point recognition and alternative splicing induced by the anti-tumor drug spliceostatin A. Genes & Development, 25(5), 445–459. doi:10.1101/gad.2014311
Corvelo, A., Hallegger, M., Smith, C. W. J., & Eyras, E. (2010). Genome-wide association between branch point properties and alternative splicing. PLoS Computational Biology, 6(11), e1001016. doi:10.1371/journal.pcbi.1001016
Costa, M., Fontaine, J. M., Loiseaux-de Goër, S., & Michel, F. (1997). A group II self-splicing intron from the brown alga Pylaiella littoralis is active at unusually low magnesium concentrations and forms populations of molecules with a uniform conformation. Journal of Molecular Biology, 274(3), 353–364.
Davis, C. A. (2000). Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast. Nucleic Acids Research, 28(8), 1700–1706. doi:10.1093/nar/28.8.1700
DeBoever, C., Ghia, E. M., Shepard, P. J., Rassenti, L., Barrett, C. L., Jepsen, K., et al. (2015). Transcriptome Sequencing Reveals Potential Mechanism of Cryptic 3’ Splice Site Selection in SF3B1-mutated Cancers. PLoS Computational Biology, 11(3), e1004105. doi:10.1371/journal.pcbi.1004105
Dietrich, R. C., Incorvaia, R., & Padgett, R. A. (1997). Terminal Intron Dinucleotide Sequences Do Not Distinguish between U2- and U12-Dependent Introns. Molecular Cell, 1(1), 151–160. doi:10.1016/S1097-2765(00)80016-7
Duff, M. O., Olson, S., Wei, X., Garrett, S. C., Osman, A., Bolisetty, M., et al. (2015). Genome-wide identification of zero nucleotide recursive splicing in Drosophila. Nature, 521(7552), 376–379. doi:10.1038/nature14475
Duff, M. O., Olson, S., Wei, X., Osman, A., Plocik, A., Bolisetty, M., et al. (2014). Genome-wide Identification of Zero Nucleotide Recursive Splicing in Drosophila. bioRxiv. doi:10.1101/006163
Dumesic, P. A., & Madhani, H. D. (2014). Recognizing the enemy within: licensing RNA-guided genome defense. Trends in Biochemical Sciences, 39(1), 25–34. doi:10.1016/j.tibs.2013.10.003
Dumesic, P. A., Natarajan, P., Chen, C., Drinnenberg, I. A., Schiller, B. J., Thompson, J., et al. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell, 152(5), 957–968. doi:10.1016/j.cell.2013.01.046
Effenberger, K. A., Anderson, D. D., Bray, W. M., Prichard, B. E., Ma, N., Adams, M. S., et al. (2014). Coherence between cellular responses and in vitro splicing inhibition for the anti-tumor drug pladienolide B and its analogs. Journal of Biological Chemistry, 289(4), 1938–1947. doi:10.1074/jbc.M113.515536
Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C., & Shendure, J. (2014). Saturation editing of genomic regions by multiplex homology-directed repair. Nature, 513(7516), 120–123. doi:10.1038/nature13695
Folco, E. G., & Reed, R. (2014). In vitro systems for coupling RNAP II transcription to splicing and polyadenylation. Methods in Molecular Biology (Clifton, NJ), 1126, 169–177. doi:10.1007/978-1-62703-980-2_13
Folco, E. G., Coil, K. E., & Reed, R. (2011). The anti-tumor drug E7107 reveals an essential role for SF3b in remodeling U2 snRNP to expose the branch point-binding region. Genes & Development, 25(5), 440–444. doi:10.1101/gad.2009411
Gao, K., Masuda, A., Matsuura, T., & Ohno, K. (2008). Human branch point consensus sequence is yUnAy. Nucleic Acids Research, 36(7), 2257–2267. doi:10.1093/nar/gkn073
Garrey, S. M., Katolik, A., Prekeris, M., Li, X., York, K., Bernards, S., et al. (2014). A homolog of lariat-debranching enzyme modulates turnover of branched RNA. RNA (New York, N.Y.), 20(8), 1337–1348. doi:10.1261/rna.044602.114
Goodwin, S., Gurtowski, J., Ethe-Sayers, S., & Deshpande, P. (2015). Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome. bioRxiv.
Gozani, O., Feld, R., & Reed, R. (1996). Evidence that sequence-independent binding of highly conserved U2 snRNP proteins upstream of the branch site is required for assembly of spliceosomal complex A. Genes & Development, 10(2), 233–243.
Graubert, T. A., Shen, D., Ding, L., Okeyo-Owuor, T., Lunn, C. L., Shao, J., et al. (2012). Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nature Genetics, 44(1), 53–57. doi:10.1038/ng.1031
Grossman, J. S., Meyer, M. I., Wang, Y. C., Mulligan, G. J., Kobayashi, R., & Helfman, D. M. (1998). The use of antibodies to the polypyrimidine tract binding protein (PTB) to analyze the protein components that assemble on alternatively spliced pre-mRNAs that use distant branch points. RNA (New York, N.Y.), 4(6), 613–625.
Hachet, O., & Ephrussi, A. (2004). Splicing of oskar RNA in the nucleus is coupled to its cytoplasmic localization. Nature, 428(6986), 959–963. doi:10.1038/nature02521
Hahn, C.
N., & Scott, H. S. (2012). Spliceosome mutations in hematopoietic malignancies. Nature Genetics, 44(1), 9–10. doi:10.1038/ng.1045
Hallegger, M., Sobala, A., & Smith, C. W. J. (2010). Four exons of the serotonin receptor 4 gene are associated with multiple distant branch points. RNA (New York, N.Y.), 16(4), 839–851. doi:10.1261/rna.2013110
Hansen, T. B., Jensen, T. I., Clausen, B. H., Bramsen, J. B., Finsen, B., Damgaard, C. K., & Kjems, J. (2013a). Natural RNA circles function as efficient microRNA sponges. Nature, 495(7441), 384–388. doi:10.1038/nature11993
Hansen, T. B., Kjems, J., & Damgaard, C. K. (2013b). Circular RNA and miR-7 in Cancer. Cancer Research, 73(18), 5609–5612. doi:10.1158/0008-5472.CAN-13-1568
Hatton, A. R., Subramaniam, V., & Lopez, A. J. (1998). Generation of alternative Ultrabithorax isoforms and stepwise removal of a large intron by resplicing at exon-exon junctions. Molecular Cell, 2(6), 787–796.
Hirose, T., Shu, M.-D., & Steitz, J. A. (2003). Splicing-dependent and -independent modes of assembly for intron-encoded box C/D snoRNPs in mammalian cells. Molecular Cell, 12(1), 113–123.
Hocine, S., Singer, R. H., & Grünwald, D. (2010). RNA processing and export. Cold Spring Harbor Perspectives in Biology, 2(12), a000752. doi:10.1101/cshperspect.a000752
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S., & Weissman, J. S. (2009). Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science (New York, N.Y.), 324(5924), 218–223. doi:10.1126/science.1168978
Jan, C. H., Friedman, R. C., Ruby, J. G., & Bartel, D. P. (2011). Formation, regulation and evolution of Caenorhabditis elegans 3'UTRs. Nature, 469(7328), 97–101. doi:10.1038/nature09616
Jeck, W. R., & Sharpless, N. E. (2014). Detecting and characterizing circular RNAs. Nature Biotechnology, 32(5), 453–461. doi:10.1038/nbt.2890
Kameyama, T., Suzuki, H., & Mayeda, A. (2012). Re-splicing of mature mRNA in cancer cells promotes activation of distant weak alternative splice sites. Nucleic Acids Research, 40(16), 7896–7906. doi:10.1093/nar/gks520
Kawashima, T., Douglass, S., Gabunilas, J., Pellegrini, M., & Chanfreau, G. F. (2014). Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genetics, 10(4), e1004249. doi:10.1371/journal.pgen.1004249
Khalid, M. F., Damha, M. J., Shuman, S., & Schwer, B. (2005). Structure-function analysis of yeast RNA debranching enzyme (Dbr1), a manganese-dependent phosphodiesterase. Nucleic Acids Research, 33(19), 6349–6360. doi:10.1093/nar/gki934
Khan, S. G., Metin, A., Gozukara, E., Inui, H., Shahlavi, T., Muniz-Medina, V., et al. (2004). Two essential splice lariat branchpoint sequences in one intron in a xeroderma pigmentosum DNA repair gene: mutations result in reduced XPC mRNA levels that correlate with cancer risk.
Human Molecular Genetics, 13(3), 343–352. doi:10.1093/hmg/ddh026
Khodor, Y. L., Rodriguez, J., Abruzzi, K. C., Tang, C.-H. A., Marr, M. T., & Rosbash, M. (2011). Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila. Genes & Development, 25(23), 2502–2512. doi:10.1101/gad.178962.111
Kim, J.-W., Kim, H.-C., Kim, G.-M., Yang, J.-M., Boeke, J. D., & Nam, K. (2000). Human RNA lariat debranching enzyme cDNA complements the phenotypes of Saccharomyces cerevisiae dbr1 and Schizosaccharomyces pombe dbr1 mutants.
Kiss, T., & Filipowicz, W. (1995). Exonucleolytic processing of small nucleolar RNAs from pre-mRNA introns. Genes & Development, 9(11), 1411–1424.
Kotake, Y., Sagane, K., Owa, T., Mimori-Kiyosue, Y., Shimizu, H., Uesugi, M., et al. (2007). Splicing factor SF3b as a target of the antitumor natural product pladienolide. Nature Chemical Biology, 3(9), 570–575. doi:10.1038/nchembio.2007.16
Královicová, J., Lei, H., & Vorechovský, I. (2006). Phenotypic consequences of branch point substitutions. Human Mutation, 27(8), 803–813. doi:10.1002/humu.20362
Kuhsel, M., Strickland, R., & Palmer, J. (1990). An ancient group I intron shared by eubacteria and chloroplasts. Science (New York, N.Y.), 250(4987), 1570–1573. doi:10.1126/science.2125748
Langford, C. J., & Gallwitz, D. (1983). Evidence for an intron-contained sequence required for the splicing of yeast RNA polymerase II transcripts. Cell, 33(2), 519–527. doi:10.1016/0092-8674(83)90433-6
Lareau, L. F., Inada, M., Green, R. E., Wengrod, J. C., & Brenner, S. E. (2007). Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature, 446(7138), 926–929. doi:10.1038/nature05676
Lim, L. P., & Burge, C. B. (2001). A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences of the United States of America, 98(20), 11193–11198. doi:10.1073/pnas.201407298
Matlin, A. J., Clark, F., & Smith, C. W. J. (2005).
Understanding alternative splicing: towards a cellular code. Nature Reviews. Molecular Cell Biology, 6(5), 386–398. doi:10.1038/nrm1645
Mercer, T. R., Clark, M. B., Andersen, S. B., Brunck, M. E., Haerty, W., Crawford, J., et al. (2015). Genome-wide discovery of human splicing branchpoints. Genome Research, 25(2), 290–303. doi:10.1101/gr.182899.114
Merkin, J., Russell, C., Chen, P., & Burge, C. B. (2012). Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science (New York, N.Y.), 338(6114), 1593–1599. doi:10.1126/science.1228186
Meyer, M., Plass, M., Pérez-Valle, J., Eyras, E., & Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Molecular Cell, 43(6), 1033–1039. doi:10.1016/j.molcel.2011.07.030
Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics Magazine.
Mösch, H. U., & Fink, G. R. (1997). Dissection of filamentous growth by transposon mutagenesis in Saccharomyces cerevisiae. Genetics, 145(3), 671–684.
Nam, K., Hudson, R. H., Chapman, K. B., Ganeshan, K., Damha, M. J., & Boeke, J. D. (1994). Yeast lariat debranching enzyme. Substrate and sequence specificity. The Journal of Biological Chemistry, 269(32), 20613–20621.
Ni, J. Z., Grate, L., Donohue, J. P., Preston, C., Nobida, N., O'Brien, G., et al. (2007). Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay. Genes & Development, 21(6), 708–718. doi:10.1101/gad.1525507
Ooi, S. L., Dann, C., Nam, K., Leahy, D. J., Damha, M. J., & Boeke, J. D. (2001). RNA lariat debranching enzyme. Methods in Enzymology, 342, 233–248.
Ott, S., Tamada, Y., Bannai, H., Nakai, K., & Miyano, S. (2003). Intrasplicing--analysis of long intron sequences. Pacific Symposium on Biocomputing, 339–350.
PacBio. (2014, October 15). New Chemistry Boosts Average Read Length to 10 kb – 15 kb for PacBio® RS II. Blog.Pacificbiosciences.com. Retrieved May 31, 2015, from http://blog.pacificbiosciences.com/2014/10/new-chemistry-boosts-average-read.html
Padgett, R. A., Hardy, S. F., & Sharp, P. A. (1983). Splicing of adenovirus RNA in a cell-free transcription system. Proceedings of the National Academy of Sciences of the United States of America, 80(17), 5230–5234.
Padgett, R. A., Konarska, M. M., Aebi, M., Hornig, H., Weissmann, C., & Sharp, P. A. (1985).
Nonconsensus branch-site sequences in the in vitro splicing of transcripts of mutant rabbit beta-globin genes. Proceedings of the National Academy of Sciences of the United States of America, 82(24), 8349–8353.
Padgett, R. A., Konarska, M. M., Grabowski, P. J., Hardy, S. F., & Sharp, P. A. (1984). Lariat RNA's as intermediates and products in the splicing of messenger RNA precursors. Science (New York, N.Y.), 225(4665), 898–903.
Pan, Q., Shai, O., Lee, L. J., Frey, B. J., & Blencowe, B. J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics, 40(12), 1413–1415. doi:10.1038/ng.259
Pareek, C. S., Smoczynski, R., & Tretyn, A. (2011). Sequencing technologies and genome sequencing. Journal of Applied Genetics, 52(4), 413–435. doi:10.1007/s13353-011-0057-x
Parra, M. K., Tan, J. S., Mohandas, N., & Conboy, J. G. (2008). Intrasplicing coordinates alternative first exons with alternative splicing in the protein 4.1R gene. The EMBO Journal, 27(1), 122–131. doi:10.1038/sj.emboj.7601957
Patel, A. A., & Steitz, J. A. (2003). Splicing double: insights from the second spliceosome. Nature Reviews. Molecular Cell Biology, 4(12), 960–970. doi:10.1038/nrm1259
Paulsen, M. T., Veloso, A., Prasad, J., Bedi, K., Ljungman, E. A., Magnuson, B., et al. (2014). Use of Bru-Seq and BruChase-Seq for genome-wide assessment of the synthesis and stability of RNA. Methods (San Diego, Calif.), 67(1), 45–54. doi:10.1016/j.ymeth.2013.08.015
Phizicky, E. M., & Hopper, A. K. (2010). tRNA biology charges to the front. Genes & Development, 24(17), 1832–1860. doi:10.1101/gad.1956510
Query, C. C., Moore, M. J., & Sharp, P. A. (1994). Branch nucleophile selection in pre-mRNA splicing: evidence for the bulged duplex model. Genes & Development, 8(5), 587–597. doi:10.1101/gad.8.5.587
Quesada, V., Conde, L., Villamor, N., Ordóñez, G. R., Jares, P., Bassaganyas, L., et al. (2012). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nature Genetics, 44(1), 47–52. doi:10.1038/ng.1032
Rain, J. C. (1997). In vivo commitment to splicing in yeast involves the nucleotide upstream from the branch site conserved sequence and the Mud2 protein. The EMBO Journal, 16(7), 1759–1771. doi:10.1093/emboj/16.7.1759
Reinhold-Hurek, B., & Shub, D. A. (1992). Self-splicing introns in tRNA genes of widely divergent bacteria. Nature, 357(6374), 173–176. doi:10.1038/357173a0
Ruby, J. G., Jan, C. H., & Bartel, D. P. (2007). Intronic microRNA precursors that bypass Drosha processing. Nature, 448(7149), 83–86. doi:10.1038/nature05983
Ruskin, B., & Green, M. R. (1985). An RNA processing activity that debranches RNA lariats. Science (New York, N.Y.), 229(4709), 135–140.
Salipante, S.
J., Kawashima, T., Rosenthal, C., Hoogestraat, D. R., Cummings, L. A., Sengupta, D. J., et al. (2014). Performance comparison of Illumina and ion torrent next-generation sequencing platforms for 16S rRNA-based bacterial community profiling. Applied and Environmental Microbiology, 80(24), 7583–7591. doi:10.1128/AEM.02206-14
Sharp, P. A. (1991). "Five easy pieces". Science (New York, N.Y.), 254(5032), 663.
Sibley, C. R., Emmett, W., Blazquez, L., Faro, A., Haberman, N., Briese, M., et al. (2015). Recursive splicing in long vertebrate genes. Nature, 521(7552), 371–375. doi:10.1038/nature14466
Smith, C. W., & Nadal-Ginard, B. (1989). Mutually exclusive splicing of alpha-tropomyosin exons enforced by an unusual lariat branch point location: implications for constitutive splicing. Cell, 56(5), 749–758.
Spies, N., Burge, C. B., & Bartel, D. P. (2013). 3' UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts. Genome Research, 23(12), 2078–2090. doi:10.1101/gr.156919.113
Spingola, M., Grate, L., Haussler, D., & Ares, M. (1999). Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA (New York, N.Y.), 5(2), 221–234.
Stoss, O., Schwaiger, F. W., Cooper, T. A., & Stamm, S. (1999). Alternative splicing determines the intracellular localization of the novel nuclear protein Nop30 and its interaction with the splicing factor SRp30c. The Journal of Biological Chemistry, 274(16), 10951–10962.
Suzuki, H. (2006). Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Research, 34(8), e63–e63. doi:10.1093/nar/gkl151
Taggart, A. J., DeSimone, A. M., Shih, J. S., Filloux, M. E., & Fairbrother, W. G. (2012). Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nature Structural & Molecular Biology, 19(7), 719–721. doi:10.1038/nsmb.2327
Takashima, Y., Ohtsuka, T., González, A., Miyachi, H., & Kageyama, R. (2011). Intronic delay is essential for oscillatory expression in the segmentation clock. Proceedings of the National Academy of Sciences of the United States of America, 108(8), 3300–3305. doi:10.1073/pnas.1014418108
Tilgner, H., Jahanbani, F., Blauwkamp, T., Moshrefi, A., Jaeger, E., Chen, F., et al. (2015). Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events. Nature Biotechnology. doi:10.1038/nbt.3242
Tycowski, K. T., Shu, M. D., & Steitz, J. A. (1993). A small nucleolar RNA is processed from an intron of the human gene encoding ribosomal protein S3. Genes & Development, 7(7A), 1176–1190.
Vijayraghavan, U., Parker, R., Tamm, J., Iimura, Y., Rossi, J., Abelson, J., & Guthrie, C. (1986). Mutations in conserved intron sequences affect multiple steps in the yeast splicing pathway, particularly assembly of the spliceosome. The EMBO Journal, 5(7), 1683–1695.
Vincent, H. A., & Deutscher, M. P. (2006). Substrate recognition and catalysis by the exoribonuclease RNase R. The Journal of Biological Chemistry, 281(40), 29769–29775. doi:10.1074/jbc.M606744200
Vincenti, S., De Chiara, V., Bozzoni, I., & Presutti, C. (2007). The position of yeast snoRNA-coding regions within host introns is essential for their biosynthesis and for efficient splicing of the host pre-mRNA. RNA (New York, N.Y.), 13(1), 138–150. doi:10.1261/rna.251907
Vogel, J., Hess, W. R., & Börner, T. (1997). Precise branch point mapping and quantification of splicing intermediates. Nucleic Acids Research, 25(10), 2030–2031.
Wahl, M. C., Will, C. L., & Lührmann, R. (2009). The Spliceosome: Design Principles of a Dynamic RNP Machine. Cell, 136(4), 701–718. doi:10.1016/j.cell.2009.02.009
Wallace, J. C., & Edmonds, M. (1983). Polyadenylylated nuclear RNA contains branches. Proceedings of the National Academy of Sciences of the United States of America, 80(4), 950–954.
Wan, Y., & Wu, C. J. (2013). SF3B1 mutations in chronic lymphocytic leukemia. Blood, 121(23), 4627–4634. doi:10.1182/blood-2013-02-427641
Wang, E. T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., et al. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature, 456(7221), 470–476. doi:10.1038/nature07509
Wang, L., Lawrence, M. S., Wan, Y., Stojanov, P., Sougnez, C., Stevenson, K., et al. (2011). SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. The New England Journal of Medicine, 365(26), 2497–2506. doi:10.1056/NEJMoa1109016
Wang, Y., & Silverman, S. K. (2005). Efficient one-step synthesis of biologically related lariat RNAs by a deoxyribozyme. Angewandte Chemie (International Ed. in English), 44(36), 5863–5866. doi:10.1002/anie.200501643
Wu, X., Tschumper, R. C., & Jelinek, D. F. (2013). Genetic characterization of SF3B1 mutations in single chronic lymphocytic leukemia cells. Leukemia, 27(11), 2264–2267. doi:10.1038/leu.2013.155
Ye, Y., De Leon, J., Yokoyama, N., Naidu, Y., & Camerini, D. (2005). DBR1 siRNA inhibition of HIV-1 replication. Retrovirology, 2(1), 63. doi:10.1186/1742-4690-2-63
Yin, Q.-F., Yang, L., Zhang, Y., Xiang, J.-F., Wu, Y.-W., Carmichael, G. G., & Chen, L.-L. (2012).
Long noncoding RNAs with snoRNA ends. Molecular Cell, 48(2), 219–230. doi:10.1016/j.molcel.2012.07.033
You, X., Vlatkovic, I., Babic, A., Will, T., Epstein, I., Tushev, G., et al. (2015). Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity. Nature Neuroscience, 18(4), 603–610. doi:10.1038/nn.3975
Zhang, Y., Zhang, X.-O., Chen, T., Xiang, J.-F., Yin, Q.-F., Xing, Y.-H., et al. (2013). Circular intronic long noncoding RNAs. Molecular Cell, 51(6), 792–806. doi:10.1016/j.molcel.2013.08.017
Zhang, Z., Hesselberth, J. R., & Fields, S. (2007). Genome-wide identification of spliced introns using a tiling microarray. Genome Research, 17(4), 503–509. doi:10.1101/gr.6049107
Zheng, S., Vuong, B. Q., Vaidyanathan, B., Lin, J.-Y., Huang, F.-T., & Chaudhuri, J. (2015). Non-coding RNA Generated following Lariat Debranching Mediates Targeting of AID to DNA. Cell, 161(4), 762–773. doi:10.1016/j.cell.2015.03.020

Chapter 2: Identification of New Branch Points and Unconventional Introns in Saccharomyces cerevisiae

This research is currently under review at Genome Research.

Abstract

Spliced messages constitute one-fourth of expressed mRNAs in the yeast Saccharomyces cerevisiae, and most mRNAs in metazoans. Splicing requires 5' splice site (5'SS), branch point (BP), and 3' splice site (3'SS) elements, but the role of the BP in splicing control remains poorly understood because BP identification remains difficult. We developed a high-throughput method, Branch-seq, to map BP and 5'SS of isolated RNA lariats. Applied to S. cerevisiae, Branch-seq detected 76% of expressed, annotated BPs and identified a comparable number of novel BPs. We used RNA-seq to confirm associated 3'SS locations, identifying 136 novel splice junctions, including an AT-AC intron. We show that several yeast introns use two or even three different BPs, with effects on 3'SS choice, protein coding potential and regulation via nonsense-mediated mRNA decay (NMD), and find that some novel introns are regulated in response to environmental changes. Together, these findings reveal BP-based regulation and demonstrate unanticipated complexity of splicing in yeast.

Introduction

Pre-mRNA splicing is required for the expression of most eukaryotic genes and is often regulated. The first step of splicing involves selection of a specific base, usually an adenine, in the pre-mRNA as the BP nucleophile and formation of an unusual 2'-5' RNA linkage between the 2' OH of the BP and the 5'SS (Wahl, Will, & Lührmann, 2009). This step is followed by ligation of the two exons and freeing of the intron in the form of a branched lariat (Padgett, Konarska, Grabowski, Hardy, & Sharp, 1984).
The lariat is rapidly debranched and degraded in most cases (Chapman & Boeke, 1991; Folco & Reed, 2014; Ruskin & Green, 1985), making BP identification difficult. Current BP annotations suggest that yeast introns almost always have a single BP. However, those annotations are based on computational predictions and lack a comprehensive experimental basis (Meyer, Plass, Pérez-Valle, Eyras, & Vilardell, 2011; Spingola, Grate, Haussler, & Ares, 1999). While computational predictions of BP locations are sufficient in many cases, experimental knowledge of BP location is essential to understand the full repertoire of splicing decisions cells make. For instance, unusual BP placement is known to affect the outcome of splicing of mammalian alpha-tropomyosin. The BP upstream of the second mutually exclusive exon in alpha-tropomyosin is located very close to the 5'SS of the competing exon, preventing splicing of the intervening intron due to steric hindrance of splicing components (Smith & Nadal-Ginard, 1989). BP position can also affect usage of “NAGNAG” alternative 3'SS separated by 3 nt, a common type of alternative splicing (AS) in mammals (Bradley, Merkin, Lambert, & Burge, 2012). Many types of AS involve regulated use of 3'SS (e.g., alternative 3'SS, exon skipping, mutually exclusive exons, alternative last exons, intron retention). In general, the relative contribution of BP recognition versus 3'SS recognition to each of these types of AS is unknown. In budding yeast, the BP is arguably more critical than the 3'SS to the first step of splicing because the 3'SS does not have to be identified by the splicing machinery until after the first step of splicing (Séraphin & Kandels-Lewis, 1993). In contrast, in metazoans (Aebi, Hornig, Padgett, Reiser, & Weissmann, 1986) and S. pombe (Reich, VanHoy, Porter, & Wise, 1992) recognition of the 3'SS often precedes the first step of splicing.
A 3'SS without a BP is not sufficient for splicing of an intron, as evidenced by splicing reporters in yeast in which mutating the annotated BP motif greatly reduces splicing of the transcript (Rain, 1997; Vijayraghavan et al., 1986). Similarly, in humans, BP motif mutations can result in aberrant splicing or intron retention, both of which are associated with several diseases (Královicová, Lei, & Vorechovský, 2006).

Regulation of AS in yeast can occur in response to environmental cues. For example, amino acid starvation inhibits splicing of ribosomal protein genes, and exposure to other stresses can decrease or increase the splicing of different subsets of genes (Pleiss, Whitworth, Bergkessel, & Guthrie, 2007). In the case of PTC7, a serine/threonine phosphatase, AS responds to changes in the available carbon source by creating mRNA isoforms that code for unique proteins. One protein isoform localizes to the mitochondria and the other contains a transmembrane domain that causes the isoform to localize to the nuclear envelope (Juneau, Nislow, & Davis, 2009). In mammals, AS is nearly universal and has many regulatory roles such as targeting proteins to different cellular compartments, altering transcription factor binding preferences, and influencing RNA stability (Pan, Shai, Lee, Frey, & Blencowe, 2008; E. T. Wang et al., 2008). One widespread regulator of RNA stability is nonsense-mediated mRNA decay (NMD), a pathway that degrades mRNAs that contain premature termination codons (PTCs). Several metazoan splicing factors autoregulate by altering splicing of transcripts from their own loci to shift toward increased production of unstable, NMD-targeted isoforms when protein levels are high (Sureau, 2001; Wollerton, Gooding, Wagner, Garcia-Blanco, & Smith, 2004). NMD also occurs in yeast (González, Wang, & Peltz, 2001), and can be coupled to splicing to regulate gene expression (Kawashima, Douglass, Gabunilas, Pellegrini, & Chanfreau, 2014).
Here we developed Branch-seq, a genome-wide technique to sequence lariat BPs and their associated 5'SS. We tested our method in S. cerevisiae, where every annotated intron has a confident BP prediction (Meyer et al., 2011; Spingola et al., 1999), allowing us to assess the accuracy and sensitivity of our method. Surprisingly, in addition to confirming the locations of most annotated BPs, we also identified more than 200 novel BPs. This finding prompted us to further explore splicing patterns and regulatory consequences in yeast using additional genome-wide assays and data. These analyses uncovered unexpected complexities in yeast splicing, including introns with multiple BPs, an intron with AT-AC splice sites, and a gene that couples splicing to NMD for gene regulation, revealing a number of parallels to mammalian splicing that were not previously appreciated.

Results

Branch-seq accurately identifies locations of 75% of expressed, annotated BPs

Though the yeast genome sequence has been available since 1996 and studies have sought to comprehensively identify yeast introns and test those predictions (C. A. Davis, 2000; Spingola et al., 1999), genome-wide assays are still discovering additional yeast introns (Kawashima et al., 2014; Zhang, Hesselberth, & Fields, 2007). BP detection has lagged behind intron detection largely because of the short-lived nature and unique structure of lariat RNAs. BPs are typically verified using fairly laborious, low-throughput techniques such as primer extension, in vitro splicing, and RT-PCR across the lariat 5'SS-BP junction (Padgett et al., 1985; Vogel, Hess, & Börner, 1997), with alternative approaches developed only recently (Awan, Manfredo, & Pleiss, 2013; Bitton et al., 2014; Mercer et al., 2015; Taggart, DeSimone, Shih, Filloux, & Fairbrother, 2012). To date, budding yeast BPs have not been validated using a genome-wide approach.

To experimentally locate BPs, we developed Branch-seq, an untargeted, high-throughput method for identification of lariat BPs and their associated 5'SS. Initially, lariats were stabilized in vivo by deleting DBR1, the debranching enzyme that linearizes lariats in the default intron decay pathway (Chapman & Boeke, 1991). In the first step of Branch-seq, lariats were enriched from dbr1∆ total RNA using a denaturing two-dimensional (2D) polyacrylamide gel. Because the mobility of lariat RNAs (and other circular RNAs) is retarded to different extents at different gel densities compared to linear RNA, lariat and circular RNAs run in an arc above the diagonal produced by linear RNAs (Awan et al., 2013; Chapman & Boeke, 1991; Friedman & Brewer, 1995). A prominent off-diagonal arc was visible in 2D gel analysis of dbr1∆ RNA (Fig. 2-1A). RNA was isolated from the top, middle and bottom portions of the 2D gel arc to enrich for lariats of different sizes (Fig. 2-S1A) and linearized using purified recombinant DBR1 enzyme. Following debranching, standard techniques were used to obtain libraries suitable for paired-end Illumina sequencing. This strategy yields read pairs in which the 3' mapping read corresponds to the BP, and the 5' mapping read identifies the associated 5'SS. The 3' ends mostly correspond to BPs rather than 3'SS because the lariat intermediate stabilized in dbr1∆ yeast is one in which the intron sequence 3' of the BP has been degraded (Fig. 2-S1B). To further characterize yeast introns, we performed a version of random hexamer-primed RNA-seq known as ‘Lariat-seq’ (Awan et al., 2013), again using RNA isolated from a 2D gel arc (steps 2L and 3L; discussed further below).

Branch-seq accurately identified annotated BP and 5'SS. Overall, ~60% of mappable reads corresponded to annotated introns, and ~75% of expressed yeast introns contained one or more read pairs. As an example, read pairs mapping to the intron of PCH2 are shown in Figure 2-1B. The 5' end reads (pink) predominantly began exactly at the annotated 5'SS, which matches the /GTATGT yeast consensus (with ‘/’ indicating the splice junction), while 3'-end reads predominantly began at the presumptive BP of this intron, a CACTAAC sequence near the intron 3' end (differing only at the first position, C rather than T, from the yeast BP consensus motif, TACTAAC, in which the BP nucleotide is the final A). A meta-analysis of all annotated 5'SS and BP confirmed this pattern, with a sharp peak of 5' end read starts at annotated 5'SS, and a similarly sharp peak of 3' end read starts at annotated BP (Fig. 2-1C).

Figure 2-1. Branch-seq accurately identifies BP locations on a genome-wide scale. (A) Schematic of the Branch-seq protocol.
Steps labeled with “B” and “L” correspond to Branch-seq and Lariat-seq, respectively. (B) Branch-seq locates the annotated 5'SS (pink) and BP (blue) in the PCH2 intron (Robinson et al., 2011). Dashed lines show locations of 5'SS (GTATGT), BP (CACTAAC), and 3'SS (AG) sequences. Mismatches from consensus are underlined. BP nucleotide is red and bold. Mismatches in reads are indicated by small red, green, dark blue, and orange horizontal lines. Inset axes show read start locations for PCH2 intron 5'SS and BP reads, where position 0 is the 5'SS or BP nucleotide, respectively. (C) Meta 5'SS and BP read start plots as in (B) but for all annotated 5'SS and BP. Dotted vertical lines at +/- 2 nt. (D) Locations of BP peaks called by the sliding window approach (SW) and GEM-BP relative to annotated BP positions.

Additionally, Branch-seq finds one novel BP adjacent to an annotated BP, suggesting that the annotation needs to be changed (Fig. 2-S1F). A small secondary peak 2 bases upstream of the BP in the meta-analysis likely reflects shifted RT priming (Fig. 2-S1C-E and Supplemental Methods). These results support the utility of Branch-seq for systematic identification of yeast BP and associated 5'SS.

Branch-seq identifies novel BP and associated 5'SS

Application of two independent peak calling algorithms to Branch-seq data identified BP locations with high precision, yielding an unexpectedly large set of 268 “confident novel BPs” (cnBPs). First, we used a simple sliding window approach (winBP) to find peaks of high local read density without using any sequence information. Second, we adapted the existing GEM ChIP-seq peak caller (Guo, Mahony, & Gifford, 2012) to identify BP peaks in software called GEM-BP (Supplemental Methods). GEM-BP uses the sharply peaked distribution of read starts at BPs and the strong BP motif in yeast to accurately call peaks.
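The sliding window idea behind winBP can be illustrated with a minimal sketch. This is not the winBP implementation: the window half-width, the read threshold, and the function name `call_window_peaks` are hypothetical choices made only for illustration, and the input is assumed to be a list of 3' end read start coordinates on one strand of one chromosome.

```python
from collections import Counter

def call_window_peaks(read_starts, half_width=2, min_reads=10):
    """Call peaks of read-start density without using sequence information.

    A position is reported when (a) the window centered on it holds at least
    `min_reads` read starts and (b) it carries the highest read-start count
    in that window (ties go to the leftmost position).
    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    counts = Counter(read_starts)
    peaks = []
    for pos in sorted(counts):
        window = range(pos - half_width, pos + half_width + 1)
        total = sum(counts.get(p, 0) for p in window)
        if total < min_reads:
            continue
        # Highest count wins; on a tie the smaller coordinate wins.
        best = max(window, key=lambda p: (counts.get(p, 0), -p))
        if best == pos:
            peaks.append((pos, total))
    return peaks

# Toy data: a sharp peak at 100, background at 300, a tied doublet at 500/501.
reads = [100] * 12 + [98] * 3 + [101] + [300] * 2 + [500] * 6 + [501] * 6
print(call_window_peaks(reads))
```

A caller of this kind uses only read density, which is consistent with the text's description of winBP as motif-independent; a motif-aware caller such as GEM-BP would add sequence evidence on top of this signal.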
GEM-BP recovered 75% of expressed annotated BP within 3 nt of their annotated locations, while winBP identified 59% of expressed annotated BP, including two not found by GEM-BP, with somewhat lower precision (Table 1, Fig. 2-1D, Table II-S1). The BP motif is highly constrained in S. cerevisiae, with ~90% of annotated BP matching the TACTAAC motif perfectly (Spingola et al., 1999). Overall, GEM-BP peaks matched the consensus BP motif more frequently than winBP peaks, reflecting the use of a motif in the predictions by GEM-BP. To maximize sensitivity, the union of peaks called by both approaches was used, a set of 430 putative novel BP (Table II-S2). We generated a high confidence set of novel BP peaks for all downstream analyses, using the paired-end sequencing information from Branch-seq, which provides a built-in quality control for BP identification. Requiring presence of a typical 5'SS motif in the associated 5' end reads yielded a set of 268 cnBPs, with an estimated false discovery rate (FDR) of 1.1% (Fig. 2-2A, Table II-S3) (see Methods), nearly doubling the number of BPs in budding yeast. The remaining set of 162 putative novel BP with atypical 5'SS showed a modest preference for the /GTATGT consensus (Fig. 2-S2B), suggesting the presence of additional novel BPs, but was not pursued further.

Most of the 268 cnBPs were located in annotated exons, introns or UTRs, but almost one third were located outside of annotated transcripts, sometimes in regions antisense to annotated genes such as cryptic unstable transcripts (CUTs) and stable uncharacterized transcripts (SUTs) (Fig. 2-2B, 2-S2C). These observations suggest that Branch-seq can be used to extend annotation of genic as well as non-genic features in yeast.
For example, in the second exon of the RPL30 gene we observed a substantial peak of more than one hundred Branch-seq reads at a variant BP motif, GGCTAAC, associated with a potential novel 5'SS, pointing to the presence of a second intron in this gene (Fig. 2-2C). The 5' end reads associated with the cnBP in RPL30 began with the sequence GTAAGT, just one mismatch from the yeast 5'SS consensus (Fig. 2-2C). As another example, we observed three distinct peaks of Branch-seq reads in the intron and second exon of the TDA5 gene. These peaks corresponded to the annotated BP (TACTAAC) and to two other sites downstream in the transcript, which were associated with motifs related to the BP motif by one (AACTAAC) or two (GTCTAAC) mismatches (Fig. 2-2D). All three of these peaks were paired with reads mapping to the annotated 5'SS. These data suggest that alternative BPs are used in splicing of this intron, likely yielding at least two or three different 3'SS.

Figure 2-2. Branch-seq locates hundreds of novel BPs. (A) Number of annotated BP recovered by Branch-seq (light orange) compared to number of computationally predicted BP (dark orange) (Meyer et al., 2011). The cnBP (light green) are a subset of all novel BP (dark green). (B) Genomic locations of the 268 cnBP. (C-D) Novel BPs located in the CDS (C-D) and intron (D) of the RPL30 and TDA5 genes. Annotated TDA5 BP and 5'SS are blue. Potential AG 3'SS are depicted. 3'SS confirmed by entropy are indicated by an asterisk (C-D). Potential BP-3'SS pairing is indicated by matching colors (D). (E) Sequence motifs created by MEME of annotated BPs (left) and typical 5'SS cnBPs (middle) recovered by Branch-seq, and human BP motif (right) for comparison. Position 0 is the BP nucleotide.

Comprehensive BP sequencing allowed us to identify BP that deviate from the strict yeast consensus.
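The mismatch counts quoted for these variant motifs can be reproduced with a short sketch. It assumes only the TACTAAC consensus given in the text and compares equal-length 7-mers position by position; the function name `bp_mismatches` is hypothetical.

```python
YEAST_BP_CONSENSUS = "TACTAAC"  # consensus motif as given in the text

def bp_mismatches(motif, consensus=YEAST_BP_CONSENSUS):
    """Count positions at which a candidate BP motif differs from the consensus."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return sum(a != b for a, b in zip(motif.upper(), consensus))

# Motifs mentioned in the text: PCH2 (CACTAAC), TDA5 (AACTAAC, GTCTAAC), RPL30 (GGCTAAC).
for motif in ("TACTAAC", "CACTAAC", "AACTAAC", "GTCTAAC", "GGCTAAC"):
    print(motif, bp_mismatches(motif))
```

Under this simple count, the RPL30 motif GGCTAAC sits two mismatches from the consensus, the same distance as the more divergent TDA5 site.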
Of the 268 cnBP, 51 were a perfect match to the TACTAAC consensus motif and the remaining 217 had up to 4 mismatches, yielding a more degenerate motif when aligned (Fig. 2-2E). Interestingly, the –1 position, which in yeast is considered to have a strong preference for “A”, appears to also tolerate “G”, as is often seen in mammalian BP motifs (Mercer et al., 2015). The –5 to –3 positions are also more degenerate in the cnBP motif, and “T” seems to be tolerated at the –4 and –3 positions. Overall, the cnBP motif resembles known mammalian BP motifs of CTRAC or minimally TNA (R = A or G, N = any base). Surprisingly, these cnBP did not show a peak in conservation at the BP as is observed for annotated BPs, suggesting that these BPs are specific to S. cerevisiae (Fig. 2-S2A). The weaker cnBP motifs might reflect lower levels of splicing (Fig. 2-S3C, Table II-S4) or more frequent regulation of novel BPs than of annotated BPs.

We compared our approach for BP detection to a recently described approach that uses ‘lariat junction’ (LJ) reads that originate from reverse transcription across the 5'SS to BP junction of the lariat (Awan et al., 2013; Bitton et al., 2014; Taggart et al., 2012). For this purpose, we identified Lariat-seq reads that were composed of a pair of segments that mapped near each other, but in a discordant order, and used the ends of these read segment pairs to define 5'SS and BP locations (Fig. 2-3A, 2-S2D, Table II-S5). For example, we detected 23 reads that crossed the 5'SS (GTAAGT) and BP (CACTAAC) of an unannotated intron in the BDF2 gene, which encodes a transcription factor (Fig. 2-3B). The yield of LJ reads was two orders of magnitude higher in Lariat-seq data (450 per 10^6 reads) than in conventional RNA-seq data (5.5 per 10^6), confirming that Lariat-seq synergizes with the LJ approach (Taggart et al. 2012; Awan, Manfredo, and Pleiss 2013). These LJ reads confirmed 41 annotated BPs and 17 novel BPs (Table II-S5), several of which overlapped with cnBPs identified by Branch-seq (Fig. 2-3C). Differences in the novel BPs recovered by Lariat-seq and Branch-seq are in part due to the lariat sizes successfully recovered by each method (Fig. 2-S3D).

Figure 2-3. Lariat-seq junction reads identify BP locations. (A) Schematic of lariat junction read mapping strategy. Green box indicates location of best 5'SS in lariat junction read. (B) Novel intron in BDF2 CDS is supported by Branch-seq reads (top, pink and blue as in Figure 2-1) and Lariat-seq junction reads (middle, 5'SS read fragments in dark green, BP read fragments in light green). Black boxes denote novel 5'SS and BP sequences identified by Branch-seq and Lariat-seq reads. (C) Summary of overlaps among novel BPs identified by Lariat-seq junction reads, cnBPs identified by Branch-seq, and novel splice junctions identified by RNA-seq.

Over 100 additional introns and splice sites in the yeast genome

The unexpectedly large number of new BPs identified by our approaches prompted us to further explore yeast splicing patterns using RNA-seq, which is complementary to Branch-seq in that it identifies 3'SS as well as 5'SS. We hypothesized that some cnBP might be located inside unannotated introns that derive from spliced mRNAs that are quickly degraded by NMD. Therefore, we performed RNA-seq on a upf1∆ strain (which is defective for NMD), as well as wildtype (WT) and dbr1∆ strains, and used stringent criteria to define novel splice junctions from the data. Briefly, RNA-seq reads were mapped using TopHat, allowing novel junction discovery (Kim et al., 2013). After mapping, we required a minimum splice junction entropy of 2 bits (Graveley et al., 2011), which corresponds to uniform coverage of at least 4 distinct start positions, or more variable coverage of a larger number of positions (Fig. 2-S4A).
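The entropy threshold can be illustrated with a minimal sketch. It assumes that junction entropy is the Shannon entropy (in bits) of the distribution of read start positions among the reads spanning a junction; the function name `junction_entropy` and its input format are hypothetical and are not taken from the pipeline used here.

```python
import math
from collections import Counter


def junction_entropy(read_starts):
    """Shannon entropy (bits) of read start positions for one splice junction.

    read_starts: iterable of start coordinates, one per junction-spanning read.
    Assumption: entropy is computed over the fraction of reads at each
    distinct start position.
    """
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four distinct start positions with equal coverage give exactly 2 bits.
uniform = junction_entropy([101, 102, 103, 104])
# Uneven coverage of the same four positions falls below 2 bits,
# so more distinct positions are needed to pass the threshold.
skewed = junction_entropy([101] * 7 + [102, 103, 104])
```

Under this definition a junction supported by many reads that all start at the same position has zero entropy, which is why the filter penalizes likely PCR duplicates and mapping artifacts.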
This approach yielded 136 unannotated splice junctions, 38 of which were observed in a recent study (Kawashima et al., 2014) (Table II-S6). In all, 115 novel introns overlapped 88 annotated genes. Gene ontology analysis of this set yielded a bias for ribosomal protein genes, a class which is enriched for annotated introns as well (Spingola et al., 1999).

Comparing the locations of cnBPs defined by Branch-seq (n = 268) and novel introns defined by RNA-seq (n = 136), we observed a degree of overlap (n = 22) that was significant (binomial test, p<0.001) but relatively modest (Fig. 2-3C), likely because the two protocols have different biases in the RNAs they capture. We observed that novel splice junctions with RNA-seq entropy ≥2 bits are strongly biased toward shorter genes (Fig. 2-S4C). This trend likely reflects features of yeast messages, which have shorter poly(A) tails than in metazoa and can be stable in non-polyadenylated form, biasing recovery by poly(A)+ RNA-seq (Hu, Sweet, Chamnongpol, Baker, & Coller, 2009; Presnyak et al., 2015). Branch-seq also has a bias toward recovery of BPs from shorter introns (Fig. 2-S3A,B,D), but the associated introns are located in genes from all yeast length classes (Fig. 2-S4C).

The novel introns identified by RNA-seq exhibited several characteristics of known yeast introns. The distribution of their lengths mirrored that of known introns (Fig. 2-4A). The splice sites of novel introns resembled motifs of annotated introns, but showed more deviation from the consensus, especially at the +4 and +6 positions of the 5'SS, and lacked a polypyrimidine tract at the 3'SS (Fig. 2-S4B), consistent with a recent report (Kawashima et al., 2014). Presence of weaker splice site motifs suggested that splicing of these introns might be less efficient and/or regulated, making them more difficult to detect.

Figure 2-4. RNA-seq discovers additional novel introns. (A) Length distribution of annotated (blue) and novel (red) splice junctions. Novel splice junctions include any junction with entropy ≥2. (B) Conservation of splice sites for annotated splice sites (black) and novel splice sites located in annotated CDS (blue), introns (yellow), and outside of ORFs (green). Average predicted BP location for intronic 3'SS is denoted with a dotted line; shading is +/- 1 standard deviation (only plotting -30 to +30 nt around the splice site). For 5'SS, annotated n=282, CDS n=14, intron n=19, intergenic n=18. For 3'SS, annotated n=282, CDS n=34, intron n=7, intergenic n=18. (C) Effect on coding length of ORFs from splicing out of novel introns. (D-E) Predicted change to the coding sequence of REC107 (D) and RUB1 (E) after splicing out novel introns. Red arrow indicates location of RUB1 protein cleavage prior to its addition to substrates. (F) RT-PCR sequence (black) aligns to annotated intron of RPL30 (light blue) (Kent, 2002; http://genome.ucsc.edu, SacCer3 genome assembly). Colored triangles represent splice sites. Grey, annotated splice sites; red, AT-AC 5'SS; orange, AT-AC 3'SS 1; green, AT-AC 3'SS 2. Depending on which AC/ 3'SS is used, the second AUG is either 104 nt or 170 nt downstream of the truncated main ORF. (G) WebLogo of published U2-dependent AT-AC intron 5'SS and 3'SS. RPL30 AT-AC splice sites are shown below (Sheth et al., 2006).

New splice sites have distinctive features and conservation

The novel splice junctions identified by RNA-seq were associated with distinct patterns of evolutionary sequence conservation across yeast species compared to annotated introns, suggesting a level of evolutionary constraint that is above background but well below that of annotated introns (Fig. 2-4B). As expected, conservation declined sharply downstream of the 5'SS and increased just upstream of the 3'SS for most annotated introns. For novel splice sites that fell inside of annotated coding sequences, a modest decline in conservation was observed after the 5'SS. This pattern might be expected for an intron that is spliced in some species but retained (or alternatively spliced) in others, giving rise to a conservation pattern that is intermediate between those of typical exons and introns. Novel pairs of splice sites located in annotated introns exhibited a different pattern, with low conservation overall (expected in introns), but with elevated conservation about 20 nt upstream of the 3'SS, in the vicinity of the predicted BP location (shaded yellow, Fig. 2-4B).
Often, splicing of the novel introns shortened the predicted protein sequence by at least 50% (Fig. 2-4C) as in the case of REC107 (Fig. 2-4D), a gene involved early on during meiotic recombination (Malone et al., 1991). Even for cases in which the splicing did not change the length of the ORF dramatically, the function of the protein might be altered, as is the case in RUB1, a ubiquitin-like protein (Fig. 2-4E). The predicted RUB1 protein resulting from the spliced mRNA would be shortened by just 7 residues, but with altered composition of the 20 C-terminal residues, including loss of the C-terminal glycine, which is used in ligation of RUB1 to its targets (Vierstra & Callis, 1999).

AT-AC splice sites are used in yeast

No /AT-AC/ introns have been reported in yeast.
However, we identified RNA-seq splice junction reads supporting the splicing of a novel intron nested inside the annotated RPL30 intron, which had an /AT 5'SS that spliced to one of two different AC/ 3'SS, one of which had high entropy (> 3 bits). The unconventional AT-AC isoform with distal AC/ 3'SS was supported in the WT, dbr1∆, and upf1∆ RNA-seq datasets (Fig. 2-S4A), and we confirmed both novel AT-AC splice junctions by RT-PCR and sequencing (Fig. 2-4F, 2-S5). By RNA-seq analysis, AT-AC splicing is much less abundant than the canonical isoform, representing ~1-2% of mRNAs from this highly-expressed gene locus (see supplemental text in Methods). Yeast lack the distinct machinery of the minor spliceosome that splices most known /AT-AC/ introns in metazoans (Russell, Charette, Spencer, & Gray, 2006), and as expected, the 5'SS motif of the RPL30 intron bore no resemblance to the highly conserved /ATATCCTT consensus typical of animal and plant U12-type AT-AC introns. However, it also deviated quite substantially from the consensus of the few dozen major spliceosomal (‘U2-type’) AT-AC introns that are known in metazoans (Sheth et al., 2006), which have a very strong /ATAAGT consensus (Fig. 2-4G), raising questions about the mechanism of its splicing (see Discussion).

Multi-BP introns occur in at least twelve genes and can impact gene expression

Branch-seq revealed 11 unconventional introns that make use of two BPs (Fig. 2-5A) and one intron that uses three BPs (Fig. 2-2D). In about half of these cases, the novel BP is located in a long intron, but is much closer to the 5'SS than the annotated BP (Fig. 2-5A), consistent with preferential selection of small lariats by Branch-seq (Fig. 2-S3A,B,D). In one case a methylation guide snoRNA, snR18, is located between two BPs, and in two other cases a putative ORF occurs between BPs (Fig. 2-5B). In the snR18 case, use of the upstream BP would shift the intron-encoded RNA from the lariat loop to the lariat tail. Overall, for introns that use two BPs, the first BP tends to have a weaker motif than the downstream BP (Fig. 2-5C).

Branch-seq identified two BPs in the LSM2 gene (Fig. 2-5D), which encodes an Sm-like protein that has both nuclear and cytoplasmic functions and plays a role in RNA processing and turnover (Beggs, 2005). The novel BP in LSM2, AACTAAC, is upstream of the annotated BP and allows for splicing to a novel 3'SS, located between the annotated and novel BPs (Fig. 2-5D,E). The novel isoform, which was confirmed by RT-PCR and sequencing (Fig. 2-5E), contains a PTC in the newly included portion of the downstream exon, making it a potential NMD target (Fig. 2-5D). Isoform-specific primers used for qRT-PCR showed that the PTC isoform is up-regulated about 3-fold in upf1∆ yeast, with the annotated isoform remaining unchanged (Fig. 2-5E,F), implicating NMD in targeting of the novel isoform. Thus, it is likely that AS of the novel isoform is used to regulate the level of LSM2 message and protein.

Figure 2-5. Alternative BP usage reveals previously unknown nonsense-mediated mRNA decay splice isoform. (A) Distance from 5'SS to BP for first and second BP in introns that use two BPs. Red line: x=y. (B) Three genes from (A) where the novel BP (red) is located close to the 5'SS and far from the annotated BP (blue). Intronic transcript position is shown below each intron, with direction indicated by white arrows. (C) Motif of upstream BP (top) and downstream BP (bottom) for 11 introns that use two BPs. (D) Branch-seq read coverage from the top, middle, and bottom sections of the 2D gel arc (Figure 2-S1A) corresponds to usage of the canonical LSM2 BP (blue dotted line and circle) and a “new” BP (red dotted line and circle). Potential alternative 3'SS usage would insert a premature termination codon (octagon stop sign). (E) RT-PCR and subsequent sequencing confirmed the novel LSM2 PTC isoform. (F) qPCR verification that the LSM2 PTC isoform is up-regulated in upf1 null yeast.

Changes in splicing among growth conditions

To investigate the regulation of novel AS, we mapped RNA-seq data from 18 different environmental growth conditions to annotated and novel intron junctions to assess intron retention (Fig. 2-6A) (Waern & Snyder, 2013). MISO (Katz, Wang, Airoldi, & Burge, 2010) was used to quantify “percent spliced in” (PSI, representing the fraction of a gene’s mRNAs that retain the intron) across samples. Novel introns generally had high PSI values relative to previously annotated introns, with some exceptions (Fig. 2-6A). Even some novel splice sites that are poorly conserved in other yeast species, such as RPL43B and MTR2 (Fig. 2-S6), undergo splicing changes in response to environmental conditions. Overall, we observed increased intron retention during stationary phase as compared to all other environmental conditions analyzed (Fig. 2-6A). Additionally, growth in salt or juice conditions each seem to have unique effects on the splicing profile. We also observed a group of genes that have substantial but not complete intron retention during most growth conditions (bottom of Fig. 2-6A).

Since AS is known to be more common in yeast during meiosis than in vegetative growth, we analyzed RNA-seq and ribosome footprint profiling data from a detailed meiosis time course (Brar et al., 2012) to determine if the splicing and translation of the novel splice isoforms that do not appear to be regulated under mitotic growth (Fig. 2-6A, black boxes) are regulated during the dramatic cellular transformation of meiosis. We observed that the YNL194C-YNL195C splice fusion transcript (whose splicing was previously confirmed; Miura et al., 2006) is rarely spliced during environmental stress conditions (Fig. 2-6A), but is differentially represented in the translatome during meiosis (Fig. 2-6B,C, Fig. 2-S7). Changes in translation of these novel isoforms during meiosis suggest involvement in meiotic progression, as in the case of YNL194C, an integral membrane protein required for sporulation (Young et al., 2002) (Fig. 2-6B,C). The other novel isoforms might be regulated under other conditions, or might represent new introns that have yet to evolve function.

Figure 2-6. Rare splicing of novel retained introns mirrors splicing patterns of known introns. Clustering of psi values calculated by MISO for retained introns in (A) RNA-seq of 18 environmental conditions (Waern & Snyder, 2013) and (B) ribosome footprint profiling data in a meiosis time course (Brar et al., 2012). Black: retained introns. Purple: introns spliced out. Side bar: red, novel intron; blue, annotated intron. If the psi value was confident in at least half of the samples, unknown psi values were replaced with the mean psi value in the confident conditions. *alternative splice site to annotated splice site. **only one splice site overlaps gene ORF listed. ***antisense to an annotated transcript. ****intron likely in unannotated UTR of transcript. Alternative splice sites were considered as retained introns if the annotated introns were not detected with entropy ≥2. YLL056C: 5'UTR supported by RNA-seq. IDP3: 5'SS inside ORF. RFU1 and RSB1: 3'SS inside ORF. (C) Sashimi plot (Katz et al., 2010) depiction of ribosome footprint profiling splice junction reads from (B) joining YNL194 and YNL195 transcripts at a few stages of meiosis.

Discussion

The Branch-seq approach introduced here allowed comprehensive identification of BPs and associated 5'SS from individual lariat RNAs with high precision (Fig. 2-1). These BPs revealed unexpected post-transcriptional regulatory capacity of the yeast genome. Examples included genes that make use of multiple BPs with various regulatory consequences. For example, alternative BP usage in the LSM2 pre-mRNA alters 3'SS usage, producing a message that is degraded by NMD, enabling regulation of expression level at the level of splicing (Fig. 2-5), analogous to regulatory strategies used by a number of metazoan splicing factors (Sureau, 2001; Wollerton et al., 2004). We also found that the EFB1 intron contains an alternative BP whose use shifts the location of the snoRNA snR18 between the lariat loop and the lariat tail (Fig. 2-5B), potentially impacting the relative production of mature mRNA and mature snoRNA as seen for other snoRNAs (Hirose et al., 2006).
Some novel introns were spliced at intermediate or even high levels (e.g., those in the MTR2, RPL22, and RPL43B genes), but most appear to be spliced at lower levels than annotated introns in standard growth conditions (Fig. 2-6A). Considering all introns, we observed a large increase in intron retention during stationary growth that affected most known introns, as well as some novel introns such as the one in RPL43B (Fig. 2-6A). Increased retention of a substantial subset of introns was observed in salt stress; however, most novel introns maintained relatively low levels of splicing across most of the environmental conditions examined. Nevertheless, some of the novel introns that exhibited low and constant splicing across all environmental conditions, like PDC1, exhibited large changes in splicing during meiosis (Fig. 2-6B). This observation suggests that some yeast introns may have specialized regulation that is only observed when considering a large number of cell states and conditions. Additionally, further analysis of different types of AS, including alternative 5'SS and 3'SS, may reveal regulation under the conditions we have examined or other conditions.

Defining novel splice junctions from RNA-seq led us to find and validate an intron with AT-AC splice site dinucleotides, nested inside the annotated intron of RPL30. To our knowledge, this is the first AT-AC splice site intron reported in S. cerevisiae. The relatively rare AT-AC introns that occur in metazoans are often spliced by the U12-dependent “minor” spliceosome (Russell et al., 2006). However, the extended 5'SS and BP motifs characteristic of U12-dependent introns are absent in this case, and no evidence for presence of U12 snRNA or related machinery has been found in S. cerevisiae, ruling out involvement of the minor spliceosome (Russell et al., 2006).
The U2-dependent “major” spliceosome is also capable of splicing a small subset of AT-AC introns (Dietrich, Incorvaia, & Padgett, 1997). However, the RPL30 /AT 5'SS does not resemble typical /AT 5'SS spliced by the major spliceosome (Fig. 2-4G), leaving open the question of whether this intron is indeed spliced by the major spliceosome or by some other mechanism (e.g., a protein enzyme or an RNA-based self-splicing mechanism) (Kruger et al., 1982).

Recently, recurrent mutations in several core spliceosome components that recognize BP and intron 3' ends, including U2 snRNP component SF3B1 and the U2AF1 and U2AF2 genes, have been observed in leukemias (Quesada et al., 2012), raising interest in understanding details of BP and 3'SS recognition. Branch-seq is a powerful method for detection of BPs in small lariats, and yeast have provided many insights into the inner workings of the spliceosome (Hossain & Johnson, 2014), making this a suitable system for application of Branch-seq to study BP regulation. Due to the ease of creating double mutants in yeast, Branch-seq could be used to study the effects of perturbations of the core splicing machinery on BP selection by crossing the desired spliceosome mutant to dbr1∆ yeast. Applying this method to other organisms with small introns, such as other fungi, plants, or Drosophila, could aid in detection of novel introns or regulatory mechanisms, such as recursive splicing (Burnette, Miyamoto-Sato, Schaub, Conklin, & Lopez, 2005) or stalled splicing (Dumesic et al., 2013).

Methods

Yeast strains

Strains were grown in YPD (1% Yeast extract, 2% Peptone, 0.01% Adenine hemisulfate, 2% Dextrose) at 30°C with vigorous shaking unless otherwise noted. The null strains were obtained from the deletion collection. WT (s288c): BY4742 Mat α his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0. dbr1Δ: BY4742 Mat α his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 YKL149C::KanMX4. upf1Δ: BY4742 Mat α his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 YMR080C::Kan.
RNA isolation

RNA isolation for Lariat-seq and Branch-seq was performed as follows. Yeast were grown to OD600 0.94-0.98 and were collected by centrifugation at 7000 RPM for 5 min at 4°C. Media was poured off and yeast were washed twice in water and frozen at -80°C. Cells were thawed and transferred to tubes containing 2.8mm ceramic beads and 1mL Trizol (Life Technologies) was added to 1/10 cell pellet. An Omni Bead Ruptor was used to lyse the cells, twice for 20 seconds on ½ max speed and once for 10 seconds on max speed. Samples were incubated at room temp for 5 min, 1/5 volume of chloroform was added and mixed, samples were incubated at room temp for 2-3 min and were spun at max speed for 15 min at 4°C. The upper aqueous layer was transferred to a new tube and precipitated with ½ volume isopropanol. After 5 min on ice, samples were spun at max speed, 4°C for 25 min. The RNA pellet was washed with 70% ethanol before storage at -80°C.

RNA isolation for RNA-seq was performed as follows. Overnight yeast cultures were grown in 5mL YPD media and were diluted in the morning into 50mL YPD and grown to log phase (OD600 0.5 to 1), spun down, and the pellets were frozen in liquid nitrogen. RNA was isolated as in (Clarkson, Gilbert, & Doudna, 2010). Pellets were resuspended in 1mL Acid Phenol and an equal volume of AES buffer (50mM NaAcetate pH 5.2, 10mM EDTA, 1% SDS) was added. In 2mL eppendorf tubes, samples were incubated at 65°C for 10 min with vortexing every minute. Samples were incubated on ice for 5 min and then transferred to a phaselock tube and one volume chloroform was added. After spinning, the top aqueous layer was transferred to a fresh phaselock tube and one volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added, tubes were spun, one volume of chloroform was added, tubes were spun, and the aqueous layer was transferred to a fresh tube to be precipitated with 50uL 3M NaOAc (pH 5.5) and 550uL isopropanol.
Samples were spun at max speed for 25 minutes at 4°C. The pellet was washed twice with 70% ethanol and resuspended in water.

2D PAGE Gels

For all 2D polyacrylamide gels, RNA was mixed with an equal volume of denaturing loading dye and heated at 80-95°C prior to loading. For the Branch-seq gels, ultra-pure SequaGel reagents from National Diagnostics were used to pour 6% (first dimension: D1) and 20% (second dimension: D2) gels. These gels were poured with 1.5mm spacers and ~20cm by ~32cm plates and a metal heat sink was used. D1 was poured the night before with 12 wells and stored at 4°C in saran wrap to maintain moisture. D1 was run at 15W for 1hr and 45 min, stained with SYBR Gold, and imaged on a Safe Light. D2 was poured while D1 was running using a comb with one large well. After removing the D2 comb, running buffer (TBE) was added to the well to aid in D1 gel insertion. A single lane of the D1 gel was cut out with a clean razor and slid into the D2 gel using tweezers and a razor blade, taking care to minimize the number of air bubbles at the interface between the D1 and D2 gels. Additional loading dye was added on top of the D1 gel slice in the D2 gel for easy visualization of running of the D2 gel. D2 was run at 30W for 6hr and 30 min. Gels were stained with SYBR Gold. Slices were excised, and in the case of Fig. 2-S1A were frozen at -20°C. The gels used for Lariat-seq were pre-cast mini gels from Invitrogen (D1: 6%; D2: 10%) in which the wells were manually cut out to make one large well. RNA was eluted from 2D gels in 12mL PAGE elution buffer (30 mM Tris-HCl (pH 7.5), 300 mM NaCl, and 3 mM EDTA) (Ooi et al., 2001) by rotation overnight at 4°C. RNA was precipitated with isopropanol and glycogen.

Debranching enzyme purification

S. cer. DBR1 cDNA was generated from WT S288C yeast and cloned into the pET151 expression vector from Invitrogen. Protein was expressed in Rosetta 2(DE3)pLysS competent cells grown in YT media at 37°C until they were induced with IPTG and grown at 18°C. Bacteria were lysed using Native Lysis Buffer (Qiagen). Protein was purified with a Ni-NTA column (Qiagen) and subsequently over an S200 column (Buffer: 125 mM KCl, 20mM HEPES pH 7.3, 1mM DTT, 10% glycerol). Protein was concentrated (final 50% glycerol) and flash frozen. Protein was tested for RNase activity and debranching activity on linear RNA and an in vitro-spliced lariat.

Isolation of in vitro-spliced Drosophila melanogaster lariat RNA

HeLa nuclear extracts for in vitro splicing were a kind gift from the Reed Lab (Folco, Lei, Hsu, & Reed, 2012). Coupled in vitro transcription and splicing were performed similarly to Folco and Reed (Folco & Reed, 2014), except without addition of α-amanitin, to obtain as many lariats as possible. Reactions were digested with RNase R (Epicentre) at 37°C for 1 hour to obtain radio-labeled FTZ lariats.

Debranching

Debranching was performed similarly to the protocol of Ooi et al. (2001). Briefly, RNA and debranching enzyme were incubated for 1 hour at 30°C in debranching buffer (5X debranching buffer: 100nM Hepes, 625mM KCl, 2.5 mM MgCl2, 5 mM DTT, 50% glycerol). Prior to debranching of the top, middle, and bottom fractions of lariats, radio-labeled FTZ lariat RNA was spiked in to each sample to confirm debranching via gel electrophoresis.
Samples were phenol chloroform extracted after debranching and ethanol precipitated.

PolyA tailing

Debranched lariat RNA was poly(A) tailed using E. coli poly(A) polymerase from NEB for 10 minutes at 37°C and subsequently phenol chloroform extracted and isopropanol precipitated.

Reverse transcription

Reverse transcription was performed using primer
/5Phos/AGATCGGAAGAGCGTCGTGTAGGGAAAG/iSp18/CACTCA/iSp18/GTGACTGGAGTTCCTTGGCACCCGAGAATTCCA/TTTTTTTTTTTTTTTTTTTTVN
(designed in collaboration with Yarden Katz (Katz et al., 2014)) incubated with SuperScriptIII RT (Invitrogen) for 30 min at 48°C. Subsequently 2.1 uL of 1M NaOH was added and samples were incubated at 98°C for 15 min. The RT primer is a modified version of the ribosome footprint profiling RT primer where the 5' end of the RNA gets sequenced first and paired-end, barcoded sequencing is possible (Ingolia, Ghaemmaghami, Newman, & Weissman, 2009). The samples were then run on a 6% TBE-urea gel (Invitrogen) for 93 min at 200V to remove excess RT primer. Gels were stained with SYBR Gold and gel slices were excised where product was observed to run above the RT primer for the top, middle, and bottom lariat samples. Gel slices were shredded and DNA was eluted in 400uL PAGE elution buffer overnight (see 2D gels above). Gel was removed before precipitation using a NanoSep column.

Circularization

Circligase (Epicentre) was used to circularize the gel-isolated RT products for 1 hour at 60°C and the enzyme was inactivated by heating at 80°C for 10 minutes.

PCR

Phusion high-fidelity polymerase (NEB) was used to amplify the circularized products. Illumina PCR primer 1.0
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
was paired with Illumina barcode primers (RPI#s):
(RPI1) CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI2) CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI3) CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI4) CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
Samples were removed after 6, 8, 10 and 12 PCR cycles and run on an 8% TBE gel (Invitrogen) for 40 min at 200V. PCR products were gel isolated by shredding the gel through a hole poked with a needle in the bottom of a 0.5 mL eppendorf tube and eluted in 400uL PAGE elution buffer (see above) at 65°C, shaking at 1400RPM for one hour.
Gel was removed with a NanoSep column and the product was precipitated with isopropanol. Oligonucleotide sequences © 2006-2008 Illumina, Inc. All rights reserved. http://epigenome.usc.edu/docs/resources/core_protocols/Illumina%20Sequence%20Information%20for%20Customers%20DEC2008.pdf

Sequencing

One Illumina MiSeq flow cell was sequenced at the MIT Bio Micro Center (November 2011). 5' end reads were 50 bases and 3' end reads were 250 bases. 3' end reads were sequenced with a custom sequencing primer (GTGACTGGAGTTCCTTGGCACCCGAGAATTCCATTTTTTTTTTTTTTTTTTTT) to avoid sequencing the un-templated As added by the poly(A) tailing reaction. The 3' end sequencing primer was gel purified prior to use in sequencing (primer design might need to be changed for sequencing on other Illumina machines).

Branch-seq read mapping

Reads were trimmed to 30 by 30nt and mapped with Bowtie1 (Langmead, Trapnell, Pop, & Salzberg, 2009) (bowtie-1.0.0) using the following parameters: bowtie -S -m 1 -1 end1reads.fastq -2 end2reads.fastq. Branch-seq reads for each gel slice were mapped to the genome and then combined using samtools merge (samtools-0.1.7a) (Li et al., 2009). Reads were initially mapped to SacCer2 (S288C_reference_genome_R61-1-1_20080605) and subsequently to SacCer3 (S288C_reference_genome_R64-1-1_20110203) downloaded from SGD (http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/).

Peak calling

Peak calling was done in SacCer2 and peak calls were converted to SacCer3 coordinates using the UCSC browser tool, liftOver (http://genome.ucsc.edu/). Peaks were called using the combined reads from the top, middle, and bottom sections of the arc (see samtools merge above). For Figure 2-1D, if there were multiple peaks within 3nt of the annotated BP, the annotated BP was only counted once.

winBP peak calling

The sliding window approach was adapted from Arribere and Gilbert (Arribere & Gilbert, 2013) with some modifications. A 200nt region was taken starting at the 5' end of each chromosome. Average read coverage per nucleotide, α, for this region was calculated using only BP end (second end) reads and was required to be at least 0.1.
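As a rough illustration of the winBP scheme laid out in this subsection (a region-level mean coverage α, 5 nt windows compared against 12α, and 200 nt regions advanced in 100 nt steps), a minimal sketch follows. It is not the code used for the analysis. It assumes that the window statistic is the mean per-nucleotide coverage within the window, that the reported peak is the highest-coverage nucleotide in a qualifying window, and that the input is a per-nucleotide list of BP end read counts for one strand, already oriented 5' to 3'. The function name and parameter names are hypothetical.

```python
def call_window_peaks(coverage, region=200, step=100, window=5,
                      min_alpha=0.1, fold=12.0):
    """Sketch of sliding-window peak calling on one strand.

    coverage: list of BP-end read counts per nucleotide (5' to 3').
    Returns a sorted list of peak positions (0-based indices).
    """
    candidates = set()
    for start in range(0, len(coverage), step):
        seg = coverage[start:start + region]
        if len(seg) < window:
            break
        alpha = sum(seg) / len(seg)      # mean coverage of the region
        if alpha < min_alpha:
            continue                     # region too sparsely covered
        for i in range(len(seg) - window + 1):
            win = seg[i:i + window]
            # Assumption: compare the window's mean coverage to 12 * alpha.
            if sum(win) / window >= fold * alpha:
                # Assumption: report the tallest nucleotide in the window.
                best = max(range(window), key=lambda j: win[j])
                candidates.add(start + i + best)
    # Require at least 1 nt between reported peaks.
    peaks = []
    for pos in sorted(candidates):
        if not peaks or pos - peaks[-1] >= 2:
            peaks.append(pos)
    return peaks


# Toy example: a single sharp pile-up of reads on an otherwise empty strand.
toy = [0] * 1000
toy[500] = 100
toy_peaks = call_window_peaks(toy)
```

Because the regions overlap by half their length, a true peak is usually evaluated against two different local backgrounds, and collecting candidates in a set keeps it from being reported twice.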
A sliding window of 5 nt (196 of these windows/200nt region) within each 200nt region was used to reduce spurious calls in regions with uneven coverage. If coverage in the 5 nt sliding window was at least 12α, a peak was called. At least 1nt was required between reported peaks. Peak calling was done for each strand, always in the 5' to 3' direction. The 200nt region was shifted 100nt down the chromosome, and the steps outlined above were repeated until reaching the end of the chromosome. winBP recovered 58% (153/260) (Table 1) of annotated BPs in expressed genes.

GEM-BP peak calling

To discover BP events from the data, we extended the ChIP-seq and ChIP-exo peak caller GEM4 that calls events with high spatial resolution. Unlike other peak callers, GEM does not assume any specific distribution of reads, and therefore is flexible to adapt to a new data type by learning a data-specific empirical spatial read distribution. We used a +/-10bp window around the confident set of annotated BPs to learn the empirical read distribution (Fig. 2-1C) and used it for peak calling by GEM. To avoid including noisy reads from the non-BP strand, we modified GEM to perform single-strand peak calling and used only the 3' end (BP end) reads as input. As part of the integrated event finding and motif discovery process, GEM discovered the consensus BP motif TACTAAC, some variants that are similar to the consensus motif, and a poly A motif that represents technical artifacts resulting from the anchored oligo(dT) RT step of the protocol. To distinguish events associated with different motifs, we modified GEM to use multiple position weight matrix (PWM) motifs as the positional priors for event discovery. If a base position is matched by multiple motifs, GEM chooses the PWM model that has the more significant p-value to set the positional prior. For each called event, GEM computes an event shape score that quantifies the similarity of the event read distribution to the empirical read distribution. The event shape score is defined as the Pearson correlation of read count values across the +/-10bp bases between the called event and the empirical read distribution. The new functionalities of the GEM software, which we called GEM-BP, were implemented in version 2.6. The following parameters were used to analyze the Branch-seq data: --k 7 --a 2 --q 1 --bp --pp_pwm --not_update_model --nrf --nf.

We then post-processed the GEM-BP event calls to discover BP events using a Random Forest classifier (Breiman, 2001) in the MATLAB software (MathWorks, 2012). The features for the Random Forest include GEM-BP event read count, event shape score, and the binary motif categorical variables. We used the GEM-BP calls that overlap with the annotated BPs as the positive training set, and those that overlap with the tRNA genes as the negative training set. The trained Random Forest classifier was then applied to all of the GEM-BP event calls to make the final BP event calls.

In total, GEM-BP discovered 546 BPs (Table 1), including 75% of expressed BPs (196/260) (Table 1) within 3 nt of their annotated locations (Fig. 2-1D). Of 546 GEM-BP predicted BPs, 47 (8.6%) had more than one mismatch from the BP consensus motif TACTAAC, compared to 74 (21.5%) of the 344 peaks identified by the winBP approach. These numbers indicate that the GEM-BP predictions are more biased toward consensus BPs, presumably because of its use of motif information and training on annotated BPs, which match the consensus very closely, information which is not used by the winBP approach. Thus, we used the union of predictions made by both peak callers for subsequent analyses.

Typical 5'SS filter for putative novel BPs

GEM-BP and winBP together called numerous unannotated BPs in the yeast genome; the union of their peak calls yielded 430 putative novel BP peaks in all (Table 1). To define a high confidence subset of putative novel BPs, the paired-end sequencing information from Branch-seq was used as a built-in quality control for BP identification. Branch-seq data contain strand-specific read pairs connecting the BPs and 5'SS.
Authentic novel BPs resulting from splicing should be associated with a plausible 5'SS motif at the start of the paired 5' end reads, whereas artifactual BP peaks would not be expected to have such a motif. For each BP, we took all BP end (3' end) reads within 5 nt of the BP peak, accounting for strand. We obtained the paired 5'SS read for each BP read in this set and noted the location of the 5'SS read start. We calculated the mode of all 5'SS read start positions for that BP and examined the 6mer motif at that position and at the position one nucleotide 3' of it. We considered 6mers that matched the yeast 5'SS consensus GTATGT perfectly or with at most one mismatch to be ‘typical 5'SS motifs’, and all others to be ‘atypical 5'SS motifs’. Almost all (97%) annotated yeast introns in nuclear genes have typical 5'SS motifs by this definition (Table II-S3). Of the Branch-seq 3' end peaks associated with annotated BPs, 76% (149/196) for GEM-BP and 90% (138/153) for winBP had 5' end peaks at the annotated 5'SS (Table II-S1). This result indicates that our approach can reliably and comprehensively map both the BP and the 5'SS of introns, as intended. After applying the typical 5'SS filter to the 430 putative novel BPs, 268 cnBPs remained. This subset of 268 was treated as high confidence and was used for all downstream analyses. We estimate the FDR for the set of 268 BPs to be 1.1%, which is the genomic background frequency of 6mers matching typical 5'SS motifs in the yeast nuclear genome (Table II-S3, see below). Notably, only 80 cnBPs were called by both GEM-BP and winBP (Table 1), further suggesting that the two methods have different strengths and weaknesses in calling novel BPs and that there is benefit to using both.
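The typical 5'SS filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function names (`mismatches`, `is_typical_5ss`, `classify_bp`) and the toy genome string are hypothetical, coordinates are assumed to be 0-based on the plus strand, and "that position and one position 3'" is interpreted as testing the 6mers that start at the mode position and at the mode position + 1.

```python
from collections import Counter

CONSENSUS_5SS = "GTATGT"  # yeast 5'SS consensus quoted in the text


def mismatches(kmer, consensus=CONSENSUS_5SS):
    """Count positions where a 6mer differs from the consensus (Hamming distance)."""
    if len(kmer) != len(consensus):
        raise ValueError("kmer and consensus must have the same length")
    return sum(a != b for a, b in zip(kmer.upper(), consensus))


def is_typical_5ss(kmer, max_mismatches=1):
    """A 6mer is 'typical' if it has at most one mismatch from GTATGT."""
    return mismatches(kmer) <= max_mismatches


def classify_bp(genome, five_ss_read_starts):
    """Classify one BP peak from the start positions of its paired 5'SS reads.

    Assumption: `genome` is a plus-strand sequence string and the read starts
    are 0-based indices into it (hypothetical layout for illustration).
    """
    # Mode of the 5'SS read start positions for this BP.
    mode_pos, _ = Counter(five_ss_read_starts).most_common(1)[0]
    # Test the 6mer at the mode and the 6mer one position 3' of it.
    for start in (mode_pos, mode_pos + 1):
        kmer = genome[start:start + 6]
        if len(kmer) == 6 and is_typical_5ss(kmer):
            return "typical"
    return "atypical"


# Toy example: the 6mer starting at index 4 is GTATGT.
genome = "ACGTGTATGTTACGGTACGAAGTCCA"
starts = [4, 4, 4, 5, 9]
print(classify_bp(genome, starts))
```

Only peaks classified as typical would be kept as cnBPs under this sketch; minus-strand handling (reverse complementing before comparison) is omitted for brevity.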
Lariat-seq library prep

Reverse transcription was performed on 2D gel-isolated lariat RNA using 1 µL random hexamer primers (3 µg/µL) (Invitrogen) and SuperScript III reverse transcriptase (Invitrogen). The RNA and primer mix was heated at 70°C for 10 minutes and then put on ice. 12 µL of Mix A (4 µL 5x first-strand buffer, 2 µL 100 mM DTT, 1 µL dNTPs (10 mM), 4 µL Actinomycin D (1 mg/mL), 1 µL SuperaseIn (20 U/µL)) was added to the RNA and primer. Then 1 µL of SSIII was added and the RT program was run: 25°C for 10 minutes, 42°C for 50 minutes, 70°C for 15 minutes, 4°C hold. Sample volume was brought up to 200 µL with water, and samples were then phenol-chloroform extracted and ethanol precipitated. Second-strand synthesis was performed with DNA pol I and dUTP to make strand-specific libraries. Next, the samples underwent SPRI-TE (end repair, adenylation, adapter ligation, gel purification #1). Subsequently, uracil digestion was performed with USER, and samples underwent PCR and gel purification before sequencing (1/30 of a HiSeq2000 lane).

RNA-seq library prep

RNA was isolated using the hot acid phenol method (see RNA isolation above) to ensure isolation of high-quality RNA. All 6 samples (2 WT, 2 dbr1Δ, 2 upf1Δ) had RQN (quality) values of 8.8 or higher as measured on the Advanced Analytical machine. Strand-specific libraries were prepared by the MIT Bio Micro Center using the TruSeq™ RNA Sample Prep Kit v2 (RS-122-2101 kit) through cDNA, after which LM-PCR was performed using the Beckman Coulter SPRI-TE system with a 200-400 bp size cutoff. Samples were barcoded and all sequenced in one HiSeq2000 lane.

Genomic background frequency of 5'SS motif

One random position was selected in each of the 298 nuclear-encoded intron-containing genes in the SacCer3 genome annotation. The 6mer motif beginning at this location was scored for the number of mismatches from “GTATGT.” This was done 10 times to obtain 2980 simulated 5'SS in introns.
Ten motifs had 0 mismatches and 24 motifs had 1 mismatch, for an estimated FDR of 1.1% ((10+24)/2980) (Table II-S3).

Lariat tails are largely absent in vivo

Lariat tails appear to be efficiently digested in vivo, as previously reported, as evidenced by a dearth of Lariat-seq reads in the long lariat tail of UBC13 (Fig. 2-S1B). With Branch-seq we are able to see RT priming preferences based on the nucleotides left downstream of the BP nucleotide after digestion of the lariat tail. It appears that 2 nt are generally left after the BP, resulting in RT priming peaks that begin at the +1 or +2 position relative to the BP (Fig. 2-S1C), depending on the genomic sequence at those positions (Fig. 2-S1D-E). The peak at -2 relative to the BP is likely due to mispriming by RT (Fig. 2-S1D). See the Fig. 2-S1 legend for more information. To our knowledge, this is the first report of the precise number of nucleotides downstream of the BP nucleotide left undigested in lariat tails from RNA isolated from dbr1Δ yeast.

Mapping lariat junction reads

Lariat junction reads were identified and aligned in four main steps:

1. Alignment of reads to the S. cerevisiae genome was attempted with the Bowtie read aligner (version 1.0.0); reads that aligned with fewer than 4 mismatches were omitted from further analysis.
2. Each unalignable read was split into two fragments such that each fragment was at least 12 bases long and the hexamer beginning the second fragment had the maximum probability of being sampled from the S. cerevisiae 5'SS position weight matrix. Reads for which this maximum probability was less than 0.01 were omitted from further analysis. Hereafter, the fragments are referred to by their position at the 5' or 3' end of the original read.
3. The fragment pairs were mapped to the S. cerevisiae genome using the Bowtie read aligner, allowing no mismatches. The fragments were required to map in an inverted order (3' fragment upstream of 5' fragment). The final base of each 5' fragment, the putative BP nucleotide, was omitted from this alignment due to the prevalence of mismatches at this position.
4. For all fragment pairs with a valid alignment, the final base of each 5' fragment was re-added. The aligned position of the 3' end of the 5' fragment was called as a BP, and the aligned position of the 5' end of the 3' fragment was called as the corresponding 5'SS.

Skipping across lariat 5'SS-BP junctions

We found that reverse transcriptase often introduces short insertions and deletions when crossing a lariat junction. As a result, the 3' end of the 5' fragment of a lariat junction read does not always fall exactly at the BP. The frequency of these events was determined by comparing the BP location called by each lariat junction read to the known BP location annotated by Meyer et al. (2011) within 25 bp, if one existed. Figure 2-S2D reports the distribution when allowing no mismatches, as used elsewhere in this paper. This criterion precluded observing insertion events, as they were found to always have the sequence UACUACU at the 3' end of the 5' fragment, resulting in mismatches in the last two positions when aligned to the BP consensus motif.
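Step 2 of the mapping procedure above (splitting each unalignable read at the hexamer that best matches the 5'SS motif) can be sketched as follows. This is an illustration under stated assumptions rather than the findbps implementation: the PWM values are invented for the example (the real matrix would be derived from annotated S. cerevisiae 5'SS), and the function names are hypothetical. The 12-base minimum fragment length and the 0.01 probability cutoff are taken from the text.

```python
# Illustrative 5'SS PWM, one column per hexamer position.
# These probabilities are made up for the example, not measured values.
PWM_5SS = [
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.90, "C": 0.04, "G": 0.03, "T": 0.03},
    {"A": 0.05, "C": 0.10, "G": 0.05, "T": 0.80},
    {"A": 0.03, "C": 0.02, "G": 0.93, "T": 0.02},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def hexamer_probability(hexamer, pwm=PWM_5SS):
    """Probability of sampling this hexamer from the PWM (product over columns)."""
    p = 1.0
    for base, column in zip(hexamer, pwm):
        p *= column.get(base, 0.0)  # unknown bases such as N score 0
    return p


def split_read(read, min_len=12, min_prob=0.01, pwm=PWM_5SS):
    """Split a read where the hexamer starting the second fragment scores best.

    Returns (five_prime_fragment, three_prime_fragment, probability), or None
    if the read is too short or the best hexamer scores below `min_prob`.
    """
    best_cut, best_prob = None, -1.0
    # Every cut position leaves at least `min_len` bases on both sides.
    for cut in range(min_len, len(read) - min_len + 1):
        prob = hexamer_probability(read[cut:cut + 6], pwm)
        if prob > best_prob:
            best_cut, best_prob = cut, prob
    if best_cut is None or best_prob < min_prob:
        return None
    return read[:best_cut], read[best_cut:], best_prob


# Toy read: a 5' fragment ending at the BP adenosine, joined to a 5'SS.
read = "CCTTGATTTTACTAA" + "GTATGTTAATTCCAGC"
print(split_read(read))
```

In the procedure described above, the two fragments would then be aligned separately (with the final base of the 5' fragment held out) and required to map in inverted order.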
BP calling from lariat junction reads

To make precise BP calls from the lariat junction reads, we used a probabilistic model based on the observed skipping rates in introns with annotated BPs and a self-learned BP motif position weight matrix (PWM). Reads were separated into clusters based on the proximity of their downstream ends. The ith cluster of reads is denoted by $R_i$. The distribution $P(B_i = x \mid R_i)$, where $B_i$ is a random variable indicating the location of the BP generating $R_i$, was computed using the proportionality

$$P(B_i = x \mid R_i) \propto P(R_i \mid B_i = x)\, P(B_i = x).$$

Assuming a uniform prior and that reads are independent given a BP, we rewrite this proportionality as

$$P(B_i = x \mid R_i) \propto \prod_{r \in R_i} P(r \mid B_i = x).$$

Note that $P(r \mid B_i = x)$ is simply the probability of observing a deletion of the size seen in read $r$ given $B_i = x$.

An EM framework was used to learn a BP motif PWM, which was then used to improve precision. Beginning with an unbiased motif, the following protocol was repeated until the motif did not change between iterations:

1. Calculate $P(B_i = x \mid R_i, M)$, where $M$ is the current motif, by multiplying $P(B_i = x \mid R_i)$ by the probability that the motif implied by $B_i = x$ would be sampled from $M$, and then normalizing by the sum across each cluster.
2. Refine $M$ based on the updated distribution. For each nucleotide at each position in $M$, start with a pseudocount of 1. For all possible $x$, in all clusters $i$, add $P(B_i = x \mid R_i, M)$ to the count for the nucleotide in the respective position, for each position in the motif. Normalize by dividing all counts by the number of clusters plus 4.

Mapping RNA-seq reads for entropy calculations

Paired-end 60 × 60 bp reads (WT, upf1 null, and dbr1 null samples) were initially mapped with TopHat2 (Kim et al., 2013) (tophat-2.0.0.Linux_x86_64), giving TopHat no annotations and allowing it to discover novel splice junctions, using the following parameters:

    tophat -i 20 -I 10000 -a 10 --segment-length 15 --bowtie1 SacCer3 end1.fastq end2.fastq

Each barcoded sample was mapped on its own, and additionally all samples were mapped together to find as many novel splice junctions as possible. A custom Bowtie index was created for all splice junctions found by TopHat by concatenating the 50 nt of sequence immediately before and after each junction, to ensure that reads had at least a 10 nt overhang on each side of the junction. Bowtie1 was run with this custom index (genome + novel splice junctions) on each end of each sequencing library separately, because paired-end reads would not be able to map as pairs to the many 100 nt fragments in this custom index. Bowtie was run as follows:

    bowtie -S -m 1 SacCer3_custom_index one_end_reads.fastq outfile.sam

Bowtie read mapping to the custom splice index was used to calculate the entropy of each splice junction, as done in Graveley et al. (2011), from the positions around the junction where read starts may fall:

$$p_i = \frac{\text{reads at offset } i}{\text{total reads mapped to the junction window}}, \qquad \text{Entropy} = -\sum_i p_i \, \frac{\log p_i}{\log 2}$$

RPL30 AT-AC isoforms

These isoforms insert a stop codon early in the message, generating an upstream open reading frame (uORF). These isoforms might therefore be translated under specific conditions via uORF-mediated translational regulation (Hinnebusch, 2006), potentially producing a truncated protein comprising the C-terminal half of full-length RPL30. RPL30 is known to regulate splicing and translation of transcripts from the RPL30 locus by binding to RNA secondary structure at the 5' end of the pre-mRNA or mRNA.

Conservation

PhastCons scores were downloaded from the UCSC genome browser (phastCons7way) for the novel BP and novel splice site analyses. For the novel splice site plots, the entire region surrounding the splice site in the figure had to fall within the region in question (i.e., intron or CDS).
“Intergenic” refers to any region completely outside of a CDS or intron. For the BP conservation plot, only the location of the BP was considered when classifying the BP by location.

Protein length analysis

For all novel splice junctions with entropy of at least 2 that overlap an annotated gene, the protein sequence of the resultant transcript was constructed. The length of each novel protein sequence was compared to the length of the annotated protein from the same gene and is reported in Figure 2-4C. When constructing the novel protein sequences, the following assumptions were made:

1. In cases where a gene has multiple novel splice junctions, only one is considered at a time (i.e., if there are 3 novel splice junctions in one gene, three protein sequences are created).
2. All annotated introns are spliced out, except if they overlap the novel splice junction being considered at the time.
3. If a novel splice junction removes the annotated translation start site, the next available AUG is used.

MISO analysis of splicing

Retained intron (RI) annotations were created from all splice junctions with entropy ≥ 2. Retained introns were splice junctions detected in the WT, upf1 null, or dbr1 null samples that did not overlap any other detected splice junction, annotated or novel. To build the RI MISO annotations, the 200 nt flanking the intron were used as exonic sequence. MISO (misopy/0.4.6) was run. For the Waern et al. data (downloaded from http://downloads.yeastgenome.org/published_datasets/Waern_2013_PMID_23390610/fastq/), --read-length was set to 76. For the Brar et al. data (GEO accession number GSE34082), only reads of length 28-30 nt were used and --read-length was set to 29. Only footprints are shown for the Brar et al. data because the total RNA libraries had few reads in the 28-30 nt range. Prior to mapping the Brar et al. data, poly(A) adaptor sequences were trimmed from the reads using Cutadapt. Brar et al. and Waern et al. reads were mapped with TopHat2 to the genome, defined splice junctions (UCSC, sacCer3), and novel splice junctions with entropy ≥ 2 in the WT, upf1 null, and dbr1 null RNA-seq (see above). Summary tables from MISO output were generated for events with x=1, y=0, n=20, psi confidence = 0.5 (see “Using the read class counts” https://miso.readthedocs.org/en/fastmiso/). These were considered “confident” PSI values (see below).

Clustering of PSI values

If an event had confident PSI values in at least half of the conditions, the missing PSI values were replaced with the mean PSI from the confident samples. Clustering was done with heatmap.2 in R (Warnes et al., 2015).

Cufflinks (RNA-seq FPKMs)

Cufflinks (Trapnell et al., 2012) (version 2.2.1) was used to calculate FPKMs for the RNA-seq data using the command

    cuffdiff -o . --library-type fr-firststrand -u -N -b SacCer3.fsa saccharomyces_cerevisiae_R64-1-1_20110208.gff dbr1-1.bam,dbr1-2.bam upf1-1.bam,upf1-2.bam wt1.bam,wt1.bam

Branch-seq CPM calculations

Branch-seq CPMs were calculated using the formula

$$\mathrm{CPM} = \frac{F}{L \cdot (M / 1{,}000{,}000)}$$

where $M$ is the total number of mapped reads, $F$ is the number of strand-specific BP (3' end) reads within the $L$ nucleotides centered on the BP peak, and $L = 11$ nt.

Genes with multiple BPs

5'SS-BP pairs from annotated introns with computationally predicted BPs (282; Meyer et al., 2011) and all 268 cnBPs with typical 5'SS were considered in this analysis, for a total of 550 5'SS-BP pairs.
Any overlapping 5'SS-BP pairs on the same strand were grouped into one “intron island.” For islands that contained 2 or more BPs, a BP motif with 2 or fewer mismatches from “TACTAAC” was required within 3 nt of the BP peak to keep the peak for downstream analyses. This yielded 11 intron islands that use 2 BPs and one intron island that uses 3 BPs. For the genes that use 2 BPs, the distance from the 5'SS to the BP is the distance from each BP to its paired 5'SS. BP1 is the 5'SS-proximal (more 5') BP in the intron island. Sequence logos were made with WebLogo (Crooks, Hon, Chandonia, & Brenner, 2004).

Novel and annotated BP motifs

Sequences 15 nt upstream and downstream of the BP peaks were submitted to MEME (Bailey et al., 2009) (version 4.10.0) to generate sequence logos. Only BPs detected by Branch-seq are included in the logos in Figure 2-2. The human BP motif was generated using sequences 10 nt upstream and downstream of the BP nucleotide from the annotated BPs of Mercer et al. (2015). 1000 sequences were submitted to MEME (the maximum MEME accepts) to generate the motif.
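The grouping of overlapping 5'SS-BP pairs into intron islands, described under "Genes with multiple BPs", can be sketched as an interval-merging pass. The tuple layout, the function name `intron_islands`, and the toy coordinates are hypothetical, and treating pairs that share a boundary coordinate as overlapping is an assumption; the requirement for a near-consensus TACTAAC motif within 3 nt of each peak is not included here.

```python
from collections import defaultdict


def intron_islands(pairs):
    """Group overlapping 5'SS-BP pairs on the same strand into islands.

    `pairs` is an iterable of (chrom, strand, five_ss, bp) tuples with integer
    coordinates (hypothetical layout). Returns a list of dicts, one per island,
    each holding the island span and the set of BP positions it contains.
    """
    by_strand = defaultdict(list)
    for chrom, strand, five_ss, bp in pairs:
        # Order the endpoints so minus-strand pairs are handled the same way.
        start, end = sorted((five_ss, bp))
        by_strand[(chrom, strand)].append((start, end, bp))

    islands = []
    for (chrom, strand), intervals in by_strand.items():
        intervals.sort()
        current = None
        for start, end, bp in intervals:
            if current is not None and start <= current["end"]:
                # Overlaps the open island: extend it and record the BP.
                current["end"] = max(current["end"], end)
                current["bps"].add(bp)
            else:
                current = {"chrom": chrom, "strand": strand,
                           "start": start, "end": end, "bps": {bp}}
                islands.append(current)
    return islands


# Toy example: two plus-strand pairs share a 5'SS but use different BPs.
pairs = [
    ("chrII", "+", 100, 180),
    ("chrII", "+", 100, 195),
    ("chrII", "+", 400, 460),
    ("chrII", "-", 190, 120),
]
islands = intron_islands(pairs)
multi_bp = [isl for isl in islands if len(isl["bps"]) >= 2]
print(len(islands), len(multi_bp))
```

Islands with two or more distinct BP positions correspond to the multi-BP candidates that would then be passed to the motif check.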
LSM2 qPCR primer sequences

Actin primers:
ScerACT1_junct_F: ATGGATTCTGAGGTTGCTGCT
ScerACT1_mRNA_Rev: GGAGTCTTTTTGACCCATACCGA

LSM2 constitutive exon:
LSM2 qPCR Exon 2F constitutive: TAAAAAACGACATTGAAATAAAAGGTACA
LSM qPCR Exon 2R constitutive: TTCATCTGTGCATGATATGTTGTCTA

LSM2 novel 3'SS (PTC isoform):
LSM2 qPCR new 3'ss junction F: GTGGTCGTAGAGTCAAGTACTAAC
LSM qPCR Exon 2R constitutive: TTCATCTGTGCATGATATGTTGTCTA

LSM2 annotated 3'SS isoform:
LSM2 qPCR canonical (normal) 3'ss junction F: GTGGTCGTAGAGTTAAAAAACGAC
LSM qPCR Exon 2R constitutive: TTCATCTGTGCATGATATGTTGTCTA

RNA14 (NMD negative control):
GG10_for: ATGTCCAGCTCTACGACTCCTGAT
GG11_rev: GCGTATGACTCTTGAGTTTCCAAA
(From Joshua Arribere (Arribere & Gilbert, 2013))

TCA17 (NMD positive control):
GG8_for: GCCTTGCTTCGTATCATTGATAGA
GG9_rev: CATCATCAGCTCCACTTAGGCTTT
(From Joshua Arribere (Arribere & Gilbert, 2013))

RPL30 primer sequences

RT: SuperScript II protocol (Invitrogen)
GG13_YGL030W_rev: AAGCCAACTTTTGGTTGATAGA
PCR: Phusion (NEB)
GG14:YGL030W_5'end_for: agaccggagtgtttaagaacct
GG15:YGL030W_rev_ATACjunc: TAACTGGGGCctgttgaaat

SED1 primers

For Figure 2-S5B:
RT: Random hexamers (Invitrogen), following SuperScript II protocol (Invitrogen).
PCR: Phusion (NEB)
GG17:SED1_for: TACATCTTTGCCACCAAGCA
GG18:SED1_rev: TTTGGTGGTAGTGCCCTTAGA

For Figure 2-S5E: SED1 apparent RT artifact
Colony PCR was performed to add a T7 promoter sequence to the start of the SED1 sequence. The PCR product was gel extracted and used as a template for T7 in vitro transcription (Epicentre AmpliScribe™ T7-Flash™ Transcription Kit), DNA was digested, and the RNA product was cleaned via phenol-chloroform extraction. RNA was gel extracted using UV shadowing visualization. RT and PCR were performed as in Figure 2-S5B.
Scer_SED1_colony_Forward: TAATACGACTCACTATAGGGgacaagcaaaataaaatacgttcg
Scer_SED1_colony_Reverse: ttaaactacccctattgcttttaga

Plotting

Additional plots in this paper were made with ggplot2 (Wickham, 2009), IGV (Robinson et al., 2011), matplotlib, Pictogram, WebLogo, and MEME.

Data access

The data can be found under GEO accession number GSE68022. GEM-BP (GEM 2.6) software for peak calling can be downloaded from http://cgs.csail.mit.edu/gem/versions.html
Code to find BPs from lariat junction reads can be downloaded from https://github.com/jpaggi/findbps

Acknowledgements

We thank Andy Berglund for initial ideas on the Branch-seq protocol and members of the Burge lab for helpful discussions. We thank the Reed lab for coupled in vitro splicing and translation reagents and protocols that were used in the development of Branch-seq. We thank Yarden Katz for primer design assistance, Shijie Zhao for initial conservation analysis of novel splice sites, David Weinberg for personal communication, Josh Arribere for sporulating dbr1Δ yeast, the Sauer and King labs for assistance with protein purification, Thomas Hansen and Angela Brooks for helpful discussions, and the MIT Bio Micro Center for sequencing assistance. This work was supported by an NIH training grant, by an NSF equipment grant (no. 0821391), and by a research grant from the NIH (C.B.B.).

Author contributions

GMG, ETW, and CBB designed the study. GMG performed the experiments. GMG and JMP analyzed the data. YG and DG contributed to analysis tools. BZ processed the Brar et al. data. GMG, JMP, YG, BZ, WVG, and CBB wrote the manuscript.

Supplemental figures

Figure 2-S1. Additional details pertaining to the Branch-seq protocol. (A) Left: 2D gel used to isolate lariats from top, middle, and bottom sections of the arc. Right: Top and bottom slices excised. D1: 6% TBE-urea. D2: 20% TBE-urea. (B) Read coverage (green) in the UBC13 intron from Lariat-seq. Depletion of reads between the BP and 3'SS indicates that lariat tails are digested when lariats accumulate in dbr1Δ yeast (Chapman & Boeke, 1991). (C) Additional examples, like the inset in Figure 2-1B, of read start plots for BPs in 4 individual introns. The majority of reads are located at the +1 or +2 position on an intron-by-intron basis. (D) Hypothesis for the predominant +1 vs. +2 read start position in individual introns. RNA sequence is in black; question marks are unknown nucleotides after the BP. The BP A is in red. The RT primer (green) may prime at different locations and produce sequencing products (blue arrow) starting at different positions relative to the BP nucleotide. +1 sequencing is expected if the nucleotide after TACTAAC is an A because of the anchored oligo(dT) priming step in RT.
Similarly, the +2 position is expected if the nucleotide after TACTAAC is C, G, or T. Sequencing at -2 is due to mispriming of the anchored oligo(dT) primer over the terminal C of the BP motif. (E) Genomic sequence immediately downstream of annotated BPs (boxed) with the maximum peak from (C) at +1 (left) and +2 (right) confirms the hypothesis in (D). (F) Branch-seq reads in the EFM5 intron are shifted 5 nt from the annotated BP location (blue underline), corresponding to an AACTAAC BP (red underline).

Figure 2-S2. Further characterization of novel BPs. (A) Left: Novel BPs (blue) are not conserved, in contrast to annotated BPs (red). Right: Novel BPs from the blue line in the left plot broken down by genomic location. (B) 5'SS motif of the 162 putative novel BPs with atypical 5'SS. (C) A novel BP overlapping the YDL138W ORF (plus strand) comes from the minus strand, potentially from a longer form of the annotated CUT/SUT on the minus strand. The novel BP is confirmed by one Branch-seq read pair and several Lariat-seq junction reads. (D) RT sometimes skips over the BP nucleotide in Lariat-seq junction reads (see methods).

Figure 2-S3. Characteristics of lariats captured by Branch-seq. (A) Comparison of expression levels of lariats recovered in Branch-seq (combined top, middle, and bottom slices of arc) to expression of their parent mRNA in poly(A)-selected RNA-seq. Only annotated BPs are plotted. (B) Same as (A) but with regression calculated for different lariat sizes, suggesting that Branch-seq read counts are semi-quantitative for lariat loops smaller than 100 nt. (C) Expression level of annotated and novel BPs recovered by Branch-seq. (D) Lariat loop lengths recovered by Branch-seq and Lariat-seq LJ reads.

Figure 2-S4. Novel introns confirmed by entropy resemble annotated introns but preferentially come from short transcripts. (A) Entropy of annotated (green) and novel (pink) splice junctions, separated by splice site motif: AT/AC, GC/AG, GT/AG. An entropy cutoff of 2 was used to define novel splice junctions (Graveley et al., 2011). (B) 5'SS and 3'SS motifs for annotated (top) and novel (bottom) splice sites. (C) Gene lengths (TSS to poly(A) site) (Pelechano, Wei, & Steinmetz, 2013) for genes containing novel BPs identified in Branch-seq and genes containing novel introns with entropy ≥ 2 identified in RNA-seq data.

Figure 2-S5. Experimental testing of AT-AC splice site introns. RT-PCR on total RNA to verify (A) RPL30 and (B) SED1 AT-AC splice sites. The SED1 AT-AC splice site intron is located inside a long repeat, (C) highlighted in green and (D) shown in a dot plot. (E) RT-PCR on in vitro transcribed full-length SED1 RNA. The presence of a product of the expected spliced size here suggests an RT artifact.

Figure 2-S6. Conservation of novel intron splice sites from isoforms that show splicing patterns similar to annotated introns. Arrows above each splice site indicate sequence direction.
UCSC browser snapshots are shown for splice sites located outside of coding sequences.

Figure 2-S7. Translation of YNL194-YNL195C fusion transcript changes throughout meiosis time course. Sashimi plots depict reads in exons and reads spanning splice junctions (numbered arcs), with the PSI value shown to the right with confidence bounds (tie fighter plot). Plots are ordered by progression through the meiosis time course from Brar et al. (Brar et al., 2012) for (A) ribosome footprint profiling data.

Tables

Table 1. Summary of BP peak calling analysis.

Peak Caller   No. known BP   No. putative novel BP   No. of cnBP
winBP         153            191                     126
GEM-BP        196            350                     222
Overlap       151            111                     80
Union         198            430                     268
References

Aebi, M., Hornig, H., Padgett, R. A., Reiser, J., & Weissmann, C. (1986). Sequence requirements for splicing of higher eukaryotic nuclear pre-mRNA. Cell, 47(4), 555–565. doi:10.1016/0092-8674(86)90620-3
Arribere, J. A., & Gilbert, W. V. (2013). Roles for transcript leaders in translation and mRNA decay revealed by transcript leader sequencing. Genome Research, 23(6), 977–987. doi:10.1101/gr.150342.112
Awan, A. R., Manfredo, A., & Pleiss, J. A. (2013). Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans. Proceedings of the National Academy of Sciences, 110(31), 12762–12767. doi:10.1073/pnas.1218353110
Bailey, T. L., Boden, M., Buske, F. A., Frith, M., Grant, C. E., Clementi, L., et al. (2009). MEME SUITE: tools for motif discovery and searching. Nucleic Acids Research, 37(Web Server issue), W202–8. doi:10.1093/nar/gkp335
Beggs, J. D. (2005). Lsm proteins and RNA processing. Biochemical Society Transactions, 33(Pt 3), 433–438. doi:10.1042/BST0330433
Bitton, D. A., Rallis, C., Jeffares, D. C., Smith, G. C., Chen, Y. Y. C., Codlin, S., et al. (2014). LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Research, 24(7), 1169–1179. doi:10.1101/gr.166819.113
Bradley, R. K., Merkin, J., Lambert, N. J., & Burge, C. B. (2012). Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution. PLoS Biology, 10(1), e1001229. doi:10.1371/journal.pbio.1001229
Brar, G. A., Yassour, M., Friedman, N., Regev, A., Ingolia, N. T., & Weissman, J. S. (2012). High-resolution view of the yeast meiotic program revealed by ribosome profiling. Science (New York, NY), 335(6068), 552–557. doi:10.1126/science.1215110
Breiman, L. (2001). Random forests. Machine Learning.
Burnette, J. M., Miyamoto-Sato, E., Schaub, M. A., Conklin, J., & Lopez, A. J. (2005). Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics, 170(2), 661–674. doi:10.1534/genetics.104.039701
Chapman, K. B., & Boeke, J. D. (1991). Isolation and characterization of the gene encoding yeast debranching enzyme. Cell, 65(3), 483–492. doi:10.1016/0092-8674(91)90466-C
Clarkson, B. K., Gilbert, W. V., & Doudna, J. A. (2010). Functional overlap between eIF4G isoforms in Saccharomyces cerevisiae. PloS One, 5(2), e9114. doi:10.1371/journal.pone.0009114
Crooks, G. E., Hon, G., Chandonia, J.-M., & Brenner, S. E. (2004). WebLogo: a sequence logo generator. Genome Research, 14(6), 1188–1190. doi:10.1101/gr.849004
Davis, C. A. (2000). Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast. Nucleic Acids Research, 28(8), 1700–1706. doi:10.1093/nar/28.8.1700
Dietrich, R. C., Incorvaia, R., & Padgett, R. A. (1997). Terminal Intron Dinucleotide Sequences Do Not Distinguish between U2- and U12-Dependent Introns. Molecular Cell, 1(1), 151–160. doi:10.1016/S1097-2765(00)80016-7
Dumesic, P. A., Natarajan, P., Chen, C., Drinnenberg, I. A., Schiller, B. J., Thompson, J., et al. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell, 152(5), 957–968. doi:10.1016/j.cell.2013.01.046
Folco, E. G., & Reed, R. (2014). In vitro systems for coupling RNAP II transcription to splicing and polyadenylation. Methods in Molecular Biology (Clifton, NJ), 1126, 169–177. doi:10.1007/978-1-62703-980-2_13
Folco, E. G., Lei, H., Hsu, J. L., & Reed, R. (2012). Small-scale nuclear extracts for functional assays of gene-expression machineries. Journal of Visualized Experiments : JoVE, (64). doi:10.3791/4140
Friedman, K. L., & Brewer, B. J. (1995). Analysis of replication intermediates by two-dimensional agarose gel electrophoresis. In Methods in Enzymology (Vol. 262, pp. 613–627). Elsevier. doi:10.1016/0076-6879(95)62048-6
González, C. I., Wang, W., & Peltz, S. W. (2001). Nonsense-mediated mRNA decay in Saccharomyces cerevisiae: a quality control mechanism that degrades transcripts harboring premature termination codons. Cold Spring Harbor Symposia on Quantitative Biology, 66, 321–328.
Graveley, B. R., Brooks, A. N., Carlson, J. W., Duff, M. O., Landolin, J. M., Yang, L., et al. (2011). The developmental transcriptome of Drosophila melanogaster. Nature, 471(7339), 473–479. doi:10.1038/nature09715
Guo, Y., Mahony, S., & Gifford, D. K. (2012). High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Computational Biology, 8(8), e1002638. doi:10.1371/journal.pcbi.1002638
Hinnebusch, A. G. (2006). Gene-specific translational control of the yeast GCN4 gene by phosphorylation of eukaryotic initiation factor 2. Molecular Microbiology, 10(2), 215–223. doi:10.1111/j.1365-2958.1993.tb01947.x
Hirose, T., Ideue, T., Nagai, M., Hagiwara, M., Shu, M.-D., & Steitz, J. A. (2006). A spliceosomal intron binding protein, IBP160, links position-dependent assembly of intron-encoded box C/D snoRNP to pre-mRNA splicing. Molecular Cell, 23(5), 673–684. doi:10.1016/j.molcel.2006.07.011
Hossain, M. A., & Johnson, T. L. (2014). Using yeast genetics to study splicing mechanisms. Methods in Molecular Biology (Clifton, N.J.), 1126, 285–298. doi:10.1007/978-1-62703-980-2_21
Hu, W., Sweet, T. J., Chamnongpol, S., Baker, K. E., & Coller, J. (2009). Co-translational mRNA decay in Saccharomyces cerevisiae. Nature, 461(7261), 225–229. doi:10.1038/nature08265
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S., & Weissman, J. S. (2009). Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science (New York, N.Y.), 324(5924), 218–223. doi:10.1126/science.1168978
Juneau, K., Nislow, C., & Davis, R. W. (2009). Alternative splicing of PTC7 in Saccharomyces cerevisiae determines protein localization. Genetics, 183(1), 185–194. doi:10.1534/genetics.109.105155
Katz, Y., Li, F., Lambert, N. J., Sokol, E. S., Tam, W.-L., Cheng, A. W., et al. (2014). Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state. eLife, 3, e03915. doi:10.7554/eLife.03915
Katz, Y., Wang, E. T., Airoldi, E. M., & Burge, C. B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods, 7(12), 1009–1015. doi:10.1038/nmeth.1528
Kawashima, T., Douglass, S., Gabunilas, J., Pellegrini, M., & Chanfreau, G. F. (2014). Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genetics, 10(4), e1004249. doi:10.1371/journal.pgen.1004249
Kent, W. J. (2002). BLAT--the BLAST-like alignment tool. Genome Research, 12(4), 656–664. doi:10.1101/gr.229202
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., & Salzberg, S. L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 14(4), R36. doi:10.1186/gb-2013-14-4-r36
Královicová, J., Lei, H., & Vorechovský, I. (2006). Phenotypic consequences of branch point substitutions. Human Mutation, 27(8), 803–813. doi:10.1002/humu.20362
Kruger, K., Grabowski, P. J., Zaug, A. J., Sands, J., Gottschling, D. E., & Cech, T. R. (1982). Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell, 31(1), 147–157.
Langmead, B., Trapnell, C., Pop, M., & Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), R25. doi:10.1186/gb-2009-10-3-r25
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16), 2078–2079. doi:10.1093/bioinformatics/btp352
Malone, R. E., Bullard, S., Hermiston, M., Rieger, R., Cool, M., & Galbraith, A. (1991). Isolation of mutants defective in early steps of meiotic recombination in the yeast Saccharomyces cerevisiae. Genetics, 128(1), 79–88.
MathWorks, I. (2012). MathWorks: MATLAB and Statistics Toolbox Release - Google Scholar.
Mercer, T. R., Clark, M. B., Andersen, S. B., Brunck, M. E., Haerty, W., Crawford, J., et al. (2015). Genome-wide discovery of human splicing branchpoints. Genome Research, 25(2), 290–303. doi:10.1101/gr.182899.114
Meyer, M., Plass, M., Pérez-Valle, J., Eyras, E., & Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Molecular Cell, 43(6), 1033–1039. doi:10.1016/j.molcel.2011.07.030
Miura, F., Kawaguchi, N., Sese, J., Toyoda, A., Hattori, M., Morishita, S., & Ito, T. (2006). A large-scale full-length cDNA analysis to explore the budding yeast transcriptome. Proceedings of the National Academy of Sciences, 103(47), 17846–17851. doi:10.1073/pnas.0605645103
Ooi, S. L., Dann, C., Nam, K., Leahy, D. J., Damha, M. J., & Boeke, J. D. (2001). RNA lariat debranching enzyme. Methods in Enzymology, 342, 233–248.
Padgett, R. A., Konarska, M. M., Aebi, M., Hornig, H., Weissmann, C., & Sharp, P. A. (1985). Nonconsensus branch-site sequences in the in vitro splicing of transcripts of mutant rabbit beta-globin genes. Proceedings of the National Academy of Sciences, 82(24), 8349–8353.
Padgett, R. A., Konarska, M. M., Grabowski, P. J., Hardy, S. F., & Sharp, P. A. (1984). Lariat RNA's as intermediates and products in the splicing of messenger RNA precursors. Science (New York, NY), 225(4665), 898–903.
Pan, Q., Shai, O., Lee, L. J., Frey, B. J., & Blencowe, B. J. (2008). Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nature Genetics, 40(12), 1413–1415. doi:10.1038/ng.259
Pelechano, V., Wei, W., & Steinmetz, L. M. (2013).
Extensive transcriptional heterogeneity revealed by isoform profiling. Nature, 497(7447), 127–131. doi:10.1038/nature12121 Pleiss, J. A., Whitworth, G. B., Bergkessel, M., & Guthrie, C. (2007). Rapid, transcript-­‐specific changes in splicing in response to environmental stress. Molecular Cell, 27(6), 928–937. doi:10.1016/j.molcel.2007.07.018 104 Presnyak, V., Alhusaini, N., Chen, Y.-­‐H., Martin, S., Morris, N., Kline, N., et al. (2015). Codon optimality is a major determinant of mRNA stability. Cell, 160(6), 1111–1124. doi:10.1016/j.cell.2015.02.029 Quesada, V., Conde, L., Villamor, N., Ordóñez, G. R., Jares, P., Bassaganyas, L., et al. (2012). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nature Genetics, 44(1), 47–52. doi:doi:10.1038/ng.1032 Rain, J. C. (1997). In vivo commitment to splicing in yeast involves the nucleotide upstream from the branch site conserved sequence and the Mud2 protein. The EMBO Journal, 16(7), 1759–1771. doi:10.1093/emboj/16.7.1759 Reich, C. I., VanHoy, R. W., Porter, G. L., & Wise, J. A. (1992). Mutations at the 3′ splice site can be suppressed by compensatory base changes in U1 snRNA in fission yeast. Cell, 69(7), 1159–1169. doi:10.1016/0092-­‐8674(92)90637-­‐R Robinson, J. T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., & Mesirov, J. P. (2011). Integrative genomics viewer. Nature Biotechnology, 29(1), 24–26. doi:10.1038/nbt.1754 Ruskin, B., & Green, M. R. (1985). An RNA processing activity that debranches RNA lariats. Science (New York, NY), 229(4709), 135–140. Russell, A. G., Charette, J. M., Spencer, D. F., & Gray, M. W. (2006). An early evolutionary origin for the minor spliceosome. Nature, 443(7113), 863–866. doi:10.1038/nature05228 Séraphin, B., & Kandels-­‐Lewis, S. (1993). 3′ splice site recognition in S. cerevisiae does not require base pairing with U1 snRNA. Cell, 73(4), 803–812. doi:10.1016/0092-­‐
8674(93)90258-­‐R Sheth, N., Roca, X., Hastings, M. L., Roeder, T., Krainer, A. R., & Sachidanandam, R. (2006). Comprehensive splice-­‐site analysis using comparative genomics. Nucleic Acids Research, 34(14), 3955–3967. doi:10.1093/nar/gkl556 Smith, C. W., & Nadal-­‐Ginard, B. (1989). Mutually exclusive splicing of alpha-­‐tropomyosin exons enforced by an unusual lariat branch point location: implications for constitutive splicing. Cell, 56(5), 749–758. Spingola, M., Grate, L., Haussler, D., & Ares, M. (1999). Genome-­‐wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae., 5(2), 221–234. Sureau, A. (2001). SC35 autoregulates its expression by promoting splicing events that destabilize its mRNAs. The EMBO Journal, 20(7), 1785–1796. doi:10.1093/emboj/20.7.1785 Taggart, A. J., DeSimone, A. M., Shih, J. S., Filloux, M. E., & Fairbrother, W. G. (2012). Large-­‐
scale mapping of branchpoints in human pre-­‐mRNA transcripts in vivo. Nature Structural & Molecular Biology, 19(7), 719–721. doi:10.1038/nsmb.2327 Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., et al. (2012). Differential gene and transcript expression analysis of RNA-­‐seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3), 562–578. doi:10.1038/nprot.2012.016 Vierstra, R. D., & Callis, J. (1999). Polypeptide tags, ubiquitous modifiers for plant protein regulation. Plant Molecular Biology, 41(4), 435–442. Vijayraghavan, U., Parker, R., Tamm, J., Iimura, Y., Rossi, J., Abelson, J., & Guthrie, C. (1986). Mutations in conserved intron sequences affect multiple steps in the yeast splicing pathway, particularly assembly of the spliceosome. The EMBO Journal, 5(7), 1683–
1695. 105 Vogel, J., Hess, W. R., & Börner, T. (1997). Precise branch point mapping and quantification of splicing intermediates. Nucleic Acids Research, 25(10), 2030–2031. Waern, K., & Snyder, M. (2013). Extensive transcript diversity and novel upstream open reading frame regulation in yeast. G3 (Bethesda, Md.), 3(2), 343–352. doi:10.1534/g3.112.003640 Wahl, M. C., Will, C. L., & Lührmann, R. (2009). The Spliceosome: Design Principles of a Dynamic RNP Machine. Cell, 136(4), 701–718. doi:10.1016/j.cell.2009.02.009 Wang, E. T., Sandberg, R., Luo, S., Khrebtukova, I., Zhang, L., Mayr, C., et al. (2008). Alternative isoform regulation in human tissue transcriptomes. Nature, 456(7221), 470–476. doi:doi:10.1038/nature07509 Warnes, G. R., Ben Bolker, Bonebakker, L., Gentleman, R., Liaw, W. H. A., Lumley, T., et al. (2015). gplots: Various R Programming Tools for Plotting Data. R package version 2.16.0. CRAN.R-­‐Project.org. Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Retrieved from http://books.google.com/books?hl=en&lr=&id=bes-­‐
AAAAQBAJ&oi=fnd&pg=PR5&dq=H+Wickham+ggplot2+elegant+graphics+for+data+an
alysis+Springer+New+York+2009&ots=SA95Sz5RTU&sig=jfYAe6OOtsEgtMPKIuy6Z1q
TYFA Wollerton, M. C., Gooding, C., Wagner, E. J., Garcia-­‐Blanco, M. A., & Smith, C. W. J. (2004). Autoregulation of Polypyrimidine Tract Binding Protein by Alternative Splicing Leading to Nonsense-­‐Mediated Decay. Molecular Cell, 13(1), 91–100. doi:10.1016/S1097-­‐
2765(03)00502-1
Young, M. E., Karpova, T. S., Brugger, B., Moschenross, D. M., Wang, G. K., Schneiter, R., et al. (2002). The Sur7p Family Defines Novel Cortical Domains in Saccharomyces cerevisiae, Affects Sphingolipid Metabolism, and Is Involved in Sporulation. Molecular and Cellular Biology, 22(3), 927–934. doi:10.1128/MCB.22.3.927-934.2002
Zhang, Z., Hesselberth, J. R., & Fields, S. (2007). Genome-wide identification of spliced introns using a tiling microarray. Genome Research, 17(4), 503–509. doi:10.1101/gr.6049107

Chapter 3: Conclusions

Implications

Our discovery of hundreds of novel BPs by Branch-seq revealed an unexpected picture of the budding yeast transcriptome. The surprising number of “intergenic” BPs we found, together with other recent studies demonstrating that gene boundaries in yeast are still being refined, has revealed additional post-transcriptional gene regulation mechanisms and increased coding diversity in yeast (Arribere & Gilbert, 2013; Pelechano, Wei, & Steinmetz, 2013). Thus, the numbers of novel BPs that we report in UTRs and antisense transcripts are likely underestimates. Presumably all of the novel BPs fall inside introns, since splicing is the only known mechanism that creates RNA lariats with 2' branched structures.

To help understand the origins of these BPs, we identified over 100 novel introns in budding yeast using RNA-seq. However, most of the novel BPs fell outside of our novel introns and those introns recently identified by another study (Kawashima, Douglass, Gabunilas, Pellegrini, & Chanfreau, 2014). This persistent discrepancy calls for an explanation. We propose that the small overlap of novel BPs and novel introns could be due to different technical biases in Branch-seq and RNA-seq library preparations or could result from identification of products of incomplete splicing. One way to assay for stalled splicing is to perform targeted RT-PCR to assess whether the completely spliced product is formed.
In a more high-throughput approach, read density upstream of the 5'SS, normalized for gene expression by read density downstream of the 3'SS, can be compared between RNA-seq libraries that were poly(A)-selected and libraries that were rRNA-depleted. As long as the first exon is stable following the first step of splicing, it will be present in an rRNA-depleted library but absent from a poly(A)-selected library because it lacks a poly(A) tail. Thus, the expectation is that poly(A)-selected libraries will have lower read density upstream of the 5'SS than rRNA-depleted libraries in cases of incomplete splicing.

Our identification of novel introns refined the budding yeast genome annotations and led us to validate the first AT-AC splice site intron in S. cerevisiae. The usage of AT-AC splice sites in budding yeast is puzzling because the minor spliceosome that usually splices introns with these splice site motifs in metazoans and plants is not present in S. cerevisiae. It is possible that this AT-AC intron is removed by the major U2 spliceosome, as is the case for one to two dozen other known introns in metazoans. It is also conceivable that this intron is removed by self-splicing. However, this intron is only 214 nt long, making it unclear whether it could adopt the 2D conformation typical of group II introns.

Our identification of introns that contain multiple BPs in yeast (Chapter 2) and fly (Appendix III) suggests that BP selection may impact post-transcriptional gene regulation. In yeast we showed that alternative BP usage affected 3'SS choice and RNA stability in the LSM2 gene. In addition to the effects of multiple BPs on alternative splicing, we hypothesize that the position of multiple BPs in one intron may impact the processing of intron-derived RNAs such as snoRNAs. In both yeast and fly, we observed alternative BP usage that alters snoRNA position in a lariat structure. In yeast we observed alternative BP usage that shifts the snoRNA from the loop of the lariat to the lariat tail.
In a fly intron containing two snoRNAs, we observed alternative BPs that create two lariats, resulting in one snoRNA inside each lariat as opposed to two snoRNAs inside one larger lariat. The distance from snoRNAs to BPs is known to be constrained in multiple organisms for proper snoRNA processing (Hirose & Steitz, 2001; Huang, Chen, Zhou, Li, & Qu, 2007; Vincenti, De Chiara, Bozzoni, & Presutti, 2007), suggesting that changes in snoRNA-BP spacing that result from differential BP usage may impact snoRNA biogenesis.

Future directions

BP sequencing approaches

The strength of the current Branch-seq protocol lies in its ability to detect BPs from short lariat loops. Thus, Branch-seq should be applicable to additional organisms, such as fly, worm, and plants, that have many short introns (Lim & Burge, 2001). Of those organisms, I have attempted to isolate lariat RNA from 2D gels for fly and worm samples where DBR1 has been knocked down or deleted, respectively (Fig. III-1A and III-S1A). The more promising of these two organisms is fly, because fly RNA produced a faint arc in a 2D gel and worm RNA did not, as shown in Appendix III.

It would also be worthwhile to apply the current Branch-seq protocol to study splicing factor mutants in yeast. By applying Branch-seq to such samples, one could begin to dissect how different mutations affect BP selection and usage, which may elucidate mechanisms for splicing changes observed in those mutants.

To sequence BPs in the long introns that predominate in mammals, a targeted sequencing approach called CaptureSeq was recently adapted to enrich for lariat RNAs and produce reads that cross the 5'SS to BP junctions of individual lariats (Mercer et al., 2015). Capture-based approaches should be well suited to assay the same group of BPs across many different samples. This approach is useful in cases where one has a set of candidate regulated BPs.
For instance, if a set of splicing changes is known in a disease-associated splicing factor mutant (e.g., SF3B1, U2AF1, or U2AF2 mutants in myelodysplastic syndromes, MDS), then the BPs of the affected introns could be targeted in patient samples using CaptureSeq to help understand the mechanism underlying the observed splicing changes. This approach is also attractive for studying BP usage across samples where interesting changes in splicing occur (see below).

Another application of the CaptureSeq approach is to discover additional BPs, which requires large-scale design of capture probes throughout thousands of introns. A limitation of this design is that it will not recover BPs in unexpected genomic locations, such as unannotated introns. Our discovery of many novel BPs throughout the relatively small yeast genome emphasizes the utility of untargeted approaches like Branch-seq for BP identification. However, if intron annotations are more complete in metazoans, targeted approaches should identify a large fraction of novel mammalian BPs.

Advice for future development of BP sequencing approaches

There are two main topics discussed in the second half of Appendix I regarding BP sequencing methods: (1) further optimization of Branch-seq for sequencing BPs of longer introns and (2) suggestions for new BP sequencing strategies, including alternative lariat enrichment approaches. By adjusting how the adapters are added to the debranched lariats in Branch-seq, it may be possible to reliably sequence lariats larger than 100 nt. For any BP sequencing approach, isolation of lariat RNA is key. Some methods for lariat enrichment discussed in Appendix I include isolation of nuclear RNA, performing DBR1 co-immunoprecipitation, using cells null for DBR1 and its homolog DRN1 (Garrey et al., 2014), digesting linear RNAs, and/or isolating Y-shaped RNAs. These lariat enrichment strategies can be used in combination with Branch-seq or with future BP sequencing methods.

Additional applications of BP sequencing

Using the BP sequencing methods mentioned above, a wide range of BP-centric questions can be addressed. BP evolution can be studied by mapping BP locations in homologous introns across multiple organisms. Depending on the method used, distant BPs, recursive splicing, and nested intron splicing can be further characterized across organisms. BP usage coupled to 3'SS usage can be examined by performing targeted or untargeted BP sequencing in conjunction with RNA-seq under conditions in which interesting splicing changes occur, such as differentiation, epithelial-mesenchymal transition, and across different tissues. These datasets have the potential to show how often AS involves different BPs and the spacing of alternative 3'SS that use different BPs, and could provide further evidence on the extent to which distal BPs favor distal 3'SS usage in NAGNAGs (Bradley, Merkin, Lambert, & Burge, 2012). BP sequencing can also reveal global BP selection effects of anti-tumor drugs that disrupt proper BP recognition, such as spliceostatin A (SSA) and E7107 (Corrionero, Miñana, & Valcárcel, 2011; Folco, Coil, & Reed, 2011). For organisms where BP sequencing yields non-comprehensive data, the experimentally located BPs can be used to better train predictive BP algorithms (Friedman, 2006). Lastly, if DBR1 is present in the sample used to generate BP sequencing data, it may be possible to calculate lariat degradation rates by sequencing BPs in a time course after transcriptional inhibition. Such an experiment could identify unusually stable lariats that may function as sponges for other RNAs or proteins.
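
As a rough illustration of the lariat degradation-rate idea above, the sketch below fits a first-order decay model, ln(A) = ln(A0) - k*t, to the abundance of one lariat measured at several times after transcriptional inhibition, and converts the fitted rate constant into a half-life. The function name, the input format (one normalized BP-spanning read count per time point), and the assumption of simple exponential decay are illustrative choices and are not part of the Branch-seq protocol. Real data would also need a normalization strategy (for example, a spike-in), which is not shown here.

```python
import math


def fit_lariat_decay(times_min, abundances):
    """Fit ln(abundance) = ln(A0) - k * t by ordinary least squares.

    times_min  -- time points after transcriptional inhibition (minutes)
    abundances -- normalized lariat abundance at each time point (hypothetical
                  input, e.g. spike-in-normalized BP-spanning read counts)
    Returns (k, half_life): k in 1/min, half_life in minutes
    (math.inf if no decay is detected).
    """
    if len(times_min) != len(abundances) or len(times_min) < 2:
        raise ValueError("need at least two paired time points")
    if any(a <= 0 for a in abundances):
        raise ValueError("abundances must be positive to take logarithms")

    logs = [math.log(a) for a in abundances]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n

    # Slope of the regression line of ln(abundance) on time.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx

    k = -slope  # decay rate constant is the negative slope
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


if __name__ == "__main__":
    # Synthetic example: a lariat decaying with k = 0.1 per minute.
    times = [0, 5, 10, 20, 40]
    counts = [1000 * math.exp(-0.1 * t) for t in times]
    k, half_life = fit_lariat_decay(times, counts)
    print(f"k = {k:.3f} per min, half-life = {half_life:.2f} min")
```

Under these assumptions, lariats whose fitted half-life is much longer than that of most other lariats in the same experiment would be candidates for the unusually stable species discussed above.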
Final remarks

The number of known BP motif mutations that give rise to disease phenotypes is likely to grow as more human BPs are identified and as sequencing of disease samples becomes more prevalent. Whole exome and whole genome sequencing of patient samples will allow identification of BP mutations that lead to splicing defects, expanding our understanding of gene regulation in humans. Though the yeast genome is much more compact than the human genome, even in yeast our understanding of gene regulation is incomplete. The studies outlined in this thesis have shown that BP selection in yeast is more complex than previously known and that this complexity can dictate gene regulatory choices. For instance, we have shown that several yeast introns use multiple BPs, which influence 3'SS choice and regulation via NMD. I hope the future application of BP sequencing to the outstanding questions detailed above will deepen our understanding of BP regulation and that the techniques described in this thesis will prove useful in those endeavors.

References

Arribere, J. A., & Gilbert, W. V. (2013). Roles for transcript leaders in translation and mRNA decay revealed by transcript leader sequencing. Genome Research, 23(6), 977–987. doi:10.1101/gr.150342.112
Bradley, R. K., Merkin, J., Lambert, N. J., & Burge, C. B. (2012). Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution. PLoS Biology, 10(1), e1001229. doi:10.1371/journal.pbio.1001229
Corrionero, A., Miñana, B., & Valcárcel, J. (2011). Reduced fidelity of branch point recognition and alternative splicing induced by the anti-tumor drug spliceostatin A. Genes & Development, 25(5), 445–459. doi:10.1101/gad.2014311
Folco, E. G., Coil, K. E., & Reed, R. (2011). The anti-tumor drug E7107 reveals an essential role for SF3b in remodeling U2 snRNP to expose the branch point-binding region. Genes & Development, 25(5), 440–444. doi:10.1101/gad.2009411
Friedman, B. A. (2006).
Evolution and specificity of ribonucleic acid splicing. In C. B. Burge. Massachusetts Institute of Technology. Retrieved from http://hdl.handle.net/1721.1/37139
Garrey, S. M., Katolik, A., Prekeris, M., Li, X., York, K., Bernards, S., et al. (2014). A homolog of lariat-debranching enzyme modulates turnover of branched RNA. RNA (New York, N.Y.), 20(8), 1337–1348. doi:10.1261/rna.044602.114
Hirose, T., & Steitz, J. A. (2001). Position within the host intron is critical for efficient processing of box C/D snoRNAs in mammalian cells. Proceedings of the National Academy of Sciences of the United States of America, 98(23), 12914–12919. doi:10.1073/pnas.231490998
Huang, Z.-P., Chen, C.-J., Zhou, H., Li, B.-B., & Qu, L.-H. (2007). A combined computational and experimental analysis of two families of snoRNA genes from Caenorhabditis elegans, revealing the expression and evolution pattern of snoRNAs in nematodes. Genomics, 89(4), 490–501. doi:10.1016/j.ygeno.2006.12.002
Kawashima, T., Douglass, S., Gabunilas, J., Pellegrini, M., & Chanfreau, G. F. (2014). Widespread use of non-productive alternative splice sites in Saccharomyces cerevisiae. PLoS Genetics, 10(4), e1004249. doi:10.1371/journal.pgen.1004249
Lim, L. P., & Burge, C. B. (2001). A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences of the United States of America, 98(20), 11193–11198. doi:10.1073/pnas.201407298
Mercer, T. R., Clark, M. B., Andersen, S. B., Brunck, M. E., Haerty, W., Crawford, J., et al. (2015). Genome-wide discovery of human splicing branchpoints. Genome Research, 25(2), 290–303. doi:10.1101/gr.182899.114
Pelechano, V., Wei, W., & Steinmetz, L. M. (2013). Extensive transcriptional heterogeneity revealed by isoform profiling. Nature, 497(7447), 127–131. doi:10.1038/nature12121
Vincenti, S., De Chiara, V., Bozzoni, I., & Presutti, C. (2007). The position of yeast snoRNA-coding regions within host introns is essential for their biosynthesis and for efficient splicing of the host pre-mRNA. RNA (New York, N.Y.), 13(1), 138–150. doi:10.1261/rna.251907

Appendix I: Branch-seq Protocol

Part 1: Branch-seq protocol

Suggestions:
1) Purify DBR1 before starting the Branch-seq protocol.
2) Make in vitro spliced lariat close to when performing the debranching step so it will have enough radioactive signal for downstream steps.
3) Remember to radiolabel your RNA ladder before performing in vitro splicing.

Pre-protocol steps:

Debranching enzyme purification
• Clone S. cerevisiae DBR1 using cDNA generated from WT s288c yeast into the pET151 expression vector from Invitrogen.
• Express protein in Rosetta 2(DE3)pLysS competent cells. Grow bacteria in YT media at 37°C until induction with IPTG, then grow at 18°C.
• Lyse bacteria using Native Lysis Buffer (Qiagen).
• Purify protein with a Ni-NTA column (Qiagen) and subsequently over an S200 column (Buffer: 125 mM KCl, 20 mM HEPES pH 7.3, 1 mM DTT, 10% glycerol).
• Concentrate protein (final 50% glycerol) and flash freeze.
• Test protein for RNase activity and debranching activity on an in vitro transcribed linear RNA (body labeled) and an in vitro spliced lariat, respectively.
• Note on debranching enzyme: I tried many different sources of DBR1, including a commercially available (human) DBR1 from Abnova, recombinantly purified human DBR1, and recombinantly purified S. cerevisiae DBR1. All three were capable of debranching an in vitro synthesized lariat (below), but the S. cerevisiae DBR1 proved to be the most reliable, perhaps because it was easier to express higher levels of the yeast protein than the human DBR1 and because the commercially available human DBR1 was not sold for the purpose of debranching.

Production of in vitro spliced fly lariat
HeLa nuclear extracts for in vitro splicing were a kind gift from the Reed Lab.
Coupled in vitro transcription and splicing were performed similarly to Folco and Reed (Folco & Reed, 2014), except that addition of α-amanitin was omitted to obtain as many lariats as possible. Reactions were digested with RNase R (Epicentre) at 37°C for 1 hour to obtain radiolabeled FTZ lariats.
Note on RNase R digestion: U6 snRNA, which became radiolabeled during the coupled reaction, was not digested by RNase R (Fig. I-1A and B, denoted by arrow).
Notes: Adapted coupled in vitro transcription and splicing protocol:
• Prepare coupled transcription and splicing reaction:
o 1 uL 12.5 mM ATP
o 1 uL 0.5 M Creatine Phosphate (di-Tris salt)
o 1 uL 80 mM MgCl2
o 1 uL FTZ template DNA (final 200 ng/uL)
o 6 uL α-UTP
o 15 uL Nuclear Extract
• Note: I typically perform 2 reactions at the same time to produce more labeled lariats.
• Incubate at 30°C for one hour.
• Add 170 uL water and 5 uL Proteinase K.
• Incubate at 37°C for 15 min.
• Remove unincorporated α-UTP using RNeasy cleanup (Qiagen, typically use 4 columns).
• Elute in 40 uL water each. Pool RNA.
• RNase R digest the coupled reaction:
o 20 uL 10X RNase R buffer (Epicentre)
o 3 uL RNase R (It is advisable to keep track of the tube of RNase R used if you have multiple tubes, because there is a report of possible endonuclease contamination in some RNase R batches (Salzman, Gawad, Wang, Lacayo, & Brown, 2012).)
o 80 uL radiolabeled, cleaned, coupled reaction RNA
o 97 uL water
o 200 uL total
• Incubate at 37°C for 1 hour.
• Phenol/chloroform (pH 7.9) extract in a gel lock tube (Ref # 2302830, 5 Prime Phase Lock Gel Heavy 2 mL tube).
• Precipitate RNA:
o 20 uL sodium acetate (pH 5.5)
o 500 uL 100% EtOH
o 2 uL glycogen
• Resuspend in 10 uL water.
• Run gel to confirm presence of lariat RNA:
o 0.5 uL RNA + 4 uL water + 5 uL 2X denaturing loading buffer.
o Ladder: low range ssRNA (NEB Cat # N0364S).
o Heat samples before loading gel.
o 6% TBE urea gel, 200 V, 50 min.
o Expose to phosphorimager plate.
o For example, see Fig. I-1A lanes 1, 6, 13, 23 and Fig. I-1B no SuperaseIn lane.

Branch-seq protocol:
S. cerevisiae strain used: dbr1Δ:BY4742 Mat α his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0
For schematic of protocol, see Figure 2-1A.

I. Isolate Total RNA - Trizol Isolation
• Grow yeast (750 mL), collect by centrifugation at 7000 RPM for 5 min at 4°C.
• Wash yeast twice with water.
• Freeze yeast at -80°C (optional).
• Thaw cells. Add 10 mL water and transfer to 12 Omni Bead Ruptor compatible tubes containing 2.8 mm ceramic beads.
• Spin at 7000 RPM for 5 min at 4°C, keep cell pellet.
• Add 1 mL Trizol (Life Technologies) to each tube.
• Use an Omni Bead Ruptor to lyse the cells:
o Homogenize twice for 20 seconds on ½ max speed.
o Homogenize once for 10 seconds on max speed.
• Transfer to 2 15 mL conicals (polypropylene plastic).
• Incubate samples at room temp for 5 min.
• Add 1/5 volume of chloroform (1.2 mL per 15 mL conical) and mix.
• Incubate samples at room temp 2-3 min.
• Spin at max speed for 15 min at 4°C.
• Transfer upper aqueous layer to a new tube and precipitate with ½ volume isopropanol (6 mL).
• Incubate on ice 5 min.
• Spin 19000 RPM at 4°C for 25 min.
• Wash the RNA pellet with 70% ethanol, resuspend in 200 uL EB (Qiagen), store at -80°C.

II. 2D PAGE Gel

Notes:
1) Gel reagents: Ultra-pure SequaGel reagents from National Diagnostics.
2) Gel running: Use metal heat sink for all gels.
3) I suggest pouring the first dimension gel (D1) the evening before you want to run your 2D gel so it is polymerized and ready to use in the morning. This makes it more feasible to run the 2D gel in one day. Wrap the polymerized D1 gel in wet paper towels and saran wrap and store at 4°C overnight.
4) I suggest pouring the second dimension gel (D2) while D1 is running.
5) I found that linear acrylamide (and, to a lesser extent, GlycoBlue) inhibits debranching (Fig. I-1A), so I advise using only glycogen for any precipitation steps.
6) Different percentage 2D gels and different gel running times give altered separation of the lariat arc from the linear RNA diagonal and are better for isolating different size lariats (Fig. I-2).

Protocol:
• Pour D1:
o 6% gel
o 1.5 mm spacers
o ~20 cm by ~32 cm glass plates
o 12 well comb
• Use 100 ug total RNA. Mix with 2X denaturing loading dye and heat at 80-95°C prior to loading D1. If using a ladder, leave 1 empty lane on each side of the total RNA lane so the total RNA lane can be cleanly cut from the D1 gel.
• Run D1 at 15 W for 1 hr and 45 min.
• Pour D2:
o 20% gel
o 1.5 mm spacers
o ~20 cm by ~32 cm glass plates
o 1 well comb
• Stain D1 with SYBR Gold and image on a Safe Light.
• Cut out a single lane of the D1 gel as one long strip using a clean razor blade. Leave it on the imager while preparing the D2 gel.
• Prepare D2 gel: Remove comb, add running buffer (TBE) to well to aid in D1 gel insertion.
• Carefully slide D1 gel into D2 gel using tweezers and a razor blade. Avoid introducing air bubbles at the interface between the D1 and D2 gels.
• (Optional) Add loading dye on top of the D1 gel slice in the D2 gel to make it easy to follow the run of the D2 gel.
• Run D2 at 30 W for 6 hr and 30 min.
• Stain D2 with SYBR Gold.
• Cut arc out of gel. (Optional) freeze gel at -20°C.
• Elute RNA in 12 mL of PAGE elution buffer (30 mM Tris-HCl (pH 7.5), 300 mM NaCl, and 3 mM EDTA) (Ooi et al., 2001) and rotate continuously overnight at 4°C.
• Precipitate RNA with isopropanol (13.5 mL) and glycogen (7 uL) at -20°C overnight. Spin at 19000 RPM, 25 min, 4°C. Wash with 500 uL 70% EtOH, spin in a 1.5 mL epi 10 min at 17000 RPM. Dry pellet and resuspend in 10 uL water.

III. Debranching

Debranching was performed similarly to the protocol of Ooi et al. (Ooi et al., 2001).
• Prepare 5X debranching buffer:
o 100 mM HEPES
o 625 mM KCl
o 2.5 mM MgCl2
o 5 mM DTT
o 50% glycerol
• Debranching reaction:
o 3 uL 5X debranching buffer
o _____ uL radiolabeled FTZ lariat RNA (I used 1 uL of the 10 uL described in the pre-protocol section. This volume can be modified at the user's discretion depending on the application and the radioactive strength of the radiolabeled lariats. The following controls are recommended: (1) debranch the FTZ lariat RNA alone; (2) using FTZ lariat RNA, leave out DBR1; (3) spike FTZ lariat RNA into the samples you wish to debranch and run a small portion of them on a gel afterwards to confirm debranching.)
o 0.5 uL recombinantly purified yeast DBR1
o _____ uL water
o _____ uL 2D RNA (I used 4 uL)
o Total volume = 15 uL
• Incubate at 30°C for 1 hour. Do NOT use SuperaseIn (RNase inhibitor) because it causes a gel-shift-like behavior that makes it difficult to determine whether debranching worked when running the debranched product on a diagnostic gel (Fig. I-1B). Addition of proteinase K prior to running the gel can resolve this issue if you want to use an RNase inhibitor.
• Phenol/chloroform extract the experimental samples, saving a small amount for the diagnostic gel (see next step):
o Bring volume of debranching reaction to 200 uL with water.
o Phenol/chloroform extract (pH 7.9), gel lock tube.
o Precipitate: 20 uL sodium acetate, 500 uL 100% EtOH, 2 uL glycogen.
o Resuspend in 22.5 uL water for the poly(A) tailing step (below).
• Run diagnostic debranching gel:
o Add 2X denaturing loading buffer to controls and to small aliquots from experimental samples.
o Heat samples, run on 10% TBE urea gel, 1 hr 15 min, 200 V.
o Expose to phosphorimager plate.

IV. Poly(A) tail debranched lariats

The poly(A) tailing protocol is adapted from the Burge Lab's ribosome footprint profiling protocol, originally developed by the Weissman Lab (Ingolia, Ghaemmaghami, Newman, & Weissman, 2009).
• Prepare RNA samples:
o Suggest as a control 0.5 uL piSPIKE RNA (from IDT) +/- poly(A) polymerase to confirm poly(A) tailing and RT steps.
    10 ng RNA                                  X uL
    Water                                      to 22.5 uL
• Make tailing mix and enzyme mix and store on ice. Amounts listed per 1 sample:
o 2X tailing mix (total 25 uL/sample, but only add 22.5 uL to each tube)
    10X PAP buffer                             5 uL
    10 mM ATP (1 mM ATP final conc)            5 uL
    RNase inhibitor (30 units) (SuperaseIn)    0.75 uL
    Water                                      14.25 uL
o Enzyme mix (total 5 uL/sample)
    2X tailing mix                             2.5 uL
    Water                                      2 uL
    Poly(A) polymerase enzyme (5 U/uL)         0.5 uL
• Denature RNA samples 2 min at 80°C.
• Place on ice.
• Add 22.5 uL 2X tailing mix.
• Add 5 uL enzyme mix on ice (final volume 50 uL).
• Incubate at 37°C for 10 min.
• Quench reaction with 200 uL 5 mM EDTA.
• Add 250 uL phenol/chloroform (pH 7.9), extract using gel lock tube.
• Precipitate RNA (1 uL glycogen, 28 uL sodium acetate, 300 uL 100% isopropanol).
• Resuspend in 12 uL 10 mM Tris pH 8.0.
• Run gel to confirm successful poly(A) tailing if you used the piSPIKE control:
o ½ reaction (6 uL) + 2X denaturing loading buffer (LB)
o 1 uL low range ssRNA ladder + 2X denaturing LB
o Heat samples.
o Run on 6% TBE urea gel, 200 V, 30 min. Stain with SYBR Gold. You should see a smear above the piSPIKE RNA, confirming poly(A) tailing.

V. Reverse Transcription

• Prepare mix. Amounts listed per 1 sample:
o 11.5 uL template RNA (debranched lariat, or 6 uL piSPIKE control + 5.5 uL water)
o 1 uL 10 mM dNTP mix
o 1 uL 25 uM RT primer
• RT primer (barcode compatible): /5Phos/AGATCGGAAGAGCGTCGTGTAGGGAAAG/iSp18/CACTCA/iSp18/GTGACTGGAGTTCCTTGGCACCCGAGAATTCCA/TTTTTTTTTTTTTTTTTTTTVN (designed in collaboration with Yarden Katz (Folco & Reed, 2014; Katz et al., 2014)).
• Incubate at 65°C for 5 min.
• Place on ice.
• Master mix (1 reaction):
o 4 uL 5X first strand buffer (comes with enzyme)
o 0.5 uL SuperaseIn
o 1 uL 0.1 M DTT
o 1 uL SuperScript III Reverse Transcriptase (Invitrogen)
• Incubate at 48°C for 30 min.
• Add 2.1 uL 1 M NaOH.
• Incubate at 98°C for 15 min.
• Add 2.1 uL 1 M HCl.
• Prepare to load on gel. This allows for removal of excess RT primer:
o Add 22.5 uL 2X denaturing LB to each RT reaction (debranched lariats and piSPIKE +/- poly(A) tailing controls).
o Prepare primer standard for gel: 0.5 uL 25 uM RT primer + 9.5 uL water + 10 uL 2X LB.
o Ladders: 10 bp and 25 bp ladders.
o Heat at 95°C, 1 min.
o On ice briefly.
o Pre-run gel (10% TBE urea) briefly.
o Load gel and run for 1 hour 33 min at 200 V. Debranched lariats required 2 lanes/sample due to volumes.
o Stain with SYBR Gold for 10 min.
• Excise RT product from gel. Product will appear as a smear, mostly larger than the piSPIKE RNA band (corresponds to a 5'SS to BP distance of 31 nt). Avoid excess RT primer that did not extend any RNA.
o Put gel slices into a 0.5 mL epi. Can freeze to help elution.
o (optional) Elute piSPIKE RT DNA as a control.
• Elute RT products from gel:
o Poke hole in bottom of 0.5 mL epi (after thawing).
o Place 0.5 mL epi into 1.5 mL epi (used 2 tubes/sample because each sample was run in 2 lanes).
o Spin gel through 0.5 mL epi at max speed, shredding gel.
o Add 400 uL PAGE elution buffer.
o Elute at 65°C for 1 hr, shaking at 1400 RPM.
o Remove shredded gel using a NanoSep column.
o Precipitate DNA (450 uL isopropanol, 2 uL glycogen).
o Resuspend in 15 uL 10 mM Tris pH 8.0.

VI. Circularization
• Prepare mix (1 reaction volumes):
o 2 uL 10X Circligase buffer (comes with enzyme)
o 1 uL 1 mM ATP
o 1 uL 50 mM MnCl2
o 15 uL DNA from RT
• Put into PCR tubes.
• Mix well.
• Add 1 uL Circligase (Epicentre), mix well.
• Incubate at 60°C for 60 min.
• Incubate at 80°C for 10 min.
• 4°C for ∞; store at -80°C or proceed directly to PCR.

VII. PCR
Illumina PCR primer 1.0 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT) was paired with Illumina barcode primers (RPI#s):
(RPI1) CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI2) CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI3) CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
(RPI4) CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
Oligonucleotide sequences © 2006-2008 Illumina, Inc. All rights reserved.
http://epigenome.usc.edu/docs/resources/core_protocols/Illumina%20Sequence%20Information%20for%20Customers%20DEC2008.pdf
• PCR master mix (volumes for 1 reaction; made 4.5X for 4 reactions/sample so that samples can be removed after 6, 8, 10, or 12 PCR cycles):
o 3.34 uL 5X HF buffer
o 0.334 uL 10 mM dNTPs
o 1.67 uL primer mix (5 uM each)
o 11.7 uL water
o 0.167 uL Phusion high-fidelity polymerase (NEB)
• To the 77.4 uL of 4.5X master mix, add 4.5 uL of circularized template.
• Aliquot 16.7 uL of reaction mix + template per PCR strip tube.
• Remove samples after 6, 8, 10, and 12 PCR cycles:
o 98°C, 30 sec (initial denature)
o 98°C, 10 sec (denature)
o 68°C, 10 sec (anneal)
o 72°C, 15 sec (extend)
• Run PCR on gel:
o Add 3.4 uL 6X DNA LB to each PCR reaction.
o Use 25 bp and 10 bp DNA ladders (0.5 uL each).
o Run 12-well 8% TBE gel, 200 V, 40 min.
o Stain with SYBR Gold, 5 min.
• Excise the bands (will appear as smears) larger than the no-insert circularization product at 141 nt. Some of this no-insert product will exist even when care was taken to avoid excising the RT primer band in earlier gel excision steps.
o Elute from gel using a 0.5 mL epi and PAGE elution buffer (as above in the Reverse Transcription section).
o Precipitate in 400 uL 100% isopropanol and 1 uL glycogen.
o Resuspend final pellet in 13 uL 10 mM Tris pH 8.0.

VIII. Sequencing
One Illumina MiSeq flow cell was sequenced at the MIT BioMicro Center (November 2011). 5' end reads were 50 bases and 3' end reads were 250 bases. 3' end reads were sequenced with the custom sequencing primer GTGACTGGAGTTCCTTGGCACCCGAGAATTCCATTTTTTTTTTTTTTTTTTTT to avoid sequencing the untemplated As added by the poly(A) tailing reaction. The 3' end sequencing primer was gel purified prior to use in sequencing.
Note: the primer design might have to be changed for sequencing on other Illumina machines where custom primers can only be added for first-end sequencing, not second-end sequencing.
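The master mix arithmetic in section VII (per-reaction volumes scaled 4.5X for four cycle-number time points) can be checked with a few lines of Python. This is only a bookkeeping sketch: the component volumes are copied from the protocol, while the function name `scale_master_mix` and the dictionary layout are illustrative assumptions, not part of the original method.

```python
# Per-reaction PCR master mix volumes in uL, copied from section VII.
PER_REACTION_UL = {
    "5X HF buffer": 3.34,
    "10 mM dNTPs": 0.334,
    "primer mix (5 uM each)": 1.67,
    "water": 11.7,
    "Phusion polymerase": 0.167,
}


def scale_master_mix(per_reaction, factor):
    """Return a new dict with every component volume multiplied by `factor`."""
    return {name: volume * factor for name, volume in per_reaction.items()}


# 4.5X covers 4 reactions per sample (6, 8, 10, 12 cycles) plus pipetting excess.
scaled = scale_master_mix(PER_REACTION_UL, 4.5)
total_ul = sum(scaled.values())

for name, volume in scaled.items():
    print(f"{name}: {volume:.2f} uL")
print(f"total master mix: {total_ul:.1f} uL")
```

The scaled total comes out at roughly the 77.4 uL quoted in the protocol, to which 4.5 uL of circularized template is then added.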
Part 2: Advice for future BP sequencing protocols

If one wanted to further optimize the current Branch-seq protocol now that several technical obstacles have been overcome, the obvious first step to modify is the attachment of sequencing adapters to the RNA, because the DNA circularization step likely loses large lariats. The evidence for this hypothesis comes from the Branch-seq data itself. I observed large introns that do not contain reads at their annotated BPs, but instead contain BP reads near their 5'SS at an intronic stretch of adenosines. This implies that large lariats were successfully eluted from the 2D gel and that a stretch of adenosines inside the lariat loop primed reverse transcription. Presumably short RNAs are more readily circularized than long RNAs, resulting in this arrangement of BP reads near the 5'SS, but not near the annotated BP. Thus, ligation of adapters onto the debranched RNA should be attempted instead of poly(A) tailing/circularization to alleviate the size bias. This approach should only be applied to samples with 5'SS to BP lengths suitable for clustering on the flow cells used for sequencing (~1 kbp maximum for Illumina). Another option is to use CircLigase II instead of CircLigase I (both from Epicentre) to determine whether changing the enzyme mitigates the size bias. The protocol as it stands now works well for lariat loops 100 nt or shorter (Figure 2-S3B). It is useful to note that I was able to polyadenylate the in vitro spliced FTZ lariat, but was not able to ligate an adapter to it. However, since it is possible to debranch the lariats without adding any additional nucleotides to their 3' ends, the RNA can be linearized and thus should be a good substrate for ligation.

The remainder of this appendix provides advice for future BP sequencing endeavors. The first hurdle to overcome when sequencing BPs is isolating lariats. If possible, it is advisable to isolate only nuclear RNA in order to obtain a higher ratio of lariat RNA to linear RNA. This approach will miss cytoplasmic lariats, but the enrichment for lariat RNA should help avoid amplification of undesired RNAs. Strategies that allow lariats to accumulate to high levels in cells are advantageous as well, as demonstrated in Branch-seq (Fig. 2-1A). Double deletion of DBR1 and DRN1, an enzyme that has recently been implicated in debranching (Garrey et al., 2014; Salzman et al., 2012), may improve lariat yields relative to DBR1 single mutants.

Based on my attempts to isolate lariats from multiple organisms, I hypothesize that additional lariat degradation pathways apart from DBR1-dependent debranching may exist. This hypothesis largely emerged from my lariat isolation attempts in worm and fly. In dbr1 null worm RNA, I did not observe RNA running in a 2D gel arc (Fig. III-S1A). In RNA from fly cells, I did not observe a noticeable difference in 2D gel arc intensity between WT and DBR1 RNAi-treated samples (Fig. III-1).

Approaches that enrich for lariats without requiring 2D gels are attractive ways to isolate large amounts of lariat RNA. One option is to use a catalytically inactive version of DBR1 created by mutating DBR1 residues implicated in debranchase activity (Findlay, Boyle, Hause, Klein, & Shendure, 2014; Khalid, Damha, Shuman, & Schwer, 2005; Montemayor et al., 2014; Ooi et al., 2001). Introducing this mutant enzyme into the organism would allow co-immunoprecipitation (co-IP) of DBR1 with lariat RNAs, with or without crosslinking (Ooi et al., 2001; Ule, Jensen, Mele, & Darnell, 2005). Digesting the proteins will leave intact lariat RNA. An analogous approach would be to use an antibody that recognizes branched RNAs (Ingolia et al., 2009; Reilly et al., 1990) for the co-IP instead of mutant DBR1, which would circumvent the need to create a catalytically inactive debranching enzyme.

Another way to enrich for lariats would be to use a cocktail of exonucleases, including both 5' to 3' and 3' to 5' exonucleases, to digest all RNAs except circular and lariat RNAs. Using a thermostable RNase R as one of these enzymes, or a similar enzyme such as hDIS3L2 (Lubas et al., 2013), would allow heating of the RNA to disrupt secondary structure that might otherwise prevent RNase R digestion at conventional reaction temperatures.

Additional approaches could take advantage of the "Y" shaped structure created by a cleavage event inside the lariat loop to isolate or sequence branched RNA. One option to enrich for branched RNAs would be to nick lariat loops with a limited RNase digestion, and then isolate "Y" shaped RNAs that have one free 5' end and two free 3' ends. Ligation with a mix of two different 3' adapters should yield a population of "Y" shaped RNAs containing both adapters. Sequential RNA pulldowns on each adapter should enrich for nicked lariat RNAs with two free 3' ends from which BPs can be sequenced. Alternatively, limited digestion to nick RNAs prior to 2D gel electrophoresis could allow large introns to be cleaved to produce smaller "Y" shaped molecules that could be isolated using a 2D gel. Limited digestion to produce "Y" shaped RNAs could also be used after isolating a pure population of lariats to reduce the size of the lariat loop for long introns. Ligation after digestion could produce smaller lariat loops amenable to the current Branch-seq protocol (similar to Fig. I-3 and I-4A). This ligation step could alternatively be used to add adapters inside the lariat loop, forcing production of LJ reads (Chapter 2) that are informative for identifying BP position (similar to Fig. I-3 and I-4B).

A last approach could take advantage of the sequence 3' of the BP for isolation and sequencing of BPs.
First, one could design capture probes to the 3'SS-exon boundaries to enrich for pre-mRNAs, followed by primer extension with a primer complementary to the 3'SS-exon junction. RNAs that have undergone the first but not the second step of splicing should produce an RT stop. As a control, primer extension following debranching should reduce RT stops. Note that RT will only rarely read through the BP nucleotide in its lariat form during primer extension. In a low-throughput experiment, I observed that when it does, RT inserts several cytosine nucleotides at the BP nucleotide. This information could be used to identify rare sequences where RT does not stop at the BP.

Figures

Figure I-1: Debranched and gel-shift-like behaviors of lariats. Lariats were produced from the coupled in vitro splicing reaction followed by RNase R digestion. Contrast adjusted and false colors added using the Typhoon. (A) Using Abnova human DBR1, different co-precipitants inhibit debranching activity to various extents. Arrow points to undigested U6 snRNA. Repeated pairs of experiments shown side by side. (B) Addition of SuperaseIn to lariat RNA results in a shifted mobility of lariat RNA. 10% TBE urea gel. Samples were in denaturing loading buffer and were heated prior to gel loading.

Figure I-2: S. cer. 2D gels of different densities in D2 show different separation of arc from diagonal. All D1: 6%. Samples are total yeast RNA unless otherwise noted. (A) D2: 10%. Small gels. (B) D2: 15%. Left gel was run using the low range ssRNA ladder. Small gels. (C) D2: 20%. Large gels.

[Figure I-3 diagram. Step labels recovered from the image: 1. Isolation of total RNA (Trizol); 2. Depletion of rRNA (Ribo-Zero); 3. Digestion of linear RNA (RNase R); 4. Addition of poly(A) tail, biotin (PA polymerase, ATP, ATP-biotin); 5. Bead capture (streptavidin), partial digestion (RNase A).]

Figure I-3: Common portion of original proposed Branch-seq protocol for lariat enrichment and capture. Dotted white line: 5'SS. Open circle: BP. Grey circle: biotin. Grey "Y": streptavidin capture. Thin line: intron. Boxes: exons.
[Figure I-4 diagram. Step labels recovered from the image. Panel A: 1.1. RNA circularization (RNA ligase); 1.2. Debranching (Dbr1); 1.3. RT (adapter-polyTVN primer); 1.4. Circularize & PCR (adapter primers); 1.5. Sequencing and analysis. Panel B: 2.1. Addition of adapters (RNA ligase); 2.2. RNA circularization (RNA ligase); 2.3. RT (adapter primer, specific RT enzyme); 2.4. PCR (adapter primers); 2.5. Sequencing and analysis.]

Figure I-4: Protocols for generation of Illumina libraries for BP and 5'SS sequencing from captured RNA lariats (continuation from Fig. I-3). (A) Branch-SeqV1, which uses debranching followed by anchored adapter-polyT priming to attach adapters. (B) Branch-SeqV2, which uses RNA ligation to attach adapters. Dotted white line: 5'SS. Long dotted grey line: sequencing primers. Short dotted grey line: path of RT. Open circle: BPS. Grey circle: biotin. Grey "Y": streptavidin capture. Thin line: intron. Boxes: exons.

References

Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C., & Shendure, J. (2014). Saturation editing of genomic regions by multiplex homology-directed repair. Nature, 513(7516), 120–123. doi:10.1038/nature13695
Folco, E. G., & Reed, R. (2014). In vitro systems for coupling RNAP II transcription to splicing and polyadenylation. Methods in Molecular Biology (Clifton, NJ), 1126, 169–177. doi:10.1007/978-1-62703-980-2_13
Garrey, S. M., Katolik, A., Prekeris, M., Li, X., York, K., Bernards, S., et al. (2014). A homolog of lariat-debranching enzyme modulates turnover of branched RNA. RNA (New York, N.Y.), 20(8), 1337–1348. doi:10.1261/rna.044602.114
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. S., & Weissman, J. S. (2009). Genome-Wide Analysis in Vivo of Translation with Nucleotide Resolution Using Ribosome Profiling. Science (New York, N.Y.), 324(5924), 218–223. doi:10.1126/science.1168978
Katz, Y., Li, F., Lambert, N. J., Sokol, E. S., Tam, W.-L., Cheng, A. W., et al. (2014). Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state. eLife, 3, e03915. doi:10.7554/eLife.03915
Khalid, M. F., Damha, M. J., Shuman, S., & Schwer, B. (2005). Structure-function analysis of yeast RNA debranching enzyme (Dbr1), a manganese-dependent phosphodiesterase. Nucleic Acids Research, 33(19), 6349–6360. doi:10.1093/nar/gki934
Lubas, M., Damgaard, C. K., Tomecki, R., Cysewski, D., Jensen, T. H., & Dziembowski, A. (2013). Exonuclease hDIS3L2 specifies an exosome-independent 3'-5' degradation pathway of human cytoplasmic mRNA. The EMBO Journal, 32(13), 1855–1868. doi:10.1038/emboj.2013.135
Montemayor, E. J., Katolik, A., Clark, N. E., Taylor, A. B., Schuermann, J. P., Combs, D. J., et al. (2014). Structural basis of lariat RNA recognition by the intron debranching enzyme Dbr1. Nucleic Acids Research, 42(16), 10845–10855. doi:10.1093/nar/gku725
Ooi, S. L., Dann, C., Nam, K., Leahy, D. J., Damha, M. J., & Boeke, J. D. (2001). RNA lariat debranching enzyme. Methods in Enzymology, 342, 233–248.
Reilly, J. D., Freeman, S. K., Melhem, R. F., Kierzek, R., Caruthers, M. H., Edmonds, M., & Munns, T. W. (1990). Antibodies specific for branched ribonucleic acids. Analytical Biochemistry, 185(1), 125–130.
Salzman, J., Gawad, C., Wang, P. L., Lacayo, N., & Brown, P. O. (2012). Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PloS One, 7(2), e30733. doi:10.1371/journal.pone.0030733
Ule, J., Jensen, K., Mele, A., & Darnell, R. B. (2005). CLIP: a method for identifying protein-RNA interaction sites in living cells. Methods (San Diego, Calif.), 37(4), 376–386. doi:10.1016/j.ymeth.2005.07.018

Appendix II: Supplemental Tables to Chapter 2

Table II-S1. Branch-seq BP peaks paired 5'SS motifs.

No. mismatches from /GTATGT        0    1    2   3   4   5  6  All  0 or 1 mut  % 0 or 1 mut
153 annotated BPs: winBP           103  35   7   7   1   0  0  153  138         90.20
191 putative novel BPs: winBP      64   62   28  13  18  5  1  191  126         65.97
196 annotated BPs: GEM-BP          105  44   24  16  7   0  0  196  149         76.02
350 putative novel BPs: GEM-BP     61   161  78  34  12  3  1  350  222         63.43
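The summary columns of Table II-S1 ("All", "0 or 1 mut", "% 0 or 1 mut") follow directly from the per-mismatch counts, and a short Python sketch makes the arithmetic explicit. The counts are copied from the table; the names `COUNTS` and `summarize` are illustrative assumptions rather than anything used in the original analysis.

```python
# Number of BP peaks whose paired 5'SS has 0, 1, ..., 6 mismatches from GTATGT
# (values copied from Table II-S1).
COUNTS = {
    "annotated, winBP": [103, 35, 7, 7, 1, 0, 0],
    "putative novel, winBP": [64, 62, 28, 13, 18, 5, 1],
    "annotated, GEM-BP": [105, 44, 24, 16, 7, 0, 0],
    "putative novel, GEM-BP": [61, 161, 78, 34, 12, 3, 1],
}


def summarize(counts):
    """Return (total peaks, peaks with 0 or 1 mismatch, percent with 0 or 1 mismatch)."""
    total = sum(counts)
    near_consensus = counts[0] + counts[1]
    return total, near_consensus, round(100 * near_consensus / total, 2)


for label, counts in COUNTS.items():
    total, near, pct = summarize(counts)
    print(f"{label}: all={total}, 0 or 1 mut={near}, % 0 or 1 mut={pct:.2f}")
```

Running it reproduces the last three columns of the table for all four rows.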
Table II-S2. GEM-BP and winBP peaks
5ss_bp_pair is in the format chr:nt1:nt2:strand, with nt1 < nt2.
On the plus strand, nt1 is the 5'SS and nt2 is the BP.
On the reverse strand, nt1 is the BP and nt2 is the 5'SS.
In general 1 = true and 0 = false, except for 5ss_mm, which is the number of mismatches at the 5'SS from GTATGT.
u_a and u_n are the union of annotated and novel BPs, respectively, across both peak callers.
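The 5ss_bp_pair convention and the 5ss_mm column described above can be illustrated with a small Python sketch. It assumes only the rules stated in the table legend; the names `Pair`, `parse_pair`, and `mismatches` are hypothetical helpers introduced here and are not taken from the Branch-seq analysis code.

```python
from typing import NamedTuple

CONSENSUS_5SS = "GTATGT"  # consensus 5'SS hexamer named in the table legend


class Pair(NamedTuple):
    chrom: str
    five_ss: int
    bp: int
    strand: str


def parse_pair(field: str) -> Pair:
    """Split 'chr:nt1:nt2:strand' and assign 5'SS and BP according to strand."""
    chrom, nt1, nt2, strand = field.split(":")
    low, high = int(nt1), int(nt2)
    if strand == "+":
        # Plus strand: nt1 is the 5'SS and nt2 is the BP.
        return Pair(chrom, low, high, strand)
    if strand == "-":
        # Reverse strand: nt1 is the BP and nt2 is the 5'SS.
        return Pair(chrom, high, low, strand)
    raise ValueError(f"unexpected strand in {field!r}")


def mismatches(motif: str, consensus: str = CONSENSUS_5SS) -> int:
    """Count positions at which a 5'SS hexamer differs from the consensus."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return sum(a != b for a, b in zip(motif, consensus))


print(parse_pair("chrXII:564467:564515:-"))
print(mismatches("GTAAGC"))
```

For the reverse-strand entry chrXII:564467:564515:-, the BP is nt1 (564467), which matches the bp_nt value listed for that peak.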
chr bp_nt strand 5ss_bp_pair winBP GEM-BP bp_anno 5ss_anno 5ss_mm 5ss u_a u_n cnBP
chrXII 564467 - chrXII:564467:564515:- 1 1 1 1 1 GTAAGT 1 0 0
chrIV 1238769 - chrIV:1238769:1238817:- 1 1 1 1 0 GTATGT 1 0 0
chrXII 857038 + chrXII:856988:857038:+ 1 1 1 0 2 GTAAGC 1 0 0
chrII 479389 + chrII:479340:479389:+ 1 1 1 1 0 GTATGT 1 0 0
chrXII 694444 + chrXII:694385:694444:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 666961 - chrXIII:666961:667016:- 1 1 1 1 0 GTATGT 1 0 0
chrV 307801 + chrV:307743:307801:+ 1 1 1 1 0 GTATGT 1 0 0
chrX 365853 + chrX:365780:365853:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 1319751 - chrIV:1319751:1319809:- 1 1 1 1 0 GTATGT 1 0 0
chrII 407047 - chrII:407047:407116:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 1103871 + chrIV:1103808:1103871:+ 1 1 1 1 0 GTATGT 1 0 0
chrIX 166484 + chrIX:166432:166484:+ 1 1 1 1 0 GTATGT 1 0 0
chrVII 157245 - chrVII:157245:157288:- 1 1 1 1 0 GTATGT 1 0 0
chrII 142787 - chrII:142787:142849:- 1 1 1 1 0 GTATGT 1 0 0
chrV 548611 + chrV:548548:548611:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 337596 + chrIV:337525:337596:+ 1 1 1 1 1 GTAAGT 1 0 0
chrXI 625594 + chrXI:625544:625594:+ 1 1 1 1 1 GTAAGT 1 0 0
chrV 159013 - chrV:159013:159086:- 1 1 1 1 0 GTATGT 1 0 0
chrXIV 48377 + chrXIV:48293:48377:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 140113 - chrXIII:140113:140183:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 267780 + chrIV:267726:267780:+ 1 1 1 1 0 GTATGT 1 0 0
chrX 469190 - chrX:469190:469256:- 1 1 1 1 2 GTTCGT 1 0 0
chrXVI 883453 + chrXVI:883384:883453:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 1266790 - chrIV:1266790:1266854:- 1 1 1 1 0 GTATGT 1 0 0
chrXII 286498 - chrXII:286498:286557:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 1319627 - chrIV:1319627:1319690:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 1237581 + chrIV:1237527:1237581:+ 1 1 1 0 1 GTTTGT 1 0 0
chrXVI 218711 + chrXVI:218646:218711:+ 1 1 1 1 0 GTATGT 1 0 0
chrI 87439 + chrI:87389:87439:+ 1 1 1 1 1 GTAAGT 1 0 0
chrIV 1450485 - chrIV:1450485:1450533:- 1 1 1 0 3 GTTGTT 1 0 0
chrXII 766086 - chrXII:766086:766129:- 1 1 1 1 0 GTATGT 1 0 0
chrII 170771 + chrII:170680:170771:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 254999 - chrIV:254999:255044:- 1 1 1 1 0 GTATGT 1 0 0
chrVI 63915 - chrVI:63915:63972:- 1 1 1 1 0 GTATGT 1 0 0
chrII 47071 - chrII:47071:47143:- 1 1 1 1 0 GTATGT 1 0 0
chrVIII 129611 + chrVIII:129523:129611:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 206162 + chrXIII:206098:206162:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 239421 - chrIV:239421:239509:- 1 1 1 1 0 GTATGT 1 0 0
chrIII 173137 - chrIII:173137:173194:- 1 1 1 1 0 GTATGT 1 0 0
chrVII 62173 + chrVII:62132:62173:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 854879 + chrXIII:854816:854879:+ 1 1 1 1 0 GTATGT 1 0 0
chrX 435302 + chrX:435222:435302:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 462255 + chrII:462204:462255:+ 1 1 1 1 1 GTATGA 1 0 0
chrV 308040 + chrV:307953:308040:+ 1 1 1 1 0 GTATGT 1 0 0
chrXII 786667 + chrXII:786616:786667:+ 1 1 1 1 0 GTATGT 1 0 0
chrXVI 174036 + chrXVI:173971:174036:+ 1 1 1 0 3 GTTTTA 1 0 0
chrIV 733713 - chrIV:733713:733773:- 1 1 1 1 0 GTATGT 1 0 0
chrXV 92476 - chrXV:92476:92521:- 1 1 1 0 3 GTAATG 1 0 0
chrXVI 911328 + chrXVI:911273:911328:+ 1 1 1 1 0 GTATGT 1 0 0
chrXI 618096 - chrXI:618096:618168:- 1 1 1 0 1 GTTTGT 1 0 0
chrVIII 189778 - chrVIII:189778:189843:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 65358 + chrIV:65308:65358:+ 1 1 1 1 0 GTATGT 1 0 0
chrXVI 492958 - chrXVI:492958:493018:- 1 1 1 1 0 GTATGT 1 0 0
chrII 726947 - chrII:726947:727006:- 1 1 1 1 0 GTATGT 1 0 0
chrXVI 833779 + chrXVI:833690:833779:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 653429 + chrII:653364:653429:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 186375 - chrII:186375:186430:- 1 1 1 1 0 GTATGT 1 0 0
chrX 74178 + chrX:74112:74178:+ 1 1 1 0 2 GGTTGT 1 0 0
chrIV 715265 - chrIV:715265:715356:- 1 1 1 1 0 GTATGT 1 0 0
chrVII 346854 - chrVII:346854:346896:- 1 1 1 1 0 GTATGT 1 0 0
chrV 131853 + chrV:131777:131853:+ 1 1 1 1 0 GTATGT 1 0 0
chrXII 744219 + chrXII:744156:744219:+ 1 1 1 1 1 GTACGT 1 0 0
chrXII 1024631 + chrXII:1024570:1024631:+ 1 1 1 1 1 GTAAGT 1 0 0
chrXIV 545341 + chrXIV:545293:545341:+ 1 1 1 1 1 GTAAGT 1 0 0
chrXIV 609837 + chrXIV:609792:609837:+ 1 1 1 1 1 GTAAGT 1 0 0
chrII 462472 + chrII:462424:462472:+ 1 1 1 1 0 GTATGT 1 0 0
chrX 387380 - chrX:387380:387430:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 399468 + chrIV:399360:399468:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 602175 + chrII:602099:602175:+ 1 1 1 1 0 GTATGT 1 0 0
chrIII 101646 - chrIII:101646:101700:- 1 1 1 1 0 GTATGT 1 0 0
chrXIV 534915 - chrXIV:534915:534966:- 1 1 1 1 1 GTATGC 1 0 0
chrVI 203304 - chrVI:203304:203374:- 1 1 1 1 0 GTATGT 1 0 0
chrVIII 251233 + chrVIII:251158:251233:+ 1 1 1 1 0 GTATGT 1 0 0
chrXII 987191 + chrXII:987139:987191:+ 1 1 1 1 0 GTATGT 1 0 0
chrXV 900802 - chrXV:900802:900850:- 1 1 1 0 2 GTAATT 1 0 0
chrVII 497394 - chrVII:497394:497462:- 1 1 1 1 0 GTATGT 1 0 0
chrXI 447376 - chrXI:447376:447453:- 1 1 1 1 0 GTATGT 1 0 0
chrXV 242453 - chrXV:242453:242503:- 1 1 1 1 0 GTATGT 1 0 0
chrXV 240976 - chrXV:240976:241024:- 1 1 1 1 1 GTAAGT 1 0 0
chrVI 221291 - chrVI:221291:221402:- 1 1 1 1 0 GTATGT 1 0 0
chrXI 449611 - chrXI:449611:449663:- 1 1 1 1 0 GTATGT 1 0 0
chrXIV 366096 + chrXIV:366038:366096:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIV 351021 + chrXIV:350960:351021:+ 1 1 1 1 0 GTATGT 1 0 0
chrIII 107089 + chrIII:107034:107089:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 125234 + chrII:125158:125234:+ 1 1 1 1 0 GTATGT 1 0 0
chrV 148254 + chrV:148194:148254:+ 1 1 1 1 1 GTATGC 1 0 0
chrXIII 721231 - chrXIII:721231:721344:- 1 1 1 1 0 GTATGT 1 0 0
chrIII 177952 - chrIII:177952:178031:- 1 1 1 0 1 GTATAT 1 0 0
chrXIII 123777 - chrXIII:123777:123824:- 1 0 1 0 3 ATCTCT 1 0 0
chrIV 569665 - chrIV:569665:569722:- 1 1 1 1 0 GTATGT 1 0 0
chrVIII 498731 - chrVIII:498731:498786:- 1 1 1 1 0 GTATGT 1 0 0
chrXI 437526 + chrXI:437481:437526:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 337886 + chrXIII:337817:337886:+ 1 1 1 1 1 GTACGT 1 0 0
chrXI 83061 + chrXI:83004:83061:+ 1 1 1 1 0 GTATGT 1 0 0
chrVIII 107875 + chrVIII:107827:107875:+ 1 1 1 1 1 GTAAGT 1 0 0
chrXII 327295 - chrXII:327295:327400:- 1 1 1 1 0 GTATGT 1 0 0
chrV 166786 - chrV:166786:166873:- 1 1 1 1 0 GTATGT 1 0 0
chrII 110451 - chrII:110451:110507:- 1 1 1 1 1 GTAAGT 1 0 0
chrVI 242044 + chrVI:241997:242044:+ 1 1 1 1 1 GTAAGT 1 0 0
chrVII 497961 - chrVII:497961:498003:- 1 0 1 1 1 GTAAGT 1 0 0
chrVII 556282 + chrVII:556232:556282:+ 1 1 1 0 3 GTGGAT 1 0 0
chrX 50351 - chrX:50351:50411:- 1 1 1 1 0 GTATGT 1 0 0
chrIII 107255 + chrIII:107192:107255:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 679973 - chrII:679973:680034:- 1 1 1 1 0 GTATGT 1 0 0
chrIV 1212941 + chrIV:1212871:1212941:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIV 494551 - chrXIV:494551:494633:- 1 1 1 0 3 GTAATG 1 0 0
chrVIII 298418 - chrVIII:298418:298487:- 1 1 1 1 0 GTATGT 1 0 0
chrIX 155276 + chrIX:155220:155276:+ 1 1 1 1 1 GCATGT 1 0 0
chrXIII 211547 + chrXIII:211502:211547:+ 1 1 1 0 4 TTTTAA 1 0 0
chrVIII 255689 - chrVIII:255689:255752:- 1 1 1 1 0 GTATGT 1 0 0
chrII 606615 + chrII:606567:606615:+ 1 1 1 0 1 TTATGT 1 0 0
chrVII 73034 - chrVII:73034:73136:- 1 1 1 1 1 GTACGT 1 0 0
chrVIII 354926 + chrVIII:354868:354926:+ 1 1 1 1 0 GTATGT 1 0 0
chrXVI 623662 + chrXVI:623575:623662:+ 1 1 1 1 0 GTATGT 1 0 0
chrIX 47743 + chrIX:47699:47743:+ 1 1 1 1 1 GTAAGT 1 0 0
chrXVI 407002 + chrXVI:406952:407002:+ 1 1 1 0 1 GTGTGT 1 0 0
chrXIV 145191 - chrXIV:145191:145255:- 1 1 1 1 0 GTATGT 1 0 0
chrXIV 557672 + chrXIV:557612:557672:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 579963 + chrIV:579894:579963:+ 1 1 1 0 1 GTATGG 1 0 0
chrXVI 96189 - chrXVI:96189:96233:- 1 1 1 1 0 GTATGT 1 0 0
chrV 348217 - chrV:348217:348272:- 1 1 1 1 0 GTATGT 1 0 0
chrVII 946407 + chrVII:946331:946407:+ 1 1 1 1 1 GTACGT 1 0 0
chrIX 317136 + chrIX:317062:317136:+ 1 1 1 0 2 TTCTGT 1 0 0
chrII 333365 + chrII:333319:333365:+ 1 1 1 0 2 GTAGGA 1 0 0
chrXIV 380726 - chrXIV:380726:380783:- 1 1 1 1 0 GTATGT 1 0 0
chrXVI 412958 + chrXVI:412887:412958:+ 1 1 1 0 2 TAATGT 1 0 0
chrIX 232012 - chrIX:232012:232070:- 1 1 1 0 3 GTACTG 1 0 0
chrXII 40353 - chrXII:40353:40399:- 1 1 1 1 1 GTATGC 1 0 0
chrIV 431423 - chrIV:431423:431470:- 1 1 1 1 1 GTACGT 1 0 0
chrXII 766205 - chrXII:766205:766249:- 1 1 1 1 0 GTATGT 1 0 0
chrXVI 305374 + chrXVI:305306:305374:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 1073346 - chrIV:1073346:1073398:- 1 1 1 1 1 GTATGC 1 0 0
chrXIII 82343 + chrXIII:82291:82343:+ 1 1 1 1 0 GTATGT 1 0 0
chrXII 398583 + chrXII:398534:398583:+ 1 1 1 1 0 GTATGT 1 0 0
chrIV 458048 - chrIV:458048:458095:- 1 1 1 1 0 GTATGT 1 0 0
chrXIV 185553 + chrXIV:185493:185553:+ 1 1 1 1 0 GTATGT 1 0 0
chrXI 430163 - chrXI:430163:430239:- 1 1 1 1 0 GTATGT 1 0 0
chrXVI 729414 - chrXVI:729414:729479:- 1 1 1 1 0 GTATGT 1 0 0
chrXVI 678219 - chrXVI:678219:678276:- 1 1 1 1 0 GTATGT 1 0 0
chrXII 548706 - chrXII:548706:548765:- 1 1 1 1 0 GTATGT 1 0 0
chrX 396512 - chrX:396512:396565:- 1 1 1 1 0 GTATGT 1 0 0
chrVIII 315810 - chrVIII:315810:315861:- 1 1 1 1 0 GTATGT 1 0 0
chrVIII 187613 - chrVIII:187613:187669:- 1 1 1 1 0 GTATGT 1 0 0
chrIII 111588 - chrIII:111588:111632:- 1 1 1 1 1 GTATGC 1 0 0
chrII 110932 + chrII:110882:110932:+ 1 1 1 1 1 GTATGC 1 0 0
chrXIII 225264 - chrXIII:225264:225338:- 1 1 1 1 1 GTACGT 1 0 0
chrXIII 652797 - chrXIII:652797:652846:- 1 1 1 1 1 GTTTGT 1 0 0
chrXII 250899 - chrXII:250899:250948:- 1 1 1 1 1 GTATGG 1 0 0
chrI 151022 - chrI:151022:151098:- 1 1 1 1 0 GTATGT 1 0 0
chrXIII 537527 + chrXIII:537448:537527:+ 1 1 1 1 0 GTATGT 1 0 0
chrXIII 99301 - chrXIII:99301:99375:- 1 1 1 1 0 GTATGT 1 0 0
chrVII 543708 + chrVII:543643:543708:+ 1 1 1 1 0 GTATGT 1 0 0
chrII 366534 - chrII:366534:366582:- 1 1 1 1 0 GTATGT 1 0 0
chrII 426524 - chrII:426524:426624:- 0 1 1 0 1 GTAAGT 1 0 0
chrVIII 104751 + chrVIII:104684:104751:+ 0 1 1 0 2 ACATGT 1 0 0
chrXIII 223421 - chrXIII:223421:223496:- 0 1 1 0 2 GTACGC 1 0 0
chrV 269803 - chrV:269803:269865:- 0 1 1 0 3 GTGTTA 1 0 0
chrX 608540 + chrX:608476:608540:+ 0 1 1 0 1 GTAGGT 1 0 0
chrXIII 753788 - chrXIII:753788:753832:- 0 1 1 0 4 ATAAAA 1 0 0
chrXIII 517842 + chrXIII:517791:517842:+ 0 1 1 0 3 GTTAGC 1 0 0
chrVIII 138274 - chrVIII:138274:138323:- 0 1 1 0 1 GTCTGT 1 0 0
chrIV 307375 - chrIV:307375:307457:- 0 1 1 0 2 GGAAGT 1 0 0
chrVI 64615 - chrVI:64615:64682:- 0 1 1 0 4 GCAGTA 1 0 0
chrXIV 62407 - chrXIV:62407:62472:- 0 1 1 0 3 AGTTGT 1 0 0
chrX 702871 - chrX:702871:702937:- 0 1 1 0 1 TTATGT 1 0 0
chrXV 93868 - chrXV:93868:93914:- 0 1 1 0 2 GTACGG 1 0 0
chrVIII 262372 - chrVIII:262372:262442:- 0 1 1 1 0 GTATGT 1 0 0
chrXIII 425113 + chrXIII:425020:425113:+ 0 1 1 0 3 TTTTCT 1 0 0
chrII 168770 + chrII:168696:168770:+ 0 1 1 0 2 GTAATT 1 0 0
chrXVI 345483 - chrXVI:345483:345574:- 0 1 1 0 4 TAACGG 1 0 0
chrXII 242652 + chrXII:242595:242652:+ 0 1 1 0 2 GTACTT 1 0 0
chrXIII 732835 + chrXIII:732773:732835:+ 0 1 1 0 2 GTATTA 1 0 0
chrV 397225 + chrV:397147:397225:+ 0 1 1 0 3 TGTTGT 1 0 0
chrIV 630016 + chrIV:629904:630016:+ 0 1 1 1 0 GTATGT 1 0 0
chrXIII 499898 - chrXIII:499898:499948:- 0 1 1 0 1 TTATGT 1 0 0
chrVII 311486 + chrVII:311402:311486:+ 0 1 1 0 2 GGTTGT 1 0 0
chrXV 778910 - chrXV:778910:778959:- 0 1 1 0 4 GTCGAA 1 0 0
chrVIII 382356 - chrVIII:382356:382410:- 0 1 1 0 4 ATACAA 1 0 0
chrV 433162 + chrV:433110:433162:+ 0 1 1 0 3 TTCTCT 1 0 0
chrIV 230262 + chrIV:230202:230262:+ 0 1 1 0 1 GTATTT 1 0 0
chrIV 491873 + chrIV:491792:491873:+ 0 1 1 0 3 ATTTTT 1 0 0
chrXIV 623270 + chrXIV:623196:623270:+ 0 1 1 0 3 GTACAG 1 0 0
chrXVI 76014 - chrXVI:76014:76071:- 0 1 1 0 2 GTCTTT 1 0 0
chrXII 522984 + chrXII:522893:522984:+ 0 1 1 0 1 GTATGC 1 0 0
chrVII 435728 + chrVII:435686:435728:+ 0 1 1 0 2 TTAAGT 1 0 0
chrXVI 405426 + chrXVI:405372:405426:+ 0 1 1 0 3 ATTTAT 1 0 0
chrV 239667 - chrV:239667:239710:- 0 1 1 1 1 GTACGT 1 0 0
chrXI 283016 + chrXI:282962:283016:+ 0 1 1 0 2 GGAAGT 1 0 0
chrXIV 415872 + chrXIV:415827:415872:+ 0 1 1 0 2 GCATGA 1 0 0
chrXV 867516 + chrXV:867439:867516:+ 0 1 1 0 3 GTAAAA 1 0 0
chrX 76268 + chrX:76213:76268:+ 0 1 1 0 2 GTATTG 1 0 0
chrIX 225834 - chrIX:225834:225896:- 0 1 1 1 1 GTATAT 1 0 0
chrXII 263540 + chrXII:263478:263540:+ 0 1 1 0 4 GCAGTA 1 0 0
chrXIV 331803 + chrXIV:331744:331803:+ 0 1 1 0 2 ATATGC 1 0 0
chrIX 348380 - chrIX:348380:348491:- 0 1 1 1 1 GTATGA 1 0 0
chrXIII 651593 + chrXIII:651524:651593:+ 0 1 1 0 2 GTACAT 1 0 0
chrIV 217970 + chrIV:217907:217970:+ 0 1 1 0 2 GTATTA 1 0 0
chrXVI 654534 + chrXVI:654471:654534:+ 0 1 1 0 2 GTATAA 1 0 0
chrXVI 335968 + chrXVI:335905:335968:+ 1 0 0 0 0 GTATGT 0 1 1
chrIV 929251 + chrIV:929206:929251:+ 1 0 0 0 1 GTATGA 0 0 0
chrX 264006 - chrX:264006:264058:- 1 0 0 0 2 GTGCGT 0 0 0
chrV 305682 + chrV:305624:305682:+ 1 0 0 0 0 GTATGT 0 0 0
chrXII 609447 - chrXII:609447:609498:- 1 1 0 0 1 GTATGA 0 1 1
chrIV 331276 + chrIV:331190:331276:+ 1 0 0 0 1 GTAAGT 0 0 0
chrIV 392679 + chrIV:392591:392679:+ 1 0 0 0 0 GTATGT 0 0 0
chrV 374670 - chrV:374670:374721:- 1 0 0 0 1 GTTTGT 0 0 0
chrXII 818678 + chrXII:818645:818678:+ 1 0 0 0 3 GTGGCT 0 1 0
chrVII 543691 + chrVII:543643:543691:+ 1 0 0 1 0 GTATGT 0 1 1
chrV 151105 + chrV:151038:151105:+ 1 0 0 0 0 GTATGT 0 1 1
chrXII 522714 + chrXII:522672:522714:+ 1 0 0 1 0 GTATGT 0 0 0
chrXV 349520 - chrXV:349520:349598:- 1 0 0 0 0 GTATGT 0 1 1
chrII 306856 - chrII:306856:306902:- 1 0 0 0 2 GTATCA 0 0 0
chrXI 225773 - chrXI:225773:225852:- 1 0 0 0 1 GTAAGT 0 0 0
chrII 643000 - chrII:643000:643073:- 1 0 0 0 3 GCTCGT 0 1 0
chrIX 54025 + chrIX:53941:54025:+ 1 0 0 0 0 GTATGT 0 0 0
139 chrIV
268933
-
chrIV:268933:269000:-
1
1
0
0
1
GTACGT
0
1
1
chrII
170731
+
chrII:170680:170731:+
1
0
0
1
0
GTATGT
0
1
1
chrX
172450
-
chrX:172450:172499:-
1
1
0
0
4
AGTTCT
0
1
0
chrVII
chrXI
V
594137
-
chrVII:594137:594202:-
1
0
0
0
0
GTATGT
0
0
0
413438
-
chrXIV:413438:413482:-
1
0
0
0
4
CGGCGT
0
1
0
chrXII
327338
-
chrXII:327338:327400:-
1
0
0
0
4
CCATTA
0
1
0
chrV
540383
-
chrV:540383:540436:-
1
0
0
0
0
GTATGT
0
1
1
chrXV
117635
-
chrXV:117635:117684:-
1
0
0
0
1
GTATGC
0
1
1
chrXI
chrXV
I
231492
+
chrXI:231447:231492:+
1
1
0
0
1
GTATGA
0
1
1
883586
-
chrXVI:883586:883649:-
1
0
0
0
1
GTATAT
0
0
0
chrXII
chrXV
I
605300
-
chrXII:605300:605434:-
1
0
0
0
3
GCTCGT
0
1
0
481182
-
chrXVI:481182:481238:-
1
0
0
0
1
GTACGT
0
0
0
chrIV
768399
-
chrIV:768399:768453:-
1
0
0
0
1
ATATGT
0
0
0
chrIX
127052
-
chrIX:127052:127135:-
1
0
0
0
1
GTATAT
0
0
0
chrIII
228654
+
chrIII:228610:228654:+
1
0
0
0
1
GTATGA
0
0
0
chrVII
chrXV
I
497408
-
chrVII:497408:497462:-
1
0
0
1
0
GTATGT
0
1
1
656520
-
chrXVI:656520:656572:-
1
1
0
0
2
GTAGTT
0
1
0
chrII
221024
+
chrII:220982:221024:+
1
0
0
0
1
GTATGA
0
1
1
chrXII
382387
+
chrXII:382302:382387:+
1
0
0
0
0
GTATGT
0
1
1
chrIII
42034
+
chrIII:41999:42034:+
1
0
0
0
3
CTAATT
0
1
0
chrII
443734
-
chrII:443734:443827:-
1
1
0
0
0
GTATGT
0
1
1
chrX
234005
+
chrX:233962:234005:+
1
0
0
0
4
CTGGCT
0
1
0
chrXII
chrXV
I
50295
+
chrXII:50224:50295:+
1
0
0
0
0
GTATGT
0
0
0
445519
-
chrXVI:445519:445573:-
1
1
0
0
1
GTGTGT
0
1
1
chrII
chrXII
I
341037
-
chrII:341037:341094:-
1
0
0
0
0
GTATGT
0
0
0
480618
-
chrXIII:480618:480665:-
1
0
0
0
4
ATGACT
0
1
0
chrIX
261764
+
chrIX:261723:261764:+
1
0
0
0
1
GTATGA
0
0
0
chrXV
791703
-
chrXV:791703:791753:-
1
1
0
0
2
TTACGT
0
1
0
chrII
291715
-
chrII:291715:291771:-
1
0
0
0
1
GTAAGT
0
1
1
chrIV
1145148
+
chrIV:1145107:1145148:+
1
0
0
0
1
GTACGT
0
1
1
chrXIV
583910
-
chrXIV:583910:583971:-
1
1
0
0
0
GTATGT
0
1
1
chrV
258656
+
chrV:258593:258656:+
1
0
0
0
2
GTAGGA
0
0
0
chrIV
235105
+
chrIV:235039:235105:+
1
0
0
0
1
GTATGA
0
0
0
chrIV
1490516
-
chrIV:1490516:1490571:-
1
0
0
0
0
GTATGT
0
0
0
chrXI
272946
-
chrXI:272946:272992:-
1
0
0
0
0
GTATGT
0
0
0
chrXI
67767
-
chrXI:67767:67805:-
1
0
0
0
4
GACTAA
0
1
0
chrXII
491543
+
chrXII:491497:491543:+
1
0
0
0
1
GTATAT
0
0
0
chrXV
448694
+
chrXV:448667:448694:+
1
0
0
0
3
TAATAT
0
1
0
chrXIV
429574
-
chrXIV:429574:429637:-
1
0
0
0
0
GTATGT
0
0
0
chrIV
122157
+
chrIV:122079:122157:+
1
0
0
1
0
GTATGT
0
0
0
chrV
166806
-
chrV:166806:166873:-
1
0
0
1
0
GTATGT
0
1
1
chrXVI
281450
-
chrXVI:281450:281502:-
1
0
0
1
0
GTATGT
0
1
1
chrVIII
148769
-
chrVIII:148769:148814:-
1
1
0
0
1
GTATGC
0
1
1
chrXIII
652808
-
chrXIII:652808:652846:-
1
0
0
1
1
GTTTGT
0
1
1
chrVII
143935
-
chrVII:143935:143984:-
1
0
0
0
0
GTATGT
0
1
1
chrVII
472352
-
chrVII:472352:472397:-
1
0
0
0
2
GTGAGT
0
1
0
chrXIV
763544
+
chrXIV:763498:763544:+
1
0
0
0
3
GTACAG
0
0
0
chrVII
555885
+
chrVII:555835:555885:+
1
0
0
1
0
GTATGT
0
1
1
chrXI
74700
+
chrXI:74657:74700:+
1
0
0
0
4
GCGACT
0
1
0
chrII
142813
-
chrII:142813:142850:-
1
0
0
0
3
CTAAGA
0
1
0
chrII
342755
+
chrII:342697:342755:+
1
0
0
0
1
GTGTGT
0
0
0
chrXII
286519
-
chrXII:286519:286557:-
1
0
0
1
0
GTATGT
0
1
1
chrIV
992896
+
chrIV:992862:992896:+
1
0
0
0
4
CCATCG
0
1
0
chrVII
574775
+
chrVII:574705:574775:+
1
0
0
0
1
GTATTT
0
0
0
chrXII
83742
-
chrXII:83742:83803:-
1
1
0
0
3
GTTGTT
0
1
0
chrIII
123571
-
chrIII:123571:123637:-
1
0
0
0
3
GTCTAG
0
1
0
chrIV
107156
+
chrIV:107107:107156:+
1
0
0
0
1
GTAGGT
0
1
1
chrIV
22303
-
chrIV:22303:22355:-
1
0
0
0
1
GTAAGT
0
1
1
chrVII
439462
+
chrVII:439387:439462:+
1
0
0
0
1
GTAAGT
0
1
1
chrXIII
182669
+
chrXIII:182594:182669:+
1
0
0
0
0
GTATGT
0
1
1
chrI
152002
-
chrI:152002:152055:-
1
0
0
0
2
GTATAC
0
0
0
chrXV
720497
+
chrXV:720444:720497:+
1
0
0
0
0
GTATGT
0
0
0
chrXVI
729395
-
chrXVI:729395:729479:-
1
0
0
1
0
GTATGT
0
1
1
chrIX
166501
+
chrIX:166432:166501:+
1
0
0
1
0
GTATGT
0
1
1
chrIV
1256999
-
chrIV:1256999:1257056:-
1
0
0
0
4
GGTTAG
0
1
0
chrVII
167428
+
chrVII:167361:167428:+
1
0
0
1
0
GTATGT
0
0
0
chrXIV
427176
+
chrXIV:427106:427176:+
1
0
0
0
1
GTATGA
0
0
0
chrXII
987202
+
chrXII:987139:987202:+
1
0
0
1
0
GTATGT
0
0
0
chrXIV
174520
-
chrXIV:174520:174572:-
1
0
0
0
2
GTATCG
0
0
0
chrXV
845020
+
chrXV:844959:845020:+
1
0
0
0
0
GTATGT
0
0
0
chrXVI
218695
+
chrXVI:218646:218695:+
1
0
0
1
0
GTATGT
0
1
1
chrVIII
189806
-
chrVIII:189806:189844:-
1
0
0
0
4
GCTGAT
0
1
0
chrVII
253204
-
chrVII:253204:253253:-
1
0
0
0
1
GTATGA
0
0
0
chrIX
128117
+
chrIX:128061:128117:+
1
0
0
0
2
TTTTGT
0
0
0
chrXVI
937537
-
chrXVI:937537:937617:-
1
0
0
0
0
GTATGT
0
1
1
chrIV
456687
-
chrIV:456687:456757:-
1
0
0
0
1
GTAAGT
0
0
0
chrXI
193009
-
chrXI:193009:193071:-
1
0
0
0
0
GTATGT
0
0
0
chrXI
93382
-
chrXI:93382:93470:-
1
0
0
1
1
GTACGT
0
0
0
chrIV
655244
+
chrIV:655202:655244:+
1
0
0
0
1
GTATGC
0
1
1
chrV
292571
-
chrV:292571:292618:-
1
1
0
0
2
GTACAT
0
1
0
chrXIV
380699
-
chrXIV:380699:380783:-
1
0
0
1
0
GTATGT
0
0
0
chrI
142338
+
chrI:142256:142338:+
1
0
0
1
0
GTATGT
0
0
0
chrII
606346
+
chrII:606276:606346:+
1
0
0
1
0
GTATGT
0
1
1
chrXII
744185
+
chrXII:744185:744221:+
1
0
0
0
4
ATAATC
0
1
0
chrXV
505980
+
chrXV:505939:505980:+
1
0
0
1
0
GTATGT
0
1
1
chrX
71474
+
chrX:71426:71474:+
1
0
0
0
2
GTTGGT
0
0
0
chrVIII
406673
-
chrVIII:406673:406756:-
1
0
0
0
1
GCATGT
0
1
1
chrIV
451430
-
chrIV:451430:451488:-
1
0
0
0
1
GTATGG
0
1
1
chrVII
561204
-
chrVII:561204:561267:-
1
0
0
0
2
GGTTGT
0
0
0
chrVI
176326
-
chrVI:176326:176385:-
1
0
0
0
1
GTATGG
0
0
0
chrVII
700757
+
chrVII:700717:700757:+
1
0
0
0
4
TCCTGA
0
1
0
chrXIV
373657
+
chrXIV:373596:373657:+
1
0
0
0
1
GTATGA
0
0
0
chrXV
518763
-
chrXV:518763:518826:-
1
0
0
0
1
ATATGT
0
0
0
chrIX
387166
+
chrIX:387136:387166:+
1
0
0
0
5
CCAACA
0
1
0
chrXIII
290869
+
chrXIII:290835:290869:+
1
0
0
0
4
CACCGT
0
1
0
chrXII
982467
-
chrXII:982467:982535:-
1
0
0
0
0
GTATGT
0
0
0
chrX
570579
-
chrX:570579:570633:-
1
0
0
0
3
GTAGTC
0
0
0
chrIV
715306
-
chrIV:715264:715306:-
1
0
0
0
2
GTTAGT
0
1
0
chrV
517899
+
chrV:517847:517899:+
1
0
0
0
0
GTATGT
0
0
0
chrX
31819
+
chrX:31768:31819:+
1
0
0
0
0
GTATGT
0
0
0
chrX
435281
+
chrX:435222:435281:+
1
0
0
1
0
GTATGT
0
1
1
chrXI
468921
-
chrXI:468921:468975:-
1
0
0
0
1
GTATGC
0
1
1
chrVIII
508825
+
chrVIII:508795:508825:+
1
0
0
0
6
CCGGCC
0
1
0
chrII
115534
+
chrII:115478:115534:+
1
0
0
0
0
GTATGT
0
1
1
chrXII
898621
+
chrXII:898549:898621:+
1
0
0
0
2
GTATAA
0
0
0
chrXIII
647110
+
chrXIII:647059:647110:+
1
0
0
0
1
GCATGT
0
0
0
chrIV
392638
+
chrIV:392591:392638:+
1
0
0
0
0
GTATGT
0
1
1
chrXIII
822588
-
chrXIII:822588:822625:-
1
0
0
0
5
ATCAAA
0
1
0
chrXVI
777582
-
chrXVI:777582:777639:-
1
0
0
0
1
GTAAGT
0
1
1
chrVIII
129590
+
chrVIII:129523:129590:+
1
0
0
1
0
GTATGT
0
1
1
chrXIII
425094
+
chrXIII:424997:425094:+
1
0
0
1
0
GTATGT
0
1
1
chrII
592709
-
chrII:592709:592763:-
1
0
0
1
0
GTATGT
0
1
1
chrVII
443012
-
chrVII:443012:443066:-
1
1
0
0
2
GTCAGT
0
1
0
chrVIII
255706
-
chrVIII:255706:255753:-
1
0
0
0
3
GTCCAT
0
1
0
chrXV
733353
-
chrXV:733353:733417:-
1
0
0
0
1
GCATGT
0
0
0
chrX
538617
+
chrX:538579:538617:+
1
0
0
0
5
TCCTAA
0
1
0
chrXIII
559828
+
chrXIII:559782:559828:+
1
0
0
0
0
GTATGT
0
0
0
chrXVI
146507
-
chrXVI:146507:146597:-
1
1
0
0
1
GTATGA
0
1
1
chrXI
408145
+
chrXI:408101:408145:+
1
0
0
0
1
GTAAGT
0
0
0
chrXV
423675
-
chrXV:423675:423735:-
1
1
0
0
1
GTATGG
0
1
1
chrV
336934
+
chrV:336866:336934:+
1
0
0
0
1
GTATGA
0
0
0
chrIX
183435
-
chrIX:183435:183500:-
1
0
0
0
4
CCCAGT
0
1
0
chrXI
166477
+
chrXI:166405:166477:+
1
0
0
1
1
GTACGT
0
0
0
chrXI
96757
+
chrXI:96692:96757:+
1
0
0
0
1
GTAGGT
0
1
1
chrXV
594353
-
chrXV:594353:594426:-
1
0
0
0
5
GCGCAA
0
1
0
chrII
331390
-
chrII:331390:331438:-
1
0
0
0
1
GTATGA
0
0
0
chrX
517545
-
chrX:517545:517607:-
1
1
0
0
2
GTACGC
0
1
0
chrXVI
939905
-
chrXVI:939905:939951:-
1
0
0
0
3
GTACCG
0
0
0
chrXVI
602284
-
chrXVI:602284:602338:-
1
0
0
0
0
GTATGT
0
1
1
chrIV
437849
+
chrIV:437796:437849:+
1
0
0
0
4
AAAGAT
0
1
0
chrXIII
551141
-
chrXIII:551141:551202:-
1
0
0
1
0
GTATGT
0
0
0
chrVII
859434
-
chrVII:859434:859478:-
1
0
0
1
0
GTATGT
0
1
1
chrXV
506800
-
chrXV:506800:506870:-
1
0
0
0
1
GTATTT
0
0
0
chrIV
1266846
+
chrIV:1266789:1266846:+
1
0
0
0
2
GTTAGT
0
0
0
chrXI
490069
+
chrXI:489994:490069:+
1
0
0
0
0
GTATGT
0
0
0
chrII
13949
-
chrII:13949:14027:-
1
1
0
0
0
GTATGT
0
1
1
chrXVI
231558
+
chrXVI:231499:231558:+
1
0
0
0
2
GTAAGA
0
0
0
chrXVI
593080
-
chrXVI:593080:593139:-
1
0
0
0
2
GTATAA
0
0
0
chrXIII
832410
+
chrXIII:832362:832410:+
1
0
0
0
2
GTAAAT
0
0
0
chrX
632996
-
chrX:632996:633051:-
1
1
0
0
1
GTATGC
0
1
1
chrVII
35284
-
chrVII:35284:35351:-
1
0
0
0
2
GTAAGG
0
0
0
chrVI
96257
+
chrVI:96184:96257:+
1
0
0
0
1
GTATTT
0
0
0
chrXI
302664
+
chrXI:302572:302664:+
1
0
0
0
4
CTCAAT
0
1
0
chrIV
1411914
-
chrIV:1411914:1411963:-
1
1
0
0
2
ACATGT
0
1
0
chrV
362828
+
chrV:362729:362828:+
1
0
0
1
0
GTATGT
0
0
0
chrVII
627186
-
chrVII:627186:627225:-
1
0
0
0
4
AACTGA
0
1
0
chrVII
436362
+
chrVII:436318:436362:+
1
0
0
0
0
GTATGT
0
0
0
chrXVI
115267
+
chrXVI:115219:115267:+
1
0
0
1
0
GTATGT
0
0
0
chrIV
438275
+
chrIV:438208:438275:+
1
0
0
0
1
GTAAGT
0
1
1
chrXVI
76164
-
chrXVI:76164:76223:-
1
0
0
1
0
GTATGT
0
1
1
chrXIII
608719
+
chrXIII:608660:608719:+
1
0
0
0
2
GTATTA
0
1
0
chrXV
930113
+
chrXV:930063:930113:+
1
0
0
0
1
GTAAGT
0
1
1
chrXII
155740
-
chrXII:155740:155813:-
1
0
0
0
1
GTAAGT
0
0
0
chrX
432390
-
chrX:432390:432464:-
1
0
0
0
2
GTATAC
0
0
0
chrV
423912
+
chrV:423821:423912:+
1
0
0
0
3
GTAGAG
0
0
0
chrIX
317159
+
chrIX:317107:317159:+
1
0
0
0
5
ATGAAA
0
1
0
chrIV
676355
+
chrIV:676270:676355:+
1
0
0
0
1
GTACGT
0
0
0
chrXVI
739748
-
chrXVI:739748:739794:-
1
0
0
0
2
GTAAGC
0
0
0
chrXI
94081
-
chrXI:94081:94154:-
1
1
0
0
0
GTATGT
0
1
1
chrV
201996
+
chrV:201953:201996:+
1
0
0
0
0
GTATGT
0
0
0
chrIV
540580
+
chrIV:540538:540580:+
1
0
0
0
1
GTAAGT
0
0
0
chrX
422629
+
chrX:422549:422629:+
1
0
0
0
1
GTATGA
0
0
0
chrIX
134038
+
chrIX:133971:134038:+
1
0
0
0
1
GTAAGT
0
0
0
chrIX
155301
+
chrIX:155220:155301:+
1
0
0
1
1
GCATGT
0
1
1
chrXIV
611577
+
chrXIV:611503:611577:+
1
0
0
0
0
GTATGT
0
0
0
chrXV
1062512
-
chrXV:1062512:1062560:-
1
1
0
0
1
GTATGC
0
1
1
chrVII
148658
-
chrVII:148658:148721:-
1
0
0
0
2
TTACGT
0
0
0
chrXVI
584022
-
chrXVI:584022:584078:-
1
0
0
0
2
GTATTG
0
0
0
chrVII
383552
+
chrVII:383489:383552:+
1
0
0
0
0
GTATGT
0
0
0
chrIV
232835
+
chrIV:232747:232835:+
1
0
0
0
1
GTATAT
0
0
0
chrIV
1114332
+
chrIV:1114289:1114332:+
1
0
0
0
0
GTATGT
0
0
0
chrXI
290610
-
chrXI:290610:290705:-
1
1
0
0
1
GTAGGT
0
1
1
chrXV
325375
-
chrXV:325375:325450:-
1
1
0
0
2
GTATCG
0
1
0
chrVIII
236985
+
chrVIII:236958:236985:+
1
0
0
0
4
TTAAAA
0
1
0
chrV
336936
+
chrV:336866:336936:+
0
1
0
0
1
GTATGA
0
1
1
chrXI
262172
+
chrXI:262138:262172:+
0
1
0
0
3
AGAAGT
0
1
0
chrI
78464
+
chrI:78419:78464:+
0
1
0
0
2
GTGTGG
0
1
0
chrXII
491545
+
chrXII:491497:491545:+
0
1
0
0
1
GTATAT
0
1
1
chrXV
518764
-
chrXV:518764:518826:-
0
1
0
0
1
ATATGT
0
1
1
chrXII
366443
+
chrXII:366367:366443:+
0
1
0
0
2
GTATCG
0
1
0
chrVII
188025
-
chrVII:188025:188070:-
0
1
0
0
3
GTACCG
0
1
0
chrII
425495
-
chrII:425495:425538:-
0
1
0
0
2
CTAAGT
0
1
0
chrVII
772104
+
chrVII:772019:772104:+
0
1
0
0
0
GTATGT
0
1
1
chrXII
982469
-
chrXII:982469:982535:-
0
1
0
0
0
GTATGT
0
1
1
chrXVI
685775
+
chrXVI:685691:685775:+
0
1
0
0
1
GTAGGT
0
1
1
chrIV
1021386
+
chrIV:1021323:1021386:+
0
1
0
0
1
ATATGT
0
1
1
chrXIV
224799
+
chrXIV:224764:224799:+
0
1
0
0
3
TTAAGA
0
1
0
chrVII
423867
-
chrVII:423867:423911:-
0
1
0
0
1
GTATGA
0
1
1
chrXIII
491153
+
chrXIII:491103:491153:+
0
1
0
0
3
GTACTG
0
1
0
chrXIV
373658
+
chrXIV:373596:373658:+
0
1
0
0
1
GTATGA
0
1
1
chrVII
951678
-
chrVII:951678:951729:-
0
1
0
0
2
GTATAC
0
1
0
chrX
712053
-
chrX:712053:712106:-
0
1
0
0
1
GTTTGT
0
1
1
chrIV
1407177
+
chrIV:1407133:1407177:+
0
1
0
0
2
GTATAC
0
1
0
chrXV
1030463
-
chrXV:1030463:1030507:-
0
1
0
0
4
ATCTAA
0
1
0
chrI
134871
-
chrI:134871:134944:-
0
1
0
0
1
GTAAGT
0
1
1
chrXV
1003994
-
chrXV:1003994:1004066:-
0
1
0
0
2
GTGAGT
0
1
0
chrXIII
887491
+
chrXIII:887427:887491:+
0
1
0
0
3
CGTTGT
0
1
0
chrV
258657
+
chrV:258593:258657:+
0
1
0
0
2
GTAGGA
0
1
0
chrVII
504217
-
chrVII:504217:504268:-
0
1
0
0
1
GTATGA
0
1
1
chrVIII
442922
+
chrVIII:442871:442922:+
0
1
0
0
1
GTATAT
0
1
1
chrIV
235107
+
chrIV:235039:235107:+
0
1
0
0
1
GTATGA
0
1
1
chrV
124638
-
chrV:124638:124687:-
0
1
0
0
0
GTATGT
0
1
1
chrXI
313867
+
chrXI:313796:313867:+
0
1
0
0
2
GTAGAT
0
1
0
chrX
209415
+
chrX:209358:209415:+
0
1
0
0
1
GTATGG
0
1
1
chrXV
437796
-
chrXV:437796:437846:-
0
1
0
0
1
GTATAT
0
1
1
chrXII
918954
-
chrXII:918954:919021:-
0
1
0
0
1
ATATGT
0
1
1
chrII
431594
-
chrII:431594:431650:-
0
1
0
0
1
GTAGGT
0
1
1
chrII
419959
+
chrII:419915:419959:+
0
1
0
0
2
TTATGG
0
1
0
chrVII
345543
-
chrVII:345543:345614:-
0
1
0
0
0
GTATGT
0
1
1
chrVI
131142
-
chrVI:131142:131201:-
0
1
0
0
2
GTTAGT
0
1
0
chrVII
380660
-
chrVII:380660:380732:-
0
1
0
0
1
GTATTT
0
1
1
chrVII
167430
+
chrVII:167361:167430:+
0
1
0
1
0
GTATGT
0
1
1
chrII
342756
+
chrII:342697:342756:+
0
1
0
0
1
GTGTGT
0
1
1
chrVII
875660
+
chrVII:875581:875660:+
0
1
0
0
1
GCATGT
0
1
1
chrXIV
598312
+
chrXIV:598259:598312:+
0
1
0
0
2
GTTTGA
0
1
0
chrXVI
339245
+
chrXVI:339174:339245:+
0
1
0
0
1
GTATAT
0
1
1
chrXIII
647112
+
chrXIII:647059:647112:+
0
1
0
0
1
GCATGT
0
1
1
chrI
30552
-
chrI:30552:30626:-
0
1
0
0
3
AAGTGT
0
1
0
chrXI
633804
-
chrXI:633804:633874:-
0
1
0
0
1
GTCTGT
0
1
1
chrII
206284
+
chrII:206192:206284:+
0
1
0
0
0
GTATGT
0
1
1
chrIV
653423
+
chrIV:653365:653423:+
0
1
0
0
3
TTACGA
0
1
0
chrXVI
196570
-
chrXVI:196570:196616:-
0
1
0
0
1
GTATGG
0
1
1
chrV
362830
+
chrV:362729:362830:+
0
1
0
1
0
GTATGT
0
1
1
chrVIII
491866
-
chrVIII:491866:491919:-
0
1
0
0
0
GTATGT
0
1
1
chrVII
937274
+
chrVII:937216:937274:+
0
1
0
0
2
GTAAGA
0
1
0
chrVI
103130
-
chrVI:103130:103173:-
0
1
0
0
2
TTTTGT
0
1
0
chrXI
225774
-
chrXI:225774:225852:-
0
1
0
0
1
GTAAGT
0
1
1
chrXVI
579895
-
chrXVI:579895:579965:-
0
1
0
0
1
GTATGA
0
1
1
chrX
345273
+
chrX:345230:345273:+
0
1
0
0
3
GTAGAG
0
1
0
chrII
792451
-
chrII:792451:792495:-
0
1
0
0
2
GTACTT
0
1
0
chrXII
827561
-
chrXII:827561:827606:-
0
1
0
0
4
CGACTT
0
1
0
chrXII
988409
+
chrXII:988329:988409:+
0
1
0
0
2
GTAAGC
0
1
0
chrIX
363683
+
chrIX:363652:363683:+
0
1
0
0
4
TAACGC
0
1
0
chrIV
188133
-
chrIV:188133:188194:-
0
1
0
0
1
GTATGA
0
1
1
chrIX
54026
+
chrIX:53941:54026:+
0
1
0
0
0
GTATGT
0
1
1
chrX
422631
+
chrX:422549:422631:+
0
1
0
0
1
GTATGA
0
1
1
chrXVI
513415
-
chrXVI:513415:513466:-
0
1
0
0
1
GTATGG
0
1
1
chrIV
1181080
+
chrIV:1181036:1181080:+
0
1
0
0
3
AGATCT
0
1
0
chrVII
59556
+
chrVII:59483:59556:+
0
1
0
0
1
GTATGG
0
1
1
chrX
432391
-
chrX:432391:432464:-
0
1
0
0
2
GTATAC
0
1
0
chrVII
1083747
-
chrVII:1083747:1083798:-
0
1
0
0
2
GTATAA
0
1
0
chrI
142343
+
chrI:142256:142343:+
0
1
0
1
0
GTATGT
0
1
1
chrXV
630885
-
chrXV:630885:630936:-
0
1
0
0
2
GTATAG
0
1
0
chrXII
382366
+
chrXII:382302:382366:+
0
1
0
0
0
GTATGT
0
1
1
chrIII
27689
-
chrIII:27689:27745:-
0
1
0
0
3
GTAATG
0
1
0
chrX
328112
-
chrX:328112:328177:-
0
1
0
0
2
GTAAGC
0
1
0
chrVII
16566
+
chrVII:16513:16566:+
0
1
0
0
4
ATTAAT
0
1
0
chrXIII
273520
-
chrXIII:273520:273577:-
0
1
0
0
1
GTGTGT
0
1
1
chrV
423913
+
chrV:423821:423913:+
0
1
0
0
3
GTAGAG
0
1
0
chrXV
506801
-
chrXV:506801:506870:-
0
1
0
0
1
GTATTT
0
1
1
chrVIII
372283
+
chrVIII:372191:372283:+
0
1
0
0
1
GTTTGT
0
1
1
chrII
519067
-
chrII:519067:519131:-
0
1
0
0
1
GTAAGT
0
1
1
chrIV
392681
+
chrIV:392591:392681:+
0
1
0
0
0
GTATGT
0
1
1
chrI
31425
-
chrI:31425:31464:-
0
1
0
0
3
ATAACT
0
1
0
chrV
561245
+
chrV:561200:561245:+
0
1
0
0
0
GTATGT
0
1
1
chrXI
465717
-
chrXI:465717:465764:-
0
1
0
0
2
GTAGGA
0
1
0
chrIV
721691
-
chrIV:721691:721739:-
0
1
0
0
1
GTTTGT
0
1
1
chrXIV
724878
+
chrXIV:724794:724878:+
0
1
0
0
3
GGTAGT
0
1
0
chrVII
1017560
-
chrVII:1017560:1017603:-
0
1
0
0
5
CCACCA
0
1
0
chrIV
1490517
-
chrIV:1490517:1490571:-
0
1
0
0
0
GTATGT
0
1
1
chrX
455748
+
chrX:455690:455748:+
0
1
0
0
1
GTACGT
0
1
1
chrIX
134040
+
chrIX:133971:134040:+
0
1
0
0
1
GTAAGT
0
1
1
chrX
31820
+
chrX:31768:31820:+
0
1
0
0
0
GTATGT
0
1
1
chrII
89157
-
chrII:89157:89219:-
0
1
0
0
4
CAAAGC
0
1
0
chrVIII
335607
-
chrVIII:335607:335675:-
0
1
0
0
1
GTACGT
0
1
1
chrIV
273737
+
chrIV:273662:273737:+
0
1
0
0
2
GTCCGT
0
1
0
chrXV
418633
+
chrXV:418565:418633:+
0
1
0
0
1
GTATGA
0
1
1
chrXII
316897
+
chrXII:316853:316897:+
0
1
0
0
1
GTATGA
0
1
1
chrX
349225
+
chrX:349181:349225:+
0
1
0
0
0
GTATGT
0
1
1
chrXIII
242968
-
chrXIII:242968:243056:-
0
1
0
0
1
GTAAGT
0
1
1
chrIV
248590
+
chrIV:248513:248590:+
0
1
0
0
2
GTCAGT
0
1
0
chrVII
682969
-
chrVII:682969:683069:-
0
1
0
0
0
GTATGT
0
1
1
chrXV
87614
+
chrXV:87554:87614:+
0
1
0
0
1
GTATCT
0
1
1
chrII
331391
-
chrII:331391:331438:-
0
1
0
0
1
GTATGA
0
1
1
chrIV
1519505
-
chrIV:1519505:1519576:-
0
1
0
0
1
GTATGC
0
1
1
chrIV
540585
+
chrIV:540538:540585:+
0
1
0
0
1
GTAAGT
0
1
1
chrVII
166586
+
chrVII:166543:166586:+
0
1
0
0
3
ATACGC
0
1
0
chrXIV
429575
-
chrXIV:429575:429637:-
0
1
0
0
0
GTATGT
0
1
1
chrXIII
204536
-
chrXIII:204536:204588:-
0
1
0
0
1
GTATGA
0
1
1
chrIV
122159
+
chrIV:122079:122159:+
0
1
0
1
0
GTATGT
0
1
1
chrXVI
339305
+
chrXVI:339230:339305:+
0
1
0
0
1
GTAAGT
0
1
1
chrIX
225116
+
chrIX:225035:225116:+
0
1
0
0
1
GTATGA
0
1
1
chrXIV
429599
-
chrXIV:429599:429637:-
0
1
0
0
0
GTATGT
0
1
1
chrXI
285868
+
chrXI:285816:285868:+
0
1
0
0
1
GTATGA
0
1
1
chrIV
1081204
+
chrIV:1081142:1081204:+
0
1
0
0
1
GTATGA
0
1
1
chrVII
1052638
-
chrVII:1052638:1052724:-
0
1
0
0
1
GTATGG
0
1
1
chrIV
1080238
-
chrIV:1080238:1080322:-
0
1
0
0
1
GTATAT
0
1
1
chrXIV
369425
-
chrXIV:369425:369506:-
0
1
0
0
0
GTATGT
0
1
1
chrIII
17806
+
chrIII:17761:17806:+
0
1
0
0
2
GTAGGC
0
1
0
chrIV
1266848
+
chrIV:1266789:1266848:+
0
1
0
0
2
GTTAGT
0
1
0
chrIV
451995
-
chrIV:451995:452077:-
0
1
0
0
2
GTCGGT
0
1
0
chrXIII
649541
-
chrXIII:649541:649630:-
0
1
0
0
1
GTATGG
0
1
1
chrXI
93384
-
chrXI:93384:93470:-
0
1
0
1
1
GTACGT
0
1
1
chrXIII
552476
+
chrXIII:552425:552476:+
0
1
0
0
3
ATTTTT
0
1
0
chrXIII
894575
-
chrXIII:894575:894616:-
0
1
0
0
4
CAATCA
0
1
0
chrVII
383554
+
chrVII:383489:383554:+
0
1
0
0
0
GTATGT
0
1
1
chrIV
698041
-
chrIV:698041:698092:-
0
1
0
0
1
GTACGT
0
1
1
chrVII
1002936
+
chrVII:1002886:1002936:+
0
1
0
0
1
GCATGT
0
1
1
chrVIII
400568
-
chrVIII:400568:400651:-
0
1
0
0
0
GTATGT
0
1
1
chrIV
299959
-
chrIV:299959:300001:-
0
1
0
0
2
GTTTGG
0
1
0
chrXVI
800193
+
chrXVI:800147:800193:+
0
1
0
0
5
AACTTC
0
1
0
chrXII
155741
-
chrXII:155741:155813:-
0
1
0
0
1
GTAAGT
0
1
1
chrXIII
832412
+
chrXIII:832362:832412:+
0
1
0
0
2
GTAAAT
0
1
0
chrXIV
64388
-
chrXIV:64388:64425:-
0
1
0
0
3
TTTTCT
0
1
0
chrIV
1114334
+
chrIV:1114289:1114334:+
0
1
0
0
0
GTATGT
0
1
1
chrXIII
114128
-
chrXIII:114128:114189:-
0
1
0
0
3
TTAAGA
0
1
0
chrX
165022
+
chrX:164962:165022:+
0
1
0
0
2
GTGGGT
0
1
0
chrII
341038
-
chrII:341038:341094:-
0
1
0
0
0
GTATGT
0
1
1
chrXII
898623
+
chrXII:898549:898623:+
0
1
0
0
2
GTATAA
0
1
0
chrVII
607018
-
chrVII:607018:607091:-
0
1
0
0
1
GTATGG
0
1
1
chrXV
819159
+
chrXV:819077:819159:+
0
1
0
0
1
GTCTGT
0
1
1
chrXIII
559830
+
chrXIII:559782:559830:+
0
1
0
0
0
GTATGT
0
1
1
chrXVI
745391
+
chrXVI:745342:745391:+
0
1
0
0
2
TAATGT
0
1
0
chrXIV
611579
+
chrXIV:611503:611579:+
0
1
0
0
0
GTATGT
0
1
1
chrIV
885148
+
chrIV:885101:885148:+
0
1
0
0
2
GTATCA
0
1
0
chrVI
96258
+
chrVI:96184:96258:+
0
1
0
0
1
GTATTT
0
1
1
chrXVI
937231
-
chrXVI:937231:937285:-
0
1
0
0
2
GTATAC
0
1
0
chrVII
414313
+
chrVII:414250:414313:+
0
1
0
0
1
GTATGG
0
1
1
chrXVI
617122
+
chrXVI:617066:617122:+
0
1
0
0
1
GTAAGT
0
1
1
chrXIV
561150
-
chrXIV:561150:561218:-
0
1
0
0
3
AAATAT
0
1
0
chrXII
791911
-
chrXII:791911:791964:-
0
1
0
0
0
GTATGT
0
1
1
chrXI
191089
-
chrXI:191089:191173:-
0
1
0
0
1
GTAAGT
0
1
1
chrXVI
115269
+
chrXVI:115219:115269:+
0
1
0
1
0
GTATGT
0
1
1
chrXVI
841159
+
chrXVI:841109:841159:+
0
1
0
0
2
GAAAGT
0
1
0
chrVII
962303
+
chrVII:962229:962303:+
0
1
0
0
0
GTATGT
0
1
1
chrXII
90638
+
chrXII:90568:90638:+
0
1
0
0
1
GTATGC
0
1
1
chrXVI
593081
-
chrXVI:593081:593139:-
0
1
0
0
2
GTATAA
0
1
0
chrXIV
250194
-
chrXIV:250194:250238:-
0
1
0
0
1
GTATAT
0
1
1
chrIV
676357
+
chrIV:676270:676357:+
0
1
0
0
1
GTACGT
0
1
1
chrVIII
107814
+
chrVIII:107766:107814:+
0
1
0
0
2
GCAAGT
0
1
0
chrIV
657890
+
chrIV:657846:657890:+
0
1
0
0
2
GTATCA
0
1
0
chrIV
331277
+
chrIV:331190:331277:+
0
1
0
0
1
GTAAGT
0
1
1
chrXV
527237
+
chrXV:527186:527237:+
0
1
0
0
1
GTATGA
0
1
1
chrXIII
56618
+
chrXIII:56557:56618:+
0
1
0
0
1
GTATTT
0
1
1
chrXII
333652
+
chrXII:333603:333652:+
0
1
0
0
3
GTAAAG
0
1
0
chrXVI
739749
-
chrXVI:739749:739794:-
0
1
0
0
2
GTAAGC
0
1
0
chrXII
331859
-
chrXII:331859:331922:-
0
1
0
0
0
GTATGT
0
1
1
chrXIV
174521
-
chrXIV:174521:174572:-
0
1
0
0
2
GTATCG
0
1
0
chrVIII
97941
-
chrVIII:97941:98036:-
0
1
0
0
1
GTACGT
0
1
1
chrVII
427172
-
chrVII:427172:427221:-
0
1
0
0
0
GTATGT
0
1
1
chrXV
836398
-
chrXV:836398:836448:-
0
1
0
0
2
ATATTT
0
1
0
chrXIII
38609
-
chrXIII:38609:38680:-
0
1
0
0
2
GTATCG
0
1
0
chrXIV
17324
-
chrXIV:17324:17361:-
0
1
0
0
3
CCAAGT
0
1
0
chrXIV
324785
-
chrXIV:324785:324827:-
0
1
0
0
5
AGGTAG
0
1
0
chrVII
878175
-
chrVII:878175:878242:-
0
1
0
0
3
TTAAGA
0
1
0
chrXII
928301
-
chrXII:928301:928370:-
0
1
0
0
1
GTATAT
0
1
1
chrXI
552231
+
chrXI:552176:552231:+
0
1
0
0
2
GTAAAT
0
1
0
chrX
506239
+
chrX:506193:506239:+
0
1
0
0
1
GTATGA
0
1
1
chrIV
130285
-
chrIV:130285:130359:-
0
1
0
0
2
GTATCG
0
1
0
chrV
7557
-
chrV:7557:7625:-
0
1
0
0
2
GTATAA
0
1
0
chrV
517901
+
chrV:517847:517901:+
0
1
0
0
0
GTATGT
0
1
1
chrXII
907075
+
chrXII:907032:907075:+
0
1
0
0
3
GTCAGG
0
1
0
chrI
157809
-
chrI:157809:157862:-
0
1
0
0
2
GTATAC
0
1
0
chrXI
166500
+
chrXI:166452:166500:+
0
1
0
0
3
GTACTC
0
1
0
chrXV
605429
+
chrXV:605379:605429:+
0
1
0
0
1
GTACGT
0
1
1
chrXVI
717057
-
chrXVI:717057:717144:-
0
1
0
0
1
GTATGC
0
1
1
chrII
76231
+
chrII:76189:76231:+
0
1
0
0
1
GTCTGT
0
1
1
chrXIII
910098
-
chrXIII:910098:910130:-
0
1
0
0
2
CGATGT
0
1
0
chrXI
649449
+
chrXI:649382:649449:+
0
1
0
0
1
GTATGA
0
1
1
chrXIII
551142
-
chrXIII:551142:551202:-
0
1
0
1
0
GTATGT
0
1
1
chrIV
232837
+
chrIV:232747:232837:+
0
1
0
0
1
GTATAT
0
1
1
chrXIII
653956
-
chrXIII:653956:654018:-
0
1
0
0
0
GTATGT
0
1
1
chrXVI
590144
-
chrXVI:590144:590200:-
0
1
0
0
1
ATATGT
0
1
1
chrVII
51958
+
chrVII:51882:51958:+
0
1
0
0
0
GTATGT
0
1
1
chrIII
221009
+
chrIII:220979:221009:+
0
1
0
0
4
TATTCT
0
1
0
chrXII
50297
+
chrXII:50224:50297:+
0
1
0
0
0
GTATGT
0
1
1
chrXIII
637453
+
chrXIII:637403:637453:+
0
1
0
0
1
GTGTGT
0
1
1
chrXIV
273231
+
chrXIV:273189:273231:+
0
1
0
0
1
GTACGT
0
1
1
chrVII
77520
+
chrVII:77467:77520:+
0
1
0
0
2
GTATCA
0
1
0
chrV
61194
+
chrV:61112:61194:+
0
1
0
0
1
GTACGT
0
1
1
chrVI
176327
-
chrVI:176327:176385:-
0
1
0
0
1
GTATGG
0
1
1
chrXVI
883587
-
chrXVI:883587:883649:-
0
1
0
0
1
GTATAT
0
1
1
chrIX
40245
+
chrIX:40178:40245:+
0
1
0
0
3
TCAAGT
0
1
0
chrXIV
273584
-
chrXIV:273584:273650:-
0
1
0
0
2
GTATAA
0
1
0
chrVII
148659
-
chrVII:148659:148721:-
0
1
0
0
2
TTACGT
0
1
0
chrXI
193010
-
chrXI:193010:193071:-
0
1
0
0
0
GTATGT
0
1
1
chrVII
140633
+
chrVII:140554:140633:+
0
1
0
0
1
GTATCT
0
1
1
chrIV
929253
+
chrIV:929206:929253:+
0
1
0
0
1
GTATGA
0
1
1
chrVII
253205
-
chrVII:253205:253253:-
0
1
0
0
1
GTATGA
0
1
1
chrXI
408146
+
chrXI:408101:408146:+
0
1
0
0
1
GTAAGT
0
1
1
chrXV
720499
+
chrXV:720444:720499:+
0
1
0
0
0
GTATGT
0
1
1
chrXI
446830
-
chrXI:446830:446885:-
0
1
0
0
1
GTATGC
0
1
1
chrIV
381064
-
chrIV:381064:381143:-
0
1
0
0
1
GTATCT
0
1
1
chrVII
293422
+
chrVII:293335:293422:+
0
1
0
0
1
GTATTT
0
1
1
chrXI
158626
-
chrXI:158626:158679:-
0
1
0
0
1
ATATGT
0
1
1
chrII
167927
+
chrII:167855:167927:+
0
1
0
0
1
GTAAGT
0
1
1
chrXIV
217113
-
chrXIV:217113:217193:-
0
1
0
0
0
GTATGT
0
1
1
chrVI
137938
+
chrVI:137848:137938:+
0
1
0
0
4
ATGGAT
0
1
0
chrV
201997
+
chrV:201953:201997:+
0
1
0
0
0
GTATGT
0
1
1
chrIV
333709
+
chrIV:333639:333709:+
0
1
0
0
1
GTATGC
0
1
1
chrV
435437
-
chrV:435437:435498:-
0
1
0
0
2
GTAAAT
0
1
0
chrVII
35287
-
chrVII:35287:35351:-
0
1
0
0
2
GTAAGG
0
1
0
chrXII
823170
+
chrXII:823108:823170:+
0
1
0
0
1
GTAAGT
0
1
1
chrII
756982
-
chrII:756982:757037:-
0
1
0
0
1
GTAGGT
0
1
1
chrXVI
560470
+
chrXVI:560413:560470:+
0
1
0
0
1
GTAAGT
0
1
1
chrXII
987203
+
chrXII:987139:987203:+
0
1
0
1
0
GTATGT
0
1
1
chrX
237005
+
chrX:236962:237005:+
0
1
0
0
1
GTATGC
0
1
1
chrV
487859
+
chrV:487816:487859:+
0
1
0
0
1
GTATGA
0
1
1
chrX
633064
-
chrX:633064:633144:-
0
1
0
0
2
GTAAGC
0
1
0
chrXV
249913
+
chrXV:249872:249913:+
0
1
0
0
3
ATCGGT
0
1
0
chrVI
268722
+
chrVI:268674:268722:+
0
1
0
0
1
GTATGC
0
1
1
chrX
526803
-
chrX:526803:526879:-
0
1
0
0
2
TAATGT
0
1
0
chrXVI
616651
+
chrXVI:616593:616651:+
0
1
0
0
0
GTATGT
0
1
1
chrVII
561205
-
chrVII:561205:561267:-
0
1
0
0
2
GGTTGT
0
1
0
chrI
126071
+
chrI:125985:126071:+
0
1
0
0
1
GTATGG
0
1
1
chrXII
75835
-
chrXII:75835:75882:-
0
1
0
0
3
GTAAAC
0
1
0
chrXIII
845438
-
chrXIII:845438:845541:-
0
1
0
0
1
TTATGT
0
1
1
chrIV
456688
-
chrIV:456688:456757:-
0
1
0
0
1
GTAAGT
0
1
1
chrXI
166478
+
chrXI:166405:166478:+
0
1
0
1
1
GTACGT
0
1
1
chrXIV
349280
-
chrXIV:349280:349355:-
0
1
0
0
1
GTATGG
0
1
1
chrVI
32028
-
chrVI:32028:32077:-
0
1
0
0
1
GTAAGT
0
1
1
chrIX
127053
-
chrIX:127053:127135:-
0
1
0
0
1
GTATAT
0
1
1
chrIII
84709
+
chrIII:84665:84709:+
0
1
0
0
1
GTCTGT
0
1
1
chrXVI
159662
+
chrXVI:159585:159662:+
0
1
0
0
1
GTAAGT
0
1
1
chrVII
574777
+
chrVII:574705:574777:+
0
1
0
0
1
GTATTT
0
1
1
chrXIII
810474
-
chrXIII:810474:810549:-
0
1
0
0
1
GTATGA
0
1
1
chrXIV
227320
-
chrXIV:227320:227353:-
0
1
0
0
4
GTGATA
0
1
0
chrIV
104615
+
chrIV:104542:104615:+
0
1
0
0
1
GTATGA
0
1
1
chrXV
845022
+
chrXV:844959:845022:+
0
1
0
0
0
GTATGT
0
1
1
chrIII
169189
-
chrIII:169189:169268:-
0
1
0
0
1
GTATAT
0
1
1
chrXVI
584023
-
chrXVI:584023:584078:-
0
1
0
0
2
GTATTG
0
1
0
chrX
71475
+
chrX:71426:71475:+
0
1
0
0
2
GTTGGT
0
1
0
chrVII
658296
+
chrVII:658204:658296:+
0
1
0
0
1
GTTTGT
0
1
1
chrXIV
359077
-
chrXIV:359077:359147:-
0
1
0
0
1
GTATAT
0
1
1
chrV
203928
+
chrV:203855:203928:+
0
1
0
0
1
GTATGG
0
1
1
chrXIV
69244
+
chrXIV:69184:69244:+
0
1
0
0
0
GTATGT
0
1
1
chrXVI
490809
-
chrXVI:490809:490857:-
0
1
0
0
1
GTAAGT
0
1
1
chrXVI
824941
-
chrXVI:824941:825021:-
0
1
0
0
2
GTGCGT
0
1
0
chrXI
272947
-
chrXI:272947:272992:-
0
1
0
0
0
GTATGT
0
1
1
chrIII
228656
+
chrIII:228610:228656:+
0
1
0
0
1
GTATGA
0
1
1
chrXI
38909
+
chrXI:38829:38909:+
0
1
0
0
4
AGATAG
0
1
0
chrXVI
101222
+
chrXVI:101139:101222:+
0
1
0
0
1
GTAAGT
0
1
1
chrXIII
301516
-
chrXIII:301516:301586:-
0
1
0
0
1
GTAAGT
0
1
1
chrVII
792327
-
chrVII:792327:792385:-
0
1
0
0
1
GTATCT
0
1
1
chrXV
733355
-
chrXV:733355:733417:-
0
1
0
0
1
GCATGT
0
1
1
chrXVI
231560
+
chrXVI:231499:231560:+
0
1
0
0
2
GTAAGA
0
1
0
chrXII
783405
-
chrXII:783405:783492:-
0
1
0
0
2
TAATGT
0
1
0
chrVII
593472
+
chrVII:593395:593472:+
0
1
0
0
1
GTAAGT
0
1
1
chrIII
290580
+
chrIII:290522:290580:+
0
1
0
0
1
GTATTT
0
1
1
chrVII
262616
-
chrVII:262616:262696:-
0
1
0
0
0
GTATGT
0
1
1
chrXIV
427177
+
chrXIV:427106:427177:+
0
1
0
0
1
GTATGA
0
1
1
chrXIV
380702
-
chrXIV:380702:380783:-
0
1
0
1
0
GTATGT
0
1
1
chrII
49795
+
chrII:49765:49795:+
0
1
0
0
3
GAACGC
0
1
0
chrIII
119534
-
chrIII:119534:119586:-
0
1
0
0
3
GTACTG
0
1
0
chrXII
522715
+
chrXII:522672:522715:+
0
1
0
1
0
GTATGT
0
1
1
chrV
160514
+
chrV:160465:160514:+
0
1
0
0
1
GTATAT
0
1
1
chrXIV
394037
+
chrXIV:393996:394037:+
0
1
0
0
1
GTATGC
0
1
1
chrII
306857
-
chrII:306857:306902:-
0
1
0
0
2
GTATCA
0
1
0
chrVII
914790
-
chrVII:914790:914879:-
0
1
0
0
1
GTAAGT
0
1
1
chrXIV
767744
-
chrXIV:767744:767788:-
0
1
0
0
2
GTACGC
0
1
0
chrIV
768400
-
chrIV:768400:768453:-
0
1
0
0
1
ATATGT
0
1
1
chrII
45935
+
chrII:45875:45935:+
0
1
0
0
2
GCATGC
0
1
0
chrVIII
115532
+
chrVIII:115489:115532:+
0
1
0
0
2
GTATAA
0
1
0
chrIII
120379
+
chrIII:120290:120379:+
0
1
0
0
1
GTAAGT
0
1
1
chrXII
965102
+
chrXII:965038:965102:+
0
1
0
0
1
GTATAT
0
1
1
chrIV
1333316
-
chrIV:1333316:1333401:-
0
1
0
0
1
GTATGC
0
1
1
chrIX
261766
+
chrIX:261723:261766:+
0
1
0
0
1
GTATGA
0
1
1
chrXI
490071
+
chrXI:489994:490071:+
0
1
0
0
0
GTATGT
0
1
1
chrXV
139434
+
chrXV:139377:139434:+
0
1
0
0
1
GTAAGT
0
1
1
chrIX
334600
-
chrIX:334600:334643:-
0
1
0
0
1
GTTTGT
0
1
1
chrX
119609
-
chrX:119609:119650:-
0
1
0
0
3
GAGTTT
0
1
0
chrXIV
726914
-
chrXIV:726914:726978:-
0
1
0
0
1
GTCTGT
0
1
1
chrXIV
401581
+
chrXIV:401504:401581:+
0
1
0
0
2
GTATCG
0
1
0
chrXII
380684
+
chrXII:380627:380684:+
0
1
0
0
0
GTATGT
0
1
1
chrXV
754237
-
chrXV:754237:754290:-
0
1
0
0
1
GTAAGT
0
1
1
chrIV
1111887
+
chrIV:1111836:1111887:+
0
1
0
0
0
GTATGT
0
1
1
chrXIV
763546
+
chrXIV:763498:763546:+
0
1
0
0
3
GTACAG
0
1
0
chrXVI
939906
-
chrXVI:939906:939951:-
0
1
0
0
3
GTACCG
0
1
0
chrXV
1059709
-
chrXV:1059709:1059794:-
0
1
0
0
1
GTAAGT
0
1
1
chrV
374671
-
chrV:374671:374721:-
0
1
0
0
1
GTTTGT
0
1
1
chrX
649521
-
chrX:649521:649569:-
0
1
0
0
1
GCATGT
0
1
1
chrX
570580
-
chrX:570580:570633:-
0
1
0
0
3
GTAGTC
0
1
0
chrXVI
481183
-
chrXVI:481183:481238:-
0
1
0
0
1
GTACGT
0
1
1
chrXIV
545602
+
chrXIV:545556:545602:+
0
1
0
0
2
GTAACT
0
1
0
chrVII
314262
-
chrVII:314262:314298:-
0
1
0
0
2
ATATAT
0
1
0
chrXII
232370
+
chrXII:232297:232370:+
0
1
0
0
1
GTAAGT
0
1
1
chrXV
674437
-
chrXV:674437:674504:-
0
1
0
0
1
GTATGA
0
1
1
chrIV
1355140
-
chrIV:1355140:1355228:-
0
1
0
0
3
GTAGAG
0
1
0
chrV
305684
+
chrV:305624:305684:+
0
1
0
0
0
GTATGT
0
1
1
chrIX
360694
-
chrIX:360694:360737:-
0
1
0
0
1
GCATGT
0
1
1
chrXIII
112660
-
chrXIII:112660:112715:-
0
1
0
0
1
GTATGA
0
1
1
chrIV
1453412
+
chrIV:1453370:1453412:+
0
1
0
0
2
GTTGGT
0
1
0
chrVII
730063
+
chrVII:730017:730063:+
0
1
0
0
1
GTATGG
0
1
1
chrVII
594140
-
chrVII:594140:594202:-
0
1
0
0
0
GTATGT
0
1
1
chrXII
987240
+
chrXII:987139:987240:+
0
1
0
1
0
GTATGT
0
1
1
chrVI
109969
-
chrVI:109969:110026:-
0
1
0
0
1
GTAAGT
0
1
1
chrVII
293817
+
chrVII:293737:293817:+
0
1
0
0
1
GTATGG
0
1
1
chrIV
130328
+
chrIV:130289:130328:+
0
1
0
0
1
GTATGC
0
1
1
chrXIV
573917
-
chrXIV:573917:573973:-
0
1
0
0
2
GTTTGA
0
1
0
chrX
535286
-
chrX:535286:535343:-
0
1
0
0
1
GTAAGT
0
1
1
chrIV
247245
-
chrIV:247245:247300:-
0
1
0
0
1
GTATAT
0
1
1
chrII
290470
-
chrII:290470:290528:-
0
1
0
0
0
GTATGT
0
1
1
chrVII
34393
-
chrVII:34393:34444:-
0
1
0
0
4
ACCGGT
0
1
0
chrX
264007
-
chrX:264007:264058:-
0
1
0
0
2
GTGCGT
0
1
0
chrXIII
493957
+
chrXIII:493909:493957:+
0
1
0
0
1
GTTTGT
0
1
1
chrI
152003
-
chrI:152003:152055:-
0
1
0
0
2
GTATAC
0
1
0
chrXIII
701813
+
chrXIII:701740:701813:+
0
1
0
0
1
GTAGGT
0
1
1
chrIX
128118
+
chrIX:128061:128118:+
0
1
0
0
2
TTTTGT
0
1
0
chrXIII
727505
+
chrXIII:727436:727505:+
0
1
0
0
6
TCCCAA
0
1
0
chrII
372030
+
chrII:371971:372030:+
0
1
0
0
2
GAACGT
0
1
0
chrVII
436364
+
chrVII:436318:436364:+
0
1
0
0
0
GTATGT
0
1
1
totals
198
430
268
Table II-S3. GTATGT motif frequency at 5'SS and generally in introns.

No. mismatches from /GTATGT    Annotated 5'SS    Annotated 5'SS, no chrM    Arbitrary intron positions
0                              216               216                        10
1                              73                73                         24
2                              6                 6                          130
3                              18                1                          471
4                              7                 2                          982
5                              7                 0                          973
6                              3                 0                          390
total                          330               298                        2980
0 or 1 mut                     289               289                        34
% 0 or 1 mut                   87.58             96.98                      1.1
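The mismatch counts in Table II-S3 can be reproduced in outline with a short script. The following is a minimal sketch, not the code used in this work: it assumes each site is given as a six-nucleotide string and that a mismatch is a position-wise difference from GTATGT (a Hamming distance). The function names are illustrative.

```python
from collections import Counter

CONSENSUS = "GTATGT"  # 5'SS consensus hexamer used in Table II-S3


def mismatches(hexamer, consensus=CONSENSUS):
    """Number of positions at which hexamer differs from the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer must have the same length as the consensus")
    return sum(a != b for a, b in zip(hexamer.upper(), consensus))


def mismatch_table(hexamers):
    """Counts of sites with 0, 1, ..., 6 mismatches, in that order."""
    counts = Counter(mismatches(h) for h in hexamers)
    return [counts.get(k, 0) for k in range(len(CONSENSUS) + 1)]


def percent_near_consensus(counts):
    """Percentage of sites with 0 or 1 mismatches."""
    return 100.0 * (counts[0] + counts[1]) / sum(counts)


# Counts from the "Annotated 5'SS" column of Table II-S3.
annotated = [216, 73, 6, 18, 7, 7, 3]
print(round(percent_near_consensus(annotated), 2))
```

Applied to the "Annotated 5'SS" column, the last line gives the same percentage reported in the "% 0 or 1 mut" row.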
Table II-S4. Branch-seq CPMs. The bp_id format is chromosome:bp_nucleotide:strand_of_bp. branch_seq_cpm is the counts per million calculated from all data (top, middle, and bottom slices of the gel arc combined). A gene_name of “not_in_intron_or_TIF” means the BP did not fall inside a gene according to Pelechano et al. (2013). bp_type is “annotated” if the peak fell within 3 nt of an annotated BP location according to Meyer et al. (2011).

bp_id               branch_seq_cpm   gene_name    bp_type
chrVII:497961:-     476.7266362      YGR001C      annotated
chrXIII:123777:-    1.126520145      YML073C      annotated
chrV:433162:+       0.292060778      YER133W      annotated
chrV:308040:+       205.9028487      YER074W-A    annotated
chrV:307801:+       248.1264927      YER074W-A    annotated
chrV:269803:-       0.083445937      YER056C-A    annotated
chrV:348217:-       221.7575767      YER093C-A    annotated
chrV:148254:+       301.9908448      YEL003W      annotated
chrV:239667:-       0.75101343       YER044C-A    annotated
chrV:131853:+       160.3830903      YEL012W      annotated
chrV:159013:-       674.4517831      YER003C      annotated
chrV:548611:+       1.376857955      YER179W      annotated
chrV:166786:-       251.2557153      YER007C-A    annotated
chrV:397225:+       0.333783747      YER117W      annotated
chrII:462255:+      274.036456       YBR111W-A    annotated
chrII:479389:+      247.9596008      YBR119W      annotated
chrII:602175:+      2.711992942      YBR186W      annotated
chrII:606615:+      0.917905303      YBR191W      annotated
chrII:653429:+      196.306566       YBR215W      annotated
chrII:170771:+      29.95709126      YBL026W      annotated
chrII:110932:+      198.5178833      YBL059W      annotated
chrII:333365:+      7.092904617      YBR048W      annotated
chrII:110451:-      375.590161       YBL059C-A    annotated
chrII:186375:-      513.1507876      YBL018C      annotated
chrII:462472:+      79.1901939       YBR111W-A    annotated
chrII:142787:-      241.9932163      YBL040C      annotated
chrII:47071:-       44.47668425      YBL091C-A    annotated
chrII:168770:+      0.166891873      YBL027W      annotated
chrII:426524:-      0.458952652      YBR090C      annotated
chrII:125234:+      112.1930619      YBL050W      annotated
chrII:366534:-      129.0491411      YBR062C      annotated
chrII:726947:-      120.9966082      YBR255C-A    annotated
chrII:679973:-      102.6802251      YBR230C      annotated
chrII:407047:-      2316.334033      YBR082C      annotated
chrVI:64615:-       0.458952652      YFL034C-A    annotated
chrVI:221291:-      3.00405372       YFR031C-A    annotated
chrVI:242044:+      349.5967517      YFR045W      annotated
chrVI:63915:-       332.3651658      YFL034C-B    annotated
chrVI:203304:-      787.2289666      YFR024C-A    annotated
chrXIV:62407:-      0.166891873      YNL302C      annotated
chrXIV:534915:-     362.1970881      YNL050C      annotated
chrXIV:494551:-     3.50472934       YNL069C      annotated
chrXIV:623270:+     0.083445937      YNL004W      annotated
chrXIV:351021:+     438.2163364      YNL147W      annotated
chrXIV:545341:+     1186.726388      YNL044W      annotated
chrXIV:48377:+      87.78512538      YNL312W      annotated
chrXIV:331803:+     0.50067562       YNL162W      annotated
chrXIV:185553:+     134.3062351      YNL246W      annotated
chrXIV:145191:-     172.9417038      YNL265C      annotated
chrXIV:609837:+     1.25168905       YNL012W      annotated
chrXIV:366096:+     225.1371371      YNL138W-A    annotated
chrXIV:380726:-     537.4335551      YNL130C      annotated
chrXIV:415872:+     0.292060778      YNL112W      annotated
chrXIV:557672:+     334.659929       YNL038W      annotated
chrXVI:96189:-      218.8369689      YPL241C      annotated
chrXVI:492958:-     403.210766       YPL031C      annotated
chrXVI:623662:+     391.1528281      YPR028W      annotated
chrXVI:729414:-     126.3788711      YPR098C      annotated
chrXVI:407002:+     9.679728654      YPL079W      annotated
chrXVI:305374:+     165.9322451      YPL129W      annotated
chrXVI:883453:+     202.1477816      YPR170W-B    annotated
chrXVI:678219:-     565.4713899      YPR063C      annotated
chrXVI:345483:-     0.083445937      YPL109C      annotated
chrXVI:76014:-      0.166891873      YPL249C-A    annotated
chrXVI:218711:+     254.3849379      YPL175W      annotated
chrXVI:654534:+     0.292060778      YPR043W      annotated
chrXVI:405426:+     0.083445937      YPL081W      annotated
chrXVI:911328:+     1047.663735      YPR187W      annotated
chrXVI:833779:+     136.8930591      YPR153W      annotated
chrXVI:412958:+     28.87229409      YPL075W      annotated
chrXVI:174036:+     0.792736398      YPL198W      annotated
chrXI:625594:+      630.6009434      YKR095W-A    annotated
chrXI:447376:-      229.267711       YKR004C      annotated
chrXI:430163:-      132.1366407      YKL006C-A    annotated
chrXI:283016:+      0.292060778      YKL081W      annotated
chrXI:437526:+      592.6330422      YKL002W      annotated
chrXI:449611:-      5.131925105      YKR005C      annotated
chrXI:83061:+       319.4727685      YKL190W      annotated
chrXI:618096:chrVII:543708:+
chrVII:946407:+
chrVII:497394:chrVII:157245:chrVII:435728:+
chrVII:346854:chrVII:73034:chrVII:311486:+
chrVII:556282:+
chrVII:62173:+
chrX:702871:chrX:387380:chrX:435302:+
chrX:76268:+
chrX:74178:+
chrX:365853:+
chrX:608540:+
chrX:396512:chrX:469190:chrX:50351:chrXV:93868:chrXV:240976:chrXV:778910:chrXV:867516:+
chrXV:242453:chrXV:92476:chrXV:900802:chrIX:166484:+
chrIX:232012:chrIX:225834:chrIX:317136:+
chrIX:47743:+
chrIX:155276:+
chrIX:348380:chrXII:522984:+
chrXII:766205:chrXII:564467:chrXII:548706:chrXII:1024631:+
chrXII:398583:+
chrXII:286498:chrXII:786667:+
chrXII:857038:+
chrXII:40353:chrXII:694444:+
chrXII:250899:-
15.64611313
129.6332626
1.00135124
401.9590769
3.212668562
0.166891873
32.08496265
19.65151809
0.667567493
0.625844525
317.0945594
0.166891873
414.5594134
207.3214297
0.083445937
31.33394922
441.0117753
0.792736398
329.3193891
15.47922125
77.02059955
0.083445937
344.9655022
0.083445937
0.542398588
19.60979512
51.44441996
1.585472797
603.3141221
1.293412018
0.083445937
0.542398588
547.1132838
275.329868
0.50067562
0.292060778
535.8063594
31.12533438
175.1530211
1.919256543
89.28715224
113.0275212
10.84797177
81.40151122
2850.179413
310.5023304
1.710641702
YKR094C
YGR029W
YGR225W
YGR001C
YGL183C
YGL033W
YGL087C
YGL226C-A
YGL103W
YGR034W
YGL232W
YJR145C
YJL031C
YJL001W
YJL189W
YJL191W
YJL041W
YJR094W-A
YJL024C
YJR021C
YJL205C
YOL120C
YOL048C
YOR234C
YOR293W
YOL047C
YOL121C
YOR312C
YIL106W
YIL069C
YIL073C
YIL018W
YIL156W-B
YIL111W
YIL004C
YLR185W
YLR316C
YLR211C
YLR199C
YLR445W
YLR128W
YLR078C
YLR329W
YLR367W
YLL050C
YLR275W
YLR054C
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
chrXII:327295:chrXII:744219:+
chrXII:766086:chrXII:987191:+
chrXII:242652:+
chrXII:263540:+
chrXIII:337886:+
chrXIII:666961:chrXIII:223421:chrXIII:499898:chrXIII:732835:+
chrXIII:651593:+
chrXIII:854879:+
chrXIII:425113:+
chrXIII:82343:+
chrXIII:206162:+
chrXIII:140113:chrXIII:537527:+
chrXIII:517842:+
chrXIII:721231:chrXIII:753788:chrXIII:652797:chrXIII:211547:+
chrXIII:99301:chrXIII:225264:chrIII:111588:chrIII:177952:chrIII:173137:chrIII:101646:chrIII:107089:+
chrIII:107255:+
chrIV:1212941:+
chrIV:1266790:chrIV:1237581:+
chrIV:1103871:+
chrIV:1073346:chrIV:1450485:chrIV:491873:+
chrIV:65358:+
chrIV:1238769:chrIV:1319751:chrIV:254999:chrIV:267780:+
chrIV:239421:chrIV:217970:+
chrIV:431423:chrIV:337596:+
10.51418802
171.6900147
482.1506221
193.8031879
0.584121557
0.166891873
835.6276098
225.4291979
0.292060778
0.208614842
0.083445937
0.166891873
223.6768332
0.083445937
217.2932191
118.4515071
99.59272542
1.543749828
0.083445937
1.460303892
0.125168905
114.9885007
59.9559055
437.2984311
634.5646254
118.2428923
1.919256543
618.2926678
93.66806391
297.6099331
352.4339135
276.3312193
144.945592
3.379560435
150.8285305
150.9119765
1.960979512
0.166891873
205.9862947
150.7868076
113.277859
439.1342417
101.5537049
415.2687038
0.417229683
1294.872322
331.6975983
YLR093C
YLR306W
YLR316C
YLR426W
YLR048W
YLR061W
YMR033W
YMR201C
YML026C
YMR116C
YMR230W
YMR194W
YMR292W
YMR079W
YML094W
YML036W
YML067C
YMR133W
YMR125W
YMR225C
YMR242C
YMR194C-B
YML034W
YML085C
YML025C
YCL002C
YCR031C
YCR028C-A
YCL012C
YCL005W-A
YCL005W-A
YDR367W
YDR397C
YDR381W
YDR318W
YDR305C
YDR500C
YDR025W
YDL219W
YDR381C-A
YDR424C
YDL115C
YDL108W
YDL125C
YDL136W
YDL012C
YDL064W
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
chrIV:715265:chrIV:579963:+
chrIV:399468:+
chrIV:630016:+
chrIV:733713:chrIV:569665:chrIV:458048:chrIV:307375:chrIV:230262:+
chrIV:1319627:chrI:151022:chrI:87439:+
chrVIII:354926:+
chrVIII:129611:+
chrVIII:187613:chrVIII:498731:chrVIII:382356:chrVIII:298418:chrVIII:104751:+
chrVIII:255689:chrVIII:107875:+
chrVIII:315810:chrVIII:189778:chrVIII:251233:+
chrVIII:262372:chrVIII:138274:chrV:151105:+
chrV:166806:chrV:540383:chrII:115534:+
chrII:170731:+
chrII:221024:+
chrII:606346:+
chrII:291715:chrII:592709:chrXVI:218695:+
chrXVI:335968:+
chrXVI:76164:chrXVI:281450:chrXVI:602284:chrXVI:729395:chrXVI:777582:chrXVI:937537:chrXI:96757:+
chrXI:468921:chrVII:439462:+
chrVII:543691:+
166.4746437
20.65286933
10.38901912
0.709290462
307.7903374
153.7491383
576.6531454
0.25033781
0.083445937
157.2538677
100.4271848
179.6173787
255.67835
89.12026036
1389.917244
30.45776688
0.083445937
198.976836
0.292060778
528.1293332
674.0345535
286.3864547
151.7881588
92.41637486
0.083445937
0.083445937
0.625844525
0.625844525
1.084797177
15.47922125
24.44965944
3.087499657
0.709290462
1.00135124
6.425337124
0.876182335
1.335134987
0.667567493
0.667567493
4.839864327
1.460303892
0.834459367
1.25168905
2.795438878
0.917905303
5.25709401
2.419932163
YDR129C
YDR064W
YDL029W
YDR092W
YDR139C
YDR059C
YDR005C
YDL083C
YDL130W
YDR424C
YAL001C
YAL030W
YHR123W
YHR012W
YHR039C-A
YHR199C-A
YHR141C
YHR097C
YHL001W
YHR077C
YHR001W-A
YHR101C
YHR041C
YHR076W
YHR079C-A
YHR016C
not_in_intron_or_TIF
YER007C-A
YER175C
YBL056W
YBL026W
not_in_intron_or_TIF
YBR191W
YBR025C
YBR181C
YPL175W
YPL114W
YPL249C-A
snR17b
not_in_intron_or_TIF
YPR098C
not_in_intron_or_TIF
not_in_intron_or_TIF
YKL184W
YKR015C
YGL030W
YGR029W
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
annotated
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrVII:555885:+
chrVII:143935:chrVII:497408:chrVII:859434:chrXIII:182669:+
chrXIII:425094:+
chrXIII:652808:chrXV:505980:+
chrXV:930113:+
chrXV:117635:chrXV:349520:chrIX:155301:+
chrIX:166501:+
chrXII:382387:+
chrXII:286519:chrX:435281:+
chrIV:107156:+
chrIV:392638:+
chrIV:438275:+
chrIV:655244:+
chrIV:1145148:+
chrIV:22303:chrIV:451430:chrVIII:129590:+
chrVIII:406673:chrV:61194:+
chrV:124638:chrV:160514:+
chrV:201997:+
chrV:203928:+
chrV:305684:+
chrV:336936:+
chrV:362830:+
chrV:374671:chrV:487859:+
chrV:517901:+
chrV:561245:+
chrII:13949:chrII:76231:+
chrII:167927:+
chrII:206284:+
chrII:290470:chrII:331391:chrII:341038:chrII:342756:+
chrII:431594:chrII:443734:-
0.542398588
6.174999314
1.543749828
5.799492599
0.959628272
0.375506715
3.50472934
4.965033232
2.086148417
0.792736398
1.209966082
1.084797177
0.709290462
3.713344182
3.212668562
1.376857955
1.710641702
0.50067562
1.75236467
1.960979512
30.37432095
3.75506715
4.130573865
0.75101343
0.959628272
0.166891873
0.667567493
0.375506715
46.43766376
0.333783747
9.26249897
1.543749828
1.75236467
2.461655132
0.625844525
395.3668479
0.083445937
0.959628272
0.75101343
0.166891873
0.125168905
0.083445937
5.090202137
0.75101343
5.50743182
0.50067562
32.29357749
YGR034W
not_in_intron_or_TIF
YGR001C
YGR183C
YML046W
YMR079W
YMR194C-B
YOR096W
YOR326W
not_in_intron_or_TIF
not_in_intron_or_TIF
YIL111W
YIL106W
YLR116W
YLR078C
YJL001W
YDL195W
not_in_intron_or_TIF
YDL007W
YDR100W
YDR336W
not_in_intron_or_TIF
YDR001C
YHR012W
not_in_intron_or_TIF
not_in_intron_or_TIF
YEL016C
YER004W, YER005W
YER023W
YER024W
YER073W
YER088W-B
YER102W
YER107C
not_in_intron_or_TIF
YER167W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YBR025C
YBR046C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YBR101C
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrII:519067:chrII:756982:chrVI:32028:chrVI:96258:+
chrVI:109969:chrVI:176327:chrVI:268722:+
chrXIV:69244:+
chrXIV:217113:chrXIV:250194:chrXIV:273231:+
chrXIV:349280:chrXIV:359077:chrXIV:369425:chrXIV:373658:+
chrXIV:380702:chrXIV:394037:+
chrXIV:427177:+
chrXIV:429575:chrXIV:429599:chrXIV:583910:chrXIV:611579:+
chrXIV:726914:chrXVI:101222:+
chrXVI:115269:+
chrXVI:146507:chrXVI:159662:+
chrXVI:196570:chrXVI:339245:+
chrXVI:339305:+
chrXVI:445519:chrXVI:481183:chrXVI:490809:chrXVI:513415:chrXVI:560470:+
chrXVI:579895:chrXVI:590144:chrXVI:616651:+
chrXVI:617122:+
chrXVI:685775:+
chrXVI:717057:chrXVI:883587:chrXI:93384:chrXI:94081:chrXI:158626:chrXI:166478:+
chrXI:191089:-
0.292060778
0.25033781
0.625844525
1.209966082
0.292060778
3.75506715
0.375506715
0.834459367
0.542398588
0.083445937
0.125168905
0.125168905
0.083445937
0.125168905
2.086148417
16.6057414
0.584121557
1.710641702
8.886992255
0.458952652
3.546452309
4.464357612
0.125168905
0.208614842
65.7553981
10.9314177
0.208614842
0.125168905
0.083445937
0.083445937
3.629898245
4.00540496
0.333783747
0.125168905
0.50067562
0.333783747
0.125168905
0.125168905
0.125168905
0.083445937
0.417229683
2.044425448
13.39307284
5.215371042
0.083445937
4.464357612
0.083445937
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YFL021W
not_in_intron_or_TIF
YFR015C
not_in_intron_or_TIF
YNL298W
YNL231C
YNL211C
not_in_intron_or_TIF
YNL149C
not_in_intron_or_TIF
YNL137C
not_in_intron_or_TIF
YNL130C
YNL124W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YNL025C
YNL012W
not_in_intron_or_TIF
YPL237W
YPL230W
not_in_intron_or_TIF
YPL208W
YPL184C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YPL037C
YPL032C
YPL020C
not_in_intron_or_TIF
YPR010C
YPR015C
not_in_intron_or_TIF
not_in_intron_or_TIF
YPR070W
YPR091C
not_in_intron_or_TIF
YKL186C
not_in_intron_or_TIF
not_in_intron_or_TIF
YKL150W
YKL134C
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrXI:193010:chrXI:225774:chrXI:231492:+
chrXI:272947:chrXI:285868:+
chrXI:290610:chrXI:408146:+
chrXI:446830:chrXI:490071:+
chrXI:633804:chrXI:649449:+
chrVII:51958:+
chrVII:59556:+
chrVII:140633:+
chrVII:167430:+
chrVII:253205:chrVII:262616:chrVII:293422:+
chrVII:293817:+
chrVII:345543:chrVII:380660:chrVII:383554:+
chrVII:414313:+
chrVII:423867:chrVII:427172:chrVII:436364:+
chrVII:504217:chrVII:574777:+
chrVII:593472:+
chrVII:594140:chrVII:607018:chrVII:658296:+
chrVII:682969:chrVII:730063:+
chrVII:772104:+
chrVII:792327:chrVII:875660:+
chrVII:914790:chrVII:962303:+
chrVII:1002936:+
chrVII:1052638:chrXIII:56618:+
chrXIII:112660:chrXIII:204536:chrXIII:242968:chrXIII:273520:chrXIII:301516:-
38.80236055
3.671621214
0.458952652
0.834459367
0.625844525
0.959628272
25.74307146
0.292060778
6.717397902
0.208614842
0.083445937
0.375506715
0.125168905
0.083445937
6.00810744
95.17009077
0.083445937
0.083445937
0.75101343
0.125168905
0.083445937
94.46080031
0.041722968
0.333783747
0.375506715
39.9288807
0.125168905
1.919256543
0.292060778
1.043074208
0.375506715
0.083445937
0.083445937
0.083445937
0.417229683
0.625844525
0.125168905
0.667567493
0.667567493
0.166891873
0.333783747
0.25033781
0.208614842
0.584121557
0.50067562
0.458952652
0.417229683
YKL133C
not_in_intron_or_TIF
YKL109W
not_in_intron_or_TIF
YKL080W
not_in_intron_or_TIF
YKL015W
YKR004C
not_in_intron_or_TIF
YKR098C
not_in_intron_or_TIF
YGL238W
YGL233W
not_in_intron_or_TIF
YGL178W
YGL136C
not_in_intron_or_TIF
YGL114W
YGL114W
not_in_intron_or_TIF
YGL065C
YGL063W
YGL045W
not_in_intron_or_TIF
YGL037C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YGR089W
not_in_intron_or_TIF
not_in_intron_or_TIF
YGR141W
YGR150C
not_in_intron_or_TIF
YGR210C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YML106W
YML076C
not_in_intron_or_TIF
YML015C
not_in_intron_or_TIF
YMR015C
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrXIII:493957:+
chrXIII:551142:chrXIII:559830:+
chrXIII:637453:+
chrXIII:647112:+
chrXIII:649541:chrXIII:653956:chrXIII:701813:+
chrXIII:810474:chrXIII:845438:chrXV:87614:+
chrXV:139434:+
chrXV:418633:+
chrXV:423675:chrXV:437796:chrXV:506801:chrXV:518764:chrXV:527237:+
chrXV:605429:+
chrXV:674437:chrXV:720499:+
chrXV:733355:chrXV:754237:chrXV:819159:+
chrXV:845022:+
chrXV:1059709:chrXV:1062512:chrIX:54026:+
chrIX:127053:chrIX:134040:+
chrIX:225116:+
chrIX:261766:+
chrIX:334600:chrIX:360694:chrXII:50297:+
chrXII:90638:+
chrXII:155741:chrXII:232370:+
chrXII:316897:+
chrXII:331859:chrXII:380684:+
chrXII:382366:+
chrXII:491545:+
chrXII:522715:+
chrXII:609447:chrXII:791911:chrXII:823170:+
0.125168905
6.049830409
128.5484654
0.667567493
1.168243113
0.125168905
0.166891873
0.166891873
0.292060778
0.083445937
0.667567493
0.208614842
0.125168905
1.293412018
0.458952652
16.68918733
2.711992942
0.417229683
0.292060778
0.083445937
2.75371591
7.76047211
0.166891873
0.125168905
13.05928909
0.125168905
1.877533575
0.834459367
1.126520145
8.636654445
0.166891873
1.710641702
0.208614842
0.166891873
1.877533575
0.25033781
0.792736398
0.166891873
0.292060778
0.584121557
0.458952652
1.418580923
3.838513087
14.56131595
7.885641015
0.083445937
0.292060778
not_in_intron_or_TIF
YMR142C
not_in_intron_or_TIF
YMR189W
YMR192W
not_in_intron_or_TIF
not_in_intron_or_TIF
YMR217W
YMR272C
not_in_intron_or_TIF
YOL123W
not_in_intron_or_TIF
not_in_intron_or_TIF
YOR049C
not_in_intron_or_TIF
YOR097C
not_in_intron_or_TIF
YOR109W
not_in_intron_or_TIF
YOR180C
not_in_intron_or_TIF
YOR207C
not_in_intron_or_TIF
YOR264W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YIL153W
not_in_intron_or_TIF
YIL121W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YLL043W
YLL026W
YLR002C
not_in_intron_or_TIF
YLR088W
YLR095C
YLR115W
YLR116W
not_in_intron_or_TIF
YLR185W
YLR233C
not_in_intron_or_TIF
not_in_intron_or_TIF
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrXII:918954:chrXII:928301:chrXII:965102:+
chrXII:982469:chrXII:987203:+
chrXII:987240:+
chrIII:84709:+
chrIII:120379:+
chrIII:169189:chrIII:228656:+
chrIII:290580:+
chrX:31820:+
chrX:209415:+
chrX:237005:+
chrX:349225:+
chrX:422631:+
chrX:455748:+
chrX:506239:+
chrX:535286:chrX:632996:chrX:649521:chrX:712053:chrIV:104615:+
chrIV:122159:+
chrIV:130328:+
chrIV:188133:chrIV:232837:+
chrIV:235107:+
chrIV:247245:chrIV:268933:chrIV:331277:+
chrIV:333709:+
chrIV:381064:chrIV:392681:+
chrIV:456688:chrIV:540585:+
chrIV:676357:+
chrIV:698041:chrIV:721691:chrIV:768400:chrIV:929253:+
chrIV:1021386:+
chrIV:1080238:chrIV:1081204:+
chrIV:1111887:+
chrIV:1114334:+
chrIV:1333316:-
0.458952652
0.667567493
0.083445937
71.84695147
7.969086952
0.542398588
0.083445937
0.125168905
0.208614842
0.834459367
0.166891873
12.18310675
0.333783747
0.25033781
0.50067562
4.631249485
0.083445937
0.125168905
0.125168905
59.1631691
0.125168905
0.125168905
0.625844525
465.4614348
0.208614842
0.166891873
1.00135124
4.631249485
0.125168905
2.127871385
31.87634781
0.25033781
0.166891873
0.584121557
1.376857955
1.293412018
1.335134987
0.125168905
0.125168905
1.50202686
2.086148417
0.125168905
0.125168905
0.50067562
0.458952652
2.378209195
0.125168905
YLR398C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YLR426W
YLR426W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YCR063W
YCR095W-A
YJL213W
YJL111W
YJL100W
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YJR109C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YDL189W
YDL185W
YDL148C
YDL128W
YDL127W
YDL119C
not_in_intron_or_TIF
YDL070W
not_in_intron_or_TIF
YDL040C
not_in_intron_or_TIF
YDR005C
YDR041W
YDR110W
YDR123C
not_in_intron_or_TIF
not_in_intron_or_TIF
YDR232W
YDR280W
YDR309C
not_in_intron_or_TIF
not_in_intron_or_TIF
not_in_intron_or_TIF
YDR435C
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
chrIV:1490517:chrIV:1519505:chrI:126071:+
chrI:134871:chrI:142343:+
chrVIII:97941:chrVIII:148769:chrVIII:335607:chrVIII:372283:+
chrVIII:400568:chrVIII:442922:+
chrVIII:491866:-
1.084797177
0.125168905
0.166891873
0.083445937
4.297465739
0.166891873
5.674323694
0.75101343
0.292060778
0.25033781
0.083445937
0.083445937
not_in_intron_or_TIF
YDR541C
YAL016W
YAL010C
YAL003W
YHL007C
not_in_intron_or_TIF
YHR112C
YHR134W
YHR151C
YHR169W
not_in_intron_or_TIF
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
cnBP
Table II-S5. sacCer3 coordinates of lariat junction reads

List of BPs detected through LJ reads from the Lariat-seq data. BP positions were determined from the locations of the 3'-most ends of LJ reads and from sequence information, as described in the methods. The 'reads_at_dist_to_bp' field gives the number of LJ reads ending at successive distances from the reported BP; the first entry in the list is the count at distance zero. The final two fields are 1 if the 5'SS or the BP, respectively, was previously annotated in the SGD annotations or by Meyer et al., and 0 otherwise.

strand chrom 5'ss bp reads_at_dist_to_bp bp_seq anno_5'ss anno_bp
- chrII 47146 47074 2,0,2 GTTACTAATATG 1 1
+ chrII 125155 125231 49 ATTACTAACATT 1 1
+ chrII 170677 170768 0,11 AGTACTAACGTT 1 1
- chrIII 178213 177956 1,6 GATACTAACAAC 1 1
+ chrIV 122078 122159 19 CGTACTAACAAC 1 0
- chrIV 215384 215274 19 TTTACTAACGAG 0 0
+ chrIV 230020 230262 122,12,5 TTTACTAACAAA 1 1
- chrIV 239509 239421 4,9,353 AATACTAACAAT 1 1
+ chrIV 331189 331277 23 ATCACTAACCTG 0 0
+ chrIV 399362 399470 100,5,0,0,0,0,3 AATACTAACCAT 1 1
- chrIV 715358 715267 109 TATACTAACAAA 1 1
- chrIX 99385 99153 1,6,33 AATACTAACAAA 1 1
+ chrIX 225875 225994 2 TTTACTAATATT 0 0
+ chrIX 317018 317138 5 TTTACTAACAGG 0 1
- chrIX 348494 348383 1,16 TTTACTAACTAT 1 1
- chrV 166874 166787 4 ATTACTAACATC 1 1
- chrV 248671 248563 2 CATTCTAACATT 0 0
+ chrV 362733 362835 2 AATTCTAACGCA 1 0
- chrV 505180 505049 0,0,2 AGTACTAACCAG 0 0
- chrVI 221414 221303 30,0,0,0,0,0,0,0,0,8 TATACTAACAGA 1 1
- chrVII 73137 73035 10 ATTACTAACAAG 1 1
+ chrVII 249887 250015 35 GTTACTAACAGG 1 1
- chrVII 497458 497390 0,0,5 CTTACTAACTGT 1 1
- chrVII 1061028 1060825 10 GATACTAACTTT 0 0
+ chrVII 1084883 1085006 14 CTTACTAACTGA 1 1
+ chrVIII 129529 129617 2 AATACTAACATA 1 1
- chrVIII 138408 138281 5,65 TGTACTAACAAC 1 1
+ chrVIII 251156 251231 0,1 GAGACTAACTTT 1 1
- chrVIII 505516 505289 15 TTTACTAACAAG 1 1
+ chrX 73797 74179 2 TTTACTAACAAC 1 1
+ chrX 236903 237010 3 ATCACTGACATA 0 0
+ chrX 435228 435309 2 ATTACTAACTAA 1 1
+ chrX 608307 608548 32,3,3 TTTACTAACAAA 1 1
- chrX 703054 702881 5 TTTACTAACGAG 1 1
- chrXI 355283 355153 0,4 CATACTTACAGT 0 0
- chrXI 430596 430520 3,5 GCTACTAACTAT 1 1
- chrXII 327399 327294 5 TAGACTAACGTT 1 1
+ chrXII 382301 382388 6 TTTACTTACTAG 0 0
- chrXII 707892 707769 0,0,2 CGTACTGACATT 0 0
- chrXIII 23658 23500 3,30 TTTACTAACAGT 1 1
+ chrXIII 236592 236788 72,2,1 GCCACTAACAAT 1 1
- chrXIII 243056 242969 0,0,4 AATACTGACAAT 0 0
+ chrXIII 424998 425114 255,10,6 TTTACTAACAAA 1 1
- chrXIII 500151 499899 2,42 CTTACTAACAAA 1 1
- chrXIII 721345 721232 313 AATACTAACAGC 1 1
+ chrXIV 48294 48378 10,0,3 ATTACTAACAAT 1 1
- chrXIV 237531 237419 0,0,9 TTTACTGACCTA 0 0
- chrXIV 380781 380701 11 ATTACTAATCTG 1 0
+ chrXIV 502164 502269 0,0,10 GATACTGACTAT 0 0
+ chrXIV 616067 616229 4,0,4 TCAACTTACTGT 0 0
- chrXV 552874 552738 27,23,51 ATTACTAACTGG 1 1
+ chrXV 780121 780265 604,27,100 TTGACTAACACA 1 1
- chrXVI 76223 76014 6 GTTACTAACATA 1 1
- chrXVI 281503 281386 18 TTGACTAACACA 1 1
- chrXVI 582701 582570 76,0,0,0,0,0,0,0,0,0,0,10,59 TATACTAACAAA 1 1
+ chrXVI 623578 623665 481,39,30 TATACTAACAAG 1 1
+ chrXVI 833694 833783 18 AAAACTAACAAT 1 1
+ chrXVI 943051 943174 6,0,1,0,5 CTTACTAACTGA 1 1

Table II-S6. Novel splice junctions with entropy ≥ 2 bits.

The junction field is the chromosome, first splice site coordinate, second splice site coordinate, and strand, joined by colons. The first splice site is always the one with the lower chromosomal coordinate, so if the strand is "+" the first SS is the 5'SS and if the strand is "-" the first SS is the 3'SS. a_ss1 and a_ss2 are 0 for unannotated splice sites and 1 for annotated splice sites. a_ss_pair is 1 if this pair of splice sites is annotated as a splice junction in the existing SGD gff annotations. K_SS_1, K_SS_2 and K_SS_pair are set to: 0 if not annotated and not found by Kawashima et al.; 1 if annotated by Kawashima et al.; 2 if not annotated but occurring in a Kawashima et al. junction from WT or UPF1 null; and 3 if not annotated but occurring in a Kawashima et al. junction not from WT or UPF1 null. intron_containing_gene is 1 if the junction is from an intron-containing gene and 0 otherwise.

a_SS
1
a_SS
2
a_SS_p
air
stran
d
entro
py
0
0
0
-
2.61
0
1
0
-
4.21
0
1
0
-
3.24
0
1
0
+
3.66
1
0
0
+
3.54
1
0
0
-
4.19
1
0
0
+
3.50
0
1
0
-
2.32
0
0
0
+
3.73
0
1
0
-
2.86
1
0
0
+
2.16
2
0
0
-
3.88
1
0
0
+
2.00
1
0
0
+
4.78
0
1
0
+
2.85
0
1
0
+
2.72
0
1
0
+
3.38
1
0
0
+
4.10
0
0
0
+
3.90
0
0
0
+
3.23
0
1
0
+
2.48
1
0
0
+
2.25
chrII:60188:60697:chrXVI:412255:4129
95:+
chrXIII:23359:23654
:chrXII:28461:28834:
-
0
1
0
-
2.12
0
0
0
+
2.95
1
0
0
-
3.03
0
0
0
-
chrIV:1359968:13603
1
0
0
+
junction
chrXVI:717048:7171
46:chrVII:72983:73137:
chrXIV:494321:4949
73:chrIV:1401795:14021
84:+
chrXVI:406645:4070
19:+
chrXI:618373:618526
:chrV:131775:131880
:+
chrXIV:443658:4441
71:chrIV:601387:601496
:+
chrII:110218:110505
:chrXIV:557609:5576
98:+
chrII:426515:426630
:chrXII:242320:24269
0:+
chrII:393179:393507
:+
chrV:423823:423951
:+
chrX:608481:608581
:+
chrXIV:331451:3318
37:+
chrXVI:138724:1388
68:+
chrVII:436311:43637
4:+
chrXVI:412311:4129
95:+
chrIV:308521:308792
:+
chrIII:107032:10730
4:+
2.25
5ss_long
GTATGCTT
TT
GTACGTTG
CC
GTACGTAA
AA
GTTGGTAC
GT
GTATGTCC
AT
GTTTGTTT
GT
GTATGTTT
GA
GTATGTTA
AA
ATTGACTA
TC
GTAAGTAT
CC
GTATGTAT
TC
GTAAGTCA
GG
GTATGTAC
AC
GTATGTAC
AC
GTAGAGGC
AA
GTAGGTCC
AC
GTATAATC
TG
GTATGTTA
TC
GTATGTAT
TT
GTATGGAG
TT
GCATGCAT
AA
GTATGTGT
CA
GTATGTTA
AA
GTATGGTA
TG
GTATGCGT
TC
GTATGACA
CA
3ss_long
CGCTATAA
AG
CTTAAGAA
AG
CTCCATCT
AG
ATTATCGT
AG
AATTTAAA
AG
TTTTGTAC
AG
AACTTCAA
AG
AACCATCT
AG
AACCAACC
AC
ATTGAGGA
AG
ATTTGCCC
AG
ATTAACTT
AG
GTCCACCA
AG
TTTATCTAA
G
TAATTTTTA
G
TTTTTTGC
AG
TTCTATTTA
G
GTACAGTC
AG
ATCTTTAC
AG
ATTGGAAC
AG
ATTATATC
AG
AGAAGTAC
AG
AACTAGTT
AG
ATTGGAAC
AG
TTTACAAC
AG
GGATATTA
AG
4.10
GTATGTTT
TATCAATA
k_SS
1
k_SS
2
k_SS_p
air
intron_co
ntaining_
gene
0
0
0
0
2
1
2
1
3
1
3
1
2
1
2
1
1
2
0
1
1
2
0
1
1
2
0
1
0
1
0
1
0
0
0
0
3
1
3
1
1
2
0
1
1
2
0
1
1
2
0
1
1
2
0
1
0
1
0
0
0
1
0
1
2
1
2
1
1
3
0
1
0
0
0
1
2
2
2
1
0
1
0
1
1
0
0
1
2
1
2
1
2
2
2
1
1
2
0
1
0
0
0
0
1
2
0
1
73:+
AT
AG
chrXIII:551950:5525
07:+
chrXIV:185490:1855
66:+
chrII:168647:168808
:+
chrXIII:914403:9146
48:chrXIII:424996:4251
24:+
chrIV:216156:216512
:+
chrVII:311227:31152
6:+
chrXVI:271302:2718
96:+
chrVII:365508:36598
5:chrXI:551679:552043
:+
chrXII:242320:24316
2:+
chrVII:555829:55610
9:+
chrVII:497364:49799
9:chrXI:155270:155636
:+
chrVIII:498706:4987
86:chrVII:253184:25324
8:chrIV:417220:417626
:chrIV:629904:630056
:+
chrIV:601314:601470
:+
chrII:691965:692125
:+
chrXII:233641:23388
5:chrXI:437836:437925
:+
chrXII:766185:76624
9:chrXIV:557573:5576
84:+
chrVII:534472:53478
1:chrXII:242320:24282
6:+
chrII:393179:393669
:+
chrII:691965:692133
:+
GTATGTTT
TC
GTATGTAG
GA
GTACGTGT
CT
GTAAGTAA
GT
GTATGTTG
TT
GTATGTAA
CG
GTACTCTT
CC
GTAAGTAT
GA
GTATGTAT
AC
GTATGTTC
GA
GTATGTAC
AC
GTATGTTT
GG
GTAAGTAC
AG
GTATGTTT
AC
GTATGTCA
CC
GTATGAAC
CC
GTATGTAT
TT
GTATGTTC
AA
ATACTACT
TA
GTATGGAA
AC
GTATGTCT
TG
GTATGTTG
TT
GTATGTAT
CT
GTACGTAA
AT
GTATCTAT
AA
GTATGTAC
AC
GTATGTAC
AC
GTATGGAA
AC
GTTGTTAG
AT
GTATGTTC
TG
GTATGTTG
TT
GTATGGAT
GT
GTATGTGC
AA
GTATGTTC
AT
GTAGGTCA
TG
GTATGTTA
CT
GTATGTAG
GA
GTACGTTG
AC
GTATGGAT
GT
GTAAGATC
AG
GTATGTAC
AG
GTATGTCT
GT
TTTTGGTA
AG
ATGCACTT
AG
TTTTTCACA
G
ATGTGGTC
AG
CAAACACA
AG
AGGAATTA
AG
TTTTGTAC
AG
GTTCATCA
AG
TTGACCCC
AG
CTACCAAC
AG
TGGTCGAC
AG
CCGGCTTT
AG
TATAAAAT
AG
AAGTACGA
AG
GAAACAAC
AG
CTTATTTTA
G
ATTGTTAT
AG
TTTTGTCC
AG
CACTGAGT
AC
AGTTTAGA
AG
ACCGCCCC
AG
CTTTAGAA
AG
CATATATA
AG
AACAATGC
AG
TTTATCATA
G
TCTCTTCC
AG
AGTATCCA
AG
AGGTAAGC
AG
TCATGTAT
AG
ACGCAAAC
AG
CCTTGATC
AG
GCAACAGC
AG
ACTATCAA
AG
AAATTAAC
AG
AACTATCT
AG
GACCATCA
AG
AGTGCGGT
AG
GTTAATTT
AG
CCAGCAAC
AG
AGTTCTTA
AG
TCCTATGT
AG
TAGTTTAA
AG
chrX:74112:74204:+
chrII:653367:653524
:+
chrIV:399360:399495
:+
chrI:128520:129021:
chrXIV:616065:6164
12:+
chrIV:733684:733775
:chrXII:281426:28162
8:chrII:142752:142846
:chrXIV:185490:1855
78:+
chrIV:306804:307073
:chrI:128523:129021:
chrXIII:557827:5580
01:chrXVI:729352:7294
81:chrVII:62130:62196:
+
1
0
0
+
4.13
1
0
0
+
3.78
0
1
0
+
2.59
0
0
0
-
2.16
1
0
0
+
4.52
1
0
0
+
2.92
0
1
0
+
3.45
0
0
0
+
2.42
0
1
0
-
3.19
1
0
0
+
3.03
1
0
0
+
2.48
1
0
0
+
2.99
1
1
0
-
2.32
1
0
0
+
4.46
0
1
0
-
3.90
0
0
0
-
4.80
0
0
0
-
3.29
1
0
0
+
2.66
0
0
0
+
3.43
0
0
0
+
2.46
0
0
0
-
3.45
1
0
0
+
2.90
0
1
0
-
3.20
0
1
0
+
2.75
1
0
0
-
2.94
1
0
0
+
2.29
1
0
0
+
2.52
0
0
0
+
3.87
0
1
0
+
4.44
1
0
0
+
3.61
1
0
0
+
2.67
0
0
0
-
3.96
0
0
0
+
2.97
0
1
0
-
2.97
0
0
0
-
2.83
0
1
0
-
4.03
1
0
0
+
3.96
0
0
0
-
3.46
0
0
0
-
3.11
0
0
0
-
2.64
0
1
0
-
2.72
1
0
0
+
3.86
1
2
0
1
1
2
0
1
0
1
0
1
0
0
0
0
1
2
0
1
1
0
0
0
3
1
3
1
0
0
0
0
3
1
3
1
1
2
0
1
1
3
0
1
1
2
0
1
1
1
0
1
1
2
0
1
2
1
2
1
0
0
0
0
0
0
0
0
1
3
0
1
0
0
0
0
0
0
0
0
0
0
0
0
1
2
0
1
3
1
3
1
2
1
2
1
1
2
0
0
1
2
0
1
1
0
0
1
0
0
0
0
2
1
2
1
1
3
0
1
1
2
0
1
0
0
0
0
0
0
0
0
3
1
3
1
0
0
0
0
2
1
2
1
1
2
0
1
0
0
0
1
0
0
0
0
0
0
0
0
2
1
2
1
1
2
0
1
chrIV:307015:307765
:chrIV:1406991:14072
31:+
chrX:649513:649657
:chrXII:982457:98253
8:chrXVI:673747:6743
76:+
chrXII:856568:85705
7:+
chrXV:867396:86758
6:+
chrVI:223439:223727
:-
4.31
GTATGTTA
AA
GTATGTTA
CG
GTAAGTTA
TG
GTATGTAT
GA
GTATGTCT
GA
GCATGGTA
TG
GTATGAAA
TA
GCAGGTAG
CC
GTACGTTA
AT
GTACGTAT
TA
GTATAACA
TG
ATAACATG
AT
GTACGTAT
AA
GTATGTTA
TT
GTATGCTT
TT
GTATGTTG
AT
GTGTGTTA
GT
GTTTGTGT
AC
GTATGTTC
AT
GTAAGTAT
TT
GTATGTCT
GT
ATGGATTT
TT
GTATAAAA
AA
GTAAGTAC
AC
GTTAAAAA
GC
GTATGCAG
AA
GTATGTTA
AA
GTATGTCA
AG
GTATGGTA
CC
GTATGAGA
AT
GTACTGCA
AT
GTATGTTT
TC
GTATGTAT
AT
GTTGGTAG
CA
GTACGTAT
AA
GTATGTAA
CG
GTATGTGG
AC
GTATGTTG
TT
GTATGTAC
AA
GTAATGGT
AA
GCATGTTT
AT
GTATGAAT
AT
CTTACGAC
AG
ACTCACTT
AG
ATACAAAA
AG
GTTTGAGT
AG
TGAACAAA
AG
CCTTATTTA
G
TTTTATACA
G
TCTTCTCC
AG
TTTGAAGA
AG
TTTTTAGG
AG
ATTTCAAC
AG
TATCGTTT
AC
GACGTTGC
AG
AGGAAAAA
AG
TTCCACTC
AG
ATATAGAA
AG
TATAATCA
AG
ATTTCGAC
AG
TTGAAAAA
AG
GCCAGCAC
AG
ATTTTTGA
AG
CTGTACGG
AC
ATTATTGC
AG
TTTGAGAA
AG
GTTGTTGC
AG
TTGGTTAT
AG
CGATATTG
AG
CAGGCTAA
AG
TGTCGTAC
AG
ATTTAAAC
AG
ATTAAAAT
AG
TGGTAAGA
AG
TGCGTTCA
AG
TTTTTCACA
G
GTTTTTATA
G
GATTTCAT
AG
CAAAACAT
AG
AGAATCAA
AG
TTGTCAAA
AG
TTTATTATA
G
CTTTTTTTA
G
ATTTTGAT
AG
3.10
GCAGGTAA
TGTAGTAT
0
1
0
-
2.62
0
0
0
+
2.16
0
0
0
-
2.24
0
0
0
-
2.78
0
0
0
+
2.50
0
1
0
+
3.18
0
1
0
+
3.88
1
0
0
-
3.05
chrVI:64352:64920:chrII:604513:604930
:+
chrVII:439096:43932
3:+
chrVII:439098:43931
3:+
chrXIV:62360:62923:
chrXV:505936:50599
5:+
chrII:170619:170804
:+
chrVIII:505242:5055
16:chrII:342697:342789
:+
chrXIV:272386:2736
01:chrII:170675:170757
:+
chrXIV:728553:7289
21:+
chrXIII:559781:5601
57:+
chrII:565749:566935
:+
chrXII:898547:89864
5:+
chrVII:439380:43947
9:+
chrXII:457115:46625
4:chrIV:340809:341183
:-
0
1
0
-
3.28
1
0
0
+
2.55
0
1
0
+
4.16
0
0
0
+
3.80
0
1
0
-
3.59
1
0
0
+
3.34
0
1
0
+
4.07
0
1
0
-
3.08
0
0
0
+
3.63
0
0
0
-
2.28
1
0
0
+
3.87
0
0
0
+
3.35
0
0
0
+
5.14
0
0
0
+
2.25
0
1
0
+
3.51
0
0
0
+
4.35
0
0
0
-
2.92
1
0
0
-
3.82
chrII:60207:60697:chrXI:109576:109890
:+
chrXV:423656:42373
4:chrIX:317016:317171
:+
chrV:362911:363092
:+
chrXIII:551950:5525
10:+
chrIII:101604:10170
0:chrII:168553:168808
:+
0
1
0
-
2.65
1
0
0
+
2.82
0
0
0
-
2.85
0
1
0
+
4.39
0
1
0
+
2.37
1
0
0
+
2.16
0
1
0
-
2.59
0
1
0
+
3.45
chrXI:93365:93465:chrIV:216156:216521
:+
chrXII:855876:85642
7:+
chrXI:437836:437913
:+
chrXVI:795028:7953
77:+
chrXIV:494523:4946
32:chrIV:491687:491898
:+
chrXV:778858:77925
2:-
0
1
0
-
5.06
1
0
0
+
3.08
1
0
0
+
4.37
1
0
0
+
2.99
1
0
0
+
2.66
1
0
0
-
2.12
0
1
0
+
4.33
1
0
0
-
chrX:469182:469405
1
0
0
-
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
1
2
1
2
1
2
1
1
3
0
0
2
1
2
1
1
0
0
1
2
1
2
1
2
2
2
1
2
1
2
1
1
2
0
1
0
1
0
1
2
1
2
1
0
0
0
0
0
0
0
0
1
2
0
1
0
0
0
0
0
0
0
0
0
0
0
0
2
1
2
0
2
2
2
1
0
0
0
0
1
0
0
0
3
1
3
1
1
2
0
1
0
0
0
0
2
1
2
1
3
1
3
0
1
0
0
1
3
1
3
1
3
1
3
1
2
1
2
0
1
0
0
0
1
0
0
0
1
2
0
1
1
2
0
1
1
0
0
1
2
1
2
1
1
3
0
1
1
2
2
1
:-
AC
AG
chrVIII:148115:1485
08:chrVIII:103616:1038
56:chrVII:383484:38358
0:+
chrXVI:795133:7953
94:+
chrXV:349496:34959
8:chrXIV:144846:1452
54:chrXII:242320:24277
5:+
chrXI:625900:625986
:+
chrVII:62130:62183:
+
chrVII:920663:92112
9:+
chrXII:522889:52302
8:+
chrVII:365525:36596
9:chrXIV:62369:62923:
chrIV:122076:122167
:+
chrIV:306804:307765
:chrXVI:303560:3036
24:+
chrXV:505936:50625
2:+
chrIV:655202:655272
:+
chrXII:987140:98734
9:+
chrIV:122076:122194
:+
chrXVI:243488:2440
25:chrII:168599:168808
:+
chrXV:373898:37412
2:+
chrXIII:666921:6670
17:-
GTATGCGT
TT
GTAAGGTG
AG
GTATGTAT
GA
GTAAGGGA
GA
GTATGTTT
TT
GTATGTTT
AT
GTATGTAC
AC
GTAAGTAG
AA
GTATGTCT
GT
GTATGTTA
TA
GTATGCCT
GA
GTCTATTTT
A
GTACGTAT
AA
GTATGTTG
AA
GTATGTTA
AA
ATTGGTTT
GC
GTATGTTA
TT
GTATGCTT
CC
GTATGTAA
AG
GTATGTTG
AA
GTATGTTT
CT
GTACGAAT
TG
GTAAGCAT
TC
GTATGTGT
GA
TTCATTAC
AG
GTAACTTG
AG
ATGGTAGT
AG
TTTAAAAC
AG
TTACTTTTA
G
AAGCTATC
AG
TGGCTGCT
AG
AGAACTAA
AG
CGATCAAA
AG
GTTCACCA
AG
TTTTAAATA
G
ATTATTAC
AG
TCAATTAG
AG
AACAACTA
AG
GTTAATTT
AG
TGCTTCTG
AC
CATTTACA
AG
TGATAATC
AG
AATGGCAT
AG
AGTATATA
AG
TACTTAGA
AG
TTTTTCACA
G
ATCCTATA
AG
AGGCTAAC
AG
1
0
0
-
4.02
0
0
0
-
3.11
0
0
0
+
3.31
0
1
0
+
3.90
0
0
0
-
2.73
0
1
0
-
3.20
1
0
0
+
3.33
1
0
0
+
3.18
1
0
0
+
3.91
1
0
0
+
2.65
0
1
0
+
3.42
1
0
0
-
2.82
0
1
0
-
2.50
1
0
0
+
2.11
0
1
0
-
3.76
0
0
0
+
2.00
1
0
0
+
2.34
0
0
0
+
3.06
1
0
0
+
3.45
1
0
0
+
2.00
0
0
0
-
3.55
0
1
0
+
2.69
0
0
0
+
2.48
0
1
0
-
2.04
1
2
0
1
0
0
0
0
0
0
0
0
2
1
2
1
0
0
0
0
2
1
2
1
1
0
0
1
1
2
0
1
1
2
0
1
1
2
0
1
3
1
3
1
1
2
0
1
0
1
0
1
1
0
0
0
0
1
0
1
0
0
0
0
1
0
0
1
0
0
0
0
1
2
0
1
1
0
0
0
0
0
0
0
0
1
0
1
0
0
0
0
2
1
2
1
Appendix III: BP Identification in Metazoans

Abstract

Pre-mRNA splicing removes introns from transcripts to produce mature mRNAs and is required to generate the full complement of protein diversity observed in cells. It has been shown in fly and human that introns longer than 10 Kb can be removed by sequential splicing reactions, a process known as recursive splicing. We have discovered the first endogenous case of recursive splicing in a short intron, only 383 nt long. By performing Branch-seq to locate fly branch points (BPs), we observed that the second intron of RPL13A is spliced out via two lariats. These lariats have a recursive splice site motif located between them, as expected for a recursively spliced intron. The recursive splicing event places the two snoRNAs in the RPL13A second intron into distinct lariats, perhaps impacting the processing of these snoRNAs.

Introduction

Recursive splicing is the process by which an intron is spliced out in two or more separate segments. It involves a 5' splice site (5'SS) that splices to a 3' splice site (3'SS) juxtaposed to another 5'SS located inside the intron. The first splicing reaction regenerates a 5'SS so that another splicing reaction can occur. Recursive splicing was first discovered in the Ubx gene in Drosophila melanogaster (Hatton, Subramaniam, & Lopez, 1998) and has subsequently been detected in additional fly genes and in human genes (Burnette, Miyamoto-Sato, Schaub, Conklin, & Lopez, 2005; Duff et al., 2015; Kelly et al., 2015; Sibley et al., 2015) through a combination of computational predictions and analyses of experimental data.

We performed Branch-seq to study the impact of BP regulation on the outcome of splicing decisions in fly. Drosophila melanogaster introns are on average 81 nt long (Lim & Burge, 2001) and Branch-seq works best on lariat loops less than 100 nt in length.
This combination made fly an ideal organism for BP sequencing using Branch-seq. An additional advantage of Branch-seq is that, unlike computational BP prediction methods, which rely on the proximity of BP motifs to annotated 3'SS, its untargeted experimental approach can locate BPs at arbitrary intronic locations. Here we report the first example of recursive splicing inside a short intron.

Methods

Cell culture

S2R+ cells were a kind gift from Dr. Jessica Hurt. Cells were grown at room temperature in the dark in Schneider's Drosophila Medium (modified with L-Glutamine, VWR cat# CA12001-982) with 10% FBS (heat inactivated) and Penicillin-Streptomycin.

ldbr RNAi

Cells were grown for 3-5 days after application of dsRNA against debranching enzyme (ldbr). The dsRNA constructs DRSC10933 and DRSC36280 were obtained from the Drosophila RNAi Screening Center (DRSC) and amplified according to their protocols (http://www.flyrnai.org/DRSC-PRO.html).

ldbr qPCR

RNA was isolated from S2R+ cells using Trizol (Invitrogen). Reverse transcription was performed with random hexamer primers and PCR was performed with the primers below:

actin_L: TCTGGGTATGATCTGGACGA
actin_R: CAGACCATCCTTGAACGACA
ldbr_L: ACGACACCATAGAGGGCATC
ldbr_R: CCACTGTAGTATTTGTAAAAGGAGCA
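The qPCR measurement above uses actin as a reference and ldbr as the target. The text does not state how knockdown efficiency was computed from the qPCR data, so the following is only a minimal sketch of one common approach, the delta-delta-Ct method, assuming perfect amplification efficiency (a doubling per cycle). The function names and the Ct values in the usage example are hypothetical and are not taken from the thesis.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_wt, ct_ref_wt,
                        efficiency=2.0):
    """Target expression in knockdown relative to WT (delta-delta-Ct).

    Assumption: both primer pairs amplify with the same per-cycle
    efficiency (2.0 means perfect doubling each cycle).
    """
    # Normalize the target (e.g. ldbr) to the reference (e.g. actin)
    # within each sample.
    delta_kd = ct_target_kd - ct_ref_kd
    delta_wt = ct_target_wt - ct_ref_wt
    # Compare the normalized knockdown value to the normalized WT value.
    delta_delta_ct = delta_kd - delta_wt
    return efficiency ** (-delta_delta_ct)


def percent_knockdown(relative_level):
    """Convert a relative expression level (KD / WT) to percent knockdown."""
    return (1.0 - relative_level) * 100.0


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    rel = relative_expression(ct_target_kd=26.0, ct_ref_kd=18.0,
                              ct_target_wt=24.0, ct_ref_wt=18.0)
    print(f"relative ldbr level: {rel:.2f}")
    print(f"percent knockdown: {percent_knockdown(rel):.0f}%")
```

With this calculation, a knockdown of about 90% corresponds to a delta-delta-Ct of roughly 3.3 cycles under the perfect-doubling assumption.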
Branch-seq

Branch-seq was performed as in Chapter 2 with the following modifications. For the 2D gel, D1 was 6% and D2 was 20%. The arc was isolated as one sample (not split into top, middle, and bottom). In the first Branch-seq experiment, the WT arc, WT “tRNA”, ldbr knockdown (KD) arc, and KD dot were sequenced. In the second Branch-seq experiment, the KD arc was sequenced and a band corresponding to smaller material was cut from the RT gel to try to increase the percentage of reads mapping to the fly genome.

2D Gels

Worm 2D gels were run under the same conditions as the fly and yeast (Chapter 2) 2D gels. Mouse 2D gel conditions: D1 was 6% and run at 100 V for 2 hr; D2 was 10% and run at 200 V for 90 min. Mouse D1 and D2 gels were small, pre-cast, denaturing PAGE gels from Invitrogen.

Sequencing

Branch-seq libraries were sequenced (150 nt by 150 nt paired-end reads) on the MiSeq in the MIT Bio Micro Center and on the Reddien Lab MiSeq. Reads were mapped as in Chapter 2 using Bowtie to dm3 (Langmead et al., 2009).

Results

Knockdown of ldbr does not result in a noticeable accumulation of lariat RNA

Knockdown of ldbr by RNAi in Drosophila S2R+ cells did not result in a noticeable accumulation of 2D arc RNA (Fig. III-1A) despite a 90% knockdown of ldbr RNA level (Fig. III-1B). Similarly, an arc was not visible in RNA from worms that were null for ldbr (Fig. III-S1A and see supplemental note). This was surprising since we observed a striking difference in 2D arc intensity in WT versus dbr1∆ yeast RNA (Fig. 2-1A). However, yeast heterozygous for DBR1 do not accumulate lariats (Chapman & Boeke, 1991), so it is possible that even low levels of debranchase protein are sufficient to debranch all available lariats. Additionally, if ldbr protein has a long half-life, ldbr RNA levels would not be a good proxy for the amount of debranching activity in the sample.

Figure III-1: ldbr knockdown by RNAi in S2R+ cells does not cause a noticeable accumulation of lariat RNA. (A) 2D gels of RNA from WT cells (left) and cells knocked down for ldbr (right). (B) qPCR quantification of ldbr knockdown efficiency.

Though we did not observe an abundant accumulation of lariat RNA in the RNAi-treated S2R+ cell RNA, we did observe a faint arc in both WT and knockdown 2D gels (Fig. III-1A). Using that arc material, we proceeded with Branch-seq library preparation. Branch-seq was performed twice in an attempt to increase the quality of the data (see Methods).

Fly Branch-seq reads largely do not map to the fly genome

Unfortunately, a very low fraction of the total reads mapped uniquely to the fly genome. In the second Branch-seq sample, an order of magnitude more reads mapped uniquely to the fly genome than in the first sample, but this fraction was still only 3%. Mapping only the first 30 nt (bases 1-30) or second 30 nt (bases 31-60) of the reads did not improve the mapping statistics (Fig. III-2A). The reads did not appear to come from phiX (spike-in control for sequencing), E. coli, mouse, or human (Fig. III-2A). BLASTing (Altschul, Gish, Miller, Myers, & Lipman, 1990) the reads suggested homology to snRNAs, but turned up species such as dolphin, which we do not work with in the lab.

In all, we obtained 96,185 reads that did map to the fly genome. Many of these reads map to introns and correctly identify annotated 5'SS. Surprisingly, the BP end reads are often very close to annotated 3'SS. In the case of CG9796, the BP end reads end at a typical BP motif (Fig. III-2B), but in most cases the BP reads are located a few nucleotides from the 3'SS, making it appear that the tail of the lariat was not digested (Fig. III-2C). Additionally, Branch-seq seems to have sequenced some snoRNAs (Fig. III-3A, top). The second sequencing experiment selected for shorter RNA fragments than the first experiment, which explains the differences observed between the first and second sequencing runs.

Fly Branch-seq reads identify the first recursive splice site in a short intron

Surprisingly, the Branch-seq reads identify a recursive splice site in the second intron of RPL13A, which is only 383 nucleotides long (Fig. III-3A).
The recursive splice site contains the AG|GT sequence typical of recursive splice sites, where the “|” represents the location of the 3'SS to 5'SS boundary. As is the case in most of the other introns observed, the BP reads map very close to the 3'SS. Interestingly, the recursive splice site is located between two snoRNAs in the RPL13A intron. The UCSC Genome Browser depicts this intron as having an alternative 3'SS (Fig. III-3B), which coincides with the location of the recursive splice site.

Figure III-2: Rare Branch-seq reads that map to the fly genome often map to introns. (A) Read mapping statistics for the first (left) and second (right) Branch-seq experiments. Reads that map to only one genomic location are blue. Reads for which the first 30 nt mapped uniquely to the genome were used for downstream analyses. (B) Reads that map to CG9796 identify the annotated 5'SS and a BP with a typical BP motif (boxed). Note that CG9796 is on the reverse strand. 5'SS reads are in pink and BP reads in blue. A single nucleotide polymorphism can be observed as the solid vertical blue stripe in the 5'SS reads. Dotted lines show 5'SS and BP read ends. (C) Reads that map to introns often map to the annotated 5'SS and, with lower accuracy, to the annotated 3'SS (dotted line for 3' end reads). Example genes shown: CG17836, B52, and InR. All reads are 30 nt and serve as scale bars.

Figure III-3: Branch-seq identifies a recursive splice site in a short intron. (A) The first Branch-seq experiment identified the AG|GT ratchet site in the RPL13A intron. As in Figure III-2C, the BP reads are very close to the 3'SS. The second experiment sequenced the snoRNAs inside the intron. (B) UCSC Genome Browser screenshot of the annotated gene structure of RPL13A depicts an alternative 3'SS in the ratchet intron (Kent et al., 2002) http://genome.ucsc.edu.
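The AG|GT motif described in the Results can be located in a sequence with a simple string scan. The sketch below is illustrative only: the function name and the toy sequence are hypothetical (the sequence is not the RPL13A intron), and a bare AG|GT match is assumed to be merely a candidate site, since this four-nucleotide pattern also occurs frequently by chance.

```python
def find_recursive_site_candidates(intron, left="AG", right="GT"):
    """Return 0-based positions of the '|' boundary for each AG|GT match.

    The returned position is the index of the first nucleotide of the
    regenerated 5'SS (the 'G' of 'GT'), i.e. the 3'SS-to-5'SS boundary.
    """
    # Accept lower-case input and RNA (U) as well as DNA (T) sequences.
    seq = intron.upper().replace("U", "T")
    motif = left + right
    boundaries = []
    start = seq.find(motif)
    while start != -1:
        boundaries.append(start + len(left))
        # Step by one so that overlapping matches are not skipped.
        start = seq.find(motif, start + 1)
    return boundaries


if __name__ == "__main__":
    toy_intron = "GTAAGTCCAGGTAAGTTTTTAG"  # made-up sequence
    for pos in find_recursive_site_candidates(toy_intron):
        print(f"candidate boundary at index {pos}: "
              f"{toy_intron[:pos]}|{toy_intron[pos:]}")
```

A candidate found this way would still need supporting evidence, such as lariat reads on either side of the boundary, before being called a recursive splice site.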
Discussion

Our discovery of recursive splicing in RPL13A is the first report of recursive splicing in an intron <10 Kb in length. Studies to date have only computationally predicted and subsequently found experimental evidence of recursive splice sites in introns ≥10 Kb. The recursive splicing in RPL13A might be used to regulate the levels of the two snoRNAs inside the intron, since snoRNA placement inside introns, specifically snoRNA to BP distance, is known to be important for snoRNA processing. Further experiments are needed to determine whether these snoRNAs are located in the lariat loops or lariat tails. Additionally, the recursive splicing may dictate 3'SS choice, but further experiments are needed to determine whether the entire 383 nt intron can be removed by splicing that produces a single lariat.

To find additional fly BPs using Branch-seq, it would be advantageous to produce higher quality Branch-seq data. First, to increase the number of reads mapping to the fly genome, it would be preferable to start with more lariat RNA. Isolation of nuclear RNA should increase the ratio of lariat RNA to linear RNA, as lariats are presumably more abundant in the nucleus than in the cytoplasm. Additionally, deleting ldbr (and any homologs of ldbr), rather than using RNAi, should increase the half-life of lariats, allowing more lariat RNA to be captured. Second, since the BP reads in the existing data are located in very close proximity to the 3'SS, RNase R should be used to digest the lariat tails. RNase R treatment may also remove contaminating snoRNAs from the Branch-seq samples.

Supplemental note

Debranching enzyme depletion in worms and mouse does not result in an arc on 2D gel

WT and debranching enzyme null worm RNA did not show any striking differences when run on a 2D gel (Fig. III-S1A). This observation leads to two main possibilities.
Either lariat RNA did not accumulate in the deletion strain, or lariats did accumulate but did not migrate differently from linear RNA in the 2D gel because worm introns are very small, averaging 60 nt (Lim & Burge, 2001). Similarly, despite good knockdown of debranching enzyme using shRNAs in mouse embryonic stem cells (mESCs), no lariat arc was easily discernible (Fig. III-S1B). However, the mouse RNA should be run on larger, higher-percentage 2D gels because mouse introns are much larger than fly, worm, or yeast introns.

Figure III-S1: 2D gels for worm and mESC RNA. (A) 2D gels on worm total RNA. Gel running conditions are the same as for fly. (B) Knockdown of DBR1 by shRNAs and 2D gels on mESC total RNA samples. D1: 6%, D2: 10%.

Acknowledgments

We thank Anna Corrionero Saiz for the worm RNA and Paul Boutz for the mESC RNA and knockdown quantification. We thank Jessica Hurt, Kerry Kelley, Karen Traverse, Mary-Lou Pardue, Frank Mason, Ky Lowenhaupt, Iva Kronja, and Jessica Von Stetina for training and supplies related to fly cell culture. We thank the MIT Bio Micro Center and the Reddien Lab at MIT for MiSeq sequencing.

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410. doi:10.1016/S0022-2836(05)80360-2

Burnette, J. M., Miyamoto-Sato, E., Schaub, M. A., Conklin, J., & Lopez, A. J. (2005). Subdivision of large introns in Drosophila by recursive splicing at nonexonic elements. Genetics, 170(2), 661–674. doi:10.1534/genetics.104.039701

Chapman, K. B., & Boeke, J. D. (1991). Isolation and characterization of the gene encoding yeast debranching enzyme. Cell, 65(3), 483–492. doi:10.1016/0092-8674(91)90466-C

Duff, M. O., Olson, S., Wei, X., Garrett, S. C., Osman, A., Bolisetty, M., et al. (2015). Genome-wide identification of zero nucleotide recursive splicing in Drosophila. Nature, 521(7552), 376–379. doi:10.1038/nature14475

Hatton, A. R., Subramaniam, V., & Lopez, A. J. (1998). Generation of alternative Ultrabithorax isoforms and stepwise removal of a large intron by resplicing at exon-exon junctions. Molecular Cell, 2(6), 787–796.

Kelly, S., Georgomanolis, T., Zirkel, A., Diermeier, S., O'Reilly, D., Murphy, S., et al. (2015). Splicing of many human genes involves sites embedded within introns. Nucleic Acids Research. doi:10.1093/nar/gkv386

Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., & Haussler, D. (2002). The human genome browser at UCSC. Genome Research, 12(6), 996–1006. doi:10.1101/gr.229102

Langmead, B., Trapnell, C., Pop, M., & Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), R25. doi:10.1186/gb-2009-10-3-r25

Lim, L. P., & Burge, C. B. (2001). A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences, 98(20), 11193–11198. doi:10.1073/pnas.201407298

Sibley, C. R., Emmett, W., Blazquez, L., Faro, A., Haberman, N., Briese, M., et al. (2015). Recursive splicing in long vertebrate genes. Nature, 521(7552), 371–375. doi:10.1038/nature14466