Investigating the competing endogenous RNA hypothesis Genome-wide and in Single Cells

by

Apratim Sahay

B.S. in Physics and Mathematics, University of Chicago (2008)

Submitted to the Department of Physics in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

© Massachusetts Institute of Technology 2015. All rights reserved.

Author: Department of Physics, May 22nd, 2015 (signature redacted)

Certified by: Alexander van Oudenaarden, MIT Professor of Physics and Professor of Biology; Director, Hubrecht Institute for Developmental Biology; Thesis Supervisor (signature redacted)

Certified by: Jeff Gore, Latham Family Career Development Assistant Professor of Physics; Thesis Supervisor (signature redacted)

Accepted by: Professor Nergis Mavalvala, Associate Department Head of Physics (signature redacted)

Investigating the competing endogenous RNA hypothesis Genome-wide and in Single Cells

by Apratim Sahay

Submitted to the Department of Physics on May 22nd, 2015, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Abstract

The observation that microRNAs (miRNAs), through a titration mechanism, can couple the expression of their common targets (competing endogenous RNAs, or ceRNAs) has prompted a general "ceRNA hypothesis": that RNAs can regulate each other indirectly through global RNA-miRNA-RNA networks. These ceRNAs are said to "crosstalk" with each other by competing for common miRNAs.
Although many individual ceRNAs have been found, fundamental questions about both the magnitude and the generality of the crosstalk effect remain. In our study we combine RNA sequencing and single-molecule FISH (smFISH) approaches both to measure the magnitude of the crosstalk effect genome-wide, by perturbing three known ceRNAs (Pten, Vapa, Cnot6l), and to identify mechanisms by which the crosstalk effect acts. We identify hundreds of putative ceRNAs and dissect the contributions of individual miRNAs in transmitting crosstalk. We demonstrate that while the crosstalk effect is pervasive, it nevertheless remains bounded by the size of the perturbation. Furthermore, we show that both the number and the affinity of the miRNA binding sites shared between targets are crucial in determining crosstalk strength. Using the smFISH data, we examined the single-cell gene expression profiles of pairs of ceRNAs and found that ceRNA gene expression is correlated only in the presence of active miRNAs. Additionally, on inspecting the intracellular localization of RNA molecules, we found a miRNA-dependent colocalization of ceRNAs, suggesting a new signature of crosstalk between ceRNAs that extends and modifies the original hypothesis.

Thesis Supervisor: Alexander van Oudenaarden
Title: MIT Professor of Physics and Professor of Biology; Director, Hubrecht Institute for Developmental Biology

Thesis Supervisor: Jeff Gore
Title: Latham Family Career Development Assistant Professor of Physics

This work is dedicated to my grandparents Gaur Priya Devi & Krishnanand Sahay, Veena Srivastava & Shailendra Nath Srivastava, who instilled in me their love for the life of the mind and the desire to share its fruits with others.

Acknowledgements

This thesis would not have been possible without the help, encouragement and support of many people to whom I owe a debt of gratitude.
First and foremost, Alexander van Oudenaarden, my thesis advisor, who welcomed me into his lab and gave me great freedom and support throughout my PhD. Alexander's grasp of experimental biophysics is truly broad and deep, as I saw while he led the lab through the smFISH era, the RNA sequencing era and the single-cell sequencing era. Not only was he an inspiring scientist, but he also created a fantastic group of enormously talented students and post-docs in building 68 that buzzed with stimulating ideas. After introducing me to microRNAs and suggesting an experimental plan of attack, he stepped back to let me find my own way. Always there to offer a suggestion, to share in excitement or to help think through a problem, he has been a great mentor. After his move to Utrecht, he offered me numerous opportunities to visit him there and work with another set of fantastic people. Finally, I am thankful for the opportunity, as a graduate student, to be able to make mistakes. I will be forever grateful for Alexander's limitless patience throughout this process.

I sincerely thank my thesis committee members, Jeff Gore, Jeremy England and Mehran Kardar, for their support and advice throughout my graduate years; Jeff in particular for his blend of unflappable enthusiasm and guidance during some of the more trying phases of research.

Next, my wonderful collaborators: Joern Schmiedel, Yannan Zheng, Sandy Klemm and Dominic Gruen. Joern came to MIT a year into my thesis project and has helped shape and sharpen my ideas tremendously. His enthusiasm and dogged persistence in solving problems were a great boost whenever I was stuck in dark alleys. Yannan and I started and finished our PhDs together and have been through all the ups and downs of graduate student life together. She taught me a lot about microRNA biology and was an invaluable source of experimental guidance, especially in cell culturing and cloning.
Dominic helped set up the RNA sequencing pipeline in Utrecht and generously shared his expertise in microRNA bioinformatic analysis. Sandy was a fantastic friend and a critical sounding board for hypotheses, and taught me the intricacies of live-cell FACS sorting.

My graduate life would not have been half as much fun without the tremendous people at the AvO lab: Dong Hyun Kim, for mentoring me in worm biology when I first came to the lab and training me in the dark arts of FISH; he and Christoph Engert were vital founts of friendship, mentorship and cheer. To the postdocs, Stefan, Jeroen, Magda, Nick, Nikolai, Lenny, Shalev, Philipp, Anna, Gregor, Arjun and Scott, who took the time to provide critical advice on experiments, research, and life. To the amazing graduate students in the lab office, who shared all the joys and frustrations of research and made the AvO lab fun and exciting: Ruizhen, Miaoqing, Bernardo, Clinton, Ni, Annalisa, Dylan, Kay, Juan, Shankar and Hyun. Lastly, Monica Wolf, Annemiek van Rooijen, Crystal, Cathy and Katie, who have meticulously taken care of any and all administrative issues that have cropped up.

During my time at MIT, I've been lucky to have some wonderful roommates and friends, Michelle, Andrew, Andrew Stecker, Arghavan and David, who have been fantastic at helping me keep a balanced life. To my friends on the squash courts, who have offered huge support and camaraderie over the years (Najib, Ann, Pam, Jan, Frans, Christopher, Christoph, Justin and Mehmood): thank you for helping me maintain my sanity. Finally, I would like to thank my parents Aparajita and Avinash, for being so amazingly supportive throughout my entire academic career and life in general, and for providing countless opportunities to me; my sisters Ananya and Apoorva, for their love and feigned excitement at my research; my cousins Sunny, Pranay and Abhilash, for their encouragement and shared geekdom; and my extended family in India, for their tremendous support over the years.
Lastly, my wife Liz, without whom I would never have been introduced to the world of biology, and without whose unwavering support none of this would have happened. Your intelligence, encouragement and limitless love make all things possible.

Table of Contents

Acknowledgements
List of Figures
1 Introduction
1.1 MicroRNAs: discovery, biogenesis, target binding and competition
1.1.1 Discovery of miRNA Regulation
1.1.2 Biogenesis of miRNAs
1.1.3 miRNAs: target binding and competition
1.2 ceRNAs: Discovery
1.2.1 Different types of endogenous ceRNAs
1.2.2 3'UTRs as ceRNAs
1.2.3 Circular RNAs
1.2.4 Pseudogenes as ceRNAs
1.2.5 Long non-coding RNAs (lncRNAs) as ceRNAs
1.3 Modulators of crosstalk activity
1.3.1 Abundance of miRNA binding sites and miRNA concentration
1.3.2 miRNA binding affinity
1.3.3 MRE accessibility and local concentrations
1.3.4 Post-transcriptional network effects
1.4 Summary and Outline
2 Assessment of the ceRNA hypothesis with integrated genome-wide measurements reveals bounded yet pervasive crosstalk activity
2.1 Results
2.1.1 ODE biochemical model of crosstalk predicts that crosstalk strength should be bounded by 1
2.1.2 Quantification of crosstalk following siRNA knockdown of sender
2.1.3 Pervasive yet bounded mRNA crosstalk upon siRNA knockdown
2.1.4 Crosstalk strength correlates with the number of shared binding sites
2.1.5 miRNAs hierarchically contribute to transmitting crosstalk
2.1.6 Pten miRNAs have the greatest crosstalk power due to high [miRNA]:target abundance ratios
2.1.7 Transfecting Pten UTR as a sponge de-represses putative ceRNAs in a dose-dependent and miRNA-dependent manner
2.2 Discussion and conclusions
2.3 Methods and Materials
2.3.1 Cell culture and siRNA transfection
2.3.2 RNA extraction
2.3.3 RT-PCR
2.3.4 Reporter plasmid construction
2.3.5 Transient transfection of plasmid
2.3.6 FACS sorting
2.3.7 RNA sequencing
2.3.8 RNA-Seq data analysis
2.3.9 miRNA-mRNA target prediction
2.3.10 miRNA expression data sources
2.3.11 Target abundance and sequestration estimation
2.3.12 GO term analysis
2.3.13 TMM (Trimmed Mean of M-values) normalization
2.4 Supplementary Figures and Tables
3 A single molecule analysis of ceRNAs reveals miRNA-dependent correlation and colocalization
3.1 Results
3.1.1 Quantification of gene expression for Pten, Vapa and Cnot6l in single cells with 3-colour smFISH
3.1.2 Presence of shared miRNAs generates correlated fluctuations of Pten ceRNAs in single cells
3.1.3 Pten, Vapa, Cnot6l are mutually reciprocal ceRNAs
3.1.4 Individual molecules of Pten ceRNAs are colocalized in a miRNA-dependent manner
3.2 Discussion
3.3 Methods
3.3.1 Fluorescent in situ hybridization and imaging
3.3.2 Image analysis
3.3.3 siRNA transfection and cell culturing
4 MicroRNA-mediated control of protein expression noise
4.1 Background
4.2 Effects of microRNAs on gene expression noise
4.3 Conclusions
5 Conclusions and Future Directions
References
Appendix A: Mathematical model of microRNA regulation by Joern Schmiedel

List of Figures

1.1 Canonical miRNA biogenesis pathway (adapted from Davis-Dusenbery & Hata, 2010)
1.2 Logic of the ceRNA language (adapted from Salmena, 2011)
1.3 Various types of validated competing endogenous RNAs (adapted from Tay & Pandolfi, 2014)
1.4 Extensive co-targeting of miRNAs: many targets share miRNA binding sites (adapted from Obermayer, 2014). The color of the edges indicates the number of pairs which share a given pair of miRNAs, while the size of the nodes indicates the total number of shared targets for a given miRNA
2.1 ODE biochemical model of miRNA-mediated crosstalk predicts that crosstalk strength should be bounded
2.2 siRNA knockdown of 3 different endogenous senders shows crosstalk strength is bounded by 1
2.3 Crosstalk is miRNA-mediated and pervasive on a genome-wide scale
2.4 Crosstalk strength of receivers with sender CNOT6L does not depend on their predicted number of shared binding sites with CNOT6L. Related to Figure 2.5
2.5 Crosstalk strength of receivers correlates with the predicted number of miRNA binding sites shared with the sender
2.6 Dissecting relative contributions of miRNAs in transmitting crosstalk
2.7 Greater miRNA:target ratios underlie Pten's superior ability to send crosstalk
2.8 Derepression of Pten ceRNAs is detected upon modulating the levels of Pten 3' UTR with a transiently transfected synthetic reporter construct
2.9 Normalization is required for FACS-sorted RNA-seq data, as reads from plasmid occupy a large percentage of total sequencing reads, leading to an overall offset in fold changes
2.10 Transfecting Pten UTR as a sponge derepresses putative ceRNAs in a dose-dependent and miRNA-dependent manner
2.11 Predicted TargetScan conserved miRNA binding sites in the 3'UTR of the ceRNAs chosen in this study
2.12 Crosstalk is microRNA-mediated and pervasive on a genome-wide scale. Related to Figure 2.3
2.13 Distribution of log2 fold changes (PTEN UTR/NULL) for all genes post TMM normalization is centered around zero in each bin, i.e. no bin-dependent effects are seen. Related to Figure 2.10
3.1 Measuring Pten, Vapa and Cnot6l gene expression in single cells with 3-colour single-molecule FISH
3.2 Crosstalk helps ceRNAs co-fluctuate in single cells, thereby tightening their stoichiometric ratios in the presence of active miRNAs
3.3 Pten does not lose correlation in DICER for a gene with which it doesn't share miRNAs
3.4 Measuring crosstalk strength with smFISH for 3 different senders in HCT116 and DICER-/-
3.5 Single molecule FISH shows Pten ceRNAs are colocalized in a DICER-dependent manner
4.1 Opposing noise effects of microRNA regulation at low and high gene expression
4.2 Noise model predictions for a microRNA-regulated gene
4.3 microRNA-mediated intrinsic noise effects
4.4 Estimation of microRNA pool noise and noise effects for endogenous genes
5.1 Colocalization of ceRNAs can enhance crosstalk by increasing their local concentrations, hence promoting rates of miRNA association between ceRNAs, as free miRNAs are more likely to bind to nearby mRNAs than to other targets (adapted from Jens, 2015)

Chapter 1

Introduction

According to the central dogma of molecular biology, RNAs are passive messengers of genetic information, carrying DNA instructions for protein production in cells. Past studies of gene regulatory networks focused on transcriptional regulation, in the form of protein transcription factors binding to DNA, but increasing evidence suggests that post-transcriptional regulation is a significant part of the regulatory network. The discovery of microRNAs (miRNAs) has dramatically increased the known complexity of gene regulatory networks. miRNAs are a class of short non-coding RNAs, 18-25 nucleotides in length, that inhibit their target genes by binding with imperfect complementarity to sites in the 3' untranslated regions (UTRs) of target transcripts, decreasing expression of the target proteins through either mRNA degradation or translational inhibition (Bartel, 2009). miRNAs act combinatorially, as a single mRNA usually contains binding sites for multiple miRNAs. At the same time, an individual miRNA often targets up to 200 transcripts with diverse functions. Within the resulting network of potential interactions, miRNAs have been thought to function mainly as fine tuners of gene regulation by weakly dampening protein output (Bartel 2004), but more recently attention has turned to their system-level effects.
In particular, if microRNAs negatively regulate RNAs, could RNAs themselves regulate microRNA levels? After all, each target binding site sequesters miRNAs from their other targets. The central mechanism underlying the ceRNA hypothesis proposed by (Salmena 2011) is that RNA species are coupled by the miRNAs that target them, through their shared miRNA binding sites. Such RNAs may therefore interact not directly but indirectly, through competition for, and depletion of, shared miRNA pools; in this sense, RNAs can be said to "crosstalk" with each other. Moreover, the hypothesis contends that these indirect RNA interactions form a biologically important mRNA network, acting through functional changes in protein levels, by inducing correlations between different RNA species, or by reducing noise in protein levels. This mechanism has been proposed to play a role in many biological processes, from cancer (Tay 2011) to cell differentiation (Cesana 2011). In the following sections, we review miRNA biology, summarize the experimental evidence for RNA-RNA crosstalk, and discuss the modulators of crosstalk activity.

1.1 MicroRNAs: discovery, biogenesis, target binding and competition

1.1.1 Discovery of miRNA Regulation

MicroRNAs were first discovered in the nematode C. elegans in 1993, when lin-4, a short non-coding RNA, was found to base-pair imperfectly with complementary sequences in the 3'UTR of the lin-14 transcript and to block lin-14 gene expression (Wightman 1993, Lee 1993). Loss of this repression of LIN-14 protein resulted in mis-timing of the animal's developmental stages. lin-4 remained the only known miRNA until 2000, when let-7, another miRNA important in C. elegans development, was discovered (Reinhart et al., 2000). Homologues of lin-4 and let-7 were subsequently found in a wide range of other species, including humans, and in the following years over 1500 different miRNA sequences were discovered.
A large body of research then focused on the identification of target sites (Lewis 2005, Stark 2003), the likely cellular functions of miRNAs (Giraldez 2006, Vigorito 2007) and their biogenesis (Hutvagner and Zamore, 2002). miRNAs have been ascribed roles in nearly every biological process, including apoptosis (Cimmino 2005), pluripotency (Subramanyam 2011), and cell-cycle control (Ivanovska 2008).

1.1.2 Biogenesis of miRNAs

Figure 1.1 | Canonical miRNA biogenesis pathway (adapted from Davis-Dusenbery & Hata, 2010).

In animals, miRNAs are transcribed by RNA Pol II as long primary transcripts (pri-miRNAs) with both a 5' cap and a 3' poly(A) tail (Cai 2004). miRNA genes are often clustered in the genome, such that a single pri-miRNA transcript contains multiple mature miRNA sequences (Lau 2001). Pri-miRNAs are recognized and cleaved by the microprocessor complex, comprising the RNase III enzyme DROSHA (Lee 2002) and its RNA-binding co-factor DGCR8 (Gregory 2004), into hairpin precursors (pre-miRNAs) 60-65 nucleotides long. These pre-miRNAs are bound by Exportin-5 and exported from the nucleus into the cytoplasm. In the cytoplasm, pre-miRNAs are bound by a second RNase III enzyme, DICER1, which cleaves off the hairpin loop to yield a short double-stranded RNA of 20-24 nt (Grishok 2001) containing the mature miRNA "guide strand" and a "passenger strand". In a less understood process, DICER1 loads the miRNA duplex into an Argonaute protein (usually AGO2), which forms the core of the RNA-induced silencing complex (RISC) (Sontheimer 2005). Upon loading into RISC, the passenger strand is usually degraded, while the guide strand, bound to the silencing complex, seeks out complementary RNA sequences. Because biogenesis consists of multiple steps, numerous mechanisms that modulate it have been described, with implications for ceRNA competition that will be discussed later.
In particular, overexpression of AGO2 was found to increase mature miRNA levels in some cells, while disruption of the DICER1 enzyme results in lowered levels of mature miRNAs (Diederichs 2007, Tay 2011). We use cells lacking DICER1 as an important control in all our experiments in Chapters 2 and 3.

1.1.3 miRNAs: target binding and competition

The specificity of target recognition depends upon a crucial "seed" region of the miRNA (usually nucleotides 2-7 or 2-8), which recognizes as few as 6-7 complementary nucleotides in the 3'UTR of a target mRNA (the microRNA response element, or MRE). In most cases, even a single mismatch in the seed sequence disrupts miRNA binding (Lewis 2005). Even so, because so few nucleotides are responsible for target recognition, an individual miRNA can potentially bind a large number of target mRNAs. However, as with any bimolecular binding reaction of the form A + B ⇌ AB, the law of mass action dictates what proportion of targets will be bound, and thus repressed, by a miRNA: the equilibrium concentrations of free miRNA, free target and complex are related through the dissociation constant of the interaction, Kd = [A][B]/[AB]. If miRNAs are limiting, increasing the number of targets lowers the occupancy of each target. Put another way, each miRNA bound to a target necessarily prevents, to some extent, the binding of that miRNA to other target sites. Target sites can thus be said to compete with each other for miRNAs. More generally, competition and saturation effects occur in other parts of the miRNA regulatory process. When mature miRNAs are loaded onto the Argonaute complex there is competition for access, owing to the small number of molecules involved, which can lead to saturation of the RISC machinery.
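To make the mass-action argument concrete, the following minimal Python sketch (an illustration, not a model used in this thesis) computes how strongly a limiting miRNA occupies the sites of one target when a competing pool of sites is added. It assumes that every site pool binds the miRNA independently with its own Kd and that the total miRNA is conserved; the function names `free_mirna` and `occupancy` and all numbers are hypothetical and in arbitrary units.

```python
"""Equilibrium occupancy of miRNA target sites under competition.

Toy sketch: one miRNA pool and several target-site pools, each obeying
A + B <-> AB with its own Kd. All numbers are illustrative, not measured.
"""


def free_mirna(m_total, sites, tol=1e-12):
    """Solve m_total = m + sum_j T_j * m / (Kd_j + m) for the free miRNA m.

    `sites` is a list of (T_j, Kd_j) pairs: total site concentration and
    dissociation constant, in the same units as m_total. The right-hand
    side increases strictly with m, so bisection on [0, m_total] finds
    the unique root.
    """
    lo, hi = 0.0, m_total
    while hi - lo > tol * max(1.0, m_total):
        m = 0.5 * (lo + hi)
        bound = sum(t * m / (kd + m) for t, kd in sites)
        if m + bound > m_total:
            hi = m
        else:
            lo = m
    return 0.5 * (lo + hi)


def occupancy(m_free, kd):
    """Fraction of a site pool with dissociation constant kd that is bound."""
    return m_free / (kd + m_free)


if __name__ == "__main__":
    m_total = 100.0           # limiting miRNA (arbitrary units)
    target = (200.0, 10.0)    # (total sites, Kd) of the target of interest
    for n_competitor in (0.0, 200.0, 1000.0):
        sites = [target, (n_competitor, 10.0)]
        m = free_mirna(m_total, sites)
        print(f"competitor sites={n_competitor:6.0f}  "
              f"target occupancy={occupancy(m, target[1]):.3f}")
```

Running the script shows the occupancy of the target's sites falling as competitor sites are added, which is the titration effect described above; bisection is used only because it is simple and guaranteed to converge for this monotone conservation equation.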
The concept of competitive target inhibition by miRNAs inside the cell was first demonstrated in 2007 by (Ebert 2007), who used plasmids overexpressing miRNA seed sites (up to ~10,000 copies) to "sponge up" specific endogenous miRNAs, thereby titrating those miRNAs away from their other targets and specifically up-regulating the corresponding miRNA targets. Consistent with the limited power of miRNA repression, they measured a mild 1.5-2 fold up-regulation of the miRNA targets. To stress the large number of strong binding sites for a single miRNA introduced into the cell, they used the term "miRNA sponge". Later, (Seitz, 2009) proposed that such highly expressed sponges may have a biological function, and that the role of a substantial fraction of computationally identified miRNA targets may be to sequester miRNAs, preventing them from binding to their authentic targets. Such sponges had been discovered in plants, where over-expression of the long non-coding RNA IPS1 sequesters miR-399, resulting in up-regulation of the miR-399 target gene (Franco-Zorrilla 2007). To what extent similar competition and saturation effects occur naturally in animals remained unexplored.

1.2 ceRNAs: Discovery

Figure 1.2 | Logic of the ceRNA language (adapted from Salmena, 2011).

In 2010 the Pandolfi group devised a combined computational/experimental strategy to search for potential competing endogenous RNAs (termed ceRNAs) for the tumour-suppressor gene Pten, based on the number of miRNA binding sites predicted to be shared between Pten and other transcripts. This computational analysis identified over a hundred protein-coding genes that shared at least 7 miRNA binding sites with Pten. These genes were considered candidate ceRNAs for Pten.
For a subset of these genes (6 out of the 8 genes tested), they demonstrated reduced expression upon siRNA knockdown of Pten and, conversely, up-regulation upon overexpression of the PTEN 3'UTR. Specifically, Vapa and Cnot6l were confirmed as bona fide bi-directional Pten ceRNAs, as overexpressing the 3'UTRs of these mRNAs increased their miRNA sponging and raised PTEN protein abundance. This change in PTEN protein levels was functionally significant: it antagonized PI(3)K signaling and suppressed cell growth and tumorigenesis (Tay 2011). The authors went further, extrapolating that all kinds of RNA transcripts communicate with each other in a miRNA-mediated language, and proposed a "crosstalk" hypothesis: RNAs sharing multiple MREs in their 3' UTRs (or in non-coding RNAs) communicate with each other and regulate their expression levels by competing for a limited pool of miRNAs (Salmena 2011). Up-regulating a given RNA would increase the total number of MREs and thereby draw miRNA binding towards it. As a result, the targeting miRNAs would be sequestered, leading to de-repression of other RNAs sharing the same MREs. This indirect, positive correlation between competing targets was termed the ceRNA or crosstalk effect (Figure 1.2). While the ceRNA hypothesis was a natural consequence of target competition and sequestration, it nevertheless made a startling claim: a new, pervasive gene regulatory network must exist due to the highly promiscuous and clustered nature of miRNA-target binding (Karreth 2011, Sumazin 2011). These papers proposed that shared miRNA target sites link thousands of genes into dense regulatory networks, and moreover that the expression of these genes is correlated in many cancer types. To test computational predictions of ceRNAs, individual ceRNAs were either down-regulated or over-expressed, and the expression levels of other predicted ceRNAs were measured.
In this manner, many new ceRNAs were discovered. In the following sections, we briefly discuss some classes of transcripts that have been identified as ceRNAs.

1.2.1 Different types of endogenous ceRNAs

Figure 1.3 | Various types of validated competing endogenous RNAs (adapted from Tay & Pandolfi, 2014). Panel A: miRNA sequestration by pseudogenes, long non-coding RNAs (lncRNAs), circular RNAs (circRNAs) and competing mRNAs; panel B: validated examples.

1.2.2 3'UTRs as ceRNAs

3' UTRs are critical for mRNA stability and typically contain MREs for several different miRNAs. 3' UTRs can also be viewed as ceRNAs, because they not only regulate the stability of their own transcripts in cis but are also likely to attract miRNAs away from transcripts with shared MREs, thereby regulating those transcripts in trans. This suggests that mutations or changes in the abundance, structure, or length of 3' UTRs may affect their ability to sponge miRNAs. Supporting this view, alternative polyadenylation has been observed to lengthen 3' UTRs during embryogenesis and to shorten them in proliferating cells (Mayr 2009) and in cancer (Mercer 2011). These changes in 3' UTR length affect the interaction of miRNAs with such transcripts and alter protein output. Moreover, by reducing the number of MREs, 3' UTR shortening should also modify the ability of these mRNAs to compete for and sequester miRNAs, and thereby to function as ceRNAs.
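The seed rule of Section 1.1.3 and the consequence of 3' UTR shortening can be illustrated with a short Python sketch that counts canonical 7mer-m8 seed matches, i.e. mRNA sites complementary to miRNA nucleotides 2-8. It assumes that a perfect seed match is sufficient to call a site, ignoring other site types, site context and accessibility. The let-7a sequence is the standard mature sequence, used only for illustration; the UTR is a made-up toy sequence, and the helper names `seed_site` and `count_sites` are hypothetical.

```python
"""Count 7mer-m8 seed-match sites for a miRNA in a 3' UTR sequence.

Illustrative sketch: only canonical 7mer-m8 sites (Watson-Crick match to
miRNA nucleotides 2-8) are counted; other site types and context are ignored.
"""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA sequence (5'->3') complementary to miRNA nt 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]   # nucleotides 2-8 (1-based)
    return "".join(COMPLEMENT[nt] for nt in reversed(seed))


def count_sites(utr, site):
    """Count (possibly overlapping) occurrences of `site` in `utr`."""
    utr = utr.upper().replace("T", "U")
    k = len(site)
    return sum(1 for i in range(len(utr) - k + 1) if utr[i:i + k] == site)


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # mature let-7a, for illustration
    site = seed_site(let7a)
    # Toy 3' UTR carrying three let-7 sites; not a real transcript.
    utr = "AAA" + site + "GGGG" + site + "UUUU" + site + "AA"
    short_utr = utr[:25]               # hypothetical proximal poly(A) site
    print(site, count_sites(utr, site), count_sites(short_utr, site))
```

Truncating the toy UTR at a hypothetical proximal polyadenylation site removes the distal site, so the count drops from three to two; this is the kind of MRE loss that would weaken a shortened 3' UTR's capacity to act as a ceRNA.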
1.2.3 Circular RNAs

RNAs whose ends are covalently linked to form circles had been described in plant viroids (Sanger 1976), but a new class of non-coding circular RNAs (circRNAs), numbering more than 5000, was recently identified and characterized in mammals. These RNAs are processed by the spliceosome in an unusual head-to-tail fashion, resulting in circular transcripts that can contain multiple miRNA binding sites and act as miRNA sponges, depleting the cell of specific miRNAs and thereby alleviating repression of the mRNAs those miRNAs target (Memczak 2013, Hansen 2013). These studies found that the circRNA ciRS-7 contains more than 70 MREs for miR-7 and forms complexes with AGO in a miR-7-dependent manner. smFISH further showed that circRNA-miRNA complexes localize to P-bodies, suggesting that the complexes are sequestered from the translational machinery. circRNAs have proven more effective than their linear counterparts at sequestering miRNAs, partly because their inherent resistance to RNA exonucleases makes them almost immune to miRNA-mediated target destabilization. Effective "supersponge" ceRNAs have precisely such properties: resistance to degradation, high expression levels and multiple miRNA binding sites. Further characterization of this abundant class of non-coding RNAs will be necessary to determine how universal this mechanism of miRNA sequestration is, and how extensive their ceRNA function is.

1.2.4 Pseudogenes as ceRNAs

Transcribed pseudogenes, a class of non-coding RNAs, possess features such as premature stop codons, deletions/insertions, or frameshift mutations that prevent them from producing functional proteins; hence they were long considered "junk" DNA. However, they are thought to act as "perfect sponges" because they carry many of the same MREs as their ancestral genes; for example, PTENP1 is able to alter the miRNA network normally involved in the regulation of PTEN (Tay 2011, Poliseno 2010).
PTENP1, the processed pseudogene of PTEN, was the first reported example of an RNA transcript that acts as a ceRNA for PTEN. Within the coding region, the PTENP1 sequence differs from the PTEN sequence by only 18 mismatches, so PTEN-targeting microRNAs that bind to MREs usually target PTENP1 as well. Poliseno et al. (2010) tested the ceRNA activity of PTENP1 in prostate cancer cells and showed that inhibiting the common microRNAs miR-17, -19, -21, -26 and -214 de-repressed PTENP1. Conversely, PTENP1 3'UTR overexpression led to the de-repression of PTEN. Another pseudogene acting as a ceRNA is the Oct4 pseudogene Oct4-pg4 (Wang 2013), which was shown to sponge miR-145 and hence upregulate Oct4. These studies have attributed a miRNA-sponge function to pseudogenes; however, the difficulty of reliably quantifying pseudogene expression (due to the aforementioned sequence similarity) has hindered attempts to study their ceRNA function quantitatively on a large scale.

1.2.5 Long non-coding RNAs (lncRNAs) as ceRNAs

Like pseudogenes, long non-coding RNAs lack protein-coding capacity, but they are found pervasively across the transcriptome (~10,000). They are good candidates to act as ceRNAs because they are peppered with miRNA binding sites and are able to sequester miRNAs (Chi 2009). Moreover, lncRNAs display specific expression patterns across tissues, developmental stages, cell types and disease, and have thus been recognized as ideal candidates to tune post-transcriptional regulation (Guttman 2012). Two lncRNAs that have been found to act as miRNA sponges are HULC and ROR. HULC acts as a ceRNA by sequestering a set of miRNAs, including miR-372; its overexpression reduces miR-372 expression and activity in the liver cancer cell line Hep3B. This miR-372 sequestration increases the translation of the miR-372 target gene PRKACB (Cesana, 2011).
Recently, Wang et al. (2013) showed that lnc-RoR competes for miR-145 binding with the well-known core pluripotency factors Oct4, Nanog and Sox2 in pluripotent embryonic stem cells, thereby protecting them from miR-145-induced degradation. Interestingly, lnc-RoR was expressed at a higher level (>100 copies/cell) than miR-145 (10-20 copies/cell), suggesting that it acts as an effective sponge.

1.3 Modulators of crosstalk activity

The size of the crosstalk effect depends upon whether a single ceRNA perturbation changes the total miRNA target pool enough to titrate miRNAs away from other shared targets and thereby relieve their miRNA-induced repression. Recent mathematical models of miRNA gene regulation (Bosia 2013, Figliuzzi 2013, Ala 2013) have aimed to model ceRNA crosstalk quantitatively through both steady-state and kinetic descriptions of a small number of interacting miRNA-ceRNA species. The quantitative predictions of these models may not sufficiently explain the magnitudes of the endogenously measured ceRNA effect, owing to the limited number of ceRNAs modeled and the use of free kinetic parameters (transcription, degradation and association rates) that are difficult to ascertain experimentally (Ebert 2012).
However, they illustrate some useful principles of miRNA-target competition: (i) the optimal regime for ceRNA crosstalk occurs when target concentrations are close to the binding Kd of the miRNA-target interaction; (ii) crosstalk between targets is intensified with a greater number of shared miRNAs; (iii) highly expressed targets that form a greater proportion of a miRNA's total target pool are better senders of crosstalk; (iv) ceRNA effects will be selective and hierarchical, depending on the particular affinities and binding strengths of miRNA-target pairs (Figliuzzi 2013); and (v) ceRNA effects can be indirect, i.e., if ceRNA1 shares miRNA1 with ceRNA2, and ceRNA2 shares miRNA2 with ceRNA3, then ceRNA1 will be indirectly coupled to ceRNA3 through ceRNA2 even though ceRNA1 and ceRNA3 share no miRNAs (Ala 2013).

Quantitative prediction of the ceRNA effect in miRNA networks critically requires knowledge of the relative concentrations of miRNAs and targets in the cell, both of which are experimentally difficult to measure. Absolute miRNA concentrations have been reported to range up to 120,000 copies per cell in various cell types (Bissels et al., 2009; Calabrese et al., 2007; Denzler et al., 2014; Lim et al., 2003; Mukherji et al., 2011). Estimated total target concentrations for a given miRNA vary from 500 copies per cell to over 440,000 (Denzler et al., 2014; Loeb et al., 2012; Wee et al., 2012). These target abundance estimates are made in silico and diverge widely, and differing target pool sizes predict very different characteristics of miRNA-target competition networks. Recently, Bosson et al. (2014) critically advanced the field by making state-of-the-art measurements of both miRNA abundance and the total abundance of miRNA binding sites.
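Principle (v) can be made concrete with a minimal graph sketch: represent each ceRNA by the set of miRNAs that target it, call two ceRNAs directly coupled when they share a miRNA, and find every ceRNA reachable through chains of shared miRNAs with a breadth-first search. The three-ceRNA network below is a hypothetical toy, not data: ceRNA1 and ceRNA3 share no miRNA but are both linked to ceRNA2.

```python
from collections import deque
from itertools import combinations

def direct_pairs(targets):
    """Unordered ceRNA pairs that share at least one miRNA."""
    return {frozenset((a, b)) for a, b in combinations(sorted(targets), 2)
            if targets[a] & targets[b]}

def coupled_to(start, targets):
    """ceRNAs reachable from `start` via chains of shared miRNAs (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for other, mirnas in targets.items():
            if other not in seen and mirnas & targets[current]:
                seen.add(other)
                queue.append(other)
    return seen - {start}

# Hypothetical toy network: ceRNA -> set of miRNAs that target it.
targets = {
    "ceRNA1": {"miRNA1"},
    "ceRNA2": {"miRNA1", "miRNA2"},
    "ceRNA3": {"miRNA2"},
}
# Partners reachable from ceRNA1 minus its direct partners = indirect couplings.
direct_partners = {x for pair in direct_pairs(targets) if "ceRNA1" in pair for x in pair}
indirect = coupled_to("ceRNA1", targets) - direct_partners
print(sorted(indirect))  # ['ceRNA3']
```

Reachability only says that a path exists; how strongly a perturbation propagates along it still depends on the binding parameters and stoichiometry discussed in the following sections.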
1.3.1 Abundance of miRNA binding sites and miRNA concentration

First, to directly determine bound miRNA target sites, Bosson et al. (2014) relied on crosslinking and immunoprecipitation (CLIP) of the Argonaute 2 protein (AGO2) to identify AGO2-bound mRNAs and, consequently, target-site abundance in vivo. CLIP protocols first use ultraviolet (UV) light to induce protein-RNA crosslinks; AGO2 is then immunoprecipitated with a specific antibody, pulling down both the guide miRNAs and their bound targets, which are stringently purified to remove unbound RNA, digested into short RNAs and prepared for sequencing. By quantifying the CLIP reads at each miRNA seed site, they were able to specifically and reproducibly estimate the concentration of bound targets. Second, they measured miRNA concentrations with a small RNA-seq assay and normalized the counts to miR-295 copies per cell quantified by northern blot. With these data, they showed that for the thirty most highly expressed miRNAs in ES cells, the total 6mer/7mer/8mer target pools exceeded the corresponding miRNA concentrations in every case. Thus any perturbation of a ceRNA for those miRNAs is unlikely to titrate them away, as binding sites are already in excess. Similarly, Denzler et al. (2014) reported that even for the most highly expressed miRNA, miR-122, total target binding sites exceed miRNA levels; consequently, miR-122 targets were not derepressed until unphysiologically high amounts of miR-122 sponges were added. These studies, done on primary cells, have considerably diminished the possibility of an appreciable ceRNA effect that is purely stoichiometric in nature. It is important to realize, however, that these CLIP protocols (Bosson 2014) pool together millions of cells, yielding an average binding profile that may not reflect dynamic conditions in single cells. Moreover, the studies by the Pandolfi group were done in cancer cell lines, which are known to have altered miRNA concentrations.
Therefore, we cannot rule out ceRNA effects in all types of cells.

1.3.2 MiRNA binding affinity

The two main factors that affect miRNA binding affinity are the number of miRNA binding sites on a target and the free energy of miRNA-target hybridization (ΔG). Given the variation in binding affinities across targets, miRNAs will preferentially bind targets with greater affinity before spreading to lower-affinity sites. Thus the total target pool is partitioned into hierarchical affinity classes that do not compete equally. Conceptually, all binding sites of the same affinity (Kd) "see" the same concentration of free miRNA, which means that they can be grouped together. Targets with affinity much greater than the rest of the pool would act in a simple 1:1 titration regime with the miRNA. Since high-affinity target sites bind the available miRNA pool more favorably, competition can occur without approaching the expression level of the combined pool of weak and strong sites.

1.3.3 MRE accessibility and local concentrations

Returning to the binding reaction A + B ⇌ AB, one notices that the relevant concentration of each species is not the global concentration (which assumes a well-mixed cellular environment): binding probabilities are determined by local concentrations. If miRNAs or mRNAs are sequestered in sub-cellular structures, local concentrations may deviate substantially from the average. Structures such as P-bodies or RNA granules can harbor RNAs and miRNAs in small volumes, thereby concentrating them and possibly altering the binding and unbinding of miRNAs. While these phenomena are very difficult to quantify, altered local concentrations can change the competition between miRNAs and mRNAs and enhance the size of the ceRNA effect. Essentially, rather than competing for binding with the whole target pool, miRNAs could bind much more favorably to locally available
smFISH studies can allow us to quantify the localization of ceRNAs, which we will perform in Chapter 3. 1.3.4 Post-transcriptional network effects shard pairs *3'0,\ 0 230 m 45 = 60 .10 Mir-203mir-34 mir-2m2r-96 2000 0 1000 0 100 connectivity (total # of shared pair targets) . Figure 1.4 I Extensive co-targeting of miRNAs - many targets share miRNA binding sites(adapted from Obermayer(2014). The color of the edges indicates the number of pairs which share a given pair of miRNAs while the size of the nodes indicates the total number of shared targets for a given miRNA A systematic analysis of the ceRNA effect is impeded by the complexity of natural miRNAD ceRNA regulatory networks. The ceRNA effect depends both on the underlying dynamical binding parameters of miRNAs-target RNAs and on the topology of the network. The miRNA-RNA network is known to be highly clustered-certain miRNAs often target genes in tandem- consequently, there appears strong correlations in network connectivity (Figure 1.4). An implication of the highly interconnected nature of the miRNA-RNA target network is that perturbations of gene expression can potentially propagate in the network through a cascade of coregulated target RNAs and miRNAs that share targets (Nitzan 2014). Pairs of miRNAs which have greater number of shared targets would therefore act as key nodes in the ceRNA network. Conversely, certain ceRNAs, which are commonly tar- 21 Chapter 1 geted by a large number of miRNA species can selectively transmit crosstalk than others. Whether or not small effects caused by a propagation of the ceRNA effects are biologically meaningful remains to be investigated. Similar network propagation issues affect other gene regulatory mechanisms. It has often been observed after a gene perturbation (eg. of a transcription factor, miRNA, or drug target) that unrelated genes (off-targets) changed expression i.e those genes whose connection to the perturbed genes was not traceable. 
1.4 Summary and Outline

Following the discovery of transcripts that can sequester miRNAs, thereby releasing other targets from miRNA-mediated repression, a new principle of post-transcriptional gene regulation has been proposed. This layer of gene regulation works through competition for miRNA binding between different RNAs, and thus has the capacity to form a large-scale regulatory network across the transcriptome. The competing endogenous RNA (ceRNA), or RNA-RNA crosstalk, hypothesis offers an attractive explanation for the functionality of non-coding RNAs and pseudogenes, and to date many ceRNAs, both coding and non-coding, have been implicated in varied biological contexts, from cancer (Fang 2013) to muscle differentiation (Cesana 2011). Nonetheless, only a handful of ceRNAs have been experimentally identified, and many features of the proposed ceRNA hypothesis remain unexamined. Our aim in this thesis is to address some of the fundamental questions about the generality and magnitude of the crosstalk mechanism. In Chapter 2 we describe the results of perturbing single ceRNAs (Pten, Vapa and Cnot6l) and quantifying the effects on the transcriptome, both to extract the size of the ceRNA effect and to test the contribution of specific microRNAs. As we will show, the ceRNA effect is bounded yet pervasive across the transcriptome. We find that, in addition to the number of shared miRNA binding sites between the perturbed ceRNA and its targets, the affinity of shared miRNA-target binding is crucial in determining the magnitude of the ceRNA effect. Chapter 3 investigates three specific ceRNAs at the single-cell level with single-molecule resolution to explore how ceRNA co-regulation plays out in single cells. Unexpectedly, we find significant co-localization of these ceRNAs, which can enhance crosstalk locally through competition, thus allowing us
to revise the original hypothesis. Moreover, we find that miRNA coupling between ceRNAs is capable of buffering their individual fluctuations and producing surprising correlations in gene expression. Chapter 4 studies the role of miRNAs in dampening fluctuations in protein levels (Schmiedel et al. 2015). We find that miRNA regulation provides a significant reduction in intrinsic protein noise at low expression levels, which scales with miRNA repression, but that variability in miRNA concentrations itself propagates to target fluctuations at higher expression levels.

Chapter 2

Assessment of the ceRNA hypothesis with integrated genome-wide measurements reveals bounded yet pervasive crosstalk activity

MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that play complex roles in the post-transcriptional regulation of gene expression. Individual genes are typically regulated by many distinct miRNAs, and conversely individual miRNAs often target multiple genes, leading to complex regulatory networks (Friedman 2009) that drive a large variety of cellular processes, from differentiation and proliferation to apoptosis and cancer [Yi 2008, Sluijter 2010, Cimmino 2005]. Several recent studies have added a new facet of post-transcriptional gene regulation: one that is mediated by transcripts with shared miRNA binding sites (Salmena 2011; Tay 2011; Tay 2014). This stems from the bidirectional effects between miRNAs and their target mRNAs, whereby a change in one transcript can affect the expression of others by sequestering miRNAs from their shared targets and thereby relieving miRNA repression of those targets. These transcripts, coupled by their shared miRNAs, are said to 'crosstalk', or regulate each other by competing for common miRNAs. Based upon such a target competition and sequestration mechanism, the competing endogenous RNA (ceRNA) hypothesis proposes a rich network of protein-coding-independent regulatory interactions mediated by miRNAs.
Although many individual ceRNAs have been found, fundamental questions about the magnitude of the effect remain. The experimental setup usually consists of altering the level of a particular transcript, ncRNA or 3'UTR (a 'sender'), then measuring the change in other genes ('receivers') that share MREs (miRNA response elements) with the sender, and verifying that this change in receiver expression is miRNA-dependent. In this way, perturbation of senders by siRNA knockdown or UTR overexpression has indicated that specific receivers move in a correlated fashion (Tay 2011, Salmena 2011): they are reduced when senders are knocked down and de-repressed when senders are upregulated. However, such a competition mechanism faces three major limitations in accounting for the magnitude of the observed ceRNA effects. First, individual miRNAs have long been thought to confer limited repression (~2-fold; Bartel 2004, Baek 2008). Second, given the large target abundances in a cell, any sender perturbation is thought to add or subtract only very few sites from the total target pool of a targeting miRNA (Arvey 2011), implying that the repressive influence of that miRNA on individual receivers would be muted, and thus any consequent crosstalk would be small. Third, mathematical models predict an optimal regime for crosstalk, namely when the regulating miRNA and its target binding sites are at nearly equal effective concentrations (modulo the binding Kd) (Jens 2015, Bosia 2013, Figliuzzi 2013). While estimates of miRNA concentrations exist (tens to 120,000 copies per cell) [Bissels 2009, Denzler 2014], estimates of total target abundances and binding affinities are highly variable, making it difficult to assess whether genes are susceptible to crosstalk in an endogenous environment.
However, a recent study of ceRNA effects for the exceptionally highly expressed liver-specific miR-122 concluded that no target competition occurs in vivo because of the large relative abundance of the miRNA target pool (Denzler 2014). Thus the hypothesis remains controversial despite a variety of examples, in pseudogenes (Poliseno 2010), circRNAs (Hansen 2011) and lncRNAs (Cesana 2011), that suggest the existence of ceRNA interactions. The logic of crosstalk, together with the highly interconnected network of miRNA-mRNA interactions, suggests that ceRNA effects should be pervasive across the transcriptome (Sumazin 2011). Since each sender typically sequesters multiple miRNAs, which in turn have other targets, perturbing the level of one sender could potentially change the expression of hundreds of RNAs competing for shared miRNAs. Signal propagation through miRNA → ceRNA → miRNA chains could take place, affecting distant receivers (Nitzan 2014, Bosia 2013). However, no widespread ceRNA effects have been shown experimentally. Existing studies typically focus on perturbing a sender and testing only a handful of ceRNAs. For example, after computationally searching for Pten ceRNAs based upon the number of shared miRNAs, Tay et al. (2011) found hundreds of possible ceRNA candidates but tested only a select few (Vapa, Cnot6l, Serinc1, Znf460), each of which shared at least 7 miRNA binding sites with Pten. Consequently, it has proved difficult to ascertain whether crosstalk is restricted to a select few sender-receiver pairs with high numbers of shared miRNAs, or to those with favourable stoichiometric [miRNA]/[target pool] ratios, or whether it is instead a general phenomenon. Identifying which miRNAs are involved in transmitting crosstalk between a particular sender and receiver is crucial to refining the ceRNA hypothesis. Current methods to identify ceRNAs rely upon computational miRNA-mRNA target predictions.
In particular, they emphasize the number of shared miRNAs between a sender-receiver pair (Salmena 2011, Ala 2013). However, each miRNA-mRNA interaction is affected differently by the strength of miRNA-mRNA binding and by the local concentration of each interacting species. Thus the ability of a specific miRNA to transmit crosstalk will be influenced by its differential sequestration by the sender and its differential repression of the receiver, and not only by the number of shared miRNA binding sites. In the case of Pten ceRNAs, the miR-17, miR-19 and miR-26 families have been validated as transmitting crosstalk, but it remains unknown whether other miRNAs are functional in the Pten ceRNA network.

2.1 Results

In our study we used RNA sequencing to quantify both the magnitude and the extent of the crosstalk effect genome-wide by directly measuring the effect of perturbing three different senders on the transcriptome. The senders we chose to knock down (Pten, Vapa and Cnot6l) share many miRNA binding sites and have each been experimentally demonstrated to act as ceRNAs, competing for miRNAs with one another in the colon carcinoma HCT116 cell line. Genome-wide measurement of the transcriptome after sender perturbation allows an assessment of key features of the ceRNA hypothesis; in particular, it permits a quantification of crosstalk strength for thousands of potential receivers. Our work is focused on three major questions: (a) How large is the magnitude of crosstalk in an endogenous system? (b) Are ceRNAs restricted to a few pairs or extensive when thousands of sender-receiver pairs are tested, and what are the characteristics of a good sender? (c) Which miRNAs are involved in transmitting crosstalk, and what characteristics make a miRNA good at transmitting crosstalk?
We used a highly simplified model of miRNA regulation of a single sender-receiver pair to quantify the magnitude of crosstalk interactions. The model predicts that crosstalk strength is bounded by 1 and is usually much smaller for reasonable binding parameters. On evaluating crosstalk strength transcriptome-wide in our experiments, we found that crosstalk strength is indeed bounded for each of the senders, yet it is surprisingly pervasive across the genome, including hundreds of genes at all expression levels. We uncover putative ceRNAs for each sender based on the difference in crosstalk strength between HCT116 and HCT116 DICER-/- colon carcinoma cells. We further characterize the influence of shared miRNAs between senders and receivers on crosstalk strength and determine that crosstalk strength is intensified when sender-receiver pairs share more miRNAs. Using our quantification of crosstalk strength, we estimate the power of each miRNA to transduce crosstalk for each sender and find a hierarchy of miRNA crosstalk power, i.e., miRNAs differ in their ability to affect ceRNAs. Surprisingly, we find that the miRNAs targeting Pten have the highest crosstalk power of the three senders. We suggest that the ability of a gene to be a good sender of crosstalk (like Pten) depends upon its ability to sequester miRNAs and on the overall stoichiometry of its [miRNA]/[target pool]. We further find that we can modulate the levels of these putative ceRNAs by transfecting into the cells, at varying levels, a plasmid carrying endogenous Pten 3'UTR sponges. Specifically, we find a subset of 'robust' Pten ceRNAs that are both de-repressed in a dose-dependent manner and depleted when Pten is knocked down, suggesting that Pten exists in an optimal regime for crosstalk.
2.1.1 ODE biochemical model of crosstalk predicts that crosstalk strength should be bounded by 1

The endogenous molecular environment, consisting of numerous miRNAs and targets, is complicated; any perturbation of a sender changes the target pools of many different miRNAs targeting many receivers (Figure 2.1a). To characterize the strength of the ceRNA effect, we need to answer two questions: How does a change in the sender influence the free miRNA pool? How does the corresponding change in the miRNA pool influence the receiver? We sought to understand the simplest system, consisting of one sender, one transmitting miRNA and one receiver. In the simplest titration mass-action ODE model (analogous to Buchler 2008; Mukherji 2011) of two mRNAs regulated by one miRNA that is recycled after interacting with its target, we take into account the dynamics of the miRNA ($\mu$), the free mRNAs (ceRNAs) of the two targets ($m_1$ and $m_2$), and the complexes of the miRNA with its targets ($m_1\mu$ and $m_2\mu$). The model's parameters are the transcription and degradation rates of $m_{1,2}$ ($\nu$ and $d_m$, respectively) and the association, dissociation and degradation rates of the complexes $m_{1,2}\mu$ ($k_{on}$, $k_{off}$ and $d_{m\mu}$). For illustration (this simplification can be relaxed), all transcription, degradation and association rates are assumed equal. Considering one target as the sender and the other as the receiver of crosstalk (Figure 2.1a), we would like to know the impact of varying the transcription rate of the single sender $m_1$ on the receiver $m_2$ (their derivative is what we term crosstalk strength).

$$\frac{d[m_i]}{dt} = \nu_i - d_m[m_i] - k_{on}[m_i][\mu] + k_{off}[m_i\mu], \qquad i = 1, 2 \qquad (2.1)$$

$$\frac{d[m_i\mu]}{dt} = k_{on}[m_i][\mu] - k_{off}[m_i\mu] - d_{m\mu}[m_i\mu], \qquad i = 1, 2 \qquad (2.2)$$

$$\mu_T = [\mu] + [m_1\mu] + [m_2\mu] \qquad (2.3)$$

where we assumed that the total miRNA concentration is a constant $\mu_T$.
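A minimal sketch that integrates Eqs. (2.1)-(2.3) numerically, using a classic fixed-step fourth-order Runge-Kutta scheme (our choice for the sketch; the thesis does not specify an integrator) and illustrative, unfitted parameter values. Free miRNA is recovered from the conservation law (2.3), so $\mu_T$ is conserved by construction; at long times the state should satisfy $[m_i\mu] = [m_i][\mu]/K_d$ with $K_d = (k_{off} + d_{m\mu})/k_{on}$, and $\nu_i = d_m[m_i] + d_{m\mu}[m_i\mu]$.

```python
def rhs(state, nu1, nu2, d_m, d_mmu, k_on, k_off, mu_T):
    """Time derivatives of (m1, m2, m1mu, m2mu) from Eqs. (2.1)-(2.2)."""
    m1, m2, c1, c2 = state
    mu = mu_T - c1 - c2  # Eq. (2.3): free miRNA; it is recycled when a complex decays
    return [
        nu1 - d_m * m1 - k_on * m1 * mu + k_off * c1,
        nu2 - d_m * m2 - k_on * m2 * mu + k_off * c2,
        k_on * m1 * mu - k_off * c1 - d_mmu * c1,
        k_on * m2 * mu - k_off * c2 - d_mmu * c2,
    ]


def integrate(nu1, nu2, d_m=0.1, d_mmu=1.0, k_on=1.0, k_off=1.0, mu_T=10.0,
              dt=0.01, t_end=200.0):
    """Classic RK4 from a state with no mRNA and all miRNA free; returns the final state."""
    args = (nu1, nu2, d_m, d_mmu, k_on, k_off, mu_T)
    state = [0.0, 0.0, 0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(state, *args)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], *args)
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], *args)
        k4 = rhs([s + dt * k for s, k in zip(state, k3)], *args)
        state = [s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + u)
                 for s, p, q, r, u in zip(state, k1, k2, k3, k4)]
    return state
```

For example, `integrate(1.0, 0.5)` returns the long-time values of $([m_1], [m_2], [m_1\mu], [m_2\mu])$. A fixed step is adequate here because all rates are of order 0.1 to 10; stiffer parameter choices would call for an adaptive or implicit solver.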
We can obtain the steady-state solution for each species; for the sender,

$$[m_1] = \frac{1}{2}\left[\left([m_1^0] - \mu^* - K_d^*\right) + \sqrt{\left([m_1^0] - \mu^* - K_d^*\right)^2 + 4[m_1^0]K_d^*}\right] \qquad (2.4)$$

where

$$K_d = \frac{k_{off} + d_{m\mu}}{k_{on}} \qquad (2.5)$$

is the effective dissociation constant of the miRNA-target complex. We define a microRNA "target load" $w = [m_1]/K_d + [m_2]/K_d$, which describes the sequestration of the miRNA by the two regulated mRNAs and captures the competition between those two co-regulated genes for the same miRNA. $[m_i^0] = \nu_i/d_m$ is the steady-state mRNA level without any microRNA regulation, and the effective miRNA concentration is $\mu^* = (d_{m\mu}/d_m)\,\mu_T$. The effect of the competing mRNA can be subsumed into an apparent dissociation constant $K_d^* = K_d(1 + w_2)$, corrected by the miRNA target load that the other mRNA contributes, $w_2 = [m_2]/K_d$. The quantity we are interested in, crosstalk strength, is the sensitivity of $m_2$ to $m_1$ levels, $d[m_2]/d[m_1]$. To see how it varies with sender expression, we fixed all parameters but the transcription rate of the sender. In the steady-state solution of the model, the dissociation constant $K_d$ of the miRNA-target complex dictates the threshold at which the miRNA is bound or unbound by the sender: most miRNAs are bound as sender levels increase above the threshold, while they become unbound below it (Figure 2.1b). The model gives the steady-state concentration of the receiver as a function of the free miRNA concentration; as free miRNA levels decrease, the receiver becomes progressively unbound (Figure 2.1c). If there are too many sender molecules, all miRNAs are bound by the sender, leaving the receiver free, and no crosstalk is observed. If there are too many miRNAs, both ceRNAs are bound by the miRNA, and again no crosstalk is observed. Above the threshold, miRNA repression is lost and receiver levels grow almost linearly with the transcription rate of the sender, while their variation is maximal close to the threshold.
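A minimal numerical sketch of this steady state, assuming the symbols defined above and arbitrary illustrative parameter values (all rates 1, $\mu_T = 100$; not fitted to data). It solves the miRNA mass balance by bisection, checks the result against the closed-form expression for $[m_1]$, and estimates crosstalk strength by a finite difference. Here we take CS to be the logarithmic sensitivity $d\ln[m_2]/d\ln[m_1]$ of the free receiver to the free sender when only $\nu_1$ changes, matching the "relative change" wording of Figure 2.1 and the fold-change definition of Section 2.1.2; this choice of definition is our assumption.

```python
import math

def steady_state(nu1, nu2, d_m=1.0, d_mmu=1.0, k_on=1.0, k_off=1.0, mu_T=100.0):
    """Free miRNA, free sender m1 and free receiver m2 at the steady state of the model."""
    K_d = (k_off + d_mmu) / k_on          # effective dissociation constant
    a = d_mmu / (d_m * K_d)               # [m_i] = [m_i^0] / (1 + a*[mu])
    m10, m20 = nu1 / d_m, nu2 / d_m       # unregulated levels [m_i^0]

    def excess(mu):
        # mass balance: mu_T = mu * (1 + ([m1] + [m2]) / K_d)
        return mu * (1.0 + (m10 + m20) / ((1.0 + a * mu) * K_d)) - mu_T

    lo, hi = 0.0, mu_T                    # excess is increasing: < 0 at 0, >= 0 at mu_T
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) < 0.0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    return mu, m10 / (1.0 + a * mu), m20 / (1.0 + a * mu)


def m1_closed_form(nu1, m2, d_m=1.0, d_mmu=1.0, k_on=1.0, k_off=1.0, mu_T=100.0):
    """Closed-form free sender level given the receiver level m2."""
    K_d = (k_off + d_mmu) / k_on
    K_star = K_d + m2                     # K_d* = K_d (1 + w_2), with w_2 = [m2]/K_d
    mu_star = (d_mmu / d_m) * mu_T        # effective miRNA concentration
    m10 = nu1 / d_m
    b = m10 - mu_star - K_star
    disc = math.sqrt(b * b + 4.0 * m10 * K_star)
    # Same root as 0.5*(b + disc); the second form avoids cancellation when b < 0.
    return 0.5 * (b + disc) if b >= 0.0 else 2.0 * m10 * K_star / (disc - b)


def crosstalk_strength(nu1, nu2, eps=1e-5, **params):
    """d ln[m2] / d ln[m1] when only the sender transcription rate nu1 changes."""
    _, m1a, m2a = steady_state(nu1, nu2, **params)
    _, m1b, m2b = steady_state(nu1 * (1.0 + eps), nu2, **params)
    return math.log(m2b / m2a) / math.log(m1b / m1a)
```

Sweeping `nu1` over several decades with `crosstalk_strength(nu1, nu2)` traces how CS rises near the threshold and falls away on either side. Bisection is used instead of the closed form so that the closed form can serve as an independent check.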
Thus crosstalk is only present in a narrow range near the threshold Kd, where bound miRNA-mRNA complexes are most sensitive to free miRNA concentrations. Moreover, the model illustrates that both the binding affinities and the overall stoichiometry of the system dictate whether or not there is crosstalk between ceRNAs (Figure 2.1d) [similar to Figliuzzi 2013]. The magnitude of the crosstalk strength between the sender and receiver can be shown to be the product of two factors: the response of the miRNA level to perturbations of the transcription rate of the sender, and the response of the level of the receiver to perturbations of the miRNA level (see Appendix A). The former depends upon the fraction of miRNAs bound by the sender (sequestration; determined by $K_d$), while the latter depends upon the relative repression conferred by a miRNA upon its target (repression; determined by the rates of degradation and association and by the relative concentration of free miRNA). As the sequestration factor is always less than 1 (miRNAs are also bound by targets other than the sender) and the repressive effect of the miRNA on the receiver mRNA is also always less than 1, their product is also less than 1. Thus, crosstalk strength is bounded by 1:

$$CS_{\text{receiver}} \le \text{Sequestration} \times \text{Repression}_{\text{receiver}} < 1 \qquad (2.6)$$

In simulations of the single sender-receiver model, in which we swept parameters over biologically reasonable values from the literature, crosstalk strength was below 0.1 in about 90% of expression states in all systems. The simulations show that crosstalk is strongest when the expression of the sender is in the sender's ultrasensitive regime and the expression of the receiver is below the receiver's ultrasensitive regime. Though the single sender-receiver model is perhaps too simple, it does make a testable prediction: crosstalk strength in an endogenous system should be small and generally bounded by 1.
To evaluate this general prediction, we used RNA sequencing to quantify both the magnitude and extent of the crosstalk effect genome-wide for three different sender mRNAs.

[Figure 2.1: (a) minimal system (one sender, one transmitting miRNA, one receiver) compared with the endogenous situation, posing the questions "How does a change in the sender influence the miRNA pool?" and "How does the corresponding change in the miRNA pool influence the receiver?"; (b) [miRNA] versus [sender]; (c) [receiver] versus [miRNA]; (d) receiver level versus log[sender], with inset CS = dm2/dm1 = (dm2/dTL)·(dTL/dm1) < 1, i.e., the microRNA-mediated change in mRNA2 upon a change of target load (TL) times the change of target load upon a change in mRNA1.]

Figure 2.1 | ODE biochemical model of miRNA-mediated crosstalk predicts that crosstalk strength should be bounded. (a) Generally, RNAs (wavy lines) in an endogenous system of multiple miRNAs (circles) interacting with many targets will sequester miRNAs and produce RNA competition effects. This competition between competing endogenous RNA (ceRNA) species for their miRNAs is termed 'crosstalk' or the ceRNA effect. (b) We study a minimal model with only one 'sender', one transmitting miRNA and one 'receiver' under simple mass-action kinetics to computationally ascertain, under reasonable biochemical binding parameters, how a change in the sender influences the miRNA pool and how the corresponding change in the miRNA pool influences the receiver. Steady-state concentrations in the system are obtained by fixing all parameters but the transcription rate of the sender. All binding parameters are assumed equal between sender and receiver. Sender and receiver expression levels are normalized by their (equal) dissociation constants. Numerical simulations of the model show that bound miRNA-target complexes form and free miRNA declines as more sender target sites are introduced into the system, until the sender saturates the miRNA pool. Maximal change requires a free miRNA concentration around the dissociation constant (Kd) of the sender binding sites.
The inset shows the derivative of the miRNA concentration with respect to the sender concentration, which is always negative because an increase in the sender always increases the level of bound miRNA-target complex. (c) Under repression by miRNAs, receiver levels decline as miRNA levels increase, until the receiver is maximally repressed. The inset shows the derivative of the receiver concentration with respect to the miRNA concentration; it is always negative, because an increase in [miRNA] always increases repression, and its magnitude peaks around the Kd of the receiver. (d) Combining the dynamics in (b) and (c), we obtain the response of the receiver to sender levels. The receiver is sensitive to variations in the level of its competitor (the sender) via the change in the free miRNA concentration [miRNA], and is progressively derepressed as the sender starts to sequester the miRNA. Its derivative is what we refer to as the crosstalk strength (CS), i.e., the relative change in the free level of the receiver upon a relative change in the sender. The inset depicts the crosstalk strength in this model (parameter set). The crosstalk strength increases in the regime where free and bound molecules have similar concentrations. Crosstalk is bounded by 1 because it is the product of two factors that are each less than 1: the fraction of miRNAs bound by the sender and the change in repression of the receiver upon changes in its target pool.

2.1.2 Quantification of crosstalk following siRNA knockdown of sender

Previous studies of ceRNAs have focused on only one sender or on only a few targets of a miRNA, even though a perturbation in ceRNA levels that changes miRNA activity would be expected to affect many hundreds of genes.
To obtain a more comprehensive view of the effects of sender knockdown in an endogenous system, we knocked down a sender using siRNA and quantified the concomitant changes in the transcriptome using RNAseq (Figure 2.2a). These experiments were performed in triplicate using siRNA pools (combinations of four independent siRNAs) specifically designed to achieve strong target knockdown and minimize off-target effects. We chose to knock down Pten, Vapa and Cnot6l because each of them has previously been shown to be a strong sender of crosstalk [Tay 2011]; moreover, they are targeted by many different validated miRNA families [figure], each of which, in turn, targets many different RNAs. This allows us to simultaneously (i) test thousands of possible sender-receiver pairs for crosstalk, (ii) isolate the contribution of specific miRNAs to transmitting crosstalk and (iii) test the impact of shared miRNAs on crosstalk. Any siRNA knockdown experiment has confounding direct and indirect effects that are either (a) due to off-target effects of siRNA transfection or (b) not mediated through competition for miRNAs but instead due to the change in sender expression (Pten, for example, is a key antagonist of the PI3K-AKT/PKB signalling pathway). All our siRNA knockdown experiments were therefore performed in parallel with two essential controls: (a) negative control siRNAs, i.e., gene expression levels following the knockdown were compared with expression data collected from three replicates transfected with negative control siRNA; and (b) the HCT116 DICER-/- cell line. The HCT116 DICER-/- cell line has a deletion in exon 5 of DICER, the enzyme that is crucial for processing mature microRNAs [Cummins 2006]; accordingly, mature microRNAs are significantly depleted in these cells [Tay 2011].
We therefore expected crosstalk to be significantly reduced in DICER -/- cells for any putative ceRNA, as observed previously [Tay 2011], allowing us to use them as a control to eliminate fold changes not mediated by miRNAs. After treating the cells with siRNA, we waited 24 hours to ensure a strong knockdown, extracted RNA and prepared RNA-sequencing libraries for each of the knockdowns. We sequenced on an Illumina HiSeq 2500 at a depth of roughly 20-30 million short reads per sample. We quantified gene expression in each condition using reads per kilobase of transcript per million mapped reads (RPKM) normalization and averaged RPKM over three biological replicates. To remove variability from low-abundance RNA species, we removed genes that had 0 read counts in any library, and then measured fold changes for each gene between the sender knockdown libraries and the negative control libraries. We achieved a direct knockdown of 70-80% for each of the three senders. A representative scatter plot of RPKM expression in siPten versus the negative control sample (Figure 2.2b) shows that Pten is the most strongly differentially expressed gene. We also confirmed that siRNA-mediated gene silencing is independent of DICER processing and hence fully functional in the DICER -/- cell line, as comparable knockdown fold changes for the senders were observed in the DICER cells.

[Figure 2.2 panels: (a) experimental workflow (si-VAPA, si-PTEN, si-CNOT6L in HCT 116 and DICER -/- cells; RNA extracted after 24 h for RNAseq; 3 biological replicates); (b) gene expression in si-Pten vs si-neg. control, log2(RPKM), with zoomed panel in RPKM; (c) volcano plots of PTEN, VAPA and CNOT6L crosstalk strength]

Figure 2.2 | siRNA knockdown of 3 different endogenous senders shows crosstalk strength is bounded by 1.
(a) Experimental system for quantifying crosstalk strength genome-wide upon siRNA knockdown of either Pten, Vapa or Cnot6l in HCT116 and miRNA-deficient DICER -/- HCT 116 cells. Each cell line was transfected with sender siRNA and negative control siRNA in parallel, and RNA was extracted after 24 hours. For each sample, RNAseq libraries were created and transcript expression was quantified by sequencing. All RNAseq experiments were performed with 3 biological replicates. (b) Scatter plot of RNAseq mean expression (in units of log2 RPKM) for the Pten knockdown and negative control in HCT 116 cells. Each dot represents the mean expression of one gene; all genes expressed at greater than 0.1 RPKM in the two libraries are shown (n=13,700 genes). The direct knockdown of Pten (shown in green) by si-Pten was 80%. Crosstalk strength for each receiver gene is defined as its fold change normalized to the fold change of Pten (the sender). Genes below the diagonal (purple line) have positive crosstalk strength, as they are reduced upon Pten knockdown. The right panel is a zoomed-in version that highlights changes in genes with expression similar to Pten; there, the magnitude of a gene's crosstalk strength can be estimated as its distance from the diagonal relative to Pten's distance from the diagonal. Genes marked in light blue have a lower crosstalk strength than those marked in dark blue. Most genes fall along the diagonal and show no change in expression, i.e. no crosstalk. In contrast, the previously known Pten ceRNAs Cnot6l and Vapa both have positive CS and are marked in red for comparison. Expression is in units of RPKM. (c) Volcano plots of crosstalk strength versus statistical significance (P-value) for each of the sender knockdowns. Crosstalk strength is bounded by 1 (dotted green line) but can take larger negative values. CS=1 for each of the senders (black dots) by construction.
P-values are adjusted for multiple comparisons by the Benjamini-Hochberg false discovery rate (FDR) method with α = 0.05.

2.1.3 Pervasive yet bounded mRNA crosstalk upon siRNA knockdown

Different receivers will in general respond differently to a change in sender levels, depending on exactly which miRNAs are sequestered by the sender and on how strongly those miRNAs repress the receiver, and thus can exhibit more or less crosstalk. We wished to quantify the crosstalk strength between each sender and all its potential receivers for each of the 3 siRNA knockdown RNAseq datasets in the HCT and DICER cell lines. We defined the 'crosstalk strength' of a receiver with respect to a sender, in a given cell line/condition, as the relative fold change in receiver levels after the sender knockdown divided by the relative change in sender levels after its siRNA knockdown. For example, in HCT 116 cells, when the sender is Pten, then for a receiver gene X we compare its mean expression in the negative control replicates (termed 'HN') to its mean expression in the siPten biological replicates (termed 'HP'):

$$\mathrm{CS}^{\text{cells}=\text{HCT},\ \text{receiver}=X}_{\text{sender}=\text{Pten}} = \frac{\text{fold change of } X \text{ in HN over HP}}{\text{fold change of Pten in HN over HP}} = \frac{(X_{HN} - X_{HP})/X_{HN}}{(\mathrm{Pten}_{HN} - \mathrm{Pten}_{HP})/\mathrm{Pten}_{HN}}$$

This means that when the crosstalk strength is 0.1 and the sender levels are reduced by 80%, the receiver levels will be reduced by 8% through crosstalk. The crosstalk strength, so defined, depends on the relative direction (sign) of the fold change: genes with positive crosstalk strength are down-regulated when the sender is knocked down, i.e. they co-vary with the sender as implied by the ceRNA hypothesis. Genes that are up-regulated upon sender knockdown have negative crosstalk strength and should not be considered putative ceRNAs.
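The crosstalk-strength definition above, together with the RPKM normalization and zero-count filter of Section 2.1.2, can be sketched in Python as follows. This is an illustrative reconstruction, not the pipeline used for this thesis: the input layout (one dict of read counts per library plus a dict of transcript lengths), the function names and the toy numbers are hypothetical. The Benjamini-Hochberg step-up adjustment used for the CS p-values is included; the replicate-based z-statistic that produces the raw p-values is not specified in the text and is deliberately left out.

```python
import statistics


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads for one library."""
    total = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


def crosstalk_strengths(control_libs, knockdown_libs, lengths_bp, sender):
    """CS_X = [(X_ctrl - X_kd) / X_ctrl] / [(S_ctrl - S_kd) / S_ctrl] on mean RPKM.

    control_libs, knockdown_libs: lists of {gene: read count}, one per replicate
    (hypothetical layout). Genes with zero reads in any library are dropped.
    """
    libs = control_libs + knockdown_libs
    genes = [g for g in lengths_bp if all(lib.get(g, 0) > 0 for lib in libs)]
    ctrl = [rpkm(lib, lengths_bp) for lib in control_libs]
    kd = [rpkm(lib, lengths_bp) for lib in knockdown_libs]
    rel_change = {}
    for g in genes:
        x_ctrl = statistics.mean(r[g] for r in ctrl)  # average over replicates
        x_kd = statistics.mean(r[g] for r in kd)
        rel_change[g] = (x_ctrl - x_kd) / x_ctrl
    return {g: rel_change[g] / rel_change[sender] for g in genes}


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With a large unchanged background gene, an 80% drop in sender reads and an 8% drop in a receiver's reads give a receiver CS close to 0.1, matching the worked example above; the sender's own CS is exactly 1 by construction.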
We calculated the crosstalk strength for all 13,700 expressed genes in each of the sender libraries as described above and examined its distribution in the HCT 116 and DICER cell lines. Most genes showed no expression change upon knockdown of the Pten, Vapa or Cnot6l senders, so the CS distribution was centered around zero in both HCT and DICER. As expected from the reduced miRNA activity in DICER -/- cells, the distribution of CS in DICER was substantially shifted towards smaller values than CS in HCT (Figure 2.3). Strikingly, however, crosstalk strength in all conditions was bounded by +1: almost no gene was down-regulated more strongly than the sender, i.e. receiver fold changes were smaller than the 70-80% fold change of the sender (Figure 2.2c). Hundreds of genes had statistically significant (p<0.05) CS between 0.1 and 0.5, but relatively few had a greater crosstalk strength that was also significant. We obtained the statistical significance of each gene's crosstalk strength by calculating z-values from our replicates and using the Benjamini-Hochberg method to adjust p-values for multiple comparisons. Interestingly, genes with negative crosstalk strength had comparatively higher p-values (more replicate variability), suggesting that they tended to be expressed at lower levels. Taken together, the genome-wide expression data supported our prediction that crosstalk strength should be bounded.

[Figure 2.3 panels: probability densities of HCT CS and DICER CS for PTEN, VAPA and CNOT6L crosstalk strength]

Figure 2.3 | Crosstalk is miRNA-mediated and pervasive on a genome-wide scale. (a) Probability density of the crosstalk strength distribution in both HCT (black) and DICER (red) for all genes expressed above 0.1 RPKM, for each of the 3 senders.
Observed crosstalk strength in all of the knockdowns is always less than 1. CS is higher in HCT cells than in DICER for many genes, and more genes have negative CS in DICER. The inset shows the same distributions, but with the number of genes whose CS HCT > CS DICER calculated for each of the 30 bins across HCT CS. This indicates that, for each of the three senders, hundreds of genes across the genome exhibit miRNA-mediated crosstalk with low to moderate crosstalk strength (0.1<CS<0.5).

To determine whether these many positively crosstalking genes were indeed miRNA-mediated, we chose only those genes whose crosstalk strength in HCT116 was greater than that in DICER. We found such genes across a range of crosstalk strengths, from low (n=440 Pten ceRNAs with CS=0.1) to high (n=65 Pten ceRNAs with CS=0.5) (Figure 2.3), indicating that putative ceRNAs were found pervasively across the transcriptome. In addition, on examining the expression range of these putative ceRNAs, we found that they were expressed across 3 orders of magnitude. These included some previously discovered ceRNAs (Cnot6l, Serinc1, Vapa, Zeb2) but also hundreds of novel ceRNAs (Figure ??). A GO-term analysis of putative Pten ceRNAs showed significant enrichment for a range of biological processes, including "protein phosphorylation" and "regulation of phosphate metabolic process" (Table 2.3). These GO terms are also linked to the functional role of Pten, which acts as a tumor suppressor through the activity of its phosphatase protein product. To assess whether these putative ceRNAs were actually responding to changes in miRNA levels due to depletion of the sender, we performed a miRNA enrichment analysis. MiRNA-mediated crosstalk would require that these putative ceRNAs are enriched in binding sites for the miRNAs that target their particular sender. Indeed, we found many sender-targeting miRNAs that are enriched in their respective putative ceRNA lists (Table 2.1).
These include miRNA families (miR-17, miR-19, miR-93, miR-26) previously implicated in Pten ceRNA networks [Poliseno 2010]. Intriguingly, we also found statistically significant enrichment in these ceRNA sets for miRNAs that are not known to have binding sites on the sender, suggesting that ceRNA effects can propagate via the interconnected miRNA-target network.

2.1.4 Crosstalk strength correlates with the number of shared binding sites

We reasoned that if miRNAs of different families are sequestered by a sender, then each miRNA released upon sender knockdown would repress its targets independently, thus amplifying crosstalk between the sender and receivers that share binding sites. Indeed, our model, along with others [Ala 2013], suggests that crosstalk depends on the overlap of miRNA binding sites between senders and receivers; specifically, it increases with the number of shared MREs. To test this hypothesis, we first tested a weaker claim: genes that share multiple miRNA binding sites with the sender should have greater crosstalk strength than the set of all genes. The second, stronger claim we tested was: the more miRNA binding sites a receiver shares with the sender, the greater its crosstalk strength should be. We tested the weaker claim by ranking genes exclusively by the number of miRNAs they share with the sender (independently for Pten, Vapa and Cnot6l). We counted all the predicted TargetScan binding sites shared between any given mRNA and the sender, and then ranked this list of genes by the number of shared binding sites. We thus obtained a list of the top 500 Pten, Vapa and Cnot6l "shared miRNA predicted ceRNAs".
We then compared the CS of these genes in HCT to that in DICER cells, and found that their HCT CS is significantly greater than their DICER CS for Pten and Vapa but not for Cnot6l (Figure 2.5a), suggesting that our measurement of crosstalk strength was miRNA-dependent and supporting the hypothesis that shared binding sites and crosstalk strength are correlated. To rule out any systematic CS bias between HCT and DICER, we also checked the CS distribution within the three HCT 116 sender libraries. We found that the HCT crosstalk strengths of these "top 500 shared miRNA predicted ceRNAs" were significantly greater than those of the control set (consisting of all genes) for Pten and Vapa but not for Cnot6l (Figure 2.5b). We caution that not all of these computationally predicted genes that share miRNA binding sites with a sender have positive CS. For example, 155 of the "top 500" genes that share more than 3 miRNA binding sites with Pten have negative crosstalk strength, demonstrating that computational predictions of ceRNAs have to be supplemented by experimental tests because of the high number of false positives among TargetScan binding site predictions. Because the second claim is more quantitative than a simple comparison, we wished to remove contamination from non-ceRNAs and required that our basic condition, HCT CS > DICER CS, be met. To further increase stringency, we took this list of candidate ceRNAs and required that they share at least four miRNA binding sites with the sender. For Pten we found 858 such genes and for Vapa 610. We then binned these receiver genes into quintiles of the number of shared miRNAs and, for each quintile gene set, computed the median CS independently for each sender. Consistent with the model, the crosstalk strength was shifted to higher levels in receivers that share more miRNA binding sites with the sender (Figure 2.5c).
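The two shared-binding-site analyses can be sketched as follows. The inputs and function names are hypothetical: predicted targets are given as a dict mapping each gene to the set of miRNA families predicted (for example by TargetScan) to bind it, the shared count is assumed to be the number of miRNA families a gene has in common with the sender, and "quintiles" are formed by cutting the rank-ordered gene list into five equal-sized chunks, so genes with tied counts may land in neighbouring bins.

```python
import statistics


def shared_mirna_counts(targets, sender):
    """Number of miRNA families each gene shares with the sender.

    targets: {gene: set of miRNA families predicted to bind it} (hypothetical input).
    """
    sender_fams = targets[sender]
    return {g: len(fams & sender_fams) for g, fams in targets.items() if g != sender}


def top_shared(targets, sender, n=500):
    """Genes ranked by number of shared miRNA families, best first (top n)."""
    counts = shared_mirna_counts(targets, sender)
    return sorted(counts, key=counts.get, reverse=True)[:n]


def median_cs_by_bin(cs_hct, cs_dicer, counts, min_shared=4, n_bins=5):
    """Median HCT CS in equal-sized bins of receivers ordered by shared-miRNA count.

    Keeps only receivers with CS_HCT > CS_DICER and >= min_shared shared miRNAs.
    Returns a list of (lowest count, highest count, median CS) per bin.
    """
    genes = [g for g in counts
             if g in cs_hct and g in cs_dicer
             and cs_hct[g] > cs_dicer[g] and counts[g] >= min_shared]
    genes.sort(key=counts.get)
    bins = []
    for k in range(n_bins):
        chunk = genes[k * len(genes) // n_bins:(k + 1) * len(genes) // n_bins]
        if chunk:
            bins.append((counts[chunk[0]], counts[chunk[-1]],
                         statistics.median(cs_hct[g] for g in chunk)))
    return bins
```

Under the model's prediction, the medians returned by `median_cs_by_bin` should increase from the first to the last bin, as they do in Figure 2.5c for Pten and Vapa.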
The greater CS of Pten may indicate a greater propensity to sequester miRNAs or a greater affinity for its miRNAs (see Discussion). These results confirmed that shared miRNA binding sites play a significant role in transmitting crosstalk between a sender and a receiver. With this analysis, we found that Cnot6l shows no evidence that crosstalk strength depends on the number of shared miRNAs (Figure 2.4): (i) the genes that share more than 8 miRNA binding sites with Cnot6l have lower crosstalk strength than those that share no miRNA binding sites; (ii) there is no increase in crosstalk strength for genes binned by the number of miRNAs shared with Cnot6l.

[Figure 2.4 panel: crosstalk strength distributions of all genes and "top 500 shared miRNA" genes for CNOT6L]

Figure 2.4 | Crosstalk strength distribution of the "top 500" genes that share greater than 7 miRNA binding sites with Cnot6l (black), compared with the distribution for all genes (gray). The median crosstalk strength is smaller for "top 500" genes than for all genes, indicating that Cnot6l crosstalk is not dependent on the number of shared miRNAs.

[Figure 2.5 panels: (a) PTEN and VAPA crosstalk strength in HCT 116 and DICER for "top 500 shared miRNA" genes; (b) HCT 116 "top 500 shared miRNA" genes vs all genes; (c) median crosstalk strength vs bins (# of shared miRNA)]

Figure 2.5 | Crosstalk strength of receivers correlates with the predicted number of miRNA binding sites shared with the sender. (a) and (b) Crosstalk strength is higher for receivers that have the largest number of predicted miRNA binding sites in common with their respective senders, both between HCT116 and DICER and within HCT116 cells. (a) Cumulative distributions of crosstalk strength with respect to each sender for receivers that share the most binding sites with the sender. The crosstalk strength distributions for this set of genes are shown in HCT116 and DICER ("top 500 shared miRNAs" indicates the ranked list of genes sharing at least 4 binding sites with the sender; see text). These genes show a significant increase in CS_HCT116 compared to CS_DICER (p < 10^-9 for the difference between the distributions, one-sided Kolmogorov-Smirnov (K-S) test). (b) Same as (a), but the crosstalk strength distributions with respect to each sender are for all genes in HCT116 and for the set of "top 500 shared miRNA" genes, also in HCT116. These genes show a significant increase in CS compared to the 'all genes' background set (p < 10^-5 and p < 10^-4 for CS_Pten and CS_Vapa respectively, K-S test). (c) Genes that share the most binding sites with the respective senders were grouped into bins based on their number of shared binding sites (indicated on the x-axis). Only receivers with CS_HCT116 > CS_DICER were selected. The median crosstalk strength in each bin is reported for each sender. The distribution of CS in each bin was significantly different from that in the preceding bin, with all p-values less than 10^-3 (K-S test). Each bin had at least 90 genes.

2.1.5 miRNAs hierarchically contribute to transmitting crosstalk

With these quantitative genome-wide measurements of crosstalk strength, we next turned to measuring the ability of a miRNA to transmit crosstalk. Given that different miRNAs vary in their concentrations, binding affinities and target abundances, all of which modulate their ability to transduce crosstalk, we wished to dissect their individual contributions. To determine which miRNAs were involved in mediating crosstalk, and to what extent, we developed a metric to quantify the bulk effect of sender knockdown on the predicted targets of a miRNA. Rather than evaluating the crosstalk strength of a particular target of a miRNA, our metric characterizes the cumulative, concordant variation of all its target genes.
Specifically, for each miRNA, we calculated the difference between the median CS of its targets (genes that contain a predicted binding site for that miRNA) and that of its non-targets (genes that do not contain a binding site for that miRNA):

$$\mathrm{CT\ power}_{\mathrm{miRNA}_i} = \operatorname{median}\left(\mathrm{CS}_{\text{targets of miRNA}_i}\right) - \operatorname{median}\left(\mathrm{CS}_{\text{non-targets of miRNA}_i}\right) \qquad (2.7)$$

We term this shift in the CS distribution of targets versus non-targets the crosstalk power of each miRNA (Figure 2.6a). Note that we do not require the crosstalk strength of a particular gene to be statistically significant, as we are interested in the cumulative effect of a miRNA on all its targets. Reassuringly, the crosstalk power of conserved miRNAs that have known binding sites on Pten and Vapa (again, not Cnot6l) is greater than that of miRNAs not predicted to have binding sites on these senders (Figure 2.6c). This suggests that crosstalk power can be used to discriminate between sender-targeting and non-sender-targeting miRNAs. The crosstalk power of the sender-targeting miRNA families is shown in Figure 2.6b. We found that miRNAs differ considerably in their ability to transmit crosstalk, as exemplified by miR-374ab and miR-875, which emerged as the miRNAs with the greatest Pten and Vapa crosstalk power respectively. Strikingly, almost all Pten-targeting miRNAs have positive CT power, including many miRNAs with greater crosstalk power than miR-17, miR-19, miR-20a and miR-26a, which have previously been shown to directly mediate crosstalk for Pten [Poliseno 2010]. Pten therefore has the ability to transmit crosstalk by sequestering many different miRNAs, allowing it to interact promiscuously with ceRNAs. In general, however, not all miRNAs predicted to have binding sites on the senders necessarily have positive crosstalk power, nor do all miRNAs with positive crosstalk power necessarily have binding sites on the senders.
For example, we uncovered 92 different miRNAs with positive crosstalk power for Pten and 67 different miRNAs with positive crosstalk power for Vapa that do not have any predicted binding sites on the respective genes. One factor that our model suggests can influence the ability of a miRNA to transduce crosstalk for a sender is the cumulative number of its binding sites sequestered by that sender. A more effective sender of crosstalk would sequester many miRNAs (higher Kd) but would only be weakly repressed by them, enabling a greater contribution to free miRNA pools when the sender is perturbed. However, it is very difficult to quantify miRNA sequestration experimentally. We therefore estimated the sequestration fraction bioinformatically for each of the miRNAs that target Pten (and similarly for Vapa and Cnot6l). To do so, we calculated the ratio of the number of predicted TargetScan binding sites on Pten (scaled by Pten's expression) to the number of predicted TargetScan binding sites for that miRNA on all its other targets (scaled by their expression). This ratio quantifies, for each miRNA, the fraction potentially sequestered by Pten. As expected from the model, we find that miRNA crosstalk power for each sender is strongly correlated with the fraction of miRNA binding sites sequestered by the sender (Figure 2.6d). Notably, the miRNAs most sequestered by Pten also tend to have the greatest crosstalk power (miR-374ab, miR-410).
[Figure 2.6 panels: (a) PTEN crosstalk strength for let-7 targets vs non-targets; (b) PTEN and VAPA miRNA CT power; (c) cumulative distributions of miRNA crosstalk power, sender-targeting vs background miRNAs; (d) miRNA crosstalk power vs % miRNA sequestration (rho = 0.37 for PTEN, rho = 0.35 for VAPA)]

Figure 2.6 | Dissecting relative contributions of miRNAs in transmitting crosstalk. (a) Histogram of Pten crosstalk strength in HCT116 for predicted TargetScan targets of let-7 (red) and its non-targets (gray). The bulk contribution of let-7 in transmitting Pten crosstalk to all of its targets can be estimated by the difference between the medians of the two distributions. We defined this difference as the "crosstalk power" (CT power) of the miRNA let-7 for the sender Pten. Crosstalk power can similarly be calculated for all 153 conserved miRNA families expressed in HCT cells from each of the sender crosstalk strength distributions. CT power is larger for miRNAs whose targets suffer a large overall repression when the sender is knocked down. Only genes with CS HCT > CS DICER were considered. (b) miRNA CT power for all miRNAs that target Pten and Vapa shows the differential ability of sender-targeting miRNAs to transmit crosstalk. miRNAs with negative CT power are those whose targets tend to be up-regulated when the sender is knocked down, and are thus unlikely to be involved in the ceRNA effect. miRNAs that have shared binding sites in all three senders are in bold. See [table] for miRNA CT power and p-values for all 153 miRNA families. (c) Cumulative distributions of miRNA crosstalk power for all miRNAs that target the sender (red) compared with miRNAs that do not target the sender (black). (d) miRNA CT power for sender-targeting miRNAs is correlated with the miRNA's fraction of binding sites on the sender, i.e. its sequestration fraction. miRNAs with higher CT power also tend to be (relatively) highly sequestered by the sender.
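Equation (2.7) and the sequestration ratio of Section 2.1.5 reduce to a few lines of code. The sketch below assumes hypothetical inputs: a dict of per-gene CS values (already restricted to genes with CS HCT > CS DICER, as in Figure 2.6a), the set of predicted targets of one miRNA, a per-gene count of that miRNA's predicted sites, and per-gene RPKM. Following the text, the "sequestration fraction" is computed as the ratio of expression-weighted sites on the sender to expression-weighted sites on all other targets; the function names are illustrative only.

```python
import statistics


def crosstalk_power(cs, mirna_targets):
    """Eq. (2.7): median CS of a miRNA's predicted targets minus that of non-targets.

    cs: {gene: crosstalk strength}; mirna_targets: set of genes with a predicted site.
    """
    targets = [v for g, v in cs.items() if g in mirna_targets]
    non_targets = [v for g, v in cs.items() if g not in mirna_targets]
    return statistics.median(targets) - statistics.median(non_targets)


def sequestration_ratio(sender, site_counts, expression):
    """Expression-weighted sites of one miRNA on the sender / on all its other targets.

    site_counts: {gene: number of predicted sites for this miRNA}
    expression: {gene: RPKM}
    """
    on_sender = site_counts.get(sender, 0) * expression[sender]
    elsewhere = sum(n * expression[g] for g, n in site_counts.items() if g != sender)
    return on_sender / elsewhere
```

Computing both quantities for every sender-targeting miRNA and correlating them gives the comparison plotted in Figure 2.6d.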
2.1.6 Pten miRNAs have the greatest crosstalk power due to high [miRNA]:target abundance ratios

A recent study using Argonaute CLIP assays [Bosson 2014] has shown that higher miRNA:target ratios are correlated with higher Argonaute binding on genes and, consequently, greater miRNA repression. It has been experimentally demonstrated that only the most abundant miRNAs mediate significant repression, suggesting that ceRNAs escape repression by miRNAs present at low concentrations. Similarly, previous analyses of miRNA repression showed that miRNAs with lower miRNA:target abundance ratios deliver minimal repression [Garcia 2011; Arvey 2010]. A possible explanation is that lowly expressed miRNAs have a low probability of finding their target sites on transcripts, because miRNA-target encounters occur by mass action. Additionally, when microRNAs expressed at a low level have hundreds of different targets (i.e. high target abundance), a single miRNA would have a limited repressive impact on any one gene. We sought to investigate differences in the relative miRNA and target levels for our three senders. We first obtained miRNA expression profiles in HCT116 from a miRNA microarray [Yan 2011] and found that the average expression of miRNAs that target Pten was greater than that of Vapa- or Cnot6l-targeting miRNAs. Surprisingly, even though Cnot6l has almost twice as many predicted miRNA binding sites as Pten (44 and 24 respectively; Figure 2.7a), the average expression of Pten-targeting miRNAs is almost four times greater than that of Cnot6l-targeting miRNAs (Figure 2.7c). We next estimated the target abundance (TA) of each miRNA by summing the predicted 6-mer, 7-mer and 8-mer binding sites on each of its targets, scaled by the RPKM expression of that target in HCT116 in our data [following Bosson 2014]. We then averaged the target abundance over the miRNAs targeting each of Pten, Vapa and Cnot6l.
Interestingly, we found the opposite hierarchy among the 3 senders: Pten had the lowest TA, while Cnot6l had an average TA about 10-fold higher (Figure 2.7b). Thus, Pten has the greatest [miRNA]:TA ratio of the three, leading us to hypothesize that miRNAs targeting Pten might confer greater repression on their targets than those targeting Vapa and Cnot6l, rendering Pten ceRNAs more susceptible to crosstalk.

[Figure 2.7 panels: (a) number of predicted targeting miRNA families; (b) average target abundance of targeting miRNAs; (c) average expression of targeting miRNAs; (d) median miRNA crosstalk power for PTEN, VAPA and CNOT6L; (e) miRNA crosstalk power vs [miRNA]:target abundance for targeting and background miRNAs]

Figure 2.7 | Greater miRNA:target ratios underlie Pten's superior ability to send crosstalk. (a) TargetScan-based prediction of the number of different miRNA families targeting each of Pten, Vapa and Cnot6l. (b) Average target abundance of sender-targeting miRNAs (in log10 units). Target abundance (TA) for each of the 153 human conserved miRNAs was calculated (see Methods) by summing the predicted 6-mer, 7-mer and 8-mer binding sites on each of its targets, scaled by target expression in HCT116. (c) Average miRNA expression of each set of sender-targeting miRNAs. miRNA expression in HCT116 cells is from a miRNA microarray dataset (Yan 2011) and is in relative units. Pten is targeted by more highly expressed miRNAs than Cnot6l. (d) Median crosstalk power for all miRNAs that target Pten, Vapa and Cnot6l respectively. Pten miRNAs have greater crosstalk power. Crosstalk power for each miRNA (for each sender) was calculated from the crosstalk strength distribution as described in the text. (e) miRNAs with greater crosstalk power also have higher [miRNA]:target ratios, as exemplified by Pten, which has the greatest [miRNA]:target ratio and miRNA crosstalk power of the three senders.
Targeting miRNAs (red) are all those miRNAs that target the sender. Background miRNAs (white) are miRNAs that do not have predicted binding sites on the sender.

We used miRNA crosstalk power as a proxy for repression and ranked the senders by our experimentally determined crosstalk power for each miRNA. Pten clearly emerged as the best sender of crosstalk: its miRNAs had much greater crosstalk power than those of the other two senders, Vapa and Cnot6l (Figure 2.7d), with about twice the median crosstalk power of Vapa's miRNAs. In fact, for each sender, we observed that the miRNAs targeting it had, on average, both greater [miRNA]:TA and greater crosstalk power than the background set of miRNAs that did not target it (Figure 2.7e). We therefore conclude that senders such as Cnot6l, which are targeted largely by low-abundance miRNAs with comparatively more targets, have a much weaker ability to transmit crosstalk than a sender such as Pten. However, we are only making a comparative claim between the senders: because neither the miRNA expression data nor the target abundance estimates are absolute concentrations, we cannot be certain whether miRNA concentrations are in excess of the target pool size or vice versa. Moreover, we observed no correlation between [miRNA]:TA ratio and miRNA crosstalk power among the Pten miRNAs alone, only an overall correspondence between the median miRNA crosstalk power and the average [miRNA]:TA of the three senders. We found individual miRNAs that had high Pten miRNA crosstalk power but a low [miRNA]:TA, and vice versa.
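The target-abundance and [miRNA]:TA estimates of Section 2.1.6 can be sketched as follows. The inputs are hypothetical dictionaries: per-target (6-mer, 7-mer, 8-mer) site counts for each miRNA, target RPKM, and relative miRNA expression (for example from a microarray). Because the text does not state whether averages were taken before or after the log10 transform shown in Figure 2.7b, this sketch averages on the linear scale; the function names are illustrative only.

```python
import statistics


def target_abundance(site_counts, expression):
    """TA of one miRNA: sum over targets of (6-mer + 7-mer + 8-mer sites) x target RPKM.

    site_counts: {gene: (n_6mer, n_7mer, n_8mer)}; expression: {gene: RPKM}.
    """
    return sum(sum(n) * expression.get(g, 0.0) for g, n in site_counts.items())


def sender_summary(sender_mirnas, mirna_expr, sites_by_mirna, expression):
    """Mean miRNA expression, mean TA and mean [miRNA]:TA over a sender's miRNAs."""
    ta = {m: target_abundance(sites_by_mirna[m], expression) for m in sender_mirnas}
    return {
        "mean_expr": statistics.mean(mirna_expr[m] for m in sender_mirnas),
        "mean_TA": statistics.mean(ta[m] for m in sender_mirnas),
        "mean_ratio": statistics.mean(mirna_expr[m] / ta[m] for m in sender_mirnas),
    }
```

Calling `sender_summary` once per sender gives the three quantities compared across Pten, Vapa and Cnot6l in Figure 2.7b, c and e; since both inputs are in relative units, only the ranking between senders is meaningful, as noted above.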
2.1.7 Transfecting Pten UTR as a sponge de-represses putative ceRNAs in a dose-dependent and miRNA-dependent manner

As Pten emerged as the strongest sender of crosstalk in the siRNA knockdown experiments, we wanted to exclude the possibility that transcriptional regulation via the PTEN protein, a tumour suppressor and key member of the PI3K-AKT-mTOR pathway, was a factor in the widespread crosstalk changes observed. We sought to answer two questions: a) whether miRNA binding sites on the Pten 3'UTR were directly responsible for the crosstalk effects, and b) to what extent Pten ceRNAs would be de-repressed by modulating the amount of Pten 3'UTR, i.e. by varying the level of MREs presented by an endogenous 3'UTR.

[Figure 2.8 panels: (a) pTRE-Tight bidirectional reporter carrying PTEN 3'UTR or NULL 3'UTR; (b) flow cytometry, about 20% transfection efficiency; (c) ZsGreen vs log10 mCherry [a.u.] for PTEN 3'UTR and NULL 3'UTR; (d) RT-PCR expression of Pten ceRNAs]

Figure 2.8 | Derepression of Pten ceRNAs is detected upon modulating the levels of Pten 3'UTR with a transiently transfected synthetic reporter construct. (a) A synthetic two-color reporter construct for measuring the effect of Pten 3'UTR sponging in single cells. The construct consists of a bidirectional tetracycline-responsive promoter that drives the transcription of two fluorescent reporter proteins, ZsGreen and mCherry. We fused the Pten 3'UTR to ZsGreen; the unmodified plasmid is used as a control (NULL 3'UTR). (b) Flow cytometry measurements of HCT116 cells transiently transfected with the Pten 3'UTR sponge plasmid and induced with doxycycline for 18h (plasmid-positive cells are in purple) indicate robust expression of the Pten 3'UTR sponge across 3 decades. Transfection efficiency is about 20%. (c) The Pten 3'UTR is under robust repression throughout the plasmid expression range. It exerts a strong influence on ZsGreen levels, as seen from the difference between the transfer functions of the Pten 3'UTR and NULL 3'UTR transfections.
Cells were binned by mCherry expression and the mean ZsGreen expression was calculated for each bin. (d) Total RNA from sorted cells (purple in (b)) carrying the Pten 3'UTR plasmid was probed for the expression of known Pten ceRNAs with RT-PCR. Expression of the indicated genes was normalized to their expression in untransfected cells. Data are mean ± s.e.m.

Constructing and transfecting the Pten 3'UTR reporter sponge into HCT116 cells

The Pten 3'UTR contains predicted sites for 25 different miRNA families (http://www.targetscan.org), and there is direct evidence from RNA immunoprecipitation for Pten regulation by miR-106ab, miR-130 and the miR-17-92 cluster (which encodes the microRNAs miR-17, -18, -19a, -19b, -20a and -92) in HCT 116 (Tay, 2011). To explore the genome-wide effects on competing RNAs of sponging away miRNAs with an endogenous 3'UTR, we adapted a plasmid reporter system previously developed in our lab [Mukherji 2011]. The plasmid contains two genes encoding fluorescent proteins (ZsGreen and mCherry), which are transcribed at identical levels from a common bidirectional tetracycline-inducible promoter, and contains multiple cloning sites for inserting any 3'UTR of interest (Figure 2.8a). To probe the effect of microRNA sponging, we constructed a variant carrying the entire Pten 3'UTR fused to the ZsGreen gene. We transfected this plasmid into HCT116 cells and used the original plasmid (without the Pten 3'UTR) as a control, which we call the NULL UTR. The mCherry/ZsGreen fluorescence from the NULL UTR construct serves as a control that allows us to isolate the effect of the Pten 3'UTR sponge alone. To induce the promoter with doxycycline, we co-transfected these cells with the rtTA plasmid, as HCT 116 cells do not endogenously produce the rtTA transcription factor.
Quantifying single-cell fluorescence with a flow cytometer 18 hours after induction, we observed robust expression of the Pten 3'UTR sponge construct across 3 orders of magnitude (Figure 2.8b). In principle, plasmid induction starts immediately after the addition of doxycycline, but we observed more derepression of the confirmed Pten ceRNAs Vapa, Cnot6l and SERINC1 after 18 h than after 12 h or 36 h [supp figure], possibly because of miRNA degradation timescales. To ascertain whether overexpressing the Pten 3'UTR from a synthetic construct could sponge away endogenous miRNAs, and thus derepress other targets, we measured its effect on some previously established Pten ceRNAs. Taking the ratio of ZsGreen to mCherry across bins of mCherry fluorescence in the flow cytometry measurements of both the Pten 3'UTR and NULL plasmid transfections allowed us to calculate the fold repression of the Pten 3'UTR across the transfection range. The Pten 3'UTR was under weak repression (up to 2-fold) throughout the transfection range (Figure 2.8c). FACS sorting only the mCherry-expressing cells and measuring the bulk RNA levels of the Pten ceRNAs Vapa, Cnot6l and SERINC1 by RT-PCR showed that they were de-repressed by 40-80%, confirming that the Pten 3'UTR sponge was functionally engaging miRNAs in the cell and competing with its known ceRNAs (Figure 2.8d). Additionally, in the RNA-seq analysis we could discriminate between reads from the Pten 3'UTR and from the Pten coding sequence, and found that Pten mRNA (CDS) was increasingly de-repressed as the exogenous Pten 3'UTR sponged miRNAs away from the endogenous Pten mRNA, consistent with our observation that the Pten 3'UTR was under mild repression (Figure 2.10b). Having observed an increase in Pten (coding sequence) expression across all the bins, we reasoned that the transfected Pten UTR sponges could also derepress other potential ceRNAs across the transcriptome.
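The per-bin fold-repression calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code behind Figure 2.8c: it assumes per-cell mCherry and ZsGreen intensities are available as lists for each transfection, and the function names, the log-spaced mCherry bin edges and the use of the mean per-cell ZsGreen/mCherry ratio are hypothetical choices made for the example.

```python
import math
from statistics import mean


def log_bin_edges(lo_decade, hi_decade, n_bins):
    """Hypothetical log-spaced mCherry bin edges from 10**lo_decade to 10**hi_decade."""
    step = (hi_decade - lo_decade) / n_bins
    return [10 ** (lo_decade + i * step) for i in range(n_bins + 1)]


def binned_ratio(mcherry, zsgreen, edges):
    """Mean per-cell ZsGreen/mCherry ratio within each mCherry bin [lo, hi)."""
    ratios = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [z / m for m, z in zip(mcherry, zsgreen) if lo <= m < hi]
        ratios.append(mean(in_bin) if in_bin else math.nan)  # empty bin -> nan
    return ratios


def fold_repression(null_cells, pten_cells, edges):
    """Fold repression of the Pten 3'UTR at matched mCherry levels.

    Both reporters share one bidirectional promoter, so mCherry reports
    transcription and the NULL ratio / Pten-3'UTR ratio isolates the 3'UTR
    effect. Each argument is a (mcherry, zsgreen) pair of per-cell lists.
    """
    null_r = binned_ratio(*null_cells, edges)
    pten_r = binned_ratio(*pten_cells, edges)
    return [n / p for n, p in zip(null_r, pten_r)]


# Toy data (not measurements): the 3'UTR halves ZsGreen at every mCherry level.
mch = [10, 20, 100, 200, 1000, 2000]
null = (mch, [m * 1.0 for m in mch])
pten = (mch, [m * 0.5 for m in mch])
print(fold_repression(null, pten, log_bin_edges(1, 4, 3)))  # [2.0, 2.0, 2.0]
```

Comparing the two constructs at matched mCherry, rather than at matched plasmid dose, is what makes the NULL construct a proper control for promoter activity and copy number.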
FACS sorting cells with varying amounts of Pten 3'UTR for the RNA-seq assay
To isolate cell populations expressing varying amounts of Pten 3'UTR, we used FACS sorting. We used mCherry fluorescence intensity to bin cells with similar transcriptional activity (e.g. due to varying plasmid copy numbers), and hence similar levels of the Pten 3'UTR sponge. For both the Pten 3'UTR and NULL 3'UTR transfections, we FACS sorted 100,000 live cells into 4 different bins spanning 3 orders of magnitude (Figure 2.10a; see Methods), extracted RNA (500-1000 ng per bin) and quantified the transcriptome of each bin by RNA sequencing. Estimating fold changes was not straightforward: plasmid transcripts accounted for up to 30% of the total reads in bins 2 and 3 (Figure 2.9b), and, because of repression, the expression of the Pten 3'UTR and ZsGreen differed substantially between each Pten UTR bin and its NULL counterpart. Even after explicitly removing the reads coming from the plasmid and performing RPKM normalization, we observed an offset in the overall distribution of fold changes (Pten UTR/NULL) in bins 1, 2 and 3 (Figure 2.9c). We therefore used the more appropriate TMM (trimmed mean of M-values) normalization method [Robinson 2010] to estimate a scale factor that removes this offset. After doing so, we could reliably measure the fold changes of the transcriptome in each bin, defined as the ratio of the TMM-normalized expression in each Pten 3'UTR bin to that in the corresponding NULL UTR bin. With reliable estimates of the fold changes caused by the sponging effect of the Pten UTR, we explored the concordance between genes identified as putative Pten ceRNAs by the siPten knockdown and genes derepressed by Pten UTR overexpression. Such genes would be sensitive to the levels of Pten, and would therefore be very likely to interact with Pten through the crosstalk mechanism.
To identify these genes, we first obtained the distributions of null fold changes from technical replicates in both the siPten knockdown and the Pten 3'UTR RNA-seq data, and defined null thresholds for fold change (FC) and crosstalk strength (CS) as 1 standard deviation above 0 (Figure 2.10c). We only considered genes whose FC and CS were both above these thresholds. These genes were therefore both reduced when Pten was knocked down and de-repressed when miRNAs were sponged away by the Pten 3'UTR, making them sensitive to perturbations in Pten levels in both directions. We found 2305 genes meeting these criteria in bin 0, 2493 in bin 1, 2090 in bin 2 and 2470 in bin 3.
Figure 2.9 | Normalization is required for FACS-sorted RNA-seq data, as reads from the plasmid occupy a large percentage of total sequencing reads, leading to an overall offset in fold changes. (a) Schematic of two libraries A and B in which a small set of genes in library B has an enormous number of sequencing reads, thereby reducing the sequencing "real estate" for the rest of the genes. An overall scale factor is required to normalize the library sizes. (b) Proportion of plasmid reads (mCherry + ZsGreen + Pten 3'UTR) among the total sequencing reads from the indicated sorted bins (in the Pten 3'UTR and NULL 3'UTR data sets). Total RNA output from each bin is quite different, with reads from the plasmid taking up increasing sequencing real estate. (c) M (fold change) versus A (average expression) plots comparing RPKM values from the Pten UTR and NULL datasets for each bin show a clear offset from zero in bins 1, 2 and 3 (left panel).
Genes indicated in red lie in the middle 40% of M values and the middle 90% of A values, and are used to estimate the TMM factor as described in the Methods. The green line shows the estimated TMM factor, which is offset from zero in bins 1, 2 and 3. The right panel shows the same M-A plots with the offset removed after normalizing the fold changes with the TMM factors. The y-axis is in log scale. (d) The TMM normalization factors estimated in (c) are used to normalize the library size of the respective bin.
We hypothesized that these genes were the most likely to be 'robust Pten ceRNAs', and thus would show a bin-dependent signature of crosstalk. Indeed, we observed increasing derepression of these robust Pten ceRNAs as more Pten UTRs were expressed in the system (Figure 2.10d). Notably, these robust Pten ceRNAs have a median fold change of 0.19 even when very few Pten UTR sponges are present (bin 1), and the median fold change increases to 0.27 when 10^3-fold more Pten UTR sponges are present (bin 3). To verify that the fold changes we observed in the transcriptome with the Pten 3'UTR plasmid were miRNA-dependent, we examined whether the magnitude of derepression in each bin was correlated with the overall functional efficiency of the miRNA binding sites (based on the context+ score of each site). We relied on our siPten knockdown data to select the miRNAs involved in Pten crosstalk with high confidence. We had established that miR-17, miR-19a, miR-20a and miR-130 were both strongly involved in transmitting Pten crosstalk to their other targets and highly enriched among the putative Pten ceRNAs. Moreover, they have been shown to physically bind the Pten 3'UTR in RNA immunoprecipitation (RIP) assays [Poliseno 2010].
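The selection of 'robust Pten ceRNAs' described above (genes above the 1-standard-deviation null threshold in both the siPten crosstalk-strength data and a Pten 3'UTR bin) can be sketched as a simple set intersection. This is a hedged illustration: the per-gene dictionaries, the replicate-derived null changes and the gene names are hypothetical, and using the sample standard deviation of replicate-to-replicate changes as the threshold is our reading of the description rather than the exact analysis code.

```python
from statistics import stdev


def null_threshold(replicate_changes):
    """Assumed threshold: sample SD of changes between technical replicates."""
    return stdev(replicate_changes)


def robust_cernas(cs, fc, cs_null, fc_null):
    """Genes whose CS and FC both exceed their null thresholds.

    cs: gene -> Pten crosstalk strength (siPten knockdown)
    fc: gene -> log2 fold change (Pten 3'UTR / NULL) in one sorted bin
    cs_null, fc_null: changes observed between technical replicates
    """
    t_cs = null_threshold(cs_null)
    t_fc = null_threshold(fc_null)
    shared = cs.keys() & fc.keys()  # genes measured in both experiments
    return sorted(g for g in shared if cs[g] > t_cs and fc[g] > t_fc)


# Hypothetical genes and values, for illustration only.
cs = {"A": 0.5, "B": 0.1, "C": 0.3, "D": 0.6}
fc = {"A": 0.4, "B": 0.5, "C": 0.1, "E": 0.9}
print(robust_cernas(cs, fc, cs_null=[-0.1, 0.1], fc_null=[-0.2, 0.2]))  # ['A']
```

Running the same selection once per sorted bin yields one robust-ceRNA set per bin, which is how the per-bin counts above would be obtained under these assumptions.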
Using our cleaner Pten 3'UTR overexpression system, we investigated whether these miRNAs were being directly sequestered by the Pten 3'UTR and thus derepressing their targets, since such a dependence would produce a bin-dependent increase in the fold changes of their targets. We analyzed the relationship between the bin-dependent derepression of the targets of these miRNAs and their site number, site type (6-, 7- and 8-nt sites), site position and the other determinants used by TargetScan to calculate total context+ scores of predicted miRNA targets [Lewis 2005; Garcia 2011]. Binding sites with greater context+ scores have been shown to be effectively bound and repressed by miRNAs. When the predicted targets of miR-17, miR-19a, miR-93 and miR-130 were grouped into 4 context+ score bins and the distributions of fold changes were plotted, increasing target derepression was clear in bins 2 and 3, but not in bins 0 and 1 (Figure 2.10e). Thus, higher-affinity miRNA binding sites in a receiver lead to greater crosstalk strength, even for a fixed amount of the sender (Pten 3'UTR level) in each bin.
Figure 2.10 | Transfecting Pten UTR as a sponge derepresses putative ceRNAs in a dose-dependent and miRNA-dependent manner. (a) Schematic of FACS sorting: cells are transfected with bidirectional plasmids expressing mCherry and ZsGreen, with the Pten 3'UTR or without it (NULL 3'UTR).
The transfected cells are sorted on the flow sorter into 4 bins according to mCherry expression and collected for downstream RNA-seq. (b) Expression (in RPKM) of mCherry, ZsGreen, the Pten 3'UTR and the Pten coding sequence in each bin for cells transfected with the Pten 3'UTR plasmid. The Pten coding sequence (RPKM) is increasingly upregulated in each bin, indicating that the Pten 3'UTR plasmid is capable of sponging away miRNAs. (c) Distribution of RNA-seq fold changes for bin 1 (WT/NULL) and Pten CS. We refine potential Pten ceRNAs by intersecting the sets of genes repressed in the Pten knockdown and derepressed in the Pten UTR transfection. A threshold for null changes ("OFC" or "OCS") is determined as 1 standard deviation of the fold change in the technical replicates (gray bar). Only genes with positive Pten crosstalk strength (+CS) and positive fold changes in each bin are considered 'robust Pten ceRNAs', as they are sensitive to Pten levels in both directions, i.e. they are reduced when Pten is knocked down and de-repressed when Pten 3'UTR sponges are introduced. (d) Cumulative distributions of fold change for genes in the intersection of the two datasets. Inset shows the median fold change for robust ceRNAs in each bin. Robust ceRNAs are increasingly derepressed in each bin. P-values for the difference in medians between each bin and the preceding bin were calculated with the Wilcoxon rank-sum test (P < 0.05 (bin 1), P < 0.01 (bin 2), P < 10^-16 (bin 3)). (e) Cumulative distributions of fold changes for all targets of the indicated Pten miRNAs (those enriched in the list of Pten ceRNAs from the knockdown dataset) with increasing context+ scores (colour).
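The bin-to-bin comparison in panel (d) relies on the Wilcoxon rank-sum test. A self-contained sketch of that test is shown below, using the large-sample normal approximation with average ranks for ties; omitting the tie correction of the variance is an assumption of this sketch. In practice a library routine would normally be used, for example SciPy's scipy.stats.ranksums, which implements the same normal approximation. The fold-change lists are toy values, not data from this study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation (no tie correction).

    Returns (W, z, p), where W is the rank sum of x in the pooled sample.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return w, z, p


# Toy fold changes of robust ceRNAs in two consecutive bins (hypothetical).
bin2 = [0.10, 0.15, 0.20]
bin3 = [0.25, 0.30, 0.35]
w, z, p = rank_sum_test(bin2, bin3)
print(w, round(z, 3), round(p, 3))
```

With thousands of robust ceRNAs per bin, as here, the normal approximation is accurate, which is why very small P-values such as 10^-16 can arise from modest shifts in the median.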
2.2 Discussion and conclusions
Recent experimental studies have suggested that miRNA-mediated competition between RNAs could be a new channel of post-transcriptional gene regulation, and that such RNA-RNA 'crosstalk' affects many different biological contexts. Our study represents the first genome-wide measurement of crosstalk strength in response to the knockdown of three different genes. Previous studies of the ceRNA hypothesis have concentrated on only one or a few targets of a miRNA, even though a perturbation in a ceRNA that changes miRNA activity is expected to affect hundreds of targets. Quantifying the magnitude of crosstalk has also proven challenging, as existing studies rely on qPCR or luciferase assays, both of which have difficulty extracting precise fold changes because of primer/enzyme efficiency or amplification biases. To fully test the generality and magnitude of the miRNA-mediated crosstalk hypothesis, it is necessary to perform perturbation experiments that show how altering the expression level of one 'sender' mRNA affects other 'receiver' mRNAs regulated by the same miRNAs. It is therefore essential to measure crosstalk strength transcriptome-wide using a quantitative assay. Knocking down an individual mRNA is expected to affect the transcriptome widely, making it difficult to isolate miRNA-mediated effects. We therefore used the DICER-/- cell line, which has depleted levels of mature miRNAs, to isolate only those fold changes that were miRNA-dependent. Careful scrutiny of RNA-seq crosstalk strength measurements in HCT116 and DICER-/- cells yielded a high-confidence set of putative ceRNAs for each of the three senders. We then asked whether this cohort of putative ceRNAs was actually miRNA-mediated and hence consistent with a ceRNA effect. Firstly, we found them to be enriched in miRNA binding sites for their respective senders.
Secondly, the hypothesis implies that the crosstalk effect should be stronger for genes that share more miRNA binding sites with the sender; to the best of our knowledge, however, this feature has not been demonstrated experimentally. We binned our putative ceRNAs by the number of shared miRNA binding sites and found that their crosstalk strength was correlated with the number of binding sites they share with their senders. We suspect that this implies that multiple miRNAs act cooperatively on receivers. Thirdly, different miRNAs have different total numbers of binding sites in the transcriptome and are sequestered to different extents by each sender, and should therefore differ in their ability to transmit crosstalk ("crosstalk power"). By considering the difference between the CS distributions of targets and non-targets, we ranked the crosstalk power of all miRNAs expressed in the cell line. Several miRNAs besides miR-17, miR-19 and miR-26 (tested by Tay (2011)) have greater PTEN crosstalk power, so we suggest that manipulating the levels of the more highly ranked miRNAs in our list would be more effective in future studies. For example, because PTEN ceRNAs have known oncogenic effects, miR-374ab, which we found has the highest crosstalk power for PTEN, could be a useful target for miRNA-based cancer therapies (Cai 2013). The studies that originated the crosstalk hypothesis computationally identified many possible ceRNA candidates but experimentally tested only a few genes. Our data indicate that ceRNAs are pervasive across the transcriptome and broadly expressed (over a 3-decade range of RPKM). The functional relevance of such a broad class of ceRNAs may be the concordant buffering of many genes involved in similar biological functions. Indeed, in a GO-term analysis of putative ceRNAs, we observe functional roles shared between the Pten, Vapa and Cnot6l ceRNAs and their respective senders.
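The ranking of miRNAs by crosstalk power discussed above compares the CS distributions of each miRNA's predicted targets and non-targets. The text does not specify the exact statistic, so the sketch below assumes, purely for illustration, that crosstalk power is the difference between the median CS of targets and that of non-targets; the gene names, target sets and CS values are hypothetical.

```python
from statistics import median


def crosstalk_power(cs, targets):
    """Assumed statistic: median CS of targets minus median CS of non-targets."""
    hit = [v for g, v in cs.items() if g in targets]
    miss = [v for g, v in cs.items() if g not in targets]
    return median(hit) - median(miss)


def rank_mirnas(cs, targets_by_mirna):
    """Sort miRNAs from highest to lowest (assumed) crosstalk power."""
    powers = {m: crosstalk_power(cs, t) for m, t in targets_by_mirna.items()}
    return sorted(powers.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical CS values for one sender and hypothetical target sets.
cs = {"g1": 0.4, "g2": 0.3, "g3": 0.0, "g4": -0.1}
targets = {"miR-X": {"g1", "g2"}, "miR-Y": {"g3", "g4"}}
for mirna, power in rank_mirnas(cs, targets):
    print(mirna, round(power, 2))
```

A rank-based statistic such as the median difference is less sensitive to a handful of extreme CS values than a difference in means, which is the main reason for choosing it in this sketch.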
Such covariation among a broad class of ceRNAs when a sender is perturbed could help maintain stoichiometric balance within pathways. However, we caution that it is difficult to construct a null model for whether these covariations are themselves caused by transcriptional changes in the sender, so some of our candidate ceRNAs may not be true ceRNAs. A limitation of our analysis is its dependence on DICER-/- cells to control for spurious crosstalk effects that result purely from transcriptional network perturbations. For example, it is well known that regulatory network structures such as incoherent feed-forward loops can produce positive correlations between an mRNA and its targeting miRNA or transcription factor (Tsang 2007). How many of the ceRNA candidates identified in our analysis are directly repressed by targeting miRNAs is currently unknown. Detailed experimental work is needed to examine these candidate ceRNAs; in particular, miRNA binding assays and siRNA knockdown experiments can provide more conclusive evidence for ceRNA interactions in individual sender-receiver pairs. Combining our transcriptomic analysis with biochemical assays that identify binding partners will enable a better understanding of the crosstalk mechanism. The size of the ceRNA effect has widely been considered larger than expected from purely steady-state target competition, given the typically large number of targets (Broderick & Zamore 2014). We find that crosstalk strength, though substantial, is usually less than 0.5 for most genes, and is generally bounded by 1 for the three senders we tested. Crosstalk strength is nevertheless larger than we expected from most sequestration models (including ours). For example, we estimate the sequestration of most miRNAs by Pten to be less than 1%, and Pten mRNA repression to be at most 2-fold (based on the Pten 3'UTR sponging data). Since $\mathrm{CS} < \mathrm{Sequestration} \times \mathrm{Repression}$, this implies $\mathrm{CS} < 0.01 \times 2 = 0.02$, i.e. CS < 2%.
To explain the relatively large CS magnitude, we suggest two possibilities. Firstly, it remains unclear whether the total binding sites for a miRNA are truly in excess of miRNA concentrations locally. Estimates of the average total number of binding sites in the cell might be irrelevant to individual miRNA-target interactions, which depend on local miRNA and target concentrations. A recent theoretical study (Figliuzzi 2013) also finds that substantial crosstalk requires a small number of competing target sites. Its authors propose that ceRNA function may require a channel of 'stoichiometric decay', in which a bound miRNA is destabilized or functionally depleted by other mechanisms, such as trapping in P-bodies. Secondly, the topology of the ceRNA-miRNA network may play an important role, as strongly interconnected sender:miRNA:receiver subnetworks could enhance crosstalk. For example, the miRNAs most strongly implicated in PTEN crosstalk in our data (miR-17 and miR-19) are co-transcribed from a polycistronic cluster and tend to have similar sets of targets, suggesting that their repressive effects can be amplified across a large number of ceRNAs (Yip 2014). The discrepancy between the ceRNA effect we detect by overexpressing the PTEN 3'UTR (even at low amounts) and the lack of any detectable ceRNA effect upon overexpression of synthetic miR-122 seed sites (Denzler 2014) may have at least two causes. Firstly, we used a full-length endogenous PTEN 3'UTR (3.3 kb) containing binding sites for 25 miRNA families, whereas Denzler et al. used a short (125 bp) AldoA mRNA with a single miRNA binding site (miR-122). The functionality of miRNA target sites is affected by numerous 3'UTR properties, including the presence of multiple target sites in close proximity (Grimson 2007, Broderick 2011), the position of the site in the 3'UTR (Majoros 2007) and the synergistic repression of multiple miRNAs (Lai 2012).
Thus an endogenous 3'UTR with multiple miRNA sites could have greater sequestration and miRNA-repression ability than AldoA. Moreover, in our system the PTEN 3'UTR sponge is under constant repression (fold changes of 2-3 for Pten), unlike the miR-122 sponge, which exhibited a loss of repression at higher induction levels (from 2-fold to 0.1), suggesting that miR-122 was saturated. Other endogenous 3'UTRs we measured also showed constant repression at all induction levels, 2-fold for Wee1 and 5-fold for Lats2 (cf. Chapter 4, Schmiedel 2015), suggesting that endogenous 3'UTRs carrying more seed sites attract more miRNA repression. Secondly, the ceRNA effect depends on the cellular concentrations of miRNAs, and our HCT116 cancer cell line has a different miRNA expression profile from primary cells. In particular, the oncogenic miR-17-92 cluster, which we found to have high PTEN crosstalk power, is known to be significantly upregulated in HCT116 cells (Wang 2008).
2.3 Methods and Materials
2.3.1 Cell culture and siRNA transfection
The HCT116 colorectal cancer cell line was obtained from ATCC (American Type Culture Collection). The HCT116 DICER^ex5 -/- cell line was a kind gift from Dr. B. Vogelstein and was generated as described previously (Cummins 2006). HCT116 wild-type and HCT116 DICER-/- cells were grown in ATCC-formulated McCoy's 5A Medium Modified (Catalog No. 30-2007) supplemented with 10% (v/v) FBS, penicillin/streptomycin (Gibco) and L-glutamine at 37°C in a humidified atmosphere with 5% CO2. Cells were grown adherently in 10 cm dishes or 6-well plates at a seeding density of 1.0 x 10^5 cells/cm^2 until they were 50% confluent (40-50 hours), after which they were trypsinized, re-plated and transfected with 25 nM siRNA for 24 hours.
Transfection of siRNA oligonucleotides was performed with DharmaFECT lipid transfection reagent according to the manufacturer's protocols. siRNAs were purchased from Dharmacon as SMARTpools. Titration of the siRNA and the transfection reagent was performed (data not shown), and the lowest working amounts of the siRNA (25 nM) and the transfection reagent were used in the present study. With this protocol, more than 90% of cells were positive for the fluorescent siGLO RISC-free control siRNA. The reagents used in this study are listed below. A master mix was prepared for each condition to eliminate pipetting errors and increase consistency between wells. Each siRNA was transfected in triplicate in both HCT116 and DICER-/- cells, and all knockdown experiments were performed simultaneously to avoid an additional source of variation. After 24 hours, cells were harvested for the various assays.
Reagent | Source
McCoy's 5A Medium; Fetal Bovine Serum (FBS) | ATCC (30-2007, 30-2020)
Trypsin | ATCC (30-2101)
siGENOME non-targeting siRNA pool 1 | Dharmacon (Catalog D-001206)
5X siRNA buffer | Dharmacon (Catalog B-002000)
SMARTpool si-Pten | Dharmacon (Catalog M-003023)
SMARTpool si-Pten | Dharmacon (Catalog M-021382)
SMARTpool si-Pten | Dharmacon (Catalog M-016411)
2.3.2 RNA extraction
Total RNA was extracted from cells using TRIzol reagent for the RT-PCR assay or using RNeasy (Qiagen) for RNA-sequencing assays, following the manufacturer's protocols. RNA pellets were resuspended in 20 µl RNase-free sterile water, and RNA quantity was assessed spectrophotometrically using a NanoDrop ND-1000 UV-Vis Spectrophotometer (Thermo Fisher). The RNA integrity number (RIN) was assessed with an Agilent 2100 Bioanalyzer to verify RNA quality for all experimental samples. Only samples with RIN > 9 were used for sequencing.
2.3.3 RT-PCR
mRNA levels of various transcripts were measured using RT-PCR.
Reverse transcription into cDNA was performed using a First Strand Synthesis kit (Invitrogen). RT-PCR was performed in triplicate reactions using SYBR Green mix (Applied Biosystems) on an Applied Biosystems 7500 Real-Time PCR instrument. Levels of the various genes after siRNA knockdown were quantified with the ΔΔCt method, using human Actin for normalization. The primers used are listed in a supplementary table.
2.3.4 Reporter plasmid construction
Starting from a previously established reporter system (Mukherji 2011), we replaced eYFP in the plasmid pTRE-Tight-BI (Clontech) with ZsGreen1-1 (Clontech) using the EcoRI and NdeI digestion sites. We received the psiCHECK-2-Pten 3'UTR plasmid as a kind gift from Yvonne Tay. The Pten 3'UTR sequence was cloned from that plasmid using custom primers and inserted into the ZsGreen MCS of the bidirectional plasmid via the NdeI and XbaI digestion sites using standard cloning techniques. This reporter plasmid is referred to as the Pten 3'UTR sponge plasmid in the text. The "NULL" plasmid, which we used as a control, is the same construct without the Pten 3'UTR, i.e. just the plasmid containing the bidirectional tetracycline-responsive promoter that drives the transcription of the two fluorescent reporter proteins ZsGreen and mCherry. All constructs were sequence-confirmed.
2.3.5 Transient transfection of plasmids
HCT116 cells were grown in 2 ml of antibiotic-free culture medium in 6-well dishes for two days before transfection. Pten 3'UTR or NULL plasmids were mixed with the rtTA plasmid at a ratio of 3:1 (1.5 µg reporter plasmid : 0.5 µg rtTA plasmid) and co-transfected into the cells using 10 µl Lipofectamine 2000 (Invitrogen) in 250 µl Opti-MEM. Six hours post-transfection, when the cells had stabilized, they were detached with trypsin, passaged onto 60 mm plates in 3 ml culture medium and induced with 1 µg/ml doxycycline (Sigma).
Live cells were taken for the flow sorting assay 18 hours post-induction.
2.3.6 FACS sorting
At the end of the transfection period, live cells were trypsinized, pelleted and resuspended as a single-cell suspension in McCoy's 5A medium. The transfected cells were sorted by FACS into ice-cold PBS + 3% FBS using a BD Biosciences Aria II flow cytometer as follows: (i) single cells were gated using their FSC-A and SSC-A scatter profiles; (ii) only cells containing the reporter plasmid were selected, based on their mCherry expression; (iii) cells were collected into 4 different bins based on their mCherry expression (see Figure 2.10a). 100,000 cells from each bin (the same bins were used for sorting both the Pten UTR and NULL UTR transfections) were sorted into Eppendorf tubes containing ice-cold PBS + 1% FBS buffer, and their RNA was extracted as above. This method yielded 500-1000 ng of RNA per bin. For analytical flow cytometry, cells were detached with 0.05% trypsin-EDTA, washed and resuspended in sterile PBS + 3% FBS. Measurements were performed on a BD Biosciences LSR Fortessa platform.
2.3.7 RNA sequencing
From the isolated RNA, poly(A)+ RNA sequencing libraries were prepared using the Illumina TruSeq Stranded mRNA kit at the MIT BioMicro Center. The prepared libraries were multiplexed and sequenced on an Illumina HiSeq 2500 sequencer to obtain single-end 40-bp reads. On average we obtained 20 million reads per sample, with three biological replicates per sample. Reads were aligned with the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009) using parameters -q 30 (PHRED quality) and -l 30 (seed length) to modENCODE integrated transcript models based on the human genome (hg19). We allowed a maximum edit distance of 2 (option "aln -n 2") and used the flag "-uniq=1" to map only unique reads. The output was converted into SAM format using the BWA "samse" command and processed with a custom Perl script. Each library had 85% mapped reads.
For the Pten 3'UTR plasmid transfection experiment, we split the Pten sequence into the Pten 3'UTR and the Pten CDS, and added the mCherry and ZsGreen sequences to the hg19 transcript model. Reads were aggregated across isoforms, and expression per gene locus was calculated in reads per million mapped reads (RPM). Whenever expression was measured in RPKM, the length of the merged isoforms was used for normalization.
2.3.8 RNA-seq data analysis
Genes with non-zero read counts in all libraries were retained, resulting in a total of 13,700 (out of 23,704) expressed genes. RPKM values were averaged over the 3 biological replicates. The CV was estimated in a gene-independent manner by pooling all CV measurements at a given expression level: loess regression was performed to obtain an error model relating the expression CV of each gene to its expression mean across all samples, and the expression CV of each gene was adjusted to the loess-fitted line of CV versus mean. The significance of fold changes was assessed by calculating z-scores, and p-values were corrected for multiple hypothesis testing with the standard Benjamini-Hochberg procedure.
2.3.9 miRNA-mRNA target prediction
Genes were labeled as predicted microRNA targets if they contained at least one predicted conserved binding site (TargetScan 6.2 (Garcia, 2011)) for a microRNA seed family expressed in HCT116.
2.3.10 miRNA expression data sources
For the expression of miRNAs in HCT116, we used microarray data sets generated by [Yan 2011], downloaded from NCBI GEO (Series GSE26819).
2.3.11 Target abundance and sequestration estimation
For each conserved human miRNA, the total number of predicted 6-, 7- and 8-nt 3'UTR binding sites on each gene was weighted by the RPKM expression value of that gene in the untreated HCT116 RNA-seq data and summed over genes to yield the target abundance (TA) for that miRNA.
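The target abundance just defined, together with the sequestration fraction defined next, can be computed as in the minimal sketch below. The gene names, site counts and RPKM values are hypothetical, and all site types (6-, 7- and 8-nt) are counted equally, following the description of TA as a total site count weighted by expression.

```python
def target_abundance(sites, rpkm):
    """TA for one miRNA: sum over genes of (# predicted sites) x (RPKM)."""
    return sum(n * rpkm[gene] for gene, n in sites.items())


def sequestration(sites, rpkm, sender):
    """Fraction of the miRNA's expression-weighted sites contributed by the sender."""
    return sites.get(sender, 0) * rpkm[sender] / target_abundance(sites, rpkm)


# Hypothetical predicted-site counts for one miRNA and untreated-cell RPKMs.
sites = {"PTEN": 2, "GENE_A": 1, "GENE_B": 3}
rpkm = {"PTEN": 10.0, "GENE_A": 100.0, "GENE_B": 50.0}
print(target_abundance(sites, rpkm))                 # 270.0
print(round(sequestration(sites, rpkm, "PTEN"), 4))  # 0.0741
```

Because the denominator sums over every expressed target in the transcriptome, a single sender with a few sites typically captures only a small fraction of a miRNA's target pool, which is the basis for the sub-1% sequestration estimates discussed in Section 2.2.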
We estimated the fraction of miRNA i sequestered by Pten (and similarly for Vapa and Cnot6l) as
$$\mathrm{Sequestration}_{\mathrm{miRNA}_i}^{\mathrm{Pten}} = \frac{[\#\text{ predicted miRNA}_i\text{ binding sites on Pten}] \times [\text{Pten RPKM expression}]}{\sum_j [\#\text{ predicted miRNA}_i\text{ binding sites on gene } j] \times [\text{gene } j\text{ RPKM expression}]}$$
2.3.12 GO term analysis
GO term analysis was performed in R using the GOstats package [Falcon 2007]. For each set of putative Pten, Vapa or Cnot6l ceRNAs, we collected the GO terms associated with each mRNA in the set. For each term, we then computed a p-value using a hypergeometric test to assess the enrichment of the term in the ceRNA set relative to the background set of all genes.
2.3.13 TMM (Trimmed Mean of M-values) normalization
Methods for normalizing RNA-sequencing gene expression data commonly assume equal total expression between the compared samples. However, the number of reads expected to map to a gene depends not only on the expression level and length of the gene, but also on the composition of the RNA population being sampled [Robinson 2010]. Thus, if a large number of genes are unique to, or highly expressed in, an experimental condition, the sequencing 'real estate' available for the remaining genes in that sample is decreased. If not adjusted for, this sampling artifact skews any fold-change analysis. This is precisely the situation in our FACS-sorted sequencing dataset: by transfecting reporter plasmids into cells and inducing thousands of transcripts, we necessarily change global gene expression to a different extent in each bin. We sorted cells by their expression of mCherry transcripts, and consequently found a roughly 3-decade increase in mCherry read counts between the untransfected bin (bin 0) and the fully saturated bin 3 (Figure 2.9b). mCherry and ZsGreen reads combined made up as much as 30% of the total reads in the last bin. Define $Y_{gk}$ as the observed read count for gene $g$ in library $k$, and $N_k$ as the total number of reads in library $k$.
We remove all genes with $Y_{gk} = 0$, as fold changes cannot be calculated for them. Then, for each bin, let $k$ and $k'$ denote the Pten UTR and control NULL UTR libraries, respectively. Define the gene-wise log fold changes $M_g$ between these two libraries as
$$M_g = \log_2 \frac{Y_{gk}/N_k}{Y_{gk'}/N_{k'}}$$
and the absolute expression levels $A_g$ as
$$A_g = \frac{1}{2}\log_2\left(\frac{Y_{gk}}{N_k} \cdot \frac{Y_{gk'}}{N_{k'}}\right), \quad \text{for } Y_{gk}, Y_{gk'} \neq 0.$$
If RPKM normalization introduced no bias, the distribution of M values would be centered around zero. It is not, because of the distorting effect of the different amounts of plasmid RNA in each bin. To eliminate this effect, we robustly summarize the M and A values by a trimmed mean, i.e. the mean after removing the upper x% and lower x% of the data. We use a double trimming of both the M and A values, removing the top 30% and bottom 30% of M values and the top 5% and bottom 5% of A values. After trimming, we define the TMM factor as the mean M of the remaining genes. This TMM factor is then used to normalize the library size of library $k$. We estimated the TMM factor for bin 1 as 0.91, indicating that the Pten UTR library size had to be reduced by that factor. After this normalization, the overall offset in fold changes is 0, as expected (Figure 2.9c). The TMM factor is reasonably stable for different choices of trim percentiles [data not shown].
2.4 Supplementary Figures and Tables

[Figure 2.11: TargetScan maps of conserved miRNA binding sites in the 3'UTRs of (a) human PTEN, (b) human VAPA and (c) human CNOT6L]

Figure 2.11 | Predicted TargetScan conserved miRNA binding sites in the 3'UTR of the ceRNAs chosen in this study. (a) PTEN is targeted by 25 conserved miRNAs. (b) VAPA is targeted by 28 conserved miRNAs. (c) CNOT6L is targeted by 44 conserved miRNAs.

[Figure 2.12: volcano plots of P-value versus PTEN, VAPA and CNOT6L Crosstalk Strength]

Figure 2.12 | Crosstalk is microRNA-mediated and pervasive on a genome-wide scale. Related to Figure 2.3. Volcano plots of Crosstalk Strength versus P-value in each of the sender knockdowns. Only putative ceRNAs are shown, i.e. genes whose crosstalk strength is greater in HCT 116 cells than in DICER cells (HCT CS > DICER CS). Genes marked in green have P < 10^-3 and are statistically significant. The number of putative ceRNAs for each sender is indicated in the legend.
[Figure 2.13: distributions of log2 Fold Change (WT/NULL) in bins 1 to 3]

Figure 2.13 | The distribution of log2 fold changes (PTEN UTR/NULL) for all genes after TMM normalization is centered around zero in each bin, i.e. no bin-dependent effects are seen. Related to Figure 2.10.

Table 2.1 | MicroRNAs enriched in genes with positive PTEN Crosstalk Strength (hypergeometric p-value < 0.05). MicroRNAs in bold are those predicted to target PTEN.

miRNA seed family | P-value | Enrichment factor
miR-200bc/429/548a | 0 | 1.87
miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d | 1.01E-13 | 1.8
miR-23abc/23b-3p | 4.02E-13 | 1.79
miR-340-5p | 4.33E-13 | 1.7
miR-101/101ab | 6.78E-13 | 1.9
miR-19ab | 9.20E-13 | 1.78
miR-181abcd/4262 | 4.15E-12 | 1.75
miR-144 | 1.41E-11 | 1.87
miR-300/381/539-3p | 8.74E-11 | 1.83
miR-590-3p | 4.63E-10 | 1.62
miR-130ac/301ab/301b/301b-3p/454/721/4295/3666 | 1.12E-09 | 1.8
miR-93/93a/105/106a/291a-3p/294/295/302abcde/372/373/428/519a/520be/520acd-3p/1378/1420ac | 1.55E-09 | 1.8
miR-30abcdef/30abe-5p/384-5p | 2.77E-08 | 1.58
miR-26ab/1297/4465 | 2.92E-08 | 1.72
miR-186 | 1.04E-07 | 1.68
miR-141/200a | 1.32E-07 | 1.74
miR-25/32/92abc/363/363-3p/367 | 1.55E-07 | 1.71
miR-15abc/16/16abc/195/322/424/497/1907 | 6.52E-07 | 1.53
miR-27abc/27a-3p | 1.13E-06 | 1.55
miR-216b/216b-5p | 1.65E-06 | 2.14
miR-148ab-3p/152 | 2.19E-06 | 1.71
miR-495/1192 | 2.56E-06 | 1.62
miR-96/507/1271 | 2.74E-06 | 1.57
miR-21/590-5p | 2.89E-06 | 2.13
miR-132/212/212-3p | 4.94E-06 | 1.94
miR-543 | 4.98E-06 | 1.69
miR-503 | 5.20E-06 | 2
miR-153 | 7.92E-06 | 1.69
miR-374ab | 8.67E-06 | 1.67
miR-205/205ab | 9.67E-06 | 1.86
miR-448/448-3p | 2.29E-05 | 1.67
miR-124/124ab/506 | 2.27E-05 | 1.42
miR-7/7ab | 3.60E-05 | 1.8
miR-410/344de/344b-1-3p | 6.57E-05 | 1.67
miR-155 | 7.06E-05 | 1.81
miR-433 | 9.37E-05 | 1.87
miR-1ab/206/613 | 0.0001 | 1.57
miR-221/222/222ab/1928 | 0.0001 | 1.76
miR-217 | 0.00018 | 1.86
miR-202-3p | 0.0001 | 1.58
miR-128/128ab | 0.0003 | 1.218
miR-320abcd/4429 | 0.0004 | 1.54
miR-140/140-5p/876-3p/1244 | 0.0005 | 1.8
miR-223 | 0.001 | 1.8
miR-544/544ab/544-3p | 0.001 | 1.58
miR-218/218a | 0.001 | 1.48
miR-[illegible] | 0.001 | 1.411
miR-499-5p | 0.001 | 1.76
miR-199ab-5p | 0.002 | 1.61
miR-29abcd | 0.002 | 1.42
miR-139-5p | 0.003 | 1.74
miR-194 | 0.003 | 1.69
miR-494 | 0.003 | 1.56
miR-103a/107/107ab | 0.003 | 1.53
miR-208ab/208ab-3p | 0.003 | 2.06
miR-137/137ab | 0.004 | 1.4
miR-224 | 0.006 | 1.63
let-7/98/4458/4500 | 0.006 | 1.1
miR-421 | 0.007 | 1.59
miR-290-5p/292-5p/371-5p/293 | 0.01 | 1.63
miR-9/9ab | 0.01 | 1.35
miR-135ab/135a-5p | 0.01 | 1.47
miR-653 | 0.01 | 1.78
miR-142-3p | 0.01 | 1.66
miR-196abc | 0.01 | 1.75
miR-377 | 0.01 | 1.46
miR-18ab/4735-3p | 0.019 | 1.71
miR-425/425-5p/489 | 0.03 | 1.78
miR-138/138ab | 0.03 | 1.47
miR-24/24ab/24-3p | 0.04 | 1.12
miR-324-5p | 0.04 | 1.94
miR-33a-3p/365/365-3p | 0.047 | 1.64

Table 2.2 | MicroRNAs enriched in genes with positive VAPA Crosstalk Strength (hypergeometric p-value < 0.05). MicroRNAs in bold are those predicted to target VAPA.

miRNA seed family | P-value | Enrichment factor
miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d | 6.53E-08 | 1.8
miR-200bc/429/548a | 1.50E-07 | 1.8
miR-93/93a/105/106a/291a-3p/294/295/302abcde | 2.49E-07 | 1.94
miR-30abcdef/30abe-5p/384-5p | 2.90E-07 | 1.72
miR-202-3p | 1.60E-06 | 1.96
miR-300/381/539-3p | 2.80E-06 | 1.84
miR-186 | 1.09E-05 | 1.79
let-7/98/4458/4500 | 7.00E-05 | 1.7
miR-23abc/23b-3p | 7.86E-05 | 1.64
miR-19ab | 0.0001 | 1.62
miR-27abc/27a-3p | 0.0001 | 1.62
miR-590-3p | 0.0001 | 1.55
miR-217 | 0.0001 | 2.17
miR-148ab-3p/152 | 0.0002 | 1.8
miR-340-5p | 0.0003 | 1.53
miR-9/9ab | 0.0003 | 1.58
miR-144 | 0.001 | 1.64
miR-130ac/301ab/301b/301b-3p/454/721/4295/3666 | 0.002 | 1.63
miR-26ab/1297/4465 | 0.002 | 1.62
miR-25/32/92abc/363/363-3p/367 | 0.002 | 1.64
miR-128/128ab | 0.002 | 1.57
miR-101/101ab | 0.005 | 1.63
miR-182 | 0.007 | 1.5
miR-503 | 0.007 | 1.93
miR-374ab | 0.008 | 1.64
miR-141/200a | 0.01 | 1.59
miR-192/215 | 0.01 | 2.4
miR-543 | 0.01 | 1.61
miR-196abc | 0.01 | 2.04
miR-139-5p | 0.01 | 1.89
miR-15abc/16/16abc/195/322/424/497/1907 | 0.01 | 1.44
miR-155 | 0.02 | 1.75
miR-221/222/222ab/1928 | 0.03 | 1.72
miR-181abcd/4262 | 0.04 | 1.42

Table 2.3 | Table of Biological Processes
Gene Ontology (GO) annotations significantly enriched in putative PTEN ceRNAs. Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0001568 | 2.37E-09 | 2.73 | blood vessel development
GO:0036211 | 8.06E-09 | 1.59 | protein modification process
GO:0031323 | 8.72E-09 | 1.50 | regulation of cellular metabolic process
GO:2000112 | 2.69E-08 | 1.53 | regulation of cellular macromolecule biosynthetic process
GO:0072358 | 3.35E-08 | 2.66 | cardiovascular system development
GO:0019220 | 7.64E-08 | 1.89 | regulation of phosphate metabolic process
GO:0009891 | 1.07E-07 | 1.79 | positive regulation of biosynthetic process
GO:2001141 | 1.51E-07 | 1.50 | regulation of RNA biosynthetic process
GO:0009892 | 1.57E-07 | 1.70 | negative regulation of metabolic process
GO:0045944 | 1.76E-07 | 2.09 | positive regulation of transcription from RNA polymerase II promoter

Table 2.4 | Table of Biological Processes
Gene Ontology (GO) annotations significantly enriched in putative VAPA ceRNAs. Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0000279 | 2.20E-14 | 3.37 | M phase
GO:0051301 | 4.61E-11 | 3.08 | cell division
GO:0048285 | 1.61E-10 | 3.20 | organelle fission
GO:0000278 | 4.49E-09 | 3.86 | mitotic cell cycle
GO:0007067 | 2.50E-05 | 2.90 | mitosis
GO:0007098 | 3.62E-05 | 6.79 | centrosome cycle
GO:0000236 | 6.22E-05 | 3.89 | mitotic prometaphase
GO:0006796 | 6.50E-05 | 1.61 | phosphate-containing compound metabolic process
GO:0022403 | 6.56E-05 | 2.39 | cell cycle phase
GO:0009889 | 6.83E-05 | 1.45 | regulation of biosynthetic process

Table 2.5 | Table of Biological Processes
Gene Ontology (GO) annotations significantly enriched in putative CNOT6L ceRNAs.
Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0045766 | 1.07E-05 | 6.68 | positive regulation of angiogenesis
GO:0009890 | 7.93E-05 | 1.95 | negative regulation of biosynthetic process
GO:0071276 | 0.0001 | 22.6913 | cellular response to cadmium ion
GO:0048714 | 0.0001 | 84.91 | positive regulation of oligodendrocyte differentiation
GO:0048514 | 0.0001 | 2.50 | blood vessel morphogenesis
GO:0071294 | 0.0002 | 18.91 | cellular response to zinc ion
GO:0010035 | 0.0002 | 2.54 | response to inorganic substance
GO:0001944 | 0.0003 | 2.29 | vasculature development
GO:0051918 | 0.0003 | 42.45 | negative regulation of fibrinolysis
GO:0031324 | 0.0005 | 1.69 | negative regulation of cellular metabolic process

Chapter 3
A single molecule analysis of ceRNAs reveals miRNA-dependent correlation and colocalization

In the preceding chapter, we presented the genome-wide measurement of crosstalk strength for three different senders (Cnot6l, Pten and Vapa). We also identified three key features that affect the magnitude of crosstalk strength: the number of shared miRNAs, the target abundance of the miRNAs and the binding affinity of the miRNAs. We were able to isolate these factors by profiling transcript abundances in a bulk population of cells after perturbing sender levels. However, the theoretical framework of the ceRNA hypothesis depends on the relative concentrations of targets and miRNAs, the sequestration of miRNAs and the repression of miRNA targets, and each of these processes occurs within individual cells. The cell is a highly complex environment that cannot be approximated as well-mixed, owing to the presence of numerous sub-cellular structures. Local concentrations of miRNA binding sites and miRNAs may differ substantially from the cellular average, thereby affecting the rates of miRNA sequestration or repression.
Hence, a quantitative understanding of the ceRNA hypothesis requires taking into account several properties of these species: their absolute intracellular concentrations, their spatial localization and their dynamics, together with those of other molecules involved in miRNA biogenesis (e.g. Argonaute2). Moreover, bulk measurements of crosstalk in cells following sender knockdowns may mask other features of miRNA-mRNA coupling, such as the buffering of individual ceRNA fluctuations by their shared miRNAs. This chapter focuses on quantifying the endogenous expression of known ceRNAs (Cnot6l, Pten, Vapa) in single cells with single-molecule resolution using an in situ hybridization (smFISH) assay, and on analyzing their spatial localization.

3.1 Results

3.1.1 Quantification of gene expression for Pten, Vapa and Cnot6l in single cells with 3-colour smFISH

To quantify the absolute abundance of ceRNAs in single cells with molecular resolution, we used the single-molecule RNA FISH (smFISH) method (Femino et al., 1998; Raj, Bogaard, et al., 2008). This method labels each RNA molecule of a particular species with a set of fluorescently coupled complementary oligonucleotide probes. For each of Pten, Vapa and Cnot6l, we designed 25 to 48 fluorescently labeled probes, each 20 bases long and complementary to the coding sequence of the target transcript (Figure 3.1). Cnot6l had a shorter coding sequence that admitted only 25 probes. We hybridized the individual probe sets for each of the three genes to fixed and permeabilized cells, stringently washed away unbound probes and finally imaged the cells on a fluorescence microscope. For simultaneous detection of three different genes, we labeled our probes with spectrally distinguishable fluorophores (Cy5, Alexa Fluor 594, and Cy5) and imaged them with the appropriate filter sets. The large number of fluorophores bound to a single mRNA results in diffraction-limited fluorescent spots corresponding to single transcripts (Figure 3.1).
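The spot-detection procedure outlined in Section 3.3.2 can be sketched with SciPy's ndimage module. It consists of three steps:

1. Apply a Laplacian-of-Gaussian filter to each optical plane.
2. Choose an intensity threshold where the spot count plateaus.
3. Take one regional maximum per connected region.

This is an illustration under stated assumptions, not the custom MATLAB scripts used in this work. The filter width `sigma`, the sign flip that turns spots into positive peaks, and the default face connectivity of `scipy.ndimage.label` are all illustrative choices. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def log_filter_stack(stack, sigma=1.5):
    """Sign-flipped Laplacian-of-Gaussian applied to each optical plane,
    so that diffraction-limited spots become positive peaks."""
    return np.stack([-ndimage.gaussian_laplace(plane.astype(float), sigma)
                     for plane in stack])


def spot_counts(filtered, thresholds):
    """Number of connected above-threshold regions for each candidate
    threshold; the plateau of this curve guides the threshold choice."""
    return [ndimage.label(filtered > t)[1] for t in thresholds]


def detect_spots(filtered, threshold):
    """(z, y, x) of the regional maximum in each connected region above
    `threshold`; each region is taken to be one mRNA molecule."""
    labels, n = ndimage.label(filtered > threshold)
    if n == 0:
        return []
    return ndimage.maximum_position(filtered, labels, list(range(1, n + 1)))
```

In practice, one would evaluate `spot_counts` over a grid of thresholds and pick a value on the plateau. The positions returned by `detect_spots` would then be assigned to the manually traced cell outlines.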
We found no obvious sub-cellular localization of these transcripts in the cytoplasm, suggesting that ceRNAs are not uniquely captured in a particular structure. To quantify the expression level of endogenous mRNA in individual cells, we counted the fluorescent spots in the 3D images of cells using custom MATLAB scripts adapted from (Raj 2008). Each computationally identified molecule was assigned to a cell by manually tracing individual cell boundaries based on the DAPI nuclear staining signal. To estimate expression levels robustly, we counted transcripts in 300-400 cells. We also performed identical smFISH experiments in the miRNA-deficient HCT 116 DICER-/- cell line. In HCT 116 cells, the ceRNAs Cnot6l, Pten and Vapa were expressed at an average of 12, 28 and 82 molecules per cell, respectively. Expression levels of Pten, Vapa and Cnot6l were largely unchanged in the miRNA-deficient DICER-/- cell line. Our sensitive single-molecule method revealed a Pten:Vapa mRNA ratio of about 1:3. This contrasts with the 1:100 ratio previously reported for the same HCT 116 cell line based on qPCR (Tay 2011, Ala 2013). We note that this reported disparity had led many authors to conclude that crosstalk could not explain how a sender expressed at extremely low levels affects a receiver expressed 100-fold higher [Ebert, Sharp 2012]. Having established our quantitative 3-colour smFISH dataset, we then analyzed the correlation structure of single-cell gene expression for the three ceRNAs.

[Figure 3.1: probe design for the PTEN ORF (48 probes), CNOT6L ORF (25 probes) and VAPA ORF; representative smFISH image]

Figure 3.1 | Measuring Pten, Vapa and Cnot6l gene expression in single cells with 3-colour single-molecule FISH.
(a) Multiple 20-mer oligonucleotide probes for Pten, Vapa and Cnot6l were constructed and labeled with distinct dyes to allow simultaneous measurement of gene expression with an smFISH assay. (b) Spots corresponding to single mRNA molecules of Vapa (red, detected with oligonucleotide probes coupled to Alexa Fluor 594) and Cnot6l (green, oligonucleotide probes coupled to Cy5) in HCT116 cells. Representative maximum intensity z-projection. Diffraction-limited spots (molecules) in each channel were automatically identified with a custom MATLAB script. They were then assigned to individual cells, which had been manually segmented based on DAPI nuclear staining.

3.1.2 Presence of shared miRNAs generates correlated fluctuations of Pten ceRNAs in single cells

We used the smFISH data to distinguish two possibilities. If ceRNA levels are correlated in individual cells, shared miRNAs likely co-regulate their fluctuations. If they vary independently, miRNA coupling likely acts on a slower timescale than gene expression fluctuations. Previous studies of Pten and Vapa have shown that their expression levels are correlated across different bulk tumor samples (Ala 2013). However, competition for and sequestration of miRNAs, and the consequent crosstalk, are single-cell phenomena. ceRNAs sponge miRNAs away from each other within a noisy intracellular environment containing varying levels of ceRNAs, miRNAs and RISC/DICER enzymes. It is therefore necessary to study ceRNA expression in single cells to investigate how the shared miRNA binding sites in the three ceRNAs influence their gene expression. We plotted pairwise ceRNA gene expression in single cells for both the HCT116 and DICER datasets.
Strikingly, we observed a significant correlation (Pearson correlation coefficient ρ ≈ 0.40) between the expression levels of the three ceRNA pairs in HCT116 cells. This correlation was lost in the miRNA-deficient DICER cells (ρ ≈ 0.10) (Figure 3.2b,c). Thus, a cell with low or high expression of one ceRNA is likely to be in the corresponding expression state for the other ceRNAs, but only in the presence of functional miRNAs. Large-scale transcriptional network imbalances in the DICER cell line might cause all genes to fluctuate randomly. To control for this possibility, we performed smFISH on Twist1 and Pten. Twist1 is a highly expressed transcription factor that induces the epithelial-to-mesenchymal transition (Yang 2010) and, importantly, shares no predicted miRNA binding sites with Pten. Moreover, Twist1 had negligible Pten crosstalk strength in our RNA-seq si-Pten knockdown experiment, making it an attractive negative control. We found that Twist1 and Pten expression levels were significantly negatively correlated in HCT 116 cells (ρ = -0.31) and remained negatively correlated in DICER cells (ρ = -0.29) (Figure 3.3).

[Figure 3.2: (a) schematic of intrinsic and extrinsic noise and stoichiometry for genes coupled by a shared miRNA; (b,c) scatter plots of PTEN, VAPA and CNOT6L mRNA counts per cell in HCT116 and DICER cells; (d) distributions of log2(Stoichiometric Ratio) for PTEN/VAPA, PTEN/CNOT6L and VAPA/CNOT6L]

Figure 3.2 | Crosstalk helps ceRNAs co-fluctuate in single cells, thereby tightening their stoichiometric ratios in the presence of active miRNAs. (a) Two genes coupled by a common microRNA (red) also couple their endogenous "intrinsic" fluctuations. They therefore show reduced deviations in their stoichiometric expression (upper marginal histogram) compared with genes that do not share miRNAs (grey). (b,c) 3-colour smFISH quantification of Pten, Vapa and Cnot6l expression in single HCT116 cells and control DICER-/- cells; over 300 cells were analysed. Scatter plots show single-cell transcript counts of Pten, Vapa and Cnot6l for each pair of ceRNAs. The left column shows smFISH data for untreated HCT116 cells and the right column for control DICER-/- cells. Correlation coefficients (top right) show a significant loss of correlation in DICER-/- cells. Error bars are bootstrapped 95% confidence intervals. (d) Using the data in (b,c), we computed the ratio of the transcript counts of two genes in each cell and refer to this as the stoichiometric ratio of the two genes. Red and black curves are the distributions of log2(Stoichiometric Ratio) for each pair of genes in HCT116 and DICER cells, respectively. The variances of these distributions are indicated at top left.

[Figure 3.3: scatter plots of single-cell PTEN versus TWIST1 mRNA counts in HCT116 (ρ = -0.31) and DICER (ρ = -0.29) cells]

Figure 3.3 | Pten does not lose correlation in DICER cells with a gene with which it shares no miRNAs.
(a,b) Scatter plots of single-cell gene expression for Pten and Twist1 in HCT116 (left) and DICER (right) cells. Twist1 was chosen as a control because it shares no predicted miRNA binding sites with Pten and is highly expressed. The two genes remain negatively correlated in both cell lines. Correlation coefficients are shown at top right.

Another possible explanation for the observed difference in ceRNA correlations between HCT 116 and DICER cells is that their cell cycles proceed at different rates. However, we cultured the two cell lines under identical conditions and found only a small difference in doubling time between them (~21 h and ~23 h for HCT116 and DICER, respectively). Nevertheless, to account for a possible cell-cycle contribution to the correlation, we also calculated mRNA concentrations by dividing the number of mRNA molecules in each cell by its cellular volume. These concentrations showed a similar loss of ceRNA correlation in DICER cells compared with HCT 116 cells. Together, these data suggest that ceRNAs are correlated, i.e. they co-fluctuate with each other in single cells, owing to the buffering effect of active miRNAs.

Stoichiometric ratio of ceRNAs is tightened by active miRNAs

We speculated that shared miRNAs could couple the fluctuations of ceRNAs and thus regulate the stoichiometry of their expression. By dynamically buffering the fluctuations of each species through miRNA-mediated crosstalk, ceRNAs could maintain tighter stoichiometric ratios with each other than with genes not regulated by miRNAs (Figure 3.2a). Cellular processes are acutely sensitive to changes in the dosage of many genes, and ceRNAs may therefore be used in pathways to minimize fluctuations. Pten, for example, is a haploinsufficient gene: even the moderate down-regulation of Pten that results from the loss of a single allele may be tumorigenic (Kwabi-Addo 2001).
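The two single-cell statistics summarized in Figure 3.2 can be sketched as follows. The first is the Pearson correlation with a bootstrapped 95% confidence interval. The second is the variance of log2(stoichiometric ratio) across cells. Several details are assumptions of this sketch rather than details stated in the text: the percentile bootstrap over resampled cells, the number of resamples, the fixed random seed, and the exclusion of cells with a zero count (where the log ratio is undefined). The function names are hypothetical.

```python
import numpy as np


def pearson_with_bootstrap_ci(x, y, n_boot=2000, seed=0):
    """Pearson r between two per-cell count vectors plus a percentile
    bootstrap 95% confidence interval obtained by resampling cells."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    rng = np.random.default_rng(seed)
    n = x.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample cells with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.nanpercentile(boot, [2.5, 97.5])
    return r, (float(lo), float(hi))


def log2_ratio_variance(counts_a, counts_b):
    """Variance across cells of log2(counts_a / counts_b), the per-cell
    stoichiometric ratio; cells with a zero count are skipped."""
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    ok = (a > 0) & (b > 0)
    return float(np.var(np.log2(a[ok] / b[ok])))
```

Comparing `log2_ratio_variance` for the same ceRNA pair in HCT116 and DICER-/- cells gives the kind of comparison shown in Figure 3.2d.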
To compare the range of these ceRNA fluctuations in the HCT116 and DICER cell lines, we calculated a "stoichiometric ratio", defined as the ratio between the mRNA counts of a ceRNA pair in each cell. Notably, the stoichiometric ratio is calculated for each single cell in our dataset. It therefore differs from the Pearson correlation, which is defined for two series of mRNA counts across an entire cell population. We plotted the distribution of stoichiometric ratio values for each of the three ceRNA pairs (Pten & Vapa; Pten & Cnot6l; Vapa & Cnot6l) in HCT116 and DICER-/- cells, and found significant differences between the two cell lines. As measured by the variance, the distribution of ceRNA stoichiometric ratios is tighter in HCT116 cells than in DICER-/- cells. This implies that the loss of active miRNAs in DICER-/- cells causes ceRNAs to fluctuate independently of each other (Figure 3.2d).

3.1.3 Pten, Vapa, Cnot6l are mutually reciprocal ceRNAs

Because ceRNAs share miRNA binding sites, their interactions are expected to be bidirectional, i.e. reciprocal. To study these reciprocal effects, we knocked down each of the three transcripts separately with 25 nM si-Pten, 25 nM si-Vapa or 25 nM si-Cnot6l (three biological replicates each). For each knockdown, we simultaneously counted Pten, Vapa and Cnot6l transcripts using smFISH. In Chapter 2, we had already quantified crosstalk strength genome-wide for Pten, Vapa and Cnot6l by knocking each down individually with siRNA and sequencing the transcriptome. In that analysis, crosstalk strength was significantly greater in HCT116 than in DICER-/- cells for 4 of the 6 possible sender-receiver pairs. Because smFISH measurements yield absolute mRNA expression levels rather than relative RPKM values, we anticipated that crosstalk strength would be quantified more accurately at single-molecule resolution.
Pairwise analysis of the scatter plots for each receiver reveals that each receiver is depleted when any individual sender is knocked down (Figure 3.4a,b,c).

[Figure 3.4: (a-c) scatter plots with marginal histograms of single-cell PTEN, VAPA and CNOT6L mRNA counts in WT and siRNA-knockdown cells, annotated with the fractional changes in sender and receiver; (d) bar plots of crosstalk strength (CS) for each sender-receiver pair in HCT116 and DICER cells]

Figure 3.4 | Measuring crosstalk strength with smFISH for 3 different senders in HCT116 and DICER-/- cells. (a) Single-cell mRNA counts from 3-colour FISH on Vapa, Pten and Cnot6l in WT (black) and 25 nM si-Vapa knockdown (violet) cells. Each dot is the mRNA count per cell for the two indicated mRNA species. Marginal histograms for each mRNA in the two conditions are shown at the top and right of each scatter plot. Bars indicate the mean expression of each single-cell distribution (black = WT, violet = si-Vapa). Knocking down Vapa by 60% results in a 33% decrease in Pten. (b,c) Same as (a) with 25 nM si-Pten knockdown (pink) and 25 nM si-Cnot6l knockdown (cyan). (d) Crosstalk strength of a receiver with respect to a sender, as defined in the text. Bars show the average CS measured in 3 biological replicates for each sender-receiver pair in HCT116 and DICER-/- cells. Error bars are standard deviations over 3 independent sets of knockdown experiments.

For instance, Pten is reduced by 33% when Vapa levels are knocked down by 60% with si-Vapa; thus $\mathrm{CS}^{\mathrm{Pten}}_{\mathrm{Vapa}} = 0.53$ (Figure 3.4a). Similarly, we calculated the crosstalk strength for each of the six possible sender-receiver pairs. Even though Pten is not as highly expressed as Vapa, it again emerges as the best sender of crosstalk, corroborating our genome-wide RNA-sequencing results.
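The crosstalk-strength calculation described above can be sketched as the receiver's fractional change divided by the sender's fractional change, each computed from mean mRNA counts per cell. This definition is an assumption inferred from the fractional-change annotations in Figure 3.4; the formal definition is given in Chapter 2. The function names and the per-cell counts below are hypothetical.

```python
import numpy as np


def fractional_change(control_counts, knockdown_counts):
    """Fractional decrease of the mean per-cell mRNA count after knockdown."""
    return 1.0 - np.mean(knockdown_counts) / np.mean(control_counts)


def crosstalk_strength(sender_ctrl, sender_kd, receiver_ctrl, receiver_kd):
    """Receiver fractional change divided by sender fractional change."""
    return (fractional_change(receiver_ctrl, receiver_kd)
            / fractional_change(sender_ctrl, sender_kd))


# Hypothetical per-cell counts: sender mean 100 -> 40, receiver mean 30 -> 20
cs = crosstalk_strength([90, 100, 110], [35, 40, 45], [25, 30, 35], [18, 20, 22])
```

Computing this value separately for each biological replicate, then taking the mean and standard deviation, would give the kind of summary plotted in Figure 3.4d.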
Importantly, senders suffer similar fold knockdowns in DICER-/- and HCT116 cells. Yet the receiver reduction (and consequently the crosstalk strength) is much weaker in DICER-/- cells for all 6 sender-receiver pairs, indicating that mature miRNAs are essential for the crosstalk mechanism (Figure 3.4d). Notably, we always measure a non-zero "residual" crosstalk effect in DICER-/- cells. This residual effect reflects the attenuation, but not elimination, of mature microRNAs in the DICER-/- cell line, as reported by TaqMan miRNA qPCR (Tay 2011). Taken together, we find that the ceRNA effect is indeed bidirectional and miRNA-dependent.

3.1.4 Individual molecules of Pten ceRNAs are colocalized in a miRNA-dependent manner

On closer inspection of our smFISH dataset, we found, surprisingly, that some individual ceRNA molecules were colocalized with each other (Figure 3.5b). As discussed in the introduction, local concentrations of miRNAs and mRNAs can differ considerably from average cellular concentrations. ceRNAs may be colocalized with each other, or sequestered in the miRNA processing machinery. In that case, their effective local concentrations would be much greater than the average concentration of all competing miRNA binding sites, and their competition for miRNAs could increase substantially. In other words, a miRNA released from a sender would be more likely to bind a receiver mRNA in its vicinity than to diffuse to distant binding sites. We speculated that such colocalization of their transcripts might explain the high crosstalk strength we observed among the three reciprocal ceRNAs Pten, Vapa and Cnot6l.

Quantifying the degree of colocalization between ceRNAs

To measure the degree of colocalization between ceRNAs, we first used the 3-colour smFISH datasets to identify the precise 3D location of the center of each diffraction-limited fluorescent spot.
To do so, we fitted a Gaussian to each spot's intensity profile in each channel and thereby calculated the centre of each spot. The channels were aligned using TetraSpeck™ Microspheres (Invitrogen). For each spot in one channel, we then found all spots in another channel that lie within 2 pixels in the xy plane and within 2 z-planes. This tolerance controls for possible stage drift during imaging. The method allows automated quantification of the number of colocalized transcripts in each single cell.

[Figure 3.5: (a) probe design for the PTEN, CNOT6L and VAPA ORFs and a representative dual-colour smFISH image; (b) single z-slice of a 3-colour smFISH image; (c) bar plots of colocalization percentages for each ceRNA pair in HCT116 and DICER cells]

Figure 3.5 | Single-molecule FISH shows that transcripts of the three ceRNAs colocalize. (a) smFISH with probes for Pten, Vapa and Cnot6l coupled to different fluorophores allows simultaneous detection of the expression of the three genes. A representative dual-colour image for Vapa and Cnot6l in HCT116 cells is shown (maximum intensity projection). (b) A single z-slice of a 3-colour FISH image for Pten, Vapa and Cnot6l in HCT116 cells. Arrows indicate colocalized transcripts for each pair of genes. (c) Percentage of transcripts colocalized between pairs of ceRNAs in HCT116 and control DICER-/- cells under different experimental conditions (indicated below each bar plot). Percentages were calculated after computationally detecting the precise 3D location of each transcript's intensity peak. For example, the colocalization percentage of Vapa with Pten is the number of colocalized Vapa and Pten molecules divided by the total number of Vapa molecules. For each condition, more than 300 cells were analysed. The colocalization percentage shown is the mean of the per-cell colocalization percentages.

For each pair of ceRNAs, we define the average colocalized fraction as follows:

$$\text{Colocalized fraction of ceRNA}_1\text{ with ceRNA}_2 = \left\langle \frac{\#\text{ of colocalized transcripts of ceRNA}_1\text{ with ceRNA}_2}{\text{total }\#\text{ of transcripts of ceRNA}_1} \right\rangle,$$

where $\langle\cdot\rangle$ denotes the average over all cells. Note that the colocalized fraction is not symmetric: the numerators are identical, but the denominators differ. Take Pten and Vapa as an example. The number of colocalized Pten and Vapa molecules (the numerator) is the same for both orderings. Vapa expression, however, is higher than Pten expression. The colocalized fraction of Pten with Vapa (denominator: Pten count) is therefore greater than the colocalized fraction of Vapa with Pten (denominator: Vapa count).

To test whether colocalization was miRNA-dependent, we measured the colocalization fraction of each ceRNA pair in all our experimental conditions, in both the HCT116 and DICER-/- cell lines. The colocalization fraction of each ceRNA pair was significantly higher in HCT116 than in DICER-/- cells in all conditions, suggesting that miRNAs are partly responsible for colocalization (Figure 3.5c). The fraction of Cnot6l colocalized with Vapa was surprisingly high, ranging from 25% to 40% in the siRNA knockdown conditions. Most other ceRNA pairs had colocalization fractions between 2% and 10% in HCT116 cells. However, these values are likely a lower bound for colocalization over a cell cycle, because smFISH captures only snapshots of gene expression. We used Twist1 as a negative control to test the specificity of our colocalization algorithm. This control also excludes the possibility that colocalization was independent of the miRNAs shared between ceRNAs. We therefore checked for colocalization between Pten and Twist1, which share no miRNA binding sites. We found no colocalization between the two, suggesting that colocalization was specific to ceRNA species.
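The colocalization count and the cell-averaged colocalized fraction defined above can be sketched with NumPy broadcasting. The sketch rests on four assumptions:

- Spot centres are given as (z, y, x) in pixel and plane units after channel alignment.
- The xy criterion is a Euclidean distance of at most 2 pixels.
- The numerator counts colocalized spot pairs, which makes it identical for both orderings, as stated in the text.
- Cells with no ceRNA1 transcripts are skipped.

The function names are hypothetical.

```python
import numpy as np


def colocalized_pairs(spots1, spots2, xy_radius=2.0, z_radius=2):
    """Number of (ceRNA1, ceRNA2) spot pairs within `xy_radius` pixels in the
    xy plane and within `z_radius` z-planes; spots are (z, y, x) rows."""
    s1 = np.asarray(spots1, dtype=float).reshape(-1, 3)
    s2 = np.asarray(spots2, dtype=float).reshape(-1, 3)
    dz = np.abs(s1[:, None, 0] - s2[None, :, 0])
    dxy = np.hypot(s1[:, None, 1] - s2[None, :, 1],
                   s1[:, None, 2] - s2[None, :, 2])
    return int(np.count_nonzero((dxy <= xy_radius) & (dz <= z_radius)))


def mean_colocalized_fraction(cells):
    """Cell average of colocalized pairs / total ceRNA1 transcripts.

    `cells` is a list of (spots_ceRNA1, spots_ceRNA2); cells without any
    ceRNA1 transcript are skipped because the ratio is undefined there.
    """
    fractions = [colocalized_pairs(s1, s2) / len(s1)
                 for s1, s2 in cells if len(s1) > 0]
    return float(np.mean(fractions))
```

Swapping the roles of ceRNA1 and ceRNA2 leaves `colocalized_pairs` unchanged but changes the denominator. This reproduces the asymmetry of the colocalized fraction described in the text.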
We also estimated a null model for random colocalization as follows. We took the probability that 2 transcripts colocalize by chance to be the volume of the voxel occupied by a diffraction-limited spot divided by the cellular volume. A diffraction-limited spot occupies a voxel of ~0.2 µm × 0.2 µm × 0.3 µm, whereas the volume of a cell is ~10 µm × 10 µm × 5 µm. The probability of random colocalization (~2.4 × 10^-5) is therefore negligible. Taken together, we find that colocalization of ceRNAs is miRNA-dependent and differs considerably between ceRNA pairs.

3.2 Discussion

Here we used an smFISH assay to quantify, with single-molecule resolution, the endogenous expression of known ceRNAs (Cnot6l, Pten, Vapa) in single cells, and we analyzed their spatial localization. These three ceRNAs share at least 7 miRNA binding sites, and our single-cell smFISH measurement of their crosstalk strength is consistent with the population-level result of the previous chapter. However, we measured the crosstalk effects of Vapa, Pten and Cnot6l on each other with much greater accuracy. We found that they affect each other reciprocally, both in their mean levels and dynamically in single cells. In analyzing the single-cell expression profiles, we uncovered a miRNA-dependent correlation and stoichiometric covariation of ceRNA expression in single cells. We also found a miRNA-dependent colocalization of their mRNA molecules. These findings may have important implications for a crosstalk-based mechanism of post-transcriptional regulation. Firstly, microRNAs may promote a stoichiometric balance among genes that share miRNA binding sites. If so, this could explain a paradox: miRNA repression of individual targets is weak, yet evolutionary selection for microRNA targeting is strong. Stoichiometric balance is crucial within macromolecular complexes and cellular networks, where imbalances can lead to severe malfunctions.
MicroRNAs are known to extensively co-target functionally related gene networks and proteins in macromolecular complexes. We therefore suggest that microRNAs may be selected for their combinatorial regulation of many different ceRNAs together rather than of individual targets. The individual repressive effects of a miRNA on its shared targets would be correlated through the crosstalk channel, allowing for stoichiometric expression of a large set of miRNA targets. Such a crosstalk-based co-regulatory mechanism at the transcript level would provide a flexible, adaptive means of compensating for environmental, genetic or random perturbations in mRNA abundance. Secondly, ceRNAs exhibit reduced gene expression correlations in miRNA-deficient DICER-/- cells, and this observation may serve as a general signature of crosstalk that helps identify them. Putative ceRNAs could then be identified without perturbing the cell, i.e. without down-regulating or up-regulating a particular sender and observing changes in a particular receiver. Instead, the intrinsic variability of sender transcript levels in a cell would correlate the levels of a receiver through the shared miRNA crosstalk channel. Recent advances in single-cell sequencing technology have made it possible to measure the entire transcriptome of hundreds of cells. This in turn makes it possible to compute single-cell correlations between all possible pairs of genes (Gruen, Kester 2014). Pairs of genes that lose correlation in DICER-/- cells compared with HCT 116 cells would thus be attractive ceRNA candidates. Such an unbiased, "loss of correlation"-based approach to identifying ceRNAs would circumvent two major limitations of the sender perturbation strategy. Firstly, it would not rely on microRNA-target predictions to identify putative ceRNAs.
Computational target predictions are often noisy and have limited accuracy and consistency; in practice, false positives and false negatives in the predictions often make it difficult to identify mRNAs targeted by common miRNAs. Secondly, perturbing a sender mRNA causes a cascade of transcriptional and protein-level changes that make the construction of a null model challenging.

3.3 Methods

3.3.1 Fluorescent in situ hybridization and imaging

Hybridization and washes were carried out according to previously established protocols (Femino et al., 1998; Raj et al., 2008). Briefly, we hybridized probes for at least 18 hours at 30 °C and used wash buffers containing 25% formamide. Optimal washing conditions and probe concentrations were determined empirically for each gene. For nuclear staining, we added DAPI after the wash steps. Z-stacks of images were taken with a Nikon Ti-E inverted fluorescence microscope equipped with a 100x oil-immersion objective and a Photometrics Pixis 1024B CCD (charge-coupled device) camera, using MetaMorph software (Molecular Devices, Downingtown, PA). The image-plane pixel dimension was 0.13 µm and the Z spacing between planes was 0.4 µm.

Table of smFISH experimental conditions
Treatment | Cell line | smFISH species
untreated | HCT116 and DICER -/- | Pten, Vapa, Cnot6l, Twist1
25 nM si-non-targeting negative control | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Pten | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Vapa | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Cnot6l | HCT116 and DICER -/- | Pten, Vapa, Cnot6l

3.3.2 Image analysis

The transcript distribution was measured by counting smFISH-labeled mRNA in single cells as previously described (Raj, Bogaard, et al., 2008). Briefly, a Laplacian-of-Gaussian (LoG) filter is applied to each optical plane of the image stack to enhance the fluorescent signal. To pick up true mRNA spots, an intensity threshold is then chosen where the curve of the number of identified spots versus threshold intensity plateaus.
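The filtering and thresholding steps above can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the counting code used for this thesis: it filters the whole (z, y, x) stack in 3D, and the filter widths (`sigma`), the synthetic five-spot stack and the rule that takes the flattest non-zero stretch of the spot-count curve as the plateau are all hypothetical choices standing in for reading the plateau off that curve.

```python
import numpy as np
from scipy import ndimage


def log_filter(stack, sigma=(1.0, 1.5, 1.5)):
    """Laplacian-of-Gaussian filter of a (z, y, x) stack, sign-flipped so spots are positive."""
    return -ndimage.gaussian_laplace(stack.astype(float), sigma=sigma)


def spot_counts(filtered, thresholds):
    """Number of connected regions above each intensity threshold."""
    return np.array([ndimage.label(filtered > t)[1] for t in thresholds])


def plateau_threshold(thresholds, counts):
    """Threshold at which the spot-count curve is flattest, ignoring the all-zero tail."""
    slopes = np.abs(np.diff(counts)).astype(float)
    slopes[(counts[:-1] == 0) | (counts[1:] == 0)] = np.inf
    return float(thresholds[int(np.argmin(slopes))])


# Synthetic stack (hypothetical): five diffraction-limited spots plus weak camera noise.
rng = np.random.default_rng(0)
stack = np.zeros((10, 64, 64))
for y, x in [(12, 12), (12, 50), (50, 12), (50, 50), (32, 32)]:
    stack[5, y, x] = 1000.0
stack = ndimage.gaussian_filter(stack, sigma=(1.0, 1.5, 1.5))
stack += rng.normal(0.0, 0.01, size=stack.shape)

filtered = log_filter(stack)
thresholds = np.linspace(0.5, 15.0, 30)
counts = spot_counts(filtered, thresholds)
t_star = plateau_threshold(thresholds, counts)
print(f"threshold {t_star:.2f} -> {spot_counts(filtered, [t_star])[0]} spots")
```

On real images the low-threshold end of the curve is dominated by background, so the plateau sits at an intermediate threshold rather than at the first one, as it does in this nearly noise-free synthetic example.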
The locations of mRNA spots are then taken to be the regional maximum pixel of each connected region. Cell boundaries are manually traced using the DAPI and bright-field images, so that the number of mRNA spots located within the boundaries of each individual cell can be quantified.

3.3.3 siRNA transfection and cell culturing

Transfections and cell culturing were carried out as described in Chapter 2.

Chapter 4
MicroRNA-mediated control of protein expression noise

4.1 Background¹

MicroRNAs regulate a large number of genes in metazoan organisms (Friedman et al., 2009; Lewis et al., 2005; John et al., 2004; Lee et al., 1993; Wightman et al., 1993; Enright et al., 2003) by accelerating mRNA degradation and inhibiting translation (Guo et al., 2010; Lim et al., 2005). Although the physiological function of some microRNAs is known in detail (Lee et al., 1993; Wightman et al., 1993; Brennecke et al., 2003; Johnston and Hobert, 2003), it is not clear why microRNA regulation is so ubiquitous and conserved, since individual microRNAs only weakly repress the vast majority of their target genes (Baek et al., 2008; Selbach et al., 2008) and knockouts rarely result in mutant phenotypes (Miska et al., 2007). One proposed reason for this widespread regulation is the ability of microRNAs to confer robustness on gene expression (Bartel and Chen, 2004; Hornstein and Shomron, 2006), e.g. by buffering stochastic variability in gene expression (Ebert and Sharp, 2012).

¹ This chapter has been adapted from a paper entitled "MicroRNA control of protein expression noise" that has been published (Science, 3 April 2015: 128-132) with lead author Jörn Schmiedel. My contribution was to aid in experimental design and in writing an earlier version of the final paper.
In this work we use mathematical modeling and single-cell reporter assays to show that microRNAs, in conjunction with increased transcription, decrease protein expression noise for lowly expressed genes but increase noise for highly expressed genes. Genes that are regulated by multiple microRNAs show more pronounced noise reduction. We estimate that hundreds of (lowly expressed) genes in mouse embryonic stem cells have reduced noise due to substantial microRNA regulation. Our findings therefore suggest that microRNAs confer precision on protein expression, and thus offer plausible explanations for the commonly observed combinatorial targeting of endogenous genes by multiple microRNAs as well as for the preferential targeting of lowly expressed genes.

Gene expression is inherently variable due to the stochasticity of all molecular reactions (Raj et al., 2006; see Figure 4.1a). Noise in the expression of a gene is thought to originate mainly from transcriptional dynamics (Blake et al., 2003; Raj, Peskin, et al., 2006), low numbers of mRNA molecules (Ozbudak et al., 2002; Bar-Even et al., 2006), or fluctuations that propagate to the gene from external sources, such as varying numbers of transcription factors or ribosomes (Pedraza and van Oudenaarden, 2005; Paulsson, 2004). Previous work has hypothesized that microRNAs should be able to reduce gene expression noise when their repressive post-transcriptional effects are counteracted by accelerated transcriptional dynamics (Ebert and Sharp, 2012; Noorbakhsh et al., 2013). However, this has not been shown experimentally, and since microRNA levels are themselves variable, the propagation of their fluctuations should in theory contribute additional gene expression noise.
4.2 Effects of microRNAs on gene expression noise

To explore the effects of endogenous microRNAs on protein expression noise, we adapted a single-cell plasmid reporter system (Mukherji et al., 2011) to measure microRNA-dependent expression fluctuations in mouse embryonic stem cells (mESC). The plasmid contains two genes that encode fluorescent proteins (ZsGreen and mCherry), which are transcribed from a common bidirectional promoter (Figure 4.1b).

[Figure 4.1 image: panels (a) to (e); axes include ZsGreen and mCherry intensity [a.u.], and mCherry noise vs. mCherry intensity mean [a.u.] for one perfect, one bulged and four bulged miR-20a sites vs. no 3'UTR]

Figure 4.1 | Opposing noise effects of microRNA regulation at low and high gene expression. (a) Model scheme for the expression of a microRNA-regulated gene. The microRNA can reversibly bind the mRNA (not depicted) to inhibit its translation and decrease its stability. If the mRNA is degraded in the mRNA-microRNA complex, the microRNA is recycled. Noise in gene expression originates from the stochasticity of molecular reactions (intrinsic noise; jagged reaction arrows) or from variability in the cellular machinery (extrinsic noise; external factors with fluctuating levels). (b) The plasmid reporter system. The plasmid carries a pTRE-Tight bidirectional promoter from which ZsGreen and mCherry are transcribed. The mCherry 3'UTR can be modified to contain no microRNA binding sites or a given number and type of sites.
(c) Overlay of two flow cytometry measurements of mouse embryonic stem cells transiently transfected with two different variants of the reporter system, one with no mCherry 3'UTR (black) and the other with four bulged miR-20a binding sites in the mCherry 3'UTR (blue). For further processing we binned cells according to ZsGreen intensity (red vertical lines) and discarded cells in the ZsGreen background (grey) (see Appendix C, Methods). a.u.: arbitrary units. (d) Example of mCherry intensity distributions in one ZsGreen bin. In each bin we calculate the mean and the noise, defined as the coefficient of variation (standard deviation divided by mean), of the mCherry intensity distribution. (e) Noise of mCherry intensity as a function of mean mCherry intensity in each bin for three different miR-20a-regulated constructs (blue) compared to the respective unregulated constructs (black). Panels are ordered from left to right according to increasing repression of the constructs by miR-20a (cf. Figure C.1). Dots and error bars represent data mean and bootstrapped standard deviation, respectively. Dashed lines and patches represent optimal model fit and 95% confidence interval, respectively.

To probe the effect of microRNAs, we constructed variants of the plasmid with different numbers and types of microRNA binding sites in the 3'UTR of the mCherry gene. We transfected the plasmids into mESCs and quantified single-cell fluorescence two days later using a flow cytometer (Figure 4.1c). We used ZsGreen fluorescence intensity to bin cells with similar transcriptional activity (e.g. due to varying plasmid copy numbers), and in each bin we calculated the mean and noise of mCherry intensities over all cells in the bin (Figure 4.1d; see Appendix C, Materials and Methods and Supplementary Note).
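The per-bin statistics can be sketched as follows, assuming single-cell intensities are already available as NumPy arrays. Noise is computed as the coefficient of variation, with a bootstrapped standard deviation as the error bar. The bin edges, the minimum number of cells per bin, the number of bootstrap resamples and the synthetic data are hypothetical choices for illustration; the binning and background gating actually used are described in Appendix C.

```python
import numpy as np


def binned_noise(zsgreen, mcherry, edges, n_boot=200, min_cells=50, seed=0):
    """Mean and noise (CV) of mCherry within ZsGreen bins.

    Cells below edges[0] (ZsGreen background) or above edges[-1] are discarded.
    Returns (bin_index, mean, cv, bootstrap_sd_of_cv) tuples.
    """
    rng = np.random.default_rng(seed)
    bins = np.digitize(zsgreen, edges)  # 0 = below first edge, len(edges) = above last
    out = []
    for b in range(1, len(edges)):
        vals = mcherry[bins == b]
        if vals.size < min_cells:
            continue
        cv = vals.std() / vals.mean()
        boot = []
        for _ in range(n_boot):
            sample = rng.choice(vals, size=vals.size, replace=True)
            boot.append(sample.std() / sample.mean())
        out.append((b, float(vals.mean()), float(cv), float(np.std(boot))))
    return out


# Synthetic cells (hypothetical): mCherry tracks ZsGreen with 30% lognormal noise.
rng = np.random.default_rng(1)
zsgreen = 10 ** rng.uniform(1.5, 4.0, 20000)
mcherry = zsgreen * rng.lognormal(mean=0.0, sigma=0.3, size=zsgreen.size)
edges = np.logspace(2, 4, 21)  # 20 log-spaced bins; cells below 10^2 count as background
for b, mean, cv, sd in binned_noise(zsgreen, mcherry, edges):
    print(f"bin {b:2d}  mean={mean:9.1f}  CV={cv:.3f} +/- {sd:.3f}")
```

Narrow, log-spaced bins keep the spread of ZsGreen within a bin small, so that the CV mostly reflects mCherry variability among cells with similar transcriptional input.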
We define noise as the standard deviation of the protein expression distribution divided by its mean, which is an intuitive measure of the relative size of expression fluctuations. We started by assessing the effects of miR-20a, a microRNA endogenously expressed in mESC, on mCherry protein expression noise (Figure 4.1e). In cells with low mCherry expression, miR-20a regulation reduces noise compared to an unregulated control. In contrast, in cells with high mCherry expression, miR-20a regulation increases noise. These changes in mCherry noise are more pronounced for reporters in which miR-20a represses mCherry protein more strongly, e.g. when using perfect or multiple target sites (Figure 4.1f,g and Figure C.1). We used a mathematical model to understand these opposing effects of microRNA regulation on protein expression noise (see Appendix C, Supplementary Model). In this work, we adopt the commonly used decomposition of total noise η_tot into intrinsic noise η_int and extrinsic noise η_ext (Elowitz, Levine, et al., 2002; Swain et al., 2002), in which the squared total noise is the sum of the squared intrinsic and extrinsic noise components:

$$\eta_{tot}^2 = \eta_{int}^2 + \eta_{ext}^2 \qquad (4.1)$$

[Figure 4.2 image: (a) intrinsic noise η_int, (b) extrinsic noise η_ext and (c) total noise η_tot vs. protein expression [a.u.], with and without mRNA-miRNA interaction]

Figure 4.2 | Noise model predictions for a microRNA-regulated gene. (a) Intrinsic noise due to low molecule numbers declines with increasing expression. MicroRNA regulation reduces intrinsic noise as a function of repression, because higher mRNA numbers are required and the propagation of noise from the mRNA to the protein level is dampened.
(b) MicroRNA regulation results in additional extrinsic noise due to fluctuations in the microRNA pool, which are propagated to the target gene depending on the conferred repression and the saturation of the microRNA pool (cf. Figure C.2). (c) The net influence of microRNA regulation is decreased total noise at low and increased total noise at high expression levels.

Intrinsic noise stems from the reactions internal to the expression of the gene and is dominated by transcriptional dynamics and low mRNA copy numbers. Extrinsic noise stems from fluctuations propagating from external factors to the gene (Figure 4.2a). The modeling results in two key predictions. Firstly, the model predicts that a microRNA-regulated gene (reg) has reduced intrinsic noise compared to an unregulated gene (unreg) at equal protein expression levels; the size of the intrinsic noise reduction is approximately equal to the square root of the microRNA-mediated fold-repression r (Figure 4.2a):

$$\frac{\eta_{int}^{unreg}}{\eta_{int}^{reg}} \approx \sqrt{r} \qquad (4.2)$$

The model predicts that the effect and its size are independent of the mode of repression: translational inhibition requires higher mRNA levels and therefore reduces the intrinsic noise resulting from low mRNA copy numbers, while accelerated mRNA degradation dampens the propagation of noise from the mRNA to the protein level (see Appendix C, Supplementary Note 1; Ebert et al., 2012; Pedraza et al., 2005; Fraser et al., 2004). To achieve equal protein expression despite increased mRNA turnover, transcription rates must be increased. The reduction of intrinsic noise can therefore be understood as the combined effect of microRNA-mediated accelerated turnover and increased transcriptional activity (Ebert and Sharp, 2012).
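The square-root scaling of Equation 4.2 can be checked numerically in the textbook two-stage model of gene expression (constitutive transcription at rate k_m, mRNA decay γ_m, translation k_p per mRNA, protein decay γ_p), whose intrinsic protein noise is η² = 1/⟨p⟩ + (1/⟨m⟩)·γ_p/(γ_m + γ_p). This is a simplified stand-in, not the Supplementary Model of Appendix C: microRNA regulation is represented only as an r-fold reduction in translation or an r-fold increase in mRNA decay, compensated by r-fold faster transcription so that mean protein stays equal, and the rate constants are hypothetical.

```python
import math


def intrinsic_cv(k_m, gamma_m, k_p, gamma_p):
    """Intrinsic protein noise (CV) of the two-stage model:
    eta^2 = 1/<p> + (1/<m>) * gamma_p / (gamma_m + gamma_p)."""
    m = k_m / gamma_m          # mean mRNA number
    p = m * k_p / gamma_p      # mean protein number
    return math.sqrt(1.0 / p + (1.0 / m) * gamma_p / (gamma_m + gamma_p))


def noise_reduction(r, k_m=10.0, gamma_m=1.0, k_p=1000.0, gamma_p=0.01):
    """eta_unreg / eta_reg at equal mean protein for r-fold repression.

    Returns (translational repression, accelerated mRNA decay)."""
    unreg = intrinsic_cv(k_m, gamma_m, k_p, gamma_p)
    # translation slowed r-fold; r-fold more transcription restores mean protein
    translational = intrinsic_cv(r * k_m, gamma_m, k_p / r, gamma_p)
    # mRNA decay sped up r-fold; r-fold more transcription restores mean protein
    decay = intrinsic_cv(r * k_m, r * gamma_m, k_p, gamma_p)
    return unreg / translational, unreg / decay


for r in (2, 4, 9):
    t, d = noise_reduction(r)
    print(f"r={r}: sqrt(r)={math.sqrt(r):.3f}, translational={t:.3f}, decay={d:.3f}")
```

With protein much more stable than mRNA (γ_p ≪ γ_m) and ⟨p⟩ large, both modes give a noise reduction close to √r; if γ_p approaches γ_m, the decay mode falls short of √r in this simplified model.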
Secondly, the model predicts (Figure 4.2b and Figure C.2) that microRNA regulation acts as an additional extrinsic noise source given by

$$\eta_{ext} = \theta\,\eta_{\mu} \qquad (4.3)$$

where η_μ denotes the noise in the pool of regulating microRNAs (see Appendix C, Supplementary Model) and θ is the microRNA repression (see Figure C.2). The combined effects of decreased intrinsic and additional extrinsic noise result in decreased total noise at low expression but increased total noise at high expression (Figure 4.2c), and model fits, with the microRNA pool noise as the only free parameter, agree closely with the experimentally observed total noise profiles (Figure 4.1e-g).

To distinguish the effects of microRNA regulation on intrinsic and extrinsic noise experimentally, we modified our plasmid reporter system such that both ZsGreen and mCherry are regulated by miR-20a through identical 3'UTRs (Figure 4.3a and Figure C.3a). As a result of this design, both fluorescent reporters share the same regulatory inputs and cellular environment, and intracellular differences in their expression can only result from processes inherent to each gene, i.e. the processes that create intrinsic noise (Elowitz, Levine, et al., 2002; Swain et al., 2002). Results from this experimental design show that miR-20a regulation reduces intrinsic noise compared to an unregulated construct (Figure 4.3b and Figure C.3b). As predicted by our model, the intrinsic noise is reduced by the square root of the fold-repression conferred by miR-20a (Figure 4.3c; see also Figure C.3d), confirming our results reported in Figure 4.1c. These results further imply that the observed increase in total noise at high mCherry expression must be due to additional extrinsic noise (Figure C.3c).

[Figure 4.3 image: (a) bi-regulated pTRE-Tight reporter with identical 3'UTRs; (b) intrinsic noise vs. mean mCherry + ZsGreen intensity [a.u.]; (c) intrinsic noise reduction vs. sqrt(fold-repression) for one perfect, one bulged and four bulged miR-20a sites]

Figure 4.3 | MicroRNA-mediated intrinsic noise effects. (a) Modified plasmid reporter system in which ZsGreen and mCherry have identical 3'UTRs, which allows expression-dependent intrinsic noise to be quantified. (b) Intrinsic noise as a function of the mean ZsGreen and mCherry intensities in each bin, showing that microRNA regulation reduces intrinsic noise. Dots and error bars represent data mean and bootstrapped standard deviation, respectively. Dashed lines and patches represent optimal model fit and 95% confidence interval, respectively. (c) Measured intrinsic noise reduction for bi-regulated constructs is proportional to the square root of fold-repression, as measured independently with mCherry-regulated constructs (cf. Figure C.1). Error bars indicate standard deviation of three biological replicates.

In summary, our data show that miR-20a regulation reduces intrinsic noise while it increases the extrinsic noise of target genes, resulting in lower total noise at low expression but increased total noise at high expression levels. Our analyses so far suggest that the reduction of intrinsic noise is a generic property of microRNAs as post-transcriptional repressors of protein expression, and therefore noise reduction should occur irrespective of the specific microRNA or the molecular details of the mRNA-microRNA interaction. In contrast, additional extrinsic noise stems from the variability of the microRNA pools and should therefore depend on the specific microRNA. To investigate these hypotheses, we constructed reporters with binding sites for eight additional microRNAs that are endogenously expressed in mESC over a wide range of levels (Figure C.4).
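The intrinsic/extrinsic split used for the dual-reporter design of Figure 4.3 follows the two-color formulas of Elowitz et al. (2002) and Swain et al. (2002). A minimal sketch, assuming each reporter is first divided by its population mean so that the two colors are on a common scale (a simplification; the per-bin procedure used in this chapter is described in Appendix C). The gamma-distributed synthetic cells, with a shared extrinsic factor of CV 0.3 and independent intrinsic factors of CV 0.2, are hypothetical.

```python
import numpy as np


def dual_reporter_noise(green, red):
    """Intrinsic, extrinsic and total noise (as CVs) from paired two-color data."""
    g = np.asarray(green, dtype=float)
    c = np.asarray(red, dtype=float)
    g = g / g.mean()  # put both reporters on a common, mean-1 scale
    c = c / c.mean()
    eta_int2 = np.mean((g - c) ** 2) / 2.0  # part not shared by the two colors
    eta_ext2 = np.mean(g * c) - 1.0         # covariance shared by both colors
    eta_tot2 = eta_int2 + eta_ext2          # Equation 4.1
    return (float(np.sqrt(eta_int2)),
            float(np.sqrt(max(eta_ext2, 0.0))),
            float(np.sqrt(eta_tot2)))


rng = np.random.default_rng(2)
n = 200_000
extrinsic = rng.gamma(shape=1 / 0.09, scale=0.09, size=n)      # shared factor, CV 0.3
green = extrinsic * rng.gamma(shape=25.0, scale=0.04, size=n)  # intrinsic factor, CV 0.2
red = extrinsic * rng.gamma(shape=25.0, scale=0.04, size=n)
eta_int, eta_ext, eta_tot = dual_reporter_noise(green, red)
print(f"intrinsic {eta_int:.3f}, extrinsic {eta_ext:.3f}, total {eta_tot:.3f}")
```

For these multiplicative factors the intrinsic estimate should come out near √(1.09 × 0.04) ≈ 0.21 (the shared factor slightly inflates it) and the extrinsic estimate near 0.3.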
Since the molecular details of mRNA-microRNA interactions do not affect microRNA-mediated noise effects, we chose perfect target sites to allow high specificity with respect to the regulating microRNA pool and to optimize measurement signals. The data from all eight reporters consistently show intrinsic noise reduction as large as the square root of fold-repression (Figure C.3e), and we additionally confirmed this by directly measuring intrinsic noise reduction for miR-291a (cf. Figure 4.3c). We furthermore found that AU-rich elements, which induce post-transcriptional repression of protein expression through the binding of various co-factors (Barreau et al., 2005), also reduce intrinsic noise by the square root of fold-repression (Figure C.3f). These data therefore support the hypothesis that the reduction of intrinsic noise is a generic property of microRNAs as post-transcriptional repressors, independent of the identity of the regulating microRNA.

Next, we used our mathematical model to extract the microRNA pool noise from fits to the experimental data. We find that microRNA pool noise differs across all assayed microRNAs (Figure 4.4a), while estimates of microRNA pool noise from different constructs assaying the same microRNA are similar (Figure C.7), validating that our model fits can faithfully estimate microRNA pool noise. Although microRNA pool noise decreases for microRNAs that repress the reporters more strongly, it remains substantial even for the most highly expressed microRNAs in mESC (the miR-290 cluster, including miR-290, miR-291a and miR-295; Marson et al., 2008). Interestingly, we find that the subset of assayed microRNAs with two independent gene copies producing the identical mature microRNA (Figure 4.4a, marked in red) tend to have lower microRNA pool noise than microRNAs that confer similar repression but have only one gene copy (Figure 4.4a, marked in black).

[Figure 4.4 image: (a) microRNA pool noise vs. fold-repression for nine microRNAs; (b) pool noise of single vs. mixed microRNA pools; (c, d) mCherry noise vs. mCherry intensity mean [a.u.] for Wee1 and Lats2 3'UTRs, wild-type vs. mutated; (e) mCherry mRNA levels [RPKM] relative to the mESC transcriptome; (f) relative noise change vs. mRNA levels [RPKM] for Wee1, Lats2, Casp2 and Rbl2 3'UTRs]

Figure 4.4 | Estimation of microRNA pool noise and noise effects for endogenous genes. (a) MicroRNA pool noise estimates from reporters with perfect target sites for nine different microRNAs endogenously expressed in mESC. Subsets of microRNAs with one (black) or two gene copies (red) show negative scaling of pool noise with conferred fold-repression, with the latter subset having lower noise levels. (b) MicroRNA pool noise estimates for individual pools of miR-16, miR-20a and miR-290 compared to mixed pools of miR-16 + miR-20a and miR-20a + miR-290, as determined from reporters regulated by two perfect target sites for the respective microRNA species. Red bars in the columns for mixed pools show the expected microRNA pool noise if the individual microRNA sub-pools were fully correlated. (c) Total noise levels for the 3'UTR of the cell cycle regulator Wee1, wild-type (blue) and microRNA-binding-site point-mutated (black) versions.
(d) Total noise levels for the 3'UTR of the tumor suppressor Lats2, wild-type (blue) and microRNA-binding-site point-mutated (black) versions. (e) Mapping fluorescent reporter levels to the transcriptome of mESC. (Upper panel) FACS sorting and least-squares regression were used to determine the conversion between mean mCherry fluorescence intensities and mCherry mRNA levels (as measured by RNA-seq). (Lower panel) The range covered by the fluorescent reporter system in relation to transcriptome expression (n = 13751) in mESC (25% to ~99% of transcriptome expression). (f) Relative microRNA-mediated effects on total noise in the assayed endogenous 3'UTRs compared to their point-mutated versions, as a function of transcriptome expression. Blue line and area represent model-based extrapolation of noise effects to transcriptome expression (mean and 95% confidence interval based on parameter estimates of n = 3 measurements). Black dots indicate the crossover from reduced to increased total noise. Red dots indicate the endogenous transcriptome expression of the respective gene in mESC. Red dashed lines indicate the maximally expected reduction of total noise given the observed repression. Error bars in (a) and (b) indicate standard deviation of at least three biological replicates. In (c) and (d), dots and error bars represent data mean and bootstrapped standard deviation, respectively; dashed lines and patches represent optimal model fit and 95% confidence interval.

This suggests that microRNA pools could have lower noise if they consist of independently transcribed microRNAs. We reasoned that this finding should extend to genes that are regulated by different microRNAs, where uncorrelated fluctuations between the different microRNAs can average out, resulting in lower noise of the overall pool.
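How much averaging to expect can be illustrated with the variance of a sum. A minimal sketch, assuming that the effective pool is simply the sum of sub-pools whose fluctuations share a single pairwise correlation coefficient `rho` (an assumption; the expected values for fully correlated pools in Figure 4.4b were computed as described in Appendix C). For `rho = 1` the pool noise equals the abundance-weighted mean of the sub-pool noises; for independent sub-pools it is lower. Pool sizes and noise values are hypothetical.

```python
import math


def pool_noise(means, cvs, rho):
    """Noise (CV) of a pool formed by summing sub-pools.

    means: mean abundance of each sub-pool; cvs: their noise (CV);
    rho: one pairwise correlation coefficient assumed for every pair.
    """
    sds = [m * cv for m, cv in zip(means, cvs)]
    var = sum(sd * sd for sd in sds)
    for i in range(len(sds)):
        for j in range(i + 1, len(sds)):
            var += 2.0 * rho * sds[i] * sds[j]
    return math.sqrt(var) / sum(means)


# Two equally abundant sub-pools (hypothetical values), each with 40% noise.
for rho in (1.0, 0.5, 0.0):
    print(f"rho = {rho:.1f}: pool noise = {pool_noise([1.0, 1.0], [0.4, 0.4], rho):.3f}")
```

The loop prints 0.400, 0.346 and 0.283: fully correlated sub-pools give no benefit, whereas two independent, equally abundant sub-pools reduce pool noise by a factor of √2.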
To test this hypothesis, we constructed reporters with a perfect target site for miR-20a and an additional perfect target site for either miR-16 or miR-290 in the mCherry 3'UTR, and compared them to reporters with two perfect target sites for miR-16, miR-20a or miR-290, respectively. When estimating microRNA pool noise from the total noise profiles (Figure C.8), we find that the noise levels in the mixed pools are lower than expected if the individual microRNA pools were fully correlated (see Appendix C, Methods), and can be lower than the noise in the individual microRNA pools (Figure 4.4b). Taken together, these experiments show that although microRNA regulation increases extrinsic protein expression noise, mixed pools of microRNAs can attenuate this effect.

So far we have investigated microRNA-mediated noise effects using nearly or fully complementary microRNA binding sites in an artificial 3'UTR setting. Endogenous microRNA targets, however, often harbor many binding sites, albeit with less complementarity, for different microRNAs in their 3'UTRs (Friedman et al., 2009; Enright et al., 2003; Krek et al., 2005; Stark et al., 2005). To test whether our findings also extend to such situations, we constructed four mCherry reporters with the 3'UTRs of the genes Wee1, Lats2, Casp2 and Rbl2, which all have multiple binding sites for different microRNAs endogenously expressed in mESC. We then compared protein expression noise for constructs with the wild-type 3'UTRs to versions with point-mutated microRNA binding sites (see Appendix C, Methods). Together, the microRNAs confer between 3- and 5.5-fold repression for the wild-type 3'UTRs compared to the point-mutated 3'UTRs (Figure C.9a). For all wild-type 3'UTRs we observed reduced total noise at low and intermediate expression compared to the mutated 3'UTRs (Figure 4.4c,d and Figure C.9a).
As observed for the artificial 3'UTR constructs, intrinsic noise for the wild-type 3'UTR constructs is reduced by the square root of fold-repression (Figure C.3g), indicating that our previous findings on the reduction of intrinsic noise can be extrapolated to endogenous microRNA targets. Interestingly, total noise is hardly increased at high expression levels, and the estimated noise levels of the mixed microRNA pools regulating the endogenous 3'UTRs are very low compared to those estimated for single microRNA pools (Figure C.9b), consistent with the finding above that mixing different microRNA species lowers microRNA pool noise.

Finally, we asked whether the expression range covered by our reporter assay includes relevant expression levels of endogenous genes. We collected cells at different mCherry fluorescence intensities using fluorescence-activated cell sorting, and measured mCherry mRNA levels together with the whole transcriptome using mRNA sequencing (see Appendix C, Methods, and Figure C.10a). We find that our reporter assay covers the range of 25% to 99% (~1 RPKM to ~500 RPKM) of expressed genes in mESC (Figure 4.4e), indicating that the noise effects observed in our reporter assay are relevant to endogenous genes. For all four 3'UTRs that we assayed with our reporter, the reduction of total noise extends in a graded fashion up to the top 10% of the transcriptome expression distribution (Figure 4.4f).

While most microRNAs individually repress genes only to a small extent (11, 12), we find that hundreds of genes are substantially repressed (>2-fold) by the combinatorial action of microRNAs in mESC (Figure C.11), as determined from data comparing transcriptome expression between wild-type and microRNA-deficient Dicer knockout mESC (Leung et al., 2011). Furthermore, most of the highly repressed genes have low expression levels (see Figure C.11; Stark et al., 2005; Farh et al., 2005; Sood et al., 2006), suggesting that these genes should have reduced protein expression noise as a consequence of microRNA regulation.

4.3 Conclusions

Genome-scale analysis of microRNA binding data (Farh et al., 2005; Sood et al., 2006) has shown that microRNAs preferentially target lowly expressed genes, which are dominated by intrinsic noise, while selectively avoiding ubiquitous and highly expressed genes, which are more sensitive to extrinsic fluctuations. Our integrated theoretical and experimental approach has shown that microRNAs reduce intrinsic noise while increasing extrinsic noise. Together, these results suggest that a common effect of microRNAs is to reduce gene expression noise. Our work has further shown that combinatorial microRNA regulation, a widely observed phenomenon in vivo (Friedman et al., 2009; Enright et al., 2003; Krek et al., 2005; Stark et al., 2005), enhances overall noise reduction by amplifying repression and buffering stochastic fluctuations in the abundance of single microRNAs. Combinatorial microRNA regulation may thus be a potent mechanism to reinforce cellular identity by reducing gene expression fluctuations that are undesirable for the cell. The principle established in this work is that fluctuations in protein abundance can be effectively regulated at the mRNA level. Here, we have focused on the capacity of microRNAs to regulate gene expression noise; however, any translationally invariant mechanism that decreases the timescale of mRNA fluctuations will, in principle, produce a similar effect. This conceptual perspective provides a foundation for studying a broad range of post-transcriptional regulators as alternative instruments for controlling protein noise.

Chapter 5
Conclusions and Future Directions

It is now well established that miRNAs play an important role in gene regulation through either translational repression or mRNA degradation. Because each miRNA can target many different mRNA species, its impact may be broad.
In this thesis we have investigated the ceRNA hypothesis, which proposes a new layer of post-transcriptional gene regulation mediated by the titration of common miRNAs by competing targets. This RNA-RNA crosstalk effect is a subject of intense activity and controversy. Indeed, it is difficult to imagine that perturbing the expression of individual miRNA targets, which make up only a small part of the total number of binding sites in the cell, could sequester enough miRNA to significantly change the repression of other targets. The focus of this work has been to interrogate key questions about the ceRNA mechanism: its generality, its dependence on shared miRNAs, and the size of the effect. We aimed to answer these questions by integrating three kinds of experiments: (a) perturbing the levels of 3 known ceRNAs and systematically searching for miRNA-mediated crosstalk effects on the transcriptome; (b) modulating the levels of binding sites in the cell by over-expressing an endogenous PTEN 3'UTR and sorting cells carrying specific amounts of the PTEN 3'UTR to isolate dose-dependent crosstalk effects; and (c) quantifying the expression and spatial localization of ceRNAs in single cells.

While initial studies of the ceRNA hypothesis were restricted to a few computationally predicted ceRNAs, our results show that an appreciable crosstalk effect exists quite pervasively across the genome, i.e., the levels of hundreds of genes, across all expression scales, appear co-regulated along with the perturbed sender. By carefully selecting genes whose crosstalk was lowered in a miRNA-deficient control, we could ascertain that the effect was miRNA-mediated. More specifically, the size of the crosstalk effect correlates with the number of shared miRNAs and the quality of the miRNA binding sites in the receiver genes. Thus both the overlap of miRNA binding sites and the affinity of those miRNAs are important determinants of crosstalk.
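The miRNA-dependence filter described above can be sketched as follows. Crosstalk strength is taken here as the ratio of receiver to sender log fold change, so that a receiver changing as much as the sender scores 1; this definition, the `min_drop` cut-off, the function names and the toy fold changes are assumptions for illustration rather than the exact criteria used in Chapter 2.

```python
import math


def crosstalk(receiver_fc, sender_fc):
    """Crosstalk strength as the ratio of receiver to sender log fold change."""
    return math.log(receiver_fc) / math.log(sender_fc)


def mirna_dependent_receivers(fc_wt, fc_dicer, sender_fc_wt, sender_fc_dicer, min_drop=0.1):
    """Receivers whose crosstalk is at least `min_drop` lower in miRNA-deficient cells.

    fc_wt, fc_dicer: dict mapping gene -> receiver fold change after the same
    sender perturbation in wild-type and DICER -/- cells (sender fold change != 1).
    """
    hits = {}
    for gene in sorted(fc_wt.keys() & fc_dicer.keys()):
        chi_wt = crosstalk(fc_wt[gene], sender_fc_wt)
        chi_dicer = crosstalk(fc_dicer[gene], sender_fc_dicer)
        if chi_wt - chi_dicer >= min_drop:
            hits[gene] = (chi_wt, chi_dicer)
    return hits


# Toy example (hypothetical fold changes): sender knocked down to 25% in both cell lines.
wt = {"geneA": 0.70, "geneB": 0.95, "geneC": 1.20}
dicer = {"geneA": 0.98, "geneB": 0.96, "geneC": 1.20}
for gene, (chi_wt, chi_dicer) in mirna_dependent_receivers(wt, dicer, 0.25, 0.25).items():
    print(f"{gene}: crosstalk {chi_wt:.2f} (WT) vs {chi_dicer:.2f} (DICER -/-)")
```

Only geneA passes: its co-regulation with the sender largely disappears without mature miRNAs, whereas geneB barely responds in either line and geneC changes identically in both.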
In the case of VAPA, PTEN and CNOT6L, we found that shared miRNA binding sites made their interactions reciprocal: perturbations in each caused changes in the others. These findings suggest that combinatorial miRNA targeting could be a mechanism that cells use to concordantly shape the expression of an entire class of genes that may function in similar pathways or need to be expressed stoichiometrically.

[Figure 5.1 image: binding equation curve, binding or unbinding (%) vs. free miRNA concentration F (units of Kd), from 1 to 10,000]

Figure 5.1 | Colocalization of ceRNAs can enhance crosstalk by increasing their local concentrations, hence promoting rates of miRNA association between ceRNAs, as free miRNAs are more likely to bind to nearby mRNAs than to other targets (adapted from Jens (2015)).

The size of the ceRNA effect was bounded by 1 for all receivers, for each of the 3 senders. That is, the fold change in a receiver was always lower than the fold change in the perturbed sender. The existence of a hard bound emerged naturally from our minimal ceRNA model, because each receiver is only weakly repressed by a miRNA and each sender sequesters only a fraction of the total miRNA pool. However, the moderate crosstalk strength we measured for many genes was still larger than predicted by our minimal miRNA-ceRNA model and by other steady-state models of target competition. To examine this discrepancy in more detail, we used smFISH to measure the intracellular concentrations of these molecules at the molecular level and, surprisingly, found a colocalization of different ceRNA species with each other. We therefore hypothesize that the strong crosstalk for PTEN, VAPA and CNOT6L (between 0.2 and 0.5), and possibly for other ceRNAs, might be explained by localization. Effectively, localization renders the available pool of interacting binding sites much smaller than the total, amplifying crosstalk between select ceRNAs.
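Why shrinking the pool of competing sites amplifies crosstalk can be seen in a deliberately stripped-down equilibrium titration sketch, which is not the minimal ceRNA model of this thesis. All assumptions are hypothetical: `total_mirna` miRNAs bind identical sites with dissociation constant `kd`; the free miRNA level follows from mass balance; the receiver's own sites are counted in the background and it is repressed by at most `max_repression` in proportion to site occupancy; crosstalk is the change in ln(receiver) per change in ln(sender) when the sender's sites are doubled. A "local" pool with few background sites, and the same miRNA number for simplicity, stands in for colocalized ceRNAs.

```python
import math


def free_mirna(total_mirna, total_sites, kd):
    """Free miRNA mu solving total_mirna = mu + total_sites * mu / (mu + kd), by bisection."""
    lo, hi = 0.0, total_mirna
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + total_sites * mid / (mid + kd) < total_mirna:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def receiver_level(total_sites, total_mirna, kd, max_repression):
    """Receiver abundance relative to no miRNA, repressed in proportion to site occupancy."""
    mu = free_mirna(total_mirna, total_sites, kd)
    occupancy = mu / (mu + kd)
    return 1.0 / (1.0 + max_repression * occupancy)


def crosstalk_strength(background_sites, sender_sites, total_mirna, kd=1.0, max_repression=0.5):
    """Change in ln(receiver) per change in ln(sender) when the sender's sites are doubled."""
    before = receiver_level(background_sites + sender_sites, total_mirna, kd, max_repression)
    after = receiver_level(background_sites + 2 * sender_sites, total_mirna, kd, max_repression)
    return math.log(after / before) / math.log(2.0)


# Hypothetical numbers: a whole-cell pool with many competing sites vs. a small local pool.
chi_global = crosstalk_strength(background_sites=10_000, sender_sites=500, total_mirna=100.0)
chi_local = crosstalk_strength(background_sites=100, sender_sites=50, total_mirna=100.0)
print(f"global crosstalk {chi_global:.4f}, local crosstalk {chi_local:.3f}")
```

With these numbers the local crosstalk is roughly two orders of magnitude larger than the global one, yet both stay below 1; with `max_repression = 0.5` the sketch cannot exceed ln 1.5 / ln 2 ≈ 0.58, mirroring the weak-repression argument for the bound.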
Colocalization of bound ceRNAs increases their local concentrations, making it more likely that a miRNA unbinding from one ceRNA will rebind to another nearby ceRNA. We are currently working on extending our minimal model to take localization effects into account. Experimentally, one could apply new multiplexed smFISH technologies to search for colocalization among multiple ceRNAs (Lee 2014). Though more challenging, recently developed technologies for visualizing the sub-cellular localization of miRNAs (Pitchiaya 2014) could be used to probe the miRNAs we have identified and to search for spatial colocalization of ceRNA-miRNA pairs.
Genome-wide ceRNA studies have measured expression changes in population averages of cells after perturbing either the number of targets or the miRNAs, and have found crosstalk effects to be small. We think miRNA-mediated crosstalk effects are more visible in unperturbed single cells. As we found in Chapter 4, the pools of different miRNA families can themselves be quite noisy and propagate noise to their target proteins. Our work suggests that noise in the shared miRNA pool, together with miRNA-mediated coupling between ceRNAs, suppresses independent ceRNA fluctuations, leading to more correlated and even stoichiometric expression of genes in single cells. However, we caution that we have only demonstrated this effect in fixed cells, by observing differences in ceRNA correlations between HCT116 and miRNA-deficient DICER cells. Future studies could track ceRNA levels dynamically in single cells after antagonizing specific miRNAs to determine which miRNAs are responsible for the reduced fluctuations. Measuring correlations or noise would be a more sensitive readout of miRNA-induced interactions between ceRNAs than perturbing individual ceRNAs.
References
Ebert, M.S., Neilson, J.R., and Sharp, P.A. (2007). MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4, 721-726.
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
Selbach, M., Schwanhäusser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Wightman, B., Ha, I., and Ruvkun, G. (1993). Post-transcriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., and Horvitz, H.R. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.
Cai, S., Han, H.J., and Kohwi-Shigematsu, T. (2003). Tissue-specific nuclear architecture and gene expression regulated by SATB1. Nat. Genet. 34, 42-51.
Hansen, T.B., Kjems, J., and Damgaard, C.K. (2013). Circular RNA and miR-7 in cancer. Cancer Res. 73, 5609-5612.
Franco-Zorrilla, J.M., Valli, A., Todesco, M., Mateos, I., Puga, M.I., Rubio-Somoza, I., Leyva, A., Weigel, D., Garcia, J.A., and Paz-Ares, J. (2007). Target mimicry provides a new mechanism for regulation of microRNA activity. Nat. Genet. 39, 1033-1037.
Seitz, H. (2009). Redefining microRNA targets. Curr. Biol. 19, 870-873.
Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673-684.
Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W.J., and Pandolfi, P.P. (2010). A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033-1038.
Cesana, M., Cacchiarelli, D., Legnini, I., Santini, T., Sthandier, O., Chinappi, M., Tramontano, A., and Bozzoni, I. (2011). A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147, 358-369.
Lewis, B., Shih, I., et al. (2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.
Ebert, M.S., and Sharp, P.A. (2010). Emerging roles for natural microRNA sponges. Curr. Biol. 20, R858-R861.
Memczak, S., et al. (2013). Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495, 333-338.
Brewster, R.C., Weinert, F.M., Garcia, H.G., Song, D., Rydenfelt, M., and Phillips, R. (2014). The transcription factor titration effect dictates level of gene expression. Cell 156, 1312-1323.
Buchler, N.E., and Louis, M. (2008). Molecular titration and ultrasensitivity in regulatory networks. J. Mol. Biol. 384, 1106-1119.
Tay, Y., Kats, L., Salmena, L., Weiss, D., Tan, S.M., Ala, U., Karreth, F., Poliseno, L., Provero, P., Di Cunto, F., Lieberman, J., Rigoutsos, I., and Pandolfi, P.P. (2011). Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147, 344-357.
Karreth, F.A., et al. (2011). In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell 147, 382-395.
Sumazin, P., Yang, X., Chiu, H.S., Chung, W.J., Iyer, A., Llobet-Navas, D., Rajbhandari, P., Bansal, M., Guarnieri, P., and Silva, J. (2011). An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147, 370-381.
Yi, R., et al. (2008). A skin microRNA promotes differentiation by repressing 'stemness'. Nature 452, 225-229.
Sluijter, J.P.G., et al. (2010). MicroRNA-1 and -499 regulate differentiation and proliferation in human-derived cardiomyocyte progenitor cells. Arterioscler. Thromb. Vasc. Biol. 30, 859-868.
Cimmino, A., et al. (2005). miR-15 and miR-16 induce apoptosis by targeting Bcl2. Proc. Natl. Acad. Sci. USA 102, 13944-13949.
Jens, M., and Rajewsky, N. (2015). Competition between target sites of regulators shapes post-transcriptional gene regulation. Nat. Rev. Genet. 16, 113-126.
Nitzan, M., et al. (2014). Interactions between distant ceRNAs in regulatory networks. Biophys. J. 106, 2254-2266.
Salmena, L., Poliseno, L., Tay, Y., Kats, L., and Pandolfi, P.P. (2011). A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 146, 353-358.
Tay, Y., Rinn, J., and Pandolfi, P.P. (2014). The multilayered complexity of ceRNA crosstalk and competition. Nature 505, 344-352.
Figliuzzi, M., Marinari, E., and De Martino, A. (2013). MicroRNAs as a selective channel of communication between competing RNAs: a steady-state theory. Biophys. J. 104, 1203-1213.
Bosia, C., Pagnani, A., and Zecchina, R. (2013). Modelling competing endogenous RNA networks. PLoS ONE 8, e66609.
Denzler, R., Agarwal, V., Stefano, J., Bartel, D.P., and Stoffel, M. (2014). Assessing the ceRNA hypothesis with quantitative measurements of miRNA and target abundance. Mol. Cell 54, 766-776.
Cummins, J.M., et al. (2006). The colorectal microRNAome. Proc. Natl. Acad. Sci. USA 103, 3687-3692.
Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target mRNA abundance dilutes microRNA and siRNA activity. Mol. Syst. Biol. 6, 363.
Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595.
Yan, H., Choi, A.J., Lee, B.H., and Ting, A.H. (2011). Identification and functional analysis of epigenetically silenced microRNAs in colorectal cancer cells. PLoS ONE 6, e20628.
Garcia, D.M., et al. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat. Struct. Mol. Biol. 18, 1139-1146.
Robinson, M., and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25.
Falcon, S., and Gentleman, R. (2007). Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257-258.
Mukherji, S., Ebert, M.S., Zheng, G.X.Y., Tsang, J.S., Sharp, P.A., and van Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene expression. Nat. Genet. 43, 854-859.
Levine, E., McHale, P., and Levine, H. (2007). Small regulatory RNAs may sharpen spatial expression patterns. PLoS Comput. Biol. 3, e233.
Yuan, Y., Liu, B., Xie, P., Zhang, M.Q., Li, Y., Xie, Z., and Wang, X. (2015). Model-guided quantitative analysis of microRNA-mediated regulation on competing endogenous RNAs using a synthetic gene circuit. Proc. Natl. Acad. Sci. USA 112, 3158-3163.
Ala, U., Karreth, F.A., Bosia, C., Pagnani, A., Taulli, R., Leopold, V., Tay, Y., Provero, P., Zecchina, R., and Pandolfi, P.P. (2013). Integrated transcriptional and competitive endogenous RNA networks are cross-regulated in permissive molecular environments. Proc. Natl. Acad. Sci. USA 110, 7154-7159.
R Core Team (2011). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.
Hausser, J., and Zavolan, M. (2014). Identification and consequences of miRNA-target interactions: beyond repression of gene expression. Nat. Rev. Genet. 15, 599-612.
Bosson, A.D., Zamudio, J.R., and Sharp, P.A. (2014). Endogenous miRNA and target concentrations determine susceptibility to potential ceRNA competition. Mol. Cell 56, 347-359.
Broderick, J.A., and Zamore, P.D. (2014). Competitive endogenous RNAs cannot alter microRNA function in vivo. Mol. Cell 54, 711-713.
Supplementary Note
Derivation of the mathematical model of microRNA regulation
To investigate the regulation of genes by microRNAs, we build a kinetic model describing the expression of a gene that is regulated at the post-transcriptional level by a microRNA. We start with a previously published model of microRNA regulation [Mukherji et al., 2011] and extend it to include the competition of multiple mRNAs for the same microRNA and the turnover of the microRNA (see later section). The model is an ordinary differential equation model that describes the temporal evolution of the free mRNA levels $[m_i]$, as well as the levels of the complexes between mRNAs and the microRNA, $[m_i\mu]$, for an arbitrary number of regulated genes. We denote different genes and their associated parameters by subscripts. We assume that mRNA is transcribed at the constant rate $v_i$ and constitutively degraded at the rate $d_i^{m}[m_i]$. The mRNA can bind to the free microRNA $\mu$ to reversibly form the complex $m_i\mu$, with the associated on-rate $k_i^{on}[m_i][\mu]$ and off-rate $k_i^{off}[m_i\mu]$. When bound in the complex, the mRNA is degraded at the rate $d_i^{m\mu}[m_i\mu]$. Assuming mass-action kinetics, the ordinary differential equations for the free levels of an mRNA $m_i$ and the levels of the respective complex $m_i\mu$ can be written as

$$\frac{d[m_i]}{dt} = v_i - d_i^{m}[m_i] - k_i^{on}[m_i][\mu] + k_i^{off}[m_i\mu] \tag{1}$$

$$\frac{d[m_i\mu]}{dt} = k_i^{on}[m_i][\mu] - k_i^{off}[m_i\mu] - d_i^{m\mu}[m_i\mu] \tag{2}$$

Initially we assume that the turnover of microRNAs is much slower, and we therefore treat the total microRNA concentration as constant. Consequently, the following conservation relation holds:

$$[\mu_T] = [\mu] + \sum_{j=1}^{N}[m_j\mu] \tag{3}$$

where $[\mu_T]$, $[\mu]$ and $[m_j\mu]$ are the levels of total microRNA, free microRNA and the complexes formed by the microRNA with the regulated mRNAs $m_j$, $j = 1,\dots,N$, respectively. Solving equation 2 for the steady state, i.e. setting the time derivative of $[m_i\mu]$ equal to zero, we obtain

$$[m_i\mu] = \frac{[m_i][\mu]}{K_i} \tag{4}$$

where $K_i = (k_i^{off} + d_i^{m\mu})/k_i^{on}$ is the dissociation constant of the mRNA-microRNA interaction. It follows from equation 4 that the concentrations of the complexes formed by two different mRNAs with the microRNA are related by

$$\frac{[m_x\mu]}{[m_y\mu]} = \frac{[m_x]\,K_y}{[m_y]\,K_x} \tag{5}$$

Using equations 3 and 5, we can solve for the steady state of $[m_i\mu]$:

$$[m_i\mu] = \frac{[m_i]}{K_i}\,\frac{[\mu_T]}{1+w} \tag{6}$$

Here we define the sum over all free levels of regulated mRNAs, each normalized by its dissociation constant,

$$w = \sum_{j=1}^{N}\frac{[m_j]}{K_j} \tag{7}$$

as the microRNA workload. It follows from equations 4 and 6 that the inverse of one plus the microRNA workload is the fraction of free microRNA:

$$\frac{[\mu]}{[\mu_T]} = \frac{1}{1+w} \tag{8}$$

The workload describes the sequestration of the microRNA by all regulated mRNAs and therefore captures the competition between co-regulated genes. With equation 6 we can write the steady state of the free mRNA levels implicitly as

$$[m_i] = \frac{[m_i^0]}{1 + \dfrac{[\mu_i^*]}{K_i(1+w)}} \tag{9}$$

where we define

$$[m_i^0] = \frac{v_i}{d_i^{m}} \tag{10}$$

as the steady-state concentration of the mRNA when it is not regulated by a microRNA. It is also convenient to define the effective total microRNA concentration as

$$[\mu_i^*] = \frac{d_i^{m\mu}}{d_i^{m}}\,[\mu_T] \tag{11}$$

To quantify the effect of the microRNA regulation on the free levels of an mRNA, we introduce the repression

$$R_i = 1 - \frac{[m_i]}{[m_i^0]} \tag{12}$$

A repression of 0% ($R_i = 0$) means that the free levels of the mRNA are not changed by the microRNA regulation, and a repression of 100% ($R_i = 1$) means that the free mRNA is completely suppressed by the microRNA regulation. Using the implicit expression for the steady state of the free mRNA (equation 9), the repression of a regulated mRNA can be rewritten in terms of the workload as

$$R_i = \frac{[\mu_i^*]/\big(K_i(1+w)\big)}{1 + [\mu_i^*]/\big(K_i(1+w)\big)} \tag{13}$$

$$= \frac{[\mu_i^*]}{K_i(1+w) + [\mu_i^*]} \tag{14}$$

$$= \frac{x_i}{1 + w + x_i} \tag{15}$$

where

$$x_i = \frac{[\mu_i^*]}{K_i} = \frac{d_i^{m\mu}[\mu_T]}{d_i^{m}K_i}$$

is the ratio between the maximal microRNA-mediated mRNA degradation rate constant (at zero microRNA workload, $w = 0$) and the constitutive mRNA degradation rate constant.
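The steady-state expressions above can be checked directly. One way is to integrate equations 1 and 2 for two genes sharing one microRNA, with equation 3 supplying the free microRNA, and compare the long-time state with equations 4, 8 and 15. The sketch below does this with a hand-written fixed-step Runge-Kutta integrator using only the standard library. The parameter values are hypothetical and dimensionless, chosen only so that both genes are appreciably repressed; they are not fitted to any data in this thesis.

```python
"""Integrate equations 1-3 for two genes sharing one microRNA and compare the
long-time state with the steady-state results (equations 4, 8 and 15).
All parameter values are hypothetical and dimensionless."""

# Hypothetical parameters for genes i = 0, 1.
V = [10.0, 5.0]       # transcription rates v_i
D_M = [0.1, 0.1]      # constitutive degradation rate constants d_i^m
D_MMU = [1.0, 0.5]    # degradation rate constants in the complex d_i^{m mu}
K_ON = [0.1, 0.1]     # on-rate constants k_i^on
K_OFF = [0.5, 0.5]    # off-rate constants k_i^off
MU_T = 20.0           # total microRNA, constant (slow turnover assumption)


def rhs(state):
    """Equations 1 and 2; state = [m_0, m_1, c_0, c_1] with c_i = [m_i mu]."""
    m, c = state[:2], state[2:]
    mu = MU_T - sum(c)                                   # free miRNA from eq. 3
    dm = [V[i] - D_M[i] * m[i] - K_ON[i] * m[i] * mu + K_OFF[i] * c[i]
          for i in range(2)]
    dc = [K_ON[i] * m[i] * mu - K_OFF[i] * c[i] - D_MMU[i] * c[i]
          for i in range(2)]
    return dm + dc


def rk4(state, dt, n_steps):
    """Fixed-step classical fourth-order Runge-Kutta integration."""
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6.0 * (a + 2 * b + 2 * e + f)
                 for s, a, b, e, f in zip(state, k1, k2, k3, k4)]
    return state


# Start with no mRNA and integrate to t = 300, far beyond the relaxation time.
state = rk4([0.0, 0.0, 0.0, 0.0], dt=0.01, n_steps=30_000)
m, c = state[:2], state[2:]
mu = MU_T - sum(c)

K = [(K_OFF[i] + D_MMU[i]) / K_ON[i] for i in range(2)]        # K_i (eq. 4)
w = sum(m[i] / K[i] for i in range(2))                          # workload (eq. 7)
x = [D_MMU[i] * MU_T / (D_M[i] * K[i]) for i in range(2)]       # x_i (eq. 15)
R_sim = [1 - m[i] / (V[i] / D_M[i]) for i in range(2)]          # eq. 12
R_theory = [x[i] / (1 + w + x[i]) for i in range(2)]            # eq. 15

print("free miRNA fraction", mu / MU_T, "vs 1/(1+w)", 1 / (1 + w))
print("repression simulated", R_sim, "vs eq. 15", R_theory)
```

A fixed-step integrator keeps the sketch dependency-free. An adaptive ODE solver would be the more usual choice for stiffer parameter regimes.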
Therefore, at a given microRNA workload, the repression of any regulated mRNA is determined solely by its ratio $x_i$ between the maximal microRNA-mediated degradation rate constant and the constitutive degradation rate constant. The workload at which the repression of mRNA $m_i$ is reduced to half of its maximal value, attained at zero workload ($w = 0$), is

$$w_{1/2,i} = 1 + x_i, \qquad R_i(w_{1/2,i}) = \tfrac{1}{2}\,R_i(w = 0) \tag{16}$$

A larger ratio of microRNA-mediated to constitutive degradation rate constants therefore increases repression and also shifts the loss of repression to higher microRNA workloads.
The steady state of the mRNA can also be solved explicitly as

$$[m_i] = \frac{1}{2}\left([m_i^0] - [\mu_i^*] - K_i(1+w_i) + \sqrt{\big([m_i^0] - [\mu_i^*] - K_i(1+w_i)\big)^2 + 4[m_i^0]\,K_i(1+w_i)}\right) \tag{17}$$

where

$$w_i = w - \frac{[m_i]}{K_i} \tag{18}$$

is the workload of the microRNA contributed by all regulated mRNAs except $m_i$. The competition of co-regulated mRNAs results in an apparent dissociation constant $K_i^*$ for each regulated mRNA, which depends on the workload contributed by all co-regulated mRNAs:

$$K_i^* = K_i(1 + w_i) \tag{19}$$

Further, to quantify the influence of an mRNA on the microRNA, we introduce the fraction of microRNA sequestered by mRNA $m_i$:

$$S_i = \frac{[m_i\mu]}{[\mu_T]} = \frac{[m_i]/K_i}{1+w} \tag{20}$$

Quantification of mRNA crosstalk
To investigate the coupling between co-regulated genes that share a common microRNA regulation, we introduce the crosstalk strength. Crosstalk strength describes the relative change in the free level of the receiver $m_r$ upon a relative change in the free level of the sender $m_s$:

$$C_r^s = \frac{\partial\ln[m_r]}{\partial\ln[m_s]} = \frac{\partial[m_r]}{\partial[m_s]}\,\frac{[m_s]}{[m_r]} \tag{44}$$

Using the implicit equation for the steady state of the free mRNA levels (equation 9), written as $F_r = [m_r]\big(1 + [\mu_r^*]/(K_r(1+w))\big) - [m_r^0] = 0$, and the implicit function theorem, with the levels of all other mRNAs held fixed, we can rewrite equation 44 and solve it as

$$C_r^s = -\frac{[m_s]}{[m_r]}\,\frac{\partial F_r/\partial[m_s]}{\partial F_r/\partial[m_r]} \tag{45}$$

$$= \frac{[\mu_r^*]\,[m_s]\big/\big(K_rK_s(1+w)^2\big)}{1 + \dfrac{[\mu_r^*]}{K_r(1+w)} - \dfrac{[\mu_r^*]\,[m_r]}{K_r^2(1+w)^2}} \tag{46}$$

$$= \frac{x_r\,[m_s]/K_s}{(1+w)^2 + x_r(1+w_r)} \tag{47}$$

Crosstalk strength is always positive, because all terms in equation 47 are positive.
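The explicit solution (equation 17) and the closed form of the crosstalk strength (equation 47) can be cross-checked numerically. The sketch below uses hypothetical, dimensionless parameter values. It computes the free receiver level from equation 17, holding the sender level and a background workload `W_BG` from all other mRNAs fixed, and verifies that the result satisfies the implicit equation 9. It then compares a central finite-difference estimate of the logarithmic derivative in equation 44 with equation 47. The constant names are illustrative and not taken from the thesis.

```python
"""Cross-check equation 17 (explicit steady state) against equation 9, and
equation 47 (crosstalk strength) against a finite-difference derivative.
All parameter values are hypothetical and dimensionless."""

import math

# Hypothetical parameters: receiver r, sender s, and a fixed background workload.
M0_R = 100.0        # unregulated receiver level [m_r^0] (eq. 10)
MU_STAR_R = 200.0   # effective total miRNA for the receiver [mu_r^*] (eq. 11)
K_R = 10.0          # receiver dissociation constant K_r
K_S = 10.0          # sender dissociation constant K_s
W_BG = 1.0          # workload of all other mRNAs, held fixed


def free_receiver(m_s):
    """Free receiver level from eq. 17, with w_r = m_s/K_s + W_BG (eq. 18)."""
    k_app = K_R * (1 + m_s / K_S + W_BG)           # apparent K_r^* (eq. 19)
    a = M0_R - MU_STAR_R - k_app
    return 0.5 * (a + math.sqrt(a * a + 4 * M0_R * k_app))


def crosstalk_eq47(m_s):
    """Crosstalk strength C_r^s from the closed form (eq. 47)."""
    m_r = free_receiver(m_s)
    x_r = MU_STAR_R / K_R
    w = m_s / K_S + m_r / K_R + W_BG               # total workload (eq. 7)
    w_r = w - m_r / K_R                            # workload without receiver (eq. 18)
    return x_r * (m_s / K_S) / ((1 + w) ** 2 + x_r * (1 + w_r))


def crosstalk_numeric(m_s, h=1e-5):
    """Central difference of ln[m_r] with respect to ln[m_s] (eq. 44)."""
    up = math.log(free_receiver(m_s * math.exp(h)))
    down = math.log(free_receiver(m_s * math.exp(-h)))
    return (up - down) / (2 * h)


def eq9_residual(m_s):
    """Residual of the implicit steady state (eq. 9); should be ~0."""
    m_r = free_receiver(m_s)
    w = m_s / K_S + m_r / K_R + W_BG
    return m_r * (1 + MU_STAR_R / (K_R * (1 + w))) - M0_R


if __name__ == "__main__":
    for m_s in (2.0, 20.0, 200.0):
        print(m_s, crosstalk_eq47(m_s), crosstalk_numeric(m_s), eq9_residual(m_s))
```

Holding `W_BG` fixed while re-solving for the receiver corresponds to the partial derivative in equation 44, so the finite-difference and closed-form values should agree to within the discretization error.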
Crosstalk strength is also always smaller than 1, because

$$x_r\,\frac{[m_s]}{K_s} \le x_r\,w_r < (1+w)^2 + x_r(1+w_r) \tag{48}$$

Crosstalk strength can be reformulated in terms of repression and sequestration as

$$C_r^s = S_s\,\frac{R_r}{1 - R_r S_r} \tag{49}$$

where $R_r$ is the repression of the receiver in the given state (cf. equation 12), and $S_r$ and $S_s$ are the fractions of microRNA sequestered by the receiver and the sender, respectively (cf. equation 20). Further, it can be shown that, for a given concentration of the sender $[m_s]$, crosstalk strength is maximal when the concentrations of all other co-regulated mRNAs (including the receiver) approach zero:

$$[m_j] \to 0 \quad \forall\, j \ne s \tag{50}$$

Therefore, crosstalk strength at a given concentration of the sender $[m_s]$ is always equal to or less than

$$C_r^s \le S_s\,R_r^{\max} \tag{51}$$

where $S_s$ and $R_r^{\max}$ are the sender's sequestration fraction and the receiver's repression evaluated in the limit of equation 50. Equation 51 can also be used to estimate the limits of crosstalk effects among several mRNAs.
Case 1: When the receiver is sequentially influenced by multiple senders, all of which share the same microRNA regulation, the sum over all crosstalk strengths must be smaller than 1:

$$\sum_s C_r^s \le \sum_s S_s\,R_r^{\max} = R_r^{\max}\sum_s \frac{[m_s]/K_s}{1+w} < R_r^{\max} < 1 \tag{52}$$

The sum over all crosstalk strengths from different senders can be rewritten as the product of the receiver's maximal repression and the sum over the fractions of microRNA sequestered by the senders (second step). The sum over all fractions of microRNA sequestration must be smaller than 1 (third step).
Case 2: When the receiver is sequentially influenced by multiple senders, all of which share a common microRNA regulation with the receiver but no common microRNA regulation with each other, the sum over all crosstalk strengths must also be smaller than 1. Denoting the different microRNA regulators by the index $k$ and the sender regulated by microRNA $k$ by $s_k$, we can formulate this as

$$\sum_k C_r^{s_k} \le \sum_k R_{r,k}^{\max}\,S_{s_k,k} < \sum_k R_{r,k}^{\max} < 1 \tag{53}$$

The sum over all crosstalk strengths from the different senders with microRNA regulation $k$ towards the receiver can be rewritten as the product of the receiver's repression by microRNA $k$ and the fraction of microRNA $k$ sequestered by the respective sender (first step).
The fraction of microRNA $k$ sequestered by the respective sender is always smaller than 1 (second step). Assuming that the repression effects of different microRNA regulations are additive, the sum over all receiver repressions by the microRNAs must be smaller than 1 (third step). Likewise, in all intermediate cases, where the receiver is influenced by multiple senders some of which share a common microRNA regulation, the sum over all crosstalk strengths must be smaller than 1.
References
[Baccarini et al., 2011] Baccarini, A., Chauhan, H., Gardner, T. J., Jayaprakash, A. D., Sachidanandam, R. and Brown, B. D. (2011). Kinetic analysis reveals the fate of a microRNA following target regulation in mammalian cells. Current Biology 21, 369-376.
[Baek et al., 2008] Baek, D., Villén, J., Shin, C., Camargo, F. D., Gygi, S. P. and Bartel, D. P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
[Bruggeman et al., 2009] Bruggeman, F. J., Blüthgen, N. and Westerhoff, H. V. (2009). Noise management by molecular networks. PLoS Computational Biology 5, e1000506.
[Elf and Ehrenberg, 2003] Elf, J. and Ehrenberg, M. (2003). Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Research 13, 2475-2484.
[Gantier et al., 2011] Gantier, M. P., McCoy, C. E., Rusinova, I., Saulep, D., Wang, D., Xu, D., Irving, A. T., Behlke, M. A., Hertzog, P. J., Mackay, F. and Williams, B. R. G. (2011). Analysis of microRNA turnover in mammalian cells following Dicer1 ablation. Nucleic Acids Research 39, 5692-5703.
[Haley and Zamore, 2004] Haley, B. and Zamore, P. D. (2004). Kinetic analysis of the RNAi enzyme complex. Nature Structural & Molecular Biology 11, 599-606.
[Lim et al., 2003] Lim, L. P., Lau, N. C., Weinstein, E. G., Abdelhakim, A., Yekta, S., Rhoades, M. W., Burge, C. B. and Bartel, D. P. (2003). The microRNAs of Caenorhabditis elegans. Genes & Development 17, 991-1008.
[Mukherji et al., 2011] Mukherji, S., Ebert, M. S., Zheng, G. X. Y., Tsang, J. S., Sharp, P. A. and van Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene expression. Nature Genetics 43, 854-859.
[Paulsson, 2004] Paulsson, J. (2004). Summing up the noise in gene networks. Nature 427, 415-418.
[Pedraza and van Oudenaarden, 2005] Pedraza, J. M. and van Oudenaarden, A. (2005). Noise propagation in gene networks. Science 307, 1965-1969.
[Schwanhäusser et al., 2011] Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature 473, 337-342.