Molecular Titration by MicroRNAs and Target Mimic Inhibitors

by

Margaret S. Ebert

M.Phil., Molecular Biology, University of Cambridge, 2004
B.S., Molecular, Cellular, and Developmental Biology, Yale University, 2003

SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTORATE OF PHILOSOPHY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

SEPTEMBER 2010

© 2010 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Margaret S. Ebert, Department of Biology, June 18, 2010

Certified by: Phillip A. Sharp, Institute Professor of Biology, Thesis Supervisor

Accepted by: Stephen P. Bell, Professor of Biology, Chair, Biology Graduate Committee

ABSTRACT

MicroRNAs (miRNAs) are short, highly conserved non-coding RNA molecules that repress gene expression in a sequence-dependent manner. Each miRNA is predicted to target hundreds of genes, and a majority of protein-coding genes are computationally predicted to be miRNA targets. To test miRNA functions experimentally, we introduced the miRNA "sponge" method, which uses miRNA target mimics to sequester mature miRNAs and thereby create continuous miRNA loss of function in cell lines and transgenic organisms. Sponge RNAs contain complementary binding sites to a miRNA of interest, and are produced from transgenes within cells. As with most miRNA target genes, a sponge's binding sites are specific to the miRNA seed region, which allows them to block a whole family of related miRNAs. This transgenic approach has proven to be a powerful tool to generate miRNA phenotypes in a variety of experimental systems.
Bulk measurements on populations of cells have indicated that, although pervasive, repression due to miRNAs is on average quite modest. To assay repression in single cells, we performed quantitative fluorescence microscopy and flow cytometry to monitor a target gene's protein expression in the presence and absence of regulation by miRNA. We found that repression among individual cells varies dramatically. miRNAs establish a threshold level of target mRNA below which protein production is highly repressed and above which expression responds ultrasensitively to target mRNA input until reaching high enough mRNA levels to almost escape repression. We constructed a mathematical model describing molecular titration of target mRNAs by miRNAs. The model predicted, and experiments confirmed, that the ultrasensitive regime could be shifted to higher target mRNA levels by increasing the miRNA concentration or the number of miRNA binding sites in the 3' untranslated region (UTR) of the target mRNA. Thus even a single species of miRNA can act both as a switch to effectively silence gene expression and as a fine-tuner of gene expression. This fits the emerging paradigm in which miRNAs help to confer robustness to biological processes by reinforcing transcriptional programs, attenuating leaky transcripts, and perhaps buffering random fluctuations in transcript copy number.

Thesis Supervisor: Phillip A. Sharp
Title: Institute Professor of Biology

Table of contents

Abstract
Acknowledgments
Chapter 1. Introduction
    miRNA biogenesis in mammalian systems
    Regulation of miRNA biogenesis
    miRNA target recognition
    Mechanisms of gene repression
    Modulation of miRNA-mediated repression
    Target prediction and validation
    miRNA expression profiling
    Functional manipulation
    References
Chapter 2. MicroRNA sponge inhibitors: competitive inhibitors of small RNAs in mammalian cells
    Introduction
    Results
    Discussion
    Methods
    References
    Figures
    Supplementary Information
Chapter 3. MicroRNA sponge inhibitors: progress and possibilities
    Introduction
    Recent applications of miRNA sponges
    Stable miRNA sponge expression
    miRNA sponges in transgenic animals
    Are there natural miRNA sponges?
    Concluding remarks
    References
    Figures
    Supplementary Information
Chapter 4. MicroRNAs generate gene expression thresholds with ultrasensitive transitions
    Introduction
    Results and Discussion
    Methods
    References
    Figures
    Supplementary Information
Chapter 5. Roles for microRNAs in conferring robustness to biological processes
    Introduction
    miRNA-target architectures that increase robustness
    miRNAs attenuate leaky transcripts
    miRNAs set gene expression thresholds for their targets
    miRNAs may buffer transcriptional noise
    Some miRNA phenotypes appear upon stress
    miRNAs, robustness and disease
    Implications for miRNA influence in evolution
    Conclusions
    References
    Figures
Chapter 6. Conclusions and future directions
    Conclusions
    Future directions
    References
    Figures
Appendix. Genome-wide dissection of microRNA functions and co-targeting networks using gene set signatures
    Introduction
    Results
    Discussion
    Experimental Procedures
    References
    Figures
    Supplemental Information
Curriculum vitae

Acknowledgments

The work presented in this thesis was made possible by the generous help of many friends, colleagues, and teachers. For their contributions I thank the following: members of the Koch Institute fifth floor labs for advice, protocols and reagents, and for their welcoming and cooperative spirit; Koch Institute core facilities staff Glenn Paradis, Michele Perry, and Mike Jennings for flow cytometry training and cell sorting; Margarita Siafaca for administrative support; Mary Lindstrom for lab management and for making figures for this thesis; members of the Sharp lab 2005-2010, for countless questions answered and conversations shared, with special thanks to Amanda Young, Joel Neilson, Anthony Leung, Grace Zheng, Amy Seila White, Mauro Calabrese, Lourdes Aleman, and John Doench for training on new techniques; Joe Markson and Peter Ebert for helpful discussions; Tudor Fulga; Madhu Kumar; Moshe Gatt; Shankar Mukherji, John Tsang, and Gregor Neuert for friendly and productive collaborations; thesis committee members Jackie Lees, Dave Bartel, and Alexander van Oudenaarden for insightful comments and recommendations; and Phil Sharp for exceptional mentorship and inspiration.

Chapter 1. Introduction

This chapter was written by Margaret S. Ebert.
In the past decade we have witnessed a revolution in molecular biology sparked by the discovery of RNA interference (RNAi): pathways in which small RNAs complexed with regulatory proteins sequence-specifically interact with messenger RNAs (mRNAs) to silence gene expression. This thesis concerns the branch of RNAi called microRNAs (miRNAs). miRNAs were discovered in 1993 when Victor Ambros's group cloned the C. elegans lin-4 gene, which was known to control larval developmental timing (Lee et al. 1993). Lin-4 is a non-coding RNA that produces a 22-nucleotide (nt) form with partial complementarity to sequences in the 3' untranslated region (UTR) of lin-14; lin-4 was shown genetically to down-regulate LIN-14 protein. The next animal miRNA was not discovered until 2000, when Gary Ruvkun's group reported that let-7 is a 21-nt RNA that down-regulates lin-41 and other target genes involved in temporal control of worm development (Reinhart et al. 2000). Unlike lin-4, let-7 was found to be conserved in other phyla including vertebrates (Pasquinelli et al. 2000). Since 2000, cloning and sequencing of small RNAs from a variety of organisms has revealed the expression of miRNAs in animals, viruses, and plants (Lagos-Quintana et al. 2001, Lau et al. 2001, Reinhart et al. 2002, Pfeffer et al. 2004). Some are deeply conserved, some species-specific. At present there are several hundred confirmed miRNAs in mammals representing about 200 conserved miRNA families (Chiang et al. 2010).

miRNA biogenesis in mammalian systems

Most miRNAs are processed from longer precursors that are transcribed by Pol II, capped and polyadenylated (Lee et al. 2004). These transcripts are called pri-miRNAs. A minority of pri-miRNAs are transcribed by Pol III (Borchert et al. 2006). miRNA hairpins (stem-loops with imperfect pairing in the stem) are located in intergenic regions or introns; about 40% are in introns of protein-coding host genes (Rodriguez et al. 2004).
Some occur in clusters of two or more within one pri-miRNA (Baskerville and Bartel 2005). The hairpin is recognized and excised by the nuclear RNase III enzyme Drosha and its double-stranded RNA binding protein partner DGCR8 (Han et al. 2004). Associated proteins include p68 and p72 RNA helicases and hnRNPs that may promote Drosha processing of some precursors (Gregory et al. 2004). The excision occurs cotranscriptionally, before the introns are spliced (Morlando et al. 2008). The excised hairpin, called the pre-miRNA, is ~70 nt long with a 2-nt 3' overhang. A Drosha-independent processing pathway also exists: mirtrons are pre-miRNAs generated by splicing and debranching of short introns (Ruby et al. 2007, Okamura et al. 2007). The pre-miRNA is transported to the cytoplasm by Exportin-5 in a Ran-GTP-dependent manner (Lund et al. 2004). In the cytoplasm, pre-miRNAs are rapidly recognized and cut by the RNase III enzyme Dicer (Hutvágner and Zamore 2002) and its double-stranded RNA binding protein partner TRBP (Chendrimada et al. 2005) or PACT (Lee et al. 2006). The product is a ~22-nucleotide duplex with 2-nt 3' overhangs. The strands of the duplex are unwound and one strand is loaded into one of the four Argonaute proteins to form the core miRNA effector complex (Pillai et al. 2004). Typically the miRNA strand whose 5' end is less stably paired gets incorporated into Argonaute as the mature miRNA guide strand (Schwarz et al. 2003). The remaining strand, called the passenger strand or miRNA star (*) strand, is degraded. Some miRNA duplexes incorporate each strand frequently, e.g. miR-17-5p and -3p, and the relative amounts of miRNA and miRNA* (or -5p and -3p) can vary substantially among different tissues (Landgraf et al. 2007, Chiang et al. 2010). One Dicer-independent miRNA has been discovered: miR-451 is processed from its pre-miRNA by Ago2 (Cheloufi et al. 2010), the sole Argonaute that has endonucleolytic activity for paired RNAs (Meister et al. 2004).
Biochemical purifications have identified several proteins that associate with Argonautes and can influence miRNA loading or co-regulate target mRNAs. These include FMRP (Caudy et al. 2002), Gemin-3 and -4 (Mourelatos et al. 2002), GW182 (Liu et al. 2005a), MOV10 (Meister et al. 2005), and RCK/p54 (Chu and Rana 2006).

Regulation of miRNA biogenesis

miRNA expression is regulated at multiple levels. Intron-embedded pri-miRNAs undergo the same transcriptional regulation as their host genes. In embryonic cells and tumor cells, the Drosha processing of some pri-miRNAs is blocked (Thomson et al. 2006). In some cell lines Drosha processing is more efficient when there are more cell-cell contacts (Hwang et al. 2009). The splicing regulatory protein KSRP associates with both Drosha and Dicer complexes and binds to the loop region of a subset of miRNA precursors, promoting their processing (Trabucchi et al. 2009). The pri-miRNA can be subject to A-to-I editing by ADAR adenosine deaminase; this can prevent Drosha processing (Yang et al. 2006) or alter the miRNA sequence and thereby change its target specificity (Kawahara et al. 2007). The let-7 pre-miRNAs are recognized by Lin28, which recruits the uridylyl transferase TUT4 to add a 3' oligouridine tail, blocking Dicer processing (Heo et al. 2009).

miRNA turnover is also regulated. Mature miRNAs are protected through their association with Argonaute protein complexes (Diederichs and Haber 2007). Perhaps for this reason they can be very stable - the heart muscle-specific miRNA miR-208 has a measured in vivo half-life of about 12 days (van Rooij et al. 2007) - but a broad range of miRNA half-lives is observed in cultured cells (Bail et al. 2010). The turnover of miR-150 occurs rapidly in stimulated T cells (Monticelli et al. 2005) and miR-122 degradation is accelerated by interferon beta signaling in liver cells (Pedersen et al. 2007).
On the other hand, 3' monoadenylation of mature miR-122 by the poly(A) polymerase GLD-2 has a stabilizing effect (Katoh et al. 2009). Non-templated addition of adenines and uridines to the miRNA 3' end is common and adds to the 3' heterogeneity that arises from imprecise processing (Chiang et al. 2010).

miRNA target recognition

Most known miRNA-target interactions occur at partially complementary sites in 3' UTRs. Some binding sites have been identified in coding regions (Rigoutsos 2009) and it is possible for miRNAs to repress expression by binding to sites in the 5' UTR (Lytle et al. 2007). The major specificity determinant for miRNA-target binding is called the seed, which is defined as nucleotides 2-8 of the miRNA and is the region of highest evolutionary conservation (Lim et al. 2003) and greatest importance for repression (Doench and Sharp 2004). Structural studies show that the seed is presented for target recognition by the Argonaute protein, pre-structured for base-pairing, and that miRNA position 1 is not paired to the target (Parker et al. 2005). Genome-wide analysis of target repression shows the strongest effects from 8mer seed matches, viz. matches to miRNA positions 2-8 plus an A opposite position 1 (Baek et al. 2008). More moderate effects are observed with 7mer seed matches: pairing at positions 2-8, or pairing at positions 2-7 plus an A opposite position 1. miRNAs are grouped into families that share a common seed and whose members may be encoded at near or distant genomic loci. Seed family members are expected to regulate the same set of targets, with perhaps slight preferences based on different pairing to the 3' ends (Grimson et al. 2007). The contribution of the miRNA 3' end to base-pairing is unresolved and may be minimal (Bartel 2009). Features of the sequence flanking the seed match also contribute to the strength of repression.
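The seed-match classes defined in this section (an 8mer, and the two kinds of 7mer) can be written as a short string-matching routine. The following is a minimal sketch, not a reimplementation of any published prediction tool: the function names are hypothetical, the labels "7mer-m8" and "7mer-A1" are shorthand assumed here for the two 7mer classes, and let-7a is used only as an example guide sequence. Sites with a 6-nt match alone are ignored because the text does not discuss them.

```python
# Illustrative sketch only: classify seed-match sites for one miRNA in a UTR.
# Site definitions follow the text; function and label names are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return a list of (start_index, site_type) for seed matches in utr.

    Both sequences are read 5' to 3'; DNA-style T is converted to U.
    start_index is the 0-based position of the 5'-most paired UTR base.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])   # pairs with miRNA positions 2-7
    pos8_partner = COMPLEMENT[mirna[7]]     # pairs with miRNA position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        has_m8 = start >= 1 and utr[start - 1] == pos8_partner
        end = start + len(core)             # UTR base opposite miRNA position 1
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            sites.append((start - 1, "8mer"))
        elif has_m8:
            sites.append((start - 1, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        start = utr.find(core, start + 1)
    return sites


# let-7a is used purely as an example guide sequence.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_seed_sites(LET7A, "AAACUACCUCAGGG"))  # [(3, '8mer')]
```

Because the target pairs antiparallel to the miRNA, the UTR base opposite miRNA position 1 lies immediately 3' of the seed match, which is why the check for an A is made at the end of the matched core.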
Optimal targeting occurs where the binding site is located more than 15 nt downstream of the stop codon (presumably placing it out of the way of translating ribosomes); near the proximal or distal end of the 3' UTR rather than in the middle; in an AU-rich, relatively unstructured region; and with other miRNA binding sites nearby (Grimson et al. 2007). Some cooperativity is observed where sites are 13-35 nt apart (Saetrom et al. 2007). Sites that are deeply conserved tend to show stronger repression but non-conserved sites can also be functional (Farh et al. 2005).

Mechanisms of gene repression

miRNAs were first thought to act through translational repression. In recent years there have been reports suggesting mechanisms involving steps in both translation initiation and elongation. The observation of miRNAs and target mRNAs cosedimenting with polysome fractions supports a post-initiation mechanism such as slowed elongation or enhanced termination (Olsen and Ambros 1999, Petersen et al. 2006). Other evidence implicates the initiation step of translation: dependence of repression on the presence of the m7G mRNA cap (Pillai et al. 2005), or association of the miRNA complex with the inhibitory factor eIF6, which blocks formation of the 80S ribosome (Chendrimada et al. 2007). The mechanism of translational repression remains unresolved and it is possible that different miRNAs in different cellular contexts use more than one mechanism. It appears too that the experimental protocols used in these reports generate different outcomes: applying different DNA or mRNA transfection methods (Lytle et al. 2007) or using different promoters for target reporters (Kong et al. 2008) altered the apparent mechanism of repression. Moreover, the interpretation of some experimental results is problematic where experimental modulations such as inefficient IRES elements or an m7A cap create a new rate-limiting step in translation (Nissan and Parker 2008).
Increasingly miRNAs have been shown to act through mRNA degradation in addition to, and independently of, translational repression. Targets of let-7 and lin-4 in C. elegans show enhanced mRNA degradation (Bagga et al. 2005) and in mammalian cells, genome-wide analysis indicates mRNA knockdown often in excess of translational repression for target genes (Baek et al. 2008, Hendrickson et al. 2009). Where there is extensive complementarity to the miRNA, Ago2 can perform endonucleolytic cleavage of target mRNA (Meister et al. 2004), but the general mRNA knockdown effects are not dependent on Ago2 catalysis, with the known exception of miR-196 and its almost perfectly complementary target HoxB8 (Yekta et al. 2004). Rather the mechanism involves accelerated deadenylation and decapping of the target mRNA (Wu et al. 2006, Piao et al. 2010). The miRNA complex recruits the CAF1 and CCR4 deadenylases, and the deadenylated message is then decapped by Dcp1/Dcp2 enzymes, after which it is vulnerable to 5'-to-3' exonucleolytic decay by Xrn1 and 3'-to-5' decay by the exosome (Fabian et al. 2009). This accelerated turnover mechanism appears to play a role in the clearance of many deposited maternal transcripts during the maternal-zygotic transition in zebrafish and other animals (Giraldez et al. 2006).

The subcellular localization of miRNA repression is also an ongoing area of investigation. Mature miRNAs are cytoplasmic with the known exception of miR-29b, which contains a 6-nt nuclear localization sequence in its 3' end (Hwang et al. 2007). Although miRNA complexes are associated with polysomes, they are also found in P-bodies, cytoplasmic granules that exclude ribosomes and contain RNA degradation enzymes such as Dcp1/2 and Xrn1 (Liu et al. 2005b). Whether localization in P-bodies is a cause (Liu et al. 2005a) or consequence (Eulalio et al. 2007) of repression is not entirely clear.
Additionally, the dynamics of miRNA complexes (dis)associating with P-bodies appear to be slow whereas those of the (dis)association with stress granules, another type of translationally silent cytoplasmic granule, are rapid (Leung et al. 2006). Upon cellular stress, Argonaute proteins traffic to stress granules in a miRNA-dependent manner, and target mRNAs may be stored there for subsequent re-initiation of translation or for degradation.

Modulation of miRNA-mediated repression

Some miRNA targets show reversible repression. In hippocampal neurons, miR-134-mediated repression of Limk1 near synapses is partially rescued by BDNF treatment (Schratt et al. 2006). In hepatocarcinoma cells, HuR binds an AU-rich motif in the 3' UTR of CAT-1 mRNA, releasing it from P-bodies and relieving miR-122-mediated repression upon amino acid starvation or other stresses (Bhattacharyya et al. 2006). In zebrafish and mammalian cells, Dnd1 binds several target mRNAs in U-rich regions adjacent to miRNA binding sites, thereby occluding miRNA binding and rescuing target expression (Kedde et al. 2007). In addition to modulation of specific targets by RNA-binding proteins, some miRNA-target interactions may be preempted by the expression of competing target RNAs with strong seed matches (see Chapter 3).

There are also factors that may globally modulate miRNA activity by (de)stabilizing Argonaute proteins. mLin-41 is an E3 ubiquitin ligase that polyubiquitinates Ago2, promoting its protein turnover in murine stem cells (Rybak et al. 2009). On the other hand, proline hydroxylation of Ago2 by C-P4H(I) appears to stabilize the protein (Qi et al. 2008). These post-translational modifications may be important as Argonaute is a limiting component in the miRNA pathway: transfection of artificial miRNA is seen to compete with endogenous miRNAs for loading into Argonaute complexes, such that many miRNA target genes are partially derepressed (Khan et al. 2009).
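The idea that competing target RNAs can preempt miRNA-target interactions can be illustrated with a simple equilibrium binding calculation. The sketch below is a textbook mass-action treatment of 1:1 binding, offered only as an illustration and not as the mathematical model developed in Chapter 4. The function names, the single shared dissociation constant `kd`, and all numerical values are assumptions chosen for the example, in arbitrary concentration units.

```python
import math


def free_sites(total_sites, total_mirna, kd):
    """Unbound target sites at equilibrium for 1:1 miRNA-site binding.

    Solves (total_sites - c) * (total_mirna - c) = kd * c for the complex c.
    All quantities share one arbitrary concentration unit.
    """
    s = total_sites + total_mirna + kd
    bound = (s - math.sqrt(s * s - 4.0 * total_sites * total_mirna)) / 2.0
    return total_sites - bound


def free_fraction(target, mirna, kd, competitor=0.0):
    """Fraction of target sites left unbound when competitor sites are added.

    Assumes competitor and target sites bind the miRNA with the same kd, so
    both species have the same unbound fraction.
    """
    total = target + competitor
    return free_sites(total, mirna, kd) / total


MIRNA, KD = 100.0, 1.0  # hypothetical values, arbitrary units

# Free target stays near zero until total target approaches the miRNA pool.
for target in (25.0, 50.0, 100.0, 200.0, 400.0):
    print(target, round(free_sites(target, MIRNA, KD), 2))

# Adding competitor sites raises the unbound fraction of the original target.
print(round(free_fraction(50.0, MIRNA, KD), 3))
print(round(free_fraction(50.0, MIRNA, KD, competitor=500.0), 3))
```

With tight binding (small `kd` relative to the miRNA pool), almost all sites are bound while they are outnumbered by miRNA, and the unbound amount rises roughly as total sites minus total miRNA once the pool is exceeded. Under the same assumptions, an excess of competitor sites shifts a low-abundance target from mostly bound to mostly free.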
Target prediction and validation

We assume that miRNAs function through their target genes, so it is critical to identify the set of target genes for each miRNA. Computational methods such as TargetScan and PicTar score all or some of the following parameters in annotated 3' UTRs: number of seed matches, type of seed matches (e.g. 8mer match or 7mer match), context around the site, and species conservation (Lewis et al. 2003, Krek et al. 2005, Friedman et al. 2009). With these approaches, a typical mammalian miRNA has several hundred predicted targets.

To validate target predictions, there are several commonly used assays. One approach is to append the 3' UTR or UTR fragment of a predicted target onto a reporter gene such as luciferase, and transfect cultured cells with the reporter plasmid. By comparing the average protein output from the wild-type UTR with that from a version in which the miRNA binding sites are mutated (typically with point mutations in the seed match), in cells expressing the miRNA of interest, one can assess the degree of repression (Lewis et al. 2003). Adding or inhibiting the miRNA should modulate the strength of repression. While the luciferase assay is convenient and sensitive, its limitations are beginning to be recognized: since it measures the population average of transfected cells, it may obscure substantial cell-to-cell differences in repression. Furthermore, the luciferase reporters are typically driven not by cellular promoters but by strong viral promoters that may produce enough target mRNA to overwhelm the pool of endogenous miRNAs and under-report the miRNA activity relative to its physiological activity (see Chapter 4).

Genome-wide approaches are also used to provide evidence for target predictions. Microarray or RNA deep sequencing in the presence and absence of a miRNA of interest reveals changes in target mRNA abundance (Lim et al. 2005).
To capture total repression including translational repression, mass spectrometry measures target protein abundance (Baek et al. 2008), and ribosome profiling measures translational activity on target mRNAs (Hendrickson et al. 2009). With all of these methods, the degree of target repression can be correlated to the number and quality of miRNA binding sites in the target genes. Unlike in the UTR reporter assay, however, repression is not necessarily due to direct targeting but may also reflect secondary effects.

The biochemical association of miRNA complexes with targets can be assayed by immunopurification of Argonaute complexes followed by isolation and sequencing of the associated mRNAs (Karginov et al. 2007, Azuma-Mukai et al. 2008). More recently, this method has been improved by means of cross-linking the complexes in cells (Chi et al. 2009, Zisoulis et al. 2010), in some cases site-specifically (Hafner et al. 2010), before immunopurification of the miRNA-target complexes.

miRNA expression profiling

The first miRNAs were discovered by classical genetics, but most have been identified by small RNA cloning. This procedure takes advantage of the expected size range (~19-25 nt, amenable to gel purification of total RNA) and end chemistries (5' monophosphate and 3' hydroxyl, amenable to oligoribonucleotide linker ligation) of the mature miRNAs. Cloning analyses from a myriad of human and rodent tissues led to the compilation of a basic miRNA expression atlas (Landgraf et al. 2007). More recently, high-throughput Illumina sequencing has enabled confident identification of even very rare sequences (Chiang et al. 2010). Further evidence that a newly discovered small RNA is a miRNA includes the existence of a hairpin precursor, and dependence on Drosha and Dicer for expression. miRNA expression profiles vary not only by tissue but also by developmental stage and other conditions.
A common practice is to screen miRNA expression in a tissue of interest comparatively, for example before and after differentiation (Sempere et al. 2004, Xie et al. 2009); in diseased tissue and its healthy counterpart (Ikeda et al. 2007, Valastyan et al. 2009); or before and after the application of a specific stimulus (Krol et al. 2010), to find miRNAs whose expression increases or decreases substantially. Such miRNAs then become candidates for regulating the process in question. To test whether they play a causal role, one can experimentally manipulate the miRNA expression or activity in several ways.

Functional manipulation

Gain-of-function approaches increase the expression of the miRNA by introducing a hairpin precursor (Dickins et al. 2005) or a transfectable miRNA duplex (Doench et al. 2003). Cells treated in this manner can provide useful information about targets and phenotypes: adding brain- or muscle-specific miRNA to HeLa cells changed their mRNA expression profiles to resemble those of the corresponding tissue type (Lim et al. 2005), and ectopically expressing a B cell-specific miRNA in hematopoietic progenitor cells increased the fraction of cells committed to the B lineage (Chen et al. 2004). Nonetheless, ectopic miRNA expression does not necessarily reveal physiological targets. Gain-of-function experiments are more natural models when they serve to restore physiological miRNA function, as by introducing let-7 precursors to lung cancer cells that have aberrantly down-regulated expression of the endogenous let-7 family members (Kumar et al. 2008).

Loss-of-function approaches for miRNAs include genetic knockouts (Thai et al. 2007, van Rooij et al. 2007, Johnnidis et al. 2008); chemically modified antisense oligonucleotide inhibitors (Hutvágner et al. 2004); and target mimic inhibitors called miRNA sponges (see Chapters 2 and 3 for a detailed discussion of these strategies).
Whether by abrogating the miRNA's expression or preventing it from accessing targets, deletion and inhibitor strategies cause derepression of the set of target genes. Alternatively one can block a specific miRNA-target interaction using 'target protector' oligonucleotides that pair to the miRNA binding site and its gene-specific flanking sequence (Choi et al. 2007). In some cases miRNA-target interactions are disrupted or created by natural mutations. For example, a chromosomal translocation in the oncogene HMGA2 results in removal of a region of the 3' UTR that contains multiple let-7 binding sites (Mayr et al. 2007). In sheep, a single nucleotide polymorphism in the 3' UTR of the myostatin gene creates a seed match for a muscle-specific miRNA, resulting in muscle hypertrophy (Clop et al. 2006). To date miRNAs have been implicated in controlling embryonic development, regulating the physiology of many organs of the body, and preventing or exacerbating human diseases. Nonetheless, much remains to be learned about the target genes and functions of miRNAs. This thesis addresses some of the fundamental properties of miRNA-target interactions in mammalian cells. Chapter 2 describes a method that we developed to inhibit specific miRNA seed families in cell lines and in transgenic animals. Chapter 3 reviews the expanding applications of this loss-of-function method and their contributions to the field to date. It also considers the potential for endogenous transcripts to act as target mimics to inhibit miRNAs in the same manner as our artificial inhibitors. Chapter 4 describes the results of assaying miRNA activity in single mammalian cells with a quantitative reporter for both target gene transcription and target protein expression. 
As will be seen, the target gene expression threshold set by miRNA concentration and miRNA binding sites helps explain the efficacy of sponge inhibitors: above a certain level of target mRNA, the endogenous pool of miRNA is overwhelmed and additional target transcripts are free to be translated. Chapter 5 explores how the gene expression threshold fits the emerging roles of miRNAs in conferring robustness to gene expression. One of those potential roles, the buffering of random fluctuations in protein output, could be assayed with an experimental model described in Chapter 6. Another role, the coordinate regulation of multiple components of signaling pathways and protein complexes, is described in the Appendix. References Azuma-Mukai A, Oguri H, Mituyama T, Qian ZR, Asai K, Siomi H, Siomi MC. Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc. Natl Acad. Sci. USA 105, 7964-7969 (2008). Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of microRNAs on protein output. Nature 455, 64-71 (2008). Bagga S, Bracht J, Hunter S, Massirer K, Holtz J, Eachus R, Pasquinelli AE. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553-563 (2005). Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, Kiledjian M. Differential regulation of microRNA stability. RNA 16, 1032-1039 (2010). Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233 (2009). Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11, 241-247 (2005). Beilharz TH, Humphreys DT, Clancy JL, Thermann R, Martin DI, Hentze MW, Preiss T. microRNA-mediated messenger RNA deadenylation contributes to translational repression in mammalian cells. PLoS One 4, e6783 (2009). Bhattacharyya SN, Habermacher R, Martine U, Closs El, Filipowicz W. 
Relief of microRNA-mediated translational repression in human cells subjected to stress. Cell 125, 1111-1124 (2006).
Borchert GM, Lanier W, Davidson BL. RNA polymerase III transcribes human microRNAs. Nat. Struct. Mol. Biol. 13, 1097-1101 (2006).
Caudy AA, Myers M, Hannon GJ, Hammond SM. Fragile X-related protein and VIG associate with the RNA interference machinery. Genes Dev. 16, 2491-2496 (2002).
Cheloufi S, Dos Santos CO, Chong MM, Hannon GJ. A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature (2010).
Chen CY, Zheng D, Xia Z, Shyu AB. Ago-TNRC6 triggers microRNA-mediated decay by promoting two deadenylation steps. Nat. Struct. Mol. Biol. 16, 1160-1166 (2009).
Chen CZ, Li L, Lodish HF, Bartel DP. MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86 (2004).
Chendrimada TP, Finn KJ, Ji X, Baillat D, Gregory RI, Liebhaber SA, Pasquinelli AE, Shiekhattar R. MicroRNA silencing through RISC recruitment of eIF6. Nature 447, 823-828 (2007).
Chendrimada TP, Gregory RI, Kumaraswamy E, Norman J, Cooch N, Nishikura K, Shiekhattar R. TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436, 740-744 (2005).
Chi SW, Zang JB, Mele A, Darnell RB. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460, 479-486 (2009).
Chiang HR, Schoenfeld LW, Ruby JG, Auyeung VC, Spies N, Baek D, Johnston WK, Russ C, Luo S, Babiarz JE, Blelloch R, Schroth GP, Nusbaum C, Bartel DP. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992-1009 (2010).
Chu CY, Rana TM. Translation repression in human cells by microRNA-induced gene silencing requires RCK/p54. PLoS Biol. 4, e210 (2006).
Clop A, Marcq F, Takeda H, Pirottin D, Tordoir X, Bibé B, Bouix J, Caiment F, Elsen JM, Eychenne F, Larzul C, Laville E, Meish F, Milenkovic D, Tobin J, Charlier C, Georges M.
A mutation creating a potential illegitimate microRNA target site in the myostatin gene affects muscularity in sheep. Nat. Genet. 38, 813-818 (2006).
Dickins RA, Hemann MT, Zilfou JT, Simpson DR, Ibarra I, Hannon GJ, Lowe SW. Probing tumor phenotypes using stable and regulated synthetic microRNA precursors. Nat. Genet. 37, 1289-1295 (2005).
Diederichs S, Haber DA. Dual role for argonautes in microRNA processing and posttranscriptional regulation of microRNA expression. Cell 131, 1097-1108 (2007).
Doench JG, Petersen CP, Sharp PA. siRNAs can function as miRNAs. Genes Dev. 17, 438-442 (2003).
Doench JG, Sharp PA. Specificity of microRNA target selection in translational repression. Genes Dev. 18, 504-511 (2004).
Eulalio A, Behm-Ansmant I, Schweizer D, Izaurralde E. P-body formation is a consequence, not the cause, of RNA-mediated gene silencing. Mol. Cell Biol. 27, 3970-3981 (2007).
Eulalio A, Huntzinger E, Izaurralde E. GW182 interaction with Argonaute is essential for miRNA-mediated translational repression and mRNA decay. Nat. Struct. Mol. Biol. 15, 346-353 (2008).
Eulalio A, Huntzinger E, Nishihara T, Rehwinkel J, Fauser M, Izaurralde E. Deadenylation is a widespread effect of miRNA regulation. RNA 15, 21-32 (2009).
Fabian MR, Mathonnet G, Sundermeier T, Mathys H, Zipprich JT, Svitkin YV, Rivas F, Jinek M, Wohlschlegel J, Doudna JA, Chen CY, Shyu AB, Yates JR 3rd, Hannon GJ, Filipowicz W, Duchaine TF, Sonenberg N. Mammalian miRNA RISC recruits CAF1 and PABP to affect PABP-dependent deadenylation. Mol. Cell 35, 868-880 (2009).
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821 (2005).
Feinbaum R, Ambros V. The timing of lin-4 RNA accumulation controls the timing of postembryonic developmental events in Caenorhabditis elegans. Dev. Biol. 210, 87-95 (1999).
Friedman RC, Farh KK, Burge CB, Bartel DP.
Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105 (2009).
Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF. Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79 (2006).
Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N, Shiekhattar R. The Microprocessor complex mediates the genesis of microRNAs. Nature 432, 235-240 (2004).
Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105 (2007).
Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, Ha I, Baillie DL, Fire A, Ruvkun G, Mello CC. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34 (2001).
Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141 (2010).
Han J, Lee Y, Yeom KH, Kim YK, Jin H, Kim VN. The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 18, 3016-3027 (2004).
Hendrickson DG, Hogan DJ, McCullough HL, Myers JW, Herschlag D, Ferrell JE, Brown PO. Concordant regulation of translation and mRNA abundance for hundreds of targets of a human microRNA. PLoS Biol. 7, e1000238 (2009).
Heo I, Joo C, Cho J, Ha M, Han J, Kim VN. Lin28 mediates the terminal uridylation of let-7 precursor MicroRNA. Mol. Cell 32, 276-284 (2008).
Heo I, Joo C, Kim YK, Ha M, Yoon MJ, Cho J, Yeom KH, Han J, Kim VN. TUT4 in concert with Lin28 suppresses microRNA biogenesis through pre-microRNA uridylation. Cell 138, 696-708 (2009).
Hutvágner G, Simard MJ, Mello CC, Zamore PD. Sequence-specific inhibition of small RNA function. PLoS Biol. 2, E98 (2004).
Hwang HW, Wentzel EA, Mendell JT. A hexanucleotide element directs microRNA nuclear import. Science 315, 97-100 (2007).
Hwang HW, Wentzel EA, Mendell JT. Cell-cell contact globally activates microRNA biogenesis. Proc. Natl Acad. Sci. USA 106, 7016-7021 (2009).
Ikeda S, Kong SW, Lu J, Bisping E, Zhang H, Allen PD, Golub TR, Pieske B, Pu WT. Altered microRNA expression in human heart disease. Physiol. Genomics 31, 367-373 (2007).
Jin P, Zarnescu DC, Ceman S, Nakamoto M, Mowrey J, Jongens TA, Nelson DL, Moses K, Warren ST. Biochemical and genetic interaction between the fragile X mental retardation protein and the microRNA pathway. Nat. Neurosci. 7, 113-117 (2004).
Johnnidis JB, Harris MH, Wheeler RT, Stehling-Sun S, Lam MH, Kirak O, Brummelkamp TR, Fleming MD, Camargo FD. Regulation of progenitor cell proliferation and granulocyte function by microRNA-223. Nature 451, 1125-1129 (2008).
Karginov FV, Conaco C, Xuan Z, Schmidt BH, Parker JS, Mandel G, Hannon GJ. A biochemical approach to identifying microRNA targets. Proc. Natl Acad. Sci. USA 104, 19291-19296 (2007).
Katoh T, Sakaguchi Y, Miyauchi K, Suzuki T, Kashiwabara S, Baba T, Suzuki T. Selective stabilization of mammalian microRNAs by 3' adenylation mediated by the cytoplasmic poly(A) polymerase GLD-2. Genes Dev. 23, 433-438 (2009).
Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137-1140 (2007).
Kedde M, Strasser MJ, Boldajipour B, Oude Vrielink JA, Slanchev K, le Sage C, Nagel R, Voorhoeve PM, van Duijse J, Orom UA, Lund AH, Perrakis A, Raz E, Agami R. RNA-binding protein Dnd1 inhibits microRNA access to target mRNA. Cell 131, 1273-1286 (2007).
Khan AA, Betel D, Miller ML, Sander C, Leslie CS, Marks DS. Transfection of small RNAs globally perturbs gene regulation by endogenous microRNAs. Nat. Biotechnol. 27, 549-555 (2009).
Kong YW, Cannell IG, de Moor CH, Hill K, Garside PG, Hamilton TL, Meijer HA, Dobbyn HC, Stoneley M, Spriggs KA, Willis AE, Bushell M. The mechanism of microRNA-mediated translation repression is determined by the promoter of the target gene. Proc. Natl Acad. Sci. USA 105, 8866-8871 (2008).
Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N. Combinatorial microRNA target predictions. Nat. Genet. 37, 495-500 (2005).
Krol J, Busskamp V, Markiewicz I, Stadler MB, Ribi S, Richter J, Duebel J, Bicker S, Fehling HJ, Schübeler D, Oertner TG, Schratt G, Bibel M, Roska B, Filipowicz W. Characterizing light-regulated retinal microRNAs reveals rapid turnover as a common property of neuronal microRNAs. Cell 141, 618-631 (2010).
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T. Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl Acad. Sci. USA 105, 3903-3908 (2008).
Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, Lin C, Socci ND, Hermida L, Fulci V, Chiaretti S, Foà R, Schliwka J, Fuchs U, Novosel A, Müller RU, Schermer B, Bissels U, Inman J, Phan Q, Chien M, Weir DB, Choksi R, De Vita G, Frezzetti D, Trompeter HI, Hornung V, Teng G, Hartmann G, Palkovits M, Di Lauro R, Wernet P, Macino G, Rogler CE, Nagle JW, Ju J, Papavasiliou FN, Benzing T, Lichter P, Tam W, Brownstein MJ, Bosio A, Borkhardt A, Russo JJ, Sander C, Zavolan M, Tuschl T. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414 (2007).
Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T. Identification of novel genes coding for small expressed RNAs. Science 294, 853-858 (2001).
Lau NC, Lim LP, Weinstein EG, Bartel DP. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862 (2001).
Lee RC, Ambros V.
An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864 (2001).
Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854 (1993).
Lee Y, Hur I, Park SY, Kim YK, Suh MR, Kim VN. The role of PACT in the RNA silencing pathway. EMBO J. 25, 522-532 (2006).
Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN. MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 23, 4051-4060 (2004).
Leung AK, Calabrese JM, Sharp PA. Quantitative analysis of Argonaute protein reveals microRNA-dependent localization to stress granules. Proc. Natl Acad. Sci. USA 103, 18125-18130 (2006).
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction of mammalian microRNA targets. Cell 115, 787-798 (2003).
Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773 (2005).
Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP. The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008 (2003).
Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, Song JJ, Hammond SM, Joshua-Tor L, Hannon GJ. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441 (2004).
Liu J, Rivas FV, Wohlschlegel J, Yates JR 3rd, Parker R, Hannon GJ. A role for the P-body component GW182 in microRNA function. Nat. Cell Biol. 7, 1261-1266 (2005).
Liu J, Valencia-Sanchez MA, Hannon GJ, Parker R. MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies. Nat. Cell Biol. 7, 719-723 (2005).
Lund E, Güttinger S, Calado A, Dahlberg JE, Kutay U. Nuclear export of microRNA precursors. Science 303, 95-98 (2004).
Lytle JR, Yario TA, Steitz JA. Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5' UTR as in the 3' UTR. Proc. Natl Acad. Sci.
USA 104, 9667-9672 (2007).
Mayr C, Hemann MT, Bartel DP. Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576-1579 (2007).
Meister G, Landthaler M, Patkaniowska A, Dorsett Y, Teng G, Tuschl T. Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol. Cell 15, 185-197 (2004).
Meister G, Landthaler M, Peters L, Chen PY, Urlaub H, Lührmann R, Tuschl T. Identification of novel argonaute-associated proteins. Curr. Biol. 15, 2149-2155 (2005).
Monticelli S, Ansel KM, Xiao C, Socci ND, Krichevsky AM, Thai TH, Rajewsky N, Marks DS, Sander C, Rajewsky K, Rao A, Kosik KS. MicroRNA profiling of the murine hematopoietic system. Genome Biol. 6, R71 (2005).
Morlando M, Ballarino M, Gromak N, Pagano F, Bozzoni I, Proudfoot NJ. Primary microRNA transcripts are processed co-transcriptionally. Nat. Struct. Mol. Biol. 15, 902-909 (2008).
Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, Abel L, Rappsilber J, Mann M, Dreyfuss G. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 16, 720-728 (2002).
Nissan T, Parker R. Computational analysis of miRNA-mediated repression of translation: implications for models of translation initiation inhibition. RNA 14, 1480-1491 (2008).
Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100 (2007).
Olsen PH, Ambros V. The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev. Biol. 216, 671-680 (1999).
Parker JS, Roe SM, Barford D. Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature 434, 663-666 (2005).
Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B, Hayward DC, Ball EE, Degnan B, Muller P, Spring J, Srinivasan A, Fishman M, Finnerty J, Corbo J, Levine M, Leahy P, Davidson E, Ruvkun G.
Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89 (2000).
Pedersen IM, Cheng G, Wieland S, Volinia S, Croce CM, Chisari FV, David M. Interferon modulation of cellular microRNAs as an antiviral mechanism. Nature 449, 919-922 (2007).
Petersen CP, Bordeleau ME, Pelletier J, Sharp PA. Short RNAs repress translation after initiation in mammalian cells. Mol. Cell 21, 533-542 (2006).
Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, Ju J, John B, Enright AJ, Marks D, Sander C, Tuschl T. Identification of virus-encoded microRNAs. Science 304, 734-736 (2004).
Piao X, Zhang X, Wu L, Belasco JG. CCR4-NOT deadenylates mRNA associated with RNA-induced silencing complexes in human cells. Mol. Cell Biol. 30, 1486-1494 (2010).
Pillai RS, Artus CG, Filipowicz W. Tethering of human Ago proteins to mRNA mimics the miRNA-mediated repression of protein synthesis. RNA 10, 1518-1525 (2004).
Pillai RS, Bhattacharyya SN, Artus CG, Zoller T, Cougot N, Basyuk E, Bertrand E, Filipowicz W. Inhibition of translational initiation by Let-7 MicroRNA in human cells. Science 309, 1573-1576 (2005).
Qi HH, Ongusaha PP, Myllyharju J, Cheng D, Pakkanen O, Shi Y, Lee SW, Peng J, Shi Y. Prolyl 4-hydroxylation regulates Argonaute 2 stability. Nature 455, 421-424 (2008).
Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906 (2000).
Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. MicroRNAs in plants. Genes Dev. 16, 1616-1626 (2002).
Rigoutsos I. New tricks for animal microRNAs: targeting of amino acid coding regions at conserved and nonconserved sites. Cancer Res. 69, 3245-3248 (2009).
Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A. Identification of mammalian microRNA host genes and transcription units. Genome Res. 14, 1902-1910 (2004).
Ruby JG, Jan CH, Bartel DP.
Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86 (2007).
Rybak A, Fuchs H, Hadian K, Smirnova L, Wulczyn EA, Michel G, Nitsch R, Krappmann D, Wulczyn FG. The let-7 target gene mouse lin-41 is a stem cell specific E3 ubiquitin ligase for the miRNA pathway protein Ago2. Nat. Cell Biol. 11, 1411-1420 (2009).
Saetrom P, Heale BS, Snove O Jr, Aagaard L, Alluin J, Rossi JJ. Distance constraints between microRNA target sites dictate efficacy and cooperativity. Nucleic Acids Res. 35, 2333-2342 (2007).
Schratt GM, Tuebing F, Nigh EA, Kane CG, Sabatini ME, Kiebler M, Greenberg ME. A brain-specific microRNA regulates dendritic spine development. Nature 439, 283-289 (2006).
Schwarz DS, Hutvágner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208 (2003).
Sempere LF, Freemantle S, Pitha-Rowe I, Moss E, Dmitrovsky E, Ambros V. Expression profiling of mammalian microRNAs uncovers a subset of brain-expressed microRNAs with possible roles in murine and human neuronal differentiation. Genome Biol. 5, R13 (2004).
Thai TH, Calado DP, Casola S, Ansel KM, Xiao C, Xue Y, Murphy A, Frendewey D, Valenzuela D, Kutok JL, Schmidt-Supprian M, Rajewsky N, Yancopoulos G, Rao A, Rajewsky K. Regulation of the germinal center response by microRNA-155. Science 316, 604-608 (2007).
Thomson JM, Newman M, Parker JS, Morin-Kensicki EM, Wright T, Hammond SM. Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev. 20, 2202-2207 (2006).
Trabucchi M, Briata P, Garcia-Mayoral M, Haase AD, Filipowicz W, Ramos A, Gherzi R, Rosenfeld MG. The RNA-binding protein KSRP promotes the biogenesis of a subset of microRNAs. Nature 459, 1010-1014 (2009).
Valastyan S, Reinhardt F, Benaich N, Calogrias D, Szász AM, Wang ZC, Brock JE, Richardson AL, Weinberg RA. A pleiotropically acting microRNA, miR-31, inhibits breast cancer metastasis. Cell 137, 1032-1046 (2009).
van Rooij E, Sutherland LB, Qi X, Richardson JA, Hill J, Olson EN. Control of stress-dependent cardiac growth and gene expression by a microRNA. Science 316, 575-579 (2007).
Wu L, Fan J, Belasco JG. MicroRNAs direct rapid deadenylation of mRNA. Proc. Natl Acad. Sci. USA 103, 4034-4039 (2006).
Xie H, Lim B, Lodish HF. MicroRNAs induced during adipogenesis that accelerate fat cell development are downregulated in obesity. Diabetes 58, 1050-1057 (2009).
Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K. Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat. Struct. Mol. Biol. 13, 13-21 (2006).
Yekta S, Shih IH, Bartel DP. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596 (2004).
Zdanowicz A, Thermann R, Kowalska J, Jemielity J, Duncan K, Preiss T, Darzynkiewicz E, Hentze MW. Drosophila miR2 primarily targets the m7GpppN cap structure for translational repression. Mol. Cell 35, 881-888 (2009).
Zisoulis DG, Lovci MT, Wilbert ML, Hutt KR, Liang TY, Pasquinelli AE, Yeo GW. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nat. Struct. Mol. Biol. 17, 173-179 (2010).

Chapter 2. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells

This chapter was written by Margaret S. Ebert and edited by Joel R. Neilson and Phillip A. Sharp. This chapter was published as an article in Nature Methods vol. 4 pp. 721-726 (2007). Copyright belongs to the authors.

MicroRNAs are predicted to regulate thousands of mammalian genes, but relatively few targets have been experimentally validated and few microRNA loss-of-function phenotypes have been assigned. As an alternative to chemically modified antisense oligonucleotides, we developed microRNA inhibitors that can be expressed in cells, as RNAs produced from transgenes.
Termed 'microRNA sponges,' these competitive inhibitors are transcripts expressed from strong promoters, containing multiple, tandem binding sites to a microRNA of interest. When vectors encoding these sponges are transiently transfected into cultured cells, sponges derepress microRNA targets at least as strongly as chemically modified antisense oligonucleotides. They specifically inhibit microRNAs with a complementary heptameric seed, such that a single sponge can be used to block an entire microRNA seed family. RNA polymerase II promoter (Pol II)-driven sponges contain a fluorescence reporter gene for identification and sorting of sponge-treated cells. We envision the use of stably expressed sponges in animal models of disease and development.

Introduction

MicroRNAs are 20-24-nucleotide RNAs derived from hairpin precursors. Through pairing with partially complementary sites in 3' untranslated regions (UTRs), they mediate post-transcriptional silencing of a predicted 30% of protein-coding genes in mammals (Lewis et al. 2005). MicroRNAs have been implicated in critical processes including differentiation, apoptosis, proliferation, and the maintenance of cell and tissue identity; furthermore, their misexpression has been linked to cancer and other diseases (Lu et al. 2005, Li et al. 2007, Cheng et al. 2007, Chang et al. 2007, He et al. 2005, Care et al. 2007). But relatively few microRNA-target interactions have been experimentally validated in cell culture or in mouse models, and the functions of most microRNAs remain to be discovered. Creating genetic knockouts to determine the function of microRNA families is difficult, as individual microRNAs expressed from multiple genomic loci may repress a common set of targets containing a complementary seed sequence. Thus, a method for inhibiting these functional classes of paralogous microRNAs in vivo is needed.
Presently, loss-of-function phenotypes are induced by means of chemically modified antisense oligonucleotides - 2' O-methyl, locked nucleic acid (LNA) and others - which are presumed to pair with and block mature microRNAs through extensive sequence complementarity (Hutvagner et al. 2004, Meister et al. 2004, Orom et al. 2006). Typically, oligonucleotide inhibitors are transiently transfected into cells, providing a correspondingly transient derepression of microRNA targets. One type of inhibitor has been demonstrated to silence microRNAs in vivo: 'antagomirs,' which are 2' O-methyl, phosphorothioate, cholesterol-modified antisense oligonucleotides; their effect in an animal, however, is only achieved with a high dose (Krützfeldt et al. 2005). Antisense oligonucleotides work as competitive inhibitors of microRNAs, presumably by annealing to the mature microRNA guide strand after the RNA-induced silencing complex has removed the passenger strand (Davis et al. 2006). Delivering a dose sufficient to saturate the cellular pool of microRNAs is critical to their function.

We reasoned that a microRNA target expressed at a sufficiently high level could, analogously, function as a competitive inhibitor of cognate microRNA(s). To boost the affinity of a decoy target for its cognate microRNA, multiple binding sites could be inserted into its 3' UTR. By designing the microRNA binding sites with a bulge at the position normally cleaved by Argonaute 2, these targets would be able to stably interact with, or 'soak up', microribonucleoprotein complexes (microRNPs) loaded with the corresponding microRNA. Such inhibitor RNAs could be expressed transiently from transfected plasmids or stably from chromosomal insertions. Because the interaction between microRNA and target is nucleated by and largely dependent on base-pairing in the seed region (positions 2-8 of the microRNA), a decoy target should interact with all members of a microRNA seed family.
In so doing, it should better inhibit functional classes of microRNAs than do antisense oligonucleotides, which are thought to block single microRNA sequences.

We made decoy targets for several microRNA seed families, named them 'microRNA sponges,' and tested their ability to derepress microRNA targets in mammalian cells. Here we present evidence that microRNA sponges are at least as effective as present antisense technology, that their activity is specific to microRNA seed families, and that they can be used to validate target predictions and assay microRNA loss-of-function phenotypes.

Results

Construction of microRNA sponges

We constructed Pol II sponges by inserting tandemly arrayed microRNA binding sites into the 3' UTR of a reporter gene encoding destabilized GFP driven by the CMV promoter (Fig. 1a). Binding sites for a particular microRNA seed family were perfectly complementary in the seed region with a bulge at positions 9-12 to prevent RNA interference-type cleavage and degradation of the sponge RNA (Fig. 1b). We also constructed perfectly complementary sponges for individual microRNAs. As a control, we constructed a sponge with repeated binding sites complementary to an artificial microRNA based on a sequence from the CXCR4 gene (but not complementary to any known microRNA). Binding site information for all sponge constructs is available in Supplementary Table 1.

We constructed a second class of microRNA sponges to take advantage of strong RNA polymerase III promoters (Pol III), which are known to drive expression of the most abundant cellular RNAs (Fig. 1c). We subcloned tandemly arrayed microRNA binding sites from the GFP sponge constructs into a modified U6 small nuclear RNA promoter-terminator vector, which produces short (<300 nt) RNAs with structurally stabilized 5' and 3' ends (Paul et al. 2003). As they lack an open reading frame, these U6 sponges are substrates for microRNA binding, but not for translation or translational repression.
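To make the binding-site design described above concrete, the following Python sketch builds a bulged site from a microRNA sequence and concatenates several copies into a tandem array. It is an illustration under stated assumptions, not the cloning procedure used in this work: the example microRNA sequence is made up, the 4-nt spacer and all function names are hypothetical, and the bulge is produced here by placing a non-pairing base opposite microRNA positions 9-12, which is only one of several ways to introduce mismatches. The site sequences actually used are those referred to in Supplementary Table 1.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def bulged_site(mirna, bulge_start=9, bulge_end=12):
    """Return a binding site (5'->3') that pairs with `mirna` everywhere
    except opposite microRNA positions bulge_start..bulge_end (1-based).

    Assumption: the mismatch is made by copying the microRNA's own base
    into the site, since A-A, C-C, G-G and U-U can form neither a
    Watson-Crick pair nor a G-U wobble pair.
    """
    mirna = mirna.upper().replace("T", "U")
    opposite = []
    for pos, base in enumerate(mirna, start=1):
        if bulge_start <= pos <= bulge_end:
            opposite.append(base)              # mismatched (bulge) position
        else:
            opposite.append(COMPLEMENT[base])  # paired position
    # The site is antiparallel to the microRNA, so reverse the order.
    return "".join(reversed(opposite))


def sponge_insert(mirna, n_sites=7, spacer="AAUU"):
    """Concatenate n_sites bulged sites separated by a hypothetical spacer."""
    return spacer.join([bulged_site(mirna)] * n_sites)


if __name__ == "__main__":
    example_mirna = "UGACCUAGGAUCCAUGCAAGUC"  # made-up 22-nt sequence
    print(bulged_site(example_mirna))
    print(sponge_insert(example_mirna, n_sites=7))
```

Because only positions 9-12 are altered, the seed match (positions 2-8) is preserved, which is the property the text relies on for a sponge to recognize every member of a seed family.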
Efficacy of microRNA sponges

We transfected HEK293T cells expressing abundant endogenous miR-20 with the CXCR4 control sponge plasmid (C-CX) or with sponge plasmids imperfectly (C-20b) or perfectly (C-20pf) complementary to miR-20. We cotransfected a sponge plasmid and a TK promoter-driven gene encoding Renilla reniformis luciferase (RLuc) regulated by 7 bulged miR-20 sites and an unregulated gene encoding firefly luciferase as a transfection control, at a ratio of 8:1 sponge plasmid to target plasmid. We assayed the expression of the RLuc target 24 h after transfection and observed that it was rescued by both Pol II- and Pol III-driven sponges with bulged or perfect miR-20 binding sites (Fig. 2a). At 48 h, we observed similar results (data not shown). We measured amounts of reporter mRNA by real-time PCR and found that derepression occurred mostly at the translational level (data not shown). For both sponge classes, sponges with 4-7 bulged binding sites produced stronger derepressive effects than sponges with two perfect binding sites. This difference may be due to the availability of more binding sites in the bulged sponges, and/or to the greater stability expected of bulged sponge RNAs compared to sponge RNAs that can be cleaved by miR-20-loaded Argonaute 2.

Between the two sponge classes, the CMV sponges and U6 sponges derepressed the target reporter about equally well - nearly 50% rescue of a target with 7 miR-20 binding sites relative to an unrepressed control reporter - but the U6 sponges also produced a general inhibition of RLuc expression (Supplementary Fig. 1). Fluorescence in situ hybridization with a probe against the U6 sponge RNAs primarily labeled the nucleus, as in previous work (Paul et al. 2003) (data not shown). How an inhibitor localized primarily to the nucleus can function against microRNA localized primarily in the cytoplasm is not clear.
We speculate that a sufficient fraction of the U6 sponge RNA is present in the cytoplasm to inhibit mature microRNA. We performed subsequent assays with the GFP bulged sponges, as they gave the highest activity on both microRNA target reporters tested (miR-16 and miR-20). Cells transfected with these sponge plasmids expressed large amounts of GFP, with only slight repression by endogenous microRNAs. Transfected at low doses, the sponge plasmids expressed GFP mRNA at a subsaturating level such that translation was visibly repressed by endogenous microRNAs relative to unregulated GFP control constructs (data not shown). Thus, the sponge mRNAs function by associating with active microRNPs. To quantify the inhibition of cognate microRNAs by sponges, we used a target reporter with a single bulged binding site for an artificial microRNA based on the CXCR4 sequence. (This system, established in our laboratory, has been used to show that transfected small interfering RNA (siRNA) enters the same effector pathway as endogenous microRNA (Doench et al. 2003).) The majority of predicted microRNA targets contain a single binding site in their 3' UTR, so this target reporter probably mimics the response of a natural microRNA target. We cotransfected the CXCR4 siRNA at varying concentrations and included the CXCR4 sponge containing 7 bulged binding sites to the microRNA, or, as a negative control, a sponge containing 7 bulged binding sites to miR-21, a microRNA not expressed in 293T cells (Fig. 2b). At transfected siRNA concentrations of 1 and 5 nM, the luciferase target was repressed 2-2.5-fold, similar to the observed regulation by endogenous microRNAs of natural UTRs containing one binding site. Furthermore, flow cytometry analysis of GFP revealed that the CXCR4 sponge targeted by 5 nM CXCR4 siRNA was repressed to the same extent that a miR-21 sponge was repressed by endogenous miR-21 in T98G, a cell type that highly expresses that microRNA (data not shown). 
We infer that this range of transfected siRNA corresponds to the concentration range of natural endogenous microRNAs acting on typical target messages. In this range, the CXCR4 sponge rescued target gene expression 75-95% (1.8-1.9-fold derepression) and rescue was above 60% even at the highest siRNA concentration tested (20 nM). We conclude that the GFP sponge RNAs are being produced and accumulating to a sufficiently high level to inhibit most endogenous microRNAs.

To compare the efficacy of inhibiting endogenous microRNAs by microRNA sponges to that of present antisense technology, we transfected 293T cells with target reporters and either a 2' O-methyl antisense oligonucleotide, LNA antisense oligonucleotide, or a bulged GFP sponge, or with control inhibitors (Fig. 2c). The GFP sponge more strongly derepressed the target reporter than the 2' O-methyl antisense oligonucleotide transfected at standard conditions (20 nM) for all microRNAs tested (miR-16, 18, 20, 21 and 30). This effect could be increased slightly by cotransfecting sponge and oligonucleotide. A miR-20 sponge outperformed, but a miR-16 sponge only performed about as well as, an LNA antisense oligonucleotide transfected at 20 nM (Fig. 2c and Supplementary Fig. 2). Perhaps the cross-reactivity of the sponges to seed family members, such as miR-17-5p in the case of the miR-20 sponge, allows them to rescue the effects of entire microRNA families more completely than specific antisense oligonucleotides. We tested sponges with artificial target reporters and 3' UTR reporters in two additional human cell lines and in mouse 3T3 cells and found them to be similarly active in all cell lines (data not shown).

To investigate the possibility of expressing sponges continuously from multicopy chromosomal insertions, we constructed polyclonal cell lines by cotransfecting 293T cells with linearized GFP sponge plasmids and a puromycin selection marker.
After sorting the cell lines for a high-GFP fraction, we assayed the activity of endogenous microRNA in comparison to cells transiently transfected with sponge plasmids. The stable miR-16 sponge-expressing cell line allowed threefold higher expression of a miR-16 target (relative to an untargeted control reporter) than the stable CXCR4 sponge cell line or the parental 293T cells (Supplementary Fig. 3). This represents an activity approximately 40% as strong as that of the transiently transfected sponge. Thus, sponges expressed from transgenes have the potential to at least partially inhibit endogenous microRNAs.

Seed specificity of microRNA sponges

To assess the specificity of the Pol II-driven sponges, we transfected HeLa cells with target reporters and sponges against two microRNAs with different seeds: miR-20, miR-21 or a 50:50 combination of the two sponges (Fig. 3a). Dose-dependent derepression was apparent in samples treated with a 50:50 mixture of the two plasmids. Each target was derepressed by its cognate microRNA sponge and unaffected by the other microRNA sponge relative to treatment with the CXCR4 sponge control.

In contrast, we expected sponges based on the sequence of a given microRNA to be recognized as targets by multiple microRNAs that share the seed. In HeLa cells, microRNA expression profiling detects high levels of miR-30c and miR-30d, and a much lower level of miR-30e (Barad et al. 2004). We reasoned that a sponge element based on the sequence of the low-abundance microRNA would recognize each family member through the common seed and thereby derepress a target of the high-abundance microRNA family member. Accordingly, we assayed a target reporter with perfect sites for miR-30c with either a 2' O-methyl antisense oligonucleotide against miR-30e or a sponge with 6 bulged sites against miR-30e (Fig. 3b).
As expected, the antisense oligonucleotide derepressed the miR-30c target to a very low degree, <1.5-fold, presumably by inhibiting only the low-abundance miR-30e. In contrast, the sponge designed to miR-30e derepressed the target by over fourfold, suggesting cross-reactivity with the more abundant miR-30 family members. Consistent with this, transfection of 20 nM 2' O-methyl oligonucleotide against the more abundant miR-30c derepressed the miR-30c target to a slightly greater extent than the miR-30e sponge. Further supporting the generality of seed recognition by sponges, we observed derepression of perfect target reporters for miR-15a, miR-15b and miR-16, which share a common seed, by treatment with sponges based on the miR-16 sequence (data not shown).

Validation of predicted microRNA targets

To test the ability of sponges to derepress natural microRNA targets, we assayed the E2F1 protein, a demonstrated target of the miR-20 seed family and a predicted target of miR-18 (O'Donnell et al. 2005; Fig. 4a). The amount of the target protein increased by about 1.5-fold after treatment with the miR-18 GFP sponge and by about 2.5-fold after treatment with the miR-20 GFP sponge, as shown in relation to lanes loaded with 1 or 1.5 times the amount of lysate from the control CXCR4 sponge treatment. This difference likely results from the presence of two miR-20 binding sites and one miR-18 site in the E2F1 3' UTR, plus the added inhibition of the coexpressed miR-20 family member miR-17-5p. These effects were recapitulated in a luciferase assay wherein the RLuc reporter was fused to a fragment of the E2F1 UTR spanning the two miR-20 sites (Fig. 4b). Thus, sponges show direct effects on natural and endogenous targets, and can be used to validate target predictions.

To test some predicted targets that had not yet been experimentally validated, we used a luciferase reporter regulated by a large fragment of the CD69 3' UTR (Neilson et al. 2007) or by the E2F5 UTR.
As predicted by the TargetScan 4.0 and miRanda algorithms, respectively, these UTRs are each regulated by a single miR-20 site (Lewis et al. 2003, John et al. 2004). Correspondingly, each reporter was derepressed upon treatment with a miR-20 sponge in 293T cells (Fig. 4c and Supplementary Fig. 4).

Effect of sponges on microRNA levels

Antisense oligonucleotides have been shown to reduce the cellular concentration of their cognate microRNAs (Krützfeldt et al. 2005, Davis et al. 2006). These results from northern blots are complicated by the possibility that the complementary RNA could compete with a labeled probe for base-pairing to the microRNA or prevent transfer of the short RNA to the hybridization matrix. We expected that the overexpression of a microRNA target, namely, expression of a microRNA sponge construct, would not alter the amount of endogenous microRNA. But northern blot analysis showed a modest (typically about twofold, ranging from 1.2-3-fold) specific decrease in free microRNA 24-48 h after transfection of the corresponding sponge (Fig. 5). We observed this effect for bulged and perfect sponges of both the Pol II and Pol III classes. The northern blots also showed microRNA signal near the location of the bands detected by probing against the GFP and U6 sponge RNAs, respectively. Thus, cellular microRNA concentration may be unchanged by sponge expression and the loss of a northern blot signal explained by microRNA retention at the top of the gel owing to interaction with the cognate sponge RNA. It is important to note that the signal of the GFP sponge RNAs is comparable to the signal of endogenous miR-16 detected with the same-length DNA probe and after the same exposure time, supporting the expected inhibition of microRNAs by excess binding sites in the form of Pol II-driven sponges.

To evaluate the abundance of GFP sponge RNAs in transfected cells, we quantified GFP transcripts by real-time PCR in relation to GFP plasmid standards (data not shown).
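Quantification against plasmid standards is commonly done with a standard curve that relates the threshold cycle (Ct) to the logarithm of the template copy number. The Python sketch below illustrates that general approach only; the function names and every numerical value are hypothetical, and it is not claimed to reproduce the analysis performed for this study.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold dilution series of a plasmid standard (copies per reaction)
# and invented Ct values; real data would come from the instrument.
STANDARD_COPIES = [1e3, 1e4, 1e5, 1e6, 1e7]
STANDARD_CT = [30.04, 26.72, 23.40, 20.08, 16.76]

if __name__ == "__main__":
    slope, intercept = fit_standard_curve(STANDARD_COPIES, STANDARD_CT)
    print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
    print("efficiency:", round(amplification_efficiency(slope), 3))
    # Estimate copies in an unknown sample from its (invented) Ct value.
    print("estimated copies:", round(copies_from_ct(25.0, slope, intercept)))
```

Converting copies per reaction into transcripts per cell additionally requires the number of cell equivalents of RNA in each reaction and an assumption about reverse transcription efficiency, neither of which is modeled in this sketch.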
We estimated the copy number of bulged GFP mRNAs in transiently transfected 293T cells to be at least 1,000-2,000 per cell. If all seven binding sites in the sponge RNA's UTR were used to bind microRNA, then this level of sponge expression should allow inhibition of approximately 10^4 microRNAs per cell (1,000-2,000 transcripts x 7 sites, or roughly 7,000-14,000 binding sites), which would be sufficient to inhibit most microRNAs in most cell types.

Discussion

Sponges designed as decoy targets for microRNAs were effective and specific inhibitors of microRNA seed families. Somewhat surprisingly, the sponges with perfectly complementary binding sites were not degraded so rapidly as to be ineffective at competing microRNA from targets. Although these sponge RNAs should be degraded by Argonaute 2-catalyzed cleavage, they probably also stably associate with microRNAs complexed to the cleavage-incompetent Argonautes 1, 3 and 4. They could also form stable interactions with other microRNAs that share the same seed but vary at nucleotides 10-11, producing a bulge that protects against endonucleolytic cleavage.

Inclusion of the GFP reporter in the sponge mRNA is useful for assessing transfection efficiency and for tracking those cells that express high levels of the inhibitor RNA. We envision multiple applications of the GFP sponges for target validation and phenotypic analysis. Cells with poor transfection rates can be subjected to fluorescence-activated cell sorting to isolate subpopulations expressing the sponge RNA and thus suppressing microRNA activity. This could be critical for detecting typically subtle (less than twofold) changes in the levels of proteins targeted by endogenous microRNA. Alternatively, transfected cells can be immunostained for predicted targets or phenotypic markers and two-color flow cytometry can be used to assess the correlation between GFP expression and target-protein level. In these applications GFP expression serves both as an indicator of sponge plasmid dose and as a sensor of cellular microRNA activity.
By contrast, chemically modified antisense oligonucleotides, which lack a reporter function, limit the experimenter to pooled cell analyses and dilute the inhibitor's effect often to unobservable levels in cell lines with low transfection rates. The properties of antisense oligonucleotides and sponges are summarized in Supplementary Table 2. There might be several ways to improve the sponge technology described in this study. Addition of more microRNA binding sites to the sponge UTRs would increase the dose of antisense sequences and should therefore increase the potency of the sponges. Testing a microRNA sponge with 6, 10 or 18 sites showed a marginal increase in activity above 6 sites, with apparently saturating effect, but for sponges expressed at lower levels from chromosomal insertions, the additional sites may be beneficial (data not shown). Alternatively, the spacing between sites might be optimized to enhance the binding of miRNPs to every possible site, although previous results suggest that nearby sites are fully functional (Doench and Sharp 2004). One could also construct sponges with combinations of seed binding sites for two or more microRNA families of interest. To express sponges at a high level transiently in vivo, one could use viral vectors as in a recent work using adenovirus delivered to cardiac tissue. Finally, there may be Pol III elements other than U6 that would produce sponge RNAs at a high level that are transported to the cytoplasm where they would encounter mature microRNA. Just as sponges inhibit endogenous microRNAs, they could also be used to inhibit siRNAs. In a short hairpin RNA-expressing cell line, a siRNA sponge could provide another level of regulatory control. An extension of the current technology would be to express sponges from stably integrated transgenes in vivo. 
Just as short hairpin RNAs, mRNA inhibitors expressed from transgenes, have expanded the experimental scope of siRNAs, transgenic sponges could expand the scope of antisense microRNA inhibitors. Beyond assaying long-term effects of microRNA loss of function in cell lines, we envision the use of drug-inducible sponges in xenograft models to investigate microRNA contributions to tumorigenesis; bone marrow reconstitution approaches to investigate microRNA roles in immune cell development; and, ultimately, germline transgenic sponge mice to ascertain the functions of microRNA families at cell, tissue, organ and organism levels. In principle, microRNA sponges expressed from appropriate promoters should be applicable in any transgenic model organism, including worm, fly and plants.

Methods

Construction of sponge plasmids and reporters

We annealed, ligated, gel purified and cloned oligonucleotides for microRNA binding sites with 4-nt spacers for bulged sites, or with no spacers for perfect sites, into pcDNA5-CMV-d2eGFP vector (Invitrogen/Clontech) digested with XhoI and ApaI. We constructed Pol III sponges by subcloning the UTR into pTZ-U6+27 vector (see Acknowledgments). We constructed luciferase reporters by the same oligonucleotide annealing method or by subcloning the UTR into pcDNA5-TK-RLuc vector. We PCR-amplified and ligated the E2F1 UTR fragment (nucleotides 393-978), the CD69 UTR fragment (nucleotides 25-899) and the E2F5 UTR (1-653) into the same vector.

Luciferase assays

We plated 293T cells or HeLa cells the day before transfection and transfected them in triplicate with Lipofectamine 2000 (Invitrogen) and 50 ng of pGL3 (Firefly luciferase plasmid), 90 ng of RLuc target reporter plasmid, and 700 ng of sponge plasmid. We transfected the E2F1 UTR reporter at 4.5 ng, the E2F5 UTR reporter at 0.9 ng. We cotransfected 2' O-methyl antisense (Dharmacon) and LNA antisense (Exiqon, Dharmacon) oligonucleotides at 20 nM.
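The reporter normalization described in the figure legends (RLuc relative to firefly luciferase, and a targeted reporter relative to an untargeted control reporter) can be sketched in Python as follows. All luminometer readings below are invented for illustration, and the helper names are hypothetical rather than taken from any analysis software used in this work.

```python
from statistics import mean, stdev


def relative_rluc(rluc_readings, fluc_readings):
    """Per-well RLuc signal normalized to the firefly transfection control."""
    return [r / f for r, f in zip(rluc_readings, fluc_readings)]


def target_expression(target_ratios, untargeted_ratios):
    """Mean expression of the targeted reporter relative to the untargeted one."""
    return mean(target_ratios) / mean(untargeted_ratios)


def fold_derepression(expr_with_sponge, expr_with_control_sponge):
    """Fold change in target expression caused by the specific sponge."""
    return expr_with_sponge / expr_with_control_sponge


# Invented triplicate readings (arbitrary light units).
FLUC = [1000.0, 1000.0, 1000.0]
TARGET_CONTROL_SPONGE = relative_rluc([100.0, 110.0, 90.0], FLUC)
TARGET_SPECIFIC_SPONGE = relative_rluc([300.0, 310.0, 290.0], FLUC)
UNTARGETED = relative_rluc([500.0, 520.0, 480.0], FLUC)

if __name__ == "__main__":
    repressed = target_expression(TARGET_CONTROL_SPONGE, UNTARGETED)
    rescued = target_expression(TARGET_SPECIFIC_SPONGE, UNTARGETED)
    print("target expression, control sponge:", round(repressed, 3))
    print("target expression, specific sponge:", round(rescued, 3))
    print("fold derepression:", round(fold_derepression(rescued, repressed), 2))
    print("s.d. of triplicate ratios:", round(stdev(TARGET_SPECIFIC_SPONGE), 4))
```

Dividing by the firefly signal corrects for well-to-well differences in transfection efficiency, and dividing by the untargeted reporter separates microRNA-dependent repression from nonspecific effects of the sponge plasmid on reporter expression.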
We transfected the CXCR4 microRNA in the form of a siRNA mixed in varying ratios with negative control siRNA (Dharmacon) to maintain 20 nM total siRNA concentration. We performed all assays at 24 h after transfection with the dual luciferase assay (Promega) on an Optocomp I luminometer (MGM Instruments).

Additional methods

Primers used, western blot and northern blot analyses, construction of stable cell lines, and quantification of sponge RNAs are described in Supplementary Methods.

Author contributions

M.S.E. and J.R.N. conceived the experimental design and made the sponge constructs. M.S.E. performed the experiments and wrote the manuscript. P.A.S. supervised the work.

References

Barad O, Meiri B, Avniel A, Aharonov R, Barzilai A, Bentwich I, Einav U, Gilad S, Hurban P, Karov Y, Lobenhofer EK, Sharon E, Shiboleth YM, Shtutman M, Bentwich Z, Einat P. MicroRNA expression detected by oligonucleotide microarrays: system establishment and expression profiling in human tissues. Genome Res. 14, 2486-2494 (2004).
Care A, Catalucci D, Felicetti F, Bonci D, Addario A, Gallo P, Bang ML, Segnalini P, Gu Y, Dalton ND, Elia L, Latronico MV, Hoydal M, Autore C, Russo MA, Dorn GW 2nd, Ellingsen O, Ruiz-Lozano P, Peterson KL, Croce CM, Peschle C, Condorelli G. MicroRNA-133 controls cardiac hypertrophy. Nat. Med. 13, 613-618 (2007).
Chang TC, Wentzel EA, Kent OA, Ramachandran K, Mullendore M, Lee KH, Feldmann G, Yamakuchi M, Ferlito M, Lowenstein CJ, Arking DE, Beer MA, Maitra A, Mendell JT. Transactivation of miR-34a by p53 broadly influences gene expression and promotes apoptosis. Mol. Cell 26, 745-752 (2007).
Cheng HY, Papp JW, Varlamova O, Dziema H, Russell B, Curfman JP, Nakazawa T, Shimizu K, Okamura H, Impey S, Obrietan K. MicroRNA modulation of circadian-clock period and entrainment. Neuron 54, 813-829 (2007).
Davis S, Lollo B, Freier S, Esau C. Improved targeting of miRNA with antisense oligonucleotides. Nucleic Acids Res. 34, 2294-2304 (2006).
Doench JG, Petersen CP, Sharp PA. siRNAs can function as miRNAs. Genes Dev. 17, 438-442 (2003).
Doench JG, Sharp PA. Specificity of microRNA target selection in translational repression. Genes Dev. 18, 504-511 (2004).
He L, Thomson JM, Hemann MT, Hernando-Monge E, Mu D, Goodson S, Powers S, Cordon-Cardo C, Lowe SW, Hannon GJ, Hammond SM. A microRNA polycistron as a potential human oncogene. Nature 435, 828-833 (2005).
Hutvagner G, Simard MJ, Mello CC, Zamore PD. Sequence-specific inhibition of small RNA function. PLoS Biol. 2, e98 (2004).
John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. Human microRNA targets. PLoS Biol. 2, e363 (2004).
Krützfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M. Silencing of microRNAs in vivo with 'antagomirs'. Nature 438, 685-689 (2005).
Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction of mammalian microRNA targets. Cell 115, 787-798 (2003).
Li QJ, Chau J, Ebert PJ, Sylvester G, Min H, Liu G, Braich R, Manoharan M, Soutschek J, Skare P, Klein LO, Davis MM, Chen CZ. miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-161 (2007).
Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, Downing JR, Jacks T, Horvitz HR, Golub TR. MicroRNA expression profiles classify human cancers. Nature 435, 834-838 (2005).
Meister G, Landthaler M, Dorsett Y, Tuschl T. Sequence-specific inhibition of microRNA- and siRNA-induced RNA silencing. RNA 10, 544-550 (2004).
Neilson JR, Zheng GX, Burge CB, Sharp PA. Dynamic regulation of miRNA expression in ordered stages of cellular development. Genes Dev. 21, 578-589 (2007).
O'Donnell KA, Wentzel EA, Zeller KI, Dang CV, Mendell JT. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839-843 (2005).
Orom UA, Kauppinen S, Lund AH. LNA-modified oligonucleotides mediate specific inhibition of microRNA function. Gene 372, 137-141 (2006).
Paul CP, Good PD, Li SX, Kleihauer A, Rossi JJ, Engelke DR. Localized expression of small RNA inhibitors in human cells. Mol. Ther. 7, 237-247 (2003).

Acknowledgments

This work was funded by US Public Health Service grants U19-AI056900 from the National Cancer Institute, by an Integrative Cancer Biology Program Grant U54 CA112967 from the National Institutes of Health to P.A.S. and partially by Cancer Center Support (core) P30-CA14051 from the National Cancer Institute. M.S.E. is supported by a Howard Hughes Medical Institute Predoctoral Fellowship and a Paul and Cleo Schimmel Scholarship. J.R.N. is supported by the Cancer Research Institute. We thank A. Garfinkel and M. Kumar for luciferase reporter preparations, A. Leung for assistance with fluorescence in situ hybridization, D. Engelke (University of Michigan) for the U6 vector and members of the Sharp laboratory for helpful discussions.

Figures

Figure 1. Design of microRNA sponges. (a) We constructed GFP sponges by inserting multiple microRNA binding sites into the 3' UTR of a 2-h destabilized GFP reporter gene driven by the CMV promoter. (b) The imperfect pairing between a microRNA and a sponge with bulged binding sites is diagrammed for miR-21. We designed sponges with a bulge to protect against endonucleolytic cleavage by Argonaute 2. (c) We constructed U6 sponges by subcloning the microRNA binding site region into a vector containing a U6 snRNA promoter with 5' and 3' stem-loop elements.

[Figure 1 diagram labels: (a) CMV promoter, d2eGFP, microRNA binding sites (bulged or perfect), BGH poly(A); (b) sponge site paired with miR-21, with a central bulge; (c) Pol III U6 promoter, 5' stem-loop, microRNA binding sites (bulged or perfect), 3' stem-loop.]

Figure 2. Efficacy of microRNA sponges.
(a-c) RLuc activity relative to firefly luciferase activity was assayed in 293T cells 24 h after transfection with RLuc microRNA target reporters, firefly luciferase transfection control and microRNA sponge plasmids. An RLuc target regulated by 7 miR-20 sites was derepressed by GFP sponges and U6 sponges with bulged or perfect binding sites for miR-20 (a). C, CMV sponge; U, U6 sponge. CX, CXCR4 control; 20b, 7 bulged miR-20 sites; 20pf, two perfect miR-20 sites. Bars represent the expression of the miR-20 target relative to an untargeted control reporter. We measured an artificial CXCR4 target reporter with a single bulged binding site in the presence of a control GFP sponge against miR-21 (miR-21 sponge) or a GFP sponge containing seven CXCR4 binding sites (CXCR4 sponge; b). We transfected cells with 20 nM antisense oligonucleotide (2' O-methyl 20 or LNA 20) or with the CMV bulged sponge against miR-20 (sponge 20; c). Negative controls: mock (no oligonucleotides or sponges), 2' O-methyl against miR-30, LNA against miR-122, CXCR4 sponge. We performed each experiment at least three times and have shown a representative example. Error bars, s.d.; n = 3.

Figure 3. Specificity of microRNA sponges. (a) We assayed RLuc activity relative to firefly luciferase activity in HeLa cells 24 h after transfection with RLuc microRNA target reporters, firefly luciferase transfection control and microRNA sponge plasmids. Targets of miR-20 and miR-21 are specifically derepressed by the corresponding GFP sponge. Bars are normalized to the relative RLuc units of samples treated with the CXCR4 control sponge. (b) We assayed a perfect target reporter of miR-30c in HeLa cells transfected with oligonucleotide or sponge inhibitors of miR-30e. Controls: 2' O-methyl anti-miR-181, CXCR4 sponge.
MicroRNA sequences below show the heptameric seed sequence in bold, with nucleotide differences between the two family members underlined. We performed each experiment at least three times and have shown a representative example. Error bars, s.d.; n = 3.

[Figure 3 panels: (a) miR-20 target and miR-21 target, x-axis: proportions of miR-20 sponge and miR-21 sponge (0.0, 0.5, 1.0); (b) miR-30c target, conditions: control, 2' O-methyl anti-miR-30e, miR-30e sponge.]

miR-20 5'-UAAAGUGCUUAUAGUGCAGGUA-3'
miR-21 5'-UAGCUUAUCAGACUGAUGUUGA-3'
miR-30c 5'-UGUAAACAUCCUACACUCUCAGC-3'
miR-30e 5'-UGUAAACAUCCUUGACUGGA-3'

Figure 4. Validation of microRNA targets. (a) We assayed 293T cells transfected with GFP sponges against miR-18, miR-20 or the CXCR4 control by western blot 48 h after transfection. The increase in endogenous E2F1 upon inhibition of miR-18 or miR-20 is shown relative to the control samples loaded at indicated amounts; E2F1 is the 60 kDa band indicated; the other bands are nonspecific (top). Beta-actin loading control (bottom). (b) We assayed 293T cells transfected with an RLuc reporter fused to a fragment of the E2F1 UTR spanning two miR-20 sites, firefly luciferase and GFP sponges. Bars represent RLuc units relative to firefly luciferase units. (c) We assayed RLuc activity relative to firefly luciferase activity in 293T cells transfected with an RLuc reporter fused to a fragment of the CD69 UTR containing a predicted miR-20 binding site, firefly luciferase and GFP sponges. We performed each experiment at least three times and have shown a representative example. Error bars, s.d.; n = 3.

[Figure 4 panels: (a) western blot with bands labeled E2F1 and beta-actin; (b, c) bar charts, x-axis: sponge (C-CX, C-20).]

Figure 5. Effect of sponges on microRNA levels. (a) We transfected 293T cells with sponge plasmids and collected total RNA for northern blot analysis 48 h later.
We probed the blot for miR-16 (top; 24-h exposure), then stripped the blot and reprobed it for GFP mRNA, U6 sponge RNA (both 24-h exposure) and a tRNA loading control (3-h exposure; bottom). (b) Quantitation of miR-16 relative to tRNA for each sponge-treated sample. We performed northern blots for miR-16 and miR-20 >10 times in 293T and HeLa cells, and show results from a representative blot.

[Figure 5 panels: northern blot lanes C-CX, C-16b, C-16pf, U-CX, U-16b, U-16pf; bands labeled miR-16, GFP sponges, U6 sponges and tRNA-Gln.]

Supplementary Information

Supplementary Figure 1. Effect of sponges on a miR-20 target and an untargeted control. We transfected 293T cells with a Renilla luciferase vector containing seven miR-20 sites or an otherwise identical vector containing seven CXCR4 control sites and the sponge plasmids indicated. Bars represent Renilla luciferase units relative to Firefly luciferase units. Error bars represent standard deviation among triplicate samples. Results are representative of a minimum of three independent experiments.

[Supplementary Figure 1 plot: x-axis, sponge (C-CX, C-20b, C-20pf, U-CX, U-20b, U-20pf).]

Supplementary Figure 2. Comparison of sponges to antisense oligos. We transfected 293T cells with 20 nM antisense oligo (2' O-methyl or LNA) or with the CMV bulged sponge against miR-16. Negative controls: mock (no oligos or sponges), 2' O-methyl against miR-30, LNA against miR-122, CXCR4 sponge. The target reporter contains nine miR-16 sites and is derepressed slightly more strongly by the LNA and sponge than by the 2' O-methyl oligo. Error bars denote standard deviation in triplicate samples. Results are representative of a minimum of three independent experiments.

Supplementary Figure 3. Inhibition of microRNA by a stably expressed sponge.
We stably transfected 293T cells with the miR-16 sponge or CXCR4 sponge (control) plasmid, sorted for high GFP expression, and tested by dual luciferase assay with a Renilla luciferase reporter for miR-16 or an untargeted Renilla luciferase control. Bars represent expression of the miR-16 target relative to the untargeted control in each cell line. The miR-16 target is rescued about 40 percent as well by the stably expressed sponge as by the transiently transfected sponge. Error bars represent standard deviation in triplicate samples.

Supplementary Figure 4. Validation of new microRNA targets. We fused the E2F5 UTR (which contains a predicted miR-20 binding site) to a Renilla luciferase reporter. We transfected 293T cells with the UTR reporter, Firefly luciferase, and GFP sponges. Bars represent Renilla luciferase units relative to Firefly luciferase units. Error bars represent standard deviation among triplicate samples. Results are representative of a minimum of three independent experiments. Interestingly, miR-20 is now shown to directly regulate at least two members of the E2F family of transcription factors.

[Supplementary Figure 4 plot: x-axis, sponge (C-20, C-CX).]

Supplementary Table 1. Sequences of sponges and reporters. Sites are written 5' to 3'.

CXCR4 bulged. Renilla luciferase reporter, 7 sites or 1 site; CMV sponge, 7 sites; U6 sponge, 4 sites. Sequence: AAGUUUUCAGAAAGCUAACA
miR-16 bulged. Renilla luciferase reporter, CMV sponge, and U6 sponge, 9 sites. Sequence: AAUAUUC UAUGCUGCUA
miR-16 perfect. CMV sponge and U6 sponge, 2 sites. Sequence: CGCCAAUAUUUACGUGCUGCUA
miR-18 bulged. CMV sponge, 8 sites. Sequence: UAUCUGCAC UUAGGCAC-CUUA
miR-20 bulged. Renilla luciferase reporter and CMV sponge, 7 sites; U6 sponge, 4 sites. Sequence: UACCUGCACUC GCGCACUUUA
miR-20 perfect. CMV sponge and U6 sponge, 2 sites. Sequence: CUACCUGCACUAUAAGCACUUUA
miR-21 bulged. Renilla luciferase reporter, 6 sites; CMV sponge, 7 sites. Sequence: UCAACAUCAGGACAUAAGCUA
miR-30c perfect. Renilla luciferase reporter, 2 sites. Sequence: GCUGAGAGUGUAGGAUGUUUACA
miR-30e bulged. CMV sponge, 6 sites. Sequence: UCCAGUCC'CUAUGUUUACA

Supplementary Table 2. Comparison of microRNA sponges to antisense oligos.

Composition. 2'-O-methyl antisense: modified RNA oligo. LNA antisense: modified RNA and DNA oligo. MicroRNA sponge: mRNA containing a 3' UTR.
Number of binding sites. 2'-O-methyl antisense: one. LNA antisense: one. MicroRNA sponge: multiple.
Means of addition to cells. 2'-O-methyl antisense: transient transfection. LNA antisense: transient transfection. MicroRNA sponge: transient transfection or stable expression from chromosomal insertions.
Reporter function. 2'-O-methyl antisense: none. LNA antisense: none. MicroRNA sponge: GFP or other genetically encoded reporter proteins.
Specificity. 2'-O-methyl antisense: single microRNA. LNA antisense: single microRNA. MicroRNA sponge: microRNA seed family.

Supplementary Methods

Primers

E2F1 UTR fragment: forward primer AATATTCTAGACTCTAACTGCACTTTCGGCC and reverse primer AATAAGGGCCCGAAGCAAATCAAAGTGCAGATTG. CD69 UTR fragment: forward primer AGCTAGCTCGAGACTGTGCCATAGCACCACAG and reverse primer ATGCATGCGGCCGCACAGCTTAAACTTTATAGTGGGTTTT. E2F5 UTR: forward primer GACTCGAGATTCCATGGAAACTTGGGAC and reverse primer CCGCGGCCGCAATGTTTTATACAATTTTATTTT.

Western blot

We transfected 293T cells two days in a row with Lipofectamine 2000 and sponge plasmids. Fluorescence microscopy confirmed that 95-100 percent of the cells were GFP-positive 48 hours after the first transfection. We lysed cells in RIPA buffer and resolved the lysates on a Tris-HCl 4-20% gel, transferred to a nitrocellulose membrane, and probed with anti-E2F1 (Santa Cruz sc193), stripped, and re-probed for beta-actin (Sigma A5441). We imaged the blots with Western Lightning chemiluminescence reagent (PerkinElmer) and film. We performed the experiment three times and have shown a representative result.

Northern blot

We transfected 293T cells with Lipofectamine 2000 and sponge plasmids. We harvested total RNA by Trizol extraction 48 hours post-transfection.
We ran 20 µg RNA on a 12% polyacrylamide gel, along with end-labeled 10-bp DNA ladder (Invitrogen), transferred to Hybond N+ membrane, and probed against miR-16, then stripped and reprobed for glutamine tRNA, then for the 3' end of the d2eGFP coding region, then for the 3' end of the U6 sponge RNA. We imaged the blots with a Storm scanner (Molecular Dynamics) and quantified the bands with ImageQuant software (Amersham Biosciences). We performed the experiment at least three times each for miR-16 and for miR-20 and have shown a representative result.

Construction of stable cell lines

We cotransfected 293T cells with linearized GFP sponge plasmids for miR-16 or the CXCR4 control at a 20:1 ratio to linear puromycin marker (Clontech). We cultured the cells in 2.5 µg/ml puromycin for about six weeks and sorted on a MoFlo FACS instrument (Cytomation) for the highest 10 percent of GFP expression. We cultured these fractions for another week before performing transfection assays.

Quantification of sponge RNAs

We transfected 293T cells with CMV sponges (CXCR4, miR-16, miR-20) and harvested total RNA 24-48 hours later. We treated RNA with DNaseI (Ambion) and reverse transcribed it with random primers using MMLV Reverse Transcriptase (Ambion). We used the cDNA samples or no-RTase controls as templates for real-time PCR with SYBRGreen detection (Applied Biosystems) and primers in the coding region of GFP. We used a dilution series of GFP plasmid standards to estimate the number of GFP cDNAs present in each reaction. We ran each PCR experiment in triplicate and averaged the results of three experiments.

Chapter 3. MicroRNA sponge inhibitors: progress and possibilities

This chapter was written by Margaret S. Ebert and edited by Phillip A. Sharp.

The microRNA (miRNA) "sponge" method was introduced three years ago as a means to create continuous miRNA loss of function in cell lines and transgenic organisms.
Sponge RNAs contain complementary binding sites to a miRNA of interest, and are produced from transgenes within cells. As with most miRNA target genes, a sponge's binding sites are specific to the miRNA seed region, which allows them to block a whole family of related miRNAs. Whether termed sponges, decoys, erasers, or lentiviral antagomirs, this transgenic approach has proven to be a useful tool to probe miRNA functions in a variety of experimental systems. In this review we discuss the recent applications of miRNA sponges with particular emphasis on stable sponge expression in cancer studies and in transgenic animals. We also consider the likelihood that there exist natural mRNAs or non-coding RNAs that function as miRNA sponges to inhibit miRNA families.

Introduction

The widespread involvement of microRNAs (miRNAs) in regulating developmental processes, physiological responses, and pathological conditions in animals has been amply demonstrated (He and Hannon 2004, Bushati and Cohen 2007, Bartel 2009). Nonetheless, the specific functions of each miRNA in the various contexts in which it is expressed are only beginning to be discovered. The typical miRNA is computationally predicted to regulate hundreds of target genes (Friedman et al. 2009), and while there has been progress in compiling sets of predicted targets into pathways (Tsang et al. 2010; see Appendix), every prediction still needs to be experimentally validated. The best experimental approaches create a loss of function in the miRNA of interest. Loss-of-function approaches are superior because they reveal functions that depend on physiological miRNA levels; by contrast, adding exogenous miRNA to the system can result in repression of non-physiological target mRNAs since miRNA-target interaction is strongly concentration-dependent (Mukherji et al. 2010; see Chapter 4). There are three general methods for miRNA loss-of-function studies: genetic knockouts, antisense oligonucleotide inhibitors (Meister et al.
2004, Orom et al. 2006, Krützfeldt et al. 2005), and sponges (Ebert et al. 2007). The sponge mRNA containing multiple target sites complementary to a seed family is a dominant negative method. When the sponge is expressed at high levels, it inhibits the activity of the set of miRNAs with the common seed but not other seed families of miRNAs. While deleting a miRNA is the only way to guarantee complete loss of its activity, the sponge method offers several advantages. First is the convenience of making dominant negative transgenics over knockouts, and the applicability to a broader range of model organisms and cell lines. Second, many miRNAs have seed family members encoded at multiple distant loci; due to this functional redundancy, these miRNAs would have to be knocked out individually and the animals bred to generate the complete knockout strain. Furthermore, some miRNA precursors are transcribed in clusters; the proximity of the miRNAs within a cluster may make it difficult to cleanly delete one miRNA without affecting the processing of the others. Since sponges have a trans dominant activity, the clustering of miRNA precursors is irrelevant.

Sponges also offer advantages over chemically modified antisense oligonucleotide inhibitors. First, these antisense inhibitors are specific for one miRNA since they depend upon extensive sequence complementarity. Thus, to neutralize a family of miRNAs requires the delivery of a mixture of oligonucleotides. In addition, many cells both in vitro and in vivo are resistant to the uptake of antisense oligonucleotides. In contrast, for difficult-to-transfect cell lines or cells in vivo, the sponge transgene can be delivered by a retroviral vector. Inclusion of an open reading frame for a selectable marker or reporter gene in the vector allows for selection or screening, fluorescence-activated cell sorting, or even laser capture microdissection of cells strongly expressing the sponge.
(See Supplementary Information for suggestions on sponge design.) This makes it possible to isolate a fraction of cells in which the family of miRNAs is strongly inhibited, which can reveal even subtle changes in target gene expression. In principle one could include regulatory elements in the sponge promoter to make it drug-inducible or tissue-specific for the tissue of choice. By contrast, the cholesterol-modified 'antagomir' oligonucleotides that can be injected into the mouse cannot access all tissues, and mostly accumulate in the liver (Krützfeldt et al. 2005). Finally, antagomirs require repeated administration in large doses to inhibit a miRNA over long durations, whereas one could generate germline transgenic sponge-expressing animals to continuously inhibit the miRNA of interest for the lifetime of the animal. The current status of sponge technology will be described below as examples of the above principles. It is remarkable how useful this technology has become.

Recent applications of miRNA sponges

The immediate application of miRNA sponges as first described was transient treatment and assay in cell culture models. A number of reports demonstrate the flexibility of the method with respect to cell type, promoter, vector, reporter gene, and type of miRNA targeted. Sponges were transfected or transduced into human, mouse, and rat cell lines such as non-small cell lung cancer (Kumar et al. 2008), B cell lymphoma (Bolisetty et al. 2009), embryonic neural stem cells (Rybak et al. 2008), and dissociated hippocampal neurons (Edbauer et al. 2010). Sponge RNAs were transcribed from strong promoters such as CMV (Elcheva et al. 2009, Rybak et al. 2008), U6 (Sayed et al. 2008), and viral LTRs (Kumar et al. 2008). The most commonly used vectors were plasmids (Elcheva et al. 2009, Kumar et al. 2008, Edbauer et al. 2010, Rybak et al. 2008) but some used retroviruses (Bolisetty et al. 2009), lentiviruses (Nachmani et al. 2009, Horie et al.
2009) or adenovirus (Sayed et al. 2008). Individual miRNAs, e.g. miR-155 (Bolisetty et al. 2009), or large seed families, e.g. let-7 (Kumar et al. 2008), were successfully targeted. The most common reporter gene was eGFP (Elcheva et al. 2009, Kumar et al. 2008, Bolisetty et al. 2009, Rybak et al. 2008, Nachmani et al. 2009), but mCherry (Edbauer et al. 2010) and luciferase (Horie et al. 2009) were also used. Typically, cellular assays and target validation assays (visualization of derepressed target protein or 3' UTR reporter expression) were performed 24-72 hours after introduction of the sponge construct.

Transient delivery to tissue is also feasible: Care et al. used an adenoviral eGFP sponge to inhibit miR-133 in cardiac myocytes in vivo in a mouse model of cardiac hypertrophy (Care et al. 2007). Krol et al. used adeno-associated virus (AAV) to deliver sponges to mice subretinally. In the latter case, the eGFP sponge was driven by the rhodopsin promoter to allow for specific expression in photoreceptor cells, and each animal received a combination sponge for three light-regulated miRNAs (miR-182, -96, and -183) in one eye and an empty control sponge in the other (Krol et al. 2010). Three weeks post-injection, retinas were isolated and dissected into retinal layers using laser capture microdissection for eGFP-expressing cells. Western blotting revealed strong derepression for the target glutamate transporter SLC1A1.

One fortuitous aspect of sponge treatment is that it can cause a significant and specific reduction in the miRNA level (Sayed et al. 2008, Rybak et al. 2008, Horie et al. 2009). This may indicate that miRNA-target interaction stimulates degradation of the miRNA. Another positive outcome is the absence of any feedback response that would upregulate the miRNA upon introduction of increased target sites in the form of the miRNA sponge.
Even though early results with transiently introduced sponges were encouraging, it was not certain that sponge mRNAs would be able to accumulate to levels sufficient to inhibit miRNA in stable expression formats. Recent results indicate that this is possible.

Stable miRNA sponge expression

Continuous expression of the sponge inhibitor makes it possible to perform long-term miRNA loss-of-function studies in cell culture and in vivo assays such as bone marrow reconstitution and cancer xenografts. Several groups have achieved stable miRNA sponge activity by expressing the transgene from one or more chromosomal integrations (Scherr et al. 2007, Haraguchi et al. 2009, Gentner et al. 2009, Bonci et al. 2008, Valastyan et al. 2009, Starczynowski et al. 2010, Ma et al. 2010a, Ma et al. 2010b, Gatt et al. 2010, Papapetrou et al. 2010). In principle, stably propagated episomal vectors (Kimchi et al. 1999) should also yield similar results. The challenge for stable expression is to produce a sufficient dose of sponge mRNA given much lower transgene copy numbers compared to transient plasmid transfection. The good news from recent reports is that even partial miRNA inhibition can yield measurable and interesting phenotypes.

Papapetrou et al. sought to probe the role of the erythroid-specific, closely clustered miRNAs miR-144 and miR-451 in blood cell development. To this end they used lentiviral sponges marked with a different color fluorescent reporter for each miRNA to dissect their relative contributions in erythropoiesis (Papapetrou et al. 2010). Bone marrow reconstitution was performed with a 1:1 mixture of green control sponge with red (miR-144) or yellow (miR-451) sponge, or both. Three to four weeks after transplantation, the competitive repopulation of the chimeric blood was analyzed by flow cytometry.
Both miRNAs were found to be required for normal progression through the first stage of erythroblast maturation, and their simultaneous inhibition showed that they act additively.

One of the most common applications of stably expressed sponges is to mimic the downregulation of specific miRNAs that are aberrantly expressed in certain disease states. For example, by screening miRNA expression and metastatic potential of a panel of mammary cell lines, Valastyan et al. identified miR-31 as strongly down-regulated in aggressive metastatic cancer (Valastyan et al. 2009). They set up an experimental model wherein human non-metastatic breast cancer cells transduced with retroviral eGFP sponges for miR-31 or an irrelevant sequence were orthotopically implanted in mouse mammary fat pads. Primary tumor size was not significantly affected by the inhibition of miR-31, but, while the control sponge tumors did not metastasize, miR-31 sponge tumors metastasized to the lungs, forming ten times more lesions (easily identifiable by their GFP fluorescence). This result allowed the authors to identify miR-31 as a suppressor of metastasis. A similar approach was taken to show that miR-10b (Ma et al. 2010a) and miR-9 (Ma et al. 2010b) promote breast cancer metastasis. The recent finding that reduction in the expression of a tumor suppressor by a mere 20 percent can promote the development of cancer (Alimonti et al. 2010) suggests that screens with sponges, which may alter target gene expression to a similar extent, could be generally informative.

A related experiment is the application of a sponge to mimic the genetic state of patients with a genomic deletion of a particular miRNA or miRNA cluster. For example, the miR-15a-16-1 cluster is located within a region of chromosome 13q14 that is frequently deleted in leukemia, prostate cancer, and other malignancies (Bottoni et al. 2005, Bandi et al. 2009, Bonci et al. 2008, Hanlon et al. 2009, Corthals et al. 2010, Gatt et al. 2010).
Bonci et al. and Gatt et al. used lentiviral GFP sponges with sites for miR-15a and miR-16, respectively, and tested transduced human prostate cancer and multiple myeloma cell lines by xenograft assay. In both cases the miR-15/16-inhibited cancers developed larger, more invasive tumors than their negative controls; in the multiple myeloma study, the animals showed substantially decreased survival, from a median of 80 to 31 days. Analysis of the tumors implicated several signaling pathways in which the miR-15/16 family acts to suppress survival, proliferation and invasiveness (Gatt et al. 2010).

Another instance of a disease-associated miRNA cluster deletion occurs in the 5q- subtype of myelodysplastic syndrome (MDS) (Starczynowski et al. 2010). In this case the miRNAs in the cluster, miR-145 and -146a, have different seeds. To model the partial loss of these two miRNAs in hematopoietic stem/progenitor cells, Starczynowski et al. used a combination sponge containing 8-9 bulged sites for each miRNA. Cells transduced with retroviral YFP sponges were transplanted into lethally irradiated recipient mice, and were mixed with wild-type cells to mimic the chimerism of human 5q- patients. Eight weeks post-transplantation, the animals' blood cells manifested most of the features of MDS. Observation over the long term proved the benefit of including a fluorescent reporter in the competition assay: over the course of several months, YFP+ cells were depleted from the blood of the sponge-transduced (but not vector control) recipients, yet thrombocytosis was still evident, indicating a cell non-autonomous effect of miRNA depletion. This correlated with an increased serum IL-6 concentration attributable to the derepression of miR-146 target gene TRAF6.
Sustained, systemic phenotypes may result from transient miRNA perturbation in a subset of cells if secreted cytokines operate in a positive feedback loop, as in the recently described inflammatory cascade driven by IL6, let-7 down-regulation, and NF-kappaB (Iliopoulos et al. 2009). As in the case of miR-15a-16-1 depletion in cancer, the ability of the stable sponge to partially knock down miRNA activity provides a good mimic for the partial loss of miRNA expression in patients with a heterozygous deletion. The miR-145-146a miRNA cluster was shown to be haploinsufficient in conferring protection against disease (Starczynowski et al. 2010).

miRNA sponges in transgenic animals

The first transgenic organisms made to express miRNA sponges were plants (Franco-Zorrilla et al. 2007). These incorporated a single bulged binding site for the miRNA of interest in the context of an overexpressed non-coding RNA, and successfully generated phenotypes opposite those of the corresponding miRNA-overexpressing plants. Stable, germline miRNA sponge expression in an animal model organism was first achieved in Drosophila using the Gal4-UAS (Upstream Activation Sequence) system (Loya et al. 2009). The sponge constructs consist of five UAS elements, a fluorescent reporter, and ten bulged miRNA binding sites in the 3' UTR. Gal4 expressed from a tissue-specific promoter drives high expression of the sponge transgene. These inhibitors were able to completely suppress a neomorphic phenotype caused by an overexpressed miRNA in the eye, and to largely rescue expression of a target UTR reporter regulated by an endogenous miRNA in the wing imaginal disc. Hypomorphic phenotypes were enhanced by means of a sensitized background: the heterozygous miRNA deletion mutant, which has a reduced level of the miRNA but no detectable phenotype on its own. In this background, the sponge transgenics could phenocopy miRNA-null mutant flies.
Varying the number of transgene copies also modulated the inhibitory effect, which could be used in combination with the miRNA genetic background to generate allelic series. The power of the Gal4 inducible system to dissect a null phenotype was shown by inhibiting a miRNA's activity in specific subtypes of cells. It is known that the miR-8 knockout has neuromuscular junction defects; activating the expression of a miR-8 sponge specifically in neurons or in muscle cells revealed the locally required activity (and regulation of the target gene Ena) in the postsynaptic muscle cell, even though miR-8 is present in both pre- and post-synaptic cells. The ability to probe miRNA function in restricted subsets of cells could be critical, as there are cases of miRNA-target interactions restricted to one cell type; an extreme example is miR-273 repressing the transcription factor die-1 in the right chemosensory ASE neuron, and lsy-6 repressing cog-1 in the left chemosensory ASE neuron in C. elegans (Chang et al. 2004).

Transgenic vertebrates expressing sponges are a work in progress. The recent development of the Tol2 transposon system and various Gal4 strains should facilitate the introduction of sponge transgenes for tissue-specific expression in zebrafish (Asakawa and Kawakami 2008). In the mouse, an inducible sponge could be created by means of the Cre-lox system (to remove a transcriptional stop cassette with tissue-specific recombinase expression) or with a tet-responsive element driving the sponge and tissue-specific reverse tet transactivator (rtTA) expression in combination with feeding the animal doxycycline. A sensitized background of DGCR8 and/or Dicer heterozygosity, which shows partially reduced levels for some miRNAs (Murchison et al. 2005, Wang et al. 2007), might enhance the loss of function. It should be noted, however, that the Dicer heterozygous state can accelerate the development of tumors in mouse models (Kumar et al. 2009).
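The allelic-series idea in this section (sponge transgene copy number combined with the miRNA genetic background) can be pictured with a minimal stoichiometric sketch. It assumes, purely for illustration, that every sponge site removes one miRNA from the active pool; the per-allele miRNA output and the sponge RNA yield per transgene copy are hypothetical numbers, and only the ten sites per sponge follows the fly constructs described above. This is not a model taken from the cited studies.

```python
# Hypothetical stoichiometric sketch of an allelic series.
# Assumption: each sponge binding site sequesters exactly one miRNA.

def free_mirna(mirna_alleles: int, sponge_copies: int,
               mirna_per_allele: int = 500,       # hypothetical molecules per allele
               sponge_rnas_per_copy: int = 20,    # hypothetical sponge RNAs per transgene copy
               sites_per_sponge: int = 10) -> int:
    """Return the miRNA molecules left unbound by sponge sites in one cell."""
    total_mirna = mirna_alleles * mirna_per_allele
    total_sites = sponge_copies * sponge_rnas_per_copy * sites_per_sponge
    return max(0, total_mirna - total_sites)

def allelic_series(max_copies: int = 4):
    """List (miRNA alleles, sponge copies, free miRNA) for both backgrounds."""
    return [(alleles, copies, free_mirna(alleles, copies))
            for alleles in (2, 1)                 # wild type, then heterozygote
            for copies in range(max_copies + 1)]

if __name__ == "__main__":
    for alleles, copies, free in allelic_series():
        print(f"miRNA alleles={alleles} sponge copies={copies} free miRNA={free}")
```

Under these assumed numbers the heterozygous background reaches complete sequestration with fewer transgene copies than the wild type, which is the sense in which a sensitized background enhances the sponge phenotype.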

Are there natural miRNA sponges?

Given the ability of stably integrated mRNA-based miRNA sponges to specifically and in some cases inducibly inhibit miRNA seed families, it seems reasonable to expect that nature might also have invented this type of miRNA inhibitor. There are further reasons to support this hypothesis. First, miRNAs have been shown to be very stable (Bail et al. 2010), some with in vivo half-lives of more than a week (van Rooij et al. 2007); thus it should be more effective to induce a sponge RNA to sequence-specifically sequester a miRNA than to sequence-specifically degrade the mature miRNA strand, which is encased in an Argonaute protein complex. Second, sequestration by a target mimic RNA would operate through seed specificity, so an entire functional class of miRNA seed family members would be inhibited. Finally, effective sponges should be easy to evolve as they require only short stretches of complementarity to miRNA seeds in regions of relatively unstructured RNA. A sponge could contain sites for one miRNA family or for a combination of miRNAs such that it could serve as a specific rescue molecule for one or a few target genes.

One can imagine several scenarios in which the expression of a sponge RNA could add a layer of regulation to post-transcriptional control of miRNA targets. During a developmental transition or in response to a cellular stress, when a miRNA is transcriptionally down-regulated, induction of a sponge RNA could sharpen the loss of that miRNA activity over time (Figure 3A). A miRNA induced to respond to a transient stress could be inhibited shortly thereafter by the accumulation of a stress-induced sponge (Figure 3B). Alternatively, such a stress-induced sponge could act as a quality control mechanism, setting a threshold above which miRNA expression must rise to successfully enact a change in the expression of critical target genes.
A viral sponge RNA could inhibit a host miRNA to change the infected cell's gene expression program so as to evade immune response or hijack cellular pathways to promote viral propagation. A sponge RNA expressed in a specific tissue could uncouple the activity of an intron-derived miRNA from the expression of its host gene. A tissue-specific sponge could also neutralize passenger strand miRNPs to enhance the specificity of miRNA loading (beyond what is determined by the thermodynamic asymmetry of the miRNA duplex that normally controls strand assembly), as has been done with artificial sponges to prevent passenger strand-mediated off-target effects from shRNA vectors (Mockenhaupt et al. 2010). A sponge could be constitutively expressed to fine-tune the activity of a miRNA to a slightly lower level. In certain cellular contexts such as in neurons, spatially separated zones of translation could experience major consequences from local sequestration of miRNA and the ensuing rescue of expression of a small pool of messages.

All speculation aside, the best reason to believe that there could be natural miRNA sponges in animal systems is that there is already evidence for one in plants (Franco-Zorrilla et al. 2007). The IPS1 family of non-coding RNAs (IPS1 and its paralog At4) are processed as mRNAs but contain very short, poorly conserved open reading frames. They also contain in the 3' UTR a 23-nt sequence that is highly conserved among different plant species, and that can act as a single bulged binding site for miR-399. In fact, the miRNA's nucleotides 1-10 are perfectly paired in more than 80 percent of IPS1 genes; there is additional strong, conserved pairing to the miRNA's 3' end. The mismatches opposite nucleotides 10 and 11 protect the mRNA from endonucleolytic cleavage by miR-399-loaded Argonautes.
While the IPS1 RNAs are induced upon phosphate starvation, miR-399 is also induced, and the miR-399 target gene PHO2 is initially downregulated (Chitwood and Timmermans 2007). Franco-Zorrilla et al. found that overexpressing IPS1 in the presence of miR-399 was able to rescue the level of PHO2 mRNA and thereby lower the shoot Pi content. Whether the endogenous IPS1 levels are sufficient to derepress PHO2 to incur the same physiological response remains to be shown. As miR-399 and its sponge inhibitor are both induced by phosphate stress, they appear to act in an incoherent manner to regulate PHO2 target expression. Depending on the relative production and turnover rates of the miRNA and the sponge RNA, this type of regulatory architecture could serve to generate a brief pulse of miRNA activity followed by an attenuation period during which target mRNA levels recover (Chitwood and Timmermans 2007).

mRNAs that act as competitive inhibitors of regulatory small RNAs (sRNAs) were also recently discovered in prokaryotes (Overgaard et al. 2009, Figueroa-Bossi et al. 2009). In this case a constitutively expressed, long-lived sRNA binds to and is destabilized by a target mimic RNA which is induced by chitobiose, a breakdown product of chitin from the outer membrane (Mandin and Gottesman 2009). What results is derepression of a chitoporin gene whose message is normally degraded by the sRNA.

In animal systems, one place to look for potential sponge RNAs is in viral transcripts, which can be expressed at very high levels. In fact, there are hints that a viral miRNA sponge might be at work in cells lytically infected with murine cytomegalovirus (Buck et al. 2010). Upon infection, Buck et al. observed rapid post-transcriptional down-regulation of miR-27a and -27b, in a manner dependent on RNA polymerase activity; higher multiplicity of infection correlated with lower miR-27 levels.
A gain-of-function experiment showed that the miR-27 family suppresses viral replication, supporting the possibility that inhibition of this miRNA family by a viral sponge RNA could facilitate viral replication.

Cellular RNAs are also potential candidates for miRNA target mimics. Recently, genome-wide analysis of chromatin marks uncovered hundreds of large intergenic non-coding RNAs (lincRNAs) (Guttman et al. 2009), some of which localize to the cytoplasm where they could interact with mature miRNAs. There are also dozens of PolIII products and PolII-generated mRNA-like non-coding RNAs of undetermined function listed in noncoding RNA databases; some have been detected at high levels in specific cell types or under specific conditions (Pang et al. 2005). While such RNAs may be transcribed from intergenic promoters or promoters within 3' UTRs, another mechanism that can generate a UTR RNA was recently observed in mouse embryonic development: an exon exclusion event causes the entire coding region of the mRNA to be spliced out, leaving the untranslated regions in a non-coding transcript (Kanadia and Cepko 2010). Such transcripts could act as target mimics for the miRNA or combination of miRNAs with binding sites in their 3' UTRs.

Concluding remarks

Sponges are simply mRNAs with target sites in their 3' UTR that are expressed at sufficient levels to competitively interfere with miRNA regulation of specific endogenous targets. This can be pictured in the context of a cell as a population of miRNAs associated with Argonaute proteins that are bound as RNPs to the population of mRNAs with target sites of different affinities. Surprisingly, the concentration dependence of miRNA interactions with targets of different affinity is not well understood. But survey experiments suggest that miRNA concentrations of 500-1,000 per cell are necessary for silencing (Calabrese et al. 2008, Mukherji et al. 2010).
The concentration of individual miRNAs in some differentiated tissues can range as high as 30,000-50,000 per cell. Under steady-state conditions most of these miRNAs are probably bound to endogenous mRNAs with different affinities, and the addition of sponges at levels of perhaps 50-100 RNAs per cell with 5-10 target sites each can, at least in some cases, compete the endogenous miRNAs away from targets adequately to produce a physiological change. Typically, the sponge's binding sites are designed to be of high affinity with extensive complementarity to the miRNA. Thus, the effectiveness of a sponge for de-repression of a specific target might be expected to vary with the abundance of the miRNA, the nature of the pool of its endogenous targets, and the affinity of the interaction of the miRNA with the specific target mRNA. Given the large number of variables, it is difficult to predict whether a sponge will be effective in a particular cell. Recent statistical analysis of dependences of silencing by miRNA and siRNA on the cellular abundance of target mRNAs illustrates these issues (Arvey et al. 2010). As outlined above, it is encouraging that the use of sponges has generated phenotypes with multiple vectors and in multiple organisms. They will probably be used more broadly as the study of miRNAs focuses on more physiological questions requiring perturbation of their activities in specific cell states.

In conclusion, the miRNA sponge has become a versatile method for miRNA inhibition in cell culture and in vivo. The successful production of transgenic fruitflies whose sponge activity mimics known null mutant phenotypes was a major advance that should encourage attempts to generate other transgenic sponge animals. In plants, the discovery of an endogenous mRNA that acts as a natural miRNA sponge to attenuate a stress response may open the door to discovering more natural target mimics.

Acknowledgments

We thank Mary Lindstrom for help preparing the figures.
This work was supported by United States Public Health Service grant RO1-CA133404 from the National Institutes of Health to P.A.S. and partially by Cancer Center Support (core) grant P30-CA14051 from the National Cancer Institute.

References

Alimonti A, Carracedo A, Clohessy JG, Trotman LC, Nardella C, Egia A, Salmena L, Sampieri K, Haveman WJ, Brogi E, Richardson AL, Zhang J, Pandolfi PP. Subtle variations in Pten dose determine cancer susceptibility. Nature Genet. 42, 454-458 (2010).
Arvey A, Larsson E, Sander C, Leslie CS, Marks DS. Target mRNA abundance dilutes microRNA and siRNA activity. Mol. Syst. Biol. 6, 363 (2010).
Asakawa K, Kawakami K. Targeted gene expression by the Gal4-UAS system in zebrafish. Dev. Growth Differ. 50, 391-399 (2008).
Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, Kiledjian M. Differential regulation of microRNA stability. RNA 16, 1032-1039 (2010).
Bandi N, Zbinden S, Gugger M, Arnold M, Kocher V, Hasan L, Kappeler A, Brunner T, Vassella E. miR-15a and miR-16 are implicated in cell cycle regulation in a Rb-dependent manner and are frequently deleted or down-regulated in non-small cell lung cancer. Cancer Res. 69, 5553-5559 (2009).
Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233 (2009).
Bolisetty MT, Dy G, Tam W, Beemon KL. Reticuloendotheliosis virus strain T induces miR-155, which targets JARID2 and promotes cell survival. J. Virol. 83, 12009-12017 (2009).
Bonci D, Coppola V, Musumeci M, Addario A, Giuffrida R, Memeo L, D'Urso L, Pagliuca A, Biffoni M, Labbaye C, Bartucci M, Muto G, Peschle C, De Maria R. The miR-15a-miR-16-1 cluster controls prostate cancer by targeting multiple oncogenic activities. Nature Med. 14, 1271-1277 (2008).
Bottoni A, Piccin D, Tagliati F, Luchin A, Zatelli MC, degli Uberti EC. miR-15a and miR-16-1 down-regulation in pituitary adenomas. J. Cell Physiol. 204, 280-285 (2005).
Buck AH, Perot J, Chisholm MA, Kumar DS, Tuddenham L, Cognat V, Marcinowski L, Dölken L, Pfeffer S. Post-transcriptional regulation of miR-27 in murine cytomegalovirus infection. RNA 16, 307-315 (2010).
Bushati N, Cohen SM. microRNA functions. Annu. Rev. Cell Dev. Biol. 23, 175-205 (2007).
Calabrese JM. Dicer deletion and short RNA expression analysis in mouse embryonic stem cells. Doctoral thesis (2008).
Care A, Catalucci D, Felicetti F, Bonci D, Addario A, Gallo P, Bang ML, Segnalini P, Gu Y, Dalton ND, Elia L, Latronico MV, Hoydal M, Autore C, Russo MA, Dorn GW 2nd, Ellingsen O, Ruiz-Lozano P, Peterson KL, Croce CM, Peschle C, Condorelli G. MicroRNA-133 controls cardiac hypertrophy. Nature Med. 13, 613-618 (2007).
Chang S, Johnston RJ Jr, Frokjaer-Jensen C, Lockery S, Hobert O. MicroRNAs act sequentially and asymmetrically to control chemosensory laterality in the nematode. Nature 430, 785-789 (2004).
Chitwood DH, Timmermans MC. Target mimics modulate miRNAs. Nature Genet. 39, 935-936 (2007).
Corthals SL, Jongen-Lavrencic M, de Knegt Y, Peeters JK, Beverloo HB, Lokhorst HM, Sonneveld P. Micro-RNA-15a and micro-RNA-16 expression and chromosome 13 deletions in multiple myeloma. Leuk. Res. 34, 677-681 (2010).
Ebert MS, Neilson JR, Sharp PA. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nature Methods 4, 721-726 (2007).
Edbauer D, Neilson JR, Foster KA, Wang CF, Seeburg DP, Batterton MN, Tada T, Dolan BM, Sharp PA, Sheng M. Regulation of Synaptic Structure and Function by FMRP-Associated MicroRNAs miR-125b and miR-132. Neuron 65, 373-384 (2010).
Elcheva I, Goswami S, Noubissi FK, Spiegelman VS. CRD-BP protects the coding region of betaTrCP1 mRNA from miR-183-mediated degradation. Mol. Cell 35, 240-246 (2009).
Figueroa-Bossi N, Valentini M, Malleret L, Bossi L. Caught at its own game: regulatory small RNA inactivated by an inducible transcript mimicking its target. Genes Dev. 23, 2004-2015 (2009).
Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI, Rubio-Somoza I, Leyva A, Weigel D, Garcia JA, Paz-Ares J. Target mimicry provides a new mechanism for regulation of microRNA activity. Nature Genet. 39, 1033-1037 (2007).
Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105 (2009).
Gatt ME, Ebert MS, Mani M, Zhang Y, Gazit R, Carrasco DE, Dutta J, Adamia S, Munshi NC, Minvielle S, Avet-Loiseau H, Tai YT, Anderson KC, Carrasco DR. MicroRNAs 15a/16-1 function as tumor suppressor genes in multiple myeloma. Submitted (2010).
Gentner B, Schira G, Giustacchini A, Amendola M, Brown BD, Ponzoni M, Naldini L. Stable knockdown of microRNA in vivo by lentiviral vectors. Nature Methods 6, 63-66 (2009).
Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227 (2009).
Hanlon K, Rudin CE, Harries LW. Investigating the targets of MIR-15a and MIR-16-1 in patients with chronic lymphocytic leukemia (CLL). PLoS One 4, e7169 (2009).
Haraguchi T, Ozaki Y, Iba H. Vectors expressing efficient RNA decoys achieve the long-term suppression of specific microRNA activity in mammalian cells. Nucleic Acids Res. 37, e43 (2009).
He L, Hannon GJ. MicroRNAs: small RNAs with a big role in gene regulation. Nature Rev. Genet. 5, 522-531 (2004).
Horie T, Ono K, Nishi H, Iwanaga Y, Nagao K, Kinoshita M, Kuwabara Y, Takanabe R, Hasegawa K, Kita T, Kimura T. MicroRNA-133 regulates the expression of GLUT4 by targeting KLF15 and is involved in metabolic control in cardiac myocytes. Biochem. Biophys. Res. Commun. 389, 315-320 (2009).
Iliopoulos D, Hirsch HA, Struhl K.
An epigenetic switch involving NF-kappaB, Lin28, Let-7 MicroRNA, and IL6 links inflammation to cell transformation. Cell 139, 693-706 (2009).
Kanadia RN, Cepko CL. Alternative splicing produces high levels of noncoding isoforms of bHLH transcription factors during development. Genes Dev. 24, 229-234 (2010).
Kimchi A. Functional approaches to gene isolation in mammalian cells. Science 285, 299 (1999).
Kotin RM, Siniscalco M, Samulski RJ, Zhu XD, Hunter L, Laughlin CA, McLaughlin S, Muzyczka N, Rocchi M, Berns KI. Site-specific integration by adeno-associated virus. Proc. Natl Acad. Sci. USA 87, 2211-2215 (1990).
Krol J, Busskamp V, Markiewicz I, Stadler MB, Ribi S, Richter J, Duebel J, Bicker S, Fehling HJ, Schübeler D, Oertner TG, Schratt G, Bibel M, Roska B, Filipowicz W. Characterizing Light-Regulated Retinal MicroRNAs Reveals Rapid Turnover as a Common Property of Neuronal MicroRNAs. Cell 141, 618-631 (2010).
Krützfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M. Silencing of microRNAs in vivo with 'antagomirs'. Nature 438, 685-689 (2005).
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T. Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl Acad. Sci. USA 105, 3903-3908 (2008).
Kumar MS, Pester RE, Chen CY, Lane K, Chin C, Lu J, Kirsch DG, Golub TR, Jacks T. Dicer1 functions as a haploinsufficient tumor suppressor. Genes Dev. 23, 2700-2704 (2009).
Loya CM, Lu CS, Van Vactor D, Fulga TA. Transgenic microRNA inhibition with spatiotemporal specificity in intact organisms. Nature Methods 6, 897-903 (2009).
Ma L, Reinhardt F, Pan E, Soutschek J, Bhat B, Marcusson EG, Teruya-Feldstein J, Bell GW, Weinberg RA. Therapeutic silencing of miR-10b inhibits metastasis in a mouse mammary tumor model. Nature Biotech. 28, 341-347 (2010a).
Ma L, Young J, Prabhala H, Pan E, Mestdagh P, Muth D, Teruya-Feldstein J, Reinhardt F, Onder TT, Valastyan S, Westermann F, Speleman F, Vandesompele J, Weinberg RA. miR-9, a MYC/MYCN-activated microRNA, regulates E-cadherin and cancer metastasis. Nature Cell Biol. 12, 247-256 (2010b).
Mandin P, Gottesman S. Regulating the regulator: an RNA decoy acts as an OFF switch for the regulation of an sRNA. Genes Dev. 23, 1981-1985 (2009).
Meister G, Landthaler M, Dorsett Y, Tuschl T. Sequence-specific inhibition of microRNA- and siRNA-induced RNA silencing. RNA 10, 544-550 (2004).
Mockenhaupt S, Schurmann N, Grimm D. Alleviation of adverse shRNA off-targeting via vector-encoded passenger strand decoys. Keystone symposium poster (2010).
Mukherji S, Ebert MS, Zheng GZ, Tsang JS, Sharp PA, van Oudenaarden A. microRNAs generate gene expression thresholds with ultrasensitive transitions. Submitted (2010).
Murchison EP, Partridge JF, Tam OH, Cheloufi S, Hannon GJ. Characterization of Dicer-deficient murine embryonic stem cells. Proc. Natl Acad. Sci. USA 102, 12135-12140 (2005).
Nachmani D, Stern-Ginossar N, Sarid R, Mandelboim O. Diverse herpesvirus microRNAs target the stress-induced immune ligand MICB to escape recognition by natural killer cells. Cell Host Microbe 5, 376-385 (2009).
Orom UA, Kauppinen S, Lund AH. LNA-modified oligonucleotides mediate specific inhibition of microRNA function. Gene 10, 137-141 (2006).
Overgaard M, Johansen J, Moller-Jensen J, Valentin-Hansen P. Switching off small RNA regulation with trap-mRNA. Mol. Microbiol. 73, 790-800 (2009).
Pang KC, Stephen S, Engström PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS. RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 33, D125-130 (2005).
Papapetrou EP, Korkola JE, Sadelain M. A genetic strategy for single and combinatorial analysis of miRNA function in mammalian hematopoietic stem cells. Stem Cells 28, 287-296 (2010).
Rybak A, Fuchs H, Smirnova L, Brandt C, Pohl EE, Nitsch R, Wulczyn FG. A feedback loop comprising lin-28 and let-7 controls pre-let-7 maturation during neural stem-cell commitment. Nature Cell Biol. 10, 987-993 (2008).
Sayed D, Rane S, Lypowy J, He M, Chen IY, Vashistha H, Yan L, Malhotra A, Vatner D, Abdellatif M. MicroRNA-21 targets Sprouty2 and promotes cellular outgrowths. Mol. Biol. Cell 19, 3272-3282 (2008).
Scherr M, Venturini L, Battmer K, Schaller-Schoenitz M, Schaefer D, Dallmann I, Ganser A, Eder M. Lentivirus-mediated antagomir expression for specific inhibition of miRNA function. Nucleic Acids Res. 35, e149 (2007).
Starczynowski DT, Kuchenbauer F, Argiropoulos B, Sung S, Morin R, Muranyi A, Hirst M, Hogge D, Marra M, Wells RA, Buckstein R, Lam W, Humphries RK, Karsan A. Identification of miR-145 and miR-146a as mediators of the 5q- syndrome phenotype. Nature Med. 16, 49-58 (2010).
Tsang JS, Ebert MS, van Oudenaarden A. Genome-wide dissection of microRNA functions and cotargeting networks using gene set signatures. Mol. Cell 38, 140-153 (2010).
Valastyan S, Reinhardt F, Benaich N, Calogrias D, Szász AM, Wang ZC, Brock JE, Richardson AL, Weinberg RA. A pleiotropically acting microRNA, miR-31, inhibits breast cancer metastasis. Cell 137, 1032-1046 (2009).
van Rooij E, Sutherland LB, Qi X, Richardson JA, Hill J, Olson EN. Control of stress-dependent cardiac growth and gene expression by a microRNA. Science 316, 575-579 (2007).
Wang Y, Medvid R, Melton C, Jaenisch R, Blelloch R. DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal. Nature Genet. 39, 380-385 (2007).

Figures

Figure 1. (A) In the absence of sponge treatment, target mRNAs for a particular miRNA seed family are repressed. (B) After introduction of the sponge transgene, sponge mRNAs are expressed at a high level and sequester the miRNA complexes, rescuing the expression of the endogenous targets.
Sponge-treated cells can be identified by their eGFP reporter expression. (C) Pairing of a miRNA with a bulged sponge site shows mismatches opposite miRNA nucleotides 9-12. The miRNA seed region is highlighted.
[Figure 1 panel labels: (A) No Sponge, Sponge Treated; (C) miR-21 3'-AGUUGUAGUC(AGAC)UAUUCGAU-5', with AGAC shown bulged, paired with Sponge 5'-UCAACAUCAGGAC AUAAGCUA-3']

Figure 2. (A) Tissue-specific expression of the Gal4 transcription factor was used to drive miRNA sponge expression under the control of upstream activating sequences (UAS) in transgenic fruit flies. (B) Dissection of a complex phenotype using tissue-specific sponges. A developmental defect in the axonal branching and synaptic boutons of neuromuscular junctions (NMJ) was observed in the miR-8 knockout (second panel) and in miR-8 heterozygous flies expressing a miR-8 sponge inhibitor specifically in muscle (fourth panel). Wild-type appearance of the NMJ is seen in the miR-8 heterozygote (first panel) and in miR-8 heterozygous flies expressing a miR-8 sponge specifically in neurons (third panel). Sponge expression is indicated by GFP fluorescence (shown in green).

Figure 3. Roles for natural sponges in regulating miRNA activity. (A) Rapid transitions: transcriptional down-regulation of a miRNA is sharpened by induction of a sponge RNA that sequesters the lingering mature miRNA. (B) Transient responses: a stress-induced miRNA is allowed a pulse of activity before being inhibited by accumulating stress-induced sponge RNA.
[Figure 3 panel labels: Stage 1, Stage 2; x-axis: Time; legend: miRNA activity with sponge, miRNA activity without sponge, sponge expression]

Supplementary Information

Optimizing the sponge construct

Design of miRNA binding sites: Sites perfectly complementary to the miRNA show some inhibitory activity (Ebert et al. 2007, Sayed et al. 2008, Gentner et al.
2009), perhaps because miRNAs complexed with the catalytically inactive Argonautes 1, 3, and 4 can still be titrated by these sites without cleavage of the sponge RNA. More effective are bulged sites mispaired opposite miRNA positions 9-12 (Ebert et al. 2007, Gentner et al. 2009), presumably because they form a more stable interaction with the miRNA, including miRNA complexed with Ago2. Typical sponge constructs contain 4-10 binding sites separated by a few nucleotides each. Increasing the number of binding sites may have diminishing marginal utility, as each site increases the probability of sponge RNA degradation. Variations in the bulged mismatches and the spacers can be introduced to reduce the risk of recombination during cloning and the risk of introducing multiple unintended binding sites for other regulatory factors. Sites should be placed in an unstructured, non-coding region of the RNA; for Pol III-generated sponges, terminal stem-loops can be included as stabilizing elements (Ebert et al. 2007).

Expression and delivery: To maximize sponge expression, the strongest available promoter for the cell type of interest should be used. For transient assays, plasmid transfection can deliver the highest dose of the sponge transgene. For longer-term assays, viral transduction with high multiplicity of infection should be performed. In vivo delivery can be achieved with adenoviral or adeno-associated viral (AAV) vectors; AAV vectors may be ideal given their ability to infect non-dividing cells and give high expression from a non-random integration site (Kotin et al. 1990). It should be noted that optimized sponges may still exhibit different degrees of inhibition in different contexts: where miRNA concentration is very high, complete titration demands a very high dose of sponge RNA.
On the other hand, where the pool of endogenous targets for the miRNA of interest is large, there should be less free miRNA available, so a lower dose of sponge RNA should suffice to give strong inhibition.

Chapter 4. MicroRNAs generate gene expression thresholds with ultrasensitive transitions

This chapter was written by Margaret S. Ebert and Shankar Mukherji and edited by Phillip A. Sharp and Alexander van Oudenaarden.

MicroRNAs (miRNAs) are short, highly conserved non-coding RNA molecules that repress gene expression in a sequence-dependent manner. Each miRNA is predicted to target hundreds of genes (Lewis et al. 2005, Selbach et al. 2008, Baek et al. 2008, Friedman et al. 2009) and a majority of protein-coding genes are predicted to be miRNA targets (Friedman et al. 2009). Bulk measurements on populations of cells have indicated that, although pervasive, repression due to miRNAs is on average quite modest (~2-fold) (Selbach et al. 2008, Baek et al. 2008, Bartel and Chen 2004). Information on the magnitude of repression in single cells, however, has been lacking. Here we perform single-cell measurements using quantitative fluorescence microscopy and flow cytometry to monitor a target gene's protein expression in the presence and absence of regulation by miRNA. We find that while the average level of repression is modest and in agreement with previous population-based measurements, the repression among individual cells varies dramatically. In particular, we show that regulation by miRNAs establishes a threshold level of target mRNA below which protein production is highly repressed. Beyond this threshold, there is a regime in which expression responds ultrasensitively to target mRNA input until reaching high enough mRNA levels to almost escape repression by miRNA. We constructed a mathematical model describing repression of target gene expression by both non-catalytic and catalytic activity of miRNA.
The model predicted, and experiments confirmed, that the ultrasensitive regime could be shifted to higher target mRNA levels by transfecting additional miRNA or by increasing the number of miRNA binding sites in the 3' UTR of the target mRNA. The ultrasensitive transition is not observed when the miRNA targets a perfectly complementary site that can undergo catalytic cleavage. These results demonstrate that even a single species of miRNA can act both as a switch to effectively silence gene expression and as a fine-tuner of gene expression.

Introduction

MicroRNAs regulate protein synthesis in the cell cytoplasm by promoting target mRNAs' degradation or inhibiting their translation. Their importance is suggested by their abundance, with some miRNAs expressed at up to 50,000 copies per cell (Lim et al. 2003); by their sequence conservation, with some miRNAs conserved from sea urchins to humans (Grimson et al. 2008); and by their number of targets, the majority of protein-coding genes (Friedman et al. 2009). miRNAs can regulate a large variety of cellular processes, from differentiation and proliferation to apoptosis (Yi et al. 2008, Sluijter et al. 2010, Esau et al. 2004, Cimmino et al. 2005, Li and Carthew 2005, Bernstein et al. 2003). miRNAs also confer robustness to systems by stabilizing gene expression during stress and in developmental transitions (Li et al. 2009, Li et al. 2006).

Results and Discussion

Despite the evidence for the importance of gene regulation by miRNAs, the typical magnitude of observed repression by miRNAs is relatively small (Baek et al. 2008), with some notable exceptions such as the switch-like transitions due to miRNAs lin-4 and let-7 targeting the heterochronic genes lin-14 and lin-41 respectively in Caenorhabditis elegans (Bagga et al. 2005).
Importantly, however, most previous studies of regulation by miRNAs in mammalian cells have measured population averages, which often obscure how individual cells respond to signals (Raj and van Oudenaarden 2008). To assay for miRNA activity in single mammalian cells, we constructed a two-color fluorescent reporter system that permits simultaneous monitoring of protein levels in the presence and absence of regulation by miRNA (Fig. 1a). The construct consists of a bidirectional Tet-inducible promoter driving two genes expressing the fluorescent proteins mCherry and eYFP tagged with nuclear localization sequences. The 3' UTR of mCherry is engineered to contain N binding sites for miRNA regulation. In the initial experiments, the inserted sites are recognized by miR-20, which is expressed endogenously in HeLa cells along with its seed family members miR-17-5p and miR-106b. The 3' UTR of eYFP is left unchanged so that it can serve as a reporter of the transcriptional activity in a single cell.

We constructed cell lines that stably expressed the fluorescent reporter construct with either a single bulged miR-20 binding site or no site in the mCherry 3' UTR. The levels of eYFP and mCherry protein were measured for single cells using quantitative fluorescence microscopy. Arranging individual cells according to their eYFP expression level, we observed that cells whose mCherry 3' UTR lacks miRNA binding sites had a concomitant increase in mCherry expression (Fig. 1b). This indicates that in the absence of miRNA targeting of the mCherry mRNA, the level of expression of eYFP is directly related to the level of expression of mCherry. However, in cells with a miR-20 site in the mCherry 3' UTR, the eYFP fluorescence initially increases with no corresponding increase in mCherry expression level (Fig. 1c).
To capture this behavior quantitatively, we measured joint distributions of mCherry and eYFP levels in single cells, binned the single-cell data according to their eYFP levels, and calculated the mean mCherry level in each eYFP bin (Supplementary Fig. 1). We refer to this binned joint distribution as the transfer function. As suggested by the representative single cells shown in Fig. 1c, the transfer function shows a threshold-linear behavior in which the mCherry level, which represents the target protein production, does not appreciably rise until the curve reaches a threshold level of eYFP.

We developed a simple mathematical model of miRNA-mediated regulation that could reproduce the nonlinearity in the above transfer function. This model (Fig. 2a) is similar to previous models (Elf et al. 2003) used to describe protein-protein titration (Buchler and Louis 2008) and small RNA (sRNA) regulation in bacterial systems (Levine et al. 2007). It describes the concentration of free target mRNA (r) subject to regulation by miRNA (m). We assume that only r can be translated into protein. Experimentally, we expect the mCherry signal to be proportional to the concentration of r, and the eYFP signal to be proportional to the concentration of r_untargeted. The core of the model involves the binding of r to m to form an mRNA-miRNA complex and the release of m from the complex back into the pool of active miRNA molecules either with or without the accompanying destruction of r. We assume that the total amount of miRNA is fixed; experimentally we observe no decrease in the miR-20 level beyond experimental uncertainty as a function of eYFP (see Supplementary Fig. 2).

The qualitative shape of the transfer functions generated by the model depends on two key lumped parameters. The first parameter λ, which behaves like a dissociation constant, governs the sharpness of the threshold (Fig. 2b). On a log-log plot relating r to r_untargeted (Fig.
2d) the increased sharpness manifests itself as a slope (which we refer to as the logarithmic gain) greater than 1, marking an ultrasensitive transition connecting the branches of the transfer function of slope 1 that indicate little protein expression (below the ultrasensitive transition) and nearly maximal protein production (above the ultrasensitive transition). λ is inversely proportional to the rate at which miRNA binds the target mRNA (k_on); as k_on increases at a constant k_off, λ decreases and thus sharpens the transition. The threshold constant θ plays a role in the placement of the threshold and also in the sharpness of the transition between the threshold and escape regimes (Fig. 2c). θ is proportional to the concentration of free miRNA available within the cell; as the total concentration of free miRNAs increases, θ increases and pushes the ultrasensitive transition to higher values of r_untargeted (Fig. 2e).

The mathematical model thus suggests experiments that could be performed to modulate the ultrasensitive transitions generated by miRNA-mediated regulation. As our stable cell lines could not achieve high enough levels of reporter expression to capture the complete ultrasensitive transition to escape from miRNA-mediated repression, we carried out the remainder of our experiments by transiently transfecting HeLa cells with reporter constructs and measuring fluorescence via flow cytometry to increase the number of cells in our datasets. To sharpen the transitions by increasing k_on, we increased the number of miRNA binding sites N in the 3' UTR of mCherry. The maximum logarithmic gain increases smoothly from approximately 1 when N = 1 to 1.8 when N = 7 (Fig. 3a); as expected from the model, the effect is stronger going from 1 to 4 binding sites than from 4 to 7 sites.
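The roles of the two lumped parameters can be illustrated numerically. The sketch below is not the chapter's fitted model; it assumes a generic equilibrium titration form in which λ acts as the dissociation-like constant and θ as the total available miRNA, so that free mRNA is the positive root of r^2 + r(θ + λ - r_untargeted) - λ r_untargeted = 0. The function names, parameter values, and grid are hypothetical choices made only for illustration.

```python
import numpy as np

def free_mrna(r_total, lam, theta):
    """Free target mRNA under an ASSUMED equilibrium titration form.

    r_total plays the role of r_untargeted; lam is the dissociation-like
    constant; theta is the total titrating miRNA. Units are arbitrary.
    """
    b = r_total - lam - theta
    return 0.5 * (b + np.sqrt(b * b + 4.0 * lam * r_total))

def logarithmic_gain(r_total, lam, theta):
    """Numerical slope d log(r) / d log(r_total) along the transfer function."""
    log_in = np.log(r_total)
    log_out = np.log(free_mrna(r_total, lam, theta))
    return np.gradient(log_out, log_in)

# Illustrative parameter values only (not fitted to any data).
r_total = np.logspace(-2, 3, 501)
gain_soft = logarithmic_gain(r_total, lam=5.0, theta=10.0)      # weaker binding
gain_sharp = logarithmic_gain(r_total, lam=0.5, theta=10.0)     # stronger binding
gain_shifted = logarithmic_gain(r_total, lam=0.5, theta=40.0)   # more miRNA

for label, gain in [("soft", gain_soft), ("sharp", gain_sharp), ("shifted", gain_shifted)]:
    peak = gain.argmax()
    print(f"{label}: max gain {gain.max():.2f} at r_total = {r_total[peak]:.1f}")
```

Under these assumptions, lowering λ raises the maximum logarithmic gain, and raising θ both raises it and moves it to higher input levels, which is the qualitative behavior the text attributes to increasing k_on and the miRNA concentration.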
We were also able to recapitulate a similar transfer function with N = 7 in the 3' UTR of eYFP, thus isolating the effect to miR-20-mediated regulation rather than any property intrinsic to the mCherry reporter (Supplementary Fig. 3). Interestingly, unlike with previous experiments using bacterial sRNA (Levine et al. 2007), we can also directly test the importance of titration to generate thresholds by using miR-20 binding sites that are perfectly complementary to the endogenous miR-20, thus converting the interaction between target and miRNA into a strongly catalytic, RNAi-type repression. We observe that when the miR-20 bulged binding sites are replaced by a perfectly complementary binding site that yields the same maximum repression as N = 7 bulged sites, the ultrasensitive transition is abolished altogether (Fig. 3a, grey points).

To measure the fold repression as a function of target expression level, we measure the transfer function in the absence of miR-20 binding sites and calculate the ratio of this control transfer function to transfer functions in the presence of 1, 4, and 7 miR-20 sites (Fig. 3b). As expected from Fig. 3a, increasing the number of binding sites increases the fold repression at lower eYFP levels, from just over 2-fold repression with a single miR-20 site to approximately 10-fold repression with seven miR-20 sites, while not significantly changing the fold repression at high eYFP (Fig. 3b). Seen this way, we demonstrate that rather than being only a subtle effect as suggested by population-based averages, which in this case result in at most 2.5-fold repression with seven binding sites (Supplementary Fig. 4), regulation by miR-20 can exert very strong repression of protein production at low target transcript levels. Moreover, the boundary of the regime of strongest repression is marked by the ultrasensitive transition, so shifting this transition to lower or higher target mRNA levels can be of functional significance.
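The fold-repression calculation described above is a per-bin ratio of two transfer functions. A minimal sketch follows, assuming both curves have already been evaluated on the same eYFP bins. The example curves are synthetic, generated from an assumed titration form with arbitrary parameters rather than taken from measurements, and the function name `fold_repression` is hypothetical.

```python
import numpy as np

def fold_repression(control_mean, targeted_mean):
    """Per-bin ratio of the no-site (control) curve to the targeted curve."""
    control_mean = np.asarray(control_mean, dtype=float)
    targeted_mean = np.asarray(targeted_mean, dtype=float)
    fold = np.full(control_mean.shape, np.nan)
    positive = targeted_mean > 0  # the ratio is undefined where the target signal is zero
    fold[positive] = control_mean[positive] / targeted_mean[positive]
    return fold

# Synthetic curves for illustration only (arbitrary units, not data).
eyfp_bins = np.logspace(-1, 3, 9)
lam, theta = 1.0, 9.0                      # assumed, arbitrary parameters
b = eyfp_bins - lam - theta
targeted = 0.5 * (b + np.sqrt(b * b + 4.0 * lam * eyfp_bins))
control = eyfp_bins.copy()                 # no binding sites: output tracks input

fold = fold_repression(control, targeted)
for x, f in zip(eyfp_bins, fold):
    print(f"eYFP bin {x:8.2f}: fold repression {f:5.2f}")
```

With these assumed parameters the ratio is largest in the lowest bins and approaches 1 at high expression, mirroring the qualitative pattern the text describes; the specific numbers carry no experimental meaning.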
Consistent with the model, the ultrasensitive transition can be shifted to either higher or lower eYFP levels by transfecting either miR-20 mimic oligonucleotides (siRNAs) or miRNA sponges that inhibit miR-20 activity (Ebert et al. 2007) (Fig. 3c, d; Supplementary Fig. 5). Increasing the level of miRNA increased the fold repression below the threshold, the threshold mRNA level needed for protein expression, and the sharpness of the transition. In the extreme case of seven miR-20 binding sites with 30 nM miR-20 mimic transfected (Fig. 3d), miRNA-mediated repression can achieve ~40-fold repression compared to a target with no miRNA binding site; the threshold is shifted to a 10-fold higher eYFP level; and the transition between repressed and unrepressed expression is quite sharp, with a maximum logarithmic gain of ~5.4 (Fig. 3d), compared to ~1.8 without the transfected miR-20 mimic, i.e. at endogenous levels (Fig. 3a).

To quantitatively compare the data to the model, we simultaneously fit all the datasets, holding λ constant across the fits for each value of N and θ constant for each amount of transfected siRNA mimic. Interestingly, we see that the fit parameter θ increases with increasing siRNA mimic (Fig. 3e), but in a saturable fashion, while 1/λ increases linearly with N (Fig. 3f). This suggests that the amount of transfected miRNA entering functional complexes is limited by entry into the cytoplasm and/or availability of miRNP components.

To test the generality of these findings, namely that the strength of repression of a miRNA target depends strongly on the relative amounts of the miRNA and its target, we sought to recapitulate the results in more physiological settings.
First, we tested whether similar ultrasensitive transitions would be observed when the reporter construct incorporated naturally occurring miRNA binding sequences by fusing the 3' UTRs of the oncogene HMGA2 and the major GABA transporter gene SLC6A1 to the mCherry reporter and performing dual-color flow cytometry. The HMGA2 3' UTR contains seven binding sites for the miRNA family let-7, which is abundant in HeLa cells, while SLC6A1 contains three binding sites for the neuronal miRNA miR-218, which we supplied exogenously. The experiments showed that we could indeed observe ultrasensitive transitions with these constructs (Supplementary Fig. 6), and for HMGA2 we increased the ultrasensitive threshold incrementally by transfecting higher doses of let-7 siRNA mimic (Supplementary Fig. 6).

Finally, we used a standard dual luciferase assay (see Methods, Supplementary Fig. 7) to measure target expression in mouse embryonic stem cells (ES cells) using only their endogenous pool of miRNA to retain physiological relevance. Furthermore, we measured a transfer function complementary to that in the experiments with HeLa cells: the mRNA target level remained fixed while the miRNA concentration varied. To test varying miRNA concentrations we exploited the fact that different miRNA species are present at different abundances in ES cells (Calabrese et al. 2007). To gauge the strength of miRNA repression, target expression in wild-type ES cells was normalized to target expression in ES cells that lack the enzyme Dicer and thus contain no miRNAs. We observe a similar threshold-linear curve except that it reflects the level of miRNAs (Supplementary Fig. 7): at high miRNA abundances, repression is 5-fold but decreases with miRNA abundance until, at the lowest miRNA levels, target expression in wild-type cells is virtually indistinguishable from that in the miRNA-free Dicer-null cells.
The threshold in regulation by miRNA is determined by the level of the miRNA and by the number and affinity of the target sites. Taking the case described above for regulation by endogenous miR-20 in HeLa cells, the threshold transition occurs at approximately 60 target mRNAs per cell with seven typical sites in the 3' UTR at an endogenous level of approximately 2,000 miR-20 molecules per cell (Supplementary Fig. 2, Supplementary Fig. 8). Many of these miRNAs, as miRNP complexes, could be bound to the endogenous miR-20 target mRNAs in the cell, leaving a limited pool for binding to the reporter mRNAs. Since these experiments are done at steady-state conditions, this suggests that the miRNA system probably has limited capacity to accommodate increases in target populations. These results are consistent with our ability to strongly suppress miR-20 regulation of the target reporter by adding high levels of miR-20 target sites in the form of an exogenous sponge inhibitor (Ebert et al. 2007) (Supplementary Fig. 5). The sponge phenomenon has been observed in multiple mammalian (Edbauer et al. 2010; Starczynowski et al. 2010) and non-mammalian (Loya et al. 2009) organisms, indicating its generality in miRNA regulation.

Our analysis of miRNA-mediated gene regulation at high target expression levels is consistent with previous population-based results, but measuring single cells offers a level of detail inaccessible to bulk assays. The detailed picture, which revealed the ultrasensitive response bounded by a high degree of repression at low target mRNA levels and little repression at high levels of target mRNA, may have important implications for miRNA-mediated regulation.
There has been a disparity between the concept of miRNAs as switches, exemplified by the lin-14 developmental switch in Caenorhabditis elegans where there is a high degree of repression by the miRNA lin-4, and the many observations of miRNA-mediated regulation in mammalian cells where they are best considered as fine-tuners of gene expression. These results show that for some miRNA-target interactions, the miRNA behaves both as a switch, in the target expression regime below the threshold, and as a fine-tuner, in the ultrasensitive transition between the threshold and the minimal repression regime at high mRNA levels.

The target expression thresholds generated by miRNAs could be important in development. Ultrasensitivity characterizes developmental switches such as cell fate decisions. To maintain their identity, differentiated cells must be able to distinguish between leaky and legitimate transcripts. In addition to participating in feedback and feed-forward networks (Tsang et al. 2007, Stark et al. 2005), tissue-specific miRNAs could use molecular titration to set a threshold below which transcripts would be treated as leaky. Such a phenomenon is consistent with the reported tendency of Drosophila miRNAs to target mRNAs that are highly expressed in neighboring tissues derived from a common progenitor (Stark et al. 2005), and with the observed tendency of mammalian miRNAs induced upon differentiation to target mRNAs that were highly expressed in the previous developmental stage (Farh et al. 2005). The ultrasensitive transition would minimize the range of uncertainty between leaky and legitimate messages. Decisive on-off regulation of gene expression is necessary in differentiation and in the continual reinforcement of cell/tissue identity throughout the life of the animal.

Methods

Reporter plasmid construction: Fluorescent reporters were cloned into pTRE-Tight-BI (Clontech).
NLS sequences (ATGGGCCCTAAAAAGAAGCGTAAAGTC) were appended to the N-terminus of the eYFP and mCherry ORFs (Clontech) by PCR. The NLS-eYFP was inserted with EcoRI and NdeI. The NLS-mCherry was inserted with BamHI and ClaI. Regulatory elements were placed into the eYFP 3' UTR with NdeI and XbaI; they were placed into the mCherry 3' UTR with ClaI and EcoRV. The N = 1 bulged miR-20 binding site (TACCTGCACTCGCGCACTTTA) was appended by PCR. N = 4 and N = 7 miR-20 sites, separated by CCGG spacers, were PCR-amplified from miR-20 sponge constructs (Ebert et al. 2007). All constructs were sequence-confirmed. HMGA2 wild-type and seed-mutant 3' UTRs (Mayr et al. 2007) were a gift from Christine Mayr (David Bartel lab). The SLC6A1 3' UTR fragment (nt 703-2041) was PCR-amplified from human genomic DNA.

Generation of stable lines: Reporter plasmids were linearized with AseI and cotransfected at a 20:1 ratio with linear puromycin marker (Clontech). Transfected cells were selected in 2.5 µg/ml puromycin with 200 µg/ml G418. Individual eYFP-positive colonies were isolated, grown, and sorted for eYFP-positivity upon dox induction (MoFlo instrument, DAKO-Cytomation).

Fluorescence microscopy: Cells were plated on glass-bottomed Nunc chambers (#1), induced with dox for 4 days, and imaged in a Nikon TEI-2000 inverted fluorescence microscope with a Princeton Instruments Pixis back-cooled CCD camera. Images were processed using custom software in MATLAB. Briefly, following subtraction of camera background and any cellular autofluorescence, pixel values in both eYFP and mCherry channels corresponding to cells expressing the construct were extracted. The single-cell data were then binned along the eYFP axis. Figure 1d reports the result of this binning procedure; the error bars are the standard errors of the mean within each bin.
Transient transfection: Tet-On HeLa cells (Clontech) below passage 10 were plated in media containing 200 µg/ml G418 (Gibco) and 1 µg/ml doxycycline (Sigma) in 12-well dishes the day before transfection. Reporter plasmids were diluted 1:50 in pUC18b carrier plasmid (Qiagen HiSpeed maxipreps) and mixed with DreamFect Gold (Oz Biosciences), 8 µl reagent and 2 µg DNA per well. miR-20a, let-7b, and miR-218 mimics (Dharmacon) were cotransfected at the indicated concentrations. For U6 sponge assays, reporter plasmids were diluted 1:50 in sponge plasmid. Media was changed 24 hr post-transfection. Assays were performed 48 hr post-transfection. Reporter transfections were also performed with Lipofectamine 2000 (Invitrogen) with the same results.

Flow cytometry: Cells were run on an LSRII analyzer (Becton Dickinson) with FACSDiva software. The raw FACS data were analyzed with FlowJo to gate cells according to their forward (FSC-A) and side (SSC-A) scatter profiles; specifically, we chose cells near the peak of the (FSC-A, SSC-A) distribution. Untransfected cells were used to characterize the cellular autofluorescence in the LSRII analyzer, from which we obtained the mean and standard deviation of the autofluorescence distribution. From each cell's eYFP and mCherry fluorescence values we subtracted the mean autofluorescence plus twice the standard deviation. Following background subtraction, cells with eYFP fluorescence levels less than 0 (i.e. indistinguishable from background) were excluded from further analysis and mCherry fluorescence levels less than 0 were set equal to 0. The single-cell data were then binned in the same manner as described above.

Fluorescence-activated cell sorting: Cells transfected with the N = 0 or N = 7 reporter were sorted 48 hr post-transfection into low and high fractions using a MoFlo high-speed sorting instrument (DAKO-Cytomation). Cell pellets were washed and snap-frozen before RNA isolation.
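The flow cytometry background correction and eYFP binning described above can be sketched as follows. This is not the custom analysis software used for the chapter; it is a minimal illustration that assumes log-spaced eYFP bins, a population standard deviation for the autofluorescence, and hypothetical function names. Cells at exactly zero corrected eYFP are also dropped here because the bins are logarithmic, which is a choice of this sketch rather than something the Methods specify.

```python
import numpy as np

def background_subtract(signal, autofluorescence):
    """Subtract the autofluorescence mean plus twice its standard deviation."""
    auto = np.asarray(autofluorescence, dtype=float)
    # np.std uses the population form (ddof=0); the text does not say which was used.
    return np.asarray(signal, dtype=float) - (auto.mean() + 2.0 * auto.std())

def transfer_function(eyfp, mcherry, eyfp_auto, mcherry_auto, n_bins=20):
    """Mean mCherry (and its SEM) per eYFP bin after background correction."""
    y = background_subtract(eyfp, eyfp_auto)
    c = background_subtract(mcherry, mcherry_auto)
    keep = y > 0                        # drop cells indistinguishable from background
    y = y[keep]
    c = np.clip(c[keep], 0.0, None)     # negative mCherry values are set to 0
    # Assumed binning scheme: log-spaced edges spanning the observed eYFP range.
    edges = np.logspace(np.log10(y.min()), np.log10(y.max()), n_bins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
    mean = np.full(n_bins, np.nan)
    sem = np.full(n_bins, np.nan)
    for k in range(n_bins):
        values = c[idx == k]
        if values.size > 0:
            mean[k] = values.mean()
        if values.size > 1:
            sem[k] = values.std(ddof=1) / np.sqrt(values.size)
    return centers, mean, sem
```

Calling `transfer_function` with per-cell eYFP and mCherry arrays plus the matching arrays from untransfected cells returns bin centers, the mean mCherry per bin, and its standard error; empty bins are reported as NaN rather than being silently dropped.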
RT-PCR: Total RNA was harvested using RNeasy Micro Plus kit with the protocol modified for inclusion of small RNAs (Qiagen). RNA was treated with DNaseI (Ambion) and reverse-transcribed with oligo-dT primer using MMLV RTase (Ambion). qPCR for mCherry and eYFP was performed in triplicate reactions using SYBRGreen mix (Applied Biosystems), run on an Applied Biosystems 7500 Real-Time PCR instrument. Single-stranded DNA standards spiked into untransfected cell cDNAs were used for estimation of mCherry mRNAs per cell. miR-20 was measured with miScript RT-PCR assay (Qiagen) in quadruplicate reactions using miR-31 and snoRNA as controls.

Small RNA Northern blot: Total RNA was extracted from transfected cells with TRIzol (Invitrogen). 24 µg of total RNA was run on a 12% polyacrylamide gel (UreaGel system, National Diagnostics), with miR-20 mimic as a standard, spiked into yeast sheared total RNA (Ambion). The blot was probed for miR-20a and tRNAgn as a loading control. Quantitation of bands was performed with ImageJ.

mES cell luciferase assays: Reporters were constructed by insertion of two bulged binding sites into the 3' UTR of CMV Renilla luciferase. Cells were transfected in triplicate in 24-well plates with 2 µl Lipofectamine 2000 (Invitrogen), 0.01 µg of CMV-Renilla plasmid, 0.1 µg of pGL3 (Promega), and 0.69 µg of pWS (carrier plasmid). Cells were lysed and assayed 24 hr post-transfection by Dual Luciferase reporter assay (Promega) using a Glomax 20/20 luminometer (Promega).

References

Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of microRNAs on protein output. Nature 455, 64-71 (2008).
Bagga S, Bracht J, Hunter S, Massirer K, Holtz J, Eachus R, Pasquinelli AE. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553-563 (2005).
Bartel DP, Chen CZ. Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs. Nat. Rev. Genet. 5, 396-400 (2004).
Bernstein E, Kim SY, Carmell MA, Murchison EP, Alcorn H, Li MZ, Mills AA, Elledge SJ, Anderson KV, Hannon GJ. Dicer is essential for mouse development. Nat. Genet. 35, 215-217 (2003).
Buchler N, Louis M. Molecular titration and ultrasensitivity in regulatory networks. J. Mol. Biol. 384, 1106-1119 (2008).
Calabrese JM, Seila AC, Yeo GW, Sharp PA. RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc. Natl Acad. Sci. USA 104, 18097-18102 (2007).
Cimmino A, Calin GA, Fabbri M, Iorio MV, Ferracin M, Shimizu M, Wojcik SE, Aqeilan RI, Zupo S, Dono M, Rassenti L, Alder H, Volinia S, Liu CG, Kipps TJ, Negrini M, Croce CM. miR-15 and miR-16 induce apoptosis by targeting Bcl2. Proc. Natl Acad. Sci. USA 102, 13944-13949 (2005).
Ebert MS, Neilson JR, Sharp PA. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Meth. 4, 721-726 (2007).
Edbauer D, Neilson JR, Foster KA, Wang CF, Seeburg DP, Batterton MN, Tada T, Dolan BM, Sharp PA, Sheng M. Regulation of Synaptic Structure and Function by FMRP-Associated MicroRNAs miR-125b and miR-132. Neuron 65, 373-384 (2010).
Elf J, Paulsson J, Berg OG, Ehrenberg M. Near-critical phenomena in intracellular metabolite pools. Biophys. J. 84, 154-170 (2003).
Esau C, Kang X, Peralta E, Hanson E, Marcusson EG, Ravichandran LV, Sun Y, Koo S, Perera RJ, Jain R, Dean NM, Freier SM, Bennett CF, Lollo B, Griffey R. MicroRNA-143 regulates adipocyte differentiation. J. Biol. Chem. 279, 52361-52365 (2004).
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP. The widespread impact of mammalian microRNAs on mRNA repression and evolution. Science 310, 1817-1821 (2005).
Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105 (2009).
Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N, Degnan BM, Rokhsar DS, Bartel DP. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals.
Nature 455, 1193-1197 (2008).
Levine E, Zhang Z, Kuhlman T, Hwa T. Quantitative characteristics of gene regulation by small RNA. PLoS Biol. 5, e229 (2007).
Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).
Li X, Carthew RW. A microRNA mediates EGF receptor signaling and promotes photoreceptor differentiation in the Drosophila eye. Cell 123, 1267-1277 (2005).
Li X, Cassidy JJ, Reinke CA, Fischboeck S, Carthew RW. A microRNA imparts robustness against environmental fluctuation during development. Cell 137, 273-282 (2009).
Li Y, Wang F, Lee JA, Gao FB. MicroRNA-9a ensures the precise specification of sensory organ precursors in Drosophila. Genes Dev. 20, 2793-2805 (2006).
Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP. The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008 (2003).
Loya CM, Lu CS, Van Vactor D, Fulga TA. Transgenic microRNA inhibition with spatiotemporal specificity in intact organisms. Nat. Meth. 6, 897-903 (2009).
Mayr C, Hemann MT, Bartel DP. Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576-1579 (2007).
Raj A, van Oudenaarden A. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell 135, 216-226 (2008).
Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N. Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63 (2008).
Sluijter JP, van Mil A, van Vliet P, Metz CH, Liu J, Doevendans PA, Goumans MJ. MicroRNA-1 and -499 regulate differentiation and proliferation in human-derived cardiomyocyte progenitor cells. Arterioscler. Thromb. Vasc. Biol. 30, 859-868 (2010).
Starczynowski DT, Kuchenbauer F, Argiropoulos B, Sung S, Morin R, Muranyi A, Hirst M, Hogge D, Marra M, Wells RA, Buckstein R, Lam W, Humphries RK, Karsan A.
Identification of miR-145 and miR-146a as mediators of the 5q- syndrome phenotype. Nat. Med. 16, 49-58 (2010).
Stark A, Brennecke J, Bushati N, Russell RB, Cohen SM. Animal microRNAs confer robustness to gene expression and have a significant impact on 3' UTR evolution. Cell 123, 1133-1146 (2005).
Tsang J, Zhu J, van Oudenaarden A. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol. Cell 26, 753-767 (2007).
Yi R, Poy MN, Stoffel M, Fuchs E. A skin microRNA promotes differentiation by repressing 'stemness'. Nature 452, 225-229 (2008).

Acknowledgments

This work was supported by an NIH Director's Pioneer Award to A.v.O. (1DP1OD003936); and by United States Public Health Service grants R01-CA133404 from the National Institutes of Health, P01-CA42063 from the National Cancer Institute and partially by Cancer Center Support (core) grant P30-CA14051 from the National Cancer Institute (to P.A.S.). M.S.E. was supported by a HHMI Predoctoral Fellowship and a Paul and Cleo Schimmel Scholarship. G.X.Z. and J.S.T. were partially supported by Natural Sciences and Engineering Research Council of Canada Post Graduate Scholarships. We thank Gregor Neuert for help with cloning the reporter genes, Koch Institute flow cytometry staff for training and cell sorting, and David Bartel for helpful discussions.

Author Contributions

M.S.E., J.S.T., P.A.S. and A.v.O. conceived the project. M.S.E., S.M. and G.X.Z. performed the experiments. S.M. and M.S.E. analyzed the data. S.M. performed the modeling. S.M., M.S.E., A.v.O. and P.A.S. interpreted the results and wrote the paper.

Figures

Figure 1: Quantitative fluorescence microscopy reveals microRNA-mediated gene expression threshold. a. The two-color fluorescent reporter construct consists of a bidirectional Tet promoter that coregulates the enhanced yellow fluorescent protein (eYFP) and mCherry.
Each fluorescent protein is tagged with a nuclear localization sequence (NLS) to aid in image analysis. The 3' UTR of the mCherry gene is engineered to contain N binding sites for the microRNA miR-20. b. Sample fluorescence microscopy data from representative single cells stably expressing eYFP and mCherry both in the presence and absence of regulation of mCherry by miR-20. The cells are arranged according to eYFP intensity. c. Transfer function relating eYFP to mCherry generated by binning according to eYFP intensity and plotting the mean mCherry in each bin (see Supplementary Fig. 1).
[Figure 1 graphic not reproduced. Recoverable labels: PTRE-Tight promoter; N miR-20 binding site(s) in the mCherry 3' UTR, shown for N = 0 and N = 1; binding-site sequence TACCTGCACTCGCGCACTTTA; panel c x-axis eYFP (a.u.), 0 to 100.]
Figure 2: Biochemical model of microRNA-mediated gene regulation. a. The model describes the steady state level of mRNA free to be translated (r), which we experimentally observe as the mCherry signal, as a function of transcriptional activity (runtargeted), which we experimentally observe as eYFP. miRNA and mRNA bind with rate kon, unbind with rate koff, and result in mRNA decay, but not miRNA decay, with rate γR*. b. Steady state solutions for r as a function of runtargeted for various values of kon. c. Steady state solutions for r as a function of runtargeted for various values of [miRNA]total. d, e. Same as b and c except depicted in log-log axes. The slope of the log-log curve is known as the logarithmic gain. Notably, thresholds in the linear representation appear as segments with logarithmic gain greater than 1 in the log-log representation. Increasing kon increases the maximum logarithmic gain, but does not change its position along the runtargeted axis, while increasing [miRNA]total increases the maximum logarithmic gain and shifts it to higher levels of runtargeted.
[Figure 2 graphic not reproduced. Recoverable labels: panel a schematic showing transcription (kR), translation, free mRNA (r), miRNA-mRNA complex (r*) and decay rate γR*; panels b-e plotted against runtargeted, with log10(runtargeted) on the log-log axes.]
Figure 3: Modulating the ultrasensitive response. a. Log-log transfer functions for N = 0, 1, 4, and 7. Additionally, we can abolish the ultrasensitive response by using a miR-20 binding site that is perfectly complementary to miR-20. b. Ratio of N = 0 transfer function to N = 1, 4, and 7 transfer functions, depicting the fold repression as a function of eYFP expression. c, d. Effects of titrating defined amounts of miR-20 mimic siRNA on the transfer function for N = 4 (c) and N = 7 (d). e, f. Following simultaneous fitting of all transfer function data to the quantitative model, the fitting parameter θ, proportional to the total amount of active miR-20 in the cell, is plotted against the amount of miR-20 mimic transfected (e), and 1/λ, proportional to the rate of mCherry-miR-20 association, is plotted against N (f).
[Figure 3 graphic not reproduced. Recoverable labels: legends for N values and for miR-20 mimic concentrations (nM); axes eYFP and log10(eYFP); panel e x-axis [miRNA] transfected (nM), 0 to 30. Mukherji et al. - Fig. 3]

Supplementary Information
Figure S1: Data processing. a. Each cell's raw eYFP and mCherry intensities, either from fluorescence microscopy (not shown) or flow cytometry (shown here), are plotted. b. Then the background, autofluorescent levels of eYFP and mCherry are subtracted. The background-corrected correlation data are then binned according to eYFP levels, and for each eYFP bin its mean mCherry signal is calculated; this binned curve is depicted in c.
[Figure S1 graphic not reproduced. Recoverable labels: processing steps "subtract autofluorescent background" and "calculate mean mCherry in each eYFP bin"; x-axis log10(eYFP). Mukherji et al. - Fig. S1]
Figure S2: miR-20 expression in Tet-On HeLa cells. a. Absolute miR-20 expression measured by northern blot.
Total RNA from Tet-On HeLa cells transfected with various reporter constructs was probed for miR-20 expression compared to a standard curve of miR-20 mimic spiked into yeast RNA. tRNA-Gln serves as a loading control. b. Relative miR-20 expression above and below the threshold measured by RT-PCR. Cells transfected with the N = 7 target reporter or the N = 0 control reporter were sorted into low and high fractions. Total RNA was assayed for miR-20 and normalized to miR-31 as a loading control. Bar height and error bars represent the average relative normalized miR-20 value in the high fraction compared to the low fraction and the s.e.m. of three RT-PCR assays.
[Figure S2 graphic not reproduced. Recoverable labels: panel a northern blot with copies/cell standard curve, 30 nt and 20 nt size markers, miR-20a and tRNA-Gln bands; panel b bars for N = 0 and N = 7. Mukherji et al. - Fig. S2]
Figure S3: Dye swap control. The binding site region from the N = 7 reporter was fused onto the 3' UTR of eYFP instead of mCherry. Cells transfected with this construct were assayed by flow cytometry at 48 hr post-transfection.
[Figure S3 graphic not reproduced. Recoverable labels: N = 7 curve with a slope = 1 reference line; axis log10(eYFP).]
Figure S4: Average fold repression as a function of N. Using the flow cytometry data, we compute the ratio of the mean eYFP level to the mean mCherry level for N = 1, 4, and 7. We then normalize this ratio by the mean eYFP to mean mCherry ratio for N = 0; we refer to this normalized ratio as the fold repression. Error bars are estimated by bootstrapping.
[Figure S4 graphic not reproduced. Recoverable labels: bars for N = 1, N = 4 and N = 7.]
Figure S5: Inhibition of endogenous miR-20 using miRNA sponges. Reporter plasmids were cotransfected with Pol III-driven sponges containing seven CXCR4 control sites or seven bulged miR-20 binding sites. Samples were assayed by flow cytometry at 48 hr post-transfection. a. N = 0 reporter. b. N = 1 reporter. c. N = 1 perfect reporter. d. N = 4 reporter. e. N = 7 reporter.
[Figure S5 graphic not reproduced. Recoverable labels: each panel compares "+ control sponge" with "+ miR-20 sponge"; x-axis log10(eYFP). Mukherji et al. - Fig. S5]
Figure S6: Ultrasensitivity in endogenous 3' UTRs. a. The 3' UTR of HMGA2 or a version with the seven let-7 seed matches mutated was fused to mCherry. The reporters were cotransfected with varying concentrations of let-7b mimic. Cells were assayed by flow cytometry 48 hr post-transfection. b. The 3' UTR of SLC6A1, which contains three seed matches for miR-218, was fused to mCherry. The reporter was transfected with or without miR-218 mimic. Cells were assayed by flow cytometry 48 hr post-transfection.
[Figure S6 graphic not reproduced. Recoverable labels: panel a legend, mutant HMGA2 3' UTR, HMGA2 3' UTR, and HMGA2 3' UTR + 10 nM, 31 nM or 100 nM let-7 mimic; panel b legend, SLC6A1 3' UTR with or without 30 nM miR-218 mimic; x-axis log10(eYFP). Mukherji et al. - Fig. S6]
Figure S7: Fold repression as a function of microRNA abundance. a. Schematic depicting dual luciferase assay used to measure fold repression in mES cells. The 3' UTR of Renilla luciferase is re-engineered for each measurement to contain two binding sites for different miRNAs. b. Fold repression as a function of miRNA concentration in copies per cell.
[Figure S7 graphic not reproduced. Recoverable labels: panel a, Renilla luciferase (R-luc) carrying N = 2 bulged miRNA sites or CXCR4 control sites in its 3' UTR, with firefly luciferase (F-luc) as the loading control; expression of the construct with miRNA sites is measured relative to the construct with CXCR4 control sites, and fold repression is defined as a ratio of two relative expression values; panel b x-axis [miRNA] x 10^3 per cell, 0 to 14. Mukherji et al. - Fig. S7]
Figure S8: mRNA quantitation above and below the threshold. a.
Cells transfected with the N = 7 target reporter were sorted into low and high fractions separated by the ultrasensitive transition. Plots are in log-log scale. b. Corresponding low and high fractions were collected from cells transfected with the N = 0 control reporter. c. RT-PCR from the N = 0 control reporter's sorted fractions. The absolute mCherry mRNA levels were estimated by making a standard curve using a DNA oligo spanning the mCherry cDNA's amplicon, spiked into cDNA from untransfected cells. Shown are the average mCherry mRNAs per cell +/- s.e.m. from three RT-PCR assays. The threshold for the N = 7 target reporter is represented by the average mCherry mRNAs per cell present in the corresponding low fraction of the untargeted control reporter. d. mRNA knockdown above and below the threshold. The relative mCherry mRNA expression in the N = 0 and N = 7 low and high fractions was calculated by normalizing the mCherry RT-PCR signal to the eYFP RT-PCR signal in each fraction. Bar height and error bars indicate the average relative mCherry to eYFP value and s.e.m. of four RT-PCR assays.
[Figure S8 graphic not reproduced. Recoverable labels: panels a and b, flow cytometry sorting gates; panel c table of mCherry mRNAs per cell, N = 0 low: 56 +/- 36, N = 0 high: 1066 +/- 472; panel d bars for N = 0 low, N = 7 low, N = 0 high and N = 7 high. Mukherji et al. - Fig. S8]

Molecular titration model of miRNA-mediated gene regulation
In order to describe our data, we devised a simple mathematical model of the biochemistry of miRNA-mediated gene regulation. The model is largely similar to models of protein-protein interactions proposed by Buchler and Louis as well as models of sRNA regulation of expression proposed by Levine et al. The model describes the time evolution of the target mRNA free of miRNA (r) and the target mRNA bound by miRNA (r*) and assumes that the turnover of miRNA is slow compared to the timescale of gene expression so that it can be held constant.
The model consists of the following set of coupled, first-order, ordinary differential equations and the conservation relation for miRNA:

\[ \frac{dr}{dt} = k_R - k_{on}\, r\, [\mathrm{miRNA}] + k_{off}\, r^{*} - \gamma_R\, r \tag{1} \]

\[ \frac{dr^{*}}{dt} = k_{on}\, r\, [\mathrm{miRNA}] - k_{off}\, r^{*} - \gamma_{R^{*}}\, r^{*} \tag{2} \]

\[ [\mathrm{miRNA}]_{total} = [\mathrm{miRNA}] + r^{*} \tag{3} \]

For the sake of simplicity, we assume that no translation can occur from the miRNA-bound target mRNA such that for the purposes of protein production it is sufficient to track only the free target mRNA (r). Solving for the steady-state level of r yields:

\[ r = \frac{1}{2}\left[ r_{untargeted} - \lambda - \theta + \sqrt{\left(r_{untargeted} - \lambda - \theta\right)^{2} + 4\lambda\, r_{untargeted}} \right] \tag{4} \]

where:

\[ r_{untargeted} = \frac{k_R}{\gamma_R}, \qquad \lambda = \frac{\gamma_{R^{*}} + k_{off}}{k_{on}}, \qquad \theta = [\mathrm{miRNA}]_{total}\, \frac{\gamma_{R^{*}}}{\gamma_R} \tag{5} \]

Just as in the Buchler and Louis and Levine et al. cases, when the dissociation constant (here denoted by λ) is small - meaning that the interaction strength is high between the miRNA and its target - it is possible to achieve a threshold-linear relationship between the free target mRNA and the total amount of mRNA (denoted by runtargeted, which in the experiments is reported by the eYFP signal). In our case, because we allow recycling of the miRNA following destruction of its bound target mRNA, the titration effect only becomes apparent when the rate at which free miRNAs are removed from the system (kon) is much larger than the rate at which they reappear in the system, which itself consists of two parts: unbinding of the miRNA from its target (koff) and destruction of the target (γR*). In the most extreme case, for example, where kon >> koff + γR* such that λ -> 0 one obtains:

\[ r = \frac{1}{2}\left[ r_{untargeted} - \theta + \sqrt{\left(r_{untargeted} - \theta\right)^{2}} \right] \tag{6} \]

\[ r = \begin{cases} 0 & \text{if } r_{untargeted} < \theta \\ r_{untargeted} - \theta & \text{if } r_{untargeted} > \theta \end{cases} \tag{7} \]

In this limit, we see that the constant θ sets the level of expression at which the threshold takes place.

Chapter 5. Roles for microRNAs in conferring robustness to biological processes
This chapter was written by Margaret S. Ebert and edited by Phillip A. Sharp.
Biological systems use a variety of mechanisms to maintain their functions in the face of environmental and genetic perturbations. Increasing evidence suggests that, among their roles as post-transcriptional repressors of gene expression, microRNAs (miRNAs) help to confer robustness to gene expression by reinforcing transcriptional programs and attenuating leaky transcripts, and they may in some contexts help suppress random fluctuations in transcript copy number. These activities have important consequences for normal development and physiology, disease, and evolution. Here we will discuss examples and principles of miRNAs acting in networks that contribute to robustness in several animal systems.

Introduction
MicroRNAs (miRNAs) are ~20-24-nucleotide-long hairpin-derived RNAs that post-transcriptionally repress the expression of target genes. As a class, miRNAs constitute about 1-2% of the genes in worms, flies, and mammals (Bartel 2009). About 60% of protein-coding genes are computationally predicted as targets based on conserved basepairing between the 3' UTR and the 5' region of the miRNA termed the seed (Friedman et al. 2009). The diversity of miRNA expression increases over the course of embryonic development (Thomson et al. 2006), and the diversity of the miRNA repertoire in animal genomes has increased with increasing organismal complexity (Lee et al. 2007, Heimberg et al. 2008). While many miRNAs and their target binding sites are deeply conserved, suggesting important function, many of these interactions seem to produce only very subtle repression (~2-fold), and many miRNAs can be knocked out without creating any obvious phenotype (Leaman et al. 2005, Miska et al. 2007). As more miRNA-target relationships are validated and more phenotypes are described, a view is emerging that miRNAs evolved to play the role not of the primary decision-maker but rather of the reinforcer, one that sharpens transitions and entrenches identities.
Robustness refers to a system's ability to maintain its function in spite of internal or external perturbations (Kitano 2004). In biology, such systems can be considered at several levels: a biochemical pathway regulating the expression of a protein, a cluster of cells undergoing differentiation, or an organism responding to variable nutrient sources, for example. All of these biological systems, like sophisticated man-made systems, use controls such as feedback loops and back-up components to be able to carry on reliably when conditions change or one component fails. It is clear that animals living in the wild face unpredictable environments such as fluctuations in temperature or food availability, though we may lose sight of this aspect of biology when we grow our model organisms under standard, consistent laboratory conditions. The production of macromolecules within cells over time and between different cells of the same type also suffers from inherent noisiness that must be managed for biochemical pathways to function robustly. These requirements are especially relevant to the development and physiology of multicellular organisms with complex body plans. Not only must embryonic cells choose many different fates, but they must also remember their choice to maintain their cell type identity in the adult. From the cell's perspective, environment may mean one of many microenvironments within the organism, such as different regions along a morphogen gradient during embryonic development, which must be sensed and interpreted for normal morphogenesis. Why might miRNAs, in addition to other regulators of gene expression, have been selected for making biological systems more robust? As post-transcriptional regulators, miRNAs can intervene late in the pathway of gene expression to counteract variation from the upstream processes of transcription and splicing. 
Their mechanisms of target repression may also be specifically suited for various types of regulation: by accelerating mRNA degradation, they swiftly and irreversibly reduce target protein production; by inhibiting translation, they allow for temporary silencing followed by restoration of the target message to a translationally competent state (Bhattacharyya et al. 2006). As titrating molecules for mRNAs, miRNAs partition among thousands of targets in equilibrium association to stabilize protein expression. In the sections that follow, we will discuss the roles miRNAs play in gene regulatory networks, and specifically in dampening leaky transcripts and buffering the effects of mRNA fluctuations. These sections will shed light on the contributions of miRNAs during periods of stress and pathological states.

miRNA-target architectures that increase robustness
miRNAs participate in several stereotyped network motifs that are enriched in nature and known to act in making systems robust (Milo et al. 2002). One is the simple negative feedback loop, in which component A activates component B and component B inhibits component A. This motif contributes to the homeostasis of component A (and component B). For example, methyl CpG-binding protein 2 (MeCP2) acts through BDNF to induce the neuronal miRNA miR-132, which feeds back to repress MeCP2 (Klein et al. 2007; Figure 1A). Homeostasis in the level of MeCP2 expression is important, as over- or under-expression of this regulator causes neurodevelopmental defects. One of the most common feedback motifs known to involve miRNAs is the mutual negative feedback loop, in which components A and B inhibit each other (Chang et al. 2004, Bracken et al. 2008, Burk et al. 2008, Juan et al. 2009, Kefas et al. 2009, Roush and Slack 2009, Xu et al. 2009, Zhao et al. 2009). Typically this motif helps to establish bistability between a precursor cell type and a terminally differentiated cell fate.
For example, the transcription factor NFI-A suppresses expression of the primary miR-223 transcript in undifferentiated myeloid precursors, and upon retinoic acid-induced differentiation into granulocytes, miR-223 accumulates and represses NFI-A, thereby helping to prevent a return to the precursor state (Fazi et al. 2005; Figure 1B). Positive feedback loops, in which component A and component B activate each other, also contribute to switches in development. For example, the "2 degrees" vulval precursor cell fate is established in the worm when LIN-12 directly activates transcription of miR-61, which then represses vav-1, a negative regulator of LIN-12 activity (Yoo and Greenwald, 2005; Figure 1C). In this case the indirect link may build additional control into the lineage decision, as LIN-12 expression must be sustained enough for miR-61 to accumulate to sufficiently lower the level of Vav-1 protein in order to allow for adequate LIN-12 activity. Feedforward loops are another set of common motifs that involve miRNAs and are consistent with conferring robustness. In a coherent feedforward loop, component A inhibits (or activates) component B and activates (or inhibits, respectively) component C, which is another repressor of component B. This architecture can increase the fidelity of inhibition of the downstream component by acting on it redundantly; that is, a transient loss of component A can be compensated for by the lingering presence of component C. As with positive and negative feedback loops, it is often used in lineage commitment. For example, CCAAT enhancer binding protein alpha (C/EBPalpha) inhibits transcription of the cell cycle regulator E2F1 during granulopoiesis (Pulikkan et al. 2010). C/EBPalpha also induces miR-223, which post-transcriptionally represses E2F1. As is often the case, this feedforward loop is interlocked with a feedback loop: E2F1 inhibits production of miR-223 (Figure 1D).
This example illustrates several principles of miRNA networks in development: (1) In these loops, the miRNA often targets a transcriptional regulator. (2) Combining feedforward with feedback motifs may allow cells to distinguish between transient fluctuations (which should be counteracted) and permanent changes (which should be enhanced or maintained). (3) There are often other network motifs involving a cell type-specific miRNA that reinforce the same cell fate decision, as with miR-223 and NFI-A in granulocytes (see above). Incoherent feedforward loops also comprise three components, but instead of reinforcing a signal, the added component sends a contradictory signal. Component A activates component B and simultaneously activates component C, which is a repressor of component B. This motif can play several roles depending on the relative rates of production and turnover of its components. Where component C is slower to accumulate or decay compared to the downstream component B, the feedforward can create a pulse (or, where component A is an inhibitor of both B and C, a delay) in the expression of component B. Where both component C and B respond on the same timescale, the feedforward can buffer the expression of component B against fluctuations in component A. Overall, it fine-tunes the target protein level below the level set by transcriptional control. One example of an incoherent feedforward loop is c-myc activating transcription of both E2F1 and the 17~92 miRNA cluster on chromosome 13 (O'Donnell et al. 2005). The constituent seed family members miR-17-5p and -20 directly repress E2F1 (Figure 1E). There are also a few cases of a pure incoherent feedforward loop wherein a miRNA expressed from an intron of a host gene targets that very gene, e.g. miR-26a and its host/target CTDSP2 (Tsang et al. 2007).
Finally, the incoherent feedforward motif can act indirectly to disambiguate a signaling pathway whose components are produced in different cells: the chemorepellent axon guidance ligand Slit is the host gene for miR-218, and miR-218 targets the ROBO receptors for Slit (Tie et al. 2010). Thus expression of Slit leads to signaling through the ROBO pathway, but in the same cell, the coexpressed miRNA represses the ROBO pathway. During neural development, this feedforward control could prevent the Slit-expressing cell from sending a repellent signal to itself or from wasting its secreted ligand on its own cell-surface ROBO receptors, thereby making the paracrine signaling more robust. In addition to miRNAs' roles in feedback and feedforward loops, it is common for a single miRNA family to coordinately regulate multiple components of a signaling pathway or protein complex (Tsang et al. 2010; see Appendix). Where the target components act coherently to stimulate or suppress the signaling output, the miRNA could act as a master regulator to enhance a decisive signal or tightly shut off signaling. For example, miR-181 regulates T cell signaling by repressing multiple phosphatases in the cascade: targets SHP-2 and PTPN22 act immediately downstream of the T cell receptor complex, with the latter inhibiting Lck kinase; targets DUSP6 and DUSP5 act late in the pathway to inhibit phospho-Erk in the cytoplasm and in the nucleus, respectively (Li et al. 2007; Figure 1F). In this way, miR-181 helps to set different activation thresholds in different stages of T cell development. In double positive T cells, the miR-181 level is relatively high, which heightens sensitivity even to low-affinity self antigens and therefore facilitates positive and negative selection. In more mature differentiated T cells, the miR-181 level is downregulated and the signaling pathway is only responsive to high-affinity foreign antigens.
miRNA-target networks are highly connected not only due to coordinate regulation of interacting targets by a single miRNA, but also due to the co-targeting of common genes by combinations of miRNAs (Tsang et al. 2010). Different miRNA seeds regulate overlapping sets of target genes, with each miRNA backing up the other for a given shared target. There is also redundancy in the miRNAs themselves: many have functionally redundant seed family members expressed from the same polycistronic cluster or from distant genomic loci (Kim 2005).

miRNAs attenuate leaky transcripts
The regulated expression of miRNAs can provide robustness to developmental processes. Global gene expression analysis in fly, fish, and mouse has shown that miRNAs and their targets tend to have anticorrelated RNA expression across tissues, especially in neighboring tissues derived from common progenitors (Stark et al. 2005, Farh et al. 2005, Sood et al. 2006, Tsang et al. 2007, Shkumatava et al. 2009). This suggests that miRNAs can act to reinforce the transcriptional gene expression program by repressing leaky transcripts. For example, in the Drosophila embryo, neurectodermal progenitors express miR-124 as they differentiate into neurons. Neuronal genes that are activated during this transition tend not to have miR-124 sites whereas genes expressed in epidermal tissues that are also ectodermal derivatives are enriched for miR-124 sites (Stark et al. 2005). Thus expression of miR-124 stabilizes the neuronal transition. A reciprocal pattern holds for the ectoderm-specific miR-9a. These miRNA-target relationships likely involve many instances of the coherent feedforward loop described above, with tissue-specific transcription factors in the role of component A. Even amongst closely related cell types there can be an inverse relationship: the miR-124 target repo is expressed only in lateral glia of the central nervous system, cells shown to apparently lack miR-124 expression (Stark et al. 2005).
Intriguingly, this anticorrelative pattern may apply not only to transcription but also to alternative splicing: a non-muscle-specific isoform of tropomyosin-1 is targeted by the muscle-specific miRNA miR-1 whereas the three muscle-expressed isoforms lack miR-1 sites, a trend that is conserved in vertebrates (Stark et al. 2005). Thus a mis-splicing event that generated the cytoplasmic gut/brain/epidermis isoform in muscle cells would be corrected by miRNA-mediated repression. For the anticorrelated expression trend to have arisen over the course of evolution, there must have been selective pressure in the form of biological variation or noise at the level of transcription or alternative splicing of functionally important genes. Leakiness may be a necessary tradeoff for cells that are differentiated from plastic progenitors. In the context where miRNAs continuously reinforce cell type identity once differentiation is initiated, miRNAs would serve best as long-lived molecules. Indeed some miRNAs show extreme stability compared to mRNAs, such as the heart muscle-specific miR-208, which has a half-life of about 12 days in vivo (van Rooij et al. 2007). Controlling leaky transcripts may be especially important when transient spikes in mRNA level are enhanced by positive feedback loops. This appears to be the function of a miRNA that acts as a gatekeeper for sensory organ precursor (SOP) determination in flies. miR-9a targets the proneural transcription factor Senseless (Li et al. 2006). Normally, only one cell in a proneural cluster becomes an SOP; it arises when a transient and apparently random increase in Senseless protein feeds back positively through other proneural genes (Figure 2). Lateral inhibition via Delta-Notch signaling maintains the neighboring cells in a non-SOP fate by repressing Senseless and its downstream proneural genes.
miR-9a is expressed in all the cells of the neuroectodermal clusters and then after differentiation only in the non-SOPs, keeping Senseless expression low. Deletion of miR-9a resulted in the appearance of variable numbers of extra sensory bristles in about 40% of mutant animals. Given the incomplete penetrance of the phenotype and the randomness of the development of ectopic sensory organs, it seems the miRNA is required to suppress some of the random spikes in Senseless protein level. By setting a threshold above which the proneural transcription factor is allowed to trigger the feedback, miR-9a makes the cell fate switch less error-prone (Cohen et al. 2006).

miRNAs set gene expression thresholds for their targets
The ability of a miRNA to set an expression threshold for its target genes was recently generalized. Mukherji et al. assayed miRNA activity in single cells using a bidirectional tet-responsive promoter that drives, in one direction, an mCherry reporter targeted by endogenous miRNA and, in the other direction, an eYFP reporter that served as a proxy for target transcription (Mukherji et al. 2010). When the genes were induced over a wide range of target mRNA production, the strength of miRNA-mediated repression varied dramatically: in the low target input regime the mCherry reporter was almost completely repressed; above a certain threshold of target input, repression was very mild, and at very high target levels, essentially null (Figure 3). A simple mathematical model predicted the threshold effect from the titration-like nature of miRNA-target interaction. The threshold was modulated by changing the miRNA concentration and the number and strength of the binding sites in the target's 3' UTR. This RNA titration system could allow cells expressing a certain set of miRNAs to discriminate between low-level, transient, leaky transcripts, and legitimate transcripts expressed at higher levels and for sustained periods.
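The threshold behavior summarized above can be sketched numerically. The following is a minimal Python sketch, assuming the steady-state solution of the titration model given in the supplementary model of the preceding chapter (Equation 4), in which r0 is the untargeted mRNA level, lam stands for (γR* + koff)/kon, and theta is proportional to the total miRNA concentration. The function names, the parameter values, and the finite-difference estimate of the logarithmic gain are illustrative assumptions, not fitted values from the experiments.

```python
import math


def free_mrna(r0, lam, theta):
    """Steady-state free target mRNA r for untargeted level r0 (requires lam > 0)."""
    b = r0 - lam - theta
    s = math.sqrt(b * b + 4.0 * lam * r0)
    if b >= 0.0:
        return 0.5 * (b + s)
    # Algebraically identical form that avoids cancellation when b < 0.
    return 2.0 * lam * r0 / (s - b)


def log_gain(r0, lam, theta, eps=1e-4):
    """Logarithmic gain d(log r)/d(log r0), by central finite difference."""
    hi = free_mrna(r0 * (1.0 + eps), lam, theta)
    lo = free_mrna(r0 * (1.0 - eps), lam, theta)
    return (math.log(hi) - math.log(lo)) / (math.log1p(eps) - math.log1p(-eps))


def peak_gain(lam, theta, decades=5, points_per_decade=50):
    """Scan r0 on a log grid; return (r0 at the maximum gain, maximum gain)."""
    grid = [10.0 ** (i / points_per_decade)
            for i in range(decades * points_per_decade + 1)]
    return max(((r0, log_gain(r0, lam, theta)) for r0 in grid),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical miRNA levels in arbitrary units, chosen only for illustration.
    for theta in (50.0, 200.0):
        r0_peak, gain = peak_gain(lam=1.0, theta=theta)
        print(f"theta={theta:g}: max log gain {gain:.2f} near r0={r0_peak:.0f}")
```

Under these assumed parameters, raising theta moves the steepest part of the curve to higher r0, while lowering lam sharpens it, which is the qualitative dependence on miRNA concentration and binding strength described in the text.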
By post-transcriptionally silencing and inducing destruction of the sub-threshold transcripts, miRNAs contribute to cell fate decisions and the maintenance of cell/tissue identity. The use of cell type-specific combinations of miRNAs and co-targeting of multiple miRNAs per mRNA not only provides vast combinatorial control, but also allows for tuning of the threshold based on total miRNA concentration and total number of binding sites. Though most target genes have only one conserved binding site per miRNA seed family, the majority have sites for multiple miRNA families, with an average of more than four total conserved sites per 3' UTR (Friedman et al. 2009). Below the miRNA-determined threshold for target mRNA production, target protein expression is essentially switched off. Above the threshold, target protein output increases steeply in what is termed an ultrasensitive transition (Mukherji et al. 2010). Across this transition, the target is repressed to every possible degree until it is entirely derepressed. The lower the miRNA concentration, the lower the target mRNA expression at the point of escape from repression. The pool of endogenous target mRNAs also contributes to the concentration of available miRNA; during a developmental transition where a miRNA is upregulated and its pool of target genes is downregulated, its effective concentration and therefore its potency could greatly increase for a small number of functionally important targets.

miRNAs may buffer transcriptional noise
As discussed above, anticorrelated expression of miRNAs and targets is common. Perhaps surprisingly, incoherent expression of miRNAs and targets is also prevalent. Tsang et al. used expression profiles of human and mouse host genes to infer the expression of intron-embedded miRNAs and compared this large dataset to that of predicted target gene expression.
To avoid ambiguity resulting from mixed cell types in a given tissue, they also used data from homogeneous isolated neuronal cell populations. About 70% of the 60 miRNAs analyzed had a significantly higher number of targets (genes with conserved seed matches) in the top 10 percentile of correlated or anticorrelated genes, whereas at most 8% had a significantly higher number of targets in the middle 10 percentile sets (Tsang et al. 2007). Correlated and anticorrelated miRNA-target patterns were about equally prevalent. Co-expression across a variety of conditions implies transcriptional control by (a) common transcription factor(s). What benefit might accrue from expressing a gene and simultaneously expressing a repressor for that gene? The additional resources used to do so may pay for enhanced robustness in gene expression. Random fluctuations in protein levels arise from several sources. Intrinsic noise refers to variation arising from stochastic events including promoter binding, mRNA decay, translation, and protein degradation (Raser and O'Shea 2005). Extrinsic noise refers to variation arising from differences such as transcription factor or ribosome concentration or cell cycle stage. Both sources cause protein levels to fluctuate in a given cell over time and between clonally identical cells. The degree to which protein level fluctuates around its mean may be influenced by the rates at which transcription and translation occur. To synthesize 100 molecules of a certain protein per cell, one can imagine two extreme strategies: transcribe 10 copies of mRNA and translate 10 protein copies per mRNA, or transcribe 100 mRNAs and repress their translation so as to translate 1 protein copy per mRNA (Figure 4A). What would be the consequences of these strategies? The mean protein output would be the same but the expected variance would be substantially different (Figure 4B). Transcription occurs in stochastic bursts (Blake et al.
2006) and higher transcription rates correlate with lower noise (Paulsson 2004). Translation events amplify transcriptional noise (Paulsson 2004, Pedraza and van Oudenaarden 2005) such that noise increases linearly with the rate of translation (Ozbudak et al. 2002). By transcribing a gene at a high rate and simultaneously reducing its translation rate using miRNAs, cells should reduce fluctuations in target protein number. Specific coexpression of miRNA and target in an incoherent feedforward loop has been proposed as a way to do this (Hornstein and Shomron 2006). In prokaryotes, small non-coding regulatory RNAs (sRNAs) induce target mRNA degradation and are seen to reduce protein noise, but only when target transcription falls below a certain threshold (Levine et al. 2010). Below the target expression threshold set by miRNA, variation in target mRNA input is transmitted into disproportionately small variation in protein output (slope < 1, Figure 3); in this regime, random fluctuations in target mRNA number could be suppressed (Mukherji et al. 2010). But within the ultrasensitive transition, which corresponds to higher transcription rates, the variation in mRNA input corresponds to greater variation in protein output for miRNA-targeted genes than for non-targeted genes (slope > 1, Figure 3). Perhaps the cost of tunability within the ultrasensitive transition is increased noise in protein expression. There are other, ubiquitously active cis regulatory elements that may help buffer fluctuations by attenuating the translation of mRNAs, but that lack the tunability provided by combinations of miRNA binding sites and cell type-specific miRNA repertoires. For example, upstream open reading frames and weak noncanonical Kozak sequences reduce the efficiency of translation initiation (Calvo et al. 2009), and rare codons and secondary structures slow translation elongation.
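The two strategies for making 100 protein molecules per cell (few mRNAs translated often, or many mRNAs translated rarely) can be compared with a short calculation. The sketch below is only an illustration, not the analysis behind Figure 4. It assumes the standard two-stage model of intrinsic noise (Thattai and van Oudenaarden 2001), in which the protein Fano factor is 1 + b/(1 + eta), where b is the number of proteins made per mRNA lifetime and eta is the ratio of protein to mRNA decay rates. The function names and the value eta = 0.1 are hypothetical choices made for this example, and promoter bursting and extrinsic noise are ignored.

```python
def protein_noise(mean_protein, burst_size, eta):
    """Intrinsic noise of the two-stage (mRNA -> protein) model.

    burst_size: average proteins translated per mRNA lifetime (b).
    eta: protein decay rate divided by mRNA decay rate.
    Returns (Fano factor, squared coefficient of variation).
    """
    fano = 1.0 + burst_size / (1.0 + eta)
    cv_squared = fano / mean_protein
    return fano, cv_squared


def transcription_rate_needed(mean_protein, burst_size, protein_decay):
    """Transcription rate that gives the requested mean protein level.

    Follows from <p> = (k_m / gamma_p) * b in the same model.
    """
    return mean_protein * protein_decay / burst_size


if __name__ == "__main__":
    ETA = 0.1            # assumed: protein is 10x more stable than mRNA
    PROTEIN_DECAY = 1.0  # arbitrary time unit
    MEAN = 100           # target protein copies per cell

    # (label, proteins per mRNA) for the two extreme strategies in the text
    strategies = [("10 mRNA x 10 protein", 10), ("100 mRNA x 1 protein", 1)]
    for label, b in strategies:
        k_m = transcription_rate_needed(MEAN, b, PROTEIN_DECAY)
        fano, cv2 = protein_noise(MEAN, b, ETA)
        print(f"{label}: k_m={k_m:.0f}, Fano={fano:.2f}, CV^2={cv2:.4f}")
```

Under these assumptions both strategies give the same mean, but the strategy with strong transcription and attenuated translation has the smaller variance, which is the qualitative point of the argument above.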
Pairing strong transcription with such translation-attenuating elements may reduce protein fluctuations arising from fluctuations in mRNA copy number. In the regime where miRNAs are expected to suppress noise, coordinate targeting of multiple pathway components may be especially useful since noise can propagate through a network (Blake et al. 2003, Pedraza and van Oudenaarden 2005). Moreover, in an incoherent feedforward loop, if the miRNA has a half-life comparable to that of the target mRNA and protein - which is plausible given the wide range of observed miRNA stability (Bail et al. 2010) - then this motif could serve to partially decouple the target protein output from fluctuations in the upstream transcription factors. When the transcription factor spikes, it will induce the target gene but also increase miRNA-mediated repression; when it plummets, the drop in target transcription will be counteracted by post-transcriptional derepression. As a result, the expression level of target proteins with short half-lives would be stabilized over time. The consequences for fluctuations in the level of a protein depend on the timescale of fluctuation (the time it takes for a peak or trough to return to the mean). It is not yet clear whether incoherent feedforward regulation buffers target genes at the relevant timescale to produce physiological effects.

Some miRNA phenotypes appear upon stress

Many miRNAs show no phenotype when inhibited or knocked out in cells or animals under normal conditions (Leaman et al. 2005, Miska et al. 2007). Dicer-mutant zebrafish lacking any detectable miRNAs still develop all the major organs and differentiated cell types (Giraldez et al. 2005). If miRNAs act to confer accuracy and uniformity to developmental transitions, then loss of a miRNA may result not in catastrophic defects but rather in imprecise, variable phenotypes.
If other feedback or back-up mechanisms are in place, then the loss of robustness may only be detected by applying additional perturbations. Indeed, several miRNA knockout animals have shown losses of function only upon stress. In Drosophila, miR-7, like miR-9a, plays a role in the determination of sensory organs (Li and Carthew 2005). Loss of miR-7 had no observable impact on the development of the sensory organs under normal, uniform conditions; expression of the proneural transcription factor Atonal was also detected at wild-type level (Li et al. 2009). But when an environmental perturbation was added during larval development - fluctuating the temperature between 31°C and 18°C roughly every 90 minutes - the miR-7 mutant eyes showed abnormally low Atonal expression and abnormally high, irregular expression of the antineural transcription factor Yan. Sensory organ precursor (SOP) defects also appeared: some groups of antennal SOPs failed to develop, or developed with abnormal patterning; their cells showed low Atonal levels. These outcomes were better understood by elucidating the regulatory networks for these components. In photoreceptor determination, Yan inhibits and Pnt-P1 activates, respectively, the transcription of the miR-7 precursor through binding at an upstream enhancer element (Figure 5A). Yan also inhibits miR-7 production indirectly by repressing phyllopod, an E3 ubiquitin ligase that promotes the degradation of TTK69, a transcription factor that represses miR-7 precursor transcription. Pnt-P1 also transcriptionally inhibits Yan. If the Yan protein level drops transiently in a Yan-ON state, miR-7 can still be repressed by TTK69; since Yan is a target of miR-7, this loop sustains the expression of Yan protein. If the Pnt-P1 protein level drops transiently in a Yan-OFF state, Yan expression can still be kept low through the activity of miR-7.
By contrast, when there is a persistent decrease in Yan protein, the mutual negative feedback with miR-7 switches the cell state to miR-7 ON-Yan OFF, and the coherent feedforward loop further reinforces the sustained high expression of miR-7. Thus, in the absence of miR-7, there is no counterweight to Yan, so the switch mechanism to achieve the Yan-OFF state is impaired. miR-7 also participates in an incoherent feedforward loop in SOP determination: Atonal activates E(spl) but likewise activates miR-7, which represses E(spl) (Figure 5B). There is an interlocking negative feedback loop from E(spl) to Atonal. In wild-type flies, transient rises in Atonal protein level would be counteracted by rises in E(spl), whereas a sustained increase in Atonal would be maintained by miR-7 repressing E(spl). In the miR-7 mutants, Atonal expression is decreased due to derepression of E(spl), so cells are impaired for switching to the Atonal-ON state, and sensory organs fail to develop in normal numbers. Several other miRNA knockout phenotypes appear in response to internal stressors. In the Drosophila larva, loss of the muscle-specific miRNA miR-1 does not impair the formation or physiological function of muscle, but a dramatic phenotype appears when the larvae start feeding and their muscle cells undergo rapid post-mitotic growth (Sokol and Ambros 2005, Brennecke et al. 2005). The mutants experienced paralysis, growth arrest, and death, and showed severely deformed body wall musculature. This larval phenotype was rescued by providing a protein-free diet that blocks the normal rapid growth phase. In mice, deletion of the heart muscle-specific miRNA miR-208 has little phenotype under normal conditions but results in a failure to induce cardiac remodeling upon stress (van Rooij et al. 2007).
When the mice were treated to induce pressure overload or hypothyroidism, miR-208 activity was required in the cardiomyocytes to help upregulate betaMHC by targeting the thyroid receptor signaling pathway.

miRNAs, robustness and disease

When feedback or feedforward loops get co-opted in inappropriate contexts, they may contribute to disease. Iliopoulos et al. describe a network of feedback loops that flips an epigenetic switch in cancer. Transient activation of Src or other triggers of NF-kappaB induced stable transformation of a mammary epithelial cell line (Iliopoulos et al. 2009). NF-kappaB transcriptionally activates IL6 and inhibits let-7 family members by activating Lin28B (which prompts destruction of let-7 precursor RNAs). The ensuing drop in let-7 level derepresses IL6, a direct let-7 target, and IL6 is further activated by derepression of the let-7 target Ras. IL6 feeds back in both an autocrine and paracrine fashion to activate NF-kappaB, which further inhibits let-7, and it signals through STAT3 to promote cell growth and motility (Figure 6). In a xenograft model, inhibiting NF-kappaB, Lin28B, or IL6 suppressed tumor growth. What function does this inflammatory network serve in a healthy animal? In normal tissue, a transient inflammatory cue could signal through this pathway to induce cell growth to repair damage. The miRNA holds the positive feedbacks in check, as in the case of miR-9a in Drosophila SOP determination (see above). In cancer, where let-7 is typically down-regulated (Kumar et al. 2008, Dong et al. 2010), the positive feedbacks would go unchecked, and continuous, self-reinforcing proliferation would result. In human tumors the positive feedback loop would be made even stronger by the presence of oncogenic v-Src or Ras-V12 (Iliopoulos et al. 2009). Another example involves co-option of a developmental process in cancer.
The transcription factor ZEB1 induces the epithelial-to-mesenchymal transition, which is important for tissue remodeling during embryonic development. ZEB1 suppresses transcription of miR-200 family members, and the miR-200 family strongly represses ZEB1 (Bracken et al. 2008, Burk et al. 2008). In development, this mutual negative feedback reinforces the mesenchymal cell fate decision. Within carcinomas, some tumor cells lose miR-200 expression and switch to a mesenchymal state, which promotes their ability to metastasize (Gibbons et al. 2009). In a healthy animal, robust signaling processes keep cells behaving appropriately. In a cancer patient, the tumor might actually become robust against therapy by virtue of unhinging the usual controls and making gene expression less stable. miRNAs are globally depleted in tumors relative to their normal tissue counterparts (Lu et al. 2005) and modeling this state by knocking down components of the miRNA biogenesis pathway (Kumar et al. 2007) or by heterozygous deletion of Dicer (Kumar et al. 2009) accelerates tumor growth. In addition, 3' UTRs are globally shortened in tumors via alternative polyadenylation site choice (Sandberg et al. 2008, Mayr and Bartel 2009). The combined effect of these trends should be widespread derepression of miRNA target genes and potentially also un-buffering of gene expression. Might this increase the heterogeneity and plasticity of the tumor cell population by epigenetic dysregulation? Perhaps the tumor can be thought of as analogous to a clonal population of bacteria or yeast where noise in the population adapts them to unpredictably changing environmental conditions (Acar et al. 2008, Cagatay et al. 2009). For cancer cells, these changing conditions could include increasingly hypoxic tumor cores, new microenvironments for metastases, or on-and-off chemotherapy regimens, and the consequence of their noise-driven adaptability would be that a fraction of the cells survive almost any condition.
Implications for miRNA influence in evolution

While we speculate about the ability of miRNAs to act as buffers of gene expression, there is as yet only one well-characterized example of a general mutation buffering agent. The chaperone Hsp90 assists the folding of client proteins such that it can compensate for point mutations in the protein coding regions of client genes (Rutherford and Lindquist 1998). In doing so Hsp90 acts as a capacitor of phenotypic variation, storing cryptic genetic variation until environmental stress overwhelms Hsp90 and reveals the mutant proteins, allowing them to affect phenotypes and become substrates for selection. Do miRNAs potentiate cryptic genetic variation in their target genes? If so, the mutations would be in the promoters or transcription factors driving target gene expression or the expression of downstream genes. The ability of the miRNA to compensate for otherwise elevated target protein levels would allow such mutations to accrue without selective penalty. The emergence of non-lethal mutations that give diverse phenotypes is one requirement for evolvability (Kitano 2004). Analogously to the case of Hsp90, loss of miRNA activity due to mutations or misexpression of components in the miRNA pathway could unleash the mutated gene products for exposure to natural selection. Another way to describe miRNAs' hypothetical role in enhancing evolvability is canalization. Canalization of a trait refers to the evolved robustness conferred by entrenching mechanisms that protect against environmental or genetic perturbations (Hornstein and Shomron 2006). Where miRNAs reduce fluctuations in target gene expression and stabilize signaling decisions, they would tighten the linkage between genotype and phenotype, thereby increasing the heritability of traits (Peterson et al. 2009). The more heritable a trait is, the more efficiently it is selected.
Thus not only does evolution favor robustness, but robustness also promotes evolution (Kitano 2004).

Conclusions

Multicellular organisms must manage the tasks of development and physiology in unpredictably changing environments and with imperfect genetic and biochemical components. Random noise in gene expression must be reduced or, as in the case of some cell fate decisions, harnessed to a system control network to designate one fate or another among neighboring cells. Robustness goes beyond the job of keeping one state the same in the face of perturbations. In development, it can mean not sending a signal until the right time, and then sending it strongly and irreversibly. The addition of miRNAs to metazoan genomes over time and the diversity of miRNA repertoires among different tissues of developing animals suggest that miRNAs are involved in reinforcing developmental decisions to make organismal complexity reliable and heritable from one generation to the next.

Acknowledgments

We thank Mary Lindstrom for help making the figures, and Shankar Mukherji, Anthony Leung, and Dave Bartel for insightful comments on the manuscript.

References

Acar M, Mettetal JT, van Oudenaarden A. Stochastic switching as a survival strategy in fluctuating environments. Nat. Genet. 40, 471-475 (2008).
Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, Kiledjian M. Differential regulation of microRNA stability. RNA 16, 1032-1039 (2010).
Bar-Even A, Paulsson J, Maheshri N, Carmi M, O'Shea E, Pilpel Y, Barkai N. Noise in protein expression scales with natural protein abundance. Nat. Genet. 38, 636-643 (2006).
Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233 (2009).
Bartel DP, Chen CZ. Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs. Nat. Rev. Genet. 5, 396-400 (2004).
Bhattacharyya SN, Habermacher R, Martine U, Closs EI, Filipowicz W.
Relief of microRNA-mediated translational repression in human cells subjected to stress. Cell 125, 1111-1124 (2006).
Blake WJ, Balázsi G, Kohanski MA, Isaacs FJ, Murphy KF, Kuang Y, Cantor CR, Walt DR, Collins JJ. Phenotypic consequences of promoter-mediated transcriptional noise. Mol. Cell 24, 853-865 (2006).
Bracken CP, Gregory PA, Kolesnikoff N, Bert AG, Wang J, Shannon MF, Goodall GJ. A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res. 68, 7846-7854 (2008).
Brennecke J, Stark A, Cohen SM. Not miR-ly muscular: microRNAs and muscle development. Genes Dev. 19, 2261-2264 (2005).
Burk U, Schubert J, Wellner U, Schmalhofer O, Vincan E, Spaderna S, Brabletz T. A reciprocal repression between ZEB1 and members of the miR-200 family promotes EMT and invasion in cancer cells. EMBO Rep. 9, 582-589 (2008).
Cagatay T, Turcotte M, Elowitz MB, Garcia-Ojalvo J, Süel GM. Architecture-dependent noise discriminates functionally analogous differentiation circuits. Cell 139, 512-522 (2009).
Chang S, Johnston RJ Jr, Frøkjaer-Jensen C, Lockery S, Hobert O. MicroRNAs act sequentially and asymmetrically to control chemosensory laterality in the nematode. Nature 430, 785-789 (2004).
Calvo SE, Pagliarini DJ, Mootha VK. Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc. Natl Acad. Sci. USA 106, 7507-7512 (2009).
Cohen SM, Brennecke J, Stark A. Denoising feedback loops by thresholding--a new role for microRNAs. Genes Dev. 20, 2769-2772 (2006).
Dong Q, Meng P, Wang T, Qin W, Qin W, Wang F, Yuan J, Chen Z, Yang A, Wang H. MicroRNA let-7a inhibits proliferation of human prostate cancer cells in vitro and in vivo by targeting E2F2 and CCND2. PLoS One 5, e10147 (2010).
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution.
Science 310, 1817-1821 (2005).
Fazi F, Rosa A, Fatica A, Gelmetti V, De Marchis ML, Nervi C, Bozzoni I. A minicircuitry comprised of microRNA-223 and transcription factors NFI-A and C/EBPalpha regulates human granulopoiesis. Cell 123, 819-831 (2005).
Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105 (2009).
Gibbons DL, Lin W, Creighton CJ, Rizvi ZH, Gregory PA, Goodall GJ, Thilaganathan N, Du L, Zhang Y, Pertsemlidis A, Kurie JM. Contextual extracellular cues promote tumor cell EMT and metastasis by regulating miR-200 family expression. Genes Dev. 23, 2140-2151 (2009).
Giraldez AJ, Cinalli RM, Glasner ME, Enright AJ, Thomson JM, Baskerville S, Hammond SM, Bartel DP, Schier AF. MicroRNAs regulate brain morphogenesis in zebrafish. Science 308, 833-838 (2005).
Heimberg AM, Sempere LF, Moy VN, Donoghue PC, Peterson KJ. MicroRNAs and the advent of vertebrate morphological complexity. Proc. Natl Acad. Sci. USA 105, 2946-2950 (2008).
Hornstein E, Shomron N. Canalization of development by microRNAs. Nat. Genet. 38 Suppl, S20-4 (2006).
Iliopoulos D, Hirsch HA, Struhl K. An epigenetic switch involving NF-kappaB, Lin28, Let-7 MicroRNA, and IL6 links inflammation to cell transformation. Cell 139, 693-706 (2009).
Juan AH, Kumar RM, Marx JG, Young RA, Sartorelli V. Mir-214-dependent regulation of the polycomb protein Ezh2 in skeletal muscle and embryonic stem cells. Mol. Cell 36, 61-74 (2009).
Kefas B, Comeau L, Floyd DH, Seleverstov O, Godlewski J, Schmittgen T, Jiang J, diPierro CG, Li Y, Chiocca EA, Lee J, Fine H, Abounader R, Lawler S, Purow B. The neuronal microRNA miR-326 acts in a feedback loop with notch and has therapeutic potential against brain tumors. J. Neurosci. 29, 15161-15168 (2009).
Kim VN. MicroRNA biogenesis: coordinated cropping and dicing. Nat. Rev. Mol. Cell Biol. 6, 376-385 (2005).
Kitano H. Biological robustness. Nat. Rev. Genet. 5, 826-837 (2004).
Klein ME, Lioy DT, Ma L, Impey S, Mandel G, Goodman RH. Homeostatic regulation of MeCP2 expression by a CREB-induced microRNA. Nat. Neurosci. 10, 1513-1514 (2007).
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T. Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl Acad. Sci. USA 105, 3903-3908 (2008).
Kumar MS, Lu J, Mercer KL, Golub TR, Jacks T. Impaired microRNA processing enhances cellular transformation and tumorigenesis. Nat. Genet. 39, 673-677 (2007).
Kumar MS, Pester RE, Chen CY, Lane K, Chin C, Lu J, Kirsch DG, Golub TR, Jacks T. Dicer1 functions as a haploinsufficient tumor suppressor. Genes Dev. 23, 2700-2704 (2009).
Leaman D, Chen PY, Fak J, Yalcin A, Pearce M, Unnerstall U, Marks DS, Sander C, Tuschl T, Gaul U. Antisense-mediated depletion reveals essential and specific functions of microRNAs in Drosophila development. Cell 121, 1097-1108 (2005).
Lee CT, Risom T, Strauss WM. Evolutionary conservation of microRNA regulatory circuits: an examination of microRNA gene complexity and conserved microRNA-target interactions through metazoan phylogeny. DNA Cell Biol. 26, 209-218 (2007).
Levine E, Huang M, Huang Y, Kuhlman T, Shi H, Zhang Z, Hwa T. On noise and silence in small RNA regulation. Submitted (2010).
Li QJ, Chau J, Ebert PJ, Sylvester G, Min H, Liu G, Braich R, Manoharan M, Soutschek J, Skare P, Klein LO, Davis MM, Chen CZ. miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-161 (2007).
Li X, Carthew RW. A microRNA mediates EGF receptor signaling and promotes photoreceptor differentiation in the Drosophila eye. Cell 123, 1267-1277 (2005).
Li X, Cassidy JJ, Reinke CA, Fischboeck S, Carthew RW. A microRNA imparts robustness against environmental fluctuation during development. Cell 137, 273-282 (2009).
Li Y, Wang F, Lee JA, Gao FB. MicroRNA-9a ensures the precise specification of sensory organ precursors in Drosophila. Genes Dev. 20, 2793-2805 (2006).
Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, Downing JR, Jacks T, Horvitz HR, Golub TR. MicroRNA expression profiles classify human cancers. Nature 435, 834-838 (2005).
Mayr C, Bartel DP. Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673-684 (2009).
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 298, 824-827 (2002).
Miska EA, Alvarez-Saavedra E, Abbott AL, Lau NC, Hellman AB, McGonagle SM, Bartel DP, Ambros VR, Horvitz HR. Most Caenorhabditis elegans microRNAs are individually not essential for development or viability. PLoS Genet. 3, e215 (2007).
Mukherji S, Ebert MS, Zheng GZ, Tsang JS, Sharp PA, van Oudenaarden A. MicroRNAs set gene expression thresholds with ultrasensitive transitions. Submitted (2010).
O'Donnell KA, Wentzel EA, Zeller KI, Dang CV, Mendell JT. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839-843 (2005).
Ozbudak EM, Thattai M, Kurtser I, Grossman AD, van Oudenaarden A. Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69-73 (2002).
Paulsson J. Summing up the noise in gene networks. Nature 427, 415-418 (2004).
Pedraza JM, van Oudenaarden A. Noise propagation in gene networks. Science 307, 1965-1969 (2005).
Peterson KJ, Dietrich MR, McPeek MA. MicroRNAs and metazoan macroevolution: insights into canalization, complexity, and the Cambrian explosion. Bioessays 31, 736-747 (2009).
Pulikkan JA, Dengler V, Peramangalam PS, Peer Zada AA, Müller-Tidow C, Bohlander SK, Tenen DG, Behre G. Cell-cycle regulator E2F1 and microRNA-223 comprise an autoregulatory negative feedback loop in acute myeloid leukemia. Blood 115, 1768-1778 (2010).
Raser JM, O'Shea EK. Noise in gene expression: origins, consequences, and control. Science 309, 2010-2013 (2005).
Roush SF, Slack FJ. Transcription of the C.
elegans let-7 microRNA is temporally regulated by one of its targets, hbl-1. Dev. Biol. 334, 523-534 (2009).
Rutherford SL, Lindquist S. Hsp90 as a capacitor for morphological evolution. Nature 396, 336-342 (1998).
Sandberg R, Neilson JR, Sarma A, Sharp PA, Burge CB. Proliferating cells express mRNAs with shortened 3' untranslated regions and fewer microRNA target sites. Science 320, 1643-1647 (2008).
Shalgi R, Lieber D, Oren M, Pilpel Y. Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS Comput. Biol. 3, e131 (2007).
Shkumatava A, Stark A, Sive H, Bartel DP. Coherent but overlapping expression of microRNAs and their targets during vertebrate development. Genes Dev. 23, 466-481 (2009).
Sokol NS, Ambros V. Mesodermally expressed Drosophila microRNA-1 is regulated by Twist and is required in muscles during larval growth. Genes Dev. 19, 2343-2354 (2005).
Sood P, Krek A, Zavolan M, Macino G, Rajewsky N. Cell-type-specific signatures of microRNAs on target mRNA expression. Proc. Natl Acad. Sci. USA 103, 2746-2751 (2006).
Stark A, Brennecke J, Bushati N, Russell RB, Cohen SM. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146 (2005).
Thattai M, van Oudenaarden A. Intrinsic noise in gene regulatory networks. Proc. Natl Acad. Sci. USA 98, 8614-8619 (2001).
Thomson JM, Newman M, Parker JS, Morin-Kensicki EM, Wright T, Hammond SM. Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev. 20, 2202-2207 (2006).
Tie J, Pan Y, Zhao L, Wu K, Liu J, Sun S, Guo X, Wang B, Gang Y, Zhang Y, Li Q, Qiao T, Zhao Q, Nie Y, Fan D. MiR-218 inhibits invasion and metastasis of gastric cancer by targeting the Robo1 receptor. PLoS Genet. 6, e1000879 (2010).
Tsang JS, Ebert MS, van Oudenaarden A. Genome-wide dissection of microRNA functions and cotargeting networks using gene set signatures. Mol. Cell 38, 140-153 (2010).
Tsang J, Zhu J, van Oudenaarden A. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol. Cell 26, 753-767 (2007).
van Rooij E, Sutherland LB, Qi X, Richardson JA, Hill J, Olson EN. Control of stress-dependent cardiac growth and gene expression by a microRNA. Science 316, 575-579 (2007).
Varghese J, Cohen SM. microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila. Genes Dev. 21, 2277-2282 (2007).
Xu N, Papagiannakopoulos T, Pan G, Thomson JA, Kosik KS. MicroRNA-145 regulates OCT4, SOX2, and KLF4 and represses pluripotency in human embryonic stem cells. Cell 137, 647-658 (2009).
Yang X, Feng M, Jiang X, Wu Z, Li Z, Aau M, Yu Q. miR-449a and miR-449b are direct transcriptional targets of E2F1 and negatively regulate pRb-E2F1 activity through a feedback loop by targeting CDK6 and CDC25A. Genes Dev. 23, 2388-2393 (2009).
Yoo AS, Greenwald I. LIN-12/Notch activation leads to microRNA-mediated downregulation of Vav in C. elegans. Science 310, 1330-1333 (2005).
Zhao C, Sun G, Li S, Shi Y. A feedback regulatory loop involving microRNA-9 and nuclear receptor TLX in neural stem cell fate determination. Nat. Struct. Mol. Biol. 16, 365-371 (2009).

Figures

Figure 1. miRNA-target network motifs (A) A negative feedback loop contributes to homeostasis for MeCP2 protein in neurons. (B) A mutual negative feedback loop contributes to bistability between myeloid precursors and granulocytes. (C) A positive feedback loop enforces lineage commitment of nematode "2 degrees" vulval cells. (D) A coherent feedforward loop both directly and indirectly inhibits the cell cycle regulator E2F1 in granulopoiesis. (E) An incoherent feedforward loop activates E2F1 while indirectly repressing it through the miR-20 seed family. (F) miR-181 coordinately targets four components to modulate the sensitivity of the T cell receptor signaling pathway.
[Figure 1 diagrams. Nodes shown per panel: (A) MeCP2, BDNF, miR-132; (B) NFI-A, miR-223; (C) LIN-12, miR-61, Vav-1; (D) C/EBPa, miR-223, E2F1; (E) miR-17-5p, 17-3p, 18, 19a, 20, 19b-1, 92-1 and E2F1; (F) miR-181, SHP-2, PTPN22, DUSP6, DUSP5, LCK, ERK (cytoplasmic and nuclear), T cell receptor.]

Figure 2. miR-9a suppresses random spikes in the level of the proneural transcription factor Senseless, setting a threshold for positive feedback activation in Drosophila sensory organ precursor (SOP) formation. Figure adapted from Cohen et al. 2006.

[Figure 2 diagram: SOP formation in the miR-9a mutant versus wild-type.]

Figure 3. miRNA-target interaction produces non-linear target protein output. Below a certain threshold of target mRNA production, the target is strongly repressed. Above the threshold, repression is fine-tuned along every degree in an ultrasensitive transition. The threshold can be modulated by changing the miRNA concentration or the number of miRNA binding sites in the target mRNA.

[Figure 3 plot: x-axis, mRNA input; curves for untargeted mRNA and miRNA target; regions labeled Switch and Fine-tuning; the curve shifts with lower [miRNA] or fewer binding sites versus higher [miRNA] or more binding sites.]

Figure 4. (A) A given mean protein level can be achieved by a variety of transcription-translation strategies. (B) The variance around the mean is anticipated to be smaller when strong transcription is paired with attenuated translation.

[Figure 4 plots: axis labels include transcription rate and protein copies per cell.]

Figure 5. The requirement for miR-7 in Drosophila SOP cell fate switching is revealed by adding an environmental perturbation during development. Diagrams adapted from Li et al. 2009. (A) miR-7 and the antineural transcription factor Yan participate in a coherent feedforward loop in Drosophila SOP determination such that SOP cells switch to a miR-7 ON-Yan OFF state.
(B) miR-7 and the proneural transcription factor Atonal participate in an incoherent feedforward loop in Drosophila SOP determination such that SOP cells switch to a miR-7 ON-Atonal ON state.

[Figure 5 diagrams: nodes include Notch, EGFR, Su(H), ERK, E(spl)C, Pnt-P1, miR-7, Yan, TTK69, TTK88, Phyl, Atonal, and Senseless.]

Figure 6. A transient inflammatory cue induces stable malignant transformation through an NF-kappaB/IL6 positive feedback network that is normally kept in check by let-7. Diagram adapted from Iliopoulos et al. 2009.

[Figure 6 diagram: nodes are Src, Ras, NF-kappaB, IL-6, Lin-28B, let-7, and STAT3.]

Chapter 6. Conclusions and future directions

This chapter was written by Margaret S. Ebert.

Conclusions

The discussions in Chapter 5 lead to some suggestions that connect systems biology concepts to the experimental study of miRNA biology.

1. It is important to think of miRNAs and targets in terms of regulatory networks. Small differences in target protein expression can have dramatic outcomes when the target protein participates in a positive feedback loop and potentially even more so when the loop involves paracrine signaling.
2. The output of a regulatory motif depends on the relative stabilities of the molecules involved. One must consider the miRNA's half-life in relation to the processes it regulates, and consider dynamics (pulses and lags in target protein), not just steady states (fine-tuning of target protein).
3. To test the potential buffering effects of miRNAs on target gene expression, experimenters will need to make use of single-cell assays that measure mRNA and protein levels among different cells within a clonal population (see Future directions).
4. miRNA loss-of-function phenotypes may need to be teased out by culturing the mutant animals or cells under non-standard conditions that mimic the stresses of a natural environment.
If miRNAs are responsible for sharpening developmental transitions, then mutants may show ambiguous intermediate states between characteristic developmental stages. Another way of thinking that emerges from previous chapters concerns the relationship not between a single miRNA and a single target gene in isolation, but rather between a pool of different miRNAs that co-target the gene of interest and a pool of different endogenous mRNAs that compete for binding to those miRNAs. Depending on the abundance and quality of binding sites in the endogenous mRNAs, the free miRNA pool may be much smaller than suggested by total cellular miRNA concentration, as a large fraction of the miRNA complexes may be partitioned among hundreds of targets. The link between the work on miRNA sponges (described in Chapters 2 and 3) and the work on miRNA-generated expression thresholds (described in Chapters 4 and 5) can be thought of as molecular titration. miRNAs titrate target mRNAs, and miRNA sponges titrate miRNAs. The effectiveness of sponge inhibitors is probably due in part to the fact that many of the miRNAs are already sequestered on endogenous target mRNAs. Perhaps some miRNA target genes whose repression is functionally inconsequential evolved binding sites to act as sponges, tuning miRNA availability to a precise level for the regulation of a small number of targets whose repression does have important phenotypic consequences (Seitz 2009). In this sense, the distinction between 'target' and 'sponge' is blurred. We see that a sponge mRNA expressed at an insufficient level just behaves as a miRNA target; a miRNA target overexpressed to a very high level behaves as a miRNA sponge. Indeed, in cells above but not below the threshold, the miR-20 N = 7 mCherry target reporter partially derepresses another target of miR-20 but not a target of the unrelated miRNA miR-16 (data not shown).
The more highly expressed a given miRNA is, the higher the threshold for an N= 7 target such as a GFP sponge mRNA; hence the higher the concentration of sponge required to sequester it. The less abundant the miRNA, the more complete its suppression at a given level of sponge expression. The threshold result also has interesting implications for how we assay miRNA targets. Conventionally, to validate a miRNA target prediction, one fuses the putative target's 3' UTR to a luciferase reporter driven by a strong viral promoter and transfects cultured cells with many copies of plasmid DNA. After one to three days, the cells are lysed and the average luciferase expression is measured. Typically the degree of repression reported by this assay is less than two-fold. Considering however that most of the luciferase expression arises from cells containing many copies of the reporter gene, presumably those located above the target expression threshold, there are likely other cells in the population that express less luciferase mRNA and experience stronger repression. By averaging over the population of cells, the strength of repression is diluted. We expect that transcriptional output from cellular promoters in chromosomal DNA would correspond to the lower range of expression from transfected luciferase reporter plasmids, placing it below and around the presumed threshold. Thus we conclude that the bulk reporter assay most commonly used to test miRNA targets may be systematically underestimating the in vivo potency of miRNAs in these interactions. On the other hand, in experiments where endogenous target genes are tested in the presence of added miRNA, there is opportunity to overestimate the strength of repression. By introducing non-physiological concentrations of miRNA, one can shift the threshold such that certain targets become strongly repressed whereas they are only weakly repressed in their natural context. 
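The dilution of repression by population averaging can be illustrated with a minimal sketch. It assumes a simple equilibrium-binding picture of titration (one miRNA pool, one target species, protein output proportional to free target mRNA), which is a simplification and not the model developed in Chapter 4. The function names, the dissociation constant `k_d`, and all numerical values are hypothetical and chosen only for illustration.

```python
import math

def free_target(total_target, total_mirna, k_d):
    """Free (unbound) target mRNA at binding equilibrium with a miRNA pool.

    Positive root of r^2 + r*(M - T + K) - K*T = 0, where T is total target,
    M is total miRNA and K is the dissociation constant (arbitrary units).
    """
    b = total_target - total_mirna - k_d
    return 0.5 * (b + math.sqrt(b * b + 4.0 * k_d * total_target))

def fold_repression(total_target, total_mirna, k_d):
    """Repression in one cell: unregulated output divided by regulated output."""
    return total_target / free_target(total_target, total_mirna, k_d)

def bulk_fold_repression(levels, total_mirna, k_d):
    """Repression seen by a lysate-style assay that sums over all cells."""
    unregulated = sum(levels)
    regulated = sum(free_target(t, total_mirna, k_d) for t in levels)
    return unregulated / regulated

# Hypothetical population: target mRNA per cell spans the threshold (~MIRNA).
MIRNA, KD = 100.0, 1.0
cells = [10.0, 50.0, 200.0, 1000.0, 5000.0]
per_cell = [fold_repression(t, MIRNA, KD) for t in cells]
bulk = bulk_fold_repression(cells, MIRNA, KD)

for t, f in zip(cells, per_cell):
    print(f"target={t:7.0f}  fold repression={f:6.2f}")
print(f"bulk (population-summed) fold repression={bulk:.2f}")
```

Under these assumed parameters, cells below the threshold are repressed many-fold while the population-summed measurement stays below two-fold, because the highest-expressing cells dominate the sum.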
Perhaps the parameters for thresholding could be applied not only for interpreting but also for predicting miRNA targeting effects. Given predictions that score the overall quality and number of binding sites for a given set of miRNAs, and expression data for mRNAs and miRNAs in cell types of interest, one could weight the target predictions by the relative concentration of miRNA to mRNA.

Threshold effects are prevalent in biology. Switch-like thresholds can be created by other mechanisms such as strongly cooperative binding of regulatory proteins, but molecular titration is more tunable than cooperativity (Buchler and Louis 2008). miRNA-target titration allows different thresholds to be generated for different genes in the same tissue (by virtue of different miRNA binding sites) and for the same gene in different tissues (by virtue of different miRNA concentrations). Over developmental time and in response to environmental cues, miRNA profiles change. This resets the threshold for many target genes such that the protein output for some targets now falls below or rises above the level required for a functional outcome. Where steep miRNA thresholds connect to such protein thresholds, switch-like responses can be produced.

Perhaps as important as the conclusions from this work are the experimental tools that we made. Sponge vectors are being tested and adapted in many labs around the world, and their use in published reports is still rising. Efforts to generate transgenic mice with inducible tissue-specific miRNA sponges are underway. The bidirectional eYFP-mCherry vector will be a powerful new tool for measuring miRNA activity in lieu of luciferase reporters or single-color fluorescent reporters. Its inducibility allows for assaying any possible level of physiological transcription by varying the doxycycline concentration. It could be transferred to lentiviral vectors for delivery to a broader selection of cell lines.
We envision its application in target validation assays and assays for regulators of the miRNA pathway. It should be possible to use automated flow cytometry with stable dual color miRNA reporter cell lines to screen libraries of chemical compounds for their effect on miRNA activity.

Future directions

In Chapter 3 we considered the possibility that there exist natural RNAs that act as miRNA sponge inhibitors to sequester one or more miRNA seed families and rescue the expression of their target genes. Recently a set of hundreds of mRNA-like (spliced, polyadenylated, > 200 nt) mammalian non-coding RNAs were discovered (Guttman et al. 2009). This library of sequences will be screened computationally for potential miRNA target mimics, scoring for the prevalence and quality of miRNA binding sites. Candidates will be assessed with respect to their expression profile and subcellular localization to determine whether these RNAs might plausibly encounter the miRNA(s) whose sites they contain. Any prospective sponge RNAs will be assayed for interaction with the miRNA(s) in question and for the ability to derepress other targets of the same miRNA seed(s) in their natural context.

In Chapter 4 we used a bidirectional dual color reporter to assay miRNA activity in HeLa cell lines, and observed dramatic variation in target repression among individual cells. To what extent does miRNA activity vary amongst different cells in animal tissue? To address this question we plan to apply the same reporter system to measure miRNA activity in vivo. One approach is to generate tumors stably expressing the eYFP-mCherry constructs from chromosomal insertions. This could be done by subcutaneous xenograft in immunocompromised mice using stable Tet-On HeLa cell lines with multi-site miR-20 reporters or the N = 0 control reporter.
Tissue sections from the tumors could be imaged by fluorescence microscopy, and the relative expression of the mCherry target reporter to the eYFP internal control would serve as an indication of miRNA activity. The mCherry:eYFP ratio in miR-20 targeted tumors would be normalized to the same ratio from the N = 0 control tumors generated contralaterally in the same animal.

Another approach is to generate transgenic mice expressing the dual color reporters throughout the body. This approach could provide information about miRNA activity in healthy tissue and at different stages of the animal's development. Another advantage is the option to express the miRNA reporter genes from isogenic chromosomal insertions: a system developed in the Jaenisch laboratory allows for a reporter construct to be site-specifically inserted immediately downstream of the ColA1 collagen locus in murine embryonic stem (mES) cells that also stably express the rtTA transcription factor for tet inducibility (Beard et al. 2006). mES cell clones with single-copy integration of miRNA-targeted or control reporters can be used to generate the transgenic mice. Previously this system was shown to achieve robust inducible fluorescent reporter expression in the majority of cells in the liver, spleen, thymus, intestine, and skin, and measurable expression in many more organs and cell types. Blood cells and tissue sections could be harvested for analysis by flow cytometry and fluorescence microscopy to measure the relative mCherry and eYFP expression. Transgenic miRNA reporter mice could be crossed to various genetic mutants to explore the effects of different signaling environments on miRNA activity, for example in mouse models of cancer.

In Chapter 5, one of the prevalent miRNA-target network motifs we described is the incoherent feedforward loop.
In its stereotypical form, this motif consists of a transcription factor that induces both a protein-coding gene and a miRNA that represses that gene (Tsang et al. 2007). To our knowledge this motif has not been experimentally tested to see if it performs the following three predicted functions: (1) fine-tuning of protein output compared to equal transcriptional induction of the protein-coding gene without miRNA-mediated repression; (2) reduction in stochastic fluctuations in protein output compared to transcriptional induction that produces the same mean protein output without miRNA-mediated repression; (3) production of a timed pulse of target protein whose attenuation phase depends on miRNA-mediated repression. To test these predictions, we will adapt the bidirectional tet-inducible fluorescence reporter system so as to express both a protein reporter and a miRNA that targets the reporter. Nuclear-localized eCFP serves as the quantitative reporter of protein output; from the other side of the bidirectional promoter, the precursor for the liver-specific miRNA miR-122 is or is not inserted; finally, the eCFP 3' UTR contains seven bulged binding sites for miR-122 or no sites (Figure 1). The incoherent feedforward loop is constituted when the construct containing both miRNA and eCFP target sites is expressed in HeLa or mES cells expressing the rtTA transcription factor in the presence of doxycycline. Versions lacking the miRNA or the target sites remove a link from the loop such that eCFP is induced without being regulated by miRNA. miR-122 is chosen as it is not expressed in the cell lines to be used; this should provide a clean background and a large dynamic range of inducible expression. The tet-inducible promoter modulates transcription rate in a manner that can be finely tuned by varying the doxycycline concentration. Stable lines will be assayed by flow cytometry and fluorescence microscopy.
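The third predicted function, a timed pulse of target protein, can be sketched with a toy simulation. This is only an illustration under stated assumptions: induction is a step at time zero, the miRNA accumulates slowly toward its steady state, and it reduces protein synthesis by a simple factor 1 / (1 + s / k). The function name `simulate`, the rate constants, and the forward-Euler integration are all hypothetical choices, not measurements or methods from this work.

```python
def simulate(mirna_rate, t_end=50.0, dt=0.01,
             beta=1.0, gamma=1.0, delta=0.1, k=1.0):
    """Forward-Euler integration of a toy incoherent feedforward loop.

    s: miRNA level, made at `mirna_rate` and lost at rate `delta` (slow).
    p: protein level, made at beta / (1 + s / k) and lost at rate `gamma`.
    Returns the protein trace sampled every `dt` time units.
    """
    steps = int(round(t_end / dt))
    s = 0.0
    p = 0.0
    trace = []
    for _ in range(steps):
        ds = mirna_rate - delta * s
        dp = beta / (1.0 + s / k) - gamma * p
        s += dt * ds
        p += dt * dp
        trace.append(p)
    return trace

# Full loop: the inducer switches on both the reporter and the miRNA.
loop = simulate(mirna_rate=1.0)
# Control lacking the miRNA arm: the reporter is induced without repression.
open_loop = simulate(mirna_rate=0.0)

print(f"loop:    peak={max(loop):.3f}  final={loop[-1]:.3f}")
print(f"control: peak={max(open_loop):.3f}  final={open_loop[-1]:.3f}")
```

With these assumed rates the full loop overshoots and then settles to a lower level, whereas the control rises monotonically to its plateau. Whether a real construct pulses would depend on the actual relative half-lives of the miRNA and the reporter protein.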
To assay whether the incoherent feedforward loop generates a pulse of eCFP expression, fluorescence microscopy images of live cells induced with doxycycline will be acquired at timed intervals over the course of at least 24-48 hours. To assay whether fluctuations in eCFP are buffered by miRNA repression, cell lines expressing constructs with and without miRNA or target sites will be induced with different doxycycline concentrations such that the mean eCFP expression between samples of cells is equal; then the variance of each cell population will be compared. Finally, the motif in which miRNA is induced in concert with target mRNA will be compared to a scenario in which the miRNA is constitutively expressed. We expect that the incoherent feedforward loop will still produce a target expression threshold, but one with a somewhat flattened ultrasensitive transition where repression is relatively weaker at lower target mRNA production levels and relatively stronger at higher target mRNA production levels.

In 2010 the study of miRNAs and other RNAi pathways is flourishing. The work in this thesis suggests that there is still fundamental knowledge about the molecular biology of miRNA regulation that can be gleaned from simple experimental models such as HeLa cells expressing artificial target reporters. In the future, we look forward to the dissection of miRNA target networks and functions in more physiological contexts.

Acknowledgments

John Tsang provided helpful guidance for testing the incoherent feedforward loop. Evgeny Kiner, an undergraduate research student, helped construct the reporters for the incoherent feedforward loop.

References

Beard C, Hochedlinger K, Plath K, Wutz A, Jaenisch R. Efficient method to generate single-copy transgenic mice by site-specific integration in embryonic stem cells. Genesis 44, 23-28 (2006).

Buchler NE, Louis M. Molecular titration and ultrasensitivity in regulatory networks. J. Mol. Biol. 384, 1106-1119 (2008).
Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227 (2009).

Seitz H. Redefining microRNA targets. Curr. Biol. 19, 870-873 (2009).

Tsang J, Zhu J, van Oudenaarden A. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol. Cell 26, 753-767 (2007).

Figures

Figure 1. Experimental model for an incoherent feedforward loop. In the presence of rtTA transcription factor and doxycycline, a bidirectional tet-inducible construct drives expression of the eCFP reporter and miR-122 precursor. miR-122 represses eCFP through interaction with multiple bulged binding sites in the 3' UTR.

[Diagram labels: rtTA (+ dox), TRE, pre-miR-122, NLS-eCFP, miR-122 sites, miR-122 -| eCFP]

Appendix. Genome-wide dissection of microRNA functions and co-targeting networks using gene set signatures

This chapter was written by John S. Tsang and Margaret S. Ebert and edited by Alexander van Oudenaarden. This article was published in Molecular Cell vol. 38 pp. 140-153 (2010). Permission to use the full article was obtained from Elsevier.

MicroRNAs (miRNAs) are emerging as important regulators of diverse biological processes and pathologies in animals and plants. Though hundreds of human miRNAs are known, only a few have known functions. Here we predict human miRNA functions by using a new method that systematically assesses the statistical enrichment of several miRNA targeting signatures in annotated gene sets such as signaling networks and protein complexes. Some of our top predictions are supported by published experiments, yet many are entirely new or provide mechanistic insights to known phenotypes.
Our results indicate that coordinated miRNA targeting of closely connected genes is prevalent across pathways. We use the same method to infer which miRNAs regulate similar targets and provide the first genome-wide evidence of pervasive co-targeting, where a handful of "hub" miRNAs are involved in a majority of co-targeting relationships. Our method and analyses pave the way to systematic discovery of miRNA functions.

Introduction

MicroRNAs (miRNAs) regulate diverse biological processes in animals and plants (Bushati and Cohen 2007) and are among the most abundant regulatory factors in the human genome, comprising 3-5% of known human genes (Griffiths-Jones et al. 2008). miRNAs recognize target mRNAs by imperfect base pairing to sites in the 3' untranslated region (3' UTR), usually with perfect pairing of the miRNA seed region (nucleotides 2-8), ultimately leading to translational repression and/or mRNA degradation (Bushati and Cohen 2007). Thousands of human genes are predicted to be targeted by miRNAs (Rajewsky 2006), suggesting that miRNAs play a pervasive role in the regulation of gene expression. Although hundreds of human miRNAs have been identified and new ones are continually being discovered (Griffiths-Jones et al. 2008), the function of most miRNAs remains unknown. Increasingly, miRNA expression changes are being linked to phenotypes, but the mechanistic role of the miRNA in the underlying biological network is often unclear.

Given that many human miRNAs can target up to thousands of genes, how often do miRNAs target a set of related genes to regulate a specific pathway or process? Though recent studies show that a few miRNAs have pathway-specific functions (Xiao and Rajewsky 2009), earlier work suggests that miRNAs primarily serve to fine-tune and confer robustness upon the expression of many genes (Bartel and Chen 2004, Farh et al. 2005, Stark et al. 2005). The prevalence of multiple miRNAs targeting the same gene ("co-targeting") is also unclear.
While many genes contain putative binding sites for multiple miRNAs (Krek et al. 2005, Stark et al. 2005), many putative sites may not be functional in vivo. More specifically, the combinations of miRNAs that function together by regulating common targets are unknown. Knowledge of such co-targeting relationships would also enable one to infer a miRNA's function from the function of its co-targeting miRNAs. Typically miRNA function is predicted by assessing whether the predicted targets of a given miRNA are enriched for particular functional annotations. Such an approach has several limitations: (1) target prediction is imperfect and can lead to spurious targets (Rajewsky 2006); (2) having a subset of one's favorite pathway genes in the putative target set does not necessarily mean that the miRNA functions in the pathway; (3) predicted target sets are often so large (hundreds to thousands of genes) and have such heterogeneous functional annotations that standard algorithms are not sufficiently sensitive to make high-confidence predictions. Rather than progressing from a miRNA to a potentially spurious target set that may or may not have enriched function, here we introduce a computational method called mirBridge, which starts with a gene set of known function, then assesses whether functional sites for a given miRNA are enriched in the gene set compared to random gene sets with similar properties. We apply mirBridge to a variety of annotated gene sets for signaling pathways, diseases, drug treatments, and protein complexes. We also use mirBridge to infer miRNA pairs that tend to function together by regulating common targets and use the results to assemble a miRNA-miRNA co-targeting network. 
Together, our analyses provide: (1) hundreds of miRNA function predictions, many of which are supported by published experiments; (2) genome-wide evidence that many miRNAs coordinately regulate multiple components of pathways or protein complexes; and (3) evidence that miRNA co-targeting is highly prevalent, with a small number of "hub" miRNA families involved in a large fraction of the co-targeting interactions. Both the mirBridge method and the predictions it has generated can serve as important resources for the future experimental dissection of miRNA functions.

Results

mirBridge: linking miRNAs to gene sets

Many gene sets contain tens to hundreds of putative targets for any particular miRNA. However, for a variety of reasons (e.g. mRNA secondary structure occludes binding, or the miRNA and the target are not expressed together) many target sites are not functional in vivo. The goal of mirBridge is to infer whether an unusually large proportion and number of putative target sites for a miRNA (m) in a given gene set (G) are likely to be functional in vivo. Toward this end, mirBridge computes a score by combining the results of three statistical tests that evaluate different aspects of likely functional target-site enrichment in G. It is essential that the enrichment of sites in G be compared to enrichment in appropriate control gene sets. Below we describe the individual tests and the method for constructing the control gene sets (see Supplemental Experimental Procedures for details).

The following definitions are essential to the methodology of mirBridge. First, any gene with one or more seed-matched sites for m in its 3' UTR is deemed a "putative target." Second, seed-matched sites can be classified into two categories (Figure 1A): "conserved sites" (CS) are sites that are conserved across mammalian genomes; "high-context scoring sites" (HCS) are sites with a context score above a predefined threshold.
The context score reflects the likelihood of a seed-matched site to confer repression based on several features, including the distance of the site from the stop codon, accessibility of the site based on secondary structure, and the extent of base pairing beyond the seed (Grimson et al. 2007).

The first test used by mirBridge, called "conservation enrichment signature" (CE), infers whether the number of CS in G is significantly higher than that of random gene sets containing the same number of putative targets as G. This test is similar to evaluating whether the sites have evolved at a slower rate compared to random putative target sets, but is fundamentally different from prior tests that utilize sequence conservation (Lewis et al. 2005, Stark et al. 2005) (see Supplemental Experimental Procedures). The second test, called "context-score signature" (CTX), evaluates whether the number of HCS is significantly higher than that of random gene sets containing the same number of putative targets as G. The CTX test is designed to detect enrichment of sites in G that are likely functional but not necessarily conserved. The third test, called "site occurrence signature" (OC), evaluates whether the number of putative target sites in G is unusually high compared to random gene sets containing the same number of genes. While target site abundance alone is not necessarily indicative of functional targeting by m, functional targeting enrichment becomes a likely scenario even when G tests as moderately significant for the CE and/or CTX tests. Note that both CE and CTX are based on comparison with random gene sets with the same number of putative targets to detect enrichment in the proportion rather than the number of CS or HCS. This ensures that the comparisons are valid, as gene sets with more putative targets tend to have more CS or HCS.
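The logic shared by the CE and CTX tests, comparing a count in G against random sets holding the same number of putative targets, can be sketched as an empirical one-sided test. This is a simplified illustration, not the mirBridge implementation: the random sets here are drawn uniformly from a background list rather than by matched sampling, and the function names, the add-one p value estimate, and the toy data layout (a dictionary from gene to number of conserved sites) are assumptions made for the example.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided p value: share of random sets scoring at least as high as G.

    Adding 1 to numerator and denominator keeps the estimate above zero.
    """
    at_least = sum(1 for v in null_values if v >= observed)
    return (1 + at_least) / (1 + len(null_values))

def site_count(genes, sites_per_gene):
    """Total number of qualifying sites (e.g. conserved sites) in a gene list."""
    return sum(sites_per_gene[g] for g in genes)

def ce_like_test(targets_in_g, background_targets, sites_per_gene,
                 n_random=1000, rng=random):
    """Compare G's site count with random sets of equally many putative targets.

    Assumption: uniform sampling without replacement from `background_targets`.
    """
    observed = site_count(targets_in_g, sites_per_gene)
    size = len(targets_in_g)
    null = [site_count(rng.sample(background_targets, size), sites_per_gene)
            for _ in range(n_random)]
    return empirical_p_value(observed, null)
```

Because every random set contains the same number of putative targets as G, a small p value reflects a higher proportion of qualifying sites rather than simply a larger set of targets.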
Because true positives are more likely than false positives to test as simultaneously significant across the tests, we combine the three tests and form a composite score ("OC-CE-CTX") to increase sensitivity without sacrificing specificity.

We developed a nearest-neighbor gene sampling algorithm, motivated by the principle of kernel-based density estimators (Wegman 1972), to generate random gene sets that are similar to the input gene set with respect to general conservation level, 3' UTR length, and GC content, which primarily bias the CE, OC, and CTX tests, respectively. Simultaneous adjustment is particularly important because these factors are correlated with each other across genes. Specifically, for the OC test, comparable random gene sets are generated by replacing each member of G with a randomly drawn gene that has similar GC content, 3' UTR length, and general conservation level (Figure 1B). To ensure that the number of putative targets in the random gene sets is the same as that in G for the CE and CTX tests, the same nearest-neighbor procedure is used, but only putative targets in G are replaced by random putative targets (Figure 1C). Finally, to obtain the OC-CE-CTX p value, the p values of the individual tests are combined using a customized version of the inverse-normal method that corrects for dependencies among tests (Joachim 1999). When multiple gene sets and/or miRNAs are tested simultaneously, multiple hypothesis testing is corrected by computing the false discovery rate (FDR) using the q-value method (Storey and Tibshirani 2003). "FDR" and "q value" are used interchangeably below.

Besides 3' UTR length, GC content, and general conservation, other less apparent factors could bias mirBridge results, but their effects are likely small (see Supplemental Experimental Procedures).
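Two steps of this procedure lend themselves to a short sketch: drawing a matched random gene set by nearest-neighbor replacement, and combining per-test p values with the inverse-normal method. Both functions are illustrative assumptions. The sampler picks uniformly among the `k` closest genes in a pre-standardized feature space, which may differ from the published algorithm, and the combiner is the plain unweighted inverse-normal (Stouffer) rule for independent tests, without the dependency correction described above. All names and the feature layout are hypothetical.

```python
import math
import random
from statistics import NormalDist

def nearest_neighbor_set(gene_set, features, k=3, rng=random):
    """Replace each gene by a random one of its k nearest neighbors.

    `features` maps gene -> tuple such as (conservation, UTR length, GC
    content); values are assumed to be standardized to comparable scales.
    """
    sampled = []
    for gene in gene_set:
        others = [g for g in features if g != gene]
        others.sort(key=lambda g: math.dist(features[g], features[gene]))
        sampled.append(rng.choice(others[:k]))
    return sampled

def combine_inverse_normal(p_values):
    """Unweighted inverse-normal combination of one-sided p values.

    Assumes independent tests and p values strictly between 0 and 1.
    """
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)

# Hypothetical standardized features for a six-gene background.
toy_features = {
    "geneA": (0.0, 0.0, 0.0), "geneB": (0.1, 0.0, 0.1),
    "geneC": (0.0, 0.2, 0.0), "geneD": (2.0, 2.0, 2.0),
    "geneE": (2.1, 1.9, 2.0), "geneF": (1.9, 2.0, 2.2),
}
matched = nearest_neighbor_set(["geneA", "geneD"], toy_features,
                               k=2, rng=random.Random(0))
combined = combine_inverse_normal([0.05, 0.05, 0.05])
print(matched, f"{combined:.4f}")
```

Three moderately significant tests combine into a smaller p value than any one of them alone, which is the sense in which a composite score can gain sensitivity; with correlated tests the uncorrected rule shown here would overstate significance.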
The statistical model in mirBridge was also designed to incorporate additional factors if needed; in principle, any number of factors can be accounted for by our nearest-neighbor sampling procedure.

mirBridge is fundamentally different from testing whether the number of predicted miRNA targets in a gene set is significantly higher than expected using the Fisher Exact Test (FET), a standard way to assess the significance of gene set overlaps. First, mirBridge takes gene set properties into account; second, it combines different and important biological characteristics of target sites; and finally, it uses metrics (CE and CTX) that focus on the proportion of likely functional target sites instead of the number of predicted target overlaps. In fact, mirBridge has superior sensitivity and specificity compared to FET as shown in the applications below.

Inferring human miRNA functions

To link human miRNA families (miRNAs with a shared seed sequence) to functions, we applied mirBridge to gene sets from (1) canonical signaling pathways from MSigDB (Subramanian et al. 2005); (2) KEGG (Kanehisa and Goto 2000); (3) human protein complexes from the CORUM database (Ruepp et al. 2008); (4) gene co-expression modules (Segal et al. 2004); (5) Gene Ontology (GO) Biological Process; (6) GO Component; and (7) GO Function (Ashburner et al. 2000). At a FDR cutoff of 0.2, mirBridge predicts 185, 128, 1198, 456, 432, 71, and 175 distinct miRNA-function associations, respectively (Tables S1-S7). Most predictions implicate pathways or protein complexes with multiple putative targets for the miRNA, whereas some have only one (or very few) putative targets containing multiple high-quality sites (e.g. miR-33 and statin pathway). The latter fits the paradigm implied in some recent papers where a miRNA phenotype seems to be accounted for by one (or just a few) targets: "miR-X regulates process Y by targeting gene Z."
However, the prevalence of coordinate targeting of multiple related genes suggests that most miRNAs exert their phenotypic effects by targeting multiple network components.

To facilitate a succinct discussion of such a large set of predictions, Tables 1 and 2 show a selection of predictions that either already have support from the literature or wherein the predicted pathway (1) has known activity in the tissue where the miRNA is known to be expressed; or (2) represents core cellular processes (e.g. "apoptosis") and has a large number of putative targets for the miRNA. We also favor predictions that recur in closely related or synonymous gene sets, e.g. "cell cycle" and "G1 to S transition."

mirBridge is sensitive to biological signals and can independently uncover known miRNA functions

Although mirBridge is not trained on any dataset of known miRNA functions, several of the top hits already have experimental support in the literature (Table 1), such as the association of miR-16 with the cell cycle, Wnt signaling, and prostate cancer (Calin et al. 2005, Linsley et al. 2007) (Figure S1A). This is also an example in which mirBridge links a disease and the pathways underlying its pathology: miR-16 has been shown to work through the Wnt pathway to function as a tumor suppressor in prostate cancer (Bonci et al. 2008). Analogously, miR-7 hits the ErbB pathway in glioblastoma (Kefas et al. 2008, Webster et al. 2009); miR-221/222 hits the estrogen signaling pathway in breast cancer (Miller et al. 2008, Zhao et al. 2008); and let-7 hits the G1-S cell-cycle pathway in breast cancer (Schultz et al. 2008, Yu et al. 2007). mirBridge can also implicate a pathway of interest given the tissue specificity of a miRNA: miR-7 is predicted to regulate the insulin receptor pathway and is known to be highly expressed in insulin-producing cells of pancreatic islets (Bravo-Egana et al. 2008, Correa-Medina et al. 2009, Joglekar et al. 2009).
mirBridge also independently uncovered feedback loops: miR-146 is predicted to target several upstream signaling genes in the NF-kB pathway, whereas its transcription is known to be activated by NF-kB (Taganov et al. 2006) (Figure S1B). Another notable prediction supported by the literature is miR-34 targeting BCL2 and several additional anti-apoptotic genes in the BAD pathway (Chang et al. 2007, Cloonan et al. 2008, He et al. 2007). This prediction provides an attractive hypothesis for how miR-34 upregulation could lead to apoptosis. In sum, these results are reassuring and indicate that mirBridge can capture biologically relevant signals.

mirBridge is significantly more sensitive than the standard approach of evaluating gene set overlaps using FET. For instance, when FET is applied to the canonical pathway gene sets, only five predictions can be made at the 0.2 FDR cutoff (Table S8); all five have FDRs greater than 0.18, and only one has support from the literature (miR-16 and the Gleevec pathway, given that miR-16 is associated with leukemia). Furthermore, none of the top mirBridge predictions supported by published experiments were uncovered. For example, for miR-16, none of the cell-cycle related pathways are ranked near the top, even if we ignore the statistical significance and order the pathways within each miRNA family by their q values (the top cell-cycle related entry has rank 54, q = 0.55). These results suggest that mirBridge can better uncover biologically relevant signals than FET.

It is important to note that the comprehensiveness of our predictions is dependent on the gene sets used. Some known miRNA functions are not in our predicted list because the appropriate gene set(s) were not included in the analysis. For example, miR-200 is known to function in the epithelial-mesenchymal transition (Burk et al. 2008, Gregory et al. 2008, Korpal et al. 2008, Park et al. 2008), but none of the gene sets used in our analysis captures this process.
However, when mirBridge is applied to genes whose function annotation in the GeneCards database includes "epithelial-mesenchymal transition," miR-141/200a has the lowest q value among all miRNAs (q = 0.08).

To further assess the ability of mirBridge to predict known miRNA functions independently, we compiled eight additional miRNA phenotypes from the literature and applied mirBridge to seemingly relevant gene sets from KEGG or GeneCards (Table S10). Of nine phenotypes, four miRNA-gene set p values are significant and two are marginally significant (Table 3). In a multiple hypothesis testing context in which all miRNAs are tested simultaneously for the phenotype gene set, however, only two would have been predicted at a FDR cutoff of 0.2 even though the desired miRNA ranks at or near the top for all four of the significant cases. This suggests that, for these specific gene sets, mirBridge is sensitive to the relevant biological signals but lacks sufficient statistical power after multiple-testing correction. It follows that the hundreds of low-FDR predictions that are made by mirBridge are compelling candidates for experimental follow-up given that these emerged in the simultaneous testing of thousands of miRNA-gene set combinations. We expect the statistical power of mirBridge to continue to improve as additional genomes and knowledge of miRNA-target interactions become available.

We also sought to understand cases where mirBridge failed to predict the correct functions. Closer examination of the three failed cases in Table 3 suggests that, for let-7 and miR-133, the gene sets used do not capture the biology relevant to the miRNA targeting. The cell cycle may be a key pathway through which let-7 exerts its effect on lung cancer (Esquela-Kerscher et al. 2008, Kumar et al. 2008, Schultz et al.
2008), but the non-small cell lung cancer gene set lacks most cell cycle genes and other postulated targets such as HMGA2 and MYC (let-7 does hit the G1-S cell-cycle transition pathway; Table 1). Similarly, for miR-133 and cardiac hypertrophy, two out of the three known targets relevant to the phenotype are not in the GeneCards set (CDC42 and WHSC2; Care et al. 2007). Finally, for miR-122, it turns out that inhibition of miR-122 by antagomir treatment tends to downregulate, rather than upregulate, cholesterol biosynthetic genes (Krutzfeldt et al. 2005), suggesting that the effect of miR-122 on cholesterol biosynthetic genes is indirect. Thus, the insignificant mirBridge p value for miR-122 and cholesterol biosynthesis genes is not surprising.

mirBridge provides many new miRNA function predictions

The majority of mirBridge predictions are as yet untested (Tables 2 and S1-S7). Some pathways predicted in common for multiple miRNAs seem particularly compelling because the miRNAs are known to be co-regulated. For example, the apoptosis pathway is predicted for miR-23 and -24, which are different in sequence but are co-expressed from the same cluster (Chhabra et al. 2009). Some predictions seem reasonable based on the function of the miRNA host gene. For example, the statin/cholesterol homeostasis pathway is linked to miR-33, which is embedded in an intron of a transcription factor (SREBP2) that regulates cholesterol synthesis and uptake (Figure S1C). Other predictions seem plausible based on known miRNA functions with similar developmental placement and timing. For example, axon guidance pathways are predicted for miR-124, which has already been shown to positively regulate neurogenesis (Cheng et al. 2009, Visvanathan et al. 2007).
Consistently, miR-124 was linked to the SNARE protein complex as it putatively targets VAMP3, a component of SNARE, via three conserved and high context-scoring sites; VAMP3 is known to function in the docking and fusion of synaptic vesicles with the presynaptic membrane (Sudhof 2004).

mirBridge predictions can also provide mechanistic interpretations of published experiments. For example, it is known that activation of PIP3 signaling leads to the hypertrophic response in cardiac myocytes and that miR-1 expression is down-regulated upon hypertrophic stress (Care et al. 2007, Heineke and Molkentin 2006, Sayed et al. 2007). mirBridge links miR-1 to the PIP3 pathway, and the putative miR-1 targets in the pathway are all pro-hypertrophic except PTPN1 (Table 1), suggesting that the downregulation of miR-1 helps to drive pathway activation (Figure 2). Post-transcriptional repression by miR-1 could allow these genes to be transcribed at higher (or leaky) levels without triggering a hypertrophic response, such that a reduction in miR-1 expression would suffice to rapidly activate signaling at multiple levels. For example, de-repression of the most downstream factors (e.g. CDC42) could quickly lead to sarcomere remodeling, a first step in the hypertrophic response (Nagai et al. 2003). Increasing levels of upstream factors coupled with positive feedback loops would intensify the response.

We envision that a useful application of mirBridge would be to probe a function of interest guided by the known expression profile of miRNAs. Because we are interested in neurotransmitter pathways, we applied mirBridge to manually curated gene sets for these pathways (see Supplemental Experimental Procedures). miR-218, a known neuronal miRNA (Sempere et al. 2004), is the most and second-most significant hit for GABA and glutamate gene sets, respectively (q = 0.025 and 0.033).
That these two neurotransmitter activities may be regulated by the same miRNA is intriguing given that glutamate and GABA are, respectively, the major excitatory and inhibitory neurotransmitters and that the latter can be enzymatically converted from the former. In addition, we tested a gene set for synaptic vesicle formation because miR-218 is enriched at synapses of hippocampal neurons (Siegel et al. 2009). miR-135, a brain-enriched miRNA (Sempere et al. 2004), and miR-218 are the top two hits (q = 0.000003 and 0.024, respectively). In sum, the mirBridge hits for these gene sets extend early experimental findings to implicate miR-218 as a potential regulator of neuronal activity at hippocampal synapses.

miRNA co-targeting is prevalent

Our miRNA-pathway map indicates that some miRNAs function in the same pathway(s) by targeting a similar set of genes. Indeed, many miRNAs may function together (via "co-targeting") to regulate target-gene expression. To assess the prevalence of co-targeting and infer which miRNAs are co-targeting partners, we next used sets of genes likely regulated by particular miRNAs to create a miRNA-to-miRNA mapping. Specifically, our inputs to mirBridge were the predicted target sets (PTS) of 73 deeply conserved human miRNA families. We call a miRNA family Y a "co-targeting partner" of a miRNA family X if at least one of Y's seed-matched sequences has a significant mirBridge q value in the PTS of X, and denote the relationship as "X->Y." We predicted co-targeting relationships for all ordered pairs of the 73 families (73 × 72 = 5256 distinct pairs). Our results indicate that miRNA co-targeting is prevalent: 221 distinct X->Y co-targeting relationships are inferred at an FDR cutoff of 0.2 (Table S11). A subset of these predictions corresponds to miRNA genomic clusters (Yu et al. 2006), such as the miR-19b-2/106a cluster on Xq26.2 and the miR-17-18-19a-20-92 cluster on 13q31.3 (Table S11).
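The X->Y definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation (which was written in Matlab); the function names, the input layout (a dictionary mapping each ordered pair to the q values of Y's seed-matched motifs in the PTS of X), and the use of "less than or equal to" at the cutoff are all assumptions made for illustration.

```python
from itertools import permutations

def call_cotargeting(q_values, families, fdr_cutoff=0.2):
    """Return {(x, y): q} for every ordered pair called as X->Y.

    q_values (hypothetical layout): maps (x, y) to a list of q values, one per
    seed-matched motif of family y scored in the predicted target set of x.
    A pair is called when the smallest motif q value passes the cutoff, and
    that smallest q value is reported for the pair.
    """
    calls = {}
    for x, y in permutations(families, 2):  # ordered pairs, x != y
        motif_qs = q_values.get((x, y))
        if motif_qs and min(motif_qs) <= fdr_cutoff:
            calls[(x, y)] = min(motif_qs)
    return calls

def mutual_pairs(calls):
    """Ordered pairs X->Y whose reciprocal Y->X was also called."""
    return {(x, y) for (x, y) in calls if (y, x) in calls}

# 73 families give 73 * 72 ordered pairs, matching the count in the text.
n_ordered_pairs = sum(1 for _ in permutations(range(73), 2))
```

With 73 families the enumeration yields the 5256 ordered pairs quoted in the text; `mutual_pairs` corresponds to the reciprocity check discussed in the following paragraphs.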
Co-targeting pairs in close genomic proximity are not surprising: these miRNAs are polycistronic and co-expressed, and are thus likely to function together to regulate common targets. In fact, clustered miRNAs are enriched for co-targeting relationships: when X and Y are members of a genomic cluster, they are predicted as co-targeting partners 25% of the time, compared to 3% when X and Y are not clustered. Consistent with this, the median q value of clustered pairs is significantly lower than that of unclustered ones (p < 2.1 × 10^-7, Mann-Whitney test; see Table S11 for the clusters used in this analysis), indicating that our method for detecting co-targeting is sensitive, specific, and capable of uncovering biologically relevant signals.

If our predictions reflect bona fide biological signals, we also expect a significant percentage of the X->Y pairs to possess mutual co-targeting relationships, i.e. each miRNA's putative binding sites would have a score below the FDR cutoff in the other miRNA's PTS. Indeed, 96/221 (43%) of the X->Y predicted pairs do. Though the remaining 57% of the X->Y pairs do not have the corresponding Y->X pairs falling below the FDR cutoff, there is nonetheless a significant correlation between their q values (Spearman correlation = 0.42 (p = 0); Figure S2). Also, the reciprocal (Y->X) q values of significant X->Y pairs are lower than those of pairs with q values greater than 0.2 (p < 5 × 10^-14, Mann-Whitney test). The general reciprocation of co-targeting scores indicates that a significant percentage of our predictions are specific and that the signals we are detecting are likely biologically relevant.

We also tested whether co-targeting relationships could be inferred from gene set overlaps, where the X->Y q value was computed using Fisher's exact test (FET) on the number of genes shared between the PTSs of the miRNA family pair.
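For readers unfamiliar with the overlap test, the following Python sketch computes a one-sided Fisher's exact p value for the number of genes shared by two predicted target sets, using the hypergeometric tail. The one-sided form, the function name, and the choice of population size are assumptions; the source does not state these details here.

```python
from math import comb

def overlap_p_value(n_x, n_y, n_shared, n_population):
    """P(overlap >= n_shared) when n_y genes are drawn at random from a
    population of n_population genes, n_x of which belong to set X.

    This is the upper hypergeometric tail, i.e. a one-sided Fisher's exact
    test for enrichment of shared genes (an assumed choice of alternative).
    """
    max_shared = min(n_x, n_y)
    total = comb(n_population, n_y)
    tail = sum(
        comb(n_x, k) * comb(n_population - n_x, n_y - k)
        for k in range(n_shared, max_shared + 1)
    )
    return tail / total
```

Because the test depends only on set sizes and the overlap count, any property that makes the same genes likely targets of many miRNAs inflates the overlap for most pairs.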
This analysis failed to provide informative results because almost all tested pairs have a significant q value: 2264 (86%) and 2628 (100%) of the pairs have a q value of less than 0.05 by using the Bonferroni and FDR correction, respectively. This suggests that a core set of genes is frequently predicted as targets for many miRNA family pairs; these likely correspond to genes with highly conserved 3' UTRs and/or low GC content, properties that favor a gene being predicted as a target using TargetScan. This result strongly suggests that the degree of PTS overlap is not sufficiently specific to detect authentic co-targeting relationships, whereas mirBridge has superior specificity and is thus able to provide biologically relevant signals, as shown above.

Network analysis of co-targeting interactions

Our co-targeting predictions can naturally be organized as a network in which the nodes are miRNA families and the directed edges between nodes denote the X->Y predictions. A network representation enables examination of connectivity patterns beyond pairwise interactions. We first checked whether the edges in the network are evenly distributed across nodes or concentrated around a few nodes ("hubs"). Strikingly, the edges connecting the 10 most highly connected nodes (out of 69 nodes with at least one adjacent edge) account for more than 55% (123/221) of the edges in the network (Figure 3A and Table S11). While overall the size of a miRNA family's PTS is correlated to its connectivity ranking (p = 10^-6, Spearman correlation), this correlation becomes insignificant when restricted to families with at least 900 predicted targets (p > 0.1). Since only six of the top 40 most-connected families have fewer than 900 predicted targets, the size of a miRNA family's PTS alone cannot explain the connectivity pattern among the top 40 families. The hub miRNA families probably have functions in diverse contexts.
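The hub statistic can be sketched in Python as follows. The sketch assumes that "edges connecting the 10 most highly connected nodes" means edges with at least one hub endpoint; this reading is suggested by the fact that 10 nodes admit at most 10 × 9 = 90 directed edges among themselves, fewer than the 123 reported. The function name and the degree definition (in-degree plus out-degree) are likewise assumptions.

```python
from collections import Counter

def hub_edge_share(edges, top_k=10):
    """Return (hubs, fraction of edges touching at least one hub).

    edges: iterable of directed (x, y) pairs, one per X->Y prediction.
    A node's connectivity is taken as in-degree plus out-degree (assumed).
    """
    edges = list(edges)
    degree = Counter()
    for x, y in edges:
        degree[x] += 1
        degree[y] += 1
    hubs = {node for node, _ in degree.most_common(top_k)}
    touching = sum(1 for x, y in edges if x in hubs or y in hubs)
    return hubs, touching / len(edges)
```

Applied to the full set of X->Y predictions with `top_k=10`, a function of this kind would report the share of edges attributable to the hub families.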
For example, some hubs have a large number of members and therefore are likely to have more diverse functions depending on the spatial-temporal expression of individual miRNAs (e.g. miR-93.hd/291-3p/294/295/302/372/373/520).

We reasoned that groups of tightly interconnected nodes might represent miRNAs that perform similar functions. To identify such groups we used a graph clustering tool that ignores edge weights to identify tightly interconnected nodes (Bader and Hogue 2003) (Figure 3B). We find that subnetwork 1 has four families and is the largest and most highly interconnected; three of the families (miR-17-5p, -130, -93.hd) are among the most connected families (Figure 3A). This subnetwork is also well connected to subnetwork 3 (miR-18, -19, -181), probably because miR-17-18-19-20 are co-expressed from a polycistronic transcript. The miR-17 cluster is known to be overexpressed in a number of human cancers, including B-cell tumors, and miR-142 is also highly expressed in B cells (Chen and Lodish 2005, Mendell 2008). Their shared PTS is enriched for genes in developmental processes (p < 3.8 x 10a), consistent with the miR-17 cluster's function in the development of B cells, the heart, and lungs (Mendell 2008, Ventura et al. 2008). Our linking of the miR-142 and miR-130/301 families, whose functions are largely unknown, to the miR-17 cluster suggests that these miRNA families also participate in similar developmental and oncogenic processes.

Discussion

We have introduced a systematic method for inferring miRNA functions by assessing the enrichment of likely functional target sites in gene sets. Key features of mirBridge include combining test metrics that detect different aspects of functional targeting, and a sampling algorithm for removing gene set biases to improve estimation of statistical significance.
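The sampling idea can be sketched in Python, following the description in the Figure 1B caption: each gene in the set is replaced by a random gene drawn with probability inversely proportional to its distance in the space of 3' UTR length, GC content, and conservation level, and the site count N is recomputed over many random sets. This is an illustrative sketch, not the authors' Matlab code. The names, the small `eps` guard against zero distance, the exclusion of the gene itself, the independent draw for each gene, and the add-one p value are assumptions, and the features are assumed to be scaled to comparable ranges beforehand.

```python
import math
import random

def sample_matched_gene_set(gene_set, features, rng, eps=1e-6):
    """Replace each gene by a random gene with similar 3' UTR properties.

    features: maps gene -> (utr_length, gc_content, conservation), pre-scaled.
    Each replacement is drawn with weight 1 / (distance + eps).
    """
    sampled = []
    for gene in gene_set:
        pool = [c for c in features if c != gene]  # assumption: exclude the gene itself
        weights = [1.0 / (math.dist(features[gene], features[c]) + eps) for c in pool]
        sampled.append(rng.choices(pool, weights=weights, k=1)[0])
    return sampled

def empirical_p_value(observed_n, gene_set, features, site_counts, n_draws=5000, seed=0):
    """Fraction of matched random gene sets whose site count reaches observed_n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        random_set = sample_matched_gene_set(gene_set, features, rng)
        if sum(site_counts[g] for g in random_set) >= observed_n:
            hits += 1
    return (hits + 1) / (n_draws + 1)  # add-one correction avoids p = 0 (assumed)
```

Matching on UTR properties means a gene set made up of long, highly conserved UTRs is compared against random sets with similarly long, conserved UTRs, so a high site count is not credited merely to those properties.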
Hundreds of human miRNA-function associations were inferred by mirBridge; some are reassuringly supported by published experiments, but many are as yet untested and/or provide mechanistic insights beyond published data.

Our results provide hints about the general principles of miRNA-mediated regulation in networks. While some miRNAs could act as global regulators by repressing up to thousands of targets genome-wide (Lewis et al. 2005), many appear to have pathway-specific functions, and these miRNAs tend to target multiple genes in the same pathway. Typically, the predicted targets of the miRNA are genes that drive pathway activity in a coherent direction (e.g. miR-16 targeting of G1-to-S-promoting genes). Such coordinate targeting could partially explain how individual miRNAs can be potent effectors of pathway activity even though the amount of repression conferred by miRNAs tends to be modest for any single target (Baek et al. 2008, Selbach et al. 2008, Xiao and Rajewsky 2009). As was observed earlier (Martinez et al. 2008, Tsang et al. 2007), some of our predictions (e.g. miR-1) involve miRNAs mediating feedback and feedforward loops, whose functions include protein homeostasis and signal amplification, respectively. For example, miRNAs could be "master" regulators of pathways and thus serve as effective therapeutic targets because positive feedbacks could amplify small changes in protein concentration conferred by miRNA targeting of multiple genes. Our analysis also indicates that miRNAs can function in, and mediate crosstalk among, multiple canonical pathways, such as miR-16's potential roles across the cell cycle and Wnt pathways to coordinately regulate cellular growth and proliferation.

mirBridge also facilitates context-specific target prediction: one can first predict which pathways a miRNA regulates and then compile high-quality putative targets within a pathway.
This strategy may be especially effective for miRNAs that function in only a few pathways, as targets predicted genome-wide may have low specificity (Lewis et al. 2005). Additional filtering can be used to strengthen the target predictions, for example, by requiring that the putative target and the miRNA be significantly correlated in their expression using miRNA-mRNA expression data sets (Lu et al. 2005) (Table S9).

In addition to providing functional links across miRNAs, our human miRNA-miRNA map provides, to the best of our knowledge, the first genome-wide evidence that miRNA co-targeting is prevalent, and that a handful of hub miRNA families are involved in a large fraction of the co-targeting connections. The abundance of co-targeting further suggests that while individual miRNAs may provide only modest levels of repression, combinatorial targeting by multiple miRNAs (Krek et al. 2005) can potentially achieve a wide range of target-level modulations. Given that multiple miRNAs are expressed at different levels in any given cell type, individual genes can evolve combinations of miRNA binding sites to optimize expression levels across cell types (Bartel and Chen 2004). miRNA target sites are short and could thus be acquired or lost relatively quickly over evolution to fine-tune gene expression levels.

Designating a group of miRNAs as "co-targeting" does not necessarily imply that these miRNAs are co-expressed so as to regulate their common targets at the same time and place. In fact, the exact opposite is also likely: different miRNAs are responsible for controlling a given set of targets in different contexts. In general, a combination of the above scenarios is likely for individual cases, and additional data (e.g. miRNA and target expression profiles) are needed to further dissect the mechanistic basis of individual co-targeting predictions.

mirBridge is currently limited to assessing enrichment at the level of miRNA families using seed-matched motifs.
This limitation is largely due to our lack of general understanding of miRNA-target interaction beyond seed pairing and features captured by the context score. In principle, the mirBridge methodology is general and can be applied to any combinations of gene sets, sequence motifs, and site scoring metrics, including non-miRNA motifs, such as those involved in regulating mRNA stability. Given mirBridge's ability to simultaneously correct for multiple gene set biases, and the increasing availability of genomes and annotated gene sets, mirBridge is poised to serve as a key resource for the comprehensive functional dissection of miRNAs and other regulatory sequence motifs in genomes.

Experimental Procedures

Seed-matched site compilation

miRNA family memberships, 3' UTR sequences, seed-matched sites and their context scores and conservation status were downloaded from TargetScan (http://www.targetscan.org/vert_40/). For each known human gene, the number of seed-matched sites for each miRNA family, the number of those that are conserved, and the context score were computed. Since the context score depends on the full miRNA sequence, the context score for a miRNA family is defined as the average over all human members of that family.

mirBridge

The method as described in the text was implemented in Matlab. More details and related discussions can be found in Supplementary Experimental Procedures.

miRNA function analysis

Canonical signaling pathway and KEGG gene sets were downloaded from http://www.broad.mit.edu/gsea/msigdb/index.jsp. The cancer, CORUM, and GO sets were downloaded from http://robotics.stanford.edu/~erans/cancer/, http://mips.helmholtz-muenchen.de/genre/proj/corum, and NCBI Gene, respectively. To reduce noise and avoid spurious annotations, we only used GO annotations with experimental and peer-reviewed evidence. A miRNA-gene set prediction requires at least one of the miRNA seed motifs (m2-8 and/or m7-A) to test as significant in the gene set.
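As an illustration of what the two seed motifs look like for a concrete miRNA, the Python sketch below derives them from a mature miRNA sequence and counts their occurrences in a 3' UTR. It assumes that "m2-8" denotes the 7-nt site pairing with miRNA nucleotides 2-8 and that "m7-A" denotes the site pairing with nucleotides 2-7 followed by an adenosine, by analogy with the site types used by TargetScan; this interpretation, the function names, and the toy UTR are not taken from the source. The miR-16 sequence is the commonly reported mature sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_matched_motifs(mirna):
    """Target-site motifs (5' to 3' on the mRNA) for a mature miRNA sequence."""
    return {
        "m2-8": reverse_complement(mirna[1:8]),        # pairs with miRNA nt 2-8
        "m7-A": reverse_complement(mirna[1:7]) + "A",  # pairs with nt 2-7, then an A
    }

def count_sites(utr, motif):
    """Number of (possibly overlapping) occurrences of motif in a 3' UTR."""
    return sum(1 for i in range(len(utr) - len(motif) + 1) if utr.startswith(motif, i))

MIR16 = "UAGCAGCACGUAAAUAUUGGCG"  # commonly reported mature miR-16 sequence
motifs = seed_matched_motifs(MIR16)
toy_utr = "AAUGCUGCUAAA"          # hypothetical UTR fragment
site_counts = {name: count_sites(toy_utr, m) for name, m in motifs.items()}
```

Counts of this kind, gathered per gene and per miRNA family, are the raw inputs from which site-occurrence statistics can be computed.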
The q value reported for individual miRNAs corresponds to the q value of the seed motif with the smaller p value.

miRNA family selection

The deeply conserved miRNAs are ones that are conserved across human, mouse, rat, dog and chicken. We focused on these miRNAs because they probably have (1) more conserved functions, (2) a larger number of targets compared to less-conserved miRNAs, and (3) stronger conservation enrichment signals.

Target prediction

Targets were compiled for each miRNA by including genes with at least one conserved seed-match (across human, mouse, rat and dog) or a seed-match with a context score of greater than 68 in the 3' UTR (see Supplemental Experimental Procedures). Predictions based on context score alone were included because functional target sites can be imperfectly conserved. High-quality putative targets in gene sets (Tables 1 and S1) were compiled using the same definition.

X->Y predictions and analysis

mirBridge was applied to the predicted target set of each miRNA family. Only the seed-matched motifs of the 73 families were scored. When both seed-matched motifs of a miRNA family test as significant, the smaller q value is used as the X->Y q value. Human miRNA clusters were obtained from Yu et al. (2006).

Predicted target set overlap analysis

The number of genes shared between the predicted target sets of each miRNA-family pair was computed. The statistical significance was computed using Fisher's exact test (see Supplemental Experimental Procedures).

Predicted target set and pathway overlap analysis

Similar to above except that (1) all genes that are not predicted as a target for any miRNA were removed from the pathway gene sets; and (2) the population size is taken as the number of genes that are predicted as a target for at least one miRNA family and belong to at least one pathway.

Acknowledgments

We thank H. Fraser, D. Muzzey, M. Narayanan and M. Umbarger for comments on the manuscript; J. Zhu for discussions; D.
Bartel for the suggestion to examine co-targeting by polycistronic miRNAs; M. Fang for help on importing gene sets. This work was supported by a NIH Director's Pioneer Award to A.v.O.; J.T. was partially supported by a doctoral scholarship from the NSERC of Canada; M.S.E. was supported by a HHMI Predoctoral Scholarship and Paul and Cleo Schimmel Scholarship.

References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29 (2000).
Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 2 (2003).
Baek D, Villén J, Shin C, Camargo FD, Gygi SP, Bartel DP. The impact of microRNAs on protein output. Nature 455, 64-71 (2008).
Bartel DP, Chen CZ. Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs. Nat. Rev. Genet. 5, 396-400 (2004).
Bonci D, Coppola V, Musumeci M, Addario A, Giuffrida R, Memeo L, D'Urso L, Pagliuca A, Biffoni M, Labbaye C, Bartucci M, Muto G, Peschle C, De Maria R. The miR-15a-miR-16-1 cluster controls prostate cancer by targeting multiple oncogenic activities. Nat. Med. 14, 1271-1277 (2008).
Bravo-Egana V, Rosero S, Molano RD, Pileggi A, Ricordi C, Dominguez-Bendala J, Pastori RL. Quantitative differential expression analysis reveals miR-7 as major islet microRNA. Biochem. Biophys. Res. Commun. 366, 922-926 (2008).
Burk U, Schubert J, Wellner U, Schmalhofer O, Vincan E, Spaderna S, Brabletz T. A reciprocal repression between ZEB1 and members of the miR-200 family promotes EMT and invasion in cancer cells. EMBO Rep. 9, 582-589 (2008).
Bushati N, Cohen SM. microRNA functions. Annu. Rev. Cell Dev. Biol. 23, 175-205 (2007).
Calin GA, Ferracin M, Cimmino A, Di Leva G, Shimizu M, Wojcik SE, Iorio MV, Visone R, Sever NI, Fabbri M, Iuliano R, Palumbo T, Pichiorri F, Roldo C, Garzon R, Sevignani C, Rassenti L, Alder H, Volinia S, Liu CG, Kipps TJ, Negrini M, Croce CM. A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N. Engl. J. Med. 353, 1793-1801 (2005).
Care A, Catalucci D, Felicetti F, Bonci D, Addario A, Gallo P, Bang ML, Segnalini P, Gu Y, Dalton ND, Elia L, Latronico MV, Hoydal M, Autore C, Russo MA, Dorn GW 2nd, Ellingsen O, Ruiz-Lozano P, Peterson KL, Croce CM, Peschle C, Condorelli G. MicroRNA-133 controls cardiac hypertrophy. Nat. Med. 13, 613-618 (2007).
Chan JA, Krichevsky AM, Kosik KS. MicroRNA-21 is an antiapoptotic factor in human glioblastoma cells. Cancer Res. 65, 6029-6033 (2005).
Chang TC, Wentzel EA, Kent OA, Ramachandran K, Mullendore M, Lee KH, Feldmann G, Yamakuchi M, Ferlito M, Lowenstein CJ, Arking DE, Beer MA, Maitra A, Mendell JT. Transactivation of miR-34a by p53 broadly influences gene expression and promotes apoptosis. Mol. Cell 26, 745-752 (2007).
Chen CZ, Lodish HF. MicroRNAs as regulators of mammalian hematopoiesis. Semin. Immunol. 17, 155-165 (2005).
Cheng LC, Pastrana E, Tavazoie M, Doetsch F. miR-124 regulates adult neurogenesis in the subventricular zone stem cell niche. Nat. Neurosci. 12, 399-408 (2009).
Chhabra R, Adlakha YK, Hariharan M, Scaria V, Saini N. Upregulation of miR-23a~27a~24-2 cluster induces caspase-dependent and -independent apoptosis in human embryonic kidney cells. PLoS One 4, e5848 (2009).
Cloonan N, Brown MK, Steptoe AL, Wani S, Chan WL, Forrest AR, Kolle G, Gabrielli B, Grimmond SM. The miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition. Genome Biol. 9, R127 (2008).
Correa-Medina M, Bravo-Egana V, Rosero S, Ricordi C, Edlund H, Diez J, Pastori RL.
MicroRNA miR-7 is preferentially expressed in endocrine cells of the developing and adult human pancreas. Gene Expr. Patterns 9, 193-199 (2009).
Esquela-Kerscher A, Trang P, Wiggins JF, Patrawala L, Cheng A, Ford L, Weidhaas JB, Brown D, Bader AG, Slack FJ. The let-7 microRNA reduces tumor growth in mouse models of lung cancer. Cell Cycle 7, 759-764 (2008).
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821 (2005).
Gregory PA, Bert AG, Paterson EL, Barry SC, Tsykin A, Farshid G, Vadas MA, Khew-Goodall Y, Goodall GJ. The miR-200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat. Cell Biol. 10, 593-601 (2008).
Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36, D154-158 (2008).
Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91-105 (2007).
He L, He X, Lim LP, de Stanchina E, Xuan Z, Liang Y, Xue W, Zender L, Magnus J, Ridzon D, Jackson AL, Linsley PS, Chen C, Lowe SW, Cleary MA, Hannon GJ. A microRNA component of the p53 tumour suppressor network. Nature 447, 1130-1134 (2007).
Heineke J, Molkentin JD. Regulation of cardiac hypertrophy by intracellular signalling pathways. Nat. Rev. Mol. Cell Biol. 7, 589-600 (2006).
Ji Q, Hao X, Meng Y, Zhang M, Desano J, Fan D, Xu L. Restoration of tumor suppressor miR-34 inhibits human p53-mutant gastric cancer tumorspheres. BMC Cancer 8, 266 (2008).
Ji Q, Hao X, Zhang M, Tang W, Yang M, Li L, Xiang D, Desano JT, Bommer GT, Fan D, Fearon ER, Lawrence TS, Xu L. MicroRNA miR-34 inhibits human pancreatic cancer tumor-initiating cells. PLoS One 4, e6816 (2009).
Joachim H. A Note on Combining Dependent Tests of Significance. Biometrical Journal 41, 849-855 (1999).
Joglekar MV, Joglekar VM, Hardikar AA. Expression of islet-specific microRNAs during human pancreatic development. Gene Expr. Patterns 9, 109-113 (2009).
Johnnidis JB, Harris MH, Wheeler RT, Stehling-Sun S, Lam MH, Kirak O, Brummelkamp TR, Fleming MD, Camargo FD. Regulation of progenitor cell proliferation and granulocyte function by microRNA-223. Nature 451, 1125-1129 (2008).
Johnson CD, Esquela-Kerscher A, Stefani G, Byrom M, Kelnar K, Ovcharenko D, Wilson M, Wang X, Shelton J, Shingara J, Chin L, Brown D, Slack FJ. The let-7 microRNA represses cell proliferation pathways in human cells. Cancer Res. 67, 7713-7722 (2007).
Jones SW, Watkins G, Le Good N, Roberts S, Murphy CL, Brockbank SM, Needham MR, Read SJ, Newham P. The identification of differentially expressed microRNA in osteoarthritic tissue that modulate the production of TNF-alpha and MMP13. Osteoarthritis Cartilage 17, 464-472 (2009).
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27-30 (2000).
Kefas B, Godlewski J, Comeau L, Li Y, Abounader R, Hawkinson M, Lee J, Fine H, Chiocca EA, Lawler S, Purow B. microRNA-7 inhibits the epidermal growth factor receptor and the Akt pathway and is down-regulated in glioblastoma. Cancer Res. 68, 3566-3572 (2008).
Korpal M, Lee ES, Hu G, Kang Y. The miR-200 family inhibits epithelial-mesenchymal transition and cancer cell migration by direct targeting of E-cadherin transcriptional repressors ZEB1 and ZEB2. J. Biol. Chem. 283, 14910-14914 (2008).
Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N. Combinatorial microRNA target predictions. Nat. Genet. 37, 495-500 (2005).
Krutzfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M. Silencing of microRNAs in vivo with 'antagomirs'. Nature 438, 685-689 (2005).
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T.
Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl Acad. Sci. USA 105, 3903-3908 (2008).
Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).
Li QJ, Chau J, Ebert PJ, Sylvester G, Min H, Liu G, Braich R, Manoharan M, Soutschek J, Skare P, Klein LO, Davis MM, Chen CZ. miR-181a is an intrinsic modulator of T cell sensitivity and selection. Cell 129, 147-161 (2007).
Li Z, Hassan MQ, Jafferji M, Aqeilan RI, Garzon R, Croce CM, van Wijnen AJ, Stein JL, Stein GS, Lian JB. Biological functions of miR-29b contribute to positive regulation of osteoblast differentiation. J. Biol. Chem. 284, 15676-15684 (2009).
Li Z, Hassan MQ, Volinia S, van Wijnen AJ, Stein JL, Croce CM, Lian JB, Stein GS. A microRNA signature for a BMP2-induced osteoblast lineage commitment program. Proc. Natl Acad. Sci. USA 105, 13906-13911 (2008).
Linsley PS, Schelter J, Burchard J, Kibukawa M, Martin MM, Bartz SR, Johnson JM, Cummins JM, Raymond CK, Dai H, Chau N, Cleary M, Jackson AL, Carleton M, Lim L. Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol. Cell Biol. 27, 2240-2252 (2007).
Liu Q, Fu H, Sun F, Zhang H, Tie Y, Zhu J, Xing R, Sun Z, Zheng X. miR-16 family induces cell cycle arrest by regulating multiple cell cycle genes. Nucleic Acids Res. 36, 5391-5404 (2008).
Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, Downing JR, Jacks T, Horvitz HR, Golub TR. MicroRNA expression profiles classify human cancers. Nature 435, 834-838 (2005).
Lu TX, Munitz A, Rothenberg ME. MicroRNA-21 is up-regulated in allergic airway inflammation and regulates IL-12p35 expression. J. Immunol. 182, 4994-5002 (2009).
Martinez NJ, Ow MC, Barrasa MI, Hammell M, Sequerra R, Doucette-Stamm L, Roth FP, Ambros VR, Walhout AJ.
A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev. 22, 2535-2549 (2008).
Mendell JT. miRiad roles for the miR-17-92 cluster in development and disease. Cell 133, 217-222 (2008).
Miller TE, Ghoshal K, Ramaswamy B, Roy S, Datta J, Shapiro CL, Jacob S, Majumder S. MicroRNA-221/222 confers tamoxifen resistance in breast cancer by targeting p27Kip1. J. Biol. Chem. 283, 29897-29903 (2008).
Nagai T, Tanaka-Ishikawa M, Aikawa R, Ishihara H, Zhu W, Yazaki Y, Nagai R, Komuro I. Cdc42 plays a critical role in assembly of sarcomere units in series of cardiac myocytes. Biochem. Biophys. Res. Commun. 305, 806-810 (2003).
Park SM, Gaur AB, Lengyel E, Peter ME. The miR-200 family determines the epithelial phenotype of cancer cells by targeting the E-cadherin repressors ZEB1 and ZEB2. Genes Dev. 22, 894-907 (2008).
Pickering MT, Stadler BM, Kowalik TF. miR-17 and miR-20a temper an E2F1-induced G1 checkpoint to regulate cell cycle progression. Oncogene 28, 140-145 (2009).
Rajewsky N. microRNA target predictions in animals. Nat. Genet. 38 Suppl, S8-13 (2006).
Raver-Shapira N, Marciano E, Meiri E, Spector Y, Rosenfeld N, Moskovits N, Bentwich Z, Oren M. Transcriptional activation of miR-34a contributes to p53-mediated apoptosis. Mol. Cell 26, 731-743 (2007).
Ruepp A, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Stransky M, Waegele B, Schmidt T, Doudieu ON, Stumpflen V, Mewes HW. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, D646-650 (2008).
Sayed D, Hong C, Chen IY, Lypowy J, Abdellatif M. MicroRNAs play an essential role in the development of cardiac hypertrophy. Circ. Res. 100, 416-424 (2007).
Schultz J, Lorenz P, Gross G, Ibrahim S, Kunz M. MicroRNA let-7b targets important cell cycle molecules in malignant melanoma cells and interferes with anchorage-independent growth. Cell Res. 18, 549-557 (2008).
Segal E, Friedman N, Koller D, Regev A.
A module map showing conditional activity of expression modules in cancer. Nat. Genet. 36, 1090-1098 (2004).
Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N. Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63 (2008).
Sempere LF, Freemantle S, Pitha-Rowe I, Moss E, Dmitrovsky E, Ambros V. Expression profiling of mammalian microRNAs uncovers a subset of brain-expressed microRNAs with possible roles in murine and human neuronal differentiation. Genome Biol. 5, R13 (2004).
Siegel G, Obernosterer G, Fiore R, Oehmen M, Bicker S, Christensen M, Khudayberdiev S, Leuschner PF, Busch CJ, Kane C, Hubel K, Dekker F, Hedberg C, Rengarajan B, Drepper C, Waldmann H, Kauppinen S, Greenberg ME, Draguhn A, Rehmsmeier M, Martinez J, Schratt GM. A functional screen implicates microRNA-138-dependent regulation of the depalmitoylation enzyme APT1 in dendritic spine morphogenesis. Nat. Cell Biol. 11, 705-716 (2009).
Stark A, Brennecke J, Bushati N, Russell RB, Cohen SM. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3' UTR evolution. Cell 123, 1133-1146 (2005).
Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440-9445 (2003).
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545-15550 (2005).
Sudhof TC. The synaptic vesicle cycle. Annu. Rev. Neurosci. 27, 509-547 (2004).
Taganov KD, Boldin MP, Chang KJ, Baltimore D. NF-kappaB-dependent induction of microRNA miR-146, an inhibitor targeted to signaling proteins of innate immune responses. Proc. Natl Acad. Sci. USA 103, 12481-12486 (2006).
Tarasov V, Jung P, Verdoodt B, Lodygin D, Epanchintsev A, Menssen A, Meister G, Hermeking H.
Differential regulation of microRNAs by p53 revealed by massively parallel sequencing: miR-34a is a p53 target that induces apoptosis and G1-arrest. Cell Cycle 6, 1586-1593 (2007).
Thai TH, Calado DP, Casola S, Ansel KM, Xiao C, Xue Y, Murphy A, Frendewey D, Valenzuela D, Kutok JL, Schmidt-Supprian M, Rajewsky N, Yancopoulos G, Rao A, Rajewsky K. Regulation of the germinal center response by microRNA-155. Science 316, 604-608 (2007).
Tsang J, Zhu J, van Oudenaarden A. MicroRNA-mediated feedback and feedforward loops are recurrent network motifs in mammals. Mol. Cell 26, 753-767 (2007).
van Rooij E, Sutherland LB, Thatcher JE, DiMaio JM, Naseem RH, Marshall WS, Hill JA, Olson EN. Dysregulation of microRNAs after myocardial infarction reveals a role of miR-29 in cardiac fibrosis. Proc. Natl Acad. Sci. USA 105, 13027-13032 (2008).
Ventura A, Young AG, Winslow MM, Lintault L, Meissner A, Erkeland SJ, Newman J, Bronson RT, Crowley D, Stone JR, Jaenisch R, Sharp PA, Jacks T. Targeted deletion reveals essential and overlapping functions of the miR-17 through 92 family of miRNA clusters. Cell 132, 875-886 (2008).
Visvanathan J, Lee S, Lee B, Lee JW, Lee SK. The microRNA miR-124 antagonizes the anti-neural REST/SCP1 pathway during embryonic CNS development. Genes Dev. 21, 744-749 (2007).
Webster RJ, Giles KM, Price KJ, Zhang PM, Mattick JS, Leedman PJ. Regulation of epidermal growth factor receptor signaling in human cancer cells by microRNA-7. J. Biol. Chem. 284, 5731-5741 (2009).
Wegman EJ. Nonparametric Probability Density Estimation: I. A Summary of Available Methods. Technometrics 14, 533 (1972).
Xiao C, Rajewsky K. MicroRNA control in the immune system: basic principles. Cell 136, 26-36 (2009).
Xie H, Lim B, Lodish HF. MicroRNAs induced during adipogenesis that accelerate fat cell development are downregulated in obesity. Diabetes 58, 1050-1057 (2009).
Yang Z, Kaye DM.
Mechanistic insights into the link between a polymorphism of the 3' UTR of the SLC7A1 gene and hypertension. Hum. Mutat. 30, 328-333 (2009). Yu F, Yao H, Zhu P, Zhang X, Pan Q, Gong C, Huang Y, Hu X, Su F, Lieberman J, Song E. let-7 regulates self renewal and tumorigenicity of breast cancer cells. Cell 131, 11091123 (2007). Yu J, Wang F, Yang GH, Wang FL, Ma YN, Du ZW, Zhang JW. Human microRNA clusters: genomic organization and expression profile in leukemia cell lines. Biochem. Biophys. Res. Commun. 349, 59-68 (2006). Zhao JJ, Lin J, Yang H, Kong W, He L, Ma X, Coppola D, Cheng JQ. MicroRNA221/222 negatively regulates estrogen receptor alpha and is associated with tamoxifen resistance in breast cancer. J. Biol. Chem. 283, 31079-31086 (2008). 126 .......... . wwffi* e ... ....... i ............... ............ Mg Figures Figure 1. mirBridge overview (A) The input to mirBridge is a set of genes. Red and blue squares denote conserved and nonconserved seed-matched sites in the 3' UTR respectively. The number inside the squares denotes the context score. For each miRNA target sequence of interest, mirBridge computes the N, K, H, and T as illustrated. (B) The procedure for evaluating whether N is significantly higher than that of comparable random gene sets (the OC test). To obtain the null distribution for N, random gene sets with similar 3' UTR properties were constructed by replacing each gene in the original set (gl. ... g,; solid red dots) by a randomly drawn gene (r, r2 , ... r). The probability that ri is drawn to replace gi is inversely proportional to its distance to g, in the 3-D space defined by 3' UTR length, GC content and general conservation level. The histogram depicts the null distribution of N for miR-16 in the cell-cycle gene set. 

(C) The procedure for evaluating whether K and H are significantly higher than those of random gene sets containing T putative targets with similar 3' UTR properties as the putative targets in G (the CE and CTX tests, respectively). The same gene sampling procedure from (B) is used except that only the putative targets in G (empty red dots) are replaced by random putative targets, so that T is identical across G and the random gene sets. The histograms depict the null distributions of K and H, respectively, for random gene sets with T=5 putative targets for miR-16 and the cell-cycle gene set.

[Figure 1 image. Panel A: 3' UTRs of gene 1 to gene n in gene set G, with the definitions of N, K, H, and T. Panel B: Site Occurrence Signature (OC); p(N) is obtained by drawing 5000 random gene sets; is N significantly higher than in comparable random gene sets with n genes? Panel C: Conservation Signature (CE) and Context-score Signature (CTX); p(K|T) and p(H|T) are each obtained by drawing 5000 random putative target sets; are K and H significantly higher than in comparable random putative target sets with T=5 miR-16 putative targets?]

Figure 2. miR-1 and PIP3 signaling in cardiac hypertrophy

The orange repressive arrows depict high-quality putative targets of miR-1 in the PIP3 pathway in cardiac myocytes (see Experimental Procedures).
The rest of the network is based on known interactions compiled from the literature (Heineke and Molkentin, 2006). See Figure S1 for network diagrams of other selected predictions discussed in the text.

[Figure 2 image: network diagram linking cardiac hypertrophic signals and stress to miR-1 and its putative targets.]

Figure 3. The miRNA-cotargeting network inferred by mirBridge

The thickness of the edges is proportional to -log(q). (A) The ten most highly connected nodes and the adjacent edges are highlighted in yellow and red, respectively. (B) Examples of highly interconnected subnetworks. See also Figure S3.

[Figure 3 image with inset table of the ten most highly connected miRNA families: miR-93.hd/291-3p/294/295/302/372/373/520, miR-17-5p/20/93.mr/106/519.d, miR-130/301, miR-148/152, miR-181, miR-101, miR-34/449, miR-26, miR-19, miR-221/222.]

Table 1. Selected mirBridge predictions with published evidence. Due to space limitations, typically only targets with a conserved and high context-scoring site are shown (see Table S1 for details). "High-quality putative targets" are ones with either a conserved or high context-scoring site (see Experimental Procedures).

miRNA | Function | q value | # of high-quality putative targets | Selected targets | Evidence
146 | IL1 receptor, NFKB, Toll-like receptor signaling | 0 | 3 | TRAF6, IRAK1 | (Jones et al. 2009, Taganov et al. 2006)
15/16/195/424/497 | Cell cycle; G1 to S | 0 | | CCNE1, CCND1, CDC25A, CCND2; CCNE1, CCND1, CDC25A, CCND2, E2F3, WEE1 | (Linsley et al. 2007, Liu et al. 2008)
29 | Collagen | 0 | 7 | COL4A1, COL4A2, COL4A3, COL4A4, COL4A5 | (Li et al. 2009, van Rooij et al. 2008)
7 | ErbB signaling; glioma | 0 | 16; 12 | PTK2, PIK3CD, RAF1, ERBB4, RPS6KB1; RB1, PIK3CD, RAF1 | (Kefas et al. 2008, Webster et al. 2009)
7 | insulin signaling | 0.000208 | 18 | MKNK1, PIK3CD, RAF1, RPS6KB1, IRS2 | (Bravo-Egana et al. 2008, Correa-Medina et al. 2009, Joglekar et al. 2009)
15/16/195/424/497 | Wnt pathway | 0.0356 | 14 | FZD10, CCND1, CCND2, PAFAH1B1, PPP2R5C | (Bonci et al. 2008)
103/107 | TNF pathway | 0.0522 | 6 | HRB, MAP3K7, NR2C2 | (Xie et al. 2009)
122 | NO1 pathway | 0.0546 | 4 | CALM3, SLC7A1 | (Yang and Kaye 2009)
15/16/195/424/497 | prostate cancer | 0.07345 | 18 | PIK3R1, AKT3, CCNE1, CCND1, FGFR1, E2F3, MAP2K1 | (Bonci et al. 2008)
135 | TGF beta signaling | 0.07389 | 19 | SMAD5, FKBP1A, ROCK1, SMURF2, ACVR1B, INHBA, ROCK2, TGFBR1 | (Li et al. 2008)
34a/449 | Notch signaling | 0.07389 | 7 | NUMBL, JAG1, NOTCH1, NOTCH2, DLL1 | (Ji et al. 2008, Ji et al. 2009)
21 | cytokine-cytokine receptor interaction | 0.0865 | 17 | CCL1, IL12A, FASLG, ACVR2A | (Lu et al. 2009)
1/206 | PIP3 signaling in cardiac myocytes | 0.0977 | 8 | IGF1, CDC42, CREB5, YWHAZ, PTPN1, YWHAQ, MET, PREX1 | (Care et al. 2007, Sayed et al. 2007)
17-5p/20/93.mr/106/519.d | cell cycle; G1 to S | 0.122; 0.122 | 7 | E2F1, CCND2, RBL1; E2F1, CCND2, RBL1 | (Cloonan et al. 2008, Pickering et al. 2009)
221/222 | breast cancer estrogen signaling | 0.1432 | 7 | KIT, CDKN1B, ESR1 | (Miller et al. 2008, Zhao et al. 2008)
34/449 | BAD pathway (apoptosis) | 0.1499 | 5 | BCL2, KITLG, KIT, IGF1, PRKACB | (Chang et al. 2007, Cloonan et al. 2008, He et al. 2007)
let-7/98 | breast cancer estrogen signaling | 0.1595 | 13 | CYP19A1, FASLG | (Schultz et al. 2008, Yu et al. 2007)
let-7/98 | G1 to S | 0.1871 | 8 | E2F6, CCNG2, E2F1, CCND2, WEE1, RBL1 | (Schultz et al. 2008, Yu et al. 2007)

Table 2. Selected new mirBridge miRNA function predictions (see Table S1 for details). Same format as Table 1A.

miRNA | Function | q value | # of high-quality putative targets | Selected targets
33 | statin pathway | 0.00155 | 2 | ABCA1
203 | G alpha i pathway | 0.00532 | 9 | PITX2, SHC1, SRC, ITPR2
23 | apoptosis | 0.00801 | 9 | IRF1, IRF2, BNIP3L, CHUK, CASP7
205 | tight junction | 0.01195 | 16 | CNKSR3, YES1, EPB41, PRKCE, MAGI2, ACTB, CLDN11
187 | antigen processing and presentation | 0.02192 | 9 | KIR2DL2, IFNA2, KIR2DL5A
219 | nuclear receptors | 0.02806 | 6 | THRB, NR2C2
17-5p/20/93.mr/106/519.d | MAPK pathway | 0.0377 | 12 | MAP3K3, MAPK9, DUSP8, MAP3K12, MAP3K5, MAP3K9, NR2C2, GAB1, MAP3K2
124.2/506 | axon guidance | 0.04983 | 24 | SEMA6D, CHP, NFAT5, NRAS, GNAI1, ROCK1, PLXNA3, GNAI3, ITGB1, NFATC1, NRP1, SEMA6A
34a/449 | glycosphingolipid biosynthesis | 0.05144 | 3 | FUT9
128 | GnRH signaling | 0.05144 | 13 | ADCY8, MAP2K7, GRB2, PRKX, PRKY, MAPK14
24 | cytokine-cytokine receptor interaction | 0.05203 | 32 | EDA, PDGFRA, IL1R1, TNFRSF19
375 | purine metabolism | 0.0544 | 11 | PDE4D, PDE5A
141/200a | EGF/PDGF pathway | 0.0637 | 7 | MAP2K4, STAT5A, GRB2
101 | ubiquitin mediated proteolysis | 0.07681 | 9 | UBE2D1, UBE2D2, UBE2D3, FBXW11, FBXW7, UBE2A
19 | regulation of actin cytoskeleton; Ca signaling | 0.07816; 0.07827 | 13; 21 | CFL2, ITGB8, ROCK2, CRK, RAC1, APC, ITGAV; GRIN2A, ADRB1, ADCY9, CACNA1C, ADCY1, ITPR1, CALM1, ADCY7, SLC8A1
153 | apoptosis | 0.1148 | | BCL2L11
135 | integrin pathway | 0.122 | 12 | AKT3, PTK2, ROCK1, ROCK2, ANGPTL2, PLCG1, ARHGEF6, ARHGEF7, PAK7
93.hd/291-3p/294/295/302/372/373/520 | nuclear receptors | 0.1342 | 8 | NR4A2, ESR1, NR2C2
27 | statin pathway | 0.1396 | 4 | HMGCR, ABCA1
33 | cell cycle | 0.1555 | 4 | CDK6
153 | insulin receptor signaling | 0.1934 | 9 | PIK3R1, GRB2, RPS6KB1

Table 3. Testing mirBridge on several known phenotypes compiled from the literature. The q values were computed based on simultaneous testing across miRNA seeds for the gene set. Black: highly significant; blue: marginally significant; : not significant. See also Table S2.

miRNA | Known function | p | q | rank (out of 143 seed-matched motifs) | References
141/200a | epithelial-mesenchymal transition | 0.0018 | 0.08 | 1 | (Burk et al. 2008, Gregory et al. 2008, Korpal et al. 2008, Park et al. 2008)
21 | apoptosis | 0.006 | 0.39 | 1 | (Chan et al. 2005)
155 | B cell receptor signaling | 0.007 | 0.29 | 5 | (Thai et al. 2007)
181 | T cell receptor signaling | 0.008 | 0.07 | 5 | (Li et al. 2007)
34 | P53 pathway | 0.04 | 0.32 | 14 | (Chang et al. 2007, He et al. 2007, Raver-Shapira et al. 2007, Tarasov et al. 2007)
223 | granulocyte differentiation | 0.07 | 0.62 | 15 | (Johnnidis et al. 2008)

Supplemental Information

Figure S1. Network diagrams of selected mirBridge predictions discussed in the main text (related to Figure 2). Aside from the miRNA targeting links, the networks are compiled based on the literature. (A) mirBridge predicts that miR-15/16/195 could regulate several intricately linked pathways that control cell proliferation and cancer, suggesting that a general function of the miR-15/16/195 family is to control proliferation and/or growth. Several putative targets have multiple high-quality seed-matched sites (Table S1a). (B) mirBridge indicates that miR-146 functions in NF-kB, IL4 and TOLL pathways where miR-146 mediates several negative feedback loops to upstream signaling factors. (C) mirBridge indicates that miR-33 functions in cholesterol homeostasis. miR-33a is probably co-expressed with SREBP2 because it is embedded in an intron of SREBP2. miR-33 also putatively regulates the cell cycle network and the PGC-1a pathway, forming a double-negative (i.e. positive) feedback to cholesterol.

[Figure S1 image. Panel A: miR-15/16/195, cell cycle, breast-cancer-associated and Wnt pathway activation, cell proliferation and cancer. Panel B: miR-146. Panel C: miR-33, SREBP-2, HMGCR, ABCA1, cholesterol homeostasis in liver and peripheral cells, PGC-1a, cell cycle.]

Figure S2. (related to Figure 3) Correlation between reciprocal co-targeting predictions

For each miRNA-family pair (X,Y), the lowest mirBridge p values of X->Y and Y->X are plotted against each other. The entries were partitioned into 10 bins by the X->Y p values, and the average Y->X p value was computed and plotted against the average X->Y p value of each bin (resulting in the blue line). The reciprocal p values are significantly correlated (Spearman correlation = 0.42, p = 0).
It is important to note that while many miRNA families are reciprocal co-targeting pairs (X<->Y), it is biologically plausible that X->Y need not imply Y->X. For instance, Y may function in more diverse contexts than X, yet co-targeting may be functionally important only in the contexts where X functions. A likely example, albeit on the more extreme end, involves the miR-99/100 and miR-125/351 families with 80 and 1362 predicted targets, respectively. The PTS of miR-99/100 has a large number of seed-matched sites for miR-125/351, with a significant fraction of those being conserved and/or having high context scores, yielding a q value of 0.03. In contrast, the reciprocal q value is 0.92 because the larger miR-125/351 PTS contains only a small number of sites for miR-99/100, and an insignificant fraction of those are conserved and/or have high context scores, suggesting that most of miR-125/351's functional contexts are not shared with miR-99/100. A similar example involves the miR-17 and -18 families, where the latter has a smaller PTS. Individual cases aside, PTS-size difference is not a major contributing factor: the size-difference distribution between PTSs for miRNA-family pairs having both X->Y and Y->X q values of less than 0.2 does not significantly deviate from that of pairs with a significant p value in only one direction (p = 0.24, Kolmogorov-Smirnov test).

[Plot: Correlation between reciprocal predictions; x-axis: X->Y p value.]

Figure S3. General conservation level is predictive of the conservation level of individual motifs. The distribution of correlations between general and specific conservation rates across 314 seed-matched motifs (i.e. one correlation value for each motif) is shown. The specific conservation rate was computed based on individual motifs whereas the general conservation rate was computed across all 7-mers. All but 5 of the correlations have P values less than or equal to 0.01. The results are similar if only 3' UTRs that are at least 1000 nt long are used.

[Plot: Distribution of correlation between general and specific conservation rates; x-axis: Spearman correlation; series: based on all 3' UTRs, based on 3' UTRs with length > 1000.]

Figure S4. Similar to Fig. S3, but the correlation was computed based on general conservation rate and the occurrence count of individual motifs. All but 5 of the correlations have P values less than or equal to 0.01.

[Plot: Distribution of correlation between general conservation rate and occurrence count of conserved motifs; x-axis: Spearman correlation.]

Figure S5. The general conservation rate distribution of genes in the PIP3 gene set vs. that of all genes in the genome. PIP3 genes in general have higher background conservation levels. The two distributions are significantly different (Kolmogorov-Smirnov test; the p value is as shown).

[Plot: Conservation; x-axis: log(conservation); series: All, PIP3; P = 6.3e-8.]

Figure S6. GC content of a 3' UTR is negatively correlated with the context score.

[Plot: Distribution of correlation between GC-content and context score; x-axis: Spearman correlation.]

Figure S7. Histograms are examples of kernel-based density estimators. The kernels are constant functions with a fixed value within a defined neighborhood and zero everywhere else. The red dots are samples, which were drawn from a normal distribution with mean=10 and standard deviation=5. The top example uses kernel functions of width=4. The histogram was constructed by sliding a window of size 4 starting from -10 and counting the number of samples that fall within the window. The bottom example estimates the density by using kernels of width=2.

[Plots: histograms with bandwidth=4 (top) and bandwidth=2 (bottom); x-axis: values.]

Figure S8. Density estimation by using Gaussian kernels. The red dots are samples, which were drawn from a normal distribution with mean=10 and standard deviation=5. The estimated density is the sum of normal densities with means set to the values of individual samples; the standard deviation is specified by the bandwidth parameter. The blue densities are the individual kernels and the green density is the sum. Note that when the number of samples and the bandwidth are both small, there are many local bumps in the resulting density (top plot). A larger bandwidth avoids such biases and results in a smoother estimate (bottom plot).

[Plots: Gaussian kernel density estimates with bandwidth=1 (top) and bandwidth=2 (bottom); x-axis: values.]

Figure S9. The input gene set is the PIP3 signaling pathway in cardiac myocytes. For each bandwidth parameter σ, 100 random gene sets were generated using the algorithm described in the text. The length, general conservation rate, and GC content distributions of each of the random gene sets were compared to those of the input gene set by the Kolmogorov-Smirnov (KS) test. The average KS test p value across the 100 random gene sets is plotted. Note that, as expected, the higher the bandwidth, the lower the p value. mirBridge uses the largest bandwidth such that the lowest of the three average p values is higher than a predetermined threshold.

[Plot: average KS test p value versus bandwidth (0 to 2500); series: length, conservation, GC.]

Figure S10. Context score distributions of conserved (green) and non-conserved (red) sites.
[Plot: Context score distributions of conserved and non-conserved sites; x-axis: context score.]

Supplemental Experimental Procedures

The mirBridge algorithm

Inputs:
1. A set M of miRNA seed-matched motifs. The motifs can be partitioned into two classes: m2-8 and m1-7-A-anchor
2. A gene set G with n genes and their 3' UTRs
3. The context score of all seed-matches (from M) in the 3' UTRs in G
4. A context score threshold (t)

Processing:
1. For each motif m in the class m2-8, determine the following statistics in G:
   a. The number of seed matches (N) (for OC)
   b. The number of genes (T) in G with at least one seed-matched site
   c. The number of conserved seed matches (K) (for CE)
   d. The number of seed-matches (H) with context score in the top t-percentile (for CTX)
2. Build the gene neighborhood:
   a. For each gene g in G, build an ordered array A of gene neighbors by sorting all 3' UTRs x in the genome by the normalized Euclidean distance between the 3' UTRs of g and x using length, GC content, and general conservation. A[1] is the closest, A[2] the next closest, and so on.
3. Build the putative target neighborhoods for each miRNA motif:
   a. For each motif m from Step 1, form an ordered array Am (as in Step 2) for each g in G by removing the entries in the corresponding A that do not contain the motif m in their 3' UTR (i.e. that are not putative targets of the miRNA)
4. Compute the null distribution for N (the number of seed-matches)
   a. Determine the bandwidth parameter σ
      i. For each σ from a list of possible σ's
         1. For each g in G
            a. draw a random number x from the Gaussian density with mean 0 and variance σ
            b. round x to the nearest integer and take its absolute value
            c. use x to index into g's neighbor list to draw a gene/3' UTR, i.e. A[x]
         2. Compute the Kolmogorov-Smirnov p value between the drawn random gene set and G for each of length, GC content, and general conservation
         3. Repeat the above 100 (or more) times
         4. Take the average p value for each of length, GC content, and general conservation over the 100 iterations
      ii. Pick the largest σ such that the lowest of the three average p values is greater than a given threshold (currently set to 0.67)
   b. Repeat 10,000 times (or more)
      i. Using the σ from Step a, draw a random gene set R as in Step 4-a-i-1
      ii. For each seed-matched motif in Step 1, compute N as in Step 1 for the random gene set to obtain the null distribution for N
5. Compute the null distributions for K (the number of conserved sites) and H (the number of high-context-scoring sites) conditional on T
   a. For each motif from Step 1
      i. Identify the putative targets in G (i.e. genes in G with at least one site)
      ii. Determine the bandwidth parameter as in Step 4a except: 1) use the putative target neighborhood for the motif (from Step 3); 2) only use the putative targets as members of G (i.e. ignore/remove genes without sites)
      iii. Using the procedure in Step 4-a-i-1, generate random putative target sets by replacing each of the putative targets in G with a randomly sampled putative target from the putative target neighborhood array (Am) for the motif and gene (from Step 3). Note that each random target set has exactly T genes with at least one motif site
      iv. For each random target set, compute K and H (note that by design each random target set has exactly T putative targets)
      v. Repeat 10,000 times (or more) to obtain the null K and H distributions conditional on T
6. Compute the p value for N by using the null distribution from Step 4: count the percentage of random gene sets that have an equal or higher N. Similarly compute the p values for K and H by using the null distributions from Step 5: count the percentage of random putative target sets that have an equal or higher corresponding statistic.
7. FDR analysis: computing the q values across all m2-8 motifs:
   a. Use $\lambda = 0.5$ to estimate the proportion of null features, $\pi_0$, by counting the number of p values that are greater than 0.5 and dividing this by $n(1 - \lambda)$: $\hat{\pi}_0(\lambda) = \#\{p_i > \lambda;\ i = 1, 2, \ldots, n\} / (n(1 - \lambda))$
   b. For each motif with p value $p_i$, compute q as $q_i = \hat{\pi}_0(\lambda)\, n\, p_i / \#\{p_j \le p_i;\ j = 1, 2, \ldots, n\}$
   c. For each $q_i$, set $q_i$ to $q_j^*$, where $q_j^*$ is the minimum of all q values for which the corresponding p values are greater than or equal to $p_i$.
8. Construct the composite test statistics (CE-CTX and OC-CE-CTX) and compute the corresponding p and q values (modified inverse-normal method):
   a. For each p value from the basic statistics (i.e. $p_{oc}$, $p_{ce}$, $p_{ctx}$), compute $t_{ce} = \Phi^{-1}(1 - p_{ce})$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function (i.e. normal with mean 0 and std 1). Similarly compute $t_{oc}$ and $t_{ctx}$.
   b. Construct the composite statistics for each motif in G:
      $t_{ce\_ctx} = w_{ce} t_{ce} + w_{ctx} t_{ctx}$
      $t_{oc\_ce\_ctx} = w'_{oc} t_{oc} + w'_{ce} t_{ce} + w'_{ctx} t_{ctx}$
      where $w_{ce} + w_{ctx} = 1$ and $w'_{oc} + w'_{ce} + w'_{ctx} = 1$. The w's can be adjusted to assign different weights to the basic statistics (currently $w_{ce} = 0.5$, $w_{ctx} = 0.5$ for CE-CTX; the three OC-CE-CTX weights are 0.4, 0.35, and 0.25).
   c. To compute p values, obtain the null distributions of $t_{ce\_ctx}$ and $t_{oc\_ce\_ctx}$ by the following method:
      i. Compute the covariance between each pair of $t_{ce}$, $t_{oc}$, $t_{ctx}$ (by using values across all motifs. In cases where multiple gene sets are considered, values from all motif-gene-set combinations can be used)
      ii. Compute the variance of $t_{ce\_ctx}$ and $t_{oc\_ce\_ctx}$ by using the formula:
          $\mathrm{var}(a t_1 + b t_2 + c t_3) = a^2 \mathrm{var}(t_1) + b^2 \mathrm{var}(t_2) + c^2 \mathrm{var}(t_3) + 2ab\,\mathrm{cov}(t_1, t_2) + 2ac\,\mathrm{cov}(t_1, t_3) + 2bc\,\mathrm{cov}(t_2, t_3)$
      iii. Compute the means of $t_{ce\_ctx}$ and $t_{oc\_ce\_ctx}$ by using the formula:
          $\mathrm{mean}(a t_1 + b t_2 + c t_3) = a\,\mathrm{mean}(t_1) + b\,\mathrm{mean}(t_2) + c\,\mathrm{mean}(t_3)$
      iv. The null distributions of $t_{ce\_ctx}$ and $t_{oc\_ce\_ctx}$ are normal distributions with the means and variances computed above.
      v. The p values of the observed statistics for each motif can be computed from the null distributions.
   d. Compute the q values for the composite p values as in Step 7.
9. Repeat Steps 1-8 for m1-7-A-anchor motifs

Output: For each input motif, the q value of each test is provided.

Note: If multiple gene sets are being tested simultaneously, the FDR procedure (Step 7) can be adjusted to include p values from all motif-gene-set combinations. Similarly, for Step 8c the t's from all motif-gene-set combinations can be used to estimate the covariances and to compute the q values (Step 8d).

The mirBridge null model

The discussion below focuses on defining the appropriate null models for the test statistics used in mirBridge (i.e. CE, CTX, OC). As discussed, the null model of the CE and CTX tests is based on randomizing putative target sets while that of OC is based on randomizing the entire gene set (Fig. 1 in main text). The main task in both cases, however, is that of generating a random set of genes that has properties similar to those of a particular gene set (i.e. for mirBridge the gene set can be a putative target set or the input gene set itself). Thus the following discussion revolves around "gene sets," but it should be understood that it applies equally to "putative target sets."

The simplest null model is to generate size-matched, uniformly sampled random gene sets. However, as discussed in the main text, this can be an inappropriate null model because other factors, such as the general (or non-specific) motif conservation rate, may lead to systematic biases. Below these key factors are empirically analyzed to show that they can indeed introduce systematic biases.
The analysis of 3' UTR length is omitted because length is clearly correlated with the number of motif occurrences.

General evolutionary rate

For a given 3' UTR, the general (or non-specific) conservation rate is defined as the number of conserved 7-mers (because seed matches are 7-mers) divided by the total number of 7-mers (i.e. 3' UTR length - 6). The conservation rate of a motif is defined in the same way, counting only the occurrences of that particular motif type. To investigate whether the non-specific conservation rate can affect the CE statistic, the general conservation rate and the conservation rate of each seed-matched motif were computed for all 3' UTRs. The Spearman correlation [1] between the general and specific conservation rates for each motif was computed across all human 3' UTRs, resulting in 314 correlation coefficients (one for each of the TargetScan seed motifs of conserved miRNAs) (Fig. S3). 309 out of 314 of the motifs exhibit significant correlations (p < 0.01). To ensure that the correlation is not primarily due to unusually short 3' UTRs, the correlations were recomputed using only 3' UTRs that are longer than 1000 bp; the same result holds (Fig. S3). The significant correlations persist when the correlation between the general conservation rate and the occurrence count of each motif is computed (309/314 have p < 0.01), even though the absolute correlation coefficients are lower (Fig. S4). This analysis indicates that the non-specific conservation rate is a strong predictor of the conservation rate of specific motifs. Therefore, an effective null model has to take the general conservation level of a gene set into account. For instance, genes in many biological gene sets, such as the human PIP3 signaling pathway in cardiac myocytes, have significantly higher general conservation levels than the rest of the genome (Fig. S5).
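The conservation-rate definition and the rank correlation used above can be illustrated with a minimal sketch. It assumes the conserved 7-mer count and motif-site counts per 3' UTR are already available; the function names and the toy numbers are hypothetical and are not taken from mirBridge.

```python
# Illustrative sketch only: names and toy data are assumptions, not mirBridge code.

def general_conservation_rate(utr_length, n_conserved_7mers):
    """Conserved 7-mers divided by the total number of 7-mers (length - 6)."""
    return n_conserved_7mers / (utr_length - 6)

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy 3' UTRs: (length, conserved 7-mers, sites of one motif, conserved sites of it)
utrs = [(1006, 100, 4, 1), (2006, 600, 5, 3), (506, 25, 2, 0), (3006, 1500, 8, 6)]
general = [general_conservation_rate(length, cons) for length, cons, _, _ in utrs]
specific = [cons_sites / sites for _, _, sites, cons_sites in utrs]
print(spearman(general, specific))
```

In the analysis described above, one such coefficient would be computed per seed-matched motif across all 3' UTRs.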

GC content

A key property used to compute the context score is the GC content around the seed match: higher GC content can lead to more stable local secondary structures that block miRNA-RISC access (Grimson et al. 2007). This implies that the overall GC content of the 3' UTR can have an effect on the context score. To investigate this possibility, the percentage of bases that are either G or C was computed for each 3' UTR. The Spearman correlation between the percent-GC and the context score for each type of seed match was computed across all 3' UTRs, resulting in 314 correlation coefficients (Fig. S6). 304 out of 314 motifs exhibit a significant negative correlation at p < 0.01, indicating that the overall GC content of the 3' UTR is a strong predictor of the context score.

Correlation between different factors

Significant pair-wise correlation exists between length (L), GC content (GC), and general conservation rate (C) across human 3' UTRs, indicating that accounting for systematic biases introduced by any one of the factors alone can over- or under-compensate for others (table below). An effective null model needs to consider all factors simultaneously (see below).

Variable pair | Spearman correlation | Simulated P value
L-C | 0.185 | 0
L-GC | -0.085 | 0
C-GC | -0.125 | 0

[1] A non-parametric correlation measure is used because the normality assumption does not hold.

Additional factors

So far three gene set properties (length, GC content and general conservation) that can introduce systematic biases have been discussed. A key question is whether additional factors need to be considered. In other words, are other factors largely conditionally independent [2] of the test statistics given L, GC, and C? This is a difficult question to answer empirically because there are a large number of possible factors. For instance, can the occurrence rates of certain k-mers (k=2, 3, 4...) affect the context score and/or evolutionary rate of certain seed-matched motifs? The frequency of a given k-mer can affect the frequency of motifs containing subsequences that are correlated in frequency with the k-mer. However, aside from OC, our test statistics are conditional on N, so factors that affect motif frequencies are unlikely to have a significant effect (as discussed in the main text, OC is only used in the composite score and is not used alone as an indication of functional targeting). A related concern is that the evolutionary rate of a subset of the motifs may depend on the frequency of some k-mers, but such dependencies should be largely captured by the general conservation rate measure, especially if the number of affected motifs is relatively large. In fact, one would not want to miss the signal if the differential rate is specific to a small set of motifs, because such signals can reflect constraints imposed by miRNA-mediated regulation.

L, GC, and C are likely the most direct gene-set properties that affect the test statistics. The p value distributions from the analysis of a large number of biological gene sets (using OC-CE-CTX) indicate that a null model that accounts for these three factors is effective (i.e. the distribution is quite uniform). In addition, our formulation of the null model and our method to compute the null distribution do not preclude the incorporation of additional factors (see below). In principle, any combination of factors can be incorporated.

[2] A random variable X is conditionally independent of Y given Z if P(X, Y | Z) = P(X | Z) P(Y | Z). In other words, all correlation between X and Y is through Z; once Z is fixed, X and Y are no longer correlated.

Defining the null model

The above analysis indicates that an effective null model can be defined based on comparable random gene sets, i.e. ones that have L, GC and C distributions similar to those of the given gene set (G). Formally, given a statistic S (e.g. K | N) and a gene set G whose genes have a joint empirical (L, GC, C) distribution D (i.e. (L, GC, C) | G ~ D), the goal is to obtain the distribution of S | D. By conditioning on D, this model formally requires that the random gene sets have properties similar to those of G. Note how this definition allows the incorporation of additional factors by conditioning on a joint distribution. The p values of the observed statistics of G can be computed from the S | D distribution.

The advantage of this model is that the joint empirical (L, GC, C) distribution of G is taken into account, but the computation of the null distributions can be challenging. A simpler alternative is to condition only on a summary statistic of the empirical distribution, such as the mean or median, to account for overall trends. However, this is problematic if the higher moments of the empirical distribution are also significantly different from those of the genome-wide distribution. Below, a novel sampling scheme is introduced to compute the null distribution of any gene-set based statistic given the (L, GC, C) distribution of G.

Computing the null distributions

Given G with n genes (or putative targets), a direct way to compute the null distribution is to generate random gene sets by sampling n genes from the genome according to the empirical distribution D. One approach to accomplish this is to repeatedly draw a sample from D (i.e. an (l, gc, c) triple) and pick the gene whose length, GC content, and general conservation are closest to the drawn sample. This sampling procedure requires that a parametric form be fitted to the empirical (L, GC, C) distribution; the joint density can also be obtained by techniques such as kernel-based estimation (Duda et al. 2001). We opted to pursue the latter because it is non-parametric and purely data driven, and can thus avoid potential biases introduced by parametric models; it also allows the easy incorporation of additional conditioning factors, whereas different parametric models would likely be needed for different combinations of factors.
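The direct approach described above (draw an (l, gc, c) triple from the estimated density of G, then take the closest gene in the genome) might be sketched as follows. This is only an illustration under stated assumptions: the density is taken to be a Gaussian kernel estimate with one kernel per gene in G, each dimension is scaled by a per-dimension standard deviation, and all names (`nearest_gene`, `sample_gene_set`, the toy genome) are hypothetical.

```python
import random

# Illustrative sketch only: a Gaussian kernel per gene and per-dimension scaling
# are assumptions made for this example; names and data are hypothetical.

def nearest_gene(triple, genome, scale):
    """Name of the gene whose (L, GC, C) features are closest to `triple`
    after dividing each dimension by its scale."""
    def squared_distance(features):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(features, triple, scale))
    return min(genome, key=lambda name: squared_distance(genome[name]))

def sample_gene_set(gene_set, genome, bandwidth, scale, rng):
    """Draw len(gene_set) genes: pick a kernel (a gene of G) at random, perturb
    its features with Gaussian noise, and take the nearest genome gene."""
    picked = []
    for _ in gene_set:
        center = genome[rng.choice(gene_set)]
        triple = tuple(rng.gauss(mu, bandwidth * s) for mu, s in zip(center, scale))
        picked.append(nearest_gene(triple, genome, scale))
    return picked

# Toy genome: gene -> (3' UTR length, GC fraction, general conservation rate)
genome = {
    "a": (1000.0, 0.40, 0.10),
    "b": (1200.0, 0.45, 0.12),
    "c": (5000.0, 0.60, 0.30),
    "d": (4800.0, 0.58, 0.28),
}
scale = (2000.0, 0.10, 0.10)  # assumed per-dimension standard deviations
print(sample_gene_set(["a", "c"], genome, 0.5, scale, random.Random(0)))
```

Each call scans the whole genome for the nearest gene, so the cost of one random gene set grows with the genome size times the gene set size.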
A kernel-based estimator fits a given empirical density by a set of parameterized functions called kernels. The density function is the sum of kernel functions defined over the domain of the random variable(s). Formally, let $f_i(x \mid \theta_i)$ be the ith kernel with parameter vector $\theta_i$; the estimated density is

$$f(x) = \sum_{i=1}^{n_k} f_i(x \mid \theta_i),$$

where x can be a vector and $n_k$ is the total number of kernels. A simple example of a kernel-based density estimation procedure is the construction of histograms from data (Fig. S7). The kernels in this case are constant functions in a defined interval. Each kernel is parameterized by two parameters: location and height. For instance, a one-dimensional kernel has the form

$$f_i(x) = \begin{cases} h & \text{if } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$$

where [a, b] specifies the location and h specifies the height (or probability mass) in [a, b]. The location of the kernels is determined by the center of each bin and the height reflects the number of data points that fall within the bin (Fig. S7). The location parameter in a multidimensional kernel specifies a hypercube. The size or volume of the location parameter (also called the bandwidth; e.g. $|b - a|$ in the 1-d case) is a key parameter that determines the performance of the estimator. Ideally the bandwidth should be small if sufficient data are available, because if the bandwidth were too large each data point would bias the density of nearby points. In practice, however, data can be limiting, and hence the bandwidth parameter needs to be optimized so that the maximum amount of information can be extracted from the data with minimum bias (Turlach 1993).

A common approach is to use one kernel per data point and then infer the bandwidth parameter, either individually for each kernel or one for all kernels.
Gaussian kernels are often used because they have a tractable analytical form and nicely model the intuitive notion that the density influence of a data point should gradually diminish as one moves away from the data point (rather than abruptly going to 0, as it does if a constant function is used). For instance, given n one-dimensional data points $d_i$, the estimated density is

$$f(x) = \frac{1}{n} \sum_{i=1}^{n} g(x \mid d_i, \sigma^2),$$

where $g(\cdot \mid \mu, \sigma^2)$ denotes the Gaussian density with mean $\mu$ and variance $\sigma^2$ (Fig. S8). Sampling from such kernel-based densities is straightforward: one can randomly pick one of the kernels and sample according to the kernel density.

Gene-neighborhood sampling

Multidimensional Gaussian kernels (i.e. in L-GC-C space), one per gene in the input gene set G, can be used to obtain the empirical (L, GC, C) distribution of G. The following algorithm can be used to generate a random gene set:

For each gene g in G,
1. Sample an (l, gc, c) triple from the Gaussian kernel of g.
2. Find the gene in the genome whose 3' UTR length, GC content, and general conservation are the closest to (l, gc, c).

To evaluate "closeness" in the second step, a distance metric is needed in the L-GC-C space. The Euclidean distance can be used after normalizing each dimension by its mean and standard deviation (see footnote 3) to ensure that the variables with larger absolute magnitudes (e.g. 3' UTR length) do not dominate the distance measure. A verbatim implementation of this algorithm can be inefficient because locating the closest gene for any given (l, gc, c) takes time proportional to the number of genes in the genome. However, note that for each g, the above algorithm is equivalent to sampling from genes that are close to g in the L-GC-C space (i.e. the neighbors of g), so by indexing the neighbors using their normalized Euclidean distance to g, the look-up step for the closest gene can be made more efficient:

1. For every gene g in the genome, sort all genes in the genome in order of normalized Euclidean distance to g; index them by the distance.
2. For each gene g in G:
   a. sample an (l, gc, c) triple from the Gaussian kernel of g
   b. determine the distance d between (l, gc, c) and g
   c. use d to look up the index to obtain the closest gene

Footnote 3: For the (l, gc, c) triple associated with each gene, the normalized length, GC content, and general conservation level is $\left(\frac{l - \langle l \rangle}{\sigma_l}, \frac{gc - \langle gc \rangle}{\sigma_{gc}}, \frac{c - \langle c \rangle}{\sigma_c}\right)$, where $\langle \cdot \rangle$ and $\sigma$ are the mean and standard deviation of the respective variables.

Note that in this algorithm the sampling from L-GC-C space essentially reduces to sampling from the distance space, i.e. each (l, gc, c) triple sampled is converted to d, which is the critical parameter for locating which gene to pick. Hence a one-dimensional kernel in distance space can be defined for each gene in G to replace the three-dimensional L-GC-C kernel. The distance-space sampling can be further simplified to distance-rank-space sampling:

1. For every gene u in the genome, assign ranks to all genes in the genome based on their normalized Euclidean distance to u (e.g. the closest gene has rank 1, the next has rank 2, and so on).
2. For each gene g in G:
   a. sample a rank from the Gaussian kernel of g (draw a sample from the Gaussian, take the absolute value and round to the nearest integer)
   b. return the gene with the sampled rank

Note that the rank is gene-dependent and can correspond to different actual distance units across genes. A rank-based kernel, such as the one used above, is desirable if one wants to ensure that every gene has an equal-size sampling neighborhood (i.e. with the same number of genes).
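The distance-rank-space procedure listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code used for the analyses: the function names and the toy feature table are hypothetical, the rank kernel is taken to be a zero-mean Gaussian with standard deviation `sigma`, rank 0 is taken to mean the gene itself, and ranks beyond the end of the genome are clamped.

```python
import math
import random
from statistics import mean, pstdev

def standardize(features):
    """Z-score each dimension of {gene: (l, gc, c)} so no variable dominates."""
    genes = list(features)
    cols = list(zip(*(features[g] for g in genes)))
    mus = [mean(col) for col in cols]
    sds = [pstdev(col) or 1.0 for col in cols]  # guard against zero spread
    return {g: tuple((x - m) / s for x, m, s in zip(features[g], mus, sds))
            for g in genes}

def rank_index(z):
    """Step 1: for every gene u, list all genes by increasing distance to u.

    Position 0 holds u itself (distance 0), barring exact ties.
    """
    return {u: sorted(z, key=lambda v: math.dist(z[u], z[v])) for u in z}

def sample_gene_set(gene_set, index, sigma, rng=random):
    """Step 2: for each gene g in G, sample a rank and return the gene at it."""
    picked = []
    for g in gene_set:
        rank = round(abs(rng.gauss(0.0, sigma)))  # |Gaussian|, rounded
        rank = min(rank, len(index[g]) - 1)       # assumption: clamp to genome
        picked.append(index[g][rank])
    return picked

if __name__ == "__main__":
    # Hypothetical (3' UTR length, GC content, conservation) values.
    toy = {"g1": (500, 0.40, 0.10), "g2": (520, 0.42, 0.12),
           "g3": (3000, 0.60, 0.80), "g4": (2900, 0.58, 0.75)}
    index = rank_index(standardize(toy))
    print(sample_gene_set(["g1", "g3"], index, sigma=1.0, rng=random.Random(0)))
```

In this sketch genes are drawn independently, so the same gene can appear more than once in a random set; whether duplicates should be redrawn is a design choice the sketch leaves open.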
The equal-size neighborhood of a rank-based kernel makes intuitive sense in that if a gene in G resides in a sparse neighborhood in the L-GC-C space, its effect on the mass of the estimated density in L-GC-C space around the neighborhood should be broader. This is equivalent to scaling the kernel bandwidth in distance space by the gene density around the gene (i.e. genes with rare L-GC-C attributes have a kernel with larger bandwidth).

The parameter remaining to be specified is the bandwidth of the kernels (i.e. the $\sigma$ of the Gaussians). If $\sigma$ is too large, the L-GC-C distribution of the random gene sets would be significantly different from that of G, whereas a small $\sigma$ can lead to bias as illustrated in Fig. S8. In practice $\sigma$ is largely a function of the size of G. To determine a reasonable $\sigma$, we use the algorithm above to draw random gene sets using different values of $\sigma$ and compare the L, GC and C distributions of each random set to the respective L, GC and C distributions of G. For each $\sigma$, a large number (>100) of random gene sets are used so that an average deviation based on the Kolmogorov-Smirnov Test can be computed. The largest $\sigma$ that does not result in an average deviation from the L-GC-C distributions of G greater than a prespecified threshold (see footnote 4) can be used as a good bandwidth estimate. An example can be found in Fig. S9.

Footnote 4: Currently set to 0.67, which was determined based on a simulation experiment.

Compiling high-quality putative target sets

To compile high-quality putative target (HPT) sets for co-targeting analysis (and also for examining HPTs within gene sets), we aim to include TargetScan predictions that have at least one perfectly conserved seed match and/or at least one seed-matched site with a high context score. To infer a good context score cutoff, we examined the context score distributions of conserved and non-conserved seed-matched sites (Fig. S10). Below (above) a context score of ~68, non-conserved (conserved) sites are enriched.
This suggests that a context score of 68 is a good cutoff to use for inferring high-quality non-conserved sites if we make the plausible assumption that conserved sites are enriched with true positives. Thus we defined high-quality targets as ones having at least one conserved seed-matched site and/or at least one seed-matched site with a context score greater than 68.

The connection between CE and prior tests that use evolutionary conservation

The CE test is fundamentally different from a couple of seemingly similar tests (Lewis et al. 2005, Stark et al. 2005): CE evaluates the degree of gene set-specific conservation of the miRNA target sequence above that of the same sequence in comparable random gene sets, whereas the earlier tests evaluate whether the conservation level of the target sequence is significantly above that of random sequences in the same gene set. miRNA target sequences are typically significantly more conserved than random sequences across all genes and gene categories (Stark et al. 2005, Xie et al. 2005). Thus, merely having higher conservation than random motifs in the same gene set may not be sufficiently specific to establish functional linkage between a miRNA and a gene set; the type of conservation enrichment detected by the CE test is more appropriate.

Sensitivity and specificity of the OC-CE-CTX test: Alternative test scores and comparisons

Other combinations of the three basic tests (CE, CTX and OC) are possible. For instance, by combining the CE and CTX tests one can form the "CE-CTX" score, which can lead to miRNA-gene set predictions solely from known functional targeting signals (i.e. conservation and favorable 3' UTR sequence context). Comparing the performance of different tests is difficult because true positives (i.e. known miRNA functions), especially in the context of pathways, are lacking.
Below we discuss several analyses that suggest OC-CE-CTX has the best sensitivity and specificity among tests that use the three basic scores. Specifically, we will compare two basic tests (CE and CTX) and the CE-CTX composite test to the OC-CE-CTX test using the pathway and module gene sets. While other tests are possible, e.g. OC-CTX and OC-CE, their utility is clearly bested by the OC-CE-CTX test (and the OC test alone is insufficient to suggest functional targeting, as discussed in the main text).

At a global FDR cutoff of 0.2 (across gene-set and seed-motif combinations), the CE, CTX, CE-CTX, and OC-CE-CTX tests predict 7, 1, 37 and 215 miRNA-gene-set associations, respectively, for the pathway gene sets, and 4, 2, 23, and 186 respective predictions for the module gene sets. The CE and CTX predictions are all in the CE-CTX and OC-CE-CTX lists, indicating that, as expected, the composite tests are more sensitive. Below we focus on comparing the CE-CTX and OC-CE-CTX pathway prediction results.

The CE-CTX pathway predictions are largely in the OC-CE-CTX set, except four pathways with higher (close to 0.2) CE-CTX q values (in the case of modules, only one prediction is in CE-CTX exclusively; we focus only on the pathway results in the discussion below as the module results share the same trend). However, the relative ranking of some individual predictions (based on the q values) is different across the OC-CE-CTX and CE-CTX lists. For example, predictions ranked near the top of the CE-CTX list but having a low OC score are ranked lower in the OC-CE-CTX predicted list. The miR-1-PIP3 association is such an example: it has a higher rank (9/37 versus 72/215) and a more significant q value (0.065 versus 0.098) in the CE-CTX list because the number of putative miR-1 binding sites is not unusually high (p = 0.38) in the PIP3 gene set (even though the proportions of conserved and high-context-scoring sites are unusually high, which is the basis of the significant CE and CTX scores). The fact that the OC-CE-CTX test only excludes a few CE-CTX predictions with higher q values is encouraging, as this suggests that OC-CE-CTX achieves higher sensitivity (i.e. a significantly larger number of predictions) without sacrificing specificity (that is, OC-CE-CTX selectively excludes only the less-confident predictions in the CE-CTX list; see below).

To infer whether the additional predictions made by OC-CE-CTX are enriched for true positives, we compare the CE-CTX p-value distribution of miRNA-pathway pairs that are exclusively predicted by OC-CE-CTX to that of miRNA-pathway pairs not predicted by OC-CE-CTX. If the use of OC signals by OC-CE-CTX largely resulted in false positives, we would expect the two distributions to be statistically indistinguishable (they would also have comparable median p values). However, the two distributions are drastically different ($p < 2.3 \times 10^{-15}$, Kolmogorov-Smirnov Test) and the median p values are 0.009 and 0.5 respectively (their difference is highly significant: $p < 4.4 \times 10^{-3}$, Mann-Whitney Test). Reassuringly, the latter distribution is essentially uniform, as is expected for p values randomly drawn from the null. Furthermore, if we compute the CE-CTX q values by using only those miRNA-pathway pairs predicted by OC-CE-CTX exclusively, all CE-CTX pairs would have a q value smaller than 0.2. This suggests that these pairs had insignificant CE-CTX q values (>0.2) only because the CE-CTX test has insufficient statistical power when many miRNAs and gene sets are tested simultaneously.
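The dependence of q values on the number of simultaneous tests can be made concrete with a small sketch. The text above does not state which FDR procedure produced the q values, so the Benjamini-Hochberg adjustment used here is an assumption chosen for illustration, and the p values are invented toy numbers rather than data from this study.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in input order.

    Assumption: BH is used only as a familiar example of an FDR procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

if __name__ == "__main__":
    signal = [0.01, 0.02, 0.03]      # toy p values with real signal
    pooled = signal + [0.5] * 97     # the same three among 97 null-like tests
    print(bh_q_values(signal))       # q values when tested on their own
    print(bh_q_values(pooled)[:3])   # q values of the same pairs when pooled
```

In this toy setting the same three p values pass a 0.2 cutoff when considered on their own but not when pooled with many null-like tests, which mirrors the loss of power described above.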
In stark contrast to the analysis of pathway and module gene sets, CE alone gives a much larger number of miRNA-miRNA co-targeting predictions at an FDR cutoff of 0.2 than both CE-CTX and OC-CE-CTX (3053, 85, and 221 distinct miRNA-family pairs predicted by the CE, CE-CTX, and OC-CE-CTX tests, respectively). A majority (>75%) of the CE predictions that overlap with those of OC-CE-CTX have small CE q values (<0.1), while more than 90% of non-overlapping pairs have CE q values larger than 0.1. This strongly suggests that a large percentage of the non-overlapping predictions are false positives; OC-CE-CTX excludes them because they are not simultaneously supported by other tests (CTX and/or OC). This apparent lack of specificity of CE compared to the composite tests indicates that the non-specific conservation biases in these predicted target sets are extremely strong; only by combining multiple tests that use different aspects of functional targeting can we enrich for true positives. Our method for correcting for non-specific conservation bias has already helped considerably, as CE gives a significantly smaller number of predictions than gene-set overlap analysis using Fisher's Exact Test at the same FDR cutoff (see main text). Similar to the results in pathway analysis, CE-CTX and OC-CE-CTX results are largely overlapping (70 out of 85 CE-CTX predictions are in the OC-CE-CTX list).

Taken together, our analyses strongly suggest that the OC-CE-CTX test has significantly better sensitivity and specificity than other tests.

Supplemental References

Duda RO, Hart PE, Stork DG. Pattern classification, 2nd edn (New York: Wiley) (2001).
Turlach BA. Bandwidth selection in kernel density estimation: a review. Discussion Paper 9317 (Institut de Statistique, Voie du Roman Pays 34, B-1348 Louvain-la-Neuve, Belgium) (1993).
Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M.
Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345 (2005).

Supplemental Tables are available online at Molecular Cell.

Gene sets used for miR-218 analysis

Glutamate set: Slc1A1, Slc1A2, Slc1A3, Slc1A6, Slc1A7, Slc17A6, Slc17A7, Slc17A8, Grm1, Grm2, Grm3, Grm4, Grm5, Grm6, Grm7, Grm8, Grik1, Grik2, Grik3, Grik4, Gria1, Gria2, Gria3, Gria4, Grin1, Grin2A, Grin2B, Grin2C, Grin2D, Grin3A, Grin3B, GrinA, GrinL1A, Grid1, Grid2, Homer1, Homer2, Homer3, GLS, GAD, GLUL

GABA set: SLC6A1, SLC6A11, SLC6A13, SLC32A1, GABRA1, GABRA2, GABRA3, GABRA4, GABRA5, GABRA6, GABRB1, GABRB2, GABRB3, GABRD, GABRE, GABRG1, GABRG2, GABRG3, GABRP, GABRQ, GABRR1, GABRR2, GAD, ABAT, ALDH5A1

Dopamine set: DRD2, DRD3, DRD4, DRD5, DBH, DDC, COMT, MAOA, SLC6A3, TYR, TH, PAH, SLC29A4

Serotonin set: HTR1A, HTR1B, HTR1D, HTR1E, HTR1F, HTR2A, HTR2C, HTR3A, HTR4, HTR6, HTOR, SLC6A4, HTR5A, 5HTT, HTR7, HTR2B, HTR3B, HTR5A, HTR3E, HTR3D, HTR5B, TPH, MAOA, SLC29A4

Adrenaline/epinephrine set: ADRA1A, ADRA1B, ADRA1D, ADRA2A, ADRA2B, ADRA2C, ADRB1, ADRB2, ADRB3, ADRBK1, ADRBK2, COMT, PNMT, TH, DBH

Synaptic vesicle formation set: BSN, RAPGEF4, RIMS1, RIMS2, PCLO, UNC13A, ERC2, SV2A, SV2B, NAPA, STXBP1, SYT1, CPLX1, CPLX2, NSF

Curriculum vitae

Margaret Ebert

Contact Information
Date of birth: June 25, 1981
work phone: (617) 253-6458
email: ebertms@mit.edu

Education
Hopewell Valley Central High School, 1995-1999
Yale University, B.S. in Molecular, Cellular, and Developmental Biology, May 2003
University of Cambridge, M.Phil. in Molecular Biology (Medical Research Council Laboratory of Molecular Biology), Aug 2004
Massachusetts Institute of Technology, Ph.D.
candidate in Biology, Sept 2004-June 2010

Awards and Honors
Beckman Scholarship for undergraduate research, May 2001-Aug 2002
Phi Beta Kappa, fall 2002
Editor-in-Chief, Yale Scientific Magazine, 2002-2003
Churchill Scholarship, 2003-2004
Howard Hughes Medical Institute Predoctoral Fellowship, 2004-2009
Yale College Chittenden Prize, May 2003 (highest academic record among all science majors in the Class of 2003)
Yale College Belknap Prize for senior research in biology
Paul and Cleo Schimmel Scholarship, 2006-2009
Gene Brown-Merck Teaching Award, 2009

Teaching assistantships
Introduction to Experimental Biology and Communication, spring 2006
Molecular and Engineering Aspects of Biotechnology, spring 2009

Research experience
James Anderson, Yale University, summer 2000, tight junction complexes
Ronald Breaker, Yale University, summer 2001-2003, riboswitches in prokaryotes
Savithramma Dinesh-Kumar, Yale University, summer 2003, RNA silencing in plants
Andrew Griffiths, MRC LMB, 2003-2004, in vitro selection of proteins
Phillip Sharp, MIT, 2005-, microRNAs in mammalian cells

Publications
Nahvi A, Sudarsan N, Ebert MS, Zou X, Brown KL, Breaker RR. Genetic control by a metabolite binding mRNA. Chem. Biol. 9, 1043-1049 (2002).
Sudarsan N, Wickiser JK, Nakamura S, Ebert MS, Breaker RR. An mRNA structure in bacteria that controls gene expression by binding lysine. Genes Dev. 17, 2688-2697 (2003).
Ebert MS, Neilson JR, Sharp PA. MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4, 721-726 (2007).
Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, Jacks T. Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc. Natl Acad. Sci. USA 105, 3903-3908 (2008).
Tsang JS, Ebert MS, van Oudenaarden A. Genome-wide dissection of microRNA functions and co-targeting networks using gene-set signatures. Mol. Cell 38, 140-53 (2010).
Gatt ME, Ebert MS, Mani M, Zhang Y, Gazit R, Carrasco DE, Dutta J, Adamia S, Munshi NC, Minvielle S, Avet-Loiseau H, Tai Y-T, Anderson KC, Carrasco DR. MicroRNAs 15a/16-1 function as tumor suppressor genes in multiple myeloma. Submitted (2010).
Mukherji S*, Ebert MS*, Zheng GZ, Tsang JS, Sharp PA, van Oudenaarden A. MicroRNAs generate gene expression thresholds with ultrasensitive transitions. Submitted (2010). *co-first author
Ebert MS, Sharp PA. MicroRNA sponges: progress and possibilities. Submitted (2010).
Ebert MS, Sharp PA. Roles for microRNAs in conferring robustness to biological processes. Submitted (2010).
Ebert MS, Sharp PA. Emerging roles for natural microRNA sponges. Submitted (2010).

Patents
Riboswitches, methods for their use, and compositions for use with riboswitches. Breaker R, Nahvi A, Sudarsan N, Ebert MS, Winkler W, Barrick JE, Wickiser J. (2003).