Characterizing cell-cycle as a global regulator of stochastic transcription and noisy gene expression in S. cerevisiae

by Katie J. Quinn

B.Engineering (Chemical & Biological) & B.Science (Molecular Biology), University of Queensland, 2008

Submitted to the Department of Chemical Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemical Engineering at the Massachusetts Institute of Technology

June 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Chemical Engineering, May 20, 2014
Certified by: Narendra Maheshri, Assistant Professor of Chemical Engineering, Thesis Supervisor
Accepted by: Patrick S. Doyle, Professor of Chemical Engineering, Chairman, Committee for Graduate Students

Submitted to the Department of Chemical Engineering on May 20, 2014 in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemical Engineering

Abstract

Even in the same environment, genetically identical cells can exhibit remarkable variability, or noise, in gene expression. This expression noise impacts the function of gene regulatory networks, depending on its origins. Hence, a prerequisite for understanding or designing gene regulatory networks is characterizing the origins and statistics of the noise. Variability has been largely attributed to the inherently stochastic nature of transcription. Expression statistics from multiple organisms are consistent with an influential model of "bursty" expression, where promoters are generally inactive but infrequently produce multiple mRNA. But fluctuations in the cell environment can also contribute, leaving the origins of noise unclear.
We sought to determine the origins of noise in gene expression from the synthetic tetO promoter in S. cerevisiae. We use single-molecule mRNA FISH to quantify nuclear and cytoplasmic mRNA in a population expression distribution, and models of stochastic mRNA production and degradation to infer underlying transcriptional dynamics. Rather than transcriptional bursting, we find that noise is driven by large differences in transcriptional activity between the G1 and S/G2/M stages of the cell cycle. Furthermore, we quantitatively characterize these dynamics of transcription by measuring expression in cells arrested at the G1/S and G2/M transitions. Promoters activate in S/G2 with a probability determined by activator level. mRNA statistics from an active promoter with a single operator are Poisson; expression with multiple operators is more variable. Promoters appear to inactivate at the M/G1 transition, with lower activator levels leading to increased probability of inactivation. Thus, below a certain activator threshold, all cells are inactive in G1. mRNA processing and export introduce further variability. Similar analysis of the native, chromatin-regulated PHO5 promoter yields the same results. Hence cell-cycle driven transcription dynamics may be prevalent among regulated yeast genes. The timing of S/G2 activation suggests DNA replication and chromatin maturation may be linked to repressed transcription. Cell-cycle-linked fluctuations in expression are likely to affect gene behavior in regulatory networks. This thesis advocates the importance of cellular context in gene regulation and reveals a novel role of cell-cycle as a driver of eukaryotic transcription, advancing our understanding of stochastic transcription and noise in gene expression.
Thesis Supervisor: Narendra Maheshri
Title: Assistant Professor of Chemical Engineering

Acknowledgements

I first thank my advisor Narendra Maheshri for his unfailing support and enthusiasm, boundless ideas and knowledge, and inspiring approach to science. On top of this, Narendra is also a natural and generous teacher, making for a wonderful mentoring experience. I thank my thesis committee members, Arup Chakraborty and Christopher Love, for their involvement and insights, and all the ChemE staff and faculty for support during my time here at MIT. My thanks also go to my undergraduate advisor, Lars Nielson, who encouraged me to pursue a PhD in the USA and gave me my first taste of independent research at the University of Queensland in Australia. The General John Monash Awards provided generous financial support and welcomed me into a wonderful community of Australians abroad.

I owe my labmates, T.L., Tek-Hyung, C.J., Bradley, Shawn & Nick, for teaching me in the lab, for valuable feedback in group meetings, and for sharing in the everyday struggles of a PhD student. I also thank my ChemE friends, especially my housemates at Speridakis, for many good times in and out of Building 66. I whole-heartedly thank the past and present members of the MIT Cycling Club for being a highlight of my time at MIT. The friendships and shared experiences have been a true pleasure, and I hope that's so for many more MIT students to come.

To Adam, thank you for sharing and supporting me in every step of the past few years. Finally, I thank my family, who helped me find my way to MIT in the first place. To my brothers and sister, Simon, Jess and Andrew, thanks for the listening and counselling; it's been a great help. And to my parents, Greg and Julie, thank you for teaching me to love learning and for your selfless, endless support of all of my endeavors, even when they're on the other side of the world.
Contents

Abstract
Acknowledgements

CHAPTER 1. Introduction
  1.1 Genotype to phenotype: Regulation of transcriptional dynamics
  1.2 Intrinsic expression noise from stochastic transcription dynamics
    1.2.1 An introduction to stochastic transcription dynamics
    1.2.2 A model for understanding stochastic transcription dynamics
    1.2.3 Conflicting evidence for cis and trans regulators' modes of controlling transcription dynamics
  1.3 Extrinsic noise in gene expression
  1.4 Consequences of noise from stochastic transcription dynamics
  1.5 Thesis aim and summary
  1.6 References

CHAPTER 2. Mode of transcriptional regulation can qualitatively affect gene behavior in positive feedback
  2.1 Abstract
  2.2 Introduction: Modes of regulating transcriptional bursting
  2.3 Theory:
    2.3.1 Steady-state expression of frequency- or size-regulated stochastic transcriptional bursting in feedback control
    2.3.2 The regime of bimodal expression
    2.3.3 Two modes of regulating burst size are equivalent in the bursting limit
  2.4 Results:
    2.4.1 Bimodal expression patterns associated with positive feedback loops are enhanced with burst frequency regulation but reduced with burst size regulation
    2.4.2 Mode of regulation affects mean expression
  2.5 Discussion
  2.6 References

CHAPTER 3. The cell-cycle dependence of transcription is a dominant source of noise in gene expression
  3.1 Abstract
  3.2 Introduction
  3.3 Results:
    3.3.1 Multiple transcription patterns result in expression distributions consistent with transcriptional bursting
    3.3.2 Static mRNA FISH reveals cell-cycle dependent expression may create extrinsic noise in expression
    3.3.3 A stochastic model to infer cell-cycle dependent transcription from mRNA expression distributions
    3.3.4 Regulated transcription at low activator levels is restricted to S/G2; Constitutive expression varies with gene dosage
    3.3.5 Real-time fluctuations in protein levels corroborate mRNA measurements and reveal globally correlated activation
  3.4 Discussion:
    3.4.1 Implications for understanding stochastic gene expression
    3.4.2 Gene activation kinetics are also cell-cycle dependent
    3.4.3 A hypothesis that chromatin maturation permits repressed transcription
  3.5 References

CHAPTER 4. Characterization of the tetO gene regulatory function using cell-cycle arrested mRNA expression
  4.1 Abstract
  4.2 Introduction
  4.3 Results:
    4.3.1 Analysis of single-molecule nuclear and cytoplasmic mRNA FISH in arrested cells reveals instantaneous cell-cycle dependent transcription
    4.3.2 mRNA expression under cell-cycle arrest reveals that activator regulates probabilistic activation of a long-lived transcribing state in S/G2
    4.3.3 Conditional variances quantify the origins of gene expression noise
    4.3.4 A model of transcription underlying arrested expression distributions reveals that activator only regulates the probability and stability of activity, whereas the promoter determines active transcription dynamics
    4.3.5 Reproducing cell-cycle kinetics with a model of S/G2/M and G1 stationary transcription dynamics
    4.3.6 Cell-cycle dependent transcription at the yeast PHO5 gene suggests generality among regulated genes in yeast
  4.4 Discussion:
    4.4.1 Naive interpretation of expression distributions with the bursting transcription model
    4.4.2 Reconsidering cis and trans modes of regulating transcriptional dynamics in yeast
    4.4.3 Predictions about cell-cycle as a global transcriptional regulator in other organisms
  4.5 Conclusions
  4.6 References

CHAPTER 5. Future directions
  5.1 Synopsis
  5.2 The origin of the S/G2 window for transcriptional activation
  5.3 The complete mechanism of cell-cycle regulation of transcription
  5.4 Prevalence of cell-cycle driven transcription
  5.5 Towards a generalized, predictive model for stochastic transcription dynamics
  5.6 Consequences of cell-cycle driven transcription in gene networks
  5.7 Future techniques to observe transcription with single-molecule precision in real-time
  5.8 References

CHAPTER 6. Appendix
  6.1 Yeast strains and plasmids
  6.2 Protocols:
    6.2.1 Growth & arrest protocols
    6.2.2 mRNA FISH
    6.2.3 mRNA FISH image analysis
    6.2.4 Numerical solutions to stochastic models
  6.3 Quantification of mRNA dynamics
    6.3.1 Nuclear and cytoplasmic mRNA degradation half-life
    6.3.2 Rate of nuclear mRNA export
  6.4 mRNA FISH error analysis
  6.5 References

CHAPTER 1. Introduction

1.1 Genotype to phenotype: Regulation of transcriptional dynamics

An overarching goal of biology is to predict and explain cell and organism behavior in response to a set of environmental conditions. Whole genome sequencing has become commonplace, mapping not only genes but also the genetic regulatory elements that control their expression. A new challenge is to decipher how genetic regulatory networks integrate internal and external signals to actuate gene expression.

A first-pass understanding of gene regulation relates regulatory conditions, such as the regulatory DNA sequence and the level of regulatory proteins, to the rate of mRNA or protein production. As such, a gene regulatory network can in principle be modeled by ordinary differential equations that explicitly enumerate these regulatory relations. However, transcription does not appear to occur in a continuous, deterministic manner, but instead as a random, intermittent process in which a gene fluctuates between periods of activity and inactivity. Transcriptional dynamics include a gene's fluctuations between states of varying transcriptional activity, despite unchanging regulatory conditions.
The resulting variability, or noise, in expression can affect the behavior of gene regulatory networks, so the origins of variability must be understood in order to understand and predict gene function. Noise in gene expression is conceptually divided according to two general sources: intrinsic noise describes variability that originates from molecular noise in the reactions inherent to transcription; extrinsic noise originates from variability in factors that influence transcription (Elowitz et al., 2002). The next two sections discuss each of these in turn, towards the thesis' central goal of characterizing how regulators and regulatory elements modulate transcriptional dynamics and the resulting variability in gene expression.

1.2 Intrinsic expression noise from stochastic transcription dynamics

1.2.1 An introduction to stochastic transcription dynamics

Transcription occurs through a series of molecular interactions at a gene's promoter leading to successful production of an mRNA. Like all chemistry, the interactions are inherently stochastic. Most systems of chemical reactions involve large pools of each molecular species (on the order of Avogadro's number) such that, while each molecular transformation is stochastic, the macroscopic behavior is just the average behavior of all of these molecules and can be described deterministically. However, a cell usually has just one or two copies of a gene in its nucleus, and the average number of mRNA produced can be anywhere from $10^0$ to $10^5$, depending on the transcriptional activity and the organism. Thus stochastic fluctuations in transcriptional activity can contribute to variability in mRNA and protein expression between cells, or in a single cell over time.
Variability in expression at steady-state is quantified by noise, which is the coefficient of variation (the standard deviation divided by the mean) of expression levels between cells in a population, or by the Fano factor, which is the variance divided by the mean of expression in the population, with intuitive units of mRNA or protein.

That stochastic chemistry could create biological variability was long ago predicted from physical principles (e.g. Schrödinger, 1944). It was first suspected as the cause of heterogeneous induction of the lac operon inherited over generations in isogenic cells (Novick & Weiner, 1957). Interest was more recently revived when stochastic expression appeared to explain the previously observed (Delbrück, 1945) lysis/lysogeny decision of the phage lambda (McAdams & Arkin, 1999; Arkin, Ross & McAdams, 1998). Genome-wide studies of expression noise in budding yeast have since revealed that genes with highly regulated expression (such as stress-response genes) tend to be noisier than those that are constitutively expressed (Bar-Even et al., 2006; Newman et al., 2006), suggesting a qualitative difference in the dynamics of constitutive and regulated transcription.

The molecular biology of transcription can ground models that attempt to explain the source and statistics of intrinsic variability. Constitutive genes that are not noisy are most often un-occluded by nucleosomes, existing in a state permissive to transcription, which is limited by diffusion and assembly of the general transcription factors and machinery at fairly regular time intervals. On the other hand, noisy genes regulated by the binding of one or more activators to operators in their promoter are enriched in nucleosome-occluded promoters, which rest in an inactive state where transcription is limited by chromatin remodeling. Noisy genes are also more likely to have TATA boxes, which ensure strong active transcription.
Each of these aspects helps stabilize the "reinitiation complex" containing Mediator and other transcription cofactors (Yudkovsky, Ranish & Hahn, 2000), enabling several rounds of productive transcription after each slow step of initiation (Figure 1-1). This fluctuation between promoter activity states is a potential origin of noise in gene expression and informs models of transcription.

Figure 1-1: Components of the reinitiation complex in yeast transcription. Cis activator binding sites (red) stabilize trans activators (yellow) at the gene promoter, providing a foundation for assembly of the complete initiation complex.

1.2.2 A model for understanding stochastic transcription dynamics

A useful abstraction of this underlying molecular biology is the "two-state promoter" model. It was conceived with the earliest studies of gene regulation at the lac operon (Jacob & Monod, 1961) and then developed by Peccoud & Ycart (1995). The model, depicted in Figure 1-2 and outlined in Equation 1-1, supposes a gene exists in either an inactive OFF (I) or active ON (A) state. When ON, it produces mRNA (M), which degrades by first-order kinetics.

Figure 1-2: Two-state promoter model of gene expression. A promoter can exist in an inactive (I) or active (A) state and transitions between the states with rates $\lambda$ and $\gamma$. An active promoter produces mRNA with rate $\mu$, which are degraded by first-order kinetics with rate $\delta$.

$$ I \xrightarrow{\lambda} A, \qquad A \xrightarrow{\gamma} I, \qquad A \xrightarrow{\mu} A + M, \qquad M \xrightarrow{\delta} \varnothing \tag{1-1} $$

For low-noise constitutive genes, the model reduces to a single state, where the promoter lives in the active state and produces mRNA with rate $\mu$. At steady-state, constant stochastic production yields a Poisson distribution of mRNA. As such, Poisson mRNA statistics represent a "null" case of minimal expression noise. Figure 1-3 shows an example trajectory of production and degradation and the resulting steady-state mRNA expression distribution.
Poisson statistics of mRNA have been observed for multiple constitutive genes in yeast (Larson et al., 2011; Zenklusen, Larson & Singer, 2008).

Figure 1-3: The trajectory and stationary expression distribution of a Poisson production process. (A) mRNA count fluctuates over time due to stochastic single-molecule production and degradation events (Burst frequency: $\lambda/\delta = 10$, Burst size: $\mu/\gamma = 1$ mRNA). (B) Sampled over long times or across a large population, expression is a Poisson distribution with mean and variance of 10 mRNA.

On the other hand, single-cell studies of population distributions and real-time activity suggest that expression of regulated eukaryotic genes can occur in bursts of transcription, consistent with a model where the promoter switches randomly and rarely from a stable inactive state to a short-lived, actively-transcribing state (Larson et al., 2011; Raj et al., 2006). Rare transcriptional initiation followed by rapid reinitiation (discussed above) may represent the molecular basis for this dynamic behavior (Hahn, 1998; Struhl, 1996; Yudkovsky, Ranish & Hahn, 2000). In this case, the two-state promoter operates in the "bursting" regime, with rare activation compared to inactivation ($\tilde{\lambda} \ll \tilde{\gamma}$) and promoter fluctuations faster than the mRNA lifetime ($\tilde{\gamma} > 1$), where the tilde denotes that a parameter is normalized by the degradation rate. In this regime, $\tilde{\lambda}$ is the burst frequency, $\tilde{\mu}/\tilde{\gamma}$ is the burst size, and the distribution can be described by a two-parameter Gamma distribution:

$$ P(x) = \frac{x^{\tilde{\lambda}-1}\, e^{-x\tilde{\gamma}/\tilde{\mu}}}{(\tilde{\mu}/\tilde{\gamma})^{\tilde{\lambda}}\, \Gamma(\tilde{\lambda})} \tag{1-2} $$

This is equivalent to the result of Friedman, Cai & Xie (2006). The discrete equivalent, representing integer mRNA counts, is the negative binomial distribution. The two parameters specify the burst frequency and burst size respectively, and their product is the expression mean.
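As a rough numerical check of the last statement, the sketch below evaluates a negative binomial distribution from a burst frequency and a burst size and confirms that the mean is their product. The function names and the parameterization (shape equal to burst frequency, success probability b/(1+b) for mean burst size b) are assumptions chosen for illustration; this is one common convention and not necessarily the exact form used elsewhere in this thesis.

```python
import math


def negative_binomial_pmf(m, burst_frequency, burst_size):
    """Probability of m mRNA (assumed convention: shape a, mean burst size b)."""
    a, b = burst_frequency, burst_size
    # Work in log space so large counts do not overflow the Gamma function.
    log_p = (
        math.lgamma(m + a) - math.lgamma(a) - math.lgamma(m + 1)
        + m * math.log(b / (1.0 + b))
        - a * math.log(1.0 + b)
    )
    return math.exp(log_p)


def distribution_moments(burst_frequency, burst_size, m_max=2000):
    """Return (total probability, mean, Fano factor) summed over 0..m_max."""
    pmf = [negative_binomial_pmf(m, burst_frequency, burst_size)
           for m in range(m_max + 1)]
    total = sum(pmf)
    mean = sum(m * p for m, p in enumerate(pmf))
    variance = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
    return total, mean, variance / mean


if __name__ == "__main__":
    # Two hypothetical parameter sets with the same mean (a * b = 10).
    for a, b in [(1.0, 10.0), (10.0, 1.0)]:
        total, mean, fano = distribution_moments(a, b)
        print(f"a={a}, b={b}: total={total:.6f}, mean={mean:.3f}, Fano={fano:.3f}")
```

Under this convention the Fano factor is 1 + b, so two parameter sets with the same mean can differ widely in variability, which is the property the bursting model uses to separate frequency from size.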
While the model says nothing about the actual molecular events leading to transcription, the burst frequency is thought to correspond to the transcriptional initiation rate and the burst size may correspond to the transcriptional reinitiation and/or elongation rate. Figure 1-4 shows an example trajectory of transcriptional bursting and the resulting steady-state expression distribution, parameterized by a burst frequency and burst size, and fit with the negative binomial distribution of those parameters. (While Master equations that describe the two-state model at steady-state have an analytical solution, more complicated stochastic models that lack analytical solutions must be solved numerically, with kinetic Monte Carlo simulations or a Finite Markov Chain method (Munsky & Khammash, 2006).)

Figure 1-4: The trajectory and stationary expression distribution of a bursting process. (A) mRNA count fluctuates widely over time due to "bursts" of transcription (Burst frequency: $\lambda/\delta = 1$, Burst size: $\mu/\gamma = 10$ mRNA). (B) Sampled over long times or across a large population, expression is a negative binomial distribution with a mean of 10 mRNA and parameters corresponding to the burst frequency and burst size. This bursting example and the previous Poisson example (Figure 1-3) have the same mean expression level but very different distributions.

A regulatory element's mode of affecting transcriptional dynamics could be inferred from how the expression distribution changes with mean expression. For the case of the two-state promoter model, the product of the burst frequency and burst size gives the mean expression. Transcriptional regulators could affect expression via the frequency or size of bursts. Regulation of burst frequency will increase sampling as the mean increases, thereby decreasing the noise; regulation of burst size will not (Figure 1-5).
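The contrast between constitutive and bursting expression, and between the two modes of regulation, can be illustrated with a minimal kinetic Monte Carlo (Gillespie) sketch of the two-state promoter model. Everything here is an assumption made for illustration: the function names, the rate values (in units of the degradation rate), the simulation length, and the use of the standard closed-form stationary moments of the two-state model as a reference. It is not the numerical method used in this thesis.

```python
import random


def two_state_moments(lam, gam, mu, delta):
    """Standard closed-form stationary mean and Fano factor of the two-state model."""
    mean = (mu / delta) * lam / (lam + gam)
    fano = 1.0 + mu * gam / ((lam + gam) * (lam + gam + delta))
    return mean, fano


def simulate_two_state(lam, gam, mu, delta, t_end, seed=0, start_active=False):
    """Gillespie run of I<->A, A->A+M, M->0; returns time-averaged mean and Fano."""
    rng = random.Random(seed)
    t, active, m = 0.0, start_active, 0
    sum_m = 0.0   # time integral of m
    sum_m2 = 0.0  # time integral of m squared
    while True:
        rates = (
            0.0 if active else lam,  # activation, I -> A
            gam if active else 0.0,  # inactivation, A -> I
            mu if active else 0.0,   # production, A -> A + M
            delta * m,               # degradation, M -> nothing
        )
        total = sum(rates)
        dt = rng.expovariate(total) if total > 0.0 else float("inf")
        if t + dt >= t_end:
            # Credit the current state for the remaining time, then stop.
            sum_m += m * (t_end - t)
            sum_m2 += m * m * (t_end - t)
            break
        sum_m += m * dt
        sum_m2 += m * m * dt
        t += dt
        # Pick which reaction fired, with probability proportional to its rate.
        r = rng.random() * total
        if r < rates[0]:
            active = True
        elif r < rates[0] + rates[1]:
            active = False
        elif r < rates[0] + rates[1] + rates[2]:
            m += 1
        else:
            m -= 1
    mean = sum_m / t_end
    fano = (sum_m2 / t_end - mean ** 2) / mean
    return mean, fano


if __name__ == "__main__":
    # Hypothetical rates: always-active promoter vs. a bursting promoter.
    print("constitutive:", simulate_two_state(1.0, 0.0, 10.0, 1.0, 5000.0,
                                              start_active=True))
    print("bursting:    ", simulate_two_state(1.0, 20.0, 200.0, 1.0, 5000.0))
    # Noise (CV^2 = Fano / mean) when the mean is doubled in two ways.
    for label, params in [("base", (1.0, 20.0, 200.0, 1.0)),
                          ("2x frequency", (2.0, 20.0, 200.0, 1.0)),
                          ("2x size", (1.0, 20.0, 400.0, 1.0))]:
        mean, fano = two_state_moments(*params)
        print(f"{label}: mean={mean:.2f}, CV^2={fano / mean:.3f}")
```

With these illustrative rates, both simulated promoters should have a mean near 10 mRNA while the bursting promoter shows a much larger Fano factor. In the closed-form comparison, doubling the activation rate roughly halves the squared coefficient of variation, whereas doubling the production rate leaves it nearly unchanged, in line with the trend sketched in Figure 1-5.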
Figure 1-5: Regulation via burst frequency or burst size will affect expression noise. For two hypothetical genes with equivalent mean expression in response to an activating signal (A), the gene regulated via burst frequency will decrease expression noise as activation increases whereas the gene under burst size control will not (B).

1.2.3 Conflicting evidence for cis and trans regulators' modes of controlling transcription dynamics

Many studies have sought to observe and characterize transcriptional dynamics. These studies use single-cell techniques that fall in two classes: measuring mRNA dynamics in real-time and back-calculating transcription events (corresponding to the trajectories of Figures 1-3A and 1-4A); and measuring mRNA expression across a static cell population and inferring steady-state transcriptional dynamics (as in the distributions of Figures 1-3B and 1-4B).

Golding et al. (2005) were the first to visualize transcription in real-time, introducing repeats of secondary structure in mRNA that bound an MS2 phage coat protein fused to GFP. This enabled counting the production of individual MS2-GFP-labeled mRNA transcripts in growing E. coli. mRNA appeared to be produced in "bursts" of transcription. Studies in mammalian cells using luciferase as a readout (Suter et al., 2011; Harper et al., 2011) suggest mammalian promoters are activated in bursts, followed by inactive refractory periods.

Results of earlier studies which inferred transcriptional dynamics from protein noise in static populations are also consistent with bursting dynamics. Consistent with expectations, when Ozbudak et al. (2002) modulated bursting at the translational level in the prokaryote B. subtilis (via point mutations that affect ribosome binding and thus translational efficiency, p), noise remained high with increasing expression level (Kaern et al., 2005). Blake et al.
(2003) demonstrated the same in S. cerevisiae. Raser & O'Shea (2004) used a yeast strain with two homologous reporters of PHO5 expression to measure intrinsic noise at the protein level. A mutant of the TATA-binding site, expected to decrease transcription rate, $\mu$, decreased noise. A mutation of the activator binding site, expected to decrease promoter activation, $\lambda$, increased noise. Both results are consistent with the noise-mean trends of the bursting model.

Multiple studies have examined noise in protein expression genome-wide in both S. cerevisiae (Bar-Even et al., 2006; Newman et al., 2006; Hornung et al., 2012) and E. coli (Taniguchi et al., 2010). In yeast, these studies revealed the correlation between regulated and noisy genes. They also found that, in general, noise decreased with the inverse square-root of mean expression, suggesting that gene activity is predominantly controlled via modulating the frequency of activation events. Similar global regulation of burst frequency was seen in bacteria (Taniguchi et al., 2010). However, regulated genes are often repressed in standard growth conditions, and thus not all such genes may be captured in these trends. Also, the inverse square root scaling of noise with protein abundance was not seen clearly at high levels of expression, where extrinsic noise is expected to dominate.

Additional studies have focused on measuring expression statistics from several different promoters to infer how changes in cis regulatory elements can affect noise. Zenklusen et al. (2008) found Poisson statistics at three constitutive genes, as expected. But expression of the regulated PDR5 gene had higher noise, and an expression distribution well-fit by the negative binomial distribution solution to stationary bursting. Carey et al.
(2013) conditioned on activator-specific effects by measuring expression from multiple genes activated by the same transcription factor, and saw that the level of noise, or "burstiness", was a function of the promoter sequence. They also saw that the degree of noise depended on whether the transcription factor acted as an activator or repressor (repression lowered the apparent burst size). So et al. (2011) measured expression from multiple bacterial genes and concluded that both burst frequency (at lower levels) and burst size (at higher levels) increased expression level. Dar et al. (2012) and Skupsky et al. (2010) integrated a single promoter at many locations in a human genome and reported evidence of bursting across all locations, with burst frequency and then burst size increasing with mean expression. However, the latter study did not measure stationary expression, a requirement for the applicability of the negative binomial noise-mean trends. A further cause for thought is that this trend is consistent with the expected dominance of extrinsic noise at high expression levels (discussed in the next section).

More detailed studies systematically varying regulatory elements within a single gene's promoter have proven particularly effective for identifying regulatory trends. In yeast, Murphy et al. (2007) saw that placing transcription factor binding sites closer to the promoter increased burst size, perhaps increasing the productivity or the stability of the initiation complex. This study, work in our lab (To & Maheshri, 2010), and work in mammalian cells (Raj et al., 2006; Suter et al., 2011) compared expression from promoters identical except for the number of activator binding sites and showed that noise increases with binding site number. Dadiani et al. (2013) performed an interesting study in yeast, engineering increased expression of a single gene via either increased binding site strength or nucleosome-disfavoring sequence.
The former substantially increased expression noise ("burst size"); the latter did not. This suggests the nucleosome-disfavoring sequence transitioned the dynamics out of the "bursting" regime and into continuous activity. Raj et al. (2006) measured mRNA expression distributions from two identical copies of a gene in single diploid mammalian cells and saw that fluctuations between the two loci were largely uncorrelated, suggesting intrinsic origins. The shape of the distributions was consistent with bursting.

Table 1-1 summarizes the conclusions of these studies according to whether they found cis or trans regulators to affect burst size, frequency, or both. There is strong evidence for cis elements, such as promoter architecture (binding site number, chromatin structure) and local chromatin environment, dictating the "size" of transcriptional bursts. How both gene-specific and global trans regulators affect bursting is less clear.

While the paradigm of "bursty" transcriptional dynamics seems prevalent, its veracity and ubiquity have not been conclusively established. Muramoto et al. (2012) detected periodic, long-lived pulses of transcription in real time in Dictyostelium, rather than bursts. Particularly troubling is the fact that models for transcriptional bursting describe the variability in mRNA levels from purely intrinsic sources, whereas appreciable amounts of extrinsic noise have been measured in a large number of studies, although generally at the protein level. Using FACS to measure protein noise does allow for gating by cell shape, but removes no other extrinsic variability.
Several studies (including Blake et al., 2003; Raser & O'Shea, 2004; So et al., 2011; To & Maheshri, 2010) used scaling arguments to justify that noise in their data was intrinsic. It was claimed that $\eta^2$ decreasing monotonically with mean, $\sigma^2/\mu$ (variance over mean) approaching one at very low expression levels, and $\eta^2$ decreasing sharply with mean at high expression levels were all indicative of intrinsic rather than extrinsic noise. Our noise data also have these scaling properties, but the noise has proven to be predominantly extrinsic in origin. Exceptions that do explicitly account for extrinsic variability are few, including the original study with two mRNA reporters (Raj et al., 2006) and a theoretical study (Shahrezaei, Ollivier & Swain, 2008).

Table 1-1: Summary of studies identifying cis- and trans-regulation of bursting dynamics
Column headings: Burst frequency regulation | Burst size regulation

Cis variation
    Binding site number or strength: S. cerevisiae: Raser & O'Shea 2004; Blake et al. 2006; Murphy et al. 2007; To & Maheshri 2010; Dadiani et al. 2013. Mammalian: Raj et al. 2006; Suter et al. 2011
    TATA strength: S. cerevisiae: Raser & O'Shea 2004; Mogno et al. 2010; Hornung et al. 2012
    Nucleosome occupancy/remodeling: S. cerevisiae: Bai et al. 2010; Dadiani et al. 2013
    Genomic location: Mammalian: Skupsky et al. 2010; Dar et al. 2012
    Multiple genes (promoters): E. coli: So et al. 2011. S. cerevisiae: Hornung et al. 2012

Trans variation
    Activator levels or activity: E. coli: Pedraza & van Oudenaarden 2005; Golding et al. 2005. S. cerevisiae: Raser & O'Shea 2004; Mao et al. 2010
    Activator levels: E. coli: Choi et al. 2008. S. cerevisiae: Mao et al. 2010; Carey et al. 2013
    Global protein noise: E. coli: Taniguchi et al. 2010. S. cerevisiae: Bar-Even et al. 2006; Newman et al. 2006

1.3 Extrinsic noise in gene expression

Extrinsic noise originates from fluctuations in upstream factors that impact gene expression.
Upstream fluctuations can occur in global factors (affecting all genes) or gene-specific pathway components. Upstream fluctuations may also derive from intrinsic noise in their expression, but this need not be the case. For example, cell size seems an important determinant of global transcriptional activity (Raj, unpublished). The cell cycle, including the doubled DNA copy number of S/G2/M, may also be a source of transcription variability (Elliott & McLaughlin, 1978; Volfson et al., 2006). Stochastic partitioning of mRNA and protein upon cell division is another source of extrinsic noise, and appears as intrinsic noise in a popular experimental method for measuring contributions from the two sources (Huh & Paulsson, 2011). An early experimental example of consequential stochastic gene expression, the lysis/lysogeny decision of phage lambda, was later attributed to extrinsic sources (St-Pierre & Endy, 2008). Hilfinger & Paulsson (2011) noted that even intrinsic noise parameters will depend on extrinsic factors in the history of the cell. Studies of global protein expression (Bar-Even et al., 2006) showed evidence of a baseline of extrinsic noise, with noise never measured below a coefficient of variation of 0.2. Yet these important sources of extrinsic variability are suggested to have small, non-qualitative effects on variability of mRNA and transcription, because of the wealth of experimental results consistent with the transcriptional bursting model (Table 1-1).

1.4 Consequences of noise from stochastic transcription dynamics

Stochastic transcription and expression variability are of particular interest because they can have qualitative consequences for phenotype. In synthetic gene circuits, noise has stabilized toggle switches and oscillators (Becskei, Seraphin & Serrano, 2001; Elowitz & Leibler, 2000; Gardner, Cantor & Collins, 2000).
Our lab previously demonstrated that noise can cause bimodal expression in a transcriptional positive feedback loop even when deterministic models predict no bistability (To & Maheshri, 2010). Stochastic expression also has consequences for the evolution and development of organisms. Heterogeneous expression among a population of isogenic unicellular organisms may lead to variability that assists with responses to changes in nutrients (e.g. lactose utilization, Ozbudak et al., 2004), to stress (e.g. competence in B. subtilis, Maamar, Raj & Dubnau, 2007; Suel et al., 2006; Suel et al., 2007) or to pathogens (e.g. bacterial persistence against antibiotics, Blake et al., 2006) and may play a role in development (e.g. variability in a stem cell marker correlated strongly with choice of lineage, Chang et al., 2008). But consistent with intuition about control systems, noise in gene expression can also be detrimental, limiting information transfer (Bialek & Setayeshgar, 2005; Lestas, Vinnicombe & Paulsson, 2010), such that gene regulatory networks have evolved to suppress noise effects (McAdams & Arkin, 1999; Raj et al., 2010). Noise from transcriptional dynamics may be an unavoidable biophysical limitation of achieving a highly regulable range of transcription (Bremer & Ehrenberg, 1995; Guptasarma, 1996; Salman et al., 2012). (One counterexample is ribosomal genes, which have high dynamic range with little noise. But these genes fundamentally differ: they are transcribed by their own polymerase (RNA Pol I) with ON/OFF, rather than graded, control.) In either case, the fact that organisms are known both to exploit noise and to evolve to minimize it is evidence of its significance for gene regulatory network performance.
1.5 Thesis aim and summary

One summary of the current understanding of transcriptional dynamics is: Transcription occurs in continuous, stochastic burst events, with a frequency regulated by activator level and a size dependent on promoter architecture and constraints of the molecular biology of transcription in the particular organism. But studies have emerged suggesting that extrinsic, global regulators of gene expression have gone unappreciated, suggesting a direction for the field to advance. Another question remaining unanswered is how transcriptional dynamics control the kinetics of a response to changing regulatory signals. This thesis addresses the former, exploring a novel case of global transcriptional regulation. While kinetics are not covered here, characterization of stationary dynamics is a key step towards predicting kinetic behavior.

This work aims to characterize transcriptional dynamics, and thus the origin of noise in gene expression, at the yeast tetO gene in response to cis and trans regulators and the cell cycle, which we reveal as a global regulator of transcription. In contrast to the current understanding of single-cell dynamics and noise, we establish that large differences in transcription between cell-cycle stages drive noisy expression at the tetO promoter, and suggest this may be prevalent at other regulated yeast genes. Specifically, we ask: What are the dynamics of transcription in response to the level of activator proteins, the number of activator binding sites at the gene promoter, and the global process of the cell-division cycle?

In Chapter 2 we present a case study demonstrating how the mode of regulating transcriptional dynamics can affect phenotype. We consider a hypothetical pair of promoters transcribed with bursting dynamics: one whose expression is modulated by the frequency of bursts, and the other by the size of bursts. Hence they differ in how their intrinsic noise varies with changes in expression level.
We analyze expression in positive feedback and see that regulation via burst frequency can create bimodal expression, or stabilize deterministic bistability, whereas regulation of burst size never creates bimodality, and instead destabilizes bistability. Hence the mode by which regulators control dynamics is important for gene function.

Previous studies attempting to infer transcription dynamics have been quick to attribute all noise in expression to the model of bursting transcription. But a strain with two copies of our "noisy" gene of interest revealed that much of the noise we observed was extrinsic, deriving from sources other than transcription itself. When probing the origin of this extrinsic noise, we uncovered a strong dependence of transcription on the cell cycle. In Chapter 3, we establish the nature of this cell-cycle-driven transcription. We measure transcription rates with two single-cell methods. By tracking protein levels in real time, we back-calculate time courses of transcription rates, albeit with relatively low molecular and temporal resolution. By counting single mRNA molecules in a static population segregated by cell-cycle stage, we obtain a higher-resolution readout of recent transcription but with no temporal information. An immediate result is that transcription increases 2-fold from G1 to S/G2/M for constitutive or highly expressed, regulated genes. Though previously underappreciated, this is expected because the copy number of each gene doubles during S-phase DNA replication between G1 and G2. We show that this dominates super-Poissonian noise in growing populations' constitutive gene expression. Of greater interest, we find that transcription under repressed conditions only occurs in S/G2/M. Clearly, something is at play alleviating transcriptional repression in early S/G2. We hypothesize that chromatin maturation following DNA replication creates a permissive window for transcription from otherwise inactive genes.
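The claim that gene-copy doubling alone can produce super-Poissonian noise can be illustrated with a small calculation. The sketch below is not taken from the thesis's analysis. It assumes, purely for illustration, that mRNA counts are Poisson with mean m in G1 cells and mean 2m in S/G2/M cells, and that a fraction of an asynchronous population is in G1. The names `mixture_fano`, `m` and `w_g1` are hypothetical.

```python
def mixture_fano(m: float, w_g1: float) -> float:
    """Fano factor (variance / mean) of mRNA counts in a mixed population.

    Illustrative assumptions (not from the thesis):
      - G1 cells:     counts ~ Poisson(m)
      - S/G2/M cells: counts ~ Poisson(2 * m), reflecting the doubled gene copy number
      - w_g1 is the fraction of cells in G1
    """
    if m <= 0.0:
        raise ValueError("m must be positive")
    if not 0.0 <= w_g1 <= 1.0:
        raise ValueError("w_g1 must be between 0 and 1")
    w_s = 1.0 - w_g1
    mean = w_g1 * m + w_s * 2.0 * m
    # Law of total variance: the average within-stage variance (for a Poisson
    # this equals the stage mean) plus the variance of the stage means.
    within = mean
    between = w_g1 * (m - mean) ** 2 + w_s * (2.0 * m - mean) ** 2
    return (within + between) / mean


if __name__ == "__main__":
    for m in (1.0, 10.0, 50.0):
        print(f"m = {m:5.1f}  Fano = {mixture_fano(m, 0.5):.3f}")
```

Under these assumptions the Fano factor equals 1 + m·w(1−w)/(2−w), where w is the G1 fraction, so the excess over the Poisson value of one grows with expression level and vanishes only when the whole population is in a single stage.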
But the resolution afforded by these techniques limited further characterization of transcriptional dynamics. mRNA expression blurs fluctuations in transcription on the timescale of its turnover. In Chapter 4, we seek a new approach to quantify both the cell-cycle dependence of transcription and transcriptional dynamics within a single cell-cycle stage. We find that cell-cycle arrest yields mRNA expression that reflects the pseudo-steady-state transcription dynamics of each cell-cycle stage. Levels of mRNA at the site of transcription in the nucleus provide a more immediate readout of transcriptional activity than cytoplasmic mRNA, which has undergone processing and export. We find that cells with or without nuclear mRNA have very different levels of cytoplasmic mRNA, indicating an active state lifetime on the timescale of a cell-cycle stage. Activation occurs in S/G2 with a probability determined by activator level. Activator also modulates the stability of the active state across the M/G1 transition, such that there is a distinct activator threshold below which there is no G1 expression. But expression from an active locus is essentially activator-independent. An active locus with a single operator has minimally variable Poisson expression statistics; multiple operators result in higher expression noise, perhaps accessing multiple activity states that correspond to discrete promoter occupancy states. The growing population's normalized variance (i.e. the Fano factor) peaks at a particular activator level, which is finally revealed to derive from noise peaking when 50% of cells activate in S/G2. This is a new picture of transcriptional dynamics that challenges previous expectations of transcriptional bursting dynamics.

This thesis answers some questions but creates many more. We hypothesize that S/G2 alleviation of repression is due to the delay in maturation and assembly of new chromatin following DNA replication.
Several studies link chromatin maturation to transcription, but is this at play here, and if so what is the molecular mechanism? Is this phenomenon of cell-cycle-dependent de-repression of transcription universal, or shared among a smaller class of genes, and why? Can it affect protein-level behavior within a signaling network? What developments in experimental techniques are necessary to complete our understanding of transcriptional dynamics? We discuss these open questions in Chapter 5.

Together, this thesis reveals a novel pattern of transcriptional regulation and provides an example of the importance of considering cell context when studying gene expression and regulation. The role of cell cycle as a regulator is remarkable and previously unappreciated in the context of noisy gene expression. It has implications for network design in synthetic biology, and for understanding gene expression perturbed by disease, particularly in the fast-growing cells of cancers and developing organisms. This work to characterize the transcriptional dynamics of a single gene is one very small step towards developing generalized and predictive models of gene regulation, which may eventually enable us to understand and engineer biology as we would any other physical system.

1.6 References

Arkin, A., Ross, J., & McAdams, H. H. (1998). Stochastic kinetic analysis of developmental pathway bifurcation in phage-lambda infected Escherichia coli cells. Genetics, 149(4), 1633-1648.
Bai, L., Charvin, G., Siggia, E. D., & Cross, F. R. (2010). Nucleosome-depleted regions in cell-cycle-regulated promoters ensure reliable gene expression in every cell cycle. Developmental Cell, 18(4), 544-555.
Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., & Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nature Genetics, 38(6), 636-643.
Becskei, A., Seraphin, B., & Serrano, L. (2001).
Positive feedback in eukaryotic gene networks: Cell differentiation by graded to binary response conversion. EMBO Journal, 20(10), 2528-2535.
Bialek, W., & Setayeshgar, S. (2005). Physical limits to biochemical signaling. Proceedings of the National Academy of Sciences of the United States of America, 102(29), 10040-10045.
Blake, W. J., Balazsi, G., Kohanski, M. A., Isaacs, F. J., Murphy, K. F., Kuang, Y., ... Collins, J. J. (2006). Phenotypic consequences of promoter-mediated transcriptional noise. Molecular Cell, 24(6), 853-865.
Blake, W. J., Kaern, M., Cantor, C. R., & Collins, J. J. (2003). Noise in eukaryotic gene expression. Nature, 422(6932), 633-637.
Bremer, H., & Ehrenberg, M. (1995). Guanosine tetraphosphate as a global regulator of bacterial RNA synthesis: A model involving RNA polymerase pausing and queuing. Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression, 1262(1), 15-36.
Carey, L. B., van Dijk, D., Sloot, P. M. A., Kaandorp, J. A., & Segal, E. (2013). Promoter sequence determines the relationship between expression level and noise. PLoS Biol, 11(4), e1001528.
Chang, H. H., Hemberg, M., Barahona, M., Ingber, D. E., & Huang, S. (2008). Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature, 453(7194), 544-547.
Cheung, A. M., & Cramer, P. (2012). A movie of RNA polymerase II transcription. Cell, 149(7), 1431-1437.
Choi, P. J., Cai, L., Frieda, K., & Xie, X. S. (2008). A stochastic single-molecule event triggers phenotype switching of a bacterial cell. Science, 322(5900), 442-446.
Dadiani, M., van Dijk, D., Segal, B., Field, Y., Ben-Artzi, G., Raveh-Sadka, T., ... Segal, E. (2013). Two DNA-encoded strategies for increasing expression with opposing effects on promoter dynamics and transcriptional noise. Genome Research, 23(6), 966-976.
Dar, R. D., Razooky, B. S., Singh, A., Trimeloni, T. V., McCollum, J. M., Cox, C. D., ... Weinberger, L. S. (2012).
Transcriptional burst frequency and burst size are equally modulated across the human genome. Proceedings of the National Academy of Sciences, 109(43), 17454-17459.
Delbruck, M. (1945). The burst size distribution in the growth of bacterial viruses (bacteriophages). The Journal of Bacteriology, 50(2), 131-135.
Elliott, S. G., & McLaughlin, C. S. (1978). Rate of macromolecular synthesis through the cell cycle of the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences, 75(9), 4384-4388.
Elowitz, M. B., & Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature, 403(6767), 335-338.
Elowitz, M. B., Levine, A. J., Siggia, E. D., & Swain, P. S. (2002). Stochastic gene expression in a single cell. Science, 297(5584), 1183-1186.
Friedman, N., Cai, L., & Xie, X. S. (2006). Linking stochastic dynamics to population distribution: An analytical framework of gene expression. Physical Review Letters, 97(16), 168302.
Gardner, T. S., Cantor, C. R., & Collins, J. J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767), 339-342.
Guptasarma, P. (1996). Cooperative relaxation of supercoils and periodic transcriptional initiation within polymerase batteries. Bioessays, 18(4), 325-332.
Hahn, S. (1998). Activation and the role of reinitiation in the control of transcription by RNA polymerase II. Cold Spring Harbor Symposia on Quantitative Biology, 63, 181-188.
Harper, C., Finkenstädt, B., Woodcock, D., Friedrichsen, S., Semprini, S., Ashall, L., ... White, M. (2011). Dynamic analysis of stochastic transcription cycles. PLoS Biology, 9(4), e1000607.
Hilfinger, A., & Paulsson, J. (2011). Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proceedings of the National Academy of Sciences, 108(29), 12167-12172.
Hornung, G., Bar-Ziv, R., Rosin, D., Tokuriki, N., Tawfik, D. S., Oren, M., & Barkai, N. (2012). Noise-mean relationship in mutated promoters.
Genome Research, 22(12), 2409-2417.
Huh, D., & Paulsson, J. (2011). Non-genetic heterogeneity from stochastic partitioning at cell division. Nature Genetics, 43(2), 95-100.
Huisinga, K. L., & Pugh, B. F. (2004). A genome-wide housekeeping role for TFIID and a highly regulated stress-related role for SAGA in Saccharomyces cerevisiae. Molecular Cell, 13(4), 573-585.
Jacob, F., & Monod, J. (1961). On the regulation of gene activity. Cold Spring Harbor Symposia on Quantitative Biology, 26, 193-211.
Jacob, F., & Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3(3), 318-356.
Kaern, M., Elston, T. C., Blake, W. J., & Collins, J. J. (2005). Stochasticity in gene expression: From theories to phenotypes. Nature Reviews. Genetics, 6(6), 451-464.
Larson, D. R., Zenklusen, D., Wu, B., Chao, J. A., & Singer, R. H. (2011). Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science, 332(6028), 475-478.
Lestas, I., Vinnicombe, G., & Paulsson, J. (2010). Fundamental limits on the suppression of molecular fluctuations. Nature, 467(7312), 174-178.
Maamar, H., Raj, A., & Dubnau, D. (2007). Noise in gene expression determines cell fate in Bacillus subtilis. Science, 317(5837), 526-529. doi:10.1126/science.1140818
Mao, C., Brown, C. R., Falkovskaia, E., Dong, S., Hrabeta-Robinson, E., Wenger, L., & Boeger, H. (2010). Quantitative analysis of the transcription control mechanism. Mol Syst Biol, 6(1), -.
McAdams, H. H., & Arkin, A. (1999). It's a noisy business! Genetic regulation at the nanomolar scale. Trends in Genetics, 15(2), 65-69.
Mogno, I., Vallania, F., Mitra, R. D., & Cohen, B. A. (2010). TATA is a modular component of synthetic promoters. Genome Research, 20(10), 1391-1397.
Munsky, B., & Khammash, M. (2006). The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124(4), -.
Murakami, K. S., & Darst, S. A. (2003).
Bacterial RNA polymerases: The whole story. Current Opinion in Structural Biology, 13(1), 31-39.
Muramoto, T., Cannon, D., Gierliński, M., Corrigan, A., Barton, G. J., & Chubb, J. R. (2012). Live imaging of nascent RNA dynamics reveals distinct types of transcriptional pulse regulation. Proceedings of the National Academy of Sciences, 109(19), 7350-7355.
Murphy, K. F., Balázsi, G., & Collins, J. J. (2007). Combinatorial promoter design for engineering noisy gene expression. Proceedings of the National Academy of Sciences, 104(31), 12726-12731.
Newman, J. R. S., Ghaemmaghami, S., Ihmels, J., Breslow, D. K., Noble, M., DeRisi, J. L., & Weissman, J. S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature, 441(7095), 840-846.
Novick, A., & Weiner, M. (1957). Enzyme induction as an all-or-none phenomenon. Proceedings of the National Academy of Sciences of the United States of America, 43(7), 553-566.
Ozbudak, E. M., Thattai, M., Lim, H. N., Shraiman, B. I., & Van Oudenaarden, A. (2004). Multistability in the lactose utilization network of Escherichia coli. Nature, 427(6976), 737-740.
Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D., & van Oudenaarden, A. (2002). Regulation of noise in the expression of a single gene. Nature Genetics, 31(1), 69-73.
Peccoud, J., & Ycart, B. (1995). Markovian modeling of gene-product synthesis. Theoretical Population Biology, 48(2), 222-234.
Pedraza, J. M., & van Oudenaarden, A. (2005). Noise propagation in gene networks. Science, 307(5717), 1965-1969.
Raj, A., Rifkin, S. A., Andersen, E., & van Oudenaarden, A. (2010). Variability in gene expression underlies incomplete penetrance. Nature, 463(7283), 913-918.
Raj, A., Peskin, C., Tranchina, D., Vargas, D., & Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol, 4(10), e309.
Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008).
Imaging individual mRNA molecules using multiple singly labeled probes. Nat Meth, 5(10), 877-879.
Raser, J. M., & O'Shea, E. K. (2004). Control of stochasticity in eukaryotic gene expression. Science, 304(5678), 1811-1814. doi:10.1126/science.1098641
Salman, H., Brenner, N., Tung, C., Elyahu, N., Stolovicki, E., Moore, L., ... Braun, E. (2012). Universal protein fluctuations in populations of microorganisms. Physical Review Letters, 108(23), 238105.
Sanchez, A., & Golding, I. (2013). Genetic determinants and cellular constraints in noisy gene expression. Science, 342(6163), 1188-1193.
Schrödinger, E. (1944). What is life? Cambridge University Press.
Skupsky, R., Burnett, J. C., Foley, J. E., Schaffer, D. V., & Arkin, A. P. (2010). HIV promoter integration site primarily modulates transcriptional burst size rather than frequency. PLoS Computational Biology, 6(9), e1000952.
So, L., Ghosh, A., Zong, C., Sepulveda, L. A., Segev, R., & Golding, I. (2011). General properties of transcriptional time series in Escherichia coli. Nature Genetics, 43(6), 554-560.
St-Pierre, F., & Endy, D. (2008). Determination of cell fate selection during phage lambda infection. Proceedings of the National Academy of Sciences, 105(52), 20705-20710.
Struhl, K. (1996). Chromatin structure and RNA polymerase II connection: Implications for transcription. Cell, 84(2), 179-182.
Suel, G. M., Garcia-Ojalvo, J., Liberman, L. M., & Elowitz, M. B. (2006). An excitable gene regulatory circuit induces transient cellular differentiation. Nature, 440(7083), 545-550.
Suel, G. M., Kulkarni, R. P., Dworkin, J., Garcia-Ojalvo, J., & Elowitz, M. B. (2007). Tunability and noise dependence in differentiation dynamics. Science, 315(5819), 1716-1719.
Suter, D. M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., & Naef, F. (2011). Mammalian genes are transcribed with widely different bursting kinetics. Science, 332(6028), 472-474.
Taniguchi, Y., Choi, P. J., Li, G., Chen, H., Babu, M., Hearn, J., ... Xie, X. S.
(2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991), 533-538.
Tirosh, I., Barkai, N., & Verstrepen, K. J. (2009). Promoter architecture and the evolvability of gene expression. Journal of Biology, 8(95).
Tirosh, I., & Barkai, N. (2008). Two strategies for gene regulation by promoter nucleosomes. Genome Research, 18(7), 1084-1091.
To, T., & Maheshri, N. (2010). Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science, 327(5969), 1142-1145.
Volfson, D., Marciniak, J., Blake, W. J., Ostroff, N., Tsimring, L. S., & Hasty, J. (2006). Origins of extrinsic variability in eukaryotic gene expression. Nature, 439(7078), 861-864.
Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-seq: A revolutionary tool for transcriptomics. Nature Reviews. Genetics, 10(1), 57-63.
Yudkovsky, N., Ranish, J. A., & Hahn, S. (2000). A transcription reinitiation intermediate that is stabilized by activator. Nature, 408(6809), 225-229.
Zenklusen, D., Larson, D. R., & Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol, 15(12), 1263-1271.
Zhang, Z., Revyakin, A., Grimm, J. B., Lavis, L. D., Tjian, R., & Kadonaga, J. T. (2014). Single-molecule tracking of the transcription cycle by sub-second RNA detection. ELife Sciences, 3.

CHAPTER 2. Mode of transcriptional regulation can qualitatively affect gene behavior in positive feedback

2.1 Abstract

Cellular information processing often employs multi-stability for decision-making, memory and bet-hedging. Within gene networks, multi-stability is accomplished via positive feedback loops. We demonstrate with a theoretical case study that gene expression noise in these networks either stabilizes/creates or destabilizes/eliminates bimodal gene expression patterns when transcriptional activators modulate burst frequency or burst size, respectively. This illustrates how the mode by which regulatory elements actuate transcription can have profound implications for network and cell behavior. Hence correct characterization of stochastic transcription dynamics is important for the design and analysis of gene networks.

2.2 Introduction: Modes of regulating transcriptional bursting

Myriad studies have demonstrated that gene expression is stochastic, and a stochastic description of even simple regulatory networks can lead to unintuitive and even qualitatively different behavior as compared to a deterministic description (Acar, Becskei & van Oudenaarden, 2005; Blake et al., 2006; Cagatay et al., 2009; Elowitz et al., 2002; Kaern et al., 2005; Maamar, Raj & Dubnau, 2007; Maheshri & O'Shea, 2007; McAdams & Arkin, 1997; Raser & O'Shea, 2004; Suel et al., 2006; To & Maheshri, 2010; Turcotte, Garcia-Ojalvo & Suel, 2008). For example, stochastic noise propagated through networks can either increase (Rosenfeld et al., 2005) or decrease (Paulsson & Ehrenberg, 2000; Thattai & van Oudenaarden, 2001) variability of downstream gene expression. It has been debated whether noise in gene expression could have evolutionary benefits, such as in creating diversity in response to a change in environmental conditions (e.g. the decision between the lytic or lysogenic response in phage-infected bacteria; Raj & van Oudenaarden, 2008), or whether noise is an unavoidable consequence of molecular events and evolution has uniformly selected against it (McAdams & Arkin, 1999; Raj et al., 2010). Recent single-molecule approaches suggest that noisy gene expression largely arises from random and intermittent "bursts" of transcription (Cai, Friedman & Xie, 2006; Chubb et al., 2006; Golding et al., 2005; Raj & van Oudenaarden, 2009; Taniguchi et al., 2010).
A two-state promoter model has been employed to interpret these results (Peccoud & Ycart, 1995; Raj et al., 2006; Shahrezaei, Ollivier & Swain, 2008; introduced in Chapter 1). It models promoters transitioning between inactive and active states, and producing transcript when active (Figure 1-2). This model has three kinetic parameters: the promoter activation rate, the promoter deactivation rate, and the transcription rate when active. The observed "bursty" transcription is consistent with rare transitions of the promoter from the inactive state to a short-lived but highly productive active state, resulting in a burst of mRNA expression. The statistics of bursting can be succinctly described by two parameters: the burst frequency (the promoter activation rate), which is the number of bursts over the lifetime of mRNA or protein, and the burst size (the ratio of the transcription rate to the deactivation rate), which is the number of mRNA or proteins produced per burst.

For a "bursty" gene, the mean level of gene expression is simply the product of burst size and burst frequency. Transcription factors that modulate this mean level may do so by affecting burst frequency, burst size, or a combination of both (Hahn, 1998). Which mode of regulation is being employed can be inferred by examining how the intrinsic noise in gene expression scales with the mean expression level (Friedman, Cai & Xie, 2006; Raser & O'Shea, 2004). When activators regulate burst frequency, gene expression noise decreases with increased expression (with an inverse square-root dependence on protein abundance); in contrast, when activators regulate burst size, gene expression noise is constant (Figure 2-1).

Several experimental studies have examined the dependence of gene expression noise on cis and trans factors.
Strong TATA boxes (Mogno et al., 2010; Raser & O'Shea, 2004) and a higher number and affinity of activator binding sites (Raj et al., 2006; Suter et al., 2011; To & Maheshri, 2010) appear to increase burst size. In cases where intrinsic noise in protein expression was directly measured in response to changing a transcriptional activator, such as at the PHO5 gene in budding yeast (Raser & O'Shea, 2004) or a lac promoter variant in E. coli (Pedraza & van Oudenaarden, 2005), the intrinsic noise appears to scale with the inverse square root of protein abundance, characteristic of burst frequency regulation. Still, a wealth of biochemical evidence exists for activators (and repressors) influencing transcriptional reinitiation and thereby potentially burst size (Hahn, 2004).

In eukaryotes, burst sizes have been measured in the range of $10^0$-$10^2$ mRNA (reviewed in Sanchez & Golding, 2013). Generally, regulable genes exhibit low basal expression, including a basal burst size in the absence (or presence) of activators (or repressors). A typical average mRNA copy number of such repressed genes in yeast is ~0.1-1 mRNA per cell (Holland, 2002). If these genes were subject to pure burst frequency regulation, then the basal burst size would remain 10-100 mRNAs at very low levels of expression. That would mean a small but measurable (~0.1%) fraction of cells would appear strongly ON for gene expression, yet this has not been reported. Therefore, it would seem that burst size must be regulated at some point in the transition from basal to regulated expression. Single-molecule studies examining the synthetic Tet OFF system in HeLa cells (Raj et al., 2006) and the lac repressor in E. coli (Choi et al., 2008) provide evidence of burst size regulation. Global studies of noise in protein expression in E.
coli (Taniguchi et al., 2010) and yeast (Bar-Even et al., 2006; Newman et al., 2006) show that the noise of the majority of genes scales with the inverse square root of abundance at low to intermediate levels of expression, where extrinsic noise does not dominate. One interpretation consistent with the data is that the higher expression level of stronger promoters is due to a higher burst frequency. While this may be true for many "constitutive" housekeeping genes that tend to be less noisy, highly regulable genes tend to deviate from this dependence (Bar-Even et al., 2006; Zenklusen, Larson & Singer, 2008). Close examination of the E. coli data set indicates that low-expressing promoters (corresponding to <20 proteins) span a 10-fold range of both burst size and frequency (Taniguchi et al., 2010). This is consistent with a recent examination of burst statistics in several different E. coli promoters (So et al., 2011), which suggests that differences in expression level are largely due to changes in burst size. Taken together, the notion that both burst size and burst frequency can be regulated by trans factors for many genes seems a reasonable one. Here, we investigate how burst size regulation and burst frequency regulation differ in their effect on the outcome of simple gene circuits involving feedback loops. Positive feedback with burst frequency regulation has previously been shown to stabilize deterministically bistable states or create a bimodal expression distribution when bistability is not predicted (Friedman, Cai & Xie, 2006; Karmakar & Bose, 2007; Samoilov, Plyasunov & Arkin, 2005; To & Maheshri, 2010). Using both analytical and computational methods, we demonstrate that burst size regulation can have the opposite effect, destabilizing deterministically bistable states and eliminating bimodal expression.
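The noise-scaling argument used in this introduction can be checked with a few lines of arithmetic. The sketch below assumes the bursty limit, in which steady-state expression is Gamma distributed with shape equal to the burst frequency and scale equal to the burst size, so that the mean is their product and the noise (coefficient of variation) is the inverse square root of the burst frequency. The function names and the parameter values are illustrative choices, not values taken from this chapter.

```python
import math

def gamma_moments(burst_freq, burst_size):
    """Mean and noise (CV) of a Gamma(shape=burst_freq, scale=burst_size) distribution."""
    mean = burst_freq * burst_size
    cv = 1.0 / math.sqrt(burst_freq)  # CV^2 = variance / mean^2 = 1 / shape
    return mean, cv

def frequency_regulated(target_mean, burst_size):
    """Reach the target mean by changing burst frequency at fixed burst size."""
    return gamma_moments(target_mean / burst_size, burst_size)

def size_regulated(target_mean, burst_freq):
    """Reach the target mean by changing burst size at fixed burst frequency."""
    return gamma_moments(burst_freq, target_mean / burst_freq)

# Hypothetical operating points: the same means reached by the two modes.
for target_mean in (5.0, 20.0, 80.0):
    _, cv_freq = frequency_regulated(target_mean, burst_size=10.0)
    _, cv_size = size_regulated(target_mean, burst_freq=0.5)
    print(f"mean={target_mean:5.1f}  CV(frequency)={cv_freq:.3f}  CV(size)={cv_size:.3f}")
```

Under frequency regulation a four-fold increase in the mean halves the noise, whereas under size regulation the noise stays fixed at the value set by the burst frequency.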
2.3 Theory

2.3.1 Steady-state expression of frequency- or size-regulated stochastic transcriptional bursting in feedback control

We use a standard two-state promoter model of gene expression (Raj et al., 2006):

$$I \underset{\gamma}{\overset{\lambda}{\rightleftharpoons}} A \xrightarrow{\mu} A + x, \qquad x \xrightarrow{\delta} \varnothing \tag{2-1}$$

Here $\lambda$ is the promoter activation rate, $\gamma$ the promoter inactivation rate, $\mu$ the transcription rate of the active promoter, and $\delta$ the degradation rate of x. All rates have been normalized by $\delta$, so that time is measured in units of the lifetime of x. If fluctuations are only assumed to arise due to stochastic transitions between the inactive and active promoter states, then a Fokker-Planck equation can be formulated for the probability density of x (Raj et al., 2006):

$$\frac{d}{dx}\big(f(x)\,p(x)\big) = g(x)\,p(x), \qquad f(x) = -x\left(1 - \frac{x}{\mu}\right), \qquad g(x) = -\lambda\left(1 - \frac{x}{\mu}\right) + \gamma\,\frac{x}{\mu} \tag{2-2}$$

While x naturally corresponds to mRNA, we will assume coupled transcription and translation so that it may correspond to protein. This assumption is exact when mRNA and protein lifetimes differ greatly (Raj et al., 2006). The continuum approximation on x is valid when the production of x is high, $\mu/\delta \gg 1$. The solution to the Fokker-Planck equation has been shown to be a Beta distribution (Raj et al., 2006). Under conditions of "bursty" gene expression, where gene activation is rare compared to inactivation ($\lambda \ll \gamma$) and promoter inactivation is fast relative to mRNA/protein degradation ($\gamma \gg 1$), the solution simplifies to a Gamma distribution (Friedman, Cai & Xie, 2006) characterized by two parameters: the burst frequency $\lambda$ and the burst size $\mu/\gamma$. (The discrete equivalent, without the continuum approximation, is the negative binomial distribution.)

We extend this model to the case of feedback where x increases gene expression by either increasing burst frequency or burst size. To do so, we let $\lambda$ and $\gamma$ depend explicitly on x using a Hill-like functional form:

$$\lambda = \lambda_0\left(\epsilon_\lambda + \frac{1}{1 + (K_\lambda/\tilde{x})^m}\right) \tag{2-3}$$

$$\gamma = \gamma_0\left(\epsilon_\gamma + \frac{1}{1 + (\tilde{x}/K_\gamma)^n}\right) \tag{2-4}$$

where $\tilde{x} = x/\mu$ is a normalized mRNA/protein number with respect to the maximum level if the promoter were always in the active configuration.
Both $K_\lambda$ and $K_\gamma$ represent a threshold value for half-maximal activation, $\lambda_0\epsilon_\lambda$ and $\gamma_0(1+\epsilon_\gamma)$ represent basal expression, and m and n are the Hill coefficients. We then solve the modified Fokker-Planck equation including either Eqn. 2-3 or 2-4 to find the steady-state probability density for the two types of regulation in the case of feedback.

$$p_{bf}(\tilde{x}) = C\,\big(K_\lambda^m + \tilde{x}^m\big)^{\lambda_0/m}\,\tilde{x}^{\lambda_0\epsilon_\lambda - 1}\,(1-\tilde{x})^{\gamma - 1} \tag{2-5}$$

$$p_{bs}(\tilde{x}) = C\,\exp\!\left[-\gamma_0\int_0^{\tilde{x}} \frac{d\tilde{x}'}{\big(1 + (\tilde{x}'/K_\gamma)^n\big)(1-\tilde{x}')}\right]\tilde{x}^{\lambda - 1}\,(1-\tilde{x})^{\gamma_0\epsilon_\gamma - 1} \tag{2-6}$$

Here, C represents a normalization constant. We have previously reported 2-5 in the Supplemental Text of To & Maheshri (2010). The integral in 2-6 can be evaluated in closed form for n = 1. Our use of burst size and frequency as subscripts is premature, because 2-5 and 2-6 do not assume bursty gene expression; they only require that promoter fluctuations are the sole source of variability. Making the "bursty" gene expression assumption for 2-5 yields:

$$p(\tilde{x}) \propto \big(K_\lambda^m + \tilde{x}^m\big)^{\lambda_0/m}\,\tilde{x}^{\lambda_0\epsilon_\lambda - 1}\exp[-\gamma\tilde{x}] \tag{2-7}$$

(where $\gamma\tilde{x} = x/(\mu/\gamma)$ is x measured in units of the burst size), which is equivalent to equation 9 in Friedman, Cai & Xie (2006). That equation was derived assuming translational bursts in protein levels, although it was argued that the same equation applies to transcriptional bursts from promoter fluctuations, which is in agreement with the results here. Moreover, while in this model the only source of noise is promoter fluctuations, in the "bursty" limit the noise from random births and deaths of x is negligible. Finally, an alternative expression to 2-6 can be derived if x influences the transcription rate $\mu$ (Section 2.3.3), but both expressions are identical in the "bursty" limit, as $\mu$ and $\gamma$ are no longer independent and both influence burst size. We note these results provide an analytical solution to the total distribution in the case of feedback regulation, not just the moments. Analytical solutions for feedback in genes with Hill-like functions (Friedman, Cai & Xie, 2006; Karmakar & Bose, 2007) have been limited to burst frequency regulation.
An exact analytical solution for feedback has been reported with no continuum approximation (Hornos et al., 2005) but it is restricted to a repressor increasing the promoter deactivation rate in a linear manner.

2.3.2 The regime of bimodal expression

The analytical expressions for the mRNA/protein (x) distribution in feedback provide a convenient starting point to determine if and when the distribution is bimodal. To derive the bimodal conditions, the general strategy is to determine the number of extrema in the distribution by analyzing the number of real roots of the derivative of the distribution. Importantly, we only have to restrict our attention to the range $\tilde{x} \in [0,1]$. For burst frequency regulation, we start by taking the derivative of 2-5:

$$\frac{1}{p_{bf}}\frac{dp_{bf}}{d\tilde{x}} = \left[\frac{\lambda_0\epsilon_\lambda - 1}{\tilde{x}} + \frac{\lambda_0\,\tilde{x}^{m-1}}{K_\lambda^m + \tilde{x}^m} - \frac{\gamma - 1}{1-\tilde{x}}\right] \tag{2-8}$$

First, it is worth noting that the nature of the distribution at the boundaries $\tilde{x} = 0, 1$ is simply given by the 1st and 3rd terms in the bracketed expression of 2-8. For $\lambda_0\epsilon_\lambda$ greater than 1 (high basal burst frequency) the slope of the distribution at $\tilde{x} = 0$ is positive and the probability mass is shifted away from 0. If $\gamma$ ever falls below 1 (slow promoter fluctuations), then the slope of the distribution at $\tilde{x} = 1$ is positive and the probability mass is shifted to 1. (If the promoter were always on, the distribution would be a delta function at $\tilde{x} = 1$ because mRNA/protein dynamics are deterministic.)

Under bursty conditions $\gamma \gg 1$ and the third term is always negative. The second term is always positive. If $\lambda_0\epsilon_\lambda > 1$, then the first term is also always positive. Therefore the derivative changes from positive to negative only once, corresponding to a single maximum of a unimodal distribution. This is true for any Hill coefficient m. Basal bursts are large and noisy enough to always keep the promoter ON. When $\lambda_0\epsilon_\lambda < 1$, the derivative can change from negative to positive, and then back to negative, corresponding to two maxima and a bimodal distribution.
Intuitively, the transition from negative to positive occurs at a value of $\tilde{x}$ where the middle term, representing the activator-dependent burst frequency regulation, becomes larger in magnitude than the first term while the third term is still comparatively small. To derive conditions under which this occurs, we set the bracketed term equal to zero and rewrite it as a polynomial in $\tilde{x}$:

$$(L + G + \lambda_0)\,\tilde{x}^{m+1} - (L + \lambda_0)\,\tilde{x}^{m} + K_\lambda^m (L + G)\,\tilde{x} - K_\lambda^m L = 0 \tag{2-9}$$

where $L = \lambda_0\epsilon_\lambda - 1$ and $G = \gamma - 1$. This was used to generate the plots in Figure 2-2. Roots for this polynomial can be found numerically for any particular set of parameters. For better understanding, we explicitly derive conditions for bimodality in two extreme cases, $m = 1$ and $m \to \infty$. With m = 1, explicitly assuming bursty expression ($G \gg L + \lambda_0$) simplifies 2-9 to:

$$G\,\tilde{x}^2 + \big(G K_\lambda - (L + \lambda_0)\big)\,\tilde{x} - K_\lambda L = 0 \tag{2-10}$$

This quadratic equation can be solved and has two real roots (corresponding to a bimodal expression profile) when the discriminant is positive. The discriminant is:

$$(G K_\lambda)^2 - 2 G K_\lambda(\lambda_0 + 1) + (\lambda_0 - 1)^2 + \epsilon_\lambda\lambda_0\,(2 G K_\lambda + 2\lambda_0 - 2) + \epsilon_\lambda^2\lambda_0^2 \tag{2-11}$$

Recognizing that $G K_\lambda \approx \gamma K_\lambda$,

$$\gamma K_\lambda < \lambda_0(1-\epsilon_\lambda) + 1 - 2\sqrt{\lambda_0(1-\lambda_0\epsilon_\lambda)} \qquad\text{or}\qquad \frac{1}{K_\lambda} > \frac{\gamma}{\lambda_0(1-\epsilon_\lambda) + 1 - 2\sqrt{\lambda_0(1-\lambda_0\epsilon_\lambda)}} \tag{2-12}$$

is the condition for bimodal expression (provided $\lambda_0\epsilon_\lambda < 1$). For a given burst frequency, bimodal expression occurs when the burst size is large enough compared to the promoter threshold. While 2-12 yields bimodal expression for larger burst sizes, or stronger feedback (via a smaller effective promoter threshold), the OFF peak at $\tilde{x} = 0$ becomes vanishingly small. Therefore, beyond some feedback strength the response is realistically unimodal (all ON). We operationally define a distribution as bimodal when at least 5% of the probability mass lies in each of the ON and OFF populations and the ON and OFF peaks are separated by at least 5 mRNA counts. For the case of $m \to \infty$, we return to the bracketed term in 2-8, which can now be simplified to:

$$\frac{L}{\tilde{x}} + \frac{\lambda_0\,\theta(\tilde{x} - K_\lambda)}{\tilde{x}} - \frac{G}{1-\tilde{x}} \tag{2-13}$$

where $\theta(z)$ is the Heaviside step function, which is zero for $z < 0$ and unity for $z > 0$.
With $\lambda_0\epsilon_\lambda < 1$ and $\gamma \gg 1$, the first and third terms are always negative. Hence the derivative can only have a zero if the middle term is large enough that, for some intermediate value of $K_\lambda$, 2-13 is positive. In this case it must have two zeros because for large enough $\tilde{x}$ the third term will dominate and the derivative will become negative again. Thus the condition for bimodality is:

$$\frac{\lambda_0(1+\epsilon_\lambda) - 1}{K_\lambda} > \frac{\gamma - 1}{1 - K_\lambda} \qquad\text{or}\qquad \frac{1}{K_\lambda} > 1 + \frac{\gamma - 1}{\lambda_0(1+\epsilon_\lambda) - 1} \tag{2-14}$$

Equation 2-14 is very similar to 2-12. There is a wider range of bimodality because the radical term in Eq. 2-12 is not present in 2-14; hence for any given burst frequency, the right hand side is smaller in 2-14. The effect of the first term on the right hand side of Eq. 2-14 is negligible in the bursting limit. While intermediate values of m lead to more complicated conditions, they tend to increase the range of bimodality from the lower limit of Eq. 2-12 to the upper limit of 2-14.

For burst size regulation, we begin by taking the derivative of Eq. 2-6:

$$\frac{1}{p_{bs}}\frac{dp_{bs}}{d\tilde{x}} = \left[\frac{\lambda - 1}{\tilde{x}} - \frac{\gamma_0}{\big(1 + (\tilde{x}/K_\gamma)^n\big)(1-\tilde{x})} - \frac{\gamma_0\epsilon_\gamma - 1}{1-\tilde{x}}\right] \tag{2-15}$$

The first term dominates for small $\tilde{x}$ and can be positive or negative depending on $\lambda$ (where for $\lambda < 1$ there will be a peak at $\tilde{x} = 0$). The third term is always negative since the promoter inactivation rate remains fast even at the maximum burst size ($\gamma_0\epsilon_\gamma - 1$ is always greater than zero). The middle term is always negative but goes to zero for large $\tilde{x}$. Therefore, for $\lambda < 1$, all 3 terms are negative and the distribution is peaked at $\tilde{x} = 0$ and monotonically decreases. For $\lambda > 1$, bimodality is possible if the sum of the three terms changes sign from positive to negative to positive to negative. This occurs when for some $\tilde{x} < K_\gamma$ the sum changes from positive to negative as the middle term gets larger; then for some $\tilde{x} > K_\gamma$ the sum changes back to positive as the middle term gets smaller; and then finally the sum becomes negative as the last term dominates.
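The sign-change arguments above can be checked numerically by evaluating the two log-derivatives on a grid and counting peaks. The sketch below is a minimal illustration, assuming the Hill forms of Eqs. 2-3 and 2-4 and the bracketed expressions that follow from them; the function names and all parameter values are hypothetical choices made for illustration, not the values used for Figure 2-2.

```python
import numpy as np

def slope_frequency(x, lam0, eps, K, m, gamma):
    """Log-derivative of the density when feedback raises burst frequency."""
    return ((lam0 * eps - 1.0) / x
            + lam0 * x ** (m - 1) / (K ** m + x ** m)
            - (gamma - 1.0) / (1.0 - x))

def slope_size(x, lam, gamma0, eps, K, n):
    """Log-derivative of the density when feedback raises burst size
    (by lowering the promoter inactivation rate)."""
    return ((lam - 1.0) / x
            - gamma0 / ((1.0 + (x / K) ** n) * (1.0 - x))
            - (gamma0 * eps - 1.0) / (1.0 - x))

def count_modes(slope):
    """Count peaks: interior maxima (+ to - sign changes) plus a boundary
    peak at x = 0 when the slope starts out negative."""
    sign = np.sign(slope)
    interior = int(np.sum((sign[:-1] > 0) & (sign[1:] < 0)))
    return interior + int(sign[0] < 0)

# Normalized expression grid, avoiding the singular end points 0 and 1.
x = np.linspace(1e-4, 1.0 - 1e-4, 20001)

# Hypothetical bursty parameter sets (inactivation much faster than activation).
modes_freq_m1 = count_modes(slope_frequency(x, lam0=4.0, eps=0.05, K=0.01, m=1, gamma=50.0))
modes_size_n1 = count_modes(slope_size(x, lam=3.0, gamma0=50.0, eps=0.1, K=0.15, n=1))
modes_size_n8 = count_modes(slope_size(x, lam=3.0, gamma0=50.0, eps=0.1, K=0.15, n=8))
print(modes_freq_m1, modes_size_n1, modes_size_n8)
```

For these illustrative values, frequency regulation with a Hill coefficient of 1 gives two peaks (one at zero and one interior), size regulation with a Hill coefficient of 1 gives a single peak, and size regulation needs a steep response (here n = 8) before a second peak appears.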
As in the case of burst frequency regulation, we can set the bracketed term to zero and rewrite it as a polynomial in $\tilde{x}$:

$$\big[(1-\lambda) + (1-\gamma_0\epsilon_\gamma)\big]\tilde{x}^{n+1} - (1-\lambda)\,\tilde{x}^{n} + K_\gamma^n\big[(1-\lambda) - \gamma_0 + (1-\gamma_0\epsilon_\gamma)\big]\tilde{x} - K_\gamma^n(1-\lambda) = 0 \tag{2-16}$$

A bimodal distribution is possible only when $1 - \lambda < 0$ and Eq. 2-16 has 3 positive real roots. For better understanding, we explicitly derive conditions for bimodality in two extreme cases, $n = 1$ and $n \to \infty$. With a Hill coefficient of 1, we rewrite Eq. 2-16 for the case of $n = 1$:

$$\big[(1-\lambda) + (1-\gamma_0\epsilon_\gamma)\big]\tilde{x}^{2} + \big[K_\gamma\big((1-\lambda) - \gamma_0 + (1-\gamma_0\epsilon_\gamma)\big) - (1-\lambda)\big]\tilde{x} - K_\gamma(1-\lambda) = 0 \tag{2-17}$$

Bursty expression implies a short-lived active promoter state, even in the presence of saturating amounts of transcriptional activator. Therefore, $1 - \gamma_0\epsilon_\gamma < 0$. Combined with $1 - \lambda < 0$, the quadratic coefficient is always negative and the constant coefficient is always positive. It is also clear that for $K_\gamma \geq 1$, the linear coefficient is always negative. For $K_\gamma < 1$, the linear coefficient is also always negative, provided $\gamma_0 \gg \lambda$, a condition of bursty expression. To see this, the linear coefficient in Eq. 2-17 can be rewritten as $(K_\gamma - 1)(1-\lambda) - K_\gamma\gamma_0 + K_\gamma(1 - \gamma_0\epsilon_\gamma)$. The first term of this expression is positive but always strictly less than $\lambda$ and the third term is always negative; therefore the linear coefficient is always negative. Then, by Descartes' rule of signs, there is only ever one positive real root under bursty conditions, which implies that there is always a unimodal distribution. In other words, in contrast to pure burst frequency regulation, pure burst size regulation will never yield a bimodal response with a Hill coefficient of 1. For the case of $n \to \infty$, we return to the bracketed term in 2-15, which can now be simplified to:

$$\frac{\lambda - 1}{\tilde{x}} - \frac{\gamma_0\,\theta(K_\gamma - \tilde{x})}{1-\tilde{x}} - \frac{\gamma_0\epsilon_\gamma - 1}{1-\tilde{x}} \tag{2-18}$$

where $\theta(z)$ is again the Heaviside step function. In the relevant case of $1 - \lambda < 0$ and $1 - \gamma_0\epsilon_\gamma < 0$, the derivative is always positive for $\tilde{x}$ close to 0 and negative for $\tilde{x}$ close to 1. The condition for bimodal expression then is that the derivative changes sign from negative to positive at some intermediate $\tilde{x} = K_\gamma$
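where the middle term in Eq. 2-18 drops out. Formally, this translates into the following two conditions:

$$\frac{\lambda - 1}{K_\gamma} - \frac{\gamma_0}{1-K_\gamma} - \frac{\gamma_0\epsilon_\gamma - 1}{1-K_\gamma} < 0, \qquad \frac{\lambda - 1}{K_\gamma} - \frac{\gamma_0\epsilon_\gamma - 1}{1-K_\gamma} > 0 \tag{2-19}$$

The two conditions can be combined to yield the range of $K_\gamma$ over which bimodal expression occurs:

$$\frac{\lambda - 1}{\lambda + \gamma_0(1+\epsilon_\gamma) - 2} < K_\gamma < \frac{\lambda - 1}{\lambda + \gamma_0\epsilon_\gamma - 2} \tag{2-20}$$

Next we compare the stochastic and deterministic range of bimodality. In the deterministic case, the differential equation describing positive feedback in terms of the microscopic rate constants used here is:

$$\frac{dx}{dt} = \mu\,\frac{\lambda(x)}{\lambda(x) + \gamma(x)} - x \tag{2-21}$$

where we make explicit that $\lambda$ and $\gamma$ are potentially functions of x depending on whether burst size or frequency is being regulated. (The denominator can be simplified for bursty expression since $\gamma \gg \lambda$, but we will treat this general case.) To find the fixed points of 2-21, we set the right hand side equal to zero. Rewriting in terms of the rescaled constants:

$$\tilde{x} = \frac{\lambda(\tilde{x})}{\lambda(\tilde{x}) + \gamma(\tilde{x})} \tag{2-22}$$

We want to make an explicit comparison of the range of bimodality in terms of parameters for the deterministic case versus the stochastic case. For n, m = 1, we already know that Eq. 2-22 only yields a single fixed point and no deterministic bistability is possible. For $m, n \to \infty$ the range of bistability is easily calculated. For example, for $m \to \infty$, Eq. 2-22 can be rewritten as:

$$\tilde{x} = \frac{\lambda_0\big(\epsilon_\lambda + \theta(\tilde{x} - K_\lambda)\big)}{\gamma + \lambda_0\big(\epsilon_\lambda + \theta(\tilde{x} - K_\lambda)\big)} \tag{2-23}$$

which is an explicit expression for the fixed points $\tilde{x}$ that can be evaluated for $\tilde{x}$ greater or less than $K_\lambda$. Bistability occurs when the calculated fixed point is consistent with it being greater or less than $K_\lambda$. This leads to the following range of bistability for burst frequency regulation, Eq. 2-24, and burst size regulation, Eq. 2-25:

$$\frac{\lambda_0\epsilon_\lambda}{\gamma + \lambda_0\epsilon_\lambda} < K_\lambda < \frac{\lambda_0(1+\epsilon_\lambda)}{\gamma + \lambda_0(1+\epsilon_\lambda)} \tag{2-24}$$

$$\frac{\lambda}{\lambda + \gamma_0(1+\epsilon_\gamma)} < K_\gamma < \frac{\lambda}{\lambda + \gamma_0\epsilon_\gamma} \tag{2-25}$$

The deterministic range of bistability of equations 2-24 and 2-25 can be directly compared to the stochastic range of bimodality of equations 2-14 and 2-20.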
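As a worked illustration of this comparison for burst size regulation with a step-like response, the sketch below evaluates both ranges. It assumes the closed forms that follow from the model as set up here: the deterministic limits are the self-consistent fixed points $\lambda/(\lambda+\gamma)$ on either side of the threshold, and the stochastic limits come from the sign conditions on the log-derivative at the threshold. The function names and parameter values are hypothetical.

```python
def deterministic_range(lam, gamma0, eps):
    """Threshold range giving two self-consistent fixed points (step response)."""
    low = lam / (lam + gamma0 * (1.0 + eps))   # OFF fixed point: inactivation at gamma0*(1+eps)
    high = lam / (lam + gamma0 * eps)          # ON fixed point: inactivation at gamma0*eps
    return low, high

def stochastic_range(lam, gamma0, eps):
    """Threshold range over which the density slope turns positive at the threshold."""
    low = (lam - 1.0) / (lam + gamma0 * (1.0 + eps) - 2.0)
    high = (lam - 1.0) / (lam + gamma0 * eps - 2.0)
    return low, high

# Hypothetical bursty parameters: burst frequency above 1, fast inactivation.
for lam in (1.5, 3.0, 6.0):
    d_low, d_high = deterministic_range(lam, gamma0=50.0, eps=0.1)
    s_low, s_high = stochastic_range(lam, gamma0=50.0, eps=0.1)
    print(f"lambda={lam}: deterministic ({d_low:.4f}, {d_high:.4f}), "
          f"stochastic ({s_low:.4f}, {s_high:.4f})")
```

With these values both stochastic limits fall below their deterministic counterparts, and the gap at the lower limit closes only as the burst frequency approaches $\gamma_0(1+\epsilon_\gamma)$, where the bursty approximation no longer applies.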
For both burst size and burst frequency regulation, the upper limit of the stochastic range of bimodality is less than its deterministic counterpart, but approaches it for large burst frequency and small burst size. Near this upper limit, the stable steady states are close (a pitchfork bifurcation occurs at the upper limit) and so noise in expression can easily promote transitions, erasing any distinction. The lower limit of the stochastic range of bimodality is more complicated. For burst frequency regulation, while there is in principle no lower limit (in the continuum approximation for protein levels), we have operationally defined it as the point at which less than 5% of the probability mass remains in the peak corresponding to little or no protein. This definition leads to a stochastic lower limit that is slightly lower than the deterministic lower limit. For burst size regulation, comparison of Eq. 2-20 with 2-25 reveals that for smaller $\lambda$, the lower limit of the stochastic range of bimodality is significantly lower than the deterministic lower limit. There is a particular $\lambda$ (easily found by equating the lower limits of Eq. 2-20 and 2-25) at which this relationship flips. However, at this higher value of $\lambda$, the burst approximation breaks down. With the operational lower limit set by the vanishing OFF population, the stochastic range is shrunk relative to the deterministic range, as supported by the numerical solution in Figure 2-2.

2.3.3 Two modes of regulating burst size are equivalent in the bursting limit

In the previous discussion, burst size regulation has been modeled as a change in $\gamma$. In the burst limit, either $\gamma$ or $\mu$ could be regulated and there should be no effect on expression. To show this explicitly, consider an alternative to Eq. 2-4 that is similar to 2-3:

$$\mu = \mu_0\left(\epsilon_\mu + \frac{1}{1 + (K_\mu/x)^m}\right) \tag{2-26}$$

For now, we do not normalize x by $\mu$. For illustrative purposes, we focus on the case of m = 1, although similar arguments can be made for arbitrary m.
With this regulation, the ODE in Eq. 2-2 can be solved. After applying the bursty assumption $\gamma \gg \lambda$ and rearranging:

(2-27) [equation illegible in source]

Equation 2-27 is identical to 2-17, with the following change of variables: [change of variables illegible in source]. These transformations are due to the reciprocal relationship in the effect of changing $\gamma$ and $\mu$ on burst size.

2.4 Results

2.4.1 Bimodal expression patterns associated with positive feedback loops are enhanced with burst frequency regulation but reduced with burst size regulation

To compare pure burst size and burst frequency regulation in a mathematically controlled manner we require an identical dependence of the mean level of gene expression on a transcriptional regulator for both forms of regulation. Therefore, both the mean basal level and the fold change in expression are equal for the two cases, which constrains the other parameters (Table 2-1). However, the expression distribution is very different, and the earlier described scaling relationships of noise on abundance are apparent (Figure 2-1). To focus on the differences between the two types of regulation, we neglect extrinsic noise (plasmid copy number, global expression capacity, etc.).

Table 2-1: Parameters selected for controlled comparison of burst frequency and size regulation

Regulation        Fold change in expression   Range of mean expression [mRNA]   μ [mRNA/minute]   λ0 [per mRNA lifetime]   ελ      γ0 [per minute]   εγ
Burst frequency   41                          [1.25, 50]                        500               5                        0.025   50                n/a
Burst size        41                          [1.25, 50]                        500               1.25                     n/a     488               0.025

[Figure 2-1: two panels with curves for frequency regulation and size regulation; horizontal axes "Input Signal" (left) and "Mean Expression" (right).]

Figure 2-1: Despite an identical mean response to an open-loop input signal, frequency and size regulation have different expression distributions with different noise levels.

Next, we compare the effect of burst size and burst frequency regulation on a gene within a positive feedback loop, where the Hill coefficient in the promoter dose-response characteristic is varied (Figure 2-2). We calculate the distributions at different values of the feedback strength both analytically, using Equations 2-5 and 2-6, and numerically, using either a finite Markov approach (Munsky & Khammash, 2006) or the Gillespie algorithm (Gillespie, 1977). Feedback strength is varied by changing K, the threshold level of regulator which leads to half-maximal activation. We define the regime of bimodality as the range of K over which one observes a bimodal distribution with at least 5% of the distribution lying in each of the ON and OFF states. As has been observed earlier theoretically (Friedman, Cai & Xie, 2006; Karmakar & Bose, 2007; Samoilov, Plyasunov & Arkin, 2005) and experimentally (To & Maheshri, 2010), with burst frequency regulation at intermediate feedback strengths one observes a bimodal distribution even with a Hill coefficient of 1. In general, burst frequency regulation widens the range of feedback strengths over which bimodal expression is observed, with the extent of widening diminishing with an increased Hill coefficient of the autoregulatory response. In contrast, burst size regulation destabilizes the bistability and decreases the range of feedback strengths over which bimodal expression is observed. With a Hill coefficient ≤ 1, burst size regulation never results in a bimodal response.

Why does burst frequency regulation result in bimodal expression over a larger range of feedback strengths? Without feedback, bursty gene expression results in a gamma distribution of mRNA or protein at steady-state (Peccoud & Ycart, 1995; Raj et al., 2006; Shahrezaei, Ollivier & Swain, 2008).
If the burst frequency is less than 1, bursts occur less than once per mRNA or protein lifetime, a significant fraction of cells have no mRNA or protein, and the distribution is peaked at zero and monotonically decreasing. Burst frequencies greater than 1 result in a distribution peaked at a non-zero value. When burst frequency is regulated, a bimodal expression distribution with peaks at a zero and a non-zero value can be observed. This occurs when the positive feedback loop samples burst frequencies across a range spanning a burst frequency of 1. Given that the basal burst frequency set here is less than 1, this always happens for large enough feedback strength. In contrast, when burst size is regulated the burst frequency is set to some fixed quantity while the positive feedback loop samples various burst sizes. Increasing burst size increases the mean level of expression, but it does so by widening the distribution rather than shifting it from an OFF to an ON peak. With higher Hill coefficients, the underlying deterministic bistability results in a bimodal expression profile, but this is destabilized by burst size regulation (Figure 2-2, right).

[Figure 2-2: expression distributions (mRNA per cell) for burst frequency regulation (left) and burst size regulation (right), with noncooperative (n = 1, top) and cooperative (n > 1, bottom) feedback; center panels show the region of bimodality for the deterministic case and for stochastic frequency and size regulation.]

Figure 2-2: Qualitative differences in population variability due to positive feedback loops depending on burst size and burst frequency regulation. Simulation of positive feedback with burst frequency regulation (left) and burst size regulation (right) shows that burst frequency regulation can create bimodality in the absence of deterministic bistability whereas burst size regulation cannot (top).
In the presence of deterministic bistability (bottom), burst frequency regulation stabilizes the bistability and extends the regime of bimodality, whereas burst size regulation destabilizes the bistability and reduces the bimodal regime. Colored plots are expression distributions at four different expression levels, modulated via feedback strength and simulated with the finite Markov chain approach. Center panels summarize the range of feedback strengths for which the system has bimodal expression.

2.4.2 Mode of regulation affects mean expression

These qualitative differences between burst size and frequency regulation affect the mean expression level. The bimodal expression profile generated with burst frequency regulation results in a mean expression that is less than the deterministic expectation at intermediate feedback strengths (Figure 2-3, left, blue). It is the low expression peak that decreases the mean. The extent of this decrease will increase for lower basal burst frequencies. This is intimately connected to the fact that lower basal burst frequencies lead to a larger range of feedback strength over which a bimodal profile is observed. For burst size regulation, the mean of the unimodal distribution is larger than the deterministic expectation (Figure 2-3, left, red). Again, the extent of this difference is magnified with lower burst frequencies and hence higher basal burst sizes. Even at low feedback strengths, there is enough sampling of higher burst sizes to increase the mean expression. Sampling of higher burst sizes at low feedback also increases the noise in expression at intermediate levels (Figure 2-3, right).

[Figure 2-3: two panels with curves for frequency regulation and size regulation; horizontal axes "Feedback Strength (1/K)" (left) and "Mean Expression" (right).]

Figure 2-3: Mean expression in feedback for stochastic burst frequency or size regulation versus deterministic (red dotted line). (Blue) The zero-peak of bimodal expression with burst frequency regulation decreases its mean expression relative to the deterministic expectation. (Red) Size regulation samples higher burst sizes at lower feedback strengths, increasing the unimodal population mean slightly above the deterministic expectation (left) and also the population noise (right).

2.5 Discussion

While these results assume a model of coupled transcription and translation, they are exact when the ratio of the mRNA to protein degradation rates is very high or very low. When the protein is more stable than the mRNA, the burst frequency should normalize the promoter activation rate by the protein lifetime, whereas the mRNA lifetime is appropriate if the protein is unstable. The former condition is true for most proteins, and long-lived proteins can time-average transcriptional bursts, resulting in relatively high protein burst frequencies. However, transcriptional activators and repressors are often unstable (Belle et al., 2006), which allows genes to turn ON and OFF quickly, and protein lifetimes can be short in rapidly growing microbes due to dilution. The results presented here are limited to a two-state promoter model under conditions where a continuum approximation is appropriate for mRNA. Clearly, promoters have complex architectures and transitions between multiple states (Hahn, 2004), leading to a more complex dependence of noise on trans factors and expression level (Sanchez et al., 2011). Nevertheless, the qualitative effects of burst size and burst frequency regulation described here should hold and contribute to our understanding of what these additional complexities add. In reality, genes do not follow the well-parameterized continuous transcriptional bursting of the popular model, but some genes will behave closer to it than others, often enough to extract useful insight from the model.
The following chapters show that our gene of interest has cell-cycle dependent active periods rather than bursting. Yet the bursting framework is still useful for understanding active/inactive dynamics and predicting consequences for behavior in gene networks.

2.6 References

Acar, M., Becskei, A., & van Oudenaarden, A. (2005). Enhancement of cellular memory by reducing stochastic transitions. Nature, 435(7039), 228-232.
Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., & Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nature Genetics, 38(6), 636-643.
Belle, A., Tanay, A., Bitincka, L., Shamir, R., & O'Shea, E. K. (2006). Quantification of protein half-lives in the budding yeast proteome. Proceedings of the National Academy of Sciences, 103(35), 13004-13009.
Blake, W. J., Balázsi, G., Kohanski, M. A., Isaacs, F. J., Murphy, K. F., Kuang, Y., ... Collins, J. J. (2006). Phenotypic consequences of promoter-mediated transcriptional noise. Molecular Cell, 24(6), 853-865.
Çağatay, T., Turcotte, M., Elowitz, M. B., Garcia-Ojalvo, J., & Süel, G. M. (2009). Architecture-dependent noise discriminates functionally analogous differentiation circuits. Cell, 139(3), 512-522.
Cai, L., Friedman, N., & Xie, X. S. (2006). Stochastic protein expression in individual cells at the single molecule level. Nature, 440(7082), 358-362.
Choi, P. J., Cai, L., Frieda, K., & Xie, X. S. (2008). A stochastic single-molecule event triggers phenotype switching of a bacterial cell. Science, 322(5900), 442-446.
Chubb, J. R., Trcek, T., Shenoy, S. M., & Singer, R. H. (2006). Transcriptional pulsing of a developmental gene. Current Biology, 16(10), 1018-1025.
Elowitz, M. B., Levine, A. J., Siggia, E. D., & Swain, P. S. (2002). Stochastic gene expression in a single cell. Science, 297(5584), 1183-1186.
Friedman, N., Cai, L., & Xie, X. S. (2006).
Linking stochastic dynamics to population distribution: An analytical framework of gene expression. Physical Review Letters, 97(16), 168302.
Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340-2361.
Golding, I., Paulsson, J., Zawilski, S. M., & Cox, E. C. (2005). Real-time kinetics of gene activity in individual bacteria. Cell, 123(6), 1025-1036.
Hahn, S. (1998). Activation and the role of reinitiation in the control of transcription by RNA polymerase II. Cold Spring Harbor Symposia on Quantitative Biology, 63, 181-188.
Hahn, S. (2004). Structure and mechanism of the RNA polymerase II transcription machinery. Nat Struct Mol Biol, 11(5), 394-403.
Holland, M. J. (2002). Transcript abundance in yeast varies over six orders of magnitude. Journal of Biological Chemistry, 277(17), 14363-14366.
Hornos, J. E. M., Schultz, D., Innocentini, G. C. P., Wang, J., Walczak, A. M., Onuchic, J. N., & Wolynes, P. G. (2005). Self-regulating gene: An exact solution. Physical Review E, 72(5), 051907.
Kaern, M., Elston, T. C., Blake, W. J., & Collins, J. J. (2005). Stochasticity in gene expression: From theories to phenotypes. Nature Reviews Genetics, 6(6), 451-464.
Karmakar, R., & Bose, I. (2007). Positive feedback, stochasticity and genetic competence. Physical Biology, 4(1), 29.
Maamar, H., Raj, A., & Dubnau, D. (2007). Noise in gene expression determines cell fate in Bacillus subtilis. Science, 317(5837), 526-529.
Maheshri, N., & O'Shea, E. K. (2007). Living with noisy genes: How cells function reliably with inherent variability in gene expression. Annual Review of Biophysics and Biomolecular Structure, 36(1), 413-434.
McAdams, H. H., & Arkin, A. (1999). It's a noisy business! Genetic regulation at the nanomolar scale. Trends in Genetics, 15(2), 65-69.
McAdams, H., & Arkin, A. (1997). Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences, 94(3), 814-819.
Mogno, I., Vallania, F., Mitra, R. D., & Cohen, B. A. (2010). TATA is a modular component of synthetic promoters. Genome Research, 20(10), 1391-1397.
Munsky, B., & Khammash, M. (2006). The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124(4).
Newman, J. R. S., Ghaemmaghami, S., Ihmels, J., Breslow, D. K., Noble, M., DeRisi, J. L., & Weissman, J. S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature, 441(7095), 840-846.
Paulsson, J., & Ehrenberg, M. (2000). Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks. Physical Review Letters, 84(23), 5447.
Peccoud, J., & Ycart, B. (1995). Markovian modeling of gene-product synthesis. Theoretical Population Biology, 48(2), 222-234.
Pedraza, J. M., & van Oudenaarden, A. (2005). Noise propagation in gene networks. Science, 307(5717), 1965-1969.
Raj, A., Peskin, C., Tranchina, D., Vargas, D., & Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol, 4(10), e309.
Raj, A., & van Oudenaarden, A. (2009). Single-molecule approaches to stochastic gene expression. Annual Review of Biophysics, 38(1), 255-270.
Raj, A., & van Oudenaarden, A. (2008). Nature, nurture, or chance: Stochastic gene expression and its consequences. Cell, 135(2), 216-226.
Raser, J. M., & O'Shea, E. K. (2004). Control of stochasticity in eukaryotic gene expression. Science, 304(5678), 1811-1814.
Rosenfeld, N., Young, J. W., Alon, U., Swain, P. S., & Elowitz, M. B. (2005). Gene regulation at the single-cell level. Science, 307(5717), 1962-1965.
Samoilov, M., Plyasunov, S., & Arkin, A. P. (2005). Stochastic amplification and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proceedings of the National Academy of Sciences of the United States of America, 102(7), 2310-2315.
Sanchez, A., Garcia, H.
G., Jones, D., Phillips, R., & Kondev, J. (2011). Effect of promoter architecture on the cell-to-cell variability in gene expression. PLoS Comput Bio, 7(3), e1001100. Shahrezaei, V., Ollivier, J. F., & Swain, P. S. (2008). Colored extrinsic fluctuations and stochastic gene expression. Mol Syst Biol, 4(198). So, L., Ghosh, A., Zong, C., Sepulveda, L. A., Segev, R., & Golding, I. (2011). General properties of transcriptional time series in escherichia coli. Nature Genetics, 43(6), 554-560. Suel, G. M., Garcia-Ojalvo, J., Liberman, L. M., & Elowitz, M. B. (2006). An excitable gene regulatory circuit induces transient cellular differentiation. Nature, 440(7083), 545-550. 54 Suter, D. M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., & Naef, F. (2011). Mammalian genes are transcribed with widely different bursting kinetics. Science, 332(6028), 472-474. Taniguchi, Y., Choi, P. J., Li, G., Chen, H., Babu, M., Hearn, J., Xie, X. S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991), 533-538. Thattai, M., & van Oudenaarden, A. (2001). Intrinsic noise in gene regulatory networks. Proceedings of the NationalAcademy of Sciences of the United States of America, 98(15), 8614-8619. To, T., & Maheshri, N. (2010). Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science, 327(5969), 1142-1145. Turcotte, M., Garcia-Ojalvo, J., & SOel, G. M. (2008). A genetic timer through noiseinduced stabilization of an unstable state. Proceedingsof the NationalAcademy of Sciences, 105(41), 15732-15737. Zenklusen, D., Larson, D. R., & Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol, 15(12), 1263-1271. 55 CHAPTER 3. 
The cell-cycle dependence of transcription is a dominant source of noise in gene expression¹
¹ Some text and figures are taken from Zopf, Quinn, Zeidman & Maheshri, 2013.

3.1 Abstract
The large variability in mRNA and protein levels found from both static and temporal measurements in single cells has been largely attributed to random periods of transcription, often occurring in bursts. The cell cycle has a pronounced global role in affecting transcriptional and translational output, but how this influences transcriptional statistics from noisy promoters is unknown and generally ignored by current stochastic models. Here we show that variable transcription from the synthetic tetO promoter in S. cerevisiae is dominated by its dependence on the cell cycle. Real-time measurements of fluorescent protein at high expression levels indicate that tetO promoters increase transcription rate ~2-fold in S/G2/M, similar to constitutive genes. At low expression levels, where tetO promoters are thought to generate infrequent bursts of transcription (Raj et al., 2006; To & Maheshri, 2010), we observe random pulses of expression restricted to S/G2/M, which are correlated between homologous promoters present in the same cell. The analysis of static, single-cell mRNA measurements at different points along the cell cycle corroborates these findings. Our results demonstrate that highly variable mRNA distributions in yeast are not solely the result of randomly switching between periods of active and inactive gene expression, but are instead largely driven by differences in transcriptional activity between G1 and S/G2/M.

3.2 Introduction
At the single-cell level, mRNA and protein levels of regulable genes are often found to be highly variable (Newman et al., 2006; Raj & van Oudenaarden, 2009; Taniguchi et al., 2010). The resulting long-tailed mRNA and protein distributions are well-described by stochastic models (Peccoud & Ycart, 1995; Raj et al., 2006; Shahrezaei, Ollivier & Swain, 2008; Taniguchi et al., 2010) of transcriptional bursting, where a promoter undergoes random and intermittent periods of highly active transcription. Real-time observations of transcription in multiple organisms appear consistent with this behavior (Choi et al., 2008; Chubb et al., 2006; Golding et al., 2005; Larson et al., 2011; Maiuri et al., 2011; Muramoto et al., 2012; Taniguchi et al., 2010; Suter et al., 2011). Thus, both static and temporal views attribute much of the observed mRNA variability to the stochastic nature of reactions intrinsic to transcription. Consequently, the standard stochastic model of gene expression has been widely used to infer steady-state dynamics (Mao et al., 2010; Munsky, Neuert & van Oudenaarden, 2012; Raj et al., 2006; Tan & van Oudenaarden; To & Maheshri, 2010).
However, earlier studies examining the origin of variability in protein expression found that such variability is due not solely to stochasticity in reactions intrinsic to gene expression, but also to extrinsic factors. These studies looked for correlations in expression between identical copies of one promoter (Elowitz et al., 2002; Raser & O'Shea, 2004; Volfson et al., 2006) and/or between that promoter and a global or pathway-specific gene (Colman-Lerner et al., 2005; Pedraza & van Oudenaarden, 2005). Not only is the importance of extrinsic factors clear, but without time-series measurements the intrinsic noise measured by these techniques cannot be ascribed entirely to stochastic reactions in gene expression (Hilfinger & Paulsson, 2011). While global extrinsic factors have been suggested to largely impact translation (Raj et al., 2006), their influence on transcription and transcriptional bursting is unclear. Numerical analysis indicates that an expression distribution well-fit by the solution to the bursting model does not mean that the variability arises from bursting. The cell-cycle dependence of transcription reported here has gone largely unnoticed but, to some degree, should be expected.
The cell cycle is known to have global effects on total protein and RNA synthesis that should play a role in transcription (Larson et al., 2011; Trcek et al., 2011; Volfson et al., 2006). However, with few exceptions (Volfson et al., 2006), most models of gene regulation do not account for cell-cycle variability. Using both static single-molecule mRNA and dynamic real-time protein measurements in single cells, we show that much of the variability in a synthetic tetO promoter typical of noisy genes in yeast is driven by differences in transcription rate between G1 and S/G2/M.

3.3 Results

3.3.1 Multiple transcription patterns result in expression distributions consistent with transcriptional bursting
We measure noise in gene expression from the constitutive DOA1 promoter (PDOA1) and the regulated tetO promoter with 1 (P1xtetO) or 7 (P7xtetO) activator binding sites. tetO is a synthetic inducible gene regulation system, originally developed for studying gene regulation in mammalian cells (Gossen & Bujard, 1992), then transferred to yeast (Gari et al., 1997). It reversibly expresses a reporter gene in response to its activator, the tetracycline transactivator (tTA), which is a fusion of the tetracycline repressor (TetR) from E. coli and the activation domain VP16 from Herpes Simplex Virus. tTA's binding activity is controlled by derivatives of the tetracycline antibiotic, such as doxycycline (dox). tTA binds to a specific tetO operator sequence, one or more copies of which are located upstream of a minimal promoter (CYC1 in this study) (Figure 3-1).

Figure 3-1: Simple depiction of the tetO promoter and activator. N copies of the tTA binding site, tetO, are inserted upstream of the minimal CYC1 promoter. Dox modulates tTA activity. This enables controlled expression of a gene of interest (the vYFP fluorescent reporter in this study).
To begin to understand the origins of noise, we first consider the standard stochastic model of gene expression, which describes promoter fluctuations between two states, ON and OFF, with exponential waiting times in both states. Conditions of transcriptional bursting occur when the promoter rarely transitions to the ON state (with some burst frequency) and spends a short but productive period producing mRNA (with some burst size). This process is predicted to yield a negative binomial distribution of mRNA at stationary conditions (Paulsson & Ehrenberg, 2000). Conditions of bursting are expected for noisy, regulated genes; constitutive genes are expected to be expressed with a "burst size" of one, equating to Poisson expression statistics. We measure cytoplasmic mRNA distributions from P1xtetO and P7xtetO without tTA present at basal conditions (Figure 3-2 A, B), with intermediate levels of tTA (Figure 3-2 C, D), and from PDOA1 at two expression levels titrated by growth phase (Figure 3-2 E, F). These expression distributions agree well with the expectations of the bursting model (Figure 3-2, black). tetO expression distributions are well-fit by the negative binomial distribution, and the inferred burst sizes range from 5-8 mRNA for P1xtetO and 9-10 mRNA for P7xtetO across basal and intermediate expression levels. Expression from PDOA1 leads to Poisson-like distributions, with a burst size between 1 and 2 mRNA (Table 3-1). These burst parameters are consistent with previous measurements (To & Maheshri, 2010; Zenklusen, Larson & Singer, 2008).

Figure 3-2: Cytoplasmic mRNA expression distributions for P1xtetO and P7xtetO without activator (A, B), P1xtetO and P7xtetO with intermediate levels of activator (C, D), and two levels of PDOA1 expression (E, F).
The dot and horizontal lines above the distribution represent the mean and standard deviation of the distribution. Error bars show the sampling error from bootstrapping. The expression distributions are well-fit by a negative binomial distribution with moderate burst sizes for P1/7xtetO and a very low burst size of 1-2 mRNA for PDOA1 (black).

But we found signs of extrinsic noise in this expression variability, leading us to doubt that the bursting model explains the transcription dynamics underlying the expression distributions. The bursting parameters inferred across a wide range of activator levels indicate a biphasic trend in bursting dynamics at the tetO promoters (data shown and revisited later in Figure 4-12). But we show with kinetic Monte Carlo (Gillespie, 1977) simulations that several transcription patterns produce the negative binomial solution of the two-state promoter model. Some even recreate the observed biphasic bursting dynamics despite having no underlying biphasic origin. Figure 3-3 shows this for: a combination of small activator-independent basal bursts and large bursts whose frequency increases with activator (Figure 3-3A); transcriptional bursting with a maximum burst size (rather than an infinite gamma distribution of burst sizes) (Figure 3-3B); and bursting with an activator-regulated frequency that is also modulated by the cell cycle (Figure 3-3C). This is an example of a common error, where a model fits the data but is not a valid representation of the data's origins.

Figure 3-3: Biologically plausible transcription patterns whose noise recreates the biphasic pattern we observed experimentally. Top: Schematic of transcription pattern.
Bottom: Black: the parameters used to simulate transcription; blue: the parameters inferred using the negative binomial fit. Hypothetical transcription patterns are: (a) a combination of small basal bursts and large activated bursts, (b) a maximum sampled burst size of 40, (c) burst frequency regulation where the burst frequency halves for one-quarter of the cell cycle.

3.3.2 Static mRNA FISH reveals cell-cycle dependent expression may create extrinsic noise in expression
Furthermore, we measured a high positive correlation (ρ = 0.3-0.7) of mRNA expression from identical genes at homologous loci of a diploid cell (constructed for this purpose). This was clear evidence of extrinsic noise, prompting further investigation of the source of noise.
Figure 3-4 shows a sample mRNA FISH image from three of the conditions above. PDOA1 expression (F) shows the expected low-level, tightly distributed expression. P1/7xtetO (A, D) show the expected "noisy" expression, with most cells having few mRNA but some filled with many mRNA. But closer inspection suggests an interesting trend: the highest-expressing cells in the bursty regime are all in S/G2/M (colored orange and red here). No conclusions can be drawn, however, without further quantitative analysis.

Figure 3-4: Micrographs of cells from three of the samples with (top) classification of cell-cycle stage as G1 (yellow), early S/G2 (orange) or later S/G2/M (red) and (bottom) the maximum projection of eight images of fluorescent rhodamine staining within a Z-stack. We clearly see highly variable expression from the tetO promoters (B and, less so, A) but tight expression from PDOA1 (C). The cases of highest expression from P1xtetO and P7xtetO are S/G2/M cells.

To investigate whether mRNA number has cell-cycle dependence, we classify cells by whether they are in G1 or by the size of their bud in S/G2/M (Figure 3-5).
A cell develops a bud at the G1/S transition, which then grows in size until it reaches approximately 60% of the cross-sectional area of the mother before budding off as a daughter cell at mitosis (M). Cell-cycle identification is assisted by staining the nucleus with DAPI, which indicates whether the nucleus has split into two (Figure 6-1B). We classify G1 and three stages of S/G2/M based on increasing absolute bud size (Figure 3-5).

Figure 3-5: Cell-cycle stage is classified visually. Budding cells are classed as early, mid or late S/G2/M according to bud size. This creates a pseudo-temporal cell-cycle profile.

Without any further analysis, direct measurements of mRNA expression in these four cell-cycle stages clearly show cell-cycle dependent expression (Figure 3-6). PDOA1 expression increases from G1 through increasing bud size in S/G2/M, but only about 2-fold, based on the distributions' means. But the change in basal expression from G1 through increasing bud size is remarkable: mean expression increases several-fold, and the most obvious effect is that the percentage of cells without mRNA drops from 50% to 5% for P1xtetO (A). Intermediate expression (C, D) appears somewhere between the two.
Our observation that mRNA profiles vary across the cell cycle clearly contradicts the current bursting model of expression. But these expression distributions contain a history of transcription over the lifetime of the mRNA reporters, making the underlying changes in transcription rate nonobvious. In addition, the distribution in each cell-cycle stage contains noise, which cannot be interpreted directly. For both reasons, we require a model of mRNA production and degradation across the cell cycle to further interpret the results.
[Figure 3-6: panels A-F; rows: no bud, early bud, small bud, large bud; x-axis: mRNA count.]
Figure 3-6: Expression distributions of the cases in Figure 3-2 stratified by cell-cycle stage. Basal expression (A, B) shows large cell-cycle dependence of expression; constitutive expression (E, F) shows a smaller degree of cell-cycle dependent expression.

3.3.3 A stochastic model to infer cell-cycle dependent transcription from mRNA expression distributions
We develop a simple model of cell-cycle transcription to assess what fold-change in transcription across the cell cycle is consistent with experimental data. Since transcription rate is the parameter of interest, whereas mRNA count is observed, we use a model that incorporates the lifetime of mRNA to link observed mRNA distributions to underlying dynamics.
We modified standard stochastic models for gene expression to incorporate cell-cycle effects and to see whether these could describe the observed mRNA distributions. The model uses different but fixed transcription rates in G1 ($k_{tx}/f$) and S/G2/M ($k_{tx}$):

$$\frac{dM}{dt} = \begin{cases} k_{tx}/f - \gamma_M M, & 0 < t \le t_{G1} \\ k_{tx} - \gamma_M M, & t_{G1} < t \le t_{CC} \end{cases} \qquad (3\text{-}1)$$

with f defined as the ratio of transcription in S/G2/M to G1, such that f = 1 means no cell-cycle effect and f = Inf means no expression in G1. Model parameters set by experimental observations are: a 20 min vYFP mRNA half-life ($\ln(2)/\gamma_M$) (To & Maheshri, 2010), a 120 min cell-cycle duration ($t_{CC}$), and a 55 min G1 duration ($t_{G1}$), with the latter two parameter values varying slightly depending on the sample. We choose to simulate only Poisson transcription, in order to see how much of the super-Poissonian noise derives from cell-cycle effects. However, the model can be adapted for bursty transcription, where $k_{tx}$ is simply the product of the size and frequency of bursts.
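The mean behaviour implied by Equation 3-1 can be sketched with a few lines of Python. This is only an illustration of the deterministic mean with halving of mRNA at division, not the stochastic calculation used in this chapter, and it is not expected to reproduce the values in Table 3-1. The function names (`relax`, `mean_mrna`, `periodic_start`, `r_m`) and the unweighted averaging of the G1 mean over time are assumptions made for the sketch; the default half-life, cell-cycle duration and G1 duration are the values quoted above.

```python
import math


def relax(m0, rate, gamma, dt):
    """Mean mRNA after time dt with constant production `rate` and decay `gamma`."""
    steady = rate / gamma
    return steady + (m0 - steady) * math.exp(-gamma * dt)


def mean_mrna(t, m0, k_tx, f, gamma, t_g1):
    """Mean mRNA at cell-cycle time t, starting from m0 at t = 0 (Equation 3-1)."""
    if t <= t_g1:
        return relax(m0, k_tx / f, gamma, t)       # G1: rate k_tx / f
    m_g1 = relax(m0, k_tx / f, gamma, t_g1)
    return relax(m_g1, k_tx, gamma, t - t_g1)      # S/G2/M: rate k_tx


def periodic_start(k_tx, f, gamma, t_g1, t_cc):
    """Initial mean m0 that equals half of the mean at division (periodic solution)."""
    a = mean_mrna(t_cc, 0.0, k_tx, f, gamma, t_g1)  # end value when m0 = 0
    b = math.exp(-gamma * t_cc)                     # end value is a + b * m0
    return a / (2.0 - b)


def r_m(f, k_tx=1.0, half_life=20.0, t_cc=120.0, t_g1=55.0, n=2000):
    """Ratio of mean mRNA at the end of the cycle to the time-averaged G1 mean.

    Assumption: G1 cells are weighted uniformly in time (no age structure).
    """
    gamma = math.log(2) / half_life
    m0 = periodic_start(k_tx, f, gamma, t_g1, t_cc)
    m_end = mean_mrna(t_cc, m0, k_tx, f, gamma, t_g1)
    g1_mean = sum(
        mean_mrna((i + 0.5) * t_g1 / n, m0, k_tx, f, gamma, t_g1) for i in range(n)
    ) / n
    return m_end / g1_mean


if __name__ == "__main__":
    for fold in (1.0, 2.0, 4.0, math.inf):
        print(fold, round(r_m(fold), 2))
```

Passing `math.inf` for `f` sets the G1 transcription rate to zero, which corresponds to the "no expression in G1" limit described above.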
We use a finite state Markov approach (Munsky & Khammash, 2006) to simulate the stochastic birth and death of mRNA across the cell cycle. This yields the mRNA distribution as a function of cell-cycle progression. The stationary condition is enforced such that the beginning mRNA distribution is the result of binomial partitioning of the end mRNA distribution. If the cell cycle is ignored, this model is well known to yield a stationary mRNA distribution that is Poisson with mean $k_{tx}/\gamma_M$. But our solution is an oscillatory steady state, because the initial mRNA distribution in new cells matches binomial partitioning of the mRNA present in the mother plus bud right before mitosis.
While much is known about the age-dependent structure and size distribution of yeast populations, we do not attempt to describe these details in this or other models. These details do not have large qualitative effects, and omitting them provides simplicity without sacrificing our ability to assess the importance of cell-cycle dependent transcription.

3.3.4 Regulated transcription at low activator levels is restricted to S/G2; constitutive expression varies with gene dosage
To review our results so far: while the overall mRNA distributions exhibit excellent fits to the negative binomial distribution predicted by the standard model (Paulsson & Ehrenberg, 2000; Raj et al., 2006) (Figure 3-7, grey), partitioning the data by cell-cycle phase clearly shows that the standard model is incorrect. We developed a model of constant Poisson transcription with a rate that increases f-fold between G1 and S/G2/M. We next apply it to data to estimate the fold-change in expression and the extent to which this causes overall expression variability.
The model has two free parameters, $k_{tx}$ and f, which dictate the magnitude of transcription in G1 and S/G2/M.
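To illustrate how an oscillatory steady state of Poisson transcription can look "bursty" at the population level, the following Python sketch builds the population mRNA distribution as a mixture of Poisson distributions over the cell cycle. It relies on a shortcut rather than on the finite state approach described above: for pure Poisson production, first-order decay and binomial partitioning, the count at a given cell-cycle time stays Poisson with the periodic mean, so only that mean needs to be tracked. Assumptions made here: cells are weighted uniformly over cell-cycle time (no age structure), the function names are hypothetical, and "burst size" is taken as the Fano factor (variance/mean) with "burst frequency" as mean divided by burst size, a convention inferred from the statement that a burst size of one equates to Poisson statistics.

```python
import math


def periodic_mean(t, k_tx, f, gamma, t_g1, t_cc):
    """Periodic mean mRNA at cell-cycle time t for the two-rate model."""
    def relax(m0, rate, dt):
        steady = rate / gamma
        return steady + (m0 - steady) * math.exp(-gamma * dt)

    def run(m0, until):
        if until <= t_g1:
            return relax(m0, k_tx / f, until)
        return relax(relax(m0, k_tx / f, t_g1), k_tx, until - t_g1)

    # Start value fixed by halving at division: m0 = run(m0, t_cc) / 2.
    m0 = run(0.0, t_cc) / (2.0 - math.exp(-gamma * t_cc))
    return run(m0, t)


def poisson_pmf(n, mu):
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def population_pmf(k_tx, f, half_life=20.0, t_cc=120.0, t_g1=55.0,
                   n_max=150, n_ages=240):
    """Mixture of Poisson distributions over uniformly weighted cell-cycle times."""
    gamma = math.log(2) / half_life
    pmf = [0.0] * (n_max + 1)
    for i in range(n_ages):
        mu = periodic_mean((i + 0.5) * t_cc / n_ages, k_tx, f, gamma, t_g1, t_cc)
        for n in range(n_max + 1):
            pmf[n] += poisson_pmf(n, mu) / n_ages
    return pmf


def apparent_burst_parameters(pmf):
    """Moment-based (burst frequency, burst size) under the assumed convention."""
    mean = sum(n * p for n, p in enumerate(pmf))
    var = sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2
    burst_size = var / mean          # Fano factor
    return mean / burst_size, burst_size


if __name__ == "__main__":
    for fold in (1.0, 2.0, math.inf):
        freq, size = apparent_burst_parameters(population_pmf(0.5, fold))
        print(fold, round(freq, 2), round(size, 2))
```

In this sketch the apparent burst size exceeds one even though no bursting is simulated, because the mean varies across the cell cycle; the value of `k_tx = 0.5` per minute in the usage lines is an arbitrary illustrative choice.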
We evaluated three different choices: setting f = 1 such that $k_{tx}$ is constant throughout the cell cycle, setting f = 2 consistent with the expected increase due to gene dosage, and allowing f > 2. With the first two choices, f is set and $k_{tx}$ is specified such that the means of the measured and model distributions are equivalent. For the last choice, f and $k_{tx}$ are specified such that the means of the measured and model G1 and S/G2/M distributions match. Because each choice corresponds to a different way of modeling cell-cycle dependent transcription, we will refer to these as separate models.
We evaluated the relative performance of each model by qualitative agreement with experimental distributions over the cell cycle (Figure 3-7) and in S/G2 specifically (Figure 3-9); and quantitatively, by evaluating a χ² goodness of fit (Table 3-1) and by comparing the ratio of the mean mRNA number at the end of the cell cycle to the mean mRNA number in G1, defined as:

$$R_M = \frac{\langle M(t_{CC}) \rangle}{\langle M(t) \rangle_{t=0}^{t=t_{G1}}} \qquad (3\text{-}2)$$

This is estimated experimentally ($R_M$) as the ratio of the mean mRNA in cells with the largest bud size (late G2/M) to that in cells in G1, and hence will be biased downward. (Experimental $R_M$ values are listed in Table 3-1.) This is a slightly different way of examining the models because it ignores the early/mid S/G2 data, but it informs directly on whether a particular model is capable of describing the relative (S/G2/M versus G1) increase in mRNA number across the cell cycle.
Figure 3-7 shows the cell-cycle-stage expression distributions overlaid by the best-fit case for f = 2 and f > 2.
[Figure 3-7: panels A-F; rows: total population, no bud, early bud, small bud, large bud; x-axes: mRNA count and cell-cycle progression; legend: experimental data, distribution mean +/- 1 SD, transcriptional bursting, Poisson transcription with f = 2, Poisson transcription with f > 2.]
Figure 3-7: Large differences in transcriptional activity between S/G2/M and G1 depend on promoter. (A) YFP mRNA distributions in a haploid yeast with integrated P1xtetO-YFP and no tTA are shown in a column as a function of cell-cycle phase. Horizontal lines above each distribution are the experimental (green) and predicted mean/standard deviation for different models, calculated by assuming each bud phase represents 1/3 of S/G2/M. (B) As in (A) but for P7xtetO. (C&D) As in (A&B) but with tTA and 100 or 500 ng/mL dox added for P1xtetO and P7xtetO, respectively. (E) Integrated PDOA1-YFP with native DOA1 expressed from a plasmid. Mid log-phase cells analyzed. (F) As in (E) but late log-phase cells.

Not surprisingly, the f = 1 model under-predicts the difference between mRNA levels in G1 and S/G2/M (see Figure 3-9, black) for all sets of experimental data. This is also reflected in the quantitative metrics: the model predicts $R_M$ = 1.2, whereas experimental estimates of $R_M$ are much larger than 1.2 (Table 3-1).
But when f = 2, as expected based on differences in gene dosage, the model qualitatively describes the progression of the observed distributions for PDOA1 expression (Figure 3-7E&F, Table 3-1). This leads to the conclusion that transcription from a constitutive gene varies across the cell cycle with gene dosage. This result should be expected based on simple molecular biology, but has been underappreciated as a source of extrinsic noise in gene expression, with a couple of exceptions (Huh & Paulsson, 2011; Volfson et al., 2006).
This becomes the null hypothesis for the cell-cycle dependence of transcription.
But tetO promoter measurements are not consistent with this null hypothesis, and are instead better described by f > 2, with f > 100 for basal expression (Figure 3-7A-D). Such a large value of f is consistent with no G1 transcription at all, suggesting that transcription at basal levels is restricted to S/G2/M. This is supported by experimental $R_M$ values of 9.3 and 3.1 for basal P1xtetO and P7xtetO respectively. These correspond to f values of infinite fold-change and >3 fold-change in expression (Table 3-1). But it also shows the lack of precision in these inferences: an $R_M$ of 6 should be the maximum, corresponding to no expression in G1. Yet we observe a higher value than this for basal P1xtetO expression, and the lower value of 3.1 for P7xtetO expression may actually be due to experimental error in the low direction.
Expression at intermediate levels seems consistent with some G1 expression, but with fold-changes greater than explained by gene dosage. P1xtetO and P7xtetO expression is best fit by f = 4 and f = 9, respectively.
Expression distributions that differ across cell-cycle stages will of course explain some of the variability in a growing population's expression distribution. But to assess the extent to which cell-cycle driven changes in transcription level explain noise, we again need more quantitative analysis. At the DOA1 promoter, all expression noise is explained by Poisson expression with a 2-fold change between G1 and S/G2/M. This is evident in the agreement between the horizontal bars in Figure 3-7E&F, and in the high χ² fit p-values in Table 3-1. But the tetO promoters have somewhat more variable expression than generated by this model of Poisson expression with f-fold change in expression, especially for P7xtetO.
We incorporate an extra potential source of variability into the model by randomizing the timing of the transcription rate transition to occur in a uniformly distributed 40 minute window starting at the beginning of S/G2/M. (This is supported by the real-time protein measurements reported in the next section.) This predicts distributions that agree better with observations for tetO, but not for PDOA1 (Figure 3-8). For both basal and intermediate expression levels from P1xtetO and intermediate expression from P7xtetO, the model passes the χ² goodness of fit test, indicating that most of the variability is explained by this model (Table 3-1). These results in no way exclude the possibility of other sources of variability - for example, adding transcriptional bursting during S/G2/M can also describe the variability, including the increased variability in P7xtetO expression. The mRNA FISH images for tetO promoters tend to have bright spots, thought to represent nascent mRNA transcription, that are more likely in S/G2/M (Figure 3-4) and may indicate that this "bursty" expression is a source of variability. But it is nonetheless surprising, given former impressions of P1xtetO as a "noisy" gene, that all expression variability is described by cell-cycle changes in transcription with only slightly super-Poissonian underlying transcription dynamics.
To summarize all of the models, we show each model's best fit to total S/G2/M expression in Figure 3-9. It shows that the expression distributions are well-captured by: f = 2 with Poisson transcription for PDOA1, f ~ 10 for intermediate tetO expression, and f = infinite for basal tetO expression, suggesting no G1 transcription at basal levels. All PDOA1 variability and most P1/7xtetO variability is explained by the change in transcription rate from G1 to S/G2/M. But the remaining P1/7xtetO variability could be explained by variability in the timing of transitioning to S/G2/M transcription levels.
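The effect of a randomly timed transition can be sketched as follows. This Python example is an illustration under stated assumptions, not the fitted model of Table 3-1: it follows a single lineage, draws the delay of the rate switch uniformly from a window after the start of S/G2/M in every cycle, and tracks only the conditional mean. Because the mRNA count is Poisson given the history of switch times (Poisson production, first-order decay, binomial partitioning), the law of total variance gives a Fano factor of 1 + Var(mean)/E(mean) at a fixed cell-cycle time. The function names, the choice of f = 10, the value of `k_tx`, and the observation time of 100 min are all hypothetical.

```python
import math
import random


def relax(m0, rate, gamma, dt):
    """Mean mRNA after time dt with constant production `rate` and decay `gamma`."""
    steady = rate / gamma
    return steady + (m0 - steady) * math.exp(-gamma * dt)


def mean_at(t, m0, t_switch, k_tx, f, gamma):
    """Conditional mean at time t when the rate switches from k_tx/f to k_tx at t_switch."""
    if t <= t_switch:
        return relax(m0, k_tx / f, gamma, t)
    m_switch = relax(m0, k_tx / f, gamma, t_switch)
    return relax(m_switch, k_tx, gamma, t - t_switch)


def fano_at_time(t_obs, window, k_tx=0.5, f=10.0, half_life=20.0, t_cc=120.0,
                 t_g1=55.0, n_cycles=5000, burn_in=20, seed=0):
    """Fano factor of mRNA counts at cell-cycle time t_obs along one lineage."""
    rng = random.Random(seed)
    gamma = math.log(2) / half_life
    m0 = 0.0
    means = []
    for cycle in range(n_cycles + burn_in):
        # Assumption: switch time is uniform in [t_g1, t_g1 + window].
        t_switch = t_g1 + rng.uniform(0.0, window)
        if cycle >= burn_in:
            means.append(mean_at(t_obs, m0, t_switch, k_tx, f, gamma))
        # Binomial partitioning at division halves the mean.
        m0 = 0.5 * mean_at(t_cc, m0, t_switch, k_tx, f, gamma)
    avg = sum(means) / len(means)
    var = sum((m - avg) ** 2 for m in means) / len(means)
    return 1.0 + var / avg


if __name__ == "__main__":
    print("fixed transition:", fano_at_time(100.0, 0.0))
    print("40 min window:   ", fano_at_time(100.0, 40.0))
```

With a window of zero the counts at a fixed cell-cycle time are exactly Poisson in this sketch; a nonzero window adds variability within a cell-cycle stage, which is the qualitative effect the variable-timing model is meant to capture.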
These are strong statements that contradict the current understanding of noise in transcription. While well-supported by mRNA measurements, this data is static, and all conclusions about temporal cell-cycle changes are inferred from morphology. These conclusions would therefore be strengthened by a technique that follows cell behavior in real time.

[Figure 3-8: panels A-F; rows: total population, no bud, early bud, small bud, large bud; x-axis: mRNA count; legend: Poisson transcription, f > 2, variable timing of transition from G1 to S/G2/M transcription rate.]
Figure 3-8: mRNA distributions from P1/7xtetO are better fit by introducing variable timing in the transition from G1 to increased S/G2/M transcription rates. (A-F) As in Figure 3-7, but with the model modified to incorporate a random, uniformly distributed transition from G1 to S/G2/M transcription rates occurring during a 40 min window after budding.

[Figure 3-9: panels A-F; x-axis: mRNA count; model curves labeled by their value of f.]
Figure 3-9: Summary of each model's fit to S/G2/M-specific mRNA distributions. (A-F) Strains as in Figure 3-7, but comprising a specific comparison of the aggregate S/G2/M distribution and its mean and standard deviation.
Green bars and horizontal lines represent the experimental S/G2/M distribution and its mean and standard deviation.

Table 3-1: Comparing experimental mRNA expression distributions to simple models

Columns: A = 1xtetO** and B = 7xtetO***, tetO promoter without activator (basal); C = 1xtetO and D = 7xtetO, tetO promoter with tTA activator; E = higher and F = lower expression, constitutive DOA1 promoter.

                                  A       B       C       D       E       F
Ratio of mean mRNA at late G2/M to G1 as a measure of fold-change in transcription:
  R_M (measured)                  9.3     3.1     2.8     2.7     2.4     4.2
  corresponding f                 Inf     4.3     3.6     3.3     2.7     8.9
Negative binomial fit to total distribution: the standard model equates a frequency and size of bursting to the parameters of a negative binomial fit to the total population's stationary distribution.
  "Burst frequency"               0.4     0.5     2.2     1.2     4.1     1.1
  "Burst size"                    7.7     8.5     5.8     10.0    1.4     1.3
2-fold transcription increase*: Poisson transcription with S/G2/M transcription rate increased 2-fold over G1 transcription rate. The transcription rate is fit to the experimental total mean mRNA count.
  f                               2       2       2       2       2       2
  χ² fit p-value, Total           0       0       0       0       0.95    0.96
  χ² fit p-value, G1              0       0       0       0       0.02    0.002
  χ² fit p-value, G2              0       0       0       0.001   0.001   0.001
f-fold transcription increase*: Poisson transcription with S/G2/M transcription rate increased f-fold over G1 transcription rate. The transcription rate and f are fit to the experimental G1 and G2 mean mRNA count.
  f                               Inf     100     4       9       4       12
  χ² fit p-value, Total           0.06    0.002   0.14    0.07    0.004   0.12
  χ² fit p-value, G1              0.69    0.17    0.46    0.84    0       0.002
  χ² fit p-value, G2              0.004   0       0.18    0.007   0.78    0.69
f-fold transcription increase with variable timing of transition*: Poisson transcription with S/G2/M transcription rate increased f-fold over G1 transcription rate, with a 40 minute uniformly distributed window of switching to S/G2/M transcription rates after G1/S. Transcription rate and f are fit to the experimental G1 and G2 mean mRNA count.
  f                               Inf     100     6       13      7       18
  χ² fit p-value, Total           0.85    0.003   0.13    0.15    0       0
  χ² fit p-value, G1              0.35    0.06    0.30    0.73    0       0
  χ² fit p-value, G2              0.01    0       0.84    0.76    0.06    0.27

* Result of a χ² goodness of fit test of the measured data against the model prediction. Cases that pass or fail are in black or gray text, respectively, with p = 0.05 representing the cutoff.
** For basal expression from P1xtetO, the G1 and S/G2/M expression differ more than expected for no transcription in G1. Thus the data is best fit where f = infinity, representing no G1 transcription.
*** For basal expression from P7xtetO, because of high expression noise and the fact that 10% of cells have not turned on by late G2/M, specifying f by matching G1 and S/G2/M means gives a value of f (~7) that fits distributions poorly. The χ² goodness of fit was maximal and fairly constant over a range of approximately 50 < f < 100. Thus, f = 100 was selected. For the model with variable timing of transitioning to S/G2/M transcription rates, the estimated 40 minute window was extended to the full duration of S/G2/M.

3.3.5 Real-time fluctuations in protein levels corroborate mRNA measurements and reveal globally correlated activation
A real-time method for inferring transcription is a strong complement to our mRNA measurements, which have high mRNA-count resolution but no temporal information. In our lab, CJ Zopf developed a platform for inferring transcription by tracking fluctuations in protein levels in single cells growing in microfluidic chambers. The method has a time resolution of approximately 15 minutes. (See Zopf et al. (2013) main and supporting text for details of the method.)
Transcription rate was inferred from a diploid strain expressing fluorescent reporters from two copies of P7xtetO at homologous loci and a control constitutive, highly-expressed PGK1 promoter (Figure 3-10A).
For constitutive expression and expression from a regulated gene at high levels, transcription rate increased approximately two-fold from G1 to S/G2. This was robust across several different growth conditions, shown here for growth in glucose (Figure 3-10B, 85 min cell cycle) and raffinose (Figure 3-10C, 210 min cell cycle). This is in strong agreement with our inferences from mRNA expression.

Figure 3-10: For the constitutive gene and the regulated genes at high expression levels (A), transcription rate increased approximately two-fold in S/G2/M versus G1 in two growth conditions (B, C). Dots are the average of expression at each time point of N cells. (This data and the figure itself were produced by CJ Zopf.)

To then study repressed transcription in real-time, we added 50 ng/mL dox to reduce P7xtetO expression in the 3-color diploid to levels where transcription is thought to occur in infrequent, independent bursts at each locus that should be resolvable by the real-time analysis. But instead of bursts, single-cell traces of transcription rate show occasional "ON" periods that are restricted to S/G2, generally beginning within 20 minutes of bud formation, and lasting until division (Figure 3-11A). This also agrees strongly with our measurements at basal tetO promoters. It offers the further information that expression restricted to S/G2/M is also probabilistic, rather than just an average fold-change in expression. Our data suggest that all cells do have some expression in S/G2/M, even under basal conditions, because the zero-peak of the expression distribution drops almost to zero by the end of S/G2/M.
But the resolution of the protein method for detecting low levels of mRNA transcription is unknown. So both methods are consistent with a model where cells may switch to strongly-transcribing states in S/G2/M and otherwise transcribe a small amount of mRNA.

Figure 3-11: Transcriptional bursts from homologous loci are cell-cycle dependent and partially correlated. The 3-color diploid strain was grown in microfluidics with 50 ng/mL dox, reducing expression. (A) The probability that each 7xtetO promoter's transcription rate is above background, computed by averaging individual cell responses at different cell-cycle progression, increases after G1. (B) A 2D histogram of activation time for each promoter when both activate (t = 0 at budding). Most activation occurs near budding and is correlated. (C) Classifying single-cell S/G2/M periods from (A) by whether each P7xtetO activates reveals correlations in sporadic expression. Error bars represent SEM from bootstrapping. (This data and the figure itself were produced by CJ Zopf.)

The diploid cells studied with the protein method offer further information about correlation of transcription in a given cell. The "ON" periods are not independent (χ² test; p < 10- , 0.42) at each locus (Figure 3-11C). And if both P7xtetO copies turn on, > 70% of the time they do so within 15 minutes of each other (Figure 3-11B). These results corroborate analysis of mRNA expression, and are in striking contrast to the view of transcriptional bursting as intrinsically driven with exponential interarrival times (Golding et al., 2005; Larson et al., 2011; Raj et al., 2006; Raj & van Oudenaarden, 2009).
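To illustrate the kind of independence test referred to above, the following minimal Python sketch applies a Pearson χ² test to a 2x2 table of S/G2/M periods, classified by whether each promoter copy turned ON, and reports the phi coefficient as one common measure of association for such tables. The counts are hypothetical placeholders, not the measured data, and the function name is an assumption introduced for illustration; the analysis in Zopf et al. (2013) may differ in detail.

```python
import math

def chi2_independence_2x2(both_on, only_a_on, only_b_on, neither_on):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for independence of two binary outcomes, plus the phi coefficient."""
    a, b, c, d = both_on, only_a_on, only_b_on, neither_on
    n = a + b + c + d
    row1, row2 = a + b, c + d      # copy A ON / copy A OFF
    col1, col2 = a + c, b + d      # copy B ON / copy B OFF
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("each row and column needs at least one observation")
    det = a * d - b * c
    chi2 = n * det ** 2 / denom
    phi = det / math.sqrt(denom)
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, phi

# Hypothetical counts of S/G2/M periods (NOT the thesis data)
chi2, p_value, phi = chi2_independence_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}, phi = {phi:.2f}")
```

A small p-value together with a positive phi would indicate that the two copies tend to turn ON in the same S/G2/M periods more often than expected for independent loci.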
While increased protein production in S/G2 may be due to increases in translational capacity, this is unlikely for three reasons. First, while ribosome numbers and activity are known to increase in yeast in S/G2 (Elliott & McLaughlin, 1978; Waldron, Jund & Lacroute, 1977), ribosome number is generally not considered rate-limiting for any particular gene, as increasing gene dosage or mRNA number by transcriptional regulation leads to increased gene expression. Second, recent work in budding (Trcek et al., 2011) and fission (Zhurinsky et al., 2010) yeast suggests mRNA levels of constitutive genes increase during S/G2. Third, we find that average protein to mRNA ratios of cells grouped by cell-cycle phase show no discernible cell-cycle dependent trend (data not shown).

3.4 Discussion

3.4.1 Implications for understanding stochastic gene expression

Our results indicate the G1 to S/G2 transition has strong effects on transcriptional activity beyond differences in gene dosage for the tetO promoters, which have characteristics (strong TATA box, regulable) of "noisy" promoters identified in genome-wide studies (Bar-Even et al., 2006; Newman et al., 2006). Temporary disruption of a repressed promoter's chromatin architecture during DNA replication could explain the pulse timing in early S/G2. Whatever the event, it does not occur independently at homologous loci. Our data alters the interpretation of studies where static mRNA/protein distributions are fit to stochastic models of gene expression to infer steady-state dynamics (Mao et al., 2010; Munsky, Neuert & van Oudenaarden, 2012; Raj et al., 2006; To & Maheshri, 2010).
This difficulty of using static data to pinpoint origins of variability has been anticipated (Hilfinger & Paulsson, 2011; Taniguchi et al., 2010), although even static mRNA FISH data can reveal additional dynamic information (Wyart, Botstein & Wingreen, 2010), including by disaggregating mRNA distributions by cell-cycle stage. New models incorporating cell-cycle linked pulses of transcription should alter predictions of gene network behavior. These models will benefit from further characterization of transcription dynamics across and within cell-cycle phases, with greater resolution than afforded by the techniques used here.

3.4.2 Gene activation kinetics are also cell-cycle dependent

Of further interest is whether cell-cycle also drives the kinetics of a gene's response to a changing environment. CJ Zopf used the real-time protein tracking platform to investigate the cell-cycle dependence of gene activation kinetics (Zopf et al., 2013). He measured the time to activate P1xtetO and P7xtetO in response to a step change in transcription factor (TF) input. When the signal arrived during early S/G2, activation mostly occurred during S/G2. When the signal arrived during G1, activation was often delayed until S/G2. When the signal arrived during late G2/M, activation was sometimes delayed until the following S/G2, almost an entire cell-cycle later. Thus activation is cell-cycle dependent and enriched in S/G2. This suggests that stationary dynamics can inform about kinetic behavior, and that cell-cycle plays a role in the kinetics of a gene network's response to changes in signaling.

3.4.3 A hypothesis that chromatin maturation permits repressed transcription

The notion that nascent chromatin may permit transcription from repressed genes following DNA replication is longstanding, and suggests a most interesting source of cell-cycle dependent transcription.
Wolffe (1991) suggested that the open chromatin structure of newly-replicated DNA might allow for formation of an active transcription complex; Guptasarma (1995) hypothesized that even in E. coli, the unwrapping of DNA during replication may allow for transcription of repressed genes. Experimental studies followed, with a demonstration that nascent chromatin provides a transient period in which basal transcription can occur (Almouzni & Wolffe, 1993) and in which an activator could bind (Kamakaka, Bulger, & Kadonaga, 1993). Cesari et al. (1998) used cycloheximide to uncouple DNA replication and chromatin assembly, which induced S-phase transcriptional activation. Later examples show that replication can disrupt epigenetic states to allow transcription, e.g. from heterochromatic repeats (Chen et al., 2008).

Of broader interest, activation of transcription is linked to DNA replication in several cases in development (e.g. Fisher & Mechali, 2003). It may also have a role in disease, such as derepression of an oncogene. Crowe et al. (2000) showed that unscheduled, accelerated replication contributes to derepression by diluting repressing factors and enhancing activator accessibility to chromatin. This evidence of links between chromatin remodeling and repressed transcription informs our hypothesis that post-replication chromatin maturation creates the S/G2 window for transcriptional activation, explained in Section 5.2.

3.5 References

Almouzni, G., & Wolffe, A. P. (1993). Replication-coupled chromatin assembly is required for the repression of basal transcription in vivo. Genes & Development, 7(10), 2033-2047.
Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., & Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nature Genetics, 38(6), 636-643.
Cesari, M., Heliot, L., Meplan, C., Pabion, M., & Khochbin, S. (1998).
S-phase-dependent action of cycloheximide in relieving chromatin-mediated general transcriptional repression. Biochemical Journal, 336(3), 619-624.
Charvin, G., Cross, F. R., & Siggia, E. D. (2008). A microfluidic device for temporally controlled gene expression and long-term fluorescent imaging in unperturbed dividing yeast cells. PLoS One, 3(1), e1468.
Chen, E. S., Zhang, K., Nicolas, E., Cam, H. P., Zofall, M., & Grewal, S. I. S. (2008). Cell cycle control of centromeric repeat transcription and heterochromatin assembly. Nature, 451(7179), 734-737.
Choi, P. J., Cai, L., Frieda, K., & Xie, X. S. (2008). A stochastic single-molecule event triggers phenotype switching of a bacterial cell. Science, 322(5900), 442-446.
Chubb, J. R., Trcek, T., Shenoy, S. M., & Singer, R. H. (2006). Transcriptional pulsing of a developmental gene. Current Biology, 16(10), 1018-1025.
Colman-Lerner, A., Gordon, A., Serra, E., Chin, T., Resnekov, O., Endy, D., ... Brent, R. (2005). Regulated cell-to-cell variation in a cell-fate decision system. Nature, 437(7059), 699-706.
Crowe, A. J., Piechan, J. L., Sang, L., & Barton, M. C. (2000). S-phase progression mediates activation of a silenced gene in synthetic nuclei. Molecular and Cellular Biology, 20(11), 4169-4180.
Elliott, S. G., & McLaughlin, C. S. (1978). Rate of macromolecular synthesis through the cell cycle of the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences, 75(9), 4384-4388.
Elowitz, M. B., Levine, A. J., Siggia, E. D., & Swain, P. S. (2002). Stochastic gene expression in a single cell. Science, 297(5584), 1183-1186.
Fisher, D., & Mechali, M. (2003). Vertebrate HoxB gene expression requires DNA replication. The EMBO Journal, 22(14), 3737-3748.
Garí, E., Piedrafita, L., Aldea, M., & Herrero, E. (1997). A set of vectors with a tetracycline-regulatable promoter system for modulated gene expression in Saccharomyces cerevisiae. Yeast, 13(9), 837-848.
Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340-2361.
Gossen, M., & Bujard, H. (1992). Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proceedings of the National Academy of Sciences, 89(12), 5547-5551.
Golding, I., Paulsson, J., Zawilski, S. M., & Cox, E. C. (2005). Real-time kinetics of gene activity in individual bacteria. Cell, 123(6), 1025-1036.
Guptasarma, P. (1995). Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of Escherichia coli? Bioessays, 17(11), 987-997.
Hilfinger, A., & Paulsson, J. (2011). Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proceedings of the National Academy of Sciences, 108(29), 12167-12172.
Huh, D., & Paulsson, J. (2011). Non-genetic heterogeneity from stochastic partitioning at cell division. Nature Genetics, 43(2), 95-100.
Kamakaka, R. T., Bulger, M., & Kadonaga, J. T. (1993). Potentiation of RNA polymerase II transcription by Gal4-VP16 during but not after DNA replication and chromatin assembly. Genes & Development, 7(9), 1779-1795.
Larson, D. R., Zenklusen, D., Wu, B., Chao, J. A., & Singer, R. H. (2011). Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science, 332(6028), 475-478.
Maiuri, P., Knezevich, A., De Marco, A., Mazza, D., Kula, A., McNally, J. G., & Marcello, A. (2011). Fast transcription rates of RNA polymerase II in human cells. EMBO Reports, 12(12), 1280-1285.
Mao, C., Brown, C. R., Falkovskaia, E., Dong, S., Hrabeta-Robinson, E., Wenger, L., & Boeger, H. (2010). Quantitative analysis of the transcription control mechanism. Mol Syst Biol, 6.
Munsky, B., & Khammash, M. (2006). The finite state projection algorithm for the solution of the chemical master equation. The Journal of Chemical Physics, 124(4).
Munsky, B., Neuert, G., & van Oudenaarden, A. (2012). Using gene expression noise to understand gene regulation. Science, 336(6078), 183-187.
Muramoto, T., Cannon, D., Gierliński, M., Corrigan, A., Barton, G. J., & Chubb, J. R. (2012). Live imaging of nascent RNA dynamics reveals distinct types of transcriptional pulse regulation. Proceedings of the National Academy of Sciences, 109(19), 7350-7355.
Newman, J. R. S., Ghaemmaghami, S., Ihmels, J., Breslow, D. K., Noble, M., DeRisi, J. L., & Weissman, J. S. (2006). Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature, 441(7095), 840-846.
O'Neill, E. M., Kaffman, A., Jolly, E. R., & O'Shea, E. K. (1996). Regulation of PHO4 nuclear localization by the PHO80-PHO85 cyclin-CDK complex. Science, 271(5246), 209-212.
Paulsson, J., & Ehrenberg, M. (2000). Random signal fluctuations can reduce random fluctuations in regulated components of chemical regulatory networks. Physical Review Letters, 84(23), 5447.
Peccoud, J., & Ycart, B. (1995). Markovian modeling of gene-product synthesis. Theoretical Population Biology, 48(2), 222-234.
Pedraza, J. M., & van Oudenaarden, A. (2005). Noise propagation in gene networks. Science, 307(5717), 1965-1969.
Raj, A., Peskin, C., Tranchina, D., Vargas, D., & Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol, 4(10), e309.
Raj, A., & van Oudenaarden, A. (2009). Single-molecule approaches to stochastic gene expression. Annual Review of Biophysics, 38(1), 255-270.
Raser, J. M., & O'Shea, E. K. (2004). Control of stochasticity in eukaryotic gene expression. Science, 304(5678), 1811-1814.
Shahrezaei, V., Ollivier, J. F., & Swain, P. S. (2008). Colored extrinsic fluctuations and stochastic gene expression. Mol Syst Biol, 4.
Suter, D. M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., & Naef, F. (2011). Mammalian genes are transcribed with widely different bursting kinetics. Science, 332(6028), 472-474.
Tan, R. Z., & van Oudenaarden, A. (2010).
Transcript counting in single cells reveals dynamics of rDNA transcription. Mol Syst Biol, 6.
Taniguchi, Y., Choi, P. J., Li, G., Chen, H., Babu, M., Hearn, J., ... Xie, X. S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991), 533-538.
To, T., & Maheshri, N. (2010). Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science, 327(5969), 1142-1145.
Trcek, T., Larson, D., Moldan, A., Query, C., & Singer, R. H. (2011). Single-molecule mRNA decay measurements reveal promoter-regulated mRNA stability in yeast. Cell, 147(7), 1484-1497.
Volfson, D., Marciniak, J., Blake, W. J., Ostroff, N., Tsimring, L. S., & Hasty, J. (2006). Origins of extrinsic variability in eukaryotic gene expression. Nature, 439(7078), 861-864.
Waldron, C., Jund, R., & Lacroute, F. (1977). Evidence for a high proportion of inactive ribosomes in slow-growing yeast cells. Biochemical Journal, 168(3), 409-415.
Wolffe, A. (1991). Implications of DNA replication for eukaryotic gene expression. J Cell Sci, 99 (Pt 2), 201-206.
Wyart, M., Botstein, D., & Wingreen, N. S. (2010). Evaluating gene expression dynamics using pairwise RNA FISH data. PLoS Comput Biol, 6(11), e1000979.
Zenklusen, D., Larson, D. R., & Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol, 15(12), 1263-1271.
Zhurinsky, J., Leonhard, K., Watt, S., Marguerat, S., Bähler, J., & Nurse, P. (2010). A coordinated global control over cellular transcription. Current Biology, 20(22), 2010-2015.
Zopf, C. J., Quinn, K., Zeidman, J., & Maheshri, N. (2013). Cell-cycle dependence of transcription dominates noise in gene expression. PLoS Comput Biol, 9(7), e1003161.

CHAPTER 4. Characterization of the tetO gene regulatory function using cell-cycle arrested mRNA expression

4.1 Abstract

Isogenic cells in identical conditions display variability in expression.
This variability in gene expression impacts the dynamics and function of gene regulatory networks. Predicting the nature of this impact depends on the extent and statistics of the noise, which in turn depend on the biological and molecular origins of the noise. Hence, a prerequisite for understanding or designing the function of gene regulatory networks is to characterize the origins and statistics of the noise in response to regulatory conditions. The paradigm that noise arises from stochastic "bursts" of transcription is challenged by evidence of previously unappreciated extrinsic sources of noise.

We showed in Chapter 3 that large differences in transcriptional activity across the cell-cycle dominate noise in transcription from the tetO promoter in budding yeast. Here, we further characterize cell-cycle driven transcription across activation levels and for two promoters differing by operator number, to determine the origins of expression noise. By measuring single cell mRNA distributions in cells arrested at the G1/S and G2/M transitions, we find: (1) gene activation in S/G2 is correlated between each locus with a probability dependent on activator levels, (2) there is no G1 transcription below a certain activator threshold, (3) mRNA distributions from an active tetO promoter with a single operator are Poisson but from an active multiple operator tetO promoter they are super-Poissonian. Similar conclusions can be made for the native PHO5 promoter, indicating these results are not an artifact of using a synthetic gene. These results confirm that much of the variability in mRNA levels is due to the large differences in the probability of activation between cells in G1 and S/G2. However, they also show how aspects of promoter architecture that have been associated with affecting noise, such as increased operator number, interact with cell-cycle dynamics, providing an integrated picture.
4.2 Introduction

Noise in gene expression is the sum of all sources of variability leading up to transcription and expression of a gene. Some variability results from the stochastic, single-molecule biochemical reactions that comprise steady-state transcription at a gene locus. Experimental evidence suggests these dynamics of transcription are dominated by brief bursts of activity followed by long inactive periods, with "burstiness" modulated by promoter architecture and correlated with expression range (Cai, Friedman & Xie, 2006; Chubb et al., 2006; Golding et al., 2005; Raj & van Oudenaarden, 2009; Taniguchi et al., 2010). Noise in expression is often attributed to such dynamics, and used to infer how regulatory elements control transcription (see Table 1-1 for examples).

But noise also arises from fluctuations in global cell activities that modulate transcription or expression, and may result in expression with statistics that agree with transcriptional bursting (Figure 3-3). Even noise that appears intrinsic may depend on extrinsic factors in the cell's history (Hilfinger & Paulsson, 2011). Dynamics inferred from expression noise without accounting for extrinsic variability overpredict noise at the promoter and incorrectly infer regulatory modes.

In previous work, we demonstrated that expression noise from synthetic tetO promoters in budding yeast is largely due to differences in transcriptional activity between the G1 and S/G2 stages of the cell-cycle (Zopf, Quinn et al., 2013, Chapter 3). But our previous methods limited further characterization. Fluctuations in protein levels have low time and molecular resolution. Expression across a snapshot of a growing population has no temporal information, and cell-cycle fluctuations in transcription are blurred by slow mRNA turnover.
Data obtained with these methods could not discriminate between multiple qualitative descriptions of how transcription changes across the cell-cycle, did not titrate activator levels and was limited to the synthetic tetO promoter.

Here, we seek to quantitatively characterize transcription dynamics across the cell-cycle in response to activator levels and the number of promoter binding sites. Analysis of these measurements enables us to arrive at a model of the transcription dynamics and expression at a single gene locus across the cell cycle. We extend our investigation to the native PHO5 promoter and discuss the consequences of naively interpreting all transcriptional noise as intrinsic in origin.

4.3 Results

4.3.1 Analysis of single-molecule nuclear and cytoplasmic mRNA FISH in arrested cells reveals instantaneous cell-cycle dependent transcription

While we previously found much of the transcriptional variability of tetO promoters was due to cell-cycle phase dependent differences, we were unable to determine to what extent variability driven by noisy expression within each cell-cycle phase contributed to the overall noise. Moreover, it was difficult to distinguish whether a promoter was active.

In previous work in Chapter 3, we utilized mRNA FISH measurements of Venus YFP transcripts driven from genomically integrated tetO promoters to infer transcriptional activity. Because transcripts of Venus YFP have a half-life (t1/2 = 20 minutes, To & Maheshri (2010), Figure 6-2) on the order of G1 (~60 minutes) or S/G2/M (~60 minutes) in fast-growing conditions, steady-state YFP transcript levels represent a time-averaged activity of expression across different cell-cycle stages and are hence a poor reporter of cell-cycle stage-specific transcription. This prevented us from determining to what extent variability within a cell-cycle stage contributed to overall noise.
Moreover, it was difficult to precisely determine whether a promoter was active in G1, as the presence of mRNA could potentially be due to activity in the previous S/G2/M stage. We reasoned that steady-state mRNA FISH measurements of tetO promoter-driven Venus mRNA in cells arrested at the G1/S checkpoint (by addition of alpha-factor) or G2/M checkpoint (by addition of nocodazole) for several mRNA lifetimes would reflect the transcription dynamics when rapidly growing cells were in G1 or S/G2/M, respectively.

To validate this approach, we tested whether arrest affects normal transcription dynamics. To do so, we measured stationary mRNA expression distributions in both growing populations (stratified by cell-cycle stage according to our previous method, Chapter 3.3) and G1/S or G2/M arrested populations. Figure 4-1A describes the timing of mRNA FISH measurements post-arrest. Because an asynchronously growing population is arrested, cells will exist in all cell-cycle stages. The vertical timeline in Figure 4-1A describes events that would occur for a cell that was at the beginning of S phase and had a cell cycle of length 2 hours (1 hr for G1 and 1 hr for S/G2/M, as measured for these cells). So for example, if an S phase cell was arrested with nocodazole and fixed and analyzed after 5 hours of nocodazole treatment, it would actually have been arrested just under 4 hours (since it would take just under 1 hour for the cell to reach the G2/M checkpoint). Similarly, if an S phase cell was arrested with alpha-factor and fixed and analyzed after 5 hours of alpha-factor treatment, it would actually have been arrested for 3 hours (since it would complete a full 2 hour cell cycle before it reaches the G1/S checkpoint). The curly braces in Figure 4-1A describe the range of time various members of the cell population have actually been arrested subsequent to 2, 5, or 8 hours of nocodazole treatment, or 3 or 5 hours of alpha-factor treatment.
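The arrest-time bookkeeping above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses the 2 hour cycle quoted in the text (1 hour G1, 1 hour S/G2/M), places the alpha-factor arrest point at the end of G1 and, as a simplification, the nocodazole arrest point at the very end of S/G2/M, and uses the 20 minute Venus mRNA half-life to estimate how much pre-arrest mRNA would remain under first-order decay. All function and variable names are illustrative and are not part of the original analysis.

```python
G1_H = 1.0                 # G1 duration in hours (from the text)
SG2M_H = 1.0               # S/G2/M duration in hours (from the text)
CYCLE_H = G1_H + SG2M_H
HALF_LIFE_H = 20.0 / 60.0  # Venus YFP mRNA half-life of 20 minutes

# Arrest points expressed as time since the start of G1. Assumption: the
# G2/M arrest point is placed at the very end of S/G2/M.
ARREST_POINT_H = {"alpha-factor": G1_H, "nocodazole": CYCLE_H}

def time_to_arrest(position_h, drug):
    """Hours a cell at `position_h` (time since start of G1) needs to reach
    the arrest point; a cell exactly at the point is treated as just past it."""
    elapsed_since_point = (position_h - ARREST_POINT_H[drug]) % CYCLE_H
    return CYCLE_H - elapsed_since_point

def actual_arrest(treatment_h, position_h, drug):
    """Hours actually spent arrested after `treatment_h` hours of drug."""
    return max(0.0, treatment_h - time_to_arrest(position_h, drug))

def population_arrest_range(treatment_h):
    """(min, max) actual arrest time across an asynchronous population."""
    return max(0.0, treatment_h - CYCLE_H), treatment_h

def residual_fraction(arrest_h):
    """Fraction of pre-arrest mRNA remaining, assuming first-order decay."""
    return 0.5 ** (arrest_h / HALF_LIFE_H)

s_phase_start = G1_H  # a cell at the very beginning of S phase
print(actual_arrest(5, s_phase_start, "nocodazole"))    # text: just under 4 h
print(actual_arrest(5, s_phase_start, "alpha-factor"))  # text: 3 h
for hours in (3, 5):  # alpha-factor treatment times used in the text
    shortest, longest = population_arrest_range(hours)
    print(hours, shortest, longest,
          residual_fraction(shortest), residual_fraction(longest))
```

The population ranges follow because the time needed to reach the arrest point varies between nearly zero and one full cycle, so a treatment of T hours corresponds to between T - 2 and T hours of actual arrest.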
After these cells are fixed and stained, mRNA numbers per cell are automatically determined using custom image analysis software, and the corresponding mRNA distributions can be generated. Experimental error in mRNA FISH measurements like these arises from technical error in the actual measurement and biological error between biological replicate samples. mRNA FISH has been validated to not suffer from systematic technical error (Raj et al., 2008; To & Maheshri, 2010) and the measurement error is dominated by biological error. Based on multiple biological replicate measurements on different days, the biological error is never > 10% and usually ranges from 5-10%. When the mean and variance are estimated from these distributions the sampling error is < 5%, as distributions are generally constructed from N > 100 cells (N = 186 +/- 95 cells per sample, mean +/- st. dev.; see Appendix 6.4 for error analysis).

For cells in S/G2/M, two copies of the tetO promoter are present. Therefore, to control for this difference and facilitate comparison with cells in G1, we present the distribution of mRNA per gene copy. We do so by subjecting the distribution to binomial partition (random assignment of each mRNA to each gene copy). Hence data described here control for the effects of gene dosage, and any differences between G1- and G2/M-arrested distributions are due to additional cell-cycle dependent effects.

We first employed the above protocol on cells containing a genomically-integrated copy of the promoter of the lowly expressed DOA1 housekeeping gene (PDOA1). This gene has previously been reported to generate a near-Poisson steady-state mRNA distribution and hence would not be expected to show large differences in expression in G1- and G2/M-arrested cells. We find that the mRNA distribution does not change between 3 and 5 hours after alpha-factor arrest, and also looks similar at 2, 5, and even 8 hours after nocodazole
This verifies that arrest is not affecting expression. The G1-arrested distributions (blue bars in Figure 4-2A) have slightly higher means than S/G2, indicating there is no large decrease in expression in G1-arrested cells relative to G2/M-arrested cells.

The highly expressing P7xtetO (activator levels at 95% of the maximum) has also been previously shown to exhibit only 2-fold gene-dosage differences in transcriptional activity in G1 versus S/G2/M stages. Similar to PDOA1, steady-state expression is achieved by 3 (alpha-factor) or 5 (nocodazole) hour arrest, and the distributions look equivalent. Interestingly, at 8 hours post-nocodazole arrest the mean mRNA levels decrease significantly, suggesting that very prolonged arrest may alter transcript levels. Still, we can conclude that arrest has no global effects on gene expression at least 3-5 hours post-arrest (which constitute many mRNA lifetimes).

[Figure 4-1, panel A labels: 2 h nocodazole treatment = 0-2 h arrest; 5 h nocodazole treatment = 3-5 h arrest; 8 h nocodazole treatment = 6-8 h arrest; 3 h alpha-factor treatment = 1-3 h arrest; 5 h alpha-factor treatment = 3-5 h arrest. Panel C x-axis: cytoplasmic mRNA count per cell.]

Figure 4-1: mRNA FISH of cell-cycle arrested populations. (A) Asynchronously growing populations of yeast cells were arrested with (left) nocodazole or (right) alpha-factor, and cells were fixed 2, 5, and 8 hours post nocodazole treatment or 3 and 5 hours post-treatment for alpha-factor. Because the population is growing asynchronously, cells experience varied times of actual arrest, depicted by the range (see text for details). (B) Representative mRNA FISH images of (top row) growing, (middle row) G1-arrested and (bottom row) G2/M-arrested cells with highly-expressed P7xtetO. Bright field and fluorescent images of fixed cells hybridized with mRNA probes are shown in the left and middle columns, with mRNAs detected using custom image analysis routines overlaid in the right column. (C) Subsequent to image analysis, the corresponding mRNA distributions can be constructed, shown here for all S/G2/M growing (orange), G1-arrested (blue) and G2/M-arrested (red) cells with highly-expressed P7xtetO. Above each distribution is a box whisker plot where the black dot describes the mean of the distribution, the edges of the box denote the 25th to 75th percentile of the histogram, and the ends denote the 10th and 90th percentile. Because samples in S/G2/M have two copies of the tetO promoter, we construct an mRNA distribution on a per gene copy basis to facilitate direct comparison with distributions from cells in G1. This distribution is constructed by binomially partitioning the measured S/G2/M distribution (see text).

When activator levels are lower, we previously inferred large differences between expression in S/G2/M and G1, given there was a 4-to-6-fold difference in respective mRNA levels when measured in growing populations. After 3 hours of arrest in G1, we would expect any contribution of mRNA transcripts arising during the previous S/G2/M stage to have disappeared due to degradation. G1-arrested mRNA levels from the P7xtetO promoter at 13% activation fall from 9 mRNA on average per cell in growing G1 cells to just 1 mRNA per cell, with a median of 0 mRNA, after 3 hours of arrest (Figure 4-2C). Therefore 3 hours of G1 arrest, which lasts 6-to-12 mRNA half-lives, reflects steady-state G1 expression. After 2 hours of G2/M arrest (where the asynchronous population experiences between 0 and 2 hours of actual arrest), the expression is slightly lower than in growing cells identified as near G2/M based on bud size.
Interestingly, whereas expression at 95% activation was unchanged until 5 hours of arrest, G2/M expression at 13% activation falls between 2 hours and 5 hours of arrest, suggesting activator-dependent stability of the active state. (This trend is identical at 5% activation, data not shown.) This activator-dependent decrease in mRNA also confirms that arrest has not simply ceased all production and degradation of mRNA. Similarly, 2 hours of G2/M arrest is a pseudo-steady-state approximation of late G2/M activity. Extended durations of G2/M arrest inform on the stability of the G2/M active state in the absence of the mitosis/G1 transition.

[Figure 4-2, panels A-E. Axes: cytoplasmic mRNA per gene copy (A-C); fraction of active cells (D, E).]

Figure 4-2: Cytoplasmic mRNA distributions and active cell fractions various times after cell-cycle arrest. (A, B, C) mRNA expression distributions are presented using box and whisker plots as in Figure 4-1C for the (A) constitutive DOA1 promoter, (B) a highly activated 7xtetO promoter, and (C) a lower expressing 7xtetO promoter. In both (A) and (B), arrested distributions plateau at 3 (for alpha-factor) and 5 (for nocodazole) hours post-arrest, indicating arrest does not affect transcriptional activity in this time range. G1- (blue) and G2/M- (red) arrested cell distributions are nearly equivalent, hence no significant cell-cycle stage differences in transcriptional activity are present. In contrast, at lower expression, mRNA levels in G1-arrested cells steadily decrease from that in late G2/M until there are no mRNA present 3 hours post-arrest. This shows no expression in G1 4 hours after S-phase.
Expression under G2/M arrest decreases too, though more slowly. (D) The fraction of active cells, indicated by a nuclear spot, remains high for 95%-activated P7xtetO. Note that cells in G2/M have two gene copies: if either is active then the cell appears active. (E) The fraction of active cells in G1 for 13%-activated P7xtetO drops to zero after M-phase and 3 hours of arrest, showing clear lack of G1 activity. Experimental error is not shown, but error in mean expression is 5-10% between sample replicates and an additional < 5% due to population sampling.

But, compared with the mRNA at the site of transcription in the nucleus, this cytoplasmic mRNA is a relatively delayed readout of transcriptional activity. It may also contain additional noise from mRNA processing and export. Nuclear mRNA is visible as a bright spot in mRNA FISH images and aligns with a DAPI stain (Figure 4-1B). Partially complete nascent mRNA transcripts may contribute significantly, but we interpret the spot intensity as the equivalent number of full transcripts. PDOA1 expression produces no bright nuclear mRNA spots because of its low, Poisson-distributed expression. Measurement of the rate of nuclear spot degradation with thiolutin treatment finds that the lifetime of the nuclear spot is similar to that of cytoplasmic mRNA (see Appendix Figure 6-2). Therefore, presence of a nuclear spot indicates active or recent transcription, and its intensity indicates the magnitude of activity; absence of nuclear mRNA indicates that a cell has been OFF for tens of minutes or longer, given its 20 minute half-life. Across all samples, average cytoplasmic mRNA counts are double the average nuclear mRNA counts (Figure 6-3), suggesting nuclear export occurs at twice the rate of cytoplasmic mRNA degradation (Appendix 6.3.2).

The fraction of active cells during arrest corroborates that arrest maintains growing transcription dynamics. It is high for 95%-activated P7xtetO expression and maintained during arrest (Figure 4-2D). For 13%-activated P7xtetO, few cells are active in G1 arrest (3%). We employ more detailed modeling of transcriptional dynamics later in this chapter. But we can quickly assess whether expression in growing G1 cells agrees with transcription at arrested G1 levels buffered by history from higher S/G2/M transcription levels. It is consistent, with 37% of cells active, versus 30-40% expected from the sum of degradation of S/G2/M expression and transcription at low G1-arrest levels (Figure 4-2E). (This is also true for median cytoplasmic mRNA expression, observed at 5 mRNA for growing G1 cells, compared with an expectation of 5-8 mRNA.) Therefore cytoplasmic and nuclear mRNA expression under arrest in G1 (3 hours) and G2/M (2 hours) is a readout of G1 and S/G2/M transcription dynamics. The data also suggest that extended periods in G2/M have a similar but dampened effect to the M/G1 transition in turning OFF transcriptional activity at low and intermediate expression.

4.3.2 mRNA expression under cell-cycle arrest reveals that activator regulates probabilistic activation of a long-lived transcribing state in S/G2

To explore a large domain of regulatory conditions, we titrated expression over a large range of activator levels at the tetO promoter with 1 and with 7 activator binding sites, then arrested growth and measured mRNA expression (Figure 4-3). The trend in how mRNA levels change with activator level is striking: the ratio of G2/M to G1 expression per locus shifts from infinite (no G1 expression) to equivalent as activator level increases (Figure 4-3A,B). For both P1xtetO and P7xtetO, basal expression occurs in S/G2/M, but not in G1 (Figure 4-3, blue). This confirms our previously inferred results (Zopf et al., 2013). Nuclear mRNA measurements in G1-arrested cells reveal a clear threshold activator level below which there is no expression in G1 (40% for P1xtetO, 20% for P7xtetO; Figure 4-3C,D).
The trend in the fraction of cells with nuclear mRNA spots ("ON" cells) reveals new information about underlying dynamics. The fraction of ON cells increases just as much as the overall increase in expression levels (Figure 4-3C,D), meaning that cells' likelihood of activity, and not the productivity of an active cell, is the activator's major point of control. Because we infer dynamics from static measurements, we cannot directly measure the dynamics of switching between these ON and OFF states. But the rate can be confidently estimated using the 20-30 minute nuclear mRNA half-life and the extent of separation between ON and OFF subpopulation expression (Figure 4-3E-H). If the timescale of switching is faster than 20 minutes, most cells will appear ON. If switching is slower than this but still comparable to the mRNA lifetime, cytoplasmic expression will be similar between cells with or without nuclear mRNA. Because we observe a large separation between cytoplasmic mRNA expression levels of ON and OFF cells (Figure 4-3E-H), the timescale of switching must be longer than the mRNA lifetime. Given we know expression changes at the M/G1 transition, we conclude that expression is sustained throughout S/G2. From this, and the previous section, we can conclude that the tTA activator regulates the likelihood of activation in S/G2.

[Figure 4-3, panels A-H: for 1xtetO and 7xtetO under G1 and G2 arrest across activator levels, cytoplasmic mRNA per gene copy (A, B), fraction of active cells (C, D), cytoplasmic mRNA per gene copy in ON cells (E, F) and in OFF cells (G, H).]

Figure 4-3: Activator titration of mRNA expression under cell-cycle arrest.
(A,B) For P1xtetO and P7xtetO, there is no mRNA expressed in G1-arrested cells at low activator levels. As activator increases, expression in G1 and S/G2/M becomes equivalent, when examined on a per gene copy basis. (C,D) Increase in the fraction of active cells explains most of the dynamic range. (E-H) Expression substantially differs between the ON (E,F) and OFF (G,H) subpopulations, indicating a long-lived active and inactive state. Activator-dependent increase in ON-state expression is small, and later proves to be due merely to the increasing fraction of S/G2/M ON cells with 2 rather than 1 active loci. Experimental error is 10-15% (Appendix 6.4).

4.3.3 Conditional variances quantify the origins of gene expression noise

Given that expression level varies substantially between cells in G1 or S/G2/M and with or without nuclear mRNA ("ON" or "OFF"), a breakdown of total expression variance into the contribution from each subpopulation is a useful visualization of the origins of noise. First, we review the trend in noise across growing populations (Figure 4-4B&E, black). The Fano factor of growing population expression is higher for P7xtetO than P1xtetO, as previously reported for these promoters (To & Maheshri, 2010). In both cases, the Fano factor increases at low levels of activator then plateaus or decreases at higher levels of activator. (In the bursting model, the Fano factor is equivalent to the "burst size", and we revisit this interpretation in Section 4.4.1.) Breaking down growing populations by cells' activity state (ON/OFF) and cell-cycle stage begins to explain this trend. Figure 4-4A&D shows growing populations' cytoplasmic mRNA expression distributions broken down into cell-cycle subpopulations. G1 expression distributions differ from S/G2/M.
We can decompose the variation in mRNA levels (X) due to the conditions (C) of cell-cycle stage and whether the promoter is ON/OFF using the Law of Total Variance:

4-1: $$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid C)] + \mathrm{Var}(E[X \mid C])$$

The first term is the weighted sum of the variance within each conditioned subpopulation, such as cells in G1. This may still contain variability from extrinsic sources that affect expression in a given state, along with the variance from stochastic single-molecule transcription events. The second term is the variance of the mean of each subpopulation, and hence describes the variability due to differences between expression in the different conditions. Figure 4-4B&E breaks down the variance (normalized by population mean) into variance derived within (labeled by the condition) and between the cell-cycle stages (labeled by "Between"), and Figure 4-4C&F normalizes by total variance to show relative contributions. We see that differences in expression across the cell cycle alone contribute 20-30% of total expression variance. But the slow degradation of mRNA means that expression in each cell-cycle stage contains history of transcription in previous cell-cycle stages. Thus the variance "Between" the subpopulations under-predicts the variance derived from transcriptional differences between these cell-cycle and activity (ON/OFF) states.

[Figure 4-4, panels A-F: for 1xtetO and 7xtetO at 1%, 13%, 50% and 95% activation, cytoplasmic mRNA count per cell distributions (A, D) and variance contributions versus activation (B, C, E, F). Legend: population Fano factor; G1; S/G2/M; S/G2/M late.]

Figure 4-4: Variability in growing populations apportioned to variance within and between cell-cycle stages.
(B,E, black triangles) The Fano factor of growing populations increases and then plateaus, at a higher noise level for P7xtetO than P1xtetO. The origin of this trend is revealed later in Figure 4-12. (A) Growing population P1xtetO cytoplasmic mRNA expression distributions for four activator levels (1%, 13%, 50%, 95% activation). G1 and S/G2/M expression differ substantially. (B,C) Breakdown of total variance in cytoplasmic mRNA expression, normalized by population mean to give units of mRNA, into variance from within and between cell-cycle stages. (D,E,F) P7xtetO expression, as for (A,B,C).

However, in the previous section we found that the fraction of active cells is a large difference between cells in G1 and S/G2/M. Figure 4-5 shows expression and variance conditioned on the subpopulations of active (ON) or inactive (OFF) cells in G1 and S/G2/M. As before, Figure 4-5B&E and Figure 4-5C&F show contribution to normalized and total variance for P1xtetO and P7xtetO respectively. The variance between these subpopulations is substantial, at 50-70%. Of course, the S/G2/M ON subpopulation includes cells with 1 or 2 loci ON, and noise is approximately apportioned between the two here. Not all variability within the subpopulations is due to stochastic transcription. Some expression variability in G1-OFF cells at lower activator levels is from levels of mRNA decreasing throughout G1 (which has a lower transcription rate than the preceding S/G2/M). It also reflects fluctuations in mRNA processing and export and variable global cell activity, perhaps related to cell size. This leaves stochastic, single-molecule promoter fluctuations contributing no more than 25% of the total noise in expression from these tetO promoters.
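The Law of Total Variance decomposition of Equation 4-1 can be sketched numerically as follows. This is a minimal illustration, not the analysis code behind Figures 4-4 and 4-5: the function name and the example counts are hypothetical, and population (not sample) variances are assumed so that the two terms sum exactly to the total.

```python
from statistics import fmean, pvariance


def decompose_variance(groups):
    """Split total variance into within- and between-subpopulation parts.

    groups maps a condition label (e.g. "G1 OFF") to a list of mRNA counts.
    Returns (within, between, total), all as population variances.
    """
    pooled = [x for counts in groups.values() for x in counts]
    n_total = len(pooled)
    grand_mean = fmean(pooled)
    # E[Var(X|C)]: subpopulation variances weighted by subpopulation size
    within = sum(len(c) / n_total * pvariance(c) for c in groups.values())
    # Var(E[X|C]): weighted spread of subpopulation means around the grand mean
    between = sum(len(c) / n_total * (fmean(c) - grand_mean) ** 2
                  for c in groups.values())
    return within, between, pvariance(pooled)


# Hypothetical counts for illustration only, not data from this thesis
example = {
    "G1 OFF": [0, 1, 0, 2, 1, 0],
    "G1 ON": [6, 8, 7],
    "S/G2/M ON": [12, 15, 9, 14],
}
within, between, total = decompose_variance(example)
print(f"between-subpopulation share of variance: {between / total:.0%}")
```

Dividing each term by the population mean gives contributions in units of mRNA, as in panels B and E; dividing by the total gives the relative contributions of panels C and F.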
[Figure 4-5, panels A-F: for 1xtetO and 7xtetO at 1%, 13%, 50% and 95% activation, cytoplasmic mRNA count per cell distributions (A, D) and variance contributions versus activation (B, C, E, F). Legend: variance between subpopulations; S/G2/M, ON at 1 locus; S/G2/M, ON at 2 loci.]

Figure 4-5: Variability in growing populations apportioned to variance within and between cell-cycle and ON/OFF subpopulations, showing that variance between subpopulations dominates. (A) Growing population P1xtetO expression distributions for four activator levels, showing the changing contributions of G1 and S/G2/M OFF and ON subpopulations. (B,C) Subpopulations' contribution to total variance in cytoplasmic mRNA expression, showing that variance between the subpopulations is substantial. (D,E,F) P7xtetO expression, as for (A,B,C).

Breaking down the expression variability in near steady-state arrested cell populations better represents the underlying transcription in the ON/OFF active states and informs our choice of model. For G2/M-arrested cells, expression distributions show that the fraction of cells in the OFF or ON state changes dramatically with activator level, but how the mean and shape of the OFF or ON distribution change is unclear (Figure 4-6A&D). Accounting for subpopulations with 1 or 2 active loci reveals that noise between the ON and OFF populations dominates, contributing 25-50% of variance from P1xtetO and P7xtetO expression (Figure 4-6C&F).
[Figure 4-6, panels A-F: for 1xtetO and 7xtetO at 1%, 13%, 50% and 95% activation, cytoplasmic mRNA count per cell distributions (A, D) and variance contributions versus activation (B, C, E, F). Legend: variance between subpopulations; G2/M arrest, OFF; G2/M arrest, ON at 1 locus; G2/M arrest, ON at 2 loci.]

Figure 4-6: Variability in G2/M-arrested populations apportioned to variance within and between ON/OFF subpopulations. (A) P1xtetO expression distributions show OFF and ON subpopulations are tighter than the full distribution. (B,C) Even within a cell-cycle phase, much of the variance in expression derives from variation between, rather than within, ON and OFF subpopulations. (D,E,F) P7xtetO, as for (A,B,C). Variance within the subpopulations comprises a higher fraction of total variance compared with P1xtetO, but variation between the ON and OFF populations is still high.

Cells arrested in G1 have only one gene locus and so are the closest direct reflection of dynamics at a single locus (Figure 4-7). At most activation levels, variance in the OFF subpopulation dominates, partly because it is the largest subpopulation. Comparison of the normalized variance for P1xtetO and P7xtetO shows that P7xtetO has higher intrinsic variability, by the same 2-fold ratio over P1xtetO as in the original growing population Fano factor. But the Fano factor of these subpopulations is much smaller, around 2 and 4, which begins to reveal the true dynamics at the promoter and informs our choice of model for the next section.
[Figure 4-7, panels A-F: for 1xtetO and 7xtetO at 1%, 13%, 50% and 95% activation, cytoplasmic mRNA count per cell distributions (A, D) and variance contributions versus activation (B, C, E, F). Legend: variance between subpopulations; G1 arrest, ON at 1 locus.]

Figure 4-7: As for Figure 4-6, but for expression under G1 arrest, which is from a single locus. (B,E) Variance from active loci is relatively small compared with variance from the original total growing population. (C,F) Variance between the OFF and ON states is up to 5M% of total variance.

Taken together, this analysis shows that variability between subpopulations dominates the total noise in expression and the variability within a single state is surprisingly low. But a model is useful to characterize the underlying transcription, beyond the simple statistics of the resulting expression distribution. In addition, we have not been able to account for the complication of expression from 1 or 2 active loci in S/G2/M conditions nor to understand the kinetics of transitioning between cell-cycle stage expression levels in growing populations. This motivates a model of transcription dynamics in each cell-cycle stage.

4.3.4 A model of transcription underlying arrested expression distributions reveals that activator only regulates the probability and stability of activity, whereas the promoter determines active transcription dynamics

A model of transcription at the tetO promoter in G1 and S/G2/M will provide insight into the activator's mechanism of regulating transcription and the dynamics of transcription, and enable us to link these insights to the expression observed in growing populations. Analysis of nuclear mRNA has shown that the general dynamics of transcription from the tetO promoter across the cell cycle are activation events, in S/G2/M and in G1, to an ON state that lasts throughout a cell-cycle stage. Thus the metrics for characterizing transcription are the likelihood of activation and the expression from an active locus. Although cytoplasmic mRNA is the functional product of transcription, we have described how nuclear mRNA is a more direct readout of transcription. Thus the model is based on the presence and intensity of nuclear mRNA spots.
The nuclear mRNA, rather than cytoplasmic mRNA, expression distribution in G1-arrested cells is the most direct readout of transcription dynamics from an active locus. Cells in G2/M arrest contain two loci and we cannot distinguish whether one or two are ON. This presents a challenge to separating activation likelihood and active expression in S/G2/M, which we overcome later in the model development. Thus we compare the nuclear expression distributions in G1-arrested cells with a single active locus to see how they depend on activator level and binding site number. These distributions are overlaid in Figure 4-8A. Expression is unaffected by activator level, but differs for P1xtetO and P7xtetO. Thus active expression is a function of promoter only.

To gain intuition about the transcription dynamics underlying the expression and in order to develop a predictive model, we parameterize the distribution. (We incorporate data from all activator levels, weighted by the fraction of active cells at each activator level to account for sampling error in distributions constructed from fewer cells.) Thus we seek a model of transcription that produces an expression distribution that fits the P1xtetO and P7xtetO G1-arrest nuclear expression distributions. The P1xtetO expression distribution has a Fano factor close to 1 mRNA, suggesting low variability in the transcription rate. (We've previously shown that transcription producing mRNA at a constant rate, with first-order mRNA degradation, results in a Poisson distribution, which has a Fano factor of 1 mRNA.) The distribution is well-fit by a Poisson distribution with a rate of 7 mRNA produced per nuclear mRNA lifetime (overlaid on the data in Figure 4-8A). Constant transcription from the active P1xtetO promoter is surprising: this is recognized as a "noisy" gene, yet once active it has minimal variability in the timing of transcription events. The P7xtetO expression distribution has higher variance.
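The link between constant-rate production with first-order degradation and a Poisson distribution with Fano factor 1 can be checked numerically. The sketch below is an illustration rather than the fitting procedure used here: it builds the stationary distribution of a birth-death process from detailed balance, with a mean of 7 mRNA per nuclear mRNA lifetime as quoted for the active P1xtetO locus. The function names and the truncation at 200 mRNA are assumptions made for the example.

```python
def stationary_distribution(mean_count: float, n_max: int = 200):
    """Stationary distribution of constant production (rate mu) with
    first-order degradation (rate delta per molecule).

    Detailed balance gives p[n + 1] * (n + 1) * delta = p[n] * mu,
    so only the ratio mean_count = mu / delta matters.
    """
    weights = [1.0]
    for n in range(n_max):
        weights.append(weights[-1] * mean_count / (n + 1))
    norm = sum(weights)
    return [w / norm for w in weights]


def mean_and_fano(p):
    """Mean and Fano factor (variance / mean) of a distribution over 0..n."""
    mean = sum(n * pn for n, pn in enumerate(p))
    var = sum((n - mean) ** 2 * pn for n, pn in enumerate(p))
    return mean, var / mean


p_active = stationary_distribution(7.0)
mean, fano = mean_and_fano(p_active)
print(f"mean = {mean:.3f} mRNA, Fano factor = {fano:.3f}")
```

A measured Fano factor well above 1 in active cells, as for P7xtetO, therefore cannot come from a single constant transcription rate under this simple scheme.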
Thus active P7xtetO promoters must be transcribed with a range of transcription rates. We cannot detect whether the transcription rate at a single promoter fluctuates quickly (much faster than a cell-cycle stage) or whether a single promoter maintains a particular transcription rate for an entire cell-cycle stage, with variation in the rates between promoters. But the lifetime of nuclear mRNA at the active locus (~10 minutes) sets a "joint distribution" on the range and rate of switching transcription rates. For the former option, a single promoter must sample greatly differing (>> 10-fold) transcription rates during a cell-cycle stage, which are "averaged" by the nuclear mRNA at the locus. In this case, the expression variance still comes from some cells sampling a higher transcription rate than others "on average" throughout the cell-cycle stage. But for the latter option, the variance of the expression distribution will give the range of transcription rates. Because we cannot easily distinguish between the two, and functionally they are similar, we create a model reflecting the latter situation. But we could use that model to extrapolate and consider the rate and range of transcription rates sampled by a single promoter.

Thus we model expression with a range of transcription rates, ignoring fluctuations between rates. This could be done by simply fitting the expression distribution to an empirical curve that represents the probability of a given rate of transcription (this is equivalent to breaking down the super-Poisson distribution into infinite Poisson distributions). But we can gain more intuition if we fit the distribution to a finite number of Poisson distributions, because that gives a simple indication of the range of transcription rates.
We choose transcription rates that are integer multiples of the P1xtetO rate, inspired by the fact that the left side of the P7xtetO expression distribution is well-fit by a Poisson distribution with 7 mRNA produced per nuclear mRNA lifetime, like P1xtetO. This suggests that the minimum transcription rate from an active P7xtetO locus occurs when it behaves like P1xtetO, perhaps having only one of seven binding sites occupied. To choose how many Poisson distributions to use to parameterize the expression distribution, we fit the distribution and then compare the squared error. One Poisson distribution cannot capture the super-Poisson variance. The maximum mRNA count is around 60 mRNA, suggesting that ~10 integer-multiple Poisson distributions will capture all of the variability. But this introduces unnecessarily many variables: most of the Poisson distributions would contribute little to the total expression. Therefore we compare the goodness-of-fit for between 1 and 10 Poisson distributions and choose 3 Poisson distributions as a balance of capturing the shape (and variance) of the distribution with the fewest parameters. Table 4-1 shows the contributions of the multiple Poisson distributions and the least-square error. Thus we parameterize transcription from the active P7xtetO locus with a rate of 7 mRNA (20% probability), 14 mRNA (50% probability) or 21 mRNA (30% probability) per nuclear mRNA lifetime. As discussed above with the choice of the form of the model, these are likely not independent integer-multiple states, but serve as a representation of the spread of activity states. An additional piece of evidence (data not shown) is that the correlation between mRNA level and cell size is no greater for active cells with P7xtetO than P1xtetO expression, indicating that the P7xtetO activity state is not dictated by size-correlated global cell activity.
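The three-rate parameterization above can be written down as a finite Poisson mixture. The sketch below is an illustration under the stated rates and probabilities, not the fitting code; the function names are hypothetical. The variance of a Poisson mixture is the mean rate plus the variance of the rates, which is the Law of Total Variance applied to the rate.

```python
import math

RATES = (7.0, 14.0, 21.0)   # mRNA per nuclear mRNA lifetime, from the text
WEIGHTS = (0.2, 0.5, 0.3)   # probability of each rate, from the text


def mixture_pmf(n: int, rates=RATES, weights=WEIGHTS) -> float:
    """Probability of observing n nuclear mRNA under the Poisson mixture."""
    return sum(w * math.exp(-r) * r ** n / math.factorial(n)
               for r, w in zip(rates, weights))


def mixture_mean_and_fano(rates=RATES, weights=WEIGHTS):
    """Mean and Fano factor of the mixture, computed analytically."""
    mean = sum(w * r for r, w in zip(rates, weights))
    # Var = E[rate] + Var(rate) for a mixture of Poisson distributions
    rate_var = sum(w * (r - mean) ** 2 for r, w in zip(rates, weights))
    return mean, (mean + rate_var) / mean


mean, fano = mixture_mean_and_fano()
print(f"mean = {mean:.1f} mRNA, Fano factor = {fano:.2f}")
```

With these weights the mixture has a mean of 14.7 mRNA and a Fano factor of about 2.6, so the spread of rates alone lifts the Fano factor above the Poisson value of 1.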
The G1-arrest nuclear expression distributions are overlaid with the model in Figure 4-8A.

Table 4-1: Model selection for active expression distributions. [Table layout not recoverable from the extracted text. It lists, for P1xtetO and P7xtetO, the parameters of best fit and the sum of squared error for models with 1 to 4 ON states. Legible entries include a first state at 7 mRNA, additional states at 2-fold, 3-fold and 4-fold that rate with their probabilities, and sums of squared error between 0.1e-4 and 1.8e-4.] * Fit is for nuclear mRNA in active arrested cells.

Next, we aim to parameterize the other metric of active transcription: the likelihood of activation as a function of cell-cycle stage and activator level for P1xtetO and P7xtetO. But to do this, we first need to determine whether active cells in G2/M arrest have 1 or 2 active loci. We can solve this with the parameterized active-locus expression distribution obtained from the G1 arrest expression distributions (above). The G2/M expression distributions comprise cells with 1 or 2 active loci, each transcribed according to the dynamics of G1 arrest. (This is actually an assumption that proves true when fitting the G2/M distributions.) Thus the G2/M arrest expression distribution at each activator level is fit by a weighted combination of the expression distributions from a single active locus and two-fold that rate. This is shown for four activator levels in Figure 4-8B. The weighting that gives the best fit is concluded to be the fraction of cells with 1 or 2 active loci.

Now we have the probability (or fraction) of a G2/M-arrested cell having 0, 1 or 2 active loci, for each activator level, for P1xtetO and P7xtetO. Our first observation is that P1xtetO and P7xtetO activation likelihoods are the same when the "effective" activator level is doubled for P7xtetO, to scale by mean expression, just as it was for the activator level threshold for the presence/absence of G1 activity.
The 2-fold increase in "effective" activator level likely results from multiple binding opportunities at the sites closest to the promoter, but we don't consider its origin here. So we combine P1xtetO and P7xtetO data on this adjusted activation scale. This gives a curve for the fraction of cells with 0 (light grey), 1 (medium grey) or 2 (black) active loci (Figure 4-8C). These were parameterized by a binding curve for two loci with an additional parameter to capture any correlation between the two loci's activation:

4-2: $$P_{G2,1Loci} = \frac{2(A'/K)}{1 + 2(A'/K) + w(A'/K)^2}, \qquad P_{G2,2Loci} = \frac{w(A'/K)^2}{1 + 2(A'/K) + w(A'/K)^2}$$

$$P_{G2,AnyLoci} = \frac{P_{G2,1Loci}}{2} + P_{G2,2Loci}$$

where $A' = A + A_{basal}$, best fit by w = 25, K = 1.9.

These fits are shown in Figure 4-8C (grey, black; and red background). The positive value for w indicates positive correlation between activation of mother and daughter copies of the gene. These probabilities of 1 or 2 active loci are combined to give the likelihood that any given locus is active (Figure 4-8C, white; Table 4-2). These 1-, 2- and Any loci activation probabilities quantify the Pearson correlation coefficient for the correlation between mother and daughter gene copies:

4-3: $$P_{OFF} = (1 - P)^2 + aP(1 - P)$$

$$P_{1ON} = 2P(1 - P)(1 - a)$$

$$P_{2ON} = P^2 + aP(1 - P)$$

where $P = P_{G2,AnyLoci}$.

Both loci are active more often than expected, indicating extrinsic correlation in activation. On average, the Pearson correlation coefficient a = 0.5, which matches the correlation between activity at two homologous loci of a diploid observed in the previous chapter (Zopf et al., 2013). The observed basal likelihood of activation in the absence of any activator is written directly into the binding curve, but has a small effect on greater-than-basal expression.

The case for G1 activity likelihood is simpler: there is only one locus, so P(Locus ON) is directly observable, and is plotted in Figure 4-8D for P1xtetO and P7xtetO together.
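The relationship between the two-locus binding curve and the correlation coefficient can be sketched in a few lines. This is an illustration under two assumptions about how Equations 4-2 and 4-3 read: that Equation 4-2 is a two-site binding polynomial with denominator 1 + 2(A'/K) + w(A'/K)^2, and that the per-locus probability is half the one-locus probability plus the two-locus probability. The function names are hypothetical; the default K and w are the best-fit values quoted above.

```python
def two_locus_probabilities(a_eff: float, K: float = 1.9, w: float = 25.0):
    """Probabilities of 0, 1 or 2 active loci at effective activator level
    a_eff (assumed form of Equation 4-2)."""
    x = a_eff / K
    z = 1 + 2 * x + w * x * x
    return 1 / z, 2 * x / z, w * x * x / z


def per_locus_probability(p1: float, p2: float) -> float:
    """Probability that any given locus is active."""
    return p1 / 2 + p2


def correlated_pair(P: float, a: float):
    """Equation 4-3: probabilities of 0, 1 or 2 loci ON for two loci that are
    each ON with probability P and have Pearson correlation a."""
    off = (1 - P) ** 2 + a * P * (1 - P)
    one = 2 * P * (1 - P) * (1 - a)
    two = P ** 2 + a * P * (1 - P)
    return off, one, two


def implied_correlation(p1: float, p2: float) -> float:
    """Solve the two-loci-ON line of Equation 4-3 for a."""
    P = per_locus_probability(p1, p2)
    return (p2 - P ** 2) / (P * (1 - P))


p0, p1, p2 = two_locus_probabilities(0.5)  # 0.5 is an arbitrary example level
a = implied_correlation(p1, p2)
print(f"P(locus ON) = {per_locus_probability(p1, p2):.2f}, a = {a:.2f}")
```

Under this reading, a cooperativity factor w above 1 always yields a positive correlation a, and w = 1 recovers two independent loci (a = 0).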
This is also fit to a simple binding curve, including a specified minimum threshold for activation (Table 4-2):

4-4: $$P_{G1} = \frac{A'/K}{1 + A'/K}$$

where $A' = A - A_{thres}$, best fit by K = 0.48.

But our understanding of transcription across the cell cycle is that S/G2 provides the window for activation whereas M/G1 is a window of heightened likelihood of turning OFF. Thus the probability of staying ON characterizes the process better than the probability of being ON at all. This is shown in Figure 4-8E. Interestingly, the curve is sharp and almost linear. The threshold for activity in G1 appears to just be the activator level below which all cells have turned OFF by G1 (rather than an abrupt change in activator behavior). This also suggests that activator levels control the stability of the active complex. (Activator levels appear to control the stability of the active complex in long-term G2/M arrest too, but with lower inactivation rates than G1 (Figure 4-2).) Thus either the M/G1 transition is a window of increased likelihood of inactivation or the G1 environment is more conducive to inactivation than G2/M. Given that the M/G1 transition involves extensive chromatin remodeling, we favor the former.

[Figure 4-8, panels A-E: (A, B) nuclear mRNA count distributions, data and model, for 1xtetO and 7xtetO; (C, D) fraction of cells ON versus activation (0-200%) for 1xtetO and 7xtetO; (E) fraction ON in G1 versus P(ON) in S/G2. Legend: G2, any locus ON; G2/M, 1 locus ON; G2/M, 2 loci ON; G1, OFF; G1, ON.]

Figure 4-8: Development of a model of tetO transcription dynamics. (A) The active-locus nuclear mRNA expression distribution is fit to expression in active G1-arrested cells. The boxplot is a visual aid of the data (blue) to model (black) fit.
(B) Active G2/M-arrested cells' expression (red) is a combination of the active-locus distribution and the two-active-loci distribution (thin black lines), weighted by the fraction of active cells with 1 or 2 active loci and summed (thick black line). Each activator level's expression distribution is fit to find the best-fit fraction of cells with 1 or 2 active loci. The boxplot is a visual aid of the data (red) to model (black) fit. (C) The model fit from (B) gives the fraction of cells with 0, 1 or 2 loci ON as a function of activator level for P1xtetO and P7xtetO. This gives the fraction of all loci ON (white). (D) The fraction of cells ON in G1 is observed directly from experimental results, (E) but the probability of staying ON after G2/M may better reflect the situation.

Table 4-2: Summary of the model of arrested tetO transcription dynamics

Probability of activity, G2/M:
$$P_{G2,1Loci} = \frac{2(A'/K)}{1 + 2(A'/K) + w(A'/K)^2}, \qquad P_{G2,2Loci} = \frac{w(A'/K)^2}{1 + 2(A'/K) + w(A'/K)^2}, \qquad P_{G2,AnyLoci} = \frac{P_{G2,1Loci}}{2} + P_{G2,2Loci}$$
w = 25; K = 190% (or 95% for P7xtetO units); $A' = A + A_{basal}$ ($A_{basal}$ value illegible in source)

Probability of activity, G1:
$$P_{G1} = \frac{A'/K}{1 + A'/K}$$
K = 48% (or 24% for P7xtetO units); $A' = A - A_{thres}$, $A_{thres}$ = 0.4 (or 0.2 for P7xtetO units)

Active-locus transcription:
Nuclear mRNA: $$\frac{dN}{dt} = \mu - \delta_N N$$
Cytoplasmic mRNA: $$\frac{dM}{dt} = \delta_N N - \delta_M M$$
P1xtetO: $\mu = 7\,\delta_N$
P7xtetO: $\mu = 1 \times 7\,\delta_N$ (P = 0.2), $2 \times 7\,\delta_N$ (P = 0.5), $3 \times 7\,\delta_N$ (P = 0.3)
$\delta_N = k_{Export}$ = 1/(10 minutes); $\delta_M = k_{Degradation}$ = 1/(20 minutes)

The final assumption to complete the full model of transcriptional dynamics is that there is no transcription from the inactive state. The expression observed in OFF cells does suggest some low-level basal activity, which may even be activator-dependent, but its contribution to overall expression is low and so we choose to ignore it. This model of tetO transcription dynamics is summarized in Table 4-2.

The steady-state model under-predicts variability in cytoplasmic mRNA expression, which is higher than variability in nuclear mRNA.
This indicates that mRNA processing and export introduce noise, most clearly evidenced by the Fano factor of nuclear versus cytoplasmic mRNA expression in active cells. P1xtetO active-cell mRNA expression distributions have a Fano factor of 1-2 for nuclear mRNA and 3-4 for cytoplasmic mRNA. (For P7xtetO, the Fano factor is 4-5 for nuclear and 6-8 for cytoplasmic mRNA.) Interestingly, this increase in noise from nuclear to cytoplasmic mRNA comes with increased correlation between cell size and mRNA count (Pearson correlation coefficient ρ ~ 0 for nuclear mRNA and ρ ~ 25% for cytoplasmic mRNA). Thus extrinsic variability in global cell activity may affect mRNA processing and export, explaining the additional noise in expression. Another limitation of the model is that it doesn't attempt to capture outlying, extremely intense (>80 mRNA) nuclear spots, which we observed in about 5% of P7xtetO cells in S/G2/M and G2/M arrest. Ignoring these cases is reasonable because they do not carry over to cytoplasmic mRNA. They may represent an overabundance of aborted mRNA transcripts (Rondon et al., 2009) or a bottleneck of mRNA processing or export.

4.3.5 Reproducing cell-cycle kinetics with a model of S/G2/M and G1 stationary transcription dynamics

We use this model of transcriptional dynamics within G1 and S/G2/M to compare arrested expression to growing populations transitioning between cell-cycle stages. The two key differences in growing populations, which will be captured by introducing cell-cycle kinetics into the model, are that mRNA expression contains the history of previous cell-cycle stages and that cytoplasmic mRNA is delayed by mRNA processing and export. Cells switch between the simple Poisson dynamics and OFF states according to the likelihood of activity as they cycle between S/G2/M and G1. We make four assumptions: (1) We model activity state across the G1/S transition as memoryless, i.e.
all cells have equivalent likelihood of activation in S, regardless of G1 activity. (2) Activation occurs probabilistically at a single time-point at the beginning of S/G2. (3) Steady-state nuclear mRNA expression is established immediately, whereas cytoplasmic mRNA is produced by export of nuclear mRNA. (4) Cells retain a 50% binomial sample of their mRNA during mitosis and may turn OFF into G1, but not ON from an OFF state. Assumptions (2) and (3) are evaluated by comparing the model prediction to growing data. (The model is summarized in the diagram of Figure 4-9.) The model is limited by uncertainty in mRNA degradation. While degradation at short timescales follows simple first-order kinetics with a half-life of 20 minutes, there is a long tail of degradation better described by half-lives around 40 minutes (Appendix 6.3.1). We implement the model numerically as a finite Markov chain.

[Figure 4-9 diagram. S/G2/M: 1 or 2 loci are active, according to loci activation likelihood and mother/daughter correlation; nuclear mRNA is immediately at the steady-state distribution given by the Poisson production and degradation rates; cytoplasmic mRNA evolves by Poisson production and degradation (finite Markov method). Mitosis: nuclear and cytoplasmic mRNA is halved by binomial sampling. G1: active cells stay ON with the G1/G2 activity likelihood; nuclear mRNA remains at steady state; cytoplasmic mRNA continues to evolve with the new activity state. Simulate 3-4 cell cycles, until stationary.]
Figure 4-9: Diagram summarizing the model of cell-cycle dependent tetO transcription.

Figure 4-10 shows the results for each promoter at four activation levels, for nuclear and cytoplasmic mRNA. Expression distributions are simplified to quartiles. The model captures two qualitative aspects of growing populations very well, with surprisingly good quantitative agreement (Figure 4-10).
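The four assumptions and the finite Markov chain approach described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration and not the implementation used in this work: it follows a single mRNA pool at one locus (no separate nuclear and export steps, no second locus in S/G2/M), and the stage durations, activation and retention probabilities, steady-state mean, state-space truncation and time step are all assumptions chosen for illustration. Only the 20-minute half-life comes from the text.

```python
from math import comb, log

N_MAX = 60  # truncation of the mRNA-count state space (assumption)


def evolve(p, production, degradation, minutes, dt=0.1):
    """Explicit Euler integration of a truncated birth-death master equation."""
    last = len(p) - 1
    for _ in range(int(round(minutes / dt))):
        new = p[:]
        for m in range(len(p)):
            birth = production if m < last else 0.0  # no production out of the last state
            new[m] -= dt * (birth + degradation * m) * p[m]
            if m > 0:
                new[m] += dt * production * p[m - 1]
            if m < last:
                new[m] += dt * degradation * (m + 1) * p[m + 1]
        p = new
    return p


def halve(p):
    """Binomial partitioning at mitosis: each mRNA is retained with probability 0.5."""
    new = [0.0] * len(p)
    for m, pm in enumerate(p):
        for k in range(m + 1):
            new[k] += pm * comb(m, k) * 0.5 ** m
    return new


def mean(p):
    return sum(m * pm for m, pm in enumerate(p))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, c):
    return [c * x for x in a]


def simulate(p_activate=0.5, p_stay_on=0.5, steady_mean=7.0,
             half_life=20.0, t_sg2m=60.0, t_g1=30.0, cycles=4):
    """Return the mRNA distributions at the end of S/G2/M and at the end of G1."""
    degradation = log(2) / half_life
    production = steady_mean * degradation
    total = [1.0] + [0.0] * N_MAX  # start with no mRNA
    for _ in range(cycles):
        # Assumptions (1) and (2): memoryless activation at the start of S/G2.
        on = evolve(scale(total, p_activate), production, degradation, t_sg2m)
        off = evolve(scale(total, 1.0 - p_activate), 0.0, degradation, t_sg2m)
        late_sg2m = add(on, off)
        # Assumption (4): binomial halving; ON cells may turn OFF, OFF cells stay OFF.
        on, off = halve(on), halve(off)
        off = add(off, scale(on, 1.0 - p_stay_on))
        on = scale(on, p_stay_on)
        on = evolve(on, production, degradation, t_g1)
        off = evolve(off, 0.0, degradation, t_g1)
        total = add(on, off)
    return late_sg2m, total


if __name__ == "__main__":
    late, g1 = simulate()
    print("mean mRNA, late S/G2/M:", mean(late))
    print("mean mRNA, end of G1:  ", mean(g1))
```

Because the master equation is linear, the ON and OFF subpopulations can be evolved separately as unnormalized probability vectors and summed at the end of each stage, which keeps the bookkeeping of the activity state simple.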
First, the model indicates that cells growing in G1 have the same activity as cells in steady-state G1 arrest, with extra expression simply carried over from previous S/G2/M activity (Figure 4-10, blue). This suggests cells at low and intermediate expression turn OFF to G1 levels at or soon after the M/G1 transition. The range of uncertainty in mRNA degradation leaves open the possibility that cells turn OFF throughout G1, but if so it has little effect on expression. Second, the model captures the kinetics of nuclear and cytoplasmic mRNA within S/G2/M remarkably well (Figure 4-10, yellow-to-orange). Nuclear mRNA is approximately constant during S/G2/M, supporting the model assumption that it reaches steady-state expression immediately after activation. On the other hand, cytoplasmic mRNA usually rises from early, to mid, to late S/G2/M, in good agreement with the model, which assumed "production" at the rate of nuclear mRNA export into the pool of mRNA left over from G1. This supports the idea that activation occurs at the beginning of S/G2, rather than the alternative that cytoplasmic mRNA increases because activation occurs throughout S/G2/M.

[Figure 4-10 plots: panels (A) 1xtetO nuclear mRNA, (B) 1xtetO cytoplasmic mRNA, (C) 7xtetO nuclear mRNA and (D) 7xtetO cytoplasmic mRNA, each at 1%, 13%, 50% and 95% activator level. x-axis: cell-cycle stage (G1, S, G2, M). Legend: model median and quartile range; data median with 25th and 75th percentiles for G1 and for early, mid and late S/G2/M.]
Figure 4-10: The gene regulatory function predicts temporal expression in growing populations.
Growing data (colored) versus model (black/grey) prediction for (A) P1xtetO nuclear mRNA expression, (B) P1xtetO cytoplasmic mRNA expression, (C) P7xtetO nuclear mRNA expression, (D) P7xtetO cytoplasmic mRNA expression, each at four activator levels (1%, 13%, 50%, 95%). Quartiles (25%/50%/75%) of experimental data are shown as colored boxplots. The model median is a black line and the model quartiles are a grey shaded area. Nuclear mRNA reaches "steady-state" early in S/G2 and is fairly stable across S/G2/M (yellow-to-orange, A, C). The increase in cytoplasmic mRNA across S/G2/M is captured by the model (yellow-to-orange, B, D). Model prediction of G1 expression agrees with data within the uncertainty of mRNA degradation (blue).

4.3.6 Cell-cycle dependent transcription at the yeast PHO5 gene suggests generality among regulated genes in yeast
Our observations of cell-cycle driven fluctuations in transcription at the tetO promoter lead us to ask whether this is the case at other yeast genes. We choose to investigate expression from the native yeast PHO5 gene, which has been actively studied for its noise properties (Mao et al., 2010; Raser & O'Shea, 2004). Like tetO, expression from the PHO5 promoter (PPHO5) is titrated by changing the level of its activator, Pho4p. We observe the fraction of active cells and the cytoplasmic mRNA count of a growing population segregated into cell-cycle stages. We infer underlying G1 transcription rates using the mRNA degradation rate (Equation 4-5), an approach supported by the agreement of arrested and growing tetO expression data. Remarkably, PPHO5 transcription shares the same cell-cycle dependence as tetO: basal expression is restricted to S/G2/M, and G1 expression increases towards S/G2/M levels as activator level increases (Figure 4-11).
Unlike tetO, there is less separation between cytoplasmic mRNA levels in cells with and without nuclear mRNA (on average, OFF subpopulation mean expression is only 20% less than ON subpopulation mean expression, compared with 80% less for P7xtetO and P1xtetO). This suggests that switching at the PHO5 gene may occur on faster timescales than at tetO, independent of cell-cycle activation. The PHO5 gene system has been studied extensively, yet this trend has not previously been appreciated.

A 211tGI Mss = Mfci - A _ __ B__ -Et=1 (-,tc 2 4-5

[Figure 4-11 plots: panels (A) and (B). x-axis: activator level (0% to 75%). Series: growing late G2/M cells, growing G1 cells, and G1 inferred from growing data.]
Figure 4-11: Cell-cycle dependent transcription from the PHO5 promoter. Expression is measured in growing cells segregated into G1 (light blue) and late G2/M (orange) cell-cycle stages. Underlying G1 transcription is inferred and stationary expression is shown (blue). (A) Mean of cytoplasmic mRNA expression distributions in each cell-cycle condition for a titration of Pho4p activator levels. Like the tetO promoter, all basal expression occurs in G2. G1 expression occurs above an activation threshold, then approaches G2/M expression. (B) The ratio of cells active in G2/M versus G1 shows the same trends.

4.4 Discussion
4.4.1 Naive interpretation of expression distributions with the bursting transcription model
We have shown that cell-cycle and other extrinsic sources of noise dominate expression variability. It follows that assigning all noise to stochastic transcription must overestimate the noise, or "burstiness", of transcription at the gene loci. Figure 4-12A shows expression of a growing population, which is well described by the negative binomial distribution solution to the stationary "bursting" model. According to this model solution, the parameters of the negative binomial fit are the frequency and 1/(1+size) of bursts, respectively.
We fit and extract these parameters for each condition (Figure 4-12A is one example) and plot the parametric curve of burst size versus burst frequency in Figure 4-12B. All three promoters (1xtetO, 7xtetO, PHO5) show a biphasic trend in the "burst size", which apparently increases and then plateaus as activator level increases. The apparent burst size at P7xtetO is about 2-fold higher than at P1xtetO. (Note: This negative binomial fit is to mRNA count per cell, where some cells are in S/G2/M with two loci.)

[Figure 4-12 plots: (A) 1xtetO 5% expression distribution with negative binomial fit; (B) burst size versus burst frequency; (C) and (D) plotted against activation (0% to 200%).]
Figure 4-12: An explanation of the dynamics inferred by the transcriptional bursting model. (A) Growing (and arrested, not shown) distributions are well fit by the negative binomial distribution solution to the stationary bursting transcription model. (B) For each of the three genes studied here, the model interprets a biphasic mode of regulation, where activator increases expression first via burst size and then burst frequency (trend lines are added only to aid the eye). (C) The model predicts burst frequency increases linearly with activator level. (Pro is plotted against 2-fold activator level, such that Pj,(o and P,lo reduce to the same activation curve.) (D) The model predicts burst size increases and then plateaus. This trend is mimicked by the difference between G1 and S/G2/M activity, which peaks at the same point as apparent "burst size" (light grey dotted line). But arrested populations share this trend. The trend is actually caused by the variation between cells that activate or not in S/G2, which peaks when 50% of cells are OFF or ON (dark grey dotted line).
But expression clearly does not follow stationary transcriptional bursting, because expression is markedly different between cell-cycle stages. The burst size inferred here, of about 10 mRNA for P1xtetO and about 20 mRNA for P7xtetO, represents some combination of the cytoplasmic mRNA Fano factors of 3-4 and 5-7 observed for P1xtetO and P7xtetO active loci, respectively, plus superposition of inactive and active cells in G1 and G2. But the biphasic trend in the apparent "burst size" is interesting, and we consider its underlying cause (Figure 4-12D). One possibility is that at absolute basal levels, active transcription does not follow the dynamics of higher active levels. Our data cannot determine this because the active events are so rare and thus poorly sampled by the mRNA FISH technique. In theory, a high-throughput method for counting single mRNA molecules could determine if basal activity shares the expression distribution of higher levels. We hypothesized that the apparent "burst size", or growing population Fano factor, peaks at the biggest difference between the fraction of active cells in G1 and S/G2/M (Figure 4-12D, light grey). This combination of mostly-ON S/G2/M cells and mostly-OFF G1 cells might create the highest variability. This correlates well, but proves not to be the cause, partly because even if all cells are OFF in G1, they carry mRNA from the previous G2/M for much of G1. Instead, the biphasic trend in burst size appears to track the variability in S/G2 activation likelihood, such that the Fano factor (normalized variance) peaks when there is a 50% chance of a cell being OFF or ON (Figure 4-12D, dark grey; activity likelihoods from Figure 4-8D). Beyond this specific connection, the fact that the expression distribution fits the negative binomial solution, despite not adhering to the bursting model, is a cautionary example for using models to explain data.
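The cautionary point can be illustrated with a short calculation. The Python sketch below is a deliberately simplified assumption and not the model of this chapter: each cell is either OFF (no mRNA) or ON (Poisson with mean `mu_on`), with a single locus and no mRNA carried over between cell-cycle stages. It computes the mixture mean and variance analytically, then reads them as if they came from the stationary bursting model, whose negative binomial solution has mean = frequency x size and Fano factor = 1 + size. The function and parameter names are hypothetical.

```python
def mixture_moments(p_on, mu_on):
    """Mean and variance of a population that is ON (Poisson, mean mu_on)
    with probability p_on and OFF (zero mRNA) otherwise."""
    mean = p_on * mu_on
    # Law of total variance: within-state (Poisson) term + between-state (ON/OFF) term.
    variance = p_on * mu_on + p_on * (1.0 - p_on) * mu_on ** 2
    return mean, variance


def apparent_burst_parameters(mean, variance):
    """Method-of-moments reading of (mean, variance) under the bursting model:
    Fano factor = 1 + size and mean = frequency * size."""
    size = variance / mean - 1.0
    if size <= 0.0:
        raise ValueError("Poisson or sub-Poisson statistics: no apparent bursting")
    frequency = mean / size
    return frequency, size


if __name__ == "__main__":
    # mu_on = 7 is an illustrative value for an active single-operator promoter.
    for p_on in (0.1, 0.25, 0.5, 0.75, 0.9):
        mean, variance = mixture_moments(p_on, mu_on=7.0)
        frequency, size = apparent_burst_parameters(mean, variance)
        print(f"p_on={p_on:.2f}  apparent frequency={frequency:.2f}  apparent size={size:.2f}")
```

In this toy population the apparent burst size works out to (1 - p_on) * mu_on and the apparent burst frequency to p_on / (1 - p_on), although nothing in it bursts. It is not meant to reproduce the trends of Figure 4-12, which also involve two loci in S/G2/M and mRNA carried over between stages.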
4.4.2 Reconsidering cis and trans modes of regulating transcriptional dynamics in yeast
Our findings here give grounds to reconsider the previous understanding of how cis and trans elements modulate expression noise through transcriptional bursting. Results of studies varying cis and trans elements and inferring how they modulate dynamics were listed in Table 1-1.

Multiple activator binding sites are expected to increase burst size (Raser & O'Shea, 2004; To & Maheshri, 2010). In this study, adding multiple binding sites (P7xtetO versus P1xtetO) affected expression from an active promoter, apparently supporting multiple higher activity states that are long-lived compared to the mRNA lifetime. In this way, multiple binding sites increase expression noise for a given mean (which appears as increased "burst size") through the variability of expression at the promoter. The multiple binding sites also affect the mapping of activation likelihood to activator level, which may appear as modulation of the "burst frequency" at a given activator level. But when scaled by mean expression, we saw the relationship reduce to a single binding response curve, effectively "decoupling" the two effects of binding site number.

Binding site strength is also expected to affect burst size. It may also modulate the likelihood of activation in S/G2, via the same mechanism as multiple binding sites. But binding site strength likely also regulates the stability of the ON state, modulating its duration in a fairly close approximation of "burst size regulation".

TATA strength is known to modulate expression noise alongside the mean, also appearing as burst size regulation (Mogno et al., 2010; Raj et al., 2006; Raser & O'Shea, 2004). The CYC1 promoter within the tetO promoter used in this study has one strong and three weak TATA binding sites. But we observed Poisson-like expression at the active P1xtetO promoter. So how could ablation of the TATA further reduce noise? Our hypothesis is informed by T.L.
To's previous measurement in our lab that a growing population expressing from P1xtetO with an ablated TATA site has an apparent burst size of 1-2 and a frequency changing with activator level. This low Fano factor may derive from activation with similar likelihoods but a much lower productivity in the active state (i.e. ~1 mRNA produced per mRNA lifetime instead of the 7 mRNA measured here). This would mean that ON cells are not highly differentiated from OFF cells, keeping the apparent burst size low. The alternative scenario of cells activating with less likelihood in S/G2, or cells activating but then turning OFF very soon after, would be expected to result in a growing population with a higher Fano factor. But the actuality may be a combination of both scenarios.

Chromatin remodeling prior to activation of transcription is also expected to increase burst size. It appears that once chromatin is removed, the activator can stay primed for multiple rounds of reinitiation in one "burst" of transcription. In this study, we investigated cell-cycle as a global regulator of transcription at the PHO5 promoter. The inactive PHO5 gene is occluded by three well-positioned nucleosomes, covering the TATA box and a high-affinity Pho4p binding site; a low-affinity Pho4p binding site is exposed between the nucleosomes (Vogel, Horz & Hinnen, 1989) (Figure 4-13). Pho4p binds to the exposed low-affinity binding site, which instigates chromatin remodeling and disassembly (Svaren & Horz, 1997), which in turn exposes the high-affinity binding site and TATA box and enables active transcription.

[Figure 4-13 diagram: the PHO5 promoter with nucleosomes and activator binding sites.]
Figure 4-13: The PHO5 gene. One strong activator binding site ('H') is occluded by a well-positioned nucleosome; one weak activator binding site ('L') is exposed. Pho4p binds the exposed site to induce chromatin remodeling, allowing Pho4p binding of the high-affinity site, then activation of PHO5 expression.
We saw basal transcription restricted to S/G2 and increasing activation in S/G2/M, and then in G1, in response to increasing activator levels. But compared to the tetO promoter, where chromatin remodeling plays a minor role, expression in active and inactive states was less separated. This suggests that the promoter transitioned more quickly between a state producing mRNA, resulting in a visible nuclear spot, and a dormant state. This may indicate multiple physical states with varying activity and chromatin occupancy. For example, the S/G2 window for activation may play the same role for PPHO5 as for P1/7xtetO, enabling initial activation. But throughout the cell-cycle, an activated promoter may fluctuate between states with one or more nucleosomes bound, creating shorter-term fluctuations in transcription.

Others have studied expression noise at the PHO5 gene in response to a range of Pho4p activator mutants (Mao et al., 2010). They compared three models of activation and concluded that Pho4p controls the rate of nucleosome disassembly and then assembly of transcription machinery. These data may well be confounded by the cell-cycle effects observed here. It may be that Pho4p controls the likelihood of activation (during nucleosome disassembly in S/G2) and also then controls the rate of fluctuations between the nucleosome-free states throughout the cell-cycle.

Other genes in the PHO family have different promoter architectures. Promoters less occluded by chromatin may mimic the tetO gene in the dominant separation of ON/OFF states. Promoters occluded by chromatin but with only weak activator binding sites may have only a short-lived period of activity following S/G2 activation. This could be easily tested at a family of PHO5 variants with our method and would give interesting insight into chromatin regulation dynamics.

The role of activator level or function was least clear in Table 1-1.
Activator appears to predominantly regulate burst frequency, but also burst size in some cases. Here, we saw that activator directly regulated the likelihood of activation in S/G2, which may correspond to apparent burst frequency, and is in agreement with the basic understanding of chemical kinetics. But it also appears to regulate the stability of the active complex across the M/G1 transition and during extended periods of G2. This could be tantamount to "burst size" regulation because it modulates the amount of mRNA produced in a single active period.

One unresolved question about our observations at the tetO promoter, common to any gene that follows a complicated scheme of activation, is how the overall response of mean expression to activator level is linear. This seems surprising, given that the S/G2/M activation curve increases rapidly before plateauing and G1 transcription occurs only above some activator threshold. It appears that these balance to give an overall linear response curve. But whether a linear response is inherent to the mechanism of activation or a coincidence remains to be answered.

4.4.3 Predictions about cell-cycle as a global transcriptional regulator in other organisms
Cell-cycle may prove to act as a global transcriptional regulator, to some extent, at all yeast genes. But key differences between yeast and other organisms create doubt as to whether this is so in other organisms. Nonetheless, our findings suggest other effects of cell-cycle or chromatin remodeling in lower and higher organisms. Bacteria do not share the same chromatin DNA packaging as eukaryotes, but bursting has still been identified at some genes (Golding et al., 2005; So et al., 2011). This suggests other factors regulate promoter activity state, which may or may not involve the cell-cycle. But E. coli cells in fast-growing conditions actually contain multiple genome copies, in preparation for upcoming rounds of division.
Thus expression must be the average of transcription at each copy of a gene. This should be taken into account when inferring dynamics from expression noise, because the apparent, averaged size of transcriptional bursts would dampen true burstiness or switching dynamics. Mammalian cell cycles (10-20 hours) are slow compared with those of yeast cultures, and mammalian cells spend a relatively small fraction of time in S/G2. Hence S/G2/M activity will make a relatively small contribution to real-time expression or expression across a population, and most likely doesn't substantially affect noise. But if the S/G2 window of activation observed here is indeed linked to post-replication chromatin remodeling, there are implications for mammalian cells. In this case, cell processes that govern chromatin remodeling events may underlie all bursting transcription in mammalian cells. Also, cell volume alone is proving to play a dominant role in global transcription activity at constitutive genes (Raj, 2013, data not yet published). But what dominates variability in expression of regulated genes under repressed conditions remains to be seen.

4.5 Conclusions
We found that tetO transcription dynamics are characterized by globally-correlated, probabilistic S/G2 activation controlled by activator binding to its operator, and active transcription with surprisingly low noise depending on promoter architecture. Beyond the results presented here, these findings motivate revisiting studies that ascribe all noise to stochastic transcription. They suggest a causative link between DNA replication and transcription under repressed conditions, with implications for biologically relevant cases of fast-growing cells, such as cancer and development. A gene with cell-cycle dependent transcription will perform differently in many network topologies, which should be considered in synthetic biology design.
Together, this presents a method for determining transcriptional dynamics at a single promoter and quantifies the role of cell-cycle as a general regulator and source of noise in gene expression.

4.6 References
Cai, L., Friedman, N., & Xie, X. S. (2006). Stochastic protein expression in individual cells at the single molecule level. Nature, 440(7082), 358-362.
Chubb, J. R., Trcek, T., Shenoy, S. M., & Singer, R. H. (2006). Transcriptional pulsing of a developmental gene. Current Biology, 16(10), 1018-1025.
Friedman, N., Cai, L., & Xie, X. S. (2006). Linking stochastic dynamics to population distribution: An analytical framework of gene expression. Physical Review Letters, 97(16), 168302.
Golding, I., Paulsson, J., Zawilski, S. M., & Cox, E. C. (2005). Real-time kinetics of gene activity in individual bacteria. Cell, 123(6), 1025-1036.
Hilfinger, A., & Paulsson, J. (2011). Separating intrinsic from extrinsic fluctuations in dynamic biological systems. Proceedings of the National Academy of Sciences, 108(29), 12167-12172.
Mao, C., Brown, C. R., Falkovskaia, E., Dong, S., Hrabeta-Robinson, E., Wenger, L., & Boeger, H. (2010). Quantitative analysis of the transcription control mechanism. Mol Syst Biol, 6
Mogno, I., Vallania, F., Mitra, R. D., & Cohen, B. A. (2010). TATA is a modular component of synthetic promoters. Genome Research, 20(10), 1391-1397.
Raj, A., Peskin, C., Tranchina, D., Vargas, D., & Tyagi, S. (2006). Stochastic mRNA synthesis in mammalian cells. PLoS Biol, 4(10), e309-e309.
Raj, A., & van Oudenaarden, A. (2009). Single-molecule approaches to stochastic gene expression. Annual Review of Biophysics, 38(1), 255-270.
Raser, J. M., & O'Shea, E. K. (2004). Control of stochasticity in eukaryotic gene expression. Science, 304(5678), 1811-1814.
Rondon, A. G., Mischo, H. E., Kawauchi, J., & Proudfoot, N. J. (2009). Fail-safe transcriptional termination for protein-coding genes in S. cerevisiae. Molecular Cell, 36(1), 88-98.
So, L., Ghosh, A., Zong, C., Sepulveda, L. A., Segev, R., & Golding, I. (2011). General properties of transcriptional time series in Escherichia coli. Nature Genetics, 43(6), 554-560.
Suter, D. M., Molina, N., Gatfield, D., Schneider, K., Schibler, U., & Naef, F. (2011). Mammalian genes are transcribed with widely different bursting kinetics. Science, 332(6028), 472-474.
Svaren, J., & Horz, W. (1997). Transcription factors vs nucleosomes: Regulation of the PHO5 promoter in yeast. Trends in Biochemical Sciences, 22(3), 93-97.
Taniguchi, Y., Choi, P. J., Li, G., Chen, H., Babu, M., Hearn, J., Xie, X. S. (2010). Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science, 329(5991), 533-538.
To, T., & Maheshri, N. (2010). Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science, 327(5969), 1142-1145.
Vogel, K., Horz, W., & Hinnen, A. (1989). The two positively acting regulatory proteins PHO2 and PHO4 physically interact with PHO5 upstream activation regions. Molecular and Cellular Biology, 9(5), 2050-2057.
Zopf, C. J., Quinn, K., Zeidman, J., & Maheshri, N. (2013). Cell-cycle dependence of transcription dominates noise in gene expression. PLoS Comput Biol, 9(7), e1003161.

CHAPTER 5. Future directions
5.1 Synopsis
This thesis characterizes how cell cycle influences the transcriptional dynamics of promoters not previously associated with cell-cycle dependent control in budding yeast. It precisely and quantitatively defines the extent to which this influence contributes to the noise in gene expression. While the work has been limited to two promoters, and to budding yeast, it leads to the exciting possibility that these particular cell-cycle dependent transcriptional dynamics operate for the 20-30% of genes characterized as "noisy" in S. cerevisiae (Bar-Even et al., 2006).
Additional important questions remain regarding the origin, mechanism, generality and consequences of cell-cycle dependent transcription. We discuss our hypotheses pertaining to these questions, and identify existing and new approaches for testing these hypotheses.

5.2 The origin of the S/G2 window for transcriptional activation
We have observed that low expressing PHO5 and tetO promoters activate only in early S/G2 (Chapter 3). Given the importance of chromatin remodeling for efficient activation of yeast promoters, we have hypothesized that chromatin maturation following DNA replication leaves otherwise chromatin-repressed genes open for activation. The hypothesis is informed by current understanding of chromatin maturation. During replication, old histones are recycled onto the newly replicated DNA, apparently with random segregation of groups of histones to leading and lagging strands. (Our mRNA FISH can detect no evidence for or against bias towards mother or daughter turning on; however, it is also believed that chromosomes segregate between mother and daughter cells randomly (Keyes et al., 2012).) Histone recycling is likely to involve the Mcm2p helicase, which is bound to the replicating strands and also binds free histones, acting as a transient docking site. But equal numbers of histones must be newly synthesized and assembled on the nascent DNA. Production of histones is rapid but tightly regulated. These are then shuttled to the nucleus via sequential association with histone chaperones, undergoing post-translational modifications, to the chromatin assembly factor CAF1 that mediates replication-coupled histone deposition (Alabert & Groth, 2012; Annunziato, 2012). The nascent chromatin is highly acetylated and thus open and sensitive to nuclease digestion, potentially creating a 'window of opportunity' for not only DNA repair, but also transcription factor binding and transcriptional activation. We theorize that this is the underlying cause of replication-linked transcription.
Maturation of nuclease-sensitive nascent chromatin to nuclease-resistant chromatin similar to bulk chromatin in interphase takes 10 to 20 minutes (Alabert & Groth, 2012). During this time, it undergoes rapid processing by chromatin modifying and remodeling agents, often guided by interactions with the replication machinery, into a more compact state. The PCNA clamp recruits several of these chromatin modifiers and is positioned to integrate chromatin assembly and maturation with replication and fork repair. PCNA has been measured to be highly stable, remaining on replicated DNA for up to 20 minutes (Moggs et al., 2000). One hypothesis for our observation that sometimes only one strand of newly replicated DNA is ON (based on nuclear spots from mRNA FISH) is that discontinuous DNA synthesis on the lagging strand creates asymmetry in PCNA activity and thus in chromatin maturation. However, the positive correlation in activation of a mother and daughter chromosome suggests this effect is small, if present at all, and that there are similar chances for activation of the chromosomes made from the leading and lagging strands. At any rate, chromatin maturation primarily involves histone deacetylation and linker histone H1 binding, and thus may be disrupted or slowed by short-term treatment with histone deacetylase (HDAC) inhibitors. (This brief summary of chromatin maturation draws from reviews by Alabert & Groth (2012) and Annunziato (2012).)

Hence we hypothesize that DNA replication and associated chromatin maturation lead to the window for transcription from otherwise repressed genes in early S/G2. Ideally, we would test this hypothesis directly on a non-replicating promoter. Comparing expression from a replicating and a non-replicating promoter should establish whether replication is necessary for repressed transcription.
This could involve a non-replicating promoter expressing a reporter from a centromeric plasmid whose replication origin is flanked by loxP sequences, which can be excised to remove the replication sequence and create a stable non-replicating plasmid.

Nonetheless, a more complete picture will come from knowing which factors are involved. Our hypothesis predicts that removing factors involved in chromatin maturation will reduce or enhance the cell-cycle dependent transcription. For example, addition of HDAC inhibitors or knockout of HDAC genes should delay chromatin maturation and extend the window for activation, perhaps enhancing expression activity from repressed genes. A set of candidate genes whose deletion might affect cell-cycle dependent transcription is shown in Table 5-1. mRNA FISH on arrested cells is the most efficient, reliable and direct method to screen the knockouts for changes in cell-cycle dependent transcription. Each knockout will be screened under G1 (alpha-factor) and G2/M (nocodazole) arrest at high and low expression levels. The high expression case serves as a control for global changes in expression; the low expression case is the test of whether the knockout gene is involved in the window of opportunity for repressed transcription. A caveat is that chromatin regulation is so central to eukaryotic cell function that knockouts are likely to change global expression levels. In this case, the test result would be only a relative and not an absolute change in expression levels.
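One way to read such a "relative" result is to normalize the knockout's change at low expression by its change at high expression. The Python sketch below is only an assumption about how that comparison could be computed, not a procedure specified in this work; the function name and the example numbers are hypothetical placeholders, not measurements.

```python
def relative_change(ko_low, wt_low, ko_high, wt_high):
    """Fold change of the knockout at low expression, divided by its fold
    change at high expression (the control for global expression changes)."""
    if min(ko_low, wt_low, ko_high, wt_high) <= 0:
        raise ValueError("all mean mRNA counts must be positive")
    return (ko_low / wt_low) / (ko_high / wt_high)


if __name__ == "__main__":
    # Hypothetical mean mRNA counts per cell under one arrest condition,
    # for illustration only.
    ratio = relative_change(ko_low=2.0, wt_low=1.0, ko_high=30.0, wt_high=20.0)
    print(f"relative change at low expression: {ratio:.2f}")
```

A value near 1 would indicate that the knockout changes low and high expression by the same factor, i.e. a global effect only; values above or below 1 would point to a specific effect on repressed transcription. The same ratio could be computed separately for G1 and G2/M arrest.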
Table 5-1: A list of gene knockouts for studying the origin of cell-cycle dependent transcription
Function | Details | Gene name
Chromatin remodeling | SWI/SNF subunits, involved in DNA replication and transcription | SNF2, RTT102
Chromatin assembly factor | CAF-1, involved in chromatin dynamics during transcription | CAC2, RLF2, MSI1
Histone chaperone | | RTT106, VPS75
Nucleosome spacing factor | | INO80
Nucleosome assembly factor | | ASF-1
Histone acetyltransferase | SAGA complex units, global regulator | GCN5, HF1, SPT3
 | ADA complex units | (GCN5,) AHC2
 | HAT complex | HIF1, HAT1
 | | RTT109
 | NuA4 complex | EAF1
 | SAS complex, acetylates free histones | SAS4
Histone deacetylase | Rpd3L/Rpd3S complex units | PHO2, CTI1, SIN, RPD3, HOS2
Histone methyltransferase | COMPASS units | BRE2, SET1, SET2
Cell cycle progression | Promotes G2 to M transition | CLB5
(Cherry et al., 2012, Saccharomyces Genome Database)

5.3 The complete mechanism of cell-cycle regulation of transcription
Our experiments and current literature support this picture of cell-cycle regulated activation: A gene locus may turn ON in S/G2, with likelihood increasing with activator level. Activation is somewhat correlated at the cell level (i.e. between homologous loci of a diploid) and between mother and daughter loci. The activation state is maintained through G2 and M, though its stability is limiting on longer timescales such that it may turn OFF during extended G2/M, depending on activator levels. There is a stronger activator-dependent likelihood of turning OFF over the M/G1 transition, and the new activation state is maintained through G1. There is no indication for or against memory across the G1/S transition, which would make S/G2 activation dependent on G1 state. Our quantitative analysis was enabled by the resolution of single-molecule nuclear and cytoplasmic mRNA FISH under arrest and (CJ Zopf's) real-time protein tracking of steady-state expression and activation kinetics. But still many aspects are uncertain and others were unobservable.
Completing this picture is necessary for full understanding of the cell cycle's role in transcriptional activation, and I present suggestions for it here, summarized in Figure 5-1.

[Figure 5-1 diagram of the cell cycle (G1, S, G2, M). Results: activator-dependent S/G2 activation; delay in cytoplasmic mRNA due to export; activator-dependent inactivation in extended G2; activator-dependent inactivation at M/G1. Future questions: Mother/daughter activation bias? Can activation occur outside S/G2? Is correlation global or at the replicating chromosome? What is the timing of inactivation? Is history of activation carried into S?]

Figure 5-1: Summary of results and open questions about the cell-cycle transcription pattern

Starting with S/G2 activation, is there a preference for activation of the mother gene copy on the leading strand versus the nascent daughter on the lagging strand? This would inform us of the mechanism of chromatin maturation following DNA replication. Because mother/daughter chromosomes are randomly segregated to mother/daughter cells in mitosis, this will require identifying which strand(s) are ON at the time of replication and activation. This could involve direct visualization with single-molecule techniques, or ChIP or fluorescence techniques to visualize key proteins at the replication and transcription site.

The correlation in activation of homologous loci in a diploid cell and between a mother and daughter chromosome following replication is similar but the source is unknown. Do both simply arise from global cell transcriptional capacity or activator level? The first test for this is to hybridize probes to the activator simultaneously with the reporter and look for correlated levels. Gene-independent, cell-wide expression correlation should then be probed at other constitutive and regulated genes, in addition to mining already-available genome-wide expression correlation data.

The S/G2 period appears to dominate activation, but can activation occur outside of it?
Could a highly-expressed or moderately-expressed gene activate in late G2/M or G1? This should be tested in slow growth conditions where cells spend most of their cycle in G1. Measurements are taken before and after gene induction to test for G1 activation. A caveat of any study in slow growth conditions is that these lead to global changes in expression which can cloud all results. This experiment may require several controls on global expression or selection of a gene unaffected by growth conditions.

Further probing G1 activity, it is unclear how G2/M activation levels transition to G1. Data indicate that cells may turn OFF over M/G1, but do not (or very rarely) turn ON. But when does the inactivation occur and on what does it depend? For a gene promoter with a range of activity levels when ON, such as P7xtetO, are less active genes more likely to turn OFF? Further, is there an increase in these genes' activity prior to mitosis in anticipation of dormant transcription during division, as has been identified at other genes? These answers require advanced experimental techniques obtaining real-time data at mRNA-level for the necessary resolution. The MS2 variant of mRNA FISH, where a gene is engineered to bind multiple fluorescently-tagged MS2 proteins in vivo, may be best, so long as the bulk of the engineered and tagged mRNA does not change activation dynamics.

Finally, we know little about the G1/S transition. Here too, real-time, precise mRNA levels, perhaps obtainable with the MS2 method, or real-time visualization of a gene's activation state, with fluorescently labelled transcription machinery, will answer whether S/G2 activation carries G1 history.

5.4 Prevalence of cell-cycle driven transcription

While the majority of work in this thesis was focused on synthetic tetO promoters, we observed similar cell-cycle dependent expression at the native PHO5 gene. A key feature of this was that, at low expression, there was no transcriptional activity in G1.
Moreover, related work demonstrates that kinetic activation of the PHO5 promoter upon introduction of its upstream transcription factor is strongly biased to occur in S/G2 (Zopf, Wren, Maheshri, unpublished). We have suggested that this phenomenon may at least be present at the highly regulable, noisy, TATA-box containing class of yeast promoters previously identified (Newman et al., 2006). This could be tested by screening a set of well-studied regulated yeast promoters. The most basic experiment will be measuring gene expression at high and low expression under G2/M and G1 arrest. Ideally this could be done with RNA-seq, to measure all genes' expression at once. But RNA-seq may not give clear results because expression is normalized against total RNA levels. Experiments on one gene at a time could first screen at a bulk population level, rather than single cell, to test for a much higher fold-change in G2:G1 expression at low versus high expression.

We do expect differences in cell-cycle dependent activation that depend on the nature of the gene's promoter. An immediate example is that expression from PHO5 has much less separation between cytoplasmic expression in cells with and without nuclear mRNA. This is consistent with a less stable ON state, or switching between activity states. One can imagine that a promoter with a complex sequence of initiation, involving removal of multiple nucleosomes, could have multiple slow steps for initiation, only some of which may be dictated by the cell cycle. The PHO5 system would be an excellent case study, given the well-known characteristics of the promoter variants. More generally, this approach will reveal elements of promoter architectures enriched among genes identified as subject to cell-cycle driven transcription, or those that are not.
Finally, observing the nature and effects of cell-cycle regulated activation in several natural genes will allow us to answer whether it has effects at the phenotypic level in biologically relevant regimes.

5.5 Towards a generalized, predictive model for stochastic transcription dynamics

An eventual goal of biological research is to construct models that enable a priori prediction of cell- and tissue-level behavior of genes and gene networks. Models such as the regulatory function developed here for a single gene are a small step to that eventual goal. A next step is to extend our gene regulatory function beyond its current domain, by measuring expression under different cis and trans regulatory conditions. This work has established the requirement for gene regulatory information for transcription with any cell-cycle dependence: mRNA expression data under cell-cycle arrest. Here we consider interesting directions to extend the domain of the gene regulatory function.

Additional binding sites seem to act like multiple activity states, causing higher ON-state variability. To what extent does this trend continue with more binding sites and when does it saturate? The TATA box is another cis element important to transcriptional dynamics. T.L. To measured expression from P1xtetO with an ablated TATA box in our lab and saw a reduction in population-wide noise, as have others at other genes (Hornung et al., 2012; Mogno et al., 2010; Raser & O'Shea, 2004). Seeing that the P1xtetO promoter has minimal noise when ON, a likely hypothesis is that the TATA box is necessary for stability to maintain the active state through the cell-cycle stage. This could be tested on several variants of TATA box strength.

The case of higher expression through activation is particularly interesting. Activation at 100% in this study refers to maximum induction of tTA when expressed from the ADH1 gene's promoter on a centromeric plasmid.
But tTA levels can be increased further using other promoters or by placing it under positive autoregulation. Our lab has previously measured that maximum activation gives expression 2-fold higher for P7xtetO and 4-fold higher for P1xtetO than the level of activation assigned to 100% in Chapter 4 (To & Maheshri, 2010). But our gene regulatory function allows only 60% and 20% further increase in expression levels, through saturating the ON state. So where does the remaining expression increase come from? One possibility is that further increase in tTA levels increases ON state activity. Yet ON state activity is particularly unaffected by activator level for 1x. Therefore, what is the tTA level at which ON state activity increases, and does this occur after the probability of being in the ON state in any cell-cycle stage reaches 100%? Such questions could be answered by measuring expression distributions in arrested cells containing higher amounts of tTA, but caution is warranted as too high levels of tTA are toxic to cells and have global effects on expression.

Repeating work suggested here and done earlier in the thesis on several gene classes' promoter variants should form a foundation for generalized models of regulated stochastic transcription dynamics and progress towards a priori design of gene networks.

5.6 Consequences of cell-cycle driven transcription in gene networks

As described in Chapters 1 & 2, noise in expression has consequences for behavior in gene networks, and how an activator regulates transcriptional dynamics can affect the network behavior. Most studies to date have focused on the consequences of variability from transcription assuming bursting dynamics. This thesis work now leads to a whole new set of questions for how genes with variable transcriptional activity driven by the cell cycle function within gene networks of different network topologies.
Our work in determining a quantitative gene regulatory function for the tetO promoter in Chapter 4 enables the development and exploration of physically realistic mathematical models in which genes exhibit stochastic transcription with cell-cycle dependent dynamics of the type studied here. We describe some general consequences of adopting this new view of stochastic gene expression. Straightforward computational and theoretical approaches can confirm and expand upon these ideas. First, we predict more ordered and correlated fluctuations. At minimum, the fact that each cell in S/G2/M has two homologous loci contributing to a gene's expression doubles the apparent burst frequency, dampening fluctuations. On the other hand, we note that expression between homologous loci, and perhaps all genes, within a cell is correlated. In this case, fluctuations would not be dampened, but entire network activity may fluctuate together within a cell. On a single-cell basis over time, these fluctuations may be more cyclical and ordered because they are dictated by the periodic cell cycle, rather than independent, stochastic promoter events. However, just as for fluctuations from transcriptional bursting, network activity occurs on the protein level so we must consider whether cell-cycle driven fluctuations will be averaged away by delays between mRNA export, translation, protein relocation to the nucleus and protein lifetime. Any protein-level fluctuations are more likely for transcription factor proteins with short lifetimes, which is often true for signaling proteins. Whichever the case, significant fluctuations at the protein level are more likely with slower cell cycling, which could make network behavior dependent on cell growth and metabolism. It will be interesting to explore these general effects, through experiments measuring networked genes' mRNA and protein fluctuations in a range of growth scenarios, and models to capture dynamics. 
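The dampening argument above can be made explicit with a short calculation. This is only a sketch under assumptions not stated in the text: the two homologous loci contribute mRNA counts X_1 and X_2 with the same mean \mu and variance \sigma^2, and \rho (a symbol introduced here for illustration) is the correlation between them.

```latex
\begin{align}
X &= X_1 + X_2, \qquad \mathbb{E}[X] = 2\mu, \\
\mathrm{Var}(X) &= 2\sigma^2 (1 + \rho), \\
\mathrm{CV}^2(X) &= \frac{2\sigma^2 (1 + \rho)}{4\mu^2}
                  = \frac{1 + \rho}{2}\,\frac{\sigma^2}{\mu^2}.
\end{align}
```

For independent loci (\rho = 0) the squared coefficient of variation is halved relative to a single locus, which is the dampening case; for fully correlated loci (\rho = 1) it is unchanged, which is the case where fluctuations are not dampened.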
Within a positive feedback loop stochastic fluctuations have previously been predicted and demonstrated to lead to bimodal expression distributions even without deterministic bistability (e.g. To & Maheshri, 2010), theoretically using models assuming transcriptional bursting and experimentally using the exact same tetO promoters studied here. Reevaluating these results in light of this thesis work, it is first worth considering whether the pattern of cell-cycle driven transcription changes underlying deterministic stability. Assuming transcriptional bursting, bimodality was understood to arise from two relatively stable states: OFF cells with low activator levels resulting in a low burst frequency and ON cells with high activator levels resulting in a high burst frequency. Transitions could occur from OFF to ON cells if a rare large burst resulted in enough activator molecules to drive further expression to the ON state. Transitions could occur from ON to OFF provided activators present in the ON state could all degrade before the next burst. Hence, if an activator's lifetime was on a timescale similar to the burst frequency in the ON state, bimodality was possible.

Noise-stabilized bimodality could arise from cell-cycle driven transitions with a different pair of stable states: decay of expression during G1 to an OFF state such that activation in S/G2 is unlikely and cells may remain OFF for one or more cell cycles, and cells that transition ON in S/G2 that quickly generate high expression causing reactivation in subsequent cycles. Less clear is how cells transition from the ON to the OFF state, perhaps through higher-than-average stochastic degradation in G1. But multiple operator sites' varied expression levels could also result in stochastic fluctuations that permit a gene to enter the OFF state.
Full stochastic models, based on the model developed here for stationary transcription dynamics at the tetO promoter but also incorporating delay in relocation of the processed transcription factor to the nucleus, will provide further insights. Ultimately, experimental results for the behavior of positive feedback in each cell-cycle stage will explain how bimodal gene expression arises in light of the cell-cycle driven noise we have measured here.

5.7 Future techniques to observe transcription with single-molecule precision in real-time

Many of the challenges of this work involved inferring transcription from static snapshots of its mRNA product. Though it is possible to see individual mRNA in real-time (reviewed in Darzacq et al., 2009), it so far requires engineering the sequence of the mRNA to bind multiple MS2 bacteriophage coat proteins attached to fluorescent probes. The method has been successful in several studies and, recently, was used to quantify transcription rates at any gene in a mammalian cell (Yunger et al., 2013). But it requires engineering of the gene of interest and the bulky probes may affect mRNA dynamics.

A new generation of methods is seeking to view single-molecule transcription, but so far only in vitro. Revyakin et al. (2012) tethered the sequence of a mammalian gene to a surface and used advanced optics to visualize basal transcription by transcriptional machinery purified from cell culture. They were able to directly visualize TF assembly and individual transcription events. Purification and addition of cofactors and mediator would enable similar study of activated transcription. The same group also developed an interesting method of in vitro FISH, where probes bind very quickly because they contain only A, T and C nucleotides and so fold poorly. Transcription processes can therefore be measured at sub-second temporal resolution.
They used this "fastFISH" method to visualize the fast T7 bacteriophage's rate of promoter escape, elongation and termination of transcription (Zhang et al., 2014).

Given how important single-cell, single-molecule techniques have been to the study of transcriptional dynamics so far, it seems inevitable that methods visualizing multiple species of molecules in real-time will pave the way for a new generation of studies of transcriptional dynamics.

5.8 References

Alabert, C., & Groth, A. (2012). Chromatin replication and epigenome maintenance. Nature Reviews Molecular Cell Biology, 13(3), 153-167.

Annunziato, A. T. (2012). Assembling chromatin: The long and winding road. Biochimica Et Biophysica Acta (BBA)-Gene Regulatory Mechanisms, 1819(3), 196-210.

Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., & Barkai, N. (2006). Noise in protein expression scales with natural protein abundance. Nature Genetics, 38(6), 636-643.

Cherry, J. M., Hong, E. L., Amundsen, C., Balakrishnan, R., Binkley, G., Chan, E. T., ... Wong, E. D. (2012). Saccharomyces genome database: The genomics resource of budding yeast. Nucleic Acids Research, 40, D700-D705.

Darzacq, X., Yao, J., Larson, D. R., Causse, S. Z., Bosanac, L., de Turris, V., ... Singer, R. H. (2009). Imaging transcription in living cells. Annual Review of Biophysics, 38(1), 173-196.

Hornung, G., Bar-Ziv, R., Rosin, D., Tokuriki, N., Tawfik, D. S., Oren, M., & Barkai, N. (2012). Noise-mean relationship in mutated promoters. Genome Research, 22(12), 2409-2417.

Keyes, B. E., Sykes, K. D., Remington, C. E., & Burke, D. J. (2012). Sister chromatids segregate at mitosis without mother-daughter bias in Saccharomyces cerevisiae. Genetics, 192(4), 1553-1557.

Moggs, J. G., Grandi, P., Quivy, J., Jónsson, Z. O., Hübscher, U., Becker, P. B., & Almouzni, G. (2000). A CAF-1-PCNA-mediated chromatin assembly pathway triggered by sensing DNA damage. Molecular and Cellular Biology, 20(4), 1206-1218.
Mogno, I., Vallania, F., Mitra, R. D., & Cohen, B. A. (2010). TATA is a modular component of synthetic promoters. Genome Research, 20(10), 1391-1397.

Revyakin, A., Zhang, Z., Coleman, R. A., Li, Y., Inouye, C., Lucas, J. K., ... Tjian, R. (2012). Transcription initiation by human RNA polymerase II visualized at single-molecule resolution. Genes & Development, 26(15), 1691-1702.

Yunger, S., Rosenfeld, L., Garini, Y., & Shav-Tal, Y. (2013). Quantifying the transcriptional output of single alleles in single living mammalian cells. Nature Protocols, 8(2), 393-408.

Zhang, Z., Revyakin, A., Grimm, J. B., Lavis, L. D., Tjian, R., & Kadonaga, J. T. (2014). Single-molecule tracking of the transcription cycle by sub-second RNA detection. eLife, 3, e01775.

CHAPTER 6. Appendix

6.1 Yeast strains and plasmids

Strain and plasmid construction. All S. cerevisiae strains were constructed in a W303 background (Thomas & Rothstein, 1989) using standard methods of yeast molecular biology (Guthrie & Fink, 2002).

Table 6-1: Yeast strains used in this study

Strain | Relevant genotype | Parent strain | Reference
--- | --- | --- | ---
Y1 | MATa trp1-1 can1-100 leu2-3,112 his3-11,15 ura3 GAL+ ADE+ | W303 | Lab collection
Y6 | MATα ade2-1 trp1-1 can1-100 leu2-3,112 his3-11,15 ura3 GAL+ | W303 | Lab collection
Y139 | MATa his3::P1xtetO-vYFP-HIS3 | Y1 + B163 | Lab collection
Y163 | MATa his3::P7xtetO-vYFP-HIS3 | Y1 + B165 | Lab collection
Y236 | MATa/α leu2::PPGK-RFP ADE2::PMYO2-tTA ura3/ura3::PtetO-CFP-kanR/PtetO-vYFP-kanR | Y231 x Y216 | This study
Y532 | MATa doa1::vYFP + pRS316-DOA1 | Y1 + B858 | This study
Y955 | MATα his3::P1xtetO-tdTomato-HIS3 | Y2 + B229 | This study
Y1011 | MATa/α his3/his3::P1xtetO-vYFP-HIS3/P1xtetO-tdTomato-HIS3 | Y139 x Y955 | This study
EY2436 | pho5Δ::CFP-KANMX6 | W303 | Gift from E. O'Shea

Table 6-2: Plasmids used in strain construction

Plasmid | Base vector | Relevant gene | Construction information
--- | --- | --- | ---
B163 | pRS303 | P1xtetO-vYFP | Lab collection
B165 | pRS303 | P7xtetO-vYFP | Lab collection
B228 | pCM189 | PADH1-tTA | Lab collection
B229 | pRS303 | P1xtetO-tdTomato | Lab collection
B858 | pRS316 | PDOA1-DOA1 | PCRed DOA1 locus from Y1 (BamHI/NotI) and ligated into pRS316

6.2 Protocols

6.2.1 Growth & arrest protocols

Yeast cultures were grown at 30 °C in test tubes in synthetic minimal medium supplemented with 2% glucose and amino acids. A fresh culture, grown for 4-6 hours off a petri dish, was diluted to A600nm = 0.0003-0.002 and induced with doxycycline (Sigma-Aldrich) such that overnight culture of at least 14 hours reached A600nm = 0.1-0.5. This was steady-state expression (7 or more doublings). Higher densities appeared to affect expression. 3 mL was sufficient for each mRNA FISH sample.

Cell-cycle arrest at the G1/S transition was achieved with mating pheromone alpha-factor. This was added to a concentration of 3 µM from 100X stock at 0 hours then supplemented with half that amount (i.e. an extra 1.5 µM) after 1 and 3 hours. Cells were fixed after 3 or 5 hours of alpha-factor treatment. Most cells (>95%) showed clear signs of alpha-factor arrest during microscopy (with the shmoo projection), and only these were analyzed.

Arrest at G2/M was achieved with microtubule-inhibiting nocodazole. This was added at 0.015 mg/mL from a 100X stock of 1.5 mg/mL in DMSO at 0 hours, then supplemented with half that amount every 2 hours. Cells were fixed after 2, 5 or 8 hours of nocodazole treatment. Again, most cells (>95%) were clearly arrested, with a large dumbbell shape, and only these were analyzed.

Expression was always measured on growing, G1- and G2-arrested populations from an identical overnight culture, which was split for fixation (growing) and arrest at t=0.

6.2.2 mRNA FISH

Fluorescence in situ hybridization (FISH) to count mRNA in single cells (Raj et al., 2008): 20-50 different single-stranded DNA probes to vYFP were coupled to tetramethylrhodamine (TMR) or indodicarbocyanine (Cy5) fluorophores, and probes to tdTomato were coupled to TMR fluorophores, as reported in To & Maheshri (2010). Yeast were grown to early log-phase (OD600nm = 0.1-0.5) then fixed, spheroplasted, hybridized and washed similarly to Raj & van Oudenaarden (2008) with modifications as described in To & Maheshri (2010). DNA probes at ~5 µM were diluted 50-fold into hybridization solution containing 10% formamide. The set of probes produces sufficient fluorescence to detect a single mRNA with wide-field fluorescence microscopy. Cells were imaged on a Zeiss AxioObserver inverted microscope equipped with a PRIOR Lumen200 mercury arc lamp, a 100X/1.40 objective (Zeiss) and a rhodamine- and Cy5-specific filter set (Chroma Technology Cat. No. 31000v2 and 41024 respectively). For each sample, eight Z-stack images 0.3 microns apart were obtained.

6.2.3 mRNA FISH image analysis

Z-stack images were analyzed using custom software written in MATLAB based on that used in To & Maheshri (2010). The algorithm used to identify spots corresponding to single mRNA applies region-based thresholding and identifies local maxima as spots.
Three parameters used by the algorithm can change due to day-to-day variation in staining: (1) the minimum intensity for a pixel to be considered as part of a spot, manually set by examining several z-stacks, picking a threshold that identifies spots and not background, and verified by ensuring a false positive rate below 4% in negative control samples; (2) the average intensity of a single mRNA spot, chosen using the mode of spot intensities for lower expressing samples (Figure 6-1A), to allow counting of multiple overlapping mRNA; and (3) the threshold intensity at which a spot is classified as a site of nascent transcription, chosen as the transition between the peaked and flat sections of a histogram of spot intensities (~5-10 fold higher than the intensity of a single spot, Figure 6-1A). Mean protein levels in different samples were used as an internal control, and we always verified the ratio of mean protein level to mean mRNA count was consistent across samples and expression levels. Figure 6-1B is an example of a processed image showing mRNA spots.

[Figure 6-1 panels. (A) Histogram of spot intensity (x-axis: spot intensity, ×10^5). (B) Example processed images of cells.]

Figure 6-1: (A) Histogram of the mean pixel intensity of spots detected as mRNA in a population of cells with green, blue and red lines showing the parameters selected to analyze the spots. The threshold (green line) is the minimum pixel intensity for a pixel to be considered as a spot, selected to keep false positives below 4% in a negative control sample without the fluorescent reporter and be consistent with manual visual inspection of a subset of images. Multiple mRNAs that overlap in the z-projection appear as brighter spots. The mode of the histogram (blue line) is selected as the intensity of a single mRNA and spots that are >= 2-fold brighter are counted as multiple spots by normalizing with the mode threshold. Very bright spots (>4-fold brighter than a single mRNA spot, in the flat region of the histogram, red line) are thought to be formed by sites of nascent transcription if they align with the nucleus and are automatically identified as those with intensities in the flat region of the histogram (red line). (B) Images of cells with mRNA counts measured by mRNA FISH. (top) Bright-field overlaid with blue DAPI-stained nucleus and (middle) the maximum projection of eight images of fluorescent rhodamine staining within a Z-stack. (bottom) mRNA and nascent transcription sites identified by the spot-counting algorithm are marked with a red or magenta dot respectively. Nascent spots align with the blue DAPI-stained nucleus.

6.2.4 Numerical solutions to stochastic models

The two-state transcriptional bursting model has an analytical solution at steady state, but more complex schemes, including cell-cycle dependent transcription, do not. Numerical methods enable simulation of more complicated transcription patterns to predict expression distributions.

The Gillespie algorithm (Gillespie, 1977) is a kinetic Monte Carlo simulation of stochastic chemical equations. It is simple to implement for any reaction scheme and statistically correct but requires large sampling and thus can be computationally expensive. It allows for simultaneous tracking of multiple state variables.

The Finite Markov Chain method is also amenable to almost any reaction scheme and gives a solution for the distribution of state variables over transient or steady-state dynamics. For steady-state dynamics, the principal eigenvector of the transition matrix is the steady-state distribution. It is an exact solution provided that the simulation models a sufficiently large state space (Munsky & Khammash, 2008). But it becomes memory limited and intractable for large state spaces, making the Gillespie method more useful for simulating multiple species/state variables.
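As an illustration of the Gillespie approach described above, the following is a minimal sketch for a two-state promoter with first-order mRNA degradation. It is not the thesis's MATLAB implementation: the function name, the parameter names (`k_on`, `k_off`, `k_tx`, `k_deg`) and any rate values passed to it are hypothetical, and cell-cycle dependence is deliberately left out.

```python
import random


def gillespie_two_state(k_on, k_off, k_tx, k_deg, t_end, rng, start_on=False):
    """Simulate one cell and return its mRNA count at time t_end.

    Hypothetical two-state scheme (an assumption for illustration):
    OFF -> ON at k_on, ON -> OFF at k_off, transcription at k_tx
    while ON, and first-order mRNA degradation at k_deg per molecule.
    """
    t, on, m = 0.0, start_on, 0
    while True:
        # Propensities of the four reactions in the current state.
        rates = [
            0.0 if on else k_on,   # promoter OFF -> ON
            k_off if on else 0.0,  # promoter ON -> OFF
            k_tx if on else 0.0,   # transcription (only when ON)
            k_deg * m,             # degradation of one mRNA
        ]
        total = sum(rates)
        if total == 0.0:
            return m  # nothing can happen any more
        # Exponentially distributed waiting time to the next reaction.
        t += rng.expovariate(total)
        if t > t_end:
            return m
        # Pick the reaction that fires, with probability rate / total.
        r = rng.random() * total
        if r < rates[0]:
            on = True
        elif r < rates[0] + rates[1]:
            on = False
        elif r < rates[0] + rates[1] + rates[2]:
            m += 1
        else:
            m -= 1
```

Calling the function many times with the same rates and a shared `random.Random` instance gives a sampled expression distribution. With `k_off = 0` and `start_on=True` the promoter is always ON, so the sampled counts should approach a Poisson distribution with mean `k_tx / k_deg`, which is a convenient sanity check before adding more states.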
6.3 Quantification of mRNA dynamics

6.3.1 Nuclear and cytoplasmic mRNA degradation half-life

mRNA dynamics are necessary to infer transcription dynamics from mRNA expression. The kinetics of mRNA degradation were measured by treatment with thiolutin, a potent inhibitor of bacterial and yeast RNA polymerases that halts all transcription. mRNA expression following thiolutin treatment therefore represents mRNA decay. We simplify mRNA degradation as a first-order process (Equation 6-1) with half-life of 20 minutes. This agrees well for 80% of mRNA degradation over 60 minutes (Figure 6-2A, B black). But degradation is long-tailed, meaning that some portion of mRNA degrades much more slowly. This is evident in mRNA levels higher than predicted by first-order decay at long timepoints. This trend is better represented by two first-order decay processes (Equation 6-2) with half-lives of 10 minutes and 30-40 minutes (Figure 6-2A, B grey). This uncertainty in mRNA degradation causes uncertainty in transcriptional dynamics, particularly in G1 where mRNA decay is a dominant process.

Equation 6-1:

$$\frac{dM}{dt} = -\frac{1}{t_{Deg}} M, \qquad M = M_0\, e^{-t/t_{Deg}}$$

Equation 6-2:

$$\frac{dM}{dt} = -\frac{1}{t_{Deg1}} M_1 - \frac{1}{t_{Deg2}} M_2, \qquad M = M_{1,0}\, e^{-t/t_{Deg1}} + M_{2,0}\, e^{-t/t_{Deg2}}$$

[Figure 6-2 plots. (A) Cytoplasmic mRNA and (B) nuclear mRNA remaining (0-100%) versus time after thiolutin treatment (0-90 minutes), with one-process (tDeg) and two-process (tDeg1, tDeg2) fits.]

Figure 6-2: Measurement of mRNA degradation dynamics. Cells were grown to steady-state expression and then treated with thiolutin, which halts transcription. (A) Cytoplasmic mRNA degrades by first-order exponential decay with a half-life of 20 minutes over 60 minutes. But degradation beyond 60 minutes is long-tailed, better reflected by two exponential decay processes with half-lives of 10 minutes and 40 minutes. (B) Nuclear mRNA decays with similar kinetics to cytoplasmic. Rates are slightly faster, perhaps because nuclear mRNA is not detected below 2-3 mRNA.
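The difference between one and two decay processes at long timepoints can be illustrated numerically. The sketch below uses the half-lives quoted in the text (20 minutes for a single process; 10 and 40 minutes for two processes). The equal split between the fast and slow pools (`fast_fraction=0.5`) is a hypothetical choice made only for illustration, since the fitted fractions are not given here, and the function names are likewise assumptions.

```python
def single_exponential(t_min, half_life_min=20.0):
    """Fraction of mRNA remaining after t_min minutes of first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def double_exponential(t_min, fast_half_life=10.0, slow_half_life=40.0,
                       fast_fraction=0.5):
    """Fraction remaining when two pools decay independently.

    fast_fraction is the (hypothetical) share of mRNA in the
    fast-decaying pool at t = 0; the rest decays slowly.
    """
    fast = fast_fraction * 0.5 ** (t_min / fast_half_life)
    slow = (1.0 - fast_fraction) * 0.5 ** (t_min / slow_half_life)
    return fast + slow
```

Evaluating both functions at 30, 60 and 90 minutes shows the two curves staying close early on and separating later, with the two-process curve retaining more mRNA at 90 minutes. This is the long-tailed behavior described above, although the exact separation depends on the assumed pool fractions.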
6.3.2 Rate of nuclear mRNA export

To calculate the rate of nuclear mRNA export, we consider nuclear and cytoplasmic mRNA expression levels in cells active under arrest. Assuming these cells are at steady state (supported by our measurements of arrested expression over time), the export rate is related to the known cytoplasmic mRNA degradation rate by the ratio of nuclear and cytoplasmic mRNA:

Equation 6-3:

$$\frac{dM}{dt} = k_{Export} N - k_{Degradation} M = 0 \ \text{at steady state} \quad\Rightarrow\quad \frac{M}{N} = \frac{k_{Export}}{k_{Degradation}}$$

where M is mean cytoplasmic mRNA and N is mean nuclear mRNA for a given sample. Across all samples, this ratio is 2 (Figure 6-3). Thus we infer a first-order export rate twice the cytoplasmic degradation rate, corresponding to an export time of about 10 minutes given the 20-minute degradation half-life.

[Figure 6-3 plot. Cytoplasmic mRNA versus median nuclear mRNA, with the reference line M/N = 2.]

Figure 6-3: In arrested, steady-state expression, ON cells' cytoplasmic mRNA levels are 2-fold ON cells' nuclear mRNA levels across all samples, indicating nuclear mRNA export has twice the rate of cytoplasmic mRNA degradation.

6.4 mRNA FISH error analysis

Error analysis for mRNA FISH results is not shown on the figures in Chapter 4 for clarity. (Error is particularly difficult to report when the experimental results are distributions themselves.) Here, we provide the number of replicates for each data point (N = 1-5, Table 6-3) and the number of cells sampled for each replicate (N = 186 ± 95 cells, average ± standard deviation, Table 6-4). This gave experimental error in mRNA count (Table 6-5, 3 mRNA on average) and in the fraction of active cells (Table 6-6, 6% on average). Most (70-95%) error derived from variation between replicates, rather than error from sampling individual cells (as determined by bootstrapping).

The level of error is illustrated in Figure 6-4 & Figure 6-5, which correspond to figures from Chapter 4. They show that error between the means of the replicates is low and unlikely to interfere with results. Error from sampling within each replicate, shown as error bars on each point, is even lower.
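The sampling error within a replicate is described above as determined by bootstrapping. A minimal sketch of that idea follows: resample the per-cell mRNA counts of one replicate with replacement and take the standard deviation of the resampled means. The function name, the default of 1000 resamples and the use of the population standard deviation are assumptions for illustration, not details taken from the thesis.

```python
import random
import statistics


def bootstrap_sem(counts, n_boot=1000, rng=None):
    """Estimate the sampling error of the mean mRNA count by bootstrapping.

    counts: per-cell mRNA counts from one replicate.
    Returns the standard deviation of the means of n_boot resamples,
    each drawn with replacement and the same size as the original sample.
    """
    rng = rng or random.Random()
    n = len(counts)
    means = []
    for _ in range(n_boot):
        resample = rng.choices(counts, k=n)  # sampling with replacement
        means.append(sum(resample) / n)
    return statistics.pstdev(means)
```

The same resampling loop could be applied to the fraction of active cells by replacing the mean with the fraction of resampled cells above an activity threshold. Passing a seeded `random.Random` makes the estimate reproducible.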
Table 6-3: Number of replicates for each data point.

Promoter, condition | 0% | 1% | 5% | 13% | 22% | 33% | 50% | 77% | 95%
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1xtetO, Growing | 2 | 4 | 2 | 2 | 1 | 1 | 3 | 1 | 5
1xtetO, G1 arrest | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 3
1xtetO, G2 arrest | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 3
7xtetO, Growing | 1 | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 5
7xtetO, G1 arrest | 1 | 1 | 3 | 3 | 2 | 2 | 2 | 2 | 3
7xtetO, G2 arrest | 1 | 1 | 4 | 4 | 2 | 2 | 2 | 2 | 4

Table 6-4: Average number of cells sampled in each replicate.

Promoter, condition | 0% | 1% | 5% | 13% | 22% | 33% | 50% | 77% | 95%
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1xtetO, Growing | 230 | 260 | 180 | 210 | 470 | 560 | 320 | 140 | 160
1xtetO, G1 arrest | 110 | 190 | 150 | 250 | 180 | 130 | 170 | 200 | 270
1xtetO, G2 arrest | 130 | 190 | 140 | 120 | 100 | 70 | 80 | 70 | 100
7xtetO, Growing | 170 | 130 | 260 | 260 | 280 | 270 | 350 | 230 | 140
7xtetO, G1 arrest | 260 | 130 | 270 | 150 | 130 | 200 | 160 | 180 | 230
7xtetO, G2 arrest | 250 | 150 | 80 | 90 | 110 | 110 | 110 | 100 | 90

Table 6-5: Total experimental error in mean mRNA count from error between replicates and from sampling within a replicate. Error is reported for data points with more than 1 replicate. Columns: 0%, 1%, 5%, 13%, 22%, 33%, 50%, 77%, 95%; rows: 1xtetO and 7xtetO, each Growing, G1 arrest and G2 arrest. Entries as extracted, in reading order: 1.3, 0.7, 1.8, 1.7, 0.2, 5.4, 1.8, 0.7, 3.3, 3.1, 1.9, 3.4, 0.7, 1.4, 1.6, 0.6, 1.8, 3.3, 5.0, 4.0, 6.7, 1.8, 12.2, 5.1, 3.0, 8.7, 0.2, 5.1, 3.0, 1.1, 3.9, 2.3, 3.4, 3.5. Average: 3 mRNA.

Table 6-6: Total experimental error in fraction of active cells, from error between replicates and from sampling within a replicate. Columns: 0%, 1%, 5%, 13%, 22%, 33%, 50%, 77%, 95%; rows: 1xtetO and 7xtetO, each Growing, G1 arrest and G2 arrest. Average: 6%.

[Figure 6-4 panels (A-E). Axes: cytoplasmic mRNA per gene copy; fraction of active cells. Series: growing, G1 arrest and G2 arrest.]

Figure 6-4: Experimental error in data points from Figure 4-2. (A, B, C) Error in mean mRNA counts: Each replicate is shown as a point and error bars represent standard deviation of sampling error determined by bootstrapping. (D, E) Error in fraction of active cells: Each replicate is shown in overlapping bars, with error bars representing standard deviation from sampling error. Error is low enough that it does not interfere with interpretation of results.

[Figure 6-5 panels (A-H), for 1xtetO and 7xtetO under G1 arrest and G2 arrest. Axes: cytoplasmic mRNA per gene copy; fraction of active cells; cytoplasmic mRNA per gene copy in ON cells; nuclear mRNA per gene copy in ON cells.]

Figure 6-5: Experimental error in data points from Figure 4-3. (A, B, E-H) Error in mean mRNA counts: Each replicate is shown as a point and error bars represent standard deviation of sampling error determined by bootstrapping. (C, D) Error in fraction of active cells: Each replicate is shown in overlapping bars, with error bars representing standard deviation from sampling error. Error is reliably low and does not interfere with interpretation of results.

6.5 References

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), 2340-2361.

Guthrie, C., & Fink, G. R. (2002). Guide to yeast genetics and molecular and cell biology: Part C. Gulf Professional Publishing.

Munsky, B., & Khammash, M. (2008). The finite state projection approach for the analysis of stochastic noise in gene networks. Automatic Control, IEEE Transactions on, 53(Special Issue), 201-214.

Raj, A., & van Oudenaarden, A. (2008). Nature, nurture, or chance: Stochastic gene expression and its consequences. Cell, 135(2), 216-226.

Thomas, B. J., & Rothstein, R. (1989). Elevated recombination rates in transcriptionally active DNA. Cell, 56(4), 619-630.

To, T., & Maheshri, N. (2010). Noise can induce bimodality in positive transcriptional feedback loops without bistability. Science, 327(5969), 1142-1145.