Analysis of Metazoan DNA Replication Initiation using Drosophila Gene Amplification as a Model System

by Jane Christina Kim

B.S. Molecular, Cellular, and Developmental Biology, 2004
Yale University, New Haven, CT

Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology at the Massachusetts Institute of Technology, Cambridge, MA

February 2011

© 2010 Jane Christina Kim. All rights reserved.

The author hereby grants to MIT permission to reproduce or distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author: Department of Biology, November 8, 2010
Certified by: Terry L. Orr-Weaver, Professor of Biology, Thesis Supervisor
Accepted by: Stephen P. Bell, Chair, Committee on Graduate Students, Department of Biology

ABSTRACT

Gene amplification in Drosophila follicle cells is an excellent model to study origin specification and developmental regulation of DNA replication in vivo. We mapped all follicle cell amplicons using a comparative genomic hybridization strategy and identified two new amplicons. We determined the precise localization of the origin recognition complex (ORC) on a genome-wide level and observed that, at the start of synchronous amplification, ORC localizes to the six amplicons with levels corresponding to the magnitude of amplification. Additionally, we investigated amplification with respect to transcription and chromatin state.
The levels and timing of gene expression in some amplicons suggest that gene amplification is not exclusively a developmental strategy to promote high expression levels. Follicle cell amplicons are enriched for acetylated H4, but this mark is not sufficient for ORC localization or amplification. In addition to genome-wide analyses, we characterized the two new amplicons and discovered unique properties that make both distinctive replication models. Strikingly, DAFC-22B shows strain-specificity in amplification, a property that is correlated with the ability to localize ORC. We identified sequence differences between closely related amplifying and non-amplifying strains and used P element mediated transformation to test sufficiency for ORC binding and amplification at this region. DAFC-34B contains two genes that are expressed in follicle cells. Vm34Ca is a structural component of the vitelline membrane but is expressed prior to the onset of gene amplification. CG16956 is expressed in amplification stages but only in a small subset of follicle cells. Like the previously characterized DAFC-62D, DAFC-34B displays origin firing at two separate stages of development. However, unlike DAFC-62D, amplification at the later stage is not transcription dependent. We mapped the DAFC-34B amplification origin to 1kb by nascent strand analysis and delineated the cis requirements for origin activity, finding that a 6 kb region, but not the 1 kb origin alone, is sufficient for amplification. We analyzed the developmental localization of ORC and the MCM complex, the replicative helicase. Intriguingly, the final round of origin activation at DAFC-34B occurs in the absence of detectable ORC, though MCMs are present, suggesting a novel initiation mechanism. Our analysis of follicle cell amplicons highlights the diversity of amplification origin control mechanisms within the same cell type, which may be representative of similar regulatory diversity during S phase DNA replication. 
Thesis Supervisor: Terry L. Orr-Weaver
Title: Professor of Biology

Dedicated, with love and thankfulness, to my parents Sung and Kathy Kim

Acknowledgements

I rotated in and joined Terry's lab in the spring of 2005 when she was away on sabbatical. It was a bold decision but one that I would make again in a heartbeat because of the kind of advisor Terry is. Terry is a brilliant scientist who has made many seminal contributions to biology. Additionally, she has poured so much of herself into the mentorship of her trainees that, even in her absence my first year (though with frequent phone meetings and regular visits), I experienced top-notch scientific guidance and overwhelmingly supportive collegiality through being a member of the Orr-Weaver lab. Since then, I have only had more reasons to appreciate Terry: her genuine passion for science and the fairness and thoughtfulness with which she manages her lab. I know I have grown in tremendous ways scientifically, professionally, and personally because of Terry's mentorship, and for this I am deeply thankful.

I want to thank all of the members of the Orr-Weaver lab whom I had the privilege of working with. They have been and continue to be wonderfully supportive and fun colleagues. Their names are too numerous to list here, but each has made lasting contributions to my scientific and personal wellbeing. I will miss being in the TOW-ZONE very much.

Steve Bell and Peter Reddien have been on my thesis committee for the past five years. They have provided valuable scientific input and practical guidance that has helped move my project forward every year. Since the beginning, I have felt that they were rooting for me to do well, and this implicit encouragement has been immensely motivating. Graham Walker is newly on my thesis committee this year, but I have admired the work of his education group since the beginning of grad school.
His commitment to training scientist-educators and developing effective biology curriculum has inspired me to believe that this endeavor is a worthy one.

Jeff Kapler was on sabbatical in Terry's lab during the 2008-2009 academic year from Texas A&M. He provided a lot of good advice about molecular biology experiments and how to be a well-rounded scientist. He also gave me the single best piece of advice I have ever received, and I reflect on it often.

Despite having a wonderful advisor and supportive lab mates, grad school has been challenging, though in the words of Randy Pausch, "Brick walls are there to show how badly we want something." My cherished friends and family have continuously kept my perspective in check and my mood cheerful. I want to thank Wendy Lam, Michelle Sander, Sudeep Agarwala, and Renuka Pandya for their friendship, which I treasure greatly. The friends I have made through MIT and MIT Biograds, Sidney Pacific, and Highrock are, blessedly, too numerous to list here, but I am grateful for them. My high school A4L, Yale girlfriends, and extended relatives have treated me like a (scientific) rock star, and their love and enthusiastic support have stood the test of time (and my scientific hermit phases). This most definitely includes thanks to my brother John for being everything a girl could ask for in an older brother.

Finally, I would like to thank my parents. They gave life to me, but through their unwavering support and love they gave me my life. I know the immigrant story of leaving everything familiar behind to pursue opportunity for one's children is not uniquely my family's, but I am profoundly thankful to be the outcome of this one story. This thesis is a testament to my parents' sacrifice, support, and love. Thank you, Mom and Dad.
TABLE OF CONTENTS

Chapter One: Introduction
  Eukaryotic replication initiation proteins and cell cycle regulation
  Replication origin discovery
    Models from budding yeast and fission yeast
    Identification of individual metazoan replication origins
    Methods of replication origin discovery on a genome-wide scale
    Developmental gene amplification as an origin discovery tool
  Properties of metazoan replication origins
    Sequence properties and genome distribution of replication origins
    Replication and transcription
    Replication and chromatin context
    Replication timing and cell type specific replication programs
  The importance of cataloging replication origins
  Summary of thesis
  References

Chapter Two: Genome-wide identification of Drosophila follicle cell amplicons as in vivo model replicons
  Abstract
  Introduction
  Results
    Identification of two new follicle cell amplicons by aCGH
    Genome-wide expression analysis of follicle cells
    ORC binding in amplicons localizes to the most amplified region
    H4 acetylation corresponds to the magnitude of amplification
    DAFC-22B exhibits strain-specific amplification
    Mapping cis elements responsible for DAFC-22B amplification
  Discussion
  Materials and Methods
  Acknowledgements
  References

Chapter Three: Differential ORC localization during two rounds of replication initiation at a Drosophila follicle cell amplicon
  Abstract
  Introduction
  Results
    Two genes in DAFC-34B are expressed in follicle cells
    DAFC-34B shows two distinct stages of replication initiation
    DAFC-34B amplification origin corresponds to the Vm34Ca transcription unit
    Developmental control of ORC and MCM localization at DAFC-34B
    orc mutant blocks both rounds of replication initiation at DAFC-34B
    Delineation of cis control elements for replication at DAFC-34B
  Discussion
  Materials and Methods
  Acknowledgements
  References

Chapter Four: Conclusions and Perspectives
  Active transcription as a causal determinant of gene amplification
  CG7337 expression and strain-specific amplification at DAFC-22B
  Specific histone acetylation and gene amplification
  Investigating ORC independent initiation at DAFC-34B
  Application of Drosophila genomic resources to the study of follicle cell gene amplification
  References

Appendix One: Strategy and preliminary results toward generation of conditional replication factor mutants in Drosophila
  Introduction
  Results
  Discussion
  References

Appendix Two: Synteny analysis of the DAFC-30B amplified region
  Results
  Acknowledgements
  References

Appendix Three: Summary of follicle cell amplicons

Chapter One: Introduction

Complete duplication of the genome during S phase is critical for the accurate transmission of genetic material to daughter cells during the cell cycle. Two fundamental questions related to the regulation of this genome duplication process are: (1) how are individual DNA regions selected to function as replication origins, or sites from which replication initiation occurs; and (2) how is the activation of replication origins coordinated genome-wide such that every sequence is replicated once, and only once, per cell cycle? The identification of replication origins to uncover functional properties of these genomic regions and analyze their regulatory mechanisms is key to understanding these questions. However, this task has been challenging in metazoan systems because, until recently, only a small number of replication origins had been identified for molecular characterization. This chapter focuses on the identification of metazoan replication origins and the insights into origin function that their analyses have provided: insights from studies of individual replication origins as well as from recent genome-wide studies that map replication origins using microarray and next-generation sequencing methods.
The emerging picture of metazoan replication origins is one where, despite the absence of a sequence-specific motif for origin specification, cis-acting information does influence the binding of replication proteins as well as origin function. Furthermore, whether or not DNA replication initiates from a given genomic locus is a highly dynamic process subject to influence from transcriptional activity and chromatin state.

Eukaryotic replication initiation proteins and cell cycle regulation

In 1963, Jacob et al. proposed the "replicon model" to explain the regulation of DNA synthesis in bacteria (JACOB and BRENNER 1963). According to this model, an initiator protein that was encoded by a structural gene would interact with a sequence-encoded genetic element called the replicator to initiate DNA replication. Identification of the origin recognition complex, or ORC, nearly 30 years later demonstrated that a eukaryotic initiator existed (BELL and STILLMAN 1992). However, the identification of origins in eukaryotes reveals that there is greater diversity in origin structure than in the protein complexes that promote origin activation.

ORC marks all potential sites of origin activation in eukaryotes (BELL and DUTTA 2002). This hexameric complex binds DNA and recruits Cdc6 and Cdt1. The subsequent ATP hydrolysis activities of ORC and Cdc6 result in the stable loading of the MCM2-7 replicative helicase onto DNA and poise the site for unwinding of the double-stranded helix. Together, these proteins comprise the pre-replicative complex (pre-RC), and formation of these complexes establishes origin licensing, or competence for DNA synthesis. Though identified primarily in yeast, the components of the pre-RC have been identified in all eukaryotes examined. The pre-RC is activated to form the pre-initiation complex (pre-IC), whose components in yeast include Cdc45, GINS, Sld2, Sld3, Dpb11, and Mcm10.
The mechanism of this activation process is an area of active investigation, but a key event is the Dbf4-dependent kinase (DDK)-dependent phosphorylation of MCM subunits (WALTER and ARAKI 2006). Additionally, CDK-dependent phosphorylation of Sld2 and Sld3 leads to activation of the pre-IC and recruitment of DNA polymerases to initiate DNA synthesis (TANAKA et al. 2007; ZEGERMAN and DIFFLEY 2007). Activation of the pre-RC (and the initiation of DNA synthesis) is often referred to as origin firing and will be used interchangeably with origin activation in this text (see Table 1-1 for replication terminology).

Table 1-1. Replication terminology.
Term: Definition (Reference)
Initiator: Protein or protein complex that binds replicators and is required for replication initiation; term first proposed in replicon model (JACOB and BRENNER 1963)
Initiation site (replication origin): Location on DNA from which replication forks emanate
Origin activation (origin firing): Activation of the pre-RC and the initiation of DNA synthesis
Origin efficiency: Percentage of cells that activate a particular replication origin every cell cycle
Replicator: Genetic element that is required for replication initiation from a particular chromosomal location; term first proposed in replicon model (JACOB and BRENNER 1963)
Replicon: Region of DNA that is duplicated by a replication origin
Spatial replication program: The physical distribution of all replication origins used to replicate the genome
Temporal replication program: The timing with which all replication origins are activated to replicate the genome

To ensure that origin activation occurs once and only once per cell cycle, the process of pre-RC formation is tightly regulated. Across diverse eukaryotic organisms, a general strategy is to separate the origin licensing and origin activation stages in the cell cycle. Origin licensing occurs during G1 phase when CDK activity is low, whereas origin activation occurs when CDK activity is high.
Additionally, cells use redundant mechanisms to prevent pre-RC assembly outside of G1, including targeted protein synthesis of pre-RC components during G1 and inactivation of these proteins following replication initiation via protein degradation and nuclear export (ARIAS and WALTER 2007). Although the specific combination of strategies differs depending on the organism, one mechanism common to metazoans is the activity of the protein Geminin (BELL and DUTTA 2002; MELIXETIAN and HELIN 2004). Geminin binds to Cdt1 and inhibits its function, thus adding another layer of regulation to prevent origin licensing outside of G1 phase. Overexpression of Cdt1 has been shown to cause re-replication in multiple cell systems (THOMER et al. 2004; YANOW et al. 2001; ZHONG et al. 2003), thus highlighting the importance of limiting Cdt1 activity to prevent re-replication and maintain genomic stability.

Replication origin discovery

Models from budding yeast and fission yeast

To identify cis-acting origins of replication in yeast, a strategy from bacteria was adapted to isolate sequences that would support autonomous replication in a host cell. The autonomously replicating sequence (ARS) assay was successfully implemented in budding yeast and fission yeast to identify sequences that conferred origin function both in the context of a plasmid as well as the native chromosomal locus (ALADJEM et al. 2006; STINCHCOMB et al. 1980). In budding yeast, these sequences are approximately 125 base pairs in length. Compilation of these sequences led to the identification of an 11 bp ARS consensus sequence (ACS) that is required, though not sufficient, for ORC binding. Molecular dissection of a number of budding yeast origins reveals a modular structure, with different conserved elements that stabilize ORC binding and MCM recruitment (LIPFORD and BELL 2001; RAO and STILLMAN 1995) (Figure 1-1A).

Figure 1-1. Examples of replication origins from diverse eukaryotes.
(A) Budding yeast ARS1 is a well-defined replication origin. ORC binds to the 11 bp ACS (and the B1 element). The origin of bidirectional replication has been precisely identified by replication initiation point (RIP) mapping immediately downstream of the ORC binding site. Collectively, the B elements are essential for origin activity, but their number, type, and position are variable among budding yeast origins. (B) Fission yeast ars2004 is approximately 1 kb in length and contains three asymmetric AT-rich sequences (blue boxes) that are required for origin activity. (C) The DHFR locus in Chinese hamster ovary cells is an example of a broad initiation zone. Replication initiates in the 55 kb intergenic region between the DHFR and 2BE2121 genes. Replication initiation occurs most frequently from the three sites designated with triangles. (D) The human β-globin locus is an example of a confined replication site. Replication initiates efficiently from a region <10 kb spanning the β-globin gene. (E) The chorion amplicon DAFC-66D is an example of a confined replication site. The amplification enhancer ACE3 and primary origin Ori-β are both necessary and sufficient for amplification.

[Figure 1-1 panels: A. Budding yeast ARS1; B. Fission yeast ars2004; C. Chinese hamster DHFR locus; D. Human β-globin locus; E. Fruit fly chorion amplicon DAFC-66D]

In fission yeast, the ARS assay identified origin sequences approximately 1 kb in length.
Although a consensus sequence could not be computationally extracted, fission yeast origins were found to display an extended asymmetric AT-rich sequence, and several discrete regions within individual origins have been shown to influence ORC binding and origin activation. For example, ars2004 contains three different required regions, but they can also be replaced with 40 base pair poly(dA/dT) fragments to restore origin activity (OKUNO et al. 1997) (Figure 1-1B).

Identification of individual metazoan replication origins

When applied to metazoan systems, the ARS assay was unsuccessful in systematically identifying specific replication origins (GILBERT and COHEN 1989; MASUKATA et al. 1993). It appeared that many sequences could support autonomous replication in this assay (KRYSAN and CALOS 1991). Additional experiments determined that circular DNA could bind ORC and replicate in mammalian cell culture without any sequence specificity (KRYSAN et al. 1993; SCHAARSCHMIDT et al. 2004). These results were consistent with experiments using Xenopus eggs, where any injected sequence could be fully replicated (HARLAND and LASKEY 1980). Although these results discredited the existence of a universal sequence-specific replicator in metazoans, investigators acknowledged that because the early embryo displayed a very high density of origin firing to accommodate the rapid cell division cycles, this could be a specialized case where specific origins are not used. The discovery of site-specific replication initiation in mammalian cells (described below) demonstrated that non-random origin selection could be utilized during DNA replication. The identification of additional metazoan origins to survey the extent of site-specific initiation would rely on developing experimental methods to look for replication origins in well-delineated genomic regions.

Approximately 20 metazoan replication origins have been identified and studied using multiple methods (ALADJEM et al. 2006).
In the next section, two select examples are described to illustrate the principles of origin discovery and analysis. In addition, they represent two classes of replication origins: broad replication zones that contain multiple infrequent initiation sites and confined replication sites from which initiation events occur at a high frequency.

The first mammalian origin identified was the dihydrofolate reductase (DHFR) locus in Chinese hamster ovary (CHO) cells (HEINTZ and HAMLIN 1982) (Figure 1-1C). Its discovery and subsequent analysis were facilitated by several experimentally advantageous characteristics of the cell line. This genomic region becomes amplified in the presence of increasing concentrations of the competitive inhibitor methotrexate, allowing cell lines containing an approximately 800-fold amplification of the 240 kb region encompassing the DHFR gene to be isolated (MILBRANDT et al. 1981). The high copy number of the amplified sequence could be visualized as distinct restriction fragments against the background of single copy genomic DNA on an agarose gel. This property, combined with the ability to arrest these cells in a G0 state and track resumption of the cell cycle through the G1/S boundary in the presence of radiolabeled nucleotide, meant that the earliest-replicating restriction fragments could be visualized and positionally mapped (HEINTZ and HAMLIN 1982). The discovery of the DHFR locus replication origin is representative of a general strategy to identify newly replicated DNA molecules by their increased abundance compared to nonreplicated DNA. Because many researchers used this locus to study gene regulation, gene and restriction maps of the region were available. Since its discovery, the DHFR locus has been the subject of many studies aimed at finely mapping the position of the multiple origins in this region. The storied history and often-conflicting results related to this locus are described in a recent review (HAMLIN et al. 2010).
The current understanding is that the DHFR locus is a broad replication initiation zone where replication initiates infrequently from one of several origins.

The identification of the human β-globin locus origin also was facilitated by being a well-studied gene region (Figure 1-1D). Newly synthesized leading strand DNA was isolated by pulse-labeling cells with the nucleotide analog bromodeoxyuridine (BrdU) in the presence of emetine, which inhibits lagging strand synthesis, and then recovering single-stranded DNA fractions that were enriched for BrdU incorporation (KITSBERG et al. 1993). These samples were slotted onto filters and hybridized with strand-specific probes across a 200 kb region. Replication direction could be inferred by determining whether hybridization was more abundant using the plus or minus template probe. Likewise, the replication origin could be identified as the site of tail-to-tail DNA synthesis or the junction at which leading strand DNA was on opposite templates. This example is representative of a general strategy to identify origins by determining the transition point at which leading and lagging strands switch templates (HAMLIN et al. 2010) (Figure 1-2C). The spacing of the probes allowed the human β-globin origin to be positioned to within a 10 kb, well-defined fragment. Thus, the human β-globin locus is an example of a confined replication site from which replication initiation occurs from a predominant site.

The origin sequence from the human β-globin locus could be moved to an ectopic site and still confer origin activity, the first such demonstration in a mammalian system (ALADJEM et al. 1998). The initiation activity at the human β-globin locus requires a region 40 kb upstream called the locus control region (LCR). Although origin function requires the LCR at the native locus, initiation could be observed at ectopic sites without the LCR.
This result suggests that distal cis regulatory sequences can influence replication initiation, but their functions can be substituted, presumably with sequences that display similar properties, the nature of which is not well understood.

Figure 1-2. DNA structures at a replication origin. (A) ORC marks all potential sites of origin activation in eukaryotes. ORC-bound sites can be identified by chromatin immunoprecipitation followed by competitive or quantitative PCR (ChIP-qPCR), hybridization to a microarray (ChIP-chip), or high-throughput sequencing (ChIP-seq). (B) Replication bubbles are found at origins due to the melting of double-stranded DNA and initiation of DNA synthesis. Replication bubbles can be identified by 2-D gel mapping techniques or the "bubble-trap method" (see text for details). (C) Replication origins correspond to the junction where leading and lagging strands switch templates. Initial identification of the replication origin at the human β-globin locus relied on this property. (D) Newly replicated DNA can be isolated by pulse labeling cells with the nucleotide analog BrdU, followed by anti-BrdU immunoprecipitation. BrdU-IP DNA can be identified by competitive or quantitative PCR, hybridization to a microarray, or high-throughput sequencing (e.g. Repli-seq). (E) Short nascent strands can be isolated by size fractionation on a sucrose gradient or low melting agarose gel. An additional λ-exonuclease treatment step removes nicked DNA that lacks a 5' RNA primer and is unprotected from digestion. Short nascent strands can be identified by competitive or quantitative PCR, hybridization to a microarray, or high-throughput sequencing (the last method has not been reported but is technically feasible).

[Figure 1-2 panels: A. ORC-bound DNA; B. Replication bubbles; C. Junction where leading and lagging strands switch templates; D. Anti-BrdU immunoprecipitated DNA; E. Short nascent strands (resistant to λ-exonuclease digestion)]

A powerful approach developed to analyze replication origins on an individual level was two-dimensional (2-D) mapping (FANGMAN and BREWER 1991). Though developed to study ARS function in budding yeast, these methods were adapted to study replication origins in metazoans. Neutral-alkaline 2-D gels are used to map origins based on nascent strand sizes (HUBERMAN et al. 1987) (Figure 1-3A), whereas neutral-neutral 2-D gels map origins by branch topology (BREWER and FANGMAN 1987) (Figure 1-3B). An additional advantage of these methods is that origin efficiency, the percentage of cells within a population that activates initiation from a particular site, can be roughly determined. For example, depending on the intensity of the bubble arc compared to the Y arc in a neutral-neutral 2-D gel, one can infer the proportion of cells in which a particular origin is active (Figure 1-3B). Using 2-D analysis, the discrete origins in the DHFR locus were found to fire with low efficiency (MESNER et al. 2003).

2-D gel mapping techniques have been considered the gold standard in precisely mapping replication origins, but their use has technical limitations. Replication bubble structures are scarce in a population of asynchronous cells, making their detection difficult unless the origin is very efficient or the cell population is synchronized, which is not a trivial task in metazoan cell culture. Furthermore, they require the investigator to know what genomic region to assess. Prior to the availability of complete genome sequences, origin identification relied on the existence of well-delineated gene regions to assess, for example, where nascent DNA molecules or replication bubbles were positioned.
Thus, most of the identified origins were in regions that were well studied with respect to the regulation of gene expression, and there were no known examples of metazoan origins located in so-called gene deserts, extended regions of DNA without any known genes.

Figure 1-3. Experimental methods to identify replication origins. (A) Neutral-alkaline 2-D gels are used to map origins based on nascent strand sizes. The axes indicate the order of electrophoresis. The 2X position marks the spot of nearly completely replicated DNA. The high molecular weight of this structure makes it the slowest migrating (farthest to the left). When the second dimension is run using an alkaline gel, the nascent strands separate but have a large molecular weight, nearly identical to the parental strand. (B) Neutral-neutral 2-D gels map origins by branch topology. The axes indicate the order of electrophoresis. DNA containing a replication origin exhibits the pattern of the bubble arc. When the second dimension is run using a neutral gel, origin-centered DNA that has replicated the most is also impeded the most in the gel, due to its large bubble topology. DNA that is passively replicated by an adjacent origin exhibits the pattern of a Y arc. (C) Repli-Seq is a variation of BrdU immunoprecipitation coupled to the precise partitioning of cells into sample groups based on flow cytometry-measured cell cycle stage. Each of these BrdU-IP samples is then subject to high-throughput sequencing, and the data is assembled into mapped sequence reads throughout S phase for any genomic region. The position of an early firing origin can be inferred through the inverted V shape of the mapped sequence tags. Adapted from (HANSEN et al. 2010).

[Figure 1-3 panels: A. Neutral-alkaline 2-D gel analysis; B. Neutral-neutral 2-D gel analysis; C. Repli-Seq (BrdU pulse in vivo; sort cells into S phase samples by FACS; IP BrdU-DNA; sequence)]

Methods of replication origin discovery on a genome-wide scale

Though the DHFR and human β-globin locus replication origin regions were first identified using the methods described above, these origins have been subject to many experimental analyses to further characterize their replication properties as well as refine the strategy of identifying new origins. One general approach is to isolate newly replicated DNA strands and then determine to what genomic regions they map. The DNA isolation is accomplished primarily through one of two methods. The first is to isolate nascent DNA of a particular size, typically in the range of 0.5-2 kb, through a sucrose gradient or on an agarose gel. An additional λ-exonuclease treatment removes background generated from broken DNA fragments, as this enzyme will degrade DNA fragments not protected by a 5' RNA primer (the protected fragments are designated short nascent strands) (Figure 1-2E). The second method is to pulse-label cells with the nucleotide analog bromodeoxyuridine (BrdU) and immunoprecipitate newly synthesized DNA with an anti-BrdU antibody (designated BrdU-IP DNA) (Figure 1-2D). In the era before complete genome sequences, determining the identity or enrichment of these molecules required the investigator to probe a specific genomic region either by hybridization to positionally mapped DNA clones or by quantitative PCR. However, the availability of complete genome sequences allows these short nascent strands and BrdU-IP DNA to be identified using microarrays or high-throughput sequencing.
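As a rough illustration of how microarray signal from BrdU-IP DNA or short nascent strands might be converted into candidate origin regions, the sketch below computes a per-probe log2 ratio of enriched sample over input and reports runs of consecutive probes above a cutoff. It is not the analysis used in any of the cited studies. The function names, the pseudocount, the two-fold threshold, the three-probe minimum, and the toy intensities are all hypothetical choices made only for this example.

```python
import math

def log2_enrichment(ip, inp, pseudocount=1.0):
    """Per-probe log2(IP / input); the pseudocount avoids division by zero."""
    return [math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(ip, inp)]

def call_enriched_regions(positions, ratios, threshold=1.0, min_probes=3):
    """Return (start, end) spans covering at least `min_probes` consecutive
    probes whose log2 ratio is at or above `threshold` (hypothetical cutoffs)."""
    regions, run = [], []
    for pos, ratio in zip(positions, ratios):
        if ratio >= threshold:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                regions.append((run[0], run[-1]))
            run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        regions.append((run[0], run[-1]))
    return regions

# Toy data: probe start coordinates (bp) and made-up array intensities.
positions = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]
ip_signal = [10, 12, 45, 60, 52, 11, 9, 10]
input_signal = [10, 10, 10, 10, 10, 10, 10, 10]

ratios = log2_enrichment(ip_signal, input_signal)
regions = call_enriched_regions(positions, ratios)
print(regions)
```

Requiring several adjacent probes to pass the cutoff is one simple way to suppress single-probe noise; real analyses would also need normalization across arrays and replicate handling, which are omitted here.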
Repli-Seq is a variation of BrdU immunoprecipitation coupled to the precise partitioning of cells into sample groups based on flow cytometry-measured cell cycle stage (HANSEN et al. 2010). Each of these BrdU-IP samples is then subject to high-throughput sequencing, and the data is assembled into mapped sequence reads throughout S phase for any genomic region. Data generated from this method is shown in Figure 1-3C. Early initiating origins can be inferred through the inverted V shape of newly synthesized DNA. The Hamlin lab developed a method to identify origins that also uses the general principle of isolating newly replicating DNA and then identifying these molecules on a genome-wide scale. The first step exploits the property of circular DNA molecules, including replication bubbles, to be trapped in the agarose-plugged well following electrophoresis (MESNER et al. 2006) (Figure 1-2B). This DNA can be cloned into a genomic library and identified using microarrays or sequencing. Although the methodology is published, the microarray results identifying these replication origins are referenced only as submitted material in a published review article and thus will not be discussed (HAMLIN et al. 2010). The final method to map replication origins is the genome-wide localization of ORC by chromatin immunoprecipitation followed by hybridization to a microarray (ChIP-chip) (WYRICK et al. 2001) or high-throughput sequencing (ChIP-seq). Because ORC marks all potential sites of replication initiation, identifying genome-wide ORC localization serves as a proxy for uncovering all potential replication origins (Figure 1-2A). The analyses of short nascent strands, BrdU-immunoprecipitated nascent DNA, Repli-Seq, and ORC ChIP to identify replication origins have resulted in a dramatic increase in the number of origins experimentally identified (summarized in Table 1-2).
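The inference of early origins from the inverted V shape of Repli-Seq signal can be illustrated with a small sketch. This is not the analysis pipeline of Hansen et al.; it is a minimal illustration that assumes read counts have already been binned along the chromosome for each sorted S phase fraction. The function names, the toy counts, and the use of a read-weighted mean fraction index as a replication time estimate are all assumptions made for the example.

```python
from typing import List


def mean_replication_time(counts: List[List[float]]) -> List[float]:
    """counts[f][b] holds reads from S phase fraction f (0 = earliest)
    in genomic bin b. Returns the read-weighted mean fraction index per bin
    (hypothetical replication time estimate; lower means earlier)."""
    n_frac = len(counts)
    n_bins = len(counts[0])
    times = []
    for b in range(n_bins):
        total = sum(counts[f][b] for f in range(n_frac))
        if total == 0:
            times.append(float("nan"))  # no signal in this bin
            continue
        times.append(sum(f * counts[f][b] for f in range(n_frac)) / total)
    return times


def early_peaks(times: List[float]) -> List[int]:
    """Bins that replicate strictly earlier than both neighbours,
    i.e. the apex of an inverted V in a timing profile."""
    return [
        i
        for i in range(1, len(times) - 1)
        if times[i] < times[i - 1] and times[i] < times[i + 1]
    ]


# Toy data: three S phase fractions (early, mid, late) across five bins.
toy_counts = [
    [1, 4, 9, 4, 1],  # early S
    [3, 4, 1, 4, 3],  # mid S
    [6, 2, 0, 2, 6],  # late S
]
toy_times = mean_replication_time(toy_counts)
toy_peaks = early_peaks(toy_times)
```

In the toy data the central bin is labeled almost exclusively in the earliest fraction and its flanks are labeled progressively later, so it is the only bin reported as a candidate early initiation site.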
In some cases, such as the identification of replication origins in human cells (HeLa), there was not significant overlap among the origin datasets, suggesting that the different methods selectively identify a subset of all origins. Nevertheless, these studies represent a one to two orders of magnitude increase in the number of previously known metazoan replication origins and provide valuable datasets with which to compare replication origins to genomic features such as transcription and chromatin modifications (discussed below).

Table 1-2. Genome-wide approaches to metazoan origin identification.
ORIs = mapped replication origins; OBS = ORC binding sites; "~" = significantly enriched for.

Lucas et al, 2007
  Approach: Human lymphoblastoid cells; short nascent strands; 1.4Mb tiling array
  Origins identified [inter-origin distance]: 28 new
  Transcription: ORIs ~ annotated genes
  Chromatin: ORIs ~ DNaseI hypersensitive sites
  Timing / Efficiency: Early firing origins ~ H3acK9,14
  Notes: No exonuclease treatment

Cadoret et al, 2008
  Approach: HeLa cells; short nascent strands; ENCODE 30Mb tiling array (1% genome)
  Origins identified [inter-origin distance]: 283 [1kb to 500kb, average 63kb]
  Transcription: ORIs ~ gene-rich GC regions, CpG islands, c-Jun and c-Fos binding sites
  Chromatin: ORIs ~ DNaseI hypersensitive sites, H3Ac, H3K4Me2, H3K4Me3 (though 44% show no enrichment)
  Timing / Efficiency: Timing independent of origin density for select regions examined
  Notes: 71% overlap with Hansen early ORIs; <14% overlap with Karnani ORIs

Sequeira-Mendes et al, 2009
  Approach: Mouse ES cells; short nascent strands; 10.1Mb tiling array (0.4% genome)
  Origins identified [inter-origin distance]: 97 [average 103kb]
  Transcription: 85% map to txn units; 44% map to promoters; promoter-ORIs ~ expressed genes
  Chromatin: Select ORIs show H3Ac, H3K4Me2, H3K4Me3 enrichment by ChIP-qPCR
  Timing / Efficiency: Efficient origins ~ overlap TSS

Hansen et al, 2010
  Approach: Four human cell lines; Repli-Seq; whole-genome coverage
  Origins identified [inter-origin distance]: Erythroid n/a; Lymphoid male 1131; Lymphoid female 1199; hESCs 1809; Fibroblast 1547
  Transcription: Early ORIs ~ high gene density, gene expression, GC content, CpG density
  Chromatin: ORIs ~ DNaseI hypersensitive sites; nuclear lamina association ~ late firing origins
  Timing / Efficiency: Early firing origins ~ expressed genes
  Notes: 49% of genome shows replication timing plasticity

Karnani et al, 2010
  Approach: HeLa cells; BrdU-IP and short nascent strands; ENCODE 30Mb tiling array (1% genome)
  Origins identified [inter-origin distance]: BrdU-IP 815; short nascent 320; overlap ORIs 150 [BrdU-IP 27.6kb; short nascent 58.4kb]
  Transcription: 68% ORIs map +/-5kb from TSS; ORIs ~ RNA PolII binding sites
  Chromatin: ORIs ~ H3Ac, H3K4Me2, H3K4Me3
  Timing / Efficiency: 49% early replicating

MacAlpine et al, 2010
  Approach: Drosophila Kc167 cells; ORC ChIP-chip and HU BrdU-IP; whole genome tiling arrays
  Origins identified [inter-origin distance]: 5135 ORC binding sites (OBS); 630 early origins [average 11kb]
  Transcription: 2/3 OBSs overlap with TSS; no correlation to individual TF binding site
  Chromatin: OBSs ~ H3.3 deposition at promoter and nonpromoter ORIs, depletion for bulk nucleosomes
  Timing / Efficiency: 30% early replicating; higher density of OBS ~ early replicating

Developmental gene amplification as an origin discovery tool

Developmental gene amplification has served as an important origin discovery tool and model for investigating the regulation of metazoan replication origins. Specific genomic regions are amplified through replication-based mechanisms, either chromosomal excision followed by extra-chromosomal amplification or repeated bidirectional replication from an endogenous chromosomal locus, to increase gene copy number. Mapping of these amplification origins has added to the catalog of known metazoan replication origins (CLAYCOMB and ORR-WEAVER 2005). Developmental gene amplification increases the DNA template to allow sufficient production of gene products that are required at high levels in a short developmental period, such as rRNA in frog oocytes and cocoon proteins in Sciarid fly salivary glands. In Drosophila, two chorion gene clusters are amplified in ovarian follicle cells, somatic epithelial cells that surround the oocyte and secrete the components of the eggshell, by repeated origin activation at the endogenous locus (see Figure 1-4A and 1-4B for developmental context).
This process enables the eggshell proteins to be produced and the eggshell structure to be constructed in less than five hours. Importantly for its use as a replication model, gene amplification uses the same replication machinery and cell cycle kinase regulation that is used in the canonical S phase. Several female-sterile mutants have been isolated that produce a thin eggshell phenotype due to the inadequate transcription of eggshell genes, and these mutants have been found to contain mutations in replication factors such as Orc2, Mcm6, and Dbf4/chiffon (LANDIS et al. 1997; LANDIS and TOWER 1999; SCHWED et al. 2002).

Figure 1-4. Gene amplification in Drosophila follicle cells as a model to study metazoan DNA replication. (A) Gene amplification occurs in the context of egg chamber development and maturation. Drosophila ovaries are made of multiple ovarioles, or strings of developing egg chambers. (B) Adapted from (SPRADLING 1993). DAPI staining of egg chambers. Distinct egg chamber stages can be visually distinguished by the egg chamber size, proportion of nurse cell volume compared to the oocyte volume, and size and shape of the anterior dorsal filaments. (C) Schematic of gene amplification. Initiation is the repeated rounds of origin firing, resulting in an amplified region. Elongation is the replication of existing replication forks such that there is no increase in DNA copy number at the origin but an increase in the flanking regions. (D) The process of gene amplification can be visualized using immunofluorescence experiments, monitoring the incorporation of a nucleotide analog. Fluorescence in situ hybridization (FISH) using a genomic probe marks a specific site of amplification.

At the major chorion amplicon, Drosophila Amplicon in Follicle Cells (DAFC)-66D, the cis requirements for amplification have been finely mapped (Figure 1-1E). DAFC-66D is an example of a confined replication site. The majority of initiation events, as determined by 2-D gel mapping, have been narrowed down to the 884 base pair element Oriβ located in the intergenic region between two chorion genes. Additionally, a 320 base pair enhancer element, Amplification Control Element on the Third (ACE3), is required for amplification and has been shown to bind ORC in vivo, though it does not itself serve as a replication origin in the endogenous locus (CLAYCOMB and ORR-WEAVER 2005). ACE3 is proposed to serve as a nucleation point for ORC binding, possibly by permitting a chromatin environment where pre-RCs can assemble. In support of this model, multimers of ACE3 are capable of autonomously inducing amplification at ectopic sites in the genome, although at lower levels than the endogenous locus (CARMINATI et al. 1992). Another advantage of studying replication using follicle cell amplicons is that they permit in vivo study of replication timing. Replication events during gene amplification can be precisely examined because the process occurs after genomic replication is shut off in development. Developing egg chamber stages are morphologically distinct and can be isolated and analyzed using methods such as Southern blotting, quantitative PCR, and protein localization by immunofluorescence and chromatin immunoprecipitation. For DAFC-66D, replication initiation events occur exclusively in stages 10B and 11, followed by a period in stages 12 and 13 when only replication elongation occurs (CLAYCOMB et al. 2002) (see Figure 1-4C for schematic representations of initiation and elongation).
The absence of initiation events in these later stages allows the elongating replication forks to be visualized as double bar structures in immunofluorescence experiments that monitor the incorporation of BrdU. In contrast, at another amplicon, DAFC-62D, there are two stages of replication initiation: one in stage 10B, followed by a second round of replication initiation in stage 13 (XIE and ORR-WEAVER 2008). Thus, follicle cell gene amplification provides a unique opportunity to study replication events during development.

Properties of metazoan replication origins

Sequence properties and genome distribution of replication origins

Early autoradiography studies revealed that DNA replication initiates from hundreds to thousands of sites in the genome (HUBERMAN and RIGGS 1968; TAYLOR 1968), and more recently, genome-wide studies have shown that cells utilize distinct spatial replication programs. Origin distribution is quite variable, though the average is similar to previously determined estimates of replicon size (approximately 50 kb). Inter-origin distance can range from 1 to 500 kb, with sparse origin distribution in gene-poor regions. Origin density is strongly correlated with gene density and the related measure of high GC content (CADORET et al. 2008). This correlation of origin density to high GC content appears to result from replication origins frequently overlapping with transcription units (discussed below). At the level of individual origins, Karnani et al found that in human cells there was an enrichment of AT sequences in the region 100 base pairs to each side of the replication peak, consistent with previous analyses of individual metazoan origins (KARNANI et al. 2010). A study in Drosophila cells found that, although there was no simple consensus motif for ORC binding, machine learning approaches could be applied to discriminate between ORC-bound and non-ORC-bound sequences (MACALPINE et al.
2010), suggesting that sequence likely plays some indirect role in promoting or permitting ORC binding. For example, it appears that DNA topology is an important determinant of ORC binding in vitro, as ORC binds preferentially to superhelical DNA without sequence specificity (REMUS et al. 2004), and the short sequences found to be enriched in the MacAlpine study may promote this DNA structure. Two studies examined whether replication origins mapped to evolutionarily conserved regions (CRs), and both observed significant enrichment (CADORET et al. 2008; KARNANI et al. 2010). Karnani et al observed that 50% of their identified origins overlapped with conserved elements defined by the Encyclopedia of DNA Elements (ENCODE) consortium of investigators. Cadoret et al found that 70% of their identified origins overlapped with CRs, lower than the overlap of protein coding exons with CRs (86%) but comparable to promoter regions (72%). It is unclear how much of this evolutionary constraint is due primarily to the replication, transcription, and/or epigenetic functions of these conserved sequences, though each is likely to contribute to the observed conservation.

Replication and transcription

The relationship between replication initiation and transcription is an area of active investigation, and examples from many systems show both negative and positive effects of transcription on replication. Because DNA replication and RNA transcription share the same template, one possibility is that transcriptional elongation sterically inhibits origin licensing or activation. In the DHFR locus, replication initiation normally occurs from one of several potential sites in the 50 kb intergenic spacer between the DHFR gene and the downstream 2BE2121 gene.
When transcription is extended into the non-transcribed spacer via deletion of 3' processing signals, replication initiation is suppressed in the intergenic region and confined to the region immediately adjacent to 2BE2121 (MESNER and HAMLIN 2005). In contrast, when the DHFR promoter is deleted, initiation events can be detected within the DHFR gene, though with lower overall efficiency throughout the region (KALEJTA et al. 1998). There are also several specific examples of a positive relationship between transcription and replication initiation. In some cases, transcriptional regulatory elements are necessary for replication initiation. The LCR in the human β-globin locus and transcription factor binding sites in the c-myc locus play important roles in promoting replication initiation (ALADJEM et al. 1995; LIU et al. 2003). In a more direct example, pre-loading transcription factors onto DNA resulted in site-specific replication in Xenopus eggs (DANIS et al. 2004). Histone modifications associated with open and active chromatin, specifically acetylated H3, were localized to this region and are likely to contribute more directly to origin firing than transcription factor binding itself (replication and histone modifications discussed below). Despite the diverse methods, one of the most striking findings of recent genome-wide mapping studies was the significant number of origins that corresponded to gene regions and, in particular, the transcription start sites (TSS) of active genes. Although examples of replication origins coinciding with transcription units or promoters, such as the human myc locus and human lamin B2 locus, were previously known, it was unclear how representative these individual examples were of all replication origins. In mouse ES cells, 85% of origins map to transcription units and 44% specifically to promoter regions (SEQUEIRA-MENDES et al. 2009).
In comparison to all promoters, these promoter origins were significantly enriched in cap analysis gene expression (CAGE) tags, which mark the 5' end of mRNAs, derived from early embryos. This result suggests that origins correspond to actively transcribed promoters. Furthermore, the promoter-associated origins were found to be the most efficient as assessed by abundance of nascent strands quantified using qPCR (SEQUEIRA-MENDES et al. 2009). The mouse ES cell results are consistent with work from human cells, where 68% of origins were located within 5 kb upstream or downstream of a TSS. In addition, origins were significantly enriched near sites of RNAPII binding (KARNANI et al. 2010). Furthermore, in Drosophila cell culture, two-thirds of ORC binding sites were found to overlap with TSSs, primarily at actively transcribed genes (MACALPINE et al. 2010). The co-localization of replication origins with actively transcribed promoters raises the possibility that specific transcription factors contribute to origin function at many initiation sites. Cadoret et al found that replication origins in human cells were significantly enriched in binding sites for c-JUN and c-FOS (CADORET et al. 2008), which together form the AP-1 complex and regulate a variety of cellular processes such as proliferation, differentiation, and apoptosis. The c-Myc protein has been shown to bind to its own promoter, and mutations in this sequence abolish replication in a plasmid replication assay (ARIGA and IGUCHI-ARIGA 1989). However, Cadoret et al found no significant enrichment of c-Myc binding sites in origins identified in their study (CADORET and PRIOLEAU 2010), and it is possible that a direct role of this protein in replication initiation is limited to the c-Myc locus or a small subset of replication origins.
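The TSS-proximity statistic quoted above (the fraction of origins lying within 5 kb of a TSS) is simple to compute once origin and TSS coordinates are available. The following sketch is illustrative only and is not taken from any of the cited studies; it assumes a single chromosome, represents each origin by a midpoint coordinate, and uses made-up positions. The function name and the default window are assumptions for the example.

```python
from bisect import bisect_left
from typing import List


def fraction_near_tss(origins: List[int], tss: List[int], window: int = 5000) -> float:
    """Fraction of origin midpoints within `window` bp of any TSS.
    All coordinates are assumed to be on the same chromosome."""
    if not origins:
        return 0.0
    sites = sorted(tss)
    near = 0
    for pos in origins:
        i = bisect_left(sites, pos)
        # The nearest TSS is either just left or just right of the insertion point.
        candidates = sites[max(i - 1, 0): i + 1]
        if any(abs(pos - s) <= window for s in candidates):
            near += 1
    return near / len(origins)


# Hypothetical coordinates (bp) for illustration only.
toy_origins = [1000, 12000, 40000, 90000]
toy_tss = [3000, 42000, 88000]
toy_fraction = fraction_near_tss(toy_origins, toy_tss)
```

With these toy coordinates, three of the four origins fall within 5 kb of a TSS. Whether such a fraction reflects enrichment depends on comparison with a background expectation, for example randomly placed intervals, which this sketch does not attempt.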
With over 5000 ORC binding sites to query, MacAlpine et al reasoned they would observe conserved transcription factor binding motifs, or enrichment of specific functional gene categories, if a select group of transcription factors were responsible for ORC localization (MACALPINE et al. 2010). However, they did not observe either of these possibilities, suggesting that specific transcription factors are not generally responsible for origin specification. Instead, specific transcription factors may regulate replication at a small subset of origins. Studies using Drosophila follicle cell gene amplification as a model system have revealed a direct role of transcription factors in DNA replication initiation. The chorion amplicon DAFC-66D is regulated by the E2F, Myb, and Rb complexes (BEALL et al. 2004; BOSCO et al. 2001). For example, hypomorphic mutations in Rb or E2f1 mutations that cannot bind Rb result in inappropriate genomic replication during amplification stages (BOSCO et al. 2001). This result supports a model where E2F1/Rb directly represses replication at DAFC-66D, which is independent of transcriptional regulation, until the appropriate developmental time when Rb is phosphorylated and E2F1 can positively influence amplification. These complexes have been localized to the amplicon by chromatin immunoprecipitation and shown to physically interact with ORC. An interaction between Rb and replication initiation sites has been reported in mammalian cells, with Rb localizing to initiation sites after DNA damage to repress replication (AVNI et al. 2003). There is also evidence that the insect molting hormone ecdysone regulates amplification, as dominant negative mutants of the ecdysone receptor (EcR) display reduced amplification (HACKNEY et al. 2007). These results are consistent with gene amplification in the salivary gland of Sciara coprophila, where ecdysone treatment can induce premature amplification (FOULK et al. 2006).
This study also demonstrated that ScEcR binds this amplification origin in vitro at a putative ecdysone response element. Analysis of the follicle cell amplicon DAFC-62D has revealed a relationship between transcription and MCM loading. DAFC-62D exhibits two separate stages of replication initiation, with a period of elongation in between. When egg chambers were cultured in the presence of the drug α-amanitin, which inhibits RNA polymerase II dependent transcription, the second round of replication initiation was specifically inhibited (XIE and ORR-WEAVER 2008). Transcription inhibition had no effect on ORC localization but specifically inhibited MCM loading at this second stage of initiation. A direct physical interaction between RNAPII and the MCM complex has been reported in yeast, raising the possibility that the transcriptional machinery may also function to promote MCM loading at some replication origins (GAUTHIER et al. 2002; HOLLAND et al. 2002). Up to one-third of origins are not associated with known promoters (KARNANI et al. 2010; MACALPINE et al. 2010), indicating that there are other mechanisms of origin specification that do not involve transcription. Indeed, if replication could only initiate from the site of active transcription, gene-desert regions would be at serious risk of not being fully replicated every cell cycle. Conversely, not all active promoters correspond to replication initiation sites. It remains to be seen what properties permit ORC to bind to active promoter sequences and function as replication origins, though open and active chromatin is one good candidate (discussed below). Furthermore, one advantage of having DNA replication and transcription initiation occur from the same location is that this configuration minimizes the likelihood of head-to-head collisions of the replication and transcription machineries and disruption of both processes. Gene distribution studies in bacteria have revealed that 90% of essential genes in B.
subtilis and 70% of essential genes in E. coli are oriented so that DNA replication and transcription are co-directional (MIRKIN and MIRKIN 2005). It is possible that potentially deleterious consequences of head-on polymerase collisions are reduced by gene orientation in bacteria, where there is just one replication origin. In the larger genomes of higher eukaryotes, the coincidence of replication and transcription start sites may prevent polymerase collision immediately at the initiation site and enable flexibility of spatial replication programs depending on the developmental stage and cell type. In contrast to metazoans, DNA replication initiation sites are located primarily in intergenic regions in budding and fission yeast (RAGHURAMAN et al. 2001; SEGURADO et al. 2003). Additionally, a comparative study of replication origins in Saccharomyces yeast species revealed a significant enrichment of replication origins between convergent transcription units, and when located between tandem transcription units, the replication origin was observed closer to the transcriptional terminator than the promoter (NIEDUSZYNSKI et al. 2006). One possible explanation for this difference from metazoan replication origins is that the well-defined nature of ORC binding in yeast may constrain origin positioning, whereas multicellular organisms require greater flexibility in origin usage and thus have replication origins and active promoters overlap to coordinate replication and transcription.

Replication and chromatin context

DNA replication occurs in the context of chromatin, the combination of DNA and associated histone proteins around which DNA wraps to form nucleosomes. The N-terminal tails of histones are subject to a number of covalent modifications such as acetylation, methylation, and phosphorylation, which induce chromosomal changes that either promote or inhibit various genomic processes such as transcription, replication, and recombination (KOUZARIDES 2007).
These histone modifications have been best characterized with regard to transcription, and there are several well-defined marks characteristic of actively transcribed or repressed chromatin. However, there is also an interest in the relationship between replication and chromatin state. Recent work in budding yeast, in which histone proteins around a single origin were purified and analyzed by high-resolution mass spectrometry to identify all histone modifications throughout the cell cycle, has revealed dynamic acetylation patterns of histones H3 and H4 (UNNIKRISHNAN et al. 2010). Multiply acetylated H3 and H4 are required for efficient origin activation during S phase. Additionally, deletion of the histone deacetylase Rpd3 in budding yeast has been shown to result in early activation of late origins at non-telomere positions (KNOTT et al. 2009). Two independent studies have found enrichment of hyperacetylated H3 and H4 at Drosophila follicle cell amplicons (AGGARWAL and CALVI 2004; HARTL et al. 2007). H4 acetylation did not co-localize with elongating replication forks, indicating that this modification is associated with replication initiation and not histone deposition at newly replicated DNA (HARTL et al. 2007). Loss-of-function mutant clones of the histone deacetylase Rpd3 resulted in increased acetylation levels and showed increased genomic replication in amplification stage egg chambers. Furthermore, using a reporter construct, follicle cell amplification could be inhibited by tethering Rpd3 to the region (AGGARWAL and CALVI 2004). Genome-wide mapping studies found origins to be enriched for specific active histone marks such as H3K4 dimethylation, H3K4 trimethylation, and H3 acetylation (CADORET et al. 2008; KARNANI et al. 2010). In addition, ORC binding sites in Drosophila cell culture are significantly enriched for the histone variant H3.3, which marks active promoters and regulatory sequences (MACALPINE et al. 2010).
Notably, the enrichment of H3.3 is also found at ORC binding sites not associated with active transcription, indicating that this mark may be more general to replication origins in Drosophila and not just the promoter-associated ones. Currently, it remains unclear whether ORC binding is an indirect consequence of local chromatin structure or whether ORC localization is somehow actively regulated. The observation that many regions of active, open chromatin do not contain replication origins or bind ORC implies that there are likely to be additional mechanisms to regulate ORC binding to specific sites. As more functional elements of the genome are mapped, there will be greater understanding of which of these features are linked to DNA replication initiation. An attractive hypothesis for why replication origins are significantly enriched for TSSs is that active transcription necessitates or creates an open chromatin state that is also required for ORC binding or origin activation. Multiple genome-wide studies have found that the origin datasets identified are significantly enriched for DNaseI hypersensitive sites (CADORET et al. 2008; HANSEN et al. 2010; LUCAS et al. 2007). DNA regions are hypersensitive to cleavage by DNaseI when not wrapped in the nucleosome, as when transcription factors displace histone octamers. In Drosophila cell culture, ORC binding sites are significantly depleted for bulk nucleosomes (MACALPINE et al. 2010). These results suggest that accessible DNA may be important for ORC binding or some other step of replication initiation. In budding yeast, nucleosomes are excluded from the ACS and B elements of ARS1 (LIPFORD and BELL 2001). Recent studies have demonstrated that origin sequence is sufficient to maintain nucleosome-free origins on a genome-wide scale, although ORC binding is required for the precise nucleosomal positioning surrounding the origin (EATON et al. 2010).
One possibility is that, in budding yeast, the nucleosome-free region established by origin sequence, in concert with additional mechanisms, is necessary for ORC binding. In metazoans, where there is no consensus origin motif, ORC binding may rely on nucleosome-free DNA established by other means, such as at promoters during transcription.

Replication timing and cell type specific replication programs

Cells have a distinct temporal replication program. That is, there are regions that are consistently replicated early in S phase and others that are replicated late in S phase, which can be assessed across the genome. Work in Drosophila and mammalian cell culture, though not in yeast, shows a correlation between early origin firing and active transcription (MACALPINE et al. 2004; WHITE et al. 2004). This relationship appears to hold for large zones and not necessarily at the level of individual genes. Furthermore, the two alleles of the same gene can display asynchronous replication timing in the case of imprinted genes, where the expressed allele replicates earlier than the silent allele (SINGH et al. 2003). Several reviews discuss the relationship of replication timing and transcription based on analyses of model replication origins (ALADJEM 2007; HIRATANI et al. 2009). This section will focus on relevant findings regarding replication and transcription from the genome-wide mapping experiments. The study in Drosophila cell culture identified 630 hydroxyurea-resistant early origins by BrdU immunoprecipitation, and higher density of ORC binding corresponded to earlier replication time (MACALPINE et al. 2010), but Cadoret et al found that replication timing was independent of origin density (CADORET et al. 2008), which may be due to the limited number of genomic regions they assessed. Alternatively, the distinction of ORC binding density versus origin density may be significant.
High density of ORC binding may increase the likelihood that replication will initiate early at this region because there are more potential origins that can be activated. In contrast, the clustering of origins is not necessarily indicative of replication timing. A comprehensive study of replication timing comes from the Repli-Seq method, which provides whole-genome data of newly replicated DNA at six time points spanning the G1/S transition to the end of S phase (HANSEN et al. 2010). Furthermore, because multiple cell types were used, the relationship of cell type specific transcription and DNA replication could be investigated. Similar to previous studies, regions were identified where a gene that is exclusively expressed in one cell type is early replicating, but it is late replicating in the cell types where it is not expressed (HATTON et al. 1988). Additional examples of asynchronous replication based on allelic expression were also identified. In terms of replication features, constant early replication regions are associated with high gene density, gene expression, Alu density, GC content, and CpG density, all of which are consistent with the previously reported findings of transcription and early replication. In addition, genomic regions associated with the nuclear lamina, which are physically segregated from the internal nuclear compartment, are late replicating. By looking at pair-wise combinations of cell types for differences in replication timing, Hansen et al observed that 50% of the human genome displayed plasticity, or variability in replication timing (HANSEN et al. 2010). One intriguing hypothesis is that this plasticity is due to the different gene expression patterns among the four cell types, and genomic regions containing genes that are more similarly expressed in terms of timing and abundance may display uniform replication timing among cell types. Additional analysis will be required to investigate this possibility.
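The pair-wise comparison underlying the plasticity estimate can be made concrete with a short sketch. This is not the procedure used by Hansen et al.; it assumes that each cell type has a replication time value per genomic bin on a 0 (early) to 1 (late) scale, and it calls a bin plastic if any pair of cell types differs by at least a chosen threshold. The function name, the threshold, the cell type labels, and the toy values are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def plastic_fraction(timing: Dict[str, List[float]], min_diff: float) -> float:
    """Fraction of bins whose replication time differs by at least
    `min_diff` between at least one pair of cell types."""
    names = list(timing)
    n_bins = len(timing[names[0]])
    plastic = 0
    for b in range(n_bins):
        # Compare every pair of cell types at this bin.
        if any(
            abs(timing[a][b] - timing[c][b]) >= min_diff
            for a, c in combinations(names, 2)
        ):
            plastic += 1
    return plastic / n_bins


# Hypothetical per-bin replication times (0 = early, 1 = late).
toy_timing = {
    "lymphoid": [0.1, 0.5, 0.9, 0.2],
    "fibroblast": [0.1, 0.6, 0.3, 0.2],
    "hESC": [0.2, 0.5, 0.9, 0.8],
}
toy_plasticity = plastic_fraction(toy_timing, min_diff=0.5)
```

In this toy example the first two bins replicate at similar times in all three cell types, while the last two bins each switch between early and late in one cell type, so half of the bins are scored as plastic. The result is sensitive to the threshold and to bin size, which is one reason a reported plasticity figure should be read together with the criteria used to define it.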
Despite the abundance of information that recent genome-wide mapping studies have provided, one of their common limitations is that the experiments are performed in cell culture, where replication events cannot be directly studied at the time of origin firing. Gene amplification in Drosophila follicle cells offers a powerful model for metazoan DNA replication without these limitations. Also, regarding the question of replication plasticity, the mapped amplification origins do not correspond to origins used in the canonical S phase, which provides a model system to examine cell type specific replication programs and why replication origins used in one cell type are or are not used in other cell types.

The importance of cataloging replication origins

Experimental innovations made possible by complete genome sequences have led to an approximate 100-fold increase in the number of identified metazoan replication origins. With improved technology and reduction in the cost of high-throughput sequencing to apply to whole-genome experiments, hundreds, if not thousands, more replication origins will be identified. This invites an evaluation of why identifying replication origins is important and what additional information will be gained from this endeavor. There are at least two motivations. First, identifying replication origins on a genome-wide scale will provide a picture of how chromosomes are normally replicated in S phase: what portion of the genome shows origin clustering versus dispersed distribution of origins? What areas of the genome show large inter-origin distance? How large can inter-origin distance maximally be? A catalog of replication origins will enable a more direct investigation of the mechanisms regulating replication initiation for a diverse set of origins. Second, once we know what the typical spatial replication program looks like for a given cell type, we can examine the situation in abnormal or disease-causing states.
For example, recent work sequencing cancer genomes has revealed that amplification of gene regions and chromosomal rearrangements are widespread (CAMPBELL et al. 2008; STEPHENS et al. 2009). In addition, cancer cells often show de-regulation of replication components (PETROPOULOU et al. 2008). One possibility is that re-replication at origins results in increased DNA copy levels that are subsequently retained in the chromosome, possibly via recombination or non-homologous end joining at sites of DNA breaks. It will be interesting to see if the regions most susceptible to amplification in cancer cells also contain replication origins in the untransformed cell state. It also will be important to assess whether common amplifications or rearrangements found in cancer cells are due to the distinct spatial and temporal replication program of the cell type from which they originate.

For regions of low origin density, how do cells ensure the region will be fully replicated? During the cell cycle, the entire genome must be fully replicated for each daughter cell to inherit a complete and accurate copy of the genome. Regions where origins are far apart may be vulnerable to incomplete replication if replication forks collapse and cannot restart or if no other origins are activated in the intervening region. Studies have shown that replicon size can be as large as 500 kb (CADORET et al. 2008). Future studies can examine whether regions of sparse origin density are particularly susceptible to chromosomal loss. Potentially vulnerable regions of sparse origin density can be used as models to study how intra-S phase checkpoints are activated when genomic replication is incomplete.

Summary of thesis

This thesis investigates the properties and regulation of metazoan DNA replication origins using Drosophila follicle cell gene amplification as a model system.
We integrate genome-wide analyses with detailed molecular characterization of individual origins to elucidate the properties that enable specific genomic regions to serve as replication origins. In Chapter 2, we use a comparative genomic hybridization strategy to identify all of the follicle cell amplicons and identify two new follicle cell amplicons. As gene amplification is typically a strategy to augment gene expression, we examine the relationship between the amplified regions and gene expression. We also determine the localization of ORC on a genome-wide scale. We confirm that H4 acetylation is enriched in amplified regions but find that it is not deterministic of amplification. Instead, levels of H4 acetylation appear to correspond to the magnitude of gene amplification. We also identify an amplicon, DAFC-22B, that displays strain-specific amplification and use it to study the requirements of ORC binding. In Chapter 3, we characterize DAFC-34B in detail. We find that amplification may not fit the model of augmenting gene expression as one of the genes in the region, encoding a vitelline envelope protein of the eggshell, is expressed prior to amplification stages whereas another gene is expressed in only a small subset of follicle cells. DAFC-34B displays two rounds of gene amplification, but unlike the previously characterized DAFC-62D, the second round of replication initiation is not dependent on transcription. We map the amplification origin to a 1 kb region using nascent strand analysis and find that the origin corresponds to the transcription unit of the vitelline membrane gene. We find that ORC binds in a broad 10 kb zone at the first stage of amplification but is surprisingly absent in subsequent stages, despite a second round of replication initiation. We determine the cis requirements for amplification and find that a 6 kb region is sufficient for amplification at an ectopic site. 
This work highlights the power of using Drosophila follicle cell amplification to study replication, as both genome-wide and individual analyses of replication origins can be performed to study what properties confer origin function and what strategies are used to regulate replication initiation.

REFERENCES

AGGARWAL, B. D., and B. R. CALVI, 2004 Chromatin regulates origin activity in Drosophila follicle cells. Nature 430: 372-376. ALADJEM, M. I., 2007 Replication in context: dynamic regulation of DNA replication patterns in metazoans. Nat Rev Genet 8: 588-600. ALADJEM, M. I., A. FALASCHI and D. KOWALSKI, 2006 Eukaryotic DNA Replication Origins in DNA Replication and Human Disease, edited by M. L. DEPAMPHILIS. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. ALADJEM, M. I., M. GROUDINE, L. L. BRODY, E. S. DIEKEN, R. E. FOURNIER et al., 1995 Participation of the human beta-globin locus control region in initiation of DNA replication. Science 270: 815-819. ALADJEM, M. I., L. W. RODEWALD, J. L. KOLMAN and G. M. WAHL, 1998 Genetic dissection of a mammalian replicator in the human beta-globin locus. Science 281: 1005-1009. ARIAS, E. E., and J. C. WALTER, 2007 Strength in numbers: preventing rereplication via multiple mechanisms in eukaryotic cells. Genes Dev 21: 497-518. ARIGA, H., and M. M. IGUCHI-ARIGA, 1989 [DNA replication and RNA transcription regulated by c-myc protein]. Tanpakushitsu Kakusan Koso 34: 1163-1174. AVNI, D., H. YANG, F. MARTELLI, F. HOFMANN, W. M. ELSHAMY et al., 2003 Active localization of the retinoblastoma protein in chromatin and its response to S phase DNA damage. Mol Cell 12: 735-746. BEALL, E. L., M. BELL, D. GEORLETTE and M. R. BOTCHAN, 2004 Dm-myb mutant lethality in Drosophila is dependent upon mip130: positive and negative regulation of DNA replication. Genes Dev 18: 1667-1680. BELL, S. P., and A. DUTTA, 2002 DNA replication in eukaryotic cells. Annu Rev Biochem 71: 333-374. BELL, S. P., and B.
STILLMAN, 1992 ATP-dependent recognition of eukaryotic origins of DNA replication by a multiprotein complex. Nature 357: 128-134. Bosco, G., W. Du and T. L. ORR-WEAVER, 2001 DNA replication control through interaction of E2F-RB and the origin recognition complex. Nat Cell Biol 3: 289-295. BREWER, B. J., and W. L. FANGMAN, 1987 The localization of replication origins on ARS plasmids in S. cerevisiae. Cell 51: 463-471. CADORET, J. C., F. MEISCH, V. HASSAN-ZADEH, I. LUYTEN, C. GUILLET et al., 2008 Genomewide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci U S A 105: 15837-15842. CADORET, J. C., and M. N. PRIOLEAU, 2010 Genome-wide approaches to determining origin distribution. Chromosome Res 18: 79-89. CAMPBELL, P. J., P. J. STEPHENS, E. D. PLEASANCE, S. O'MEARA, H. LI et al., 2008 Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet 40: 722-729. CARMINATI, J. L., C. G. JOHNSTON and T. L. ORR-WEAVER, 1992 The Drosophila ACE3 chorion element autonomously induces amplification. Mol Cell Biol 12: 2444-2453. CLAYCOMB, J. M., D. M. MACALPINE, J. G. EVANS, S. P. BELL and T. L. ORR-WEAVER, 2002 Visualization of replication initiation and elongation in Drosophila. J Cell Biol 159: 225236. CLAYCOMB, J. M., and T. L. ORR-WEAVER, 2005 Developmental gene amplification: insights into DNA replication and gene expression. Trends Genet 21: 149-162. DANIS, E., K. BRODOLIN, S. MENUT, D. MAIORANO, C. GIRARD-REYDET et al., 2004 Specification of a DNA replication origin by a transcription complex. Nat Cell Biol 6: 721-730. EATON, M. L., K. GALANI, S. KANG, S. P. BELL and D. M. MACALPINE, 2010 Conserved nucleosome positioning defines replication origins. Genes Dev 24: 748-753. FANGMAN, W. L., and B. J. BREWER, 1991 Activation of replication origins within yeast chromosomes. Annu Rev Cell Biol 7: 375-402. FOULK, M. S., C. LIANG, N. Wu, H. G. 
BLITZBLAU, H. SMITH et al., 2006 Ecdysone induces transcription and amplification in Sciara coprophila DNA puff II/9A. Dev Biol 299: 151-163. GAUTHIER, L., R. DZIAK, D. J. KRAMER, D. LEISHMAN, X. SONG et al., 2002 The role of the carboxyterminal domain of RNA polymerase II in regulating origins of DNA replication in Saccharomyces cerevisiae. Genetics 162: 1117-1129. GILBERT, D., and S. N. COHEN, 1989 Autonomous replication in mouse cells: a correction. Cell 56: 143-144. HACKNEY, J. F., C. PUCCI, E. NAES and L. DOBENS, 2007 Ras signaling modulates activity of the ecdysone receptor EcR during cell migration in the Drosophila ovary. Dev Dyn 236: 1213-1226. HAMLIN, J. L., L. D. MESNER and P. A. DIJKWEL, 2010 A winding road to origin discovery. Chromosome Res 18: 45-61. HANSEN, R. S., S. THOMAS, R. SANDSTROM, T. K. CANFIELD, R. E. THURMAN et al., 2010 Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 107: 139-144. HARLAND, R. M., and R. A. LASKEY, 1980 Regulated replication of DNA microinjected into eggs of Xenopus laevis. Cell 21: 761-771. HARTL, T., C. BOSWELL, T. L. ORR-WEAVER and G. BOSCO, 2007 Developmentally regulated histone modifications in Drosophila follicle cells: initiation of gene amplification is associated with histone H3 and H4 hyperacetylation and H1 phosphorylation. Chromosoma 116: 197-214. HATTON, K. S., V. DHAR, E. H. BROWN, M. A. IQBAL, S. STUART et al., 1988 Replication program of active and inactive multigene families in mammalian cells. Mol Cell Biol 8: 2149-2158. HEINTZ, N. H., and J. L. HAMLIN, 1982 An amplified chromosomal sequence that includes the gene for dihydrofolate reductase initiates replication within specific restriction fragments. Proc Natl Acad Sci U S A 79: 4083-4087. HIRATANI, I., S. TAKEBAYASHI, J. LU and D. M. GILBERT, 2009 Replication timing and transcriptional control: beyond cause and effect--part II. Curr Opin Genet Dev 19: 142-149. HOLLAND, L., L. GAUTHIER, P.
BELL-ROGERS and K. YANKULOV, 2002 Distinct parts of minichromosome maintenance protein 2 associate with histone H3/H4 and RNA polymerase II holoenzyme. Eur J Biochem 269: 5192-5202. HUBERMAN, J. A., and A. D. RIGGS, 1968 On the mechanism of DNA replication in mammalian chromosomes. J Mol Biol 32: 327-341. HUBERMAN, J. A., L. D. SPOTILA, K. A. NAWOTKA, S. M. EL-ASSOULI and L. R. DAVIS, 1987 The in vivo replication origin of the yeast 2 microns plasmid. Cell 51: 473-481. JACOB, F., and S. BRENNER, 1963 [On the regulation of DNA synthesis in bacteria: the hypothesis of the replicon.]. C R Hebd Seances Acad Sci 256: 298-300. KALEJTA, R. F., X. LI, L. D. MESNER, P. A. DIJKWEL, H. B. LIN et al., 1998 Distal sequences, but not ori-beta/OBR- 1, are essential for initiation of DNA replication in the Chinese hamster DHFR origin. Mol Cell 2: 797-806. KARNANI, N., C. M. TAYLOR, A. MALHOTRA and A. DUTTA, 2010 Genomic study of replication initiation in human chromosomes reveals the influence of transcription regulation and chromatin structure on origin selection. Mol Biol Cell 21: 393-404. KITSBERG, D., S. SELIG, I. KESHET and H. CEDAR, 1993 Replication structure of the human betaglobin gene domain. Nature 366: 588-590. KNOTT, S. R., C. J. VIGGIANI, S. TAVARE and 0. M. APARICIO, 2009 Genome-wide replication profiles indicate an expansive role for Rpd3L in regulating replication initiation timing or efficiency, and reveal genomic loci of Rpd3 function in Saccharomyces cerevisiae. Genes Dev 23: 1077-1090. KOUZARIDES, T., 2007 Chromatin modifications and their function. Cell 128: 693-705. KRYSAN, P. J., and M. P. CALOS, 1991 Replication initiates at multiple locations on an autonomously replicating plasmid in human cells. Mol Cell Biol 11: 1464-1472. KRYSAN, P. J., J. G. SMITH and M. P. CALOS, 1993 Autonomous replication in human cells of multimers of specific human and bacterial DNA sequences. Mol Cell Biol 13: 26882696. LANDIS, G., R. KELLEY, A. C. SPRADLING and J. 
TOWER, 1997 The k43 gene, required for chorion gene amplification and diploid cell chromosome replication, encodes the Drosophila homolog of yeast origin recognition complex subunit 2. Proc Natl Acad Sci U S A 94: 3888-3892. LANDIS, G., and J. TOWER, 1999 The Drosophila chiffon gene is required for chorion gene amplification, and is related to the yeast Dbf4 regulator of DNA replication and cell cycle. Development 126: 4281-4293. LIPFORD, J. R., and S. P. BELL, 2001 Nucleosomes positioned by ORC facilitate the initiation of DNA replication. Mol Cell 7: 21-30. LIU, G., M. MALOTT and M. LEFFAK, 2003 Multiple functional elements comprise a Mammalian chromosomal replicator. Mol Cell Biol 23: 1832-1842. LUCAS, I., A. PALAKODETI, Y. JIANG, D. J. YOUNG, N. JIANG et al., 2007 High-throughput mapping of origins of replication in human cells. EMBO Rep 8: 770-777. MACALPINE, D. M., H. K. RODRIGUEZ and S. P. BELL, 2004 Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094-3105. MACALPINE, H. K., R. GORDAN, S. K. POWELL, A. J. HARTEMINK and D. M. MACALPINE, 2010 Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20: 201-211. MASUKATA, H., H. SATOH, C. OBUSE and T. OKAZAKI, 1993 Autonomous replication of human chromosomal DNA fragments in human cells. Mol Biol Cell 4: 1121-1132. MELIXETIAN, M., and K. HELIN, 2004 Geminin: a major DNA replication safeguard in higher eukaryotes. Cell Cycle 3: 1002-1004. MESNER, L. D., E. L. CRAWFORD and J. L. HAMLIN, 2006 Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell 21: 719-726. MESNER, L. D., and J. L. HAMLIN, 2005 Specific signals at the 3' end of the DHFR gene define one boundary of the downstream origin of replication. Genes Dev 19: 1053-1066. MESNER, L. D., X. LI, P. A. DIJKWEL and J. L.
HAMLIN, 2003 The dihydrofolate reductase origin of replication does not contain any nonredundant genetic elements required for origin activity. Mol Cell Biol 23: 804-814. MILBRANDT, J. D., N. H. HEINTZ, W. C. WHITE, S. M. ROTHMAN and J. L. HAMLIN, 1981 Methotrexate-resistant Chinese hamster ovary cells have amplified a 135-kilobase-pair region that includes the dihydrofolate reductase gene. Proc Natl Acad Sci U S A 78: 6043-6047. MIRKIN, E. V., and S. M. MIRKIN, 2005 Mechanisms of transcription-replication collisions in bacteria. Mol Cell Biol 25: 888-895. NIEDUSZYNSKI, C. A., Y. KNOX and A. D. DONALDSON, 2006 Genome-wide identification of replication origins in yeast by comparative genomics. Genes Dev 20: 1874-1879. OKUNO, Y., T. OKAZAKI and H. MASUKATA, 1997 Identification of a predominant replication origin in fission yeast. Nucleic Acids Res 25: 530-537. PETROPOULOU, C., P. KOTANTAKI, D. KARAMITROS and S. TARAVIRAS, 2008 Cdt1 and Geminin in cancer: markers or triggers of malignant transformation? Front Biosci 13: 4485-4494. RAGHURAMAN, M. K., E. A. WINZELER, D. COLLINGWOOD, S. HUNT, L. WODICKA et al., 2001 Replication dynamics of the yeast genome. Science 294: 115-121. RAO, H., and B. STILLMAN, 1995 The origin recognition complex interacts with a bipartite DNA binding site within yeast replicators. Proc Natl Acad Sci U S A 92: 2224-2228. REMUS, D., E. L. BEALL and M. R. BOTCHAN, 2004 DNA topology, not DNA sequence, is a critical determinant for Drosophila ORC-DNA binding. Embo J 23: 897-907. SCHAARSCHMIDT, D., J. BALTIN, I. M. STEHLE, H. J. LIPPS and R. KNIPPERS, 2004 An episomal mammalian replicon: sequence-independent binding of the origin recognition complex. Embo J 23: 191-201. SCHWED, G., N. MAY, Y. PECHERSKY and B. R. CALVI, 2002 Drosophila minichromosome maintenance 6 is required for chorion gene amplification and genomic replication. Mol Biol Cell 13: 607-620. SEGURADO, M., A. DE LUIS and F.
ANTEQUERA, 2003 Genome-wide distribution of DNA replication origins at A+T-rich islands in Schizosaccharomyces pombe. EMBO Rep 4: 1048-1053. SEQUEIRA-MENDES, J., R. DIAZ-URIARTE, A. APEDAILE, D. HUNTLEY, N. BROCKDORFF et al., 2009 Transcription initiation activity sets replication origin efficiency in mammalian cells. PLoS Genet 5: e1000446. SINGH, N., F. A. EBRAHIMI, A. A. GIMELBRANT, A. W. ENSMINGER, M. R. TACKETT et al., 2003 Coordination of the random asynchronous replication of autosomal loci. Nat Genet 33: 339-341. SPRADLING, A., 1993 Developmental Genetics of Oogenesis in The Development ofDrosophila melanogaster,edited by M. BATE and A. MARTINEz ARIAS. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. STEPHENS, P. J., D. J. MCBRIDE, M. L. LIN, I. VARELA, E. D. PLEASANCE et al., 2009 Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462: 10051010. STINCHCOMB, D. T., M. THOMAS, J. KELLY, E. SELKER and R. W. DAVIS, 1980 Eukaryotic DNA segments capable of autonomous replication in yeast. Proc Natl Acad Sci U S A 77: 4559-4563. TANAKA, S., T. UMEMORI, K. HIRAI, S. MURAMATSU, Y. KAMIMURA et al., 2007 CDK-dependent phosphorylation of Sld2 and Sld3 initiates DNA replication in budding yeast. Nature 445: 328-332. TAYLOR, J. H., 1968 Rates of chain growth and units of replication in DNA of mammalian chromosomes. J Mol Biol 31: 579-594. THOMER, M., N. R. MAY, B. D. AGGARWAL, G. KwOK and B. R. CALVI, 2004 Drosophila double- parked is sufficient to induce re-replication during development and is regulated by cyclin E/CDK2. Development 131: 4807-4818. UNNIKRISHNAN, A., P. R. GAFKEN and T. TSUKIYAMA, 2010 Dynamic changes in histone acetylation regulate origins of DNA replication. Nat Struct Mol Biol 17: 430-437. WALTER, J. C., and H. ARAKI, 2006 Activation of Pre-replication Complexes in DNA Replication and Human Disease, edited by M. L. DEPAMPHILIS. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. WHITE, E. J., 0. 
EMANUELSSON, D. SCALZO, T. ROYCE, S. KOSAK et al., 2004 DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states. Proc Natl Acad Sci U S A 101: 17771-17776. WYRICK, J. J., J. G. APARICIO, T. CHEN, J. D. BARNETT, E. G. JENNINGS et al., 2001 Genome-wide distribution of ORC and MCM proteins in S. cerevisiae: high-resolution mapping of replication origins. Science 294: 2357-2360. XIE, F., and T. L. ORR-WEAVER, 2008 Isolation of a Drosophila amplification origin developmentally activated by transcription. Proc Natl Acad Sci U S A 105: 9651-9656. YANOW, S. K., Z. LYGEROU and P. NURSE, 2001 Expression of Cdc18/Cdc6 and Cdt1 during G2 phase induces initiation of DNA replication. Embo J 20: 4648-4656. ZEGERMAN, P., and J. F. DIFFLEY, 2007 Phosphorylation of Sld2 and Sld3 by cyclin-dependent kinases promotes DNA replication in budding yeast. Nature 445: 281-285. ZHONG, W., H. FENG, F. E. SANTIAGO and E. T. KIPREOS, 2003 CUL-4 ubiquitin ligase maintains genome stability by restraining DNA-replication licensing. Nature 423: 885-889.

Chapter Two: Genome-wide identification of Drosophila follicle cell amplicons as in vivo model replicons

Jane C. Kim, Jared Nordman, Fang Xie, Helena Kashevsky, Thomas Eng, and Terry L. Orr-Weaver

Whitehead Institute and Dept. of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142

J.N. performed the 16C follicle cell RNA-seq experiment and contributed to data analysis. F.X. performed and analyzed histone modification ChIP-qPCR and tethering experiments. H.K. performed the aCGH experiment of OrRTOW flies. T.E. performed the aCGH experiment of OrRMOD flies. J.K. performed all other experiments and data analysis.

ABSTRACT

We report the genome-wide identification of all follicle cell amplicons in Drosophila, uncovering two new amplified regions.
We determined the precise localization of the origin recognition complex (ORC) on a genome-wide level and observed that, at the start of synchronous amplification, ORC localizes to the six amplicons with levels corresponding to the magnitude of amplification. Additionally, we investigated amplification with respect to transcription and chromatin state. The levels and timing of gene expression in some amplicons suggest that gene amplification is not exclusively a developmental strategy to promote high expression levels. Follicle cell amplicons are enriched for tetra-acetylated H4, but this mark is not sufficient for ORC localization or amplification. In addition to genome-wide analyses, we investigated the replication properties of one new amplicon in finer molecular detail. Strikingly, DAFC-22B shows strain-specificity in amplification, a property that is correlated with the ability to localize ORC. We identified sequence differences between closely related amplifying and non-amplifying strains and used P element mediated transformation to test sufficiency for ORC binding and amplification at this region.

INTRODUCTION

The initiation of DNA replication occurs from discrete genomic regions called replication origins. Although the protein complexes that bind DNA and license origins for replication are known and well conserved, the properties that enable a DNA sequence to function as a replication origin in metazoans are less well defined than in simpler eukaryotes (CVETIC and WALTER 2005; GILBERT 2004). One reason is that analysis of metazoan replication origins has, until recently, proceeded from molecular characterization of a small number of identified origins (ALADJEM 2007). For example, the DHFR locus in Chinese hamster ovary cells is a valuable model replication origin and has been analyzed using a variety of methods; replication at this locus initiates in a broad initiation zone containing multiple inefficient initiation sites (HAMLIN et al. 2010).
Recent genome-wide origin mapping studies in Drosophila, mouse, and human cell culture have greatly increased the number of identified metazoan origins (CADORET et al. 2008; KARNANI et al. 2010; MACALPINE et al. 2010; SEQUEIRA-MENDES et al. 2009). These studies revealed that most origins coincide with active transcription units, specifically the transcription start site, and with marks for open chromatin. They also confirmed findings from model replication origins that there is no sequence-specific motif for ORC binding or origin specification in metazoans.

Gene amplification that is developmentally regulated is an important origin discovery tool and model for investigating the regulation of metazoan replication origins in vivo (CLAYCOMB and ORR-WEAVER 2005). Specific genomic regions are amplified through replication-based mechanisms, either chromosomal excision followed by extra-chromosomal amplification or repeated bidirectional replication from an endogenous chromosomal locus, to increase gene copy number. Developmental gene amplification increases the DNA template to allow sufficient production of gene products required at high levels, such as rRNA in frog oocytes and cocoon proteins in Sciarid fly salivary glands. In Drosophila, two chorion gene clusters are amplified by repeated origin activation at the endogenous locus in ovarian follicle cells, the somatic epithelial cells that surround the oocyte and secrete the components of the eggshell (OSHEIM et al. 1988; SPRADLING 1981). This process enables the eggshell proteins to be produced in a short developmental period.

Several features make Drosophila follicle cell gene amplification a powerful model for investigating metazoan origin function. First, the process occurs within the context of developing egg chambers that are morphologically distinct and can be isolated for experimental analysis, allowing replication events to be studied in the context of development.
Second, because gene amplification begins after genomic replication has shut off in development, methods such as quantitative PCR or immunofluorescence detection of the nucleotide analog bromodeoxyuridine (BrdU), which visualizes newly replicated DNA, can be used to assess the precise timing of replication events. The third chromosome chorion amplicon displays an early phase when replication initiation occurs, resulting in approximately 60-fold amplification of the locus. At subsequent stages, there are no further initiation events and only elongation of existing replication forks (CLAYCOMB et al. 2002). Finally, Drosophila genetic tools and manipulation allow one to test cis requirements for gene amplification. Unlike the DHFR locus, the third chromosome chorion amplicon is an example of a confined site-specific replication origin. A discrete 320 base pair amplification control element, ACE3, and a non-contiguous 884 base pair origin, ori-β, are sufficient for amplification (LU et al. 2001).

Because the chorion amplicons have served as important replication models, Claycomb et al. undertook a microarray strategy to identify additional follicle cell amplicons (CLAYCOMB et al. 2004). Two were identified and designated Drosophila Amplicon in Follicle Cells (DAFC) followed by the cytological position, DAFC-30B and DAFC-62D. Recent analysis of DAFC-62D has shown it to be a unique replication model. Unlike DAFC-66D, the third chromosome chorion amplicon, DAFC-62D displays two stages of replication initiation with a period of elongation in between. The second round of replication initiation is dependent on RNAPII transcription for localization of the MCM2-7 helicase complex (XIE and ORR-WEAVER 2008). In contrast, ORC localizes to DAFC-62D throughout all amplification stages and is unaffected by transcription inhibition. These studies exemplify how identification and molecular characterization of follicle cell amplicons can uncover new regulatory mechanisms of DNA replication.
To identify additional amplification origins to be used as model replicons, we expanded our follicle cell amplicon discovery approach, as the cDNA microarrays used in the 2004 study contained fewer than 50% of Drosophila genes and no intergenic regions. Here we report the identification of all follicle cell amplicons using high-density microarrays, uncovering two new amplicons to make a total of six follicle cell amplicons. With a complete catalog of the follicle cell amplicons, we investigate how gene amplification is related to or possibly influenced by aspects of the genomic landscape such as transcription and histone modifications as an in vivo complement to genome-wide analyses of metazoan replication origins in cell culture. In addition, we investigate the determinants of ORC binding at one amplification origin by exploiting its property of strain-specific amplification.

RESULTS

Identification of two new follicle cell amplicons by aCGH

To identify all of the amplified regions in follicle cells, we employed an array-based comparative genomic hybridization (aCGH) strategy. Before follicle cells undergo gene amplification, they undergo three rounds of endoreduplication, or chromosomal replication without intervening mitoses, to reach 16C copy levels. We isolated pure populations of follicle cell nuclei enriched for gene amplification stages by performing flow cytometry to collect 16C nuclei. DNA from these collections was competitively hybridized with diploid embryonic DNA on genome-wide tiling microarrays containing one 60-mer probe approximately every 600 bp. Because of this high-density coverage, we are confident that we have identified all of the amplicons present in these cells. Using this approach, we confirmed the presence of the four previously known follicle cell amplicons and identified two new follicle cell amplicons (Figure 2-1A).
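The arithmetic behind the sample preparation and array design described above can be made explicit with a short sketch. The function names are illustrative and are not part of the original analysis. The 100 kb length used in the example is the approximate span reported for the amplified gradients, and the probe count is only an estimate because the probe spacing is itself approximate.

```python
def ploidy_after_endocycles(rounds, starting_ploidy=2):
    """DNA content (in C) after whole-genome doublings without mitosis.

    A diploid G1 cell is 2C; each endocycle doubles the DNA content.
    """
    return starting_ploidy * 2 ** rounds


def probes_in_region(region_bp, probe_spacing_bp=600):
    """Approximate number of array probes tiling a region.

    Assumes evenly spaced probes, which is a simplification of
    "approximately one probe every 600 bp".
    """
    return region_bp // probe_spacing_bp


# Three endocycles take a 2C follicle cell to 16C.
print(ploidy_after_endocycles(3))

# Rough number of probes reporting on a 100 kb amplified gradient.
print(probes_in_region(100_000))
```

Under these assumptions, an amplified gradient of this size is sampled by well over a hundred probes, which is the basis for expecting that an amplicon of comparable size would not be missed by the array.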
Like the four previously identified amplicons, DAFC-22B and DAFC-34B show a gradient of replicated DNA that spans approximately 100 kb (Figures 2-1B and 2-1C). The maximum aCGH enrichment ratios for DAFC-22B and DAFC-34B are less than those of the chorion amplicons but comparable to DAFC-30B and DAFC-62D (Table 2-1). As expected, the genomic regions in DAFC-22B and DAFC-34B contain genes. The 20 genes within the amplified region of DAFC-34B are generally less than 5 kb in length, and one gene, Vm34Ca, located in the peak of amplification encodes a structural component of the vitelline membrane, the innermost layer underlying the Drosophila eggshell. Surprisingly, there is one 60 kb gene, CG7337, in the most amplified region of DAFC-22B, thus differing from the genomic organization of the other characterized follicle cell amplicons, which contain small genes encoding eggshell proteins and enzymes that are approximately 1 kb in length.

Figure 2-1. Genome-wide identification of Drosophila follicle cell amplicons by aCGH identifies two new amplified regions. DNA from 16C follicle cells was competitively hybridized with diploid embryonic DNA to microarrays with approximately one probe every 600 bp. The Y-axis represents the log2 ratio of 16C follicle cell DNA compared to diploid embryonic DNA. Entire chromosome arms are shown in (A). The newly identified amplicons DAFC-22B and DAFC-34B are marked in red. (B) Close up 150 kb view of DAFC-22B. CG7337 spans the entire length of the amplified region. (C) Close up 150 kb view of DAFC-34B. Vm34Ca encodes a structural component of the vitelline membrane.

Table 2-1. Drosophila Amplicons in Follicle Cells.

Genomic position   CGH max (Log2)   ORC2 max (Log2)   acH4 max (Log2)*
DAFC-22B           1.5503           1.8736            1.1593
DAFC-30B           1.5354           2.1616            1.4092
DAFC-34B           2.338            2.5568            1.4569
DAFC-62D           1.4083           1.9345            1.3374
DAFC-66D           4.9087           4.3749            3.6094
DAFC-7F            3.6786           2.4783            2.33

* Maximum tetra-acetylated H4 ChIP value that corresponds to the ORC binding zone.

Genome-wide expression analysis of follicle cells

Gene amplification is typically considered a strategy to augment gene expression to high levels. Female-sterile mutants in replication factors such as ORC2, MCM6, and DBF4 lay eggs with a thin eggshell phenotype attributed to the inadequate transcription of eggshell genes (LANDIS et al. 1997; LANDIS and TOWER 1999; SCHWED et al. 2002). To determine the transcriptional profiles of the genes in the amplified regions, we performed high-throughput RNA sequencing. We isolated RNA from 16C follicle cells recovered by FACS and performed Illumina sequencing to uncover a global view of transcript levels in amplification stage follicle cells. 16C cells will include follicle cells from stage 9 egg chambers, when the cells exit the endocycles, to stage 14, immediately before egg deposition.
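Because the aCGH maxima in Table 2-1 are reported as log2 ratios, it can help to see them on a linear scale. The following minimal sketch converts the CGH max column to fold enrichment and ranks the amplicons. The variable and function names are illustrative, and the resulting numbers are array enrichment ratios relative to the diploid reference, which need not equal absolute copy number.

```python
# Maximum aCGH log2 ratios (16C follicle cell DNA vs. diploid embryonic DNA),
# copied from the "CGH max (Log2)" column of Table 2-1.
CGH_MAX_LOG2 = {
    "DAFC-22B": 1.5503,
    "DAFC-30B": 1.5354,
    "DAFC-34B": 2.338,
    "DAFC-62D": 1.4083,
    "DAFC-66D": 4.9087,
    "DAFC-7F": 3.6786,
}


def log2_ratio_to_fold(log2_ratio):
    """Convert a log2 enrichment ratio to a linear fold enrichment."""
    return 2.0 ** log2_ratio


fold_enrichment = {
    name: log2_ratio_to_fold(value) for name, value in CGH_MAX_LOG2.items()
}

# Amplicons ordered from highest to lowest maximum enrichment.
ranked = sorted(fold_enrichment, key=fold_enrichment.get, reverse=True)

for name in ranked:
    print(f"{name}: {fold_enrichment[name]:.1f}-fold")
```

On the linear scale, the two chorion amplicons (DAFC-66D and DAFC-7F) stand apart from the other four, matching the comparison drawn in the text.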
Transcript profiles for DAFC-66D and DAFC-7F show high expression levels of the chorion genes in these regions, confirming that our RNA-seq results are an accurate reflection of amplification stage follicle cells (Figures 2-2E and 2-2F). For example, at DAFC-66D, cp18, cp15, cp19, and cp16 are all highly expressed. Although these four genes are located in the same 10 kb most amplified region, cp16 shows at least six-fold less expression compared to the other three by reads per kilobase of exon model per million mapped reads (RPKM) values. These results correspond precisely with gene expression studies using developmental Northern blots and in situ hybridization experiments, thus validating the accuracy of RNA-seq quantification (GRIFFIN-SHEA et al. 1982). The same is true of the genes in the most amplified region of DAFC-7F; they are highly expressed but exhibit variable levels consistent with Northern blot analysis (PARKS et al. 1986). Other genes within the 100 kb amplified gradients of DAFC-66D or DAFC-7F were expressed at low levels. Thus, being located within an amplicon will not, by default, activate transcription nor will it, for expressed genes, enhance expression to uniform levels. Although gene amplification may promote high expression levels of some genes, there are additional regulatory mechanisms that fine-tune the levels, developmental timing, and spatial specificity of gene expression.

Figure 2-2. Follicle cell expression levels vary among the genes located in the six amplified regions. 200 kb regions of all follicle cell amplicons (A-F) with sequence reads from RNA-seq above and aCGH data below. Maximum sequence read is 100 for DAFC-22B, DAFC-30B, and DAFC-62D. Maximum sequence read is 1000 for DAFC-34B, DAFC-66D, and DAFC-7F.
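The RPKM values used to compare genes normalize a raw read count by both exon length and sequencing depth. The sketch below implements the standard definition. The function name and the numbers in the example are hypothetical and are not taken from the RNA-seq dataset described here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = (reads mapped to the gene * 1e9)
           / (exon model length in bp * total mapped reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same raw read count but
# different exon lengths, in a library of 10 million mapped reads.
short_gene = rpkm(read_count=500, exon_length_bp=1_000, total_mapped_reads=10_000_000)
long_gene = rpkm(read_count=500, exon_length_bp=5_000, total_mapped_reads=10_000_000)

print(short_gene, long_gene)
```

Because of the length normalization, a short gene and a long gene with equal read counts receive different RPKM values, which is why RPKM rather than raw read count is the appropriate basis for comparing genes of different sizes within the amplified regions.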
The other four amplicons show distinct transcript profiles. Like the chorion amplicons, DAFC-34B shows high expression levels for at least two genes in the amplified region (Figure 2-2C). The vitelline membrane gene, Vm34Ca, had the highest RPKM value in our RNA-seq dataset, though unlike the chorion amplicons, there is not a cluster of highly expressed genes in the 10 kb most amplified region of DAFC-34B. In contrast, the transcript profiles of DAFC-30B and DAFC-62D do not fit the simple model that amplification is a developmental strategy to promote very high expression levels, as these amplicons contain genes that are only moderately expressed (Figures 2-2B and 2-2D). Claycomb et al. showed that genes within DAFC-30B and DAFC-62D such as CG13113 and yellow-g2, respectively, show high expression levels across all follicle cells in only one or two stages of egg chamber development (CLAYCOMB et al.
2004), unlike the chorion genes, which show more sustained high expression levels throughout several stages (PARKS and SPRADLING 1987). These genes show reduced expression in amplification mutants by in situ hybridization, indicating the importance of amplification for normal transcription levels (CLAYCOMB et al. 2004). Thus, the lower RPKM values of these genes may be a reflection of the very narrow developmental window in which they are expressed. Furthermore, DAFC-30B contains genes at the edge of the amplified gradient that are more highly expressed than genes in the central region, a unique feature among the six amplicons. For DAFC-22B, we found that CG7337 is expressed at low levels (Figure 2-2A). While it is possible that gene amplification is necessary for even these low levels of transcription to be met, equivalent gene expression of CG7337 in a background that does not amplify the locus (discussed below) makes this an unlikely explanation. Another possibility is that amplification of this region does not augment gene expression but may be the indirect consequence of transcription or active chromatin marks at this region that result in replication initiation.

Figure 2-3. Gene amplification is not required for high follicle cell gene expression. 200 kb regions of six non-amplified genomic loci (A-F) with highly expressed genes. Sequence reads from RNA-seq are above, and aCGH data are below. 26A and 32E contain vitelline membrane genes homologous to Vm34Ca.
If active transcription promotes specific chromatin modifications or other structural changes that permit gene amplification to occur in follicle cells, we would expect gene regions with highly expressed genes to become amplified. However, this does not appear to be the case, as there are highly expressed genes and gene clusters that are not amplified (Figure 2-3). This observation demonstrates that gene amplification is not the exclusive means by which genes become highly expressed in follicle cells.

ORC binding in amplicons localizes to the most amplified region

ORC function is essential for DNA replication initiation during genomic replication as well as gene amplification. ORC localization marks all potential sites of replication initiation in the genome, and in Drosophila, the chorion amplicon DAFC-66D was the first example of ORC binding at a replication origin in vivo (AUSTIN et al. 1999). Localization of ORC by chromatin immunoprecipitation (ChIP) showed enrichment at the defined replication control elements ACE3 and Ori-β for DAFC-66D. In addition, ORC was found to co-localize with amplification foci by immunofluorescence. However, apart from DAFC-62D, for which a 20 kb region was assessed for ORC binding by ChIP-qPCR, the precise localization of ORC for the other amplicons was unknown.
Therefore, to determine the localization of ORC with respect to the 100 kb amplified gradients, and to see whether other genomic regions, especially those showing high expression levels but no amplification, were bound by ORC, we performed a genome-wide assessment of ORC localization. We hand-isolated egg chambers at stage 10, when synchronous follicle cell gene amplification commences, and performed chromatin immunoprecipitation using a polyclonal antibody specific for ORC2 followed by hybridization to a high-density tiling microarray (ChIP-chip). The starting number of egg chambers (approximately 1200) produced enough material that no amplification had to be performed on the samples. We found that ORC localizes to all follicle cell amplicons in a zone centered at the peak of amplification, ranging from approximately 10 to 30 kb depending on the amplicon. In addition, we found that levels of ORC enrichment generally corresponded to the magnitude of gene amplification (Figure 2-4 and Table 2-1). The log ratios of ORC binding were highest for the two chorion amplicons and DAFC-34B, which also show the greatest magnitude of amplification among the six amplicons. These results are consistent with previous ChIP-qPCR quantification comparing DAFC-62D and DAFC-66D (XIE and ORR-WEAVER 2008). As the status of ORC binding at DAFC-7F and DAFC-30B was previously unknown, and DAFC-22B and DAFC-34B are newly identified amplicons, these studies contribute four new ORC-bound amplification origins to the catalog of metazoan replication origins. As noted above, many genes and gene clusters are highly expressed in follicle cells despite not being amplified. We examined ORC binding at several of these highly expressed regions to see if the absence of ORC binding could explain why these regions are not amplified (Figure 2-5). Apart from a peak of ORC binding at the dec-1 promoter at 7C, we did not observe ORC localization at these highly expressed regions.
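The statement that ORC enrichment generally tracks the magnitude of amplification can be checked informally against the maxima listed in Table 2-1. The sketch below computes a Spearman rank correlation in plain Python. It is an illustration on six summary values, not an analysis performed in the thesis, and the function names are hypothetical.

```python
# Maximum log2 values per amplicon, copied from Table 2-1.
TABLE_2_1 = {
    "DAFC-22B": {"cgh": 1.5503, "orc2": 1.8736, "ach4": 1.1593},
    "DAFC-30B": {"cgh": 1.5354, "orc2": 2.1616, "ach4": 1.4092},
    "DAFC-34B": {"cgh": 2.338, "orc2": 2.5568, "ach4": 1.4569},
    "DAFC-62D": {"cgh": 1.4083, "orc2": 1.9345, "ach4": 1.3374},
    "DAFC-66D": {"cgh": 4.9087, "orc2": 4.3749, "ach4": 3.6094},
    "DAFC-7F": {"cgh": 3.6786, "orc2": 2.4783, "ach4": 2.33},
}


def ranks(values):
    """Return 1-based ascending ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2-1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


cgh = [row["cgh"] for row in TABLE_2_1.values()]
orc2 = [row["orc2"] for row in TABLE_2_1.values()]
ach4 = [row["ach4"] for row in TABLE_2_1.values()]

rho_orc2 = spearman(cgh, orc2)
rho_ach4 = spearman(cgh, ach4)
print(round(rho_orc2, 3), round(rho_ach4, 3))
```

With only six amplicons, a rank statistic of this kind is descriptive rather than a formal test, which fits the hedged wording "generally corresponded" in the text.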
Figure 2-4. The magnitude of gene amplification corresponds to the levels of ORC binding and tetra-acetylated H4. 200 kb regions of all follicle cell amplicons (A-F) with aCGH data, ORC2 ChIP-chip, and tetra-acetylated H4 ChIP-chip, all in log2 ratios. The chorion amplicons (E,F) show the highest levels of increased DNA copy number, indicating repeated rounds of origin firing, which correspond to high levels of ORC and tetra-acetylated H4 enrichment. Y-axis of ChIP-chip data shows log2 ratio of immunoprecipitated DNA to stage 10 input DNA.

We computationally identified the top scoring regions of ORC enrichment and found that, while the amplicons were among the top scoring regions, ORC localized to other discrete genomic regions that were not amplified (Figure 2-6). Although ORC binding is necessary for gene amplification, it is not predictive of replication initiation at a specific site. Other regulatory mechanisms are likely necessary to establish the full pre-replicative complex (pre-RC) and activate replication initiation. Alternatively, ORC localization at these non-amplifying sites may play a role independent of replication initiation. ORC is required for mating type silencing in budding yeast, maintenance of heterochromatic silencing in Drosophila via interaction with HP1/Su(Var)205, and establishment
of cohesin loading in Xenopus egg extracts (BELL et al. 1993; HUANG et al. 1998; PAK et al. 1997; TAKAHASHI et al. 2004). Similarly, ORC may have roles in gene silencing or cohesin establishment or maintenance at these non-amplifying regions.

Figure 2-5. ORC is not enriched at most highly expressed gene regions. 200 kb regions exhibiting high follicle cell expression from Figure 2-3 (A-F) with aCGH data, ORC2 ChIP-chip, and tetra-acetylated H4 ChIP-chip, all in log2 ratios. The maximum log2 enrichment of ORC2 at 7C is 1.5935.

Figure 2-6. ORC binds to genomic regions that do not become amplified. 200 kb regions of non-amplified, top-scoring ORC2 enrichment values (A-F). aCGH data, ORC2 ChIP-chip, and tetra-acetylated H4 ChIP-chip are shown in log2 ratios.
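The text does not describe how the top-scoring regions of ORC enrichment were identified. As one generic illustration of the idea, the sketch below thresholds per-probe log2 ratios, merges nearby above-threshold probes into regions, and ranks regions by their maximum value. The threshold, gap size, function name, and probe values are all hypothetical and are not the procedure used in the thesis.

```python
def call_enriched_regions(probes, threshold=1.0, max_gap=1000):
    """Group above-threshold probes into regions and rank them by peak score.

    probes: list of (position_bp, log2_ratio) on one chromosome, sorted by position.
    Above-threshold probes no more than max_gap bp apart are merged, even if a
    below-threshold probe lies between them.
    Returns a list of (start, end, max_log2_ratio), highest score first.
    """
    regions = []
    current = None  # [start, end, max_score] of the region being extended
    for position, score in probes:
        if score < threshold:
            continue
        if current is not None and position - current[1] <= max_gap:
            current[1] = position
            current[2] = max(current[2], score)
        else:
            if current is not None:
                regions.append(tuple(current))
            current = [position, position, score]
    if current is not None:
        regions.append(tuple(current))
    return sorted(regions, key=lambda region: region[2], reverse=True)


# Hypothetical probe values for illustration only.
example_probes = [
    (1000, 0.2), (1500, 1.4), (2000, 1.8), (2500, 0.3),
    (3000, 1.2), (9000, 2.5), (9500, 2.1), (12000, 0.1),
]
top_regions = call_enriched_regions(example_probes)
print(top_regions)
```

Regions called this way could then be compared with the aCGH profile to separate ORC-bound amplicons from ORC-bound regions that do not amplify.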
H4 acetylation corresponds to the magnitude of gene amplification

DNA replication occurs in the context of chromatin, the combination of DNA and associated histone proteins around which DNA wraps to form nucleosomes. The N-terminal tails of histones are subject to a number of covalent modifications that either promote or inhibit various genomic processes such as transcription, recombination, and replication (KOUZARIDES 2007). Two independent studies have found enrichment of hyperacetylated H3 and H4 at follicle cell amplicons (AGGARWAL and CALVI 2004; HARTL et al. 2007). Loss-of-function mutant clones of the histone deacetylase Rpd3 showed increased acetylation levels and inappropriate genomic replication in amplification stage egg chambers. Furthermore, follicle cell amplification using a reporter construct of DAFC-66D could be inhibited by tethering Rpd3 to the region (AGGARWAL and CALVI 2004). To investigate the relationship between gene amplification and histone acetylation on a genome-wide scale, we performed ChIP on stage 10 egg chambers using an antibody against tetra-acetylated H4, which recognizes acetylated lysines 5, 8, 12, and 16. Enrichment of tetra-acetylated H4 is found at all six amplicons (Figure 2-4). As with ORC, the level of enrichment generally correlates with the magnitude of amplification (Table 2-1). DAFC-66D shows the greatest enrichment of tetra-acetylated H4 whereas DAFC-22B shows the smallest enrichment, though still above background levels. Because this antibody recognizes all four acetylated lysines, we used antibodies specific for single residues to test their correlation with gene amplification. Whereas acetylated H4K5 and H4K12 antibodies showed only modest enrichment at the amplicons by ChIP-qPCR, acetylated H4K8 was highly enriched around the amplification origins, in a pattern resembling tetra-acetylated H4 distribution with levels correlated to magnitude of amplification (Figure 2-7 and H4K12 data not shown).
To test whether acetylation levels, particularly that of H4K8, were necessary for differences in amplification levels, we used the amplification reporter system developed by the Calvi lab to quantitatively measure DNA copy number and enrichment of H4K8 (AGGARWAL and CALVI 2004). The reporter designated TT1 contains the 3.8 kb minimal origin with ACE3 and Ori-β from DAFC-66D next to UAS, which binds GAL4 and GAL4 DNA binding domain (DBD) fusion proteins. Following one-hour heat-shock induction of a GAL4DBD::Rpd3 fusion, TT1 amplification was completely abolished in stage 10 as well as pooled egg chambers of stages 11 and 12, without affecting the endogenous amplicons (Figure 2-8A). TT1 amplification levels were measured by quantitative PCR using a probe specific to the transposon vector (to distinguish between TT1 and endogenous DAFC-66D) compared to a non-amplified locus. We examined stage 10 egg chambers for acetylated H4K8 after induction of the GAL4DBD::Rpd3 fusion and found that this histone mark was significantly reduced at TT1 but essentially unchanged at the endogenous amplicons (Figure 2-8B). Thus, these results indicate that acetylation of H4K8 is necessary for amplification of the TT1 transgene.

Figure 2-7. Levels of H4K8 acetylation correlate with amplification levels. ChIP-qPCR analysis showing enrichment levels of tetra-acetylated H4 (A), acetylated H4K8 (B), and acetylated H4K5 (C) across four amplified regions.

Recruitment of the histone acetyltransferase HAT1 has been reported to enhance amplification using the same reporter assay. In contrast, we found that following one-hour heat-shock induction of a GAL4DBD::HAT1 fusion, there was no effect on TT1 amplification in any stage for three independent experiments (Figure 2-8C). The previous study measured gene amplification by quantifying FISH signals and Southern blot experiments, so it is possible that the discrepancy is due to the greater sensitivity of quantitative PCR in determining DNA copy level
differences.

Figure 2-8. Tethering of Rpd3 to TT1 represses its amplification and H4K8 acetylation whereas tethering of HAT1 to TT1 does not affect its amplification. (A) Stage-specific copy number of TT1 in different genetic backgrounds and experimental conditions. Arrow points to repressed amplification upon Rpd3 expression and tethering to TT1. (B) Levels of H4K8 acetylation at TT1 and representative probes from endogenous amplicons in stage 10. Arrow points to reduction in acetylated H4K8 levels upon Rpd3 expression and tethering to TT1. (C) TT1 amplification is not affected by HAT1 expression and tethering to TT1. (D) Levels of H4K8 acetylation in stage 10. "Reduction" in flies containing the hsp::HAT1 transposon could be due to leaky expression of HAT1, causing genome-wide acetylation. (E) Levels of H4K5 acetylation in stage 10. Arrow points to increased H4K5 acetylation upon HAT1 expression and tethering to TT1.

[Figure 2-8 axes: x, genomic position (amplicon probes); conditions: TT1/+ alone, TT1/+ with hsp::Rpd3, and TT1/+ with hsp::HAT1, each with and without heat shock.]
We performed ChIP-qPCR experiments for acetylated H4K5 and acetylated H4K8 to assess the status of these chromatin marks when HAT1 is tethered to TT1. Flies carrying the GAL4DBD::HAT1 fusion displayed lower levels of acetylated H4K8 enrichment at the endogenous amplicons compared with flies carrying TT1 alone, but they still show the same overall pattern of acetylated H4K8 enrichment corresponding with amplification levels (Figure 2-8D). HAT1 tethering had no effect on acetylated H4K8 levels at TT1. In contrast, acetylated H4K5 is found at very low levels everywhere except at TT1 when HAT1 is induced, where it is significantly enhanced (Figure 2-8E). HAT1 has been reported to catalyze acetylation at H4K5 and H4K12. Thus, the absence of H4K8 hyperacetylation may explain why enhanced amplification was not observed with HAT1 tethering.

DAFC-22B exhibits strain-specific amplification

To characterize further the amplification properties of DAFC-22B, we examined gene amplification at this locus in a number of genetic backgrounds. Surprisingly, we found that DAFC-22B displays strain-specific amplification. By aCGH, we found that even in two closely related "wild type" strains that share all five other amplicons, DAFC-22B amplifies in OrRTOW and is not amplified in OrRMOD (Figure 2-9). To determine the stage-specific replication profile of DAFC-22B, we hand sorted OrRTOW egg chambers to isolate genomic DNA and performed qPCR quantification of DNA copy levels. We found that DNA copy number increased from stage 10B through stage 13, resulting in approximately four-fold amplification by stage 13 (Figure 2-10). In addition, we performed ChIP-chip with ORC2 on the strain that does not amplify DAFC-22B and found that ORC localization is absent at DAFC-22B despite the same ORC localization for the other five amplicons (Figure 2-9).
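Copy number measured by qPCR relative to a non-amplified locus is often computed from the difference in threshold cycles (Ct) between the two probes. The sketch below shows that arithmetic under the assumption of equal, near-perfect primer efficiency (a doubling per cycle). The Ct values and the function name are hypothetical, and the thesis does not state which quantification formula was used.

```python
def relative_copy_number(ct_target, ct_reference, efficiency=2.0):
    """Fold difference in template between a target and a reference locus.

    ct_target: threshold cycle of the amplicon probe
    ct_reference: threshold cycle of the non-amplified control probe
    efficiency: amplification factor per cycle (2.0 = perfect doubling, assumed)
    A lower Ct means more starting template, so the exponent is reference minus target.
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values: the target crosses threshold two cycles earlier.
fold = relative_copy_number(ct_target=22.0, ct_reference=24.0)
print(fold)
```

Under these assumptions, a two-cycle difference corresponds to a four-fold difference in template, the order of magnitude reported for DAFC-22B by stage 13.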
Because the determinants of ORC binding are poorly understood in metazoans, uncovering the difference between the two strains that is responsible for the difference in amplification will be a powerful model for understanding how ORC binds to specific DNA regions to promote replication initiation.

Figure 2-9. DAFC-22B exhibits strain-specific amplification that correlates with the ability of the region to bind ORC. DAFC-22B is not amplified in the OrRMOD strain, and this region does not bind ORC. Tetra-acetylated H4 is observed in both amplifying and non-amplifying strains.

Figure 2-10. DAFC-22B initiates amplification at stage 10B and increases copy number approximately four-fold by stage 13. Genomic DNA was isolated from hand sorted staged egg chambers (A-F), and DNA copy levels were assessed by qPCR compared to a non-amplified locus.

[Figure 2-10: panels for DAFC-22B at stages 10A, 10B, 11, 12, and 13, plus an overlay of all stages; x-axis, genomic position (1880000 to 1960000).]

Recent genome-wide studies from Drosophila to human cell culture have shown that the majority of replication origins correspond to gene regions and specifically the transcription start sites of genes (MACALPINE et al. 2010). Because DAFC-22B coincides with a gene, one isoform of which has its 5' end located in the peak of amplification, we examined stage-specific gene expression of CG7337 in both strains for possible differences that might explain the presence or absence of amplification at the region.
To our surprise, we found that there was no difference in overall CG7337 expression levels between the amplifying and non-amplifying strain for any developmental stage of dissected egg chambers (Figure 2-11). Additionally, we used probes that could distinguish between the A,E-isoforms and D-isoform. Whereas the A,E-isoforms showed equivalent levels between the two strains, the D-isoform was more highly expressed in the non-amplifying OrRMOD strain. Because whole egg chambers contain nurse cells and the oocyte in addition to follicle cells, we isolated purified follicle cell RNA from both whole ovaries and stage 13 egg chambers but detected no difference in expression levels of the D-isoform. Intriguingly, the two strains do differ in CG7337 expression, but the difference does not appear to arise from the follicle cells. We performed ChIP to examine H4 tetra-acetylation at DAFC-22B and observed enrichment in the non-amplifying OrRMOD strain (Figure 2-9). We observed the same result by acetylated H4K8 ChIP-qPCR (Figure 2-12). Thus, although levels of H4 acetylation correspond to the magnitude of gene amplification, H4 acetylation is not sufficient for replication initiation.

Figure 2-11. CG7337 is expressed at similar levels in follicle cells of amplifying and non-amplifying strains. RNA was isolated from hand sorted staged egg chambers. cDNA was quantified using a probe recognizing all five isoforms, the A and E isoforms, or D isoform (two unique probes). To investigate the difference in D isoform levels, purified follicle cells were isolated from stage 13 egg chambers and whole ovaries.
Figure 2-12. H4K8 acetylation levels are equivalent between the amplifying and non-amplifying strains by ChIP-qPCR. Acetylated H4K8 ChIP was performed on both OrRMOD and OrRTOW strains and quantified by qPCR. No difference in enrichment levels was detected between the two strains.

Mapping cis elements responsible for DAFC-22B amplification

To address whether the difference in gene amplification between OrRTOW and OrRMOD could be explained by differential activity of a trans acting factor, we tested whether DAFC-22B amplification in OrRTOW segregated with the second chromosome on which it is located. This test is only definitive if the trans acting factor is not on this chromosome, but consistent with the difference being a cis effect, amplification was found to segregate with the second chromosome (data not shown). In Drosophila, cis requirements for amplification can be determined using P element mediated transformation to examine amplification of a test sequence at an ectopic site. We generated transposons containing the 10 kb ORC binding zone from OrRTOW as well as the equivalent region from OrRMOD by PCR (Figure 2-13). In addition, DAFC-22B is the only amplicon that also shows ORC binding in cell culture, though the ORC binding regions do not appear to overlap between egg chambers and cell culture (Figure 2-13). Thus, we also PCR-amplified the 10 kb sequence containing the cell culture ORC binding sites from OrRTOW to test for amplification. We flanked the 22B sequences with Suppressor of Hairy-wing binding sites to minimize position effect variability (LU and TOWER 1997).
Sequencing these 22B regions revealed no major rearrangements but a number of 20-30 bp insertion/deletion differences between the two strains (data not shown). Analysis of these transgenic flies will provide an important example of the determinants of ORC binding in a metazoan system.

Figure 2-13. Genetic analysis of cis control elements for differential DAFC-22B amplification. The DAFC-22B regions for P element mediated transformation and testing sufficiency for amplification are shown in black bars. Region A was PCR-amplified from OrRTOW (amplifying) and OrRMOD (non-amplifying) flies. Region B was amplified from OrRTOW flies. CGH data and ORC2 ChIP-chip data are shown for reference (scale is log2 ratio). ORC2 ChIP-seq data from Kc cells is also shown (scale is sequence tag density).

DISCUSSION

We have used Drosophila follicle cell gene amplification as a model system to study metazoan replication origins: identifying all follicle cell amplicons, performing genome-wide analyses to look for general relationships of amplification to gene expression and histone acetylation, as well as investigating these relationships in greater detail at the level of individual origins. The molecular and genetic tools available in Drosophila permit incredible dexterity in moving between genome-wide and individual origin analyses, which will likely become more important as the hypotheses generated from genome-wide studies continue to increase. Thus, these ORC-bound Drosophila follicle cell amplicons serve as powerful experimental models for in vivo analysis of replication origin properties and regulatory mechanisms.
In the context of gene amplification, the relationship between replication and transcription has two facets: how replication and increased DNA copy number affect transcription as well as how transcription affects replication. The examples from follicle cell gene amplification reveal that both display complex relationships. First, the identification of all the amplified regions in Drosophila combined with genome-wide assessment of transcript levels reveals that the simple model of gene amplification being a developmental strategy to promote high levels of gene expression, as seems to be the case for the two chorion amplicons, may not be an absolute one. When examined by RNA-seq of 16C follicle cells, the genes in DAFC-30B and DAFC-62D are expressed at moderate levels. However, by in situ hybridization, these genes are highly expressed in one or two stages of egg chamber development, and their levels require the function of replication proteins (CLAYCOMB et al. 2004). Thus, even a four-fold increase in DNA may be important for providing sufficient expression levels of these genes. Notably, high levels of expression do not require gene amplification, as many genes show abundant expression without being amplified. The expression of genes in DAFC-22B and DAFC-34B presents an enigma in explaining how DNA copy number affects transcription. At DAFC-22B, CG7337 is expressed at the same levels regardless of whether the region is amplified. Whether this outcome is due to CG7337 not requiring amplification for expression is unknown. An alternative possibility is that the strains that do not amplify DAFC-22B have a more efficient or active promoter than the strains that do amplify the region. However, given that two of the strains that do and do not amplify DAFC-22B are closely related OrR strains, we favor the explanation that gene amplification is not required for expression of CG7337.
At DAFC-34B, Vm34Ca is present at high levels in our RNA-seq data, but this gene begins to be expressed in stage 8 (MINDRINOS et al. 1985). This is the first example of a gene in a follicle cell amplicon that is expressed before synchronous amplification begins and raises the question of whether gene amplification is necessary for Vm34Ca expression. Another possibility is that high levels of Vm34Ca expression promote gene amplification at this locus. Furthermore, there are at least three other homologous vitelline membrane genes in the Drosophila genome, two of which are in a cluster at cytological position 26A containing several genes expressed in follicle cells, yet none of these genes are amplified. The effect of transcription on replication initiation is not well delineated, and there are many diverse examples even among amplification origins. The chorion amplicon DAFC-66D is regulated by the E2F, MYB, and RB transcription factor complexes (BEALL et al. 2002; BOSCO et al. 2001). However, as ACE3 and Ori-β are sufficient for amplification in the absence of the cp18 transcription unit normally between the two elements, active transcription is not necessary for amplification (LU et al. 2001). In contrast, at DAFC-62D, transcription is required for MCM complexes to be loaded at the second stage of initiation (XIE and ORR-WEAVER 2008). Gene amplification in Sciara coprophila offers an example of the inhibitory effect of transcription on replication initiation. In the salivary gland, puff II/9A is amplified to promote abundant expression of genes encoding cocoon proteins. When the locus becomes transcriptionally active, the replication initiation zone becomes constricted from a 7-8 kb region encompassing two transcription units to a 2 kb region that does not coincide with any genes (LUNYAK et al. 2002).
The low expression levels of CG7337 and unique timing of Vm34Ca raise the possibility that amplification at these regions is a consequence of gene expression and may not have a direct role in regulating gene expression. One of the most reproducible findings among recent genome-wide origin mapping studies was the significant number of origins that corresponded to gene regions and in particular, the transcription start sites of active genes. As both transcription and replication initiation require open chromatin, these two processes may be functionally linked with respect to the regulatory mechanisms controlling initiation; studies in several metazoan systems show that early replicating regions correspond to actively transcribed zones (MACALPINE et al. 2004; WHITE et al. 2004). A high resolution study of replication timing in multiple cell lines using next generation sequencing methods reported that genes expressed solely in one cell type were early-replicating exclusively in that cell type, suggesting a causal effect of transcription on early replication (HANSEN et al. 2010). Although there are just six follicle cell amplicons to examine, compared to thousands of origins activated in the canonical S phase, all six ORC binding amplification origins correspond to gene regions. This result is not unexpected since amplification can be a strategy to augment gene expression. However, given the correspondence between replication origins and active promoters, the hundreds of genes that are highly expressed in follicle cells but not amplified also provide powerful models to investigate what properties, in addition to transcription, determine ORC binding and origin activation. As 16C follicle cells encompass stage 9 through 14 egg chambers and span over 20 hours, we believe many of these genes are actively expressed during amplification stages, though RNA polymerase II localization on stage sorted follicle cell DNA would be necessary to assess this directly.
We investigated the relationship between histone acetylation and gene amplification in Drosophila follicle cells and found that acetylation of H4, and specifically H4K8, quantitatively correlated with levels of gene amplification. The amplicons undergoing the most replication initiation events displayed the highest levels of acetylation. Recent work in budding yeast purifying histone proteins around a single origin and performing high-resolution mass spectrometry to identify all histone modifications throughout the cell cycle has revealed dynamic acetylation patterns of histone H3 and H4 (UNNIKRISHNAN et al. 2010). Multiply acetylated H3 and H4 were shown to be required for efficient origin activation during S phase, suggesting that H4 hyperacetylation is an evolutionarily conserved mark of replication initiation. Using an amplification reporter, we found that H4K8 hyperacetylation is necessary for origin activation in a transgene, as tethering a histone deacetylase to DAFC-66D and eliminating enrichment of acetylated H4K8 resulted in complete repression of amplification at stage 10. However, acetylated H4 is not sufficient for amplification, as there are regions in the genome that show enrichment by ChIP but are not amplified. For example, DAFC-22B displays similar enrichment of tetra-acetylated H4 and acetylated H4K8 despite amplifying the region in OrRTOW and not amplifying the region in OrRMOD. One explanation is that acetylated H4 creates a chromatin environment to which ORC can bind, but other conditions are necessary for ORC to localize to a region with acetylated H4. In the context of gene amplification, levels of acetylated H4 may influence the number of initiation events that can occur from an ORC bound amplification origin. During chromosomal replication, acetylated H4 and additional chromatin marks may specify the efficiency of origin activation. 
DAFC-22B provides a powerful opportunity to study the determinants of ORC binding in metazoans since the region displays strain-specific amplification that is correlated with ORC localization. One model is that DAFC-22B is amplified in some strains because an element that makes the region permissive to ORC binding has been gained. Conversely, the locus may not amplify in other strains because some element that once made it permissive to ORC binding has been lost. Nevertheless, amplification appears to have no effect on CG7337 expression in terms of overall levels between the amplifying and non-amplifying strain, the reason for which remains an enigma. DAFC-22B provides a unique model to investigate the determinants of ORC binding and the relationship of gene amplification and transcription, highlighting the utility of genomic approaches to uncover new model origins for molecular analysis.
MATERIALS AND METHODS
Comparative genomic hybridization
16C nuclei were isolated by FACS from OrRTOW and OrRMOD fattened females as previously described (LILLY and SPRADLING 1996). Genomic DNA was prepared using the DNeasy Blood and Tissue Kit (Qiagen), digested with AluI and RsaI, and labeled using Invitrogen's BioPrime Total for Agilent aCGH labeling kit. Slides were hybridized to custom Agilent tiling arrays with probes every 600 or 400 bp and washed as per Agilent recommendations. Array intensities were median normalized across channels and smoothed by genomic windows of 10 kb using the Ringo package in R (TOEDLING et al. 2007).
RNA-sequencing
Ovaries from two-day fattened OrRTOW females were dissected in Grace's medium containing Hoechst, and follicle cells were isolated as described previously (BRYANT et al. 1999). Follicle cells were sorted by FACS, and RNA from 16C follicle cells was extracted using Trizol according to the manufacturer's protocol.
100 ng of RNA was processed with the Illumina mRNA Sample Preparation kit and subjected to DSN normalization according to the manufacturer's protocol.
Chromatin immunoprecipitation
ChIP-qPCR was performed on 300 staged egg chambers per experiment as described (XIE and ORR-WEAVER 2008). ChIP-chip was performed using four times the starting material. All ChIP experiments were compared to input DNA. For hybridization to arrays, DNA was labeled using Invitrogen's BioPrime Total for Agilent aCGH labeling kit. ChIP was performed with the following antibodies: ORC2 (Steve Bell), tetra-acetylated H4 (Active Motif 39179), acetylated H4K5 (Upstate 07-327), acetylated H4K8 (Upstate 07-328), and acetylated H4K12 (Upstate 07-595). The commercial antibodies used were those validated for specificity by the modENCODE consortium.
Drosophila strains and heat-shock overexpression
Transgenic lines carrying the TT1, hspGAL4DBD::Rpd3 or hspGAL4DBD::HAT1 transposons were a gift from Brian Calvi. Flies were crossed to introduce two transposons into the same line: either TT1 and the Rpd3 fusion or TT1 and the HAT1 fusion. Siblings that contained only TT1 (TT1/+) were kept as controls. A one-hour heat shock at 37°C was used to overexpress the GAL4 fusion protein.
Quantitative PCR
Genomic DNA was isolated from staged egg chambers and quantified using absolute quantitative PCR as described (CLAYCOMB et al. 2004) or relative quantification as described (XIE and ORR-WEAVER 2008). Absolute quantification was used for the DAFC-22B replication profile, whereas relative quantification was used for all other experiments. CG7337 expression analysis was performed on RNA samples prepared using Trizol and reverse transcribed with AMV reverse transcriptase (Promega). Purified follicle cells were isolated using a protocol modified from BRYANT et al. (1999). Two hundred stage 13 egg chambers were dissected in ice-cold Schneider's medium supplemented with 10% FBS. Tissue was digested with 0.9 mL of 0.25% Trypsin/EDTA and 0.1 mL of 50 mg/mL collagenase for 12 minutes at room temperature. The supernatant was strained through a 40 μm mesh and spun at 100 g for 7 minutes in the cold. Trizol was added to the pellet for RNA isolation. For cDNA analysis, Rps17 was used as the endogenous control.
Transgenic fly construction
To test the cis requirements for amplification at DAFC-22B, we constructed transposons with the 12 kb ORC binding region (stage 10) from OrRMOD (MOD4-9), the 12 kb ORC binding region (stage 10) from OrRTOW (TOW4-9), and the 10 kb ORC binding region (cell culture) from OrRTOW (TOW8-10). The gap in probes in the DAFC-22B aCGH amplification profile represents repeated DNA sequence not present in either OrR strain. These sequences were flanked by suppressor of Hairy wing binding sites (SHWBS) to control for genomic position-specific integration effects. The sequences were PCR amplified using exTaq DNA polymerase (Takara) and primers with AscI and AvrII sites on the forward and reverse sequences, respectively. These products were cloned into a modified PCRA vector with AscI and AvrII sequences engineered into the multiple cloning site (LU et al. 2001). These plasmids are called PCRA_22BMOD4-9, PCRA_22BTOW4-9, and PCRA_22BTOW8-10. These plasmids were digested with NotI and subjected to a partial XhoI digest to transfer the 12 kb or 10 kb inserts to the NotI and XhoI sites of Big Parent to generate BP_22BMOD4-9, BP_22BTOW4-9, and BP_22B_TOW8-10. These constructs were sent to BestGene Inc (Chino Hills, CA) for injection.
ACKNOWLEDGEMENTS
We thank David MacAlpine for design of the microarray, Steve Bell for the ORC2 antibody, and Brian Calvi for transgenic flies. George Bell provided helpful bioinformatics advice.
REFERENCES
AGGARWAL, B. D., and B. R. CALVI, 2004 Chromatin regulates origin activity in Drosophila follicle cells. Nature 430: 372-376.
ALADJEM, M. I., 2007 Replication in context: dynamic regulation of DNA replication patterns in metazoans. Nat Rev Genet 8: 588-600.
AUSTIN, R. J., T. L. ORR-WEAVER and S. P. BELL, 1999 Drosophila ORC specifically binds to ACE3, an origin of DNA replication control element. Genes Dev 13: 2639-2649.
BEALL, E. L., J. R. MANAK, S. ZHOU, M. BELL, J. S. LIPSICK et al., 2002 Role for a Drosophila Myb-containing protein complex in site-specific DNA replication. Nature 420: 833-837.
BELL, S. P., R. KOBAYASHI and B. STILLMAN, 1993 Yeast origin recognition complex functions in transcription silencing and DNA replication. Science 262: 1844-1849.
BOSCO, G., W. DU and T. L. ORR-WEAVER, 2001 DNA replication control through interaction of E2F-RB and the origin recognition complex. Nat Cell Biol 3: 289-295.
BRYANT, Z., L. SUBRAHMANYAN, M. TWOROGER, L. LATRAY, C. R. LIU et al., 1999 Characterization of differentially expressed genes in purified Drosophila follicle cells: toward a general strategy for cell type-specific developmental analysis. Proc Natl Acad Sci U S A 96: 5559-5564.
CADORET, J. C., F. MEISCH, V. HASSAN-ZADEH, I. LUYTEN, C. GUILLET et al., 2008 Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci U S A 105: 15837-15842.
CLAYCOMB, J. M., M. BENASUTTI, G. BOSCO, D. D. FENGER and T. L. ORR-WEAVER, 2004 Gene amplification as a developmental strategy: isolation of two developmental amplicons in Drosophila. Dev Cell 6: 145-155.
CLAYCOMB, J. M., D. M. MACALPINE, J. G. EVANS, S. P. BELL and T. L. ORR-WEAVER, 2002 Visualization of replication initiation and elongation in Drosophila. J Cell Biol 159: 225-236.
CLAYCOMB, J. M., and T. L. ORR-WEAVER, 2005 Developmental gene amplification: insights into DNA replication and gene expression. Trends Genet 21: 149-162.
CVETIC, C., and J. C. WALTER, 2005 Eukaryotic origins of DNA replication: could you please be more specific? Semin Cell Dev Biol 16: 343-353.
GILBERT, D. M., 2004 In search of the holy replicator. Nat Rev Mol Cell Biol 5: 848-855.
GRIFFIN-SHEA, R., G. THIREOS and F. C.
KAFATOS, 1982 Organization of a cluster of four chorion genes in Drosophila and its relationship to developmental expression and amplification. Dev Biol 91: 325-336.
HAMLIN, J. L., L. D. MESNER and P. A. DIJKWEL, 2010 A winding road to origin discovery. Chromosome Res 18: 45-61.
HANSEN, R. S., S. THOMAS, R. SANDSTROM, T. K. CANFIELD, R. E. THURMAN et al., 2010 Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 107: 139-144.
HARTL, T., C. BOSWELL, T. L. ORR-WEAVER and G. BOSCO, 2007 Developmentally regulated histone modifications in Drosophila follicle cells: initiation of gene amplification is associated with histone H3 and H4 hyperacetylation and H1 phosphorylation. Chromosoma 116: 197-214.
HUANG, D. W., L. FANTI, D. T. PAK, M. R. BOTCHAN, S. PIMPINELLI et al., 1998 Distinct cytoplasmic and nuclear fractions of Drosophila heterochromatin protein 1: their phosphorylation levels and associations with origin recognition complex proteins. J Cell Biol 142: 307-318.
KARNANI, N., C. M. TAYLOR, A. MALHOTRA and A. DUTTA, 2010 Genomic study of replication initiation in human chromosomes reveals the influence of transcription regulation and chromatin structure on origin selection. Mol Biol Cell 21: 393-404.
KOUZARIDES, T., 2007 Chromatin modifications and their function. Cell 128: 693-705.
LANDIS, G., R. KELLEY, A. C. SPRADLING and J. TOWER, 1997 The k43 gene, required for chorion gene amplification and diploid cell chromosome replication, encodes the Drosophila homolog of yeast origin recognition complex subunit 2. Proc Natl Acad Sci U S A 94: 3888-3892.
LANDIS, G., and J. TOWER, 1999 The Drosophila chiffon gene is required for chorion gene amplification, and is related to the yeast Dbf4 regulator of DNA replication and cell cycle. Development 126: 4281-4293.
LILLY, M. A., and A. C. SPRADLING, 1996 The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion.
Genes Dev 10: 2514-2526.
LU, L., and J. TOWER, 1997 A transcriptional insulator element, the su(Hw) binding site, protects a chromosomal DNA replication origin from position effects. Mol Cell Biol 17: 2202-2206.
LU, L., H. ZHANG and J. TOWER, 2001 Functionally distinct, sequence-specific replicator and origin elements are required for Drosophila chorion gene amplification. Genes Dev 15: 134-146.
LUNYAK, V. V., M. EZROKHI, H. S. SMITH and S. A. GERBI, 2002 Developmental changes in the Sciara II/9A initiation zone for DNA replication. Mol Cell Biol 22: 8426-8437.
MACALPINE, D. M., H. K. RODRIGUEZ and S. P. BELL, 2004 Coordination of replication and transcription along a Drosophila chromosome. Genes Dev 18: 3094-3105.
MACALPINE, H. K., R. GORDAN, S. K. POWELL, A. J. HARTEMINK and D. M. MACALPINE, 2010 Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20: 201-211.
MINDRINOS, M. N., L. J. SCHERER, F. J. GARCINI, H. KWAN, K. A. JACOBS et al., 1985 Isolation and chromosomal location of putative vitelline membrane genes in Drosophila melanogaster. Embo J 4: 147-153.
OSHEIM, Y. N., O. L. MILLER, JR. and A. L. BEYER, 1988 Visualization of Drosophila melanogaster chorion genes undergoing amplification. Mol Cell Biol 8: 2811-2821.
PAK, D. T., M. PFLUMM, I. CHESNOKOV, D. W. HUANG, R. KELLUM et al., 1997 Association of the origin recognition complex with heterochromatin and HP1 in higher eukaryotes. Cell 91: 311-323.
PARKS, S., and A. SPRADLING, 1987 Spatially regulated expression of chorion genes during Drosophila oogenesis. Genes Dev 1: 497-509.
PARKS, S., B. WAKIMOTO and A. SPRADLING, 1986 Replication and expression of an X-linked cluster of Drosophila chorion genes. Dev Biol 117: 294-305.
SCHWED, G., N. MAY, Y. PECHERSKY and B. R. CALVI, 2002 Drosophila minichromosome maintenance 6 is required for chorion gene amplification and genomic replication. Mol Biol Cell 13: 607-620.
SEQUEIRA-MENDES, J., R. DIAZ-URIARTE, A. APEDAILE, D.
HUNTLEY, N. BROCKDORFF et al., 2009 Transcription initiation activity sets replication origin efficiency in mammalian cells. PLoS Genet 5: e1000446.
SPRADLING, A. C., 1981 The organization and amplification of two chromosomal domains containing Drosophila chorion genes. Cell 27: 193-201.
TAKAHASHI, T. S., P. YIU, M. F. CHOU, S. GYGI and J. C. WALTER, 2004 Recruitment of Xenopus Scc2 and cohesin to chromatin requires the pre-replication complex. Nat Cell Biol 6: 991-996.
TOEDLING, J., O. SKLYAR, T. KRUEGER, J. J. FISCHER, S. SPERLING et al., 2007 Ringo--an R/Bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics 8: 221.
UNNIKRISHNAN, A., P. R. GAFKEN and T. TSUKIYAMA, 2010 Dynamic changes in histone acetylation regulate origins of DNA replication. Nat Struct Mol Biol 17: 430-437.
WHITE, E. J., O. EMANUELSSON, D. SCALZO, T. ROYCE, S. KOSAK et al., 2004 DNA replication-timing analysis of human chromosome 22 at high resolution and different developmental states. Proc Natl Acad Sci U S A 101: 17771-17776.
XIE, F., and T. L. ORR-WEAVER, 2008 Isolation of a Drosophila amplification origin developmentally activated by transcription. Proc Natl Acad Sci U S A 105: 9651-9656.
Chapter Three: Differential ORC localization during two rounds of replication initiation at a Drosophila follicle cell amplicon
Jane C. Kim and Terry L. Orr-Weaver
Whitehead Institute and Dept. of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142
ABSTRACT
We investigated the developmental and replication properties of a newly identified follicle cell amplicon, DAFC-34B. DAFC-34B contains two genes that are expressed in follicle cells, though their timing and spatial patterns of expression suggest that amplification is not a strategy to promote high levels of expression at this locus. Vm34Ca is a structural component of the vitelline membrane but is expressed prior to the onset of gene amplification.
CG16956 is expressed in amplification stages but only in a small subset of follicle cells. Like the previously characterized DAFC-62D, DAFC-34B displays origin firing at two separate stages of development. However, unlike DAFC-62D, amplification at the later stage is not transcription dependent. We mapped the DAFC-34B amplification origin to 1 kb by nascent strand analysis and delineated the cis requirements for origin activity, finding that a 6 kb region, but not the 1 kb origin alone, is sufficient for amplification. We analyzed the developmental localization of ORC, the origin recognition complex, and the MCM complex, the replicative helicase. Intriguingly, the final round of origin activation at DAFC-34B occurs in the absence of detectable ORC, though MCMs are present, suggesting a new amplification initiation mechanism.
INTRODUCTION
The initiation of DNA replication is a critical regulatory step for complete duplication of the genome during S phase. In eukaryotes, DNA replication initiates from hundreds to thousands of sites, called replication origins, across the genome. Bidirectional replication proceeds from these initiation sites until the replication forks from adjacent origins converge and the genome is fully replicated. In metazoans, there is no sequence-specific motif for specification of origin activity or localization of the origin recognition complex (ORC), the essential replication initiation factor that is conserved in all eukaryotes (CVETIC and WALTER 2005; GILBERT 2004). Origins individually identified in cell culture, such as the dihydrofolate reductase (DHFR) locus in Chinese hamster ovary (CHO) cells and the human β-globin locus, among several others, have been the molecular workhorses for replication origin studies (ALADJEM 2007).
The DHFR locus is an example of an extended replication zone where origin activation can occur from one of multiple inefficient origins, whereas the β-globin locus displays a high frequency of initiation from a confined site (HAMLIN et al. 2010; KITSBERG et al. 1993). Recently, there has been a tremendous increase in the number of metazoan origins identified, owing to the coupling of methods that select origin-centered DNA with microarray or high-throughput sequencing technologies (CADORET et al. 2008; HANSEN et al. 2010; KARNANI et al. 2010; LUCAS et al. 2007; MACALPINE et al. 2010; SEQUEIRA-MENDES et al. 2009). These methods include isolation of small DNA fragments (0.5 kb to 2 kb) combined with λ-exonuclease treatment to purify short nascent strands, which are protected from digestion by their 5' RNA primers, as well as pulsing cells with the nucleotide analog bromodeoxyuridine (BrdU) and immunoprecipitating newly synthesized DNA with an anti-BrdU antibody. In addition, origin activation produces a replication bubble, and these circular DNA structures can be selectively trapped in gelling agarose, cloned, and identified (MESNER et al. 2006). Replication origins can also be identified using chromatin immunoprecipitation (ChIP) of ORC because this complex marks all potential sites of origin activation. The strength of these approaches lies in the ability to examine hundreds to thousands of mapped origins and ORC binding sites to look for relationships to genome-wide properties including transcription, epigenetic modification, and the coordination of replication timing. The picture emerging from these studies is that origin activation and ORC binding are significantly influenced by an open chromatin structure, showing a strong correlation to transcription start sites, particularly active promoters, as well as DNaseI hypersensitive sites and histone modifications associated with active transcription.
Genome regions with a high density of these features tend to be early replicating in S phase. However, a common limitation of these studies is that they rely on cell culture, and it is not possible to study events directly at the time of origin activation.
Gene amplification in Drosophila follicle cells is an excellent in vivo model for investigating the regulation of DNA replication initiation (CLAYCOMB and ORR-WEAVER 2005). Gene amplification occurs by a process of repeated origin activation and bidirectional fork progression resulting in a gradient of replicated DNA that spans approximately 100 kb. Genes that encode structural components of the eggshell and eggshell cross-linking enzymes are located in the peak of amplification at several amplicons (CLAYCOMB et al. 2004; SPRADLING 1981). This increase in genomic template enables the eggshell to be constructed in a short developmental period. Importantly for its use as a replication model, gene amplification relies on the same replication machinery and cell cycle kinase regulation that is used in the typical eukaryotic S phase.
Analyses of two Drosophila Amplicons in Follicle Cells, DAFC-66D and DAFC-62D, reveal distinct developmental and regulatory strategies. At the major chorion amplicon DAFC-66D, replication initiation and exclusive elongation occur in distinct phases, with origin activation events confined to stages 10B and 11, followed by elongation of existing replication forks in stages 12 and 13 (CLAYCOMB et al. 2002). In contrast, DAFC-62D displays two stages of replication initiation, one at stage 10B and another at stage 13, with elongation occurring in the intervening stages (CLAYCOMB et al. 2004). Furthermore, whereas DAFC-66D amplification can be delineated to the cis interaction of a 320 base pair amplification enhancer, ACE3, with a major 884 base pair replication origin, Ori-β (LU et al.
2001), ectopic amplification of DAFC-62D cannot be narrowed down to a region smaller than 10 kb (XIE and ORR-WEAVER 2008). In addition, the second phase of replication initiation at DAFC-62D is surprisingly transcription dependent, requiring active transcription for loading of the MCM helicase, though not for ORC localization (XIE and ORR-WEAVER 2008). These studies suggest that elucidating the developmental and regulatory strategies of additional follicle cell amplicons will uncover new mechanisms regulating DNA replication initiation.
Recently, all follicle cell amplicons were identified in Drosophila using an array based comparative genomic hybridization (aCGH) approach, which uncovered two new amplicons (Chapter 2). This study examined the relationship of gene amplification to transcription, ORC localization, and histone H4 acetylation on a genome-wide scale, revealing that amplified regions often, though not always, contain highly expressed genes. ORC localizes to the most amplified region, making the amplicons useful replication models, and amplification levels correlate with acetylated H4 levels. Here we analyze DAFC-34B in close detail and find yet another example of developmental and regulatory control strategies, distinct from those of DAFC-66D and DAFC-62D.
RESULTS
Two genes in DAFC-34B are expressed in follicle cells
DAFC-34B was recently identified as a new amplicon using an aCGH strategy to uncover all follicle cell amplicons (Chapter 2). After undergoing three rounds of endoreduplication, chromosomal replication without intervening mitoses, to reach 16C copy levels, follicle cells initiate synchronous gene amplification at stage 10B. 16C follicle cells derive from egg chambers of stages 9 through 14 and are enriched for amplification stages. Whole genome tiling arrays were competitively hybridized with DNA from 16C nuclei and diploid early embryos to uncover regions of follicle cell amplification.
Additionally, Illumina RNA-sequencing of 16C follicle cells showed that several genes in the amplified region are expressed and that at least one, Vm34Ca, a vitelline membrane gene located in the peak of amplification, is expressed at very high levels (Figure 3-1A). To assess the timing and spatial expression pattern of genes in the amplified region more precisely, we performed in situ hybridization of nine genes in the central amplified region. Consistent with the RNA-seq data, Vm34Ca was highly expressed in follicle cells. However, this gene was expressed beginning in stage 8, prior to the onset of synchronous gene amplification, and continued until stage 10B (Figure 3-1B), consistent with previous expression analysis by Northern blot (MINDRINOS et al. 1985). Of the remaining genes we examined, only one was expressed in follicle cells by in situ hybridization. CG16956 is expressed in stages 12 and 13, but only in a small subset of follicle cells at the anterior region (Figure 3-1C), which was similar to the late stage expression patterns of yellow-g and yellow-g2 in DAFC-62D (CLAYCOMB et al. 2004). Gene amplification is generally considered a strategy to promote high levels of gene expression. Because the expression patterns of Vm34Ca and CG16956 did not conform to this simple model, showing, respectively, high expression levels prior to the start of synchronous amplification and expression in a limited number of cells, we investigated the precise timing and possible cell population specificity of replication events.
Figure 3-1. Two genes located in DAFC-34B are expressed in follicle cells. 100 kb amplified region of DAFC-34B with aCGH in log2 ratio and RNA-seq data in number of mapped reads (A). Nine genes were tested by in situ hybridization for expression in follicle cells. The genes in black were tested but did not show any signal in follicle cells. Vm34Ca is expressed broadly in all follicle cells from stage 8, prior to the onset of synchronous gene amplification, through stage 10B (B). CG16956 is expressed exclusively in a small population of cells at the anterior region. Stage 12 is above, and stage 13 is below in (C).
DAFC-34B shows two distinct stages of replication initiation
Drosophila egg chambers are morphologically distinct and thus allow origin activation to be analyzed in the context of developmental progression. To determine the replication profile at DAFC-34B, we hand sorted individual egg chamber stages to isolate genomic DNA and assessed DNA copy levels by quantitative real-time PCR (qPCR) compared to a non-amplified locus. This approach allows initiation to be distinguished from elongation based on the shape of the replication profile: specifically, whether there is an increase in DNA copy number at the most amplified region or only in the flanking regions. Because Vm34Ca is expressed beginning in stage 8, one possibility was that gene amplification begins in an earlier stage at DAFC-34B, which had not been previously recognized in BrdU immunofluorescence experiments. However, at stage 9, no amplification was detected at DAFC-34B (Figure 3-2A), confirming that Vm34Ca expression occurs prior to amplification. At stage 10B, we observed an increase in copy number to 4-fold, showing that replication initiation occurs at this stage (Figure 3-2C). At stage 11, there was no further increase in copy number, but a widening of the replication profile, indicating the process of replication elongation (Figure 3-2D).
At stage 12, we saw a further doubling of DNA copy number, revealing a second phase of DNA replication initiation, followed again by elongation at stage 13 (Figure 3-2E and 3-2F). Thus, DAFC-34B shows two stages of replication initiation in the same region with a period of only elongation in between, which is similar to the previously characterized DAFC-62D.
Figure 3-2. DAFC-34B exhibits two stages of replication initiation. Genomic DNA was isolated from hand-sorted staged egg chambers, and DNA copy levels were quantified by qPCR compared to a non-amplified locus at 62C5 (A-G). Error bars show standard deviation for triplicate reactions. Genomic position is shown on the X axis (13380 is Chr2L: 13,380,000). Although Vm34Ca begins to be expressed at stage 8, gene amplification is not first observed until stage 10A, when there is a two-fold increase in DNA copy number (B). Replication initiation occurs in stage 10B to reach approximately four-fold amplification (C). There is another period of replication initiation at late stage 12, resulting in a doubling in copy number at DAFC-34B (E).
The timing of CG16956 expression in stages 12 and 13 was consistent with the timing of gene amplification, but the expression pattern in a small number of cells raised the possibility that DAFC-34B might be differentially amplified among follicle cell populations.
To address whether CG16956-expressing cells amplify this genomic region to higher levels than other follicle cell populations, we used two approaches. First, the expression pattern of CG16956 was reminiscent of slow border cells (slbo), a gene expressed in and required for the migration of border cells (MONTELL et al. 1992). Border cells are a small group of cells necessary for proper development of a functional micropyle, or sperm-entry structure in the anterior region, and CG16956 expression co-localized with the slbo marker (Figure 3-3A). We isolated slbo-expressing cells by FACS, using the GAL4-UAS system to drive GFP expression with the slbo regulatory region (slbo-GAL4) (WANG et al. 2006). When compared to DNA recovered using a ubiquitous follicle cell driver (c323a), slbo-positive cells did not display higher gene amplification levels (Figure 3-3B). In fact, when we examined the top 20% of GFP expressing cells, slbo-positive cells showed lower levels of gene amplification than the total cell population driven by c323a. Second, we divided stage 13 egg chambers into anterior and posterior regions by hand and performed qPCR on the genomic DNA samples. The anterior region did not display more abundant amplification of DAFC-34B and reproducibly showed lower DNA copy number than either the posterior region or whole egg chambers (Figure 3-3C). Thus, DAFC-34B is not amplified to higher levels in CG16956-expressing cells or anterior follicle cells compared to the rest of the follicle cell population.
Figure 3-3. Follicle cells expressing CG16956 do not selectively or more greatly amplify DAFC-34B. RNA fluorescent in situ hybridization was performed along with α-GFP immunofluorescence on slbo-GAL4, UAS-GFP ovaries to determine co-localization of CG16956 and the border cell specific slbo marker (A). CG16956 is expressed in slbo-positive cells. GFP-positive, slbo-expressing cells were recovered by FACS, and genomic DNA from these samples was compared to GFP-positive cells driven by the ubiquitous follicle cell driver, c323a. Results for total GFP-positive cells and the top 20% of GFP-positive cells are shown in (B). Stage 13 egg chambers were hand-dissected into anterior and posterior regions, and genomic DNA from these samples was examined by qPCR.
We next addressed whether transcription influenced DAFC-34B amplification. At DAFC-62D, the final round of initiation is dependent on transcription, as culturing egg chambers in the presence of the drug α-amanitin, which blocks RNAPII translocation, for 5 hours specifically blocked stage 13 origin activation but had no effect on stage 10 amplification (XIE and ORR-WEAVER 2008). As both DAFC-34B and DAFC-62D exhibit two distinct stages of replication initiation, we tested the effect of transcription inhibition on DAFC-34B amplification to see if these late stage amplification events occur by the same mechanism. As at DAFC-62D, we found that five hour α-amanitin treatment had no effect on stage 10B origin activation at DAFC-34B (Figure 3-4A). However, whereas α-amanitin specifically blocked stage 13 origin activation at DAFC-62D, it had no effect on late stage origin activation at DAFC-34B (Figure 3-4B), indicating a different mode of regulation.
DAFC-34B amplification origin corresponds to the Vm34Ca transcription unit
The amplification origins of DAFC-66D and DAFC-62D have been mapped using various methods, and the results show distinct origin positions with respect to the surrounding genes.
The major origin at the chorion amplicon DAFC-66D is intergenic, though still residing in a gene-rich region (HECK and SPRADLING 1990), whereas the origin at DAFC-62D coincides with the yellow-g2 gene (XIE and ORR-WEAVER 2008). We used nascent strand analysis to map the origin at DAFC-34B. We hand sorted stage 10 egg chambers, enriched for replication intermediates, and size fractionated DNA fragments by gel electrophoresis. Short nascent strands were further enriched using λ-exonuclease treatment to remove nicked DNA that lacks 5' RNA primers. Using this method, we found that the DAFC-34B amplification origin coincides with the transcription unit of Vm34Ca (Figure 3-5A). This gene region was highly enriched in the 0.7 to 1.5 kb fractions of DNA in three biological replicates. As a control for the efficiency of nuclease digestion and the recovery of short nascent strands, we found that DNA greater than 5 kb in size showed uniform low enrichment levels across the 10 kb most amplified region of DAFC-34B (Figure 3-5B). Notably, we used probes to detect nascent strand abundance in the other gene regions located in DAFC-34B but did not observe any enrichment (Figure 3-5C).

Figure 3-4. Transcription inhibition with α-amanitin has no effect on DAFC-34B amplification. Whole ovaries were cultured in vitro with α-amanitin for 5 hours, after which staged egg chambers were hand sorted. qPCR quantification of stage 10 and stage 13 egg chambers is shown for DAFC-34B and DAFC-62D. At stage 10, α-amanitin treatment has no effect on amplification at either amplicon (A). At stage 13, α-amanitin treatment specifically inhibits late amplification at DAFC-62D but has no effect at DAFC-34B (B).
[Figure 3-4 panels: (A) Stage 10; (B) Stage 13; conditions: - alpha-amanitin, + alpha-amanitin.]
Developmental control of ORC and MCM localization at DAFC-34B

Because DAFC-34B exhibits two rounds of replication initiation, we examined the stage specific localization of ORC at this genomic locus to see how it corresponded to origin firing. Genome-wide localization using ORC2 ChIP-chip in stage 10 egg chambers showed that ORC localizes to all follicle cell amplicons and, at DAFC-34B, occupies an approximately 10 kb zone centered at the peak of amplification (Chapter 2 and Figure 3-6A). We confirmed this result by ChIP-qPCR using independent biological samples and saw the same profile of ORC localization for DAFC-34B as with ChIP-chip (Figure 3-6B). With both methods, we observed a sharp boundary of ORC binding that coincided with Vm34Ca, which also corresponds to the origin we identified by nascent strand analysis.

Strikingly, when we examined subsequent stages for ORC localization at DAFC-34B, we found that ORC enrichment was absent despite the second round of replication initiation. We performed ORC2 ChIP-chip for pooled stage 11 and 12 egg chambers and found that ORC was not detectable at DAFC-34B. In contrast, we observed that ORC was present at DAFC-62D, as previously reported by ChIP-qPCR (XIE and ORR-WEAVER 2008) (Figure 3-6A). We confirmed this result for pooled stages 11/12 and also examined ORC localization at DAFC-34B and DAFC-62D at stage 13 by ChIP-qPCR. ORC was not enriched at DAFC-34B but was enriched at DAFC-62D in the later stages of egg chamber development (Figure 3-6C).

We examined the localization of the MCM2-7 replicative helicase by ChIP and found that the MCM complex localizes to DAFC-34B, coincident with both stages of replication initiation. At stage 10, we observed that MCMs localize to DAFC-34B by ChIP-chip (Figure 3-6A). Furthermore, we observed MCM enrichment at stage 13 by ChIP-qPCR (Figure 3-6D). Thus, the late stage initiation at DAFC-34B does not occur in the complete absence of all pre-replicative complex (pre-RC) components. We did not observe MCM enrichment in pooled stages 11 and 12 egg chambers at DAFC-34B, indicating that a pool of MCMs pre-loaded at stage 10 is not responsible for later origin activation. In contrast, at DAFC-66D, which undergoes replication initiation at stage 11, we observed significant MCM enrichment in pooled stages 11 and 12. DAFC-34B thus provides an example of replication initiation that occurs in the absence of detectable ORC despite MCM loading.

Figure 3-5. DAFC-34B amplification origin maps to Vm34Ca transcription unit by nascent strand analysis. Size fractionated nascent DNA was isolated from stage 10 egg chambers, and the abundance of short nascent strands from the 1 to 1.5 kb fraction (plus λ-exonuclease treatment) was quantified by qPCR compared to a non-amplified locus (A). Genomic position is shown on the X axis (8000 is Chr2L: 13,408,000). There is a peak of nascent strand enrichment corresponding to the Vm34Ca transcription unit. This enrichment is absent in the 5 kb and up fraction (minus λ-exonuclease treatment) (B). The region was assessed using additional probes in the 0.7 to 1 kb fraction (plus λ-exonuclease treatment), and the enrichment was specific to the Vm34Ca gene and not adjacent gene regions (C).
[Figure 3-5 panels: Nascent Strand (1-1.5 kb) + lambda; Nascent Strand (5 kb and up) - lambda; Nascent Strand (0.7-1 kb) + lambda. X axis: Genomic Position. Annotated genes: CG7110, CG16848, Vm34Ca, CG16956.]

Figure 3-6. ORC is not detectable at DAFC-34B after stage 10 despite a late stage of replication initiation. ChIP-chip experiments using antibodies specific for ORC2 and MCM2-7 were performed on stage 10 as well as pooled stages 11 and 12 egg chambers. The amplified regions for DAFC-34B, DAFC-62D, and DAFC-66D are shown (A). ORC localizes to DAFC-34B in a 10 kb zone with a boundary of ORC binding corresponding to the Vm34Ca transcription unit. These results are consistent with quantification by ChIP-qPCR (B). After stage 10, there is no localization of ORC at DAFC-34B despite enrichment at DAFC-62D (A, C), which also exhibits late amplification. DAFC-66D, which shows replication initiation at stages 10 and 11, is shown for comparison. Although ORC is not detected at DAFC-34B after stage 10, MCM enrichment was observed at stage 13 using ChIP-qPCR (D).
[Figure 3-6A tracks for DAFC-34B, DAFC-62D, and DAFC-66D: CGH; Stage 10 ORC ChIP; Stage 10 MCM ChIP; Stage 11/12 ORC ChIP; Stage 11/12 MCM ChIP. Figure 3-6B to D panels: DAFC-34B Stage 10 ORC ChIP; Stage 13 ORC ChIP; Stage 11/12 MCM2-7 ChIP; Stage 13 MCM2-7 ChIP. X axis: Genomic Position.]

ORC mutant blocks both rounds of initiation at DAFC-34B

Given the finding of replication initiation in the absence of ORC, we tested genetically whether each period of origin activation is dependent on functional ORC. orc2fs293 is a female-sterile allele that specifically reduces follicle cell gene amplification, and orc2k43 is a null allele (LANDIS et al. 1997).
We reasoned that, as ORC is absent for the late stage amplification, it might not be required for this round of replication initiation, and we would observe amplification specifically in the late stage egg chambers. When we examined DNA copy levels for stage 10B and stage 13 egg chambers by qPCR, however, the mutant effect on gene amplification at DAFC-34B was comparable to the other two amplicons examined (Figure 3-7). All three displayed 1.5 to 2-fold enrichment at stage 13, likely due to low levels of ORC activity. These results indicate that both rounds of replication initiation require ORC, though initiation in the later stages in its absence could be explained by some earlier action of ORC to promote replication initiation. For example, the first round of replication initiation at DAFC-34B may set up conditions that permit the final origin firing in the absence of ORC function.

Figure 3-7. ORC function is required to allow both stages of replication initiation at DAFC-34B. DNA copy number was examined in a female-sterile allele of ORC, orc2fs293, which reduces gene amplification. orc2k43 is a null allele. Amplification in the sibling heterozygotes is shown in (A) and (B), showing typical amplification levels. In orc2fs293/orc2k43, there was no specific increase in gene copy number by stage 13 at DAFC-34B. There was 1.5 to 2-fold amplification by stage 13 at DAFC-34B, DAFC-62D, and DAFC-66D, likely due to low levels of ORC activity.
[Figure 3-7 panels: Gene Amplification in Orc2/+ (two panels); Gene Amplification in Orc2 Mutants. Amplicons plotted: DAFC-34B, DAFC-62D, DAFC-66D.]

Delineation of cis control elements for replication at DAFC-34B

To delineate the cis requirements for amplification at DAFC-34B and, in particular, to determine whether distinct control elements necessary for early versus late stage initiation could be defined, we used P element mediated transformation to test sufficiency for amplification at an ectopic site. We used Suppressor of Hairy-wing insulator binding sites (SHWBS) to flank the test sequences and minimize inhibitory effects on gene amplification due to insertion position (LU and TOWER 1997). We tested the 1 kb origin sequence alone and found that it was insufficient for amplification, indicating that additional sequences, possibly conferring replication enhancer activity, are necessary (Figure 3-8). When we examined a transformed 10 kb region spanning the stage 10 ORC binding zone by qPCR, we found that this transformant line showed the same timing and magnitude of amplification as the endogenous DAFC-34B locus, indicating that this region contains all of the information necessary for amplification (Figure 3-8).

Previous analyses of DAFC-66D and DAFC-62D demonstrated that not all follicle cell amplicons could be reduced to equivalent replication control elements. At DAFC-66D the interaction of ACE3 and Oriβ is sufficient for amplification, whereas at DAFC-62D a region sufficient for amplification cannot be limited to smaller than 10 kb. DAFC-34B displays an intermediate result. We found that a 6 kb region that spans the peak of ORC binding is sufficient for amplification. However, neither a 2.6 kb fragment containing the DAFC-34B origin nor a 1.8 kb fragment spanning the peak of ORC binding was sufficient for amplification.
With the sequences we tested, we did not separate replication control elements that were responsible for the two developmental stages of gene amplification.

Figure 3-8. A 6 kb ORC binding region is sufficient for amplification of DAFC-34B. DNA sequences tested by P element mediated transformation are shown in (A) with the results of ectopic amplification in (B). Neither the 1 kb origin alone nor a 2.1 kb region containing the origin is sufficient for amplification. The 6 kb region containing the 1 kb origin as well as the peak of ORC binding is the minimal sequence that conferred amplification activity.
[Figure 3-8: (A) map of the tested constructs DAFC-34B 1kb, 10kb, 6kb, 2.1kb, and 1.8kb; (B) table with columns Construct, No. of lines tested, and Replication Initiation at Stage 10 and Stage 13.]

DISCUSSION

We investigated the developmental and regulatory strategies of DAFC-34B and find that it displays unique characteristics, making it a valuable new replication model to delineate the relationship of transcription and gene amplification as well as the requirements of ORC function for gene amplification. Though the amplicon contains genes expressed in follicle cells, their expression patterns with regard to timing and spatial restriction are previously unobserved among other amplicons. Although DAFC-34B has two rounds of replication initiation, we find that the second round of origin firing is not dependent on transcription, as it is at DAFC-62D. Strikingly, we find that the second round of replication initiation occurs in the absence of detectable ORC, though MCMs are loaded. Finally, we delineate the cis requirements for replication and find that a 6 kb minimal region is sufficient for origin function.

DAFC-34B contains two genes that are expressed specifically in follicle cells, but why these genes are amplified remains an enigma. Vm34Ca is expressed beginning in stage 8 prior to the onset of gene amplification, and CG16956 is expressed only in a small subset of follicle cells at the anterior region during amplification stages. We performed in situ hybridization in replication factor mutants but did not observe any detectable changes in expression level, spatial distribution, or temporal profile of either Vm34Ca or CG16956 (data not shown), suggesting that amplification is not necessary for adequate expression. In addition, we used RNAi lines to test whether Vm34Ca or CG16956 are required for eggshell formation or fertility (DIETZL et al. 2007). The line targeting Vm34Ca has two off-targets that are homologous vitelline membrane genes, and expression of this RNAi construct with a ubiquitous follicle cell driver resulted in thin eggshells and failure of embryos to hatch. In contrast, expression of the RNAi construct targeting CG16956, with no reported off-targets, had no detectable phenotype despite a 60% reduction in expression levels (data not shown).

Recent studies in cell culture have begun to identify replication origins on a genome-wide scale, finding correlations to other genomic processes and features such as transcription and histone modifications (KARNANI et al. 2010; MACALPINE et al. 2010; SEQUEIRA-MENDES et al. 2009). Despite their diverse approaches, one of the common findings among these studies is that replication origins and ORC binding sites significantly correspond to gene regions, specifically the transcription start site of actively transcribed genes. It has been proposed that the lack of sequence specificity for ORC binding in metazoans, and rather the reliance on open and active chromatin to specify origins, may serve to ensure that origin selection can change according to developmental stage and cell type (MACALPINE et al. 2010).
However, understanding how active transcription, in many but not all cases, enables a DNA sequence to function as a replication origin requires analysis of individual replication origins.

Consistent with the majority of origins identified in cell culture, we find that the DAFC-34B amplification origin corresponds to the transcription unit of Vm34Ca. We used nascent strand analysis to map the DAFC-34B amplification origin, though due to the small size of the Vm34Ca transcription unit (455 bp) and the size fraction of the nascent DNA we used (0.7 to 1.5 kb), we could not precisely distinguish whether the origin mapped to the gene region or the TSS. The coincidence of the amplification origin with the Vm34Ca transcription unit makes DAFC-34B an excellent model to test directly the role of active transcription in gene amplification.

One model for amplification at DAFC-34B is that abundant transcription, by creating a chromatin environment permissive for ORC binding, promotes amplification of this region. For example, metazoan ORC has been shown to preferentially bind negatively supercoiled DNA (REMUS et al. 2004), which can be found directly upstream of active promoters. Although we showed that transcription inhibition by five-hour α-amanitin treatment had no effect on stage 10 amplification, we cannot conclude whether transcription is absolutely necessary for stage 10 amplification, since Vm34Ca is expressed for over 12 hours and inhibiting expression for the duration of stages 8 to 10 is not possible in vitro. One certainty is that active transcription, by itself, is not sufficient for amplification, because homologous vitelline membrane genes (for example, Vm26Aa, Vm26Ab, and Vm32E) are expressed at similarly high levels, yet none of these regions is amplified (Chapter 2). Furthermore, the DAFC-34B 2.6 kb construct does not amplify though it contains the full-length Vm34Ca gene and at least 1 kb of both 5' and 3' regulatory sequences.
Thus, DAFC-34B can be used as a model replication origin to dissect what regulatory elements, in addition to or in place of active transcription, are responsible for origin activity.

DAFC-34B is also a distinctive replication model because the second round of replication initiation occurs in the absence of detectable ORC. ORC binds to the region in a 10 kb zone during stage 10, but it is not detectable by ChIP in subsequent stages. In contrast, DAFC-62D, which also has a late stage of replication initiation, exhibits ORC binding throughout these later stages. Despite the absence of ORC, MCMs are loaded at DAFC-34B by stage 13. Importantly, there is not a pool of MCMs at this region during stages 11 and 12, revealing that ORC function in stage 10 does not load MCMs that will be activated at a later stage.

What, then, can explain the absence of ORC at DAFC-34B after stage 10? One model is that ORC is required for late initiation but that failure to detect enrichment by ChIP is due to the epitope being masked, specifically after stage 10. ORC may undergo a significant structural rearrangement uniquely at DAFC-34B. Thus, the apparent absence of ORC enrichment may be, more accurately, absence of the same ORC structure that is present at stage 10 in DAFC-34B and for all five other amplicons at all amplification stages.

Conversely, ORC may not be required for late initiation at DAFC-34B, and this amplicon may demonstrate ORC-independent MCM loading and replication initiation. There are at least three mechanistic explanations for ORC-independent initiation. First, Park and Asano reported that ORC is dispensable for endoreplication in Drosophila (PARK and ASANO 2008). By generating an Orc1 null allele, the authors showed that nuclei in homozygous mutant clones reach the same size as nuclei in wild type clones. However, cell proliferation and gene amplification, assessed by loss of BrdU foci, were disrupted in these mutants.
Asano proposed the existence of a protein or complex X that can recruit CDC6 and promote replication initiation (ASANO 2009). If such a factor exists, it may also play a role in late initiation at DAFC-34B. Second, DUP/CDT1 persists in late stage egg chambers and is present at elongating replication forks (CLAYCOMB et al. 2002). A pool of DUP/CDT1 at the origin may recruit MCMs for late amplification at DAFC-34B. Finally, Lydeard et al. reported that break-induced replication in yeast does not require ORC or Cdc6 (LYDEARD et al. 2010). The replication profile at DAFC-34B shows that late initiation occurs at a specific developmental stage and from the same region as early initiation. Thus, it seems unlikely that double-stranded breaks generated by collapsed replication forks could result in the symmetrical replication profile we observe. However, if the first round of replication initiation made the DAFC-34B origin susceptible to a double-stranded break, then random strand invasion occurring from both sides of the break could result in a symmetric replication gradient. The necessity of the first round of replication initiation to create a double-stranded break would also explain the requirement of ORC function for both stages of amplification.

Investigation of the developmental and replication profiles of DAFC-34B has revealed that it is a powerful model for gaining mechanistic insight into metazoan replication initiation. It shows unique properties of developmental gene expression and replication initiation in the absence of detectable ORC, making it a tractable experimental model for studying the influence of transcription on replication initiation and possible ORC-independent means of replication initiation.
Furthermore, analysis of multiple follicle cell amplicons highlights the diversity of amplification control mechanisms within the same cell type and is likely to be representative of similar regulatory diversity during S phase DNA replication.

MATERIALS AND METHODS

RNA in situ hybridization

RNA in situ hybridization with colorimetric signal output was performed as previously described (IVANOVSKA et al. 2005). 200-800 bp exonic fragments were PCR amplified from CG7110, CG16848, Vm34Ca, Tehao, CG6866/loqs, CG9293, CG7099, CG10859, and beta'Cop. PCR products were cloned into the pCRII-TOPO dual promoter vector (Invitrogen). Sense and antisense probes were in vitro transcribed and digoxigenin-labeled using either T7 or SP6 polymerase depending on the orientation of the insert, according to the manufacturer's instructions (Roche). Ovaries from wild type OrR fattened females were dissected in Grace's medium and hybridized at 55°C. Fluorescent hybridization for CG16956 was performed using the same DIG-labeled probes and hybridized as previously described (XIE and ORR-WEAVER 2008). Co-localization with the slbo marker was assessed using α-GFP (a gift from Mary Lou Pardue) immunofluorescence immediately following RNA FISH as previously described for visualizing other proteins in follicle cells (CLAYCOMB et al. 2002).

Quantitative PCR

Absolute quantitative (real-time) PCR was performed as described (CLAYCOMB et al. 2004). Standard curves were generated from four ten-fold dilutions of stage 1-8 egg chamber DNA or 0-4 h embryonic DNA. The endogenous control was a non-amplified locus at 62C5 (CLAYCOMB et al. 2002). Relative quantification was performed as described (XIE and ORR-WEAVER 2008). Absolute quantification was used for the DAFC-34B replication profile and cell population experiments, whereas relative quantification was used for all other experiments.
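The two quantification schemes named above can be illustrated with a minimal sketch. It is not the cited protocols: the function names, the Ct values, and the 2^-ddCt formula with an assumed primer efficiency of about 100% are all illustrative assumptions. The absolute scheme fits a standard curve to a ten-fold dilution series and normalizes the target to a non-amplified control locus; the relative scheme compares the sample with a calibrator DNA.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve; the result is in dilution-series units."""
    return 10 ** ((ct - intercept) / slope)


def absolute_copy_ratio(ct_target, ct_control, target_curve, control_curve):
    """Target quantity divided by the non-amplified control locus quantity."""
    target_q = quantity_from_ct(ct_target, *target_curve)
    control_q = quantity_from_ct(ct_control, *control_curve)
    return target_q / control_q


def relative_fold(ct_target, ct_control, ct_target_cal, ct_control_cal):
    """2^-ddCt against a calibrator sample (assumes ~100% primer efficiency)."""
    ddct = (ct_target - ct_control) - (ct_target_cal - ct_control_cal)
    return 2.0 ** (-ddct)


# Hypothetical four-point ten-fold dilution series with made-up Ct values.
dilutions = [1.0, 10.0, 100.0, 1000.0]
curve = fit_standard_curve(dilutions, [30.0 - math.log2(q) for q in dilutions])
```

With these invented numbers, a target that crosses threshold four cycles earlier than the control locus corresponds to a 16-fold copy number difference under either scheme.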
For cell population experiments, follicle cells were isolated using a protocol modified from Bryant et al. (BRYANT et al. 1999). ~150 whole ovaries were dissected in ice-cold Schneider's medium supplemented with 10% FBS. Tissue was digested with 0.9 mL of 0.25% Trypsin/EDTA and 0.1 mL of 50 mg/mL collagenase for 15 minutes at room temperature. The supernatant was strained through a 40 µm mesh, spun at 100 g for 7 minutes in the cold, and washed once with non-supplemented Grace's medium. GFP sorting was performed on a MoFlo2 at the MIT Koch Institute Flow Cytometry Core Facility. Cells were pelleted at 1000 g for 7 minutes and processed for genomic DNA isolation as described (CLAYCOMB et al. 2002).

Nascent Strand Analysis

Nascent strand abundance analysis was performed for stage 10 egg chambers as previously described (XIE and ORR-WEAVER 2008). Each fraction was analyzed for the abundance of specific sequences by relative qPCR using 0-4 hour embryonic DNA as the calibrator and a non-amplified locus at 62C5 as the endogenous control.

Chromatin Immunoprecipitation

ChIP-qPCR was performed on 300 staged egg chambers per experiment as described (XIE and ORR-WEAVER 2008). ChIP-chip was performed using four times the starting material. All experiments were compared to input DNA. For hybridization to arrays, DNA was labeled using Invitrogen's BioPrime Total for Agilent aCGH labeling kit. ChIP was performed with ORC2 and MCM2-7 antibodies provided by Steve Bell. Array intensities were median normalized across channels and smoothed by genomic windows of 1 kb using the Ringo package in R (TOEDLING et al. 2007).

Transgenic fly construction

To test the cis requirements for amplification at DAFC-34B, we constructed transposons with various sequences from the most amplified region of DAFC-34B flanked by suppressor of Hairy wing binding sites (SHWBS) to control for genomic position-specific integration effects.
The 10 kb and 4.5 kb central amplified regions were PCR amplified from BACR06A03 using exTaq DNA polymerase (Takara) and primers with AscI and AvrII sites on the forward and reverse sequences, respectively. These products were cloned into a modified PCRA vector with AscI and AvrII sequences engineered into the multiple cloning site (LU et al. 2001). These plasmids are called PCRA_34B_10kb and PCRA_34B_4.5kb. They were digested with NotI and subjected to a partial XhoI digest to transfer the 10 kb and 4.5 kb inserts to the NotI and XhoI sites of Big Parent to generate BP_34B_10kb and BP_34B_4.5kb (LU et al. 2001). BP_34B_6kb was generated from the partial digest of PCRA_34B_10kb. The 1 kb origin mapped by nascent strand analysis was PCR amplified from BACR06A03 using exTaq DNA polymerase (Takara) and primers with NheI sites. The product was cloned into the original PCRA vector, generating PCRA_34B_1kb, which was subsequently cloned into the BP vector at the NotI and XhoI sites to generate BP_34B_1kb.

ACKNOWLEDGEMENTS

We thank Steve Bell for ORC2 and MCM antibodies and Mary Lou Pardue for GFP antibodies. Flies were provided by the Bloomington Stock Center.

REFERENCES

ALADJEM, M. I., 2007 Replication in context: dynamic regulation of DNA replication patterns in metazoans. Nat Rev Genet 8: 588-600.
ASANO, M., 2009 Endoreplication: the advantage to initiating DNA replication without the ORC? Fly (Austin) 3: 173-175.
BRYANT, Z., L. SUBRAHMANYAN, M. TWOROGER, L. LATRAY, C. R. LIU et al., 1999 Characterization of differentially expressed genes in purified Drosophila follicle cells: toward a general strategy for cell type-specific developmental analysis. Proc Natl Acad Sci U S A 96: 5559-5564.
CADORET, J. C., F. MEISCH, V. HASSAN-ZADEH, I. LUYTEN, C. GUILLET et al., 2008 Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci U S A 105: 15837-15842.
CLAYCOMB, J. M., M. BENASUTTI, G. BOSCO, D. D. FENGER and T. L. ORR-WEAVER, 2004 Gene amplification as a developmental strategy: isolation of two developmental amplicons in Drosophila. Dev Cell 6: 145-155.
CLAYCOMB, J. M., D. M. MACALPINE, J. G. EVANS, S. P. BELL and T. L. ORR-WEAVER, 2002 Visualization of replication initiation and elongation in Drosophila. J Cell Biol 159: 225-236.
CLAYCOMB, J. M., and T. L. ORR-WEAVER, 2005 Developmental gene amplification: insights into DNA replication and gene expression. Trends Genet 21: 149-162.
CVETIC, C., and J. C. WALTER, 2005 Eukaryotic origins of DNA replication: could you please be more specific? Semin Cell Dev Biol 16: 343-353.
DIETZL, G., D. CHEN, F. SCHNORRER, K. C. SU, Y. BARINOVA et al., 2007 A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448: 151-156.
GILBERT, D. M., 2004 In search of the holy replicator. Nat Rev Mol Cell Biol 5: 848-855.
HAMLIN, J. L., L. D. MESNER and P. A. DIJKWEL, 2010 A winding road to origin discovery. Chromosome Res 18: 45-61.
HANSEN, R. S., S. THOMAS, R. SANDSTROM, T. K. CANFIELD, R. E. THURMAN et al., 2010 Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc Natl Acad Sci U S A 107: 139-144.
HECK, M. M., and A. C. SPRADLING, 1990 Multiple replication origins are used during Drosophila chorion gene amplification. J Cell Biol 110: 903-914.
IVANOVSKA, I., T. KHANDAN, T. ITO and T. L. ORR-WEAVER, 2005 A histone code in meiosis: the histone kinase, NHK-1, is required for proper chromosomal architecture in Drosophila oocytes. Genes Dev 19: 2571-2582.
KARNANI, N., C. M. TAYLOR, A. MALHOTRA and A. DUTTA, 2010 Genomic study of replication initiation in human chromosomes reveals the influence of transcription regulation and chromatin structure on origin selection. Mol Biol Cell 21: 393-404.
KITSBERG, D., S. SELIG, I. KESHET and H. CEDAR, 1993 Replication structure of the human beta-globin gene domain. Nature 366: 588-590.
LANDIS, G., R. KELLEY, A. C. SPRADLING and J. TOWER, 1997 The k43 gene, required for chorion gene amplification and diploid cell chromosome replication, encodes the Drosophila homolog of yeast origin recognition complex subunit 2. Proc Natl Acad Sci U S A 94: 3888-3892.
LU, L., and J. TOWER, 1997 A transcriptional insulator element, the su(Hw) binding site, protects a chromosomal DNA replication origin from position effects. Mol Cell Biol 17: 2202-2206.
LU, L., H. ZHANG and J. TOWER, 2001 Functionally distinct, sequence-specific replicator and origin elements are required for Drosophila chorion gene amplification. Genes Dev 15: 134-146.
LUCAS, I., A. PALAKODETI, Y. JIANG, D. J. YOUNG, N. JIANG et al., 2007 High-throughput mapping of origins of replication in human cells. EMBO Rep 8: 770-777.
LYDEARD, J. R., Z. LIPKIN-MOORE, Y. J. SHEU, B. STILLMAN, P. M. BURGERS et al., 2010 Break-induced replication requires all essential DNA replication factors except those specific for pre-RC assembly. Genes Dev 24: 1133-1144.
MACALPINE, H. K., R. GORDAN, S. K. POWELL, A. J. HARTEMINK and D. M. MACALPINE, 2010 Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20: 201-211.
MESNER, L. D., E. L. CRAWFORD and J. L. HAMLIN, 2006 Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell 21: 719-726.
MINDRINOS, M. N., L. J. SCHERER, F. J. GARCINI, H. KWAN, K. A. JACOBS et al., 1985 Isolation and chromosomal location of putative vitelline membrane genes in Drosophila melanogaster. Embo J 4: 147-153.
MONTELL, D. J., P. RORTH and A. C. SPRADLING, 1992 slow border cells, a locus required for a developmentally regulated cell migration during oogenesis, encodes Drosophila C/EBP. Cell 71: 51-62.
PARK, S. Y., and M. ASANO, 2008 The origin recognition complex is dispensable for endoreplication in Drosophila. Proc Natl Acad Sci U S A 105: 12343-12348.
REMUS, D., E. L. BEALL and M. R. BOTCHAN, 2004 DNA topology, not DNA sequence, is a critical determinant for Drosophila ORC-DNA binding. Embo J 23: 897-907.
SEQUEIRA-MENDES, J., R. DIAZ-URIARTE, A. APEDAILE, D. HUNTLEY, N. BROCKDORFF et al., 2009 Transcription initiation activity sets replication origin efficiency in mammalian cells. PLoS Genet 5: e1000446.
SPRADLING, A. C., 1981 The organization and amplification of two chromosomal domains containing Drosophila chorion genes. Cell 27: 193-201.
TOEDLING, J., O. SKLYAR, T. KRUEGER, J. J. FISCHER, S. SPERLING et al., 2007 Ringo--an R/Bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics 8: 221.
WANG, X., J. BO, T. BRIDGES, K. D. DUGAN, T. C. PAN et al., 2006 Analysis of cell migration using whole-genome expression profiling of migratory cells in the Drosophila ovary. Dev Cell 10: 483-495.
XIE, F., and T. L. ORR-WEAVER, 2008 Isolation of a Drosophila amplification origin developmentally activated by transcription. Proc Natl Acad Sci U S A 105: 9651-9656.

Chapter Four: Conclusions and Perspectives

In this thesis, we have investigated metazoan DNA replication initiation using Drosophila follicle cell gene amplification as a model system. We have combined genomic approaches to examine whole-genome views of ORC binding, transcription, and histone modifications with molecular analyses of individual amplicons to uncover their developmental properties and distinctive modes of regulation. DAFC-22B displays strain-specific amplification and is a model replicon to study the determinants of ORC binding. DAFC-34B exhibits replication initiation in the absence of detectable ORC binding and provides a way to investigate ORC-independent means of replication initiation. As genome-wide studies in cell culture identify more replication origins and correlative relationships, this abundance of data highlights the importance of gaining mechanistic understanding of how individual replication origins are specified and activated.
This thesis demonstrates the utility of combining genomic approaches with detailed molecular analyses of individual replication origins, a combination that Drosophila gene amplification uniquely offers as an in vivo metazoan experimental model for studying DNA replication. Future studies of the follicle cell amplicons described here will further elucidate the extent to which different origins are similar or distinct in their regulation.

Active transcription as a causal determinant of gene amplification

This thesis has demonstrated that gene amplification does not, in all cases, promote high transcription levels or augment transcription. The examples of DAFC-22B, where amplification apparently has no effect on CG7337 expression levels, and DAFC-34B, where Vm34Ca is transcribed prior to the onset of gene amplification, raise the possibility that active transcription may promote gene amplification at these loci. There is already one example of this link in follicle cells. At DAFC-62D, active transcription is required for the late-stage initiation, specifically to load MCM complexes (XIE and ORR-WEAVER 2008). Whether active transcription plays a causal role in replication initiation can be readily investigated at DAFC-34B, where a transformed 6 kb sequence is sufficient for amplification. Modifications can be made to this 6 kb sequence, including deletion of the Vm34Ca promoter region, substitution of the Vm34Ca promoter region with another sequence, or changing the orientation of the Vm34Ca gene, to test their effects on gene amplification. Additionally, Vm34Ca can be substituted with any of its homologous genes (Vm26Aa, Vm26Ab, or Vm32E) to determine whether amplification requires the Vm34Ca sequence specifically or merely active transcription at a particular developmental stage.
It will be important to introduce a sequence tag to the Vm34Ca gene to distinguish the transcript of the transformed 6 kb from the endogenous gene when assessing, for example, the effect of the promoter deletion.

Although active transcription may promote replication initiation at some genomic regions, it is clearly not sufficient for amplification, as there are many highly expressed gene regions that do not display a significant increase in copy number. One possibility is that active transcription promotes ORC binding and replication initiation only when it occurs during specific stages. Because 16C follicle cells contain a mixed population of stages 9 through 14, the highly expressed genes that are not in amplified regions may be expressed exclusively in stage 14, though we know this is not the case for vitelline membrane genes. It is also possible that highly expressed genes were actively transcribed prior to stage 9 and persist in follicle cells, as is certainly the case for genes encoding components of the vitelline membrane. Because the time span of 16C follicle cells is over 20 hours, however, we believe there must be some highly expressed genes that were actively transcribed during amplification stages but are not amplified. These genomic regions are also useful models to investigate what properties, in addition to active transcription, are necessary for gene amplification.

Genome-wide origin mapping studies in cell culture have found that origins are significantly proximal to or overlapping with RNAPII binding sites (KARNANI et al. 2010; MACALPINE et al. 2010). It will be important to determine whether the follicle cell amplified regions have different localization patterns or levels of RNAPII enrichment compared to nonamplified regions. ChIP-seq of RNAPII is capable of distinguishing between active transcription and genes poised for transcription. However, one technical consideration of this experiment is starting material.
Performing ChIP using staged egg chambers allows developmental timing to be precisely known. For example, the absence of ORC at DAFC-34B would not have been detected without stage-specific experiments. However, using hand-sorted egg chambers means that nurse cells comprise a significant portion of the population at stage 10. Because nurse cells are not known to amplify the DAFC regions (stage 9 egg chambers show no regions of amplification [Eng T, personal communication]), ChIP of replication proteins using whole egg chambers almost certainly reflects the follicle cell population. However, nurse cells are very transcriptionally active, so for stage 10 egg chambers it would be difficult to assess whether RNAPII signal came from the nurse cells or the follicle cells; nurse cells undergo apoptosis in stage 11. We have not yet performed ChIP using flow-sorted 16C nuclei. This approach would ensure pure populations of follicle cell DNA, but without the resolution of distinct developmental stages. This method would also likely require amplification of ChIP DNA because of the low quantity of starting material, a step that was not necessary when enough staged egg chambers were isolated. Combining approaches, ChIP-seq of staged egg chambers as well as 16C follicle cell DNA, may elucidate distinctive properties of RNAPII localization that cause these highly expressed regions not to become amplified.

Although there are only six genomic regions that are amplified significantly in follicle cells (greater than 2-fold), close examination of the aCGH data of 16C follicle cells reveals regions that show apparent low levels of amplification. For example, at 26A, which contains a cluster of genes highly expressed in follicle cells, there is a 100 kb region of very low DNA enrichment (maximum log2 ratio is 0.4024 at 26A) (Figure 2-3A).
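As a rough guide to what the log2 ratio quoted above for 26A means in copy-number terms, the conversion can be sketched as follows. This is only an illustrative calculation: the function names are hypothetical, and the two-population model (a fraction of cells carrying a fixed fold increase while the remaining cells are unamplified) is an assumption introduced here, not an analysis performed in this thesis.

```python
def log2_ratio_to_fold(log2_ratio: float) -> float:
    """Convert an aCGH log2 ratio into a fold enrichment."""
    return 2.0 ** log2_ratio


def amplifying_fraction(mean_fold: float, fold_in_amplifying_cells: float = 2.0) -> float:
    """Fraction of cells that must amplify to give the observed mean fold.

    Assumed model (hypothetical): a fraction f of cells carries
    `fold_in_amplifying_cells` copies and the rest carry 1 copy, so
    mean_fold = (1 - f) * 1 + f * fold_in_amplifying_cells.
    """
    if fold_in_amplifying_cells <= 1.0:
        raise ValueError("fold_in_amplifying_cells must be greater than 1")
    return (mean_fold - 1.0) / (fold_in_amplifying_cells - 1.0)


# Maximum log2 ratio reported at 26A.
fold_26a = log2_ratio_to_fold(0.4024)
fraction_26a = amplifying_fraction(fold_26a)
print(f"fold enrichment: {fold_26a:.2f}")
print(f"fraction of cells under a 2-fold assumption: {fraction_26a:.2f}")
```

Under these assumptions, a log2 ratio of 0.4024 corresponds to roughly 1.32-fold enrichment. If amplifying cells carried a 2-fold increase, about a third of the cells would need to initiate replication at this region; a higher assumed fold in those cells would lower that fraction.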
One possibility is that high transcriptional activity of this region enables replication initiation to occur, but only in a subset (50% or less) of follicle cells. Because 0.5-fold enrichment would be difficult to assess by qPCR using individual probes, further investigation of these potential regions of replication initiation would require aCGH experiments. It would also be important to determine whether these low levels of DNA enrichment are eliminated when RNAPII activity is reduced, which could be achieved using transgenic flies with RNAi knock-down of RNAPII subunits or a temperature-sensitive allele of RpII215 (MORTIN and KAUFMAN 1982).

CG7337 expression and strain-specific amplification of DAFC-22B

DAFC-22B is unique in that a single 60 kb gene is located in the most amplified region. Surprisingly, strains that do and do not amplify the region show the same overall expression levels of CG7337. The transcription start site of the D isoform corresponds to the most amplified region, and when we observed higher levels of this isoform in non-amplifying OrR MOD staged egg chambers, it was appealing to hypothesize that expression of this isoform was responsible for inhibiting amplification. However, when we examined expression in purified follicle cells, we did not observe a difference in D isoform expression levels between the two strains. Although we attempted to examine the expression of CG7337 by in situ hybridization using multiple probes and both colorimetric and fluorescent detection, we were not able to detect expression in follicle cells. Because CG7337 expression may play a critical role in DAFC-22B amplification, it would be important to revisit the localization of this gene by in situ hybridization. We identified sequence differences between the closely related non-amplifying OrR MOD and amplifying OrR TOW strains and used P element mediated transformation to introduce these sequences into flies.
We are currently waiting to recover transformants of these sequences. By testing for ectopic amplification, we will be able to uncover what sequence differences, if any, are responsible for differential ORC binding and amplification. Furthermore, because DAFC-22B is the only amplicon known to bind ORC in another cell type (multiple cell culture lines), we are also testing whether this sequence, buffered with Suppressor of Hairy-wing binding sites, will enable gene amplification in follicle cells.

Specific histone acetylation and gene amplification

Using an amplification reporter system, we demonstrated that H4K8 acetylation is necessary for amplification of TT1 (the minimal sequence for DAFC-66D amplification). Although we demonstrated by ChIP-qPCR that H4K8 acetylation at the amplicons was similar to the pattern of tetra-acetylated H4, we did not examine H4K8 acetylation on a genome-wide scale. Tetra-acetylated H4 is enriched at many sites across the genome, and it is possible that H4K8 acetylation specifically marks sites of replication initiation. Additionally, we can test antibodies specific for H4K16 acetylation, as this reagent (verified for specificity) is now available. However, genome-wide localization of H4K8 acetylation (or other histone modifications) faces the same challenge as the localization of RNAPII by ChIP: in whole egg chambers, signal arising from follicle cells cannot be distinguished from signal arising from nurse cells. As with RNAPII, it will be important to perform ChIP-chip or ChIP-seq on both staged egg chambers, to retain developmental timing information, and purified follicle cell DNA, to ensure tissue specificity of the signal. These experiments will also reveal whether there are differences in H4 acetylation at the later egg chamber stages compared to stage 10, which may influence late amplification at DAFC-34B or DAFC-62D.

Investigating ORC-independent initiation at DAFC-34B

At DAFC-34B, there is no detectable ORC after stage 10 despite a late round of replication initiation.
In Chapter 3, we proposed several models to explain this observation. First, ORC may be required for late amplification, but it may undergo a significant structural rearrangement uniquely at DAFC-34B that masks the ORC2 antibody epitope. An ORC1 antibody is available, and although use of this antibody has not been reported for ChIP, we can test for developmental localization of ORC1. Additional models posit ORC-independent mechanisms of replication initiation. To address these models, conditional replication factors will be necessary. In Appendix 1, we report our strategy and preliminary results toward generating conditional replication factor mutants in ORC1, MCM6, and DUP/CDT1. Another model proposes that late initiation at DAFC-34B is due to break-induced replication. In yeast, Pol32 is uniquely required for break-induced replication (LYDEARD et al. 2007). By BLAST analysis, CG3975 is the top candidate for a Pol32 homolog in Drosophila. There are several P element insertion lines in CG3975, and it will be important to assess DAFC-34B amplification in these lines. Because gene amplification may produce double-stranded breaks at the elongating replication forks of other amplicons, it will be informative to test for enrichment of γ-H2Av by ChIP-chip as well as genome-wide aCGH analysis in CG3975 mutants.

Application of Drosophila genomic resources to the study of follicle gene amplification

In this thesis we have demonstrated that follicle cell gene amplification is a powerful in vivo model to study DNA replication. There are many questions to address at the level of individual amplicons. However, Drosophila offers many powerful genomic resources that can be used to further investigate follicle cell gene amplification and uncover replication regulatory mechanisms. There are several transgenic RNAi libraries in Drosophila.
In the process of studying DAFC-22B amplification, we discovered that this region is amplified in the genetic background of the VDRC collection (DIETZL et al. 2007). Such a collection makes a genome-wide screen for genes affecting amplification feasible. For example, by crossing RNAi lines to a ubiquitous follicle cell driver such as c323a, dissecting stage 13 egg chambers, and performing qPCR using probes for the DAFC-66D origin, a region 50 kb away from the DAFC-66D origin, or the DAFC-22B most amplified region, one could identify genes affecting amplification levels, replication elongation, and possibly DAFC-22B specific replication. It may be very informative to implement a candidate screen on groups of genes such as transcription factors, histone modifying enzymes, and chromatin remodeling factors to test the effectiveness of this strategy.

In addition to transgenic and molecular reagents, Drosophila has the advantage of having 12 species with genome sequences (CLARK et al. 2007). It will be informative to do comparative analysis of the amplicon regions to potentially identify DNA elements important for amplification based on sequence conservation. Additionally, we performed preliminary synteny analysis of the follicle cell amplicons and discovered that there is a break in synteny at DAFC-30B in D. pseudoobscura that makes it a useful tool for studying the sequence boundaries necessary for amplification (Appendix 2). In D. pseudoobscura, we observed that there was an increase in copy number at stage 10 and stage 13. Consistent with late amplification, our developmental analysis of ORC localization by ChIP-chip shows ORC binding in pooled stages 11 and 12 at DAFC-30B (Appendix 3). Because a complete replication profile at DAFC-30B has not been performed, it will be important to revisit this experiment to examine whether it also displays unique replication properties or late replication initiation.
Close molecular analysis of all six amplicons will provide the first comprehensive analysis of all replication origins in a metazoan cell type. These studies will provide an important picture of what properties are shared or distinct in the specification of ORC localization and replication initiation.

REFERENCES

CLARK, A. G., M. B. EISEN, D. R. SMITH, C. M. BERGMAN, B. OLIVER et al., 2007 Evolution of genes and genomes on the Drosophila phylogeny. Nature 450: 203-218.
DIETZL, G., D. CHEN, F. SCHNORRER, K. C. SU, Y. BARINOVA et al., 2007 A genome-wide transgenic RNAi library for conditional gene inactivation in Drosophila. Nature 448: 151-156.
KARNANI, N., C. M. TAYLOR, A. MALHOTRA and A. DUTTA, 2010 Genomic study of replication initiation in human chromosomes reveals the influence of transcription regulation and chromatin structure on origin selection. Mol Biol Cell 21: 393-404.
LYDEARD, J. R., S. JAIN, M. YAMAGUCHI and J. E. HABER, 2007 Break-induced replication and telomerase-independent telomere maintenance require Pol32. Nature 448: 820-823.
MACALPINE, H. K., R. GORDAN, S. K. POWELL, A. J. HARTEMINK and D. M. MACALPINE, 2010 Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20: 201-211.
MORTIN, M. A., and T. C. KAUFMAN, 1982 Developmental genetics of a temperature-sensitive RNA polymerase II mutation in Drosophila melanogaster. Mol Gen Genet 187: 120-125.
XIE, F., and T. L. ORR-WEAVER, 2008 Isolation of a Drosophila amplification origin developmentally activated by transcription. Proc Natl Acad Sci U S A 105: 9651-9656.

Appendix One: Strategy and preliminary results toward generation of conditional replication factor mutants in Drosophila

Jane C. Kim, Wendy M. Lam 1, James M. Berger 2, Stephen P. Bell 1, and Terry L. Orr-Weaver
1 Dept. of Biology, MIT and HHMI
2 Dept. of Molecular and Cell Biology, UC Berkeley

W.L. performed the yeast experiments. J.B.
identified candidate sites for TEV protease cleavage site insertion. J.K. performed the molecular cloning.

INTRODUCTION

In metazoans, one challenge of studying the functions of essential genes in a temporally specified manner is the difficulty of generating conditional null mutants. Some temperature-sensitive alleles have been isolated in mutagenesis screens, but this approach cannot be directed at specifically defined genes (GAZIOVA et al. 2004; SUZUKI et al. 1971). Recently, the Nasmyth lab developed an approach to generate a conditional mutant of the SCC1/RAD21 component of the essential cohesin complex in Drosophila (PAULI et al. 2008). By engineering a version of RAD21 with three tandem tobacco etch virus (TEV) protease cleavage sites that could complement a Rad21 null mutant, TEV protease expression could be induced to inactivate RAD21 function in a cell-type-specific and temporally controlled manner. This appendix describes the strategy and initial experiments to generate conditional mutants in Drosophila in essential replication initiation components, specifically the pre-replication complex (pre-RC) proteins ORC1, MCM6, and DUP/CDT1. Having these tools will enable the study of a number of key questions related to the developmental regulation of DNA replication, such as the requirement of ORC function for all initiation events during follicle cell amplification, the role of DUP/CDT1 during replication elongation, and the requirement for polyploidy in various Drosophila tissues. These applications will be described further in the Discussion.

RESULTS

To construct conditional mutants in Drosophila, our goal was to identify positions within the protein sequence at which the addition of three tandem TEV protease cleavage sites (amino acids ENLYFQG) would yield a protein that, when introduced in a transgenic fly, rescues a null mutant. Upon expression of TEV protease, the engineered protein should be completely inactivated.
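The construct design just described can be illustrated at the protein-sequence level with a short sketch. It is a hypothetical illustration only: the function name is invented here, the example window `KVTQK-SSNAN` is the CDT1 K450 target region listed in Table A1-1, and the sketch assumes that the dash in that notation marks the insertion point. It ignores any extra residues that the flanking restriction sites used for cloning would encode.

```python
TEV_SITE = "ENLYFQG"  # TEV protease recognition sequence given in the text


def insert_tandem_tev(window: str, copies: int = 3) -> str:
    """Insert tandem TEV sites at the dash in a window such as 'KVTQK-SSNAN'.

    Assumption: exactly one dash, marking the insertion point between
    the residues on its left and right.
    """
    if window.count("-") != 1:
        raise ValueError("window must contain exactly one '-' insertion marker")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    left, right = window.split("-")
    return left + TEV_SITE * copies + right


# Ten-residue window around the CDT1 K450 candidate from Table A1-1.
engineered = insert_tandem_tev("KVTQK-SSNAN")
print(engineered)
print(len(engineered))
```

Three tandem copies add 21 residues at the target position, which is why candidate sites were sought in regions predicted to tolerate an insertion of that size.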
Because this strategy requires a null mutant and the ability to rescue this mutant, we narrowed our choice of ORC and MCM subunit by selecting specific genes for which the criteria of null mutants and transgenic complementation were fulfilled: Orc1 and Mcm6 (PARK and ASANO 2008; SCHWED et al. 2002). For dup/Cdt1, it was previously shown that expression of dup cDNA driven by the UAS-GAL4 system failed to rescue the sterility or lethality of any mutant combination (CLAYCOMB 2004). One explanation is that dup expression is precisely regulated and mutants can only be rescued by a genomic construct and not via the ectopic expression inherent in the UAS-GAL4 system. We therefore decided to introduce a large BAC containing the dup genomic region into flies using site-specific integration (VENKEN et al. 2009). The 60 kb BAC CH321-89D23 (Figure A1-1) was injected into two lines, VK33 (65B2) and VK37 (22A3). These lines will require genetic testing to determine whether they rescue dup mutant alleles.

We decided to test different insertion positions for TEV protease cleavage sites in Orc1p, Mcm6p, and Cdt1p first in budding yeast. Structural analysis was used to predict positions between structured domains or unordered regions between ordered regions. The target sequences are listed in Table A1-1 by rank based on protein structure.

To test the candidate sites in budding yeast, we took the approach outlined in Figure A1-2. Two unique six-base-pair restriction sites were introduced to the target position by site-directed mutagenesis. Three tandem TEV protease cleavage sites were synthesized with the appropriate flanking restriction sites and cloned into the target position. The modified gene was integrated into the LEU2 locus. For the Orc1p and Mcm6p candidates, we could use "swapper strains" to test the ability of the modified gene to complement wild type function. These URA3- strains have ORC1 and MCM6 genomic deletions with URA3+ plasmids containing the wild type gene.
Selection of URA3- clones using 5-FOA will determine whether the modified gene is functional. For the Cdt1p candidate, we integrated the modified gene into a temperature-sensitive strain and tested for growth at the restrictive temperature. Using this approach, we found that the cdt1-K450 allele complemented wild type function and showed reduced growth upon galactose-induced expression of TEV protease. The equivalent mutation in Drosophila, K543, will be constructed as a candidate conditional replication factor mutant. The results and status of cloning for the other candidates are summarized in Table A1-2.

DISCUSSION

We have described the strategy and initial experiments to generate conditional mutants of Orc1, Mcm6, and dup/Cdt1 in Drosophila. Once generated, these tools will enable the study of a number of key questions related to the developmental regulation of DNA replication. We have shown that a new follicle cell amplicon, DAFC-34B, displays two rounds of DNA replication initiation separated by an elongation phase (Chapter 3). Although ORC is localized to the amplicon during the first stage of amplification, it is absent in subsequent stages despite the later round of origin activation, suggesting a possible ORC-independent mechanism for initiating DNA replication. By using a conditional allele, we will specifically inactivate ORC function after the first round of amplification to determine whether ORC is in fact necessary for the second round.

One of the advantages of using follicle cell gene amplification in Drosophila as a model to study metazoan DNA replication is that the process can be directly visualized. Using immunofluorescence and detection of newly incorporated bromodeoxyuridine (BrdU), a nucleotide analog, DUP/CDT1 was found to co-localize with elongating replication forks (CLAYCOMB et al. 2002).
Given the unique onionskin structure of amplicons, with replication bubbles within replication bubbles and the possible head-to-tail collision of replication forks, it was proposed that DUP/CDT1 might serve as a processivity factor for the MCM helicase or facilitate continuous helicase reloading at these slow replication forks (CLAYCOMB et al. 2002). A conditional mutant of DUP/CDT1 will allow its inactivation after replication initiation has occurred and permit the study of what role this pre-RC component has during replication elongation.

Many plant and animal cells use endocycles as a developmental strategy to increase DNA content (EDGAR and ORR-WEAVER 2001). This increase in nuclear DNA typically corresponds to a proportional increase in cell size and increased metabolic activity. Studies in Drosophila have implicated Notch signaling as a key pathway in regulating the mitotic-to-endocycle switch, and several target genes have been identified (DENG et al. 2001; SUN and DENG 2005). Because Notch signaling also plays a critical role in cell proliferation, it is difficult to use Notch inactivation to study the direct impact of inhibiting the endocycles without perturbing other developmental processes. Conditional replication factor mutants will allow replication to be inhibited after the mitotic cycles have occurred and the normal cell number reached to determine the importance of increased ploidy on cellular and tissue function. Because these conditional mutants will employ the use of GAL4 drivers to induce protein inactivation, tissue-specific drivers can be used to determine if increased ploidy plays differential roles in various Drosophila tissues.

Several questions regarding ORC function during gene amplification, DUP/CDT1 function during replication elongation, and the importance of polyploidy in different Drosophila tissues remain unstudied because of the lack of appropriate experimental tools.
The Nasmyth lab has shown that conditional mutants of a specific gene can be generated in Drosophila. Finding the appropriate position in the ORC1, MCM6, and DUP/CDT1 proteins that will permit insertion of the TEV protease cleavage sites and be accessible to TEV protease cleavage will allow these important questions to be addressed.

REFERENCES

CLAYCOMB, J. M., 2004 Gene Amplification in Drosophila Ovarian Follicle Cells as a Developmental Strategy and Model for Metazoan DNA Replication, pp. 203 in Biology. MIT, Cambridge, MA.
CLAYCOMB, J. M., D. M. MACALPINE, J. G. EVANS, S. P. BELL and T. L. ORR-WEAVER, 2002 Visualization of replication initiation and elongation in Drosophila. J Cell Biol 159: 225-236.
DENG, W. M., C. ALTHAUSER and H. RUOHOLA-BAKER, 2001 Notch-Delta signaling induces a transition from mitotic cell cycle to endocycle in Drosophila follicle cells. Development 128: 4737-4746.
EDGAR, B. A., and T. L. ORR-WEAVER, 2001 Endoreplication cell cycles: more for less. Cell 105: 297-306.
GAZIOVA, I., P. C. BONNETTE, V. C. HENRICH and M. JINDRA, 2004 Cell-autonomous roles of the ecdysoneless gene in Drosophila development and oogenesis. Development 131: 2715-2725.
PARK, S. Y., and M. ASANO, 2008 The origin recognition complex is dispensable for endoreplication in Drosophila. Proc Natl Acad Sci U S A 105: 12343-12348.
PAULI, A., F. ALTHOFF, R. A. OLIVEIRA, S. HEIDMANN, O. SCHULDINER et al., 2008 Cell-type-specific TEV protease cleavage reveals cohesin functions in Drosophila neurons. Dev Cell 14: 239-251.
SCHWED, G., N. MAY, Y. PECHERSKY and B. R. CALVI, 2002 Drosophila minichromosome maintenance 6 is required for chorion gene amplification and genomic replication. Mol Biol Cell 13: 607-620.
SUN, J., and W. M. DENG, 2005 Notch-dependent downregulation of the homeodomain gene cut is required for the mitotic cycle/endocycle switch and cell differentiation in Drosophila follicle cells. Development 132: 4299-4308.
SUZUKI, D. T., T. GRIGLIATTI and R.
WILLIAMSON, 1971 Temperature-sensitive mutations in Drosophila melanogaster. VII. A mutation (para-ts) causing reversible adult paralysis. Proc Natl Acad Sci U S A 68: 890-893.
UHLMANN, F., D. WERNIC, M. A. POUPART, E. V. KOONIN and K. NASMYTH, 2000 Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell 103: 375-386.
VENKEN, K. J., J. W. CARLSON, K. L. SCHULZE, H. PAN, Y. HE et al., 2009 Versatile P[acman] BAC libraries for transgenesis studies in Drosophila melanogaster. Nat Methods 6: 431-434.

Figure A1-1. 60 kb BAC containing the dup genomic region.
[Genome browser image not recoverable. Labels that survive: BAC CH321-89D23; genes CG34365, Pms2, Mtk, CG30472, CG34188.]

Figure A1-2. Experimental Strategy Flow Chart.
(1) Move gene region into pBluescript
(2) Introduce two unique six pair restriction sites to target sequence by site-directed mutagenesis (SphI---NheI)
(3) Clone three tandem TEV protease cleavage site sequences (generated by custom DNA synthesis) into the target sequence (SphI-3xTEVseq-NheI)
(4) Move gene region with TEV protease cleavage sites into integrating plasmid [diagram labels: ChrV, ChrXI]
(5) Transform integrating plasmid into swapper strain (Orc1, Mcm6) or temperature-sensitive strain (Cdt1)
(6) Test for loss of covering plasmid (Orc1, Mcm6) or growth at restrictive temperature (Cdt1). Growth on 5-FOA?
(7) Test for reduced viability upon expression of TEV protease

Table A1-1. Conditional replication mutant candidates.

ORC1 Conditional Mutant Candidates
Rank  Target Amino Acid  Region        Notes
1     E290               EDEEE-DEDEE   342aa N-terminal deletion viable
2     L30                GGQKRL-RRRGA  342aa N-terminal deletion viable
2     E250               ITDNE-DGNE    342aa N-terminal deletion viable
2     D760               KAKDD-NDDDD   Completed to Step 7. Equivalent growth with TEV protease expression
3     T413               LKTT-QKHQ     Completed to Step 2
3     N644               KGLN-DSFF     Completed to Step 2
3     S700               ASVS-GDAR     Completed to Step 2
3     E747               YDDE-DKDL     Completed to Step 2

MCM6 Conditional Mutant Candidates
Rank  Target Amino Acid  Region        Notes
1     K10                LNHVK-KVDDV   N-terminal deletion viable
1     S469               NIGAS-SPDAN   Completed to Step 3
2     A365               IQENA-NEIPT   Completed to Step 3
2     G686               ANPVG-GRYNR   Completed to Step 3

CDT1 Conditional Mutant Candidates
Rank  Target Amino Acid  Region        Notes
1     K450               KVTQK-SSNAN   Completed to Step 7. Reduced viability with TEV protease expression
2     S70                PDTS-QGFD     Completed to Step 7. Equivalent growth with TEV protease expression

Table A1-2. Yeast Strains Used in this Study.

Strain: 9127 (UHLMANN et al. 2000)
Description: Galactose-inducible TEV protease
Relevant Genotype: MATα, SCC1-HA3::HIS3, GAL-NLS-myc9-TEV-NLS2 x 10::TRP1

Strain: AIAy19 (SPB)
Description: Orc1 swapper
Relevant Genotype: MATa, ade2-1, ura3-1, his3-11,15, trp1-1, leu2-3,112, can1-100, orc1::hisG, pSPB16 (…1, URA3)

Strain: ASY2157 (SPB)
Description: Mcm6 swapper
Relevant Genotype: MATa, ade2-1, ura3-1, his3-11,15, leu2-3, can1-100, trp1-

Strain: E1541 (JACOBSON et al. 2001)
Description: Cdt1 ts-allele
Relevant Genotype: MATa, ade2-1, trp1-1, can1-100, leu2-3,112, his3-11,15, ura3, GAL, psi+, sid2-21 (cdt1-ts)

[The remaining rows are garbled in extraction and their alignment is not recoverable. Strain names as extracted: YWL8, YWL9, YWLJ, YWLc2, YWL94. Genotype fragments as extracted, with character-level repairs only:]
MATa, orc1 D760 TEV::LEU2, orc1::hisG
cdt1-ts, GAL-NLS-myc9-TEV-NLS2 x 10::TRP1
GAL-NLS-myc9-TEV-NLS2 x 10::TRP1
MATa, orc1 D760 TEV::LEU2, orc1::hisG, GAL-NLS-myc9-TEV-NLS2 x 10::TRP1
MATa, cdt1-ts, GAL-NLS-myc9-TEV-NLS2 x 10::TRP1, cdt1 K450 TEV::LEU2

REFERENCES

JACOBSON, M. D., C. X. MUNOZ, K. S. KNOX, B. E. WILLIAMS, L. L. LU et al., 2001 Mutations in SID2, a novel gene in Saccharomyces cerevisiae, cause synthetic lethality with sic1 deletion and may cause a defect during S phase. Genetics 159: 17-33.
UHLMANN, F., D. WERNIC, M. A. POUPART, E. V. KOONIN and K. NASMYTH, 2000 Cleavage of cohesin by the CD clan protease separin triggers anaphase in yeast. Cell 103: 375-386.
Appendix Two: Synteny analysis of the DAFC-30B amplified region

Gene amplification in Drosophila is a powerful model for studying metazoan DNA replication. In addition to the genetic and molecular tools available, 12 Drosophila species have been sequenced, enabling comparative genomic analysis of functional DNA elements (CLARK et al. 2007). Although primarily investigated in Drosophila melanogaster, follicle cell gene amplification has been demonstrated in at least 14 Drosophila species as well as the Mediterranean fruit fly Ceratitis capitata (CALVI et al. 2007; VLACHOU et al. 1997).

We analyzed gene synteny in the six amplified genomic regions. At DAFC-30B, we observed a break in synteny between D. melanogaster and D. pseudoobscura that prompted closer investigation of this genomic region (Figure A2-1A). Because the break occurred in the most amplified region, we tested whether amplification of DAFC-30B was conserved in D. pseudoobscura by quantitative real-time PCR. Additionally, we tested whether the region distal to the break was amplified in D. pseudoobscura. We observed stage-specific amplification of DAFC-30B in D. pseudoobscura (Figure A2-1B). However, we did not observe amplification of genes distal to the break (Figure A2-1C). These experiments reveal that the CG17855 homolog (GA14701) is the left-most boundary of the sequence required for amplification. Furthermore, analysis of syntenic breaks may be a useful method to define the cis requirements for amplification in different Drosophila species.

ACKNOWLEDGEMENTS

We thank Matt Rasmussen and Manolis Kellis (MIT) for assistance with visualizing synteny in Drosophila species, which led to the observations reported here.

REFERENCES

CALVI, B. R., B. A. BYRNES and A. J. KOLPAKAS, 2007 Conservation of epigenetic regulation, ORC binding and developmental timing of DNA replication origins in the genus Drosophila. Genetics 177: 1291-1301.
CLARK, A. G., M. B. EISEN, D. R. SMITH, C. M. BERGMAN, B.
OLIVER et al., 2007 Evolution of genes and genomes on the Drosophila phylogeny. Nature 450: 203-218.
VLACHOU, D., M. KONSOLAKI, P. P. TOLIAS, F. C. KAFATOS and K. KOMITOPOULOU, 1997 The autosomal chorion locus of the medfly Ceratitis capitata. I. Conserved synteny, amplification and tissue specificity but sequence divergence and altered temporal regulation. Genetics 147: 1829-1842.

Figure A2-1. Analysis of gene synteny and DAFC-30B amplification in D. pseudoobscura. In D. pseudoobscura, there is a break in synteny at DAFC-30B to the left of CG17855 (A). The region to the right of the break is designated Dp DAFC-30B, and the region to the left of the break is Dp Distal Region. (B and C) qPCR quantification of genomic DNA from egg chambers staged as in Calvi et al. DNA copy number is quantified relative to DpAct5c, which we assume to be non-amplified. At Dp DAFC-30B, there is stage-specific gene amplification (B). The Dp Distal Region is not amplified (C).
[Figure image not recoverable. Tracks and panel titles that survive: Dm aCGH; Dm genes; D. pseudoobscura synteny; Gene Amplification in Dp DAFC-30B; Gene Amplification in Dp Distal Region. Genes labeled: GA10336 (CG10473), GA25312 (CG33298), GA14701 (CG17855), GA17702 (Oatp30B), GA12056 (CG13114).]

Appendix Three: Summary of follicle cell amplicons
Figure A3-1

Amplicon   Expressed genes          Maximum Amplification  Identification Reference
DAFC-7F    Cp36, Cp38...            15-20 fold             Spradling 1981
DAFC-22B   CG7337                   4 fold                 This Thesis
DAFC-30B   CG13113, CG13114...      4 fold                 Claycomb et al. 2004
DAFC-34B   Vm34Ca, CG16956...       6-8 fold               This Thesis
DAFC-62D   yellow-g, yellow-g2...   4 fold                 Claycomb et al. 2004
DAFC-66D   Cp15, Cp18...            60-80 fold             Spradling 1981

DAFC-22B             DAFC-34B                 DAFC-62D
st10B initiation     st10B initiation         st10B initiation
st11 initiation      st11 elongation          st11 elongation
st12-13 elongation   st12 final initiation    st12 elongation
                     st13 elongation          st13 final initiation

Figure A3-2
[Plots not recoverable. Panel titles that survive: DAFC-30B, DAFC-22B, DAFC-34B, DAFC-62D, DAFC-66D, DAFC-7F.]