The analysis of novel distal Cebpa enhancers and silencers using a transcriptional model reveals the complex regulatory logic of hematopoietic lineage specification

Eric Bertolino (a,*), John Reinitz (a,b,c), Manu (c,d,*)

(a) Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
(b) Department of Statistics, The University of Chicago, Chicago, IL 60637, U.S.A.
(c) Department of Ecology and Evolution and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
(d) Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, U.S.A.
* Corresponding authors. E-mail addresses: manu.manu@und.edu (M) and eric.bertolino@gmail.com (EB).

Abstract
C/EBPα plays an instructive role in the macrophage-neutrophil cell-fate decision, and its expression is necessary for neutrophil development. How Cebpa itself is regulated in the myeloid lineage is not known. We decoded the cis-regulatory logic of Cebpa and two other myeloid transcription factors, Egr1 and Egr2, using a combined experimental-computational approach. With a reporter design capable of detecting both distal enhancers and silencers, we analyzed 46 putative cis-regulatory modules (CRMs) in cells representing myeloid progenitors and derived early macrophages or neutrophils. In addition to novel enhancers, this analysis revealed a surprisingly large number of silencers. We determined the regulatory roles of 15 potential transcriptional regulators by testing 32,768 alternative sequence-based transcriptional models against CRM activity data. This comprehensive analysis allowed us to infer the cis-regulatory logic of most of the CRMs. Silencer-mediated repression of Cebpa was found to be effected mainly by TFs expressed in non-myeloid lineages, highlighting a previously unappreciated contribution of long-distance silencing to hematopoietic lineage resolution.
The repression of Cebpa by multiple factors expressed in alternative lineages suggests that hematopoietic genes are organized into densely interconnected repressive networks instead of hierarchies of mutually repressive pairs of pivotal TFs. More generally, our results demonstrate that de novo cis-regulatory dissection is feasible on a large scale with the aid of transcriptional modeling.

Keywords: cell fate; gene regulation; hematopoiesis; silencers; transcriptional modeling

Introduction
The spatiotemporal expression of genes is encoded in the genome by cis-regulatory sequences, which may be located tens to hundreds of kilobases from the transcription start site (Carey et al., 2008; Spitz and Furlong, 2012). It is usually possible to define cis-regulatory modules (CRMs) empirically as sequences that act as enhancers (Banerji et al., 1983; Banerji et al., 1981) or silencers (Brand et al., 1985; Ogbourne and Antalis, 1998) of core-promoter activity in reporter assays. The activity of CRMs results from sequence-specific transcription factors (TFs) that bind to their recognition sites and recruit cofactors, which interact with the RNA polymerase II holoenzyme complex or remodel chromatin (Spitz and Furlong, 2012). Careful analysis of the CRMs of a few well-characterized genes (Fromental et al., 1988; Göttgens et al., 2002; Ondek et al., 1988; Schirm et al., 1987; Small et al., 1992; Wilson et al., 2011; Yuh et al., 1998) has revealed how the internal composition and structure of CRMs (the arrangement of transcription factor binding sites (TFBS), the TFs binding to them, and the interactions between bound TFs) encode the pattern of gene expression. For the vast majority of genes, however, both the identities of CRMs and their cis-regulatory logic remain unknown. Determining the cis-regulatory logic of individual genes is an important goal of functional genomics (Nam et al., 2010).
First and foremost, determining the cis-regulatory logic of individual genes is a prerequisite for constructing high-quality gene regulatory networks (GRNs) (Levine and Davidson, 2005; Singh et al., 2014) and modeling them predictively. Second, even though putative rules of cis-regulation have been inferred from the analysis of a few genes (Cantor and Orkin, 2002; Göttgens et al., 2002; Kim et al., 2013; Small et al., 1993; Wilson et al., 2011), testing their generality requires repeating such analyses on a much larger scale.

Transcriptional regulation is an input-output problem. The key to unscrambling cis-regulatory logic is to map inputs (TF concentrations) to output (rate of transcription), conditioned by regulatory sequence. A necessary requirement for successfully decoding regulatory logic is therefore to include all three: TF concentrations, DNA sequence, and transcriptional output. Mainstream genomic approaches, such as chromatin immunoprecipitation followed by sequencing (ChIP-Seq), RNA-seq, and massively parallel reporter assays (Arnold et al., 2013; Levo and Segal, 2014; Melnikov et al., 2012; Nam et al., 2010; Sharon et al., 2012), assay either input or output but not both. Cis-regulatory decoding therefore requires methods that include all three components.

More than mapping an input to an output, transcriptional regulation is a problem of mapping multiple inputs to a single output, since CRMs are regulated by multiple interacting TFs. For example, the CRM driving expression of the second stripe of the Drosophila even-skipped gene is bound by 7 TFs at about 20 binding sites (Arnosti et al., 1996b; Janssens et al., 2006; Small et al., 1992). Binding of CRMs by multiple TFs is widespread: studies in multiple cellular contexts, including the hematopoietic system (Heinz et al., 2010; Wilson et al., 2010a), have detected combinatorial binding of lineage-specifying TFs.
More generally, the ENCODE and modENCODE projects (ENCODE Project Consortium et al., 2012; Gerstein et al., 2012; Gerstein et al., 2010) have identified Highly Occupied Targets (HOTs), DNA sequences occupied by multiple TFs, which occur more frequently than expected by chance (Nègre et al., 2011). TFs interact in complex ways to control the spatiotemporal program of gene expression. Many activators are known to promote gene expression synergistically, TFs can bind cooperatively, and repressors interfere with activator function (Arnosti et al., 1996a; Cantor and Orkin, 2001; Cantor and Orkin, 2002; He et al., 2012a; Heinz et al., 2010; Kulkarni and Arnosti, 2005; Small et al., 1993; Small et al., 1996). Multiple interacting inputs make large-scale cis-regulatory inference challenging, since there is no straightforward correspondence between TF binding and gene expression (Calero-Nieto et al., 2014). Addressing this complexity of cis-regulation requires a computational attack on the problem.

Here, we present a new approach for reverse engineering the cis-regulatory logic of a target gene. Our approach overcomes the challenge of regulatory complexity by integrating multiple datasets (evolutionarily conserved non-coding DNA sequence, genome-wide gene expression data, TF binding preferences, and reporter activity data) using a transcriptional model that explicitly simulates mechanisms of TF interaction. Our premise is that datasets assaying multiple aspects of gene regulation, in combination with the rules of gene regulatory interaction encapsulated in the model, will constrain the number of cis-regulatory schemes consistent with the activity data. Our transcriptional model is of the so-called "thermodynamic" type.
Thermodynamic models have been used to quantitatively predict CRM activity during development (He et al., 2012b; Janssens et al., 2006; Kazemian et al., 2010; Kim et al., 2013; Reinitz et al., 2003; Segal et al., 2008; Zinzen et al., 2006). In contrast to previous applications of such models, in which the key TFs and their functional roles were known from earlier work, here we use the model to learn them from datasets probing multiple aspects of gene regulation.

The reverse engineering approach (Fig. 1) relies on four elements: 1) DNA sequences of a large number of putative CRMs, 2) estimates of TF concentrations, 3) quantitative measurements of CRM activity, and 4) a transcriptional model that takes CRM sequence and TF concentrations as input to compute a prediction of CRM activity. We identify TFs likely to regulate the set of CRMs under consideration based on the presence of TF binding sites and on their expression patterns. We then formulate models that simulate the regulation of each CRM by the candidate TFs, taking into account TF binding, interactions, and functional roles. In this work we allow for two functional roles, activation or repression, which are not known beforehand. At this point, our approach deviates from previous applications of transcriptional models. In order to learn the functional roles of the TFs, we construct $2^N$ models, where $N$ is the number of TFs included in the model (Fig. 1). Each model is then fit to CRM activity data using simulated annealing (Kim et al., 2013; Lam and Delosme, 1988a, b). The composition of the best-fitting models then implies the TF functional roles consistent with the CRM activity data. At the end of the process, we arrive at specific predictions of the TFs regulating each CRM, their binding sites, and whether they activate or repress their targets: the cis-regulatory logic of the CRM.
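The combinatorial construction of the model family can be illustrated with a short Python sketch. It assumes, as in the text, exactly two roles per TF. The TF names and the integer labelling in the hypothetical helper `model_index` are illustrative only; they do not reproduce the model numbers used in this study (e.g., 12058 or 81762).

```python
from itertools import product


def enumerate_role_models(tf_names):
    """Yield every assignment of 'activator' or 'repressor' to the given TFs.

    For N TFs this yields 2**N role assignments, one per model realization.
    """
    for roles in product(("activator", "repressor"), repeat=len(tf_names)):
        yield dict(zip(tf_names, roles))


def model_index(assignment, tf_names):
    """Hypothetical integer label: bit i is set when TF i is assigned a repressor role."""
    return sum(1 << i for i, tf in enumerate(tf_names) if assignment[tf] == "repressor")


tfs = [f"TF{i}" for i in range(1, 16)]  # placeholder names for 15 candidate TFs
models = list(enumerate_role_models(tfs))
print(len(models))  # 32768
```

Each dictionary would then parameterize one structurally distinct model to be fit independently; the exhaustive enumeration is practical here because $2^{15}$ is small enough for every realization to be optimized.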
We applied the reverse-engineering methodology to determine the cis-regulatory logic of genes encoding TFs involved in hematopoietic cell-fate specification. Cell-fate choice during hematopoiesis is known to depend on the expression levels of specific transcription factors. For example, expression of the Ets-family transcription factor PU.1, encoded by spleen focus forming virus proviral integration oncogene (Sfpi1), is necessary for myeloid and lymphoid development (Scott et al., 1994). Although PU.1 is expressed in both lineages, it specifies B-cell and macrophage fates in a concentration-dependent manner (DeKoter and Singh, 2000). Lineage-specifying TFs exert their effects in two main ways. First, they regulate the expression of genes encoding other TFs (Laslo et al., 2008), forming transcriptional networks. Second, hematopoietic TFs regulate the expression of cytokine receptors (DeKoter et al., 2002; Zhang et al., 1997), allowing progenitor cells to respond to extracellular signals in order to escape cell death, enter or exit the cell cycle, or move to the next stage of differentiation (Bertolino et al., 2005 and references therein).

We considered three genes, CCAAT/enhancer binding protein, alpha (Cebpa), early growth response 1 (Egr1), and early growth response 2 (Egr2), which participate in a GRN directing the macrophage-neutrophil cell-fate choice. Cebpa-/- mutant mice lack mature neutrophils, and their multipotential progenitors do not express granulocyte colony stimulating factor receptor (Zhang et al., 1997). PU.1 and C/EBPα promote the macrophage and neutrophil fates by upregulating antagonists of the alternative cell fate, Egr1/Egr2 and growth factor independent 1 (Gfi1), respectively (Dahl et al., 2003; Laslo et al., 2006). Egr2 and Gfi1 also repress each other, forming a mutually antagonistic GRN.
This GRN has been suggested to function as a bistable switch that selects a macrophage state at high PU.1 levels and a neutrophil state at high C/EBPα levels (Laslo et al., 2006). Although this model treats PU.1 and C/EBPα as autonomous inputs, their own regulation is clearly not independent of the cell-fate decision. For example, PU.1 positively regulates its own expression both indirectly, by promoting longer cell cycles that cause increased accumulation of its protein product (Kueh et al., 2013), and directly, by binding two distal CRMs (Leddin et al., 2011; Li et al., 2001). Cebpa is also known to be regulated by C/EBPα and other C/EBP family members (Legraverend et al., 1993), which bind to a 350bp promoter region upstream of the transcription start site (TSS). An enhancer located 37kb downstream of the Cebpa gene has recently been identified (Guo et al., 2014; Guo et al., 2012). It is activated by several TFs, including PU.1, RUNX1, and C/EBPα (Cooper et al., 2015). These results suggest that the GRN guiding myeloid differentiation has yet to be fully explored.

In an effort to uncover new regulatory links participating in the macrophage-neutrophil decision, we undertook a systematic cis-regulatory dissection of the Cebpa, Egr1, and Egr2 loci. We identified and analyzed a total of 46 putative CRMs, which were assayed in the Sfpi1-/- PU.1-inducible estrogen receptor (PUER) cell line (Walsh et al., 2002). PUER cells are blocked at a progenitor state and can be differentiated into either macrophage- or neutrophil-like cells by inducing translocation of the PU.1 (PUER) protein into the nucleus with 4-hydroxy-tamoxifen (OHT) (Dahl et al., 2003; Laslo et al., 2006; Walsh et al., 2002). We generated quantitative Luciferase reporter activity data in uninduced cells with the IL-3 cytokine (progenitor stage), OHT-induced cells with IL-3 (early macrophage), and OHT-induced cells with the G-CSF cytokine (early granulocyte).
These assays identified several CRMs that enhanced or diminished the activity of the proximal promoter, as well as apparently inactive sequences. The transcriptional output data were matched with TF concentration input data from a genome-wide microarray gene expression dataset (Laslo et al., 2006) acquired under the same conditions. These data and the model were used to reverse engineer the cis-regulation of Cebpa, Egr1, and Egr2. We evaluated the regulation of these CRMs by 15 candidate TFs in parallel and constructed $2^{15} = 32{,}768$ alternative models to test functional roles. Predicted TFs were validated against prior evidence and ChIP datasets deposited in the NCBI Gene Expression Omnibus.

Our analysis shows that Cebpa has a surprisingly complex regulatory logic, integrating inputs from multiple activators and repressors. We found that the Cebpa proximal promoter and enhancing CRMs are activated primarily by TFs expressed in the myeloid lineage (C/EBP family members, PU.1, and Egr1), implying that, in addition to upstream TFs, the gene is regulated by its own targets in a positive feedback loop topology. In contrast, Cebpa is repressed primarily by TFs expressed in other hematopoietic lineages, suggesting that cross-lineage antagonism is widespread and not limited to the pairwise interactions modeled in bistable switch models (Huang et al., 2007; Laslo et al., 2006). This study extends the utility of transcriptional models beyond systems where the TFs and their functional roles are already known and demonstrates the feasibility of reverse engineering cis-regulatory logic on a larger scale.

Materials and Methods

Cell Culture
We used Sfpi1-/- cells expressing conditionally activatable PU.1 protein (PUER) that can be differentiated into macrophages or neutrophils by PU.1 activation (Dahl et al., 2003; Laslo et al., 2006; Walsh et al., 2002). PUER cells were routinely maintained in complete Iscove's modified Dulbecco's medium (IMDM) containing 5 ng/ml IL-3.
PUER cells were differentiated into macrophages by adding 100nM 4-hydroxy-tamoxifen (OHT). PUER cells were differentiated into neutrophils by adding OHT in the presence of 10 ng/ml Granulocyte Colony Stimulating Factor (G-CSF).

Identification of CRMs
We downloaded pairwise alignments of the mm9 (mouse) and canFam2 (dog) genomes, produced by the blastz tool (http://www.bx.psu.edu/miller_lab/), from the UCSC genome browser (http://genome.ucsc.edu). We computed the mean sequence identity in the 101bp window surrounding each nucleotide position. A threshold of 0.7 was applied to the mean sequence identity to delineate conserved regions. Regions containing at least one conserved region were identified as putative CRMs. The sequences of the assayed CRMs are provided in FASTA format in Supplementary Text S2.

Reporter design
Putative CRMs were cloned into a pGL3 Luciferase reporter vector (Promega). The proximal promoter was introduced into the multiple cloning site (MCS) of pGL3. The distal CRMs were inserted into a SalI site downstream of the SV40 late poly(A) signal. The intervening sequence was 2828bp in length and consisted of pGL3 backbone including the beta-lactamase gene (see Text S2 for sequence). Since the lengths of the promoter regions differed among Cebpa, Egr1, and Egr2, the distance between the distal elements and the TSS was different for each gene. The 3' ends of the distal CRMs were located 4,022bp, 3,241bp, and 3,352bp from the TSSs of Cebpa, Egr1, and Egr2, respectively.

Reporter assays
PUER cells were plated and cultured overnight. Cells were subsequently transiently transfected with the reporter vector and a Renilla reporter vector (Promega) using the Fugene transfection reagent (Roche) according to the manufacturer's instructions. After 24 h, the cells were washed and lysed, and both firefly and Renilla luciferase activities were measured using a dual luciferase activity kit (Promega). Transfections were performed in duplicate in all conditions.
For Cebpa CRMs, assays were performed in uninduced PUER cells in IL-3 (progenitor), in the presence of IL-3 and OHT (early macrophage), and in the presence of G-CSF and OHT (early granulocyte). Egr1 and Egr2 CRMs were assayed only in progenitor and macrophage conditions. Firefly luciferase activity was normalized to Renilla activity to control for sample-to-sample variation in transfection efficiency.

Sequence-based model of transcription
A model is constructed by identifying candidate TFs and specifying three inputs: 1) the DNA sequence of the reporter constructs, 2) estimates of the concentrations of the included TFs, and 3) position weight matrices (PWMs) of the included TFs.

Identification of candidate TFs. We used the "Match" tool of the transcription factor database TRANSFAC (Matys et al., 2006) to search the CRM sequences for TFBS of TFs known to regulate immune-specific genes. There were 62 immune-specific factors, listed in Table S1, having at least one predicted TFBS in the sequences. Based on a literature search, we further subdivided these TFs into those implicated in myeloid-specific gene regulation and those not yet implicated, which we refer to as "non-myeloid" for convenience (Table S1). To measure differential expression, we computed the standard deviation (Fig. S1) of each TF's expression across the uninduced, IL3+OHT, and GCSF+OHT conditions in the Laslo et al. dataset (Laslo et al., 2006).

TF concentrations. We estimated TF concentrations using microarray gene expression measurements from PUER cells in uninduced, IL3+OHT, and GCSF+OHT conditions reported by Laslo et al. (Laslo et al., 2006) (Fig. S2). For genes with multiple probes, we chose the probe with the highest mean intensity over the three cell types to represent the gene's expression level in the model (Table S3).

DNA sequences. In order to accurately represent the distances between binding sites and the TSS, we modeled the CRM, vector, and proximal promoter sequences as they appear in the reporter (Fig. 3D).
Binding sites detected in the vector sequence were not included in the computation, as they are presumed to be nonfunctional.

PWMs. We obtained PWMs from the TRANSFAC (Matys et al., 2006) (http://www.biobaseinternational.com/product/transcription-factor-binding-sites) and JASPAR (Mathelier et al., 2014) (http://jaspar.genereg.net) databases. We evaluated a total of 88 factor-specific and pan-family PWMs for the 19 TFs modeled in this study. While choosing PWMs, we considered two issues affecting their quality. The first is that PWMs derived from a small number of bound sequences can be biased toward the base composition of the founder sites as well as favor high-affinity sites. The second is that PWMs derived by pooling sites bound by multiple members of a TF family may be non-specific and exhibit a high false positive rate. When considering a large number of PWMs and TFs, it is not practical to determine the provenance of each PWM individually. We therefore developed an empirical criterion to identify high-quality PWMs. The first PWM property included in the criterion is the affinity of the highest-scoring binding site in the CRMs relative to the consensus (highest-affinity) site. This is evaluated as the affinity factor $1/(S_{\mathrm{cons}} - S_{\mathrm{max}} + 1)$, where $S_{\mathrm{cons}}$ and $S_{\mathrm{max}}$ are the scores of the consensus site and of the highest-scoring site among the CRMs, respectively. This factor increases with $S_{\mathrm{max}}$, reaching a maximum value of 1 if the CRMs contain one or more consensus sites. Low values imply that the PWM detects only weak sites in the CRMs and indicate a potential high-affinity or founder-sequence bias in the PWM. To address the second quality issue, the inability to discriminate between CRMs, we computed the difference in scores between the top-scoring sites of the 1st and 5th CRMs, denoted by $\Delta$. Low values of $\Delta$ imply that the PWM has lower specificity and a higher false positive rate. We multiplied the affinity factor and $\Delta$ to obtain the quality criterion

$$Q = \frac{\Delta}{S_{\mathrm{cons}} - S_{\mathrm{max}} + 1}.$$

PWMs with higher values of $Q$ detect stronger sites and better discriminate between the CRMs. With two exceptions, we represented each TF with the PWM having the highest value of the quality criterion. The exceptions were C/EBPα and C/EBPδ, for which a TRANSFAC pan-family PWM, CEBP_Q2, had the highest score. To discriminate between individual members of the family, we instead chose the highest-scoring factor-specific PWM. For GATA, the pan-family PWM GATA_Q6 had a very low quality score, and we used the highest-quality factor-specific PWM, GATA3_02. The results are robust with respect to different GATA family PWMs (Fig. S12; see below). The PWMs chosen are listed in Table S2.

The CRM sequences were scored with PATSER (Hertz et al., 1990). We chose thresholds low enough that weak sites would be included in the model; sites having a binding affinity of at least 0.07 of that of the consensus site were included (Table S2).

The model results are robust with respect to the choice of PWM. In model 81762, we replaced 1) the GATA3_02 PWM with a GATA1, a GATA2, or a pan-family GATA PWM and 2) the C/EBPα/C/EBPδ factor-specific PWMs (Table S2) with the CEBP_Q2 pan-family PWM. The PWMs were substituted one at a time, and the modified models were fit to data while representing the associated TF (GATA, C/EBPδ, or C/EBPα) as an activator or a repressor. In all cases the same role was inferred as in 81762, and the models using alternative PWMs agreed very well with 81762 (Fig. S12; $r^2 \geq 0.87$). The scores were, however, slightly higher in the modified models (Fig. S12), validating our PWM quality control.

Global nonlinear optimization
We generated $2^{15}$ models encapsulating all possible combinations of the regulatory roles of the TFs (Fig. 1). We determined the free parameters of each model using Lam Simulated Annealing (LSA) (Reinitz and Sharp, 1995) as described previously (Janssens et al., 2006), in 5 replicates.
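The PWM quality criterion $Q$ defined in the PWMs paragraph above can be computed with a few lines of Python. This is a minimal sketch that assumes the PATSER score of the consensus site and the top-scoring site of each CRM are already available, and it reads "the 1st and 5th CRMs" as the CRMs ranked first and fifth by their top-scoring site (our interpretation). The function names and the example PWM names are hypothetical.

```python
def pwm_quality(consensus_score, best_site_per_crm):
    """Return the quality criterion Q = Delta / (S_cons - S_max + 1).

    consensus_score   -- score of the consensus (highest-affinity) site, S_cons
    best_site_per_crm -- score of the top-scoring site in each CRM
    Assumes Delta compares the CRMs ranked 1st and 5th by top-site score.
    """
    ranked = sorted(best_site_per_crm, reverse=True)
    if len(ranked) < 5:
        raise ValueError("at least five CRMs are needed to compute Delta")
    s_max = ranked[0]
    if s_max > consensus_score:
        raise ValueError("no site can outscore the consensus site")
    # Affinity factor: equals 1 when a CRM contains a consensus site.
    affinity_factor = 1.0 / (consensus_score - s_max + 1.0)
    # Spread between the 1st- and 5th-ranked CRMs (discrimination between CRMs).
    delta = ranked[0] - ranked[4]
    return affinity_factor * delta


def best_pwm(candidates):
    """Pick the PWM name with the highest Q from {name: (S_cons, per-CRM top scores)}."""
    return max(candidates, key=lambda name: pwm_quality(*candidates[name]))


candidates = {
    "PWM_A": (12.0, [12.0, 10.5, 9.0, 8.0, 7.0, 3.0]),  # strong, discriminating sites
    "PWM_B": (14.0, [9.0, 8.8, 8.7, 8.6, 8.5]),         # weak, uniform sites
}
print(best_pwm(candidates))  # PWM_A
```

In this toy comparison, PWM_A scores $Q = 5$ because its best CRM site matches the consensus and its top sites are well spread, whereas PWM_B scores $Q = 0.5/6$, reflecting both weak best sites and poor discrimination between CRMs.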
We tested the dependence of the quality of fit on the number of TFs included in the model by removing TFs in order of increasing regulatory constraint, that is, from right to left in Figure 5C. Upon removing the 4 least constrained TFs, the lowest score achieved increases but remains close to the range of the 20 lowest-scoring 15-TF models (Fig. S13). The lowest scores increase progressively as the number of TFs is reduced; 7- and 5-TF models have scores comparable to those achieved with randomized data (Fig. S3). These simulations suggest that removing the unconstrained TFs has a minor effect on the quality of the fit and that a core group of 11 well-constrained TFs is essential to achieve the lowest obtained scores.

Model selection
We identified the 20 lowest-scoring models (see Table S4 for the scores of each replicate). In order to identify a representative model, we clustered the models hierarchically, using the dissimilarity of regulatory roles as the pairwise distance metric (Fig. S4). Lower dissimilarity scores imply greater similarity of regulatory schemes between pairs of models. Representing the regulatory roles of each model as a vector with one entry per TF, 1 for activation and −1 for repression, we computed the weighted Euclidean distance between each pair of models. The weights were $|f_i^{\mathrm{act}} - 0.5|$, where $f_i^{\mathrm{act}}$ is the fraction of models, among the 20, that assigned an activating role to TF $i$. This ensured that models assigning the same role to well-constrained activators clustered together. We clustered the models hierarchically using the LINKAGE function of MATLAB (v. 8.0.0.783) with the shortest-distance (single-linkage) algorithm. For the first round of reverse engineering with only myeloid-implicated factors, there are 6 clusters at a dissimilarity cutoff of 0.4, of which all but one are sparsely populated (Fig. S4A). The biggest cluster has 8 models with highly similar regulatory schemes (Fig.
5B; top 8 models), allowing us to pick one, model 12058, for further analysis (see Table S5 for parameter values). In the second round of reverse engineering with additional non-myeloid factors, the 20 lowest-scoring models form 5 clusters at a cutoff of 0.4, the largest of which consists of 8 models (Fig. S4B). All eight members have highly similar regulatory schemes (Fig. 5C; top 8 models), and we chose one of them, model 81762, for further analysis.

Results

Sequence-based model of transcription
Our model (Fig. 2) is derived from a sequence-based model of transcription (Reinitz et al., 2003) that has been demonstrated to predict gene regulation during Drosophila segmentation quantitatively (Janssens et al., 2006; Kim et al., 2013). A detailed description of the model and its equations is provided in Supplementary Text S1. Given the DNA sequence of a CRM and the position weight matrices (PWMs) and concentrations of the TFs believed to regulate it, our model computes the rate of transcription in several steps. First, the model identifies binding sites by scoring the sequence with PWMs and retaining the sites above a pre-specified threshold score (Fig. 2A; see Methods). Second, the model computes the fractional occupancy of each site by averaging over the ensemble of site-occupancy configurations, whose statistical weights depend on the concentrations of the TFs and the binding affinities of the sites (Fig. 2B). Some repressors act by reducing the activity of a specific activator in a position-dependent manner (Arnosti et al., 1996a; Ogbourne and Antalis, 1998; Stopka et al., 2005), a phenomenon referred to as quenching. In the third step of the calculation, quenching is implemented by reducing the occupancy of activators bound near repressors (Fig. 2C).
In the fourth step, the fractional occupancies of the bound activators, weighted by their activation efficiencies, are summed to determine the strength of interaction of the CRM with the core promoter (Fig. 2D). In contrast to quenching, some TFs act over large distances to directly repress the activity of the proximal promoter, either by interfering with promoter-enhancer interactions or by recruiting chromatin-remodeling enzymes that establish large repressive chromatin domains (Harmston and Lenhard, 2013). We represent their effects by reducing the interaction strength of the CRM (Fig. 2E) to obtain the net interaction strength. In the final step, transcription initiation is modeled as a diffusion-limited enzymatic reaction in which the activation energy barrier is lowered in proportion to the net interaction strength (Fig. 2F). In summary, the model takes TF concentrations as input and simulates TF-TF interaction through 1) competition for binding sites, 2) quenching, 3) long-range repression, and 4) synergistic activation to determine the transcriptional output of a CRM.

Identification of putative CRMs
We identified putative CRMs as non-coding sequences having a high degree of evolutionary conservation, a commonly used strategy for de novo CRM prediction (Hardison and Taylor, 2012; Landry et al., 2009; Wilson et al., 2010b). We computed the average sequence identity between the mouse and dog genomes over a 100bp window (Fig. 3A-C). Highly conserved regions were identified as those having greater than 70% identity. Applying this threshold to the dog-mouse sequence identity yielded putative CRMs between 400bp and 1500bp in length, long enough to include potential quenching or other TF-TF interactions but short enough that cis-regulatory dissection was still practical (Fig. 3A-C). We tested a total of 46 such CRMs.
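The conservation step can be sketched in Python with NumPy. The sketch assumes the mouse-dog alignment has already been reduced to a per-nucleotide 0/1 identity track in mouse coordinates; the 101bp centered window and the 0.7 threshold follow Methods, while the zero-padded edge handling and the function name are our assumptions. It returns conserved regions only, not the final CRM boundaries used for cloning.

```python
import numpy as np


def conserved_regions(identity, window=101, threshold=0.7):
    """Return half-open (start, end) intervals whose windowed mean identity exceeds threshold.

    identity  -- 0/1 per mouse nucleotide (1 = aligned to an identical dog base)
    window    -- odd window length centred on each position (101bp in Methods)
    threshold -- minimum mean identity (0.7 in Methods)
    """
    track = np.asarray(identity, dtype=float)
    if track.size < window:
        raise ValueError("sequence shorter than the averaging window")
    kernel = np.ones(window) / window
    # mode="same" centres the window; positions near the ends are implicitly zero-padded.
    mean_identity = np.convolve(track, kernel, mode="same")
    above = mean_identity > threshold
    # Rising and falling edges of the above-threshold runs give start and end coordinates.
    edges = np.flatnonzero(np.diff(np.concatenate(([0], above.astype(int), [0]))))
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]


# A 200bp perfectly conserved block flanked by unconserved sequence.
toy_identity = [0] * 300 + [1] * 200 + [0] * 300
print(conserved_regions(toy_identity))  # [(320, 480)]
```

The called region is narrower than the 200bp block because a centered window must overlap at least 71 identical bases (71/101 > 0.7) before it passes the threshold.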
Below, we refer to CRMs by the gene name followed by the CRM number in parentheses; for example, CRM 7 of Cebpa is denoted Cebpa(7).

Reporter constructs and activity data
Design of reporter constructs. Sequences upstream of the core promoter usually contain a proximal promoter, which binds sequence-specific TFs and acts together with distal CRMs to regulate gene expression (Bertolino and Singh, 2002; Carey et al., 2008). The reporter vectors were designed to take into account potential positive and/or negative interactions of distal CRMs with their cognate proximal promoters. We identified putative proximal promoters as evolutionarily conserved sequences upstream of the TSS of the endogenous gene (Fig. 3A-C; CRMs numbered 0) and placed them immediately upstream of the Luciferase gene in the vector (Fig. 3D; pink boxes). Since most CRMs are distant from the TSS, placing them near the core or proximal promoter in reporter assays, a common practice, can introduce artificial regulatory interactions (Chopra et al., 2012; Gray and Levine, 1996). Instead, we placed the distal putative CRMs of each gene ~3kb upstream (see Methods) of the cognate proximal regulatory sequence (Fig. 3D; blue boxes). This design allowed us to detect both long-range enhancing and long-range silencing activities of CRMs by comparing CRM-bearing vectors with the proximal-only vectors (Fig. 3D; top row). Because of the location of the CRMs, any modulation of activity relative to the proximal-only vector must occur over long distances (see Methods).

Activity data reveal the regulatory complexity of Cebpa CRMs. We measured the activity of the CRMs in three conditions: 1) PUER cells in IL-3 without OHT induction (uninduced), 2) 24 hours after OHT induction in IL-3 (IL3+OHT), and 3) 24 hours after OHT induction in G-CSF (GCSF+OHT). These conditions resemble macrophage-neutrophil progenitors, early macrophages, and early neutrophils, respectively. The activity of Cebpa CRMs (Fig.
4A) varies extensively by CRM (up to 4.5x) and by cell type (up to 15x). The patterns of differential expression are CRM dependent. For example, Cebpa(7) has greater activity in the uninduced condition, whereas Cebpa(16) and Cebpa(18) have the greatest activity in IL3+OHT conditions. Three patterns of cis-regulation are discernible. A few putative CRMs, such as Cebpa(5), Cebpa(13), and Cebpa(19), do not change activity relative to the proximal-only construct (Cebpa(0)) and hence appear to be inert in the cell types we consider. Four CRMs, Cebpa(7), Cebpa(14), Cebpa(16), and Cebpa(18), act as enhancers, increasing activity up to 4.5x relative to Cebpa(0). Note that Cebpa(18) roughly corresponds to a recently described +37kb enhancer (Guo et al., 2014; Guo et al., 2012), being in the same genomic location but ~200bp longer. We also find many CRMs, such as Cebpa(2), Cebpa(6), Cebpa(9), Cebpa(10), Cebpa(11), Cebpa(15), Cebpa(20), Cebpa(23), and Cebpa(24), which diminish activity relative to Cebpa(0) by a factor of up to 4.5x and thereby act as silencers. Although a few CRMs, such as Cebpa(22), activate in one cell type while repressing in another, the enhancers and silencers listed above act consistently in all three cell types.

Egr1 and Egr2 CRMs were assayed only in uninduced and IL3+OHT conditions. In contrast to the rich activity patterns exhibited by Cebpa CRMs, Egr1/2 putative CRMs behave quite uniformly. Egr1 has only one enhancer, Egr1(2), and two silencers, Egr1(5) and Egr1(9) (Fig. 4B). Egr2 has no silencers; most of its CRMs have enhancing activity in uninduced cells but no effect in the IL3+OHT condition (Fig. 4C). Notably, neither gene showed the CRM-dependent differential activity observed for Cebpa. These differences in the complexity of CRM behavior suggest that Cebpa and Egr1/2 have distinct regulatory architectures.
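How the enhancer, silencer, and inert labels relate to the dual-luciferase measurements can be sketched in Python. The firefly/Renilla normalization over duplicate transfections and the comparison with the proximal-only construct follow Methods and the text above; the 1.5-fold cutoff and the function names are hypothetical, since no explicit classification threshold is stated here.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio over replicate transfections."""
    if not firefly or len(firefly) != len(renilla):
        raise ValueError("need matched, non-empty firefly and Renilla readings")
    return mean(f / r for f, r in zip(firefly, renilla))


def classify_crm(crm_activity, proximal_only_activity, fold_cutoff=1.5):
    """Label a CRM relative to the proximal-only construct (e.g. Cebpa(0)).

    fold_cutoff is a hypothetical threshold, not a value taken from this study.
    """
    fold = crm_activity / proximal_only_activity
    if fold >= fold_cutoff:
        return "enhancer", fold
    if fold <= 1.0 / fold_cutoff:
        return "silencer", fold
    return "inert", fold


# Duplicate transfections, as in Methods; the readings themselves are made up.
proximal = normalized_activity([1000.0, 1200.0], [100.0, 100.0])
print(classify_crm(normalized_activity([4400.0, 4400.0], [100.0, 100.0]), proximal))
# ('enhancer', 4.0)
```

A symmetric cutoff (1.5x up, 1/1.5 down) treats enhancement and silencing as equal fold effects; in practice one would also want a statistical test across replicates before assigning a label.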
Reverse engineering the putative CRMs of Cebpa, Egr1, and Egr2
Here we describe the general approach to reverse engineering in the context of its application to Cebpa, Egr1, and Egr2. The main steps (see schematic in Fig. 1) are as follows. First, we identify candidate TFs to include in the model. Second, we construct a family of models encompassing all possible combinations of regulatory roles. Third, for each model, we use global nonlinear optimization to infer the free parameters by minimizing the score, the sum of squared differences between model and data. The score is also used to select the models that best explain the observed patterns of CRM activity for further analysis. Fourth, the chosen models are analyzed further to infer the cis-regulatory logic of each CRM.
Identification of candidate TFs. We took a broad approach to identifying TFs to include in the model, starting with 62 immune-specific TFs predicted by the transcription factor database TRANSFAC (Matys et al., 2006) to bind at least one CRM (see Methods). We further winnowed the candidates by identifying those expressed differentially in PUER cells (Laslo et al., 2006), reasoning that such TFs are more likely to explain the regulatory complexity observed in the activity data (Fig. 4). Surprisingly, we found that TFs previously implicated in myeloid differentiation had much higher differential expression than non-myeloid TFs (Fig. S1C,D; Methods). This led us to suspect that differential expression of the myeloid-specific TFs drives the cell-type-specific response of the CRMs. We chose the 15 most differentially expressed myeloid-specific TFs as candidates for a first attempt at reverse engineering (Fig. S1).
Model inputs. The model takes several inputs to compute CRM activity. The first is DNA sequence. We incorporated the DNA sequences of all the assayed CRMs, including the inactive ones, into the model.
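The winnowing step ranks TFs by differential expression across conditions and keeps the top candidates. A minimal sketch follows. It assumes that differential expression is measured as the largest absolute log2 fold change relative to the uninduced condition. That metric is an assumption; the actual criterion is given in Methods. The TF names and values are hypothetical.

```python
import math
from typing import Dict, List


def differential_expression(expr: Dict[str, float], baseline: str = "uninduced") -> float:
    """Largest absolute log2 fold change of one TF relative to the baseline condition.

    Assumed metric, for illustration only.
    """
    base = expr[baseline]
    return max(abs(math.log2(value / base))
               for cond, value in expr.items() if cond != baseline)


def top_candidates(expression: Dict[str, Dict[str, float]], n: int) -> List[str]:
    """Rank TFs by differential expression and keep the n highest."""
    ranked = sorted(expression,
                    key=lambda tf: differential_expression(expression[tf]),
                    reverse=True)
    return ranked[:n]


# Hypothetical expression values for three made-up TFs.
expression = {
    "TF_A": {"uninduced": 1.0, "IL3+OHT": 4.0, "GCSF+OHT": 1.0},
    "TF_B": {"uninduced": 1.0, "IL3+OHT": 1.1, "GCSF+OHT": 0.9},
    "TF_C": {"uninduced": 2.0, "IL3+OHT": 1.0, "GCSF+OHT": 16.0},
}
print(top_candidates(expression, 2))  # ['TF_C', 'TF_A']
```

With n = 15 and the real candidate list, the same ranking would yield a candidate set of the size used in the first round.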
We expect inactive CRMs to act as negative controls, constraining the model to reduce the amount of TF binding to them and hence the number of falsely identified TFBS. The second input, the concentrations of the TFs, was provided by microarray gene expression measurements from PUER cells (Laslo et al., 2006) in conditions matched to the CRM activity measurements. The data for the TFs included in the model are shown in Figure S2 and are in general agreement with an independent dataset from PUER cells in IL3 conditions (Weigelt et al., 2009) (Fig. S2B). The third input, the DNA-binding properties of the TFs, was provided by PWMs from TRANSFAC and JASPAR, which were used to detect TFBS for the candidate TFs (Table S2; see Methods). The resulting sequence-based model for 46 CRMs contained ~700 binding sites. The model was formulated in an internally self-consistent manner so that TF properties were common to all CRMs. This implies that differences in predicted activity between CRMs arise solely from differences in their DNA sequence.
A family of sequence-based models. To infer the regulatory roles of the TFs, we constructed a family of models realizing all possible combinations of regulatory roles for the 15 TFs. Allowing each TF to assume one of two roles, activation or repression, resulted in 2^15 (32,768) alternative models. Note that each model realization is structurally distinct from all the others, since changing the role of even one TF results in completely different TF-TF and TF-promoter interactions.
Model and parameter inference by nonlinear optimization. We used Lam Simulated Annealing (Janssens et al., 2006; Kim et al., 2013; Reinitz and Sharp, 1995) to minimize the loss function, or score, computed as the residual sum of squares, for each model realization, inferring the values of its free parameters in 5 replicate runs.
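The size of the model family follows directly from giving each of the 15 TFs one of two roles. The sketch below enumerates those role assignments and defines the residual-sum-of-squares score. It is illustrative only. The enumeration order does not reproduce the paper's model numbering (IDs such as 12058), and the Lam Simulated Annealing fit of each structure is not shown.

```python
from itertools import product
from typing import Dict, Iterator, Sequence

# The 15 candidate TFs of the first round, as listed in the model analysis.
TFS = ["C/EBPα", "C/EBPβ", "C/EBPδ", "Egr1", "Egr2", "Ets1", "Fli1", "Fos",
       "Gfi1", "Ikaros", "IRF4", "Jun", "Myb", "Myc", "PU.1"]


def role_assignments(tfs: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield every activator/repressor assignment, i.e. 2**len(tfs) model structures."""
    for roles in product(("activator", "repressor"), repeat=len(tfs)):
        yield dict(zip(tfs, roles))


def rss_score(model: Sequence[float], data: Sequence[float]) -> float:
    """Score = residual sum of squares between model output and measured activity."""
    if len(model) != len(data):
        raise ValueError("model and data must have the same length")
    return sum((m - d) ** 2 for m, d in zip(model, data))


n_models = sum(1 for _ in role_assignments(TFS))
print(n_models)  # 32768
```

Each yielded dictionary defines one model structure; in the study, each structure's free parameters were then fitted by minimizing this kind of score.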
The median absolute deviation over the replicates for the 20 lowest-scoring models (Table S4) varied between 0.004% and 5% of the median score. A narrow range of replicate scores indicates that each model attains the global minimum; had replicate fits terminated at different local minima, their scores would have been broadly distributed. The optimization problem is not underdetermined, having many more data points than parameters (Supplementary Text S2), and the fits are statistically significant (Fig. S3; Supplementary Text S2). Lastly, the scores of the family of models range over an order of magnitude (Fig. 5A), suggesting that the scores discriminate between different realizations of TF functional roles.
Model analysis. We chose the 20 lowest-scoring models (Table S4) to determine how well the data constrain regulatory roles and to check the per-CRM agreement with data. The regulatory roles of each TF in these models are depicted in Figure 5B. The regulatory roles of five TFs, C/EBPδ, Egr1, Gfi1, Myb, and PU.1, are completely constrained, being identical in all 20 models. Six TFs, C/EBPα, Egr2, Ikaros, IRF4, Jun, and Myc, are well constrained, having the same role in more than 60% of the models. Four TFs, C/EBPβ, Fos, Fli1, and Ets1, are poorly constrained. Thus, we can not only infer which TFs are likely to regulate the CRMs but also eliminate poorly constrained TFs as unlikely regulators. We further inspected the quality of the fit by analyzing a representative model in detail. We clustered the models on the similarity of their regulatory schemes (Fig. S4A; see Methods) and chose one, model 12058, from the largest cluster for further analysis (see Table S5 for parameter values). The output of model 12058 correlates well with the activity data (r² = 0.78; Pearson's correlation coefficient; Fig.
5D), implying that the model recapitulates the cell-type- and CRM-specific changes and the dynamic range of the activity data. A direct comparison of the data and model output is shown in Figure 6A-C. For Cebpa CRMs, with a few exceptions noted below, the model correctly reproduces the cell-type- and CRM-specific upregulation of all the enhancers. In accordance with the data, the model shows greater upregulation by Cebpa(7) and Cebpa(14) in uninduced than in IL3+OHT conditions, and the reverse pattern for Cebpa(16) and Cebpa(18). The lack of up- or downregulation by inactive elements, and the downregulation by several, though not all, silencers, are also reproduced. For Egr1, the model reproduces the pattern of upregulation and downregulation observed in the data, although in several cases the magnitudes differ. For Egr2, the model reproduces the overall low activity of its CRMs but incorrectly shows similar levels of activity in uninduced and IL3+OHT conditions instead of the relatively lower levels observed in the latter. To summarize, the model reproduces most of the cell-type- and CRM-specific features of the data and the levels of a majority of the individual CRMs. Model 12058 deviates from the data in several CRMs and conditions; one class of deviations indicates that the model lacks repressors. There are isolated instances of the model predicting lower than observed activity, such as Cebpa(18), Egr1(14), and Egr2(10), but the reverse is more common. Predicting higher than observed levels is particularly prominent in the silencers: the model overpredicts activity for 4 of the 9 Cebpa silencers (Cebpa(6), Cebpa(20), Cebpa(23), and Cebpa(24)) and for three other CRMs (Cebpa(8), Cebpa(17), and Egr2(7)). The inability to repress activity correctly suggests that the model lacks repressors that presumably bind the silencer CRMs. We had initially limited the model to myeloid-specific TFs since non-myeloid TFs are not differentially expressed in these cell types.
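The replicate-consistency and goodness-of-fit statistics used in the model analysis can be written down compactly. The sketch below computes three of them: the median absolute deviation of replicate scores as a percentage of the median score, the squared Pearson correlation between model output and data, and the per-TF constraint level across a set of low-scoring models. It uses the "identical in all models" and "more than 60%" cutoffs from the text. The function names and example inputs are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def mad_percent_of_median(scores: Sequence[float]) -> float:
    """Median absolute deviation of replicate scores, as a percentage of their median."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    return 100.0 * mad / med


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between model output x and data y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def constraint_levels(models: List[Dict[str, str]]) -> Dict[str, str]:
    """Classify each TF by how often its most common role recurs across the chosen models."""
    levels = {}
    for tf in models[0]:
        roles = [m[tf] for m in models]
        agreement = max(roles.count(r) for r in set(roles)) / len(roles)
        if agreement == 1.0:
            levels[tf] = "completely constrained"
        elif agreement > 0.6:
            levels[tf] = "well constrained"
        else:
            levels[tf] = "poorly constrained"
    return levels


# Hypothetical replicate scores: median 100, MAD 1, i.e. 1% of the median.
print(mad_percent_of_median([100, 102, 98, 101, 99]))  # 1.0
```

A low MAD relative to the median is what the text interprets as replicate fits converging on the same minimum.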
Uniformly expressed TFs can, however, provide CRM-specific but cell-type-independent repression. Since cross-lineage antagonism is common in hematopoiesis (Laslo et al., 2008), some of the excluded non-myeloid TFs might bind silencer CRMs and repress Cebpa in the lymphoid or red blood cell lineages. To evaluate this possibility rigorously and improve the prediction of silencer CRMs, we included a limited number of non-myeloid factors in the model. Each added TF brings additional parameters, that is, additional degrees of freedom, to the optimization problem (Supplementary Text S1). Including more TFs would therefore make it difficult to discern whether an improved fit results from the additional degrees of freedom or from novel regulation introduced by the new TFs. Analysis of the low-scoring models indicated, however, that some TFs were dispensable since their roles were poorly constrained (Fig. 5B; Fig. S4). We exploited this to include non-myeloid TFs without introducing additional parameters, by identifying and eliminating the TFs having the least activity in model 12058. We computed the maximum activity of each TF over all CRMs (Fig. S5) and removed the two activators and the two repressors having the smallest maximum activity: C/EBPβ, Fos, IRF4, and Egr2. Next, we included four lineage-specifying TFs from non-myeloid lineages, E2A, Elf1, EBF1, and GATA(s), that had binding sites in the silencer CRMs. EBF1 is involved in specifying the B-cell lineage (Laslo et al., 2008; Pongubala et al., 2008), Elf1 is required for the differentiation of natural killer (NK) and NKT cells (Choi et al., 2011), and E2A is required for both B- and T-cell development (Bain et al., 1994; Rothenberg, 2014).
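The pruning rule described above drops the two activators and the two repressors whose maximum activity over all CRMs is smallest. That keeps the number of TFs, and hence parameters, fixed when the four non-myeloid factors are added. Below is a minimal sketch of the rule. The TF names and activity values are hypothetical placeholders, not the values plotted in Fig. S5.

```python
from typing import Dict, List


def weakest_tfs(activity: Dict[str, List[float]], roles: Dict[str, str],
                k: int = 2) -> List[str]:
    """Return the k activators and k repressors with the smallest maximum activity over CRMs."""
    max_activity = {tf: max(values) for tf, values in activity.items()}
    to_drop: List[str] = []
    for role in ("activator", "repressor"):
        members = sorted((tf for tf in max_activity if roles[tf] == role),
                         key=lambda tf: max_activity[tf])
        to_drop.extend(members[:k])
    return to_drop


# Hypothetical per-CRM activities for six made-up TFs.
activity = {
    "A1": [0.1, 0.2], "A2": [1.0, 0.5], "A3": [0.05, 0.01],
    "R1": [0.3, 0.1], "R2": [0.9, 0.4], "R3": [0.2, 0.15],
}
roles = {"A1": "activator", "A2": "activator", "A3": "activator",
         "R1": "repressor", "R2": "repressor", "R3": "repressor"}
print(weakest_tfs(activity, roles))  # ['A3', 'A1', 'R3', 'R1']
```

Ranking by the maximum rather than the mean activity means a TF survives pruning if it acts strongly on even a single CRM.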
GATA1, GATA2, and GATA3 have very similar binding-site preferences in vitro (Ko and Engel, 1993; Merika and Orkin, 1993) and bind a large number of overlapping sites in vivo (Doré et al., 2012; May et al., 2013). Given this degeneracy in binding and their uniform expression in PUER cells (Fig. S2), we expected that the model would not be able to distinguish among the three, and hence represented them as a single “lumped” GATA regulator. Any conclusions we draw regarding GATA might therefore pertain to influences from the erythroid (GATA1), MEP or mast cell (GATA2), or T-cell (GATA3) lineages. The revised scheme of reverse engineering was executed identically to the previous round. After optimizing the 2^15 models, we observed a dramatic reduction in the scores of the lowest-scoring models, from 84,486 in the previous round to 35,072 here (Fig. 5A; Table S4). The regulatory roles of 8 TFs are completely constrained, compared to 5 in the models lacking non-myeloid factors (Fig. 5B,C). Notably, three of the newly added non-myeloid factors, EBF1, GATA, and Elf1, were well constrained as repressors, supporting the hypothesis that non-myeloid TFs repress Cebpa CRMs. Clustering on similarity of regulatory roles (Fig. S4B; see Methods) allowed us to choose the lowest-scoring model, 81762, as a representative for further analysis (Tables S4 and S5). The correlation between model output and data is further improved (r² = 0.91; Fig. 5E), implying a better recapitulation of the changes in activity by cell type and CRM. This overall improvement is achieved in part by better repression of nearly all the under-repressed CRMs, including silencers. A few mispredictions remain uncorrected, such as Egr2(10), while four CRMs, Cebpa(8), Cebpa(21), Egr1(5), and Egr2(4), are over-repressed. The overall improvement in the agreement between model and data suggests that the non-myeloid TFs help explain the CRM-specific patterns of expression.
It is notable in this context that model 81762, the lowest-scoring model, assigns repressive roles to all the non-myeloid factors.
The cis-regulatory logic of Cebpa, Egr1, and Egr2
Here we use the model to infer the TFs, binding sites, and interactions that generate the cell-type- and CRM-specific pattern of activity observed for the three genes. We do this by inspecting the intermediate steps in the calculation of the transcription rate and decomposing it into contributions from individual binding sites. Since each binding site is associated with a particular TF, we can also infer the cognate TF and its regulatory role. Activators, acting through their coactivators, catalyze the recruitment of the transcription holoenzyme complex to the promoter and thereby increase the rate of transcription. The model represents this by lowering the activation energy barrier by an amount ΔΔA, which depends on the net interaction strength (Fig. 2F and Supplementary Text S1). The net interaction strength, in turn, depends on the occupancies and activation efficiencies of the bound activators and can be decomposed into contributions from individual activator binding sites (Fig. 2D). Hence, we can determine individual contributions to ΔΔA by plotting each term of the net interaction strength separately (Fig. 7A-C). Long-range repressors act by interfering with the recruitment of the holoenzyme complex, which the model represents as a reduction of the net interaction strength (Fig. 2E). To determine repressive activity, we plot the factor by which each bound repressor reduces the interaction strength (Fig. 7D-F). Contributions from quenching were negligible and inconsistent, so we did not consider them in our regulatory analysis. This is consistent with our reporter design (Fig. 3D) and activity data (Fig. 4), since the reduction of activity in reporters carrying silencers in addition to the promoters could not have occurred via quenching.
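The site-by-site decomposition above can be illustrated with a deliberately simplified stand-in for the model. This is not the formulation in Supplementary Text S1. The sketch makes four assumptions. First, each activator site contributes occupancy × efficiency to the net interaction strength. Second, each bound long-range repressor multiplies that strength by (1 − occupancy × efficiency), with efficiencies between 0 and 1. Third, ΔΔA equals the net strength in units of kT. Fourth, the rate therefore scales as exp(ΔΔA). Because contributions add in the exponent, several weak sites act synergistically. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class BoundSite:
    occupancy: float   # fractional occupancy of the site, between 0 and 1 (assumed input)
    efficiency: float  # activation or repression efficiency of the bound TF, 0..1 (assumed)


def activator_terms(activators: List[BoundSite]) -> List[float]:
    """Per-site contributions to the net interaction strength (one term per activator site)."""
    return [s.occupancy * s.efficiency for s in activators]


def repression_factors(repressors: List[BoundSite]) -> List[float]:
    """Factor by which each bound long-range repressor scales the interaction strength."""
    return [1.0 - s.occupancy * s.efficiency for s in repressors]


def relative_rate(activators: List[BoundSite], repressors: List[BoundSite]) -> float:
    """Transcription rate relative to an unregulated promoter.

    Assumption: ΔΔA equals the net interaction strength (kT units), so the
    Arrhenius-style rate scales as exp(ΔΔA).
    """
    strength = sum(activator_terms(activators))
    for factor in repression_factors(repressors):
        strength *= factor
    return math.exp(strength)


weak_site = BoundSite(occupancy=1.0, efficiency=0.3)
one_site = relative_rate([weak_site], [])
three_sites = relative_rate([weak_site] * 3, [])
# Synergy: three identical weak sites give more than three times the gain of one site.
print(three_sites - 1.0 > 3 * (one_site - 1.0))  # True
```

Plotting `activator_terms` and `repression_factors` per site is the analogue, in this toy setting, of the per-site panels in Fig. 7.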
First, we illustrate the process with the examples of the Cebpa proximal promoter, an enhancer, and a silencer. We then describe the inferred cis-regulatory logic and compare it to available evidence. Figure 7A shows the contributions of the activator binding sites in the model for the proximal promoter of Cebpa, Cebpa(0). The model identifies a total of 6 binding sites for 4 TFs. The largest contribution in all three conditions is from C/EBPδ, with relatively smaller contributions from Gfi1 and Egr1. Although Myc has a binding site, its contribution to activation is negligible. Plotting the strength of repression reveals two weakly active repressor sites, for Jun and C/EBPα (Fig. 7D). Together, the expression patterns (Fig. S2) of the main activator, C/EBPδ, and the repressor Jun explain the pattern of promoter activity. The promoter is strongly downregulated in GCSF+OHT (Fig. 4A). This is a combined effect of C/EBPδ downregulation (Fig. 7A) and Jun upregulation (Fig. 7D), whereas promoter activity is unchanged in IL3+OHT because C/EBPδ upregulation compensates for Jun repression. The model for enhancer Cebpa(16) contains 4 activator binding sites, 3 for PU.1 and 1 for C/EBPδ (Fig. 7B). Although each individual PU.1 site makes a small contribution to activation, together they upregulate Cebpa(16) expression by a factor of ~2.5 in IL-3, owing to the synergy among multiple activator sites represented in the model (Fig. 2D,F). As with the promoter, the expression patterns of PU.1 and C/EBPδ (Fig. S2) explain the activity of the reporter containing Cebpa(16) and Cebpa(0) (Fig. 4A). This reporter is upregulated more in OHT-induced than in uninduced conditions, which can be directly attributed to the induction of PU.1 and the preferential upregulation of C/EBPδ in IL3+OHT. The model's prediction that PU.1 binds Cebpa(16) is supported by genome-wide ChIP studies of PU.1 binding (Fig.
S11A), one in PUER cells (Heinz et al., 2010) and the other, an independent study, in neutrophil conditions of the FDCPmix system (May et al., 2013). Our analysis thus identifies Cebpa(16) as a novel PU.1-responsive enhancer in the vicinity of Cebpa. In the final example of cis-regulatory inference, we analyze a silencer, Cebpa(11) (Fig. 7C,F). Cebpa(11) downregulates the activity of the proximal promoter by a factor of 2.7, 3, and 1.9 in uninduced, IL3+OHT, and GCSF+OHT conditions, respectively. Correspondingly, the activation provided by binding sites is much lower in the model for Cebpa(11) (Fig. 7C; compare to panel A). Since Cebpa(11) is located ~15kb downstream of the proximal promoter in the endogenous locus and ~3kb upstream of it in the reporter constructs (Fig. 3A,D), the repressive interactions occur over long distances. Plotting the long-range repression mediated by sites in the Cebpa(11) model reveals the TFs likely responsible for downregulation: EBF1, Myb, GATA(s), and Ikaros (Fig. 7F). Consistent with the uniform expression of these repressive inputs (Fig. S2), the fold downregulation is approximately the same in all three conditions. EBF1 and Ikaros are B-cell factors regulating cell-fate commitment and immunoglobulin heavy-chain rearrangement (Lin and Grosschedl, 1995; Pongubala et al., 2008; Reynaud et al., 2008). Despite having a binding site in the model, Ikaros makes no repressive contribution here (Fig. 7F). Ikaros did, however, make a strong repressive contribution in model 12058, which lacked EBF1 (data not shown). This suggests that EBF1 and Ikaros repress Cebpa redundantly. To summarize, the examples above show a high degree of accord between the expression patterns of the TF inputs, their inferred functional roles, and the empirically observed patterns of cell-type- and CRM-specific transcriptional output.
It is not yet possible to connect the behavior of the CRMs conclusively to that of the endogenous Cebpa transcriptional unit, since CRMs do not act additively (Dunipace et al., 2011; Landry et al., 2009; Perry et al., 2011). Nevertheless, the downregulation of Cebpa in alternative lineages (Huang et al., 2009; Pongubala et al., 2008; Reynaud et al., 2008) and upon OHT induction (Fig. S2) matches well with the abundance of silencers in the locus. We now describe the inferred cis-regulatory logic, limiting the discussion to TFs exerting particularly strong effects and/or regulating multiple CRMs. We also compare model predictions to available evidence, including relevant publicly available ChIP datasets compiled from the NCBI Gene Expression Omnibus (Fig. S11).
C/EBP family. The activation of the Cebpa promoter by C/EBPδ (Fig. 7A) is in general agreement with in vitro binding and reporter assays of the Cebpa promoter (Legraverend et al., 1993), which showed binding by C/EBPα, C/EBPβ, and C/EBPδ, and transactivation by C/EBPα and C/EBPβ. C/EBP family TFs have very similar binding properties. This fact, combined with their similar expression patterns in PUER cells (Fig. S2), implies that these factors play redundant roles in the model and that our conclusions about C/EBPδ could pertain to the whole family. C/EBPδ is also predicted to bind and activate two other Cebpa enhancers, Cebpa(7) (Fig. S6A) and Cebpa(16) (Fig. 7B). We conclude therefore that one or more C/EBP family TFs bind and activate Cebpa(7) and Cebpa(16). This is supported by a ChIP-seq dataset for C/EBPα in peritoneal macrophages (Heinz et al., 2010), which shows binding of C/EBPα at all three CRMs (Fig. S11A). Lastly, a gene network inferred in human hematopoietic cells identified C/EBPδ as an upstream regulator of a module of genes coexpressed in granulocytes and monocytes (Novershtern et al., 2011).
PU.1. The two strongest enhancers, Cebpa(16) (Fig. 7B) and Cebpa(18) (Fig.
S6B) are driven by PU.1. Cebpa(18), like Cebpa(16), is bound by PU.1 in both PUER and FDCPmix cells (Fig. S11A; Heinz et al., 2010; May et al., 2013). The activation of Cebpa(18) by PU.1 agrees with reporter analyses in 32Dcl3 cells that also document this effect (Cooper et al., 2015). PU.1, one of the key TFs specifying the myeloid and, to a degree, lymphoid lineages (Scott et al., 1994), is known to promote the macrophage lineage by antagonizing the activity of C/EBPα (Laslo et al., 2006). Our analysis shows that Cebpa is a direct target of PU.1.
Egr1. The model predicts that, in addition to the proximal promoter, the Cebpa enhancer Cebpa(14) is activated by Egr1 (Fig. S6C). Egr1 binds the Cebpa proximal promoter in ChIP-seq data from bone marrow-derived dendritic cells (Fig. S11A; Garber et al., 2012). Also, Egr1 expression promotes macrophage differentiation at the expense of neutrophils (Nguyen et al., 1993). Egr1 has been suggested to operate downstream of PU.1 and C/EBPα in the macrophage/neutrophil decision; our analysis suggests that Egr1 also supports Cebpa expression in the macrophage lineage. The model also infers that Egr1 activates its own expression by binding to CRM Egr1(2) (Fig. S9B), which is also bound in the dendritic-cell ChIP dataset (Fig. S11B; Garber et al., 2012).
Ets1. Ets1 has been implicated in a broad range of roles in hematopoiesis, including the development of lymphocytes (Bories et al., 1995; Muthusamy et al., 1995), megakaryocytes (Lulli et al., 2006), and granulocytes (Lulli et al., 2010). The model infers that two CRMs are activated by Ets1: the proximal promoter of Egr1, Egr1(0) (Fig. S9A), and the Cebpa enhancer Cebpa(18) (Fig. S6B). Egr1 promoter activity is downregulated in IL3+OHT (Fig. 4B); our results imply that this is due to downregulation of the Ets1 activating input (Figs. S9A and S2).
The Egr1 promoter has indeed been found to be activated by Ets1 in NIH-3T3 cells (Robinson et al., 1997), and Ets1 binds the Egr1 locus in G1ME cells (Fig. S11B; Doré et al., 2012). The activation of Cebpa(18) by Ets1 agrees with the dramatic loss of activation when Ets sites are mutated in the roughly comparable +37kb enhancer (Cooper et al., 2015).
EBF1 and Ikaros. In addition to Cebpa(11), EBF1 represses Cebpa(8) (Fig. S7D), Cebpa(19) (Fig. S8F), and Egr1(5) (Fig. S9F). Pongubala et al. (2008) found that EBF1 represses Cebpa by binding to weak sites in its promoter. Our results suggest that EBF1 represses Cebpa by binding to distal sites in addition to the promoter-proximal ones. Additionally, there is detectable EBF1 binding at the Egr1 locus (Fig. S11B) in pre-B cells (Treiber et al., 2010) and Rag1-/- pro-B cells (Lin et al., 2010). Ikaros is also known to repress Cebpa (Reynaud et al., 2008), although we detected Ikaros repression only in the models lacking EBF1 (Fig. 5B,D).
GATA(s). The model infers that several silencer CRMs, Cebpa(11), Cebpa(8), Cebpa(23), and Cebpa(24), and one enhancer, Cebpa(18), are repressed by one or more members of the GATA family (Figs. 7F, S7D-F, and S6E). Supporting this, GATA2 binding was detected at Cebpa CRMs 24, 8, 11, and 18 in G1ME megakaryocyte progenitors (Doré et al., 2012) and at Cebpa CRMs 8 and 18 in FDCPmix multipotential progenitors (May et al., 2013) (Fig. S11A). Moreover, Cebpa is upregulated upon GATA2 knockdown in G1ME cells (Huang et al., 2009). GATA2 regulates the proliferation of HSCs (Rodrigues et al., 2012), granulocyte-macrophage progenitors (Rodrigues et al., 2008), and megakaryocyte-erythrocyte progenitors (Doré et al., 2012; Huang et al., 2009). Upon mutating GATA sites, Cooper et al. (2015) found a reduction in expression driven by the +37kb enhancer corresponding to Cebpa(18) in 32Dcl3 cells.
However, the direct binding of GATA2 to Cebpa(18) and the derepression of Cebpa upon GATA2 knockdown in G1ME cells suggest that the activation observed in 32Dcl3 cells is a context-dependent effect rather than a general feature of Cebpa regulation. Our analysis, combined with the ChIP-seq data (Fig. S11A), therefore implies that GATA2 represses Cebpa in progenitors and the red blood cell lineage.
Discussion
The regulatory function of the non-coding parts of the genome remains largely unexplored. We have developed an approach that integrates datasets probing multiple aspects of gene regulation to decode cis-regulatory logic in a scalable manner. Using this approach, we analyzed 46 CRMs in parallel to show that Cebpa, a gene for which only one distal CRM was previously known, has a regulatory logic that relies on multiple distal enhancers and silencers. Our approach goes beyond the detection of CRMs by determining the identities and binding sites, as well as the likely functional roles, of the TFs regulating a gene. The functional roles of TFs were determined by constructing 32,768 alternative models encapsulating all possible regulatory schemes for 15 TFs. Previous analyses have implied that hematopoietic gene expression is supported by multiple enhancers, which are usually bound by relatively few (1-3) activators in complex (Göttgens et al., 2002; Leddin et al., 2011; Pimanda et al., 2008; Wilson et al., 2010b; Yeamans et al., 2007). Our results for Cebpa suggest a considerably more complex regulatory organization involving the prominent use of silencers and enhancer-bound repressors to finely control cell-type-specific expression patterns. Perhaps our most surprising finding is that the Cebpa locus contains several silencers, which in fact outnumber the enhancers (Fig. 4A). We base this conclusion on the fact that CRMs placed in the reporter ~3kb upstream of the cognate proximal promoter (Fig. 3D) are still able to diminish its activity (Fig. 4A).
Hematopoietic genes are known to have distal silencers; for example, a distal element located 2.8kb upstream of Gata2 mediates its repression by GATA1 (Grass et al., 2003). However, known enhancers vastly outnumber silencers (Wilson et al., 2011). This disparity quite likely reflects a bias toward detecting enhancers in reporter design rather than an actual deficit of silencers relative to enhancers. Reporters designed to detect both up- and downregulation, such as the ones employed here, will likely lead to the discovery of many more silencer elements. The significance of the result becomes clearer once we consider the identity of the repressors inferred to regulate the silencers (Fig. 8A). TFs that bind the distal silencers and repress the activity of the proximal promoter (Figs. 7F and S7), such as EBF1 and the GATA family, are expressed at very low levels in the myeloid lineage but at high levels in alternative ones. The repression of Cebpa by non-myeloid TFs is supported by two results. First, correctly simulating silencer activity was only possible in models that included non-myeloid TFBS detected in the silencer CRMs (compare panels A and D of Fig. 6). Second, silencers are occupied in vivo by the predicted repressors (Fig. S11). These results show that cross-lineage antagonism (Cantor and Orkin, 2001; Chou et al., 2009; Huang et al., 2007; Laslo et al., 2008; Laslo et al., 2006) is mediated by distal silencers. Hematopoietic lineage resolution is currently believed to occur hierarchically, with pairs of pivotal TFs functioning as bistable switches that repress each other's targets (Graf and Enver, 2009; Laslo et al., 2008). The mediation of cross-lineage antagonism by silencers suggests instead that lineage-specifying TFs form densely interconnected repressive networks.
For example, it has been suggested that C/EBPα and EBF1 function as a cross-antagonistic pair to resolve the myeloid and B-lymphoid lineages (Laslo et al., 2008; Pongubala et al., 2008). Our results, together with those of Reynaud et al. (2008), show that Cebpa is also repressed redundantly by Ikaros, which itself regulates EBF1 (Pongubala et al., 2008). This makes it difficult to partition the triplet into antagonistic pairs. Similarly, it was recently shown that GATA2 represses Sfpi1 in combination with GATA1 and that its knockdown leads to myeloid differentiation of multipotential progenitors (May et al., 2013). Ascertaining whether silencer-mediated cross-antagonism is a general property of hematopoietic GRNs will require widespread cis-regulatory dissection with reporters designed to detect potential distal silencers as well as enhancers.
Although the molecular mechanisms of the repression of most hematopoietic regulators remain to be elucidated, our data suggest that cross-lineage repressors interact directly with the promoter. A distantly bound repressor may repress a gene in two ways. The first is by displacing an activator bound to the distal CRM, as seen, for example, in the GATA1-dependent repression of Gata2 (Grass et al., 2003); this mode is represented as quenching in the model (Fig. 2C). The second is to directly repress the activity of the proximal promoter (long-range repression; Fig. 2E) by interfering with promoter-enhancer interactions or by recruiting chromatin-remodeling enzymes to establish large repressive chromatin domains (Harmston and Lenhard, 2013). For example, Ikaros is found in complex with components of the nucleosome remodeling and deacetylation (NuRD) and SWI/SNF complexes (Georgopoulos, 2002), and the GATA3 DNA-binding domain bridges two separate DNA sequences, suggesting an ability to mediate looping or long-distance interactions (Chen et al., 2012).
Here, the cross-lineage repressors of Cebpa must act in the second, long-range manner, since reporters carrying both the silencer and the proximal promoter have lower activity than those carrying the proximal promoter alone (Fig. 4). In contrast, quenching would predict that the combined reporter has the same or higher activity, since quenching can, at most, reduce the activation provided by the CRM to zero. Although most of our cis-regulatory inferences are consistent with prior genetic, biochemical, and genomic evidence, there are a few points of discord. First, the assignment of a repressive role to C/EBPα is inconsistent with its ability to transactivate its own promoter (Legraverend et al., 1993). The inconsistent assignment most likely arises because C/EBPδ and C/EBPα, having very similar binding properties and expression patterns, can substitute for each other in the model. This in silico redundancy reflects the in vivo redundancy of the binding and activity of the C/EBP family members (Friedman, 2007b; Nerlov, 2007; Tsukada et al., 2011). Second, Gfi1, known to function as a repressor (Laslo et al., 2006; Yücel et al., 2004), is inferred to be an activator here, albeit with a rather minor contribution. The model might be using Gfi1 as a stand-in for an activator with redundant binding properties that is yet to be included in our analysis. Redundancy arising from TF binding properties may be addressed in the future by acquiring TF concentration and reporter activity data from more cell types and by including more TFs. This will allow the model to distinguish between members of TF families based on their differential expression in multiple cell types. Notwithstanding the exceptions noted above, our cis-regulatory dissection paints a rather complex picture of Cebpa regulation (Fig. 8).
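The argument that the silencers cannot act by quenching can be checked with a two-line toy calculation. The sketch assumes that the activation terms are non-negative and that a repressor of strength q (between 0 and 1) acts in one of two ways. Under quenching, it cancels a fraction q of the CRM's own activation. Under long-range repression, it scales the promoter's total activation by (1 − q). Only the second can push the combined reporter below the proximal-only level. The function names are hypothetical.

```python
def quenched_strength(proximal: float, crm: float, q: float) -> float:
    """Quenching: the repressor cancels part of the CRM's own activation (0 <= q <= 1)."""
    return proximal + crm * (1.0 - q)


def long_range_strength(proximal: float, crm: float, q: float) -> float:
    """Long-range repression: the repressor scales the promoter's total activation."""
    return (proximal + crm) * (1.0 - q)


proximal_only = 1.0
for crm in (0.0, 0.2, 1.0):
    for q in (0.0, 0.5, 1.0):
        # With non-negative CRM activation, quenching never drops below the proximal-only level.
        assert quenched_strength(proximal_only, crm, q) >= proximal_only
print(long_range_strength(proximal_only, 0.2, 0.5))  # 0.6, below the proximal-only level
```

Any reporter whose activity falls below the proximal-only construct, as observed for the Cebpa silencers, is therefore incompatible with pure quenching in this toy picture.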
Broadly speaking, Cebpa is activated by TFs implicated in the specification of the myeloid lineage and repressed by TFs directing the specification of alternative hematopoietic lineages. We found two PU.1-dependent enhancers. Taken together with the observation that C/EBPα activates Sfpi1 by binding to a distal enhancer (Friedman, 2007a; Yeamans et al., 2007), this implies that the pair form a positive feedback loop (Alon, 2007). Cebpa is also activated by the binding of C/EBP family members to the proximal promoter (Fig. 7A), forming another positive feedback loop. Positive feedback loops can result in bistable behavior (Alon, 2007), and Cebpa's participation in two of them might be a strategy to maintain stable gene expression once induced. The activation of Cebpa by Egr1 via Cebpa(14) is surprising, since Egr1 and Egr2 are thought to antagonize Gfi1 to resolve the macrophage and neutrophil gene expression programs (Laslo et al., 2006). However, the expression level of retrovirally expressed Cebpa controls the ratio of CD11b+Gr-1- macrophages to CD11b+Gr-1+ neutrophils (Dahl et al., 2003); activation by Egr1 might serve to tune the level of Cebpa expression in the two cell types. Furthermore, this interaction might occur in liver tissue, where Cebpa and Egr1 are also coexpressed (Jakobsen et al., 2013). To summarize, the scheme of Cebpa activation combines induction by PU.1 and potentially by C/EBP family factors (Novershtern et al., 2011), positive feedback loops that stably maintain the expression level, and potential tuning by Egr1 within the myeloid lineage. Besides alternative-lineage repressors, Cebpa is also predicted to be repressed by TFs coexpressed in the myeloid lineage, such as Jun and Myb. Such repressors are mostly active at enhancer (Figs. 7E and S6) or inactive (Fig. S8E,F) CRMs. During hematopoiesis, Cebpa is expressed at low levels in HSCs, monocytes, and granulocytes but at high levels in GMPs (Bagger et al., 2013; Hasemann et al., 2014).
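The remark that positive feedback can produce bistability can be illustrated with a textbook toy model of a self-activating gene. Here the gene's product x activates its own synthesis through a Hill function and decays linearly: dx/dt = β·xⁿ/(1 + xⁿ) − x. The parameters β = 3 and n = 2 are illustrative assumptions, not fitted to any data in this study, and the model is not the transcriptional model used above. With these parameters the system has two stable states, 0 and (3 + √5)/2, separated by an unstable threshold at (3 − √5)/2. Starting points on either side of the threshold settle into different states.

```python
import math


def simulate_self_activation(x0: float, beta: float = 3.0, n: int = 2,
                             dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration of dx/dt = beta * x**n / (1 + x**n) - x.

    Toy positive-autoregulation model (Hill activation, linear decay);
    parameters are illustrative, not fitted to any data.
    """
    x = x0
    for _ in range(steps):
        x += dt * (beta * x ** n / (1.0 + x ** n) - x)
    return x


# For beta = 3, n = 2 the nonzero steady states are (3 ± sqrt(5)) / 2.
threshold = (3 - math.sqrt(5)) / 2   # unstable, about 0.382
high_state = (3 + math.sqrt(5)) / 2  # stable, about 2.618
print(round(simulate_self_activation(0.3), 3))  # 0.0: decays to the off state
print(round(simulate_self_activation(0.5), 3))  # 2.618: switches to the on state
```

Once x is pushed past the threshold, it stays in the high state without further input. This is the sense in which feedback could help maintain Cebpa expression once induced.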
Jun, which can function as a repressor by forming heterodimers with C/EBPβ (Hsu et al., 1994), has an approximately complementary pattern of expression, except that it is also expressed at low levels in granulocytes (Bagger et al., 2013; http://servers.binf.ku.dk/hemaexplorer/). Repression by Jun is a potential mechanism for downregulating Cebpa after the differentiation of GMPs, although confirming this possibility would require characterizing the time courses of CRM activity in the PUER system.

Both the expression pattern and the inferred regulation of Egr1 are considerably simpler than those of Cebpa (Fig. 4B and 8). Egr1 is predicted to be activated by Ets1 and by itself, and repressed by EBF1 and the GATA factor(s). This apparent difference in regulatory complexity could be genuine, or it could arise from differences in the evolutionary conservation of the regulatory sequences of the two genes, which was our criterion for identifying CRMs. The latter possibility may be checked by identifying putative CRMs by other means, such as DNase I hypersensitivity (Hesselberth et al., 2009) or the binding of other hematopoietic TFs. If the regulatory complexity of the two genes is indeed different, it would suggest that Cebpa occupies a more prominent position than Egr1 in the gene network directing myeloid differentiation (Laslo et al., 2006). Dissecting the regulation of other myeloid genes in this manner will help clarify the structure of the network.

The model was unable to produce any clear inferences about the regulation of Egr2 (Fig. S10). The main reason appears to be the general inactivity of Egr2 CRMs (Fig. 4C). A potential explanation for the inertness of conserved sequences in the Egr2 locus is that they serve regulatory functions in other cell types or during Egr2's rapid induction as an immediate early gene.
This possibility may be checked by measuring the enrichment of chromatin marks, such as H3K4me1 and H3K27ac (Creyghton et al., 2010; Lara-Astiaso et al., 2014), at the apparently inert locations in other cell types. Such data might not serve an analogous purpose for silencers, since it is not known whether any marks are enriched at distal silencers.

Here we have used a model that assumes very little about the specific mechanisms of TF-TF interaction, such as dimerization (Chlon et al., 2012; Hsu et al., 1994), switchable activation/repression (He et al., 2012a), and sequestration (Cantor and Orkin, 2002), that are known, for a few TFs, to operate in mammalian gene regulation. Lacking comprehensive knowledge about which TFs interact in these ways, we believe that parsimony in assumptions combined with inference from data is the more prudent approach. In the future, as proteomic approaches generate comprehensive maps of protein-protein interactions, it will be possible to implement such specific mechanisms in our framework to increase its power. A second limitation is the use of genome-wide gene expression data as a proxy for TF concentration, which leaves post-transcriptional and post-translational regulation out of the analysis. Although including these phenomena is desirable, using a standard and relatively easy-to-acquire dataset such as genome-wide gene expression permits a broad and unbiased approach to identifying candidate TFs that is applicable to a wide range of cell types and organisms. In the future, it will be possible to include post-transcriptional and post-translational regulation by using proteomic technologies such as micro-western arrays (Ciaccio et al., 2010) and modification-specific antibodies.

Cis-regulatory analysis is generally considered to be the gold standard for establishing functional regulatory linkages within GRNs (Nam et al., 2010).
Successful cis-regulatory dissection requires methods for mapping transcriptional inputs and regulatory sequences to their output. Furthermore, regulatory control by multiple interacting TFs creates a formidable challenge, since the potential number of sites and TFs to be tested is very large. The approach we have developed here leverages the mathematical rules of gene regulation, as currently understood, to map the inputs (TF expression patterns, TF binding preferences, and CRM sequence) to CRM activity patterns. We overcome the challenge of multiple inputs by allowing regulation by several TFs and combinatorially testing all possible regulatory schemes. The approach generates specific predictions that can be tested readily. A recent technological innovation, massively parallel reporter assays (MPRA) (Arnold et al., 2013; Levo and Segal, 2014; Melnikov et al., 2012; Nam et al., 2010; Sharon et al., 2012), which measure the activity of thousands of CRMs in parallel, further enhances the scalability of our approach. We expect that combining the approach presented here with MPRA datasets will enable cis-regulatory dissection on a genomic scale.

Acknowledgements

We thank M. Kreitman for the use of laboratory facilities during the course of this work. We thank M. Kreitman, M. Ludwig, K. Barr, and J. Gavin-Smith for discussions and A. Repele for comments on the manuscript. EB would like to thank H. Singh for support and discussions and J. Quintans for support. This work was supported by IIA-1355466, project UND0019821 from NSF ND EPSCoR (to M), 2R01OD10936 from NIH (to JR), and in part by the Chicago Biomedical Consortium with support from the Searle Funds at The Chicago Community Trust (to EB).

References

Alon, U., 2007. Network motifs: theory and experimental approaches. Nat Rev Genet 8, 450-461.
Arnold, C.D., Gerlach, D., Stelzer, C., Boryn, L.M., Rath, M., Stark, A., 2013. Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Science.
Arnosti, D., Gray, S., Barolo, S., Zhou, J., Levine, M., 1996a. The gap protein Knirps mediates both quenching and direct repression in the Drosophila embryo. The EMBO Journal 15, 3659-3666.
Arnosti, D.N., Barolo, S., Levine, M., Small, S., 1996b. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205-214.
Bagger, F.O., Rapin, N., Theilgaard-Mönch, K., Kaczkowski, B., Thoren, L.A., Jendholm, J., Winther, O., Porse, B.T., 2013. HemaExplorer: a database of mRNA expression profiles in normal and malignant haematopoiesis. Nucleic Acids Res 41, D1034-1039.
Bain, G., Maandag, E.C., Izon, D.J., Amsen, D., Kruisbeek, A.M., Weintraub, B.C., Krop, I., Schlissel, M.S., Feeney, A.J., van Roon, M., 1994. E2A proteins are required for proper B cell development and initiation of immunoglobulin gene rearrangements. Cell 79, 885-892.
Banerji, J., Olson, L., Schaffner, W., 1983. A lymphocyte-specific cellular enhancer is located downstream of the joining region in immunoglobulin heavy chain genes. Cell 33, 729-740.
Banerji, J., Rusconi, S., Schaffner, W., 1981. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299-308.
Bertolino, E., Reddy, K., Medina, K.L., Parganas, E., Ihle, J., Singh, H., 2005. Regulation of interleukin 7-dependent immunoglobulin heavy-chain variable gene rearrangements by transcription factor STAT5. Nat Immunol 6, 836-843.
Bertolino, E., Singh, H., 2002. POU/TBP cooperativity: a mechanism for enhancer action from a distance. Mol Cell 10, 397-407.
Bories, J.C., Willerford, D.M., Grévin, D., Davidson, L., Camus, A., Martin, P., Stéhelin, D., Alt, F.W., 1995. Increased T-cell apoptosis and terminal B-cell differentiation induced by inactivation of the Ets-1 proto-oncogene. Nature 377, 635-638.
Brand, A.H., Breeden, L., Abraham, J., Sternglanz, R., Nasmyth, K., 1985. Characterization of a "silencer" in yeast: a DNA sequence with properties opposite to those of a transcriptional enhancer. Cell 41, 41-48.
Calero-Nieto, F.J., Ng, F.S., Wilson, N.K., Hannah, R., Moignard, V., Leal-Cervantes, A.I., Jimenez-Madrid, I., Diamanti, E., Wernisch, L., Göttgens, B., 2014. Key regulators control distinct transcriptional programmes in blood progenitor and mast cells. EMBO J 33, 1212-1226.
Cantor, A.B., Orkin, S.H., 2001. Hematopoietic development: a balancing act. Curr Opin Genet Dev 11, 513-519.
Cantor, A.B., Orkin, S.H., 2002. Transcriptional regulation of erythropoiesis: an affair involving multiple partners. Oncogene 21, 3368-3376.
Carey, M.F., Smale, S.T., Peterson, C.L., 2008. Transcriptional Regulation in Eukaryotes: Concepts, Strategies, and Techniques, 2nd ed. Cold Spring Harbor Laboratory Press.
Chen, Y., Bates, D.L., Dey, R., Chen, P.-H., Machado, A.C.D., Laird-Offringa, I.A., Rohs, R., Chen, L., 2012. DNA binding by GATA transcription factor suggests mechanisms of DNA looping and long-range gene regulation. Cell Rep 2, 1197-1206.
Chlon, T.M., Doré, L.C., Crispino, J.D., 2012. Cofactor-Mediated Restriction of GATA-1 Chromatin Occupancy Coordinates Lineage-Specific Gene Expression. Mol Cell.
Choi, H.-J., Geng, Y., Cho, H., Li, S., Giri, P.K., Felio, K., Wang, C.-R., 2011. Differential requirements for the Ets transcription factor Elf-1 in the development of NKT cells and NK cells. Blood 117, 1880-1887.
Chopra, V.S., Kong, N., Levine, M., 2012. Transcriptional repression via antilooping in the Drosophila embryo. Proc Natl Acad Sci U S A 109, 9460-9464.
Chou, S.T., Khandros, E., Bailey, L.C., Nichols, K.E., Vakoc, C.R., Yao, Y., Huang, Z., Crispino, J.D., Hardison, R.C., Blobel, G.A., Weiss, M.J., 2009. Graded repression of PU.1/Sfpi1 gene transcription by GATA factors regulates hematopoietic cell fate. Blood 114, 983-994.
Ciaccio, M.F., Wagner, J.P., Chuu, C.-P., Lauffenburger, D.A., Jones, R.B., 2010. Systems analysis of EGF receptor signaling dynamics with microwestern arrays. Nat Methods 7, 148-155.
Cooper, S., Guo, H., Friedman, A.D., 2015. The +37 kb Cebpa Enhancer Is Critical for Cebpa Myeloid Gene Expression and Contains Functional Sites that Bind SCL, GATA2, C/EBPα, PU.1, and Additional Ets Factors. PLoS One 10, e0126385.
Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J., Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., Boyer, L.A., Young, R.A., Jaenisch, R., 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107, 21931-21936.
Dahl, R., Walsh, J.C., Lancki, D., Laslo, P., Iyer, S.R., Singh, H., Simon, M.C., 2003. Regulation of macrophage and neutrophil cell fates by the PU.1:C/EBPalpha ratio and granulocyte colony-stimulating factor. Nat Immunol 4, 1029-1036.
DeKoter, R.P., Lee, H.-J., Singh, H., 2002. PU.1 regulates expression of the interleukin-7 receptor in lymphoid progenitors. Immunity 16, 297-309.
DeKoter, R.P., Singh, H., 2000. Regulation of B lymphocyte and macrophage development by graded expression of PU.1. Science 288, 1439-1441.
Doré, L.C., Chlon, T.M., Brown, C.D., White, K.P., Crispino, J.D., 2012. Chromatin occupancy analysis reveals genome-wide GATA factor switching during hematopoiesis. Blood 119, 3724-3733.
Dunipace, L., Ozdemir, A., Stathopoulos, A., 2011. Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail expression. Development.
ENCODE Project Consortium, Bernstein, B.E., Birney, E., Dunham, I., Green, E.D., Gunter, C., Snyder, M., 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74.
Friedman, A.D., 2007a. C/EBPα induces PU.1 and interacts with AP-1 and NF-κB to regulate myeloid development. Blood Cells Mol Dis 39, 340-343.
Friedman, A.D., 2007b. Transcriptional control of granulocyte and monocyte development. Oncogene 26, 6816-6828.
Fromental, C., Kanno, M., Nomiyama, H., Chambon, P., 1988. Cooperativity and hierarchical levels of functional organization in the SV40 enhancer. Cell 54, 943-953.
Garber, M., Yosef, N., Goren, A., Raychowdhury, R., Thielke, A., Guttman, M., Robinson, J., Minie, B., Chevrier, N., Itzhaki, Z., Blecher-Gonen, R., Bornstein, C., Amann-Zalcenstein, D., Weiner, A., Friedrich, D., Meldrim, J., Ram, O., Cheng, C., Gnirke, A., Fisher, S., Friedman, N., Wong, B., Bernstein, B.E., Nusbaum, C., Hacohen, N., Regev, A., Amit, I., 2012. A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Mol Cell 47, 810-822.
Georgopoulos, K., 2002. Haematopoietic cell-fate decisions, chromatin regulation and ikaros. Nat Rev Immunol 2, 162-174.
Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.-K., Cheng, C., Mu, X.J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A.P., Cayting, P., Charos, A., Chen, D.Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E.C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T.E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K.Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P.J., Myers, R.M., Weissman, S.M., Snyder, M., 2012. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91-100.
Gerstein, M.B., Lu, Z.J., Van Nostrand, E.L., Cheng, C., Arshinoff, B.I., Liu, T., Yip, K.Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry, M., Morris, M., Auerbach, R.K., Feng, X., Leng, J., Vielle, A., Niu, W., Rhrissorrakrai, K., Agarwal, A., Alexander, R.P., Barber, G., Brdlik, C.M., Brennan, J., Brouillet, J.J., Carr, A., Cheung, M.-S., Clawson, H., Contrino, S., Dannenberg, L.O., Dernburg, A.F., Desai, A., Dick, L., Dosé, A.C., Du, J., Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E.A., Gassmann, R., Good, P.J., Green, P., Gullier, F., Gutwein, M., Guyer, M.S., Habegger, L., Han, T., Henikoff, J.G., Henz, S.R., Hinrichs, A., Holster, H., Hyman, T., Iniguez, A.L., Janette, J., Jensen, M., Kato, M., Kent, W.J., Kephart, E., Khivansara, V., Khurana, E., Kim, J.K., Kolasinska-Zwierz, P., Lai, E.C., Latorre, I., Leahey, A., Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R.F., Lubling, Y., Lyne, R., MacCoss, M., Mackowiak, S.D., Mangone, M., McKay, S., Mecenas, D., Merrihew, G., Miller, D.M., 3rd, Muroyama, A., Murray, J.I., Ooi, S.-L., Pham, H., Phippen, T., Preston, E.A., Rajewsky, N., Rätsch, G., Rosenbaum, H., Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A., Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F.J., Slightam, C., Smith, R., Spencer, W.C., Stinson, E.O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K., Wang, G., Washington, N.L., Whittle, C.M., Wu, B., Yan, K.-K., Zeller, G., Zha, Z., Zhong, M., Zhou, X., modENCODE Consortium, Ahringer, J., Strome, S., Gunsalus, K.C., Micklem, G., Liu, X.S., Reinke, V., Kim, S.K., Hillier, L.W., Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J.D., Waterston, R.H., 2010. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775-1787.
Göttgens, B., Nastos, A., Kinston, S., Piltz, S., Delabesse, E.C.M., Stanley, M., Sanchez, M.-J., Ciau-Uitz, A., Patient, R., Green, A.R., 2002. Establishing the transcriptional programme for blood: the SCL stem cell enhancer is regulated by a multiprotein complex containing Ets and GATA factors. EMBO J 21, 3039-3050.
Graf, T., Enver, T., 2009. Forcing cells to change lineages. Nature 462, 587-594.
Grass, J.A., Boyer, M.E., Pal, S., Wu, J., Weiss, M.J., Bresnick, E.H., 2003. GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling. Proc Natl Acad Sci U S A 100, 8811-8816.
Gray, S., Levine, M., 1996. Short-range transcriptional repressors mediate both quenching and direct repression within complex loci in Drosophila. Genes and Development 10, 700-710.
Guo, H., Ma, O., Friedman, A.D., 2014. The Cebpa +37-kb enhancer directs transgene expression to myeloid progenitors and to long-term hematopoietic stem cells. J Leukoc Biol 96, 419-426.
Guo, H., Ma, O., Speck, N.A., Friedman, A.D., 2012. Runx1 deletion or dominant inhibition reduces Cebpa transcription via conserved promoter and distal enhancer sites to favor monopoiesis over granulopoiesis. Blood 119, 4408-4418.
Hardison, R.C., Taylor, J., 2012. Genomic approaches towards finding cis-regulatory modules in animals. Nat Rev Genet 13, 469-483.
Harmston, N., Lenhard, B., 2013. Chromatin and epigenetic features of long-range gene regulation. Nucleic Acids Res 41, 7185-7199.
Hasemann, M.S., Lauridsen, F.K.B., Waage, J., Jakobsen, J.S., Frank, A.-K., Schuster, M.B., Rapin, N., Bagger, F.O., Hoppe, P.S., Schroeder, T., Porse, B.T., 2014. C/EBPα is required for long-term self-renewal and lineage priming of hematopoietic stem cells and for the maintenance of epigenetic configurations in multipotent progenitors. PLoS Genet 10, e1004079.
He, A., Shen, X., Ma, Q., Cao, J., von Gise, A., Zhou, P., Wang, G., Marquez, V.E., Orkin, S.H., Pu, W.T., 2012a. PRC2 directly methylates GATA4 and represses its transcriptional activity. Genes Dev 26, 37-42.
He, X., Duque, T.S.P.C., Sinha, S., 2012b. Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 29, 1059-1070.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., Glass, C.K., 2010. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589.
Hertz, G.Z., Hartzell, G.W., 3rd, Stormo, G.D., 1990. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci 6, 81-92.
Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds, A.P., Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., Fields, S., Stamatoyannopoulos, J.A., 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6, 283-289.
Hsu, W., Kerppola, T.K., Chen, P.L., Curran, T., Chen-Kiang, S., 1994. Fos and Jun repress transcription activation by NF-IL6 through association at the basic zipper region. Mol Cell Biol 14, 268-276.
Huang, S., Guo, Y., May, G., Enver, T., 2007. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Developmental Biology 305, 695-713.
Huang, Z., Dore, L.C., Li, Z., Orkin, S.H., Feng, G., Lin, S., Crispino, J.D., 2009. GATA2 reinforces megakaryocyte development in the absence of GATA-1. Mol Cell Biol 29, 5168-5180.
Jakobsen, J.S., Waage, J., Rapin, N., Bisgaard, H.C., Larsen, F.S., Porse, B.T., 2013. Temporal mapping of CEBPA and CEBPB binding during liver regeneration reveals dynamic occupancy and specific regulatory codes for homeostatic and cell cycle gene batteries. Genome Res 23, 592-603.
Janssens, H., Hou, S., Jaeger, J., Kim, A., Myasnikova, E., Sharp, D., Reinitz, J., 2006. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even-skipped gene. Nature Genetics 38, 1159-1165.
Kazemian, M., Blatti, C., Richards, A., McCutchan, M., Wakabayashi-Ito, N., Hammonds, A.S., Celniker, S.E., Kumar, S., Wolfe, S.A., Brodsky, M.H., Sinha, S., 2010. Quantitative analysis of the Drosophila segmentation regulatory network using pattern generating potentials. PLoS Biol 8.
Kim, A.-R., Martinez, C., Ionides, J., Ramos, A.F., Ludwig, M.Z., Ogawa, N., Sharp, D.H., Reinitz, J., 2013. Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLoS Genet 9, e1003243.
Ko, L.J., Engel, J.D., 1993. DNA-binding specificities of the GATA transcription factor family. Mol Cell Biol 13, 4011-4022.
Kueh, H.Y., Champhekhar, A., Nutt, S.L., Elowitz, M.B., Rothenberg, E.V., 2013. Positive Feedback Between PU.1 and the Cell Cycle Controls Myeloid Differentiation. Science.
Kulkarni, M.M., Arnosti, D.N., 2005. cis-Regulatory logic of short-range transcriptional repression in Drosophila melanogaster. Molecular and Cellular Biology 25, 3411-3420.
Lam, J., Delosme, J.-M., 1988a. An efficient simulated annealing schedule: Derivation. Yale Electrical Engineering Department, New Haven, CT.
Lam, J., Delosme, J.-M., 1988b. An efficient simulated annealing schedule: Implementation and evaluation. Yale Electrical Engineering Department, New Haven, CT.
Landry, J.-R., Bonadies, N., Kinston, S., Knezevic, K., Wilson, N.K., Oram, S.H., Janes, M., Piltz, S., Hammett, M., Carter, J., Hamilton, T., Donaldson, I.J., Lacaud, G., Frampton, J., Follows, G., Kouskoff, V., Göttgens, B., 2009. Expression of the leukemia oncogene Lmo2 is controlled by an array of tissue-specific elements dispersed over 100 kb and bound by Tal1/Lmo2, Ets, and Gata factors. Blood 113, 5783-5792.
Lara-Astiaso, D., Weiner, A., Lorenzo-Vivas, E., Zaretsky, I., Jaitin, D.A., David, E., Keren-Shaul, H., Mildner, A., Winter, D., Jung, S., Friedman, N., Amit, I., 2014. Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943-949.
Laslo, P., Pongubala, J.M.R., Lancki, D.W., Singh, H., 2008. Gene regulatory networks directing myeloid and lymphoid cell fates within the immune system. Semin Immunol 20, 228-235.
Laslo, P., Spooner, C.J., Warmflash, A., Lancki, D.W., Lee, H.-J., Sciammas, R., Gantner, B.N., Dinner, A.R., Singh, H., 2006. Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126, 755-766.
Leddin, M., Perrod, C., Hoogenkamp, M., Ghani, S., Assi, S., Heinz, S., Wilson, N.K., Follows, G., Schönheit, J., Vockentanz, L., Mosammam, A.M., Chen, W., Tenen, D.G., Westhead, D.R., Göttgens, B., Bonifer, C., Rosenbauer, F., 2011. Two distinct auto-regulatory loops operate at the PU.1 locus in B cells and myeloid cells. Blood 117, 2827-2838.
Legraverend, C., Antonson, P., Flodby, P., Xanthopoulos, K.G., 1993. High level activity of the mouse CCAAT/enhancer binding protein (C/EBP alpha) gene promoter involves autoregulation and several ubiquitous transcription factors. Nucleic Acids Res 21, 1735-1742.
Levine, M., Davidson, E.H., 2005. Gene regulatory networks for development. Proc Natl Acad Sci U S A 102, 4936-4942.
Levo, M., Segal, E., 2014. In pursuit of design principles of regulatory sequences. Nat Rev Genet 15, 453-468.
Li, Y., Okuno, Y., Zhang, P., Radomska, H.S., Chen, H., Iwasaki, H., Akashi, K., Klemsz, M.J., McKercher, S.R., Maki, R.A., Tenen, D.G., 2001. Regulation of the PU.1 gene by distal elements. Blood 98, 2958-2965.
Lin, H., Grosschedl, R., 1995. Failure of B-cell differentiation in mice lacking the transcription factor EBF. Nature 376, 263-267.
Lin, Y.C., Jhunjhunwala, S., Benner, C., Heinz, S., Welinder, E., Mansson, R., Sigvardsson, M., Hagman, J., Espinoza, C.A., Dutkowski, J., Ideker, T., Glass, C.K., Murre, C., 2010. A global network of transcription factors, involving E2A, EBF1 and Foxo1, that orchestrates B cell fate. Nat Immunol 11, 635-643.
Lulli, V., Romania, P., Morsilli, O., Gabbianelli, M., Pagliuca, A., Mazzeo, S., Testa, U., Peschle, C., Marziali, G., 2006. Overexpression of Ets-1 in human hematopoietic progenitor cells blocks erythroid and promotes megakaryocytic differentiation. Cell Death Differ 13, 1064-1074.
Lulli, V., Romania, P., Riccioni, R., Boe, A., Lo-Coco, F., Testa, U., Marziali, G., 2010. Transcriptional silencing of the ETS1 oncogene contributes to human granulocytic differentiation. Haematologica 95, 1633-1641.
Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.-y., Chou, A., Ienasescu, H., Lim, J., Shyr, C., Tan, G., Zhou, M., Lenhard, B., Sandelin, A., Wasserman, W.W., 2014. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42, D142-147.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A.E., Wingender, E., 2006. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34, D108-110.
May, G., Soneji, S., Tipping, A.J., Teles, J., McGowan, S.J., Wu, M., Guo, Y., Fugazza, C., Brown, J., Karlsson, G., Pina, C., Olariu, V., Taylor, S., Tenen, D.G., Peterson, C., Enver, T., 2013. Dynamic analysis of gene expression and genome-wide transcription factor binding during lineage specification of multipotent progenitors. Cell Stem Cell 13, 754-768.
Melnikov, A., Murugan, A., Zhang, X., Tesileanu, T., Wang, L., Rogov, P., Feizi, S., Gnirke, A., Callan, C.G., Jr., Kinney, J.B., Kellis, M., Lander, E.S., Mikkelsen, T.S., 2012. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 30, 271-277.
Merika, M., Orkin, S.H., 1993. DNA-binding specificity of GATA family transcription factors. Mol Cell Biol 13, 3999-4010.
Muthusamy, N., Barton, K., Leiden, J.M., 1995. Defective activation and survival of T cells lacking the Ets-1 transcription factor. Nature 377, 639-642.
Nam, J., Dong, P., Tarpine, R., Istrail, S., Davidson, E.H., 2010. Functional cis-regulatory genomics for systems biology. Proc Natl Acad Sci U S A 107, 3930-3935.
Nègre, N., Brown, C.D., Ma, L., Bristow, C.A., Miller, S.W., Wagner, U., Kheradpour, P., Eaton, M.L., Loriaux, P., Sealfon, R., Li, Z., Ishii, H., Spokony, R.F., Chen, J., Hwang, L., Cheng, C., Auburn, R.P., Davis, M.B., Domanus, M., Shah, P.K., Morrison, C.A., Zieba, J., Suchy, S., Senderowicz, L., Victorsen, A., Bild, N.A., Grundstad, A.J., Hanley, D., MacAlpine, D.M., Mannervik, M., Venken, K., Bellen, H., White, R., Gerstein, M., Russell, S., Grossman, R.L., Ren, B., Posakony, J.W., Kellis, M., White, K.P., 2011. A cis-regulatory map of the Drosophila genome. Nature 471, 527-531.
Nerlov, C., 2007. The C/EBP family of transcription factors: a paradigm for interaction between gene expression and proliferation control. Trends Cell Biol 17, 318-324.
Nguyen, H.Q., Hoffman-Liebermann, B., Liebermann, D.A., 1993. The zinc finger transcription factor Egr-1 is essential for and restricts differentiation along the macrophage lineage. Cell 72, 197-209.
Novershtern, N., Subramanian, A., Lawton, L.N., Mak, R.H., Haining, W.N., McConkey, M.E., Habib, N., Yosef, N., Chang, C.Y., Shay, T., Frampton, G.M., Drake, A.C.B., Leskov, I., Nilsson, B., Preffer, F., Dombkowski, D., Evans, J.W., Liefeld, T., Smutko, J.S., Chen, J., Friedman, N., Young, R.A., Golub, T.R., Regev, A., Ebert, B.L., 2011. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296-309.
Ogbourne, S., Antalis, T.M., 1998. Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes. Biochem J 331 (Pt 1), 1-14.
Ondek, B., Gloss, L., Herr, W., 1988. The SV40 enhancer contains two distinct levels of organization. Nature 333, 40-45.
Perry, M.W., Boettiger, A.N., Levine, M., 2011. Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc Natl Acad Sci U S A 108, 13570-13575.
Pimanda, J.E., Chan, W.Y.I., Wilson, N.K., Smith, A.M., Kinston, S., Knezevic, K., Janes, M.E., Landry, J.-R., Kolb-Kokocinski, A., Frampton, J., Tannahill, D., Ottersbach, K., Follows, G.A., Lacaud, G., Kouskoff, V., Göttgens, B., 2008. Endoglin expression in blood and endothelium is differentially regulated by modular assembly of the Ets/Gata hemangioblast code. Blood 112, 4512-4522.
Pongubala, J.M.R., Northrup, D.L., Lancki, D.W., Medina, K.L., Treiber, T., Bertolino, E., Thomas, M., Grosschedl, R., Allman, D., Singh, H., 2008. Transcription factor EBF restricts alternative lineage options and promotes B cell fate commitment independently of Pax5. Nat Immunol 9, 203-215.
Reinitz, J., Hou, S., Sharp, D.H., 2003. Transcriptional control in Drosophila. ComPlexUs 1, 54-64.
Reinitz, J., Sharp, D.H., 1995. Mechanism of eve stripe formation. Mechanisms of Development 49, 133-158.
Reynaud, D., Demarco, I.A., Reddy, K.L., Schjerven, H., Bertolino, E., Chen, Z., Smale, S.T., Winandy, S., Singh, H., 2008. Regulation of B cell fate commitment and immunoglobulin heavy-chain gene rearrangements by Ikaros. Nat Immunol 9, 927-936.
Robinson, L., Panayiotakis, A., Papas, T.S., Kola, I., Seth, A., 1997. ETS target genes: identification of egr1 as a target by RNA differential display and whole genome PCR techniques. Proc Natl Acad Sci U S A 94, 7170-7175.
Rodrigues, N.P., Boyd, A.S., Fugazza, C., May, G.E., Guo, Y., Tipping, A.J., Scadden, D.T., Vyas, P., Enver, T., 2008. GATA-2 regulates granulocyte-macrophage progenitor cell function. Blood 112, 4862-4873.
Rodrigues, N.P., Tipping, A.J., Wang, Z., Enver, T., 2012. GATA-2 mediated regulation of normal hematopoietic stem/progenitor cell function, myelodysplasia and myeloid leukemia. Int J Biochem Cell Biol 44, 457-460.
Rothenberg, E.V., 2014. Transcriptional control of early T and B cell developmental choices. Annu Rev Immunol 32, 283-321.
Schirm, S., Jiricny, J., Schaffner, W., 1987. The SV40 enhancer can be dissected into multiple segments, each with a different cell type specificity. Genes Dev 1, 65-74.
Scott, E.W., Simon, M.C., Anastasi, J., Singh, H., 1994. Requirement of transcription factor PU.1 in the development of multiple hematopoietic lineages. Science 265, 1573-1577.
Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U., Gaul, U., 2008. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535-540.
Sharon, E., Kalma, Y., Sharp, A., Raveh-Sadka, T., Levo, M., Zeevi, D., Keren, L., Yakhini, Z., Weinberger, A., Segal, E., 2012. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat Biotechnol 30, 521-530.
Singh, H., Khan, A.A., Dinner, A.R., 2014. Gene regulatory networks in the immune system. Trends Immunol 35, 211-218.
Small, S., Arnosti, D.N., Levine, M., 1993. Spacing ensures autonomous expression of different stripe enhancers in the even-skipped promoter. Development 119, 767-772.
Small, S., Blair, A., Levine, M., 1992. Regulation of even-skipped stripe 2 in the Drosophila embryo. The EMBO Journal 11, 4047-4057.
Small, S., Blair, A., Levine, M., 1996. Regulation of two pair-rule stripes by a single enhancer in the Drosophila embryo. Developmental Biology 175, 314-324.
Spitz, F., Furlong, E.E.M., 2012. Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13, 613-626.
Stopka, T., Amanatullah, D.F., Papetti, M., Skoultchi, A.I., 2005. PU.1 inhibits the erythroid program by binding to GATA-1 on DNA and creating a repressive chromatin structure. EMBO J 24, 3712-3723.
Treiber, T., Mandel, E.M., Pott, S., Györy, I., Firner, S., Liu, E.T., Grosschedl, R., 2010. Early B cell factor 1 regulates B cell gene networks by activation, repression, and transcription-independent poising of chromatin. Immunity 32, 714-725.
Tsukada, J., Yoshida, Y., Kominato, Y., Auron, P.E., 2011. The CCAAT/enhancer (C/EBP) family of basic-leucine zipper (bZIP) transcription factors is a multifaceted highly-regulated system for gene regulation. Cytokine 54, 6-19.
Walsh, J.C., DeKoter, R.P., Lee, H.J., Smith, E.D., Lancki, D.W., Gurish, M.F., Friend, D.S., Stevens, R.L., Anastasi, J., Singh, H., 2002. Cooperative and antagonistic interplay between PU.1 and GATA-2 in the specification of myeloid cell fates. Immunity 17, 665-676.
Weigelt, K., Lichtinger, M., Rehli, M., Langmann, T., 2009. Transcriptomic profiling identifies a PU.1 regulatory network in macrophages. Biochem Biophys Res Commun 380, 308-312.
Wilson, N.K., Calero-Nieto, F.J., Ferreira, R., Göttgens, B., 2011. Transcriptional regulation of haematopoietic transcription factors. Stem Cell Res Ther 2, 6.
Wilson, N.K., Foster, S.D., Wang, X., Knezevic, K., Schütte, J., Kaimakis, P., Chilarska, P.M., Kinston, S., Ouwehand, W.H., Dzierzak, E., Pimanda, J.E., de Bruijn, M.F.T.R., Göttgens, B., 2010a. Combinatorial transcriptional control in blood stem/progenitor cells: genome-wide analysis of ten major transcriptional regulators. Cell Stem Cell 7, 532-544.
Wilson, N.K., Timms, R.T., Kinston, S.J., Cheng, Y.-H., Oram, S.H., Landry, J.-R., Mullender, J., Ottersbach, K., Göttgens, B., 2010b. Gfi1 expression is controlled by five distinct regulatory regions spread over 100 kilobases, with Scl/Tal1, Gata2, PU.1, Erg, Meis1, and Runx1 acting as upstream regulators in early hematopoietic cells. Mol Cell Biol 30, 3853-3863.
Yeamans, C., Wang, D., Paz-Priel, I., Torbett, B.E., Tenen, D.G., Friedman, A.D., 2007. C/EBPalpha binds and activates the PU.1 distal enhancer to induce monocyte lineage commitment. Blood 110, 3136-3142.
Yücel, R., Kosan, C., Heyd, F., Möröy, T., 2004. Gfi1:green fluorescent protein knock-in mutant reveals differential expression and autoregulation of the growth factor independence 1 (Gfi1) gene during lymphocyte development. J Biol Chem 279, 40906-40917.
Yuh, C.-H., Bolouri, H., Davidson, E.H., 1998. Genomic cis-regulatory logic: Functional analysis and computational model of a sea urchin gene control system. Science 279, 1896-1902.
Zhang, D.E., Zhang, P., Wang, N.D., Hetherington, C.J., Darlington, G.J., Tenen, D.G., 1997. Absence of granulocyte colony-stimulating factor signaling and neutrophil development in CCAAT enhancer binding protein alpha-deficient mice. Proc Natl Acad Sci U S A 94, 569-574.
Zinzen, R.P., Senger, K., Levine, M., Papatsenko, D., 2006. Computational models for neurogenic gene expression in the Drosophila embryo. Curr Biol 16, 1358-1365.

Figure Legends

Figure 1. A schematic illustration of the methodology for reverse engineering cis-regulatory logic.

Figure 2. Sequence-based model of transcription. The model takes DNA sequence, PWMs, and estimates of TF concentration as inputs and computes the rate of transcription as output. The different steps of the calculation are shown; see Supplementary Text S1 for a detailed description of the model. A. The DNA sequence is scored with PWMs by sliding an L-bp window. The score, $S$, is thresholded to identify sites (see Table S2 and Methods). The binding affinity, $K$, is computed for each TFBS. B. For each site $k$, the fractional occupancy ($f_k$), the fraction of time for which it is occupied, is computed. Here, this is illustrated with a 3-site example. All the configurations in which the binding sites can be occupied are enumerated. The weight $w_i$ is proportional to the probability that the sites are occupied in configuration $\sigma_i$.
The fractional occupancy is given by the sum of the weights of the configurations in which a site is occupied, divided by the sum of all weights. $v_i$ are the concentrations of the TFs that bind to the sites under consideration. Competition between TFs for overlapping sites is implemented by excluding configurations in which overlapping sites are simultaneously occupied. C. Quenching, or activator-specific repression, reduces the fractional occupancy of activators to $f'_k$ if they are within the repression range of one or more repressors, specified by the distance function $q(\cdot)$. The reduction of activator occupancy results in a lower transcription rate. The effects of multiple repressors are multiplicative, allowing several weak sites to act as a strong one. D. The sum of the fractional occupancies of the activators, weighted by their efficiencies of activation, $E_A^{a(k)}$, is computed to determine the interaction strength, $I$, of the CRM(s) with the core promoter. In the final step of the calculation (panel F), we model transcription initiation as an enzymatic process in which the reduction $\Delta\Delta A$ in the activation energy barrier $\Delta A$ is proportional to the net interaction strength. Due to the nonlinear form of the Arrhenius law (panel F), multiple bound activator molecules have a superlinear additive, that is, synergistic, effect on transcription. E. Generalized or long-range repression reduces the interaction strength in a multiplicative but distance-independent manner, giving the net interaction strength. F. The net interaction strength is used to calculate the rate of transcription using a diffusion-limited version of the Arrhenius law (Kim et al., 2013). Here $Q$ is a factor that converts the net interaction strength to units of energy and $\Theta$ is the activation energy barrier when no activators are bound.

Figure 3. Identification of putative CRMs and design of reporter constructs. A-C.
Plots of sequence identity and putative CRMs tested in this study for Cebpa (A), Egr1 (B), and Egr2 (C). The first track shows sequence identity between the mm9 (mouse) and CanFam2 (dog) genomes averaged over a 101-bp window. The second track shows only those positions having identity > 70%. The third track shows annotated genes. The fourth track displays the putative CRMs tested here. CRMs are referred to by the gene name followed by the CRM number in parentheses. D. Design of the reporter constructs. For each gene, the first construct carries the proximal promoter, Cebpa(0), Egr1(0), or Egr2(0), immediately upstream of the core promoter of the construct. The remaining constructs carry both a distal CRM and the cognate proximal promoter, separated by an intervening vector sequence 2828 bp in length (see Methods).

Figure 4. Luciferase activity pattern of CRM-reporter constructs in PUER cells. A. Cebpa CRMs. B. Egr1 CRMs. C. Egr2 CRMs. Activity in the uninduced condition (progenitor), after 24 h of IL-3 and OHT treatment (early macrophage), and after 24 h of G-CSF and OHT treatment (early granulocyte) is shown in blue, red, and green bars, respectively. Error bars span the measurements from two replicates, and the height of each colored bar is their mean.

Figure 5. Results of nonlinear optimization and the selection of representative models for further analysis. A. The scores of the two rounds of nonlinear optimization are shown. The first round included only myeloid-implicated factors, while the second round included four non-myeloid factors. The parameters of each combination of TF regulatory roles were inferred in 5 replicates. The lowest score of each combination, numbering $2^{15}$ in all, is plotted. The 20 lowest-scoring regulatory combinations, analyzed further, are plotted in red. B,D. Model selection from myeloid-only runs. C,E. Model selection from runs including non-myeloid TFs. B-C. Regulatory roles assigned to the 20 lowest-scoring models in each run. Red is activation and blue is repression.
The models were clustered hierarchically (Fig. S4) based on the similarity of their regulatory-role assignments. The members of the largest cluster are the top 8 models in both panels. D-E. Scatter plots of model output against reporter activity for the models selected for further analysis. Both axes are on a log scale to show low expression values clearly. D. Model 12058; squared Pearson correlation coefficient $r^2 = 0.78$. E. Model 81762; $r^2 = 0.91$.

Figure 6. Comparison of the output of representative models with reporter activity data. A-C. Model 12058, representative of the first round of reverse engineering, with myeloid-only factors. D-F. Model 81762, representative of the second round of reverse engineering, which included the non-myeloid factors EBF1, E2A, GATA(s), and Elf1. A,D. Cebpa. B,E. Egr1. C,F. Egr2. Reporter activity data and model output are shown as filled and open bars, respectively. Colors and error bars are as in Fig. 4.

Figure 7. Inference of regulatory logic. A-C. Activation. D-F. Repression. A,D. Cebpa proximal promoter Cebpa(0). B,E. Cebpa enhancer Cebpa(16). C,F. Cebpa silencer Cebpa(11). A-C. The activity of each activator site is plotted. The activity is the amount by which the individual site reduces the activation energy barrier, and depends on the occupancy of the site and the efficiency of the bound activator (Fig. 2D,F). D-F. The repressive activity of each repressor site is plotted. The repressive activity is the fraction by which the repressor reduces the interaction strength, which results in a higher activation energy barrier. It depends on the occupancy of the repressor site and the efficiency of long-range repression of the bound repressor (Fig. 2E,F). The gray box is intervening vector sequence (Fig. 3D). The sites to the right of the gray box are on the proximal promoter. The x-axis shows each modeled binding site and, in parentheses, the position of its 5’ end in the reporter construct relative to the 3’ end of the proximal promoter.
Figure 8. Summary of the inferred cis-regulation of Cebpa and Egr1. A. Cebpa. B. Egr1.

The analysis of novel distal Cebpa enhancers and silencers using a transcriptional model reveals the complex regulatory logic of hematopoietic lineage specification

Supplementary Information

Eric Bertolino (a,*), John Reinitz (a,b,c), & Manu (c,d,*)
a Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
b Department of Statistics, The University of Chicago, Chicago, IL 60637, U.S.A.
c Department of Ecology and Evolution and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
d Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, U.S.A.
* Corresponding authors. E-mail addresses: manu.manu@und.edu (M) and eric.bertolino@gmail.com (EB).

Contents
Text S1: Sequence-based model of transcription
Text S2: The optimization problem and significance of fits
Supplementary Figures
Supplementary Tables
Text S3: CRM and vector sequences in FASTA format
References

List of Figures
S1 Mean and standard deviation of the gene expression of 62 candidate TFs in PUER cells
S2 Gene expression in PUER cells of the TFs tested in this study
S3 Scores of model fits with permuted data
S4 Hierarchical clustering of models to identify representative ones for further analysis
S5 Maximum activity of each TF in model 12058
S6 Regulatory logic of Cebpa enhancers
S7 Regulatory logic of Cebpa silencers
S8 Regulatory logic of miscellaneous Cebpa CRMs
S9 Regulatory logic of Egr1 CRMs
S10 Regulatory logic of Egr2 CRMs
S11 Compilation of ChIP-seq and ChIP-chip datasets from NCBI Gene Expression Omnibus
S12 The dependence of model output on PWM choice
S13 The dependence of the quality of fit on the number of TFs in the model

List of Tables
S1 List of 62 candidate TFs
S2 Position Weight Matrices used to detect binding sites for each transcription factor
S3 Microarray probes used to determine gene expression for each TF
S4 Scores of the lowest-scoring models by replicate
S5 Parameter values for the two analyzed models

Text S1: Sequence-based model of transcription

The model takes CRM DNA sequence, estimates of TF concentration, and TF PWMs as input and computes the rate of transcription driven by the CRM. A model is defined by specifying the TFs and their regulatory roles beforehand.

Binding sites and their affinity

Let the length of the PWM be L. Each L-mer has a score

$$S = \sum_{j=1}^{L} \ln\left(\frac{f_{bj}}{p_b}\right), \qquad \text{(S1)}$$

where $b$ is the base at position $j$ of the L-mer, $f_{bj}$ is the frequency of base $b$ at position $j$ in an alignment of known binding sites for the TF, and $p_b$ is the a priori frequency of the base in the genome (Hertz & Stormo, 1999). The site-selection theory of Berg & von Hippel (1987) provides a means of relating the frequency of occurrence of nucleotides within a site to its free energy of binding. The theory assumes that 1) large numbers of sites are selected to have free energies of binding within a narrow range and 2) individual base pairs contribute independently, that is, additively, to the free energy of binding.
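As a minimal illustration of Eq. (S1), the Python sketch below scores every L-mer of a sequence against a position frequency matrix. The 3-bp matrix `TOY_PWM`, the uniform background `UNIFORM`, and the function names are hypothetical; pseudocounts, reverse-strand scanning, and the per-TF thresholds used in the study (Table S2, Methods) are not modeled here.

```python
import math

# Hypothetical 3-bp PWM: column-wise base frequencies f_bj favoring "GAT".
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed a priori base frequencies p_b (uniform background).
UNIFORM = {b: 0.25 for b in "ACGT"}


def pwm_score(kmer, freqs, background):
    """Log-odds score S = sum_j ln(f_bj / p_b) of one L-mer (Eq. S1)."""
    if len(kmer) != len(freqs):
        raise ValueError("k-mer length must equal the PWM length L")
    return sum(math.log(col[b] / background[b]) for b, col in zip(kmer, freqs))


def scan(seq, freqs, background):
    """Slide an L-bp window along seq (forward strand only) and score each L-mer.

    Returns a list of (start position, score) pairs.
    """
    L = len(freqs)
    return [(pos, pwm_score(seq[pos:pos + L], freqs, background))
            for pos in range(len(seq) - L + 1)]


if __name__ == "__main__":
    for pos, s in scan("TGATC", TOY_PWM, UNIFORM):
        print(pos, round(s, 3))
```

In a full pipeline, only windows whose score exceeds the TF's preset threshold would be retained as binding sites.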
Under these assumptions, the difference in score between a site and the consensus sequence is proportional to the discrimination energy, the difference in the free energy of binding between the site and the consensus sequence (Berg & von Hippel, 1987). As a consequence, the binding affinity of the site relative to the consensus site can be written as

$$K = K_{cons} \exp\left(\frac{S - S_{cons}}{\lambda}\right), \qquad \text{(S2)}$$

where $K_{cons}$ is the binding affinity of the consensus site, expressed in units of inverse fluorescence intensity, and $\lambda$ is the proportionality constant between binding energy and score. $K_{cons}$ is a free parameter of the model and is inferred from CRM activity data by global nonlinear optimization (see Methods). $\lambda$ is not known for most TFs, but is believed to have values between 0.5 and 1.5 (Berg & von Hippel, 1987). To limit the number of free parameters of the model, we fix its value to 1 here. We score each L-mer in a CRM with the PWM and retain it in the model if the score exceeds a preset per-TF threshold value (see Methods).

Fractional occupancy calculation and competition

Let the sites in the CRM be indexed by $i$, and let $a(i)$ be the TF binding to site $i$. Two sites are regarded as overlapping if they share at least one nucleotide. We represent the site-occupancy state by a binary vector $\sigma$ and, for each $\sigma$, subdivide the sites into occupied, $O(\sigma) = \{i : \sigma_i = 1\}$, and unoccupied, $U(\sigma) = \{i : \sigma_i = 0\}$, subsets. The weight of each state, $w(\sigma)$, is given by the following rules.

1. $w(\sigma) = 1$ if all sites are unoccupied, that is, $O(\sigma) = \emptyset$.
2. $w(\sigma) = \prod_{j \in O(\sigma)} K_j v_{a(j)}$ if the occupied sites do not overlap with each other.
3. $w(\sigma) = 0$ if any two overlapping sites are occupied; the state is excluded from the calculation.

Here, $v_{a(j)}$ is the concentration of the TF $a(j)$ in the cell type under consideration and $K_j$ is the binding affinity of the $j$th site.
The fractional occupancy of site $i$ is computed by summing the weights of all the allowed states in which the site is occupied, denoted $\mathcal{S}_i$, and normalizing by the sum of weights over all states $\mathcal{S}$:

$$f_i = \frac{1}{Z} \sum_{\sigma \in \mathcal{S}_i} w(\sigma), \qquad Z = \sum_{\sigma \in \mathcal{S}} w(\sigma). \qquad \text{(S3)}$$

TF-TF and TF-promoter interactions

The TFs bound to DNA exert their influence over the total activity of the CRM by interacting with the promoter. TFs interact with the promoter indirectly by recruiting cofactors that either interact with the promoter or change chromatin conformation to increase or diminish the rate of transcription (Harmston & Lenhard, 2013). One implication of this indirect action is that each TF potentially has a different interaction efficiency, stemming from differential recruitment or activity of cofactors. Since the identity of the cofactors and their mode of action are not known for most TFs, we do not model cofactors explicitly. Instead, we introduce the efficiency factors $E_A^a$, $E_Q^a$, and $E_L^a$ for each TF $a$, which are proportionality constants between fractional occupancy and the strength of interaction. The first constant is for activation, whereas the other two correspond to the two modes of repression represented in the model and described further below.

TFs are also known to interact with each other to modulate each other's occupancy or the strength of their interaction with the promoter. We include four mechanisms for such interactions: 1) competition for binding sites, 2) quenching, 3) long-range repression, and 4) synergistic activation. The model incorporates competition between TFs for overlapping binding sites by disallowing states with simultaneous occupancy of overlapping sites in the fractional occupancy calculation. Competition has two main effects. If the competitor is a repressor, it will repress CRM activity by reducing activator occupancy. If the competitor is an activator, it will take control of CRM activity in the cell types where it is expressed strongly.
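Eqs. (S2) and (S3), including competition between overlapping sites, can be sketched by brute-force enumeration of all $2^n$ occupancy states, applying weight rules 1-3 directly. This is a hypothetical Python illustration, not the authors' implementation: the `Site` class and function names are invented here, $\lambda$ is fixed at 1 as in the text, and site coordinates are assumed to be 0-based half-open intervals.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    start: int    # 0-based first nucleotide of the site
    end: int      # exclusive end coordinate
    tf: str       # TF a(i) that binds this site
    score: float  # PWM score S (Eq. S1)


def affinity(score, score_cons, k_cons, lam=1.0):
    """Eq. (S2): K = K_cons * exp((S - S_cons) / lambda); lambda = 1 in the text."""
    return k_cons * math.exp((score - score_cons) / lam)


def overlaps(a, b):
    """Two sites overlap if they share at least one nucleotide."""
    return a.start < b.end and b.start < a.end


def fractional_occupancy(sites, K, conc):
    """Eq. (S3) by explicit enumeration of every occupancy state sigma.

    K[i] is the affinity of site i; conc[tf] is the concentration v_a of TF a.
    """
    n = len(sites)
    occupied_weight = [0.0] * n
    Z = 0.0
    for state in itertools.product((0, 1), repeat=n):
        occupied = [i for i in range(n) if state[i]]
        # Rule 3: a state with two overlapping occupied sites has weight 0.
        if any(overlaps(sites[i], sites[j])
               for i, j in itertools.combinations(occupied, 2)):
            continue
        # Rules 1 and 2: product over occupied sites (empty product = 1).
        w = 1.0
        for i in occupied:
            w *= K[i] * conc[sites[i].tf]
        Z += w
        for i in occupied:
            occupied_weight[i] += w
    return [x / Z for x in occupied_weight]


if __name__ == "__main__":
    pu1 = Site(0, 6, "PU.1", 5.0)
    gata = Site(3, 9, "GATA", 5.0)  # overlaps the PU.1 site, so they compete
    conc = {"PU.1": 1.0, "GATA": 1.0}
    print(fractional_occupancy([pu1, gata], [1.0, 1.0], conc))
```

With two overlapping sites of equal weight, the doubly occupied state is excluded, so each site is occupied in one of three allowed states ($f = 1/3$); two non-overlapping sites would each reach $f = 1/2$. Enumeration is exponential in the number of sites, so a longer CRM would call for a dynamic-programming recursion over ordered sites instead; the brute-force form is used here only because it mirrors the rules literally.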
The remaining three TF-TF interactions are discussed in the order of their implementation in the calculation of CRM activity.

Quenching. Many repressors act by reducing the activity of a specific activator. For example, PU.1 binds GATA1 protein bound to DNA and recruits a repressive complex that leads to the creation of repressive chromatin carrying the H3K9me3 mark (Stopka et al, 2005). One means by which repressors achieve specificity for activators is by acting in a position-dependent manner (Ogbourne & Antalis, 1998; Arnosti et al, 1996; Hewitt et al, 1999). In Drosophila, the range of position-dependent repression has been shown to be ~150 bp (Arnosti et al, 1996). Furthermore, repression efficiency also depends on the stoichiometry and affinities of the activator and repressor sites (Kulkarni & Arnosti, 2005). Following the convention of earlier models (Janssens et al, 2006; Kim et al, 2013), we refer to activator-specific repression as quenching here. We model quenching by reducing the fractional occupancy of activators in a multiplicative manner, so that

$$f'_i = f_i \prod_{k \in R} \left(1 - q(d(i,k))\, E_Q^{a(k)} f_k\right), \qquad \text{(S4)}$$

where $i \in A$, the set of sites bound by activators, and $R$ is the set of sites bound by repressors. $f'_i$ is the fractional occupancy of the site after repression and $d(i,k)$ is the distance between sites $i$ and $k$, measured from their outer edges. The attenuation with distance, $q(d)$, is 1 within 100 bp and decreases linearly to 0 at 150 bp. Thus,

$$q(d) = \begin{cases} 1 & |d| \le 100, \\ \dfrac{150 - |d|}{50} & 100 < |d| \le 150, \\ 0 & 150 < |d|. \end{cases} \qquad \text{(S5)}$$

Synergistic activation. The total interaction strength is determined by summing the interaction strengths of all the activating sites,

$$I = \sum_{k \in A} E_A^{a(k)} f'_k. \qquad \text{(S6)}$$

Note that the interaction strength, $I$, corresponds to the product of the adaptor factor fractional occupancy and activator efficiency in earlier implementations of the model (Kim et al, 2013). In the final step of the calculation, described below in Eqs.
(S8) and (S9), we model transcription initiation as an enzymatic process in which the reduction in the activation energy barrier, $\Delta A$, is proportional to $I$. Therefore, multiple bound activator molecules have a superlinear additive, that is, synergistic, effect on transcription.

Long-range repression. Some repressors also diminish activity without regard to the identity of the activators and act over large distances (Cai et al, 1996; Grass et al, 2003). Such repressors usually act by modifying chromatin marks and conformation (Perissi et al, 2010; Harmston & Lenhard, 2013). The activity data (Fig. 4) suggest that a few putative CRMs mediate repression at distances of over 4 kb. We model generalized long-range repression by reducing the total interaction strength in a multiplicative manner to obtain the net interaction strength,

$$I' = I \prod_{k \in R} \left(1 - E_L^{a(k)} f_k\right). \qquad \text{(S7)}$$

In the model, each repressor can potentially act at short range on specific activators, or as a general long-range repressor. We determine the values of $E_Q^a$ and $E_L^a$ by fitting to the activity data, and hence infer which mode of repressor action is consistent with the observed pattern of activity.

Transcription rate. At the final step, we compute the rate of transcription initiation from the net interaction strength. Transcription initiation is regarded as a diffusion-limited enzymatic reaction catalyzed by the activators acting via their cofactors. The energy barrier for initiation, $\Delta A$, is lowered by an amount proportional to the interaction strength, giving

$$\Delta A = \Theta - Q I', \qquad \text{(S8)}$$

where $Q$ is the proportionality constant between interaction strength and energy. $\Theta$ is the activation energy barrier when no activators are bound and sets the basal transcription rate. The rate of transcription is then given by the Arrhenius law, limited by the diffusion of polymerase to the promoter (Kim et al, 2013),

$$R = R_{max} \, \frac{\exp\left(-(\Theta - Q I')\right)}{1 + \exp\left(-(\Theta - Q I')\right)}, \qquad \text{(S9)}$$

where $R_{max}$ is the maximum rate of transcription.
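The remaining steps, quenching (Eqs. S4-S5), summation of activator contributions (Eq. S6), long-range repression (Eq. S7), and the diffusion-limited Arrhenius rate (Eqs. S8-S9), can be chained as in the Python sketch below. It assumes that the fractional occupancies from Eq. (S3) and the activator-repressor distances $d(i,k)$ have already been computed; all function and argument names are hypothetical, and the parameter values in the example are arbitrary, not fitted values from Table S5.

```python
import math


def q(d):
    """Distance attenuation of quenching, Eq. (S5); d in bp."""
    d = abs(d)
    if d <= 100:
        return 1.0
    if d <= 150:
        return (150.0 - d) / 50.0
    return 0.0


def transcription_rate(act_occ, act_eff, rep_occ, quench_eff, long_eff, dist,
                       theta, q_energy, r_max):
    """Chain Eqs. (S4)-(S9) for one CRM in one cell type.

    act_occ[i]    -- fractional occupancy f_i of activator site i (Eq. S3)
    act_eff[i]    -- activation efficiency E_A of the TF bound at site i
    rep_occ[k]    -- fractional occupancy f_k of repressor site k
    quench_eff[k] -- quenching efficiency E_Q of the repressor at site k
    long_eff[k]   -- long-range efficiency E_L of the repressor at site k
    dist[i][k]    -- distance d(i, k) in bp between activator i and repressor k
    """
    # Eq. (S4): every repressor multiplicatively lowers nearby activator occupancy.
    quenched = []
    for i, f_i in enumerate(act_occ):
        f = f_i
        for k, f_k in enumerate(rep_occ):
            f *= 1.0 - q(dist[i][k]) * quench_eff[k] * f_k
        quenched.append(f)

    # Eq. (S6): efficiency-weighted sum over activator sites.
    interaction = sum(e * f for e, f in zip(act_eff, quenched))

    # Eq. (S7): long-range repression scales I regardless of distance.
    net = interaction
    for e_l, f_k in zip(long_eff, rep_occ):
        net *= 1.0 - e_l * f_k

    # Eq. (S8): bound activators lower the energy barrier.
    delta_a = theta - q_energy * net

    # Eq. (S9), rewritten as R_max / (1 + exp(dA)) and evaluated without overflow.
    if delta_a >= 0:
        e = math.exp(-delta_a)
        return r_max * e / (1.0 + e)
    return r_max / (1.0 + math.exp(delta_a))


if __name__ == "__main__":
    params = dict(theta=5.0, q_energy=1.0, r_max=1.0)
    one = transcription_rate([1.0], [1.0], [], [], [], [[]], **params)
    two = transcription_rate([1.0, 1.0], [1.0, 1.0], [], [], [], [[], []], **params)
    print(one, two, two / one)
```

With $\Theta = 5$, $Q = 1$, and $E_A = 1$, adding a second fully occupied activator raises the rate by more than twofold, the synergy described above; a fully occupied quencher within 100 bp with $E_Q = 1$ returns the rate to its basal value.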
These calculations are repeated to predict the activity of each CRM in each cell type being modeled. The free parameters characterize properties of the TFs and do not change with the CRM. The nonlinear optimization is performed in an internally consistent manner, so that the same TF parameters are used to predict the activity of all the CRMs together (see Methods). This has two implications. First, the number of free parameters depends only on the number of TFs in the model. Second, any differences predicted in the activity of CRMs arise solely from differences in DNA sequence.

Text S2: The optimization problem and significance of fits

This problem differs from typical machine learning problems in two important respects, which determined how we validated the model. First, we are performing a regression in which the number of parameters p varies between 32 and 47 (Supplementary Text S1) and the number of data points is N = 114. In contrast to typical machine learning problems (Hastie et al, 2009), where N ≈ p or N < p, here N > p. Second, and more importantly, our dataset is not based on repeatedly sampling from the joint distribution of TF concentrations and reporter activities. This would require the joint measurement of reporter activities and TF concentrations in single cells or in a large number of cell types. Instead, our dataset is heterogeneous, consisting of 46 CRMs, where the measurements are activities averaged over hundreds of thousands of cells but in few cell types. The heterogeneity implies that it is not possible to partition the data in a meaningful way for cross-validation tests (Hastie et al, 2009). For example, we cannot reasonably expect a model trained solely on Cebpa CRMs to predict the activity of Egr1 CRMs. Reflecting the nature of the data, our model is not a generalized statistical one, but is instead rooted in the biology of gene regulation (Kim et al, 2013; Segal et al, 2008).
With these considerations, instead of performing a computational cross-validation, we checked the validity of our results by an in-depth comparison with the literature and with in vivo TF binding data (see Results). We also determined the significance of the fit for the lowest-scoring models. Our null hypothesis was that the activity measurements are distributed randomly with respect to the identity of the CRMs and cell types. The nonlinearity of the model precludes an explicit formulation of the likelihood function and hence a calculation of a p-value. Nevertheless, following Papatsenko & Levine (2011), we simulated the scores under this null hypothesis by scrambling the correspondence of the CRMs and cell types to the activity data while preserving its dynamic range. The eight lowest-scoring models were chosen from the first round of reverse engineering. Holding the order of the CRMs and cell types constant, the order of the activity measurements was permuted randomly until the Pearson's correlation coefficient between real and permuted data was less than 0.2. Ten permuted datasets were generated in this manner. The parameters of the eight models were inferred (10 replicates) with the permuted datasets and with the real data using simulated annealing. Figure S3 shows that the lowest scores achieved with permuted data are 5-fold higher than the median score achieved with real data, demonstrating the significance of the fits.

Supplementary Figures

[Figure S1 plot: panels A-D; x-axis, TF; y-axis, mean expression (A, B) or standard deviation of expression (C, D).]

Figure S1: Mean and standard deviation of the gene expression of 62 candidate TFs in PUER cells. A-B. Mean expression. C-D. Standard deviation. Microarray gene expression measurements in uninduced, IL3+OHT, and GCSF+OHT conditions are as reported by Laslo et al (2006). The set of immune-specific TFs was identified based on the presence of at least one binding site in the tested CRMs.
They were further classified as having been previously implicated in myeloid differentiation or not, based on a literature search. A,C. TFs previously implicated in myeloid differentiation. The TFs are plotted from left to right in the order of the first column of Table S1. B,D. TFs not yet implicated in myeloid differentiation ("non-myeloid"). The TFs are plotted from left to right in the order of the second column of Table S1. Although many non-myeloid TFs are expressed in PUER cells (B), most of them have low standard deviation and are thus expressed uniformly in the three conditions (D).

[Figure S2 plot: panels A-B; bar charts of expression of each tested TF in the uninduced, IL3+OHT, and GCSF+OHT conditions.]

Figure S2: Gene expression in PUER cells of the TFs tested in this study. A. Microarray gene expression measurements in uninduced, IL3+OHT, and GCSF+OHT conditions reported by Laslo et al (2006). B. Measurements in uninduced and IL3+OHT conditions reported by Weigelt et al (2009). Expression in the uninduced condition (progenitor), after 24 h of IL-3 and OHT treatment (early macrophage), and after 24 h of G-CSF and OHT treatment (early granulocyte) is shown in blue, red, and green bars, respectively. For TFs with multiple probes, the expression values of the brightest probe were used (Table S3). The TFs plotted left of the vertical line were part of the first round of reverse engineering. The TFs to the right are non-myeloid TFs that replaced C/EBPβ, Egr2, Irf4, and Fos in the second round.

[Figure S3 plot: boxplots of replicate scores (log scale) for each of the eight models, fit to real and to permuted data.]

Figure S3: Scores of model fits with permuted data. Holding the order of the CRMs and cell types constant, the order of the activity measurements was permuted randomly until the Pearson's correlation coefficient between real and permuted data was less than 0.2.
Ten permuted datasets were generated in this manner, and the eight lowest-scoring models were fit to each dataset in 10 replicates. The replicate scores are plotted as boxplots; the bullseye is the median, vertical lines correspond to the 2nd and 3rd quartiles, and circles are outliers lying more than 1.5 times the interquartile range beyond the quartiles. Scores of the fits with real data are in blue and scores produced with permuted data are in other colors. The x-axis is the model number and the y-axis is the score.

[Figure S4 plot: panels A-B; dendrograms of the 20 lowest-scoring models, leaves labeled by model number; y-axis, dissimilarity score (0-1).]

Figure S4: Hierarchical clustering of models to identify representative ones for further analysis. The 20 lowest-scoring models were chosen for clustering. The dissimilarity score measures the dissimilarity of regulatory-role assignment between pairs of models. Each model's regulatory roles were represented as a binary vector, with 1 for activation and −1 for repression. The dissimilarity score was computed as the Euclidean distance between the role vectors, weighted by $|f_i^{act} - 0.5|$, where $f_i^{act}$ is the fraction of models, among the 20, that assigned an activating role to TF $i$. A. Myeloid-only models. At a dissimilarity-score cutoff of 0.4, the largest cluster, located to the left, has eight models: 12058, 16154, 12089, 12122, 16181, 11034, 15130, and 11098. B. Models including non-myeloid TFs. At a dissimilarity-score cutoff of 0.4, the largest cluster has eight models: 81761, 81762, 81757, 81758, 81697, 81698, 81693, and 81694.
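The permutation null of Text S2 can be sketched in Python as follows: the CRM and cell-type order is held fixed while the activity values are shuffled until their Pearson correlation with the real data falls below 0.2. Because the shuffle only reorders the measured values, the dynamic range is preserved. The function names, the seed, and the `max_tries` cap are hypothetical, and refitting the models to each null dataset (by simulated annealing in the study) is not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scrambled_dataset(activity, rng, r_cut=0.2, max_tries=10_000):
    """Shuffle activity values (CRM/cell-type order fixed) until r < r_cut."""
    values = list(activity)
    for _ in range(max_tries):
        rng.shuffle(values)
        if pearson(activity, values) < r_cut:
            return list(values)
    raise RuntimeError("no permutation with r below the cutoff was found")


def null_datasets(activity, n=10, seed=0):
    """Generate n permuted datasets, as in Text S2 (n = 10 there)."""
    rng = random.Random(seed)
    return [scrambled_dataset(activity, rng) for _ in range(n)]


if __name__ == "__main__":
    real = [float(i) for i in range(1, 115)]  # stand-in for N = 114 measurements
    for perm in null_datasets(real, n=3, seed=1):
        print(round(pearson(real, perm), 3))
```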
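The Figure S4 dissimilarity can be sketched as below. The legend does not say whether the weight multiplies each coordinate difference or its square, nor which linkage method was used; this hypothetical Python sketch assumes the weight multiplies the difference inside a Euclidean norm and uses single linkage, for which cutting the dendrogram at height h yields the connected components of the graph that joins models with dissimilarity ≤ h.

```python
import math
from itertools import combinations


def activation_fractions(role_vectors):
    """f_i^act: fraction of models assigning an activating role (+1) to TF i."""
    n = len(role_vectors)
    return [sum(1 for v in role_vectors if v[i] == 1) / n
            for i in range(len(role_vectors[0]))]


def dissimilarity(u, v, weights):
    """Assumed form: Euclidean norm of the weighted role differences."""
    return math.sqrt(sum((w * (a - b)) ** 2 for a, b, w in zip(u, v, weights)))


def single_linkage_clusters(role_vectors, cutoff):
    """Clusters obtained by cutting a single-linkage tree at `cutoff`.

    Role vectors hold +1 (activation) or -1 (repression) per TF.
    Returns lists of model indices, largest cluster first.
    """
    weights = [abs(f - 0.5) for f in activation_fractions(role_vectors)]
    parent = list(range(len(role_vectors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(role_vectors)), 2):
        if dissimilarity(role_vectors[i], role_vectors[j], weights) <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(role_vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    models = [[1, 1, -1], [1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    print(single_linkage_clusters(models, cutoff=0.4))
```

In this toy example the second TF is split evenly across models, so its weight is zero and it does not separate models 0 and 2; only TFs with a consistent majority role contribute to the dissimilarity.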
[Figure S5 plot: A. Maximum activation of C/EBPδ, Ets1, PU.1, Egr1, Gfi1, Myc, C/EBPβ, and Fos. B. Maximum repression of Myb, Ikaros, Jun, Fli1, C/EBPα, Irf4, and Egr2.]

Figure S5: Maximum activity of each TF in model 12058. A. Activators. B. Repressors. The cumulative activity of each TF was calculated in each CRM. For activators, this was accomplished by summing the occupancy of each site weighted by the activation efficiency, $\sum_k E_A^{a(k)} f'_k$, where $k$ is an index over all the activator's sites in a given CRM. For repressors, the factor by which they reduced the interaction strength was computed as $\prod_k (1 - E_L^{a(k)} f_k)$, where $k$ is an index over all the repressor's sites in the CRM. See Figure 2 and Supplementary Text S1 for more details. The maximum activity over all the CRMs is plotted here.

[Figure S6 plot: panels A-F; per-site activation (A-C) and repression (D-F) in the uninduced, IL3+OHT, and GCSF+OHT conditions; x-axis lists the modeled binding sites with their positions.]

Figure S6: Regulatory logic of Cebpa enhancers. A,D. Cebpa(7). B,E. Cebpa(18). C,F. Cebpa(14). A-C. Activation. D-F. Repression. A-C. The activity of each activator site is plotted. The activation is the contribution of the TFBS to the interaction strength $I$ (Fig. 2D), $E_A^{a(k)} f'_k$.
Here, $k$ is the index of the binding site, $f'_k$ is its occupancy, and $E_A^{a(k)}$ is the efficiency of activation of the cognate TF, $a(k)$. D-F. The repressive activity of each repressor site is plotted. The repressive activity is the fraction by which the repressor reduces the interaction strength (Fig. 2E), $E_L^{a(k)} f_k$. Here, $k$ is the index of the binding site, $f_k$ is its occupancy, and $E_L^{a(k)}$ is the efficiency of long-range repression for the cognate repressor $a(k)$. The gray box is intervening vector sequence (Fig. 3D). The x-axis shows each modeled binding site and, in parentheses, the position of its 5’ end in the reporter construct relative to the 3’ end of the proximal promoter.

[Figure S7 plot: panels A-F; layout as in Figure S6.]

Figure S7: Regulatory logic of Cebpa silencers. A,D. Cebpa(8). B,E. Cebpa(23). C,F. Cebpa(24). A-C. Activation. D-F. Repression. See the legend of Figure S6 for details of the calculation, axes, and color key.
[Figure S8 plot: panels A-F; layout as in Figure S6.]

Figure S8: Regulatory logic of miscellaneous Cebpa CRMs. A,D. Cebpa(10). B,E. Cebpa(13). C,F. Cebpa(19). A-C. Activation. D-F. Repression. See the legend of Figure S6 for details of the calculation, axes, and color key.

[Figure S9 plot: panels A-F; layout as in Figure S6.]

Figure S9: Regulatory logic of Egr1 CRMs. A,D. Egr1 proximal promoter Egr1(0).
B,E. Egr1 enhancer Egr1(2). C,F. Egr1 silencer Egr1(5). A-C. Activation. D-F. Repression. See legend of Figure S6 for the details of the calculation, axes, and legend. (-4 yb yb 94 7) (-4 70 Fl 6 i1 (-4 ) 67 El 8) f1 (- 4 Ju 678 ) n( -4 64 M 1) yb (-4 60 Fl 0) i1 (-4 El 334 f1 ) (- 4 33 Ju 4) n( -4 EB 12 F1 1) (-4 01 1) M Repression 1 M E2 A( EB 431 F1 1) (-4 G AT 176 ) A( -4 07 Fl 0) i1 (-3 El 659 f1 ) (-3 65 9) 0.5 67 8) .1 (-4 6 Et s1 66) (- 4 Eg 334 ) r1 (-3 9 G fi1 07) (-3 64 5) Et s1 (- 4 PU (-4 4 /E BP 17) δ PU (-40 3 .1 (-3 2) 8 Et s1 44) (- 3 65 9) C Eg r1 Activation 0.5 A Uninduced IL3+OHT 0 B GCSF+OHT C D 0 Figure S10: Regulatory logic of Egr2 CRMs. A,C. Egr2(7). B,D. Egr2(10). A-B. Activation. C-D. Repression. See legend of Figure S6 for the details of the calculation, axes, and legend. 19 A 78 kb 35,870 kb Chr 7 35,880 kb 35,890 kb 35,900 kb 35,910 kb Refseq genes Cebpa CRMs C/EBPα 35,920 kb 35,930 kb 35,940 kb Cebpa 24 23 22 20 21 19 2 0 5 6 87 9 10 11 12 13 14 151617 18 Macrophages Egr1 Dendritic cells PU.1 PUER cells PU.1 Neutrophils (FDCP mix) GATA2 Megakaryocyte pro. (G1ME) GATA2 Multipotential pro. (FDCP mix) B Chr 18 42 kb 35,010 kb 35,020 kb Refseq genes 35,030 kb 35,040 kb 35,050 kb Egr1 Egr1 CRMs Ets1 6 5 4 3 2 0 7 9 12 14 Megakaryocyte pro. (G1ME) Egr1 Dendritic cells EBF1 pre-B cells (38B9) EBF1 RAG1-/- pro-B cells GATA2 Megakaryocyte pro. (G1ME) GATA1 GATA1 transduced (G1ME) Figure S11: Compilation of ChIP-seq and ChIP-chip datasets from NCBI Gene Expression Omnibus. Where available, BED format files were downloaded and plotted in Integrated Genomics Viewer (Thorvaldsdóttir et al, 2013). The first track shows annotated genes in the genomic region. The second track shows the CRMs analyzed in this study. The other tracks show TF binding peaks from ChIP-seq or ChIP-chip datasets. The TF and the cell type the ChIP was performed in are listed on the left of each track. 
Empirical evidence for binding is matched with CRMs predicted to be bound by the TF in the red boxes. A. Cebpa locus. Tracks 3-8: GSM537984 (Heinz et al, 2010), GSM881139 (Garber et al, 2012), GSM538003 (Heinz et al, 2010), GSM1218228 (May et al, 2013), GSM777091 (Doré et al, 2012; Chlon et al, 2012), and GSM1218221 (May et al, 2013). B. Egr1 locus. Tracks 3-8: GSM777093 (Doré et al, 2012; Chlon et al, 2012), GSM881139 (Garber et al, 2012), GSM499030 (Treiber et al, 2010), GSM546524 (Lin et al, 2010), GSM777091 (Doré et al, 2012; Chlon et al, 2012), and GSM777092 (Doré et al, 2012; Chlon et al, 2012). 20 Model 81762 output A 2 B 2 r = 0.91 C 2 r = 0.93 D 2 r = 0.91 r = 0.87 2 10 0 10 0 10 2 10 w/ GATA1_01 PWM 0 10 2 10 w/ GATA2_02 PWM 0 10 2 10 w/ GATA_Q6 PWM 0 10 2 10 w/ CEBP_Q2 PWM Figure S12: The dependence of model output on PWM choice. Scatter plots of the output of model 81762 (Fig. 5C,E) against models in which the GATA3_02 PWM has been replaced with the GATA1_01, GATA2_02, or the pan-family GATA_Q6 PWMs (panels A–C) and the C/EBP↵ /C/EBP PWMs (Table S2) have been replaced with the pan-family CEBP_Q2 PWM (panel D). The modified models were fit to the data without changing other TFs/PWMs. The regulatory roles inferred with the alternative PWMs were identical to model 81762. The score of model 81762 was 35072. A. GATA1_01 PWM, model score: 46083. B. GATA2_02 PWM, model score: 46826. C. GATA_Q6 pan-family PWM, model score: 45876. D. C/EBP↵ /C/EBP PWMs replaced with CEBP_Q2 PWM, model score: 74878. 21 6 Score 10 5 10 4 10 15 11 9 7 Number of TFs 5 Figure S13: The dependence of the quality of fit on the number of TFs in the model. Starting with the 15-TF optimization runs with non-myeloid factors, TFs were removed in order of increasing regulatory constraint, that is, from right to left in Figure 5C. The scores of optimization runs carried out with different number of TFs are shown. 
The parameters of each combination of TF regulatory roles were inferred in 5 replicates. The lowest score of each combination is plotted. The 20 lowest scoring regulatory combinations of the 15-TF optimization run are plotted in red. Upon removing the 4 least constrained TFs, the lowest score achieved increases, but is close to the range of the 20 lowest scoring 15-TF models. The lowest scores of 7- or 5-TF runs are as high as those achieved with randomized data (Fig. S3). 22 Supplementary Tables Myeloid-implicated TFs Non myeloid-implicated TFs EGR1 MYB CEBPB GFI1 EGR2 CEBPD JUN CEBPA FOS ETS1 MYC SFPI1 IKZF1 IRF4 FLI1 CEBPG YY1 SP1 IRF1 RARA GABPA STAT5A EGR3 ELK1 IRF2 RUNX1 RUNX3 MZF1 LEF1 FOXO4 GFI1B MAF POU5F1 SMAD HMGA1 TCF12 RXRA ETS2 E2F1 NFATC2 NFATC1 ELF1 POU2F1 KLF1 TCF3 ELF5 PATZ1 POU3F1 POU6F1 SOX17 POU2F2 GATA3 GATA2 EBF1 PAX5 GATA1 SOX2 Table S1: List of 62 candidate TFs. 23 TF PWM name Source Accession Threshold Consensus score Egr1 Myb C/EBP Gfi1 Egr2 C/EBP Jun C/EBP↵ Fos Ets1 Myc PU.1 Ikaros Irf4 Fli1 GATA Elf1 E2A EBF1 KROX_Q6 Myb_JASPAR CEBP_Q2_01 GFI1_01 KROX_Q6 CEBPD_Q6 AP1_01 Cebpa_JASPAR AP1_01 ETS1_01 MYC_02 PU1_01 IKZF1_03 Irf4_2_JASPAR FLI1_01 GATA3_02 ELF1_01 Tcf3_1_JASPAR COE1_Q6 TRANSFAC JASPAR TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC JASPAR TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC TRANSFAC JASPAR TRANSFAC M00982 MA0057.1 M00912 M00250 M00982 M00621 M00517 MA0102.1 M00517 M01986 M01154 M01203 M00088 PB0138.1 M02038 M00350 M01975 PB0082.1 M01871 10 6 6 6 10 6 7 6 7 7 7 8 7 7 7 6 7 6 8 16.92 12.011 8.711 15.687 16.92 10.779 11.466 9.706 11.466 10.517 14.689 14.409 13.451 13.468 10.692 9.549 10.829 12.875 12.725 Table Kthresh /Kcons 9.88 ⇥ 10 2.45 ⇥ 10 6.65 ⇥ 10 6.21 ⇥ 10 9.88 ⇥ 10 8.40 ⇥ 10 1.15 ⇥ 10 2.46 ⇥ 10 1.15 ⇥ 10 2.97 ⇥ 10 4.58 ⇥ 10 1.65 ⇥ 10 1.58 ⇥ 10 1.55 ⇥ 10 2.49 ⇥ 10 2.88 ⇥ 10 2.17 ⇥ 10 1.03 ⇥ 10 8.87 ⇥ 10 4 3 2 5 4 3 2 2 2 2 4 3 3 3 2 2 2 3 3 S2: Position Weight Matrices used to detect 
binding sites for each transcription factor. PWMs were obtained from the TRANSFAC (Matys et al, 2006, http://www.biobase-international.com/product/transcriptionfactor-binding-sites) and JASPAR (Mathelier et al, 2014, http://jaspar.genereg.net) databases. The sixth column shows the maximum possible score, which is achieved by the consensus sequence. The ratio of the affinity of a binding site having the threshold score to that of the consensus site (seventh column) was computed as KKthresh = eSthresh Scons (see Fig. 2 and Supplementary Text ). cons 24 Name Probe ID Egr1 Myb C/EBP Gfi1 Egr2 C/EBP Jun C/EBP↵ Fos Ets1 Myc PU.1 Ikaros Irf4 Fli1 GATA Elf1 E2A EBF1 1417065_at 1421317_x_at 1418901_at 1417679_at 1427683_at 1423233_at 1417409_at 1418982_at 1423100_at 1452163_at 1424942_a_at 1418747_at 1436312_at 1421173_at 1433512_at 1448886_at 1417540_at 1436207_at 1457441_at Table S3: Microarray probes used to determine gene expression for each TF. The probes are on the Affymetrix Mouse Genome 430 2.0 Array (Laslo et al, 2006). For TFs with multiple probes, those having the highest average expression across conditions were chosen. 
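The affinity ratio defined in the Table S2 caption can be checked directly against the tabulated values. The minimal sketch below assumes only the stated relation $K_\mathrm{thresh}/K_\mathrm{cons} = e^{S_\mathrm{thresh} - S_\mathrm{cons}}$; the function name `affinity_ratio` and the row layout are illustrative, and the rows are transcribed from Table S2.

```python
import math


def affinity_ratio(s_thresh: float, s_cons: float) -> float:
    """K_thresh / K_cons = exp(S_thresh - S_cons), as in the Table S2 caption."""
    return math.exp(s_thresh - s_cons)


# (TF, threshold score, consensus score, tabulated K_thresh/K_cons) from Table S2
TABLE_S2_ROWS = [
    ("Egr1", 10, 16.92, 9.88e-4),
    ("Myb", 6, 12.011, 2.45e-3),
    ("C/EBPbeta", 6, 8.711, 6.65e-2),
    ("Gfi1", 6, 15.687, 6.21e-5),
    ("PU.1", 8, 14.409, 1.65e-3),
    ("EBF1", 8, 12.725, 8.87e-3),
]

for tf, s_thresh, s_cons, tabulated in TABLE_S2_ROWS:
    computed = affinity_ratio(s_thresh, s_cons)
    # The table reports three significant figures, so compare at that precision
    print(f"{tf:<10} computed {computed:.2e}  tabulated {tabulated:.2e}")
```

For Egr1, for example, a site at the threshold score of 10 has roughly one-thousandth of the affinity of the consensus site, because the threshold lies about 6.9 score units below the consensus score.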
Model    Replicate score                                Median absolute
number   1        2        3        4        5          deviation
Models with myeloid-specific TFs only
12058    84588    84780    84578    84486    84753      102
16154    85816    85011    108352   85830    84713      805
12089    92488    84440    92472    95896    91417      1055
12122    84459    84612    84660    84867    84619      41
16181    84712    84858    84758    94119    84887      100
11034    84848    782231   84603    84486    84351      245
15130    85965    85307    85734    98235    84588      427
11098    84525    84575    84398    84375    84984      127
16197    89653    84115    84235    94237    89867      4584
16261    84992    94240    84869    84821    89622      171
11032    84956    85180    84884    84961    85142      77
11096    87461    85011    102838   92432    85103      2450
12056    91119    88557    85237    84942    85057      295
16183    84908    86156    93829    91515    91455      2374
16069    83995    90393    84223    84253    89386      258
16133    84845    84403    88747    84814    85163      318
16053    91251    84757    84755    84788    84795      31
16117    85004    98869    85000    84926    84957      43
10842    85073    85038    84558    87251    84930      108
11866    85002    84847    84646    84708    84744      98
Models including non-myeloid TFs
81761    36997    38373    37158    37050    80891      161
81762    35114    35072    64063    35132    35144      18
81757    40174    40129    40140    40277    40234      45
81758    40640    40634    40637    40677    40629      3
81697    48554    54404    48559    48431    54211      128
81698    47962    535358   47960    47952    47962      2
81693    48483    48476    48356    49466    61827      127
81694    53782    48199    48218    48232    48201      17
80670    47774    47874    47853    47894    47822      31
80734    41259    43291    74172    41277    41223      54
80674    47399    47620    47642    47511    47584      58
80737    49543    49558    49408    49407    202858     135
80738    39419    39401    39274    42075    47061      145
81725    54962    54692    48944    55390    55200      270
81729    55230    48777    56348    54953    55548      318
81789    52803    55391    65949    52880    48257      2511
81793    48685    48526    48691    45343    54666      159
80754    51514    51238    51612    51623    51421      98
81778    52119    79873    51957    470678   52220      263
81522    70736    53933    75414    54151    53837      314
Table S4: Scores of the lowest-scoring models, by replicate. The median absolute deviation is given by $\mathrm{median}_i\left(\left|X_i - \mathrm{median}_j(X_j)\right|\right)$.
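The median absolute deviation in Table S4 can be reproduced with Python's standard `statistics` module. This is a minimal sketch of the unscaled definition given in the caption (no consistency factor is applied); the function name and the `replicates` dictionary are illustrative, with scores transcribed from Table S4.

```python
import statistics


def median_absolute_deviation(scores):
    """Unscaled MAD: median_i(|X_i - median_j(X_j)|)."""
    center = statistics.median(scores)
    return statistics.median(abs(x - center) for x in scores)


# Replicate scores transcribed from Table S4
replicates = {
    "12058": [84588, 84780, 84578, 84486, 84753],
    "81762": [35114, 35072, 64063, 35132, 35144],
    "81758": [40640, 40634, 40637, 40677, 40629],
}

for model, scores in replicates.items():
    # Lowest replicate score, then the spread across the five replicates
    print(model, min(scores), median_absolute_deviation(scores))
# 12058 84486 102
# 81762 35072 18
# 81758 40629 3
```

For model 81762 the lowest replicate score, 35072, is the model score quoted in the Figure S12 legend. The outlying third replicate (64063) leaves the MAD at 18, whereas a standard deviation of the same five scores would be dominated by it; this is what makes a median-based spread convenient for replicate optimization scores.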
Transcription rate parameters   Model 12058   Model 81762
Rmax (fixed)                    1500          1500
Θ                               5.77001       6.37259
Q                               5.02737       4.96505

Parameter constant across TFs   Model 12058   Model 81762
(fixed)                         1             1

Activators   Model 12058                 Model 81762
             E_A         K_cons          E_A         K_cons
C/EBPδ       3.39369     0.000557698     4.99494     0.000453547
Egr1         0.598979    0.003015459     0.402529    0.004086745
Ets1         4.99863     0.000191494     1.60987     0.012646051
Gfi1         0.212086    0.39995002      0.292196    0.39970697
Myc          4.9571      0.000247503     4.98925     0.064071995
PU.1         4.98391     0.001604342     1.69259     0.00891181
C/EBPβ       0.622789    0.001232013     NA          NA
Fos          0.178452    0.078787372     NA          NA

Repressors   Model 12058                               Model 81762
             E_Q          E_L          K_cons          E_Q          E_L          K_cons
Fli1         0.000672614  0.999977     0.000449492     0.00511047   0.290777     0.010875566
Ikaros       0.00123356   0.998445     0.004740461     0.933711     0.000115092  0.24209953
Jun          0.0225533    0.215683     0.1833314       0.99947      0.586878     0.002607652
Myb          0.957583     0.248584     0.073540587     0.317748     0.223248     0.07360456
C/EBPα       0.00916877   0.0866283    0.16941582      0.999984     0.00303068   0.16931703
Egr2         0.480565     0.00353553   2.96E-06        NA           NA           NA
Irf4         0.932284     0.960832     0.001217458     NA           NA           NA
GATA         NA           NA           NA              0.000138341  0.545508     0.08858245
EBF1         NA           NA           NA              0.999986     0.870706     0.053408421
E2A          NA           NA           NA              0.999929     0.101139     0.39385154
Elf1         NA           NA           NA              0.983475     0.313624     0.044488665
Table S5: Parameter values for the two analyzed models. NA indicates that a TF was not included in a model. Model 12058 included myeloid-implicated TFs selected on the basis of differential expression in PUER cells. Model 81762 also included non-myeloid TFs that were uniformly expressed in PUER cells. Parameters whose values were fixed rather than determined by fitting are indicated.
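Table S5 can also be read as a set of activator and repressor assignments for each model: activators carry E_A parameters and repressors carry E_Q and E_L parameters. The sketch below encodes those assignments, as transcribed from the table, in plain Python sets (the dictionary layout, the ASCII TF names such as "C/EBPdelta", and the helper functions are illustrative) and checks that both models contain 15 TFs and that the TFs shared by the two models keep the same role. The last line counts the possible activator/repressor assignments for 15 TFs, assuming each TF may take either role; this equals the 32,768 alternative models mentioned in the abstract.

```python
# Regulatory roles transcribed from Table S5 (hypothetical data layout)
MODEL_12058 = {
    "activators": {"C/EBPdelta", "Egr1", "Ets1", "Gfi1", "Myc", "PU.1",
                   "C/EBPbeta", "Fos"},
    "repressors": {"Fli1", "Ikaros", "Jun", "Myb", "C/EBPalpha", "Egr2", "Irf4"},
}
MODEL_81762 = {
    "activators": {"C/EBPdelta", "Egr1", "Ets1", "Gfi1", "Myc", "PU.1"},
    "repressors": {"Fli1", "Ikaros", "Jun", "Myb", "C/EBPalpha",
                   "GATA", "EBF1", "E2A", "Elf1"},
}


def tfs(model: dict) -> set:
    """All TFs included in a model, regardless of role."""
    return model["activators"] | model["repressors"]


def role(model: dict, tf: str) -> str:
    """Return 'activator' or 'repressor' for a TF included in the model."""
    return "activator" if tf in model["activators"] else "repressor"


shared = tfs(MODEL_12058) & tfs(MODEL_81762)
role_changes = sorted(
    tf for tf in shared if role(MODEL_12058, tf) != role(MODEL_81762, tf)
)

print(len(tfs(MODEL_12058)), len(tfs(MODEL_81762)))  # 15 15
print(len(shared), role_changes)                       # 11 []
print(sorted(tfs(MODEL_81762) - tfs(MODEL_12058)))     # ['E2A', 'EBF1', 'Elf1', 'GATA']
print(2 ** len(tfs(MODEL_81762)))                      # 32768
```

Sets make the comparison order-independent, which suits tables whose row order differs between models.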
Text S2: CRM and vector sequences in FASTA format
>pGL3 intervening vector sequence TCGACCGATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGC ACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGGCAGCGCTCTTCCGCTTCCTCGCTCACTG ACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAG AATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGT TGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAA ACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGC CGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATC TCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACA GGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAA GAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCA AACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAG AAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGA GATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATG AGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCAT CCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAA TGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCA GAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGC CAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTT CATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCT TCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATT CTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGT GTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAG TGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGT AACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAA
GGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATT ATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAG GGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGG TGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTC TCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTAC GGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTC GCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCT CGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAA AATTTAACGCGAATTTTAACAAAATATTAACGCTTACAATTTGCCATTCGCCATTCAGGCTGCGCAACTGTTGGG AAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGCCCAAGCTACCATGATAAGTAAGTAATATTAAGGTAC GGGAGGTACTTGGAGCGGCCGCAATAAAATATCTTTATTTTCATTACATCTGTGTGTTGGTTTTTTGTGTGAATC GATAGTACTAACATACGCTCTCCATCAAAACAAAACGAAACAAAACAAACTAGCAAAATAGGCTGTCCCCAGTGC AAGTGCAGGTGCCAGAACATTTCTCTATCGATAGGTACCGAGCTCTTACGCGT >Cebpa CRM 0 CACGCAAACTCCTACCCACAGCCGCGAGCCTGTAGGCGGCGCGGCGCGGAGGGCTCCCAAGTGGGTGCTCGAAAG GCTTCGTAGCTAGGAATTGGACACCCGAGCTACCGAGATTAGTGCCCCCATGAATGACAGTAGGGAAAGAAAACT GTGTCTTCAGGCCCCTGGCTATGGGCCCCGCCTGGGGATCACAGTCCCCGATTCAAGTTCACTCCTCTCCAACAC CCTGCCCTCTCGCGGCCCCTGTGCGCTCTCCTTAGGGTCCTTTTCCGCGAGGCTCAGAGGACCCACCGGCTGGTC CCGGGCGTGGGGTGGTGGTGTCCCGAACACTTGACTAGAGTGCTCCACGCTGGGTAGCAACGTCTGCCTGGTAAG CCTAGCAATCCTATCGCTCTGGCCTGGAGACGCAATGAAAAAGAAAGTTTTCCAGCCTAGGCGAGTGGACGAGCC AGGTCCACCAGGCTCCGGGGTTAGCGGCCGCCTCCGCTCCCCGCCGGGTCCTAGCGCCCTACTACTCTGAAGAGC CCGTGGGACCCTGTAGTTCTAGAGAAGCTGGGCGAAAGAGAGGTGCTCTGCCTGGAAGCCGTGGGGTCGCGTGGA GTTCAGAGAAAAAGACGCACAATCTCTGCGCTCCCGGCCTCGCCACTCGGCGGTGCGCGCTAGGTTGCTGGTCCA AAGCAGTCTCCAACCTCCCGCGCCGCGGCTCTCCGCCACAGCCTTTTAGAAATCCGGGTGGGAGACAGGCCTAGT CCCAGCTTTTAACACAAGTCTGCACTACGGTAGCTCAAAACCAACATTCTCTCTCCAAACGCTCCCCAACCTCCA CCTCCCCTCGCTCGGCCTCTAGATGCTCCCGGGCTCCCTAGTGTTGGCTGGAAGTGGGTGACTTAGAGGCTTAAA GGAGGGGCGCCTAACCACGGACCACGTGTGTGCGGGGGCGACAGCGCCGCCGGGGTGGGGCTGAGCGCTGCAAGC
CGGGTTCGCCTTGCAGCGCAGGAGTCAGTGGGCGTTGCGCCACGATCTCTCTCCACTAGCACTATGCTCCCCACT CACCGCCTTGGAAAGTCACAGGAGAAGGCGGGCTCTAAGACCCAGCAGGCACCATCCTACTGGCGCCTTCGATCC GAGACCCGTTTGGACACCAGGGGGCGATGCCCGACCCTCTATAAAAGCGGTCCCCGCGCGGGCCTGGCCATTCGC GACCCGAAGCTG >Cebpa CRM 2 TTAAGGATTTTTCCCCAGGGGACACTCCACAGGTAGAGTCGCGTGCATCTGGCCCAGAGAGCGATCCTCTGCTCA CACCAGGTTGTTTATTTAGTGTATCTGCTTTTTATTCACTCTGTTCAAAAGCATCAAGCTAAAAATATCTTTTTT ATGCGCTCTGCACCGAAAATGAAGCAGTTGAGCCATTAATCACTGAGAATATGTTGAATGAGAATTTGATTAAAG TACAGGATGCGTCCTTCAGACGAAGTCGGTTCTTTACTGTAACCGCAGTGGAAGCACAGCAGACCTCCCTAACAG TTTTTAGCCGCTCCTCTTGGAGCCTCCTGGGCGGATCGCAGGGCCTTTCCTGGCCTAAAAAGGCCATTGCACTGG GGGTCTTGACCTGCGACCCTAGAGCGC >Cebpa CRM 5 AGTGTGACCAGGCTTTGTGGTTAAGATGTCCCACCAACCCCTTCCCCCACTGCCCTCTCTGTCTTATGCCTTGCC ACCTCCTCACCCATGGCTGCCCTTGGCAGAGGAAAGGCCAAAGCCCAGTGAGGAGGCGGTGGCTTGCCCTGTCAG CCTGGGAACTAGGGAGGCAGAGCCCCATCGCCTGCCTAGAGGCTGGGGCAGGCCGTGGCTTGCAGGCCCCAGCAA ACCCTCTCTGAGTTCAGTGTGGATTCCAGTTTTTCTTGTTTACAGGGAATGATCCTTGAATTCCCCCTTTGCTCC CAGAACAGGGCTTAGGCTGAACGGGGGAGGGGGCCCTCCTGCTCTCCTTTCCTAACCACCTGATGTGGGGGTGGT CATCTTTCTAAGGGCTCATGGAAGGGCACCCTTCCTGCACACCCAACCCAAGTTT >Cebpa CRM 6 GAGAGAAACAAGGGCGGGAGTGAGAGAGGGAGGTGACTTCTTTGCCACCAGATAGCTTAACGCTGTCACGGATGT CACCCTTATCTCTCGGATCTCCTGGGAATCTGTGAACACGGGGTTTGGCCCTGACACCTGCCATGAGCCACTGGG AAGGGTGAGGAGTGTCCACCCCTCCCAGGCTTCCCGCCCCCTCCTTCCCAGCCCAGGTGGAACCTGCCTGGCCTC CACAAGCTGCTGATTTCACAAGAGGGGAGGGGACCCCTCATCTTGAAGGTCAAGGGGAGCAGAAATTCTGATCTT CCTTAACCAAGTGGTAACATCTTCCAGAGCACCCATGGGTCTTCTGCAGACATGAAGTATGATGTTAATGTTCTA GCTGTTGCTTAGCTGTGCTAATGGATAGAGGAAACCCTTTTCTTTTCTTTCATTTCTTTTTTTTTTTCCTTTCTT TTCTCCCTTCGCATGCCCTCCCCTCCTTCCCCTAACCTGCCATCTCTTCCTTGCTTCTCTAACACCACCTCTTCT TTCTCAACCCCACCCATCCCCACTCTCATCCTTCCCCAGTGGACAGAGCCTTTAACGAAAGCTGTAAGCTAGGAA AGAAACCAATTCACTATCAAAGCATCCTAGCCCCATCCCAGAACAATTAAACAGCCAGATGGAATCTCTGGGAAA ACGCATTAATAACGCGAACAATCTGTGCAGGCCAGGCAGGAGCCCAGGAGCCTGGAGCAGGGGTGGGGTTTGGGA GTGGAAGTGGGCAGAGGAAGGCTGGGGATGCTG >Cebpa CRM 7 
CCCACTTCCACCCCCTAAGAATACTGGATCCCTCTTGCCGATAAGGAACTGTGGTCAACTTCTAGTGGCTTTCCT GTGCACGTGTTGGGCAACCAAGCCTCAGCTGGACTTAGTTGCCAAGCCCAGACAACAGGTGGCAAGGGGGTGTCA GGGACTGGGTACCAGCTCTTTGGGGAGCTGCCATGACCTTCACCATCAGGTTAGGACCCGTCAGAAGTGGCCTCC TTGAGTGATTTACAATTTGCAAACATGTTTTATTTGATTCCCGAGTTCTGCCGGGGCAATTACAGTGACTAAGCA TGACAAATCACTCTCAGCACAGTGTCTGCTCGCTGTTAAGAAATGTGTTTGCCTCACTGTTTTGCCTGGTGCGGC AACATTTTAAAAATAGACTCGCTCACTGTACGCGAAGGCAATTTGTTCCAAATTTTCCCACTAATTTGATTTTAA TCTGATATTTAAAATTCGTGTGACCACATTCCCACTGATTTATAGGGAATAAGCCCTACCTGGCGGCACTGTAAT TGGCTTTGGCCCAGGAGTCCACAGGACAGAGCATTTATCCCAGAACAATTTGAAGGCACTCATGTCTTAATGTTT TAAATATAGCCTAATTTAGCCTCACAAGTTCTGATTCCCTGGGGGCAGGACAATGAGTGTTAAGGTTGCTCTGCT CAGG >Cebpa CRM 8 TTTGTCTTTGCTGTCTCTTGGTCTCCATCCTCCACTGCTACCCTCCCTCAAGGTCACCGCTAGTTCTTAGAACTC CCCTGGGTCCTTGCTTGGCGTCTTTGCCGACCTTGGCTCCAGCAGACATTTAGTATCTAGCCCAGGGCAGGTCAT TGCACATGTGTCTGAATGCTAGTATCTCTTTCCAGCTTATCTCTCTCCTTTTTGGAACACCCTTGCCCTCTTCTT TCCAGCCAAGGGGCTCCAGGGTTGGCTTAAGGTCCACCGAGTATGGGCTGAGGGTGTCATTGTCAGGAGCAGCCC AAAGGATGGATCACAGACTTCCACCCGATGGCCTTCAATAGATGAGTTCTTGCTTCTAGAAAATGACTTTTAAAG AACAAGACTCTAGCGAGGTGGTCTGCCTAGTCCTTGAGAGCATTCAGTGGCAATGTGGGAATAGTTTATCATCAG ACTCAGAGTGGCCAGGCCTCTGCAGAAGGCTATGCTTTATAGGGACACTGGGTGGGGGAGAGCTGGAACTCTAAA GAAGAGAGTGAGGAGGAAGCCCAAGCTATCTGATATATGCCATGAACTGCTGAGGTGAAGGCCCACTCACTGTAC GGCCCGGACGCACTAGAAGCAACAGTTCGGAGTTAAAAGATAATTTCAGACCTCCATGCCTCTTTTCTTATCTCC ACTCTAGGATGCCTGGAAAGGTCTTCCTAGAGGAATGAGAGCCCAAGAGCAGGGCTAGCAGTGGTCAAATGTTAA CAGTCTAGTTTCAAAACACATTGCTGGGTGACCCAGAGAGCCACCACAGCCTGTCCCAAGCCCTCCTTGACCTGA AGCTATTTCTACTCTTGAAATATAAGAGGAACCCCTGTCCTGATCCAAAGCTACAGAAGTCATAGAGCCCCACAC CACTCTATTGCCCTTGAAGCCCACCAAGAGCACCCTCCCAGAGCTGCCCACCCCCCACTTCCACCCCCTAAGAAT ACTGGATCCCTCTTGCCG >Cebpa CRM 9 CCCTGTGGAAGAGTTGGTCAGGCTGGTCCTCAGACAACCAGGGAAGCTCTTGGGGTCCTGGAGAATAGGCACATA GCAGATAAAAGGAGTTCTTAACCAAACTTCCCTAGAACGGAGGGAGCTAACAAGAAAGAACTTTGGAAATCTACC CTCCTCTTTCCCTGTCACTGCCAGGAATGTCACCATGAGAGCAGTTTCAGTTAATGAGCAAACTCCTCAGACAAG
GCAGGAAGGCAGCTCTTGGGCCTCACTGTCAAGCACAGGAAGCGACTGGATTCCACTTGCCCGGTGTAGGGATGA CAGCAGGTATTGAGTGGGACTGCAGGCCTGACATCCTTAGCTCCTCCACACCCAGGACAGCCCGGCTGTCAGCAC AGGGCAGCAGAAAGGACAGGGGACAAGCTCCAGGTGTGGGCGAGTCCCAGAGCAGCCCGGGGAGAGTGTCACTGT GTGGGTGCTGGCGTGGGGGCAGGAGCACACCACATGCAGTTGCACGGGGGACTGAATCTGAGGCTTTGGGGGAAG CATCCAAGCCCCAGGGTGTGTGTGAGGGGGTCCCCCACATGAGCAATTCCTCAGGCCCAGCAATGGCTGATTCCC TCTGCTCGAGGAGAAATCTCATGTGAGGAAGGTGGAGTCAGGTGAGTCACAGGCCAGGCCCCTGTGCCGTGGGGA GGTGGGGACGATGGCTCCGAGCCAGAAATGTGTCAGAGGCCAGATGAGCATCTTGAGGGACGCGAAGTTTATGTA ATTATGTGGGGCAGCACACTGCTCGGGTGTGTGAGAGCCATGCTAGGAAGGAGAATCGTCTGCTGCATGTCCTCT GCCTCCCTGGGCTATACTGACCTAACAGCCCAGCATCCCCACAGCTGGCCCGGCCAGCTGCCCACAGTCACAGAG TCCGAGTGTACCCAGATCATGTCTTTGTAGGACCAATGGGCTAG >Cebpa CRM 10 GGTGACATCTCTGTCTTCGGTCACCACAGGCTGCCAGCTGACCATCTCTTTCTGCCTCCCTACGAACCAGTGTGT AGTGGCAACAGGGCATCCATGAGACTCCATGAGATGTTGCTCACAGTGCCCAGGCCATGGTGCTGGTTACACAAG GCCTGAACCTGCCTTCCCACCCACTGGGAGCGTGAGACCGTGTATACCCTGGCCCCCACACACACCTGCTGTGTG ACCATACGCTGGTTTGGCAGCCTCTCTGTTCGAGCATGTCTAAAAGGTGTGCCGGGATTCCCAGAGGCAAGATGA TGAGTGGTGTGTTCGAAAACACAACCACAAGAGGCTGATCCCAGGACAAGTCACAGTGAGAACAGGGAACGGGCC ACCTGGCCACTAAGGTTGGGGGGGGGGTGCGGTGAGTCAGAAACAAACAAGCCACCTGCCAGCTAGATTGTTCAC CTCTCCAAGAGAGTGAAGGGTTGGGAGTGAGAGGGCTGCAGGACCCGGGTTGAGTAATTCCCTGCCTGCTTCTGG AGCCCAGATCCTGGCTTCTTGCCAGCCAGGTCCTAGGTGATGCTATTGTGCTAGATTAATAAATGGCATTACTCG TTTCAGC >Cebpa CRM 11 GGGAGGAATAGAGAATTGAGATCAGTAATCTGTCTGGGAATCCTGGCGGGGGGCCAGTCCCCTGGGGCAGCCAAA CGGGGTGTTGTCAGCCCACATGAAGGCCCTGCTGCTGGCCACATTCTGTAAACAAACATCCACATGTGTGCGCAT AGCAACCTAGTGCCAAAGGACAAACACAACTGTGACAAGTTTATGGAGCTGTAGTATCGGGGAGGACAGAGATAA TTAGGCAAATCATAAATGCATGTAGTGATGCCTTTTGTCTGTGCGAACCCTGCCTGTCTGTTGGGAATTCCAGGC TGGAGATGGGTGGGGAAGGTGGTTTGTAATCTTTATTCAGACTCGGCCCCCAGAGTCCTCTCAGATGAAACAACT GGGGGACTGAGGAGAAAGTAATTCTGATTTCTGGGGTGACTTCGTAGGCACAGTCAGTCGTCTTTCCTTCCCTGC TCCCTGTACCTCCCCAGATAGAGTCAGTTTACCCTCAGAAACACCTCCCTAGAAGCTGACAGGTGCCTGGGGGAA TTCAATGCATCCTGCCTGCCAGCGCCAAAGAAGGGATCTCAGACACAGAGCTGAGAGCAAGGGGGCGGGGGCGGG
GTGGGGCACAAGTCACAACTTTGAGATGGAGGTGGTTTGAGTGGGAATGTGTGGTCAGGGAAGGGCTCTTGGGCA GGTAGTGTTCAATGTGGGGCCACAATAACGAGCTAGCTCCAGCTACTCAAAGGACTTCTCTGGGTTGAAGGGCGA AACTGCAGATGGCTCAGAGAG >Cebpa CRM 12 TTGGCTTGGAAGTATAAGAAGTGAGTTTGGGGAGGCAGAGAGGAGAGGAACAAGGAGGTGAGAGCAAGCCATGGA ACTCATCTATTCTGGGACAAGAGTCTCTGGCTCAGGAGGGAGGCAACACTGAGTTCCAGCCCAGCAGAGTGTGCA GCCCCACCAGGGATCTAGAGACCGGCTTCCTCTTGGTGTCTTCATGCCACAAGATGTAAAATAATCCCCTAGGCT CAGAAAAACACATTTTATGGCTGTCTGACTCATTTCCCACAAAGCTAGGTTCACCCCCCCAAAAAAACGTTTACA GCTTTTGTTTTGAAAAATAAAAAAAGTATGTTGAAGATTTTAGGATGACTTTTAGACATAAATGGTTCCCTTCAC CCTCGTCCCAGGCCCCCCTCACCTAATTTCTCAAATGCTGGCTGCCAGCGGCTGTGGAGGATCCAGTCCAAGTGG GTGTGGGATGACTTGGGGAGCACTGGGCCTCATAGCCCCAGGGCAGGGCAAGTTGATGCCTGCTGGGCATTGAGA CAGGCCAGAGCTCCAGGGTGGGGGTGGGGGTGTCCCTGAGCCTGCCGGTTTGGGGTCTTAGGCAGTGATGTCACT AGCTTCTAGCTTGGCACTCTCCTTGGGGGACATGAGGGACAAAGGCCACATCAAACCGGTTGCTTTTTTG >Cebpa CRM 13 CTCAGAGTTAAGGGCTGGATCTAGGACAGACATGGCTTCAGCAGGAGGGGATGTGGACTGGGGGGTCTTCCAGCT GGGCTAACCCACAGGCATCAGGGGACAGGCAAGAGGCGACCAGGACTTCCCTAACCACCAGAAAGTGATTCAGCT AAATTGAGGAAGACTTCCGTTTCAGCAGAATGCATTTTCCACTATGGTAAGAGCTTTCTCACCCACGGGCTCTAG AAGATCAGAGTCCTCCAAAGAGGGTTTGGAGAGTTTCGGTTTGAATAGTCAATGCTGATTCTTAATCATTATTTT GAAGAGGCAGCTCTTGGTTTTCCAGCAGGTGGCTGGGTCTCAGGGCCGTCAAGCCAGGTGTGTATGTGAATACAT GGGTGCATACCGTGTGAGCATTTGTCACCTTGGAGGCCAGGAAGGAAGGCAGCCATGATCATTCCTTTGGTCAGT GGTAAGCTCTGTCCAGGTCTGCAGAATGACCTATATATGCCACCCTCACAACCCTGCCATGAGAGCTAATAGCTA CACTGAAGTTTCACTGTGTGCCAGGCAGCAGGCTGGGCACATGTGCCCAGCCTTGGGAACCTACCTATCAGCTAA GGCTCTAAGAGAGGAAAATACCAGTCAACATTCAGCCAGACACACACCCAGGTAGCCTAACAGCATCGCAGCCAC CCTACCTTCTGGTTCTCCTGTTCC >Cebpa CRM 14 GTAAGGAATCACAGGGGTCAGTCAGGGCTTCCCTAACTGGAAAACCCAAGTTCAGAGGTACCACAGACTATGACT GGGGTTAGAGTTGGAACATGGGGTAGGCCGACCTGGGGTACAAGGGAGGAGGACCCCCAGTGTTCATACCATAGG GGCACCTGCTTCTGCTAGACAGTGGGGGGGGGGGAGCCTGAGCCATGAGAGAGCACAAGGGAGGTCAAGGGGCAG AAAGGCCAGAGGGTGTCAGCAGGCTCCAGCAGGCTGTGGACACTTGGCCAGAAAGGCCTGTTTACTGAGAGGCCT GGGAGGTCAAGGCCCAGGCCTGGAGTTAATCATTAATGGCTCACCCTGCTCGTGGCTGCCTAGTGTGGTCTGGAC 
CAGGCCCCAGTACACAGGTACTGCCCCACTGCCACGCTGTGTGTATGGGGGTGGGGGCGGGCAGGGGCAGTACTG GTGTGCTTTGGAGACACTGACTTTCTGAAACACCCTA >Cebpa CRM 15 AGCTGGCAGTGCTATTGGAGTTGGAGGAACTGGCTGCTGGTGGGAGGAACTTAATGGGGGCTGCTGCCAGAACTC GGGCCGCCTGGTCCTGCCCCTGGCCCCGGGCAGTTGATGGAGTCCAGAGTGGAGGCAGGCTGCCACAGGTGAGGT GGGGGAAAACTGGCAACAAAGGCCTATTCTTCAGGGTTTAAAGTGTCTCCGAGCCTGTCCAACTTTCCTTCATTA GATGCTCCAAGATGTTTCCCCTTCCAGGCTGCTTGAGATACGCGGCTGATAAGGGTACGATTTGAAGAGACTTAA TTATGGCCCCGAGACCATGATGTAAACATCTGTGGAGTGTTTGCTTTCCCCTCCCTGTTCCGTGTGAATAGGGAA CGCAGCCCAGGTGCCAGGGCAGAGGACCCGGTGACCCCAGCCTTTCCTAGGAAAGGAGGAAACCAGGGGCAGTGT GTTTCTGTGGTGCTGGCCAGGGGTCTGGGTCCTGCCATGACAATCTCCTGACCATTTAGTCACAAGAAGCTTATC TGTGATGGCCGTAATCATCTCTCACGAAGGCACCAGACAAGGGCTCCGTAAATGCTGCCTAGCGACATGGAGGGG ATAGGGAGTTTCCCAGGCTGGCCTGACCTCCACAGGGGCCTCAGCCGTCTAGGAGGAAGCATCTGTTCCCATAGC TTTTCTGGCCAGCTGGATGACAGGGAAGAGGACAAGGCTTTGGCCAGTCAATAGCCCTACCGTGTTTACC >Cebpa CRM 16 AAAATCAGTTTATCCCTATGCTGCCCCAGGCCTGGTACCCAGCATGACACAGCTAAGTTTCATTTGGGAAGAAGC TCGGTTCAGAGTTAGGAATGGGAGGAAATGGTGGCCCCATCCCAGTCCGACACAGATGCATGCTATACCTAGTTT CACACACATACATCATGCATATATACTGCAAGTTTGTCAGGTCAGTCCTTCCACTGCCTCTAGGAGAATGCTCAC AGGTGTGTGTGCGCACACACACACACACACACACACACACACACACACACACTAGCCAGTGAAGTGCTGCTTAGG AGTAGTCTGCATCTCCGGGGATGGGGAGGAAAAGTGAGAGAGGAGGTGACCATCTAAGGTCACTTGGGCAGGACA CCCTATGACATTAGGCAAGTGTTTTCACTGAAGGCCCCATTGCAGGAGCTTCAGGGACCTCATTAATAAGGACCG TGGCTCAGCTCTCAGTGATGACCCAACAGAGGTGCAGATCCTGCTTCCTTCTGTAGCTGTCCCTTGGTTGGGTCA CTTGTCCCAGCATACACACACACACACACACACACACACACACACACACACACACCATGCCTCCACTTCAGACCA GGAAGCCATTTTGATCCGTGATGAGAAGAGGAAGTCCTGTGGCCCCAAAACAGAAACAGGAACCTGGGAGGGGGA TATGGATAGGGAACGGGAAGGGGGCACACACCCAGCAGATGCAGCCAGAACACCCCAGCCTCCCCTGGCCCCACA TTAGCAGGCAGGAGTTAGCATCTACACCAAATCCCGATGCTATCTATGGTCCTCTGTGCATCTGGACGTGCCATG TGAGGGTAGGGCTAAGGGTCCCTCTAAAGCTGGATGCCTGGCTATTGGGCTGTGTTCAATTACCCAGGCTGATCT GTCCCACGGAGAGATGAATGGACCTCTGCTGTCCACCCACATGTCTGTGTATCATCTGACCGTCAGGCTGCGGGC TCCTTGCTGTTGCCCCGTGGGATCCCACACAGAGTTGTCCTCAGCCCA >Cebpa CRM 17
ATGTGGCAGGAACCTAGTCTTTACCCTTACTGGGCAACTATACTCAGGATTGTTGTGGTGCAGTCGCTGCAGGCC AGAACTGGGAATCTGCCCCATGGGCACCACTGTCTCACACTATCCTGGGCCAGGCCCATCCTCGGCCAGGCCCTT TTTTCTGGCTGGGCTGTGTGGAGGCAGAAGGACAGCTCTCATTCGGGGAAGCCATTGGCTGCTCCTCCTCCTTGG GTTTGGAGGTTCCTGCGGGAAGACTCTTGAGGCGGGTGGGCGGGGGCCGGAGCCTGGAGGACAGAGACATATGTC ACCAATTTCAGTTCTGCCCCGGCTGGCCTGAAGCAGTTTGGGACTTCCCAGGGTGGAAGGACAGGCTGGGGGGCC AGGCCACCTGGAGGCAGAACACACTCTTGAGTCCCCCTCTGTACCCTAATTGTCAACAGGGGAGGTGTGAAAAGT TCTGGGCTGCCTCCATAGGGTACCTGCTTGCCTCTTCTCTTCTTGAGCCCTTAGAGCAACCTTCCAGGACTTCCT GTTAGTCAACCCAAGATTATGGCAATAAGATACTTTTTTGGCCCTCTGAGCTCCCAGAATGAAGCCTTTAAGAGT TGCGACGCTCCAGGGGCACCTGGGAAAGATTTTCAGTATGAGAGTGAGGTGGTCTGGATAGATGACCCAGGAAGA CCACATCCCCTCCCCAGACTGAAGGATAAGGCAGAAGCCATAGCACATCCACCCTAAAGGGCTTCACCATCCCAA CCCCCCACAACTGTTTCCTAGAGGAGGAACAGAGGGAATGGAGTGGGCAGGTGAGAAAAGCCCCCACCTCTGCCT AGGGCGGTCACCCCCAAAGTCAGCTGGGAAACGGCTTGGCCTGAGGTAGCTCATTTGACCGGACCCTGTGCTCGG GAGATCAGTTGTGAAACTTGCTTCAAACCCCCAAACCACAAAAACGGCCGATTTTCCCAGCTCTGTTCACATTTC TCAGATCAGCCAAGGAGTCTTTATTATTTCCATCCGAAAAAATAAGCCAGGCCAGCCACCTGCCCTGGGGGGAAC AAGAAAATTCCAAAGGTGCAGAGCCATGGTTCTGTGGAAGCCCCTTCCACACTGGCAAGCTGTTCCCACCAGCCT CTTTACCGACGTGGGGCCCGGAGAAAGGGCAAGGCCTGCTGGGCACTTCTGGAGAAATCTGGAGATCCGAACA >Cebpa CRM 18 ATGCCACCCCTCTGATTTTGCCATGCCCACCCCCTGATTTGCCATTCATGCCCGCCCCATGCCCTGACTACCGGC GACCACAGGAAGTGCTGCCCTAGCTCAGTACTTCCCGTTTCTGAAATCTGCCCCCAGCAGCCTGGTGGCCAGGGA AGGCAGACTTCCCGCTGCCTCCACCCTGGGCTCTTCCCACCGGTCACAAGTGGTTTGTTCCTGGGTAGAGGTGAC CTTCTTGCCACAACCACACATCAGTTATTTATCAGAACAGGAAAGATGGCACCAGAGATATGTCCTCACCCCGCC CAGGAGGGCCTGACCTGCTCCAACGACCACACTCCTGTTCCCCACATCACACGGGGCCTGCTCACCACATCACAT AGAGGGGTGTGGCTGGCACGTGCCAGCGGGGCTGCATTGTGGGGGTGGGCAGAGCAGGAGGTCTGGTGGGCAGGT ATGGTGGGACCTGGGACCAATGGGCTTGAGCGAGGCTCTCTGTGTGGCTGGGGGCTGTTGAGACATCTGGTAACC TTTGGGTCCCATGGAAAGCACCTTCCACGTAGACCCTCTCCTGACAGGGAATCCCTTGGGGTTGGGACATGTGCT CAGAATGGATATGTGGCTCTATTCCAGGGGACTCAGTGAGG >Cebpa CRM 19 CCTGTCTCGAAAAACCAAAAAATAAGACCAAATGCTTACACTGGGGCTGCCACCTTTCATCTCTTGATTTTGTAC
CAAGTTTGGGCTCTTAGCTCGTTCAGATTTGTTAAATGAACCAAGGGCACAGATTGTTGGAATCTGCAGGAAGTC AGGATAAGGTCCGGATAGTGAGGTCAGAGGAGCAGGACCATGCTCAGCAGGCTGGCCTGGCCAGTCTGAAGTCTG CAGCAGTTCCTTCCTATCACATGAACCTGCAGGGAGGCAGACCTGTGCCCCCAGGAGCTAAGGACCAGCCATGTC CAGGTTCTATGCCTGGCCCAGCTGGGACCCTCTTTGTACCAGTGCCCTTACAGACTAGACACACAGCAGAGCCAG GGTAGCTTTTGGGCCCAGCTCTGGGGACCTTCACTGGCAAAGACATTTTGTCCTGCCCGGATTTTTCCCAAGAAA ATAGGCTCTTTCTCCTCCTTCCAAGGCACTTTTTTGTTCAGCCAAGCCCCTGGGGACCACATCTGCCTGAGGGGG GGCCTCTTTCCAGTCAGCCCTAGGCACCGCTGCATTCAGTCTGGTATTTCGGGAACCGCTTCAAGGGGTTCTGGA TTGGGTCTGCTTTGGGGAACCGGGAGTGGCGTCCTCTCCGGCTGAGTTCTAGGAGGACCTCTGGGTAGCTGCTTT CTGCCACATAGTTCATTCTGTGTTCATTTAACAAGTATTTGAGGCTCTCTTCTCAGCCAAGGCCTGTCCCAGTCC AGGGGAGGAGGGGTATGCAGAACAGGACAGACAAGGT >Cebpa CRM 20 TGTCAGGCAGCTCAAGAATGCAGTCGGCATCCACTCAGGGCAAGGTCCCTGTCACTAAGCCACTAGGCAGATTCC AGTGACCTATTAACCTTCTAGTAATTCCTCCCAGAAGACCAAGAAAGCTGGAACAACCTCATCCTATGAGGATCA CCCTCAAGAGTCTCTCACTCCAAGAAACACTGAATTCCAGAAGCACACTGTTTTATGGGGAATCTGACACACTCT GGTTTTCCTTAACTGTTACTGGTACACAATGTGTGTGTGTGCATATGTGTTAATCTGGGTGAGATTAAAAACTAT TTAAAAAAAAAAGAAAGAAAAAGAAAAAGAAAGAAAAGAAAAGATTGCATTTGGAACGAATGCAAGGTGGATTTT CTAATCAGATATATTTTTACCTGGATGCTAAAACATGCTTGTTGAAATTCAGTTTTTCCAATGTCACCCCTACCT GTTGACAGTGTCAGATGGGAGTTCTAACTTGGTGTTTGCTTGGTGCCCATGATGGTGCACAGGGATTTAGGCCTT GGAAGTTATCTGGCCAACAGGCAACAGGAAGTGTCACGAGAGTGGGTTATGTGCTTTTCAGTTGACCTTTGGGTT TTTGACTGAAAGCGCTCAAATGGAGACCAGCGGGAGAAGTGGGCTAGGCCAGTGTGCCCTGCCCATGCTGCATAA ACTGCTAATCAGTACAGATGGGAAA >Cebpa CRM 21 GGTGCCATATTTGGTGGAATCACAACCACACAGAGACATTGCAAAGGCCTTTGCAGGAGAAGGTCACATGCCAGT TAGCAGCCACTAACCCAAAGGAGCCAGCAGCAAGCTACAGCTACCAAGATGAGGGAAGTATGGTAGGACACACCC GAAAAAAAAAAAAAAAAAGCCAACTGTGCTTTGCTGCTGCTTTCTACCACAGGAATGTAGAGAAAAGGCTGGCAG AGCACAAAGAGGAACCAGCGCTATTAAAAGAACGGAGGAATTAGCTCGGAGTCAAGCTCCTAGGGATGGCTAGGG TCATCTGTTTCCCTCTCTATACTGATGCATCATGGGCCAAGCCCACAAGGGAATGTGATGGTGTCTTGAACAAAA TAGACTAGGCACCCAGCAAAATCTCCTTGGAAGAGAAGTCTGGGTTCAATACTAGACCCAACCATCAATACCAGA CAATATACGATGAGGACCAGCTAGACCTGAAGCCCAGGAAGTGCTATTTTTGGACCCCTCTGCACTGAAAGTAGT 
TTGCCTTATTGGAGGAAAGGTCATTCTCTCTACAGATCCATTGAGTAGCAGCAAGAAACTCAGACTCTGTCTTCG TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGGGTGTTAGATGTAACGG >Cebpa CRM 22 GCTCCAGCCTGTATTATTTCATTACTTTTTAAAGGGCTCATCGATCTAAAACTTGTCCAAACAAGAGCTCTCAGG AAGCACGCGGGAATTGAGCGTGTTCTCTTGCTCACGGACGGGTTTAGTAGCTGCTACGCCTACTTCCAAAAACTC CCGTGAGACCTGCCAGGATGCCCTCCTCCTTAGGTGGAGGATGGAAGGACCGGGCCTTGAGCCAAGGAAGCGCCA GCTGGGTAGGAAATCCCCACCTGTGATTACTGGGGTATTCATTAATCCAGGATTGTTGTGAGAAAAAAAGAAGAA ACTGAAGCTGGTTTTGACAGAGAATTCTGCTAGAATCACATGAGGTCCGCAGATGCCAAATTCTGTTAAGGGTCT AGGGTAAATCTGCAGCTCGCTCAGTTCCACCCACCCCCACGCCCACTCCCACCCACCCCCGCGTCGGCTGTAGCC AAAGCCGCAGCTTGGAGAGCTCAGCTCTGCCCCCTGGGATAAGGAAGAAGAGGGTAAGGGTTGTTTTTGATTACA GGTTTCTATGGTAACCAGGACTCGCAACAGAGGGGGTTTGAGCATAGCAACCGCAGACTCAGCTAGCTTTGGGAC ACTCGAAGCCAGAGTTTCGCCTAAATAATCCGTGGGGATAAGGAAGCACCGTCCTTAAGGGGGATCCATGGTGAA AATTAGTTTTGGATTCCTTGGGGATAAGAACTACATCTGGGTAGCTGCGTTGGGTCTATTCTTGTCTTGTTTTTT TTCCCAGGTTGCTTTTCTTCGACTTCTTCTGAAACCTTTCTTCTCAATTCC >Cebpa CRM 23 CTCTTCCCCTAGGCATCTACAATGGACCCCAATCAGTTGTCACATCCTATCAGGAACCCAGGGTTAACACGTACC TCCCAGTGAGGCATTGTGCTTCCTACAGTGATCTCCAAGGAAGCCTGCACCGTGCTAGGAACACAACAATTACTA AAAACATATGTGTTGACTGGGAATCTGAAAGCATGTACGCAGCGTCATGCTGTGCTGTGTTCTGGAAAGTGAGAA GAGCGGCTAGGGAGGAACAGACCAAGGCTCCCACTCAGCTGGGGTCTGTGTGACCTTGGGCTGGTTATCTCTTCT CTCCGGGTCTCTGTTTCTCCCTCTGCAGACTTGCATGCTAATATATGTCTCTCTCAGGGAAAGGCAGAGGCAGAC AAAATGTCTGGCAGGAGTTTGGCATCCAGTTCCCTACTTGGTCCTTCTAATCGCCTGTAAGCCGTTTTCACATGA CGCCTCGGCCGCTTCACGCCATGATTAAAGACAAATGTATGTGTTGTCTTTAAATTGTTCTTCAGGGACTCTTCA GTTAACCCCCAAATCACTTTATATTGTAGAATAAAATTTTTCTAGGATTTTATATCTTGCTCTTAGGAGATTGAT ATTTACTGGTAATTTTTTTTTTAAGATTTACTTTTGGCATTTTAAAGTGTGTGCATGTGTATGTGTTTGTGTGTG TGTGTGCATGTGCGTGTATATGTGAGTACATGGGGGTATGTGTCTGTGT >Cebpa CRM 24 CAGCAGCTTTCTATCAACTTGTGACTGTTTCATGCATGACCGTAGGTTTTACTCACACCTAAACAAACAGAACTT GATTTCCCTCTCTGGTTACTATAATTCATCCCGAACATATATCTGTCATGGCCCCGAAGAGCGTGTGCAATTATT CTAGTGTGGGAGAATCTCAGATAATTAGGGCCAACCTCAATGTTTTGAAGGTGTGACTGTTGTGGGAGGAGGCAG
ATAATTAGTTAGATTTGAACAATCTGTGTGCCCGGTGTCTGTCCTTGAGAGTAATTGAAGCCCTTCCAAAACAAA AAACAAAAAACAAAAAAAAAAAAAAGAAAAGAAAAGAAAAAAGAAAAAAAAATGACTTGTGTCCTTTAGAACATT CATCAAAAAAACTTTAAAATGTTTCAATAATTCTTAAAATGTCCATAATTGCCCTAGTTCCTTGTGCTTAAAGGG ATTTATTAAGAGTTATTTTCCCATGACAACTTAAAATGAAAAACTCCAAGCTGAAAAAAAATGAACTGCAGTAAT TGGCTTTGATGAATTCAAGGCACTATCCCGCACTGTAACTGTGGGTACGGAGAACTTAAAGGCACAGATGTGCCT TGAGAATAAAGTGGGGGATGAACCCCGGGTACTGTGGAAGGGAACCAGGGTTACGGTCAGTATTTACCCCATAAC TGCGTCAGCCTGAGCTAATTCTTTGATGATTTCAGGACCAGGCGCAGCACAACTTTACAGATAATATCTCAGTTG TTGAGCCAAACTGCAGGAAAGCCTCTCCCTGCCTCAGTCACTCAGAACCAACCCGGCAGGATGCAGGAACTGACA CAGCCCTCCTAGCTGCCTGGGCCAGAGCTCAAATCCAGTCTGGCTTACTGTCTCCCGTGGCAAAAGAGGGACTTT CAGCTCTGGTCCCGTGGCTTCCTGTGCAATTAGCAAGCTTCAAGCAGATCCCAGGAACTACAGGAAAGAAACTGC CATT >Egr1 CRM 0 GAAACGCCATATAAGGAGCAGGAAGGATCCCCCGCCGGAACAGACCTTATTTGGGCAGCGCCTTATATGGAGTGG CCCAATATGGCCCTGCCGCTTCCGGCTCTGGGAGGAGGGGCGAGCGGGGGTTGGGGCGGGGGCAAGCTGGGAACT CCAGGCGCCTGGCCCGGGAGGCCACTGCTGCTGTTCCAATACTAGGCTTTCCAGGAGCCTGAGCGCTCGCGATGC CGGAGCGGGTCGCAGGGTGGAGGTGCCCACCACTCTTGGATGGGAGGGCTTCACGTCACTCCGGGTCCTCCCGGC CGGTCCTTCCATATTAGGGCTTCCTGCTTCCCATATATGGCCATGTACGTCACGGCGGAGGCGGGCCCGTGCTGT TCCAGACCCTTGAAATAGAGGCCGATTCGGGGAGTCGCGAGAGATCCCAGCGCGCAGAACTTG >Egr1 CRM 2 TCCTTCCACACAGGCACTCTCTGCTTTCTTTTTAAATAAAAAAATAAAATTAAAATAACAGCACCTTCCTCGTAT TCAAAGTTGGAAACAAGAGCCTCCCATTCCTGGAATCCCTTCTCCCTTTGGGTTGCTTCGGAGATAGGGCTTCAC TGCTTGCGTCAGGGTCCCGGGAGACCAGCGGGATCTCTCTGCCATCACACCCCCGCCCCCTCCCCCCCCCCCCCC TGTTCCCTGCCCTTGGCCTGGCTCTGTGAAGGAAGTGTTACCCTGAATTCTGGGCGCTTTGGCAGTGGCGGTTCC CTCGGGACTGCGGGGAAGGCCCAGGCCGCCGCGCCTGCTCAGTTCTCCCTCACTGCGTCTAAGGCTCTCCCGGCC TGGCTCCGCGCCCAGCCCAGACTACGGGAGGGGGAACGTGGAGGCGACGGAAGAGCCCGTCGCGCCTGGGGCTCC CGAAATACAACCAGAGACCTACAGAGGGCAGCACCGAGCCGTAAACGGGTCCTCCGCACTGCAAGCTTGGGGTCG CCAGACTGCCCAAAGCCAAGTCCCCCTCTTTAGGACAGGGCAGGGTTCGTGCCCGACCAGTCCCTGGCCTGGATA AAAGTCAGGAAGTGTCTAACCATCACAAGAACCAACAGATCCTGGCGGGGACTTAGGACTGACCTAGAACAATCA GGGTTCCGCAATCCAGGT >Egr1 CRM 3
TGGTGGGAATCTAACCTGGGGCATCGAACATACTAGGAAAGTACTCTACCACTGAGCTACACCCCAGCCCATGAG GGGCCTTCTCAGACTACCATTCGATCTGCTTCAGGCCTCAGCACTACTCTGCGTGGCCCTTCCCACCTAGCCTAG GTCTCAGGGGTTTCAAAGAGGGGGTGATTTTGCACCTCCACACTCTGGGACAATTCTCTTTCTCTGGAACCTGTC TTTTCCCATGACTCCCCATTTGTCCCCAACTTCTTTCCCACCTCGTGTGTGTGGCCTCTTCAGCCTTCTTCTCAT CTCTCAAAGCCCTCCATGGGCGGGAGCAAAGCTGTACTTGGAAGCATCCCAGGCCAAGTTGTCTAGGTTTCCGCT GGGGCCTCTCAGCCATCGGAAAGTGTGACGGGCAACCCAGACAGAACCCAAAAGAGAGTCACTTCCTGAGCCCTA AGCTAGCCACAGAAGTGGCAAACCACGGCCTGAGCAGCACATGTGGGTGCTGACGAAACCCAGCCCTGGGAGGAG GAAGTGGCGATGGAAAGGGGACTTGCTTTCCTCTGAGAGTGGCAGCTCGCCTGGCCTTAGGTTAAGTAGAGTCAA GATGGCCCACTGCACTTGCTCTCTAGTGAAGACTCAGAAGCCTGGTTCTCCCTCTCCTCTGCCTCCTCCCATCAC CTGACCAGTAAGACCCTAGACTCTCAGGACATCCCTGAGGATTCCTTGGGGCCCAGCATCCCTGCAGGCCCATAA AGCCTGTTGTTGGTGACTTGGCTCCTTAGGAGGAACCAATTCCCCCTCCCCATGTCATCATTGTCCCCTCTGACC CTCAATCATAGTAAACAGAAGTGTGCCACCAGCTATAGAACATATCTGCTGAGTGTGAGGATGGCTCAGAGGCGC CTAGCGTTGCCTCTGAGATGTCGGGCCAGCAGCACAACCCTCTCTAAGAGCATGGCTGAGAGCATGGCTGACCTT CATGGTTCCTGGGCAGAATCTGCCTTCCAGTGGCTTATACCAAGGAGACTGGGCAAAGTCAAGAGGA >Egr1 CRM 4 TGCATCGTTCCTCATCAATGAAATGAGGCTGAGACAGAGACCTACCGCATAAGGCATTGTGAGGACAGACTCTTA GAAGCACTGAGCCCAAAGTCAGAGTAAATGTGTTCTGGGAGGGGTGATCATGGGAAAAGAGATCCCCCAGAGATT AAGGAAAGCCGGAAGAAGCAGGCACCCTGAGACTATCGCTTTTTGGCATTCTAAGGTTTGTTAGAGTGGGGTAGC CGAGGACAGTTCTGAATCAGGTAGCCCCAGAAGGTGCCTGCTCCTCCCTCCTTCCCCTCCTGGCAGGTCTGCAGA ATGCAAGCTTACCCACCAAGAAGCTGAGCAGCCTGGAGCTGGGCGGTGGGGAGGGGCCAGAAATAACCCCTGTTT CTATTCCTGCCTTTCTGGGCTGGGTGGAAACGCTGCTTTTTTGAATCTAAAGGCTGCTTCGACTTCCTGCTCATC ACTAGGGCTCTTATGTAACATTCCCCTCCCTCTCCTGCCCAGGGAGCTTCCAGAGATTTCTACAGCCGGCTTTGC TGCGCTGCACATGCAGCTGGGGGTGGGGAGAGCAGAGAAAACACCACAGGATGGGTTGTCCTCTGAGCCCAGTGT CTGAGCTTCAGATCCAGGGGAGGGCTGTGCAGGGAAGGACTGAGGGCCAAGGACTACAGCCTCTTTCGGTCTCTA CATGGCAACCTGCCGCAGGAAAGAGCCTGTCAGCCTATAGATAGGCGGCTGCCTCCTCCCTTTACAGACCTGCTT AAGGAACAGACTATAGCGCCTCCTCCAACTCAGTATGCTGCTGTTGCTTCCATTTTAAGCCTAACCTGGCTGGCT CAACTCTCTGTAGGGCTCTGAGGACTTTTATTCAAGTCACCCAGACCACAACTGCCTCCTTAGGCAAGCTGTACC 
TAGTGCCAAATTCCCCATGGTGTGTGTGTGGGGGTGGTGGTGGTGGGGTCTTGGTGGGAATCTAACCTGGGGCAT C >Egr1 CRM 5 GGCCTGACAAGGTAGAGAATCCTCTTTAACAGCCTACCTTTCTCTGACCTCTTGCCCACACTCACCCCAGCAGGT TTGTATTGGTGATGAACTACCTCCTCTTTTTGTCCCTGATAAGTGATCAATAAATAATGTTGCTTGAAATATTGA TGGAGAGTTTTACACCAGCATCCCTTAGCTGGGTCTGGCAGGCCATGACCCTCCCGGCAGCCTTTTCACACTGAC AACAAAAGGTTTCCTTTGACCAAGATCTCCACCCGATCCATTCGGAACCAAAATCATGATTAAGTTTCAACCAGT AGAGGAAGAGATGTGAGAAGAACTTAGCTCCTCCCCCTGAGAAGGACGGTTCGGCTAAAACAAGCAAAGACCCTT AACTGATTTCCCATGATGCCCTCCAGATCCTTCTCCAGTTCCATGTCCACGGTGGGCCTGCCAGGTCCACTTTCT GAGTGGTCTAAGCAGAAGGTGATCACATGCAGGGCACAGTAGAGAACACCTGGAATCCCAGCCCTGGGGAAGTAG AAGCAGAAGGGTCCACTCCAGGCTCCAGGCCAGCCTGGTCTACGTAAGTTCCAGGCCAGGCAGGGCCTCCTGGAA ACTAACAACAAAACAAAACAGGCAAACAAAACAAACATGATCACGTGATCAGACATCTCTGGCCCAGGTCACTCT CACCTGGTTTATTCTCCAGCATGAAGGGTAAGAAGAGGAAAAGAGGGGAGAATCTGAGGTTGAGGAAGCTCCCGG ACCAAGTCAGGCCCCATGCTCTGTGCCTTGTGCTCAACAGACAGAAGCCATATCACCTCAAATCACTGCCCAACA CTTCCTGCTGCCCAAGTGCCCCTCCCCCACTCTCCCACACAGCTGGGCCAGCTCCTACAGGTGCACAAGTCTTGG GCTTAAACCCGCAAGCTGTCAGGTGTCCCCTCACACCTGTCCTGCCACACACCTGCTGATGCCACCATCAGCCCG TGCCCTTAGTGCCTGTTCCTCAGTCTCCTCCCCCATCAGACTGCGAGCCCTTTCTCACATACAGATGGGAGTCAG CAAGCCAGCCACATTCGGTTCTTGCTTTAATGCAAATGCTAAATTGGGAAGAAGAGCAGTTTCACACTCCATATT TAGCCTGGGAGAGAAAAAGTGAGACAGCATGGGTGTTACCAGGGAAAGCAGGCTGGGGCTGGGCTGGGTGGTGGA AAACCTACCCAGAGCCCAGGGGAAGAGCATGGGCCCTTGTCAACTCACTCCTAAAGAGAATCACCCAAGGGCCCT GCCACTTAAAGACCC >Egr1 CRM 6 GTGAGTTCCAGGACAGCCAGAGCTACATAGGGAAACCCTATCTCGAAAAAAAATGGAAAAATAAAAGATTCCCAG AAAGTAAAATGCTGTAGTACTCTGCTAGCGTTTACCATATAGCCAAGAAAACAAGTTAGGTTTCTGTTTCATTTG CTACATCTCACACACACACACACACACACACACACTTGTGGCTGGGAGACCACAGCCAGATTCTGAATAGCCCCT CTCTACATATACATTTGGGTTTTTTTGTGTCAAAAGGAGTATGCATGTTCAGACTTCCAAGCTCCTTGTCCTGTT GCCATGGAAACATTCAGGCTTCCAGAACTCTAGGCTCAGGGTCTGCCCCCTGTGGTGCAGGGGAGGAGAGGAGAG TTGACAGGTGACAAAGAATAACTGGAAAGCCTTTTAGGAGAAGGTTGGCTGGTGGGGCGCACTGGGTCTGGGTGA CCTCATACTGTCACTCATTCCCTAGGGCCACCTGGCCTTGCTCCTTTGCCACAGATCTTCTCTGGCAGGAGGGAA
AGACCTCTGTTAGAGGTGGGTGGAGGGCACTAAATCAAGGGGTTCTCGGGGGCCCTTGGGAAGTATTATTAGCTT TAGCGATAGGGTTTAGTGCCATGTCACGTGCCCAGGTCCCCTGGGAATATTAGGCAACCCTCCAGGCCTGTGTTC CAGACTATGATGGCCTCAGGCCTGAAGCCGCTTTATCTGGCTTCTCCTCCCTTTTTTGTGGGTGTTCCAGCCTCC CAAGAACCTGCTTAAAATGGGATTTCCAGGCCCTGACGTCAGAGGAAAGCCAGCAGCTCCCTCTAGCCTGTGCCA GCTCTACCGTGATTAGCAGAGCTAAGTTCAGCCTTGCTCAGCCTACTGGAAAGCTAACAGGGACTGGAGGGAGGA ACTTGGGACTCTAAACAGTGGTCTCTGTATCTGTGGCTTTCTGGATGACAGGAACAGTCTGTTTCCAGGTCAAAA GACCCTCCTGGCTTTCCTACTAACTTAAATTTCAGCTAATGTATGATCATTTCCCTCCCAACGCCATAGTTGCTT TCTCTCGGTTCTAGGTCTCATGCCTGACTTGAGGAAGAAAAGGGCATCTCAAGGCAGTCCTGAGAGCTGGACAGC GGCTTCCGTTTTGGTTTTTACCCAAGAGGAGGTTGAAGGTGGCGGCTGTGGGAACTCTCCCTGCAAGACGTTGAA AGGCCCACTAGGTGGCGCAGCTTCCTCCCGATGTGGATTCTACCCTCTAGCAGCTCAGGGCCTGGAGACCAGAAT ACCTCCTACTCTGCTCCCCGGAATGAGAGACTAAAGGGGGTAGAAAAGA >Egr1 CRM 7 GCCAGTGTTTGTTCTGCTTCGGGCTGGTCAACTATAGCTTTGTGTTGATGAATTGGAGCCAGAGGCCACGTGGCC AGAGGTGGTGGCCAATCCAATCCCTTATCTCTACCCAATATTCGAGAAATCTGCTCCAGGCCAGATGTGCTCATA GGAGATAAGAGGTAGACAACACAGGCTTTAGGAAACAGTTACAAGGCTAAGGGTGACTCAACTTCTCCCTCCTTG CATGTCCCCAGCAACTTAAAAACAAGGGCTAGTTGTCCAGCCAAGAATCCAGGAAGCAGAGCTCATCCCTTTGCC AGTTGGAATGGCCATTCTTGGCAGCTTCCTGGGAGACGGGGCAGAATGGGTAGGAGGAGGGTGGCAAAGTATGTT CCCAGTGAGTGGGAGGTACACAGTAGGAGGTTCTCTTGACCCTGGAGCCTAAGGTCTACGCACTGCTGGAGTGAT CCTTGCGGGTATGTGAGCATCTGTCTCCTCAGTGAGAGCCTTTGACCTGGCCATGACAGAGCAAGGACAGCTCCT GTTCCAGGATTGCAGATGGTTGGAGAGATGGAGATGTTGAGCCAAAAATGACAGCACAATGGTATGTTACACAAG AAGGGCCAGCGTGTGGCTCTCTGGAAGCACACAGACAGTTGTCCCGTCCAGCTGGGGCCTTGGAACAGGTGAGAA CATGAAACAGAAGCCTAAGTAGGGGTTAGACCTCAGGGCCCGTGTGATGCTCGGACAGGGAAGATGACGGCCAAT AGGGTGGAGTGTCGCTAGAATGGAGTATTGTACAAGTGCTTCCTGCCACCCTAATGTGCCCTAAGTCTTCTGTAA ACTGATCAGATGCCCACAACCTTTGGGCAGATGGTACAGTGGATTTGGTGGGCTGAATTCCCAAGCCTTGGCTCC CCACTCACCTTTGACCCCAGACCCCAGAGTGTCCCTTCCACCCAGCATGGTCACCAGGAACTAAGTGGATGGAGG GTAAACTCACCTCCACCTACTCCTTTTCCTG >Egr1 CRM 9 TGGCCCTCTAGGAATTCTCTAGCTTCTTTTCCTTTTTGGCAGGGTGAGCTATTTCAAGACACAGTTTCTCTGGCT 
GTCCTGGAACTCAGTCTATAAGAACAGGCTGGCCTTGAACTCAAAGAAATCCACCTGCCTTCCCAGTGCTGGGAC TAAAGGTGTGCACCACTACGGCCTGGCTCTCTTGAAGTTCTCAATAAGTCTTTTTCTCTTGAATTTCAGGAATGG TCAGCCCCATGCATTTGGAAAGGTTCCACTTAGAAGAAAAGAACTTGGCCTCCCTGTCATATTCAGTTCTTCTGT TTTCCCCTGTGAGCTTCCTTCTACCCTTCCTGGCTCACAGGAGAAATGGAACCAGGTCAGAATAGCATAGGGTGG ATATTTTCTGCCAGGCCCCCATCCCCAGCTTGATTGCACTCCTAGCTTCCTGGTAACATTAGTACTGTTTATGAG GGGGGAGAGTAGACAAATGAAGACCGCAGTGTCCTAAATGAGACTGATTCTCAAAGAATCAGATGTCAAAACCAT TCGATTACAACATTTTATCCAGAAATCTTTGGATACCATGGTGCGTGTGATTTCCTCCAATTGAAACATTCCTTC CTGGAGGGAGGTAGGAGGTCCCAGACTCAGGAAAGGCAACAGGCCTCCTCAGTTGAAGAATGCTGAAGCAAGCTC CCTAGCCTCACAAGGCCCCTCCCTAAGGTTATAGAAACGTAAACAAAAGCTAGGAAGAGGAGGCCAGCTCCCATC ACAGTAGAGTCCCGAGGACAGTCATGAAGACAGATGCAGGAACACAGTTCTCATGAAGCCTGATCCATGCTGTGG TAGGCTTTGAGC >Egr1 CRM 12 GACCAGGCTGGCCTAAAACTCAGAAATCCACCTGCCTCTGCCTCCCAAGAGCTGGCATTAAAGGTGAGCACCACC ACCGCCCTGTTTCAAAAACAAGTATCAGGAGACCTCGTAGAACCAAGCAGGGAATTAGCTGAAATAGTGTATGGG AGATCGAAACGTAGGACATAGTCAATATTCATTGATTCCACAGATATTTGAAGAGAAAAGCCACTGCATTTGGGA GGTTTTGTGACTTATCAGTTTTATTCTACAGTGATGGAAGATGTTAAGTGCGGCTCAATTTGCTGGGTGCCTGAA GGTGTAGTGAGATGCTTGCCATTCACAAGTGACATCCCCAAAAGGCTTCAGAACTGGTTTCTCTGGGTGGATGAG GCCTGACACACAGGGAAGGAGATCCAATCTGCATCTCCCTTGAGTCTCTGGCTCCCTGGGAACCAGTTCTTCCTT TCTGCTACCAGAGAGGCCTGGATGACACAGCTACATGGCACTCTGTGGCCCACAGAAGGAGCTGTGTCCCTGGTG GAATCGAAAGACATCCTTGAAAGCTGCAACTTGTACAGGGAAGGTTACCATGCCAACCACAGCCTCCTTGCCCAA CAGGGATGAAAGGAATCCAGGCCCTGTGTGACAGGCAGCCATTGTGTGTTCACAGACTTGCCTGGAATCTATCAG TAGCTGTTAACTGCTGCTCAGCAGGACGGGGCAGGGAAATTCAGGTTAGATTTCGGTGGGACTCCAGGACAGGCT GGACCGACAAGGGCTGATTGGGAAATGCTTGTCTGAGAGTGATACCCATCTCTTGGTGAAATGTGTGGGTGTGGT ACTGACAGCAGGTGTGGGAGGGGAGGGTGTCCTACATACCGGGTTCTGGACAGGATGGGAACTCAACTTCCTCTG GGTTCTTTTTTGGCCCTTCCACCCTTGCCTTCTGGAGGAAGTAGGGGAGGGCCATGTGCAGGGTCAGGAGATCAC TGAAGCAATGGCCAAATTGAGAAAGGCAAGTCAGGGCCACACATACCTCAGGTAAAGGCAATAACCCTAATTGAG CCACAGGCTGGTGCCACAGGCTGGTGCCACAGGCTGGTGTCACAGGCTGGTGTCA >Egr1 CRM 14 CTAGGTCGTTGTGATTGGCATCAGGAACCTTCTGCAGAGCCATCCCGTCTTCAATCTTTCTCTTCCTCTTTTCAG
ATTTCCTTCCTTCTTTTCTTCCTTTTCTGGTTCTGATCTAGAATAAATCGACCTACAGGCTTCCAGTATCCACCA GCCAAAGACACATGTGCTGTGTGACTACAACGTTCAGAAGTTGAGTTCTGTGTGTGCCCCCGGCCATCACAGGTC ATCTTGAGTGCCTGCGAATAGGGGGCATGGTGAGGTTGTCAAGGGTCCAGAAGATCCAGGAGTTAGGTCCCACAC CCCTGTCCTGGGAACCTCTTGCCTTAGCATGGCTATGGCTCTTGTGGATGAAGTGGTTCAGGAAGATCAGGCTTC CTGTGGCTCCCCATTGACTCCCCGCTACCTTTCGCCCCCTGCTGGCCTATGCCTCCTCACGGTCGAGGGCAGTTC AGGGTGTGGAGGCGCTGACTCCTGAAGCTGGGAGATCCCGGGCTGGAAACCAGAGAGATATTTATAACTGAGCGG GGTGGAGGGAGGGAGGAGGCGTTGGAGGTGTTGAACTCCACTAGGATGCCACAGAAATCGCGGGAACCCCGCGTT TCTGCTTGGAAGGCCTTCTCTTTCTCGGTTAGGGAGCTTTGAGACCCAGAAAGTCCTTTGAATTGGCAATAGTCC TAGGTCTCATGAAAGCTCTGATGCCAGCATTTAAGGTTCCTTTTTGGGGGTTGGATATTTGATCTAGATTGATAA TTTTCTCTCCCTCTCCCCCTTCCCTAGTGCCTTTAATCCATTTAGCCGTCTCAACAGCGTTCGTTAAGAAATTTT AGTATAGGGCTGGA >Egr2 CRM 0 TTGCTTGCGGTTTTGAGCTGCCAAGAAAGTGAAGGAGGGGTTTGACTGTAGTGTCTCGGCAGCGCTCGGTTTTCT TTCCGAAGTTTAATTTTCCGGAATGGCTCCCAAACAAGGGCCGGGGAGGCGGAGCCGCCACTACCGGATCTTCTC CTTTTTTGGAAAGTCTCGGAGAACCGGAATTCCTCCCCGCCCCAAGAGACAGAGCTACCAGCGCGGCCGCCGTGG GTGAACTCACGGCGGCCGCGCTAGGGTCGGTGCTCGCGCCTTCTTCCCGCTGAACTCTGCAGTCCGGAGTCCCCG CTGCAGGCAGGGGCCGAGAGCCCCAGACCCGGGTGGTTGTCCACCGGCTGCAAATCGTTCCTGGCGAGCTCAGCG GAGCCCGCGCAGCCAAGCCCGTATGCAAATTGGCCATGTGACGGCAAAAGCTGCCAGGCCCAGCCCTGTTCCTCA GTCCATATATGGGCAGCGACGTCACGGGTATTGAAGACCTGCCCATAAATACTCCGAGCCTAACACTTTCCGTCT GAGAGAGCAGCGATTGATTAATAGCTGGGCGAGGGGACACACTGACTGTTATAATAACACTACACCAGCAACTCC TGGCTCCCCAACAGCCGGATCACAGGCAGGAGAGAGTCAGTGACGGATAGACTTTTTTTTTTCTTTAAGAAGCCA ACAACTTGGTTGCTAGTTTTATTTCTGTTAATTTTTTTCTTTTTTTTGGTGTGTGTGGATGTGTTGTGGTGGTCT TTTCTAAGTGTGGAGGGCAAAAGGAGATACCATCCTAGGCTC >Egr2 CRM 3 GCTGTCTCTTGAGTGCACACGCATGTCCGTGCGCGCGCGCACACACACACACATCCACACACAGAGCTTCCAGTG GAGAGGTGAATTTGTCATTATCTGCAAACACAGGGTGATGGAACACCTGTGTAATAGGGCACAGTCCTCTGTCAA GGCCTTACTCATCTCTAGTGTTTCTGAGCTAACAGATGTGGGGCCAAATCAACCACGTGCGGCATGTCTTCAGCG TTCCTTCATGAGATGCAATCAGAGAAAGAGATCTAAATTGCAAAAAAAAAAAAAAATATTTTCTTCCTTTCAAAG CTCCCATGGCTGTTGCCTGGGGAAAAAAAAACCTGTTTATAAAAAGCAAACTCTGGGCTGGACCTCACCAGGTCC
CTGGGGTAAACACTGCCTGTGTGCTTACAAGACCATTGACTGAAACTGTTCGGTGACTCAGGAATAAGCCTGGTG GTGAACCCGAAGAGCAGAAATTACACATTTTTGTCAGTTGCTAGGAGTGTGACTGTGTGTCTAGCCTGTTTGCAT CATCAAGAGAAGCAAGAGAGATTGGGACTGATCCCAAAGGCCCCAATTCTCCAGGGAACCCCCCCTTCCACGCTA TGAAAAGGAGTACTCAGATGTGGACCACCCCCTAATGTGAGGAGGAGGAAGAGAGACCATTTGGAAGGAGCTTTG GGATTTGACAGGAAGCAATCAGGTCCAATCCAAAGATGCGCTGCCTCTCTTCTAGCCTCAACTGGGTCTCCTTCC CTGCCCTAATCTACATTCACCTCTTGCAGCCTAGCAACCACTCAGAGAGACAGCAACTAGAGCTCTCCCACAATG CCCGAGCCATGGTCAGTAGAGTCAGACACATCAGTCTCCATCTTAAAGATGGGAAAACAGAGCCTCTGAGAAGAG AGGTGCCACATCTCAGTACACAGGCTAGG >Egr2 CRM 4 GCTCTAGGCAGAAGGAACAGCCAGTATATAGGTTCTGAGACAGGAAGGGCCCCGGGACACTTCAGGGTTGGGGCA GGATGGTGACCCACGGAAGATTAGCCCTCTGGAGCAGCCTTGTGGATTGGTGGGTTACTCCGAACACAAACGAGA TAGTGTGGGGCCAGTCGTGGCGAGAAATGCAAACCATAGATACTGTGGCCTTCTCTGAGCACACCATTGCACAGG GTAAACTGGGTGTGGAGAAAGCAAAACCATTAAGGCTGTCTGTTAAGCTTGTCCCATCTCCCTGGTCAGGCTGGT CACCCCAGTTGAGACTCAGGCACACTTAAGCCACCCCACACAGATGTCAATCTCTTAGGTGTGATTTCACCACAG ACTGATAACCGACAGGGTGGCAATCACTGAGTCAGTGCCAGCCTCGTAAAATTGGGACAAGTGAGGACTCAGAGG ATGGAACGGAAGAGGAATGGCACCCAGAATTGCCTCCCAGGACCTACGAGGTGAAGATCTGTCTGCCTGGTGGCA GCTCATCACCATTAGGGCACTAGTCCTTAACCACTGTATGCAACAGGACACGAGGGTACGAACGGGGTCAGTCAC AGCTGTGGAGATGCAAACAGATGACCTTAATGATAACTGGCCATTTGCCATCCCAGTTCTCAGAAATTGCACAAG CCTTCGCATGCCTAAGCCTTCCCAACCAGTTCCCAACCAGTACGTGGACACCCATTTCTCAGACTGAAACTCAGG CTCAGCGGCATCGGGAGGCATGGTGTATAGCCCTAGGTAAGTCACCTGAGCTCTCTGGGATTCATTTGGCTTCTC TGGGCAGAGGGAAGGTGAACAAGGCTTGTCAGTATCAAGAGGTTATACTGTACATCTGAAAAGCAGTCATGTTGG AGAATGGGAGTTGGAAAAGTTTGAGGAACATCTAAAAACATACTAATTAGCCTTTCCAATGGAGGGGAAGCATAG GCAATTGGGAAGTTCACGGTGACACTGCTTTAGATAGGAAATGGACCCAAAGGCCCGAATCCCAAGACTGACTCT TGAGAACTGGGATTTTTGTCTACAGGGATCAGAAGG >Egr2 CRM 5 CCTTCACTTTTGGGTGGTATATGCAAATCTTGCTTTTCATTCCTGTGGCTGAGTCTGAAAAATCTTTGTTCAGAA ACTTAAAGAATGATAAATACAAAAAGAACTTATGTTTACGGATCTGCATCTTCAGGAACTGGACTCTGTCAGTCT CCAAATAAACTCCCAGTGTCGACTACTGCTACCCAGTGGGTGCTGTGAATAAATATCCATCAATCAGTTGATGCA TTGATTAACGTTTAGGGATTGTCTCCCACAGGCATTCCAGTGTATTTAATATAGACAAGTGCACCATTTATGAAG 
GGCAGGGACAATAAATGACCCTCATGTGACTTGTTCCACACCCCTGAGACTGTCCCCGCCCTCCCTCAGGCCTGA AGCAGTGTTTTGAACATGTCTGACAGTACCTATCCTGTCTTGTGCACACTCTCCGTATTTGTACACTCACCAGCC TCTATGCTAGAACCCTGTTTCAAAGAGTTAAGGACTCTAGTTCATCTTTTATTCCTTCTAAACTCTTACAAAAGC CACAACACCCTACATAGGCCTCAACTAAATGATTTTGCCAATACTGCCCCCTTCCCCTGGGCCTAACAGATGACA GCACTGTGTTAGTGACCAGAGAGCCCTGACATTTTAGATACTTTTGGACCCTAGTGATAAAACGCTTGACTTACA TCTGTGCAAGCCTGTTTTTTTCCTTTTACACCGTGTGTTGGTTTGTATGTGTGCATGCATTTTTAAATAACCAGG CTGTAATTAAGTCAGCCTATTCTTGATTTTCTAAAAATTATATAAATTATAGTTTCCTTTCTATAGCGCAAATTA AAATACATCGCTGAAATTAAAGTGGACCTACAGGAGTGCCATTAGCATATTCAAAGCCGAAGGTTGATAATAACG GTGTTTACTTAACTGAAATTTAGCGGTATTCATTCAAAAACAACATGTAGCCCTAACCGCCATCCATGAAATAAT GCATGTTGCTTTACGGTTCTGAACCCTAATTAGCTGGGAACAAAACAACTCCTTCAACTCTTATTAACCGTTTCC ATTCTGCTGTTCTCTGTGTTAGCTGAAGCAGAACACCCTTTGGAGGTGTTCTGGGACTCCTCCCAGGGGGGCGTG GCCTGGAGACTGAGAAAGGACACTCCACCTTCCTTTATCCAAGTGAAAAGCAGTGCACATGGCTACAGTCAGTTC TCATTTTCCTCCTGAGCATCGCCCTATTTATATCTGCACGTGGGTTTGCCTTCTTTGTGTGCAAATGCCTTGGCG TGAGTGAGCCAACAAATAGGAGTTAAATCAAAATGATTTCAAGGAACCAAATTCCTCCCAGGCCCATCCAATCTC TTCCTGGGAGAAAGCCTGGCCTGAGAGAAGCCTCTTCCACGGGCCTTCATCTCCTGGGGTGATGCCTCCCTTGGG CCACCAAGTGTGGGTGATGGTTGAGACCACAGCCCTTGAATTCCTCACAGATTCAGGGCCAGATAAGAAGAGACT GCCAGTGATGTGCAGGG >Egr2 CRM 6 AGGCTAATTCTTGAGCCAGGTAGAACTCTGGCAAGGTTGGGTCCTGCTAAGGGCCAGTCCCTCGCACACTGTACT GTGTGTGAACTCCCTAATACCTGTCGCTGTGTAGCAATGGAGCATACCCTTGGGACAATACACATCCAGTGATGA TGAGGTGGGGTGAACATGGAAGCTACTGGTCTAAGGAATGTGATGACTGGAGACATCCTCTGAGACCCATTTTGT GGGGTCCAGAACCGTTCCCACTAATTGGCCTAGGAAAGCCACCCCACATGGTGAGAGGGTCACCACACCACTGGT CTGTTTAAAGTGGGAAGCCCTGAGCCCCTTCCTGGCAAAGCGTTCTGACTCTGAGTTTGGGGATAAATGACTACT CCAGACGTGAGTCACTGCAGTTTGGAAATTCTCAGCTGCAGCCTGCACTTTAAGAAAAAAAAATTATTATTATCA TCTCTGACTAATAACTAGCAAACCCAGAGTGAAAAAATAACTCTGTGGGAGATGAAGAGAAAGTTCTAAAAAAAA AAAAAAAAAAAAAAAAAACTTCACAAGGAGCTCAAAGCACAAGAAGACAGGAGTACAGCAAGGCAAACAGCGAGT CTTAAAATATTAACTGAGTAATTATACCAATTAACATTTAGCTGTATCTATGTCCATTCCATTTTGTCTCTAATA
GAATGATAGGTGGTGGTTATGGCTGGAGATCTCTGAAAGTTAATTAAAATCACCGAAACAGAACCAATACAATGT CCATTCTTGTTGTTGTTTTAATTCATTGAAGGCTAAATAATCTATCTCCTATTCTGAAATTTATTCCAAAAGGAG GCCGAGCCAGGAGTACAGATATGTTTTCTTTCTTATCCTGAAGTCCCTGTCATTGTGTTCCATCTCATCTGGGCA TCTAGCTGAGCCACCAGTCTCCTACGAGAGTCGAGTAGGGTGTGATGTTAGTGAGGACCAGTTATCCCGATCTAC TGATGTCTCTGGTCTCCTGAATCAGAGGGCTGCCCTGTTGTGAACTGGGTTTTCCCTAGCCTCTCACGTTCAGCC ACTGCAGTCCAAGCGTACTCTGGGGTCCAACTCCATGCAGGAGCTCTGTGGAGGTCTGAAGCGAGGCAGAAACTA AATGAGACAGCTTCCCTCTCTGCTCATCCATGGCAGCAATCCTGAGAGGACCTGACTGGCCCACAGGGGAAGATC ACAGCTGGC >Egr2 CRM 7 CCTCTGCCATTGACTCTATGTCTCCTCGAGTCCCTTTTCTTCTCTTGCCCTTGATTTCCTCAATATCCCCGGAGA AGGGTATGTGAGAGTCCAGGAACCATTCTATCCCCGCCCACCTACCCCCATCAACCCCAGCTTTTCCCCCGGGCA GCCTTGAGTCTCTGCTGAGACAGGGAAAGGAGAACAGAGCCCTTTGTGGCAGGCTGGGGACGGGCAACTTGAAAG CACTAGGGGTAGAAGAATTGAAGCATTTTTGTTGCGGAGGAAAGCGTGGGTTGCAGGCAGGGAGCCAGAAGCCGC TGACATCACCATCATACTTGGATCAAGCCTCGAAGAAGTGGGCGGGAGCCTCAGGGGAGTGGGCACAGTTACCTG GTAAAGGGACCAGAGGGCCTGAGTCTGGTCCAGTAGTGACAGGAACACTCACTGGATTAGGGACTAGTAGTCACA GGTCAAAGATAAAGAGGAAAGCCTGTGGAGGCTCTGGGGAGCCATTTCTCCATTCTATCCCTTGATTCAGGTCTT GGGGGAAGGACGGGATGGAGGCCTGTGCAGGTTATCTTTGTCCCTGGGCCTGAAGTCAAGCATTAGATGCAGCAG CTGCCAACCCAATCTTTTCCTGCTGTTTTGAGACTGCCGGTCAGAGAATGGGAAAGGAGGCCACCCCTGGAATGG GGTACAGCAGCTTCCTTATTTTCAGAGGTAGCCTCCATTAAAATGCCACAAAACTTAAACCTCCACTGTTGATAA GCCTCTCTCAGGTTTAAGCATGCCCCTCTTCTCACAGACTTCAATTATTTTGTTTTTCCAAATAGCTTCTCCCTA GGTATTTGAAAGTACTAGCTTTCTGGACCATCCTGTTCTACACTTCCGGTGAGGTCAGAGCAGACAGAGCTAATC GATCC >Egr2 CRM 8 GAATTCAGGCCACGAGGCGTGGTGACAGGTGCCCTTAATTGCTGACCTATCTCTCCAGCCCAGACTCTATCTGTT TTGGTGGATGGTGTTCAGTCCAGGCAGAGGAAGTGCTACAGGGATGGCCTGCCTGCAAGAACCAATTAAGGCTGC CCTTCACTCTGAAGACAATCTTACTTCCCAGACTGGAGACCTGTGGGTAAGGGACAGGATTATCTATTTTAATGT TGCTGACCTCATTTTCTGAAAACTAAGTTCAACCGGAACTTCAACGTGTTTAAACAAATAAAAGTGACAGGCAGG TTTTCCTGGAAGTTTCCCCCATCCTTATGTCATGGACGTTATTATTTTTTTTTCTACATGGAAACATCTTGAAAC TGGAAAGACATGGGATATTATTATGTGAGACCCATTCATCGGGGTTGGGTTCAAGTCCTTTATTCCAGATGTGGG
AGAATCTGGACTTGGCGAGCACGCAGTTTCCTTTGGATCACACAGAACATTTGAGGCTGTGTTGGGCGAAGCAAC CCCTGACTCTGTACTCATTACACAGTATTCTTTCCATGAGACCAATGCTATGGATTTGAGGATCACCTGTCAATA TCCAACAGCCCCCCCCATCTAAACTATGAGCCCACCAAAGGTGACACGCTGAGGACCCCCACAGAGAGACATGCC TGACTCATCCCGCCCCCCATTTCACAAGTGAAGCCTGTGATTAGAAAAATAATAAAAGCCCATATCACCTGTGTT CCTTGTTTATTATGCATTATACAGGTATCTGGTTGCAAAACCCGAACTCCTACTCCTGTCTCTCAGTGAGCATTT GAGGCAGAACACACTGACCTGCCTCATCAGATAACATAAGTGCTCCCTTCCCAGGTGACTTCCACCCTCCCTCCC CATCCCCCTTGGCTGCACTCCCAGAGAAAGAGTCCAATGGGGGAGGGGAAGGGGGGAAGGAGACACAGACAGGGA TCCATAGATTTTCCGGGTTGAGGGGGGGGGCTCTGAGATCAGAGGACTATGTCCCAGCTAAGTCAGAGGGGATTC TCTGACCCTACCCTCTTCAGTCAGGGTCCCTCTTACAGTCTTTGGAGGATACCAAGTTTGGGTGCACAAA >Egr2 CRM 9 CAGGACAGCTGGAGTAACCCTGTCTTTGAGGACAGAGGCTGACACCTTGAGAGGACCAAGGCCTCTCCCCAAGGT AAAGTCATTTGCTTTAGAGCTGACTTTCACTGGGAAAAATAAAAAACACAAAAAACAAAAAACTGACAAATGTAG ACGTGTGCTCTGCAGGACACCGGTCTGAGTCAGACTAGGAGACAAAGAGCTACTGTGTGCTATTTTCATGGAAGC CTATTAGGGGCAAGGTTTGGGAGTACATTCACAGGCTGCATCAACTGTGGCCTTGGTGCGAGACAAGGCTCTATC TTACTTTGCGACGCCAAGTGGAGATAATGAGGCTGAAGAGCTTGTGCTTCCTTCCTCCATCAGCATGCTCACAGA AACCAGGAAATGGAGAGAATATTCCACAGTGAGATGCCCTGGTGCTGGCTATTTGAAGTCTGAGGGAGGTTTAGG AGCCGTCTGATGGGGGTGATCTTGTTTTGAGGCTCCTACCTTCTTACCTCTTTTCTTTCTTTTCCCATTTAGCTT GCTGGGCAATATTGTCAGGAAACCTGAACACGTGGTCCACCGTTCTTGTCGTGTAAGTAGCAGACACCCCTTAAG TCACAGAACCTCCTGGGTGAGGCTCCTTGGGACTCTTCTTAGAGACTGGAATTAATGAAAGATGTGTGTTGTCAA GCTACCCAGGGGAGATGAGTAATTTATCTGGAGGATGAGGGTGATCTCAAGGGCTCTCGCCTTAGGAGCATTGTC TCAGTGAACCTGCCAGGGTTCAGAATTTTGGCCCGTCGACTGTTAGTTATCAGAACGCTCACATAAAGGCAGAAT GAGTTGCAATCGCCAGATTTATGAGGCCTGTGTCTAGTGAGGAATGTCCTTACCCATGAAGACTAGGCAATCCTG TCCTCCGAAAAGTCCATCTGGTTTCTGTAACATCCATGGGAAACTATGTTCTAATCAGAAGGTTGCTTGGTGAGT CAGCCATCTCAACAAAAGGAGTTACTTACGTGATTCAGATGTTTAGATGCACGGATGAAAGGCAGGTTTCACTCC ACCAGCTCGCCTATTGATTGTTAAGGGCTGTCCAGCATGTTGAGAAATAGGTCATGTTCAGGGTTTGTGAAAAAC AACAGTTGACTAGTGATGTCTGCGGTAGCCTAGACTTGGACAAAGGGGCAGATGTGCTATGTCTGAGCACCCCTG GCATACTCATAGCTGACCAATAAACCTTATGAATTCAGGCACACATAATCAATCACAAGATTACGAGAAATGGAG 
CAGAGGTGACTGGTAAGTCTTTGTTTGGTCACCAGTAGCGTACTTTTTGTTTTTTTAGTCCTTCTCTAAGAGATA GATCATGTGTCTGATGAGGC >Egr2 CRM 10 TGTCACTTCGTCACGGAGACAAAAACATCCGAGCTGTACATCGTGAATTTCCCCAACCGTTTTGTCAGCAATACC ACAGCATAGCTTTGGAAAGACACAGAGCGCTGTATTTGATGGAGAGGAAGAGAGGCTCCTGACTCTTAACTGCTC AGAGTATGTGGAAATGTTTTCAAGTATTTGAGGAAGAACAATTGAACCTAAGCCTGTGTAATGAGCCGCAGAAAG GGAAAGAGTGACTCTGGGGAGGCACTGCAGGAAATCTTGCAGAGGTGAGATCTGACTGGGGTAAAAGTGCAACTG TCTGGAGGACAGGAGGGCTCTGTGCTTCCGGTGTGGTTAGAGGAAGTGGGGCTGCAGCAGCTGGAGTCAGCCCAC CCCAGGGTGATCGGGACAGGTTTTCCAACGGGCAAAACCTCTGAACAGAAGCCAACAGGGTACCAGGGCTCACCG GGAGCCACGGGAATGCCTAGAGAGAGCTACAAGGGATTCTGGCCTTGGACTGACGCTGGCTTTCAGTCTCCAGGA AGGTGTCTGGCTAAATCGGTTCTGAGGCTCCATCTGGCCTGACATTCTGGCCCCACTGAGGGTCACCTTTACTCA TGGACATTTCCCTTTCTTTCAGCCTCACCTGCTTTTTTTCCTCCGTGAAAATAGAGAAGTGGTAAAACAGGAAAT AGGATGTTGGTTGAGGTAAGGGGCTATGCCCCAGAACTAAATAAAACTCCCCCCTCCCCCATTCTCAACTCTGTC ACTTTGTGGGACCTCACTCAAGTCTCTGGAATCCTGGTTTCTCAGAGTAAAATAAAAACTTTGGCGAGAACGACC AAGGATGCCACTTCCATTTCTCATCCCTTTGACTAGCCTAAGAACTAGACTAATGAGAGAGTCATCGTCTTGAAC TGATCCTGGATAGTGAGAGTACCTCACCCAGGCCCTTCCCTCCTGGCTTGTGGAATCTGTCTTAAAGGAAAAGCA GTGAGTTCCTAGTATCCCCAGGGCCGTGACATTTCCCTCAAGCTCTGCAAATAGTGTCAGATCCTAAGACCCCAG AGCTTATCCTGTAGCTATGACTCCCAGGAGGTGTATCCAGTGGGGGGTGGGGGGGGGCAGAACCTGCTGCTTGAT GTCATAGCCAGGAGGGTCAATCGTTCTGAACATGGCTAGCTTATCCAGCGCACTTTATTTCCACTCAGCTCCACT TGTCCACGCGTGTGAAAGAAATCAACCCAGTACTTTCCCTTTCTGGTTTTCCCAAAGTCAACTACTCTTTCTGGC CGACTTTACCCCTCATTAACTAGTTATTTGAATGCTGCATGGATCCTTTGGAAACCTCCTTCAGAGCAGTCTCTG CATAAGCCAGTTAAATCAAGGGGCTTCGATT >Egr2 CRM 12 TCCCTTTAGTGGAATCAGTTCTGTGGTGGCGCTTCCCCTCCCACCTGTGCGATGGAGGCTGGCAGGAGGCCTGGG TCTTCCTAGCAGCCCTCTCAAGGCTGATTCATAGCACCAATTACTCAACATGTATAGATTCTAAAATAAAAATAA GTTCTTCTAAATGAATCTAAAAATGTACCCGTATTCGTTGAGTGCTTGCATCACAGACTGTGTTTAGTTTAGATC CTCCTTGCTTGCCTTCTTACACCAAGCTGCAAGATGGGGCGCTAGGAGGAAGAAACTCACACAGGCTGAAGCTCT TGAGTTCTATACTTAAGTTCTATCTCCATGACAGCCCGACAGCCCCCTGACCATGTGGAACCTGCTCACGTGGTC TCGAGGAACCGTTTTAGATTCTCCGCTCCTGCTTGCTTATCTAAGAGTTACTTTCATTGGGCTTATTCTATCTCT
CTCCTGGTTATTTACCATCCAGGCGGGGAGGGCTGCATTTTCCTATATTTCTTTGAACTGGACAATGCGCTTGTC GATTCTAGCTGGTCTTTGGAGCCCTTGGAAAAAGCAGAAAAATGCGAGGTGGTGGAGGTGCGGAGGGGAGACTAG ATCTTGCCCTGTGGGGGAGCCTATGAACTTGTAGATCAAAAGATCAATCTTTGATTTAAAAGATGGAACCACTAT TCACGAACCGCGTGGTGACTGGCCAAGAAGAAATACAGTCACTTGTTGTCATTAAAAGATTAAAATAGAAAGCTG CAGTTCCCCACCCCTCCCCCAAACCAACCTGGAGTCCCATACACAAGGAAGGTGGGGGCTGTGACTCCACTTAGT CTCATTAATTGCTGATAAGTGACTGGTCCATAAGACAGAACTGAAAATTCAAGCTGACAAAGGGGGTGCGACCGG TCTACCTCTAGAGAGTGAGCCAGAACCATTCATTGCTCTATTAATTTTTTTATCATTGCCCTGTCTCTTAGTACA AAAGTGTTTAATCTGAGAATGTAAGTTCTGAGACAGGCTCCAGTCTCTAGCACTCCATACAATATCTCCATATCT TGTTTACCTAATTCTACACGCTATCAATAGACAGTTAAATGTGGAGCATCAGGGCAAAGGCAGAGCTCCTTAGTC GAAAGGATTTTGAGAAACG >Egr2 CRM 13 ATTCTGGTTGCAGAGATCATTTCTGAAGTACTTACTGGGTGGAGTTAAAATAGTTATTGACATCTGAGAACTAGG AACAAACACCGCTTTAGTGAGGGGTTAGCACAAGCCCTCCATGTTCCCTTGCTGGCCTGTGGATAACAAGCCCTA CCTAGCTAGGCTTCAGGGATTTCCCGCTGAGTTAGTGAGGAAGCCTTCTGACTCACTGTTCTGTCCCCTGTAACA GCCTATAGCTTCTCTCTGCTCCTGAGTGGGCTCCGGAAAATAAATACTTACAAGCAAGTGCCCACTCAGAGATGA CACCCCCCAGGGCTGCTTTCAGAACTCTGACCACGGCCTGGAACTCTGAGTTGTGGCTTTGTGGAAAGTGTGGCA GCATGGTCAAATTAGAAATGACTTTCTGAGTTCCAGGATTAATCTTACAACCACAGGAAACTTTTTCAAAGAGCT ATCATTCCTTCAGATCAAACCTGCTCGGACACCGCTGATGCCAGCTGATGCCAATGAAACGGCTTCAGCAATTAC CCCAGGACTCATTTCTCTCCAGTCCTGGCGTCTCAGATGGTTGCCCATGTTTGGAGGGATCATTTGTCACAAGCT TGGTGGATCTAGAAATGATTCTCAAAATAGCTTAAGATAGGTAGACACGAGGCTGTAAAACTCCAAGACAATCTA AACTTACTAGGTAAGCTGCCTCCCCCAAAACATTATAAAACTCTAAGGTTTCTTTTTTACTGGGAACATTTTATC ATTTTCTTTTAAACTCAAAACCATCTGACCTACCCACTTGTGTTATGCGCGTTGGCATCTTTACTCTGAAAATCT GGACCTCAATATTTATAATGTCCACTGCTCCTTATGCAATGTAATCTGCGGGTGTAGCCACACTGCTCAGCCTCA ACAGAAAACCCC >Egr2 CRM 18 GTGTTCTGGGATAATCAAATGAACTCACATCCCTAAGACATTTGGCAAATGGTAAACAATCAATAAACACTGGCA CCAGACAGCTCTTAGTGATGAAGCACTAAGCTGGGCTGGCAACGATTCCTAGTGTATATGCTATACAAAGATGCT CCATTTCATAAAGGGAGACCCAAAGCAGGAGTGGGGCTATGCCTCTCTGCTTCATCTGAGGTTTCTTCGTTCCTT GTTCTTCCTTTCCTGGGGTGGCAGCTAGTTGGGACAGATAAGGTCTGTCAGACACATGTGAAGGGGGCTGGTCTA
AGAGCTAAACTTAGAGCACAGAAGCAGGCCCACTTTGTGCCAAGAGGTAACCTTGGGGTGCAGAAGCAGGACCTA TTCTATGAGCACCCAATCCACCCACTAGCCAGAGACCCCCGTGTGTGACTTCTGGGCTGTTGTGTCTTGGGGAAA GAGTGTGACTAGCTCTAGTTTTCTGTGTGTTTACCTCAATGGGCTTGTAATTAATATTGTAGAATAGGGGCTCAT TAAATCAGACTCATTATGACTGTCCAGGTTCTGGAGCTCAAAAGAAAAGGATTTTCCTGTTCCAGATGCCTGATG GATCGTCCTAATCAAGCAGGCAGTTCTCTTCTGTTGAAGATTGTCTTACGAATGGAGATGAGATTTATTCTGACC GGTGCTGTTACAGTGAAGGATATGGCTGCTGCCGACAGGCATTATAGGTCTCTGTCAAGTCAGCGGCTCCCTTCG CAAGGCACACAGAACACAAGGATTTGTAAGATGCTGTGCTTGTAGCGGCTGGAATGTCCACTGGCTTCATTCCCA ACTTTGGCTCAGAATTGAGTCTTCCTTCTTGGGTTAGGGAAATGGCTCTGTCGAGTGCTGGCCTTACTCTATTC >Egr2 CRM 20 GGTTGGAAAGCACTTTGATCTTGGGTTTTTAAGTTTCTGGTGGCTCATCTGCCAAGGTCAACTTTTATTAGCACG TCTCTGCTCTTCAGAAAAAGTTTTACACCAGTCTCTCCAAAGGTGACTAGAGTTTGAATCAGTCCAATTAACATG ACAATGACAAAAATGTCAGGATCCATCTCTAAGGGAAAGATTTGTCCTTTGAAAGCCACTGTCTTGAGGTGTCCA GGCAGACTGAGAGCTTCTCTGGGACTATTTGGATTTTCTGTTCCTGAGCCAAAAAAACATTCTTTAGCATTGGCT GACAAAATGGGTTTCAGAACAGGAGGTCTGCGATGGAAACTTTAATTATGCAGGTCCTGTCAAAATCCATCTGCT GAATTCAACTGTTTCTGGGTAAGGTACTTCTTATTTTATTTTATTTTTTTTTAGCAGTGTTACTGGGTGACACTT TGATAGGGGTCTGAAATTAGCTCCCTTCACAGGCTAATTTATTCTGCCTCCTTCCCTGCATTAGCAATATTTATG TCAGGTTGGGATGTCTTCATGGCATTTTTTTTCATATTTTTTTAATGGATGATTTTTATGATGCTACTTAAGTGT TGGAATTGGACGGTGAAAGGTATCAGTTTGGGAGATGTCTCCAGAGGATCGGCGTTGGTGTGATTCTGGCTGACA GGTCCCATGACTTATCAATAAAGCCATTCTGTACATCTTGAGGGAAGATGGCCATCAAAAGTTGTTTTGGTTTTG GGAAGACTCAAATGCTGGGCACACATCTGTGATCTTATAACCCCTTCCATCACGCCTCGTCTATGTAACTCGCAG CCGTTACACTAAGCTATATTTACACTTATGGTGAACTCTCATTTGTGCTGGGGGGAGGGCTCAGGTGGGAATTCC AGGTGGTGGCTAAATGTACAGACAGTGAATGTATCCACATTGGGCATTTTTTTTCAGCTTGAATACCACAGAGTG AGCTTTAGACCAGCAGCACGGTTAATAAAATTGTCATTTTAGGGCACTTGCCTGGATCAGTTGTTATTTTATGAA ACCAGAGGTTTCATAATCAATGATGTCATATGTGCTTCAAAGACTAAGAGGGAGAAACGTTCACCGACAATCTTC GGTACTAAAAGAAATCCTAGTGAAAATACTGGAGAAAATGGAGCCGTGCCCATAGCCTTGCATCTATGTGGGGTC TGCAGCCTGAAGGTCGTTTGGAAACAAATTCATCTTCAGCCATCAAAAGTTAGAGCCTTATGCTTCTTCTCGGAC ACTCAAAGCATCATGGCAAGTTTGGATAAACAGTTCTTTGGGGGCTTTGAGATGGGAGAAGAAAATAAATATGCT 
TTATGGTTTATGGTCGGAAGTGTGTACTT >Egr2 CRM 21 GTAAGCCGCGTGTCTTCTTCGGAGCTTGTGGGTGGGGAGCGGTGGGTGTGGGGGGGCTTGGAGGGGAGAATGGGG ACTTGAGGTGGACGGCGCTGACCACACTGGCTCTGAAGACTGGAAGTAGGCGCAGGGGGAGCGGAAAAAGATAGA CTAGGGCTGCCTGGCCGTGCGCGCCAGCAGATGCACCTGGTCACCAAAAGCCACCCGCATGCCTGTAGCTCCGGG CGCGCCCAGCTTTGGCTAGGCGCAGGGTTGAAGCATCATCTCCTGGTCCTTGAAGCAGGTCCATCAGCCGGACTC TGCGCTACTGGGCTAGGGTCAAAGAGATGGGAAAGTTCATCAGTCGGGTTAGAGCTGCGCGTCTGGAGAGCTCAG TGTCAGGGGAGCGCTTAGGGAGCGCGCGGGCAGCGCTCTCTGGGCAGAGCCCACTCCCGGAGGATCTCCCGAGCG CGTTGTCCTTCTGGGGTGGCTGGGAGGCACCACCCAACTTGCGCATCGCCCCAAAGTGAACAGGGTTAACAAGCC GAGGCGGCGGAGCAGGGCAAGTAACAGCGAGAAGCGGCAAGGAGAATGCCTAGGGAGGCCCGCGCTGGCACCGGT GTGCGCCGGTTCCTTGGCGATGCTGTGTAAAGTCGTCTTCCTCAGCCCCCGCGCCTTCCTAGAAGCGCAGCCTCT TAGTCTTAGGCTAAGACAGGGATGCGCGGGTGTCCGGCTTGGCCGAGGATGTTGGGAGGGAAGGGAACCCTAGAC TCCTTCCTCCGCTGCCCGGGTTCACACTGGCAGTCTTCAGCTGAAGCGCAGCGGGAGAGGGAGGGGGAGGTGGCA CCAGGAATTAGCTTCAAGAACATCACCCACCCACTTATCCATTCTGGGGTTCCCCTCTCTCGCGGCTGGCACCTT CTCTGTACCGCCTGGGATTGATTGGTCGGCTCTTTTTATTGGTGTCTGTATGGTTGCCTCGGCGGTAAGACCACA AGGCAAAGGAGTGGGAGAGAAGTCAGAGGCAGATGGGAAATTGTTTGATATTGTTGCTGCTATTGGGTGTTTTTA TTGGGAACTAGAGTCACAGATGGCCGGGAGATGGAAGCAGGAAGGGGCATGCCCTGCTTCTTGGAGATACAGTTT TTGAGTTCCAGACTCTGCCTAACTTCTTACTGTCTCTCCAGGCCTCAGTTTCCCCACCTGTTGCTCTGTAGTGTT GGAATCATGAAATGGGGATTGCCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCCGC GCGTCAGCATGCGTGTATGTGAATTAGTTCTTGCCTGTATTCTCATTGTCCCACCTCTTCCCTGACTTTTCCCTC CCCAG
References
Arnosti D, Gray S, Barolo S, Zhou J, Levine M (1996) The gap protein Knirps mediates both quenching and direct repression in the Drosophila embryo. EMBO J 15: 3659–3666
Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol 193(4): 723–50
Cai HN, Arnosti DN, Levine M (1996) Long-range repression in the Drosophila embryo. Proc Natl Acad Sci U S A 93(18): 9309–14
Chlon TM, Doré LC, Crispino JD (2012) Cofactor-mediated restriction of GATA-1 chromatin occupancy coordinates lineage-specific gene expression. Mol Cell
Doré LC, Chlon TM, Brown CD, White KP, Crispino JD (2012) Chromatin occupancy analysis reveals genome-wide GATA factor switching during hematopoiesis. Blood 119(16): 3724–33
Garber M, Yosef N, Goren A, Raychowdhury R, Thielke A, Guttman M, Robinson J, Minie B, Chevrier N, Itzhaki Z, Blecher-Gonen R, Bornstein C, Amann-Zalcenstein D, Weiner A, Friedrich D, Meldrim J, Ram O, Cheng C, Gnirke A, Fisher S, et al. (2012) A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Mol Cell 47(5): 810–22
Grass JA, Boyer ME, Pal S, Wu J, Weiss MJ, Bresnick EH (2003) GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling. Proc Natl Acad Sci U S A 100(15): 8811–6
Harmston N, Lenhard B (2013) Chromatin and epigenetic features of long-range gene regulation. Nucleic Acids Res 41(15): 7185–99
Hastie TJ, Tibshirani RJ, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38(4): 576–89
Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563–577
Hewitt GF, Strunk B, Margulies C, Priputin T, Wang XD, Amey R, Pabst B, Kosman D, Reinitz J, Arnosti DN (1999) Transcriptional repression by the Drosophila Giant protein: Cis element positioning provides an alternative means of interpreting an effector gradient. Development 126: 1201–1210
Janssens H, Hou S, Jaeger J, Kim A, Myasnikova E, Sharp D, Reinitz J (2006) Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even-skipped gene. Nat Genet 38: 1159–1165
Kim AR, Martinez C, Ionides J, Ramos AF, Ludwig MZ, Ogawa N, Sharp DH, Reinitz J (2013) Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLoS Genet 9(2): e1003243
Kulkarni MM, Arnosti DN (2005) cis-Regulatory logic of short-range transcriptional repression in Drosophila melanogaster. Mol Cell Biol 25: 3411–3420
Laslo P, Spooner CJ, Warmflash A, Lancki DW, Lee HJ, Sciammas R, Gantner BN, Dinner AR, Singh H (2006) Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126(4): 755–66
Lin YC, Jhunjhunwala S, Benner C, Heinz S, Welinder E, Mansson R, Sigvardsson M, Hagman J, Espinoza CA, Dutkowski J, Ideker T, Glass CK, Murre C (2010) A global network of transcription factors, involving E2A, EBF1 and Foxo1, that orchestrates B cell fate. Nat Immunol 11(7): 635–43
Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, Lim J, Shyr C, Tan G, Zhou M, Lenhard B, Sandelin A, Wasserman WW (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42(Database issue): D142–7
Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34(Database issue): D108–10
May G, Soneji S, Tipping AJ, Teles J, McGowan SJ, Wu M, Guo Y, Fugazza C, Brown J, Karlsson G, Pina C, Olariu V, Taylor S, Tenen DG, Peterson C, Enver T (2013) Dynamic analysis of gene expression and genome-wide transcription factor binding during lineage specification of multipotent progenitors. Cell Stem Cell 13(6): 754–68
Ogbourne S, Antalis TM (1998) Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes. Biochem J 331(Pt 1): 1–14
Papatsenko D, Levine M (2011) The Drosophila gap gene network is composed of two parallel toggle switches. PLoS One 6(7): e21145
Perissi V, Jepsen K, Glass CK, Rosenfeld MG (2010) Deconstructing repression: evolving models of corepressor action. Nat Rev Genet 11(2): 109–23
Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451: 535–540
Stopka T, Amanatullah DF, Papetti M, Skoultchi AI (2005) PU.1 inhibits the erythroid program by binding to GATA-1 on DNA and creating a repressive chromatin structure. EMBO J 24(21): 3712–23
Thorvaldsdóttir H, Robinson JT, Mesirov JP (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14(2): 178–92
Treiber T, Mandel EM, Pott S, Györy I, Firner S, Liu ET, Grosschedl R (2010) Early B cell factor 1 regulates B cell gene networks by activation, repression, and transcription-independent poising of chromatin. Immunity 32(5): 714–25
Weigelt K, Lichtinger M, Rehli M, Langmann T (2009) Transcriptomic profiling identifies a PU.1 regulatory network in macrophages. Biochem Biophys Res Commun 380(2): 308–12