The analysis of novel distal Cebpa enhancers and silencers using a
transcriptional model reveals the complex regulatory logic of
hematopoietic lineage specification
Eric Bertolino (a,*), John Reinitz (a,b,c), Manu (c,d,*)
(a) Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
(b) Department of Statistics, The University of Chicago, Chicago, IL 60637, U.S.A.
(c) Department of Ecology and Evolution and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, U.S.A.
(d) Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, U.S.A.
(*) Corresponding authors. E-mail addresses: manu.manu@und.edu (M) and eric.bertolino@gmail.com (EB).
Abstract
C/EBPα plays an instructive role in the macrophage-neutrophil cell-fate decision and its
expression is necessary for neutrophil development. How Cebpa itself is regulated in the myeloid
lineage is not known. We decoded the cis-regulatory logic of Cebpa, and two other myeloid
transcription factors, Egr1 and Egr2, using a combined experimental-computational approach.
With a reporter design capable of detecting both distal enhancers and silencers, we analyzed 46
putative cis-regulatory modules (CRMs) in cells representing myeloid progenitors, and derived
early macrophages or neutrophils. In addition to novel enhancers, this analysis revealed a
surprisingly large number of silencers. We determined the regulatory roles of 15 potential
transcriptional regulators by testing 32,768 alternative sequence-based transcriptional models
against CRM activity data. This comprehensive analysis allowed us to infer the cis-regulatory
logic for most of the CRMs. Silencer-mediated repression of Cebpa was found to be effected
mainly by TFs expressed in non-myeloid lineages, highlighting a previously unappreciated
contribution of long-distance silencing to hematopoietic lineage resolution. The repression of
Cebpa by multiple factors expressed in alternative lineages suggests that hematopoietic genes are
organized into densely interconnected repressive networks instead of hierarchies of mutually
repressive pairs of pivotal TFs. More generally, our results demonstrate that de novo cis-regulatory dissection is feasible on a large scale with the aid of transcriptional modeling.
Keywords: cell fate; gene regulation; hematopoiesis; silencers; transcriptional modeling
Introduction
The spatiotemporal expression of genes is encoded in the genome by cis-regulatory sequences,
which may be located tens to hundreds of kilobases from the transcription start site
(Carey et al., 2008; Spitz and Furlong, 2012). It is usually possible to empirically define cis-regulatory modules (CRMs) as sequences that act as enhancers (Banerji et al., 1983; Banerji et
al., 1981) or silencers (Brand et al., 1985; Ogbourne and Antalis, 1998) of the activity of the core
promoter in reporter assays. The activity of CRMs results from sequence-specific transcription
factors that bind to their recognition sites and recruit cofactors which interact with the RNA
polymerase II holoenzyme complex or remodel chromatin (Spitz and Furlong, 2012). Careful
analysis of the CRMs of a few well-characterized genes (Fromental et al., 1988; Göttgens et al.,
2002; Ondek et al., 1988; Schirm et al., 1987; Small et al., 1992; Wilson et al., 2011; Yuh et al.,
1998) has revealed how the internal composition and structure of CRMs—the arrangement of
transcription factor binding sites (TFBS), the TFs binding to them, and interactions between
bound TFs—encodes the pattern of gene expression. For the vast majority of genes however, both
the identities of CRMs and their cis-regulatory logic remain unknown.
Determining the cis-regulatory logic of individual genes is an important goal of functional
genomics (Nam et al., 2010). First and foremost, determining the cis-regulatory logic of
individual genes is a prerequisite for constructing high-quality gene regulatory networks (GRNs)
(Levine and Davidson, 2005; Singh et al., 2014) and modeling them predictively. Second, even
though the putative rules of cis regulation have been inferred by the analysis of a few genes
(Cantor and Orkin, 2002; Göttgens et al., 2002; Kim et al., 2013; Small et al., 1993; Wilson et al.,
2011), checking their generality requires that we repeat such analyses on a much larger scale.
Transcriptional regulation is an input-output problem. The key to unscrambling cis-regulatory
logic is to map inputs (TF concentrations) to output (rate of transcription), conditioned by
regulatory sequence. A necessary requirement for successfully decoding regulatory logic
therefore is to include all three: TF concentrations, DNA sequence, and transcriptional output.
Mainstream genomic approaches, such as Chromatin Immunoprecipitation followed by
Sequencing (ChIP-Seq), RNA-seq, and massively parallel reporter assays (Arnold et al., 2013;
Levo and Segal, 2014; Melnikov et al., 2012; Nam et al., 2010; Sharon et al., 2012), assay either
input or output but not both. This fact necessitates the development of the means to include all
three components in cis-regulatory decoding.
More than mapping an input to an output, transcriptional regulation is a problem of mapping
multiple inputs to a single output, since CRMs are regulated by multiple interacting TFs. For
example, the CRM driving the expression of the second stripe of the even-skipped gene of
Drosophila is bound by 7 TFs at about 20 binding sites (Arnosti et al., 1996b; Janssens et al.,
2006; Small et al., 1992). The binding of CRMs by multiple TFs is widespread. Studies in
multiple cellular contexts, including the hematopoietic system (Heinz et al., 2010; Wilson et al.,
2010a), have detected combinatorial binding of lineage-specifying TFs. More generally, the
ENCODE and modENCODE projects (ENCODE Project Consortium et al., 2012; Gerstein et al.,
2012; Gerstein et al., 2010) have identified Highly Occupied Targets (HOTs)—DNA sequences
occupied by multiple TFs—which occur at a frequency higher than one expected by chance
(Nègre et al., 2011). TFs interact in complex manners to control the spatiotemporal program of
gene expression. Many activators are known to promote gene expression synergistically, TFs can
bind cooperatively, and repressors interfere with the activator function (Arnosti et al., 1996a;
Cantor and Orkin, 2001; Cantor and Orkin, 2002; He et al., 2012a; Heinz et al., 2010; Kulkarni
and Arnosti, 2005; Small et al., 1993; Small et al., 1996). Multiple interacting inputs make large-scale cis-regulatory inference challenging since there is no straightforward correspondence
between TF binding and gene expression (Calero-Nieto et al., 2014). Addressing this complexity
of cis-regulation requires that we devise a computational attack on the problem.
Here, we present a new approach for reverse engineering the cis-regulatory logic of a target gene.
Our approach overcomes the challenge of regulatory complexity by integrating multiple datasets
—evolutionarily conserved non-coding DNA sequence, genome-wide gene expression data, TF
binding preferences, and reporter activity data—using a transcriptional model that explicitly
simulates mechanisms of TF interaction. Our premise is that datasets assaying multiple aspects of
gene regulation, in combination with the rules of gene regulatory interaction encapsulated in the
model, will constrain the number of cis-regulatory schemes consistent with activity data. Our
transcriptional model is of the so-called “thermodynamic” type. Thermodynamic models have
been used to quantitatively predict CRM activity during development (He et al., 2012b; Janssens
et al., 2006; Kazemian et al., 2010; Kim et al., 2013; Reinitz et al., 2003; Segal et al., 2008;
Zinzen et al., 2006). In contrast to the previous applications of such models where the key TFs
and functional roles were known from previous work, here we use the model to learn them from
datasets probing multiple aspects of gene regulation.
The reverse engineering approach (Fig. 1) relies on four elements: 1) DNA sequences of a large
number of putative CRMs, 2) estimates of TF concentrations, 3) quantitative measurements of
CRM activity, and 4) a transcriptional model that takes CRM sequence and TF concentrations as
input to compute a prediction for CRM activity. We identify TFs likely to be regulating the set of
CRMs under consideration based on the presence of TF binding sites and their expression
patterns. We then formulate models that simulate the regulation of each CRM by the candidate
TFs, taking into account TF binding, interactions, and functional roles. In this work we allow for
two functional roles, activation or repression, which are not known beforehand.
At this point, our approach deviates from previous applications of transcriptional models. In order
to learn the functional roles of TFs, we construct 2^N models, where N is the number of TFs
included in the model (Fig. 1). Each model is then fit to CRM activity data using simulated
annealing (Kim et al., 2013; Lam and Delosme, 1988a, b). The composition of the best-fitting
models then implies the TF functional roles consistent with the CRM activity data. At the end of
the process, we arrive at specific predictions for the TFs regulating each CRM, their binding sites,
and whether they activate or repress their targets—the cis-regulatory logic of the CRM.
We applied the reverse-engineering methodology to determine the cis-regulatory logic of genes
encoding TFs involved in hematopoietic cell-fate specification. Cell-fate choice during
hematopoiesis is known to depend on the expression levels of specific transcription factors. For
example, the expression of the Ets-family transcription factor PU.1, encoded by spleen focus
forming virus proviral integration oncogene (Sfpi1), is necessary for myeloid and lymphoid
development (Scott et al., 1994). Although PU.1 is expressed in both lineages, it specifies B-cell
and macrophage fates in a concentration-dependent manner (DeKoter and Singh, 2000). Lineage-specifying TFs exert their effects in two main ways. First, they regulate the expression of genes
encoding other TFs (Laslo et al., 2008), forming transcriptional networks. Second, hematopoietic
TFs regulate the expression of cytokine receptors (DeKoter et al., 2002; Zhang et al., 1997),
allowing progenitor cells to respond to extracellular signals in order to escape cell death,
enter/exit the cell cycle, or move to the next level of differentiation (Bertolino et al., 2005 and
references therein).
We considered three genes, CCAAT/enhancer binding protein, alpha (Cebpa), early growth
response 1 (Egr1), and early growth response 2 (Egr2), which participate in a GRN directing
macrophage-neutrophil cell-fate choice. Cebpa-/- mutant mice lack mature neutrophils and
multipotential progenitors do not express granulocyte colony stimulating factor receptor (Zhang
et al., 1997). PU.1 and C/EBPα promote the macrophage and neutrophil fates by upregulating the
antagonists of the alternative cell fate, Egr1/Egr2 and growth factor independent 1 (Gfi1)
respectively (Dahl et al., 2003; Laslo et al., 2006). Egr2 and Gfi1 also repress each other, forming
a mutually antagonistic GRN. This GRN has been suggested to function as a bistable switch that
selects a macrophage state at high PU.1 levels and neutrophils at high C/EBPα levels (Laslo et al.,
2006). Although the model above treats PU.1 and C/EBPα as autonomous inputs, it is clear that
their own regulation is not independent of the cell fate decision. For example, PU.1 positively
regulates its own expression indirectly, by promoting longer cell cycles causing increased
accumulation of its protein product (Kueh et al., 2013), and directly, by binding two distal CRMs
(Leddin et al., 2011; Li et al., 2001). Cebpa is also known to be regulated by C/EBPα and other
C/EBP family members (Legraverend et al., 1993), which bind to a 350bp promoter region
upstream of the transcription start site (TSS). An enhancer located 37kb downstream of the
Cebpa gene has recently been identified (Guo et al., 2014; Guo et al., 2012). It is activated by
several TFs, including PU.1, RUNX1, and C/EBPα (Cooper et al., 2015). These results hint that
the GRN guiding myeloid differentiation is yet to be fully explored. In an effort to uncover new
regulatory links participating in the macrophage-neutrophil decision, we undertook a systematic
cis-regulatory dissection of the Cebpa, Egr1, and Egr2 loci.
We identified and analyzed a total of 46 putative CRMs, which were assayed in the Sfpi1-/- PU.1-inducible estrogen receptor (PUER) cell line (Walsh et al., 2002). PUER cells are blocked at a
progenitor state and can be differentiated into either macrophage- or neutrophil-like cells by
inducing the translocation of PU.1 (PUER) protein into the nucleus with 4-hydroxy-tamoxifen
(OHT) (Dahl et al., 2003; Laslo et al., 2006; Walsh et al., 2002). We generated quantitative
Luciferase reporter activity data in uninduced cells with the IL-3 cytokine (progenitor stage),
OHT-induced cells with IL-3 (early macrophage), and OHT-induced cells with G-CSF cytokine
(early granulocyte). These assays identified several CRMs that enhanced or diminished the
activity of the proximal promoter, as well as apparently inactive sequences. The transcriptional
output data were matched with TF concentration input data from a genome-wide gene expression microarray dataset (Laslo et al., 2006) acquired in the same conditions. These data and the model
were used to reverse engineer the cis regulation of Cebpa, Egr1, and Egr2. We evaluated the
regulation of these CRMs by 15 candidate TFs in parallel and constructed 2^15 = 32,768
alternative models to test functional roles. Predicted TFs were validated against prior evidence
and ChIP datasets deposited in NCBI Gene Expression Omnibus.
Our analysis shows that Cebpa has a surprisingly complex regulatory logic, integrating inputs
from multiple activators and repressors. We found that the Cebpa proximal promoter and enhancing
CRMs are activated primarily by TFs expressed in the myeloid lineage—C/EBP family members,
PU.1, and Egr1—implying that, in addition to upstream TFs, the gene is regulated by its own
targets in a positive feedback loop topology. In contrast, Cebpa is repressed primarily by TFs
expressed in other hematopoietic lineages, suggesting that cross-lineage antagonism is
widespread and not limited to pair-wise interactions modeled in bistable switch models (Huang et
al., 2007; Laslo et al., 2006). This study extends the utility of transcriptional models beyond
systems where the TFs and their functional roles are already known and demonstrates the
feasibility of reverse engineering cis-regulatory logic on a larger scale.
Materials and Methods
Cell Culture
We utilized Sfpi1-/- cells expressing conditionally activable PU.1 protein (PUER) that can be
differentiated into macrophages or neutrophils by PU.1 activation (Dahl et al., 2003; Laslo et al.,
2006; Walsh et al., 2002). PUER cells were routinely maintained in complete Iscove’s modified
Dulbecco’s medium (IMDM) containing 5 ng/ml IL-3. PUER cells were differentiated into
macrophages by adding 100nM 4-hydroxy-tamoxifen (OHT). PUER cells were differentiated into
neutrophils by adding OHT in the presence of 10 ng/ml Granulocyte Colony Stimulating Factor
(G-CSF).
Identification of CRMs
We downloaded pairwise alignments, produced by the blastz tool
(http://www.bx.psu.edu/miller_lab/), of the mm9 (mouse) and canFam2 (dog) genomes from the
UCSC genome browser (http://genome.ucsc.edu). We computed the mean sequence identity in
the 101bp surrounding each nucleotide position. A threshold of 0.7 was applied to the mean
sequence identity to delineate conserved regions. Regions containing at least one conserved
region were identified as putative CRMs. The sequences of the assayed CRMs are provided in
Supplementary Text S2 in FASTA format.
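The windowed-identity computation described above can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the function names, the 0/1 match encoding of the pairwise alignment, and the truncation of the window at the sequence ends are assumptions made for the example.

```python
def windowed_identity(match, window=101):
    """Mean identity in a window centered on each aligned position.

    match: list of 0/1 flags (1 = identical base in both genomes); this
    encoding is an assumption of the sketch. Windows are truncated at the
    ends of the sequence, which is also an assumption.
    """
    half = window // 2
    n = len(match)
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)  # running sum for O(1) window totals
    identity = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        identity.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return identity


def conserved_regions(identity, threshold=0.7):
    """Return half-open (start, end) intervals where identity exceeds threshold."""
    regions = []
    start = None
    for i, value in enumerate(identity):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(identity)))
    return regions
```

With the default window of 101 positions, each value is the mean over the nucleotide itself and 50 positions on either side, which is one way to read "the 101bp surrounding each nucleotide position".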
Reporter design
Putative CRMs were cloned into a pGL3 Luciferase reporter vector (Promega). The proximal
promoter was introduced in the multiple cloning site (MCS) of pGL3. The distal CRMs were
inserted in a SalI site downstream of the SV40 late poly(A) signal. The intervening sequence was
2828bp in length and consisted of pGL3 backbone including the beta-lactamase gene (see Text
S2 for sequence). Since the lengths of the promoter regions were different for Cebpa, Egr1, and
Egr2, the distance between the distal elements and the TSS was different for each gene. The 3’
ends of the distal CRMs were located 4,022bp, 3,241bp, and 3,352bp from the TSSs of Cebpa,
Egr1, and Egr2 respectively.
Reporter assays
PUER cells were plated and cultured overnight. Cells were then transiently transfected
with the reporter vector and Renilla reporter vector (Promega) using the Fugene transfection
reagent (Roche) according to the manufacturer’s instructions. After 24hrs, the cells were washed, lysed,
and the levels of both firefly and Renilla luciferase activities were measured using a dual
luciferase activity kit (Promega). Transfections were performed in duplicate in all conditions. For
Cebpa CRMs, assays were performed in uninduced PUER cells in IL-3 (progenitor), in the
presence of IL-3 and OHT (early macrophage), and in the presence of G-CSF and OHT (early
granulocyte). Egr1 and Egr2 CRMs were only assayed in progenitor and macrophage conditions.
The firefly luciferase activity was normalized to Renilla activity to control for sample-to-sample
transfection efficiency variation.
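As a small illustration of the normalization described above, the following Python sketch divides firefly by Renilla readings and compares a CRM-bearing construct with the proximal-only construct. The function names and the averaging of duplicate transfections are assumptions of this example, not details taken from the protocol.

```python
def normalized_activity(firefly, renilla):
    """Firefly/Renilla ratio for each replicate transfection."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(crm_ratios, proximal_ratios):
    """Mean normalized activity of a CRM construct relative to the
    proximal-only construct (averaging replicates is an assumption)."""
    crm_mean = sum(crm_ratios) / len(crm_ratios)
    proximal_mean = sum(proximal_ratios) / len(proximal_ratios)
    return crm_mean / proximal_mean
```

Under this convention, a fold change above 1 would indicate enhancing activity and a value below 1 would indicate silencing activity relative to the proximal promoter alone.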
Sequence-based model of transcription
A model is constructed by identifying candidate TFs and specifying three inputs: 1) DNA
sequence of reporter constructs, 2) estimates of the concentrations of the included TFs, and 3)
PWMs of the included TFs.
Identification of candidate TFs. We used the “Match” tool of the transcription factor database
TRANSFAC (Matys et al., 2006) to search the CRM sequences for TFBS of TFs known to
regulate immune-specific genes. There were 62 immune-specific factors, listed in Table S1,
having at least one predicted TFBS in the sequences. Based on a literature search, we further
subdivided these TFs into those implicated in myeloid-specific gene regulation and those not yet
implicated—which we refer to as “non-myeloid” for convenience (Table S1). To measure
differential expression, we computed the standard deviation (Fig. S1) of gene expression in the
uninduced, IL3+OHT, and GCSF+OHT conditions in the Laslo et al. dataset (Laslo et al., 2006).
TF concentrations. We estimated TF concentrations using microarray gene expression
measurements from PUER cells in uninduced, IL3+OHT, and GCSF+OHT conditions reported
by Laslo et al. (Laslo et al., 2006) (Fig. S2). For genes with multiple probes, we chose the probe
with the highest mean intensity over the three cell types to represent the gene’s expression level in
the model (Table S3).
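The probe-selection rule for genes with multiple probes can be written as a short Python sketch. The dictionary layout and the probe identifiers in the usage note are hypothetical.

```python
def pick_probe(probes):
    """Return the probe ID with the highest mean intensity.

    probes: dict mapping a probe ID to its intensities in the three
    conditions (uninduced, IL3+OHT, GCSF+OHT); the layout is assumed.
    """
    return max(probes, key=lambda p: sum(probes[p]) / len(probes[p]))
```

For example, `pick_probe({"probe_1": [1.0, 2.0, 3.0], "probe_2": [4.0, 4.0, 4.0]})` selects `"probe_2"`, whose mean intensity is higher.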
DNA sequences. In order to accurately represent the distances between binding sites and the TSS,
we modeled the CRM, vector, and proximal promoter sequences as they appear in the reporter
(Fig. 3D). Binding sites detected in the vector sequence were not included in the computation as
they are presumed to be nonfunctional.
PWMs. We obtained PWMs from TRANSFAC (Matys et al., 2006) (http://www.biobase-international.com/product/transcription-factor-binding-sites) and JASPAR (Mathelier et al., 2014)
(http://jaspar.genereg.net) databases. We evaluated a total of 88 factor-specific and pan-family
PWMs for the 19 TFs modeled in this study. While choosing PWMs, we considered two issues
affecting their quality. The first is that PWMs derived from a small number of bound sequences
can be biased toward the base composition of the founder sites as well as favor high affinity sites.
The second is that PWMs derived by pooling sites bound by multiple members of a TF family
may be non-specific and exhibit a high false positive rate. When considering a large number of
PWMs and TFs, it is not practical to determine the provenance of each PWM individually. We
developed an empirical quality criterion to identify high quality PWMs. The first PWM property
included in the criterion is the affinity of the highest-scoring binding site in the CRMs relative to
the consensus (highest affinity) site. This is evaluated as the affinity factor
$1/(S_{\mathrm{cons}} - S_{\mathrm{max}} + 1)$, where $S_{\mathrm{cons}}$ and $S_{\mathrm{max}}$ are the scores of the consensus and the highest-scoring site amongst the CRMs
respectively. This factor increases in value with $S_{\mathrm{max}}$, reaching a maximum value of 1 if the
CRMs contain one or more consensus sites. Low values imply that the PWM only detects weak
sites in the CRMs and indicate a potential high-affinity or founder-sequence bias in the PWM. To
address the second quality issue, the inability to discriminate between CRMs, we computed the
difference in scores between the top-scoring sites of the 1st and 5th CRMs, denoted by $\Delta$. Low
values of $\Delta$ imply that the PWM has lower specificity and a higher false positive rate. We
multiplied the affinity factor and $\Delta$ to obtain the quality criterion
$Q = \Delta / (S_{\mathrm{cons}} - S_{\mathrm{max}} + 1)$. PWMs with
higher values of Q are able to detect stronger sites and better discriminate between the CRMs.
With two exceptions, we represented TFs with the PWMs having the highest quality criterion
value. The exceptions were C/EBPα and C/EBPδ, for which a TRANSFAC pan-family PWM,
CEBP_Q2, had the highest score. To discriminate between individual members of the family, we
chose instead the highest-scoring factor-specific PWM. For GATA, the pan-family PWM
GATA_Q6 had a very low quality score and we utilized the highest-quality factor-specific PWM,
GATA3_02. The results are robust with respect to different GATA family PWMs (Fig. S12; see
below). The PWMs chosen are listed in Table S2. The CRM sequences were scored with PATSER
(Hertz et al., 1990). We chose thresholds low enough so that weak sites would be included in the
model; sites having a binding affinity of at least 0.07 of the consensus site were included (Table
S2).
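The PWM quality criterion Q can be computed with a few lines of Python. This sketch assumes that "the 1st and 5th CRMs" refers to CRMs ranked by the score of their top-scoring site; the function name and the input layout are likewise assumptions of the example.

```python
def pwm_quality(s_cons, top_scores):
    """Quality criterion Q = delta / (s_cons - s_max + 1).

    s_cons: score of the consensus (highest affinity) site for the PWM.
    top_scores: the top site score found in each CRM (at least five CRMs).
    Ranking CRMs by their top score is an assumed reading of the text.
    """
    ranked = sorted(top_scores, reverse=True)
    s_max = ranked[0]
    affinity_factor = 1.0 / (s_cons - s_max + 1.0)
    delta = ranked[0] - ranked[4]  # 1st versus 5th ranked CRM
    return affinity_factor * delta
```

A PWM whose best site in the CRMs matches the consensus has an affinity factor of 1, so Q then reduces to the score gap between the 1st and 5th ranked CRMs.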
The model results are robust with respect to the choice of PWM. We replaced 1) the GATA3_02
PWM with a GATA1, a GATA2, or a pan-family GATA PWM and 2) the C/EBPα and C/EBPδ factor-specific PWMs (Table S2) with the CEBP_Q2 pan-family PWM in model 81762. The PWMs
were substituted one at a time and the modified models were fit to data while representing the
associated TF, GATA, C/EBPδ, or C/EBPα, as an activator or a repressor. In all cases the same
role was inferred as in model 81762, and the models utilizing alternative PWMs agreed very well with
model 81762 (Fig. S12; $r^2$ ≥ 0.87). The scores were however slightly higher in the modified models
(Fig. S12), validating our PWM quality control.
Global nonlinear optimization
We generated 2^15 models encapsulating all the possible combinations of the regulatory roles of the
TFs (Fig. 1). We determined the free parameters of each model using Lam Simulated Annealing
(LSA) (Reinitz and Sharp, 1995) as described previously (Janssens et al., 2006) in 5 replicates.
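The exhaustive enumeration of regulatory roles can be illustrated with a short Python generator. The TF names are placeholders, and the order in which assignments are produced is arbitrary; it is not intended to reproduce the model numbering used in this study.

```python
from itertools import product


def enumerate_role_assignments(tfs):
    """Yield every assignment of activator (+1) or repressor (-1) roles
    to the given TFs, 2**len(tfs) assignments in total."""
    for roles in product((1, -1), repeat=len(tfs)):
        yield dict(zip(tfs, roles))
```

For 15 TFs the generator yields 2^15 = 32,768 assignments, each of which would define one model to be fit to the CRM activity data.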
We tested the dependence of the quality of fit on the number of TFs included in the model by
removing TFs in order of increasing regulatory constraint, that is, from right to left in Figure 5C.
Upon removing the 4 least constrained TFs, the lowest score achieved increases, but is close to
the range of the 20 lowest scoring 15-TF models (Fig. S13). The lowest scores progressively
increase as the number of TFs is reduced. 7- and 5-TF models have scores comparable to those
achieved with randomized data (Fig. S3). These simulations suggest that removing the
unconstrained TFs has a minor effect on the quality of the fit and that a core group of 11 well-constrained TFs is essential to achieve the lowest obtained scores.
Model selection
We identified the 20 lowest scoring models. See Table S4 for scores of each replicate. In order to
identify a representative model, we clustered the models hierarchically using the dissimilarity of
regulatory roles as the pairwise distance metric (Fig. S4). Lower dissimilarity scores imply
greater likeness of regulatory schemes between pairs of models. Representing the regulatory roles
assigned to each TF as a binary vector, with 1 for activation and −1 for repression, we computed
the weighted Euclidean distance between each pair of models. The weights were $|f_i^{\mathrm{act}} - 0.5|$,
where $f_i^{\mathrm{act}}$ is the fraction of models, among the 20, that assigned an activating role to TF $i$. This
ensured that models assigning the same role to well-constrained activators clustered together. We
hierarchically clustered the models using the LINKAGE function of MATLAB (v. 8.0.0.783)
using the shortest distance algorithm. For the first round of reverse-engineering with only
myeloid implicated factors, there are 6 clusters at a dissimilarity cutoff of 0.4, of which all but
one are sparsely populated (Fig. S4A). The biggest cluster has 8 models with highly similar
regulatory schemes (Fig. 5B; top 8 models), allowing us to pick one, model 12058, for further
analysis (see Table S5 for parameter values). In the second round of reverse engineering with
additional non-myeloid factors, the 20 lowest scoring models form 5 clusters at a cutoff of 0.4,
the largest of which consists of 8 models (Fig. S4B). All eight members have highly similar
regulatory schemes (Fig. 5C; top 8 models), and we chose a member, model 81762, for further
analysis.
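The clustering step can be approximated in pure Python, as sketched below. Several choices here are assumptions rather than details from the Methods: the weights are applied to the squared differences inside the square root, the distances are not rescaled (so cutoff values are not comparable to the 0.4 used above), and single-linkage clusters at a cutoff are obtained as connected components of the graph linking models closer than the cutoff, which is equivalent to cutting a shortest-distance dendrogram at that height.

```python
import math


def activation_fractions(models):
    """Fraction of models assigning an activating role (+1) to each TF."""
    n = len(models)
    return [sum(1 for m in models if m[i] == 1) / n
            for i in range(len(models[0]))]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two role vectors (+1/-1)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def single_linkage_clusters(models, cutoff):
    """Group models whose single-linkage distance is within the cutoff."""
    weights = [abs(f - 0.5) for f in activation_fractions(models)]
    parent = list(range(len(models)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            if weighted_distance(models[i], models[j], weights) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(models)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

One consequence of the weighting is that a TF assigned an activating role by exactly half of the models receives a weight of zero and therefore does not influence the clustering.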
Results
Sequence-based model of transcription
Our model (Fig. 2) is derived from a sequence-based model of transcription (Reinitz et al., 2003)
that has been demonstrated to predict gene regulation during Drosophila segmentation
quantitatively (Janssens et al., 2006; Kim et al., 2013). A detailed description of the model and its
equations is provided in Supplementary Text S1.
Given the DNA sequence of a CRM and the position weight matrices (PWMs) and concentrations
of the TFs believed to regulate the CRM, our model computes the rate of transcription in several
steps. First, the model identifies binding sites by scoring the sequence with PWMs and retaining
the sites above a pre-specified threshold score (Fig. 2A; see Methods). Second, the model
computes the fractional occupancy of each site by taking an average over the ensemble of site-occupancy configurations, whose statistical weights depend on the concentrations of the TFs and
the binding affinities of the sites (Fig. 2B). Some repressors act by reducing the activity of a
specific activator in a position-dependent manner (Arnosti et al., 1996a; Ogbourne and Antalis,
1998; Stopka et al., 2005), a phenomenon referred to as quenching. In the third step of the
calculation, quenching is implemented by reducing the occupancy of activators bound near
repressors (Fig. 2C). In the fourth step, the fractional occupancies of the bound activators are
summed, weighted by their activation efficiencies, to determine the strength of interaction of the
CRM with the core promoter (Fig. 2D). In contrast to quenching, some TFs act over large
distances to directly repress the activity of the proximal promoter by interfering with promoter-enhancer interactions or recruiting chromatin-remodeling enzymes to establish large repressive
chromatin domains (Harmston and Lenhard, 2013). We represent their effects by reducing the
interaction strength of the CRM (Fig. 2E) to determine the net interaction strength. In the final
step, transcription initiation is modeled as a diffusion-limited enzymatic reaction, in which the
activation energy barrier is lowered in proportion to the net interaction strength computed
previously (Fig. 2F). In summary, the model takes TF concentrations as input and simulates TF-TF interaction by the mechanisms of 1) competition for binding sites, 2) quenching, 3) long-range
repression, and 4) synergistic activation to determine the transcriptional output of a CRM.
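To make the sequence of steps more concrete, the following Python sketch gives a deliberately simplified toy version of the occupancy, quenching, and activation-summing steps. It is not the model of Supplementary Text S1: the functional forms are assumptions chosen for illustration, each site is treated as isolated (so competition for overlapping sites is ignored), and long-range repression and the final rate calculation are omitted.

```python
def occupancy(k_conc):
    """Fractional occupancy of one isolated site.

    k_conc: product of site affinity and TF concentration (assumed form).
    """
    return k_conc / (1.0 + k_conc)


def quench(f_act, repressors):
    """Reduce an activator's occupancy for each nearby bound repressor.

    repressors: list of (repressor occupancy, quenching efficiency,
    distance weight), each between 0 and 1. The multiplicative form is
    an assumption of this sketch.
    """
    for f_rep, efficiency, distance_weight in repressors:
        f_act *= 1.0 - efficiency * distance_weight * f_rep
    return f_act


def interaction_strength(activators):
    """Sum of quenched activator occupancies weighted by activation efficiency.

    activators: list of (quenched occupancy, activation efficiency).
    """
    return sum(f * e for f, e in activators)
```

In this toy version, a fully occupied repressor site with efficiency 0.5 at full distance weight halves the effective occupancy of a neighboring activator, which in turn lowers the CRM's interaction strength with the promoter.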
Identification of putative CRMs
We identified putative CRMs as non-coding sequences having a high degree of evolutionary
conservation, a commonly used strategy for de novo CRM prediction (Hardison and Taylor, 2012;
Landry et al., 2009; Wilson et al., 2010b). We computed the average sequence identity over a
100bp window between the mouse and dog genomes (Fig. 3A-C). Highly conserved regions were
identified as those having greater than 70% identity. Applying this threshold to the dog-mouse
sequence identity yielded putative CRMs varying in length between 400bp and 1500bp, which
were long enough to include potential quenching or other TF-TF interactions but short enough
that cis-regulatory dissection was still practical (Fig. 3A-C). We tested a total of 46 CRMs.
Below, we refer to CRMs by the gene name
followed by the CRM number in parentheses. For example, CRM 7 of Cebpa is denoted as
Cebpa(7).
Reporter constructs and activity data
Design of reporter constructs. Sequences upstream of the core promoter usually contain a
proximal promoter, which binds sequence-specific TFs and acts together with distal CRMs to
regulate gene expression (Bertolino and Singh, 2002; Carey et al., 2008). The reporter vectors
were designed to take into account potential positive and/or negative interactions of distal CRMs
with their cognate proximal promoters. We identified putative proximal promoters as
evolutionarily conserved sequences upstream of the TSS of the endogenous gene (Fig. 3A-C;
CRMs numbered 0) and placed them immediately upstream of the Luciferase gene in the vector
(Fig. 3D; pink boxes). Since most CRMs are distant from the TSS, placing them near the core or
proximal promoters in reporter assays—a common practice—can introduce artificial regulatory
interactions (Chopra et al., 2012; Gray and Levine, 1996). Instead, we placed the distal putative
CRMs of each gene ~3kb upstream (see Methods) of the cognate proximal regulatory sequence
(Fig. 3D; blue boxes). This construct design allowed us to detect both long-range enhancing and
silencing activities of CRMs by comparing CRM-bearing vectors with the proximal-only vectors
(Fig. 3D; top row). The location of the CRMs implied that any modulation of the activity relative
to the proximal-only vector occurred over long distances (see Methods).
Activity data reveal the regulatory complexity of Cebpa CRMs. We measured the activity of the
CRMs in three conditions—1) PUER cells in IL-3 uninduced with OHT (uninduced), 2) 24 hours
after induction by OHT in IL-3 (IL3+OHT), and 3) 24 hours after OHT induction in G-CSF
(GCSF+OHT)—which resemble macrophage-neutrophil progenitors, early macrophages, and
early neutrophils respectively. The activity of Cebpa CRMs (Fig. 4A) varies extensively by CRM
(up to 4.5x) and by cell type (up to 15x). The patterns of differential expression are
CRM-dependent. For example, Cebpa(7) has greater activity in the uninduced condition whereas CRMs
Cebpa(16) and Cebpa(18) have the greatest activity in the IL3+OHT condition. Three patterns of
cis-regulation are discernible. A few putative CRMs, such as Cebpa(5), Cebpa(13), and
Cebpa(19), do not change activity relative to the proximal-only construct (Cebpa(0)) and hence
appear to be inert in the cell types we consider. We find four CRMs, Cebpa(7), Cebpa(14),
Cebpa(16), and Cebpa(18), which act as enhancers by increasing activity, up to 4.5x, relative to
Cebpa(0). Note that Cebpa(18) roughly corresponds to a recently described +37kb enhancer (Guo
et al., 2014; Guo et al., 2012), being in the same genomic location but ~200bp longer. We also
find many CRMs, such as Cebpa(2), Cebpa(6), Cebpa(9), Cebpa(10), Cebpa(11), Cebpa(15),
Cebpa(20), Cebpa(23), and Cebpa(24), which diminish activity relative to Cebpa(0) by a factor
of up to 4.5x, and thereby act as silencers. Although a few CRMs, such as Cebpa(22), activate in
one cell type while repressing in another, the enhancers or silencers listed above act consistently
in all three cell types.
Egr1 and Egr2 CRMs were assayed only in uninduced and IL3+OHT conditions. In contrast to
the rich activity patterns exhibited by Cebpa CRMs, Egr1/2 putative CRMs behave quite
uniformly. Egr1 has only one enhancer, Egr1(2), and two silencers, Egr1(5) and Egr1(9) (Fig.
4B). Egr2 has no silencers; most CRMs have enhancing activity in uninduced cells but have no
effect in the IL3+OHT condition (Fig. 4C). Notably, neither gene showed CRM-dependent differential
activity as was observed for Cebpa. These differences between Cebpa and Egr1/2 in the
complexity of CRM behavior suggest that the genes have distinct regulatory architectures.
Reverse engineering the putative CRMs of Cebpa, Egr1, and Egr2
Here we describe the general approach to reverse engineering in the context of its application to
Cebpa, Egr1, and Egr2. The main steps (see schematic in Fig. 1) are as follows. First, we identify
candidate TFs to include in the model. Second, we construct a family of models encompassing all
the possible combinations of regulatory roles. Third, we use global nonlinear optimization to infer
the free parameters of the models by minimizing the score (the sum of squared differences
between model and data) for each model. The score is also used to pick the models that best
explain the observed patterns of CRM activity for further analysis. Fourth, the chosen models are
analyzed further to infer the cis-regulatory logic of each CRM in the model.
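The scoring and model-selection steps can be sketched as follows. This is a minimal illustration, assuming only that the score is the sum of squared differences between model output and data, as stated above; the function names `score` and `lowest_scoring` are hypothetical and are not taken from the authors' software.

```python
def score(model_output, data):
    """Residual sum of squares between model output and measured activity."""
    if len(model_output) != len(data):
        raise ValueError("model output and data must have the same length")
    return sum((m - d) ** 2 for m, d in zip(model_output, data))


def lowest_scoring(scores_by_model, k):
    """Return the k model identifiers with the smallest scores, best first."""
    return sorted(scores_by_model, key=scores_by_model.get)[:k]
```

In this sketch, `scores_by_model` is assumed to be a dictionary mapping a model identifier to its score after optimization.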
Identification of candidate TFs. We took a broad approach to identifying TFs to include in the
model, starting with 62 immune-specific TFs predicted to bind at least one CRM in the
transcription factor database TRANSFAC (Matys et al., 2006) (see Methods). We further
winnowed the candidate TFs by identifying ones expressed differentially in PUER cells (Laslo et
al., 2006), reasoning that they are more likely to explain the regulatory complexity observed in
the activity data (Fig. 4). Surprisingly, we found that TFs previously implicated in myeloid
differentiation had much higher differential expression than non-myeloid TFs (Fig. S1C,D;
Methods). This led us to suspect that differential expression of the myeloid-specific TFs drives
the cell-type specific response of the CRMs. We chose the top 15 differentially expressed
myeloid-specific TFs as candidates for a first attempt at reverse engineering (Fig. S1).
Model inputs. The model takes several inputs to compute CRM activity. The first is DNA
sequence. We incorporated the DNA sequences of all the assayed CRMs, including the inactive
ones, into the model. We expect that inactive CRMs will act as negative controls, constraining the
model to reduce the amount of TFs binding to them and hence reduce the number of falsely
identified TFBS.
The second input, the concentrations of the TFs, was provided by microarray gene expression
measurements from PUER cells (Laslo et al., 2006) in conditions matched to the CRM activity
measurements. The data are shown in Figure S2 for the TFs included in the model, and are in
general agreement with an independent dataset from PUER cells in IL3 conditions (Weigelt et al.,
2009) (Fig. S2B). The third input, the DNA-binding properties of the TFs, was provided by PWMs
from TRANSFAC and JASPAR, which were used to detect TFBS for the candidate TFs (Table
S2; see Methods). The resulting sequence-based model for 46 CRMs contained ~700 binding
sites. The model was formulated in an internally self-consistent manner so that TF properties
were common to all CRMs. This implies that differences in predicted CRM activity arise solely
from differences in DNA sequence.
A family of sequence-based models. In order to infer the regulatory roles of the TFs, we
constructed a family of models that realized all the possible combinations of regulatory roles for
the 15 TFs. Allowing each TF to assume two roles, activation or repression, resulted in 2^15
(32,768) alternative models. Note that each model realization is structurally distinct from all the
others since changing the role of even one TF results in completely different TF-TF and TF-promoter interactions.
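The size of the model family follows directly from two possible roles for each of 15 TFs. The sketch below enumerates the assignments to make the count concrete; the generator `role_assignments` is a hypothetical helper, the TF names are those used in this study, and no claim is made about how the model identifiers used in the paper are numbered.

```python
from itertools import product

# The 15 candidate myeloid TFs considered in the first round of modeling.
TFS = ["C/EBPα", "C/EBPβ", "C/EBPδ", "Egr1", "Egr2", "Ets1", "Fli1", "Fos",
       "Gfi1", "Ikaros", "IRF4", "Jun", "Myb", "Myc", "PU.1"]


def role_assignments(tfs):
    """Yield every mapping of TF -> role, with two possible roles per TF."""
    for roles in product(("activator", "repressor"), repeat=len(tfs)):
        yield dict(zip(tfs, roles))


# Count the assignments without holding them all in memory at once.
n_models = sum(1 for _ in role_assignments(TFS))
```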
Model and parameter inference by nonlinear optimization. We used Lam Simulated Annealing
(Janssens et al., 2006; Kim et al., 2013; Reinitz and Sharp, 1995) to minimize the loss function or
score, computed as the residual sum of squares, for each model realization and inferred the values
of their free parameters in 5 replicates. The median absolute deviation over the replicates for the
20 lowest-scoring models (Table S4) varied between 0.004% and 5% of the median score. A
narrow range of replicate scores indicates that each model is attaining the global minimum; the
termination of replicate fits at different local minima would have led to a broad distribution of
their scores. The optimization problem is not underdetermined, having many more datapoints
than parameters (Supplementary Text S2), and the fits are statistically significant (Fig. S3;
Supplementary Text S2). Lastly, the scores of the family of models range over an order of
magnitude (Fig. 5A), suggesting that they are able to discriminate between different realizations
of TF functional roles.
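The replicate-consistency check described above can be sketched as follows, assuming that the reported quantity is the median absolute deviation of the replicate scores expressed as a percentage of their median. The function name `mad_percent` is hypothetical.

```python
from statistics import median


def mad_percent(replicate_scores):
    """Median absolute deviation of the scores, as a percentage of the median."""
    med = median(replicate_scores)
    mad = median([abs(s - med) for s in replicate_scores])
    return 100.0 * mad / med
```

A small value indicates that the replicate fits terminate at nearly the same score, which is the behavior expected if each fit reaches the same minimum.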
Model analysis. We chose the 20 lowest scoring models (Table S4) to determine how well the
data constrain regulatory roles and to check the per-CRM agreement with data. The regulatory
roles of each TF represented in these models are depicted in Figure 5B. The regulatory roles of 5
TFs, C/EBPδ, Egr1, Gfi1, Myb, and PU.1, are completely constrained, being identical in all 20
models. 6 TFs, C/EBPα, Egr2, Ikaros, IRF4, Jun, and Myc, are well constrained, having the same
role in more than 60% of the models. 4 TFs, C/EBPβ, Fos, Fli1, and Ets1, are poorly constrained.
This implies that not only can we infer TFs likely to be regulating the CRMs, but also eliminate
TFs that are poorly constrained by the data as unlikely to be regulating the CRMs.
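The three categories used above can be expressed as a simple rule on the roles a TF takes across the selected models. This is a sketch, assuming that "completely constrained" means an identical role in every model and "well constrained" means the same role in more than 60% of the models; the function name `constraint_level` is hypothetical.

```python
from collections import Counter


def constraint_level(roles):
    """Classify a TF by the share of models agreeing on its most common role."""
    top_count = Counter(roles).most_common(1)[0][1]
    fraction = top_count / len(roles)
    if fraction == 1.0:
        return "completely constrained"
    if fraction > 0.6:
        return "well constrained"
    return "poorly constrained"
```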
We further inspected the quality of the fit by analyzing a representative model in detail. We
clustered the models on the similarity of their regulatory schemes (Fig. S4A; see Methods), and
chose one, model 12058, from the largest cluster for further analysis (see Table S5 for parameter
values). The output of model 12058 is correlated well with the activity data (r^2=0.78; Pearson’s
correlation coefficient; Fig. 5D), implying that the model recapitulates cell-type- and CRM-specific changes and the dynamic range of the activity data. A direct comparison of the data and
model output is shown in Figure 6A-C. For Cebpa CRMs, with a few exceptions noted below, the
model correctly reproduces the cell type- and CRM-specific upregulation of all the enhancers.
The model shows greater upregulation by Cebpa(7) and Cebpa(14) in uninduced than IL3+OHT
conditions, and reverses the pattern for Cebpa(16) and Cebpa(18) in accordance with the data.
The lack of up- or down-regulation by inactive elements and down-regulation by several, though
not all, silencers is also reproduced. For Egr1, the model reproduces the pattern of upregulation
and downregulation observed in data, although in several cases the amounts are different. For
Egr2, the model reproduces the overall low level of activity of its CRMs but incorrectly shows
similar levels of activity in uninduced and IL3+OHT conditions instead of the relatively lower
levels observed in the latter. To summarize, the model reproduces most of the cell type- and
CRM-specific features of the data and the levels of a majority of the individual CRMs.
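For reference, the agreement statistic quoted above can be computed as the square of Pearson's correlation coefficient between model output and measured activity. The sketch below is a plain implementation of the textbook formula; the function name `pearson_r` is hypothetical and the code is not taken from the authors' analysis pipeline.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Squaring the returned value gives r^2, the quantity reported for the model fits.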
Model 12058 deviates from data in several CRMs and conditions; one class of deviations
indicated that the model lacked repressors. There are isolated instances of the model predicting
lower than observed activity, such as Cebpa(18), Egr1(14), and Egr2(10), but the reverse is more
common. Predicting higher than observed levels is particularly prominent in the silencers. The
model predicts overexpression for 4 of the 9 Cebpa silencers (Cebpa(6), Cebpa(20),
Cebpa(23), and Cebpa(24)) and for three other CRMs (Cebpa(8), Cebpa(17), and Egr2(7)). The
inability to correctly repress activity suggests that the model lacks repressors that presumably
bind the silencer CRMs. We had limited the model to myeloid-specific TFs initially since non-myeloid TFs are not differentially expressed in these cell types. Uniformly expressed TFs can,
however, provide CRM-specific but cell-type independent repression. Since cross-lineage
antagonism is quite common in hematopoiesis (Laslo et al., 2008), it is possible that some of the
excluded non-myeloid TFs might bind silencer CRMs and repress Cebpa in the lymphoid or red
blood-cell lineages.
To rigorously evaluate this possibility and improve the prediction of silencer CRMs in the model,
we included a limited number of non-myeloid factors in the model. Increasing the number of TFs
adds additional parameters, that is, additional degrees of freedom to the optimization problem
(Supplementary Text S1). Including more TFs would therefore make it difficult to discern
whether the improvement in the fit results from the additional degrees of freedom or from novel
regulation introduced by the new TFs. Analysis of the low-scoring models indicated, however, that
some TFs were dispensable since their roles were poorly constrained (Fig. 5B; Fig. S4). We
exploited this to include additional non-myeloid TFs without introducing additional parameters
by identifying and eliminating TFs having the least activity in model 12058. We computed the
maximum activity of each TF over all CRMs (Fig. S5) and removed the two activators and the
two repressors having the smallest maximum activity, C/EBPβ, Fos, IRF4, and Egr2.
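The selection rule described above can be sketched as follows, assuming that each TF has a single inferred role and a single maximum activity over all CRMs. The function name `tfs_to_drop`, the placeholder TF names, and the numerical values in the example are all hypothetical and are not taken from Fig. S5.

```python
def tfs_to_drop(max_activity, roles, n_per_role=2):
    """Pick the n_per_role activators and repressors with the smallest maximum activity."""
    dropped = []
    for role in ("activator", "repressor"):
        candidates = [tf for tf in max_activity if roles[tf] == role]
        candidates.sort(key=max_activity.get)
        dropped.extend(candidates[:n_per_role])
    return dropped


# Hypothetical inputs, for illustration only.
example_activity = {"A1": 0.9, "A2": 0.1, "A3": 0.2, "R1": 0.05, "R2": 0.7, "R3": 0.3}
example_roles = {"A1": "activator", "A2": "activator", "A3": "activator",
                 "R1": "repressor", "R2": "repressor", "R3": "repressor"}
example_dropped = tfs_to_drop(example_activity, example_roles)
```

Removing an equal number of TFs before adding new ones keeps the number of free parameters unchanged, which is the purpose of the step.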
Next, we included four lineage-specifying TFs from non-myeloid lineages, E2A, Elf1, EBF1, and
GATA(s), that had binding sites in the silencer CRMs. EBF1 is involved in specifying the B-cell
lineage (Laslo et al., 2008; Pongubala et al., 2008), Elf1 is required for the differentiation of the
natural killer (NK)- and NKT-cells (Choi et al., 2011), whereas E2A is required for both B- and T-cell development (Bain et al., 1994; Rothenberg, 2014). GATA1, GATA2, and GATA3 have very
similar binding site preferences in vitro (Ko and Engel, 1993; Merika and Orkin, 1993) and bind a
large number of overlapping sites in vivo (Doré et al., 2012; May et al., 2013). Due to this
degeneracy in binding and uniform expression in the PUER cells (Fig. S2), we expected that the
model would not be able to distinguish amongst the three and hence represented them as a
“lumped” GATA regulator. Any conclusions we draw regarding GATA might pertain to influences
from any of the erythroid (GATA1), MEP or mast cell (GATA2), or T-cell (GATA3) lineages. The
revised scheme of reverse engineering was executed in a manner identical to the previous round.
After optimizing the 2^15 models, we observed a dramatic reduction in the scores of the lowest-scoring models, from 84,486 in the previous round to 35,072 here (Fig. 5A; Table S4). The
regulatory roles of 8 TFs are completely constrained, compared to 5 in the models lacking non-myeloid factors (Fig. 5B,C). Notably, three of the newly added non-myeloid factors, EBF1,
GATA, and Elf1, were well constrained as repressors, supporting the hypothesis that non-myeloid
TFs repress Cebpa CRMs.
Clustering on similarity of regulatory roles (Fig. S4B; see Methods) allowed us to choose the
lowest-scoring model, 81762, as a representative for further analysis (Tables S4 and S5). The
correlation between model output and data is further improved (r^2=0.91; Fig. 5E), implying a
better recapitulation of the changes in activity by cell type and CRM. This overall improvement is
achieved in part by better repression of nearly all the under-repressed CRMs, including silencers.
A few mispredictions remain uncorrected, such as Egr2(10), while four CRMs, Cebpa(8),
Cebpa(21), Egr1(5), and Egr2(4) are over-repressed. The overall improvement in the agreement
between model and data suggests that the non-myeloid TFs help explain the CRM-specific
patterns of expression. It is notable in this context that model 81762, the lowest-scoring model,
assigns repressive roles to all the non-myeloid factors.
The cis-regulatory logic of Cebpa, Egr1, and Egr2
Here we use the model to infer the TFs, binding sites, and interactions that generate the cell type- and CRM-specific pattern of activity observed for the three genes. We do this by inspecting the
intermediate steps in the calculation of transcription rate and decomposing it into contributions
from individual binding sites. Since each binding site is associated with a particular TF, we can
infer the cognate TF and its regulatory role as well. Activators acting through their coactivators
catalyze the recruitment of the transcription holoenzyme complex to the promoter to increase the
rate of transcription. This is represented in the model by reducing the activation energy barrier by
an amount ΔΔA, which depends on the net interaction strength (Fig. 2F and Supplementary Text
S1). The net interaction strength, in turn, depends on the occupancies and activation efficiencies
of the bound activators and can be decomposed into contributions from individual activator
binding sites (Fig. 2D). Hence, we can determine individual contributions to ΔΔA by plotting
each term of the net interaction strength separately (Fig. 7A-C). Long-range repressors act by
interfering in the recruitment of the holoenzyme complex, a phenomenon modeled by reducing
the net interaction strength (Fig. 2E) in the model. To determine repressive activity, we plot the
factor by which each bound repressor reduces the interaction strength (Fig. 7D-F). We found
negligible and inconsistent contributions from quenching and they were not considered in our
regulatory analysis. This is consistent with our reporter design (Fig. 3D) and activity data (Fig. 4),
since the reduction of activity in reporters carrying silencers in addition to the promoters could
not have occurred via quenching.
First, we illustrate the process with the examples of the Cebpa proximal promoter, an enhancer,
and a silencer. Following the illustrative examples, we describe the inferred cis-regulatory logic
and compare to available evidence.
Figure 7A shows the contributions of the activator binding sites in the model for the proximal
promoter of Cebpa, Cebpa(0). The model identifies a total of 6 binding sites for 4 TFs. The
largest contribution in all three conditions is from C/EBPδ, with relatively smaller contributions
from Gfi1 and Egr1. Although Myc has a binding site, its contribution to activation is negligible.
Plotting the strength of repression reveals two repressor sites, for Jun and C/EBPα, with weak
activity (Fig. 7D). The patterns of expression (Fig. S2) of the TF inputs, the main activator
C/EBPδ and the repressor Jun, explain the pattern of the output, promoter activity, in
combination. The promoter is strongly downregulated in GCSF+OHT (Fig. 4A). This is a
combined effect of C/EBPδ downregulation (Fig. 7A) and Jun upregulation (Fig. 7D), whereas
the promoter activity is unchanged in IL3+OHT due to compensation of Jun repression by
C/EBPδ upregulation.
The model for enhancer Cebpa(16) contains 4 activator binding sites, 3 for PU.1 and 1 for
C/EBPδ (Fig. 7B). Although each individual PU.1 site has a small contribution to activation,
together they upregulate Cebpa(16) expression by a factor of ~2.5 in IL-3 due to the synergism in
the action of multiple activator sites represented in the model (Fig. 2D,F). As with the promoter,
the patterns of PU.1 and C/EBPδ expression (Fig. S2) explain the activity of the reporter
containing Cebpa(16) and Cebpa(0) (Fig. 4A). This reporter is upregulated more in OHT-induced
than uninduced conditions. This can be directly attributed to the induction of PU.1 and the
preferential upregulation of C/EBPδ in IL3+OHT. The model’s prediction that PU.1 binds
Cebpa(16) is supported by genome-wide ChIP studies of PU.1 binding (Fig. S11A), one in PUER
cells (Heinz et al., 2010) and the other, an independent study, in neutrophil conditions of the
FDCPmix system (May et al., 2013). Our analysis thus identifies Cebpa(16) as a novel PU.1-responsive enhancer in the vicinity of Cebpa.
In the final example of cis-regulatory inference, we analyze a silencer, Cebpa(11) (Fig. 7C,F).
Cebpa(11) downregulates the activity of the proximal promoter by a factor of 2.7, 3, and 1.9 in
uninduced, IL3+OHT, and GCSF+OHT conditions respectively. Correspondingly, the activation
provided by binding sites is much lower in the model for Cebpa(11) (Fig. 7C; compare to panel
A). Since Cebpa(11) is located ~15kb downstream of the proximal promoter in the endogenous
locus and ~3kb upstream of the proximal promoter in the reporter constructs (Fig. 3A,D), the
repressor interactions are occurring over long distances. Plotting the long-range repression
mediated by sites in the Cebpa(11) model reveals the TFs likely responsible for downregulation—
EBF1, Myb, GATA(s), and Ikaros (Fig. 7F). Consistent with the uniform expression patterns of
the repressive inputs (Fig. S2), the fold downregulation is approximately the same in all three
conditions. EBF1 and Ikaros are B-cell factors regulating cell-fate commitment and
immunoglobulin heavy-chain rearrangement (Lin and Grosschedl, 1995; Pongubala et al., 2008;
Reynaud et al., 2008). Despite the presence of a binding site in the model, Ikaros does not have
any repressive contribution here (Fig. 7F). Ikaros did, however, have a strong repressive
contribution in model 12058, which lacked EBF1 (data not shown). This suggests that EBF1
and Ikaros repress Cebpa redundantly.
To summarize, the examples above show that the model’s inferences display a high degree of
accord between the expression patterns of the TF inputs, their functional roles, and the
empirically observed patterns of cell- and CRM-specific transcriptional output. It is not yet
possible to conclusively connect the behavior of the CRMs to that of the endogenous Cebpa
transcriptional unit since CRMs do not act additively (Dunipace et al., 2011; Landry et al., 2009;
Perry et al., 2011). Nevertheless, the downregulation of Cebpa in alternative lineages (Huang et
al., 2009; Pongubala et al., 2008; Reynaud et al., 2008) and with OHT induction (Fig. S2)
matches well with the abundance of silencers in the locus.
We now describe the inferred cis-regulatory logic, limiting the discussion to TFs exerting
particularly strong effects and/or regulating multiple CRMs. We also compare model predictions
to available evidence—including relevant publicly-available ChIP datasets compiled from the
NCBI Gene Expression Omnibus (Fig. S11).
C/EBP family. The activation of Cebpa promoter by C/EBPδ (Fig. 7A) is in general agreement
with the in vitro binding and reporter assays of the Cebpa promoter (Legraverend et al., 1993)
that showed both binding of C/EBPα, C/EBPβ, and C/EBPδ, and transactivation by C/EBPα and
C/EBPβ. C/EBP family transcription factors have very similar binding properties. This fact, in
combination with similarity of expression patterns in PUER cells (Fig. S2), implies that these
factors play redundant roles in the model and our conclusions about C/EBPδ could pertain to the
whole family. C/EBPδ is also predicted to bind to and activate two other Cebpa enhancers,
Cebpa(7) (Fig. S6A) and Cebpa(16) (Fig. 7B). We conclude therefore that one or more C/EBP
family TFs bind and activate Cebpa(7) and Cebpa(16). This is supported by a ChIP-seq dataset
for C/EBPα in peritoneal macrophages (Heinz et al., 2010), which shows binding of C/EBPα at
the three CRMs (Fig. S11A). Lastly, a gene network inferred in human hematopoietic cells
identified C/EBPδ as an upstream regulator of a module of genes coexpressed in granulocytes and
monocytes (Novershtern et al., 2011).
PU.1. The two strongest enhancers, Cebpa(16) (Fig. 7B) and Cebpa(18) (Fig. S6B) are driven by
PU.1. Cebpa(18), like Cebpa(16), is bound by PU.1 in both PUER and FDCPmix cells (Fig.
S11A; (Heinz et al., 2010; May et al., 2013)). The activation of Cebpa(18) by PU.1 is in
agreement with reporter analyses in 32Dcl3 cells (Cooper et al., 2015) that also document the
effect. PU.1, one of the key TFs specifying the myeloid and to a degree lymphoid lineages (Scott
et al., 1994), is known to promote the macrophage lineage by antagonizing the activity of
C/EBPα (Laslo et al., 2006). Our analysis shows that Cebpa is a direct target of PU.1.
Egr1. The model predicts that, in addition to the proximal promoter, the Cebpa enhancer
Cebpa(14) is also activated by Egr1 (Fig. S6C). Egr1 binds the Cebpa proximal promoter in
ChIP-seq data from bone marrow-derived dendritic cells (Fig. S11A; (Garber et al., 2012)). Also,
Egr1 expression promotes macrophage differentiation at the expense of neutrophils (Nguyen et
al., 1993). Egr1 has been suggested to operate downstream of PU.1 and C/EBPα in the
macrophage/neutrophil decision; our analysis suggests that Egr1 supports Cebpa expression in
the macrophage lineage. The model also infers that Egr1 activates its own expression by binding
to CRM Egr1(2) (Fig. S9B), which is also bound in the dendritic-cell ChIP dataset (Fig. S11B;
(Garber et al., 2012)).
Ets1. Ets1 has been implicated in a broad range of roles in hematopoiesis, including the
development of lymphocytes (Bories et al., 1995; Muthusamy et al., 1995), megakaryocytes
(Lulli et al., 2006), and granulocytes (Lulli et al., 2010). The model infers that two CRMs are
activated by Ets1, the proximal promoter of Egr1, Egr1(0) (Fig. S9A), and the Cebpa enhancer
Cebpa(18) (Fig. S6B). Egr1’s promoter activity is downregulated in IL3+OHT (Fig. 4B); our
results imply that this results from the downregulation of the Ets1 activating input (Fig. S9A and
S2). The Egr1 promoter has indeed been found to be activated by Ets1 in NIH-3T3 cells
(Robinson et al., 1997) and Ets1 binds the Egr1 locus in G1ME cells (Fig. S11B; (Doré et al.,
2012)). The activation of Cebpa(18) by Ets1 agrees with a dramatic loss of activation when Ets
sites are mutated in a roughly comparable +37kb enhancer (Cooper et al., 2015).
EBF1 and Ikaros. In addition to Cebpa(11), EBF1 represses Cebpa(8) (Fig. S7D), Cebpa(19)
(Fig. S8F), and Egr1(5) (Fig. S9F). Pongubala et al. (2008) found that EBF1
represses Cebpa by binding to weak sites in its promoter. Our results suggest that EBF1 represses
Cebpa by binding to distal sites in addition to the promoter-proximal ones. Additionally, there is
detectable EBF1 binding at the Egr1 locus (Fig. S11B) in pre-B cells (Treiber et al., 2010) and
Rag1-/- pro-B cells (Lin et al., 2010). Ikaros is also known to repress Cebpa (Reynaud et al.,
2008), although we detected Ikaros repression only in the models lacking EBF1 (Fig. 5B,D).
GATA(s). The model infers that many silencer CRMs, Cebpa(11), Cebpa(8), Cebpa(23), and
Cebpa(24), and an enhancer, Cebpa(18), are repressed by one or more members of the GATA
family (Fig. 7F, S7D-F, and S6E). Supporting this, GATA2 binding was detected at Cebpa CRMs
24, 8, 11, and 18 in G1ME megakaryocyte progenitors (Doré et al., 2012) and at Cebpa CRMs 8
and 18 in FDCPmix multipotential progenitors (May et al., 2013) (Fig. S11A). Moreover, Cebpa
is upregulated in GATA2 knockdown in G1ME cells (Huang et al., 2009). GATA2 regulates the
proliferation of HSCs (Rodrigues et al., 2012), granulocyte-macrophage progenitors (Rodrigues
et al., 2008), and megakaryocyte-erythrocyte progenitors (Doré et al., 2012; Huang et al., 2009).
Upon mutating GATA sites, Cooper et al. (2015) found a reduction in expression driven by the
+37kb enhancer corresponding to Cebpa(18) in 32Dcl3 cells. However, the direct binding of
GATA2 to Cebpa(18) and the derepression of Cebpa upon GATA2 knockdown in G1ME cells
suggest that the activation observed in 32Dcl3 cells is a context-dependent effect and not a
general feature of Cebpa regulation. Our analysis combined with the ChIP-seq data (Fig. S11A)
therefore implies that GATA2 represses Cebpa in progenitors and the red-blood cell lineage.
Discussion
The regulatory function of the non-coding parts of the genome remains largely unexplored. We
have developed an approach that integrates datasets probing multiple aspects of gene regulation
to decode cis-regulatory logic in a scalable manner. Using this approach we analyzed 46 CRMs in
parallel to show that Cebpa—a gene for which previously only one distal CRM was known—has
a regulatory logic that relies on multiple distal enhancers and silencers. Our approach goes
beyond the detection of CRMs by determining the identity and binding sites, as well as the likely
functional roles, of the TFs regulating a gene. The functional roles of TFs were determined by
constructing 32,768 alternative models encapsulating all possible regulatory schemes for 15 TFs.
Previous analyses have implied that hematopoietic gene expression is supported by multiple
enhancers, which are usually bound by relatively few (1-3) activators in complex (Göttgens et al.,
2002; Leddin et al., 2011; Pimanda et al., 2008; Wilson et al., 2010b; Yeamans et al., 2007). Our
results about Cebpa suggest a considerably more complex regulatory organization involving the
prominent use of silencers and enhancer-bound repressors to finely control cell type-specific
expression patterns.
Perhaps our most surprising finding is that the Cebpa locus contains several silencers, which, in
fact, outnumber the enhancers (Fig. 4A). We base this conclusion on the fact that CRMs placed in
the reporter ~3kb upstream of the cognate proximal promoter (Fig. 3D) are still able to diminish
its activity (Fig. 4A). Hematopoietic genes are known to have distal silencers; a distal element
located 2.8kb upstream of Gata2 is known to mediate its repression by GATA1 (Grass et al.,
2003) for example. However, known enhancers vastly outnumber silencers (Wilson et al., 2011).
This situation is quite likely the result of a bias toward detecting enhancers in reporter design
rather than an actual deficit in the number of silencers relative to enhancers. Use of reporters
designed to detect both up- and down-regulation, such as the ones employed here, will likely lead
to the discovery of many more silencer elements.
The significance of the result is further clarified once we consider the identity of the repressors
inferred to be regulating the silencers (Fig. 8A). TFs binding to the distal silencers and repressing
the activity of the proximal promoter (Fig. 7F and S7), such as EBF1 and the GATA family, are
expressed at very low levels in the myeloid lineage but at high levels in alternative ones. The
repression of Cebpa by non-myeloid TFs is supported by two results. First, correctly simulating
silencer activity was only possible in models that included non-myeloid TFBS detected in the
silencer CRMs (compare panels A and D of Fig. 6). Second, silencers are occupied in vivo by the
predicted repressors (Fig. S11). These results show that cross-lineage antagonism (Cantor and
Orkin, 2001; Chou et al., 2009; Huang et al., 2007; Laslo et al., 2008; Laslo et al., 2006) is
mediated by distal silencers.
Hematopoietic lineage resolution is currently believed to occur in a hierarchical manner, where
pairs of pivotal TFs function as bistable switches and repress each other’s targets (Graf and Enver,
2009; Laslo et al., 2008). The mediation of cross-lineage antagonism by silencers suggests instead
that lineage-specifying TFs form densely interconnected repressive networks. For example, it has
been suggested that C/EBPα functions in a cross-antagonistic pair together with EBF1 to resolve
the myeloid and B-lymphoid lineages (Laslo et al., 2008; Pongubala et al., 2008). Our results
together with those of Reynaud et al. (2008) show that Cebpa is also repressed
redundantly by Ikaros, which itself regulates EBF1 (Pongubala et al., 2008). This makes it
difficult to partition the triplet into antagonistic pairs. Similarly, it was recently shown that
GATA2 represses Sfpi1 in combination with GATA1 and its knockdown leads to myeloid
differentiation of multipotential progenitors (May et al., 2013). Ascertaining whether silencer-mediated cross-antagonism is a general property of hematopoietic GRNs will require widespread
cis-regulatory dissection with reporters designed to detect potential distal silencers as well as
enhancers.
Although the molecular mechanisms of the repression of most hematopoietic regulators remain to
be elucidated, our data suggest that cross-lineage repressors are interacting directly with the
promoter. There are two modes by which a distantly bound repressor may repress a gene. The
first is by displacing an activator bound to the distal CRM as seen, for example, in the GATA1-dependent repression of Gata2 (Grass et al., 2003), which is represented as quenching in the
model (Fig. 2C). The second mode is to directly repress the activity of the proximal promoter
(long-range repression; Fig. 2E) by interfering with promoter-enhancer interactions or recruiting
chromatin-remodeling enzymes to establish large repressive chromatin domains (Harmston and
Lenhard, 2013). For example, Ikaros is found in complex with components of the nucleosome
remodeling and deacetylation (NURD) and SWI-SNF complexes (Georgopoulos, 2002) and the
GATA3 DNA binding domain bridges two separate DNA sequences, suggesting an ability to
mediate looping or long distance interactions (Chen et al., 2012). Here, the cross-lineage
repressors of Cebpa must be acting in the second, long-range, manner since reporters carrying
both the silencer and proximal promoter have lower activity than those carrying proximal
promoters alone (Fig. 4). In contrast, quenching would predict that the combined reporter has the
same or higher activity, since it can, at most, reduce the activation provided by the CRM to zero.
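The argument above can be made concrete with a toy calculation. The sketch below is not the transcriptional model used in this study; it assumes, purely for illustration, that the interaction strength is the sum of a promoter term and a CRM term, that quenching scales down only the CRM term, and that long-range repression scales down the total. The function names and parameters are hypothetical.

```python
def quenched_strength(promoter, crm, q):
    """Quenching with efficiency q (0 to 1) reduces only the CRM's own activation."""
    return promoter + (1.0 - q) * crm


def long_range_strength(promoter, crm, r):
    """Long-range repression with efficiency r (0 to 1) reduces the total strength."""
    return (1.0 - r) * (promoter + crm)
```

Under these assumptions, even complete quenching (q = 1) leaves the promoter term intact, so the combined reporter cannot fall below the proximal-only reporter, whereas long-range repression can bring it below that level.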
Although most of our cis-regulatory inferences are consistent with prior genetic, biochemical, and
genomic evidence, there are a few points of discord. For example, the assignment of a repressive
role to C/EBPα is inconsistent with its ability to transactivate its own promoter (Legraverend et
al., 1993). The inconsistent assignment has most likely been made because C/EBPδ and C/EBPα,
having very similar binding properties and expression patterns, can substitute for each other in the
model. This in silico redundancy reflects the in vivo redundancy of the binding and activity of the
C/EBP family members (Friedman, 2007b; Nerlov, 2007; Tsukada et al., 2011). Second, Gfi1,
known to function as a repressor (Laslo et al., 2006; Yücel et al., 2004), has been inferred to be an
activator here, albeit with a rather minor contribution. The model might be utilizing Gfi1 as a
stand-in for an activator with redundant binding properties that is yet to be included in our analysis.
Redundancy arising from TF binding properties may be addressed in the future by acquiring TF
concentration and reporter activity data from more cell types and including more TFs. This will
allow the model to distinguish between members of TF families based on their differential
expression in multiple cell types.
Notwithstanding the exceptions noted above, our cis-regulatory dissection paints a rather complex
picture of Cebpa regulation (Fig. 8). Broadly speaking, Cebpa is activated by TFs implicated in
the specification of the myeloid lineage and repressed by TFs directing the specification of
alternative hematopoietic lineages. We found two PU.1-dependent enhancers, which implies—
taken together with the observation that C/EBPα activates Sfpi1 by binding to a distal enhancer
(Friedman, 2007a; Yeamans et al., 2007)—that the pair form a positive feedback loop (Alon,
2007). Cebpa is also activated by the binding of C/EBP family members to the proximal promoter
(Fig. 7A), forming another positive feedback loop. Positive feedback loops can result in bistable
behavior (Alon, 2007) and Cebpa’s participation in two of them might be a strategy to maintain
stable gene expression once induced. The activation of Cebpa by Egr1 via Cebpa(14) is
surprising, since Egr1 and Egr2 are thought to antagonize Gfi1 to resolve the macrophage and
neutrophil gene expression programs (Laslo et al., 2006). However, the expression level of
retrovirally-expressed Cebpa controls the ratio of CD11b+Gr-1- macrophages to CD11b+Gr-1+
neutrophils (Dahl et al., 2003); activation by Egr1 might serve to tune the level of Cebpa
expression in the two cell types. Furthermore, this interaction might occur in liver tissue, where
Cebpa and Egr1 are also coexpressed (Jakobsen et al., 2013). To summarize, the scheme of
Cebpa activation combines induction by PU.1 and potentially other C/EBP family factors
(Novershtern et al., 2011), positive feedback loops to stably maintain expression level, and
potential tuning by Egr1 within the myeloid lineage.
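The link between positive feedback and bistability invoked above can be illustrated with a minimal sketch. It is a toy model and not the transcriptional model used in this study: two dimensionless variables, named `pu1` and `cebpa` purely for illustration, activate each other through a Hill function, and all parameter values (`basal`, `vmax`, `k`, `n`, `decay`) are arbitrary assumptions chosen so that two stable states exist.

```python
def hill(x, vmax=2.0, k=1.0, n=2):
    """Sigmoidal activation term; vmax, k, and n are illustrative values."""
    return vmax * x**n / (k**n + x**n)


def simulate(pu1, cebpa, basal=0.05, decay=1.0, dt=0.01, steps=10_000):
    """Forward-Euler integration of a toy mutual-activation loop.

    Each factor is produced at a basal rate plus a Hill-type term driven
    by the other factor, and is degraded linearly.
    """
    for _ in range(steps):
        d_pu1 = basal + hill(cebpa) - decay * pu1
        d_cebpa = basal + hill(pu1) - decay * cebpa
        pu1 += dt * d_pu1
        cebpa += dt * d_cebpa
    return pu1, cebpa


# Same parameters, different starting levels: the loop settles into
# either a low-expression or a high-expression steady state.
low = simulate(0.2, 0.2)
high = simulate(1.0, 1.0)
print("low state: ", low)
print("high state:", high)
```

With these assumed parameters, trajectories that start below an intermediate threshold decay to a low state, whereas those that start above it are driven to and held at a high state, which is the sense in which a positive feedback loop could maintain expression once induced.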
Besides alternative-lineage repressors, Cebpa is also predicted to be repressed by TFs
coexpressed in the myeloid lineage, such as Jun and Myb. Such repressors act mostly at CRMs that
are either enhancers (Fig. 7E and S6) or inactive (Fig. S8E,F). During hematopoiesis, Cebpa is
expressed at low levels in HSCs, monocytes, and granulocytes but at high levels in GMPs
(Bagger et al., 2013; Hasemann et al., 2014). Jun, which can function as a repressor by forming
heterodimers with C/EBPβ (Hsu et al., 1994), has an approximately complementary pattern of
expression, with the exception that it is expressed at low levels in granulocytes (Bagger et al.,
2013; http://servers.binf.ku.dk/hemaexplorer/). Repression by Jun is a potential mechanism to
achieve the downregulation of Cebpa after the differentiation of GMPs, although confirming this
possibility would require the characterization of the time courses of CRM activity in the PUER
system.
Both the expression pattern and inferred regulation of Egr1 are considerably simpler than those of Cebpa
(Fig. 4B and 8). Egr1 is predicted to be activated by Ets1 and itself and repressed by EBF1 and
the GATA(s). This apparent difference in the complexity of regulation could either be genuine or
arise from differences in the evolutionary conservation—our criterion for identifying CRMs—of
the regulatory sequences of the two genes. The latter possibility may be checked by identifying
putative CRMs using other means such as DNase I hypersensitivity (Hesselberth et al., 2009) or
the binding of other hematopoietic TFs. If the regulatory complexity of the two genes is indeed
different, it would suggest that Cebpa enjoys a more prominent position than Egr1 in the gene
network directing myeloid differentiation (Laslo et al., 2006). Dissecting the gene regulation of
other myeloid genes in this manner will help clarify the construction of the network.
The model was unable to produce any clear inferences about the regulation of Egr2 (Fig. S10).
The main reason for the lack of clear conclusions appears to be the general inactivity of Egr2
CRMs (Fig. 4C). A potential explanation for the inertness of conserved sequences in the Egr2
locus is that they serve regulatory functions in other cell types or during Egr2’s rapid induction as
an immediate early gene. This possibility may be checked by measuring the enrichment of
chromatin marks, such as H3K4me1 and H3K27ac (Creyghton et al., 2010; Lara-Astiaso et al.,
2014), at the apparently inert locations in other cell types. Such data might not serve an analogous
function for silencers, since it is not known whether any chromatin marks are enriched at distal
silencers.
Here we have utilized a model that assumes very little about the specific mechanisms of TF-TF
interaction, such as dimerization (Chlon et al., 2012; Hsu et al., 1994), switchable
activation/repression (He et al., 2012a), and sequestration (Cantor and Orkin, 2002), that are
known—for a few example TFs—to operate in mammalian gene regulation. Lacking
comprehensive knowledge about which TFs interact in these ways, we believe that parsimony in
assumptions combined with inference from data is the more prudent approach. In the future, as
proteomic approaches generate comprehensive maps of protein-protein interactions, it will be
possible to implement such specific mechanisms into our framework to increase its power. A
second limitation is the use of genome-wide gene expression data as a proxy for TF
concentration, which leaves out post-transcriptional and post-translational regulation from the
analysis. Although including these phenomena is desirable, using a standard and relatively easy-to-acquire dataset such as genome-wide gene expression permits a broad and unbiased approach
to identifying candidate TFs that is applicable to a wide range of cell types and organisms. In the
future, it will be possible to include post-transcriptional and post-translational regulation by using
proteomic technologies such as micro-western arrays (Ciaccio et al., 2010) and modification-specific antibodies.
cis-regulatory analysis is generally considered to be the gold standard for establishing functional
regulatory linkages within GRNs (Nam et al., 2010). Successful cis-regulatory dissection
necessitates methodologies for mapping transcriptional inputs and regulatory sequences to their
output. Furthermore, regulatory control by multiple interacting TFs creates a formidable
challenge since the potential number of sites and TFs to be tested is very large. The approach we
have developed here leverages the mathematical rules of gene regulation, as understood currently,
to map the inputs—TF expression patterns, TF binding preferences, and CRM sequence—to
CRM activity patterns. We overcome the challenge of multiple inputs by allowing regulation by
several TFs and combinatorially testing all possible regulatory schemes. The approach generates
specific predictions that may be tested readily. A recent technological innovation, massively
parallel reporter assays (MPRA) (Arnold et al., 2013; Levo and Segal, 2014; Melnikov et al.,
2012; Nam et al., 2010; Sharon et al., 2012), which measure the activity of thousands of CRMs in
parallel, further enhances the scalability of our approach. We expect that combining the approach
presented here with MPRA datasets will enable cis-regulatory dissection on a genomic scale.
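The combinatorial testing of regulatory schemes can be sketched as follows. The sketch assumes that each of the 15 candidate TFs is assigned one of two roles, activator or repressor, which is consistent with the 32,768 (2^15) alternative models reported in the Abstract. The TF labels (`TF1` to `TF15`) and the scoring function are hypothetical placeholders; in the actual analysis the score would come from fitting the sequence-based transcriptional model to CRM activity data.

```python
from itertools import product

ROLES = ("activator", "repressor")


def enumerate_schemes(tfs):
    """Yield every assignment of one role per TF as a dict."""
    for roles in product(ROLES, repeat=len(tfs)):
        yield dict(zip(tfs, roles))


def best_scheme(tfs, score):
    """Return the scheme with the lowest score (lower means better fit)."""
    return min(enumerate_schemes(tfs), key=score)


# Placeholder labels; these are not the TFs analyzed in the study.
tfs = [f"TF{i}" for i in range(1, 16)]
n_schemes = sum(1 for _ in enumerate_schemes(tfs))
print(n_schemes)  # 2**15 = 32768

# Toy stand-in for a goodness-of-fit score: the number of TFs whose
# role differs from an arbitrary, made-up reference scheme.
target = {tf: ROLES[i % 2] for i, tf in enumerate(tfs)}


def toy_score(scheme):
    return sum(scheme[tf] != target[tf] for tf in tfs)


best = best_scheme(tfs, toy_score)
```

Exhaustive enumeration is feasible at this scale because the number of schemes grows as 2^N; for substantially larger sets of candidate TFs a search heuristic would presumably be needed instead.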
Acknowledgements
We thank M. Kreitman for the use of laboratory facilities during the course of this work. We
thank M. Kreitman, M. Ludwig, K. Barr, and J. Gavin-Smith for discussions and A. Repele for
comments on the manuscript. EB would like to thank H. Singh for support and discussions and J.
Quintans for support. This work was supported by IIA-1355466, project UND0019821 from NSF
ND EPSCoR (to M), 2R01OD10936 from NIH (to JR), and in part by the Chicago Biomedical
Consortium with support from the Searle Funds at The Chicago Community Trust (to EB).
References
Alon, U., 2007. Network motifs: theory and experimental approaches. Nat Rev Genet 8,
450-461.
Arnold, C.D., Gerlach, D., Stelzer, C., Boryn, L.M., Rath, M., Stark, A., 2013. Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Science.
Arnosti, D., Gray, S., Barolo, S., Zhou, J., Levine, M., 1996a. The gap protein Knirps
mediates both quenching and direct repression in the Drosophila embryo. The
EMBO Journal 15, 3659-3666.
Arnosti, D.N., Barolo, S., Levine, M., Small, S., 1996b. The eve stripe 2 enhancer
employs multiple modes of transcriptional synergy. Development 122, 205-214.
Bagger, F.O., Rapin, N., Theilgaard-Mönch, K., Kaczkowski, B., Thoren, L.A.,
Jendholm, J., Winther, O., Porse, B.T., 2013. HemaExplorer: a database of mRNA
expression profiles in normal and malignant haematopoiesis. Nucleic Acids Res
41, D1034-1039.
Bain, G., Maandag, E.C., Izon, D.J., Amsen, D., Kruisbeek, A.M., Weintraub, B.C., Krop,
I., Schlissel, M.S., Feeney, A.J., van Roon, M., 1994. E2A proteins are required
for proper B cell development and initiation of immunoglobulin gene
rearrangements. Cell 79, 885-892.
Banerji, J., Olson, L., Schaffner, W., 1983. A lymphocyte-specific cellular enhancer is
located downstream of the joining region in immunoglobulin heavy chain genes.
Cell 33, 729-740.
Banerji, J., Rusconi, S., Schaffner, W., 1981. Expression of a beta-globin gene is
enhanced by remote SV40 DNA sequences. Cell 27, 299-308.
Bertolino, E., Reddy, K., Medina, K.L., Parganas, E., Ihle, J., Singh, H., 2005. Regulation
of interleukin 7-dependent immunoglobulin heavy-chain variable gene
rearrangements by transcription factor STAT5. Nat Immunol 6, 836-843.
Bertolino, E., Singh, H., 2002. POU/TBP cooperativity: a mechanism for enhancer action
from a distance. Mol Cell 10, 397-407.
Bories, J.C., Willerford, D.M., Grévin, D., Davidson, L., Camus, A., Martin, P., Stéhelin,
D., Alt, F.W., 1995. Increased T-cell apoptosis and terminal B-cell differentiation
induced by inactivation of the Ets-1 proto-oncogene. Nature 377, 635-638.
Brand, A.H., Breeden, L., Abraham, J., Sternglanz, R., Nasmyth, K., 1985.
Characterization of a "silencer" in yeast: a DNA sequence with properties
opposite to those of a transcriptional enhancer. Cell 41, 41-48.
Calero-Nieto, F.J., Ng, F.S., Wilson, N.K., Hannah, R., Moignard, V., Leal-Cervantes,
A.I., Jimenez-Madrid, I., Diamanti, E., Wernisch, L., Göttgens, B., 2014. Key
regulators control distinct transcriptional programmes in blood progenitor and
mast cells. EMBO J 33, 1212-1226.
Cantor, A.B., Orkin, S.H., 2001. Hematopoietic development: a balancing act. Curr Opin
Genet Dev 11, 513-519.
Cantor, A.B., Orkin, S.H., 2002. Transcriptional regulation of erythropoiesis: an affair
involving multiple partners. Oncogene 21, 3368-3376.
Carey, M.F., Smale, S.T., Peterson, C.L., 2008. Transcriptional Regulation in Eukaryotes:
Concepts, Strategies, and Techniques, 2nd ed. Cold Spring Harbor Laboratory
Press.
Chen, Y., Bates, D.L., Dey, R., Chen, P.-H., Machado, A.C.D., Laird-Offringa, I.A., Rohs,
R., Chen, L., 2012. DNA binding by GATA transcription factor suggests
mechanisms of DNA looping and long-range gene regulation. Cell Rep 2, 1197-1206.
Chlon, T.M., Doré, L.C., Crispino, J.D., 2012. Cofactor-Mediated Restriction of GATA-1
Chromatin Occupancy Coordinates Lineage-Specific Gene Expression. Mol Cell.
Choi, H.-J., Geng, Y., Cho, H., Li, S., Giri, P.K., Felio, K., Wang, C.-R., 2011.
Differential requirements for the Ets transcription factor Elf-1 in the development
of NKT cells and NK cells. Blood 117, 1880-1887.
Chopra, V.S., Kong, N., Levine, M., 2012. Transcriptional repression via antilooping in
the Drosophila embryo. Proc Natl Acad Sci U S A 109, 9460-9464.
Chou, S.T., Khandros, E., Bailey, L.C., Nichols, K.E., Vakoc, C.R., Yao, Y., Huang, Z.,
Crispino, J.D., Hardison, R.C., Blobel, G.A., Weiss, M.J., 2009. Graded
repression of PU.1/Sfpi1 gene transcription by GATA factors regulates
hematopoietic cell fate. Blood 114, 983-994.
Ciaccio, M.F., Wagner, J.P., Chuu, C.-P., Lauffenburger, D.A., Jones, R.B., 2010. Systems
analysis of EGF receptor signaling dynamics with microwestern arrays. Nat
Methods 7, 148-155.
Cooper, S., Guo, H., Friedman, A.D., 2015. The +37 kb Cebpa Enhancer Is Critical for
Cebpa Myeloid Gene Expression and Contains Functional Sites that Bind SCL,
GATA2, C/EBPα, PU.1, and Additional Ets Factors. PLoS One 10, e0126385.
Creyghton, M.P., Cheng, A.W., Welstead, G.G., Kooistra, T., Carey, B.W., Steine, E.J.,
Hanna, J., Lodato, M.A., Frampton, G.M., Sharp, P.A., Boyer, L.A., Young, R.A.,
Jaenisch, R., 2010. Histone H3K27ac separates active from poised enhancers and
predicts developmental state. Proc Natl Acad Sci U S A 107, 21931-21936.
Dahl, R., Walsh, J.C., Lancki, D., Laslo, P., Iyer, S.R., Singh, H., Simon, M.C., 2003.
Regulation of macrophage and neutrophil cell fates by the PU.1:C/EBPalpha ratio
and granulocyte colony-stimulating factor. Nat Immunol 4, 1029-1036.
DeKoter, R.P., Lee, H.-J., Singh, H., 2002. PU.1 regulates expression of the interleukin-7
receptor in lymphoid progenitors. Immunity 16, 297-309.
DeKoter, R.P., Singh, H., 2000. Regulation of B lymphocyte and macrophage
development by graded expression of PU.1. Science 288, 1439-1441.
Doré, L.C., Chlon, T.M., Brown, C.D., White, K.P., Crispino, J.D., 2012. Chromatin
occupancy analysis reveals genome-wide GATA factor switching during
hematopoiesis. Blood 119, 3724-3733.
Dunipace, L., Ozdemir, A., Stathopoulos, A., 2011. Complex interactions between cis-regulatory modules in native conformation are critical for Drosophila snail
expression. Development.
ENCODE Project Consortium, Bernstein, B.E., Birney, E., Dunham, I., Green, E.D.,
Gunter, C., Snyder, M., 2012. An integrated encyclopedia of DNA elements in the
human genome. Nature 489, 57-74.
Friedman, A.D., 2007a. C/EBPα induces PU.1 and interacts with AP-1 and NF-κB to
regulate myeloid development. Blood Cells Mol Dis 39, 340-343.
Friedman, A.D., 2007b. Transcriptional control of granulocyte and monocyte
development. Oncogene 26, 6816-6828.
Fromental, C., Kanno, M., Nomiyama, H., Chambon, P., 1988. Cooperativity and
hierarchical levels of functional organization in the SV40 enhancer. Cell 54, 943-953.
Garber, M., Yosef, N., Goren, A., Raychowdhury, R., Thielke, A., Guttman, M.,
Robinson, J., Minie, B., Chevrier, N., Itzhaki, Z., Blecher-Gonen, R., Bornstein,
C., Amann-Zalcenstein, D., Weiner, A., Friedrich, D., Meldrim, J., Ram, O.,
Cheng, C., Gnirke, A., Fisher, S., Friedman, N., Wong, B., Bernstein, B.E.,
Nusbaum, C., Hacohen, N., Regev, A., Amit, I., 2012. A high-throughput
chromatin immunoprecipitation approach reveals principles of dynamic gene
regulation in mammals. Mol Cell 47, 810-822.
Georgopoulos, K., 2002. Haematopoietic cell-fate decisions, chromatin regulation and
ikaros. Nat Rev Immunol 2, 162-174.
Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.-K., Cheng, C., Mu,
X.J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A.,
Addleman, N., Bhardwaj, N., Boyle, A.P., Cayting, P., Charos, A., Chen, D.Z.,
Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J.,
Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J.,
Monahan, H., O'Geen, H., Ouyang, Z., Partridge, E.C., Patacsil, D., Pauli, F.,
Raha, D., Ramirez, L., Reddy, T.E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L.,
Yang, X., Yip, K.Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham,
P.J., Myers, R.M., Weissman, S.M., Snyder, M., 2012. Architecture of the human
regulatory network derived from ENCODE data. Nature 489, 91-100.
Gerstein, M.B., Lu, Z.J., Van Nostrand, E.L., Cheng, C., Arshinoff, B.I., Liu, T., Yip,
K.Y., Robilotto, R., Rechtsteiner, A., Ikegami, K., Alves, P., Chateigner, A., Perry,
M., Morris, M., Auerbach, R.K., Feng, X., Leng, J., Vielle, A., Niu, W.,
Rhrissorrakrai, K., Agarwal, A., Alexander, R.P., Barber, G., Brdlik, C.M.,
Brennan, J., Brouillet, J.J., Carr, A., Cheung, M.-S., Clawson, H., Contrino, S.,
Dannenberg, L.O., Dernburg, A.F., Desai, A., Dick, L., Dosé, A.C., Du, J.,
Egelhofer, T., Ercan, S., Euskirchen, G., Ewing, B., Feingold, E.A., Gassmann,
R., Good, P.J., Green, P., Gullier, F., Gutwein, M., Guyer, M.S., Habegger, L.,
Han, T., Henikoff, J.G., Henz, S.R., Hinrichs, A., Holster, H., Hyman, T., Iniguez,
A.L., Janette, J., Jensen, M., Kato, M., Kent, W.J., Kephart, E., Khivansara, V.,
Khurana, E., Kim, J.K., Kolasinska-Zwierz, P., Lai, E.C., Latorre, I., Leahey, A.,
Lewis, S., Lloyd, P., Lochovsky, L., Lowdon, R.F., Lubling, Y., Lyne, R.,
MacCoss, M., Mackowiak, S.D., Mangone, M., McKay, S., Mecenas, D.,
Merrihew, G., Miller, D.M., 3rd, Muroyama, A., Murray, J.I., Ooi, S.-L., Pham,
H., Phippen, T., Preston, E.A., Rajewsky, N., Rätsch, G., Rosenbaum, H.,
Rozowsky, J., Rutherford, K., Ruzanov, P., Sarov, M., Sasidharan, R., Sboner, A.,
Scheid, P., Segal, E., Shin, H., Shou, C., Slack, F.J., Slightam, C., Smith, R.,
Spencer, W.C., Stinson, E.O., Taing, S., Takasaki, T., Vafeados, D., Voronina, K.,
Wang, G., Washington, N.L., Whittle, C.M., Wu, B., Yan, K.-K., Zeller, G., Zha,
Z., Zhong, M., Zhou, X., modENCODE Consortium, Ahringer, J., Strome, S.,
Gunsalus, K.C., Micklem, G., Liu, X.S., Reinke, V., Kim, S.K., Hillier, L.W.,
Henikoff, S., Piano, F., Snyder, M., Stein, L., Lieb, J.D., Waterston, R.H., 2010.
Integrative analysis of the Caenorhabditis elegans genome by the modENCODE
project. Science 330, 1775-1787.
Göttgens, B., Nastos, A., Kinston, S., Piltz, S., Delabesse, E.C.M., Stanley, M., Sanchez,
M.-J., Ciau-Uitz, A., Patient, R., Green, A.R., 2002. Establishing the
transcriptional programme for blood: the SCL stem cell enhancer is regulated by a
multiprotein complex containing Ets and GATA factors. EMBO J 21, 3039-3050.
Graf, T., Enver, T., 2009. Forcing cells to change lineages. Nature 462, 587-594.
Grass, J.A., Boyer, M.E., Pal, S., Wu, J., Weiss, M.J., Bresnick, E.H., 2003. GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive
autoregulation and domain-wide chromatin remodeling. Proc Natl Acad Sci U S A
100, 8811-8816.
Gray, S., Levine, M., 1996. Short-range transcriptional repressors mediate both
quenching and direct repression within complex loci in Drosophila. Genes and
Development 10, 700-710.
Guo, H., Ma, O., Friedman, A.D., 2014. The Cebpa +37-kb enhancer directs transgene
expression to myeloid progenitors and to long-term hematopoietic stem cells. J
Leukoc Biol 96, 419-426.
Guo, H., Ma, O., Speck, N.A., Friedman, A.D., 2012. Runx1 deletion or dominant
inhibition reduces Cebpa transcription via conserved promoter and distal enhancer
sites to favor monopoiesis over granulopoiesis. Blood 119, 4408-4418.
Hardison, R.C., Taylor, J., 2012. Genomic approaches towards finding cis-regulatory
modules in animals. Nat Rev Genet 13, 469-483.
Harmston, N., Lenhard, B., 2013. Chromatin and epigenetic features of long-range gene
regulation. Nucleic Acids Res 41, 7185-7199.
Hasemann, M.S., Lauridsen, F.K.B., Waage, J., Jakobsen, J.S., Frank, A.-K., Schuster,
M.B., Rapin, N., Bagger, F.O., Hoppe, P.S., Schroeder, T., Porse, B.T., 2014.
C/EBPα is required for long-term self-renewal and lineage priming of
hematopoietic stem cells and for the maintenance of epigenetic configurations in
multipotent progenitors. PLoS Genet 10, e1004079.
He, A., Shen, X., Ma, Q., Cao, J., von Gise, A., Zhou, P., Wang, G., Marquez, V.E., Orkin,
S.H., Pu, W.T., 2012a. PRC2 directly methylates GATA4 and represses its
transcriptional activity. Genes Dev 26, 37-42.
He, X., Duque, T.S.P.C., Sinha, S., 2012b. Evolutionary origins of transcription factor
binding site clusters. Mol Biol Evol 29, 1059-1070.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre,
C., Singh, H., Glass, C.K., 2010. Simple combinations of lineage-determining
transcription factors prime cis-regulatory elements required for macrophage and B
cell identities. Mol Cell 38, 576-589.
Hertz, G.Z., Hartzell, G.W., 3rd, Stormo, G.D., 1990. Identification of consensus patterns in
unaligned DNA sequences known to be functionally related. Comput Appl Biosci
6, 81-92.
Hesselberth, J.R., Chen, X., Zhang, Z., Sabo, P.J., Sandstrom, R., Reynolds, A.P.,
Thurman, R.E., Neph, S., Kuehn, M.S., Noble, W.S., Fields, S.,
Stamatoyannopoulos, J.A., 2009. Global mapping of protein-DNA interactions in
vivo by digital genomic footprinting. Nat Methods 6, 283-289.
Hsu, W., Kerppola, T.K., Chen, P.L., Curran, T., Chen-Kiang, S., 1994. Fos and Jun
repress transcription activation by NF-IL6 through association at the basic zipper
region. Mol Cell Biol 14, 268-276.
Huang, S., Guo, Y., May, G., Enver, T., 2007. Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Developmental Biology 305, 695-713.
Huang, Z., Dore, L.C., Li, Z., Orkin, S.H., Feng, G., Lin, S., Crispino, J.D., 2009. GATA-2 reinforces megakaryocyte development in the absence of GATA-1. Mol Cell
Biol 29, 5168-5180.
Jakobsen, J.S., Waage, J., Rapin, N., Bisgaard, H.C., Larsen, F.S., Porse, B.T., 2013.
Temporal mapping of CEBPA and CEBPB binding during liver regeneration
reveals dynamic occupancy and specific regulatory codes for homeostatic and cell
cycle gene batteries. Genome Res 23, 592-603.
Janssens, H., Hou, S., Jaeger, J., Kim, A., Myasnikova, E., Sharp, D., Reinitz, J., 2006.
Quantitative and predictive model of transcriptional control of the Drosophila
melanogaster even-skipped gene. Nature Genetics 38, 1159-1165.
Kazemian, M., Blatti, C., Richards, A., McCutchan, M., Wakabayashi-Ito, N.,
Hammonds, A.S., Celniker, S.E., Kumar, S., Wolfe, S.A., Brodsky, M.H., Sinha,
S., 2010. Quantitative analysis of the Drosophila segmentation regulatory
network using pattern generating potentials. PLoS Biol 8.
Kim, A.-R., Martinez, C., Ionides, J., Ramos, A.F., Ludwig, M.Z., Ogawa, N., Sharp,
D.H., Reinitz, J., 2013. Rearrangements of 2.5 kilobases of noncoding DNA from
the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLoS Genet 9, e1003243.
Ko, L.J., Engel, J.D., 1993. DNA-binding specificities of the GATA transcription factor
family. Mol Cell Biol 13, 4011-4022.
Kueh, H.Y., Champhekhar, A., Nutt, S.L., Elowitz, M.B., Rothenberg, E.V., 2013.
Positive Feedback Between PU.1 and the Cell Cycle Controls Myeloid
Differentiation. Science.
Kulkarni, M.M., Arnosti, D.N., 2005. cis-Regulatory logic of short-range transcriptional
repression in Drosophila melanogaster. Molecular and Cellular Biology 25, 3411-3420.
Lam, J., Delosme, J.-M., 1988a. An efficient simulated annealing schedule: Derivation.
Yale Electrical Engineering Department, New Haven, CT.
Lam, J., Delosme, J.-M., 1988b. An efficient simulated annealing schedule:
Implementation and evaluation. Yale Electrical Engineering Department, New
Haven, CT.
Landry, J.-R., Bonadies, N., Kinston, S., Knezevic, K., Wilson, N.K., Oram, S.H., Janes,
M., Piltz, S., Hammett, M., Carter, J., Hamilton, T., Donaldson, I.J., Lacaud, G.,
Frampton, J., Follows, G., Kouskoff, V., Göttgens, B., 2009. Expression of the
leukemia oncogene Lmo2 is controlled by an array of tissue-specific elements
dispersed over 100 kb and bound by Tal1/Lmo2, Ets, and Gata factors. Blood 113,
5783-5792.
Lara-Astiaso, D., Weiner, A., Lorenzo-Vivas, E., Zaretsky, I., Jaitin, D.A., David, E.,
Keren-Shaul, H., Mildner, A., Winter, D., Jung, S., Friedman, N., Amit, I., 2014.
Immunogenetics. Chromatin state dynamics during blood formation. Science 345,
943-949.
Laslo, P., Pongubala, J.M.R., Lancki, D.W., Singh, H., 2008. Gene regulatory networks
directing myeloid and lymphoid cell fates within the immune system. Semin
Immunol 20, 228-235.
Laslo, P., Spooner, C.J., Warmflash, A., Lancki, D.W., Lee, H.-J., Sciammas, R., Gantner,
B.N., Dinner, A.R., Singh, H., 2006. Multilineage transcriptional priming and
determination of alternate hematopoietic cell fates. Cell 126, 755-766.
Leddin, M., Perrod, C., Hoogenkamp, M., Ghani, S., Assi, S., Heinz, S., Wilson, N.K.,
Follows, G., Schönheit, J., Vockentanz, L., Mosammam, A.M., Chen, W., Tenen,
D.G., Westhead, D.R., Göttgens, B., Bonifer, C., Rosenbauer, F., 2011. Two
distinct auto-regulatory loops operate at the PU.1 locus in B cells and myeloid
cells. Blood 117, 2827-2838.
Legraverend, C., Antonson, P., Flodby, P., Xanthopoulos, K.G., 1993. High level activity
of the mouse CCAAT/enhancer binding protein (C/EBP alpha) gene promoter
involves autoregulation and several ubiquitous transcription factors. Nucleic
Acids Res 21, 1735-1742.
Levine, M., Davidson, E.H., 2005. Gene regulatory networks for development. Proc Natl
Acad Sci U S A 102, 4936-4942.
Levo, M., Segal, E., 2014. In pursuit of design principles of regulatory sequences. Nat
Rev Genet 15, 453-468.
Li, Y., Okuno, Y., Zhang, P., Radomska, H.S., Chen, H., Iwasaki, H., Akashi, K., Klemsz,
M.J., McKercher, S.R., Maki, R.A., Tenen, D.G., 2001. Regulation of the PU.1
gene by distal elements. Blood 98, 2958-2965.
Lin, H., Grosschedl, R., 1995. Failure of B-cell differentiation in mice lacking the
transcription factor EBF. Nature 376, 263-267.
Lin, Y.C., Jhunjhunwala, S., Benner, C., Heinz, S., Welinder, E., Mansson, R.,
Sigvardsson, M., Hagman, J., Espinoza, C.A., Dutkowski, J., Ideker, T., Glass,
C.K., Murre, C., 2010. A global network of transcription factors, involving E2A,
EBF1 and Foxo1, that orchestrates B cell fate. Nat Immunol 11, 635-643.
Lulli, V., Romania, P., Morsilli, O., Gabbianelli, M., Pagliuca, A., Mazzeo, S., Testa, U.,
Peschle, C., Marziali, G., 2006. Overexpression of Ets-1 in human hematopoietic
progenitor cells blocks erythroid and promotes megakaryocytic differentiation.
Cell Death Differ 13, 1064-1074.
Lulli, V., Romania, P., Riccioni, R., Boe, A., Lo-Coco, F., Testa, U., Marziali, G., 2010.
Transcriptional silencing of the ETS1 oncogene contributes to human
granulocytic differentiation. Haematologica 95, 1633-1641.
Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J.,
Buchman, S., Chen, C.-y., Chou, A., Ienasescu, H., Lim, J., Shyr, C., Tan, G.,
Zhou, M., Lenhard, B., Sandelin, A., Wasserman, W.W., 2014. JASPAR 2014: an
extensively expanded and updated open-access database of transcription factor
binding profiles. Nucleic Acids Res 42, D142-147.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter,
I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A.E., Wingender, E., 2006. TRANSFAC and its
module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic
Acids Res 34, D108-110.
May, G., Soneji, S., Tipping, A.J., Teles, J., McGowan, S.J., Wu, M., Guo, Y., Fugazza,
C., Brown, J., Karlsson, G., Pina, C., Olariu, V., Taylor, S., Tenen, D.G., Peterson,
C., Enver, T., 2013. Dynamic analysis of gene expression and genome-wide
transcription factor binding during lineage specification of multipotent
progenitors. Cell Stem Cell 13, 754-768.
Melnikov, A., Murugan, A., Zhang, X., Tesileanu, T., Wang, L., Rogov, P., Feizi, S.,
Gnirke, A., Callan, C.G., Jr., Kinney, J.B., Kellis, M., Lander, E.S., Mikkelsen,
T.S., 2012. Systematic dissection and optimization of inducible enhancers in
human cells using a massively parallel reporter assay. Nat Biotechnol 30, 271-277.
Merika, M., Orkin, S.H., 1993. DNA-binding specificity of GATA family transcription
factors. Mol Cell Biol 13, 3999-4010.
Muthusamy, N., Barton, K., Leiden, J.M., 1995. Defective activation and survival of T
cells lacking the Ets-1 transcription factor. Nature 377, 639-642.
Nam, J., Dong, P., Tarpine, R., Istrail, S., Davidson, E.H., 2010. Functional cis-regulatory
genomics for systems biology. Proc Natl Acad Sci U S A 107, 3930-3935.
Nègre, N., Brown, C.D., Ma, L., Bristow, C.A., Miller, S.W., Wagner, U., Kheradpour, P.,
Eaton, M.L., Loriaux, P., Sealfon, R., Li, Z., Ishii, H., Spokony, R.F., Chen, J.,
Hwang, L., Cheng, C., Auburn, R.P., Davis, M.B., Domanus, M., Shah, P.K.,
Morrison, C.A., Zieba, J., Suchy, S., Senderowicz, L., Victorsen, A., Bild, N.A.,
Grundstad, A.J., Hanley, D., MacAlpine, D.M., Mannervik, M., Venken, K.,
Bellen, H., White, R., Gerstein, M., Russell, S., Grossman, R.L., Ren, B.,
Posakony, J.W., Kellis, M., White, K.P., 2011. A cis-regulatory map of the
Drosophila genome. Nature 471, 527-531.
Nerlov, C., 2007. The C/EBP family of transcription factors: a paradigm for interaction
between gene expression and proliferation control. Trends Cell Biol 17, 318-324.
Nguyen, H.Q., Hoffman-Liebermann, B., Liebermann, D.A., 1993. The zinc finger
transcription factor Egr-1 is essential for and restricts differentiation along the
macrophage lineage. Cell 72, 197-209.
Novershtern, N., Subramanian, A., Lawton, L.N., Mak, R.H., Haining, W.N., McConkey,
M.E., Habib, N., Yosef, N., Chang, C.Y., Shay, T., Frampton, G.M., Drake,
A.C.B., Leskov, I., Nilsson, B., Preffer, F., Dombkowski, D., Evans, J.W., Liefeld,
T., Smutko, J.S., Chen, J., Friedman, N., Young, R.A., Golub, T.R., Regev, A.,
Ebert, B.L., 2011. Densely interconnected transcriptional circuits control cell
states in human hematopoiesis. Cell 144, 296-309.
Ogbourne, S., Antalis, T.M., 1998. Transcriptional control and the role of silencers in
transcriptional regulation in eukaryotes. Biochem J 331 ( Pt 1), 1-14.
Ondek, B., Gloss, L., Herr, W., 1988. The SV40 enhancer contains two distinct levels of
organization. Nature 333, 40-45.
Perry, M.W., Boettiger, A.N., Levine, M., 2011. Multiple enhancers ensure precision of
gap gene-expression patterns in the Drosophila embryo. Proc Natl Acad Sci U S A
108, 13570-13575.
Pimanda, J.E., Chan, W.Y.I., Wilson, N.K., Smith, A.M., Kinston, S., Knezevic, K., Janes,
M.E., Landry, J.-R., Kolb-Kokocinski, A., Frampton, J., Tannahill, D., Ottersbach,
K., Follows, G.A., Lacaud, G., Kouskoff, V., Göttgens, B., 2008. Endoglin
expression in blood and endothelium is differentially regulated by modular
assembly of the Ets/Gata hemangioblast code. Blood 112, 4512-4522.
Pongubala, J.M.R., Northrup, D.L., Lancki, D.W., Medina, K.L., Treiber, T., Bertolino,
E., Thomas, M., Grosschedl, R., Allman, D., Singh, H., 2008. Transcription factor
EBF restricts alternative lineage options and promotes B cell fate commitment
independently of Pax5. Nat Immunol 9, 203-215.
Reinitz, J., Hou, S., Sharp, D.H., 2003. Transcriptional control in Drosophila.
ComPlexUs 1, 54-64.
Reinitz, J., Sharp, D.H., 1995. Mechanism of eve stripe formation. Mechanisms of
Development 49, 133-158.
Reynaud, D., Demarco, I.A., Reddy, K.L., Schjerven, H., Bertolino, E., Chen, Z., Smale,
S.T., Winandy, S., Singh, H., 2008. Regulation of B cell fate commitment and
immunoglobulin heavy-chain gene rearrangements by Ikaros. Nat Immunol 9,
927-936.
Robinson, L., Panayiotakis, A., Papas, T.S., Kola, I., Seth, A., 1997. ETS target genes:
identification of egr1 as a target by RNA differential display and whole genome
PCR techniques. Proc Natl Acad Sci U S A 94, 7170-7175.
Rodrigues, N.P., Boyd, A.S., Fugazza, C., May, G.E., Guo, Y., Tipping, A.J., Scadden,
D.T., Vyas, P., Enver, T., 2008. GATA-2 regulates granulocyte-macrophage
progenitor cell function. Blood 112, 4862-4873.
Rodrigues, N.P., Tipping, A.J., Wang, Z., Enver, T., 2012. GATA-2 mediated regulation of
normal hematopoietic stem/progenitor cell function, myelodysplasia and myeloid
leukemia. Int J Biochem Cell Biol 44, 457-460.
Rothenberg, E.V., 2014. Transcriptional control of early T and B cell developmental
choices. Annu Rev Immunol 32, 283-321.
Schirm, S., Jiricny, J., Schaffner, W., 1987. The SV40 enhancer can be dissected into
multiple segments, each with a different cell type specificity. Genes Dev 1, 65-74.
Scott, E.W., Simon, M.C., Anastasi, J., Singh, H., 1994. Requirement of transcription
factor PU.1 in the development of multiple hematopoietic lineages. Science 265,
1573-1577.
Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U., Gaul, U., 2008. Predicting
expression patterns from regulatory sequence in Drosophila segmentation. Nature
451, 535-540.
Sharon, E., Kalma, Y., Sharp, A., Raveh-Sadka, T., Levo, M., Zeevi, D., Keren, L.,
Yakhini, Z., Weinberger, A., Segal, E., 2012. Inferring gene regulatory logic from
high-throughput measurements of thousands of systematically designed
promoters. Nat Biotechnol 30, 521-530.
Singh, H., Khan, A.A., Dinner, A.R., 2014. Gene regulatory networks in the immune
system. Trends Immunol 35, 211-218.
Small, S., Arnosti, D.N., Levine, M., 1993. Spacing ensures autonomous expression of
different stripe enhancers in the even-skipped promoter. Development 119, 767-772.
Small, S., Blair, A., Levine, M., 1992. Regulation of even-skipped stripe 2 in the
Drosophila embryo. The EMBO Journal 11, 4047-4057.
Small, S., Blair, A., Levine, M., 1996. Regulation of two pair-rule stripes by a single
enhancer in the Drosophila embryo. Developmental Biology 175, 314-324.
Spitz, F., Furlong, E.E.M., 2012. Transcription factors: from enhancer binding to
developmental control. Nat Rev Genet 13, 613-626.
Stopka, T., Amanatullah, D.F., Papetti, M., Skoultchi, A.I., 2005. PU.1 inhibits the
erythroid program by binding to GATA-1 on DNA and creating a repressive
chromatin structure. EMBO J 24, 3712-3723.
Treiber, T., Mandel, E.M., Pott, S., Györy, I., Firner, S., Liu, E.T., Grosschedl, R., 2010.
Early B cell factor 1 regulates B cell gene networks by activation, repression, and
transcription-independent poising of chromatin. Immunity 32, 714-725.
Tsukada, J., Yoshida, Y., Kominato, Y., Auron, P.E., 2011. The CCAAT/enhancer (C/EBP)
family of basic-leucine zipper (bZIP) transcription factors is a multifaceted
highly-regulated system for gene regulation. Cytokine 54, 6-19.
Walsh, J.C., DeKoter, R.P., Lee, H.J., Smith, E.D., Lancki, D.W., Gurish, M.F., Friend,
D.S., Stevens, R.L., Anastasi, J., Singh, H., 2002. Cooperative and antagonistic
interplay between PU.1 and GATA-2 in the specification of myeloid cell fates.
Immunity 17, 665-676.
Weigelt, K., Lichtinger, M., Rehli, M., Langmann, T., 2009. Transcriptomic profiling
identifies a PU.1 regulatory network in macrophages. Biochem Biophys Res
Commun 380, 308-312.
Wilson, N.K., Calero-Nieto, F.J., Ferreira, R., Göttgens, B., 2011. Transcriptional
regulation of haematopoietic transcription factors. Stem Cell Res Ther 2, 6.
Wilson, N.K., Foster, S.D., Wang, X., Knezevic, K., Schütte, J., Kaimakis, P., Chilarska,
P.M., Kinston, S., Ouwehand, W.H., Dzierzak, E., Pimanda, J.E., de Bruijn,
M.F.T.R., Göttgens, B., 2010a. Combinatorial transcriptional control in blood
stem/progenitor cells: genome-wide analysis of ten major transcriptional
regulators. Cell Stem Cell 7, 532-544.
Wilson, N.K., Timms, R.T., Kinston, S.J., Cheng, Y.-H., Oram, S.H., Landry, J.-R.,
Mullender, J., Ottersbach, K., Göttgens, B., 2010b. Gfi1 expression is controlled
by five distinct regulatory regions spread over 100 kilobases, with Scl/Tal1,
Gata2, PU.1, Erg, Meis1, and Runx1 acting as upstream regulators in early
hematopoietic cells. Mol Cell Biol 30, 3853-3863.
Yeamans, C., Wang, D., Paz-Priel, I., Torbett, B.E., Tenen, D.G., Friedman, A.D., 2007.
C/EBPalpha binds and activates the PU.1 distal enhancer to induce monocyte
lineage commitment. Blood 110, 3136-3142.
Yücel, R., Kosan, C., Heyd, F., Möröy, T., 2004. Gfi1:green fluorescent protein knock-in
mutant reveals differential expression and autoregulation of the growth factor
independence 1 (Gfi1) gene during lymphocyte development. J Biol Chem 279,
40906-40917.
Yuh, C.-H., Bolouri, H., Davidson, E.H., 1998. Genomic cis-regulatory logic: Functional
analysis and computational model of a sea urchin gene control system. Science
279, 1896-1902.
Zhang, D.E., Zhang, P., Wang, N.D., Hetherington, C.J., Darlington, G.J., Tenen, D.G.,
1997. Absence of granulocyte colony-stimulating factor signaling and neutrophil
development in CCAAT enhancer binding protein alpha-deficient mice. Proc Natl
Acad Sci U S A 94, 569-574.
Zinzen, R.P., Senger, K., Levine, M., Papatsenko, D., 2006. Computational models for
neurogenic gene expression in the Drosophila embryo. Current Biology 16,
1358-1365.
Figure Legends
Figure 1. A schematic illustration of the methodology for reverse engineering cis-regulatory
logic.
Figure 2. Sequence-based model of transcription. The model takes DNA sequence, PWMs, and
estimates of TF concentration as inputs and computes the rate of transcription as output. The
different steps of the calculation are shown; see Supplementary Text S1 for a detailed description
of the model. A. The DNA sequence is scored using PWMs by sliding an L bp window. The
score, $S$, is thresholded to identify sites (see Table S2 and Methods). The binding affinity, $K$, is
computed for each TFBS. B. For each site $k$, the fractional occupancy ($f_k$), the fraction of time
for which it is occupied, is computed. Here, this is illustrated with a 3-site example. All the
configurations in which the binding sites can be occupied are enumerated. The weight $w_i$ is
proportional to the probability that the sites are occupied in configuration $\sigma_i$. The fractional
occupancy is given by the sum of the weights of the configurations in which a site is occupied
divided by the sum of weights. $v_i$ are the concentrations of the TFs that bind to the sites under
consideration. Competition between TFs for overlapping sites is implemented by excluding
configurations in which overlapping sites are simultaneously occupied. C. Quenching, or
activator-specific repression, reduces the fractional occupancy of activators to $f'_k$ if they are
within the repression range, specified by the distance function $q(\cdot)$, of one or more repressors.
The reduction of activator occupancy results in a lower transcription rate. The effects of multiple
repressors are multiplicative, allowing several weak sites to act as a strong one. D. The sum of the
fractional occupancies of the activators, weighted by the efficiencies of activation, $E_A^{a(k)}$, is
computed to determine the interaction strength, $I$, of the CRM(s) with the core promoter. In the
final step of the calculation (panel F), we model transcription initiation as an enzymatic process
where the reduction $\Delta\Delta A$ in the activation energy barrier $\Delta A$ is determined in proportion to the
net interaction strength. Due to the nonlinear form of the Arrhenius law (panel F), multiple bound
activator molecules have a superlinear, that is, synergistic, effect on transcription. E.
Generalized or long-range repression reduces the interaction strength in a multiplicative but
distance-independent manner, giving the net interaction strength. F. The net interaction strength is
used to calculate the rate of transcription using a diffusion-limited version of the Arrhenius law
(Kim et al., 2013). Here $Q$ is a factor that converts fractional occupancy to units of energy and $\Theta$
is the activation energy barrier when no activators are bound.
Figure 3. Identification of putative CRMs and design of reporter constructs. A-C. Plots of
sequence identity and putative CRMs tested in this study for Cebpa (A), Egr1 (B), and Egr2 (C).
The first track shows sequence identity between the mm9 (Mouse) and CanFam2 (Dog) genomes
averaged over a 101bp window. The second track shows only those positions having identity >
70%. The third track shows annotated genes. The fourth track displays the putative CRMs tested
here. CRMs are referred to by the gene name followed by CRM number in parentheses. D.
Design of the reporter constructs. For each gene, the first construct carries the proximal promoter,
Cebpa(0), Egr1(0), or Egr2(0), immediately upstream of the core promoter of the construct. The
remaining constructs carry both a distal CRM and the cognate proximal promoter separated by an
intervening vector sequence 2828bp in length (see Methods).
Figure 4. Luciferase activity pattern of CRM-reporter constructs in PUER cells. A. Cebpa
CRMs. B. Egr1 CRMs. C. Egr2 CRMs. Activity in uninduced condition (progenitor), after 24 hrs
IL-3 and OHT treatment (early macrophage), and 24 hrs G-CSF and OHT treatment (early
granulocyte) is shown in blue, red, and green bars, respectively. Error bars span the measurements
from the two replicates, and the height of each colored bar is their mean.
Figure 5. Results of nonlinear optimization and the selection of representative models for further
analysis. A. The scores of the two rounds of nonlinear optimization are shown. The first round
included only myeloid-implicated factors, while the second round included four non-myeloid
factors. The parameters of each combination of TF regulatory roles were inferred in 5 replicates.
The lowest score of each combination, numbering $2^{15}$ in all, is plotted. The 20 lowest scoring
regulatory combinations, analyzed further, are plotted in red. B,D. Model selection from
myeloid-only runs. C,E. Model selection from runs including non-myeloid TFs. B-C. Regulatory
roles assigned to the 20 lowest scoring models in each run. Red is activation and blue is
repression. The models were clustered hierarchically (Fig. S4) based on the similarity of
regulatory role assignment. The members of the largest cluster are the top 8 models in both
panels. D-E. Scatter plot of model output against reporter activity for models selected for further
analysis. Both axes are in log scale to show low expression values clearly. D. Model 12058.
The squared Pearson correlation coefficient is $r^2 = 0.78$. E. Model 81762. $r^2 = 0.91$.
Figure 6. Comparison of the output of representative models with reporter activity data. A-C.
Model 12058, representative of the first round of reverse engineering having myeloid-only
factors. D-F. Model 81762, representative of the second round of reverse engineering that
included the non-myeloid factors EBF1, E2A, GATA(s), and Elf1. A,D. Cebpa. B,E. Egr1. C,F.
Egr2. Reporter activity data and model output are shown in filled and open bars respectively.
Colors and error bars are shown as in Fig. 4.
Figure 7. Inference of regulatory logic. A-C. Activation. D-F. Repression. A,D. Cebpa proximal
promoter Cebpa(0). B,E. Cebpa enhancer Cebpa(16). C,F. Cebpa silencer Cebpa(11). A-C. The
activity of each activator site is plotted. The activity is the amount by which the individual site
reduces the activation energy barrier, and depends on the occupancy of the site and the efficiency
of the bound activator (Fig. 2D,F). D-F. The repressive activity of each repressor site is plotted.
The repressive activity is the fraction by which the repressor reduces the interaction strength, which
results in a higher activation energy barrier. It depends on the occupancy of the repressor site and
the efficiency of long-range repression of the bound repressor (Fig. 2E,F). The gray box is
intervening vector sequence (Fig. 3D). The sites to the right of the gray box are on the proximal
promoter. The x-axis shows each binding site modeled and the position of its 5’ end in the
reporter construct relative to the 3’ end of the proximal promoter in parentheses.
Figure 8. Summary of the inferred cis-regulation of Cebpa and Egr1. A. Cebpa. B. Egr1.
The analysis of novel distal Cebpa enhancers and
silencers using a transcriptional model reveals the
complex regulatory logic of hematopoietic lineage
specification
Supplementary Information
Eric Bertolinoa,* , John Reinitza,b,c , & Manuc,d,*
a Department of Molecular Genetics and Cell Biology,
The University of Chicago, Chicago, IL 60637, U.S.A.
b Department of Statistics,
The University of Chicago, Chicago, IL 60637, U.S.A.
c Department of Ecology and Evolution and Institute of Genomics and Systems Biology,
The University of Chicago, Chicago, IL 60637, U.S.A.
d Department of Biology,
University of North Dakota,
10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, U.S.A.
* Corresponding authors. E-mail addresses: manu.manu@und.edu (M) and
eric.bertolino@gmail.com (EB).
Contents
Text S1: Sequence-based model of transcription
Text S2: The optimization problem and significance of fits
Supplementary Figures
Supplementary Tables
Text S2: CRM and vector sequences in FASTA format
References
List of Figures
S1 Mean and standard deviation of the gene expression of 62 candidate TFs in PUER cells
S2 Gene expression in PUER cells of the TFs tested in this study
S3 Scores of model fits with permuted data
S4 Hierarchical clustering of models to identify representative ones for further analysis
S5 Maximum activity of each TF in model 12058
S6 Regulatory logic of Cebpa enhancers
S7 Regulatory logic of Cebpa silencers
S8 Regulatory logic of miscellaneous Cebpa CRMs
S9 Regulatory logic of Egr1 CRMs
S10 Regulatory logic of Egr2 CRMs
S11 Compilation of ChIP-seq and ChIP-chip datasets from NCBI Gene Expression Omnibus
S12 The dependence of model output on PWM choice
S13 The dependence of the quality of fit on the number of TFs in the model
List of Tables
S1 List of 62 candidate TFs
S2 Position Weight Matrices used to detect binding sites for each transcription factor
S3 Microarray probes used to determine gene expression for each TF
S4 Scores of the lowest scoring models by replicate
S5 Parameter values for the two analyzed models
Text S1: Sequence-based model of transcription
The model takes CRM DNA sequence, estimates of TF concentration, and TF PWMs as input and
computes the rate of transcription driven by the CRM. A model is defined by specifying the TFs
and their regulatory roles beforehand.
Binding sites and their affinity
Let the length of the PWM be L. Each L-mer has a score
$$S = \sum_{j=1}^{L} \ln\left(\frac{f_{bj}}{p_b}\right), \tag{S1}$$
where $f_{bj}$ is the frequency of base $b$ at position $j$ in an alignment of known binding sites for the
TF and $p_b$ is the a priori frequency of the base in the genome (Hertz & Stormo, 1999).
The site-selection theory of Berg & von Hippel (1987) provides a means of relating the frequency of occurrence of nucleotides within a site to its free energy of binding. The theory assumes
that 1) large numbers of sites are selected to have free energy of binding within a narrow range
and 2) individual base pairs contribute independently or additively to the free energy of binding.
Under these assumptions the difference of the scores between a site and the consensus sequence
is proportional to the discrimination energy, the difference in the free energy of binding between
the site and the consensus sequence (Berg & von Hippel, 1987). As a consequence, the binding
affinity of the site relative to the consensus site can be written as
$$K = K_{\mathrm{cons}} \exp\left(\frac{S - S_{\mathrm{cons}}}{\lambda}\right), \tag{S2}$$
where $K_{\mathrm{cons}}$ is the binding affinity of the consensus site, expressed in units of inverse fluorescence
intensity, and $S_{\mathrm{cons}}$ is the score of the consensus site. $\lambda$ is the proportionality constant between
binding energy and score. $K_{\mathrm{cons}}$ is a free
parameter of the model and is inferred from CRM activity data by global nonlinear optimization
(see Methods). $\lambda$ is not known for most TFs, but is believed to have values between 0.5 and 1.5
(Berg & von Hippel, 1987). In order to limit the number of free parameters of the model, we fix
its value to 1 here. We score each L-mer in a CRM with the PWM, and retain it in the model if the
score exceeds a preset per-TF threshold value (see Methods).
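The scoring and affinity steps of Eqs. (S1) and (S2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based PWM layout, and the toy values in the usage note are assumptions, and the consensus score is taken here to be the score of the best-scoring L-mer.

```python
import math

BASES = "ACGT"

def pwm_score(site, freqs, background):
    """Eq. (S1): sum over positions j of ln(f_bj / p_b)."""
    return sum(math.log(freqs[j][b] / background[b]) for j, b in enumerate(site))

def consensus_score(freqs, background):
    """Score of the best-scoring L-mer (assumed to be the consensus site)."""
    return sum(max(math.log(col[b] / background[b]) for b in BASES) for col in freqs)

def find_sites(seq, freqs, background, threshold, k_cons, lam=1.0):
    """Slide an L bp window over seq and keep windows scoring above threshold.

    Returns (start, score, affinity) tuples, with the affinity from Eq. (S2).
    lam is the proportionality constant, fixed to 1 as in the text.
    """
    width = len(freqs)
    s_cons = consensus_score(freqs, background)
    sites = []
    for start in range(len(seq) - width + 1):
        s = pwm_score(seq[start:start + width], freqs, background)
        if s > threshold:
            k = k_cons * math.exp((s - s_cons) / lam)
            sites.append((start, s, k))
    return sites
```

With a toy 3 bp PWM that favors ACG and a uniform background, `find_sites("TTACGTT", freqs, background, threshold=0.0, k_cons=2.0)` retains only the window at position 2, and because that window is the consensus its affinity equals `k_cons`.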
Fractional occupancy calculation and competition
Let the sites in the CRM be indexed by $i$, and the TF binding to site $i$ be $a(i)$. Two sites are
regarded as overlapping if they share at least one nucleotide. We represent the site-occupancy state
by a binary vector $\sigma$ and, for each $\sigma$, subdivide the sites into occupied, $O(\sigma) = \{i : \sigma_i = 1\}$, and
unoccupied, $U(\sigma) = \{i : \sigma_i = 0\}$, subsets. The weight of each state, $w(\sigma)$, is given by the following
rules.
1. $w(\sigma) = 1$, if all sites are unoccupied, that is, $O(\sigma) = \emptyset$.
2. $w(\sigma) = \prod_{j \in O(\sigma)} K_j v_{a(j)}$, if the occupied sites don't overlap with each other.
3. $w(\sigma) = 0$, if any two overlapping sites are occupied; the state is excluded from the calculation.
Here, $v_{a(j)}$ is the concentration of the TF $a(j)$ in the cell type under consideration and $K_j$ is
the binding affinity of the $j$th site. The fractional occupancy is computed by summing the weights
of all the allowed states in which the site is occupied, denoted $S_i$, and normalizing to the sum of
weights over all states $S$,
$$f_i = \frac{1}{Z} \sum_{\sigma \in S_i} w(\sigma); \qquad Z = \sum_{\sigma \in S} w(\sigma). \tag{S3}$$
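The three weighting rules and Eq. (S3) amount to a small partition-function calculation. The sketch below enumerates all states by brute force; it is an illustration under stated assumptions (sites given as half-open `(start, end)` intervals, hypothetical function names), and the exhaustive enumeration is only practical for a handful of sites, so it should not be read as the algorithm used in the study.

```python
from itertools import combinations, product

def overlaps(a, b):
    """Two (start, end) half-open intervals overlap if they share a nucleotide."""
    return a[0] < b[1] and b[0] < a[1]

def fractional_occupancy(sites, affinities, concentrations):
    """Eq. (S3) by exhaustive enumeration of occupancy states.

    sites[i] is a (start, end) interval, affinities[i] is K_i, and
    concentrations[i] is the concentration of the TF binding site i.
    """
    n = len(sites)
    z = 0.0
    occupied_weight = [0.0] * n
    for sigma in product((0, 1), repeat=n):
        occ = [i for i in range(n) if sigma[i]]
        # Rule 3: states with two overlapping occupied sites get weight 0.
        if any(overlaps(sites[i], sites[j]) for i, j in combinations(occ, 2)):
            continue
        # Rules 1 and 2: the empty product is 1, otherwise multiply K_j * v_a(j).
        w = 1.0
        for i in occ:
            w *= affinities[i] * concentrations[i]
        z += w
        for i in occ:
            occupied_weight[i] += w
    return [ow / z for ow in occupied_weight]
```

For two overlapping sites with $K v = 1$ each, the allowed states are empty, first site only, and second site only, so each site has occupancy 1/3 rather than the 1/2 it would have alone; this is the competition effect described in the text.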
TF-TF and TF-promoter interactions
The TFs bound to DNA exert their influence over the total activity of the CRM by interacting
with the promoter. TFs interact with the promoter indirectly by recruiting cofactors that either
interact with the promoter or change chromatin conformation to increase or diminish the rate of
transcription (Harmston & Lenhard, 2013). One implication of the indirect action of TFs is that
each one potentially has a different interaction efficiency, stemming from differential recruitment
or activity of cofactors. Since the identity of cofactors and mode of action are not known for
most TFs, we do not model cofactors explicitly. Instead we introduce the efficiency factors, $E_A^a$,
$E_Q^a$, and $E_L^a$, which are proportionality constants between fractional occupancy and the strength of
interaction. The first constant is for activation whereas the other two correspond to the two modes
of repression represented in the model and described further below.
TFs are also known to interact with each other to modulate each others’ occupancy or the
strength of interaction with the promoter. We include four mechanisms for such interactions: 1)
competition for binding sites, 2) quenching, 3) long-range repression, and 4) synergistic activation.
The model incorporates competition between TFs for overlapping binding sites by disallowing
states with simultaneous occupancy of overlapping sites in the fractional occupancy calculation.
Competition has two main effects. If the competitor is a repressor, it will repress CRM activity
by reducing activator occupancy. If the competitor is an activator, it will take control of CRM
activity in the cell types where it is expressed strongly. The remaining three TF-TF interactions
are discussed in the order of their implementation in the calculation of CRM activity.
Quenching. Many repressors act by reducing the activity of a specific activator. For example,
PU.1 binds GATA1 protein bound to DNA and recruits a repressive complex that leads to the creation of repressive chromatin carrying the H3K9me3 mark (Stopka et al, 2005). One means by
which repressors achieve specificity to activators is by acting in a position dependent manner (Ogbourne & Antalis, 1998; Arnosti et al, 1996; Hewitt et al, 1999). In Drosophila, the range of
position-dependent repression has been shown to be ~150bp (Arnosti et al, 1996). Furthermore,
repression efficiency also depends on the stoichiometry and affinities of the activator and repressor
sites (Kulkarni & Arnosti, 2005). Following the convention of earlier models (Janssens et al, 2006;
Kim et al, 2013), we refer to activator-specific repression as quenching here. We model quenching
by reducing the fractional occupancy of activators in a multiplicative manner, so that
$$f'_i = f_i \prod_{k \in R} \left(1 - q(d(i,k))\, E_Q^{a(k)} f_k\right), \tag{S4}$$
where $i \in A$, the set of sites bound by activators, and $R$ is the set of sites bound by repressors.
$f'_i$ is the fractional occupancy of the site after repression and $d(i,k)$ is the distance between sites
$i$ and $k$, measured from their outer edges. The attenuation with distance, $q(d)$, is 1 within 100bp
and decreases linearly to 0 at 150bp. Thus,
$$q(d) = \begin{cases} 1 & |d| \le 100 \\ \dfrac{150 - |d|}{50} & 100 < |d| \le 150 \\ 0 & 150 < |d|. \end{cases} \tag{S5}$$
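Eqs. (S4) and (S5) translate directly into code. In this minimal sketch the function names are hypothetical, and the distance $d(i,k)$ is supplied by the caller rather than computed from site coordinates, since the exact edge convention is not spelled out beyond "outer edges".

```python
def q_attenuation(d):
    """Eq. (S5): full quenching within 100 bp, linear decay to 0 at 150 bp."""
    a = abs(d)
    if a <= 100:
        return 1.0
    if a <= 150:
        return (150 - a) / 50.0
    return 0.0

def quench(f_i, repressors):
    """Eq. (S4): multiplicatively reduce an activator's occupancy f_i.

    repressors is an iterable of (distance_bp, E_Q, f_k) tuples, one per
    repressor site k.
    """
    for d, e_q, f_k in repressors:
        f_i *= 1.0 - q_attenuation(d) * e_q * f_k
    return f_i
```

For example, an activator with occupancy 0.8 next to a fully occupied repressor 50 bp away with $E_Q = 0.5$ is reduced to 0.4, whereas the same repressor 300 bp away leaves the occupancy unchanged.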
Synergistic activation. The total interaction strength is determined by summing over the interaction strengths of all the activating sites
$$I = \sum_{k \in A} E_A^{a(k)} f'_k. \tag{S6}$$
Note that the interaction strength, $I$, corresponds to the product of the adaptor factor fractional
occupancy and activator efficiency in earlier implementations of the model (Kim et al, 2013). In
the final step of the calculation, described below in Eqs. (S8) and (S9), we model transcription
initiation as an enzymatic process where the reduction in the activation energy barrier, $\Delta\Delta A$, is
proportional to $I$. Therefore, multiple bound activator molecules have a superlinear, that
is, synergistic, effect on transcription.
Long-range repression. Some repressors also diminish activity without regard to the identity
of the activators and act over large distances (Cai et al, 1996; Grass et al, 2003). Such repressors usually act by modifying chromatin marks and conformation (Perissi et al, 2010; Harmston
& Lenhard, 2013). The activity data (Fig. 4) suggest that a few putative CRMs are mediating
repression at distances of over 4kb. We model generalized long-range repression by reducing the
total interaction strength in a multiplicative manner to obtain the net interaction strength,
$$I' = I \prod_{k \in R} \left(1 - E_L^{a(k)} f_k\right). \tag{S7}$$
In the model, each repressor can potentially act at a short range on specific activators, or as a
general long-range repressor. We determine the values of $E_Q^a$ and $E_L^a$ by fitting to the activity data,
and hence infer which mode of repressor action is consistent with the observed pattern of activity.
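The activation sum of Eq. (S6) and the long-range repression product of Eq. (S7) can be sketched as follows. The function names and the tuple layout are assumptions made for illustration only.

```python
def interaction_strength(activator_sites):
    """Eq. (S6): sum of E_A * f'_k over activator sites.

    activator_sites is an iterable of (E_A, f_prime) tuples, where f_prime
    is the occupancy after quenching.
    """
    return sum(e_a * f_prime for e_a, f_prime in activator_sites)

def net_interaction_strength(i_total, repressor_sites):
    """Eq. (S7): multiply I by (1 - E_L * f_k) for every repressor site.

    repressor_sites is an iterable of (E_L, f_k) tuples. Distance does not
    enter, which is what makes this mode of repression long-range.
    """
    for e_l, f_k in repressor_sites:
        i_total *= 1.0 - e_l * f_k
    return i_total
```

Because the factors multiply, two fully occupied repressor sites with $E_L = 0.5$ each reduce the interaction strength to a quarter of its value, the same as a single site with $E_L = 0.75$.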
Transcription rate. At the final step, we compute the rate of transcription initiation from the
net interaction strength. Transcriptional initiation is regarded as a diffusion-limited enzymatic
reaction catalyzed by the activators acting via their cofactors. The energy barrier for initiation,
$\Delta A$, is lowered by an amount proportional to the interaction strength, giving
$$\Delta A = \Theta - \Delta\Delta A = \Theta - Q I', \tag{S8}$$
where $Q$ is the proportionality constant between interaction strength and energy. $\Theta$ is the activation
energy barrier when no activators are bound and sets the basal transcription rate. The rate of
transcription is then given by the Arrhenius law, limited by the diffusion of polymerase to the
promoter (Kim et al, 2013),
$$R = R_{\max} \left( \frac{\exp(-(\Theta - Q I'))}{1 + \exp(-(\Theta - Q I'))} \right), \tag{S9}$$
where Rmax is the maximum rate of transcription. These calculations are repeated to provide
predictions for the activity of each CRM and cell type being modeled. The free parameters characterize properties of the TFs and do not change with the CRM. The nonlinear optimization is
performed in an internally consistent manner, so that the same TF parameters are used to predict
the activity of all the CRMs together (see Methods). This has two implications. First, the number of free parameters depends only on the number of TFs in the model. Second, any differences
predicted in the activity of CRMs arise solely from differences in DNA sequence.
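The final step, Eqs. (S8) and (S9), is a saturating function of the net interaction strength. The sketch below uses a hypothetical function name and arbitrary illustrative parameter values; it is meant only to make the synergy argument concrete.

```python
import math

def transcription_rate(i_net, theta, Q, r_max=1.0):
    """Eq. (S9): diffusion-limited Arrhenius rate.

    i_net is the net interaction strength I', theta the barrier with no
    activators bound, and Q converts interaction strength to energy.
    """
    x = math.exp(-(theta - Q * i_net))
    return r_max * x / (1.0 + x)

# Illustrative values only (not fitted parameters from the study).
r0 = transcription_rate(0.0, theta=5.0, Q=2.0)  # basal rate
r1 = transcription_rate(1.0, theta=5.0, Q=2.0)  # one unit of interaction
r2 = transcription_rate(2.0, theta=5.0, Q=2.0)  # two units of interaction
```

Well below saturation, doubling the interaction strength more than doubles the gain over the basal rate, so `r2 - r0` exceeds `2 * (r1 - r0)`; this is the synergistic effect of multiple bound activators. The rate reaches half of `r_max` when `Q * i_net` equals `theta` and saturates beyond that.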
Text S2: The optimization problem and significance of fits
This problem differs from machine learning problems in two important respects, which determined
how we validated the model. First, we are performing a regression where the number of parameters
p varies between 32 and 47 (Supplementary Text S1) and the number of datapoints N = 114. In
contrast to typical machine learning problems (Hastie et al, 2009), where N ≈ p or N < p, N > p
here. Second, and more importantly, our dataset is not based on repeatedly sampling from the joint
distribution of TF concentrations and reporter activities. This would require the joint measurement
of reporter activities and TF concentrations in single cells or a large number of cell types. Instead
our dataset is heterogeneous, consisting of 46 CRMs, where the measurements are activities averaged over hundreds of thousands of cells but in few cell types. The heterogeneity implies that
it is not possible to partition the data in a meaningful way for cross-validation tests (Hastie et al,
2009). For example, we cannot reasonably expect a model trained solely on Cebpa CRMs to predict the activity of Egr1 CRMs. Reflecting the nature of the data, our model is not a generalized
statistical one, but is instead rooted in the biology of gene regulation (Kim et al, 2013; Segal et al,
2008). With these considerations, instead of performing a computational cross-validation here, we
checked the validity of our results by performing an in-depth comparison with literature and in vivo
TF binding data (see Results).
We also determined the significance of the fit for the lowest scoring models. Our null hypothesis was that the activity measurements are distributed randomly with respect to the identity of
the CRMs and cell types. The nonlinearity of the model precludes an explicit formulation of the
likelihood function and hence a calculation of a p-value. Nevertheless, we simulated (Papatsenko
& Levine, 2011) the scores under the above null hypothesis by scrambling the correspondence of
the CRMs and cell types to the activity data while preserving its dynamic range. The 8 lowest-scoring
models were chosen from the first round of reverse engineering. Holding the order of the CRMs
and cell types constant, the order of the activity measurements was permuted randomly until the
Pearson’s correlation coefficient between real and permuted data was less than 0.2. Ten permuted
datasets were generated in this manner. The parameters of the 8 models were inferred (10 replicates) with the permuted data sets and real data using simulated annealing. Figure S3 shows that
the lowest scores achieved with permuted data are 5-fold higher than the median score achieved
with real data, demonstrating the significance of the fits.
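The construction of the permuted datasets can be sketched as rejection sampling: shuffle the activity values and accept the shuffle only if its Pearson correlation with the real data is below 0.2. The function names, the retry limit, and the seeded generator are assumptions for illustration; the subsequent model fitting by simulated annealing is not shown.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permuted_dataset(activities, rng, max_r=0.2, max_tries=10000):
    """Shuffle activities until the correlation with the original is < max_r.

    The CRM and cell-type order is held fixed; only the activity values move,
    so the dynamic range of the data is preserved.
    """
    for _ in range(max_tries):
        shuffled = list(activities)
        rng.shuffle(shuffled)
        if pearson(activities, shuffled) < max_r:
            return shuffled
    raise RuntimeError("no permutation below the correlation cutoff was found")

def make_null_datasets(activities, n_datasets=10, seed=0):
    """Generate n_datasets permuted copies (10 in the text)."""
    rng = random.Random(seed)
    return [permuted_dataset(activities, rng) for _ in range(n_datasets)]
```

Each permuted copy contains exactly the same values as the real data, so any difference in fit quality reflects the pairing of activities with CRMs and cell types rather than the distribution of the values.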
Supplementary Figures
[Figure S1: four panels (A-D); axis labels recovered from the plot: Expression, Std dev, and TF.]
Figure S1: Mean and standard deviation of the gene expression of 62 candidate TFs in PUER cells. A-B.
Mean expression. C-D. Standard deviation. Microarray gene expression measurements in uninduced,
IL3+OHT, and GCSF+OHT conditions are as reported by Laslo et al (2006). The set of immune-specific
TFs was identified based on the presence of at least one binding site in the tested CRMs. They were
further classified as having been previously implicated in myeloid differentiation or not based on a literature
search. A,C. TFs previously implicated in myeloid differentiation. The TFs are plotted from left to right
in the order of the first column of Table S1. B,D. TFs not yet implicated in myeloid differentiation
(“non-myeloid”). The TFs are plotted from left to right in the order of the second column of Table S1. Although
many non-myeloid TFs are expressed in PUER cells (B), most of them have low standard deviation and are
thus expressed uniformly in the three conditions (D).
[Figure S2: two bar-chart panels (A, B); y-axis: Expression; x-axis: one group of bars per TF; legend: Uninduced, IL3+OHT, GCSF+OHT.]
Figure S2: Gene expression in PUER cells of the TFs tested in this study. A. Microarray gene expression
measurements in uninduced, IL3+OHT, and GCSF+OHT conditions reported by Laslo et al (2006). B.
Measurements in uninduced and IL3+OHT conditions reported by Weigelt et al (2009). Activity in uninduced condition (progenitor), after 24 hrs IL-3 and OHT treatment (early macrophage), and 24 hrs G-CSF
and OHT treatment (early granulocyte) is shown in blue, red, and green bars respectively. For TFs with
multiple probes, the expression values of the brightest probe were used (Table S3). The TFs plotted left of
the vertical line were part of the first round of reverse engineering. The TFs to the right are non-myeloid
TFs that replaced C/EBPβ, Egr2, Irf4, and Fos in the second round.
[Figure S3: boxplots; x-axis: model number; y-axis: Score.]
Figure S3: Scores of model fits with permuted data. Holding the order of the CRMs and cell types constant,
the order of the activity measurements was permuted randomly until the Pearson’s correlation coefficient
between real and permuted data was less than 0.2. 10 permuted datasets were generated in this manner
and the 8 lowest-scoring models were fit to each dataset in 10 replicates. The replicate scores are plotted as
boxplots; the bullseye is the median, vertical lines correspond to the 2nd and 3rd quartiles, and circles are
outliers lying outside 1.5 times the interquartile range. Scores of the fits with real data are blue and scores
produced with permuted data are other colors. The x-axis is model number and the y-axis is score.
[Figure S4: two dendrogram panels (A, B); y-axis: Dissimilarity score; x-axis: model numbers.]
Figure S4: Hierarchical clustering of models to identify representative ones for further analysis. The
20 lowest-scoring models were chosen for clustering. The dissimilarity score measures how much the
regulatory-role assignments of a pair of models differ. Each model’s regulatory roles were represented as a
binary vector, with $1$ for activation and $-1$ for repression. The dissimilarity score was computed as the
Euclidean distance between the role vectors weighted by $|f_i^{\mathrm{act}} - 0.5|$, where $f_i^{\mathrm{act}}$ is the fraction of models,
among the 20, that assigned an activating role to TF $i$. A. Myeloid-only models. At a dissimilarity score
cutoff of 0.4, the largest cluster, located to the left, has eight models: 12058, 16154, 12089, 12122, 16181,
11034, 15130, and 11098. B. Models including non-myeloid TFs. At a dissimilarity score cutoff of 0.4, the
largest cluster has eight models: 81761, 81762, 81757, 81758, 81697, 81698, 81693, and 81694.
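The weighted distance described in the Figure S4 legend can be sketched as below. The legend does not state exactly how the weights enter the Euclidean distance or whether the result is normalized, so this sketch assumes each squared difference is multiplied by its weight; the function names are hypothetical and the hierarchical clustering step itself is not shown.

```python
import math

def activation_fraction(models):
    """Fraction of models assigning an activating role (+1) to each TF."""
    n = len(models)
    return [sum(1 for m in models if m[i] == 1) / n for i in range(len(models[0]))]

def role_weights(models):
    """Per-TF weights |f_act - 0.5|: TFs with a split vote count least."""
    return [abs(f - 0.5) for f in activation_fraction(models)]

def dissimilarity(x, y, weights):
    """Weighted Euclidean distance between two role vectors of +1/-1 entries.

    Assumption: the weight multiplies each squared difference.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))
```

Under this weighting, a disagreement about a TF whose role is evenly split across the candidate models contributes nothing to the distance, while a disagreement about a TF with a near-unanimous role contributes the most.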
[Figure S5: two bar-chart panels. A. y-axis: Maximum activation; TFs: C/EBPδ, Ets1, PU.1, Egr1, Gfi1, Myc, C/EBPβ, Fos. B. y-axis: Maximum repression; TFs: Myb, Ikaros, Jun, Fli1, C/EBPα, Irf4, Egr2.]
Figure S5: Maximum activity of each TF in model 12058. A. Activators. B. Repressors. The cumulative
activity of each TF was calculated in each CRM. For activators, this was accomplished by summing the
occupancy of each site weighted by the activation efficiency, $\sum_k E_A^{a(k)} f'_k$, where $k$ is an index over all the
activator’s sites in a given CRM. For repressors, the factor by which they reduced the interaction strength
was computed as $\prod_k (1 - E_L^{a(k)} f_k)$, where $k$ is an index over all the repressor’s sites in the CRM. See Figure
2 and Supplementary Text S1 for more details. The maximum activity over all the CRMs is plotted here.
[Figure S6: six bar-chart panels (A-F); y-axes: Activation and Repression; x-axis: binding sites with positions; legend: Uninduced, IL3+OHT, GCSF+OHT.]
Figure S6: Regulatory logic of Cebpa enhancers. A,D. Cebpa(7). B,E. Cebpa(18). C,F. Cebpa(14).
A-C. Activation. D-F. Repression. A-C. The activity of each activator site is plotted. The activation is
the contribution of the TFBS to the interaction strength $I$ (Fig. 2D), $E_A^{a(k)} f'_k$. Here, $k$ is the index of the
binding site, $f'_k$ is its occupancy, and $E_A^{a(k)}$ is the efficiency of activation of the cognate TF, $a(k)$. D-F.
The repressive activity of each repressor site is plotted. The repressive activity is the fraction by which the
repressor reduces the interaction strength (Fig. 2E), $E_L^{a(k)} f_k$. Here, $k$ is the index of the binding site, $f_k$ is
its occupancy, and $E_L^{a(k)}$ is the efficiency of long-range repression for the cognate repressor $a(k)$. The gray
box is intervening vector sequence (Fig. 3D). The x-axis shows each binding site modeled and the position
of its 5’ end in the reporter construct relative to the 3’ end of the proximal promoter in parentheses.
[Figure S7: six bar-chart panels (A-F); y-axes: Activation and Repression; x-axis: binding sites with positions; legend: Uninduced, IL3+OHT, GCSF+OHT.]
Figure S7: Regulatory logic of Cebpa silencers. A,D. Cebpa(8). B,E. Cebpa(23). C,F. Cebpa(24). A-C.
Activation. D-F. Repression. See legend of Figure S6 for the details of the calculation, axes, and legend.
[Figure S8 image: bar plots of per-site activity. Panels A-C: Activation; panels D-F: Repression. Legend: Uninduced, IL3+OHT, GCSF+OHT. x-axis: modeled binding sites with positions in parentheses; "vector" marks the intervening vector sequence.]
Figure S8: Regulatory logic of miscellaneous Cebpa CRMs. A,D. Cebpa(10). B,E. Cebpa(13). C,F.
Cebpa(19). A-C. Activation. D-F. Repression. See legend of Figure S6 for the details of the calculation,
axes, and legend.
[Figure S9 image: bar plots of per-site activity. Panels A-C: Activation; panels D-F: Repression. Legend: Uninduced, IL3+OHT, GCSF+OHT. x-axis: modeled binding sites with positions in parentheses; "vector" marks the intervening vector sequence.]
Figure S9: Regulatory logic of Egr1 CRMs. A,D. Egr1 proximal promoter Egr1(0). B,E. Egr1 enhancer
Egr1(2). C,F. Egr1 silencer Egr1(5). A-C. Activation. D-F. Repression. See legend of Figure S6 for the
details of the calculation, axes, and legend.
[Figure S10 image: bar plots of per-site activity. Panels A-B: Activation; panels C-D: Repression. Legend: Uninduced, IL3+OHT, GCSF+OHT. x-axis: modeled binding sites with positions in parentheses.]
Figure S10: Regulatory logic of Egr2 CRMs. A,C. Egr2(7). B,D. Egr2(10). A-B. Activation. C-D.
Repression. See legend of Figure S6 for the details of the calculation, axes, and legend.
[Figure S11 image: genome browser views. A. Cebpa locus, Chr 7, 78 kb window (axis ticks 35,870 kb to 35,940 kb). Tracks: Refseq genes; Cebpa CRMs; C/EBPα, Macrophages; Egr1, Dendritic cells; PU.1, PUER cells; PU.1, Neutrophils (FDCP mix); GATA2, Megakaryocyte pro. (G1ME); GATA2, Multipotential pro. (FDCP mix). B. Egr1 locus, Chr 18, 42 kb window (axis ticks 35,010 kb to 35,050 kb). Tracks: Refseq genes; Egr1 CRMs; Ets1, Megakaryocyte pro. (G1ME); Egr1, Dendritic cells; EBF1, pre-B cells (38B9); EBF1, RAG1-/- pro-B cells; GATA2, Megakaryocyte pro. (G1ME); GATA1, GATA1 transduced (G1ME).]
Figure S11: Compilation of ChIP-seq and ChIP-chip datasets from NCBI Gene Expression Omnibus.
Where available, BED format files were downloaded and plotted in the Integrative Genomics Viewer (Thorvaldsdóttir et al, 2013). The first track shows annotated genes in the genomic region. The second track shows
the CRMs analyzed in this study. The other tracks show TF binding peaks from ChIP-seq or ChIP-chip
datasets. The TF and the cell type the ChIP was performed in are listed on the left of each track. Empirical
evidence for binding is matched with CRMs predicted to be bound by the TF in the red boxes. A. Cebpa
locus. Tracks 3-8: GSM537984 (Heinz et al, 2010), GSM881139 (Garber et al, 2012), GSM538003 (Heinz
et al, 2010), GSM1218228 (May et al, 2013), GSM777091 (Doré et al, 2012; Chlon et al, 2012), and
GSM1218221 (May et al, 2013). B. Egr1 locus. Tracks 3-8: GSM777093 (Doré et al, 2012; Chlon et al,
2012), GSM881139 (Garber et al, 2012), GSM499030 (Treiber et al, 2010), GSM546524 (Lin et al, 2010),
GSM777091 (Doré et al, 2012; Chlon et al, 2012), and GSM777092 (Doré et al, 2012; Chlon et al, 2012).
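
Matching binding peaks to CRMs amounts to an interval-overlap test between two sets of genomic intervals. The sketch below shows one way such a comparison could be automated; it is not a description of how the figure was prepared. The `Interval` type, the function names, and all coordinates are hypothetical, and BED-style 0-based half-open coordinates are assumed.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def crms_supported_by_peaks(crms, peaks):
    """Names of CRMs that overlap at least one binding peak."""
    return [c.name for c in crms if any(overlaps(c, p) for p in peaks)]


# Hypothetical CRM and peak coordinates, for illustration only.
crms = [
    Interval("CRM_A", "chr7", 1000, 1800),
    Interval("CRM_B", "chr7", 5000, 5600),
]
peaks = [
    Interval("peak_1", "chr7", 1700, 1900),   # overlaps CRM_A
    Interval("peak_2", "chr18", 5100, 5200),  # different chromosome
]

supported = crms_supported_by_peaks(crms, peaks)
```

The nested scan is quadratic in the number of intervals, which is adequate for a few dozen CRMs per locus.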
[Figure S12 image: four log-log scatter plots (panels A-D). y-axis: Model 81762 output. x-axes: output w/ GATA1_01 PWM, w/ GATA2_02 PWM, w/ GATA_Q6 PWM, and w/ CEBP_Q2 PWM. Correlation coefficients printed on the panels, in extraction order: r = 0.91, 0.93, 0.91, 0.87.]
Figure S12: The dependence of model output on PWM choice. Scatter plots of the output of model
81762 (Fig. 5C,E) against models in which the GATA3_02 PWM has been replaced with the GATA1_01,
GATA2_02, or the pan-family GATA_Q6 PWMs (panels A–C) and the C/EBPα/C/EBPδ PWMs (Table S2)
have been replaced with the pan-family CEBP_Q2 PWM (panel D). The modified models were fit to the
data without changing other TFs/PWMs. The regulatory roles inferred with the alternative PWMs were
identical to those of model 81762. The score of model 81762 was 35072. A. GATA1_01 PWM, model score:
46083. B. GATA2_02 PWM, model score: 46826. C. GATA_Q6 pan-family PWM, model score: 45876.
D. C/EBPα/C/EBPδ PWMs replaced with CEBP_Q2 PWM, model score: 74878.
[Figure S13 image: scores of optimization runs. y-axis: Score (log scale, ticks at 10^4, 10^5, 10^6). x-axis: Number of TFs (15, 11, 9, 7, 5).]
Figure S13: The dependence of the quality of fit on the number of TFs in the model. Starting with the
15-TF optimization runs with non-myeloid factors, TFs were removed in order of increasing regulatory
constraint, that is, from right to left in Figure 5C. The scores of optimization runs carried out with different
numbers of TFs are shown. The parameters of each combination of TF regulatory roles were inferred in 5
replicates. The lowest score of each combination is plotted. The 20 lowest scoring regulatory combinations
of the 15-TF optimization run are plotted in red. Upon removing the 4 least constrained TFs, the lowest
score achieved increases, but is close to the range of the 20 lowest scoring 15-TF models. The lowest scores
of 7- or 5-TF runs are as high as those achieved with randomized data (Fig. S3).
Supplementary Tables
Column headings: Myeloid-implicated TFs; Non myeloid-implicated TFs
EGR1, MYB, CEBPB, GFI1, EGR2, CEBPD, JUN, CEBPA, FOS, ETS1, MYC, SFPI1, IKZF1, IRF4, FLI1,
CEBPG, YY1, SP1, IRF1, RARA, GABPA, STAT5A, EGR3, ELK1, IRF2, RUNX1, RUNX3, MZF1, LEF1, FOXO4,
GFI1B, MAF, POU5F1, SMAD, HMGA1, TCF12, RXRA, ETS2, E2F1, NFATC2, NFATC1, ELF1, POU2F1, KLF1,
TCF3, ELF5, PATZ1, POU3F1, POU6F1, SOX17, POU2F2, GATA3, GATA2, EBF1, PAX5, GATA1, SOX2
Table S1: List of 62 candidate TFs.
| TF | PWM name | Source | Accession | Threshold | Consensus score | Kthresh/Kcons |
|---|---|---|---|---|---|---|
| Egr1 | KROX_Q6 | TRANSFAC | M00982 | 10 | 16.92 | 9.88 × 10^-4 |
| Myb | Myb_JASPAR | JASPAR | MA0057.1 | 6 | 12.011 | 2.45 × 10^-3 |
| C/EBPβ | CEBP_Q2_01 | TRANSFAC | M00912 | 6 | 8.711 | 6.65 × 10^-2 |
| Gfi1 | GFI1_01 | TRANSFAC | M00250 | 6 | 15.687 | 6.21 × 10^-5 |
| Egr2 | KROX_Q6 | TRANSFAC | M00982 | 10 | 16.92 | 9.88 × 10^-4 |
| C/EBPδ | CEBPD_Q6 | TRANSFAC | M00621 | 6 | 10.779 | 8.40 × 10^-3 |
| Jun | AP1_01 | TRANSFAC | M00517 | 7 | 11.466 | 1.15 × 10^-2 |
| C/EBPα | Cebpa_JASPAR | JASPAR | MA0102.1 | 6 | 9.706 | 2.46 × 10^-2 |
| Fos | AP1_01 | TRANSFAC | M00517 | 7 | 11.466 | 1.15 × 10^-2 |
| Ets1 | ETS1_01 | TRANSFAC | M01986 | 7 | 10.517 | 2.97 × 10^-2 |
| Myc | MYC_02 | TRANSFAC | M01154 | 7 | 14.689 | 4.58 × 10^-4 |
| PU.1 | PU1_01 | TRANSFAC | M01203 | 8 | 14.409 | 1.65 × 10^-3 |
| Ikaros | IKZF1_03 | TRANSFAC | M00088 | 7 | 13.451 | 1.58 × 10^-3 |
| Irf4 | Irf4_2_JASPAR | TRANSFAC | PB0138.1 | 7 | 13.468 | 1.55 × 10^-3 |
| Fli1 | FLI1_01 | TRANSFAC | M02038 | 7 | 10.692 | 2.49 × 10^-2 |
| GATA | GATA3_02 | TRANSFAC | M00350 | 6 | 9.549 | 2.88 × 10^-2 |
| Elf1 | ELF1_01 | TRANSFAC | M01975 | 7 | 10.829 | 2.17 × 10^-2 |
| E2A | Tcf3_1_JASPAR | JASPAR | PB0082.1 | 6 | 12.875 | 1.03 × 10^-3 |
| EBF1 | COE1_Q6 | TRANSFAC | M01871 | 8 | 12.725 | 8.87 × 10^-3 |

Table S2: Position Weight Matrices used to detect binding sites for each transcription factor.
PWMs were obtained from the TRANSFAC (Matys et al, 2006,
http://www.biobase-international.com/product/transcriptionfactor-binding-sites) and JASPAR (Mathelier et al, 2014, http://jaspar.genereg.net)
databases. The sixth column shows the maximum possible score, which is achieved by the consensus
sequence. The ratio of the affinity of a binding site having the threshold score to that of the consensus site
(seventh column) was computed as $K_{thresh}/K_{cons} = e^{S_{thresh} - S_{cons}}$ (see Fig. 2 and Supplementary Text).
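
The affinity ratio in the seventh column follows directly from the threshold and consensus scores. As a minimal sketch (the function name `affinity_ratio` is an assumption made here, not part of the original analysis), the Egr1 and Myb rows can be checked with:

```python
import math


def affinity_ratio(threshold_score: float, consensus_score: float) -> float:
    """K_thresh / K_cons = exp(S_thresh - S_cons)."""
    return math.exp(threshold_score - consensus_score)


# Threshold and consensus scores as listed in Table S2.
egr1_ratio = affinity_ratio(10, 16.92)   # KROX_Q6
myb_ratio = affinity_ratio(6, 12.011)    # Myb_JASPAR

print(f"Egr1: {egr1_ratio:.2e}")
print(f"Myb:  {myb_ratio:.2e}")
```

Because the consensus sequence attains the maximum possible score, the ratio is at most 1, and it shrinks exponentially as the gap between the two scores grows.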
| Name | Probe ID |
|---|---|
| Egr1 | 1417065_at |
| Myb | 1421317_x_at |
| C/EBPβ | 1418901_at |
| Gfi1 | 1417679_at |
| Egr2 | 1427683_at |
| C/EBPδ | 1423233_at |
| Jun | 1417409_at |
| C/EBPα | 1418982_at |
| Fos | 1423100_at |
| Ets1 | 1452163_at |
| Myc | 1424942_a_at |
| PU.1 | 1418747_at |
| Ikaros | 1436312_at |
| Irf4 | 1421173_at |
| Fli1 | 1433512_at |
| GATA | 1448886_at |
| Elf1 | 1417540_at |
| E2A | 1436207_at |
| EBF1 | 1457441_at |
Table S3: Microarray probes used to determine gene expression for each TF. The probes are on the
Affymetrix Mouse Genome 430 2.0 Array (Laslo et al, 2006). For TFs with multiple probes, those having the highest average expression across conditions were chosen.
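
The probe-selection rule in this legend (keep the probe with the highest average expression across conditions) can be sketched as follows. The probe identifiers and expression values below are hypothetical placeholders, and `best_probe` is a name introduced here for illustration only.

```python
from statistics import mean


def best_probe(expression_by_probe: dict) -> str:
    """Return the probe ID whose mean expression across conditions is highest."""
    return max(expression_by_probe, key=lambda p: mean(expression_by_probe[p]))


# Hypothetical expression values for two probes of one TF across three conditions.
probes = {
    "probe_1_at": [120.0, 150.0, 130.0],
    "probe_2_x_at": [400.0, 380.0, 420.0],
}

chosen = best_probe(probes)
```

If two probes tie on mean expression, `max` returns the first one encountered, so a deterministic tie-break would need to be added if that case matters.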
Models with myeloid-specific TFs only

| Model number | Replicate 1 | Replicate 2 | Replicate 3 | Replicate 4 | Replicate 5 | Median absolute deviation |
|---|---|---|---|---|---|---|
| 12058 | 84588 | 84780 | 84578 | 84486 | 84753 | 102 |
| 16154 | 85816 | 85011 | 108352 | 85830 | 84713 | 805 |
| 12089 | 92488 | 84440 | 92472 | 95896 | 91417 | 1055 |
| 12122 | 84459 | 84612 | 84660 | 84867 | 84619 | 41 |
| 16181 | 84712 | 84858 | 84758 | 94119 | 84887 | 100 |
| 11034 | 84848 | 782231 | 84603 | 84486 | 84351 | 245 |
| 15130 | 85965 | 85307 | 85734 | 98235 | 84588 | 427 |
| 11098 | 84525 | 84575 | 84398 | 84375 | 84984 | 127 |
| 16197 | 89653 | 84115 | 84235 | 94237 | 89867 | 4584 |
| 16261 | 84992 | 94240 | 84869 | 84821 | 89622 | 171 |
| 11032 | 84956 | 85180 | 84884 | 84961 | 85142 | 77 |
| 11096 | 87461 | 85011 | 102838 | 92432 | 85103 | 2450 |
| 12056 | 91119 | 88557 | 85237 | 84942 | 85057 | 295 |
| 16183 | 84908 | 86156 | 93829 | 91515 | 91455 | 2374 |
| 16069 | 83995 | 90393 | 84223 | 84253 | 89386 | 258 |
| 16133 | 84845 | 84403 | 88747 | 84814 | 85163 | 318 |
| 16053 | 91251 | 84757 | 84755 | 84788 | 84795 | 31 |
| 16117 | 85004 | 98869 | 85000 | 84926 | 84957 | 43 |
| 10842 | 85073 | 85038 | 84558 | 87251 | 84930 | 108 |
| 11866 | 85002 | 84847 | 84646 | 84708 | 84744 | 98 |

Models including non-myeloid TFs

| Model number | Replicate 1 | Replicate 2 | Replicate 3 | Replicate 4 | Replicate 5 | Median absolute deviation |
|---|---|---|---|---|---|---|
| 81761 | 36997 | 38373 | 37158 | 37050 | 80891 | 161 |
| 81762 | 35114 | 35072 | 64063 | 35132 | 35144 | 18 |
| 81757 | 40174 | 40129 | 40140 | 40277 | 40234 | 45 |
| 81758 | 40640 | 40634 | 40637 | 40677 | 40629 | 3 |
| 81697 | 48554 | 54404 | 48559 | 48431 | 54211 | 128 |
| 81698 | 47962 | 535358 | 47960 | 47952 | 47962 | 2 |
| 81693 | 48483 | 48476 | 48356 | 49466 | 61827 | 127 |
| 81694 | 53782 | 48199 | 48218 | 48232 | 48201 | 17 |
| 80670 | 47774 | 47874 | 47853 | 47894 | 47822 | 31 |
| 80734 | 41259 | 43291 | 74172 | 41277 | 41223 | 54 |
| 80674 | 47399 | 47620 | 47642 | 47511 | 47584 | 58 |
| 80737 | 49543 | 49558 | 49408 | 49407 | 202858 | 135 |
| 80738 | 39419 | 39401 | 39274 | 42075 | 47061 | 145 |
| 81725 | 54962 | 54692 | 48944 | 55390 | 55200 | 270 |
| 81729 | 55230 | 48777 | 56348 | 54953 | 55548 | 318 |
| 81789 | 52803 | 55391 | 65949 | 52880 | 48257 | 2511 |
| 81793 | 48685 | 48526 | 48691 | 45343 | 54666 | 159 |
| 80754 | 51514 | 51238 | 51612 | 51623 | 51421 | 98 |
| 81778 | 52119 | 79873 | 51957 | 470678 | 52220 | 263 |
| 81522 | 70736 | 53933 | 75414 | 54151 | 53837 | 314 |

Table S4: Scores of the lowest scoring models by replicate. Median absolute deviation is given by
$\mathrm{median}_i\left(\left|X_i - \mathrm{median}_j(X_j)\right|\right)$.
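
The median absolute deviation defined in this legend can be reproduced from the replicate scores. The sketch below uses the five replicate scores listed for model 81762 in Table S4; the function name is an assumption introduced here.

```python
from statistics import median


def median_absolute_deviation(scores):
    """median_i(|X_i - median_j(X_j)|) over a list of replicate scores."""
    center = median(scores)
    return median([abs(x - center) for x in scores])


# Replicate scores of model 81762 as listed in Table S4.
scores_81762 = [35114, 35072, 64063, 35132, 35144]

mad = median_absolute_deviation(scores_81762)
best = min(scores_81762)  # lowest (best) score across replicates

print(mad, best)
```

The median-based spread is insensitive to the single poorly converged replicate (64063), which is why the deviation stays small even though one run scored much higher than the rest.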
Transcription rate parameters

| Parameter | Model 12058 | Model 81762 |
|---|---|---|
| Rmax (fixed) | 1500 | 1500 |
| Θ | 5.77001 | 6.37259 |
| Q | 5.02737 | 4.96505 |

Parameter constant across TFs

| Parameter | Model 12058 | Model 81762 |
|---|---|---|
| (fixed) | 1 | 1 |

Activators

| TF | E_A (Model 12058) | K_cons (Model 12058) | E_A (Model 81762) | K_cons (Model 81762) |
|---|---|---|---|---|
| C/EBPδ | 3.39369 | 0.000557698 | 4.99494 | 0.000453547 |
| Egr1 | 0.598979 | 0.003015459 | 0.402529 | 0.004086745 |
| Ets1 | 4.99863 | 0.000191494 | 1.60987 | 0.012646051 |
| Gfi1 | 0.212086 | 0.39995002 | 0.292196 | 0.39970697 |
| Myc | 4.9571 | 0.000247503 | 4.98925 | 0.064071995 |
| PU.1 | 4.98391 | 0.001604342 | 1.69259 | 0.00891181 |
| C/EBPβ | 0.622789 | 0.001232013 | NA | NA |
| Fos | 0.178452 | 0.078787372 | NA | NA |

Repressors

| TF | E_Q (Model 12058) | E_L (Model 12058) | K_cons (Model 12058) | E_Q (Model 81762) | E_L (Model 81762) | K_cons (Model 81762) |
|---|---|---|---|---|---|---|
| Fli1 | 0.000672614 | 0.999977 | 0.000449492 | 0.00511047 | 0.290777 | 0.010875566 |
| Ikaros | 0.00123356 | 0.998445 | 0.004740461 | 0.933711 | 0.000115092 | 0.24209953 |
| Jun | 0.0225533 | 0.215683 | 0.1833314 | 0.99947 | 0.586878 | 0.002607652 |
| Myb | 0.957583 | 0.248584 | 0.073540587 | 0.317748 | 0.223248 | 0.07360456 |
| C/EBPα | 0.00916877 | 0.0866283 | 0.16941582 | 0.999984 | 0.00303068 | 0.16931703 |
| Egr2 | 0.480565 | 0.00353553 | 2.96E-06 | NA | NA | NA |
| Irf4 | 0.932284 | 0.960832 | 0.001217458 | NA | NA | NA |
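Repressors (continued)

| TF | E_Q (Model 12058) | E_L (Model 12058) | K_cons (Model 12058) | E_Q (Model 81762) | E_L (Model 81762) | K_cons (Model 81762) |
|---|---|---|---|---|---|---|
| GATA | NA | NA | NA | 0.000138341 | 0.545508 | 0.08858245 |
| EBF1 | NA | NA | NA | 0.999986 | 0.870706 | 0.053408421 |
| E2A | NA | NA | NA | 0.999929 | 0.101139 | 0.39385154 |
| Elf1 | NA | NA | NA | 0.983475 | 0.313624 | 0.044488665 |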
Table S5: Parameter values for the two analyzed models. NA is
entered for a TF if it was not included in a model. Model 12058
included myeloid-implicated TFs based on differential expression
in PUER cells. Model 81762 included non-myeloid TFs that were
uniformly expressed in PUER cells. Parameters whose values
were fixed instead of being determined by fitting are indicated.
Text S2: CRM and vector sequences in FASTA format
>pGL3 intervening vector sequence
TCGACCGATGCCCTTGAGAGCCTTCAACCCAGTCAGCTCCTTCCGGTGGGCGCGGGGCATGACTATCGTCGCCGC
ACTTATGACTGTCTTCTTTATCATGCAACTCGTAGGACAGGTGCCGGCAGCGCTCTTCCGCTTCCTCGCTCACTG
ACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAG
AATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGT
TGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAA
ACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGC
CGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATC
TCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCT
TATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACA
GGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAA
GAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCA
AACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAG
AAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGA
GATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATG
AGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCAT
CCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAA
TGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCA
GAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGC
CAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTT
CATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCT
TCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATT
CTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGT
GTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAG
TGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGT
AACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAA
GGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATT
ATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAG
GGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGG
TGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTC
TCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTAC
GGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTC
GCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCT
CGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAA
AATTTAACGCGAATTTTAACAAAATATTAACGCTTACAATTTGCCATTCGCCATTCAGGCTGCGCAACTGTTGGG
AAGGGCGATCGGTGCGGGCCTCTTCGCTATTACGCCAGCCCAAGCTACCATGATAAGTAAGTAATATTAAGGTAC
GGGAGGTACTTGGAGCGGCCGCAATAAAATATCTTTATTTTCATTACATCTGTGTGTTGGTTTTTTGTGTGAATC
GATAGTACTAACATACGCTCTCCATCAAAACAAAACGAAACAAAACAAACTAGCAAAATAGGCTGTCCCCAGTGC
AAGTGCAGGTGCCAGAACATTTCTCTATCGATAGGTACCGAGCTCTTACGCGT
>Cebpa CRM 0
CACGCAAACTCCTACCCACAGCCGCGAGCCTGTAGGCGGCGCGGCGCGGAGGGCTCCCAAGTGGGTGCTCGAAAG
GCTTCGTAGCTAGGAATTGGACACCCGAGCTACCGAGATTAGTGCCCCCATGAATGACAGTAGGGAAAGAAAACT
GTGTCTTCAGGCCCCTGGCTATGGGCCCCGCCTGGGGATCACAGTCCCCGATTCAAGTTCACTCCTCTCCAACAC
CCTGCCCTCTCGCGGCCCCTGTGCGCTCTCCTTAGGGTCCTTTTCCGCGAGGCTCAGAGGACCCACCGGCTGGTC
CCGGGCGTGGGGTGGTGGTGTCCCGAACACTTGACTAGAGTGCTCCACGCTGGGTAGCAACGTCTGCCTGGTAAG
CCTAGCAATCCTATCGCTCTGGCCTGGAGACGCAATGAAAAAGAAAGTTTTCCAGCCTAGGCGAGTGGACGAGCC
AGGTCCACCAGGCTCCGGGGTTAGCGGCCGCCTCCGCTCCCCGCCGGGTCCTAGCGCCCTACTACTCTGAAGAGC
CCGTGGGACCCTGTAGTTCTAGAGAAGCTGGGCGAAAGAGAGGTGCTCTGCCTGGAAGCCGTGGGGTCGCGTGGA
GTTCAGAGAAAAAGACGCACAATCTCTGCGCTCCCGGCCTCGCCACTCGGCGGTGCGCGCTAGGTTGCTGGTCCA
AAGCAGTCTCCAACCTCCCGCGCCGCGGCTCTCCGCCACAGCCTTTTAGAAATCCGGGTGGGAGACAGGCCTAGT
CCCAGCTTTTAACACAAGTCTGCACTACGGTAGCTCAAAACCAACATTCTCTCTCCAAACGCTCCCCAACCTCCA
CCTCCCCTCGCTCGGCCTCTAGATGCTCCCGGGCTCCCTAGTGTTGGCTGGAAGTGGGTGACTTAGAGGCTTAAA
GGAGGGGCGCCTAACCACGGACCACGTGTGTGCGGGGGCGACAGCGCCGCCGGGGTGGGGCTGAGCGCTGCAAGC
CGGGTTCGCCTTGCAGCGCAGGAGTCAGTGGGCGTTGCGCCACGATCTCTCTCCACTAGCACTATGCTCCCCACT
CACCGCCTTGGAAAGTCACAGGAGAAGGCGGGCTCTAAGACCCAGCAGGCACCATCCTACTGGCGCCTTCGATCC
GAGACCCGTTTGGACACCAGGGGGCGATGCCCGACCCTCTATAAAAGCGGTCCCCGCGCGGGCCTGGCCATTCGC
GACCCGAAGCTG
>Cebpa CRM 2
TTAAGGATTTTTCCCCAGGGGACACTCCACAGGTAGAGTCGCGTGCATCTGGCCCAGAGAGCGATCCTCTGCTCA
CACCAGGTTGTTTATTTAGTGTATCTGCTTTTTATTCACTCTGTTCAAAAGCATCAAGCTAAAAATATCTTTTTT
ATGCGCTCTGCACCGAAAATGAAGCAGTTGAGCCATTAATCACTGAGAATATGTTGAATGAGAATTTGATTAAAG
TACAGGATGCGTCCTTCAGACGAAGTCGGTTCTTTACTGTAACCGCAGTGGAAGCACAGCAGACCTCCCTAACAG
TTTTTAGCCGCTCCTCTTGGAGCCTCCTGGGCGGATCGCAGGGCCTTTCCTGGCCTAAAAAGGCCATTGCACTGG
GGGTCTTGACCTGCGACCCTAGAGCGC
>Cebpa CRM 5
AGTGTGACCAGGCTTTGTGGTTAAGATGTCCCACCAACCCCTTCCCCCACTGCCCTCTCTGTCTTATGCCTTGCC
ACCTCCTCACCCATGGCTGCCCTTGGCAGAGGAAAGGCCAAAGCCCAGTGAGGAGGCGGTGGCTTGCCCTGTCAG
CCTGGGAACTAGGGAGGCAGAGCCCCATCGCCTGCCTAGAGGCTGGGGCAGGCCGTGGCTTGCAGGCCCCAGCAA
ACCCTCTCTGAGTTCAGTGTGGATTCCAGTTTTTCTTGTTTACAGGGAATGATCCTTGAATTCCCCCTTTGCTCC
CAGAACAGGGCTTAGGCTGAACGGGGGAGGGGGCCCTCCTGCTCTCCTTTCCTAACCACCTGATGTGGGGGTGGT
CATCTTTCTAAGGGCTCATGGAAGGGCACCCTTCCTGCACACCCAACCCAAGTTT
>Cebpa CRM 6
GAGAGAAACAAGGGCGGGAGTGAGAGAGGGAGGTGACTTCTTTGCCACCAGATAGCTTAACGCTGTCACGGATGT
CACCCTTATCTCTCGGATCTCCTGGGAATCTGTGAACACGGGGTTTGGCCCTGACACCTGCCATGAGCCACTGGG
AAGGGTGAGGAGTGTCCACCCCTCCCAGGCTTCCCGCCCCCTCCTTCCCAGCCCAGGTGGAACCTGCCTGGCCTC
CACAAGCTGCTGATTTCACAAGAGGGGAGGGGACCCCTCATCTTGAAGGTCAAGGGGAGCAGAAATTCTGATCTT
CCTTAACCAAGTGGTAACATCTTCCAGAGCACCCATGGGTCTTCTGCAGACATGAAGTATGATGTTAATGTTCTA
GCTGTTGCTTAGCTGTGCTAATGGATAGAGGAAACCCTTTTCTTTTCTTTCATTTCTTTTTTTTTTTCCTTTCTT
TTCTCCCTTCGCATGCCCTCCCCTCCTTCCCCTAACCTGCCATCTCTTCCTTGCTTCTCTAACACCACCTCTTCT
TTCTCAACCCCACCCATCCCCACTCTCATCCTTCCCCAGTGGACAGAGCCTTTAACGAAAGCTGTAAGCTAGGAA
AGAAACCAATTCACTATCAAAGCATCCTAGCCCCATCCCAGAACAATTAAACAGCCAGATGGAATCTCTGGGAAA
ACGCATTAATAACGCGAACAATCTGTGCAGGCCAGGCAGGAGCCCAGGAGCCTGGAGCAGGGGTGGGGTTTGGGA
GTGGAAGTGGGCAGAGGAAGGCTGGGGATGCTG
>Cebpa CRM 7
CCCACTTCCACCCCCTAAGAATACTGGATCCCTCTTGCCGATAAGGAACTGTGGTCAACTTCTAGTGGCTTTCCT
GTGCACGTGTTGGGCAACCAAGCCTCAGCTGGACTTAGTTGCCAAGCCCAGACAACAGGTGGCAAGGGGGTGTCA
GGGACTGGGTACCAGCTCTTTGGGGAGCTGCCATGACCTTCACCATCAGGTTAGGACCCGTCAGAAGTGGCCTCC
TTGAGTGATTTACAATTTGCAAACATGTTTTATTTGATTCCCGAGTTCTGCCGGGGCAATTACAGTGACTAAGCA
TGACAAATCACTCTCAGCACAGTGTCTGCTCGCTGTTAAGAAATGTGTTTGCCTCACTGTTTTGCCTGGTGCGGC
AACATTTTAAAAATAGACTCGCTCACTGTACGCGAAGGCAATTTGTTCCAAATTTTCCCACTAATTTGATTTTAA
TCTGATATTTAAAATTCGTGTGACCACATTCCCACTGATTTATAGGGAATAAGCCCTACCTGGCGGCACTGTAAT
TGGCTTTGGCCCAGGAGTCCACAGGACAGAGCATTTATCCCAGAACAATTTGAAGGCACTCATGTCTTAATGTTT
TAAATATAGCCTAATTTAGCCTCACAAGTTCTGATTCCCTGGGGGCAGGACAATGAGTGTTAAGGTTGCTCTGCT
CAGG
>Cebpa CRM 8
TTTGTCTTTGCTGTCTCTTGGTCTCCATCCTCCACTGCTACCCTCCCTCAAGGTCACCGCTAGTTCTTAGAACTC
CCCTGGGTCCTTGCTTGGCGTCTTTGCCGACCTTGGCTCCAGCAGACATTTAGTATCTAGCCCAGGGCAGGTCAT
TGCACATGTGTCTGAATGCTAGTATCTCTTTCCAGCTTATCTCTCTCCTTTTTGGAACACCCTTGCCCTCTTCTT
TCCAGCCAAGGGGCTCCAGGGTTGGCTTAAGGTCCACCGAGTATGGGCTGAGGGTGTCATTGTCAGGAGCAGCCC
AAAGGATGGATCACAGACTTCCACCCGATGGCCTTCAATAGATGAGTTCTTGCTTCTAGAAAATGACTTTTAAAG
AACAAGACTCTAGCGAGGTGGTCTGCCTAGTCCTTGAGAGCATTCAGTGGCAATGTGGGAATAGTTTATCATCAG
ACTCAGAGTGGCCAGGCCTCTGCAGAAGGCTATGCTTTATAGGGACACTGGGTGGGGGAGAGCTGGAACTCTAAA
GAAGAGAGTGAGGAGGAAGCCCAAGCTATCTGATATATGCCATGAACTGCTGAGGTGAAGGCCCACTCACTGTAC
GGCCCGGACGCACTAGAAGCAACAGTTCGGAGTTAAAAGATAATTTCAGACCTCCATGCCTCTTTTCTTATCTCC
ACTCTAGGATGCCTGGAAAGGTCTTCCTAGAGGAATGAGAGCCCAAGAGCAGGGCTAGCAGTGGTCAAATGTTAA
CAGTCTAGTTTCAAAACACATTGCTGGGTGACCCAGAGAGCCACCACAGCCTGTCCCAAGCCCTCCTTGACCTGA
AGCTATTTCTACTCTTGAAATATAAGAGGAACCCCTGTCCTGATCCAAAGCTACAGAAGTCATAGAGCCCCACAC
CACTCTATTGCCCTTGAAGCCCACCAAGAGCACCCTCCCAGAGCTGCCCACCCCCCACTTCCACCCCCTAAGAAT
ACTGGATCCCTCTTGCCG
>Cebpa CRM 9
CCCTGTGGAAGAGTTGGTCAGGCTGGTCCTCAGACAACCAGGGAAGCTCTTGGGGTCCTGGAGAATAGGCACATA
GCAGATAAAAGGAGTTCTTAACCAAACTTCCCTAGAACGGAGGGAGCTAACAAGAAAGAACTTTGGAAATCTACC
CTCCTCTTTCCCTGTCACTGCCAGGAATGTCACCATGAGAGCAGTTTCAGTTAATGAGCAAACTCCTCAGACAAG
GCAGGAAGGCAGCTCTTGGGCCTCACTGTCAAGCACAGGAAGCGACTGGATTCCACTTGCCCGGTGTAGGGATGA
CAGCAGGTATTGAGTGGGACTGCAGGCCTGACATCCTTAGCTCCTCCACACCCAGGACAGCCCGGCTGTCAGCAC
AGGGCAGCAGAAAGGACAGGGGACAAGCTCCAGGTGTGGGCGAGTCCCAGAGCAGCCCGGGGAGAGTGTCACTGT
GTGGGTGCTGGCGTGGGGGCAGGAGCACACCACATGCAGTTGCACGGGGGACTGAATCTGAGGCTTTGGGGGAAG
CATCCAAGCCCCAGGGTGTGTGTGAGGGGGTCCCCCACATGAGCAATTCCTCAGGCCCAGCAATGGCTGATTCCC
TCTGCTCGAGGAGAAATCTCATGTGAGGAAGGTGGAGTCAGGTGAGTCACAGGCCAGGCCCCTGTGCCGTGGGGA
GGTGGGGACGATGGCTCCGAGCCAGAAATGTGTCAGAGGCCAGATGAGCATCTTGAGGGACGCGAAGTTTATGTA
ATTATGTGGGGCAGCACACTGCTCGGGTGTGTGAGAGCCATGCTAGGAAGGAGAATCGTCTGCTGCATGTCCTCT
GCCTCCCTGGGCTATACTGACCTAACAGCCCAGCATCCCCACAGCTGGCCCGGCCAGCTGCCCACAGTCACAGAG
TCCGAGTGTACCCAGATCATGTCTTTGTAGGACCAATGGGCTAG
>Cebpa CRM 10
GGTGACATCTCTGTCTTCGGTCACCACAGGCTGCCAGCTGACCATCTCTTTCTGCCTCCCTACGAACCAGTGTGT
AGTGGCAACAGGGCATCCATGAGACTCCATGAGATGTTGCTCACAGTGCCCAGGCCATGGTGCTGGTTACACAAG
GCCTGAACCTGCCTTCCCACCCACTGGGAGCGTGAGACCGTGTATACCCTGGCCCCCACACACACCTGCTGTGTG
ACCATACGCTGGTTTGGCAGCCTCTCTGTTCGAGCATGTCTAAAAGGTGTGCCGGGATTCCCAGAGGCAAGATGA
TGAGTGGTGTGTTCGAAAACACAACCACAAGAGGCTGATCCCAGGACAAGTCACAGTGAGAACAGGGAACGGGCC
ACCTGGCCACTAAGGTTGGGGGGGGGGTGCGGTGAGTCAGAAACAAACAAGCCACCTGCCAGCTAGATTGTTCAC
CTCTCCAAGAGAGTGAAGGGTTGGGAGTGAGAGGGCTGCAGGACCCGGGTTGAGTAATTCCCTGCCTGCTTCTGG
AGCCCAGATCCTGGCTTCTTGCCAGCCAGGTCCTAGGTGATGCTATTGTGCTAGATTAATAAATGGCATTACTCG
TTTCAGC
>Cebpa CRM 11
GGGAGGAATAGAGAATTGAGATCAGTAATCTGTCTGGGAATCCTGGCGGGGGGCCAGTCCCCTGGGGCAGCCAAA
CGGGGTGTTGTCAGCCCACATGAAGGCCCTGCTGCTGGCCACATTCTGTAAACAAACATCCACATGTGTGCGCAT
AGCAACCTAGTGCCAAAGGACAAACACAACTGTGACAAGTTTATGGAGCTGTAGTATCGGGGAGGACAGAGATAA
TTAGGCAAATCATAAATGCATGTAGTGATGCCTTTTGTCTGTGCGAACCCTGCCTGTCTGTTGGGAATTCCAGGC
TGGAGATGGGTGGGGAAGGTGGTTTGTAATCTTTATTCAGACTCGGCCCCCAGAGTCCTCTCAGATGAAACAACT
GGGGGACTGAGGAGAAAGTAATTCTGATTTCTGGGGTGACTTCGTAGGCACAGTCAGTCGTCTTTCCTTCCCTGC
TCCCTGTACCTCCCCAGATAGAGTCAGTTTACCCTCAGAAACACCTCCCTAGAAGCTGACAGGTGCCTGGGGGAA
TTCAATGCATCCTGCCTGCCAGCGCCAAAGAAGGGATCTCAGACACAGAGCTGAGAGCAAGGGGGCGGGGGCGGG
GTGGGGCACAAGTCACAACTTTGAGATGGAGGTGGTTTGAGTGGGAATGTGTGGTCAGGGAAGGGCTCTTGGGCA
GGTAGTGTTCAATGTGGGGCCACAATAACGAGCTAGCTCCAGCTACTCAAAGGACTTCTCTGGGTTGAAGGGCGA
AACTGCAGATGGCTCAGAGAG
>Cebpa CRM 12
TTGGCTTGGAAGTATAAGAAGTGAGTTTGGGGAGGCAGAGAGGAGAGGAACAAGGAGGTGAGAGCAAGCCATGGA
ACTCATCTATTCTGGGACAAGAGTCTCTGGCTCAGGAGGGAGGCAACACTGAGTTCCAGCCCAGCAGAGTGTGCA
GCCCCACCAGGGATCTAGAGACCGGCTTCCTCTTGGTGTCTTCATGCCACAAGATGTAAAATAATCCCCTAGGCT
CAGAAAAACACATTTTATGGCTGTCTGACTCATTTCCCACAAAGCTAGGTTCACCCCCCCAAAAAAACGTTTACA
GCTTTTGTTTTGAAAAATAAAAAAAGTATGTTGAAGATTTTAGGATGACTTTTAGACATAAATGGTTCCCTTCAC
CCTCGTCCCAGGCCCCCCTCACCTAATTTCTCAAATGCTGGCTGCCAGCGGCTGTGGAGGATCCAGTCCAAGTGG
GTGTGGGATGACTTGGGGAGCACTGGGCCTCATAGCCCCAGGGCAGGGCAAGTTGATGCCTGCTGGGCATTGAGA
CAGGCCAGAGCTCCAGGGTGGGGGTGGGGGTGTCCCTGAGCCTGCCGGTTTGGGGTCTTAGGCAGTGATGTCACT
AGCTTCTAGCTTGGCACTCTCCTTGGGGGACATGAGGGACAAAGGCCACATCAAACCGGTTGCTTTTTTG
>Cebpa CRM 13
CTCAGAGTTAAGGGCTGGATCTAGGACAGACATGGCTTCAGCAGGAGGGGATGTGGACTGGGGGGTCTTCCAGCT
GGGCTAACCCACAGGCATCAGGGGACAGGCAAGAGGCGACCAGGACTTCCCTAACCACCAGAAAGTGATTCAGCT
AAATTGAGGAAGACTTCCGTTTCAGCAGAATGCATTTTCCACTATGGTAAGAGCTTTCTCACCCACGGGCTCTAG
AAGATCAGAGTCCTCCAAAGAGGGTTTGGAGAGTTTCGGTTTGAATAGTCAATGCTGATTCTTAATCATTATTTT
GAAGAGGCAGCTCTTGGTTTTCCAGCAGGTGGCTGGGTCTCAGGGCCGTCAAGCCAGGTGTGTATGTGAATACAT
GGGTGCATACCGTGTGAGCATTTGTCACCTTGGAGGCCAGGAAGGAAGGCAGCCATGATCATTCCTTTGGTCAGT
GGTAAGCTCTGTCCAGGTCTGCAGAATGACCTATATATGCCACCCTCACAACCCTGCCATGAGAGCTAATAGCTA
CACTGAAGTTTCACTGTGTGCCAGGCAGCAGGCTGGGCACATGTGCCCAGCCTTGGGAACCTACCTATCAGCTAA
GGCTCTAAGAGAGGAAAATACCAGTCAACATTCAGCCAGACACACACCCAGGTAGCCTAACAGCATCGCAGCCAC
CCTACCTTCTGGTTCTCCTGTTCC
>Cebpa CRM 14
GTAAGGAATCACAGGGGTCAGTCAGGGCTTCCCTAACTGGAAAACCCAAGTTCAGAGGTACCACAGACTATGACT
GGGGTTAGAGTTGGAACATGGGGTAGGCCGACCTGGGGTACAAGGGAGGAGGACCCCCAGTGTTCATACCATAGG
GGCACCTGCTTCTGCTAGACAGTGGGGGGGGGGGAGCCTGAGCCATGAGAGAGCACAAGGGAGGTCAAGGGGCAG
AAAGGCCAGAGGGTGTCAGCAGGCTCCAGCAGGCTGTGGACACTTGGCCAGAAAGGCCTGTTTACTGAGAGGCCT
GGGAGGTCAAGGCCCAGGCCTGGAGTTAATCATTAATGGCTCACCCTGCTCGTGGCTGCCTAGTGTGGTCTGGAC
CAGGCCCCAGTACACAGGTACTGCCCCACTGCCACGCTGTGTGTATGGGGGTGGGGGCGGGCAGGGGCAGTACTG
GTGTGCTTTGGAGACACTGACTTTCTGAAACACCCTA
>Cebpa CRM 15
AGCTGGCAGTGCTATTGGAGTTGGAGGAACTGGCTGCTGGTGGGAGGAACTTAATGGGGGCTGCTGCCAGAACTC
GGGCCGCCTGGTCCTGCCCCTGGCCCCGGGCAGTTGATGGAGTCCAGAGTGGAGGCAGGCTGCCACAGGTGAGGT
GGGGGAAAACTGGCAACAAAGGCCTATTCTTCAGGGTTTAAAGTGTCTCCGAGCCTGTCCAACTTTCCTTCATTA
GATGCTCCAAGATGTTTCCCCTTCCAGGCTGCTTGAGATACGCGGCTGATAAGGGTACGATTTGAAGAGACTTAA
TTATGGCCCCGAGACCATGATGTAAACATCTGTGGAGTGTTTGCTTTCCCCTCCCTGTTCCGTGTGAATAGGGAA
CGCAGCCCAGGTGCCAGGGCAGAGGACCCGGTGACCCCAGCCTTTCCTAGGAAAGGAGGAAACCAGGGGCAGTGT
GTTTCTGTGGTGCTGGCCAGGGGTCTGGGTCCTGCCATGACAATCTCCTGACCATTTAGTCACAAGAAGCTTATC
TGTGATGGCCGTAATCATCTCTCACGAAGGCACCAGACAAGGGCTCCGTAAATGCTGCCTAGCGACATGGAGGGG
ATAGGGAGTTTCCCAGGCTGGCCTGACCTCCACAGGGGCCTCAGCCGTCTAGGAGGAAGCATCTGTTCCCATAGC
TTTTCTGGCCAGCTGGATGACAGGGAAGAGGACAAGGCTTTGGCCAGTCAATAGCCCTACCGTGTTTACC
>Cebpa CRM 16
AAAATCAGTTTATCCCTATGCTGCCCCAGGCCTGGTACCCAGCATGACACAGCTAAGTTTCATTTGGGAAGAAGC
TCGGTTCAGAGTTAGGAATGGGAGGAAATGGTGGCCCCATCCCAGTCCGACACAGATGCATGCTATACCTAGTTT
CACACACATACATCATGCATATATACTGCAAGTTTGTCAGGTCAGTCCTTCCACTGCCTCTAGGAGAATGCTCAC
AGGTGTGTGTGCGCACACACACACACACACACACACACACACACACACACACTAGCCAGTGAAGTGCTGCTTAGG
AGTAGTCTGCATCTCCGGGGATGGGGAGGAAAAGTGAGAGAGGAGGTGACCATCTAAGGTCACTTGGGCAGGACA
CCCTATGACATTAGGCAAGTGTTTTCACTGAAGGCCCCATTGCAGGAGCTTCAGGGACCTCATTAATAAGGACCG
TGGCTCAGCTCTCAGTGATGACCCAACAGAGGTGCAGATCCTGCTTCCTTCTGTAGCTGTCCCTTGGTTGGGTCA
CTTGTCCCAGCATACACACACACACACACACACACACACACACACACACACACACCATGCCTCCACTTCAGACCA
GGAAGCCATTTTGATCCGTGATGAGAAGAGGAAGTCCTGTGGCCCCAAAACAGAAACAGGAACCTGGGAGGGGGA
TATGGATAGGGAACGGGAAGGGGGCACACACCCAGCAGATGCAGCCAGAACACCCCAGCCTCCCCTGGCCCCACA
TTAGCAGGCAGGAGTTAGCATCTACACCAAATCCCGATGCTATCTATGGTCCTCTGTGCATCTGGACGTGCCATG
TGAGGGTAGGGCTAAGGGTCCCTCTAAAGCTGGATGCCTGGCTATTGGGCTGTGTTCAATTACCCAGGCTGATCT
GTCCCACGGAGAGATGAATGGACCTCTGCTGTCCACCCACATGTCTGTGTATCATCTGACCGTCAGGCTGCGGGC
TCCTTGCTGTTGCCCCGTGGGATCCCACACAGAGTTGTCCTCAGCCCA
>Cebpa CRM 17
ATGTGGCAGGAACCTAGTCTTTACCCTTACTGGGCAACTATACTCAGGATTGTTGTGGTGCAGTCGCTGCAGGCC
AGAACTGGGAATCTGCCCCATGGGCACCACTGTCTCACACTATCCTGGGCCAGGCCCATCCTCGGCCAGGCCCTT
TTTTCTGGCTGGGCTGTGTGGAGGCAGAAGGACAGCTCTCATTCGGGGAAGCCATTGGCTGCTCCTCCTCCTTGG
GTTTGGAGGTTCCTGCGGGAAGACTCTTGAGGCGGGTGGGCGGGGGCCGGAGCCTGGAGGACAGAGACATATGTC
ACCAATTTCAGTTCTGCCCCGGCTGGCCTGAAGCAGTTTGGGACTTCCCAGGGTGGAAGGACAGGCTGGGGGGCC
AGGCCACCTGGAGGCAGAACACACTCTTGAGTCCCCCTCTGTACCCTAATTGTCAACAGGGGAGGTGTGAAAAGT
TCTGGGCTGCCTCCATAGGGTACCTGCTTGCCTCTTCTCTTCTTGAGCCCTTAGAGCAACCTTCCAGGACTTCCT
GTTAGTCAACCCAAGATTATGGCAATAAGATACTTTTTTGGCCCTCTGAGCTCCCAGAATGAAGCCTTTAAGAGT
TGCGACGCTCCAGGGGCACCTGGGAAAGATTTTCAGTATGAGAGTGAGGTGGTCTGGATAGATGACCCAGGAAGA
CCACATCCCCTCCCCAGACTGAAGGATAAGGCAGAAGCCATAGCACATCCACCCTAAAGGGCTTCACCATCCCAA
CCCCCCACAACTGTTTCCTAGAGGAGGAACAGAGGGAATGGAGTGGGCAGGTGAGAAAAGCCCCCACCTCTGCCT
AGGGCGGTCACCCCCAAAGTCAGCTGGGAAACGGCTTGGCCTGAGGTAGCTCATTTGACCGGACCCTGTGCTCGG
GAGATCAGTTGTGAAACTTGCTTCAAACCCCCAAACCACAAAAACGGCCGATTTTCCCAGCTCTGTTCACATTTC
TCAGATCAGCCAAGGAGTCTTTATTATTTCCATCCGAAAAAATAAGCCAGGCCAGCCACCTGCCCTGGGGGGAAC
AAGAAAATTCCAAAGGTGCAGAGCCATGGTTCTGTGGAAGCCCCTTCCACACTGGCAAGCTGTTCCCACCAGCCT
CTTTACCGACGTGGGGCCCGGAGAAAGGGCAAGGCCTGCTGGGCACTTCTGGAGAAATCTGGAGATCCGAACA
>Cebpa CRM 18
ATGCCACCCCTCTGATTTTGCCATGCCCACCCCCTGATTTGCCATTCATGCCCGCCCCATGCCCTGACTACCGGC
GACCACAGGAAGTGCTGCCCTAGCTCAGTACTTCCCGTTTCTGAAATCTGCCCCCAGCAGCCTGGTGGCCAGGGA
AGGCAGACTTCCCGCTGCCTCCACCCTGGGCTCTTCCCACCGGTCACAAGTGGTTTGTTCCTGGGTAGAGGTGAC
CTTCTTGCCACAACCACACATCAGTTATTTATCAGAACAGGAAAGATGGCACCAGAGATATGTCCTCACCCCGCC
CAGGAGGGCCTGACCTGCTCCAACGACCACACTCCTGTTCCCCACATCACACGGGGCCTGCTCACCACATCACAT
AGAGGGGTGTGGCTGGCACGTGCCAGCGGGGCTGCATTGTGGGGGTGGGCAGAGCAGGAGGTCTGGTGGGCAGGT
ATGGTGGGACCTGGGACCAATGGGCTTGAGCGAGGCTCTCTGTGTGGCTGGGGGCTGTTGAGACATCTGGTAACC
TTTGGGTCCCATGGAAAGCACCTTCCACGTAGACCCTCTCCTGACAGGGAATCCCTTGGGGTTGGGACATGTGCT
CAGAATGGATATGTGGCTCTATTCCAGGGGACTCAGTGAGG
>Cebpa CRM 19
CCTGTCTCGAAAAACCAAAAAATAAGACCAAATGCTTACACTGGGGCTGCCACCTTTCATCTCTTGATTTTGTAC
CAAGTTTGGGCTCTTAGCTCGTTCAGATTTGTTAAATGAACCAAGGGCACAGATTGTTGGAATCTGCAGGAAGTC
AGGATAAGGTCCGGATAGTGAGGTCAGAGGAGCAGGACCATGCTCAGCAGGCTGGCCTGGCCAGTCTGAAGTCTG
CAGCAGTTCCTTCCTATCACATGAACCTGCAGGGAGGCAGACCTGTGCCCCCAGGAGCTAAGGACCAGCCATGTC
CAGGTTCTATGCCTGGCCCAGCTGGGACCCTCTTTGTACCAGTGCCCTTACAGACTAGACACACAGCAGAGCCAG
GGTAGCTTTTGGGCCCAGCTCTGGGGACCTTCACTGGCAAAGACATTTTGTCCTGCCCGGATTTTTCCCAAGAAA
ATAGGCTCTTTCTCCTCCTTCCAAGGCACTTTTTTGTTCAGCCAAGCCCCTGGGGACCACATCTGCCTGAGGGGG
GGCCTCTTTCCAGTCAGCCCTAGGCACCGCTGCATTCAGTCTGGTATTTCGGGAACCGCTTCAAGGGGTTCTGGA
TTGGGTCTGCTTTGGGGAACCGGGAGTGGCGTCCTCTCCGGCTGAGTTCTAGGAGGACCTCTGGGTAGCTGCTTT
CTGCCACATAGTTCATTCTGTGTTCATTTAACAAGTATTTGAGGCTCTCTTCTCAGCCAAGGCCTGTCCCAGTCC
AGGGGAGGAGGGGTATGCAGAACAGGACAGACAAGGT
>Cebpa CRM 20
TGTCAGGCAGCTCAAGAATGCAGTCGGCATCCACTCAGGGCAAGGTCCCTGTCACTAAGCCACTAGGCAGATTCC
AGTGACCTATTAACCTTCTAGTAATTCCTCCCAGAAGACCAAGAAAGCTGGAACAACCTCATCCTATGAGGATCA
CCCTCAAGAGTCTCTCACTCCAAGAAACACTGAATTCCAGAAGCACACTGTTTTATGGGGAATCTGACACACTCT
GGTTTTCCTTAACTGTTACTGGTACACAATGTGTGTGTGTGCATATGTGTTAATCTGGGTGAGATTAAAAACTAT
TTAAAAAAAAAAGAAAGAAAAAGAAAAAGAAAGAAAAGAAAAGATTGCATTTGGAACGAATGCAAGGTGGATTTT
CTAATCAGATATATTTTTACCTGGATGCTAAAACATGCTTGTTGAAATTCAGTTTTTCCAATGTCACCCCTACCT
GTTGACAGTGTCAGATGGGAGTTCTAACTTGGTGTTTGCTTGGTGCCCATGATGGTGCACAGGGATTTAGGCCTT
GGAAGTTATCTGGCCAACAGGCAACAGGAAGTGTCACGAGAGTGGGTTATGTGCTTTTCAGTTGACCTTTGGGTT
TTTGACTGAAAGCGCTCAAATGGAGACCAGCGGGAGAAGTGGGCTAGGCCAGTGTGCCCTGCCCATGCTGCATAA
ACTGCTAATCAGTACAGATGGGAAA
>Cebpa CRM 21
GGTGCCATATTTGGTGGAATCACAACCACACAGAGACATTGCAAAGGCCTTTGCAGGAGAAGGTCACATGCCAGT
TAGCAGCCACTAACCCAAAGGAGCCAGCAGCAAGCTACAGCTACCAAGATGAGGGAAGTATGGTAGGACACACCC
GAAAAAAAAAAAAAAAAAGCCAACTGTGCTTTGCTGCTGCTTTCTACCACAGGAATGTAGAGAAAAGGCTGGCAG
AGCACAAAGAGGAACCAGCGCTATTAAAAGAACGGAGGAATTAGCTCGGAGTCAAGCTCCTAGGGATGGCTAGGG
TCATCTGTTTCCCTCTCTATACTGATGCATCATGGGCCAAGCCCACAAGGGAATGTGATGGTGTCTTGAACAAAA
TAGACTAGGCACCCAGCAAAATCTCCTTGGAAGAGAAGTCTGGGTTCAATACTAGACCCAACCATCAATACCAGA
CAATATACGATGAGGACCAGCTAGACCTGAAGCCCAGGAAGTGCTATTTTTGGACCCCTCTGCACTGAAAGTAGT
TTGCCTTATTGGAGGAAAGGTCATTCTCTCTACAGATCCATTGAGTAGCAGCAAGAAACTCAGACTCTGTCTTCG
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTAGGGTGTTAGATGTAACGG
>Cebpa CRM 22
GCTCCAGCCTGTATTATTTCATTACTTTTTAAAGGGCTCATCGATCTAAAACTTGTCCAAACAAGAGCTCTCAGG
AAGCACGCGGGAATTGAGCGTGTTCTCTTGCTCACGGACGGGTTTAGTAGCTGCTACGCCTACTTCCAAAAACTC
CCGTGAGACCTGCCAGGATGCCCTCCTCCTTAGGTGGAGGATGGAAGGACCGGGCCTTGAGCCAAGGAAGCGCCA
GCTGGGTAGGAAATCCCCACCTGTGATTACTGGGGTATTCATTAATCCAGGATTGTTGTGAGAAAAAAAGAAGAA
ACTGAAGCTGGTTTTGACAGAGAATTCTGCTAGAATCACATGAGGTCCGCAGATGCCAAATTCTGTTAAGGGTCT
AGGGTAAATCTGCAGCTCGCTCAGTTCCACCCACCCCCACGCCCACTCCCACCCACCCCCGCGTCGGCTGTAGCC
AAAGCCGCAGCTTGGAGAGCTCAGCTCTGCCCCCTGGGATAAGGAAGAAGAGGGTAAGGGTTGTTTTTGATTACA
GGTTTCTATGGTAACCAGGACTCGCAACAGAGGGGGTTTGAGCATAGCAACCGCAGACTCAGCTAGCTTTGGGAC
ACTCGAAGCCAGAGTTTCGCCTAAATAATCCGTGGGGATAAGGAAGCACCGTCCTTAAGGGGGATCCATGGTGAA
AATTAGTTTTGGATTCCTTGGGGATAAGAACTACATCTGGGTAGCTGCGTTGGGTCTATTCTTGTCTTGTTTTTT
TTCCCAGGTTGCTTTTCTTCGACTTCTTCTGAAACCTTTCTTCTCAATTCC
>Cebpa CRM 23
CTCTTCCCCTAGGCATCTACAATGGACCCCAATCAGTTGTCACATCCTATCAGGAACCCAGGGTTAACACGTACC
TCCCAGTGAGGCATTGTGCTTCCTACAGTGATCTCCAAGGAAGCCTGCACCGTGCTAGGAACACAACAATTACTA
AAAACATATGTGTTGACTGGGAATCTGAAAGCATGTACGCAGCGTCATGCTGTGCTGTGTTCTGGAAAGTGAGAA
GAGCGGCTAGGGAGGAACAGACCAAGGCTCCCACTCAGCTGGGGTCTGTGTGACCTTGGGCTGGTTATCTCTTCT
CTCCGGGTCTCTGTTTCTCCCTCTGCAGACTTGCATGCTAATATATGTCTCTCTCAGGGAAAGGCAGAGGCAGAC
AAAATGTCTGGCAGGAGTTTGGCATCCAGTTCCCTACTTGGTCCTTCTAATCGCCTGTAAGCCGTTTTCACATGA
CGCCTCGGCCGCTTCACGCCATGATTAAAGACAAATGTATGTGTTGTCTTTAAATTGTTCTTCAGGGACTCTTCA
GTTAACCCCCAAATCACTTTATATTGTAGAATAAAATTTTTCTAGGATTTTATATCTTGCTCTTAGGAGATTGAT
ATTTACTGGTAATTTTTTTTTTAAGATTTACTTTTGGCATTTTAAAGTGTGTGCATGTGTATGTGTTTGTGTGTG
TGTGTGCATGTGCGTGTATATGTGAGTACATGGGGGTATGTGTCTGTGT
>Cebpa CRM 24
CAGCAGCTTTCTATCAACTTGTGACTGTTTCATGCATGACCGTAGGTTTTACTCACACCTAAACAAACAGAACTT
GATTTCCCTCTCTGGTTACTATAATTCATCCCGAACATATATCTGTCATGGCCCCGAAGAGCGTGTGCAATTATT
CTAGTGTGGGAGAATCTCAGATAATTAGGGCCAACCTCAATGTTTTGAAGGTGTGACTGTTGTGGGAGGAGGCAG
ATAATTAGTTAGATTTGAACAATCTGTGTGCCCGGTGTCTGTCCTTGAGAGTAATTGAAGCCCTTCCAAAACAAA
AAACAAAAAACAAAAAAAAAAAAAAGAAAAGAAAAGAAAAAAGAAAAAAAAATGACTTGTGTCCTTTAGAACATT
CATCAAAAAAACTTTAAAATGTTTCAATAATTCTTAAAATGTCCATAATTGCCCTAGTTCCTTGTGCTTAAAGGG
ATTTATTAAGAGTTATTTTCCCATGACAACTTAAAATGAAAAACTCCAAGCTGAAAAAAAATGAACTGCAGTAAT
TGGCTTTGATGAATTCAAGGCACTATCCCGCACTGTAACTGTGGGTACGGAGAACTTAAAGGCACAGATGTGCCT
TGAGAATAAAGTGGGGGATGAACCCCGGGTACTGTGGAAGGGAACCAGGGTTACGGTCAGTATTTACCCCATAAC
TGCGTCAGCCTGAGCTAATTCTTTGATGATTTCAGGACCAGGCGCAGCACAACTTTACAGATAATATCTCAGTTG
TTGAGCCAAACTGCAGGAAAGCCTCTCCCTGCCTCAGTCACTCAGAACCAACCCGGCAGGATGCAGGAACTGACA
CAGCCCTCCTAGCTGCCTGGGCCAGAGCTCAAATCCAGTCTGGCTTACTGTCTCCCGTGGCAAAAGAGGGACTTT
CAGCTCTGGTCCCGTGGCTTCCTGTGCAATTAGCAAGCTTCAAGCAGATCCCAGGAACTACAGGAAAGAAACTGC
CATT
>Egr1 CRM 0
GAAACGCCATATAAGGAGCAGGAAGGATCCCCCGCCGGAACAGACCTTATTTGGGCAGCGCCTTATATGGAGTGG
CCCAATATGGCCCTGCCGCTTCCGGCTCTGGGAGGAGGGGCGAGCGGGGGTTGGGGCGGGGGCAAGCTGGGAACT
CCAGGCGCCTGGCCCGGGAGGCCACTGCTGCTGTTCCAATACTAGGCTTTCCAGGAGCCTGAGCGCTCGCGATGC
CGGAGCGGGTCGCAGGGTGGAGGTGCCCACCACTCTTGGATGGGAGGGCTTCACGTCACTCCGGGTCCTCCCGGC
CGGTCCTTCCATATTAGGGCTTCCTGCTTCCCATATATGGCCATGTACGTCACGGCGGAGGCGGGCCCGTGCTGT
TCCAGACCCTTGAAATAGAGGCCGATTCGGGGAGTCGCGAGAGATCCCAGCGCGCAGAACTTG
>Egr1 CRM 2
TCCTTCCACACAGGCACTCTCTGCTTTCTTTTTAAATAAAAAAATAAAATTAAAATAACAGCACCTTCCTCGTAT
TCAAAGTTGGAAACAAGAGCCTCCCATTCCTGGAATCCCTTCTCCCTTTGGGTTGCTTCGGAGATAGGGCTTCAC
TGCTTGCGTCAGGGTCCCGGGAGACCAGCGGGATCTCTCTGCCATCACACCCCCGCCCCCTCCCCCCCCCCCCCC
TGTTCCCTGCCCTTGGCCTGGCTCTGTGAAGGAAGTGTTACCCTGAATTCTGGGCGCTTTGGCAGTGGCGGTTCC
CTCGGGACTGCGGGGAAGGCCCAGGCCGCCGCGCCTGCTCAGTTCTCCCTCACTGCGTCTAAGGCTCTCCCGGCC
TGGCTCCGCGCCCAGCCCAGACTACGGGAGGGGGAACGTGGAGGCGACGGAAGAGCCCGTCGCGCCTGGGGCTCC
CGAAATACAACCAGAGACCTACAGAGGGCAGCACCGAGCCGTAAACGGGTCCTCCGCACTGCAAGCTTGGGGTCG
CCAGACTGCCCAAAGCCAAGTCCCCCTCTTTAGGACAGGGCAGGGTTCGTGCCCGACCAGTCCCTGGCCTGGATA
AAAGTCAGGAAGTGTCTAACCATCACAAGAACCAACAGATCCTGGCGGGGACTTAGGACTGACCTAGAACAATCA
GGGTTCCGCAATCCAGGT
>Egr1 CRM 3
TGGTGGGAATCTAACCTGGGGCATCGAACATACTAGGAAAGTACTCTACCACTGAGCTACACCCCAGCCCATGAG
GGGCCTTCTCAGACTACCATTCGATCTGCTTCAGGCCTCAGCACTACTCTGCGTGGCCCTTCCCACCTAGCCTAG
GTCTCAGGGGTTTCAAAGAGGGGGTGATTTTGCACCTCCACACTCTGGGACAATTCTCTTTCTCTGGAACCTGTC
TTTTCCCATGACTCCCCATTTGTCCCCAACTTCTTTCCCACCTCGTGTGTGTGGCCTCTTCAGCCTTCTTCTCAT
CTCTCAAAGCCCTCCATGGGCGGGAGCAAAGCTGTACTTGGAAGCATCCCAGGCCAAGTTGTCTAGGTTTCCGCT
GGGGCCTCTCAGCCATCGGAAAGTGTGACGGGCAACCCAGACAGAACCCAAAAGAGAGTCACTTCCTGAGCCCTA
AGCTAGCCACAGAAGTGGCAAACCACGGCCTGAGCAGCACATGTGGGTGCTGACGAAACCCAGCCCTGGGAGGAG
GAAGTGGCGATGGAAAGGGGACTTGCTTTCCTCTGAGAGTGGCAGCTCGCCTGGCCTTAGGTTAAGTAGAGTCAA
GATGGCCCACTGCACTTGCTCTCTAGTGAAGACTCAGAAGCCTGGTTCTCCCTCTCCTCTGCCTCCTCCCATCAC
CTGACCAGTAAGACCCTAGACTCTCAGGACATCCCTGAGGATTCCTTGGGGCCCAGCATCCCTGCAGGCCCATAA
AGCCTGTTGTTGGTGACTTGGCTCCTTAGGAGGAACCAATTCCCCCTCCCCATGTCATCATTGTCCCCTCTGACC
CTCAATCATAGTAAACAGAAGTGTGCCACCAGCTATAGAACATATCTGCTGAGTGTGAGGATGGCTCAGAGGCGC
CTAGCGTTGCCTCTGAGATGTCGGGCCAGCAGCACAACCCTCTCTAAGAGCATGGCTGAGAGCATGGCTGACCTT
CATGGTTCCTGGGCAGAATCTGCCTTCCAGTGGCTTATACCAAGGAGACTGGGCAAAGTCAAGAGGA
>Egr1 CRM 4
TGCATCGTTCCTCATCAATGAAATGAGGCTGAGACAGAGACCTACCGCATAAGGCATTGTGAGGACAGACTCTTA
GAAGCACTGAGCCCAAAGTCAGAGTAAATGTGTTCTGGGAGGGGTGATCATGGGAAAAGAGATCCCCCAGAGATT
AAGGAAAGCCGGAAGAAGCAGGCACCCTGAGACTATCGCTTTTTGGCATTCTAAGGTTTGTTAGAGTGGGGTAGC
CGAGGACAGTTCTGAATCAGGTAGCCCCAGAAGGTGCCTGCTCCTCCCTCCTTCCCCTCCTGGCAGGTCTGCAGA
ATGCAAGCTTACCCACCAAGAAGCTGAGCAGCCTGGAGCTGGGCGGTGGGGAGGGGCCAGAAATAACCCCTGTTT
CTATTCCTGCCTTTCTGGGCTGGGTGGAAACGCTGCTTTTTTGAATCTAAAGGCTGCTTCGACTTCCTGCTCATC
ACTAGGGCTCTTATGTAACATTCCCCTCCCTCTCCTGCCCAGGGAGCTTCCAGAGATTTCTACAGCCGGCTTTGC
TGCGCTGCACATGCAGCTGGGGGTGGGGAGAGCAGAGAAAACACCACAGGATGGGTTGTCCTCTGAGCCCAGTGT
CTGAGCTTCAGATCCAGGGGAGGGCTGTGCAGGGAAGGACTGAGGGCCAAGGACTACAGCCTCTTTCGGTCTCTA
CATGGCAACCTGCCGCAGGAAAGAGCCTGTCAGCCTATAGATAGGCGGCTGCCTCCTCCCTTTACAGACCTGCTT
AAGGAACAGACTATAGCGCCTCCTCCAACTCAGTATGCTGCTGTTGCTTCCATTTTAAGCCTAACCTGGCTGGCT
CAACTCTCTGTAGGGCTCTGAGGACTTTTATTCAAGTCACCCAGACCACAACTGCCTCCTTAGGCAAGCTGTACC
TAGTGCCAAATTCCCCATGGTGTGTGTGTGGGGGTGGTGGTGGTGGGGTCTTGGTGGGAATCTAACCTGGGGCAT
C
>Egr1 CRM 5
GGCCTGACAAGGTAGAGAATCCTCTTTAACAGCCTACCTTTCTCTGACCTCTTGCCCACACTCACCCCAGCAGGT
TTGTATTGGTGATGAACTACCTCCTCTTTTTGTCCCTGATAAGTGATCAATAAATAATGTTGCTTGAAATATTGA
TGGAGAGTTTTACACCAGCATCCCTTAGCTGGGTCTGGCAGGCCATGACCCTCCCGGCAGCCTTTTCACACTGAC
AACAAAAGGTTTCCTTTGACCAAGATCTCCACCCGATCCATTCGGAACCAAAATCATGATTAAGTTTCAACCAGT
AGAGGAAGAGATGTGAGAAGAACTTAGCTCCTCCCCCTGAGAAGGACGGTTCGGCTAAAACAAGCAAAGACCCTT
AACTGATTTCCCATGATGCCCTCCAGATCCTTCTCCAGTTCCATGTCCACGGTGGGCCTGCCAGGTCCACTTTCT
GAGTGGTCTAAGCAGAAGGTGATCACATGCAGGGCACAGTAGAGAACACCTGGAATCCCAGCCCTGGGGAAGTAG
AAGCAGAAGGGTCCACTCCAGGCTCCAGGCCAGCCTGGTCTACGTAAGTTCCAGGCCAGGCAGGGCCTCCTGGAA
ACTAACAACAAAACAAAACAGGCAAACAAAACAAACATGATCACGTGATCAGACATCTCTGGCCCAGGTCACTCT
CACCTGGTTTATTCTCCAGCATGAAGGGTAAGAAGAGGAAAAGAGGGGAGAATCTGAGGTTGAGGAAGCTCCCGG
ACCAAGTCAGGCCCCATGCTCTGTGCCTTGTGCTCAACAGACAGAAGCCATATCACCTCAAATCACTGCCCAACA
CTTCCTGCTGCCCAAGTGCCCCTCCCCCACTCTCCCACACAGCTGGGCCAGCTCCTACAGGTGCACAAGTCTTGG
GCTTAAACCCGCAAGCTGTCAGGTGTCCCCTCACACCTGTCCTGCCACACACCTGCTGATGCCACCATCAGCCCG
TGCCCTTAGTGCCTGTTCCTCAGTCTCCTCCCCCATCAGACTGCGAGCCCTTTCTCACATACAGATGGGAGTCAG
CAAGCCAGCCACATTCGGTTCTTGCTTTAATGCAAATGCTAAATTGGGAAGAAGAGCAGTTTCACACTCCATATT
TAGCCTGGGAGAGAAAAAGTGAGACAGCATGGGTGTTACCAGGGAAAGCAGGCTGGGGCTGGGCTGGGTGGTGGA
AAACCTACCCAGAGCCCAGGGGAAGAGCATGGGCCCTTGTCAACTCACTCCTAAAGAGAATCACCCAAGGGCCCT
GCCACTTAAAGACCC
>Egr1 CRM 6
GTGAGTTCCAGGACAGCCAGAGCTACATAGGGAAACCCTATCTCGAAAAAAAATGGAAAAATAAAAGATTCCCAG
AAAGTAAAATGCTGTAGTACTCTGCTAGCGTTTACCATATAGCCAAGAAAACAAGTTAGGTTTCTGTTTCATTTG
CTACATCTCACACACACACACACACACACACACACTTGTGGCTGGGAGACCACAGCCAGATTCTGAATAGCCCCT
CTCTACATATACATTTGGGTTTTTTTGTGTCAAAAGGAGTATGCATGTTCAGACTTCCAAGCTCCTTGTCCTGTT
GCCATGGAAACATTCAGGCTTCCAGAACTCTAGGCTCAGGGTCTGCCCCCTGTGGTGCAGGGGAGGAGAGGAGAG
TTGACAGGTGACAAAGAATAACTGGAAAGCCTTTTAGGAGAAGGTTGGCTGGTGGGGCGCACTGGGTCTGGGTGA
CCTCATACTGTCACTCATTCCCTAGGGCCACCTGGCCTTGCTCCTTTGCCACAGATCTTCTCTGGCAGGAGGGAA
AGACCTCTGTTAGAGGTGGGTGGAGGGCACTAAATCAAGGGGTTCTCGGGGGCCCTTGGGAAGTATTATTAGCTT
TAGCGATAGGGTTTAGTGCCATGTCACGTGCCCAGGTCCCCTGGGAATATTAGGCAACCCTCCAGGCCTGTGTTC
CAGACTATGATGGCCTCAGGCCTGAAGCCGCTTTATCTGGCTTCTCCTCCCTTTTTTGTGGGTGTTCCAGCCTCC
CAAGAACCTGCTTAAAATGGGATTTCCAGGCCCTGACGTCAGAGGAAAGCCAGCAGCTCCCTCTAGCCTGTGCCA
GCTCTACCGTGATTAGCAGAGCTAAGTTCAGCCTTGCTCAGCCTACTGGAAAGCTAACAGGGACTGGAGGGAGGA
ACTTGGGACTCTAAACAGTGGTCTCTGTATCTGTGGCTTTCTGGATGACAGGAACAGTCTGTTTCCAGGTCAAAA
GACCCTCCTGGCTTTCCTACTAACTTAAATTTCAGCTAATGTATGATCATTTCCCTCCCAACGCCATAGTTGCTT
TCTCTCGGTTCTAGGTCTCATGCCTGACTTGAGGAAGAAAAGGGCATCTCAAGGCAGTCCTGAGAGCTGGACAGC
GGCTTCCGTTTTGGTTTTTACCCAAGAGGAGGTTGAAGGTGGCGGCTGTGGGAACTCTCCCTGCAAGACGTTGAA
AGGCCCACTAGGTGGCGCAGCTTCCTCCCGATGTGGATTCTACCCTCTAGCAGCTCAGGGCCTGGAGACCAGAAT
ACCTCCTACTCTGCTCCCCGGAATGAGAGACTAAAGGGGGTAGAAAAGA
>Egr1 CRM 7
GCCAGTGTTTGTTCTGCTTCGGGCTGGTCAACTATAGCTTTGTGTTGATGAATTGGAGCCAGAGGCCACGTGGCC
AGAGGTGGTGGCCAATCCAATCCCTTATCTCTACCCAATATTCGAGAAATCTGCTCCAGGCCAGATGTGCTCATA
GGAGATAAGAGGTAGACAACACAGGCTTTAGGAAACAGTTACAAGGCTAAGGGTGACTCAACTTCTCCCTCCTTG
CATGTCCCCAGCAACTTAAAAACAAGGGCTAGTTGTCCAGCCAAGAATCCAGGAAGCAGAGCTCATCCCTTTGCC
AGTTGGAATGGCCATTCTTGGCAGCTTCCTGGGAGACGGGGCAGAATGGGTAGGAGGAGGGTGGCAAAGTATGTT
CCCAGTGAGTGGGAGGTACACAGTAGGAGGTTCTCTTGACCCTGGAGCCTAAGGTCTACGCACTGCTGGAGTGAT
CCTTGCGGGTATGTGAGCATCTGTCTCCTCAGTGAGAGCCTTTGACCTGGCCATGACAGAGCAAGGACAGCTCCT
GTTCCAGGATTGCAGATGGTTGGAGAGATGGAGATGTTGAGCCAAAAATGACAGCACAATGGTATGTTACACAAG
AAGGGCCAGCGTGTGGCTCTCTGGAAGCACACAGACAGTTGTCCCGTCCAGCTGGGGCCTTGGAACAGGTGAGAA
CATGAAACAGAAGCCTAAGTAGGGGTTAGACCTCAGGGCCCGTGTGATGCTCGGACAGGGAAGATGACGGCCAAT
AGGGTGGAGTGTCGCTAGAATGGAGTATTGTACAAGTGCTTCCTGCCACCCTAATGTGCCCTAAGTCTTCTGTAA
ACTGATCAGATGCCCACAACCTTTGGGCAGATGGTACAGTGGATTTGGTGGGCTGAATTCCCAAGCCTTGGCTCC
CCACTCACCTTTGACCCCAGACCCCAGAGTGTCCCTTCCACCCAGCATGGTCACCAGGAACTAAGTGGATGGAGG
GTAAACTCACCTCCACCTACTCCTTTTCCTG
>Egr1 CRM 9
TGGCCCTCTAGGAATTCTCTAGCTTCTTTTCCTTTTTGGCAGGGTGAGCTATTTCAAGACACAGTTTCTCTGGCT
GTCCTGGAACTCAGTCTATAAGAACAGGCTGGCCTTGAACTCAAAGAAATCCACCTGCCTTCCCAGTGCTGGGAC
TAAAGGTGTGCACCACTACGGCCTGGCTCTCTTGAAGTTCTCAATAAGTCTTTTTCTCTTGAATTTCAGGAATGG
TCAGCCCCATGCATTTGGAAAGGTTCCACTTAGAAGAAAAGAACTTGGCCTCCCTGTCATATTCAGTTCTTCTGT
TTTCCCCTGTGAGCTTCCTTCTACCCTTCCTGGCTCACAGGAGAAATGGAACCAGGTCAGAATAGCATAGGGTGG
ATATTTTCTGCCAGGCCCCCATCCCCAGCTTGATTGCACTCCTAGCTTCCTGGTAACATTAGTACTGTTTATGAG
GGGGGAGAGTAGACAAATGAAGACCGCAGTGTCCTAAATGAGACTGATTCTCAAAGAATCAGATGTCAAAACCAT
TCGATTACAACATTTTATCCAGAAATCTTTGGATACCATGGTGCGTGTGATTTCCTCCAATTGAAACATTCCTTC
CTGGAGGGAGGTAGGAGGTCCCAGACTCAGGAAAGGCAACAGGCCTCCTCAGTTGAAGAATGCTGAAGCAAGCTC
CCTAGCCTCACAAGGCCCCTCCCTAAGGTTATAGAAACGTAAACAAAAGCTAGGAAGAGGAGGCCAGCTCCCATC
ACAGTAGAGTCCCGAGGACAGTCATGAAGACAGATGCAGGAACACAGTTCTCATGAAGCCTGATCCATGCTGTGG
TAGGCTTTGAGC
>Egr1 CRM 12
GACCAGGCTGGCCTAAAACTCAGAAATCCACCTGCCTCTGCCTCCCAAGAGCTGGCATTAAAGGTGAGCACCACC
ACCGCCCTGTTTCAAAAACAAGTATCAGGAGACCTCGTAGAACCAAGCAGGGAATTAGCTGAAATAGTGTATGGG
AGATCGAAACGTAGGACATAGTCAATATTCATTGATTCCACAGATATTTGAAGAGAAAAGCCACTGCATTTGGGA
GGTTTTGTGACTTATCAGTTTTATTCTACAGTGATGGAAGATGTTAAGTGCGGCTCAATTTGCTGGGTGCCTGAA
GGTGTAGTGAGATGCTTGCCATTCACAAGTGACATCCCCAAAAGGCTTCAGAACTGGTTTCTCTGGGTGGATGAG
GCCTGACACACAGGGAAGGAGATCCAATCTGCATCTCCCTTGAGTCTCTGGCTCCCTGGGAACCAGTTCTTCCTT
TCTGCTACCAGAGAGGCCTGGATGACACAGCTACATGGCACTCTGTGGCCCACAGAAGGAGCTGTGTCCCTGGTG
GAATCGAAAGACATCCTTGAAAGCTGCAACTTGTACAGGGAAGGTTACCATGCCAACCACAGCCTCCTTGCCCAA
CAGGGATGAAAGGAATCCAGGCCCTGTGTGACAGGCAGCCATTGTGTGTTCACAGACTTGCCTGGAATCTATCAG
TAGCTGTTAACTGCTGCTCAGCAGGACGGGGCAGGGAAATTCAGGTTAGATTTCGGTGGGACTCCAGGACAGGCT
GGACCGACAAGGGCTGATTGGGAAATGCTTGTCTGAGAGTGATACCCATCTCTTGGTGAAATGTGTGGGTGTGGT
ACTGACAGCAGGTGTGGGAGGGGAGGGTGTCCTACATACCGGGTTCTGGACAGGATGGGAACTCAACTTCCTCTG
GGTTCTTTTTTGGCCCTTCCACCCTTGCCTTCTGGAGGAAGTAGGGGAGGGCCATGTGCAGGGTCAGGAGATCAC
TGAAGCAATGGCCAAATTGAGAAAGGCAAGTCAGGGCCACACATACCTCAGGTAAAGGCAATAACCCTAATTGAG
CCACAGGCTGGTGCCACAGGCTGGTGCCACAGGCTGGTGTCACAGGCTGGTGTCA
>Egr1 CRM 14
CTAGGTCGTTGTGATTGGCATCAGGAACCTTCTGCAGAGCCATCCCGTCTTCAATCTTTCTCTTCCTCTTTTCAG
ATTTCCTTCCTTCTTTTCTTCCTTTTCTGGTTCTGATCTAGAATAAATCGACCTACAGGCTTCCAGTATCCACCA
GCCAAAGACACATGTGCTGTGTGACTACAACGTTCAGAAGTTGAGTTCTGTGTGTGCCCCCGGCCATCACAGGTC
ATCTTGAGTGCCTGCGAATAGGGGGCATGGTGAGGTTGTCAAGGGTCCAGAAGATCCAGGAGTTAGGTCCCACAC
CCCTGTCCTGGGAACCTCTTGCCTTAGCATGGCTATGGCTCTTGTGGATGAAGTGGTTCAGGAAGATCAGGCTTC
CTGTGGCTCCCCATTGACTCCCCGCTACCTTTCGCCCCCTGCTGGCCTATGCCTCCTCACGGTCGAGGGCAGTTC
AGGGTGTGGAGGCGCTGACTCCTGAAGCTGGGAGATCCCGGGCTGGAAACCAGAGAGATATTTATAACTGAGCGG
GGTGGAGGGAGGGAGGAGGCGTTGGAGGTGTTGAACTCCACTAGGATGCCACAGAAATCGCGGGAACCCCGCGTT
TCTGCTTGGAAGGCCTTCTCTTTCTCGGTTAGGGAGCTTTGAGACCCAGAAAGTCCTTTGAATTGGCAATAGTCC
TAGGTCTCATGAAAGCTCTGATGCCAGCATTTAAGGTTCCTTTTTGGGGGTTGGATATTTGATCTAGATTGATAA
TTTTCTCTCCCTCTCCCCCTTCCCTAGTGCCTTTAATCCATTTAGCCGTCTCAACAGCGTTCGTTAAGAAATTTT
AGTATAGGGCTGGA
>Egr2 CRM 0
TTGCTTGCGGTTTTGAGCTGCCAAGAAAGTGAAGGAGGGGTTTGACTGTAGTGTCTCGGCAGCGCTCGGTTTTCT
TTCCGAAGTTTAATTTTCCGGAATGGCTCCCAAACAAGGGCCGGGGAGGCGGAGCCGCCACTACCGGATCTTCTC
CTTTTTTGGAAAGTCTCGGAGAACCGGAATTCCTCCCCGCCCCAAGAGACAGAGCTACCAGCGCGGCCGCCGTGG
GTGAACTCACGGCGGCCGCGCTAGGGTCGGTGCTCGCGCCTTCTTCCCGCTGAACTCTGCAGTCCGGAGTCCCCG
CTGCAGGCAGGGGCCGAGAGCCCCAGACCCGGGTGGTTGTCCACCGGCTGCAAATCGTTCCTGGCGAGCTCAGCG
GAGCCCGCGCAGCCAAGCCCGTATGCAAATTGGCCATGTGACGGCAAAAGCTGCCAGGCCCAGCCCTGTTCCTCA
GTCCATATATGGGCAGCGACGTCACGGGTATTGAAGACCTGCCCATAAATACTCCGAGCCTAACACTTTCCGTCT
GAGAGAGCAGCGATTGATTAATAGCTGGGCGAGGGGACACACTGACTGTTATAATAACACTACACCAGCAACTCC
TGGCTCCCCAACAGCCGGATCACAGGCAGGAGAGAGTCAGTGACGGATAGACTTTTTTTTTTCTTTAAGAAGCCA
ACAACTTGGTTGCTAGTTTTATTTCTGTTAATTTTTTTCTTTTTTTTGGTGTGTGTGGATGTGTTGTGGTGGTCT
TTTCTAAGTGTGGAGGGCAAAAGGAGATACCATCCTAGGCTC
>Egr2 CRM 3
GCTGTCTCTTGAGTGCACACGCATGTCCGTGCGCGCGCGCACACACACACACATCCACACACAGAGCTTCCAGTG
GAGAGGTGAATTTGTCATTATCTGCAAACACAGGGTGATGGAACACCTGTGTAATAGGGCACAGTCCTCTGTCAA
GGCCTTACTCATCTCTAGTGTTTCTGAGCTAACAGATGTGGGGCCAAATCAACCACGTGCGGCATGTCTTCAGCG
TTCCTTCATGAGATGCAATCAGAGAAAGAGATCTAAATTGCAAAAAAAAAAAAAAATATTTTCTTCCTTTCAAAG
CTCCCATGGCTGTTGCCTGGGGAAAAAAAAACCTGTTTATAAAAAGCAAACTCTGGGCTGGACCTCACCAGGTCC
CTGGGGTAAACACTGCCTGTGTGCTTACAAGACCATTGACTGAAACTGTTCGGTGACTCAGGAATAAGCCTGGTG
GTGAACCCGAAGAGCAGAAATTACACATTTTTGTCAGTTGCTAGGAGTGTGACTGTGTGTCTAGCCTGTTTGCAT
CATCAAGAGAAGCAAGAGAGATTGGGACTGATCCCAAAGGCCCCAATTCTCCAGGGAACCCCCCCTTCCACGCTA
TGAAAAGGAGTACTCAGATGTGGACCACCCCCTAATGTGAGGAGGAGGAAGAGAGACCATTTGGAAGGAGCTTTG
GGATTTGACAGGAAGCAATCAGGTCCAATCCAAAGATGCGCTGCCTCTCTTCTAGCCTCAACTGGGTCTCCTTCC
CTGCCCTAATCTACATTCACCTCTTGCAGCCTAGCAACCACTCAGAGAGACAGCAACTAGAGCTCTCCCACAATG
CCCGAGCCATGGTCAGTAGAGTCAGACACATCAGTCTCCATCTTAAAGATGGGAAAACAGAGCCTCTGAGAAGAG
AGGTGCCACATCTCAGTACACAGGCTAGG
>Egr2 CRM 4
GCTCTAGGCAGAAGGAACAGCCAGTATATAGGTTCTGAGACAGGAAGGGCCCCGGGACACTTCAGGGTTGGGGCA
GGATGGTGACCCACGGAAGATTAGCCCTCTGGAGCAGCCTTGTGGATTGGTGGGTTACTCCGAACACAAACGAGA
TAGTGTGGGGCCAGTCGTGGCGAGAAATGCAAACCATAGATACTGTGGCCTTCTCTGAGCACACCATTGCACAGG
GTAAACTGGGTGTGGAGAAAGCAAAACCATTAAGGCTGTCTGTTAAGCTTGTCCCATCTCCCTGGTCAGGCTGGT
CACCCCAGTTGAGACTCAGGCACACTTAAGCCACCCCACACAGATGTCAATCTCTTAGGTGTGATTTCACCACAG
ACTGATAACCGACAGGGTGGCAATCACTGAGTCAGTGCCAGCCTCGTAAAATTGGGACAAGTGAGGACTCAGAGG
ATGGAACGGAAGAGGAATGGCACCCAGAATTGCCTCCCAGGACCTACGAGGTGAAGATCTGTCTGCCTGGTGGCA
GCTCATCACCATTAGGGCACTAGTCCTTAACCACTGTATGCAACAGGACACGAGGGTACGAACGGGGTCAGTCAC
AGCTGTGGAGATGCAAACAGATGACCTTAATGATAACTGGCCATTTGCCATCCCAGTTCTCAGAAATTGCACAAG
CCTTCGCATGCCTAAGCCTTCCCAACCAGTTCCCAACCAGTACGTGGACACCCATTTCTCAGACTGAAACTCAGG
CTCAGCGGCATCGGGAGGCATGGTGTATAGCCCTAGGTAAGTCACCTGAGCTCTCTGGGATTCATTTGGCTTCTC
TGGGCAGAGGGAAGGTGAACAAGGCTTGTCAGTATCAAGAGGTTATACTGTACATCTGAAAAGCAGTCATGTTGG
AGAATGGGAGTTGGAAAAGTTTGAGGAACATCTAAAAACATACTAATTAGCCTTTCCAATGGAGGGGAAGCATAG
GCAATTGGGAAGTTCACGGTGACACTGCTTTAGATAGGAAATGGACCCAAAGGCCCGAATCCCAAGACTGACTCT
TGAGAACTGGGATTTTTGTCTACAGGGATCAGAAGG
>Egr2 CRM 5
CCTTCACTTTTGGGTGGTATATGCAAATCTTGCTTTTCATTCCTGTGGCTGAGTCTGAAAAATCTTTGTTCAGAA
ACTTAAAGAATGATAAATACAAAAAGAACTTATGTTTACGGATCTGCATCTTCAGGAACTGGACTCTGTCAGTCT
CCAAATAAACTCCCAGTGTCGACTACTGCTACCCAGTGGGTGCTGTGAATAAATATCCATCAATCAGTTGATGCA
TTGATTAACGTTTAGGGATTGTCTCCCACAGGCATTCCAGTGTATTTAATATAGACAAGTGCACCATTTATGAAG
GGCAGGGACAATAAATGACCCTCATGTGACTTGTTCCACACCCCTGAGACTGTCCCCGCCCTCCCTCAGGCCTGA
AGCAGTGTTTTGAACATGTCTGACAGTACCTATCCTGTCTTGTGCACACTCTCCGTATTTGTACACTCACCAGCC
TCTATGCTAGAACCCTGTTTCAAAGAGTTAAGGACTCTAGTTCATCTTTTATTCCTTCTAAACTCTTACAAAAGC
CACAACACCCTACATAGGCCTCAACTAAATGATTTTGCCAATACTGCCCCCTTCCCCTGGGCCTAACAGATGACA
GCACTGTGTTAGTGACCAGAGAGCCCTGACATTTTAGATACTTTTGGACCCTAGTGATAAAACGCTTGACTTACA
TCTGTGCAAGCCTGTTTTTTTCCTTTTACACCGTGTGTTGGTTTGTATGTGTGCATGCATTTTTAAATAACCAGG
CTGTAATTAAGTCAGCCTATTCTTGATTTTCTAAAAATTATATAAATTATAGTTTCCTTTCTATAGCGCAAATTA
AAATACATCGCTGAAATTAAAGTGGACCTACAGGAGTGCCATTAGCATATTCAAAGCCGAAGGTTGATAATAACG
GTGTTTACTTAACTGAAATTTAGCGGTATTCATTCAAAAACAACATGTAGCCCTAACCGCCATCCATGAAATAAT
GCATGTTGCTTTACGGTTCTGAACCCTAATTAGCTGGGAACAAAACAACTCCTTCAACTCTTATTAACCGTTTCC
ATTCTGCTGTTCTCTGTGTTAGCTGAAGCAGAACACCCTTTGGAGGTGTTCTGGGACTCCTCCCAGGGGGGCGTG
GCCTGGAGACTGAGAAAGGACACTCCACCTTCCTTTATCCAAGTGAAAAGCAGTGCACATGGCTACAGTCAGTTC
TCATTTTCCTCCTGAGCATCGCCCTATTTATATCTGCACGTGGGTTTGCCTTCTTTGTGTGCAAATGCCTTGGCG
TGAGTGAGCCAACAAATAGGAGTTAAATCAAAATGATTTCAAGGAACCAAATTCCTCCCAGGCCCATCCAATCTC
TTCCTGGGAGAAAGCCTGGCCTGAGAGAAGCCTCTTCCACGGGCCTTCATCTCCTGGGGTGATGCCTCCCTTGGG
CCACCAAGTGTGGGTGATGGTTGAGACCACAGCCCTTGAATTCCTCACAGATTCAGGGCCAGATAAGAAGAGACT
GCCAGTGATGTGCAGGG
>Egr2 CRM 6
AGGCTAATTCTTGAGCCAGGTAGAACTCTGGCAAGGTTGGGTCCTGCTAAGGGCCAGTCCCTCGCACACTGTACT
GTGTGTGAACTCCCTAATACCTGTCGCTGTGTAGCAATGGAGCATACCCTTGGGACAATACACATCCAGTGATGA
TGAGGTGGGGTGAACATGGAAGCTACTGGTCTAAGGAATGTGATGACTGGAGACATCCTCTGAGACCCATTTTGT
GGGGTCCAGAACCGTTCCCACTAATTGGCCTAGGAAAGCCACCCCACATGGTGAGAGGGTCACCACACCACTGGT
CTGTTTAAAGTGGGAAGCCCTGAGCCCCTTCCTGGCAAAGCGTTCTGACTCTGAGTTTGGGGATAAATGACTACT
CCAGACGTGAGTCACTGCAGTTTGGAAATTCTCAGCTGCAGCCTGCACTTTAAGAAAAAAAAATTATTATTATCA
TCTCTGACTAATAACTAGCAAACCCAGAGTGAAAAAATAACTCTGTGGGAGATGAAGAGAAAGTTCTAAAAAAAA
AAAAAAAAAAAAAAAAAACTTCACAAGGAGCTCAAAGCACAAGAAGACAGGAGTACAGCAAGGCAAACAGCGAGT
CTTAAAATATTAACTGAGTAATTATACCAATTAACATTTAGCTGTATCTATGTCCATTCCATTTTGTCTCTAATA
GAATGATAGGTGGTGGTTATGGCTGGAGATCTCTGAAAGTTAATTAAAATCACCGAAACAGAACCAATACAATGT
CCATTCTTGTTGTTGTTTTAATTCATTGAAGGCTAAATAATCTATCTCCTATTCTGAAATTTATTCCAAAAGGAG
GCCGAGCCAGGAGTACAGATATGTTTTCTTTCTTATCCTGAAGTCCCTGTCATTGTGTTCCATCTCATCTGGGCA
TCTAGCTGAGCCACCAGTCTCCTACGAGAGTCGAGTAGGGTGTGATGTTAGTGAGGACCAGTTATCCCGATCTAC
TGATGTCTCTGGTCTCCTGAATCAGAGGGCTGCCCTGTTGTGAACTGGGTTTTCCCTAGCCTCTCACGTTCAGCC
ACTGCAGTCCAAGCGTACTCTGGGGTCCAACTCCATGCAGGAGCTCTGTGGAGGTCTGAAGCGAGGCAGAAACTA
AATGAGACAGCTTCCCTCTCTGCTCATCCATGGCAGCAATCCTGAGAGGACCTGACTGGCCCACAGGGGAAGATC
ACAGCTGGC
>Egr2 CRM 7
CCTCTGCCATTGACTCTATGTCTCCTCGAGTCCCTTTTCTTCTCTTGCCCTTGATTTCCTCAATATCCCCGGAGA
AGGGTATGTGAGAGTCCAGGAACCATTCTATCCCCGCCCACCTACCCCCATCAACCCCAGCTTTTCCCCCGGGCA
GCCTTGAGTCTCTGCTGAGACAGGGAAAGGAGAACAGAGCCCTTTGTGGCAGGCTGGGGACGGGCAACTTGAAAG
CACTAGGGGTAGAAGAATTGAAGCATTTTTGTTGCGGAGGAAAGCGTGGGTTGCAGGCAGGGAGCCAGAAGCCGC
TGACATCACCATCATACTTGGATCAAGCCTCGAAGAAGTGGGCGGGAGCCTCAGGGGAGTGGGCACAGTTACCTG
GTAAAGGGACCAGAGGGCCTGAGTCTGGTCCAGTAGTGACAGGAACACTCACTGGATTAGGGACTAGTAGTCACA
GGTCAAAGATAAAGAGGAAAGCCTGTGGAGGCTCTGGGGAGCCATTTCTCCATTCTATCCCTTGATTCAGGTCTT
GGGGGAAGGACGGGATGGAGGCCTGTGCAGGTTATCTTTGTCCCTGGGCCTGAAGTCAAGCATTAGATGCAGCAG
CTGCCAACCCAATCTTTTCCTGCTGTTTTGAGACTGCCGGTCAGAGAATGGGAAAGGAGGCCACCCCTGGAATGG
GGTACAGCAGCTTCCTTATTTTCAGAGGTAGCCTCCATTAAAATGCCACAAAACTTAAACCTCCACTGTTGATAA
GCCTCTCTCAGGTTTAAGCATGCCCCTCTTCTCACAGACTTCAATTATTTTGTTTTTCCAAATAGCTTCTCCCTA
GGTATTTGAAAGTACTAGCTTTCTGGACCATCCTGTTCTACACTTCCGGTGAGGTCAGAGCAGACAGAGCTAATC
GATCC
>Egr2 CRM 8
GAATTCAGGCCACGAGGCGTGGTGACAGGTGCCCTTAATTGCTGACCTATCTCTCCAGCCCAGACTCTATCTGTT
TTGGTGGATGGTGTTCAGTCCAGGCAGAGGAAGTGCTACAGGGATGGCCTGCCTGCAAGAACCAATTAAGGCTGC
CCTTCACTCTGAAGACAATCTTACTTCCCAGACTGGAGACCTGTGGGTAAGGGACAGGATTATCTATTTTAATGT
TGCTGACCTCATTTTCTGAAAACTAAGTTCAACCGGAACTTCAACGTGTTTAAACAAATAAAAGTGACAGGCAGG
TTTTCCTGGAAGTTTCCCCCATCCTTATGTCATGGACGTTATTATTTTTTTTTCTACATGGAAACATCTTGAAAC
TGGAAAGACATGGGATATTATTATGTGAGACCCATTCATCGGGGTTGGGTTCAAGTCCTTTATTCCAGATGTGGG
AGAATCTGGACTTGGCGAGCACGCAGTTTCCTTTGGATCACACAGAACATTTGAGGCTGTGTTGGGCGAAGCAAC
CCCTGACTCTGTACTCATTACACAGTATTCTTTCCATGAGACCAATGCTATGGATTTGAGGATCACCTGTCAATA
TCCAACAGCCCCCCCCATCTAAACTATGAGCCCACCAAAGGTGACACGCTGAGGACCCCCACAGAGAGACATGCC
TGACTCATCCCGCCCCCCATTTCACAAGTGAAGCCTGTGATTAGAAAAATAATAAAAGCCCATATCACCTGTGTT
CCTTGTTTATTATGCATTATACAGGTATCTGGTTGCAAAACCCGAACTCCTACTCCTGTCTCTCAGTGAGCATTT
GAGGCAGAACACACTGACCTGCCTCATCAGATAACATAAGTGCTCCCTTCCCAGGTGACTTCCACCCTCCCTCCC
CATCCCCCTTGGCTGCACTCCCAGAGAAAGAGTCCAATGGGGGAGGGGAAGGGGGGAAGGAGACACAGACAGGGA
TCCATAGATTTTCCGGGTTGAGGGGGGGGGCTCTGAGATCAGAGGACTATGTCCCAGCTAAGTCAGAGGGGATTC
TCTGACCCTACCCTCTTCAGTCAGGGTCCCTCTTACAGTCTTTGGAGGATACCAAGTTTGGGTGCACAAA
>Egr2 CRM 9
CAGGACAGCTGGAGTAACCCTGTCTTTGAGGACAGAGGCTGACACCTTGAGAGGACCAAGGCCTCTCCCCAAGGT
AAAGTCATTTGCTTTAGAGCTGACTTTCACTGGGAAAAATAAAAAACACAAAAAACAAAAAACTGACAAATGTAG
ACGTGTGCTCTGCAGGACACCGGTCTGAGTCAGACTAGGAGACAAAGAGCTACTGTGTGCTATTTTCATGGAAGC
CTATTAGGGGCAAGGTTTGGGAGTACATTCACAGGCTGCATCAACTGTGGCCTTGGTGCGAGACAAGGCTCTATC
TTACTTTGCGACGCCAAGTGGAGATAATGAGGCTGAAGAGCTTGTGCTTCCTTCCTCCATCAGCATGCTCACAGA
AACCAGGAAATGGAGAGAATATTCCACAGTGAGATGCCCTGGTGCTGGCTATTTGAAGTCTGAGGGAGGTTTAGG
AGCCGTCTGATGGGGGTGATCTTGTTTTGAGGCTCCTACCTTCTTACCTCTTTTCTTTCTTTTCCCATTTAGCTT
GCTGGGCAATATTGTCAGGAAACCTGAACACGTGGTCCACCGTTCTTGTCGTGTAAGTAGCAGACACCCCTTAAG
TCACAGAACCTCCTGGGTGAGGCTCCTTGGGACTCTTCTTAGAGACTGGAATTAATGAAAGATGTGTGTTGTCAA
GCTACCCAGGGGAGATGAGTAATTTATCTGGAGGATGAGGGTGATCTCAAGGGCTCTCGCCTTAGGAGCATTGTC
TCAGTGAACCTGCCAGGGTTCAGAATTTTGGCCCGTCGACTGTTAGTTATCAGAACGCTCACATAAAGGCAGAAT
GAGTTGCAATCGCCAGATTTATGAGGCCTGTGTCTAGTGAGGAATGTCCTTACCCATGAAGACTAGGCAATCCTG
TCCTCCGAAAAGTCCATCTGGTTTCTGTAACATCCATGGGAAACTATGTTCTAATCAGAAGGTTGCTTGGTGAGT
CAGCCATCTCAACAAAAGGAGTTACTTACGTGATTCAGATGTTTAGATGCACGGATGAAAGGCAGGTTTCACTCC
ACCAGCTCGCCTATTGATTGTTAAGGGCTGTCCAGCATGTTGAGAAATAGGTCATGTTCAGGGTTTGTGAAAAAC
AACAGTTGACTAGTGATGTCTGCGGTAGCCTAGACTTGGACAAAGGGGCAGATGTGCTATGTCTGAGCACCCCTG
GCATACTCATAGCTGACCAATAAACCTTATGAATTCAGGCACACATAATCAATCACAAGATTACGAGAAATGGAG
CAGAGGTGACTGGTAAGTCTTTGTTTGGTCACCAGTAGCGTACTTTTTGTTTTTTTAGTCCTTCTCTAAGAGATA
GATCATGTGTCTGATGAGGC
>Egr2 CRM 10
TGTCACTTCGTCACGGAGACAAAAACATCCGAGCTGTACATCGTGAATTTCCCCAACCGTTTTGTCAGCAATACC
ACAGCATAGCTTTGGAAAGACACAGAGCGCTGTATTTGATGGAGAGGAAGAGAGGCTCCTGACTCTTAACTGCTC
AGAGTATGTGGAAATGTTTTCAAGTATTTGAGGAAGAACAATTGAACCTAAGCCTGTGTAATGAGCCGCAGAAAG
GGAAAGAGTGACTCTGGGGAGGCACTGCAGGAAATCTTGCAGAGGTGAGATCTGACTGGGGTAAAAGTGCAACTG
TCTGGAGGACAGGAGGGCTCTGTGCTTCCGGTGTGGTTAGAGGAAGTGGGGCTGCAGCAGCTGGAGTCAGCCCAC
CCCAGGGTGATCGGGACAGGTTTTCCAACGGGCAAAACCTCTGAACAGAAGCCAACAGGGTACCAGGGCTCACCG
GGAGCCACGGGAATGCCTAGAGAGAGCTACAAGGGATTCTGGCCTTGGACTGACGCTGGCTTTCAGTCTCCAGGA
AGGTGTCTGGCTAAATCGGTTCTGAGGCTCCATCTGGCCTGACATTCTGGCCCCACTGAGGGTCACCTTTACTCA
TGGACATTTCCCTTTCTTTCAGCCTCACCTGCTTTTTTTCCTCCGTGAAAATAGAGAAGTGGTAAAACAGGAAAT
AGGATGTTGGTTGAGGTAAGGGGCTATGCCCCAGAACTAAATAAAACTCCCCCCTCCCCCATTCTCAACTCTGTC
ACTTTGTGGGACCTCACTCAAGTCTCTGGAATCCTGGTTTCTCAGAGTAAAATAAAAACTTTGGCGAGAACGACC
AAGGATGCCACTTCCATTTCTCATCCCTTTGACTAGCCTAAGAACTAGACTAATGAGAGAGTCATCGTCTTGAAC
TGATCCTGGATAGTGAGAGTACCTCACCCAGGCCCTTCCCTCCTGGCTTGTGGAATCTGTCTTAAAGGAAAAGCA
GTGAGTTCCTAGTATCCCCAGGGCCGTGACATTTCCCTCAAGCTCTGCAAATAGTGTCAGATCCTAAGACCCCAG
AGCTTATCCTGTAGCTATGACTCCCAGGAGGTGTATCCAGTGGGGGGTGGGGGGGGGCAGAACCTGCTGCTTGAT
GTCATAGCCAGGAGGGTCAATCGTTCTGAACATGGCTAGCTTATCCAGCGCACTTTATTTCCACTCAGCTCCACT
TGTCCACGCGTGTGAAAGAAATCAACCCAGTACTTTCCCTTTCTGGTTTTCCCAAAGTCAACTACTCTTTCTGGC
CGACTTTACCCCTCATTAACTAGTTATTTGAATGCTGCATGGATCCTTTGGAAACCTCCTTCAGAGCAGTCTCTG
CATAAGCCAGTTAAATCAAGGGGCTTCGATT
>Egr2 CRM 12
TCCCTTTAGTGGAATCAGTTCTGTGGTGGCGCTTCCCCTCCCACCTGTGCGATGGAGGCTGGCAGGAGGCCTGGG
TCTTCCTAGCAGCCCTCTCAAGGCTGATTCATAGCACCAATTACTCAACATGTATAGATTCTAAAATAAAAATAA
GTTCTTCTAAATGAATCTAAAAATGTACCCGTATTCGTTGAGTGCTTGCATCACAGACTGTGTTTAGTTTAGATC
CTCCTTGCTTGCCTTCTTACACCAAGCTGCAAGATGGGGCGCTAGGAGGAAGAAACTCACACAGGCTGAAGCTCT
TGAGTTCTATACTTAAGTTCTATCTCCATGACAGCCCGACAGCCCCCTGACCATGTGGAACCTGCTCACGTGGTC
TCGAGGAACCGTTTTAGATTCTCCGCTCCTGCTTGCTTATCTAAGAGTTACTTTCATTGGGCTTATTCTATCTCT
CTCCTGGTTATTTACCATCCAGGCGGGGAGGGCTGCATTTTCCTATATTTCTTTGAACTGGACAATGCGCTTGTC
GATTCTAGCTGGTCTTTGGAGCCCTTGGAAAAAGCAGAAAAATGCGAGGTGGTGGAGGTGCGGAGGGGAGACTAG
ATCTTGCCCTGTGGGGGAGCCTATGAACTTGTAGATCAAAAGATCAATCTTTGATTTAAAAGATGGAACCACTAT
TCACGAACCGCGTGGTGACTGGCCAAGAAGAAATACAGTCACTTGTTGTCATTAAAAGATTAAAATAGAAAGCTG
CAGTTCCCCACCCCTCCCCCAAACCAACCTGGAGTCCCATACACAAGGAAGGTGGGGGCTGTGACTCCACTTAGT
CTCATTAATTGCTGATAAGTGACTGGTCCATAAGACAGAACTGAAAATTCAAGCTGACAAAGGGGGTGCGACCGG
TCTACCTCTAGAGAGTGAGCCAGAACCATTCATTGCTCTATTAATTTTTTTATCATTGCCCTGTCTCTTAGTACA
AAAGTGTTTAATCTGAGAATGTAAGTTCTGAGACAGGCTCCAGTCTCTAGCACTCCATACAATATCTCCATATCT
TGTTTACCTAATTCTACACGCTATCAATAGACAGTTAAATGTGGAGCATCAGGGCAAAGGCAGAGCTCCTTAGTC
GAAAGGATTTTGAGAAACG
>Egr2 CRM 13
ATTCTGGTTGCAGAGATCATTTCTGAAGTACTTACTGGGTGGAGTTAAAATAGTTATTGACATCTGAGAACTAGG
AACAAACACCGCTTTAGTGAGGGGTTAGCACAAGCCCTCCATGTTCCCTTGCTGGCCTGTGGATAACAAGCCCTA
CCTAGCTAGGCTTCAGGGATTTCCCGCTGAGTTAGTGAGGAAGCCTTCTGACTCACTGTTCTGTCCCCTGTAACA
GCCTATAGCTTCTCTCTGCTCCTGAGTGGGCTCCGGAAAATAAATACTTACAAGCAAGTGCCCACTCAGAGATGA
CACCCCCCAGGGCTGCTTTCAGAACTCTGACCACGGCCTGGAACTCTGAGTTGTGGCTTTGTGGAAAGTGTGGCA
GCATGGTCAAATTAGAAATGACTTTCTGAGTTCCAGGATTAATCTTACAACCACAGGAAACTTTTTCAAAGAGCT
ATCATTCCTTCAGATCAAACCTGCTCGGACACCGCTGATGCCAGCTGATGCCAATGAAACGGCTTCAGCAATTAC
CCCAGGACTCATTTCTCTCCAGTCCTGGCGTCTCAGATGGTTGCCCATGTTTGGAGGGATCATTTGTCACAAGCT
TGGTGGATCTAGAAATGATTCTCAAAATAGCTTAAGATAGGTAGACACGAGGCTGTAAAACTCCAAGACAATCTA
AACTTACTAGGTAAGCTGCCTCCCCCAAAACATTATAAAACTCTAAGGTTTCTTTTTTACTGGGAACATTTTATC
ATTTTCTTTTAAACTCAAAACCATCTGACCTACCCACTTGTGTTATGCGCGTTGGCATCTTTACTCTGAAAATCT
GGACCTCAATATTTATAATGTCCACTGCTCCTTATGCAATGTAATCTGCGGGTGTAGCCACACTGCTCAGCCTCA
ACAGAAAACCCC
>Egr2 CRM 18
GTGTTCTGGGATAATCAAATGAACTCACATCCCTAAGACATTTGGCAAATGGTAAACAATCAATAAACACTGGCA
CCAGACAGCTCTTAGTGATGAAGCACTAAGCTGGGCTGGCAACGATTCCTAGTGTATATGCTATACAAAGATGCT
CCATTTCATAAAGGGAGACCCAAAGCAGGAGTGGGGCTATGCCTCTCTGCTTCATCTGAGGTTTCTTCGTTCCTT
GTTCTTCCTTTCCTGGGGTGGCAGCTAGTTGGGACAGATAAGGTCTGTCAGACACATGTGAAGGGGGCTGGTCTA
AGAGCTAAACTTAGAGCACAGAAGCAGGCCCACTTTGTGCCAAGAGGTAACCTTGGGGTGCAGAAGCAGGACCTA
TTCTATGAGCACCCAATCCACCCACTAGCCAGAGACCCCCGTGTGTGACTTCTGGGCTGTTGTGTCTTGGGGAAA
GAGTGTGACTAGCTCTAGTTTTCTGTGTGTTTACCTCAATGGGCTTGTAATTAATATTGTAGAATAGGGGCTCAT
TAAATCAGACTCATTATGACTGTCCAGGTTCTGGAGCTCAAAAGAAAAGGATTTTCCTGTTCCAGATGCCTGATG
GATCGTCCTAATCAAGCAGGCAGTTCTCTTCTGTTGAAGATTGTCTTACGAATGGAGATGAGATTTATTCTGACC
GGTGCTGTTACAGTGAAGGATATGGCTGCTGCCGACAGGCATTATAGGTCTCTGTCAAGTCAGCGGCTCCCTTCG
CAAGGCACACAGAACACAAGGATTTGTAAGATGCTGTGCTTGTAGCGGCTGGAATGTCCACTGGCTTCATTCCCA
ACTTTGGCTCAGAATTGAGTCTTCCTTCTTGGGTTAGGGAAATGGCTCTGTCGAGTGCTGGCCTTACTCTATTC
>Egr2 CRM 20
GGTTGGAAAGCACTTTGATCTTGGGTTTTTAAGTTTCTGGTGGCTCATCTGCCAAGGTCAACTTTTATTAGCACG
TCTCTGCTCTTCAGAAAAAGTTTTACACCAGTCTCTCCAAAGGTGACTAGAGTTTGAATCAGTCCAATTAACATG
ACAATGACAAAAATGTCAGGATCCATCTCTAAGGGAAAGATTTGTCCTTTGAAAGCCACTGTCTTGAGGTGTCCA
GGCAGACTGAGAGCTTCTCTGGGACTATTTGGATTTTCTGTTCCTGAGCCAAAAAAACATTCTTTAGCATTGGCT
GACAAAATGGGTTTCAGAACAGGAGGTCTGCGATGGAAACTTTAATTATGCAGGTCCTGTCAAAATCCATCTGCT
GAATTCAACTGTTTCTGGGTAAGGTACTTCTTATTTTATTTTATTTTTTTTTAGCAGTGTTACTGGGTGACACTT
TGATAGGGGTCTGAAATTAGCTCCCTTCACAGGCTAATTTATTCTGCCTCCTTCCCTGCATTAGCAATATTTATG
TCAGGTTGGGATGTCTTCATGGCATTTTTTTTCATATTTTTTTAATGGATGATTTTTATGATGCTACTTAAGTGT
TGGAATTGGACGGTGAAAGGTATCAGTTTGGGAGATGTCTCCAGAGGATCGGCGTTGGTGTGATTCTGGCTGACA
GGTCCCATGACTTATCAATAAAGCCATTCTGTACATCTTGAGGGAAGATGGCCATCAAAAGTTGTTTTGGTTTTG
GGAAGACTCAAATGCTGGGCACACATCTGTGATCTTATAACCCCTTCCATCACGCCTCGTCTATGTAACTCGCAG
CCGTTACACTAAGCTATATTTACACTTATGGTGAACTCTCATTTGTGCTGGGGGGAGGGCTCAGGTGGGAATTCC
AGGTGGTGGCTAAATGTACAGACAGTGAATGTATCCACATTGGGCATTTTTTTTCAGCTTGAATACCACAGAGTG
AGCTTTAGACCAGCAGCACGGTTAATAAAATTGTCATTTTAGGGCACTTGCCTGGATCAGTTGTTATTTTATGAA
ACCAGAGGTTTCATAATCAATGATGTCATATGTGCTTCAAAGACTAAGAGGGAGAAACGTTCACCGACAATCTTC
GGTACTAAAAGAAATCCTAGTGAAAATACTGGAGAAAATGGAGCCGTGCCCATAGCCTTGCATCTATGTGGGGTC
TGCAGCCTGAAGGTCGTTTGGAAACAAATTCATCTTCAGCCATCAAAAGTTAGAGCCTTATGCTTCTTCTCGGAC
ACTCAAAGCATCATGGCAAGTTTGGATAAACAGTTCTTTGGGGGCTTTGAGATGGGAGAAGAAAATAAATATGCT
TTATGGTTTATGGTCGGAAGTGTGTACTT
>Egr2 CRM 21
GTAAGCCGCGTGTCTTCTTCGGAGCTTGTGGGTGGGGAGCGGTGGGTGTGGGGGGGCTTGGAGGGGAGAATGGGG
ACTTGAGGTGGACGGCGCTGACCACACTGGCTCTGAAGACTGGAAGTAGGCGCAGGGGGAGCGGAAAAAGATAGA
CTAGGGCTGCCTGGCCGTGCGCGCCAGCAGATGCACCTGGTCACCAAAAGCCACCCGCATGCCTGTAGCTCCGGG
CGCGCCCAGCTTTGGCTAGGCGCAGGGTTGAAGCATCATCTCCTGGTCCTTGAAGCAGGTCCATCAGCCGGACTC
TGCGCTACTGGGCTAGGGTCAAAGAGATGGGAAAGTTCATCAGTCGGGTTAGAGCTGCGCGTCTGGAGAGCTCAG
TGTCAGGGGAGCGCTTAGGGAGCGCGCGGGCAGCGCTCTCTGGGCAGAGCCCACTCCCGGAGGATCTCCCGAGCG
CGTTGTCCTTCTGGGGTGGCTGGGAGGCACCACCCAACTTGCGCATCGCCCCAAAGTGAACAGGGTTAACAAGCC
GAGGCGGCGGAGCAGGGCAAGTAACAGCGAGAAGCGGCAAGGAGAATGCCTAGGGAGGCCCGCGCTGGCACCGGT
GTGCGCCGGTTCCTTGGCGATGCTGTGTAAAGTCGTCTTCCTCAGCCCCCGCGCCTTCCTAGAAGCGCAGCCTCT
TAGTCTTAGGCTAAGACAGGGATGCGCGGGTGTCCGGCTTGGCCGAGGATGTTGGGAGGGAAGGGAACCCTAGAC
TCCTTCCTCCGCTGCCCGGGTTCACACTGGCAGTCTTCAGCTGAAGCGCAGCGGGAGAGGGAGGGGGAGGTGGCA
CCAGGAATTAGCTTCAAGAACATCACCCACCCACTTATCCATTCTGGGGTTCCCCTCTCTCGCGGCTGGCACCTT
CTCTGTACCGCCTGGGATTGATTGGTCGGCTCTTTTTATTGGTGTCTGTATGGTTGCCTCGGCGGTAAGACCACA
AGGCAAAGGAGTGGGAGAGAAGTCAGAGGCAGATGGGAAATTGTTTGATATTGTTGCTGCTATTGGGTGTTTTTA
TTGGGAACTAGAGTCACAGATGGCCGGGAGATGGAAGCAGGAAGGGGCATGCCCTGCTTCTTGGAGATACAGTTT
TTGAGTTCCAGACTCTGCCTAACTTCTTACTGTCTCTCCAGGCCTCAGTTTCCCCACCTGTTGCTCTGTAGTGTT
GGAATCATGAAATGGGGATTGCCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTCCGC
GCGTCAGCATGCGTGTATGTGAATTAGTTCTTGCCTGTATTCTCATTGTCCCACCTCTTCCCTGACTTTTCCCTC
CCCAG
References
Arnosti D, Gray S, Barolo S, Zhou J, Levine M (1996) The gap protein Knirps mediates both quenching and
direct repression in the Drosophila embryo. The EMBO Journal 15: 3659–3666
Berg OG, von Hippel PH (1987) Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol 193(4): 723–50
Cai HN, Arnosti DN, Levine M (1996) Long-range repression in the Drosophila embryo. Proc Natl Acad
Sci U S A 93(18): 9309–14
Chlon TM, Doré LC, Crispino JD (2012) Cofactor-Mediated Restriction of GATA-1 Chromatin Occupancy
Coordinates Lineage-Specific Gene Expression. Mol Cell
Doré LC, Chlon TM, Brown CD, White KP, Crispino JD (2012) Chromatin occupancy analysis reveals
genome-wide GATA factor switching during hematopoiesis. Blood 119(16): 3724–33
Garber M, Yosef N, Goren A, Raychowdhury R, Thielke A, Guttman M, Robinson J, Minie B, Chevrier
N, Itzhaki Z, Blecher-Gonen R, Bornstein C, Amann-Zalcenstein D, Weiner A, Friedrich D, Meldrim J,
Ram O, Cheng C, Gnirke A, Fisher S, et al (2012) A high-throughput chromatin immunoprecipitation
approach reveals principles of dynamic gene regulation in mammals. Mol Cell 47(5): 810–22
Grass JA, Boyer ME, Pal S, Wu J, Weiss MJ, Bresnick EH (2003) GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling.
Proc Natl Acad Sci U S A 100(15): 8811–6
Harmston N, Lenhard B (2013) Chromatin and epigenetic features of long-range gene regulation. Nucleic
Acids Res 41(15): 7185–99
Hastie TJ, Tibshirani RJ, Friedman JH (2009) The elements of statistical learning: data mining, inference,
and prediction. Springer
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK (2010)
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required
for macrophage and B cell identities. Mol Cell 38(4): 576–89
Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments
of multiple sequences. Bioinformatics 15: 563–577
Hewitt GF, Strunk B, Margulies C, Priputin T, Wang XD, Amey R, Pabst B, Kosman D, Reinitz J, Arnosti
DN (1999) Transcriptional repression by the Drosophila Giant protein: Cis element positioning provides
an alternative means of interpreting an effector gradient. Development 126: 1201–1210
Janssens H, Hou S, Jaeger J, Kim A, Myasnikova E, Sharp D, Reinitz J (2006) Quantitative and predictive
model of transcriptional control of the Drosophila melanogaster even-skipped gene. Nature Genetics 38:
1159–1165
Kim AR, Martinez C, Ionides J, Ramos AF, Ludwig MZ, Ogawa N, Sharp DH, Reinitz J (2013) Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules
of genomic cis-regulatory logic. PLoS Genet 9(2): e1003243
Kulkarni MM, Arnosti DN (2005) cis-Regulatory logic of short-range transcriptional repression in
Drosophila melanogaster. Molecular and Cellular Biology 25: 3411–3420
Laslo P, Spooner CJ, Warmflash A, Lancki DW, Lee HJ, Sciammas R, Gantner BN, Dinner AR, Singh H
(2006) Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell
126(4): 755–66
Lin YC, Jhunjhunwala S, Benner C, Heinz S, Welinder E, Mansson R, Sigvardsson M, Hagman J, Espinoza
CA, Dutkowski J, Ideker T, Glass CK, Murre C (2010) A global network of transcription factors, involving
E2A, EBF1 and Foxo1, that orchestrates B cell fate. Nat Immunol 11(7): 635–43
Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A,
Ienasescu H, Lim J, Shyr C, Tan G, Zhou M, Lenhard B, Sandelin A, Wasserman WW (2014) JASPAR
2014: an extensively expanded and updated open-access database of transcription factor binding profiles.
Nucleic Acids Res 42(Database issue): D142–7
Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M,
Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res
34(Database issue): D108–10
May G, Soneji S, Tipping AJ, Teles J, McGowan SJ, Wu M, Guo Y, Fugazza C, Brown J, Karlsson G, Pina
C, Olariu V, Taylor S, Tenen DG, Peterson C, Enver T (2013) Dynamic analysis of gene expression and
genome-wide transcription factor binding during lineage specification of multipotent progenitors. Cell
Stem Cell 13(6): 754–68
Ogbourne S, Antalis TM (1998) Transcriptional control and the role of silencers in transcriptional regulation
in eukaryotes. Biochem J 331(Pt 1): 1–14
Papatsenko D, Levine M (2011) The Drosophila gap gene network is composed of two parallel toggle
switches. PLoS One 6(7): e21145
Perissi V, Jepsen K, Glass CK, Rosenfeld MG (2010) Deconstructing repression: evolving models of corepressor action. Nat Rev Genet 11(2): 109–23
Segal E, Raveh-Sadka T, Schroeder M, Unnerstall U, Gaul U (2008) Predicting expression patterns from
regulatory sequence in Drosophila segmentation. Nature 451: 535–540
Stopka T, Amanatullah DF, Papetti M, Skoultchi AI (2005) PU.1 inhibits the erythroid program by binding
to GATA-1 on DNA and creating a repressive chromatin structure. EMBO J 24(21): 3712–23
Thorvaldsdóttir H, Robinson JT, Mesirov JP (2013) Integrative Genomics Viewer (IGV): high-performance
genomics data visualization and exploration. Brief Bioinform 14(2): 178–92
Treiber T, Mandel EM, Pott S, Györy I, Firner S, Liu ET, Grosschedl R (2010) Early B cell factor 1 regulates
B cell gene networks by activation, repression, and transcription-independent poising of chromatin.
Immunity 32(5): 714–25
Weigelt K, Lichtinger M, Rehli M, Langmann T (2009) Transcriptomic profiling identifies a PU.1 regulatory
network in macrophages. Biochem Biophys Res Commun 380(2): 308–12