letters
Transposon-mediated rewiring of gene regulatory networks
contributed to the evolution of pregnancy in mammals
© 2011 Nature America, Inc. All rights reserved.
Vincent J Lynch, Robert D Leclerc, Gemma May & Günter P Wagner
A fundamental challenge in biology is explaining the origin
of novel phenotypic characters such as new cell types1–4; the
molecular mechanisms that give rise to novelties are unclear5–7.
We explored the gene regulatory landscape of mammalian
endometrial cells using comparative RNA-Seq and found that
1,532 genes were recruited into endometrial expression in
placental mammals, indicating that the evolution of pregnancy
was associated with a large-scale rewiring of the gene
regulatory network. About 13% of recruited genes are within
200 kb of a Eutherian-specific transposable element (MER20).
These transposons have the epigenetic signatures of enhancers,
insulators and repressors, directly bind transcription factors
essential for pregnancy and coordinately regulate gene
expression in response to progesterone and cAMP. We
conclude that the transposable element, MER20, contributed
to the origin of a novel gene regulatory network dedicated to
pregnancy in placental mammals, particularly by recruiting the
cAMP signaling pathway into endometrial stromal cells.
The defining novelties of Eutherian (placental) mammals include prolonged internal development, maternal recognition of pregnancy, an
invasive placenta and a richly vascularized uterine endometrium that
can accommodate implantation8,9. An essential step in the establishment of pregnancy in many placental mammals is the differentiation
(decidualization) of endometrial stromal cells (ESCs) in response to
the hormone progesterone, the second messenger cAMP and, in some
species, fetal signals10,11. Decidualization of ESCs involves extensive
reprogramming of many cellular functions, including the simultaneous silencing of cellular proliferation pathways and activation of
progesterone and cAMP signaling pathways. Thus, the evolution of
pregnancy was likely dependent on the evolution of ESCs and hormone- and cAMP-mediated cell signaling.
To better understand how the gene regulatory network in ESCs
evolved in mammals, we sequenced the transcriptome from human
(Homo sapiens) ESCs differentiated with progesterone and cAMP
and from the endometrium of mid-pregnancy armadillo (Dasypus
novemcinctus) and short-tailed opossum (Monodelphis domestica)
using high-throughput Illumina sequencing (Fig. 1a). A total of
13,505,261, 13,218,476 and 14,830,816 75-bp paired-end reads were
generated for human, armadillo and opossum, respectively, and
mapped to 17,550 human, 10,590 armadillo and 11,824 opossum
genes. Of 9,323 1:1:1 human:armadillo:opossum orthologs, 5,158
were expressed in human ESCs, whereas 7,433 and 4,857 genes were
expressed in armadillo and opossum endometrium, respectively (see
Methods). We found that 1,532 genes were expressed in both human
and armadillo endometrial cells but not those of opossum, whereas
199 genes were expressed in opossum but in neither human nor armadillo. A parsimonious interpretation of these data suggests 1,532 genes
were recruited into endometrial expression during the evolution of
pregnancy in placental mammals (Fig. 1b).
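The parsimony scoring described above can be sketched with set operations. This is a minimal illustration, assuming the expressed orthologs of each species are available as sets of ortholog identifiers; the function name `classify_orthologs` and the toy identifiers are hypothetical and are not taken from the study.

```python
def classify_orthologs(human, armadillo, opossum):
    """Partition 1:1:1 orthologs by the species in which they are expressed.

    Each argument is a set of ortholog IDs scored as expressed in that
    species' endometrial sample (a hypothetical input format).
    """
    recruited = (human & armadillo) - opossum      # both placentals, not opossum
    opossum_only = opossum - (human | armadillo)   # opossum, neither placental
    shared = human & armadillo & opossum           # expressed in all three
    return {"recruited": recruited, "opossum_only": opossum_only, "shared": shared}


# Toy example with made-up ortholog IDs.
groups = classify_orthologs(
    human={"g1", "g2", "g3"},
    armadillo={"g1", "g2", "g4"},
    opossum={"g2", "g5"},
)
print(sorted(groups["recruited"]))  # ['g1']
```

Genes expressed in only one placental species (here `g3` and `g4`) fall into none of the three groups, which mirrors the requirement that a recruited gene be expressed in both human and armadillo.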
We annotated these 1,532 genes by their Gene Ontology (GO) terms
to identify biological processes and pathways that were recruited into
ESCs in placental mammals. We found that several pathways with
essential roles in pregnancy and decidualization were over-represented
among the recruited genes, including ‘Regulation of G-Protein
Coupled Receptor Signaling’ (P = 0.006), ‘Regulation of Protein
Kinase Activity’ (P = 0.002), ‘Receptor-Mediated Signaling’ (P = 4.17 ×
10⁻⁵) and ‘Intracellular/Stress Activated Protein Kinase Cascade’
(P = 7.18 × 10⁻¹⁴), as well as more general biological processes such as
‘Signal Transduction’ (P = 2.00 × 10⁻⁸), ‘Response to Protein Stimulus’
(P = 0.008) and ‘Cell Differentiation’ (P = 7.18 × 10⁻¹⁴). The overrepresentation of genes involved in G protein–coupled receptor
(GPCR) signaling is particularly interesting because GPCRs mediate
the cAMP signaling pathway, which is essential for decidualization
and the establishment of pregnancy10. These results suggest that
recruitment of the cAMP signaling pathway into endometrial cells
was likely a key innovation during the origin of pregnancy. Indeed,
54.89% (841/1,532) of recruited genes but only 37.06% (6,504/17,550)
of ancestrally expressed genes were differentially regulated upon
progesterone/cAMP stimulation in human ESCs (P = 5.2 × 10⁻⁵⁰,
hypergeometric test).
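A hypergeometric enrichment test of this kind can be sketched with exact integer arithmetic. The framing below is an assumption, not the authors' stated procedure: it treats 17,550 genes as the population, 6,504 of them as responsive, and asks how likely it is that at least 841 of 1,532 sampled genes are responsive. The function name `hypergeom_upper_tail` is hypothetical, and the printed value need not match the reported P value.

```python
from math import comb


def hypergeom_upper_tail(k, draws, successes, population):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population that contains `successes` items of interest."""
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total  # exact integers until this final division


# Assumed framing of the counts quoted in the text.
p = hypergeom_upper_tail(k=841, draws=1532, successes=6504, population=17550)
print(f"P(X >= 841) = {p:.3g}")
```

Working with Python integers avoids the underflow that a naive floating-point product of probabilities would hit for tails this small.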
Although numerous progesterone/cAMP-responsive genes are
expressed in human ESCs, one of the most dramatically induced
is prolactin (PRL). Notably, the progesterone/cAMP-responsive
enhancer of PRL in ESCs is derived from a hAT-Charlie family DNA
transposon (MER20) found only in placental mammals12, suggesting
MER20s have played a role in rewiring the gene regulatory landscape
of ESCs. To determine if other progesterone/cAMP-responsive genes
are associated with MER20s, we searched upstream, downstream
and within the coding regions and introns of differentially regulated
Department of Ecology and Evolutionary Biology & Yale Systems Biology Institute, Yale University, New Haven, Connecticut, USA. Correspondence should be
addressed to V.J.L. (vincent.j.lynch@yale.edu).
Received 4 November 2010; accepted 1 August 2011; published online 25 September 2011; doi:10.1038/ng.917
VOLUME 43 | NUMBER 11 | NOVEMBER 2011 Nature Genetics
[Figure 1 graphic. (a) Phylogeny: birds and reptiles, monotremes, opossum, armadillo, human. (b) Venn diagram: human, armadillo, opossum.]
human genes for MER20s. Notably, we found that 42% (6,949/16,562)
of MER20s were located within 200 kb of the transcriptional start
and end sites of the 6,504 differentially regulated genes, whereas
only 8% (4,834/60,299) of MER20s were found in the same window
[Figure 2 graphic. (a) Number of genes and elements against distance (kb), −200 to 200. (b) Normalized count against distance (bp); panels CpG/PhastCons/7×RP, CTCF/H2AK5ac, H3K4me1/me2/me3 and H3K27me1/me2/me3/ac. (c) Venn diagram: ‘Repressor’, ‘Insulator’, ‘Enhancer’.]
Figure 1 Evolution of the endometrial stromal cell transcriptome in
Therian mammals. (a) Amniote phylogeny showing approximate divergence
dates between major lineages; opossum, armadillo and human samples
were included in this study. Placental mammals are indicated in red.
(b) Venn diagram showing the intersection of 1:1:1 homologous genes
expressed in endometrial cells of human, armadillo and opossum inferred
from RNA-Seq. In total, 1,532 genes were scored as expressed in both
human and armadillo but not opossum.
around genes not differentially regulated upon decidualization (Yates
corrected χ², P = 1 × 10⁻⁴). MER20s are also located closer to differentially regulated genes than expected given a random distribution,
when compared to either genes that are not differentially regulated
(Fig. 2a and Supplementary Fig. 1) or to other Eutherian-specific
hAT Charlie transposons (Supplementary Fig. 2).
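The 200-kb association window used in this analysis can be sketched as a simple interval calculation. This is a minimal illustration, assuming genes and elements are given as (chromosome, start, end) tuples; the function names and all coordinates are hypothetical, and strand is ignored.

```python
WINDOW_BP = 200_000  # 200-kb flank on either side of a gene


def distance_to_gene(gene_start, gene_end, te_start, te_end):
    """Gap in bp between an element and a gene on the same chromosome.

    Returns 0 when the element overlaps the gene body.
    """
    if te_end < gene_start:
        return gene_start - te_end  # element ends before the gene begins
    if te_start > gene_end:
        return te_start - gene_end  # element begins after the gene ends
    return 0


def elements_near_gene(gene, elements, window=WINDOW_BP):
    """Return the elements lying within `window` bp of `gene`.

    `gene` and each element are (chromosome, start, end) tuples.
    """
    chrom, g_start, g_end = gene
    return [
        e for e in elements
        if e[0] == chrom and distance_to_gene(g_start, g_end, e[1], e[2]) <= window
    ]


# Made-up coordinates for illustration only.
gene = ("chr6", 1_000_000, 1_050_000)
elements = [
    ("chr6", 850_000, 850_200),      # 149,800 bp from the gene: inside the window
    ("chr6", 700_000, 700_200),      # 299,800 bp from the gene: outside the window
    ("chr6", 1_020_000, 1_020_150),  # within the gene body: distance 0
    ("chr1", 1_010_000, 1_010_200),  # different chromosome: ignored
]
nearby = elements_near_gene(gene, elements)
print(len(nearby))  # 2
```

The distance of the closest element to each gene, as plotted in 5-kb bins, would follow from taking the minimum of `distance_to_gene` over the elements on the same chromosome.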
To assess the potential of MER20s to act as regulatory elements
for genes other than PRL, we examined MER20s found within
200 kb of stromally regulated genes for characteristics of regulatory
elements, including conservation, predicted regulatory potential, CpG
island density and association with various histone modifications.
As expected for regulatory elements, we found MER20s had high
PhastCons scores and 7× regulatory potential and were surrounded
by regions of high CpG island density (Fig. 2b and Supplementary
Fig. 3). MER20s were also associated with histone modifications
commonly found for insulators (high acetylation of histone H2A Lys5
(H2AK5ac) and CTCF), enhancers (high mono- and dimethylation
(H3K4me1 and H3K4me2) and low trimethylation (H3K4me3) of
histone H3 Lys4) and repressors (high H3K27me1, H3K27me2 and
H3K27me3, low H3K27ac), although few MER20s had epigenetic
marks of more than one type of regulatory element (Fig. 2b,c).
Next, we asked whether MER20s were preferentially associated with
the progesterone/cAMP-responsive genes that were recruited into
Figure 2 MER20s are over-represented near progesterone/cAMP-responsive endometrial genes and have genomic and epigenetic signatures of regulatory
elements. (a) Distribution of distances from differentially regulated stromal genes (N = 6,504) to MER20s in 5-kb bins. Gray bars indicate the total number
of MER20s in each bin, and brown bars indicate the distance of the closest MER20 to the gene. The number of genes with MER20s located between
transcriptional start and end sites is indicated by 0. The expected number of MER20-associated genes per bin given random positions in the human genome
(black line) and compared to genes that were not differentially regulated upon decidualization (blue line) are shown for the location of the closest MER20 to
stromally regulated genes (mean ± s.d.). (b) MER20s are located in regions of the genome with high CpG island density, PhastCons scores and 7× regulatory
potential (RP). The profile of histone modifications around MER20s located within 200 kb of genes either up- or downregulated upon differentiation of
human ESCs is shown for several methylation and acetylation events and for the vertebrate insulator protein CTCF. Panel names are colored with respect to
the profile shown below. MER20s are centered at position 0 (red box), with normalized ChIP-Seq tag density in 5 bp windows upstream and downstream of
the MER20 shown as lines. (c) Venn diagram showing intersections among MER20s classified by histone modifications as repressors, insulators or enhancers.
[Figure 3 graphic. (a) Substitutions per site along the MER20 consensus, with binding sites for YY1, p300, C/EBPβ, CTCF, TGIF, p53, Hox, FOXO1A, ETS1 and PGR. (b) Substitutions per site (log scale) for pseudogenes, fourfold degenerate sites, introns, 3′ flanking regions, synonymous sites, 3′ untranslated regions, twofold degenerate sites, 5′ flanking regions, 5′ untranslated regions, MER20 nonTFBS (1.63), MER20 pTFBS (0.75) and nonsynonymous sites.]
endometrial cell expression. We identified 2,113 human progesterone/
cAMP-responsive genes with at least one MER20 within the gene itself
or within 200 kb of its start or end sites (‘MER20-associated genes’),
including 13.32% (112/841) of the progesterone/cAMP-responsive
genes recruited into endometrial expression. However, only 6.43%
(135/2,116) of ancestral progesterone/cAMP-responsive genes were
associated with MER20s (Yates corrected χ², P = 3.58 × 10⁻⁸). We
annotated the human MER20-associated genes by their GO terms
to determine if they had similar functions and found significant
over-representation for ‘cAMP-mediated signaling’ (P = 0.005) and
‘G-protein receptor signaling’ (P = 0.005). Furthermore, genes in
GPCR- and cAMP-mediated signaling pathways are associated with
MER20s more often than expected by chance, including eight kinases
(P = 0.007), two GPCRs (P = 0.15), three adenylate cyclases (P = 0.002)
and three cAMP phosphodiesterases (P = 0.006). These results suggest
that MER20s directly contributed to the recruitment of GPCR- and
cAMP-mediated signaling pathways into ESC.
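The Yates-corrected χ² comparison of recruited and ancestral genes can be sketched for a 2×2 table as follows. The table layout is an assumption built from the counts quoted in the text (112 of 841 recruited genes and 135 of 2,116 ancestral genes associated with MER20s); the exact table the authors tested is not given, so this sketch is not guaranteed to reproduce the reported P value. The function name `yates_chi2` is hypothetical.

```python
from math import erfc, sqrt


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square statistic and P value (1 d.f.) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the chi-square tail reduces to erfc.
    return chi2, erfc(sqrt(chi2 / 2))


# Rows: recruited vs ancestral genes; columns: with vs without a MER20.
recruited_with, recruited_total = 112, 841
ancestral_with, ancestral_total = 135, 2116
chi2, p = yates_chi2(
    recruited_with, recruited_total - recruited_with,
    ancestral_with, ancestral_total - ancestral_with,
)
print(f"chi2 = {chi2:.2f}, P = {p:.3g}")
```

The identity used for the P value holds only for one degree of freedom, which is the case for any 2×2 table.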
Previous studies have shown that transposable elements contain
transcription factor binding sites that can be donated to regulate the
expression of nearby genes13–19, suggesting that MER20s may have
recruited genes into endometrial expression by acting as regulatory
elements. Indeed, the consensus of 16,562 MER20s in the human
genome contains binding sites for transcription factors important
for hormone responsiveness and pregnancy, such as C/EBPβ and
PGR20,21, FOXO1A22 and HoxA-11 (refs. 23,24), as well as more general transcription factors, such as CTCF, YY1, p53 and p300 (Fig. 3a).
To determine the probability of observing these transcription factor
binding sites in the consensus MER20 by chance, we calculated the
frequency of their occurrence in 10,000 random sequences equal in
length and base composition to the MER20 consensus. We found
that PGR (P < 1 × 10⁻⁴), CTCF (P < 1 × 10⁻⁴), p53 (P < 1 × 10⁻⁴) and
YY1 (P < 1 × 10⁻⁴) binding sites and the combination of Hox, ETS1,
C/EBPβ and FOXO1A binding sites (P = 0.03) were significantly more
common in MER20s than expected. To infer whether transcription
factor binding sites in MER20s evolve under functional constraints,
we estimated nucleotide substitution rates at each site from a random
sample of 500 human MER20s. As expected for regions evolving
under strong purifying selection, nucleotides within transcription
factor binding sites evolve at rates similar to nonsynonymous sites
in proteins, while nucleotides outside binding sites evolve more than
twice as fast (Fig. 3b).
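The random-sequence comparison for binding sites can be sketched as a Monte Carlo test. Two simplifications are assumed here and are not the authors' method: binding sites are found by exact string matching rather than by position weight matrices, and random sequences are produced by shuffling the query, which is one way to keep length and base composition fixed. The sequence, the motif and the function names are all made up for illustration.

```python
import random


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of `motif` in `seq`."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def shuffle_p_value(seq, motif, n_random=10_000, seed=0):
    """Fraction of composition-matched random sequences that contain at
    least as many motif matches as the observed sequence."""
    rng = random.Random(seed)
    observed = count_motif(seq, motif)
    bases = list(seq)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(bases)  # same length and base composition as seq
        if count_motif("".join(bases), motif) >= observed:
            hits += 1
    return hits / n_random


# Hypothetical sequence and motif, not the MER20 consensus.
toy_seq = "ACGT" * 5 + "TGTTTAC" + "GATC" * 5
p_emp = shuffle_p_value(toy_seq, "TGTTTAC", n_random=1000)
print(p_emp)
```

With 10,000 random sequences the smallest nonzero estimate is 1 × 10⁻⁴, which is why a result with no random sequence matching the observation is reported as P < 1 × 10⁻⁴.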
We used chromatin immunoprecipitation with quantitative PCR
(ChIP-qPCR) to test whether MER20s bind transcription factors
important for pregnancy (C/EBPβ, PGR, FOXO1A and HoxA-11)
[Figure 4 graphic. Heat maps of ChIP-qPCR enrichment (Enrich.) and Pearson’s correlation coefficients (PCC) for PGR, p300, C/EBPβ, FOXO1A, HoxA-11, YY1, CTCF, USF1, PRMT1/4 and Pol-II across MER20s named by nearest gene: PRL, AHRR, WNT5A, PGC, TPST2, WNT5A.2, INHBA, IGF1, WNT4, PDZRN3, TNFRSF1B, ITGB8, ITGA1, LAMB4, HBEGF, HSD11B1, RARB, SOX4, LAMB1, HSD17B2 and F13A1.]
Figure 3 MER20s have binding sites for numerous transcription factors, cofactors and insulator proteins and evolve under functional constraints. (a) The
consensus MER20 contains putative binding sites for numerous transcription factors; only sites with a core match of greater than 0.88 are shown. Overlaid
plot shows the 3-bp moving average of the per nucleotide substitution rate from a random sample of 500 MER20s. (b) Nucleotide substitution rates
(per 10⁹ years) for various classes of sequence are shown with increasing functional constraint from top to bottom (log scale). Nucleotide substitution rates
of putative transcription factor binding sites (pTFBS) and non-binding sites (nonTFBS) from a are shown in red. Substitution rates for non-MER20
sequences are shown36.
Figure 4 MER20s are bound by transcription factors and cofactors important for decidualization and pregnancy. (a) Heat map of ChIP-qPCR data
showing fold enrichment of target over normal IgG controls after normalization to input DNA (Enrich.). MER20s are named by their nearest gene. Five
MER20s were enriched (>2-fold over background) for FOXO1A, PGR and C/EBPβ, 7 for HoxA-11, 8 for PRMT1/4, 9 for USF1, 10 for p300 and 15 for
YY1 and CTCF. (b) Pairwise Pearson’s correlation coefficients (PCCs) calculated for transcription factor binding to MER20s indicates that transcription
factors with insulator functions (blue branches) coordinately bind MER20s to the exclusion of transcription factors with enhancer and/or repressor
functions (yellow branches) and vice versa. (c) PCCs indicate that MER20s fall into two distinct groups based on the combination of transcription
factors they bind: ‘insulator-type’ MER20s shown with blue branches and ‘enhancer/repressor-type’ with yellow branches.
[Figure 5 graphic. (a) Heat map of normalized fold change (−2.5 to +2.5) for pGL4.26 and MER20 reporter constructs in PAM212, A549, GgaF, MyoM, HeLa, CHON, COS-1 and ESC. (b) Regulatory strength (fold change) per cell type. (c) mRNA copies of C/EBPβ, YY1, p300, CTCF, USF1, FOXO1A, HoxA-11 and PGR across human tissues.]
Figure 5 MER20 reporter constructs regulate luciferase expression. (a) Heat map shows fold changes in luciferase expression between progesterone/
cAMP-treated cells and untreated cells transiently transfected with MER20 reporter constructs. Cell types are derived from mammalian cervix (HeLa), lung
(A549), kidney (COS-1), muscle (MyoM), keratinocytes (PAM212), chondrocytes (CHON) and endometrial stromal cells (ESC) and chicken fibroblasts
(GgaF). (b) Regulatory strength of MER20s across cell types. Values show the sum of fold changes in luciferase expression upon progesterone/cAMP
treatment from a. The greatest regulatory strength was observed for ESC, whereas MER20s had only weak regulatory ability in other cell types.
(c) Expression of transcription factors shown to bind MER20s by ChIP across human tissues. The only tissue that coexpresses all transcription factors and
cofactors shown to bind MER20s is the uterus.
as well as RNA polymerase II (RNAP), the enhancer protein p300 and
the insulator proteins CTCF, USF1, and PRMT1 and PRMT4. Of 21
randomly chosen MER20s, only three bound none of the transcription
factors tested, whereas the remaining 18 MER20s bound several transcription factors and cofactors (Fig. 4a). For example, 16 MER20s were
enriched for YY1, 15 for C/EBPβ and 13 for CTCF as compared to the
control, normal IgG (t-test, P < 0.05). Notably, specific combinations of
transcription factors and cofactors tend to bind different MER20s, suggesting they have distinct functions. For example, transcription factors
with insulator functions (CTCF, USF1, PRMT1 and PRMT4, and YY1)
bind together on 14/21 MER20s, whereas transcription factors with
enhancer and/or repressor functions (p300, PGR, HoxA-11, C/EBPβ
and FOXO1A) bind together on four MER20s (Fig. 4b,c). This finding suggests that MER20s can be classified as either ‘insulator-type’ or
‘enhancer-repressor-type’ based on the combination of transcription
factors they bind (Fig. 4c), indicating that they are likely to exert
distinct kinds of regulatory control on nearby genes.
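The grouping of transcription factors by their binding profiles rests on pairwise Pearson correlations, which can be sketched as follows. The enrichment values are invented for illustration and are not the ChIP-qPCR measurements; the function name `pearson` and the dictionary layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Made-up fold enrichments over four elements (one list per factor).
enrichment = {
    "CTCF": [5.0, 4.0, 1.0, 0.5],
    "YY1": [6.0, 5.0, 1.5, 1.0],
    "PGR": [0.5, 1.0, 4.0, 5.0],
}
factors = sorted(enrichment)
for i, a in enumerate(factors):
    for b in factors[i + 1:]:
        print(a, b, round(pearson(enrichment[a], enrichment[b]), 2))
```

In this toy table the two insulator-associated factors correlate positively with each other and negatively with PGR, the pattern that would place them on separate branches of a clustering.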
To test whether the MER20s assayed for protein binding by ChIP can
regulate gene expression, we cloned them into the pGL4.26 minimal
promoter luciferase reporter vector and transiently transfected human
ESCs with the reporter and a Renilla control (pGL4.74). Over half of
the MER20s activated luciferase expression over background levels
in undifferentiated cells; however, the majority of MER20s strongly
repressed reporter-gene expression in ESCs decidualized with progesterone and cAMP (Fig. 5a). To test whether the regulatory activity of MER20s was specific to ESC, we repeated the dual-luciferase
reporter assay in mammalian cell types derived from cervix (HeLa),
lung (A549), kidney (COS-1), smooth muscle (MyoM) and keratinocytes
(PAM212), as well as in cells derived from chicken embryonic
fibroblasts (DF1). If MER20s function as cell type–independent
regulatory elements, then we should observe a similar downregulation of luciferase expression upon progesterone/cAMP stimulation
in these cell lines as that observed in human ESC. However, few
MER20s differentially regulated luciferase expression in response to
progesterone/cAMP in these other cell types (Fig. 5a). Significantly
more MER20s downregulated luciferase expression in differentiated
endometrial cells than expected either by chance (P = 1.91 × 10⁻⁵,
binomial test) or compared to the other cell lines we tested (P = 1.10 ×
10⁻¹⁸, binomial test). In addition, MER20s were generally stronger
regulators of luciferase expression in ESCs than in other cell types
(Fig. 5b). Thus, the ability of MER20s to coordinately regulate gene
expression in response to progesterone and cAMP signaling is largely
specific to endometrial cells.
The hormone-responsive regulatory function of MER20s in
endometrial cells implies that the trans-regulatory landscape of
endometrial cells is unique. To test this assumption, we examined
the expression of transcription factors shown to bind MER20s in our
ChIP assay across 34 human tissues from a database of transcription
factor expression profiles25. We found that the general transcription
factors YY1, p300, CTCF and USF1 were expressed across all tissues,
whereas the only tissue to coexpress FOXO1A, C/EBPβ, PGR and
HoxA-11 was the uterus (Fig. 5c). This suggests that other cell types
lack the appropriate transcription factor repertoire to utilize MER20s
as progesterone/cAMP-responsive regulatory elements. Our transcriptomic data show that, like human endometrial cells, opossum
endometrium expresses this set of transcription factors and cofactors, suggesting that endometrial cells were ancestrally predisposed
to utilize MER20s as regulatory elements.
Our targeted ChIP assays demonstrated that many MER20s bind
insulator proteins, such as CTCF, YY1, PRMT1 and PRMT4, and
USF1. Interestingly, previous studies have shown that insulators
generally repress reporter-gene expression in luciferase assays26–28,
which suggests that MER20s that repressed reporter-gene expression
in our luciferase assays may be insulators. Indeed, we found that our
set of functionally characterized insulator-type MER20s were significantly more common between genes that had expression patterns
[Figure 6 graphic. (a) Gene neighborhoods around MER20s, colored by fold expression change (−2.5 to 2.5); labeled loci include PRL, SOX4, WNT5A, WNT4, IGF1, INHBA, ITGA1, ITGB8, RARB, AHRR, PGC, TPST2, PDZRN3, TNFRSF1B, LAMB1, LAMB4, HSD11B1, HSD17B2 and F13A1. (b) Model of regulatory rewiring.]
There is a broad consensus that many of the genetic changes underlying the evolution of morphology occur by the stepwise modification of individual pre-existing cis-regulatory element modules5,6,29.
However, it is questionable whether the origin of complex novelties—
such as the origin of new cell types, which involves the recruitment of
hundreds of genes—can be achieved by these small-scale changes7,29.
Our findings indicate that the gene regulatory network of ESCs was
rewired in placental mammals during the evolution of pregnancy, a
reorganization partly mediated by the transposable element MER20.
Furthermore, MER20s coopted specific signaling pathways essential
for implantation and pregnancy into ESCs by acting as cell type–
specific regulatory elements. These findings strongly support the
existence of transposon-mediated gene regulatory innovation at the
network level, a mechanism of gene regulation first suggested more
than forty years ago by McClintock30 and Britten and Davidson31. Our
data and those of other recent studies13,14,32 show that transposable
elements are potent agents of gene regulatory network evolution and
add to an increasing body of evidence indicating that the evolution of
novel characters involves genetic mechanisms that are distinct from
those involved in the modification of existing characters23,33–35.
URLs. HyPhy, http://www.datam0nk3y.org/hyphy/doku.php/; GOstat,
http://gostat.wehi.edu.au/; Mammalian Atlas of Combinatorial
Transcriptional Regulation database, http://fantom.gsc.riken.jp/4/
ppi_module/; MATCH, http://www.gene-regulation.com/pub/programs.
html#match; Muscle, http://www.ebi.ac.uk/Tools/msa/muscle/.
Methods
Methods and any associated references are available in the online
version of the paper at http://www.nature.com/naturegenetics/.
Data availability. RNA-Seq data have been deposited in Gene Expression
Omnibus (GEO), accession number GSE30708.
Figure 6 MER20s are candidate insulator elements. (a) Insulator-type
MER20s are located between differentially expressed genes in human
ESC. Cartoon shows the relative locations of genes (named rectangles)
and MER20s (small blue or yellow rectangles). The color of each rectangle
shows the fold change in expression of that gene upon progesterone/cAMP
stimulation in human ESCs (green, downregulation; red, upregulation).
White boxes indicate genes not expressed in human ESC. Blue and
yellow boxes between genes indicate insulator-type and cis-regulatory–
type MER20s, respectively. Black boxes are MER20s that were not
characterized in this study. Insulator-type MER20s are significantly more
common between differentially expressed genes than expected by chance
(P = 0.001, binomial test). Asterisks (*) indicate MER20s that have been
previously identified as regulatory elements. (b) Model of gene regulatory
rewiring by MER20s. Ancestrally, numerous genes (black arrows) were
not expressed in ESCs because they were repressed by epigenetic
modifications of chromatin and direct silencing by transcriptional
repressors. MER20s inserted into the genome in the placental mammal
lineage (blue/yellow box on phylogeny), which prevented the spread of
silent chromatin, establishing new borders between transcriptionally silent
(green) and active (red) chromatin.
in response to decidualization opposite to those expected (16/19;
P < 0.002, binomial test), whereas genes without an intervening
insulator-type MER20 were co-regulated during decidualization
(Fig. 6a). These results suggest that the insertion of MER20s into
the genome of ancestral placental mammals shielded blocks of genes
from transcriptional repression, establishing new boundaries between
inactive and active chromatin in stromal cells and leading to
previously repressed genes being available for activation (Fig. 6b).
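A one-sided binomial test of the kind applied to the 16/19 count can be sketched as below. The null probability of 0.5 is an assumption made only for illustration; the source does not state the value used, so the printed figure need not match the reported P value. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p0=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# 16 of 19 cases, under the assumed null probability p0 = 0.5.
p_binom = binomial_upper_tail(16, 19)
print(f"P(X >= 16) = {p_binom:.4f}")
```

Changing `p0` to the genome-wide expectation for the event of interest, where that is known, would give a test closer in spirit to a comparison against chance.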
Note: Supplementary information is available on the Nature Genetics website.
Acknowledgments
The authors would like to thank A. Pyle and the three anonymous reviewers for
comments on an earlier version of this manuscript. We would also like to thank
R.W. Truman (National Hansen’s Disease Program/US National Institutes of
Allergy and Infectious Diseases IAA-2646) and K. Smith for the generous gifts
of pregnant armadillo and opossum uterus and R. Bjornson and N. Carriero for
assistance with RNA-Seq read mapping. This work was funded by a grant from the
John Templeton Foundation, no. 12793, Genetics and the Origin of Organismal
Complexity; results presented here do not necessarily reflect the views of the John
Templeton Foundation. The funders had no role in study design, data collection
and analysis, decision to publish or manuscript preparation.
Author contributions
V.J.L. and G.P.W. designed experiments and wrote the manuscript. V.J.L. and G.M.
performed experiments and analyzed data, and R.D.L. designed and performed
bioinformatics analyses.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Published online at http://www.nature.com/naturegenetics/.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. Darwin, C. On the Origin of Species. 6th edn. (Gramercy, 1883).
2. Mayr, E. The emergence of evolutionary novelties. in Evolution after Darwin Vol. 1
(ed. Tax, S.) 349–380 (Harvard Univ. Press, 1960).
3. Mivart, S.G. On the Genesis of Species (D. Appleton, 1871).
4. Müller, G.B. & Wagner, G.P. Novelty in evolution: restructuring the concept.
Annu. Rev. Ecol. Syst. 22, 229–256 (1991).
5. Prud’homme, B., Gompel, N. & Carroll, S.B. Emerging principles of regulatory
evolution. Proc. Natl. Acad. Sci. USA 104, 8605–8612 (2007).
6. Carroll, S.B. Evo-devo and an expanding evolutionary synthesis: a genetic theory of
morphological evolution. Cell 134, 25–36 (2008).
7. Wagner, G.P. & Lynch, V.J. Molecular evolution of evolutionary novelties: the vagina and
uterus of therian mammals. J. Exp. Zool. B Mol. Dev. Evol. 304, 580–592 (2005).
8. Mess, A. & Carter, A.M. Evolutionary transformations of fetal membrane characters
in Eutheria with special reference to Afrotheria. J. Exp. Zool. B Mol. Dev. Evol. 306,
140–163 (2006).
9. Wildman, D.E. et al. Evolution of the mammalian placenta revealed by phylogenetic
analysis. Proc. Natl. Acad. Sci. USA 103, 3203–3208 (2006).
10.Gellersen, B. & Brosens, J. Cyclic AMP and progesterone receptor cross-talk in
endometrium: a decidualizing affair. J. Endocrinol. 178, 357–372 (2003).
11.Gellersen, B., Brosens, I.M.D. & Brosens, J.M.D. Decidualization of the human
endometrium: mechanisms, functions, and clinical perspectives. Semin. Reprod.
Med. 25, 445–453 (2007).
12.Gerlo, S., Davis, J.R., Mager, D.L. & Kooijman, R. Prolactin in man: a tale of two
promoters. Bioessays 28, 1051–1055 (2006).
13.Bourque, G. et al. Evolution of the mammalian transcription factor binding repertoire
via transposable elements. Genome Res. 18, 1752–1762 (2008).
14.Sasaki, T. et al. Possible involvement of SINEs in mammalian-specific brain
formation. Proc. Natl. Acad. Sci. USA 105, 4220–4225 (2008).
15.Kunarso, G. et al. Transposable elements have rewired the core regulatory network
of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
16.Bejerano, G. et al. A distal enhancer and an ultraconserved exon are derived from
a novel retroposon. Nature 441, 87–90 (2006).
17.Jordan, I.K., Rogozin, I.B., Glazko, G.V. & Koonin, E.V. Origin of a substantial fraction
of human regulatory sequences from transposable elements. Trends Genet. 19,
68–72 (2003).
18.van de Lagemaat, L.N., Landry, J.-R., Mager, D.L. & Medstrand, P. Transposable
elements in mammals promote regulatory variation and diversification of genes with
specialized functions. Trends Genet. 19, 530–536 (2003).
19.Thornburg, B.G., Gotea, V. & Makalowski, W. Transposable elements as a significant
source of transcription regulating signals. Gene 365, 104–110 (2006).
20.Christian, M. et al. Cyclic AMP-induced forkhead transcription factor, FKHR,
cooperates with CCAAT/enhancer-binding protein beta in differentiating human
endometrial stromal cells. J. Biol. Chem. 277, 20825–20832 (2002).
21.Mantena, S.R. et al. C/EBP-beta is a critical mediator of steroid hormone-regulated
cell proliferation and differentiation in the uterine epithelium and stroma.
Proc. Natl. Acad. Sci. USA 103, 1870–1875 (2006).
22.Buzzio, O.L., Lu, Z., Miller, C.D., Unterman, T.G. & Kim, J.J. FOXO1A
differentially regulates genes of decidualization. Endocrinology 147, 3870–3876
(2006).
23.Lynch, V.J. et al. Adaptive changes in the transcription factor HoxA-11 are essential
for the evolution of pregnancy in mammals. Proc. Natl. Acad. Sci. USA 105,
14928–14933 (2008).
24.Hsieh-Li, H.M. et al. Hoxa 11 structure, extensive antisense transcription, and
function in male and female fertility. Development 121, 1373–1385 (1995).
25.Ravasi, T. et al. An atlas of combinatorial transcriptional regulation in mouse and
man. Cell 140, 744–752 (2010).
26.Wei, W. & Brennan, M.D. The gypsy insulator can act as a promoter-specific
transcriptional stimulator. Mol. Cell. Biol. 21, 7714–7720 (2001).
27.Abhyankar, M.M., Urekar, C. & Reddi, P.P. A novel CpG-free vertebrate insulator
silences the testis-specific SP-10 gene in somatic tissues. J. Biol. Chem. 282,
36143–36154 (2007).
28.Kim, J., Kollhoff, A., Bergmann, A. & Stubbs, L. Methylation-sensitive binding of
transcription factor YY1 to an insulator sequence within the paternally expressed
imprinted gene, Peg3. Hum. Mol. Genet. 12, 233–245 (2003).
29.Carroll, S.B. Evolution at two levels: on genes and form. PLoS Biol. 3, e245
(2005).
30.McClintock, B. Components of action of the regulators Spm and Ac. Year B. Carnegie
Inst. Wash. 64, 527–536 (1965).
31.Britten, R.J. & Davidson, E.H. Gene regulation for higher cells: a theory. Science 165,
349–357 (1969).
32.Feschotte, C. Transposable elements and the evolution of regulatory networks.
Nat. Rev. Genet. 9, 397–405 (2008).
33.Adamska, M. et al. The evolutionary origin of hedgehog proteins. Curr. Biol. 17,
R836–R837 (2007).
34.Wagner, G.P. & Lynch, V.J. Evolutionary novelties. Curr. Biol. 20, R48–R52
(2010).
35.Oliver, K.R. & Greene, W.K. Transposable elements: powerful facilitators of evolution.
Bioessays 31, 703–714 (2009).
36.Hartl, D. Essential Genetics: A Genomics Perspective (Jones and Bartlett Publishers,
2010).
ONLINE METHODS
Transcriptome sequencing. Endometrial samples from mid-stage pregnant
opossum and armadillo were dissected from freshly killed females to remove
myometrial and placental tissue and washed in ice-cold PBS to remove blood
cells; tissues were stored in RNA-Later at −80 °C until processing. Endometrial
samples were isolated from whole uteri of armadillo, because they cannot
be bred in captivity and tissue culture methods are not available for either
armadillo or opossum stromal cells. Samples of differentiated and undifferentiated human endometrial stromal cells were cultured and differentiated as
described below. We extracted total RNA using the Qiagen RNeasy Midi
RNA-extraction kit followed by on-column DNase treatment (Qiagen). Total
RNA quality was assayed with a Bioanalyzer 2100 (Agilent) and found to be of
excellent quality. Aliquots from the total RNA samples were sequenced using
the Illumina Genome Analyzer II platform by following the protocol suggested
by Illumina for sequencing of cDNA samples. Two biological replicates each
were sequenced for the human undifferentiated and differentiated endometrial
stromal cells, and two samples dissected from different locations in the uteri
of armadillo and opossum were sequenced.
Sequence analysis was performed with Bowtie, and reads were mapped to
the human (GRCh37), armadillo (dasNov2) and opossum (monDom5) cDNA
builds at Ensembl; two mismatches were allowed, and reads aligning to more
than one cDNA were disregarded. Sequencing was performed at the W.M. Keck
Microarray at the Yale University Medical School. The average read count from
the two lanes of data was used for comparative transcriptome analysis.
Preliminary analysis indicated that most variability in read counts between
the two replicate samples occurred for genes with under 20 reads. Therefore,
subsequent analyses were based on genes with read counts greater than 20
reads. However, including all genes with reads >1 did not change our results.
Differentially regulated genes were defined as those that were up- or downregulated more than twofold in differentiated relative to undifferentiated human
endometrial stromal cells.
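The filtering and twofold criterion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and it assumes that the 20-read threshold is applied to the averaged counts in both conditions, which the text does not specify.

```python
from statistics import mean


def average_lanes(lane_a, lane_b):
    """Average per-gene read counts from two lanes.

    Assumption: a gene missing from one lane is counted as 0 reads there.
    """
    genes = set(lane_a) | set(lane_b)
    return {g: mean((lane_a.get(g, 0), lane_b.get(g, 0))) for g in genes}


def classify_genes(undiff, diff, min_reads=20, fold=2.0):
    """Label genes as 'up' or 'down' in differentiated vs undifferentiated cells.

    Assumption: a gene is analyzed only if its averaged read count exceeds
    min_reads in BOTH conditions (the source does not say which count is used).
    """
    calls = {}
    for gene in undiff.keys() & diff.keys():
        u, d = undiff[gene], diff[gene]
        if u <= min_reads or d <= min_reads:
            continue  # low-count genes showed the most replicate variability
        ratio = d / u
        if ratio > fold:
            calls[gene] = "up"
        elif ratio < 1 / fold:
            calls[gene] = "down"
    return calls
```

Lowering `min_reads` to 1 would reproduce, under the same assumptions, the more permissive analysis mentioned in the text.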
We identified 1:1:1 human:armadillo:opossum orthologs from the human,
armadillo and opossum cDNA builds at Ensembl using BioMart. We annotated
the 1,532 derived Eutherian ESC-expressed genes by their over-represented
Gene Ontology (GO) terms using GOstat with the goa_human database, a
minimal path length of 3, Benjamini correction for the false discovery rate
and merging GOs if their associated gene lists were inclusions or differed by
less than ten genes. The background set comprised all genes found in the
goa_human database.
Identification of putative transcription factor binding sites in MER20 and
molecular evolution of MER20s. Potential transcription factor binding sites
in the human consensus MER20 were identified using the MATCH program
(see URLs) with TRANSFAC binding site matrices, with a match cut-off
selected to minimize the sum of false positive and false negative results. Only
binding site matches with >88% identity to the core binding site motif in the
MER20 consensus are reported here.
To estimate the evolutionary rate of substitutions in MER20s, we downloaded all MER20s from the human genome and randomly sampled 500. These
500 human MER20s were aligned with Muscle (see URLs), and alignment
columns with more than 51% gapped sequences (gaps occurred outside most
known or predicted binding sites and tended to occur more frequently at the
5′ and 3′ ends of the sequences) were removed. The gap-trimmed sequence
alignment was used to estimate site-specific substitution rates with the HyPhy
batch program siterates.bf, which implements maximum-likelihood estimation of substitution rates, together with a phylogenetic tree constructed for the 500
MER20s using PhyML under a GTR+Γ model with four gamma classes.
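The column-trimming step can be sketched as follows. This is an illustrative Python example only; the function name and the representation of the alignment as a list of equal-length strings are assumptions, and the rate estimation itself (HyPhy, PhyML) is not reproduced.

```python
def trim_gapped_columns(alignment, max_gap_fraction=0.51, gap_char="-"):
    """Remove alignment columns in which more than max_gap_fraction of the
    sequences carry a gap.

    alignment: list of aligned sequences (strings of equal length).
    Returns a new list of sequences restricted to the retained columns.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    keep = [
        i for i in range(length)
        if sum(seq[i] == gap_char for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

With four sequences, a column with two gaps (50%) is retained, whereas a column with three gaps (75%) is removed.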
Cell culture. Human endometrial stromal cells immortalized with human
telomerase (ATCC, cat. no. CRL-4003), HeLa, A549, COS-1, MyoM, PAM212
and chicken fibroblasts were grown in DMEM supplemented with 5%
charcoal-stripped calf serum (Hyclone) and 1% antibiotic/antimycotic
(ABAM). To induce decidualization, cells were treated with 0.5 mM 8-Br-cAMP
(Sigma) and 1 µM of the progesterone analog medroxyprogesterone acetate
(MPA; Sigma) for 48 h. At 80% confluency, cells were collected for gene expression analysis, transfected for luciferase assays using TransIT-LT1 (Mirus)
according to the manufacturer's protocol or harvested for ChIP assays.
Identification of MER20s in the human genome. We mapped the distribution
of MER20s in the human genome (GRCh37) using the Repeatmasker track
of the UCSC genome browser and identified 16,562 MER20s. We analyzed
the distribution of distances between MER20s and differentially regulated
stromal genes to determine whether MER20s were randomly distributed with
respect to stromal genes or whether they were preferentially located within
some distance [1,d] from the start and end sites or within (d = 0) differentially
regulated genes. To generate a null distribution for the association of MER20s
with stromal genes, we generated random positions in the human genome,
equal in number to the set of genes scored as ‘MER20-associated’ (N = 2,113)
and evaluated the distance from that position to the nearest upstream or downstream MER20. This procedure was replicated 500 times (Fig. 2b, black line).
To determine the expected random distribution and error of the background
distance of MER20s to genes in the human genome, we sampled 2,113 genes
that were not differentially regulated by MPA and cAMP stimulation and
evaluated the distance to their nearest upstream or downstream MER20. This
procedure was replicated 500 times (Fig. 2b, blue line).
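The resampling procedure for the null distribution can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' code: the genome is treated as a single linear coordinate system, each MER20 is reduced to one position, and all function names are hypothetical.

```python
import bisect
import random


def nearest_distance(position, sorted_sites):
    """Distance from position to the nearest upstream or downstream site.

    sorted_sites must be a non-empty, ascending list of site coordinates.
    """
    if not sorted_sites:
        raise ValueError("at least one site is required")
    i = bisect.bisect_left(sorted_sites, position)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - position)      # next site at or after position
    if i > 0:
        candidates.append(position - sorted_sites[i - 1])  # closest site before position
    return min(candidates)


def null_distances(sorted_sites, genome_length, n_positions, n_replicates, seed=0):
    """Draw n_positions random coordinates per replicate and record the
    distance from each to its nearest site."""
    rng = random.Random(seed)
    return [
        [nearest_distance(rng.randrange(genome_length), sorted_sites)
         for _ in range(n_positions)]
        for _ in range(n_replicates)
    ]


def fraction_within(distances, d):
    """Fraction of sampled positions lying within distance d of a site."""
    return sum(x <= d for x in distances) / len(distances)
```

In the setting described above, `n_positions` would be 2,113 and `n_replicates` 500; the gene-based background (blue line) would replace the random coordinates with the positions of genes that were not differentially regulated.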
Epigenetic and genomic profile of MER20s. We examined the epigenetic
status of MER20s associated with stromal genes by using recent genome-wide
ChIP-Seq data for 37 histone modifications, together with the histone variant
H2A.Z and the insulator protein CTCF37,38. To correlate histone modifications with MER20s, we counted ChIP-Seq tag density in 5-bp windows 10 kb
up- and downstream of ~6,000 MER20s located within 200 kb of differentially
regulated ESC genes. Note that position “0” on the x axis of Figure 2a corresponds to the midpoint of each MER20 element.
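The windowed tag counting can be illustrated with a short Python sketch. It assumes a single chromosome, represents each ChIP-Seq tag by one coordinate and aggregates counts over all MER20s; the function name and these simplifications are assumptions rather than a description of the original analysis.

```python
from bisect import bisect_left
from collections import Counter


def tag_density_profile(midpoints, tag_positions, flank=10_000, window=5):
    """Count tags in fixed-width windows around element midpoints.

    Returns a dict mapping the window start offset (relative to the
    midpoint, so 0 is the element midpoint) to the tag count summed
    over all elements.
    """
    tags = sorted(tag_positions)
    counts = Counter()
    for mid in midpoints:
        lo = bisect_left(tags, mid - flank)   # first tag inside the flank
        hi = bisect_left(tags, mid + flank)   # first tag beyond the flank
        for tag in tags[lo:hi]:
            offset = tag - mid
            counts[(offset // window) * window] += 1
    return dict(counts)
```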
We also annotated MER20s and the genomic regions immediately around
them by counting CpG island density, PhastCons scores and 7× regulatory potential scores in 5-bp windows 10 kb up- and downstream of MER20s
located within 200 kb of differentially regulated ESC genes; these data
were downloaded from the UCSC genome browser and follow the
definitions found there.
Chromatin immunoprecipitation and luciferase reporter assays. For
chromatin immunoprecipitation (ChIP) assays, the EZ-Zyme Chromatin
Prep kit (Millipore) was used following the manufacturer's protocol. Briefly,
chromatin was cross-linked with 1% formaldehyde for 10 min; this was
followed by quenching with glycine and DNA fragmentation. The equivalent
of 10^6 cells was used for each immunoprecipitation. The nuclear lysate was
precleared for 1 h with protein G magnetic beads and incubated overnight at
4 °C with protein G–linked magnetic beads and 2 µg of either ChIP-validated
antibodies to p300, FOXO1A, PGR, YY1, HoxA-11, C/EBPβ, CTCF, USF1 or
PRMT1 and PRMT4, or species-appropriate IgG as a negative control (all from
Santa Cruz Biotechnology). Enrichment of the MER20 targets was evaluated
by qPCR using 1/50 of the immunoprecipitated chromatin as template and
the Power SYBR Green PCR Master Mix (Applied Biosystems). We randomly
selected 21 MER20s that span the range of distances from their associated
genes (from –1 kb downstream of an end site to nearly 200 kb upstream of the
start site) to test by ChIP.
The MER20s characterized by ChIP were cloned into the pGL4.26 luciferase reporter vector (Promega). pGL4.26 luciferase reporter constructs
(100 ng) and the pGL4.74 Renilla luciferase control (20 ng) were transiently
transfected into undifferentiated and differentiated ESCs, and luciferase
expression was assayed using the Dual-Luciferase reporter system (Promega)
48 h after transfection. Firefly luciferase activity was normalized with respect
to Renilla luciferase activity. Initially, cells for luciferase assays were grown
in DMEM supplemented with 5% charcoal-stripped calf serum and 1%
antibiotic/antimycotic. Cells (10^5) were seeded into opaque 96-well plates
and grown either in the medium described above or in this medium supplemented with 0.5 mM 8-Br-cAMP (cAMP) and 1 µM medroxyprogesterone
acetate (MPA).
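The normalization of firefly to Renilla luciferase activity amounts to a simple ratio, sketched here in Python. The function names are hypothetical, and the comparison of a construct against a reference measurement in `fold_change` is an assumption for illustration; the text does not state which reference was used.

```python
def relative_luciferase(firefly, renilla):
    """Firefly activity normalized to the Renilla transfection control."""
    if renilla <= 0:
        raise ValueError("Renilla activity must be positive")
    return firefly / renilla


def fold_change(construct, reference):
    """Ratio of normalized activities for two (firefly, renilla) readings.

    Assumption: 'reference' is whatever baseline the experimenter chooses,
    for example an empty-vector well; values below 1 indicate lower
    normalized reporter activity than the reference.
    """
    return relative_luciferase(*construct) / relative_luciferase(*reference)
```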
To assess the probability of observing over-representation of downregulation by MER20s in luciferase assays, we used the binomial test, with the
observed number of MER20s that downregulated luciferase expression in
endometrial cells (19), given the sample size (21) and either an expected
proportion of 0.5 (for the comparison to chance alone) or an expected
proportion of 0.1 (14/140 observations from the luciferase assays in the
other cell types showed downregulation of luciferase expression). Raw data are
provided in Supplementary Tables 1 and 2.
Gene expression profile. To identify tissues that coexpress FOXO1A,
C/EBPβ, PGR, HoxA-11, YY1, p300, CTCF and USF1 (data for PRMT1 and
PRMT4 are not available), we calculated the mRNA copy number across 34
tissues using the recently compiled Mammalian Atlas of Combinatorial
Transcriptional Regulation database of absolutely quantified real-time PCR
data (qRT-PCR). mRNA copy data were divided into ten copy bins.
37.Barski, A. et al. High-resolution profiling of histone methylations in the human
genome. Cell 129, 823–837 (2007).
38.Wang, Z. et al. Combinatorial patterns of histone acetylations and methylations in
the human genome. Nat. Genet. 40, 897–903 (2008).
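As a worked illustration of the binomial test described for the luciferase assays (19 of 21 MER20s downregulating reporter expression), the upper-tail probability can be computed directly in Python. The one-sided form of the test is an assumption, since the text does not state whether a one- or two-sided test was used, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the probability of observing
    at least k successes in n trials when each succeeds with probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Observed: 19 of 21 MER20s downregulated luciferase expression.
p_vs_chance = binomial_upper_tail(19, 21, 0.5)       # expected proportion 0.5
p_vs_other_cells = binomial_upper_tail(19, 21, 0.1)  # expected proportion 14/140
```

Under an expected proportion of 0.5, the tail probability equals (210 + 21 + 1)/2^21 = 232/2^21, roughly 1.1 × 10^-4; under an expected proportion of 0.1 it is many orders of magnitude smaller.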
doi:10.1038/ng.917