Research
Gene body methylation shows distinct patterns associated with
different gene origins and duplication modes and has a
heterogeneous relationship with gene expression in Oryza sativa
(rice)
Yupeng Wang1,2, Xiyin Wang1,3, Tae-Ho Lee1, Shahid Mansoor1 and Andrew H. Paterson1
1Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA; 2Computational Biology Service Unit, Cornell University, Ithaca, NY 14853, USA; 3Center for Genomics and Computational Biology, School of Life Sciences, School of Sciences, Hebei United University, Tangshan, Hebei, 063000, China
Author for correspondence:
Andrew H. Paterson
Tel: +1 706 583 0162
Email: paterson@plantbio.uga.edu
Received: 31 October 2012
Accepted: 6 December 2012
New Phytologist (2013)
doi: 10.1111/nph.12137
Key words: correlation analysis, DNA methylation, gene body, gene duplication, gene origin, Ks, rice (Oryza sativa).
Summary
Whole-genome duplication (WGD) has been recurring and single-gene duplication is also
widespread in angiosperms. Recent whole-genome DNA methylation maps indicate that gene
body methylation (i.e. of coding regions) has a functional role. However, whether gene body
methylation is related to gene origins and duplication modes has yet to be reported.
In rice (Oryza sativa), we computed a body methylation level (proportion of methylated
CpG within coding regions) for each gene in five tissues.
Body methylation levels follow a bimodal distribution, but show distinct patterns associated
with transposable element-related genes; WGD, tandem, proximal and transposed duplicates;
and singleton genes. For pairs of duplicated genes, divergence in body methylation levels
increases with physical distance and synonymous (Ks) substitution rates, and WGDs
show lower divergence than single-gene duplications of similar Ks levels. Intermediate body
methylation tends to be associated with high levels of gene expression, whereas heavy body
methylation is associated with lower levels of gene expression.
The biological trends revealed here are consistent across five rice tissues, indicating that
genes of different origins and duplication modes have distinct body methylation patterns, and
body methylation has a heterogeneous relationship with gene expression and may be related
to survivorship of duplicated genes.
Introduction
Gene duplication is a primary mechanism for the evolution of
novelty and complexity in higher organisms (Ohno, 1970; Flagel
& Wendel, 2009; Innan & Kondrashov, 2010). It is now known
that genes may be duplicated by various modes, generally referred
to as large-scale and small-scale duplications (Maere et al., 2005;
Casneuf et al., 2006; Ganko et al., 2007; Freeling, 2009; Wang
et al., 2012). The most frequent consequence of gene duplication
is reversion to single-copy (singleton) status (Freeling & Thomas,
2006; Freeling, 2009); however, genes retained in duplicate offer
the potential for the evolution of novelty (Ohno, 1970; Flagel &
Wendel, 2009; Innan & Kondrashov, 2010). Thus, the study of
mechanisms for gene retention and evolution in view of different
gene duplication modes is very important (Wang et al., 2012).
Oryza sativa (rice) is a good model to elucidate the genetic mechanisms and evolutionary features of different gene duplication
modes (Wang et al., 2007, 2011; Li et al., 2009).
© 2013 The Authors
New Phytologist © 2013 New Phytologist Trust
Rice has experienced at least two whole-genome duplications
(WGDs), one shared with most if not all cereals (ρ), and another
more ancient event (σ) (Paterson et al., 2004; Tang et al., 2010).
In angiosperm species, most duplicated chromosomal segments
are thought to arise from WGDs (Tang et al., 2008a,b). Small-scale gene duplications, often referred to as single-gene duplications, are also widespread in rice (Wang et al., 2007, 2011; Li
et al., 2009). According to the physical distance between
duplicates, single-gene duplications can be further classified into
local and transposed gene duplications (Ganko et al., 2007;
Wang et al., 2011, 2012). Local duplications may occur as tandem duplications (i.e. duplicated genes are consecutive in the
genome), which may be caused by illegitimate chromosomal
recombination (Freeling, 2009), or proximal duplications (i.e.
separated by one or more genes), which may be caused by localized transposon activities (Zhao et al., 1998; Wang et al., 2011,
2012). Transposable element (TE)-related genes comprise a significant portion of rice protein-coding genes (Yuan et al., 2005;
Jiao & Deng, 2007). TE-related genes have normal gene structures with coding capacity and transcriptional activity, but share
significant sequence similarity with known TEs (Jiao & Deng,
2007). Transposed duplications that create two gene copies far
away from each other are widespread in plants (Freeling et al.,
2008; Freeling, 2009; Woodhouse et al., 2010, 2011; Wang
et al., 2011, 2012), suggesting that many non-TE-related genes
are also mobile, via either DNA- or RNA-mediated transposition
(Cusack & Wolfe, 2007). Transposed duplicates may also occur
by intrachromosomal recombination (Woodhouse et al., 2011).
Divergence between duplicated genes increases with time, but
the rate/extent of divergence is affected by gene duplication
modes (Casneuf et al., 2006; Arabidopsis Interactome Mapping
Consortium, 2011; Wang et al., 2011). Generally, WGD duplicates are less divergent than other duplicates (Casneuf et al.,
2006; Ganko et al., 2007; Li et al., 2009; Wang et al., 2011).
Moreover, singletons show higher interspecies conservation than
duplicates based on cross-species comparison of genomic and
expression data (Ha et al., 2009; Wang et al., 2011). Indeed, the
distinct evolutionary effects of gene duplication modes may, in
turn, affect the rates of gene retention, depending on functional
category-specific selection pressures on neo-functionalization,
functional buffering or high expression (Freeling, 2009; Innan &
Kondrashov, 2010; Wang et al., 2012).
Under-explored and controversial in the current literature are
the roles of epigenetic marks in gene duplication, evolution and
retention. DNA methylation is one of the most important epigenetic marks, and high-resolution whole-genome DNA methylation maps based on bisulfite sequencing have been made for rice
(Feng et al., 2010; Zemach et al., 2010a,b). Previous analyses of
whole-genome DNA methylation data have suggested that rice
DNA methylation occurs predominantly at cytosine followed by
guanine, that is, ‘CpG’ dinucleotides (Feng et al., 2010; Zemach
et al., 2010b). Gene body methylation (DNA methylation of
coding regions) is conserved across eukaryotic lineages (Lee et al.,
2010; Su et al., 2011). Although it is broadly accepted that promoter methylation is generally associated with the repression of
plant gene expression (Zhang et al., 2006; Su et al., 2011), the
functional roles of gene body methylation are controversial (Lee
et al., 2010; Su et al., 2011). To date, gene body methylation has
been suggested to enhance accurate splicing of primary transcripts (Lorincz et al., 2004; Kolasinska-Zwierz et al., 2009;
Schwartz et al., 2009; Luco et al., 2010) and/or prevent ‘leaky’
expression from intragenic cryptic promoters (Zilberman et al.,
2007; Maunakea et al., 2010). In Arabidopsis and rice, association of gene body methylation with active transcription has been
proposed (Zhang et al., 2006; Zilberman et al., 2007; Zemach
et al., 2010b; Takuno & Gaut, 2012). By contrast, several studies
in rice have suggested that the major effect of body methylation
on gene expression is repression (Li et al., 2008; He et al., 2010).
From the point of view of evolution, body-methylated genes have
been suggested to be functionally important and to evolve slowly
(Sarda et al., 2012; Takuno & Gaut, 2012). However, the interplay between gene body methylation and gene duplication, as
well as the evolution of duplicate genes, has been little explored.
Study of the potential interplay between gene body methylation and gene origins and duplications may help us to understand
the roles of epigenetic factors in shaping current genomes, as well
as the mechanisms underlying gene duplications and evolution.
In rice, we analyzed single-base resolution, whole-genome DNA
methylation maps of five tissues (Zemach et al., 2010a,b). For
each gene, we computed a body methylation level (proportion of
methylated CpG dinucleotides within coding regions) in each tissue. We classified rice genes into different origins and duplication
modes, including TE-related genes, singletons, and WGD, tandem, proximal and transposed duplicates, and compared the
body methylation levels among different categories of genes. For
duplicated genes, we examined divergence in body methylation
levels and its relationship with coding sequence divergence. Furthermore, we studied the potential relationships between body
methylation and duplicate gene retention. Finally, we investigated the complicated relationships between body methylation
and gene expression levels.
Materials and Methods
Sequence sources
The rice gene set was retrieved from the Rice Genome Annotation Project (TIGR5, http://rice.plantbiology.msu.edu/). The
gene sets of outgroups, including Sorghum bicolor, Brachypodium
and Zea mays, were retrieved from Phytozome (http://www.phytozome.net/). For each gene, only the first transcript in the
genome annotation (transcript name suffixed by ‘.1’) was used
for analysis.
Identification of genes of different origins
Rice genes were first divided into TE-related and non-TE-related
genes, according to TIGR5. The non-TE-related genes were further classified into WGD duplicates, singletons, tandem, proximal, transposed and dispersed duplicates. To this end, the
population of potential gene duplications in rice was identified
using BLASTP (Altschul et al., 1990) (TE-related genes were not
considered for BLASTP). For each gene, only the top five nonself
BLASTP matches that met a threshold of E < 10⁻¹⁰ were considered as potential gene duplication relationships. Genes without any BLASTP hit were deemed singletons. WGD duplicates
were obtained from a previous study (Tang et al., 2010). We then
derived single-gene duplications by excluding pairs of WGD
duplicates from the population of gene duplications. Tandem
duplicates were adjacent homologs and proximal duplicates were
not adjacent, but within 10 annotated genes of each other on the
same chromosomes and without any paralog between them.
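The tandem and proximal rules above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `classify_local_pair`, the rank-based input format and the reading of "within 10 annotated genes" as a rank difference of at most 10 are all assumptions, and the additional requirement that no paralog lies between the two genes is not checked here.

```python
from typing import Optional


def classify_local_pair(chrom_a: str, rank_a: int,
                        chrom_b: str, rank_b: int,
                        max_gap: int = 10) -> Optional[str]:
    """Label a homologous gene pair as 'tandem', 'proximal' or None.

    rank_a and rank_b are the positions of the two genes in the ordered
    list of annotated genes on their chromosome (assumed input format).
    """
    if chrom_a != chrom_b:
        return None              # local duplicates share a chromosome
    gap = abs(rank_a - rank_b)
    if gap == 0:
        return None              # same gene, not a pair
    if gap == 1:
        return "tandem"          # adjacent homologs
    if gap <= max_gap:
        return "proximal"        # nearby but not adjacent (assumed cutoff)
    return None                  # left for the transposed/dispersed search
```

Pairs that return `None` on the same chromosome would be passed on to the search for transposed duplicates described next.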
The remaining single-gene duplications, that is, after deduction
of the tandem and proximal duplications, were searched for
transposed duplications. To accomplish this aim, genes at ancestral (i.e. interspecies collinear) chromosomal positions were discerned by aligning syntenic blocks within rice and between rice
and its outgroups, including Sorghum bicolor, Brachypodium and
Zea mays. For a pair of transposed duplicates, we required that
one duplicate was at its ancestral locus and the other was at a
nonancestral locus, named the parental duplicate and transposed
duplicate, respectively. For a transposed duplicate, there may be
multiple ancestral paralogs, and we regarded the ancestral paralog
with highest sequence identity as its parental duplicate. The
remaining duplicates which do not belong to any of the WGD,
tandem, proximal and transposed duplicates were simply denoted
as dispersed duplicates.
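The choice of a parental duplicate for a transposed gene can be illustrated as follows. This is a hedged sketch under stated assumptions: `pick_parental`, the `(gene_id, percent_identity)` input and the `is_ancestral` lookup are hypothetical names and formats, and the syntenic-block alignment that decides which loci are ancestral is taken as given.

```python
from typing import Dict, List, Optional, Tuple


def pick_parental(transposed_gene: str,
                  paralogs: List[Tuple[str, float]],
                  is_ancestral: Dict[str, bool]) -> Optional[str]:
    """Return the ancestral-locus paralog with the highest identity.

    paralogs: (gene_id, percent_identity) pairs for transposed_gene.
    is_ancestral: True for genes at interspecies collinear positions.
    """
    if is_ancestral.get(transposed_gene, False):
        return None  # the query itself sits at an ancestral locus
    candidates = [(identity, gene) for gene, identity in paralogs
                  if is_ancestral.get(gene, False)]
    if not candidates:
        return None  # no ancestral paralog: the gene stays 'dispersed'
    # Highest identity wins; ties fall back to gene ID order (arbitrary).
    return max(candidates)[1]
```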
Rice whole-genome DNA methylation data
Rice single-base resolution DNA methylation data of embryo,
endosperm, leaf, root and shoot tissues, generated by bisulfite
sequencing technology, were obtained from two previous studies
(Zemach et al., 2010a,b). We used the processed data provided
by the authors, available at the Gene Expression Omnibus database (accession numbers: GSM497260, GSM560562,
GSM560563, GSM560564 and GSM560565). In the processed
data, the likelihood of methylation was shown for each CpG,
CHG and CHH site, whose chromosomal position was annotated according to TIGR5. Only CpG methylation was considered in this study. The likelihood of CpG methylation showed a
strong bimodal distribution, and we regarded a value of > 0.5 as
methylation of CpG dinucleotides.
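The body methylation level defined in this study (the proportion of CpG sites in the coding region whose methylation likelihood exceeds 0.5) can be computed as in the sketch below. The function name and the list-of-likelihoods input are assumptions made for illustration; returning `None` for genes without any scored CpG is also an assumption, as the source does not say how such genes were handled.

```python
from typing import Iterable, Optional


def body_methylation_level(cpg_likelihoods: Iterable[float],
                           threshold: float = 0.5) -> Optional[float]:
    """Proportion of coding-region CpG sites called methylated.

    A site is called methylated when its likelihood is > threshold.
    Returns None when the gene has no scored CpG site.
    """
    values = list(cpg_likelihoods)
    if not values:
        return None
    methylated = sum(1 for v in values if v > threshold)
    return methylated / len(values)
```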
Comparing the distributions of body methylation levels
As body methylation levels tend to be bimodally distributed, it is
not reasonable to compute a single mean and standard deviation
of body methylation levels for a gene group. To compare the distributions of body methylation levels of different gene groups, we
used both parametric and nonparametric tests: (1) parametric
test: we counted the gene numbers associated with low methylation (body methylation level < 0.1), intermediate methylation
(0.1 ≤ body methylation level ≤ 0.9), and high methylation
(body methylation level > 0.9) for each gene group, and then
compared the numbers of genes in each methylation class
between gene groups using a χ² test; and (2) nonparametric test: the comparison of the distributions of body methylation levels between two gene groups was modeled as testing
whether one gene group had more outliers (highly body-methylated genes) than the other group. The Outlier-Sum statistic
(Tibshirani & Hastie, 2007) was adopted. P values were assessed
based on 10⁴ permutations of the pooled body methylation levels
of the two gene groups for comparison.
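The first of the two tests can be sketched as follows: genes are binned into low, intermediate and high methylation classes, and a Pearson χ² statistic is computed on the resulting contingency table. This is an illustration only; the function names are hypothetical, the code assumes every row and column total is non-zero, and it returns the statistic without a P value. The Outlier-Sum permutation test is not shown.

```python
from typing import List, Sequence


def methylation_class_counts(levels: Sequence[float]) -> List[int]:
    """Count genes with low (<0.1), intermediate and high (>0.9) levels."""
    low = sum(1 for x in levels if x < 0.1)
    high = sum(1 for x in levels if x > 0.9)
    return [low, len(levels) - low - high, high]


def chi_square_statistic(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and class.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

Each gene group contributes one row of class counts. The statistic is referred to a χ² distribution with (r - 1)(c - 1) degrees of freedom; if SciPy is available, `scipy.stats.chi2_contingency` returns the statistic and P value directly.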
Ks calculation
Protein sequences of duplicated genes were aligned using
Clustalw (Thompson et al., 1994) with default parameters. Then,
the protein alignment was converted to a coding sequence
alignment using the ‘Bio::Align::Utilities’ module in the BioPerl
package (http://www.bioperl.org/). Ks was calculated using the
methods of Nei & Gojobori (1986) and Yang & Nielsen (2000),
via the ‘Bio::Align::DNAStatistics’ and ‘Bio::Tools::Run::Phylo::
PAML::Yn00’ modules, respectively, in the BioPerl package. It
should be noted that extremely high levels of sequence divergence
between duplicated genes may cause the ‘Bio::Align::DNAStatistics’ module to generate invalid Ks values, which were then ruled
out from the related analysis. Following a previous study in rice
(Tang et al., 2010), we excluded Ks values for gene pairs with
average third-codon-position GC content (GC3) > 75% from
related statistical analyses because there are two distinct groups of
genes with significantly different GC3. Ks values > 3.0 were also
excluded because of saturated substitutions at synonymous positions.
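The exclusion rules for Ks values can be summarized in a small filter. This is a sketch under assumptions: `usable_ks` is a hypothetical helper, GC3 is passed as a fraction between 0 and 1, and an "invalid" Ks is taken here to mean a missing, NaN or negative value, which the source does not spell out.

```python
import math
from typing import Optional


def usable_ks(ks: Optional[float], gc3_a: float, gc3_b: float,
              max_ks: float = 3.0, max_gc3: float = 0.75) -> bool:
    """Return True if a duplicate pair's Ks passes the filters above.

    gc3_a and gc3_b are the GC3 fractions (0-1) of the two genes.
    """
    if ks is None or math.isnan(ks) or ks < 0:
        return False                  # invalid estimate (assumed meaning)
    if ks > max_ks:
        return False                  # saturated synonymous sites
    if (gc3_a + gc3_b) / 2 > max_gc3:
        return False                  # pair with average GC3 above 75%
    return True
```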
Gene expression data
Processed rice expression data over 508 tissues and physiological
conditions, generated by the Affymetrix GeneChip Rice
Genome Array, were obtained from previous studies (Ficklin
et al., 2010; Wang et al., 2011). In the data, the numbers of columns that sampled embryo, endosperm, leaf, root and shoot
were 3, 4, 50, 99 and 84, respectively. For some genes, there
are multiple probe sets on the array to measure their expression.
Inclusion or exclusion of ‘suboptimal’ probe sets with suffix
‘_s_at’ or ‘_x_at’, which were suspected of potential cross-hybridization, has been shown previously to have only trivial
effects (Wang et al., 2011). In this study, all types of probe sets
were considered and, for a gene with multiple probe sets, the
first probe set according to alphabetic sorting was used to represent its expression profile.
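The probe set selection rule can be sketched as below. The function name and the `(gene_id, probe_set_id)` input are assumptions, and "alphabetic sorting" is interpreted as Python's default string ordering, which may differ in detail from the ordering the authors used.

```python
from typing import Dict, Iterable, Tuple


def representative_probe_sets(
        pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each gene to its alphabetically first probe set ID.

    pairs: (gene_id, probe_set_id) tuples from an array annotation.
    """
    chosen: Dict[str, str] = {}
    for gene, probe in pairs:
        # Keep the smallest probe set ID seen so far for this gene.
        if gene not in chosen or probe < chosen[gene]:
            chosen[gene] = probe
    return chosen
```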
Correlation analysis and smoothing spline regression
In this study, correlations were measured by Spearman’s correlation coefficients. Smoothing spline regression was performed via
the ‘smooth.spline’ function in R. To avoid overfitting
in smoothing spline regression, three settings of the degrees of freedom (2, 4 and 6) were tested.
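For readers unfamiliar with Spearman's correlation coefficient, the sketch below computes it as the Pearson correlation of ranks, with tied values sharing their average rank. It is a plain-Python illustration with hypothetical function names, not the code used in the study, and it assumes neither input is constant. The smoothing spline fit is not reproduced here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```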
Results
Gene origins in rice
Like many other eukaryotic species, the rice genome has been
shaped and dynamically reconstructed by multiple evolutionary
forces and events, which render its genes to have different origins
(International Rice Genome Sequencing Project, 2005). TE-related genes are classified on the basis of sharing significant
sequence similarity with TEs (Jiao & Deng, 2007). Among non-TE-related genes, those present in only single copies were deemed
to be singletons, whereas others were deemed to be duplicated.
Duplicated genes were further classified in terms of duplication
modes, with those at collinear positions of intraspecies syntenic
blocks deemed to be WGD duplicates (Tang et al., 2010). All
other duplicates were assumed to have occurred by single-gene
duplications, further classified into tandem, proximal and dispersed, as described above. The mechanisms underlying dispersed
duplications are very complicated (Wang et al., 2012). However,
if one member of a pair of dispersed duplications was at its ancestral locus and the other was at a nonancestral locus, such gene
duplications were deemed to be transposed (Wang et al., 2011,
2012). Summary statistics on rice gene origins are shown in
Table 1, and the classification of duplicated genes is shown in
Supporting Information Table S1.
Table 1 Statistics on rice (Oryza sativa) genes of different origins and duplication modes

Gene origin          Number of gene pairs    Number of distinct genes
Non-TE-related       N/A                     41 046
  Singletons         N/A                     12 618
  Duplicates         N/A                     28 428
    WGD              3087                    5061
    Tandem           2008                    3529
    Proximal         2484                    3728
    Transposed       6269                    6269
    Dispersed        N/A                     12 957
TE-related           N/A                     15 232

N/A, not applicable; TE, transposable element.
Body methylation levels show different distributions
associated with gene origins and duplication modes
To investigate the patterns of gene body methylation in view of
different gene origins and duplication modes, we computed the
body methylation level for each gene, defined as the proportion
of methylated CpG dinucleotides relative to all CpG dinucleotides within its coding region, in embryo, endosperm, leaf, root
and shoot. To test the consistency of body methylation levels
across tissues, we visualized the body methylation levels of all
genes between all pairs of tissues via scatter plots (Fig. S1).
Although endosperm tissue shows higher variations than other
tissues, body methylation levels are much more likely to be consistent (rather than different) across tissues, that is, points (genes)
are densely distributed along the ‘y = x’ diagonal line in the scatter
plots. This analysis indicates that it is feasible to study the evolutionary characteristics of body methylation for large groups of
genes, while acknowledging that some genes show tissue-specific body methylation.
A recent study has suggested that gene bodies cluster into two
groups corresponding to high and low levels of DNA methylation, respectively, in honeybee, silkworm, sea squirt and sea
anemone (Sarda et al., 2012). We plotted the distribution of
body methylation levels for all rice genes (Fig. 1a), finding a clear
bimodal distribution peaking at ‘0’ or ‘1’, suggesting that gene
bodies tend to be either highly methylated or little methylated in
rice.
We found that different gene origins differ in the distributions
of body methylation levels. First, we compared the distributions
of body methylation levels between TE-related and non-TE-related genes, and found that the two distributions were
significantly different (P < 2.2 × 10⁻¹⁶, χ² test; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and Methods section) (Fig. 1b).
Specifically, most TE-related genes are highly body-methylated
(body methylation level > 0.9), consistent with previous studies
(Zilberman et al., 2007; Li et al., 2008; Feng et al., 2010; He
et al., 2010; Zemach et al., 2010b), whereas non-TE-related
genes are bimodally distributed, with more genes little body-methylated (body methylation level < 0.1). As noted previously,
TE-related genes exhibit much lower transcriptional activities
than non-TE-related genes (Jiao & Deng, 2007), suggesting that
high levels of body methylation may be associated with reduced
transcription, and conflicting with the hypothesis that body
methylation has only minor, but positive, effects on the levels of
gene expression (Zhang et al., 2006; Zilberman et al., 2007;
Zemach et al., 2010b; Takuno & Gaut, 2012).
We compared the distributions of body methylation levels
between different origins within non-TE-related genes. Singletons show a higher frequency of high body methylation than do
duplicates (Fig. 1c; P < 2.2 × 10⁻¹⁶, χ² test; P < 10⁻⁴, Outlier-Sum
statistic; see the Materials and Methods section). Tandem, proximal and transposed duplicates show an obvious frequency peak
of high body methylation (Fig. 1d), whereas WGD duplicates do
not (P < 2.2 × 10⁻¹⁶, χ² test; P < 10⁻⁴, Outlier-Sum statistic; see the
Materials and Methods section). Moreover, the likelihood of a
duplicated gene being highly body-methylated follows the
tendency: transposed > proximal > tandem > WGD (P < 2.2 ×
10⁻¹⁶, χ² test; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and
Methods section). In partial summary, body methylation levels
show different distributions associated with gene origins and
duplication modes, suggesting that genes of different origins tend
to have distinct epigenetic features.
Divergence in body methylation levels between duplicated
genes
Genes duplicated by different modes differ in the extent of
expression divergence and the rewiring of protein–protein networks (De Smet & Van de Peer, 2012; Wang et al., 2012). Here,
we examined whether duplicated genes of different modes also
differ significantly in divergence in body methylation levels.
Divergence in body methylation levels among gene pairs duplicated by different modes (Fig. 2a) showed the following trend:
random gene pairs > transposed duplicates > proximal duplicates > tandem duplicates WGD duplicates (both an ANOVA
model involving all duplication modes and Tukey’s honestly significant difference (HSD) test between adjacent duplication
modes were significant at α = 0.05), indicating that different
modes of gene duplication tend to result in different extents of
divergence in body methylation levels. The physical distance
between single-gene duplicates (in terms of number of genes
apart) also followed a trend: transposed duplicates > proximal
duplicates > tandem duplicates. We hypothesized that there may
be position effects that affect body methylation levels, for example, genes that are closer to each other on chromosomes tend to
have more similar body methylation levels. To this end, we randomly selected 20 000 gene pairs on the same chromosomes and
computed the correlations between divergence in body methylation levels and physical distance. These correlations ranged from
0.053 to 0.061 (P < 4.2 × 10⁻¹⁴), indicating that there exist
weak position effects that affect body methylation levels for all
rice genes. For single-gene duplicates, these correlations ranged
from 0.111 to 0.137 (P < 2.2 × 10⁻¹⁶), indicating that the
position effects increase slightly for single-gene duplicate pairs
relative to random gene pairs. At the same physical distance, single-gene duplicates diverge less in body methylation levels than
do random gene pairs (Fig. 2b), suggesting that body methylation
patterns are either copied or recapitulated following gene duplication.
Fig. 1 Gene body methylation shows different patterns associated with gene origins and duplication modes. Each column represents one tissue. (a) Distribution of body methylation levels for all rice genes. (b) Comparison of distributions of body methylation levels between transposable element (TE)-related and non-TE-related genes. (c) Comparison of distributions of body methylation levels between singleton and duplicate genes. (d) Comparison of distributions of body methylation levels among whole-genome duplication (WGD), tandem, proximal and transposed duplicates.
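The position-effect analysis in this section relies on two ingredients: a measure of divergence in body methylation between two genes, and random gene pairs drawn from the same chromosome. The sketch below illustrates both under explicit assumptions. The source does not define divergence formally, so the absolute difference between the two levels is assumed here; the function names, the ordered gene lists per chromosome and the sampling scheme (a chromosome chosen uniformly, then two distinct genes on it) are hypothetical and may differ from what the authors did.

```python
import random
from typing import Dict, List, Tuple


def methylation_divergence(level_a: float, level_b: float) -> float:
    """Divergence in body methylation, assumed to be |a - b|."""
    return abs(level_a - level_b)


def sample_same_chromosome_pairs(
        genes_by_chrom: Dict[str, List[str]],
        n_pairs: int, seed: int = 0) -> List[Tuple[str, str, int]]:
    """Draw random gene pairs located on the same chromosome.

    genes_by_chrom maps a chromosome to its genes in genomic order.
    Returns (gene_a, gene_b, distance_in_genes) tuples.
    Assumes at least one chromosome carries two or more genes.
    """
    rng = random.Random(seed)
    chroms = [c for c, genes in genes_by_chrom.items() if len(genes) >= 2]
    pairs = []
    for _ in range(n_pairs):
        genes = genes_by_chrom[rng.choice(chroms)]
        i, j = rng.sample(range(len(genes)), 2)   # two distinct positions
        pairs.append((genes[i], genes[j], abs(i - j)))
    return pairs
```

The divergence of each sampled pair can then be correlated with its distance, for example with Spearman's coefficient.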
Relationship between body methylation patterns and Ks for
pairs of duplicated genes
To understand how gene body methylation evolves following
gene duplication, it may be helpful to relate patterns of body
methylation of duplicated genes to the divergence of their coding
sequence. Synonymous (Ks) substitution rates largely reflect the
neutral mutation rates of coding sequences, suggested to increase
approximately linearly with time for relatively low levels of
sequence divergence (Li, 1997). We first related divergence in
body methylation levels between duplicated genes to Ks using
linear regression (Fig. 3a). Positive correlations were found for all
duplication modes (0.113 ≤ r ≤ 0.175, P < 2.2 × 10⁻¹⁶). For
single-gene duplicates, these correlations ranged from 0.112 to
0.185 (P ≤ 1.081 × 10⁻⁹). However, as we have shown that, for
single-gene duplicates, there is a weak correlation between divergence in body methylation levels and physical distance, the
position effects could be a nuisance factor for the correlation
between divergence in body methylation levels and Ks. To
remove the effect of physical distance on these correlations for
single-gene duplicates, we computed the partial correlations
between divergence in body methylation levels and Ks. These
partial correlations ranged from 0.101 to 0.159 (P ≤ 3.794 ×
10⁻⁸), declining by 0.01–0.03 from their corresponding correlations, indicating that physical distance has a very weak effect on
the correlation between divergence in body methylation levels
and Ks. Thus, divergence in body methylation levels between
duplicated genes tends to increase with Ks. Moreover, at similar
Ks levels, WGDs tend to have smaller divergence in body methylation levels between duplicates than do tandem, proximal or
transposed duplications. The different extent of divergence in
body methylation levels between gene duplication modes may be
explained by the hypothesis that WGDs generate duplicated
chromosomal segments in which collinear duplicates are more
likely to have similar chromatin environments, whereas single-gene, especially transposed, duplications relocate to new
chromosomal positions which often have different chromatin
environments.
Fig. 2 Divergence in body methylation levels between duplicated genes. Each column represents one tissue. (a) Comparison of divergence in body methylation levels among different modes of gene duplication. Whiskers correspond to the minimum and maximum values in the data. (b) Linear regressions between divergence in body methylation levels and physical distance for random gene pairs and single-gene duplicate pairs.
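The partial correlations used above to discount physical distance can be illustrated with the standard first-order formula, which removes the linear effect of a third variable z from the correlation between x and y. The source does not state how its partial correlations were computed, so this is an assumed, textbook version; here x would be divergence in body methylation, y would be Ks and z would be physical distance.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom
```

When z is only weakly correlated with x and y, the partial correlation stays close to the raw correlation, which matches the small declines reported above.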
Next, we related the body methylation levels of duplicated
genes to Ks using linear regression (Fig. 3b). The direction of the
correlations differs among different modes of gene duplication:
body methylation of WGD duplicates is positively correlated
with Ks (0.051 ≤ r ≤ 0.084, P < 0.05), whereas body methylation of single-gene duplicates decreases with Ks (−0.212 ≤ r ≤ −0.082, P < 9.4 × 10⁻⁴). Some duplicated genes are highly
methylated, particularly those generated by single-gene duplications. It is well known that single-gene duplicates have a shorter
half-life than WGD-generated duplicates (Lynch & Conery,
2000). Different rates of nonrandom gene loss shortly after
WGD and single-gene duplication may contribute to the contrasting directions of the correlations between body methylation
levels and Ks. In the first few million years following single-gene
duplication, many duplicates become nonfunctionalized and are
lost (Innan & Kondrashov, 2010). Biases among these genes
may mitigate the long-term tendency towards increased body
methylation, as in WGD duplicates, for example if highly body-methylated duplicates are preferentially lost. Thus, there could be
links between body methylation patterns and the probability of
long-term survival of duplicated genes.
Relationship between gene body methylation and gene
expression
The observation that TE-related genes are highly body-methylated, but little expressed, appears to conflict with the observation
that body methylation has a positive effect on the levels of gene
expression (Zhang et al., 2006; Zilberman et al., 2007; Zemach
et al., 2010b; Takuno & Gaut, 2012). However, these two
observations might be reconciled if gene body methylation has
heterogeneous effects on gene expression, that is, gene body
methylation affects gene expression in different ways under different conditions. We plotted the regression lines between gene
expression levels and body methylation levels for all non-TE-related genes based on each tissue, using smooth splines with
different degrees of freedom (Fig. 4); this showed that intermediate body methylation tends to be associated with higher gene
expression levels than both low and high body methylation. To
test this observation statistically, we computed the correlations
between body methylation levels and expression levels for the
genes with body methylation levels of < 0.5 and ≥ 0.5. These
correlations ranged from 0.223 to 0.284 (P < 2.2 × 10⁻¹⁶) when
the body methylation level was < 0.5, and from −0.182 to
−0.101 (P ≤ 1.648 × 10⁻⁹) when the body methylation level
was ≥ 0.5. This result suggests that intermediate body methylation may indeed have positive effects on transcription, possibly
through the enhancement of accurate splicing of primary transcripts, whereas high body methylation is more likely to repress
gene expression, which may lead to pseudofunctionalization or
gene losses.
We related gene expression to variances of body methylation
levels across tissues. Based on Fig. S1, we inferred that TE-related
genes tend to have more uniform body methylation levels
(closer to the ‘y = x’ diagonal line) than do non-TE-related
genes, which was then supported statistically by a two-sample t-test
on the variances of body methylation levels of TE-related and
non-TE-related genes (P < 2.2 × 10⁻¹⁶). This observation indicates that the ‘repressive’ TE-related body methylation tends to
be uniform across tissues. For non-TE-related genes, we found
that there is a significant positive correlation (r = 0.173,
P < 2.2 × 10⁻¹⁶) between the average expression levels and variances of body methylation levels, indicating that non-TE-related
genes with high expression tend to vary in body methylation
across tissues.
Fig. 3 Relationships between patterns of body methylation and Ks for duplicated genes. Each column represents one tissue. (a) Linear regressions between divergence in body methylation levels and Ks for different modes of gene duplication. (b) Linear regressions between body methylation levels and Ks for different modes of gene duplication.
Discussion
We have related gene body methylation to gene origins and
duplication modes in rice. Our results suggest that genes of different origins and duplication modes are associated with different
patterns of gene body methylation, and highly body-methylated
genes are preferentially lost following gene duplication. Although
it is known that natural variations in DNA methylation exist
among individuals of a species (Becker et al., 2011; Bell et al.,
2011; Fraser et al., 2012) and that, within an individual, many
cytosines may be differentially methylated among different tissues
(Zemach et al., 2010a; Zhang et al., 2011; Vining et al., 2012) or
developmental stages (Alisch et al., 2012), or between normal
and stress conditions (Chinnusamy & Zhu, 2009), our analyses
of body methylation patterns based on five different tissues reveal
highly consistent evolutionary trends. We summarized a body
methylation level for each gene that may involve hundreds of
CpG dinucleotides. Further, we compared body methylation
levels among large groups of genes with each group consisting of
several thousand genes. Thus, our computational procedure,
through mitigation of the effect of dynamic changes of
methylation status that may occur at some cytosine nucleotides,
is reliable for large-scale evolutionary analyses.
Fig. 4 Gene body methylation has heterogeneous effects on gene expression. Smooth spline curves are fitted between gene expression levels and body methylation levels for all non-transposable element (TE)-related genes, based on different degrees of freedom. A body methylation level of 0.5 appears to be a point dividing the up- and down-regulation of gene expression levels.
DNA methylation is an important epigenetic mark and can
affect the nucleotide composition of DNA sequences. DNA
methylation can trigger the spontaneous deamination of methylcytosine to thymine (Bird, 1980; Jones et al., 1987; Pfeifer,
2006), which makes DNA methylation levels and GC levels
interdependent. The data of this study showed strong negative
correlations (−0.514 ≤ r ≤ −0.458, P < 2.2 × 10⁻¹⁶) between
body methylation levels and the GC content at the third codon
position (GC3) for rice genes. The evolution of DNA methylation patterns and DNA sequences can be intermingled, and the
study of DNA methylation evolution may facilitate the understanding of mechanisms for DNA sequence evolution.
In eukaryotic genomes, there are multiple epigenetic marks,
including DNA methylation, histone modifications, nucleosome
positioning and others, all of which may contribute to the regulation of gene expression (Henderson & Jacobsen, 2007). Among
these epigenetic marks, DNA methylation has been studied
extensively for its role in the regulation of gene expression. In
rice, Li et al. (2008) showed an interplay between DNA methylation, histone methylation and gene expression, and that gene
expression appeared to be repressed by DNA methylation, but to
be rescued by the concurrence of DNA and H3K4 methylation.
He et al. (2010) found a weak negative correlation between DNA
methylation and transcript levels, and that TE-related genes are
highly methylated and little transcribed. In Populus trichocarpa,
gene body methylation is suggested to have a more repressive
effect than promoter methylation on transcription (Vining et al.,
2012). By contrast, in Arabidopsis, many studies have suggested
that gene body methylation is associated with active transcription
(Zhang et al., 2006; Zilberman et al., 2007; Takuno & Gaut,
2012). The conflicting conclusions on the direction of the relationship between body methylation and gene expression in previous studies may be because an overall correlation pattern has
often been sought, overlooking the possibility that body methylation may have heterogeneous effects on gene expression.
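The point about overall correlations can be illustrated with a toy calculation. The values below are synthetic and purely illustrative, not data from this study: expression is modelled as an inverted-U function of body methylation with its turning point at 0.5 (the dividing point suggested by Fig. 4), and a plain Pearson correlation is used in place of the smooth splines fitted in the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def toy_correlations():
    """Correlations for a synthetic inverted-U relationship.

    Returns (overall r, r for levels <= 0.5, r for levels >= 0.5).
    """
    levels = [i / 100 for i in range(101)]               # methylation 0..1
    expression = [1 - (2 * m - 1) ** 2 for m in levels]  # peak at m = 0.5
    overall = pearson_r(levels, expression)
    low_half = pearson_r(levels[:51], expression[:51])
    high_half = pearson_r(levels[50:], expression[50:])
    return overall, low_half, high_half
```

In this toy case the overall correlation is essentially zero, while the two halves show strong correlations of opposite sign. A single genome-wide coefficient can therefore be weak, or take either sign depending on how genes are distributed along the methylation axis, even when the underlying relationship is strong.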
In conclusion, in rice, using the proportion of methylated
CpG dinucleotides within coding regions to measure the level of
gene body methylation, we found that body methylation levels
follow a bimodal distribution with peaks at 0 and 1, and display
distinct patterns associated with different gene origins and duplication modes. For pairs of duplicated genes, divergence in body
methylation levels increases with physical distance and Ks, and
WGDs show lower divergence than single-gene duplications at
similar Ks levels. Body methylation of WGD duplicates tends to
increase with Ks, whereas the body methylation levels of
single-gene duplicates decrease with Ks, indicating that highly
body-methylated genes are preferentially lost following gene
duplication. Moderate body methylation tends to enhance gene
expression, whereas light or heavy body methylation tends to
repress gene expression. This study suggests that genes of
different origins and duplication modes have distinct body methylation patterns, and body methylation evolves with DNA
sequence evolution, has heterogeneous effects on gene expression
and might be related to survivorship of duplicated genes.
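The body methylation level used throughout, the proportion of methylated CpG dinucleotides within a gene's coding regions, can be sketched as follows. This is a minimal illustration under stated assumptions: each CpG is taken to already carry a boolean methylated/unmethylated call (how that call is made is not covered here), and the function names and the ten-bin histogram are hypothetical.

```python
def body_methylation_level(cpg_calls):
    """Proportion of methylated CpGs among all CpGs in a coding region.

    cpg_calls: iterable of booleans, True meaning 'methylated'.
    Returns None when the gene has no CpG calls at all.
    """
    calls = list(cpg_calls)
    if not calls:
        return None
    return sum(calls) / len(calls)


def level_histogram(levels, bins=10):
    """Count levels in equal-width bins over [0, 1].

    A level of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * bins
    for level in levels:
        index = min(int(level * bins), bins - 1)
        counts[index] += 1
    return counts
```

For example, `body_methylation_level([True, False, True, True])` returns 0.75. A bimodal distribution of the kind described above would appear in `level_histogram` as large counts in the first and last bins and small counts in between.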
Acknowledgements
We thank Barry Marler for IT support, Xinyu Liu for statistical
consulting and Haibao Tang for providing python scripts.
A.H.P. appreciates funding from the National Science Foundation (NSF: DBI 0849896, MCB 0821096, MCB 1021718).
This study was supported in part by resources and technical
expertise from the Georgia Advanced Computing Resource
Center, a partnership between the Office of the Vice President
for Research and the Office of the Chief Information Officer.
References
Alisch RS, Barwick BG, Chopra P, Myrick LK, Satten GA, Conneely KN,
Warren ST. 2012. Age-associated DNA methylation in pediatric populations.
Genome Research 22: 623–632.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local
alignment search tool. Journal of Molecular Biology 215: 403–410.
Arabidopsis Interactome Mapping Consortium. 2011. Evidence for network
evolution in an Arabidopsis interactome map. Science 333: 601–607.
Becker C, Hagmann J, Muller J, Koenig D, Stegle O, Borgwardt K, Weigel D.
2011. Spontaneous epigenetic variation in the Arabidopsis thaliana methylome.
Nature 480: 245–249.
Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y,
Pritchard JK. 2011. DNA methylation patterns associate with genetic and gene
expression variation in HapMap cell lines. Genome Biology 12: R10.
Bird AP. 1980. DNA methylation and the frequency of CpG in animal DNA.
Nucleic Acids Research 8: 1499–1504.
Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y. 2006. Nonrandom
divergence of gene expression following gene and genome duplications in the
flowering plant Arabidopsis thaliana. Genome Biology 7: R13.
Chinnusamy V, Zhu JK. 2009. Epigenetic regulation of stress responses in plants.
Current Opinion in Plant Biology 12: 133–139.
Cusack BP, Wolfe KH. 2007. Not born equal: increased rate asymmetry in
relocated and retrotransposed rodent gene duplicates. Molecular Biology and
Evolution 24: 679–686.
De Smet R, Van de Peer Y. 2012. Redundancy and rewiring of genetic networks
following genome-wide duplication events. Current Opinion in Plant Biology
15: 168–176.
Feng S, Cokus SJ, Zhang X, Chen PY, Bostick M, Goll MG, Hetzel J, Jain J,
Strauss SH, Halpern ME et al. 2010. Conservation and divergence of
methylation patterning in plants and animals. Proceedings of the National
Academy of Sciences, USA 107: 8689–8694.
Ficklin SP, Luo F, Feltus FA. 2010. The association of multiple interacting genes
with specific phenotypes in rice using gene coexpression networks. Plant
Physiology 154: 13–24.
Flagel LE, Wendel JF. 2009. Gene duplication and evolutionary novelty in
plants. New Phytologist 183: 557–564.
Fraser HB, Lam LL, Neumann SM, Kobor MS. 2012. Population-specificity of
human DNA methylation. Genome Biology 13: R8.
Freeling M. 2009. Bias in plant gene content following different sorts of
duplication: tandem, whole-genome, segmental, or by transposition. Annual
Review of Plant Biology 60: 433–453.
Freeling M, Lyons E, Pedersen B, Alam M, Ming R, Lisch D. 2008. Many or
most genes in Arabidopsis transposed after the origin of the order Brassicales.
Genome Research 18: 1924–1937.
Freeling M, Thomas BC. 2006. Gene-balanced duplications, like tetraploidy,
provide predictable drive to increase morphological complexity. Genome
Research 16: 805–814.
Ganko EW, Meyers BC, Vision TJ. 2007. Divergence in expression between
duplicated genes in Arabidopsis. Molecular Biology and Evolution 24: 2298–
2309.
Ha M, Kim ED, Chen ZJ. 2009. Duplicate genes increase expression diversity in
closely related species and allopolyploids. Proceedings of the National Academy of
Sciences, USA 106: 2295–2300.
He G, Zhu X, Elling AA, Chen L, Wang X, Guo L, Liang M, He H, Zhang H,
Chen F et al. 2010. Global epigenetic and transcriptional trends among two
rice subspecies and their reciprocal hybrids. Plant Cell 22: 17–33.
Henderson IR, Jacobsen SE. 2007. Epigenetic inheritance in plants. Nature 447:
418–424.
Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying
and distinguishing between models. Nature Reviews Genetics 11: 97–108.
International Rice Genome Sequencing Project. 2005. The map-based sequence
of the rice genome. Nature 436: 793–800.
Jiao Y, Deng XW. 2007. A genome-wide transcriptional activity survey of rice
transposable element-related genes. Genome Biology 8: R28.
Jones M, Wagner R, Radman M. 1987. Mismatch repair of deaminated 5-methyl-cytosine. Journal of Molecular Biology 194: 155–159.
Kolasinska-Zwierz P, Down T, Latorre I, Liu T, Liu XS, Ahringer J. 2009.
Differential chromatin marking of introns and expressed exons by H3K36me3.
Nature Genetics 41: 376–381.
Lee TF, Zhai J, Meyers BC. 2010. Conservation and divergence in eukaryotic
DNA methylation. Proceedings of the National Academy of Sciences, USA 107:
9027–9028.
Li WH. 1997. Molecular evolution. Sunderland, MA, USA: Sinauer Associates.
Li X, Wang X, He K, Ma Y, Su N, He H, Stolc V, Tongprasit W, Jin W, Jiang J
et al. 2008. High-resolution mapping of epigenetic modifications of the rice
genome uncovers interplay between DNA methylation, histone methylation,
and gene expression. Plant Cell 20: 259–276.
Li Z, Zhang H, Ge S, Gu X, Gao G, Luo J. 2009. Expression pattern divergence
of duplicated genes in rice. BMC Bioinformatics 10(Suppl 6): S8.
Lorincz MC, Dickerson DR, Schmitt M, Groudine M. 2004. Intragenic DNA
methylation alters chromatin structure and elongation efficiency in mammalian
cells. Nature Structural & Molecular Biology 11: 1068–1075.
Luco RF, Pan Q, Tominaga K, Blencowe BJ, Pereira-Smith OM, Misteli T.
2010. Regulation of alternative splicing by histone modifications. Science 327:
996–1000.
Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate
genes. Science 290: 1151–1155.
Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de
Peer Y. 2005. Modeling gene and genome duplications in eukaryotes.
Proceedings of the National Academy of Sciences, USA 102: 5454–5459.
Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D’Souza C, Fouse SD,
Johnson BE, Hong C, Nielsen C, Zhao Y et al. 2010. Conserved role of
intragenic DNA methylation in regulating alternative promoters. Nature 466:
253–257.
Nei M, Gojobori T. 1986. Simple methods for estimating the numbers of
synonymous and nonsynonymous nucleotide substitutions. Molecular Biology
and Evolution 3: 418–426.
Ohno S. 1970. Evolution by gene duplication. New York, NY, USA: Springer.
Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization
predating divergence of the cereals, and its consequences for comparative
genomics. Proceedings of the National Academy of Sciences, USA 101: 9903–
9908.
Pfeifer GP. 2006. Mutagenesis at methylated CpG sequences. DNA Methylation:
Basic Mechanisms 301: 259–281.
Sarda S, Zeng J, Hunt BG, Yi SV. 2012. The evolution of invertebrate gene body
methylation. Molecular Biology and Evolution 29: 1907–1916.
Schwartz S, Meshorer E, Ast G. 2009. Chromatin organization marks exon–
intron structure. Nature Structural & Molecular Biology 16: 990–995.
Su Z, Han L, Zhao Z. 2011. Conservation and divergence of DNA methylation
in eukaryotes: new insights from single base-resolution DNA methylomes.
Epigenetics 6: 134–140.
Takuno S, Gaut BS. 2012. Body-methylated genes in Arabidopsis thaliana are
functionally important and evolve slowly. Molecular Biology and Evolution 29:
219–227.
Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. 2008a. Synteny
and collinearity in plant genomes. Science 320: 486–488.
Tang H, Bowers JE, Wang X, Paterson AH. 2010. Angiosperm genome
comparisons reveal early polyploidy in the monocot lineage. Proceedings of the
National Academy of Sciences, USA 107: 472–477.
Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. 2008b.
Unraveling ancient hexaploidy through multiply-aligned angiosperm gene
maps. Genome Research 18: 1944–1954.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. Nucleic
Acids Research 22: 4673–4680.
Tibshirani R, Hastie T. 2007. Outlier sums for differential gene expression
analysis. Biostatistics 8: 2–8.
Vining KJ, Pomraning KR, Wilhelm LJ, Priest HD, Pellegrini M, Mockler TC,
Freitag M, Strauss SH. 2012. Dynamic DNA cytosine methylation in the
Populus trichocarpa genome: tissue-level variation and relationship to gene
expression. BMC Genomics 13: 27.
Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH. 2007. Extensive
concerted evolution of rice paralogs and the road to regaining independence.
Genetics 177: 1753–1763.
Wang Y, Wang X, Paterson AH. 2012. Genome and gene duplications and gene
expression divergence: a view from plants. Annals of the New York Academy of
Sciences 1256: 1–14.
Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH. 2011.
Modes of gene duplication contribute differently to genetic novelty and
redundancy, but show parallels across divergent angiosperms. PLoS ONE 6:
e28150.
Woodhouse MR, Pedersen B, Freeling M. 2010. Transposed genes in Arabidopsis
are often associated with flanking repeats. PLoS Genetics 6: e1000949.
Woodhouse MR, Tang H, Freeling M. 2011. Different gene families in
Arabidopsis thaliana transposed in different epochs and at different frequencies
throughout the rosids. Plant Cell 23: 4241–4253.
Yang Z, Nielsen R. 2000. Estimating synonymous and nonsynonymous
substitution rates under realistic evolutionary models. Molecular Biology and
Evolution 17: 32–43.
Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B,
Sultana R, Cheung F et al. 2005. The institute for genomic research Osa1 rice
genome annotation database. Plant Physiology 138: 18–26.
Zemach A, Kim MY, Silva P, Rodrigues JA, Dotson B, Brooks MD, Zilberman
D. 2010a. Local DNA hypomethylation activates genes in rice endosperm.
Proceedings of the National Academy of Sciences, USA 107: 18729–18734.
Zemach A, McDaniel IE, Silva P, Zilberman D. 2010b. Genome-wide
evolutionary analysis of eukaryotic DNA methylation. Science 328: 916–919.
Zhang M, Xu C, von Wettstein D, Liu B. 2011. Tissue-specific differences in
cytosine methylation and their association with differential gene expression in
sorghum. Plant Physiology 156: 1955–1966.
Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan SW, Chen H, Henderson IR,
Shinn P, Pellegrini M, Jacobsen SE et al. 2006. Genome-wide high-resolution
mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126:
1189–1201.
Zhao XP, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, Wendel JF,
Paterson AH. 1998. Dispersed repetitive DNA has spread to new genomes
since polyploid formation in cotton. Genome Research 8: 479–492.
Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S. 2007. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an
interdependence between methylation and transcription. Nature Genetics 39:
61–69.
Supporting Information
Additional supporting information may be found in the online
version of this article.
Fig. S1 Comparison of body methylation levels of all genes
between all pairs of tissues.
Table S1 Classification of rice duplicated genes
Please note: Wiley-Blackwell are not responsible for the content
or functionality of any supporting information supplied by the
authors. Any queries (other than missing material) should be
directed to the New Phytologist Central Office.