Genomic Context and Molecular Evolution

Brian Charlesworth, University of Edinburgh
1. The problem: There is increasing evidence that genes located in regions of the
genome where the frequency of genetic recombination is unusually low have reduced
levels of intra-specific DNA sequence variation. This has been especially well documented
in Drosophila, but similar effects have been observed in plants and mammals
(Charlesworth & Wright 2001; Nachman 2001).
Alternative triplets coding for the same amino-acid may be subject to weak but
statistically detectable selection, as shown by measures of codon usage bias in bacteria, insects,
nematodes and plants (Duret & Mouchiroud 1999). Data on codon usage suggest reduced
levels of adaptation in regions of low recombination in Drosophila, i.e. codon usage bias
tends to be reduced in such regions (Marais & Piganeau 2002). Reduced selective
constraints on amino-acid sequences are detected in some non-recombining organelle
genomes (Lynch & Blanchard 1998). The most extreme example is provided by the Y
chromosome in species with chromosomal sex determination, where a large non-recombining genomic region has lost most of its genetic functions (Charlesworth &
Charlesworth 2000). A relatively young (about 1 million years old) neo-Y chromosome in D.
miranda (formed by the fusion of an autosome to the true Y chromosome) shows signs of
incipient loss of gene function (Bachtrog 2005), as well as severely reduced variability
(Bachtrog & Charlesworth 2002).
In addition to reductions in nucleotide variation and levels of adaptation,
transposable elements and other types of repetitive sequences accumulate in regions of low
recombination (Charlesworth et al. 1994). Again, Y and neo-Y chromosomes provide an
extreme example of such accumulation (Bachtrog 2003).
2. The population genetics background: In order to understand the causes of these
phenomena, it is first necessary to review some basic concepts of population genetics (e.g.
Graur & Li 2000, Chap. 2).
First, note that DNA sequence variants in natural populations at silent nucleotide
sites (i.e., sites where mutations do not alter the protein sequence) are likely to be close to
selective neutrality. The fate of silent variants is thus largely controlled by genetic drift;
the speed of drift is inversely proportional to the effective population size (Ne), which is
usually much smaller than the number of breeding individuals (N) in the population (Graur
& Li 2000, Chap. 2). At equilibrium, if mutations are sufficiently rare that each site is
segregating for at most one variant (the infinite sites model), the mean diversity (π) is
4Neu, where u is the mutation rate per nucleotide site per generation.
Estimates of π for silent sites are typically around 0.05 for E. coli, 0.02 for D.
melanogaster, and 0.001 for humans, yielding estimates of Ne of about 1 x 10^8, 3 x 10^6 and
10^4, respectively, based on estimates of the relevant mutation rates per nucleotide site.
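As a rough numerical check on these figures, the relation π = 4Neu can be rearranged to give either Ne from π and u, or the mutation rate implied by a stated π and Ne. The mutation rates themselves are not quoted here, so the sketch below (Python; the function names are illustrative only) back-calculates them from the diversity and Ne values given above, applying the same formula to all three species as the text does.

```python
def effective_size(pi, u):
    """Ne implied by silent-site diversity pi and mutation rate u (pi = 4*Ne*u)."""
    return pi / (4.0 * u)


def implied_mutation_rate(pi, ne):
    """Mutation rate per site per generation implied by pi and Ne (pi = 4*Ne*u)."""
    return pi / (4.0 * ne)


# Approximate diversity and Ne values quoted in the text.
estimates = {
    "E. coli": (0.05, 1e8),
    "D. melanogaster": (0.02, 3e6),
    "humans": (0.001, 1e4),
}

for species, (pi, ne) in estimates.items():
    u = implied_mutation_rate(pi, ne)
    print(f"{species}: implied u = {u:.2e} per site per generation")
```

Because π depends only on the product Neu, an error in the assumed mutation rate translates directly into a proportional error in the estimate of Ne.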
The efficiency of selection on a new variant is strongly affected by the magnitude
of Ne as well as by its selective advantage or disadvantage. If the fitness of carriers of a
new variant is 1 + s, relative to a value of 1 for the prevalent allele, then the variant will
behave as effectively neutral if |Nes| << 1, but will largely be under the control of selection
if |Nes| > 1 (Graur & Li 2000, Chap. 2).
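The role of the product Nes can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation, a standard result that is not derived in this handout. The sketch below assumes a diploid population in which heterozygous carriers have fitness 1 + s and the mutation starts as a single copy (frequency 1/(2N)); the function name and the parameter values are illustrative only.

```python
import math


def fixation_probability(s, ne, n):
    """Kimura's approximation for a single new copy in a diploid population.

    s  : selective advantage (negative if deleterious) of heterozygous carriers
    ne : effective population size
    n  : number of breeding individuals
    """
    if s == 0.0:
        return 1.0 / (2.0 * n)          # neutral case
    if 4.0 * ne * s < -700.0:
        return 0.0                      # avoids overflow; the true value is negligible
    # (1 - exp(-2*s*ne/n)) / (1 - exp(-4*ne*s)), written with expm1 for accuracy
    return math.expm1(-2.0 * s * ne / n) / math.expm1(-4.0 * ne * s)


ne = n = 1000                           # hypothetical population
neutral = fixation_probability(0.0, ne, n)
for s in (1e-5, -1e-5, 5e-3, -5e-3):
    ratio = fixation_probability(s, ne, n) / neutral
    print(f"Ne*s = {ne * s:+.2f}: fixation probability relative to neutral = {ratio:.3g}")
```

With |Nes| = 0.01 the fixation probability is within a few percent of the neutral value, whereas with |Nes| = 5 it differs from it by orders of magnitude in the deleterious case.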
It is also necessary to examine some general features of the evolutionary
dynamics of mutation and selection at many sites in the genome, since these can
profoundly affect the fate of variants at other sites if recombination is rare or absent (see
Gordo & Charlesworth 2001 for a brief review).
Consider a very large but finite population. For a nucleotide site i, with deleterious
mutation rate ui, the equilibrium frequency of mutant alleles with selection coefficient ti >>
ui is qi = ui/ti (Hartl & Clark 1997, p. 237), where ti is the fitness reduction experienced by
carriers of the mutant allele.
If mutations at different sites affect fitness independently, this generates a Poisson
distribution of numbers of deleterious alleles carried by different individuals in the
population, whose mean is the sum of ui/ti over all sites subject to mutation and selection.
This is equal to the ratio of the net deleterious mutation rate, u = Σi ui, to the harmonic mean
selection coefficient, t (the harmonic mean is the reciprocal of the mean of the reciprocals).
In a finite population with relatively free recombination, and with Neti > 1 for all
sites, this equilibrium is closely approached. The frequency of mutation-free individuals is
then f0 = exp(-u/t); this plays a critical role in several of the processes discussed below. If u/t is
sufficiently large, f0 can be very small, e.g. with u/t = 5, f0 = 0.007.
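The calculation of f0 can be sketched in a few lines of Python. The per-site mutation rates and selection coefficients below are hypothetical and chosen only to make the arithmetic transparent; with equal mutation rates at all sites, the sum of ui/ti equals the total mutation rate divided by the harmonic mean selection coefficient, as stated above.

```python
import math


def mean_mutation_number(u_sites, t_sites):
    """Mean number of deleterious mutations per individual: sum of ui/ti."""
    return sum(u / t for u, t in zip(u_sites, t_sites))


def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


def mutation_free_frequency(mean_number):
    """Zero class of a Poisson distribution with the given mean: f0 = exp(-mean)."""
    return math.exp(-mean_number)


# Hypothetical genome: 10,000 selected sites with equal mutation rates,
# half with t = 0.01 and half with t = 0.04.
u_sites = [1e-5] * 10_000
t_sites = [0.01, 0.04] * 5_000

mean_number = mean_mutation_number(u_sites, t_sites)
u_total = sum(u_sites)
t_harmonic = harmonic_mean(t_sites)

print(f"sum of ui/ti        = {mean_number:.3f}")
print(f"u / harmonic mean t = {u_total / t_harmonic:.3f}")
print(f"f0 for this genome  = {mutation_free_frequency(mean_number):.5f}")
print(f"f0 for u/t = 5      = {mutation_free_frequency(5.0):.4f}")
```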
3. Effects of reduced recombination on variability and adaptation
The next four sections describe ways in which the presence of these deleterious
mutations affects evolution and variation at linked loci (see Gordo & Charlesworth 2001
for a brief review, and Charlesworth & Charlesworth 2000 for a more detailed one).
The general principle is that selection acting at one site in the genome tends to
interfere with the action of selection at other sites, if the sites recombine only infrequently
with each other. This is often called the Hill-Robertson effect. For example, it is easy to
see that, if two favourable mutations arise in the population in different individuals, they
can never be combined into the same gamete in the absence of recombination. If there is
some recombination, however, it is possible for both advantageous mutations to become
associated in the same gamete, creating a fitter genotype. In this way, recombination
reduces the interference between the loci.
Another way of thinking about this is in terms of the effective population size, Ne.
A site that is under selection causes heritable variation in fitness at linked sites, the effect
being stronger the closer the linkage, since closely-linked variants at the two sites tend to
be transmitted together for a longer time than loosely-linked variants. It is well-known that
fitness variation among individuals within a population reduces Ne (genes are
disproportionately transmitted through the fittest individuals, which are only a small part
of the population). Selection at one site thus reduces Ne at a linked site, which reduces
both the equilibrium level of neutral variation and the effectiveness of selection (see
section 2).
All these effects will clearly be stronger in regions of the genome where genetic
recombination is relatively infrequent, provided that these have normal densities of genes,
since the average degree of linkage between genes subject to selection will be greater in
these regions. We therefore expect to see stronger Hill-Robertson effects there,
and several of the patterns described in section 1 are predicted by these effects, as we will discuss
in detail below.
Of course, there may be other consequences of recombination that could
contribute to these patterns. If recombination were mutagenic, it would promote higher
levels of variation, and faster divergence between species at silent sites, in regions with
higher rates of recombination. There is little evidence for the latter effect in Drosophila,
but there may be such an effect in mammals (Hellmann et al. 2003).
Similarly, biased gene conversion in favour of GC over AT basepairs may occur in
heterozygotes for GC/AT (Marais 2003). Since gene conversion is closely associated with
genetic recombination, it might be expected to be more frequent in regions of high
recombination, leading to enrichment for GC basepairs in such regions. This could explain
a good deal of the variation in GC content across the genome, and contribute to differences
in codon usage, since preferred codons tend to end in G or C (Marais 2003).
i. Hitchhiking effects of favourable mutations: The first class of process involves
hitchhiking effects during the spread of selectively favourable mutations (Maynard Smith
& Haigh 1974; for a thorough review, see Barton 2000).
With complete linkage, a selective sweep of a favourable allele will eliminate all
variation at a linked neutral locus; neutral or nearly neutral variability will recover only slowly,
as new variants arise and drift to high frequencies. Predictions of the level of variability in
regions of the genome with different levels of recombination can be made, assuming a
steady rate of occurrence of favourable mutations distributed randomly over the genome
(Stephan 1995).
Selective sweeps in a non-recombining genome can also drag to fixation any
deleterious mutant alleles associated with the favourable mutation, so that successive
adaptive substitutions in a non-recombining genome can lead to fixation of deleterious
mutations at other loci. As a result, selective sweeps due to amino-acid site replacements
in a gene may reduce the level of codon usage bias (Betancourt & Presgraves 2002; Kim
2004).
ii. Background selection: Consider a population in equilibrium under mutation and
selection at many loci. Assume that Neti > 1 at these loci, so that deleterious mutations are
eliminated from the population with near certainty. If there is no recombination, the
lineages descended from all but the currently mutation-free individuals are destined for
ultimate elimination by selection.
A beneficial mutation that arises in a genetic background with one or more
mutations will thus also be eliminated, unless its own effect on fitness is large enough to
enable it to overcome their deleterious effects (as was tacitly assumed in i). Its net chance
of survival is reduced by the presence of these mutations, by a factor of f0. Conversely, the
chance of fixation of a new weakly deleterious mutation by genetic drift is increased, since
the restriction of successful variants to currently mutation-free lineages results in a
reduction in the evolutionarily effective size of the population to f0Ne, enhancing the
effectiveness of genetic drift relative to selection. The fixation of weakly deleterious
mutations, with selection coefficients of the order of 1/(f0Ne), will thus be accelerated, and
the fixation probability of advantageous mutations reduced.
Over long periods of evolutionary time, therefore, the mean fitness of a non-recombining genome or genomic region should be reduced relative to that of a freely
recombining genome. In addition, the level of variability at neutral or nearly neutral sites
will be reduced by a factor of f0. Similar calculations can be done when there is some
recombination, enabling predictions to be made of the effects of selective sweeps and
background selection on variability and adaptation in regions of the genome with different
rates of recombination (Charlesworth 1996).
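For the case of no recombination, the quantities in this argument follow directly from f0. The sketch below applies them to the D. melanogaster values of Ne and π quoted in section 2 together with u/t = 5; the particular u and t are hypothetical (only their ratio matters here), and the function name is illustrative.

```python
import math


def background_selection(ne, pi_neutral, u, t):
    """Effects of background selection on a completely non-recombining region.

    Returns f0, the reduced effective size f0*Ne, the reduced neutral
    diversity f0*pi, and the selection coefficient of order 1/(f0*Ne)
    below which mutations are strongly affected by drift.
    """
    f0 = math.exp(-u / t)
    ne_reduced = f0 * ne
    return {
        "f0": f0,
        "ne_reduced": ne_reduced,
        "pi_reduced": f0 * pi_neutral,
        "s_threshold": 1.0 / ne_reduced,
    }


# Ne and pi from the text; u = 0.5 and t = 0.1 are assumed, giving u/t = 5.
result = background_selection(ne=3e6, pi_neutral=0.02, u=0.5, t=0.1)
for name, value in result.items():
    print(f"{name}: {value:.3g}")
```

On these assumptions the effective size falls from millions to tens of thousands, so that mutations with selection coefficients around 5 x 10^-5, which would be efficiently selected in a freely recombining genome, become subject to substantial drift.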
iii. Muller’s ratchet: Background selection effects arise from mutations that are
sufficiently strongly selected that their frequencies are close to those expected under
mutation-selection balance, even in the absence of recombination, i.e. for which |Nes| >> 1.
But there is increasing evidence that many deleterious mutations have very small effects
on fitness (e.g. Loewe & Charlesworth 2006). There may therefore be a substantial class of
mutations for which the assumption of a stable equilibrium in the absence of
recombination is invalid.
For relatively strongly selected mutations of this kind, there will still be a
predominance of the wild-type allele at most sites subject to selection, even without
recombination. These are, however, liable to the loss by genetic drift of the class of
chromosomes carrying the fewest deleterious mutations, from a population of finite size.
In the absence of recombination and back mutation, this “least-loaded” class of
chromosome cannot be restored, once lost. The next best class then replaces it as the least-loaded class, and is in turn lost, and so on, in a fairly steady process of successive
irreversible losses of the current least-loaded class (Muller’s ratchet). Each such loss is
quickly followed by fixation of one deleterious mutation on the chromosome. This fixation
process can be many orders of magnitude faster than in a freely recombining population.
This can thus lead to cumulative impairment of gene function at sites scattered over a non-recombining genome.
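The irreversibility of the ratchet can be seen in a very simple simulation. The sketch below is a minimal Wright-Fisher model written for illustration, not the model used in the papers cited: it assumes a haploid, non-recombining population of constant size, multiplicative fitness (1 - t)^k for a chromosome carrying k mutations, a Poisson number of new mutations per generation with mean u, and no back mutation. All parameter values are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson random number by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_ratchet(pop_size, u, t, generations, seed=0):
    """Return the mutation count of the least-loaded class in each generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size                      # start with a mutation-free population
    least_loaded = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to fitness (1 - t)^k.
        weights = [(1.0 - t) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Mutation: each offspring gains new deleterious mutations, never loses any.
        loads = [k + poisson(rng, u) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


trajectory = simulate_ratchet(pop_size=200, u=0.5, t=0.05, generations=300, seed=1)
print("least-loaded class every 50 generations:", trajectory[49::50])
```

Because offspring inherit all of their parent's mutations and can only add to them, the mutation count of the least-loaded class can never decrease in this model; each increase corresponds to one or more turns of the ratchet.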
The speed at which the ratchet moves (reviewed by Gordo & Charlesworth 2000)
is clearly critical for its plausibility. Other things being equal, the ratchet should move
faster, the smaller the population size. Recent work, however, suggests that the ratchet can
move rapidly in a very large population if there are many mutations with very small effects
on fitness, so that f0 is small (Gordo & Charlesworth 2000). If the ratchet is operating, it
will cause a strong reduction in the level of neutral variability (Gordo et al. 2002), which
should be experimentally detectable.
iv. Weak selection with Hill-Robertson effects: A distinct but related process is the
“weak selection Hill-Robertson effect” (McVean & Charlesworth 2000).
In general, we have seen that closely linked selected alleles interfere with each
other, inhibiting the spread of favourable alleles and elimination of deleterious ones.
Muller’s ratchet and background selection can be viewed as examples of this principle for
situations when the sites creating the effects have Net > 1, so that the loci subject to
mutation and selection would be near their deterministic equilibria if recombination were
not restricted.
If, however, Net < 1, genetic drift and back-mutation from deleterious alleles to
wild-type become important. Models incorporating the joint effects of selection, forward
and backward mutation and genetic drift are commonly used in the study of biased codon
usage (McVean & Charlesworth 2000). Some classes of mutations causing amino-acid
changes may also fall into this category (Tachida 2000). Many sites will then segregate for
favoured and disfavoured alleles at intermediate frequencies, so that the assumptions of
both the ratchet and background selection models break down.
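A widely used model of this kind for codon usage, not spelled out in this handout, is the Li-Bulmer equilibrium for a site with one favoured and one disfavoured allele. In the sketch below, gamma is the scaled selection coefficient (proportional to Nes; the exact factor depends on ploidy and on how fitness is defined, so it is passed in directly) and kappa is the ratio of the mutation rate from favoured to disfavoured alleles to the rate in the opposite direction. Both names are illustrative.

```python
import math


def favoured_frequency(gamma, kappa=1.0):
    """Expected equilibrium frequency of the favoured allele (Li-Bulmer model).

    gamma : scaled selection coefficient in favour of the favoured allele
    kappa : mutational bias, (rate favoured -> disfavoured) / (rate disfavoured -> favoured)
    """
    return 1.0 / (1.0 + kappa * math.exp(-gamma))


for gamma in (0.0, 0.1, 1.0, 4.0):
    print(f"gamma = {gamma:>3}: favoured allele frequency = {favoured_frequency(gamma):.3f}")
```

Under this model, any process that reduces Ne, and hence gamma, at a site moves the expected frequency of the favoured allele back towards the value set by mutation alone, which is one way of picturing the loss of adaptation at weakly selected sites.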
With tens of thousands of sites subject to mutation and selection, as would be the
case even in bacterial genomes, computer simulations show that the mean level of
adaptation (measured by the frequency of sites carrying favoured alleles among
chromosomes sampled from the population) is strongly reduced when recombination is
absent; genetic diversity is also severely reduced compared with freely recombining
genomes.
Hill-Robertson effects with weak selection thus have considerable power in the
long term to erode the fitness of a non-recombining genome. In addition, with a large
number of selected sites there is a very weak relation between mean level of adaptation
and population size, which may explain the paradox that codon usage bias in bacteria is
similar to that of organisms such as Drosophila, with much smaller population sizes
(McVean & Charlesworth 2000).
4. Accumulation of repetitive DNA in regions of low recombination
The evidence from studies of transposable elements (TEs) in natural populations of
Drosophila suggests that they are generally slightly harmful to the individuals that carry
them, and present only at low frequencies at individual sites in the genome (Charlesworth
et al. 1994).
An exception to this is provided by regions of low recombination. Here, element
abundances are often relatively high (Charlesworth et al. 1994; Bartolomé et al. 2002).
Recent population studies in D. melanogaster show that elements are more often found at
very high frequencies in regions of the genome where recombination is low (Bartolomé &
Maside 2004), and the same is true for the neo-Y chromosome of D. miranda (Bachtrog 2003).
Hill-Robertson effects may undermine the ability of selection to prevent the spread of TEs
in low recombination regions. In addition, a major mechanism for eliminating TEs from
the population is the process of ectopic exchange, whereby recombination between
homologous elements located in different places generates harmful chromosomal
rearrangements, such as deletions or translocations. This process probably explains why
elements do not accumulate at high frequencies in intergenic sequences in Drosophila.
With reduced meiotic recombination, ectopic exchange will probably also be reduced in
frequency, and so insertions into intergenic sequences will be neutral. Most of the
examples of high frequency elements involve intergenic insertions, suggesting that it is the
lack of ectopic exchange that has led to their accumulation.
In addition to transposable elements, non-transposing, highly repetitive tandem
arrays of satellite DNA sequences accumulate in low recombination regions. Possible
population genetic mechanisms involved in this are discussed by Charlesworth et al.
(1994).
5. Testing for the effects of the different processes: It is clearly a difficult task to tell
which process has caused a particular pattern, especially as they are not mutually
exclusive. The data mentioned earlier indicate that variability seems to be greatly reduced
in genomes or genomic regions with low levels of genetic recombination, as predicted by all of
the models. In D. melanogaster, both hitchhiking and background selection models can, in
principle, account for the observed relations between variability and recombination rate
(Stephan 1995; Charlesworth 1996).
The path which seems most promising is to examine the consequences of the
different processes for patterns of within-population variability at neutral or nearly neutral
sites in low-recombination genomes or genomic regions, which can then be compared with
the empirical evidence from DNA polymorphism studies (Andolfatto 2001). Methods for
testing for the effects of selection are described in more detail in the lecture by Gil
McVean. To get an idea of the approach in this context, note that a recent selective sweep
eliminates variability from a non-recombining genome; any variation that is observed must
reflect new mutations which arose after the selective sweep. These will initially be present
at low frequencies, and so the frequency distribution is distorted in favour of low-frequency variants compared with the equilibrium state. Several statistical tests for such
departures have been devised (Andolfatto 2001). Unfortunately, background selection and
Muller’s ratchet, like selective sweeps, produce skewed variant frequency distributions
(although less severely: Gordo et al. 2002), making it hard to distinguish between the
effects of these processes.
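One widely used statistic of this type is Tajima's D, which compares the mean pairwise diversity with the diversity expected from the number of segregating sites; it is given here only as an example, since the handout does not name a particular test. The sketch below assumes that the data have been reduced to the number of copies of the derived (or minor) variant at each segregating site in a sample of n sequences; the function name and input format are illustrative.

```python
import math


def tajimas_d(variant_counts, n):
    """Tajima's D from per-site variant counts in a sample of n sequences.

    variant_counts : one entry per segregating site, each between 1 and n - 1
    n              : number of sequences sampled (at least 4)
    """
    s = len(variant_counts)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) for k in variant_counts) / (n * (n - 1))
    # Constants for the variance of (pi - S/a1), from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 sequences with 5 segregating sites each.
print("all variants are singletons:  D =", round(tajimas_d([1] * 5, 10), 2))
print("all variants at frequency 0.5: D =", round(tajimas_d([5] * 5, 10), 2))
```

An excess of low-frequency variants, as expected after a recent sweep, gives a negative D, whereas an excess of intermediate-frequency variants gives a positive value.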
In Drosophila, it has proved difficult to obtain clear evidence for sweeps causing
reduced variation in regions of low recombination (Andolfatto 2001), suggesting that
background selection may be the most important factor. There is, however, fairly
convincing evidence for a recent sweep in the case of the D. miranda neo-Y chromosome
(Bachtrog 2004).
References
Two useful textbooks
Hartl, D. L. and A. G. Clark, 1997 Principles of Population Genetics (3rd edition). Sinauer Associates,
Sunderland, MA. Chapters 2, 5, 7, and 8 are especially relevant to this lecture.
Graur, D. and W.-H. Li, 2000 Fundamentals of Molecular Evolution (2nd edition). Sinauer Associates, Sunderland,
MA. Chapters 2 and 4 are especially relevant.
Some useful short review papers
Andolfatto, P., 2001 Adaptive hitchhiking effects on genome variability. Curr. Opin. Genet. Dev. 11: 635-641.
Charlesworth, B., P. Sniegowski, and W. Stephan, 1994 The evolutionary dynamics of repetitive DNA in
eukaryotes. Nature 371: 215-220.
Charlesworth, D. and S. I. Wright, 2001 Breeding systems and genome evolution. Curr. Opin. Genet. Dev.
11: 685-690.
Gordo, I. and B. Charlesworth, 2001 Genetic linkage and molecular evolution. Curr. Biol. 11: R684-686.
Nachman, M. W., 2001 Single nucleotide polymorphisms and recombination rate in humans. Trends Genet.
17: 481-484.
Longer review papers
Barton, N. H., 2000 Genetic hitchhiking. Phil. Trans. Roy. Soc. Lond. B. 355: 1553-1562.
Charlesworth, B. and D. Charlesworth, 2000 The degeneration of Y chromosomes. Phil. Trans. Roy. Soc.
Lond. B. 355: 1563-1572.
Lynch, M. and J. L. Blanchard, 1998 Deleterious mutation accumulation in organelle genomes. Genetica
102/103: 29-39.
Marais, G., 2003 Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:
330-338.
Research papers
Bachtrog, D., 2003 Accumulation of Spock and Worf, two novel non-LTR retrotransposons, on the neo-Y
chromosome of Drosophila miranda. Mol. Biol. Evol. 20: 173-181.
Bachtrog, D., 2004 Evidence that positive selection drives Y-chromosome degeneration in Drosophila
miranda. Nat. Genet. 36: 518-522.
Bachtrog, D., 2005 Sex chromosome evolution: Molecular aspects of Y-chromosome degeneration in
Drosophila. Genome Res. 15: 1393-1401.
Bachtrog, D. and B. Charlesworth, 2002 Reduced adaptation of an evolving neo-Y chromosome. Nature
416: 323-326.
Bartolomé, C., and X. Maside, 2004 The lack of recombination drives the fixation of transposable elements
on the fourth chromosome of Drosophila melanogaster. Genet. Res. 83: 91-100.
Bartolomé, C., X. Maside, and B. Charlesworth, 2002 On the abundance and distribution of transposable
elements in the genome of Drosophila melanogaster. Mol. Biol. Evol. 19: 926-937.
Betancourt, A. J. and D. C. Presgraves, 2002 Linkage limits the power of natural selection. Proc. Natl. Acad.
Sci. USA 99: 13616-13620.
Charlesworth, B., 1996 Background selection and patterns of genetic diversity in Drosophila melanogaster.
Genet. Res. 68: 131-150.
Duret, L. and D. Mouchiroud, 1999 Expression pattern and, surprisingly, gene length shape codon usage in
Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96: 4482-4487.
Gordo, I. and B. Charlesworth, 2000 On the speed of Muller’s ratchet. Genetics 156: 2137-2140.
Gordo, I., A. Navarro, and B. Charlesworth, 2002 Muller’s ratchet and the pattern of variation at a neutral
locus. Genetics 161: 835-848.
Hellmann, I., et al., 2003 A neutral explanation for the correlation of diversity with recombination rates in
humans. Am. J. Hum. Genet. 72: 1527-1535.
Kim, Y., 2004 Effect of strong directional selection on weakly selected mutations at linked sites:
implication for synonymous codon usage. Mol. Biol. Evol. 21: 286-294.
Loewe, L. and B. Charlesworth, 2006 Inferring the distribution of mutational effects on fitness in Drosophila.
Biol. Lett. In press (available online).
Marais, G., and G. Piganeau, 2002 Hill-Robertson interference is a minor determinant of variations in codon
bias across Drosophila melanogaster and Caenorhabditis elegans genomes. Mol. Biol. Evol. 19: 1399-1406.
Maynard Smith, J. and J. Haigh, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23-35.
McVean, G. A. T. and B. Charlesworth, 2000 The effects of Hill-Robertson interference between weakly
selected mutations on patterns of molecular evolution and variation. Genetics 155: 929-944.
Stephan, W., 1995 An improved method for estimating the rate of fixation of favorable mutations based on
DNA polymorphism data. Mol. Biol. Evol. 12: 959-962.
Tachida, H., 2000 DNA evolution under weak selection. Gene 261: 3-9.