Genomic Context and Molecular Evolution

Brian Charlesworth, University of Edinburgh

1. The problem:

There is increasing evidence that genes located in regions of the genome where the frequency of genetic recombination is unusually low have reduced levels of intra-specific DNA sequence variation. This has been especially well documented in Drosophila, but similar effects have been observed in plants and mammals (Charlesworth & Wright 2001; Nachman 2001). Measures of codon usage bias indicate that alternative triplets coding for the same amino acid may be subject to weak but statistically detectable selection in bacteria, insects, nematodes and plants (Duret & Mouchiroud 1999). Data on codon usage suggest reduced levels of adaptation in regions of low recombination in Drosophila, i.e. codon usage bias tends to be reduced in such regions (Marais & Piganeau 2002). Reduced selective constraints on amino-acid sequences are detected in some non-recombining organelle genomes (Lynch & Blanchard 1998). The most extreme example is provided by the Y chromosome in species with chromosomal sex determination, where a large non-recombining genomic region has lost most of its genetic functions (Charlesworth & Charlesworth 2000). A relatively young (1-million-year-old) neo-Y chromosome in D. miranda (formed by the attachment of an autosome to the true Y chromosome) shows signs of incipient loss of gene function (Bachtrog 2005), as well as severely reduced variability (Bachtrog & Charlesworth 2002).

In addition to reductions in nucleotide variation and levels of adaptation, transposable elements and other types of repetitive sequences accumulate in regions of low recombination (Charlesworth et al. 1994). Again, Y and neo-Y chromosomes provide an extreme example of such accumulation (Bachtrog 2003).

2. The population genetics background:

In order to understand the causes of these phenomena, it is first necessary to review some basic concepts of population genetics (e.g. Graur & Li 2000, Chap. 2).
First, note that DNA sequence variants in natural populations at silent nucleotide sites (i.e. sites where mutations do not alter the protein sequence) are likely to be close to selective neutrality. The fate of silent variants is thus largely controlled by genetic drift; the speed of drift is inversely proportional to the effective population size ($N_e$), which is usually much smaller than the number of breeding individuals ($N$) in the population (Graur & Li 2000, Chap. 2). At equilibrium, if mutations are sufficiently rare that each site is segregating for at most one variant (the infinite sites model), the mean diversity ($\pi$) is $4N_e u$, where $u$ is the mutation rate per nucleotide site per generation. Estimates of $\pi$ for silent sites are typically around 0.05 for E. coli, 0.02 in D. melanogaster, and 0.001 in humans, yielding estimates of $N_e$ of about $1 \times 10^8$, $3 \times 10^6$ and $10^4$, respectively, based on estimates of the relevant mutation rates per nucleotide site.

The efficiency of selection on a new variant is strongly affected by the magnitude of $N_e$ as well as by its selective advantage or disadvantage. If the fitness of carriers of a new variant is $1 + s$, relative to a value of 1 for the prevalent allele, then the variant will behave as effectively neutral if $|N_e s| \ll 1$, but will largely be under the control of selection if $|N_e s| > 1$ (Graur & Li 2000, Chap. 2).

It is also necessary to examine some general features of the evolutionary dynamics of mutation and selection at many sites in the genome, since these can profoundly affect the fate of variants at other sites if recombination is rare or absent (see Gordo & Charlesworth 2001 for a brief review). Consider a very large but finite population. For a nucleotide site $i$ with deleterious mutation rate $u_i$, the equilibrium frequency of mutant alleles with selection coefficient $t_i \gg 2u_i$ is $q_i = u_i/t_i$ (Hartl & Clark 1997, p. 237), where $t_i$ is the fitness reduction experienced by carriers of the mutant allele.
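The three relations introduced so far (diversity equal to 4Neu, the |Nes| criterion for effective neutrality, and the mutation-selection balance frequency u/t) can be made concrete with a short calculation. The following is a minimal Python sketch, not part of the original lecture: the function names, the cutoff of 0.1 used to stand in for "much less than 1", and the selection coefficients in the usage lines are illustrative assumptions. The diversity and Ne values are the D. melanogaster and human figures quoted above, and the mutation rates printed are merely those implied by them under the diploid formula.

```python
def effective_size(pi, u):
    """Ne implied by pi = 4 * Ne * u (diploid, infinite sites model)."""
    return pi / (4.0 * u)


def implied_mutation_rate(pi, ne):
    """Per-site mutation rate implied by a diversity estimate and an Ne estimate."""
    return pi / (4.0 * ne)


def selection_regime(ne, s, neutral_cutoff=0.1):
    """Rough classification of a variant with fitness 1 + s by the size of |Ne * s|.

    neutral_cutoff is an assumed numerical stand-in for '|Ne s| << 1'.
    """
    product = abs(ne * s)
    if product > 1.0:
        return "selected"
    if product < neutral_cutoff:
        return "effectively neutral"
    return "intermediate"


def equilibrium_frequency(u_i, t_i):
    """Mutation-selection balance frequency q_i = u_i / t_i (requires t_i >> u_i)."""
    return u_i / t_i


if __name__ == "__main__":
    # Diversity and Ne values quoted in the text; the rates are back-calculated.
    for species, pi, ne in [("D. melanogaster", 0.02, 3e6), ("humans", 0.001, 1e4)]:
        print(species, "implied u per site:", implied_mutation_rate(pi, ne))
    # Hypothetical selection coefficients, chosen only to show the three regimes.
    for s in (1e-3, -5e-5, 1e-6):
        print("Ne = 1e4, s =", s, "->", selection_regime(1e4, s))
    # Hypothetical site: u_i = 1e-8, t_i = 0.01.
    print("q_i =", equilibrium_frequency(1e-8, 1e-2))
```

The same selection coefficient can fall in different regimes in species with different Ne, which is why the efficiency of selection depends on Ne as well as on s.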
If mutations at different sites affect fitness independently, mutation-selection balance generates a Poisson distribution of the numbers of deleterious alleles carried by different individuals in the population, whose mean is the sum of $u_i/t_i$ over all sites subject to mutation and selection. This is equal to the ratio of the net deleterious mutation rate, $u = \sum_i u_i$, to the harmonic mean selection coefficient, $t$ (the harmonic mean is the reciprocal of the mean of the reciprocals). In a finite population with relatively free recombination, and with $N_e t_i > 1$ for all sites, this equilibrium is closely approached. The frequency of mutation-free individuals is then $f_0 = e^{-u/t}$; this plays a critical role in several of the processes discussed below. If $u/t$ is sufficiently large, $f_0$ can be very small, e.g. with $u/t = 5$, $f_0 = 0.007$.

3. Effects of reduced recombination on variability and adaptation

The next four sections describe ways in which the presence of these deleterious mutations affects evolution and variation at linked loci (see Gordo & Charlesworth 2001 for a brief review, and Charlesworth & Charlesworth 2000 for a more detailed one). The general principle is that selection acting at one site in the genome tends to interfere with the action of selection at other sites, if the sites recombine only infrequently with each other. This is often called the Hill-Robertson effect. For example, it is easy to see that, if two favourable mutations arise in the population in different individuals, they can never be combined into the same gamete in the absence of recombination. If there is some recombination, however, it is possible for both advantageous mutations to become associated in the same gamete, creating a fitter genotype. In this way, recombination reduces the interference between the loci. Another way of thinking about this is in terms of the effective population size, $N_e$.
A site that is under selection causes heritable variation in fitness at linked sites, the effect being stronger the closer the linkage, since closely linked variants at the two sites tend to be transmitted together for a longer time than loosely linked variants. It is well known that fitness variation among individuals within a population reduces $N_e$ (genes are disproportionately transmitted through the fittest individuals, which are only a small part of the population). Selection at one site thus reduces $N_e$ at a linked site, which reduces both the equilibrium level of neutral variation and the effectiveness of selection (see section 2). All these effects will clearly be stronger in regions of the genome where genetic recombination is relatively infrequent, provided that these have normal densities of genes, since the average degree of linkage between genes subject to selection will be greater in these regions. We therefore expect to see stronger Hill-Robertson effects there, and several of the patterns described in section 1 are predicted by these effects, as we will discuss in detail below.

Of course, there may be other consequences of recombination that could contribute to the patterns described in section 1. If recombination were mutagenic, it would promote higher levels of variation, and faster divergence between species at silent sites, in regions with higher rates of recombination. There is little evidence for the latter effect in Drosophila, but there may be such an effect in mammals (Hellmann et al. 2003). Similarly, biased gene conversion in favour of GC over AT base pairs may occur in heterozygotes for GC/AT (Marais 2003). Since gene conversion is closely associated with genetic recombination, it might be expected to be more frequent in regions of high recombination, leading to enrichment for GC base pairs in such regions.
This could explain a good deal of the variation in GC content across the genome, and contribute to differences in codon usage, since preferred codons tend to end in G or C (Marais 2003).

i. Hitchhiking effects of favourable mutations:

The first class of process involves hitchhiking effects during the spread of selectively favourable mutations (Maynard Smith & Haigh 1974; for a thorough review, see Barton 2000). With complete linkage, a selective sweep of a favourable allele will eliminate all variation at a linked neutral locus; neutral or nearly neutral variability will recover only slowly, as new variants arise and drift to high frequencies. Predictions of the level of variability in regions of the genome with different levels of recombination can be made, assuming a steady rate of occurrence of favourable mutations distributed randomly over the genome (Stephan 1995). Selective sweeps in a non-recombining genome can also drag to fixation any deleterious mutant alleles associated with the favourable mutation, so that successive adaptive substitutions in a non-recombining genome can lead to fixation of deleterious mutations at other loci. As a result, selective sweeps due to amino-acid site replacements in a gene may reduce the level of codon usage bias (Betancourt & Presgraves 2002; Kim 2004).

ii. Background selection:

Consider a population in equilibrium under mutation and selection at many loci. Assume that $N_e t_i > 1$ at these loci, so that deleterious mutations are eliminated from the population with near certainty. If there is no recombination, the lineages descended from all but the currently mutation-free individuals in a non-recombining population are destined for ultimate elimination by selection. A beneficial mutation that arises in a genetic background with one or more deleterious mutations will thus also be eliminated, unless its own effect on fitness is large enough to enable it to overcome their deleterious effects (as was tacitly assumed in i).
The net chance of survival of such a beneficial mutation is reduced by the presence of these deleterious mutations, by a factor of $f_0$. Conversely, the chance of fixation of a new weakly deleterious mutation by genetic drift is increased, since the restriction of successful variants to currently mutation-free lineages results in a reduction in the evolutionarily effective size of the population to $f_0 N_e$, enhancing the effectiveness of genetic drift relative to selection. The fixation of weakly deleterious mutations, with selection coefficients of the order of $1/(f_0 N_e)$, will thus be accelerated, and the fixation probability of advantageous mutations reduced. Over long periods of evolutionary time, therefore, the mean fitness of a non-recombining genome or genomic region should be reduced relative to that of a freely recombining genome. In addition, the level of variability at neutral or nearly neutral sites will be reduced by a factor of $f_0$. Similar calculations can be done when there is some recombination, enabling predictions to be made of the effects of selective sweeps and background selection on variability and adaptation in regions of the genome with different rates of recombination (Charlesworth 1996).

iii. Muller’s ratchet:

Background selection effects arise from mutations that are sufficiently strongly selected that their frequencies are close to those expected under mutation-selection balance, even in the absence of recombination, i.e. for which $|N_e s| \gg 1$. But there is increasing evidence that many deleterious mutations have very small effects on fitness (e.g. Loewe & Charlesworth 2006). There may therefore be a substantial class of mutations for which the assumption of a stable equilibrium in the absence of recombination is invalid. For relatively strongly selected mutations of this kind, there will still be a predominance of the wild-type allele at most sites subject to selection, even without recombination.
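The background selection predictions for a completely non-recombining region follow directly from the frequency of mutation-free individuals, f0 = e^(-u/t). The sketch below, in Python, is an illustration rather than the method of any of the cited papers: the function names are hypothetical, and the values of Ne and of the neutral mutation rate in the usage lines are assumed for the example. It computes f0, the reduced effective size f0 Ne, the expected neutral diversity 4 f0 Ne u, and the order of magnitude of selection coefficient, 1/(f0 Ne), at which weakly deleterious mutations start to behave as nearly neutral.

```python
import math


def mutation_free_fraction(u, t):
    """f0 = exp(-u/t): Poisson probability of carrying no deleterious mutations.

    u is the net deleterious mutation rate for the region, t the harmonic
    mean selection coefficient against a deleterious mutation.
    """
    return math.exp(-u / t)


def background_selection(ne, u, t, mu_neutral):
    """Expected quantities for a region with no recombination.

    ne         : effective size in the absence of background selection
    mu_neutral : neutral mutation rate per site per generation (assumed value)
    """
    f0 = mutation_free_fraction(u, t)
    ne_reduced = f0 * ne
    return {
        "f0": f0,
        "ne_reduced": ne_reduced,
        "diversity": 4.0 * ne_reduced * mu_neutral,
        # Deleterious mutations with s of about this size are nearly neutral.
        "drift_threshold_s": 1.0 / ne_reduced,
    }


if __name__ == "__main__":
    # u/t = 5 as in the text; ne and mu_neutral are illustrative assumptions.
    result = background_selection(ne=1e6, u=0.5, t=0.1, mu_neutral=1e-9)
    for name, value in result.items():
        print(name, value)
```

With u/t = 5, as in the example of section 2, only about 0.7% of individuals are mutation-free, so both the effective size and the neutral diversity of the region are expected to fall to less than 1% of their values under free recombination.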
A finite population carrying such mutations is, however, liable to lose by genetic drift the class of chromosomes carrying the fewest deleterious mutations. In the absence of recombination and back mutation, this “least-loaded” class of chromosome cannot be restored, once lost. The next best class then replaces it as the least-loaded class, and is in turn lost, and so on, in a fairly steady process of successive irreversible losses of the current least-loaded class (Muller’s ratchet). Each such loss is quickly followed by fixation of one deleterious mutation on the chromosome. This fixation process can be many orders of magnitude faster than in a freely recombining population. It can thus lead to cumulative impairment of gene function at sites scattered over a non-recombining genome. The speed at which the ratchet moves (reviewed by Gordo & Charlesworth 2000) is clearly critical for its plausibility. Other things being equal, the ratchet should move faster, the smaller the population size. Recent work, however, suggests that the ratchet can move rapidly in a very large population if there are many mutations with very small effects on fitness, so that $f_0$ is small (Gordo & Charlesworth 2000). If the ratchet is operating, it will cause a strong reduction in the level of neutral variability (Gordo et al. 2002), which should be experimentally detectable.

iv. Weak selection with Hill-Robertson effects:

A distinct but related process is the “weak selection Hill-Robertson effect” (McVean & Charlesworth 2000). In general, we have seen that closely linked selected alleles interfere with each other, inhibiting the spread of favourable alleles and the elimination of deleterious ones. Muller’s ratchet and background selection can be viewed as examples of this principle for situations when the sites creating the effects have $N_e t > 1$, so that the loci subject to mutation and selection would be near their deterministic equilibria if recombination were not restricted.
If, however, $N_e t < 1$, genetic drift and back-mutation from deleterious alleles to wild-type become important. Models incorporating the joint effects of selection, forward and backward mutation and genetic drift are commonly used in the study of biased codon usage (McVean & Charlesworth 2000). Some classes of mutations causing amino-acid changes may also fall into this category (Tachida 2000). Many sites will then segregate for favoured and disfavoured alleles at intermediate frequencies, so that the assumptions of both the ratchet and background selection models break down. With tens of thousands of sites subject to mutation and selection, as would be the case even in bacterial genomes, computer simulations show that the mean level of adaptation (measured by the frequency of sites carrying favoured alleles among chromosomes sampled from the population) is strongly reduced when recombination is absent; genetic diversity is also severely reduced compared with freely recombining genomes. Hill-Robertson effects with weak selection thus have considerable power in the long term to erode the fitness of a non-recombining genome. In addition, with a large number of selected sites there is a very weak relation between mean level of adaptation and population size, which may explain the paradox that codon usage bias in bacteria is similar to that of organisms such as Drosophila, with much smaller population sizes (McVean & Charlesworth 2000).

4. Accumulation of repetitive DNA in regions of low recombination

The evidence from studies of transposable elements (TEs) in natural populations of Drosophila suggests that they are generally slightly harmful to the individuals that carry them, and present only at low frequencies at individual sites in the genome (Charlesworth et al. 1994). An exception to this is provided by regions of low recombination. Here, element abundances are often relatively high (Charlesworth et al. 1994; Bartolomé et al. 2002).
Recent population studies in D. melanogaster show that elements are more often found at very high frequencies in regions of the genome where recombination is low (Bartolomé & Maside 2004), and the same is true for the neo-Y chromosome of D. miranda (Bachtrog 2003). Hill-Robertson effects may undermine the ability of selection to prevent the spread of TEs in low recombination regions. In addition, a major mechanism for eliminating TEs from the population is the process of ectopic exchange, whereby recombination between homologous elements located in different places generates harmful chromosomal rearrangements, such as deletions or translocations. This process probably explains why elements do not accumulate at high frequencies in intergenic sequences in Drosophila. With reduced meiotic recombination, ectopic exchange will probably also be reduced in frequency, and so insertions into intergenic sequences will be neutral. Most of the examples of high frequency elements involve intergenic insertions, suggesting that it is the lack of ectopic exchange that has led to their accumulation.

In addition to transposable elements, non-transposing, highly repetitive tandem arrays of satellite DNA sequences accumulate in low recombination regions. Possible population genetic mechanisms involved in this are discussed by Charlesworth et al. (1994).

5. Testing for the effects of the different processes:

It is clearly a difficult task to tell which process has caused a particular pattern, especially as they are not mutually exclusive. The data mentioned earlier indicate that variability seems to be greatly reduced in genomes or genomic regions with low levels of genetic recombination, as predicted by all of the models. In D. melanogaster, both hitchhiking and background selection models can, in principle, account for the observed relations between variability and recombination rate (Stephan 1995; Charlesworth 1996).
The path which seems most promising is to examine the consequences of the different processes for patterns of within-population variability at neutral or nearly neutral sites in low-recombination genomes or genomic regions, which can then be compared with the empirical evidence from DNA polymorphism studies (Andolfatto 2001). Methods for testing for the effects of selection are described in more detail in the lecture by Gil McVean. To get an idea of the approach in this context, note that a recent selective sweep eliminates variability from a non-recombining genome; any variation that is observed must reflect new mutations which arose after the selective sweep. These will initially be present at low frequencies, and so the frequency distribution is distorted in favour of low-frequency variants compared with the equilibrium state. Several statistical tests for such departures have been devised (Andolfatto 2001). Unfortunately, background selection and Muller’s ratchet, as well as selective sweeps, produce skewed variant frequency distributions (though less severely than sweeps: Gordo et al. 2002), making it hard to distinguish between the effects of these processes. In Drosophila, it has proved difficult to obtain clear evidence for sweeps causing reduced variation in regions of low recombination (Andolfatto 2001), suggesting that background selection may be the most important factor. There is, however, fairly convincing evidence for a recent sweep in the case of the D. miranda neo-Y chromosome (Bachtrog 2004).

References

Two useful textbooks

Hartl, D. L. and A. G. Clark, 1997 Principles of Population Genetics (3rd edition). Sinauer Associates, Sunderland, MA. Chapters 2, 5, 7, and 8 are especially relevant to this lecture.
Graur, D. and W.-H. Li, 2000 Fundamentals of Molecular Evolution (2nd edition). Sinauer Associates, Sunderland, MA. Chapters 2 and 4 are especially relevant.

Some useful short review papers

Andolfatto, P., 2001 Adaptive hitchhiking effects on genome variability. Curr. Opin. Genet. Dev. 11: 635-641.
Charlesworth, B., P. Sniegowski, and W. Stephan, 1994 The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215-220.
Charlesworth, D. and S. I. Wright, 2001 Breeding systems and genome evolution. Curr. Opin. Genet. Dev. 11: 685-690.
Gordo, I. and B. Charlesworth, 2001 Genetic linkage and molecular evolution. Curr. Biol. 11: R684-686.
Nachman, M. W., 2001 Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17: 481-484.

Longer review papers

Barton, N. H., 2000 Genetic hitchhiking. Phil. Trans. Roy. Soc. Lond. B 355: 1553-1562.
Charlesworth, B. and D. Charlesworth, 2000 The degeneration of Y chromosomes. Phil. Trans. Roy. Soc. Lond. B 355: 1563-1572.
Lynch, M. and J. L. Blanchard, 1998 Deleterious mutation accumulation in organelle genomes. Genetica 102/103: 29-39.
Marais, G., 2003 Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19: 330-338.

Research papers

Bachtrog, D., 2003 Accumulation of Spock and Worf, two novel non-LTR retrotransposons, on the neo-Y chromosome of Drosophila miranda. Mol. Biol. Evol. 20: 173-181.
Bachtrog, D., 2004 Evidence that positive selection drives Y-chromosome degeneration in Drosophila miranda. Nat. Genet. 36: 518-522.
Bachtrog, D., 2005 Sex chromosome evolution: molecular aspects of Y-chromosome degeneration in Drosophila. Genome Res. 15: 1393-1401.
Bachtrog, D. and B. Charlesworth, 2002 Reduced adaptation of an evolving neo-Y chromosome. Nature 416: 323-326.
Bartolomé, C. and X. Maside, 2004 The lack of recombination drives the fixation of transposable elements on the fourth chromosome of Drosophila melanogaster. Genet. Res. 83: 91-100.
Bartolomé, C., X. Maside, and B. Charlesworth, 2002 On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol. Biol. Evol. 19: 926-937.
Betancourt, A. J. and D. C. Presgraves, 2002 Linkage limits the power of natural selection. Proc. Natl. Acad. Sci. USA 99: 13616-13620.
Charlesworth, B., 1996 Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68: 131-150.
Duret, L. and D. Mouchiroud, 1999 Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96: 4482-4487.
Gordo, I. and B. Charlesworth, 2000 On the speed of Muller’s ratchet. Genetics 156: 2137-2140.
Gordo, I., A. Navarro, and B. Charlesworth, 2002 Muller’s ratchet and the pattern of variation at a neutral locus. Genetics 161: 835-848.
Hellmann, I., et al., 2003 A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72: 1527-1535.
Kim, Y., 2004 Effect of strong directional selection on weakly selected mutations at linked sites: implication for synonymous codon usage. Mol. Biol. Evol. 21: 286-294.
Loewe, L. and B. Charlesworth, 2006 Inferring the distribution of mutational effects on fitness in Drosophila. Biol. Lett. In press (available online).
Marais, G. and G. Piganeau, 2002 Hill-Robertson interference is a minor determinant of variations in codon bias across Drosophila melanogaster and Caenorhabditis elegans genomes. Mol. Biol. Evol. 19: 1399-1406.
Maynard Smith, J. and J. Haigh, 1974 The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23-35.
McVean, G. A. T. and B. Charlesworth, 2000 The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155: 929-944.
Stephan, W., 1995 An improved method for estimating the rate of fixation of favorable mutations based on DNA polymorphism data. Mol. Biol. Evol. 12: 959-962.
Tachida, H., 2000 DNA evolution under weak selection. Gene 261: 3-9.