Mol Biol Evol. 2009 Jan 6. [Epub ahead of print] The influence of demography and weak selection on the McDonald-Kreitman test: an empirical study in Drosophila. Parsch J, Zhang Z, Baines JF. Department of Biology, University of Munich, 82152 Planegg-Martinsried, Germany. The McDonald-Kreitman (MK) test, which compares the ratio of polymorphism to divergence at nonsynonymous and synonymous sites, is frequently used to detect adaptive evolution in protein-coding sequences. Because the two classes of sites share a common evolutionary history, the MK test is thought to be robust to most demographic factors. However, weak selection on nonsynonymous sites can bias the MK test, especially when a species' effective population size has not been constant. Here we present an empirical analysis of the influence of demography on the MK test by comparing test results for a common set of 136 genes, including a set of sex-biased genes that shows a strong signal of adaptive evolution, in two Drosophila melanogaster populations: an ancestral population from Africa and a derived population from Europe. The latter has undergone a relatively recent bottleneck, which has reduced its effective population size. We find that the MK test has less power to detect positive selection in the European population for two reasons. First, the overall reduced level of standing variation decreases the statistical power of the test. Second, the segregation of slightly deleterious nonsynonymous mutations biases the MK test away from detecting positive selection. The latter effect is stronger for X-linked genes, which have experienced the greatest reduction in effective population size outside of Africa, and also leads to the underestimation of rates of adaptive protein evolution by multi-locus implementations of the MK test. Interestingly, a subset of autosomal female-biased genes shows an increased signal of adaptive evolution in the European population.
This is inconsistent with currently accepted demographic scenarios and may reflect female-specific changes in selective constraint following the colonization of non-African habitats. PMID: 19126864 [PubMed - as supplied by publisher]

Mol Biol Evol. 2009 Feb;26(2):273-83. Epub 2008 Oct 14. An investigation of the statistical power of neutrality tests based on comparative and population genetic data. Zhai W, Nielsen R, Slatkin M. Department of Integrative Biology, University of California, Berkeley, USA. weiweizhai@berkeley.edu In this report, we investigate the statistical power of several tests of selective neutrality based on patterns of genetic diversity within and between species. The goal is to compare tests based solely on population genetic data with tests using comparative data or a combination of comparative and population genetic data. We show that in the presence of repeated selective sweeps on a relatively neutral background, tests based on the d(N)/d(S) ratios in comparative data almost always have more power to detect selection than tests based on population genetic data, even if the overall level of divergence is low. Tests based solely on the distribution of allele frequencies or the site frequency spectrum, such as the Ewens-Watterson test or Tajima's D, have less power in detecting both positive and negative selection because of the transient nature of positive selection and the weak signal left by negative selection. The Hudson-Kreitman-Aguadé test is the most powerful test for detecting positive selection among the population genetic tests investigated, whereas the McDonald-Kreitman test typically has more power to detect negative selection. We discuss our findings in the light of the discordant results obtained in several recently published genomic scans. PMID: 18922762 [PubMed - in process]

Genetics. 2008 Nov;180(3):1767-71. Epub 2008 Sep 14. Controlling type-I error of the McDonald-Kreitman test in genomewide scans for selection on noncoding DNA.
Andolfatto P. Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA. pandolfa@princeton.edu Departures from the assumption of homogenously interdigitated neutral and putatively selected sites in the McDonald-Kreitman test can lead to false rejections of the neutral model in the presence of intermediate levels of recombination. This problem is exacerbated by small sample sizes, nonequilibrium demography, recombination rate variation, and in comparisons involving more recently diverged species. I propose that establishing significance levels by coalescent simulation with recombination can improve the fidelity of the test in genomewide scans for selection on noncoding DNA. PMID: 18791238 [PubMed - in process] PMCID: PMC2581974 [Available on 2009/02/01]

Mol Biol Evol. 2008 Oct;25(10):2199-209. Epub 2008 Jul 30. Synonymous and nonsynonymous polymorphisms versus divergences in bacterial genomes. Hughes AL, Friedman R, Rivailler P, French JO. Department of Biological Sciences, University of South Carolina, USA. Comparison of the ratio of nonsynonymous to synonymous polymorphisms within species with the ratio of nonsynonymous to synonymous substitutions between species has been widely used as a supposed indicator of positive Darwinian selection, with the ratio of these 2 ratios being designated as a neutrality index (NI). Comparison of genome-wide polymorphism within 12 species of bacteria with divergence from an outgroup species showed substantial differences in NI among taxa. A low level of nonsynonymous polymorphism at a locus was the best predictor of NI < 1, rather than a high level of nonsynonymous substitution between species. Moreover, genes with NI < 1 showed a strong tendency toward the occurrence of rare nonsynonymous polymorphisms, as expected under the action of ongoing purifying selection.
Thus, our results are more consistent with the hypothesis that a high relative rate of between-species nonsynonymous substitution reflects mainly the action of purifying selection within species to eliminate slightly deleterious mutations rather than positive selection between species. This conclusion is consistent with previous results highlighting an important role of slightly deleterious variants in bacterial evolution and suggests caution in the use of the McDonald-Kreitman test and related statistics as tests of positive selection.

Mol Biol Evol. 2008 Jun;25(6):1007-15. Epub 2008 Jan 14. The McDonald-Kreitman test and slightly deleterious mutations. Charlesworth J, Eyre-Walker A. Centre for the Study of Evolution, University of Sussex, Brighton, United Kingdom. It is possible to estimate the proportion of substitutions that are due to adaptive evolution using the numbers of silent and nonsilent polymorphisms and substitutions in a McDonald and Kreitman-type analysis. Unfortunately, this estimate of adaptive evolution is biased downward by the segregation of slightly deleterious mutations. It has been suggested that 1 way to cope with the effects of these slightly deleterious mutations is to remove low-frequency polymorphisms from the analysis. We investigate the performance of this method theoretically. We show that although removing low-frequency polymorphisms does indeed reduce the bias in the estimate of adaptive evolution, the estimate is always downwardly biased, often to the extent that one would not be able to detect adaptive evolution, even if it existed. The method is reasonably satisfactory only if the rate of adaptive evolution is high and the distribution of fitness effects for slightly deleterious mutations is very leptokurtic. Our analysis suggests that adaptive evolution could be quite prevalent in humans (>8%) and still not be detectable using current methodologies.
Our analysis also suggests that the level of adaptive evolution has probably been underestimated, possibly substantially, in both bacteria and Drosophila. PMID: 18195052 [PubMed - indexed for MEDLINE]

Molecular Biology and Evolution 2008 25(3):568-579; doi:10.1093/molbev/msm284 Mutation-Selection Models of Codon Substitution and Their Use to Estimate Selective Strengths on Codon Usage. Ziheng Yang and Rasmus Nielsen. Current models of codon substitution are formulated at the levels of nucleotide substitution and do not explicitly consider the separate effects of mutation and selection. They are thus incapable of inferring whether mutation or selection is responsible for evolution at silent sites. Here we implement a few population genetics models of codon substitution that explicitly consider mutation bias and natural selection at the DNA level. Selection on codon usage is modeled by introducing codon-fitness parameters, which together with mutation-bias parameters, predict optimal codon frequencies for the gene. The selective pressure may be for translational efficiency and accuracy or for fine-tuning translational kinetics to produce correct protein folding. We apply the models to compare mitochondrial and nuclear genes from several mammalian species. Model assumptions concerning codon usage are found to affect the estimation of sequence distances (such as the synonymous rate dS, the nonsynonymous rate dN, and the rate at the 4-fold degenerate sites d4), as found in previous studies, but the new models produced very similar estimates to some old ones. We also develop a likelihood ratio test to examine the null hypothesis that codon usage is due to mutation bias alone, not influenced by natural selection. Application of the test to the mammalian data led to rejection of the null hypothesis in most genes, suggesting that natural selection may be a driving force in the evolution of synonymous codon usage in mammals.
Estimates of selection coefficients nevertheless suggest that selection on codon usage is weak and most mutations are nearly neutral. The sensitivity of the analysis to the assumed mutation model is discussed.

(2008) J Mol Evol 67, 418-26 doi:10.10. Divergence and polymorphism under the nearly neutral theory of molecular evolution. Welch JJ, Eyre-Walker A & Waxman D. The nearly neutral theory attributes most nucleotide substitution and polymorphism to genetic drift acting on weakly selected mutants, and assumes that the selection coefficients for these mutants are drawn from a continuous distribution. This means that parameter estimation can require numerical integration, and this can be computationally costly and inaccurate. Furthermore, the leading parameter dependencies of important quantities can be unclear, making results difficult to understand. For some commonly used distributions of mutant effects, we show how these problems can be avoided by writing equations in terms of special functions. Series expansion then allows for their rapid calculation, and also illuminates leading parameter dependencies. For example, we show that if mutants are Gamma distributed, the neutrality index is largely independent of the effective population size. However, we also show that such results are not robust to misspecification of the functional form of distribution. Some implications of these findings are then discussed.

Ann N Y Acad Sci. 2008;1133:162-79. Near neutrality: leading edge of the neutral theory of molecular evolution. Hughes AL. Department of Biological Sciences, University of South Carolina, Coker Life Sciences Bldg., 700 Sumter St., Columbia, South Carolina 29208, USA. austin@biol.sc.edu The nearly neutral theory represents a development of Kimura's neutral theory of molecular evolution that makes testable predictions that go beyond a mere null model.
Recent evidence has strongly supported several of these predictions, including the prediction that slightly deleterious variants will accumulate in a species that has undergone a severe bottleneck or in cases where recombination is reduced or absent. Because bottlenecks often occur in speciation and slightly deleterious mutations in coding regions will usually be nonsynonymous, we should expect that the ratio of nonsynonymous to synonymous fixed differences between species should often exceed the ratio of nonsynonymous to synonymous polymorphisms within species. Many data support this prediction, although they have often been wrongly interpreted as evidence for positive Darwinian selection. The use of conceptually flawed tests for positive selection has become widespread in recent years, seriously harming the quest for an understanding of genome evolution. When properly analyzed, many (probably most) claimed cases of positive selection will turn out to involve the fixation of slightly deleterious mutations by genetic drift in bottlenecked populations. Slightly deleterious variants are a transient feature of evolution in the long term, but they have substantially affected contemporary species, including our own. PMID: 18559820 [PubMed - indexed for MEDLINE]

Genetics. 2008 May;179(1):555-67. Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Ramírez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F, Navarro A. Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain. Several tests have been proposed to detect departures of nucleotide variability patterns from neutral expectations. However, very different kinds of evolutionary processes, such as selective events or demographic changes, can produce similar deviations from these tests, thus making interpretation difficult when a significant departure from neutrality is detected.
Here we study the effects of demography and recombination upon neutrality tests by analyzing their power under sudden population expansions, sudden contractions, and bottlenecks. We evaluate tests based on the frequency spectrum of mutations and the distribution of haplotypes and explore the consequences of using incorrect estimates of the rates of recombination when testing for neutrality. We show that tests that rely on haplotype frequencies (especially Fs and ZnS, which are based, respectively, on the number of different haplotypes and on the r2 values between all pairs of polymorphic sites) are the most powerful for detecting expansions on nonrecombining genomic regions. Nevertheless, they are strongly affected by misestimations of recombination, so they should not be used when recombination levels are unknown. Instead, class I tests, particularly Tajima's D or R2, are recommended. PMID: 18493071 [PubMed - indexed for MEDLINE] PMCID: PMC2390632

Mol Biol Evol. 2007 Jul;24(7):1562-74. Epub 2007 Apr 21. Comparisons of site- and haplotype-frequency methods for detecting positive selection. Zeng K, Mano S, Shi S, Wu CI. State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the Ministry of Education, Sun Yat-sen University, Guangzhou, China. kzeng@uchicago.edu In this report, we compare the differences between various site- and haplotype-frequency tests in their power to detect positive selection by doing computer simulations. Our results are the following. 1) Although haplotype-frequency tests that are conditional on the number of haplotypes (K) were developed for nonrecombining haplotypes, these tests are insensitive to recombination. Such tests, including the Ewens-Watterson (EW) test, can therefore be applied to recombining haplotypes. 2) Tests conditional on the number of segregating sites (S) become overly conservative in the presence of recombination.
3) The EW test is usually the most powerful test during the sweep phase, especially when the local recombination rate is high. 4) The "extended haplotype homozygosity" test relies heavily on the prior knowledge of the target of selection. With that knowledge, it is the most powerful test, whereas in the absence of this prior information, the test has little power. We also study the sensitivities of the haplotype-frequency tests to background selection and various demographic forces. We find that these tests are sensitive to some forces other than positive selection. To alleviate the problem of low specificity, compound tests, such as the DH test (Zeng et al. 2006), may be a solution. In the companion paper (Zeng K, Shi S, Wu C-I, in preparation), we use the EW test to devise 2 compound tests, which are more powerful in detecting positive selection than DH, but are also relatively insensitive to demography. PMID: 17449894 [PubMed - indexed for MEDLINE]

Mol Biol Evol. 2007 Aug;24(8):1898-908. Epub 2007 Jun 8. Compound tests for the detection of hitchhiking under positive selection. Zeng K, Shi S, Wu CI. State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the Ministry of Education, Sun Yat-Sen University, Guangzhou, China. kzeng@uchicago.edu Many statistical tests have been developed for detecting positive selection. Most of these tests draw conclusions based on significant deviations from the patterns of polymorphism predicted by the neutral model. However, many non-equilibrium forces may cause similar deviations, and thus the tests usually have low statistical specificity to positive selection. The main challenge is hence to construct test statistics that are reasonably powerful in detecting positive selection, but are relatively insensitive to other forces. Recently, Zeng et al.
(2006) proposed a new test, DH, which is a compound of Tajima's D and Fay and Wu's H, and showed that DH has reasonably high statistical specificity to positive selection. In this report, we expand the idea of a compound test by combining Fay and Wu's H or DH with the Ewens-Watterson (EW) test. We refer to these 2 new tests as HEW and DHEW, respectively. Compared to the DH test, HEW and DHEW are more robust against the presence of recombination, and are also more powerful in detecting positive selection. Furthermore, the DHEW test, similar to DH, is also relatively insensitive to background selection and demography. The HEW test, on the other hand, tends to be somewhat less conservative than DH and DHEW in some cases. PMID: 17557886 [PubMed - indexed for MEDLINE]

Bioinformatics 2007 23(13):i319-i327; doi:10.1093/bioinformatics/btm176 Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Itay Mayrose, Adi Doron-Faigenboim, Eran Bacharach and Tal Pupko. Codon evolutionary models are widely used to infer the selection forces acting on a protein. The non-synonymous to synonymous rate ratio (denoted by Ka/Ks) is used to infer specific positions that are under purifying or positive selection. Current evolutionary models usually assume that only the non-synonymous rates vary among sites while the synonymous substitution rates are constant. This assumption ignores the possibility of selection forces acting at the DNA or mRNA levels. Towards a more realistic description of sequence evolution, we present a model that accounts for among-site variation of both synonymous and non-synonymous substitution rates. Furthermore, we alleviate the widespread assumption that positions evolve independently of each other. Thus, possible sources of bias caused by random fluctuations in either the synonymous or non-synonymous rate estimations at a single site are removed.
Our model is based on two hidden Markov models that operate on the spatial dimension: one describes the dependency between adjacent non-synonymous rates while the other describes the dependency between adjacent synonymous rates. The presented model is applied to study the selection pressure across the HIV-1 genome. The new model better describes the evolution of all HIV-1 genes, as compared to current codon models. Using both simulations and real data analyses, we illustrate that accounting for synonymous rate variability and dependency greatly increases the accuracy of Ka/Ks estimation and in particular of positively selected sites. Finally, we discuss the applicability of the developed model to infer the selection forces in regulatory and overlapping regions of the HIV-1 genome.

Genetics. 2006 Nov;174(3):1431-9. Epub 2006 Sep 1. Statistical tests for detecting positive selection by utilizing high-frequency variants. Zeng K, Fu YX, Shi S, Wu CI. State Key Laboratory of Biocontrol, Ministry of Education, Sun Yat-sen University, Guangzhou, China. kzeng@uchicago.edu By comparing the low-, intermediate-, and high-frequency parts of the frequency spectrum, we gain information on the evolutionary forces that influence the pattern of polymorphism in population samples. We emphasize the high-frequency variants on which positive selection and negative (background) selection exhibit different effects. We propose a new estimator of theta (the product of effective population size and neutral mutation rate), thetaL, which is sensitive to the changes in high-frequency variants. The new thetaL allows us to revise Fay and Wu's H-test by normalization. To complement the existing statistics (the H-test and Tajima's D-test), we propose a new test, E, which relies on the difference between thetaL and Watterson's thetaW. We show that this test is most powerful in detecting the recovery phase after the loss of genetic diversity, which includes the postselective sweep phase.
The sensitivities of these tests to (or robustness against) background selection and demographic changes are also considered. Overall, D and H in combination can be most effective in detecting positive selection while being insensitive to other perturbations. We thus propose a joint test, referred to as the DH test. Simulations indicate that DH is indeed sensitive primarily to directional selection and no other driving forces. PMID: 16951063 [PubMed - indexed for MEDLINE] PMCID: PMC1667063
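
Illustrative note (not part of the abstracts above): the theta estimators named in the last abstract can be sketched from an unfolded site frequency spectrum. The following is a minimal sketch only, assuming the standard textbook definitions of Watterson's thetaW, the pairwise-diversity estimator thetaPi, Fay and Wu's thetaH, and thetaL. The function name `theta_estimators` and the input layout are hypothetical. The returned contrasts are unnormalized numerators; the published D, H, and E tests additionally divide by a standard deviation term that is omitted here.

```python
from math import comb


def theta_estimators(sfs):
    """Estimate theta from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele is carried by exactly i of the n sampled sequences
    (i = 1 .. n - 1). This input layout is an assumption of the sketch.
    """
    n = len(sfs) + 1  # sample size implied by the spectrum length
    if n < 2:
        raise ValueError("need at least one frequency class (n >= 2)")

    pairs = comb(n, 2)                        # number of sequence pairs
    a_n = sum(1.0 / i for i in range(1, n))   # harmonic number a_n
    classes = list(enumerate(sfs, start=1))   # (i, count) pairs

    theta_w = sum(sfs) / a_n                                  # Watterson
    theta_pi = sum(i * (n - i) * x for i, x in classes) / pairs
    theta_h = sum(i * i * x for i, x in classes) / pairs      # Fay and Wu
    theta_l = sum(i * x for i, x in classes) / (n - 1)        # thetaL

    return {
        "theta_w": theta_w,
        "theta_pi": theta_pi,
        "theta_h": theta_h,
        "theta_l": theta_l,
        # Unnormalized contrasts underlying the D, H, and E statistics.
        "d_numerator": theta_pi - theta_w,
        "h_numerator": theta_pi - theta_h,
        "e_numerator": theta_l - theta_w,
    }


if __name__ == "__main__":
    # A spectrum proportional to 1/i (the neutral expectation) for n = 5.
    for name, value in theta_estimators([12, 6, 4, 3]).items():
        print(f"{name}: {value:.3f}")
```

With a spectrum proportional to 1/i, all four estimators coincide and the three contrasts are zero; adding high-frequency derived variants inflates thetaH and thetaL relative to thetaPi and thetaW, which is the signal the H and E contrasts are designed to pick up.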