Mol Biol Evol

Mol Biol Evol. 2009 Jan 6. [Epub ahead of print]
The influence of demography and weak selection on the McDonald-Kreitman test:
an empirical study in Drosophila.
Parsch J, Zhang Z, Baines JF.
Department of Biology, University of Munich, 82152 Planegg-Martinsried, Germany.
The McDonald-Kreitman (MK) test, which compares the ratio of polymorphism to
divergence at nonsynonymous and synonymous sites, is frequently used to detect
adaptive evolution in protein-coding sequences. Because the two classes of sites share a
common evolutionary history, the MK test is thought to be robust to most demographic
factors. However, weak selection on nonsynonymous sites can bias the MK test,
especially when a species' effective population size has not been constant. Here we
present an empirical analysis of the influence of demography on the MK test by
comparing test results for a common set of 136 genes, including a set of sex-biased
genes that shows a strong signal of adaptive evolution, in two Drosophila melanogaster
populations: an ancestral population from Africa and a derived population from Europe.
The latter has undergone a relatively recent bottleneck, which has reduced its effective
population size. We find that the MK test has less power to detect positive selection in
the European population for two reasons. First, the overall reduced level of standing
variation decreases the statistical power of the test. Second, the segregation of slightly deleterious nonsynonymous mutations biases the MK test away from detecting positive
selection. The latter effect is stronger for X-linked genes, which have experienced the
greatest reduction in effective population size outside of Africa, and also leads to the
underestimation of rates of adaptive protein evolution by multi-locus implementations
of the MK test. Interestingly, a subset of autosomal female-biased genes shows an
increased signal of adaptive evolution in the European population. This is inconsistent
with currently accepted demographic scenarios and may reflect female-specific changes
in selective constraint following the colonization of non-African habitats.
PMID: 19126864 [PubMed - as supplied by publisher]
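The MK comparison described in this abstract reduces to a test of independence on a 2x2 table of counts (divergence vs. polymorphism, nonsynonymous vs. synonymous). The following is a minimal Python sketch under stated assumptions: Fisher's exact test is used as one common choice of statistic (the abstract does not say which was used), and the function names and example counts are hypothetical, not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """MK test: rows are divergence and polymorphism, columns are N and S sites."""
    return fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts: 7 nonsynonymous / 17 synonymous fixed differences,
# 2 nonsynonymous / 42 synonymous polymorphisms.
p_value = mk_test(7, 17, 2, 42)
print(f"MK test p-value: {p_value:.4f}")
```

A reduced level of standing variation, as in the bottlenecked population, shrinks the polymorphism row of the table, which is one way to see why the test loses power there.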
Mol Biol Evol. 2009 Feb;26(2):273-83. Epub 2008 Oct 14.
An investigation of the statistical power of neutrality tests based on comparative
and population genetic data.
Zhai W, Nielsen R, Slatkin M.
Department of Integrative Biology, University of California, Berkeley, USA.
weiweizhai@berkeley.edu
In this report, we investigate the statistical power of several tests of selective neutrality
based on patterns of genetic diversity within and between species. The goal is to
compare tests based solely on population genetic data with tests using comparative data
or a combination of comparative and population genetic data. We show that in the
presence of repeated selective sweeps on a relatively neutral background, tests based on
the d(N)/d(S) ratios in comparative data almost always have more power to detect
selection than tests based on population genetic data, even if the overall level of
divergence is low. Tests based solely on the distribution of allele frequencies or the site
frequency spectrum, such as the Ewens-Watterson test or Tajima's D, have less power in
detecting both positive and negative selection because of the transient nature of positive
selection and the weak signal left by negative selection. The Hudson-Kreitman-Aguadé
test is the most powerful test for detecting positive selection among the population
genetic tests investigated, whereas the McDonald-Kreitman test typically has more power
to detect negative selection. We discuss our findings in the light of the discordant results
obtained in several recently published genomic scans.
PMID: 18922762 [PubMed - in process]
Genetics. 2008 Nov;180(3):1767-71. Epub 2008 Sep 14.
Controlling type-I error of the McDonald-Kreitman test in genomewide scans for
selection on noncoding DNA.
Andolfatto P.
Department of Ecology and Evolutionary Biology, Princeton University, Princeton,
New Jersey 08544, USA. pandolfa@princeton.edu
Departures from the assumption of homogeneously interdigitated neutral and putatively
selected sites in the McDonald-Kreitman test can lead to false rejections of the neutral
model in the presence of intermediate levels of recombination. This problem is
exacerbated by small sample sizes, nonequilibrium demography, recombination rate
variation, and in comparisons involving more recently diverged species. I propose that
establishing significance levels by coalescent simulation with recombination can
improve the fidelity of the test in genomewide scans for selection on noncoding DNA.
PMID: 18791238 [PubMed - in process]
PMCID: PMC2581974 [Available on 2009/02/01]
Mol Biol Evol. 2008 Oct;25(10):2199-209. Epub 2008 Jul 30.
Synonymous and nonsynonymous polymorphisms versus divergences in bacterial
genomes.
Hughes AL, Friedman R, Rivailler P, French JO.
Department of Biological Sciences, University of South Carolina, USA.
Comparison of the ratio of nonsynonymous to synonymous polymorphisms within
species with the ratio of nonsynonymous to synonymous substitutions between species
has been widely used as a supposed indicator of positive Darwinian selection, with the
ratio of these 2 ratios being designated as a neutrality index (NI). Comparison of
genome-wide polymorphism within 12 species of bacteria with divergence from an
outgroup species showed substantial differences in NI among taxa. A low level of
nonsynonymous polymorphism at a locus was the best predictor of NI < 1, rather than a
high level of nonsynonymous substitution between species. Moreover, genes with NI <
1 showed a strong tendency toward the occurrence of rare nonsynonymous
polymorphisms, as expected under the action of ongoing purifying selection. Thus, our
results are more consistent with the hypothesis that a high relative rate of between-species nonsynonymous substitution reflects mainly the action of purifying selection
within species to eliminate slightly deleterious mutations rather than positive selection
between species. This conclusion is consistent with previous results highlighting an
important role of slightly deleterious variants in bacterial evolution and suggests caution
in the use of the McDonald-Kreitman test and related statistics as tests of positive
selection.
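The neutrality index described in this abstract, the ratio of the polymorphism ratio to the divergence ratio, can be sketched in a few lines of Python. The function name and the example counts are hypothetical; only the definition NI = (Pn/Ps) / (Dn/Ds) comes from the text.

```python
def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn / Ps) / (Dn / Ds), computed as (Pn * Ds) / (Ps * Dn)."""
    if ps == 0 or dn == 0:
        raise ValueError("NI is undefined when Ps or Dn is zero")
    return (pn * ds) / (ps * dn)


# Hypothetical counts: few nonsynonymous polymorphisms relative to divergence.
ni = neutrality_index(pn=2, ps=42, dn=7, ds=17)
print(f"NI = {ni:.3f}")
```

Writing NI as a single quotient makes the abstract's point easy to see: a small Pn alone is enough to drive NI below 1, without any increase in Dn.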
Mol Biol Evol. 2008 Jun;25(6):1007-15. Epub 2008 Jan 14.
The McDonald-Kreitman test and slightly deleterious mutations.
Charlesworth J, Eyre-Walker A.
Centre for the Study of Evolution, University of Sussex, Brighton, United Kingdom.
It is possible to estimate the proportion of substitutions that are due to adaptive
evolution using the numbers of silent and nonsilent polymorphisms and substitutions in
a McDonald and Kreitman-type analysis. Unfortunately, this estimate of adaptive
evolution is biased downward by the segregation of slightly deleterious mutations. It has
been suggested that 1 way to cope with the effects of these slightly deleterious
mutations is to remove low-frequency polymorphisms from the analysis. We investigate
the performance of this method theoretically. We show that although removing low-frequency polymorphisms does indeed reduce the bias in the estimate of adaptive
evolution, the estimate is always downwardly biased, often to the extent that one would
not be able to detect adaptive evolution, even if it existed. The method is reasonably
satisfactory, only if the rate of adaptive evolution is high and the distribution of fitness
effects for slightly deleterious mutations is very leptokurtic. Our analysis suggests that
adaptive evolution could be quite prevalent in humans (>8%) and still not be detectable
using current methodologies. Our analysis also suggests that the level of adaptive
evolution has probably been underestimated, possibly substantially, in both bacteria and
Drosophila.
PMID: 18195052 [PubMed - indexed for MEDLINE]
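The estimate discussed in this abstract, and the effect of discarding low-frequency polymorphisms, can be illustrated with a short Python sketch. It assumes the widely used MK-type estimator alpha = 1 - (Ds * Pn) / (Dn * Ps); the abstract does not write the formula out, and the function name, the frequency cutoff, and all counts and frequencies below are hypothetical.

```python
def alpha_estimate(dn, ds, pn_freqs, ps_freqs, min_freq=0.0):
    """Proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    pn_freqs and ps_freqs hold the allele frequency of each nonsynonymous and
    synonymous polymorphism; those below min_freq are dropped before counting.
    """
    pn = sum(1 for f in pn_freqs if f >= min_freq)
    ps = sum(1 for f in ps_freqs if f >= min_freq)
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical data: nonsynonymous polymorphisms are enriched at low frequency,
# as expected if many of them are slightly deleterious.
pn_freqs = [0.05, 0.05, 0.30, 0.50]
ps_freqs = [0.05, 0.20, 0.30, 0.40, 0.50, 0.60]

alpha_all = alpha_estimate(20, 40, pn_freqs, ps_freqs)
alpha_filtered = alpha_estimate(20, 40, pn_freqs, ps_freqs, min_freq=0.15)
print(f"all polymorphisms: {alpha_all:.3f}, cutoff 0.15: {alpha_filtered:.3f}")
```

In this toy case the cutoff raises the estimate, which mirrors the reduced bias reported in the abstract; it does not show that the remaining bias is removed.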
Molecular Biology and Evolution 2008 25(3):568-579; doi:10.1093/molbev/msm284
Mutation-Selection Models of Codon Substitution and Their Use to Estimate
Selective Strengths on Codon Usage
Ziheng Yang and Rasmus Nielsen
Current models of codon substitution are formulated at the levels of nucleotide
substitution and do not explicitly consider the separate effects of mutation and selection.
They are thus incapable of inferring whether mutation or selection is responsible for
evolution at silent sites. Here we implement a few population genetics models of codon
substitution that explicitly consider mutation bias and natural selection at the DNA
level. Selection on codon usage is modeled by introducing codon-fitness parameters,
which together with mutation-bias parameters, predict optimal codon frequencies for the
gene. The selective pressure may be for translational efficiency and accuracy or for fine-tuning translational kinetics to produce correct protein folding. We apply the models to
compare mitochondrial and nuclear genes from several mammalian species. Model
assumptions concerning codon usage are found to affect the estimation of sequence
distances (such as the synonymous rate dS, the nonsynonymous rate dN, and the rate at
the 4-fold degenerate sites d4), as found in previous studies, but the new models
produced very similar estimates to some old ones. We also develop a likelihood ratio
test to examine the null hypothesis that codon usage is due to mutation bias alone, not
influenced by natural selection. Application of the test to the mammalian data led to
rejection of the null hypothesis in most genes, suggesting that natural selection may be a
driving force in the evolution of synonymous codon usage in mammals. Estimates of
selection coefficients nevertheless suggest that selection on codon usage is weak and
most mutations are nearly neutral. The sensitivity of the analysis to the assumed
mutation model is discussed.
(2008) J Mol Evol 67, 418-26 doi:10.10.
Divergence and polymorphism under the nearly neutral theory of molecular
evolution
Welch JJ, Eyre-Walker A & Waxman D
The nearly neutral theory attributes most nucleotide substitution and polymorphism to
genetic drift acting on weakly selected mutants, and assumes that the selection
coefficients for these mutants are drawn from a continuous distribution. This means that
parameter estimation can require numerical integration, and this can be computationally
costly and inaccurate. Furthermore, the leading parameter dependencies of important
quantities can be unclear, making results difficult to understand. For some commonly
used distributions of mutant effects, we show how these problems can be avoided by
writing equations in terms of special functions. Series expansion then allows for their
rapid calculation, and also illuminates leading parameter dependencies. For example,
we show that if mutants are Gamma distributed, the neutrality index is largely
independent of the effective population size. However, we also show that such results
are not robust to misspecification of the functional form of distribution. Some
implications of these findings are then discussed.
Ann N Y Acad Sci. 2008;1133:162-79.
Near neutrality: leading edge of the neutral theory of molecular evolution.
Hughes AL.
Department of Biological Sciences, University of South Carolina, Coker Life Sciences
Bldg., 700 Sumter St., Columbia, South Carolina 29208, USA. austin@biol.sc.edu
The nearly neutral theory represents a development of Kimura's neutral theory of
molecular evolution that makes testable predictions that go beyond a mere null model.
Recent evidence has strongly supported several of these predictions, including the
prediction that slightly deleterious variants will accumulate in a species that has
undergone a severe bottleneck or in cases where recombination is reduced or absent.
Because bottlenecks often occur in speciation and slightly deleterious mutations in
coding regions will usually be nonsynonymous, we should expect that the ratio of
nonsynonymous to synonymous fixed differences between species should often exceed
the ratio of nonsynonymous to synonymous polymorphisms within species. Many data
support this prediction, although they have often been wrongly interpreted as evidence
for positive Darwinian selection. The use of conceptually flawed tests for positive
selection has become widespread in recent years, seriously harming the quest for an
understanding of genome evolution. When properly analyzed, many (probably most)
claimed cases of positive selection will turn out to involve the fixation of slightly
deleterious mutations by genetic drift in bottlenecked populations. Slightly deleterious
variants are a transient feature of evolution in the long term, but they have substantially
affected contemporary species, including our own.
PMID: 18559820 [PubMed - indexed for MEDLINE]
Genetics. 2008 May;179(1):555-67.
Statistical power analysis of neutrality tests under demographic expansions,
contractions and bottlenecks with recombination.
Ramírez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F, Navarro A.
Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, 08003
Barcelona, Catalonia, Spain.
Several tests have been proposed to detect departures of nucleotide variability patterns
from neutral expectations. However, very different kinds of evolutionary processes,
such as selective events or demographic changes, can produce similar deviations from
these tests, thus making interpretation difficult when a significant departure from neutrality
is detected. Here we study the effects of demography and recombination upon neutrality
tests by analyzing their power under sudden population expansions, sudden
contractions, and bottlenecks. We evaluate tests based on the frequency spectrum of
mutations and the distribution of haplotypes and explore the consequences of using
incorrect estimates of the rates of recombination when testing for neutrality. We show
that tests that rely on haplotype frequencies (especially Fs and ZnS, which are based,
respectively, on the number of different haplotypes and on the r2 values between all
pairs of polymorphic sites) are the most powerful for detecting expansions on
nonrecombining genomic regions. Nevertheless, they are strongly affected by
misestimations of recombination, so they should not be used when recombination levels
are unknown. Instead, class I tests, particularly Tajima's D or R2, are recommended.
PMID: 18493071 [PubMed - indexed for MEDLINE]
PMCID: PMC2390632
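Tajima's D, one of the tests recommended in this abstract when recombination levels are unknown, can be computed from three summaries of a sample: the number of sequences, the number of segregating sites, and the mean number of pairwise differences. The Python sketch below follows the standard published formula for D; the function names and the 0/1 haplotype encoding are assumptions made for illustration, and significance would still have to be assessed separately (for example by simulation).

```python
from math import sqrt


def summarize(haplotypes):
    """Return (n, S, pi) for equal-length strings of '0' and '1' alleles."""
    n = len(haplotypes)
    seg_sites, pi = 0, 0.0
    for column in zip(*haplotypes):
        k = column.count("1")
        if 0 < k < n:
            seg_sites += 1
            # Each site adds the fraction of sequence pairs that differ there.
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    return n, seg_sites, pi


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size, segregating sites, and pairwise diversity."""
    if n < 4 or seg_sites == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pairwise diversity minus Watterson's estimate S / a1.
    return (pi - seg_sites / a1) / sqrt(variance)


# Hypothetical sample of four haplotypes over three sites.
n, s, pi = summarize(["010", "100", "000", "110"])
print(n, s, round(pi, 4), round(tajimas_d(n, s, pi), 4))
```

An excess of rare variants, as expected after a population expansion, lowers the pairwise diversity relative to S / a1 and pushes D negative.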
Mol Biol Evol. 2007 Jul;24(7):1562-74. Epub 2007 Apr 21.
Comparisons of site- and haplotype-frequency methods for detecting positive
selection.
Zeng K, Mano S, Shi S, Wu CI.
State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the
Ministry of Education, Sun Yat-sen University, Guangzhou, China.
kzeng@uchicago.edu
In this report, we compare various site- and haplotype-frequency tests in their power to detect positive selection using computer
simulations. Our results are the following. 1) Although haplotype-frequency tests that
are conditional on the number of haplotypes (K) were developed for nonrecombining
haplotypes, these tests are insensitive to recombination. Such tests, including the
Ewens-Watterson (EW) test, can therefore be applied to recombining haplotypes. 2)
Tests conditional on the number of segregating sites (S) become overly conservative in
the presence of recombination. 3) The EW test is usually the most powerful test during
the sweep phase, especially when the local recombination rate is high. 4) The "extended
haplotype homozygosity" test relies heavily on the prior knowledge of the target of
selection. With that knowledge, it is the most powerful test, whereas in the absence of
this prior information, the test has little power. We also study the sensitivities of the
haplotype-frequency tests to background selection and various demographic forces. We
find that these tests are sensitive to some forces other than positive selection. To
alleviate the problem of low specificity, compound tests, such as the DH test (Zeng et
al. 2006), may be a solution. In the companion paper (Zeng K, Shi S, Wu C-I, in
preparation), we use the EW test to devise 2 compound tests, which are more powerful
in detecting positive selection than DH, but are also relatively insensitive to
demography.
PMID: 17449894 [PubMed - indexed for MEDLINE]
Mol Biol Evol. 2007 Aug;24(8):1898-908. Epub 2007 Jun 8.
Compound tests for the detection of hitchhiking under positive selection.
Zeng K, Shi S, Wu CI.
State Key Laboratory of Biocontrol and Key Laboratory of Gene Engineering of the
Ministry of Education, Sun Yat-Sen University, Guangzhou, China.
kzeng@uchicago.edu
Many statistical tests have been developed for detecting positive selection. Most of
these tests draw conclusions based on significant deviations from the patterns of
polymorphism predicted by the neutral model. However, many non-equilibrium forces
may cause similar deviations, and thus the tests usually have low statistical specificity
to positive selection. The main challenge is hence to construct test statistics that are
reasonably powerful in detecting positive selection, but are relatively insensitive to
other forces. Recently, Zeng et al. (2006) proposed a new test, DH, which is a
compound of Tajima's D and Fay and Wu's H, and showed that DH has reasonably high
statistical specificity to positive selection. In this report, we expand the idea of a
compound test by combining Fay and Wu's H or DH with the Ewens-Watterson (EW)
test. We refer to these 2 new tests as HEW and DHEW, respectively. Compared to the
DH test, HEW and DHEW are more robust against the presence of recombination, and
are also more powerful in detecting positive selection. Furthermore, the DHEW test,
similar to DH, is also relatively insensitive to background selection and demography.
The HEW test, on the other hand, tends to be somewhat less conservative than DH and
DHEW in some cases.
PMID: 17557886 [PubMed - indexed for MEDLINE]
Bioinformatics 2007 23(13):i319-i327; doi:10.1093/bioinformatics/btm176
Towards realistic codon models: among site variability and dependency of
synonymous and non-synonymous rates
Itay Mayrose, Adi Doron-Faigenboim, Eran Bacharach and Tal Pupko
Codon evolutionary models are widely used to infer the selection forces acting on a
protein. The non-synonymous to synonymous rate ratio (denoted by Ka/Ks) is used to
infer specific positions that are under purifying or positive selection. Current
evolutionary models usually assume that only the non-synonymous rates vary among
sites while the synonymous substitution rates are constant. This assumption ignores the
possibility of selection forces acting at the DNA or mRNA levels. Towards a more
realistic description of sequence evolution, we present a model that accounts for among-site variation of both synonymous and non-synonymous substitution rates. Furthermore,
we relax the widespread assumption that positions evolve independently of each
other. Thus, possible sources of bias caused by random fluctuations in either the
synonymous or non-synonymous rate estimations at a single site are removed. Our model
is based on two hidden Markov models that operate on the spatial dimension: one
describes the dependency between adjacent non-synonymous rates while the other
describes the dependency between adjacent synonymous rates. The presented model is
applied to study the selection pressure across the HIV-1 genome. The new model better
describes the evolution of all HIV-1 genes, as compared to current codon models. Using
both simulations and real data analyses, we illustrate that accounting for synonymous
rate variability and dependency greatly increases the accuracy of Ka/Ks estimation and
in particular of positively selected sites. Finally, we discuss the applicability of the
developed model to infer the selection forces in regulatory and overlapping regions of
the HIV-1 genome.
Genetics. 2006 Nov;174(3):1431-9. Epub 2006 Sep 1.
Statistical tests for detecting positive selection by utilizing high-frequency variants.
Zeng K, Fu YX, Shi S, Wu CI.
State Key Laboratory of Biocontrol, Ministry of Education, Sun Yat-sen University,
Guangzhou, China. kzeng@uchicago.edu
By comparing the low-, intermediate-, and high-frequency parts of the frequency
spectrum, we gain information on the evolutionary forces that influence the pattern of
polymorphism in population samples. We emphasize the high-frequency variants on
which positive selection and negative (background) selection exhibit different effects.
We propose a new estimator of theta (the product of effective population size and
neutral mutation rate), thetaL, which is sensitive to the changes in high-frequency
variants. The new thetaL allows us to revise Fay and Wu's H-test by normalization. To
complement the existing statistics (the H-test and Tajima's D-test), we propose a new
test, E, which relies on the difference between thetaL and Watterson's thetaW. We show
that this test is most powerful in detecting the recovery phase after the loss of genetic
diversity, which includes the postselective sweep phase. The sensitivities of these tests
to (or robustness against) background selection and demographic changes are also
considered. Overall, D and H in combination can be most effective in detecting positive
selection while being insensitive to other perturbations. We thus propose a joint test,
referred to as the DH test. Simulations indicate that DH is indeed sensitive primarily to
directional selection and not to other driving forces.
PMID: 16951063 [PubMed - indexed for MEDLINE]
PMCID: PMC1667063
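The estimators contrasted in this abstract can all be written as weighted sums over the unfolded site frequency spectrum. The Python sketch below assumes the usual definitions: Watterson's thetaW = S / a1, the pairwise estimator thetaPi, and thetaL as the mean number of derived mutations per sequence, sum(i * xi_i) / (n - 1). It returns only the unnormalized differences underlying H and E; the variance terms needed to turn them into test statistics are omitted, and the function name and example spectra are hypothetical.

```python
def theta_estimators(sfs):
    """Estimators of theta from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites at which the derived allele is carried
    by exactly i of the n sampled sequences, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    a1 = sum(1.0 / i for i in range(1, n))
    pairs = n * (n - 1) / 2.0
    theta_w = sum(sfs) / a1
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_l = sum(i * x for i, x in enumerate(sfs, start=1)) / (n - 1)
    return {
        "theta_w": theta_w,
        "theta_pi": theta_pi,
        "theta_l": theta_l,
        # Unnormalized contrasts; dividing by their standard deviations
        # (not computed here) would give the actual test statistics.
        "h_numerator": theta_pi - theta_l,
        "e_numerator": theta_l - theta_w,
    }


# Spectrum proportional to 1/i (the neutral expectation) with theta = 12, n = 5.
neutral = theta_estimators([12, 6, 4, 3])
# Hypothetical spectrum with an excess of high-frequency derived variants.
skewed = theta_estimators([1, 0, 0, 5])
print(neutral)
print(skewed)
```

Under the neutral expectation all three estimators agree, so both contrasts are zero; an excess of high-frequency derived variants inflates thetaL and makes the H contrast negative.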