Implementation of gamma model within a Bayesian framework

We describe a new approach for incorporating mutation rate variation among loci when
using coalescent analysis to estimate genetic diversity. While Alter et al. (2007) used
locus-specific mutation rates in their calculation of an overall θ, such an approach
requires that phylogenetic calibrations be available for each locus for which intraspecific
data are gathered. Here, we model individual locus mutation rates as being drawn from a
gamma distribution, which has the advantage of being applicable to genealogies of any
set of loci. A comparison of the variation in individual locus mutation rates with the
gamma distribution suggests that the gamma distribution provides a good approximation
of the data (SI Figure 1). Because the current implementation of LAMARC allows the
gamma model only within the likelihood framework, we developed our own extension of
LAMARC (called GUFBUL: Gamma Updating For Bayesians Using LAMARC) that
allows the gamma model to be applied in a Bayesian framework. This program uses the
estimated posterior densities of θ for each locus as input to a Markov chain Monte Carlo
algorithm that provides samples from the posterior distributions of the overall θ, the
locus-specific relative rate parameters, and the scale parameter of the gamma distribution.
GUFBUL will be distributed as a small subcomponent of the LAMARC package
(personal communication, Mary Kuhner, University of Washington, Dept. of Genome
Sciences) along with full details of its implementation. Implementation details are also
available directly from Eric Anderson at eric.anderson@noaa.gov.
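The gamma model can be illustrated with a small hierarchical MCMC. The Python sketch below is not GUFBUL and makes several assumptions of its own: the locus-specific parameter is θ_i = θ·r_i; the relative rates r_i follow a gamma distribution with shape α and scale 1/α, so they average 1 and α is the single gamma parameter (how GUFBUL parameterizes the distribution is not specified here); θ and α receive hypothetical exponential priors; and each locus contributes a log-likelihood for its own θ_i. In practice that log-likelihood might be derived from LAMARC's posterior density for the locus divided by the prior LAMARC used; here the hypothetical function make_toy_loglik simply supplies a log-normal bump around a made-up mode.

```python
import math
import random


def log_gamma_pdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x."""
    if x <= 0.0:
        return -math.inf
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def make_toy_loglik(mode, sd_log=0.3):
    """HYPOTHETICAL stand-in for one locus's LAMARC output: a log-normal bump."""
    def loglik(theta_i):
        return -0.5 * ((math.log(theta_i) - math.log(mode)) / sd_log) ** 2
    return loglik


def log_posterior(theta, alpha, rates, locus_logliks):
    """Unnormalized log posterior of the gamma model (hypothetical priors)."""
    if theta <= 0.0 or alpha <= 0.0:
        return -math.inf
    # HYPOTHETICAL priors: theta ~ Exponential(mean 0.01), alpha ~ Exponential(mean 1).
    lp = -theta / 0.01 - alpha
    for loglik, r in zip(locus_logliks, rates):
        lp += log_gamma_pdf(r, alpha, 1.0 / alpha)  # r_i ~ Gamma(alpha, 1/alpha), mean 1
        lp += loglik(theta * r)                      # locus-specific theta_i = theta * r_i
    return lp


def sample_gamma_model(locus_logliks, n_iter=5000, step=0.3, seed=1):
    """Metropolis-within-Gibbs with log-scale random-walk proposals.

    Returns one (theta, alpha) pair per iteration.
    """
    rng = random.Random(seed)
    theta, alpha = 0.005, 1.0
    rates = [1.0] * len(locus_logliks)
    current = log_posterior(theta, alpha, rates, locus_logliks)
    draws = []

    def accept(proposed, current_lp, eps):
        # Proposing x' = x * exp(eps) makes the Hastings ratio include x'/x = exp(eps).
        log_ratio = proposed - current_lp + eps
        return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)

    for _ in range(n_iter):
        # Update the overall theta.
        eps = rng.gauss(0.0, step)
        new_theta = theta * math.exp(eps)
        lp = log_posterior(new_theta, alpha, rates, locus_logliks)
        if accept(lp, current, eps):
            theta, current = new_theta, lp

        # Update the gamma parameter alpha.
        eps = rng.gauss(0.0, step)
        new_alpha = alpha * math.exp(eps)
        lp = log_posterior(theta, new_alpha, rates, locus_logliks)
        if accept(lp, current, eps):
            alpha, current = new_alpha, lp

        # Update each locus-specific relative rate in turn.
        for i in range(len(rates)):
            eps = rng.gauss(0.0, step)
            proposal = rates.copy()
            proposal[i] *= math.exp(eps)
            lp = log_posterior(theta, alpha, proposal, locus_logliks)
            if accept(lp, current, eps):
                rates, current = proposal, lp

        draws.append((theta, alpha))
    return draws


# HYPOTHETICAL per-locus theta modes, standing in for real LAMARC posteriors.
modes = [0.0012, 0.0018, 0.0025, 0.0031, 0.0040]
draws = sample_gamma_model([make_toy_loglik(m) for m in modes], n_iter=5000)
kept = draws[1000:]  # discard burn-in
post_theta = sum(t for t, _ in kept) / len(kept)
print(f"posterior mean of overall theta: {post_theta:.4f}")
```

Updating each parameter with a log-scale random walk keeps every value positive; the exp(eps) term in the acceptance ratio is the Hastings correction that this multiplicative proposal requires.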
Our Bayesian framework was considerably more efficient than the likelihood framework,
reducing computation time from several weeks to several days. This increase in
efficiency makes implementation of the gamma model feasible for large data sets using
long MCMC runs in LAMARC, such as those generated when accounting for uncertainty
in gametic phase by sampling across a range of allele combinations. Our procedure for
implementing the gamma model for mutation rate variation within a Bayesian framework
can be applied more generally to any taxon for which an average multi-locus mutation
rate is available.
Sources of uncertainty
While we have attempted to reflect the overall variance in long-term abundance by
bootstrap re-sampling across the variation in each parameter used in our estimate, some
uncertainties remain that cannot be captured by our confidence intervals. Some of these,
such as the potential influence of population structure, were tested through the use of
coalescent simulation, while others, such as issues related to the estimation of mutation
rates and the ratio of Nmature/Ne, are more general problems within the field of
evolutionary genetics that remain unresolved. In the latter case, we have chosen values
that most closely reflect the current state of understanding in the field, while
acknowledging the role that these uncertainties play in our final estimate of long-term
population size.
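The resampling described above can be sketched as follows. The sketch assumes the standard diploid relation θ = 4Neμ, with the per-generation rate μ taken as an annual substitution rate multiplied by generation time, and Nmature = (Nmature/Ne) × Ne. It resamples each input independently, which ignores any correlation among them; every input value and function name is hypothetical and for illustration only.

```python
import random
import statistics


def long_term_abundance(theta, rate_per_year, gen_time, ratio):
    """Nmature = ratio * Ne, where theta = 4 * Ne * mu and mu = rate_per_year * gen_time."""
    mu_per_generation = rate_per_year * gen_time
    n_e = theta / (4.0 * mu_per_generation)
    return ratio * n_e


def bootstrap_interval(theta_draws, rate_draws, gen_times, ratio=2.0,
                       n_boot=10000, seed=1):
    """Resample each input independently; return (2.5%, median, 97.5%)."""
    rng = random.Random(seed)
    estimates = sorted(
        long_term_abundance(rng.choice(theta_draws), rng.choice(rate_draws),
                            rng.choice(gen_times), ratio)
        for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, statistics.median(estimates), upper


# HYPOTHETICAL inputs, for illustration only.
thetas = [0.0018, 0.0021, 0.0024]
rates = [0.8e-9, 1.0e-9, 1.2e-9]   # substitutions per site per year
gens = [18.0, 20.0, 22.0]          # generation time in years
print(bootstrap_interval(thetas, rates, gens))
```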
Accounting for the potential influence of population substructure and/or interspecific
gene flow on estimates of genetic diversity is one of the primary challenges to
successfully using genetic data to estimate long-term population size (Alter & Palumbi
2007; Atkinson et al. 2008). For example, a hybridization event between common minke
and Antarctic minke whales may increase our estimate of Ne. Genetic and morphological
research indicates that Antarctic minke whales are reproductively isolated from all other
species of minke whales (reviewed in Rice 1998), and recent genetic data indicate that
they are likely to have diverged from their closest relative, the common
minke whale, at least 4 to 11 million years ago (depending upon the mutation rate and
generation time employed) (Pastene et al. 2007). This time frame is approaching 4Ne
generations, the average time at which nuclear loci are expected to become reciprocally
monophyletic; thus, our genetic estimate is unlikely to be an artifact of gene flow from
other species. Sequences of common minke whales at the same loci as those employed in
this study may help assess the possibility that a past hybridization event influenced our
estimate of Ne.
Population substructure can increase coalescence time between genes and inflate
estimates of genetic diversity. While preliminary reports have suggested that there may
be population substructure within Antarctic minke whales (fixation index = 0.0090, p = 0.0025)
(Pastene et al. 1996), these distinctions are weak, and our nuclear analysis did not detect
any hidden population substructure in our data. Nevertheless, we assessed the potential
impact of population structure on our genetic estimate by simulating seven subpopulations
arranged in a ring around Antarctica, connected by various levels of migration. The results
suggest that population structure would not significantly increase θ unless migration
between subpopulations was so low that the expected fixation index exceeded 0.10
(Figure 2). Furthermore, θ differed little regardless of whether samples were drawn from a
single subpopulation or evenly from across all subpopulations, suggesting that even though
our samples come from a limited geographic area, our θ reflects ocean-wide genetic diversity.
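The simulations themselves used a seven-deme ring. As a closed-form point of comparison only, the Python sketch below swaps in Wright's island model, in which each deme exchanges migrants equally with all others, and computes expected pairwise coalescence times in the continuous-time approximation. Two standard properties of that model echo the results above: two genes sampled from the same deme have expected coalescence time 2Nd generations whatever the migration rate, the same as a panmictic population of Nd diploids; and a random pair of genes from the whole population has its mean coalescence time inflated by a factor of 1/(1 - FST), so even FST = 0.10 raises a pooled-sample diversity estimate by only about 11%. The deme size used in the example is hypothetical.

```python
def island_model(n_demes, deme_size, mig_rate):
    """Expected pairwise coalescence times under Wright's island model.

    n_demes: number of demes d; deme_size: diploid size N of each deme;
    mig_rate: per-lineage migration probability m per generation.
    Uses the continuous-time approximation (small m, large N).
    Returns (t_within, fst, inflation).
    """
    d, n, m = n_demes, deme_size, mig_rate
    t_within = 2.0 * n * d                        # same deme; does not depend on m
    t_between = t_within + (d - 1) / (2.0 * m)    # two genes in different demes
    # A random pair of genes from the whole population: same deme with prob. 1/d.
    t_total = t_within / d + t_between * (d - 1) / d
    fst = (t_total - t_within) / t_total
    inflation = t_total / t_within                # equals 1 / (1 - fst)
    return t_within, fst, inflation


# HYPOTHETICAL deme size; seven demes echo the simulated ring.
for nm in (0.5, 1.0, 2.0, 5.0, 10.0):
    _, fst, infl = island_model(7, 1000, nm / 1000)
    print(f"Nm = {nm:>4}: FST = {fst:.3f}, pooled-sample inflation = {infl:.3f}")
```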
Here we estimate a substitution rate for the Antarctic minke whale using a Bayesian
analysis of the baleen whale phylogeny and fossil history (Jackson et al. in press).
Several studies have suggested that neutral rates of substitution in mysticete whales are
slow relative to other mammals (Jackson et al. in press; Martin & Palumbi 1993;
Nabholz et al. 2008) and therefore the application of a baleen whale-specific mutation
rate is important. However, recent debates over the extent to which mutation rates evolve
faster over shorter timescales than over longer timescales (Emerson 2007; Ho et al. 2005)
lend uncertainty to the appropriateness of our use of a phylogenetically derived rate. Ho et
al. (2005) argue that the relationship between molecular rates and the time period over
which they are measured follows an exponential decline, but Emerson (2007) has
criticized the reliability of some of the data that underlies this relationship and suggests
that this relationship is weak or nonexistent. As recommended by Ho & Larson (2006),
we have implemented rates calculated using a relaxed clock framework that permits rates
to vary among branches, allowing us to avoid some of the problems associated with using
a strict molecular clock approach. Notably, most of the debate
over time dependence of rates has centered on mtDNA rather than nuclear genetic
markers. It has been suggested that the influence of purifying selection on time dependence in rates may be substantial for mtDNA markers (Ho et al. 2006), but the
effect of purifying selection and rate dependence on nuclear markers has yet to be
explored. If time dependence of rates is a factor for our nuclear loci, then our rate may be
too slow and a faster rate would reduce our estimate of long-term population size.
In an ideal population with random mating, an equal sex ratio, non-overlapping
generations, and random variation in reproductive success, Nmature is equal to Ne.
Because most populations, including minke whales, likely deviate from ideal conditions,
we approximate Nmature/Ne as ~2 based on equation (1) in Nunney and Elam (1994).
Nunney (1991; 1993) has shown that a number of factors lead Nmature/Ne to be close to 2
in populations with overlapping generations. Comparative studies based upon ecological
data suggest that for low-fecundity mammals, this ratio ranges from 2 to 10 (Frankham
1995). However, because minke whales generally have no more than one calf a year and
females are not known to concentrate on their breeding grounds, it is likely that the ratio
for minke whales lies at the lower end of the distribution for mammals in general. A
more refined estimate of this ratio could be attained with direct measurements of lifetime
variation in reproductive success; higher variation in reproductive success would lead to
an Nmature/Ne >2 and a larger estimate of the long-term population size.
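The direction of both sensitivities discussed in this section follows from the relation between θ and long-term abundance. Assuming the standard diploid relation θ = 4Neμ, with the per-generation rate written here as an annual substitution rate μ_yr multiplied by generation time g (notation introduced only for illustration):

$$
N_{\mathrm{mature}} = \frac{N_{\mathrm{mature}}}{N_e}\, N_e = \frac{N_{\mathrm{mature}}}{N_e} \cdot \frac{\theta}{4\,\mu_{\mathrm{yr}}\, g}
$$

With θ and g held fixed, a substitution rate k times faster divides the estimate by k, whereas raising Nmature/Ne from 2 to R multiplies it by R/2; for example, a ratio of 10, the upper end of the range reported by Frankham (1995), would give five times the estimate obtained with a ratio of 2.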
Literature Cited
Alter SE, Palumbi SR (2007) Could genetic diversity in eastern North Pacific gray
whales reflect global historic abundance? Proc. Natl. Acad. Sci. USA 104, E3-4.
Alter SE, Rynes E, Palumbi SR (2007) DNA evidence for historic population size and
past ecosystem impacts of gray whales. Proc. Natl. Acad. Sci. USA 104, 15162-15167.
Atkinson QD, Gray RD, Drummond AJ (2008) mtDNA variation predicts population size
in humans and reveals a major Southern Asian chapter in human prehistory. Mol.
Biol. Evol. 25, 468-474.
Emerson BC (2007) Alarm bells for the molecular clock? No support for Ho et al.'s
model of time-dependent molecular rate estimates. Syst. Biol. 56.
Frankham R (1995) Effective population size/adult population size ratios in wildlife: a
review. Genet. Res. 66, 96-107.
Ho SYW, Larson G (2006) Molecular clocks: when times are a-changin'. Trends
Genet. 22, 79-83.
Ho SYW, Phillips MJ, Cooper A, Drummond AJ (2005) Time dependency of molecular rate
estimates and systematic overestimation of recent divergence times. Mol. Biol.
Evol. 22, 1561-1568.
Jackson JA, Baker CS, Vant M, et al. (in press) Big and Slow: Phylogenetic estimates of
molecular evolution in baleen whales (Suborder Mysticeti). Mol. Biol. Evol.
Martin AP, Palumbi SR (1993) Body size, metabolic rate, generation time and the
molecular clock. Proc. Natl. Acad. Sci. USA 90, 4087-4091.
Nabholz B, Glemin S, Galtier N (2008) Strong variations of mitochondrial mutation rate
across mammals: the longevity hypothesis. Mol. Biol. Evol. 25, 120-130.
Nunney L (1991) The influence of age structure and fecundity on effective population
size. Proc. Roy. Soc. Ser. B: Biol. Sci. 246, 71-76.
Nunney L (1993) The influence of mating system and overlapping generations on
effective population size. Evolution 47, 1329-1341.
Nunney L, Elam DR (1994) Estimating the effective population size of conserved
populations. Conserv. Biol. 8, 175-184.
Pastene LA, Goto M, Itoh S, Numachi K (1996) Spatial and temporal patterns of
mitochondrial DNA variation in minke whales from Antarctic areas IV and V.
Rep. Int. Whal. Comm. 46, 305-314.
Pastene LA, Goto M, Kanda N, et al. (2007) Radiation and speciation of pelagic
organisms during periods of global warming: the case of the common minke
whale, Balaenoptera acutorostrata. Mol. Ecol. 16, 1481-1495.
Rice DW (1998) Marine mammals of the world. The Society for Marine Mammalogy.
SI Figure 1. Comparison of variation in individual locus mutation rates with the gamma
distribution estimated by LAMARC. Bars represent a histogram of mean mutation rate
obtained by bootstrap re-sampling over individual locus mutation rates from Alter et al.
(2007) and Jackson et al. (in press). The blue line is a histogram of 50,000 means of 11
random variables simulated from the gamma distribution with the shape parameter
estimated in LAMARC.
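The simulated curve in SI Figure 1 can be reproduced in outline with the Python sketch below. It assumes the gamma distribution is parameterized with shape α and scale 1/α so that relative rates average 1 (the exact parameterization used by LAMARC is not stated here), and the shape value of 2.0 is hypothetical rather than the LAMARC estimate. Under these assumptions the mean of 11 draws has expectation 1 and variance 1/(11α).

```python
import random
import statistics


def simulate_rate_means(shape, n_loci=11, n_reps=50000, seed=1):
    """Means of n_loci relative rates drawn from Gamma(shape, scale=1/shape).

    Each draw has mean 1, so every replicate mean has expectation 1 and
    variance 1 / (shape * n_loci).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gammavariate(shape, 1.0 / shape) for _ in range(n_loci))
        for _ in range(n_reps)
    ]


means = simulate_rate_means(shape=2.0)  # HYPOTHETICAL shape, not LAMARC's estimate
print(statistics.fmean(means), statistics.variance(means))
```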
SI Figure 2. The figure illustrates considerable variation in generation length across
years and areas in the Antarctic minke whale. Generation length was estimated as the
average age of sexually mature individuals from commercial and JARPA catch-at-age
matrices (see Table 1 in Butterworth et al. 1999). Blue dots indicate catches in Area IV
and red dots indicate catches in Area V (Fig 1).
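Generation length, as described in this caption, is a weighted average age. The sketch below assumes a single catch-at-age vector and a hypothetical maturity-at-age schedule; the cited procedure (Butterworth et al. 1999) may instead classify maturity animal by animal, and the function name is illustrative.

```python
def generation_length(catch_at_age, mature_fraction_at_age):
    """Average age of sexually mature animals in one catch-at-age vector.

    catch_at_age[a]: number of animals of age a (years) in the catch.
    mature_fraction_at_age[a]: HYPOTHETICAL proportion of age-a animals that are mature.
    """
    mature = [c * p for c, p in zip(catch_at_age, mature_fraction_at_age)]
    total = sum(mature)
    if total == 0:
        raise ValueError("no sexually mature animals in the catch")
    # Weight each age by the number of mature animals caught at that age.
    return sum(age * n for age, n in enumerate(mature)) / total


# HYPOTHETICAL catch (ages 0-9) and knife-edge maturity at age 7.
catch = [0, 0, 0, 5, 12, 20, 25, 22, 18, 10]
maturity = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(generation_length(catch, maturity))
```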
SI Figure 3. Convergence of LAMARC-estimated θ at 11 nuclear introns using 10
different realizations from PHASE’s posterior distribution across three separate
LAMARC runs. Each curve represents the estimated posterior distribution of θ at each
locus. Colors distinguish the three LAMARC runs, and the 10 curves of each color
represent alternative PHASE realizations.