Implementation of gamma model within a Bayesian framework

We describe a new approach for incorporating mutation rate variation among loci when using coalescent analysis to estimate genetic diversity. While Alter et al. (2007) used locus-specific mutation rates in their calculation of an overall Θ, such an approach requires that phylogenetic calibrations be available for each locus for which intraspecific data are gathered. Here, we model individual locus mutation rates as being drawn from a gamma distribution, which has the advantage of being applicable to genealogies of any set of loci. A comparison of the variation in individual locus mutation rates with the gamma distribution suggests that the gamma distribution provides a good approximation of the data (SI Figure 1).

Because the current implementation of LAMARC allows the gamma model only within the likelihood framework, we developed our own extension of LAMARC (called GUFBUL, for Gamma Updating For Bayesians Using LAMARC) that allows the gamma model to be applied in a Bayesian framework. This program uses the estimated posterior densities of Θ for each locus as input to a Markov chain Monte Carlo algorithm that provides samples from the posterior distributions of the overall Θ, the locus-specific relative rate parameters, and the scale parameter of the gamma distribution. GUFBUL will be distributed as a small subcomponent of the LAMARC package (personal communication, Mary Kuhner, University of Washington, Dept. of Genome Sciences) along with full details of its implementation. Implementation details are also available directly from Eric Anderson at eric.anderson@noaa.gov. Our Bayesian framework was considerably more efficient than the likelihood framework, reducing computation time from several weeks to several days.
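The idea of drawing locus-specific relative mutation rates from a gamma distribution can be illustrated with a minimal Python sketch. This is not GUFBUL and does not reproduce its MCMC algorithm. It assumes that the gamma distribution is constrained to have mean 1, so that the scale equals 1/shape, and that each locus-specific Θ is the overall Θ multiplied by that locus's relative rate. The shape value, the overall Θ, and the function names are hypothetical and chosen only for illustration.

```python
import random
import statistics


def draw_relative_rates(n_loci, shape, rng):
    """Draw one relative mutation rate per locus from a gamma distribution.

    Assumption (not stated in the text): the distribution has mean 1,
    so the scale parameter is 1 / shape.
    """
    scale = 1.0 / shape
    return [rng.gammavariate(shape, scale) for _ in range(n_loci)]


def locus_thetas(overall_theta, relative_rates):
    """Scale the overall theta by each locus's relative rate (assumed model)."""
    return [overall_theta * rate for rate in relative_rates]


rng = random.Random(42)   # fixed seed so the sketch is reproducible
shape = 2.0               # hypothetical shape parameter
overall_theta = 0.005     # hypothetical overall theta

# Eleven loci, matching the number of nuclear introns in the study.
rates = draw_relative_rates(11, shape, rng)
thetas = locus_thetas(overall_theta, rates)
for locus, (rate, theta) in enumerate(zip(rates, thetas), start=1):
    print(f"locus {locus:2d}: relative rate {rate:.3f}, theta {theta:.5f}")

# With many draws the average relative rate approaches 1 under this assumption.
many = draw_relative_rates(50_000, shape, rng)
print("mean of 50,000 relative rates:", round(statistics.fmean(many), 3))
```

Under these assumptions, a small shape parameter corresponds to strong rate variation among loci, while a large shape parameter concentrates all relative rates near 1.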
The increase in efficiency offered by the Bayesian framework makes implementation of the gamma model feasible for large data sets that require long MCMC runs in LAMARC, such as those generated when accounting for uncertainty in gametic phase by sampling across a range of allele combinations. Our procedure for implementing the gamma model for mutation rate variation within a Bayesian framework can be applied more generally to any taxon for which an average multi-locus mutation rate is available.

Sources of uncertainty

While we have attempted to reflect the overall variance in long-term abundance by bootstrap re-sampling across the variation in each parameter used in our estimate, some uncertainties remain that cannot be captured by our confidence intervals. Some of these, such as the potential influence of population structure, were tested through coalescent simulation, while others, such as issues related to the estimation of mutation rates and the ratio of Nmature/Ne, are more general problems within evolutionary genetics that remain unresolved. In the latter case, we have chosen values that most closely reflect the current state of understanding in the field, while acknowledging the role that these uncertainties play in our final estimate of long-term population size.

Accounting for the potential influence of population substructure and/or interspecific gene flow on estimates of genetic diversity is one of the primary challenges to successfully using genetic data to estimate long-term population size (Alter & Palumbi 2007; Atkinson et al. 2008). For example, a hybridization event between common minke and Antarctic minke whales may increase our estimate of Ne.
Genetic and morphological research indicates that Antarctic minke whales are reproductively isolated from all other species of minke whales (reviewed in Rice 1998), and recent genetic data indicate that they are likely to have diverged from the common minke whale at least 4 to 11 million years ago, depending upon the mutation rate and generation time employed (Pastene et al. 2007). This time frame is approaching 4Ne generations, the average time at which nuclear loci are expected to become reciprocally monophyletic; thus, our genetic estimate is unlikely to be an artifact of gene flow from other species. Sequences of common minke whales at the same loci as those employed in this study may help assess the possibility that a past hybridization event influenced our estimate of Ne.

Population substructure can increase coalescence time between genes and inflate estimates of genetic diversity. While preliminary reports have suggested that there may be population substructure within Antarctic minke whales (st = 0.0090, p = 0.0025) (Pastene et al. 1996), these distinctions are weak and our nuclear analysis did not detect any hidden population substructure in our data. Nevertheless, we assessed the potential impact of population structure on our genetic estimate by simulating seven populations in an Antarctic ring, connected by varying levels of migration. The results suggest that population structure would not significantly increase Θ unless migration between subpopulations was so low that the expected st > 0.10 (Figure 2). Furthermore, Θ differed little regardless of whether samples were drawn from a single sub-population or drawn evenly from across all sub-populations, suggesting that even though our samples are from a limited geographic area, our Θ reflects ocean-wide genetic diversity.
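The ring of seven populations used in the simulations can be represented as a migration matrix. The following Python sketch shows one way to build such a matrix. The text does not specify the migration pattern, so the symmetric nearest-neighbor (circular stepping-stone) layout, the migration rate, and the function name are assumptions made for illustration only. The sketch does not run a coalescent simulation.

```python
def ring_migration_matrix(n_demes, m):
    """Build a circular stepping-stone migration matrix.

    Entry [i][j] is the migration rate between deme i and deme j.
    Assumption: each deme exchanges migrants only with its two
    neighbors on the ring, at the same symmetric rate m.
    """
    matrix = [[0.0] * n_demes for _ in range(n_demes)]
    for i in range(n_demes):
        matrix[i][(i - 1) % n_demes] = m  # neighbor on one side
        matrix[i][(i + 1) % n_demes] = m  # neighbor on the other side
    return matrix


# Seven demes arranged in a ring; the rate is a hypothetical value.
ring = ring_migration_matrix(7, 0.001)
for row in ring:
    print(" ".join(f"{rate:.3f}" for rate in row))
```

Varying the hypothetical rate m across runs corresponds to the different levels of migration explored in the simulations; because the ring wraps around, the first and last demes are also neighbors.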
Here we estimate a substitution rate for the Antarctic minke whale using a Bayesian analysis of the baleen whale phylogeny and fossil history (Jackson et al. in press). Several studies have suggested that neutral rates of substitution in mysticete whales are slow relative to other mammals (Jackson et al. in press; Martin & Palumbi 1993; Nabholz et al. 2008), and therefore the application of a baleen whale-specific mutation rate is important. However, recent debates over the extent to which mutation rates appear faster over shorter timescales than over longer timescales (Emerson 2007; Ho et al. 2005) lend uncertainty to the appropriateness of our use of a phylogenetically derived rate. Ho et al. (2005) argue that the relationship between molecular rates and the time period over which they are measured follows an exponential decline, but Emerson (2007) has criticized the reliability of some of the data that underlie this relationship and suggests that the relationship is weak or nonexistent. As recommended by Ho & Larson (2006), we have implemented rates calculated using a relaxed clock framework that permits rates to vary among branches, allowing us to avoid some of the problems associated with a strict molecular clock approach. It is interesting to note that the majority of the debate over time dependence of rates has centered on mtDNA rather than nuclear genetic markers. It has been suggested that the influence of purifying selection on time dependence in rates may be substantial for mtDNA markers (Ho et al. 2006), but the effect of purifying selection and rate dependence on nuclear markers has yet to be explored. If time dependence of rates is a factor for our nuclear loci, then our rate may be too slow, and a faster rate would reduce our estimate of long-term population size.

In an ideal population with random mating, an equal sex ratio, non-overlapping generations, and random variation in reproductive success, Nmature is equal to Ne.
Because most populations, including minke whales, likely deviate from ideal conditions, we approximate an Nmature/Ne of ~2 based on equation (1) in Nunney and Elam (1994). Nunney (1991; 1993) has shown that a number of factors lead Nmature/Ne to be close to 2 in populations with overlapping generations. Comparative studies based upon ecological data suggest that for low-fecundity mammals, this ratio ranges from 2 to 10 (Frankham 1995). However, because minke whales generally have no more than one calf a year and females are not known to concentrate on their breeding grounds, it is likely that the ratio for minke whales lies at the lower end of the distribution for mammals in general. A more refined estimate of this ratio could be attained with direct measurements of lifetime variation in reproductive success; higher variation in reproductive success would lead to an Nmature/Ne > 2 and a larger estimate of the long-term population size.

Literature Cited

Alter SE, Palumbi SR (2007) Could genetic diversity in eastern North Pacific gray whales reflect global historic abundance? Proc. Natl. Acad. Sci. USA 104, E3-4.
Alter SE, Rynes E, Palumbi SR (2007) DNA evidence for historic population size and past ecosystem impacts of gray whales. Proc. Natl. Acad. Sci. USA 104, 15162-15167.
Atkinson QD, Gray RD, Drummond AJ (2008) mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Mol. Biol. Evol. 25, 468-474.
Emerson BC (2007) Alarm bells for the molecular clock? No support for Ho et al.'s model of time-dependent molecular rate estimates. Syst. Biol. 56.
Frankham R (1995) Effective population size/adult population size ratios in wildlife: a review. Gen. Res. 66, 96-107.
Ho SYW, Larson G (2006) Molecular clocks: when times are a-changin'. Trends in Genetics 22, 79-83.
Ho SYW, Phillips MJ, Cooper A, Drummond AJ (2005) Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol. Biol. Evol. 22, 1561-1568.
Jackson JA, Baker CS, Vant M, et al. (in press) Big and Slow: Phylogenetic estimates of molecular evolution in baleen whales (Suborder Mysticeti). Mol. Biol. Evol.
Martin AP, Palumbi SR (1993) Body size, metabolic rate, generation time and the molecular clock. Proc. Natl. Acad. Sci. USA 90, 4087-4091.
Nabholz B, Glemin S, Galtier N (2008) Strong variations of mitochondrial mutation rate across mammals - the longevity hypothesis. Mol. Biol. Evol. 25, 120-130.
Nunney L (1991) The influence of age structure and fecundity on effective population size. Proc. Roy. Soc. Ser. B: Biol. Sci. 246, 71-76.
Nunney L (1993) The influence of mating system and overlapping generations on effective population size. Evolution 47, 1329-1341.
Nunney L, Elam DR (1994) Estimating the effective population size of conserved populations. Conserv. Biol. 8, 175-184.
Pastene LA, Goto M, Itoh S, Numachi K (1996) Spatial and temporal patterns of mitochondrial DNA variation in minke whales from Antarctic areas IV and V. Rep. Int. Whal. Comm. 46, 305-314.
Pastene LA, Goto M, Kanda N, et al. (2007) Radiation and speciation of pelagic organisms during periods of global warming: the case of the common minke whale, Balaenoptera acutorostrata. Mol. Ecol. 16, 1481-1495.
Rice DW (1998) Marine mammals of the world. The Society for Marine Mammalogy.

SI Figure 1. Comparison of variation in individual locus mutation rates with the gamma distribution estimated by LAMARC. Bars represent a histogram of mean mutation rates obtained by bootstrap re-sampling over individual locus mutation rates from Alter et al. (2007) and Jackson et al. (in press). The blue line is a histogram of 50,000 means of 11 random variables simulated from the gamma distribution with the shape parameter estimated in LAMARC.

SI Figure 2. The figure illustrates considerable variation in generation length across years and areas in the Antarctic minke whale.
Generation length was estimated as the average age of sexually mature individuals from commercial and JARPA catch-at-age matrices (see Table 1 in Butterworth et al. 1999). Blue dots indicate catches in Area IV and red dots indicate catches in Area V (Fig 1).

SI Figure 3. Convergence of LAMARC-estimated Θ at 11 nuclear introns using 10 different realizations from PHASE’s posterior distribution across three separate LAMARC runs. Each curve represents the estimated posterior distribution of Θ at a locus. Colors distinguish the different LAMARC runs, and the 10 curves of each color represent alternate PHASE realizations.
