Chapter 2: Bayesian hierarchical models in geographical genetics

Manda Sayler

• Geographical genetics is the field of population genetics that describes the distribution of genetic variation within and among populations and seeks to understand the processes that produce those patterns.
• Statistical sampling uncertainty arises from the process of constructing allele frequency estimates from population samples.
• Genetic sampling uncertainty arises from the underlying stochastic evolutionary process that gave rise to the populations we sampled.
  – Note: increasing the number of alleles sampled within each population reduces statistical uncertainty, but it cannot reduce the magnitude of genetic uncertainty.
• The Weir and Cockerham approach is the most widely used approach for analyzing genetic diversity in hierarchically structured populations.
• The Bayesian approach provides a model-based approach to inference that is enormously powerful and flexible.
• Hierarchical Bayesian models provide a natural approach to inference in geographical genetics.

Weir and Cockerham Approach
• To illustrate the formalism, consider a set of K populations segregating for two alleles, A1 and A2, at a single locus.
• $p_k$: frequency of allele A1 in the kth population, k = 1, …, K
• $x_{ij,k}$: frequency of genotype $A_iA_j$ in the kth population
• Averaging genotype frequencies across populations gives

$$\bar{x}_{11} = \bar{p}^2 + \sigma_p^2, \qquad \bar{x}_{12} = 2\bar{p}(1 - \bar{p}) - 2\sigma_p^2, \qquad \bar{x}_{22} = (1 - \bar{p})^2 + \sigma_p^2$$

where

$$\bar{x}_{ij} = \frac{1}{K}\sum_{k=1}^{K} x_{ij,k}, \qquad \bar{p} = \frac{1}{K}\sum_{k=1}^{K} p_k, \qquad \sigma_p^2 = \frac{1}{K}\sum_{k=1}^{K} (p_k - \bar{p})^2$$

• Scaling the among-population variance in allele frequency by $\bar{p}(1 - \bar{p})$ gives

$$F_{st} = \frac{\sigma_p^2}{\bar{p}(1 - \bar{p})}$$

• $F_{st}$ can be interpreted as the fraction of genetic diversity due to differences in allele frequencies among populations.

Hierarchical Bayesian Models
• A hierarchical Bayesian model uses the full power of the data to estimate all parameters simultaneously while accounting for both statistical and genetic uncertainty.
• To account for statistical uncertainty, assume that alleles are sampled independently within populations.
• Also assume that samples are drawn independently across loci and populations.
• The likelihood of the sample from a single population is then binomial; across I loci and K populations,

$$P(\{l_{ik}\}, \{n_{ik}\} \mid \{p_{ik}\}, \{\pi_i\}, \theta) \propto \prod_{i=1}^{I}\prod_{k=1}^{K} p_{ik}^{l_{ik}}(1 - p_{ik})^{n_{ik} - l_{ik}}$$

where $n_{ik}$ is the number of alleles sampled at locus i in population k and $l_{ik}$ is the number of those alleles that are A1.
• To account for genetic uncertainty, we must assume a parametric form for the among-population distribution of allele frequencies.
• It is natural to assume that population allele frequencies follow a Beta distribution,

$$P(p_{ik} \mid \pi_i, \theta) = \mathrm{Beta}\left(\frac{1-\theta}{\theta}\,\pi_i,\; \frac{1-\theta}{\theta}\,(1-\pi_i)\right)$$

where $E(p_{ik}) = \pi_i$ and $\mathrm{Var}(p_{ik}) = \theta\pi_i(1-\pi_i)$.
• Thus, θ is equivalent to $F_{st}$: the allele frequency variance divided by $\pi_i(1-\pi_i)$ is exactly θ, which is the form of the $F_{st}$ ratio.
• The posterior distribution for the parameters is

$$P(\{p_{ik}\}, \{\pi_i\}, \theta \mid \{l_{ik}\}, \{n_{ik}\}) \propto \left[\prod_{i=1}^{I}\left(\prod_{k=1}^{K} p_{ik}^{l_{ik}}(1-p_{ik})^{n_{ik}-l_{ik}}\,P(p_{ik} \mid \pi_i, \theta)\right) P(\pi_i)\right] P(\theta)$$

where $P(\pi_i)$ and $P(\theta)$ are the prior distributions for $\pi_i$ and θ, respectively.

A fully hierarchical model
• To estimate the correlation of allele frequencies within loci, we need to add another level to the hierarchy that describes the distribution of mean allele frequencies across loci, $P(\pi_i \mid \pi, \theta_y)$.
• Regard the loci in the sample as a sample from a larger universe of loci from which we might have sampled.
• Regard the populations in our sample as a sample from a larger universe of populations from which we might have sampled.
• The likelihood is unchanged. The posterior becomes

$$P(\theta_x, \theta_y, \pi, \{\pi_i\}, \{p_{ik}\} \mid \{l_{ik}\}, \{n_{ik}\}) \propto \left[\prod_{i=1}^{I}\left(\prod_{k=1}^{K} p_{ik}^{l_{ik}}(1-p_{ik})^{n_{ik}-l_{ik}}\,P(p_{ik} \mid \pi_i, \theta_x)\right) P(\pi_i \mid \pi, \theta_y)\right] P(\theta_x)\,P(\theta_y)\,P(\pi)$$

where $P(p_{ik} \mid \pi_i, \theta_x)$ is the Beta distribution above with θ replaced by $\theta_x$, and $P(\pi_i \mid \pi, \theta_y)$ is the analogous Beta distribution for $\theta_y$, with mean π and variance $\theta_y\pi(1-\pi)$.
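The Beta parameterization and the claim that θ is equivalent to F_st can be checked numerically. The sketch below (Python with NumPy; the function names `beta_params`, `fst`, and `mean_genotype_freqs` are hypothetical, chosen for illustration) draws true allele frequencies for many populations from the Beta distribution with mean π and parameter θ, then applies the Weir and Cockerham formulas. It assumes Hardy-Weinberg proportions within each population and uses the true frequencies directly, so only genetic sampling uncertainty is present, not statistical sampling uncertainty.

```python
import numpy as np


def beta_params(pi, theta):
    """Shape parameters of Beta(((1 - theta)/theta) * pi, ((1 - theta)/theta) * (1 - pi))."""
    s = (1.0 - theta) / theta
    return s * pi, s * (1.0 - pi)


def fst(p):
    """F_st = sigma_p^2 / (pbar * (1 - pbar)), with sigma_p^2 averaged over the K populations."""
    pbar = p.mean()
    var_p = np.mean((p - pbar) ** 2)
    return var_p / (pbar * (1.0 - pbar))


def mean_genotype_freqs(p):
    """Genotype frequencies averaged over populations, ASSUMING Hardy-Weinberg within each one."""
    x11 = np.mean(p ** 2)
    x12 = np.mean(2.0 * p * (1.0 - p))
    x22 = np.mean((1.0 - p) ** 2)
    return x11, x12, x22


rng = np.random.default_rng(42)
pi, theta = 0.3, 0.1
a, b = beta_params(pi, theta)
p = rng.beta(a, b, size=100_000)  # true (unsampled) A1 frequencies in K = 100,000 populations

pbar, var_p = p.mean(), np.mean((p - p.mean()) ** 2)
x11, x12, x22 = mean_genotype_freqs(p)
print("pbar:", pbar, "sigma_p^2:", var_p, "F_st:", fst(p))
```

Because the frequencies carry no sampling error and K is large, F_st computed from them should land close to θ, and the averaged genotype frequencies match the Weir and Cockerham decomposition. With real data the $p_{ik}$ are unknown, which is the gap the hierarchical model fills.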
Developing an MCMC sampler
• The process begins by picking an initial value for p, called $p_0$; p is then updated repeatedly until we have a large sample of values $p_t$, using either
  – the Metropolis-Hastings algorithm (Figure 2.2), or
  – the slice sampling algorithm (Figure 2.3).
• With a large enough sample, any property of the posterior can be estimated to an arbitrary degree of accuracy.
• To ensure that the Markov chain has converged, values from an initial burn-in period are discarded.
• Values retained from the subsequent sampling period are treated as a sample from the full posterior distribution, and summary statistics are calculated directly from this sample.
• To reduce autocorrelation among values in the sample, it is sometimes useful to thin the sample, retaining only every nth value.
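A minimal Metropolis-Hastings sketch for a single locus (I = 1), written in Python with NumPy. It differs from the sampler described above in one respect: rather than updating each $p_k$, it integrates the $p_k$ out analytically, so each population contributes a beta-binomial term and the chain only explores π and θ. The uniform priors on π and θ, the Gaussian random-walk step size, the iteration counts, and the names `log_posterior` and `metropolis` are illustrative assumptions, not the chapter's own implementation. Burn-in and thinning follow the bullets above.

```python
import math
import numpy as np


def log_beta_fn(x, y):
    """log B(x, y) computed from log-gamma functions."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_posterior(pi, theta, l, n):
    """Unnormalized log posterior for one locus (ASSUMED uniform priors on pi and theta).

    p_k ~ Beta(s * pi, s * (1 - pi)) with s = (1 - theta) / theta is integrated out,
    so population k contributes a beta-binomial term (binomial coefficient dropped).
    """
    if not (0.0 < pi < 1.0 and 0.0 < theta < 1.0):
        return -math.inf  # outside the support of the priors
    s = (1.0 - theta) / theta
    a, b = s * pi, s * (1.0 - pi)
    return sum(log_beta_fn(lk + a, nk - lk + b) - log_beta_fn(a, b)
               for lk, nk in zip(l, n))


def metropolis(l, n, n_iter=20_000, burn_in=5_000, thin=10, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings on (pi, theta); returns thinned post-burn-in draws."""
    rng = np.random.default_rng(seed)
    pi, theta = 0.5, 0.5  # initial values
    current = log_posterior(pi, theta, l, n)
    kept = []
    for t in range(n_iter):
        # Symmetric proposal, so the Hastings correction cancels.
        pi_new = pi + step * rng.normal()
        theta_new = theta + step * rng.normal()
        proposed = log_posterior(pi_new, theta_new, l, n)
        # Accept with probability min(1, posterior ratio); exp(-inf) = 0 rejects out-of-range moves.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            pi, theta, current = pi_new, theta_new, proposed
        # Discard the burn-in period, then keep every `thin`-th value.
        if t >= burn_in and (t - burn_in) % thin == 0:
            kept.append((pi, theta))
    return np.array(kept)


# Usage: simulated data for K = 20 populations, 50 alleles sampled from each.
sim = np.random.default_rng(7)
s_true = (1.0 - 0.1) / 0.1  # true theta = 0.1
p_true = sim.beta(s_true * 0.3, s_true * 0.7, size=20)  # true pi = 0.3
n = [50] * 20
l = [int(x) for x in sim.binomial(50, p_true)]
draws = metropolis(l, n)
print("posterior means (pi, theta):", draws.mean(axis=0))
```

Integrating out the $p_k$ keeps the sketch short and well mixed; a sampler that follows the chapter more literally would also propose updates to each $p_k$ and evaluate the binomial likelihood and Beta prior directly.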
