Document 11743579

10/2/14

The Neutral Theory of Molecular Evolution
- Kimura (1968)
  - initially viewed as a challenge to Darwinian evolution
  - e.g., King & Jukes 1969 Science 164:788-798.
- many genetic polymorphisms have no effect on fitness and are therefore selectively neutral
- neutral polymorphisms are maintained by the combined effects of mutation and drift
  - mutations introduce new alleles as others are lost through drift

Origins of the “Selectionist-Neutralist” Debate
- the only “mutations” early biologists saw were ones that had phenotypic effects
- 1953 - structure of DNA (Watson & Crick / Franklin)
- 1960s-70s - protein electrophoresis
  - revealed allelic diversity for many genes
- 1968 - Motoo Kimura - the neutral theory
  - motivated by allozyme (amino acid) variation
- discovery of “junk DNA”
  - 98.5% of the human genome is non-coding
- DNA sequencing has revealed substantial silent and noncoding variation, suggesting that much genetic variation is selectively neutral (or nearly so!)

Infinite Alleles/Sites Model
- what is the expected level of genetic diversity (heterozygosity) given mutation and drift in a finite population?
- suppose a gene is 900 base pairs long, coding for 300 amino acids
  - there are $4^{900} \approx 10^{542}$ possible sequences (sorta…)
- thus, we can reasonably assume that each new mutation generates a unique allele…
- it follows that alleles with the same sequence are identical by descent (IBD)
- autozygous - a genotype with two alleles that are identical by descent
- allozygous - a genotype with alleles that are not identical by descent (is this possible?)
  - arbitrarily declare all alleles unique at t = 0
- autozygous = homozygous under the infinite alleles model
  - thus, the level of heterozygosity can be predicted from the expected level of autozygosity

Infinite Alleles/Sites Model
- $F_t$ = probability that two randomly chosen alleles are IBD
  - same as autozygosity if we randomly choose alleles to form genotypes

\[ F_t = \left(\frac{1}{2N}\right)(1-\mu)^2 + \left(1-\frac{1}{2N}\right)(1-\mu)^2 F_{t-1} \]

  - in this model, mutations generate new alleles and “erase” IBD

Infinite Alleles Model
- in a random-breeding population of constant size, an equilibrium is reached where the increase in autozygosity (~IBD) due to loss of alleles by drift is exactly countered by the increase in heterozygosity produced by new mutations
- setting $F_t = F_{t-1}$ and solving yields (to a close approximation when µ is small):

\[ \hat{F} = \frac{1}{1+4N\mu} = \frac{1}{1+4N_e\mu} = \frac{1}{1+\theta} \]

- given the assumption of infinite alleles, any genotype that is not autozygous is heterozygous, so

\[ \hat{H} = 1-\hat{F} = \frac{4N_e\mu}{1+4N_e\mu} = \frac{\theta}{1+\theta} \]

Neutral Expectations for Genetic Diversity

\[ \hat{F} = \frac{1}{1+4N\mu} = \frac{1}{1+\theta} \]

- although an ideal population is expected to reach an “equilibrium” value of F, the population is not really at equilibrium, but rather in a “dynamic steady state” because there is a continual turnover of alleles
  - the most common allele is periodically replaced by another, other alleles are lost, and new alleles are produced by mutation
- DNA sequencing revealed surprisingly high levels of neutral genetic variation

DNA Sequence-based Measures of Genetic Variation
- S = number of segregating sites
- Π = average number of pairwise differences between sequences
- Π is analogous to heterozygosity
- can derive theoretical expectations for both measures for an idealized, random-breeding population (and also assuming an “infinite sites” model)…

Segregating sites
- expected number of segregating sites:

\[ E(S) = \theta\left(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\ldots+\frac{1}{k-1}\right) \]

- where θ = 4Nµ and k = the number of sequences in the sample
- µ (“mu”) is the per locus mutation rate = mutation rate per site per generation × length of sequence

Coalescent theory often provides “easy” derivations of classical theory
- e.g., number of segregating sites in a sample
  - is a function of the total length (in generations) of the coalescent tree, E(T), times the mutation rate per locus per generation

\[ E(T) = E\left[\sum_{i=2}^{k} i T_i\right] = \sum_{i=2}^{k} i E(T_i) = \sum_{i=2}^{k} i \frac{4N}{i(i-1)} = 4N\sum_{i=1}^{k-1}\frac{1}{i} \]

\[ E(S) = \mu E(T) = 4N\mu\sum_{i=1}^{k-1}\frac{1}{i} = \theta\sum_{i=1}^{k-1}\frac{1}{i} \]

- the same derivation for a sample of n sequences, with time measured in units of 2N generations (so mutations arise at rate θ/2 per unit of branch length):

\[ E[\text{total tree length}] = \sum_{k=2}^{n} k E[t_k] = \sum_{k=2}^{n}\frac{2k}{k(k-1)} = 2\sum_{k=2}^{n}\frac{1}{k-1} = 2\sum_{k=1}^{n-1}\frac{1}{k} \]

\[ E(S) = \frac{\theta}{2}\left(2\sum_{k=1}^{n-1}\frac{1}{k}\right) = \theta\sum_{k=1}^{n-1}\frac{1}{k} \]

Average number of pairwise differences

\[ \Pi = \frac{\text{total number of nucleotide mismatches}}{\text{total number of pairwise comparisons}} \]

- in an idealized population, the expected value of Π is θ:

\[ E(\Pi) = \theta = 4N\mu \]

- θ can also be estimated from S by dividing the observed S by the harmonic sum:

\[ \hat{\theta}_S = \frac{S}{1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\ldots+\frac{1}{k-1}} \]

- worked example with k = 5 sequences and S = 16 segregating sites:

\[ \hat{\theta}_S = \frac{16}{1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}} = 7.68 \]

- for the same data, each term below is (pairwise mismatches contributed by a site) × (number of such sites), divided by the 10 pairwise comparisons among 5 sequences:

\[ \hat{\theta}_\Pi = \Pi = \frac{(6\times 6)+(4\times 9)+(7\times 1)+(0\times 484)}{10} = 7.90 \]

Key point!
- differences between the θ estimates based on the number of segregating sites and on the average number of pairwise differences lead to the inference that the gene(s) or the population departs in one or more ways from the ideal “null model” (i.e., constant population size, no selection, etc.)
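The two θ estimates in the worked example (S = 16 segregating sites among 5 sequences) can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and the per-site allele counts are assumptions chosen only because they reproduce the per-site mismatch totals 6, 4, 7 and 0 that appear in the example.

```python
from itertools import combinations


def watterson_theta(S, n):
    """Estimate theta from S segregating sites in a sample of n sequences."""
    a_n = sum(1 / i for i in range(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return S / a_n


def site_mismatches(allele_counts):
    """Pairwise mismatches contributed by one site, given its allele counts."""
    return sum(a * b for a, b in combinations(allele_counts, 2))


def pairwise_theta(site_patterns, n):
    """Average pairwise differences: total mismatches / number of comparisons."""
    total = sum(site_mismatches(counts) * n_sites
                for counts, n_sites in site_patterns)
    return total / (n * (n - 1) / 2)


# Assumed allele counts per site type (the slides give only the mismatch totals):
# (3, 2) -> 6 mismatches, (4, 1) -> 4, (3, 1, 1) -> 7, (5,) -> 0 (invariant site).
patterns = [((3, 2), 6), ((4, 1), 9), ((3, 1, 1), 1), ((5,), 484)]

theta_S = watterson_theta(16, 5)
theta_pi = pairwise_theta(patterns, 5)
print(round(theta_S, 2), round(theta_pi, 2))  # 7.68 7.9
```

Here the two estimates nearly agree (7.68 versus 7.90), which is what the neutral null model predicts; a large gap between them is the kind of departure the key point refers to.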
mtDNA haplotypes for big brown bats (Eptesicus fuscus) east of the Rockies
- θ (Π) = 5.35
- θ (S) = 10.42

Data versus histories
- generally, the coalescent history of a sample is unknowable
- but it can be crudely approximated by building a gene tree based on DNA sequence data
- genealogical histories estimated with sequence data typically collapse to poorly resolved “networks”

[Figure: “hanging” mutations on the tree… Rosenberg & Nordborg 2002 Nat Rev Gen]

[Figure: Ornithine decarboxylase intron 6 sequences for mallards, black ducks, and mottled ducks. Harrigan et al. 2008 Mol. Ecol. Resources]

The Ewens Distribution
- beyond F, there is additional “information” available in the number of alleles present and the distribution of allele frequencies
  - “allelic configuration”
  - or “allele-frequency spectrum”
- Ewens (1972) - the expected number of alleles k in a sample of size n depends on θ

\[ E(k) = 1+\frac{\theta}{\theta+1}+\frac{\theta}{\theta+2}+\ldots+\frac{\theta}{\theta+n-1} \]

[Figure: expected number of unique alleles, E(k), versus sample size n (0 to 20), with one curve for each value of θ = 4Nµ: 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32]

The Ewens Sampling Formula
- but there’s more than just k (number of alleles)
- the Ewens Distribution specifies the probability distribution on the set of all partitions of the integer n
  - a.k.a. the “Chinese Restaurant” problem
  - broad applicability outside of population genetics

Partitions of 8
- 8
- 7 + 1
- 6 + 2
- 6 + 1 + 1
- 5 + 3
- 5 + 2 + 1
- 5 + 1 + 1 + 1
- 4 + 4
- 4 + 3 + 1
- 4 + 2 + 2
- 4 + 2 + 1 + 1
- 4 + 1 + 1 + 1 + 1
- 3 + 3 + 2
- 3 + 3 + 1 + 1
- 3 + 2 + 2 + 1
- 3 + 2 + 1 + 1 + 1
- 3 + 1 + 1 + 1 + 1 + 1
- 2 + 2 + 2 + 2
- 2 + 2 + 2 + 1 + 1
- 2 + 2 + 1 + 1 + 1 + 1
- 2 + 1 + 1 + 1 + 1 + 1 + 1
- 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1

Ewens’ Sampling Formula (from Wikipedia!)
- Ewens’ result provided the basis for a formula (Karlin & McGregor, 1972) giving the probability of a given allele frequency configuration (note: this is just one formulation)…

\[ \Pr\{a_1,\ldots,a_n\} = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}\prod_{j=1}^{n}\frac{\theta^{a_j}}{j^{a_j}\,a_j!} \]

where $a_1,\ldots,a_n$ are counts of the number of alleles represented one, two, ..., n times in the sample. $a_1,\ldots,a_n$ are nonnegative integers that satisfy:

\[ a_1+2a_2+3a_3+\ldots+na_n = n \]

Karlin & McGregor 1972
- this equation works too…

\[ \Pr(\theta; n_1,n_2,\ldots,n_k) = \frac{r!}{n_1 n_2\cdots n_k}\cdot\frac{1}{\alpha_1!\,\alpha_2!\cdots\alpha_p!}\cdot\frac{\theta^k}{L_r(\theta)} \]

where $L_r(\theta) = \theta(\theta+1)(\theta+2)\cdots(\theta+r-1)$. For the two formulations to agree, r is the sample size, $n_i$ is the number of copies of the i-th of the k alleles, and the α's count how many alleles share each distinct copy number.

Ewens’ Sampling Formula
- key point: the Ewens distribution provides a basis for testing observed data against the neutral model
- and if the data fit neutral expectations, they can be used to estimate demographic and historical parameters

Site Frequency Spectrum
- applicable to DNA sequence data
- what is the distribution of frequencies for individual mutations (SNPs)?
- expected number of derived “singletons” = θ
- why?
  - length of “external branches” = 4N generations
  - (note: t = 2 in Nielsen & Slatkin)
- frequency distribution for derived mutations

\[ E[f_j] = \frac{1/j}{\sum_{k=1}^{n-1}\frac{1}{k}}, \quad \text{for } j = 1, 2, \ldots, n-1 \]

Neutral Expectations with…
- ...constant population size and mutation

nucleotide diversity (infinite sites):
\[ E(\Pi) = \theta = 4N\mu \]

# segregating sites (infinite sites):
\[ E(S) = \theta\left(1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\ldots+\frac{1}{k-1}\right) \]

heterozygosity (infinite alleles):
\[ \hat{H} = \frac{\theta}{1+\theta} \]

# unique alleles (infinite alleles):
\[ E(k) = 1+\frac{\theta}{\theta+1}+\frac{\theta}{\theta+2}+\ldots+\frac{\theta}{\theta+n-1} \]

allele frequency distribution (infinite alleles):
\[ \Pr\{a_1,\ldots,a_n\} = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}\prod_{j=1}^{n}\frac{\theta^{a_j}}{j^{a_j}\,a_j!} \]

site frequency distribution (infinite sites):
\[ E[f_j] = \frac{1/j}{\sum_{k=1}^{n-1}\frac{1}{k}}, \quad \text{for } j = 1, 2, \ldots, n-1 \]

The coalescent with population growth
- coalescent trees are expected to be sparse (few lineages) near the root for populations of constant size
- in a growing or shrinking population, the distribution of coalescence times differs from expectations for the ideal population
- expanding populations have more nodes closer to the root of the tree
  - takes longer for alleles to “find each other” in a growing population

[Figure: example coalescent trees for constant, growing, and declining populations]

So, what’s the point?
- coalescent modeling
  - simulate genealogies under a given set of population parameters
  - “hang” random mutations on those trees in equal number (or at the same rate) as in the observed data
  - evaluate whether the observed data could have been produced by a random coalescent process (the null hypothesis)
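The three coalescent-modeling steps above can be sketched in Python for the simplest null model (constant population size, no selection, infinite sites). This is an illustrative sketch under stated assumptions, not the procedure used for any dataset in these slides: the function names are hypothetical, time is measured in units of 2N generations so that mutations fall on branches at rate θ/2, and the inputs (n = 5 sequences, θ = 7.68, observed S = 16) are borrowed from the earlier worked example purely for illustration.

```python
import random


def total_tree_length(n, rng):
    """Total branch length of one random coalescent genealogy for n sequences.

    With k lineages present, the waiting time to the next coalescence is
    exponential with rate k(k-1)/2 (time in units of 2N generations), and
    that interval adds k * t_k to the total branch length.
    """
    length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2)
        length += k * t_k
    return length


def poisson(mean, rng):
    """Draw a Poisson count by counting rate-1 arrivals in the interval [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_S(n, theta, reps, seed=1):
    """Simulate the number of segregating sites under the neutral null model."""
    rng = random.Random(seed)
    # "Hang" mutations on each simulated tree at rate theta/2 per unit length.
    return [poisson(theta / 2 * total_tree_length(n, rng), rng)
            for _ in range(reps)]


def upper_tail(sims, observed):
    """Fraction of simulated values at least as large as the observed value."""
    return sum(s >= observed for s in sims) / len(sims)


sims = simulate_S(n=5, theta=7.68, reps=20000)
mean_S = sum(sims) / len(sims)
print(round(mean_S, 2))      # should be near theta * (1 + 1/2 + 1/3 + 1/4) = 16
print(upper_tail(sims, 16))  # how often S >= 16 arises under the null model
```

Only S is tracked here because it depends on the total tree length alone; comparing other summaries, such as Π or the site frequency spectrum, would require keeping the full tree topology, and growth or decline would require rescaling the coalescence rates through time.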
