
Lecture 25 : Tests of Neutrality

April 14, 2014

Last Time

• Human origins
  - Out of Africa hypothesis
  - Neanderthal and Denisovan genomes
  - Introgression into humans
• Signatures of selection

Today

• Sequence data and quantification of variation
  - Infinite sites model
  - Nucleotide diversity (π)
• Sequence-based tests of neutrality
  - Ewens-Watterson Test
  - Tajima's D
  - Hudson-Kreitman-Aguade
  - Synonymous versus Nonsynonymous substitutions
  - McDonald-Kreitman

The main power of neutral theory is that it provides a theoretical expectation for genetic variation in the absence of selection.

Equilibrium Heterozygosity under IAM

$$H_e = \frac{4N_e\mu}{4N_e\mu + 1} = \frac{\theta}{\theta + 1}, \qquad \theta = 4N_e\mu$$

• Frequencies of individual alleles are constantly changing
• Balance between loss and gain is maintained
• $4N_e\mu \gg 1$: mutation predominates, new mutants persist, H is high
• $4N_e\mu \ll 1$: drift dominates, new mutants are quickly eliminated, H is low
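To make the two regimes concrete, the following minimal sketch evaluates $H_e = \theta/(\theta+1)$ over a range of population sizes. The function name, the list of $N_e$ values, and the mutation rate used in the demo are illustrative assumptions, not values taken from a specific data set.

```python
def equilibrium_heterozygosity(n_e, mu):
    """Equilibrium heterozygosity under the infinite alleles model.

    n_e: effective population size; mu: mutation rate per generation.
    """
    theta = 4.0 * n_e * mu
    return theta / (theta + 1.0)


if __name__ == "__main__":
    mu = 1e-5  # illustrative mutation rate (assumption)
    for n_e in (100, 1_000, 10_000, 100_000, 1_000_000):
        theta = 4.0 * n_e * mu
        h_e = equilibrium_heterozygosity(n_e, mu)
        print(f"N_e={n_e:>9,}  theta={theta:8.3f}  H_e={h_e:.4f}")
```

When $4N_e\mu$ is far below 1 the printed $H_e$ is close to $\theta$ itself (drift dominates); when it is far above 1, $H_e$ approaches 1 (mutation predominates).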

Effects of Population Size on Expected Heterozygosity

Under Infinite Alleles Model ($\mu = 10^{-5}$)

• Rapid approach to equilibrium in small populations
• Higher heterozygosity with less drift
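Both points can be illustrated with a short iteration. The sketch below assumes the standard textbook recursion for the inbreeding coefficient under the infinite alleles model, $F_{t+1} = (1-\mu)^2\left[\frac{1}{2N_e} + \left(1 - \frac{1}{2N_e}\right)F_t\right]$ with $H_t = 1 - F_t$; this recursion is not written out on the slides, and the function names and the starting value $H_0 = 0$ are assumptions made for the illustration.

```python
def heterozygosity_trajectory(n_e, mu, generations, h0=0.0):
    """Return [H_0, H_1, ..., H_generations] under drift plus IAM mutation."""
    f = 1.0 - h0                      # probability two alleles are identical by descent
    no_mutation = (1.0 - mu) ** 2     # neither sampled allele mutated this generation
    trajectory = [1.0 - f]
    for _ in range(generations):
        f = no_mutation * (1.0 / (2 * n_e) + (1.0 - 1.0 / (2 * n_e)) * f)
        trajectory.append(1.0 - f)
    return trajectory


def equilibrium_heterozygosity(n_e, mu):
    """Approximate equilibrium value, theta / (theta + 1)."""
    theta = 4.0 * n_e * mu
    return theta / (theta + 1.0)


if __name__ == "__main__":
    mu = 1e-5
    for n_e in (100, 1_000, 10_000):
        h_final = heterozygosity_trajectory(n_e, mu, 5_000)[-1]
        h_eq = equilibrium_heterozygosity(n_e, mu)
        print(f"N_e={n_e:>6}  H after 5000 gen={h_final:.5f}  "
              f"H_e={h_eq:.5f}  fraction reached={h_final / h_eq:.2f}")
```

The small population reaches its (low) equilibrium within the simulated time span, while the large population is still climbing toward a much higher equilibrium.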

Fate of Alleles in Mutation-Drift Balance

[Figure: allele frequency trajectories through time, annotated with "Generations from birth to fixation" and "Time between fixation events"]

• Time to fixation of a new mutation is much longer than time to loss

Fate of Alleles in Mutation-Drift-Selection Balance

Which case will have the most time?

• Purifying Selection
• Neutrality
• Balancing Selection/Overdominance

Assume you take a sample of 100 alleles from a large (but finite) population in mutation-drift equilibrium. What is the expected distribution of allele frequencies in your sample under neutrality and the Infinite Alleles Model?

[Figure: three candidate histograms labeled A, B, and C; x-axis: Number of Observations of Allele (2 to 10). Hartl and Clark 2007]

Allele Frequency Distributions

[Figure legend: Black: predicted from neutral theory; White: observed (hypothetical)]

• Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time
• Comparison of observed to expected distribution provides evidence of departure from the Infinite Alleles model
• Depends on f, effective population size, and mutation rate

Ewens Sampling Formula

Population mutation rate, an index of the variability of the population:

$$\theta = 4N_e\mu$$

Probability that the $(i+1)$-th sampled allele is new, given $i$ alleles already sampled:

$$\frac{\theta}{\theta + i}$$

Probability of sampling a new allele on the first sample:

$$\frac{\theta}{\theta + 0} = 1$$

Probability of observing a new allele after sampling one allele:

$$\frac{\theta}{\theta + 1} = H_e$$

Probability of sampling a new allele on the third and fourth samples:

$$\frac{\theta}{\theta + 2}, \qquad \frac{\theta}{\theta + 3}$$

Expected number of different alleles ($k$) in a sample of $2N$ alleles:

$$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$$

Example: expected number of alleles in a sample of 4:

$$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$$
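The sum for the expected number of alleles is easy to evaluate directly. The sketch below is one way to do so; the function name and the $\theta$ values in the demo are illustrative assumptions.

```python
def expected_num_alleles(theta, n_alleles):
    """E(k) = sum over i = 0 .. n_alleles - 1 of theta / (theta + i).

    theta must be positive; n_alleles is the number of sampled alleles (2N).
    """
    return sum(theta / (theta + i) for i in range(n_alleles))


if __name__ == "__main__":
    # Sample of 4 alleles, as in the example above, for a few values of theta.
    for theta in (0.1, 1.0, 10.0):
        print(f"theta={theta:5.1f}  E(k)={expected_num_alleles(theta, 4):.3f}")
```

For $\theta = 1$ and a sample of 4 the terms are $1 + 1/2 + 1/3 + 1/4$.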

Ewens Sampling Formula

$$E(n) = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1} = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i}$$

where $E(n)$ is the expected number of different alleles in a sample of $N$ diploid individuals, and $\theta = 4N_e\mu$.

$$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$$

• Predicts number of different alleles that should be observed in a given sample size if neutrality prevails under the Infinite Alleles Model
  - Small $\theta$: $E(n)$ approaches 1
  - Large $\theta$: $E(n)$ approaches $2N$
• $\theta$ can be predicted from number of observed alleles for given sample size
• Can also predict expected homozygosity ($f_e$) under this model
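Because $E(n)$ increases monotonically with $\theta$, one simple way to back out $\theta$ from an observed allele count is to solve $E(n) = n_{obs}$ numerically. The sketch below uses bisection for this; it is a moment-style illustration under the infinite alleles model, the function names are assumptions, and it is not presented as the procedure used in the lecture.

```python
def expected_num_alleles(theta, n_alleles):
    """Expected number of distinct alleles in a sample of n_alleles (2N) alleles."""
    return sum(theta / (theta + i) for i in range(n_alleles))


def estimate_theta(k_observed, n_alleles, iterations=200):
    """Solve expected_num_alleles(theta, n_alleles) = k_observed by bisection."""
    if not 1 < k_observed < n_alleles:
        raise ValueError("need 1 < k_observed < n_alleles for a finite, positive theta")
    lo, hi = 1e-12, 1.0
    while expected_num_alleles(hi, n_alleles) < k_observed:
        hi *= 2.0                      # expand until the root is bracketed
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if expected_num_alleles(mid, n_alleles) < k_observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_homozygosity(theta):
    """f_e = 1 / (theta + 1)."""
    return 1.0 / (theta + 1.0)


if __name__ == "__main__":
    theta_hat = estimate_theta(k_observed=15, n_alleles=89)
    print(f"theta estimate = {theta_hat:.3f}, f_e = {expected_homozygosity(theta_hat):.3f}")
```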

Ewens-Watterson Test

• Compares expected homozygosity under the neutral model ($f_e$) to expected homozygosity under Hardy-Weinberg equilibrium ($f_{HW}$) using observed allele frequencies
• Comparison of allele frequency distributions
• $f_e$ comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers

$$f_{HW} = \sum_i p_i^2$$

Ewens-Watterson Test Example

• Drosophila pseudoobscura collected from winery
• Xanthine dehydrogenase alleles
• 15 alleles observed in 89 chromosomes
• $f_{HW} = 0.366$
• $f_e$ generated by simulation: mean 0.168

(Hartl and Clark 2007)

How would you interpret this result?
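A null distribution for homozygosity can be simulated along the following lines. This sketch draws samples with a Hoppe urn (each new draw is a new allele with probability $\theta/(\theta+i)$, otherwise a copy of a previously drawn allele) and keeps only samples with the observed number of alleles. Under the Ewens sampling formula the allele configuration conditional on the number of alleles does not depend on $\theta$, so the value $\theta = 5$ below is only a tuning choice that affects the acceptance rate. The function names, seed, and number of replicates are assumptions; this is not necessarily how the published tables were generated.

```python
import random


def hoppe_urn_counts(n_alleles, theta, rng):
    """Draw one neutral sample; return the list of allele counts."""
    labels, counts = [], []
    for i in range(n_alleles):
        if rng.random() < theta / (theta + i):   # new allele (always true for i = 0)
            counts.append(1)
            labels.append(len(counts) - 1)
        else:                                    # copy a previously sampled allele
            j = labels[rng.randrange(i)]
            counts[j] += 1
            labels.append(j)
    return counts


def homozygosity(counts):
    """Sum of squared allele frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def simulate_f_given_k(n_alleles, k, theta, n_keep, seed=0):
    """Rejection-sample homozygosity values from samples with exactly k alleles."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        counts = hoppe_urn_counts(n_alleles, theta, rng)
        if len(counts) == k:
            kept.append(homozygosity(counts))
    return kept


if __name__ == "__main__":
    sims = simulate_f_given_k(n_alleles=89, k=15, theta=5.0, n_keep=1000, seed=1)
    f_observed = 0.366
    mean_f = sum(sims) / len(sims)
    tail = sum(f >= f_observed for f in sims) / len(sims)
    print(f"mean simulated f = {mean_f:.3f}; fraction >= {f_observed}: {tail:.3f}")
```

Comparing the observed value with the simulated distribution (rather than only with its mean) is what turns the comparison into a test.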

Most Loci Look Neutral According to Ewens-Watterson Test

Hartl and Clark 2007

DNA Sequence Polymorphisms

• DNA sequence is ultimate view of standing genetic variation: no hidden alleles
  - Is this really true?
  - What about back mutation?
• Signatures of past evolution are contained in DNA sequence
• Neutral theory presents null model
• Departures due to:
  - Selection
  - Demographic events
    - Bottlenecks, founder effects
    - Population admixture

Sequence Alignment

• Necessary first step for comparing sequences within and between species
• Many different algorithms
• Tradeoff of speed and accuracy

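Quantifying Divergence of Sequences

• Nucleotide diversity (π) is the average number of pairwise differences per site between sequences

$$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$$

where $N$ is the number of sequences in the sample, $p_i$ and $p_j$ are the frequencies of sequences $i$ and $j$ in the sample, and $\pi_{ij}$ is the proportion of sites that differ between sequences $i$ and $j$. The sum runs over all ordered pairs $(i, j)$, so each pair of distinct sequences is counted twice.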
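In practice π is usually computed by averaging the per-site difference over all pairs of sampled sequences, which is equivalent to the formula above when every sampled sequence is given frequency $1/N$. The sketch below does this for three hypothetical 35-bp sequences constructed so that A and B differ at one site, A and C at one site, and B and C at two sites; the sequences and function names are invented for illustration.

```python
from itertools import combinations


def proportion_different(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ (pi_ij)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites across all pairs of sequences."""
    if len(seqs) < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    return sum(proportion_different(s, t) for s, t in pairs) / len(pairs)


def substitute(seq, position, base):
    """Return a copy of seq with one base replaced (0-based position)."""
    return seq[:position] + base + seq[position + 1:]


# Hypothetical 35-bp alignment (not the sequences shown on the slide).
SEQ_A = "ACGTA" * 7
SEQ_B = substitute(SEQ_A, 9, "G")    # one difference from A
SEQ_C = substitute(SEQ_A, 19, "C")   # one difference from A, two from B

if __name__ == "__main__":
    pi = nucleotide_diversity([SEQ_A, SEQ_B, SEQ_C])
    print(f"pi = {pi:.5f} per site, {pi * len(SEQ_A):.3f} per sequence")
```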

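Sample Calculation of π

[Figure: alignment of three 35-bp sequences A, B, and C, positions marked every 5 bp]

• A vs. B: 1 difference
• A vs. C: 1 difference
• B vs. C: 2 differences

Each sequence has frequency 1/3, and the double sum counts every unordered pair twice:

$$\pi = \frac{N}{N-1} \sum_{i \ne j} p_i p_j \pi_{ij} = \frac{3}{2} \cdot 2 \left[ \left(\tfrac{1}{3}\right)\left(\tfrac{1}{3}\right)\left(\tfrac{1}{35}\right) + \left(\tfrac{1}{3}\right)\left(\tfrac{1}{3}\right)\left(\tfrac{1}{35}\right) + \left(\tfrac{1}{3}\right)\left(\tfrac{1}{3}\right)\left(\tfrac{2}{35}\right) \right] = \frac{4}{105} \approx 0.0381$$

This equals the mean of the three pairwise values, $(1/35 + 1/35 + 2/35)/3$. On average, there are about 38.1 differences per kb between pairs of haplotypes in the population.

Tajima's D Statistic

• Infinite Sites Model: each new mutation affects a new site in a sequence

$$E(\pi) = \frac{\theta}{m}, \qquad \theta = 4N_e\mu, \qquad \theta_\pi = \pi m$$

where $m$ is the length of the sequence.

• Expected number of polymorphic sites in all sequences:

$$E(S) = a_1 \theta, \qquad a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad \theta_S = \frac{S}{a_1}$$

where $n$ is the number of sequences compared.

Sample Calculation of $\theta_S$

[Figure: the same alignment of sequences A, B, and C]

$$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = \frac{1}{1} + \frac{1}{2} = 1.5$$

$$\pi = 0.0381$$

Two polymorphic sites: $S = 2$

$$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$$

$$\theta_\pi = \pi m = (0.0381)(35) = 1.33$$

The two estimates of $\theta$ coincide in this example.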

Tajima's D Statistic

• Two different ways of estimating same parameter:

$$\theta_\pi = \pi m, \qquad \theta_S = \frac{S}{a_1}$$

• Deviation of these two indicates deviation from neutral expectations:

$$d = \theta_\pi - \theta_S, \qquad D = \frac{d}{\sqrt{V(d)}}$$

where $V(d)$ is the variance of $d$.
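The full statistic can be computed from an alignment as sketched below. The slides do not give an expression for $V(d)$; the code assumes the standard estimator from Tajima (1989), $\hat{V}(d) = e_1 S + e_2 S(S-1)$, with the usual constants $a_1, a_2, b_1, b_2, c_1, c_2, e_1, e_2$. The function name and both test alignments are hypothetical. At least four sequences are required, because for $n = 3$ the two estimators are always equal and the variance term is zero.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (theta_pi, theta_S, D) for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (V(d) is zero for n = 3)")
    # S: number of polymorphic (segregating) sites
    s = sum(1 for site in zip(*seqs) if len(set(site)) > 1)
    if s == 0:
        raise ValueError("no polymorphic sites; D is undefined")
    # theta_pi: mean number of pairwise differences per sequence (pi * m)
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # theta_S: S / a1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_s = s / a1
    # Variance constants (Tajima 1989)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = theta_pi - theta_s
    return theta_pi, theta_s, d / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical alignments: only singletons vs. only intermediate-frequency variants.
RARE_VARIANTS = ["AAAAAA", "CAAAAA", "ACAAAA", "AACAAA"]
COMMON_VARIANTS = ["AAAAAA", "AAAAAA", "CCCAAA", "CCCAAA"]

if __name__ == "__main__":
    for name, sample in (("rare", RARE_VARIANTS), ("common", COMMON_VARIANTS)):
        theta_pi, theta_s, d_stat = tajimas_d(sample)
        print(f"{name:>6}: theta_pi={theta_pi:.3f} theta_S={theta_s:.3f} D={d_stat:.3f}")
```

With such tiny samples the values only illustrate the sign of D, not statistical significance.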

Tajima's D Expectations

$$d = \theta_\pi - \theta_S$$

• D = 0: Neutrality
• D > 0
  - Balancing Selection: Divergence of alleles (π) increases
  OR
  - Bottleneck: S decreases
• D < 0
  - Purifying or Positive Selection: Divergence of alleles decreases
  OR
  - Population expansion: Many low frequency alleles cause low average divergence
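The sign of D can be understood from how a single polymorphic site contributes to each estimator. A site whose minor allele appears in $i$ of $n$ sequences adds $2i(n-i)/(n(n-1))$ to the mean number of pairwise differences, while every polymorphic site adds the same amount, $1/a_1$, to $S/a_1$. The sketch below tabulates both for $n = 10$; the function names and the choice of $n$ are assumptions for illustration.

```python
def pairwise_contribution(i, n):
    """Contribution of one site (minor allele in i of n sequences) to theta_pi."""
    return 2.0 * i * (n - i) / (n * (n - 1))


def watterson_contribution(n):
    """Contribution of any one polymorphic site to theta_S = S / a1."""
    a1 = sum(1.0 / j for j in range(1, n))
    return 1.0 / a1


if __name__ == "__main__":
    n = 10
    per_site_s = watterson_contribution(n)
    for i in range(1, n // 2 + 1):
        per_site_pi = pairwise_contribution(i, n)
        direction = "pushes D down" if per_site_pi < per_site_s else "pushes D up"
        print(f"minor allele count {i}: theta_pi += {per_site_pi:.3f}, "
              f"theta_S += {per_site_s:.3f} ({direction})")
```

Rare variants add less to $\theta_\pi$ than to $\theta_S$, which is why an excess of low-frequency alleles gives D < 0, and an excess of intermediate-frequency alleles gives D > 0.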



Balancing Selection

[Figure: genealogies for a 'balanced' mutation under balancing selection and for a neutral mutation]

$$d = \theta_\pi - \theta_S$$

• Should increase nucleotide diversity (π)
• Decreases polymorphic sites (S) initially.
• D > 0

Slide adapted from Yoav Gilad

Recent Bottleneck

$$d = \theta_\pi - \theta_S$$

• Rare alleles are lost
• Polymorphic sites (S) more severely affected than nucleotide diversity (π)
• D > 0

Standard neutral model

Positive Selection and Purifying Selection

[Figure: $\theta_S$ and $\theta_\pi$ through time across a sweep and the subsequent recovery, for an advantageous mutation and a neutral mutation]

$$d = \theta_\pi - \theta_S$$

• Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially.
  - S recovers due to mutation
  - π recovers slowly: insensitive to rare alleles
• D < 0


Rapid Population Growth will also result in an excess of rare alleles even for neutral loci

[Figure: genealogies under the standard neutral model (often two main haplotypes, some rare alleles) and under a rapid population size increase (most alleles are rare)]

$$\theta = 4N_e\mu, \qquad d = \theta_\pi - \theta_S$$

• Most alleles are rare
• Nucleotide diversity (π) depressed
• Polymorphic sites (S) unchanged or even enhanced: $4N_e\mu$ is large
• D < 0

How do we distinguish these two forms of divergence (selection vs demography)?

Hudson-Kreitman-Aguade Test

• Under neutrality, divergence between species should be proportional to variation within species, so the ratio of polymorphism to divergence should be similar across loci
• Provides a correction factor for mutation rates at different sites
• Complex goodness of fit test
• Perform test for loci under selection and supposedly neutral loci

Hudson-Kreitman-Aguade (HKA) test

                Neutral Locus   Test Locus A
Polymorphism    8               3
Divergence      20              8

Polymorphism: Variation within species
Divergence: Variation between species

8/20 ≈ 3/8


Hudson-Kreitman-Aguade (HKA) test

                Neutral Locus   Test Locus B
Polymorphism    8               3
Divergence      20              19

8/20 >> 3/19

Conclusion: polymorphism lower than expected in Test Locus B: Selective sweep?


HKA Example: Teosinte Branched

[Figure: Teosinte, Maize, and Maize w/TBR mutation. Source: http://www.nsf.gov/news/mmg/media/images/corn-and-teosinte_h1.jpg]

• Lab exercise: test the Teosinte-Branched gene for a signature of purifying selection in maize compared to its Teosinte relative
• Compare to patterns of polymorphism and diversity in the Alcohol Dehydrogenase gene
