Population Genetics IV: Genetic drift

Today we introduce the third big force in population genetics and the rival of natural
selection: genetic drift. Genetic drift (or just drift) is the random change in allele
frequencies caused by random sampling among gametes. For every allele found in a
population, there are two possible reasons why it may fail to be transferred to the offspring
generation: bad fitness or just bad luck. Selection describes the first of these
reasons; drift addresses the second. Genetic drift has several consequences that
we will consider in turn:
1) Allele frequencies change within populations
2) Variation in allele frequency across populations increases
3) Heterozygosity and genetic variation within populations decline
The Wright-Fisher Model
We consider a random mating diploid population without separate sexes (hermaphroditic
or monoecious) of constant size N. Suppose that there are two alleles, A and a, with
equal fitness at a single locus, and that there is no mutation. In any generation, there are
2N homologous genes in the population. Let $X_k$ be the number of A alleles in the kth
generation; the frequency of the A allele in the kth generation is thus $p_k = X_k/2N$. We want
to model the allele frequency change among generations.
One of the simplest models for genetic drift is the Wright-Fisher model. The allelic types of
the 2N genes in the offspring generation k+1 are obtained as follows: Each offspring
gene randomly picks a parent from the previous generation and inherits its allelic value, A
or a. By this procedure, parent genes can be picked by chance multiple times or not at all.
In the latter case, they do not leave any offspring.
Assume first that we pick not just a single offspring generation from a parent
generation, but a large number of replicate offspring generations. If the number of
replicates grows to infinity, we draw an infinite number of times from the parent
generation. The average frequency over these replicates is then $\bar{p}_{k+1} = p_k$. In a single
offspring generation the frequency will likely deviate somewhat from the expected value
$p_k$. If the population size is large, we still expect $p_{k+1}$ to be very close to $p_k$; in small
populations larger deviations are more likely.
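The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `next_generation` and `mean_offspring_frequency`, the seed, and the parameter values are assumptions made for this example, and parent genes 0 to x-1 are arbitrarily taken to be the ones carrying allele A.

```python
import random


def next_generation(x, two_n, rng):
    """Return the number of A copies after one Wright-Fisher generation.

    Each of the 2N offspring genes picks one parent gene uniformly at random.
    By assumption, parent genes 0 .. x-1 carry allele A.
    """
    return sum(1 for _ in range(two_n) if rng.randrange(two_n) < x)


def mean_offspring_frequency(x, two_n, replicates, seed=1):
    """Average frequency of A over many replicate offspring generations."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = sum(next_generation(x, two_n, rng) for _ in range(replicates))
    return total / (replicates * two_n)


if __name__ == "__main__":
    # Hypothetical values: 2N = 20 genes, 6 of them A, so p_k = 0.3.
    print(mean_offspring_frequency(6, 20, 20000))
```

With many replicates the printed average should lie close to 0.3, while a single call to `next_generation` will often return a count different from 6.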
The Wright-Fisher model (with mutation included, see next lecture) is the null-model in
population genetics. This means, it is used to construct null hypotheses for the change of
allele frequencies in populations. These hypotheses are then tested against data.
Rejection of the simple Wright-Fisher model is then taken as evidence that something
more interesting than “just mutation and drift” acts in the population, for example
selection.
Allele frequencies change within populations
We want to quantify the effects of genetic drift. In particular: Given that the frequency of
the A allele is p in the parent generation, what is the probability of finding X = i copies of the
A allele in the offspring generation (where i can range from 0 to 2N)?
The answer is given by the binomial distribution
$$\Pr\{X = i\} = \binom{2N}{i} p^i (1 - p)^{2N - i}$$
where
$$\binom{2N}{i} = \frac{(2N)!}{i!\,(2N - i)!}$$
is the binomial coefficient.
Example: Assume N = 5, thus 2N = 10 alleles in the population and initially 2 copies of
the A allele, thus i = 2 and p = 2/10 = 0.2.
The probability of still having i = 2 copies in the following generation is:
$$\Pr\{X = 2\} = \binom{10}{2}\, 0.2^2\, 0.8^8 = 0.30$$
The probability of having i = 0 copies in the offspring generation is:
$$\Pr\{X = 0\} = \binom{10}{0}\, 0.2^0\, 0.8^{10} = 0.11$$
If we assume 10 times the population size (2N = 100), but still initially p = 0.2, the probability of p =
0 in the following generation becomes
$$\Pr\{X = 0\} = \binom{100}{0}\, 0.2^0\, 0.8^{100} = 2.0 \times 10^{-10}$$
We see that the probability for large changes in the allele frequency is much reduced in a
large population.
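The three worked probabilities above can be reproduced with a short Python sketch. The function name `prob_copies` is an assumption made for this example; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def prob_copies(i, two_n, p):
    """Binomial probability of i copies of A among 2N offspring genes."""
    return comb(two_n, i) * p**i * (1 - p) ** (two_n - i)


if __name__ == "__main__":
    print(f"{prob_copies(2, 10, 0.2):.2f}")   # 2N = 10, two copies retained
    print(f"{prob_copies(0, 10, 0.2):.2f}")   # 2N = 10, allele lost
    print(f"{prob_copies(0, 100, 0.2):.1e}")  # 2N = 100, allele lost
```

Changing `two_n` while holding p fixed shows how quickly the probability of losing the allele in a single generation shrinks as the population grows.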
The variance in p increases across populations
Imagine that a single parent population gives rise to several “colony” offspring
populations, all of the same size N. The frequency of the allele A in the parent population
is p = p0. We ask: How will the allele frequencies in the colonies differ over time if there is
no new mutation and no selection?
Initially, the frequency of A in all colonies will be very close to the original value $p_0$. But
over time, p will drift up in some colonies and down in others: the average p taken over all
colonies stays at $p_0$, but the variance of p among colonies increases. Eventually, a fraction
$p_0$ of the colonies (on average) will have only A alleles and the others only
a alleles. We can quantify how much the variance in allele frequency among colonies
increases per generation. After a single generation, the variance in the number X of A
alleles among colonies is just given by the variance of the binomial distribution, 2Np(1-p).
Thus, the variance of p = X/2N is:
$$\mathrm{Var}[p] = \frac{2Np(1 - p)}{(2N)^2} = \frac{pq}{2N}, \qquad q = 1 - p$$
Notice that Var[ p ] → 0 as N → ∞ . The colonies will diverge only very slowly if the
population size is large. For infinite population size, drift ceases to operate: we retain the
prediction under the Hardy-Weinberg law.
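As a check on the variance formula, the variance of p = X/2N after one generation can be computed directly from the binomial probabilities and compared with pq/2N. This is a sketch under the assumptions of the Wright-Fisher model described above; the function name `frequency_variance` and the parameter values are chosen for illustration only.

```python
from math import comb


def frequency_variance(two_n, p):
    """Variance of the offspring allele frequency X/2N after one generation."""
    pmf = [comb(two_n, i) * p**i * (1 - p) ** (two_n - i)
           for i in range(two_n + 1)]
    mean = sum(prob * i / two_n for i, prob in enumerate(pmf))
    return sum(prob * (i / two_n - mean) ** 2 for i, prob in enumerate(pmf))


if __name__ == "__main__":
    p = 0.2
    for two_n in (10, 100, 1000):
        # Compare the exact variance with the closed form pq/2N.
        print(two_n, frequency_variance(two_n, p), p * (1 - p) / two_n)
```

The two printed columns should agree up to floating-point rounding, and both shrink in proportion to 1/2N.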
Drift versus selection: Drift is often an alternative explanation to selection for patterns of
phenotypic (or nucleotide) diversity that we see in natural populations. For example,
Darwin’s finches on the Galapagos Islands show a large variation in beak size and shape
among populations on different islands. There are two possible explanations for this
variation:
1) natural selection (different beak shapes are adapted to different island
environments)
2) genetic drift (different alleles affecting beak shape have drifted to fixation in
different populations)
In the case of Darwin’s finches, there is convincing evidence that beak size and
shape have not diverged at random (as would be the case under genetic drift), but have
evolved to increase feeding efficiency in the presence of different types of
food (large/small seeds). Often, however, the distinction is difficult. A similar problem exists with the
interpretation of geographic clines, i.e. the gradual change in a phenotypic trait or allele
frequency over a geographic region. Again, there are two possibilities:
1) natural selection (the mean trait in the population tracks the optimum)
2) genetic drift (two extreme populations might have drifted apart after, say, a
glaciation event. Intermediate allele frequencies in intervening populations could
just reflect the trickle of modern gene flow)
In this case, a standard way to make the case for natural selection is to establish the
existence of parallel clines: If we see the same cline on different continents, selection is
the more likely cause. In general, all methods to distinguish selection and drift try to
establish in one way or another that an observed change is systematic (and therefore
due to selection) rather than random (and therefore likely due to drift).
The heterozygosity H within a population decreases
Although the average allele frequencies in a population stay constant under drift (no
systematic effect), the heterozygosity (and the genetic variance by any measure) is
expected to decrease over time. This is intuitively clear if we consider the very long term
effects of drift:
Assume that allele A is initially present at frequency $p_0$. We already know that the average
frequency, $\bar{p}$, will not change over time, i.e. $\bar{p} = p_0$. However, in every generation, there
is a small chance that A either fixes (p → 1), or is lost from the population (p → 0). Although
this event is unlikely in any particular generation, it will certainly happen in some
generation if only we wait long enough. This fact does not conflict with $\bar{p} = p_0$: The A
allele will go to fixation with probability $p_0$ and be lost with probability $(1 - p_0)$, thus
$\bar{p} = 1 \cdot p_0 + 0 \cdot (1 - p_0) = p_0$. Since we have only drift in the process and no new mutation,
the population will stay at p = 0 or p = 1 once it reaches this point. This means that the
heterozygosity H = 2p(1-p) will go to zero over the long haul: Even though the variance
among populations increases by drift, the genetic variance within a population decreases.
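The long-term behaviour described above can be illustrated numerically by treating the number of A copies as a Markov chain with binomial transition probabilities and following the probability distribution over many generations. This is a minimal sketch for a small population; the function names `transition_matrix` and `distribution_after` and the choice of 2N = 10 with 2 initial copies are assumptions made for this example.

```python
from math import comb


def transition_matrix(two_n):
    """Row x holds Pr{X' = j | X = x} under Wright-Fisher sampling."""
    rows = []
    for x in range(two_n + 1):
        p = x / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


def distribution_after(x0, two_n, generations):
    """Probability distribution of the A count after the given generations."""
    matrix = transition_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[x0] = 1.0  # start with exactly x0 copies of A
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist


if __name__ == "__main__":
    dist = distribution_after(2, 10, 300)
    print("Pr(lost) :", dist[0])
    print("Pr(fixed):", dist[10])
    print("mean p   :", sum(j * d for j, d in enumerate(dist)) / 10)
```

After enough generations essentially all probability sits on the two absorbing states, with the fixation probability approaching the initial frequency of 0.2 and the mean frequency unchanged.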
We can quantify how fast the heterozygosity decreases. For added generality, we
express H as the probability that we find different alleles at two homologous genes – or
nucleotide positions – that are randomly drawn from the population. (Similarly to the
definition of the nucleotide diversity π in a previous lecture, this extends the definition also
to haploid populations). The change across generations is most easily derived for the
homozygosity G = 1 – H, i.e. the probability that alleles at two randomly drawn genes are
equal.
Consider a population of size N. At a given locus, the homozygosity in the parent
generation is G, we ask for the expected homozygosity G’ in the offspring generation. For
this, we randomly draw a gene from the offspring and derive the probability that a second
gene that is also randomly drawn from the offspring has the same allele. There are two
possibilities:
1) The second gene is a copy of the same parent gene as the first one. Under the
conditions of the Wright-Fisher model, this occurs with a probability of 1/2N.
2) The second gene is not a copy from the same parent gene, but from a randomly
drawn different parent (probability 1 – 1/2N). In this case, the probability that the
alleles at the target locus are equal is just given by the homozygosity G in the
parent generation.
Summing over these two cases, we obtain:
$$G' = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G = G + \frac{1}{2N}\,(1 - G)$$
We see that G will increase as long as it is smaller than 1. The reason for this increase is
case 1) above: in finite populations, there is a finite probability of inbreeding. In our
setting this means: There is a finite probability that two genes that are randomly
drawn from the offspring generation are descendants of the same parental gene.
Using H = 1 – G, we can express the change of the heterozygosity as
$$H' = \left(1 - \frac{1}{2N}\right) H$$
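The one-generation recursion for H can be checked against the binomial sampling step of the Wright-Fisher model: averaging 2p'(1-p') over all possible offspring frequencies p' should give (1 - 1/2N) times the parental value 2p(1-p). The sketch below does this for a two-allele locus; the function name `expected_heterozygosity_next` and the parameter values are assumptions made for this example.

```python
from math import comb


def expected_heterozygosity_next(two_n, p):
    """Expected value of 2p'(1-p') in the offspring generation."""
    total = 0.0
    for i in range(two_n + 1):
        prob = comb(two_n, i) * p**i * (1 - p) ** (two_n - i)
        freq = i / two_n  # offspring allele frequency p'
        total += prob * 2 * freq * (1 - freq)
    return total


if __name__ == "__main__":
    two_n, p = 10, 0.2
    h_parent = 2 * p * (1 - p)
    print(expected_heterozygosity_next(two_n, p))  # from binomial sampling
    print((1 - 1 / two_n) * h_parent)              # from the recursion
```

Both printed values should coincide up to floating-point rounding.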
The heterozygosity decreases geometrically with factor (1 – 1/2N). Note that for an infinite
population size the quantity in brackets goes to one and thus H’ = H. This is just what is
expressed by the Hardy-Weinberg law: For an infinite population (no drift) the genetic
variance is preserved. Over multiple generations, H changes according to
$$H_t = \left(1 - \frac{1}{2N}\right)^t H_0$$
where H0 is the starting heterozygosity and Ht the heterozygosity after t generations. We
derive the “half-life” of heterozygosity under drift as:
$$\frac{H_t}{H_0} = \frac{1}{2} = \left(1 - \frac{1}{2N}\right)^t$$
$$\ln(1/2) = t \,\ln\left(1 - \frac{1}{2N}\right)$$
$$t = \frac{-\ln 2}{\ln\left(1 - \frac{1}{2N}\right)} \approx \frac{-\ln 2}{-(1/2N)} = 2N \ln 2 = 1.39N$$
where we have used the approximation $\ln(1 + x) \approx x$ for small x. We see that the half-life of
heterozygosity is of the order of the population size N: if N = 100, it takes 139
generations to cut H in half; if N = 1 million, it takes $1.39 \times 10^6$ generations. We see once
again that drift is a force that is most potent in small populations; if N is large, it erodes
the genetic variation very slowly.
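To see how good the approximation is, the exact half-life from the geometric decay can be compared with 2N ln 2. The function names `half_life_exact` and `half_life_approx` are assumptions made for this short sketch.

```python
from math import log


def half_life_exact(n):
    """Generations until H is halved, from (1 - 1/2N)^t = 1/2."""
    return log(0.5) / log(1 - 1 / (2 * n))


def half_life_approx(n):
    """Approximation 2N ln 2, using ln(1 + x) ~ x for small x."""
    return 2 * n * log(2)


if __name__ == "__main__":
    for n in (10, 100, 1_000_000):
        print(n, half_life_exact(n), half_life_approx(n))
```

The exact value is always slightly smaller than the approximation, and the gap stays below half a generation, so it is negligible relative to the half-life for all but the smallest populations.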