Population Genetics IV: Genetic drift

Today we introduce the third big force in population genetics and the rival of natural selection: genetic drift. Genetic drift (or just drift) describes the random change in allele frequencies due to random sampling among gametes. For every allele that is found in a population, there are two possible reasons why it may fail to be transmitted to the offspring generation: bad fitness or just bad luck. While selection describes the first of these reasons, drift addresses the second. Genetic drift has several consequences that we will consider in turn:
1) Allele frequencies change within populations
2) Variation in allele frequency across populations increases
3) Heterozygosity and the genetic variation within populations decline

The Wright-Fisher Model

We consider a randomly mating diploid population without separate sexes (hermaphroditic or monoecious) of constant size $N$. Suppose that there are two alleles, A and a, with equal fitness at a single locus, and that there is no mutation. In any generation, there are $2N$ homologous genes in the population. Let $X_k$ be the number of A alleles in the $k$th generation; the frequency of the A allele in the $k$th generation is thus $p_k = X_k/2N$. We want to model the change in allele frequency between generations. One of the simplest models for genetic drift is the Wright-Fisher model. The allelic types of the $2N$ genes in offspring generation $k+1$ are obtained as follows: each offspring gene randomly picks a parent gene from the previous generation and inherits its allelic value, A or a. By this procedure, a parent gene can be picked by chance multiple times or not at all; in the latter case, it leaves no offspring. Suppose first that we draw not just a single offspring generation from the parent generation, but a large number of replicate offspring generations. As the number of replicates grows to infinity, we effectively draw infinitely often from the parent generation.
The average frequency over these replicates is then $\bar{p}_{k+1} = p_k$. In a single offspring generation, however, the frequency will usually deviate somewhat from the expected value $p_k$. If the population size is large, we still expect $p_{k+1}$ to be very close to $p_k$; in small populations, larger deviations are more likely. The Wright-Fisher model (with mutation included, see next lecture) is the null model in population genetics. This means that it is used to construct null hypotheses for the change of allele frequencies in populations. These hypotheses are then tested against data. Rejection of the simple Wright-Fisher model is then taken as evidence that something more interesting than “just mutation and drift” acts in the population, for example selection.

Allele frequencies change within populations

We want to quantify the effects of genetic drift. In particular: given that the frequency of the A allele is $p$ in the parent generation, what is the probability of finding $X = i$ copies of the A allele in the offspring generation (where $i$ can range from 0 to $2N$)? The answer is given by the binomial distribution

$$\Pr\{X = i\} = \binom{2N}{i} p^i (1-p)^{2N-i},$$

where

$$\binom{2N}{i} = \frac{(2N)!}{i!\,(2N-i)!}$$

is the binomial coefficient.

Example: Assume $N = 5$, thus $2N = 10$ alleles in the population, and initially 2 copies of the A allele, thus $i = 2$ and $p = 2/10 = 0.2$. The probability of still having $i = 2$ copies in the following generation is:

$$\Pr\{X = 2\} = \binom{10}{2}\, 0.2^2\, 0.8^8 \approx 0.30$$

The probability of having $i = 0$ copies in the offspring generation is:

$$\Pr\{X = 0\} = \binom{10}{0}\, 0.2^0\, 0.8^{10} \approx 0.11$$

If we assume ten times the population size ($N = 50$, so $2N = 100$), but still initially $p = 0.2$, the probability of $p = 0$ in the following generation becomes

$$\Pr\{X = 0\} = \binom{100}{0}\, 0.2^0\, 0.8^{100} \approx 2.0 \times 10^{-10}$$

We see that the probability of large changes in the allele frequency is much reduced in a large population.
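The numbers in the example can be checked with a few lines of code. The following is a minimal Python sketch using only the standard library (`math.comb`); the helper name `prob_copies` is our own hypothetical choice. It evaluates the binomial formula above for the two population sizes and also confirms that the expected offspring frequency equals the parental frequency, $\bar{p}_{k+1} = p_k$.

```python
from math import comb


def prob_copies(i: int, two_n: int, p: float) -> float:
    """Pr{X = i}: probability of i copies of allele A among the 2N offspring
    genes, given parental frequency p (binomial distribution).
    prob_copies is a hypothetical helper name used for illustration."""
    return comb(two_n, i) * p**i * (1 - p) ** (two_n - i)


# Example from the text: N = 5 (2N = 10 genes), initial p = 0.2
print(round(prob_copies(2, 10, 0.2), 2))   # 0.3
print(round(prob_copies(0, 10, 0.2), 2))   # 0.11

# Ten times the population size: N = 50 (2N = 100 genes), still p = 0.2
print(f"{prob_copies(0, 100, 0.2):.1e}")   # 2.0e-10

# The expected offspring frequency equals the parental frequency
two_n, p = 10, 0.2
probs = [prob_copies(i, two_n, p) for i in range(two_n + 1)]
mean_freq = sum(i * pr for i, pr in enumerate(probs)) / two_n
print(round(mean_freq, 10))                # 0.2
```

Working with exact binomial probabilities rather than simulation makes the contrast between $N = 5$ and $N = 50$ visible without any sampling noise.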
The variance in p increases across populations

Imagine that a single parent population gives rise to several “colony” offspring populations, all of the same size $N$. The frequency of the allele A in the parent population is $p = p_0$. We ask: how will the allele frequencies in the colonies differ over time if there is no new mutation and no selection? Initially, the frequency of A in all colonies will be very close to the original value $p_0$. But over time, $p$ will drift up in some colonies and down in others: the average $p$ taken over all colonies stays at $p_0$, but the variance of $p$ among colonies increases. Eventually, some of the colonies (a fraction $p_0$ of them, on average) will have only A alleles and the others only a alleles.

We can quantify how much the variance in allele frequency among colonies increases per generation. After a single generation, the variance in the number $X$ of A alleles among colonies is just the variance of the binomial distribution, $2Np(1-p)$. Thus, the variance of $p = X/2N$ is:

$$\mathrm{Var}[p] = \frac{2Np(1-p)}{(2N)^2} = \frac{pq}{2N}, \qquad q = 1 - p.$$

Notice that $\mathrm{Var}[p] \to 0$ as $N \to \infty$: the colonies will diverge only very slowly if the population size is large. For infinite population size, drift ceases to operate and we recover the Hardy-Weinberg prediction of constant allele frequencies.

Drift versus selection: Drift is often an alternative explanation to selection for patterns of phenotypic (or nucleotide) diversity that we see in natural populations. For example, Darwin’s finches on the Galapagos Islands show a large variation in beak size and shape among populations on different islands.
There are two possible explanations for this variation:
1) natural selection (different beak shapes are adapted to different island environments)
2) genetic drift (different alleles affecting beak shape have drifted to fixation in different populations)
In the case of Darwin’s finches, there is convincing evidence that beak size and shape have not diverged at random (as would be the case under genetic drift), but have evolved to increase feeding efficiency on the different types of food available (large or small seeds). Often, however, the two explanations are difficult to distinguish.

A similar problem arises in the interpretation of geographic clines, i.e. gradual changes in a phenotypic trait or allele frequency over a geographic region. Again, there are two possibilities:
1) natural selection (the mean trait in each population tracks the local optimum)
2) genetic drift (two extreme populations might have drifted apart after, say, a glaciation event; intermediate allele frequencies in the intervening populations could just reflect the trickle of modern gene flow)
In this case, a standard way to make the case for natural selection is to establish the existence of parallel clines: if we see the same cline on different continents, selection is the more likely cause. In general, all methods to distinguish selection and drift try to establish, in one way or another, that an observed change is systematic (and therefore due to selection) rather than random (and therefore likely due to drift).

The heterozygosity H within a population decreases

Although the expected allele frequency stays constant under drift (there is no systematic effect), the heterozygosity (and the genetic variance by any measure) is expected to decrease over time. This is intuitively clear if we consider the very long-term effects of drift: assume that allele A is initially present at frequency $p_0$. We already know that the average frequency $\bar{p}$ does not change over time, i.e. $\bar{p} = p_0$.
However, in every generation, there is a small chance that A either fixes ($p \to 1$) or is lost from the population ($p \to 0$). Although this event is unlikely in any particular generation, it will certainly happen in some generation if only we wait long enough. This fact does not conflict with $\bar{p} = p_0$: the A allele will go to fixation with probability $p_0$ and be lost with probability $1 - p_0$, thus

$$\bar{p} = 1 \cdot p_0 + 0 \cdot (1 - p_0) = p_0.$$

Since the process involves only drift and no new mutation, the population will stay at $p = 0$ or $p = 1$ once it reaches either of these states. This means that the heterozygosity $H = 2p(1-p)$ will go to zero in the long run: even though the variance among populations increases by drift, the genetic variance within a population decreases.

We can quantify how fast the heterozygosity decreases. For added generality, we express $H$ as the probability of finding different alleles at two homologous genes (or nucleotide positions) that are randomly drawn from the population. (Similarly to the definition of the nucleotide diversity $\pi$ in a previous lecture, this extends the definition to haploid populations.) The change across generations is most easily derived for the homozygosity $G = 1 - H$, i.e. the probability that the alleles at two randomly drawn genes are equal. Consider a population of size $N$. At a given locus, let the homozygosity in the parent generation be $G$; we ask for the expected homozygosity $G'$ in the offspring generation. For this, we randomly draw a gene from the offspring generation and derive the probability that a second gene, also randomly drawn from the offspring generation, carries the same allele. There are two possibilities:
1) The second gene is a copy of the same parent gene as the first one. Under the conditions of the Wright-Fisher model, this occurs with probability $1/2N$.
2) The second gene is not a copy of the same parent gene, but of a different, randomly drawn parent gene (probability $1 - 1/2N$).
In this case, the probability that the alleles at the target locus are equal is just the homozygosity $G$ of the parent generation. Summing over these two cases, we obtain:

$$G' = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G = G + \frac{1}{2N}(1 - G)$$

We see that $G$ will increase as long as it is smaller than 1. The reason for this increase is case 1) above: in finite populations, there is a nonzero probability of inbreeding. In our setting this means that there is a nonzero probability that two genes randomly drawn from the offspring generation are descendants of the same parental gene. Using $H = 1 - G$, we can express the change of the heterozygosity as

$$H' = \left(1 - \frac{1}{2N}\right) H$$

The heterozygosity decreases geometrically with factor $(1 - 1/2N)$. Note that for an infinite population size the quantity in brackets goes to one and thus $H' = H$. This is just what the Hardy-Weinberg law expresses: in an infinite population (no drift), the genetic variance is preserved. Over multiple generations, $H$ changes according to

$$H_t = \left(1 - \frac{1}{2N}\right)^t H_0,$$

where $H_0$ is the starting heterozygosity and $H_t$ the heterozygosity after $t$ generations. We derive the “half-life” of heterozygosity under drift as follows:

$$\frac{H_t}{H_0} = \frac{1}{2} = \left(1 - \frac{1}{2N}\right)^t$$

$$\ln(1/2) = t \ln\left(1 - \frac{1}{2N}\right)$$

$$t = \frac{-\ln 2}{\ln\left(1 - \frac{1}{2N}\right)} \approx \frac{-\ln 2}{-1/2N} = 2N \ln 2 \approx 1.39\,N,$$

where we have used the approximation $\ln(1 + x) \approx x$ for small $|x|$. We see that the half-life of heterozygosity is of the order of the population size $N$: if $N = 100$, it takes about 139 generations to cut $H$ in half; if $N = 1$ million, it takes about $1.39 \times 10^6$ generations. We see once again that drift is a force that is most potent in small populations; if $N$ is large, it erodes the genetic variation very slowly.
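The three consequences of drift listed at the start can be illustrated with a small Monte Carlo simulation of many independent colonies under the Wright-Fisher model. The Python sketch below rests on our own assumptions: colonies of $N = 10$ diploids founded at $p_0 = 0.3$, the standard-library `random` module with a fixed seed, and hypothetical helper names (`wf_generation`, `simulate_colonies`, `half_life`). It compares the mean allele frequency across colonies with $p_0$, the variance among colonies after one generation with $p_0 q_0/2N$, the average of $2p(1-p)$ after 20 generations with $H_0(1 - 1/2N)^{t}$, and the long-run fraction of colonies fixed for A with $p_0$. It also evaluates the exact and approximate half-life formulas for $N = 100$.

```python
import math
import random
from statistics import fmean, pvariance


def wf_generation(x: int, two_n: int, rng: random.Random) -> int:
    """One Wright-Fisher generation (hypothetical helper): each of the 2N
    offspring genes picks a parent gene uniformly at random and copies its
    allele, so each offspring gene is A with probability p = x / 2N."""
    if x in (0, two_n):          # lost or fixed: absorbing without mutation
        return x
    p = x / two_n
    return sum(1 for _ in range(two_n) if rng.random() < p)


def simulate_colonies(n: int, p0: float, generations: int,
                      colonies: int, seed: int = 1) -> list[list[float]]:
    """Frequency of A in each colony for generations 0..generations."""
    rng = random.Random(seed)
    two_n = 2 * n
    counts = [round(p0 * two_n)] * colonies   # all colonies founded at p0
    history = [[c / two_n for c in counts]]
    for _ in range(generations):
        counts = [wf_generation(c, two_n, rng) for c in counts]
        history.append([c / two_n for c in counts])
    return history


def half_life(n: int) -> tuple[float, float]:
    """Exact half-life -ln2 / ln(1 - 1/2N) and the approximation 2N ln 2."""
    exact = -math.log(2) / math.log1p(-1 / (2 * n))
    return exact, 2 * n * math.log(2)


# Illustrative parameters (our assumption, not from the text)
N, P0, T = 10, 0.3, 200
hist = simulate_colonies(N, P0, T, colonies=2000)

# 1) The mean frequency across colonies stays near p0
mean_p_20 = fmean(hist[20])

# 2) Variance among colonies after one generation vs p0 q0 / 2N
var_p_1 = pvariance(hist[1])
var_theory = P0 * (1 - P0) / (2 * N)

# 3) Mean heterozygosity 2p(1-p) after t = 20 vs H0 (1 - 1/2N)^t
h_sim_20 = fmean(2 * p * (1 - p) for p in hist[20])
h_theory_20 = 2 * P0 * (1 - P0) * (1 - 1 / (2 * N)) ** 20

# Long run: the fraction of colonies fixed for A should be close to p0
fixed_A = sum(p == 1.0 for p in hist[T]) / len(hist[T])

print(f"mean p at t=20: {mean_p_20:.3f}  (p0 = {P0})")
print(f"Var[p] at t=1:  {var_p_1:.4f}  (theory {var_theory:.4f})")
print(f"H at t=20:      {h_sim_20:.3f}  (theory {h_theory_20:.3f})")
print(f"fixed for A:    {fixed_A:.3f}  (p0 = {P0})")
print("half-life, N=100:", [round(v, 1) for v in half_life(100)])  # [138.3, 138.6]
```

The simulated values fluctuate around the theoretical ones because each colony is a random realization; increasing the number of colonies shrinks that scatter. The half-life line shows that the approximation $2N \ln 2$ is already very close to the exact value at $N = 100$.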