Modeling Evolution: The Wright-Fisher Model
Rick Stuart and Samantha Biehl
Introduction to Evolution
Evolution is broadly defined as change over time. Biologically, it can be defined as heritable
change that results in diversity. One process that leads to evolution is natural selection,
defined as a change in allele frequencies over time due to variation, inheritance, and
differential survival. An allele is a variant of a gene. For example, a gene that codes for eye
color has several variants: one that codes for blue eyes, one that codes for green eyes, and one
that codes for brown eyes. In a diploid organism, the individual carries two alleles for each
gene; in a haploid organism, the individual has only one allele for each gene. One allele may be
dominant to another, in which case the dominant allele is expressed in the presence of the less
dominant allele, often referred to as the recessive allele.
One of the mechanisms by which evolution can occur is genetic drift. In this mechanism, the
change in allele frequency is entirely random and independent of any selection pressures. The
event where an allele is completely eliminated from a population is referred to as extinction. The
opposite of extinction, when only one allele remains in the population because the other has gone
extinct, is referred to as fixation. When fixation occurs, the genetic variation of that population
is gone. The only ways to get variation back would be through the process of new alleles
migrating into the population, or through a mutation of an allele.
Mutation is the primary source of variation and is defined as a change to the DNA sequence.
Some mutations do not affect the expression of the gene associated with the DNA sequence, and
these mutations are known as silent or neutral mutations. Other mutations, however, may affect
the expression of a gene and thus have an effect on how well the organism fares. How well an
organism does in its environment is called fitness. Fitness is formally defined as the measure of
an individual’s reproductive success relative to the average reproductive success in a population.
Therefore, an individual who fares better and produces more offspring will have a higher fitness
than an individual who might fare well, but doesn’t produce many offspring.
Introduction to Wright-Fisher
The Wright-Fisher model, in its most basic form, is a stochastic model often used to model
genetic drift in a finite population [1, 2]. The model can be used for both haploid and diploid
individuals because it tracks only the allele frequencies in the population. Often the model is
applied to the case where two alleles are present, although it can also be used with more than
two alleles.
The variables used in the basic Wright-Fisher model are as follows:
𝑁 = Number of individuals
2𝑁 = Number of alleles (type A and B)
𝑖 = Number of type A alleles at step 𝑛 − 1
2𝑁 − 𝑖 = Number of type B alleles at step 𝑛 − 1
𝑋𝑛 = Number of type A alleles at step 𝑛
𝑖/2𝑁 = Probability of getting type A allele
1 − 𝑖/2𝑁 = Probability of getting type B allele
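We can then look at the probability that there will be 𝑗 type A alleles in the next generation,
given that there are 𝑖 in the current one. Each of the 2𝑁 alleles is drawn independently, so
this probability is binomial and is given as: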
$$P(X_n = j \mid X_{n-1} = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(\frac{2N-i}{2N}\right)^{2N-j}.$$
The number of alleles in a particular step depends only on the number of alleles in the previous
step; therefore, the Wright-Fisher model is a Markov chain. The transition matrix for the Markov
chain can be constructed using the following:
$$T_{ij} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(\frac{2N-i}{2N}\right)^{2N-j}.$$
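The transition matrix can be built directly from this formula. The following is a minimal Python sketch, not the authors' code; the function name `wright_fisher_transition_matrix` and the parameter `num_alleles` (standing for 2𝑁) are illustrative assumptions.

```python
from math import comb


def wright_fisher_transition_matrix(num_alleles):
    """Return T as a list of rows, where T[i][j] = P(X_n = j | X_{n-1} = i).

    num_alleles is 2N, the total number of allele copies in the population.
    """
    size = num_alleles + 1  # states 0, 1, ..., 2N
    matrix = []
    for i in range(size):
        p = i / num_alleles  # probability of drawing a type A allele
        row = [comb(num_alleles, j) * p**j * (1 - p) ** (num_alleles - j)
               for j in range(size)]
        matrix.append(row)
    return matrix


T = wright_fisher_transition_matrix(20)
# Every row is a probability distribution, so each row sum should be close to 1.
print(max(abs(sum(row) - 1.0) for row in T))
```

Rows 0 and 2𝑁 put all of their probability on the diagonal, which is what makes extinction and fixation absorbing states.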
Probability of Fixation
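One interesting aspect of the model to investigate is the probability that an allele will fixate or go
to extinction [2]. Since the number of type A alleles in the next generation is binomially
distributed, with 2𝑁 trials and success probability 𝑋𝑡/2𝑁, we can apply the general formula for
the expected value of a binomial random variable, given as
$$E(X) = np,$$
to our model. It then follows that
$$E(X_{t+1} \mid X_t) = 2N \cdot \frac{X_t}{2N} = X_t.$$
Therefore, with the initial condition 𝑋0 = 𝑖,
$$E(X_0) = i.$$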
By induction it can then be shown that
𝑖 = 𝐸(𝑋𝑡 | 𝑋0 = 𝑖)
for all values of 𝑡. This makes the model a bounded martingale. One property of bounded
martingales is that they converge as the number of steps approaches infinity [3]. Since 0 and 2𝑁
are the only absorbing states, the model will converge on either extinction (𝑋 = 0) or fixation
(𝑋 = 2𝑁). Given that the probability of the model being in one of the absorbing states as
𝑛 → ∞ is 1, the probability of being in any other state is 0. Therefore the expected value of
𝑋𝜏, where 𝜏 is the step at which the model goes to extinction or fixation, is
$$E(X_\tau \mid X_0 = i) = 0 \cdot P(X_n = 0) + 2N \cdot P(X_n = 2N) \quad \text{as } n \to \infty.$$
Because the expected value remains 𝑖 at every step,
$$i = 0 \cdot P(X_n = 0) + 2N \cdot P(X_n = 2N) \quad \text{as } n \to \infty,$$
or
$$P(X_n = 2N) = \frac{i}{2N} \quad \text{as } n \to \infty.$$
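This result can be checked numerically without any simulation by pushing a starting distribution through the transition probabilities many times and reading off the mass that ends up in state 2𝑁. The sketch below is an illustration under assumptions: the helper names `transition_row` and `fixation_probability` and the default of 500 steps are choices made here, not part of the original work.

```python
from math import comb


def transition_row(i, num_alleles):
    """Row i of the Wright-Fisher transition matrix for 2N = num_alleles."""
    p = i / num_alleles
    return [comb(num_alleles, j) * p**j * (1 - p) ** (num_alleles - j)
            for j in range(num_alleles + 1)]


def fixation_probability(initial_a, num_alleles, steps=500):
    """Approximate P(X_n = 2N | X_0 = initial_a) after `steps` generations."""
    size = num_alleles + 1
    rows = [transition_row(i, num_alleles) for i in range(size)]
    dist = [0.0] * size
    dist[initial_a] = 1.0  # start with all probability on state initial_a
    for _ in range(steps):
        # One generation: new distribution = old distribution times T.
        dist = [sum(dist[i] * rows[i][j] for i in range(size))
                for j in range(size)]
    return dist[num_alleles]


for i in (1, 5, 10):
    print(i, fixation_probability(i, 20), i / 20)
```

For 2𝑁 = 20 the two printed columns should agree closely, since almost all probability has been absorbed after a few hundred generations.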
Assumptions
The assumptions of the basic Wright-Fisher Model are as follows: 1) generations do not overlap,
2) population size remains constant, 3) no mutation or recombination is occurring, 4) the genders
of the individuals are ignored, 5) familial ties are not taken into consideration, 6) no selection
pressure is acting in the population.
Basic Model
In our basic model, we used a diploid population with 𝑁 = 10 individuals, so we tracked
2𝑁 = 20 alleles. First, we wrote code that would fill an array with a specified number of A and
B alleles (1 for an A allele and 0 for a B allele) in random order. The code would then randomly
choose an allele from this initial array for the first allele in the new generation (Figure 1),
and this would continue until all 20 alleles were chosen for the new generation. The process
would then repeat, randomly choosing alleles from this new generation for the next generation.
Figure 1 An allele is chosen at random from the previous
generation and is placed into the new generation.
We ran this process for 20 generations and assumed that both alleles were initially represented
equally in the original population (Figure 2). As can be seen, the A allele ends up faring better
than the B allele in this run, and the path of the B allele is simply a mirror image of the path
of the A allele, because the two counts always sum to 2𝑁. We chose the A allele as our allele of
interest and therefore graphed only the A allele in future simulations to keep our graphs from
looking too cluttered.
We then wanted to look at the variation that could occur in different simulations, so we ran 10
simulations with 2𝑁 = 20 (Figure 3). As can be seen, in 5 of the 10 simulations the A allele
goes to fixation, while in 2 of the 10 it goes to extinction, and in 3 of the 10 it is still present in
the population, but in varying numbers. This simulation demonstrates the randomness associated
with genetic drift.
Figure 2 One simulation of the basic
Wright-Fisher model with 2𝑁 = 20
showing both the A and B alleles.
Figure 3 Ten simulations of the basic
Wright-Fisher model with 2𝑁 = 20,
showing only the A alleles.
Figure 4 Ten simulations of the basic
Wright-Fisher model with 2𝑁 = 1000,
showing only the A alleles.
Figure 5 Five simulations of the basic
Wright-Fisher model with 2𝑁 = 100000,
showing only the A alleles.
We then looked at varying population sizes, starting with 2𝑁 = 1000 (Figure 4) and then
looking at 2𝑁 = 100000 (Figure 5), and found that as the population size increased, the
randomness of genetic drift had a much smaller effect, and the A allele went to neither fixation
nor extinction in the generation span we allowed.
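One way to see why larger populations drift less is to look at the variance of the allele frequency after a single generation of binomial sampling. The short derivation below uses only the sampling step already defined; the shorthand p = i/2N and the choice of p = 0.5 for the numerical examples are introduced here for illustration.

```latex
\operatorname{Var}\!\left(\frac{X_n}{2N} \,\middle|\, X_{n-1} = i\right)
  = \frac{2N\,p\,(1-p)}{(2N)^2}
  = \frac{p\,(1-p)}{2N},
  \qquad p = \frac{i}{2N}.
% Standard deviation of the one-generation frequency change at p = 0.5:
%   2N = 20      ->  sqrt(0.25 / 20)      ~ 0.112
%   2N = 1000    ->  sqrt(0.25 / 1000)    ~ 0.0158
%   2N = 100000  ->  sqrt(0.25 / 100000)  ~ 0.00158
```

The per-generation spread in frequency shrinks in proportion to the square root of 1/2𝑁, which is consistent with the flatter trajectories seen for the larger populations.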
Modeling Probability of Fixation
Recall that we derived the probability of fixation of an allele with initial count 𝑋0 = 𝑖 to be
𝑃(𝑋𝑛 = 2𝑁) = 𝑖/2𝑁 as 𝑛 → ∞. To test this, we performed 100 simulations in our basic model. Each
simulation consisted of 100 trials with 𝑖 = 1 and 2𝑁 = 20. Since 𝑖/2𝑁 = 1/20 = 0.05, we would
expect an average of 5 trials per simulation to result in the allele going to fixation. Running
the simulation gave a mean of 5.02 trials that went to fixation over the 100 simulations.
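An experiment of this shape can be sketched in Python as follows. This is a hedged reconstruction rather than the authors' program: the function names, the seed, and the choice to run each trial until the allele is absorbed (rather than for a fixed number of generations) are all assumptions. Drawing each new allele as type A with probability 𝑋/2𝑁 is equivalent to copying a randomly chosen allele from the previous generation.

```python
import random


def run_until_absorbed(initial_a, num_alleles, rng):
    """Resample generations until allele A is lost (0) or fixed (2N)."""
    count_a = initial_a
    while 0 < count_a < num_alleles:
        p = count_a / num_alleles  # chance that a copied allele is type A
        count_a = sum(1 for _ in range(num_alleles) if rng.random() < p)
    return count_a


def fixations_per_simulation(trials, initial_a, num_alleles, rng):
    """Count how many of `trials` independent runs end in fixation of A."""
    return sum(run_until_absorbed(initial_a, num_alleles, rng) == num_alleles
               for _ in range(trials))


rng = random.Random(2013)  # arbitrary seed, chosen only for repeatability
counts = [fixations_per_simulation(100, 1, 20, rng) for _ in range(100)]
mean_fixations = sum(counts) / len(counts)
print(mean_fixations)
```

The printed mean varies with the seed, but it should fall near the theoretical value of 5 fixations per 100 trials.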
Fitness Model
Once we had our basic model running, we then wanted to see if we could add fitness to the
model. To do so, we still had our code randomly choose an individual from our initial array, but
this time we set it up so that it was more likely to choose an A allele than a B allele. To do this,
we derived the probability of choosing an A allele given the fitness of each allele:
$$P(A) = \frac{\frac{X_{n-1}}{2N}\,\omega_A}{\frac{X_{n-1}}{2N}\,\omega_A + \left(1 - \frac{X_{n-1}}{2N}\right)\omega_B}$$
where 𝜔𝐴 = the fitness of the A allele and 𝜔𝐵 = the fitness of the B allele.
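As a quick numerical check of this formula, consider an illustrative case (values chosen here as an example) with 10 type A alleles out of 2N = 20 and fitness values of 1.0 for A and 0.9 for B:

```latex
P(A) = \frac{\frac{10}{20}\cdot 1.0}{\frac{10}{20}\cdot 1.0 + \left(1 - \frac{10}{20}\right)\cdot 0.9}
     = \frac{0.5}{0.5 + 0.45}
     = \frac{0.5}{0.95}
     \approx 0.526
```

With equal fitness values the expression reduces to the neutral probability of 0.5, so the fitness difference shifts each draw only slightly toward A; the effect accumulates over many draws and generations.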
We then set the fitness of the A allele to 1 and the fitness of the B allele to 0.9 and ran 10
simulations (Figure 6). Since the fitness of allele A was greater than the fitness of allele B, we
expected to see more of the simulations going to fixation. This was in fact the case, with 6 of the
10 simulations going to fixation and only 1 of the 10 simulations going to extinction.
Figure 6 Ten simulations of the Wright-Fisher
model with fitness 𝜔𝐴 = 1.0, 𝜔𝐵 = 0.9, with
2𝑁 = 20, showing only the A alleles.
We ran several simulations, with the fitness of A set at 1.0 and the fitness of B set at 0.95,
varying numbers of alleles and numbers of initial A alleles (Figure 7, Figure 8) and found that in
each simulation, given a large enough population with a large enough proportion of A alleles, the
A alleles eventually went to fixation.
Figure 7 Ten simulations of the Wright-Fisher model with fitness 𝜔𝐴 = 1.0,
𝜔𝐵 = 0.95, with 2𝑁 = 1000, showing only the A alleles
(x-axis: Generations; y-axis: # of Alleles).
Figure 8 Ten simulations of the Wright-Fisher model with fitness 𝜔𝐴 = 1.0,
𝜔𝐵 = 0.95, with 2𝑁 = 1000, showing only the A alleles
(x-axis: Generations; y-axis: # of Alleles).
Fitness with Mutation Model
Next we wanted to introduce mutation into our model. We set our probability of a mutation
occurring to 𝜇 = 0.01 and used the same probability formula from our fitness model, modified
to take into account each new mutation that arose:
$$P(i) = \frac{\frac{X_{i:n-1}}{2N}\,\omega_i}{\sum_{j \in \text{alleles}} \frac{X_{j:n-1}}{2N}\,\omega_j}$$
where 𝑋𝑖:𝑛−1 is the number of copies of allele 𝑖 at step 𝑛 − 1 and 𝜔𝑖 is its fitness.
We also set the fitness of the A and B alleles to 1.0 and assigned a random fitness of between 0.5
and 1.5 to each new mutated allele. We ran one simulation with 2𝑁 = 100 (Figure 9) and found
that in this simulation, both of our original alleles went to extinction and a mutant allele came
close to going to fixation.
Figure 9 One simulation of the Wright-Fisher
model with fitness 𝜔𝐴 = 1.0, 𝜔𝐵 = 1.0,
0.5 ≤ 𝜔𝑚𝑢𝑡 ≤ 1.5 and mutation probability
𝜇 = 0.01, with maximum number of mutations
set at 10, 2𝑁 = 100, showing all alleles.
Summary and Conclusion
Our objective for this assignment was to study a stochastic model and report our findings. We
chose the Wright-Fisher model for genetic drift in a finite population. Even though other models
have been developed for these scenarios, we wanted to modify our basic Wright-Fisher model to
include factors such as fitness and mutation. Our results with these modifications indicate that
the model can indeed be extended in these ways. The basic model does a nice job of demonstrating
the concept of genetic drift. The model that includes fitness illustrates the power of
competition and how an allele with a higher fitness has a competitive advantage. Finally, the
model that includes fitness and mutation illustrates a mechanism of natural selection: the
process of competitive mutations supplanting existing alleles.
Some limitations of the model include the fact that it is a generation-by-generation model and
therefore does not take into account generational overlap. The model also does not take into
account the gender of each individual, mating patterns, or sexual selection. Future studies could
look at integrating solutions to these limitations into the model.
References
1. "Genetic Drift." Wikipedia: The Free Encyclopedia. 3 May 2013. Web.
http://en.wikipedia.org/wiki/Genetic_drift
2. Mitrofanova, Antonina. "Lecture 2: Absorbing States in Markov Chains. Mean Time to
Absorption. Wright-Fisher Model. Moran Model." Lecture. Lecture Notes. NYU,
Department of Computer Science, New York. 18 Dec. 2007. Web.
http://cs.nyu.edu/mishra/COURSES/09.HPGP/scribe2
3. "Martingale Convergence Theorem." Planetmath.org. N.p., n.d. Web. 06 May 2013.
http://planetmath.org/martingaleconvergencetheorem
Pseudocode
Basic Model
• Set number of simulations
• Loop 1 to number of simulations
  o Set number of alleles, number of A alleles, generations
  o % Note that an A allele will be 1 and a B allele will be 0
  o Randomly choose the A alleles for generation 1
  o Pre-allocate matrices to track alleles, # of A's and # of B's over generations
  o Seed generation 1
  o Loop 2 to number of generations
    ▪ Loop 1 to number of alleles
      ▪ Randomly choose an allele in step n-1 to copy
  o Fill # of A's and # of B's matrices
  o Plot single simulation
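The steps above can be sketched in Python as follows. This is an illustrative translation, not the authors' implementation; the function name `simulate_basic`, its parameters, and the seed are assumptions, and plotting is left out so that the sketch needs only the standard library.

```python
import random


def simulate_basic(num_alleles, num_a, generations, rng):
    """Return the number of A alleles (coded 1) in each generation."""
    # Generation 1: num_a copies of A (1) and the rest B (0), in random order.
    population = [1] * num_a + [0] * (num_alleles - num_a)
    rng.shuffle(population)
    counts_a = [sum(population)]
    for _ in range(2, generations + 1):
        # Each slot in the new generation copies a randomly chosen allele
        # from the previous generation (sampling with replacement).
        population = [rng.choice(population) for _ in range(num_alleles)]
        counts_a.append(sum(population))
    return counts_a


rng = random.Random(0)  # arbitrary seed for repeatability
for run in range(3):
    counts = simulate_basic(num_alleles=20, num_a=10, generations=20, rng=rng)
    # The number of B alleles in each generation is 20 minus the A count.
    print(counts)
```

Each printed list is one trajectory of the A allele count; passing the lists to any plotting tool would give curves of the kind described for the basic model.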
Fitness Incorporation
To determine the type of each allele in step n, the following algorithm was used:
• Loop 1 to number of alleles
  o Get a random variable R (uniform between 0 and 1)
  o Get the number of A alleles in step n-1
  o Calculate the probability that the allele will be type A
  o If R < probability that the allele will be type A
    ▪ Assign it type A
  o else
    ▪ Assign it type B
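A minimal Python sketch of this selection step is given below. The function names (`prob_a`, `next_count_a`, `simulate_fitness`) and the seed are hypothetical; the probability is the fitness-weighted formula from the Fitness Model section.

```python
import random


def prob_a(count_a, num_alleles, w_a, w_b):
    """Fitness-weighted probability of drawing a type A allele."""
    freq_a = count_a / num_alleles
    return freq_a * w_a / (freq_a * w_a + (1 - freq_a) * w_b)


def next_count_a(count_a, num_alleles, w_a, w_b, rng):
    """Build one generation allele by allele; return its number of A alleles."""
    p = prob_a(count_a, num_alleles, w_a, w_b)
    new_count = 0
    for _ in range(num_alleles):
        r = rng.random()      # random variable R, uniform on [0, 1)
        if r < p:
            new_count += 1    # assign type A
        # else: assign type B, which leaves the A count unchanged
    return new_count


def simulate_fitness(num_alleles, num_a, generations, w_a, w_b, rng):
    """Return the A allele count for each generation, starting from num_a."""
    counts = [num_a]
    for _ in range(generations - 1):
        counts.append(next_count_a(counts[-1], num_alleles, w_a, w_b, rng))
    return counts


rng = random.Random(0)  # arbitrary seed for repeatability
print(simulate_fitness(20, 10, 20, w_a=1.0, w_b=0.9, rng=rng))
```

Setting both fitness values to the same number makes `prob_a` return the plain frequency, so the sketch reduces to the basic model.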
Mutation Incorporation
Changes included:
• Keeping track of the mutations and their fitness values in matrices
• Altering the fitness algorithm to determine the probability of each of the alleles that are
  or have been present up to that time step
• Walking through the cumulative probabilities with a continuous uniform random variable to
  determine which of the alleles that are present will be assigned
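The changes listed above can be sketched as follows. Several details are not stated in the source and are assumptions made here for illustration: mutation is applied per allele copy with probability `mu`, a mutant always becomes a brand-new allele with fitness drawn uniformly from 0.5 to 1.5, the two original alleles start in equal numbers, and all function names are hypothetical.

```python
import random
from itertools import accumulate


def pick_allele(counts, fitness, rng):
    """Pick an allele index with probability proportional to count * fitness."""
    weights = [c * w for c, w in zip(counts, fitness)]
    cumulative = list(accumulate(weights))
    r = rng.random() * cumulative[-1]
    # Walk through the cumulative weights until the random value is covered.
    for index, threshold in enumerate(cumulative):
        if r < threshold:
            return index
    # Guard against floating-point round-off: last allele still present.
    return max(i for i, w in enumerate(weights) if w > 0)


def step(counts, fitness, mu, max_mutations, rng):
    """Produce the next generation's allele counts and fitness list."""
    num_alleles = sum(counts)
    new_counts = [0] * len(counts)
    new_fitness = list(fitness)
    for _ in range(num_alleles):
        index = pick_allele(counts, fitness, rng)
        mutations_so_far = len(new_fitness) - 2  # alleles beyond A and B
        if rng.random() < mu and mutations_so_far < max_mutations:
            # Assumption: the copy mutates into a new allele type.
            new_fitness.append(rng.uniform(0.5, 1.5))
            new_counts.append(1)
        else:
            new_counts[index] += 1
    return new_counts, new_fitness


def simulate_mutation(num_alleles=100, generations=200, mu=0.01,
                      max_mutations=10, seed=0):
    """Return the per-generation allele counts and the final fitness list."""
    rng = random.Random(seed)
    counts = [num_alleles // 2, num_alleles - num_alleles // 2]  # A and B
    fitness = [1.0, 1.0]
    history = [list(counts)]
    for _ in range(generations):
        counts, fitness = step(counts, fitness, mu, max_mutations, rng)
        history.append(list(counts))
    return history, fitness


history, fitness = simulate_mutation()
print(history[-1])
print([round(w, 2) for w in fitness])
```

The final line of counts shows which alleles remain after the last generation, with indices 0 and 1 standing for the original A and B alleles and later indices for mutants in order of appearance.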