1180:Lab12 1 Inferring Natural Selection from Population Changes James Moore

1180:Lab12
James Moore
April 23rd, 2013
1 Inferring Natural Selection from Population Changes
In evolutionary theory, we typically define an individual’s fitness as the relative number of
offspring it produces. Suppose we have two populations of flies: A and B. We want to determine
the fitness of A relative to B. Let A1 and B1 be their populations in the first generation. In the
second generation, the probability of an individual being A is
$$p_A = \frac{f_A A_1}{f_A A_1 + f_B B_1}$$
where fA and fB are the respective fitnesses of A and B.
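As a quick check of the formula, take the hypothetical values $A_1 = 10$, $B_1 = 90$, $f_A = 2$ and $f_B = 1$ (the fitness values are assumed purely for illustration):

$$p_A = \frac{2 \cdot 10}{2 \cdot 10 + 1 \cdot 90} = \frac{20}{110} \approx 0.18$$

With equal fitnesses the formula returns the initial proportion, $10/100 = 0.1$, so in this example doubling the fitness of A raises its expected share of the second generation from 10% to about 18%.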
To test whether or not an allele improves fitness, we start with a known population of flies. We
allow them to breed and create a new generation, then sample them to see how many are type A.
We can then use this to estimate the fitness of type A.
In this lab, we will use simulation to estimate confidence intervals for fitness using the Monte
Carlo method (pg 724). Although this process is difficult to analyze using conventional probability
theory, the Monte Carlo method still works well.
2 Implementing the Monte Carlo Method
All the code in this section is necessary for running the Monte Carlo Method. I recommend putting
it all into a single source file.
First, we specify the different values of fA to use and the number of simulations to run for each
different value.
V=100
Fitness=seq(0,10,length=V)
S=5000 #Number of simulations per fitness value
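It can help to inspect the fitness grid before running anything. The sketch below rebuilds the same grid and checks its size, endpoints and spacing. The variable name `step` is introduced here for illustration only, and `length.out` is the full name of the argument that `length=` abbreviates.

```r
V <- 100
Fitness <- seq(0, 10, length.out = V)  # same grid as seq(0,10,length=V)

n_points <- length(Fitness)            # number of fitness values tried
endpoints <- range(Fitness)            # smallest and largest fitness value
step <- Fitness[2] - Fitness[1]        # spacing between neighbouring values, 10/99

print(n_points)
print(endpoints)
print(step)
```

Because the grid spacing is about 0.1, confidence interval endpoints read from a plot over this grid can only be resolved to roughly that precision. Increasing V gives a finer grid at the cost of more simulations.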
Set initial population sizes:
A1=10 #Initial A population
B1=90 #Initial B population
Specify the number of flies in the next generation and the size of the sample drawn from it:
N=100 #Size of 2nd Generation
n=10 #Sample size
To find the confidence interval we have to take an actual measurement and then, for each value of
the fitness, compute how likely the simulation result is to be above or below that measurement.
Measurement=3 #Number of As in our sample
AboveMeasurement=seq(V) #Initialize with a placeholder
BelowMeasurement=seq(V)
We step through the values of fitness in a for loop. In each step, we run S simulations simultaneously
and then use those simulations to try to approximate the probability of falling above or below the
measurement.
1. Calculate pA .
2. Simulate natural selection S times. Each simulation has N trials with a probability of success
pA , so it should have what type of distribution?
3. Calculate the number of type B flies in the 2nd generation by subtracting the number of type
A from the total.
4. Take a sample of n flies from each of the resulting S simulations. This sample follows a hypergeometric distribution.
5. Count up the number of simulations that are above and below the measurement.
Here is the code to do this, but it includes some errors which you should be able to spot.
for(i in seq(V)
p_A=Fitness(i)*A1/Fitness(i)*A1+B1
A2=rexp(S,size=N,prob=p_A) #Simulate New Generation
B2= #Calculate number of B in 2nd generation.
#No errors after here
a2=rhyper(S,A2,B2,n) #Sample n Flies, S times
AboveMeasurement[i]=mean(a2>=Measurement)
BelowMeasurement[i]=mean(a2<=Measurement)
}
Once you have fixed those errors, plot the probabilities AboveMeasurement and BelowMeasurement vs. fitness as points. Label the axes ‘fitness’ and ‘probability’. Add the appropriate line to
help you find the 95% confidence interval. Finally, add a vertical line at fitness value 1. Make sure
all your lines and points are different colors.
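The sampling step and the probability estimates at the end of the loop can also be tried in isolation. This is a minimal sketch, assuming a second generation with fixed counts of 30 type A and 70 type B flies. Those counts and the seed are hypothetical values chosen for illustration, whereas in the lab A2 and B2 come out of the simulation.

```r
set.seed(1)          # fixed seed so the run is repeatable (assumed value)
S <- 5000            # number of simulated samples
n <- 10              # flies drawn per sample
A2 <- 30             # hypothetical number of type A flies in the 2nd generation
B2 <- 70             # hypothetical number of type B flies in the 2nd generation
Measurement <- 3     # number of As observed in the real sample

# rhyper(nn, m, n, k): nn draws; the urn holds m "successes" and n "failures";
# k items are drawn without replacement each time.
a2 <- rhyper(S, A2, B2, n)

# The mean of a logical vector is the fraction of TRUE entries, so each
# line estimates a probability from the S simulated samples.
p_above <- mean(a2 >= Measurement)
p_below <- mean(a2 <= Measurement)

print(c(p_above, p_below))
```

Both comparisons include the case where the simulated count equals the measurement, so the two estimates always sum to at least 1.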
3 Questions!!!
Once your code from the previous section is working, you can answer the following by tweaking
it slightly. In these questions, find the confidence interval graphically. The confidence interval is
always the 95% confidence interval.
1. Set the sample size equal to 1. What is the confidence interval if the sampled fly is an A? If it is a B?
2. Set the sample size equal to 10. Suppose you find 3 As in your sample. Find the confidence
interval and provide the plot.
3. Reproduce the plot from above with S = 100. What do you notice?
4. With sample size 5, how many As must you find so that your confidence interval lies entirely
above 1? Include the plot corresponding to this number.
5. With sample size 10, how many As must you find so that your confidence interval lies entirely
above 1? Include the plot corresponding to this number.
6. With sample size 20, how many As must you find so that your confidence interval lies entirely
above 1? Include the plot corresponding to this number.