1180:Lab12
James Moore
April 23rd, 2013

1 Inferring Natural Selection from Population Changes

In evolutionary theory, we typically define an individual's fitness as the relative number of offspring it produces. Suppose we have two populations of flies: A and B. We want to determine the fitness of A relative to B. Let $A_1$ and $B_1$ be their populations in the first generation. In the second generation, the probability of an individual being A is

$$p_A = \frac{f_A A_1}{f_A A_1 + f_B B_1}$$

where $f_A$ and $f_B$ are the respective fitnesses of A and B.

To test whether or not an allele improves fitness, we start with a known population of flies. We allow them to breed and create a new generation, then sample them to see how many are type A. We can then use this count to estimate the fitness of type A.

In this lab, we will use simulation to estimate confidence intervals for the fitness using the Monte Carlo method (pg 724). Although this process is difficult to analyze using conventional probability theory, the Monte Carlo method will still work just fine.

2 Implementing the Monte Carlo Method

All the code in this section is necessary for running the Monte Carlo method. I recommend putting it all into a single source file.

First, we specify the different values of $f_A$ to use and the number of simulations to run for each value.

    V=100
    Fitness=seq(0,10,length=V)
    S=5000 #Number of simulations per fitness value

Set initial population sizes:

    A1=10 #Initial A population
    B1=90 #Initial B population

Specify the number of flies in the next generation and the sample size:

    N=100 #Size of 2nd Generation
    n=10 #Sample size

To find the confidence interval we have to take an actual measurement and then, for each value of the fitness, compute how likely the simulation result is to be above or below that measurement.

    Measurement=3 #Number of As in our sample
    AboveMeasurement=seq(V) #Initialize with a placeholder
    BelowMeasurement=seq(V)

We step through the values of fitness in a for loop.
In each step, we run S simulations simultaneously and then use those simulations to approximate the probability of falling above or below the measurement.

1. Calculate $p_A$.
2. Simulate natural selection S times. Each simulation has N trials with a probability of success $p_A$, so it should have what type of distribution?
3. Calculate the number of type B flies in the 2nd generation by subtracting the number of type A from the total.
4. Take a sample of n flies from each of the resulting S simulations. This follows a hypergeometric distribution.
5. Count up the number of simulations that are above and below the measurement.

Here is the code to do this, but it includes some errors which you should be able to spot.

    for(i in seq(V)
      p_A=Fitness(i)*A1/Fitness(i)*A1+B1
      A2=rexp(S,size=N,prob=p_A) #Simulate New Generation
      B2= #Calculate number of B in 2nd generation.
      #No errors after here
      a2=rhyper(S,A2,B2,n) #Sample n Flies, S times
      AboveMeasurement[i]=mean(a2>=Measurement)
      BelowMeasurement[i]=mean(a2<=Measurement)
    }

Once you have fixed those errors, plot the probabilities AboveMeasurement and BelowMeasurement vs fitness as points. Label the axes as 'fitness' and 'probability'. Add the appropriate line to help you find the 95% confidence interval. Finally, add a vertical line at fitness value 1. Make sure all your lines and points are different colors.

3 Questions!!!

Once your code from the previous section is working, you can answer the following by tweaking it slightly. In these questions, find the confidence interval graphically. The confidence interval is always the 95% confidence interval.

1. Set the sample size equal to 1. What is the confidence interval if the sampled fly is an A? If it is a B?
2. Set the sample size equal to 10. Suppose you find 3 As in your sample. Find the confidence interval and provide the plot.
3. Reproduce the plot from above with S = 100. What do you notice?
4. With sample size 5, how many As must you find so that your confidence interval lies entirely above 1? Include the plot corresponding to this number.
5. With sample size 10, how many As must you find so that your confidence interval lies entirely above 1? Include the plot corresponding to this number.
6. With sample size 20, how many As must you find so that your confidence interval lies entirely above 1? Include the plot corresponding to this number.
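
As a quick check on why fitness value 1 is the reference point in these questions, the formula for $p_A$ from Section 1 can be evaluated by hand. This is only a sketch: it assumes $f_B = 1$ (so that $f_A$ is the fitness of A relative to B) and the initial populations $A_1 = 10$, $B_1 = 90$ set in Section 2. The three values of $f_A$ are chosen purely for illustration.

```latex
\[
p_A = \frac{f_A A_1}{f_A A_1 + f_B B_1}, \qquad f_B = 1,\ A_1 = 10,\ B_1 = 90
\]
\begin{align*}
f_A = 1:&\quad p_A = \frac{10}{10 + 90} = 0.1 \\
f_A = 2:&\quad p_A = \frac{20}{20 + 90} = \frac{2}{11} \approx 0.18 \\
f_A = 10:&\quad p_A = \frac{100}{100 + 90} = \frac{10}{19} \approx 0.53
\end{align*}
```

At $f_A = 1$ the expected share of A in the second generation equals its share in the first (10 out of 100), which is the no-selection case. A confidence interval lying entirely above 1 is therefore evidence that type A is fitter than type B.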