Calculus for Biologists Lab Math 1180-002 Spring 2012

Lab #7 - Descriptive Statistics for Simulated Populations
Report due date: Tuesday, March 6, 2012 at 9 a.m.
Goal: To use the geometric mean to describe data, and to visualize the variance and standard deviation of data.
• Create a new script, either in R (laptop) or with a text editor (Linux computers).
Geometric Mean
Suppose we have a population of beings (pick your favorite one) that experiences really good times (years,
months, days, etc.) and somewhat bad times. We’ll assume (as in lecture) that in each time step the population
reproduces at a rate R of 0.7, 0.9, or 1.5, each with equal probability. The rate is chosen at random
at every time step. We will simulate 50 different populations for a total of 30 time steps. Each population will have an
initial size of 10.
Notice that in the code to follow the storage matrix for reproduction rates has one fewer row than that for the
population sizes. This is because we only need to choose exactly 30 rates; the population size, on the other hand,
will be updated 30 times in addition to the initial condition at time 0.
R.list = c(0.7,0.9,1.5) ## save rates to a list
## define a function that picks some number C rates from R.list
rates = function(C){
  sample(R.list,C,replace=TRUE, prob=c(1/3,1/3,1/3))
}
total.time = 30 ## time steps
n.pops = 50 ## number of populations
N = matrix(0,total.time+1,n.pops) ## storage table for population sizes
R = matrix(0,total.time,n.pops) ## storage table for reproduction rates
## Simulate!
N0 = 10 ## initial population size
N[1, ] = N0 ## initialize storage table
for (j in 1:total.time){
  R[j, ] = rates(n.pops) ## choose CURRENT rate
  N[j+1, ] = R[j, ]*N[j, ] ## update NEW population size with this rate
}
Now that we have our populations in hand, let’s see what they look like. Remember that the standard form of the
population size after t time steps with reproduction rates $R_0, R_1, \ldots, R_{t-1}$ is
$$N_t = R_{t-1} R_{t-2} \cdots R_2 R_1 R_0 N_0$$
As an exercise, you should choose one of your populations (1 through 50) and plot the simulated solution and
the one obtained based on the formula above (using the rates chosen for that same population).
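The product formula above can be checked against the step-by-step update on a small example. The sketch below uses made-up rates and hypothetical names (r.example, N0.example, n.iter, n.formula) that are not part of the lab script; it only illustrates that the two approaches agree.

```r
## Hypothetical rates for one population over 5 time steps
## (made up for illustration, not taken from the lab's simulation)
r.example = c(1.5, 0.7, 0.9, 1.5, 0.7)
N0.example = 10

## Step-by-step update, as in the simulation loop
n.iter = numeric(length(r.example) + 1)
n.iter[1] = N0.example
for (j in seq_along(r.example)){
  n.iter[j+1] = r.example[j]*n.iter[j]
}

## Product formula: N_t = R_{t-1} ... R_1 R_0 N_0, built with a cumulative product
n.formula = N0.example*c(1, cumprod(r.example))

all.equal(n.iter, n.formula) ## compare the two trajectories
```

For one of the simulated populations, say population k, the column R[ ,k] would play the role of r.example and N[ ,k] the role of n.iter.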
We can determine the expected population trajectory with the true geometric mean, which can be calculated
using the following:
$$\tilde{R} = e^{\sum_i \ln(r_i)\, p_i}, \tag{1}$$
where $r_i$ takes on the values in R.list and $p_i$ is the probability of choosing $r_i$. Use the formula in (1) to
calculate the actual geometric mean of this system, and save the result.
true.geom.mean = # ???
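To see how formula (1) translates into R without working the exercise itself, here is a minimal sketch with made-up rates. The names r.demo, p.demo, geom.demo and arith.demo are hypothetical, and the rates are deliberately different from those in R.list.

```r
## Hypothetical rates and probabilities (NOT the lab's R.list)
r.demo = c(0.5, 2)
p.demo = c(1/2, 1/2)

## Geometric mean from formula (1): exp of the expected log rate
geom.demo = exp(sum(log(r.demo)*p.demo))

## Arithmetic mean, for comparison
arith.demo = sum(r.demo*p.demo)

geom.demo
arith.demo
```

With these rates a halving and a doubling cancel exactly, so the geometric mean is 1 even though the arithmetic mean is 1.25.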
Now we’re ready to plot some things. The following code will create two side-by-side plots of the data. The one
on the left shows the untransformed population sizes. The one on the right shows the natural log of these
populations, which makes it easier to see the behavior of the solutions. (The default base for R’s log command is e.)
The thick black line in each plot is the untransformed or log-transformed solution $N_t = \tilde{R}^t N_0$, where
$\tilde{R}$ is the true.geom.mean.
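One way to see why the log plot is easier to read: taking the natural log of the solution built from the geometric mean turns it into a straight line in t. A short derivation, using only the symbols already defined:

$$\ln N_t = \ln\left(\tilde{R}^t N_0\right) = t \ln \tilde{R} + \ln N_0$$

So on the log scale the thick black line has slope $\ln \tilde{R}$ and intercept $\ln N_0$.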
times = 0:total.time
colors=rainbow(n.pops)
par(mfrow=c(1,2))
matplot(times, N, type = "l", col = colors, xlab="Time",lty=1,
main="Simulated populations")
lines(times,true.geom.mean^times*N0, type="l", col="black", lwd=2)
matplot(times, log(N), type = "l", col = colors, xlab="Time",lty=1,
main="log simulated populations")
lines(times,log(true.geom.mean^times*N0), type="l", col="black", lwd=2)
abline(h=0,lty=2)
Note the dashed line in the graph on the right. This tells you when the population gets into trouble. (Think about
why this is!)
Plot 7.1: Save this figure to include in your assignment.
Variance
There are other interesting pieces of information you can obtain from the data. One is the spread (or variability).
There are two statistical measures that describe this: variance (Var) and standard deviation (σ). Recall that
Var = σ², so if we can find the variance, we can easily compute the standard deviation.
Now, instead of computing the spread of, say, reproductive rates, we are interested in looking at the spread of the
population sizes for each time point between 0 and 30. A good way to do this is to check the expected value of
the squared difference of the populations from the overall mean at that time. In computer-world, it’s a lot easier
to calculate the difference between the expected squared data and the square of the expectation, for each time t:
$$\sigma^2 = \sum_{i=1}^{n} x_i^2\, p_i - \bar{X}^2 \tag{2}$$
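For reference, the two descriptions of the variance given above (the expected squared difference from the mean, and the expected square minus the squared expectation) are the same quantity. A short derivation, assuming only that $\bar{X} = \sum_i x_i p_i$ and $\sum_i p_i = 1$:

$$\sum_{i=1}^{n} (x_i - \bar{X})^2 p_i = \sum_{i=1}^{n} x_i^2 p_i - 2\bar{X} \sum_{i=1}^{n} x_i p_i + \bar{X}^2 \sum_{i=1}^{n} p_i = \sum_{i=1}^{n} x_i^2 p_i - 2\bar{X}^2 + \bar{X}^2 = \sum_{i=1}^{n} x_i^2 p_i - \bar{X}^2$$

The right-hand side is exactly the expression in (2).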
This means we need several pieces of information:
1. the value of each population size ($x_i$)
2. the probability for each population size at each time ($p_i$)
3. the expected value at each time ($\bar{X}$)
Note that here, i spans 1 to 50 and corresponds to the individual populations. In the code, j will reference each
time point. We will first compute the expected population sizes for every time, as well as the expectation of the
squared population sizes.
expect = rep(0,total.time+1) ## save expected values
squared.expect = rep(0,total.time+1) ## save second term of the variance formula
expect.of.square = rep(0,total.time+1) ## save first term of the variance formula
for (j in 1:(total.time+1)){
  x.i = as.numeric(names(table(N[j, ]))) ## all pop. sizes obtained at time j
  p.i = as.numeric(table(N[j, ]))/n.pops ## the fraction of pops. that were of each size
  expect[j] = sum(x.i*p.i)
  squared.expect[j] = expect[j]^2
  expect.of.square[j] = sum(x.i^2*p.i)
}
The as.numeric() command in the loop forces things to be numbers (as opposed to strings of “letters”). We
end with the variance:
variance = expect.of.square - squared.expect
And, plot it.
plot(times,variance,type="o",xlab="Time",main="Variance of simulated data")
Plot 7.2: Save this figure to include in your assignment.
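As a sanity check on the variance calculation, the sketch below applies the same table-based recipe to a small made-up vector and compares it with two other ways of getting the variance. The data and the names x.demo, n.demo, tab, var.table, var.direct and var.sample are hypothetical and not part of the lab script.

```r
## Hypothetical population sizes at a single time point
## (made up for illustration, not from the lab's simulation)
x.demo = c(7, 7, 9, 15, 15, 15)
n.demo = length(x.demo)

tab = table(x.demo)              ## counts of each distinct size
x.i = as.numeric(names(tab))     ## distinct sizes; names() returns strings, so convert
p.i = as.numeric(tab)/n.demo     ## fraction of values at each size

var.table = sum(x.i^2*p.i) - sum(x.i*p.i)^2    ## formula (2)
var.direct = mean(x.demo^2) - mean(x.demo)^2   ## same quantity without table()
var.sample = var(x.demo)                       ## R's built-in var() divides by n - 1

all.equal(var.table, var.direct)
all.equal(var.table, var.sample*(n.demo - 1)/n.demo)
```

Note that R's built-in var() returns the sample variance (dividing by n - 1), so it is slightly larger than the value from formula (2) unless rescaled as in the last line.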
Standard Deviation
Since the standard deviation is just the positive square root of the variance, we can easily compute it in R, using
the sqrt function. This – you guessed it – takes the square root of its argument.
std.dev = sqrt(variance)
Many scientific data are plotted on graphs with “error bars” at each point. In some cases, these bars represent
one standard deviation distance above and below the mean plotted. That’s what you are going to do next. You
should get the following warning when you run the lines of code below:
Warning message:
In arrows(times, expect - std.dev, times, expect + std.dev, angle = 90, :
  zero-length arrow is of indeterminate angle and so skipped
You can ignore this warning. It just says that the population sizes at one of the time points have a standard deviation of 0.
Can you guess which one this is?
par(mfrow=c(1,1))
## set up plot for data, but don't plot anything yet
matplot(times,N,type="n",xlab="Time",ylim=c(floor(min(expect-std.dev)/10)*10,max(N)),
main = "Population spread over time")
## add grid lines to the plot region
abline(v=times,h=seq(floor(min(expect-std.dev)/10)*10,max(N),by=25), col="lightgray")
## now actually plot all the data, as blue dots
matpoints(times,N,col="deepskyblue",pch=20,cex=0.75)
## plot the expected values at each time point, as black diamonds
points(times,expect,pch=18,col="black",cex=1.25)
## plot lines extending from mean - std.dev to the mean + std.dev
arrows(times, expect-std.dev, ## first set of points
       times, expect+std.dev, ## second set of points
       angle=90, ## don't plot arrowheads; plot flat heads instead
       code=3, ## plot "arrows" at both ends
       length=0.05, ## arrowhead length
       lty=1, col="black")
Plot 7.3: Save this plot to include in your assignment.
• Save your script so that you can use it for your assignment.