Random numbers and stochastic simulations in Excel

advertisement
Random numbers and stochastic simulations in Excel


Functions in bold type require that PopTools be installed. When a PopTools
function duplicates one of Excel’s built-in functions, the PopTools function is
often easier to read and remember and is also based on algorithms that are more
numerically robust.
To generate a new set of random numbers, hit the ‘F9’ key. This replaces all the
random numbers in the worksheet and updates any other calculations that depend
on them.
Functions for generating random numbers
These are the distributions that you will most commonly be using in this class. There are
a variety of other distributions available, both in Excel and PopTools, which use similar
syntax. In the “Paste Function” dialog box, look under either “Statistical” or “PopTools
random variables” to see what’s available.
Continuous distributions
RAND()
Returns a pseudo-random number uniformly distributed between 0 and 1.
Either implicitly or explicitly, this is the basis for all the other built-in
random number functions.
dRandReal(l, u)
Returns a pseudo-random number that is uniformly distributed
between l and u. This is the basis (although you never see it) for all the
other PopTools random number functions.
NORMINV(RAND(), m, s) Returns a pseudo-random number that is normally
distributed with mean m and standard deviation s.
dNormalDev(m, s) Returns a pseudo-random number that is normally distributed with
mean m and standard deviation s.
LNORMINV(RAND(), m, s) Returns a pseudo-random number that is log-normally
distributed with mean m and standard deviation s.
dLogNormalDev(m, s)
Returns a pseudo-random number that is log-normally
distributed with mean m and standard deviation s.
BETINV(RAND(), m, s, l, u) Returns a pseudo-random number that is beta distributed
with mean m and standard deviation s, and bounded between l and u (the
beta distribution has a central tendency, like the normal, but only a finite
range of allowable values; it is often used with l = 0 and u = 1).
dBetaDev(m, s)
Returns a pseudo-random number that is beta distributed with
mean m and standard deviation s, and bounded between zero and one.
Discrete distributions
BINOMINV(RAND(), N, p) Returns a pseudo-random number that is binomially
distributed with sample size N and success probability p.
dBinomialDev(N, p) Returns a pseudo-random number that is binomially distributed
with sample size N and success probability p.
POISINV(RAND(), m)
Returns a pseudo-random number that is Poisson
distributed with mean m.
dPoissonDev(m)
Returns a pseudo-random number that is Poisson distributed with
mean m.
Using PopTools to run a stochastic simulation
Building an Excel spreadsheet to do replicate simulations using built-in tools is tedious
process, especially when you consider that instead of 100 replicates, we usually want
1000 or even 10,000! Fortunately, PopTools provides an easier way.
The model
We will use the Ricker model with the parameters given in Table 4.2 of the textbook,
applicable to the checkerspot butterfly population. The following figure shows the initial
setup for the model (note that I have named the cells that have the parameter values in
them, so that the formulas are clearer; the parameter r0 is what is called r in the text –
Excel does not allow us to use ‘r’ as a cell name). Cell B12 is the formula for the model.
Since we are interested in how long it takes the population to cross a quasi-extinction
threshold, we need to create two more columns: one giving the minimum value of N so
far, and one just replicating the QE threshold.
Using Monte Carlo analysis in PopTools
Now go to PopTools > Simulation tools > Monte Carlo analysis. Enter the range of the
minimum population size into the “Dependent range” and the column of QE values into
“test range”. Select “<=” and “Keep results”, and enter the desired number of replicates.
Don’t worry about percentiles.
Hit the “Go” button, and the program first does its calculations, and then asks for an
output cell. This will be the upper left cell of the output, which has 7 columns and as
many rows as your simulation.
CONGRATULATIONS!! You’ve performed your first Monte Carlo analysis.
You now have a bunch of new columns in your spreadsheet. The only one that’s really
useful at this point is the one labeled “<=Test1.” In this column are the number of
replicates whose minimum population size was less than or equal to the QE threshold at
each time step. It’s quite easy to use this to plot a CDF of extinction risk – simply divide
by the number of replicates to get the cumulative extinction probability.
You also now have a new worksheet, titled “Monte Carlo results 1”. This contains the
simulation results, showing the dependent variable (in this case, the minimum population
size) for each replicate simulation. Each replicate has a row, with time running from left
to right. You could use these data to construct CDFs for other QE thresholds.
The results are NOT dynamic – if you update the parameters they are not updated. Thus
if you want to look at different parameter values or initial population sizes, you will need
to re-run the analysis. The new simulation data will be stored in a new sheet, with the
helpful name “Monte Carlo results 2”. Thus you should get in the habit of either
renaming the sheets with informative names or keeping a careful record of what the
parameters were for each analysis.
If you want the simulation output to contain the actual population sizes, rather than the
minimum size to date, you will have to re-run the analysis with the column of population
sizes set as your dependent range. The summary results now tell you about the mean and
variance of population size at each time. You can also use the simulation results to plot
time series or histograms of population sizes. For example, select the last column from
the simulation results and choose PopTools > Simulation tools > Summary stats to get a
histogram.
However, unless your QE threshold is set to zero, using the test column will not be
useful, as a population could increase back over the threshold. The way to fix this is to
modify your simulation model to force the population to stay extinct if it crosses the QE
threshold, using the IF function. For example, if you replaced cell B12 with
=IF(B11>D11, B11*EXP(r0*(1-B11/K)+dNormalDev(0,s)), 0)
and carried this down, then the population would drop to zero the turn after it goes below
the QE threshold. Then you can use the test column to create the CDF as you did before.
Bootstrapping the data to get confidence intervals
This is a somewhat tedious task in excel. The first step is to generate a set of bootstapped
regression coefficients. We are going to do this by setting up a worksheet that uses
dynamic resampling and regression tools (both from PopTools), and then use the Monte
Carlo tool to replicate the output of this worksheet (the parameter estimates) many times.
Launch the Simple random sample tool from the PopTools→Sampling menu:
The “Sample size” should be the same as the number of rows of data. Be sure to check
“Sample with replacement” and uncheck “static result.”
Notice that the resulting output is an “array formula.” You can confirm that it’s dynamic
by hitting F9 to update.
Now we want to do a regression of column G on column F, using PopTools→Extra
Stats→Regression:
This also creates an array formula, and you can see that when the data in columns F and
G are updated, so is the regression results.
The quantities we are interested in are b0 (which is r0 in the model), b1 (which is -r0/K in
the model) and Residual St. dev (which is s in the model). We will use the Monte Carlo
tool to generate lots of replicate values of these:
The summary output it creates isn’t useful; instead we are interested in the new sheet it
creates (“Monte Carlo results x”). Let’s rename that sheet so that we remember what it
is:
We can now generate a simulation for each set of parameters – notice that time is now
running from left to right rather than top to bottom:
Fill this out to year 10 on the right and all the way down for the 1000 replicates:
Now let’s use the Monte Carlo tool to determine the probability of quasi-extinction by
year 10. First create a new column that is the minimum population size over the ten years
(so we don’t have populations “recovering” from quasi-extinction. Then use this column
as the “Dependent range” in the Monte Carlo tool.
It turns out we are limited by the number of columns, which is 256. This is the limit on
the number of replicate parameter values we can use. So, we run using just the first 250
rows:
This runs, and places values in the Monte Carlo results worksheet, but doesn’t generate
the summary statistics. This is because, for a few bootstrap replicates, the estimate of b1
is positive (which corresponds to a negative value of K) – this positive density
dependence leads to faster than exponential growth, and even within 10 years, the
population often diverges to Excel’s version of infinity, generating “#NUM!” as the
result… this causes the summary functions to fail. Since this is biologically unrealistic, it
is best to go through and delete those rows with positive values of b1.
This run works. Now lets copy the column “<=Test1” to a new worksheet, and sort it in
ascending order. Sort that column in ascending order, and convert it to an extinction risk.
Also calculate the confidence level associated with each value. These can be used to
produce the confidence plot.
What if we want to look at the whole cdf? Well, to do that we need to go back to the
stage where we were bootstrapping the regression parameter values. Create a simulation
that runs left to right, using the if statement to set it to zero if it ever drops below the QE
threshold. Then drag the row down to create 1000 replicate simulations. Now create the
CDF using the COUNTIF function (as far as I know, you cannot use a variable name
with this, you have to code in “50” directly). Use the TRANSPOSE array function to
convert this into a column. This column (normalized to be a probability) will be the
dependent variable for a monte carlo analysis.
This is quite slow, as you are doing 1000*1000 100-year simulations!
From the summary output you can see the mean CDF and the 95% confidence intervals.
From the new worksheet with the Monte Carlo results you can get all the confidence
intervals. The tedious part is that you have to sort each column separately.
Download