Presenting Simulation Results

Example: How to present simulation output badly.
What should be presented visually?
What should be presented orally?
Example
A common problem in statistical research is to compare the quality of
different estimators by simulation.
In this example, we will compare the sample mean and sample median,
using simulated random data.
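We will study both statistics as estimators of the population mean µ.
For the symmetric distributions used here, the population mean and median coincide, so both statistics estimate the same quantity.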
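As a quick illustration of why the choice of estimator matters, the made-up sample below has five values near 10 and one gross outlier. The numbers are purely illustrative and are not part of the simulation study.

```r
# Toy sample (made-up values): five observations near 10 plus one gross outlier
y <- c(8, 9, 10, 11, 12, 70)
mean(y)     # 20: dragged far from 10 by the single outlier
median(y)   # 10.5: stays with the bulk of the data
```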
Example: Theory
It is known that the sample mean is the value of µ that minimizes the sum of squares
$$\sum_{i=1}^{n} (y_i - \mu)^2.$$
It is also the maximum likelihood estimator of µ under a normal
assumption on the y's.
The sample median is the value of µ that minimizes the sum of the
absolute differences
$$\sum_{i=1}^{n} |y_i - \mu|.$$
It is also the maximum likelihood estimator of µ under a double
exponential (Laplace) assumption on the y's.
Thus, we expect the sample mean to have smaller variance for normal-like data,
and the sample median to have smaller variance for heavy-tailed data or data with outliers.
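The two minimization properties can be checked numerically on a single sample. This sketch, assuming an arbitrary normal sample of size 25, minimizes each criterion with R's `optimize()` and compares the minimizer with `mean()` and `median()`.

```r
# Numerical check of the two minimization properties on one simulated sample
set.seed(1)
y <- rnorm(25)                          # sample size 25 is an arbitrary choice
sse <- function(mu) sum((y - mu)^2)     # sum of squared deviations
sad <- function(mu) sum(abs(y - mu))    # sum of absolute deviations
# Search for each minimizer between the smallest and largest observation
fit.sse <- optimize(sse, interval = range(y), tol = 1e-8)
fit.sad <- optimize(sad, interval = range(y), tol = 1e-8)
c(argmin.sse = fit.sse$minimum, sample.mean = mean(y))
c(argmin.sad = fit.sad$minimum, sample.median = median(y))
```

An odd sample size is used so that the median, and hence the minimizer of the absolute-deviation criterion, is unique.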
Simulation Study
We will verify these theoretical ideas using simulation.
Several distributions will be simulated: the normal distribution and t distributions with 20, 10 and 2 degrees of freedom.
Sample sizes of 10, 30 and 100 will be studied.
In each case, the sample mean and median will be computed.
Boxplots will be used to display the simulation output for ease of comparison.
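For normal data, theory gives a benchmark for this comparison: the asymptotic efficiency of the sample median relative to the sample mean is 2/π ≈ 0.64. The sketch below, assuming a sample size of 101 and 5000 replicates (both illustrative choices), estimates that ratio by simulation and reports it next to 2/π.

```r
# Monte Carlo estimate of var(mean) / var(median) for normal data
set.seed(2024)
n <- 101                               # sample size (odd, so the median is one order statistic)
N <- 5000                              # number of simulated samples
x <- matrix(rnorm(n * N), nrow = n)    # one sample per column
xmean <- colMeans(x)
xmedian <- apply(x, 2, median)
rel.eff <- var(xmean) / var(xmedian)   # efficiency of the median relative to the mean
c(simulated = rel.eff, asymptotic = 2 / pi)
```

A ratio below 1 means the mean has the smaller variance, which is what the theory predicts for normal data.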
Simulation Output
[Figure, three slides: grids of side-by-side boxplots; each panel compares two groups labeled only "1" and "2", with no panel titles, axis labels, or legend.]
Criticizing the Example
What should be added to the simulation output slides?
What additional information is needed?
... before the output is provided
... with the output
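One way to make a panel self-describing is to name the groups, state the distribution, sample size and number of replicates in the title, mark the true value, and report the variances alongside. The sketch below does this for a single normal panel; the titles, labels and settings are illustrative choices, not taken from the original slides.

```r
# A self-describing version of one simulation panel (illustrative settings)
set.seed(1)
n <- 10; N <- 1000
x <- matrix(rnorm(n * N), nrow = n)    # each column is one sample of size n
est <- list("Sample mean"   = colMeans(x),
            "Sample median" = apply(x, 2, median))
b <- boxplot(est,
             main = sprintf("Normal data, n = %d, %d replicates", n, N),
             ylab = "Estimate of mu (true value 0)")
abline(h = 0, lty = 2)                 # dashed reference line at the true mean
# Numerical summary to present with the plot
variances <- sapply(est, var)
round(variances, 4)
```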
Appendix: Conducting a Simulation Study
We will first write a function which
• simulates the random data for several samples
• calculates the means and medians for each sample
• creates a boxplot of the means and medians for visual comparison
• computes the variances of the means and medians for numerical comparison
Mean/Median Comparison Function
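meanmediancomparison <- function(n=10, N=1000, rand.gen=rnorm) {
  # Simulate N samples of size n: each column of x is one sample
  x <- matrix(rand.gen(n*N), nrow=n)
  # Sample mean and sample median of each column
  xmean <- apply(x, 2, mean)
  xmedian <- apply(x, 2, median)
  # Side-by-side boxplots: group 1 = means, group 2 = medians
  boxplot(xmean, xmedian)
  # Variance of each estimator across the N simulated samples
  vmean <- var(xmean)
  vmedian <- var(xmedian)
  return(c("variance of means"=vmean, "variance of medians"=vmedian))
}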
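The function stores each simulated sample as a column of a matrix and then summarizes column by column. This small sketch, with deliberately tiny illustrative sizes, shows that layout and that `apply(x, 2, mean)` agrees with the faster built-in `colMeans(x)`.

```r
# How the simulated data are laid out (tiny illustrative sizes)
set.seed(3)
n <- 5; N <- 4
x <- matrix(rnorm(n * N), nrow = n)   # n rows, N columns: column j is sample j
dim(x)                                # 5 4
# apply(x, 2, f) applies f to each column, i.e. to each simulated sample
xmean <- apply(x, 2, mean)
all.equal(xmean, colMeans(x))         # TRUE: colMeans gives the same means
length(xmean)                         # 4: one mean per sample
```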
Making the Comparisons for Different Distributions
We would like to compare the means and medians under several different
data-generating distributions.
Because the comparison function takes the random-number generator as its argument
rand.gen, different kinds of random data can be simulated without changing its code.
The next function calls the comparison function 4 times, once per distribution, and uses
its output to produce a 2 × 2 grid of boxplots and
a table of variances.
Simulating from Different Distributions
meanmediansimulation <- function(n0=10, N=1000) {
  # 2 x 2 grid: one boxplot panel per distribution
  par(mfrow=c(2,2))
  # Pass N on to each call so that the N argument takes effect
  sim1 <- meanmediancomparison(n=n0, N=N, rand.gen=rnorm)
  sim2 <- meanmediancomparison(n=n0, N=N, rand.gen=function(n) rt(n, df=20))
  sim3 <- meanmediancomparison(n=n0, N=N, rand.gen=function(n) rt(n, df=10))
  sim4 <- meanmediancomparison(n=n0, N=N, rand.gen=function(n) rt(n, df=2))
  # One row of variances per distribution
  variance.table <- data.frame(rbind(sim1, sim2, sim3, sim4))
  row.names(variance.table) <- c("normal", "t20", "t10", "t2")
  return(variance.table)
}
Using Different Sample Sizes
We would also like to see the effect of the sample size.
Because the earlier functions take the sample size as an argument, it is easy
to produce one set of graphs for each sample size that we would like
to test.
Here we compare sample sizes 10, 30 and 100.
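n <- c(10, 30, 100)
output <- vector(mode="list", length=length(n))
names(output) <- paste0("n=", n)   # label each variance table by its sample size
# Write all graphs to one PDF file; each call fills one page with a 2 x 2 grid
pdf("meanmediansimulationexperiment.pdf")
for (i in seq_along(n)) {
  output[[i]] <- meanmediansimulation(n0=n[i])
}
dev.off()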
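The loop returns a separate variance table for each sample size. A single table covering every distribution and sample size, with the ratio var(mean) / var(median) as a summary column, may be easier to present. The sketch below builds such a table without plotting; the helper `variance.pair()` is hypothetical (not part of the original code), and it reuses the same four distributions and N = 1000 as `meanmediansimulation()`.

```r
# Hypothetical helper: variances of the sample mean and median, no plotting
variance.pair <- function(n, N, rand.gen) {
  x <- matrix(rand.gen(n * N), nrow = n)   # one sample per column
  c(mean = var(colMeans(x)), median = var(apply(x, 2, median)))
}

set.seed(42)
gens <- list(normal = rnorm,
             t20 = function(n) rt(n, df = 20),
             t10 = function(n) rt(n, df = 10),
             t2  = function(n) rt(n, df = 2))
sizes <- c(10, 30, 100)

# One row per (distribution, sample size) combination
grid <- expand.grid(dist = names(gens), n = sizes, stringsAsFactors = FALSE)
res <- t(mapply(function(d, n) variance.pair(n, 1000, gens[[d]]),
                grid$dist, grid$n, USE.NAMES = FALSE))
# ratio < 1 favours the mean, ratio > 1 favours the median
summary.table <- data.frame(grid, res,
                            ratio = res[, "mean"] / res[, "median"])
print(summary.table, digits = 3)
```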