Presenting Simulation Results

Example: How to present simulation output badly.

What should be presented visually?

What should be presented orally?

Example

A common problem in statistical research is to compare the quality of different estimators by simulation.

In this example, we will compare the sample mean and sample median, using simulated random data.

We will study both of these statistics as estimators for the population mean µ.

Example: Theory

It is known that the sample mean minimizes the sum of squares

$$\sum_{i=1}^{n} (y_i - \mu)^2.$$

This is equivalent to the maximum likelihood estimator under a normal assumption on the y’s.

The sample median is an estimator which minimizes the sum of the absolute differences

$$\sum_{i=1}^{n} |y_i - \mu|.$$

This is equivalent to the maximum likelihood estimator under a double exponential assumption on the y’s.

Thus, we expect the sample mean to behave better for normal-like data and the sample median to behave better when there are outliers.

Simulation Study

We will verify these theoretical ideas using simulation.

Various distributions, including the normal distribution, will be simulated.

Various sample sizes will be studied.

In each case, the sample mean and median will be computed.

Boxplots will be used to display the simulation output for ease of comparison.

Simulation Output

[Figure: three slides of boxplot panels, each slide titled "Simulation Output"; the boxes in every panel are labeled only 1 and 2.]

Criticizing the Example

What should be added to the simulation output slides?
What additional information is needed?

... before the output is provided

... with the output

Appendix: Conducting a Simulation Study

We will first write a function which

• simulates the random data for several samples
• calculates the means and medians for each sample
• creates a boxplot of the means and medians for visual comparison
• computes the variances of the means and medians for numerical comparison

Mean/Median Comparison Function

    meanmediancomparison <- function(n=10, N=1000, rand.gen=rnorm) {
        # n = sample size, N = number of simulated samples,
        # rand.gen = function generating the random data
        x <- matrix(rand.gen(n*N), nrow=n)   # one sample per column
        xmean <- apply(x, 2, mean)
        xmedian <- apply(x, 2, median)
        boxplot(xmean, xmedian)              # box 1 = means, box 2 = medians
        vmean <- var(xmean)
        vmedian <- var(xmedian)
        return(c("variance of means"=vmean,
                 "variance of medians"=vmedian))
    }

Making the Comparisons for Different Distributions

We would really like to compare the means and medians under different scenarios.

Our comparison function has been constructed so that different kinds of random data can be simulated.

The next function will call the comparison function four times, and will use the output from the comparison function to produce a 2 × 2 graph and a table of variances.

Simulating from Different Distributions

    meanmediansimulation <- function(n0=10, N=1000) {
        par(mfrow=c(2,2))   # 2 x 2 grid: one panel per distribution
        # N is passed on so that the argument actually takes effect
        sim1 <- meanmediancomparison(n=n0, N=N, rand.gen=rnorm)
        sim2 <- meanmediancomparison(n=n0, N=N,
                                     rand.gen = function(n) rt(n, df=20))
        sim3 <- meanmediancomparison(n=n0, N=N,
                                     rand.gen = function(n) rt(n, df=10))
        sim4 <- meanmediancomparison(n=n0, N=N,
                                     rand.gen = function(n) rt(n, df=2))
        variance.table <- data.frame(rbind(sim1, sim2, sim3, sim4))
        row.names(variance.table) <- c("normal", "t20", "t10", "t2")
        return(variance.table)
    }

Using Different Sample Sizes

We would also like to see the effects of different sample sizes.

Having constructed the earlier functions in the way we have, it is easy to produce multiple graphs, one for each sample size that we would like to test.

Here we compare at sample sizes 10, 30 and 100.
    n <- c(10, 30, 100)
    output <- vector(mode="list", length=length(n))
    pdf("meanmediansimulationexperiment.pdf")   # one page of plots per sample size
    for (i in seq_along(n)) {
        output[[i]] <- meanmediansimulation(n0=n[i])
    }
    dev.off()   # close the PDF device so the file is written
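
The variance tables collected in `output` could also be gathered into a single table for numerical comparison. The sketch below is an illustration, not part of the original appendix. The helper `compare_mean_median`, the seed, and the column names are all assumptions. It repeats the simulation without plotting and adds the ratio of the variance of the medians to the variance of the means, so that values above 1 favor the mean.

```r
# Illustrative sketch: compare_mean_median() and the column names are
# hypothetical and do not appear in the original slides.
compare_mean_median <- function(n, N = 1000, rand.gen = rnorm) {
  x <- matrix(rand.gen(n * N), nrow = n)   # one sample of size n per column
  c(var.mean   = var(colMeans(x)),
    var.median = var(apply(x, 2, median)))
}

set.seed(1)   # arbitrary seed, chosen only for reproducibility

sample.sizes <- c(10, 30, 100)
generators <- list(
  normal = rnorm,
  t20    = function(n) rt(n, df = 20),
  t10    = function(n) rt(n, df = 10),
  t2     = function(n) rt(n, df = 2)
)

# One row per (sample size, distribution) combination
results <- do.call(rbind, lapply(sample.sizes, function(n) {
  do.call(rbind, lapply(names(generators), function(d) {
    v <- compare_mean_median(n, rand.gen = generators[[d]])
    data.frame(distribution = d, n = n,
               var.mean   = v[["var.mean"]],
               var.median = v[["var.median"]])
  }))
}))

# Ratio above 1: the mean varied less than the median in this scenario
results$ratio <- results$var.median / results$var.mean

print(results, digits = 3, row.names = FALSE)
```

Because every row names its distribution and sample size, the table can be read without referring back to the code, unlike the boxes labeled 1 and 2.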