An Introduction to R: Monte Carlo Simulation
MWERA 2012
Emily A. Price, MS
Marsha Lewis, MPA
Dr. Gordon P. Brooks

Objectives and/or Goals
• Three main parts
  – Data generation in R
  – Basic Monte Carlo programming (e.g., loops)
  – Running simulations (e.g., investigating Type I errors)

Why Use Monte Carlo Methods?
• According to Mooney (1997), Monte Carlo simulations are useful to
  – Make inferences when only weak statistical theory exists for an estimator
  – Test null hypotheses under a variety of plausible conditions
  – Assess the quality of an inference method
  – Assess the robustness of parametric inference to assumption violations
  – Compare estimators' properties

What are Monte Carlo Methods?
• Experiments composed of random numbers to evaluate mathematical expressions (Gentle, 2003)
• Empirically determine the sampling distribution of a test statistic
• Computer-based methods for approximating values and properties of random variables (Braun & Murdoch, 2007)

Logic of Monte Carlo
• Mooney (1997) presents five steps
  1. Specify the pseudo-population in symbolic terms in such a way that it can be used to generate samples. That is, write code to generate data in a specific manner.
  2. Sample from the pseudo-population in ways that reflect the topic of interest.
  3. Calculate the estimate of θ in a pseudo-sample and store it in a vector.
  4. Repeat steps 2 and 3 t times, where t is the number of trials.
  5. Construct a relative frequency distribution of the resulting values, which is a Monte Carlo estimate of the sampling distribution of that estimate under the conditions specified by the pseudo-population and the sampling procedures.

Practical Issues/Considerations
• What software to use?
• How much time to run the simulation?
• Reproducibility of results
• Adequacy of random number generator

Why use R?
• It's FREE
• It is a flexible language that can be controlled by the user
• It uses a vector-based approach
• Depending on the package, built-in commands can minimize the amount of programming required for MC simulation
  – Make sure to load the required packages at the beginning of the session
• The R community has a plethora of information: help websites, listservs, textbooks, blogs
  – Manuals for R are available at http://cran.r-project.org/manuals.html

Part 1: Data Generation
• RNG and setting the seed
  – The purpose of the seed is to make results reproducible
• Initialize all parameters of interest
• Loops
• Print results
• Access output

Generating a Single Random Variable
• For each distribution, R provides four functions: PDF, CDF, quantile function, and simulation procedure
  – dnorm, pnorm, qnorm, rnorm respectively
• rnorm(n,mean=0,sd=1)
• runif(20,min=2,max=5)
• Distributions: normal, uniform, Poisson, beta, gamma, chi-square, Weibull, exponential

Try it, you'll like it!
• rnorm(n,mean=0,sd=1)
  – Generate 50 values from a normal distribution with a mean of 50 and sd of 10
• x <- sample(1:2,20,TRUE,prob=c(1/2,1/2))
  – This call draws 20 values from {1, 2} with equal probability (a coin flip); adapt it to generate data that mimics rolling a die

Generating Correlated Data
• X ~ Normal(20, 5), Y ~ Normal(40, 10), Corr(X,Y) = 0.6
  – 4 inputs
    • Sample size, mean vector, variance-covariance matrix, and method
  – 3 methods of data generation
    • Eigenvalue (default), Singular Value, and Cholesky

Try it, you'll like it!
• rmvnorm(n, mean, sigma, method) (from the mvtnorm package)
  – Generate data for 3 variables such that X ~ Normal(20, 5), Y ~ Normal(40, 10), Z ~ Normal(60, 15) and Corr(X,Y) = 0.6, Corr(X,Z) = 0.7, Corr(Y,Z) = 0.8

Part 2: Basic MC Programming
• Four steps (Braun & Murdoch, 2007)
  1. Understand the problem
  2. Work out a general idea of how to solve it
     • Flow charts
  3. Translate your general idea into a detailed implementation
     • Turn the flowchart into code
  4. Check: Does it work?

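The data-generation exercises from Part 1 can be sketched as follows. This is a minimal sketch, not the presenters' code: the seed value, the object names, the sample size of 1000 for the correlated data, and the reading of Normal(20, 5) as mean 20 and standard deviation 5 are all assumptions. To stay in base R, the correlated data are built by hand with a Cholesky factor (one of the three methods listed in Part 1) instead of calling rmvnorm(), which requires the mvtnorm package.

```r
# Assumed seed; any fixed integer makes the draws reproducible.
set.seed(2012)

# 50 draws from a normal distribution with mean 50 and sd 10.
x <- rnorm(50, mean = 50, sd = 10)

# 20 draws from a uniform distribution on [2, 5].
u <- runif(20, min = 2, max = 5)

# 20 rolls of a fair six-sided die.
die <- sample(1:6, size = 20, replace = TRUE)

# Correlated normal data for X, Y, Z.
# Assumption: the second parameter of Normal(., .) is a standard deviation.
mu  <- c(X = 20, Y = 40, Z = 60)
sds <- c(5, 10, 15)
R   <- matrix(c(1.0, 0.6, 0.7,
                0.6, 1.0, 0.8,
                0.7, 0.8, 1.0), nrow = 3, byrow = TRUE)

# Variance-covariance matrix: Sigma = D R D, with D = diag(sds).
Sigma <- diag(sds) %*% R %*% diag(sds)

# Cholesky method: chol() returns an upper-triangular U with
# t(U) %*% U = Sigma, so the rows of Z %*% U have covariance Sigma
# when the rows of Z are independent standard normal vectors.
n <- 1000
Z <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
xyz <- sweep(Z %*% chol(Sigma), 2, mu, "+")  # add each column's mean
colnames(xyz) <- names(mu)

round(colMeans(xyz), 1)  # should be near 20, 40, 60
round(cor(xyz), 2)       # should be near the target correlations
```

With mvtnorm installed and loaded, the analogous call would be rmvnorm(n, mean = mu, sigma = Sigma, method = "chol").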
Programming Commands*
• Loops
  – for, while, repeat
• Conditionals
  – if, ifelse
• Control statements
  – break, next
* We can't cover all programming aspects but wanted to mention other commands

Functions
• They are "self-contained units with a well-defined purpose" (Braun & Murdoch, 2007, p. 59)
  – Take an input, do some calculations, and produce an output
• In R, functions are objects and can be manipulated like other more common objects such as vectors, matrices, and lists
  – R provides source code for its own functions
• R allows you to write your own functions

Part 3: Running Simulations
• Trimmed mean sampling distribution
• Replicating a published Monte Carlo study in R
  – Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.

Questions
• Thank you for your time

References
• Braun, W. J., & Murdoch, D. J. (2007). A first course in statistical programming with R. New York: Cambridge University.
• Gentle, J. E. (2003). Random number generation and Monte Carlo methods (2nd ed.). New York: Springer-Verlag.
• Mooney, C. Z. (1997). Monte Carlo simulation (Sage University Paper series on Quantitative Applications in the Social Sciences, series no. 07-116). Thousand Oaks, CA: Sage.
• Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.
• Our code
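The trimmed-mean simulation listed under Part 3 could follow Mooney's five steps roughly as shown below. This is an illustrative sketch and not the presenters' code: the function name sim_trimmed_mean, the normal pseudo-population with mean 50 and sd 10, the sample size of 25, the 20% trimming, the 5000 trials, and the seed are all assumptions chosen for the example.

```r
# Hypothetical helper: Monte Carlo sampling distribution of a trimmed mean.
# All default values are illustrative assumptions.
sim_trimmed_mean <- function(trials = 5000, n = 25, mu = 50, sigma = 10,
                             trim = 0.2) {
  est <- numeric(trials)                  # storage vector for the estimates
  for (i in seq_len(trials)) {            # step 4: repeat steps 2 and 3
    y <- rnorm(n, mean = mu, sd = sigma)  # steps 1-2: sample from the pseudo-population
    est[i] <- mean(y, trim = trim)        # step 3: compute and store the statistic
  }
  est
}

set.seed(1997)  # assumed seed, fixed so the run can be reproduced
est <- sim_trimmed_mean()

# Step 5: relative frequency distribution of the stored estimates.
h <- hist(est, plot = FALSE)
rel_freq <- h$counts / sum(h$counts)

mean(est)  # Monte Carlo estimate of the expected value of the trimmed mean
sd(est)    # Monte Carlo estimate of its standard error
```

Calling plot(h) draws the histogram. Changing the rnorm() line to another generator (for example rexp() or rt()) is one way to study the same statistic under a non-normal pseudo-population.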