Designing Monte Carlo Simulation Studies

Xitao Fan, Ph.D.
Chair Professor & Dean
Faculty of Education
University of Macau
Getting Involved in Monte Carlo Simulation
Fan, X., Felsovalyi, A., Sivo, S. A., & Keenan, S. (2002). SAS for Monte
Carlo studies: A guide for quantitative researchers. Cary,
NC: SAS Institute, Inc.
Fan, X. (2012). Designing simulation studies. In H.
Cooper (Ed.), Handbook of Research Methods in
Psychology, Vol. 2 (pp. 427-444). Washington, DC:
American Psychological Association.
Peugh, J., & Fan, X. (In press). Enumeration index performance in generalized growth
mixture models: A Monte Carlo test of Muthén’s (2003) hypothesis. Structural
Equation Modeling.
Peugh, J., & Fan, X. (In press). Modeling unobserved heterogeneity using latent profile
analysis: A Monte Carlo simulation. Structural Equation Modeling.
Peugh, J., & Fan, X. (2012). How well does growth mixture modeling identify
heterogeneous growth trajectories? A simulation study examining GMM’s
performance characteristics. Structural Equation Modeling, 19, 204-226.
Fan, X., & Sivo, S. A. (2009). Using goodness-of-fit indices in assessing mean
structure invariance. Structural Equation Modeling, 16, 1-16.
Fan, X., & Sivo, S. (2007). Sensitivity of fit indices to model misspecification and model
types. Multivariate Behavioral Research, 42, 509-529.
Sivo, S. A., Fan, X., Witta, E. L., & Willse, J. T. (2006). The search for "optimal" cutoff
properties: Fit index criteria in structural equation modeling. Journal of Experimental
Education, 74, 267-288.
Fan, Xitao, & Fan, Xiaotao. (2005). Power of latent growth modeling for detecting linear
growth: Number of measurements and comparison with other analytic approaches.
Journal of Experimental Education, 73, 121-139.
Fan, X., & Sivo, S. A. (2005). Sensitivity of fit indices to misspecified structural or
measurement model components: Rationale of two-index strategy revisited. Structural
Equation Modeling, 12, 343-367.
Fan, Xitao, & Fan, Xiaotao. (2005). Using SAS for Monte Carlo simulation research in
structural equation modeling. Structural Equation Modeling, 12, 299-333.
Sivo, S., Fan, X., & Witta, L. (2005). The biasing effects of unmodeled ARMA time series
processes on latent growth curve model estimates. Structural Equation Modeling, 12,
Fan, X. (2003). Two approaches for correcting correlation attenuation caused by
measurement error: Implications for research practice. Educational and Psychological
Measurement, 63(6), 915-930.
Fan, X. (2003). Power of latent growth modeling for detecting group differences in linear
growth trajectory parameters. Structural Equation Modeling, 10, 380-400.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of
different analytical methods. Journal of Experimental Education, 69, 203-224.
Fan, X., & Wang, L. (1999). Comparing logistic regression with linear discriminant analysis in
their classification accuracy. Journal of Experimental Education, 67, 265-286.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estimation methods,
and model specification on SEM fit indices. Structural Equation Modeling: A
Multidisciplinary Journal, 6, 56-83.
Fan, X., & Wang, L. (1998). Effects of potential confounding factors on fit indices and
parameter estimates for true and misspecified SEM models. Educational and
Psychological Measurement, 58, 699-733.
Fan, X., & Wang, L. (1996). Comparability of jackknife and bootstrap results: An investigation
for a case of canonical analysis. Journal of Experimental Education, 64, 173-189.
What Is a Monte Carlo Simulation Study?
“the use of random sampling techniques and often the use of computer
simulation to obtain approximate solutions to mathematical or physical
problems especially in terms of a range of values each of which has a
calculated probability of being the solution” (Merriam-Webster OnLine).
An empirical alternative to a theoretical approach (i.e., a solution based
on statistical/mathematical theory)
Increasingly possible because of the advances in computing technology
Situations Where Simulation Is Useful
Consequences of Assumption Violations
Statistical Theory: stipulates what the conditions should be, but does not say what
the reality would be if those conditions were not satisfied in the data
Understanding a Sample Statistic That May Not Have a Theoretical Distribution
Many Other Situations
Retaining the optimal number of factors in EFA
Evaluating the performance of mixture modeling in identifying the latent
Assessing the consequences of failure to model correlated error structure in
latent growth modeling
Basic Steps in a Simulation Study
 Asking Questions Suitable for a Simulation Study
* Questions for which no (or no trustworthy) analytical/theoretical solutions exist
 Simulation Study Design (Example)
* Include / manipulate the major factors that potentially affect the outcome
 Data Generation
* Sample data generation & transformation
 Analysis (Model Fitting) for Sample Data
 Accumulation and Analysis of the Statistic(s) of Interest
 Presentation and Drawing Conclusions
* Conclusions limited to the design conditions
An Example: Independent t-test (group variance homogeneity)
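The t-test example can be sketched in code as a small simulation of the pooled-variance t-test under a true null hypothesis. This is a minimal illustration, not the design shown on the slide: the group sizes, standard deviation ratios, number of replications, and all function names are assumptions chosen for the sketch. Every condition has n1 + n2 - 2 = 38 degrees of freedom so that a single hard-coded critical value can be used without a statistics library.

```python
import math
import random

# Two-sided .05 critical value of t with 38 df (all conditions below have 38 df).
T_CRIT_DF38 = 2.0244


def mean_and_variance(xs):
    """Return the sample mean and the unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


def pooled_t(x, y):
    """Independent-samples t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, v1 = mean_and_variance(x)
    m2, v2 = mean_and_variance(y)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rejection_rate(n1, n2, sd1, sd2, reps=2000, seed=1):
    """Empirical Type I error rate: both population means are 0."""
    if n1 + n2 - 2 != 38:
        raise ValueError("critical value is hard-coded for 38 df")
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF38:
            rejections += 1
    return rejections / reps


# Hypothetical design: (n1, n2, sd1, sd2)
design = [(20, 20, 1, 1), (20, 20, 2, 1), (10, 30, 2, 1), (30, 10, 2, 1)]
for n1, n2, sd1, sd2 in design:
    rate = rejection_rate(n1, n2, sd1, sd2)
    print(f"n1={n1:2d} n2={n2:2d} sd1={sd1} sd2={sd2} rejection rate={rate:.3f}")
```

Under these assumed conditions, the rejection rate should stay near the nominal .05 when variances are equal, rise above it when the smaller group has the larger variance, and fall below it when the larger group has the larger variance.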
Data Generation in a Simulation Study
 Common Random Number Generators
* binomial, Cauchy, exponential, gamma, Poisson, normal, uniform, etc.
* All of these generators are built on the uniform random number generator
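As a rough illustration of how other distributions can be derived from uniform random numbers, the sketch below applies two textbook transformations: the inverse-CDF method for the exponential distribution and the Box-Muller transform for the normal distribution. The function names are hypothetical, and actual statistical packages may use different algorithms internally.

```python
import math
import random


def exponential_from_uniform(u, rate=1.0):
    """Inverse-CDF method: if U ~ Uniform[0, 1), then -ln(1 - U) / rate ~ Exponential(rate)."""
    return -math.log(1.0 - u) / rate


def normal_pair_from_uniform(u1, u2):
    """Box-Muller transform: two independent uniforms give two independent N(0, 1) draws."""
    r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 lies in (0, 1], so the log is defined
    angle = 2.0 * math.pi * u2
    return r * math.cos(angle), r * math.sin(angle)


rng = random.Random(2024)
expo = [exponential_from_uniform(rng.random(), rate=2.0) for _ in range(100_000)]
norm = [z for _ in range(50_000)
        for z in normal_pair_from_uniform(rng.random(), rng.random())]

print("exponential mean (target 0.5):", sum(expo) / len(expo))
print("normal mean (target 0.0):", sum(norm) / len(norm))
```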
Simulating Univariate Sample Data
* Normally-Distributed Sample Data: $X \sim N(\mu, \sigma^2)$
* Non-Normal Distribution: Fleishman (1978) power transformation, $Y = a + bZ + cZ^2 + dZ^3$, where $Z$ is a unit normal variate
a, b, c, d: coefficients needed for transforming the unit normal variate to a non-normal variable with specified degrees of population skewness and kurtosis.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43,
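A minimal sketch of the Fleishman transformation, Y = a + bZ + cZ^2 + dZ^3 with a = -c, is given below. The coefficients are rounded illustrative values for skewness 1.75 and excess kurtosis 3.75; they are an assumption of this sketch rather than values taken from the slides, and the residual function lets a reader check them against Fleishman's moment equations. All names are hypothetical.

```python
import random

# Rounded illustrative coefficients for skewness = 1.75, excess kurtosis = 3.75.
B, C, D = 0.92966, 0.39950, -0.03647
A = -C  # keeps the mean of Y at zero


def fleishman_residuals(b, c, d, skew, kurt):
    """Residuals of Fleishman's equations (unit variance, skewness, excess kurtosis)."""
    eq_var = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1
    eq_skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    eq_kurt = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
                  + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt
    return eq_var, eq_skew, eq_kurt


def fleishman_transform(z, a=A, b=B, c=C, d=D):
    """Map a unit normal value z to the non-normal value y."""
    return a + b*z + c*z**2 + d*z**3


rng = random.Random(7)
y = [fleishman_transform(rng.gauss(0.0, 1.0)) for _ in range(100_000)]
print("residuals:", fleishman_residuals(B, C, D, skew=1.75, kurt=3.75))
print("sample mean (target 0):", sum(y) / len(y))
```

Residuals close to zero indicate that the coefficients reproduce the requested variance, skewness, and kurtosis at the population level.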
Sample Data from a Multivariate Normal Distribution
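Matrix decomposition procedure (Kaiser & Dickman, 1962): $Z_{(k \times N)} = F_{(k \times k)} X_{(k \times N)}$
F: $k \times k$ matrix containing principal component factor pattern coefficients obtained by applying principal component factorization to the given population inter-correlation matrix R;
X: $k \times N$ matrix of uncorrelated random normal scores; Z: $k \times N$ matrix of sample scores whose population inter-correlation matrix is R.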
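The matrix decomposition procedure can be sketched with NumPy as follows. The sketch assumes the factor pattern matrix is obtained from the eigendecomposition of R (eigenvectors scaled by the square roots of their eigenvalues), so that F F' reproduces R. The example correlation matrix and the function names are hypothetical.

```python
import numpy as np


def pc_factor_pattern(R):
    """Principal component factor pattern F = V * sqrt(Lambda), so that F @ F.T equals R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    if np.any(eigvals < -1e-10):
        raise ValueError("R is not positive semi-definite")
    # Broadcasting scales column j of eigvecs by sqrt(eigvals[j]).
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


def correlated_normal_sample(R, n, rng):
    """Return a k x n matrix Z = F X, where X holds uncorrelated standard normal scores."""
    k = R.shape[0]
    X = rng.standard_normal((k, n))
    return pc_factor_pattern(R) @ X


# Hypothetical population inter-correlation matrix for three variables.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
rng = np.random.default_rng(123)
Z = correlated_normal_sample(R, 50_000, rng)
print(np.round(np.corrcoef(Z), 3))  # should be close to R
```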
Sample Data from a Multivariate Non-Normal Distribution
Interaction between non-normality and inter-variable correlations
Intermediate correlations using Fleishman coefficients (Vale & Maurelli, 1983)
Matrix decomposition procedure applied to intermediate correlation matrix
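The intermediate correlation step can be illustrated with the Vale and Maurelli (1983) cubic, which relates the correlation rho between two unit normal variates to the correlation between their Fleishman-transformed versions. The sketch below solves that cubic by bisection on [-1, 1]. The solver choice, the function names, and the rounded coefficients are assumptions of this example.

```python
def implied_correlation(rho, coef1, coef2):
    """Correlation between two Fleishman-transformed variables given the normal correlation rho."""
    b1, c1, d1 = coef1
    b2, c2, d2 = coef2
    return (rho * (b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2)
            + rho**2 * (2*c1*c2)
            + rho**3 * (6*d1*d2))


def intermediate_correlation(target_r, coef1, coef2, tol=1e-10):
    """Find rho in [-1, 1] whose implied correlation equals target_r (bisection)."""
    lo, hi = -1.0, 1.0
    f_lo = implied_correlation(lo, coef1, coef2) - target_r
    f_hi = implied_correlation(hi, coef1, coef2) - target_r
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [-1, 1]; target may be unattainable")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = implied_correlation(mid, coef1, coef2) - target_r
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return (lo + hi) / 2.0


# Rounded illustrative (b, c, d) for skewness 1.75 and excess kurtosis 3.75.
skewed = (0.92966, 0.39950, -0.03647)
rho = intermediate_correlation(0.5, skewed, skewed)
print("intermediate correlation for a target of 0.5:", round(rho, 4))
```

The intermediate correlations computed this way for every pair of variables form the matrix to which the decomposition procedure is then applied.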
Kaiser, H. F., & Dickman, K. (1962). Sample and population score matrices and sample correlation matrices from
an arbitrary population correlation matrix. Psychometrika, 27, 179-182.
Vale, C. D., & Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.
Checking the Validity of Data Generation Procedures
 Example: Multivariate non-normal sample data (three correlated variables)
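One simple way to check a data generation procedure is to draw a very large sample and compare its descriptive statistics with the intended population values. The sketch below is a univariate stand-in for the multivariate example: it uses an exponential distribution only because its population moments are known exactly (mean 1, variance 1, skewness 2, excess kurtosis 6). The function name and sample size are assumptions.

```python
import random


def sample_moments(xs):
    """Mean, variance, skewness, and excess kurtosis from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


rng = random.Random(11)
sample = [rng.expovariate(1.0) for _ in range(200_000)]
targets = {"mean": 1.0, "variance": 1.0, "skewness": 2.0, "kurtosis": 6.0}
observed = sample_moments(sample)
for name, target in targets.items():
    print(f"{name:9s} target={target:.2f} observed={observed[name]:.3f}")
```

For multivariate data, the same comparison would also cover the sample correlation matrix against the specified population correlation matrix.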
From Simulation Design to Population Data Parameters
It may take much effort to obtain population parameters – t-test example
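To illustrate the step from design factors to population parameters in the t-test case, the sketch below converts a standardized mean difference and a variance ratio into group means and standard deviations. It assumes that the effect size is standardized by the square root of the average of the two population variances. That definition, the defaults, and the function name are assumptions of this example, not the presenter's specification.

```python
import math


def t_test_population(d, variance_ratio, sigma2=1.0, mu2=0.0):
    """Translate design factors into population parameters for two groups.

    d              : standardized mean difference (assumed definition, see lead-in)
    variance_ratio : sigma1^2 / sigma2^2
    """
    sigma1 = math.sqrt(variance_ratio) * sigma2
    average_sd = math.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2.0)
    mu1 = mu2 + d * average_sd
    return {"mu1": mu1, "sigma1": sigma1, "mu2": mu2, "sigma2": sigma2}


# Hypothetical design cell: medium effect size, variance ratio of 4.
params = t_test_population(d=0.5, variance_ratio=4.0)
print(params)
```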
Latent growth model example
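For a latent growth model, the population mean vector and covariance matrix needed for data generation follow from the model parameters through the standard expressions mu = Lambda alpha and Sigma = Lambda Phi Lambda' + Theta. The sketch below applies them to a linear growth model with uncorrelated, equal-variance residuals. The parameter values are hypothetical and are not taken from the slide's example.

```python
import numpy as np


def lgm_implied_moments(times, alpha, Phi, resid_var):
    """Model-implied mean vector and covariance matrix for a linear growth model."""
    times = np.asarray(times, dtype=float)
    # Loading matrix: a column of 1s for the intercept, the time scores for the slope.
    Lambda = np.column_stack([np.ones_like(times), times])
    mu = Lambda @ np.asarray(alpha, dtype=float)
    Theta = resid_var * np.eye(len(times))
    Sigma = Lambda @ np.asarray(Phi, dtype=float) @ Lambda.T + Theta
    return mu, Sigma


# Hypothetical population values: intercept mean 10, slope mean 2,
# intercept variance 4, slope variance 1, covariance 0.5, residual variance 2.
mu, Sigma = lgm_implied_moments(
    times=[0, 1, 2, 3],
    alpha=[10.0, 2.0],
    Phi=[[4.0, 0.5], [0.5, 1.0]],
    resid_var=2.0,
)
print(mu)
print(Sigma)
```

The resulting mu and Sigma can then serve as the population parameters for multivariate data generation.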
Accumulation and Analysis of the Statistic(s) of Interest
 Accumulation: Straightforward or Complicated
* Typically, not an automated process
* Statistical software used
* Analytical techniques involved
* Type of statistic(s) of interest, etc.
 Analysis
Follow-up data analysis may be simple or complicated
Not different from many other data analysis situations
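One possible way to organize accumulation and follow-up analysis is to store one record per replication and then summarize the records by design condition. The sketch below does this for the sample standard deviation as an estimator of a population SD of 1, reporting bias, empirical standard error, and RMSE per sample size. The statistic, conditions, and function names are hypothetical.

```python
import math
import random
import statistics


def run_replication(n, rng, sigma=1.0):
    """Generate one sample and return the statistic of interest (sample SD)."""
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    return statistics.stdev(x)


def accumulate(sample_sizes, reps, seed=3):
    """Collect one record per replication within each design condition."""
    rng = random.Random(seed)
    records = []
    for n in sample_sizes:
        for rep in range(reps):
            records.append({"n": n, "rep": rep, "estimate": run_replication(n, rng)})
    return records


def summarize(records, true_value=1.0):
    """Bias, empirical standard error, and RMSE of the estimates by condition."""
    by_n = {}
    for rec in records:
        by_n.setdefault(rec["n"], []).append(rec["estimate"])
    summary = {}
    for n, estimates in by_n.items():
        bias = statistics.fmean(estimates) - true_value
        rmse = math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))
        summary[n] = {"bias": bias,
                      "empirical_se": statistics.stdev(estimates),
                      "rmse": rmse}
    return summary


records = accumulate(sample_sizes=[5, 20], reps=3000)
for n, stats in summarize(records).items():
    print(n, {key: round(value, 4) for key, value in stats.items()})
```

Keeping the raw per-replication records, rather than only running totals, leaves the follow-up analysis open to any later summary or model.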
Presentation and Drawing Conclusions
 Presentation
* Representativeness & Exceptions
* Graphic Presentations
* Typical: table after table of results – No one has the time to read the tables!
 Drawing Conclusions
Validity & generalizability depend on the adequacy & appropriateness of
simulation design
Conclusions must be limited by the design conditions and levels.