Structural Equation Models
Asma Alfadhel, Sarah Asio, Jimmy (Yuanshan) Cheng
10/10/2013

Outline
• Part I: CFA
• Part II: SEM and SEM plots
• Part III: Goodness of fit

Part I (CFA): Outline
• Confirmatory factor analysis (CFA)
• Available R packages
• The lavaan package
• Model description
• Applying CFA and interpreting the results

Confirmatory Factor Analysis vs. EFA
• EFA: exploratory
  • All loadings are free to vary ($L$ has no zeros)
  • Assumption: $\mathrm{Cov}(F) = I$
• CFA: driven by theory, which specifies
  • the number of factors
  • the correlations between factors, $\mathrm{Cov}(F) = \Phi$
  • which items load onto which factors
  • CFA allows certain loadings to be constrained to zero
[Diagram: CFA vs. EFA]

Confirmatory Factor Analysis vs. SEM
• SEM: specifies the causality between factors
  • Directed arrows between latent variables
  • Called the structural model
• CFA: no directed arrows between latent factors
  • Called the measurement model
• CFA is frequently used as a first step to assess the proposed measurement model in a structural equation model (Wikipedia).

Objective of CFA
• $\mathrm{Cov}(Y) = L\,\mathrm{Cov}(F)\,L^T + \Psi$
• Factors are uncorrelated with the error terms, and the error terms are uncorrelated with each other ($\Psi$ is diagonal)
• $\mathrm{Cov}(Y)$: the covariance matrix of the observed variables
• $\mathrm{Cov}(F) = \Phi$: the covariance matrix of the factors, so $\mathrm{Cov}(Y) = L \Phi L^T + \Psi$
• $\Sigma$: observed covariance matrix; $\Sigma(\theta)$: model-implied covariance matrix
• Goal: make the implied covariance $\Sigma(\theta)$ match the observed covariance $\Sigma$

R Packages for SEM
• "sem" package: developed by John Fox; for a long time it was the only option in R
• "OpenMx" package: developed by Steven Boker
• "lavaan" package: developed by Yves Rosseel at Ghent University, Belgium

The "lavaan" Package
lavaan is an R package for latent variable analysis:
• confirmatory factor analysis: function cfa()
• structural equation modeling: function sem()
• latent curve analysis / growth modeling: function growth()
• general mean/covariance structure modeling: function lavaan()
• (item response theory (IRT) models)
• (latent class + mixture models)
• (multilevel models)
More information: the lavaan website.
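To make the CFA objective concrete, here is a minimal base-R sketch that builds $\Sigma(\theta) = L \Phi L^T + \Psi$ for a small, hypothetical two-factor model. The loadings, factor correlation, and error variances are invented for illustration and are not taken from the case study. It then measures how far a covariance matrix is from $\Sigma(\theta)$ using the normal-theory maximum likelihood discrepancy; `f_ml` is a hypothetical helper name.

```r
# Hypothetical two-factor CFA: y1 and y2 load on F1, y3 and y4 load on F2.
# The zeros in L are the constraints that CFA imposes (EFA would leave them free).
L <- matrix(c(0.8, 0.7, 0.0, 0.0,    # column 1: loadings on F1
              0.0, 0.0, 0.6, 0.9),   # column 2: loadings on F2
            nrow = 4, ncol = 2,
            dimnames = list(paste0("y", 1:4), c("F1", "F2")))

Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)    # factor covariances, Cov(F) = Phi
Psi <- diag(c(0.36, 0.51, 0.64, 0.19))  # diagonal: errors are uncorrelated

# Model-implied covariance matrix: Sigma(theta) = L Phi L^T + Psi
Sigma_theta <- L %*% Phi %*% t(L) + Psi
print(round(Sigma_theta, 3))

# Normal-theory ML discrepancy between an observed covariance S and Sigma(theta):
# F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p, which is 0 only when S equals Sigma
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
}

# A hypothetical "observed" matrix that differs from Sigma(theta) in one covariance
S_obs <- Sigma_theta
S_obs["y1", "y3"] <- S_obs["y3", "y1"] <- 0.20
f_ml(Sigma_theta, Sigma_theta)  # perfect match, essentially zero
f_ml(S_obs, Sigma_theta)        # positive: the model does not reproduce S_obs
```

The error variances were chosen so that every diagonal element of $\Sigma(\theta)$ equals 1. This makes the off-diagonal entries readable as correlations. For example, y1 and y3 correlate 0.8 × 0.3 × 0.6 = 0.144 through the factor correlation alone.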
Reference: Rosseel, Y. lavaan: an R package for structural equation modeling. Journal of Statistical Software. http://users.ugent.be/~yrosseel/lavaan/lavaan1.pdf

"cfa" function
Description: Fit a Confirmatory Factor Analysis (CFA) model.
Usage (as documented for the lavaan version of 2013; newer versions differ):
cfa(model = NULL, data = NULL, meanstructure = "default", fixed.x = "default",
    orthogonal = FALSE, std.lv = FALSE, std.ov = FALSE, missing = "default",
    ordered = NULL, sample.cov = NULL, sample.cov.rescale = "default",
    sample.mean = NULL, sample.nobs = NULL, ridge = 1e-05, group = NULL,
    group.label = NULL, group.equal = "", group.partial = "", cluster = NULL,
    constraints = '', estimator = "default", likelihood = "default",
    information = "default", se = "default", test = "default",
    bootstrap = 1000L, mimic = "default", representation = "default",
    do.fit = TRUE, control = list(), WLS.V = NULL, NACOV = NULL,
    start = "default", verbose = FALSE, warn = TRUE, debug = FALSE)

Arguments
• model: a description of the user-specified model.
• data: an optional data frame containing the observed variables used in the model.
• std.lv: if TRUE, the metric of each latent variable is determined by fixing its variance to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of its first indicator to 1.0.
• std.ov: if TRUE, all observed variables are standardized before entering the analysis.
• missing: if the data contain missing values, the default behavior is listwise deletion. If the missing-data mechanism is MCAR (missing completely at random) or MAR (missing at random), lavaan provides case-wise (or "full information") maximum likelihood estimation (set missing = "ML").

Model Description
• The dataset was collected by Sarah Asio.
• The original model consists of 12 factors and 42 observed indicators.
• For simplicity, a sub-model was used. It consists of 4 factors and 23 observed variables.
• The dataset contains 381 responses from students.
• The items range in value from 1 to 6.
The four factors of the sub-model are Team Innovation, Team Communication, Team Effort, and Team Learning.

Specifying the model: symbols
=~ "latent variable definition"
    latent variable =~ indicator1 + indicator2 + indicator3
It defines how the latent variables are "manifested by" a set of observed variables.
The model syntax is short because the function takes care of several things by default:
• First, the factor loading of the first indicator of each latent variable is fixed to 1. This fixes the scale of the latent variable.
• Second, residual variances are added automatically.
• Third, all exogenous latent variables are correlated.
http://lavaan.ugent.be/tutorial/cfa.html

Specifying the model: symbols (continued)
~~ "correlation" (is correlated with): used for the residual variances of observed variables and the (co)variances of the latent variables
~  "regression" (is regressed on): used to specify the structural part of an SEM model

Specifying the model
library(lavaan)

# Specify the model
Our.model <- '
  CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
  EFF =~ EF14 + EF15 + EF16 + EF17
  LN  =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
  INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42
'
fit <- cfa(Our.model, data = MyData)

# Summarize the fit, including fit measures
summary(fit, fit.measures = TRUE)

Missing values, standardization, and R-square
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, fit.measures = TRUE, rsquare = TRUE)
# or, for the R-square values without rounding:
inspect(fit, "rsquare")

fit <- cfa(Our.model, data = MyData, missing = "ML")
summary(fit, standardized = TRUE, rsquare = TRUE)

2nd output
fit <- cfa(Our.model, data = MyData, std.lv = TRUE, std.ov = TRUE, missing = "ML")
inspect(fit, "rsquare")

CFA syntax in "lavaan" vs. "sem"
install.packages("semPlot")
library(semPlot)
Lavaan.model <- semSyntax(fit, "lavaan")
Sem.model <- semSyntax(fit, "sem")
[Output: generated model syntax]
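The defaults listed on the "Specifying the model" slide can be checked directly. The sketch below assumes a recent lavaan installation. Because the case-study data frame MyData is not public, it uses the HolzingerSwineford1939 data set that ships with lavaan, together with the three-factor model from the lavaan CFA tutorial. It fits the model twice, once with the default marker-variable scaling and once with std.lv = TRUE, and then compares the two fits. The object names (HS.model, fit_marker, fit_stdlv, pe_marker, pe_stdlv) are illustrative.

```r
library(lavaan)

# Three-factor CFA from the lavaan tutorial, fitted to lavaan's built-in data
# (used here only because the case-study data are not available)
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Default scaling: the first loading of each factor is fixed to 1
fit_marker <- cfa(HS.model, data = HolzingerSwineford1939)

# std.lv = TRUE: factor variances are fixed to 1 and all loadings are estimated
fit_stdlv <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)

pe_marker <- parameterEstimates(fit_marker)
pe_stdlv  <- parameterEstimates(fit_stdlv)

# First-indicator loadings under each scaling choice
subset(pe_marker, op == "=~" & rhs %in% c("x1", "x4", "x7"), select = c(lhs, rhs, est))
subset(pe_stdlv,  op == "=~" & rhs %in% c("x1", "x4", "x7"), select = c(lhs, rhs, est))

# Both parameterizations describe the same model, so the fit statistics agree
fitMeasures(fit_marker, c("chisq", "df", "cfi", "rmsea"))
fitMeasures(fit_stdlv,  c("chisq", "df", "cfi", "rmsea"))

# R-square of each indicator, the values summary(..., rsquare = TRUE) reports
inspect(fit_marker, "rsquare")
```

The two scalings are reparameterizations of one model, so only the metric of the latent variables changes. The chi-square and the fit indices are the same up to optimizer tolerance.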
Part II: Outline
• SEM process: overview
• SEM measurement models
• SEM path diagram: overview
• R code for:
  • SEM model specification
  • SEM model fitting
  • SEM path diagram
• Outputs for the SEM model and path diagram

Structural Equation Modeling (SEM) process
1. Specify the model
2. Select measures of the theoretical model and collect data
3. Determine whether the model is identified
4. Analyze the model
5. Analyze the model fit
6. Finish, or re-specify the model and repeat the process
"Factor Analysis, Path Analysis, and Structural Equations Modeling", book extract, Jones and Bartlett Publishers. http://www.jblearning.com/samples/0763755486/55485_CH14_Walker.pdf

SEM measurement models
Exogenous measurement model: $X = B_x U + e_x$, where
• $X$ is an $(n_x \times 1)$ matrix of exogenous indicators,
• $B_x$ is an $(n_x \times p)$ matrix of coefficients from the exogenous latent variables to the exogenous indicators,
• $U$ is a $(p \times 1)$ matrix of exogenous latent variable(s),
• $e_x$ is an $(n_x \times 1)$ matrix of errors associated with the exogenous indicators.
Endogenous measurement model: $Y = B_y Z + e_y$, where
• $Y$ is an $(n_y \times 1)$ matrix of endogenous indicators,
• $B_y$ is an $(n_y \times q)$ matrix of coefficients from the endogenous latent variable(s) to the endogenous indicators,
• $Z$ is a $(q \times 1)$ matrix of endogenous latent variable(s),
• $e_y$ is an $(n_y \times 1)$ matrix of errors associated with the endogenous indicators.

Overall SEM: measurement and structural models
SEM model for the case study (Communication, Effort, and Learning → Innovation): $Z = B_z U + e_z$, where
• $Z$ is the endogenous latent variable (Innovation),
• $U$ is a $(3 \times 1)$ matrix of exogenous latent variables (Communication, Effort, Learning),
• $B_z$ is a $(1 \times 3)$ matrix of coefficients of the exogenous variables,
• $e_z$ is the error associated with the endogenous variable.

Matrix representation for the SEM measurement and structural models
$X = B_x U + e_x$
$Y = B_y Z + e_y$
$Z = B_z U + e_z$

SEM path diagram: overview
• A path diagram is a graphical representation of the hypothesized relationships between the variables.
• Exogenous variables emit arrows (analogous to independent variables): communication, effort, and learning.
• Endogenous variables receive arrows (analogous to dependent variables): innovation and the observed measures (indicators).
• The other variables are error terms. They account for random or measurement error in the endogenous variables.
http://en.wikipedia.org/wiki/Structural_equation_modeling

Path diagram node representations
[Figure: node representations in a path diagram]
http://people.ucsc.edu/~zurbrigg/psy214b/09SEM3a.pdf

R code for SEM model specification
# Install and load the lavaan package
install.packages("lavaan")
library(lavaan)

# Specify the model: four measurement equations plus one structural regression
Our.model <- '
  CMM =~ CM9 + CM10 + CM11 + CM12 + CM13
  EFF =~ EF14 + EF15 + EF16 + EF17
  LN  =~ LN18 + LN19 + LN20 + LN21 + LN22 + LN23 + LN24
  INN =~ IN36 + IN37 + IN38 + IN39 + IN40 + IN41 + IN42
  INN ~ CMM + EFF + LN
'

R code for SEM model fitting
# Fit the SEM model using standardized data
fit <- lavaan::sem(Our.model, data = SEMdata, std.lv = TRUE, std.ov = TRUE, missing = "ML")
summary(fit, standardized = TRUE, fit.measures = TRUE, rsquare = TRUE)

Syntax definitions
• std.lv and std.ov: as defined for the cfa() function in Part I.
• missing: if "listwise", cases with missing values are removed listwise from the data frame before the analysis. If "direct", "ml", or "fiml" and the estimator is maximum likelihood, Full Information Maximum Likelihood (FIML) estimation is used, which uses all available data in the data frame.
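The structural line INN ~ CMM + EFF + LN is what turns the CFA into an SEM. The runnable sketch below assumes a recent lavaan installation and uses lavaan's built-in HolzingerSwineford1939 data in place of the non-public SEMdata. Its regression (speed ~ visual + textual) is hypothetical and only shows the mechanics: one endogenous factor regressed on exogenous factors, i.e. $Z = B_z U + e_z$ with a $(1 \times 2)$ row of coefficients. It is not a substantive model of those data, and the object names (HS.sem, fit_sem, std) are illustrative.

```r
library(lavaan)

# Hypothetical SEM: measurement part plus one structural regression
HS.sem <- '
  # measurement model (same role as CMM, EFF, LN, INN in the case study)
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  # structural model: endogenous factor regressed on the exogenous factors
  speed ~ visual + textual
'

# Same options as the case-study call; the built-in data have no missing values,
# so missing = "ML" simply uses all cases
fit_sem <- sem(HS.sem, data = HolzingerSwineford1939, std.lv = TRUE, missing = "ML")

# Standardized structural coefficients (the entries of B_z)
std <- standardizedSolution(fit_sem)
std[std$op == "~", c("lhs", "op", "rhs", "est.std", "pvalue")]

# Share of the endogenous factor's variance explained by the exogenous factors
inspect(fit_sem, "rsquare")["speed"]
```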
Source of the std.lv, std.ov, and missing definitions on the SEM fitting slide: lavaan reference manual, http://cran.r-project.org/web/packages/lavaan/lavaan.pdf

R code for SEM path diagram
# Install and load the semPlot package
install.packages("semPlot")
library(semPlot)

# Plot the input path diagram (the hypothesized model structure)
semPaths(fit, title = FALSE, curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)

# Plot the output path diagram with standardized parameter estimates
semPaths(fit, "std", curvePivot = TRUE, exoVar = FALSE, exoCov = FALSE)

For more options and syntax definitions, refer to: http://cran.r-project.org/web/packages/semPlot/semPlot.pdf

[Figure: input path diagram]
[Figure: output path diagram]

Part III (Goodness of fit): Outline
• Introduction to fit indices
• Using R to show these indices
• Modification indices

Goodness of fit
• Model fit: "how the model that best represents the data reflects underlying theory"
• Goal: the population covariance matrix ($\Sigma$) matches the implied covariance matrix ($\Sigma(\theta)$)
• There is not yet agreement on
  • which indices to use
  • cut-offs for the various indices
Hooper et al. (2008)

Overview of indices
Type of index | Description | Examples
Absolute fit indices | How well an a priori model fits the sample data (McDonald and Ho, 2002) | Chi-square, RMSEA, GFI, AGFI, RMR, SRMR
Incremental fit indices, a.k.a. comparative (Miles and Shevlin, 2007) or relative fit indices (McDonald and Ho, 2002) | Compare the model with a baseline (null) model | NFI, NNFI, CFI
Parsimony fit indices | Overcome the problem of high fit for a less rigorous theoretical model; penalize model complexity (Hooper et al., 2008) | AIC, BIC
Hooper et al. (2008)

Benchmarks summary
Index | Acceptable threshold | Comments
Chi-square (χ²) | p value > 0.05 | Sensitive to sample size
RMSEA | Less than 0.07 (Steiger, 2007) | Has a known distribution; favors parsimony
SRMR | Less than 0.08 (Hu and Bentler, 1999) | Standardized root mean square residual
CFI | Greater than 0.95 |
TLI | Greater than 0.95 |
AIC; BIC | Smaller is better | Performs well in simulation studies (Sharma et al., 2005)
Hooper et al. (2008)

Reporting strategy
• It is not necessary to report all indices.
• Do not report only the good ones.
• CFI, GFI, NFI, and NNFI are the most commonly reported (McDonald and Ho, 2002).
Hooper et al. (2008)

Reporting strategy (continued)
• Hooper et al. (2008):
  • Chi-square, df, p-value
  • RMSEA, SRMR, CFI, and one parsimony fit index
• Two-index presentation strategy (Hu and Bentler, 1999):
  • TLI and SRMR
  • RMSEA and SRMR
  • CFI and SRMR

Modification indices
• They improve the model fit by freeing fixed parameters.
• CFA is structured by theory:
  • One factor measures only certain observed measures, not all of them, so the remaining loadings are assumed to be zero.
  • Error correlations are assumed to be zero, which is just a practical standard (Westfall et al., 2012).
Wikipedia

Freeing fixed parameters
[Figure: two factors F1 and F2 with indicators X1, X2, X3, X4 and errors e1, e2, e3, e4]

Modification indices
• Don't allow modification indices to drive the process.
• Any modification should make theoretical sense.
• It is good practice to assess the fit.
Hooper et al. (2008)
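As a way of "using R to show these indices", the benchmark table can be made concrete by computing several indices from their textbook definitions. This base-R sketch uses hypothetical chi-square values, degrees of freedom, and sample size chosen for illustration, not case-study results. The RMSEA formula uses N − 1 in the denominator; software packages may use N instead, so their values can differ slightly. `fit_indices` is a hypothetical helper name.

```r
# Fit indices from their textbook formulas (hypothetical inputs, not case-study results)
fit_indices <- function(chisq, df, chisq_base, df_base, n) {
  d_model <- max(chisq - df, 0)            # noncentrality estimate, target model
  d_base  <- max(chisq_base - df_base, 0)  # noncentrality estimate, baseline model
  denom   <- max(d_base, d_model)
  c(
    p_value = pchisq(chisq, df, lower.tail = FALSE),  # chi-square test of exact fit
    rmsea   = sqrt(d_model / (df * (n - 1))),
    cfi     = if (denom == 0) 1 else 1 - d_model / denom,
    tli     = (chisq_base / df_base - chisq / df) / (chisq_base / df_base - 1)
  )
}

idx <- fit_indices(chisq = 85, df = 24, chisq_base = 919, df_base = 36, n = 301)
print(round(idx, 4))

# Compare against the thresholds in the benchmark table
meets <- c(
  chisq_p = idx[["p_value"]] > 0.05,
  rmsea   = idx[["rmsea"]] < 0.07,
  cfi     = idx[["cfi"]] > 0.95,
  tli     = idx[["tli"]] > 0.95
)
print(meets)
```

With these made-up inputs, none of the thresholds is met: RMSEA is about 0.092, CFI about 0.931, and TLI about 0.896. This is the situation in which modification indices become tempting, and the cautions on the last slide apply.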