Structural Equation Modeling
Kayla N. Jordan – RStats Workshop Spring 2014

Theory:
1. Confirmatory
2. Theory-driven
3. Complex multiple regression
4. Not causal; the model is based on correlations
5. Model = Data + Error

Definitions:
1. Measurement and Structural Models
   a. A measurement model consists of a latent variable and the observed variables that measure it.
   b. A structural model specifies the relationships among latent variables.
2. Latent and Manifest Variables
   a. Latent variables are the constructs being measured. Examples of latent variables include IQ, depression, political orientation, short-term memory, and consumerism. Each latent variable needs at least two manifest (observed) variables to measure it, and more are better. Latent variables are represented by ovals in the model.
   b. Manifest, or observed, variables are the actual measurements of a construct. These could include scores on an IQ test, clinical ratings of depression symptoms, support of political candidates, and violent behaviors. Observed variables are represented by squares in the model.
   c. SEM works best if observed variables are measured on a continuous scale.
   d. Measures should be reliable and valid. If the variables are not measured well, the model will not work.
3. Endogenous and Exogenous Variables
   a. Endogenous variables are latent variables that act as the dependent (outcome) variables in the model. These are any variables which have arrows going into them.
   b. Exogenous variables are latent variables that act as the independent (predictor) variables in the model. These variables do not have any arrows going into them.
4. Knowns and Unknowns
   a. The knowns in the model are the variances and covariances (correlations) among the observed variables, derived from the data. The number of knowns can be calculated with the formula n(n+1)/2, where n is the number of observed variables in the model.
   b.
      The unknowns (or parameters) in the model are the quantities being estimated: the arrows in the model, which represent the pathways, along with the error variances and residuals.
   c. Degrees of freedom = knowns - unknowns
5. Underidentified and Overidentified Models
   a. Underidentified models are ones in which there are more unknowns than knowns. To fix these, certain parameters should be constrained, or forced to equal a certain value. Typically, this is accomplished by setting one of the factor loadings for each latent variable equal to one; this also assigns a scale to the latent variable.
   b. Overidentified models are ones in which there are more knowns than unknowns. This is good.
6. Example of Calculating Degrees of Freedom:
   a. Knowns: n(n+1)/2 = 8(9)/2 = 36
   b. Unknowns: 5 factor loadings + 2 path coefficients + 8 error variances + 2 residuals = 17 total unknowns
   c. Degrees of freedom: knowns - unknowns = 36 - 17 = 19
7. Arrows
   a. A straight arrow represents a path in the model.
   b. A curved arrow represents a correlation between two variables.
8. Estimates
   a. Factor loadings are used to determine if an observed variable is measuring the latent variable. If a factor loading is not significant, then the observed variable is not measuring the latent variable well.
   b. Path coefficients, or beta weights, are used to determine how well exogenous (predictor) variables are predicting endogenous (outcome) variables. If a path coefficient is not significant, then the exogenous variable is not predicting the endogenous variable.
   c. Error variances are associated with each observed variable and are a measure of how much error there is in measuring each observed variable. If an error variance is high, then there are problems with measuring that variable.
   d. Residuals are associated with each endogenous variable and represent the difference between the hypothesized model and the data.
      Residuals can also be used to calculate an effect size for the relationship between two variables, in terms of the variance accounted for, by squaring the residual and subtracting that value from one.

Types of Models
1. Confirmatory Factor Analysis (CFA)
   a. Tests the validity of a factor structure
   b. Should be based on theory or exploratory factor analysis (EFA)
   c. Often used to determine if items on a questionnaire group together
   d. A group of items which are similar (based on how well they are correlated) is called a factor.
2. Path Analysis
   a. Tests relationships between observed variables
3. Full Structural Models
   a. Tests the fit of both measurement and structural models
   b. Mediation and moderation
4. Multi-trait, multi-method (MTMM)
   a. Tests whether or not multiple measures measure the same set of traits
5. Multi-group CFA
   a. Tests whether a factor structure holds for two groups
   b. For example, whether the same factors work for both men and women
6. Latent Growth Curves
   a. Tests how a measure changes over time

Fit Indices
1. Chi-square
   a. Test of the goodness-of-fit of the model
   b. Smaller chi-square is better
   c. Non-significant = good
   d. Highly influenced by sample size
2. Factor Loadings/Path Coefficients
   a. These should be significant
   b. If non-significant, that observed variable or path is not contributing to the model.
3. Comparative Fit Index (CFI)
   a. Test of goodness-of-fit
   b. Greater than .90 is good; greater than .95 is better
4. Normed Fit Index (NFI)
   a. Interpreted the same way as CFI
5. Tucker-Lewis Index (TLI)
   a. Interpreted the same way as CFI
6. Root Mean Square Error of Approximation (RMSEA)
   a. Measure of the difference between observed correlations and model correlations
   b. Less than .10 is good; less than .06 is better
   c. Influenced by small df and sample size
7. Standardized Root Mean Square Residual (SRMR)
   a. Interpreted the same way as RMSEA
8.
   Example Output:
   a. Default Model is the hypothesized model; its row gives the fit indices for the model that was created.
   b. Saturated Model is a model with no degrees of freedom, and Independence Model is a model with the most possible degrees of freedom. These can be ignored.
   c. CMIN is the chi-square value and DF is the associated degrees of freedom. The P value should be greater than .05 in a good-fitting model. This output (P = .000) indicates a poor-fitting model.
   d. CMIN/DF is the chi-square value divided by its degrees of freedom, used as a correction for sample size. This should be less than 3 in a good-fitting model.
   e. NFI, TLI, and CFI indicate a poor-fitting model, as they are less than .90.
   f. RMSEA indicates an adequate model, as it is below .10.

Model Fit Summary

CMIN
Model                NPAR   CMIN       DF    P      CMIN/DF
Default model          70   1021.692   395   .000   2.587
Saturated model       465       .000     0
Independence model     30   2715.382   435   .000   6.242

Baseline Comparisons
Model                NFI Delta1   RFI rho1   IFI Delta2   TLI rho2   CFI
Default model        .624         .586       .730         .697       .725
Saturated model      1.000                   1.000                   1.000
Independence model   .000         .000       .000         .000       .000

RMSEA
Model                RMSEA   LO 90   HI 90   PCLOSE
Default model        .089    .082    .096    .000
Independence model   .162    .156    .168    .000

Model Comparisons
1. The initial, hypothesized model often does not work out, or there could be competing models; therefore, it is important to be able to compare models.
2. Modification Indices
   a. These are suggested additions to the model to improve the fit.
3. Chi-square difference test
   a. Subtract the chi-square values of the two models.
   b. Subtract the degrees of freedom of the two models.
   c. Determine the critical chi-square value for the difference in degrees of freedom.
   d. If the difference in the chi-square values is greater than the critical chi-square value, then the models are significantly different.
   e. Smaller chi-square values are better, so whichever model has the smaller chi-square value is the better model.
4. CFI difference
   a.
      If the difference in the CFI values of two models is greater than .01, then the models are significantly different.
   b. The model with the greater CFI is the better model.

Assumptions
1. Sample Size
   a. A good sample size estimate is 100-200
   b. A larger sample may be needed if there are a large number of variables
   c. Some SEM programs do not allow missing data
2. Normality
   a. Data should be normally distributed
   b. Bootstrapping can help with non-normal data
3. Outliers
   a. Outliers can affect the fit of the model
   b. Can be checked using Mahalanobis distance
4. Linearity
5. Multicollinearity
   a. It is important to test for multicollinearity.
   b. If two variables are highly correlated (r > .9), it is basically like putting the same variable in the model twice.
   c. If this is a problem, one of the variables should be deleted.
6. Homoscedasticity

Programs
1. AMOS
   a. Good for beginners
   b. Visual representation of models
2. EQS
3. LISREL
4. MPlus

Recommended Texts
Byrne, B. (2006). Structural Equation Modeling with EQS (2nd ed.). New York: Routledge.
Byrne, B. (2009). Structural Equation Modeling with AMOS. New York: Routledge.
Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. New York: Guilford Press.
Kline, R. B. (2010). Principles and Practice of Structural Equation Modeling. New York: Guilford Press.
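
Worked example: degrees of freedom. The degrees-of-freedom calculation from the Definitions section (8 observed variables, 17 unknowns) can be reproduced with a short Python sketch. The function names and the dictionary keys are illustrative choices, not part of any SEM program.

```python
def count_knowns(n_observed: int) -> int:
    """Number of knowns, n(n+1)/2, for n observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed: int, unknowns: dict) -> int:
    """Degrees of freedom = knowns - unknowns."""
    return count_knowns(n_observed) - sum(unknowns.values())


# Counts taken from the handout's example model (8 observed variables).
example_unknowns = {
    "factor_loadings": 5,
    "path_coefficients": 2,
    "error_variances": 8,
    "residuals": 2,
}

knowns = count_knowns(8)
total_unknowns = sum(example_unknowns.values())
df = degrees_of_freedom(8, example_unknowns)

print(knowns, total_unknowns, df)  # 36 17 19

# A negative value would indicate an underidentified model,
# a positive value an overidentified one.
print("overidentified" if df > 0 else "not overidentified")
```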
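
Worked example: fit indices. Several values in the example output can be recomputed from the chi-square (CMIN) and DF of the default and independence models. The sketch below uses the common textbook formulas for CMIN/DF, NFI, TLI, and CFI; the handout does not state these formulas, so treating them as the ones behind the output is an assumption, although the results agree with the table to three decimals.

```python
def cmin_per_df(chi2: float, df: int) -> float:
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def nfi(chi2_model: float, chi2_base: float) -> float:
    """Normed Fit Index: proportional drop in chi-square from the baseline."""
    return 1.0 - chi2_model / chi2_base


def tli(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis Index, based on the chi-square/df ratios."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index, based on chi-square minus df (floored at 0)."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model)
    if excess_base == 0.0:
        return 1.0
    return 1.0 - excess_model / excess_base


# Values copied from the example output (default and independence models).
CHI2_DEFAULT, DF_DEFAULT = 1021.692, 395
CHI2_INDEP, DF_INDEP = 2715.382, 435

print(f"CMIN/DF = {cmin_per_df(CHI2_DEFAULT, DF_DEFAULT):.3f}")  # 2.587
print(f"NFI     = {nfi(CHI2_DEFAULT, CHI2_INDEP):.3f}")  # 0.624
print(f"TLI     = {tli(CHI2_DEFAULT, DF_DEFAULT, CHI2_INDEP, DF_INDEP):.3f}")  # 0.697
print(f"CFI     = {cfi(CHI2_DEFAULT, DF_DEFAULT, CHI2_INDEP, DF_INDEP):.3f}")  # 0.725
```

All three baseline indices fall below the .90 cutoff, which is why the handout reads this output as a poor-fitting model even though CMIN/DF is below 3.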
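
Worked example: multicollinearity screen. The r > .9 rule from the Assumptions section can be checked before fitting a model by correlating every pair of observed variables. The sketch below is one possible way to do this in plain Python. The variable names and scores are invented purely for illustration, and comparing the absolute value of r to the cutoff (so that strong negative correlations are flagged too) is an assumption beyond what the handout states.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def flag_multicollinear(data: dict, cutoff: float = 0.9) -> list:
    """Return the pairs of variable names whose |r| exceeds the cutoff."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(data, 2)
        if abs(pearson_r(data[name_a], data[name_b])) > cutoff
    ]


# Hypothetical scores for three observed variables (made-up values).
toy_data = {
    "iq_test_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "iq_test_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "memory": [5.0, 3.0, 4.0, 1.0, 2.0],
}

flagged = flag_multicollinear(toy_data)
print(flagged)  # [('iq_test_a', 'iq_test_b')]
```

For a flagged pair, the handout's recommendation applies: drop one of the two variables before fitting the model.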