Introduction to Structural Equation Modeling

1. Uses of SEM
   a. Exploratory factor analysis
   b. Path analysis/regression
   c. Confirmatory factor analysis
   d. Causal modeling
2. Background topics
   a. Path analysis
   b. Confirmatory factor analysis
   c. LISREL matrix language vs. SIMPLIS

Comparison of Statistical Approaches

Path Analysis
o One or more categorical or continuous IVs, multiple continuous DVs.

Structural Equation Modeling (SEM)
o Like a path analysis, except that measurement error (as estimated by internal consistency reliability) is removed.
o One or more categorical or continuous IVs, multiple continuous DVs, but all corrected for measurement error.

Factor Analytic Methods (review)
o Exploratory (EFA)
   Principal Components (PCA)
      Assumes communality is equal to 1.00 (perfect reliability).
      Multiple continuous items; estimates the total amount of explained variance among the items.
   Factor Analysis (FA)
      Assumes less than perfect reliability (communality is usually less than 1.00).
      Multiple continuous items; estimates the total amount of common variance.
o Confirmatory (CFA)
   Also known as restricted factor analysis.
   Multiple continuous items, but rather than letting the data determine the factor structure, the structure is specified a priori.

Path Analysis: Overview

Path analysis is an extension of the regression model, used to test the fit of the correlation matrix against two or more causal models that are being compared by the researcher. The model is usually depicted in a circle-and-arrow figure in which single-headed arrows indicate causation. A regression is run for each variable in the model as a dependent variable on the others that the model indicates are its causes. The regression weights predicted by the model are compared with the observed correlation matrix for the variables, and a goodness-of-fit statistic is calculated. The best-fitting of two or more models is selected by the researcher as the best model for advancement of theory.

Path analysis requires the usual assumptions of regression. It is particularly sensitive to model specification, because failure to include relevant causal variables or inclusion of extraneous variables often substantially affects the path coefficients, which are used to assess the relative importance of various direct and indirect causal paths to the dependent variable. Such interpretations should be undertaken in the context of comparing alternative models, after assessing their goodness of fit. When the variables in the model are latent variables measured by multiple observed indicators, the analysis is termed structural equation modeling, treated separately below. We follow the conventional terminology by which path analysis refers to models with single-indicator variables.

Terminology of Structural Components

Constructs (latent variables)

Latent variables: exogenous vs. endogenous
Exogenous variables in a path model are those with no explicit causes (no arrows pointing to them, other than the measurement error term). If exogenous variables are correlated, this is indicated by a double-headed arrow connecting them.
Endogenous variables are those that do have incoming arrows. They include intervening causal variables and dependents. Intervening endogenous variables have both incoming and outgoing causal arrows in the path diagram; the dependent variable(s) have only incoming arrows.
Measures (indicators of latent variables)

Errors:
o associated with measures (measurement error)
   Note: measurement error is not modeled in path analysis, but it can be modeled in SEM.
o associated with latent variables (structural error)

Path coefficient/path weight
A path coefficient is a standardized regression coefficient (beta) showing the direct effect of an independent variable on a dependent variable in the path model.

Effect decomposition
Path coefficients may be used to decompose correlations in the model into direct and indirect effects, corresponding, of course, to direct and indirect paths reflected in the arrows in the model. This is based on the rule that, in a linear system, the total causal effect of one variable on another is the sum of the values of all the paths from the first variable to the second.

Significance and Goodness of Fit in Path Models
o To test individual path coefficients, one uses the standard t or F test from the regression output.
o To test the model with all its paths, one uses a goodness-of-fit test from a structural equation modeling program.

If a model is correctly specified, including all relevant and excluding all irrelevant variables, with arrows correctly indicated, then the standardized path coefficients can be combined (direct plus indirect effects) to reproduce the correlation coefficients. This means one can compare the path-estimated (reproduced) correlation matrix with the observed correlation matrix to assess the goodness of fit of path models. As a practical matter, goodness of fit is calculated by entering the model and its data into a structural equation modeling program (e.g., LISREL), which computes a variety of alternative goodness-of-fit coefficients.

Path Coefficients and Effect Decomposition
A path coefficient is equal to the corresponding regression coefficient (beta weight in regression).

[Path diagram: variable 1 is exogenous; variables 2, 3, and 4 are endogenous; the paths are P21 (1 -> 2), P31 (1 -> 3), P32 (2 -> 3), P41 (1 -> 4), P42 (2 -> 4), and P43 (3 -> 4).]

What are the exogenous and endogenous variables? Variable 1 (exogenous) vs. Variables 2, 3, and 4 (endogenous).

To get the path coefficients, you can run a separate regression for each endogenous variable on its causes. To test individual path coefficients, one uses the standard t or F test from the regression output.
P21: regress endogenous variable 2 on 1.
P31 and P32: regress endogenous variable 3 on 1 and 2.
P41, P42, and P43: regress endogenous variable 4 on 1, 2, and 3.

How do we get the reproduced correlations (r)?
Use the path values and plug them into the equations for the correlations. Reproduce as many of the correlations as you have paths to represent (same path diagram as above).

r12 = P21
r13 = P31 + P32*r12
r23 = P32 + P31*r12
r14 = P41 + P43*r13 + P42*r12
r24 = P42 + P43*r23 + P41*r12
r34 = P43 + P41*r13 + P42*r23

How can one test for model fit?
The model fits the data if the path coefficients can be combined to reproduce the observed correlations among the variables.

Model Identification
For the same path diagram as above:
6 unknowns: the path coefficients.
6 knowns: the correlations between the variables.
Number of knowns = (k² − k)/2, where k = the number of variables.

Just-identified model
If the number of unknowns is equal to (k² − k)/2, you have a just-identified model. With 4 variables, (4² − 4)/2 = 6. For a just-identified model, the path coefficients that we calculate can be used to reproduce the observed correlation matrix exactly. The problem is that you always reproduce the observed correlations exactly, no matter how well the model fits the data.

Over-identified model
If we did not specify the link between Variables 1 and 2, we would have 5 unknowns and 6 knowns. When the number of knowns exceeds the number of unknowns, the model is over-identified, which here produces 1 overidentifying restriction.
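To make these mechanics concrete, here is a minimal numerical sketch in Python (NumPy only) for the four-variable just-identified model above. The correlation values are made up for illustration and are not data from these notes. The sketch estimates the standardized path coefficients by regressing each endogenous variable on its specified causes (betas computed directly from the correlation matrix), plugs them into the decomposition equations above to reproduce the correlations, and counts knowns versus unknowns with (k² − k)/2.

```python
import numpy as np

# Illustrative (made-up) correlation matrix for variables 1-4 in the
# just-identified model above: 1 -> 2, 3, 4;  2 -> 3, 4;  3 -> 4.
R = np.array([
    [1.00, 0.30, 0.40, 0.35],
    [0.30, 1.00, 0.45, 0.50],
    [0.40, 0.45, 1.00, 0.55],
    [0.35, 0.50, 0.55, 1.00],
])

# Causes of each endogenous variable (variable 1 is exogenous: no incoming arrows).
causes = {2: [1], 3: [1, 2], 4: [1, 2, 3]}

# Standardized path coefficients: one multiple regression per endogenous
# variable, with betas obtained from the correlation matrix as Rxx^-1 * rxy.
P = {}
for dv, ivs in causes.items():
    idx = [v - 1 for v in ivs]
    betas = np.linalg.solve(R[np.ix_(idx, idx)], R[idx, dv - 1])
    for iv, b in zip(ivs, betas):
        P[(dv, iv)] = b                      # P[(4, 3)] is the path P43, etc.

# Reproduced correlations from the decomposition equations above.
r12 = P[(2, 1)]
r13 = P[(3, 1)] + P[(3, 2)] * r12
r23 = P[(3, 2)] + P[(3, 1)] * r12
r14 = P[(4, 1)] + P[(4, 3)] * r13 + P[(4, 2)] * r12
r24 = P[(4, 2)] + P[(4, 3)] * r23 + P[(4, 1)] * r12
r34 = P[(4, 3)] + P[(4, 1)] * r13 + P[(4, 2)] * r23

k = R.shape[0]
print("knowns (k^2 - k)/2 =", (k**2 - k) // 2, " unknowns =", len(P))
print("reproduced correlations:", np.round([r12, r13, r23, r14, r24, r34], 2))
```

Because this model is just-identified, the reproduced correlations match the illustrative observed correlations exactly; dropping the path from 1 to 2 would make the model over-identified and allow a genuine test of fit.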
Example

[Path diagram of the example model relating SES (1), IQ (2), nAch (3), and GPA (4), with the estimated path values (.30, .41, .42, .50) shown on the arrows.]

4 unknowns: the path coefficients.
6 knowns: the correlations between the variables.
This is an over-identified model.

            SES (1)   IQ (2)   nAch (3)   GPA (4)
SES (1)      1.00      .30       .41        .32
IQ (2)        .30     1.00       .12        .55
nAch (3)      .41      .16      1.00        .57
GPA (4)       .33      .57       .50       1.00

Note. Correlations below the diagonal are reproduced correlations; those above the diagonal are observed correlations.

Model Fit
Model fit is assessed by testing the similarity between the reproduced correlations and the observed correlations, i.e., by comparing the over-identified model to the just-identified model. The larger the discrepancy between the two sets of correlations (or the two models), the larger the chi-square value. You want a nonsignificant chi-square value, showing that your reproduced correlations are the same as, or similar to, the observed correlations. With a just-identified model, the chi-square value = 0.

To test the model with all its paths, one uses a goodness-of-fit test from a structural equation modeling program. If a model is correctly specified, including all relevant and excluding all irrelevant variables, with arrows correctly indicated, then the standardized path coefficients can be combined to reproduce the correlation coefficients. As a practical matter, goodness of fit is calculated by entering the model and its data into a structural equation modeling program (e.g., LISREL), which computes a variety of alternative goodness-of-fit coefficients.

Model Fit Indices
The following are typical, though not universal, of what researchers report to indicate degree of fit in path analysis, CFA, and SEM:

Overall chi-square test: non-significant p-values indicate a good fit.

Goodness-of-fit index (GFI, also known as gamma-hat, γ̂) and adjusted goodness-of-fit index (AGFI): the closer to 1.00, the better. 0.90-0.95 acceptable fit, 0.95-0.99 close fit, 1.00 exact fit.

Comparative fit index (CFI): the closer to 1.00, the better. < 0.85 unacceptable fit, 0.85-0.89 mediocre fit (the model could be improved substantially), 0.90-0.95 acceptable fit, 0.95-0.99 close fit, 1.00 exact fit.

RMSEA (root mean square error of approximation): the closer to 0, the better. > 0.10 unacceptable fit, 0.08-0.10 mediocre fit, 0.06-0.08 acceptable fit, 0.01-0.06 close fit, 0.00 exact fit.

Confirmatory Factor Analysis (CFA)
Is a more theoretical, hypothesis-testing way of doing factor analysis. Instead of running an EFA and trying to interpret it, we set up a particular structure and then see if it fits the data well.
o Factor loadings, error variances, etc., occur as before.
o We force simple structure by not allowing cross-loadings.
Think of the EFA-CFA distinction like this:
o CFA is like hierarchical regression: it is a model-fitting and model-comparison procedure.
o EFA is like stepwise regression.
CFA is based directly on classical test theory.
o But the assumptions of classical test theory can be relaxed (e.g., correlated error variances).
CFA can be run through software programs such as:
o SAS PROC CALIS
o LISREL
o AMOS
o EQS
o Mplus

Here is a graphical example. Notice that simple structure is present: no cross-loadings.

Learn Your Greek
See the handout. Let's practice.

Model Specification and Identification
It is common to display confirmatory factor models as path diagrams in which squares represent observed variables and circles represent the latent concepts. Single-headed arrows are used to imply a direction of assumed causal influence, and double-headed arrows are used to represent covariance between two latent variables.
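As a small illustration of specifying a factor structure a priori, here is a sketch in Python (NumPy) of a fully standardized two-factor CFA with simple structure. All loadings and the factor correlation are made-up numbers, not values from these notes. Using the standard LISREL names Λ (loadings), Φ (factor correlations), and Θδ (unique variances), it builds the model-implied correlation matrix and counts knowns versus free parameters the same way as in the path-analysis slides.

```python
import numpy as np

# Hypothetical fully standardized two-factor CFA: six indicators, simple
# structure (no cross-loadings), correlated factors. All values are made up.
Lambda = np.array([          # loadings of x1-x6 on ksi1 and ksi2
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.8],
    [0.0, 0.7],
])
Phi = np.array([[1.0, 0.4],  # factor correlation matrix (ksi1 with ksi2 = .40)
                [0.4, 1.0]])
# In the standardized solution each unique variance is 1 - loading^2.
Theta_delta = np.diag(1.0 - (Lambda**2).sum(axis=1))

# Model-implied correlation matrix: Sigma = Lambda Phi Lambda' + Theta_delta.
Sigma = Lambda @ Phi @ Lambda.T + Theta_delta

p = Lambda.shape[0]
knowns = (p**2 - p) // 2                 # distinct correlations among the indicators
free = np.count_nonzero(Lambda) + 1      # 6 free loadings + 1 factor correlation
print(np.round(Sigma, 2))
print("knowns =", knowns, " free parameters =", free, " df =", knowns - free)
```

In an actual CFA the free loadings and factor correlation are estimated so that the implied matrix comes as close as possible to the observed one, and the remaining discrepancy is tested with the chi-square and fit indices described above.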
Path Diagram of a Confirmatory Factor Model
In factor analysis the researcher almost always assumes that the latent variables "cause" the observed variables, as shown by the single-headed arrows pointing away from the circles and toward the manifest variables. The two ξ (ksi) latent variables represent common factors, with paths pointing to more than one observed variable. The circles labeled δ (delta) represent unique factors, because each affects only a single observed variable. The δi incorporate all the variance in each xi not captured by the common factors, such as measurement error. In this model the two ξi are expected to covary, as represented by the two-headed arrow. Additionally, the error in the measurement of x3 is expected to correlate to some extent with the measurement error for x6. This may occur, for example, with panel data in which ξ1 and ξ2 represent the same concept measured at different points in time; if there is measurement error at t1, it is likely that there will be measurement error at t2.

Your CFA project will be about MTMM (multitrait-multimethod) analysis. Relax...

Structural Equation Modeling (SEM)
Combines CFA with path analysis.
o When we ran path analysis, we assumed the manifest variables were measured without error. This is almost always false.
o SEM allows for the modeling of relationships among variables (like path analysis), but with the difference that the variables are latent constructs disattenuated from measurement error.
o Think of it as correcting all of the covariances for measurement error and then running a path analysis.
o It explicitly integrates measurement with statistical modeling, which makes it extremely powerful.
o It is perhaps the most flexible modeling approach available: it can handle nested designs, growth models, multilevel models, etc.

Here is a graphical example.

[Structural model diagram: the exogenous latent variables Prehire Process Fairness and Outcome Fairness (indicators X1-X6) and the endogenous latent variables Posthire Process Fairness, Prehire Applicant Intentions, and Posthire Applicant Intentions (indicators Y1-Y9). Standardized loadings range from about .76 to .96; the structural path estimates (standard errors in parentheses) are .51** (.05), .26** (.04), .11 (.06), .55** (.06), .83** (.07), and .09* (.04).]

Apply your Greek here. Why? Because in LISREL output you will see lots of Greek!
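As a reference for reading that Greek, here is the standard LISREL notation written out as the two measurement models and one structural model, added as a summary sketch rather than taken from the notes:

```latex
% Standard LISREL notation: measurement models and structural model
\begin{align}
  x    &= \Lambda_x \xi + \delta        && \text{exogenous indicators, loadings } \Lambda_x \\
  y    &= \Lambda_y \eta + \varepsilon  && \text{endogenous indicators, loadings } \Lambda_y \\
  \eta &= B \eta + \Gamma \xi + \zeta   && \text{structural model among latent variables}
\end{align}
% Covariance matrices: Phi = Cov(xi), Psi = Cov(zeta),
% Theta-delta = Cov(delta), Theta-epsilon = Cov(epsilon).
```

Here ξ are the exogenous latent variables, η the endogenous latent variables, δ and ε the measurement errors, ζ the structural disturbances, Γ the paths from ξ to η, and Β the paths among the η.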