Structural Equation Modeling
Karl L. Wuensch
Dept of Psychology
East Carolina University

Nomenclature
• AKA
  – “causal modeling”
  – “analysis of covariance structure”
• Two sets of variables
  – Indicators: measured (observed, manifest) variables, diagramed within rectangles
  – Latent variables: factors, diagramed within ellipses

Causal Language
• Indicators and latent variables may be classified as “independent” or “dependent”
• Even if no variables are manipulated
• Based on the causal model being tested.
• In the diagram, indicators and latent variables may be connected by arrows
  – One-headed = unidirectional causal flow
  – Two-headed = direction not specified

Paths
• “Dependent” variables have one-headed arrows pointing to them.
• “Independent” variables do not.
• “Dependent” variables also have residuals (they are not perfectly predicted by the independents)
  – Called errors (e) for observed variables
  – Called disturbances (d) for latent variables

Two Models
• Measurement Model: how the measured variables are related to the latent variables.
• Structural Model: how the latent variables are related to each other.

Two Variance/Covariance Matrices
• The sample matrix, computed from the sample data.
• The estimated population matrix, estimated from the model.
• A χ² test of the null hypothesis that the model fits the data; a significant χ² indicates poor fit.
• More useful are goodness-of-fit estimators, since the χ² test is very sensitive to sample size.

Sample Size
• Need at least 200 cases even for a simple model.
• Rule of thumb: at least 10 cases per estimated parameter.

Assumptions & Problems
• Multivariate normality.
• Linear relationships
  – But can include polynomial components
• A singular matrix may crash the program.
• Multicollinearity can be a problem.

Simple Example from T&F
(Diagram: the “independent” variables are shaded.)

Regression Parameters
• Regression coefficients for the paths
  – May be “fixed” to the value 0 (no path) or 1
  – Or estimated from the data (*)
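The minimal Python sketch below makes the comparison between the sample matrix and the estimated population matrix concrete. It assumes a single latent variable with three indicators. For that model, the model-implied covariance of indicators i and j is λᵢλⱼφ, with the error variance θᵢ added on the diagonal. The function names and every number (loadings, factor variance, error variances, sample matrix) are hypothetical illustrations, not estimates. Real SEM software would estimate the parameters, for example by maximum likelihood.

```python
# Hypothetical one-factor model: loadings (the first fixed to 1 as the "marker"),
# factor variance phi, and error variances theta. Illustrative values only.

def implied_cov(loadings, factor_var, error_vars):
    """Estimated population var/cov matrix for one latent variable:
    Sigma[i][j] = l_i * l_j * phi, with theta_i added on the diagonal."""
    m = len(loadings)
    return [
        [loadings[i] * loadings[j] * factor_var + (error_vars[i] if i == j else 0.0)
         for j in range(m)]
        for i in range(m)
    ]


def residual_matrix(sample, implied):
    """Element-wise sample minus model-implied values; large residuals signal misfit."""
    return [[s - e for s, e in zip(s_row, e_row)]
            for s_row, e_row in zip(sample, implied)]


loadings = [1.0, 0.8, 0.6]
sigma = implied_cov(loadings, factor_var=2.0, error_vars=[0.5, 0.4, 0.3])
sample = [[2.6, 1.5, 1.1],
          [1.5, 1.7, 1.0],
          [1.1, 1.0, 1.0]]
resid = residual_matrix(sample, sigma)
for row in resid:
    print([round(r, 2) for r in row])
```

The residual matrix shows where this hypothetical model under- or over-reproduces each variance and covariance. The χ² and goodness-of-fit indices summarize such discrepancies in a single number.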
Variance/Covariance Parameters
• Variances/covariances of the “independent” variables
  – May be estimated, or
  – fixed to 1, or
  – fixed to the variance of a “marker” measured variable (by setting the path to the marker to 1).
• Variances of “dependent” latent variables are usually fixed to the variance of one of their measured variables (by setting the path to that measured variable to 1).

Model Identification
• A model is “identified” if there is a unique solution for each of the estimated parameters.
• Determine the number of input data points (values in the sample variance/covariance matrix).
• This is m(m + 1)/2,
  – where m = number of measured variables.
• For T&F’s simple model, 5(6)/2 = 15 data points.
• If the number of data points = the number of parameters to be estimated, the model is “just identified,” or “saturated,” and the fit will be perfect.
• If there are fewer data points than parameters to be estimated, the model is “under identified” and the analysis is kaput.

The “Over Identified” Model
• The number of input data points exceeds the number of parameters to be estimated.
• This is the desired situation.
• For T&F’s simple model, count the number of asterisks in the diagram. I count 11.
• With 15 input data points and 11 parameters to estimate, we have an over-identified model.

Eleven Parameters (*)
(Diagram: the “independent” variables are shaded.)

Identification of the Measurement Model Should Be OK if
• There is only one latent variable, it has at least three indicators, and the errors are not correlated.
• There are two or more latent variables, each has at least three indicators, the errors are not correlated, each indicator loads on only one latent variable, and the latent variables are allowed to covary.
• There are two or more latent variables, one has only two indicators, the errors are not correlated, each indicator loads on only one latent variable, and none of the latent variables has a variance or covariance of zero.
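The counting step of model identification can be sketched in a few lines of Python. The function names are hypothetical. The numbers are the slides' own: 5 measured variables and 11 asterisks. The difference between data points and parameters is the model's degrees of freedom. This count is only a necessary condition: a model can pass it and still fail to be identified, which is why the measurement-model rules are still needed.

```python
def data_points(m):
    """Number of distinct variances and covariances among m measured variables."""
    return m * (m + 1) // 2


def identification_status(m, n_params):
    """Compare input data points with parameters to estimate.
    A necessary condition only; passing it does not guarantee identification."""
    df = data_points(m) - n_params
    if df < 0:
        status = "under identified"
    elif df == 0:
        status = "just identified (saturated)"
    else:
        status = "over identified"
    return status, df


# T&F's simple model: 5 measured variables, 11 parameters (asterisks).
print(data_points(5))                # 15
print(identification_status(5, 11))  # ('over identified', 4)
```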
Identification of the Structural Model May Be OK if
• None of the latent DVs predicts another latent DV,
• or, if one does, the relationship is recursive (unidirectional) and the disturbances are not correlated.
• If there are nonrecursive relationships (an arrow from A to B and from B to A), hire an expert in SEM.

Error in Identification
• If there is a problem, the software will throw an error.
• The software may suggest a way to achieve identification.
• You must tinker with the model to make it identified.

Estimation
• Maximum Likelihood is most common
  – An iterative procedure that maximizes the fit between the sample var/cov matrix and the estimated population var/cov matrix.
• Generalized Least Squares estimation has also fared well in Monte Carlo comparisons of techniques.

Modifying and Comparing Models
• A model may be simplified by deleting one or more of its parameters.
• The simplified model is nested within the previous model.
• Difference χ² = the difference between the two models’ χ² values.
• df = number of parameters deleted.

LM and Wald Tests
• Lagrange Multiplier Test
  – Would fit be improved by estimating a parameter that is currently fixed?
• Wald Test
  – Would fixing this parameter (for example, deleting a path) significantly reduce the fit?
  – Available in SAS PROC CALIS, not in Amos

Reliability of Measured Variables
• Assume that the variance in a measured variable is due to variance in the latent variable (the “true scores”) plus random error.
• Reliability = true variance / (true variance + error variance).
• Thus, estimated reliability = r² between the measured variable and the latent variable.
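The nested-model comparison can be sketched in Python using only the standard library. The p-value helper uses the standard closed-form upper-tail series for a χ² with integer df. This is a swapped-in textbook formula, not a feature of any SEM package. The function names and the fit results are hypothetical: a χ² of 8.4 for the full model and 12.2 after deleting two paths.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_k x^((2k-1)/2) / (2k-1)!!
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * math.sqrt(x)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def chi2_difference_test(chi2_simplified, chi2_full, n_deleted):
    """Nested-model comparison: difference chi-square with df = parameters deleted."""
    diff = chi2_simplified - chi2_full
    return diff, n_deleted, chi2_sf(diff, n_deleted)


# Hypothetical fit results: deleting 2 paths raised chi-square from 8.4 to 12.2.
diff, df, p = chi2_difference_test(12.2, 8.4, 2)
print(f"difference chi-square = {diff:.2f}, df = {df}, p = {p:.3f}")
```

SEM software reports each model's χ² and df, so only the subtraction and the tail probability need to be done by hand. A significant difference means that deleting the parameters significantly worsened the fit.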