This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials is permitted only in accordance with the license rights granted. Materials are provided "AS IS"; no representations or warranties are made. The user assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. The materials may contain content owned by others; the user is responsible for obtaining permissions for use from third parties as needed.

Introduction to Structural Equations with Latent Variables
Statistics for Psychosocial Research II: Structural Models
Qian-Li Xue

[Decision flowchart for choosing a method: Test of causal hypotheses? No → ordinary regression; Yes → SEM (origin: path models). Continuous endogenous variables and continuous latent variables → classic SEM; categorical indicators and categorical latent variables → latent class regression (with latent trait and latent profile models as related cases); longitudinal data → Adv. SEM I (latent growth curves); multilevel data → Adv. SEM II (multilevel models).]

Adding Latent Variables to the Model
So far we have included only observed variables in our "path models." To extend the models to latent variables, we need to add a "measurement piece": how do we "define" the latent variable (think factor analysis)? The diagrams become more complicated to look at, but the same general principles apply.

[Path diagram: ξ1 → η1 (γ11), ξ1 → η2 (γ21), η1 → η2 (β21), with errors ζ1 on η1 and ζ2 on η2.]

Notation for the Latent Variable Model
- η = latent endogenous variable (eta)
- ξ = latent exogenous variable (ksi, pronounced "kah-see")
- ζ = latent error (zeta)
- β = coefficient on a latent endogenous variable (beta)
- γ = coefficient on a latent exogenous variable (gamma)

η1 = γ11 ξ1 + ζ1
η2 = β21 η1 + γ21 ξ1 + ζ2

e.g.:
ξ1 is childhood home environment, η1 is social support, and η2 is cancer coping ability.

In matrix form: η = Βη + Γξ + ζ, with Cov(ξ) = Φ and Cov(ζ) = Ψ.

[Path diagram with the measurement model added: x1–x3 load on ξ1 with loadings λ1–λ3 and errors δ1–δ3; y1–y3 load on η1 with loadings λ4–λ6 and errors ε1–ε3; y4–y6 load on η2 with loadings λ7–λ9 and errors ε4–ε6; structural paths γ11, γ21, β21 and errors ζ1, ζ2 as before.]

Notation for the Measurement Model
- y = observed indicator of η; ε = measurement error on y; λy = coefficient relating y to η
- x = observed indicator of ξ; δ = measurement error on x; λx = coefficient relating x to ξ

x1 = λ1 ξ1 + δ1
x2 = λ2 ξ1 + δ2
x3 = λ3 ξ1 + δ3

y1 = λ4 η1 + ε1
y2 = λ5 η1 + ε2
y3 = λ6 η1 + ε3
y4 = λ7 η2 + ε4
y5 = λ8 η2 + ε5
y6 = λ9 η2 + ε6

In matrix form: y = Λy η + ε with Cov(ε) = Θε, and x = Λx ξ + δ with Cov(δ) = Θδ.

Example: Model Specification
η2 is democracy in 1965, η1 is democracy in 1960, and ξ1 is industrialization in 1960.

⎡η1⎤   ⎡ 0   0⎤ ⎡η1⎤   ⎡γ11⎤        ⎡ζ1⎤
⎢  ⎥ = ⎢      ⎥ ⎢  ⎥ + ⎢   ⎥ [ξ1] + ⎢  ⎥
⎣η2⎦   ⎣β21  0⎦ ⎣η2⎦   ⎣γ21⎦        ⎣ζ2⎦

i.e. η = Βη + Γξ + ζ, with Cov(ξ) = Φ and Cov(ζ) = Ψ.

[Path diagram of the full model: y1–y4 load on η1, y5–y8 load on η2, and x1–x3 load on ξ1; the first loading on each latent variable is fixed to 1.]
- Y1, Y5: freedom of press
- Y2, Y6: freedom of group opposition
- Y3, Y7: fairness of elections
- Y4, Y8: effectiveness of the legislative body
- x1: GNP per capita; x2: energy consumption per capita; x3: % labor force
(Bollen, pp. 322-323)

Model Estimation
In multiple regression, estimation is based on individual cases via least squares, i.e. minimization of Σ (Y − Ŷ)² over the n cases. In SEM, estimation is based on covariances. If the model is correct and θ is known, then Σ = Σ(θ), where Σ is the population covariance matrix of the observed variables and Σ(θ) is that covariance matrix written as a function of the parameters θ.

Regression analysis and confirmatory factor analysis are special cases. E.g., for y = γx + ζ, with ζ ⊥ x and E(ζ) = 0:

⎡ VAR(y)     COV(x,y) ⎤   ⎡ γ²VAR(x) + VAR(ζ)   γVAR(x) ⎤
⎢                     ⎥ = ⎢                             ⎥
⎣ COV(x,y)   VAR(x)   ⎦   ⎣ γVAR(x)             VAR(x)  ⎦

so COV(x,y) = γVAR(x), which gives γ = COV(x,y) / VAR(x). Does γ look familiar?
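The moment identity γ = Cov(x,y)/Var(x) can be checked numerically. A minimal sketch with simulated data (the variable names and the chosen values γ = 0.7, n = 5000 are illustrative, not from the source): the moment-based estimate coincides with the ordinary least-squares slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
gamma_true = 0.7                      # illustrative population value
x = rng.normal(0.0, 2.0, n)
zeta = rng.normal(0.0, 1.0, n)        # error, independent of x, mean 0
y = gamma_true * x + zeta

# Sample covariance matrix S of (y, x)
S = np.cov(np.vstack([y, x]))
cov_xy, var_x = S[0, 1], S[1, 1]

# Moment-based estimate: gamma = Cov(x, y) / Var(x)
gamma_hat = cov_xy / var_x

# OLS slope from simple linear regression gives the same number
gamma_ols = np.polyfit(x, y, 1)[0]

print(gamma_hat, gamma_ols)  # both close to 0.7
```

The two estimates agree because the OLS slope in simple linear regression is exactly the sample covariance divided by the sample variance.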
Remember that in SLR, β̂ = (x′x)⁻¹x′y, the sample analogue of COV(x,y)/VAR(x).

Model Estimation
In reality, neither the population covariance matrix Σ nor the parameters θ are known. What we have is the sample estimate of Σ, called S. Goal: estimate θ based on S, choosing the estimates of θ such that Σ̂ = Σ(θ̂) is as close to S as possible. But how close is "close"? We define and minimize objective functions, also called "fitting functions": F(S, Σ̂), e.g. a function of the residual matrix S − Σ̂.

Common Fitting Functions: Maximum Likelihood (ML)
Minimize
FML = (1/2) tr({[S − Σ(θ)] Σ⁻¹(θ)}²), or
FML = log|Σ(θ)| + tr(S Σ⁻¹(θ)) − log|S| − (p+q).
Explicit solutions for θ may not exist; an iterative numeric procedure is needed.
Asymptotic properties of ML estimators:
- Consistent: θ̂ → θ as n → ∞
- Efficient: smallest asymptotic variance
- Asymptotically normal

Advantages of ML:
- Scale invariant: F(S, Σ(θ)) is scale invariant if F(S, Σ(θ)) = F(DSD, DΣ(θ)D), where D is a diagonal matrix with positive elements. E.g., if D consists of the inverses of the standard deviations of the observed variables, DSD becomes a correlation matrix. More generally, the value of F is the same under any change of scale (e.g. dollars to cents).
- Scale free: knowing D, we can calculate θ̂* (based on the transformed data) from θ̂ (based on the untransformed data) without actually rerunning the model.
- Test of overall model fit for an overidentified model, based on the fact that (N−1)FML is asymptotically χ²-distributed with ½(p+q)(p+q+1) − t degrees of freedom (t = number of free parameters).

Disadvantage:
- Assumes multivariate normality.

Common Fitting Functions: Unweighted Least Squares (ULS)
Minimize FULS = (1/2) tr[(S − Σ(θ))²]. Analogous to OLS, this minimizes the sum of squares of each element in the residual matrix S − Σ(θ); it gives greater weight to the off-diagonal covariance terms than to the variance terms (each off-diagonal residual appears twice in the trace). Explicit solutions for θ may not exist; an iterative numeric procedure is needed.
Advantages of ULS estimators:
- Intuitive
- Consistent, i.e.
θ̂ → θ as n → ∞
- No distributional assumptions

Disadvantages:
- Not the most efficient
- Not scale invariant, not scale free

Common Fitting Functions: Generalized Least Squares (GLS)
Minimize FGLS = (1/2) tr({[S − Σ(θ)] S⁻¹}²). This weights the elements of S − Σ(θ) according to their variances and covariances; FULS is the special case with the weight matrix S⁻¹ replaced by I.
Advantages of GLS estimators:
- Intuitive
- Consistent: θ̂ → θ as n → ∞
- Asymptotically normal (significance tests available)
- Asymptotically efficient
- Scale invariant and scale free
- Test of overall model fit for an overidentified model: (N−1)FGLS is asymptotically χ²-distributed with ½(p+q)(p+q+1) − t degrees of freedom.

Disadvantage:
- Sensitive to "fat" or "thin" tails (excess kurtosis).

Simple Case of SEM with Latent Variables: Confirmatory Factor Analysis
Recap of the basic characteristics of exploratory factor analysis (EFA):
- Most EFA methods extract orthogonal factors, which is "boring" to SEM users.
- Distinction between common and unique variances.
- EFA is underidentified (i.e. no unique solution). Remember rotation? Different rotations fit equally well!
- All measures are related to each factor.

Confirmatory factor analysis (CFA) takes factor analysis a step further: we can "test," "confirm," or "implement" a "highly constrained a priori structure that meets conditions of model identification." But be careful: a model can never be confirmed!
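The fitting functions described above can be made concrete in a few lines. A minimal sketch (the helper names `f_ml` and `f_gls` are illustrative, and the number of observed variables p+q is collapsed into a single count k): both functions are zero when Σ(θ) reproduces S exactly, and positive otherwise; (N−1) times the minimized value is the χ² test statistic.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML fitting function: log|Sigma| + tr(S Sigma^-1) - log|S| - k,
    where k is the number of observed variables."""
    k = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1]
            + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - k)

def f_gls(S, Sigma):
    """GLS fitting function: (1/2) tr({[S - Sigma] S^-1}^2)."""
    R = (S - Sigma) @ np.linalg.inv(S)
    return 0.5 * np.trace(R @ R)

# Illustrative 2x2 sample covariance matrix
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])

# Saturated case: Sigma(theta) reproduces S exactly, so both F's are zero
print(f_ml(S, S), f_gls(S, S))

# A misspecified Sigma (the covariance forced to zero) gives positive values
Sigma_bad = np.diag(np.diag(S))
print(f_ml(S, Sigma_bad), f_gls(S, Sigma_bad))
```

In practice the functions are minimized over θ by an iterative numerical optimizer, as the slides note; this sketch only evaluates them at fixed candidate matrices.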
The CFA model is constructed in advance:
- The number of latent variables ("factors") is pre-set by the analyst (usually not part of the modeling).
- Whether each latent variable influences each observed variable is specified.
- Measurement errors may be allowed to correlate.
Difference between CFA and the usual SEM: SEM assumes causally interrelated latent variables; CFA assumes merely interrelated latent variables (i.e. all exogenous).

Exploratory Factor Analysis
Two-factor model, x = Λξ + δ, with every indicator loading on both factors:

⎡x1⎤   ⎡λ11  λ12⎤        ⎡δ1⎤
⎢x2⎥   ⎢λ21  λ22⎥        ⎢δ2⎥
⎢x3⎥ = ⎢λ31  λ32⎥ ⎡ξ1⎤ + ⎢δ3⎥
⎢x4⎥   ⎢λ41  λ42⎥ ⎣ξ2⎦   ⎢δ4⎥
⎢x5⎥   ⎢λ51  λ52⎥        ⎢δ5⎥
⎣x6⎦   ⎣λ61  λ62⎦        ⎣δ6⎦

[Path diagram: both ξ1 and ξ2 point to each of x1–x6, with errors δ1–δ6.]

CFA Notation
Two-factor model, x = Λξ + δ, with simple structure (zero cross-loadings):

⎡x1⎤   ⎡λ11   0 ⎤        ⎡δ1⎤
⎢x2⎥   ⎢λ21   0 ⎥        ⎢δ2⎥
⎢x3⎥ = ⎢λ31   0 ⎥ ⎡ξ1⎤ + ⎢δ3⎥
⎢x4⎥   ⎢ 0   λ42⎥ ⎣ξ2⎦   ⎢δ4⎥
⎢x5⎥   ⎢ 0   λ52⎥        ⎢δ5⎥
⎣x6⎦   ⎣ 0   λ62⎦        ⎣δ6⎦

[Path diagram: ξ1 points to x1–x3 only; ξ2 points to x4–x6 only.]

For the "matrix-challenged":

CFA:
x1 = λ11 ξ1 + δ1
x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3
x4 = λ42 ξ2 + δ4
x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12

EFA:
x1 = λ11 ξ1 + λ12 ξ2 + δ1
x2 = λ21 ξ1 + λ22 ξ2 + δ2
x3 = λ31 ξ1 + λ32 ξ2 + δ3
x4 = λ41 ξ1 + λ42 ξ2 + δ4
x5 = λ51 ξ1 + λ52 ξ2 + δ5
x6 = λ61 ξ1 + λ62 ξ2 + δ6
cov(ξ1, ξ2) = 0

More Important Notation
Φ (capital of φ): covariance matrix of the factors,

Φ = var(ξ) = ⎡φ11  φ12⎤
             ⎣φ12  φ22⎦

with var(ξ1) = φ11 and cov(ξ1, ξ2) = φ12.

Ψ (capital of ψ): covariance matrix of the errors, a symmetric 6×6 matrix with entries ψ11, ψ21, …, ψ66, where var(δ1) = ψ11, cov(δ1, δ2) = ψ12, etc. NOTE: usually ψij = 0 if i ≠ j.

Model Estimation
In our previous CFA example, we have six equations and many more unknowns. In this form there is not enough information to uniquely solve for the loadings λ and the factor correlation. What if we multiply both sides of x = Λξ + δ by x′?

xx′ = (Λξ + δ)(Λξ + δ)′ = (Λξ)(Λξ)′ + (Λξ)δ′ + δ(Λξ)′ + δδ′

Because ξ and δ are independent, the cross terms vanish in expectation:

E(xx′) = Λ E(ξξ′) Λ′ + E(δδ′), i.e. Σ = ΛΦΛ′ + Ψ = Σ(θ),

where Φ is the covariance matrix of the factors ξ and Ψ is the error covariance matrix.

Model Constraints
Constraints are the hallmark of CFA. Purposes of setting constraints:
- Test a priori theory
- Ensure identifiability
- Test reliability of measures

Model Constraints: Identifiability
Latent variables need some constraints: because factors are unmeasured, their variances can take different values. Recall EFA, where we constrained the factors to F ~ N(0,1). Otherwise the model is not identifiable. Here we have two options:
- Fix the variance of each latent variable to 1 (or another constant), or
- Fix one path (loading) between each latent variable and one of its indicators.

Necessary Constraints
[Two path diagrams: in the first, var(ξ1) and var(ξ2) are fixed to 1; in the second, one loading per factor (ξ1 → x1 and ξ2 → x4) is fixed to 1.]

Model Parametrization

Fix variances:
x1 = λ11 ξ1 + δ1
x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3
x4 = λ42 ξ2 + δ4
x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12
var(ξ1) = 1
var(ξ2) = 1

Fix path:
x1 = ξ1 + δ1
x2 = λ21 ξ1 + δ2
x3 = λ31 ξ1 + δ3
x4 = ξ2 + δ4
x5 = λ52 ξ2 + δ5
x6 = λ62 ξ2 + δ6
cov(ξ1, ξ2) = φ12
var(ξ1) = φ11
var(ξ2) = φ22

Model Constraints: Reliability Assessment
Parallel measures:
- Tx1 = Tx2 [= E(x)] (true scores are equal; T affects x1 and x2 equally)
- Cov(δ1, δ2) = 0 (errors are not correlated)
- Var(δ1) = Var(δ2) (equal error variances)
[Path diagram: T → x1, x2, x3 with equal loadings λ and equal error variances e.]

Tau-equivalent measures:
- Tx1 = Tx2 (T affects x1 and x2 equally)
- Var(δ1) ≠ Var(δ2) (error variances may differ)
Note: for standardized measures, it makes no sense to constrain the loadings without also constraining the residuals, since Var(x) = 1.0.
[Path diagram: T → x1, x2, x3 with equal loadings λ but distinct error variances e1, e2, e3.]
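Putting the pieces together, the CFA model-implied covariance matrix Σ(θ) = ΛΦΛ′ + Ψ can be built directly from the constrained parameter matrices. A minimal sketch for a hypothetical two-factor model with simple structure, identified via the "fix path" option (first loading on each factor set to 1); all numeric values are illustrative:

```python
import numpy as np

# Factor loading matrix Lambda with simple structure (zero cross-loadings)
Lambda = np.array([
    [1.0, 0.0],   # x1 = xi1 + d1      (loading fixed to 1 for identification)
    [0.9, 0.0],   # x2 = 0.9*xi1 + d2
    [0.7, 0.0],   # x3 = 0.7*xi1 + d3
    [0.0, 1.0],   # x4 = xi2 + d4      (loading fixed to 1 for identification)
    [0.0, 1.1],   # x5 = 1.1*xi2 + d5
    [0.0, 0.8],   # x6 = 0.8*xi2 + d6
])

# Phi: factor covariance matrix (free variances under "fix path")
Phi = np.array([[1.2, 0.4],
                [0.4, 0.9]])

# Psi: diagonal error covariance matrix (psi_ij = 0 for i != j)
Psi = np.diag([0.3, 0.4, 0.5, 0.3, 0.2, 0.6])

# Model-implied covariance matrix of the observed x's
Sigma = Lambda @ Phi @ Lambda.T + Psi

print(Sigma[0, 1])  # cov(x1, x2) = 1 * 0.9 * phi11, i.e. about 1.08
print(Sigma[0, 3])  # cov(x1, x4) = phi12 = 0.4, carried by the factor correlation
```

Estimation then amounts to choosing the free elements of Λ, Φ, and Ψ so that this implied Σ is as close as possible to the sample matrix S under one of the fitting functions discussed earlier.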