
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Introduction to Structural Equations with Latent Variables
Statistics for Psychosocial Research II: Structural Models
Qian-Li Xue
[Figure: decision flowchart for choosing a method.
Test of causal hypotheses? No → Ordinary Regression; Yes → SEM (origin: path models).
Continuous endogenous variables and continuous LVs? Yes → Classic SEM.
Categorical indicators and categorical LVs? Yes → Latent Class Regression (cf. Latent Trait, Latent Profile models).
Longitudinal data? Yes → Adv. SEM I: latent growth curves.
Multilevel data? Yes → Adv. SEM II: Multilevel Models.]
Classic SEM
Adding Latent Variables to the Model
ƒ So far, we’ve only included observed variables in our “path models”
ƒ Extension to latent variables:
ƒ need to add a “measurement piece”
ƒ how do we “define” the latent variable? (think factor analysis)
ƒ more complicated to look at, but the same general principles apply
Path Diagram
[Figure: path diagram ξ1 → η1 (γ11), ξ1 → η2 (γ21), η1 → η2 (β21); disturbances ζ1, ζ2.]
Notation for Latent Variable Model
ƒ η = latent endogenous variable (eta)
ƒ ξ = latent exogenous variable (ksi, pronounced “kah-see”)
ƒ ζ = latent error (zeta)
ƒ β = coefficient on latent endogenous variable (beta)
ƒ γ = coefficient on latent exogenous variable (gamma)

η1 = γ11ξ1 + ζ1
η2 = β21η1 + γ21ξ1 + ζ2
e.g. ξ1 is childhood home environment, η1 is social support, and
η2 is cancer coping ability
η = Βη + Γξ + ζ , Cov(ξ ) = Φ, Cov(ζ ) = Ψ
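As a quick numeric sanity check of the structural equation above (a sketch with hypothetical parameter values, assuming numpy is available): the simultaneous equations η = Βη + Γξ + ζ can be solved for η, giving the reduced form η = (I − Β)⁻¹(Γξ + ζ).

```python
import numpy as np

# Sketch of the structural model eta = B @ eta + Gamma @ xi + zeta
# (hypothetical parameter values, chosen only for illustration).
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])        # beta21 = 0.5
Gamma = np.array([[0.8],
                  [0.3]])         # gamma11 = 0.8, gamma21 = 0.3
xi = np.array([1.2])              # single exogenous latent variable
zeta = np.array([0.1, -0.2])      # latent errors zeta1, zeta2

# Reduced form: eta = (I - B)^{-1} (Gamma @ xi + zeta)
eta = np.linalg.solve(np.eye(2) - B, Gamma @ xi + zeta)

# eta satisfies the two original equations:
#   eta1 = gamma11*xi1 + zeta1
#   eta2 = beta21*eta1 + gamma21*xi1 + zeta2
assert np.isclose(eta[0], 0.8 * 1.2 + 0.1)
assert np.isclose(eta[1], 0.5 * eta[0] + 0.3 * 1.2 - 0.2)
```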
Path Diagram
[Figure: path diagram with the measurement model added. x1–x3 load on ξ1 with loadings λ1–λ3 and errors δ1–δ3; y1–y3 load on η1 with loadings λ4–λ6 and errors ε1–ε3; y4–y6 load on η2 with loadings λ7–λ9 and errors ε4–ε6; structural paths γ11 (ξ1 → η1), γ21 (ξ1 → η2), β21 (η1 → η2); disturbances ζ1, ζ2.]
Notation for Measurement Model
y = observed indicator of η
x = observed indicator of ξ
ε = measurement error on y
δ = measurement error on x
λx = coefficient relating x to ξ
λy = coefficient relating y to η

x1 = λ1ξ1 + δ1    y1 = λ4η1 + ε1    y4 = λ7η2 + ε4
x2 = λ2ξ1 + δ2    y2 = λ5η1 + ε2    y5 = λ8η2 + ε5
x3 = λ3ξ1 + δ3    y3 = λ6η1 + ε3    y6 = λ9η2 + ε6

y = Λyη + ε, Cov(ε) = Θε
x = Λxξ + δ, Cov(δ) = Θδ
Example: Model Specification
[Figure: path diagram ξ1 → η1 (γ11), ξ1 → η2 (γ21), η1 → η2 (β21); disturbances ζ1, ζ2.]

ƒ η2 is democracy in 1965
ƒ η1 is democracy in 1960
ƒ ξ1 is industrialization in 1960

⎡ η1 ⎤   ⎡  0   0 ⎤ ⎡ η1 ⎤   ⎡ γ11 ⎤        ⎡ ζ1 ⎤
⎢    ⎥ = ⎢         ⎥ ⎢    ⎥ + ⎢     ⎥ [ξ1] + ⎢    ⎥
⎣ η2 ⎦   ⎣ β21  0 ⎦ ⎣ η2 ⎦   ⎣ γ21 ⎦        ⎣ ζ2 ⎦
η = Βη + Γξ + ζ , Cov(ξ ) = Φ, Cov(ζ ) = Ψ
Example: Model Specification
[Figure: path diagram of the full model. y1–y4 load on η1 and y5–y8 on η2, each set with loadings 1, λ2, λ3, λ4 and errors ε1–ε8; x1–x3 load on ξ1 with loadings 1, λ6, λ7 and errors δ1–δ3; structural paths γ11 (ξ1 → η1), γ21 (ξ1 → η2), β21 (η1 → η2); disturbances ζ1, ζ2.]

ƒ Y1, Y5: freedom of press
ƒ Y2, Y6: freedom of group opposition
ƒ Y3, Y7: fairness of elections
ƒ Y4, Y8: effectiveness of legislative body
ƒ x1 is GNP per capita
ƒ x2 is energy consumption per capita
ƒ x3 is % labor force
Bollen, pp. 322-323
Model Estimation
ƒ In multiple regression, estimation is based on individual cases via LS, i.e. minimization of ∑(Y − Ŷ)² over the n cases
ƒ In SEM, estimation is based on covariances
If the model is correct and θ is known:
Σ = Σ(θ)
where Σ is the population covariance matrix of the observed variables and Σ(θ) is the covariance matrix written as a function of θ
Model Estimation
ƒ Regression analysis and confirmatory factor analysis are special cases
ƒ E.g. y = γx + ζ, ζ ⊥ x, E(ζ) = 0
⎡ VAR(y)             ⎤   ⎡ γ²VAR(x) + VAR(ζ)          ⎤
⎢                    ⎥ = ⎢                             ⎥
⎣ COV(x,y)   VAR(x)  ⎦   ⎣ γVAR(x)            VAR(x)  ⎦
COV(x,y) = γVAR(x) ⇒ γ = COV(x,y) / VAR(x)
ƒ Does γ look familiar?
Remember in SLR, β̂ = (x′x)⁻¹x′y
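The covariance-based expression for γ can be checked against the least-squares slope numerically; the data below are simulated for illustration only and are not from the lecture.

```python
import numpy as np

# Check that gamma = Cov(x, y) / Var(x) from the slide coincides with
# the least-squares slope (x'x)^{-1} x'y for centered data.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # true slope 2.0, illustrative only

# Covariance-based estimate (matching ddof so the two agree exactly)
gamma_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# OLS slope on centered data
xc, yc = x - x.mean(), y - y.mean()
gamma_ols = (xc @ yc) / (xc @ xc)

assert np.isclose(gamma_cov, gamma_ols)
```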
Model Estimation
ƒ In reality, neither the population covariances Σ nor the parameters θ are known
ƒ What we have is a sample estimate of Σ: S
ƒ Goal: estimate θ based on S, by choosing the estimates of θ such that Σ̂ is as close to S as possible
ƒ But how close is “close”?
ƒ Define and minimize an objective function, also called a “fitting function”: F(S, Σ̂), e.g. F(S, Σ̂) = ‖S − Σ̂‖
Common Fitting Functions
ƒ Maximum Likelihood (ML)
To minimize
FML = (1/2)tr({[S − Σ(θ)]Σ⁻¹(θ)}²)
or
FML = log|Σ(θ)| + tr(SΣ⁻¹(θ)) − log|S| − (p+q)
ƒ Explicit solutions for θ may not exist
ƒ An iterative numeric procedure is needed
ƒ Asymptotic properties of ML estimators:
™ Consistent, i.e. θ̂ → θ as n → ∞
™ Efficient, i.e. smallest asymptotic variance
™ Asymptotic normality
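A minimal numeric sketch of the second form of FML, with hypothetical 2×2 matrices: when the model-implied covariance equals S exactly, the fit is perfect and FML = 0.

```python
import numpy as np

# F_ML = log|Sigma(theta)| + tr(S Sigma^{-1}(theta)) - log|S| - (p+q)
def f_ml(S, Sigma):
    p = S.shape[0]  # p+q = total number of observed variables
    return (np.log(np.linalg.det(Sigma))
            + np.trace(S @ np.linalg.inv(Sigma))
            - np.log(np.linalg.det(S)) - p)

# Hypothetical sample covariance matrix (illustration only)
S = np.array([[2.0, 0.6],
              [0.6, 1.5]])

assert np.isclose(f_ml(S, S), 0.0)   # Sigma(theta) = S: perfect fit
assert f_ml(S, np.eye(2)) > 0        # a misfitting Sigma gives F_ML > 0
```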
Common Fitting Functions
ƒ Maximum Likelihood (ML)
Advantages
ƒ Scale invariant
™ F(S,Σ(θ)) is scale invariant if F(S,Σ(θ)) = F(DSD, DΣ(θ)D), where D is a diagonal matrix with positive elements
™ E.g. if D consists of the inverses of the standard deviations of the observed variables, DSD becomes a correlation matrix
™ More generally, the value of F is the same for any change of scale (e.g. dollars to cents)
ƒ Scale free
™ Knowing D, we can calculate θ̂* (based on transformed data) from θ̂ (based on non-transformed data) without actually rerunning the model
ƒ Test of overall model fit for an overidentified model, based on the fact that (N−1)FML has a χ² distribution with ½(p+q)(p+q+1) − t degrees of freedom (t = number of free parameters)
Disadvantage
ƒ Assumption of multivariate normality
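The scale-invariance property can be checked numerically; the matrices and the rescaling matrix D below are illustrative, not from the lecture. The fit-function definition is restated so the sketch is self-contained.

```python
import numpy as np

# F_ML = log|Sigma| + tr(S Sigma^{-1}) - log|S| - (p+q)
def f_ml(S, Sigma):
    return (np.log(np.linalg.det(Sigma))
            + np.trace(S @ np.linalg.inv(Sigma))
            - np.log(np.linalg.det(S)) - S.shape[0])

# Hypothetical sample and model-implied covariance matrices
S = np.array([[2.0, 0.6], [0.6, 1.5]])
Sigma = np.array([[1.8, 0.5], [0.5, 1.6]])

# Positive diagonal rescaling, e.g. unit changes such as dollars to cents
D = np.diag([100.0, 0.25])

# F(S, Sigma) = F(DSD, D Sigma D): the fit is unchanged by rescaling
assert np.isclose(f_ml(S, Sigma), f_ml(D @ S @ D, D @ Sigma @ D))
```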
Common Fitting Functions
ƒ Unweighted Least Squares (ULS)
To minimize
FULS = (1/2)tr[(S − Σ(θ))²]
ƒ Analogous to OLS: minimize the sum of squares of each element in the residual matrix (S − Σ(θ))
ƒ Gives greater weight to off-diagonal covariance terms than to variance terms
ƒ Explicit solutions for θ may not exist
ƒ An iterative numeric procedure is needed
ƒ Advantages of ULS estimators:
™ Intuitive
™ Consistent, i.e. θ̂ → θ as n → ∞
™ No distributional assumptions
ƒ Disadvantages
™ Not most efficient
™ Not scale invariant, not scale free
Common Fitting Functions
ƒ Generalized Least Squares (GLS)
To minimize
FGLS = (1/2)tr({[S − Σ(θ)]S⁻¹}²)
ƒ Weights the elements of (S − Σ(θ)) according to their variances and covariances
ƒ FULS is a special case of FGLS with S⁻¹ replaced by I
ƒ Advantages of GLS estimators:
™ Intuitive
™ Consistent, i.e. θ̂ → θ as n → ∞
™ Asymptotic normality (availability of significance tests)
™ Asymptotically efficient
™ Scale invariant and scale free
™ Test of overall model fit for an overidentified model, based on the fact that (N−1)FGLS has a χ² distribution with ½(p+q)(p+q+1) − t degrees of freedom
ƒ Disadvantages
™ Sensitive to “fat” or “thin” tails
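A small sketch of the two fitting functions with hypothetical matrices, confirming that FULS is the special case of FGLS in which the weight S⁻¹ is replaced by I.

```python
import numpy as np

# F_ULS = (1/2) tr[(S - Sigma)^2]
def f_uls(S, Sigma):
    R = S - Sigma
    return 0.5 * np.trace(R @ R)

# F_GLS = (1/2) tr({(S - Sigma) W}^2), with W = S^{-1} by default
def f_gls(S, Sigma, W=None):
    W = np.linalg.inv(S) if W is None else W
    M = (S - Sigma) @ W
    return 0.5 * np.trace(M @ M)

# Hypothetical sample and model-implied covariance matrices
S = np.array([[2.0, 0.6], [0.6, 1.5]])
Sigma = np.array([[1.8, 0.5], [0.5, 1.6]])

assert np.isclose(f_uls(S, Sigma), f_gls(S, Sigma, W=np.eye(2)))  # ULS = GLS with W = I
assert np.isclose(f_uls(S, S), 0.0) and np.isclose(f_gls(S, S), 0.0)  # perfect fit
```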
Simple Case of SEM with Latent Variables: Confirmatory Factor Analysis
Recap of Basic Characteristics of Exploratory Factor Analysis (EFA)
ƒ Most EFA methods extract orthogonal factors, which is “boring” to SEM users
ƒ Distinction between common and unique
variances
ƒ EFA is underidentified (i.e. no unique
solution)
ƒ Remember rotation? Equally good fit with
different rotations!
ƒ All measures are related to each factor
Confirmatory Factor Analysis (CFA)
ƒ Takes factor analysis a step further.
ƒ We can “test” or “confirm” or “implement” a “highly
constrained a priori structure that meets conditions of
model identification”
ƒ But be careful, a model can never be confirmed!!
ƒ CFA model is constructed in advance
ƒ number of latent variables (“factors”) is pre-set by
analyst (not part of the modeling usually)
ƒ Whether each latent variable influences each observed variable is specified
ƒ Measurement errors may correlate
ƒ Difference between CFA and the usual SEM:
ƒ SEM assumes causally interrelated latent variables
ƒ CFA assumes correlated but not causally related latent variables (i.e. all exogenous)
Exploratory Factor Analysis
Two factor model: x = Λξ + δ

⎡ x1 ⎤   ⎡ λ11  λ12 ⎤          ⎡ δ1 ⎤
⎢ x2 ⎥   ⎢ λ21  λ22 ⎥          ⎢ δ2 ⎥
⎢ x3 ⎥ = ⎢ λ31  λ32 ⎥ ⎡ ξ1 ⎤ + ⎢ δ3 ⎥
⎢ x4 ⎥   ⎢ λ41  λ42 ⎥ ⎣ ξ2 ⎦   ⎢ δ4 ⎥
⎢ x5 ⎥   ⎢ λ51  λ52 ⎥          ⎢ δ5 ⎥
⎣ x6 ⎦   ⎣ λ61  λ62 ⎦          ⎣ δ6 ⎦

[Figure: both ξ1 and ξ2 point to every indicator x1–x6; errors δ1–δ6.]
CFA Notation
Two factor model: x = Λξ + δ

⎡ x1 ⎤   ⎡ λ11   0  ⎤          ⎡ δ1 ⎤
⎢ x2 ⎥   ⎢ λ21   0  ⎥          ⎢ δ2 ⎥
⎢ x3 ⎥ = ⎢ λ31   0  ⎥ ⎡ ξ1 ⎤ + ⎢ δ3 ⎥
⎢ x4 ⎥   ⎢  0   λ42 ⎥ ⎣ ξ2 ⎦   ⎢ δ4 ⎥
⎢ x5 ⎥   ⎢  0   λ52 ⎥          ⎢ δ5 ⎥
⎣ x6 ⎦   ⎣  0   λ62 ⎦          ⎣ δ6 ⎦

[Figure: ξ1 points only to x1–x3; ξ2 points only to x4–x6; errors δ1–δ6.]
For the “matrix-challenged”
CFA:                        EFA:
x1 = λ11ξ1 + δ1             x1 = λ11ξ1 + λ12ξ2 + δ1
x2 = λ21ξ1 + δ2             x2 = λ21ξ1 + λ22ξ2 + δ2
x3 = λ31ξ1 + δ3             x3 = λ31ξ1 + λ32ξ2 + δ3
x4 = λ42ξ2 + δ4             x4 = λ41ξ1 + λ42ξ2 + δ4
x5 = λ52ξ2 + δ5             x5 = λ51ξ1 + λ52ξ2 + δ5
x6 = λ62ξ2 + δ6             x6 = λ61ξ1 + λ62ξ2 + δ6
cov(ξ1, ξ2) = ϕ12           cov(ξ1, ξ2) = 0
More Important Notation
ƒ Φ (capital of φ): covariance matrix of factors

Φ = var(ξ) = ⎡ ϕ11  ϕ12 ⎤
             ⎣ ϕ12  ϕ22 ⎦

var(ξ1) = ϕ11, cov(ξ1, ξ2) = ϕ12

ƒ Ψ (capital of ψ): covariance matrix of errors

             ⎡ ψ11                          ⎤
             ⎢ ψ21  ψ22                     ⎥
Ψ = var(δ) = ⎢ ψ31  ψ32  ψ33                ⎥
             ⎢ ψ41  ψ42  ψ43  ψ44           ⎥
             ⎢ ψ51  ψ52  ψ53  ψ54  ψ55      ⎥
             ⎣ ψ61  ψ62  ψ63  ψ64  ψ65  ψ66 ⎦

var(δ1) = ψ11, cov(δ1, δ2) = ψ12
NOTE: usually, ψij = 0 if i ≠ j
Model Estimation
ƒ In our previous CFA example, we have six equations and many more unknowns
ƒ In this form, there is not enough information to uniquely solve for the λ's and the factor correlation
ƒ What if we multiply both sides of x = Λξ + δ by x′?

xx′ = (Λξ + δ)(Λξ + δ)′
    = (Λξ)(Λξ)′ + (Λξ)δ′ + δ(Λξ)′ + δδ′

Because ξ and δ are independent, the cross terms vanish in expectation:

E(xx′) = Λ E(ξξ′) Λ′ + E(δδ′)

Σ = ΛΦΛ′ + Ψ = Σ(θ)

where Φ is the covariance matrix of the factors ξ, and Ψ is the error covariance matrix
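A numeric sketch of the implied covariance Σ(θ) = ΛΦΛ′ + Ψ for the two-factor CFA, with hypothetical loadings and error variances chosen so that each implied variance equals 1.

```python
import numpy as np

# Hypothetical CFA parameters: simple structure with factors
# standardized to unit variance (illustration only).
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.5],
                   [0.0, 0.7]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                          # factor correlation 0.3
Psi = np.diag([0.36, 0.51, 0.64, 0.19, 0.75, 0.51])   # uncorrelated errors

# Implied covariance matrix of the six indicators
Sigma = Lambda @ Phi @ Lambda.T + Psi

# Diagonal: lambda_i^2 * phi_jj + psi_ii (chosen here to equal 1)
assert np.allclose(np.diag(Sigma), 1.0)
# Within-factor covariance: lambda_1 * lambda_2 * phi_11
assert np.isclose(Sigma[0, 1], 0.8 * 0.7)
# Cross-factor covariance: lambda_1 * lambda_4 * phi_12
assert np.isclose(Sigma[0, 3], 0.8 * 0.9 * 0.3)
```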
Model Constraints
ƒ Hallmark of CFA
ƒ Purposes for setting constraints:
ƒ Test a priori theory
ƒ Ensure identifiability
ƒ Test reliability of measures
Model Constraints: Identifiability
ƒ Latent variables (LVs) need some
constraints
ƒ Because factors are unmeasured, their
variances can take different values
ƒ Recall EFA where we constrained factors:
F ~ N(0,1)
ƒ Otherwise, model is not identifiable.
ƒ Here we have two options:
ƒ Fix variance of latent variables (LV) to be 1 (or
another constant)
ƒ Fix one path between LV and indicator
Necessary Constraints
Fix variances:
[Figure: var(ξ1) and var(ξ2) fixed at 1; all six loadings free; indicators x1–x6 with errors δ1–δ6.]

Fix path:
[Figure: loading of x1 on ξ1 and of x4 on ξ2 fixed at 1; factor variances free; indicators x1–x6 with errors δ1–δ6.]
Model Parametrization
Fix variances:           Fix path:
x1 = λ11ξ1 + δ1          x1 = ξ1 + δ1
x2 = λ21ξ1 + δ2          x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3          x3 = λ31ξ1 + δ3
x4 = λ42ξ2 + δ4          x4 = ξ2 + δ4
x5 = λ52ξ2 + δ5          x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6          x6 = λ62ξ2 + δ6
cov(ξ1, ξ2) = ϕ12        cov(ξ1, ξ2) = ϕ12
var(ξ1) = 1              var(ξ1) = ϕ11
var(ξ2) = 1              var(ξ2) = ϕ22
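A small numeric check, with hypothetical values, that the two identification options are equivalent reparameterizations: fixing var(ξ) = 1 and fixing the first loading at 1 imply the same covariance matrix for x (shown here for a single factor with three indicators).

```python
import numpy as np

# Error variances, the same under both parameterizations (illustrative)
Psi = np.diag([0.3, 0.4, 0.5])

# Option 1: fix var(xi1) = 1, all three loadings free
lam_v = np.array([[0.8], [0.6], [0.7]])
Sigma_fix_var = lam_v @ np.array([[1.0]]) @ lam_v.T + Psi

# Option 2: fix the first loading at 1; the factor variance absorbs
# the scale, phi11 = 0.8^2
lam_p = lam_v / lam_v[0, 0]
phi = np.array([[0.8 ** 2]])
Sigma_fix_path = lam_p @ phi @ lam_p.T + Psi

# Both constraints imply the identical covariance matrix for x
assert np.allclose(Sigma_fix_var, Sigma_fix_path)
```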
Model Constraints: Reliability Assessment
ƒ Parallel measures
ƒ Tx1 = Tx2 [= E(x)] (true scores are equal)
ƒ T affects x1 and x2 equally
ƒ Cov(δ1, δ2) = 0 (errors not correlated)
ƒ Var(δ1) = Var(δ2) (equal error variances)

[Figure: a single true score T with the same loading λ on x1, x2, x3; errors δ1–δ3 with equal variances.]
Model Constraints: Reliability Assessment
ƒ Tau equivalent measures
ƒ Tx1 = Tx2
ƒ T affects x1 and x2 equally
ƒ Var(δ1) ≠ Var(δ2)
ƒ Note: for standardized measures, it makes no sense to constrain the loadings without also constraining the residuals, since Var(x) = 1.0

[Figure: a single true score T with the same loading λ on x1, x2, x3; errors δ1–δ3 with unequal variances.]