Introduction to Structural Equation Modeling
1. Uses of SEM
a. Exploratory factor analysis
b. Path analysis/regression
c. Confirmatory factor analysis
d. Causal modeling
2. Background topics
a. Path analysis
b. Confirmatory Factor analysis
c. LISREL matrix language vs. SIMPLIS
Comparison of Statistical Approaches
 Path Analysis
o One or more categorical or continuous IVs,
multiple continuous DVs
 Structural Equation Modeling (SEM)
o Is like a path analysis, except that measurement
error (as estimated by internal consistency
reliability) is removed.
o One or more categorical or continuous IVs,
multiple continuous DVs; but all corrected for
measurement error
 Factor Analytic Methods (review)
o Exploratory (EFA)
 Principal Components (PCA)
 Assumes communality is equal to 1.00
(perfect reliability)
 Multiple continuous items, and want to
estimate total amount of explained
variance among items
 Factor Analysis (FA)
 Assumes less than perfect reliability
(communality is usually less than 1.00)
 Multiple continuous items, and want to
estimate the total amount of common
variance
o Confirmatory (CFA)
 Also known as restricted factor analysis
 Multiple continuous items, but rather
than letting the data determine factor
structure, it is specified a priori.
Path Analysis: Overview
 Path analysis is an extension of the regression model,
used to test the fit of the correlation matrix against
two or more causal models which are being compared
by the researcher.
 The model is usually depicted in a circle-and-arrow
figure in which single arrows indicate causation.
 A regression is done for each variable in the model as
a dependent on others which the model indicates are
causes.
 The regression weights predicted by the model are
compared with the observed correlation matrix for the
variables, and a goodness-of-fit statistic is calculated.
 The best-fitting of two or more models is selected by
the researcher as the best model for advancement of
theory.
 Path analysis requires the usual assumptions of
regression. It is particularly sensitive to model
specification because failure to include relevant
causal variables or inclusion of extraneous variables
often substantially affects the path coefficients, which
are used to assess the relative importance of various
direct and indirect causal paths to the dependent
variable.
 Such interpretations should be undertaken in the
context of comparing alternative models, after
assessing their goodness of fit.
 When the variables in the model are latent variables
measured by multiple observed indicators, path
analysis is termed structural equation modeling,
treated separately.
 We follow the conventional terminology by which
path analysis refers to single-indicator variables.
Terminology of Structural Components
 Constructs (latent variables)
Latent variables: Exogenous vs. Endogenous
Exogenous variables in a path model are those with
no explicit causes (no arrows going to them, other
than the measurement error term). If exogenous
variables are correlated, this is indicated by a
double-headed arrow connecting them.
Endogenous variables, then, are those which do
have incoming arrows. Endogenous variables
include intervening causal variables and dependents.
Intervening endogenous variables have both
incoming and outgoing causal arrows in the path
diagram. The dependent variable(s) have only
incoming arrows.
 Measures (indicators of latent variables)
 Errors:
o associated with measures (measurement error)
 Note: measurement error is not modeled in path
analysis, but it can be modeled in SEM.
o associated with latent variables (structural error)
Path coefficient/path weight. A path coefficient is a
standardized regression coefficient (beta) showing the
direct effect of an independent variable on a dependent
variable in the path model.
Effect decomposition. Path coefficients may be used to
decompose correlations in the model into direct and
indirect effects, corresponding, of course, to direct and
indirect paths reflected in the arrows in the model. This
is based on the rule that, in a linear system, a correlation between
two variables can be decomposed into the sum of the direct effect
and the indirect effects transmitted along the paths connecting them.
Significance and Goodness of Fit in Path Models
o To test individual path coefficients one uses the
standard t or F test from regression output.
o To test the model with all its paths one uses a
goodness of fit test from a structural equation
modeling program. If a model is correctly
specified, including all relevant and excluding all
irrelevant variables, with arrows correctly
indicated, then the sum of path coefficients
(standardized) will equal the correlation
coefficient.
This means one can compare the path-estimated
correlation matrix with the observed correlation
matrix to assess the goodness-of-fit of path
models. As a practical matter, goodness-of-fit is
calculated by entering the model and its data into
a structural equation modeling program (LISREL),
which computes a variety of alternative goodness-of-fit coefficients.
Path coefficients and Effect decomposition
A path coefficient is equal to a regression coefficient (a beta
weight in regression).
[Path diagram: four variables, with paths P21 (1→2), P31 (1→3),
P32 (2→3), P41 (1→4), P42 (2→4), and P43 (3→4)]
What are exogenous and endogenous variables?
Variable 1 is exogenous; Variables 2, 3, and 4 are endogenous.
To get the path coefficients, you can run a separate regression for
each endogenous variable, regressing it on all of its presumed
causes. To test individual path coefficients one uses the standard t
or F test from the regression output.
P21 : from the regression of variable 2 on variable 1.
P31, P32 : from the regression of variable 3 on variables 1 and 2.
P41, P42, P43 : from the regression of variable 4 on variables 1, 2, and 3.
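A minimal sketch (Python/NumPy, with made-up data standing in for real
scores) of estimating these path coefficients as standardized regression
weights, one regression per endogenous variable:

```python
import numpy as np

# hypothetical data: columns 0-3 hold variables 1-4
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))

# standardize so the regression weights are path coefficients (betas)
z = (data - data.mean(axis=0)) / data.std(axis=0)

def betas(y_col, x_cols):
    """Standardized regression weights of variable y_col on x_cols."""
    X = z[:, x_cols]
    b, *_ = np.linalg.lstsq(X, z[:, y_col], rcond=None)
    return b

p21 = betas(1, [0])                   # regress variable 2 on 1
p31, p32 = betas(2, [0, 1])           # regress variable 3 on 1 and 2
p41, p42, p43 = betas(3, [0, 1, 2])   # regress variable 4 on 1, 2, and 3
```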
How to get reproduced correlations (r)?
Use the path values and plug them into the equations for the
correlations. Reproduce as many of the correlations as you have
paths to represent.
[Same four-variable path diagram as above]
r12 = P21
r13 = P31 + P32*r12
r23 = P32 + P31*r12
r14 = P41 + P43*r13 + P42*r12
r24 = P42 + P43*r23 + P41*r12
r34 = P43 + P41*r13 + P42*r23
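Continuing the sketch above (same hypothetical coefficients), the
reproduced correlations follow directly from these equations:

```python
# reproduce correlations from the path coefficients (names from the sketch above)
r12 = p21
r13 = p31 + p32 * r12
r23 = p32 + p31 * r12
r14 = p41 + p43 * r13 + p42 * r12
r24 = p42 + p43 * r23 + p41 * r12
r34 = p43 + p41 * r13 + p42 * r23

reproduced = {"r12": r12, "r13": r13, "r23": r23,
              "r14": r14, "r24": r24, "r34": r34}
observed = np.corrcoef(z, rowvar=False)  # compare with the observed correlations
```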
How can one test for model fit?
The model fits the data if the path coefficients can be combined
to reproduce the observed correlations among the
variables.
Model Identification
[Same four-variable path diagram as above]
6 unknowns: path coefficients
6 knowns: the correlations between the variables.
Number of knowns = (k² – k)/2  (k = number of variables)
Just-identified model
 If the number of unknowns is equal to (k² – k)/2, you
have a just-identified model.
With 4 variables, (4² – 4)/2 = 6
 For a just identified model, the path coefficients that
we calculate can be used to reproduce exactly the
observed correlation matrix. The problem is that you
always reproduce the exact observed correlations no
matter how well the model fits the data.
Over-identified model
 If we did not specify the link between Variables 1 and
2, we would have 5 unknowns and 6 knowns.
 When the number of knowns exceeds the number of
unknowns, the model is over-identified; here the
difference yields one overidentifying restriction.
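A small sketch (Python) of the counting rule described above; the
under-identified case is included for completeness, though it is not
discussed in these notes:

```python
def identification_status(k, n_paths):
    """Compare free path coefficients (unknowns) with correlations (knowns)."""
    knowns = (k * k - k) // 2          # (k^2 - k)/2 correlations among k variables
    if n_paths == knowns:
        return "just identified"
    if n_paths < knowns:
        return f"over-identified ({knowns - n_paths} overidentifying restriction(s))"
    return "under-identified"

print(identification_status(4, 6))  # just identified
print(identification_status(4, 5))  # over-identified (1 restriction)
```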
Example
[Path diagram of the example model relating SES (1), IQ (2), nAch (3),
and GPA (4), with coefficients .30, .41, .42, and .50]
4 unknowns: path coefficients
6 knowns: the correlations between the variables.
This is an over-identified model.
           SES (1)   IQ (2)   nAch (3)   GPA (4)
SES (1)     1.00      .30       .41        .32
IQ (2)       .30     1.00       .12        .55
nAch (3)     .41      .16      1.00        .57
GPA (4)      .33      .57       .50       1.00
Note. Correlations below the diagonal indicate reproduced
correlations, and those above the diagonal indicate observed
correlations.
14
Model Fit
 Tests the similarity between the reproduced
correlations and the observed correlations.
 Compare the over-identified model to the just-identified
model.
 The larger the discrepancy between the two sets of
correlations (or the two models), the larger the chi-square
value.
 You want a nonsignificant chi-square value, showing
that your reproduced correlations are the same as (or
similar to) the observed correlations.
 With a just-identified model, the chi-square value = 0.
 To test the model with all its paths one uses a
goodness of fit test from a structural equation
modeling program. If a model is correctly specified,
including all relevant and excluding all irrelevant
variables, with arrows correctly indicated, then the
sum of path coefficients (standardized) will equal the
correlation coefficient.
 As a practical matter, goodness-of-fit is calculated by
entering the model and its data into a structural
equation modeling program (LISREL), which
computes a variety of alternative goodness-of-fit
coefficients.
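As a rough illustration, here is a sketch (Python/NumPy) that compares the
reproduced and observed correlations from the SES/IQ/nAch/GPA example
above; the chi-square test itself would come from an SEM program such as
LISREL:

```python
import numpy as np

labels = ["SES", "IQ", "nAch", "GPA"]

# observed correlations (above the diagonal in the table)
observed = np.array([
    [1.00, 0.30, 0.41, 0.32],
    [0.30, 1.00, 0.12, 0.55],
    [0.41, 0.12, 1.00, 0.57],
    [0.32, 0.55, 0.57, 1.00],
])

# correlations reproduced from the path coefficients (below the diagonal)
reproduced = np.array([
    [1.00, 0.30, 0.41, 0.33],
    [0.30, 1.00, 0.16, 0.57],
    [0.41, 0.16, 1.00, 0.50],
    [0.33, 0.57, 0.50, 1.00],
])

# residuals: small values mean the over-identified model fits well
residuals = observed - reproduced
print(np.round(residuals, 2))
```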
Model fit indices
The following are typical, though not universal, of what researchers
report to indicate the degree of fit in path analysis, CFA, and SEM:
 Overall Chi-Square test: non-significant p-values indicate a
good fit.
 Goodness-of-fit index (GFI, also known as gamma-hat, γ̂) and
Adjusted goodness-of-fit index (AGFI): the closer to 1.00, the
better.
0.90-0.95 acceptable fit,
0.95-0.99 close fit and
1.00 exact fit
 Comparative Fit Index (CFI): the closer to 1.00, the better.
< 0.85 indicate unacceptable fit,
0.85-0.89 mediocre fit, (model could be improved
substantially)
0.90-0.95 acceptable fit,
0.95-0.99 close fit and
1.00 exact fit
 RMSEA (Root Mean Square Error of Approx): the closer to 0,
the better.
> 0.10 indicate unacceptable fit,
0.10-0.08 mediocre fit,
0.08-0.06 acceptable fit,
0.06-0.01 close fit,
0.00 exact fit
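For reference, a minimal sketch (Python; these are standard textbook
formulas, not taken from these notes) of how two of the indices are
computed from the chi-square values an SEM program reports:

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation."""
    return math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative Fit Index, relative to the independence (null) model."""
    d_model = max(chi2 - df, 0)
    d_null = max(chi2_null - df_null, d_model)
    return 1 - d_model / d_null if d_null > 0 else 1.0

# hypothetical values for illustration only
print(round(rmsea(chi2=12.3, df=8, n=200), 3))
print(round(cfi(chi2=12.3, df=8, chi2_null=480.0, df_null=15), 3))
```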
Confirmatory Factor Analysis (CFA)
 Is more of a theoretical, hypothesis testing way of
doing factor analysis.
 Instead of running an EFA and trying to interpret it,
we set up a particular structure and then see if it fits
the data well.
o Factor loadings, error variances, etc., occur as
before.
o We force simple structure by not allowing cross-loadings.
 Think of the EFA-CFA distinction like this:
o CFA is like hierarchical regression
 Is a model fit and comparison procedure
o EFA is like stepwise regression
 Is based directly on classical test theory.
o But the assumptions of classical test theory can
be relaxed (e.g., correlated error variances, etc).
 CFA can be run through software programs:
o SAS Proc Calis
o LISREL
o AMOS
o EQS
o M-Plus
Here’s a graphical example. Notice that simple
structure is present: no cross-loadings.
Learn your Greek!
See handout.
Let’s practice.
Model Specification and Identification
 It is common to display confirmatory factor models as
path diagrams in which squares represent observed
variables and circles represent the latent concepts.
 Single-headed arrows are used to imply a direction of
assumed causal influence, and double-headed arrows
are used to represent covariance between two latent
variables.
Path Diagram of a Confirmatory Factor Model
 In factor analysis the researcher almost always
assumes that the latent variables “cause” the observed
variables, as shown by the single-headed arrows
pointing away from the circles and towards the
manifest variables.
 The two ξ (ksi) latent variables represent common
factors, with paths pointing to more than one
observed variable.
 The circles labeled δ (delta) represent unique factors
because they affect only a single observed variable.
The δi incorporate all the variance in each xi not
captured by the common factors, such as
measurement error.
 In this model the two ξi are expected to covary, as
represented by the two-headed arrow.
 Additionally error in the measurement of x3 is
expected to correlate to some extent with
measurement error for x6. This may occur, for
example, with panel data in which ξ1 and ξ2 represent
the same concept measured at different points in time;
if there is measurement error at t1 it is likely that there
will be measurement error at t2.
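A minimal sketch (Python/NumPy, with hypothetical loading and covariance
values) of the covariance structure a confirmatory factor model like this
implies, Σ = ΛΦΛ′ + Θδ, including a correlated unique factor for x3 and x6:

```python
import numpy as np

# Lambda: loadings of the six observed x's on the two ksi factors (hypothetical)
Lam = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])

# Phi: covariance between the two ksi factors (the two-headed arrow)
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])

# Theta-delta: unique-factor (co)variances; the off-diagonal term represents
# the correlated measurement error between x3 and x6
Theta = np.diag([0.36, 0.51, 0.64, 0.36, 0.51, 0.64])
Theta[2, 5] = Theta[5, 2] = 0.10

# model-implied covariance matrix of the observed variables
Sigma = Lam @ Phi @ Lam.T + Theta
print(np.round(Sigma, 2))
```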
Your CFA project will be about MTMM (Multitrait-Multimethod).
Relax…
Structural Equation Modeling (SEM)
 Combines CFA with path analysis
o When we ran path analysis, we assumed the
manifest variables were measured without error.
This is almost always false.
o SEM allows for the modeling of relationships
among variables (like path analysis), but with
the difference that the variables are latent
constructs disattenuated from measurement
error.
o Think of it like correcting all of the covariances
for measurement error, and then running path
analysis (see the short sketch at the end of this
handout).
o It explicitly integrates measurement with
statistics…extremely powerful.
o Is perhaps the most flexible modeling approach
available.
 Can model nested designs, growth modeling,
multilevel models, etc.
 Here is a graphical example.
[Structural equation model diagram: latent variables Prehire Process
Fairness, Posthire Process Fairness, Outcome Fairness, Prehire Applicant
Intentions, and Posthire Applicant Intentions, measured by indicators
X1–X6 and Y1–Y9; the figure shows the standardized loadings and the
structural path coefficients, with standard errors in parentheses]
Apply Greek here. Why? In LISREL outputs, you will see lots of Greek!
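Related to the earlier point that SEM can be thought of as correcting the
covariances for measurement error and then running path analysis, here is a
minimal sketch (Python, hypothetical reliability values) of the classical
correction for attenuation that SEM performs implicitly through its latent
variables:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correlation between constructs, corrected for measurement error."""
    return r_xy / math.sqrt(rel_x * rel_y)

# hypothetical observed correlation and internal-consistency reliabilities
print(round(disattenuate(r_xy=0.40, rel_x=0.80, rel_y=0.70), 2))  # about 0.53
```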