ICPSRWeek4Class2 - Investigadores CIDE

ICPSR General Structural
Equations
Week 4 No. 2
1
Review of solutions for non-normal and
missing data (see handout)
Issue #1: My data are not normally distributed
Each variable has a reasonable number of discrete values (10 or more for
most, with perhaps the odd variable with 5-6* but definitely no variables
with fewer than 5 discrete values). [*variables with smaller number of
categories should not be heavily skewed]
Solution #1: AMOS, LISREL and SAS-CALIS
Transform the data to reduce the level of kurtosis within the Stat package (SAS or
SPSS).
COMPUTE LVAR1 = LN(VAR1).
COMPUTE LVAR1 = LN(VAR1 + .1). [if there are 0 values]
COMPUTE VAR1_2 = VAR1**2.
COMPUTE VAR1_INV = 1 / VAR1.
See John Fox’s Regression text or other regression texts for
more details. Usually, dealing with skewness also deals
with kurtosis.
Checks:
DESCRIPTIVES VARIABLES=VAR1
  /STATISTICS=SKEWNESS KURTOSIS.
With transformed data, regular ML covariance analysis can be used.
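The transform-then-check step above can be sketched outside the stat package as well. This is only an illustration in Python: the data values are made up, the function names are hypothetical, and the moment-based formulas used here are not guaranteed to match the small-sample corrections that SPSS or SAS report.

```python
import math

def skewness(xs):
    """Moment-based skewness: third central moment over m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: 0 for a normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical right-skewed variable containing a zero (illustration only).
var1 = [0, 1, 1, 2, 2, 3, 4, 6, 9, 15, 28, 60]

# Same idea as COMPUTE LVAR1 = LN(VAR1 + .1).
lvar1 = [math.log(x + 0.1) for x in var1]

print("raw:        ", skewness(var1), excess_kurtosis(var1))
print("transformed:", skewness(lvar1), excess_kurtosis(lvar1))
```

If the transformed values are still well outside the +1 to -1 range, a different transformation (or one of the other solutions) may be needed.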
2
Non-normal data → ADF estimation
Solution #2: AMOS, LISREL, SAS-CALIS
Use an ADF (asymptotically distribution-free) estimator.
AMOS: An option under Analysis Options.
LISREL: Input of an asymptotic covariance matrix (4th moment matrix)
is required. To generate such a matrix in PRELIS,
check off the asymptotic covariances check box and insert a file name.
In LISREL, you will need to add a line to read in this matrix:
CM FI=
AC FI=
And you will need to specify the ADF fit function:
OU ME=WL
SAS:
PROC CALIS METHOD=WLS
Important note on the ADF fit function:
Large sample sizes are required. For the acov matrix to be non-singular, N must exceed
p + (1/2)(p)(p+1)
20 variables: N>230
30 variables: N>495
Working anywhere near these minima is not recommended.
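The sample-size bound above is easy to tabulate for other numbers of variables. A minimal Python sketch (the function name adf_min_n is hypothetical, chosen for this illustration):

```python
def adf_min_n(p: int) -> int:
    """Bound p + p(p+1)/2 that N must exceed for p observed variables."""
    return p + p * (p + 1) // 2

# Tabulate the bound for a few model sizes.
for p in (10, 20, 30, 40):
    print(f"{p} variables: N > {adf_min_n(p)}")
```

These are bare minima for a non-singular matrix, not recommended sample sizes.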
3
Solution #3: LISREL only. Scaled test statistics.
Use a scaled or adjusted chi-square and
standard error calculation
(e.g., Satorra-Bentler). Input of an asymptotic
covariance matrix is required, so it is necessary to
specify an AC FI= line which points to the asymptotic
covariance matrix, but specify ME=ML and not
ME=WL in LISREL. Definitely better than
ADF for small to moderate sized samples.
4
Missing Data
Issue #2: I have missing cases. My data are fairly normally
distributed (or I have transformed them to near normality –
Kurtosis values in the +1 to -1 range or fairly close to this).
Solution #1: Use EM algorithm to construct imputed covariance
matrix [assumes normality]
LISREL: This is an option under PRELIS.
Small limitation: imputed data treated as “real” data by
LISREL (affects N, significance tests)
AMOS: If you have the SPSS Missing Data module, you may
be able to generate an imputed covariance matrix. If you do
not, a “last ditch” approach would be to “flip” the dataset
into SAS (if you have it), use the SAS MI procedure, then
“flip” the covariance matrix file back into SPSS. See
Appendix in handout.
5
Missing Data
Issue #2: I have missing cases. My data are fairly
normally distributed (or I have transformed them
to near normality – Kurtosis values in the +1 to -1 range or fairly close to this).
Solution #2: Use a multiple-group model to explicitly model
missing data.
AMOS, LISREL (SAS CALIS will not estimate multiple-group models)
This works if the number of missing data patterns is
fairly small (say <3-5) or if cleaning up problems with a
small number of missing data patterns deals with most
of the overall problem
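Before choosing the multiple-group approach, it helps to count how many distinct missing data patterns the dataset contains. A small Python sketch under stated assumptions: the rows are made-up data, None marks a missing value, and missing_patterns is a hypothetical helper name.

```python
from collections import Counter

def missing_patterns(rows):
    """Count cases per missingness pattern (True = value is missing)."""
    return Counter(tuple(v is None for v in row) for row in rows)

# Hypothetical data: three variables per case, None = missing.
rows = [
    (1.0, 2.0, 3.0),
    (1.5, None, 2.5),
    (0.5, 1.0, 2.0),
    (2.0, None, 1.0),
    (None, 1.0, 1.0),
]

patterns = missing_patterns(rows)
for pattern, n_cases in patterns.most_common():
    print(pattern, n_cases)
```

Each distinct pattern would become one group in the multiple-group model, so a long list here suggests a different solution.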
6
Missing Data
Issue #2: I have missing cases. My data are fairly normally
distributed (or I have transformed them to near normality –
Kurtosis values in the +1 to -1 range or fairly close to this).
Solution #3: Use nearest-neighbor imputation
LISREL only. Limitation: for data with small number of values
for each variable, “ties” will be generated. Even with a generous
criterion, imputation could easily fail for ½ of the cases.
Small limitation: imputed data treated as “real” data by
LISREL (affects N, significance tests)
If working with Stata files, there is a user-written routine called
hotdeck (see Stata Technical Bulletin #51 and #54). It must be
installed.
7
Missing Data
Solution #3: Use nearest-neighbor imputation
If working with Stata files, there is a user-written routine called hotdeck
(see Stata Technical Bulletin #51 and #54). It must be installed. This is not
the same as the PRELIS nearest-neighbor procedure, but uses some
similar principles.
With AMOS, must use Stata or PRELIS. Stata: use Stat/Transfer or
DBMS/Copy to convert the file to an AMOS-readable SPSS .sav file.
Important note, from hotdeck documentation:
If a dataset contains many variables with missing values then it is
possible that many of the rows of data will contain at least one
missing value. The hotdeck procedure will not work very well in such
circumstances. There are more elaborate methods that only replace
missing values, rather than the whole row, for imputed values.
PRELIS: More complicated process to move data into SPSS. (see point
#4 in handout “PRELISQuirks.doc”).
8
Missing Data
Solution #4: Use FIML estimation [assumes normality]
AMOS:
Check off “Estimate means and intercepts”
under Analysis Options and then input dataset with
missing values. Amos will not provide
modification indices with its version of FIML
estimation (some other form of estimation needed
for model-fitting)
LISREL
Must input raw data into LISREL. Declare missing
values in PRELIS (already done if SPSS file read
into PRELIS), save the PRELIS .psf file and then
read it into LISREL:
Instead of CM FI= or SY FI= :
RA FI=C:\TEMP\MYDATA.PSF
Will also need a DA statement:
9
Missing Data
Issue #3: My data need to be weighted
Note: sophisticated adjustment of standard errors, test statistics (see STATA
documentation) not available. It is possible to construct some stratified sample
problems as multiple group analyses.
Solution #1: Use weighting in generating a covariance
matrix to be passed to the SEM program
PRELIS:
Under Transformation select Weight Variable
before generating the covariance matrix.
*It is not clear if LISREL can handle
weighted data in conjunction with FIML
estimation. Some other missing data
technique may be required.
10
Missing Data
Data weight cases menu
Note: it is not clear if weight
variable needs to be rescaled
to mean=1.0 (probably a
good idea)
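Why rescaling the weight variable can matter is sketched below in Python. The assumption, labeled here because the slides do not confirm it, is that the software treats weights like case frequencies and divides by (sum of weights - 1); under that assumption the covariance depends on the scale of the weights. The data and function names are hypothetical.

```python
def rescale_weights(weights):
    """Rescale weights so their mean is 1.0 (they then sum to N)."""
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]

def weighted_cov(x, y, weights):
    """Weighted covariance with denominator sum(weights) - 1.

    Assumption: frequency-style weighting; other programs may differ.
    """
    sw = sum(weights)
    mx = sum(w * a for w, a in zip(weights, x)) / sw
    my = sum(w * b for w, b in zip(weights, y)) / sw
    sxy = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    return sxy / (sw - 1)

# Hypothetical data and raw weights (illustration only).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
raw_w = [10.0, 20.0, 30.0, 40.0]
w = rescale_weights(raw_w)

print("sum of rescaled weights:", sum(w))
print("covariance, raw weights:     ", weighted_cov(x, y, raw_w))
print("covariance, rescaled weights:", weighted_cov(x, y, w))
```

Unrescaled weights also inflate the apparent N, which affects significance tests downstream.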
11
Missing Data
Issue #3: My data need to be weighted
Solution #1: Use weighting in generating a
covariance matrix to be passed to the SEM program
AMOS: AMOS will not accept a weighted SPSS dataset. In fact,
if you try to get AMOS to work with a dataset where a
WEIGHT command has been issued, it may generate an
error message (to unweight the data, simply use the commands
COMPUTE WTVAR=1.0. followed by WEIGHT BY WTVAR.). But it
should be possible to construct a covariance matrix within SPSS
(using weighting) and then pass the “covariance matrix system file”
to AMOS.
In SPSS:
WEIGHT BY wgtvar.
CORRELATIONS VARIABLES=[list of variables]
  /MISSING=LISTWISE
  /MATRIX=OUT(*).
MCONVERT MATRIX=IN(*) /REPLACE.
SAVE OUTFILE='c:\temp\covs1.sav'.
12
Coarsely categorized data
Issue #4: My data are at best ordinal (3-5 discrete values per
indicator)
Solution #1: Use CVM techniques for ordinal data.
PRELIS only:
By default, variables with fewer than 15 discrete values
are treated as “ordinal” and matrices are not simple
covariance matrices. Use the Data → Define
Variables menus to alter any defaults.
Usually, you will want to generate an
asymptotic covariance matrix too
If there are also missing data, strictly speaking, the use of FIML or EM imputation is
not correct. Nearest neighbor approaches (issue #2, solution #3 above) are
acceptable.
13
Coarsely categorized data
Issue #4: My data are at best ordinal (3-5 discrete values per indicator)
Solution #2:
Resort to “item parcels”
(Best check these variables, with crosstabulations, first)
Add scores of 2 or more variables you believe to be
parallel indicators to form single indicators.
Missing data approaches for parcels can be tricky. Consider
trying to create parcels with very similar patterns of missing-ness
(same respondents missing, same respondents non-missing across
both) and then give the variable a missing value when either of the
variables is missing.
Once variables have a sufficient number of discrete values with parceling, if the
distributions are not normal, refer to issue #1 for
solutions.
If you parcel variables, read the “pro and con” literature
(see course outline).
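The parceling rule described above (add the scores, and set the parcel to missing when either indicator is missing) can be sketched as follows. This is an illustrative Python fragment with made-up item scores; make_parcel is a hypothetical name and None stands for a missing value.

```python
def make_parcel(item_a, item_b):
    """Sum two parallel indicators; the parcel is missing if either item is."""
    return [
        None if a is None or b is None else a + b
        for a, b in zip(item_a, item_b)
    ]

# Hypothetical 5-category items for five respondents (None = missing).
item1 = [1, 3, None, 4, 2]
item2 = [2, 3, 5, None, 1]

parcel = make_parcel(item1, item2)
print(parcel)
```

When the two items have very different missingness patterns, as in this toy data, the parcel loses more cases than either item alone, which is why similar patterns are preferred.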
14
Ordinal Data models
CVM approaches in PRELIS/LISREL.
Example file: Week4Examples\OrdinalData2
See folder for listing of programs, output listings
and a codebook for variables used.
Program LisrelU1.ls8 is simple model based on
PM matrix.
15
Extensions of the ordinal
variable model

Basic form:
Threshold parameters, representing mapping of
z* (latent variable, continuous) onto z (coarsely
categorized variable), where z has m categories.
• These thresholds will be familiar to anyone used
to working with logistic regression models (or
probit models):
Univariate case:
ln (cumulative odds) = τ(k)
Tau coefficient = ln ( kth category or lower / higher categories)
16
Extensions of the ordinal variable model
Univariate case:
ln (cumulative odds) = τ(k)
Tau coefficient = ln ( kth category or lower / higher categories)
Example:
20 20 30 40 50 distribution of cases
Tau1 = ln (20 / (20+30+40+50))
Tau2 = ln (40 / (30+40+50))
Tau3 = ln (70 / (40+50))
Tau4 = ln (110 / 50)
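The four thresholds in this example can be reproduced with a short Python sketch. The function name is hypothetical; the calculation is just the log of cumulative odds at each cut point, as in the slide.

```python
import math

def cumulative_logit_thresholds(counts):
    """ln(cases in category k or lower / cases in higher categories)."""
    total = sum(counts)
    running = 0
    taus = []
    for count in counts[:-1]:  # m categories give m - 1 thresholds
        running += count
        taus.append(math.log(running / (total - running)))
    return taus

counts = [20, 20, 30, 40, 50]  # distribution of cases from the example
taus = cumulative_logit_thresholds(counts)
for k, tau in enumerate(taus, start=1):
    print(f"Tau{k} = {tau:.4f}")
```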
17
Polychoric correlations
Polychoric correlations:
- Estimate thresholds from univariate
distributions
- Then, minimize a fit function involving
reproduced probabilities based on a
parameter vector that includes thresholds + ρ
(est. correlation)
18
Categorical Variable Model
(ordinal data)
For each of the variables, the mean is fixed to 0
and the standard deviation fixed to 1.0
(otherwise, under-identified)
                               Mean   Std. dev.   Thresholds
Parameterization               0.0    1.0         τ1 τ2 τ3 τ4
Alternative parameterization   μ1     σ1          0  1  τ3* τ4*
19
Fixing thresholds
“Equal Thresholds”
• Same threshold for 2 variables measured
over time (longitudinal data)
• Same threshold for 1 variable measured in
two different groups
See Week4Examples/OrdinalData2 files
20
Longitudinal data
I. Modeling of latent variable mean
differences over time
II. More complicated tests (linear growth,
quadratic growth, etc.)
See slides from previous class
21
Applications to longitudinal data
I. Modeling of latent variable mean
differences over time
II. More complicated tests (linear growth,
quadratic growth, etc.)
22
Applications to longitudinal data
Basic model for assessing latent variable
mean change:
Can run this model on
X or Y side (LISREL)
Equations:
X1 = a1 + 1.0 L1 + e1
X2 = a2 + b1 L1 + e2
X3 = a3 + b2 L1 + e3
X4 = a4 + 1.0 L2 + e4
X5 = a5 + b3 L2 + e5
X6 = a6 + b4 L2 + e6
Constraints:
b1=b3 b2=b4 LX=IN
a1=a4 a2=a5 a3=a6 TX=IN
ka1 = 0
ka2 = (to be estimated)
23
Applications to longitudinal data
Basic model for assessing latent variable mean
change:
Constraints:
b1=b3 b2=b4 LX=IN
a1=a4 a2=a5 a3=a6 TX=IN
Ka1 = 0
ka2 = (to be estimated)
Can run this model on
X or Y side (LISREL)
Equations:
X1 = a1 + 1.0L1 + e1
X2 = a2 + b1 L1 + e2
X3 = a3 + b2 L1 + e3
X4 = a4 + 1.0 L2 + e4
X5 = a5 + b3 L2 + e5
X6 = a6 + b4 L2 + e6
Correlated errors
24
Applications to longitudinal data
Model for assessing latent variable mean change
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
[Path diagrams: Ksi-1, Ksi-2 and Ksi-3, each measured by three indicators (x1-x3, x4-x6, x7-x9). In the AMOS version the intercept labels a1, a2 and a3 repeat across the three waves and each latent mean is fixed at 0.]
25
Applications to longitudinal data
Model for assessing latent variable mean change
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
KA(1) = 0
KA(2) = mean difference parameter #1
KA(3) = mean difference parameter #2
LISREL: KA=FI group 1 KA=FR groups 2,3
In AMOS:
[Path diagram: Ksi-1, Ksi-2 and Ksi-3, each measured by three indicators (x1-x3, x4-x6, x7-x9), with intercept labels a1, a2 and a3 repeated across the three waves. The mean of Ksi-1 is fixed at 0; the means of Ksi-2 and Ksi-3 are labelled kappa1 and kappa2.]
26
Applications to longitudinal data
Model for assessing latent variable mean change
Usual parameter constraints:
TX(1)=TX(4)=TX(7)
LISREL: EQ TX 1 TX 4 TX 7
AMOS: same parameter name
[Path diagram: Ksi-1, Ksi-2 and Ksi-3, each measured by three indicators (x1-x3, x4-x6, x7-x9)]
KA(1) = 0
KA(2) = mean difference parameter #1
KA(3) = mean difference parameter #2
LISREL: KA=FI group 1 KA=FR groups 2,3
Some tests:
Test for change: H0: ka1=ka2=0
Linear change model: ka2 = 2*ka1
Quadratic change model: ka2 = 4*ka1
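Where the multipliers 2 and 4 come from can be seen by coding the three waves as time scores 0, 1, 2 (an assumption for this sketch: equally spaced waves, with the first latent mean fixed at 0). Here ka1 and ka2 are the two mean difference parameters, and the Python function name is hypothetical.

```python
def implied_latent_means(ka1, power, times=(0, 1, 2)):
    """Latent means at each wave when change follows ka1 * t**power.

    power=1 gives the linear change model, power=2 the quadratic one.
    """
    return [ka1 * t ** power for t in times]

# Hypothetical value for the first mean difference parameter.
ka1 = 0.5
print("linear:   ", implied_latent_means(ka1, power=1))  # ka2 = 2*ka1
print("quadratic:", implied_latent_means(ka1, power=2))  # ka2 = 4*ka1
```

Each constrained model has one fewer free mean parameter than the unconstrained model, so it can be compared to it with a one degree of freedom chi-square difference test.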
27
As a causal model:
• Beta 1: “stability coefficient”
[Path diagram: Eta-1 → Eta-2 with coefficient Beta-1; each latent variable has multiple indicators]
• Stability coefficient is high if relative rankings are
preserved, even if there has been massive change
with respect to means
• In a model with AL(1)=0 and AL(2)=free, can have
high Beta2,1 with a) AL(1)=AL(2) or b) AL(1)
massively different from AL(2)
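The point that stability and mean change are separate questions can be illustrated with observed scores. In this Python sketch (made-up data, hypothetical function name) every respondent shifts by the same amount between waves, so rankings are perfectly preserved while the mean changes a great deal.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores: everyone gains 10 points between time 1 and time 2.
time1 = [1.0, 2.0, 3.0, 4.0, 5.0]
time2 = [score + 10.0 for score in time1]

r = pearson_r(time1, time2)
mean_change = sum(time2) / len(time2) - sum(time1) / len(time1)
print("stability (correlation):", r)
print("mean change:", mean_change)
```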
28
Causal models:
[Path diagram: Ksi-1 → Eta-1 (gamma1,1) and Ksi-2 → Eta-1 (gamma1,2)]
Ksi-2 as lagged (time 1) version of eta-1
(could re-specify as an eta variable)
Temporal order in Ksi-1 → Eta-1 relationship
29
Causal models:
[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2, with cross paths labelled ga2,1 and ga1,2]
Cross-lagged panel coefficients
[Reduced form of model on next slide]
30
Causal models:
[Path diagram: Ksi-1, Ksi-2, Eta-1 and Eta-2]
Reciprocal effects, using lagged values to achieve model
identification
31
Causal models:
A variant
Issue: what does ga(1,1) mean given
concern over causal direction?
[Path diagram: variables TV Use, Political Trust and Pol Trust (Time 2); paths labelled gamma 1,1, gamma 2,1 and Beta 2,1]
32
Lagged and contemporaneous effects
This model is underidentified
[Path diagram not reproduced]
33
Lagged effects model
Ksi-1 could be an “event”
1/0 dummy variable
[Path diagram: ksi-1, ksi-2, eta-1 and eta-2]
34
First order model for three wave data
(univariate)
[Path diagram: latent variables at Time 1, Time 2 and Time 3, each with multiple indicators]
35
First order model for three wave data
(univariate)
[Path diagram: three-wave model in which both wave-to-wave paths are labelled b1]
Tests:
Equivalence of stability coefficients (b1)
Mean differences (see earlier slide)
36
Second order model for three wave data
(univariate)
[Path diagram: three-wave model with paths labelled b1; one b1 path carries the note below]
No longer comparable to b1
(t1 → t2)
37
Second order model for three wave data
(univariate)
[Path diagram: three-wave model with paths labelled b1]
Issue: adding appropriate error terms (2nd order)
38
Multivariate Model for Three-wave panel
data: cross-lagged effects (first order)
[Path diagram not reproduced]
39
Multivariate Model for Three-wave panel
data: cross-lagged effects (first order)
[Path diagram not reproduced]
Equivalence of parameters:
T1 → T2
T2 → T3
40
Multivariate Model for Three-wave panel
data: cross-lagged effects (second order)
41
Multivariate Model for Four-wave panel data:
cross-lagged effects (second order)
42
Lagged and contemporaneous effects
Three wave model with constraints:
[Path diagram: paths labelled a through f, each label appearing twice to mark the equality constraints across waves]
Under many circumstances, there will be an empirical
under-identification problem, though in theory this model is
identified
43
Example:
• Canada, Quality of Life data
• In directory \Panel in
Week4Examples
44
Re-expressing parameters:
GROWTH CURVE MODELS
Intercept & linear (& sometimes quadratic)
terms
45
Linear Growth Model
Two Factor LGM
[Path diagram: Intercept and Slope factors (labelled Parm1 and Parm2) with indicators V1 - t1 and V2 - t2; intercept loadings fixed at 1, slope loadings 0 and 1]
46
Linear Growth Model
Two Factor LGM
[Path diagram: Intercept and Slope factors (labelled Parm1 and Parm2) with latent variables LV-t1 and LV-t2, each measured by multiple indicators]
A bit more complicated with
latent variables instead of single
manifest variables
47
Linear Growth Model
Two Factor Linear Growth Model
[Path diagram: Intercept and Slope factors (labelled Parm1 and Parm2) with indicators t1, t2 and t3; intercept loadings fixed at 1, slope loadings 0, 1 and 2]
48
Unspecified 2 factor Growth Curve Model
Two Factor Unspecified Growth Model
[Path diagram: Intercept and Slope factors (labelled Parm1 and Parm2) with indicators t1, t2 and t3; intercept loadings fixed at 1, slope loadings 0, 1 and lambda (free)]
49
3 factor Growth Curve Model
[Path diagram: Intercept, Linear and Quadratic factors with indicators t1, t2 and t3; Intercept and Linear are labelled Parm1 and Parm2; visible fixed loadings include 1, 2 and 4]
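How the fixed loadings in a three-factor growth model generate the implied means can be sketched in Python. The assumptions, labeled because the diagram is only partly recoverable: time scores 0, 1, 2, intercept loadings of 1, linear loadings equal to t, and quadratic loadings equal to t squared (0, 1, 4). The factor means used are made-up numbers and the function name is hypothetical.

```python
def implied_means(factor_means, times=(0, 1, 2)):
    """Implied mean at each wave: intercept + linear*t + quadratic*t**2."""
    intercept, linear, quadratic = factor_means
    return [intercept + linear * t + quadratic * t ** 2 for t in times]

# Fixed loading pattern: one row per wave, columns = intercept, linear, quadratic.
loadings = [(1, t, t ** 2) for t in (0, 1, 2)]
print(loadings)

# Hypothetical factor means for intercept, linear and quadratic terms.
print(implied_means((10.0, 2.0, 0.5)))
```

Setting the quadratic factor mean to 0 reduces this to the two-factor linear growth model.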
50
Last slide
51