example 26 — Fitting a model with data missing at random
Description
sem method(mlmv) is demonstrated using
. use http://www.stata-press.com/data/r13/cfa_missing
(CFA MAR data)
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       500       250.5    144.4818          1        500
       test1 |       406    97.37475    13.91442    56.0406   136.5672
       test2 |       413    98.04501    13.84145   62.25496   129.3881
       test3 |       443    100.9699     13.4862   65.51753   137.3046
       test4 |       417    99.56815    14.25438    53.8719   153.9779
-------------+--------------------------------------------------------
       taken |       500       3.358    .6593219          2          4
. notes
_dta:
1. Fictional data on 500 subjects taking four tests.
2. Tests results M.A.R. (missing at random).
3. 230 took all 4 tests
4. 219 took 3 of the 4 tests
5. 51 took 2 of the 4 tests
6. All tests have expected mean 100, s.d. 14.
See [SEM] intro 4 for background.
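The counts in the dataset notes can be checked directly. The following is a minimal sketch, assuming that the variable taken records the number of tests each subject took (its range of 2 to 4 suggests this, but the notes do not say so explicitly).

```stata
use http://www.stata-press.com/data/r13/cfa_missing, clear

* Distribution of the number of tests taken (assumed meaning of -taken-)
tabulate taken

* Number of subjects with all four test scores observed
count if !missing(test1, test2, test3, test4)
```

If the assumption holds, the tabulation should agree with items 3 to 5 of the notes, and the count gives the number of complete cases available to a casewise-deletion analysis.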
Remarks and examples
Remarks are presented under the following headings:
Fitting the model with method(ml)
Fitting the model with method(mlmv)
Fitting the model with the Builder
Fitting the model with method(ml)
We fit a single-factor measurement model.
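[Path diagram: latent variable X with paths to the observed variables test1, test2, test3, and test4; each test has its own error term, ε1 through ε4.]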
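. sem (test1 test2 test3 test4 <- X), nolog
(270 observations with missing values excluded)
Endogenous variables
Measurement:  test1 test2 test3 test4
Exogenous variables
Latent:       X
Structural equation model                       Number of obs      =       230
Estimation method  = ml
Log likelihood     = -3464.3099
 ( 1)  [test1]X = 1
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
  test1 <-   |
           X |          1  (constrained)
       _cons |   96.76907   .8134878   118.96   0.000     95.17467    98.36348
  -----------+----------------------------------------------------------------
  test2 <-   |
           X |   1.021885   .1183745     8.63   0.000      .789875    1.253895
       _cons |   92.41248   .8405189   109.95   0.000      90.7651    94.05987
  -----------+----------------------------------------------------------------
  test3 <-   |
           X |   .5084673   .0814191     6.25   0.000     .3488889    .6680457
       _cons |   94.12958   .7039862   133.71   0.000      92.7498    95.50937
  -----------+----------------------------------------------------------------
  test4 <-   |
           X |   .5585651   .0857772     6.51   0.000     .3904449    .7266853
       _cons |    92.2556   .7322511   125.99   0.000     90.82042    93.69079
-------------+----------------------------------------------------------------
var(e.test1) |   55.86083   10.85681                      38.16563    81.76028
var(e.test2) |   61.88092   11.50377                        42.985    89.08338
var(e.test3) |   89.07839   8.962574                      73.13566    108.4965
var(e.test4) |   93.26508   9.504276                      76.37945    113.8837
      var(X) |   96.34453   16.28034                      69.18161    134.1725
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(2)   =      0.39, Prob > chi2 = 0.8212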
Notes:
1. This model was fit using 230 of the 500 observations in the dataset. Unless you use sem’s
method(mlmv), observations are casewise omitted, meaning that if there is a single variable with
a missing value among the variables being used, the observation is ignored.
2. The coefficients for test3 and test4 are 0.51 and 0.56. Because we at StataCorp manufactured
these data, we can tell you that the true coefficients are 1.
3. The error variances for e.test1 and e.test2 are understated: the estimates are about 56 and 62,
but these data were manufactured with an error variance of 100.
4. These data are missing at random (MAR), not missing completely at random (MCAR). In MAR
data, which values are missing can be a function of the observed values in the data. MAR data
can produce biased estimates if the missingness is ignored, as we just did. MCAR data do not
bias estimates.
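Before choosing an estimation method, it can help to look at how the missing values are distributed. The following is a sketch only: the variable miss4 is a hypothetical name introduced here, and the pairing of test4 with test1 is an arbitrary choice for illustration, not something taken from the example.

```stata
use http://www.stata-press.com/data/r13/cfa_missing, clear

* Which combinations of the four tests are observed or missing, and how often
misstable patterns test1 test2 test3 test4, frequency

* Informal check: is missingness on test4 related to an observed score?
* (miss4 is a hypothetical indicator; 1 = test4 missing, 0 = observed)
generate byte miss4 = missing(test4)
logit miss4 test1
```

An association between a missingness indicator and an observed score is evidence against MCAR. It cannot establish that the data are MAR, because MAR also concerns the values that were not observed.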
Fitting the model with method(mlmv)
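. sem (test1 test2 test3 test4 <- X), method(mlmv) nolog
Endogenous variables
Measurement:  test1 test2 test3 test4
Exogenous variables
Latent:       X
  (output omitted)
Structural equation model                       Number of obs      =       500
Estimation method  = mlmv
Log likelihood     = -6592.9961
 ( 1)  [test1]X = 1
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
  test1 <-   |
           X |          1  (constrained)
       _cons |   98.94386   .6814418   145.20   0.000     97.60826    100.2795
  -----------+----------------------------------------------------------------
  test2 <-   |
           X |   1.069952   .1079173     9.91   0.000     .8584378    1.281466
       _cons |   99.84218   .6911295   144.46   0.000     98.48759    101.1968
  -----------+----------------------------------------------------------------
  test3 <-   |
           X |   .9489025   .0896098    10.59   0.000     .7732706    1.124534
       _cons |   101.0655   .6256275   161.54   0.000     99.83928    102.2917
  -----------+----------------------------------------------------------------
  test4 <-   |
           X |   1.021626   .0958982    10.65   0.000     .8336687    1.209583
       _cons |   99.64509   .6730054   148.06   0.000     98.32603    100.9642
-------------+----------------------------------------------------------------
var(e.test1) |   101.1135    10.1898                      82.99057    123.1941
var(e.test2) |   95.45572   10.79485                      76.47892    119.1413
var(e.test3) |   95.14847   9.053014                       78.9611    114.6543
var(e.test4) |   101.0943    10.0969                      83.12124    122.9536
      var(X) |   94.04629   13.96734                      70.29508    125.8225
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(2)   =      2.27, Prob > chi2 = 0.3209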
Notes:
1. The model is now fit using all 500 observations in the dataset.
2. The coefficients for test3 and test4—previously 0.51 and 0.56—are now 0.95 and 1.02.
3. Error variance estimates are now consistent with the true value of 100.
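4. Standard errors of the coefficients are mostly smaller than in the previous model: those of all four intercepts and of the test2 loading decrease, although those of the test3 and test4 loadings are slightly larger.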
5. method(mlmv) requires that the data be MCAR or MAR.
6. method(mlmv) requires that the data be multivariate normal.
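To compare the two fits side by side, the results can be stored and tabulated. This is a minimal sketch: the names ml_fit and mlmv_fit are labels chosen here for illustration, and the display formats are arbitrary.

```stata
use http://www.stata-press.com/data/r13/cfa_missing, clear

* Fit with casewise deletion (the default, method(ml)) and store the results
quietly sem (test1 test2 test3 test4 <- X), nolog
estimates store ml_fit

* Refit using all available observations and store those results too
quietly sem (test1 test2 test3 test4 <- X), method(mlmv) nolog
estimates store mlmv_fit

* Coefficients and standard errors from both fits in one table
estimates table ml_fit mlmv_fit, b(%9.4f) se(%9.4f) stats(N)
```

The table should make the changes described in the notes easy to see, in particular the test3 and test4 loadings and the error variances for test1 and test2.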
Fitting the model with the Builder
Use the diagram above for reference.
1. Open the dataset.
In the Command window, type
. use http://www.stata-press.com/data/r13/cfa_missing
2. Open a new Builder diagram.
Select menu item Statistics > SEM (structural equation modeling) > Model building and
estimation.
3. Create the measurement component for X.
Select the Add Measurement Component tool and then click in the diagram about one-third
of the way down from the top and about halfway in from the left.
In the resulting dialog box,
a. change the Latent variable name to X;
b. select test1, test2, test3, and test4 by using the Measurement variables control;
c. select Down in the Measurement direction control;
d. click on OK.
If you wish, move the component by clicking on any variable and dragging it.
4. Estimate.
Click on the Estimate button in the Standard Toolbar. In the resulting dialog box,
a. select the Model tab;
b. select the Maximum likelihood with missing values radio button;
c. click on OK.
You can open a completed diagram in the Builder by typing
. webgetsem cfa_missing
Also see
[SEM] intro 4 — Substantive concepts
[SEM] sem option method( ) — Specifying method and calculation of VCE