Title stata.com example 26 — Fitting a model with data missing at random Description Remarks and examples Also see Description sem method(mlmv) is demonstrated using . use http://www.stata-press.com/data/r13/cfa_missing (CFA MAR data) . summarize Variable Obs Mean Std. Dev. Min Max id test1 test2 test3 test4 500 406 413 443 417 250.5 97.37475 98.04501 100.9699 99.56815 144.4818 13.91442 13.84145 13.4862 14.25438 1 56.0406 62.25496 65.51753 53.8719 500 136.5672 129.3881 137.3046 153.9779 taken 500 3.358 .6593219 2 4 . notes _dta: 1. Fictional data on 500 subjects taking four tests. 2. Tests results M.A.R. (missing at random). 3. 230 took all 4 tests 4. 219 took 3 of the 4 tests 5. 51 took 2 of the 4 tests 6. All tests have expected mean 100, s.d. 14. See [SEM] intro 4 for background. Remarks and examples stata.com Remarks are presented under the following headings: Fitting the model with method(ml) Fitting the model with method(mlmv) Fitting the model with the Builder 1 2 example 26 — Fitting a model with data missing at random Fitting the model with method(ml) We fit a single-factor measurement model. X test1 test2 test3 test4 ε1 ε2 ε3 ε4 . sem (test1 test2 test3 test4 <- X), nolog (270 observations with missing values excluded) Endogenous variables Measurement: test1 test2 test3 test4 Exogenous variables Latent: X Structural equation model Estimation method = ml Log likelihood = -3464.3099 ( 1) [test1]X = 1 Coef. Measurement test1 <X _cons 1 96.76907 OIM Std. Err. Number of obs z (constrained) .8134878 118.96 = 230 P>|z| [95% Conf. Interval] 0.000 95.17467 98.36348 test2 <X _cons 1.021885 92.41248 .1183745 .8405189 8.63 109.95 0.000 0.000 .789875 90.7651 1.253895 94.05987 .5084673 94.12958 .0814191 .7039862 6.25 133.71 0.000 0.000 .3488889 92.7498 .6680457 95.50937 .5585651 92.2556 .0857772 .7322511 6.51 125.99 0.000 0.000 .3904449 90.82042 .7266853 93.69079 55.86083 61.88092 89.07839 93.26508 96.34453 10.85681 11.50377 8.962574 9.504276 16.28034 38.16563 42.985 73.13566 76.37945 69.18161 81.76028 89.08338 108.4965 113.8837 134.1725 test3 <X _cons test4 <X _cons var(e.test1) var(e.test2) var(e.test3) var(e.test4) var(X) LR test of model vs. saturated: chi2(2) = 0.39, Prob > chi2 = 0.8212 example 26 — Fitting a model with data missing at random 3 Notes: 1. This model was fit using 230 of the 500 observations in the dataset. Unless you use sem’s method(mlmv), observations are casewise omitted, meaning that if there is a single variable with a missing value among the variables being used, the observation is ignored. 2. The coefficients for test3 and test4 are 0.51 and 0.56. Because we at StataCorp manufactured these data, we can tell you that the true coefficients are 1. 3. The error variance for e.test1 and e.test2 are understated. These data were manufactured with an error variance of 100. 4. These data are missing at random (MAR), not missing completely at random (MCAR). In MAR data, which values are missing can be a function of the observed values in the data. MAR data can produce biased estimates if the missingness is ignored, as we just did. MCAR data do not bias estimates. Fitting the model with method(mlmv) . sem (test1 test2 test3 test4 <- X), method(mlmv) nolog Endogenous variables Measurement: test1 test2 test3 test4 Exogenous variables Latent: X (output omitted ) Structural equation model Estimation method = mlmv Log likelihood = -6592.9961 ( 1) Number of obs = 500 [test1]X = 1 Coef. Measurement test1 <X _cons 1 98.94386 OIM Std. Err. z (constrained) .6814418 145.20 P>|z| [95% Conf. Interval] 0.000 97.60826 100.2795 test2 <X _cons 1.069952 99.84218 .1079173 .6911295 9.91 144.46 0.000 0.000 .8584378 98.48759 1.281466 101.1968 .9489025 101.0655 .0896098 .6256275 10.59 161.54 0.000 0.000 .7732706 99.83928 1.124534 102.2917 1.021626 99.64509 .0958982 .6730054 10.65 148.06 0.000 0.000 .8336687 98.32603 1.209583 100.9642 101.1135 95.45572 95.14847 101.0943 94.04629 10.1898 10.79485 9.053014 10.0969 13.96734 82.99057 76.47892 78.9611 83.12124 70.29508 123.1941 119.1413 114.6543 122.9536 125.8225 test3 <X _cons test4 <X _cons var(e.test1) var(e.test2) var(e.test3) var(e.test4) var(X) LR test of model vs. saturated: chi2(2) = 2.27, Prob > chi2 = 0.3209 4 example 26 — Fitting a model with data missing at random Notes: 1. The model is now fit using all 500 observations in the dataset. 2. The coefficients for test3 and test4—previously 0.51 and 0.56—are now 0.95 and 1.02. 3. Error variance estimates are now consistent with the true value of 100. 4. Standard errors of path coefficients are mostly smaller than reported in the previous model. 5. method(mlmv) requires that the data be MCAR or MAR. 6. method(mlmv) requires that the data be multivariate normal. Fitting the model with the Builder Use the diagram above for reference. 1. Open the dataset. In the Command window, type . use http://www.stata-press.com/data/r13/cfa_missing 2. Open a new Builder diagram. Select menu item Statistics > SEM (structural equation modeling) > Model building and estimation. 3. Create the measurement component for X. Select the Add Measurement Component tool, , and then click in the diagram about one-third of the way down from the top and about halfway in from the left. In the resulting dialog box, a. change the Latent variable name to X; b. select test1, test2, test3, and test4 by using the Measurement variables control; c. select Down in the Measurement direction control; d. click on OK. If you wish, move the component by clicking on any variable and dragging it. 4. Estimate. Click on the Estimate button, , in the Standard Toolbar. In the resulting dialog box, a. select the Model tab; b. select the Maximum likelihood with missing values radio button; c. click on OK. You can open a completed diagram in the Builder by typing . webgetsem cfa_missing Also see [SEM] intro 4 — Substantive concepts [SEM] sem option method( ) — Specifying method and calculation of VCE