The MVA Module


Missing Values Analysis with IBM SPSS

Analyze, Missing Values Analysis

MVA VARIABLES=Gender Ideal Statoph Nucoph SATM Year Eye
  /MAXCAT=25
  /CATEGORICAL=Eye
  /TTEST PROB PERCENT=5
  /MPATTERN
  /EM(TOLERANCE=0.001 CONVERGENCE=0.0001 ITERATIONS=25).

[DataSet] C:\Users\Vati\Documents\StatData\IntroQ\IntroQ.sav

See IntroQ Questionnaire for a description of the questions asked.

Univariate Statistics

                                Std.      Missing     No. of Extremes a
Variable      N      Mean  Deviation    Count   Percent     Low    High
Gender      667      1.26       .438        0        .0       0       0
Ideal       662     70.32      3.854        5        .7       8       0
Statoph     658      6.29      2.315        9       1.3      11       0
Nucoph      665     58.04     22.423        2        .3      26       0
SATM        529    505.29     96.080      138      20.7       1       2
Year        667   1997.52      8.621        0        .0       0       0
Eye         666                             1        .1

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

Here we see that almost 21% of the cases are missing data on the SATM variable.
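If you would like to check those percentages by hand, here is a minimal Python sketch. The valid N values are copied from the Univariate Statistics output above; the total of 667 cases is an assumption based on Gender and Year having no missing values, and the function name `missing_summary` is mine, not anything SPSS provides.

```python
# Valid N for each variable, copied from the Univariate Statistics output.
VALID_N = {
    "Gender": 667, "Ideal": 662, "Statoph": 658, "Nucoph": 665,
    "SATM": 529, "Year": 667, "Eye": 666,
}

# Assumed total number of cases: Gender and Year have no missing values.
N_CASES = 667


def missing_summary(valid_n, n_cases):
    """Return {variable: (missing count, missing percent to one decimal)}."""
    summary = {}
    for name, n in valid_n.items():
        count = n_cases - n
        summary[name] = (count, round(100 * count / n_cases, 1))
    return summary


for name, (count, percent) in missing_summary(VALID_N, N_CASES).items():
    print(f"{name:8s} {count:4d} {percent:5.1f}")
```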

Summary of Estimated Means

                Gender     Ideal   Statoph    Nucoph      SATM      Year
All Values        1.26     70.32      6.29     58.04    505.29   1997.52
EM                1.26     70.32      6.29     58.02    504.60   1997.51

Please read David Howell’s document on the Expectation-Maximization algorithm. The table above and the one below show the results of SPSS’ EM procedure. The algorithm yields an estimated mean of 504.60 and an estimated standard deviation of 95.77 for SATM, not much different from the observed mean (505.29) and standard deviation (96.08) for those cases on which we do have data.

Summary of Estimated Standard Deviations

                Gender     Ideal   Statoph    Nucoph      SATM      Year
All Values        .438     3.854     2.315    22.423    96.080     8.621
EM                .437     3.851     2.308    22.431    95.770     8.620
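To give a feel for what an EM procedure does, here is a rough Python sketch for the simplest case I can think of: two variables, where x is complete and y is missing for some cases. This is not SPSS' implementation. The function name `em_bivariate`, the stopping rule, the use of divisor n for the variances, and the tiny data set are all assumptions made for illustration. The E step replaces each missing y (and its square) with its expected value given x and the current estimates; the M step re-estimates the mean, variance, and covariance from the completed sums.

```python
def em_bivariate(x, y, iterations=25, convergence=1e-4):
    """EM estimates for a bivariate normal; missing y values are None.

    Assumes x is complete. Variances use divisor n (maximum likelihood).
    """
    n = len(x)
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n

    # Starting values: statistics of the observed y, zero covariance.
    observed = [yi for yi in y if yi is not None]
    mean_y = sum(observed) / len(observed)
    var_y = sum((yi - mean_y) ** 2 for yi in observed) / len(observed)
    cov_xy = 0.0

    for _ in range(iterations):
        slope = cov_xy / var_x
        residual_var = var_y - cov_xy ** 2 / var_x
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                # E step: expected y and expected y squared, given x.
                ey = mean_y + slope * (xi - mean_x)
                eyy = ey ** 2 + residual_var
            else:
                ey, eyy = yi, yi ** 2
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey

        # M step: update the estimates from the completed sums.
        new_mean_y = sum_y / n
        new_var_y = sum_yy / n - new_mean_y ** 2
        new_cov_xy = sum_xy / n - mean_x * new_mean_y
        change = max(abs(new_mean_y - mean_y),
                     abs(new_var_y - var_y),
                     abs(new_cov_xy - cov_xy))
        mean_y, var_y, cov_xy = new_mean_y, new_var_y, new_cov_xy
        if change < convergence:
            break

    return {"mean_x": mean_x, "var_x": var_x,
            "mean_y": mean_y, "var_y": var_y, "cov_xy": cov_xy}


# Hypothetical data: y is missing for the two cases with the largest x.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, None, None]
print(em_bivariate(x, y, iterations=200, convergence=1e-8))
```

In this made-up example the missing y values belong to cases with high x, so the EM estimate of the mean of y ends up above the mean of the four observed y values. That is the sort of adjustment a complete-case mean cannot make.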

Separate Variance t Tests a

Indicator variable: SATM

                   Gender     Ideal   Statoph    Nucoph      SATM      Year
t                     1.5        .3      -2.7       -.1         .      -1.8
df                  228.9     209.3     221.6     222.7         .     234.1
P(2-tail)            .132      .787      .007      .952         .      .066
# Present             529       527       525       527       529       529
# Missing             138       135       133       138         0       138
Mean(Present)        1.27     70.34      6.17     58.01    505.29   1997.23
Mean(Missing)        1.21     70.24      6.75     58.14         .   1998.65

For each quantitative variable, pairs of groups are formed by indicator variables (present, missing).
a. Indicator variables with less than 5% missing are not displayed.

These t tests compare the group of cases with data on SATM to the group of cases without data on SATM. Notice that those who did not answer the SATM question scored significantly higher on the Statophobia item (M = 6.75) than did those who did answer the SATM question (M = 6.17), t(221.6) = -2.7, p = .007. There is also a hint that the frequency of failure to answer the SATM question has increased over the years, t(234.1) = -1.8, p = .066.
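The logic of these tests can be sketched in a few lines of Python. Cases are split on an indicator (SATM present versus SATM missing), and a separate-variance (Welch) t and its Satterthwaite df are computed for some other variable. The function names and the scores below are hypothetical; the output above does not give the group standard deviations, so this sketch cannot reproduce the SPSS values.

```python
from math import sqrt
from statistics import mean, variance


def split_on_indicator(scores, indicator):
    """Split scores by whether the indicator variable is present or missing."""
    present, missing = [], []
    for score, flag in zip(scores, indicator):
        if score is None:
            continue  # a case cannot be compared on a score it lacks
        (missing if flag is None else present).append(score)
    return present, missing


def welch_t(present, missing):
    """Separate-variance t (present minus missing) and Satterthwaite df."""
    n1, n2 = len(present), len(missing)
    v1, v2 = variance(present) / n1, variance(missing) / n2
    t = (mean(present) - mean(missing)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data: None marks a missing SATM score.
statoph = [5, 7, 6, 8, 4, 9, 7, 6]
satm = [500, None, 520, None, 480, None, 610, 450]

present, missing = split_on_indicator(statoph, satm)
t, df = welch_t(present, missing)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here means the group missing SATM has the higher mean, the same sign convention as in the SPSS table.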

I have cut out most of the table below, but left in enough to show you how SPSS groups cases by the pattern of missing values. The most frequent pattern was missing data on SATM but not on any other variable. Cases 626 through 629 and case 631 were missing data on both SATM and Statophobia. Case 630 (and others) was missing data only on Statophobia, and so on.

Missing Patterns (cases with missing values) a

Columns: Case, # Missing, % Missing, and the Missing and Extreme Value Patterns for Gender, Year, Eye, Nucoph, Ideal, Statoph, SATM.

Cases shown in this excerpt: 627, 628, 629, 631, 660, 661, 662, 626, 630, 632, 633; 444, 221, 552, 311, 194, 665, 558, 78.

Each of these cases is missing either 1 value (14.3% of the seven variables) or 2 values (28.6%). In the pattern columns an S marks a system-missing value; - indicates an extreme low value, while + indicates an extreme high value. The range used is (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

a. Cases and variables are sorted on missing patterns.
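The grouping idea behind the Missing Patterns table can be sketched in Python as follows. Each case is reduced to the list of variables on which it is missing, and cases sharing a list are grouped together. The function name, the records, and the rule of listing the most frequent pattern first are assumptions made for illustration; SPSS' own sort order may differ.

```python
from collections import defaultdict


def missing_patterns(cases, variables):
    """Group case numbers by the tuple of variables on which they are missing.

    Cases with no missing values are left out. Patterns are returned
    with the most frequent pattern first.
    """
    groups = defaultdict(list)
    for case_id, record in cases.items():
        pattern = tuple(v for v in variables if record.get(v) is None)
        if pattern:
            groups[pattern].append(case_id)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical records: None marks a missing value.
cases = {
    101: {"Statoph": 5, "SATM": None},
    102: {"Statoph": None, "SATM": None},
    103: {"Statoph": 6, "SATM": 500},
    104: {"Statoph": 7, "SATM": None},
    105: {"Statoph": None, "SATM": 520},
}

for pattern, ids in missing_patterns(cases, ["Statoph", "SATM"]):
    print(", ".join(pattern), "->", ids)
```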

EM Estimated Statistics

Here we have estimated means, covariances, and correlation coefficients. Little’s MCAR test tests the null hypothesis that the missing data are Missing Completely At Random. Since it is significant, Chi-Square(32) = 61.477, p = .001, we conclude that the data are NOT missing completely at random. The majority opinion is that EM estimates are not trustworthy when the data are not missing completely at random.
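The significance level for Little's test is just the upper-tail area of a chi-square distribution. Here is a small Python sketch that computes it from the reported Chi-Square = 61.477 and DF = 32. It relies on the closed-form tail area that holds only when the degrees of freedom are even (true here); the function name is mine, and for odd df you would need a statistics library instead.

```python
from math import exp


def chi_square_upper_tail_even_df(chi_square, df):
    """Upper-tail p value for a chi-square statistic with even df.

    Uses P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is exact only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut needs a positive, even df")
    half = chi_square / 2.0
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total


p = chi_square_upper_tail_even_df(61.477, 32)
print(f"Little's MCAR test: p = {p:.4f}")
```

To three decimals the result should agree with the Sig. value that SPSS reports.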

EM Means a

    Gender     Ideal   Statoph    Nucoph      SATM      Year
      1.26     70.32      6.29     58.02    504.60   1997.51

a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001

EM Covariances a

              Gender     Ideal   Statoph    Nucoph      SATM      Year
Gender          .191
Ideal          -.949    14.833
Statoph        -.146      .698     5.326
Nucoph         -.815     8.702     1.749   503.155
SATM           2.480   -18.295   -73.667    63.812  9171.847
Year            .032     -.483    -3.288     1.114   259.774    74.301

a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001

EM Correlations a

              Gender     Ideal   Statoph    Nucoph      SATM      Year
Gender             1
Ideal          -.563         1
Statoph        -.145      .079         1
Nucoph         -.083      .101      .034         1
SATM            .059     -.050     -.333      .030         1
Year            .008     -.015     -.165      .006      .315         1

a. Little's MCAR test: Chi-Square = 61.477, DF = 32, Sig. = .001
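The EM correlations follow directly from the EM covariances: each covariance is divided by the product of the two standard deviations. Here is a short Python sketch using the covariance values printed above. Because those values are rounded to three decimals, a recomputed correlation can differ from the SPSS value in the third decimal place. The names `LOWER` and `correlations` are mine.

```python
from math import sqrt

NAMES = ["Gender", "Ideal", "Statoph", "Nucoph", "SATM", "Year"]

# Lower triangle of the EM covariance matrix, as printed by SPSS.
LOWER = [
    [0.191],
    [-0.949, 14.833],
    [-0.146, 0.698, 5.326],
    [-0.815, 8.702, 1.749, 503.155],
    [2.480, -18.295, -73.667, 63.812, 9171.847],
    [0.032, -0.483, -3.288, 1.114, 259.774, 74.301],
]


def correlations(lower):
    """Convert a lower-triangular covariance matrix to correlations."""
    # The diagonal holds the variances; their square roots are the SDs.
    sds = [sqrt(row[i]) for i, row in enumerate(lower)]
    return [[row[j] / (sds[i] * sds[j]) for j in range(i + 1)]
            for i, row in enumerate(lower)]


for name, row in zip(NAMES, correlations(LOWER)):
    print(f"{name:8s}" + "".join(f"{r:8.3f}" for r in row))
```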

ECU Users: Curiously, the SPSS (20) installation provided for ECU faculty to use on campus does not contain the missing values and multiple imputation modules, but that provided for use off campus does. Go figure.

Karl L. Wuensch, December, 2012

