Logistic Regression: SPSS Output

This example uses SPSS 13 and the file "employee data.sav," which ships with SPSS. The
analysis uses the variable minority, coded 0=no, 1=yes, as the dependent. The
independent variables are educ (years of education), prevexp (months of previous
experience), jobcat (1=clerical, 2=custodial, 3=managerial), and gender (m, f).
To obtain this output:
1. File, Open, point to employee data.sav.
2. Analyze, Regression, Binary Logistic.
3. In the Logistic Regression dialog box, enter minority as the dependent and educ,
prevexp, jobcat, and gender as the independents (covariates).
4. Click the Categorical button and designate jobcat as a categorical variable. (Gender
will automatically be treated as categorical since the raw data are string values, m or f.)
Click Continue.
5. Click the Options button and select Classification Plots, Hosmer-Lemeshow
Goodness-of-Fit, Casewise Listing of Residuals (outside 2 standard deviations), and
check Display at Last Step. Click Continue.
6. Click on OK to run the procedure.
Comments in blue are by the instructor and are not part of SPSS output.
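For readers who want to reproduce a comparable model outside SPSS, the sketch below fits the same binary logistic regression in Python with statsmodels. The CSV file name (and the step of exporting employee data.sav to CSV) is an assumption, and the dummy reference categories chosen by the formula interface may differ from SPSS's, so individual coefficients can differ even though the fitted model is equivalent.

    # Rough Python analogue of the SPSS analysis (hypothetical CSV export of
    # employee data.sav with columns minority, educ, prevexp, jobcat, gender)
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("employee_data.csv")

    # C() creates dummy variables for the categorical predictors, much as the
    # Categorical button does in SPSS (reference categories may differ)
    model = smf.logit("minority ~ educ + prevexp + C(jobcat) + C(gender)", data=df)
    result = model.fit()
    print(result.summary())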
Logistic Regression
Case Processing Summary

Unweighted Cases (a)                           N       Percent
Selected Cases     Included in Analysis      474         100.0
                   Missing Cases               0            .0
                   Total                     474         100.0
Unselected Cases                               0            .0
Total                                        474         100.0
a. If weight is in effect, see classification table for the total
number of cases.
The case processing table above shows missing values are not an issue for these
data.
Dependent Variable Encoding

Original Value     Internal Value
No                 0
Yes                1
The Dependent Variable Encoding table above shows that the dependent variable,
minority, is coded so that the category of interest ("yes") is 1 and the non-minority
category is coded 0. This is conventional for logistic analysis, which here focuses on
the probability that minority = 1.
Categorical Variables Codings

                                                     Parameter coding
                                   Frequency         (1)         (2)
Employment Category   Clerical         363          1.000        .000
                      Custodial         27           .000       1.000
                      Manager           84           .000        .000
Gender                Male             258          1.000
                      Female           216           .000
Above is SPSS's parameterization of the two categorical independent variables.
Note that the parameter codings for the last category of each such variable are
all 0's, indicating that the last category is the omitted (reference) value for that set
of dummy variables. The parameter codings are the X values for the dummy variables.
They are multiplied by the logit (effect) coefficients as part of obtaining the predicted
values of the dependent, much as one would compute an OLS regression estimate.
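As a minimal illustration (the specific case below is hypothetical, and the coefficients are taken from the Block 1 "Variables in the Equation" table further down), the dummy codings and covariate values are combined with the logit coefficients to give a predicted log-odds, which is then converted to a probability:

    import math

    # Logit coefficients from the Block 1 "Variables in the Equation" table below
    b = {"const": -3.523, "educ": -0.008, "prevexp": 0.002,
         "jobcat1": 2.048, "jobcat2": 2.456, "gender1": 0.579}

    # Hypothetical case: a clerical (jobcat1=1, jobcat2=0) male (gender1=1)
    # employee with 12 years of education and 60 months of previous experience
    x = {"educ": 12, "prevexp": 60, "jobcat1": 1, "jobcat2": 0, "gender1": 1}

    logit = b["const"] + sum(b[k] * x[k] for k in x)   # linear predictor (log-odds)
    p = 1 / (1 + math.exp(-logit))                     # predicted P(minority = 1)
    print(round(logit, 3), round(p, 3))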
Block 0: Beginning Block
Classification Table (a,b)

                                                       Predicted
                                       Minority Classification    Percentage
Observed                                    No          Yes         Correct
Step 0   Minority Classification   No      370            0           100.0
                                   Yes     104            0              .0
         Overall Percentage                                             78.1
a. Constant is included in the model.
b. The cut value is .500
The classification table above is a 2 x 2 table which tallies correct and incorrect
estimates for the null model with only the constant. The columns are the two
predicted values of the dependent, while the rows are the two observed (actual)
values of the dependent. In a perfect model, all cases would be on the diagonal and
the overall percent correct would be 100%. If the logistic model has homoscedasticity
(not a logistic regression assumption), the percent correct will be approximately the
same for both rows. Here it is not: the null model correctly predicts all non-minority
cases but none of the minority cases. While the overall percent correctly predicted
seems moderately good at 78.1%, the researcher must note that blindly guessing
the most frequent category (non-minority) for all cases would yield the same
percent correct (78.1%).
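That 78.1% baseline is simply the share of the modal category, as this quick check (a sketch using the counts from the table above) shows:

    # Null-model accuracy equals the share of the modal category (non-minority)
    n_no, n_yes = 370, 104            # observed counts from the Block 0 table
    baseline = n_no / (n_no + n_yes)
    print(f"{baseline:.1%}")          # ~78.1% correct by always predicting "No"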
Variables in the Equation

                         B       S.E.       Wald    df    Sig.    Exp(B)
Step 0    Constant    -1.269     .111    130.755     1    .000      .281
Above SPSS prints estimates for the initial (null) model, in which the coefficients of
all the independent variables are 0 and only the constant is fit. The significance shown
here is for the constant alone; the test of whether adding the independents would
improve on this null model appears in the Overall Statistics row of the next table,
which is also significant.
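The constant in the null model is simply the log of the observed odds of minority status, as this check using the Block 0 classification counts illustrates:

    import math

    # The Block 0 constant is the log-odds of the observed marginal split
    n_yes, n_no = 104, 370
    odds = n_yes / n_no                  # ~0.281, matching Exp(B) for the constant
    print(round(math.log(odds), 3))      # ~ -1.269, matching B for the constant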
Variables not in the Equation

                                    Score    df    Sig.
Step 0   Variables    educ          8.371     1    .004
                      prevexp       9.931     1    .002
                      jobcat       26.172     2    .000
                      jobcat(1)     3.715     1    .054
                      jobcat(2)    11.481     1    .001
                      gender(1)     2.714     1    .099
         Overall Statistics        34.047     5    .000
Block 1: Method = Enter
Omnibus Tests of Model Coefficients

                   Chi-square    df    Sig.
Step 1    Step        37.303      5    .000
          Block       37.303      5    .000
          Model       37.303      5    .000
The omnibus chi-square is a goodness-of-fit test of the null hypothesis that the
coefficients of the variables added at this step are all zero; a significant result means
the step is justified. Here the step is from the constant-only model to the
all-independents model. When, as here, the step adds a variable or variables, the
inclusion is justified if the significance of the step is less than 0.05. Had the step been
to drop variables from the equation, the exclusion would have been justified if the
significance of the change was large (e.g., over 0.10).
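The step chi-square is the drop in -2 log-likelihood from the null model to the fitted model. A minimal check (the null-model -2LL is computed here from the marginal counts, and the fitted -2LL is taken from the Model Summary below):

    import math

    # Omnibus chi-square = (-2LL of null model) - (-2LL of fitted model)
    n_yes, n_no = 104, 370
    p = n_yes / (n_yes + n_no)
    neg2ll_null = -2 * (n_yes * math.log(p) + n_no * math.log(1 - p))  # ~498.80
    neg2ll_model = 461.496            # -2 Log likelihood from the Model Summary
    print(round(neg2ll_null - neg2ll_model, 2))   # ~37.30, matching the table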
Model Summary

          -2 Log          Cox & Snell    Nagelkerke
Step      likelihood      R Square       R Square
1         461.496 (a)     .076           .116
a. Estimation terminated at iteration number 6 because
parameter estimates changed by less than .001.
The Cox-Snell R2 and Nagelkerke R2 are attempts to provide a logistic analogy to
R2 in OLS regression. The Nagelkerke measure adapts the Cox-Snell measure so
that it varies from 0 to 1, as does R2 in OLS.
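Both measures can be reproduced from the model chi-square, the null-model -2LL, and the sample size; a minimal check under the usual Cox-Snell and Nagelkerke formulas:

    import math

    # Pseudo-R-squared measures from the model chi-square and sample size
    n = 474
    chi_sq = 37.303                             # omnibus model chi-square
    neg2ll_null = 461.496 + chi_sq              # -2LL of the constant-only model
    r2_cox_snell = 1 - math.exp(-chi_sq / n)            # ~.076
    r2_max = 1 - math.exp(-neg2ll_null / n)             # maximum attainable Cox-Snell
    r2_nagelkerke = r2_cox_snell / r2_max               # ~.116
    print(round(r2_cox_snell, 3), round(r2_nagelkerke, 3))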
Hosmer and Lemeshow Test

Step    Chi-square    df    Sig.
1          15.450      8    .051
The Hosmer and Lemeshow Goodness-of-Fit Test divides subjects into deciles
based on predicted probabilities, then computes a chi-square from the observed and
expected frequencies. The p-value = .051 here is computed from the chi-square
distribution with 8 degrees of freedom and indicates that the logistic model is a
(barely) acceptable fit. That is, if the significance of the Hosmer and Lemeshow test
is .05 or less, we reject the null hypothesis that there is no difference between the
observed and predicted values of the dependent; if it is greater, as we want, we fail
to reject that null hypothesis, implying that the model's estimates fit the data at an
acceptable level. As here, an acceptable fit does not mean that the model explains
much of the variance in the dependent, only that it fits to an acceptable degree.
Contingency Table for Hosmer and Lemeshow Test

                 Minority Classification = No    Minority Classification = Yes
                 Observed      Expected          Observed      Expected        Total
Step 1     1        47          45.020               0           1.980           47
           2        43          43.282               4           3.718           47
           3        43          38.046               3           7.954           46
           4        39          38.710               8           8.290           47
           5        36          38.104              11           8.896           47
           6        29          36.597              18          10.403           47
           7        38          34.103               9          12.897           47
           8        34          33.587              13          13.413           47
           9        33          32.823              14          14.177           47
           10       28          29.728              24          22.272           52
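As an arithmetic check (a sketch using the counts above), the Hosmer-Lemeshow chi-square is simply the sum of (observed - expected) squared over expected across the 20 cells of this table:

    # Recompute the Hosmer-Lemeshow chi-square from the contingency table above
    observed_no  = [47, 43, 43, 39, 36, 29, 38, 34, 33, 28]
    expected_no  = [45.020, 43.282, 38.046, 38.710, 38.104,
                    36.597, 34.103, 33.587, 32.823, 29.728]
    observed_yes = [0, 4, 3, 8, 11, 18, 9, 13, 14, 24]
    expected_yes = [1.980, 3.718, 7.954, 8.290, 8.896,
                    10.403, 12.897, 13.413, 14.177, 22.272]

    chi_sq = sum((o - e) ** 2 / e
                 for o, e in zip(observed_no + observed_yes,
                                 expected_no + expected_yes))
    print(round(chi_sq, 2))   # ~15.45, matching the Hosmer and Lemeshow Test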
Classification Table (a)

                                                       Predicted
                                       Minority Classification    Percentage
Observed                                    No          Yes         Correct
Step 1   Minority Classification   No      363            7            98.1
                                   Yes     103            1             1.0
         Overall Percentage                                             76.8
a. The cut value is .500
The classification table above is a 2 x 2 table which tallies correct and incorrect
estimates for the full model with the independents as well as the constant. The
columns are the two predicted values of the dependent, while the rows are the two
observed (actual) values of the dependent. In a perfect model, all cases would be on
the diagonal and the overall percent correct would be 100%. If the logistic model has
homoscedasticity (not a logistic regression assumption), the percent correct will be
approximately the same for both rows. Here it is not: the model correctly predicts all
but seven non-minority cases but only one minority case. While the overall percent
correctly predicted seems moderately good at 76.8%, the researcher must note that
blindly guessing the most frequent category (non-minority) for all cases would yield
an even higher percent correct (78.1%), as noted above. This implies that minority
status cannot be differentiated on the basis of education, job experience, job
category, and gender for these data.
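A quick check of these percentages from the counts in the table above (a sketch comparing the full model to the no-information baseline):

    # Full-model accuracy versus the no-information baseline
    tn, fp = 363, 7      # observed No:  predicted No / predicted Yes
    fn, tp = 103, 1      # observed Yes: predicted No / predicted Yes

    n = tn + fp + fn + tp
    accuracy = (tn + tp) / n              # ~76.8% correct under the full model
    baseline = (tn + fp) / n              # ~78.1% by always predicting "No"
    print(f"{accuracy:.1%} vs. {baseline:.1%}")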
Variables in the Equation

                            B       S.E.      Wald    df    Sig.    Exp(B)
Step 1 (a)   educ         -.008     .054      .020     1    .886      .992
             prevexp       .002     .001     1.913     1    .167     1.002
             jobcat                         13.417     2    .001
             jobcat(1)    2.048     .572    12.803     1    .000     7.755
             jobcat(2)    2.456     .765    10.313     1    .001    11.662
             gender(1)     .579     .262     4.868     1    .027     1.784
             Constant    -3.523    1.040    11.473     1    .001      .030
a. Variable(s) entered on step 1: educ, prevexp, jobcat, gender.
The Wald statistic above and the corresponding significance level test the
significance of each of the covariate and dummy independents in the model. The
ratio of the logistic coefficient B to its standard error S.E., squared, equals the Wald
statistic. If the Wald statistic's significance level is less than 0.05, the parameter is
significant in the model. Of the independents, jobcat and gender are significant but
educ and prevexp are not.
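For example, a quick check using the gender(1) row above (the small discrepancy reflects rounding of B and S.E. in the printed table):

    # Wald statistic = (coefficient / standard error) squared
    b, se = 0.579, 0.262           # gender(1) row from the table above
    wald = (b / se) ** 2
    print(round(wald, 2))          # ~4.88, close to the reported 4.868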
The "Exp(b)" column is SPSS's label for the odds ratio of the row independent with
the dependent (minority).It is the predicted change in odds for a unit increase in the
corresponding independent variable. Odds ratios less than 1 correspond to
decreases and odds ratios more than 1.0 correspond to increases in odds. Odds
ratios close to 1.0 indicate that unit changes in that independent variable do not
affect the dependent variable.
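Exp(B) is simply the exponentiated logistic coefficient, as this check of two rows from the table illustrates:

    import math

    # Exp(B) is the exponentiated logistic coefficient (an odds ratio)
    print(round(math.exp(2.048), 3))   # jobcat(1): ~7.75  (table reports 7.755)
    print(round(math.exp(0.579), 3))   # gender(1): ~1.784 (matches the table)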
Step number: 1

[Classplot: "Observed Groups and Predicted Probabilities." The vertical axis is
frequency (0 to 160 cases); the horizontal axis is the predicted probability of
membership in the "Yes" (minority) group, from 0 to 1, with the cut value at .50.
Symbols: N = No, Y = Yes; each symbol represents 10 cases. Nearly all cases,
including most observed Y's, fall below the .50 cut value.]
The classplot above is an alternative way of assessing correct and incorrect
predictions under logistic regression. The X axis is the predicted probability, from
0.0 to 1.0, of the dependent being classified "1" (minority status). The Y axis is
frequency: the number of cases classified. Inside the plot are columns of observed
1's and 0's, here coded as Y's (minority status) and N's (not minority), with 10 cases
per symbol. Examining this plot shows such things as how well the model classifies
difficult cases (ones near p = .5). In this case, it also shows that nearly all cases are
classified as being in the N (non-minority) group, even when in reality they are in
the Y (minority) group.
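A comparable plot can be drawn outside SPSS. The sketch below is a minimal approximation that assumes two NumPy arrays are already available: prob (the model's predicted probabilities, a hypothetical name) and y (the observed 0/1 outcome):

    import matplotlib.pyplot as plt

    def classplot(prob, y, cut=0.5, bins=20):
        # Stacked histogram of predicted probabilities split by observed group,
        # approximating SPSS's classplot of observed groups vs. predictions
        plt.hist([prob[y == 0], prob[y == 1]], bins=bins, range=(0, 1),
                 stacked=True, label=["No", "Yes"])
        plt.axvline(cut, linestyle="--")     # classification cut value
        plt.xlabel("Predicted probability of membership for Yes")
        plt.ylabel("Frequency")
        plt.legend()
        plt.show()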
http://faculty.chass.ncsu.edu/garson/PA765/logispss.htm