Stat 401C

Stat 401E
Lab 9
Fall 2005
Objective: Practice computing multiple regression models, including partial F-tests and diagnosing
multicollinearity.
Readings: Howell (2002) chapter 15, especially sections 15.1 – 15.9 and 15.13.
The data for this lab are attached. The study is concerned with the health consequences of the social and
economic environment. The unit of analysis is a small region (Census tract) and N = 21. The response
(dependent) variable is non-accidental DEATHS, which is an indicator of physical health. The predictor
(independent) variables are population (POP), economic conditions as indicated by the VALUE of all residential
housing, and the number of doctors (DOCT), nurses (NURSE), and vocational nurses (VN) practicing in the
area.
1. To make sure the data are properly coded, and to get a feel for the distributions, GRAPH the response
variable (DEATHS) against each of the five predictor variables. Notice the range of values of each predictor
variable.
2. Using the DESCRIPTIVE option in the REGRESSION program, obtain a correlation matrix for the six
variables in the study. What is the range of correlations linking the response variable to each of the predictor
variables (i.e., what are the highest and lowest correlations)?
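As a cross-check on the SPSS output, the same correlation matrix can be sketched in Python with NumPy. This is only an illustration: the names RAW, NAMES, and r_with_deaths are made up here and are not part of the lab's SPSS program.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
NAMES = ["pop", "value", "doct", "nurse", "vn", "deaths"]
data = np.array(RAW.split(), dtype=float).reshape(-1, len(NAMES))

# Pearson correlations among all six variables (each column is a variable)
corr = np.corrcoef(data, rowvar=False)

# Correlation of DEATHS with each of the five predictors
d = NAMES.index("deaths")
r_with_deaths = {name: float(corr[i, d]) for i, name in enumerate(NAMES) if i != d}

for name, r in r_with_deaths.items():
    print(f"r(deaths, {name}) = {r:.3f}")
print("lowest: ", min(r_with_deaths, key=r_with_deaths.get))
print("highest:", max(r_with_deaths, key=r_with_deaths.get))
```

The printed values should agree with the DESCRIPTIVES output from SPSS up to rounding.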
3. To actually obtain the correlation matrix mentioned in question 2, you have to list the variables and add
regression model statements. Add regression statements to estimate the following models.
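M1:
$y_i = \beta_0 + \beta_1 X_1 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$
M2:
$y_i = \beta_0 + \beta_2 X_2 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$
M3:
$y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$
M4:
$y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$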
where X1 = POP, X2 = VALUE, X3 = DOCT, X4 = NURSE, and X5 = VN. To set up the models using SPSS,
your regression statements should be of the form:
REGRESSION / DESCRIPTIVES / VARS = DEATHS POP VALUE DOCT NURSE VN
/DEP = DEATHS / ENTER = POP.
REGRESSION / DESCRIPTIVES / VARS = DEATHS POP VALUE DOCT NURSE VN
/DEP = DEATHS / ENTER = VALUE / ENTER = POP / ENTER = DOCT NURSE VN
/ SCATTERPLOT (*ZRESID, *ZPRED).
(a)
Using your printout, examine the F-ratio and the t-ratios for model M3. Is there evidence that
population and economic value, taken together, contribute significantly to the explanation of
non-accidental deaths? What is the evidence? Is there evidence that population contributes to the
explanation of deaths, after controlling for economic value? Again, what is the evidence? Is there
evidence that economic value contributes significantly to the explanation of deaths, after controlling
for population? For each test, let α = 0.05.
(b)
Next, conduct partial F-tests to examine the effects of population on deaths after controlling for
economic value, and to examine the effects of economic value on deaths after controlling for
population. Again, let α = 0.05. To conduct the first of these two partial F-tests, note that the sums of
squares due to regression in model M2 can be denoted by SS(X2) and the sums of squares due to
regression in model M3 can be denoted by SS(X1, X2). Then we can denote the sums of squares due to
X1 after controlling for X2 as
SS(X1 | X2) = SS(X1, X2) – SS(X2).
and a partial F-test can be constructed as follows:
Fn13 
SS(X1 | X 2 )
MSE ( X 1 , X 2 )
A similar test can be constructed to examine the effects of economic value on deaths, after controlling
for population (i.e., SS(X2 | X1)).
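The arithmetic of the two partial F-tests can be sketched in Python as a check on hand calculations from the SPSS printout. This is a minimal illustration, not the lab's required method: the helper fit and the variable names are hypothetical, and the models are fitted by ordinary least squares with an intercept.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
pop, value, deaths = data[:, 0], data[:, 1], data[:, 5]


def fit(y, *predictors):
    """Least-squares fit with an intercept (hypothetical helper).

    Returns (regression SS, error SS, error degrees of freedom).
    """
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))
    return sst - sse, sse, len(y) - X.shape[1]


ss_x1, _, _ = fit(deaths, pop)                    # SS(X1) from M1
ss_x2, _, _ = fit(deaths, value)                  # SS(X2) from M2
ss_x1x2, sse_m3, df_m3 = fit(deaths, pop, value)  # SS(X1, X2) from M3
mse_m3 = sse_m3 / df_m3                           # MSE(X1, X2)

ss_x1_given_x2 = ss_x1x2 - ss_x2                  # SS(X1 | X2)
ss_x2_given_x1 = ss_x1x2 - ss_x1                  # SS(X2 | X1)
f_pop = ss_x1_given_x2 / mse_m3                   # F on (1, n - 3) df
f_value = ss_x2_given_x1 / mse_m3                 # F on (1, n - 3) df

print(f"F(POP | VALUE) = {f_pop:.3f} on (1, {df_m3}) df")
print(f"F(VALUE | POP) = {f_value:.3f} on (1, {df_m3}) df")
```

With n = 21 the error degrees of freedom are 18, and the tabled critical value of F(1, 18) at α = 0.05 is about 4.41.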
(c)
What is the relationship between the two partial F-tests in (b) and the two t-tests in (a)?
(d)
Why are the t-ratios for the two slopes in model M3 so small compared to the t-ratios for the slopes in
models M1 and M2?
(e)
Model M4 is distinct from model M3 in that it adds three variables designed to measure the presence
of medical support. The question we want to ask is, Is there evidence that the presence of medical
support affects the number of deaths, after controlling for the effects of population and economic
value? To address this question, we first denote the contribution of medical support (the combined
effects of all three variables), after controlling for population and economic value, by the expression
SS(X3, X4, X5 | X1, X2) = SS(X1, X2, X3, X4, X5) – SS(X1, X2).
Notice that this is the difference between the sums of squares due to regression in models M3 and M4.
A multiple partial F-test can then be constructed of the form:
Fn36
4.
SS(X 3 X 4 X 5 | X 1 X 2 )
3

.
MSE ( X 1 , X 2 , X 3 , X 4 , X 5 )
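The multiple partial F-test follows the same pattern, with the difference in regression sums of squares divided by the 3 added variables. The Python sketch below is one way to check the arithmetic. It assumes ordinary least squares with an intercept, and the helper fit is a hypothetical name, not something from the SPSS program.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
pop, value, doct, nurse, vn, deaths = data.T


def fit(y, *predictors):
    """Least-squares fit with an intercept (hypothetical helper).

    Returns (regression SS, error SS, error degrees of freedom).
    """
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))
    return sst - sse, sse, len(y) - X.shape[1]


ss_m3, _, _ = fit(deaths, pop, value)                         # SS(X1, X2)
ss_m4, sse_m4, df_m4 = fit(deaths, pop, value, doct, nurse, vn)
mse_m4 = sse_m4 / df_m4                                       # MSE of M4

q = 3                                   # number of added predictors
ss_medical = ss_m4 - ss_m3              # SS(X3, X4, X5 | X1, X2)
f_medical = (ss_medical / q) / mse_m4   # F on (3, n - 6) df

print(f"SS(X3, X4, X5 | X1, X2) = {ss_medical:.1f}")
print(f"F = {f_medical:.3f} on ({q}, {df_m4}) df")
```

With n = 21 the error degrees of freedom are 15, and the tabled critical value of F(3, 15) at α = 0.05 is about 3.29.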
(f)
Compare the coefficients associated with POP in models M1, M3 and M4. What happens to the
estimate of the slope as you move from M1 to M3 and on to M4? What happens to the estimate of the
standard error of the slope? Provide an explanation for the changes you observe.
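To line up the POP slope and its standard error across the three models without paging through the printout, a sketch along the following lines may help. It assumes ordinary least squares with an intercept and the usual standard error formula based on MSE times the inverse of X'X. The helper pop_slope and the dictionary names are hypothetical.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
pop, value, doct, nurse, vn, deaths = data.T


def pop_slope(y, *predictors):
    """Fit OLS with an intercept; return (slope, standard error)
    for the FIRST predictor passed in (hypothetical helper)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (len(y) - X.shape[1])
    cov = mse * np.linalg.inv(X.T @ X)   # estimated covariance of beta
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


# POP is listed first in every model so that index 1 is always its slope
models = {
    "M1": (pop,),
    "M3": (pop, value),
    "M4": (pop, value, doct, nurse, vn),
}
results = {name: pop_slope(deaths, *cols) for name, cols in models.items()}

for name, (b, se) in results.items():
    print(f"{name}: slope for POP = {b:10.4f}, SE = {se:10.4f}, t = {b / se:8.3f}")
```

Reading the three printed rows side by side shows how the estimate and its standard error move as correlated predictors are added.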
(g)
Model criticism: Conclude your analyses by examining the residuals from M4. Is there evidence of
outliers? Influential data points? Any patterns in the residuals?
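One possible way to put numbers on these questions is sketched below in Python. The choice of diagnostics (leverage from the hat matrix, internally studentized residuals, and Cook's distance) is an assumption of this sketch, not something the lab prescribes, and the 2p/n leverage cutoff is only a common rule of thumb.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
deaths = data[:, 5]

# Design matrix for M4: intercept plus the five predictors
X = np.column_stack([np.ones(len(deaths)), data[:, :5]])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, deaths, rcond=None)
resid = deaths - X @ beta
mse = float(resid @ resid) / (n - p)

# Leverage values are the diagonal of the hat matrix H = X (X'X)^-1 X'
leverage = np.diag(X @ np.linalg.pinv(X))

# Internally studentized residuals and Cook's distance
student = resid / np.sqrt(mse * (1.0 - leverage))
cooks_d = student**2 * leverage / (p * (1.0 - leverage))

print("tract  leverage  studentized  Cook's D")
for i in range(n):
    flag = "  <- high leverage" if leverage[i] > 2 * p / n else ""
    print(f"{i + 1:5d}  {leverage[i]:8.3f}  {student[i]:11.3f}  {cooks_d[i]:8.3f}{flag}")
```

Tracts are numbered here in the order they appear in the data listing. A plot of the studentized residuals against the fitted values X @ beta would correspond to the SCATTERPLOT request in the SPSS statements.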
4. Change all the variables in question 1 into per capita values. That is, write “compute” statements of the
form
compute pcvalue = value/pop.
This statement creates a new variable, residential economic value per capita. After writing similar per
capita statements for deaths, doctors, nurses, and vocational nurses, estimate the following two models:
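M2a:
$y_i = \beta_0 + \beta_2 X_2 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$
M4a:
$y_i = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon_i$
$\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$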
where y = deaths per capita, X2 = value per capita, X3 = doctors per capita, X4 = nurses per capita, and
X5 = vocational nurses per capita. In this case, use regression statements of the form:
REGRESSION / DESCRIPTIVES
/ VARS = PCDEATHS PCVALUE PCDOCT PCNURSE PCVN
/ DEP = PCDEATHS / ENTER = PCVALUE / ENTER = PCDOCT PCNURSE PCVN
/ SCATTERPLOT (*ZRESID, *ZPRED).
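For comparison with the SPSS results, the per capita transformation and the resulting correlation matrix can be sketched in Python. The array names per_capita and pc_corr are hypothetical. The PC names simply mirror the variable names used in the regression statements above.

```python
import numpy as np

# Lab data, one row per Census tract; columns: pop value doct nurse vn deaths
RAW = """
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
"""
data = np.array(RAW.split(), dtype=float).reshape(-1, 6)
pop = data[:, 0]

# Divide every non-population column by population, tract by tract
PC_NAMES = ["pcvalue", "pcdoct", "pcnurse", "pcvn", "pcdeaths"]
per_capita = data[:, 1:] / pop[:, None]

# Correlation matrix of the five per capita variables
pc_corr = np.corrcoef(per_capita, rowvar=False)

print(" " * 9 + "".join(f"{name:>10s}" for name in PC_NAMES))
for name, row in zip(PC_NAMES, pc_corr):
    print(f"{name:9s}" + "".join(f"{r:10.3f}" for r in row))
```

The last row and column of the printed matrix give the correlations of per capita deaths with each per capita predictor.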
(a) Compare the correlation matrix generated by this DESCRIPTIVES command with the one generated
by the first set of commands in question 3. What do you conclude about the effects of expressing
these variables in “per capita” terms?
(b) Compare the regression coefficients for these two sets of models (i.e., Models M2 and M2a, and
models M4 and M4a). What similarities do you observe? What differences? Which models generate
the largest R-square?
(c) Model criticism: Conclude your analyses by examining the residuals from M4a. Is there evidence of
outliers? Influential data points? Any patterns in the residuals?
(d) After examining models M2a and M4a, write an “executive summary” that summarizes your
observations about the effects of per capita economic value and per capita medical support on per
capita death rates in these 21 small regions.
TITLE "DEATH RATES".
Data list free / pop value doct nurse vn deaths.
Compute pcvalue = value/pop.
****take this line out and enter the remaining 4 “compute” statements here*****
Begin data
100 141.832 49 76 221 661
110 246.796 103 250 378 1149
130 238.065 76 140 207 1333
142 265.903 95 150 381 1321
202 397.63 162 324 554 2418
213 464.319 194 282 560 2039
246 409.948 130 211 465 2518
280 556.027 205 383 942 3088
304 711.61 222 461 723 1882
316 820.517 304 469 598 2437
328 709.859 267 525 911 2177
330 829.837 245 639 739 2593
337 465.148 221 343 541 2295
379 839.108 330 714 330 2119
434 792.016 420 865 894 4294
434 883.721 384 601 1158 2836
436 939.706 363 530 1219 4637
447 1141.803 511 180 513 3236
1087 2511.533 1193 1792 1922 7768
2305 6774.162 3450 5357 4125 14590
2637 8318.923 3131 4630 4785 19044
End data.
Frequencies var=pop value doct nurse vn deaths.
Graph /scatterplot pop with deaths.
Graph /scatterplot value with deaths.
Graph / scatterplot doct with deaths.
Graph / scatterplot nurse with deaths.
Graph / scatterplot vn with deaths.
Regression/descriptives/var=deaths pop value doct nurse vn
/dep = deaths / enter = pop.
Regression/descriptives/var=deaths pop value doct nurse vn
/dep = deaths / enter = value /enter = pop /enter = doct nurse vn
/scatterplot (*zresid, *pred).
****take this line out and write the frequency, graph, and
regression statements you need to complete question 4****