BSTT 401 Spring 2001 Sample Exam I
Answers
A sociologist investigating the recent increase in the incidence of homicide
throughout the United States studied the extent to which the homicide rate per 100,000
population (Y) is associated with the city's population size (X1, in thousands), the
percentage of families with yearly income less than $5,000 (X2), and the rate of
unemployment (X3). The data were collected from 20 cities.
The data step used for outputs 1-9 is given as follows:
data homicide;
infile 'c:\data\homicide.txt' expandtabs firstobs=2;
input id y x1 x2 x3;
run;
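Not part of the exam answer, but a minimal check that the file was read as intended (assuming the data set was created as above) is:
proc contents data=homicide; run;
proc print data=homicide (obs=5); run;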
1. Write the proc steps that can produce the SAS outputs. (10 pts)
proc reg data=homicide;
   model y = x1;                          * Model 1;
   model y = x2;                          * Model 2;
   model y = x3;                          * Model 3;
   model y = x1 x2 / pcorr1 pcorr2;       * Model 4;
   model y = x1 x3 / pcorr1 pcorr2;       * Model 5;
   model y = x2 x3 / pcorr1 pcorr2;       * Model 6;
   model y = x1 x2 x3 / pcorr1 pcorr2;    * Model 7;
   model y = x1 x2 x3 / selection=rsquare adjrsq cp b best=3;   * all-subsets summary;
proc glm data=homicide;
   model y = x1 x2 x3;                    * Type I and Type III SS used in questions 7-8;
run;
2. Write the formulas for Models 1-7. (10 pts)
Assume that E ~ NID(0, σ²).
Model 0: Y = β0 + E
Model 1: Y = β0 + β1X1 + E
Model 2: Y = β0 + β1X2 + E
Model 3: Y = β0 + β1X3 + E
Model 4: Y = β0 + β1X1 + β2X2 + E
Model 5: Y = β0 + β1X1 + β2X3 + E
Model 6: Y = β0 + β1X2 + β2X3 + E
Model 7: Y = β0 + β1X1 + β2X2 + β3X3 + E
3. From the output for Model 1, answer the following questions. (20 pts)
a) What is the fitted regression equation for the homicide data?
Ŷ = 21.1 + .000389 * X1
b) State the hypothesis that is tested by the F-statistic computed in the ANOVA
table.
H0: β1 = 0 vs HA: not H0.
c) What is the correlation between Y and X1?
r(Y,X1) = ±√.0045 = ±.067. However, since β̂1 = .000389 is positive, r(Y,X1) = +.067.
d) Test the null hypothesis that the correlation between Y and X1 equals zero in the
population.
Testing H0: ρ(Y,X1) = 0 vs HA: ρ(Y,X1) ≠ 0 is equivalent to
testing H0: β1 = 0 vs HA: β1 ≠ 0.
Since F = .081 and the p-value is .7787, we cannot reject H0.
e) We know that SST is 1855.202 and does not change even if the model changes.
Suppose SSR(X1) = 20. Calculate the corresponding R-square.
R-square = SSR(X1)/SST = 20/1855.202 = .0108.
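The quantities in (c)-(e) can be verified with a short data step; this is only a sketch, with the numeric inputs copied from the Model 1 output quoted above:
data check1;
   n = 20;  rsq = .0045;  b1 = .000389;  f = .081;
   r = sign(b1) * sqrt(rsq);      * the correlation takes the sign of the slope;
   p = 1 - probf(f, 1, n - 2);    * p-value for H0: beta1 = 0 (about .78);
   rsq_e = 20 / 1855.202;         * part (e): SSR(X1)/SST;
   put r= p= rsq_e=;
run;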
From the outputs for Models 1-7, the R-square selection method, and the GLM output for Model 7, answer
the following questions.
4. (10 pts) a) Predict the value of Y when X1 = 1,300, X2 = 21, X3 = 7.
Ŷ = -34.764925 + .000763 * 1300 + 1.192174 * 21 + 4.719821 * 7 = 22.17
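One way to obtain this prediction directly from PROC REG (a sketch, not part of the required answer; the id value 21 and the data set names are placeholders) is to append the new X values with a missing Y and request predicted values:
data newobs;
   input id y x1 x2 x3;
   datalines;
21 . 1300 21 7
;
data withnew;
   set homicide newobs;
run;
proc reg data=withnew;
   model y = x1 x2 x3;
   output out=pred p=yhat;   * yhat for id 21 is the requested prediction;
run;
quit;
proc print data=pred(where=(id = 21));
   var id yhat;
run;
Because the appended row has a missing Y, it is not used in fitting, so the coefficients are unchanged.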
b) Is the change in R-square from adding X2 to Model 1 significant?
Comparing Model 1 and Model 4, R-square changes from .0045 to .7103. The
difference appears to be large. However, we need to show that the change in
R-square, .7058, is statistically significant.
This is the same as testing, in Model 4, the following hypotheses:
H0: β2 = 0 vs HA: β2 ≠ 0.
T = 6.43 and F = T² = 41.41. The p-value is < .0001. Thus we reject H0.
Therefore, we conclude that the change in R-square from .0045 to .7103 is
significant.
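The same F can be reproduced from the R-square values alone, using the general formula applied in question 6; a minimal sketch, with the R-square values taken from the selection output:
data rsqchange;
   n = 20;
   r2_full = 0.71032611;   * Model 4: X1 and X2;
   r2_red  = 0.00450219;   * Model 1: X1 only;
   f = ((r2_full - r2_red) / 1) / ((1 - r2_full) / (n - 2 - 1));
   p = 1 - probf(f, 1, n - 2 - 1);
   put f= p=;              * f comes out near 41.4, p < .0001;
run;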
5. (10 pts) a) When X3 was added to Model 4, what is the partial correlation
between Y and X3 in Model 7?
r(Y,X3 | X1,X2) = ±√.3728 = ±.6105. Knowing that β̂3 in Model 7 is positive,
r(Y,X3 | X1,X2) = +.6105.
b) Test the null hypothesis that the partial correlation between Y and X3
in Model 7 equals zero in the population.
Testing H0: ρ(Y,X3 | X1,X2) = 0 vs HA: ρ(Y,X3 | X1,X2) ≠ 0 is equivalent to
testing H0: β3 = 0 vs HA: β3 ≠ 0 in Model 7.
Since T = 3.084, F = 9.51, and the p-value is .0071, we reject H0.
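The partial correlation itself (not just its square) can also be requested directly; a sketch using the PARTIAL statement of PROC CORR:
proc corr data=homicide;
   var y x3;
   partial x1 x2;   * correlation of Y and X3 adjusting for X1 and X2;
run;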
6. (10 pts) Complete the following table, indicating the significance of the entries by *
for alpha = .05 and ** for alpha = .01.

Regression coefficients

Variables    Intercept    X1        X2        X3
X1           21.13**      .00039
X2           -29.90**               2.559**
X3           -28.53**                         7.08**
X1,X2        -31.22**     .00042    2.596**
X1,X3        -31.60**     .00083              7.352**
X2,X3        -34.07**               1.224*    4.399*
X1,X2,X3     -36.76**     .00076    1.192*    4.720**

Partial F-tests (squared partial c.c.), overall F-test, and R-squares

Variables    X1                  X2                  X3                  Overall F-test    R-square
X1           .081 (.005)                                                 .081              .005
X2                               43.06** (.705)                          43.06**           .705
X3                                                   53.42** (.748)      53.42**           .748
X1,X2        .299 [1] (.0172)    41.4** [2] (.7090)                      20.84**           .710
X1,X3        1.402 [3] (.0762)                       55.7** [4] (.7661)  28.05**           .767
X2,X3                            4.64* [5] (.2144)   8.31* [6] (.3283)   34.43**           .802
X1,X2,X3     1.44 (.0824)        4.51* (.2197)       9.51** (.3728)      24.02**           .818

(The bracketed numbers 1-6 refer to the corresponding calculations under the General Method below.)
General Method: Using the R-square method output and the formula
F = { [R²(k) - R²(p)] / (k - p) } / { [1 - R²(k)] / (n - k - 1) }:
1. F = [(0.71032611 - 0.70522751)/1] / [(1 - 0.71032611)/(20 - 2 - 1)] = .299
2. F = [(0.71032611 - 0.00450219)/1] / [(1 - 0.71032611)/17] = 41.422
3. F = [(0.76715744 - 0.74795075)/1] / [(1 - 0.76715744)/17] = 1.402
4. F = [(0.76715744 - 0.00450219)/1] / [(1 - 0.76715744)/17] = 55.682
5. F = [(0.80199321 - 0.74795075)/1] / [(1 - 0.80199321)/17] = 4.64
6. F = [(0.80199321 - 0.70522751)/1] / [(1 - 0.80199321)/17] = 8.308
Simplest Method:
1. From Model 4, T for X1 = .547, F = .299;  2. T for X2 = 6.436, F = 41.422
3. From Model 5, T for X1 = 1.184, F = 1.402;  4. T for X3 = 7.462, F = 55.682
5. From Model 6, T for X2 = 2.154, F = 4.64;  6. T for X3 = 2.882, F = 8.308
Squared partial c.c. can be read directly from the output for Models 4 – 7.
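The six calculations above, together with the * / ** flags (taken here to mean p < .05 and p < .01), can be reproduced in one short data step; a sketch, with the R-square values copied from the selection output and ad hoc variable names:
data partialf;
   length flag $2;
   n = 20;  ddf = n - 2 - 1;   * each comparison adds one variable to a one-variable model;
   array r2full{6} _temporary_
      (0.71032611 0.71032611 0.76715744 0.76715744 0.80199321 0.80199321);
   array r2red{6} _temporary_
      (0.70522751 0.00450219 0.74795075 0.00450219 0.74795075 0.70522751);
   do i = 1 to 6;
      f = (r2full{i} - r2red{i}) / ((1 - r2full{i}) / ddf);
      flag = ' ';
      if f > finv(.95, 1, ddf) then flag = '*';
      if f > finv(.99, 1, ddf) then flag = '**';
      put i= f= flag=;
   end;
run;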
7. (10 pts) When Y was regressed on X1, X2, and X3, provide a table of
variables-added-in-order tests, indicating the significance of the entries by * for alpha = .05 and **
for alpha = .01 for (partial) F-tests.
From the GLM output Type I SS and the table in 6:

Source       d.f.    SS          MS          F
X1           1       8.352       8.352       .081
X2|X1        1       1309.446    1309.446    41.42**
X3|X1,X2     1       200.347     200.347     9.51**
Residual     16      337.057     21.066
Total        19      1855.202

R-square = .8183
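The Type I (sequential) sums of squares come from the PROC GLM step in question 1; to request both the Type I SS used here and the Type III SS used in question 8 explicitly, one could write (a sketch):
proc glm data=homicide;
   model y = x1 x2 x3 / ss1 ss3;   * ss1 = added-in-order SS, ss3 = added-last SS;
run;
quit;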
8. (10 pts) When Y was regressed on X1, X2, and X3, provide a table of
variables-added-last tests, indicating the significance of the entries by * for alpha = .05 and ** for
alpha = .01 for (partial) F-tests.
From the GLM output Type III SS and the table in 6:

Source       d.f.    SS          MS          F
X1|X2,X3     1       30.286      30.286      1.44
X2|X1,X3     1       94.913      94.913      4.51*
X3|X1,X2     1       200.347     200.347     9.51**
Residual     16      337.057     21.066
Total        19      1855.202

R-square = .8183
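Each variables-added-last F statistic is simply the Type III SS divided by the residual mean square; a minimal check with the values copied from the table above:
data ss3check;
   mse = 21.066;  ddf = 16;
   do ss = 30.286, 94.913, 200.347;   * X1|X2,X3, X2|X1,X3, X3|X1,X2;
      f = ss / mse;
      p = 1 - probf(f, 1, ddf);
      put ss= f= p=;
   end;
run;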
9. (10 pts) a) In conclusion, which model appears to be the best model?
Model 6: Y = β0 + β1X2 + β2X3 + E
b) Why?
Based on the variables-added-in-order tests and the variables-added-last tests, we find
that the partial F-test statistics are significant only for X2 and X3.
Based on the R-square selection method, Model 6 has a comparatively large adjusted
R-square and a C(p) close to p.
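Both criteria can be recomputed from quantities already reported; a sketch for Model 6, using its R-square from the selection output and the full-model MSE from the tables in 7-8 (the adjusted R-square comes out near .78 and C(p) a little above 3, close to p = 3):
data crit6;
   n = 20;  p = 3;                     * Model 6: intercept plus X2 and X3;
   sst = 1855.202;  mse_full = 21.066;
   r2 = 0.80199321;                    * R-square for Model 6;
   adjr2 = 1 - (1 - r2) * (n - 1) / (n - p);
   sse = (1 - r2) * sst;
   cp = sse / mse_full - (n - 2*p);    * Mallows C(p); a value near p suggests little bias;
   put adjr2= cp=;
run;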