BSTT 401 Spring 2001 Sample Exam I Answers

A sociologist investigating the recent increase in the incidence of homicide throughout the United States studied the extent to which the homicide rate per 100,000 population (Y) is associated with the city's population size (X1, in thousands), the percentage of families with yearly income less than $5,000 (X2), and the rate of unemployment (X3). The data were collected from 20 cities. The data step for outputs 1-9 is given as follows:

data homicide;
  infile 'c:\data\homicide.txt' expandtabs firstobs=2;
  input id y x1 x2 x3;
run;

1. Write the proc steps that can produce the SAS outputs. (10 pts)

proc reg;
  model y = x1;
  model y = x2;
  model y = x3;
  model y = x1 x2 / pcorr1 pcorr2;
  model y = x1 x3 / pcorr1 pcorr2;
  model y = x2 x3 / pcorr1 pcorr2;
  model y = x1 x2 x3 / pcorr1 pcorr2;
  model y = x1 x2 x3 / selection=rsquare adjrsq cp b best=3;
proc glm;
  model y = x1 x2 x3;
run;

2. Write the formula for the models 1-7. (10 pts)

Assume that ε ~ NID(0, σ²).

Model 0: Y = β0 + ε
Model 1: Y = β0 + β1 X1 + ε
Model 2: Y = β0 + β1 X2 + ε
Model 3: Y = β0 + β1 X3 + ε
Model 4: Y = β0 + β1 X1 + β2 X2 + ε
Model 5: Y = β0 + β1 X1 + β2 X3 + ε
Model 6: Y = β0 + β1 X2 + β2 X3 + ε
Model 7: Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε

3. From the output for Model 1, answer the following questions. (20 pts)

a) What is the fitted regression equation for the homicide data?
Ŷ = 21.1 + .000389 X1

b) State the hypothesis that is tested by the F-statistic computed in the ANOVA table.
H0: β1 = 0 vs HA: not H0.

c) What is the correlation between Y and X1?
r(Y,X1) = ±√.0045 = ±.067. Since the estimate of β1, .000389, is positive, r(Y,X1) = .067.

d) Test the null hypothesis that the correlation between Y and X1 equals zero in the population.
Testing H0: ρ(Y,X1) = 0 vs HA: ρ(Y,X1) ≠ 0 is equivalent to testing H0: β1 = 0 vs HA: β1 ≠ 0. Since F = .081 and the p-value is .7787, we cannot reject H0.

e) We know that SST is 1855.202 and does not change even if the model changes. Suppose SSR(X1) = 20. Calculate the corresponding R-square.
R-square = SSR(X1)/SST = 20/1855.202 = .0108.

From the outputs for Models 1-7, the R-square method, and the GLM for Model 7, answer the following questions.

4. (10 pts)

a) Predict the value of Y when X1 = 1,300, X2 = 21, X3 = 7.
Using the Model 7 estimates, Ŷ = -36.764925 + .000763(1300) + 1.192174(21) + 4.719821(7) ≈ 22.3.

b) Is the change in R-square from adding X2 to Model 1 significant?
Comparing Model 1 and Model 4, R-square changes from .0045 to .7103. The difference appears large; however, we need to show that the change in R-square, .7058, is statistically significant. This is the same as testing, in Model 4, the hypotheses H0: β2 = 0 vs HA: β2 ≠ 0. T = 6.43 and F = T² = 41.41, with p-value < .0001. Thus we reject H0 and conclude that the change in R-square from .0045 to .7103 is significant.

5. (10 pts)

a) When X3 was added to Model 4, what is the partial correlation between Y and X3 in Model 7?
r(Y,X3 | X1,X2) = ±√.3728 = ±.6105. Knowing that the estimate of β3 in Model 7 is positive, r(Y,X3 | X1,X2) = .6105.

b) Test the null hypothesis that the partial correlation between Y and X3 in Model 7 equals zero in the population.
Testing H0: ρ(Y,X3 | X1,X2) = 0 vs HA: ρ(Y,X3 | X1,X2) ≠ 0 is equivalent to testing H0: β3 = 0 vs HA: β3 ≠ 0 in Model 7. Since T = 3.084, F = 9.51, and the p-value is .0071, we reject H0.
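The partial F statistics used in 4b and 5b (and tabulated in question 6 below) can also be reproduced numerically from two model R-squares. A minimal SAS sketch, shown here for adding X2 to Model 1; the data set name partial_f and the variable names are illustrative (not part of the exam output), and the R-square values are those printed for Models 1 and 4:

* Partial F-test for adding X2 to Model 1, computed from model R-squares ;
data partial_f;
  r2_full = 0.71032611;   /* R-square of the full model, Model 4 (k = 2 predictors) */
  r2_red  = 0.00450219;   /* R-square of the reduced model, Model 1 (p = 1)         */
  n = 20; k = 2; p = 1;
  f      = ((r2_full - r2_red) / (k - p)) / ((1 - r2_full) / (n - k - 1));
  pvalue = 1 - probf(f, k - p, n - k - 1);   /* upper-tail p-value */
run;

proc print data=partial_f; run;

Printing partial_f should show F ≈ 41.42 on (1, 17) d.f. with p < .0001, agreeing with the test of H0: β2 = 0 in Model 4.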
6. (10 pts) Complete the following table, indicating the significance of the entries by * for alpha = .05 and ** for alpha = .01.

           |        Regression coefficients        |      Partial F-tests (squared partial c.c.)          | Overall  |
Variables  | Intercept   X1       X2       X3      | X1               X2                X3                 | F-test   | R-square
-----------+---------------------------------------+-------------------------------------------------------+----------+---------
X1         |  21.13**    .00039                    | .081 (.005)                                           |   .081   |  .005
X2         | -29.90**             2.559**          |                  43.06** (.705)                       | 43.06**  |  .705
X3         | -28.53**                      7.08**  |                                    53.42** (.748)     | 53.42**  |  .748
X1,X2      | -31.22**    .00042   2.596**          | .299[1] (.0172)  41.4**[2] (.7090)                    | 20.84**  |  .710
X1,X3      | -31.60**    .00083            7.352** | 1.402[3] (.0762)                   55.7**[4] (.7661)  | 28.05**  |  .767
X2,X3      | -34.07**             1.224*   4.399*  |                  4.64*[5] (.2144)  8.31*[6] (.3283)   | 34.43**  |  .802
X1,X2,X3   | -36.76**    .00076   1.192*   4.720** | 1.44 (.0824)     4.51* (.2197)     9.51** (.3728)     | 24.02**  |  .818

(The bracketed numbers [1]-[6] refer to the corresponding computations below.)

General Method: Using the R-square method output and the formula

  F = [(R²(k) - R²(p)) / (k - p)] / [(1 - R²(k)) / (n - k - 1)],

where R²(k) and R²(p) are the R-squares of the full model (k predictors) and the reduced model (p predictors). Here n - k - 1 = 20 - 2 - 1 = 17 in every case:

[1] F = [(0.71032611 - 0.70522751)/1] / [(1 - 0.71032611)/17] = .299
[2] F = [(0.71032611 - 0.00450219)/1] / [(1 - 0.71032611)/17] = 41.422
[3] F = [(0.76715744 - 0.74795075)/1] / [(1 - 0.76715744)/17] = 1.402
[4] F = [(0.76715744 - 0.00450219)/1] / [(1 - 0.76715744)/17] = 55.682
[5] F = [(0.80199321 - 0.74795075)/1] / [(1 - 0.80199321)/17] = 4.64
[6] F = [(0.80199321 - 0.70522751)/1] / [(1 - 0.80199321)/17] = 8.308

Simplest Method: F = T², using the T statistics printed for the individual coefficients.
[1] From Model 4, T for X1 = .547, so F = .299.   [2] From Model 4, T for X2 = 6.436, so F = 41.422.
[3] From Model 5, T for X1 = 1.184, so F = 1.402. [4] From Model 5, T for X3 = 7.462, so F = 55.682.
[5] From Model 6, T for X2 = 2.154, so F = 4.64.  [6] From Model 6, T for X3 = 2.882, so F = 8.308.
The squared partial c.c. can be read directly from the output for Models 4-7.

7. (10 pts) When Y was regressed on X1, X2, and X3, provide a table of variables-added-in-order tests indicating the significance of the entries by * for alpha = .05 and ** for alpha = .01 for (partial) F-tests.

From the GLM output Type I SS and the table in question 6:

Source      d.f.        SS          MS         F
X1            1        8.352       8.352      .081
X2|X1         1     1309.446    1309.446    41.42**
X3|X1,X2      1      200.347     200.347     9.51**
Residual     16      337.057      21.066
Total        19     1855.202

R-square = .8183

8. (10 pts) When Y was regressed on X1, X2, and X3, provide a table of variables-added-last tests indicating the significance of the entries by * for alpha = .05 and ** for alpha = .01 for (partial) F-tests.

From the GLM output Type III SS and the table in question 6:

Source      d.f.        SS          MS         F
X1|X2,X3      1       30.286      30.286     1.44
X2|X1,X3      1       94.913      94.913     4.51*
X3|X1,X2      1      200.347     200.347     9.51**
Residual     16      337.057      21.066
Total        19     1855.202

R-square = .8183

9. (10 pts)

a) In conclusion, which model appears to be the best model?
Model 6: Y = β0 + β1 X2 + β2 X3 + ε

b) Why?
Based on the variables-added-in-order tests and the variables-added-last tests, the partial F-test statistics are significant only for X2 and X3. Based on the R-square method, Model 6 has a relatively large adjusted R-square and a C(p) close to p.
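If the chosen model is then used for prediction (as in 4a, but with Model 6 rather than Model 7), the refit and the predicted value can be obtained in one pass. A minimal sketch, assuming the homicide data set created by the data step above; the data set names newcity, combined, and pred and the variable names yhat, lower, and upper are illustrative, not part of the exam output:

* Refit the chosen Model 6 (Y on X2 and X3) and score a new city ;
data newcity;
  x2 = 21; x3 = 7; y = .;    /* y left missing so this row is only predicted */
run;

data combined;
  set homicide newcity;
run;

proc reg data=combined;
  model y = x2 x3;
  output out=pred p=yhat lcl=lower ucl=upper;  /* prediction and 95% prediction limits */
run;
quit;

Because Y is missing for the appended row, PROC REG excludes it from the fit but still returns its predicted value and prediction limits in the pred data set.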