Statistics 511 Study Guide 7 Fall 2001

A) Tests and Confidence Intervals for Partial Regression Coefficients

A 1977 survey of the clerical employees of a large financial organization included questions related to employee satisfaction with their supervisors. There were six questions related to specific activities involving interaction between the supervisor and employee. A 7th question rated the overall satisfaction of the employee with the supervisor's performance. The data were collected in 30 departments of about the same size selected at random from the organization. The data recorded are the percentage of employees in the department giving their supervisor a favorable rating on each question. The variables are described below. The data are stored in PERFORM DATA I.

The variables are:

y     Overall rating of job being done by supervisor
x1    Handles employee complaints
x2    Does not allow special privileges
x3    Opportunity to learn new things
x4    Raises based on performances
x5    Too critical of poor performances
x6    Rate of advancing to better jobs

1) Below is the output from SAS PROC REG for the regression of rating on x1 through x6. Assume that no problems were apparent in the residual plots.

DEP VARIABLE: Y

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       6        3147.96634      524.66106     10.502    0.0001
ERROR      23        1149.00032    49.95653586
C TOTAL    29        4296.96667

ROOT MSE     7.067994    R-SQUARE    0.7326
DEP MEAN    64.63333     ADJ R-SQ    0.6628
C.V.        10.93552

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0
INTERCEP     1           10.78707639       11.58925724                    0.931
X3           1            0.32033212        0.16852032                    1.901
X4           1            0.08173213        0.22147768                    0.369
X6           1           -0.21705668        0.17820947                   -1.218
X1           1            0.61318761        0.16098311                    3.809
X2           1           -0.07305014        0.13572469                   -0.538
X5           1            0.03838145        0.14699544                    0.261

a) Test whether the slope of rating on x3 after controlling for the other variables is 0.5.
b) Test whether x5 (whether the supervisor is too critical) has a significant effect on rating after accounting for the effects of the other variables.

c) Form a 95% confidence interval for $\beta_2$.

B) Sequential Analysis and Simultaneous Tests

2) Below are the sequential and partial sums of squares for the regression problem described in section A. The variables have been entered in the order x3, x4, x6, x1, x2, x5.

VARIABLE    DF    PROB > |T|       TYPE I SS     TYPE II SS
INTERCEP     1        0.3616       125324.03    43.28013610
X3           1        0.0699      1671.41026      180.50479
X4           1        0.7155       265.10507     6.80327563
X6           1        0.2356       437.02717    74.11004403
X1           1        0.0009       756.85907      724.80036
X2           1        0.5956     14.15891594    14.47160631
X5           1        0.7963      3.40586376     3.40586376

a) Test whether x2 contributes significantly to the model after accounting for the effects of all the other variables.

b) What is the contribution of x6 to $R^2$ of rating when x3 and x4 are in the model?

c) Use sequential tests in the order given to determine how many variables can be dropped from the model.

d) x1, x2, and x5 are related to the personal interaction between the supervisor and the employee, while x3, x4, and x6 are related to how the employees feel about their jobs. One of the supervisors claims that once the effects of the aspects of the job measured by x3, x4, and x6 are accounted for, the personal interactions measured by x1, x2, and x5 have no effect on how the employees rate their supervisors overall. Do a simultaneous test to see if this claim is supported by the data.

C) Polynomial Regression

3) In experiments on the absorption of sulfonamides, a dose of 1 mg/4g was delivered by stomach tube to each of three mice. Blood samples were drawn from the tail veins of each mouse at time intervals ranging from 20 to 420 minutes. The samples were mixed, and the concentration of sulfanilamide in the samples was determined for each time. Generally the data are transformed to logarithms before analysis.
a) Below are plots of concentration versus time, and log(concentration) versus log(time). Why might the investigators prefer to use logarithms?

[Plot of CONCENT*TIME, symbol used is *. Vertical axis: CONCENT, 0.5 to 1.6. Horizontal axis: TIME, 0 to 400.]

[Plot of LCONCEN*LTIME, symbol used is *. Vertical axis: LCONCEN, -0.7 to 0.4. Horizontal axis: LTIME, 3.0 to 6.0.]

b) What degree polynomial appears to fit these data?

c) What degree polynomial should you try to fit to these data?

d) Below is the SAS output from fitting a cubic polynomial to these data. Should partial or sequential sums of squares be used for determining the appropriate degree of the polynomial?

DEP VARIABLE: LCONCEN

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       3        1.32358773     0.44119591    140.963    0.0002
ERROR       4        0.01251950    0.003129875
C TOTAL     7        1.33610723

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1              -1.58316          2.334245                   -0.678        0.5348
LTIME        1               0.78763          1.634226                    0.482        0.6550
LTIME2       1               0.00512          0.371068                    0.014        0.9896
LTIME3       1              -0.01859          0.027391                   -0.679        0.5345

VARIABLE    DF      TYPE I SS     TYPE II SS
INTERCEP     1     0.04189055    0.001439746
LTIME        1     0.95575414    0.000727036
LTIME2       1     0.36639154    5.96790E-07
LTIME3       1    0.001442047    0.001442047

d) What is the appropriate degree of a polynomial for these data?

e) Below is the SAS output from fitting a quadratic polynomial to these data.
What is the estimated mean concentration of sulfanilamide in a mouse one hour after a dose of 1 mg/4g was delivered by stomach tube?

DEP VARIABLE: LCONCEN

ANALYSIS OF VARIANCE

SOURCE     DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL       2        1.32214568     0.66107284    236.748    0.0001
ERROR       5        0.01396155    0.002792309
C TOTAL     7        1.33610723

ROOT MSE    0.05284231    R-SQUARE    0.9896

PARAMETER ESTIMATES

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
INTERCEP     1           -3.13658682        0.43404237                   -7.226        0.0008
LTIME        1            1.88787605        0.19660363                    9.602        0.0002
LTIME2       1           -0.24627413        0.02149948                  -11.455        0.0001

VARIABLE    DF     TYPE I SS    TYPE II SS
INTERCEP     1    0.04189055    0.14581894
LTIME        1    0.95575414    0.25747049
LTIME2       1    0.36639154    0.36639154

Study Guide 7 Solutions

A. Tests and Confidence Intervals for Partial Regression Coefficients

1.
a) $H_0: \beta_3 = 0.5$ vs. $H_A: \beta_3 \neq 0.5$

Test statistic: $t_{calc} = \frac{b_k - \beta_k}{s(b_k)} = \frac{b_k - 0.5}{s(b_k)}$

Rejection region: Reject $H_0$ if $|t_{calc}| > t[0.975, 23]$ ($\alpha = 0.05$), i.e., reject if $|t_{calc}| > 2.069$.

Calculations: $b_3 = 0.32$, $s(b_3) = 0.169$

$t_{calc} = \frac{0.32 - 0.5}{0.169} = -1.07$

Results: $|t_{calc}| = 1.07 < 2.069$, so fail to reject $H_0$.

b) Test $H_0: \beta_5 = 0$ vs. $H_A: \beta_5 \neq 0$

Test statistic: $t_{calc} = \frac{b_5 - \beta_5}{s(b_5)}$

Rejection region: Reject $H_0$ if $|t_{calc}| > t[1-\alpha/2; df]$. Let $\alpha = 0.05$. $t[0.975, 23] = 2.069$

Calculations: $b_5 = 0.03838$, $s(b_5) = 0.14700$

$t_{calc} = \frac{0.03838}{0.14700} = 0.261$

Results: $|t_{calc}| = 0.261 < 2.069$, so fail to reject $H_0$. Thus there is no significant effect of x5 after accounting for the effects of the other variables.

c) 95% CI for $\beta_2$

Statistic: $b_k \pm t(1-\alpha/2; n-p)\, s(b_k)$

Calculations: $b_2 = -0.073$, $s(b_2) = 0.136$, $t(1-\alpha/2; n-p) = t(0.975, 23) = 2.069$

so $b_2 \pm t(0.975, 23)\, s(b_2) = -0.073 \pm 2.069(0.136) = -0.073 \pm 0.281 = (-0.354, 0.208)$

Results: With 95% confidence, $\beta_2$ is between -0.354 and 0.208.

B. Sequential Analysis and Simultaneous Tests

2.
a) $H_0$: x2 does not contribute significantly to the model after accounting for the effects of the other variables.
$H_A$: x2 does contribute significantly.

Test statistic: $F^* = \frac{SSR(x_2 \mid \text{all the other variables})}{MSE}$

Rejection region: Reject $H_0$ if $F^* > F(0.95; 1, 23) = 4.28$

Calculations: $SSR(x_2 \mid x_0, x_3, x_4, x_6, x_1, x_5) = 14.472$ (type II SS), $MSE = 49.957$

$F^* = \frac{14.472}{49.957} = 0.29$

Results: Since $F^* = 0.29 < 4.28$, fail to reject $H_0$. The variable x2 (does not allow special privileges) does not contribute significantly to the model after accounting for the effects of the other variables.

Note: A t-test for $\beta_2 = 0$ could also be used.

b) The contribution of x6 to $R^2$ when x3 and x4 are already in the model is

$\frac{SSR(x_6 \mid x_3, x_4)}{SST} = \frac{437.02717}{4296.96667} = 0.1017 = 10.17\%$.

c) Sequential tests (Since there are 23 degrees of freedom for error, pooling is not necessary.)

i) $H_0$: x5 has no significant effect on the rating after accounting for the other variables ($\beta_5 = 0$).
$H_A$: x5 has a significant effect ($\beta_5 \neq 0$).

Test statistic: $F^* = \frac{MSR(x_5 \mid x_0, x_3, x_4, x_6, x_1, x_2)}{MSE(\text{full})}$

Rejection region: Reject $H_0$ if $F^* > F(0.95; 1, 23) = 4.28$.

Calculations: $MSR(x_5 \mid \text{others}) = 3.406$, $MSE = 49.957$

$F^* = \frac{3.406}{49.957} = 0.068$

Results: Fail to reject $H_0$. x5 has no significant effect after accounting for the other variables in the model.

ii) $H_0: \beta_2 = 0$ (given that $\beta_5 = 0$)
$H_A: \beta_2 \neq 0$ (given that $\beta_5 = 0$)

Test statistic: $F^* = \frac{SSR(x_2 \mid x_3, x_4, x_6, x_1)/1}{MSE(\text{full})}$

Rejection region: Reject if $F^* > F(0.95; 1, 23) = 4.28$

Calculations: $SSR(x_2 \mid x_3, x_4, x_6, x_1) = 14.15891594$ (type I SS for x2 from SAS output)

$F^* = \frac{14.15891594}{49.957} = 0.283$

Results: Since $F^* = 0.283 < F = 4.28$, fail to reject $H_0$. At $\alpha = 0.05$, there is no evidence that $\beta_2 \neq 0$ when x3, x4, x6, and x1 are in the model.
iii) $H_0: \beta_1 = 0$ (given that $\beta_5 = \beta_2 = 0$)
$H_A: \beta_1 \neq 0$ (given that $\beta_5 = \beta_2 = 0$)

Test statistic: $F^* = \frac{SSR(x_1 \mid x_3, x_4, x_6)/1}{MSE(\text{full})}$

Rejection region: Reject if $F^* > F(0.95; 1, 23) = 4.28$

Calculations: $SSR(x_1 \mid x_3, x_4, x_6) = 756.85907$ (type I SS for x1 from SAS output)

$F^* = \frac{756.85907}{49.957} = 15.150$

Results: Since $F^* = 15.150 > F = 4.28$, reject $H_0$. At $\alpha = 0.05$, $\beta_1 \neq 0$ when x3, x4, and x6 are in the model. The sequential testing stops at the first rejection, so two variables (x5 and x2) can be dropped from the model.

d) Simultaneous test to see if x1, x2, and x5 have no effect on rating given x3, x4, x6

$H_0$: x1, x2, and x5 have no effect on rating given x3, x4, x6.
$H_A$: not $H_0$.

Test statistic: $F^* = \frac{SSR(x_1, x_2, x_5 \mid x_3, x_4, x_6)/3}{MSE}$

Rejection region: Reject $H_0$ if $F^* > F(0.95; 3, 23) = 3.03$.

Calculations: $SSR(x_1, x_2, x_5 \mid x_3, x_4, x_6) = 756.859 + 14.159 + 3.406 = 774.424$

$F^* = \frac{774.424/3}{49.957} = 5.167$

Results: Since $F^* = 5.167 > 3.03$, reject $H_0$. At $\alpha = 0.05$, the claim that once the effects of the aspects of the job measured by x3, x4, and x6 are accounted for, the personal interactions measured by x1, x2, and x5 have no effect on how the employees rate their supervisors, is not substantiated, i.e., the data do not support the claim.

C. Polynomial Regression

3.
a) The plot of log(concentration) vs. log(time) indicates that a smooth curve would fit the data very well. A linear model fitted to concentration vs. time would definitely have outliers, and it does not appear that there is a smooth curve that would fit the untransformed data. Thus, we hope to fit a polynomial regression model to log(concentration) vs. log(time).

b) A parabola seems an appropriate curve through the points on the log(concentration) vs. log(time) plot. This would be a second degree (quadratic) polynomial, which we should try to fit. One needs to be wary of cubic and higher polynomials.

c) We should use type I SS (the sequential sums of squares) to determine the appropriate degree of the polynomial.
d) The small value of type I SS for LTIME3 (the cubic term) relative to that for LTIME and LTIME2 (linear and squared terms) seems to indicate that the cubic term is insignificant and we should use a quadratic polynomial. To test for significance we must compare the type I SS for LTIME, LTIME2, and LTIME3 with the MSE of the cubic fit using an F-test:

LTIME3: $F^* = \frac{0.001442047}{0.003129875} = 0.46 < F_{0.95}(1, 4) = 7.71$
LTIME2: $F^* = \frac{0.36639154}{0.003129875} = 117.1 > F_{0.95}(1, 4) = 7.71$
LTIME: $F^* = \frac{0.95575414}{0.003129875} = 305.4 > F_{0.95}(1, 4) = 7.71$

The type I SS for LTIME3 is not significantly large compared to the MSE, and thus the cubic term should be removed. The type I SS for LTIME2 is significantly large compared to the MSE, and consequently this term should be included. The appropriate polynomial is therefore quadratic.

e) At TIME = 60 minutes,

LTIME = log(60) = 4.0943446
LTIME2 = $(\log(60))^2$ = 16.763657
LCONCEN = -3.13658682 + (1.88787605)(4.0943446) - (0.24627413)(16.763657) = 0.4645733

estimated CONCENTRATION = $e^{0.465}$ = 1.592
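The arithmetic in parts d) and e) can be checked with a short script. The following is only a sketch: the numbers are copied from the SAS output in this guide, the critical value 7.71 is the one quoted in part d), and the function names (sequential_f, predict_concentration) are hypothetical helpers introduced here for illustration, not part of the original analysis. It assumes that log means the natural logarithm, as the value log(60) = 4.0943446 implies.

```python
import math

# Values copied from the SAS output in this guide.
MSE_CUBIC = 0.003129875   # error mean square of the cubic fit (4 df)
TYPE_I_SS = {"LTIME": 0.95575414, "LTIME2": 0.36639154, "LTIME3": 0.001442047}
F_CRIT_1_4 = 7.71         # F(0.95; 1, 4), as quoted in part d)

# Coefficients of the quadratic fit from the output in part e).
B0, B1, B2 = -3.13658682, 1.88787605, -0.24627413


def sequential_f(type_i_ss, mse):
    """Return F* for each 1-df term: type I SS divided by the MSE."""
    return {term: ss / mse for term, ss in type_i_ss.items()}


def predict_concentration(minutes):
    """Predict log(concentration) from the quadratic fit, then back-transform."""
    ltime = math.log(minutes)  # natural log (assumption stated above)
    lconcen = B0 + B1 * ltime + B2 * ltime ** 2
    return lconcen, math.exp(lconcen)


f_stats = sequential_f(TYPE_I_SS, MSE_CUBIC)
for term, f_value in f_stats.items():
    verdict = "significant" if f_value > F_CRIT_1_4 else "not significant"
    print(f"{term}: F* = {f_value:.2f} ({verdict})")

lconcen_60, concen_60 = predict_concentration(60)
print(f"LCONCEN at 60 min = {lconcen_60:.4f}, concentration = {concen_60:.3f}")
```

The printed F* values and the predicted concentration should agree with the hand calculations above up to rounding. Exponentiating the unrounded value 0.4645733 rather than 0.465 changes only the third decimal place of the estimated concentration.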