Statistics 511
Study Guide 7
Fall 2001
A) Tests and Confidence Intervals for Partial Regression Coefficients
A 1977 survey of the clerical employees of a large financial organization included
questions related to employee satisfaction with their supervisors. There were six questions
related to specific activities involving interaction between the supervisor and employee. A
7th question rated the overall satisfaction of the employee with the supervisor's
performance.
The data were collected in 30 departments of about the same size selected at random from
the organization. The data recorded are the percentage of employees in the department
giving their supervisor a favorable rating on each question.
The data are stored in PERFORM DATA I. The variables are:

y    Overall rating of job being done by supervisor
x1   Handles employee complaints
x2   Does not allow special privileges
x3   Opportunity to learn new things
x4   Raises based on performances
x5   Too critical of poor performances
x6   Rate of advancing to better jobs
1) Below is the output from SAS PROC REG for the regression of rating on x1
through x6. Assume that no problems were apparent in the residual plots.
DEP VARIABLE: Y

                     ANALYSIS OF VARIANCE

                      SUM OF           MEAN
SOURCE     DF        SQUARES         SQUARE    F VALUE    PROB>F
MODEL       6     3147.96634      524.66106     10.502    0.0001
ERROR      23     1149.00032    49.95653586
C TOTAL    29     4296.96667

ROOT MSE    7.067994    R-SQUARE    0.7326
DEP MEAN    64.63333    ADJ R-SQ    0.6628
C.V.        10.93552

                     PARAMETER ESTIMATES

                  PARAMETER       STANDARD      T FOR H0:
VARIABLE   DF      ESTIMATE          ERROR    PARAMETER=0
INTERCEP    1   10.78707639    11.58925724          0.931
X3          1    0.32033212     0.16852032          1.901
X4          1    0.08173213     0.22147768          0.369
X6          1   -0.21705668     0.17820947         -1.218
X1          1    0.61318761     0.16098311          3.809
X2          1   -0.07305014     0.13572469         -0.538
X5          1    0.03838145     0.14699544          0.261
a) Test whether the slope of rating on x3 after controlling for the other variables is
0.5.
b) Test whether x5 (whether the supervisor is too critical) has a significant effect on
rating after accounting for the effects of the other variables.
c) Form a 95% confidence interval for $\beta_2$.
B) Sequential Analysis and Simultaneous Tests
2) Below are the sequential and partial sums of squares for the regression problem
described in section A. The variables have been entered in the order x3, x4, x6, x1,
x2, x5.
VARIABLE   DF   PROB > |T|      TYPE I SS     TYPE II SS
INTERCEP    1       0.3616      125324.03    43.28013610
X3          1       0.0699     1671.41026      180.50479
X4          1       0.7155      265.10507     6.80327563
X6          1       0.2356      437.02717    74.11004403
X1          1       0.0009      756.85907      724.80036
X2          1       0.5956    14.15891594    14.47160631
X5          1       0.7963     3.40586376     3.40586376
a) Test whether x2 contributes significantly to the model after accounting for the
effects of all the other variables.
b) What is the contribution of x6 to $R^2$ of rating when x3 and x4 are in the model?
c) Use sequential tests in the order given to determine how many variables can be
dropped from the model.
d) x1, x2, and x5 are related to the personal interaction between the supervisor and
the employee, while x3, x4, and x6 are related to how the employees feel about
their jobs. One of the supervisors claims that once the effects of the aspects of
the job measured by x3, x4, and x6 are accounted for, the personal interactions
measured by x1, x2, and x5 have no effect on how the employees rate their
supervisors overall. Do a simultaneous test to see if this claim is supported by the
data.
C) Polynomial Regression
3) In experiments on the absorption of sulfonamides, a dose of 1 mg/4g was
delivered by stomach tube to each of three mice. Blood samples were drawn
from the tail veins of each mouse at time intervals ranging from 20 to 420
minutes. The samples were mixed, and the concentration of sulfanilamide in the
samples was determined for each time. Generally the data are transformed to
logarithms before analysis.
a) Below are plots of concentration versus time, and log(concentration) versus
log(time). Why might the investigators prefer to use logarithms?
[Plot: PLOT OF CONCENT*TIME, symbol used is *. Vertical axis: CONCENT (ticks from 0.5 to 1.6). Horizontal axis: TIME (ticks from 0 to 400).]
[Plot: PLOT OF LCONCEN*LTIME, symbol used is *. Vertical axis: LCONCEN (ticks from -0.7 to 0.4). Horizontal axis: LTIME (ticks from 3.0 to 6.0).]
b) What degree polynomial appears to fit these data?
c) What degree polynomial should you try to fit to these data?
d) Below is the SAS output from fitting a cubic polynomial to these data. Should
partial or sequential sums of squares be used for determining the appropriate
degree of the polynomial?
DEP VARIABLE: LCONCEN

                     ANALYSIS OF VARIANCE

                      SUM OF           MEAN
SOURCE     DF        SQUARES         SQUARE    F VALUE    PROB>F
MODEL       3     1.32358773     0.44119591    140.963    0.0002
ERROR       4     0.01251950    0.003129875
C TOTAL     7     1.33610723

                     PARAMETER ESTIMATES

                PARAMETER    STANDARD      T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR    PARAMETER=0    PROB > |T|
INTERCEP    1    -1.58316    2.334245         -0.678        0.5348
LTIME       1     0.78763    1.634226          0.482        0.6550
LTIME2      1     0.00512    0.371068          0.014        0.9896
LTIME3      1    -0.01859    0.027391         -0.679        0.5345

VARIABLE   DF      TYPE I SS     TYPE II SS
INTERCEP    1     0.04189055    0.001439746
LTIME       1     0.95575414    0.000727036
LTIME2      1     0.36639154    5.96790E-07
LTIME3      1    0.001442047    0.001442047
d) What is the appropriate degree of a polynomial for these data?
e) Below is the SAS output from fitting a quadratic polynomial to these data. What
is the estimated mean concentration of sulfanilamide in a mouse one hour after a
dose of 1 mg/4g was delivered by stomach tube?
DEP VARIABLE: LCONCEN

                     ANALYSIS OF VARIANCE

                      SUM OF           MEAN
SOURCE     DF        SQUARES         SQUARE    F VALUE    PROB>F
MODEL       2     1.32214568     0.66107284    236.748    0.0001
ERROR       5     0.01396155    0.002792309
C TOTAL     7     1.33610723

ROOT MSE    0.05284231    R-SQUARE    0.9896

                     PARAMETER ESTIMATES

                  PARAMETER      STANDARD      T FOR H0:
VARIABLE   DF      ESTIMATE         ERROR    PARAMETER=0    PROB > |T|
INTERCEP    1   -3.13658682    0.43404237         -7.226        0.0008
LTIME       1    1.88787605    0.19660363          9.602        0.0002
LTIME2      1   -0.24627413    0.02149948        -11.455        0.0001

VARIABLE   DF     TYPE I SS    TYPE II SS
INTERCEP    1    0.04189055    0.14581894
LTIME       1    0.95575414    0.25747049
LTIME2      1    0.36639154    0.36639154
Study Guide 7 Solutions
A. Tests and Confidence Intervals for Partial Regression Coefficients
1. a) Test $H_0: \beta_3 = 0.5$ vs. $H_A: \beta_3 \neq 0.5$
Test Statistic:
$t_{calc} = \frac{b_k - \beta_k}{s(b_k)} = \frac{b_3 - 0.5}{s(b_3)}$
Rejection Region:
Reject H0 if $|t_{calc}| > t[0.975, 23]$ ($\alpha = 0.05$)
Reject if $|t_{calc}| > 2.069$
Calculations:
$b_3 = 0.32$, $s(b_3) = 0.169$
$t_{calc} = \frac{0.32 - 0.5}{0.169} = -1.07$
Results:
$|t_{calc}| = 1.07 < 2.069$, so fail to reject H0
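The arithmetic in part (a) can be checked with a few lines of Python. This is only a sketch: the helper name `t_statistic` is made up for illustration, the estimate and standard error are copied from the X3 row of the SAS output, and the critical value 2.069 is the tabled t[0.975, 23] quoted in the solution rather than something computed here.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t statistic for H0: beta = hypothesized value."""
    return (estimate - hypothesized) / std_error

# Tabled critical value t[0.975, 23] quoted in the solution (not computed here).
T_CRIT = 2.069

# X3 row of the PARAMETER ESTIMATES table.
b3, se_b3 = 0.32033212, 0.16852032

t_calc = t_statistic(b3, 0.5, se_b3)
reject_h0 = abs(t_calc) > T_CRIT

print(f"t_calc = {t_calc:.3f}, reject H0: {reject_h0}")
```

With the unrounded SAS values the statistic is about -1.07, the same as the hand calculation, so the decision does not change.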
b) Test $H_0: \beta_5 = 0$ vs. $H_A: \beta_5 \neq 0$
Test Statistic:
$t_{calc} = \frac{b_5 - \beta_5}{s(b_5)}$
Rejection Region:
Reject H0 if $|t_{calc}| > t[1-\alpha/2; df]$
Let $\alpha = 0.05$.
$t[0.975, 23] = 2.069$
Calculations:
$b_5 = 0.03838$, $s(b_5) = 0.14700$
$t_{calc} = \frac{0.03838}{0.14700} = 0.261$
Results: $|t_{calc}| = 0.261 < 2.069$, so fail to reject H0. Thus there is no significant effect of
x5 after accounting for the effects of the other variables.
c) 95% CI for $\beta_2$
Statistic:
$b_k \pm t(1-\alpha/2; n-p)\, s(b_k)$
Calculations:
$b_2 = -0.073$, $s(b_2) = 0.136$
$t(1-\alpha/2; n-p) = t(0.975, 23) = 2.069$
so, $b_2 \pm t(0.975, 23)\, s(b_2) = -0.073 \pm 2.069(0.136)$
$= -0.073 \pm 0.281 = (-0.354, 0.208)$
Results: With 95% confidence, $\beta_2$ is between -0.354 and 0.208.
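The interval in part (c) is the estimate plus or minus the critical value times the standard error. A minimal Python sketch follows. The function name `confidence_interval` is an illustrative choice, the inputs are the unrounded X2 row of the SAS output, and 2.069 is again the tabled t(0.975, 23).

```python
def confidence_interval(estimate, std_error, t_crit):
    """Two-sided interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# X2 row of the PARAMETER ESTIMATES table; t_crit is the tabled t(0.975, 23).
lower, upper = confidence_interval(-0.07305014, 0.13572469, 2.069)

print(f"95% CI for beta_2: ({lower:.3f}, {upper:.3f})")
```

Because the interval contains 0, it agrees with the t-test for X2 in the SAS output (PROB > |T| = 0.5956).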
B. Sequential Analysis and Simultaneous Tests
2. a) H0: x2 does not contribute significantly to the model after accounting for the effects of the other
variables.
HA: x2 does contribute significantly.
Test Statistic:
$F^* = \frac{SSR(x_2 \mid \text{all the other variables})}{MSE}$
Rejection Region:
Reject H0 if $F^* > F(0.95; 1, 23) = 4.28$
Calculations:
SSR(x2 | x0, x3, x4, x6, x1, x5) = 14.472 (type II SS)
MSE = 49.957
$F^* = \frac{14.472}{49.957} = 0.29$
Results: Since $F^* = 0.29 < 4.28$, fail to reject H0. The variable x2 (does not allow special
privileges) does not contribute significantly to the model after accounting for the effects of the
other variables.
Note: A t-test for $\beta_2 = 0$ could also be used.
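The note about the t-test can be made concrete: for a single coefficient, the partial F statistic built from the type II SS equals the square of the t statistic. The sketch below checks this numerically with the X2 values from the SAS output. The variable names are illustrative only.

```python
# Values copied from the SAS output for X2 and the full model.
type_ii_ss_x2 = 14.47160631   # partial (type II) sum of squares for X2
mse_full = 49.95653586        # error mean square, 23 df
b2, se_b2 = -0.07305014, 0.13572469

f_star = type_ii_ss_x2 / mse_full   # partial F with 1 numerator df
t_calc = b2 / se_b2                 # t statistic for H0: beta_2 = 0

print(f"F* = {f_star:.4f}, t^2 = {t_calc ** 2:.4f}")
```

Both quantities come out near 0.29, so the F-test against F(0.95; 1, 23) = 4.28 and the two-sided t-test against 2.069 lead to the same decision.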
b) The contribution of x6 to $R^2$ when x3 and x4 are already in the model is
$\frac{SSR(x_6 \mid x_3, x_4)}{SST} = \frac{437.02717}{4296.96667} = 0.1017 = 10.17\%$.
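Each type I SS divided by the total sum of squares gives the increase in R-square when that variable enters in the stated order. The Python sketch below tabulates those increments. The list and variable names are illustrative, and the numbers are the type I SS and C TOTAL values from the SAS output.

```python
# Type I (sequential) sums of squares in entry order, from the SAS output.
TYPE_I_SS = [
    ("x3", 1671.41026),
    ("x4", 265.10507),
    ("x6", 437.02717),
    ("x1", 756.85907),
    ("x2", 14.15891594),
    ("x5", 3.40586376),
]
SST = 4296.96667  # corrected total sum of squares (C TOTAL)

# Increment in R-square contributed by each variable as it enters.
r2_increments = {name: ss / SST for name, ss in TYPE_I_SS}
cumulative_r2 = sum(r2_increments.values())

for name, inc in r2_increments.items():
    print(f"{name}: +{inc:.4f}")
print(f"cumulative R-square = {cumulative_r2:.4f}")
```

The increments add up to the R-SQUARE of 0.7326 reported for the full model, which is a quick consistency check on the table.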
c) Sequential tests
(Since there are 23 degrees of freedom for error, pooling is not necessary.)
i) H0: x5 has no significant effect on the rating after accounting for the other variables ($\beta_5 = 0$).
HA: x5 has a significant effect ($\beta_5 \neq 0$).
Test statistic:
$F^* = \frac{MSR(x_5 \mid x_0, x_3, x_4, x_6, x_1, x_2)}{MSE(\text{full})}$
Rejection region:
Reject H0 if $F^* > F(0.95; 1, 23) = 4.28$.
Calculations:
MSR(x5 | others) = 3.406
MSE = 49.957
$F^* = \frac{3.406}{49.957} = 0.068$
Results: Fail to reject H0. x5 has no significant effect after accounting for the other variables in
the model.
ii) H0: $\beta_2 = 0$ (given that $\beta_5 = 0$)
HA: $\beta_2 \neq 0$ (given that $\beta_5 = 0$)
Test statistic:
$F^* = \frac{SSR(x_2 \mid x_3, x_4, x_6, x_1)/1}{MSE(\text{full})}$
Rejection region:
Reject if $F^* > F(0.95; 1, 23) = 4.28$
Calculations:
SSR(x2 | x3, x4, x6, x1) = 14.15891594 (type I SS for x2 from SAS output)
$F^* = \frac{14.15891594}{49.957} = 0.283$
Results: Since $F^* = 0.283 < F = 4.28$, fail to reject H0. At $\alpha = 0.05$, $\beta_2$ is not significantly
different from 0 when x3, x4, x6, and x1 are in the model.
iii) H0: $\beta_1 = 0$ (given that $\beta_5 = \beta_2 = 0$)
HA: $\beta_1 \neq 0$ (given that $\beta_5 = \beta_2 = 0$)
Test statistic:
$F^* = \frac{SSR(x_1 \mid x_3, x_4, x_6)/1}{MSE(\text{full})}$
Rejection region:
Reject if $F^* > F(0.95; 1, 23) = 4.28$
Calculations:
SSR(x1 | x3, x4, x6) = 756.85907 (type I SS for x1 from SAS output)
$F^* = \frac{756.85907}{49.957} = 15.150$
Results: Since $F^* = 15.150 > F = 4.28$, reject H0. At $\alpha = 0.05$, $\beta_1 \neq 0$ when x3, x4, and x6 are in
the model. Sequential testing stops here: two variables, x5 and x2, can be dropped, but x1 cannot.
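The backward pass through the type I SS in steps i) to iii) can be sketched as a short loop: start with the last variable entered, compare its type I SS to the full-model MSE, and stop at the first significant term. The function name `droppable_variables` is hypothetical, and the critical value 4.28 is the tabled F(0.95; 1, 23) used in the solution.

```python
# Type I SS in entry order (x3, x4, x6, x1, x2, x5), from the SAS output.
TYPE_I_SS = [
    ("x3", 1671.41026),
    ("x4", 265.10507),
    ("x6", 437.02717),
    ("x1", 756.85907),
    ("x2", 14.15891594),
    ("x5", 3.40586376),
]
MSE_FULL = 49.95653586  # error mean square of the full model (23 df)
F_CRIT = 4.28           # tabled F(0.95; 1, 23) quoted in the solution

def droppable_variables(type_i_ss, mse, f_crit):
    """Walk backward from the last variable entered; stop at the first
    term whose sequential F statistic exceeds the critical value."""
    dropped = []
    for name, ss in reversed(type_i_ss):
        f_star = ss / mse  # each term has 1 df, so MSR equals the SS
        if f_star > f_crit:
            break
        dropped.append((name, round(f_star, 3)))
    return dropped

dropped = droppable_variables(TYPE_I_SS, MSE_FULL, F_CRIT)
print(dropped)
```

The loop keeps the full-model MSE in the denominator at every step, which follows the solution's remark that pooling is unnecessary with 23 error degrees of freedom.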
d) Simultaneous test to see if x1, x2, and x5 have no effect on rating given x3, x4, x6
H0: x1, x2, and x5 have no effect on rating given x3, x4, x6.
HA: not H0.
Test statistic:
$F^* = \frac{SSR(x_1, x_2, x_5 \mid x_3, x_4, x_6)/3}{MSE}$
Rejection region:
Reject H0 if $F^* > F(0.95; 3, 23) = 3.03$.
Calculations:
SSR(x1, x2, x5 | x3, x4, x6) = 756.859 + 14.159 + 3.406 = 774.424 (sum of the type I SS for x1, x2, and x5).
$F^* = \frac{774.424/3}{49.957} = 5.167$
Results: Reject H0. At $\alpha = 0.05$, the claim that once the effects of the aspects of the job measured
by x3, x4, and x6 are accounted for, the personal interactions measured by x1, x2, and x5 have no
effect on how the employees rate their supervisors, is not substantiated, i.e., the data do not
support the claim.
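Because x1, x2, and x5 were entered last, the extra sum of squares for the simultaneous test is the sum of their three type I SS. The Python sketch below reproduces that calculation. The names are illustrative, and 3.03 is the tabled F(0.95; 3, 23) quoted in the solution.

```python
# Type I SS for the last three variables entered, from the SAS output.
extra_ss_terms = {"x1": 756.85907, "x2": 14.15891594, "x5": 3.40586376}
MSE_FULL = 49.95653586  # error mean square of the full model (23 df)
F_CRIT = 3.03           # tabled F(0.95; 3, 23) quoted in the solution

# SSR(x1, x2, x5 | x3, x4, x6): extra sum of squares for the three terms.
extra_ss = sum(extra_ss_terms.values())
num_df = len(extra_ss_terms)

f_star = (extra_ss / num_df) / MSE_FULL
reject_h0 = f_star > F_CRIT

print(f"extra SS = {extra_ss:.3f}, F* = {f_star:.3f}, reject H0: {reject_h0}")
```

This shortcut only works because the three variables of interest come last in the entry order. With a different order the model would have to be refit, or the reduced model's error SS used instead.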
C. Polynomial Regression.
a) The plot of log(concentration) vs. log (time) indicates that a smooth curve would fit the data
very well. A linear model fitted to concentration vs. time would definitely have outliers and it
does not appear that there is a smooth curve that would fit the untransformed data. Thus, we
hope to fit a polynomial regression model to log (concentration) vs. log (time).
b) A parabola seems an appropriate curve through the points on the log (concentration) vs log
(time) plot. This would be a second degree (quadratic) polynomial, which we should try to fit.
One needs to be wary of cubic and higher polynomials.
c) We should use type I SS to determine the appropriate degree of the polynomial.
d) The small value of type I SS for LTIME3 (the cubic term) relative to those for LTIME and
LTIME2 (the linear and squared terms) suggests that the cubic term is insignificant and that we
should use a quadratic polynomial. To test for significance we compare the type I SS for
LTIME, LTIME2, and LTIME3 with the MSE using an F-test:
LTIME3: $F^* = \frac{0.001442047}{0.003129875} \approx 0.46 < F_{0.95}(1, 4) = 7.71$
LTIME2: $F^* = \frac{0.36639154}{0.003129875} \approx 117.1 > F_{0.95}(1, 4) = 7.71$
LTIME: $F^* = \frac{0.95575414}{0.003129875} \approx 305.4 > F_{0.95}(1, 4) = 7.71$
The type I SS for LTIME3 is not significantly large compared to the MSE, so the cubic term
should be removed. The type I SS for LTIME2 is significantly large compared to the MSE, so the
quadratic term should be included.
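The same top-down rule can be written as a small Python function: start with the highest-order term and keep the first degree whose sequential F statistic exceeds the critical value. The function name `select_degree` is hypothetical. The sums of squares and MSE are from the cubic fit, and 7.71 is the tabled F(0.95; 1, 4).

```python
# Type I SS for each polynomial term in the cubic fit, keyed by degree.
TYPE_I_SS = {1: 0.95575414, 2: 0.36639154, 3: 0.001442047}
MSE_CUBIC = 0.003129875  # error mean square of the cubic model (4 df)
F_CRIT = 7.71            # tabled F(0.95; 1, 4) quoted in the solution

def select_degree(type_i_ss, mse, f_crit):
    """Return the highest degree whose sequential F exceeds f_crit,
    checking terms from the highest order downward."""
    for degree in sorted(type_i_ss, reverse=True):
        if type_i_ss[degree] / mse > f_crit:
            return degree
    return 0  # no polynomial term is significant

f_stats = {d: ss / MSE_CUBIC for d, ss in TYPE_I_SS.items()}
degree = select_degree(TYPE_I_SS, MSE_CUBIC, F_CRIT)

print({d: round(f, 2) for d, f in f_stats.items()}, "-> degree", degree)
```

Sequential sums of squares are the right input here because each term is judged only against the lower-order terms already in the model.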
e) At TIME = 60 minutes,
LTIME = log(60) = 4.0943446
LTIME2 = $(\log(60))^2$ = 16.763657
LCONCEN = -3.13658682 + (1.88787605)(4.0943446) - (0.24627413)(16.763657)
= 0.4645733.
Estimated CONCENTRATION = $e^{0.465}$ = 1.592
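The prediction in part (e) can be reproduced in Python by evaluating the fitted quadratic at log(60) and back-transforming. This sketch assumes natural logarithms, which is consistent with log(60) = 4.0943446 in the solution. The variable names are illustrative, and the coefficients are from the quadratic fit.

```python
import math

# Coefficients from the quadratic fit of LCONCEN on LTIME and LTIME2.
B0, B1, B2 = -3.13658682, 1.88787605, -0.24627413

time_minutes = 60
ltime = math.log(time_minutes)  # natural log, as in the solution

# Predicted log concentration, then back-transform to the original scale.
lconcen = B0 + B1 * ltime + B2 * ltime ** 2
concentration = math.exp(lconcen)

print(f"LTIME = {ltime:.7f}, LCONCEN = {lconcen:.4f}, "
      f"CONCENTRATION = {concentration:.3f}")
```

Exponentiating the unrounded 0.46457 gives about 1.591, while rounding the exponent to 0.465 first gives the 1.592 shown in the hand calculation. The two agree to two decimal places.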