Study Guide 2

Statistics 511
Study Guide 2
Fall 2001
Questions
A) Questions 1-3 refer to the stem volume data from study guide 1. Stem volumes appear
to be closer to normally distributed if analyzed after taking square roots.
Stem Volume (cubic cm)
1384
1324
1325
1065
1870
1652
1) Compute a 95% confidence interval for the mean square root of the stem volume of 2 year old
saplings propagated from uninfected buds from this mother tree.
2) Compute a 95% confidence interval for the mean stem volume of 2 year old saplings
propagated from uninfected buds from this mother tree.
3) Suppose an uninfected bud is taken from the mother tree, and propagated into a sapling. What
is a 90% prediction interval for the stem volume of this sapling at age 2 years?
B) Questions 4-21 refer to the data set below.
Meat packing companies purchase live steers and sell butchered meat. To fix appropriate
buying prices for the steers, they need to predict the weight of the dressed beef they can
produce, from data on the live cow. The data below is the live and dressed weight (in
hundreds of pounds) of 9 steers randomly selected at a cattle auction.
(source: Ott, 1st edition, p 143)
live weight   dressed weight
4.2           2.8
3.8           2.5
4.8           3.1
3.4           2.1
4.5           2.9
4.6           2.8
4.3           2.6
3.7           2.4
3.9           2.5
The sample summary statistics are:
$\bar{x} = 4.133$, $\bar{y} = 2.633$
$S_{xx} = 1.72$, $S_{xy} = 1.06$
where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.
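As a check on these summary statistics, the following minimal Python sketch recomputes them from the table using only built-in arithmetic. It is an illustration added to the guide, not part of the original problem; the variable names are arbitrary.

```python
# Live and dressed weights (hundreds of pounds) of the 9 steers, copied from the table.
live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n

# Corrected sum of squares of x and corrected sum of cross-products, as defined above.
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}")
print(f"S_xx = {s_xx:.2f}, S_xy = {s_xy:.2f}")
```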
The analysis of variance table is:
SOURCE   DF     SUM OF SQUARES   MEAN SQUARES
MODEL    ____   __________       0.65325581
ERROR    7      __________       0.009534884
TOTAL    8      0.72000000       __________
4) What is the population of interest?
5) What is the sampling population?
6) What are the independent and dependent variables in this study?
7) Plot the dependent variable versus the independent variable.
8) Does the dependent variable appear to depend linearly on the independent variable?
9) What is the estimated regression equation? What is the slope of the regression line?
10) What is the estimated population mean of dressed weight for steers with live weight 420
pounds?
11) Fill in the blanks in the ANOVA table.
12) What is the estimated population variance?
13) What assumptions about the data are needed for the estimate in (9) to be an estimate of the
population slope? How would you check these assumptions?
14) What is the estimated variance of the slope of the regression line?
15) An old almanac has the rule of thumb that every pound of live weight of a steer yields .6
pounds of dressed beef. Formulate this statement in terms of the parameters of the regression
equation.
16) Test whether the statement in (15) is true. What are the null and alternative hypotheses?
17) What assumptions about the data are needed for the test in (16) to be valid? How would you
test these assumptions?
18) Compute a 95% confidence interval for the population slope.
19) Compute a 95% confidence interval for the mean dressed weight for steers with live weight
420 pounds.
20) A packing company just purchased a live steer weighing 373 pounds. Compute a 95%
prediction interval for the dressed weight of this steer.
21) What assumptions about the data are needed for the intervals in (18)-(20) to be valid?
C. Prediction Versus Confidence
22) For each of the following, determine whether a confidence interval or prediction interval is
more appropriate:
a) Estimating the mean cholesterol of patients in your hospital with given fat intake from data
relating cholesterol and fat intake of individual patients in your hospital.
b) Estimating the cholesterol of a patient in your hospital with given fat intake from data
relating cholesterol and fat intake of individual patients in your hospital.
c) Estimating the mean cholesterol of adults in Pleasant Gap (a small town just outside State
College) from a study relating mean adult cholesterol to mean weekly meat sales in similar
sized towns in Pennsylvania.
STUDY GUIDE 2 SOLUTIONS
A.
1)
Sample mean square root of stem volume: $\bar{X} = 37.7520$. Sample standard deviation of the square root of stem volume: $S = 3.70696$.
$n = 6$, $\alpha = 0.05$, $t[1-\alpha/2;\, n-1] = t[0.975;\, 5] = 2.571$.
The 95% C.I. for the mean square root of the stem volume is
$$\bar{X} \pm t[1-\alpha/2;\, n-1]\,\frac{S}{\sqrt{n}} = 37.7520 \pm 2.571 \times \frac{3.70696}{\sqrt{6}} = 37.7520 \pm 3.8909 = (33.8611,\ 41.6428).$$
2)
95% C.I. for the mean stem volume (square the endpoints of the C.I. from question 1):
(1146.5741, 1734.1228) cubic cm.
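The arithmetic in 1) and 2) can be reproduced with a short Python sketch using only the standard library. The critical value 2.571 is hard-coded from a t table rather than computed, and because nothing is rounded along the way the results may differ from the hand computation in the last decimal place.

```python
import math
import statistics

# Stem volumes (cubic cm) from the table in part A.
volumes = [1384, 1324, 1325, 1065, 1870, 1652]
roots = [math.sqrt(v) for v in volumes]

n = len(roots)
mean_root = statistics.mean(roots)
sd_root = statistics.stdev(roots)  # sample SD with divisor n - 1

# t[0.975; 5] copied from a t table (the value used in the solution), not computed here.
t_crit = 2.571

# 1) 95% CI for the mean square root of stem volume.
half_width = t_crit * sd_root / math.sqrt(n)
ci_root = (mean_root - half_width, mean_root + half_width)

# 2) Square the endpoints to express the interval in cubic cm.
ci_volume = (ci_root[0] ** 2, ci_root[1] ** 2)

print(f"mean = {mean_root:.4f}, S = {sd_root:.5f}")
print(f"95% CI (square-root scale): ({ci_root[0]:.4f}, {ci_root[1]:.4f})")
print(f"95% CI (cubic cm): ({ci_volume[0]:.1f}, {ci_volume[1]:.1f})")
```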
3)
To compute a 90% prediction interval for the stem volume, first compute the 90%
prediction interval for the square root of the stem volume, then invert the interval by
squaring the endpoints.
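$n = 6$, $\alpha = 0.10$, $t[1-\alpha/2;\, n-1] = t[0.95;\, 5] = 2.015$.
$$\bar{X} \pm t[1-\alpha/2;\, n-1]\, S\sqrt{1 + \frac{1}{n}} = 37.7520 \pm 2.015 \times 3.70696\sqrt{1 + \frac{1}{6}} = 37.7520 \pm 8.0680 = (29.6840,\ 45.8200).$$
Therefore the 90% prediction interval for the stem volume is (881.1394, 2099.4724) cubic cm.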
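The prediction interval in 3) differs from the confidence interval in 1) only in its standard error. A minimal Python sketch of the same steps, again with the t value 2.015 copied from a table rather than computed:

```python
import math
import statistics

volumes = [1384, 1324, 1325, 1065, 1870, 1652]
roots = [math.sqrt(v) for v in volumes]

n = len(roots)
mean_root = statistics.mean(roots)
sd_root = statistics.stdev(roots)

# t[0.95; 5] for a two-sided 90% interval, copied from a t table.
t_crit = 2.015

# The prediction error (new observation minus X_bar) has variance
# sigma^2 * (1 + 1/n), so the standard error is S * sqrt(1 + 1/n).
half_width = t_crit * sd_root * math.sqrt(1 + 1 / n)
pi_root = (mean_root - half_width, mean_root + half_width)

# Back-transform by squaring the endpoints (squaring is increasing for positive values).
pi_volume = (pi_root[0] ** 2, pi_root[1] ** 2)
print(f"90% PI (cubic cm): ({pi_volume[0]:.1f}, {pi_volume[1]:.1f})")
```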
B.
4)
All beef cattle.
5)
The beef cattle that were at the cattle auction.
6)
Independent variable: live weight; dependent variable: dressed weight.
7)
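Plot: scatter plot of dressed weight (Dress Wt, vertical axis, 2.0 to 3.0) versus live weight (Live Wt, horizontal axis, 3.4 to 4.9), both in hundreds of pounds.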
8)
Yes, the dependent variable does appear to depend linearly on the independent variable.
9)
Slope: $b = \dfrac{S_{XY}}{S_{XX}} = \dfrac{1.06}{1.72} = 0.6163 \approx 0.616$.
Intercept: $a = \bar{Y} - b\bar{X} = 2.6333 - (0.6163)(4.1333) = 0.0859$.
The estimated regression equation is
$$\hat{Y} = 0.0859 + 0.6163X = 2.6333 + 0.6163(X - 4.1333).$$
(Reference: NWK Section 2.4)
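The least-squares fit in 9) and the point estimate in 10) can be sketched in Python as follows. The helper `predict` is a hypothetical name introduced for illustration. Without intermediate rounding the intercept comes out as about 0.0860, which differs from the hand value only in the last digit.

```python
live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))

b = s_xy / s_xx          # least-squares slope
a = y_bar - b * x_bar    # least-squares intercept


def predict(x_h):
    """Estimated mean dressed weight (hundreds of lb) at live weight x_h (hundreds of lb)."""
    return a + b * x_h


print(f"Y-hat = {a:.4f} + {b:.4f} X")
print(f"estimated mean at X = 4.2: {predict(4.2):.3f}")
```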
10)
Estimated population mean when X = 4.2 is $\hat{y} = 0.0859 + 0.6163(4.20) = 2.674$ hundred lbs (267.4 lbs).
11)
SOURCE   DF   SS           MS
MODEL    1    0.65325581   0.65325581
ERROR    7    0.0667442    0.009534884
TOTAL    8    0.72000000
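The blanks in the ANOVA table follow from $SSR = S_{XY}^2/S_{XX}$, $SSTO = S_{YY}$, and $SSE = SSTO - SSR$. A minimal Python sketch of that bookkeeping (the $S_{YY}$ computation is an added step; the guide only gives the total):

```python
live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
s_yy = sum((y - y_bar) ** 2 for y in dressed)

ss_model = s_xy ** 2 / s_xx   # regression (model) sum of squares
ss_total = s_yy               # total corrected sum of squares
ss_error = ss_total - ss_model

df_model, df_error, df_total = 1, n - 2, n - 1
ms_model = ss_model / df_model
ms_error = ss_error / df_error  # this is the MSE

print(f"{'SOURCE':<8}{'DF':>4}{'SS':>14}{'MS':>14}")
print(f"{'MODEL':<8}{df_model:>4}{ss_model:>14.8f}{ms_model:>14.8f}")
print(f"{'ERROR':<8}{df_error:>4}{ss_error:>14.8f}{ms_error:>14.9f}")
print(f"{'TOTAL':<8}{df_total:>4}{ss_total:>14.8f}")
```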
12) The estimated population (error) variance $\sigma^2$ is MSE = 0.009535.
13) In order for the MSE to be a valid estimate of $\sigma^2$, the error terms defined by the simple linear regression model need to be independent, with zero mean and common variance $\sigma^2$. Independence is usually difficult to assess without knowledge of the experimental design, although one can plot the standardized residuals versus time to check for systematic patterns. The linear model assumption, together with the zero-mean and common-variance assumptions, can be checked using a residual plot (standardized residuals versus predicted values); the residuals should form a horizontal band around 0. (4.2)
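To make the residual plot described in 13) concrete, here is a minimal Python sketch that computes the fitted values and residuals scaled by $\sqrt{MSE}$. Scaling by $\sqrt{MSE}$ is an assumption (the guide does not say which standardized residual it means); studentized residuals, which also account for leverage, are another common choice. Plotting is left out so the sketch needs only the standard library.

```python
import math

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar

fitted = [a + b * x for x in live]
residuals = [y - f for y, f in zip(dressed, fitted)]
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Residuals divided by sqrt(MSE) (sometimes called semistudentized residuals).
# Which standardization the guide intends is an assumption here.
std_resid = [e / math.sqrt(mse) for e in residuals]

# (fitted, residual) pairs in fitted-value order are the points of the
# residual-versus-predicted plot; look for a horizontal band around 0.
for f, r in sorted(zip(fitted, std_resid)):
    print(f"fitted {f:.3f}   standardized residual {r:+.2f}")
```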
14) Estimated variance of the slope:
$$s^2\{b\} = \frac{MSE}{S_{XX}} = \frac{0.009535}{1.72} = 0.005544.$$
(3.1)
15) $\beta$ = change in mean dressed weight per unit change in live weight in the population = 0.6.
16) $H_0: \beta = 0.6$ versus $H_A: \beta \neq 0.6$
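$$t^* = \frac{b - 0.6}{\sqrt{MSE/S_{XX}}} = \frac{0.616 - 0.6}{\sqrt{0.005544}} = 0.215$$
Since $|t^*| < t_{7,0.05} = 2.365$, we fail to reject $H_0$ at $\alpha = 0.05$: the data are consistent with $\beta = 0.6$, i.e., with the statement in (15). Failing to reject does not prove that $\beta$ is exactly 0.6.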
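The computations in 14) and 16) can be sketched in Python as below, with the two-sided critical value 2.365 copied from a t table. Because b is not rounded to 0.616 here, $t^*$ comes out near 0.22 rather than 0.215; the decision is the same.

```python
import math

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(live, dressed)) / (n - 2)

var_b = mse / s_xx                      # 14) estimated variance of the slope
t_star = (b - 0.6) / math.sqrt(var_b)   # 16) test statistic for H0: beta = 0.6

# Two-sided 5% critical value t[0.975; 7], copied from a t table.
t_crit = 2.365
reject_h0 = abs(t_star) > t_crit

print(f"s^2(b) = {var_b:.6f}")
print(f"t* = {t_star:.3f}, reject H0 at alpha = 0.05: {reject_h0}")
```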
17) The above test requires normality of the error terms in addition to the assumptions listed in (13). Normality can be checked using a normal scores plot (standardized residuals versus their normal scores).
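A normal scores plot pairs the ordered residuals with approximate expected normal order statistics. The Python sketch below uses the approximation $z\big((k - 0.375)/(n + 0.25)\big)$ from `statistics.NormalDist`; that particular approximation is an assumption, since the guide does not name one. It summarizes the plot by the correlation between the two coordinates (values near 1 indicate a nearly straight line) instead of drawing it.

```python
import math
from statistics import NormalDist

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar
residuals = [y - (a + b * x) for x, y in zip(live, dressed)]
mse = sum(e ** 2 for e in residuals) / (n - 2)
std_resid = sorted(e / math.sqrt(mse) for e in residuals)

# Approximate expected normal scores z((k - 0.375) / (n + 0.25)), k = 1..n (assumed formula).
z = NormalDist()
scores = [z.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

# Pearson correlation between the scores and the ordered residuals.
ms = sum(scores) / n
mr = sum(std_resid) / n
cov = sum((s - ms) * (r - mr) for s, r in zip(scores, std_resid))
r_normal = cov / math.sqrt(
    sum((s - ms) ** 2 for s in scores) * sum((r - mr) ** 2 for r in std_resid)
)
print(f"correlation of normal scores plot: {r_normal:.3f}")
```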
18) A 95% confidence interval for $\beta$, the population slope, is given by
$$b \pm t_{7,0.05}\sqrt{MSE/S_{XX}} = 0.616 \pm 2.365\sqrt{0.005544} = 0.616 \pm 0.176 = (0.440,\ 0.792).$$
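A minimal Python sketch of the slope interval in 18), refitting from the raw data and using the tabled value 2.365 for $t_{7,0.05}$:

```python
import math

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(live, dressed)) / (n - 2)

t_crit = 2.365  # t[0.975; 7], copied from a t table
half_width = t_crit * math.sqrt(mse / s_xx)
ci_slope = (b - half_width, b + half_width)
print(f"95% CI for the slope: ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
```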
19) A 95% confidence interval for the mean dressed weight at $X_h = 4.2$ is given by
$$\hat{y}_h \pm t_{7,0.05}\sqrt{MSE\left(\frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}\right)} = 2.674 \pm 2.365\sqrt{0.009535\left(\frac{1}{9} + \frac{(4.2 - 4.133)^2}{1.72}\right)}$$
$$= 2.674 \pm 2.365(0.0329) = (2.596,\ 2.752) \text{ in 100 lbs.}$$
(3.4)
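The interval in 19) can be checked with the Python sketch below. It computes the standard error of the estimated mean response from the formula above, with the tabled value 2.365 hard-coded.

```python
import math

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(live, dressed)) / (n - 2)

t_crit = 2.365  # t[0.975; 7], copied from a t table
x_h = 4.2       # live weight of 420 lb, in hundreds of pounds
y_hat = a + b * x_h

# Standard error of the estimated MEAN response at x_h (no "1 +" term).
se_mean = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / s_xx))
ci_mean = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
print(f"95% CI for the mean at X = {x_h}: ({ci_mean[0]:.3f}, {ci_mean[1]:.3f})")
```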
20) A 95% prediction interval for Y when $X_h = 3.73$ is given by
$$(a + bX_h) \pm t_{7,0.05}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{S_{XX}}\right)} = 2.3848 \pm 2.365\sqrt{0.009535\left(1 + \frac{1}{9} + \frac{(3.73 - 4.133)^2}{1.72}\right)}$$
$$= 2.3848 \pm 2.365(0.1072) = (2.131,\ 2.638) \text{ in 100 lbs.}$$
(3.5)
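The prediction interval in 20) differs from the interval in 19) only by the extra "1 +" term for the new steer's own variability. A minimal Python sketch, with the 373 lb steer entered as 3.73 hundred pounds and 2.365 copied from a t table:

```python
import math

live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n
s_xx = sum((x - x_bar) ** 2 for x in live)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))
b = s_xy / s_xx
a = y_bar - b * x_bar
mse = sum((y - (a + b * x)) ** 2 for x, y in zip(live, dressed)) / (n - 2)

t_crit = 2.365  # t[0.975; 7], copied from a t table
x_h = 3.73      # 373 lb, in hundreds of pounds
y_hat = a + b * x_h

# Prediction standard error includes the "1 +" term for a single new steer.
se_pred = math.sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / s_xx))
pi_new = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(f"95% PI at X = {x_h}: ({pi_new[0]:.3f}, {pi_new[1]:.3f})")
```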
21) For the preceding confidence and prediction intervals to be valid, all the assumptions mentioned in (13) and (17) need to hold. Also, confidence intervals for the mean response and prediction intervals should be used only when $X_h$ is within the range of the observed live weights (3.4 to 4.8 hundred lbs).
C.
22.
a) The population is patients in your hospital with a given fat intake. The problem asks for
the population mean, which is a parameter. Therefore, a confidence interval is
appropriate.
b) The population is the same as in problem a. The problem asks for an estimate for a
member of the population. Hence, a prediction interval is more appropriate.
c) This is tricky. The study population is the set of towns; for each town, mean adult cholesterol and mean weekly meat sales are measured. The mean adult cholesterol of Pleasant Gap is therefore the response for a single member (town) of that population, so a prediction interval is more appropriate.