Statistics 511 Study Guide 2 Fall 2001

Questions

A) Questions 1-3 refer to the stem volume data from study guide 1. Stem volumes appear to be closer to normally distributed if analyzed after taking square roots.

Stem Volume (cubic cm): 1384 1324 1325 1065 1870 1652

1) Compute a 95% confidence interval for the mean square root of the stem volume of 2 year old saplings propagated from uninfected buds from this mother tree.

2) Compute a 95% confidence interval for the mean stem volume of 2 year old saplings propagated from uninfected buds from this mother tree.

3) Suppose an uninfected bud is taken from the mother tree, and propagated into a sapling. What is a 90% prediction interval for the stem volume of this sapling at age 2 years?

B) Questions 4-21 refer to the data set below. Meat packing companies purchase live steers and sell butchered meat. To fix appropriate buying prices for the steers, they need to predict the weight of the dressed beef they can produce, from data on the live cow. The data below is the live and dressed weight (in hundreds of pounds) of 9 steers randomly selected at a cattle auction. (source: Ott, 1st edition, p 143)

live weight      4.2  3.8  4.8  3.4  4.5  4.6  4.3  3.7  3.9
dressed weight   2.8  2.5  3.1  2.1  2.9  2.8  2.6  2.4  2.5

The sample summary statistics are:

$\bar{x} = 4.133$, $\bar{y} = 2.633$, $S_{xx} = 1.72$, $S_{xy} = 1.06$,

where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.

The analysis of variance table is:

SOURCE   DF     SUM OF SQUARES   MEAN SQUARES
MODEL    ____   __________       0.65325581
ERROR    7      __________       0.009534884
TOTAL    8      0.72000000       __________

4) What is the population of interest?

5) What is the sampling population?

6) What are the independent and dependent variables in this study?

7) Plot the dependent variable versus the independent variable.

8) Does the dependent variable appear to depend linearly on the independent variable?

9) What is the estimated regression equation? What is the slope of the regression line?
10) What is the estimated population mean of dressed weight for steers with live weight 420 pounds?

11) Fill in the blanks in the ANOVA table.

12) What is the estimated population variance?

13) What assumptions about the data are needed for the estimate in (9) to be an estimate of the population slope? How would you check these assumptions?

14) What is the estimated variance of the slope of the regression line?

15) An old almanac has the rule of thumb that every pound of live weight of a steer yields .6 pounds of dressed beef. Formulate this statement in terms of the parameters of the regression equation.

16) Test whether the statement in (15) is true. What are the null and alternative hypotheses?

17) What assumptions about the data are needed for the test in (16) to be valid? How would you test these assumptions?

18) Compute a 95% confidence interval for the population slope.

19) Compute a 95% confidence interval for the mean dressed weight for steers with live weight 420 pounds.

20) A packing company just purchased a live steer weighing 373 pounds. Compute a 95% prediction interval for the dressed weight of this steer.

21) What assumptions about the data are needed for the intervals in (18)-(20) to be valid?

C. Prediction Versus Confidence

22) For each of the following, determine whether a confidence interval or prediction interval is more appropriate:

a) Estimating the mean cholesterol of patients in your hospital with given fat intake from data relating cholesterol and fat intake of individual patients in your hospital.

b) Estimating the cholesterol of a patient in your hospital with given fat intake from data relating cholesterol and fat intake of individual patients in your hospital.

c) Estimating the mean cholesterol of adults in Pleasant Gap (a small town just outside State College) from a study relating mean adult cholesterol to mean weekly meat sales in similar sized towns in Pennsylvania.
STUDY GUIDE 2 SOLUTIONS

A.

1) Sample mean of the square root of stem volume: $\bar{X} = 37.7520$. Sample standard deviation of the square root of stem volume: $S = 3.70696$. $n = 6$, $\alpha = 0.05$, $t[1 - \alpha/2;\, n-1] = 2.571$.

The 95% C.I. for the mean square root of the stem volume is

$\bar{X} \pm t[1 - \alpha/2;\, n-1] \, S/\sqrt{n} = 37.7520 \pm 2.571 \, (3.70696/\sqrt{6}) = 37.7520 \pm 3.8909 = (33.8611, 41.6428)$.

2) 95% C.I. for the mean stem volume (square the endpoints of the C.I. from (1)): (1146.5741, 1734.1228).

3) To compute a 90% prediction interval for the stem volume, first compute the 90% prediction interval for the square root of the stem volume, then invert the interval by squaring the endpoints. $n = 6$, $\alpha = 0.1$, $t[1 - \alpha/2;\, n-1] = 2.015$.

$\bar{X} \pm t[1 - \alpha/2;\, n-1] \, S \sqrt{1 + 1/n} = 37.7520 \pm 2.015 \, (3.70696) \sqrt{1 + 1/6} = 37.7520 \pm 8.0680 = (29.6840, 45.8200)$.

Therefore the 90% prediction interval is (881.1394, 2099.4724).

B.

4) All beef cattle.

5) The beef cattle that were at the cattle auction.

6) Independent variable: live weight; dependent variable: dressed weight.

7) Plot: [scatter plot of dressed weight (Dress Wt, vertical axis, 2.0 to 3.0) versus live weight (Live Wt, horizontal axis, 3.4 to 4.9)]

8) Yes, the dependent variable does appear to depend linearly on the independent variable.

9) Slope $b = S_{XY}/S_{XX} = 1.06/1.72 = 0.616$.

Intercept $a = \bar{Y} - b\bar{X} = 2.633 - (0.616)(4.133) = 0.0859$.

The estimated regression equation is given by

$\hat{Y} = 0.0859 + 0.616X = 2.633 + 0.616(X - 4.133)$

(Reference: NWK Section 2.4)

10) The estimated population mean when $X = 4.2$ is $\hat{y} = 0.0859 + 0.616(4.20) = 2.674$ (or 267.4 lbs.).

11)

SOURCE   DF   SS           MS
MODEL    1    0.65325581   0.65325581
ERROR    7    0.0667442    0.009534884
TOTAL    8    0.72000000

12) The estimated population variance is the MSE = 0.009535.

13) In order for the MSE to be a valid estimate of $\sigma^2$, the error terms, which are defined by the simple linear regression model, need to be independent and have common variance $\sigma^2$ and zero mean.
Independence is usually difficult to assess without knowledge about the experimental design, although one can plot the standardized residuals versus time to check for any systematic patterns. The linear model assumption, together with the common variance and zero mean criteria, can be checked using a residual plot (standardized residuals versus predicted values). The residuals should form a horizontal band about zero. (Reference: NWK Section 4.2)

14) Estimated variance of slope $= MSE/S_{XX} = 0.009535/1.72 = 0.005544$. (Reference: NWK Section 3.1)

15) $\beta$ = change in mean dressed weight per unit change in live weight in the population = 0.6.

16) $H_0: \beta = 0.6$ versus $H_A: \beta \neq 0.6$.

$t^* = \dfrac{b - 0.6}{\sqrt{MSE/S_{XX}}} = \dfrac{0.616 - 0.6}{\sqrt{0.005544}} = 0.215$

Since $|t^*| < t_{7,0.05} = 2.365$, we fail to reject $H_0$ at $\alpha = .05$. The data are consistent with $\beta = 0.6$; i.e., there is no evidence against the statement in (15).

17) The above test requires normality of the error terms in addition to the assumptions listed in (13). Normality can be checked using a normal scores plot (standardized residuals versus their normal scores).

18) A 95% confidence interval for $\beta$, the population slope, is given by

$b \pm t_{7,0.05} \sqrt{MSE/S_{XX}} = 0.616 \pm 2.365 \sqrt{0.005544} = 0.616 \pm 0.176 = (0.440, 0.792)$.

19) A 95% confidence interval for the mean dressed weight at $X_h = 4.2$ is given by

$\hat{y} \pm t_{7,0.05} \sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{S_{XX}} \right)} = 2.674 \pm 2.365 \sqrt{0.009535 \left( \dfrac{1}{9} + \dfrac{(4.2 - 4.133)^2}{1.72} \right)}$

$= 2.674 \pm 2.365(0.0329) = (2.596, 2.752)$ in 100 lbs. (Reference: NWK Section 3.4)

20) A 95% prediction interval for Y when $X_h = 3.73$ is given by

$(a + bX_h) \pm t_{7,0.05} \sqrt{MSE \left( 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{S_{XX}} \right)} = 2.384 \pm 2.365 \sqrt{0.009535 \left( 1 + \dfrac{1}{9} + \dfrac{(3.73 - 4.133)^2}{1.72} \right)}$

$= 2.384 \pm 2.365(0.107) = (2.130, 2.637)$ in 100 lbs. (Reference: NWK Section 3.5)

21) For the preceding intervals to be valid, all the assumptions mentioned in (13) and (17) need to hold. Also, confidence intervals for the mean response and prediction intervals should mainly be used when X is within the range of the data values.

C.

22. a) The population is patients in your hospital with a given fat intake. The problem asks for the population mean, which is a parameter.
Therefore, a confidence interval is appropriate.

b) The population is the same as in problem a. The problem asks for an estimate for a member of the population. Hence, a prediction interval is more appropriate.

c) This is tricky. The study population is the set of towns. For each town, mean adult cholesterol and mean weekly meat sales are measured. Hence, the problem asks for an estimate for a member of the population. A prediction interval is more appropriate.
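
The Part A intervals can be checked numerically. The sketch below is one possible check, not part of the original solutions: it uses only the Python standard library, and the critical values 2.571 and 2.015 are copied from solutions (1) and (3) rather than computed. All variable names are illustrative.

```python
import math
import statistics

# Stem volumes (cubic cm) from Part A of the study guide.
volumes = [1384, 1324, 1325, 1065, 1870, 1652]

# Work on the square-root scale, as the study guide suggests.
roots = [math.sqrt(v) for v in volumes]
n = len(roots)
mean_root = statistics.mean(roots)
sd_root = statistics.stdev(roots)  # sample SD (divisor n - 1)

# Critical values with n - 1 = 5 degrees of freedom, taken from the
# solutions (assumed here, not computed).
T_975 = 2.571  # for the 95% confidence interval
T_950 = 2.015  # for the 90% prediction interval

# Half-widths: the CI uses S / sqrt(n); the PI uses S * sqrt(1 + 1/n).
ci_half = T_975 * sd_root / math.sqrt(n)
pi_half = T_950 * sd_root * math.sqrt(1 + 1 / n)

ci_root = (mean_root - ci_half, mean_root + ci_half)
pi_root = (mean_root - pi_half, mean_root + pi_half)

# Back-transform by squaring the endpoints, as in solutions (2) and (3).
ci_volume = tuple(e ** 2 for e in ci_root)
pi_volume = tuple(e ** 2 for e in pi_root)

print("mean, SD of sqrt(volume):", round(mean_root, 4), round(sd_root, 4))
print("95% CI, sqrt scale:", [round(e, 3) for e in ci_root])
print("95% CI, volume scale:", [round(e, 1) for e in ci_volume])
print("90% PI, sqrt scale:", [round(e, 3) for e in pi_root])
print("90% PI, volume scale:", [round(e, 1) for e in pi_volume])
```

The results should agree with the hand calculations up to rounding; small differences in the last digits come from the solutions using rounded intermediate values.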
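
The Part B regression quantities can be recomputed from the raw data in the same way. This is a minimal sketch under stated assumptions: the critical value 2.365 is copied from solution (16), the helper names (`fitted`, `mean_ci`, `prediction_interval`) are hypothetical, and only the Python standard library is used.

```python
import math

# Live and dressed weights (hundreds of pounds) from Part B.
live = [4.2, 3.8, 4.8, 3.4, 4.5, 4.6, 4.3, 3.7, 3.9]
dressed = [2.8, 2.5, 3.1, 2.1, 2.9, 2.8, 2.6, 2.4, 2.5]

n = len(live)
x_bar = sum(live) / n
y_bar = sum(dressed) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in live)
s_yy = sum((y - y_bar) ** 2 for y in dressed)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(live, dressed))

# Least-squares slope and intercept (solution 9).
b = s_xy / s_xx
a = y_bar - b * x_bar

# ANOVA decomposition (solution 11).
ss_model = b * s_xy
ss_error = s_yy - ss_model
mse = ss_error / (n - 2)

# Critical value with n - 2 = 7 degrees of freedom, taken from the
# solutions (assumed here, not computed).
T_CRIT = 2.365

# Inference on the slope (solutions 14, 16, 18).
var_b = mse / s_xx
t_star = (b - 0.6) / math.sqrt(var_b)
slope_ci = (b - T_CRIT * math.sqrt(var_b), b + T_CRIT * math.sqrt(var_b))


def fitted(x_h):
    """Estimated mean dressed weight at live weight x_h."""
    return a + b * x_h


def mean_ci(x_h):
    """95% confidence interval for the mean response at x_h (solution 19)."""
    se = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / s_xx))
    return fitted(x_h) - T_CRIT * se, fitted(x_h) + T_CRIT * se


def prediction_interval(x_h):
    """95% prediction interval for a new observation at x_h (solution 20)."""
    se = math.sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / s_xx))
    return fitted(x_h) - T_CRIT * se, fitted(x_h) + T_CRIT * se


print("slope, intercept:", round(b, 4), round(a, 4))
print("SS model, SS error, MSE:", ss_model, ss_error, mse)
print("t* for H0 slope = 0.6:", round(t_star, 3))
print("95% CI for slope:", [round(e, 3) for e in slope_ci])
print("95% CI for mean at 4.2:", [round(e, 3) for e in mean_ci(4.2)])
print("95% PI at 3.73:", [round(e, 3) for e in prediction_interval(3.73)])
```

Because the script carries full precision throughout, its results can differ from the hand calculations in the third decimal place (for example, the test statistic in (16), which the solutions compute from the rounded slope 0.616). The conclusions are unchanged.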