Stat 2470, Practice Exam #3, Spring 2014

1. A study comparing different types of batteries showed that the average lifetimes of Duracell Alkaline AA batteries and Eveready Energizer Alkaline AA batteries were 4.5 hours and 4.2 hours, respectively. Suppose these are the population average lifetimes.
   a. Let $\bar{X}$ be the sample average lifetime of 150 Duracell batteries and $\bar{Y}$ be the sample average lifetime of 150 Eveready batteries. What is the mean value of $\bar{X} - \bar{Y}$ (i.e., where is the distribution of $\bar{X} - \bar{Y}$ centered)? How does your answer depend on the specified sample sizes?
   b. Suppose the population standard deviations of lifetime are 1.8 hours for Duracell batteries and 2.0 hours for Eveready batteries. With the sample sizes given in part (a), what is the variance of the statistic $\bar{X} - \bar{Y}$, and what is its standard deviation?
   c. For the sample sizes given in part (a), what is the approximate distribution curve of $\bar{X} - \bar{Y}$ (include a measurement scale on the horizontal axis)? Would the shape of the curve necessarily be the same for sample sizes of 10 batteries of each type? Explain.

2. Suppose $\mu_1$ and $\mu_2$ are true mean stopping distances at 50 mph for cars of a certain type equipped with two different types of braking systems. The following statistics are given: m = 6, [...] Calculate a 95% CI for the difference between true average stopping distance for cars equipped with system 1 and cars equipped with system 2. Does the interval suggest that precise information about the value of this difference is available?

3. A study includes the accompanying data on compression strength (lb) for a sample of 12-oz aluminum cans filled with strawberry drink and another sample filled with cola. Do the data suggest that the extra carbonation of cola results in a higher average compression strength? Base your answer on a P-value. What assumptions are necessary for your analysis?

   Beverage           Sample Size   Sample Mean   Sample St. Dev.
   Strawberry drink   15            546           21
   Cola               15            560           15

4.
Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition (U) and an abraded condition (A). Use the paired t test at significance level .01 to test $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$.

   Fabric   1     2     3     4     5     6     7     8
   U        25.6  48.8  49.8  43.2  38.7  55.0  36.4  51.5
   A        26.5  52.5  46.5  36.5  34.5  20.0  28.5  46.0

5. Obtain or compute the following quantities using the table of "critical values for F distribution" available in your text.
   a. [...]
   b. [...]
   c. [...]
   d. [...]
   e. The 95th percentile of the F distribution with [...]
   f. The 5th percentile of the F distribution with [...]
   g. [...]
   h. [...]

6. In a one-way ANOVA problem involving four populations or treatments, the null hypothesis of interest is

7. In a single-factor ANOVA problem involving five populations or treatments, which of the following statements are true about the alternative hypothesis?
   a. All five population means are equal.
   b. All five population means are different.
   c. At least two of the population means are different.
   d. At least three of the population means are different.
   e. At most, two of the population means are equal.

8. In a single-factor ANOVA problem involving five populations or treatments with a random sample of four observations from each one, it is found that SSTr = 16.1408 and SSE = 37.3801. Then the value of the test statistic is
   a. 1.619
   b. 2.316
   c. 0.432
   d. 1.522
   e. 4.248

9. The distribution of the test statistic in single-factor ANOVA is the
   a. binomial distribution
   b. normal distribution
   c. t distribution
   d. F distribution
   e. None of the above answers are correct.

10. In a single-factor ANOVA problem involving five populations or treatments with a random sample of nine observations from each one, suppose that $H_0$ is rejected at the .05 level. Which of the following values is correct for the critical value needed to perform Tukey's procedure?
   a. 4.76
   b. 3.79
   c. 4.04
   d. 3.85
   e. 4.80

11.
In a single-factor ANOVA problem involving five populations or treatments with a random sample of four observations from each one, it is found that SSTr = 16.1408 and SSE = 37.3801. Then the value of the test statistic is
   a. 1.619
   b. 2.316
   c. 0.432
   d. 1.522
   e. 4.248

12. Consider the accompanying data on plant growth after the application of different types of growth hormone.

   Hormone   1    2    3    4    5
             15   23   20   9    8
             19   15   17   13   13
             9    22   22   20   17
             16   19   19   12   10

   a. Perform an F test at level $\alpha = .05$.
   b. What happens when Tukey's procedure is applied?

13. An experiment is conducted to investigate how the behavior of mozzarella cheese varies with temperature. Consider the accompanying data on x = temperature and y = elongation (%) at failure of the cheese.

   x   59   63   67   72   74   78   83
   y   118  182  247  208  197  160  132

   a. Construct a scatter plot in which the axes intersect at (0,0). Mark 0, 20, 40, 60, 80, and 100 on the horizontal axis and 0, 50, 100, 150, 200, and 250 on the vertical axis.
   b. Construct a scatter plot in which the axes intersect at (55,100). Does this plot seem preferable to the one in part (a)? Explain your reasoning.
   c. What do the plots of parts (a) and (b) suggest about the nature of the relationship between the two variables?

14. The accompanying data were read from a graph that appeared in a recent study. The independent variable is deposition rate and the dependent variable is steel weight loss (g/m²).

   x   14   18   40   43   45   112
   y   280  350  470  500  560  1200

   a. Construct a scatter plot. Does the simple linear regression model appear to be reasonable in this situation?
   b. Calculate the equation of the estimated regression line.
   c. What percentage of observed variation in steel weight loss can be attributed to the model relationship in combination with variation in deposition rate?
   d. Because the largest x value in the sample greatly exceeds the others, this observation may have been very influential in determining the equation of the estimated line.
Delete this observation and recalculate the equation. Does the new equation appear to differ substantially from the original one (you might consider predicted values)?

15. A study reports the results of a regression analysis based on n = 15 observations in which x = filter application temperature (°C) and y = % efficiency of BOD removal. Calculated quantities include [...]
   a. Test at level .01 $H_0: \beta_1 = 1$, which states that the expected increase in % BOD removal is 1 when filter application temperature increases by 1°C, against the alternative [...]
   b. Compute a 99% CI for $\beta_1$, the expected increase in % BOD removal for a 1°C increase in filter application temperature.

16. Infestation of crops by insects has long been of great concern to farmers and agricultural scientists. A study reports data on x = age of a cotton plant (days) and y = % damaged squares. Consider the accompanying n = 12 observations:

   x   9   12  12  15  18  18  21  21  27  30  30  33
   y   11  12  23  30  29  52  41  65  60  72  84  93

   a. Why is the relationship between x and y not deterministic?
   b. Does a scatter plot suggest that the simple linear regression model will describe the relationship between the two variables?
   c. The summary statistics are [...] Determine the equation of the least squares line.
   d. Predict the percentage of damaged squares when the age is 20 days by giving an interval of plausible values.

17. Wear resistance of certain nuclear reactor components made of Zircaloy-2 is partly determined by properties of the oxide layer. The following data appear in a study that proposed a new nondestructive testing method to monitor thickness of the layer. The variables are x = oxide-layer thickness and y = eddy-current response (arbitrary units).

   x   0     7     17    114   133   142   190   218   237   285
   y   20.3  19.8  19.5  15.9  15.1  14.7  11.9  11.5  8.3   6.6

   The equation of the least squares line is $\hat{y} = 20.6 - .047x$. Calculate and plot the residuals against x and then comment on the appropriateness of the simple linear regression model.

18.
Suppose that the expected value of thermal conductivity y is a linear function of [...], where x is lamellar thickness.

   x   240   410   460   490   520   590   745   8300
   y   12.0  14.7  14.7  15.2  15.2  15.6  16.0  18.1

   a. Estimate the parameters of the regression function and the regression function itself.
   b. Predict the value of thermal conductivity when lamellar thickness is 500 angstroms.

19. Let y = sales at a fast food outlet (1000's of $), $x_1$ = number of competing outlets within a 1-mile radius, $x_2$ = the population within a 1-mile radius (1000's of people), and $x_3$ be an indicator variable that equals 1 if the outlet has a drive-up window and 0 otherwise. Suppose that the true regression model is [...]
   a. What is the mean value of sales when the number of competing outlets is 2, there are 8000 people within a 1-mile radius, and the outlet has a drive-up window?
   b. What is the mean value of sales for an outlet without a drive-up window that has three competing outlets and 5000 people within a 1-mile radius?
   c. Interpret $\beta_3$.

20. A multiple regression model with four independent variables was used to study accuracy in reading liquid crystal displays. The variables were
   y = error percentage for subjects reading a four-digit liquid crystal display
   $x_1$ = level of backlight (ranging from 0 to 122)
   $x_2$ = character subtense
   $x_3$ = viewing angle
   $x_4$ = level of ambient light (ranging from 20 to 1500 lux)
   The model fit to data was [...] The resulting estimated coefficients were [...]
   a. Calculate an estimate of expected error percentage when [...]
   b. Estimate the mean error percentage associated with a backlight level of 20, character subtense of .5, viewing angle of 10, and ambient light level of 30.
   c. What is the estimated expected change in error percentage when the level of ambient light is increased by 1 unit while all other variables are fixed at the values given in part (a)? Answer for a 100-unit increase in ambient light level.
   d. Explain why the answers in part (c) do not depend on the fixed values of $x_1$, $x_2$, and $x_3$. Under what conditions would there be such a dependence?
   e. The estimated model was based on n = 30 observations, with SST = 39.2 and SSE = 20.0. Calculate and interpret the coefficient of multiple determination, and then carry out the model utility test using [...]

ANS:

1. a. $E(\bar{X} - \bar{Y}) = 4.5 - 4.2 = .3$, irrespective of sample sizes.
   b. $V(\bar{X} - \bar{Y}) = \frac{(1.8)^2}{150} + \frac{(2.0)^2}{150} = .0483$, and the standard deviation of $\bar{X} - \bar{Y}$ is $\sqrt{.0483} = .2197$.
   c. A normal curve with mean and standard deviation as given in parts (a) and (b) (because m = n = 150, the CLT implies that both $\bar{X}$ and $\bar{Y}$ have approximately normal distributions, so $\bar{X} - \bar{Y}$ does also). The shape is not necessarily that of a normal curve when m = n = 10, because the CLT cannot be invoked. So if the two lifetime population distributions are not normal, the distribution of $\bar{X} - \bar{Y}$ will typically be quite complicated.

2. We want a 95% confidence interval for $\mu_1 - \mu_2$. [...], so the 95% interval is [...]. Because the interval is so wide, it does not appear that precise information is available.

3. Let $\mu_1$ = the true average compression strength for strawberry drink and let $\mu_2$ = the true average compression strength for cola. A lower-tailed test is appropriate. We test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$. The test statistic is $t = \frac{546 - 560}{\sqrt{21^2/15 + 15^2/15}} = \frac{-14}{\sqrt{44.4}} = -2.10$. The estimated degrees of freedom are $\nu = 25.3$, so use $\nu = 25$. The P-value is approximately $P(t < -2.10) = .023$. This P-value indicates strong support for the alternative hypothesis. The data do suggest that the extra carbonation of cola results in a higher compression strength.

4. Parameter of interest: $\mu_D$ = true average difference of breaking load for fabric in unabraded or abraded condition. We test $H_0: \mu_D = 0$ versus $H_a: \mu_D > 0$. The value of the test statistic is $t = \frac{\bar{d}}{s_D/\sqrt{n}} = \frac{7.25}{11.8628/\sqrt{8}} = 1.73$. The rejection region is $t \ge t_{.01,7} = 2.998$. Since 1.73 is not $\ge$ 2.998, we fail to reject $H_0$. The data do not indicate a difference in breaking load for the two fabric load conditions.

5. a. [...]
   b. [...]
   c. [...]
   d. [...]
   e. [...]
   f. [...]
   g. [...]
   h. Since [...] = .95 - .01 = .94.

6. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

7. C

8. A

9. D

10. C

11. A

12. a. Let $\mu_i$ = true average growth when hormone #i is applied.
   $H_0$ will be rejected in favor of $H_a$ if $f \ge F_{.05,4,15} = 3.06$.

   Source       df   SS      MS        f
   Treatments   4    200.3   50.075    3.49
   Error        15   215.5   14.3667
   Total        19   415.8

   Because $3.49 \ge 3.06$, $H_0$ is rejected. There appears to be a difference in the average growth with the application of the different growth hormones.
   b. The sample means are, in increasing order, 12.00, 13.50, 14.75, 19.50, and 19.75. The most extreme difference is 19.75 - 12.00 = 7.75, which doesn't exceed 8.28, so no differences are judged significant. Tukey's method and the F test are at odds.

13. c. A parabola appears to provide a good fit to both graphs.

14. a. According to the scatter plot of the data, a simple linear regression model does appear to be plausible.
   b. The regression equation is y = 138 + 9.31x.
   c. The desired value is the coefficient of determination, $r^2 \approx .990$, i.e., about 99.0%.
   d. The new equation is y* = 190 + 7.55x*. This new equation appears to differ substantially. If we were to predict a value of y for x = 50, the value would be 567.9, whereas using the original data, the predicted value for x = 50 would be 603.5.

15. a. $H_0$ is rejected in favor of $H_a$.
   b. [...] = (1.08, 2.32)

16. a. [...]
   b. Based on a scatterplot of the data, a simple linear regression model does seem a reasonable way to describe the relationship between the two variables.
   c. [...]
   d. [...]

17. The (x, residual) pairs for the plot are (0, -.335), (7, -.508), (17, -.341), (114, .592), (133, .679), (142, .700), (190, .142), (218, 1.051), (237, -1.262), and (285, -.719). The plot shows substantial evidence of curvature.

18. a. The suggested model is [...] The summary quantities are [...] and the estimated regression function is [...]
   b. [...]

19. a. For $x_1 = 2$, $x_2 = 8$ (remember the units of $x_2$ are in 1000's), and $x_3 = 1$ (since the outlet has a drive-up window), the average sales are [...]
   b. For $x_1 = 3$, $x_2 = 5$, and $x_3 = 0$, the average sales are [...]
   c. When the number of competing outlets and the population within a 1-mile radius remain fixed, the sales will increase by $15,400 when an outlet has a drive-up window.

20. a. [...]
   b. [...]
   c. [...]
   d. There are no interaction predictors, e.g., [...], in the model. There would be dependence if interaction predictors involving $x_4$ had been included.
   e. $R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{20.0}{39.2} = .490$. For testing $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ versus $H_a$: at least one among $\beta_1, \ldots, \beta_4$ is not zero, the test statistic is $f = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{.490/4}{.510/25} = 6.0$. $H_0$ will be rejected if [...] Because [...], $H_0$ is rejected and the model is judged useful (this even though the value of $R^2$ is not all that impressive).
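
As a check on the ANOVA answers (problems 8, 11, and 12), the sums of squares and the F ratio can be recomputed directly from the data. The sketch below is only an illustration: the function name `one_way_anova` and the variable names are hypothetical, and the grouping of the hormone data into columns is the reading that reproduces the sample means quoted in answer 12b.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SSTr, SSE, f) for a list of samples, one list per treatment."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    # Treatment sum of squares: group sizes times squared deviations of group means
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error sum of squares: squared deviations of observations from their group mean
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_treatments = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    f = (sstr / df_treatments) / (sse / df_error)
    return sstr, sse, f


# Problem 12: one list per hormone (assumed column-wise reading of the data table)
hormone_growth = [
    [15, 19, 9, 16],
    [23, 15, 22, 19],
    [20, 17, 22, 19],
    [9, 13, 20, 12],
    [8, 13, 17, 10],
]
sstr, sse, f = one_way_anova(hormone_growth)
print(round(sstr, 1), round(sse, 1), round(f, 2))

# Problems 8 and 11: I = 5 treatments, J = 4 observations each
f_8 = (16.1408 / (5 - 1)) / (37.3801 / (5 * (4 - 1)))
print(round(f_8, 3))
```

The first line of output should agree with the ANOVA table in answer 12a (SSTr = 200.3, SSE = 215.5, f = 3.49), and the second with choice (a) in problems 8 and 11.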
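
The paired t statistic in problem 4 can be verified in the same way. This is a minimal sketch using only the standard library; the upper-tailed alternative is an assumption inferred from the rejection region in the answer key, and the critical value 2.998 is taken from that answer rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Problem 4 data: breaking load for eight fabrics
unabraded = [25.6, 48.8, 49.8, 43.2, 38.7, 55.0, 36.4, 51.5]
abraded = [26.5, 52.5, 46.5, 36.5, 34.5, 20.0, 28.5, 46.0]

# Paired differences U - A, one per fabric
diffs = [u - a for u, a in zip(unabraded, abraded)]
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divisor n - 1)
t_stat = d_bar / (s_d / sqrt(len(diffs)))

T_CRIT = 2.998  # t critical value for alpha = .01 and 7 df, from the answer key
reject_h0 = t_stat >= T_CRIT  # assumed upper-tailed test

print(round(d_bar, 2), round(s_d, 4), round(t_stat, 2), reject_h0)
```

Working with the differences reduces the two-sample problem to a one-sample t test, which is why only `d_bar` and `s_d` enter the statistic.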
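
For problem 14, the effect of the influential observation (112, 1200) can be seen by fitting the least squares line with and without it. The helper `least_squares` below is a hypothetical name for a plain implementation of the usual formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$; it is a sketch, not the method of any particular package.

```python
def least_squares(xs, ys):
    """Return (b0, b1, r2) for the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r2 = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return b0, b1, r2


# Problem 14 data
x = [14, 18, 40, 43, 45, 112]
y = [280, 350, 470, 500, 560, 1200]

b0, b1, r2 = least_squares(x, y)
# Refit after deleting the last observation, (112, 1200)
b0_del, b1_del, _ = least_squares(x[:-1], y[:-1])

pred_full = b0 + b1 * 50
pred_del = b0_del + b1_del * 50
print(round(b0), round(b1, 2), round(r2, 3))
print(round(b0_del), round(b1_del, 2))
print(round(pred_full, 1), round(pred_del, 1))
```

Comparing the two predictions at x = 50 is the check suggested in part (d); the gap between them is what answer 14d describes.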