Chapter 12 Study Guide Solutions Study Guide 12.1a 12.1b Ideal Response 2 No, the conditions for performing inference are not met. The variance of the residuals increases as the laboratory measurement increases. 4 Linear: The residual plot is reasonably centered around 0. This means that the scatterplot is approximately linear. Independent: This was a randomized experiment. Due to the random assignment, the observations can be viewed as independent. Normal: The histogram is mound shaped and approximately symmetric so the residuals could follow a Normal distribution. Equal Variance: The residual plot shows roughly equal scatter for all x values. Random: This was a randomized experiment. The conditions are met. 6 is the y-intercept. In this case it would measure the BAC level if no beers had been drunk. We would expect this to be 0 and the estimate is close to 0 with a value of -0.0127. is the slope. It tells us how much the BAC increases, on average, with the drinking of each additional beer. The estimate for this parameter is 0.018. In other words, we expect the BAC level to increase by 0.018 with each additional beer. Finally, measures the standard deviation of BAC values about the population regression line. In this case the estimate is 0.0204. This means that the actual BAC level will vary from the estimated value by 0.0204 on average. 8 (a) If we repeated the experiment many times, the slope of the sample regression line would typically vary by about 0.0024 from the true slope of the population regression line for predicting BAC level from the number of beers consumed. (b) Since there are 16 observations, the appropriate t distribution has 14 degrees of freedom and This leads to the confidence interval of 0.018 2.977(0.0024) 0.018 0.007 (0.011,0.025) . (c) We are 99% confident that the interval from 0.011 to 0.025 captures the true slope of the population regression line for predicting BAC level from the number of beers consumed. (d) If we were to repeat the experiment many times and compute confidence intervals for the regression slope in each case, about 99% of the resulting intervals would contain the slope of the population regression line. 10 State: Plan: Do: Conclude: 12 We want to construct a 90% confidence interval for the true slope , of the population regression line relating heights an arm spans of students in a large school. If the conditions are met, we will use a t-interval for the slope to estimate . We are assuming that the conditions are met here. This leads to the confidence interval of: 0.8404 1.746(0.0809) 0.8404 0.1413 (0.6991,0.9817) We are 90% confident that the interval from 0.6991 to 0.9817 captures the true slope of the population regression line predicting height from arm span.. (a) 1.286 11.894(5) 58.184 clusters (b) measures the typical amount of error between the actual values and the predicted values. In this case our estimate for is 6.419 so we would expect our prediction of clusters of beetles to be off by that much on average. Chapter 12 Study Guide Solutions 12.c 14 (a) The scatterplot suggests that there is a moderately strong negative linear relationship between the amount of time spent at the table and the calories consumed for young children. (b) The equation for the line is yˆ 560.65 3.0771x where ŷ is the calories consumed and x is the time spent at the table. (c) The y-intercept says that if there no time spent at the table, we would predict the average number of calories consumed to be 560.65. In this case that is extrapolation as the smallest amount of time measured was 20 minutes. Also, clearly, if the children spend no time at the table, they cannot consume 560 calories. The slope says that for each additional minute at the table we can expect the average caloric consumption to decrease by 3.0771 calories. (d) State: We want to perform a test of H o : 0 versus Ha : 0 where is the true slope of the population regression line relating time at the table to caloric consumption. We will use a significance level 0.01 . Plan: If the conditions are met, we will do a t-test for the slope . The residual plot and the histogram of the residuals are given below. Linear: The scatterplot is approximately linear. Independent: There were 20 toddlers observed. This is clearly less than 10% of all possible toddlers. Normal: The histogram of the residuals is mound shaped and approximately symmetric so the residuals could follow a Normal distribution. Equal Variance: The residual plot shows roughly equal scatter for all x values. Random: The data come from a random sample. The conditions are met. Do: Conclude: 16 According to the output, the test statistic is -3.62 and the one-sided P-value using df 18 is 0.001. Since the P-value is less than 0.01 we reject the null hypothesis and conclude that there is convincing evidence of a negative linear relationship between time at the table and caloric consumption. We want to construct a 98% t-interval for , the true slope of the population regression line relating time at the table and caloric consumption. Plan: If the conditions are met, we will construct a t-interval for the slope . The conditions were check in Exercise 12.14. Do: The sample size is 20 so the appropriate distribution has 18 degrees of freedom and t* 2.552 . This leads to the confidence interval of 3.0771 2.552(0.8498) 3.0771 2.1687 (5.2358, 0.9084) . Conclude: We are 98% confident that the interval from -5.2458 to -0.9084 contains the true slope of the population regression line for predicting calorie consumption from time at the table. In Exercise 12.14 we rejected the null hypothesis that the true slope was 0. The conclusions are the same. (b) (i) The typical error when using the regression line to predict calorie consumption is about 23.4 calories. (ii) Approximately 42.1% of the variation in calorie consumption can be explained by the linear relationship with the time spent at the table. (iii) If samples like this were observed many times, the estimated slope would differ from the slope of the true regression line for predicting calorie consumption from time at the table by an average of 0.8498. (a) State: Chapter 12 Study Guide Solutions 20 (a) State: Plan: Do: We want to perform a test of H o : 0 versus Ha : 0 where is the true slope of the population regression line relating swim time to pulse rate. We will use a significance level 0.05 . If the conditions are met, we will do a t-test for the slope . The scatterplot, residual plot, and the histogram of the residuals are given below. Linear: The scatterplot is approximately linear. Independent: There were 20 toddlers observed. This is clearly less than 10% of all possible swim times that could have been measured. Normal: The histogram of residuals is mound shaped and approximately symmetric so the residuals could follow a Normal distribution. Equal Variance: The residual plot shows roughly equal scatter for all x values. Random: The data come from a random sample. The conditions are met. The output is given below. According to the output, the test statistic is 5.13 and the P-value given is 0.000. This P-value is for the two-sided test and we are conducting a one-sided test. But when you divide by 2 here you get the same thing. The regression equation is pulse = 480-9.69 time Predictor Constant time S = 6.45505 Conclude: Coef 479.93 -9.695 SE Coef 66.23 1.889 R-Sq = 55.6% T 7.25 -5.13 P 0.000 0.000 R-Sq (adj) = 53.5% Since the P-value is less than 0.05 we reject the null hypothesis and conclude that there is convincing evidence of a negative linear relationship between swim time and pulse rate. Chapter 12 Study Guide Solutions (b) State: Do: Conclude: 12.1d 18 We want to construct a 95% t interval for β the true slope of the population regression line relating time at the table and caloric consumption. Plan: If the conditions are met, we will construct a t interval for the slope .β The conditions were checked in (a). The sample size is 23 so the appropriate distribution has 21 degrees of freedom and t* 2.08 . This leads to the confidence interval of 9.695 2.08(1.889) 9.695 3.929 (13.624, 5.766) . We are 95% confident that the interval from -13.624 to -5.766 contains the true slope of the population regression line for predicting puls rate from swim time. (a) In computing a 95% confidence interval we use a t distribution with 19 degrees of freedom and t* 2.093 . The interval is computed as 11,630.6 2.093(1, 249) 11,630.6 2,614.16 (9016.44,14244.76) (b) The vehicle is measured in years and mileage in miles. Since the automotive group claims that people drive 15,000 miles per year, that says that for every increase of 1 year, the mileage would increase by 15,000 miles. This translates into a slope of 15,000. (c) Since the interval in part (a) does not include the value 15,000, it suggests that the slope could not plausibly be 15,000. That is, we would reject the null hypothesis that the slope is 15,000. 12.1 MC 12.2b 21. C 34 22. D 23. C 24. A 25. B 26. B (a) The scatterplot is given below. The scatterplot shows a strong, negative, curved relationship between volume and pressure. (b) In this case the explanatory variable is the reciprocal of the volume and the response variable is the pressure. (c) Here the explanatory variable is the volume and the response variable is the reciprocal of the pressure. 36 (a) For transformation 1: yˆ 0.3677 15.8994 1x where y is the pressure and x is the volume. For transformation 2: 1y 0.1002 0.0398 x where y is the pressure and x is the volume. 1 =1.303 atmospheres.. (b) For transformation 1: yˆ 0.3677 15.8994 17 1 For transformation 2: 1y 0.1002 0.0398(17) 0.7768 so ŷ 0.7768 atmospheres. (c) For transformation 1: The typical distance that a predicted value of the pressure will be from the actual value is about 0.044 atmospheres. For transformation 2: The typical distance that the reciprocal of the predicted value of pressure will be from the reciprocal of the actual value is about 0.00355 (atmospheres) 1 Chapter 12 Study Guide Solutions 12.2c 37 (a) The scatterplot is below The relationship is strong, negative, and slightly curved with one potential outlier in the top left hand corner. (b) Since the graph of the explanatory variable against the natural log of the response is fairly linear, an exponential model would be reasonable. (c) ln y 5.973 0.218 x where y is the count of surviving bacteria and x is time in minutes. (d) ln y 5.973 0.218(17) 2.267 so yˆ e 2.267 9.65 or 965 bacteria. Since the residual plot shows a random scatter around the value of 0, we would expect this prediction to be about right. 38 (a) The scatterplot is below The relationship is strong, negative, and slightly curved with no outliers. (b) Since the graph of the explanatory variable against the natural log of the response is fairly linear, an exponential model would be reasonable. (c) ln y 6.789 0.333x where y is the light intensity and x is the depth. (a) ln y 6.789 0.333(12) 2.793 so yˆ e 2.793 16.33 lumens. This gives a residual of 2.785 2.793 0.008 . The value of s for this model is 0.00006, so the residual for this point is quite large. 12.2d 40 The equation for the regression line is ln y 2.00 2.42ln x where x is the diameter at breast height in cm, and y is the aboveground biomass in kg. If a tree is 30 cm in diameter, then ln y 2.00 2.42ln(30) 6.231 . This means that yˆ e6.231 508.263 kg is the total aboveground biomass of the tree. 42 (a) The exponential model would work better because the graph with only the response variable transformed is linear whereas the graph with both variables transformed has curvature to it. (b) log y 0.4537 0.1172 x where y is the height in feet and x is the bounce number. (c) log y 0.4537 0.1172(7) 0.3667 so yˆ 100.3667 0.4298 feet. (d) The trend in the residual plot suggests that the residual would be positive which means that our prediction would be too low. Chapter 12 Study Guide Solutions 44 (a) The scatterplot is below. (b) Two scatterplots are given below. (c) The MiniTab output is given below. The regression equation is ln(weight) = -0.314 + 3.14 ln(length) Predictor Constant ln(length) Coef -0.3140 3.1387 S = 0.353543 SE Coef 0.1958 0.1151 R-Sq = 99.3% T -1.60 27.27 P 0.170 0.000 R-Sq(adj) = 99.2% So the equation is ln y 0.314 3.1387 ln x where y is the weight of the heart and x is the length of the cavity of the left ventricle. (d) ln y 0.314 3.1387 ln(6.8) 5.703 so yˆ e5.703 299.77 grams. 12.2 MC 45. C 46. E 47. E 48. C