Chapter 14 Review Questions

1. Which of the following is NOT one of the basic assumptions that must be satisfied in order to perform inference for regression of y on x?
(a) For each value of x, the corresponding population of y-values is normally distributed.
(b) The standard deviation of the population of y-values corresponding to a particular value of x is always the same, regardless of the specific value of x.
(c) The sample size (the number of paired observations (x, y) in the sample data) exceeds 30.
(d) There exists a straight line y = α + βx such that, for each value of x, the mean µy of the corresponding population of y-values lies on that straight line.

2. If the assumptions for regression inference are met, then a normal probability plot of the residuals should be
(a) Bell-shaped
(b) A group of randomly scattered points
(c) Roughly linear
(d) Clearly curved

3. If a test of hypotheses rejects H0: β = 0 in favor of the alternative hypothesis Ha: β > 0, where β is the population regression slope, then the least-squares regression line
(a) Slopes downward and to the right when plotted on the scatterplot of paired observations (x, y)
(b) Is useful for predicting y given x (within the limits of x-values covered by the data)
(c) Can be extrapolated beyond the limits of the x-values covered by the data to predict y at any possible x
(d) Is not useful for predicting y given x

4. Inference about the population regression slope is based on which of the following distributions?
(a) The t distribution with n – 1 degrees of freedom
(b) The standard normal distribution
(c) The chi-square distribution with n – 1 degrees of freedom
(d) The t distribution with n – 2 degrees of freedom

5. Suppose that inference for regression is conducted on the following small data set:
    x    y
    12   2
    14   3
    16   5
    18   6
The number of degrees of freedom for our test statistic is
(a) 4
(b) 3
(c) 2
(d) Inference cannot be conducted on this data set because it is too small.
(e) The answer cannot be determined from the information given.

6. In inference for regression, the statistic s represents
(a) The estimate of the standard deviation σ in the regression model.
(b) The standard deviation of the x-values in the paired observations (x, y).
(c) The estimate of the y-intercept.
(d) The standard deviation of the y-values in the paired observations (x, y).

The following information is used for questions 7–10. The effects of a toxic pollutant on fish were examined by placing fish in a two-liter solution of water with various concentrations of the pollutant. The time (in minutes) until the fish showed distress was recorded, and the fish were then removed from the container. A total of 18 different experiments were performed. Note that the pollutant concentration is measured on a logarithmic scale, where an increase of one unit represents a tenfold increase in concentration. A preliminary plot of the data showed that the relationship of time vs. log(pollution) was approximately linear. The output appears below:

    SOURCE       DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
    MODEL         1       2.21459712    2.21459712      5.49   0.0324
    ERROR        16       6.45556062    0.40347254
    CORR. TOTAL  17       8.67015774

    PARAMETER   ESTIMATE   T FOR H0: PARAMETER=0   PR > |T|   STD ERROR OF ESTIMATE
    INTERCEPT     7.5641                    3.82     0.0015                   1.978
    LOGPOLLUT    -1.0269                   -2.34     0.0324                   0.438

7. The fitted regression line is:
(a) ŷ = –1.03 + 7.56 x
(b) ŷ = 7.56 – 1.03 x
(c) ŷ = 3.28 – 2.34 x
(d) ŷ = 7.56 – 10.27 x
(e) ŷ = –1.03 + 75.64 x

8. A 95% confidence interval for the slope is:
(a) 7.56 ± 1.96 (1.978)
(b) –1.03 ± 1.96 (0.438)
(c) 7.56 ± 2.110 (1.978)
(d) –1.03 ± 2.110 (0.438)
(e) –1.03 ± 2.120 (0.438)

9. An appropriate null and alternative hypothesis for testing the slope, together with the test statistic and the p-value, are:
(a) H0: β = 0, Ha: β ≠ 0, t = –2.34, and p-value = 0.0324
(b) H0: β = 0, Ha: β ≠ 0, t = 3.82, and p-value = 0.0007
(c) H0: β = 0, Ha: β < 0, t = –2.34, and p-value = 0.0324
(d) H0: β = 0, Ha: β ≠ 0, t = 3.82, and p-value = 0.0015
(e) H0: β = 0, Ha: β < 0, t = –2.34, and p-value = 0.0162

10. Which of the following is a reasonable conclusion?
(a) There is a positive linear relationship between log(pollutant) concentration and time to distress.
(b) There is a negative linear relationship between log(pollutant) concentration and time to distress.
(c) There is no identifiable relationship between log(pollutant) concentration and time to distress.
(d) The time to distress is due to sampling variability and independent of pollutant.
(e) The sample size is too small to allow reasonable statistical inference.

Answers: 1. C, 2. C, 3. B, 4. D, 5. C, 6. A, 7. B, 8. E, 9. E, 10. B
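For question 5 (and the statistic s in question 6), the following Python sketch fits the least-squares line to the four points by hand, using only the standard library. The function name fit_line is hypothetical, chosen for illustration. Estimating both the intercept and the slope uses up two degrees of freedom, so s divides the sum of squared residuals by n − 2 = 2.

```python
import math

# Small data set from question 5
x = [12, 14, 16, 18]
y = [2, 3, 5, 6]

def fit_line(x, y):
    """Least-squares fit; returns intercept a, slope b, s, SE(b), and df."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                    # two parameters (a and b) are estimated
    s = math.sqrt(sse / df)       # estimate of sigma in the regression model
    se_b = s / math.sqrt(sxx)     # standard error of the slope
    return a, b, s, se_b, df

a, b, s, se_b, df = fit_line(x, y)
print(f"y-hat = {a:.2f} + {b:.2f} x, df = {df}, t = {b / se_b:.2f}")
```

For these four points the fitted line is ŷ = −6.5 + 0.7x with 2 degrees of freedom. The small sample makes the t distribution heavy-tailed, but inference is still possible, so option (d) is not the answer.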
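The numbers behind questions 6, 8 and 9 can be checked directly from the printed output. The Python sketch below does this with the standard library only. Because Python has no built-in t distribution, the helpers t_pdf, t_upper_tail and t_critical are our own illustrative functions (not from the text or any library); they integrate the t density numerically with Simpson's rule, which is accurate enough here but is only a stand-in for a statistics package.

```python
import math

# Values read from the regression output for the fish data (n = 18)
n = 18
mse = 0.40347254           # ERROR mean square
b, se_b = -1.0269, 0.438   # LOGPOLLUT estimate and its standard error
df = n - 2                 # 16, the ERROR degrees of freedom

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """Critical value t* with central area conf, found by bisection."""
    alpha = (1 - conf) / 2
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # upper tail still too large, so t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2

s = math.sqrt(mse)                         # question 6: s estimates sigma
t_stat = b / se_b                          # question 9: t = b / SE(b)
p_two = 2 * t_upper_tail(abs(t_stat), df)  # two-sided p-value
p_one = p_two / 2                          # Ha: beta < 0, and t_stat is negative
t_star = t_critical(0.95, df)              # question 8: critical value for df = 16
ci = (b - t_star * se_b, b + t_star * se_b)

print(f"s = {s:.4f}, t = {t_stat:.2f}, two-sided p = {p_two:.4f}, one-sided p = {p_one:.4f}")
print(f"t* = {t_star:.3f}, 95% CI for slope = ({ci[0]:.3f}, {ci[1]:.3f})")
```

To rounding, s is about 0.635 (the square root of the error mean square), t is about −2.34, the one-sided p-value is half of the printed 0.0324, and t* is about 2.120 for 16 degrees of freedom. That is why option (e) in question 8 matches, while option (d) uses 2.110, the value for 17 degrees of freedom. The resulting interval, roughly (−1.96, −0.10), lies entirely below zero, consistent with the negative relationship in question 10.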