10 - Employees

Part IV: Complete Solutions, Chapter 10 553 Chapter 10: Correlation and Regression Section 10.1 1. The explanatory variable is graphed along the horizontal or x axis. The response variable is graphed along the vertical or y axis. 2. If two variables are positively correlated, then the response variable will increase as the explanatory variable increases. 3. If two variables are negatively correlated, then the response variable will decrease as the explanatory variable increases. 4. (a) There is a strong negative linear correlation. (b) There is a weak or no linear correlation. (c) There is a strong positive linear correlation. 5. (a) The points seem close to a straight line, so there is moderate linear correlation. (b) A straight line does not fit the data well, so there is no linear correlation. (c) The points seem very close to a straight line, so there is high linear correlation. 6. (a) The points seem close to a straight line, so there is moderate linear correlation. (b) The points seem very close to a straight line, so there is high linear correlation. (c) A straight line does not fit the data well, so there is no linear correlation. 7. (a) No. The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect. Just because two variables tend to increase or decrease together does not mean a change in one is causing a change in the other. A strong correlation between x and y is sometimes due to the presence of lurking variables. (b) Increasing population might be a lurking variable causing both variables to increase. 8. (a) No. The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect. Just because two variables tend to increase or decrease together does not mean a change in one is causing a change in the other. A strong correlation between x and y is sometimes due to the presence of lurking variables. (b) Rising cost of living might be a lurking variable causing both variables to increase. 9. (a) No. The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect. Just because two variables tend to increase or decrease together does not mean a change in one is causing a change in the other. A strong correlation between x and y is sometimes due to the presence of lurking variables. (b) One lurking variable for the increase in average annual income is inflation. Better training might be a lurking variable responsible for shorter times to run the mile. 10. (a) No. The correlation coefficient is a mathematical tool for measuring the strength of a linear relationship between two variables. As such, it makes no implication about cause or effect. Just because two variables tend to increase or decrease together does not mean a change in one is causing a change in the other. A strong correlation between x and y is sometimes due to the presence of lurking variables. (b) A lurking variable might be better health care and treatments. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 554 11. (a) Scatterplot of Kilograms vs Months 200 175 Kilograms 150 125 100 75 50 0 5 10 15 20 25 Months (b) Since the points lie close to a straight line, the correlation is strong and positive. n xy    x   y  5(9,930)  (63)(650)   0.972 (c) r  2 2 5(1, 089)  (63) 2 5(95,350)  (650) 2 n x 2    x  n y 2    y  Since r is positive, y should tend to increase as x increases. (a) Scatterplot of Administrative Cost % vs Number of Employees 40 Administrative Cost % 12. 35 30 25 20 15 0 10 20 30 40 50 Number of Employees 60 70 80 (b) Since the points lie fairly close to a straight line, the correlation is moderate and negative. n xy    x   y  5(3, 040)  (135)(148)   0.945 (c) r  2 2 2 5(4, 674)  (148) 2 2 2 5(7,133)  (135) n x    x  n y    y  Since r is negative, y should tend to decrease as x increases. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 555 13. (a) Scatterplot of Wind mph vs Millibars 150 Wind mph 125 100 75 50 930 940 950 960 970 Millibars 980 990 1000 1010 (b) Since the points lie very close to a straight line, the correlation is strong and negative. n xy    x   y  6(556,315)  (5,823)(580)   0.990 (c) r  2 2 6(5, 655, 779)  (5,823) 2 6(65, 750)  (580) 2 n x 2    x  n y 2    y  Since r is negative, y should tend to decrease as x increases. 14. (a) Scatterplot of Depth km vs Magnitude 12 11 10 Depth km 9 8 7 6 5 4 3 2.5 3.0 3.5 Magnitude 4.0 4.5 (b) Since the points are not close to a straight line, the correlation is low and positive. n xy    x   y  7(190.18)  (24.1)(53.5)   0.511 (c) r  2 2 7(85.75)  (24.1)2 7(458.31)  (53.5)2 n x 2    x  n y 2    y  Since r is positive, y should tend to increase as x increases. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 556 15. (a) Scatterplot of Home Run Percentage vs Batting Average 8 Home Run Percentage 7 6 5 4 3 2 1 0.250 0.275 0.300 Batting Average 0.325 0.350 (b) Since the points are very close to a straight line, the correlation is strong and positive. n xy    x   y  7(8.753)  (1.957)(30.1)   0.948 (c) r  2 2 2 2 7(0.553)  (1.957)2 7(150.15)  (30.1) 2 n x    x  n y    y  Since r is positive, y should tend to increase as x increases. 16. (a) Scatterplot of Number of Burglaries vs Student Enrollment (1000s) 80 Number of Burglaries 70 60 50 40 30 20 10 10 15 20 Student Enrollment (1000s) 25 30 (b) Since the points are fairly close to a straight line, the correlation is moderate and positive. n xy    x   y  8(5, 488.4)  (152.8)(246)   0.765 (c) r  2 2 8(3,350.98)  (152.8)2 8(10, 030)  (246)2 n x 2    x  n y 2    y  Since r is positive, y should tend to increase as x increases. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 17. (a) The axes are scaled equally. (b) The y axis is twice the x axis. Copyright © Houghton Mifflin Company. All rights reserved. 557 Part IV: Complete Solutions, Chapter 10 558 (c) Scatterplot of y vs x 9 8 7 6 y 5 4 3 2 1 1 2 3 4 5 6 7 x The x axis is twice the y axis. (d) Stretching the scale on the y axis makes the line appear steeper. Shrinking the scale on the y axis makes the line appear flatter. The slope of the line does not change. Only the appearance of the slope changes as the scale of the y axis changes. 18. (a) We get the same result. Notice that in the numerator of the formula for r, xy  yx and (x)(y)  (y)(x). Then notice that the denominator n x 2    x  2 n y 2    y   n  y 2    y  2 2 n x 2    x  . 2 (b) We get the same result. (c) First set: 3(29)  (8)(9) r  0.61859 3(26)  (8) 2 3(41)  (9)2 Second set: 3(29)  (9)(8) r  0.61859 3(41)  (9)2 3(26)  (8) 2 19. (a) r  0.972 with n = 5 Since |0.972| = 0.972  0.88 for  = 0.05, r is significant, and we conclude that age and weight of Shetland ponies are correlated. (b) r  0.990 with n = 6 Since |0.990| = 0.990  0.92 for  = 0.01, r is significant, and we conclude that the lowest barometric pressure reading and maximum wind speed for cyclones are correlated. 20. (a) No, becaquse |0.820| = 0.820 < 0.87 for n = 7 and  = 0.01. Yes, because |0.900| = 0.900  0.80 for n = 9 and  = 0.01. (b) No, because |0.40| = 0.40 < 0.44 for n = 20 and  = 0.05. Yes, because |0.40| = 0.40  0.38 for n = 27 and  = 0.05. (c) No; no; no; a larger sample size means that a smaller |r| might be significant. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 559 21. (a) Scatterplot of Gallons Wasted vs Hours 60 Gallons Wasted 50 40 30 20 10 0 5 10 15 20 Hours n xy    x   y  r n x    x  2 2 n y    y  2 2 25  30 35 8(6, 067)  (154)(249) 8(3, 712)  (154)2 8(9,959)  (249)2  0.991 (b) For variables based on averages, x  19.25 h; sx  10.33 h; y  31.13 gal; s y  17.76 gal. For variables based on single individuals, x  20.13 h; sx  13.84 h; y  31.87 gal; s y  25.18. Dividing by larger numbers results in a smaller value. (c) Scatterplot of Gallons Wasted vs Hours 70 Gallons Wasted 60 50 40 30 20 10 0 0 r 10 20 Hours 30 8(7, 071)  (161)(255) 8(4,583)  (161) 2 8(12,565)  (255) 2 40  0.794 (d) 0.991 > 0.791 Yes, by the central limit theorem, the x distribution has a smaller standard deviation than the corresponding x distribution. 22. (a) No. Interest rate probably affects both investment returns. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 560 (b) For w = 0.6x + 0.4y, a = 0.6, b = 0.4  w  a  x  b y  w  0.6(7.32)  0.4(13.19)  w  9.67  w2  a 2 x2  b 2 2y  2ab x y   w2  (0.6)2 (6.59)2  (0.4)2 (18.56)2  2(0.6)(0.4)(6.59)(18.56)(0.424)  w2  95.64  w  95.64  9.78 (c) For w = 0.4x + 0.6y, a = 0.4, b = 0.6  w  0.4(7.32)  0.6(13.19)  w  10.84  w2  (0.4)2 (6.59)2  (0.6)2 (18.56)2  2(0.4)(0.6)(6.59)(18.56)(0.424)  w2  155.85  w  155.85  12.48 (d) w  0.4x  0.6 y produces higher returns with greater risk as measured by  w . Section 10.2 Note: In Sections 10.2, 10.3, and 10.4, answers may vary slightly depending on rounding considerations. 1. The slope is b = –2. When x increases by 1 unit, y decreases by 2 units. 2. The marginal change is the slope, b = 3 units. 3. We are extrapolating. Extrapolation is dangerous because the pattern of data may change outside the x range. 4. It is negative. 5. (a) yˆ  318.16  30.878x (b) There will be about 31 few frost-free days (30.878). (c) Here, r ≈ –0.981. Note that if the slope is negative, then r is also negative. 6. (a ) yˆ  0.8565  0.40248x (b) It increases about 40.25 kcal/24 h. (c ) Here, r ≈ 0.984. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 561 7. (a) Scatterplot of Entry Level Jobs vs Jobs 9 8 Entry Level Jobs 7 6 5 4 3 2 1 15 20 25 30 35 40 45 50 Jobs (b) Use a calculator to verify.  x  202  33.67 jobs (c) x  n 6 y 28    4.67 entry-level jobs y n 6 n xy    x   y  6(1, 096)  (202)(28) b   0.161 2 6(7, 754)  (202) 2 n x 2    x  a  y  bx  4.67  (0.161)(33.67)  0.748 ŷ  a  bx or yˆ  0.748  0.161x (d) See figure of part (a). (e) r 2  (0.860)2  0.740 This means that 74.0% of the variation in y = number of entry-level jobs can be explained by the corresponding variation in x = total number of jobs using the least-squares line. 100%  74.0% = 26.0% of the variation is unexplained. (f) Use x = 40. yˆ  0.748  0.161(40) yˆ  5.69 entry-level jobs 8. (a) Scatterplot of Weight (kg) vs Age (Weeks) 200 175 Weight (kg) 150 125 100 75 50 0 10 20 Age (Weeks) (b) Use a calculator to verify. Copyright © Houghton Mifflin Company. All rights reserved. 30 40 Part IV: Complete Solutions, Chapter 10 562 (c) x   x  92  15.33 weeks n 6  y  617  102.83 kg y b n 6 n xy    x   y  n x 2    x  2  6(13, 642)  (92)(617) 6(2,338)  (92) 2  4.509 a  y  bx  102.83  (4.509)(15.33)  33.70 yˆ  33.70  4.51x (d) See figure of part (a). (e) r 2  (0.998)2  0.996 This means that 99.6% of the variation in y = weight can be explained by the corresponding variation in x = age using the least-squares line. 100%  99.6% = 0.4% of the variation is unexplained. (f) Use x = 12. yˆ  33.70  4.51(12) yˆ  87.8 kg (a) Scatterplot of MPG vs Weight (100s lbs) 30 25 MPG 9. 20 15 10 20 25 30 35 40 Weight (100s lbs) 45 50 55 (b) Use a calculator to verify.  x 299   37.375 n 8  y 167 y   20.875 n 8 n xy    x   y  8(5,814)  (299)(167) b   0.6007 2 8(11,887)  (167) 2 n x 2    x  a  y  bx  20.875  (0.6007)(37.375)  43.326 yˆ  a  bx or yˆ  43.326  0.6007 x (d) See figure of part (a). (c) x  (e) r 2  (0.946)2  0.895 This means that 89.5% of the variation in y = miles per gallon can be explained by the corresponding variation in x = weight using the least-squares line. 100%  89.5% = 10.5% of the variation is unexplained. (f) Use x = 38. yˆ  43.326  0.6007(38)  20.5 mpg Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 563 10. (a) Scatterplot of Winning Percentage vs Number of Fouls Winning Percentage 50 45 40 35 30 25 0 1 2 3 Number of Fouls 4 5 6 (b) Use a calculator to verify.  x 13 (c) x    3.25 n 4  y 154 y   38.5 n 4 n xy    x   y  4(411)  (13)(154) b   3.934 2 4(65)  (13) 2 n x 2    x  a  y  bx  38.5  (3.934)(3.25)  51.29 yˆ  a  bx or yˆ  51.29  3.934 x (d) See the figure in part (a). (e) r 2  (0.988)2  0.976 This means that 97.6% of the variation in y = percent of wins can be explained by the variation in x = excess fouls using the least-squares line. 100%  97.6% = 2.4% of the variation is unexplained. (f) Use x = 4. yˆ  51.29  3.934(4)  35.55% 11. (a) Scatterplot of Speeding-Related Fatal % vs Age Speeding-Related Fatal % 40 30 20 10 0 10 20 30 40 50 Age (b) Use a calculator to verify. Copyright © Houghton Mifflin Company. All rights reserved. 60 70 80 Part IV: Complete Solutions, Chapter 10 564  x 329   47 years n 7  y 115 y   16.43% n 7 n xy    x   y  7(4015)  (329)(115) b ,  0.496 2 7(18, 263)  (329) 2 n x 2    x  a  y  bx  16.43  (0.496)(47)  39.761 yˆ  a  bx or yˆ  39.761  0.496 x (d) See the figure in part (a). (c) x  (e) r 2  (0.959)2  0.920 This means that 92% of the variation in y = percentage of all fatal accidents due to speeding can be explained by the corresponding variation in x = age in years of a licensed automobile driver using the least-squares line. 100%  92% = 8% of the variation is unexplained. (f) Use x = 25. yˆ  39.761  0.496(25)  27.36% 12. (a) Scatterplot of Right-of-Way Fatal % vs Age Right-of-Way Fatal % 40 30 20 10 0 30 40 50 60 Age 70 80 90 (b) Use a calculator to verify.  x 372 (c) x    62 years n 6  y 112 y   18.67% n 6 n xy    x   y  6(8, 254)  (372)(112) b   0.749 2 6(24,814)  (372)2 n x 2    x  a  y  bx  18.67  0.749(62)  27.745 yˆ  a  bx or yˆ  27.75  0.75 x (d) See the figure in part (a). (e) r 2  (0.943)2  0.889 This means that 88.9% of the variation in y = percentage of fatal accidents due to failure to yield to right of way can be explained by the corresponding variation in x = age of a licensed driver in years using the least-squares line. 100%  88.9% = 11.1% of the variation is unexplained. (f) Use x = 70. yˆ  27.75  0.75(70) yˆ  24.75% Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 565 13. (a) Scatterplot of Number of Doctors vs Per Capita Income 22 Number of Doctors 20 18 16 14 12 10 8.0 8.5 9.0 Per Capita Income 9.5 10.0 (b) Use a calculator to verify.  x 53 (c) x    $8.83 thousand n 6  y 83.7 y   13.95 physicians per 10,000 n 6 n xy    x   y  6(755.89)  (53)(83.7) b   5.756 2 6(471.04)  (53) 2 n x 2    x  a  y  bx  13.95  5.756(8.83)  36.898 yˆ  a  bx or yˆ  36.898  5.756 x (d) See the figure in part (a). (e) r 2  (0.934)2  0.872 This means that 87.2% of the variation in y = number of medical doctors per 10,000 residents can be explained by the corresponding variation in x = per capita income in thousands of dollars using the least-squares line. 100%  87.2% = 12.8% of the variation is unexplained. (f) Use x = 10. yˆ  36.898  5.756(10)  20.7 physicians per 10,000 residents 14. (a) Scatterplot of % Change Imprisonment Rate vs % Change Violent Crime % Change Imprisonment Rate 5.0 2.5 0.0 -2.5 -5.0 -7.5 3 4 5 6 7 8 9 % Change Violent Crime Copyright © Houghton Mifflin Company. All rights reserved. 10 11 12 Part IV: Complete Solutions, Chapter 10 566 (b) Use a calculator to verify.  x 44.7 (c) x    6.386% n 7  y 17.4 y   2.486% n 7 n xy    x   y  7( 107.18)  (44.7)( 17.4) b   0.129 2 7(315.85)  (44.7) 2 n x 2    x  a  y  bx  2.486  (0.129)(6.386)  3.311 yˆ  a  bx or yˆ  3.311  0.129 x (d) See the figure in part (a). (e) r 2  (0.084)2  0.007 This means that 0.7% of the variation in y = percent change in the rate of imprisonment can be explained by the corresponding variation in x = percent change in the rate of violent crime using the least-squares line. 100%  0.7% = 99.3% of the variation is unexplained. (f) The correlation between the variables is so low that it does not make sense to use the least-squares line for prediction. 15. (a) Number of Violent vs % Not Attending Number of Violent Crimes / 1000 14 12 10 8 6 4 2 0 15 16 17 18 19 20 21 22 % Not Attending / Not Graduated 23 24 (b) Use a calculator to verify.  x 112.8 (c) x    18.8% n 6  y 32.4 y   5.4 crimes per 1,000 n 6 n xy    x   y  6(665.03)  (112.8)(32.4) b   1.202 2 6(2,167.14)  (112.8) 2 n x 2    x  a  y  bx  5.4  1.202(18.8)  17.204 yˆ  a  bx or yˆ  17.204  1.202 x (d) See the figure in part (a). (e) r 2  (0.764)2  0.584 This means that 58.4% of the variation in y = reported violent crimes per 1,000 residents can be explained by the corresponding variation in x = percentage of 16- to 19-year-olds not in school and not high-school graduates using the least-squares line. 100%  58.4% = 41.6% of the variation is unexplained. (f) Use x = 24. yˆ  17.204  1.202(24)  11.6 crimes per 1,000 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 567 16. (a) Scatterplot of Mean Patents vs Research Programs 2.0 1.8 Mean Patents 1.6 1.4 1.2 1.0 0.8 0.6 10 12 14 16 Research Programs 18 20 (b) Use a calculator to verify.  x 90 (c) x    15.0 n 6  y 8.1 y   1.35 n 6 n xy    x   y  6(113.8)  (90)(8.1) b   0.1100 2 6(1420)  (90) 2 n x 2    x  a  y  bx  1.35  (0.1100)(15.0)  3.0 yˆ  a  bx or yˆ  3.0  0.11x (d) See the figure in part (a). (e) r 2  (0.973)2  0.946 This means that 94.6% of the variation in y = number of patents can be explained by the corresponding variation in x = number of programs using the least-squares line. 100%  94.6% = 5.4% of the variation is unexplained. (f) Use x = 15. yˆ  3.0  0.11(15)  1.35 patents 17. (a) Scatterplot of % Unidentified Artifacts vs Elevation % Unidentified Artifacts 60 50 40 30 20 10 5.0 5.5 6.0 6.5 Elevation (b) Use a calculator to verify. Copyright © Houghton Mifflin Company. All rights reserved. 7.0 7.5 Part IV: Complete Solutions, Chapter 10 568  x 31.25   6.25 n 5  y 164 y   32.8 n 5 n xy    x   y  5(1, 080)  (31.25)(164) b   22.0 2 5(197.813)  (31.25) 2 n x 2    x  a  y  bx  32.8  22.0(6.25)  104.5 yˆ  a  bx or yˆ  104.7  22.0 x (d) See the figure in part (a). (c) x  (e) r 2  (0.913)2  0.833 This means that 83.3% of the variation in y = % unidentified artifacts can be explained by the corresponding variation in x = elevation. 100%  83.3% = 16.7% of the variation is unexplained. (f) Use x = 6.5. yˆ  104.7  22.0(6.5)  38.3 percent 18. (a) Scatterplot of Temperature vs Chirps / Second 95 Temperature 90 85 80 75 70 14 15 16 17 18 Chirps / Second 19 20 (b) Use a calculator to verify.  x 249.8 (c) x    16.65 n 15  y 1, 200.6 y   80.04 n 15 n  xy    x   y  15(20,127.47)  (249.8)(1, 200.6) b   3.291 2 15(4, 200.56)  (249.8)2 n x 2    x  a  y  bx  80.04  3.299(16.65)  25.232 yˆ  a  bx or yˆ  25.232  3.291x (d) See the figure in part (a). (e) r 2  (0.835)2  0.697 This means that 69.7% of the variation in y = temperature can be explained by the corresponding variation in x = chirps per second using the least-squares line. 100%  69.7% = 30.3% of the variation is unexplained. (f) Use x = 19. yˆ  25.232  3.291(19)  87.7F Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 569 19. (a) Yes. The pattern of residuals appears randomly scattered around the horizontal line at 0. (b) No. There do not appear to be any outliers. 20. (a) x y y p  43.3263  0.6007 x Residual = y  y p 27 44 32 47 23 40 34 52 30 19 24 13 29 17 21 14 27.1 16.9 24.1 15.1 29.5 19.3 22.9 12.1 2.9 2.1 0.1 2.1 0.5 2.3 1.9 1.9 Residuals Versus Weight (response is MPG) 3 2 Residual 1 0 -1 -2 -3 20 25 30 35 40 45 50 55 Weight (b) The residuals seem to be scattered randomly around the horizontal line at 0. There do not appear to be any outliers. 21. (a) (b) (c) (d) Result checks. Result checks. Yes. y  0.143  1.071x y  0.143  1.071x y  0.143 x 1.071 1 0.143 y x 1.071 1.071 or x  0.9337 y  0.1335 The equation x = 0.9337y  0.1335 does not match part (b) with the symbols x and y exchanged. (e) In general, switching x and y values produces a different least-squares equation. It is important that when you perform a linear regression, you know which variable is the explanatory variable and which is the response variable. 22. (a) (b) (c) (d) No, a straight line does not fit the data well. The data seem to explode as x increases. The transformed data fit a straight line better. Use a calculator to verify. Use a calculator to verify. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 570 (e)   100.15  1.413   100.404  2.535 y  1.413  2.535 x 23. (a) Scatterplot of Number of Locusts vs Day of Observation 700 Number of Locusts 600 500 400 300 200 100 0 1 2 3 4 5 6 7 Day of Observation 8 9 10 These data are not described well by a straight line because the y values seem to explode as x increases. (b) Scatterplot of Log Locusts vs Day of Observation 3.0 2.5 Log Locusts 2.0 1.5 1.0 0.5 0.0 1 2 3 4 5 6 7 Day of Observation 8 9 10 The transformed data are better represented by a straight line. (c) yˆ  0.365  0.311x r  0.998 (d)   100.365  0.432   100.311  2.046 y  0.432  2.046  x Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 571 24. (a) Scatterplot of Log Amps vs Log Microseconds 0.65 0.60 Log Amps 0.55 0.50 0.45 0.40 0.35 0.30 0.25 0.3 0.4 0.5 0.6 0.7 Log Microseconds 0.8 0.9 1.0 The transformed data seem to be represented well by a straight line. (b) Using the transformed data, yˆ  0.128  0.492 x r  0.987 (c)   100.128  1.343   b  0.492 0.492 yˆ  1.343  x  25. (a) Scatterplot of Log Critical Sheer Strength vs Log Boiler Pressure 1.2 Log Critical Sheer Strength 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.6 0.7 0.8 Log Boiler Pressure 0.9 1.0 The transformed data seem to be represented well by a straight line. (b) Using the transformed data, yˆ  0.451  1.600 x r  0.991 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 572 (c)   100.451  0.351   b  1.600 1.600 yˆ  0.354  x  Section 10.3 1. The symbol is rho, ρ. 2. The symbol is beta, β. 3. As x moves farther from x , the confidence interval for the predicted y becomes wider. 4. The t value for b equals the t value for r. 5. (a) The predictor variable is diameter. (b) Here, a = –0.223, b = 0.8748, and yˆ  0.223  0.7848x. (c) The P value of b is 0.001. We are testing H0: β = 0 versus H1: β ≠ 0, and we reject H0 and conclude that the slope is not 0. (d) Here, r ≈ 0.896. Yes, the correlation coefficient is significant at the α = 0.01 level because the P value < α. 6. (a) Se = 4.08. (b) The standard error for the slope is approximately 0.1471. (c ) Here, d.f. = n – 2 = 7 and tc = t0.95 = 2.365. E ≈ 2.365 × 0.1471 = 0.348. The interval is 0.7848  0.348   0.437, 1.113 . 7. (a) Use a calculator to verify. (b)  = 0.05 H0 :   0 H1:   0 t r n2  0.784 6  2  2.526 1 r 1  0.7842 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, 2.526 falls between entries 2.132 and 2.776. Use the one-tailed areas to find that 0.025 < P value < 0.050. Since the P-value interval  0.05, we reject H 0 . At the 5% level of significance, there seems to be a positive correlation between x and y. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  16.542  0.4117 x Use x = 70. yˆ  16.542  0.4117(70)  45.36% 2 (e) d.f. = 4, tc  2.132 E  tc Se 1  1 n( x  x ) 2 1 6(70  73.167)2   2.132(2.6964) 1    6.31 n n  x 2    x 2 6 6(32.393)  (439)2 The 90% confidence interval is yˆ  E  y  yˆ  E 45.36  6.31  y  4,536  6.31 39.05  y  51.67 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 573 (f)  = 0.05 H0:   0 H1:   0 b 1 0.4117 1 2 32,393  (439)2  2.522  x2    x   Se n 2.6964 6 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, 2.522 falls between entries 2.132 and 2.776. Use the one-tailed areas to find that 0.025 < P value < 0.050. Since the P-value interval < 0.05, we reject H 0 . At the 5% level of significance, there seems to be a positive slope between x and y. (g) d.f. = 4, tc  2.132, b  0.4117 t E tc Se  x2  1n  x  2  2.132(2.6964) 32,393  16 (439)2  0.3480 The 90% confidence interval is bE   bE 0.4117  0.3480    0.4117  0.3480 0.064    0.760 For every percentage increase in successful free throws, the percentage of successful field goals increases by an amount between 0.06 and 0.76. 8. (a) Use a calculator to verify. (b)  = 0.05 H0 :   0 H1:   0 t r n2 1 r2  0.891 6  2 1  (0.891)2  3.93 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, 3.93 falls between entries 3.747 and 4.604. Use the two-tailed areas to find that 0.01 < P value < 0.020. Using a TI-84, P value ≈ 0.0171. Since the P-value interval  0.05, we reject H 0 . At the 5% level of significance, there seems to be a correlation between x and y. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  26.247  65.081x Use x = 0.300. yˆ  26.247  65.081(0.300)  6.72 (e) d.f. = 4, tc  1.533, x   x  1.842  0.307 n 6 1 n( x  x )2 1 6(0.300  0.307)2 E  tc Se 1    1.533(1.6838) 1    2.79 n n  x 2    x 2 6 6(0.575838)  (1.842)2 The 80% confidence interval is yˆ  E  y  yˆ  E 6.72  2.79  y  6.72  2.79 3.93  y  9.51 (f)  = 0.05 H0:   0 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 574 H1:   0 b 1 65.081 1 2 0.575838  (1.842)2  3.93  x2    x   Se n 1.6838 6 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, 3.93 falls between entries 3.747 and 4.604. Use the two-tailed areas to find that 0.01 < P value < 0.020. Using a TI-84, P value ≈ 0.0171. Since the P-value interval  0.05, we reject H 0 . At the 5% level of significance, the slope seems to be significant. t (g) d.f. = 4, tc  2.132, b  65.081 E tc Se  x2  x  1n 2 2.132(1.6838)  0.575838  16 (1.842)2  35.297 The 90% confidence interval is bE   bE 65.081  35.297    65.081  35.297 100.38    29.78 For every unit increase in batting average, the percentage strikeouts decrease by between 100% and 29%. 9. (a) Use a calculator to verify. (b)  = 0.01 H0 :   0 H1:   0 t r n2 1 r2  0.976 7  2 1  (0.976)2  10.02 d.f. = n  2 = 7  2 = 5 From Table 6 in Appendix II, 10.02 falls to the right of entry 6.869. Use the one-tailed areas to find P value < 0.0005. Using a TI-84, P value ≈ 0.00008. Since the P-value interval  0.01, we reject H 0 . At the 1% level of significance, the evidence supports a negative correlation between x and y. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  3.366  0.0544 x Use x = 18. yˆ  3.366  0.0544(18)  2.39 hours (e) d.f. = 5, tc  1.476, x   x  201.4  28.77 n 7 n( x  x )2 1 1 7(18  28.77)2 E  tc Se 1    1.476(0.1660) 1    0.276 n n  x 2    x 2 7 7(6, 735.46)  (201.4)2 The 80% confidence interval is yˆ  E  y  yˆ  E 2.39  0.276  y  2.39  0.276 2.11  y  2.67 (f)  = 0.01 H0:   0 H1:   0 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 575 b 1 0.0544 1 2 6,735.46  (201.4)2  10.02  x2    x   Se n 0.1660 7 From Table 6 in Appendix II, 10.02 falls to the right of entry 6.869. Use the one-tailed areas to find P value < 0.0005. Using a TI-84, P value ≈ 0.00008. Since the P-value interval  0.01, we reject H 0 . At the 1% level of significance, the sample evidence supports a negative slope. (g) d.f. = 5, tc  2.015, b  0.0544 t E tc Se 2  x2  1n  x  2.015(0.1660)  6, 735.46  17 (201.4)2  0.011 The 90% confidence interval is bE    b E 0.054  0.011    0.054  0.011 0.065    0.043 For a 1-m increase in depth, the optimal time decreases from about 0.04 to 0.07 hour. 10. (a) Use a calculator to verify. (b)  = 0.01 H0 :   0 H1:   0 t r n2 1 r2  0.984 5  2 1  (0.984)2  9.57 d.f. = n  2 = 5  2 = 3 From Table 6 in Appendix II, the sample test statistic falls between entries 5.841 and 12.924. Use the one-tailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0012. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive correlation. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  2.869  6.876 x Use x = 4.0. yˆ  2.869  6.876(4.0)  24.63 mm Hg/10. (e) d.f. = 3, tc  2.353, x  E  tc Se 1   x  21.4  4.28 n 5 n( x  x ) 2 1 1 5(4.0  4.28)2   2.353(2.5319) 1    6.54 n n  x 2    x 2 5 5(103.84)  (21.4)2 The 90% confidence interval is yˆ  E  y  yˆ  E 24.63  6.54  y  24.63  6.54 18.1  y  31.2 (f)  = 0.01 H0:   0 H1:   0 b 1 6.876 1 2 103.84  (21.4)2  9.504  x2    x   Se n 2.5319 5 d.f. = n  2 = 5  2 = 3 From Table 6 in Appendix II, the sample test statistic falls between entries 5.841 and 12.924. Use the one-tailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0012. t Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 576 Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive slope. (g) d.f. = 3, tc  3.182, b  6.876 E tc Se 2  x 2  1n  x  3.182(2.5319)  103.84  (21.4)2 5  2.302 The 95% confidence interval is bE   bE 6.876  2.302    6.876  2.302 4.57    9.18 For every one-unit increase in oxygen pressure breathing only available air, the oxygen pressure breathing pure oxygen increases from about 4.57 to 9.18 units. 11. (a) Use a calculator to verify. (b)  = 0.01 H0 :   0 H1:   0 t r n2 1 r2  0.956 6  2 1  (0.956) 2  6.517 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, the sample test statistic falls between entries 4.604 and 8.610. Use onetailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0014. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive correlation. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  1.965  0.7577 x Use x = 14. yˆ  1.965  0.7577(14)  $12.57 thousand. (e) d.f. = 4, tc  1.778, x   x  79.6  13.267 n 6 n( x  x )2 1 1 6(14  13.267)2 E  tc Se 1    1.778(0.1527) 1    0.329 n n x 2    x 2 6 6(1, 057.8)  (79.6)2 The 85% confidence interval is yˆ  E  y  yˆ  E 12.57  0.329  y  12.57  0.329 12.24  y  12.90 (f)  = 0.01 H0:   0 H1:   0 b 1 0.7577 1 2 (1,057.8)  (79.6)2  6.608  x2    x   Se n 0.1527 6 d.f. = n  2 = 6  2 = 4 From Table 6 in Appendix II, the sample test statistic falls between entries 4.604 and 8.610. Use onetailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0014. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive slope. t Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 577 (g) d.f. = 4, tc  2.776, b  0.7577 E tc Se  x2  1n  x  2 2.776(0.1527)  (1, 057.8)  (79.6)2 6  0.318 The 95% confidence interval is bE   bE 0.758  0.318    0.758  0.318 0.44    1.08 For every $1,000 increase in list price, there is an increase in dealer price of between $440 and $1,080. 12. (a) Use a calculator to verify. (b)  = 0.01 H0 :   0 H1:   0 t r n2 1 r2  0.977 5  2  7.936 1  (0.977)2 d.f. = n  2 = 5  2 = 3 From Table 6 in Appendix II, the sample test statistic falls between entries 5.841 and 12.924. Use onetailed areas to find that 0.0005 < P-value < 0.005. Using a TI-84, P-value ≈ 0.0021. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive correlation. (c) Use a calculator to verify. (d) ŷ  a  bx or yˆ  1.4084  0.8794 x Use x = 40. yˆ  1.4084  0.8794(40)  $36.58 thousand (e) d.f. = 3, tc  3.182, x   x  192.5  38.7 n 5 1 n( x  x )2 1 5(40  38.7)2 E  tc Se 1    3.182(1.5223) 1    5.33 n n  x 2    x 2 5 5(7, 676.71)  (193.5)2 The 95% confidence interval is yˆ  E  y  yˆ  E 36.58  5.33  y  36.58  5.33 31.25  y  41.91 (f)  = 0.01 H0:   0 H1:   0 b 1 0.8794 1 2 7,676.71  (193.5)2  7.926  x2    x   Se n 1.5223 5 d.f. = n  2 = 5  2 = 3 From Table 6 in Appendix II, the sample test statistic falls between entries 5.841 and 12.924. Use onetailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0021. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, the sample evidence supports a positive slope. t Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 578 (g) d.f. = 3, tc  2.353, b  0.8794 tc Se E 1 n  2.353(1.5223)  0.261 2 (193.5)    x 7, 676.71  5 The 90% confidence interval is bE   bE 0.879  0.261    0.879  0.261 0.618    1.140 For every $1,000 increase in list price, the dealer price is $618 to $1,140 higher. x2 2 13. (a)  = 0.01 H 0 :  0 H1:  0 r n2 0.90 6  2 t   4.129 2 1 r 1  (0.90) 2 d. f .  n  2  6  2  4 From Table 6 in Appendix II, 4.129 falls between entries 3.747 and 4.604. Use two-tailed areas to find 0.010 < P value < 0.020. Since the P-value interval > 0.01, we do not reject H 0 . The correlation coefficient  is not significantly different from 0 at the 0.01 level of significance. (b)  = 0.01 H0:   0 H1:   0 r n  2 0.90 10  2 t   5.840 1 r2 1  (0.90) 2 d . f .  n  2  10  2  8 From Table 6 in Appendix II, 5.840 falls to the right of entry 5.041. Use two-tailed areas to find P value < 0.001. Since the P value  0.01, we reject H 0 . The correlation coefficient  is significantly different from 0 at the 0.01 level of significance. (c) From part (a) to part (b), n increased from 6 to 10, and the test statistic t increased from 4.129 to 5.840. For the same r = 0.90 and the same level of significance   0.01, we rejected H 0 for the larger n but not for the smaller n.  r n2  In general, as n increases, the degrees of freedom (n – 2) increase, and the test statistic  t     1 r2   increases. This produces a smaller P value. 14. (a) Yes. The t values are equal (differences are due to rounding error). (b) Yes to all questions. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 579 Section 10.4 1. x1  1.6  3.5x2  7.9 x3  2.0 x4 (a) The response variable is x1. The explanatory variables are x2, x3, and x4. (b) The constant term is 1.6. The coefficient 3.5 corresponds to x2 . The coefficient –7.9 corresponds to x3 . The coefficient 2.0 corresponds to x4 . (c) x2  2, x3  1, x4  5 x1  1.6  3.5(2)  7.9(1)  2.0(5)  10.7 The predicted value is 10.7. (d) In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. x3 and x4 held constant, x2 increased by one unit, the change in x1 would be an increase of 3.5 units. x3 and x4 held constant, x2 increased by two units, the change in x1 would be an increase of 7 units. x3 and x4 held constant, x2 decreased by four units, the change in x1 would be a decrease of 14 units. (e) d . f .  n  k  1  12  3  1  8 A 90% confidence interval for the coefficient of x2 is b2  tS2  2  b2  tS2 . 3.5  1.86(0.419)   2  3.5  1.86(0.419) 2.72   2  4.28 (f)  = 0.05 H 0: 2  0 H1:  2  0 b   2 3.5  0 t 2   8.35 S2 0.419 d.f. = 8 From Table 6 in Appendix II, 8.35 falls to the right of entry 5.041. Use two-tailed areas to find that Pvalue < 0.001. Since 0.001  0.05, reject H 0 . We conclude that 2  0 and x2 should be included as an explanatory variable in the least-squares equation. 2. x3  16.5  4.0 x1  9.2 x4  1.1x7 (a) The response variable is x3 . The explanatory variables are x1, x4 , and x7 . (b) The constant term is –16.5. The coefficient 4.0 corresponds to x1. The coefficient 9.2 corresponds to x4 . The coefficient –1.1 corresponds to x7 . (c) x1  10, x4  1, x7  2 x3  16.5  4.0(10)  9.2(1)  1.1(2)  12.1 The predicted value is 12.1. (d) In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. x1 and x7 held constant, x4 increased by one unit, the change in x3 is an increase of 9.2 units. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 580 x1 and x7 held constant, x4 increased by three units, the change in x3 is an increase of 27.6 units x1 and x7 held constant, x4 decreased by two units, the change in x3 is a decrease of 18.4 units (e) d . f .  n  k  1  15  3  1  11 A 90% confidence interval for the coefficient of x4 is b4  tS4   4  b4  tS4 . 9.2  1.796(0.921)   4  9.2  1.796(0.921) 7.55   4  10.85 (f)  = 0.01 H 0 : 4  0 H1:  4  0 b   4 9.2  0 t 4   9.989 S4 0.921 d.f. = 11 From Table 6 in Appendix II, 9.989 falls to the right of entry 4.437. Use two-tailed areas to find that P value < 0.001. Since 0.001  0.01, we reject H 0 . We conclude that 4  0 and x4 should be included as an explanatory variable in the least-squares equation. 3. (a) s  100 x 9.08% CV  x s x1 150.09 13.63 x2 62.45 9.11 14.59% x3 195.0 17.31 8.88% Relative to its mean, x2 has the greatest spread of data values, and x3 has the smallest spread of data values. (b) r 2 x1x2  (0.979)2  0.958 r 2 x1x3  (0.971)2  0.943 r 2 x2 x3    0.895 The variable x2 has the greatest influence on x1 (0.958  0.943). Yes. Both variables x2 and x3 show a strong influence on x1 because 0.958 and 0.943 are close to 1. 95.8% of the variation of x1 can be explained by the corresponding variation in x2 . 94.3% of the variation of x1 can be explained by the corresponding variation in x3 . (c) R 2  0.977 97.7% of the variation in x1 can be explained by the corresponding variation in x2 and x3 taken together. (d) x1  30.99  0.861x2  0.355x3 In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. If age ( x2 ) were held fixed and x3 increased by 10 pounds, the systolic blood pressure would be expected to increase by 0.335(10) = 3.35. If weight ( x3 ) were held fixed and x2 increased by 10 years, the systolic blood pressure would be expected to increase by 0.861(10) = 8.61. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 581 (e)  = 0.05 H 0 : i  0 H1: i  0 d . f  n  k  1  11  2  1  8 For  2 , the sample test statistic is t = 3.47 with P value = 0.008. For  3 , the sample test statistic is t = 2.56 with P value = 0.034. Since 0.008  0.05 and 0.034  0.05, reject H 0 for each coefficient and conclude that the coefficients of x2 and x3 are not zero. Explanatory variables xi whose coefficients ( i ) are nonzero, contribute information in the least-squares equation; i.e., without these xi , the resulting least-squares regression equation is not as good a fit to the data as is the regression equation that includes these xi . (f) d . f .  8, t  1.86 A 90% confidence interval for i is bi  tSi  i  bi  tSi 0.861  1.86(0.2482)   2  0.861  1.86(0.2482) 0.40   2  1.32 0.335  1.86(0.1307)  3  0.335  1.86(0.1307) 0.09  3  0.58 (g) x1  30.99  0.861(68)  0.335(192)  153.9 Michael’s predicted systolic blood pressure is 153.9, and a 90% confidence interval for this new observation’s value, given these xi (i.e., the prediction interval), is 148.3 to 159.4. 4. (a) x s x1 79.04 12.28 s 100 x 15.53% x2 79.48 12.50 15.73% x3 81.48 11.77 14.44% x4 162.04 24.04 14.83% CV  Relative to its mean, each exam had about the same spread of scores. Yes, it seems that all the exams were about the same level of difficulty. (b) r 2 x1x2  (0.901) 2  0.812 r 2 x1x3  (0.893) 2  0.797 r 2 x1x4    0.895 r 2 x2 x3  (0.846) 2  0.716 r 2 x2 x4  (0.929) 2  0.863 r 2 x3 x4  (0.972) 2  0.945 Exam 3 had the most influence on the final exam 4 (0.945 > 0.895 > 0.863). Even though exam 3 had more influence, the other two exams still had influence on the final because 0.895 and 0.863 are close to 1. (c) R 2  0.990 99.0% of the variation in x4 can be explained by the corresponding variationin x1,x 2 , and x3 taken together. (d) x4  4.34  0.356 x1  0.543x2  1.17 x3 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 582 In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. If age x1 and x2 are held fixed and x3 is increased by 10 points, the final exam score is expected to increase by 10(1.17) = 11.7  12 points. (e)  = 0.05 H 0 : i  0 H1: i  0 d . f  n  k  1  25  3  1  21 For 1, the sample test statistic is t = 2.93 with P value = 0.008. For  2 , the sample test statistic is t = 5.38 with P value  0.000. For  3 , the sample test statistic is t = 11.33 with P value  0.000. Since 0.008  0.05, 0.000  0.05, and 0.000  0.05, reject H 0 for each coefficient and conclude that the coefficients of x1 , x2 , and x3 are not zero. Explanatory variables xi , whose coefficients ( i ) are nonzero, contribute information in the least-squares equation; i.e. without these xi , the resulting leastsquares regression equation is not as good a fit to the data as is the regression equation that includes these xi . (f) d . f .  21, t  1.721 A 90% confidence interval for i is bi  tSi  i  bi  tSi . 0.356  1.721(0.1214)  1  0.356  1.721(0.1214) 0.147  1  0.565 0.543  1.721(0.1008)   2  0.543  1.721(0.1008) 0.370   2  0.716 1.167  1.721(0.1030)  3  1.167  1.721(0.1030) 0.990  3  1.344 (g) x4  4.34  0.356(68)  0.543(72)  1.17(75)  147 Susan’s predicted score on the final exam is 147, and a 90% confidence interval for this new observation’s value, given these xi , i.e., the prediction interval, is 142–151, all rounded to whole numbers. 5. (a) s 100 x 39.64% CV  x s x1 85.24 33.79 x2 8.74 3.89 44.51% x3 4.90 2.48 50.61% x4 9.92 5.17 52.12% Relative to its mean, x4 has the largest spread of data values. The larger the CV, the more we expect the variable to change relative to its average value because a variable with a large CV has a large standard deviation s relative to x , and s measures “spread,” or variability, in the data. x1 has a small CV because we divide by a large mean. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 583 (b) r 2 x1x2  (0.917) 2  0.841 r 2 x1x3  (0.930) 2  0.865 r 2 x1x4    0.226 r 2 x2 x3  (0.790) 2  0.624 r 2 x2 x4  (0.429) 2  0.184 r 2 x3 x4  (0.299) 2  0.089 The variable x4 has the least influence on box office receipts x1 (0.226 < 0.841 < 0.865). x2 = production costs, r 2 x1x2  0.841. 84.1% of the variation of box office receipts can be attributed to the corresponding variation in production costs. (c) R 2  0.967 96.7% of the variation in x1 can be explained by the corresponding variation in x2 , x3 , and x4 taken together. (d) x1  7.68  3.66 x2  7.62 x3  0.83x4 In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. If x2 and x4 were held fixed and x3 were increased by 1 ($1 million), the corresponding change in x1 (box office receipts) would be an increase of $7.62 million. (e)  = 0.05 H 0 : i  0 H1: i  0 d . f  n  k  1  10  3  1  6 For  2 , the sample test statistic is t = 3.28 with P value = 0.017. For  3 , the sample test statistic is t = 4.60 with P value = 0.004. For  4 , the sample test statistic is t = 1.54 with P value = 0.175. Since 0.017  0.05 and 0.004  0.05, reject H 0 for coefficients  2 and 3 and conclude that the coefficients of x2 and x3 are not zero. Since 0.175 > 0.05, do not reject H 0 for the coefficient  4 and conclude that the coefficient of x4 could be zero. If  4  0, then x4 contributes nothing to the population regression line. We can eliminate the variable x4 and fit the estimated regression line without it and probably see little, if any, difference between the predicted values of x1 based on x2 and x3 only and the predicted values of x1 based on x2 , x3 , and x4 . (f) d . f .  6, t  1.943 A 90% confidence interval for i is bi  tSi  i  bi  tSi 3.662  1.943(1.118)   2  3.662  1.943(1.118) 1.49   2  5.83 7.621  1.943(1.657)  3  7.621  1.943(1.657) 4.40  3  10.84 0.8285  1.943(0.5394)   4  0.8285  1.943(0.5394) 0.22   4  1.88 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 584 (g) x1  7.68  3.66(11.4)  7.62(4.7)  0.83(8.1)  91.94 The prediction is $91.94 million, and a 85% confidence interval for this new observation’s value, given these xi (i.e., the prediction interval), is $77.6 million to $106.3 million. (h) x3  0.650  0.102 x1  0.260 x2  0.0899 x4 x3  0.650  0.102(100)  0.260(12)  0.0899(9.2) x3  5.63 The prediction is $5.63 million, and a 80% confidence interval for this new observation’s value, given these xi (i.e. the prediction interval), is $4.21 million to $7.04 million. 6. (a) s 100 x 67.02% CV  x s x1 286.574 192.062 x2 3.326 2.011 60.46% x3 387.481 191.168 49.34% x4 8.100 3.775 46.60% x5 9.693 5.140 53.03% x6 7.741 4.896 63.25% Relative to its mean, x1 has the largest spread of data values, and x4 has a smallest spread of data values. (b) r23  0.844 r24  0.749 r25  0.838 r26  0.766 r34  0.906 r35  0.864 r36  0.807 r45  0.795 r46  0.841 r56  0.870 r 2 x1x2  (0.894)2  0.799 r 2 x1x3  (0.946)2  0.895 r 2 x1x4    0.835 r 2 x1x5  (0.954)2  0.910 r 2 x1x6  (0.912)2  0.832 The variable x5 has the greatest influence on annual net sales (0.910 is the largest). The variable x2 has the least influence on annual net sales (0.799 is the smallest). (c) R 2  0.993 99.3% of the variation in x1 can be explained by the corresponding variation in x2 , x3 , x4 , x5 , and x6 taken together. (d) x1  18.9  16.2 x2  0.175x3  11.5x4  13.6 x5  5.31x6 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 585 In multiple regression, the coefficients of the explanatory variables can be thought of as “slopes” if we look at one explanatory variable’s coefficient at a time while holding the other explanatory variables as arbitrary and fixed constants. If all explanatory variables except x6 remained fixed and x6 increased by 2, then the annual net sales are expected to decrease by 2(5.31) = 10.62 or $10,620. If all explanatory variables except x4 remained fixed and x4 increased by 1 ($1,000), then the annual net sales are expected to increase by 11.5, or $11,500. (e)  = 0.05 H 0 : i  0 H1: i  0 d . f  27  5  1  21 For  2 , the sample test statistic is t = 4.57 with P value  0.000. For  3 , the sample test statistic is t = 3.03 with P value = 0.006. For  4 , the sample test statistic is t = 4.55 with P-value  0.000. For  5 , the sample test statistic is t = 7.67 with P value  0.000. For  6 , the sample test statistic is t = 3.11 with a P value = 0.005. Since all these P values are  0.05, we reject H 0 for each coefficient and conclude that the coefficients of x2 , x3 , x4 , x5 , and x6 are not 0. (f) The predicted annual net sales are 194.41 (or $194.41 thousand), and the 80% confidence interval for this new observation’s value, given these xi (i.e. the prediction interval), is $160.76 thousand to $228.06 thousand. (g) x4  4.14  0.0431x1  0.800 x2  0.00059 x3  0.661x5  0.057 x6 The predicted amount spent on local advertising is $5.571 thousand, and the 80% confidence interval for this new observation’s value, given these xi (i.e., the prediction interval), is $4.048 thousand to $7.094 thousand. Advertising costs for this store should be between $4,048 and $7,094. Chapter 10 Review 1. We expect r to be close to 0. 2. A significant r means that we reject the null hypothesis that ρ = 0 and conclude that the population correlation coefficient is not equal to zero. 3. Results are more reliable for interpolation. 4. Here, residual  y  yˆ  8  6  2. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 586 5. (a) Scatterplot of Mortality Rate vs Age 20 Mortality Rate 19 18 17 16 15 14 1 2 3 Age 4 5  x 15  3 n 5  y 86.9 y   17.38 n 5 n xy    x   y  5(273.4)  (15)(86.9) b   1.27 2 5(55)  (15) 2 n x 2    x  a  y  bx  17.38  1.27(3)  13.57 y  a  bx or y  13.57  1.27 x (b) x  (c) r n xy    x   y  n x2   x 2 n y2   y 2  5(273.4)  (15)(86.9) 5(55)  (15)2 5(1,544.73)  (86.9)2  0.685 r 2  (0.685) 2  0.469 The correlation coefficient r measures the strength of the linear relationship between a bighorn sheep’s age and the mortality rate. The coefficient of determination r 2 means that 46.9% of the variation in mortality rate can be explained by the corresponding variation in age of a bighorn sheep using the least-squares line. (d)  = 0.01 H0:   0 H1:   0 r n2 0.685 5  2 t   1.629 1 r2 1  (0.685) 2 d.f. = n  2 = 5  2 = 3 From Table 6 in Appendix II, 1.629 falls between entries 1.423 and 1.638. Use one-tailed areas to find that 0.100 < P value < 0.125. Using a TI-84, P value ≈ 0.1011. Since the P-value interval is > 0.01, we do not reject H 0 . There does not seem to be a positive correlation between age and mortality rate of bighorn sheep. (e) No, based on this limited data, predictions from the least-squares line model might be misleading. There appear to be other lurking variables that affect the mortality rate of sheep in different age groups. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 587 6. (a) Scatterplot of Annual Salary vs Number of Job Changes 44 42 Annual Salary 40 38 36 34 32 30 0 2 4 6 Number of Job Changes 8 10  x 60   6.0 n 10  y 359 y   35.9 n 10 n xy    x   y  10(2231)  (60)(359) b   0.939024 2 10(442)  (60) 2 n x 2    x  a  y  bx  35.9  0.939024(6.0)  30.266 yˆ  a  bx or yˆ  30.266  0.939 x (c) n xy    x   y  10(2, 231)  (60)(359) r   0.761 2 2 10(442)  (60) 2 10(13, 013)  (359) 2 n x 2    x  n y 2    y  (b) x  r 2  (0.761)2  0.579 This means that 57.9% of the variation in salary can be explained by the corresponding variation in number of job changes using the least-squares line. (d)  = 0.05 H0:   0 H1:   0 r n  2 0.761 10  2 t   3.318 1 r2 1  (0.761) 2 d . f .  n  2  10  2  8 From Table 6 in Appendix II, 3.32 falls between entries 2.896 and 3.355. Use one-tailed areas to find that 0.005 < P value < 0.010. Using a TI-84, P value ≈ 0.0053. Since the P-value interval is  0.05, reject H 0 and conclude that the sample evidence supports a positive correlation. (e) Let x = 2. yˆ  30.266  0.939(2)  32.14 The predicted salary is $32,140. (f) Se   y 2  a y  b xy  13,013  30.266(359)  0.939024(2, 231)  2.56 n2 Copyright © Houghton Mifflin Company. All rights reserved. 10  2 Part IV: Complete Solutions, Chapter 10 588 (g) E  tc Se 1  1 n( x  x ) 2  n n  x 2    x 2  1.860(2.564058) 1  1 10(2  6) 2  10 10(442)  602  5.43 A 90% confidence interval for y is yˆ  E  y  yˆ  E 32.14  5.43  y  32.14  5.43 26.71  y  37.57 (h)  = 0.05 H0:   0 H1:   0 b 1 0.939204 1 2 t 442  (60)2  3.32  x2    x   Se n 2.56 10 d . f .  n  2  10  2  8 From Table 6 in Appendix II, 3.32 falls between entries 2.896 and 3.355. Use one-tailed areas to find that 0.005 < P value < 0.010. Using a TI-84, P value ≈ 0.0053. Since the P-value interval  0.05, reject H 0 and conclude that the sample evidence supports a positive slope. (i) d.f. = 8, tc  1.860, b  0.939 E tc Se x  2  x 2  n 1.86(2.56) 2  0.526 442  60 10 A 90% confidence interval is bE   bE 0.939  0.526    0.939  0.526 0.413    1.465 At the 90% confidence level, we can say that for each increase of one job change, the annual salary increases from 0.4 to 1.4 thousand dollars. 7. (a) Scatterplot of Weight @ 30 years vs Weight @ 1 year 150 Weight @ 30 years 140 130 120 110 15.0 17.5 20.0 22.5 Weight @ 1 year 25.0 27.5 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 589  x 300   21.43 n 14  y 1, 775 y   126.79 n 14 n xy    x   y  14(38, 220)  (300)(1, 775) b   1.285 2 14(6,572)  (300) 2 n x 2    x  a  y  bx  126.79  (1.285)(21.43)  99.25 yˆ  a  bx or yˆ  99.25  1.285 x (c) (b) x  r n xy    x   y  n x 2    x  2 n y 2    y  2  14(38, 220)  (300)(1, 775) 14(6,572)  (300) 2 14(226,125)  (1, 775) 2  0.468 r 2  (0.468) 2  0.219 The coefficient of determination r 2 means that 21.9% of the variation in weight of a 30-year-old female can be explained by the corresponding variation in the weight of a 1-year-old baby girl using the least-squares line. (d)  = 0.01 H0:   0 H1:   0 r n  2 0.468 14  2 t   1.834 1 r2 1  (0.468) 2 d . f .  n  2  14  2  12 From Table 6 in Appendix II, the sample test statistic falls between entries 1.782 and 2.179. Use onetailed areas to find that 0.025 < P value < 0.050. Using a TI-84, P value ≈ 0.0457. Since the P-value interval > 0.01, do not reject H 0 . At the 1% level of significance, there does not seem to be a positive correlation between weight of baby and weight of adult. (e) Let x = 20. yˆ  99.25  1.285(20)  124.95 The predicted weight is 124.95 pounds, but since r is not significant, the prediction might not be very reliable. (f) Se   y 2  a y  b xy  (g) E  tc Se 1  n2 226,125  99.25(1, 775)  1.285(38, 220)  8.38 14  2 1 n( x  x ) 2  n n  x 2    x 2  2.179(8.38) 1  1 14(20  21.43) 2  14 14(6,572)  (300) 2  19.03 A 95% confidence interval for y is yˆ  E  y  yˆ  E 124.95  19.03  y  124.95  19.03 105.92  y  143.98 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 590 (h)  = 0.01 H0:   0 H1:   0 b 1 1.285 1 2 6,572  (300)2  1.84  x2    x   Se n 8.38 14 d . f .  n  2  14  2  12 From Table 6 in Appendix II, the sample test statistic falls between entries 1.782 and 2.179. Use onetailed areas to find that 0.025 < P value < 0.050. Using a TI-84, P value ≈ 0.0457. Since the P-value interval > 0.01 for , do not reject H 0 . At the 1% level of significance, there does not seem to be a positive slope between weight of baby and weight of adult. t (i) d.f. = 12, tc  1.356, b = 1.285 tc Se E  x2   x 2 n  1.356(8.38) (300)2 6,572  14  0.949 An 80% confidence interval is bE   bE 1.285  0.949    1.285  0.949 0.34    2.22 At the 80% confidence level, we can say that for each additional pound a female infant weighs at 1 year, the adult weight increases by 0.34–2.22 pounds. 8. (a) Scatterplot of Number of Buyers vs Number of Visits 11 10 Number of Buyers 9 8 7 6 5 4 3 2 5 10 15 20 Number of Visits 25 30 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 591  x 248   16.53  16.53 n 15  y 97 y   6.46  6.47 n 15 n xy    x   y  15(1,825)  (248)(97) b   0.292784 2 15(4,856)  (248) 2 n x 2    x  (b) x  a  y  bx  6.46  0.292784(16.53)  1.626 yˆ  a  bx or yˆ  1.626  0.293x r (c) n xy    x   y  n x2   x 2 n y2   y 2  15(1,825)  (248)(97) 15(4,856)  (248) 2 15(731)  (97) 2  0.790  0.624 This means that 62.4% of the variation in number of people who bought insurance that week can be explained by the corresponding variation in number of visits made each week using the least-squares line. (d)  = 0.01 H0:   0 H1:   0 r n  2 0.790 15  2 t   4.65 1 r2 1  (0.790) 2 r2  (0.790) 2 d . f .  n  2  15  2  13 From Table 6 in Appendix II, 4.65 falls to the right of entry 4.221. Use one-tailed areas to find that P value < 0.005. Using a TI-84, P value ≈ 0.0002. Since the P value  0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a positive correlation between number of visits and number of sales. (e) Let x = 18. yˆ  1.626  0.293(18)  6.9 The predicted number of sales is 6.9, or 7. (f) Se   y 2  a y  b xy  731  1.626(97)  0.292784(1,825)  1.7309 (g) E  tc Se 1  n2 15  2 n( x  x ) 2 1  n n  x 2    x 2  2.160(1.730940) 1  1 15(18  16.53) 2  15 15(4,856)  (248) 2 4 A 95% confidence interval for y is yˆ  E  y  yˆ  E 74 y  74 3  y  11 (h)  = 0.01 H0:   0 H1:   0 b 1 0.292784 1 2 4,856  (248)2  4.65  x2    x   Se n 1.7309 15 d . f .  n  2  15  2  13 t Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 592 From Table 6 in Appendix II, 4.65 falls to the right of entry 4.221. Use one-tailed areas to find that P value < 0.0005. Using a TI-84, P value ≈ 0.0002. Since the P value  0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a positive slope between number of visits and number of sales. (i) d.f. = 13, tc  1.350, b = 0.292784 tc Se E x 2   x 2  1.350(1.731) (248)2  0.085 4856  15 n An 80% confidence interval is bE   bE 0.292784  0.085    0.292784  0.085 0.208    0.378 At the 80% confidence level, we can say that for each additional visit made, between 0.21 and 0.38 more sales are made. 9. (a) Scatterplot of Number of Employees vs Weight of Mail 17.5 Number of Employees 15.0 12.5 10.0 7.5 5.0 5 10 15 Weight of Mail 20 25  x 131   16.375   n 8  y 81 y   10.125  10.13 n 8 n xy    x   y  8(1, 487)  (131)(81) b   0.554118  0.554 2 8(2, 435)  (131) 2 n x 2    x  a  y  bx  10.125  0.554118(16.375)  1.051 yˆ  a  bx or yˆ  1.051  0.554 x (b) x  (c) r n xy    x   y  n x 2    x  2 n y 2    y  2  8(1, 487)  (131)(81) 8(2, 435)  (131)2 8(927)  (81) 2  0.913 r 2  (0.913) 2  0.833 The coefficient of determination r 2 means that 83.3% of the variation in number of employees can be explained by the corresponding variation in weight of incoming mail using the least-squares line. (d)  = 0.01 Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 593 H0:   0 H1:   0 r n2 0.913 8  2 t   5.48 2 1 r 1  (0.913) 2 d. f .  n  2  8  2  6 From Table 6 in Appendix II, the sample test statistic falls between entries 3.707 and 5.959. Use onetailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0008. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a positive correlation between pounds of mail and number of employees required to process the mail. (e) Use x = 15. yˆ  1.051  0.554(15)  9.36 About nine employees should be assigned mail duty. (f) Se   y 2  a y  b xy  927  1.051(81)  0.554118(1, 487)  1.73 n2 (g) E  tc Se 1  82 n( x  x ) 2 1  n n  x 2    x 2 1 8(15  16.375) 2  2.447(1.73) 1   8 8(2, 435)  (131) 2  4.5 A 95% confidence interval for y is yˆ  E  y  yˆ  E 9.36  4.5  y  9.36  4.5 4.86  y  13.86 (h)  = 0.01 H0:   0 H1:   0 b 1 0.554118 1 2 t 2435  (131)2  5.45  x2    x   Se n 1.73 8 d. f .  n  2  8  2  6 From Table 6 in Appendix II, the sample test statistic falls between entries 3.707 and 5.959. Use onetailed areas to find that 0.0005 < P value < 0.005. Using a TI-84, P value ≈ 0.0008. Since the P-value interval  0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a positive slope between pounds of mail and number of employees required to process the mail. (i) d . f .  6, tc  1.440, b  0.554 tc Se E  x2   x n 2  1.440(1.73) 2, 435  (131)2 8  0.146 An 80% confidence interval is bE   bE 0.554  0.146    0.554  0.146 0.41    0.70 At the 80% confidence level, we can say that for each additional pound of mail, between 0.4 and 0.7 additional employees are needed. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 594 10. (a) Scatterplot of Crime Rate vs Population % Change 200 175 Crime Rate 150 125 100 75 50 0 5 10 15 20 Population % Change 25 30  x 72   12.0 n 6  y 589 y   98.16  98.17 n 6 n xy    x   y  6(9, 499)  (72)(589) b   5.1071  5.11 2 6(1,340)  (72)2 n x 2    x  (b) x  a  y  bx  98.16  5.1071(12.0)  36.881  36.9 yˆ  a  bx or yˆ  36.9  5.11x (c) r n xy    x   y  n x2   x 2 n y2   y 2  6(9, 499)  (72)(589) 6(1,340)  (72)2 6(72, 277)  (589) 2  0.927 r 2  (0.927) 2  0.859 This means that 85.9% of the variation in crime rate can be explained by the corresponding variation in percent population change. (d)  = 0.01 H0:   0 H1:   0 r n2 0.927 6  2 t   4.94 2 1 r 1  (0.927) 2 d. f .  n  2  6  2  4 From Table 6 in Appendix II, the sample test statistic falls between entries 4.604 and 8.610. Use twotailed areas to find that 0.001 < P value < 0.010. Using a TI-84, P value ≈ 0.0079. Since the P-value interval < 0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a nonzero correlation between population change and crime rate. Copyright © Houghton Mifflin Company. All rights reserved. Part IV: Complete Solutions, Chapter 10 595 (e) Let x = 12 yˆ  36.9  5.11(12)  98.2 The predicted crime rate is 98.2 crimes per thousand. (f) Se   y 2  a y  b xy  72, 277  36.881(589)  5.1071(9, 499)  22.59 62 n2 (g) E  tc Se 1  1 n( x  x ) 2  n n  x 2    x 2 1 6(12  12)2  1.533(22.59) 1    37.41 6 6(1,340)  (72)2 A 80% confidence interval for y is yˆ  E  y  yˆ  E 98.2  37.41  y  98.2  37.41 60.8  y  135.6, or about 61 to 136 crimes per thousand. (h)  = 0.01 H0:   0 H1:   0 b 1 5.1071 1 2 1,340  (72)2  4.93  x2    x   Se n 22.59 6 d. f .  n  2  6  2  4 From Table 6 in Appendix II, the sample test statistic falls between entries 4.604 and 8.610. Use twotailed areas to find that 0.001 < P value < 0.010. Using a TI-84, P value ≈ 0.0079. Since the P-value interval < 0.01, reject H 0 . At the 1% level of significance, there is sufficient evidence to show a nonzero slope between population change and crime rate. t (i) d . f .  4, tc  1.533, b  5.11 E tc Se  x 2  1.533(22.59) (72)2  1.59 1,340  6   n An 80% confidence interval is x2 bE   bE 5.11  1.59    5.11  1.59 3.52    6.70 At the 80% confidence level, we can say that for every percentage point increase in population, expect the crime rate per 1,000 to increase from between 3.52 and 6.70 crimes per thousand. Copyright © Houghton Mifflin Company. All rights reserved.

10 - Employees

Related documents

Products

Support

10 - Employees

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib