Chapter 14
Simple Linear Regression

Learning Objectives

1. Understand how regression analysis can be used to develop an equation that estimates mathematically how two variables are related.
2. Understand the differences between the regression model, the regression equation, and the estimated regression equation.
3. Know how to fit an estimated regression equation to a set of sample data based upon the least squares method.
4. Be able to determine how good a fit is provided by the estimated regression equation and compute the sample correlation coefficient from the regression analysis output.
5. Understand the assumptions necessary for statistical inference and be able to test for a significant relationship.
6. Know how to develop confidence interval estimates of y given a specific value of x in both the case of a mean value of y and an individual value of y.
7. Learn how to use a residual plot to make a judgment as to the validity of the regression assumptions.
8. Know the definition of the following terms:
   independent and dependent variable
   simple linear regression
   regression model
   regression equation and estimated regression equation
   scatter diagram
   coefficient of determination
   standard error of the estimate
   confidence interval
   prediction interval
   residual plot

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Solutions:

1. a. [Scatter diagram: y versus x]
b. There appears to be a positive linear relationship between x and y.
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part (d) we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.
d. \( \bar{x} = \sum x_i / n = 15/5 = 3 \), \( \bar{y} = \sum y_i / n = 40/5 = 8 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 26 \), \( \sum (x_i - \bar{x})^2 = 10 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{26}{10} = 2.6 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 8 - (2.6)(3) = 0.2 \)
\( \hat{y} = 0.2 + 2.6x \)
e.
\( \hat{y} = 0.2 + 2.6(4) = 10.6 \)

2. a. [Scatter diagram: y versus x]
b. There appears to be a negative linear relationship between x and y.
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part (d) we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.
d. \( \bar{x} = \sum x_i / n = 55/5 = 11 \), \( \bar{y} = \sum y_i / n = 175/5 = 35 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = -540 \), \( \sum (x_i - \bar{x})^2 = 180 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-540}{180} = -3 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 35 - (-3)(11) = 68 \)
\( \hat{y} = 68 - 3x \)
e. \( \hat{y} = 68 - 3(10) = 38 \)

3. a. [Scatter diagram: y versus x]
b. \( \bar{x} = \sum x_i / n = 50/5 = 10 \), \( \bar{y} = \sum y_i / n = 83/5 = 16.6 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 171 \), \( \sum (x_i - \bar{x})^2 = 190 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{171}{190} = 0.9 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 16.6 - (0.9)(10) = 7.6 \)
\( \hat{y} = 7.6 + 0.9x \)
c. \( \hat{y} = 7.6 + 0.9(6) = 13 \)

4. a. [Scatter diagram: % Management versus % Working]
b. There appears to be a positive linear relationship between the percentage of women working in the five companies (x) and the percentage of management jobs held by women in that company (y).
c. Many different straight lines can be drawn to provide a linear approximation of the relationship between x and y; in part (d) we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.
d. \( \bar{x} = \sum x_i / n = 300/5 = 60 \), \( \bar{y} = \sum y_i / n = 215/5 = 43 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 624 \), \( \sum (x_i - \bar{x})^2 = 480 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{624}{480} = 1.3 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 43 - 1.3(60) = -35 \)
\( \hat{y} = -35 + 1.3x \)
e.
\( \hat{y} = -35 + 1.3x = -35 + 1.3(60) = 43\% \)

5. a. [Scatter diagram: Rating versus Price ($)]
b. There appears to be a positive relationship between price and rating. The sign that says “Quality: You Get What You Pay For” does fairly reflect the price-quality relationship for ellipticals.
c. Let x = price ($) and y = rating.
\( \bar{x} = \sum x_i / n = 15{,}000/8 = 1875 \), \( \bar{y} = \sum y_i / n = 592/8 = 74 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 68{,}900 \), \( \sum (x_i - \bar{x})^2 = 8{,}155{,}000 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{68{,}900}{8{,}155{,}000} = .008449 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 74 - (.008449)(1875) = 58.158 \)
\( \hat{y} = 58.158 + .008449x \)
d. \( \hat{y} = 58.158 + .008449x = 58.158 + .008449(1500) = 70.83 \), or approximately 71.

6. a. [Scatter diagram: Win% versus Yds/Att]
b. The scatter diagram indicates a positive linear relationship between x = average number of passing yards per attempt and y = the percentage of games won by the team.
c. \( \bar{x} = \sum x_i / n = 68/10 = 6.8 \), \( \bar{y} = \sum y_i / n = 464/10 = 46.4 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 121.6 \), \( \sum (x_i - \bar{x})^2 = 7.08 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{121.6}{7.08} = 17.1751 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 46.4 - (17.1751)(6.8) = -70.391 \)
\( \hat{y} = -70.391 + 17.1751x \)
d. The slope of the estimated regression line is approximately 17.2. So, for every increase of one yard in the average number of passing yards per attempt, the percentage of games won by the team increases by 17.2 percentage points.
e. With an average number of passing yards per attempt of 6.2, the predicted percentage of games won is \( \hat{y} = -70.391 + 17.175(6.2) = 36\% \). With a record of 7 wins and 9 losses, the Kansas City Chiefs won 43.8%, or approximately 44%, of their games. Considering the small size of the data set, the prediction made using the estimated regression equation is not too bad.
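The least squares arithmetic used in exercises 1 through 6 can be sketched in a few lines of Python. This is a minimal illustration that assumes only the summary statistics printed in the solution to exercise 6 are available (the raw data are not reproduced in this manual); the helper name `fit_from_summary` is hypothetical and not part of the text.

```python
def fit_from_summary(x_bar, y_bar, sxy, sxx):
    """Least squares estimates from summary statistics.

    sxy is the sum of (x_i - x_bar)(y_i - y_bar) and
    sxx is the sum of (x_i - x_bar)^2.
    """
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Summary statistics quoted in the solution to exercise 6.
b0, b1 = fit_from_summary(x_bar=6.8, y_bar=46.4, sxy=121.6, sxx=7.08)

# Point prediction at x = 6.2 passing yards per attempt (part e).
y_hat = b0 + b1 * 6.2

print(f"b1 = {b1:.4f}, b0 = {b0:.3f}, predicted win % at x = 6.2: {y_hat:.1f}")
```

The same two lines of arithmetic reproduce the slope and intercept in the other exercises once their sums are substituted.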
7. a. [Scatter diagram: Annual Sales ($1000s) versus Years of Experience]
b. Let x = years of experience and y = annual sales ($1000s).
\( \bar{x} = \sum x_i / n = 70/10 = 7 \), \( \bar{y} = \sum y_i / n = 1080/10 = 108 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 568 \), \( \sum (x_i - \bar{x})^2 = 142 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{568}{142} = 4 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 108 - (4)(7) = 80 \)
\( \hat{y} = 80 + 4x \)
c. \( \hat{y} = 80 + 4x = 80 + 4(9) = 116 \), or $116,000.

8. a. [Scatter diagram: Satisfaction versus Speed of Execution]
b. The scatter diagram indicates a positive linear relationship between x = speed of execution rating and y = overall satisfaction rating for electronic trades.
c. \( \bar{x} = \sum x_i / n = 36.3/11 = 3.3 \), \( \bar{y} = \sum y_i / n = 35.2/11 = 3.2 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 2.4 \), \( \sum (x_i - \bar{x})^2 = 2.6 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = .9077 \)
(The sums above are apparently shown rounded to one decimal; 2.4/2.6 evaluated directly gives .9231, while .9077 is the value used in the remainder of this solution.)
\( b_0 = \bar{y} - b_1 \bar{x} = 3.2 - (.9077)(3.3) = .2046 \)
\( \hat{y} = .2046 + .9077x \)
d. The slope of the estimated regression line is approximately .9077. So, a one unit increase in the speed of execution rating will increase the overall satisfaction rating by approximately .9 points.
e. The average speed of execution rating for the other brokerage firms is 3.4. Using this as the new value of x for Zecco.com, we can use the estimated regression equation developed in part (c) to estimate the overall satisfaction rating corresponding to x = 3.4.
\( \hat{y} = .2046 + .9077x = .2046 + .9077(3.4) = 3.29 \)
Thus, an estimate of the overall satisfaction rating when x = 3.4 is approximately 3.3.

9. a. [Scatter diagram: Rating versus Price ($)]
b.
The scatter diagram indicates a positive linear relationship between x = price ($) and y = overall rating.
c. \( \bar{x} = \sum x_i / n = 4660/20 = 233 \), \( \bar{y} = \sum y_i / n = 1400/20 = 70 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 8100 \), \( \sum (x_i - \bar{x})^2 = 127{,}420 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{8100}{127{,}420} = .06357 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 70 - (.06357)(233) = 55.188 \)
\( \hat{y} = 55.188 + .06357x \)
d. We can use the estimated regression equation developed in part (c) to estimate the overall rating corresponding to x = 200.
\( \hat{y} = 55.188 + .06357x = 55.188 + .06357(200) = 67.9 \)
Thus, an estimate of the overall rating when x = $200 is approximately 68.

10. a. [Scatter diagram: % Gain in Options Value versus % Increase in Stock Price]
b. The scatter diagram indicates a positive linear relationship between x = percentage increase in the stock price and y = percentage gain in options value. In other words, options values increase as stock prices increase.
c. \( \bar{x} = \sum x_i / n = 2939/10 = 293.9 \), \( \bar{y} = \sum y_i / n = 6301/10 = 630.1 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 314{,}501.1 \), \( \sum (x_i - \bar{x})^2 = 115{,}842.9 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{314{,}501.1}{115{,}842.9} = 2.7149 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 630.1 - (2.7149)(293.9) = -167.81 \)
\( \hat{y} = -167.81 + 2.7149x \)
d. The slope of the estimated regression line is approximately 2.7. So, for every one percentage point increase in the price of the stock, the percentage gain in options value increases by approximately 2.7 percentage points.
e. The rewards for the CEO do appear to be based upon performance increases in the stock value. While the rewards may seem excessive, the executive is being rewarded for his/her role in increasing the value of the company. This is why such compensation schemes are devised for CEOs by boards of directors. A compensation scheme where an executive got a big salary increase when the company stock went down would be bad.
And, if the stock price for a company had gone down during the periods in question, the value of the CEO's options would also go down.

11. a. [Scatter diagram: road-test score versus price]
b. There appears to be a positive linear relationship between x = price and y = road-test score.
c. \( \bar{x} = \sum x_i / n = 339.6/12 = 28.3 \), \( \bar{y} = \sum y_i / n = 930/12 = 77.5 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 309.90 \), \( \sum (x_i - \bar{x})^2 = 346.38 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{309.90}{346.38} = .8947 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 77.5 - (.8947)(28.3) = 52.18 \)
\( \hat{y} = 52.18 + .8947x \)
d. The slope is .8947. A sporty car that has a ten thousand dollar higher price can be expected to have a 10(.8947) = 8.947, or approximately a 9 point, higher road-test score.
e. \( \hat{y} = 52.18 + .8947(36.7) = 85 \)

12. a. [Scatter diagram: Entertainment ($) versus Hotel Room Rate ($)]
b. The scatter diagram indicates a positive linear relationship between x = hotel room rate and y = the amount spent on entertainment.
c. \( \bar{x} = \sum x_i / n = 945/9 = 105 \), \( \bar{y} = \sum y_i / n = 1134/9 = 126 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 4237 \), \( \sum (x_i - \bar{x})^2 = 4100 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{4237}{4100} = 1.0334 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 126 - (1.0334)(105) = 17.49 \)
\( \hat{y} = 17.49 + 1.0334x \)
d. With a value of x = $128, the predicted value of y for Chicago is
\( \hat{y} = 17.49 + 1.0334x = 17.49 + 1.0334(128) = 150 \)
Note: In The Wall Street Journal article the entertainment expense for Chicago was $146. Thus, the estimated regression equation provided a good estimate of entertainment expenses for Chicago.

13. a. [Scatter diagram. Vertical axis: Reasonable Amount of Itemized Deductions ($1000s).]
[Horizontal axis: Adjusted Gross Income ($1000s).]
b. Let x = adjusted gross income and y = reasonable amount of itemized deductions.
\( \bar{x} = \sum x_i / n = 399/7 = 57 \), \( \bar{y} = \sum y_i / n = 97.1/7 = 13.8714 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 1233.7 \), \( \sum (x_i - \bar{x})^2 = 7648 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1233.7}{7648} = 0.1613 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 13.8714 - (0.1613)(57) = 4.6773 \)
\( \hat{y} = 4.68 + 0.16x \)
c. \( \hat{y} = 4.68 + 0.16x = 4.68 + 0.16(52.5) = 13.08 \), or approximately $13,080. The agent's request for an audit appears to be justified.

14. a. [Scatter diagram: PCW World Rating versus features rating]
b. There appears to be a positive linear relationship between x = features rating and y = PCW World Rating.
c. \( \bar{x} = \sum x_i / n = 784/10 = 78.4 \), \( \bar{y} = \sum y_i / n = 777/10 = 77.7 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 147.20 \), \( \sum (x_i - \bar{x})^2 = 284.40 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{147.20}{284.40} = .51758 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 77.7 - (.51758)(78.4) = 37.1217 \)
\( \hat{y} = 37.1217 + .51758x \)
d. \( \hat{y} = 37.1217 + .51758(70) = 73.35 \), or 73.

15. a. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y}_i = 0.2 + 2.6x_i \), \( \bar{y} = 8 \)
The sum of squares due to error and the total sum of squares are
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 12.40 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 80 \)
Thus, SSR = SST - SSE = 80 - 12.4 = 67.6
b. \( r^2 \) = SSR/SST = 67.6/80 = .845
The least squares line provided a very good fit; 84.5% of the variability in y has been explained by the least squares line.
c. \( r_{xy} = +\sqrt{.845} = +.9192 \)

16. a. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y}_i = 68 - 3x_i \), \( \bar{y} = 35 \)
The sum of squares due to error and the total sum of squares are
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 230 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 1850 \)
Thus, SSR = SST - SSE = 1850 - 230 = 1620
b.
\( r^2 \) = SSR/SST = 1620/1850 = .876
The least squares line provided an excellent fit; 87.6% of the variability in y has been explained by the estimated regression equation.
c. \( r_{xy} = -\sqrt{.876} = -.936 \)
Note: the sign for r is negative because the slope of the estimated regression equation is negative (\( b_1 = -3 \)).

17. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y}_i = 7.6 + .9x_i \), \( \bar{y} = 16.6 \)
The sum of squares due to error and the total sum of squares are
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 127.3 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 281.2 \)
Thus, SSR = SST - SSE = 281.2 - 127.3 = 153.9
\( r^2 \) = SSR/SST = 153.9/281.2 = .547
We see that 54.7% of the variability in y has been explained by the least squares line.
\( r_{xy} = +\sqrt{.547} = +.740 \)

18. a. \( \bar{x} = \sum x_i / n = 600/6 = 100 \), \( \bar{y} = \sum y_i / n = 330/6 = 55 \)
\( \text{SST} = \sum (y_i - \bar{y})^2 = 1800 \), \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 287.624 \)
SSR = SST - SSE = 1800 - 287.624 = 1512.376
b. \( r^2 = \frac{\text{SSR}}{\text{SST}} = \frac{1512.376}{1800} = .84 \)
c. \( r = \sqrt{r^2} = \sqrt{.84} = .917 \)

19. a. The estimated regression equation and the mean for the dependent variable are:
\( \hat{y} = 80 + 4x \), \( \bar{y} = 108 \)
The sum of squares due to error and the total sum of squares are
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 170 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 2442 \)
Thus, SSR = SST - SSE = 2442 - 170 = 2272
b. \( r^2 \) = SSR/SST = 2272/2442 = .93
We see that 93% of the variability in y has been explained by the least squares line.
c. \( r_{xy} = +\sqrt{.93} = +.96 \)

20. a. \( \bar{x} = \sum x_i / n = 160/10 = 16 \), \( \bar{y} = \sum y_i / n = 55{,}500/10 = 5550 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = -31{,}284 \), \( \sum (x_i - \bar{x})^2 = 21.74 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{-31{,}284}{21.74} = -1439 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 5550 - (-1439)(16) = 28{,}574 \)
\( \hat{y} = 28{,}574 - 1439x \)
b. SST = 52,120,800, SSE = 7,102,922.54
SSR = SST - SSE = 52,120,800 - 7,102,922.54 = 45,017,877
\( r^2 \) = SSR/SST = 45,017,877/52,120,800 = .864
The estimated regression equation provided a very good fit.
c. \( \hat{y} = 28{,}574 - 1439x = 28{,}574 - 1439(15) = 6989 \)
Thus, an estimate of the price for a bike that weighs 15 pounds is $6989.

21. a.
\( \bar{x} = \sum x_i / n = 3450/6 = 575 \), \( \bar{y} = \sum y_i / n = 33{,}700/6 = 5616.67 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 712{,}500 \), \( \sum (x_i - \bar{x})^2 = 93{,}750 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{712{,}500}{93{,}750} = 7.6 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 5616.67 - (7.6)(575) = 1246.67 \)
\( \hat{y} = 1246.67 + 7.6x \)
b. $7.60
c. The sum of squares due to error and the total sum of squares are:
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 233{,}333.33 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 5{,}648{,}333.33 \)
Thus, SSR = SST - SSE = 5,648,333.33 - 233,333.33 = 5,415,000
\( r^2 \) = SSR/SST = 5,415,000/5,648,333.33 = .9587
We see that 95.87% of the variability in y has been explained by the estimated regression equation.
d. \( \hat{y} = 1246.67 + 7.6x = 1246.67 + 7.6(500) = \$5046.67 \)

22. a. \( \bar{y} = 74 \), SSE = 173.88
The total sum of squares is \( \text{SST} = \sum (y_i - \bar{y})^2 = 756 \)
Thus, SSR = SST - SSE = 756 - 173.88 = 582.12
\( r^2 \) = SSR/SST = 582.12/756 = .77
b. The estimated regression equation provided a good fit because 77% of the variability in y has been explained by the least squares line.
c. \( r_{xy} = +\sqrt{.77} = +.88 \)
This reflects a strong positive linear relationship between price and rating.

23. a. \( s^2 \) = MSE = SSE/(n - 2) = 12.4/3 = 4.133
b. \( s = \sqrt{\text{MSE}} = \sqrt{4.133} = 2.033 \)
c. \( \sum (x_i - \bar{x})^2 = 10 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{2.033}{\sqrt{10}} = 0.643 \)
d. \( t = \frac{b_1}{s_{b_1}} = \frac{2.6}{.643} = 4.044 \)
Using t table (3 degrees of freedom), area in tail is between .01 and .025; p-value is between .02 and .05.
Using Excel or Minitab, the p-value corresponding to t = 4.04 is .0272.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
e. MSR = SSR/1 = 67.6
F = MSR/MSE = 67.6/4.133 = 16.36
Using F table (1 degree of freedom numerator and 3 denominator), p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to F = 16.36 is .0272.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            67.6             1                    67.6          16.36   .0272
Error                 12.4             3                    4.133
Total                 80.0             4

24. a. \( s^2 \) = MSE = SSE/(n - 2) = 230/3 = 76.6667
b. \( s = \sqrt{\text{MSE}} = \sqrt{76.6667} = 8.7560 \)
c. \( \sum (x_i - \bar{x})^2 = 180 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{8.7560}{\sqrt{180}} = 0.6526 \)
d. \( t = \frac{b_1}{s_{b_1}} = \frac{-3}{.653} = -4.59 \)
Using t table (3 degrees of freedom), area in tail is less than .01; p-value is less than .02.
Using Excel or Minitab, the p-value corresponding to t = -4.59 is .0193.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
e. MSR = SSR/1 = 1620
F = MSR/MSE = 1620/76.6667 = 21.13
Using F table (1 degree of freedom numerator and 3 denominator), p-value is less than .025.
Using Excel or Minitab, the p-value corresponding to F = 21.13 is .0193.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            1620             1                    1620          21.13   .0193
Error                 230              3                    76.6667
Total                 1850             4

25. a. \( s^2 \) = MSE = SSE/(n - 2) = 127.3/3 = 42.4333
\( s = \sqrt{\text{MSE}} = \sqrt{42.4333} = 6.5141 \)
b. \( \sum (x_i - \bar{x})^2 = 190 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{6.5141}{\sqrt{190}} = 0.4726 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.9}{.4726} = 1.90 \)
Using t table (3 degrees of freedom), area in tail is between .05 and .10; p-value is between .10 and .20.
Using Excel or Minitab, the p-value corresponding to t = 1.90 is .1530.
Because p-value \( > \alpha \), we cannot reject \( H_0: \beta_1 = 0 \); x and y do not appear to be related.
c. MSR = SSR/1 = 153.9/1 = 153.9
F = MSR/MSE = 153.9/42.4333 = 3.63
Using F table (1 degree of freedom numerator and 3 denominator), p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to F = 3.63 is .1530.
Because p-value \( > \alpha \), we cannot reject \( H_0: \beta_1 = 0 \); x and y do not appear to be related.

26. a.
In the statement of exercise 18, \( \hat{y} = 23.194 + .318x \).
In solving exercise 18, we found SSE = 287.624.
\( s^2 \) = MSE = SSE/(n - 2) = 287.624/4 = 71.906
\( s = \sqrt{\text{MSE}} = \sqrt{71.906} = 8.4797 \)
\( \sum (x_i - \bar{x})^2 = 14{,}950 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{8.4797}{\sqrt{14{,}950}} = .0694 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.318}{.0694} = 4.58 \)
Using t table (4 degrees of freedom), area in tail is between .005 and .01; p-value is between .01 and .02.
Using Excel, the p-value corresponding to t = 4.58 is .010.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \); there is a significant relationship between price and overall score.
b. In exercise 18 we found SSR = 1512.376.
MSR = SSR/1 = 1512.376/1 = 1512.376
F = MSR/MSE = 1512.376/71.906 = 21.03
Using F table (1 degree of freedom numerator and 4 denominator), p-value is between .01 and .025.
Using Excel, the p-value corresponding to F = 21.03 is .010.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
c.
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            1512.376         1                    1512.376      21.03   .010
Error                 287.624          4                    71.906
Total                 1800             5

27. a. Let x = number of megapixels and y = price ($).
\( \bar{x} = \sum x_i / n = 95/10 = 9.5 \), \( \bar{y} = \sum y_i / n = 2190/10 = 219 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 2165 \), \( \sum (x_i - \bar{x})^2 = 56.5 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{2165}{56.5} = 38.31858 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 219 - (38.31858)(9.5) = -145.0265 \)
\( \hat{y} = -145.0265 + 38.31858x \)
b. \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 20{,}730.27 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 103{,}690 \)
Thus, SSR = SST - SSE = 103,690 - 20,730.27 = 82,959.73
MSR = SSR/1 = 82,959.73
MSE = SSE/(n - 2) = 20,730.27/8 = 2591.28
F = MSR/MSE = 82,959.73/2591.28 = 32.015
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01.
Using Excel, the p-value corresponding to F = 32.015 is .000.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \). Number of megapixels and price are related.
c. \( r^2 \) = SSR/SST = 82,959.73/103,690 = .80
The estimated regression equation provided a good fit; we should feel comfortable using the estimated regression equation to estimate the price given the number of megapixels.
d. \( \hat{y} = -145.0265 + 38.31858(10) = 238.16 \), or approximately $238.

28. The sum of squares due to error and the total sum of squares are
\( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 1.4379 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 3.5800 \)
Thus, SSR = SST - SSE = 3.5800 - 1.4379 = 2.1421
\( s^2 \) = MSE = SSE/(n - 2) = 1.4379/9 = .1598
\( s = \sqrt{\text{MSE}} = \sqrt{.1598} = .3997 \)
We can use either the t test or F test to determine whether speed of execution and overall satisfaction are related. We will first illustrate the use of the t test.
\( \sum (x_i - \bar{x})^2 = 2.6 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{.3997}{\sqrt{2.6}} = .2479 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.9077}{.2479} = 3.66 \)
Using t table (9 degrees of freedom), area in tail is less than .005; p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to t = 3.66 is .005.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
Because we can reject \( H_0: \beta_1 = 0 \), we conclude that speed of execution and overall satisfaction are related.
Next we illustrate the use of the F test.
MSR = SSR/1 = 2.1421
F = MSR/MSE = 2.1421/.1598 = 13.4
Using F table (1 degree of freedom numerator and 9 denominator), p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to F = 13.4 is .005.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
Because we can reject \( H_0: \beta_1 = 0 \), we conclude that speed of execution and overall satisfaction are related.
The ANOVA table is shown below.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F      p-value
Regression            2.1421           1                    2.1421        13.4   .005
Error                 1.4379           9                    .1598
Total                 3.5800           10

29. \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 233{,}333.33 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 5{,}648{,}333.33 \)
Thus, SSR = SST - SSE = 5,648,333.33 - 233,333.33 = 5,415,000
MSE = SSE/(n - 2) = 233,333.33/(6 - 2) = 58,333.33
MSR = SSR/1 = 5,415,000
F = MSR/MSE = 5,415,000/58,333.33 = 92.83

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F       p-value
Regression            5,415,000.00     1                    5,415,000     92.83   .0006
Error                 233,333.33       4                    58,333.33
Total                 5,648,333.33     5

Using F table (1 degree of freedom numerator and 4 denominator), p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to F = 92.83 is .0006.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \). Production volume and total cost are related.

30. \( \text{SSE} = \sum (y_i - \hat{y}_i)^2 = 173.88 \), \( \text{SST} = \sum (y_i - \bar{y})^2 = 756 \)
Thus, SSR = SST - SSE = 756 - 173.88 = 582.12
\( s^2 \) = MSE = SSE/(n - 2) = 173.88/6 = 28.98
\( s = \sqrt{28.98} = 5.3833 \)
\( \sum (x_i - \bar{x})^2 = 8{,}155{,}000 \)
\( s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{5.3833}{\sqrt{8{,}155{,}000}} = .001885 \)
\( t = \frac{b_1}{s_{b_1}} = \frac{.008449}{.001885} = 4.48 \)
Using t table (6 degrees of freedom), area in tail is less than .005; p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to t = 4.48 is .0042.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
There is a significant relationship between price and rating.

31. SST = 52,120,800, SSE = 7,102,922.54
SSR = SST - SSE = 52,120,800 - 7,102,922.54 = 45,017,877
MSR = SSR/1 = 45,017,877
MSE = SSE/(n - 2) = 7,102,922.54/8 = 887,865.3
F = MSR/MSE = 45,017,877/887,865.3 = 50.7
Using F table (1 degree of freedom numerator and 8 denominator), p-value is less than .01.
Using Excel, the p-value corresponding to F = 50.7 is .000.
Because p-value \( \le \alpha \), we reject \( H_0: \beta_1 = 0 \).
Weight and price are related.

32. a. \( s = 2.033 \), \( \bar{x} = 3 \), \( \sum (x_i - \bar{x})^2 = 10 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 2.033 \sqrt{\frac{1}{5} + \frac{(4 - 3)^2}{10}} = 1.11 \)
b. \( \hat{y}^* = .2 + 2.6x^* = .2 + 2.6(4) = 10.6 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
10.6 ± 3.182(1.11) = 10.6 ± 3.53, or 7.07 to 14.13
c. \( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 2.033 \sqrt{1 + \frac{1}{5} + \frac{(4 - 3)^2}{10}} = 2.32 \)
d. \( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
10.6 ± 3.182(2.32) = 10.6 ± 7.38, or 3.22 to 17.98

33. a. \( s = 8.7560 \), \( \bar{x} = 11 \), \( \sum (x_i - \bar{x})^2 = 180 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 8.7560 \sqrt{\frac{1}{5} + \frac{(8 - 11)^2}{180}} = 4.3780 \)
b. \( \hat{y}^* = 68 - 3x^* = 68 - 3(8) = 44 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
44 ± 3.182(4.3780) = 44 ± 13.93, or 30.07 to 57.93
c. \( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 8.7560 \sqrt{1 + \frac{1}{5} + \frac{(8 - 11)^2}{180}} = 9.7895 \)
d. \( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
44 ± 3.182(9.7895) = 44 ± 31.15, or 12.85 to 75.15

34. \( s = 6.5141 \), \( \bar{x} = 10 \), \( \sum (x_i - \bar{x})^2 = 190 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 6.5141 \sqrt{\frac{1}{5} + \frac{(12 - 10)^2}{190}} = 3.0627 \)
\( \hat{y}^* = 7.6 + .9x^* = 7.6 + .9(12) = 18.40 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
18.40 ± 3.182(3.0627) = 18.40 ± 9.75, or 8.65 to 28.15
\( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 6.5141 \sqrt{1 + \frac{1}{5} + \frac{(12 - 10)^2}{190}} = 7.1982 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
18.40 ± 3.182(7.1982) = 18.40 ± 22.90, or -4.50 to 41.30
The two intervals are different because there is more variability associated with predicting an individual value than there is with estimating a mean value.

35. a. \( \hat{y}^* = 2090.5 + 581.1x^* = 2090.5 + 581.1(3) = 3833.8 \)
b. \( s = \sqrt{\text{MSE}} = \sqrt{21{,}284} = 145.89 \)
\( s = 145.89 \), \( \bar{x} = 3.2 \), \( \sum (x_i - \bar{x})^2 = 0.74 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 145.89 \sqrt{\frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 68.54 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
3833.8 ± 2.776(68.54) = 3833.8 ± 190.27, or $3643.53 to $4024.07
c. \( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 145.89 \sqrt{1 + \frac{1}{6} + \frac{(3 - 3.2)^2}{0.74}} = 161.19 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
3833.8 ± 2.776(161.19) = 3833.8 ± 447.46, or $3386.34 to $4281.26
d. As expected, the prediction interval is much wider than the confidence interval. This is due to the fact that it is more difficult to predict the starting salary for one new student with a GPA of 3.0 than it is to estimate the mean for all students with a GPA of 3.0.

36. a. \( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 4.6098 \sqrt{\frac{1}{10} + \frac{(9 - 7)^2}{142}} = 1.6503 \)
\( \hat{y}^* = 80 + 4x^* = 80 + 4(9) = 116 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
116 ± 2.306(1.6503) = 116 ± 3.8056, or 112.19 to 119.81 ($112,190 to $119,810)
b. \( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 4.6098 \sqrt{1 + \frac{1}{10} + \frac{(9 - 7)^2}{142}} = 4.8963 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
116 ± 2.306(4.8963) = 116 ± 11.2909, or 104.71 to 127.29 ($104,710 to $127,290)
c. As expected, the prediction interval is much wider than the confidence interval. This is due to the fact that it is more difficult to predict annual sales for one new salesperson with 9 years of experience than it is to estimate the mean annual sales for all salespersons with 9 years of experience.

37. a. \( \bar{x} = 57 \), \( \sum (x_i - \bar{x})^2 = 7648 \), \( s^2 = 1.88 \), \( s = 1.37 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 1.37 \sqrt{\frac{1}{7} + \frac{(52.5 - 57)^2}{7648}} = 0.52 \)
\( \hat{y}^* = 4.68 + 0.16x^* = 4.68 + 0.16(52.5) = 13.08 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
13.08 ± 2.571(.52) = 13.08 ± 1.34, or 11.74 to 14.42 ($11,740 to $14,420)
b. \( s_{\text{pred}} = 1.47 \)
13.08 ± 2.571(1.47) = 13.08 ± 3.78, or 9.30 to 16.86 ($9,300 to $16,860)
c. Yes, $20,400 is much larger than anticipated.
d. Any deductions exceeding the $16,860 upper limit could suggest an audit.

38. a. \( \hat{y}^* = 1246.67 + 7.6(500) = \$5046.67 \)
b. \( \bar{x} = 575 \), \( \sum (x_i - \bar{x})^2 = 93{,}750 \), \( s^2 \) = MSE = 58,333.33, \( s = 241.52 \)
\( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 241.52 \sqrt{1 + \frac{1}{6} + \frac{(500 - 575)^2}{93{,}750}} = 267.50 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
5046.67 ± 4.604(267.50) = 5046.67 ± 1231.57, or $3815.10 to $6278.24
c. Based on one month, $6000 is not out of line since $3815.10 to $6278.24 is the prediction interval. However, a sequence of five to seven months with consistently high costs should cause concern.
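The interval calculations in exercises 32 through 38 all follow the same pattern, which can be sketched in Python as follows. This is a minimal illustration, not the text's own software procedure: the function name `interval_estimates` is hypothetical, the critical value \( t_{\alpha/2} \) is assumed to be looked up in a t table and passed in by the caller, and the inputs are the summary values quoted in the solution to exercise 32.

```python
import math


def interval_estimates(s, n, x_bar, sxx, x_star, y_hat_star, t_crit):
    """Confidence and prediction intervals at x_star for simple linear regression.

    s      : standard error of the estimate, sqrt(MSE)
    sxx    : sum of (x_i - x_bar)^2
    t_crit : t value for the chosen confidence level with n - 2 degrees of
             freedom (taken from a t table; not computed here)
    """
    # Common term 1/n + (x* - x_bar)^2 / sum (x_i - x_bar)^2
    h = 1.0 / n + (x_star - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(h)        # standard deviation of y-hat*
    s_pred = s * math.sqrt(1.0 + h)  # standard deviation for an individual y
    ci = (y_hat_star - t_crit * s_mean, y_hat_star + t_crit * s_mean)
    pi = (y_hat_star - t_crit * s_pred, y_hat_star + t_crit * s_pred)
    return s_mean, ci, s_pred, pi


# Values quoted in the solution to exercise 32 (n = 5, so 3 degrees of freedom).
s_mean, ci, s_pred, pi = interval_estimates(
    s=2.033, n=5, x_bar=3, sxx=10, x_star=4, y_hat_star=10.6, t_crit=3.182
)

print(f"s_yhat = {s_mean:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"s_pred = {s_pred:.2f}, 95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

Because the sketch carries the standard deviations at full precision instead of rounding them to two decimals before multiplying by the t value, its interval endpoints can differ from the hand calculations above in the second decimal place.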
39. a. Let x = miles of track and y = weekday ridership in thousands.
\( \bar{x} = \sum x_i / n = 203/7 = 29 \), \( \bar{y} = \sum y_i / n = 309/7 = 44.1429 \)
\( \sum (x_i - \bar{x})(y_i - \bar{y}) = 1471 \), \( \sum (x_i - \bar{x})^2 = 838 \)
\( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1471}{838} = 1.7554 \)
\( b_0 = \bar{y} - b_1 \bar{x} = 44.1429 - (1.7554)(29) = -6.76 \)
\( \hat{y} = -6.76 + 1.755x \)
b. SST = 3620.9, SSE = 1038.7, SSR = 2582.1
\( r^2 \) = SSR/SST = 2582.1/3620.9 = .713
The estimated regression equation explained 71.3% of the variability in y; a good fit.
c. \( s^2 \) = MSE = 1038.7/5 = 207.7
\( s = \sqrt{207.7} = 14.41 \)
\( s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 14.41 \sqrt{\frac{1}{7} + \frac{(30 - 29)^2}{838}} = 5.47 \)
\( \hat{y}^* = -6.76 + 1.755x^* = -6.76 + 1.755(30) = 45.9 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\hat{y}^*} \)
45.9 ± 2.571(5.47) = 45.9 ± 14.1, or 31.8 to 60
d. \( s_{\text{pred}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 14.41 \sqrt{1 + \frac{1}{7} + \frac{(30 - 29)^2}{838}} = 15.41 \)
\( \hat{y}^* \pm t_{\alpha/2} s_{\text{pred}} \)
45.9 ± 2.571(15.41) = 45.9 ± 39.6, or 6.3 to 85.5
The prediction interval is so wide that it would not be of much value in the planning process. A larger data set would be beneficial.

40. a. 9
b. \( \hat{y} = 20.0 + 7.21x \)
c. 1.3626
d. SSE = SST - SSR = 51,984.1 - 41,587.3 = 10,396.8
MSE = 10,396.8/7 = 1,485.3
F = MSR/MSE = 41,587.3/1,485.3 = 28.00
Using F table (1 degree of freedom numerator and 7 denominator), p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to F = 28.00 is .0011.
Because p-value \( \le \alpha = .05 \), we reject \( H_0: \beta_1 = 0 \).
Selling price is related to annual gross rents.
e. \( \hat{y} = 20.0 + 7.21(50) = 380.5 \), or $380,500.

41. a. \( \hat{y} = 6.1092 + .8951x \)
b. \( t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{.8951 - 0}{.149} = 6.01 \)
Using the t table (8 degrees of freedom), area in tail is less than .005; p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to t = 6.01 is .0003.
Because p-value \( \le \alpha = .05 \), we reject \( H_0: \beta_1 = 0 \).
Maintenance expense is related to usage.
c. \( \hat{y} = 6.1092 + .8951(25) = 28.49 \), or $28.49 per month.

42. a. \( \hat{y} = 80.0 + 50.0x \)
b. 30
c. F = MSR/MSE = 6828.6/82.1 = 83.17
Using F table (1 degree of freedom numerator and 28 denominator), p-value is less than .01.
Using Excel or Minitab, the p-value corresponding to F = 83.17 is .000.
Because p-value \( < \alpha = .05 \), we reject \( H_0: \beta_1 = 0 \).
Annual sales is related to the number of salespersons.
d. \( \hat{y} = 80 + 50(12) = 680 \), or $680,000.

43. a. [Scatter diagram: Salary & Bonus ($1000s) versus Tuition & Fees ($1000s)]
b. There appears to be a positive relationship between the two variables. Students who graduate from the schools with higher tuition and fees tend to receive a higher starting salary and bonus.
c. The Minitab output is shown below:

The regression equation is
Salary & Bonus ($1000s) = 33.8 + 1.92 Tuition & Fees ($1000s)

Predictor                 Coef     SE Coef   T      P
Constant                  33.788   9.340     3.62   0.002
Tuition & Fees ($1000s)   1.9154   0.2689    7.12   0.000

S = 7.60875   R-Sq = 73.8%   R-Sq(adj) = 72.4%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    2937.1   2937.1   50.73   0.000
Residual Error   18   1042.1   57.9
Total            19   3979.2

d. The p-value = .000 \( < \alpha = .05 \) (t or F); significant relationship.
e. \( r^2 = .738 \). The least squares line provided a good fit; approximately 74% of the variability in salary and bonus can be explained by the linear relationship with tuition and fees.
f. \( \hat{y} = 33.788 + 1.9154(43) = 116.15 \), or approximately $116,000.
Note to Instructor: The average starting salary and bonus reported by U.S. News & World Report for the University of Virginia was $121,000.

44. a.
Scatter diagram: [plot of Price ($) versus Weight (oz)]

b. There appears to be a negative linear relationship between the two variables. The heavier helmets tend to be less expensive.

c. The Minitab output is shown below:

The regression equation is
Price = 2044 - 28.3 Weight

Predictor   Coef      SE Coef   T       P
Constant    2044.4    226.4     9.03    0.000
Weight      -28.350   3.826     -7.41   0.000

S = 91.8098   R-Sq = 77.4%   R-Sq(adj) = 76.0%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   462761   462761   54.90   0.000
Residual Error   16   134865   8429
Total            17   597626

ŷ = 2044.4 - 28.35 Weight

d. Significant relationship: p-value = .000 < α = .05

e. r² = 0.774; a good fit

45. a.

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{70}{5} = 14 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{76}{5} = 15.2 \]

\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = 200 \qquad \sum (x_i - \bar{x})^2 = 126 \]

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{200}{126} = 1.5873 \]

\[ b_0 = \bar{y} - b_1 \bar{x} = 15.2 - (1.5873)(14) = -7.0222 \]

\[ \hat{y} = -7.02 + 1.59x \]

b. The residuals are 3.48, -2.47, -4.83, -1.6, and 5.22

c. [Residual plot: residuals versus x]

With only 5 observations it is difficult to determine if the assumptions are satisfied. However, the plot does suggest curvature in the residuals that would indicate that the error term assumptions are not satisfied. The scatter diagram for these data also indicates that the underlying relationship between x and y may be curvilinear.

d. s² = 23.78

\[ h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} = \frac{1}{5} + \frac{(x_i - 14)^2}{126} \]

The standardized residuals are 1.32, -.59, -1.11, -.40, 1.49.

e. The standardized residual plot has the same shape as the original residual plot. The curvature observed indicates that the assumptions regarding the error term may not be satisfied.

46. a. ŷ = 2.32 + .64x

b.
[Residual plot: residuals versus x]

The assumption that the variance is the same for all values of x is questionable. The variance appears to increase for larger values of x.

47. a. Let x = advertising expenditures and y = revenue

ŷ = 29.4 + 1.55x

b. SST = 1002, SSE = 310.28, SSR = 691.72

MSR = SSR/1 = 691.72

MSE = SSE/(n - 2) = 310.28/5 = 62.0554

F = MSR/MSE = 691.72/62.0554 = 11.15

Using the F table (1 degree of freedom numerator and 5 denominator), the p-value is between .01 and .025.

Using Excel or Minitab, the p-value corresponding to F = 11.15 is .0206.

Because p-value ≤ α = .05, we conclude that the two variables are related.

c. [Residual plot: residuals versus predicted values]

d. The residual plot leads us to question the assumption of a linear relationship between x and y. Even though the relationship is significant at the .05 level of significance, it would be extremely dangerous to extrapolate beyond the range of the data.

48. a. ŷ = 80 + 4x

[Residual plot: residuals versus x]

b. The assumptions concerning the error term appear reasonable.

49. a. The Minitab output follows:

The regression equation is
Price ($) = 22636 + 59.0 Square Footage

Predictor        Coef    SE Coef   T      P
Constant         22636   20460     1.11   0.283
Square Footage   58.96   12.08     4.88   0.000

S = 19166.0   R-Sq = 57.0%   R-Sq(adj) = 54.6%

Analysis of Variance

Source           DF   SS            MS           F       P
Regression        1   8748562231    8748562231   23.82   0.000
Residual Error   18   6612039769    367335543
Total            19   15360602000

b. [Residual plot]

c. The residual plot leads us to question the assumption of a linear relationship between square footage and price. Therefore, even though the relationship is very significant (p-value = .000), using the estimated regression equation to make predictions of the price for a house with square footage beyond the range of the data is not recommended.

50. a. The Minitab output follows:

The regression equation is
Y = 66.1 + 0.402 X

Predictor   Coef     SE Coef   T      p
Constant    66.10    32.06     2.06   0.094
X           0.4023   0.2276    1.77   0.137

S = 12.62   R-sq = 38.5%   R-sq(adj) = 26.1%

Analysis of Variance

SOURCE           DF   SS       MS      F      p
Regression        1   497.2    497.2   3.12   0.137
Residual Error    5   795.7    159.1
Total             6   1292.9

Unusual Observations
Obs.   X     Y        Fit      SEFit   Residual   St.Resid
1      135   145.00   120.42   4.87    24.58      2.11R

R denotes an observation with a large standardized residual.

The standardized residuals are: 2.11, -1.08, .14, -.38, -.78, -.04, -.41

The first observation appears to be an outlier since it has a large standardized residual.

b. [Standardized residual plot: standardized residuals versus fitted values]

The standardized residual plot indicates that the observation x = 135, y = 145 may be an outlier; note that this observation has a standardized residual of 2.11.

c. The scatter diagram is shown below

[Scatter diagram: y versus x]

The scatter diagram also indicates that the observation x = 135, y = 145 may be an outlier; the implication is that for simple linear regression an outlier can be identified by looking at the scatter diagram.

51. a.
The Minitab output is shown below:

The regression equation is
Y = 13.0 + 0.425 X

Predictor   Coef     SE Coef   T      p
Constant    13.002   2.396     5.43   0.002
X           0.4248   0.2116    2.01   0.091

S = 3.181   R-sq = 40.2%   R-sq(adj) = 30.2%

Analysis of Variance

SOURCE           DF   SS       MS      F      p
Regression        1   40.78    40.78   4.03   0.091
Residual Error    6   60.72    10.12
Total             7   101.50

Unusual Observations
Obs.   X      Y       Fit     Stdev.Fit   Residual   St.Resid
7      12.0   24.00   18.10   1.20        5.90       2.00R
8      22.0   19.00   22.35   2.78        -3.35      -2.16RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

The standardized residuals are: -1.00, -.41, .01, -.48, .25, .65, 2.00, -2.16

The last two observations in the data set appear to be outliers since the standardized residuals for these observations are 2.00 and -2.16, respectively.

b. Using Minitab, we obtained the following leverage values:

.28, .24, .16, .14, .13, .14, .14, .76

Minitab identifies an observation as having high leverage if h_i > 6/n; for these data, 6/n = 6/8 = .75. Since the leverage for the observation x = 22, y = 19 is .76, Minitab would identify observation 8 as a high leverage point. Thus, we conclude that observation 8 is an influential observation.

c. [Scatter diagram: y versus x]

The scatter diagram indicates that the observation x = 22, y = 19 is an influential observation.

52. a. [Scatter diagram: Program Expenses (%) versus Fundraising Expenses (%)]

The scatter diagram does indicate potential influential observations. For example, the 22.2% fundraising expense for the American Cancer Society and the 16.9% fundraising expense for the St.
Jude Children’s Research Hospital look like they may each have a large influence on the slope of the estimated regression line. And, with a fundraising expense of only 2.6%, the percentage spent on programs and services by the Smithsonian Institution (73.7%) seems to be somewhat lower than would be expected; thus, this observation may need to be considered as a possible outlier.

b. A portion of the Minitab output follows:

The regression equation is
Program Expenses (%) = 91.0 - 0.917 Fundraising Expenses (%)

Predictor                  Coef      SE Coef   T       P
Constant                   90.981    3.177     28.64   0.000
Fundraising Expenses (%)   -0.9172   0.3392    -2.70   0.027

S = 7.47387   R-Sq = 47.7%   R-Sq(adj) = 41.2%

Analysis of Variance

Source           DF   SS       MS       F      P
Regression        1   408.35   408.35   7.31   0.027
Residual Error    8   446.87   55.86
Total             9   855.22

Unusual Observations
       Fundraising    Program
Obs    Expenses (%)   Expenses (%)   Fit     SE Fit   Residual   St Resid
3      2.6            73.70          88.60   2.67     -14.90     -2.13R
5      22.2           71.60          70.62   5.90     0.98       0.21 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

c. The slope of the estimated regression equation is -0.917. Thus, for every 1% increase in the amount spent on fundraising, the percentage spent on program expenses will decrease by .917%; in other words, just a little under 1%. The negative slope and value seem to make sense in the context of this problem situation.

d. The Minitab output in part (b) indicates that there are two unusual observations:

Observation 3 (Smithsonian Institution) is an outlier because it has a large standardized residual.
Observation 5 (American Cancer Society) is an influential observation because it has high leverage.

Although fundraising expenses for the Smithsonian Institution are on the low side as compared to most of the other super-sized charities, the percentage spent on program expenses appears to be much lower than one would expect. It appears that the Smithsonian’s administrative expenses are too high.
But, thinking about the expenses of running a large museum like the Smithsonian, the percentage spent on administrative expenses may not be unreasonable and is just due to the fact that operating costs for a museum are in general higher than for some other types of organizations.

The very large value of fundraising expenses for the American Cancer Society suggests that this observation has a large influence on the estimated regression equation. The following Minitab output shows the results if this observation is deleted from the original data.

The regression equation is
Program Expenses (%) = 91.3 - 1.00 Fundraising Expenses (%)

Predictor                  Coef      SE Coef   T       P
Constant                   91.256    3.654     24.98   0.000
Fundraising Expenses (%)   -1.0026   0.5590    -1.79   0.116

S = 7.96708   R-Sq = 31.5%   R-Sq(adj) = 21.7%

The y-intercept has changed slightly, but the slope has changed from -.917 to -1.00.

53. a. [Scatter diagram: Debt/GDP (%) versus Gold Value ($B)]

b. There appears to be a positive relationship between the two variables. But observation 9 (U.S.) appears to be an observation with high leverage and may be very influential in terms of fitting a linear model to the data.

c. The Minitab output follows.

The regression equation is
Debt = 49.1 + 0.123 Gold Value

Predictor    Coef      SE Coef   T      P
Constant     49.08     15.12     3.25   0.014
Gold Value   0.12299   0.07847   1.57   0.161

S = 32.0394   R-Sq = 26.0%   R-Sq(adj) = 15.4%

Analysis of Variance

Source           DF   SS     MS     F      P
Regression        1   2522   2522   2.46   0.161
Residual Error    7   7186   1027
Total             8   9708
Unusual Observations
Obs   Gold Value   Debt   Fit     SE Fit   Residual   St Resid
9     487          93.2   109.0   29.5     -15.8      -1.27 X

X denotes an observation whose X value gives it large leverage.

d. The Minitab output identifies observation 9 as an observation whose x value gives it large leverage.

e. Looking at the scatter diagram in part (a), it looks like observation 9 will have a lot of influence on the estimated regression equation. To investigate this we can simply drop the observation from the data set and fit a new estimated regression equation. The Minitab output we obtained follows.

The regression equation is
Debt = 30.8 + 0.342 Gold Value

Predictor    Coef     SE Coef   T      P
Constant     30.77    19.85     1.55   0.172
Gold Value   0.3422   0.1804    1.90   0.107

S = 30.3907   R-Sq = 37.5%   R-Sq(adj) = 27.1%

Analysis of Variance

Source           DF   SS       MS       F      P
Regression        1   3324.2   3324.2   3.60   0.107
Residual Error    6   5541.6   923.6
Total             7   8865.7

Note that the slope of the estimated regression equation is now .342 as compared to a value of .123 when this observation is included. Thus, we see that this observation has a big impact on the value of the slope of the fitted line and hence we would say that it is an influential observation.

54. a. The scatter diagram does indicate potential outliers and/or influential observations. For example, the data for the Washington Redskins, New England Patriots, and the Dallas Cowboys not only have the three highest revenues, they also have the highest team values.

b.
A portion of the Minitab output follows:

The regression equation is
Value = - 252 + 5.83 Revenue

Predictor   Coef     SE Coef   T       P
Constant    -252.1   130.8     -1.93   0.064
Revenue     5.8317   0.5863    9.95    0.000

S = 87.2441   R-Sq = 76.7%   R-Sq(adj) = 76.0%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   753008   753008   98.93   0.000
Residual Error   30   228346   7612
Total            31   981354

Unusual Observations
Obs   Revenue   Value    Fit      SE Fit   Residual   St Resid
9     269       1612.0   1316.6   31.8     295.4      3.64R
19    282       1324.0   1392.5   38.6     -68.5      -0.88 X
21    214       1178.0   995.9    16.0     182.1      2.12R
22    213       1170.0   990.1    16.2     179.9      2.10R
32    327       1538.0   1654.9   63.7     -116.9     -1.96 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

c. The Minitab output indicates that there are five unusual observations:

Observation 9 (Dallas Cowboys) is an outlier because it has a large standardized residual.
Observation 19 (New England Patriots) is an influential observation because it has high leverage.
Observation 21 (New York Giants) is an outlier because it has a large standardized residual.
Observation 22 (New York Jets) is an outlier because it has a large standardized residual.
Observation 32 (Washington Redskins) is an influential observation because it has high leverage.

55. No. Regression or correlation analysis can never prove that two variables are causally related.

56. The estimate of a mean value is an estimate of the average of all y values associated with the same x. The estimate of an individual y value is an estimate of only one of the y values associated with a particular x.

57. The purpose of testing whether β1 = 0 is to determine whether or not there is a significant relationship between x and y. However, rejecting β1 = 0 does not necessarily imply a good fit. For example, if β1 = 0 is rejected and r² is low, there is a statistically significant relationship between x and y but the fit is not very good.
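The distinction drawn in problem 56 shows up directly in the two standard errors: the one for an individual y value carries an extra "1 +" under the square root. As a rough illustration only, the sketch below reuses the summary values from problem 39 (s = 14.41, n = 7, mean of x = 29, sum of squared deviations of x = 838, x* = 30, t = 2.571). The function name `interval_margins` is an assumption made for this example and does not come from the text.

```python
import math

def interval_margins(s, n, x_bar, sxx, x_star, t_crit):
    """Return (confidence margin, prediction margin) at x_star.

    Hypothetical helper for illustration; inputs are summary statistics
    from a fitted simple linear regression.
    """
    # Shared term: 1/n + (x* - x_bar)^2 / sum of (x_i - x_bar)^2
    core = 1.0 / n + (x_star - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(core)        # standard error for the mean value of y
    s_pred = s * math.sqrt(1.0 + core)  # standard error for an individual y value
    return t_crit * s_mean, t_crit * s_pred

# Summary values taken from problem 39
ci_margin, pi_margin = interval_margins(
    s=14.41, n=7, x_bar=29, sxx=838, x_star=30, t_crit=2.571
)
print(round(ci_margin, 1), round(pi_margin, 1))
```

With these inputs the margin for an individual value is always the larger of the two, which is why the prediction interval in problem 39 is so much wider than the confidence interval.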
58. a. [Scatter diagram: S&P 500 versus DJIA]

b. A portion of the Minitab output is shown below:

The regression equation is
S&P = - 669 + 0.157 DJIA

Predictor   Coef      SE Coef   T       P
Constant    -669.0    130.7     -5.12   0.000
DJIA        0.15727   0.01015   15.49   0.000

S = 9.60811   R-Sq = 94.9%   R-Sq(adj) = 94.5%

Analysis of Variance

Source           DF   SS      MS      F        P
Regression        1   22146   22146   239.89   0.000
Residual Error   13   1200    92
Total            14   23346

c. Using the F test, the p-value corresponding to F = 239.89 is .000. Because the p-value ≤ α = .05, we reject H0: β1 = 0; there is a significant relationship.

d. With R-Sq = 94.9%, the estimated regression equation provided an excellent fit.

e. ŷ = -669.0 + .15727(DJIA) = -669.0 + .15727(13,500) = 1454

f. The DJIA is not that far beyond the range of the data. With the excellent fit provided by the estimated regression equation, we should not be too concerned about using the estimated regression equation to predict the S&P 500.

59. a. The Minitab output is shown below:

The regression equation is
Share Price ($) = - 2.99 + 0.911 Fair Value ($)

Predictor        Coef      SE Coef   T       P
Constant         -2.987    5.791     -0.52   0.610
Fair Value ($)   0.91128   0.09783   9.31    0.000

S = 12.0064   R-Sq = 76.9%   R-Sq(adj) = 76.1%

Analysis of Variance

Source           DF   SS      MS      F       P
Regression        1   12507   12507   86.76   0.000
Residual Error   26   3748    144
Total            27   16255

ŷ = -2.987 + .91128 Fair Value ($)

b. Significant relationship: p-value = .000 < α = .05

c. ŷ = -2.987 + .91128 Fair Value ($) = -2.987 + .91128(50) = 42.577 or approximately $42.58

d. The estimated regression equation should provide a good estimate because r² = 0.769

60. a.
The scatter diagram indicates a positive linear relationship between the two variables. Online universities with higher retention rates tend to have higher graduation rates.

b. The Minitab output follows:

The regression equation is
GR(%) = 25.4 + 0.285 RR(%)

Predictor   Coef      SE Coef   T      P
Constant    25.423    3.746     6.79   0.000
RR(%)       0.28453   0.06063   4.69   0.000

S = 7.45610   R-Sq = 44.9%   R-Sq(adj) = 42.9%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   1224.3   1224.3   22.02   0.000
Residual Error   27   1501.0   55.6
Total            28   2725.3

Unusual Observations
Obs   RR(%)   GR(%)   Fit     SE Fit   Residual   St Resid
2     51      25.00   39.93   1.44     -14.93     -2.04R
3     4       28.00   26.56   3.52     1.44       0.22 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

c. Because the p-value = .000 < α = .05, the relationship is significant.

d. The estimated regression equation is able to explain 44.9% of the variability in the graduation rate based upon the linear relationship with the retention rate. It is not a great fit, but given the type of data, the fit is reasonably good.

e. In the Minitab output in part (b), South University is identified as an observation with a large standardized residual. With a retention rate of 51% it does appear that the graduation rate of 25% is low as compared to the results for other online universities. The president of South University should be concerned after looking at the data. Using the estimated regression equation, we estimate that the graduation rate at South University should be 25.4 + .285(51) = 40%.

f. In the Minitab output in part (b), the University of Phoenix is identified as an observation whose x value gives it large leverage. With a retention rate of only 4%, the president of the University of Phoenix should be concerned after looking at the data.

61. The Minitab output is shown below:

The regression equation is
Expense = 10.5 + 0.953 Usage

Predictor   Coef     SE Coef   T      p
Constant    10.528   3.745     2.81   0.023
X           0.9534   0.1382    6.90   0.000

S = 4.250   R-sq = 85.6%   R-sq(adj) = 83.8%

Analysis of Variance

SOURCE           DF   SS        MS       F       p
Regression        1   860.05    860.05   47.62   0.000
Residual Error    8   144.47    18.06
Total             9   1004.53

Fit     Stdev.Fit   95% C.I.         95% P.I.
39.13   1.49        (35.69, 42.57)   (28.74, 49.52)

a. ŷ = 10.528 + .9534 Usage

b. Since the p-value corresponding to F = 47.62 is .000 < α = .05, we reject H0: β1 = 0.

c. The 95% prediction interval is 28.74 to 49.52 or $2874 to $4952

d. Yes, since the expected expense is ŷ = 10.528 + .9534(30) = 39.13 or $3913.

62. a. The Minitab output is shown below:

The regression equation is
Defects = 22.2 - 0.148 Speed

Predictor   Coef       SE Coef   T       P
Constant    22.174     1.653     13.42   0.000
Speed       -0.14783   0.04391   -3.37   0.028

S = 1.489   R-Sq = 73.9%   R-Sq(adj) = 67.4%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   25.130   25.130   11.33   0.028
Residual Error    4   8.870    2.217
Total             5   34.000

Predicted Values for New Observations
New Obs   Fit      SE Fit   95.0% CI           95.0% PI
1         14.783   0.896    (12.294, 17.271)   (9.957, 19.608)

b. Since the p-value corresponding to F = 11.33 is .028 < α = .05, the relationship is significant.

c. r² = .739; a good fit. The least squares line explained 73.9% of the variability in the number of defects.

d. Using the Minitab output in part (a), the 95% confidence interval is 12.294 to 17.271.

63. a.
[Scatter diagram: Days versus Distance]

There appears to be a negative linear relationship between distance to work and number of days absent.

b. The Minitab output is shown below:

The regression equation is
Days = 8.10 - 0.344 Distance

Predictor   Coef       SE Coef   T       p
Constant    8.0978     0.8088    10.01   0.000
X           -0.34420   0.07761   -4.43   0.002

S = 1.289   R-sq = 71.1%   R-sq(adj) = 67.5%

Analysis of Variance

SOURCE           DF   SS       MS       F       p
Regression        1   32.699   32.699   19.67   0.002
Residual Error    8   13.301   1.663
Total             9   46.000

Fit     Stdev.Fit   95% C.I.         95% P.I.
6.377   0.512       (5.195, 7.559)   (3.176, 9.577)

c. Since the p-value corresponding to F = 19.67 is .002 < α = .05, we reject H0: β1 = 0. There is a significant relationship between the number of days absent and the distance to work.

d. r² = .711. The estimated regression equation explained 71.1% of the variability in y; this is a reasonably good fit.

e. The 95% confidence interval is 5.195 to 7.559 or approximately 5.2 to 7.6 days.

64. a. The Minitab output is shown below:

The regression equation is
Cost = 220 + 132 Age

Predictor   Coef     SE Coef   T      p
Constant    220.00   58.48     3.76   0.006
X           131.67   17.80     7.40   0.000

S = 75.50   R-sq = 87.3%   R-sq(adj) = 85.7%

Analysis of Variance

SOURCE           DF   SS       MS       F       p
Regression        1   312050   312050   54.75   0.000
Residual Error    8   45600    5700
Total             9   357650

Fit     Stdev.Fit   95% C.I.         95% P.I.
746.7   29.8        (678.0, 815.4)   (559.5, 933.9)

b. Since the p-value corresponding to F = 54.75 is .000 < α = .05, we reject H0: β1 = 0. Maintenance cost and age of bus are related.

c. r² = .873. The least squares line provided a very good fit.

d. The 95% prediction interval is 559.5 to 933.9 or $559.50 to $933.90

65. a.
The Minitab output is shown below:

The regression equation is
Points = 5.85 + 0.830 Hours

Predictor   Coef     SE Coef   T      p
Constant    5.847    7.972     0.73   0.484
X           0.8295   0.1095    7.58   0.000

S = 7.523   R-sq = 87.8%   R-sq(adj) = 86.2%

Analysis of Variance

SOURCE           DF   SS       MS       F       p
Regression        1   3249.7   3249.7   57.42   0.000
Residual Error    8   452.8    56.6
Total             9   3702.5

Fit     Stdev.Fit   95% C.I.         95% P.I.
84.65   3.67        (76.19, 93.11)   (65.35, 103.96)

b. Since the p-value corresponding to F = 57.42 is .000 < α = .05, we reject H0: β1 = 0. Total points earned is related to the hours spent studying.

c. 84.65 points

d. The 95% prediction interval is 65.35 to 103.96

66. a. The Minitab output is shown below:

The regression equation is
Horizon = 0.275 + 0.950 S&P 500

Predictor   Coef     SE Coef   T      P
Constant    0.2747   0.9004    0.31   0.768
S&P 500     0.9498   0.3569    2.66   0.029

S = 2.664   R-Sq = 47.0%   R-Sq(adj) = 40.3%

Analysis of Variance

Source           DF   SS        MS       F      P
Regression        1   50.255    50.255   7.08   0.029
Residual Error    8   56.781    7.098
Total             9   107.036

The market beta for Horizon is b1 = .95

b. Since the p-value = 0.029 is less than α = .05, the relationship is significant.

c. r² = .470. The least squares line does not provide a very good fit.

d. Xerox has higher risk with a market beta of 1.22.

67. a. The Minitab output is shown below:

The regression equation is
Audit% = - 0.471 + 0.000039 Income

Predictor   Coef         SE Coef      T       P
Constant    -0.4710      0.5842       -0.81   0.431
Income      0.00003868   0.00001731   2.23    0.038

S = 0.2088   R-Sq = 21.7%   R-Sq(adj) = 17.4%

Analysis of Variance

Source           DF   SS        MS        F      P
Regression        1   0.21749   0.21749   4.99   0.038
Residual Error   18   0.78451   0.04358
Total            19   1.00200

Predicted Values for New Observations
New Obs   Fit      SE Fit   95.0% CI           95.0% PI
1         0.8828   0.0523   (0.7729, 0.9927)   (0.4306, 1.3349)

b.
Since the p-value = 0.038 is less than α = .05, the relationship is significant.

c. r² = .217. The least squares line does not provide a very good fit.

d. The 95% confidence interval is .7729 to .9927.

68. a. [Scatter diagram: Price ($1000s) versus Miles (1000s)]

b. There appears to be a negative relationship between the two variables that can be approximated by a straight line. An argument could also be made that the relationship is perhaps curvilinear because at some point a car has so many miles that its value becomes very small.

c. The Minitab output is shown below.

The regression equation is
Price ($1000s) = 16.5 - 0.0588 Miles (1000s)

Predictor       Coef       SE Coef   T       P
Constant        16.4698    0.9488    17.36   0.000
Miles (1000s)   -0.05877   0.01319   -4.46   0.000

S = 1.54138   R-Sq = 53.9%   R-Sq(adj) = 51.2%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   47.158   47.158   19.85   0.000
Residual Error   17   40.389   2.376
Total            18   87.547

d. Significant relationship: p-value = 0.000 < α = .05.

e. r² = .539; a reasonably good fit considering that the condition of the car is also an important factor in what the price is.

f. The slope of the estimated regression equation is -.0588. Thus, a one-unit increase in the value of x coincides with a decrease in the value of y equal to .0588. Because the data were recorded in thousands, every additional 1000 miles on the car’s odometer will result in a $58.80 decrease in the predicted price.

g. The predicted price for a 2007 Camry with 60,000 miles is ŷ = 16.5 - .0588(60) = 12.97 or approximately $13,000. Because of other factors, such as condition and whether the seller is a private party or a dealer, this is probably not the price you would offer for the car. But it should be a good starting point in figuring out what to offer the seller.
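The unit conversion in parts (f) and (g) of problem 68 can be checked with a few lines of code. The sketch below is only an illustration: it takes the intercept and slope from the Coef column of the Minitab output in part (c) (16.4698 and -0.05877), and the function names are assumptions made for this example. Because it uses the unrounded coefficients, its prediction differs slightly from the value obtained with the rounded equation.

```python
# Coefficients copied from the Coef column of the Minitab output in part (c).
B0 = 16.4698    # intercept, price in $1000s
B1 = -0.05877   # slope, change in price ($1000s) per 1000 miles

def predicted_price(miles_thousands):
    """Predicted price in $1000s for a car with the given mileage (in 1000s)."""
    return B0 + B1 * miles_thousands

def dollars_per_thousand_miles():
    """Change in predicted price, in dollars, per additional 1000 miles."""
    # Price is recorded in $1000s, so multiply the slope by 1000.
    return B1 * 1000

print(round(predicted_price(60), 2))           # predicted price in $1000s at 60,000 miles
print(round(dollars_per_thousand_miles(), 2))  # dollars per additional 1000 miles
```

Keeping both variables in their recorded units (thousands) and converting only at the end avoids the most common slip in this kind of interpretation.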