Chapter 06 - Multiple Linear Regression Analysis
Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

CHAPTER 6
Answers to End of Chapter Problems

6.1
a. The correlations of most interest are those with college GPA, because it is the dependent variable. Hours studied has the highest correlation, then SAT score, and lastly high-school GPA. All of the independent variables are positively and relatively strongly related to college GPA. High-school GPA and hours studied also have a positive and somewhat strong relationship with each other, while high-school GPA and SAT score, and SAT score and hours studied, have positive but weaker relationships.
b. The correlation matrix sheds light on linear relationships but does not offer information on marginal effects, such as the effect that hours studied has on college GPA holding SAT score and high-school GPA constant.

6.2
a. The two conditions are $\hat{\delta}_1 = 0$ (this would mean that IQ and education are not linearly related) and $\hat{\beta}_2 = 0$ (this would mean that log(wage) and IQ are not linearly related).
b. $\hat{\beta}_2 = 0$ (this would mean that log(wage) and IQ are not linearly related)
$\hat{\delta}_1 = 0$ (this would mean that IQ and education are not linearly related)
c. It is generally the case that IQ and education are related to each other and that IQ is related to log(wage).

6.3
a. I would expect $\tilde{\beta}_1 = \hat{\beta}_1$.
b. Again, I would expect $\tilde{\beta}_1 = \hat{\beta}_1$.

6.4
I would advise my friend that it would be easiest to just regress y on x and z and perform a t-test on the coefficient on z. The method that the friend suggested is not quite right.
The friend should regress z on x, save the residuals, and then regress y on those residuals. This gives the same coefficient estimate as the first method, but the standard errors from the second method will be wrong.

6.5
a.
b. R-squared is 1,237,904/2,347,829 = 0.5273. This means that 52.73% of the variation in sales is explained by advertising, bonus, relative price, and competition.
c. 100%
d. Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$; $H_1$: at least one $\beta_j$ is not equal to 0.
Test statistic: F-statistic = 12.5471, and the p-value of the F-test is 0.001.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 0.001 < 0.01, we reject $H_0$ and conclude that at least one of advertising, bonus, relative price, and competition is related to sales at the 1% level.
e. Bonus
Hypothesis: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$
Test statistic: t-statistic = 11.6/3.02 = 3.84, and the p-value of the t-test is 0.003.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 0.003 < 0.01, we reject $H_0$ and conclude that bonus is statistically significant at the 1% level.
f. ADV: On average, holding bonus, relative price, and competition constant, if the amount spent on advertising increases by $1,000, then sales go up by $10,800.
Bonus: On average, holding advertising, relative price, and competition constant, if the amount spent on bonuses increases by $1,000, then sales go up by $11,600.
RELPRICE: On average, holding advertising, bonus, and competition constant, if Medicorp's relative price increases by $1, then sales go down by $122,000.
g.
The 95% confidence interval is
$\hat{\beta}_2 \pm t_{\alpha/2,\,n-k-1}\, s_{\hat{\beta}_2}$
$11.6 \pm (2.02)(3.02)$
(5.5, 17.7)
This interval does not include 0, which means that Bonus is statistically significant at the 5% level.
h. $\widehat{\text{Sales}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Adv}_i + \hat{\beta}_2 \text{Bonus}_i + \hat{\beta}_3 \text{RelPrice}_i + \hat{\beta}_4 \text{Compet}_i$
$\widehat{\text{Sales}}_i = 889.1 + 10.8\,\text{Adv}_i + 11.6\,\text{Bonus}_i - 122\,\text{RelPrice}_i - 0.18\,\text{Compet}_i$
$\widehat{\text{Sales}}_i = 889.1 + 10.8(60) + 11.6(45) - 122(0.7) - 0.18(100)$
$\widehat{\text{Sales}}_i = \$2{,}073{,}520$

6.6
a.
b. R-squared is 1,108,892/2,807,279.334 = 0.395. This means that 39.5% of the variation in price is explained by speed and charge.
c. Hypothesis: $H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one $\beta_j$ is not equal to 0.
Test statistic: F-statistic = 6.203, and the p-value of the F-test is 0.0048.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 0.0048 < 0.01, we reject $H_0$ and conclude that at least one of speed or charge is related to price at the 1% level.
d. Speed
Hypothesis: $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$
Test statistic: t-statistic = 10.1506/14.5019 = 0.7.
Rejection rule: Reject $H_0$ if |t-stat| > 2.1.
Decision: Because 0.7 < 2.1, we fail to reject $H_0$ and conclude that speed is not statistically significant at the 5% level.
e. Speed: On average, holding charge constant, if the speed increases by 1 megahertz, then price goes up by $10.15.
Charge: On average, holding speed constant, if the charge increases by one minute, then price goes up by $1.63.
f. The 95% confidence interval is
$\hat{\beta}_1 \pm t_{\alpha/2,\,n-k-1}\, s_{\hat{\beta}_1}$
$10.15 \pm (2.1)(14.5)$
(-22.19, 42.49)
This interval includes 0, which means that Speed is statistically insignificant at the 5% level. This interval brackets $\beta_1$.
g. $s_{y|x} = \sqrt{89{,}389} = 298.98$.
h. $\widehat{\text{Price}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Speed}_i + \hat{\beta}_2 \text{Charge}_i$
$\widehat{\text{Price}}_i = 1500.6 + 10.15\,\text{Speed}_i + 1.63\,\text{Charge}_i$
i. $\widehat{\text{Price}}_i = 1500.6 + 10.15(33) + 1.63(305)$
$\widehat{\text{Price}}_i = \$2{,}332.7$

6.7
a. On average, holding population within one mile and population beyond one mile constant, if the number of competitors within one mile goes up by 1, then sales go down by $4,200.
b. On average, holding competition within one mile and population beyond one mile constant, if the population within one mile goes up by 1,000 people, then sales go up by $6,800.
c. On average, holding competition within one mile and population within one mile constant, if the population beyond one mile goes up by 1,000 people, then sales go up by $2,300.
d. $\widehat{\text{Sales}}_i = 10.1 - 4.2(2) + 6.8(8) + 2.3(13)$
$\widehat{\text{Sales}}_i = \$86{,}000$

6.8
a. R-squared is 3660.74/4190.95 = 0.8735. This means that 87.35% of the variation in risk is explained by age, pressure, and family.
b. $s_{y|x} = \sqrt{33.14} = 5.7567$. This measures the variation left in risk after the effects of age, pressure, and family have been removed.
c. Age: On average, holding pressure and family constant, if the age of a person increases by one year, then risk goes up by 1.08, or 1.08%.
Pressure: On average, holding age and family constant, if the blood pressure of a person increases by one, then risk goes up by 0.25, or 0.25%.
Family: On average, holding age and pressure constant, if the number of strokes the person's parents have had goes up by one, then risk goes up by 8.74, or 8.74%.
d. Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$; $H_1$: at least one $\beta_j$ is not equal to 0.
Test statistic: F-statistic = 36.82, and the p-value of the F-test is 2.06E-07.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 2.06E-07 < 0.01, we reject $H_0$ and conclude that at least one of age, pressure, or family is related to risk at the 1% level.
e.
Pressure
Hypothesis: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$
Test statistic: t-statistic = 0.25/0.05 = 5.
Rejection rule: Reject $H_0$ if |t-stat| > 1.75.
Decision: Because 5 > 1.75, we reject $H_0$ and conclude that pressure is statistically significant at the 10% level.
f. The 95% confidence interval is
$\hat{\beta}_1 \pm t_{\alpha/2,\,n-k-1}\, s_{\hat{\beta}_1}$
$1.08 \pm (2.12)(0.17)$
(0.72, 1.44)
This interval does not include 0, which means that age is statistically significant at the 5% level. This interval brackets $\beta_1$.
g. $\widehat{\text{Risk}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Age}_i + \hat{\beta}_2 \text{Pressure}_i + \hat{\beta}_3 \text{Family}_i$
$\widehat{\text{Risk}}_i = -91.76 + 1.08\,\text{Age}_i + 0.25\,\text{Pressure}_i + 8.74\,\text{Family}_i$
$\widehat{\text{Risk}}_i = -91.76 + 1.08(70) + 0.25(165) + 8.74(1)$
$\widehat{\text{Risk}}_i = 33.83$, or their risk of a stroke is 33.83%.
h. The prediction could be above 100% or below 0%.

6.9
a. On average, holding square footage and niceness constant, if the number of bathrooms increases by one, then the price of the house increases by $10,043.
b. Increasing the niceness rating by 2 will increase the value of the house by (2)(10,042) = $20,084. If it costs $32,500 to increase the niceness rating from 5 to 7, then the seller recovers 20,084/32,500 = 0.618, or 61.8%, of that cost when selling the house.
c. Increasing the niceness rating by 2 will increase the value of the house by (2)(10,042) = $20,084. If it costs $24,000 to increase the niceness rating from 2 to 4, then the seller recovers 20,084/24,000 = 0.8368, or 83.68%, of that cost when selling the house.
d. From this question we see that it is more expensive to increase the niceness rating from 5 to 7 than from 2 to 4, but the way the regression model is specified, these increases change the value of the house by the same amount. It would be better to specify the model so that increasing the niceness rating has a nonconstant marginal effect on price.
e.
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \hat{\beta}_2 x_{2,i} + \hat{\beta}_3 x_{3,i}$
$\hat{y}_i = 24.976 + 0.0526 x_{1,i} + 10.043 x_{2,i} + 10.042 x_{3,i}$
$\hat{y}_i = 24.976 + 0.0526(3200) + 10.043(3) + 10.042(3)$
$\hat{y}_i = \$253{,}551$

6.10
a. To test this hypothesis we need to change the regression model to obtain the standard error.
$H_0: \beta_1 = \beta_2$ or $\beta_1 - \beta_2 = 0$
$H_1: \beta_1 \neq \beta_2$ or $\beta_1 - \beta_2 \neq 0$
$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$
Set $\beta_1 - \beta_2 = \theta$ and then solve for $\beta_1$:
$\beta_1 = \theta + \beta_2$
Then substitute this expression for $\beta_1$ into the model:
$y_i = \beta_0 + (\theta + \beta_2) x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$
$y_i = \beta_0 + \theta x_{1,i} + \beta_2 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i$
$y_i = \beta_0 + \theta x_{1,i} + \beta_2 (x_{1,i} + x_{2,i}) + \varepsilon_i$
This equation says that we need to create a new variable, $(x_{1,i} + x_{2,i})$, and then regress $y_i$ on $x_{1,i}$ and $(x_{1,i} + x_{2,i})$. The coefficient estimate on $x_{1,i}$ is $\hat{\theta}$, and the standard error on $x_{1,i}$ is the standard error of $\hat{\theta}$. The t-stat reported for $x_{1,i}$ is the t-stat for the hypothesis, and the p-value on $x_{1,i}$ is the p-value for this hypothesis. The rejection rule is to reject $H_0$ if the p-value < 0.05.
b. To generate a confidence interval around a prediction we need to change the data. We need to create two new columns in Excel. The first column is generated by taking each $x_1$ and subtracting 6 (the value of $x_1$ used for the prediction) for observations 1 through n; name it $(x_1 - 6)$. The second column is generated by taking each $x_2$ and subtracting 12 (the value of $x_2$ used for the prediction) for observations 1 through n; name it $(x_2 - 12)$. Once these two new columns are created, regress y on $(x_1 - 6)$ and $(x_2 - 12)$. The predicted value is the coefficient estimate in the intercept row, the standard error is in the next column, and
the 95% confidence interval for the prediction is the confidence interval reported in the intercept row of the coefficient estimates.

6.11
a. The sample regression function is
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_4$
$\hat{y}_i = 14{,}122.2409 + 63.1533 x_1 + 10.0958 x_2 + 31.5062 x_3 + 10.4609 x_4$
b. On average, holding the team's win/loss percentage, the opponent's win/loss percentage, and games played constant, when the temperature at game time goes up by one degree, game attendance goes up by 10.46 people.
c. $\hat{y}_i = 14{,}122.2409 + 63.1533(60) + 10.0958(40) + 31.5062(50) + 10.4609(70)$
$\hat{y}_i = 20{,}622.8439$ people.
d. To increase attendance the athletic director should win more games, play teams with a better record, play more games, and play games when the temperature is hotter.

Answers to End of Chapter Exercises

E6.1
Regression of experience on education
Regression of wage on the residuals from the regression above
Regression of wage on experience and education
These match!

E6.2
a. Number of Friends: On average, holding age and member in months constant, if the number of friends goes up by 1, then the number of times tagged in a picture goes up by 0.16.
Age: On average, holding number of friends and member in months constant, if the age of a person increases by 1 year, then the number of times tagged in a picture goes down by 1.93.
Member in Months: On average, holding number of friends and age constant, if the length of time the person has been a member of Facebook goes up by 1 month, then the number of times tagged in a picture goes up by 0.01.
b. Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$; $H_1$: at least one $\beta_j$ is not equal to 0.
Test statistic: F-statistic = 128.32, and the p-value of the F-test is 1.14E-29.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 1.14E-29 < 0.01, we reject $H_0$ and conclude that at least one of number of friends, age, or member in months is related to the number of times tagged at the 1% level.
c. Number of Friends
Hypothesis: $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$
Test statistic: t-statistic = 10.15, and the p-value of the t-test is 8.63E-16.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 8.63E-16 < 0.01, we reject $H_0$ and conclude that number of friends is statistically significant at the 1% level.
Age
Hypothesis: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$
Test statistic: t-statistic = -5.75, and the p-value of the t-test is 1.76E-16.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 1.76E-16 < 0.01, we reject $H_0$ and conclude that the age of the person is statistically significant at the 1% level.
Number of Months
Hypothesis: $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$
Test statistic: t-statistic = 0.079, and the p-value of the t-test is 0.94.
Rejection rule: Reject $H_0$ if the p-value < 0.10.
Decision: Because 0.94 > 0.10, we fail to reject $H_0$ and conclude that the number of months a person has been a member is not statistically significant at the 10% level.
d. The $R^2$ is 0.8351. This number tells us that 83.51 percent of the variation in the number of times a person is tagged is explained by number of friends, age, and member in months.
e.
$\widehat{\text{TimesTagged}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Friends}_i + \hat{\beta}_2 \text{Age}_i + \hat{\beta}_3 \text{Member}_i$
$\widehat{\text{TimesTagged}}_i = 27.0287 + 0.1615\,\text{Friends}_i - 1.929\,\text{Age}_i + 0.0112\,\text{Member}_i$
$\widehat{\text{TimesTagged}}_i = 27.0287 + 0.1615(250) - 1.929(25) + 0.0112(18)$
$\widehat{\text{TimesTagged}}_i = 19.3842$
The predicted number of times tagged is 19.38.
f. To obtain this interval in Excel, take each observation for each independent variable, subtract the value you would like to predict for, and regress times tagged on these new variables. For example, subtract 250 from each number-of-friends observation, 25 from each age observation, and 18 from each member observation. The intercept contains the predicted value, and the 95% confidence interval reported for the intercept is the 95% confidence interval for the mean. This 95% confidence interval for the mean prediction is (-5.8, 44.57).
g. To obtain the standard error for the individual prediction interval, take the standard error of the prediction (12.645) and add the standard error of the regression (27.1027). To obtain the 95% prediction interval, multiply this standard error by the critical value (2) and add and subtract the result from the predicted value. The 95% prediction interval for an individual is (-60.11, 98.88). Notice how much wider this is than the interval obtained in part f.

E6.3
a. Age: On average, holding hours working out, hours at work, and number of times sick constant, if the age of a person increases by one year, then their BMI goes up by 0.145.
Work Out: On average, holding age, hours at work, and number of times sick constant, if the number of hours a person spends working out in a week increases by one hour, then their BMI goes down by 0.195.
Work: On average, holding age, hours working out, and number of times sick constant, if the number of hours spent at work in a week increases by one hour, then their BMI goes up by 0.151.
Sick: On average, holding age, hours working out, and hours at work constant, if the number of times a person is sick in a year increases by one, then their BMI goes up by 0.4.
b. Hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$; $H_1$: at least one $\beta_j$ is not equal to 0.
Test statistic: F-statistic = 55.65, and the p-value of the F-test is 1.95E-17.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 1.95E-17 < 0.01, we reject $H_0$ and conclude that at least one of age, work out, work, or sick is related to BMI at the 1% level. This model is statistically significant.
c. Age
Hypothesis: $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$
Test statistic: t-statistic = 3.03, and the p-value of the t-test is 0.0039.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 0.0039 < 0.01, we reject $H_0$ and conclude that age is statistically significant at the 1% level.
Work Out
Hypothesis: $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$
Test statistic: t-statistic = -2.69, and the p-value of the t-test is 0.0097.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 0.0097 < 0.01, we reject $H_0$ and conclude that hours working out in a week is statistically significant at the 1% level.
Work
Hypothesis: $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$
Test statistic: t-statistic = 4.26, and the p-value of the t-test is 9.39E-05.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 9.39E-05 < 0.01, we reject $H_0$ and conclude that hours working in a week is statistically significant at the 1% level.
Sick
Hypothesis: $H_0: \beta_4 = 0$; $H_1: \beta_4 \neq 0$
Test statistic: t-statistic = 5.99, and the p-value of the t-test is 2.61E-05.
Rejection rule: Reject $H_0$ if the p-value < 0.01.
Decision: Because 2.61E-05 < 0.01, we reject $H_0$ and conclude that the number of times a person is sick in a year is statistically significant at the 1% level.
d. The $R^2$ is 0.8226. This number tells us that 82.26 percent of the variation in BMI is explained by age, work out, work, and sick.
e. $\widehat{\text{BMI}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Age}_i + \hat{\beta}_2 \text{WorkOut}_i + \hat{\beta}_3 \text{Work}_i + \hat{\beta}_4 \text{Sick}_i$
$\widehat{\text{BMI}}_i = 11.74 + 0.1451\,\text{Age}_i - 0.1955\,\text{WorkOut}_i + 0.1512\,\text{Work}_i + 0.4006\,\text{Sick}_i$
$\widehat{\text{BMI}}_i = 11.74 + 0.1451(35) - 0.1955(3) + 0.1512(45) + 0.4006(2)$
The predicted BMI is 23.8381.
f. To obtain this interval in Excel, take each observation for each independent variable, subtract the value you would like to predict for, and regress BMI on these new variables. The intercept contains the predicted value, and the 95% confidence interval reported for the intercept is the 95% confidence interval for the mean. This 95% confidence interval for the mean prediction is (22.54, 25.14).
g. To obtain the standard error for the individual prediction interval, take the standard error of the prediction (0.6466) and add the standard error of the regression (1.9427).
To obtain the 95% prediction interval, multiply this standard error by the critical value (2.02) and add and subtract the result from the predicted value. The 95% prediction interval for an individual is (18.61, 29.07). Notice how much wider this is than the interval obtained in part f.
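
The interval arithmetic in parts f and g of E6.3 can be checked with a short Python sketch. It uses only the values reported in the answer (predicted BMI, the two standard errors, and the critical value); the helper name `interval` and the variable names are illustrative and not part of the original solution. The last calculation is an assumption that does not come from this solution: many textbooks combine the two standard errors as the square root of the sum of their squares instead of adding them directly, which gives a narrower interval than the additive calculation described in part g.

```python
import math

# Values reported in E6.3 parts e-g.
y_hat = 23.8381   # predicted BMI
se_mean = 0.6466  # standard error of the prediction (intercept of the recentered regression)
s_reg = 1.9427    # standard error of the regression
t_crit = 2.02     # critical value used in the answer


def interval(center, se, t):
    """Return (lower, upper) for center +/- t * se."""
    half_width = t * se
    return center - half_width, center + half_width


# Part f: 95% confidence interval for the mean prediction.
mean_ci = interval(y_hat, se_mean, t_crit)

# Part g as described in the answer: the two standard errors are added.
se_additive = se_mean + s_reg
pi_additive = interval(y_hat, se_additive, t_crit)

# Assumption (not from the answer): combine the two sources of
# uncertainty as variances, then take the square root.
se_quadrature = math.sqrt(se_mean ** 2 + s_reg ** 2)
pi_quadrature = interval(y_hat, se_quadrature, t_crit)

print("mean CI:           (%.2f, %.2f)" % mean_ci)
print("PI, additive se:   (%.2f, %.2f)" % pi_additive)
print("PI, quadrature se: (%.2f, %.2f)" % pi_quadrature)
```

Swapping in the E6.2 values (19.3842, 12.645, 27.1027, and a critical value of 2) runs the same check for that exercise.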