Multiple Linear Regression

QUESTIONS 1-7 DEAL WITH THE FOLLOWING SITUATION:

Stock prices, Y, are assumed to be affected by the annual dividend rate, X1, and the annual return on equity, X2. A first-order regression was fit to the data, and the following analysis resulted.

Source             DF   Sum of Squares   Mean Square   F Value   Prob
Model               2       1148.64291     574.32145    57.887   0.0001
Error              25        248.03566       9.92143
Total (Adjusted)   27       1396.67857

Root MSE   3.14983      R-square   0.8224
Dep Mean   22.10714

Parameter Estimates

Variable    Coefficient Estimate   Standard Error   T for H0: B=0   Prob
INTERCEPT             -15.830443       4.67316808          -3.388   0.0023
X1                     12.276010       1.19702953          10.255   0.0001
X2                      0.609433       0.28306125           2.153   0.0412

Obs   X1     X2     Actual Y   Predicted Value   95% LCL Mean   95% UCL Mean   95% LCL Individual   95% UCL Individual
29    2.96   17.1   .          30.9279           28.4359        33.4198        23.9785              37.8772

1. I am 95% confident that the average stock price for all stocks with a dividend rate of 2.96 and an equity return of 17.1 falls in the range
A. 35.00 to 30.92
B. 20.95 to 23.74
C. 28.44 to 33.42
D. 15.71 to 28.98
E. 23.98 to 37.88

2. What is the p-value for testing for the effect of equity?
A. 2.153
B. 0.0001
C. 0.0412
D. 0.05
E. 0.8224

3. What percent of the sample variability in stock prices can be attributed to variation in the dividend rate and the return on equity?
A. 78.14
B. 82.24
C. 78.95
D. 80.82
E. 89.27

4. The rejection region for testing that equity and/or dividend rate is useful for estimating the mean stock price is
A. F > F(2, 25, 0.05)
B. |t| > t(2, 25, 0.025)
C. t > t(25, 0.025) or t < -t(25, 0.025)
D. F < F(25, 0.05)
E. F > F(25, 0.05)

5. If there is interaction between annual rate of dividend and return on equity, then
A. the stock price interacts with annual rate of dividend.
B. the change in the mean stock price associated with each additional point in the dividend rate is a linear function of the annual return on equity.
C. the slope of annual rate of dividend is a function of the stock price.
D. the mean stock price depends on annual rate of dividend and/or the return on equity (only true for interaction model).
E. annual rate of dividend is correlated with the return on equity.

6. What is your conclusion after testing β2 = 0 versus β2 ≠ 0?
A. At alpha=0.05, we can say that after adjusting for the dividend rate, the return on equity does help predict the stock price.
B. At alpha=0.05, we can not say that the dividend rate does help predict the stock price.
C. At alpha=0.05, we can say that the dividend rate does help predict the stock price.
D. At alpha=0.05, we can say that after adjusting for the dividend rate, the return on equity does not help predict the stock price.
E. At alpha=0.05, we can not say that after adjusting for the dividend rate, the return on equity does help predict the stock price.

7. What is the estimate of the typical sample error when trying to predict stock price with the dividend rate and return on equity?
A. 3.14983
B. 2.31635
C. 0.82240
D. 4.67317
E. 1.19703

ANSWERS
1. C  Observation 11, third and fourth columns from left
2. C  p-value
3. B  definition of r-square
4. A  F(k, n-k-1)
5. B  interaction interpretation
6. A  testing equity: p-value = 0.0412 < 0.05, reject H0
7. A  root MSE is the standard error of y given x1 and x2

QUESTIONS 8-14 DEAL WITH THE FOLLOWING SITUATION:

A collector of antique grandfather clocks believes that the price received for the clocks, Y, at an antique auction increases with the age of the clocks, X1, and with the number of bidders, X2. A first-order regression was fit to a random sample of 32 clocks, and the following analysis resulted.
Analysis of Variance

Source    DF   Sum of Squares     Mean Square   F Value   Prob
Model      2     4277159.7034    2138579.8517   120.651   0.0
Error     29     514034.51534     17725.32812
C Total   31     4791194.2188

Root MSE   133.13650      R-square   0.8927
Dep Mean   1327.15625     Adj R-sq   0.8853

Parameter Estimates

Variable   Coefficient Estimate   Standard Error   T for H0: Parameter=0   Prob
INTERCEP           -1336.722052     173.35612607                  -7.711   0.0001
X1                    12.736199       0.90238049                  14.114   0.0001
X2                    85.815133       8.70575681                   9.857   0.0001

8. The estimated slope of number of bidders is:
A 85.815133
B 0.90238049
C 0.0001
D 12.736199
E 14.114

9. The true slope parameter of age is interpreted as:
A The change in the mean auction price for each additional year of age.
B The change in the auction price for each additional year of age.
C The change in the mean auction price for each additional year of age when the number of bidders is held constant.
D The mean auction price given the age, holding the number of bidders constant.
E The change in the estimated mean auction price for each additional year of age.

10. The test statistic value for testing the utility of the model is:
A 3.33
B 120.651
C 14.114
D 133.1365
E 0.8927

11. The rejection region for testing that increases in the mean auction price of the clock will be associated with increases in the number of bidders (age held constant) is
A t > 2.042 or t < -2.042
B t > 2.045
C t > 1.699 or t < -1.699
D t > 1.699
E t > 1.697

12. The probability of saying that "the mean auction price is different for clocks one year apart in age (holding number of bidders constant)" when actually there is no difference is called
A a Type II error
B the p-value
C the slope
D the power of the test
E alpha

13. The degrees of freedom of the estimated variation in the auction prices for all clocks of the same age and number of bidders is:
A 1
B 31
C 30
D 29
E 2

14. If there is interaction between age and number of bidders, then
A the auction price interacts with age.
B the change in the mean auction price associated with each additional bidder is a linear function of the age.
C the slope of age is a function of the auction price.
D the mean auction price depends on age and/or the number of bidders (only true for interaction model).
E age is correlated with the number of bidders.

ANSWERS
8. A
9. C
10. B
11. D
12. E
13. D
14. B

QUESTIONS 15-23 DEAL WITH THE FOLLOWING SITUATION:

The expected sales of a product, Y, in a city are assumed to be affected by the per capita discretionary income (PCDI), X1, and the population of the city, X2. A first-order model was fit to a random sample of 15 cities.

Analysis of Variance

Source   DF   Sum of Squares   Mean Square    F Value   Prob>F
Model     2      53844.71643   26922.35822   5679.466   0.0001
Error    12         56.88357       4.74030
Total    14      53901.60000

Root MSE   2.17722       R-square   0.9989
Dep Mean   150.60000     Adj R-sq   0.9988

Parameter Estimates

Variable   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob
INTERCEP                3.452            2.430                   1.420   0.1809
POP                     0.496            0.006                  81.924   0.0001
INCOME                  0.009            0.001                   9.502   0.0001

Obs   POP   INCOME   Actual   Pred    Lower95% Mean   Upper95% Mean   Lower95% Predict   Upper95% Predict
16    220   2500     .        135.6   134.1           137.1           130.6              140.5
17    375   3500     .        221.7   219.8           223.6           216.5              226.8

15. The null hypothesis for the test of model usefulness is interpreted as
A. PCDI and Population size are not linearly related.
B. Mean sales is not a linear function of PCDI and Population size.
C. PCDI and Population size do not vary.
D. PCDI and Population size do not help predict the sales.
E. PCDI and Population size do help predict the sales.

16. What is the test statistic value when testing that both coefficients are equal to zero?
A. 2.17722
B. 81.924 + 9.502 = 90.426
C. 0.1809
D. 0.9988
E. 5679.466

17. What is the interpretation of the prediction interval for observation 17?
A. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $219,800 and $223,600.
B. With 95% confidence we can say that all cities with 375,000 people and a PCDI of $3,500 would have mean sales between $219,800 and $223,600.
C. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $67,300 and $292,300.
D. With 95% confidence we can say that all cities with 375,000 people and a PCDI of $3,500 would have mean sales between $216,500 and $226,800.
E. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $216,500 and $226,800.

18. When testing the alternative hypothesis that β1 is less than zero, what is your conclusion?
A. Since p-value = 0.0001, we can say that when holding PCDI constant, increases in population size are associated with decreases in mean sales.
B. Since p-value > 0.05, we can say that increases in population size are associated with decreases in mean sales.
C. Since the test statistic value is 9.502, we can not say that when holding PCDI constant, increases in population size are associated with increases in mean sales.
D. Since p-value < 0.05, we can say that when holding city size constant, PCDI does help estimate mean sales.
E. Since p-value > 0.05, we can not say that when holding PCDI constant, increases in population size are associated with decreases in mean sales.

19. What would be the rejection region when testing that the change in the mean sales with each one dollar increase in the PCDI depends on the number of people in the city? Reject H0 if
A. |t| > t(12, 0.025)
B. F > F(2, 12, 0.05)
C. chi-squared > chi-squared(12, 0.05)
D. t > t(11, 0.025) or t < -t(11, 0.025)
E. F > F(3, 11, 0.05)

20. What is the meaning of the confidence interval for the mean value given X1=375 and X2=3500?
A. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $67,300 and $292,300.
B. With 95% confidence we can say that all cities with 375,000 people and a PCDI of $3,500 would have mean sales between $216,500 and $226,800.
C. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $216,500 and $226,800.
D. With 95% confidence we can say that a city with 375,000 people and a PCDI of $3,500 would have sales between $219,800 and $223,600.
E. With 95% confidence we can say that all cities with 375,000 people and a PCDI of $3,500 would have mean sales between $219,800 and $223,600.

21. For all cities with the same PCDI, what is the estimated change in the expected sales when the population of the city increases by one?
A. 99.89% increase
B. 99.88% increase
C. $2.177
D. $9
E. $496

22. Does interaction of X1 and X2 imply that there is correlation between X1 and X2?
A. Yes.
B. No.

23. What is the value for the multiple coefficient of determination?
A. 2.177
B. 0.0001
C. 9.502
D. 1.4457
E. 99.89%

ANSWERS
15. D
16. E
17. E
18. E
19. D
20. E
21. E
22. B
23. E

QUESTIONS 24-27 DEAL WITH THE FOLLOWING SITUATION:

An instructor of BUSA 5325 wants to know if the exam scores of the third exam can be predicted from the exam scores of the first two exams. The numeric scores of the second and third exam are available. However, the first exam seems not to have a linear relationship with exam 3 and has been changed to a categorical variable. The variables are

Y   = Exam score on the third exam
X1a = 1 if student made an A on exam 1, 0 if not
X1b = 1 if student made a B on exam 1, 0 if not
X2  = Exam score on the second exam

The following model is proposed:

E(Y) = β0 + β1 X1a + β2 X1b + β3 X2

A random sample of 32 of the students was selected and the analysis of variance report is below.
PARALLEL (NO-INTERACTION) MODEL

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      3       2347.91005     782.63668     7.308   0.0009
Error     28       2998.55870     107.09138
C Total   31       5346.46875

Root MSE   10.34850     R-square   0.4392
Dep Mean   77.28125

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP    1            47.043655      12.71357992                   3.700       0.0009
X1a         1             9.801066       4.17220766                   2.349       0.0261
X1b         1            -5.364949       4.02638082                  -1.332       0.1935
X2          1             0.356597       0.15862042                   2.248       0.0326

Variable   DF   Standardized Estimate   Tolerance    Variance Inflation
INTERCEP    1              0.00000000   .                    0.00000000
X1a         1              0.37838587   0.77202700           1.29529148
X1b         1             -0.20590050   0.83883060           1.19213582
X2          1              0.36907595   0.74317755           1.34557349

24. Based on the above F value 7.308 and its p-value, the following conclusion can be made: At alpha = 0.05
A. We can say that the average exam-3 score is affected by either the exam-2 scores or the exam-1 categories.
B. We can say that the average exam-3 scores differ among the exam 1 categories after adjusting for the exam-2 scores.
C. We can not say that exam-2 grades can help predict exam-3 scores after adjusting for the exam 1 categories.
D. I can say that I haven't the foggiest idea what you are talking about (Hint: this is a wrong answer).
E. We can not say that the average exam-3 scores differ among the exam 1 categories after adjusting for the exam-2 scores.

25. For students with the same exam score on the second exam, what is the estimated difference in the average exam-3 grades for students who made an A on exam 1 minus the average exam-3 grades for students who made lower than a B?
A. 9.80
B. 4.03
C. -5.36
D. 0.36
E. 4.91

26. For the inferences in the report to be valid, certain assumptions must hold. Which of the following is not an assumption?
A. The grades on the third exam are normally distributed for any grade on the second exam and any category of the first exam.
B. For any grade on the second exam and for any category of the first exam, the grades on the third exam have the same variation.
C. The differences between a student's third exam score and their expected grade on the third exam (given their grade on the second exam and their category on the first exam) are independent from student to student.
D. The letter grade on the first exam is independent of the numeric grade on the second exam.
E. The expected grade on the third exam is a linear function (as specified above) of the second exam grade for any category of the first exam.

27. What is the rejection region for testing that the second exam grade is useful for predicting the third exam grade when the grade category of the first exam is held constant? Reject H0 if
A. F > F(3, 28, 0.05)
B. F < F(3, 28, 0.05)
C. F > F(2, 28, 0.05)
D. |t| > t(28, 0.025)
E. t > t(31, 0.05)

Questions 28-31 deal with the same situation as in questions 24-27 but use the interaction model:

E(Y) = β0 + β1 X1a + β2 X1b + β3 X2 + β4 X1aX2 + β5 X1bX2

where X1aX2 = EXAM2 score times (dummy variable for Exam 1 A students) and X1bX2 = EXAM2 score times (dummy variable for Exam 1 B students). The analysis of variance report is below.

INTERACTION MODEL

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      5       2882.16100     576.43220     6.082   0.0007
Error     26       2464.30775      94.78107
C Total   31       5346.46875

Root MSE   9.73556      R-square   0.5391
Dep Mean   77.28125

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP    1            61.952569      16.63008526                   3.725       0.0010
X1a         1           -48.877273      25.07804018                  -1.949       0.0622
X1b         1             4.914131      22.73269519                   0.216       0.8305
X2          1             0.126128       0.21473082                   0.587       0.5620
X1aX2       1             0.725660       0.30587356                   2.372       0.0254
X1bX2       1            -0.083436       0.27919460                  -0.299       0.7674

Variable   DF   Standardized Estimate   Tolerance    Variance Inflation
INTERCEP    1              0.00000000   .                    0.00000000
X1a         1              0.13054158   0.35891291           2.78619126
X1b         1             -1.88698547   0.01891229          52.87565911
X2          1              0.18859862   0.02328998          42.93691371
X1aX2       1              2.38205577   0.01758463          56.86784101
X1bX2       1             -0.24771236   0.02580162          38.75725377

28. Based on the F value 6.082 and its p-value, we can make the following conclusion: At alpha = 0.05
A. We can not say that the average exam-3 score is affected by either the exam-2 scores or the exam-1 categories.
B. We can say that either the exam 2 grade or the exam 1 letter grade helps predict the exam 3 scores.
C. We can not say that exam-2 grades can help predict exam-3 scores after adjusting for the exam 1 categories.
D. We can say that exam-2 grades can help predict exam-3 scores after adjusting for the exam 1 categories.
E. We can say that exam-3 grades can help predict exam-2 scores after adjusting for the exam 1 categories.

29. In an interaction model the difference in means is a function of another variable. Using the interaction model, what is the difference between the average exam-3 grades for category B students (students who made a B on exam 1) minus the average exam-3 grades for category C students (students who made neither an A nor a B)?
A. β2
B. β3 + β2 * X1b
C. β3
D. β2 + β5 * X2
E. β1 + β4 * X2

30. Which of the following is not a solution to some problems caused by multicollinearity?
A. Use other procedures than least squares.
B. Use a designed experiment.
C. Drop one or more of the correlated variables.
D. Increase your sample size.

31. Which of the following is not a result of multicollinearity?
A. The t values of important independent variables are small.
B. The predictions of the dependent variable become very poor.
C. Coefficients of variables could have signs that conflict with theoretical expectations.
D. The standard errors of important independent variables are large.
E. The experimental region becomes elliptical and narrow.

32. Is there multicollinearity among the variables of the interaction model?
A. No because the F-test of interaction is not significant: p-value of 0.0780 > 0.05
B. Yes because the largest VIF (56.87) > 10
C. No because the highest tolerance (.36) < 10
D. No because the standardized estimate (-.24) is less than zero.
E. Yes because R-squared (0.5391) is large.

33.

34. When you can not control the values of the independent variables, the data is said to be
A. latent variable data
B. observational data
C. experimental data
D. continuous data
E. a random sample

ANSWERS
24. A
25. A
26. D
27. D
28. B
29. D
30. D
31. B
32. B
33.
34. B

Questions 35-38 deal with the following situation:

A 10-speed bicycle shop is located near a large southern university. The owner of the shop is having difficulty determining the quantity of bicycles to order each month from the manufacturer. To solve the owner's problem, it is essential that the owner be able to predict the monthly demand for the bikes. For the last 15 months, the following variables are available:

Y  = the monthly demand for the bikes
X1 = the average price of lead-free gasoline for the month
X2 = 1 if fall quarter (September-November), 0 if not
X3 = 1 if winter quarter (December-February), 0 if not
X4 = 1 if spring quarter (March - July), 0 if not

See attached pages for statistics.

Questions 36-39 use the model:

E(Y) = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4

ANALYSIS OF VARIANCE REPORT

Dependent Variable: Y : Monthly Demand for 10-speed bicycles

Source   df   Sums of Squares   Mean Square   F-Ratio   Prob > F
Model     4          3861.859      965.4647      2.80      0.085
Error    10          3451.741      345.1741
Total    14          7313.6        522.4

Root Mean Square Error       18.57886
Mean of Dependent Variable   40.6
R Squared                    0.5280

Variable    Parameter Estimate   Standard Error   t-value (B=0)   Prob > |t|   VIF     TOL
Intercept             482.7626         183.3842            2.63       0.0251
X1                   -382.0742         152.1359           -2.51       0.0308   1.165   .8584
X2                    33.11131          13.632             2.43       0.0355   1.795   .5572
X3                    17                15.16958           1.12       0.2886   1.600   .6250
X4                    28.09902          14.98537           1.88       0.0902   1.908   .5240

35. The mean sales for the winter months fall on the line:
A E(Y) = β0 + β1 X1 + β2 X2 + β3 X3
B E(Y) = β0 + β2 + β1 X1
C E(Y) = β0 + β3 + β1 X1
D E(Y) = β0 + β4 + β1 X1
E E(Y) = β0 + β1 X1

36. Holding the average price of lead-free gasoline constant, the estimated difference in the mean sales for the winter months minus the mean sales for the summer months is
A 482.7626
B -382.0742
C 33.11131
D 17
E 28.09902

37. Using alpha of 0.05, what will be the rejection region for testing the alternative hypothesis that "the quarter of the year and/or the average price of lead-free gasoline help predict the monthly demand for 10-speed bicycles"?
A Reject H0 if t > 2.228 or t < -2.228
B Reject H0 if F > 3.11
C Reject H0 if F > 3.48
D Reject H0 if t > 1.812
E Reject H0 if t > 2.776 or t < -2.776

38. The β4 parameter in the model is interpreted as:
A The mean sales for the spring quarter.
B The mean sales for the spring quarter holding the average price of lead-free gas constant.
C The estimated difference in the mean sales for the spring months minus the mean sales for the summer months, holding the average price of lead-free gas constant.
D The difference in the mean sales for the spring months minus the mean sales for the summer months, holding the average price of lead-free gas constant.
E The difference in the mean sales when the average price of lead-free gas increases by $1, holding the quarter constant.

39. Since there are 15 consecutive months of data, what is often a problem with this type of data?
A nonlinearity
B unequal variance
C correlated errors
D non-normality

40. Multicollinearity says that
A the dependent variable is a linear function of the independent variables.
B at least one independent variable is (statistically) linearly related to or correlated with the other independent variables.
C the slope of one independent variable is a linear function of other independent variables.
D outliers exist in the data.
E the variance inflation factor will be small.

41. In the no-interaction model, there is
A no evidence of multicollinearity because all VIFs are < 10.
B no evidence of multicollinearity because all variables are significant.
C evidence of multicollinearity because X1 is significant.
D evidence of multicollinearity because all VIFs are < 10.
E no evidence of unequal variance because all VIFs are < 10.

42. Which of the following is not a result of multicollinearity?
A the t values of important independent variables are small.
B the standard errors of important independent variables are large.
C coefficients of variables could have signs that conflict with theoretical expectations.
D the predictions of the dependent variable become very poor.
E the experimental region becomes elliptical and narrow.

43. If a residual plot shows all assumptions appear to be satisfied except for linearity, which variable(s) should you transform?
A the dependent variable
B one or more of the independent variables
C both the dependent and independent variables

44. The residual plot for the interaction model is on the last page. The most obvious violation is:
A nonlinearity
B unequal variance
C correlated errors
D non-normality

ANSWERS
35. C
36. D
37. C
38. D
39. C
40. B
41. A
42. D
43. B
44. B

Questions 45-48 deal with the following situation:

A firm wishes to compare the costs among three couriers: DFW, Carborne, and Metro. The measured variables are

Y     = cost of delivery
x1    = 1 if courier is DFW, 0 if not
x2    = 1 if courier is Carborne, 0 if not
total = pickup plus delivery time

The following model is proposed:

E(Y) = B0 + B1 * X1 + B2 * X2 + B3 * TOTAL

A random sample of 83 deliveries was selected and the analysis of variance report is attached.
Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model      3       1003.99331     334.66444    98.961   0.0001
Error     79        267.16091       3.38178
C Total   82       1271.15422

Root MSE   1.83896      R-square   0.7898
Dep Mean   19.96265     Adj R-sq   0.7818

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Para=0   Prob > |T|   Tolerance   Variance Inflation
INTERCEP    1               11.539            0.747             15.445       0.0001   .                        0.000
X1          1               -1.086            0.537             -2.023       0.0465   0.687                    1.455
X2          1               -1.793            0.480             -3.733       0.0004   0.754                    1.325
TOTAL       1                0.194            0.012             15.622       0.0001   0.898                    1.113

Variable   Label
INTERCEP   Intercept
X1         1 if DFW, 0 if not
X2         1 if Carborne, 0 if not
TOTAL      delivery time plus pickup time

45. What would be the estimated mean cost for Carborne?
A. yhat = (11.539 - 1.086) + 0.194*total
B. yhat = (11.539 - 1.793) + 0.194*total
C. yhat = (11.539) + 0.194*total
D. E(Y) = B0 + B2 * X2 + B3 * TOTAL
E. yhat = 11.539 - 1.086*X1 - 1.793*x2 + 0.194*total

46. What would be the null hypothesis when testing "the difference in mean costs between DFW and Metro couriers is zero, after adjusting for the total time that it takes to pickup and deliver the package"?
A. H0: B1=0
B. H0: B2=0
C. H0: B3=0
D. H0: B1=B2=B3=0
E. H0: B1=B2=0

47. The test statistic value for TOTAL is 15.622. From this we can ______.
A. not say the mean cost differs among the couriers, after adjusting for the total time to pickup and deliver.
B. not say that the mean cost changes with changes in the total time to pickup and deliver, adjusting for the courier.
C. both courier and total help predict the cost.
D. say that the mean cost changes with changes in the total time to pickup and deliver, adjusting for the courier.
E. say the mean cost differs among the couriers, after adjusting for the total time to pickup and deliver.

48. What table value would be used in a 95% confidence interval for the mean value of y conditional on the x values?
A. t(82, 0.025)
B. F(2, 80, 0.025)
C. t(3, 0.05)
D. F(3, 79, 0.05)
E. t(79, 0.025)

49. If a linear combination of the independent variables is highly correlated with another independent variable, then this is called:
A. multicollinearity
B. a violation of the independence assumption
C. a violation of the linearity assumption
D. nonnormality
E. interaction

50. What does NOT happen as more redundant variables are added? Redundant variables would be ones that measure the same thing as variables already in the model. (Assume no missing values.)
A. MSE would increase.
B. Important variables would become insignificant.
C. The correlation between two independent variables would increase.
D. VIFs would increase.

51. In the model, what is the VIF for X1?
A. 14.792
B. 1.694
C. 1.080
D. 1.455
E. 0.537

ANSWERS
45. B
46. A
47. D
48. E
49. A
50. C
51. D
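
As a cross-check on the courier report used in questions 45-51, the summary statistics can be recomputed from the reported sums of squares, degrees of freedom, tolerances, and coefficients. The sketch below is only an illustration, not part of the original exam: the variable names and the helper carborne_mean_cost are hypothetical, and the inputs are the rounded values printed in the report, so results agree with the printed output only up to rounding.

```python
import math

# Values copied from the courier ANOVA report (questions 45-51).
ss_model, df_model = 1003.99331, 3
ss_error, df_error = 267.16091, 79
ss_total = 1271.15422

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# Overall F statistic, root MSE, R-square, and adjusted R-square.
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)
r_square = ss_model / ss_total
adj_r_square = 1 - (1 - r_square) * (df_model + df_error) / df_error

# VIF is the reciprocal of tolerance (tolerances as printed, 3 decimals).
tolerance = {"X1": 0.687, "X2": 0.754, "TOTAL": 0.898}
vif = {name: 1 / tol for name, tol in tolerance.items()}


def carborne_mean_cost(total):
    """Estimated mean cost for Carborne (X1 = 0, X2 = 1), as in question 45."""
    return (11.539 - 1.793) + 0.194 * total


print(f"F = {f_value:.3f}, Root MSE = {root_mse:.5f}")
print(f"R-square = {r_square:.4f}, Adj R-sq = {adj_r_square:.4f}")
print({name: round(value, 3) for name, value in vif.items()})
```

The same arithmetic (mean square = SS / DF, F = MS model / MS error, R-square = SS model / SS total, VIF = 1 / tolerance) applies to each of the other reports above.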
