Chapter 14 Multiple Regression and Correlation Analysis

True/False

1. Multiple regression analysis is used when two or more independent variables are used to predict a value of a single dependent variable.
Answer: True

2. Multiple regression analysis is used when one independent variable is used to predict values of two or more dependent variables.
Answer: False

3. The values of b1, b2 and b3 in a multiple regression equation are called the net regression coefficients.
Answer: True

4. A net regression coefficient, b3, indicates the change in the predicted value for a unit change in X3 when all other Xi variables are held constant.
Answer: True

5. Multiple regression analysis examines the relationship of several dependent variables on the independent variable.
Answer: False

6. A multiple regression equation defines the relationship between a dependent variable and a set of independent variables in the form of an equation.
Answer: True

7. In multiple regression analysis, a and b1 are sample statistics that estimate the population parameters α and β1.
Answer: True

8. The coefficient of multiple determination reports the strength of the association between a dependent variable and a set of independent variables.
Answer: True

9. In a multiple regression analysis with two independent variables, the multiple standard error of estimate measures the variation of the dependent variable about a regression plane.
Answer: True

10. A coefficient of multiple determination could be equal to –0.76.
Answer: False

11. A coefficient of multiple determination equaling –0.99 shows that the dependent variable is inversely related to a set of independent variables.
Answer: False

12. Multiple R² measures the proportion of explained variation relative to total variation.
Answer: True

13. The multiple coefficient of determination, R², reports the proportion of the variation in Y that is not explained by the variation in the set of independent variables.
Answer: False

14. A correlation matrix shows individual correlation coefficients for all pairs of variables.
Answer: True

15. A correlation matrix can be used to assess multicollinearity between independent variables.
Answer: True

16. A correlation matrix can be used to assess homoscedasticity between independent variables.
Answer: False

17. To test the global hypothesis in multiple regression analysis, a t-statistic is used.
Answer: False

18. To test the global hypothesis in multiple regression analysis, an F-statistic is used.
Answer: True

19. A dummy variable is added to the regression equation to control for error.
Answer: False

20. If a dummy variable for gender is included in a multiple regression analysis, "male" would be coded as 1 and "female" would be coded as 2.
Answer: False

21. Autocorrelation often happens when data has been collected over periods of time.
Answer: True

22. Homoscedasticity occurs when the variance of the residuals (Y – Ŷ) is different for different values of Ŷ.
Answer: False

23. In multiple regression analysis, a residual is the difference between the value of an independent variable and its corresponding dependent variable value.
Answer: False

24. In multiple regression analysis, a residual is the difference between the value of a dependent variable, Y, and its predicted value, Ŷ.
Answer: True

Multiple Choice

25. In multiple regression analysis, residual analysis is used to test the requirement that
A) the variation in the residuals is the same for all fitted values of Ŷ
B) the independent variables are the direct cause of the dependent variable
C) the number of independent variables included in the analysis is correct
D) prediction error is minimized
Answer: A

26. A valid multiple regression analysis assumes or requires that
A) the dependent variable is measured using an ordinal, interval, or ratio scale
B) the residuals follow an F-distribution
C) the independent variables and the dependent variable have a linear relationship
D) the observations are autocorrelated
Answer: C

27. How is the degree of association between a set of independent variables and a dependent variable measured?
A) Confidence intervals
B) Autocorrelation
C) Coefficient of multiple determination
D) Standard error of estimate
Answer: C

28. In a multiple regression ANOVA table, explained variation is represented by
A) the regression sum of squares
B) the total sum of squares
C) the regression coefficients
D) the correlation matrix
Answer: A

29. If the coefficient of multiple determination is 0.81, what percent of variation is not explained?
A) 19%
B) 90%
C) 66%
D) 81%
Answer: A

30. In multiple regression analysis, testing the global null hypothesis that the multiple regression coefficients are all zero is based on
A) a z statistic
B) a t statistic
C) an F statistic
D) a binomial distribution
Answer: C

31. What is the range of values for multiple R?
A) –100% to +100% inclusive
B) –100% to 0% inclusive
C) 0% to +100% inclusive
D) Unlimited range
Answer: C

32. When does multicollinearity occur in a multiple regression analysis?
A) The dependent variables are highly correlated
B) The independent variables are minimally correlated
C) The independent variables are highly correlated
D) The independent variables have no correlation
Answer: C

33. In multiple regression analysis, when the independent variables are highly correlated, this situation is called ____________________.
A) Autocorrelation
B) Multicollinearity
C) Homoscedasticity
D) Curvilinearity
Answer: B

34. In the general multiple regression equation, which of the following variables represents the Y intercept?
A) b1
B) x1
C) Ŷ
D) a
Answer: D

35. If there are four independent variables in a multiple regression equation, there are also four
A) Y-intercepts.
B) regression coefficients.
C) dependent variables.
D) constant terms.
Answer: B

36. What does the multiple standard error of estimate measure?
A) Change in Ŷ for a change in X1
B) The "error" or variability in predicting Y
C) The regression mean square error in the ANOVA table
D) Amount of explained variation
Answer: B

37. If a multiple regression analysis is based on ten independent variables collected from a sample of 125 observations, what will be the value of the denominator in the calculation of the multiple standard error of estimate?
A) 125
B) 10
C) 114
D) 115
Answer: C

38. If the correlation between the two independent variables of a regression analysis is 0.11 and each independent variable is highly correlated to the dependent variable, what does this indicate?
A) Multicollinearity between these two independent variables
B) A negative relationship is not possible
C) Only one of the two independent variables will explain a high percent of the variation
D) An effective regression equation
Answer: D

39. If the correlation between the two independent variables of a regression analysis is 0.11 and each independent variable is highly correlated to the dependent variable, what does this indicate?
A) Only one of the independent variables should be used in the regression equation.
B) The independent variables are strongly related.
C) Two separate regression equations are required.
D) Both independent variables should be used to predict the dependent variable.
Answer: D

40. What does the correlation matrix for a multiple regression analysis contain?
A) Multiple correlation coefficients
B) Simple correlation coefficients
C) Multiple coefficients of determination
D) Multiple standard errors of estimate
Answer: B

41. What can we conclude if the global test of regression does not reject the null hypothesis?
A) A strong relationship exists among the variables
B) No relationship exists between the dependent variable and any of the independent variables
C) The independent variables are good predictors
D) Good forecasts are possible
Answer: B

42. What can we conclude if the global test of regression rejects the null hypothesis?
A) Strong correlations exist among the variables
B) No relationship exists between the dependent variable and any of the independent variables
C) At least one of the net regression coefficients is not equal to zero
D) Good predictions are not possible
Answer: C

43. What are the degrees of freedom associated with the regression sum of squares?
A) Number of independent variables
B) 1
C) F-ratio
D) (n – 2)
Answer: A

44. Which of the following is a characteristic of the F-distribution?
A) Normally distributed
B) Positively skewed
C) Negatively skewed
D) Equal to the t-distribution
Answer: B

45. In a regression analysis, three independent variables are used in the equation based on a sample of forty observations. What are the degrees of freedom associated with the F-statistic?
A) 3 and 39
B) 4 and 40
C) 3 and 36
D) 2 and 39
Answer: C

46. Hypotheses concerning individual regression coefficients are tested using which statistic?
A) t-statistic
B) z-statistic
C) χ² (chi-square statistic)
D) F
Answer: A

47. The coefficient of determination measures the proportion of
A) explained variation relative to total variation.
B) variation due to the relationship among variables.
C) error variation relative to total variation.
D) variation due to regression.
Answer: A

48. What happens as the scatter of data values about the regression plane increases?
A) Standard error of estimate increases
B) R² decreases
C) (1 – R²) increases
D) Error sum of squares increases
E) All of the above are correct
Answer: E

49. For a unit change in the first independent variable with other things being held constant, what change can be expected in the dependent variable in the multiple regression equation Ŷ = 5.2 + 6.3X1 – 7.1X2?
A) –7.1
B) +6.3
C) +5.2
D) +4.4
Answer: B

50. The best example of a null hypothesis for a global test of a multiple regression model is:
A) H0: β1 = β2 = β3 = β4 = 0
B) H0: μ1 = μ2 = μ3 = μ4
C) H0: β1 = 0
D) If F is greater than 20.00, then reject
Answer: A

51. The best example of an alternate hypothesis for a global test of a multiple regression model is:
A) H1: β1 = β2 = β3 = β4 = 0
B) H1: β1 ≠ β2 ≠ β3 ≠ β4
C) H1: Not all the β's are 0
D) If F is less than 20.00, then fail to reject
Answer: C

52. The best example of a null hypothesis for testing an individual regression coefficient is:
A) H0: β1 = β2 = β3 = β4 = 0
B) H0: μ1 = μ2 = μ3 = μ4
C) H0: β1 = 0
D) If F is greater than 20.00, then reject
Answer: C

53. In multiple regression analysis, residuals (Y – Ŷ) are used to:
A) Provide a global test of a multiple regression model
B) Evaluate multicollinearity
C) Evaluate homoscedasticity
D) Compare two regression coefficients
Answer: C

54. In multiple regression, a dummy variable can be included in a multiple regression model as
A) An additional quantitative variable
B) A nominal variable with three or more values
C) A nominal variable with only two values
D) A new regression coefficient
Answer: C

55. Multiple regression analysis is applied when analyzing the relationship between
A) An independent variable and several dependent variables
B) A dependent variable and several independent variables
C) Several dependent variables and several independent variables
D) Several regression equations and a single sample
Answer: B

Fill-in-the-Blank

56. Violating the need for successive observations of the dependent variable to be uncorrelated is called ____________________________.
Answer: autocorrelation

57. Multiple R² measures the proportion of ____________________.
Answer: explained variation

58. In multiple regression analysis, a variable whose possible outcomes are coded as a "1" or a "0" is called a(n) __________________________.
Answer: dummy variable

59. If a dependent variable and one or more independent variables are inversely related, what is the sign for the regression coefficients of the independent variables? ______________
Answer: negative

60. A frequent use of a correlation matrix is to check for _____________.
Answer: multicollinearity

61. In a multiple regression analysis ANOVA table, what determines the number of degrees of freedom associated with the regression sum of squares? ____________________
Answer: the number of independent variables

62. If the null hypothesis, H0: β4 = 0, is not rejected, what effect does the independent variable, X4, have when predicting the dependent variable? ______
Answer: no effect

63. What is the proportion of total variation in the dependent variable that is explained by the independent variables for a multiple R² = 0.90? _______
Answer: 90% or 0.90

64. Given a multiple linear regression equation Ŷ = 5.1 + 2.2X1 – 3.5X2, what will a unit increase in the independent variable, X2, mean in the change of Ŷ assuming other things are held constant? ________
Answer: –3.5

65. When the variance of the differences between the actual and the predicted values of the dependent variable is approximately the same, the variables are said to exhibit _______________________________.
Answer: homoscedasticity

66. A method for selecting the best subset of variables in a multiple regression equation is: ____________
Answer: stepwise regression

67. In the following regression equation, Ŷ = a + b1x1 + b2x2 + b3(x1x2), (x1x2) is the ___________
Answer: interaction of x1 and x2

Multiple Choice

Use the following to answer questions 68-71: The following correlations were computed as part of a multiple regression analysis that used education, job, and age to predict income.
            Income   Education   Job      Age
Income      1.000
Education   0.677    1.000
Job         0.173    –0.181      1.000
Age         0.369    0.073       0.689    1.000

68. What is this table called?
A) Net regression coefficients
B) Coefficients of nondetermination
C) Analysis of variance
D) Correlation matrix
Answer: D

69. Which is the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: A

70. Which independent variable has the strongest association with the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: C

71. Which independent variable has the weakest association with the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: D

Fill-in-the-Blank

Use the following to answer questions 72-78: It has been hypothesized that overall academic success for college freshmen as measured by grade point average (GPA) is a function of IQ scores X1, hours spent studying each week X2, and one's high school average X3. Suppose the regression equation is: Ŷ = –6.9 + 0.055X1 + 0.107X2 + 0.0083X3. The multiple standard error is 6.313 and R² = 0.826.

72. What is the predicted GPA for a student with an IQ of 108, 32 hours spent studying per week and a high school average of 82? _____
Answer: 3.1446

73. What is the predicted GPA if the IQ is 108, the number of hours spent studying is 30, and the high school average is 82? ______
Answer: 2.9306

74. Assuming other independent variables are held constant, what effect on the GPA will there be if the number of hours spent studying per week increases from 32 to 36? ________
Answer: +0.428

75. For which independent variable does a unit change have the least effect on GPA? ___________________
Answer: high school average X3

76. For which independent variable does a unit change have the greatest effect on the GPA? ________________
Answer: hours spent studying per week X2

77. How many dependent variables are in the regression equation? ___
Answer: one

78. How will a student's GPA be affected if an additional hour is spent studying each weeknight? ________
Answer: increases by 0.535

Multiple Choice

Use the following to answer questions 79-87: Twenty-one executives in a large corporation were randomly selected for a study to determine the effect of several factors on annual salary (expressed in $000's). The factors selected were age, seniority, years of college, number of company divisions they had been exposed to and the level of their responsibility. A regression analysis was performed using a popular spreadsheet program with the following regression output:

Constant                  23.00371
Std Error of Y estimate    2.91933
R²                         0.91404
No. of Observations       21
Degrees of Freedom        15

                  Age      Sen     Educ    # of Div   Level
X Coefficients    –0.031   0.381   1.452   –0.089     3.554
Std Err of Coef   0.183    0.158   0.387   0.541      0.833

79. Which one of the following is the dependent variable?
A) Age
B) Seniority
C) Level of responsibility
D) Annual salary
E) Experience in number of company divisions
Answer: D

Fill-in-the-Blank

80. Write out the multiple regression equation. _______________________
Answer: Ŷ = 23.004 – 0.031X1 + 0.381X2 + 1.452X3 – 0.089X4 + 3.554X5

81. Which of the following has the most influence on salary: 20 years of seniority, 5 years of college, or attaining 55 years of age? _______
Answer: 20 years of seniority

82. If the other variables are held constant, how does an increase of one level of responsibility affect salary? ___________
Answer: +$3,554

83. If other variables are held constant, how does an increase in age of two years affect salary? _________________
Answer: –$62

84. What proportion of the total variation in salary is accounted for by the set of independent variables? ___________
Answer: 91.4%

85. What is the value of the denominator in the calculation of the multiple standard error of estimate? ___________
Answer: 15

86. Test the hypothesis that the regression coefficient for age is equal to 0 at the 0.05 significance level. ___________
Answer: d.f. = 15, t = –0.031/0.183 = –0.169, t-critical = ±2.131, fail to reject.
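
Worked check for questions 86 and 87: the short Python sketch below recomputes the t statistic for an individual coefficient as the coefficient divided by its standard error, using the Age and Educ columns of the regression output. The names `OUTPUT`, `t_statistic`, and `decision` are illustrative and not part of the test bank, and the critical value 2.131 is taken from the answer key rather than computed. With the tabulated values, the statistic for Age works out to about –0.169 and the one for Educ to about 3.752.

```python
# Sketch: t tests for individual regression coefficients (questions 86 and 87).
# Coefficients and standard errors are copied from the regression output for
# questions 79-87. T_CRITICAL is the two-tailed 0.05 value for 15 d.f. quoted
# in the answer key; it is an input here, not something this code derives.
T_CRITICAL = 2.131

# name -> (coefficient, standard error of the coefficient)
OUTPUT = {
    "Age": (-0.031, 0.183),
    "Educ": (1.452, 0.387),
}


def t_statistic(coef, std_err):
    """t statistic for H0: beta = 0, i.e. (b - 0) / s_b."""
    return (coef - 0.0) / std_err


def decision(t, t_critical=T_CRITICAL):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return "reject H0" if abs(t) > t_critical else "fail to reject H0"


results = {}
for name, (coef, std_err) in OUTPUT.items():
    t = t_statistic(coef, std_err)
    results[name] = (t, decision(t))
    print(f"{name}: t = {t:.3f}, {results[name][1]}")
```

The same two functions apply to any other column of the output, for example Sen (0.381, 0.158) or Level (3.554, 0.833), as long as the degrees of freedom and significance level stay the same.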
87. Test the hypothesis that the regression coefficient for education is equal to 0 at the 0.05 significance level. ___________
Answer: d.f. = 15, t = 3.752, t-critical = ±2.131, reject the null hypothesis and conclude that education and salary are significantly related.

Use the following to answer questions 88-93: The production of automobile tires in any given year is related to the number of automobiles produced this year and in prior years. Suppose our econometric model resulted in the following data.

X1 = Automobiles produced this year
X2 = Automobiles produced last year
X3 = Automobiles produced 2 years ago
X4 = Automobiles produced 3 years ago
X5 = Automobiles produced 4 years ago

           X1     X2     X3     X4     X5     Constant   Multiple R
Coef       5.00   0.25   0.67   2.12   3.44   –50,000    0.83
t-ratio    10.4   0.6    1.4    2.7    6.5

88. Which variable in the model is the most significant predictor of tire production? __________
Answer: X1

89. What proportion of the variation in tire production is explained by the predictor variables in the model? ________
Answer: 0.69

90. Which variable in the model is the least significant in predicting tire production? _________
Answer: X2

91. What is the equation for our model? ____________________________
Answer: number of tires produced = –50,000 + 5.00X1 + 0.25X2 + 0.67X3 + 2.12X4 + 3.44X5

92. How much does tire production increase for every thousand cars produced two years ago? _____
Answer: 670

93. How much does tire production change for every thousand cars produced three years ago? _____
Answer: 2,120

Use the following to answer questions 94-100: A real estate agent developed a model to relate a house's selling price (Y) to the area of floor space (X) and the area of floor space squared (X²). The multiple regression equation for this model is: Ŷ = 125 – 3X + X², where Ŷ = selling price (times $1000) and X = square feet of floor space (times 100).

94. What is the intercept (a)? _____________
Answer: $125 (in thousands)

95. What is the selling price of a house with 1000 square feet? ______
Answer: $195,000

96. What is the selling price of a house with 1500 square feet? ______
Answer: $305,000

97. What is the selling price of a house with 2000 square feet? ______
Answer: $465,000

98. What is the difference in selling prices of a house with 1600 square feet and one with 1700 square feet? ______
Answer: $30,000 ($363,000 – $333,000)

99. What is the difference in selling prices of a house with 1700 square feet and one with 1800 square feet? ______
Answer: $32,000 ($395,000 – $363,000)

100. What is the difference in selling prices of a house with 1650 square feet and one with 1750 square feet? ______
Answer: $31,000 ($378,750 – $347,750)

Multiple Choice

Use the following to answer questions 101-106: A manager at a local bank analyzed the relationship between monthly salary and three independent variables: length of service (measured in months), gender (0 = female, 1 = male) and job type (0 = clerical, 1 = technical). The following ANOVA summarizes the regression results:

ANOVA
             df   SS             MS             F
Regression    3   1004346.771    334782.257     5.96
Residual     26   1461134.596    56197.48445
Total        29   2465481.367

             Coefficients   Standard Error   t Stat   P-value
Intercept    784.92         322.25           2.44     0.02
Service      9.19           3.20             2.87     0.01
Gender       222.78         89.00            2.50     0.02
Job          –28.21         89.61            –0.31    0.76

101. Based on the ANOVA and a 0.05 significance level, the global null hypothesis test of the multiple regression model
A) Will be rejected, with the conclusion that monthly salary is related to all of the independent variables
B) Will be rejected, with the conclusion that monthly salary is related to at least one of the independent variables
C) Will not be rejected
D) Will show a high multiple coefficient of determination
Answer: B

102. Based on the ANOVA, the multiple coefficient of determination is
A) 5.957%
B) 59.3%
C) 40.7%
D) cannot be computed
Answer: C

103. Based on the hypothesis tests for the individual regression coefficients,
A) All the regression coefficients are not equal to zero.
B) "Job" is the only significant variable in the model.
C) Only months of service and gender are significantly related to monthly salary.
D) "Service" is the only significant variable in the model.
Answer: C

104. In the regression model, which of the following are dummy variables?
A) Intercept
B) Service
C) Service and gender
D) Gender and job
E) Service, gender, and job
Answer: D

105. The results for the variable gender show that
A) males average $222.78 more than females in monthly salary
B) females average $222.78 more than males in monthly salary
C) gender is not related to monthly salary
D) gender and months of service are correlated
Answer: A

106. Based on the hypothesis tests for individual regression coefficients,
A) All regression coefficients should remain in the regression equation.
B) Based on the standard errors, the variable, service, should not be included in the regression equation.
C) Based on the p-values, the variable, job, should not be included in the regression equation.
D) The relationship between monthly salary and gender is linear.
Answer: C

Essay

107. What are the five assumptions of linear multiple regression?
Answer: 1) There is a linear relationship between the dependent variable and the independent variables, 2) the variation of the residuals is the same for small and large values of Ŷ, 3) the residuals are normally distributed, 4) the independent variables should not be correlated, 5) the residuals are independent.

108. How are scatter diagrams used to evaluate the assumptions of linear regression?
Answer: A scatter diagram can be used to evaluate the assumption of linearity. For each independent variable, the dependent variable can be plotted against the independent variable. These plots provide evidence of linear relationships.

109. How are residual plots drawn and used to evaluate the assumptions of linear regression?
Answer: A residual plot graphs the residuals against the values of one of the independent variables. A residual plot is graphed for each independent variable. To support the assumption of equal variation for small and large values of the independent variable, the points should be evenly distributed above and below zero and evenly distributed over all values of the independent variable.

110. What statistic is used to assess multicollinearity in multiple regression analysis?
Answer: Variance inflation factor (VIF)

Fill-in-the-Blank

Use the following to answer questions 111-115: It has been hypothesized that overall academic success for college freshmen as measured by grade point average (GPA) is a function of IQ scores X1, hours spent studying each week X2, and one's high school average X3. Suppose the regression equation is: Ŷ = –6.9 + 0.055X1 + 0.107X2 + 0.0083X3 + 0.00004X2X3. The multiple standard error is 6.313 and R² = 0.826.

111. What is the predicted GPA for a student with an IQ of 108, 32 hours spent studying per week and a high school average of 82? _____
Answer: 3.249

112. What is the predicted GPA if the IQ is 108, the number of hours spent studying is 30, and the high school average is 82? ______
Answer: 3.029

113. Assuming other independent variables are held constant, what effect on the GPA will there be if the number of hours spent studying per week increases from 32 to 36? ________
Answer: The answer depends on the value of the high school average X3, because hours studied also enters the equation through the X2X3 interaction term

114. How many independent variables are in the regression equation? ___
Answer: four

115. How will a student's GPA be affected if the student's high school average was 80 and an additional hour is spent studying each weeknight? ________
Answer: increases by 0.551
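
Worked check for questions 111-115: the Python sketch below evaluates the interaction model and shows why the effect of extra study hours depends on the high school average. It assumes an intercept of –6.9 and an interaction coefficient of 0.00004, since those are the values that reproduce the answers to questions 111, 112 and 115. The function names `predicted_gpa` and `effect_of_extra_hours` are illustrative only.

```python
# Sketch: GPA model with an X2*X3 interaction term (questions 111-115).
# ASSUMPTION: intercept -6.9 and interaction coefficient 0.00004, chosen
# because they reproduce the answer key for questions 111, 112 and 115.
INTERCEPT = -6.9
B_IQ = 0.055           # X1: IQ score
B_HOURS = 0.107        # X2: hours spent studying per week
B_HS_AVG = 0.0083      # X3: high school average
B_INTERACT = 0.00004   # X2 * X3 interaction


def predicted_gpa(iq, hours, hs_avg):
    """Evaluate the fitted equation for one student."""
    return (INTERCEPT
            + B_IQ * iq
            + B_HOURS * hours
            + B_HS_AVG * hs_avg
            + B_INTERACT * hours * hs_avg)


def effect_of_extra_hours(extra_hours, hs_avg):
    """Change in predicted GPA when study hours rise by extra_hours.

    With the interaction term, the slope on hours is
    B_HOURS + B_INTERACT * hs_avg, so it varies with the high school average.
    """
    return extra_hours * (B_HOURS + B_INTERACT * hs_avg)


print(round(predicted_gpa(108, 32, 82), 4))    # question 111
print(round(predicted_gpa(108, 30, 82), 4))    # question 112
print(round(effect_of_extra_hours(5, 80), 4))  # question 115: five weeknights
print(round(effect_of_extra_hours(4, 70), 4),  # question 113: the 32-to-36 change
      round(effect_of_extra_hours(4, 90), 4))  # differs by high school average
```

The last line illustrates question 113: the same four-hour increase produces a different change in predicted GPA for a high school average of 70 than for 90, so no single number answers it.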