Chapter 14 Multiple Regression and Correlation Analysis

True/False
1. Multiple regression analysis is used when two or more independent variables are used to predict a
value of a single dependent variable.
Answer: True
2. Multiple regression analysis is used when one independent variable is used to predict values of two or
more dependent variables.
Answer: False
3. The values of b1, b2 and b3 in a multiple regression equation are called the net regression coefficients.
Answer: True
4. A net regression coefficient, b3, indicates the change in the predicted value for a unit change in X3
when all other Xi variables are held constant.
Answer: True
5. Multiple regression analysis examines the relationship of several dependent variables on the
independent variable.
Answer: False
6. A multiple regression equation defines the relationship between a dependent variable and a set of
independent variables in the form of an equation.
Answer: True
7. In multiple regression analysis, a and b1 are sample statistics that estimate the population parameters,
α and β1.
Answer: True
8. The coefficient of multiple determination reports the strength of the association between a dependent
variable and a set of independent variables.
Answer: True
9. In a multiple regression analysis with two independent variables, the multiple standard error of
estimate measures the variation of the dependent variable about a regression plane.
Answer: True
10. A coefficient of multiple determination could be equal to –0.76.
Answer: False
11. A coefficient of multiple determination equaling –0.99 shows that the dependent variable is inversely
related to a set of independent variables.
Answer: False
12. Multiple R² measures the proportion of explained variation relative to total variation.
Answer: True
13. The multiple coefficient of determination, R², reports the proportion of the variation in Y that is not
explained by the variation in the set of independent variables.
Answer: False
14. A correlation matrix shows individual correlation coefficients for all pairs of variables.
Answer: True
15. A correlation matrix can be used to assess multicollinearity between independent variables.
Answer: True
16. A correlation matrix can be used to assess homoscedasticity between independent variables.
Answer: False
17. To test the global hypothesis in multiple regression analysis, a t-statistic is used.
Answer: False
18. To test the global hypothesis in multiple regression analysis, an F-statistic is used.
Answer: True
19. A dummy variable is added to the regression equation to control for error.
Answer: False
20. If a dummy variable for gender is included in a multiple regression analysis, "male" would be coded
as 1 and "female" would be coded as 2.
Answer: False
21. Autocorrelation often happens when data has been collected over periods of time.
Answer: True
22. Homoscedasticity occurs when the variance of the residuals (Y – Ŷ) is different for different values
of Ŷ.
Answer: False
23. In multiple regression analysis, a residual is the difference between the value of an independent
variable and its corresponding dependent variable value.
Answer: False
24. In multiple regression analysis, a residual is the difference between the value of a dependent variable,
Y, and its predicted value, Ŷ.
Answer: True
Multiple Choice
25. In multiple regression analysis, residual analysis is used to test the requirement that
A) the variation in the residuals is the same for all fitted values of Ŷ
B) the independent variables are the direct cause of the dependent variable
C) the number of independent variables included in the analysis is correct
D) prediction error is minimized
Answer: A
26. A valid multiple regression analysis assumes or requires that
A) the dependent variable is measured using an ordinal, interval, or ratio scale
B) the residuals follow an F-distribution
C) the independent variables and the dependent variable have a linear relationship
D) the observations are autocorrelated
Answer: C
27. How is the degree of association between a set of independent variables and a dependent variable
measured?
A) Confidence intervals.
B) Autocorrelation
C) Coefficient of multiple determination
D) Standard error of estimate
Answer: C
28. In a multiple regression ANOVA table, explained variation is represented by
A) the regression sum of squares
B) the total sum of squares
C) the regression coefficients
D) the correlation matrix
Answer: A
29. If the coefficient of multiple determination is 0.81, what percent of variation is not explained?
A) 19%
B) 90%
C) 66%
D) 81%
Answer: A
30. In multiple regression analysis, testing the global null hypothesis that the multiple regression
coefficients are all zero is based on
A) a z statistic
B) a t statistic
C) an F statistic
D) binomial distribution
Answer: C
31. What is the range of values for multiple R?
A) –100% to +100% inclusive
B) –100% to 0% inclusive
C) 0% to +100% inclusive
D) Unlimited range
Answer: C
32. When does multicollinearity occur in a multiple regression analysis?
A) The dependent variables are highly correlated
B) The independent variables are minimally correlated
C) The independent variables are highly correlated
D) The independent variables have no correlation
Answer: C
33. In multiple regression analysis, when the independent variables are highly correlated, this situation is
called ____________________.
A) Autocorrelation
B) Multicollinearity
C) Homoscedasticity
D) Curvilinearity
Answer: B
34. In the general multiple regression equation which of the following variables represents the Y
intercept?
A) b1
B) x1
C) Ŷ
D) a
Answer: D
35. If there are four independent variables in a multiple regression equation, there are also four
A) Y-intercepts.
B) regression coefficients.
C) dependent variables.
D) constant terms.
Answer: B
36. What does the multiple standard error of estimate measure?
A) Change in Ŷ for a change in X1
B) The "error" or variability in predicting Y
C) The regression mean square error in the ANOVA table
D) Amount of explained variation
Answer: B
37. If a multiple regression analysis is based on ten independent variables collected from a sample of 125
observations, what will be the value of the denominator in the calculation of the multiple standard error of
estimate?
A) 125
B) 10
C) 114
D) 115
Answer: C
38. If the correlation between the two independent variables of a regression analysis is 0.11 and each
independent variable is highly correlated to the dependent variable, what does this indicate?
A) Multicollinearity between these two independent variables
B) A negative relationship is not possible
C) Only one of the two independent variables will explain a high percent of the variation
D) An effective regression equation
Answer: D
39. If the correlation between the two independent variables of a regression analysis is 0.11 and each
independent variable is highly correlated to the dependent variable, what does this indicate?
A) Only one of the independent variables should be used in the regression equation.
B) The independent variables are strongly related.
C) Two separate regression equations are required.
D) Both independent variables should be used to predict the dependent variable.
Answer: D
40. What does the correlation matrix for a multiple regression analysis contain?
A) Multiple correlation coefficients
B) Simple correlation coefficients
C) Multiple coefficients of determination
D) Multiple standard errors of estimate
Answer: B
41. What can we conclude if the global test of regression does not reject the null hypothesis?
A) A strong relationship exists among the variables
B) No relationship exists between the dependent variable and any of the independent variables
C) The independent variables are good predictors
D) Good forecasts are possible
Answer: B
42. What can we conclude if the global test of regression rejects the null hypothesis?
A) Strong correlations exist among the variables
B) No relationship exists between the dependent variable and any of the independent variables
C) At least one of the net regression coefficients is not equal to zero.
D) Good predictions are not possible
Answer: C
43. What are the degrees of freedom associated with the regression sum of squares?
A) Number of independent variables
B) 1
C) F-ratio
D) (n – 2)
Answer: A
44. Which of the following is a characteristic of the F-distribution?
A) Normally distributed
B) Positively skewed
C) Negatively skewed
D) Equal to the t-distribution
Answer: B
45. In a regression analysis, three independent variables are used in the equation based on a sample of
forty observations. What are the degrees of freedom associated with the F-statistic?
A) 3 and 39
B) 4 and 40
C) 3 and 36
D) 2 and 39
Answer: C
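The degrees-of-freedom bookkeeping behind questions 37 and 45 can be checked with a short sketch. The function name `regression_df` is illustrative and not from the text; it assumes n observations, k independent variables, and a model that includes an intercept.

```python
def regression_df(n, k):
    """Return (regression df, residual df) for n observations and k predictors."""
    # Regression df equals the number of independent variables.
    # Residual df is n - (k + 1): one df is used for each slope plus the intercept.
    return k, n - (k + 1)

# Question 45: degrees of freedom for the F-statistic.
print(regression_df(40, 3))
# Question 37: the residual df is the denominator of the multiple standard error.
print(regression_df(125, 10)[1])
```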
46. Hypotheses concerning individual regression coefficients are tested using which statistic?
A) t-statistic
B) z-statistic
C) χ² (chi-square statistic)
D) F
Answer: A
47. The coefficient of determination measures the proportion of
A) explained variation relative to total variation.
B) variation due to the relationship among variables.
C) error variation relative to total variation.
D) variation due to regression.
Answer: A
48. What happens as the scatter of data values about the regression plane increases?
A) Standard error of estimate increases
B) R² decreases
C) (1 – R²) increases
D) Error sum of squares increases
E) All of the above are correct
Answer: E
49. For a unit change in the first independent variable with other things being held constant, what change
can be expected in the dependent variable in the multiple regression equation Ŷ = 5.2 + 6.3X1 – 7.1X2?
A) – 7.1
B) + 6.3
C) + 5.2
D) + 4.4
Answer: B
50. The best example of a null hypothesis for a global test of a multiple regression model is:
A) H0: β1 = β2 = β3 = β4 = 0
B) H0: μ1 = μ2 = μ3 = μ4
C) H0: β1 = 0
D) If F is greater than 20.00 then reject
Answer: A
51. The best example of an alternate hypothesis for a global test of a multiple regression model is:
A) H1: β1 = β2 = β3 = β4
B) H1: β1 ≠ β2 ≠ β3 ≠ β4
C) H1: Not all the β's are 0
D) If F is less than 20.00 then fail to reject
Answer: C
52. The best example of a null hypothesis for testing an individual regression coefficient is:
A) H0: β1 = β2 = β3 = β4 = 0
B) H0: μ1 = μ2 = μ3 = μ4
C) H0: β1 = 0
D) If F is greater than 20.00 then reject
Answer: C
53. In multiple regression analysis, residuals (Y – Ŷ) are used to:
A) Provide a global test of a multiple regression model.
B) Evaluate multicollinearity
C) Evaluate homoscedasticity
D) Compare two regression coefficients
Answer: C
54. In multiple regression, a dummy variable can be included in a multiple regression model as
A) An additional quantitative variable
B) A nominal variable with three or more values
C) A nominal variable with only two values
D) A new regression coefficient
Answer: C
55. Multiple regression analysis is applied when analyzing the relationship between
A) An independent variable and several dependent variables
B) A dependent variable and several independent variables
C) Several dependent variables and several independent variables
D) Several regression equations and a single sample
Answer: B
Fill-in-the-Blank
56. Violating the need for successive observations of the dependent variable to be uncorrelated is called
____________________________.
Answer: autocorrelation
57. Multiple R² measures the proportion of ____________________.
Answer: explained variation
58. In multiple regression analysis, a variable whose possible outcomes are coded as a "1" or a "0" is
called a(n) __________________________ .
Answer: dummy variable
59. If a dependent variable and one or more independent variables are inversely related, what is the sign
for the regression coefficients of the independent variables? ______________
Answer: negative
60. A frequent use of a correlation matrix is to check for _____________.
Answer: multicollinearity
61. In a multiple regression analysis ANOVA table, what determines the number of degrees of freedom
associated with the regression sum of squares? ____________________ .
Answer: the number of independent variables
62. If the null hypothesis, H0: β4 = 0, is not rejected, what effect does the independent variable, X4,
have when predicting the dependent variable? ______
Answer: no effect
63. What is the proportion of total variation in the dependent variable that is explained by the
independent variables for a multiple R² = 0.90? _______
Answer: 90% or 0.90
64. Given a multiple linear regression equation Ŷ = 5.1 + 2.2X1 – 3.5X2, what change in Ŷ will a unit
increase in the independent variable X2 produce, assuming other things are held constant? ________
Answer: -3.5
65. When the variance of the differences between the actual and the predicted values of the dependent
variable is approximately the same, the variables are said to exhibit
_______________________________.
Answer: homoscedasticity
66. A method for selecting the best subset of variables in a multiple regression equation is:
____________
Answer: Stepwise Regression
67. In the following regression equation, Ŷ = a + b1x1 + b2x2 + b3(x1x2), (x1x2) is the ___________
Answer: Interaction of x1 and x2
Multiple Choice
Use the following to answer questions 68-71:
The following correlations were computed as part of a multiple regression analysis that used education,
job, and age to predict income.
             Income   Education   Job      Age
Income       1.000
Education    0.677    1.000
Job          0.173    –0.181      1.000
Age          0.369    0.073       0.689    1.000
68. What is this table called?
A) Net regression coefficients
B) Coefficients of nondetermination
C) Analysis of variance
D) Correlation matrix
Answer: D
69. Which is the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: A
70. Which independent variable has the strongest association with the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: C
71. Which independent variable has the weakest association with the dependent variable?
A) Income
B) Age
C) Education
D) Job
Answer: D
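The reading of the correlation matrix in questions 68-71 can be sketched in a few lines. The dictionary layout and the 0.7 cutoff for flagging correlated predictors are assumptions chosen for illustration, not part of the question set.

```python
# Pairwise correlations from the matrix used in questions 68-71.
corr = {
    ("Income", "Education"): 0.677,
    ("Income", "Job"): 0.173,
    ("Income", "Age"): 0.369,
    ("Education", "Job"): -0.181,
    ("Education", "Age"): 0.073,
    ("Job", "Age"): 0.689,
}

dependent = "Income"
# Correlation of each independent variable with the dependent variable.
with_y = {pair[1]: r for pair, r in corr.items() if pair[0] == dependent}
strongest = max(with_y, key=lambda v: abs(with_y[v]))
weakest = min(with_y, key=lambda v: abs(with_y[v]))

# Pairs of independent variables whose correlation exceeds the cutoff.
CUTOFF = 0.7  # assumed rule of thumb, not taken from the text
flagged = [pair for pair, r in corr.items()
           if dependent not in pair and abs(r) > CUTOFF]

print(strongest, weakest, flagged)
```

Job and Age (0.689) come close to the assumed cutoff, which is the kind of pair a multicollinearity check would look at first.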
Fill-in-the-Blank
Use the following to answer questions 72-78:
It has been hypothesized that overall academic success for college freshmen as measured by grade point
average (GPA) is a function of IQ scores (X1), hours spent studying each week (X2), and one's high
school average (X3). Suppose the regression equation is: Ŷ = –6.9 + 0.055X1 + 0.107X2 + 0.0083X3.
The multiple standard error is 6.313 and R² = 0.826.
72. What is the predicted GPA for a student with an IQ of 108, 32 hours spent studying per week and a
high school average of 82? _____
Answer: 3.1446
73. What is the predicted GPA if the IQ is 108, the number of hours spent studying is 30, and the high
school average is 82? ______
Answer: 2.9306
74. Assuming other independent variables are held constant, what effect on the GPA will there be if the
numbers of hours spent studying per week increases from 32 to 36? ________
Answer: +0.428
75. For which independent variable does a unit change have the least effect on GPA?
___________________
Answer: high school average (X3)
76. For which independent variable does a unit change have the greatest effect on the GPA?
________________
Answer: hours spent studying per week (X2)
77. How many dependent variables are in the regression equation? ___
Answer: one
78. How will a student's GPA be affected if an additional hour is spent studying each weeknight?
________
Answer: increases by 0.535
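The answers to questions 72-78 can be reproduced with a small sketch. It assumes the equation Ŷ = –6.9 + 0.055X1 + 0.107X2 + 0.0083X3; the intercept sign and the X3 coefficient of 0.0083 are inferred because they are the values that reproduce the listed answers. The function name `predict_gpa` is illustrative.

```python
def predict_gpa(iq, hours, hs_avg):
    # Assumed equation: -6.9 + 0.055*X1 + 0.107*X2 + 0.0083*X3
    return -6.9 + 0.055 * iq + 0.107 * hours + 0.0083 * hs_avg

print(round(predict_gpa(108, 32, 82), 4))  # question 72
print(round(predict_gpa(108, 30, 82), 4))  # question 73

# Question 74: only hours change, so the effect is 4 times the X2 coefficient.
print(round(predict_gpa(108, 36, 82) - predict_gpa(108, 32, 82), 3))

# Question 78: one extra hour on each of five weeknights is five extra hours.
print(round(5 * 0.107, 3))
```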
Multiple Choice
Use the following to answer questions 79-87:
Twenty-one executives in a large corporation were randomly selected for a study to determine the effect
of several factors on annual salary (expressed in $000's). The factors selected were age, seniority, years
of college, number of company divisions they had been exposed to and the level of their responsibility. A
regression analysis was performed using a popular spreadsheet program with the following regression
output:
Constant                   23.00371
Std Error of Y estimate     2.91933
R²                          0.91404
No. of Observations        21
Degrees of Freedom         15

                    Age      Sen      Educ     # of Div   Level
X Coefficients     –0.031    0.381    1.452    –0.089     3.554
Std Err of Coef     0.183    0.158    0.387     0.541     0.833
79. Which one of the following is the dependent variable?
A) Age
B) Seniority
C) Level of responsibility
D) Annual salary
E) Experience in number of company divisions
Answer: D
Fill-in-the-Blank
80. Write out the multiple regression equation. _______________________
Answer: Ŷ = 23.004 – 0.031X1 + 0.381X2 + 1.452X3 – 0.089X4 + 3.554X5
81. Which of the following has the most influence on salary -- 20 years of seniority, 5 years of college or
attaining 55 years of age? _______
Answer: 20 years of seniority
82. If the other variables are held constant, how does an increase of one level of responsibility affect
salary? ___________
Answer: +$3,554
83. If other variables are held constant, how does an increase in age of two years affect salary?
_________________
Answer: -$62
84. What proportion of the total variation in salary is accounted for by the set of independent variables?
___________
Answer: 91.4%
85. What is the value of the denominator in the calculation of the multiple standard error of estimate?
___________
Answer: 15
86. Test the hypothesis that the regression coefficient for age is equal to 0 at the 0.05 significance level.
___________
Answer: d.f. = 15, t = –0.169 (–0.031/0.183), t-critical = ±2.131, fail to reject.
87. Test the hypothesis that the regression coefficient for education is equal to 0 at the 0.05 significance
level. ___________
Answer: d.f. = 15, t = 3.752, t-critical = ±2.131, reject the null hypothesis and conclude that
education and salary are significantly related.
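The individual coefficient tests in questions 86 and 87 divide each coefficient by its standard error and compare the result with the critical t value. The sketch below applies that to every coefficient in the spreadsheet output; the dictionary layout is an assumption, and the critical value 2.131 is the one quoted in the answers rather than computed.

```python
# (coefficient, standard error) pairs from the spreadsheet output.
coefs = {
    "Age": (-0.031, 0.183),
    "Sen": (0.381, 0.158),
    "Educ": (1.452, 0.387),
    "# of Div": (-0.089, 0.541),
    "Level": (3.554, 0.833),
}
T_CRITICAL = 2.131  # two-tailed, 0.05 level, 15 df, as quoted in the answers

def t_test(b, se, t_critical=T_CRITICAL):
    """Return the t statistic for H0: beta = 0 and whether H0 is rejected."""
    t = b / se
    return t, abs(t) > t_critical

for name, (b, se) in coefs.items():
    t, reject = t_test(b, se)
    print(f"{name}: t = {t:.3f}, {'reject H0' if reject else 'fail to reject H0'}")
```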
Use the following to answer questions 88-93:
The production of automobile tires in any given year is related to the number of automobiles produced
this year and in prior years. Suppose our econometric model resulted in the following data.
Variable                                  Coef       t-ratio
X1 = Automobiles produced this year       5.00       10.4
X2 = Automobiles produced last year       0.25        0.6
X3 = Automobiles produced 2 years ago     0.67        1.4
X4 = Automobiles produced 3 years ago     2.12        2.7
X5 = Automobiles produced 4 years ago     3.44        6.5
Constant                                  –50,000
Multiple R                                0.83
88. Which variable in the model is the most significant predictor of tire production? __________
Answer: X1
89. What proportion of the variation in tire production is explained by the predictor variables in the
model? ________
Answer: 0.69
90. Which variable in the model is the least significant in predicting tire production? _________
Answer: X2
91. What is the equation for our model? ____________________________
Answer: number of tires produced = –50,000 + 5.00X1 + 0.25X2 + 0.67X3 + 2.12X4 + 3.44X5
92. How much does tire production increase for every thousand cars produced two years ago? _____
Answer: 670
93. How much does tire production change for every thousand cars produced three years ago? _____
Answer: 2,120
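Questions 89, 92, and 93 are direct arithmetic on the model output, sketched below. The variable names are illustrative; the numbers are the coefficients and multiple R from the problem statement.

```python
multiple_r = 0.83
r_squared = multiple_r ** 2  # coefficient of multiple determination

coef = {"X1": 5.00, "X2": 0.25, "X3": 0.67, "X4": 2.12, "X5": 3.44}

print(round(r_squared, 2))  # question 89
# Change in predicted tire production for 1,000 more cars, other variables held constant.
print(round(coef["X3"] * 1000))  # question 92
print(round(coef["X4"] * 1000))  # question 93
```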
Use the following to answer questions 94-100:
A real estate agent developed a model to relate a house's selling price (Y) to the area of floor space (X)
and the area of floor space squared (X²). The multiple regression equation for this model is:
Ŷ = 125 – 3X + X²
where: Ŷ = selling price (in thousands of dollars)
X = floor space (in hundreds of square feet)
94. What is the intercept (a)? _____________
Answer: $125 (in thousands)
95. What is the selling price of a house with 1000 square feet? ______
Answer: $195,000
96. What is the selling price of a house with 1500 square feet? ______
Answer: $305,000
97. What is the selling price of a house with 2000 square feet? ______
Answer: $465,000
98. What is the difference in selling prices of a house with 1600 square feet and one with 1700 square
feet? ______
Answer: $30,000 ($363,000 - $333,000)
99. What is the difference in selling prices of a house with 1700 square feet and one with 1800 square
feet? ______
Answer: $32,000 ($395,000 - $363,000)
100. What is the difference in selling prices of a house with 1650 square feet and one with 1750 square
feet? ______
Answer: $31,000 ($378,750 - $347,750)
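The selling prices in questions 95-100 follow from the quadratic model, as the sketch below shows. It assumes the equation Ŷ = 125 – 3X + X², with X in hundreds of square feet and Ŷ in thousands of dollars; those signs are the ones that reproduce the listed answers. The function name `price` is illustrative.

```python
def price(square_feet):
    """Predicted selling price in dollars, assuming Y-hat = 125 - 3X + X^2."""
    x = square_feet / 100                 # X is floor space in hundreds of square feet
    return (125 - 3 * x + x ** 2) * 1000  # Y-hat is in thousands of dollars

for sqft in (1000, 1500, 2000):
    print(sqft, price(sqft))

# Because of the squared term, the same 100 square feet adds more at larger sizes.
print(price(1700) - price(1600))
print(price(1800) - price(1700))
print(price(1750) - price(1650))
```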
Multiple Choice
Use the following to answer questions 101-106:
A manager at a local bank analyzed the relationship between monthly salary and three independent
variables: length of service (measured in months), gender ( 0 = female, 1 = male) and job type (0 =
clerical, 1 = technical). The following ANOVA summarizes the regression results:
ANOVA
              df    SS             MS             F
Regression     3    1004346.771    334782.257     5.96
Residual      26    1461134.596    56197.48445
Total         29    2465481.367

              Coefficients   Standard Error   t Stat   P-value
Intercept     784.92         322.25            2.44    0.02
Service         9.19           3.20            2.87    0.01
Gender        222.78          89.00            2.50    0.02
Job           -28.21          89.61           -0.31    0.76
101. Based on the ANOVA and a 0.05 significance level, the global null hypothesis test of the multiple
regression model
A) Will be rejected and conclude that monthly salary is related to all of the independent variables
B) Will be rejected and conclude that monthly salary is related to at least one of the independent
variables.
C) Will not be rejected.
D) Will show a high multiple coefficient of determination
Answer: B
102. Based on the ANOVA, the multiple coefficient of determination is
A) 5.957%
B) 59.3%
C) 40.7%
D) cannot be computed
Answer: C
103. Based on the hypothesis tests for the individual regression coefficients,
A) All the regression coefficients are not equal to zero.
B) "job" is the only significant variable in the model
C) Only months of service and gender are significantly related to monthly salary.
D) "service" is the only significant variable in the model
Answer: C
104. In the regression model, which of the following are dummy variables?
A) Intercept
B) Service
C) Service and gender
D) Gender and job
E) Service, gender, and job
Answer: D
105. The results for the variable gender show that
A) males average $222.78 more than females in monthly salary
B) females average $222.78 more than males in monthly salary
C) gender is not related to monthly salary
D) Gender and months of service are correlated.
Answer: A
106. Based on the hypothesis tests for individual regression coefficients,
A) All regression coefficients should remain in the regression equation
B) Based on the standard errors, the variable, service, should not be included in the regression equation.
C) Based on the p-values, the variable, job, should not be included in the regression equation.
D) The relationship between monthly salary and gender is linear.
Answer: C
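The F statistic and the coefficient of multiple determination used in questions 101 and 102 can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. This is a minimal sketch; the variable names are illustrative.

```python
ss_regression, ss_residual = 1004346.771, 1461134.596
df_regression, df_residual = 3, 26

ms_regression = ss_regression / df_regression  # regression mean square
ms_residual = ss_residual / df_residual        # residual mean square
f_stat = ms_regression / ms_residual           # global test statistic

# R^2 is the regression sum of squares divided by the total sum of squares.
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(f_stat, 2), round(r_squared, 3))
```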
Essay
107. What are the five assumptions of linear multiple regression?
Answer: 1) A linear relationship between the dependent variable and the independent variables, 2) the
variation of the residuals is the same for small and large values of Ŷ, 3) the residuals are normally
distributed, 4) the independent variables should not be correlated, 5) The residuals are independent.
108. How are scatter diagrams used to evaluate the assumptions of linear regression?
Answer: A scatter diagram can be used to evaluate the assumption of linearity. For each independent
variable, the dependent variable can be plotted against the independent variable. These plots provide
evidence of linear relationships.
109. How are residual plots drawn and used to evaluate the assumptions of linear regression?
Answer: A residual plot graphs the residuals against the values of one of the independent variables. A
residual plot is graphed for each independent variable. To support the assumptions of equal variation for
small and large values of the independent variable, the points should be evenly distributed above and
below zero and evenly distributed over all values of the independent variable.
110. What statistic is used to assess multicollinearity in multiple regression analysis?
Answer: Variance inflation factor (VIF)
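As a rough illustration of the variance inflation factor, the sketch below uses the standard definition VIF = 1 / (1 – Rj²), where Rj² comes from regressing predictor j on the other predictors. It covers only the two-predictor case, where Rj² is the squared correlation between the two predictors. The correlations shown are illustrative inputs, and the function name `vif` is an assumption.

```python
def vif(r_squared_j):
    """Variance inflation factor, given R^2 from regressing predictor j on the others."""
    return 1.0 / (1.0 - r_squared_j)

# With exactly two predictors, R_j^2 is the squared correlation between them.
for r in (0.11, 0.689, 0.95):  # illustrative correlations
    print(r, round(vif(r ** 2), 2))
```

A commonly cited rule of thumb treats a VIF above 10 as a sign of troublesome multicollinearity; a weak correlation such as 0.11 leaves the VIF close to 1.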
Fill-in-the-Blank
Use the following to answer questions 111-115:
It has been hypothesized that overall academic success for college freshmen as measured by grade point
average (GPA) is a function of IQ scores (X1), hours spent studying each week (X2), and one's high
school average (X3). Suppose the regression equation is:
Ŷ = –6.9 + 0.055X1 + 0.107X2 + 0.0083X3 + 0.00004X2X3
The multiple standard error is 6.313 and R² = 0.826.
111. What is the predicted GPA for a student with an IQ of 108, 32 hours spent studying per week and a
high school average of 82? _____
Answer: 3.249
112. What is the predicted GPA if the IQ is 108, the number of hours spent studying is 30, and the high
school average is 82? ______
Answer: 3.029
113. Assuming other independent variables are held constant, what effect on the GPA will there be if the
numbers of hours spent studying per week increases from 32 to 36? ________
Answer: The answer depends on the value of the high school average (X3), because hours studied interacts with it
114. How many independent variables are in the regression equation? ___
Answer: four
115. How will a student's GPA be affected if the student’s high school average was 80 and an additional
hour is spent studying each weeknight? ________
Answer: increases by 0.551
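The effect of the interaction term in questions 111-115 can be sketched as follows. The sketch assumes an interaction coefficient of 0.00004 and a negative intercept, because those values reproduce the listed answers to questions 112 and 115. The function names are illustrative.

```python
def predict_gpa(iq, hours, hs_avg, b_interaction=0.00004):
    # Assumed form: -6.9 + 0.055*X1 + 0.107*X2 + 0.0083*X3 + b_interaction*X2*X3
    return (-6.9 + 0.055 * iq + 0.107 * hours + 0.0083 * hs_avg
            + b_interaction * hours * hs_avg)

def effect_of_extra_hours(extra_hours, hs_avg, b_interaction=0.00004):
    # With an interaction term, the slope for hours depends on the high school average.
    return extra_hours * (0.107 + b_interaction * hs_avg)

print(round(predict_gpa(108, 30, 82), 3))      # question 112
print(round(effect_of_extra_hours(5, 80), 3))  # question 115
```

The second function makes the point of question 113 concrete: the change in GPA from extra study hours cannot be stated without also fixing a value for the high school average.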