QUESTIONS ON SIMPLE LINEAR REGRESSION

EXAMPLE EXAM QUESTIONS ON SIMPLE LINEAR REGRESSION
Questions 1-7 refer to the following situation: Stock Prices, Y, are assumed to be affected by the annual rate
of dividend of stock, X. A simple linear regression analysis was performed on 20 observations and the
results were:
Regression Equation Section
Independent    Regression     Standard       T-Value      Prob
Variable       Coefficient    Error          (Ho: B=0)    Level
INTERCEPT      -7.964633      3.11101359     -2.560       0.0166
X1             12.548580      1.27081204      9.874       0.0001
1. What statistical conclusion should you make about the effect of the dividend on average stock price?
A. Since 11.30869 > table value, reject the null hypothesis.
B. Since 12.54858 > table value, reject the null hypothesis.
C. Since 9.874 < table value, reject the null hypothesis.
D. Since 9.874 > table value, reject the null hypothesis.
E. Since 0.7895 < table value, fail to reject the null hypothesis.
2. The 95% confidence interval for a value of Y given an X value of 2.36 ($14.61 to $28.69; you are given
that the standard error of this estimate is 3.351) is interpreted as: I am 95% confident that
A. the stock price for a stock with a dividend rate of 2.36% falls between $14.61 and $28.69.
B. the mean stock price for all stocks with a dividend rate of 2.36% falls between $14.61 and $28.69.
C. the variance in stock price for all stocks falls between $14.61 and $28.69.
D. the dividend rate for all stocks falls between $14.61 and $28.69.
E. for each one point increase in dividend rate, the stock price will increase from $14.61 and $28.69
3. Which one of the following assumptions is incorrectly stated?
A. The stock price is normally distributed for any dividend rate.
B. The stock price has the same variability for any dividend rate.
C. The stock price for any dividend rate is a linear function of dividend rate.
D. The difference between the stock price and the expected stock price
given the dividend rate is independent from company to company.
4. The interpretation of 0.7895, the value of R-square (the coefficient of determination) is:
A. 78.95% of the sample stock prices (around the mean stock price) can be attributed to a linear relationship
with the dividend rate in the population.
B. the mean stock price will be estimated to increase $97.50 for each point increase in the rate.
C. the mean stock price will increase $78.95 for each point increase in the rate.
D. the stock price will increase $78.95 for each point increase in the rate.
E. 78.95% of the sample variability in stock price (around the mean stock price) can be attributed to a linear
relationship with the dividend rate.
5. What is the estimate of the change in expected stock prices when the dividend rate increases by one point?
A. 97.50
B. -7.964633
C. This is a parameter not a statistic.
D. 12.54858
E. 5.36546
6. The estimate of the slope will vary from sample to sample. The estimate of the standard deviation of beta-hat is:
A. 3.36284
B. 3.14983
C. 0.39274
D. 12.54858
E. 1.27081
7. A 95% confidence interval for the average stock price given the rate of return will use the following t
value:
A. 9.874
B. -2.560
C. 2.101
D. 2.045
E. 2.153
Answers to 1-7
1. D from computer printout use the t-test value across from X1
2. A this is a confidence interval for a conditional mean
3. C the mean stock price falls on the line
4. E r-square is % of sample variation of y explained by x
5. D This is beta-hat – see computer printout to the right of X1
6. E This is the standard error of beta-hat – see computer printout to the right of X1
7. C All t-values in simple linear regression have n-2 d. f.
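The arithmetic behind answers 1, 2 and 7 can be checked directly from the printout. The following is a minimal Python sketch, not part of the original exam; the variable names are our own, and the table value 2.101 for t with 18 degrees of freedom is taken from question 7 rather than computed.

```python
# Values copied from the stock-price printout (n = 20 observations).
n = 20
intercept, se_intercept = -7.964633, 3.11101359
slope, se_slope = 12.548580, 1.27081204
t_table = 2.101  # t(n - 2 = 18, 0.025), the table value from question 7

# The test statistic for H0: B = 0 is the estimate divided by its standard error.
t_intercept = intercept / se_intercept
t_slope = slope / se_slope

# Question 2: interval at X = 2.36, using the given standard error 3.351.
x0, se_fit = 2.36, 3.351
y_hat = intercept + slope * x0
lower = y_hat - t_table * se_fit
upper = y_hat + t_table * se_fit

print("t for intercept:", round(t_intercept, 3))
print("t for slope:    ", round(t_slope, 3))
print("interval:       ", round(lower, 2), "to", round(upper, 2))
```

The slope's t value reproduces the 9.874 used in answer 1, and the interval endpoints reproduce the $14.61 and $28.69 quoted in question 2.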
Questions 8-17 are concerned with the following situation: A fire insurance company wants to relate the
amount of fire damage (y) in major residential fires to the distance between the residence and the nearest
fire station (x). The study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in
this suburb is selected. The 15 values and the printout follow:
OBS     X       Y
1       3.4     26.2
2       1.8     17.8
3       4.6     31.3
4       2.3     23.1
5       3.1     27.5
6       5.5     36.0
7       0.7     14.1
8       3.0     22.3
9       2.6     19.6
10      4.3     31.3
11      2.1     24.0
12      1.1     17.3
13      6.1     43.2
14      4.8     36.4
15      3.8     26.1
16      3.5     .
Dependent Variable: Y   $1000 fire damage

Analysis of Variance
                          Sum of         Mean
Source             DF     Squares        Square       F Value    Prob
Model               1     841.76636      841.76636    156.886    0.0001
Error              13      69.75098        5.36546
Total(Adjusted)    14     911.51733

Root MSE     2.31635     R-square    0.9235
Dep Mean    26.41333     Adj R-sq    0.9176
C.V.         8.76961

Parameter Estimates
              Parameter     Standard       T for H0:
Variable      Estimate      Error          Parameter=0    Prob > |T|
INTERCEPT     10.277929     1.42027781      7.237         0.0001
X              4.919331     0.39274775     12.525         0.0001

Dep    Actual    Predicted    95% LCL    95% UCL    95% LCL       95% UCL
Obs    Y         Value        Mean       Mean       Individual    Individual
16     .         27.4956      26.1901    28.8011    22.3239       32.66
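As a cross-check, the printout above can be reproduced from the 15 complete observations with the usual least-squares formulas. The following is a minimal Python sketch using only the standard library; the variable names (sxx, sxy, syy and so on) are our own labels, not part of the printout.

```python
import math

# The 15 complete observations from the data listing (OBS 16 has no Y value).
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
y = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3, 24.0, 17.3,
     43.2, 36.4, 26.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares estimates.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Error sum of squares, mean square error (n - 2 d.f.) and derived quantities.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
root_mse = math.sqrt(mse)
r_square = 1 - sse / syy
se_slope = root_mse / math.sqrt(sxx)
t_slope = slope / se_slope

print(f"intercept = {intercept:.6f}, slope = {slope:.6f}")
print(f"Root MSE  = {root_mse:.5f}, R-square = {r_square:.4f}")
print(f"SE(slope) = {se_slope:.6f}, t = {t_slope:.3f}")
```

Up to rounding, these agree with the Parameter Estimates and Analysis of Variance sections of the printout.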
8. Which one of the following assumptions is incorrect?
(A) The difference between the fire damage and the expected fire damage given the distance is independent
from house to house.
(B) The fire damage is normally distributed for any distance.
(C) The mean fire damage has the same variability for any distance.
(D) The mean fire damage for any distance is a linear function of distance.
9. You will find the value 4.919331 in the printout under Parameter Estimates. This is interpreted as:
(A) The mean fire damage will increase $4,919.33 for each mile from the fire station.
(B) The mean fire damage will be estimated to increase $4,919.33 for each mile from the fire station.
(C) The fire damage will increase $4,919.33 for each mile from the fire station.
(D) The mean fire damage will be $4,919.33 given the distance.
(E) The estimated mean fire damage will be $4,919.33 given the distance.
10. The estimate of the standard deviation of fire damage for all homes the same distance from the fire
station is (in thousands of dollars)
(A) 0.39274775
(B) 2.31635
(C) no information available.
(D) 69.75098
(E) 5.36546
11. The interpretation of 0.9235, the value of R-square (the coefficient of determination) is:
(A) 92.35% of the variability in fire damage (around the mean fire damage) can be attributed to a linear
relationship with the distance to the fire station in the population.
(B) the mean fire damage will be estimated to increase $923.50 for each mile from the fire station.
(C) the mean fire damage will increase $923.50 for each mile from the fire station.
(D) the fire damage will increase $923.50 for each mile from the fire station.
(E) 92.35% of the sample variability in fire damage (around the mean fire damage) can be attributed to a
linear relationship with the distance to the fire station.
12. To test the null hypothesis that the parameter of the slope is zero, the test statistic value is:
(A) 0.9235
(B) 0.9176
(C) 0.39274775
(D) 12.525
(E) 7.237
13. For testing the slope is zero versus the alternative that the slope is not zero (use alpha of 0.05), the
rejection region is: Reject the null hypothesis if
(A) t > 2.160 or t < -2.160
(B) | t | < 12.525
(C) t > 1.771
(D) t > 12.525
(E) t > 2.160
14. The 95% confidence interval for the mean fire damage for all houses 3.5 miles from the fire station is: (in
thousands of dollars)
(A) 15.3442 to 25.8279
(B) 4.070 to 5.768
(C) 10.1999 to 21.1785
(D) 13.4329 to 17.9455
(E) 26.1901 to 28.8011
15. The 95% confidence interval for the mean (25.7076 to 28.2997) for the first house (OBS 1) in the sample
is interpreted as: I am 95% confident that
(A) the fire damage for a house 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(B) the fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(C) the variance in fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and
$28,299.70.
(D) the average fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and
$28,299.70.
(E) for each one mile from the fire station, the mean fire damage will increase from $25,707.60 and
$28,299.70
16. In this sample for each one standard deviation that a house is from the fire station, the mean fire damage
will be estimated to increase 0.96 standard deviations. This is
(A) the coefficient of correlation, r
(B) the sample standard deviation, s
(C) the test statistic value, t
(D) coefficient of determination, r-square
(E) the least squares coefficient, beta hat
17. The difference between the actual value of y and the predicted
value of y (y-yhat) is called
(A) a standard deviation
(B) a slope
(C) a residual
(D) a sample standard deviation
(E) an error
ANSWERS for 8-17
8. C fire damage has the same variance given distance for any distance
9. B this is the beta hat
10. B this is an estimate of sigma of y given x, the square root of MSE
11. E r-square is % of sample variation of y explained by x
12. D use t value from computer printout across from X.
13. A use a t with n-2 degrees of freedom
14. E see the 16th observation under the “mean” columns
15. D this is a confidence interval for a conditional mean
16. A this is the definition of Pearson’s r from class notes
17. C this is the definition of the residual
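The intervals behind answers 14 and 15, and the correlation in answer 16, can be approximated from the printout values. The following is a minimal Python sketch under stated assumptions: the table value 2.160 for t with 13 degrees of freedom is taken from question 13, the helper name `intervals` is our own, and small differences from the printout come from rounding the table value.

```python
import math

# Distances for the 15 complete observations (needed for x-bar and Sxx).
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Values copied from the printout.
intercept, slope = 10.277929, 4.919331
root_mse = 2.31635
r_square = 0.9235
t_table = 2.160  # t(13, 0.025), the table value from question 13


def intervals(x0):
    """Return the prediction, the 95% interval for the mean, and the 95%
    interval for an individual value at distance x0."""
    y_hat = intercept + slope * x0
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_table * root_mse * math.sqrt(h)
    half_ind = t_table * root_mse * math.sqrt(1 + h)
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_ind, y_hat + half_ind))


y_hat, mean_ci, pred_int = intervals(3.5)   # OBS 16, question 14
_, mean_ci_obs1, _ = intervals(3.4)         # OBS 1, question 15

# Pearson's r takes the sign of the slope and the magnitude sqrt(R-square).
r = math.copysign(math.sqrt(r_square), slope)

print("predicted at 3.5:", round(y_hat, 4))
print("95% mean interval:", [round(v, 4) for v in mean_ci])
print("95% individual interval:", [round(v, 4) for v in pred_int])
print("r:", round(r, 2))
```

The interval for an individual house is wider than the interval for the mean because it adds the variability of a single fire around the line to the uncertainty in the estimated line.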
QUESTIONS 18-27 DEAL WITH THE FOLLOWING SITUATION: The expected sales of a product in a
city are assumed to be affected by the per capita discretionary income and the population of the city. Per
capita discretionary income will be referred to as PCDI in all the questions. In Questions 18-27, examine only
the effect of per capita discretionary income on the mean sales. Thus the following model is hypothesized:
E(Y) = B0 + B1 X1 where
Y = Sales (in thousands of dollars)
X1 = Per Capita Discretionary Income (in dollars)
A sample of 15 cities, along with their sales, per capita discretionary income, and the population of the city
(in thousands) is given in the attached printout. The 15 values and a printout follow:
OBS    INCOME    SALES
1      2450      162
2      3254      120
3      3802      223
4      2838      131
5      2347       67
6      3782      169
7      3008       81
8      2450      192
9      2137      116
10     2560       55
11     4020      252
12     4427      232
13     2660      144
14     2088      103
15     2605      212
16     2500      .
17     3500      .

Root MSE     49.51434     R-square    0.4087
Dep Mean    150.60000     Adj R-sq    0.3632

Parameter Estimates
             Coefficient    Standard    T for H0:
Variable     Estimate       Error       B=0          Prob
INTERCEP     -10.207        55.147      -0.185       0.8560
INCOME         0.054         0.018       2.998       0.0103

Dep                          95% LCL    95% UCL    95% LCL       95% UCL
Obs    Actual    Predicted   Mean       Mean       Individual    Individual
16     .         125.5        92.5      158.5       13.5         237.5
17     .         179.8       145.1      214.5       67.3         292.3
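The printout above shows rounded coefficients, so it can be useful to refit the line from the 15 complete observations. The following is a minimal Python sketch using only the standard library; the function name `fit_line` and the dictionary keys are our own choices, not part of the printout.

```python
import math

# The 15 complete observations (OBS 16 and 17 have no SALES value).
income = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560,
          4020, 4427, 2660, 2088, 2605]
sales = [162, 120, 223, 131, 67, 169, 81, 192, 116, 55,
         252, 232, 144, 103, 212]


def fit_line(x, y):
    """Least-squares fit of y on x; returns summary statistics as a dict."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = syy - slope * sxy              # error sum of squares
    root_mse = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom
    se_slope = root_mse / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "root_mse": root_mse,
        "r_square": 1 - sse / syy,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
    }


fit = fit_line(income, sales)
pred_2500 = fit["intercept"] + fit["slope"] * 2500  # OBS 16
pred_3500 = fit["intercept"] + fit["slope"] * 3500  # OBS 17

for name, value in fit.items():
    print(f"{name:>10}: {value:.4f}")
print(f"predicted sales at 2500: {pred_2500:.1f}, at 3500: {pred_3500:.1f}")
```

Predicting from the rounded printout coefficients (-10.207 and 0.054) gives 124.8 at PCDI = 2500 rather than the printed 125.5; the difference is rounding in the displayed slope, which the unrounded fit avoids.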
18. The 95% confidence interval for the mean sales of all cities with PCDI = 2500 is
A. 92.5 to 158.5
B. can not be calculated because of missing values
C. 3500
D. 88.6 to 156.9
E. 13.5 to 237.5
19. When testing the null hypothesis that the slope equals zero versus the alternative hypothesis that the
slope does not equal zero, the rejection region would be: reject the Null if
A. t > t(14, 0.025) or t < -t(14, 0.025)
B. t > t(13, 0.05)
C. F < F(1, 13, 0.05)
D. |t| > t(13, 0.025)
E. p-value > alpha
20. What distribution would you use to infer about the variation of sales among all cities with the same
PCDI?
A. the Chi-square distribution
B. the t distribution
C. the F distribution
D. a t with no interaction and an F with interaction
21. Given the p-value of the F-test is 0.0103, we can interpret this as
A. Given the null is true, there is a 1.03% chance of finding this value of the test statistic or something more
extreme.
B. The percent of sample variability of Y explained by the independent variable is 1.03%
C. There is a 98.97% probability that the null hypothesis is right.
D. There is a 98.97% probability that the null hypothesis is wrong.
E. The probability of a type I error is 0.0103.
22. Does the PCDI help predict the sales of the product?
A. Yes, because 2.998 > the table value
B. No, because .8560 is greater than alpha
C. Yes, because 8.986 < the table value
D. Yes, because of MSE = 2451.66959
E. No, because 0.018 is less than the table value
23. What is the interpretation of the coefficient of determination?
A. Don't know and don't care (Hint, this is a wrong answer and best left unspoken within hearing of
instructor).
B. 40.87 probability that sales is linearly related to PCDI.
C. 40.87 percent of the sample variability of sales can be attributed to changes in PCDI.
D. 40.87 percent of the variability of PCDI can be attributed to a linear relationship between mean PCDI and
sales.
E. 40.87 percent of the sample variability of PCDI can be attributed to a linear relationship between mean
PCDI and sales.
24. What table value would you use in the calculation of a 90% confidence interval for a value of Y given a
value of X?
A. 1.645
B. 3.140
C. 1.771
D. 2.650
E. 2.998
25. How many estimated standard errors is the point estimate of the slope away from zero? Slope is the
change in the mean sales for each dollar increase in PCDI.
A. 0.054
B. 0.4087
C. -10.207
D. 2.998
E. 0.018
26. You know that most cities have small PCDI and only a few have large PCDI. Is this a violation of any
assumption?
A. Yes, because the variation of PCDI would then be unequal.
B. No, because sales has to be normally distributed but PCDI does not have to be.
C. Yes, this would violate the linear relationship between the mean sales and PCDI.
D. No, because the variance of sales has nothing to do with the problem.
E. Yes, a violation of normality.
27. What would be the change in the estimated mean sales for each one standard deviation increase in
PCDI?
A. 0.3632 standard deviations
B. can not be calculated.
C. 0.4087 squared dollars
D. 0.6393 (square root of 0.4087) standard deviations
E. 0.0540 dollars
Answers to 18-27
18. A see observation number 16
19. D use a t with n-2 degrees of freedom
20. A variance is related to chi-squared, see Table 3 in class notes
21. A see definition of p-value in text book
22. A use the t test here (equivalently, the F test with F = 8.986 > the table value)
23. C see the definition of r-squared
24. C use t with n-2 d.f.
25. D definition of t-test value
26. B assumptions apply to y|x or to e but not on x
27. D this is the definition of r in class notes
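Several of these answers are linked by simple identities that can be checked from the printout. The following is a minimal Python sketch, not part of the original exam. It assumes the table value 2.160 for t(13, 0.025) from question 13 and 1.771 for t(13, 0.05) from question 24, and it backs out the standard error of prediction from the printed 95% individual limits for OBS 16, which is our own shortcut rather than a method stated in the exam.

```python
import math

# Values copied from the sales printout.
r_square = 0.4087
slope, t_slope = 0.054, 2.998

# Answer 27: in simple linear regression the standardized slope equals r,
# which has the sign of the slope and magnitude sqrt(R-square).
r = math.copysign(math.sqrt(r_square), slope)

# Answer 22: with one predictor, the F statistic is the square of the slope's t.
f_value = t_slope ** 2

# Answer 24: a 90% interval for an individual Y at PCDI = 2500 (OBS 16).
predicted, lcl95, ucl95 = 125.5, 13.5, 237.5
t_95, t_90 = 2.160, 1.771  # table values, 13 degrees of freedom (assumed)
se_pred = (ucl95 - lcl95) / (2 * t_95)  # recovered from the 95% limits
lower90 = predicted - t_90 * se_pred
upper90 = predicted + t_90 * se_pred

print("r =", round(r, 4))
print("F =", round(f_value, 3))
print("90% individual interval:", round(lower90, 1), "to", round(upper90, 1))
```

The F value computed from the rounded t (2.998) differs slightly from the 8.986 quoted in question 22, which presumably comes from the unrounded statistic.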