C22.0103
FINAL EXAM
Name:________________________
Write your answers to the first five questions on the attached sheets, in the spaces
provided. Circle the choice which best answers questions 6-15. Do not write
anything else on this page (besides your name and the circles). When you are
finished, hand in the entire exam (both question sheets and answer sheets).
Please do not remove any pages from the exam paper. There are 15 questions,
each worth 5 points. Everyone receives 25 points for free. Good Luck!
1) WRITTEN
2) WRITTEN
3) WRITTEN
4) WRITTEN
5) WRITTEN
6) (A) (B) (C) (D) (E)
7) (A) (B) (C) (D) (E)
8) (A) (B) (C) (D) (E)
9) (A) (B) (C) (D) (E)
10) (A) (B) (C) (D) (E)
11) (A) (B) (C) (D) (E)
12) (A) (B) (C) (D) (E)
13) (A) (B) (C) (D) (E)
14) (A) (B) (C) (D) (E)
15) (A) (B) (C) (D) (E)
Answer for Question 1:
Answer for Question 2:
Answer for Question 3:
Answer for Question 4:
Answer for Question 5:
In Questions 1) - 5), we consider the response variable of Hotel and Restaurant
Employment for Costa Rica (in thousands of employees) for each year from 1995
to 2008, together with the following three explanatory variables: Tourists
Arriving (in thousands), GDP (in millions of US Dollars), and Year.
1) Here is the linear regression output for the simple regression of Hotel and
Restaurant Employment on Tourists Arriving.
Regression Analysis: Hotel and Restaurant Employment versus
Tourists Arriving
The regression equation is
Hotel and Restaurant Employment = 26.2 + 0.0412 Tourists
Arriving
Predictor            Coef       SE Coef     T       P
Constant             26.173     6.845       3.82    0.002
Tourists Arriving    0.041155   0.005095    8.08    0.000

S = 8.07098   R-Sq = 84.5%   R-Sq(adj) = 83.2%
Analysis of Variance
Source            DF    SS        MS        F        P
Regression         1    4249.5    4249.5    65.24    0.000
Residual Error    12     781.7      65.1
Total             13    5031.2
A) Based on this output, discuss the impact of an additional 2000 tourists
arriving in Costa Rica in a given year on Hotel and Restaurant
Employment. (2 Points)
B) Test the null hypothesis that the true coefficient of Tourists Arriving in
this model is .03. Use a two-tailed alternative hypothesis, and a
significance level of .05. (3 Points).
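The arithmetic behind both parts can be sketched as follows. This is a minimal sketch, not a model answer: the variable names are illustrative, and the critical value 2.179 is the tabled t value for 12 degrees of freedom at a two-tailed .05 level, typed in by hand as an assumption.

```python
# Values copied from the Minitab output in Question 1.
coef = 0.041155      # slope: thousands of employees per thousand tourists
se_coef = 0.005095   # standard error of the slope

# Part A: Tourists Arriving is measured in thousands, so 2000 tourists = 2 units.
extra_tourists_thousands = 2.0
change_thousands = coef * extra_tourists_thousands
change_employees = change_thousands * 1000

# Part B: t statistic for H0: slope = .03 against a two-tailed alternative.
hypothesized = 0.03
t_stat = (coef - hypothesized) / se_coef

# Assumption: tabled critical value t(.025, 12), entered by hand.
t_crit = 2.179
reject = abs(t_stat) > t_crit

print(f"Estimated change: {change_employees:.1f} employees")
print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject}")
```

The statistic lands very close to the critical value, so the conclusion is sensitive to rounding in the tabled value.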
2) Here is the fitted line plot for the simple regression in Question 1.
[Figure: Fitted Line Plot. Hotel and Restaurant Employment = 26.17 + 0.04115 Tourists Arriving. Vertical axis: Hotel and Restaurant Employment; horizontal axis: Tourists Arriving. S = 8.07098, R-Sq = 84.5%, R-Sq(adj) = 83.2%.]
A) The data point furthest to the right corresponds to the year 2008, and has a
leverage of 0.34 and a Cook's D of 0.86. Does this give us cause for
concern as to the validity of the regression model? (2 points).
B) Is there anything about the fitted line plot, or the plot of residuals from this
regression versus year (see below) that gives us cause for concern as to the
validity of the regression model? (3 points).
[Figure: Residuals Versus Year (response is Hotel and Restaurant Employment). Vertical axis: Residual; horizontal axis: Year.]
3) Here is the regression output using all three explanatory variables.
Regression Analysis: Hotel and Restaurant Employment versus
Tourists Arriving, GDP, Year
The regression equation is
Hotel and Restaurant Employment = - 10404 + 0.0162 Tourists
Arriving - 0.197 GDP + 5.24 Year
Predictor            Coef       SE Coef     T        P
Constant             -10404     2538        -4.10    0.002
Tourists Arriving    0.01624    0.01952      0.83    0.425
GDP                  -0.1967    0.1258      -1.56    0.149
Year                 5.244      1.275        4.11    0.002

S = 5.11627   R-Sq = 94.8%   R-Sq(adj) = 93.2%
Analysis of Variance
Source            DF    SS        MS        F        P
Regression         3    4769.5    1589.8    60.74    0.000
Residual Error    10     261.8      26.2
Total             13    5031.2
A) Based on this output, is there evidence of a positive relationship between
Tourists Arriving and Hotel and Restaurant Employment? (2 points).
B) Use the output above to compute the p-value in testing the null hypothesis
that the true coefficient of GDP is zero versus the alternative hypothesis
that the true coefficient is positive. (2 points).
C) Do the F-statistic and its associated p-value indicate that all variables
should be included in the regression? (1 point).
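For part B, the conversion from Minitab's two-tailed p-value to a one-tailed one can be sketched as below. The helper name `one_sided_p_positive` is hypothetical, and the sketch assumes the printed P column is two-tailed, which is Minitab's convention for coefficient tests.

```python
# Values copied from the GDP row of the Minitab output in Question 3.
t_gdp = -1.56
p_two_sided = 0.149


def one_sided_p_positive(t_stat, p_two):
    """Right-tailed p-value (alternative: coefficient > 0) from a two-tailed one."""
    half = p_two / 2
    # If t falls on the side the alternative predicts, the p-value is the half;
    # otherwise it is the complement of that half.
    return half if t_stat > 0 else 1 - half


p_right = one_sided_p_positive(t_gdp, p_two_sided)
print(f"Right-tailed p-value for GDP: {p_right:.4f}")
```

Because the estimated coefficient is negative while the alternative is positive, the one-tailed p-value ends up above one half.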
4) Next, we omit Year from the regression. For the regression based on Tourists
Arriving and GDP, the output is as follows.
Regression Analysis: Hotel and Restaurant Employment versus
Tourists Arriving, GDP
The regression equation is
Hotel and Restaurant Employment = 32.2 + 0.0667 Tourists Arriving
- 0.216 GDP
Predictor            Coef       SE Coef     T        P
Constant             32.232     8.744        3.69    0.004
Tourists Arriving    0.06667    0.02376      2.81    0.017
GDP                  -0.2161    0.1966      -1.10    0.295

S = 8.00209   R-Sq = 86.0%   R-Sq(adj) = 83.5%
Analysis of Variance
Source            DF    SS        MS        F        P
Regression         2    4326.8    2163.4    33.79    0.000
Residual Error    11     704.4      64.0
Total             13    5031.2
Is this model preferable to the full model in Question 3? Justify your answer.
(5 points).
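One way to compare the two models numerically is sketched below, using only sums of squares from the two ANOVA tables. It recomputes adjusted R-squared for each model and the partial F statistic for dropping Year. The function and variable names are illustrative, and this is one possible line of justification, not the only acceptable one.

```python
n = 14  # one observation per year, 1995 to 2008


def adjusted_r2(sse, sst, n_obs, n_predictors):
    """Adjusted R-squared from residual and total sums of squares."""
    return 1 - (sse / (n_obs - n_predictors - 1)) / (sst / (n_obs - 1))


sst = 5031.2                      # total SS (same for both models)
sse_full, k_full = 261.8, 3       # Tourists Arriving, GDP, Year
sse_reduced, k_reduced = 704.4, 2 # Tourists Arriving, GDP

adj_full = adjusted_r2(sse_full, sst, n, k_full)
adj_reduced = adjusted_r2(sse_reduced, sst, n, k_reduced)

# Partial F for the single dropped variable (Year); it should roughly equal
# the square of Year's t statistic in the full model, up to rounding.
partial_f = ((sse_reduced - sse_full) / (k_full - k_reduced)) / (
    sse_full / (n - k_full - 1)
)

print(f"Adjusted R-sq, full model:    {adj_full:.3f}")
print(f"Adjusted R-sq, reduced model: {adj_reduced:.3f}")
print(f"Partial F for Year: {partial_f:.2f} (t^2 = {4.11 ** 2:.2f})")
```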
5) In the regression output in Question 4 above, note that the estimated
coefficient for Tourists Arriving is closer to zero than the estimated
coefficient for GDP (since |.06667| < |−.2161|). How, then, do you explain
the fact that the t-statistic for Tourists Arriving is further from zero than the
t-statistic for GDP? (5 points).
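The relationship at issue is that each t-statistic is the coefficient divided by its own standard error. A minimal sketch that reproduces the printed T column from the Coef and SE Coef columns of Question 4 (the dictionary layout is illustrative):

```python
# (Coef, SE Coef) pairs copied from the Minitab output in Question 4.
rows = {
    "Tourists Arriving": (0.06667, 0.02376),
    "GDP": (-0.2161, 0.1966),
}

# t = coefficient / standard error, so a small coefficient can still give a
# large t when its standard error is smaller still.
t_stats = {name: coef / se for name, (coef, se) in rows.items()}

for name, t in t_stats.items():
    coef, se = rows[name]
    print(f"{name}: coef = {coef}, SE = {se}, t = {t:.2f}")
```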
Questions 6-15 are general and do not pertain to the regression example
above.
6) In a multiple regression context, suppose that we have three available
explanatory variables. Suppose that we run three regressions. The first
regression uses variables 1 and 2, and produces an R² of .65. The second
regression uses variables 2 and 3, and yields an R² of .70. The third regression
uses variables 1 and 3, and produces an R² of .75. Which model is preferable,
according to AICc?
A) The model with variables 1 and 2.
B) The model with variables 2 and 3.
C) The model with variables 1 and 3.
D) It cannot be determined from the available information.
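A small numerical experiment can illustrate how AICc behaves when every candidate model has the same number of explanatory variables. The sketch below uses one common form of AICc for least-squares regression, which may differ by constants from the version used in this course. The sample sizes and total sums of squares are hypothetical values chosen only to show that the ranking does not depend on them.

```python
import math


def aicc(sse, n_obs, n_params):
    """One common form of AICc for least-squares regression (an assumption)."""
    aic = n_obs * math.log(sse / n_obs) + 2 * n_params
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


r2_by_model = {"1 and 2": 0.65, "2 and 3": 0.70, "1 and 3": 0.75}


def best_model(n_obs, sst):
    """Return the model with the smallest AICc for a given n and total SS."""
    n_params = 4  # intercept, two slopes, error variance: same for every model
    scores = {
        name: aicc((1 - r2) * sst, n_obs, n_params)
        for name, r2 in r2_by_model.items()
    }
    return min(scores, key=scores.get)


# Hypothetical (n, SST) pairs: the penalty term is identical across models,
# so only the residual sum of squares separates them.
for n_obs, sst in [(20, 100.0), (50, 3.5), (200, 1e6)]:
    print(n_obs, sst, best_model(n_obs, sst))
```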
7) Consider a simple linear regression of y on x, where the y-values are not all the
same. Suppose that the residuals all take the same value. Then:
A) R² must be 1. B) R² may be less than 1. C) It cannot be determined from
the available information.
8) Suppose that a simple linear regression model holds for a data set with n=20.
What is the probability that the sample mean of the (unobservable) errors is
more than .2969 times the sample standard deviation of the errors?
A) .0918 B) .100 C) .1836 D) .200 E) None of the Above.
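The key step is rewriting the event in terms of a t statistic: the sample mean exceeds c times the sample standard deviation exactly when mean / (sd / sqrt(n)) exceeds c times sqrt(n). A minimal sketch, assuming the errors are independent normal with mean zero and using a hand-entered table value for the t distribution with 19 degrees of freedom:

```python
import math

n = 20
ratio = 0.2969  # threshold from the question, in units of the sample sd

# mean > ratio * sd   <=>   mean / (sd / sqrt(n)) > ratio * sqrt(n)
t_threshold = ratio * math.sqrt(n)
df = n - 1

# Assumption: tabled upper 10% point of t with 19 df, entered by hand.
t_table_upper_10 = 1.328

print(f"Equivalent t threshold: {t_threshold:.3f} on {df} df")
print(f"Tabled t(.10, {df}):     {t_table_upper_10}")
```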
9) Suppose we are going to use a t-test to test the null hypothesis H0: μ = 0
versus the alternative hypothesis HA: μ > 0. Assume that the null hypothesis
is true and the population is normally distributed. What is the probability that
the right-tailed p-value will be less than .01?
A) .005 B) .99 C) .995 D) .01 E) None of the Above
10) In a sample of size 10 from a normal population, the sample mean is 2 and the
sample standard deviation is 3. Construct a 95% confidence interval for the
population mean, μ. The interval is:
A) (−.146, 4.146) B) (.141, 3.859) C) (−3.88, 7.88) D) (−.114, 4.114)
E) None of the Above.
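The interval has the form sample mean plus or minus a t critical value times the standard error. A minimal sketch of that arithmetic, with the critical value for 9 degrees of freedom entered by hand from a t table (an assumption, since the standard library has no t distribution):

```python
import math

n = 10
sample_mean = 2.0
sample_sd = 3.0

# Assumption: tabled critical value t(.025, 9), entered by hand.
t_crit = 2.262

std_error = sample_sd / math.sqrt(n)
margin = t_crit * std_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```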
11) We will look here at the results of a very large trial of an HIV vaccine. The
trial was conducted on 16,400 people in Thailand, all of whom were HIV
negative at the start of the trial. Half of the people received a placebo, and
half received the vaccine. Both groups were followed for three years
afterwards. Of the 8,200 who received the vaccine, 51 developed HIV. Of the
8,200 who received the placebo, 74 developed HIV. Here are the results from
Minitab's 2-proportions.
Test and CI for Two Proportions
Sample    X     N       Sample p
1         51    8200    0.006220
2         74    8200    0.009024
Difference = p (1) - p (2)
Estimate for difference: -0.00280488
95% upper bound for difference: -0.000571046
Test for difference = 0 (vs < 0): Z = -2.07 P-Value = 0.019
If the vaccine were actually ineffective, what would be the probability of
observing at least as big a reduction as seen here in the HIV rate for the
vaccine compared to the placebo?
A) .0095 B) .019 C) .038 D) .981 E) None of the Above
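The printed Minitab figures can be reproduced with the standard library. The sketch below assumes an unpooled standard error; that choice is an assumption, made because it matches the printed upper bound. Variable names are illustrative.

```python
import math
from statistics import NormalDist

x1, n1 = 51, 8200   # vaccine group
x2, n2 = 74, 8200   # placebo group

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Assumption: unpooled (separate) variance estimate for the difference.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = diff / se

std_normal = NormalDist()
p_value = std_normal.cdf(z)                       # left-tailed: difference < 0
upper_bound = diff + std_normal.inv_cdf(0.95) * se  # one-sided 95% upper bound

print(f"Estimate for difference: {diff:.8f}")
print(f"Z = {z:.2f}, P-Value = {p_value:.3f}")
print(f"95% upper bound: {upper_bound:.9f}")
```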
12) In simple linear regression, if the right-tailed p-value for the coefficient of the
explanatory variable is .5, then the R² must be
A) .5 B) .25 C) 1 D) 0 E) None of the Above
13) Suppose that an automobile manufacturer has been notified by owners that a
certain model has a sticky accelerator pedal. To investigate these claims, the
company wants to perform their own laboratory tests, based on a random
sample of n automobiles of the given model. If 1% of the automobiles in the
population have the sticky accelerator pedal, what is the smallest value of n
that the company should use for their sample size to guarantee a probability of
at least 90% that at least one of the automobiles in the sample has a sticky
accelerator pedal?
A) 10 B) 120 C) 230 D) 550 E) None of the Above.
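The requirement can be written as 1 − (0.99)^n ≥ 0.90. A minimal sketch that searches for the smallest such n, assuming the sampled automobiles are independent and the population is large enough that the 1% rate stays effectively constant:

```python
defect_rate = 0.01   # share of automobiles with the sticky pedal
target = 0.90        # required probability of seeing at least one

# P(at least one defective in n) = 1 - P(none defective) = 1 - (1 - rate)^n
n = 1
while 1 - (1 - defect_rate) ** n < target:
    n += 1

print(f"Smallest n: {n}")
print(f"P(at least one) at n:     {1 - (1 - defect_rate) ** n:.4f}")
print(f"P(at least one) at n - 1: {1 - (1 - defect_rate) ** (n - 1):.4f}")
```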
14) Based on a sample of size 10 from a normal distribution, suppose you want to
test the null hypothesis that the population mean is zero against a right-tailed
alternative hypothesis. The sample mean is 1.0386 and the sample standard
deviation is 1.452. Then the p-value is:
A) .0238 B) .05 C) .0119 D) .025 E) None of the Above
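The test statistic here is the sample mean divided by its standard error, referred to a t distribution with 9 degrees of freedom. A minimal sketch of the computation, comparing the result with a hand-entered table value (an assumption, since the standard library has no t distribution):

```python
import math

n = 10
sample_mean = 1.0386
sample_sd = 1.452

# t statistic for H0: mean = 0
t_stat = sample_mean / (sample_sd / math.sqrt(n))
df = n - 1

# Assumption: tabled upper 2.5% point of t with 9 df, entered by hand.
t_table_upper_025 = 2.262

print(f"t = {t_stat:.3f} on {df} df")
print(f"Tabled t(.025, {df}) = {t_table_upper_025}")
```

When the statistic coincides with a tabled percentage point, the right-tailed p-value is that tail area.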
15) Consider a game where a fair coin is tossed four times, independently. If all
four tosses are heads, you win $10. Otherwise, you lose $1. If you are going to
play this game once, what is your expected profit?
A) $4 B) 31.25 Cents C) −31.25 Cents D) −$4 E) None of the Above.
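The expected profit is the probability-weighted average of the two outcomes. A minimal sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# Probability that four independent fair tosses are all heads.
p_win = Fraction(1, 2) ** 4

win_amount = 10   # dollars won if all four tosses are heads
loss_amount = -1  # dollars lost otherwise

expected_profit = p_win * win_amount + (1 - p_win) * loss_amount

print(f"P(win) = {p_win}")
print(f"Expected profit = {expected_profit} dollars = {float(expected_profit) * 100} cents")
```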