UNIVERSITY OF TORONTO
Faculty of Arts and Science
AUGUST 2007 EXAMINATIONS
ECO220Y1Y
PART 1 OF 2
Duration - 3 hours
Examination Aids: Calculator
SOLUTIONS
(1) (a)
H0: μA = μB
H1: μA > μB
where “A” means after ads and “B” means before ads
$$\sigma_{pool}^2 = \frac{n_A - 1}{n_A + n_B - 2}\,\sigma_A^2 + \frac{n_B - 1}{n_A + n_B - 2}\,\sigma_B^2 \approx 883^2$$

$$t_{102} = \frac{5746 - 5372}{883\sqrt{\frac{1}{45} + \frac{1}{59}}} \approx 2.14$$
Since DF > 100, we can use the Z distribution. The p-value is approximately 0.0162 (1.62%), implying that we
can reject the null at any significance level greater than 0.0162. Hence we can strongly reject the null
and conclude that sales have increased after the ads.
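The test statistic and p-value in (1)(a) can be checked with a short script. This is a minimal sketch using only the figures shown in the solution; the variable names are illustrative, and assigning 45 to the "after" sample and 59 to the "before" sample is an assumption that does not affect the result.

```python
from math import sqrt
from statistics import NormalDist

# Figures taken from the solution; which group has n = 45 is an assumption.
n_after, n_before = 45, 59
mean_after, mean_before = 5746, 5372
s_pooled = 883  # pooled standard deviation

# Standard error of the difference in means under the pooled-variance model
se_diff = s_pooled * sqrt(1 / n_after + 1 / n_before)

# Test statistic for H1: mu_A > mu_B
t_stat = (mean_after - mean_before) / se_diff

# Upper-tail p-value using the normal approximation (DF > 100)
p_value = 1 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```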
(b)
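At the 5% significance level we fail to reject H0 when:

$$\bar{x}_A - \bar{x}_B \le 1.645 \times 883\sqrt{\frac{1}{45} + \frac{1}{59}}$$

$$\bar{x}_A - \bar{x}_B \le 287.52$$

$$\beta = \text{prob}(\bar{x}_A - \bar{x}_B < 287.52 \mid \mu_A - \mu_B = 50)$$

$$Z = \frac{287.52 - 50}{883\sqrt{\frac{1}{45} + \frac{1}{59}}} = 1.359$$

$$\beta = \text{prob}(Z < 1.359) = 0.9131$$

$$\text{Power} = 1 - \beta = 0.0869$$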
The beta error is very high; the test has very low power for this alternative. The power is low
because the mean before and the mean after are not that different (5900 versus 5850) given
our relatively small sample sizes. There is a good chance that even though the null hypothesis is
false (5900 is NOT equal to 5850) we will fail to reject it. In other words, there is a high probability
that sampling error will result in us committing a Type 2 error.
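The Type 2 error probability and power in (1)(b) can be reproduced as follows. This is a sketch under the same normal approximation, with illustrative variable names; small differences in the last digit relative to the hand calculation come from rounding.

```python
from math import sqrt
from statistics import NormalDist

# Standard error of the difference in sample means (figures from the solution)
se_diff = 883 * sqrt(1 / 45 + 1 / 59)

# Largest observed difference for which H0 is NOT rejected at alpha = 0.05
critical_diff = 1.645 * se_diff

# Alternative considered in the question: true difference of 50
true_diff = 50

# Probability of landing in the non-rejection region when the alternative is true
z = (critical_diff - true_diff) / se_diff
beta = NormalDist().cdf(z)
power = 1 - beta

print(f"critical difference = {critical_diff:.2f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```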
(2) (a)
$$b_1 = \frac{s_{xy}}{s_x^2} = \frac{270{,}900 - \frac{15{,}000 \times 5{,}250}{250}}{1{,}076{,}400 - \frac{15{,}000^2}{250}} = \frac{-44{,}100}{176{,}400} = -0.25$$

$$b_0 = (5{,}250 / 250) - (-0.25) \times (15{,}000 / 250) = 36$$

$$\hat{y} = 36 - 0.25 \times x$$
Interpretation:
Slope: For every $1,000 increase in parental income, financial aid is on average $250 less. You
could also say that for every $1 increase in parental income the average financial aid is $0.25
lower.
Intercept: Given that a family income of zero is not possible the intercept does not have any
economic meaning. Of course this would also be outside of the range of the data. (If you
extrapolate backward using the regression line: for a family income of zero the financial aid would
be $36,000 on average BUT you should not do that extrapolation as it does not make sense and is
outside the range of your data.)
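The slope and intercept in (2)(a) follow directly from the summary sums. The sketch below assumes, as the arithmetic in the solution suggests, that n = 250, Σx = 15,000, Σy = 5,250, Σxy = 270,900 and Σx² = 1,076,400; the variable names are illustrative.

```python
# Summary statistics as they appear in the solution (interpretation as sums is an assumption)
n = 250
sum_x, sum_y = 15_000, 5_250
sum_xy, sum_x2 = 270_900, 1_076_400

# Corrected sums of cross-products and squares, i.e. (n-1)*s_xy and (n-1)*s_x^2
sxy_n1 = sum_xy - sum_x * sum_y / n
sxx_n1 = sum_x2 - sum_x ** 2 / n

# Least-squares estimates; the common factor (n-1) cancels in the slope
b1 = sxy_n1 / sxx_n1
b0 = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1}, b0 = {b0}")
```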
(b)
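$$SST = (n-1)s_y^2 = 405{,}370 - \frac{5250^2}{250} = 295{,}120$$

$$SSE = (n-1)\left(s_y^2 - \frac{(s_{xy})^2}{s_x^2}\right) = (n-1)s_y^2 - b_1 (n-1) s_{xy} = 295{,}120 - (-0.25)(-44{,}100) = 284{,}095$$

$$SSR = SST - SSE = 11{,}025$$

$$s_\varepsilon = \sqrt{\frac{284{,}095}{n-2}} = \sqrt{\frac{284{,}095}{248}} \cong 33.85$$

$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$$

$$13 \pm 1.96 \times 33.85 \sqrt{1 + \frac{1}{250} + \frac{(92-60)^2}{176{,}400}}$$

$$13 \pm 1.96 \times 33.85 \times 1.0049$$

$$13 \pm 66.67$$

$$\$13{,}000 \pm \$66{,}670$$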
However, this answer does not make sense. Surely no parents receive negative financial aid in the
data even if they are quite rich. Why is this interval so wide as to include totally unreasonable
values (large negative amounts of financial aid)? Part of it is that the standard error of estimate is
very big. If we compute the R² for the regression it is only 0.0374. However, an F test does lead to
the conclusion that the model is statistically significant (F = 9.62). So why are we getting this crazy
result? The only other explanations are that there is a problem in the data (such as an extreme
outlier) and/or that we are violating the underlying assumptions needed to use this formula (for
example, because of heteroscedasticity). Hence we should check the underlying data and our underlying
assumptions to find the problem. We should disregard the calculation above.
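The sums of squares, R², F statistic and prediction-interval half-width in (2)(b) can be recomputed from the same summary figures. This is a sketch with illustrative names; it assumes Σy² = 405,370, a sample mean of x equal to 60 (15,000 / 250), and the large-sample critical value 1.96 used in the solution.

```python
from math import sqrt

n = 250
sum_y, sum_y2 = 5_250, 405_370   # sum of y and sum of y^2 (interpretation assumed)
b0, b1 = 36.0, -0.25             # estimates from part (a)
sxy_n1 = -44_100                 # (n-1) * s_xy from part (a)
sxx_n1 = 176_400                 # (n-1) * s_x^2 from part (a)
x_bar = 60                       # 15,000 / 250
x_g = 92                         # given x value for the prediction

sst = sum_y2 - sum_y ** 2 / n    # total sum of squares
sse = sst - b1 * sxy_n1          # error sum of squares
ssr = sst - sse                  # regression sum of squares

s_eps = sqrt(sse / (n - 2))      # standard error of estimate
r2 = ssr / sst
f_stat = ssr / (sse / (n - 2))   # F statistic with 1 and n-2 degrees of freedom

# Prediction interval for an individual y at x_g
y_hat = b0 + b1 * x_g
half_width = 1.96 * s_eps * sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / sxx_n1)

print(f"SST={sst:.0f} SSE={sse:.0f} SSR={ssr:.0f}")
print(f"s_eps={s_eps:.2f} R2={r2:.4f} F={f_stat:.2f}")
print(f"prediction: {y_hat:.2f} +/- {half_width:.2f}")
```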
(c)
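$$s_\varepsilon = \sqrt{\frac{284{,}095}{n-2}} = \sqrt{\frac{284{,}095}{248}} \cong 33.85$$

$$b_1 \pm t_{\alpha/2} \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}} = -0.25 \pm 1.96 \times \frac{33.85}{\sqrt{176{,}400}} = -0.25 \pm 0.1580$$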
We are 95% confident that the slope is between -0.408 and -0.092. This interval is entirely in the
negative range, which is what one might expect: financial aid on average is observed to decrease
when parental income increases. However, given that we realized in Part (b) that there is a
problem with the data or with meeting the underlying assumptions, the answer to this question is
also invalidated. If it is heteroscedasticity (a good guess) then Part (a) would still be OK because
heteroscedasticity does not bias the point estimates of the intercept and the slope. However,
because Parts (b) and (c) require the standard error of estimate (which we calculated using a
formula that assumes homoscedasticity), these intervals will have the wrong width, although they
will be centered at the correct spots.
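The slope interval in (2)(c) can be verified numerically. A minimal sketch with illustrative names, using the figures from the solution and the critical value 1.96:

```python
from math import sqrt

s_eps = sqrt(284_095 / 248)   # standard error of estimate
sxx_n1 = 176_400              # (n-1) * s_x^2
b1 = -0.25                    # estimated slope

# Standard error of the slope and the 95% margin of error
se_b1 = s_eps / sqrt(sxx_n1)
margin = 1.96 * se_b1

lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```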
(3) (a)
Y-hat = 6.7131 + 1.6188* X1 – 0.4115* X2
Given X2 (the time trend), a 1 percentage point increase in the unemployment rate X1 (for example,
from 5% to 6%) is associated with an increase of 1.6188 in the auto theft rate. Given X1, every year
the theft rate declines by 0.4115 percentage points on average. This means that there is a declining
trend in car theft.
(b)
$$t_{18} = \frac{-0.4115}{SE_{timetrend}} = -2.101$$

$$SE_{timetrend} \cong 0.1959$$
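Backing the standard error out of the t statistic is a one-line rearrangement, SE = coefficient / t. A small sketch with illustrative names:

```python
coef_time_trend = -0.4115   # estimated coefficient on the time trend
t_stat = -2.101             # t statistic given in the solution (18 degrees of freedom)

# t = coefficient / SE, so SE = coefficient / t
se_time_trend = coef_time_trend / t_stat
print(f"SE = {se_time_trend:.4f}")
```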
(c)
$$R^2 = (R)^2 = (0.5802)^2 = 0.3372$$

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.3372)\left(\frac{21-1}{21-2-1}\right) = 0.2636$$
The R² is the fraction of the variation of the car theft rate around the mean explained by the
regression.
The Adjusted R² is the R² adjusted for the degrees of freedom. Since adding independent variables
- even if irrelevant - never lowers the R², the adjustment penalizes additional variables and
hence remedies this problem.
(d) We should compare the Adjusted R².
The new model:

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.37)\left(\frac{21-1}{21-3-1}\right) = 0.2588$$

The old model:

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.3372)\left(\frac{21-1}{21-2-1}\right) = 0.2636$$
Conclusion: The adjusted R² falls from 0.2636 to 0.2588, so the addition of X3 has not really
explained much more of the variation in the dependent variable. Of course, one cannot decide
whether or not X3 should be included in the model solely on the basis that it does not increase
the adjusted R².
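The adjusted R² comparison in (3)(c) and (3)(d) can be reproduced with a small helper. The function name `adjusted_r2` is illustrative; the inputs are the R² values, sample size and numbers of regressors given in the solution.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a regression with n observations and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

old_model = adjusted_r2(0.3372, n=21, k=2)   # X1 and X2 only
new_model = adjusted_r2(0.37, n=21, k=3)     # X3 added

print(f"old model: {old_model:.4f}, new model: {new_model:.4f}")
```

Although the raw R² rises from 0.3372 to 0.37 when X3 is added, the degrees-of-freedom penalty more than offsets the gain.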
(4) (a) The line for D = 0 has an intercept less than zero (β0) and a negative slope equal to β1. The
line for D = 1 lies entirely below the D = 0 line, with an intercept equal to (β0 + β2), which is a more
negative intercept, and a steeper negative slope equal to (β1 + β3).
(b) Simply perform a t test of whether the coefficient on D is negative or not:
H0: β2 = 0
H1: β2 < 0

$$t_{DF} = \frac{b_2}{SE(b_2)}$$
(c) Simply perform a t test of whether the coefficient on D*X is negative or not, because if so then
the slope of response for those who take more than 2 courses will be (β1 + β3) which is less than
β1:
H0: β3 = 0
H1: β3 < 0

$$t_{DF} = \frac{b_3}{SE(b_3)}$$
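The two lines described in (4)(a) can be illustrated with a short sketch. It assumes the model has the form Y = β0 + β1·X + β2·D + β3·(D·X), which is what the intercepts and slopes quoted above imply. The coefficient values below are hypothetical and chosen only to match the signs described in the solution; they are not estimates from the exam.

```python
def group_line(b0: float, b1: float, b2: float, b3: float, d: int) -> tuple[float, float]:
    """Return (intercept, slope) of the fitted line for dummy value d (0 or 1)."""
    return b0 + b2 * d, b1 + b3 * d

# Hypothetical coefficients: all negative, as described in the solution
b0, b1, b2, b3 = -1.0, -0.5, -2.0, -0.3

intercept_d0, slope_d0 = group_line(b0, b1, b2, b3, d=0)
intercept_d1, slope_d1 = group_line(b0, b1, b2, b3, d=1)

print(f"D = 0: intercept {intercept_d0:.1f}, slope {slope_d0:.1f}")
print(f"D = 1: intercept {intercept_d1:.1f}, slope {slope_d1:.1f}")
```

With β2 < 0 and β3 < 0 the D = 1 line starts lower and falls faster, which is what the one-sided tests in (b) and (c) check.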
PART 2 OF 2
(1) Which of the following distributions has only one parameter? (B)
(2) For continuous random variables a histogram describes which of the following? (C)
(3) For a sample of size 22, what is the chance that the sample mean is equal to 1? (A)
(4) For a sample size of 10, what is the chance that the sample mean is equal to the
population mean? (D)
(5) For a normal population, which of these probabilities would be the largest? (B)
(6) Considering the following histogram, what is the approximate sample standard
deviation? (C)
(7) Which statements about the sampling distribution of a sample mean X̄ and the
population distribution of X are true? (B)
(8) What is the probability that X is within three standard deviations of the mean? (D)
(9) For a sample size of 24, we can be 99% confident that the sample mean will be
within what interval? (C)
(10) Which would yield a sample mean that is most subject to sampling noise? (D)
(11) If 28 students are randomly sampled, what is the chance that fewer than 14 are
female? (D)
(12) If 10 students are randomly sampled, what is the chance that more than 10 percent
are international students? (B)
(13) If 400 students are randomly sampled, what is the chance that more than 10
percent are international students? (A)
(14) For a normal population with a known variance, suppose that an investigator
believes that approximately 95% of the values are between 38 and 70. The appropriate
sample size for estimating the true population mean μ within 2 units with 95%
confidence level is approximately: (A)
(15) Suppose a random sample of size 5 is taken from this population: X1, X2, X3, X4
and X5. What is the variance of (X1 + X2 + X3 + X4 + X5)? (B)
(16) To the nearest hundredth, what is the chance that a random sample of size 49
taken from this population has a mean greater than 4? (B)
(17) Suppose that a t test of Ho: μ = 250 versus Ha: μ ≠ 250 is based on 12 degrees of
freedom. If the calculated value of the test statistic is 2.8, then the P-value is: (C)
(18) If four confidence interval estimates for the population means are constructed with
95% confidence for four independent samples, the probability that all four intervals
contain the population means is: (A)
(19) A binomial experiment is based on 100 trials and an unknown success probability
p. The null hypothesis is H0: p = 0.5 and the alternative hypothesis is H1: p > 0.5. H1 is
inferred to be true if p̂ > 0.598. A minimum sample size that ensures that the power of
the test is at least 0.975 for an alternative of p = 0.65 is: (B)
(20) A dean compares the fraction of students passing a common final examination for
40 randomly selected students from Professor A’s class and 40 randomly selected
students from Professor B’s class. Twenty-four of Professor A’s group passed the exam
and 19 of the students from Professor B’s class passed. Find a 95% confidence
interval for the difference between the success rates of Professors A and B. (C)
(21) Consider the estimated regression: ŷ = 15 + 6x1 + 5x2 + 4x1x2. On average when
x2 = −1.5 a one unit increase in x1 changes the value of ŷ by: (A)
(22) If her husband is 72 inches tall, on average the height of a wife in inches is: (B)
(23) What is the F-statistic for the statistical significance of the regression? (D)
(24) What is the expected difference in price between a 2,000-square foot home with
central air conditioning and a 1,800-square foot home with no central air conditioning
(given that the values for the other X variables are the same for the two homes)? (D)
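Two of the multiple-choice items above contain enough information to work through directly. The sketch below computes the large-sample 95% confidence interval for the difference in pass rates in question (20) and the marginal effect of x1 in question (21). The lettered answer options are not reproduced in this key, so the computed values are not matched against them here; the variable and function names are illustrative.

```python
from math import sqrt

# Question (20): difference between two independent sample proportions
n_a = n_b = 40
p_a, p_b = 24 / n_a, 19 / n_b
diff = p_a - p_b

# Unpooled standard error, as is usual for a confidence interval
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se
print(f"(20) 95% CI: ({ci_low:.4f}, {ci_high:.4f})")

# Question (21): with an interaction term, the effect of x1 depends on x2
def marginal_effect_x1(x2: float) -> float:
    """Change in y-hat per unit increase in x1 for y-hat = 15 + 6*x1 + 5*x2 + 4*x1*x2."""
    return 6 + 4 * x2

print(f"(21) effect of x1 at x2 = -1.5: {marginal_effect_x1(-1.5)}")
```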