UNIVERSITY OF TORONTO
Faculty of Arts and Science
AUGUST 2007 EXAMINATIONS
ECO220Y1Y
PART 1 OF 2
Duration - 3 hours
Examination Aids: Calculator

SOLUTIONS

(1) (a)

$$H_0: \mu_A = \mu_B \qquad H_1: \mu_A > \mu_B$$

where "A" means after ads and "B" means before ads.

$$\sigma^2_{pool} = \frac{n_A - 1}{n_A + n_B - 2}\,\sigma_A^2 + \frac{n_B - 1}{n_A + n_B - 2}\,\sigma_B^2 \approx 883^2$$

$$t_{102} = \frac{5746 - 5372}{883\sqrt{\frac{1}{45} + \frac{1}{59}}} \approx 2.14$$

Since DF > 100, we can use the Z distribution. The p-value is approximately 0.0162 (1.62%), implying that we can reject the null at any significance level greater than 0.0162. Hence we reject the null and conclude that sales have increased after the ads.

(b) We fail to reject the null when

$$\bar{x}_A - \bar{x}_B \le 1.645 \times 883\sqrt{\frac{1}{45} + \frac{1}{59}}$$

$$\bar{x}_A - \bar{x}_B \le 287.52$$

$$\beta = P(\bar{x}_A - \bar{x}_B < 287.52 \mid \mu_A - \mu_B = 50)$$

$$Z = \frac{287.52 - 50}{883\sqrt{\frac{1}{45} + \frac{1}{59}}} = 1.359$$

$$\beta = P(Z < 1.359) = 0.9131$$

$$\text{Power} = 1 - \beta = 0.0869$$

The beta error is very high; the test has very low power for this alternative. The power is low because the mean before and the mean after are not that different (5900 versus 5850) given our relatively small sample sizes. There is a good chance that even though the null hypothesis is false (5900 is NOT equal to 5850) we will fail to reject it. In other words, there is a high probability that sampling error will result in us committing a Type 2 error.

(2) (a)

$$b_1 = \frac{s_{xy}}{s_x^2} = \frac{270{,}900 - \frac{15{,}000 \times 5{,}250}{250}}{1{,}076{,}400 - \frac{15{,}000^2}{250}} = \frac{-44{,}100}{176{,}400} = -0.25$$

(The common factor 1/(n − 1) in the numerator and denominator cancels.)

$$b_0 = (5{,}250 / 250) - (-0.25) \times (15{,}000 / 250) = 36$$

$$\hat{y} = 36 - 0.25x$$

Interpretation:

Slope: For every $1,000 increase in parental income, financial aid is on average $250 less. You could also say that for every $1 increase in parental income the average financial aid is $0.25 lower.

Intercept: Given that a family income of zero is not possible, the intercept does not have any economic meaning. Of course this would also be outside of the range of the data.
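The slope and intercept in (2)(a) can be re-derived from the summary totals with a few lines of Python. This is only a sketch: the names `sum_x`, `sum_y`, `sum_xy`, `sum_x2`, and `predict` are illustrative labels introduced here, and reading the figures 15,000, 5,250, 270,900, and 1,076,400 as sums over the n = 250 observations is an assumption based on how they enter the formula.

```python
# Summary totals as they appear in the solution to (2)(a).
# Treating them as sums over n = 250 observations is an assumption.
n = 250
sum_x = 15_000        # total of x (parental income, in $1,000s)
sum_y = 5_250         # total of y (financial aid, in $1,000s)
sum_xy = 270_900      # total of x * y
sum_x2 = 1_076_400    # total of x squared

# Corrected sums: (n - 1) * s_xy and (n - 1) * s_x^2.
# The (n - 1) factor cancels in the ratio, so it is never divided out.
sxy = sum_xy - sum_x * sum_y / n    # -44,100
sxx = sum_x2 - sum_x ** 2 / n       # 176,400

b1 = sxy / sxx                      # slope
b0 = sum_y / n - b1 * sum_x / n     # intercept: y-bar minus b1 times x-bar


def predict(x):
    """Fitted financial aid (in $1,000s) at parental income x (in $1,000s)."""
    return b0 + b1 * x


print(f"b1 = {b1}, b0 = {b0}")
```

Working from the corrected sums rather than the sample variances avoids an unnecessary division by n − 1 and mirrors the fraction used in the solution.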
(If you extrapolate backward using the regression line, financial aid for a family income of zero would be $36,000 on average, BUT you should not do that extrapolation: it does not make sense and is outside the range of your data.)

(b)

$$SST = (n-1)s_y^2 = 405{,}370 - \frac{5{,}250^2}{250} = 295{,}120$$

$$SSE = (n-1)\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right) = (n-1)\left(s_y^2 - b_1 s_{xy}\right) = 295{,}120 - (-0.25)(-44{,}100) = 284{,}095$$

$$SSR = SST - SSE = 11{,}025$$

$$s_\varepsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{284{,}095}{248}} \cong 33.85$$

$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$$

$$13 \pm 1.96 \times 33.85 \sqrt{1 + \frac{1}{250} + \frac{(92 - 60)^2}{176{,}400}}$$

$$13 \pm 1.96 \times 33.85 \times 1.0049$$

$$13 \pm 66.67$$

$13,000 ± $66,670

However, this answer does not make sense. Surely no parents receive negative financial aid in the data even if they are quite rich. Why is this interval so wide as to include totally unreasonable values (large negative amounts of financial aid)? Part of it is that the standard error of estimate is very big. If we compute the R² for the regression it is only 0.0374. However, an F test does lead to the conclusion that the model is statistically significant (F = 9.62). So why are we getting this crazy result? The only other explanations are that there is a problem in the data (such as an extreme outlier) and/or we are violating the underlying assumptions needed to use this formula (such as heteroscedasticity). Hence we should check the underlying data and check our underlying assumptions to find the problem. We should disregard the calculation above.

(c)

$$s_\varepsilon = \sqrt{\frac{284{,}095}{n-2}} = \sqrt{\frac{284{,}095}{248}} \cong 33.85$$

$$b_1 \pm t_{\alpha/2} \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}} = -0.25 \pm 1.96 \times \frac{33.85}{\sqrt{176{,}400}} = -0.25 \pm 0.1580$$

We are 95% confident that the slope is between -0.408 and -0.092. This interval is entirely in the negative range, which is what one might expect: financial aid on average is observed to decrease when parental income increases.
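The sums of squares, the standard error of estimate, and the two intervals in (2)(b) and (2)(c) follow mechanically from three corrected sums. The sketch below assumes those sums as quoted in the solution; the function names `prediction_interval` and `slope_interval` and the constant `Z_975` are hypothetical labels, and 1.96 stands in for the t critical value exactly as the solution does.

```python
import math

# Quantities quoted in the solution to Question (2).
n = 250
x_bar = 15_000 / n      # mean parental income (in $1,000s) = 60
sxx = 176_400           # (n - 1) * s_x^2
sxy = -44_100           # (n - 1) * s_xy
sst = 295_120           # (n - 1) * s_y^2

b1 = sxy / sxx                       # slope, -0.25
b0 = 5_250 / n - b1 * x_bar          # intercept, 36
sse = sst - b1 * sxy                 # 284,095
ssr = sst - sse                      # 11,025
s_eps = math.sqrt(sse / (n - 2))     # standard error of estimate
r2 = ssr / sst                       # coefficient of determination
f_stat = ssr / (sse / (n - 2))       # F statistic with 1 and n - 2 DF

Z_975 = 1.96  # normal approximation to t_{alpha/2}, as in the solution


def prediction_interval(x_g):
    """95% prediction interval for one new observation at x = x_g."""
    y_hat = b0 + b1 * x_g
    half = Z_975 * s_eps * math.sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / sxx)
    return y_hat - half, y_hat + half


def slope_interval():
    """95% confidence interval for the slope."""
    half = Z_975 * s_eps / math.sqrt(sxx)
    return b1 - half, b1 + half


print("s_eps:", round(s_eps, 2), "R^2:", round(r2, 4), "F:", round(f_stat, 2))
print("prediction interval at x = 92:", prediction_interval(92))
print("slope interval:", slope_interval())
```

Both intervals rely on the same `s_eps`, so any violation of the homoscedasticity assumption affects their widths together.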
However, given that we realized in Part (b) that there is a problem with the data or with meeting the underlying assumptions, the answer to this question is also invalidated. If it is heteroscedasticity (a good guess) then Part (a) would still be OK because heteroscedasticity does not bias the point estimates of the intercept and the slope. However, because Parts (b) and (c) require the standard error of estimate (which we calculated using a formula that assumes homoscedasticity), these intervals will have the wrong width, although they will be centered at the correct spots.

(3) (a)

$$\hat{Y} = 6.7131 + 1.6188\,X_1 - 0.4115\,X_2$$

Given X2, a 1 percentage point increase in the unemployment rate (for example, from 5% to 6%) is associated with an increase of 1.6188 in the auto theft rate. Given X1, every year the theft rate declines by 0.4115 percentage points on average. This means that there is a declining trend in car theft.

(b)

$$t_{18} = \frac{-0.4115}{SE_{timetrend}} = -2.101$$

$$SE_{timetrend} \cong 0.1959$$

(c)

$$R^2 = (R)^2 = (0.5802)^2 = 0.3372$$

$$\bar{R}^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.3372)\left(\frac{21-1}{21-2-1}\right) = 0.2636$$

The R² is the fraction of the variation of the car theft rate around the mean explained by the regression. The Adjusted R² is the R² adjusted for the degrees of freedom. Since adding independent variables, even irrelevant ones, increases the R², the adjustment corrects for the number of variables and hence remedies this problem.

(d) We should compare the Adjusted R².

The new model:

$$\bar{R}^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.37)\left(\frac{21-1}{21-3-1}\right) = 0.2588$$

The old model:

$$\bar{R}^2 = 1 - (1 - R^2)\left(\frac{n-1}{n-k-1}\right) = 1 - (1 - 0.3372)\left(\frac{21-1}{21-2-1}\right) = 0.2636$$

Conclusion: The addition of X3 has not really explained much more of the variation in the dependent variable: the Adjusted R² falls slightly, from 0.2636 to 0.2588. Of course, one cannot decide whether or not X3 should be included in the model based solely on the fact that it does not increase the Adjusted R².
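The arithmetic in (3)(b) through (3)(d) can be checked with a short script. This is an illustrative sketch only: the helper name `adjusted_r2` is made up here, and the inputs (R² of 0.3372 and 0.37, n = 21, k = 2 or 3, the coefficient -0.4115 and the t statistic -2.101) are taken directly from the solution.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n observations and k regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 21

# (b) Back out the standard error from the coefficient and its t statistic.
se_time_trend = -0.4115 / -2.101

# (c), (d) Adjusted R^2 for the two-variable and three-variable models.
old_model = adjusted_r2(0.3372, n, k=2)
new_model = adjusted_r2(0.37, n, k=3)

print("SE of time trend:", round(se_time_trend, 4))
print("old model adjusted R^2:", round(old_model, 4))
print("new model adjusted R^2:", round(new_model, 4))
print("adjusted R^2 fell after adding X3:", new_model < old_model)
```

Because the penalty term (n − 1)/(n − k − 1) grows with k, a small rise in R² (here from 0.3372 to 0.37) can still lower the adjusted value.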
(4) (a) The line for D = 0 has an intercept less than zero (β0) and a negative slope equal to β1. The line for D = 1 lies entirely below the D = 0 line, with an intercept equal to (β0 + β2), which is a more negative intercept, and a steeper negative slope equal to (β1 + β3).

(b) Simply perform a t test of whether the coefficient on D is negative or not:

$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0$$

$$t_{DF} = \frac{b_2}{SE(b_2)}$$

(c) Simply perform a t test of whether the coefficient on D*X is negative or not, because if so then the slope of response for those who take more than 2 courses will be (β1 + β3), which is less than β1:

$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 < 0$$

$$t_{DF} = \frac{b_3}{SE(b_3)}$$

UNIVERSITY OF TORONTO
Faculty of Arts and Science
AUGUST 2007 EXAMINATIONS
ECO220Y1Y
PART 2 OF 2
Duration - 3 hours
Examination Aids: Calculator

SOLUTIONS

(1) Which of the following distributions has only one parameter? (B)
(2) For continuous random variables a histogram describes which of the following? (C)
(3) For a sample of size 22, what is the chance that the sample mean is equal to 1? (A)
(4) For a sample size of 10, what is the chance that the sample mean is equal to the population mean? (D)
(5) For a normal population, which of these probabilities would be the largest? (B)
(6) Considering the following histogram, what is the approximate sample standard deviation? (C)
(7) Which statements about the sampling distribution of a sample mean X and the population distribution of X are true? (B)
(8) What is the probability that X is within three standard deviations of the mean? (D)
(9) For a sample size of 24, we can be 99% confident that the sample mean will be within what interval? (C)
(10) Which would yield a sample mean that is most subject to sampling noise? (D)
(11) If 28 students are randomly sampled, what is the chance that fewer than 14 are female? (D)
(12) If 10 students are randomly sampled, what is the chance that more than 10 percent are international students?
(B)
(13) If 400 students are randomly sampled, what is the chance that more than 10 percent are international students? (A)
(14) For a normal population with a known variance, suppose that an investigator believes that approximately 95% of the values are between 38 and 70. The appropriate sample size for estimating the true population mean μ within 2 units with a 95% confidence level is approximately: (A)
(15) Suppose a random sample of size 5 is taken from this population: X1, X2, X3, X4 and X5. What is the variance of (X1 + X2 + X3 + X4 + X5)? (B)
(16) To the nearest hundredth, what is the chance that a random sample of size 49 taken from this population has a mean greater than 4? (B)
(17) Suppose that a t test of H0: μ = 250 versus Ha: μ ≠ 250 is based on 12 degrees of freedom. If the calculated value of the test statistic is 2.8, then the P-value is: (C)
(18) If four confidence interval estimates for the population means are constructed with 95% confidence for four independent samples, the probability that all four intervals contain the population means is: (A)
(19) A binomial experiment is based on 100 trials and an unknown success probability p. The null hypothesis is H0: p = 0.5 and the alternative hypothesis is H1: p > 0.5. H1 is inferred to be true if $\hat{p} > 0.598$. A minimum sample size that ensures that the power of the test is at least 0.975 for an alternative of p = 0.65 is: (B)
(20) A dean compares the fraction of students passing a common final examination for 40 randomly selected students from Professor A's class and 40 randomly selected students from Professor B's class. Twenty-four of Professor A's group passed the exam and 19 of the students from Professor B's class passed. Find a 95% confidence interval for the difference between the success rates of Professors A and B. (C)
(21) Consider the estimated regression: $\hat{y} = 15 + 6x_1 + 5x_2 + 4x_1x_2$.
On average when $x_2 = -1.5$, a one-unit increase in $x_1$ changes the value of $\hat{y}$ by: (A)
(22) If her husband is 72 inches tall, on average the height of a wife in inches is: (B)
(23) What is the F-statistic for the statistical significance of the regression? (D)
(24) What is the expected difference in price between a 2,000-square foot home with central air conditioning and a 1,800-square foot home with no central air conditioning (given that the values for the other X variables are the same for the two homes)? (D)