Econometrics 532
Midterm exam
Name:

Answer all questions on the sheets provided. Make sure to show all your work.

Problem 1 (30 points, 5 points extra credit)

Suppose that we ask a random sample of 25 LMU students to state how much they like spicy foods on a scale from 1 to 10, where 1 is “I prefer my food to have no spice whatsoever” and 10 is “I prefer my food to permanently scar the roof of my mouth.” Define $x$ to be the answer that any student gives and $\bar{x}$ to be the mean value of $x$ for the 25 students in the sample. We want to use our sample of 25 LMU students to draw conclusions about how all LMU students feel about spicy food.

a) Remember that $\bar{x}$ could take a range of values depending on which 25 LMU students are chosen to be in the sample. Describe the distribution of $\bar{x}$. Assume that the central limit theorem applies even though the sample size is less than 30. (10 points)

$\bar{x}$ is normally distributed, with mean equal to the true mean $\mu$ and standard deviation $\sigma/\sqrt{n} = \sigma/5$, where $\sigma$ is the true standard deviation of $x$.

b) Now suppose that $\bar{x} = 7$ for the 25 students that you chose and that the true standard deviation of $x$ is 2 (so that $\sigma = 2$). What is the probability that the true mean for all LMU students is greater than 8? (10 points)

The standard deviation of $\bar{x}$ is $2/\sqrt{25} = 0.4$, so 8 is 2.5 standard deviations above the mean. Then

$$P(\mu > 8) = P(z > 2.5) = .0062$$

c) Now suppose that we don’t know what the true standard deviation of $x$ is. Instead we have an estimate $s$ of the true standard deviation that comes from our sample. If $s = 2$, calculate a 95% confidence interval for the true mean. (Hint: Be careful with the distribution that you use.) (10 points)

Now we need to use the t-distribution, since we don’t know the true variance of $x$. To get a 95% confidence interval, we need to find the t-values that leave an area of .025 in each tail. For a t-distribution with 24 degrees of freedom, the critical values are $\pm 2.064$. Since the estimated standard deviation of $\bar{x}$ is 0.4, 6.17 is 2.064 standard deviations below the mean and 7.83 is 2.064 standard deviations above the mean.
Therefore, the 95% confidence interval for $\mu$ is (6.17, 7.83).

Extra credit:

d) If $s = 2$ (so that $s^2 = 4$), do we accept or reject the hypothesis that the true variance of $x$ is less than 2.636 at the 1% significance level? You must show all work to receive credit for this problem. (5 points)

We want to find $P(\sigma^2 < 2.636)$. To find this probability, we need to use the $\chi^2$ distribution. We need to rearrange the terms on both sides of the inequality so that we have $(n-1)s^2/\sigma^2$ on the left side.

$$P(\sigma^2 < 2.636) = P\left(\frac{1}{\sigma^2} > \frac{1}{2.636}\right) = P\left(\frac{(n-1)s^2}{\sigma^2} > \frac{(n-1)s^2}{2.636}\right) = P(\chi^2_{24} > 36.42) = 0.05$$

This probability is small, but it is greater than 0.01, so we cannot reject the hypothesis that the true variance of $x$ is less than 2.636.

Problem 2 (40 points)

Take our basic linear regression model with two variables:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$$

Assume that the usual assumptions about $u_i$ hold true.

a) By drawing a picture, show the problem with leaving the constant term $\beta_0$ out of a regression. (10 points)

Your picture should show that we will get the wrong estimate for the slope of the line if we restrict the regression line to go through the origin.

b) Suppose we accidentally run the regression $y_i = \beta_0 + \beta_1 x_{1i} + u_i$ instead of the correct model that includes $x_{2i}$. Under what circumstances will we get a biased estimate of $\beta_1$, the true effect of $x_{1i}$ on $y_i$? Under what circumstances will we get an unbiased estimate of the true effect of $x_{1i}$ on $y_i$? (10 points)

If $x_{2i}$ is correlated with $x_{1i}$, we will get a biased coefficient. If $x_{2i}$ is not correlated with $x_{1i}$, the coefficient will not be biased.

c) The regression model assumes that $E(u_i^2 \mid x_i) = \sigma^2$. Explain how you could use the results of running the correct regression to get an estimate of $\sigma^2$. (10 points)

Add up the squared residuals and divide by $n-1$. This is basically just taking the mean of all the estimated values of $u_i^2$ to estimate $E(u_i^2 \mid x_i)$. (Strictly, the unbiased estimator divides by the degrees of freedom, $n-3$ here because three coefficients are estimated; the difference is negligible in large samples.)
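The estimate described in part c) can be sketched numerically. The following is a minimal illustration on simulated data, not part of the exam: the coefficient values, the sample size, the error standard deviation, and the helper names `solve_linear_system`, `ols`, and `estimate_sigma2` are all assumptions made for the example. It computes the sum of squared residuals divided by $n-1$ and, for comparison, by $n-k$, where $k$ is the number of estimated coefficients.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    rows: one list of regressors per observation, constant included.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_sigma2(rows, y):
    """Return (SSR / (n - k), SSR / (n - 1)) from the fitted residuals."""
    beta = ols(rows, y)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    ssr = sum(e * e for e in resid)
    n, k = len(y), len(beta)
    return ssr / (n - k), ssr / (n - 1)


# Simulated data (all values below are assumptions for illustration).
random.seed(0)
n = 500
true_sigma = 2.0  # so the true error variance is 4
rows, y = [], []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    rows.append([1.0, x1, x2])
    y.append(1.0 + 0.5 * x1 - 1.5 * x2 + random.gauss(0, true_sigma))

sigma2_df, sigma2_n1 = estimate_sigma2(rows, y)
print("SSR/(n-k):", sigma2_df)
print("SSR/(n-1):", sigma2_n1)
```

With a sample this large the two divisors give nearly the same number, and both should land close to the assumed true variance of 4.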
d) Suppose that the scatterplots below each represent one possibility for a plot of the residuals $\hat{u}_i$ from running the regression against the values of $x_{1i}$. Explain clearly for each picture which, if any, of the regression assumptions are violated by these residuals. (10 points)

[Figure: four scatterplots, labeled Case 1 through Case 4, each plotting the residuals (vertical axis) against x1 (horizontal axis).]

Case 1: The identification assumption is violated. The residuals are correlated with $x_{1i}$.
Case 2: Homoskedasticity is violated. The variance of the error term increases with $x_{1i}$.
Case 3: No problems. Residuals look OK.
Case 4: The expected value of the residuals is not zero. The average value is negative.

Problem 3 (30 points)

Suppose that you are trying to find out the effect of corruption on a country’s per capita income. You are going to run the regression:

$$y_i = \beta_0 + \beta_1 x_{1i} + u_i,$$

where
$y_i$ = per capita income for country $i$
$x_{1i}$ = percent of government spending that is stolen by public officials in country $i$

The table below describes the results of running this regression for 200 countries.

    . reg y x1

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  1,   198) = 1969.30
           Model |   429749777     1   429749777           Prob > F      =  0.0000
        Residual |  43208417.7   198  218224.332           R-squared     =  0.9086
    -------------+------------------------------           Adj R-squared =  0.9082
           Total |   472958194   199  2376674.34           Root MSE      =  467.14

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |  -101.2791   2.282253   -44.38   0.000    -105.7798    -96.7785
           _cons |   35031.91   66.16225   529.48   0.000     34901.43    35162.38
    ------------------------------------------------------------------------------

a) This regression says that, if corruption falls by 1%, a country’s per capita income goes up by about $100 on average. Clearly explain at least one reason why this estimate of the effect of corruption on income may be biased downwards. In other words, the true effect of lowering corruption by 1% may be less than an additional $100 of per capita income. (10 points)

There may be variables left out of the regression that are correlated with both corruption and per capita income. For example, a country’s literacy rate may be such a variable. If we don’t include the literacy rate in the regression, then the coefficient for corruption will pick up not only the effect of corruption on income, but also the effect of the literacy rate.

b) Suppose we define a new variable $x'_{1i} = x_{1i}/100$ and we run the regression of $y_i$ on $x'_{1i}$. So our model is $y_i = \beta_0 + \beta'_1 x'_{1i} + u_i$. Given the result in the table above, what will be the estimated coefficient $\hat{\beta}'_1$ that comes out of this regression? (10 points)

The new coefficient will satisfy $\hat{\beta}'_1 = 100\,\hat{\beta}_1 = -10{,}128$.

c) Now suppose we observe every country at two different points in time, so that:

Year t: $y_{it} = \beta_0 + \beta_1 x_{1it} + u_{it}$
Year s: $y_{is} = \beta_0 + \beta_1 x_{1is} + u_{is}$

Suppose there is some other variable, $x_{2i}$, that should be included in the regression and is causing the estimate of $\beta_1$ to be biased. To solve this problem, you consider the idea of taking the difference of the two equations, which gives:

$$y_{it} - y_{is} = \beta_1 (x_{1it} - x_{1is}) + (u_{it} - u_{is})$$

So $y_{it} - y_{is}$ is the change in income in country $i$ between years $s$ and $t$, and $x_{1it} - x_{1is}$ is the change in corruption in country $i$ between years $s$ and $t$. Under what circumstances will running the above regression of $y_{it} - y_{is}$ on $x_{1it} - x_{1is}$ give an unbiased estimate of $\beta_1$?
(10 points)
(Hint: Think about what must be true about $x_{2i}$ for this new regression to give an unbiased estimate. I would suggest starting by writing out an expression for $u_{it} - u_{is}$ that reflects the fact that it is picking up $x_{2i}$ in addition to the random error term.)

$$u_{it} - u_{is} = \beta_2 x_{2it} + \varepsilon_{it} - \beta_2 x_{2is} - \varepsilon_{is} = \beta_2 (x_{2it} - x_{2is}) + \varepsilon_{it} - \varepsilon_{is}$$

For this new regression to give us the right answer, we need $(x_{2it} - x_{2is})$, the change in $x_{2i}$ between year $s$ and year $t$, to be uncorrelated with $(x_{1it} - x_{1is})$. This condition could be satisfied if $x_{2i}$ is constant over time. Then $u_{it} - u_{is} = \varepsilon_{it} - \varepsilon_{is}$ is just a random error term and taking the difference over the two years has solved the problem; we’ll get an unbiased coefficient.
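The case where the omitted variable is constant over time can be illustrated with a small simulation. This is a sketch under assumed values, not part of the exam: the coefficients, the strength of the correlation between corruption and the omitted variable, the noise levels, and the function names `slope` and `simulate` are all hypothetical. It compares the slope from a single-year regression in levels with the slope from the differenced regression.

```python
import random


def slope(x, y):
    """OLS slope from a regression of y on x that includes a constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, beta1=-100.0, beta2=50.0, seed=0):
    """Return (levels_slope, differenced_slope) for simulated countries.

    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    x1_s, x1_t, y_s, y_t = [], [], [], []
    for _ in range(n):
        x2 = rng.gauss(0, 1)  # omitted variable, constant over time
        # Corruption in each year is correlated with the omitted variable.
        xs = -0.8 * x2 + rng.gauss(0, 1)
        xt = -0.8 * x2 + rng.gauss(0, 1)
        x1_s.append(xs)
        x1_t.append(xt)
        y_s.append(35000.0 + beta1 * xs + beta2 * x2 + rng.gauss(0, 10))
        y_t.append(35000.0 + beta1 * xt + beta2 * x2 + rng.gauss(0, 10))
    # Levels regression for year t, with x2 left out: biased.
    levels = slope(x1_t, y_t)
    # Differenced regression: x2 drops out because it does not change.
    # A constant is included here, which is harmless in this setup.
    dx = [a - b for a, b in zip(x1_t, x1_s)]
    dy = [a - b for a, b in zip(y_t, y_s)]
    return levels, slope(dx, dy)


levels_slope, diff_slope = simulate()
print("levels regression slope:     ", levels_slope)
print("differenced regression slope:", diff_slope)
```

Under these assumptions the levels slope is pulled away from the true value of -100 because it absorbs part of the effect of the omitted variable, while the differenced slope stays close to -100. If the omitted variable were allowed to change over time in a way that is correlated with the change in corruption, the differenced estimate would be biased as well.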