Econometrics 532

Midterm exam
Name:
Answer all questions on the sheets provided. Make sure to show all your work.
Problem 1 (30 points, 5 points extra credit)
Suppose that we ask a random sample of 25 LMU students to state how much they like spicy
foods on a scale from 1 to 10, where 1 is “I prefer my food to have no spice whatsoever” and 10
is “I prefer my food to permanently scar the roof of my mouth.” Define $x$ to be the answer that
any student gives and $\bar{x}$ to be the mean value of $x$ for the 25 students in the sample.
We want to use our sample of 25 LMU students to draw conclusions about how all LMU
students feel about spicy food.
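a) Remember that $\bar{x}$ could take a range of values depending on which 25 LMU students are
chosen to be in the sample. Describe the distribution of $\bar{x}$. Assume that the central limit
theorem applies even though the sample size is less than 30. (10 points)
By the central limit theorem, $\bar{x}$ is approximately normally distributed, with mean equal to
the true mean $\mu$ and standard deviation $\sigma/\sqrt{n} = \sigma/5$, where $\sigma$ is the true
standard deviation of $x$.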
b) Now suppose that $\bar{x} = 7$ for the 25 students that you chose and that the true standard
deviation of $x$ is 2 (so that $\sigma = 2$). What is the probability that the true mean $\mu$ for
all LMU students is greater than 8? (10 points)
The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n} = 2/5 = 0.4$, so 8 is 2.5 standard
deviations above the mean. Then,
$P(\mu > 8) = P(z > 2.5) = .0062$
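The tail probability in part b) can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the exam.

```python
from statistics import NormalDist

sigma = 2.0    # true standard deviation of x (given in part b)
n = 25         # sample size
x_bar = 7.0    # observed sample mean

se = sigma / n ** 0.5               # standard deviation of the sample mean
z = (8.0 - x_bar) / se              # how many standard errors 8 lies above 7
tail = 1.0 - NormalDist().cdf(z)    # upper-tail area of the standard normal beyond z

print(f"se = {se:.2f}, z = {z:.2f}, upper tail = {tail:.4f}")
```

The printed tail area should agree with the table value of .0062 used in the answer.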
c) Now suppose that we don’t know what the true standard deviation of x is. Instead we have an
estimate s of the true standard deviation that comes from our sample. If s = 2, calculate a 95%
confidence interval for the true mean. (Hint: Be careful with the distribution that you use). (10
points)
Now we need to use the t-distribution, since we don’t know the true variance of x. To get a 95%
confidence interval, we need to find the t-values that give us an area of .025 in both of the tails.
For a t-distribution with 24 degrees of freedom, the critical values are $\pm 2.064$.
Since the estimated standard deviation of $\bar{x}$ is $s/\sqrt{n} = 0.4$, 6.17 is 2.064 standard
deviations below the mean and 7.83 is 2.064 standard deviations above the mean. Therefore,
95% confidence interval for $\mu$ = (6.17, 7.83)
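The interval arithmetic in part c) can be reproduced in a few lines. This sketch takes the critical value 2.064 from the answer above as given rather than computing it, since the standard library has no t-distribution; the variable names are illustrative.

```python
n = 25
x_bar = 7.0
s = 2.0            # sample estimate of the standard deviation of x
t_crit = 2.064     # critical value for 24 degrees of freedom, taken from the text

se = s / n ** 0.5                  # estimated standard deviation of the sample mean
lower = x_bar - t_crit * se
upper = x_bar + t_crit * se

print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```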
Extra credit:
d) If $s = 2$ (so that $s^2 = 4$), do we accept or reject the hypothesis that the true variance of
$x$ is less than 2.636 at the 1% significance level? You must show all work to receive credit for
this problem. (5 points)
We want to find
$P(\sigma^2 < 2.636)$
To find this probability, we need to use the $\chi^2$ distribution. We need to rearrange the terms
on both sides of the inequality so we have $(n-1)\frac{s^2}{\sigma^2}$ on the left side.
$P(\sigma^2 < 2.636) = P\left(\frac{1}{\sigma^2} > \frac{1}{2.636}\right) = P\left((n-1)\frac{s^2}{\sigma^2} > (n-1)\frac{s^2}{2.636}\right) = P(\chi^2_{24} > 36.42) = 0.05$
This probability is small, but it is greater than 0.01, so we cannot reject the hypothesis that the
true variance of x is less than 2.636.
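The chi-square tail area can be checked without a table. The sketch below relies on a closed form that holds only for an even number of degrees of freedom, $P(\chi^2_{2k} > x) = e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$, which is a substitute for the table lookup the exam expects. The function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x); valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

n = 25
s2 = 4.0            # sample variance
sigma2_0 = 2.636    # hypothesized bound on the true variance

stat = (n - 1) * s2 / sigma2_0                  # (n - 1) s^2 / 2.636
p_value = chi2_upper_tail_even_df(stat, n - 1)  # area to the right of the statistic

print(f"statistic = {stat:.2f}, upper-tail probability = {p_value:.3f}")
print("reject at 1%" if p_value < 0.01 else "cannot reject at 1%")
```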
Problem 2 (40 points)
Take our basic linear regression model with two variables:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$
Assume that the usual assumptions about $u_i$ hold true.
a) By drawing a picture, show the problem with leaving the constant term $\beta_0$ out of a regression.
(10 points)
Your picture should show that we will get the wrong estimate for the slope of the line if we
restrict the regression line to go through the origin.
b) Suppose we accidentally run the regression $y_i = \beta_0 + \beta_1 x_{1i} + u_i$ instead of the
correct model that includes $x_{2i}$. Under what circumstances will we get a biased estimate of
$\beta_1$, the true effect of $x_{1i}$ on $y_i$? Under what circumstances will we get an unbiased
estimate of the true effect of $x_{1i}$ on $y_i$? (10 points)
If $x_{2i}$ is correlated with $x_{1i}$ (and $\beta_2 \neq 0$), the omitted $x_{2i}$ ends up in the
error term and the estimate of $\beta_1$ will be biased. If $x_{2i}$ is uncorrelated with $x_{1i}$,
the estimate will not be biased.
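A small simulation can illustrate the answer to part b). This is only a sketch: the true coefficients (1, 2, 3), the normal draws, and the parameter `rho` that ties $x_{2i}$ to $x_{1i}$ are all assumptions made for illustration, and the helper names are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x with a constant term."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(rho, n=20000, seed=0):
    """Slope on x1 when x2 is left out; rho sets how strongly x2 depends on x1."""
    rng = random.Random(seed)
    beta0, beta1, beta2 = 1.0, 2.0, 3.0            # illustrative true coefficients
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + rng.gauss(0, 1) for a in x1]   # x2 is related to x1 when rho != 0
    y = [beta0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return slope(x1, y)                            # regression that omits x2

unrelated = short_regression_slope(rho=0.0)   # should land near beta1 = 2
related = short_regression_slope(rho=0.5)     # should land near beta1 + beta2 * rho = 3.5
print(f"x2 unrelated to x1: {unrelated:.2f}; x2 related to x1: {related:.2f}")
```

Under these assumptions the short regression recovers roughly 2 when $x_{2i}$ is unrelated to $x_{1i}$, and picks up part of the effect of $x_{2i}$ when the two are related.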
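c) The regression model assumes that $E(u_i^2 | x_i) = \sigma^2$. Explain how you could use the
results of running the correct regression to get an estimate of $\sigma^2$. (10 points)
Add up the squared residuals $\hat{u}_i^2$ and divide by the degrees of freedom, $n - 3$ for this
model (two slopes and a constant); in a large sample, dividing by $n - 1$ gives almost the same
number. This is basically just taking the mean of all the estimated values of $u_i^2$ to estimate
$E(u_i^2 | x_i)$.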
d) Suppose that the scatterplots below each represent one possibility for a plot of the residuals
$\hat{u}_i$ from running the regression against the values of $x_{1i}$. Explain clearly for each
picture which, if any, of the regression assumptions are violated by these residuals. (10 points)
Case 1: The identification assumption is violated. The residuals are correlated with $x_{1i}$.
Case 2: Homoskedasticity is violated. The variance of the error term increases with $x_{1i}$.
Case 3: No problems. Residuals look OK.
Case 4: The expected value of the residuals is not zero. The average value is negative.
[Figure: four scatterplots of the residuals against x1, labeled Case 1 through Case 4. In every
panel x1 runs from about -2.88 to 3.64. The residual axis runs from about -9.26 to 10.32 in Case 1,
-16.99 to 18.39 in Case 2, -11.35 to 11.68 in Case 3, and -24.69 to 0.35 in Case 4.]
Problem 3 (30 points)
Suppose that you are trying to find out the effect of corruption on a country’s per capita income.
You are going to run the regression:
$y_i = \beta_0 + \beta_1 x_{1i} + u_i$, where
$y_i$ = per capita income for country $i$
$x_{1i}$ = percent of government spending that is stolen by public officials in country $i$
The table below describes the results of running this regression for 200 countries.
. reg y x1

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) = 1969.30
       Model |   429749777     1   429749777           Prob > F      =  0.0000
    Residual |  43208417.7   198  218224.332           R-squared     =  0.9086
-------------+------------------------------           Adj R-squared =  0.9082
       Total |   472958194   199  2376674.34           Root MSE      =  467.14

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -101.2791   2.282253   -44.38   0.000    -105.7798    -96.7785
       _cons |   35031.91   66.16225   529.48   0.000     34901.43    35162.38
------------------------------------------------------------------------------
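The summary statistics in the regression output follow from the sums of squares, the degrees of freedom, and the coefficient with its standard error. The sketch below recomputes them from the reported numbers; the variable names are illustrative, and the formulas are the standard OLS definitions rather than anything specific to this exam.

```python
import math

# Values copied from the regression output
ss_model = 429749777.0
ss_resid = 43208417.7
df_model = 1
df_resid = 198
coef_x1 = -101.2791
se_x1 = 2.282253

ss_total = ss_model + ss_resid
ms_resid = ss_resid / df_resid               # estimate of the error variance
r_squared = ss_model / ss_total
adj_r_squared = 1.0 - (1.0 - r_squared) * (df_model + df_resid) / df_resid
f_stat = (ss_model / df_model) / ms_resid
root_mse = math.sqrt(ms_resid)               # estimate of the error standard deviation
t_x1 = coef_x1 / se_x1                       # t statistic for the slope on x1

print(f"R-squared = {r_squared:.4f}, Adj R-squared = {adj_r_squared:.4f}")
print(f"F = {f_stat:.2f}, Root MSE = {root_mse:.2f}, t(x1) = {t_x1:.2f}")
```

Note that the residual mean square divides by 198, the number of observations minus the two estimated coefficients.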
a) This regression says that, if corruption falls by 1%, a country’s per capita income goes up by
about $100 on average. Clearly explain at least one reason why this estimate of the effect of
corruption on income may be biased downwards. In other words, the true effect of lowering
corruption by 1% may be less than an additional $100 of per capita income. (10 points)
There may be variables left out of the regression that are correlated with both corruption and
per-capita income. For example, a country’s literacy rate may be such a variable. If we don’t
include the literacy rate in the regression, then the coefficient for corruption will pick up not
only the effect of corruption on income, but also the effect of the literacy rate.
b) Suppose we define a new variable $x'_{1i} = \frac{x_{1i}}{100}$ and we run the regression of
$y_i$ on $x'_{1i}$. So our model is $y_i = \beta_0 + \beta'_1 x'_{1i} + u_i$. Given the result in
the table above, what will be the estimated coefficient $\hat{\beta}'_1$ that comes out of this
regression? (10 points)
The new coefficient will satisfy $\hat{\beta}'_1 = 100 \hat{\beta}_1 \approx -10,128$.
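The rescaling rule can be confirmed on any data set. In the sketch below the five (x, y) pairs are made up purely for illustration and are not the exam's data; only the coefficient -101.2791 comes from the regression table.

```python
def slope(x, y):
    """OLS slope from regressing y on x with a constant term."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: percent of spending stolen and per capita income
x = [5.0, 12.0, 20.0, 33.0, 41.0]
y = [34500.0, 33900.0, 33000.0, 31600.0, 30900.0]

b1 = slope(x, y)                                # slope on x measured in percent
b1_scaled = slope([v / 100.0 for v in x], y)    # slope on x divided by 100
ratio = b1_scaled / b1                          # should equal 100 up to rounding error

new_coef = 100.0 * -101.2791                    # applying the rule to the table's estimate
print(f"ratio of slopes = {ratio:.6f}, rescaled exam coefficient = {new_coef:.2f}")
```

Dividing a regressor by 100 multiplies its coefficient by 100 and leaves the fitted values unchanged.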
c) Now suppose we observe every country at two different points in time, so that:
Year t: $y_{it} = \beta_0 + \beta_1 x_{1it} + u_{it}$
Year s: $y_{is} = \beta_0 + \beta_1 x_{1is} + u_{is}$
Suppose there is some other variable, $x_{2i}$, that should be included in the regression and is
causing the estimate of $\beta_1$ to be biased. To solve this problem, you consider the idea of
taking the difference of the two equations, which gives:
$y_{it} - y_{is} = \beta_1 (x_{1it} - x_{1is}) + (u_{it} - u_{is})$
So $y_{it} - y_{is}$ is the change in income in country i between years s and t and
$x_{1it} - x_{1is}$ is the change in corruption in country i between years s and t. Under what
circumstances will running the above regression of $y_{it} - y_{is}$ on $x_{1it} - x_{1is}$ give an
unbiased estimate of $\beta_1$? (10 points)
(Hint: Think about what must be true about $x_{2i}$ for this new regression to give an unbiased
estimate. I would suggest starting by writing out an expression for $u_{it} - u_{is}$ that reflects
the fact that it is picking up $x_{2i}$ in addition to the random error term.)
$u_{it} - u_{is} = \beta_2 x_{2it} + \varepsilon_{it} - \beta_2 x_{2is} - \varepsilon_{is} = \beta_2 (x_{2it} - x_{2is}) + \varepsilon_{it} - \varepsilon_{is}$
For this new regression to give us the right answer, we need for $(x_{2it} - x_{2is})$, the change
in $x_{2i}$ between year s and year t, to be uncorrelated with $(x_{1it} - x_{1is})$. This condition
could be satisfied if $x_{2i}$ is constant over time. Then
$u_{it} - u_{is} = \varepsilon_{it} - \varepsilon_{is}$ is just a random error term and taking the
difference over the two years has solved the problem; we’ll get an unbiased coefficient.
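The differencing argument can be illustrated with a simulation in which the omitted variable is constant over time. Everything in this sketch is assumed for illustration: the coefficients (1, -2, 3), the 0.8 loading of $x_1$ on $x_2$, the normal errors, and the function names.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x with a constant term."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    beta0, beta1, beta2 = 1.0, -2.0, 3.0    # illustrative values, not from the exam
    x1_t, y_t, dx, dy = [], [], [], []
    for _ in range(n):
        x2 = rng.gauss(0, 1)                # omitted variable, constant over time
        x1s = 0.8 * x2 + rng.gauss(0, 1)    # x1 in year s, correlated with x2
        x1t = 0.8 * x2 + rng.gauss(0, 1)    # x1 in year t, correlated with x2
        ys = beta0 + beta1 * x1s + beta2 * x2 + rng.gauss(0, 1)
        yt = beta0 + beta1 * x1t + beta2 * x2 + rng.gauss(0, 1)
        x1_t.append(x1t)
        y_t.append(yt)
        dx.append(x1t - x1s)
        dy.append(yt - ys)
    levels = slope(x1_t, y_t)               # year-t regression that omits x2
    # The differenced equation has no constant, so regress through the origin.
    diffs = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    return levels, diffs

levels_slope, diff_slope = simulate()
print(f"levels: {levels_slope:.2f}, first differences: {diff_slope:.2f}")
```

Under these assumptions the levels regression is pulled away from -2 by the omitted variable, while the first-difference regression lands close to -2 because the constant $x_{2i}$ cancels.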