Econ 526/ Manopimoke Fall2014 Econometrics Practice Final

Topics to review for the exam: Chapters 6, 7, 8, 9, 11, and 12.
The exam will be a TAKE HOME EXAM, which will be posted on the course website at 5 p.m. on Thursday,
Dec. 18. Please type up your answers or scan your handwritten answers and submit them to me at
pymm@ku.edu by 10 p.m. on Dec. 19. Only PDF files will be accepted. NO LATE SUBMISSIONS OR
ILLEGIBLE EXAMS WILL BE CONSIDERED.
Some practice questions (not comprehensive) are below; more will be covered during the review
session on 12/11. Working through the end-of-chapter questions in the book will also be useful
(solutions are already on the website).
Question 1
Suppose that there are threshold effects in the relationship between class size and student test
scores. In particular, when the class size is less than 20, students do well, and when the class size is
greater than 25, students do poorly. How would you test this hypothesis in a linear regression? Suppose
you have data from California that support your hypothesis. Discuss whether these results would hold
for Kansas in terms of external validity.
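Answer: Define three dummy variables: Small = 1 if class size < 20; Middle = 1 if class size is 20 to 25;
Large = 1 if class size > 25. Omit one category (here Middle) as the base group to avoid the dummy
variable trap, and run the regression
$$TestScore_i = \beta_0 + \beta_1 Small_i + \beta_2 Large_i + u_i$$
Then test H0: β1 = 0 against H1: β1 > 0, and H0: β2 = 0 against H1: β2 < 0, using one-sided t-tests. The
joint null β1 = β2 = 0 can be tested with an F-statistic.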
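A minimal sketch of the threshold test in Python with NumPy, on simulated data. The class sizes, the 5-point effects, and the noise level are all hypothetical, not California data. It builds the Small and Large dummies (Middle, 20 to 25, is the omitted base), estimates OLS with heteroskedasticity-robust (HC1) standard errors, and runs the two one-sided 5% t-tests.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS estimates with heteroskedasticity-robust (HC1) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X          # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)    # HC1 degrees-of-freedom scaling
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n = 2000
class_size = rng.integers(15, 31, size=n)           # hypothetical class sizes, 15 to 30
small = (class_size < 20).astype(float)
large = (class_size > 25).astype(float)
# Middle (20 to 25) is the omitted base category, which avoids the dummy variable trap.
score = 650 + 5.0 * small - 5.0 * large + rng.normal(0, 10, size=n)  # hypothetical effects

X = np.column_stack([np.ones(n), small, large])
beta, se = ols_hc1(X, score)
t_small = beta[1] / se[1]
t_large = beta[2] / se[2]

# One-sided 5% tests: H1: beta1 > 0 and H1: beta2 < 0 (critical value 1.645).
reject_small = t_small > 1.645
reject_large = t_large < -1.645
print(f"beta_small={beta[1]:.2f} (t={t_small:.1f}), beta_large={beta[2]:.2f} (t={t_large:.1f})")
```

Because Middle is the base group, β1 and β2 measure the gaps relative to classes of 20 to 25 students, which is exactly what the threshold hypothesis is about.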
In order for these results from California to be externally valid for Kansas, the makeup of the student
population and the characteristics of the schools would need to be similar in both states. California is a
very urban state, has a larger minority population than Kansas, and spends more on education than
Kansas. It is therefore unlikely that the results for California will generalize (be externally valid) to
Kansas.
Question 2
Refer to Table 8.3 on the following page to answer the following questions.
a. Use specification (5) to determine the change in test scores when the student-teacher ratio rises from
20 to 30, for a high English Language Learner district (HiEL = 1) in which 50 percent of students are
eligible for free or reduced-price lunch and log income is equal to 2.2516.
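Answer: Since this is a nonlinear model, you need to calculate the predicted test score at each value of
the student-teacher ratio and take the difference:
$$\Delta \widehat{TestScore} = [252 + 64.33(30) - 3.42(30^2) + 0.059(30^3) - 5.47(1) - 0.42(50) + 11.75(2.2516)] - [252 + 64.33(20) - 3.42(20^2) + 0.059(20^3) - 5.47(1) - 0.42(50) + 11.75(2.2516)]$$
All terms that do not involve the student-teacher ratio cancel, so
$$\Delta \widehat{TestScore} = 64.33(30 - 20) - 3.42(30^2 - 20^2) + 0.059(30^3 - 20^3) = 643.3 - 1710 + 1121 = 54.3$$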
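The arithmetic can be checked with a few lines of Python. The coefficients are the rounded values quoted in the answer, and `predicted_score` is a hypothetical helper. Because the cubic term multiplies STR³, the rounding of 0.059 matters: an error of 0.0005 in that coefficient moves the answer by 0.0005 × 19,000 = 9.5 points, so the result should be treated as approximate.

```python
# Rounded coefficients from specification (5), as quoted in the answer.
COEF = {"const": 252.0, "str": 64.33, "str2": -3.42, "str3": 0.059,
        "hiel": -5.47, "lunch": -0.42, "log_income": 11.75}


def predicted_score(str_ratio, hiel=1, lunch_pct=50.0, log_income=2.2516, c=COEF):
    """Predicted test score from the cubic specification (hypothetical helper)."""
    return (c["const"] + c["str"] * str_ratio + c["str2"] * str_ratio ** 2
            + c["str3"] * str_ratio ** 3 + c["hiel"] * hiel
            + c["lunch"] * lunch_pct + c["log_income"] * log_income)


change = predicted_score(30) - predicted_score(20)
print(round(change, 1))  # 54.3
```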
b. Using the output, can you reject the null hypothesis that the student-teacher ratio has a linear effect
on test scores?
Answer: This refers to the second F-test, which tests the null hypothesis that the coefficients on STR²
and STR³ are both zero. We can reject the null that the student-teacher ratio has a linear effect on test
scores, since the polynomial terms are jointly significantly different from zero.
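The logic of this joint test can be sketched with the homoskedasticity-only F-statistic, computed from the restricted (linear) and unrestricted (cubic) sums of squared residuals. This is a simplification: the F-statistics reported with the table may well be heteroskedasticity-robust, and the data below are simulated from a hypothetical nonlinear relationship, so the sketch illustrates the mechanics rather than reproducing the table.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


rng = np.random.default_rng(1)
n = 420
str_ratio = rng.uniform(14, 26, size=n)                               # hypothetical
score = 700 - 0.5 * (str_ratio - 20) ** 2 + rng.normal(0, 5, size=n)  # hypothetical nonlinear truth

ones = np.ones(n)
X_unrestricted = np.column_stack([ones, str_ratio, str_ratio ** 2, str_ratio ** 3])
X_restricted = np.column_stack([ones, str_ratio])   # imposes H0: coefficients on STR^2, STR^3 are 0

q = 2                                  # number of restrictions
k = X_unrestricted.shape[1] - 1        # regressors in the unrestricted model, excluding intercept
ssr_u = ssr(X_unrestricted, score)
ssr_r = ssr(X_restricted, score)
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))

f_crit = -np.log(0.05)   # large-sample 5% critical value of F(2, inf), about 3.00
reject_linear = F > f_crit
print(f"F = {F:.1f}, reject linearity: {reject_linear}")
```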
c. What is the interpretation of the coefficient on log income in Model 5? Is this a large or small effect?
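Answer: Because income enters in logarithms, a 1% increase in district income is associated with an
increase in test scores of about 0.01 × 11.75 = 0.1175 points, holding the other regressors constant. This
is a small effect: even a 10% increase in income raises predicted test scores by only about 1 point.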
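In a linear-log specification, 0.01 × β is an approximation; the exact change for a 1% increase in income is β·ln(1.01). A short check in Python, using the 11.75 coefficient from specification (5):

```python
import math

b_log_income = 11.75                          # coefficient on ln(income), specification (5)
approx_1pct = 0.01 * b_log_income             # rule of thumb: 1% change -> 0.01 * beta
exact_1pct = b_log_income * math.log(1.01)    # exact change in predicted score for +1% income
exact_10pct = b_log_income * math.log(1.10)   # exact change for +10% income
print(round(approx_1pct, 4), round(exact_1pct, 4), round(exact_10pct, 2))  # 0.1175 0.1169 1.12
```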
d. What are the threats to internal validity in Model 1?
Answer: Omitted variable bias: some variables that explain test scores and are likely correlated with the
student-teacher ratio are missing (e.g., income).
Functional form misspecification: nonlinearities in the student-teacher ratio are not included.
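A small simulation illustrates the omitted variable bias threat. The data-generating process is hypothetical: income raises test scores and is negatively correlated with the student-teacher ratio. Omitting income shifts the STR coefficient away from its true value by roughly β_income · Cov(STR, income)/Var(STR), the omitted variable bias formula.

```python
import numpy as np


def ols(X, y):
    """OLS coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(2)
n = 100_000
income = rng.normal(size=n)                                    # hypothetical standardized log income
str_ratio = -0.5 * income + rng.normal(size=n)                 # assumed: richer districts have smaller classes
score = -1.0 * str_ratio + 3.0 * income + rng.normal(size=n)   # true STR effect is -1

ones = np.ones(n)
b_long = ols(np.column_stack([ones, str_ratio, income]), score)[1]   # controls for income
b_short = ols(np.column_stack([ones, str_ratio]), score)[1]          # omits income

# OVB formula: b_short is approximately beta_str + beta_income * Cov(STR, income) / Var(STR)
ovb = 3.0 * np.cov(str_ratio, income)[0, 1] / np.var(str_ratio, ddof=1)
print(f"long: {b_long:.2f}, short: {b_short:.2f}, formula: {-1.0 + ovb:.2f}")
```

Here the short regression overstates the harm from larger classes, because part of the income effect is loaded onto the student-teacher ratio.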
e. Explain the rationale behind Model 6: why are the interaction terms included in this model? F-test (c)
tests the null hypothesis that the coefficients on the interaction terms are all zero. Explain what this
means in terms of the model.
Answer: Model 6 allows the coefficients on the student-teacher ratio terms to differ depending on
whether the school district has a high percentage of English Language Learners. The effect of the
student-teacher ratio could differ in districts with a higher percentage of ELL students, for example
because those districts have smaller classes to accommodate ELL students. The null hypothesis of
F-test (c) is that there is no difference in the effect of the student-teacher ratio between districts with a
high and a low percentage of ELL students. This hypothesis is rejected at the 5% level but not at the
1% level.
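A simplified sketch of how an interaction lets the slope differ by group. Model 6 interacts HiEL with the full cubic in STR; to keep the example short, the hypothetical simulated model below interacts HiEL with STR only, so the slope is β1 in low-ELL districts and β1 + β3 in high-ELL districts. Testing β3 = 0 is the one-restriction analogue of F-test (c).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
str_ratio = rng.uniform(14, 26, size=n)                  # hypothetical
hiel = (rng.uniform(size=n) < 0.5).astype(float)         # 1 = high-ELL district
# Hypothetical truth: slope -1 when HiEL = 0 and -1 + (-1) = -2 when HiEL = 1.
score = 700 - 1.0 * str_ratio - 5.0 * hiel - 1.0 * hiel * str_ratio + rng.normal(0, 5, size=n)

X = np.column_stack([np.ones(n), str_ratio, hiel, hiel * str_ratio])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

slope_low_el = beta[1]              # effect of STR when HiEL = 0
slope_high_el = beta[1] + beta[3]   # effect of STR when HiEL = 1
print(f"slope (HiEL=0): {slope_low_el:.2f}, slope (HiEL=1): {slope_high_el:.2f}")
```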
Question 3
A study investigated the impact of house price appreciation on household mobility. The underlying idea
was that if a house were viewed as one part of the household's portfolio, then changes in the value of
the house, relative to other portfolio items, should result in investment decisions altering the current
portfolio. Using 5,162 observations, the logit equation was estimated as shown in the table, where the
dependent variable is one if the household moved in 1978 and is zero if the household did not move:
Regression model: Logit (standard errors in parentheses)
constant           -3.323  (0.180)
Male               -0.567  (0.421)
Black              -0.954  (0.515)
Married78           0.054  (0.412)
Marriage change     0.764  (0.416)
A7983              -0.257  (0.921)
PNRN               -4.545  (3.354)
Pseudo-R²           0.016
where male, black, married78, and marriage change are binary variables. They indicate, respectively,
whether the household was male-headed, whether it was a black household, whether it was married, and
whether a change in marital status occurred between 1977 and 1978. A7983 is the appreciation rate for
each house from 1979 to 1983 minus the SMSA-wide rate of appreciation for the same time period, and
PNRN is the predicted appreciation rate for the unit minus the national average rate.
(a) Interpret the results. Comment on the statistical significance of the coefficients. Do the slope
coefficients lend themselves to easy interpretation?
Answer: Since the logit model is nonlinear, the slope coefficients cannot be easily interpreted: the effect
of a regressor on the probability of moving depends on the values of all the regressors. However, the
signs of the coefficients indicate the direction of the relationship between the regressors and the binary
dependent variable. Accordingly, being married or having experienced a marriage change increases the
probability of moving, while male-headed and black households are less likely to move. If the predicted
appreciation rate relative to the national average increases, the household is less likely to move; the
same holds for the actual appreciation rate from 1979 to 1983. None of the slope coefficients is
statistically significant at the 5% level in a two-sided test. The black household and marriage change
coefficients come closest, with t-statistics of -1.85 and 1.84 respectively; these are significant at the
10% level in a two-sided test, and at the 5% level in a one-sided test.
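The t-statistics and the corresponding large-sample normal p-values can be computed directly from the table. The one-sided p-value below assumes the alternative was chosen in the direction of the estimate's sign (negative for Black, positive for Marriage change).

```python
import math

# (coefficient, standard error) from the logit table
COEFS = {"Male": (-0.567, 0.421), "Black": (-0.954, 0.515),
         "Married78": (0.054, 0.412), "Marriage change": (0.764, 0.416),
         "A7983": (-0.257, 0.921), "PNRN": (-4.545, 3.354)}


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


results = {}
for name, (b, se) in COEFS.items():
    t = b / se
    p_two = 2.0 * normal_sf(abs(t))   # two-sided large-sample p-value
    p_one = p_two / 2.0               # one-sided, alternative in the direction of the sign
    results[name] = (t, p_two, p_one)
    print(f"{name:16s} t = {t:6.2f}  two-sided p = {p_two:.3f}  one-sided p = {p_one:.3f}")
```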
(b) The mean values for the regressors are as shown in the accompanying table.
Variable           Mean
male               0.82
black              0.09
married78          0.78
marriage change    0.03
A7983              0.003
PNRN               0.007
Taking the coefficients at face value and using the sample means, calculate the probability of a
household moving.
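Answer: Evaluating the logit index at the sample means gives z ≈ -3.841, so the probability of moving is
$1/(1 + e^{3.841}) \approx 0.021$.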
(c) Given this probability, what would be the effect of a decrease in the predicted appreciation rate of
20 percent, that is, PNRN = -0.20?
Answer: The resulting probability would be 0.051, i.e., more than twice the value in the previous result.
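The predicted probabilities in (b) and (c) can be reproduced with the logistic function, using the coefficients and means from the two tables; `logit_prob` is a hypothetical helper. The answer's 0.051 corresponds to lowering PNRN, the predicted appreciation rate, by 0.20 from its mean; setting PNRN to exactly -0.20 gives about 0.052. Changing A7983 instead would barely move the probability, since its coefficient is only -0.257.

```python
import math

coef = {"const": -3.323, "male": -0.567, "black": -0.954, "married78": 0.054,
        "marriage_change": 0.764, "A7983": -0.257, "PNRN": -4.545}
means = {"male": 0.82, "black": 0.09, "married78": 0.78, "marriage_change": 0.03,
         "A7983": 0.003, "PNRN": 0.007}


def logit_prob(x):
    """Logit probability 1 / (1 + exp(-z)) at regressor values x."""
    z = coef["const"] + sum(coef[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


p_mean = logit_prob(means)                                         # part (b)
p_set = logit_prob({**means, "PNRN": -0.20})                       # PNRN set to -0.20
p_shift = logit_prob({**means, "PNRN": means["PNRN"] - 0.20})      # PNRN lowered by 0.20
print(round(p_mean, 3), round(p_set, 3), round(p_shift, 3))        # 0.021 0.052 0.051
```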
Question 4
Consider the following model of demand and supply of coffee:
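$$\text{Demand: } Q_i^{coffee} = \beta_1 P_i^{coffee} + \beta_2 (\ldots) + u_i$$
$$\text{Supply: } Q_i^{coffee} = \beta_3 P_i^{coffee} + \beta_4 (\ldots) + \beta_5 Weather_i + v_i$$
(variables are measured in deviations from means, so that the constant is omitted).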
(a) Suppose you want to estimate the price elasticity of demand by running an OLS regression on the
demand equation. Will your estimate of β1 be unbiased? If not, explain why.
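Answer: No. The OLS estimate of β1 will be biased (and inconsistent) because of simultaneous
causality: price and quantity are determined jointly by demand and supply, so the regressor P^coffee is
correlated with the demand error term u_i.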
(b) Suppose you have an exogenous variable Weather. Why can this variable be useful in estimating β1?
Outline the steps that you would follow to estimate β1. Will this give you a consistent estimate of
β1? Explain.
Answer: Weather appears only in the supply equation and is exogenous, so changes in Weather shift the
supply curve along the demand curve and thereby trace out the demand curve. Weather is a valid
instrumental variable because it is correlated with the endogenous regressor P^coffee (instrument
relevance) but not with the error term u_i (instrument exogeneity). To estimate β1 using Weather, first
regress P^coffee on Weather (together with any exogenous regressors in the demand equation) and
calculate the predicted value of P^coffee. Then, in the second stage, regress Q^coffee on the predicted
value obtained from the first stage. This two stage least squares (TSLS) procedure yields a consistent
estimate of β1, though not an unbiased one in general. The model is exactly identified because there
are as many instrumental variables as endogenous regressors.
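A simulation under assumed parameter values shows both points: OLS on the demand equation is biased by simultaneous causality, while TSLS with Weather as the instrument recovers β1. For simplicity the β2 and β4 terms are dropped, and β1 = -1, β3 = 1, β5 = 1 are hypothetical values; equilibrium price and quantity are obtained by setting demand equal to supply.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta1, beta3, beta5 = -1.0, 1.0, 1.0   # hypothetical demand slope, supply slope, weather effect
weather = rng.normal(size=n)
u = rng.normal(size=n)                 # demand shock
v = rng.normal(size=n)                 # supply shock

# Equilibrium: beta1 * P + u = beta3 * P + beta5 * Weather + v, solved for P.
price = (beta5 * weather + v - u) / (beta1 - beta3)
quantity = beta1 * price + u


def ols_slope(x, y):
    """Slope of a regression through the origin (variables are deviations from means)."""
    return (x @ y) / (x @ x)


b_ols = ols_slope(price, quantity)          # biased: price is correlated with u

pi_hat = ols_slope(weather, price)          # first stage: P on Weather
price_hat = pi_hat * weather
b_2sls = ols_slope(price_hat, quantity)     # second stage: Q on predicted P

print(f"OLS: {b_ols:.3f}, TSLS: {b_2sls:.3f}, true beta1: {beta1}")
```

Running the two stages by hand gives the right point estimate, but the standard errors from the second-stage OLS regression are not valid TSLS standard errors; dedicated IV routines compute the correct ones.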
Question 5
Do we care more about internal or external validity in cross-sectional estimation? What about time-series
estimation? We care about R² in time-series estimation. Is this also what we care about most in
cross-sectional estimation? Explain.
Answer: In cross-sectional estimation we care more about internal validity. In time-series analysis, we
care more about R² and external validity, since the goal of time-series estimation is to forecast future
outcomes of the time series. Thus it is important to have a model with a high R² (a good fit) in order to
get the best possible forecast. In cross-sectional estimation, however, we are interested in estimating
causal effects, and R² does not tell us whether we have identified a causal effect that is free from threats
to internal validity (omitted variable bias, etc.).
Question 6
Explain why, when testing joint hypotheses, testing them sequentially (the "one at a time" method)
using a series of t-statistics gives unreliable results. What approach should you use instead to test joint
hypotheses? (Hint: Use an example to help explain your answer. Suppose you are testing the joint
hypotheses β1 = 0 and β2 = 0. What is the probability of rejecting the joint null hypothesis when it is
true using the "one at a time" method? Is it too low or too high?)
Answer: Using the usual t-statistics to test the restrictions one at a time gives unreliable results and is
not the same as testing the joint hypotheses with the F-statistic. Suppose you are interested in testing
the joint hypotheses β1 = 0 and β2 = 0, using a 5% test for each t-statistic. Consider the special case in
which the t-statistics are uncorrelated and therefore (in large samples) independent. Because the
t-statistics are independent, the null is not rejected only if both |t1| ≤ 1.96 and |t2| ≤ 1.96, which under
the null happens with probability 0.95 × 0.95 = 0.9025. The one-at-a-time method therefore rejects a true
null with probability 1 - 0.9025 = 0.0975, nearly twice the intended 5%, so it rejects the null too often.
Intuitively, this is because it gives you too many chances: if you fail to reject using the first t-statistic,
you can try again using the second. If the regressors are correlated, the situation is even more
complicated, and the result is still unreliable. Use the F-statistic instead, which is designed to test the
joint hypothesis at the correct significance level.
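A Monte Carlo check of the 0.0975 figure, assuming (as in the answer) that the two t-statistics are independent standard normals under the null. With uncorrelated t-statistics the F-statistic is (t1² + t2²)/2, and its 5% large-sample critical value is -ln(0.05) ≈ 3.00, because a χ² variable with 2 degrees of freedom exceeds x with probability exp(-x/2).

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 200_000
# Under the joint null, independent large-sample t-statistics are standard normal draws.
t1 = rng.standard_normal(reps)
t2 = rng.standard_normal(reps)

# One at a time: reject if either |t| exceeds the 5% two-sided critical value.
rate_one_at_a_time = np.mean((np.abs(t1) > 1.96) | (np.abs(t2) > 1.96))

# F-statistic for two restrictions when the t-statistics are uncorrelated.
F = (t1 ** 2 + t2 ** 2) / 2.0
f_crit = -np.log(0.05)   # P(F > c) = exp(-c) for chi2(2)/2, so c is about 3.00
rate_F = np.mean(F > f_crit)

print(f"one at a time: {rate_one_at_a_time:.4f} (theory {1 - 0.95 ** 2:.4f}), F-test: {rate_F:.4f}")
```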