Previous Final Exam from Spring 2019.pdf

Department of Economics
Columbia University
Economics UN3412
Spring 2019
Final Exam
Section 2 (Mon/Wed section at 2:40pm)
(Seyhan Erden)
Instructions
1. Do not turn this page until so instructed.
2. This exam ends promptly at 4pm.
3. This exam has five questions for a total of 100 points.
4. Write down your Columbia ID number on the cover of this exam.
5. You are permitted to use a simple calculator. No computers, wireless, or other electronic
devices without prior permission. You may not share resources with anyone else.
6. Some questions ask you to draw a real-world judgment in a problem of practical importance.
The quality of that judgment counts. For example, consider the question: “It is 10°F outside.
In your judgment, why are so many people wearing heavy coats?” The answer, “To stay
warm” would receive more points than the answer, “Because they are fashion-conscious.”
NAME:_________________________________________________________
UNI:__________________________________________________________
Question 1 [15 points]:
Imagine that you are interested in learning about the relationship between the research and
development (R&D) expenditures of firms and patents applied for by them. Suppose that you
would like to measure the outcome of innovation by the count of patents. Specifically, let the
dependent variable ($Y_{it}$) be the total number of patents applied for by firm $i$ in year $t$, and the
explanatory variable ($X_{it}$) the logarithm of real R&D expenditures. We have data $\{(Y_{it}, X_{it}): i = 1, \dots, n;\ t = 1, 2\}$
for a large number ($n$) of firms and for two years ($T = 2$). Consider the
linear panel data model
$Y_{it} = X_{it}\beta_1 + \alpha_i + \lambda_t + u_{it}, \quad t = 1, 2;\ i = 1, \dots, n,$
where $\alpha_i$ is the firm fixed effect and $\lambda_t$ is the year fixed effect.
(a) (4p) Explain why a linear regression model with only one year of data may not be
appropriate for understanding the relationship between R&D and patents.
(b) (4p) Describe carefully how you would like to estimate $\beta_1$ in this case.
(c) (4p) A friend of yours claims that both $\lambda_1$ and $\lambda_2$ can be identified and estimated
consistently, since you have two years of data. Would you agree? Justify your answer.
(d) (3p) Another friend of yours suggests that you should use “clustered standard errors” that are
robust against possible correlations across years within a firm. Would you agree? Justify your
answer.
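One common way to approach a two-period model of this form is first differencing. The sketch below is illustrative only: it uses simulated data with made-up parameter values (`beta1`, `lam1`, `lam2` are assumptions, not taken from the exam), treats the outcome as continuous rather than a count, and uses a hypothetical helper named `first_difference_estimate`. Subtracting the year-1 equation from the year-2 equation removes $\alpha_i$ and leaves a regression of $Y_{i2} - Y_{i1}$ on a constant and $X_{i2} - X_{i1}$, where the slope is $\beta_1$ and the constant is $\lambda_2 - \lambda_1$.

```python
import random


def first_difference_estimate(y1, y2, x1, x2):
    """OLS of (y2 - y1) on a constant and (x2 - x1).

    Returns (intercept, slope). The slope estimates beta_1 and the
    intercept estimates lambda_2 - lambda_1; the two year effects are
    not recovered separately by this regression.
    """
    dy = [b - a for a, b in zip(y1, y2)]
    dx = [b - a for a, b in zip(x1, x2)]
    n = len(dy)
    mean_dy = sum(dy) / n
    mean_dx = sum(dx) / n
    sxy = sum((u - mean_dx) * (v - mean_dy) for u, v in zip(dx, dy))
    sxx = sum((u - mean_dx) ** 2 for u in dx)
    slope = sxy / sxx
    intercept = mean_dy - slope * mean_dx
    return intercept, slope


# Simulated panel; every number below is an assumption for illustration.
rng = random.Random(0)
n = 5000
beta1, lam1, lam2 = 0.8, 1.0, 1.5

y1, y2, x1, x2 = [], [], [], []
for _ in range(n):
    alpha = rng.gauss(0, 1)                # firm fixed effect
    xa = 0.5 * alpha + rng.gauss(0, 1)     # regressor correlated with alpha
    xb = 0.5 * alpha + rng.gauss(0, 1)
    x1.append(xa)
    x2.append(xb)
    y1.append(beta1 * xa + alpha + lam1 + rng.gauss(0, 1))
    y2.append(beta1 * xb + alpha + lam2 + rng.gauss(0, 1))

intercept_hat, beta1_hat = first_difference_estimate(y1, y2, x1, x2)
print(f"beta1_hat = {beta1_hat:.3f}, (lambda2 - lambda1)_hat = {intercept_hat:.3f}")
```

In this simulation the regressor is deliberately correlated with the firm effect, so a single-year cross-section regression would be biased, while the differenced regression is not affected by $\alpha_i$.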
Question 2 [15 points]:
A study tried to find the determinants of the increase in the number of households headed by a
female. Using 1940 and 1960 historical census data, a logit model was estimated to predict
whether a woman is the head of a household (living on her own) or whether she is living within
another's household. The limited dependent variable takes on a value of one if the female lives
on her own and is zero if she shares housing. The results for 1960 using 6,051 observations on
prime-age whites and 1,294 on nonwhites were as shown in the table:
Regression                   (1) White            (2) Nonwhite
Regression model             Logit                Logit
Constant                     1.459 (0.685)        -2.874 (1.423)
Age                          -0.275 (0.037)       0.084 (0.068)
Age squared                  0.00463 (0.00044)    0.00021 (0.00081)
Education                    -0.171 (0.026)       -0.127 (0.038)
Farm status                  -0.687 (0.173)       -0.498 (0.346)
South                        0.376 (0.098)        -0.520 (0.180)
Expected family earnings     0.0018 (0.00019)     0.0011 (0.00024)
Family composition           4.123 (0.294)        2.751 (0.345)
Pseudo-R²                    0.266                0.189
Percent correctly predicted  82.0                 83.4
where age is measured in years, education is years of schooling of the family head, farm status is
a binary variable taking the value of one if the family head lived on a farm, south is a binary
variable for living in a certain region of the country, expected family earnings was generated
from a separate OLS regression to predict earnings from a set of regressors, and family
composition refers to the number of family members under the age of 18 divided by the total
number in the family.
The mean values for the variables were as shown in the table.
Variable                     (1) White mean       (2) Nonwhite mean
Age                          46.1                 42.9
Age squared                  2,263.5              1,965.6
Education                    12.6                 10.4
Farm status                  0.03                 0.02
South                        0.3                  0.5
Expected family earnings     2,336.4              1,507.3
Family composition           0.2                  0.3
(a) (5p) Interpret the results. Do the coefficients have the expected signs? Why do you think age
was entered both in levels and in squares?
(b) (5p) Calculate the difference in the predicted probability between whites and nonwhites at
the sample mean values of the explanatory variables. Why do you think the study did not
combine the observations and allow a nonwhite binary variable to enter?
(c) (5p) What would be the effect on the probability of a nonwhite woman living on her own, if
education and family composition were changed from their current mean to the mean of
whites, while all other variables were left unchanged at the nonwhite mean values?
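The arithmetic behind parts (b) and (c) can be sketched as follows. This is a minimal illustration, not an official solution: the coefficients and means are copied from the two tables above, the family composition means are read as 0.2 (white) and 0.3 (nonwhite), and the dictionary keys and the helper names `logistic` and `predicted_probability` are hypothetical. A logit model gives the predicted probability as $1 / (1 + e^{-z})$, where $z$ is the constant plus the sum of each coefficient times its regressor value.

```python
import math

# Logit coefficients from the regression table (1960 sample).
COEF = {
    "white": {
        "constant": 1.459, "age": -0.275, "age_sq": 0.00463,
        "education": -0.171, "farm": -0.687, "south": 0.376,
        "earnings": 0.0018, "family_comp": 4.123,
    },
    "nonwhite": {
        "constant": -2.874, "age": 0.084, "age_sq": 0.00021,
        "education": -0.127, "farm": -0.498, "south": -0.520,
        "earnings": 0.0011, "family_comp": 2.751,
    },
}

# Sample means from the table of means (family_comp read as 0.2 / 0.3).
MEANS = {
    "white": {
        "age": 46.1, "age_sq": 2263.5, "education": 12.6, "farm": 0.03,
        "south": 0.3, "earnings": 2336.4, "family_comp": 0.2,
    },
    "nonwhite": {
        "age": 42.9, "age_sq": 1965.6, "education": 10.4, "farm": 0.02,
        "south": 0.5, "earnings": 1507.3, "family_comp": 0.3,
    },
}


def logistic(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(coef, x):
    """Predicted probability from a logit model at regressor values x."""
    index = coef["constant"] + sum(coef[name] * value for name, value in x.items())
    return logistic(index)


# Part (b): predicted probabilities at each group's own sample means.
p_white = predicted_probability(COEF["white"], MEANS["white"])
p_nonwhite = predicted_probability(COEF["nonwhite"], MEANS["nonwhite"])
difference = p_white - p_nonwhite

# Part (c): nonwhite coefficients, with education and family composition
# moved to the white means and everything else left at the nonwhite means.
counterfactual = dict(
    MEANS["nonwhite"],
    education=MEANS["white"]["education"],
    family_comp=MEANS["white"]["family_comp"],
)
p_counterfactual = predicted_probability(COEF["nonwhite"], counterfactual)
effect = p_counterfactual - p_nonwhite

print(f"P(white) = {p_white:.3f}, P(nonwhite) = {p_nonwhite:.3f}, diff = {difference:.3f}")
print(f"P(nonwhite, counterfactual) = {p_counterfactual:.3f}, effect = {effect:.3f}")
```

Because the logistic function is nonlinear, the effect in part (c) depends on where the other regressors are held, which is why the calculation goes through the full index rather than scaling the coefficients directly.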
Question 3 [15 points]:
Earnings functions, whereby the log of earnings is regressed on years of education, years of on-the-job
training, and individual characteristics, have been studied for a variety of reasons. Some studies have
focused on the returns to education, others on discrimination, union/non-union differentials, etc. For all
these studies, a major concern has been the fact that ability should enter as a determinant of earnings, but
that it is close to impossible to measure and therefore represents an omitted variable.
Assume that the coefficient on years of education is the parameter of interest. Given that education is
positively correlated with ability, since, for example, more able students attract scholarships and hence
receive more years of education, the OLS estimator for the returns to education could be upward biased.
To overcome this problem, various authors have used instrumental variable estimation techniques. For
each of the potential instruments listed below, briefly discuss instrument validity.
(a) (3p) The individual's postal zip code.
(b) (4p) The individual's IQ or test score on a work-related exam.
(c) (4p) Years of education for the individual's mother or father.
(d) (4p) Number of siblings the individual has.
Question 4 [15 points]:
Your textbook estimates the initial relationship between the percentage change in real frozen OJ
prices and the freezing degree days as follows:
$\widehat{\%ChgP}_t = \underset{(0.22)}{-0.40} + \underset{(0.13)}{0.47}\, FDD_t$
$t$ = 1950:1 to 2000:12, $R^2 = 0.09$, $SER = 4.8$
(a) (4p) Calculate the t-statistic for the slope coefficient. Can you reject the null hypothesis
that the coefficient is zero in the population at the 5% level?
(b) (5p) The above regression was estimated using HAC standard errors. When you reestimate the regression using homoskedasticity-only standard errors, the standard error of
the slope coefficient drops to 0.06. Calculate the t-statistic for the slope coefficient again.
(c) (5p) Which of the two standard errors should you use for statistical inference?
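The arithmetic in parts (a) and (b) can be sketched as below. This is an illustration under stated assumptions: the helper name `t_statistic` is hypothetical, and the critical value 1.96 is the usual two-sided 5% value from the large-sample normal approximation. The slope and both standard errors come from the question itself.

```python
def t_statistic(estimate, standard_error, null_value=0.0):
    """t-statistic for testing that the coefficient equals null_value."""
    return (estimate - null_value) / standard_error


# Two-sided 5% critical value under the large-sample normal approximation.
CRITICAL_5_PERCENT = 1.96

slope = 0.47                 # estimated coefficient on FDD_t
se_hac = 0.13                # HAC standard error reported with the regression
se_homoskedastic = 0.06      # homoskedasticity-only standard error from part (b)

t_hac = t_statistic(slope, se_hac)
t_homoskedastic = t_statistic(slope, se_homoskedastic)

reject_hac = abs(t_hac) > CRITICAL_5_PERCENT
reject_homoskedastic = abs(t_homoskedastic) > CRITICAL_5_PERCENT

print(f"HAC: t = {t_hac:.2f}, reject at 5%: {reject_hac}")
print(f"Homoskedasticity-only: t = {t_homoskedastic:.2f}, reject at 5%: {reject_homoskedastic}")
```

The point estimate is the same in both cases; only the standard error, and therefore the t-statistic, changes with the assumption made about the error term.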
Question 5 [40 points]:
True/False/Uncertain with explanations
(a) The two conditions for a valid instrument are $\mathrm{corr}(Z_i, X_i) = 0$ and $\mathrm{corr}(Z_i, u_i) \neq 0$.
(b) Stationarity means that the probability distribution of the time series variable does not
change over time.
(c) In a time series model of dynamic causal effects, if the regressor is strictly exogenous, OLS
is still efficient as an estimator of the dynamic causal effects.
(d) Time fixed effects regressions are useful in dealing with omitted variables if these
omitted variables are constant over time but vary across entities.
Selected Tables from Stock and Watson, Introduction to Econometrics