Part 1: Multiple Choice Questions (40 points)
Circle the right answer. Only one answer per question. No credit is given for multiple answers or
additional explanations. Two points per question for correct answers.
1) Consider the regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + e$. Suppose it is reasonable to assume that
$E(e \mid X_1, X_2) = E(e \mid X_2)$. This assumption implies that we can causally interpret OLS estimates of
a. $\beta_0$.
b. $\beta_1$.
c. $\beta_2$.
d. $\beta_1$ and $\beta_2$.
2) In multiple regression, the $R^2$ increases whenever an explanatory variable is
a. added unless the coefficient on the added variable is exactly zero.
b. added unless the adjusted $R^2$ falls.
c. added unless there is heteroskedasticity.
d. added unless the added variable is not statistically significant at the 5%-level.
3) The estimate on an explanatory variable is not statistically significant at the 5%-level if
a. the 95% confidence interval does not include zero.
b. the t-statistic is greater than 2.5.
c. the p-value is less than 0.05.
d. the p-value is greater than 0.95.
4) Consider testing the hypothesis $\beta_1 = \beta_2$. Your chosen level of significance is 5%. One of the
following statements is not correct:
a. This hypothesis can only be tested via an F-test.
b. This hypothesis can be tested using a t-test or an F-test.
c. In large samples, you reject the hypothesis if the computed F-statistic > 3.84.
d. In large samples, you reject the hypothesis if the computed t-statistic > 1.96.
5) In the regression model $Y_i = \beta_0 + \beta_1 C_i + \beta_2 F_i + \beta_3 (C_i \times F_i) + e_i$, where $Y$ denotes earnings, $C$ a
dummy variable for having a college degree and $F$ a gender dummy variable, $\beta_2$
a. is the gender difference in earnings for someone with a college degree.
b. is the gender difference in earnings for someone without a college degree.
c. is the difference in earnings between those with and without a college degree when $F_i = 0$.
d. cannot be estimated since $F_i$ and $(C_i \times F_i)$ are perfectly collinear when $F_i = 0$.
6) The following are all sensible specifications of a non-linear model with the exception of
a. $Y_i = \beta_0 + \beta_1 X_i + \beta_2 \ln Y_i + e_i$.
b. $\ln Y_i = \beta_0 + \beta_1 \ln X_i + e_i$.
c. $\ln Y_i = \beta_0 + \beta_1 X_i + e_i$.
d. $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + e_i$.
7) External validity
a. is guaranteed in an ideal randomized experiment.
b. is threatened if the regression error terms are heteroskedastic.
c. is threatened if there is omitted variables bias.
d. is threatened if there is measurement error in the dependent variable.
8) You want to estimate the price elasticity of cigarette demand. To do that you collect time series data
on prices and quantities sold in the Stockholm area. The major concern for such a study is:
a. simultaneous causality.
b. errors in variables bias.
c. wrong functional form.
d. sample selection.
9) You are interested in the effects of participating in a training program (which may be of varying
length). You have data on wages after program completion for those who participated in the
program and a potential comparison group. A major concern for this study is:
a. misspecification of the functional form.
b. sample selection bias.
c. bias caused by a so-called Hawthorne effect.
d. that you miss information on program length.
10) Heteroskedasticity-robust standard errors are invalid in large samples if
a. the errors are homoskedastic.
b. the error variance differs across observations.
c. the errors are correlated across observations.
d. the dependent variable is binary.
11) Consider the probit model $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$, where $X$ is a female dummy variable. The
marginal effect ($ME$) of being female (as opposed to male) on $\Pr(Y = 1)$ is given by
a. $ME = \Phi(\hat{\beta}_0 + \hat{\beta}_1) - \Phi(\hat{\beta}_0)$.
b. $ME = \hat{\beta}_1$.
c. $ME = \Phi(\hat{\beta}_0 + \hat{\beta}_1 \bar{X}) - \Phi(\hat{\beta}_0)$ (where $\bar{X}$ denotes the mean of $X$).
d. $ME = \Phi'(\cdot)\hat{\beta}_1$ (where $\Phi'(\cdot)$ denotes the derivative of $\Phi(\cdot)$).
12) One of the following statements is not true. In Probit and Logit models
a. the t-statistic should still be used for testing a single restriction.
b. you can include binary variables as explanatory variables.
c. you use Maximum Likelihood estimation.
d. F-statistics should not be used, since the models are nonlinear.
13) Consider the panel data model $Y_{it} = \alpha_i + \beta_1 X_{it} + e_{it}$. You can estimate $\beta_1$ by first eliminating
$\alpha_i$ and then estimating the transformed model. Two transformations are "entity-demeaning" and
"first-differencing". These two approaches
a. yield identical estimates of $\beta_1$ if $T = 2$.
b. yield identical estimates of $\beta_1$ if $T > 2$.
c. always yield identical estimates of $\beta_1$.
d. never yield identical estimates of $\beta_1$.
14) Indicate for which of the following examples you cannot use entity and time fixed effects: a
regression of
a. OECD unemployment rates on unemployment insurance generosity for the years 1980-2006.
b. the (log of) earnings on years of education, using the Swedish Level of Livings Survey in 2000.
c. the per capita income level in Swedish municipalities on local tax rates using data for 1980, 1990,
2000, and 2010.
d. the market values for 100 firms listed on the Swedish stock exchange on R&D expenditures for
the years 1998-2010.
15) The panel data model with entity and time fixed effects
a. handles any kind of omitted variables bias.
b. reduces bias caused by measurement error.
c. deals with simultaneous causality bias.
d. requires that the variable of interest varies over entities and time.
16) When there is a single instrument and a single (endogenous) regressor, the TSLS estimator for the
slope can be calculated as follows ($\widehat{cov}(\cdot)$ and $\widehat{var}(\cdot)$ denote the estimated covariance and variance, respectively)
a. $\hat{\beta}_1 = \widehat{cov}(X, Y)/\widehat{var}(X)$.
b. $\hat{\beta}_1 = \widehat{cov}(Z, X)/\widehat{cov}(Z, Y)$.
c. $\hat{\beta}_1 = \widehat{cov}(Z, Y)/\widehat{cov}(Z, X)$.
d. $\hat{\beta}_1 = \widehat{cov}(Z, Y)/\widehat{var}(Z)$.
17) You want to estimate the model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + e_i$, where $Z_i$ is a potential instrument for
$X_i$ and $W_i$ a control variable. The exogeneity assumption required for TSLS is fulfilled if
a. you have information on $W_i$, and $Z_i$ has a direct effect on $Y_i$ holding $X_i$ and $W_i$ constant.
b. you have information on $W_i$, and $E(e \mid Z_i, W_i) \neq E(e \mid W_i)$.
c. you lack information on $W_i$, and $Z_i$ and $W_i$ are uncorrelated.
d. you lack information on $W_i$, and $Z_i$ and $W_i$ are correlated.
18) With one exception, the following scenarios lend themselves to a Regression-Discontinuity design:
a. A test score result determines eligibility for a college grant.
b. Distance to an administrative border determines eligibility for a tax break.
c. A random subset of Swedish municipalities gets additional funding for schools.
d. Vote shares in a two-party system determine which party gets into office.
19) In the ideal randomized experiment
a. you can estimate the individual causal effects for all individuals participating in the experiment.
b. you must control for variables that are correlated with the dependent variable.
c. self-selection bias is a serious issue.
d. you can estimate the average causal effect for individuals participating in the experiment.
20) A Differences-in-Differences (DiD) approach
a. always requires data from a randomized controlled experiment.
b. can always be implemented if you have data covering at least two time points.
c. can be used with a single cross-section of data.
d. can be implemented if you have data covering at least two time points, given that the treatment
affected a sub-set of the population.
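The two transformations named in question 13 can be tried out numerically. The following is a minimal sketch on simulated data, not part of the exam: the data-generating values (a slope of 2, standard normal draws, 500 entities) and all function names are illustrative assumptions. Both regressions are run without an intercept, which matches the model in question 13 because it contains no time effects.

```python
import random


def simulate_panel(n_entities, n_periods, beta1=2.0, seed=0):
    """Simulate Y_it = alpha_i + beta1 * X_it + e_it (hypothetical data)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_entities):
        alpha = rng.gauss(0.0, 1.0)          # entity fixed effect alpha_i
        xs, ys = [], []
        for _ in range(n_periods):
            x = alpha + rng.gauss(0.0, 1.0)  # X_it is correlated with alpha_i
            xs.append(x)
            ys.append(alpha + beta1 * x + rng.gauss(0.0, 1.0))
        panel.append((xs, ys))
    return panel


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def entity_demeaned_estimate(panel):
    """Subtract each entity's own mean from X and Y, then pool and regress."""
    x_all, y_all = [], []
    for xs, ys in panel:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        x_all.extend(x - x_bar for x in xs)
        y_all.extend(y - y_bar for y in ys)
    return slope_through_origin(x_all, y_all)


def first_difference_estimate(panel):
    """Take period-to-period changes within each entity, then pool and regress."""
    dx_all, dy_all = [], []
    for xs, ys in panel:
        dx_all.extend(xs[t] - xs[t - 1] for t in range(1, len(xs)))
        dy_all.extend(ys[t] - ys[t - 1] for t in range(1, len(ys)))
    return slope_through_origin(dx_all, dy_all)


if __name__ == "__main__":
    for n_periods in (2, 3, 5):
        panel = simulate_panel(n_entities=500, n_periods=n_periods, seed=n_periods)
        print(n_periods,
              entity_demeaned_estimate(panel),
              first_difference_estimate(panel))
```

Running the loop for several values of T lets you compare the two estimates side by side for each panel length.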
Part 2: Discussion Questions (60 points)
Answer the following questions on separate sheets of paper. Answer clearly and concisely. Only legible
answers will be considered, others will be disregarded. If you think that a question is vaguely
formulated, specify the conditions used for answering it. Each question is worth 30 points.
Discussion Question 1
A long-standing question in labor economics is whether the generosity of unemployment benefits
increases unemployment. To study this question, researchers have used panels of countries with
observations spanning several years.
In a well-known book (Layard et al. (1991) Unemployment, p. 55), the authors present the regression
results reported in the table below. The results come from a cross-section of 20 countries using data
from the mid-1980s. The generosity of unemployment benefits is measured by two variables: (i) benefit
duration (i.e. the maximum length of benefit receipt); and (ii) the benefit replacement ratio (i.e.
unemployment benefits in relation to the average wage).
Table: The relationship between unemployment and unemployment benefits
(Dependent variable: Average unemployment rate (%), 1983-88)

Independent variables             Estimate    (t-statistic)
Benefit duration (years)          0.92        (2.9)
Benefit replacement ratio (%)     0.17        (7.1)

Notes: The regression also includes a constant plus 5 other control variables (spending on active labor market policies,
coverage of collective wage bargaining, employer coordination, union coordination, and change in inflation). Number of
observations: 20. Adjusted R-squared: 0.91. The critical t-value at the 5%-level (with 12 degrees of freedom) is 2.179.
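
The figures in the table and its notes can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers reported above; the variable and function names are illustrative. It recomputes the degrees of freedom (observations minus estimated parameters), backs out the standard error implied by each estimate and its t-statistic, and compares each t-statistic with the stated critical value.

```python
# Figures copied from the table and its notes.
N_OBS = 20
N_CONTROLS = 5        # other control variables listed in the notes
CRITICAL_T = 2.179    # 5% level with 12 degrees of freedom, as stated in the notes

# name -> (coefficient estimate, t-statistic)
ESTIMATES = {
    "Benefit duration (years)": (0.92, 2.9),
    "Benefit replacement ratio (%)": (0.17, 7.1),
}


def degrees_of_freedom(n_obs, n_slopes):
    """Observations minus estimated parameters (slopes plus the constant)."""
    return n_obs - (n_slopes + 1)


def summarize(estimates, critical_t):
    """Implied standard error (estimate / t) and a two-sided significance flag."""
    rows = {}
    for name, (coef, t_stat) in estimates.items():
        rows[name] = {
            "implied_se": coef / t_stat,
            "significant": abs(t_stat) > critical_t,
        }
    return rows


df = degrees_of_freedom(N_OBS, len(ESTIMATES) + N_CONTROLS)
summary = summarize(ESTIMATES, CRITICAL_T)

print("degrees of freedom:", df)
for name, row in summary.items():
    print(name, round(row["implied_se"], 3), row["significant"])
```

The degrees-of-freedom count comes out at 12, which agrees with the value quoted in the notes.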
a) Interpret the two coefficient estimates.
b) Explain why the OLS estimator of the effect of unemployment benefit generosity on unemployment
may be biased in this case.
c) An alternative to using OLS is to exploit the panel structure of the data. Discuss the fixed effects
regression model in the current application. Does the fixed effects model alleviate the problem(s) of
OLS?
d) An alternative to OLS and fixed effects regression is instrumental variables. Consider using a
left-wing political majority as an instrumental variable. Discuss whether this would be a valid instrument.
Discussion Question 2
Suppose you are interested in estimating the causal effect of class size on pupils' test scores. You want
to estimate the relationship:
$Y_i = \beta_0 + \beta_1 CS_i + \beta_2 X_i + e_i$
where $i$ indexes individuals, $Y$ denotes an individual's test score, $CS$ class size, and $X$ a set of control
variables.
a) Consider estimating the above equation by OLS. Why is OLS likely to be biased? What is the likely
sign of the bias?
b) A number of researchers have noted that maximum class size rules can be useful for identifying the
causal effect of class size. The solid line in the figure below (labeled "expected class size") shows an
example of such a maximum class size rule. The rule stipulates that new classes are formed when
total enrollment in a grade surpasses multiples of 30. Thus, one class is formed when total 4th-grade
enrollment in a school district is at most 30. When total enrollment is between 31 and 60, two
classes are formed; when total enrollment is between 61 and 90, three classes are formed, and so
on. The dashed line shows actual class sizes. Actual class sizes do not follow the rule completely.
Explain how the maximum class size rule may help you in estimating the causal effect of class size.
What is the key "identifying assumption"? How would you test this identifying assumption? How
would you specify the regression(s) that you would use to estimate the causal effect of interest?
c) A regression of actual class size on expected class size (and control variables) yields an estimate on
expected class size of 0.335 (with a standard error of 0.051). What does this information tell you
about the validity of the research design?
d) Separate regressions of parental education and parental income (measured before the children are
age 10) on expected class size (and control variables) produce estimates that have t-ratios of: -0.15
(parental education) and 0.08 (parental income). What does this information tell you about the
validity of the research design?
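
The maximum class size rule in part b) and the ratio behind part c) can be written out as a short sketch. It assumes that enrollment is split evenly across the classes formed, which the question does not state explicitly; the function names are illustrative. The last helper simply divides an estimate by its standard error, as for the figures reported in part c).

```python
def number_of_classes(enrollment, max_size=30):
    """Classes formed under the rule: 1 for 1-30 pupils, 2 for 31-60, 3 for 61-90, ..."""
    if enrollment < 1:
        raise ValueError("enrollment must be at least 1")
    return (enrollment - 1) // max_size + 1


def expected_class_size(enrollment, max_size=30):
    """Class size implied by the rule, ASSUMING pupils are split evenly across classes."""
    return enrollment / number_of_classes(enrollment, max_size)


def t_ratio(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error


if __name__ == "__main__":
    # Expected class size drops sharply just past each multiple of 30.
    for enrollment in (29, 30, 31, 60, 61, 90, 91):
        print(enrollment,
              number_of_classes(enrollment),
              round(expected_class_size(enrollment), 2))

    # Figures from part c): estimate 0.335 with standard error 0.051.
    print(round(t_ratio(0.335, 0.051), 2))
```

Plotting `expected_class_size` against enrollment gives a sawtooth pattern like the solid line the question describes.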