Uploaded by Bill Cameron

Assignment 3 Questions

ECOM30002/90002 ECONOMETRICS 2
SEMESTER 1, 2023
ASSIGNMENT 3
Due date: 11:59 pm on Wednesday, 10 May, 2023
Information and instructions
• This assignment contributes 7.5% to the overall assessment. There are 2 questions with a
total of 40 marks. Marks for each question are given in square brackets, and marks for each part
of a question in parentheses.
• An assignment can be submitted individually or by a group of up to four students. Students
form their own groups. Members of a group need not be enrolled in the same tutorial. All
group members will receive the same mark for the assignment.
• Assignment groups must be registered through LMS by 10am May 8, even if
the group members are the same as in previous assignments (students working
individually need not register). Assignments have to be submitted electronically via LMS.
Instructions for group registration and assignment submission have been announced on LMS.
• List all group members and their student ID numbers on the first page.
• Include your R-code for each question (either at the end of the document or as a separate file
as part of the online submission). A penalty of 5 points per question is applied to
assignment submissions without R-code.
• Assignments must be submitted as a Portable Document Format (PDF) file. Be short, precise and
clear in your answers and aim to stay within the 600 word limit. Do not present unnecessary
graphs, tables or numbers, and do not discuss irrelevant matters. However, show your work
and intuition: marks are given for explanations and not just for having the correct answer.
Give precise numerical answers when possible and explain what units your answers are in.
Regression output must be presented in clearly labelled equation or table form whenever
either is requested as part of a question.
• Extension Policy
Late assignments will not be accepted. Students with a genuine and documented reason
for not completing an assignment, such as illness, can apply for special consideration to have
the marks for that assignment re-weighted to the exam.
Special Consideration
Students who have been significantly affected by illness or other serious circumstances during
the semester may be eligible to apply for Special Consideration. The following website
contains detailed information relating to who can apply for Special Consideration and the
process for making an application: http://students.unimelb.edu.au/admin/special.
Question 1 [20 marks]
Reconsider the simulation setting from Question 2 in assignment 2 that considered OLS and 2SLS
estimation of a causal equation when there is a possible omitted variable. In this assignment you
will also consider asymptotic properties and a case where the potential IV can vary in its relevance.
The set-up for the data generation below is the same as in Question 2 for Assignment 2:
• The data generating process is specified by a causal equation for the dependent variable Yi,
    Yi = β0 + β1 X1,i + β2 X2,i + β3 X3,i + Ui ,    (1)
in which the coefficient β1 is of primary interest as an object for statistical inference.
• The explanatory variable X1,i is generated by an equation of the form
    X1,i = α0 + α1 Z1,i + α2 X2,i + α3 X3,i + Vi .    (2)
• The random variables (Z1,i , X2,i , X3,i , Vi )′ are i.i.d. and have a multivariate standard normal
distribution, i.e. N (0, I4 ) where I4 is the 4 by 4 identity matrix.
• The disturbance Ui is independent of (Z1,i , X2,i , X3,i , Vi )′ with the following distribution: Ui
is i.i.d. standard normal (mean 0, variance 1).
• Throughout the assignment the population values
β0 = 0; β1 = 1; β2 = 1; α0 = 0;
will be set. The other population parameters will be given various values in the questions
below.
As in Assignment 2, the explanatory variable X3,i will be treated as an unobservable variable
throughout the assignment. (E.g. X3,i can be thought of as something like ability in a wage equation.)
Even though Yi will be generated from equation (1), which involves X3,i , calculations of estimators
will proceed without using X3,i . In this sense X3,i is playing the role of an omitted causal variable
that is (possibly) involved in the generation of values for Yi but is not included in statistical analysis
of the data.
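The set-up above can be sketched in base R as follows. This is only an illustrative sketch: the helper simulate_sample() and its argument names are hypothetical and are not part of the assignment, and the parameter values in the example call are simply one admissible choice.

```r
# Sketch of the data generating process in equations (1) and (2).
# simulate_sample() and its argument names are hypothetical helpers.
simulate_sample <- function(n, beta3, alpha1, alpha2, alpha3,
                            beta0 = 0, beta1 = 1, beta2 = 1, alpha0 = 0) {
  # (Z1, X2, X3, V)' ~ N(0, I4): four independent standard normal draws
  z1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- rnorm(n)
  v  <- rnorm(n)
  # U is standard normal and independent of the draws above
  u  <- rnorm(n)
  x1 <- alpha0 + alpha1 * z1 + alpha2 * x2 + alpha3 * x3 + v   # equation (2)
  y  <- beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3 + u       # equation (1)
  # x3 is deliberately not returned: it is treated as unobservable
  data.frame(y = y, x1 = x1, x2 = x2, z1 = z1)
}

set.seed(1)
dat <- simulate_sample(n = 200, beta3 = 1, alpha1 = 1, alpha2 = 0, alpha3 = 1)
str(dat)
```

Leaving x3 out of the returned data frame mirrors its role as an omitted variable: it affects y and x1 but is unavailable to any estimator.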
(i) (6 marks) To start, set
β3 = 1; α1 = 1; α2 = 0; α3 = 1
For each sample size n = 20, 200, 2000, generate 1000 replications of samples of X1,i from (2)
and Yi from (1). For each sample, carry out a 2SLS regression, then calculate and tabulate
(i.e. provide a table of) the following quantities:
• bias of the estimator for β1
• standard deviation of the estimator for β1
• coverage rate of the 95% confidence interval (i.e. the percentage of simulated confidence
intervals that include β1)
• average length of the confidence intervals (length is upper limit minus lower limit)
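One way such a simulation could be organised in base R is sketched below. The helpers tsls_fit() and run_mc() are hypothetical names. The sketch assumes homoskedasticity-only 2SLS standard errors, a normal critical value for the 95% interval, and Z1,i and X2,i as the instruments; the course may intend a packaged 2SLS routine or a different standard error formula.

```r
# Hypothetical helper: 2SLS of y on (1, x1, x2) with instruments (1, z1, x2).
tsls_fit <- function(y, x1, x2, z1) {
  n <- length(y)
  X <- cbind(1, x1, x2)                 # regressors (x1 is endogenous)
  Z <- cbind(1, z1, x2)                 # instruments (z1 replaces x1)
  ZX_inv <- solve(crossprod(Z, X))      # (Z'X)^{-1}, exactly identified case
  b  <- ZX_inv %*% crossprod(Z, y)      # IV / 2SLS coefficients
  e  <- y - X %*% b                     # residuals use the original X
  s2 <- sum(e^2) / (n - ncol(X))        # homoskedastic error variance (assumption)
  V  <- s2 * ZX_inv %*% crossprod(Z) %*% t(ZX_inv)
  c(est = b[2], se = sqrt(V[2, 2]))     # estimate and standard error for beta1
}

# Hypothetical helper: Monte Carlo summary for one sample size n.
run_mc <- function(n, reps = 1000, beta1 = 1, beta3 = 1,
                   alpha1 = 1, alpha2 = 0, alpha3 = 1) {
  out <- replicate(reps, {
    z1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    v  <- rnorm(n); u  <- rnorm(n)
    x1 <- alpha1 * z1 + alpha2 * x2 + alpha3 * x3 + v   # eq. (2), alpha0 = 0
    y  <- beta1 * x1 + x2 + beta3 * x3 + u              # eq. (1), beta0 = 0, beta2 = 1
    tsls_fit(y, x1, x2, z1)                             # x3 is never used
  })
  est <- out["est", ]
  se  <- out["se", ]
  lo  <- est - qnorm(0.975) * se
  hi  <- est + qnorm(0.975) * se
  c(n = n,
    bias = mean(est) - beta1,
    sd = sd(est),
    coverage = 100 * mean(lo <= beta1 & beta1 <= hi),
    length = mean(hi - lo))
}

set.seed(123)
results <- t(sapply(c(20, 200, 2000), run_mc))
print(round(results, 3))
```

Each row of results corresponds to one sample size, so the table can be read directly for how bias, spread, coverage and interval length change with n.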
(ii) (2 marks) Do your simulation results from part (i) suggest that the 2SLS estimator is consistent or inconsistent for β1 ? Briefly explain.
(iii) (2 marks) Imagine you were to redo the table of results you reported in part (i) but now using
the OLS estimator instead of 2SLS. How would you expect the results on bias and coverage
to change compared to those for 2SLS? (You are NOT asked to redo the analysis!)
(iv) (5 marks) Set n = 200 and consider the range of values for α1 :
α1 = 0.8, 0.5, 0.3, 0.1, 0
For these 5 cases, carry out the simulations of the same quantities/properties of the 2SLS
estimator as in part (i), i.e. report bias and standard deviation as well as the coverage rate
and length of the confidence interval for all 5 cases in a table.
Provide a histogram of the estimates for β1 from the simulations with α1 = 0.8 and α1 = 0.3,
respectively.
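As a side illustration of how the relevance of Z1,i varies with α1, the sketch below draws a single sample for each value and computes the first-stage F statistic for the coefficient on Z1,i. This diagnostic is not requested by the question and is added here only as an assumption-laden aid; the values α2 = 0 and α3 = 1 are assumed to carry over from part (i).

```r
set.seed(42)
n <- 200
alpha1_grid <- c(0.8, 0.5, 0.3, 0.1, 0)

# One simulated sample per alpha1 value; alpha2 = 0 and alpha3 = 1 are assumed.
first_stage_F <- sapply(alpha1_grid, function(alpha1) {
  z1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); v <- rnorm(n)
  x1 <- alpha1 * z1 + 0 * x2 + 1 * x3 + v    # eq. (2) with alpha0 = 0
  fs <- lm(x1 ~ z1 + x2)                     # first stage uses observables only
  t_z1 <- summary(fs)$coefficients["z1", "t value"]
  t_z1^2                                     # F statistic for H0: coefficient on z1 is 0
})

print(data.frame(alpha1 = alpha1_grid, first_stage_F = round(first_stage_F, 2)))
```

Because each value comes from one random draw, the printed statistics will vary with the seed; averaging over replications would give a more stable picture.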
(v) (2 marks) Explain the results that you reported in part (iv) in terms of the validity/invalidity
of Z1,i as an IV for X1,i .
(vi) (3 marks) Consider the OLS estimator for the simulation setting with the following values
β3 = 0, α1 = 1, α3 = 1, n = 500
and a range of values for α2 :
α2 = 0; 2; 4; 6; 8.
The bias and standard deviation of the OLS estimator and the confidence interval in terms
of the coverage rate and interval length are provided in the table in Figure 1.
α2     Bias      SD      Coverage (%)   Length   cor(X1,i, X2,i)
0.0    −0.002    0.071   94.4           0.280    −0.000
2.0    −0.002    0.109   94.4           0.428     0.755
4.0    −0.004    0.179   94.4           0.706     0.917
6.0    −0.006    0.257   94.4           1.012     0.961
8.0    −0.008    0.337   94.4           1.326     0.977
Figure 1: OLS estimator properties from simulations for α2 = (0, 2, 4, 6, 8).
Briefly explain what aspect/concept the simulation investigates. Briefly discuss the main
conclusions that you can draw from the provided results for the properties of the OLS estimator (bias, standard deviation, coverage rate and confidence interval length).
Question 2 [20 marks]
For this question you will consider the US data on road fatalities and beer tax you have seen in
lectures 13 and 14. The data is provided in the file “A3Fatalities.csv” and contains the following
variables:
• Fatali,t : number of road fatalities in state i in year t
• Popi,t : population of state i in year t
• BeerTaxi,t : tax on beer in state i in year t ($/case, 1988 prices)
• FRatei,t : fatalities per 1000 of population in state i in year t
Suppose that the true causal model is
FRatei,t = αi + β1 BeerTaxi,t + Vi,t .
(i) (2 marks) Should we regress FRatei,t on BeerTaxi,t to estimate β1 ? Why or why not?
(ii) (4 marks) Write down the transformed model that can be estimated via OLS and makes use
of all 7 years of data. Define all terms, including any additional variables that are required
for the transformation of the fatalities and beer tax variables.
Briefly explain what the transformation achieves. (Make sure you have reviewed lecture 14
before you proceed.)
(iii) (6 marks) Next create the within-state average variables for the fatalities rate and the beer tax for
each state and call them fratehat and beertaxhat. You can use the variable state_id in a
loop to create these state mean variables. Provide histograms of fratehat and beertaxhat.
Then create the transformed dependent and explanatory variables (call them fratetilde and
beertaxtilde) that are required for the transformed model in part (ii).
Report the original and transformed variables for the state of California in year 1982. Report
the means and standard deviations of your transformed variables.
Hint: You can generate the transformed variables either in Excel (and then load the data
into R) or by first loading the data set into R and carrying out the transformation there. In R, the
within-average variables and the transformed variables can be computed using a loop.
In R it is useful to first generate a variable called state_list using the command
state_list <- unique(dt$state). You can then compute the within-state average values by
selecting the indices of the observations associated with a particular state using
which(dt$state == state_list[i]), where i indexes a unique state, and looping
over all the states.
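The loop described in the hint could look roughly like the sketch below. It runs on a small toy panel with made-up numbers, not on A3Fatalities.csv; the column names state, year, frate and beertax are assumptions about the layout, and the transformation is assumed to be the deviation from the state mean.

```r
# Toy panel with made-up numbers; column names are assumptions, not the
# actual layout of A3Fatalities.csv.
dt <- data.frame(
  state   = rep(c("A", "B", "C"), each = 3),
  year    = rep(1982:1984, times = 3),
  frate   = c(2.0, 2.2, 2.4, 1.0, 1.1, 0.9, 3.0, 2.7, 2.7),
  beertax = c(0.5, 0.4, 0.3, 1.0, 1.0, 1.3, 0.2, 0.3, 0.4)
)

state_list <- unique(dt$state)
dt$fratehat   <- NA_real_
dt$beertaxhat <- NA_real_

for (i in seq_along(state_list)) {
  idx <- which(dt$state == state_list[i])       # rows belonging to state i
  dt$fratehat[idx]   <- mean(dt$frate[idx])     # within-state average
  dt$beertaxhat[idx] <- mean(dt$beertax[idx])
}

# Deviations from the state means (assumed form of the transformation)
dt$fratetilde   <- dt$frate   - dt$fratehat
dt$beertaxtilde <- dt$beertax - dt$beertaxhat
print(dt)
```

A quick sanity check is that the transformed variables sum to zero within every state, since each is a deviation from its own state mean.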
(iv) (4 marks) Estimate the transformed model using OLS. Report the table of estimation results
and interpret the estimate of β1 .
(v) (4 marks) Finally, estimate a one-way fixed effects model for the original data. Compare your
results to those in part (iv) and briefly discuss. Are they what you expected?
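The comparison can be previewed on toy data. The sketch below assumes that the one-way fixed effects model is fitted as a regression with a dummy variable for each state via lm() and factor(); the course may instead use a dedicated panel routine. The data are made-up numbers and the column names are assumptions.

```r
# Toy panel with made-up numbers (not A3Fatalities.csv).
dt <- data.frame(
  state   = rep(c("A", "B", "C"), each = 3),
  frate   = c(2.0, 2.2, 2.4, 1.0, 1.1, 0.9, 3.0, 2.7, 2.7),
  beertax = c(0.5, 0.4, 0.3, 1.0, 1.0, 1.3, 0.2, 0.3, 0.4)
)

# State-demeaned variables; ave() returns the group mean for every row
dt$fratetilde   <- dt$frate   - ave(dt$frate, dt$state)
dt$beertaxtilde <- dt$beertax - ave(dt$beertax, dt$state)

within_fit <- lm(fratetilde ~ beertaxtilde, data = dt)        # OLS on demeaned data
lsdv_fit   <- lm(frate ~ beertax + factor(state), data = dt)  # state dummies

b_within <- unname(coef(within_fit)["beertaxtilde"])
b_lsdv   <- unname(coef(lsdv_fit)["beertax"])
print(c(within = b_within, lsdv = b_lsdv))
```

The two slope estimates coincide, while the standard errors reported by lm() generally differ because the demeaned regression does not account for the degrees of freedom used up by the state means.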