Uploaded by sean_tanyuheng

Econometrics Assignment: CO2 Emissions Analysis with R

advertisement
ECOM20001: Econometrics 1
Assignment 3
Student Information
To receive an assignment grade, you must ll out the information in this table and
include this table on the front cover page for your assignment. Only students
whose names and student ID numbers are included on the cover page will
receive marks for the assignment. Groups of up to 3 students are allowed.
Name
Student ID Number
Sally Probability
422552
Xiaosong Statistics
653223
Ipsa Regression
294480
Due Date and Weight
• Submit via LMS by 2 pm AEST on 18 October 2024
• See the subject syllabus on the LMS for penalties for late assignments.
• This assignment is worth 5% of your nal mark in ECOM20001.
• There are 40 marks in total.
What You Must Submit via the LMS
• Assignment answers are, at most, 8 A4 pages with 12-point font, including
tables and graphs. 5 marks will be deducted if answers exceed 8 A4 pages.
• The R code that generates your results. Please copy and paste your R code in
an Appendix at the end of your assignment (e.g., in the .docx le) so markers
can view and test it. The R code does not count toward your 8-page answer
limit. You may alter and shrink the R code font to less than a 12-point so that it
is easier to read. 2 marks will be deducted if you do not include your R code.
fi
fi
1
fi

Assignment 3
Additional Instructions
• You may submit this assignment in groups of up to 3. Students in a group may
come from di erent tutorials.
• You must complete the assignment in no more than 8 A4 pages with 12-point
Arial, Times New Roman, Helvetica, Cambria or Calibri font. The assignment
cover page does not count toward the 8 A4 page limit.
• To save time, you may copy RStudio output directly into your answers in
reporting empirical results. You are also free to create better-formatted tables
based on your RStudio output, which is good practice in learning how to
present empirical results.
• Figures may also be copied and pasted directly into your assignment answers.
They may be scaled down in size to meet the 8-page limit, but please ensure
that your gures are readable. If they are not, marks will be deducted.
• Marks will be deducted if interpretations of results are incorrect, imprecise,
unclear, or not well-scaled. Similarly, marks will be deducted if gures or tables
are incorrect, unclear, poorly labeled, well-scaled, or missing legends.
• When in doubt, work with three digits past the decimal throughout.
• This R code in the Appendix at the end of your assignment (as discussed on
the previous page) must be commented on and easy for the subject tutors to
follow. If the code is poorly commented and hard to follow, marks will be
deducted. Commenting and code clarity must be at the level of tutorial code, or
marks will be deducted.
• Students with a genuine reason for not being able to submit the assignment on
time can apply for special consideration to have the assignment mark
transferred to the exam at the following link:
• https://students.unimelb.edu.au/admin/special/
fi
ff
2
fi

Assignment 3
Getting Started
Please create an Assignment3 folder on your computer, go to the LMS site for
ECOM 20001, and download the following data le into the Assignment1 folder:
• as3_cars.csv
This raw dataset has many variables, but the key variables for the assignment are:
• co2: car CO2 emissions metric, per the WLTP laboratory test
• enghp: engine horsepower in terms of PferdStarke (“PS”)
• engcap: engine capacity (combustion chamber size) in terms of cubic cm
• manualorautomatic: string for “manual” or “automatic”
• powertrain: string containing the type of power a car has
• fueltype: string containing the type fuel a car uses
• manufacturer: string containing the name of the car’s manufacturer
• noiseleveldba: car noise level in terms of decibels
Data summary
This dataset is obtained from the public UK Government data les on car CO2
emissions from their Vehicle Certi cation Agency website. In total, it contains
information for 4,723 unique cars from 43 di erent car manufacturers from around
the world.
About the Assignment
We can use these o cial government data to examine how CO2 emissions vary
with di erent characteristics of a vehicle as well as a potentially non-linear
relationship between CO2 emissions and a car’s engine power. Further, we can
examine whether the relationship di ers across di erent car manufacturer countries
of origin.
Use a 5% signi cance level throughout the entire assignment and assume that
saying a coe cient is “statistically signi cant” is equivalent to saying it is
“statistically signi cantly di erent from 0.”
fi
ff
fi
ff
fi
ff
fi
ff
ffi
fi
fi
ffi
3
ff

Assignment 3
Questions
1. (2 marks) Load the raw data and naming the data frame rawdata. Using the
ggplot() command in R, produce a scatter plot with co2 on the vertical axis and
enghp on the horizontal axis. What looks weird/o with some of the co2 and/or
enghp data points in the raw dataset?
Construct a binary variable called zerovals that equals one if a data point is
weird/o because of the co2 and/or enghp value and zero otherwise. How many
data points have zerovals==1?
Using zeroval and the subset() command in R create a new data frame called
mydata which includes a subsample that only contains observations for which
zeroval==0. In constructing mydata, keep/select all the 8 key variables listed
on the previous page.1
Lastly, in mydata, rescale enghp so it is in terms of 100s of PS. Also, construct
log_co2 = log(co2).
Please use mydata for Questions 2-10 in what follows.
2. (2 marks) Using the ggplot() command in R, produce a scatter plot with log_co2
on the vertical axis and enghp on the horizontal axis that also includes the line of
best t from a third-order polynomial with log_co2 dependent variable and as
the enghp independent variable. Visually, does there appear to be a non-linear
relationship? If so, brie y describe it in words.
3. (3 marks) Use sequential hypothesis testing to formally determine the
appropriate polynomial order for modelling the relationship between log_co2
and enghp. Using stargazer(), report results from estimating cubic, quadratic and
linear regression models with log_co2 as the dependent variable and
(potentially) nonlinear functions of enghp as the regressors (i.e., there are no
other controls in the model).2 For each step of the sequential hypothesis testing
algorithm, state the null and alternative hypothesis for testing a particular
polynomial order, report the outcome of the test, and state whether you should
1 Hint: After
nishing Question 1, the data frames should have the following dimensions:
rawdata
will
have 3901 observations and 45 variables
•
• mydata will have 3739 observations and 8 variables.
2 Note that you may have to construct additional variables from mydata to undertake
sequential hypothesis testing starting from a cubic model.
ff
fl
fi
ff
4
fi

Assignment 3
move to the next order or stop. Assume heteroskedasticity-consistent standard
errors in conducting your tests, as well as for the remainder of the assignment.3
4. (8 marks) Construct 3 sets of dummy variables based on each of the unique
string values within manualorautomatic (2 unique values/dummies), powertrain
(4 unique values/dummies) and fueltype (4 unique values/dummies).
Create another set of dummy variables based on manufacturer. In particular,
construct 4 separate dummies based on whether a manufacturer’s region of
origin is either Asia, Europe, UK, or USA.4
With your four sets of dummy variables, construct a table in stargazer(), with 4
columns reporting regression results with log_co2 as the dependent variable:
• Column (1): regressors are the set of manualorautomatic dummies, where
“Automatic” cars are the base group. Interpret the statistical signi cance and
magnitude of the coe cient for “Manual” cars.
• Column (2): regressors are the set of powertrain dummies, where “Internal
Combustion Engine (ICE)” cars are the base group. Interpret the statistical
signi cance and magnitude of the coe cient for “Hybrid Electric Vehicle
(HEV)” cars.
• Column (3): regressors are the set of fueltype dummies, where “Petrol” cars are
the base group. Interpret the statistical signi cance and magnitude of the
coe cient for “Petrol Electric” cars.
• Column (4): regressors are the set of origin dummies, where “Asian
Manufacturer” cars are the base group. Interpret the statistical signi cance and
magnitude of the coe cient for “European” cars.
5. (4 marks) Based on your sequential hypothesis test results from Question 3,
construct another table in stargazer(), with 4 columns reporting di erent
regression results with log_co2 as the dependent variable:
• Column (1): reproduce the model estimation results for the chosen polynomial
model for log_co2 as a (potentially non-linear) function of enghp from Question
3, including the constant.
3 Report the output from stargazer() produced by R with heteroskedasticity-consistent
standard errors but do not worry about having to update the overall regression F-statistic in
the output to account for heteroskedasticity unless otherwise asked to do so in a question.
4 Hint: you
nd a given manufacturer’s country of origin using a Google search for
“<manufactuer name> country of origin.” You classify 9 Asian, 13 European, 11 UK, and 4
American manufacturers.
fi
fi
ff
fi
ffi
ffi
ffi
fi
fi
5
ffi

Assignment 3
• Column (2): same regression model as column (1) but also includes the set of
manual and powertrain dummy variables from Question 4.
• Column (3): same regression model as column (2) but also includes the
fueltype dummy variables from Question 4.
• Column (4): same regression model as column (3) but also includes the region
of manufacturer origin dummy variables from Question 4.
6. (3 marks) Based on the results in the Column (4) model in Question 5, state the
null and alternative hypothesis of an appropriate test of whether a nonlinear
relationship exists between log_co2 and enghp. Report the outcome of the test
in terms of statistical signi cance, along with the relevant test statistic and its
sampling distribution assuming a large random sample.
7. (7 marks) Based on the results in the Column (4) model in Question 5, report the
estimated partial e ects on log_co2 and their standard errors and 95%
con dence intervals from increasing enghp from 150 to 200 and increasing
enghp from 400 to 450, holding all other factors xed.5 Interpret the partial e ect
estimates from the respective changes in enghp on log_co2. Are your ndings
foreshadowed by results from any previous questions?
8. (3 marks) Construct a new variable, log_noise = log(noiseleveldba). Construct a
table in stargazer(), with 2 columns reporting di erent regression results with
log_co2 as the dependent variable:
• Column (1): includes a Constant, log_noise, and no other controls. Interpret the
statistical signi cance and magnitude of the coe cient on log_noise.
• Column (2): includes as controls the 4 sets of dummy variables you created in
Question 4 for manualorautomatic, powertrain, fueltype, and manufacturer
origin. Interpret the statistical signi cance and magnitude of the coe cient on
log_noise. Also, comment on the degree to which the coe cient magnitude
changes much at all between Columns (1) and (2) and hence omitted variable
bias in the Column (1) estimate.
9. (3 marks) Creating new regressors with appropriate interactions, modify your
regression in Column (2) from question 8 to allow the elasticity of co2 with
respect to noise to di er, depending on whether a car manufacturer’s region is
Asian, European, UK, or American.6 Report your question 9 coe cient estimates
5 That is, you should report a separate partial e
ect estimate, standard error, and
con dence interval for each of these respective changes in enghp.
6 Formulate the regression such that the coe
cient on log_noise (e.g., without any
interactions) corresponds to the elasticity of co2 with respect to noise among Asian cars.
ff
fi
ffi
ffi
ffi
fi
ffi
ff
ff
ffi
fi
fi
ff
ff
fi
fi
6
fi

Assignment 3
in Column (3) along with the Column (1) and (2) estimates from question 8 in the
same stargazer() table, reporting all coe cient estimates involving regressors
with either co2 or noise and all other control variables and the constant. Based
on the point estimates, what is the magnitude of the elasticities of co2 with
respect to noise for Asian, European, UK, and American cars?
10. (3 marks) Based on the results in the Column (3) model in Question 9, state the
null and alternative hypothesis of an appropriate test of whether the elasticity of
co2 with respect to noise is the same for the 4 region values for European, UK,
and American cars. Report the outcome of the test along with the relevant test
statistic and its sampling distribution.
11. (2 marks) R-code: we will review and mark your R code as follows:
• 2/2 if the R code is correct and organised and commented like the solution
code for the assignment.
• 1/2 if the R code is correct but hard to follow or not well commented.
• 0/2 if the R code is incorrect and/or a complete mess or not submitted.
7
ffi

Assignment 3
Download