ECOM20001: Econometrics 1 Assignment 3 Student Information To receive an assignment grade, you must ll out the information in this table and include this table on the front cover page for your assignment. Only students whose names and student ID numbers are included on the cover page will receive marks for the assignment. Groups of up to 3 students are allowed. Name Student ID Number Sally Probability 422552 Xiaosong Statistics 653223 Ipsa Regression 294480 Due Date and Weight • Submit via LMS by 2 pm AEST on 18 October 2024 • See the subject syllabus on the LMS for penalties for late assignments. • This assignment is worth 5% of your nal mark in ECOM20001. • There are 40 marks in total. What You Must Submit via the LMS • Assignment answers are, at most, 8 A4 pages with 12-point font, including tables and graphs. 5 marks will be deducted if answers exceed 8 A4 pages. • The R code that generates your results. Please copy and paste your R code in an Appendix at the end of your assignment (e.g., in the .docx le) so markers can view and test it. The R code does not count toward your 8-page answer limit. You may alter and shrink the R code font to less than a 12-point so that it is easier to read. 2 marks will be deducted if you do not include your R code. fi fi 1 fi  Assignment 3 Additional Instructions • You may submit this assignment in groups of up to 3. Students in a group may come from di erent tutorials. • You must complete the assignment in no more than 8 A4 pages with 12-point Arial, Times New Roman, Helvetica, Cambria or Calibri font. The assignment cover page does not count toward the 8 A4 page limit. • To save time, you may copy RStudio output directly into your answers in reporting empirical results. You are also free to create better-formatted tables based on your RStudio output, which is good practice in learning how to present empirical results. • Figures may also be copied and pasted directly into your assignment answers. They may be scaled down in size to meet the 8-page limit, but please ensure that your gures are readable. If they are not, marks will be deducted. • Marks will be deducted if interpretations of results are incorrect, imprecise, unclear, or not well-scaled. Similarly, marks will be deducted if gures or tables are incorrect, unclear, poorly labeled, well-scaled, or missing legends. • When in doubt, work with three digits past the decimal throughout. • This R code in the Appendix at the end of your assignment (as discussed on the previous page) must be commented on and easy for the subject tutors to follow. If the code is poorly commented and hard to follow, marks will be deducted. Commenting and code clarity must be at the level of tutorial code, or marks will be deducted. • Students with a genuine reason for not being able to submit the assignment on time can apply for special consideration to have the assignment mark transferred to the exam at the following link: • https://students.unimelb.edu.au/admin/special/ fi ff 2 fi  Assignment 3 Getting Started Please create an Assignment3 folder on your computer, go to the LMS site for ECOM 20001, and download the following data le into the Assignment1 folder: • as3_cars.csv This raw dataset has many variables, but the key variables for the assignment are: • co2: car CO2 emissions metric, per the WLTP laboratory test • enghp: engine horsepower in terms of PferdStarke (“PS”) • engcap: engine capacity (combustion chamber size) in terms of cubic cm • manualorautomatic: string for “manual” or “automatic” • powertrain: string containing the type of power a car has • fueltype: string containing the type fuel a car uses • manufacturer: string containing the name of the car’s manufacturer • noiseleveldba: car noise level in terms of decibels Data summary This dataset is obtained from the public UK Government data les on car CO2 emissions from their Vehicle Certi cation Agency website. In total, it contains information for 4,723 unique cars from 43 di erent car manufacturers from around the world. About the Assignment We can use these o cial government data to examine how CO2 emissions vary with di erent characteristics of a vehicle as well as a potentially non-linear relationship between CO2 emissions and a car’s engine power. Further, we can examine whether the relationship di ers across di erent car manufacturer countries of origin. Use a 5% signi cance level throughout the entire assignment and assume that saying a coe cient is “statistically signi cant” is equivalent to saying it is “statistically signi cantly di erent from 0.” fi ff fi ff fi ff fi ff ffi fi fi ffi 3 ff  Assignment 3 Questions 1. (2 marks) Load the raw data and naming the data frame rawdata. Using the ggplot() command in R, produce a scatter plot with co2 on the vertical axis and enghp on the horizontal axis. What looks weird/o with some of the co2 and/or enghp data points in the raw dataset? Construct a binary variable called zerovals that equals one if a data point is weird/o because of the co2 and/or enghp value and zero otherwise. How many data points have zerovals==1? Using zeroval and the subset() command in R create a new data frame called mydata which includes a subsample that only contains observations for which zeroval==0. In constructing mydata, keep/select all the 8 key variables listed on the previous page.1 Lastly, in mydata, rescale enghp so it is in terms of 100s of PS. Also, construct log_co2 = log(co2). Please use mydata for Questions 2-10 in what follows. 2. (2 marks) Using the ggplot() command in R, produce a scatter plot with log_co2 on the vertical axis and enghp on the horizontal axis that also includes the line of best t from a third-order polynomial with log_co2 dependent variable and as the enghp independent variable. Visually, does there appear to be a non-linear relationship? If so, brie y describe it in words. 3. (3 marks) Use sequential hypothesis testing to formally determine the appropriate polynomial order for modelling the relationship between log_co2 and enghp. Using stargazer(), report results from estimating cubic, quadratic and linear regression models with log_co2 as the dependent variable and (potentially) nonlinear functions of enghp as the regressors (i.e., there are no other controls in the model).2 For each step of the sequential hypothesis testing algorithm, state the null and alternative hypothesis for testing a particular polynomial order, report the outcome of the test, and state whether you should 1 Hint: After nishing Question 1, the data frames should have the following dimensions: rawdata will have 3901 observations and 45 variables • • mydata will have 3739 observations and 8 variables. 2 Note that you may have to construct additional variables from mydata to undertake sequential hypothesis testing starting from a cubic model. ff fl fi ff 4 fi  Assignment 3 move to the next order or stop. Assume heteroskedasticity-consistent standard errors in conducting your tests, as well as for the remainder of the assignment.3 4. (8 marks) Construct 3 sets of dummy variables based on each of the unique string values within manualorautomatic (2 unique values/dummies), powertrain (4 unique values/dummies) and fueltype (4 unique values/dummies). Create another set of dummy variables based on manufacturer. In particular, construct 4 separate dummies based on whether a manufacturer’s region of origin is either Asia, Europe, UK, or USA.4 With your four sets of dummy variables, construct a table in stargazer(), with 4 columns reporting regression results with log_co2 as the dependent variable: • Column (1): regressors are the set of manualorautomatic dummies, where “Automatic” cars are the base group. Interpret the statistical signi cance and magnitude of the coe cient for “Manual” cars. • Column (2): regressors are the set of powertrain dummies, where “Internal Combustion Engine (ICE)” cars are the base group. Interpret the statistical signi cance and magnitude of the coe cient for “Hybrid Electric Vehicle (HEV)” cars. • Column (3): regressors are the set of fueltype dummies, where “Petrol” cars are the base group. Interpret the statistical signi cance and magnitude of the coe cient for “Petrol Electric” cars. • Column (4): regressors are the set of origin dummies, where “Asian Manufacturer” cars are the base group. Interpret the statistical signi cance and magnitude of the coe cient for “European” cars. 5. (4 marks) Based on your sequential hypothesis test results from Question 3, construct another table in stargazer(), with 4 columns reporting di erent regression results with log_co2 as the dependent variable: • Column (1): reproduce the model estimation results for the chosen polynomial model for log_co2 as a (potentially non-linear) function of enghp from Question 3, including the constant. 3 Report the output from stargazer() produced by R with heteroskedasticity-consistent standard errors but do not worry about having to update the overall regression F-statistic in the output to account for heteroskedasticity unless otherwise asked to do so in a question. 4 Hint: you nd a given manufacturer’s country of origin using a Google search for “<manufactuer name> country of origin.” You classify 9 Asian, 13 European, 11 UK, and 4 American manufacturers. fi fi ff fi ffi ffi ffi fi fi 5 ffi  Assignment 3 • Column (2): same regression model as column (1) but also includes the set of manual and powertrain dummy variables from Question 4. • Column (3): same regression model as column (2) but also includes the fueltype dummy variables from Question 4. • Column (4): same regression model as column (3) but also includes the region of manufacturer origin dummy variables from Question 4. 6. (3 marks) Based on the results in the Column (4) model in Question 5, state the null and alternative hypothesis of an appropriate test of whether a nonlinear relationship exists between log_co2 and enghp. Report the outcome of the test in terms of statistical signi cance, along with the relevant test statistic and its sampling distribution assuming a large random sample. 7. (7 marks) Based on the results in the Column (4) model in Question 5, report the estimated partial e ects on log_co2 and their standard errors and 95% con dence intervals from increasing enghp from 150 to 200 and increasing enghp from 400 to 450, holding all other factors xed.5 Interpret the partial e ect estimates from the respective changes in enghp on log_co2. Are your ndings foreshadowed by results from any previous questions? 8. (3 marks) Construct a new variable, log_noise = log(noiseleveldba). Construct a table in stargazer(), with 2 columns reporting di erent regression results with log_co2 as the dependent variable: • Column (1): includes a Constant, log_noise, and no other controls. Interpret the statistical signi cance and magnitude of the coe cient on log_noise. • Column (2): includes as controls the 4 sets of dummy variables you created in Question 4 for manualorautomatic, powertrain, fueltype, and manufacturer origin. Interpret the statistical signi cance and magnitude of the coe cient on log_noise. Also, comment on the degree to which the coe cient magnitude changes much at all between Columns (1) and (2) and hence omitted variable bias in the Column (1) estimate. 9. (3 marks) Creating new regressors with appropriate interactions, modify your regression in Column (2) from question 8 to allow the elasticity of co2 with respect to noise to di er, depending on whether a car manufacturer’s region is Asian, European, UK, or American.6 Report your question 9 coe cient estimates 5 That is, you should report a separate partial e ect estimate, standard error, and con dence interval for each of these respective changes in enghp. 6 Formulate the regression such that the coe cient on log_noise (e.g., without any interactions) corresponds to the elasticity of co2 with respect to noise among Asian cars. ff fi ffi ffi ffi fi ffi ff ff ffi fi fi ff ff fi fi 6 fi  Assignment 3 in Column (3) along with the Column (1) and (2) estimates from question 8 in the same stargazer() table, reporting all coe cient estimates involving regressors with either co2 or noise and all other control variables and the constant. Based on the point estimates, what is the magnitude of the elasticities of co2 with respect to noise for Asian, European, UK, and American cars? 10. (3 marks) Based on the results in the Column (3) model in Question 9, state the null and alternative hypothesis of an appropriate test of whether the elasticity of co2 with respect to noise is the same for the 4 region values for European, UK, and American cars. Report the outcome of the test along with the relevant test statistic and its sampling distribution. 11. (2 marks) R-code: we will review and mark your R code as follows: • 2/2 if the R code is correct and organised and commented like the solution code for the assignment. • 1/2 if the R code is correct but hard to follow or not well commented. • 0/2 if the R code is incorrect and/or a complete mess or not submitted. 7 ffi  Assignment 3