THE CHINESE UNIVERSITY OF HONG KONG, SHENZHEN
2022 - 2023 TERM 2
ECO 3121 Introductory Econometrics
ASSIGNMENT 1 ANSWERS

TOPIC: Simple linear regression model.

INSTRUCTIONS:
• Please label each answer clearly with the appropriate question number and letter. Securely staple all answer sheets together, and make certain that your name(s) and student number(s) are printed clearly at the top of each answer sheet.
• Please use STATA to do Question 1, and report your STATA commands and results together with your answers to the questions.
• Hand-written answers must be legible. Illegible assignments will be returned unmarked.
• Please combine your answers with supporting documents into one Adobe PDF file and submit.

DUE DATE: 5PM Friday February 24, 2023
Please submit your work on Blackboard. Late submissions will receive a 0 with no excuses.

MARKING: Marks for each question are indicated in parentheses. Total marks for the assignment equal 90. Marks are given for both content and presentation.

Question 1 (25 marks)

Data file: 3121A1.dta (or 3121A1.csv)

Data Description: A random sample of 436 employees drawn from the 1976 U.S. population of all employed paid workers.

Variable Definitions:
$wage_i$ = average hourly earnings of worker i in 1976, in dollars per hour.
$educ_i$ = years of formal education completed by worker i, in years.
$female_i$ = an indicator variable equal to 1 if worker i is female, and 0 if worker i is male.

(5 marks)
1. Compile a table of descriptive summary statistics for the sample data. The table should include for each of the variables in the dataset: the sample mean, the sample standard deviation, the minimum sample value, and the maximum sample value. How many females and how many males are there in the sample?

(1 mark) per column in table, except Obs.

. sum wage educ female

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        wage |       436    6.051216    3.795647        .53         25
        educ |       436    12.67202    2.660956          0         18
      female |       436    .4380734    .4967202          0          1

. tab1 female, missing

-> tabulation of female

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        245       56.19       56.19
          1 |        191       43.81      100.00
------------+-----------------------------------
      Total |        436      100.00

Number of females in the sample = 191 (0.5 mark)
Number of males in the sample = 245 (0.5 mark)

(20 marks)
2. Compute and present OLS estimates of the following population regression equation for the full sample of 436 paid workers:

$$wage_i = \beta_0 + \beta_1 educ_i + u_i \qquad (1)$$

where $u_i$ is a random error term that is assumed to satisfy all the assumptions of the classical linear regression model.

(5 marks)
a) Report the OLS coefficient estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ computed by estimating population regression equation (1).

. reg wage educ

      Source |       SS       df       MS              Number of obs =     436
-------------+------------------------------           F(  1,   434) =   88.48
       Model |  1061.27825     1  1061.27825           Prob > F      =  0.0000
    Residual |    5205.739   434  11.9947903           R-squared     =  0.1693
-------------+------------------------------           Adj R-squared =  0.1674
       Total |  6267.01726   435  14.4069362           Root MSE      =  3.4633

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5869922   .0624042     9.41   0.000     .4643401    .7096443
       _cons |   -1.38716    .807995    -1.72   0.087     -2.97523    .2009096
------------------------------------------------------------------------------

$\hat{\beta}_0 = -1.38716$ (2.5 marks)
$\hat{\beta}_1 = 0.5869922$ (2.5 marks)

(5 marks)
b) Interpret the value of the slope coefficient estimate $\hat{\beta}_1$; i.e., explain in words what the numerical value of $\hat{\beta}_1$ means. (Answer must not be just a generic description of the slope coefficient estimate; it must explicitly account for the units in which wage and educ are measured.)

wage is measured in dollars per hour; educ is measured in years. Therefore, the estimate $\hat{\beta}_1 = 0.5870$ means that a 1-year increase in education is associated, on average, with an increase in hourly earnings of 0.5870 dollars per hour. (5 marks)

(5 marks)
c) Interpret the value of the intercept coefficient estimate $\hat{\beta}_0$; i.e., explain in words what the numerical value of $\hat{\beta}_0$ means.

The estimate $\hat{\beta}_0 = -1.3872$ means that the estimated average (mean) hourly wage rate of workers with zero years of education (educ = 0) equals −1.3872 dollars per hour.
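The interpretations in parts b) and c) can be checked numerically by evaluating the fitted line directly. The Python sketch below is only an illustration and is not part of the required STATA work: the coefficients are copied from the regression output above, and `fitted_wage` is a hypothetical helper name introduced here.

```python
# Coefficients copied from the STATA output for regression equation (1).
BETA0_HAT = -1.38716    # intercept estimate, in dollars per hour
BETA1_HAT = 0.5869922   # slope estimate, in dollars per hour per year of education


def fitted_wage(educ: float) -> float:
    """Estimated mean hourly wage (dollars per hour) at a given education level.

    Hypothetical helper for illustration: it only evaluates the fitted line.
    """
    return BETA0_HAT + BETA1_HAT * educ


# Intercept interpretation: the fitted value at educ = 0.
wage_at_zero = fitted_wage(0)

# Slope interpretation: change in the fitted value for one more year of education.
one_year_change = fitted_wage(13) - fitted_wage(12)

print(f"fitted wage at educ = 0: {wage_at_zero:.4f} dollars per hour")
print(f"change per extra year:   {one_year_change:.4f} dollars per hour")
```

Because the fitted function is linear, the one-year change is the same whichever starting level of education is used; 12 and 13 years are arbitrary choices.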
(5 marks)

(5 marks)
d) On a set of appropriately labeled coordinate axes, draw the estimated sample regression function implied by OLS estimation of regression equation (1). That is, draw the graph of the equation $\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i$, compute the coordinates of the two points on it that correspond to the values 12 and 16 of $educ_i$, and label these two points on your graph as A and B respectively. (Note: you do not need to use STATA, or any software program, to draw and label this graph.)

The two points have the following coordinates:

Point A: For $educ_i = 12$ years, the estimated mean of average hourly earnings equals:
$\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i = -1.3872 + 0.5870(12) = 5.6568$ dollars per hour, or about 5.66 dollars per hour (1 mark)

Point B: For $educ_i = 16$ years, the estimated mean of average hourly earnings equals:
$\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i = -1.3872 + 0.5870(16) = 8.0048$ dollars per hour, or about 8.00 dollars per hour (1 mark)

Figure 1: Line graph of $\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i = -1.3872 + 0.5870\, educ_i$
(3 marks total: 2 marks for correct line graph; 1 mark for labeling points A and B)
[Figure 1: fitted regression line with points A and B labeled; horizontal axis: educ = years of education.]

Question 2 (35 marks)

A researcher is using data for a sample of 88 houses sold in an urban area during a recent year to investigate the relationship between house prices $y_i$ (measured in thousands of dollars) and house size $x_i$ (measured in square meters). Preliminary analysis of the sample data produces the following sample information:

$n$ = 88
$\sum_{i=1}^{n} y_i$ = 25,832.05
$\sum_{i=1}^{n} x_i$ = 16,462.34
$\sum_{i=1}^{n} x_i^2$ = 3,329,789.6
$\sum_{i=1}^{n} y_i^2$ = 8,500,750.69
$\sum_{i=1}^{n} x_i y_i$ = 5,209,990.7
$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ = 377,534.76
$\sum_{i=1}^{n} (y_i - \bar{y})^2$ = 917,854.51
$\sum_{i=1}^{n} (x_i - \bar{x})^2$ = 250,144.32
$\sum_{i=1}^{n} \hat{u}_i^2$ = 348,053.43

Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.
(12 marks)
(a) Use the above information to compute OLS estimates of the intercept coefficient $\beta_0$ and the slope coefficient $\beta_1$.

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{377{,}534.76}{250{,}144.32} = 1.509268 \approx 1.5093$$
(6 marks)

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{25{,}832.05}{88} = 293.546 \quad \text{and} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16{,}462.34}{88} = 187.072$$

Therefore

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 293.546 - 1.509268 \times 187.072 = 293.546 - 282.342 = 11.204$$
(6 marks)

(5 marks)
(b) Interpret the slope coefficient estimate you calculated in part (a), i.e., explain what the numeric value you calculated for $\hat{\beta}_1$ means.

Note: $\hat{\beta}_1 = 1.5093$. $y_i$ is measured in thousands of dollars, and $x_i$ is measured in square meters.

The estimate 1.5093 of $\hat{\beta}_1$ means that an increase (decrease) in house size of 1 square meter is associated on average with an increase (decrease) in house price of 1.5093 thousand dollars, or 1,509.3 dollars.

(6 marks)
(c) Calculate an estimate of $\sigma^2$, the error variance.

$$\hat{\sigma}^2 = \frac{RSS}{n-2} = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n-2} = \frac{348{,}053.43}{88-2} = 4{,}047.133$$

(6 marks)
(d) Compute the value of $R^2$, the coefficient of determination for the estimated OLS sample regression equation. Briefly explain what the calculated value of $R^2$ means.

Here SST is the total sum of squares, SSR is the sum of squared residuals (the RSS of part (c)), and SSE is the explained sum of squares.

$$SSE = SST - SSR = \sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} \hat{u}_i^2 = 917{,}854.51 - 348{,}053.43 = 569{,}801.08$$

$$R^2 = \frac{SSE}{SST} = \frac{569{,}801.08}{917{,}854.51} = 0.6208$$
(4 marks)

Interpretation of $R^2 = 0.6208$: The value of 0.6208 indicates that 62.08 percent of the total sample variation in house prices is attributable to, or explained by, the estimated regression on house size. (2 marks)

(6 marks)
(e) What are the values of $\sum_{i=1}^{n} \hat{u}_i$ and $\sum_{i=1}^{n} x_i \hat{u}_i$ for the sample regression equation you have estimated? Explain briefly how you obtained your answer.

$\sum_{i=1}^{n} \hat{u}_i = 0$ (2 marks)
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$ (2 marks)

These computational properties of the OLS sample regression equation follow from the first-order conditions for the OLS coefficient estimators.
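The calculations in parts (a), (c) and (d) can be reproduced from the sample information alone. The Python sketch below is offered only as a cross-check of the arithmetic: the variable names are illustrative assumptions, and the figures are those given in the question.

```python
# Sample information given in Question 2 (variable names are illustrative).
n = 88
sum_y = 25_832.05
sum_x = 16_462.34
s_xy = 377_534.76   # sum of (x_i - x_bar) * (y_i - y_bar)
s_yy = 917_854.51   # sum of (y_i - y_bar)^2, the total sum of squares
s_xx = 250_144.32   # sum of (x_i - x_bar)^2
rss = 348_053.43    # sum of squared OLS residuals

# (a) OLS slope and intercept estimates.
beta1_hat = s_xy / s_xx
y_bar = sum_y / n
x_bar = sum_x / n
beta0_hat = y_bar - beta1_hat * x_bar

# (c) Estimate of the error variance: residual sum of squares over (n - 2).
sigma2_hat = rss / (n - 2)

# (d) R-squared: explained sum of squares over total sum of squares.
explained_ss = s_yy - rss
r_squared = explained_ss / s_yy

print(f"beta1_hat  = {beta1_hat:.4f}")
print(f"beta0_hat  = {beta0_hat:.3f}")
print(f"sigma2_hat = {sigma2_hat:.3f}")
print(f"R-squared  = {r_squared:.4f}")
```

The intercept is computed here from unrounded intermediate values, so it can differ from the hand calculation in later decimal places.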
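(2 marks)

Question 3 (30 marks)

Derive the Ordinary Least Squares (OLS) estimates for the simple linear regression model, i.e., $\hat{\beta}_0$ and $\hat{\beta}_1$. Be very specific.

Deriving the OLS estimates

The OLS estimates minimize the residual sum of squares (RSS) function

$$RSS(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

The first-order conditions (FOCs) for a minimum of the RSS function are obtained by setting the partial derivatives equal to zero:

$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0, \qquad \frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

Dividing both conditions by $-2n$, we get:

$$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (1)$$

$$\frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2)$$

To solve the equations, pass the summation operator through equation (1):

$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$$

So

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

and plug this into equation (2) (and drop the division by n):

$$\sum_{i=1}^{n} x_i \left[ y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i \right] = 0$$

Simple algebra gives

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$$

If we can write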