ECON 333 D100 Statistical Analysis of Economic Data, Summer 2020
Lige Zhang, SID: 301291969
Assignment 01, due by Friday, June 5, 11:59 PM

1. An econometric model of the wage of a worker in a particular field is built as a function of the work experience, education, and gender of that worker:

$W_i = \beta_0 + \beta_1 E_i + \beta_2 P_i + \beta_3 G_i + \varepsilon_i$ (*)

where:
$W_i$ = the wage of the ith worker
$E_i$ = the years of work experience of the ith worker
$P_i$ = the years of post-secondary education of the ith worker
$G_i$ = the gender of the ith worker (1 for male and 0 for female)
$\varepsilon_i$ = the stochastic error for the ith worker

a. What is the real-world meaning of each of the coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$?

$\beta_0$ is the intercept: the expected wage of a female worker ($G_i = 0$) with zero years of work experience and zero years of post-secondary education.
$\beta_1$ is the slope coefficient that indicates how much the expected wage changes when work experience increases by one year, holding education and gender constant.
$\beta_2$ is the slope coefficient that indicates how much the expected wage changes when post-secondary education increases by one year, holding work experience and gender constant.
$\beta_3$ is the difference in expected wage between a male worker and a female worker with the same work experience and education.

b. Suppose that you wanted to add a variable to this equation to measure whether there might be discrimination against visible minorities. How would you define such a variable? Be specific.

We can define $M_i$ = the visible-minority status of the ith worker (1 if the worker belongs to a visible minority and 0 otherwise) and add the term $\beta_4 M_i$ to the equation. $\beta_4$ is then the difference in expected wage between visible-minority workers and other workers, holding work experience, education, and gender constant. A negative $\beta_4$ would be consistent with discrimination.

c. Suppose that you had the opportunity to add another variable to the equation. Which of the following possibilities would seem best? Explain your answer.

i. the age of the ith worker
Age can be a reasonable variable because some employers care about how long a worker can work before retirement.
ii. the number of jobs in this field
This would not be a good variable because the wage received by an employed worker would not be affected by how many jobs there are in the field.
iii. the average wage in this field
This would be relevant because most employers, when they post a job, refer to how much others pay for the same position.
iv. the number of "employee of the month" awards won by the ith worker
No, because not many people have this kind of award, and the standards for such awards differ from company to company.
v. the number of children of the ith worker
This is more relevant to disposable income than to wages.

Although age can be a reasonable variable, most applicants for similar jobs are likely to be of a similar age (younger people apply for entry-level jobs, older people apply for senior positions), so age may not be the best option. Therefore, I would say the average wage in the field seems best as an additional variable in the equation.

2. Let us say you have obtained a dataset that includes information about grape production in Okanagan Valley with some additional information about fertilizer use and local weather. You run an OLS regression of annual grape yields on fertilizer amounts and rainfall. The results are:

$\hat{Y}_t = -114 + 0.12 F_t + 6.21 R_t$

where:
$Y_t$ = the grape yield (kilograms/acre) in year t
$F_t$ = fertilizer intensity (pounds/acre) in year t
$R_t$ = rainfall (cm) in year t

a. Carefully interpret the meaning of the coefficients of 0.12 and 6.21 in terms of impact of F and R on Y.

Coefficient 0.12: holding rainfall constant, each additional pound/acre of fertilizer is associated with an increase of 0.12 kilograms/acre in the estimated grape yield in year t.
Coefficient 6.21: holding fertilizer intensity constant, each additional cm of rainfall is associated with an increase of 6.21 kilograms/acre in the estimated grape yield in year t.

b. Use the estimates to determine the yield in years with 200, 500 and 800 inches of rainfall, given that the fertilizer intensity was always 120 pounds per acre.
Treating the rainfall values as centimetres, the unit of $R_t$ in the estimated equation:
R = 200: $\hat{Y}_t = -114 + 0.12(120) + 6.21(200) = 1142.4$ kilograms/acre
R = 500: $\hat{Y}_t = -114 + 0.12(120) + 6.21(500) = 3005.4$ kilograms/acre
R = 800: $\hat{Y}_t = -114 + 0.12(120) + 6.21(800) = 4868.4$ kilograms/acre

c. Does the constant term of –114 mean that negative amounts of grapes are possible? If not, what is the meaning of that estimate?

No. The constant term of -114 is the intercept of the regression: the predicted yield when both fertilizer intensity and rainfall are zero. Such a year is presumably outside the range of the observed data, so the estimate should not be read as an actual grape yield. It mainly positions the regression line so that it fits the observed data.

d. The grape yield is in unconventional kilograms/acre because it makes for convenient numbers in the regression. If the same data were recorded with the grape yield in more conventional tons/acre, how would the results of your OLS regression look like (write the equation with the new numbers for the betas). Explain.

Only the dependent variable changes units. Assuming metric tons (1 ton = 1000 kilograms), every yield value is divided by 1000, so the intercept and both slope coefficients are divided by 1000 as well. Fertilizer intensity is still measured in pounds/acre and rainfall in cm, so $F_t$ and $R_t$ need no conversion. The equation becomes:

$\hat{Y}_t = -0.114 + 0.00012 F_t + 0.00621 R_t$

3. Use the MS Excel file roadtrips.xlsx to complete parts (a) to (c) below. In this file, miles is the total distance traveled by a family over a year (in miles), income is the family's annual income in $1000's, age is average age of adult members of family (in years), kids is the number of children in the family.

a. Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation, the R² and the adjusted R². Attach your R code.

$\widehat{\text{miles}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{income}_i$ (**)

$\hat{\beta}_0 = 48.94$, $\hat{\beta}_1 = 15.73$, so $\widehat{\text{miles}}_i = 48.94 + 15.73\,\text{income}_i$
R-squared: 0.272, Adjusted R-squared: 0.2684

library(readxl)  # assumes the readxl package is installed
roadtrips <- read_excel("roadtrips.xlsx")  # assumes the file is in the working directory
linear_model <- lm(miles ~ income, data = roadtrips)
summary(linear_model)

b. Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation, the R² and the adjusted R². Attach your R code.
$\widehat{\text{miles}}_i = \hat{\beta}_0 + \hat{\beta}_2 \text{age}_i$ (***)

$\hat{\beta}_0 = 265.07$, $\hat{\beta}_2 = 18.49$, so $\widehat{\text{miles}}_i = 265.07 + 18.49\,\text{age}_i$
R-squared: 0.09781, Adjusted R-squared: 0.09325

linear_model <- lm(miles ~ age, data = roadtrips)
summary(linear_model)

c. Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation, the R² and the adjusted R². Attach your R code.

$\widehat{\text{miles}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{income}_i + \hat{\beta}_2 \text{age}_i$ (****)

Coefficients: (Intercept) -365.60, income 14.30, age 11.86
R-squared: 0.31, Adjusted R-squared: 0.303

linear_model <- lm(miles ~ income + age, data = roadtrips)
linear_model
summary(linear_model)

a. Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation, the R² and the adjusted R². Attach your R code.

$\widehat{\text{miles}}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{income}_i + \hat{\beta}_2 \text{age}_i + \hat{\beta}_3 \text{kids}_i$ (*****)

Coefficients: (Intercept) -391.55, income 14.20, age 15.74, kids -81.83
R-squared: 0.3406, Adjusted R-squared: 0.3305

linear_model <- lm(miles ~ income + age + kids, data = roadtrips)
linear_model
summary(linear_model)  # needed to report R-squared and adjusted R-squared

b. Which of the models above [(**), (***), (****), or (*****)] would you say is the best of the four? Explain why.

The last one, (*****), would be the best. It takes the most variables into consideration, and it has the highest adjusted R-squared of the four (0.3305, compared with 0.2684, 0.09325, and 0.303). Because the adjusted R-squared penalizes additional regressors, the better fit is not just a mechanical result of adding variables.

c. Explain, in your own words, why the estimated slope coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ are not same in equation (****) as they are in equations (**) and (***).

(****): $\widehat{\text{miles}}_i = -365.6 + 14.3\,\text{income}_i + 11.86\,\text{age}_i$

Income and age are both related to miles, and they are also related to each other. When only one of them is in the equation, its coefficient picks up part of the effect of the omitted variable. When both are included, each coefficient measures the effect of its own variable while holding the other constant, so both estimates become smaller: the income coefficient falls from 15.73 to 14.30 and the age coefficient falls from 18.49 to 11.86.
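The reasoning in the last answer can be illustrated with simulated data. The following is a minimal sketch and does not use the roadtrips data: the sample size, the seed, and the "true" coefficients (15 for income, 10 for age) are made-up assumptions, chosen only so that income and age are positively correlated and both raise miles. Under those assumptions, each one-variable regression should give a larger slope than the two-variable regression.

```r
# Illustrative simulation only: every number below is an assumption,
# not an estimate from roadtrips.xlsx.
set.seed(333)
n <- 1000

# age and income are constructed to be positively correlated
age    <- rnorm(n, mean = 45, sd = 10)
income <- 20 + 1.0 * age + rnorm(n, mean = 0, sd = 15)

# assumed "true" model: both regressors raise miles
miles <- -300 + 15 * income + 10 * age + rnorm(n, mean = 0, sd = 200)

sim <- data.frame(miles, income, age)

short_income <- lm(miles ~ income, data = sim)        # omits age
short_age    <- lm(miles ~ age, data = sim)           # omits income
long_model   <- lm(miles ~ income + age, data = sim)  # includes both

# Each short regression absorbs part of the omitted variable's effect,
# so its slope is larger than the matching slope in the long regression.
slopes <- rbind(
  short = c(income = unname(coef(short_income)["income"]),
            age    = unname(coef(short_age)["age"])),
  long  = coef(long_model)[c("income", "age")]
)
print(round(slopes, 2))

# In-sample identity: short slope = long slope + (long slope of the
# omitted variable) * (slope from regressing the omitted variable on
# the included one).
delta   <- coef(lm(age ~ income, data = sim))["income"]
implied <- coef(long_model)["income"] + coef(long_model)["age"] * delta
print(all.equal(unname(implied), unname(coef(short_income)["income"])))
```

The identity checked at the end holds for any OLS fit with an intercept, which is why the one-variable and two-variable slopes differ whenever the two regressors are correlated and the omitted one matters.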