Uploaded by No Sei

Assignment 01 COMM 333

ECON 333 D100 ​Statistical Analysis of Economic Data
Summer 2020
Lige zhang
SID:301291969
Assignment 01
due by Friday, June 5, 11:59 PM
1. An econometric model of the wage of a worker in a particular field is built as a function of the work
experience, education, and gender of that worker:
$W_i = \beta_0 + \beta_1 E_i + \beta_2 P_i + \beta_3 G_i + \varepsilon_i$ (*)
where: $W_i$ = the wage of the $i$th worker
$E_i$ = the years of work experience of the $i$th worker
$P_i$ = the years of post-secondary education of the $i$th worker
$G_i$ = the gender of the $i$th worker (1 for male and 0 for female)
$\varepsilon_i$ = the stochastic error for the $i$th worker
a. What is the real-world meaning of each of the coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$?
$\beta_0$ is the intercept: the expected wage when all explanatory variables equal zero, that is,
for a female worker with no work experience and no post-secondary education.
$\beta_1$ is the slope coefficient that indicates how much the wage changes when work
experience increases by one year, holding the other variables constant.
$\beta_2$ is the slope coefficient that indicates how much the wage changes when
post-secondary education increases by one year, holding the other variables constant.
$\beta_3$ is the difference in expected wage between a male worker and a female worker,
holding the other variables constant.
b. Suppose that you wanted to add a variable to this equation to measure whether there might be
discrimination against visible minorities. How would you define such a variable? Be specific.
We can define $M_i$ = 1 if the $i$th worker is a member of a visible minority and 0 otherwise,
with coefficient $\beta_4$ measuring the difference in expected wage between visible-minority
workers and other workers, holding experience, education, and gender constant. A negative
$\beta_4$ would be consistent with discrimination against visible minorities.
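As a sketch, adding the dummy $M_i$ and its coefficient $\beta_4$ to equation (*) would give the following extended model (the notation simply follows the definitions above):

$$W_i = \beta_0 + \beta_1 E_i + \beta_2 P_i + \beta_3 G_i + \beta_4 M_i + \varepsilon_i$$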
c. Suppose that you had the opportunity to add another variable to the equation. Which of the
following possibilities would seem best? Explain your answer.
i. the age of the $i$th worker
Age can be a reasonable variable because some employers care about how long a
worker can work before retirement.
ii. the number of jobs in this field
This would not be a good variable because the wage received by an employed worker
would not be affected by how many jobs there are in the field.
iii. the average wage in this field
This would be relevant because, when posting a job, most employers refer to how much
others pay for the same position.
iv. the number of “employee of the month” awards won by the $i$th worker
No, because not many people would have this kind of award, and the standard for these
awards differs from company to company.
v. the number of children of the $i$th worker
This is more relevant to disposable income than to wages.
Although age can be a reasonable variable, most applicants for similar jobs are likely to be of
similar age (young people apply for entry-level jobs, older people apply for senior positions), so age
may not be the best option. Therefore, I would say the average wage in this field seems the best
additional variable for the equation.
2. Let us say you have obtained a dataset that includes information about grape production in
Okanagan Valley with some additional information about fertilizer use and local weather. You
run an OLS regression of annual grape yields on fertilizer amounts and rainfall. The results are:
$\hat{Y}_t = -114 + 0.12 F_t + 6.21 R_t$
where
$Y_t$ = the grape yield (kilograms/acre) in year $t$
$F_t$ = fertilizer intensity (pounds/acre) in year $t$
$R_t$ = rainfall (cm) in year $t$
a. Carefully interpret the meaning of the coefficients of 0.12 and 6.21 in terms of impact of
$F$ and $R$ on $Y$.
Coefficient 0.12: Each additional pound/acre of fertilizer is estimated to increase the grape
yield in year $t$ by 0.12 kilograms/acre, holding rainfall constant.
Coefficient 6.21: Each additional cm of rainfall is estimated to increase the grape yield in
year $t$ by 6.21 kilograms/acre, holding fertilizer intensity constant.
b. Use the estimates to determine the yield in years with 200, 500 and 800 inches of rainfall,
given that the fertilizer intensity was always 120 pounds per acre.
R = 200 : $\hat{Y}_t = -114 + 0.12 \times 120 + 6.21 \times 200 = 1142.4$
R = 500 : $\hat{Y}_t = -114 + 0.12 \times 120 + 6.21 \times 500 = 3005.4$
R = 800 : $\hat{Y}_t = -114 + 0.12 \times 120 + 6.21 \times 800 = 4868.4$
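The substitutions above can be checked with a short R sketch. The function name `predict_yield` is hypothetical, and the rainfall values are plugged in directly as $R_t$, which is an assumption about how the question's units are meant to be read.

```r
# Fitted equation from the question: Y-hat = -114 + 0.12 * F + 6.21 * R
# predict_yield is an illustrative helper, not part of the assignment.
predict_yield <- function(fertilizer, rainfall) {
  -114 + 0.12 * fertilizer + 6.21 * rainfall
}

rainfall <- c(200, 500, 800)  # rainfall values from part (b)
yield <- predict_yield(fertilizer = 120, rainfall = rainfall)

# One row per rainfall scenario, yield in kilograms/acre
print(data.frame(rainfall = rainfall, yield = yield))
```

Because the equation is linear in rainfall, each extra 300 units of rainfall adds the same 6.21 * 300 = 1863 kilograms/acre to the predicted yield.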
c. Does the constant term of –114 mean that negative amounts of grapes are possible? If
not, what is the meaning of that estimate?
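No. The constant term of -114 is the intercept of the regression: the predicted yield when
both fertilizer intensity and rainfall are zero. That combination presumably lies outside the range
of the observed data, so the estimate should not be read as an actual grape yield; it only
positions the fitted line over the observed range.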
d. The grape yield is in unconventional kilograms/acre because it makes for convenient
numbers in the regression. If the same data were recorded with the grape yield in more
conventional tons/acre, how would the results of your OLS regression look like (write the
equation with the new numbers for the betas). Explain.
$\hat{Y}_t = (-114 + 0.12 F_t + 6.21 R_t)/1000$
Only the dependent variable changes units, so fertilizer intensity stays in pounds/acre and
needs no conversion. Converting the yield from kilograms to tons (taking 1 ton = 1000 kg)
divides the intercept and every slope coefficient by 1000. The equation becomes:
$\hat{Y}_t = -0.114 + 0.00012 F_t + 0.00621 R_t$
3. Use the ​MS Excel​ file ​roadtrips.xlsx​ to complete parts (a) to (c) below. In this file, ​miles​ is the total
distance traveled by a family over a year (in miles), ​income​ is the family’s annual income in $1000's,
age​ is average age of adult members of family (in years), ​kids​ is the number of children in the family.
a. Run the OLS linear regression (to estimate the coefficients) as given below. Report the
estimated equation, the $R^2$ and the adjusted $R^2$. Attach your R code.
$\widehat{miles}_i = \hat{\beta}_0 + \hat{\beta}_1 income_i$ (**)
$\hat{\beta}_0 = 48.94$, $\hat{\beta}_1 = 15.73$
R-squared: 0.272, Adjusted R-squared: 0.2684
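library(readxl)  # assumption: the readxl package is used to import the Excel file
roadtrips <- read_excel("roadtrips.xlsx")  # assumption: the file is in the working directory
linear_model <- lm(miles ~ income, data = roadtrips)
summary(linear_model)  # reports the coefficients, R-squared and adjusted R-squared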
b. Run the OLS linear regression (to estimate the coefficients) as given below. Report the
estimated equation, the $R^2$ and the adjusted $R^2$. Attach your R code.
$\widehat{miles}_i = \hat{\beta}_0 + \hat{\beta}_2 age_i$ (***)
$\hat{\beta}_0 = 265.07$, $\hat{\beta}_2 = 18.49$
R-squared: 0.09781, Adjusted R-squared: 0.09325
linear_model <- lm(miles ~ age, data = roadtrips)
summary(linear_model)
c. Run the OLS linear regression (to estimate the coefficients) as given below. Report the
estimated equation, the $R^2$ and the adjusted $R^2$. Attach your R code.
$\widehat{miles}_i = \hat{\beta}_0 + \hat{\beta}_1 income_i + \hat{\beta}_2 age_i$ (****)
Coefficients:
(Intercept)       income          age
    -365.60        14.30        11.86
R-squared: 0.31, Adjusted R-squared: 0.303
linear_model <- lm(miles ~ income + age, data = roadtrips)
linear_model
summary(linear_model)
a. Run the OLS linear regression (to estimate the coefficients) as given below. Report the
estimated equation, the $R^2$ and the adjusted $R^2$. Attach your R code.
$\widehat{miles}_i = \hat{\beta}_0 + \hat{\beta}_1 income_i + \hat{\beta}_2 age_i + \hat{\beta}_3 kids_i$ (*****)
Coefficients:
(Intercept)       income          age         kids
    -391.55        14.20        15.74       -81.83
R-squared: 0.3406, Adjusted R-squared: 0.3305
linear_model <- lm(miles ~ income + age + kids, data = roadtrips)
linear_model
summary(linear_model)
b. Which of the models above [(**), (***), (****), or (*****)] would you say is the best of
the four? Explain why.
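The last one, (*****), would be the best. It has the highest adjusted $R^2$ of the four (0.3305,
against 0.2684, 0.09325 and 0.303), and because the adjusted $R^2$ penalizes extra regressors, the
better fit is not simply a result of including more variables.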
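One hedged way to line up the four fits is by their adjusted R-squared, which penalizes additional regressors. The sketch below only re-uses the values reported in the answers above; the vector name `adj_r2` is illustrative.

```r
# Adjusted R-squared values as reported for each model in this assignment
adj_r2 <- c("(**)" = 0.2684, "(***)" = 0.09325, "(****)" = 0.303, "(*****)" = 0.3305)

# Rank the models from best to worst fit by adjusted R-squared
ranked <- sort(adj_r2, decreasing = TRUE)
print(ranked)

# Label of the model with the highest adjusted R-squared
best <- names(adj_r2)[which.max(adj_r2)]
print(best)
```

Adjusted R-squared is only one criterion; the signs and plausibility of the coefficients matter as well.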
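c. Explain, in your own words, why the estimated slope coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$ are not the same
in equation (****) as they are in equations (**) and (***).
(****) $\widehat{miles}_i = -365.6 + 14.3\, income_i + 11.86\, age_i$
Income and age are both related to miles and are also correlated with each other. When only one of
them is included, its coefficient also picks up part of the effect of the omitted one. When both are
included, each coefficient measures the effect of its own variable holding the other constant. Here
both estimates become smaller (15.73 to 14.30 for income, 18.49 to 11.86 for age), which is what we
would expect if income and age are positively correlated in the sample.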
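The way a slope changes when a second, correlated regressor is added can be illustrated with a small R sketch. The data below are simulated purely for illustration (they are not the roadtrips data), and all numbers in the simulation are assumptions. The sketch checks the standard in-sample identity: the simple-regression slope equals the multiple-regression slope plus the other variable's slope times the slope from regressing that variable on the included one.

```r
set.seed(1)  # simulated data only; not the roadtrips.xlsx file
n <- 200
income <- rnorm(n, mean = 60, sd = 15)
age <- 30 + 0.2 * income + rnorm(n, sd = 5)   # age made positively related to income
miles <- 100 + 14 * income + 12 * age + rnorm(n, sd = 300)

# Short regression: income only (analogous to equation (**))
short_slope <- coef(lm(miles ~ income))["income"]

# Long regression: income and age (analogous to equation (****))
long_coefs <- coef(lm(miles ~ income + age))

# Auxiliary regression: how age moves with income in the sample
delta <- coef(lm(age ~ income))["income"]

# Slope implied by the identity: long income slope + long age slope * delta
implied_slope <- long_coefs["income"] + long_coefs["age"] * delta

print(c(short = unname(short_slope), implied = unname(implied_slope)))
```

The two printed values agree, so the gap between the short and long income slopes is exactly the age effect that the short regression attributes to income.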