Uploaded by akshay123

Econ 3740 - Assignment 2

Econ 3710 Assignment 2
Student Id number ends with two.
> library(tidyverse)
> library(haven)
> library(stargazer)
> dall <- read_dta("LFS-71M0001-E-2022-October_F1.dta")
Question 1
1a. head(dall$AGE_12)
dall <- dall %>% filter ((AGE_12 >=2 & AGE_12 <=11))
dall <- dall %>% filter(AGE_12!=11)
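There are a total of 74333 observations in my sample.
1b. There are no observations with zero hourly wages. The minimum hourly wage is 3.33 and the
mean is 32.52. However, 23672 observations are missing. HRLYEARN is recorded for employees only,
so the missing values are respondents with no employee wage: the self-employed, the unemployed,
and people outside the labour force (for example retirees, students and homemakers).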
> summary(dall$HRLYEARN)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   3.33   21.00   28.85   32.52   40.87  115.38   23672
> dall <-dall %>% mutate(lwage=log(HRLYEARN))
> summary(dall$lwage)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.203   3.045   3.362   3.382   3.710   4.748   23672
The mean log wage is 3.382. The mean hourly wage is 32.52.
The histogram of hourly wages (pictured below) is skewed to the right and does not look like
a normal distribution. The histogram of log wages, by contrast, is roughly bell-shaped and is
much closer to a normal distribution.
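The effect of the log transformation can be sketched with simulated data. The block below does not use the LFS file: the log-normal parameters are assumptions chosen only to resemble the summary statistics above, and `skewness` is a hypothetical helper written for this illustration. It also shows why the mean log wage (3.382) is smaller than log(32.52), which is about 3.48.

```r
# Simulated wages only (NOT the LFS data): a log-normal draw whose
# parameters are assumptions chosen to resemble the summaries above.
set.seed(1)
wage  <- rlnorm(10000, meanlog = 3.38, sdlog = 0.45)
lwage <- log(wage)

# Moment-based sample skewness (hypothetical helper, base R only).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_wage  <- skewness(wage)   # positive: long right tail
skew_lwage <- skewness(lwage)  # near zero: roughly symmetric

# The mean of the logs is below the log of the mean (Jensen's inequality).
mean_of_logs <- mean(lwage)
log_of_mean  <- log(mean(wage))

# Histograms comparable to the two plots described above
# (drawn only in an interactive session).
if (interactive()) {
  hist(wage,  breaks = 50, main = "Hourly wage (simulated)")
  hist(lwage, breaks = 50, main = "Log hourly wage (simulated)")
}
```

With the real data, the same two `hist()` calls on `dall$HRLYEARN` and `dall$lwage` would produce the plots discussed in 1b.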
dall<-dall %>% filter(!is.na(HRLYEARN))
1c.
> head(dall$HRLYEARN)
[1] 21.15 20.14 26.71 23.39 39.00 21.86
> head(dall$AGE_12)
<labelled<double>[6]>: Five-year age group of respondent
[1] 3 5 6 4 10 3
Labels:
 value label
     1 15 to 19 years
     2 20 to 24 years
     3 25 to 29 years
     4 30 to 34 years
     5 35 to 39 years
     6 40 to 44 years
     7 45 to 49 years
     8 50 to 54 years
     9 55 to 59 years
    10 60 to 64 years
    11 65 to 69 years
    12 70 and over
> head(dall$EDUC)
<labelled<double>[6]>: Highest educational attainment
[1] 4 4 5 5 4 4
Labels:
 value label
     0 0 to 8 years
     1 Some high school
     2 High school graduate
     3 Some postsecondary
     4 Postsecondary certificate or diploma
     5 Bachelor's degree
     6 Above bachelor's degree
> head(dall$SEX)
<labelled<double>[6]>: Sex of respondent
[1] 2 2 2 2 1 2
Labels:
value label
1 Male
2 Female
> head(dall$COWMAIN)
<labelled<double>[6]>: Class of worker, main job
[1] 2 2 2 2 2 2
Labels:
 value label
     1 Public sector employees
     2 Private sector employees
     3 Self-employed incorporated, with paid help
     4 Self-employed incorporated, no paid help
     5 Self-employed unincorporated, with paid help
     6 Self-employed unincorporated, no paid help
     7 Unpaid family worker
> table(dall$COWMAIN)
    1     2
15019 35642
> head(dall$UNION)
<labelled<double>[6]>: Union status, employees only
[1] 3 1 3 3 3 3
Labels:
 value label
     1 Union member
     2 Not a member but covered by a union contract or collective a
     3 Non-unionized
> glimpse(dall$HRLYEARN)
num [1:50661] 21.1 20.1 26.7 23.4 39 ...
- attr(*, "label")= chr "Usual hourly wages, employees only"
- attr(*, "format.stata")= chr "%6.2f"
EDUC, AGE_12, COWMAIN, SEX and UNION are categorical variables (stored as labelled codes and
treated as factors in the regressions). Hourly wages (HRLYEARN) is the only continuous variable.
1d.
head(dall$EDUC)
table(dall$EDUC)
prop.table(table(dall$EDUC))
CATEGORIES
Value  Label                                   Proportion
0      0 to 8 years                            0.010
1      Some high school                        0.041
2      High school graduate                    0.174
3      Some postsecondary                      0.046
4      Postsecondary certificate or diploma    0.385
5      Bachelor's degree                       0.234
6      Above bachelor's degree                 0.109
The largest education category is Postsecondary certificate or diploma (38.5% of the sample).
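The proportions can be checked by hand from the category counts that `table(dall$EDUC)` reports in this assignment. A minimal sketch in base R, with the counts typed in rather than read from the data file:

```r
# Counts per EDUC category, copied from the table(dall$EDUC) output
# reported in this assignment.
educ_counts <- c("0" = 522, "1" = 2093, "2" = 8817, "3" = 2347,
                 "4" = 19484, "5" = 11869, "6" = 5529)

# Share of each category; same result as prop.table(table(dall$EDUC)).
educ_share <- educ_counts / sum(educ_counts)
round(educ_share, 3)

# Code of the largest category (4 = postsecondary certificate or diploma).
largest <- names(which.max(educ_share))
```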
1e. > head(dall$SEX)
<labelled<double>[6]>: Sex of respondent
[1] 2 2 2 2 1 2
Labels:
value label
1 Male
2 Female
> dall <-dall %>% mutate (sex=SEX-1)
> table(dall$EDUC)
    0     1     2     3     4     5     6
  522  2093  8817  2347 19484 11869  5529
> table(dall$AGE_12)
   2    3    4    5    6    7    8    9   10
4179 5374 6117 6272 6388 6056 6171 5736 4368
> table(dall$COWMAIN)
    1     2
15019 35642
> model_1e <- lm(lwage ~ factor(EDUC)+factor(AGE_12)+sex+factor(COWMAIN), dall)
The base (omitted) categories are: EDUC = 0 (0 to 8 years), AGE_12 = 2 (20 to 24 years),
COWMAIN = 1 (public sector employees), and sex = 0 (male, i.e. SEX = 1).
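How R picks the base category can be illustrated on a toy data set. The values below are hypothetical and unrelated to the LFS sample; the point is only that `factor()` sorts the codes and `lm()` omits the first level, and that `relevel()` changes it.

```r
# Toy data (hypothetical values, not the LFS sample).
toy <- data.frame(
  lwage = c(2.9, 3.0, 3.2, 3.3, 3.6, 3.7),
  EDUC  = c(0, 0, 2, 2, 5, 5)
)

# factor() orders levels ascending; lm() omits the first level as the base.
fit <- lm(lwage ~ factor(EDUC), data = toy)
coef_names <- names(coef(fit))  # "(Intercept)" "factor(EDUC)2" "factor(EDUC)5"

# The intercept is the mean of the base group (EDUC == 0).
base_mean <- mean(toy$lwage[toy$EDUC == 0])

# relevel() switches the base category, here to EDUC == 2.
toy$EDUC_f <- relevel(factor(toy$EDUC), ref = "2")
fit2 <- lm(lwage ~ EDUC_f, data = toy)
```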
1e. (ii) Reporting the model in the table format using the stargazer library.
> summary(model_1e)
Call:
lm(formula = lwage ~ factor(EDUC) + factor(AGE_12) + sex + factor(COWMAIN),
data = dall)
Residuals:
     Min       1Q   Median       3Q      Max
-2.39832 -0.25370 -0.01157  0.25060  1.54464
Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                           3.000637   0.018396 163.114  < 2e-16 ***
Some high school                      0.070508   0.018759   3.759 0.000171 ***
High school graduate                  0.138425   0.017297   8.003 1.24e-15 ***
Some postsecondary                    0.174339   0.018667   9.340  < 2e-16 ***
Postsecondary certificate or diploma  0.268499   0.017052  15.746  < 2e-16 ***
Bachelor's degree                     0.436309   0.017252  25.290  < 2e-16 ***
Above bachelor's degree               0.541368   0.017675  30.630  < 2e-16 ***
25 to 29 years                        0.192787   0.008026  24.019  < 2e-16 ***
30 to 34 years                        0.293741   0.007862  37.363  < 2e-16 ***
35 to 39 years                        0.340646   0.007835  43.476  < 2e-16 ***
40 to 44 years                        0.368100   0.007802  47.180  < 2e-16 ***
45 to 49 years                        0.371832   0.007872  47.233  < 2e-16 ***
50 to 54 years                        0.375515   0.007832  47.944  < 2e-16 ***
55 to 59 years                        0.336037   0.007935  42.350  < 2e-16 ***
60 to 64 years                        0.295222   0.008410  35.104  < 2e-16 ***
Female                               -0.183452   0.003481 -52.694  < 2e-16 ***
Private sector employees             -0.177315   0.003916 -45.279  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3833 on 50644 degrees of freedom
Multiple R-squared: 0.2553, Adjusted R-squared: 0.2551
F-statistic: 1085 on 16 and 50644 DF, p-value: < 2.2e-16
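The R^2 is 0.2553, so the model explains only about 25.5% of the variation in log wages. By
that measure the fit is weak: most of the variation in wages is left unexplained.
1f. Interpreting the coefficients from the wage regression.
Each coefficient is measured relative to the base (omitted) category of its variable, holding
the other regressors fixed.
All EDUC and AGE_12 coefficients are positive: every education level earns more than the 0 to 8
years group, and every age group earns more than the 20 to 24 group. The education premium rises
steadily, up to 0.541 log points for above bachelor's degree. The age premium peaks at 50 to 54
years (0.376) and then declines.
The Female and Private sector coefficients are negative: women earn about 0.183 log points
(about 16.8%) less than men, and private sector employees earn about 0.177 log points (about
16.2%) less than public sector employees.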
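Because the dependent variable is a log wage, a dummy coefficient b is a difference in log points, and the implied percentage difference is 100 * (exp(b) - 1). A short sketch using three coefficients copied from the summary(model_1e) output (the vector names are labels chosen here for illustration):

```r
# Coefficients copied from the summary(model_1e) output above.
b <- c(female = -0.183452, private = -0.177315, above_bachelor = 0.541368)

# Exact percentage effect implied by a dummy coefficient in a log-wage model.
pct_effect <- 100 * (exp(b) - 1)
round(pct_effect, 1)
```

For small coefficients the log-point value and the percentage are close; for large ones, such as the above bachelor's degree premium, they differ noticeably.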
Question 2.
# Unionization
head(dall$UNION)
dall<-mutate(dall, unionized=if_else(UNION==3,0,1))
#PUBLIC/PRIVATE
dall<-dall%>% mutate(public=if_else(COWMAIN==1,1,0))
2a. > table(dall$unionized, dall$public)
        0     1
  0 29626  3102
  1  6016 11917
Row 1 is non-unionized and row 2 is unionized. Column 1 is the private sector and column 2 is
the public sector.
The unionized workforce is 6016 + 11917 = 17933. The total workforce is
29626 + 3102 + 6016 + 11917 = 50661. The unionization rate is 17933 / 50661 * 100 = 35.40%.
> prop.table(table(dall$unionized, dall$public),2)
            0         1
  0 0.8312104 0.2065384
  1 0.1687896 0.7934616
> prop.table(table(dall$unionized, dall$public),1)
             0          1
  0 0.90521877 0.09478123
  1 0.33547092 0.66452908
> table(dall$public)
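    0     1
35642 15019
The fraction of the workforce in the public sector is 15019 / 50661 = 29.65%.
From the column proportions, 16.88% of the private sector is unionized and 79.35% of the public
sector is unionized. (The row proportions answer a different question: 33.55% of unionized
workers are in the private sector and 66.45% are in the public sector.)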
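The difference between the two prop.table() margins is easy to mix up, so here is a small sketch that rebuilds the 2 x 2 table from the cell counts reported in 2a and computes each quantity explicitly. The variable names are chosen for this illustration only.

```r
# Cell counts copied from table(dall$unionized, dall$public) in 2a.
# Rows: unionized (0/1); columns: public (0/1). matrix() fills by column.
tab <- matrix(c(29626, 6016, 3102, 11917), nrow = 2,
              dimnames = list(unionized = c("0", "1"), public = c("0", "1")))

union_rate_overall <- sum(tab["1", ]) / sum(tab)  # share of all workers unionized
public_share       <- sum(tab[, "1"]) / sum(tab)  # share of all workers in public sector

# margin = 2 conditions on the column (sector):
# unionization rate within the private and the public sector.
by_sector <- prop.table(tab, margin = 2)

# margin = 1 conditions on the row (union status):
# sector mix among non-unionized and among unionized workers.
by_union <- prop.table(tab, margin = 1)
```

The question "what percent of the public sector is unionized" is answered by `by_sector["1", "1"]`, not by `by_union["1", "1"]`.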
2b.
> dallunion<- dall%>% filter(unionized==1)
> dallnonun<- dall%>% filter(unionized==0)
> summary(dallunion$lwage)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.270 3.219 3.483 3.481 3.766 4.726
> summary(dallnonun$lwage)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.203 2.975 3.258 3.327 3.650 4.748
The mean log wage is 3.481 for unionized workers and 3.327 for non-unionized workers.
The union premium is 3.481 - 3.327 = 0.154 log points.
2c.
> dallpublic<- dall%>% filter(public==1)
> dallprivate<- dall%>% filter(public==0)
> summary(dallpublic$lwage)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.270 3.258 3.555 3.550 3.836 4.726
> summary(dallprivate$lwage)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.203 2.983 3.258 3.311 3.618 4.748
The mean log wage is 3.550 in the public sector and 3.311 in the private sector.
The public premium is 3.550 - 3.311 = 0.239 log points.
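The premiums in 2b and 2c are differences in group means. A regression of the outcome on a single 0/1 dummy returns exactly the same number as its slope, with the intercept equal to the mean of the group coded 0. A minimal sketch on simulated data (all values below are hypothetical and not drawn from the LFS sample):

```r
# Simulated data (hypothetical, not the LFS sample).
set.seed(42)
n      <- 500
public <- rbinom(n, 1, 0.3)
lwage  <- 3.3 + 0.24 * public + rnorm(n, sd = 0.4)

# Raw premium: difference in mean log wages between the two groups.
premium <- mean(lwage[public == 1]) - mean(lwage[public == 0])

# OLS on a single 0/1 dummy: the slope equals the difference in means,
# and the intercept equals the mean of the group coded 0.
fit       <- lm(lwage ~ public)
slope     <- unname(coef(fit)["public"])
intercept <- unname(coef(fit)["(Intercept)"])
```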
2d. > lm(log(HRLYEARN)~unionized+public, data=dall)
Call:
lm(formula = log(HRLYEARN) ~ unionized + public, data = dall)
Coefficients:
(Intercept)    unionized       public
    3.30600      0.02832      0.22104
Holding sector fixed, the union premium is about 0.028 log points (roughly 2.9%). Holding union
status fixed, a worker in the public sector earns about 0.221 log points (roughly 24.7%) more
than a worker in the private sector. The intercept (3.306) is the predicted mean log wage of
non-unionized private sector workers.
Public sector employment is strongly correlated with unionization.
> lm(log(HRLYEARN)~unionized, data=dall)
Call:
lm(formula = log(HRLYEARN) ~ unionized, data = dall)
Coefficients:
(Intercept)    unionized
     3.3269       0.1543
> lm(log(HRLYEARN)~public, data=dall)
Call:
lm(formula = log(HRLYEARN) ~ public, data = dall)
Coefficients:
(Intercept)       public
     3.3108       0.2387
When unionized is the only regressor, its coefficient is 0.1543. When public is the only
regressor, its coefficient is 0.2387. These match the raw premiums from 2b and 2c.
Because unionized and public are positively correlated and both are associated with higher
wages, omitting either one creates upward omitted variable bias in the coefficient on the other.
When both are included, the coefficients fall to 0.0283 and 0.2210, which reflects the removal
of that bias.
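The omitted variable bias formula can be checked with numbers already reported in this assignment. With two dummies, the coefficient from the short regression equals the coefficient from the long regression plus the omitted variable's coefficient times the slope from regressing the omitted dummy on the included one. For 0/1 variables that slope is a difference in conditional shares, which can be read off the two prop.table() outputs in 2a. The variable names below are chosen for this sketch only.

```r
# All inputs are copied from output reported in this assignment.
b_union  <- 0.02832   # unionized coefficient, regression with both dummies
b_public <- 0.22104   # public coefficient, regression with both dummies

# Auxiliary slopes: differences in conditional shares from the prop.table() outputs.
public_on_union <- 0.66452908 - 0.09478123  # P(public | union) - P(public | non-union)
union_on_public <- 0.7934616  - 0.1687896   # P(union | public) - P(union | private)

# Short-regression coefficient = own coefficient + omitted coefficient * auxiliary slope.
short_union  <- b_union  + b_public * public_on_union
short_public <- b_public + b_union  * union_on_public

round(c(short_union = short_union, short_public = short_public), 4)
```

Up to rounding of the inputs, these reproduce the single-regressor coefficients 0.1543 and 0.2387.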
2e. > model_2e <- lm(lwage ~ factor(EDUC)+sex+factor(AGE_12)+unionized +public, data=dall)
> summary(model_2e)
Call:
lm(formula = lwage ~ factor(EDUC) + sex + factor(AGE_12) + unionized +
public, data = dall)
Residuals:
     Min       1Q   Median       3Q      Max
-2.40386 -0.25419 -0.01312  0.24947  1.54970
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       2.819679   0.017911 157.430  < 2e-16 ***
factor(EDUC)1     0.070548   0.018752   3.762 0.000169 ***
factor(EDUC)2     0.138393   0.017291   8.004 1.23e-15 ***
factor(EDUC)3     0.174904   0.018660   9.373  < 2e-16 ***
factor(EDUC)4     0.267728   0.017046  15.707  < 2e-16 ***
factor(EDUC)5     0.437500   0.017247  25.367  < 2e-16 ***
factor(EDUC)6     0.543906   0.017673  30.777  < 2e-16 ***
sex              -0.182305   0.003485 -52.311  < 2e-16 ***
factor(AGE_12)3   0.191197   0.008028  23.818  < 2e-16 ***
factor(AGE_12)4   0.291731   0.007865  37.091  < 2e-16 ***
factor(AGE_12)5   0.338941   0.007837  43.248  < 2e-16 ***
factor(AGE_12)6   0.366112   0.007806  46.904  < 2e-16 ***
factor(AGE_12)7   0.369607   0.007877  46.920  < 2e-16 ***
factor(AGE_12)8   0.373404   0.007837  47.648  < 2e-16 ***
factor(AGE_12)9   0.333938   0.007939  42.064  < 2e-16 ***
factor(AGE_12)10  0.293433   0.008412  34.884  < 2e-16 ***
unionized         0.027960   0.004471   6.253 4.05e-10 ***
public            0.159357   0.004855  32.823  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3832 on 50643 degrees of freedom
Multiple R-squared: 0.2559, Adjusted R-squared: 0.2556
F-statistic: 1024 on 17 and 50643 DF, p-value: < 2.2e-16
The results suggest that both unionization and public sector employment have a positive effect
on wages, even after controlling for education, age and gender. The coefficient on unionized is
0.028 and the coefficient on public is 0.159.
2f. > model_2f <- lm(lwage ~ factor(EDUC)+sex+factor(AGE_12)+
TENURE+
unionized+public+
factor(MARSTAT)+factor(NOC_10), data=dall)
> summary(model_2f)
Call:
lm(formula = lwage ~ factor(EDUC) + sex + factor(AGE_12) + TENURE +
unionized + public + factor(MARSTAT) + factor(NOC_10), data = dall)
Residuals:
    Min      1Q  Median      3Q     Max
-2.5094 -0.2127 -0.0098  0.2108  1.4094
Coefficients:
                   Estimate Std. Error  t value Pr(>|t|)
(Intercept)       3.383e+00  1.727e-02  195.886  < 2e-16 ***
factor(EDUC)1     6.934e-02  1.639e-02    4.232 2.32e-05 ***
factor(EDUC)2     1.022e-01  1.513e-02    6.753 1.46e-11 ***
factor(EDUC)3     1.313e-01  1.635e-02    8.026 1.03e-15 ***
factor(EDUC)4     1.786e-01  1.495e-02   11.947  < 2e-16 ***
factor(EDUC)5     2.998e-01  1.521e-02   19.714  < 2e-16 ***
factor(EDUC)6     3.872e-01  1.561e-02   24.798  < 2e-16 ***
sex              -1.268e-01  3.453e-03  -36.735  < 2e-16 ***
factor(AGE_12)3   1.088e-01  7.139e-03   15.237  < 2e-16 ***
factor(AGE_12)4   1.590e-01  7.277e-03   21.844  < 2e-16 ***
factor(AGE_12)5   1.706e-01  7.475e-03   22.819  < 2e-16 ***
factor(AGE_12)6   1.743e-01  7.585e-03   22.983  < 2e-16 ***
factor(AGE_12)7   1.637e-01  7.743e-03   21.140  < 2e-16 ***
factor(AGE_12)8   1.549e-01  7.813e-03   19.827  < 2e-16 ***
factor(AGE_12)9   1.142e-01  7.998e-03   14.281  < 2e-16 ***
factor(AGE_12)10  8.115e-02  8.429e-03    9.628  < 2e-16 ***
TENURE            9.390e-04  2.088e-05   44.981  < 2e-16 ***
unionized         5.527e-02  4.074e-03   13.565  < 2e-16 ***
public            7.872e-02  4.524e-03   17.400  < 2e-16 ***
factor(MARSTAT)2  3.471e-03  4.334e-03    0.801  0.42317
factor(MARSTAT)3 -4.957e-02  1.663e-02   -2.980  0.00288 **
factor(MARSTAT)4 -2.604e-02  9.525e-03   -2.734  0.00626 **
factor(MARSTAT)5 -6.657e-03  7.567e-03   -0.880  0.37898
factor(MARSTAT)6 -6.328e-02  4.277e-03  -14.795  < 2e-16 ***
factor(NOC_10)2  -4.021e-01  6.471e-03  -62.141  < 2e-16 ***
factor(NOC_10)3  -1.677e-01  7.204e-03  -23.273  < 2e-16 ***
factor(NOC_10)4  -3.502e-01  7.646e-03  -45.796  < 2e-16 ***
factor(NOC_10)5  -3.642e-01  7.029e-03  -51.817  < 2e-16 ***
factor(NOC_10)6  -4.308e-01  1.297e-02  -33.204  < 2e-16 ***
factor(NOC_10)7  -6.552e-01  6.546e-03 -100.092  < 2e-16 ***
factor(NOC_10)8  -3.830e-01  7.017e-03  -54.586  < 2e-16 ***
factor(NOC_10)9  -4.457e-01  1.158e-02  -38.503  < 2e-16 ***
factor(NOC_10)10 -5.173e-01  8.899e-03  -58.133  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3346 on 50628 degrees of freedom
Multiple R-squared: 0.4326, Adjusted R-squared: 0.4323
F-statistic: 1206 on 32 and 50628 DF, p-value: < 2.2e-16
Education coefficients are relative to base category 0 (0 to 8 years). Females face a wage
penalty of about 0.127 log points (roughly 11.9%). The base category for age is 2 (20 to 24 years).
stargazer(model_1e, model_2e, model_2f, type="text")
In this model as well, unionization and public sector employment have a positive effect on log
wages. This model fits better because it includes more relevant factors (tenure, marital status
and occupation). Relative to model_2e, the coefficient on unionized increases from 0.028 to
0.055, while the coefficient on public decreases from 0.159 to 0.079. There has also been an
increase in R^2, from 0.256 in model_2e to 0.433 in model_2f.
2g. > step1 <- lm(public ~ factor(EDUC)+sex+factor(AGE_12)+
+ TENURE+
+ unionized+
+ factor(MARSTAT)+factor(NOC_10), data=dall)
> dall <-dall %>% mutate(rhat =resid(step1))
> #step2
> lm(lwage~rhat, dall)
Call:
lm(formula = lwage ~ rhat, data = dall)
Coefficients:
(Intercept)         rhat
    3.38155      0.07872
I chose public as my variable of interest. The intuition behind this two-step procedure is that
rhat is the part of public that is not explained by the other regressors. Regressing lwage on
rhat therefore gives exactly the same coefficient as the one on public in model_2f, where log
wage was regressed on all the regressors at once.
It is equal to 0.07872.
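This partialling-out result is known as the Frisch-Waugh-Lovell theorem. It can be verified on simulated data without the LFS file. The sketch below uses made-up variables (`educ`, `union`, `public`) and made-up coefficients, so only the equality of the two estimates carries over, not the numbers themselves.

```r
# Simulated data (hypothetical, not the LFS sample).
set.seed(123)
n      <- 1000
educ   <- sample(0:6, n, replace = TRUE)
union  <- rbinom(n, 1, 0.35)
public <- rbinom(n, 1, plogis(-1.5 + 2.5 * union))  # correlated with union
lwage  <- 2.8 + 0.08 * educ + 0.05 * union + 0.08 * public + rnorm(n, sd = 0.35)

# Full regression: coefficient on public, controlling for the other regressors.
full <- lm(lwage ~ factor(educ) + union + public)

# Step 1: residualize public on all the other regressors.
step1 <- lm(public ~ factor(educ) + union)
rhat  <- resid(step1)

# Step 2: regress the outcome on the residual alone.
step2 <- lm(lwage ~ rhat)

b_full <- unname(coef(full)["public"])
b_fwl  <- unname(coef(step2)["rhat"])
all.equal(b_full, b_fwl)
```

Only the point estimate is reproduced this way. The standard error from the second step differs from the one in the full regression, because the second step leaves the variation explained by the other regressors in the residual.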