Uploaded by Hòa Nguyễn

Econometrics Exercises: Wage Regression Analysis

Econometrics Practical Session #5 A.Y. 2023/2024
Instructor: Catarina Midões, catarina.midoes@unive.it
30 November 2023
EXERCISES
The data set consists of information on over 10,000 full-time, full-year workers. The highest educational
achievement for each worker is either a high school diploma or a bachelor’s degree. The workers’ ages
range from 25 to 40 years. The data set also contains information on the region of the country where the
person lives, gender, and age. For the purposes of these exercises, let:
• AWE = logarithm of average weekly earnings (in 2007 units)
• High School = binary variable (1 if high school, 0 if less)
• Male = binary variable (1 if male, 0 if female)
• Age = age (in years)
• North = binary variable (1 if Region = North, 0 otherwise)
• East = binary variable (1 if Region = East, 0 otherwise)
• South = binary variable (1 if Region = South, 0 otherwise)
• West = binary variable (1 if Region = West, 0 otherwise)
1. Using the regression results in column (1):
a) Is the high school earnings difference estimated from this regression statistically significant at
the 5% level? Construct a 95% confidence interval of the difference.
Solution: The t-statistic is 0.352/0.021 = 16.76, which exceeds 1.96 in absolute value. Thus, the
coefficient is statistically significant at the 5% level. The 95% confidence interval is [0.352 ±
1.96 × 0.021] = [0.31084, 0.39316].
b) Is the male-female earnings difference estimated from this regression statistically significant at
the 5% level? Construct a 95% confidence interval for the difference.
Solution: The t-statistic is 0.458/0.021 = 21.81, which exceeds 1.96 in absolute value. Thus, the
coefficient is statistically significant at the 5% level. The 95% confidence interval is [0.458 ±
1.96 × 0.021] = [0.41684, 0.49916].
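Both parts of exercise 1 follow the same pattern: divide the estimate by its standard error, compare the result with 1.96, then add and subtract 1.96 standard errors. A minimal Python sketch of that pattern, using the column (1) figures quoted in the solutions above; the helper name `t_and_ci` is illustrative and not part of the exercise sheet.

```python
def t_and_ci(estimate, std_error, critical_value=1.96):
    """Return the t-statistic for H0: coefficient = 0 and the two-sided confidence interval."""
    t_stat = estimate / std_error
    margin = critical_value * std_error
    return t_stat, (estimate - margin, estimate + margin)

# Estimates and standard errors quoted in the solutions to 1(a) and 1(b)
for name, est, se in [("High School", 0.352, 0.021), ("Male", 0.458, 0.021)]:
    t_stat, (low, high) = t_and_ci(est, se)
    print(f"{name}: t = {t_stat:.2f}, 95% CI = [{low:.5f}, {high:.5f}]")
```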
2. Using the regression results in column (2):
a) Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence
interval to explain your answer.
Solution: Yes, age is an important determinant of earnings. Using a t-test, the t-statistic is
0.011/0.001 = 11, with a p-value less than 0.01, implying that the coefficient on age is statistically significant at the 1% level. The 95% confidence interval is [0.011 ± 1.96 × 0.001] =
[0.00904, 0.01296].
b) Suppose Alvo is a 30-year-old male college graduate, and Kal is a 40-year-old male college graduate. Construct a 95% confidence interval for the expected difference between their earnings.
Solution: ∆Age × [0.011 ± 1.96 × 0.001] = 10 × [0.011 ± 1.96 × 0.001] = [0.11 ± 1.96 × 0.01] =
[0.0904, 0.1296]
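The interval in 2(b) uses the fact that multiplying a coefficient by a constant multiplies its standard error by the same constant. A short Python sketch, assuming (as the solution does) that Alvo and Kal differ only in age; the function name is hypothetical.

```python
def ci_for_scaled_coefficient(estimate, std_error, scale, critical_value=1.96):
    """CI for scale * beta: the estimate is scaled by `scale`, its standard error by |scale|."""
    center = scale * estimate
    margin = critical_value * abs(scale) * std_error
    return center - margin, center + margin

# Kal is 10 years older than Alvo; the age coefficient is 0.011 with standard error 0.001
low, high = ci_for_scaled_coefficient(0.011, 0.001, scale=10)
print(f"95% CI for the expected difference: [{low:.4f}, {high:.4f}]")
```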
3. Using the regression results in column (3):
a) Are there any important regional differences? Use an appropriate hypothesis test to explain your
answer.
Solution: The W-statistic for testing that the coefficients on the regional regressors are all zero is 65.61. The
5% critical value from the chi-squared distribution with 3 degrees of freedom, $G_3^{-1}(0.95)$, is 7.81.
Because 65.61 > 7.81, the regional effects are jointly significant at the 5% level.
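The critical value and the strength of the rejection can be checked without statistical tables, because the chi-squared distribution with 3 degrees of freedom has a closed-form tail probability. The sketch below uses only the Python standard library; the function name `chi2_sf_3df` is an illustrative choice, and the formula applies to 3 degrees of freedom only.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

w_stat = 65.61          # statistic quoted in the solution
critical_value = 7.81   # 5% critical value quoted in the solution

print(f"P(chi2_3 <= {critical_value}) = {1 - chi2_sf_3df(critical_value):.4f}")
print(f"p-value for W = {w_stat}: {chi2_sf_3df(w_stat):.2e}")
print("Reject H0 at the 5% level:", w_stat > critical_value)
```

The first line should print a probability very close to 0.95, confirming that 7.81 is the 5% critical value. The p-value for 65.61 is far below 0.05.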
b) Juan is a 32-year-old male high school graduate from the North. Mel is a 32-year-old male high
school graduate from the West. Ari is a 32-year-old male high school graduate from the East.
i. Construct a 95% confidence interval for the difference in expected earnings between Juan
and Mel.
Solution: The expected difference between Juan and Mel is $(X_{4,\text{Juan}} - X_{4,\text{Mel}}) \times \beta_4 = (1 - 0) \times \beta_4 = 0.175$. Thus a 95% confidence interval is $[0.175 \pm 1.96 \times 0.037] = [0.10248, 0.24752]$.
ii. Explain how you would construct a 95% confidence interval for the difference in expected
earnings between Juan and Ari. (Hint: What would happen if you included West and excluded East from the regression?)
Solution: The expected difference is $(X_{4,\text{Juan}} - X_{4,\text{Ari}}) \times \beta_4 + (X_{7,\text{Juan}} - X_{7,\text{Ari}}) \times \beta_7 = \beta_4 - \beta_7$.
C.I. = $[\beta_4 - \beta_7 \pm 1.96 \times se(\beta_4 - \beta_7)] = [(0.175 - (-0.102)) \pm 1.96 \times se(\beta_4 - \beta_7)]$
We do not know $se(\beta_4 - \beta_7)$ because the regression table does not report the covariance between $\beta_4$ and $\beta_7$. To calculate the confidence interval, we could drop East from the
regression and include West instead. The base category would
become East. Then, the new coefficient on North would reflect the difference between North
and East, and the new standard error would be the standard error of that difference.
4. The regression shown in column (2) was estimated again, this time using data from 1993 (5000 observations selected at random and converted into 2007 units using the Consumer Price Index). The
results are:
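$$\log AWE = \underset{(0.20)}{9.32} + \underset{(0.019)}{0.301}\,\text{Highschool} + \underset{(0.047)}{0.562}\,\text{Male} + \underset{(0.002)}{0.011}\,\text{Age}$$
(standard errors in parentheses)
$\bar{R}^2 = 0.85$, SER = 1.25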
Comparing this regression to the regression for 2012 shown in column (2), was there a statistically
significant change in the coefficient on High school?
Solution: We can perform a t-test on the difference between the coefficients, which is approximately
normal, given that each coefficient is approximately normal.
Because the samples are both random and independent, so are the two coefficients, and thus $Var(\beta^{HS}_{2007} - \beta^{HS}_{1993}) = Var(\beta^{HS}_{2007}) + Var(\beta^{HS}_{1993})$ (because $Cov(\beta^{HS}_{2007}, \beta^{HS}_{1993}) = 0$).
$$t = \frac{(\beta^{HS}_{2007} - \beta^{HS}_{1993}) - 0}{se(\beta^{HS}_{2007} - \beta^{HS}_{1993})} = \frac{0.373 - 0.301}{\sqrt{(0.021)^2 + (0.019)^2}} \approx \frac{0.072}{0.028} \approx 2.57 > 1.96 \text{ (critical value)}$$
Thus, we find a statistically significant difference between the coefficients.
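The test in exercise 4 can be sketched in a few lines of Python. The values 0.021, 0.301 and 0.019 appear in the text; the column (2) High school coefficient of 0.373 is an assumption here, read from the solution's fraction because the regression table itself is not reproduced. The function name is hypothetical.

```python
import math

def t_stat_for_difference(b_a, se_a, b_b, se_b):
    """t-statistic for H0: b_a - b_b = 0 when the two estimates are independent."""
    # Under independence the covariance term vanishes, so the variances simply add
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return (b_a - b_b) / se_diff, se_diff

# 0.373 is an ASSUMED column (2) coefficient (table not shown); the other figures are quoted in the text
t_stat, se_diff = t_stat_for_difference(0.373, 0.021, 0.301, 0.019)
print(f"se of the difference = {se_diff:.4f}, t = {t_stat:.2f}")
print("Significant at the 5% level:", abs(t_stat) > 1.96)
```

With the unrounded standard error the statistic comes out at about 2.54. Rounding the standard error to 0.028 first gives 0.072 / 0.028 ≈ 2.57, the figure in the solution. Either way it exceeds 1.96.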
5. In all of the regressions in the previous exercises, the coefficient of Male is positive, large, and statistically significant. Do you believe this provides strong statistical evidence of gender discrimination in
terms of wages in the labor market? Could it reflect other types of gender discrimination?
Solution: The previous regressions show that, even when considering age, education, and region, men
are paid more than women. However, there might be factors correlated with being male which
are the true cause of that higher pay - men in the sample might work in sectors or careers that
are better paid than those of women, or they might have more years of experience at the same age if,
for instance, women take maternity leave and men do not. The estimated coefficients might thus
overestimate discrimination in terms of wages - companies pay women less not because they are
women, but because of those extra factors. This would be an example of omitted variable bias.
Econometrically, with a stylized example, if the true model is:
$$y_i = \beta_0 + \beta_1\,\text{male}_i + \beta_2\,\text{type of job}_i + u_i$$
and instead we are fitting the regression:
$$y_i = \beta_0 + \beta_1\,\text{male}_i + v_i, \qquad v_i = \beta_2\,\text{type of job}_i + u_i$$
We get that, if $\beta_2 \neq 0$ and $Cov(\text{male}_i, \text{type of job}_i) \neq 0$, the OLS assumption $E[v \mid X] = 0$ is violated
and the OLS coefficient is biased. To visualize this bias, notice the formula for the OLS coefficient is:
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, v_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \longrightarrow \beta_1 + \frac{Cov(X, v)}{Var(X)} = \beta_1 + \beta_2 \times \frac{Cov(X, Z)}{Var(X)}$$
To correct this bias, the variable $\text{type of job}_i$ (Z in the bias formula) must be added to the regression.
We call such variables, included to reduce or eliminate omitted variable bias, "control variables". By
adding such control variables, the coefficient on male would no longer answer "do men of the same age get
paid the same as women" but it would instead answer the question "do men of the same age get paid
the same as women for the same job and level of experience".
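The bias formula can be checked with a small simulation. The sketch below is purely illustrative: the coefficients, the probabilities linking gender to job type, and the noise level are all hypothetical choices, not estimates from the exercise data. It fits the regression of y on male alone and compares the slope with β1 + β2 × Cov(X, Z)/Var(X).

```python
import random

random.seed(0)
n = 20000
beta0, beta1, beta2 = 1.0, 0.10, 0.50   # hypothetical "true" coefficients

male, job, y = [], [], []
for _ in range(n):
    m = 1 if random.random() < 0.5 else 0
    # Hypothetical link: men are more likely to hold the better-paid job type
    z = 1 if random.random() < (0.7 if m else 0.3) else 0
    u = random.gauss(0, 0.5)
    male.append(m)
    job.append(z)
    y.append(beta0 + beta1 * m + beta2 * z + u)

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance (dividing by n); cov(a, a) is the sample variance."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)

# OLS slope of y on male alone, i.e. with job type omitted
short_slope = cov(male, y) / cov(male, male)
# Value implied by the bias formula: beta1 + beta2 * Cov(X, Z) / Var(X)
predicted = beta1 + beta2 * cov(male, job) / cov(male, male)

print(f"true beta1             = {beta1:.3f}")
print(f"short-regression slope = {short_slope:.3f}")
print(f"bias formula           = {predicted:.3f}")
```

In this setup the slope on male from the short regression should land close to the value implied by the bias formula and well above the true coefficient of 0.10, because job type is positively correlated with being male and has a positive effect on pay.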
In the literature, when such variables are added, the difference in wages between women and men decreases, but is still statistically significant. Importantly, even if the coefficient of male were no longer
statistically significant after adding such variables, it would not be evidence against gender discrimination in general but evidence against gender discrimination in terms of wages. Even if women were
paid the same as men for the same job and the same level of experience, a new question would appear
- why are the jobs popular with women less well paid than the jobs popular with men? This can reflect
gender discrimination at the stage of selecting a job. Yet another example is the case of maternity leave
- even if the difference in wages between men and women were fully explained by the fact women
take maternity leave and men do not, there could still be gender discrimination. In certain countries,
it might be legal discrimination - e.g., only women can take paid maternity leave. If the law already
ensures equal treatment of men and women, it might still reflect discrimination in society based on
gender roles.