Econometrics
Practical Session #5
A.Y. 2023/2024
Instructor: Catarina Midões, catarina.midoes@unive.it
30 November 2023

EXERCISES

The data set consists of information on over 10,000 full-time, full-year workers. The highest educational achievement for each worker is either a high school diploma or a bachelor’s degree. The workers’ ages range from 25 to 40 years. The data set also contains information on the region of the country where the person lives, gender, and age. For the purposes of these exercises, let:

• AWE = logarithm of average weekly earnings (in 2007 units)
• High School = binary variable (1 if high school, 0 if less)
• Male = binary variable (1 if male, 0 if female)
• Age = age (in years)
• North = binary variable (1 if Region = North, 0 otherwise)
• East = binary variable (1 if Region = East, 0 otherwise)
• South = binary variable (1 if Region = South, 0 otherwise)
• West = binary variable (1 if Region = West, 0 otherwise)

1. Using the regression results in column (1):

a) Is the high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

Solution: The t-statistic is 0.352/0.021 = 16.76, which exceeds 1.96 in absolute value. Thus, the coefficient is statistically significant at the 5% level. The 95% confidence interval is [0.352 ± 1.96 × 0.021] = [0.31084, 0.39316].

b) Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.
Solution: The t-statistic is 0.458/0.021 = 21.81, which exceeds 1.96 in absolute value. Thus, the coefficient is statistically significant at the 5% level. The 95% confidence interval is [0.458 ± 1.96 × 0.021] = [0.41684, 0.49916].

2. Using the regression results in column (2):

a) Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.

Solution: Yes, age is an important determinant of earnings. The t-statistic is 0.011/0.001 = 11, with a p-value less than 0.01, so the coefficient on age is statistically significant at the 1% level. The 95% confidence interval is [0.011 ± 1.96 × 0.001] = [0.00904, 0.01296].

b) Suppose Alvo is a 30-year-old male college graduate, and Kal is a 40-year-old male college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

Solution: The two workers differ only in age, so the expected difference is ∆Age times the coefficient on Age, and both the estimate and its standard error scale by ∆Age = 10:
∆Age × [0.011 ± 1.96 × 0.001] = 10 × [0.011 ± 1.96 × 0.001] = [0.11 ± 1.96 × 0.01] = [0.0904, 0.1296]

3. Using the regression results in column (3):

a) Are there any important regional differences? Use an appropriate hypothesis test to explain your answer.

Solution: The W-statistic for testing that the coefficients on the regional regressors are all zero is 65.61. The 5% critical value from the chi-squared distribution with 3 degrees of freedom, $G_3^{-1}(0.95)$, is 7.81. Because 65.61 > 7.81, the regional effects are significant at the 5% level.

b) Juan is a 32-year-old male high school graduate from the North. Mel is a 32-year-old male high school graduate from the West. Ari is a 32-year-old male high school graduate from the East.

i. Construct a 95% confidence interval for the difference in expected earnings between Juan and Mel.

Solution: The expected difference between Juan and Mel is $(X_{4,\text{Juan}} - X_{4,\text{Mel}}) \times \beta_4 = (1 - 0) \times \beta_4 = \beta_4$, which is estimated as 0.175. Thus a 95% confidence interval is [0.175 ± 1.96 × 0.037] = [0.10248, 0.24752].

ii.
Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juan and Ari. (Hint: What would happen if you included West and excluded East from the regression?)

Solution: The expected difference is
$$(X_{4,\text{Juan}} - X_{4,\text{Ari}}) \times \beta_4 + (X_{7,\text{Juan}} - X_{7,\text{Ari}}) \times \beta_7 = \beta_4 - \beta_7,$$
so the confidence interval is
$$\text{C.I.} = [\beta_4 - \beta_7 \pm 1.96 \times se(\beta_4 - \beta_7)] = [(0.175 - (-0.102)) \pm 1.96 \times se(\beta_4 - \beta_7)].$$
We do not know $se(\beta_4 - \beta_7)$, because $Var(\beta_4 - \beta_7) = Var(\beta_4) + Var(\beta_7) - 2\,Cov(\beta_4, \beta_7)$ and the regression table does not report the covariance between $\beta_4$ and $\beta_7$. To calculate the confidence interval, we could drop East from the regression and include West instead, so that East becomes the base category. The new coefficient on North would then measure the difference between North and East, and its reported standard error would be the standard error of that difference.

4. The regression shown in column (2) was estimated again, this time using data from 1993 (5000 observations selected at random and converted into 2007 units using the Consumer Price Index). The results are (standard errors in parentheses):

$$\log AWE = \underset{(0.20)}{9.32} + \underset{(0.019)}{0.301}\,\text{Highschool} + \underset{(0.047)}{0.562}\,\text{Male} + \underset{(0.002)}{0.011}\,\text{Age}$$
$\bar{R}^2 = 0.85$, SER = 1.25

Comparing this regression to the regression for 2012 shown in column (2), was there a statistically significant change in the coefficient on High school?

Solution: We can perform a t-test on the difference between the two coefficients, which is approximately normal because each coefficient is approximately normal. Because the samples are both random and independent, so are the two coefficients, and thus
$$Var(\beta^{HS}_{2007} - \beta^{HS}_{1993}) = Var(\beta^{HS}_{2007}) + Var(\beta^{HS}_{1993})$$
(because $Cov(\beta^{HS}_{2007}, \beta^{HS}_{1993}) = 0$). The test statistic is
$$t = \frac{(\beta^{HS}_{2007} - \beta^{HS}_{1993}) - 0}{se(\beta^{HS}_{2007} - \beta^{HS}_{1993})} = \frac{0.373 - 0.301}{\sqrt{(0.021)^2 + (0.019)^2}} \approx 2.54 > 1.96 \text{ (critical value)}.$$
Thus, we find a statistically significant difference between the coefficients.

5.
In all of the regressions in the previous exercises, the coefficient on Male is positive, large, and statistically significant. Do you believe this provides strong statistical evidence of gender discrimination in terms of wages in the labor market? Could it reflect other types of gender discrimination?

Solution: The previous regressions show that, even after accounting for age, education, and region, men are paid more than women. However, there might be factors correlated with being male which are the true cause of that higher pay: men in the sample might work in sectors or careers that are better paid than those in which women work, or they might have more years of experience at the same age if, for instance, women take maternity leave and men do not. The estimated coefficients might thus overestimate discrimination in terms of wages: companies would be paying women less not because they are women, but because of those extra factors. This would be an example of omitted variable bias.

Econometrically, with a stylized example, if the true model is
$$y_i = \beta_0 + \beta_1\,\text{male}_i + \beta_2\,\text{type of job}_i + u_i$$
and instead we fit the regression
$$y_i = \beta_0 + \beta_1\,\text{male}_i + v_i, \quad v_i = \beta_2\,\text{type of job}_i + u_i,$$
then, if $\beta_2 \neq 0$ and $Cov(\text{male}_i, \text{type of job}_i) \neq 0$, the OLS assumption $E[v \mid X] = 0$ is violated and the OLS coefficient is biased. To visualize this bias, notice that the formula for the OLS coefficient is
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,v_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \xrightarrow{p} \beta_1 + \frac{Cov(X, v)}{Var(X)} = \beta_1 + \beta_2 \times \frac{Cov(X, Z)}{Var(X)}.$$

To correct this bias, the variable type of job (Z in the bias formula) must be added to the regression. We call such variables, included to reduce or eliminate omitted variable bias, "control variables". Once such control variables are added, the coefficient $\beta_1$ no longer answers "do men of the same age get paid the same as women?" but instead answers "do men of the same age get paid the same as women for the same job and level of experience?".
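The bias formula above can be checked with a small simulation. The sketch below is illustrative only: the data-generating process, the parameter values (`beta1 = 0.10`, `beta2 = 0.30`), the job-type probabilities, and all function names are assumptions made for this example and are not taken from the exercise data. It uses only the Python standard library.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def demean_by_group(values, groups):
    """Subtract from each value the mean of the group it belongs to."""
    means = {}
    for g in set(groups):
        members = [v for v, gi in zip(values, groups) if gi == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, groups)]


def simulate(n=50_000, beta1=0.10, beta2=0.30, seed=0):
    """Simulate y = 1 + beta1*male + beta2*job + u (hypothetical values)."""
    rng = random.Random(seed)
    male, job, y = [], [], []
    for _ in range(n):
        m = 1 if rng.random() < 0.5 else 0
        # Assumed for illustration: men hold the better-paid job type more often.
        z = 1 if rng.random() < (0.7 if m == 1 else 0.3) else 0
        u = rng.gauss(0.0, 0.5)
        male.append(m)
        job.append(z)
        y.append(1.0 + beta1 * m + beta2 * z + u)

    # Short regression: job type is omitted.
    short = ols_slope(male, y)
    # Value predicted by the bias formula: beta1 + beta2 * Cov(X, Z) / Var(X).
    predicted = beta1 + beta2 * ols_slope(male, job)
    # Regression with the control: for a binary control, this equals the slope
    # after demeaning both variables within job type (Frisch-Waugh-Lovell).
    long_coef = ols_slope(demean_by_group(male, job), demean_by_group(y, job))
    return short, predicted, long_coef


short, predicted, long_coef = simulate()
print(f"coefficient on male, job type omitted:  {short:.3f}")
print(f"beta1 + beta2 * Cov(X, Z) / Var(X):     {predicted:.3f}")
print(f"coefficient on male, job type included: {long_coef:.3f}")
```

Under these assumed values, Cov(X, Z)/Var(X) is about 0.4, so the coefficient from the regression that omits job type should lie close to beta1 + 0.4 × beta2, while the coefficient from the regression that controls for job type should lie close to beta1.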
In the literature, when such variables are added, the difference in wages between women and men decreases but remains statistically significant.

Importantly, even if the coefficient on Male were to become statistically insignificant after adding such variables, this would not be evidence against gender discrimination in general, only evidence against gender discrimination in terms of wages. Even if women were paid the same as men for the same job and the same level of experience, a new question would appear: why are the jobs popular with women less well paid than the jobs popular with men? This can reflect gender discrimination at the stage of selecting a job.

Yet another example is the case of maternity leave: even if the difference in wages between men and women were fully explained by the fact that women take maternity leave and men do not, there could still be gender discrimination. In certain countries, it might be legal discrimination, e.g., only women can take paid maternity leave. If the law already ensures equal treatment of men and women, the difference might still reflect discrimination in society based on gender roles.
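The interval and test arithmetic used in the solutions to Exercises 1 to 4 follows one pattern, which can be sketched in a few lines of Python as a numerical check. The function names are hypothetical, and the value 0.373 used for the column (2) High School coefficient is an assumption, since the regression table is not reproduced in this sheet.

```python
import math

Z_95 = 1.96  # two-sided 5% critical value used throughout the solutions


def t_stat(coef, se, null=0.0):
    """t-statistic for the null hypothesis coef = null."""
    return (coef - null) / se


def conf_int(coef, se, z=Z_95):
    """Confidence interval coef +/- z * se, returned as (lower, upper)."""
    return coef - z * se, coef + z * se


def diff_t_stat(b1, se1, b2, se2):
    """t-statistic for b1 - b2 when the two estimates are independent,
    so that Var(b1 - b2) = se1**2 + se2**2."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Exercise 1a: High School coefficient in column (1).
print(t_stat(0.352, 0.021), conf_int(0.352, 0.021))

# Exercise 2b: a 10-year age gap scales the estimate and its standard error.
print(conf_int(10 * 0.011, 10 * 0.001))

# Exercise 4: 0.373 is an assumed value for the column (2) coefficient.
print(diff_t_stat(0.373, 0.021, 0.301, 0.019))
```

The helper `diff_t_stat` only applies when the two estimates come from independent samples, as in Exercise 4. It would not be valid for Exercise 3b(ii), where both coefficients come from the same regression and their covariance is needed.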