Assignment 2
Empirical methods in economics IIa, HT22
Ellen Bäcklinder & Olivia Campbell

1. The Instrumental Variable Approach

Let's say that we have an equation of interest: $Y_i = \beta_0 + \beta_1 X_i + u_i$. However, we think that $X_i$ is endogenous, i.e. $\mathrm{cov}(X_i, u_i) \neq 0$. We have another variable, $Z_i$, that predicts $X_i$ and only affects the outcome variable $Y_i$ through the endogenous variable, $X_i$. This means that we can find the causal effect of X on Y using an Instrumental Variable (IV) approach.

a) Write down the first stage regression.

$$X_i = \gamma_0 + \gamma_1 Z_i + v_i$$

b) Write down the reduced form equation.

$$Y_i = \delta_0 + \delta_1 Z_i + \varepsilon_i$$

c) Write down the $\hat{\beta}_1^{IV}$ estimate given by the first stage and reduced form above.

$\hat{\beta}_1^{IV}$ without first stage and reduced form:

$$\hat{\beta}_1^{IV} = \frac{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})(Z_i - \bar{Z})}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Z_i - \bar{Z})} = \frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(Z_i - \bar{Z})}{\sum_{i=1}^{n}(X_i - \bar{X})(Z_i - \bar{Z})}$$

$\hat{\beta}_1^{IV}$ with first stage and reduced form:

First stage OLS estimate: $$\hat{\gamma}_1 = \frac{C(X_i, Z_i)}{V(Z_i)}$$

Reduced form OLS estimate: $$\hat{\delta}_1 = \frac{C(Y_i, Z_i)}{V(Z_i)}$$

$$\hat{\beta}_1^{IV} = \frac{\hat{\delta}_1}{\hat{\gamma}_1} = \frac{C(Y_i, Z_i)/V(Z_i)}{C(X_i, Z_i)/V(Z_i)} = \frac{C(Y_i, Z_i)}{C(X_i, Z_i)}$$

d) In just a few sentences: What is it we do in IV regressions that lets us claim causal effects, even though $X_i$ is endogenous?

IV uses only the part of the variation in X that is driven by the instrument Z. If Z is relevant (it predicts X) and exogenous (it is uncorrelated with $u_i$), this part of X is uncorrelated with the error term. Using only this exogenous variation lets us estimate the causal effect of X on Y even though X itself is endogenous.

e) How do you test if the instrument is exogenous? That is, that the instrument does not have a causal effect on Y.

An exogenous instrument has no direct causal effect on Y; it affects Y only through its effect on X. If an instrument is exogenous, then $C(u_i, Z_i) = 0$. This cannot be tested directly, because $u_i$ is unobserved; you have to argue from intuition and knowledge of the setting whether the instrument is exogenous or not. However, you can test whether Z is correlated with observed covariates, i.e. whether $C(Z_i, W_i) = 0$, where W is a covariate. If Z is uncorrelated with the observed covariates, it is more plausible that Z is also uncorrelated with the unobserved determinants of Y.
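As a rough numerical illustration of the formulas in c), the sketch below simulates data in which X is endogenous and checks that the ratio of the reduced form slope to the first stage slope equals C(Y, Z)/C(X, Z). Everything in it is an assumption made for the example (the simulated data, the true slope of 2, the first stage coefficient of 0.5, the seed, and the helper name `ols_slope`); none of it comes from the assignment data.

```python
import numpy as np

# Synthetic data: all parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, size=n).astype(float)   # binary instrument Z
u = rng.normal(size=n)                          # unobserved error term u
x = 0.5 * z + 0.8 * u + rng.normal(size=n)      # X depends on u, so cov(X, u) != 0
y = 1.0 + 2.0 * x + u                           # true beta_1 = 2


def ols_slope(dep, reg):
    """Slope from a bivariate OLS regression of dep on reg: C(dep, reg) / V(reg)."""
    return np.cov(dep, reg)[0, 1] / np.var(reg, ddof=1)


gamma1 = ols_slope(x, z)   # first stage: X on Z
delta1 = ols_slope(y, z)   # reduced form: Y on Z

beta_iv_ratio = delta1 / gamma1                          # delta_1 / gamma_1
beta_iv_cov = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]    # C(Y, Z) / C(X, Z)
beta_ols = ols_slope(y, x)                               # biased, since X is endogenous

print("IV (reduced form / first stage):", beta_iv_ratio)
print("IV (covariance ratio):          ", beta_iv_cov)
print("OLS:                            ", beta_ols)
```

In this simulated setting the two IV expressions coincide up to floating-point error and lie close to the true slope, while OLS is pushed upward by the positive correlation between X and u.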
Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size, by Angrist and Evans, AER 1998

2. Questions regarding the article

a) Which variable would you say is Angrist and Evans' endogenous variable (X)? Which is their instrumental variable (Z)?

Their endogenous variable is fertility, measured as having more than two children. Their main instrumental variable is same sex (the first two children are of the same sex), but they also use twins at the second birth as a second instrument. Both instruments are dummies.

b) What is the initial endogeneity problem when estimating the effect of childbearing on labour supply?

The initial problem is that fertility is endogenous: the decision to have children and the decision to work are made jointly, so unobserved factors (for example preferences for family versus career) can affect both, and labor market prospects can themselves influence fertility. A simple OLS comparison therefore does not identify a causal effect. An instrument is needed to isolate exogenous variation in fertility and thereby estimate the causal effect of fertility on labor supply.

c) Do you think their instrument satisfies the exclusion restriction? Why/why not?

The exclusion restriction requires that Z only affects Y through X. I believe that the instrument same sex satisfies this restriction, but not the instrument twins. Having twins affects women's ability to work to a much larger extent than having non-twins does. It is more costly to put two children into childcare at the same time, and not everyone will be able to afford it even if the woman goes back to work right after birth. This might lead to women not going back to work; hence twins may have a direct causal effect on labor supply. Another reason that I don't believe that twins satisfy the exclusion restriction is that having twins is more time consuming than having non-twins. There might not be enough time to both work and take care of the twins (as well as the household), which might affect the ability to work, either by working fewer hours or by not being able to work at all.

d) What is the causal effect Angrist and Evans have measured here? (Is it the labour supply response of having a child, or is it something more narrow?)

The causal effect that Angrist and Evans have measured is narrower than the general labor supply response to having a child: it is the effect of having more than two children (rather than exactly two) on labor supply for women aged 21-35 with 2 or more children.
However, Angrist and Evans do argue that their choice of sample is representative and general, which would give the results external validity. If the results are externally valid, the causal effect that they have measured can be read as how childbearing affects labor supply more generally.

3. Replication

Note: Since the output in Table 5 is for the full sample and we only use the sample consisting of married women in 1980 census (pums80.dta), I have included tables with the results you should be able to replicate, see page 4.

a) Replicate the last three rows of Table 3 and show your results in a table. The rows are: (1) one boy, one girl, (2) both same sex, and difference (2)-(1). Do this for your sample (Married women, 1980 PUMS), and only show results for the column: Fraction that had another child.
Hint: The Fraction that had another child for the rows (1) one boy, one girl, (2) both same sex, are mean values, with standard errors in parentheses. Use a regression to get the mean values and the standard errors (instead of standard deviations).

                 (1)          (2)          (3)
                 morekids     morekids     morekids
samesex                                    0.068***
                                           (0.002)
_cons            0.346***     0.414***     0.346***
                 (0.001)      (0.001)      (0.001)
N                125909       128745       254654
b coefficients; Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
Columns: (1) one boy, one girl; (2) both same sex; (3) full sample, where the samesex coefficient is the difference (2)-(1).

b) Replicate rows 1-4 (More than 2 children, ..., Weeks worked) in Column 1 (Mean difference by Same sex) of Table 5. Show your results in a table.
Hint: The independent variable in these regressions is "samesex". Again, standard errors are in parentheses.

                 (1)          (2)          (3)          (4)
                 morekids     kidcount     workedm      weeksm1
samesex          0.0675***    0.0825***    -0.0093***   -0.4263***
                 (0.0019)     (0.0030)     (0.0020)     (0.0867)
_cons            0.3464***    2.4661***    0.5329***    19.2339***
                 (0.0013)     (0.0021)     (0.0014)     (0.0618)
N                254654       254654       254654       254654
b coefficients; Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

c) Replicate rows 3 and 4 for column 2 (Wald estimate using as covariate: More than 2 children) in Table 5.
Use the following method: On page 458, equation 2 shows the IV estimate of β. Calculate $\bar{y}_1$, $\bar{y}_0$, $\bar{x}_1$, $\bar{x}_0$ and use these to get $\beta^{IV}$ in the same way as equation 2. What value do you get for $\beta^{IV}$ when y = Worked for pay? And when y = Weeks worked? Show your result in a table (you don't need to calculate standard errors).
Hint: First generate $y_{1i}$ (which is $y_i$ when $z_i = 1$), then generate $\bar{y}_1$ (the mean). Proceed with $y_{0i}$ (which is $y_i$ when $z_i = 0$) etc. Compare with your results in b), maybe they are useful?

d) Replicate the same results as in c), using the following method: Run a first stage regression (using "morekids" and "samesex"), predict the values from the first stage regressions, and then run IV regressions using these predicted values, for each of the two dependent variables. Show your results in a table (including standard errors!).

                    (1)          (2)
                    workedm      weeksm1
morekids_samesex    -0.14***     -6.31***
                    (0.029)      (1.284)
_cons               0.58***      21.42***
                    (0.011)      (0.491)
N                   254654       254654
b coefficients; Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

e) Replicate the first 2 rows (Worked for pay and Weeks worked) for Column 5 in Table 7. That is, run 2SLS regressions with samesex as an instrument for More than 2 children, with Worked for pay and Weeks worked as dependent variables. Do this with and without control variables, and show your results in a table. Compare with your results in d), are they similar?
Hint: Use the Stata command ivregress. See table footnote for information about which control variables to include.

With controls
                    (1)          (2)
                    workedm      weeksm1
morekids            -0.12***     -5.46***
                    (0.028)      (1.212)
_cons               0.45***      8.21***
                    (0.014)      (0.586)
N                   254654       254654
b coefficients; Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001

Without controls
                    (1)          (2)
                    workedm      weeksm1
morekids            -0.14***     -6.31***
                    (0.029)      (1.275)
_cons               0.58***      21.42***
                    (0.011)      (0.487)
N                   254654       254654
b coefficients; Standard errors in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
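The link between b), c) and d) can be sketched numerically. The first part below only divides the samesex coefficients reported in the table for b) (reduced form divided by first stage), which should land close to the estimates in d) up to rounding of the inputs. The second part uses synthetic data, not pums80.dta, to show that the Wald estimate from group means and a manual two-step regression on first-stage fitted values give the same point estimate when the instrument is a dummy and there are no controls. The simulated data-generating process, seed and variable names are assumptions made for the example.

```python
import numpy as np

# Part 1: Wald ratios from the samesex coefficients reported in the table for b).
first_stage = 0.0675    # morekids on samesex
rf_workedm = -0.0093    # workedm on samesex
rf_weeksm1 = -0.4263    # weeksm1 on samesex
wald_workedm = rf_workedm / first_stage
wald_weeksm1 = rf_weeksm1 / first_stage
print("Wald, workedm:", wald_workedm)
print("Wald, weeksm1:", wald_weeksm1)

# Part 2: synthetic data (assumed for illustration only, not the census sample).
rng = np.random.default_rng(1)
n = 50_000
z = rng.integers(0, 2, size=n).astype(float)     # dummy instrument
u = rng.normal(size=n)                            # unobserved confounder
x = (0.3 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)  # endogenous dummy
y = 1.0 - 0.5 * x + u + rng.normal(size=n)        # outcome

# Wald estimate from the four group means: (y1 - y0) / (x1 - x0).
y1_bar, y0_bar = y[z == 1].mean(), y[z == 0].mean()
x1_bar, x0_bar = x[z == 1].mean(), x[z == 0].mean()
wald = (y1_bar - y0_bar) / (x1_bar - x0_bar)

# Manual two-step: regress x on z, then regress y on the fitted values.
Z = np.column_stack([np.ones(n), z])
gamma = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = Z @ gamma
X_hat = np.column_stack([np.ones(n), x_hat])
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
two_step = beta[1]

print("Wald (group means):", wald)
print("Manual two-step:   ", two_step)
```

The first part gives roughly -0.14 for workedm and -6.3 for weeksm1. The two-step procedure reproduces the point estimate, but the standard errors from a manually run second stage are not the correct 2SLS standard errors, because the residuals are computed with the fitted values instead of the actual regressor; this is one plausible reason the standard errors in d) differ slightly from the ivregress results without controls in e).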