Economics 102C
Problem Set 3
Due Monday June 1 at the Beginning of Class

Question 1

Consider the following panel data regression equation:

$$y_{it} = \beta x_{it} + f_i + u_{it}$$

where $y_{it}$ are log earnings, $x_{it}$ is tenure (number of years spent working for the current firm), $f_i$ is an individual fixed effect, and $u_{it}$ a disturbance term orthogonal to both $f_i$ and $x_{it}$ (the constant term is omitted for simplicity). The dimension of the panel is defined by $i = 1, 2, \ldots, N$ and $t = 1, 2$. Note that there are only two years of data. It is assumed that $f_i$ and $x_{it}$ are correlated: $E(f_i \mid x_{it}) \neq 0$.

A. (5 points) Suppose you want to estimate this model in first differences using OLS:

$$y_{it} - y_{it-1} = \beta (x_{it} - x_{it-1}) + (u_{it} - u_{it-1})$$

Assume:

$$\operatorname*{plim}_{N \to \infty} \frac{\sum_{i=1}^{N} (x_{it} - x_{it-1})^2}{N} = \Sigma_x$$

$$\operatorname*{plim}_{N \to \infty} \frac{\sum_{i=1}^{N} (x_{it} - x_{it-1})(u_{it} - u_{it-1})}{N} = 0$$

Prove that the OLS estimator of $\beta$ is consistent.

B. (10 points) Assume now that tenure is measured with error, i.e. instead of observing the true tenure $x_{it}$ we observe:

$$z_{it} = x_{it} + \varepsilon_{it}$$

where the measurement error $\varepsilon_{it}$ is classical, orthogonal to $x_{it}$, and also to $f_i$ and $u_{it}$. Suppose you estimate the model in first difference by OLS ignoring the measurement error problem. Characterize the asymptotic bias of the OLS estimator of $\beta$. Assume:

$$\operatorname*{plim}_{N \to \infty} \frac{\sum_{i=1}^{N} (z_{it} - z_{it-1})^2}{N} = \Sigma_x + \Sigma_\varepsilon$$

$$\operatorname*{plim}_{N \to \infty} \frac{\sum_{i=1}^{N} (z_{it} - z_{it-1})(u_{it} - u_{it-1})}{N} = 0$$

C. (10 points) Suppose that the measurement problem arises only for those who switch firm, i.e.

$$z_{it} - z_{it-1} = x_{it} - x_{it-1}$$

for those who work for the same firm at $t$ and $t - 1$, while:

$$z_{it} - z_{it-1} = x_{it} - x_{it-1} + \varepsilon_{it} - \varepsilon_{it-1}$$

for those who switch firm at $t$. In other words:

$$z_{it} - z_{it-1} = (x_{it} - x_{it-1})(1 - S_{it}) + (x_{it} - x_{it-1} + \varepsilon_{it} - \varepsilon_{it-1}) S_{it}$$

where $S_{it}$ is an indicator that equals 1 if the person switches firm at $t$ and 0 otherwise. Assume

$$\operatorname*{plim}_{N \to \infty} \frac{\sum_{i=1}^{N} S_{it}}{N} = \pi_s < 1$$

Show that the asymptotic bias of the OLS estimator of $\beta$ in the model in first differences is now smaller than in the previous case B.

D. (5 points) Discuss the problems you are likely to face if, in the case considered in C., you want to estimate the return to tenure by using only the observations on those who do not switch firm at $t$.

Question 2

A researcher posits the following panel data model to explain the earnings of individual workers:

$$y_{it} = \alpha_i + \beta_i e_{it} + \gamma x_{it} + u_{it}$$

where $e_{it}$ is the labor market experience of individual $i$ in year $t$ and $x_{it}$ is a dummy for whether the person belongs to a union. This model is sometimes called the "heterogeneous growth model" of earnings: individuals have different intercepts and slopes of their earnings profiles, by labor market experience. The assumptions of the model are:

$$E(\alpha_i \mid x_{it}) \neq 0$$

$$E(\beta_i \mid x_{it}) \neq 0$$

(i.e., $\alpha_i$ and $\beta_i$ are "fixed" effects), and

$$E(u_{it} \mid x_{it}) = 0$$

You can also assume

$$E(\alpha_i \mid u_{it}) = E(\beta_i \mid u_{it}) = 0$$

A. (15 points) Assume that the panel is balanced, individuals are followed for three years, and they all work in these three years (and hence $e_{it+1} = e_{it} + 1$). Suggest an empirical strategy to estimate $\gamma$ without bias.

B. (15 points) The researcher runs an OLS regression of $y_{it}$ on a constant and $x_{it}$ using two different samples: individuals in their first year in the labor market (when experience is 0, Sample 1), and individuals in their second year in the labor market (when experience is 1, Sample 2). She finds the following results:

                           Sample 1    Sample 2
Estimate $\hat{\gamma}$    0.10        0.18
                           (0.03)      (0.04)

Can you say anything about the correlation between $\beta_i$ and $x_{it}$? Show your work.

Question 3

This problem will lead you through some of the techniques used in the paper, “Missing Women and the Price of Tea in China: The Effect of Sex-Specific Earnings on Sex Imbalance”. This paper uses a policy change regarding agricultural prices and regulations to demonstrate the effect of an increase in female-specific income on the sex ratio and examine potential explanations. Use the dataset DD_data for parts 1–3.
There is a do-file to get you started with the dataset (run these commands prior to attempting the questions). Then use the dataset Year_data for part 4.

1. (5 points) Exploring the data
(a) These regressions are not run using individual-level observations. What is the unit of analysis here?
(b) Produce a table like Table 2, presenting summary statistics of the data. Comment on any interesting features of the table.

2. (10 points) Fixed effects
(a) Why might we think that region has a direct effect on the sex ratio?
(b) Why might we think that birth period (cohort) has a direct effect on the sex ratio?
(c) We can control for these direct effects using a fixed effects regression. Explain (briefly) how you can use dummy variables to implement a fixed effect strategy.
(d) In light of your above explanation and the goal of the paper, do you think you should use the areg or xi:reg command? Justify your choice.

3. (15 points) Tea and the sex ratio
(a) The paper begins its empirical analysis with DD estimation, using the specification given in equation (2). Duplicate this regression in Stata. Hint: use the areg command and absorb the region effects. State the β and δ coefficients.
(b) Give one-line interpretations of each of the β and δ coefficients.
(c) We need to use survey weights to weight the regressions. The relevant weight is birpop. Rerun the regression for equation (2) using weights (hint: you want to use aweights in Stata for this). Discuss the differences between your results for this and your results for part (a).
(d) State the β and δ coefficients for the weighted DD estimation. Are these significant?
(e) The author uses two different measures of tea and orchards. First, a continuous measure of how much tea (or orchard) is grown, and second, a dummy variable for whether tea (orchard) is grown or not. Redo the DD estimation with the dummy variables. Comment on any differences in your results.
(f) Put the results from all of these regressions into a table.
Hint: use either the outreg2 or the estout (esttab) command in Stata for this.
(g) The author’s next step is to control for cohort-region effects. The above specifications controlled for cohort and region effects, but not for effects which differed by regions and cohorts. Draw a graph to illustrate how the interaction of cohort (pre- and post-reform) and region adds to the flexibility of the specification.

4. (10 points) Year by year regressions. For this part use the dataset Yearly_Data.dta. This dataset is like DD_data, but already contains the interaction terms you will require for the exercise.
(a) The paper also adopts a more flexible difference-in-differences specification given in equation (3). Duplicate this regression in Stata. Do this both for the continuous measures of tea and orchard as well as for the dummy variables for whether or not tea (orchard) is grown.
(b) Make a table showing only the coefficients on the interactions between tea and year of birth.
(c) Draw a picture like Figure V of the resulting coefficients.
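
The dummy-variable approach to fixed effects mentioned in part 2(c) of Question 3 can be checked numerically. The sketch below is only an illustration on simulated data in plain Python: it does not use DD_data or Stata, and every name in it (`simulate`, `slope_with_dummies`, `slope_within`, the parameter values) is hypothetical. It compares the slope from an OLS regression that includes one dummy per group with the slope from a regression on group-demeaned variables.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n_groups=5, n_per_group=40, beta=0.5, seed=0):
    """Simulated grouped data in which the regressor is correlated with the group effect."""
    rng = random.Random(seed)
    group, x, y = [], [], []
    for g in range(n_groups):
        effect = rng.gauss(0, 1)  # unobserved group (fixed) effect
        for _ in range(n_per_group):
            xi = effect + rng.gauss(0, 1)
            group.append(g)
            x.append(xi)
            y.append(beta * xi + 2.0 * effect + rng.gauss(0, 0.1))
    return group, x, y


def slope_with_dummies(group, x, y):
    """Slope on x when one dummy per group is included (no separate constant)."""
    n_groups = max(group) + 1
    X = [[xi] + [1.0 if g == j else 0.0 for j in range(n_groups)]
         for g, xi in zip(group, x)]
    return ols(X, y)[0]


def slope_within(group, x, y):
    """Slope from regressing group-demeaned y on group-demeaned x."""
    n_groups = max(group) + 1

    def demean(v):
        sums, counts = [0.0] * n_groups, [0] * n_groups
        for g, vi in zip(group, v):
            sums[g] += vi
            counts[g] += 1
        return [vi - sums[g] / counts[g] for g, vi in zip(group, v)]

    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


group, x, y = simulate()
b_dummies = slope_with_dummies(group, x, y)
b_within = slope_within(group, x, y)
print(b_dummies, b_within)
```

Under these assumptions the two slopes agree up to floating-point error, and both are close to the simulated value of `beta`.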