24020447

Q1:

(i) It seems reasonable to assume that dist and u are uncorrelated, because classrooms are not usually assigned with students' convenience in mind.

(ii) The variable dist must be partially correlated with atndrte. More precisely, in the reduced form

$$\mathit{atndrte} = \pi_0 + \pi_1\,\mathit{priGPA} + \pi_2\,\mathit{ACT} + \pi_3\,\mathit{dist} + v,$$

we must have $\pi_3 \neq 0$. Given a sample of data, we can test $H_0: \pi_3 = 0$ against $H_1: \pi_3 \neq 0$ using a t test.

(iii) We now need instrumental variables for atndrte and the interaction term, priGPA⋅atndrte. (Even though priGPA is exogenous, atndrte is not, and so priGPA⋅atndrte is generally correlated with u.) Under the exogeneity assumption that $E(u \mid \mathit{priGPA}, \mathit{ACT}, \mathit{dist}) = 0$, any function of priGPA, ACT, and dist is uncorrelated with u. In particular, the interaction priGPA⋅dist is uncorrelated with u. If dist is partially correlated with atndrte, then priGPA⋅dist is partially correlated with priGPA⋅atndrte. So, we can estimate the equation

$$\mathit{stndfnl} = \beta_0 + \beta_1\,\mathit{atndrte} + \beta_2\,\mathit{priGPA} + \beta_3\,\mathit{ACT} + \beta_4\,\mathit{priGPA}\cdot\mathit{atndrte} + u$$

by 2SLS using the IVs dist, priGPA, ACT, and priGPA⋅dist. It turns out this is not generally optimal. It may be better to add $\mathit{priGPA}^2$ and priGPA⋅ACT to the instrument list. This would give us overidentifying restrictions to test.

Q3:

(i) Family income and background variables, such as parents' education.

(ii) The population model is

$$\mathit{score} = \beta_0 + \beta_1\,\mathit{girlhs} + \beta_2\,\mathit{faminc} + \beta_3\,\mathit{meduc} + \beta_4\,\mathit{feduc} + u_1,$$

where the variables are self-explanatory.

(iii) Parents who are supportive and motivated to have their daughters do well in school may also be more likely to enroll their daughters in a girls' high school. It seems likely that girlhs and $u_1$ are correlated.

(iv) Let numghs be the number of girls' high schools within a 20-mile radius of a girl's home. To be a valid IV for girlhs, numghs must satisfy two requirements: it must be uncorrelated with $u_1$, and it must be partially correlated with girlhs.
The second requirement probably holds, and it can be tested by estimating the reduced form

$$\mathit{girlhs} = \pi_0 + \pi_1\,\mathit{faminc} + \pi_2\,\mathit{meduc} + \pi_3\,\mathit{feduc} + \pi_4\,\mathit{numghs} + v_2$$

and testing numghs for statistical significance. The first requirement is more problematic. Girls' high schools tend to locate in areas where there is demand for them, and this demand can reflect the seriousness with which people in the community view education. Some areas of a state have better students on average for reasons unrelated to family income and parents' education, and these reasons might be correlated with numghs. One possibility is to include community-level variables that can control for differences across communities.

Q2:

Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ and let z be a binary instrumental variable for x. By equation (15.10) in the textbook, the IV estimator of $\beta_1$ is

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^{n} (z_i - \bar z)(x_i - \bar x)},$$

where n is the sample size and $\bar z$, $\bar x$, and $\bar y$ are the sample averages of z, x, and y, respectively.

Since z is binary, there are two groups of observations: those with $z_i = 0$ and those with $z_i = 1$. Let $n_0$ and $n_1$ be the sample sizes of these groups, so that $n = n_0 + n_1$ and $\bar z = n_1/n$. Let $\bar y_0$ and $\bar x_0$ be the sample averages of y and x over the group with $z_i = 0$, and let $\bar y_1$ and $\bar x_1$ be the sample averages over the group with $z_i = 1$.

Start with the numerator. Because $\sum_{i=1}^{n} (z_i - \bar z) = 0$, the term involving $\bar y$ drops out:

$$\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y) = \sum_{i=1}^{n} (z_i - \bar z)\,y_i = \sum_{i=1}^{n} z_i y_i - \bar z \sum_{i=1}^{n} y_i = n_1 \bar y_1 - \frac{n_1}{n}\, n \bar y = n_1(\bar y_1 - \bar y).$$

Here $\sum_i z_i y_i = n_1 \bar y_1$ because $z_i y_i$ equals $y_i$ when $z_i = 1$ and zero otherwise. The overall average is a weighted average of the group averages, $\bar y = (n_0 \bar y_0 + n_1 \bar y_1)/n$, so

$$\bar y_1 - \bar y = \frac{n_0}{n}(\bar y_1 - \bar y_0),$$

and therefore

$$\sum_{i=1}^{n} (z_i - \bar z)(y_i - \bar y) = \frac{n_0 n_1}{n}(\bar y_1 - \bar y_0).$$

The same argument with x in place of y gives the denominator:

$$\sum_{i=1}^{n} (z_i - \bar z)(x_i - \bar x) = \frac{n_0 n_1}{n}(\bar x_1 - \bar x_0).$$

Taking the ratio, the common factor $n_0 n_1 / n$ cancels (it is nonzero as long as both groups are nonempty), and we get

$$\hat\beta_1 = \frac{\bar y_1 - \bar y_0}{\bar x_1 - \bar x_0},$$

provided $\bar x_1 \neq \bar x_0$. This is the grouping estimator, first suggested by Wald (1940). Thus, starting from equation (15.10), the IV estimator of $\beta_1$ in the simple regression model with a binary instrument equals the difference in the group averages of y divided by the difference in the group averages of x.
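
As a quick numerical check of the Q2 result, the sketch below computes the IV slope in its usual covariance-ratio form, $\sum (z_i - \bar z)(y_i - \bar y) / \sum (z_i - \bar z)(x_i - \bar x)$, and the grouping estimator on the same simulated data. Everything about the simulation is an assumption made for illustration and is not part of the problem: the function names `iv_estimate`, `wald_estimate`, and `simulate`, the sample size, the share of observations with z = 1, and the coefficient values. It uses only the Python standard library.

```python
import random


def iv_estimate(z, x, y):
    """IV slope: sum (z - zbar)(y - ybar) / sum (z - zbar)(x - xbar)."""
    n = len(z)
    zbar = sum(z) / n
    xbar = sum(x) / n
    ybar = sum(y) / n
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den


def wald_estimate(z, x, y):
    """Grouping estimator: (ybar1 - ybar0) / (xbar1 - xbar0)."""
    def group_mean(values, group):
        picked = [v for zi, v in zip(z, values) if zi == group]
        return sum(picked) / len(picked)

    return (group_mean(y, 1) - group_mean(y, 0)) / (
        group_mean(x, 1) - group_mean(x, 0)
    )


def simulate(n=500, seed=0):
    """Hypothetical data: binary z shifts x; a shared shock makes x endogenous."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.4 else 0
        shock = rng.gauss(0, 1)  # enters both x and u, so OLS would be biased
        xi = 1.0 + 0.8 * zi + shock + rng.gauss(0, 1)
        yi = 2.0 + 0.5 * xi + shock + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
b_iv = iv_estimate(z, x, y)
b_wald = wald_estimate(z, x, y)
print("IV estimate:      ", b_iv)
print("Grouping estimate:", b_wald)
print("Agree up to rounding:", abs(b_iv - b_wald) < 1e-9)
```

The two numbers should agree up to floating-point rounding for any sample in which both groups are nonempty and the group averages of x differ, which is exactly what the algebra says.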