Statistics and Causal Inference, by Holland, JASA 1986

I.   Association
II.  Potential Responses Framework
III. Fundamental Problem of Causal Inference
IV.  Scientific and Statistical Solutions to the Fundamental Problem of Causal Inference
V.   Some of the many complications and being careful!
     a) What is the population?
     b) What is the cause?
     c) SUTVA

I am not making any attempt to cover all of the issues raised by Holland; rather, I am covering the issues I currently think are important.

I. Association

U = the population = all preschool children in the US with parental income <= 2 * poverty level.
The u's are children; they are the units in the population.

S(u) = t if u attended HeadStart
       c if u did not attend HeadStart

Y(u) = total years of education achieved

An associational question: Do children who attend HeadStart achieve a higher level of education than children who do not attend HeadStart?

Answer: Compare the average value of Y among children who attend HeadStart to the average value of Y among children who do not attend HeadStart. Quantitatively:

   E[Y | S=t] - E[Y | S=c]

How might we try to answer the following questions?

Jack did not attend HeadStart. What would Jack's educational achievement be if he had attended HeadStart, relative to his present educational achievement?

or

What are the educational consequences for children who attend HeadStart as compared to children who do not attend HeadStart? That is, does "Attending HeadStart" cause a greater level of educational achievement as compared to "Not Attending HeadStart"?

or

What are the educational consequences for girls who attend HeadStart as compared to girls who do not attend HeadStart?

II. Potential Responses Framework

U = the population = all preschool children in the US with parental income <= 2 * poverty level.
Each unit (child) has two potential responses, Yt(u) and Yc(u):

   Yt(u) is unit u's response if unit u attends HeadStart
   Yc(u) is unit u's response if unit u does not attend HeadStart

   u        Yt(u)   Yc(u)   individual effect of HeadStart
   Jack      14       6        8
   Jane       6       5        1
   Nancy     12      12        0
   Tonya     11       6        5
   David     10      11       -1

Causal Effects:

The effect of HeadStart on Jack is 14 - 6 = 8 years of educational achievement.
The average effect of HeadStart is (8+1+0+5-1)/5 = 2.6 additional years of educational achievement.
The average effect of HeadStart on girls is (1+0+5)/3 = 2 additional years of educational achievement.

Quantitative expression of the Causal Effects:

The effect of HeadStart on Jack is Yt(u)-Yc(u), where Jack is unit u.
The average effect of HeadStart can be written as E[Yt]-E[Yc].
The average effect of HeadStart on girls is E[Yt | girl] - E[Yc | girl].

III. The Fundamental Problem of Causal Inference

We observe ONLY one of unit u's potential responses!

   u        Yt(u)   Yc(u)   S(u)
   Jack      14       ?      t
   Jane       ?       5      c
   Nancy     12       ?      t
   Tonya     11       ?      t
   David      ?      11      c

Without further assumptions we certainly cannot ascertain the effect of HeadStart on Jack, nor can we estimate the average effect of HeadStart, nor can we estimate the average effect of HeadStart on girls.

We observe S(u); recall that S(u) is the cause to which unit u is exposed. In our example, S(u) tells us whether or not unit u attends HeadStart.

Quantitatively, we get to observe S and YS (that is, Yt(u) when S(u)=t and Yc(u) when S(u)=c), but we wish to estimate Yt(u)-Yc(u), or E[Yt]-E[Yc], or E[Yt | girl] - E[Yc | girl].

IV. Some Solutions to the Fundamental Problem of Causal Inference

a) Scientific/Structural: These are assumptions concerning the relationship between Yt(u1), Yt(u2), Yc(u1), Yc(u2) for the units (the u's) in the population.

b) Statistical: These are assumptions regarding the relationship between S and (Yt,Yc).

Except in very special cases these assumptions are unverifiable. The assumptions must stem from a priori scientific knowledge or experimental design.
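The two tables in Sections II and III can be put side by side in a few lines of Python. This is only a sketch: the numbers are the ones in the tables, but the names POTENTIAL, EXPOSURE and the helper functions are made up for illustration. It computes the causal effects from the full table, then the difference in observed group means, which is all the Section III table allows.

```python
from statistics import mean

# Potential responses from the Section II table: (Yt, Yc, is_girl).
POTENTIAL = {
    "Jack":  (14, 6, False),
    "Jane":  (6, 5, True),
    "Nancy": (12, 12, True),
    "Tonya": (11, 6, True),
    "David": (10, 11, False),
}

# Exposure S(u) actually received, from the Section III table.
EXPOSURE = {"Jack": "t", "Jane": "c", "Nancy": "t", "Tonya": "t", "David": "c"}


def individual_effects(potential):
    """Yt(u) - Yc(u) for every unit (needs BOTH potential responses)."""
    return {u: yt - yc for u, (yt, yc, _) in potential.items()}


def average_effect(potential, girls_only=False):
    """E[Yt] - E[Yc], optionally restricted to the subpopulation of girls."""
    return mean(yt - yc for yt, yc, girl in potential.values()
                if girl or not girls_only)


def observed_responses(potential, exposure):
    """YS(u): the single response we actually get to see for each unit."""
    return {u: (potential[u][0] if s == "t" else potential[u][1])
            for u, s in exposure.items()}


def observed_mean_difference(observed, exposure):
    """E[YS | S=t] - E[YS | S=c], computed from observed data only."""
    treated = [y for u, y in observed.items() if exposure[u] == "t"]
    control = [y for u, y in observed.items() if exposure[u] == "c"]
    return mean(treated) - mean(control)


if __name__ == "__main__":
    print(individual_effects(POTENTIAL))
    print(average_effect(POTENTIAL))
    print(average_effect(POTENTIAL, girls_only=True))
    ys = observed_responses(POTENTIAL, EXPOSURE)
    print(observed_mean_difference(ys, EXPOSURE))
```

With these five children the average effect is 2.6, while the difference in observed means is 37/3 - 8, about 4.33. The observed data alone do not recover the causal quantity without further assumptions.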
b) Statistical:

Restrict interest to the average effect of HeadStart, E[Yt]-E[Yc] (or the average effect of HeadStart in a subpopulation, E[Yt | girl] - E[Yc | girl]).

Assumption: The selection of exposure, S, is INDEPENDENT of the potential responses, (Yt,Yc) (or the selection of exposure, S, is INDEPENDENT of the potential responses, (Yt,Yc), within the subpopulation of girls).

If the assumption is true, then we can use our data, (S, YS) on our sample of units, to estimate E[Yt]-E[Yc]. This is the case because

   E[YS | S=t] = E[Yt | S=t] = E[Yt]   (since Yt is independent of S)

and

   E[YS | S=c] = E[Yc | S=c] = E[Yc]   (since Yc is independent of S).

Thus estimating the average causal effect, E[Yt]-E[Yc], is the same as estimating E[YS | S=t] - E[YS | S=c], and we can do the latter with our sample means!

Note on Holland's paper: Regardless of whether the independence assumption holds, Holland calls

   TPF = E[YS | S=t] - E[YS | S=c]

the prima facie causal effect of t relative to c. We have seen that under the independence assumption the prima facie causal effect is equal to the average causal effect.

Note that the independence assumption holds when S is a randomization indicator, that is, when the units are assigned at random to t or c. This is what makes randomized trials so special: we KNOW the independence assumption holds in randomized trials.

a) Scientific/Structural:

Unit Homogeneity: Assume that Yt(u) is the same for all u. Assume Yc(u) is the same for all u. This is clearly not true in our last table (Jane and David do not have the same Yc). However, if this were true, then to ascertain the causal effect of HeadStart we would find a unit exposed to t, say u1, and a unit exposed to c, say u2. The effect of HeadStart is then Yt(u1)-Yc(u2).

Note that the average effect of HeadStart, E[Yt]-E[Yc], is equal to Yt(u1)-Yc(u2) under unit homogeneity.
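As a numerical aside on the statistical solution in b) above, the role of independence can be checked on simulated data. The sketch below is an illustration only: the distributions of Yt and Yc, the sample size, and the two assignment rules (randomized, self_selected) are invented and are not taken from Holland's paper. It compares the prima facie effect TPF with the average causal effect under coin-flip assignment and under an assignment that depends on Yc.

```python
import random
from statistics import mean


def simulate(assign, n=20000, seed=0):
    """Return (average causal effect, prima facie effect) for one simulated sample."""
    rng = random.Random(seed)
    effects, treated, control = [], [], []
    for _ in range(n):
        yc = rng.gauss(10, 2)       # made-up response without HeadStart
        yt = yc + rng.gauss(2, 1)   # made-up response with HeadStart
        effects.append(yt - yc)
        if assign(rng, yt, yc) == "t":
            treated.append(yt)      # we observe YS = Yt
        else:
            control.append(yc)      # we observe YS = Yc
    return mean(effects), mean(treated) - mean(control)


def randomized(rng, yt, yc):
    """S ignores (Yt, Yc): a fair coin flip."""
    return "t" if rng.random() < 0.5 else "c"


def self_selected(rng, yt, yc):
    """S depends on Yc: children who would do well anyway tend to attend."""
    return "t" if yc + rng.gauss(0, 1) > 10 else "c"


ace_r, tpf_r = simulate(randomized)
ace_s, tpf_s = simulate(self_selected)

if __name__ == "__main__":
    print(f"randomized:    ACE={ace_r:.2f}  TPF={tpf_r:.2f}")
    print(f"self-selected: ACE={ace_s:.2f}  TPF={tpf_s:.2f}")
```

Under the coin-flip rule the two numbers should roughly agree, up to sampling noise. Under the self-selection rule TPF should overstate the average causal effect, because the mean of Yc differs between the children who attend and those who do not.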
Thus, under unit homogeneity, it is no problem to estimate either the average effect of HeadStart or the effect of HeadStart from observations of S and YS for a sample of units. This is because we can estimate E[Yt] by Yt(u1), or even by the average of YS over the units with S=t (E[YS | S=t] = E[Yt | S=t] = Yt(u)).

This assumption is frequently found in a more general form. For example, you might assume that (Yt(u), Yc(u)) is the same for all girls (units) who live in East Detroit, who live with a single parent, and whose mother is alcoholic. Then you find two girls of this type, one of whom attended HeadStart and one of whom did not. The difference in their responses is then equal to the effect of HeadStart on girls of this type.

Constant Effect: This is another, more sophisticated yet weaker, version of Unit Homogeneity. Here we only assume that for all units in the population

   Yt(u) = Yc(u) + T     (so T does not vary by unit).

T is the causal effect of t relative to c.

The Constant Effect assumption must be combined with additional assumptions in order to estimate T = Yt(u)-Yc(u), or T = E[Yt]-E[Yc], or T = E[Yt | girl] - E[Yc | girl]. In particular, the prima facie causal effect of t relative to c is

   TPF = E[YS | S=t] - E[YS | S=c]
       = E[Yt | S=t] - E[Yc | S=c]
       = E[Yc + T | S=t] - E[Yc | S=c]
       = T + {E[Yc | S=t] - E[Yc | S=c]}

Without further assumptions there is no reason to believe that the mean of Yc among children who attended HeadStart should be equal to the mean of Yc among children who did not attend HeadStart.

c) Combinations of Scientific/Structural and Statistical Assumptions:

Researchers often combine these types of assumptions with great success. Recall that in general the data (as in the last table) cannot be used to prove or disprove the assumptions.

One way to combine assumptions: Assume Constant Effect (possibly within a subpopulation). T is the causal effect of HeadStart.
Then assume that the mean of Yc among children who attended HeadStart is equal to the mean of Yc among children who did not attend HeadStart (possibly within a subpopulation), or assume that Yc is independent of S. This yields TPF = T, so that you can use sample means to estimate T.

OR

Assume Unit Homogeneity. Because of measurement error we do not observe YS(u); rather, we observe

   YES(u) = YS(u) + errorS(u)     for S = t or c

on each unit. Thus Jane and David do not have the same response in the last table because we are observing YEt or YEc rather than the true response. Next make the statistical assumption that the measurement error (errort, errorc) has mean zero and is independent of S. Then again we have that

   E[YES | S=t] - E[YES | S=c]
       = E[YEt | S=t] - E[YEc | S=c]
       = Yt(u) - Yc(u) + E[errort | S=t] - E[errorc | S=c]
       = Yt(u) - Yc(u) + 0.

Again we can use sample averages to estimate the causal effect of HeadStart.

V. Some of the many complications!

a) What is the population?
b) What is the cause?
c) SUTVA

a) What is the population?

Suppose you make the population U = all preschool children in the US. Does it make sense to think about Yt(u), Yc(u) for all u? Is it relevant?

b) What is the cause?

This is incredibly difficult, and scientists have been arguing about it for ages. One must be very careful here.

Example: racial discrimination in graduate school admission. Michigan has been charged with reverse racial discrimination. How might we think about this problem? (For simplicity, I will pretend that there are only two races.)

A possible set of causes:
Putting African-American on application papers vs.
Putting Caucasian on application papers

The population of units: all applications by students for admission to graduate school

The response: acceptance vs. denial

The question: Do you want to make inference about the effect of African-American on Jane's acceptance, or about the average effect of African-American on acceptance, or about the average effect of African-American on acceptance for poor women?

Another possible set of causes:
The school uses the current set of admission criteria vs.
The school uses an alternate set of admission criteria that does not involve racial designation.

The population of units: all applications by students for admission to graduate school

The response: acceptance vs. denial

The question: Do you want to make inference about the effect of the current admission criteria on Jane's acceptance, or about the average effect of the current admission criteria on acceptance, or about the average effect of the current admission criteria on acceptance for poor women?

c) SUTVA:

We have been implicitly making the SUTVA assumption, the "Stable Unit Treatment Value Assumption." It is nonetheless an important assumption, and it may not hold if we are vague in specifying the cause, the response, or the population. Rubin, in his discussion of Holland's paper (JASA, vol. 81, pp. 961-962, 1986), writes that SUTVA is:

a) The value of Y for unit u when exposed to treatment t will be the same no matter what mechanism is used to assign treatment to unit u, and

b) The value of Y for unit u when exposed to treatment t will be the same no matter what treatments the other units receive.

Thus there do not exist unrepresented versions of the treatment (Yt(u) does not depend on whether the mother volunteered the child for HeadStart or on whether the child was randomized to HeadStart), and there does not exist interference between units (Yt(u) does not depend on whether unit u' received treatment t or c).
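To see what a violation of part b) of SUTVA looks like, here is a small hypothetical sketch. The response function, the "crowding" mechanism, and all of its numbers are invented for illustration and do not come from Holland or Rubin. Jack's response under t changes with how many other children are treated, so a single number Yt(Jack) is no longer well defined.

```python
def response(unit_exposure, n_others_treated, base=6.0, gain=8.0, crowding=0.5):
    """Hypothetical response WITH interference between units.

    Under t, the gain from HeadStart shrinks as more of the other
    children are also treated (a made-up 'crowded classroom' effect).
    Under c, the response is just the baseline.
    """
    if unit_exposure == "t":
        return base + gain - crowding * n_others_treated
    return base


def jack_response(assignment):
    """Jack's response given the exposure of EVERY unit, not just his own."""
    others_treated = sum(1 for u, s in assignment.items()
                         if u != "Jack" and s == "t")
    return response(assignment["Jack"], others_treated)


# Jack is exposed to t in both assignments; only the other units differ.
only_jack = {"Jack": "t", "Jane": "c", "Nancy": "c", "Tonya": "c", "David": "c"}
everyone = {"Jack": "t", "Jane": "t", "Nancy": "t", "Tonya": "t", "David": "t"}

if __name__ == "__main__":
    print(jack_response(only_jack))
    print(jack_response(everyone))
```

If SUTVA held, the two calls would return the same value. Here they differ (14.0 versus 12.0), so the table of potential responses would need one column per assignment of all the units rather than just Yt(u) and Yc(u).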