Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/conditionalindepOOangr

working paper
department of economics

CONDITIONAL INDEPENDENCE IN SAMPLE SELECTION MODELS

Joshua D. Angrist

96-27    October, 1996

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139

Conditional Independence in Sample Selection Models

by

Joshua D. Angrist
MIT, Hebrew University, and NBER*

October 1996

*Thanks go to Guido Imbens, Whitney Newey, Paul Rosenbaum, Tom Stoker, and especially to Gary Chamberlain for helpful discussions and comments. Any errors are, of course, solely the work of the author.

Conditional Independence in Sample Selection Models

Econometric sample selection models typically use a linear latent-index with constant coefficients to model the selection process and the conditional mean of the regression error in the selected sample. A feature common to most of these models is that the conditional mean function for regression errors is an invertible function of the selection propensity score, i.e., the probability of selection conditional on covariates. Consequently, conditioning on the selection propensity score controls selection bias, a fact which underlies much of the recent literature on non-parametric and semi-parametric selection models. This literature has not addressed the question of whether the propensity-score conditioning property is necessarily a feature of sample selection models. In this note, I describe the conditional independence properties that make it possible to use the selection propensity score to control selection bias in a general sample selection model. The resulting characterization does not rely on a latent index selection mechanism and imposes no structure other than an assumption of independence between the regression error term and the regressors in random samples. This approach leads to a simple rule that can be used to determine if conditioning on the selection propensity score is sufficient to control selection bias.

Joshua D. Angrist
MIT Department of Economics
50 Memorial Drive
Cambridge, MA 02139-4307
ANGRIST@MIT.EDU

Keywords: non-random samples, selection bias, propensity score

1. Introduction

Sample selection problems occur in both experimental and observational studies. Econometricians' interest in sample selection models originated with research in labor economics using the distribution of wages for workers to infer the distribution of offered wages for everyone (Gronau, 1974; Heckman, 1974). A similar problem occurs when instrumental variables that are being used to correct for endogeneity in wage equations also affect employment status (see, e.g., Angrist, 1995). In experiments, the bias induced by controlling for "post-treatment variables" that are correlated with the outcome of interest can also be interpreted as a form of selection bias (Ham and Lalonde, 1996; Rosenbaum, 1984).

The traditional econometric solution to problems of this type invokes an assumption of joint normality for the regression error and the error term in a latent-index selection mechanism. This approach leads to a maximum likelihood or two-step selection correction in the form of inverse Mills-ratio terms added to the regression function of interest (see Heckman, 1976 and 1979). A number of less restrictive semi-parametric variations on this normal selection model have also been introduced (see, e.g., Ahn and Powell, 1993, and references in Newey, Powell, and Walker, 1990 and Heckman, 1990). Both the parametric and semi-parametric formulations have two common threads. First, a latent index framework is used to characterize the mean of unobserved regression error terms conditional on the regression covariates and the sample selection rule. Second, the problem of selection bias is either implicitly or explicitly handled by conditioning on an invertible function of the selection propensity score, i.e., the probability of selection given covariates.¹

¹ The propensity-score terminology originates with Rosenbaum and Rubin (1983, 1984), who introduced the propensity-score method of controlling for confounding in evaluation research.

This paper describes the conditional independence properties that make it possible to use the selection propensity score to control selection bias in regression estimates. The problem studied here includes applications involving experimental data and instrumental variables as special cases. My first objective is a nonparametric formulation of the sample selection problem where the only structure consists of the regression equation of interest and an independence restriction on the regression error. The elementary properties of conditional independence, as discussed by Dawid (1979) and Florens and Mouchart (1982), are then applied to this formulation. I believe this approach provides a very general description of the identification problem in sample selection models. It also leads to an easily interpreted rule that can be used to check when nonparametric conditioning on the selection propensity score is indeed sufficient to control selection bias.

2. Motivation

The equation of interest is

$$y_i = X_i'\beta + \varepsilon_i, \tag{1}$$

where $y_i$ is the dependent variable and $X_i$ is the $k \times 1$ vector of regressors, assumed to be independent of $\varepsilon_i$ in the population. The coefficient vector $\beta$ is assumed to be such that $X_i'\beta$ gives the population conditional expectation function, so that if we did observe $y_i$ for everyone in the sample, $\beta$ could be consistently estimated by ordinary least squares (OLS). We observe a random sample of $n$ observations on $X_i$, but $y_i$ is not observed for everybody. Although independence of regressors and errors is not necessary to justify this definition of $\beta$, the full independence assumption is customary in papers on selection bias (e.g., Chamberlain, 1986). At the end of the paper I show that full independence appears to be of fundamental importance for identification strategies that use the selection propensity score to control selection bias.

Let $w_i$ indicate selection status (i.e., $w_i = 1$ indicates workers in an application where $y_i$ is the log of offered wages). The selection problem is often motivated using a latent index to capture the possibility that $X_i$ affects $w_i$ as well as $y_i$. In particular, suppose that

$$w_i = 1[X_i'\delta - \eta_i > 0], \tag{2}$$

where $\eta_i$ is an error term that could be correlated with $\varepsilon_i$ but is assumed to be independent of $X_i$. Then, we have

$$E[\varepsilon_i \mid X_i, w_i = 1] = E[\varepsilon_i \mid X_i, X_i'\delta > \eta_i].$$

Selection bias arises because the term $E[\varepsilon_i \mid X_i, w_i = 1]$ is generally a function of $X_i$ even though $E[\varepsilon_i \mid X_i] = 0$ in random samples.

Parametric solutions to selection problems of this type typically begin by assuming that the error terms $(\varepsilon_i, \eta_i)$ are jointly normally distributed, so that

$$E[\varepsilon_i \mid X_i, w_i = 1] = \rho E[\eta_i \mid X_i, X_i'\delta > \eta_i], \tag{3}$$

where $\rho$ is the coefficient from a regression of $\varepsilon_i$ on $\eta_i$. Because of normality and joint independence of $(\varepsilon_i, \eta_i)$ with $X_i$, we can simplify further:

$$E[\eta_i \mid X_i'\delta > \eta_i] = -\phi(X_i'\delta)/\Phi(X_i'\delta) \equiv \lambda(X_i'\delta),$$

where $\phi$ and $\Phi$ are the normal density and distribution functions for $\eta_i$. Adding $\lambda(X_i'\delta)$ as a regressor to (1) provides a feasible solution to the selection problem in this context because the required parameters ($\delta$) can be estimated from a Probit regression of $w_i$ on $X_i$.

Critics of traditional econometric selection models (e.g., Hartigan and Tukey, 1986; Little, 1985) point out that these models combine a wide variety of assumptions that are hard to assess and interpret. In particular, the validity of parametric corrections for sample selection bias clearly turns to a large extent on distributional assumptions, functional form restrictions, and exclusion restrictions (i.e., zero restrictions on $\beta$ and/or $\delta$) for which there is rarely, if ever, any empirical justification.² These features of the normal selection model have led econometricians to develop less restrictive models for selection problems. For example, Newey, Powell, and Walker (1990), Lee (1982), and others have shown that the distributional assumptions in these models can be relaxed in a semi-parametric framework.

² Mroz (1987) documents the fact that empirical results can be sensitive to these assumptions. See also Olsen (1980, 1982).

One approach to semi-parametric selection corrections begins with the observation that even in the absence of parametric distributional assumptions, selection-correction terms typically consist of non-linear but invertible functions of the selection propensity score, $P[w_i = 1 \mid X_i]$. This feature of selection models has been noted by Heckman and Robb (1986), and motivates the semi-parametric estimators for selection models discussed by Newey (1988), Robinson (1988), Choi (1992), and Ahn and Powell (1993).³ On the other hand, earlier work on semi-parametric selection models has not clarified whether the propensity-score conditioning property is a necessary feature of the selection problem. In the next section, I use simple conditional independence notation to provide a general characterization of models for which this propensity-score conditioning property holds.

3. Conditional independence and sample selection

Conditional independence is defined as follows:

Definition 1 (Florens and Mouchart, 1982).
Let $A_1$, $A_2$, $A_3$ denote random variables defined on a common probability space with joint probability measure $P[A_1, A_2, A_3]$, and let $P[A_1 \mid A_2, A_3]$ denote a conditional probability statement for $A_1$ given some values of $A_2$ and $A_3$. Then we write $A_1 \perp\!\!\!\perp A_2 \mid A_3$ if and only if

$$P[A_1 \mid A_2, A_3] = P[A_1 \mid A_3], \text{ a.s.}$$

Note that conditional independence, like independence, is symmetric.

The conditional independence concept, although basic and perhaps even obvious, has proven widely useful. Dawid (1979) popularized the notation, $A_1 \perp\!\!\!\perp A_2 \mid A_3$, and argued that many important ideas in statistics can best be understood in these terms. Chamberlain (1982) and Florens and Mouchart (1982) used conditional independence ideas to explore the relationship between alternate definitions of "causality" in time-series econometrics. And Rosenbaum and Rubin (1983) used conditional independence arguments to establish the propensity-score result for evaluation problems with selection on observables.⁴

The properties of conditional independence can be characterized as follows:

Lemma 1. Let $R_1$, $R_2$, $R_3$, and $R_4$ be random variables defined on a common probability space with joint probability measure. Then the following are equivalent:

(i) $R_1 \perp\!\!\!\perp R_2 \mid R_3$ and $R_1 \perp\!\!\!\perp R_4 \mid (R_2, R_3)$
(ii) $R_1 \perp\!\!\!\perp (R_2, R_4) \mid R_3$
(iii) $R_1 \perp\!\!\!\perp R_4 \mid R_3$ and $R_1 \perp\!\!\!\perp R_2 \mid (R_3, R_4)$

This is Theorem A.1 in Florens and Mouchart (1982), presented here in somewhat simpler notation, and Lemma 4.3 in Dawid (1979). Dawid states the equivalence of (ii) and (iii) only, but this also implies the equivalence of (i) and (ii) by using symmetry of $R_2$ and $R_4$ in part (ii).

³ Educational researchers and statisticians have also discussed conditioning on the selection propensity score; see, for example, Powell and Steelman (1984) and Wainer (1986).
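The equivalences in Lemma 1 can be checked numerically on small discrete distributions. The sketch below is illustrative only and is not part of the paper: the helper names (`cond_indep`, `joint_satisfying_ii`, `xor_joint`, `lemma_parts`) are hypothetical, all four variables are assumed to be binary, and every point of the support is assumed to be listed in the joint probability table. The first distribution is built to satisfy part (ii) by construction; the second sets $R_1$ equal to the exclusive-or of two independent fair coins $R_2$ and $R_4$, so $R_1$ is independent of each coin separately but not of the pair.

```python
import itertools
import random
from collections import defaultdict

# Coordinates of each support point: index 0 = R1, 1 = R2, 2 = R3, 3 = R4.

def marginal(joint, idx):
    """Sum the joint pmf over every coordinate not listed in idx."""
    out = defaultdict(float)
    for point, p in joint.items():
        out[tuple(point[i] for i in idx)] += p
    return out

def cond_indep(joint, a, b, c, tol=1e-12):
    """Check A independent of B given C via p(a,b,c) p(c) = p(a,c) p(b,c)."""
    p_abc = marginal(joint, a + b + c)
    p_ac = marginal(joint, a + c)
    p_bc = marginal(joint, b + c)
    p_c = marginal(joint, c)
    na, nb = len(a), len(b)
    for key, p in p_abc.items():
        va, vb, vc = key[:na], key[na:na + nb], key[na + nb:]
        if abs(p * p_c[vc] - p_ac[va + vc] * p_bc[vb + vc]) > tol:
            return False
    return True

def random_probs(n, rng):
    """Return n strictly positive probabilities that sum to one."""
    weights = [rng.random() + 0.1 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def joint_satisfying_ii(rng):
    """Build p(r1,r2,r3,r4) = p(r3) p(r1|r3) p(r2,r4|r3), so (ii) holds."""
    p3 = random_probs(2, rng)
    p1_given_3 = [random_probs(2, rng) for _ in range(2)]
    p24_given_3 = [random_probs(4, rng) for _ in range(2)]
    joint = {}
    for r1, r2, r3, r4 in itertools.product((0, 1), repeat=4):
        joint[(r1, r2, r3, r4)] = (
            p3[r3] * p1_given_3[r3][r1] * p24_given_3[r3][2 * r2 + r4]
        )
    return joint

def xor_joint():
    """R2, R3, R4 are independent fair coins and R1 = R2 XOR R4."""
    joint = {}
    for r1, r2, r3, r4 in itertools.product((0, 1), repeat=4):
        joint[(r1, r2, r3, r4)] = 0.125 if r1 == (r2 ^ r4) else 0.0
    return joint

def lemma_parts(joint):
    """Evaluate parts (i), (ii), and (iii) of Lemma 1 on a joint pmf."""
    R1, R2, R3, R4 = (0,), (1,), (2,), (3,)
    return {
        "i": cond_indep(joint, R1, R2, R3) and cond_indep(joint, R1, R4, R2 + R3),
        "ii": cond_indep(joint, R1, R2 + R4, R3),
        "iii": cond_indep(joint, R1, R4, R3) and cond_indep(joint, R1, R2, R3 + R4),
    }

if __name__ == "__main__":
    print(lemma_parts(joint_satisfying_ii(random.Random(0))))
    print(lemma_parts(xor_joint()))
```

In the first case all three parts hold together, and in the exclusive-or case all three fail together, as the lemma requires. The second case also illustrates a point used later in the paper: the separate statements $R_1 \perp\!\!\!\perp R_2 \mid R_3$ and $R_1 \perp\!\!\!\perp R_4 \mid R_3$ can both hold while joint independence of $R_1$ from $(R_2, R_4)$ fails.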
3.1 Selection bias

The problem of selection bias can be described in conditional independence terms with no structure other than the regression of interest and the independence restriction on $\varepsilon_i$. This restriction is stated formally below:

Assumption 1. Let $\{(y_i, X_i')'; i = 1, \ldots, n\}$ denote i.i.d. observations on a $(k+1) \times 1$ random vector. Then $\varepsilon_i \perp\!\!\!\perp X_i$, where $\varepsilon_i \equiv (y_i - X_i'\beta)$ and $E[y_i \mid X_i] = X_i'\beta$.

⁴ The propensity-score result for evaluation research says that in an experiment where treatment is randomly assigned conditional on covariates, it is enough to condition on the probability of selection given covariates to eliminate confounding from any relationship between the covariates and the treatment assignment.

We would first like to know when a sample selection process causes Assumption 1 to fail. Let $w_i$ indicate selection status as before. Instead of observing $\{(y_i, X_i')'; i = 1, \ldots, n\}$, we observe $\{(w_i y_i, w_i, X_i')'; i = 1, \ldots, n\}$. Important consequences of the selection process are summarized below:

Proposition 1. Suppose that $X_i$ has at least 2 points of support and that $P[w_i = 1 \mid X_i]$ is a non-trivial function of $X_i$. Then given Assumption 1, $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ iff $\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$.

Proof. Let $\Omega$ denote the sample space. Setting $R_1 = \varepsilon_i$, $R_2 = X_i$, $R_3 = \Omega$, and $R_4 = w_i$, parts (i) and (iii) of the lemma imply

$$\{\varepsilon_i \perp\!\!\!\perp X_i,\ \varepsilon_i \perp\!\!\!\perp w_i \mid X_i\} \text{ iff } \{\varepsilon_i \perp\!\!\!\perp w_i,\ \varepsilon_i \perp\!\!\!\perp X_i \mid w_i\}.$$

Since $\varepsilon_i \perp\!\!\!\perp X_i$ is maintained, $\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$ implies $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$. To complete the proposition, we need to rule out $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ with $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$. Note that, given Assumption 1, $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$ always implies either $\varepsilon_i \not\perp\!\!\!\perp w_i$ or $\varepsilon_i \not\perp\!\!\!\perp X_i \mid w_i$, or both. Suppose that $\varepsilon_i \not\perp\!\!\!\perp w_i$ while $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$. Then

$$E[\varepsilon_i \mid X_i] = E[E(\varepsilon_i \mid X_i, w_i) \mid X_i] = E[E(\varepsilon_i \mid w_i) \mid X_i] = E(\varepsilon_i \mid w_i = 0) + [E(\varepsilon_i \mid w_i = 1) - E(\varepsilon_i \mid w_i = 0)] P[w_i = 1 \mid X_i].$$

But this cannot be true when $P[w_i = 1 \mid X_i]$ takes on 2 or more distinct values, since $E(\varepsilon_i \mid X_i)$ is constant under Assumption 1 and $E(\varepsilon_i \mid w_i = 1)$ is not equal to $E(\varepsilon_i \mid w_i = 0)$.

The property $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ means that estimators of $\beta$ which are based on Assumption 1 will be consistent or unbiased in the selected sample as well as in a random sample. The property $\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$ means that selection status is independent of the regression error conditional on $X_i$. Thus, any selection mechanism that is related to the dependent variable conditional on the regressors causes Assumption 1 to fail in the selected sample if selection status is also a function of $X_i$. Conversely, if Assumption 1 fails in the selected sample, selection status must be related to the dependent variable. Of course, the first point is well-known for latent-index models with homoscedastic errors (in which case, $\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$ means that $\varepsilon_i$ and the latent-index error $\eta_i$ are uncorrelated). The proposition shows that this and the converse are general features of sample selection problems. In particular, these properties hold regardless of the underlying selection mechanism or model.

Proposition 1 applies equally to experimental data. For illustration, assume constant treatment effects where $y_{i1} = y_{i0} + \beta$ and $y_{i0} = \mu + \varepsilon_i$ denote the counterfactual outcomes for treated and nontreated individuals. Let $X_i$ be an indicator of randomized treatment assignment. By virtue of random assignment, a simple comparison of means provides an unbiased estimator of $\beta$ in random samples from the population at risk of treatment. But the independence between randomized treatment assignment and counterfactual outcomes is lost whenever the experiment is analyzed conditional on a variable that is correlated with outcomes and affected by the treatment assignment. Suppose $w_i$ is a second outcome variable in the experiment that is correlated with $y_i$ for some reason other than $X_i$. Examples of such correlated outcome pairs include death and health status, or employment status and wages. Correlation between $w_i$ and $y_i$ given $X_i$ means that $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$. If, as seems likely, $w_i$ is also affected by $X_i$, then Proposition 1 implies that estimation of $\beta$ conditional on $w_i$ is necessarily subject to selection bias.⁵

⁵ Rosenbaum (1984) makes a similar point regarding the consequences of conditioning on "post-treatment variables" in experiments.

Finally, note that Proposition 1 is also relevant for IV estimation. For applications involving instrumental variables, the key identifying assumption is that the instruments (playing the role of $X_i$) are independent of the regression error term (defined around an endogenous regressor) in the target population. The implications of Proposition 1 for IV estimation are that in cases where the sample selection rule involves the instruments, the instruments will be independent of $\varepsilon_i$ in the selected sample if and only if selection status is independent of $\varepsilon_i$ conditional on the instruments.

3.2 The selection propensity score

The selection propensity score is the conditional probability of selection given $X_i$. The fact that the selection propensity score is a function of $X_i$ is emphasized by the notation

$$e(X_i) \equiv P(w_i = 1 \mid X_i),$$

which is used in the definition below:

Definition 2. The selection propensity score is said to control selection bias if $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i, e(X_i)$.

If conditioning on the selection propensity score controls selection bias, it may be possible to estimate $\beta$ in the selected sample because then we have

$$E[y_i \mid X_i, w_i = 1] = X_i'\beta + E[\varepsilon_i \mid w_i = 1, e(X_i)]. \tag{4}$$

Of course, for this to be of any practical use, there must be enough variability in $X_i$ to identify $\beta$ once $e(X_i)$ is fixed. In the literature on semi-parametric selection models, this variation is obtained by imposing a priori exclusion restrictions on $\beta$ and/or on $e(X_i)$. See Chamberlain (1986) for a precise statement of the restrictions required for identification in this context. Here I focus on Definition 2 as a likely starting point for any possible strategy to identify $\beta$.

The relationship between Definition 2 and other conditional independence properties of sample selection models is outlined in the following proposition:

Proposition 2. The following are equivalent:

(i) $w_i \perp\!\!\!\perp X_i \mid \varepsilon_i, e(X_i)$
(ii) $(\varepsilon_i, w_i) \perp\!\!\!\perp X_i \mid e(X_i)$
(iii) $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i, e(X_i)$

Proof. First observe that $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ because $P[w_i = 1 \mid X_i, e(X_i)] = e(X_i)$, and that $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$ because $\varepsilon_i \perp\!\!\!\perp X_i$ and $e(X_i)$ is a function of $X_i$. Moreover, Lemma 1 implies:

$$\{\text{(i)},\ \varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)\} \iff \text{(ii)} \iff \{\text{(iii)},\ w_i \perp\!\!\!\perp X_i \mid e(X_i)\}. \tag{6}$$

Therefore, since $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ and $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$ hold automatically in this case, the result is established.

Line (iii) of Proposition 2 is the same as Definition 2, i.e., this part of the proposition consists of the statement that conditioning on the selection propensity score controls selection bias. Lines (i) and (ii) of the proposition therefore provide equivalent necessary and sufficient conditions for Definition 2. Note that while the statements $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ and $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$ automatically hold, line (ii) makes the stronger statement that $(\varepsilon_i, w_i)$ are jointly independent of $X_i$ given $e(X_i)$.⁶ Of course, this joint independence is not implied by marginal independence. Therefore, one consequence of the if-and-only-if relationship between (ii) and (iii) is that conditioning on the selection propensity score does not necessarily control selection bias.

The equivalence of lines (i) and (iii) also has practical consequences because line (i) is easy to verify in the context of specific models for the mechanism determining selection status. In particular, a weak sufficient condition for (i) is given in Proposition 3:

Proposition 3. Suppose the selection mechanism is monotonic, i.e., for any two values $x^1$, $x^0$ in the support of $X_i$, either

$$P[w_i = 1 \mid x^1, \varepsilon_i] \geq P[w_i = 1 \mid x^0, \varepsilon_i], \text{ a.s., or}$$
$$P[w_i = 1 \mid x^1, \varepsilon_i] \leq P[w_i = 1 \mid x^0, \varepsilon_i], \text{ a.s.,} \tag{7}$$

where conditioning on $\varepsilon_i$ is what makes these inequalities random. Then conditioning on the probability of selection controls selection bias.
Choi (1992) shows that part error and the regression error l | (ii) [i.e., of Proposition 2 replacing w, with [joint r|, independence], imposed on a latent-index in (2)], implies part (iii). Proof. Note = J that if e(x')=e(x°), {P[w,= l | x 1 , e(x'), e,]-P[w,=l x°, e(x'), e,]}h(e,)de, | Given a monotonic selection mechanism, the quantity to equal zero, that for a we must therefore have P[w = l | in brackets is x', e(x'), £,]=P[w = l always non-negative. For the integral This argument shows x°, e(x'), £,]. | monotonic selection mechanism, P[w = l x | 1 , £,]=P[w,= l | x°, e,] whenever e(x')=e(x°). The proposition can then be established by observing the probability of selection To get an idea of (8) that (8) implies line (i) of Proposition conditionally independent of X, given e(X,) and is how general this result is, constant coefficients and errors independent of X, 2, i.e., that £,. note that any latent-index selection mechanism with is monotonic. assumptions about the error distribution. Monotonicity models as well. Consider, for example, a latent index is This fact does not depend on other satisfied by a wide range of model with random less restrictive coefficients: w^irx/s.-Ti.x)], which can be re-written, w, = lrXi'S'-Tii'x)], where 8* = E[5,] and r\* = + r\ X,(8*-8,) an error term that is is l independent of As long X,. as the coefficient 8, has the not independent of X, even if same sign for all i, T), is Proposition 3 implies that conditioning on the selection propensity score controls selection bias in this model. Finally, it is worth emphasizing the role played by full independence in obtaining the result that conditioning on the selection propensity score controls selection bias. Suppose that instead of (1) we have a heteroscedastic regression model, y, where E, = X,'P + is still [X,'y]e, = X,'(3 + £, assumed independent of X (9) . ; The heteroscedastic 10 error term, £,, is only mean-independent of X,. 
In this case, conditioning on the probability of selection does not control selection bias because

$$E[y_i \mid X_i, w_i = 1] = X_i'\beta + E[\xi_i \mid w_i = 1, X_i] = X_i'\beta + [X_i'\gamma] E[\varepsilon_i \mid w_i = 1, e(X_i)]. \tag{10}$$

Equation (10) means that $E[\xi_i \mid w_i = 1, X_i]$ is a function of $X_i$ and not just $e(X_i)$. In other words, even if $e(X_i)$ is fixed at a number $e$, estimates in the selected sample are biased and still converge to $\beta + \gamma E[\varepsilon_i \mid w_i = 1, e(X_i)]$. The identification failure occurs in this case in spite of the fact that the selection mechanism satisfies part (i) of Proposition 2, i.e., $w_i \perp\!\!\!\perp X_i \mid \varepsilon_i, e(X_i)$.

4. Conclusions

This paper lays out some of the basic conditional independence properties of sample selection models, focusing on those that justify conditioning on the selection propensity score to control selection bias. One reason I think this approach to sample selection models is valuable is that it leads to a very general formulation of key identifying assumptions. Another is that it leads to a weak sufficient condition for the validity of identification strategies that either explicitly or implicitly involve conditioning on the selection propensity score. Finally, the results given here are cast in terms of potentially observable quantities ($w_i$, $X_i$, and $\varepsilon_i$), and not inherently unobservable latent index error terms. This means that researchers could in principle construct an experiment that would test the identifying assumptions.

An additional contribution of this paper may be to help bridge the gap between the approaches taken by econometricians and statisticians to the problem of selection bias. Use of the propensity score to control confounding in evaluation research was introduced by Rosenbaum and Rubin (1983) and many statisticians are familiar with this idea. In a discussion of econometric models for program evaluation, Holland (1989, page 876) notes that econometricians also use the propensity score, but he suggests that this use is "quite different for the two approaches."
Similarly, Heckman and Robb (1986) argue that the econometric approach to selection models uses the propensity score "in a different way than that advocated by Rosenbaum and Rubin [1983]." This paper shows that differences between the role played by the propensity score in the econometrics and statistics literature may not be as great as previously believed. In evaluation research, conditioning on the propensity score controls for the bias caused by an association between treatment status and observed exogenous covariates, while in selection models, conditioning on the propensity score controls for the bias caused by an association between selection status and observed exogenous covariates. In both cases, conditioning on the propensity score solves a problem of confounding that arises because of dependence between a dummy endogenous regressor and other covariates.

References

Ahn, H., and J.L. Powell (1993), "Semi-parametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics 58, 3-29.

Angrist, J.D. (1995), "Conditioning on the Probability of Selection to Control Selection Bias," NBER Technical Working Paper No. 181, June.

Chamberlain, G.C. (1982), "The General Equivalence of Granger and Sims Causality," Econometrica 50, 569-581.

Chamberlain, G.C. (1986), "Asymptotic Efficiency in Semi-parametric Models with Censoring," Journal of Econometrics 32, 198-218.

Choi, Kyungsoo (1992), "Semiparametric Estimation of Sample Selection Models Using a Series Expansion and the Propensity Score," University of Chicago Department of Economics, mimeo.

Dawid, A.P. (1979), "Conditional Independence in Statistical Theory," Journal of the Royal Statistical Society Series B 41, 1-31.

Florens, J.P., and M. Mouchart (1982), "A Note on Noncausality," Econometrica 50, 583-591.

Gronau, R. (1974), "Wage Comparisons, A Selectivity Bias," Journal of Political Economy 82, 1119-1144.

Ham, John, and R.J. Lalonde (1996), "The Effect of Sample Selection and Initial Conditions in Duration Models: Evidence from Experimental Data," Econometrica 64 (January), 175-205.

Hartigan, John, and J. Tukey (1986), "Discussion 3: Comments on 'Alternative Methods for Evaluating the Impact of Interventions,'" by J. Heckman and R. Robb, in H. Wainer, ed., Drawing Inferences from Self-selected Samples, New York: Springer-Verlag, pp. 57-62.

Heckman, J.J. (1974), "Shadow Prices, Market Wages, and Labor Supply," Econometrica 42, 679-693.

Heckman, J.J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection, Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement 5, 475-492.

Heckman, J.J. (1979), "Sample Selection Bias as a Specification Error," Econometrica 47, 153-161.

Heckman, J.J. (1990), "Varieties of Selection Bias," American Economic Review 80, 313-318.

Heckman, J.J., and Richard Robb (1986), "Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes," in H. Wainer, ed., Drawing Inferences from Self-selected Samples, New York: Springer-Verlag.

Holland, P. (1989), "It's Very Clear," Comment on J.J. Heckman and V.J. Hotz, "Choosing Among Alternative Methods of Estimating the Impact of Social Programs: The Case of Manpower Training," Journal of the American Statistical Association 84, 875-877.

Lee, Lung-Fei (1982), "Some Approaches to the Correction of Selectivity Bias," Review of Economic Studies 49, 355-372.

Little, Roderick J.A. (1985), "A Note About Models for Selectivity Bias," Econometrica 53, 1469-1488.

Mroz, T.A. (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica 55, 765-799.

Newey, W.K. (1988), "Two-Step Series Estimation of Sample Selection Models," mimeo, Department of Economics, Princeton University, October.

Newey, W.K., J.L. Powell, and J. Walker (1990), "Semi-parametric Estimation of Selection Models: Some Empirical Results," American Economic Review 80(2), 324-328.

Olsen, R.J. (1980), "A Least Squares Correction for Selectivity Bias," Econometrica 48, 1815-1820.

Olsen, R.J. (1982), "Distributional Tests for Selectivity Bias and a More Robust Likelihood Estimator," International Economic Review 23, 223-240.

Powell, B., and L.C. Steelman (1984), "Variations in State SAT Performance: Meaningful or Misleading?," Harvard Education Review 54, 389-412.

Robinson, P.M. (1988), "Root-N-Consistent Semi-parametric Regression," Econometrica 56, 931-954.

Rosenbaum, P.R. (1984), "The Consequences of Adjustment for a Concomitant Variable that has Been Affected by the Treatment," Journal of the Royal Statistical Society Series A 147 Part 5, 656-666.

Rosenbaum, P.R., and D.B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70, 41-55.

Rosenbaum, P.R., and D.B. Rubin (1984), "Reducing Bias in Observational Studies Using Subclassification on the Propensity Score," JASA 79, 516-524.

Rosenbaum, P.R., and D.B. Rubin (1985), "Constructing a Control Group using Multi-variate Matching Methods that include the Propensity Score," American Statistician 39, 33-38.

Wainer, H. (1986), "The SAT as a Social Indicator: A Pretty Bad Idea," in H. Wainer, ed., Drawing Inferences from Self-selected Samples, New York: Springer-Verlag.