TWO-STEP ESTIMATION, OPTIMAL MOMENT CONDITIONS, AND SAMPLE SELECTION MODELS

by

Whitney K. Newey, Department of Economics, MIT
James L. Powell, Department of Economics, UC Berkeley

Working Paper No. 99-06, February, 1999
Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, MA 02142

First draft: January, 1992. Revised: December, 1998.

Abstract: Two-step estimators with a nonparametric first step are important, particularly for sample selection models where the first step is estimation of the propensity score. In this paper we consider the efficiency of such estimators. We characterize the efficient moment condition for a given first-step nonparametric estimator. We also show how it is possible to approximately attain efficiency by combining many moment conditions. In addition we find that the efficient moment condition often leads to an estimator that attains the semiparametric efficiency bound. As illustrations we consider models with expectations and semiparametric minimum distance estimation.

JEL Classification: C10, C21
Keywords: Efficiency, Two-Step Estimation, Sample Selection Models, Semiparametric Estimation

Department of Economics, MIT, Cambridge, MA 02139; (617) 253-6420 (Work); (617) 253-1330 (Fax); wnewey@mit.edu (email). The NSF provided financial support for the research for this project.

1. Introduction

Two-step estimators are useful for a variety of models, including sample selection models and models that depend on expectations of economic agents. Estimators where the first step is nonparametric are particularly important, having many applications in econometrics and providing a natural approach to estimation of parameters of interest. The purpose of this paper is to derive the form of an asymptotically efficient two-step estimator, given a first-step estimator.

The efficient estimator for a given first-step nonparametric estimator will often be fully efficient, attaining the semiparametric efficiency bound for the model, as for some sample selection models considered by Newey and Powell (1993). Full efficiency occurs because the first step is just identified, analogous to the efficiency of a limited information estimator of a simultaneous equation when all the other equations are just identified. An analogous result for two-step parametric estimators is given in Crepon, Kramarz, and Trognon (1997), where optimal estimation in the second step leads to full efficiency if the first step is exactly identified.

We will first give some general results that characterize second-step estimators that are efficient in a certain class, and consider construction of estimators that are approximately efficient. We then derive the form of efficient estimators in several specific models, including conditional moment restrictions that depend on functions of conditional expectations, and sample selection models where the propensity score (i.e. the selection probability) is nonparametric. We also describe how an approximately efficient estimator could be constructed by optimally combining many second-step estimating equations.
Throughout the paper we rely on the results of Newey (1994) to derive the form of asymptotic variances and make efficiency comparisons. Those results allow us to sidestep regularity conditions for asymptotic normality and focus on the issue at hand, which is the form of an efficient estimator. In this approach we follow long-standing econometric practice where efficiency comparisons are made without necessarily specifying a full set of regularity conditions. Of course, we could give regularity conditions for specific estimators (e.g. as in Newey (1994) for series estimators or Newey and McFadden (1994) for kernel estimators), but this would detract from our main purpose.

As an initial example, consider the following simple model, in which the conditional mean of a dependent variable y given some conditioning variable x is proportional to its conditional standard deviation:

    y = β₀σ(x) + u,   E[u|x] = 0,   σ(x)² = Var(y|x).   (1)

Given a sample {(y_i, x_i')', i = 1, ..., n} of observations on y and x, one type of estimator would be an instrumental variables (IV) estimator for β₀, with a nonparametric estimator σ̂(x) replacing σ(x) and an instrument a(x) used to solve the equation

    Σ_{i=1}^{n} a(x_i)[y_i − βσ̂(x_i)] = 0   (2)

for β̂ = [Σ_{i=1}^{n} a(x_i)σ̂(x_i)]⁻¹ Σ_{i=1}^{n} a(x_i)y_i. For example, the least squares estimator would have a(x) = σ̂(x). If the data generating process and the nonparametric estimator are sufficiently regular, so that β̂ is root-n consistent and asymptotically normal, then the formulae given in Newey (1994) can be used to derive the following form of the asymptotic distribution of β̂:

    √n(β̂ − β₀) →_d N(0, (E[a(x)σ(x)])⁻² E[ζ²a(x)²]),   ζ = u − β₀[2σ(x)]⁻¹(u² − σ(x)²).   (3)

The asymptotic variance of β̂ is that of an IV estimator with instrument a(x) and residual ζ. The efficient choice of instrument, as in Chamberlain (1987), is σ(x)/ω(x) for ω(x) = E[ζ²|x]. The novel feature of this optimal instrument is that it depends on the inverse of the conditional variance ω(x) of ζ rather than that of the original residual u. If v = u/σ(x) is independent of x, though, then ζ = σ(x)[v − β₀(v²−1)/2], so that E[ζ²|x] is proportional to σ(x)², and the best instrument is 1/σ(x), the same as if the first-stage estimation were not accounted for. In general it is necessary to account for the first-stage estimator in forming the optimal instrument for the second stage.

The best IV estimator is weighted least squares with weight ω(x)⁻¹. As in Newey (1993), estimation of the optimal instrument should not affect the asymptotic variance, so for ω̂(x) that is suitably well behaved, the weighted least squares estimator

    β̂ = [Σ_{i=1}^{n} ω̂(x_i)⁻¹σ̂(x_i)²]⁻¹ Σ_{i=1}^{n} ω̂(x_i)⁻¹σ̂(x_i)y_i   (4)

should be efficient. Alternatively, as we discuss below, an approximately efficient estimator could be constructed by generalized method of moments (GMM) estimation with moment conditions A(x)[y − βσ̂(x)], where A(x) is some vector of approximating functions.
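To make this example concrete, the following is a minimal numerical sketch of the two-step estimator, assuming a Nadaraya-Watson first step for the conditional moments and the feasible weighted least squares form of equation (4). The function names, kernel, bandwidth, and simulated design are illustrative choices, not part of the paper.

```python
# A minimal sketch of the two-step estimator in equations (1)-(4), under
# assumed choices: a Gaussian-kernel first step for sigma(x) and the
# feasible weighted least squares second step of equation (4).
import numpy as np

def nw_regression(x, y, grid, h):
    """Nadaraya-Watson estimate of E[y|x] at the points in `grid`."""
    k = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return (k * y[None, :]).sum(axis=1) / k.sum(axis=1)

def two_step_beta(x, y, h=0.3):
    # First step: sigma_hat(x)^2 = E[y^2|x] - (E[y|x])^2.
    m1 = nw_regression(x, y, x, h)
    m2 = nw_regression(x, y ** 2, x, h)
    sigma = np.sqrt(np.clip(m2 - m1 ** 2, 1e-8, None))
    # Preliminary IV step with a(x) = sigma_hat(x) (least squares).
    beta_ls = (sigma * y).sum() / (sigma * sigma).sum()
    # Adjusted residual zeta from equation (3), at the preliminary estimate.
    u = y - beta_ls * sigma
    zeta = u - beta_ls * (u ** 2 - sigma ** 2) / (2.0 * sigma)
    # Feasible weight omega_hat(x) = E^[zeta^2|x], then the WLS step (4).
    omega = np.clip(nw_regression(x, zeta ** 2, x, h), 1e-8, None)
    return ((sigma * y) / omega).sum() / ((sigma ** 2) / omega).sum()

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, 2000)
u = np.sqrt(x) * rng.standard_normal(2000)   # here Var(y|x) = x
y = 1.5 * np.sqrt(x) + u                     # so sigma(x) = sqrt(x), beta0 = 1.5
print(two_step_beta(x, y))
```

In this design the unweighted first pass supplies the preliminary β used to form ζ and ω̂, and the final line implements equation (4).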
2. General Methods

To describe the general class of estimators we consider, let m(z,θ,α) denote a vector of functions, where z denotes a single data observation, θ a parameter vector, and α an unknown function. Suppose that the moment restrictions

    E[m(z,θ₀,α₀)] = 0   (5)

are satisfied, where the "0" subscript denotes true values. Then an estimator α̂ can be used to construct an estimator θ̂ of θ₀ by solving the equation

    Σ_{i=1}^{n} m(z_i,θ,α̂)/n = 0.   (6)

The class of estimators we consider are of this form, where m may depend on α. In the example of Section 1, z = (y,x')', θ = β, and α = σ, where σ denotes the conditional standard deviation function, m(z,β,σ) = a(x)[y − βσ(x)], and the instrument a is restricted to be in some set of feasible functions.

To characterize the asymptotic distribution of θ̂, accounting for the presence of α̂, we assume that for each m(z,θ,α) in the class there is an associated function u_m(z) such that

    Σ_{i=1}^{n} m(z_i,θ₀,α̂)/√n = Σ_{i=1}^{n} u_m(z_i)/√n + o_p(1).   (7)

Here u_m(z) is the influence function of θ̂, and u_m(z) − m(z,θ₀,α₀) is the correction term accounting for the presence of α̂. When α̂ is nonparametric, the results of Newey (1994) can be used to derive u_m(z). If equation (7) holds along with some other regularity conditions, then for H_m = ∂E[m(z,θ,α₀)]/∂θ evaluated at θ₀,

    √n(θ̂ − θ₀) →_d N(0, H_m⁻¹ E[u_m(z)u_m(z)'] H_m⁻¹').   (8)

The efficient two-step estimator we will consider is one that minimizes this asymptotic variance over all m in the class. A sufficient condition for m̄ to minimize the asymptotic variance is that for all m,

    H_m = E[u_m(z)u_m̄(z)'].   (9)

When this equation holds, then by equation (8) the asymptotic variance for m̄ is (E[u_m̄(z)u_m̄(z)'])⁻¹, which is a lower bound on the asymptotic variance and is attained when m = m̄. Equation (9) is analogous to the generalized information matrix equality in parametric models, and similar equations have been used by Hansen (1985a), Hansen, Heaton, and Ogaki (1988), and Bates and White (1993) to find efficient estimators. Here we use this equation to derive the optimal choice of a second-step estimator.

This characterization of an efficient two-step estimator can be used to derive the optimal estimator in the initial example. In that example, u_m(z) = a(x)ζ and H_m = −E[a(x)σ(x)], so equation (9) reduces to a choice of instrument ā(x) with

    −E[a(x)σ(x)] = E[a(x)ζ²ā(x)] = E[a(x)ω(x)ā(x)].

A solution to this equation, and hence an optimal instrument, is ā(x) = −ω(x)⁻¹σ(x); since rescaling an instrument, including by −1, does not change the estimator, this confirms the optimality of the instrument σ(x)/ω(x) given earlier.

Construction of an efficient estimator can often be based on the solution m̄ to equation (9). Although m̄(z,θ,α) may depend on unknown functions other than α, they can often be replaced by parametric or nonparametric estimators without affecting the efficiency of θ̂, e.g. as in Newey (1993). Estimators which are efficient for some restricted class of distributions, referred to as locally efficient here, can be constructed by using finite dimensional parameterizations of unknown components of the optimal moment function. Estimators which are efficient for all distributions can be constructed by using nonparametric methods to estimate unknown components. In the examples to follow we will discuss various estimators of the optimal moment functions which will result in efficiency.

A general approach to efficient estimation, which is useful when m̄ is complicated and it is hard to form an explicit estimate, is to use the efficient generalized method of moments estimator based on "many" moment conditions. This approach has been considered by Beran (1976), Hayashi and Sims (1983), Chamberlain (1987), and Newey (1993). Under a "spanning" condition, this approach will result in an estimator that is approximately efficient, in the sense that as the number of moments grows, the asymptotic variance of the estimator approaches that of the optimal estimator.

To be precise, consider a J×1 vector of functions m_J(z,θ,α) (where m_J may depend on J). Suppose that for some J×1 vector u_J(z), equation (7) is satisfied with m_J and u_J replacing m and u_m respectively, and let V̂_J denote an estimator of V_J = E[u_J(z)u_J(z)'], e.g. V̂_J = Σ_{i=1}^{n} û_J(z_i)û_J(z_i)'/n for an estimator û_J(z). An optimal GMM estimator based on this moment vector is

    θ̃_J = argmin_θ [Σ_{i=1}^{n} m_J(z_i,θ,α̂)/n]' V̂_J⁻¹ [Σ_{i=1}^{n} m_J(z_i,θ,α̂)/n].   (10)

An alternative one-step version is

    θ̂_J = θ̄ − (Ĥ_J'V̂_J⁻¹Ĥ_J)⁻¹ Ĥ_J'V̂_J⁻¹ Σ_{i=1}^{n} m_J(z_i,θ̄,α̂)/n,   Ĥ_J = Σ_{i=1}^{n} ∂m_J(z_i,θ̄,α̂)/∂θ/n,   (11)

where θ̄ is an initial estimator. As usual, the one-step estimator is asymptotically equivalent to its optimization counterpart. Both estimators will have asymptotic variance (H_J'V_J⁻¹H_J)⁻¹.

As J gets larger, the asymptotic variance of this estimator will approach the lower bound if linear combinations of u_J(z) can approximate the optimal influence function u_m̄(z) in mean square, as shown by the following result.

Theorem 2.1: Suppose that E[u_m̄(z)u_m̄(z)'] is nonsingular and that there are conformable constant matrices C_J such that E[‖u_m̄(z) − C_J'u_J(z)‖²] → 0 as J → ∞. Then (H_J'V_J⁻¹H_J)⁻¹ → (E[u_m̄(z)u_m̄(z)'])⁻¹ as J → ∞.

The mean-square approximation hypothesis of this result is the spanning condition referred to above. This result falls short of an efficient estimation result, because it does not specify a way, independent of the true data generating process, of choosing J = J(n) to grow with the sample size so that θ̂_{J(n)} has asymptotic variance (E[u_m̄(z)u_m̄(z)'])⁻¹. It is possible to give such rates in particular problems, as in Newey (1993), but to avoid technical detail rates are not derived here. Instead, we focus on how efficient estimation might approximately be achieved, by choosing moment functions with corresponding u_J(z) that approximate u_m̄(z) in mean square.
3. Conditional Moment Restrictions and Nonparametric Generated Regressors

The first specific model we consider is a semiparametric instrumental variables model, with conditional moment restrictions

    E[ρ(z,θ₀,α₀)|x] = 0,   (12)

where ρ(z,θ,α) is a residual that depends on a function α with true value α₀(w) = E[d|w], and x denotes a vector of instrumental variables. The introductory example is a special case of this model. Also, this case includes estimators that have been considered by Ahn and Manski (1993) and Rilstone (1989). The moment vectors will consist of a vector of instrumental variables a(x) multiplying the residual,

    m(z,θ,α) = a(x)ρ(z,θ,α).

The optimality problem here is finding the vector of instrumental variables that minimizes the asymptotic variance. To derive the optimal instruments we need to account for the nonparametric estimation, which can be done by imposing the following condition. Let α(w,γ) denote a parametric specification for the conditional expectation, satisfying regularity conditions along with the conditional moment vector, so that the derivatives in the following condition exist.

Assumption 3.1: There is δ(w) such that, for all α(w,γ) with α(w,γ₀) = α₀(w) for some γ₀,

    ∂E[a(x)ρ(z,θ₀,α(·,γ))]/∂γ at γ₀ = E[a(x)δ(w)∂α(w,γ₀)/∂γ].

This condition leads to correction terms for estimation of α of the simple form δ(w)[d − α₀(w)], as derived in Newey (1994). It is based on a simple derivative calculation that is easy to apply. For instance, in the initial example, ρ(z,β,α) = y − β{α₂(x) − [α₁(x)]²}^{1/2} for α₁(x) = E[y|x], α₂(x) = E[y²|x], and d = (y², y)', and Assumption 3.1 is satisfied with δ(w) = −β₀[2σ₀(x)]⁻¹(1, −2E[y|x]), leading to the correction term given earlier.

The first optimal instruments question we address is for the case where the instruments x are a subset of the first-stage regressors w. Let D(x) = E[∂ρ(z,θ₀,α₀)/∂θ|x], and for now assume that ρ is a scalar.

Theorem 3.1: If x ⊆ w and Assumption 3.1 is satisfied, then for

    ζ = ρ(z,θ₀,α₀) + δ(w)[d − α₀(w)],

u_m(z) = a(x)ζ, and the choice of instruments that minimizes the asymptotic variance is ā(x) = D(x)'(E[ζ²|x])⁻¹.

An interesting interpretation of the influence function u_m(z) = a(x)ζ is that it is the instrumental variables times an "adjusted residual" ζ that accounts for the presence of α̂. The form of the optimal instruments follows upon noting this form of the influence function. Consequently, the optimal instruments are the same as for an IV estimator without the first stage, except that the conditional variance E[ζ²|x] has replaced E[ρ²|x]. The reason that the optimal instruments have this simple form is that the adjusted residual fully accounts for the generated regressors.

The initial example provides one illustration of this case. Another example is the estimator of Ahn and Manski (1993). Suppose there is a binary dependent variable y ∈ {0,1}, with

    Prob(y=1|x) = Φ(θ₁₀[α₀(x,0) − α₀(x,1)] + x₁'θ₂₀),   α₀(w) = E[d|w],   w = (x,y),   (13)

where Φ is the CDF for a standard normal. Their estimator is probit, with a nonparametric estimator α̂(w) replacing α₀(w). As usual for probit, this estimator is asymptotically equivalent to an instrumental variables estimator with

    ρ(z,θ,α) = y − Φ(θ₁[α(x,0) − α(x,1)] + x₁'θ₂),   (14)
    a(x) = D(x)'Ω(x)⁻¹,   Ω(x) = Φ(v)[1 − Φ(v)],   D(x) = −(α₀(x,0) − α₀(x,1), x₁')'φ(v),

where v = θ₁₀[α₀(x,0) − α₀(x,1)] + x₁'θ₂₀ and φ(v) is the standard normal p.d.f. These instruments are not optimal. To derive the optimal instruments, note that x ⊆ w, so we can apply Theorem 3.1. To do so, note that for any functions b(x) and d(w),

    E[b(x){d(x,0) − d(x,1)}] = E[b(x)d(w){(1−y)/[1−Φ(v)] − y/Φ(v)}],

so Assumption 3.1 is satisfied for

    δ(w) = −θ₁₀φ(v){[1−Φ(v)]⁻¹(1−y) − Φ(v)⁻¹y},   ζ = y − Φ(v) + δ(w)[d − α₀(w)].

Then, by Theorem 3.1, the optimal instruments are ā(x) = (E[ζ²|x])⁻¹D(x), with

    E[ζ²|x] = Φ(v)[1−Φ(v)] + θ₁₀²φ(v)²{Φ(v)[1−Φ(v)]}⁻¹υ(x),   υ(x) = E[{d − E[d|w]}²|x].   (15)

As in the last example, the optimal instruments are those for weighted nonlinear least squares, where the weight is the inverse of the conditional variance of an "adjusted residual," rather than that of the original residual. An optimal estimator can be constructed by using estimates of the optimal instruments. As usual, estimating the instruments will not affect the asymptotic variance. The optimal instruments may be estimated by substituting θ̂, α̂, and v̂ for θ₀, α₀, and v respectively in the formula for the optimal instruments, and also substituting an estimator of the remaining unknown conditional variance.
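As a rough illustration of how this two-step probit estimator might be implemented, the following sketch uses subsample series regressions for α₀(x,0) and α₀(x,1) and plugs the estimated difference into a probit likelihood. The bases and names are assumptions of the sketch, and the optimal reweighting implied by (15) is left as a further step.

```python
# A sketch of an Ahn-Manski (1993) style two-step estimator for equations
# (13)-(14): series first step for alpha(x, y) = E[d|x, y], then probit with
# the estimated expectation difference as a generated regressor.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def series_coef(basis, d):
    coef, *_ = np.linalg.lstsq(basis, d, rcond=None)
    return coef

def two_step_probit(y, x1, d, basis):
    # First step: alpha_hat(x,0), alpha_hat(x,1) from regressions of d on
    # functions of x within the y = 0 and y = 1 subsamples.
    c0 = series_coef(basis[y == 0], d[y == 0])
    c1 = series_coef(basis[y == 1], d[y == 1])
    da = basis @ c0 - basis @ c1          # alpha_hat(x,0) - alpha_hat(x,1)
    X = np.column_stack([da, x1])
    # Second step: probit log likelihood with the generated regressor.
    def negll(theta):
        p = np.clip(norm.cdf(X @ theta), 1e-10, 1 - 1e-10)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()
    return minimize(negll, np.zeros(X.shape[1]), method="BFGS").x
```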
A locally efficient estimator can be constructed by estimating υ(x) as the predicted value from a regression of the squared nonparametric residuals [d − α̂(w)]² on a few functions of x. A fully efficient estimator would require nonparametric estimation of this conditional variance. Alternatively, an approximately efficient estimator could be constructed from GMM estimation using many functions of x, as considered below.

Another example of Theorem 3.1 is the semiparametric panel probit estimator of Newey (1994), where y_t (t = 1, 2) are binary variables and there is an unknown function h(x) such that

    E[y_t|x] = Φ([x_t'β₀ + h(x)]/σ_t),   (t = 1, 2),   σ₁ = 1.

Inverting the normal CDF and differencing eliminates the unknown function h(x) and gives the condition ρ(x,θ₀,α₀) = 0, where θ = (β',σ)' with σ = σ₂, α₀(x) = (E[y₁|x], E[y₂|x])', and

    ρ(x,θ,α) = σΦ⁻¹(α₂(x)) − Φ⁻¹(α₁(x)) − (x₂ − x₁)'β.

The least squares estimator in Newey (1994) is obtained by substituting a nonparametric regression estimator α̂(x) for α₀(x) and minimizing Σ_{i=1}^{n} ρ(x_i,θ,α̂)²/n. As usual for least squares, this is asymptotically equivalent to an IV estimator with instruments

    a(x) = D(x)',   D(x) = ∂ρ(x,θ₀,α₀)/∂θ = (−(x₂−x₁)', Φ⁻¹(α₂₀(x)))'.

The optimal instruments can be derived by applying Theorem 3.1. Let ψ(a) = 1/φ(Φ⁻¹(a)). In this example we have δ(x) = (−ψ(α₁₀(x)), σ₀ψ(α₂₀(x))) and ζ = δ(x){y − E[y|x]} for y = (y₁,y₂)'. Therefore, from Theorem 3.1, let ω(x) = δ(x)Var(y|x)δ(x)' = Var(ζ|x). Then the optimal instruments are

    ā(x) = D(x)'ω(x)⁻¹ = D(x)'/Var(σ₀ψ(α₂₀(x))y₂ − ψ(α₁₀(x))y₁ | x).

These instruments correspond to the first-order condition for the weighted least squares estimator

    θ̂ = argmin_θ Σ_{i=1}^{n} ω̂(x_i)⁻¹ρ(x_i,θ,α̂)²/n.

This estimator can also be thought of as a semiparametric minimum distance estimator, with θ̂ being chosen to minimize a function that should converge to zero at the truth. The characterization of an optimal IV estimator applies to any semiparametric minimum distance problem where ρ(x,θ₀,α₀) = 0 for α₀(x) = E[y|x] and a vector y. The optimal instruments will be D(x)'ω(x)⁻¹ for D(x) = ∂ρ(x,θ₀,α₀)/∂θ and ω(x) = δ(x)Var(y|x)δ(x)'. Furthermore, the weighted least squares estimator will be optimal in this class.

Construction of an efficient estimator is straightforward in the case where x ⊆ w. The optimal instruments in Theorem 3.1 can be estimated nonparametrically, proceeding analogously to Newey (1993). Alternatively, an approximately efficient estimator can be constructed by GMM estimation using a vector of approximating functions A(x) as instruments, where the moment functions would be A(x)ρ(z,θ,α). In this case, the spanning condition of Theorem 2.1 just requires that a linear combination of A(x) can approximate the optimal instruments. Because the influence function is so simple, for brevity we omit a formal result.
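The next sketch illustrates the weighted minimum distance step in the panel probit example, assuming first-step estimates â₁(x), â₂(x) of E[y₁|x] and E[y₂|x] are already available (say from series regressions). The weight is formed by regressing the squared adjusted residual on a user-supplied basis, which is one crude way to implement ω̂(x); all names, the basis, and the optimizer are assumptions of the sketch.

```python
# A sketch of the weighted semiparametric minimum distance estimator for the
# panel probit example, with rho(x, theta, alpha) as in the text and a series
# estimate of the weight w(x) = Var(zeta|x).
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def rho(theta, a1, a2, dx):
    """sigma*Phi^{-1}(a2) - Phi^{-1}(a1) - (x2-x1)'beta, with theta=(beta',sigma)."""
    beta, sig = theta[:-1], theta[-1]
    return sig * norm.ppf(a2) - norm.ppf(a1) - dx @ beta

def weighted_min_distance(theta0, y1, y2, a1, a2, dx, basis):
    psi = lambda a: 1.0 / norm.pdf(norm.ppf(a))   # psi(a) = 1/phi(Phi^{-1}(a))
    sig0 = theta0[-1]
    # Adjusted residual zeta = delta(x){y - E[y|x]}, then a crude series
    # estimate of w(x) by regressing zeta^2 on the supplied basis in x.
    zeta = sig0 * psi(a2) * (y2 - a2) - psi(a1) * (y1 - a1)
    coef, *_ = np.linalg.lstsq(basis, zeta ** 2, rcond=None)
    w = np.clip(basis @ coef, 1e-6, None)
    obj = lambda th: np.mean(rho(th, a1, a2, dx) ** 2 / w)
    return minimize(obj, theta0, method="BFGS").x
```

The returned vector stacks (β̂', σ̂₂); as noted in the text, estimating the weight should not affect the asymptotic variance.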
The next case we consider is that where w ⊆ x. This case is more complicated, in that the correction for the first-stage estimation does not lead to the adjusted residual form of the influence function. Let

    V = δ(w)[d − α₀(w)],   Σ(w) = E[VV'|w],   ρ = ρ(z,θ₀,α₀),   Ω(x) = E[ρρ'|x],   K(x) = E[ρV'|x].

Theorem 3.2: If w ⊆ x and Assumption 3.1 is satisfied, then

    u_m(z) = a(x)ρ + E[a(x)|w]V.

Also, if the linear equations

    ā(x)' = Ω(x)⁻¹[D(x) + P(w) + K(x)R(w)],
    P(w) = −Σ(w)E[ā(x)'|w] − E[K(x)'ā(x)'|w],   R(w) = −E[ā(x)'|w],

have a solution for ā(x), P(w), and R(w), then the optimal instruments are ā(x). Furthermore, if K(x) = 0 with probability one, then

    ā(x)' = Ω(x)⁻¹D(x) − Ω(x)⁻¹{Σ(w)⁻¹ + E[Ω(x)⁻¹|w]}⁻¹E[Ω(x)⁻¹D(x)|w].

In general the form of the optimal instruments is quite complicated, although it simplifies in the zero conditional covariance case, K(x) = 0.

An example where Theorem 3.2 applies is a nonparametric generated regressor model where

    ρ(z,θ,α) = y − z₁'β − γα(w) − δ[d − α(w)],   α₀(w) = E[d|w],   (d,w) ⊆ x,

and z₁ denotes the included regressors. This residual is that of a linear model in which a conditional expectation and the residual from that same conditional expectation are included as regressors. The model is a semiparametric version of a familiar model with many economic applications that has been considered in Rilstone (1989). Note here that K(x) = 0, since d − α₀(w) is a function of x, and that Assumption 3.1 is satisfied for δ(w) = δ₀ − γ₀ and D(x) = −(E[z₁'|x], α₀(w), d−α₀(w))'. Therefore, the optimal instruments are given by the third formula of Theorem 3.2. In the case where E[z₁|w], Σ(w), and Ω(x) are constant, this formula simplifies so that ā(x) is a constant linear combination of

    A(x) = (E[z₁'|x], E[z₁'|w], α₀(w), d−α₀(w))',

and it can be shown that the resulting estimator attains Rilstone's (1989) semiparametric bound for normal disturbances. It is interesting to note that these instruments are equal to those for the best estimator involving the additional variable α₀(w), as would occur e.g. if α₀(w) were used in place of α̂(w), plus a term involving E[z₁|w]. If E[z₁|w] = E[z₁|x], as would occur when z₁ ⊆ w, then the best instruments would be a linear combination of the instruments that are best without the generated regressor problem; in the constant Σ(w) and Ω(x) case it then follows that least squares is best. An estimator of the optimal instruments is readily available from replacing E[z₁|x], E[z₁|w], α₀(w), Σ, and Ω by corresponding estimates. Alternatively, since the optimal instruments are a linear combination of A(x), an optimal estimator could be obtained from GMM with instrument vector A(x) and an optimal weighting matrix that accounts for the generated regressors.

In the general model of Theorem 3.2 it should be possible to construct an efficient estimator by using nonparametric estimates of the optimal instruments. Alternatively, an approximately efficient estimator can be constructed using many moment conditions, corresponding to GMM estimation with many functions of x as instruments. It is straightforward to give conditions for approximate efficiency of these estimators. Let A_J(x) = (A₁(x),...,A_J(x))' be a vector of functions of x, and consider a GMM estimator as in equation (10) or (11) with m_J(z,θ,α) = ρ(z,θ,α)⊗A_J(x). Here let r be the dimension of ρ.

Theorem 3.3: If E[u_m̄(z)u_m̄(z)'] is nonsingular, Ω(x) and Σ(w) are bounded, and for any a(x) with E[‖a(x)‖²] < ∞ there exists C_J such that E[‖a(x) − C_J{I_r⊗A_J(x)}‖²] → 0 as J → ∞, then (H_J'V_J⁻¹H_J)⁻¹ → (E[u_m̄(z)u_m̄(z)'])⁻¹ as J → ∞.

The point of this result is that all that is needed to guarantee approximation of the complicated optimal influence function of Theorem 3.2, leading to approximate efficiency, is that the instruments can approximate functions of x. There are many types of instruments that would meet this qualification, including power series and regression splines, making it relatively easy to construct an approximately efficient estimator.
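The following sketch illustrates the many-instrument GMM construction of Theorem 3.3 in the generated regressor example, with a user-supplied instrument matrix A_J(x). Note the caveat in the comments: for efficiency the weighting matrix should be built from influence functions that include the first-stage correction term, which this short version omits; all names and the update scheme are assumptions of the sketch.

```python
# A sketch of GMM with many instruments for the generated regressor model:
# a series first step for alpha(w) = E[d|w], then two passes of the linear
# GMM update with moments rho(z, theta, alpha_hat) * A_J(x).
import numpy as np

def gmm_generated_regressor(y, z1, d, basis_w, AJ, theta0):
    coef, *_ = np.linalg.lstsq(basis_w, d, rcond=None)
    alpha = basis_w @ coef                      # first step alpha_hat(w)
    X = np.column_stack([z1, alpha, d - alpha]) # regressors (z1, a(w), d-a(w))
    n = len(y)
    G = -(AJ.T @ X) / n                         # Jacobian of the mean moments

    def moments(theta):
        return AJ * (y - X @ theta)[:, None]    # n x J moment matrix

    theta = np.asarray(theta0, dtype=float)
    for _ in range(2):
        U = moments(theta)
        # Caveat: for efficiency, U should be the influence functions u_J,
        # which add the correction term E[A_J|w]*(delta0-gamma0)*(d-alpha)
        # to these raw moments; the raw-moment weight is used for brevity.
        W = np.linalg.pinv(U.T @ U / n)
        g = U.mean(axis=0)
        theta = theta + np.linalg.solve(G.T @ W @ G, -(G.T @ W @ g))
    return theta
```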
Here we have shown the form of an efficient two-step GMM estimator when the second-step instruments are a subset of the first-step regressors or vice versa. It is also possible to obtain some results for the case where w and x are not subsets of each other. For brevity these results are omitted.

4. Sample Selection Models and Nonparametric Propensity Score Estimation

Interesting and important two-step estimators arise in the context of sample selection models. A general form of such a model is

    y* = x'θ₀ + ε,   y = y* observed only if d = 1,   d ∈ {0,1},   Prob(d=1|w) = π(w) = P,   x ⊆ w.   (16)

The selection probability P is referred to as the propensity score. The parameters of interest θ₀ are identified under various restrictions on the joint distribution of the selection indicator d and the disturbance ε. In this section we consider the form of optimal two-step estimators in two cases: where ε and w are mean independent conditional on d = 1 and P, and where they are statistically independent. The estimators are two-step estimators where the first step is nonparametric estimation of the propensity score. Some such estimators have been previously considered by Ahn and Powell (1993) and Choi (1990).

The first case we consider is

    E[ε|w,d=1] = E[ε|P,d=1].   (17)

Here the conditional mean of the disturbance, given selection and w, depends on w only through the propensity score. This can be motivated by a latent variable model where d = 1(τ(w) + η ≥ 0) for some unknown function τ(w) and a disturbance η. Suppose that η and w are independent, that η has a density that is positive everywhere (so that P is a one-to-one function of τ(w)), and that E[ε|η,w] = E[ε|η,τ(w)]. Then

    E[ε|w,d=1] = E[ε|w, η ≥ −τ(w)] = E[E[ε|η,w] | w, η ≥ −τ(w)] = E[E[ε|η,τ(w)] | w, η ≥ −τ(w)],

which depends on w only through τ(w), and hence only through P, so that equation (17) holds.

A two-step estimator of θ₀ can be constructed using a nonparametric propensity score estimator P̂ = π̂(w), with a residual d{y − E[y|P,d=1] − (x − E[x|P,d=1])'θ}. A vector of instrumental variables a(w) can be used along with nonparametric estimators of the conditional expectations to form a moment vector

    m(z,θ,α) = a(w)d{y − α_y(α_π(w)) − [x − α_x(α_π(w))]'θ},   (18)

where α = (α_π, α_y, α_x), and α_π, α_y, and α_x have true values π(w), E[y|P,d=1], and E[x|P,d=1] respectively. Using this moment function, we have described a two-step instrumental variables version of Robinson's (1988) estimator; a density-weighted version has been developed by Ahn and Powell (1993).

It is straightforward to derive the influence function from the results of Newey (1994) and to obtain the optimal instruments. Let λ(P) = E[ε|P,d=1] and λ_P(P) = ∂λ(P)/∂P.

Theorem 4.1: The influence function is

    u_m(z) = {a(w) − E[a(w)|P]}ζ,   ζ = d[ε − λ(P)] + Pλ_P(P)(d − P).

For η(w)² = E[ζ²|w] = E[d{ε − λ(P)}²|w] + λ_P(P)²P²(1−P) bounded away from zero, the optimal instruments are

    ā(w) = Pη(w)⁻²{x − E[η(w)⁻²x|P]/E[η(w)⁻²|P]}.

Furthermore, Var(u_m̄(z))⁻¹ is the semiparametric variance bound for estimation of θ₀ in the model of equation (17).

The best instruments here are like those that appear in an efficient estimator for a heteroskedastic partially linear model, as discussed in Chamberlain (1992). They are weighted least squares instruments, obtained from partialling out an unknown function of P, where the weight is the inverse of the conditional variance of the adjusted residual ζ. The main difference with Chamberlain (1992) is that the presence of the first stage leads to the weight being the inverse of the conditional variance of ζ rather than that of the original residual. In this way the optimal instruments account for the presence of the first-stage nonparametric estimates.
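A minimal sketch of the two-step estimator built from the moment function (18) follows: a series propensity score, then Robinson-style partialling out of power series in P̂ on the selected sample. The polynomial bases and names are assumed choices, not the paper's prescription.

```python
# A sketch of the two-step sample selection estimator of equation (18):
# series propensity score, then partialling out functions of P_hat on the
# selected sample and regressing residualized y on residualized x.
import numpy as np

def series_proj(basis, v):
    """Least squares projection of v (vector or matrix) on the basis."""
    coef, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return basis @ coef

def selection_estimator(y, x, d, w_basis, degree=4):
    # First step: P_hat = series estimate of Prob(d=1|w).
    P = np.clip(series_proj(w_basis, d), 0.01, 0.99)
    # Second step on the selected sample: partial out power series in P_hat.
    sel = d == 1
    Pb = np.vander(P[sel], degree + 1, increasing=True)
    y_t = y[sel] - series_proj(Pb, y[sel])
    x_t = x[sel] - series_proj(Pb, x[sel])
    theta, *_ = np.linalg.lstsq(x_t, y_t, rcond=None)
    return theta
```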
Because the optimal instruments attain the semiparametric efficiency bound, we do not need to search beyond instrumental variables estimation for an efficient estimator. Construction of an optimal estimator, or an approximately optimal estimator, could be carried out in a way similar to that discussed in Section 2. A nonparametric estimator of η(w)² = E[d{y − x'θ₀ − λ(P)}²|w] + λ_P(P)²P²(1−P) could be constructed using θ̂ and λ̂ from an unweighted least squares estimator of the partially linear model y = x'θ + λ(P) + ζ in the selected data, and then a nonparametric estimator of the optimal instruments formed as

    â(w) = P̂η̂(w)⁻²{x − Ê[η̂(w)⁻²x|P̂]/Ê[η̂(w)⁻²|P̂]}.

This is a complicated estimator, for which regularity conditions have not yet been formulated in the literature, but it should lead to efficiency.

An approximately efficient estimator is straightforward to construct here, by using approximating functions as instruments and then doing optimal GMM estimation that accounts for the presence of the nonparametric first stage. Let A_J(w) be a vector of approximating functions and consider a GMM estimator as in equation (10) or (11) with m_J(z,θ,α) = A_J(w)d{y − α_y(α_π(w)) − [x − α_x(α_π(w))]'θ}. Then there is a relatively simple spanning condition for approximate efficiency of the GMM estimator:

Theorem 4.2: If E[u_m̄(z)u_m̄(z)'] is nonsingular, η(w)² is bounded, and for any a(w) with finite mean-square there exists C_J such that E[‖a(w) − C_J'A_J(w)‖²] → 0 as J → ∞, then (H_J'V_J⁻¹H_J)⁻¹ → (E[u_m̄(z)u_m̄(z)'])⁻¹ as J → ∞.

Here we see that it suffices for approximate efficiency that the approximating vector A_J(w) spans the set of functions of w with finite mean-square as J → ∞. There are many such functions that could be used, including splines and power series.
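To illustrate Theorem 4.2, the next sketch uses a supplied matrix of approximating functions A_J(w) as instruments for the partialled-out residual. Identity weighting is used for brevity, whereas the optimal V̂_J would be estimated from influence functions that include the P̂ correction; the bases and names are assumptions of the sketch.

```python
# A sketch of the approximately efficient GMM estimator suggested by
# Theorem 4.2: moments A_J(w) d {y - E^[y|P,d=1] - (x - E^[x|P,d=1])'theta},
# combined with identity weighting (a first GMM pass).
import numpy as np

def selection_gmm(y, x, d, P, AJ, deg=4):
    sel = d == 1
    Pb = np.vander(P[sel], deg + 1, increasing=True)
    proj = lambda v: Pb @ np.linalg.lstsq(Pb, v, rcond=None)[0]
    y_t = y[sel] - proj(y[sel])       # y residualized on functions of P_hat
    x_t = x[sel] - proj(x[sel])       # x residualized on functions of P_hat
    A_t = AJ[sel]
    # With identity weighting, the GMM estimator minimizes ||g0 - G theta||^2.
    G = A_t.T @ x_t / len(y)
    g0 = A_t.T @ y_t / len(y)
    return np.linalg.solve(G.T @ G, G.T @ g0)
```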
The second sample selection model we consider satisfies equation (16) with

    ε and w independent conditional on P and d = 1.   (19)

Here it is assumed that conditioning on P and d = 1 removes any dependence between ε and w. This can be motivated by a latent variable model like that above, where d = 1(τ(w) + η ≥ 0). If the joint distribution of (ε,η) given w depends only on τ(w), then this equation will be satisfied.

A basic implication of the conditional independence in equation (19) is that any function of ε is uncorrelated with any function of w, conditional on P and d = 1. This allows us to form estimators analogous to those above, where y − x'θ is replaced by any function of y − x'θ and P. To be precise, for a vector of instrumental variables a(w) and a function q(ε,P), we can use the residual d{q(y−x'θ,P) − E[q(y−x'θ,P)|P,d=1]} and the corresponding moment function

    m(z,θ,α) = a(w)d[q(y−x'θ,α_π(w)) − α_q(y−x'θ,α_π(w))],   (20)

where α = (α_π, α_q), and α_π and α_q have true values π(w) and E[q(ε,P)|P,d=1] respectively. Using this moment function, we have described a nonlinear version of the estimator described above for the conditional mean case.

The next result gives the form of the influence function and the optimal moment functions for this estimator. Here let f(ε|P) denote the density of ε given d = 1 and w, which by equation (19) depends on w only through P. Also, set s_ε(ε,P) = ∂ln f(ε|P)/∂ε, the score with respect to a location parameter, and s_P(ε,P) = ∂ln f(ε|P)/∂P, and let λ(P) = E[q(ε,P)|P,d=1] and λ_P(P) = ∂λ(P)/∂P.

Theorem 4.3: For the model of equation (19) and moment functions as in equation (20), the influence function is

    u_m(z) = {a(w) − E[a(w)|P,d=1]}ζ,   ζ = d[q(ε,P) − λ(P)] + P{E[q_P(ε,P)|P,d=1] − λ_P(P)}(d − P),

and the optimal choice of moment function has

    q̄(ε,P) = s_ε(ε,P) − Ps_P(ε,P)E[s_εs_P|P,d=1]/{P⁻¹(1−P)⁻¹ + PE[s_P²|P,d=1]},   ā(w) = x.

Furthermore, Var(u_m̄(z))⁻¹ is the semiparametric variance bound for estimation of θ₀.

Using many moment functions is useful in this case: the best q̄(ε,P) is quite complicated, but the optimal instrument ā(w) = x has a known functional form. In this case approximate efficiency can be achieved with only a two-dimensional approximation, of the best q̄(ε,P), rather than the potentially high-dimensional approximation of the best instruments ā(w) from Theorem 4.1. The low-dimensional nature of the approximation means that it should be possible to attain high efficiency using a relatively small set of moment conditions. Let q_J(ε,P) denote a vector of approximating functions, and consider a GMM estimator as in equation (10) or (11) with

    m_J(z,θ,α) = d{q_J(y−x'θ,P) − E[q_J(y−x'θ,P)|P,d=1]}⊗x.

Theorem 4.4: If E[u_m̄(z)u_m̄(z)'] is nonsingular, E[s_P²|P,d=1] and E[‖x − E[x|P]‖²|P] are bounded, and there is C_J such that E[d{q̄(ε,P) − C_J'q_J(ε,P)}²] → 0 as J → ∞, then (H_J'V_J⁻¹H_J)⁻¹ → (E[u_m̄(z)u_m̄(z)'])⁻¹ as J → ∞.

In practice it may make sense to begin the approximation with functions q_J(ε,P) that correspond to particular distributions. For example, one could derive the form of q̄(ε,P) when ε and η have a normal distribution. Alternatively, one could use a few simple approximating functions, like a power series in ε and some function of P. Because selection is likely to induce some skewness, and because normality is not expected in many econometric applications, using such an estimator for the conditional independence case of equation (19) can result in substantial efficiency gains over the linear estimators based on equation (18), as shown in Newey (1991) for a semiparametric selection probability.
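As one way to implement the approximation behind Theorem 4.4, the following sketch builds the moment vector from powers of the residual, each recentered by a series estimate of its conditional mean given P̂ on the selected sample, and interacted with x; a GMM step as in equation (10) would then be applied to these moments. All basis choices and names are assumptions of the sketch.

```python
# A sketch of the moment construction for Theorem 4.4: approximating
# functions q_J(e, P) given by powers of e, recentered by series estimates
# of E[q_J|P, d=1], interacted with the known optimal instrument x.
import numpy as np

def q_moments(theta, y, x, d, P, deg_e=3, deg_p=3):
    e = y - x @ theta
    sel = d == 1
    Pb = np.vander(P, deg_p + 1, increasing=True)
    rows = []
    for j in range(1, deg_e + 1):
        qj = e ** j
        # E^[q_j | P, d=1] via series regression on the selected sample.
        coef, *_ = np.linalg.lstsq(Pb[sel], qj[sel], rcond=None)
        resid = d * (qj - Pb @ coef)
        rows.append(resid[:, None] * x)         # interaction with x
    return np.column_stack(rows).mean(axis=0)   # stacked sample moments

# A GMM estimator would minimize g(theta)' W g(theta) for g = q_moments(...),
# with W estimated as in equation (10) from the implied influence functions.
```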
APPENDIX: Proofs of Theorems

Throughout, let C denote a constant that may be different in different uses, and for a matrix B let ‖B‖ = [tr(B'B)]^{1/2}.

Proof of Theorem 2.1: By equation (9), H_J = E[u_Ju_m̄'], where we suppress the z argument for notational convenience, so that θ̂_J has asymptotic variance

    (H_J'V_J⁻¹H_J)⁻¹ = (E[u_m̄u_J']E[u_Ju_J']⁻¹E[u_Ju_m̄'])⁻¹.

Let u_Π = E[u_m̄u_J']E[u_Ju_J']⁻¹u_J denote the multivariate least squares projection of u_m̄ on u_J, and for the C_J in the statement of the result let u_C = C_J'u_J. By orthogonality of the projection,

    E[u_m̄u_m̄'] − E[u_Πu_Π'] = E[(u_m̄ − u_Π)(u_m̄ − u_Π)'].   (21)

Since the projection minimizes mean-square error among linear combinations of u_J, the spanning condition gives E[‖u_m̄ − u_Π‖²] ≤ E[‖u_m̄ − u_C‖²] → 0, so that by equation (21), E[u_Πu_Π'] = E[u_m̄u_J']E[u_Ju_J']⁻¹E[u_Ju_m̄'] → E[u_m̄u_m̄'] as J → ∞. The conclusion then follows by nonsingularity of E[u_m̄u_m̄'] and continuity of the matrix inverse at any nonsingular matrix. QED.

Proof of Theorem 3.1: The form of the influence function follows by Newey (1994). The optimal instruments follow as in Chamberlain (1987).

Proof of Theorem 3.2: The form of the influence function follows by Newey (1994). For notational convenience, suppress the x argument of a(x), Ω(x), D(x), and K(x). Then by iterated expectations, equation (9) for the optimal instruments ā requires that

    H_m = E[aD] = E[a{Ωā' + KE[ā'|w] + E[K'ā'|w] + Σ(w)E[ā'|w]}].

Since a = a(x) can be any bounded function of x, satisfaction of this equation requires that

    D = Ωā' + KE[ā'|w] + E[K'ā'|w] + Σ(w)E[ā'|w].

Solving for ā' then gives the second result, with R(w) = −E[ā'|w] and P(w) = −Σ(w)E[ā'|w] − E[K'ā'|w]. For the third result, K = 0 gives P(w) = −Σ(w)E[ā'|w] = −Σ(w)E[Ω⁻¹(D + P(w))|w]. Solving for P(w) gives P(w) = −{Σ(w)⁻¹ + E[Ω⁻¹|w]}⁻¹E[Ω⁻¹D|w], and plugging this into the formula for ā' gives the third result. QED.

Proof of Theorem 3.3: For m_J(z,θ,α) = ρ⊗A_J(x), the influence function u_{m_J}(z) has the form given in Theorem 3.2 with I_r⊗A_J(x) in place of a(x). For any a(x) and conformable constant matrix C_J, let a_J(x) = C_J{I_r⊗A_J(x)}. Then, with u_m the influence function of Theorem 3.2 for instruments a(x),

    E[‖u_m(z) − C_Ju_{m_J}(z)‖²] ≤ 2E[‖a(x) − a_J(x)‖²‖Ω(x)‖] + 2E[‖E[a(x) − a_J(x)|w]‖²‖Σ(w)‖]
        ≤ CE[‖a(x) − a_J(x)‖²] + CE[E[‖a(x) − a_J(x)‖²|w]] ≤ CE[‖a(x) − a_J(x)‖²].

Applying this with a(x) = ā(x), the conclusion then follows by Theorem 2.1. QED.
For the proofs of Theorems 4.1 - 4.4, for any function b(w) of w let E_Pb = E[b|P]; in the proofs of Theorems 4.3 and 4.4, for functions b(ε,P) of the data we also write E_Pb for E[b|P,d=1].

Proof of Theorem 4.1: As shown in Newey (1994), the correction terms for nonparametric estimation can be derived separately for α_π, α_y, and α_x. From the form of the conditional expectations correction in Newey (1994) it follows that the correction term for α_y and α_x is −E[a(w)|P,d=1]ρ. Also, let π(w,γ) = E_γ[d|w] denote the propensity score for a distribution parameterized by γ that passes through the truth at γ₀. By Newey (1994), the correction term for nonparametric estimation of π can be computed from the derivative of E[da(w)E[ε|π(w,γ),d=1]] with respect to γ at the truth. By iterated expectations, the chain rule, and the fact that ∂π(w,γ₀)/∂γ = ∂E_γ[d|w]/∂γ = E[{d−P}S_γ(z)|w] for the score S_γ(z) of z, this derivative equals

    E[Pλ_P(P){a(w) − E[a(w)|P]}{d − P}S_γ(z)],

so that by Newey (1994) the correction term for estimation of π is Pλ_P(P){a(w) − E[a(w)|P]}(d − P). Noting that E[a(w)|P,d=1] = E[da(w)|P]/E[d|P] = E[Pa(w)|P]/P = E[a(w)|P], because P is a function of w, we obtain the first conclusion.

Also, H_m = E[∂m(z,θ₀,α₀)/∂θ] = −E[Pa(w){x − E[x|P]}'] = −E[(a − E_Pa)Px'], where the w argument of a is suppressed for convenience. Equation (9) is then

    −E[(a − E_Pa)Px'] = E[u_m(z)u_m̄(z)'] = E[(a − E_Pa)η²(ā − E_Pā)'].

Subtracting gives E[(a − E_Pa){Px − η²(ā − E_Pā)}'] = 0. Since a(w) can be any function of w, this equation implies that Px − η²(ā − E_Pā) = h(P) for some function h(P) of P. Dividing through by η² and taking conditional expectations given P gives PE_P(η⁻²x) = h(P)E_P(η⁻²), so that h(P) = PE_P(η⁻²x)/E_P(η⁻²). Solving again,

    ā − E_Pā = Pη⁻²{x − E_P(η⁻²x)/E_P(η⁻²)}.

Setting ā equal to the expression on the right-hand side, and noting that E_Pā = 0 for that choice, gives the second result. The last result follows as in Newey and Powell (1993). QED.

Proof of Theorem 4.2: Choose C_J such that, for a_J(w) = C_J'A_J(w), E[‖ā − a_J‖²] → 0. Then, since η(w)² = E[ζ²|w] is bounded,

    E[‖u_m̄(z) − C_J'u_{m_J}(z)‖²] = E[‖{(ā − a_J) − E_P(ā − a_J)}ζ‖²] ≤ CE[‖ā − a_J‖²η²] + CE[‖E_P(ā − a_J)‖²η²] ≤ CE[‖ā − a_J‖²] → 0.

The conclusion then follows by Theorem 2.1. QED.

Proof of Theorem 4.3: It follows as for the conditional mean case that the correction term for estimation of α_q is −E[a(w)|P,d=1]d{q(ε,P) − λ(P)}. Similarly, the correction term for estimation of π is {a(w) − E[a(w)|P,d=1]}P{E[q_P(ε,P)|P,d=1] − λ_P(P)}(d − P), where ε and P subscripts denote the corresponding partial derivatives. Combining these terms gives the first conclusion.

To solve equation (9) for the optimal moment functions, let p = q(ε,P) − λ(P), so that ζ = dp + P{E_Pq_P − λ_P(P)}(d − P). Note that E[dq_ε|w] = PE[q_ε|w,d=1] + 0 = PE_Pq_ε, and that by conditional independence E[q_εx|P,d=1] = (E_Pq_ε)(E_Px), so

    H_m = E[∂m(z,θ₀,α₀)/∂θ] = −E[a(w)d{q_ε(ε,P)x − E[q_ε(ε,P)x|P,d=1]}'] = −E[(a − E_Pa)P(E_Pq_ε)(x − E_Px)'].

Integration by parts gives E_Pq_ε = ∫q_ε(ε,P)f(ε|P)dε = −E_P(qs_ε) = −E_P(ps_ε), using E_P(s_ε) = 0; and differentiating λ(P) = ∫q(ε,P)f(ε|P)dε with respect to P and using E_P(s_P) = 0 gives E_Pq_P − λ_P(P) = −E_P(ps_P). It follows that

    ζ = dp − PE_P(ps_P)(d − P).

Let q̄ denote a candidate optimal function, p̄ = q̄ − E_Pq̄, and ζ̄ = dp̄ − PE_P(p̄s_P)(d − P). By conditional independence, E[dpp̄|w] = PE_P(pp̄), E[dp(d−P)|w] = P(1−P)E_P(p) = 0, and E[(d−P)²|w] = P(1−P), so

    E[ζζ̄|w] = P{E_P(pp̄) + P²(1−P)E_P(ps_P)E_P(p̄s_P)}.

Then equation (9), with ā(w) = x, is

    E[(a − E_Pa)PE_P(ps_ε)(x − E_Px)'] = E[u_m(z)u_m̄(z)'] = E[(a − E_Pa)E[ζζ̄|w](x − E_Px)'].

As in the proof of Theorem 4.1, equality for all a(w) requires, for g(P) = E_P(p̄s_P),

    E_P(p{s_ε − p̄ − P²(1−P)s_Pg(P)}) = 0

for all feasible q, which requires s_ε − p̄ − P²(1−P)s_Pg(P) = h(P) for some function h(P); taking conditional expectations given P and d = 1 gives h(P) = 0. Hence it is sufficient that q̄ = s_ε − P²(1−P)g(P)s_P, which has E_Pq̄ = 0 and p̄ = q̄. Multiplying through by s_P and taking conditional expectations gives g(P) = E_P(s_εs_P) − P²(1−P)g(P)E_P(s_P²). Solving,

    g(P) = E_P(s_εs_P)/[1 + P²(1−P)E_P(s_P²)],

so that q̄(ε,P) = s_ε − P²(1−P)g(P)s_P = s_ε − Ps_PE_P(s_εs_P)/{P⁻¹(1−P)⁻¹ + PE_P(s_P²)}, as in the statement of the theorem. Also, E_P(p̄s_P) = g(P), so ζ̄ = dq̄ − Pg(P)(d − P). Therefore u_m̄(z) = (x − E_Px)ζ̄ matches the efficient score given in equation (4.15) of Newey and Powell (1993), giving the last conclusion. QED.

Proof of Theorem 4.4: By arguments like those above,

    u_{m_J}(z) = (x − E[x|P])⊗{p_J − PE[p_Js_P|P,d=1](d − P)},   p_J = d{q_J(ε,P) − E[q_J(ε,P)|P,d=1]}.

Suppose that there is C_J such that E[d{q̄(ε,P) − C_J'q_J(ε,P)}²] → 0 as J → ∞, let C̃_J = C_J⊗I with I the identity of the dimension of x, and set e_J = p̄ − C_J'p_J and w(x) = ‖x − E[x|P]‖². Then, by the Cauchy-Schwarz inequality,

    E[‖u_m̄(z) − C̃_J'u_{m_J}(z)‖²] ≤ 2E[dw(x)e_J²] + CE[P(1−P)w(x)E[s_P²|P,d=1]E[de_J²|P]] = o(1),

using the assumed boundedness of E[s_P²|P,d=1] and E[w(x)|P]. The conclusion then follows by Theorem 2.1. QED.

References
Ahn, H. and C.F. Manski, 1993, Distribution theory for the analysis of binary choice under uncertainty with nonparametric estimation of expectations, Journal of Econometrics 58, 291-321.

Ahn, H. and J.L. Powell, 1993, Semiparametric estimation of censored selection models with a nonparametric selection mechanism, Journal of Econometrics 58, 3-29.

Bates, C.E. and H. White, 1993, Determination of estimators with minimum asymptotic covariance matrices, Econometric Theory 9, 633-648.

Beran, R., 1976, Adaptive estimates for autoregressive processes, Annals of the Institute of Statistical Mathematics 26, 77-89.

Chamberlain, G., 1987, Asymptotic efficiency in estimation with conditional moment restrictions, Journal of Econometrics 34, 305-334.

Chamberlain, G., 1992, Efficiency bounds for semiparametric regression, Econometrica 60, 567-596.

Choi, K., 1990, The semiparametric estimation of the sample selection model using series expansion and the propensity score, manuscript, Department of Economics, University of Chicago.

Crepon, B., F. Kramarz, and A. Trognon, 1997, Parameters of interest, nuisance parameters, and orthogonality conditions: An application to autoregressive error component models, Journal of Econometrics 82, 135-156.

Hansen, L.P., 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, 1029-1054.

Hansen, L.P., 1985a, A method for calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators, Journal of Econometrics 30, 203-238.

Hansen, L.P., 1985b, Two-step generalized method of moments estimators, discussion at the North American Winter Meeting of the Econometric Society, New York.

Hansen, L.P., J.C. Heaton, and M. Ogaki, 1988, Efficiency bounds implied by multiperiod conditional moment restrictions, Journal of the American Statistical Association 83, 863-871.

Hayashi, F. and C. Sims, 1983, Nearly efficient estimation of time series models with predetermined, but not exogenous, instruments, Econometrica 51, 783-798.

Newey, W.K., 1990, Semiparametric efficiency bounds, Journal of Applied Econometrics 5, 99-135.

Newey, W.K., 1991, Two-step series estimation of sample selection models, working paper, MIT Department of Economics.

Newey, W.K., 1993, Efficient estimation of models with conditional moment restrictions, in G.S. Maddala, C.R. Rao, and H.D. Vinod, eds., Handbook of Statistics, Volume 11: Econometrics, Amsterdam: North-Holland.

Newey, W.K., 1994, The asymptotic variance of semiparametric estimators, Econometrica 62, 1349-1382.

Newey, W.K. and D. McFadden, 1994, Large sample estimation and hypothesis testing, in R. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4, Amsterdam: North-Holland, 2113-2245.

Newey, W.K. and J.L. Powell, 1993, Efficiency bounds for semiparametric selection models, Journal of Econometrics 58, 169-184.

Rilstone, P., 1989, Computing the (local) efficiency bound for a semiparametric generated regressors model, manuscript, Department of Economics, University of Western Ontario.

Robinson, P.M., 1988, Root-n consistent semiparametric regression, Econometrica 56, 931-954.