Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries. http://www.archive.org/details/asymptoticvarianOOnewe

The Asymptotic Variance of Semiparametric Estimators

Whitney K. Newey
MIT Working Paper No. 583
Rev. July 1991

Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, Mass. 02139

by Whitney K. Newey
MIT Department of Economics
August 1989
Revised, July 1991

Helpful comments were provided by referees and M. Arellano, L. Hansen, J. Hausman, R. Klein, P.C.B. Phillips, J. Powell, J. Robins, and T. Stoker. Financial support was provided by the NSF, the Sloan Foundation, and BellCore.

Abstract

Knowledge of the asymptotic variance of an estimator is important for large sample inference, efficiency, and as a guide to the specification of regularity conditions. The purpose of this paper is the presentation of a general formula for the asymptotic variance of a semiparametric estimator. A particularly important feature of this formula is a way of accounting for the presence of nonparametric estimates of nuisance functions. The general form of an adjustment factor for nonparametric estimates is derived and analyzed. The usefulness of the formula is illustrated by deriving propositions on asymptotic equivalence for different nonparametric estimators of the same function, conditions for estimation of the nuisance functions to have no effect on the asymptotic variance, and the form of a correction term for the presence of a linear function of a conditional expectation estimator, or other projection estimator (e.g. partially linear and/or additive nonparametric projections), and for a function of a density.
Specific results cover a semiparametric random effects model for binary panel data, nonparametric consumer surplus, nonparametric prediction, and average derivatives. Regularity conditions are given for many of the propositions. These include primitive conditions for √n-consistency, asymptotic normality, and consistency of an asymptotic variance estimator with series estimators of conditional expectations (or projections), in each of the examples.

Keywords: semiparametric estimation, asymptotic variance, nonparametric regression, series estimation, panel data, consumer surplus, average derivative.

1. Introduction

This paper develops a general form for the asymptotic variance of semiparametric estimators. Despite the complicated nature of such estimators, which can depend on estimators of functions, the formula is straightforward to derive in many cases, requiring only some calculus. Although the formula is not based on primitive conditions, it should be useful for semiparametric models, just as analogous formulae are for parametric models, such as Huber's (1967) for m-estimators. It gives the form of remainder terms, which facilitates specification of primitive conditions. It also can be used to make asymptotic efficiency comparisons, in order to find an efficient estimator in some class.

The usefulness of this formula is illustrated in several ways. New examples are considered throughout, in order to emphasize that it can be a useful tool for further work in semiparametric estimation, and not just a way of "unifying" existing results. A number of propositions are derived, and primitive conditions are given for many of them. The propositions include showing that the method of estimating a function (e.g. kernel or polynomial regression) does not affect the asymptotic variance of the estimator. Also, two sufficient conditions are given for the absence of an effect on the asymptotic variance from the presence of a function estimator.
One is that the limit of the function estimator maximizes the same expected objective function as the population parameter, i.e. the function has been "concentrated out." The other is a certain orthogonality condition. Several propositions are given on the form of correction terms for the presence of function estimates. One has sufficient conditions for this adjustment to take the form of the projection on the tangent set (the mean-square closure of all scores for parametric models of the nuisance functions) for a semiparametric model. More specific results are given for the case of conditional expectations, or other mean-square projections, and for densities. A characterization of the correction term for estimators of linear functions of projections and densities is given, with specific formulae given for semiparametric individual effects regression for binary panel data, nonparametric consumer surplus, and Stock's (1989) nonparametric prediction estimator.

Regularity conditions for √n-consistency and asymptotic normality are formulated. The discussion is organized around a few "high-level" assumptions. Time series are covered, including weighted autocovariance estimation of the asymptotic variance, with data-based lag choice. Primitive conditions are given for power series estimators of conditional expectations and other projections, including several examples.

The formula builds on previous work, including that on Von Mises (1947) estimators, i.e. functionals of the empirical distribution, by Reeds (1976), Boos and Serfling (1980), and Fernholz (1983). The formula here allows for explicit dependence on nonparametric function estimators, such as conditional expectations or densities, which are difficult to allow for in the Gateaux derivative formula for Von Mises estimators.
It is based on calculating the semiparametric efficiency bound, as in Koshevnik and Levit (1976), Pfanzagl and Wefelmeyer (1982), and Van der Vaart (1991), for the functional that the estimator is a nonparametric estimator of, as discussed in the next section. Also, some of the examples build on previous work on semiparametric estimation, including Bickel, Klaassen, Ritov, and Wellner (1990), Hardle and Stoker (1989), Klein and Spady (1987), Powell, Stock, and Stoker (1989), Robinson (1988), Stock (1989), and others cited below.

Section 2 gives the formula for the asymptotic variance. Sections 3 and 4 apply this formula to derive some propositions on the effect of preliminary nonparametric estimators on the asymptotic variance. Some high-level regularity conditions are collected in Section 5. Section 6 gives general regularity conditions for √n-consistency and asymptotic normality when an estimator depends on a series estimator of a conditional expectation or other projection, and Section 7 applies these results to specify primitive regularity conditions for several examples.

2. The Pathwise Derivative Formula for the Asymptotic Variance

The formula is based on the observation that √n-consistent nonparametric estimators are often efficient. For example, the sample mean is known to be an efficient estimator of the population mean in a nonparametric model where no restrictions, other than regularity conditions (e.g. existence of the second moment), are placed on the distribution of the data. The idea here is to use this observation to calculate the asymptotic variance of a semiparametric estimator, i.e. by finding the functional that it nonparametrically estimates, the object that it converges to under general misspecification, and calculating the semiparametric variance bound for this functional.
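To preview the calculation, the sample mean example can be worked out explicitly (a sketch added for illustration, using the path and score notation introduced formally below): for β̂ = Σᵢzᵢ/n the associated functional is the mean, and along a smooth path of distributions F_{zθ} with score S_θ(z),

```latex
\mu(F_z) = \int z \, dF_z , \qquad
\frac{\partial \mu(F_{z\theta})}{\partial \theta}\Big|_{\theta=0}
  = \int z \, S_\theta(z) \, dF_{z0}
  = E\big[(z - \mu_0)\, S_\theta(z)\big], \qquad \mu_0 = \mu(F_{z0}),
```

using E[S_θ(z)] = 0. The pathwise derivative is therefore d(z) = z − μ₀, and the bound E[d(z)d(z)'] = Var(z) is the usual asymptotic variance of the sample mean, consistent with its efficiency in the unrestricted model.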
To be more precise, let β̂ be an estimator, and suppose that one can associate with it the triple

(2.1)  {z, ℱ, μ(·)}:  z a finite dimensional data vector; ℱ an unrestricted family of distributions F_z of z; μ(F_z) = plim(β̂) when F_z is true.

That is, β̂ is a nonparametric estimator of μ(F_z), having this as its probability limit for all distributions of z belonging to a family ℱ that is unrestricted, except for regularity conditions. In other words, μ(F_z) is the object estimated by β̂ under general misspecification, when the distribution of z does not necessarily satisfy the restrictions on which β̂ is based. The asymptotic variance formula discussed here is taken to be the variance bound for estimation of μ(F_z), F_z ∈ ℱ. This formula is an alternative to the Gateaux derivative for Von Mises estimators, because the domain of μ(·) need not include all distributions, e.g. so that μ(F_z) can depend explicitly on a density function. In the technical conditions to follow, this feature of the formula results from F_z having a density with respect to a measure for which the true distribution also has a density.

The formula for calculating the variance bound for μ(F_z), F_z ∈ ℱ, is that given in previous work by Koshevnik and Levit (1976), Pfanzagl and Wefelmeyer (1982), and others. Following Van der Vaart (1991), let {F_{zθ} : θ ∈ (0,ε)}, ε > 0, denote a path, i.e. a one-dimensional subfamily of ℱ, such that the true distribution and each member of this subfamily are absolutely continuous with respect to the same σ-finite measure. Let E[·] be the expectation under the true distribution F_{z0}, let dF_{zθ} and dF_{z0} denote the densities with respect to the common dominating measure, and let ∫·dz denote integration with respect to that measure. Let 𝒯 denote a set of paths such that for each one there is a random variable S_θ(z) with E[S_θ(z)] = 0, E[S_θ(z)²] < ∞, and

∫[θ⁻¹(dF_{zθ}^{1/2} − dF_{z0}^{1/2}) − S_θ(z)dF_{z0}^{1/2}/2]² dz → 0,  as θ → 0.

Here S_θ(z) is a "mean-square version" of the score ∂ln(dF_{zθ})/∂θ|_{θ=0} associated with the path, which quantifies a direction of departure from the truth allowed by ℱ. The requirement that ℱ be unrestricted is formalized in the condition that there is a set of paths 𝒯 with associated set of scores 𝒮 satisfying the following property:

Assumption 2.1: 𝒮 is linear, and for any s(z) with E[s(z)] = 0 and E[s(z)²] < ∞ and any ε > 0, there is S_θ(z) ∈ 𝒮 such that E[{s(z) − S_θ(z)}²] < ε.

That is, the mean-square closure of the set of scores is all mean-zero random variables, i.e. ℱ allows for any direction of departure from the truth.

The functional μ(F_z) is pathwise differentiable if there is a mapping μ̇_F(S_θ) : 𝒮 → ℝ^q that is linear and mean-square continuous with respect to the true distribution (i.e. for every ε > 0 there exists δ > 0 such that ‖μ̇_F(S_θ)‖ < ε if E[S_θ(z)²] < δ) such that for each path,

(2.2)  lim_{θ→0} θ⁻¹[μ(F_{zθ}) − μ(F_{z0})] = μ̇_F(S_θ),

i.e. μ̇_F(S_θ) is the derivative from the right of μ(F_{zθ}) at the truth (θ = 0). The linearity and mean-square continuity of μ̇_F(S_θ), Assumption 2.1, and the Riesz representation theorem imply the existence of a unique (up to the usual a.s. equivalence) random vector d(z), the pathwise derivative, such that E[d(z)] = 0, E[‖d(z)‖²] < ∞, and

(2.3)  μ̇_F(S_θ) = E[d(z)S_θ(z)].

Under Assumption 2.1 and with i.i.d. data, the asymptotic variance bound for estimators of the functional μ(F_z) is E[d(z)d(z)']. Hence, the formula for the asymptotic variance suggested here is the variance of the pathwise derivative of the functional μ(F_z) that β̂ is a nonparametric estimator of.

A stronger justification for regarding the pathwise derivative of μ(F_z) as a correct formula for the asymptotic variance of β̂ is available when β̂ is asymptotically equivalent to a sample average. Define β̂ to be asymptotically linear with influence function ψ(z) if, at the truth,

(2.4)  √n(β̂ − β₀) = Σ_{i=1}^n ψ(zᵢ)/√n + o_p(1),  E[ψ(z)] = 0,  Var(ψ(z)) finite.
This condition is satisfied by many semiparametric estimators, under sufficient regularity conditions. For i.i.d. data, asymptotic linearity and the central limit theorem imply that β̂ is asymptotically normal with variance Var(ψ(z)). Define β̂ to be a regular estimator of μ(F_z) if for any path in 𝒯 and θ_n = O(1/√n), when zᵢ has distribution F_{zθ_n}, √n(β̂ − μ(F_{zθ_n})) has a limiting distribution that does not depend on {θ_n} or on the path. Regularity is the precise condition that specifies that β̂ is a nonparametric estimator of μ(F_z), because it requires that β̂ is asymptotically, locally consistent for μ(F_z).

Theorem 2.1: Suppose that z₁, z₂, ... are i.i.d., μ(F_z) is pathwise differentiable, and Assumption 2.1 is satisfied. Then if β̂ is asymptotically linear and regular for μ(F_z), ψ(z) = d(z).

The thing that seems to be novel here is the idea of applying this result to the functional μ(F_z) that is nonparametrically estimated by β̂. The fact that asymptotic linearity and regularity imply pathwise differentiability follows by Van der Vaart (1991, Theorem 2.1), and the fact that Assumption 2.1 implies that there is only one influence function and that it equals the pathwise derivative is a small additional step that has been discussed in Newey (1990a).

This result can also be used to detect whether an estimator is √n-consistent. As shown by Van der Vaart (1991), if equation (2.2) is satisfied but μ̇_F(S_θ) is not mean-square continuous (i.e. d(z) satisfying equation (2.3) does not exist), then no √n-consistent, regular estimator exists. For example, the value of a density function at a point does not have a mean-square continuous derivative, and neither does the functional that is nonparametrically estimated by Manski's (1975) maximum score estimator. The pathwise derivative does not help in finding the asymptotic distribution (at a slower than √n rate) of such estimators, which can be quite complicated: e.g. see Kim and Pollard (1989).

The hypotheses of Theorem 2.1 are not primitive, but the point of Theorem 2.1 is to formalize the statement that "under sufficient regularity conditions" the influence function of a semiparametric estimator is the pathwise derivative of the functional that is nonparametrically estimated by β̂. In Sections 3 and 4, this result and some pathwise derivative calculations are used to derive propositions about semiparametric estimators. These results are labeled as "propositions" because primitive conditions for their validity are not given in Sections 3 and 4. They might also be labeled as "conjectures," although this word does not convey the sense that the validity of the results only requires regularity conditions. In Sections 3 and 4, the solution to equation (2.3) is calculated using the chain rule of calculus, differentiation under integrals, and ∂∫a(z)dF_θ/∂θ|_{θ=0} = E[a(z)S_θ(z)] for a(z) with finite mean square wherever needed, and then in Sections 5-7 conditions for implied remainder terms to be small are given. This approach, with formal calculation followed by regularity conditions, is similar to that used in parametric asymptotic theory (e.g. for Edgeworth expansions), and is meant to illustrate the usefulness of the pathwise derivative calculation.

3. Semiparametric M-Estimators

The rest of the paper will focus on a class of semiparametric m-estimators, obtained from moment conditions that can depend on estimated functions. Let m(z,β,h) be a vector of functions with the same dimension as β, depending on a data observation z and a vector of unknown functions h. Let ĥ(β), which can depend on β, denote an estimator of h, with corresponding m(z,β,ĥ(β)). A semiparametric m-estimator is one which solves an asymptotic moment equation

(3.1)  Σ_{i=1}^n m(zᵢ, β̂, ĥ(β̂))/n = 0.

The general idea here is that β̂ is obtained by a procedure that "plugs in" an estimated function. An early and important example is the Buckley and James (1979) estimator for censored regression.
Other examples are Robinson's (1988) semiparametric regression estimator and Powell, Stock, and Stoker's (1989) weighted average derivative estimator. For a new example, consider a semi-linear model with an additive nonparametric component, E[y|x,v] = x'β₀ + p₁(v₁) + p₂(v₂), where v = (v₁,v₂). The motivation for this model is that if v is high dimensional, the asymptotic properties of β̂ could be adversely affected if additivity of p₁(v₁) + p₂(v₂) is true but not imposed: see Section 4 for further discussion. Assume that the set of additive functions in v₁ and v₂ with finite mean-square is closed in mean square, and let Π(·|v₁,v₂) denote the mean-square (Hilbert space) projection on this set. Also, let Π̂(·|v₁,v₂) denote an estimator of this projection, such as the series estimator considered in Stone (1985) and in Section 6, or the alternating conditional expectation estimator in Breiman and Friedman (1985). Consider

(3.2)  β̂ = argmin_β Σ_{i=1}^n [yᵢ − xᵢ'β − ĥ(vᵢ,β)]²/2,  ĥ(v,β) = Π̂(y|v₁,v₂) − Π̂(x|v₁,v₂)'β.

This is a semiparametric m-estimator with m(z,β,h(β)) = [x + ∂h(v,β)/∂β][y − x'β − h(v,β)].

It is possible, at the level of generality of equation (3.1), to derive a number of propositions. To use the pathwise derivative formula in this derivation, it is necessary to identify the functional that is nonparametrically estimated by β̂. By the usual method of moments reasoning, equation (3.1) sets sample moments to zero, so that β̂ is consistent for the value of β that sets the population moments to zero. For a general distribution F = F_z (where the z subscript is suppressed henceforth for notational convenience), the sample moments have a limit (by the law of large numbers) of E_F[m(z,β,h(β,F))], where h(β,F) denotes the limit of ĥ(β), so that the limit μ(F) of β̂ should be the solution to

(3.3)  E_F[m(z, μ(F), h(μ(F), F))] = 0.

Before computing the pathwise derivative, it is interesting to note that equation (3.3) depends only on the limit h(β,F), and not on the particular form of the estimator ĥ(β). Thus, different nonparametric estimators of the same functions should result in the same asymptotic variance. For example, this reasoning explains why replacing the kernel estimator of Robinson (1988) by series estimators gives an asymptotically equivalent estimator, as shown by Newey (1990b), and suggests that for estimation of the additive model above, the asymptotic distribution is invariant to the estimator of the projection. Also, two estimators may not be asymptotically equivalent if the nuisance functions estimate different objects nonparametrically.

Proposition 1: The asymptotic variance of semiparametric estimators depends only on the function that is nonparametrically estimated, and not on the type of estimator (such as kernel or series nonparametric regression).

To obtain more results, it is useful to be more specific about the form of the pathwise derivative. Suppose that h has J components, h = (h₁,...,h_J). For a path F_θ, let E_θ[·] denote the expectation with respect to F_θ, h_j(β,θ) = h_j(β,F_θ), h_j(θ) = h_j(β₀,θ), and h_j(β) = h_j(β,F₀), equal to the truth when θ = 0, and let the same expressions without the j subscript denote corresponding vectors. For a path, μ(θ) will be the functional satisfying the parametric version of equation (3.3),

(3.4)  E_θ[m(z, μ(θ), h(μ(θ), θ))] = 0.

Then for m(z,h(θ)) = m(z,β₀,h(θ)), differentiation gives

∂E_θ[m(z,h(θ̄))]/∂θ|₀ = ∫ m(z,h(θ̄)) [∂dF_θ/∂θ]|₀ dz = E[m(z,h(θ̄))S_θ(z)].

Then, applying the chain rule to E_θ[m(z,h(θ))], it follows that

∂E_θ[m(z,h(θ))]/∂θ|₀ = E[m(z,h(θ₀))S_θ(z)] + ∂E[m(z,h(θ))]/∂θ|₀.

Assuming D = ∂E[m(z,β,h(β,θ₀))]/∂β|_{β₀} is nonsingular, the implicit function theorem gives

(3.5)  ∂μ(θ)/∂θ|₀ = −D⁻¹{E[m(z,h(θ₀))S_θ(z)] + ∂E[m(z,h(θ))]/∂θ|₀}.

The first term is already in the outer product form of equation (2.3), so that the pathwise derivative will exist if the second term can be put in a similar form. Suppose there are α_j(z) with finite mean square such that for each j,

(3.6)  ∂E[m(z, h₁(θ₀), ..., h_j(θ), ..., h_J(θ₀))]/∂θ|₀ = E[α_j(z)S_θ(z)].

Then, applying the chain rule to each h_j(θ), it follows that ∂E[m(z,h(θ))]/∂θ|₀ = Σ_{j=1}^J ∂E[m(z,h₁(θ₀),...,h_j(θ),...,h_J(θ₀))]/∂θ|₀ = E[{Σ_{j=1}^J α_j(z)}S_θ(z)], giving the outer product form. Then, moving −D⁻¹ inside the expectation, it follows that the pathwise derivative is d(z), so that by Theorem 2.1 the influence function of β̂ equals

(3.7)  ψ(z) = −D⁻¹{m(z,β₀,h(β₀)) + Σ_{j=1}^J [α_j(z) − E[α_j(z)]]}.

This influence function has an interesting structure. The leading term −D⁻¹m(z,β₀,h(β₀)) is the usual Huber (1967) formula for the influence function of an m-estimator with moment functions m(z,β,h(β)), i.e. the formula that would be obtained if ĥ(β) were equal to h(β). Thus, the second term is an adjustment term for the estimation of h(β), a nonparametric analog of adjustments that are familiar for two-step parametric estimators. It can also be interpreted as the pathwise derivative of the functional ∫m(z,β₀,h(β₀,F))dF(z). Furthermore, the adjustment contains exactly one term for each component of h, and the j-th adjustment can be interpreted as the pathwise derivative of E[m(z,β₀,h₁(β₀),...,h_j(β₀,F),...,h_J(β₀))]. This property is useful, because the adjustment terms can be calculated for each function h_j, holding the other functions fixed at their true values, and then the total adjustment formed as the sum. For this reason the j subscript will be dropped in the rest of Sections 3 and 4, with the understanding that the results can be applied to individual h_j terms, and then combined to derive the total adjustment (e.g. when some adjustment terms are zero and others are not).
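A simple special case (my illustration, with J = 1) shows how the adjustment arises. Take m(z,β,h) = h(x) − β with h₀(x) = E[y|x], so that β̂ is the sample average of a nonparametric regression estimator. Solving equation (3.3),

```latex
\mu(F) = E_F\big[h(x,F)\big] = E_F\big[\,E_F[y \mid x]\,\big] = E_F[y],
```

whose pathwise derivative is d(z) = y − β₀. Here D = −1, so the Huber term alone is −D⁻¹m(z,β₀,h(β₀)) = h₀(x) − β₀, and equation (3.7) then requires α(z) − E[α(z)] = y − h₀(x): the adjustment for estimating the regression is exactly the regression residual.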
It is useful to know when an adjustment term is zero. In such cases, it should not be necessary to account for the presence of ĥ(β), i.e. it can be treated as if it were equal to h(β), greatly simplifying the calculation of the asymptotic variance and the finding of a consistent estimator of it.

One case where an adjustment term will be zero is when equation (3.1) is the first-order condition to a maximization problem, and h(β,F) maximizes the population value of the same function. To be specific, suppose that there is a function q(z,β,h(β)) depending on β but not on the distribution F of z, and a set of functions H(β), possibly depending on β, such that ĥ(β) has a limit h(β,F) satisfying

(3.8)  m(z,β,h(β)) = ∂q(z,β,h(β))/∂β,  h(β,F) = argmax_{h(β)∈H(β)} E_F[q(z,β,h(β))].

The interpretation of this condition is that m(z,β,h(β)) are the first order conditions for a stationary point of the function q, and that h(β,F) maximizes the expected value of the same function, i.e. h(β,F) has been "concentrated out." Then for any parametric model h(β,θ), since E[q(z,β,h(β,θ))] is maximized at h(β,F₀) = h(β,0), the first order conditions for this maximization are ∂E[q(z,β,h(β,θ))]/∂θ|₀ = 0, identically in β. Differentiating again with respect to β,

(3.9)  0 = ∂²E[q(z,β,h(β,θ))]/∂θ∂β|₀ = ∂E[∂q(z,β,h(β,θ))/∂β]/∂θ|₀ = ∂E[m(z,β,h(β,θ))]/∂θ|₀.

Evaluating this equation at β₀, it follows that α(z) = 0 will solve equation (3.6), and hence the adjustment term is zero. Summarizing:

Proposition 2: If equation (3.8) is satisfied, then the estimation of h can be ignored in calculating the asymptotic variance, i.e. it is the same as if ĥ(β) = h(β).

Examples of estimators that satisfy the hypotheses of this proposition are those of Robinson (1988), Ichimura (1987), Klein and Spady (1987), and Ai (1990). A new example is the additive semi-linear estimator of equation (3.2). Suppose that the set H of additive functions is closed under any F ∈ ℱ, and let Π_F(·|v₁,v₂) denote the projection under F. Then ĥ(v,β) is a nonparametric estimator of h(v,β,F) = Π_F(y|v₁,v₂) − Π_F(x|v₁,v₂)'β, which minimizes E_F[(y − x'β − h(v,β))²], the same objective function minimized by the limit of β̂. Therefore, by Proposition 2, estimation of Π(y|v₁,v₂) and Π(x|v₁,v₂) should have no effect on the asymptotic variance of β̂. Thus, for ε = y − x'β₀ − p₁(v₁) − p₂(v₂), the formula for the influence function is

(3.10)  ψ(z) = (E[{x − Π(x|v₁,v₂)}{x − Π(x|v₁,v₂)}'])⁻¹{x − Π(x|v₁,v₂)}ε.

Primitive conditions for this result are given in Newey (1991), and somewhat weaker conditions could be formulated using the results of Section 6.

There is another, more direct condition under which estimation of the nuisance function does not affect the asymptotic variance. For this condition, suppose that m(z,h) depends on h only through the value it takes on as a function of a subvector v of z, i.e. h(v) is a real vector argument in m(z,h). The additive semi-linear example has this property if h is redefined to include the limiting value of Π̂(x|v₁,v₂). Let h(v,θ) denote the limit h(v,β₀,F_θ) for a path. For M(z) = ∂m(z,β₀,h)/∂h|_{h(v)}, differentiation gives

(3.11)  ∂E[m(z,h(θ))]/∂θ|₀ = E[M(z)∂h(v,θ)/∂θ|₀] = ∂E[M(z)h(v,θ)]/∂θ|₀.

If the term on the right-hand side is zero, then α(z) = 0 will solve equation (3.6), and the adjustment term is zero. One simple condition for this is that E[M(z)|v] = 0, or more generally that E[M(z)h(v)] = 0 if h(v,θ) is an element of a set to which M(z) is orthogonal.

Proposition 3: If E[M(z)|v] = 0, or more generally h(v,F) is an element of a set H such that E[M(z)h(v)] = 0 for all h ∈ H, then the adjustment term will be zero, i.e. estimation of h can be ignored in calculating the asymptotic variance.

The semi-linear, additive model is also an example here. In cases where the correction term is nonzero, its form will depend on the limit of ĥ(β). Therefore, it is difficult to give a general characterization of the correction term without completely specifying the form of ĥ(β).
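The conclusion of Propositions 2 and 3 for the semi-linear model can be illustrated numerically. The following sketch is mine, not from the paper: the data-generating process, the basis dimension, and the use of a single scalar nuisance variable v in place of the additive pair (v₁,v₂) are all illustrative choices. It computes the estimator of equation (3.2) with a power series first step:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
beta0 = 1.5
v = rng.uniform(-1.0, 1.0, n)            # nuisance variable
x = v + rng.normal(0.0, 1.0, n)          # regressor correlated with v
p_v = np.sin(np.pi * v)                  # unknown nuisance function p(v)
y = x * beta0 + p_v + rng.normal(0.0, 1.0, n)

# Power series (polynomial) estimator of the projections Pi(y|v) and Pi(x|v)
P = np.vander(v, 6, increasing=True)     # basis: 1, v, ..., v^5
coef = np.linalg.lstsq(P, np.column_stack([y, x]), rcond=None)[0]
fit = P @ coef
y_res, x_res = y - fit[:, 0], x - fit[:, 1]

# Least squares on the residuals: the semiparametric m-estimator of beta
beta_hat = float(x_res @ y_res / (x_res @ x_res))
```

Because the projection is concentrated out in the sense of equation (3.8), the first-step estimation leaves the asymptotic variance the same as if Π(·|v) were known, and by Proposition 1 a kernel first step would give an asymptotically equivalent estimator.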
One result that does not depend on completely specifying the form of ĥ(β) can be obtained in semiparametric models where the data z₁, z₂, ... are i.i.d. and z is restricted to have a density of the form f(z|β,g), where g is a nonparametric (functional) component. Let S_β(z) = ∂ln f(z|β,g₀)/∂β|_{β₀} denote the score for β, and for a finite-dimensional parameterization g(η) of g (with g(η₀) the truth), let S_η(z) = ∂ln f(z|β₀,g(η))/∂η|_{η₀}. Also, let A denote a constant matrix with number of rows equal to the number of elements of β. The tangent set 𝒥 is defined as the mean-square closure of the set of all linear combinations AS_η(z). The tangent set is useful in calculating the asymptotic variance bound for estimators of β in the semiparametric model f(z|β,g). The form of this bound is V = (E[S(z)S(z)'])⁻¹, where S(z) = S_β(z) − Π(S_β(z)|𝒥) and Π(·|𝒥) denotes the mean-square projection on the tangent set. See, for example, Newey (1990a) for further discussion.

Under certain conditions, α(z) = −Π(m(z)|𝒥) will solve equation (3.6) for m(z) = m(z,β₀,h(β₀)), so that the correction term can be calculated from the tangent set and this projection. Let θ be the parameter of an unrestricted path, as discussed in Section 2 (θ does not have anything to do with β or η). Suppose that there is g(θ) such that, for the limit h(θ) of ĥ(β₀) under the path distribution,

(3.12)  ∫ m(z,h(θ)) f(z|β₀,g(θ)) dz = 0.

In words, for the limit of ĥ(β₀) under a general distribution there is a corresponding value of the nonparametric component of the semiparametric model where the population moment conditions (corresponding to equation (3.1)) are satisfied. Let S_{gθ}(z) = ∂ln f(z|β₀,g(θ))/∂θ|₀, and note that S_{gθ}(z) is an element of the tangent set, implying Π(S_{gθ}(z)|𝒥) = S_{gθ}(z). Suppose that S_{gθ}(z) = Π(S_θ(z)|𝒥). Then differentiating equation (3.12) gives

∂E[m(z,h(θ))]/∂θ|₀ = −E[m(z)S_{gθ}(z)] = −E[m(z)Π(S_θ(z)|𝒥)] = −E[Π(m(z)|𝒥)Π(S_θ(z)|𝒥)] = E[−Π(m(z)|𝒥)S_θ(z)].

Thus, under the previous conditions, α(z) = −Π(m(z)|𝒥) satisfies equation (3.6). Summarizing:

Proposition 4: If for all unrestricted paths F_{zθ} there exists g(θ) such that equation (3.12) is satisfied and ∂ln f(z|β₀,g(θ))/∂θ|₀ = Π(S_θ(z)|𝒥), then α(z) = −Π(m(z,β₀,h(β₀))|𝒥) and the influence function of β̂ is

ψ(z) = −D⁻¹[m(z,β₀,h(β₀)) − Π(m(z,β₀,h(β₀))|𝒥)].

This form of the correction term has previously been derived by Bickel, Klaassen, Ritov, and Wellner (1990) and Newey (1990a). The contribution of Proposition 4 is to give a general formulation for this result in terms of the pathwise derivative calculation developed in Section 2.

This result leads to a sufficient condition for asymptotic efficiency of a semiparametric m-estimator: that the hypotheses of Proposition 4 are satisfied and m(z) = S_β(z). In this case, m(z) + α(z) − E[α] = S_β(z) − Π(S_β(z)|𝒥) = S(z). Furthermore, any semiparametric m-estimator that is regular under the semiparametric model f(z|β,g) and has influence function −D⁻¹(m(z)+α(z)−E[α]) will satisfy

(3.13)  D = −E[(m(z)+α(z)−E[α])S(z)'],

as discussed in Newey (1990a). Thus, the influence function of β̂ is (E[S(z)S(z)'])⁻¹S(z) = VS(z), with corresponding asymptotic variance VE[S(z)S(z)']V = V, which equals the lower bound.

4. Functions of Mean-Square Projections and Densities

In this section, the form of the correction term is derived when the nuisance functions are linear functions of conditional expectations or other mean-square projections, such as additive or partially linear regressions, and for densities. Let x be an r × 1 vector.
Let y be a random variable with finite second moment, let 𝒢 denote a linear set of functions of x that is closed in mean-square, and let g(x) = argmin_{g̃∈𝒢} E[(y − g̃(x))²] denote the least squares (Hilbert-space) projection of y on 𝒢. The simplest nonparametric example of a projection is g(x) = E[y|x], where 𝒢 is all measurable functions of x with finite mean-square. A more general example is a projection on additive functions, with each component a function of a subvector of x. This is a smaller set of functions, whose consideration is motivated partly by the difficulty of estimating conditional expectations for x with many dimensions; e.g. see Stone (1985) for discussion and references. Here, where g(x) is a nuisance function, important reasons to avoid high dimensional nonparametric regressions are that a projection on a larger set of functions than that to which g(x) belongs will lead to higher asymptotic variances for β̂ in some cases, as noted in Newey (1991), and will lower the rate at which remainder terms converge to zero, affecting accuracy of the asymptotic normal approximation. The nuisance functions considered in this section will be of the form h(v) = A(g,v), where h(v) is a linear function of g and v is a subvector of z.

The correction term is derived first for the simplest case, where h(v) = g(x). For a path, let g(x,θ) = argmin_{g̃∈𝒢} E_θ[(y − g̃(x))²] denote the projection of y on 𝒢 under F_θ. Also, let δ(x) = Π(M(z)|𝒢) denote the vector of projections of the elements of M(z) on 𝒢. Note that by g(x,θ) ∈ 𝒢, it follows that E[M(z)g(x,θ)] = E[δ(x)g(x,θ)], and by δ(x) ∈ 𝒢, E_θ[δ(x)g(x,θ)] = E_θ[δ(x)y] identically in θ, so by the chain rule,

(4.2)  ∂E[M(z)g(x,θ)]/∂θ|₀ = ∂E[δ(x)g(x,θ)]/∂θ|₀ = {∂E_θ[δ(x)g(x,θ)]/∂θ − ∂E_θ[δ(x)g(x)]/∂θ}|₀ = ∂E_θ[δ(x){y − g(x)}]/∂θ|₀ = E[δ(x){y − g(x)}S_θ(z)].

Equation (4.2) implies the next result.

Proposition 5: If h(v) = g(x) is the projection of y on 𝒢, then the correction term is α(z) = Π(M(z)|𝒢)[y − g(x)].

A new example is an estimator for a semiparametric random effects model. Let (y_t, x_t), t = 1, 2, be sets of observations for two time periods, where y_t is binary, and suppose that for x = (x₁, x₂),

E[y_t|x] = Φ([x_t'β₀ + p(x)]/σ_t),  σ₁ = 1,

where p(x) is an unknown function and Φ denotes the standard normal CDF. This is a binary panel data model with y_t = 1(x_t'β₀ + α + ε_t > 0) for an individual effect α, where the conditional distribution of α + ε_t given x is N(p(x), σ_t²). This model generalizes Chamberlain's (1980) random effects model by allowing the conditional mean of α to be unknown. In contrast to Manski's (1987) semiparametric individual effects model, ε_t is allowed to be heteroskedastic over time, but the conditional distribution of α + ε_t is restricted to be Gaussian. An implication of this model is that

(4.3)  Φ⁻¹(E[y₁|x]) = σ₂Φ⁻¹(E[y₂|x]) + (x₁ − x₂)'β₀.

This implication can be used to construct a semiparametric minimum distance estimator by replacing the conditional expectations with nonparametric estimators ĥ_t(x) of h_t(x) = E[y_t|x], and choosing β̂ and σ̂₂ from the least squares regression of Φ⁻¹(ĥ₁(xᵢ)) on x₁ᵢ − x₂ᵢ and Φ⁻¹(ĥ₂(xᵢ)). This estimator can also be generalized to the case where the distribution of disturbances is unknown, by normalizing the scale of β and replacing Φ by a series approximation to the unknown inverse marginal distribution functions, although further development of this estimator is beyond the scope of this paper.

To derive the influence function of the estimator of (σ₂, β₀')', note that it is a semiparametric m-estimator with v = x,

m(z,(σ,β),h(v)) = A(x,h₂)[Φ⁻¹(h₁(x)) − σΦ⁻¹(h₂(x)) − (x₁ − x₂)'β],  A(x,h₂) = [Φ⁻¹(h₂(x)), (x₁ − x₂)']',

and D = −E[A(x,h₂)A(x,h₂)']. Here m(z,(σ₂,β₀),h(v)) = 0 at the truth by equation (4.3), so the derivative of A with respect to h₂ does not contribute, and M_j(z) = A(x,h₂)(−1)^{j−1}σ_jφ(Φ⁻¹(h_j(x)))⁻¹ for φ the standard normal pdf. Then by Proposition 5,

α_j(z) = A(x,h₂)(−1)^{j−1}σ_jφ(Φ⁻¹(h_j(x)))⁻¹[y_j − h_j(x)].
)(-lcr •^ O'er cr the correction terms are the only source of M.(z) = A(x,h ){-l)'^~-^<^($~^(h.(x)))~^ J p = and IB and 0(z) = -D ^[a^(z)+a2(z)] = -D ^Atx.h^) (4.4) {^(*"^(h^(x)))"^y^-h^(x)] Proposition - • (r^(p{i~'^{h^U)))~'^ly^-h^U)]}. can be generalized to linear functionals of the projection 5 that have a particular property specified in the following assumption. h(v) = A(v,g), Assumption 4.1: is a linear function of A(v,g) there is 6(x) e & (4.5) E[M(z)A(v.i)] = E[5(x)i(x)]. such that for all and g, g(x) e §, By the Riesz representation theorem, equation (4.5) is equivalent to assuming that the functional E[M(z)A(v, g) condition is necessary for functional of h(v) h(v), is mean-square continuous in ] satisfied. Thus, for to be a v^-consistently estimable XM(z)h(v)dF(z) as discussed in Newey (1991), will affect the convergence rate of h(v) a linear This g. P so that the estimation of unless Assumption function of g(x). 4. 1 is Assumption 4.1 and the form of the correction term given below characterize the adjustment for mean-square projections. Equation (4.5) leads to a straightforward form for the correction term. h(v,e) = A(v,g(e)), Noting that (4.6) E[M(z)ah(v.e)/se] I differentiation gives = 5E[M(z)h(v,e)]/aei = aE[5(x)g(x,e)]/ae|^ o = SE[M(z)A(v,g(e))]/aei = E[5(x){y-g(x)}S^(z)]. o where the last equality follows as in equation (4.2). Proposition 6: If Assumption 4.1 is satisfied, 8(x)[y-g(x)]. 19 the correction term is oc(z) In order for this result to provide an interesting formula, possible to find 6{x). In a number of cases similar to that of Proposition (x ,x ), v = x x = h(v) = A(v,g) = J.g(x ,x )dx and , E[M(z)h(v)] = E[E[M(z) |v]h(v) In this case, takes a projection form One interesting case is that where 5. may be a vector, X 6(x) it must be E[J^[M(z) Ix^lgCxjdx^] = = ] . 
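To fix ideas on the binary panel data estimator based on (4.3), the following sketch carries out the two-step procedure on a hypothetical simulated design (the design, variable names, and the use of the true conditional means in place of a nonparametric first stage are illustrative assumptions, not part of the paper; with exact conditional means the second stage recovers (β₀,σ₀) exactly):

```python
# Sketch of the minimum distance estimator implied by (4.3):
#   Phi^{-1}(E[y1|x]) = sigma0 * Phi^{-1}(E[y2|x]) + (x1 - x2)'beta0.
# Illustrative design: the true conditional means stand in for the
# nonparametric first-stage estimates h_t, so the fit is exact.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
beta0, sigma0 = 1.0, 1.5            # sigma_1 normalized to 1, sigma0 = sigma_2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
p0 = 0.5 * (x1 + x2)                # unknown conditional mean of the effect

# First stage (here: true h_t(x) = E[y_t|x]; in practice a series regression)
h1 = norm.cdf(x1 * beta0 + p0)
h2 = norm.cdf((x2 * beta0 + p0) / sigma0)

# Second stage: least squares of Phi^{-1}(h1) on (x1 - x2) and Phi^{-1}(h2);
# no intercept, since p0(x) cancels in (4.3)
X = np.column_stack([x1 - x2, norm.ppf(h2)])
coef, *_ = np.linalg.lstsq(X, norm.ppf(h1), rcond=None)
beta_hat, sigma_hat = coef
```

In practice ĥ_t would come from a nonparametric regression of y_t on x, and the fitted values must be kept inside (0,1) before applying Φ⁻¹ (e.g. by trimming).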
In order for this result to provide an interesting formula, it must be possible to find δ(x). In a number of cases, δ(x) takes a projection form similar to that of Proposition 5. One interesting case is that where x = (x₁,x₂), v = x₂, and h(v) = ∫_𝒜 g(x₁,x₂)dx₁, where x₁ may be a vector. In this case,

  E[M(z)h(v)] = E[E[M(z)|v]h(v)] = E[∫_𝒜 E[M(z)|x₂]g(x₁,x₂)dx₁]
  = E[1(x₁∈𝒜)f(x₁|x₂)⁻¹E[M(z)|x₂]g(x)] = E[Π(1(x₁∈𝒜)f(x₁|x₂)⁻¹E[M(z)|x₂]|𝒢)g(x)],

where f(x₁|x₂) is the conditional density of x₁ given x₂.

Proposition 7: If h(v) = ∫_𝒜 g(x₁,x₂)dx₁, x is absolutely continuous with respect to the product measure corresponding to dx₁ and the distribution of x₂, with density f(x₁|x₂), and 1(x₁∈𝒜)f(x₁|x₂)⁻¹E[M(z)|x₂] has finite second moment, then the correction term is δ(x)[y − g(x)] for δ(x) = Π(1(x₁∈𝒜)f(x₁|x₂)⁻¹E[M(z)|x₂]|𝒢).

An example is average approximate consumer surplus, where x₁ is a price variable, 𝒜 = [a,b], and β̂ = Σᵢ₌₁ⁿ∫_a^b ĝ(x₁,x₂ᵢ)dx₁/n, which is a semiparametric m-estimator with m(z,β,h(v)) = h(v) − β and M(z) = 1. By Proposition 7, the influence function for this estimator will be

(4.7)  ψ(z) = ∫_a^b g(x₁,x₂)dx₁ − β₀ + Π(1(a ≤ x₁ ≤ b)f(x₁|x₂)⁻¹|𝒢)[y − g(x)].

Results for exact consumer surplus (i.e. equivalent variation) and for cases where the demand function is a nonlinear function of a projection (e.g. log-linear models) are analyzed in Hausman and Newey (1991).

Another case where δ(x) takes a projection form is where h(v) is a derivative of a projection evaluated at some other variable. For x ∈ ℝ^r and a vector λ = (λ₁,...,λ_r)′ of nonnegative integers with |λ| = Σ_{j=1}^r λ_j, denote a partial derivative by

(4.8)  D^λ g(x) = ∂^{|λ|}g(x)/∂x₁^{λ₁}···∂x_r^{λ_r},

and let h(v) = D^λ g(x)|_{x=v}. Suppose that the densities f_v(v) and f_x(x) of v and x, respectively, exist with respect to the same dominating measure, which is Lebesgue measure for the components corresponding to nonzero components of λ. Let M̄(v) = E[M(z)|v]. Assuming that M̄(v)f_v(v) is differentiable to sufficient order in the components of v corresponding to nonzero components of λ, with zero derivatives on the boundary of its support, repeated integration by parts gives

  E[M(z)h(v)] = ∫M̄(v)D^λ g(v)f_v(v)dv = (−1)^{|λ|}∫D^λ[M̄(v)f_v(v)]g(v)dv
  = (−1)^{|λ|}∫D^λ[M̄(v)f_v(v)]|_{v=x} g(x)dx = E[(−1)^{|λ|}f_x(x)⁻¹D^λ[M̄(v)f_v(v)]|_{v=x} g(x)]
  = E[δ(x)g(x)],  δ(x) = (−1)^{|λ|}Π(f_x(x)⁻¹D^λ[M̄(v)f_v(v)]|_{v=x}|𝒢).

Proposition 8: If h(v) = D^λ g(x)|_{x=v}; v and x are absolutely continuous with respect to the same measure, which is Lebesgue measure for the components corresponding to nonzero components of λ; the support of v is a convex set with nonempty interior; f_v(v) and E[M(z)|v] are continuously differentiable to order |λ| in the components of v corresponding to nonzero components of λ; for each λ̃ ≤ λ, D^λ̃[M̄(v)f_v(v)] is zero on the boundary of the support of v; and f_x(x)⁻¹D^λ[M̄(v)f_v(v)]|_{v=x} has finite second moment; then the correction term is δ(x)[y − g(x)] for δ(x) = (−1)^{|λ|}Π(f_x(x)⁻¹D^λ[M̄(v)f_v(v)]|_{v=x}|𝒢).

An example with no derivatives involved (λ = 0) is Stock's (1989) nonparametric prediction estimator, where β₀ = E[g(v)] − E[g(x)] and β̂ = Σᵢ₌₁ⁿ[ĝ(vᵢ) − ĝ(xᵢ)]/n, with 𝒢 = {g₁(x₁) + x₂′η} a partially linear projection and v = (v₁,x₂) partitioned conformably with x = (x₁,x₂). This is a semiparametric m-estimator where h₁(v) = g(v), h₂(x) = g(x), and M₁(z) = 1, M₂(z) = −1. From the form of the correction terms in Proposition 8, the influence function is

(4.9)  ψ(z) = g(v) − g(x) − β₀ + [Π(f_x(x)⁻¹f_v(x)|𝒢) − 1][y − g(x)].

This result differs from Stock's in the inclusion of the term g(v) − g(x) − β₀, because Stock's result only derived the conditional distribution given the observations on v and x. The variance of the second term is the same as in Stock's formula, because

  Π(f_x(x)⁻¹f_v(x)|𝒢) = f_{x₁}(x₁)⁻¹f_{v₁}(x₁) + (x₂ − E[x₂|x₁])′E[Var(x₂|x₁)]⁻¹E[(x₂ − E[x₂|x₁])f_{x₁}(x₁)⁻¹f_{v₁}(x₁)],

for f_{x₁}(x₁) and f_{v₁}(v₁) the densities of x₁ and v₁, respectively. Proposition 8 also gives the form of correction terms for the dynamic discrete choice estimators of Ahn and Manski (1989) and Hotz and Miller (1989), and the average derivative estimator of Hardle and Stoker (1989).

A correction term for density estimation can be derived under conditions similar to those for the projection. Let f_w(w) denote the density of a vector w with respect to some measure, and let h(v) = A(v,f_w), where A(v,f_w) is a linear function of the density. Suppose that there is a(w) such that

  E[M(z)A(v,f_w)] = ∫a(w)f_w(w)dw.

Let f_w(w|θ) denote the density of w for a path. Then

  E[M(z)∂h(v,θ₀)/∂θ] = ∂E[M(z)A(v,f_w(θ))]/∂θ|_{θ₀} = ∂E_θ[a(w)]/∂θ|_{θ₀} = E[a(w)S_θ(z)].

Proposition 9: If h(v) = A(v,f_w) for a density f_w and there is a(w) such that E[M(z)A(v,f_w)] = ∫a(w)f_w(w)dw, then the correction term is a(w) − E[a(w)].

Existence of such an a(w) will follow from the Riesz representation theorem if ∫f_w(w)²dw is finite and E[M(z)A(v,f_w)] can be extended to a linear functional on the Hilbert space of square integrable (dw) functions that is continuous. Continuity of E[M(z)A(v,f_w)] in f_w, in the square integrable sense, appears to be essentially necessary for √n-consistency, although it is difficult to give a precise result, because the usual parameterization for checking √n-consistency is the square root of the density rather than the density itself.

A case where it is easy to compute a density correction term is that with w = (y′,x′)′ and h(v) = D^λ[∫a(y)f_w(w)dy]|_{x=v}. Integration by parts gives

  E[M(z)h(v)] = ∫M̄(v)D^λ[∫a(y)f_w(w)dy]|_{x=v}f_v(v)dv
  = (−1)^{|λ|}∫D^λ[M̄(v)f_v(v)]|_{v=x}[∫a(y)f_w(w)dy]dx = ∫ā(w)f_w(w)dw,
  ā(w) = (−1)^{|λ|}D^λ[M̄(v)f_v(v)]|_{v=x}a(y).

Proposition 10: If h(v) = D^λ[∫a(y)f(y,x)dy]|_{x=v}; v and x are absolutely continuous with respect to the same measure, which is Lebesgue measure for the components of x corresponding to nonzero components of λ; the support of v is a convex set with nonempty interior; f_v(v) and E[M(z)|v] are continuously differentiable to order |λ|; for each λ̃ ≤ λ, D^λ̃[M̄(v)f_v(v)] is zero on the boundary of the support of v; and ā(w) = (−1)^{|λ|}D^λ[M̄(v)f_v(v)]|_{v=x}a(y) has finite second moment; then the correction term is ā(w) − E[ā(w)].

This result gives the form of the correction term for Powell, Stock, and Stoker's (1989) weighted average derivative estimator and Robinson's (1989) test statistics.
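Returning to the average consumer surplus example leading to (4.7), the estimator averages the integral of an estimated demand function over a price range. A minimal numerical sketch (an illustrative design of my own; the true demand is taken to lie in the span of the series so the noise-free fit, and hence the integral, is exact):

```python
# Sketch of the average consumer surplus estimator
#   beta_hat = (1/n) sum_i \int_a^b g_hat(x1, x2_i) dx1,
# with a linear-in-parameters series estimate of the demand g.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 1.0, n)            # price
x2 = rng.normal(size=n)                  # other covariate (e.g. income)
y = 2.0 - x1 + 0.5 * x2                  # demand; noise-free for an exact check

# Series regression of y on the basis p(x) = (1, x1, x2)
P = np.column_stack([np.ones(n), x1, x2])
c, *_ = np.linalg.lstsq(P, y, rcond=None)

# Closed-form integral of the fitted demand over x1 in [a, b] at each x2_i:
#   int_a^b (c0 + c1*x1 + c2*x2) dx1 = c0*(b-a) + c1*(b^2-a^2)/2 + c2*x2*(b-a)
a, b = 0.0, 1.0
surplus_i = c[0] * (b - a) + c[1] * (b**2 - a**2) / 2 + c[2] * x2 * (b - a)
beta_hat = surplus_i.mean()
```

With a polynomial series the per-observation integral is available in closed form, which is one computational attraction of power series in this setting.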
Another example is Ruud's (1986) density weighted least squares estimator, which is treated in Newey and Ruud (1991). There may be other interesting cases where the form of the correction term can be calculated. Hopefully, the ones given here illustrate the usefulness of the pathwise derivative calculation of the influence function. In the next two Sections, regularity conditions for the validity of many of these calculations are given.

5. Regularity Conditions

This Section develops a set of regularity conditions that are sufficient for validity of the pathwise derivative formula. The regularity conditions are based on direct verification that remainder terms from the pathwise derivative are small. The rest of the paper will focus on semiparametric generalized method of moments estimators where h(v) does not depend on parameters. Let h(v) = (h₁(v₁),...,h_J(v_J))′ be a vector of functions of the data observation z, where each v_j is a subvector of z; let m(z,β,h) be a vector of functions of z, a q × 1 parameter vector β, and a possible value h of the vector of functions; and assume that the moment condition E[m(z,β₀,h(v))] = 0 is satisfied. Note that this setup allows h(v) to include parameter values, by specifying that some elements of h(v) are trivial (can only take on one value). For example, some might be trimming parameters, as in Newey and Ruud (1991). Let ĥ denote an estimator of this vector of functions, m̂_n(β) = Σᵢ₌₁ⁿ m(zᵢ,β,ĥ(vᵢ))/n, and Ŵ a positive semi-definite matrix. The estimator to be analyzed satisfies

(5.1)  β̂ = argmin_{β∈ℬ} m̂_n(β)′Ŵ m̂_n(β).

Although ĥ is not allowed to depend on β in this section, the results are still useful for the general case, because they provide conditions for the important intermediate result that Σᵢ₌₁ⁿ m(zᵢ,β₀,ĥ(vᵢ,β₀))/√n is asymptotically normal. This result will follow as a special case by letting m(z,h) = m(z,β₀,h).

Because of the importance of asymptotic normality of Σᵢ₌₁ⁿ m(zᵢ,ĥ(vᵢ))/√n, and because this function is the source of the correction terms, it is useful to discuss this result first and organize the discussion around a few high-level conditions. The pathwise derivative calculation is very useful in formulating these conditions, because it gives the form of a remainder term that should converge in probability to zero, implying asymptotic normality. Let α_j(z) be the solution to equation (3.5), and α(z) = Σ_{j=1}^J α_j(z). Then, from the form of equation (3.7), one would expect that the following remainder term R_n should converge in probability to zero:

  R_n = Σᵢ₌₁ⁿ m(zᵢ,ĥ(vᵢ))/√n − Σᵢ₌₁ⁿ uᵢ/√n,  uᵢ = m(zᵢ,h(vᵢ)) + α(zᵢ) − E[α(z)].

If R_n →p 0, then asymptotic normality of Σᵢ₌₁ⁿ m(zᵢ,ĥ(vᵢ))/√n will follow from the central limit theorem applied to Σᵢ₌₁ⁿ uᵢ/√n.

To give conditions for R_n to be small it is helpful to decompose this remainder term. Let M(z) = ∂m(z,h)/∂h|_{h=h(v)}, let M_j(z) denote the jth column of M(z), and

  R_n¹ = Σᵢ₌₁ⁿ{m(zᵢ,ĥ(vᵢ)) − m(zᵢ,h(vᵢ)) − M(zᵢ)[ĥ(vᵢ) − h(vᵢ)]}/√n,
  R²_{nj} = Σᵢ₌₁ⁿ{M_j(zᵢ)[ĥ_j(vᵢ) − h_j(vᵢ)] − α_j(zᵢ) + E[α_j(z)]}/√n.

Note that R_n = R_n¹ + Σ_{j=1}^J R²_{nj}, so that R_n →p 0 if each of the following conditions is satisfied.

Asymptotic Linearity: R_n¹ →p 0.

Asymptotic Differentiability: R²_{nj} →p 0, (j = 1,...,J).

Asymptotic linearity is similar to a condition formulated in Hardle and Stoker (1989), and will follow from a Taylor expansion and a sample mean-square convergence rate for ĥ of slightly faster than n^{−1/4}, as discussed below. Asymptotic differentiability is a deeper, more important condition. It can be shown to hold if ĥ_j(v) is a kernel estimator, using U-statistic projection results and higher-order bias reducing kernels, as in Powell, Stock, and Stoker (1989) and Robinson (1988), or if ĥ_j(v) is a series estimator, using properties of sample projections and series approximations, as in Newey (1990b) and Section 6. It is also possible to further decompose the asymptotic differentiability remainder in a way that allows application of Andrews' (1990b) stochastic equicontinuity results. Let

  R³_{nj} = Σᵢ₌₁ⁿ{M_j(zᵢ)[ĥ_j(vᵢ) − h_j(vᵢ)] − ∫M_j(z)[ĥ_j(v) − h_j(v)]dF(z)}/√n,
  R⁴_{nj} = √n{∫M_j(z)[ĥ_j(v) − h_j(v)]dF(z) − Σᵢ₌₁ⁿ{α_j(zᵢ) − E[α_j(z)]}/n}.

Note that R²_{nj} = R³_{nj} + R⁴_{nj}, so that R²_{nj} →p 0 if each of the following conditions is satisfied.

Stochastic Equicontinuity: R³_{nj} →p 0.

Functional Convergence: R⁴_{nj} →p 0.

Conditions for stochastic equicontinuity are given below. Functional convergence is specific to the form of ĥ(v). One interesting result is that if E[M_j(z)|v] = 0, then α_j(z) = 0 and functional convergence holds trivially. Thus, asymptotic linearity and stochastic equicontinuity are regularity conditions for Proposition 3, as further discussed below. When α_j(z) is not zero, functional convergence may follow from asymptotic normality of mean-square continuous linear functionals of ĥ(v), since functional convergence is only slightly stronger than asymptotic normality of √n∫E[M_j(z)|v][ĥ_j(v) − h_j(v)]dF(z).

Some of these high level conditions will be consequences of more primitive hypotheses. The first of these limits the dependence between observations that are far apart.

Assumption 5.1: zᵢ is strictly stationary and strong (α) mixing with mixing coefficients α(t) = O(t^{−μ}) for μ > 2.

The next condition is uniform consistency of ĥ.

Assumption 5.2: For each j and 𝒱_j the support of v_j, sup_{v_j∈𝒱_j}|ĥ_j(v_j) − h_j(v_j)| →p 0.
Primitive conditions for this and the other assumptions about ĥ are given in Section 6. The following pair of hypotheses are more primitive conditions for Asymptotic Linearity. The first imposes smoothness conditions on m. For a random variable Y, let ‖Y‖_s = (E[|Y|^s])^{1/s}, and for any ε > 0 let ℋ(v,ε) = {h: ‖h − h(v)‖ ≤ ε}.

Assumption 5.3: There are ε > 0, s′ ≥ 2μ/(μ−1), a neighborhood 𝒩 of β₀, and b(z) with ‖b(z)‖_{s′} finite, such that with probability one m(z,β,h) is twice continuously differentiable on 𝒩 × ℋ(v,ε), with each derivative of total order j+k ≤ 2 bounded in norm by b(z) on 𝒩 × ℋ(v,ε).

The next hypothesis imposes a convergence rate on ĥ.

Assumption 5.4: Either i) m(z,β,h) is linear in h, or ii) for each j, Σᵢ₌₁ⁿ|ĥ_j(vᵢ) − h_j(vᵢ)|²/n = o_p(n^{−1/2}).

Assumption 5.4 is stated in terms of the sample mean-square norm rather than a more general norm because the literature on convergence rates of nonparametric estimators seems to give the sharpest results for this norm.

Lemma 5.1: If Assumptions 5.2 – 5.4 are satisfied then Asymptotic Linearity is satisfied.

The following condition is sufficient for stochastic equicontinuity.

Assumption 5.5: For each j, either a) 𝒱_j is a singleton, or b) 𝒱_j is convex with Prob(v_j ∈ Boundary(𝒱_j)) = 0, ‖M_j(z)‖_{s′} < ∞ for s′ ≥ 2μ/(μ−2), and there is a positive integer d_j > dim(v_j)/2 such that ĥ_j(v) is continuously differentiable to order d_j on 𝒱_j, with bounded derivatives, and for all |λ| ≤ d_j, sup_{v_j∈𝒱_j}|D^λĥ_j(v_j) − D^λh_j(v_j)| →p 0.

Lemma 5.2: If Assumptions 5.1 and 5.5 are satisfied, then Stochastic Equicontinuity is satisfied.

Although the main focus here is asymptotic distribution theory, for completeness it is appropriate to give a consistency result. The next hypothesis imposes identification and regularity conditions for consistency.
Let ρ: ℝ → ℝ be continuous at zero, with ρ(0) = 0.

Assumption 5.6: W̄ is positive definite; E[m(z,β,h(v))] = 0 has a unique solution at β₀; ℬ is compact; m(z,β,h(v)) is continuous in β with probability one; E[sup_{β∈ℬ}‖m(z,β,h(v))‖] < ∞; and there is b(z) with E[b(z)] < ∞ such that either a) m(z,β,h(v)) is convex in β and ‖m(z,β,h) − m(z,β,h(v))‖ ≤ b(z)ρ(ε) for all h ∈ ℋ(v,ε), or b) sup_{β∈ℬ,h∈ℋ(v,ε)}‖m(z,β,h) − m(z,β,h(v))‖ ≤ b(z)ρ(ε).

Theorem 5.3: If Assumptions 5.1, 5.2, and 5.6 are satisfied then β̂ →p β₀.

Next, regularity conditions for Proposition 3 are given.

Theorem 5.4: If Assumptions 5.1 – 5.6 are satisfied and for each j either E[M_j(z)|v_j] = 0, or more generally ĥ_j(v_j) and h_j(v_j) are elements of a set ℋ_j such that E[M_j(z)h̃_j(v_j)] = 0 for all h̃_j ∈ ℋ_j, then for m_i = m(zᵢ,β₀,h(vᵢ)) and Ω = E[m_im_i′] + Σ_{ℓ=1}^∞ E[m_im_{i+ℓ}′ + m_{i+ℓ}m_i′], √n(β̂ − β₀) →d N(0,V), V = (D′WD)⁻¹D′WΩWD(D′WD)⁻¹.

This theorem shows that Andrews' (1990a) "independence" hypothesis, that estimation of h(v) does not affect the limiting distribution of β̂, is a consequence of orthogonality of M(z) with the set of possible h(v).

The next asymptotic normality result allows for a nonzero correction term. Let Ω_ℓ = E[u_iu_{i+ℓ}′].

Theorem 5.5: If Assumptions 5.1 – 5.6 are satisfied, ‖α_j(z)‖_{s′} is finite for some s′ > 2μ/(μ−1), (j = 1,...,J), and Asymptotic Differentiability is satisfied, then for Ω = Ω₀ + Σ_{ℓ=1}^∞(Ω_ℓ + Ω_ℓ′), √n(β̂ − β₀) →d N(0,V), V = (D′WD)⁻¹D′WΩWD(D′WD)⁻¹.

A consistent estimator of the asymptotic variance of β̂ can be formed from estimates α̂_{ji} of the α_j(zᵢ). Let m̂_i = m(zᵢ,β̂,ĥ(vᵢ)),

  û_i = m̂_i + Σ_{j=1}^J[α̂_{ji} − Σᵢ₌₁ⁿ α̂_{ji}/n],  Ω̂_ℓ = Σᵢ₌₁^{n−ℓ} û_iû_{i+ℓ}′/n.

If u_i is not autocorrelated then Ω̂ = Ω̂₀ will be an appropriate estimator.
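The no-autocorrelation estimator Ω̂₀ and its weighted autocovariance (Newey–West) extension for serially correlated moments can be sketched as follows (a minimal implementation; the simulated MA(1) series û is purely illustrative):

```python
# Sketch of the weighted autocovariance estimator of Newey-West (1987):
#   Omega_hat = Omega_0 + sum_{l=1}^{L} w(l,L) * (Omega_l + Omega_l'),
#   w(l,L) = 1 - l/(L+1),   Omega_l = (1/n) sum_i u_i u_{i+l}'.
import numpy as np

def newey_west(u, L):
    """u: (n, q) array of estimated moment contributions; L: lag truncation."""
    u = u - u.mean(axis=0)                 # center, as with the u_hat_i
    n = u.shape[0]
    omega = u.T @ u / n                    # Omega_0
    for l in range(1, L + 1):
        gamma = u[:-l].T @ u[l:] / n       # Omega_l
        omega += (1 - l / (L + 1)) * (gamma + gamma.T)
    return omega

rng = np.random.default_rng(2)
e = rng.normal(size=(400, 2))
u = e[1:] + 0.5 * e[:-1]                   # MA(1): autocorrelated moments
omega_hat = newey_west(u, L=4)
```

The Bartlett weights w(ℓ,L) = 1 − ℓ/(L+1) make the resulting Ω̂ positive semi-definite, which is why they are used rather than unweighted truncated sums.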
When the u_i may be autocorrelated, consider a weighted autocovariance estimator like that in Newey and West (1987), with

  Ω̂ = Ω̂₀ + Σ_{ℓ=1}^L w(ℓ,L)[Ω̂_ℓ + Ω̂_ℓ′],  w(ℓ,L) = 1 − ℓ/(L+1).

Here w(ℓ,L) is a weight such that lim_{L→∞} w(ℓ,L) = 1 for each ℓ, and this choice of weight makes Ω̂ positive semi-definite. Given Ω̂, an estimator of the asymptotic covariance matrix of β̂ can be formed in the usual way, as V̂ = (D̂′ŴD̂)⁻¹D̂′ŴΩ̂ŴD̂(D̂′ŴD̂)⁻¹.

Theorem 5.6: Suppose that Assumptions 5.1 and 5.3 are satisfied with s > 4μ/(μ−2); ‖β̂ − β₀‖ = O_p(1/√n); for each j, sup_{v_j}|ĥ_j(v_j) − h_j(v_j)| →p 0, ‖α_j(z)‖_s is finite, and Σᵢ₌₁ⁿ‖α̂_{ji} − α_j(zᵢ)‖²/n →p 0; Ω is nonsingular; and there is ε_n = o(1) such that 1/n = O(ε_n²) and either a) Ω_ℓ = 0 for all ℓ ≥ 1 and L = 0, or b) L → ∞ and L = o_p(ε_n⁻¹). Then V̂ →p V.

As usual for minimum distance estimators, the asymptotic variance depends on W̄, and an optimal (asymptotic variance minimizing) choice is W̄ = Ω⁻¹. The estimator Ŵ can depend on the data, as is important for applications. Given the estimator Ω̂, a feasible version of the optimal minimum distance estimator can be formed by using Ŵ = Ω̂⁻¹ in equation (5.1). The resulting estimator will be an optimal estimator that adjusts for the presence of first-stage, plug-in estimators in the moment functions, similarly to the estimator of Hansen (1985). For this choice of W, (D̂′Ω̂⁻¹D̂)⁻¹ will be a consistent estimator of the asymptotic variance of β̂.

6. Series Estimation of Projection Functionals

This Section develops regularity conditions for linear functionals of power series estimators of projections. Power series are considered because they are computationally convenient and the most complete convergence rate results seem to be available for them. Although in some contexts power series
are thought to be inferior to other approximating functions, because of their "roly-poly" behavior and the global sensitivity of best uniform approximations to singularities, these considerations may not be as important here, where the projection estimator is a nuisance function. Under Assumption 4.1, β̂ depends on the series approximation essentially only through a weighted average, where these problems with power series seem not to be so important. An example is provided by the Monte-Carlo results of Newey (1988a), where a semiparametric power series estimator performs extremely well relative to a kernel estimator.

Here, the domain of the projection will be assumed to take the form in equation (4.1). The conditions to follow will depend on the maximum across ℓ ≤ L of the dimension of x_ℓ, which will be denoted by 𝔞. A power series estimator of the projection can be obtained from a regression of y on a truncated power series with elements restricted to lie in 𝒢, analogous to Stone's (1985) spline estimator. Let λ denote a vector of nonnegative integers as before, and let x^λ = Π_j x_j^{λ_j}. For a sequence (λ(k))_{k=1}^∞ of distinct such vectors, a power series is

(6.1)  p_k(x) = x_{L+1,k},  k = 1,...,dim(x_{L+1});  p_k(x) = x^{λ(k−dim(x_{L+1}))},  k > dim(x_{L+1}).

It will be assumed that {λ(k)}_{k=1}^∞ consists of all multivariate powers of each x_ℓ, ℓ ≤ L, and no more, ordered so that |λ(k)| is monotonic increasing. This assumption imposes the essential restriction that each p_k belongs to 𝒢, and the spanning condition that the sequence includes all such terms.

The estimator of the projection considered here is that obtained from the least squares regression of y on K terms, where K is allowed to depend on the data. For y = (y₁,...,y_n)′, p^K(x) = (p₁(x),...,p_K(x))′, and p = [p^K(x₁),...,p^K(x_n)]′, the estimator of g(x) is

(6.2)  ĝ(x) = p^K(x)′π̂,  π̂ = Σ̂⁻p′y/n,  Σ̂ = p′p/n,

where (·)⁻ denotes a generalized inverse. Under the conditions to follow, p′p will be nonsingular with probability approaching one, so that the choice of generalized inverse does not matter, asymptotically. A data-based K is essential for making operational the nonparametric properties of series estimators, allowing the estimator to adjust to conditions in particular applications. It would also be interesting to know how best to choose K in the current context, but this question is outside the scope of this paper.

For computational purposes it may be useful to replace p^K(x) with a nonsingular linear transformation to polynomials that are orthogonal with respect to some distribution, since these may have less of a multicollinearity problem than power series are known to have. Of course, this replacement will not affect the estimator. Also, note that the elements of each x_ℓ may be smooth, bounded transformations (e.g. the logit distribution function) of "original" variables, which may help to limit the sensitivity of the estimator to outliers. In the Monte Carlo example of Newey (1988a), such a transformation led to reduced sensitivity to the choice of K.
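A minimal sketch of the least squares series estimator in (6.2), using a generalized inverse as in the text (the univariate cubic design is an illustrative assumption; with the true g in the span of the basis, the coefficients are recovered exactly):

```python
# Sketch of the power series projection estimator (6.2):
#   g_hat(x) = p^K(x)' pi_hat,  pi_hat = Sigma_hat^- p'y/n,  Sigma_hat = p'p/n.
import numpy as np

rng = np.random.default_rng(3)
n, K = 300, 4
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + x - 2.0 * x**3                        # true g lies in the span

P = np.column_stack([x**k for k in range(K)])   # p^K(x) = (1, x, ..., x^{K-1})
Sigma = P.T @ P / n
pi_hat = np.linalg.pinv(Sigma) @ (P.T @ y / n)  # generalized inverse, as in (6.2)

g_hat = lambda x0: np.array([x0**k for k in range(K)]) @ pi_hat
```

In practice one would use an orthogonalized basis for numerical stability, as suggested in the text; the fitted function is unchanged by such a nonsingular transformation.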
on 6(x) Alternatively, K fixed. which will p (x), can be viewed a(z) as the estimator of the correction term obtained by treating were a parametric regression, with so that J6(x)p (x)dF(x), g(x) as if it This procedure results in a consistent estimator of the correction term because it accounts properly for its variance, while bias from the series approximation will be small because of smoothness restrictions on and g(x) 5(x) imposed below. The following conditions are needed to apply the results of Newey (1991). Let X denote the vector consisting of the union of all distinct variables appearing in x„, (£ £ L), and let first condition is sufficient for § s, t = L ~ '(Eo^ig/^^/^ to be closed. 33 ~ • ^^^A^f^ 2 ^ *~ "^- ^^® Assumption 6.0: of x^ then i) x = For each for some x„, such that for each x^, (£ I' = .... 1, ii) There exists a constant £ L; with the partitioning i, X = (x^.x.')', cJa(q)d[F(x^)-F(xJ)] £ E[a(q)] ^ c"Va(q)d[F(x^) -FCxJ) 0, = (i.e. of t, is not present) or X is a subvector x if L), ] c > for all a(q) ^ Either iii) ; 1 § is bounded and for the closure x E[{x^^^-n(x^^j|f)}{x^^^-n(x^^^|f)>'] i) is nonsingular. The next condition requires that the support of be a box and places a X lower bound on its distribution. Assumption 6.1: There are finite such that the support of X X is X. n II - > X.,, v . ^ 0, 1, dim(X)) ..., and the distribution of [X. ,X.. ] ju' jb ^ j=l = (j has absolutely continuous component with density bounded below by on the support. Cn. ,[(X. -X.){X.-X., )] "j=l JU J Jb J The nonsingularity condition is a normalization, unless interest, where it is an identification assumption for Assumption 6.2: |c| is finite for , s' i 2 and E[e is a parameter of 7) t\ . 2 |x] Let e = y-g(x). is bounded. The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, Assumption 6.3: 0(i) = 0(t~^), that Either (t = 1, lEf^i^^i+il'^i.^i+^J and simplifies the regularity conditions. 1985), a) 2, I z. 
is uniform mixing with mixing coefficients ...), for /i - c(t) and E^"jC(t) < > 2 b) or; there exists c(t) such co. This assumption is restrictive, but covers many cases of interest, including independent observations and dynamic nonparametric regression with g(x.) = E[y. |x.,y. .,x. ,,y. ^,...]. 1-^1-1 1-1 '1-2 1 The next condition restricts the amount of variation allowed in the choice of number of terms 34 K. Assumption 6.4: K = K(n) where such that The bounds such that with probability approaching one, K = K K = K(n) and K(n) ^ n and K are sequences of constants, for all and there is e > large enough. n control the bias and variance, K K ^ K ^ K respectively, of g. The next set of conditions impose smoothness assumptions used to control the bias of the estimator. Assumption 6.5: is continuously differentiable to order (x.) g. d, (£ ^ L). Two results will be given, because the conditions are weaker and simpler in the special case where any nonnegative integer meriting Its separate treatment. h(v) = g(x), For let d C^(K) =K-^"^*2d_ (6.5) The covariance matrix of Section 5, can be estimated by the procedure discussed in ^ using the estimators of The asymptotic given above. a.(z) distribution results will include consistency of this variance estimator. Theorem 6.1: Suppose that Assumpt ions 5.1, 5.3, 5.6, 6.0-6.5 are satisfied, and for each J, and ^\l/(\1l-2) E[\\M .(z)-5 .(x)\\ d differentiable to order zero: either m(z,^^,h) n Cq(K)(K 1, V K n = n.; or b) u o 2 on is linear in /vn L + -^ K 00, Hi) x; h )+n and s' .(z)\\ is bounded; \x] K^^^^CK)^-"^ ^"^ K^C^(K)^/n. ElWH s > ^ii/Cii-Z), i) ] is finite for some each of the following converge to K/Vn or K5 L = o p + -^V. 35 VnK = o(l) (s'h. ri is continuously ii) S .(x) K^^^^^(K)k'^^'' , > s' , v^'^^^^S^^'^; = o(n O2); iv) and either a) Q. 
= Then Vn(^-f3^) u -^ = v) e N(0,V) 0, £ and > The upper bound on the rate of growth for the number of terms in the series expansion is n 1/4 when the density of which is less than the n 1/2 is bounded away from zero X rate derived in Newey (1990b). requires existence of derivatives of both and g(x) 6(x) (v = 0), This result also up to order more than the largest dimension of an additive component, as in Newey (1990b). h(v) = To obtain asymptotic normality in the more general case, where A(v,g) is some other linear function of continuity condition on support of as a function of A(v,g) and denote V, Assumption 6.6: There is a constant IIA(v,g)llQ g. Let V denote the supremum Sobolev norms by = =^Pui^d,vel/l^''h^^^l' "^^^^"d it is useful to impose a g, "S^^^"d =^"P|A|sd,xl^^^(X)l. and an integer C A such that Cllg(x)ll^. :£ This Assumption will imply that the bias from approximating the function h (v) by a linear combination of approximating of p (x). g(x) is bounded by the bias of A(v) and its derivatives to order A by a linear combination Unfortunately, for multivariate functions, a literature search has not yet revealed bias bounds for approximating a functions and derivatives by power series, except under part b) of the following condition. Assumption 6.7: Either a) a = 1 or A = is continuously differentiable to all orders, that sup^^^|Z)\^(x^)| £ c''^' for all When K K for any is random, b) for each £, g. and there is a constant (x.) C such X. Condition b) implies an approximation rate for is faster than or; g(x) and its derivatives that a. it is useful to also have an approximation rate for 36 In order that the results would apply to many cases, 6(x). with more general approximation results that one anticipates will appear eventually, the rest of the results of this section will be stated in terms of on e^(K) = min |5(x)-p*^(x)'7i|„. ^ X Let = X(n) = tK. K]. Theorem 6.2: Suppose that Assumptions 4.1. 5.1. 5.3. 5.6. 
6.0-6.7 are satisfied and let = a = ^ h(v))| s , and mixing or b) € a - (d/n.)-L b) J^^ft ^ are , }^nK'°'oejK) -^ 0; and ^ CK^ K — o and e n = n^^¥p^^C,^(K)^/V?i + flj^^^CK)^]^^^ -^ 0; or b) L -^ <x>. and L = o (e~^). P (J n s' m(z.h) a) + is uniform is linear in [l^eJK)^]^^^ -^ A Then either a) n^ = Vn(^-(i^) u -^ and > ^[i/(^i-Z). n^^l^^^C^jDll^^^/^n iv) a. j. ii) Either a) z. n''^[K{:jK)+ K^^^C-CTc/j/v^ = K^^^i:jK)K~°- + yrMcAK)^lK/n+YC^"-l = o(l) A fi_; finite for Hi) Either 0; > under Assumption 6.7 a) and Also suppose that for each \6 .(x) [y .-g .(x)\ J J J i) Z,/KC.CK;^K~'^" -^ and and under Assumption 6.7 b). +00 |m(2,p = d/n, a. 0. + k""' I ^ N(O.V) 1 h or 0. + 7 h . and = V -^ V. The smallest upper bound on the number of terms allowed by this theorem is n , for V = and s = oo, a rate also derived in Newey (1991). result also requires existence of derivatives of 3/2 g(x) This of order more than the maximum dimension of the additive components, but imposes weak smoothness restrictions on 5(x). The requirement that the bias go to zero faster than for V^-consistency, product of i/n, estimating g(x) is that v'i^ °''oe.(K,) o 1/v^, converge to zero. the approximation rate for g(x) (i.e. as needed This term is the the bias from by truncated series), and the approximation rate for 37 5(x). Consequently, "iindersmoothing" is not required for asymptotic norma'' ty ' th "plug-in" series estimators, as it is for some kernel estimators, because the bias from estimating does not have to go to zero faster than h(v) 1/v^. This result is a consequence of the usual orthogonality property of mean-square projections. Let on y) p (x) of g(x) (or gv-(^) and the corresponding value of h^(v). and ^jf^^) be the population regression respectively, 6(x), Using Assumption 4.1, and hxr(v) = A(v,g ) the population first order bias term, analogous to that considered by Stoker (1990) for kernels, is E[M(z){hj,(v)-h(v)}] = E[5(x){gj,(x)-g(x)}] (6.6) = -E[{5j,(x)-5(x)}{g^(x)-g(x)}]. 
so that the bias term for h is equal to product of biases terms for 5 and g- Under Assumption 6.1 with v = 0, K nonrandom, and Assumption 6.7 a), these results will also apply to uniform knot spline estimators of the definition of is changed to Cj(K) Cj(K) = 5+d K' . g(x), if More generally, the results apply to any series estimator satisfying Assumptions 3.1-3.8 in Newey (1991), 7. although these do not allow for Fourier series estimators. Examples This Section gives primitive regularity conditions for the validity of the examples of Section 4, and one or two additional examples. To save space, this Section has a special format, where each subsection gives an estimator an estimator of its asymptotic covariance 38 and a result on v'n-consistency and asyniptOj. variance, normality of the estimator and consistency of the estimated c with discussion of the results reserved until the end. To give more specific results on the rate at which the number of terms can grow it will be useful to impose the following Assumption. T K(n) = n Assumption 7.1: 7. 1 — and K(n) = n r for some T > 7 > 0. Semiparametric Random Effects for Binary Panel Data CP;.o^o)' 1 2 = (!'', A. ^1=1 1 An 1 ^(h,(x.)). ^I-'^.A.* ^1=1 1 h.(x) = Ely, |x], t I. = (t 2), 1, = 1(0 1 < = A. t I 1 h,(x.) 1 < and 1 2 V = ^(I.;^A.Ap-^I.2^G.GMI.;^A.An-^ = X. 1; 1 (x. . 1 il 1 h^(x.) < 1 = S-. 1 1 1 < -x. $"-^ ' i2 1), (h„(x. 2 1 Suppose that are satisfied for to order d on i) x = (x ,x X; iv) iii) ). E[(x -x Assumption 7.1 is satisfied with Then 7. 2 V?l[(^' ,S-^) - (p' ,<r^)] -^ , p(x) )(x -x N(0, V) , p(x) )' ] + <I " and =^^g^^r^2i^^^i J^P^(x^,X2.)dx^/n}'rV(x.)[y. 39 " ^t^'^l^^^' is nonsingular; r > max{2r^/d, V -^ V. ^ = ri=i>r^g^^r^2i^^^/"' ^ AM^^.^^^'} - ii) Assumptions 6.0 and 6.1 Nonparametric Consumer Surplus ^ = ^i=i^/^' )' has continuous derivatives of up p(x) T < 1/4. ) . = A. ^c"! (h^ (x. )) G. are i.i.d.; z. ) 1 + It=l^"^^^~^°'tf^j = l'^^*'^^^t^''j^^^"^P^^''j^^"'^'P^^''i^^yti Theorem 7.1: (x.' 
Theorem 7.2: Suppose that i) the zᵢ are i.i.d.; ii) Assumptions 6.0–6.2 and 6.5 are satisfied for ν = 0 and s > 4; iii) the distribution of x = (x₁,x₂) is absolutely continuous with respect to the product of the uniform distribution on X₁ and the distribution of x₂, with bounded density f(x₁|x₂); f(x₁|x₂) and E[x₁|x₂] are continuously differentiable to order d > 3d̲/2 on the support of x, and either a) ĝ = ĝ(x₁,x₂), or b) δ(x) = Π(f(x)|G̅) on the support of x; iv) Assumption 7.1 is satisfied with τ < (s−2)/5s, γ > d̲/[2(d+d̲)], and γ > τd̲/d. Then √n(μ̂ − μ₀) →d N(0,V) and V̂ →p V.

7.3 Nonparametric Prediction

μ̂ = Σᵢ[ĝ(vᵢ) − ĝ(xᵢ)]/n,  Ĝᵢ = ĝ(vᵢ) − ĝ(xᵢ) − μ̂ + {Σ_j[p^K(v_j) − p^K(x_j)]/n}' Σ̂⁻¹ p^K(xᵢ)[yᵢ − ĝ(xᵢ)],  V̂ = Σᵢ Ĝᵢ²/n.

Theorem 7.3: Suppose that i) the zᵢ are i.i.d.; ii) Assumptions 6.0–6.2 and 6.5 are satisfied for ν = 0 and s > 4; iii) the distribution of v = (v₁,x₂) is absolutely continuous with respect to the distribution of x, with bounded density λ(x); λ(x) and E[x₁|x₂] are continuously differentiable to order d > 3d̲/2 on the support of x, and either a) ĝ = ĝ(x₁,x₂), or b) δ(x) = Π(λ(x)|G̅) on the support of x; iv) Assumption 7.1 is satisfied with τ < (s−2)/5s, γ > d̲/[2(d+d̲)], and γ > τd̲/d. Then √n(μ̂ − μ₀) →d N(0,V) and V̂ →p V.

7.4 Average Derivatives

V̂ = Σᵢ Ĝᵢ Ĝᵢ'/n.
Ĝᵢ = m(zᵢ,ĥ(xᵢ)) + [Σ_j {∂m(z_j,ĥ(x_j))/∂h}{∂p^K(x_j)/∂x}'/n] Σ̂⁻¹ p^K(xᵢ)[yᵢ − ĝ(xᵢ)].

Theorem 7.4: Suppose that i) the zᵢ are i.i.d. or mixing with g(xᵢ) = E[yᵢ|xᵢ, yᵢ₋₁, xᵢ₋₁, ...], and E[|a(z)|^s'] is finite; ii) K = K(n) is not random; iii) Assumptions 6.0–6.2, 6.5, and 6.1 b) are satisfied, and Assumption 5.1 is satisfied; iv) the distribution of x is absolutely continuous with respect to the product of Lebesgue measure on all elements of x, with density f(x) that is continuously differentiable on the interior of a convex support and zero on the boundary of the support, and E[‖∂f(x)/∂x‖²] is finite; v) K = O(n^r) for either a) m(z,h) linear in h and r < (s−2)/[s(7+4ν)], or b) r < (s−2)/[s(14+4ν)]. Then √n(μ̂ − μ₀) →d N(0,V) and V̂ →p V.

Discussion

The conditions in Section 7.4 for multidimensional average derivatives are quite restrictive, but could be relaxed if better results were available on the approximation of derivatives and of unbounded functions by power series. In particular, the requirement of nonrandom K results from not having an approximation rate for derivatives of δ(x), which is unbounded for average derivatives. Also, one can relax these conditions substantially for weighted average derivatives, where μ = E[w(x)∂g(x)/∂x] and the weight function w(x) is such that w(x)∂f(x)/∂x + f(x)∂w(x)/∂x is continuously differentiable on the support of x, including allowing for random K.

The estimators for the semiparametric random effects model and for nonparametric consumer surplus seem to be new. The result for nonparametric prediction is the first result on √n-consistency of such an estimator, and includes the unconditional variance in the estimation of the asymptotic variance. Series estimators for average derivatives were previously suggested by Andrews (1991), although the results here include conditions for √n-consistency and apply to time series.
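The average derivative estimator of Section 7.4 can be sketched with a polynomial series: differentiate the fitted series analytically and average the derivative over observations. The normal design and the sine regression function below are illustrative assumptions (a normal design does not satisfy the bounded-support condition of Theorem 7.4, so this is only a numerical illustration of the plug-in idea, not a check of the theorem's conditions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, deg = 20_000, 6
x = rng.normal(0.0, 1.0, n)
y = np.sin(x) + rng.normal(0.0, 0.3, n)

c = np.polyfit(x, y, deg)              # series (polynomial) fit of g
dc = np.polyder(c)                     # derivative of the fitted function
mu_hat = np.mean(np.polyval(dc, x))    # average derivative estimate
print(mu_hat, np.exp(-0.5))            # E[cos(x)] = e^{-1/2} for x ~ N(0,1)
```

Because the linear term x is in the span of the basis, the population projection preserves E[x·g(x)], which by Stein's identity equals E[g'(x)] under a normal design; this is the series analogue of the bias-cancellation discussed after Theorem 6.2.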
Appendix A: Proofs of Theorems

Throughout this appendix C will denote a generic positive constant that can be different in different uses.

Proof of Theorem 2.1: Pathwise differentiability of μ(F) and the Lindeberg-Levy central limit theorem imply that (√n(β̂ − β₀)', Σᵢ Sθ(zᵢ)'/√n)' converges in distribution to N(0, E[(ψ(z)', Sθ(z)')'(ψ(z)', Sθ(z)')]). Also, by pathwise differentiability it follows that for any vector b and any path, b'(∂μ(F_θ)/∂θ) = b'E[d(z)Sθ(z)], while asymptotic linearity gives b'(∂μ(F_θ)/∂θ) = b'E[ψ(z)Sθ(z)]. Since this equality must hold for any path, E[(ψ(z) − d(z))s(z)] = 0 for all mean-zero scores s(z), so that choosing s(z) to be any element of the tangent set proportional to ψ(z) − d(z), it follows that ψ(z) = d(z). The conclusion then follows immediately from Theorem 2.1 of Van der Vaart (1991).

Proof of Lemma 5.1: Since the result holds element by element, it suffices to prove it for scalar m. Let m(z,h) = m(z,β₀,h), hᵢ = h₀(vᵢ), ĥᵢ = ĥ(vᵢ), and mᵢ = m(zᵢ,hᵢ). By Assumption 5.2, a standard result implies that maxᵢ≤n ‖ĥᵢ − hᵢ‖ = o_p(n^(−1/4)) w.p.a.1. Then by an expansion,

(A.1)  |Σᵢ[m(zᵢ,ĥᵢ) − mᵢ − M(zᵢ)(ĥᵢ − hᵢ)]|/√n ≤ Σᵢ ‖∂²m(zᵢ,h̄ᵢ)/∂h∂h'‖ ‖ĥᵢ − hᵢ‖²/√n ≤ maxᵢ≤n{b(zᵢ)} Σᵢ ‖ĥᵢ − hᵢ‖²/√n = o_p(1).
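The expansion (A.1) rests on the linearization error being second order in the uniform error of ĥ. A quick numerical check of that order — the choice m(z,h) = e^h (so M(z) = e^h) and the normal draws are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
h = rng.normal(0.0, 1.0, n)            # true h0(v_i)
d = rng.normal(0.0, 1.0, n)            # direction of the estimation error

def remainder(c):
    """sqrt(n)-normalized linearization error for m(z,h) = e^h."""
    hhat = h + c * d                    # perturbed h-hat with error of size c
    lin = np.exp(h) * (hhat - h)        # M(z_i)(h-hat_i - h_i)
    return np.sum(np.exp(hhat) - np.exp(h) - lin) / np.sqrt(n)

r1, r2 = remainder(0.1), remainder(0.05)
print(r1 / r2)                          # close to 4: halving the error size
                                        # quarters the remainder (second order)
```

This quadratic shrinkage is exactly why the o_p(n^(−1/4)) rate for ĥ suffices: squaring it gives the o_p(n^(−1/2)) needed for the √n-normalized remainder to vanish.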
Proof of Lemma 5.2: Suppress the W subscript. In the notation of Andrews (1990b), let W_t = zᵢ, m(w,τ) = τ(v), τ̂ = 1(v∈V)ĥ(v), τ₀ = 1(v∈V)h₀(v), q = d, a = dim(v), and g(w) = M(z). Note that Andrews' Assumption F ii) is satisfied by hypothesis. Also by hypothesis, τ̂ has derivatives of up to order q w.p.a.1, and for |λ| ≤ q, sup_V |D^λ τ̂(v) − D^λ τ₀(v)| = o_p(1). Let T = {τ(v): sup_V |D^λ τ(v)| ≤ C for all |λ| ≤ q}. Then sup_{τ∈T} Σ_{|λ|≤q} [∫|D^λ τ(v)|² dv]^(1/2) < ∞, giving Andrews' Assumption F iii). Also, m(w,τ) = τ(v) is zero (and hence constant) outside V by construction, giving Andrews' Assumption F iv). Also, s' > 2ν/(ν−2) implies s' > 2s'/(s'−2), so that Andrews' Assumptions v) and vi) are satisfied. Finally, τ̂ ∈ T w.p.a.1, so that the conclusion follows by Theorem II.7 of Andrews (1990b).

Proof of Theorem 5.3: Let Q̄(β) = m̄(β)'W m̄(β) for m̄(β) = E[m(z,β,h₀(v))], and Q̂(β) = m̂(β)'Ŵ m̂(β) for m̂(β) = Σᵢ m(zᵢ,β,ĥ(vᵢ))/n. Under Assumption 5.6 a), Assumption 5.2 implies that for each β, ‖m̂(β) − m̃(β)‖ →p 0, where m̃(β) = Σᵢ m(zᵢ,β,h₀(vᵢ))/n, while m̃(β) →p m̄(β) by zᵢ ergodic, so that sup_{β∈B} ‖m̂(β) − m̄(β)‖ →p 0 follows by Andrews (1987), implying sup_{β∈B} |Q̂(β) − Q̄(β)| →p 0. Under Assumption 5.6 b), Q̂(β) is convex and Q̂(β) →p Q̄(β) for each β, with Q̄(β) uniquely minimized at β₀, so the conclusion follows as in Anderson and Gill (1982). The conclusion now follows by the standard Wald argument for extremum estimators.

Proof of Theorem 5.4: By Lemmas 5.1 and 5.2, Asymptotic Linearity and Stochastic Equicontinuity are satisfied, and by orthogonality, √n ∫ M(z)[h̄(v) − h₀(v)]dF(z) = 0. Then by the α-mixing central limit theorem of White and Domowitz (1984), √n m̂(β₀) = Σᵢ uᵢ/√n + o_p(1) →d N(0,Ω). The remainder of the proof then follows from a standard minimum distance argument, such as that in Newey (1988b).

Proof of Theorem 5.5: It follows by Lemma 5.1 that Asymptotic Linearity is satisfied, so that by Asymptotic Differentiability, the triangle inequality, and the α-mixing central limit theorem of White and Domowitz (1984), √n m̂(β₀) = Σᵢ uᵢ/√n + o_p(1) →d N(0,Ω). The remainder of the proof then follows by a standard minimum distance argument, such as that in Newey (1988b).
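The series-estimator proofs that follow lean repeatedly on the idempotency and orthogonality of the least squares projection matrix Q = p(p'p)⁻¹p'. These identities are easy to verify numerically; the basis, sample size, and regression function below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 500, 6
x = rng.uniform(-1, 1, n)
P = np.vander(x, K)                          # rows p^K(x_i)
Q = P @ np.linalg.solve(P.T @ P, P.T)        # Q = P(P'P)^{-1}P'

assert np.allclose(Q @ Q, Q)                 # idempotent: QQ = Q
assert np.allclose(Q @ P, P)                 # Q reproduces the basis span
y = np.exp(x) + rng.normal(0, 0.1, n)
r = y - Q @ y                                # residual (I - Q)y
assert np.allclose(P.T @ r, 0, atol=1e-8)    # orthogonal to the series span
print("projection identities hold")
```

These are exactly the facts used to bound terms like (y−g)'Q(y−g)/n and (δ−δ_K)'(I−Q)(g−g_K) in the proofs of Theorems 6.1 and 6.2.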
Proof of Theorem 5.6: By a mean value expansion,

(A.2)  Σᵢ ‖ûᵢ − uᵢ‖²/n ≤ C(‖β̂ − β₀‖² + sup_V ‖ĥ(v) − h₀(v)‖²) Σᵢ [b₁₀(zᵢ)² + b₀₁(zᵢ)²]/n = O_p(ε_n²).

Let Ω̂_ℓ = Σᵢ ûᵢ ûᵢ₊ℓ'/n and Ω̃_ℓ = Σᵢ uᵢ uᵢ₊ℓ'/n. Note that for all ℓ ≥ 0, ‖ûᵢûᵢ₊ℓ' − uᵢuᵢ₊ℓ'‖ ≤ ‖ûᵢ−uᵢ‖‖ûᵢ₊ℓ−uᵢ₊ℓ‖ + ‖uᵢ‖‖ûᵢ₊ℓ−uᵢ₊ℓ‖ + ‖ûᵢ−uᵢ‖‖uᵢ₊ℓ‖, so that by the Cauchy-Schwartz and Markov inequalities,

(A.3)  ‖Ω̂_ℓ − Ω̃_ℓ‖ ≤ Σᵢ ‖ûᵢ−uᵢ‖²/n + 2{Σᵢ ‖uᵢ‖²/n}^(1/2){Σᵢ ‖ûᵢ−uᵢ‖²/n}^(1/2) = O_p(ε_n).

In case a), Ω̂ = Ω̂₀, so that ‖Ω̂ − Ω‖ ≤ ‖Ω̂₀ − Ω̃₀‖ + ‖Ω̃₀ − Ω‖, where ‖Ω̃₀ − Ω‖ →p 0 follows by the law of large numbers, while ‖Ω̂₀ − Ω̃₀‖ = O_p(ε_n) = o_p(1), giving the conclusion. In case b), there is a sequence of numbers L'_n = δ_n/ε_n, with δ_n → 0, such that Prob(L ≤ L'_n) → 1, where L'_n can be chosen as an integer by the argument from Newey (1990b). Then by boundedness of w(ℓ,L) and eq. (A.3), with probability approaching one,

‖Ω̂ − Ω̃‖ ≤ ‖Ω̂₀ − Ω̃₀‖ + Σ_{ℓ=1}^{L'} ‖Ω̂_ℓ − Ω̃_ℓ‖ ≤ (1 + CL')O_p(ε_n) = o_p(1),

where Ω̃ = Ω̃₀ + Σ_ℓ w(ℓ,L)[Ω̃_ℓ + Ω̃_ℓ']. Also, by Davydov's inequality, arguing as in Kool (1988), ‖Ω̃ − Ω̄‖ = O_p(L'/√n) = o_p(1) with probability approaching one, where Ω̄ = Ω₀ + Σ_ℓ w(ℓ,L)[Ω_ℓ + Ω_ℓ']. Finally, applying the dominated convergence theorem as in Newey and West (1987), it follows that ‖Ω̄ − Ω‖ = o(1).
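The truncated weighted-autocovariance estimator in this proof is of the Newey and West (1987) form. The following minimal numerical sketch checks that it recovers the long-run variance of a dependent score sequence; the AR(1) scores, the Bartlett weights w(ℓ,L) = 1 − ℓ/(L+1), and the rate L = n^(1/3) are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 200_000, 0.5
e = rng.normal(0.0, 1.0, n)
u = np.empty(n)
u[0] = e[0]
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]        # AR(1) scores u_t

L = int(n ** (1 / 3))                   # truncation lag growing with n
ubar = u - u.mean()
omega = np.mean(ubar * ubar)            # Omega_0
for l in range(1, L + 1):
    w = 1.0 - l / (L + 1.0)             # Bartlett weight w(l, L)
    gamma = np.mean(ubar[l:] * ubar[:-l])
    omega += 2.0 * w * gamma            # w(l,L)[Omega_l + Omega_l']
print(omega, 1.0 / (1.0 - rho) ** 2)    # long-run variance = 1/(1-rho)^2 = 4
```

The Bartlett weights keep the estimate positive semidefinite, and letting L grow slowly with n (here n^(1/3)) drives both the truncation bias and the sampling noise to zero, mirroring the L = o(ε_n⁻¹) condition of the theorem.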
Proof of Theorem 6.1: To show consistency, note first that iii) implies K^(1/2)ζ₀(K)/√n = o(1) and K^(1/2)ζ₀(K)K^(−α) = o(1), so that Assumption 5.2 is satisfied by Theorem 6.1 of Newey (1991). Consistency of β̂ then follows by Theorem 5.3. Let objects without subscripts denote vectors of observations. Asymptotic Linearity follows by Lemma 5.1 and iv), since by Theorem 6.1 of Newey (1991), Σᵢ |ĝ(xᵢ) − g(xᵢ)|²/n = O_p(K/n + K^(−2α)) = o_p(n^(−1/2)), so that Assumption 5.4 is satisfied.

Next, Asymptotic Differentiability is shown, with M(z) as derived in Section 4. For notational convenience the j subscript will be dropped and M(z) treated as a scalar (for vector M(z) the result follows by applying the following argument to each of its elements). Let Q = p(p'p)⁻p', M = (M(z₁),...,M(zₙ))', and δ̂ = QM. Then by Q idempotent,

(A.4)  M'(ĝ − g) − δ'(y − g) = δ̂'ĝ − M'g − δ'y + δ'g = (δ̂ − δ)'(ĝ − g) + δ'(ĝ − y) + (δ̂ − M)'g = R₁ + R₂ + R₃.

By Theorem 6.1 of Newey (1991) and iii), |R₁|/√n ≤ √n ‖δ̂ − δ‖ ‖ĝ − g‖ = O_p(√n[K^(1/2)/√n + K^(−α)]²) = o_p(1). By Lemma 8.1 of Newey (1991), there are π_K and π̃_K such that g_K(x) = p^K(x)'π_K and δ_K(x) = p^K(x)'π̃_K satisfy ‖g − g_K‖_∞ ≤ CK^(−α) and ‖δ − δ_K‖_∞ ≤ CK^(−α). Then by Q and I−Q idempotent, with probability approaching one,

(A.5)  |R₂|/√n ≤ |(δ − δ_K)'(y − g)|/√n + |(δ − δ_K)'Q(y − g)|/√n + |(δ − δ_K)'(I − Q)(g − g_K)|/√n = R₂₁ + R₂₂ + R₂₃.

Here R₂₃ ≤ √n ‖δ − δ_K‖_∞ ‖g − g_K‖_∞ = O(√n K^(−2α)) = o(1). By Lemma 9.8 of Newey (1991), (y − g)'Q(y − g)/n = O_p(K/n), so that R₂₂ ≤ √n ‖δ − δ_K‖_∞ [(y − g)'Q(y − g)/n]^(1/2) = O_p(K^(1/2) K^(−α)) = o_p(1). By the strong mixing hypotheses, Davydov's inequality, and Assumption 6.4, for μ = 2ν/(ν−1), E[R₂₁²] = E[|(δ − δ_K)'(y − g)|²]/n = O(E[|(δ(x) − δ_K(x))(y − g(x))|²]) = o(1), so that R₂₁ →p 0 and R₂/√n = o_p(1).
Finally, since |a(z)| is bounded for s > 2μ/(μ−1) and ε₀(K) is converging to zero, R₃/√n has the same form as R₂/√n with δ and M interchanged, so that R₃/√n = o_p(1) also follows. Thus all of the hypotheses of Theorem 5.5 are satisfied and the first conclusion follows from its conclusion.

To prove the second conclusion, note first that by Theorem 1 of Newey (1991),

(A.6)  sup_X |ĝ(x) − g(x)| = O_p(K^(1/2) ζ₀(K)[K^(1/2)/√n + K^(−α)]) = O_p(ε_n),

so that by Theorem 5.6 it only remains to be shown that Σᵢ ‖âᵢ − a(zᵢ)‖²/n = O_p(ε_n²). It suffices to show this result for each element of â, so âᵢ can be assumed to be scalar without loss of generality. Let εᵢ = yᵢ − g(xᵢ), ε̂ᵢ = yᵢ − ĝ(xᵢ), M̂ᵢ = ∂m(zᵢ,β̂,ĥ(xᵢ))/∂h, and M̂ = (M̂₁,...,M̂ₙ)'. Then

(A.7)  Σᵢ ‖âᵢ − a(zᵢ)‖²/n ≤ C Σᵢ ‖M̂'p(p'p)⁻p^K(xᵢ)(ε̂ᵢ − εᵢ)‖²/n + C Σᵢ ‖[M̂'p(p'p)⁻p^K(xᵢ) − δ(xᵢ)]εᵢ‖²/n = R₄ + R₅.
By Q idempotent and Assumption 6.5, M̂'QM̂/n ≤ M̂'M̂/n = O_p(1). Therefore, by Theorem 6.1 of Newey (1991) and (A.6),

(A.8)  R₄ ≤ sup_X |ĝ(x) − g(x)|² · M̂'QM̂/n = O_p(ε_n²).

Also, by a Taylor expansion, ‖M̂ − M‖²/n = O_p(ε_n²), so that by Q idempotent, (M̂ − M)'Q(M̂ − M)/n ≤ ‖M̂ − M‖²/n = O_p(ε_n²), and since M̂'p(p'p)⁻p^K(x) is the series estimator δ̂(x) from the regression of M̂ᵢ on p^K(xᵢ), Theorem 6.1 of Newey (1991) gives

(A.9)  R₅ ≤ (maxᵢ≤n ε̂ᵢ²) Σᵢ ‖δ̂(xᵢ) − δ(xᵢ)‖²/n = O_p(ε_n²).

Thus Σᵢ ‖âᵢ − a(zᵢ)‖²/n = O_p(ε_n²), and the second conclusion follows by Theorem 5.6.

Proof of Theorem 6.2: First, consider the case where Assumption 6.7 a) is satisfied. By Assumption 6.6, √n ‖ĥ(v) − h(v)‖² ≤ C√n ‖ĝ(x) − g(x)‖² = O_p(√n ζ₀(K)²[K/n + K^(−2α)]) = o_p(1), so that Asymptotic Linearity follows by the same Taylor expansion argument as used in the proof of Lemma 5.1. Next, Asymptotic Differentiability is shown. To show it, let Σ_K = ∫p^K(x)p^K(x)'dF(x), Ψ_K = ∫δ(x)p^K(x)dF(x), δ_K(x) = p^K(x)'Σ_K⁻¹Ψ_K, g_K(x) = p^K(x)'π_K with π_K = Σ_K⁻¹E[p^K(x)g(x)], h_K(v) = A(v,g_K) = A(v)'π_K, and Ψ̂ = Σᵢ M(zᵢ)A(vᵢ)'/n. Note that ĥ(v) is invariant to nonsingular linear transformations of the series, so that without loss of generality p₁(x), p₂(x), ... can be assumed to be the functions in the conclusion of Lemma 8.4 of Newey (1991), for which the smallest eigenvalue of Σ_K is bounded below and there is a constant C such that

(A.10)  max_{k≤K} ‖A(v,p_k)‖₀ ≤ Cζ_A(K).

It then follows by |M(z)| finite for s > 2μ/(μ−1) and Lemma 9.6 of Newey (1991) that

(A.11)  ‖Ψ̂ − Ψ_K‖ = O_p(K^(1/2)ζ_A(K)/√n) = o_p(1),  Δ̂ ≡ ‖Σ̂ − Σ_K‖ = O_p(K^(1/2)ζ₀(K)/√n) = o_p(1).

Also, as in the proof of Lemma 9.8 of Newey (1991),

(A.12)  ‖Σ_K^(−1/2) p'(y − g)/√n‖ = O_p(K^(1/2)).

Next, by Lemma 8.1 of Newey (1991) there exists π̃_K such that g̃_K(x) = p^K(x)'π̃_K satisfies ‖g(x) − g̃_K(x)‖_∞ ≤ CK^(−α). Note that ‖π_K − π̃_K‖ ≤ C[(π_K − π̃_K)'Σ_K(π_K − π̃_K)]^(1/2) = C|g_K(x) − g̃_K(x)|₂ ≤ C[|g(x) − g_K(x)|₂ + |g(x) − g̃_K(x)|₂] ≤ C|g(x) − g̃_K(x)|₂ ≤ CK^(−α), where the next-to-last inequality uses the fact that g_K is the mean-square projection. Therefore,

(A.13)  ‖g(x) − g_K(x)‖_∞ ≤ ‖g(x) − g̃_K(x)‖_∞ + ‖g̃_K(x) − g_K(x)‖_∞ ≤ C(K^(−α) + ‖p^K(x)‖_∞ ‖π_K − π̃_K‖) ≤ CK^(1/2)ζ₀(K)K^(−α).

By the definition of the least squares coefficients, E[p^K(x){g(x) − g_K(x)}] = 0, so that under uniform mixing,

(A.14)  E[‖Σ_K^(−1/2) p'(g − g_K)/√n‖²] ≤ CE[‖p^K(x)‖²(g(x) − g_K(x))²] ≤ CKζ₀(K)²K^(−2α),

which converges to zero by i). Without uniform mixing, it follows by strong mixing and boundedness of p^K(x), g(x), and g_K(x) that ‖Σ_K^(−1/2) p'(g − g_K)/√n‖² = o_p(1) by ii) b).
Then, since the smallest eigenvalue of Σ_K is bounded away from zero with probability approaching one, ‖Ψ̂'Σ̂⁻¹‖ ≤ C‖Ψ̂‖ = O_p(1) and ‖Ψ̂'(Σ̂⁻¹ − Σ_K⁻¹)‖ ≤ ‖Ψ̂'Σ̂⁻¹‖ ‖Σ̂ − Σ_K‖ ‖Σ_K⁻¹‖ = o_p(1), and it follows by the Markov inequality that ‖Σ_K^(−1/2) p'(y − g)/√n‖ is bounded in probability. Now, it follows by a little arithmetic that

(A.16)  Σᵢ M(zᵢ)[ĥ(vᵢ) − h(vᵢ)]/n − ∫M(z)[h_K(v) − h(v)]dF(z) − δ'(y − g)/n
        = (Ψ̂ − Ψ_K)'Σ̂⁻¹p'(y − g)/n + Ψ_K'(Σ̂⁻¹ − Σ_K⁻¹)p'(y − g)/n + (Ψ̂ − Ψ_K)'Σ̂⁻¹p'(g − g_K)/n
        + Ψ_K'(Σ̂⁻¹ − Σ_K⁻¹)p'(g − g_K)/n + Ψ_K'Σ_K⁻¹p'(g − g_K)/n + (δ_K − δ)'(y − g)/n
        + Σᵢ M(zᵢ)[h_K(vᵢ) − h(vᵢ)]/n − ∫M(z)[h_K(v) − h(v)]dF(z),

where the terms involving Σ̂⁻¹ and Σ_K⁻¹ converge to zero in probability by (A.11)–(A.14) and iii). For the remaining terms R₆, R₇, and R₈, note first that by either uniform mixing or the bound on the conditional covariances,

(A.17)  E[{(δ_K − δ)'(y − g)}²/n] ≤ CE[{1 + E[ε²|x]}{δ_K(x) − δ(x)}²] ≤ Cε₀(K)²,

so that by Assumption 6.4 and iii), n|R₆|² = O_p(ε₀(K)²) = o_p(1). Similarly, note that by eq. (A.13), |h_K(v) − h(v)|₂² ≤ CK^(1/2)ζ_A(K)K^(−α), so that by strong mixing, Davydov's inequality, and i),

(A.18)  n|R₇|² = O_p(|Σᵢ M(zᵢ)[h_K(vᵢ) − h(vᵢ)]/n|₂²) = o_p(1).

Furthermore, by Assumption 4.1 and E[p^K(x){g(x) − g_K(x)}] = 0, ∫δ(x)g_K(x)dF(x) = ∫δ_K(x)g_K(x)dF(x) = ∫δ_K(x)g(x)dF(x), so that

(A.19)  √n|R₈| = √n|∫δ(x)g_K(x)dF(x) − E[δ(x)g(x)]| = √n|∫[δ(x) − δ_K(x)][g_K(x) − g(x)]dF(x)| ≤ √n ε₀(K)K^(−α) → 0,

where the last inequality follows by ε₀(K) monotonically decreasing in K.

For the case where Assumption 6.7 b) is satisfied, α can be replaced by any (arbitrarily large) positive number, so that all the terms in which α appears are small, i.e. can be ignored, while by Assumption 6.4 all remaining terms in which K or ε₀(K) appear are bounded by a power of K, and
it follows by Lemma 8.2 of Newey (1991) that all the previous arguments hold with ζ₀(K) bounded by a power of K, so that α in the statement of the Theorem can be set to +∞, again giving the first conclusion.

The second conclusion will be shown only under case a) of Assumption 6.7, because under case b) the result will follow as above. For notational convenience, suppress the j subscript on each h_j(v). It follows from Theorem 6.1 of Newey (1991) that sup_V |ĥ(v) − h(v)| = O_p(ε_n), where ε_n = K^(1/2)ζ_A(K)[K^(1/2)/√n + K^(−α)] if m(z,β,h) is linear in h, and ε_n = Kζ_A(K)²[K^(1/2)/√n + K^(−α)] otherwise. Therefore, by Theorem 5.6, it only remains to be shown that Σᵢ ‖âᵢ − a(zᵢ)‖²/n = O_p(ε_n²). Let m_h(z,β,h) = ∂m(z,β,h)/∂h, Ψ̂ = Σᵢ m_h(zᵢ,β̂,ĥ(vᵢ))A(vᵢ)'/n, and note from (A.10) that

(A.20)  sup_V ‖A(v)‖ ≤ CK^(1/2)ζ_A(K),

so that by an expansion,

(A.21)  ‖Ψ̂ − Ψ_K‖ ≤ ‖Ψ̂ − Ψ‖ + {‖β̂ − β₀‖ + sup_V |ĥ(v) − h₀(v)|} Σᵢ ‖m_h(zᵢ,β̄,h̄(vᵢ))‖ ‖A(vᵢ)‖/n = O_p(ε̃_n),

for ε̃_n = Kζ_A(K)²/√n. Also, it follows as in the proof of (A.11) that ‖Ψ̂'Σ̂⁻¹ − Ψ_K'Σ_K⁻¹‖ ≤ ‖(Ψ̂ − Ψ_K)'Σ̂⁻¹‖ + ‖Ψ_K'Σ̂⁻¹(Σ_K − Σ̂)Σ_K⁻¹‖ = O_p(ε̃_n) and ‖Ψ̂'Σ̂⁻¹‖ = O_p(1). Next, with Pᵢ = p^K(xᵢ), note that

(A.22)  Σᵢ |âᵢ − a(zᵢ)|²/n ≤ C Σᵢ |Ψ̂'Σ̂⁻¹Pᵢ(ε̂ᵢ − εᵢ)|²/n + C Σᵢ |(Ψ̂'Σ̂⁻¹ − Ψ_K'Σ_K⁻¹)Pᵢεᵢ|²/n + C Σᵢ |[δ_K(xᵢ) − δ(xᵢ)]εᵢ|²/n = R₁ + R₂ + R₃.

By Theorem 6.1 of Newey (1991), R₁ ≤ C‖Ψ̂'Σ̂⁻¹‖² ‖p^K(x)‖_∞² Σᵢ |ĝ(xᵢ) − g(xᵢ)|²/n = O_p(Kζ₀(K)²[K/n + K^(−2α)]) = O_p(ε_n²). Also, Σᵢ ‖Σ_K^(−1/2)Pᵢ‖²/n = tr(Σ_K^(−1/2)(P'P/n)Σ_K^(−1/2)) is equal to the dimension of Σ_K plus an o_p(1) term, so that

(A.23)  R₂ ≤ C(maxᵢ≤n εᵢ²)(Σᵢ ‖Σ_K^(−1/2)Pᵢ‖²/n) ‖(Ψ̂'Σ̂⁻¹ − Ψ_K'Σ_K⁻¹)Σ_K^(1/2)‖² = O_p(n^(2/s)K ε̃_n²) = O_p(ε_n²).

Finally, with E[ε²|x] bounded, R₃ = O_p(E[{δ_K(x) − δ(x)}²]) = O_p(ε₀(K)²) = O_p(ε_n²), so the second conclusion follows by Theorem 5.6.
Proof of Theorem 7.1: The proof proceeds by verifying the hypotheses of Theorem 6.1. Assumption 5.1 holds by Section 6, i). The estimator has the form of Section 6, where

m(z,β,σ,h) = 1(0 < h₁ < 1, 0 < h₂ < 1)(x₁ − x₂, Φ⁻¹(h₂))'[Φ⁻¹(h₁) − σΦ⁻¹(h₂) − (x₁ − x₂)'β].

Note that p(x) is bounded, h_t(x) is bounded away from zero and one, and Φ⁻¹(·) is continuously differentiable on any set where its argument is bounded away from zero and one. It follows that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and 1 ≤ j+k ≤ 2. Also, M(z) is a bounded function of x, so that Assumptions 6.1–6.5 also hold, with G̅ the set of all mean-square integrable functions of x, ζ₀(K) = K^(1/2), and α = d/2k. Noting that d̲ = d, s = ∞, and u = y − h(x) is bounded, it follows by vi) that hypotheses i)–iv) of Theorem 6.1 are satisfied, since each of n^(4τ−1), n^(τ−γ(d/2k)), and n^(5τ−γd/k) converges to zero. The first conclusion now follows by Theorem 6.1. Next, note that u bounded implies that Theorem 6.1 v) is implied by Theorem 6.1 iii), so that the second conclusion also follows by Theorem 6.1.

Proof of Theorem 7.2: Follows similarly to the proof of Theorem 7.3, on noting that 1) E[∫g(x₁,x₂)dx₁] = ℓ(X₁)E[g(x)f̃(x)] for f̃(x) = f(x₁,x₂)⁻¹f(x₂) and ℓ the Lebesgue measure, so that Assumption 4.1 is satisfied with δ(x) = ℓ(X₁)Π(f̃(x)|G̅); and 2) E[A(v,g)] involves E[f̃(x)|x₂], and hence in case b) the projection has an explicit form.

Proof of Theorem 7.3: The proof proceeds by verifying the hypotheses of Theorem 6.2. Assumption 5.1 holds by Section 6, i). The estimator has the form of Section 6, with m(z,β,h) = h(v) − h(x) − β, so that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and 1 ≤ j+k ≤ 2, with A(v,g) = g(v) − g(x) and δ(x) = Π(λ(x)|G̅).
To discuss δ(x), note first that by Lemma 8.0 of Newey (1991), G̅ is closed, so that the projection Π(λ(x)|G̅) exists and Assumption 4.1 is satisfied. Under iii) a), the projection has an explicit form: Π(M(z)|G̅) = E[M(z)|x₂] + {x₁ − E[x₁|x₂]}(E[Var(x₁|x₂)])⁻¹E[{x₁ − E[x₁|x₂]}M(z)]. Under iii) b), it follows by Lemma 8.2 of Newey (1991) that the required approximation rate holds, with ζ₀(K) = ζ_A(K) = CK^(1/2) and α = d/a. It then follows by iv) that the hypotheses of Theorem 6.2 are satisfied, since each of n^(γ(3−2d/a)), n^(1/2−γ(d+d̲)/a), and n^(5τ−1+(1/s)) converges to zero. The conclusions now follow from the conclusion of Theorem 6.2.

Proof of Theorem 7.4: First the result will be proven when x is a scalar. Assumption 5.1 holds by i). The estimator has the form of Section 4, with m(z,β,h) = m(z,∂g(x)/∂x) − β, so that Assumptions 5.3 and 5.6 are satisfied for case a) of Assumption 5.6 and 1 ≤ j+k ≤ 2. As shown in Section 4, δ(x) = Π(−[∂f(x)/∂x]/f(x)|G̅), so that by the usual mean-square spanning result for polynomials, ε₀(K) → 0 as K → ∞. Also, since Assumption 6.7 b) is satisfied, none of the conditions of Theorem 6.2 that depend on α are binding. When m(z,h) is linear in h, the conclusion follows by Theorem 6.2, since both n^((1/s)+r[(5/2)+2ν]−(1/2)) and n^((1/s)+r[(7/2)+2ν]−(1/2)) go to zero, while otherwise n^((1/s)+r[(7/2)+ν]−(1/2)) converges to zero, so the conclusion again follows by Theorem 6.2.

References

Ahn, H. and C.F. Manski (1989): "Distribution Theory for Binary Choice Analysis with Nonparametric Estimation of Expectations," mimeo, University of Wisconsin.

Ai, C. (1990): "Semiparametric Estimation of Index Distribution Models," Ph.D. Thesis, MIT.

Anderson, P.K. and R.D.
Gill (1982): "Cox's Regression Model for Counting Processes: A Large Sample Study," Annals of Statistics, 10, 1100-1120.

Andrews, D.W.K. (1987): "Consistency in Nonlinear Econometric Models: A Generic Uniform Law of Large Numbers," Econometrica, 55, 1465-1471.

Andrews, D.W.K. (1990a): "Asymptotics for Semiparametric Econometric Models: I. Estimation," mimeo, Cowles Foundation, Yale University.

Andrews, D.W.K. (1990b): "Asymptotics for Semiparametric Econometric Models: II. Stochastic Equicontinuity," mimeo, Cowles Foundation, Yale University.

Andrews, D.W.K. (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models," Econometrica, 59, 307-345.

Bickel, P., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner (1990): "Efficient and Adaptive Inference in Semiparametric Models," monograph, forthcoming.

Boos, D.D. and R.J. Serfling (1980): "A Note on Differentials and the CLT and LIL for Statistical Functions, with Application to M-Estimates," Annals of Statistics, 8, 618-624.

Breiman, L. and J.H. Friedman (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, 80, 580-598.

Buckley, J. and I. James (1979): "Linear Regression with Censored Data," Biometrika, 66, 429-436.

Chamberlain, G. (1980): "The Analysis of Covariance with Qualitative Data," Review of Economic Studies, 47, 225-238.

Fernholz, L.T. (1983): von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics 19, Berlin: Springer.

Hansen, L.P. (1985): "Two-Step Generalized Method of Moments Estimators," discussion, North American Winter Meeting of the Econometric Society, New York.

Hardle, W. and T. Stoker (1989): "Investigation of Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

Hausman, J.A. and W.K.
Newey (1991): "Nonparametric Estimation of Exact Consumer Surplus and Deadweight Loss," working paper, MIT Department of Economics.

Hotz, J. and R. Miller (1989): "Estimation of Dynamic Discrete Choice Models," preprint, Carnegie-Mellon University.

Huber, P. (1967): "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press.

Ibragimov, I.A. and R.Z. Has'minskii (1981): Statistical Estimation: Asymptotic Theory, New York: Springer-Verlag.

Ichimura, H. (1987): "Estimation of Single Index Models," Ph.D. dissertation, Massachusetts Institute of Technology.

Kim, J. and D. Pollard (1989): "Cube Root Asymptotics," Annals of Statistics, 18, 191-219.

Klein, R.W. and R.S. Spady (1987): "An Efficient Semiparametric Estimator of the Binary Response Model," manuscript, Bell Communications Research.

Kool, H. (1988): "A Note on Consistent Estimation of Heteroskedastic and Autocorrelated Covariance Matrices," mimeo, Department of Econometrics, Free University of Amsterdam.

Koshevnik, Y.A. and B.Y. Levit (1976): "On a Non-parametric Analogue of the Information Matrix," Theory of Probability and Applications, 21, 738-753.

Lorentz, G.G. (1986): Approximation of Functions, New York: Chelsea Publishing Company.

Manski, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics, 3, 205-228.

Manski, C. (1987): "Semiparametric Analysis of Random Effects Linear Models From Binary Panel Data," Econometrica, 55, 357-362.

Newey, W.K. (1988a): "Adaptive Estimation of Regression Models Via Moment Restrictions," Journal of Econometrics, 38, 301-339.

Newey, W.K. (1988b): "Asymptotic Equivalence of Closest Moments and GMM Estimators," Econometric Theory, 4, 336-340.

Newey, W.K. (1990a): "Semiparametric Efficiency Bounds," Journal of Applied
Econometrics, 5, 99-135.

Newey, W.K. (1990b): "Series Estimation of Regression Functionals," mimeo, Department of Economics, Princeton University.

Newey, W.K. (1991): "Consistency and Asymptotic Normality of Nonparametric Projection Estimators," mimeo, MIT Department of Economics.

Newey, W.K. and K.D. West (1987): "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator," Econometrica, 55, 703-708.

Newey, W.K. and P.A. Ruud (1991): "Density Weighted Least Squares Estimation," working paper, MIT Department of Economics.

Pfanzagl, J. and W. Wefelmeyer (1982): Contributions to a General Asymptotic Statistical Theory, New York: Springer-Verlag.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

Reeds, J.A. (1976): On the Definition of Von Mises Functionals, Ph.D. Thesis, Harvard University.

Ritov, Y. and P.J. Bickel (1987): "Achieving Information Bounds in Non and Semiparametric Models," Technical Report No. 116, Department of Statistics, University of California, Berkeley.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica, 56, 931-954.

Robinson, P. (1989): "Hypothesis Testing in Semiparametric and Nonparametric Models for Economic Time Series," Review of Economic Studies, 56, 511-534.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics, 32, 157-187.

Stock, J.H. (1989): "Nonparametric Policy Analysis," Journal of the American Statistical Association, 84, 567-575.

Stoker, T.M. (1990): "Smoothing Bias in Derivative Estimation," MIT, Sloan School of Management, Working Paper 8185-90-EFA.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics, 13, 689-705.

Stone, C.J. (1990): "L₂ Rate of Convergence for Interaction Spline Regression," Technical Report No.
268, Berkeley Statistics Department.

Van der Vaart, A. (1991): "On Differentiable Functionals," Annals of Statistics, 19, 178-204.

Von Mises, R. (1947): "On the Asymptotic Distributions of Differentiable Statistical Functionals," Annals of Mathematical Statistics, 18, 309-348.

White, H. and I. Domowitz (1984): "Nonlinear Regression with Dependent Observations," Econometrica, 52, 143-161.