Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries. http://www.archive.org/details/kernelestimationOOnewe

KERNEL ESTIMATION OF PARTIAL MEANS AND A GENERAL VARIANCE ESTIMATOR*

Whitney K. Newey
MIT Department of Economics
Working Paper No. 93-3

December 1991; Revised December 1992

* Financial support was provided by the NSF. P. Robinson and T. Stoker provided useful comments.

Abstract

Econometric applications of kernel estimators are proliferating, suggesting the need for convenient variance estimates and conditions for asymptotic normality. This paper develops a general "delta method" variance estimator for functionals of kernel estimators. Also, regularity conditions for asymptotic normality are given, along with a guide to verifying them for particular estimators. The general results are applied to partial means, which are averages of kernel estimators over some of their arguments with other arguments held fixed. Partial means have econometric applications, such as consumer surplus estimation, and are useful for estimation of additive nonparametric models.

Keywords: Kernel estimation, partial means, standard errors, delta method, functional estimation.

1. Introduction

There are a growing number of applications where estimators use the kernel method in their construction, i.e. where functionals of kernel estimators are involved.
Examples include average derivative estimation (Hardle and Stoker, 1989, and Powell, Stock, and Stoker, 1989), nonparametric policy analysis (Stock, 1989), consumer surplus estimation (Hausman and Newey, 1992), and others that are the topic of current research. An important example in this paper is a partial mean, which is an average of a kernel regression estimator over some components, holding others fixed. The growth of kernel applications suggests the need for a general variance estimator that applies to many cases, including partial means. This paper presents one such estimator. Also, the paper gives general results on asymptotic normality of functionals of kernel estimators. Partial means control for covariates by averaging over them. They are related to additive nonparametric models and have important uses in economics, as further discussed below. It is shown here that their convergence rate is determined by the number of components that are averaged out, being faster the more components that are averaged over.

The variance estimator is based on differentiating the functional with respect to the contribution of each observation to the kernel estimator. A more common method is to calculate the asymptotic variance formula and then "plug in" consistent estimators. This method can be complicated, as often seems to be the case when the asymptotic formula is quite difficult to calculate. In contrast, the approach described here only requires knowing the form of the functional and the kernel. Also, unlike the more common approach, it gives consistent standard errors even for fixed bandwidths (when the estimator is centered at its limit), like the Huber (1967) asymptotic variance for m-estimators. In this way it is a generalization of the "delta method" for functions of sample means. An alternative approach to variance estimation, or confidence intervals, is the bootstrap. The bootstrap may give consistent confidence intervals
(e.g. by the percentile method) for the same types of functionals considered here, although this does not appear to be known. In any case, variance estimates are useful for bootstrap improvements to the asymptotic distribution, as considered in Hall (1992).

The variance formula given here has antecedents in the literature. For a kernel density at a point it is equal to the sample variance of the kernel observations, as recently considered by Hall (1992). For a kernel regression at a point, a related estimator was proposed by Bierens (1987). Also, the standard errors for average derivatives in Hardle and Stoker (1989) and Powell, Stock, and Stoker (1989) are equal to this estimator when the kernel is symmetric. New cases included here are partial means and estimators that depend (possibly) nonlinearly on all of the density or regression function, and not just on its value at sample points.

Section 2 sets up m-estimators that depend on kernel densities or regressions, and gives examples. Section 3 gives the standard errors, i.e. the asymptotic variance estimator. Section 4 describes partial means and their estimators, and associated asymptotic theory. Section 5 gives some general lemmas that are useful for the asymptotic theory of partial means, and more generally for other nonlinear functionals of kernel estimators. The proofs are collected in Appendix A, and Appendix B contains some technical lemmas.

2. The Estimators

The estimators considered in this paper are two step estimators where the first step is a vector of kernel estimators. To describe the first step, let y be an r x 1 vector of variables, x a k x 1 vector of continuously distributed variables, and h_0(x) the product of the density f_0(x) of x with
E[y|x], as in

(2.1)    h_0(x) = E[y|x] f_0(x).

Let K(u) denote a kernel function satisfying ∫K(u)du = 1 and other conditions given in Section 4, and let z_i (i = 1, ..., n) denote data observations that include x_i and y_i. Then for a bandwidth σ > 0, let K_σ(u) = σ^{-k} K(u/σ), and let a kernel estimator of h_0 be

(2.2)    ĥ(x) = n^{-1} Σ_{i=1}^n y_i K_σ(x - x_i).

This estimator is the first step considered here. A second step allowed for in this paper is an m-estimator that depends on the entire estimated function ĥ, and not just on its value at the sample points. To describe such an estimator, let β denote a vector of parameters, with true value β_0, and m(z, β, h) a vector of functions that depend on the observation, the parameter, and the function h, with E[m(z, β_0, h_0)] = 0. Here m(z, β, h) is allowed to depend on the value of h at points other than the observed points; see below for examples. A second step estimator is a β̂ that solves a corresponding sample equation

(2.3)    Σ_{i=1}^n m(z_i, β̂, ĥ)/n = 0.

This is a two-step m-estimator where the first step is the kernel estimator described above. This estimator includes as special cases functions of kernel estimators evaluated at points, e.g. a kernel density estimator at a point.
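As a concrete illustration of the first step, the kernel estimator of equation (2.2) can be sketched in a few lines of code. This is a hypothetical sketch, not part of the paper: the product Gaussian kernel and all function names are assumptions (the paper only requires that K integrate to one).

```python
import numpy as np

def gauss_kernel(u):
    """Product Gaussian kernel K(u), integrating to 1 over R^k (an assumed choice)."""
    u = np.atleast_2d(u)
    return np.exp(-0.5 * (u ** 2).sum(axis=-1)) / (2 * np.pi) ** (u.shape[-1] / 2)

def h_hat(x, x_obs, y_obs, sigma):
    """Kernel estimator of equation (2.2): h(x) = n^{-1} sum_i y_i K_sigma(x - x_i).

    x     : (k,) evaluation point
    x_obs : (n, k) regressor observations
    y_obs : (n, r) left-hand-side vector, e.g. y = (1, q)'
    """
    n, k = x_obs.shape
    w = gauss_kernel((x - x_obs) / sigma) / sigma ** k  # K_sigma(x - x_i), shape (n,)
    return (y_obs * w[:, None]).mean(axis=0)            # shape (r,)
```

With y_i = (1, q_i)', the two components of the output estimate f_0(x) and E[q|x]f_0(x), so their ratio gives the kernel regression estimator ĝ(x) used in the examples below.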
Some other interesting examples are as follows:

Partial Means: An example that is (apparently) new is an average of a nonparametric regression over some variables holding others fixed. Let q denote a random variable that is included in y, and let g_0(x) = E[q|x]. Partition x = (x_1, x_2), let x̄_1 be some fixed value for x_1, and let τ(x_2) be some weight function, possibly associated with "trimming" that keeps a denominator bounded away from zero. Define

(2.3a)    β_0 = E[τ(x_2) g_0(x̄_1, x_2)].

This object is a partial mean, an average over some conditioning variables holding others fixed. It can be estimated by substituting a kernel estimator for g_0 and a sample average for the expectation. Let y = (1, q)', f̂(x) = ĥ_1(x), ĝ(x) = ĥ_2(x)/ĥ_1(x) = ĥ_2(x)/f̂(x), and

(2.4)    β̂ = Σ_{i=1}^n τ(x_{2i}) ĝ(x̄_1, x_{2i})/n.

This estimator is a special case of equation (2.3) with m(z, β, h) = τ(x_2) h_2(x̄_1, x_2)/h_1(x̄_1, x_2) - β. It shows how explicit estimators can be included as special cases of equation (2.3). Further discussion is given in Section 4.

Differential Equation Solution: An estimator with economic applications is one that solves a differential equation depending on a nonparametric regression. To describe this estimator, let y = (1, q)', and suppose x is two-dimensional (i.e. k = 2), with x = (x_1, x_2)'. Let x̄_1 be some fixed value for x_1, and consider two possible values for x_2, denoted by p^0 and p^1, with p^0 < p^1. For ĝ(x) = ĥ_2(x)/f̂(x), the estimator is β̂ = Ŝ(p^1), where Ŝ(p) is the solution of the differential equation

(2.5)    dŜ(p)/dp = -ĝ(x̄_1 - Ŝ(p), p),    Ŝ(p^0) = 0.

It is a special case of equation (2.3) where m(z, β, h) is the solution of the differential equation minus β. This example shows one way that m(z, β, h) can be allowed to depend on the entire function h. The economic interpretation of β̂ is a nonparametric estimate of the cost of a change of price from p^0 to p^1 of a commodity with demand function g_0(x) = E[q|x], for an individual with income x̄_1. This example is analyzed in Hausman and Newey (1992), using results developed here.

Inverse Density Weighted Least Squares: An estimator that is useful for estimating the coefficients of a generalized regression model is a weighted least squares estimator, with weights the inverse of a kernel density estimate. Let t(x) be a density of an elliptically symmetric distribution (i.e. a density that depends only on (x-μ)'Σ(x-μ) for some μ and Σ) that has bounded support. The estimator β̂ minimizes

(2.6)    Σ_{i=1}^n f̂(x_i)^{-1} t(x_i) [q_i - x_i'β]^2.

This estimator has the form given in equation (2.3), with m(z, β, h) = h(x)^{-1} t(x) x [q - x'β] for h = f. The weighting by the inverse density leads to β̂ converging to the least squares projection of E[q|x] on x under the density t(x), which is consistent for scaled coefficients of a generalized regression model, as discussed in Ruud (1986) and Duan and Li (1991). This estimator is useful for estimating the semiparametric generalized regression model E[y|x] = t_0(x'δ_0), where t_0(·) is an unknown transformation, and is analyzed in Newey and Ruud (1991), using results developed here.
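The minimization in (2.6) is ordinary weighted least squares once f̂ is computed, so it has a closed form. A minimal sketch under assumed names (Gaussian kernel, and a generic nonnegative weight function standing in for the bounded-support density t(x)):

```python
import numpy as np

def inv_density_wls(x_obs, q_obs, sigma, t_weight):
    """Closed-form solution of (2.6): weighted least squares with
    weights t(x_i)/f_hat(x_i), where f_hat is a kernel density estimate."""
    n, k = x_obs.shape
    d = (x_obs[:, None, :] - x_obs[None, :, :]) / sigma
    Kmat = np.exp(-0.5 * (d ** 2).sum(-1)) / (2 * np.pi) ** (k / 2) / sigma ** k
    f_hat = Kmat.mean(axis=1)              # f_hat(x_i), i = 1, ..., n
    w = t_weight(x_obs) / f_hat            # WLS weights t(x_i)/f_hat(x_i)
    A = x_obs.T @ (x_obs * w[:, None])     # X'WX
    b = x_obs.T @ (q_obs * w)              # X'Wq
    return np.linalg.solve(A, b)
```

Because the objective is a weighted sum of squares, the weights change the probability limit under misspecification but not the algebra; when q_i = x_i'β exactly, any positive weights return β.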
The results of this paper apply to each of these examples, as discussed below. They will also apply to other estimators, including those that minimize a quadratic form in a sample average depending on β and ĥ, or minimize a sample average, such as quasi-maximum likelihood estimators that depend on kernel estimators.

3. The Asymptotic Variance Estimator

To form approximate confidence intervals and test statistics it is important to have consistent standard errors. To motivate the form of the asymptotic variance estimator, it is helpful to briefly sketch the asymptotic distribution theory. Expanding the left-hand side of equation (2.3) around β_0 and solving for β̂ - β_0 gives

(3.1)    β̂ - β_0 = -[n^{-1} Σ_{i=1}^n ∂m(z_i, β̄, ĥ)/∂β]^{-1} m̄_n(β_0),    m̄_n(β) = Σ_{i=1}^n m(z_i, β, ĥ)/n,

where β̄ is a mean value. By the uniform law of large numbers discussed in Section 4, n^{-1} Σ_{i=1}^n ∂m(z_i, β̄, ĥ)/∂β will converge in probability to M = E[∂m(z, β_0, h_0)/∂β], so that the asymptotic distribution of β̂ will be determined by m̄_n(β_0). In Section 4 conditions will be given for existence of a constant a such that

(3.1a)    √n σ^a m̄_n(β_0) →d N(0, V).

The magnitude of a will be determined by the form of E[m(z, β_0, h)] as a function of h, being smaller the more dimensions are being integrated over in E[m(z, β_0, h)]. By the Slutzky theorem the asymptotic distribution of β̂ will be

(3.1b)    √n σ^a (β̂ - β_0) →d N(0, M^{-1} V M^{-1}').

A consistent asymptotic variance estimator can be constructed by substituting estimates for true values in the formula M^{-1} V M^{-1}'. It is easy to construct an estimator of M, as

(3.1c)    M̂ = n^{-1} Σ_{i=1}^n ∂m(z_i, β̂, ĥ)/∂β.
Finding a consistent estimator of V is more difficult, because of the need to account for the presence of ĥ in m̄_n(β_0). One common approach to this problem is to calculate the asymptotic variance, and then form an estimator by substituting estimates for unknown functions, such as sample averages for expectations. This approach can be difficult when the asymptotic variance is very complicated. Also, it may be sensitive to the bandwidth parameter, because the variance formula is only valid in the limit as σ → 0.

The asymptotic variance estimator here is constructed by estimating the influence of each observation in ĥ on Σ_{i=1}^n m(z_i, β̂, ĥ)/n. Let ζ denote a scalar number, and let

(3.2)    δ̂_i = ∂[n^{-1} Σ_{j=1}^n m(z_j, β̂, ĥ + ζ y_i K_σ(· - x_i))]/∂ζ |_{ζ=0}.

The interpretation of δ̂_i is that it estimates the first-order effect of the i-th observation in ĥ on Σ_{i=1}^n m(z_i, β_0, h)/n. In this sense it is an "influence function" estimator. The variance can be estimated by including this term with m(z_i, β̂, ĥ) in a sample variance, as in

(3.3)    V̂ = Σ_{i=1}^n ψ̂_i ψ̂_i'/n,    ψ̂_i = m(z_i, β̂, ĥ) + δ̂_i - Σ_{j=1}^n δ̂_j/n.

An asymptotic variance estimator for β̂ can then be constructed by combining V̂ with a Jacobian estimator in the usual way, as in

(3.4)    Var(β̂) = M̂^{-1} V̂ M̂^{-1}',    M̂ = n^{-1} Σ_{i=1}^n ∂m(z_i, β̂, ĥ)/∂β.

In Section 5 conditions will be given that are sufficient for M̂^{-1} V̂ M̂^{-1}' →p M^{-1} V M^{-1}'. Consequently, inference procedures based on β̂ being normally distributed with mean β_0 and variance Var(β̂)/n will be asymptotically valid. For example, β̂_j ± 1.96[Var(β̂)_jj/n]^{1/2} will be an asymptotic 95 percent confidence interval.
This asymptotic variance estimator accounts for the presence of ĥ by including the δ̂_i terms. These terms are straightforward to compute, requiring only knowledge of the form of m(z, β, h) and the kernel. It is interesting to note that the form of Var(β̂) does not depend on the convergence rate for β̂ (i.e. on a), but that its large sample behavior will. In particular, δ̂_i can be calculated by analytic differentiation with respect to the scalar ζ. Alternatively, if the analytic formula is very hard to construct, δ̂_i can be calculated as the numerical derivative of ζ → n^{-1} Σ_{j=1}^n m(z_j, β̂, ĥ + ζ y_i K_σ(· - x_i)) with respect to ζ.
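The numerical-derivative route can be checked on the simplest functional, a density at a point, where the derivative is also available analytically. The following is a hypothetical sketch (scalar x, Gaussian kernel, all names assumed): for m(z, β, h) = h(x̄) - β the averaged moment is ĥ(x̄) - β̂, and the analytic influence is K_σ(x̄ - x_i).

```python
import numpy as np

def K_sig(u, sigma):
    # scaled Gaussian kernel K_sigma(u) = sigma^{-1} K(u/sigma), scalar x
    return np.exp(-0.5 * (u / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def delta_i(i, x_obs, sigma, sample_moment, eps=1e-6):
    """Numerical version of (3.2): finite-difference derivative at zeta = 0 of
    zeta -> averaged moment evaluated at h_hat + zeta * K_sigma(. - x_i)."""
    def h_fun(zeta):
        return lambda pt: (K_sig(pt - x_obs, sigma).mean()
                           + zeta * K_sig(pt - x_obs[i], sigma))
    return (sample_moment(h_fun(eps)) - sample_moment(h_fun(0.0))) / eps

rng = np.random.default_rng(0)
x_obs = rng.normal(size=50)
x_bar, sigma = 0.0, 0.4
beta_hat = K_sig(x_bar - x_obs, sigma).mean()     # f_hat(x_bar)
moment = lambda h: h(x_bar) - beta_hat            # n^{-1} sum_j m(z_j, beta, h)
d0 = delta_i(0, x_obs, sigma, moment)             # should match K_sigma(x_bar - x_0)
```

Since the moment is linear in the perturbation here, the finite difference reproduces the analytic influence exactly up to rounding; for nonlinear functionals it is a first-order approximation whose step size must be chosen with the usual care.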
Here V̂ = Σ_{i=1}^n ψ̂_i ψ̂_i'/n is a "delta-method" variance for kernel estimators, exactly analogous to delta method variances for parametric estimators. If ĥ were a sample mean rather than a kernel estimator, say ĥ = ȳ = Σ_{j=1}^n y_j/n, the analog of δ̂_i is ∂{Σ_{j=1}^n m(z_j, β̂, ȳ + ζ y_i)/n}/∂ζ = m̄_h y_i, where m̄_h = n^{-1} Σ_{j=1}^n ∂m(z_j, β̂, ȳ)/∂h, so the analogous influence function estimator would be ψ̂_i = m(z_i, β̂, ȳ) + m̄_h(y_i - ȳ). Thus, ψ̂_i is exactly the usual delta-method formula for the presence of a sample average in an m-estimator.

Another feature of Var(β̂) is that it does not rely on the bandwidth shrinking to zero for its validity. If the bandwidth were held fixed, Var(β̂) would be a consistent estimator of the asymptotic variance of √n(β̂ - β̄(σ)), where β̄(σ) is the limit of β̂ when the bandwidth is held fixed at σ. Also, when the convergence rate of β̂ is slower than 1/√n, the m(z_i, β̂, ĥ) terms are asymptotically negligible in ψ̂_i and the δ̂_i will dominate (for reasons analogous to the slower convergence rate of pointwise density derivatives relative to density levels). Thus, the formula for Var(β̂) does not distinguish between elements of ψ̂_i that affect the asymptotic distribution and those that do not. The negligible terms are retained because they are easy to compute and could conceivably improve the asymptotic approximation.

Some examples may serve to illustrate the form of δ̂_i. The simplest example is a density estimator β̂ = f̂(x) = n^{-1} Σ_{i=1}^n K_σ(x - x_i) at some x, where the asymptotic variance estimator is the sample variance of the kernel observations K_σ(x - x_i):

Var(β̂) = Σ_{i=1}^n K_σ(x - x_i)^2/n - [Σ_{j=1}^n K_σ(x - x_j)/n]^2.

This estimator was recently considered by Hall (1992). Other examples are:

Partial Means: Here δ̂_i can be obtained by explicit differentiation of n^{-1} Σ_{j=1}^n τ(x_{2j})[ĥ_2(x̃_j) + ζ q_i K_σ(x̃_j - x_i)]/[f̂(x̃_j) + ζ K_σ(x̃_j - x_i)] with respect to ζ at ζ = 0, for x̃_j = (x̄_1, x_{2j}), giving

δ̂_i = n^{-1} Σ_{j=1}^n τ(x_{2j}) f̂(x̃_j)^{-1} [q_i - ĝ(x̃_j)] K_σ(x̃_j - x_i).

The asymptotic variance estimator can then be formed as

(3.5)    Var(β̂) = Σ_{i=1}^n ψ̂_i^2/n,    ψ̂_i = τ(x_{2i}) ĥ_2(x̃_i)/f̂(x̃_i) - β̂ + δ̂_i - Σ_{j=1}^n δ̂_j/n.

Differential Equation Solution: It is possible to derive an analytical expression for δ̂_i, but this expression is quite complicated and difficult to evaluate. An alternative approach, used by Hausman and Newey (1992), is to numerically differentiate with respect to ζ the numerical solution to

dŜ(p)/dp = -[ĥ_2(x̄_1 - Ŝ(p), p) + ζ q_i K_σ((x̄_1 - Ŝ(p), p) - x_i)]/[f̂(x̄_1 - Ŝ(p), p) + ζ K_σ((x̄_1 - Ŝ(p), p) - x_i)]

to form δ̂_i. This approach is quite feasible using existing fast and accurate numerical algorithms for ordinary differential equations.

Inverse Density Weighted Least Squares: Let û_i = q_i - x_i'β̂. Then

(3.5a)    Var(β̂) = M̂^{-1} [Σ_{i=1}^n ψ̂_i ψ̂_i'/n] M̂^{-1}',    M̂ = -n^{-1} Σ_{i=1}^n f̂(x_i)^{-1} t(x_i) x_i x_i',
          ψ̂_i = f̂(x_i)^{-1} t(x_i) x_i û_i + δ̂_i - Σ_{j=1}^n δ̂_j/n,    δ̂_i = -n^{-1} Σ_{j=1}^n t(x_j) f̂(x_j)^{-2} x_j û_j K_σ(x_j - x_i).

For partial means, asymptotic normality and consistency of the variance estimator are shown in Section 4, using the Lemmas of Section 5. Corresponding theoretical results for the other examples are given elsewhere.
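For the density-at-a-point example, a quick numerical check (simulated normal data and a Gaussian kernel, both assumed for illustration) that the ψ̂-based estimator of (3.3) reduces to the sample variance of the kernel observations, as in the display above:

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = rng.normal(size=300)
x_bar, sigma = 0.0, 0.5

# kernel observations K_sigma(x_bar - x_i)
Ki = np.exp(-0.5 * ((x_bar - x_obs) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# At beta = beta_hat, m(z_i, beta, h_hat) = h_hat(x_bar) - beta_hat = 0 for every i,
# and delta_i = K_sigma(x_bar - x_i), so psi_i = K_i - mean(K)
psi = Ki - Ki.mean()
V_hat = (psi ** 2).mean()                         # equation (3.3)
sample_var = (Ki ** 2).mean() - Ki.mean() ** 2    # sample-variance form
```

The two expressions agree identically, which is the sense in which the delta-method estimator nests the Hall (1992) construction.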
4. Partial Means

Partial means have a number of applications in economics. For example, they can be used to approximate the solution to the differential equation described in Section 2. Dropping the S(p) term from inside g_0 leads to the approximation S(p^1) ≈ ∫_{p^0}^{p^1} g_0(x̄_1, p)dp = E[(p^1 - p^0) g_0(x̄_1, x_2)], where x_2 is distributed uniformly on P̄ = [p^0, p^1]. It is known that this approximation is quite good in many economic examples, where S(p) is a small proportion of income (see Willig, 1978). This is a partial mean as described in Section 2. It can be estimated by

(4.1)    β̂ = (p^1 - p^0) Σ_{i=1}^n ĝ(x̄_1, p_i)/n,

where p_i is drawn from a uniform distribution on [p^0, p^1]. This is a simulation estimator similar in spirit to that of Lerman and Manski (1981).

Partial means are also of interest from a purely statistical point of view, as dimension attenuation devices. Like E[q|x_1], a partial mean is a function of a smaller dimensional argument. Consequently, partial mean estimators will converge faster than estimators of g_0(x). However, unlike E[q|x_1], a partial mean controls for the covariates x_2, in an average way. The way in which partial means control for covariates takes an additive form, as illustrated by their relationship to additive nonparametric models. Suppose that the conditional expectation is additive,

(4.2)    E[q|x] = g_{10}(x_1) + g_{20}(x_2),

and that E[τ(x_2)] = 1. Then E[τ(x_2) g_0(x̄_1, x_2)] = g_{10}(x̄_1) + E[τ(x_2) g_{20}(x_2)]. Thus, as a function of x̄_1, the partial mean estimates the corresponding component of an additive model, up to a constant.

In comparison with other additive model estimators, partial means are easier to compute but may be less asymptotically efficient. Unlike the alternating conditional expectation estimator for additive models (ACE, Breiman and Friedman, 1985), the partial mean is an explicit functional, so the kernel estimator will not require iteration. However, because the partial mean does not impose additivity it may be a less efficient estimator. Also, the partial mean depends on the full conditional expectation E[q|x], so the curse of dimensionality may result in slower convergence to the limiting distribution.

The partial mean and ACE are different statistical objects when no restrictions are placed on E[q|x]. The partial mean is given in equation (2.3a). The ACE object is the mean-square projection of E[q|x] on the set of functions of the form g_1(x_1) + g_2(x_2). These estimators summarize different features of E[q|x].
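The dimension-reduction and additive-model points can be seen in a small simulation of the partial mean (2.4). All names and the data-generating design here are hypothetical illustrations: with E[q|x] = x_1 + x_2 additive and τ ≡ 1, the partial mean at x̄_1 = 0.5 should be near g_{10}(x̄_1) + E[x_2] = 1.

```python
import numpy as np

def g_hat(pt, x_obs, q_obs, sigma):
    """Kernel regression g = h_2/h_1 at point pt, built from y = (1, q)'."""
    k = x_obs.shape[1]
    u = (pt - x_obs) / sigma
    w = np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2 * np.pi) ** (k / 2) / sigma ** k
    return (q_obs * w).mean() / w.mean()

def partial_mean(x1_bar, x_obs, q_obs, sigma, tau=lambda x2: 1.0):
    """Equation (2.4): average tau(x_{2i}) g_hat(x1_bar, x_{2i}) over the sample."""
    vals = [tau(x2i) * g_hat(np.array([x1_bar, x2i]), x_obs, q_obs, sigma)
            for x2i in x_obs[:, 1]]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(1200, 2))
q = x[:, 0] + x[:, 1]            # additive E[q|x], noise-free for clarity
beta = partial_mean(0.5, x, q, sigma=0.1)
```

Varying x̄_1 over a grid traces out (an estimate of) the additive component g_{10}, up to the constant E[x_2], without any iterative backfitting.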
The partial mean and ACE are different placed on it E[q|x] is statistical objects when no given in equation (2.3a). The ACE object on the set of functions of the form These estimators summarize different features of - 10 - E[q|x]. If one restrictions are is gJx ) is the + g„(x interested in ). of 5. 8<£." m(z.,0,fi + ZyJ/xiY/dC, = nyr., is influence function estimator would be the analogous 6 = = m(z.,/3,h) + m,y. - 1 tr 1 Another feature of average in an m-estimator. bandwidth shrinking to zero for its validity. Var(/3) that is ,m,y./n) ^j=l h j (.Y 1 it Vn{@ - |3 where ), when the bandwidth The terms and m(z.,p,h) convergence rate of is j§ is Y,. held fixed at 8./n slower than 5. example estimator K (x-x.). £ = f(x) is at form of some x, This estimator was recently considered by Hall n'tStL.UhJx.) ^j=l 2 j 2j 5. l The simplest the sample variance of , [3]. Other examples are: can be obtained by explicit differentiation of 5. + Cq.K (x.-x.)]/[f(x.) + ^ M i cr j l j <K (x.-x.)], cr j as i _1 1 n 1 = n" I. T(3L.)f(x.r [q. M i - f(x.) h (x.)]Kcr (x.-x.). ^j=l 2j 2 j l j j 1 j The asymptotic variance estimator can then be formed as n 2 Var(£) = £. .0 /n, ^1=1 $. l Differential Equation Solution: 5., but this expression is l = T(x_.)h„(x.)/f(x.) - p + 2i 2 l l It is n 8. - I. ,5./n. ^j=l l (10) j possible to derive an analytical expression for quite complicated and difficult to evaluate. 11 - to that affect the where the asymptotic variance . Here h will dominate). this estimator. Var(p) = Y. ,K (x-x.) /n - [Y ,K (x-x.)/n] ^i=l <r ^j=l cr l j Partial Means: when the between pointwise density levels and (e.g. to illustrate the a density estimator is $. Also, for analogous does not distinguish between elements of Some examples may serve the They are retained because they are easy 1/Vn. 
For the asymptotic theory, let the true density of x be denoted by f_0(x), let the argument u of K(u) be partitioned conformably with x = (x_1, x_2) as u = (u_1, u_2), and let f_2(x_2) denote the marginal density of x_2. The asymptotic variance of the partial mean estimator will be

(4.3)    V = [∫{∫K(u_1, u_2)du_2}^2 du_1] ∫τ(t)^2 Var(q|x = (x̄_1, t)) f_2(t)^2 f_0(x̄_1, t)^{-1} dt.

Theorem 4.1: Suppose that Assumptions H and K are satisfied for d ≤ s; i) E[|q|^4] < ∞, E[|q|^4|x]f_0(x) is bounded, and Var(q|x = (x̄_1, x_2)), f_2(x_2), and f_0(x̄_1, x_2) are continuous; ii) f_0(x) and E[q|x] are bounded; iii) τ(x_2) is bounded and zero except on a compact set where f_0(x) is bounded away from zero; iv) ∫ sup_{‖η‖≤ε} {1 + E[q^4|x = (x̄_1 + η, x_2)]} f_0(x̄_1 + η, x_2) dx_2 < ∞ for some ε > 0; and v) nσ^{k_1}/ln(n) → ∞ and nσ^{k_1 + 2s} → 0. Then for β̂ in equation (2.4), √(nσ^{k_1})(β̂ - β_0) →d N(0, V). If, in addition, nσ^{2k_1} → ∞, then σ^{k_1} V̂ →p V.

The conditions here embody "undersmoothing," meaning that the bias goes to zero faster than the variance. Undersmoothing is reflected in the conclusion, where the limiting distribution is centered at zero, rather than at a bias term.
An improved convergence rate for partial means over pointwise estimators is embodied in the normalizing factor √(nσ^{k_1}) in the asymptotic distribution result. The rate implied by this result is (nσ^{k_1})^{-1/2}, while the corresponding rate from the usual asymptotic normality result for pointwise estimators is (nσ^k)^{-1/2}; the partial mean rate is faster by the factor σ^{k_2/2}, which is going to zero. Furthermore, the rate for partial means is exactly the nonparametric pointwise rate for the dimension k_1 of the components that are not averaged out. Thus, the more components are averaged out, the smaller will be k_1, and hence the faster will be the convergence rate.

One important feature of this result is hypothesis iii), which amounts to a "fixed trimming" condition, where the density of x is bounded away from zero where τ is nonzero. This condition is theoretically convenient because it avoids the "denominator problem." It is used here because it is not restrictive in many cases (e.g. pointwise estimation) and because the resulting theory roughly corresponds to trimming based on a large auxiliary sample, which is often available. It might be possible to modify the results to allow trimming to depend on the sample size, e.g. as in Robinson (1988), but this modification would be very complicated.

Estimators will be √n-consistent when they are full means, i.e. are averages over all the components. There are many interesting examples of such estimators, such as the policy analysis estimator of Stock (1989). The general conditions given here are slightly different for that case, so it is helpful to describe the estimator and result in a slightly different way. Suppose β_0 = E[a_0(z) g_0(x)], where g_0(x) = E[q|x] and a_0(z) may be some function of the data, with x continuously distributed. A kernel estimator of β_0, along with the associated asymptotic variance estimator, will be

(4.4)    β̂ = Σ_{i=1}^n a_0(z_i) ĝ(x_i)/n,    Var(β̂) = Σ_{i=1}^n ψ̂_i^2/n,
         ψ̂_i = a_0(z_i) ĝ(x_i) - β̂ + δ̂_i - Σ_{j=1}^n δ̂_j/n,    δ̂_i = Σ_{j=1}^n a_0(z_j) f̂(x_j)^{-1} [q_i - ĝ(x_j)] K_σ(x_j - x_i)/n,

for ĝ(x) = ĥ_2(x)/ĥ_1(x) as above. The asymptotic variance of this estimator is

(4.5)    V = Var(ψ_i),    ψ_i = a_0(z_i) g_0(x_i) - β_0 + E[a_0(z)|x = x_i][q_i - g_0(x_i)].
Theorem 4.2: Suppose that i) Assumptions K and H are satisfied for d ≤ s; ii) E[|q|^4] < ∞, E[|a_0(z)|^4] < ∞, and E[|q|^4|x]f_0(x) is bounded; iii) a_0(z) is zero if x is not in a compact set, and f_0(x) is bounded away from zero on that compact set; iv) E[a_0(z)|x] and E[|a_0(z)|^4|x]f_0(x) are continuous a.e. and bounded; and v) nσ^{2k}/ln(n) → ∞ and nσ^{2s} → 0. Then √n(β̂ - β_0) - Σ_{i=1}^n ψ_i/√n →p 0 and √n(β̂ - β_0) →d N(0, V). If, in addition, nσ^{3k} → ∞, then Σ_{i=1}^n ‖ψ̂_i - ψ_i‖^2/n →p 0 and V̂ →p V.

This result gives asymptotic normality for a trimmed version of Stock's (1989) estimator, as well as being a general result on the asymptotic normality of sample moments that are random linear functions of kernel regressions.

5. Useful Lemmas

Several intermediate results of a familiar type are useful in developing asymptotic theory for the m-estimator β̂ described in Section 2. Uniform convergence results are useful for showing consistency of β̂ and of the Jacobian term in the expansion (3.1). Asymptotic normality of β̂ will follow from √n σ^a m̄_n(β_0) →d N(0, V). Also, in the case corresponding to √n-consistency of β̂, when a = 0, it can be shown that there is ψ_i such that √n m̄_n(β_0) = Σ_{i=1}^n ψ_i/√n + o_p(1), and the condition Σ_{i=1}^n ‖ψ̂_i - ψ_i‖^2/n →p 0 is very important for consistent estimation of the asymptotic variance. Primitive conditions for each of these results are given in this Section. Examples of how these results can be used to derive results for particular functionals are given in the proofs of Theorems 4.1 and 4.2, and in the proofs of results in Hausman and Newey (1992), Matzkin and Newey (1992), and Newey and Ruud (1991).

A number of additional regularity conditions are used in the analysis to follow. The first regularity condition imposes some moment assumptions.
For a matrix B, let ‖B‖ = [tr(B'B)]^{1/2}, where tr(·) denotes the trace of a square matrix.

Assumption Y: For p = 4, E[‖y‖^p] < ∞, E[‖y‖^p|x]f_0(x) is bounded, and E[‖m(z, β_0, h_0)‖^2] < ∞.

This condition, like Assumptions H and K, is a standard type of condition. The fourth moment condition for y is useful for obtaining optimal convergence rates for ĥ.

For the asymptotic theory, it is useful to impose smoothness conditions on m(z, β, h) as a function of h, in terms of a metric on the set of possible functions. Here, the metric is the supremum norm on the function and its derivatives, i.e. a Sobolev norm. The supremum norm is quite strong, but uniform convergence rates for a kernel estimator and its derivatives are either well known or straightforward to derive (see Appendix B), and are not very much slower than convergence rates in the L^2 norm (there is only an additional "log term" in the uniform rates). Consequently, conditions for remainder terms to go to zero fast enough to achieve asymptotic normality will not be much stronger with the supremum norm than they will be with L^2 norms. Furthermore, it is quite easy to show smoothness in the supremum norm for many functionals.

To define the norm, for a matrix of functions B(x), let ∂^λ B(x)/∂x^λ denote the vector consisting of all distinct λ-th order partial derivatives of all elements of B(x). Also, let X denote a set that is contained in the support of x, and for a nonnegative integer j let

‖B‖_j = max_{λ≤j} sup_{x∈X} ‖∂^λ B(x)/∂x^λ‖.

This is a Sobolev supremum norm of order j, where ‖B‖_j is taken equal to infinity if the derivatives do not exist for some x ∈ X. One useful type of result is a uniform convergence result, as in the conclusion of the following Lemma.

Lemma 5.1: Suppose that i) Assumptions K, H, and Y are satisfied with d ≤ Δ+1, [ln(n)/(nσ^{k+2Δ})]^{1/2} → 0, and σ → 0; ii) m(z, β, h) is continuous at each β ∈ B with probability one, where B is compact; and iii) there are ε > 0, c > 0, and b(z) with E[b(z)] < ∞ such that for all β ∈ B and all h with ‖h - h_0‖_Δ ≤ c,

(5.1)    ‖m(z, β, h) - m(z, β, h_0)‖ ≤ b(z)(‖h - h_0‖_Δ)^ε.
Then sup_{β∈B} ‖n^{-1} Σ_{i=1}^n m(z_i, β, ĥ) - E[m(z, β, h_0)]‖ → 0 in probability, and E[m(z, β, h_0)] is continuous on B.

The uniform convergence conclusion of this Lemma is a well known condition for consistency of the solution to equation (2.3). Also, equation (5.1) is useful in showing consistency of an estimator that maximizes an objective function n^{-1} Σ_{i=1}^n m(z_i, β, ĥ), where m is a scalar. It is also useful for showing consistency of the Jacobian term n^{-1} Σ_{i=1}^n ∂m(z_i, β, ĥ)/∂β, by letting the m in the statement of the Lemma be each column of the derivative.

Asymptotic normality of √n σ^a m̄_n(β_0) is essential for asymptotic normality of β̂. This result has two components, which are a linearization around the true h_0 and asymptotic normality of the linearization. It is useful to state these two components separately. For the moment, assume that m(z, β_0, h) is a scalar linear functional that does not depend on z, say m(h) = m(z, β_0, h). The rate of convergence (i.e. the magnitude of a) will depend on the nature of m(h). Here the results are grouped into two main ones, the first involving √n-consistency.

Lemma 5.2: If σ → 0, m(h) = ∫v(x)'h(x)dx, where v(x) is continuous almost everywhere and zero outside a compact set, and there is c > 0 such that E[sup_{‖u‖≤c} ‖v(x+u)‖^2 E[‖y‖^2|x]] < ∞, then for δ_i = v(x_i)'y_i,

√n[m(ĥ) - E[m(ĥ)]] = Σ_{i=1}^n {δ_i - E[δ_i]}/√n + o_p(1).

Cases where convergence is slower than 1/√n are somewhat more complicated. The following assumption is useful for these cases. For the moment, let λ be a nonnegative integer and let the elements of ∂^λ h(x)/∂x^λ be ordered so that ∂^λ[y_i K_σ(x - x_i)]/∂x^λ = y_i ⊗ [∂^λ K_σ(x - x_i)/∂x^λ].
Assumption I: For k_1 ≤ k there are a vector of functions x̄(t) with domain R^{k_2}, k_2 = k - k_1, and a matrix of functions ω(t) such that, for x(t) = (x̄(t)', t')': i) m(h) = ∫ω(t)[∂^λ h(x(t))/∂x^λ]dt, where ω(t) is continuous almost everywhere and zero outside a compact set T, and x̄(t) is continuously differentiable with bounded partial derivatives on a convex, compact set containing T in its interior; ii) E[‖y‖^4|x], Σ(x) = E[yy'|x], and f_0(x) are continuous a.e.; and iii) for some ε > 0, ∫ sup_{‖η‖≤ε} [{1 + E[‖y‖^4|x = (x̄(t)+η, t)]} f_0(x̄(t)+η, t)]dt < ∞.

The key condition here is the integral representation of m(h) in i). The dimension k_1 of the argument being integrated over and the order λ of the derivative lead to the convergence rate √n σ^{k_1/2 + λ} for m(ĥ); thus, every additional dimension of integration increases the convergence rate by a factor of √σ, while every additional derivative decreases the rate by a factor of 1/σ. This hypothesis also leads to a specific form for the asymptotic variance of m(ĥ), which is

(5.2)    V = ∫ω(t)[Σ(x(t)) ⊗ {∫K̄(u_1, t)K̄(u_1, t)'du_1}]ω(t)' f_0(x(t))dt,    K̄(u_1, t) = ∫∂^λ K(u_1 + [∂x̄(t)/∂t]v, v)/∂u_1^λ dv.

Lemma 5.3: If Assumptions K, H, Y, and I are satisfied with d ≤ λ+s and, for a = k_1/2 + λ, √n σ^{a+s} → 0 and √n σ^{k/2} → ∞, then √n σ^a [m(ĥ) - m(h_0)] →d N(0, V).

Asymptotic normality in the more general case, where m(z, β_0, h) is nonlinear in h and depends on z, can be reduced to the previous cases by a linearization. The following result is useful for the linearization. For the moment, again let m(z, h) = m(z, β_0, h).

Lemma 5.4: Suppose that i) Assumptions K, H, and Y are satisfied with d ≤ max{Δ+1, Δ_1+s, Δ_2+s}; ii) there are nonnegative constants Δ, Δ_1, Δ_2, a vector of functionals D(z, h) that is linear in h, and b(z) with E[b(z)^2] < ∞ such that for all h with ‖h - h_0‖_Δ ≤ c,

‖m(z, h) - m(z, h_0) - D(z, h - h_0)‖ ≤ b(z)(‖h - h_0‖_{Δ_1})^2,    ‖D(z, h)‖ ≤ b(z)‖h‖_{Δ_2};
Lemma 5.4: Suppose that: i) Assumptions K, H, and Y are satisfied with d ≥ max{ℓ+s, Δ₁+s, Δ₂+s}; ii) there are b(z), a vector of functionals D(z,h) that is linear in h, and nonnegative constants Δ₁ and Δ₂ such that E[b(z)²] < ∞ and, for all h in a set 𝒜 = {h : ‖h−h₀‖_{Δ₁} ≤ ε} containing ĥ with probability approaching one, ‖m(z,h)−m(z,h₀)−D(z,h−h₀)‖ ≤ b(z)‖h−h₀‖²_{Δ₁} and ‖D(z,h)‖ ≤ b(z)‖h‖_{Δ₂}; iii) σ → 0 and the bandwidth conditions nσ^{2k+2Δ₂−2a} → ∞ and √n σᵃ[ln(n)/(nσ^{k+2Δ₁})] → 0 hold. Then, for m̄(h) = ∫ D(z,h)dF(z),

√n σᵃ Σᵢ [m(zᵢ,ĥ) − m(zᵢ,h₀)]/n = √n σᵃ[m̄(ĥ) − m̄(h₀)] + oₚ(1).

The conditions of this result imply Fréchet differentiability of m(z,h) as a function of h at h₀, in the Sobolev norm ‖h‖_Δ for Δ = max{Δ₁,Δ₂}. The remainder bounds are formulated with different norms, rather than a single norm, to allow weaker conditions for asymptotic normality in some cases.

Asymptotic normality of σᵃ Σᵢ m(zᵢ,ĥ)/√n can then be shown by combining Lemma 5.4 with either Lemma 5.2 or 5.3. In the √n-consistent case of Lemma 5.2, it will follow from Lemmas 5.2 and 5.4 that Σᵢ m(zᵢ,ĥ)/√n = Σᵢ {m(zᵢ,h₀) + δᵢ − E[δᵢ]}/√n + oₚ(1), so that asymptotic normality, with asymptotic variance V = Var(m(zᵢ,h₀)+δᵢ), follows by the central limit theorem. In the slower than √n-consistent case, where a > 0, it will be the case that σᵃ Σᵢ m(zᵢ,h₀)/√n → 0 in probability, so that if m̄(h) = ∫ D(z,h)dF(z) satisfies the conditions of Lemma 5.3, then σᵃ Σᵢ m(zᵢ,ĥ)/√n →d N(0,V) for V in equation (5.2).

Lemma 5.5: Suppose that, for all β and h with ‖β−β₀‖ < ε and ‖h−h₀‖_{Δ₁} ≤ ε: i) ‖m(z,β,h) − m(z,β₀,h)‖ ≤ b(z)[‖β−β₀‖ + ‖h−h₀‖_{Δ₁}]; ii) D(z,h;β̄,h̄) is linear in h for fixed β̄ and h̄; iii) |m(z,β,h) − m(z,β̄,h̄) − D(z,h−h̄;β̄,h̄)| ≤ b(z)[‖β−β̄‖² + ‖h−h̄‖²_{Δ₁}]; iv) ‖D(z,h;β,h̄) − D(z,h;β₀,h₀)‖ ≤ b(z)‖h‖_{Δ₂}(‖β−β₀‖ + ‖h̄−h₀‖_{Δ₂}); and also E[b(z)⁴] < ∞, √n(β̂−β₀) = Oₚ(1), nσ^{2k+2Δ₂−2a} → ∞, and nσ^{3k+2Δ₁+2Δ₂−2a} → ∞. Then, for m̄(h) = ∫ D(z,h;β₀,h₀)dF(z), the conclusion of Lemma 5.4 holds with m(zᵢ,β̂,ĥ) in place of m(zᵢ,ĥ), provided m̄(h) satisfies the hypotheses of Lemma 5.2.
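The linearization in Lemmas 5.4 and 5.5 underlies the paper's "delta method" variance idea: differentiate the functional with respect to each observation's kernel contribution yᵢK_σ(·−xᵢ) and use the sample variance of those derivatives. Below is a hedged sketch for the quadratic functional m(h) = ∫h(x)²dx of a kernel density estimate; this functional and all tuning values are illustrative assumptions, not the paper's application.

```python
import numpy as np

# Hedged sketch of a delta-method variance: numerically differentiate a
# functional of the kernel estimate with respect to each observation's
# kernel contribution, then take the sample variance of the derivatives.
rng = np.random.default_rng(2)
n, sigma, eps = 500, 0.2, 1e-5
x = rng.normal(0.0, 1.0, n)
grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]

# K[i, j] = K_sigma(grid_j - x_i): observation i's kernel contribution
K = np.exp(-0.5 * ((grid[None, :] - x[:, None]) / sigma) ** 2) / (
    sigma * np.sqrt(2 * np.pi))
h_hat = K.mean(axis=0)  # kernel density estimate on the grid

def m(h):
    # illustrative functional m(h) = integral of h(x)^2 dx
    return np.sum(h ** 2) * dx

# d_i: derivative of m at h_hat in the direction of observation i's
# contribution K_sigma(. - x_i), by a central finite difference.
d = np.array([(m(h_hat + eps * K[i]) - m(h_hat - eps * K[i])) / (2 * eps)
              for i in range(n)])

se = np.sqrt(np.var(d) / n)  # delta-method standard error for m(h_hat)
print(m(h_hat), se)
```

For this quadratic functional the derivatives satisfy the exact identity (1/n)Σᵢ dᵢ = 2 m(ĥ), a useful check that the numerical differentiation is implemented correctly.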
In that case m̄(h) = ∫ D(z,h;β₀,h₀)dF(z) satisfies the conditions of Lemmas 5.2 and 5.4, with δᵢ = v(xᵢ)′yᵢ and Σᵢ ‖δ̂ᵢ − δᵢ‖²/n → 0 in probability.

Appendix A: Proofs of Theorems

Throughout the appendix, C will denote a generic constant that may be different in different uses, and Σᵢ = Σ_{i=1}^n. Also, CS, M, and T will refer to the Cauchy-Schwartz, Markov, and triangle inequalities, respectively, and DCT to the dominated convergence theorem. Before proving the results in the body of the paper it is useful to state and prove some intermediate results; these are collected in Appendix B.

Proof of Theorem 4.1: The proof proceeds by checking the conditions of Lemmas 5.3 - 5.5. Let x = (x₁,x₂), h₀(x) = (f₀(x), g₀(x)f₀(x)), m(z,h) = τ(x)h₂(x)/h₁(x), and D(z,h;h̄) = τ(x)[h₂(x) − {h̄₂(x)/h̄₁(x)}h₁(x)]/h̄₁(x). Choose ε small enough that h̄₁(x) is bounded below by some C > 0 on the compact set 𝒳 of hypothesis iii) whenever ‖h̄−h₀‖₀ ≤ ε. Then for ‖h̄−h₀‖₀ ≤ ε, |D(z,ĥ−h̄;h̄)| ≤ C‖ĥ−h̄‖₀ and, by the triangle inequality, |m(z,ĥ) − m(z,h̄) − D(z,ĥ−h̄;h̄)| = |[h̄₁(x)/ĥ₁(x) − 1]·D(z,ĥ−h̄;h̄)| ≤ C‖ĥ−h̄‖₀². Also, m̄(h) = ∫ D(z,h;h₀)dF(z) = ∫ ω(t)[h₂(x(t)) − g₀(x(t))h₁(x(t))]dt for x(t) = (x̄₁,t) and ω(t) = τ(x(t))f₀(x(t))⁻¹(−g₀(x(t)),1)f₀(x(t)). This function is bounded and continuous almost everywhere and zero outside a compact set, by continuity of g₀ and f₀ and by the assumption about τ. The other conditions of Assumption 5.1 are also satisfied, by g₀, f₀, and Σ(x) bounded and continuous. Furthermore, σ goes to zero slower than some power of 1/n by hypothesis, implying ln(n)σᵃ/√n → 0, so the rate hypotheses of Lemmas 5.3 and 5.4 are satisfied with a = k₁/2 and V as in equation (4.3). To finish the proof, note that

√nσᵃ Σᵢ{m(zᵢ,ĥ) − E[m(z,h₀)]}/n = √nσᵃ Σᵢ{m(zᵢ,h₀) − E[m(z,h₀)]}/n + √nσᵃ[m̄(ĥ) − m̄(h₀)] + oₚ(1),

where the first term converges in probability to zero because a > 0. The conclusion then follows by the conclusion of Lemma 5.3. QED.

Proof of Theorem 4.2: Let D(z,h;h̄) = a₀(z)h̄₁(x)⁻¹[h₂(x) − {h̄₂(x)/h̄₁(x)}h₁(x)]. By hypothesis, E[D(z,h;h₀)] = E[a₀(z)f₀(x)⁻¹{h₂(x) − g₀(x)h₁(x)}] = E[E[a₀(z)|x]f₀(x)⁻¹{h₂(x) − g₀(x)h₁(x)}], so that m̄(h) = ∫ D(z,h;h₀)dF(z) = ∫ ψ(x)′h(x)dx for ψ(x) = E[a₀(z)|x](−g₀(x),1). By hypothesis, the conditions of Lemma 5.2 are satisfied for this function, with δᵢ = ψ(xᵢ)′yᵢ, and the first conclusion then follows. The proof that the conditions of Lemma 5.4 are satisfied proceeds exactly as in the proof of Theorem 4.1, except that b(z) is taken to be C(1+‖a₀(z)‖), Δ₁ = Δ₂ = 0, and a = 0, so that 3k/2 − 2a = 3k₁/2 + k₂/2 > 0 and the rate conditions hold. The second conclusion follows from Lemma 5.5 similarly to the proof of Theorem 4.1. QED.

Proof of Theorem 5.1: It follows by Lemma B.3 that ‖ĥ − h₀‖₀ = Oₚ((ln(n))^{1/2}(nσ^{k+2Δ})^{−1/2} + σˢ) = oₚ(1), so that sup_β ‖n⁻¹Σᵢ[m(zᵢ,β,ĥ) − m(zᵢ,β,h₀)]‖ ≤ [n⁻¹Σᵢ b(zᵢ)]·‖ĥ−h₀‖₀ → 0 in probability. Also, by standard results (e.g. Tauchen, 1985), E[m(z,β,h₀)] is continuous in β and sup_β ‖n⁻¹Σᵢ m(zᵢ,β,h₀) − E[m(z,β,h₀)]‖ → 0 in probability. The conclusion then follows by T. QED.
Proof of Lemma 5.2: By the Fubini theorem, E[m̄(ĥ)] = m̄(E[ĥ]). Also, by standard results, sup_𝒳 ‖E[ĥ](x) − h₀(x)‖ = O(σˢ) for any compact set 𝒳; let 𝒳 be a compact set outside of which v(x) is zero. Then √n|E[m̄(ĥ)] − m̄(h₀)| = √n|∫ v(x)′{E[ĥ](x) − h₀(x)}dx| = O(√nσˢ) → 0, so it suffices to show √n{m̄(ĥ) − E[m̄(ĥ)]} = Σᵢ{δᵢ − E[δᵢ]}/√n + oₚ(1). Note that m̄(ĥ) = Σᵢ δ̃ᵢ/n for δ̃ᵢ = [∫ v(x)′K_σ(x−xᵢ)dx]yᵢ = [∫ v(xᵢ+σu)′𝒦(u)du]yᵢ, by the change of variables u = (x−xᵢ)/σ. By v bounded and 𝒦 having bounded support, ‖δ̃ᵢ‖ ≤ Cb(xᵢ)‖yᵢ‖ with ∫ b(xᵢ)|𝒦(u)|du < ∞, and v(xᵢ+σu) → v(xᵢ) with probability one as σ → 0, so by DCT, E[‖δ̃ᵢ − δᵢ‖²] → 0. Then by M, Σᵢ{δ̃ᵢ − E[δ̃ᵢ]}/√n = Σᵢ{δᵢ − E[δᵢ]}/√n + oₚ(1), and the conclusion follows by T. QED.

Proof of Lemma 5.3: Let p_σ(x) = σ^{−k−ℓ}∫ ω(t)[I⊗𝓀((x(t)−x)/σ)]dt, where 𝓀(u) = ∂^ℓ𝒦(u)/∂u^ℓ and I is an identity matrix with the same dimension as y, so that m̄(ĥ) = Σᵢ p_σ(xᵢ)yᵢ/n. By standard results, √nσᵃ{E[m̄(ĥ)] − m̄(h₀)} = O(√nσ^{a+s}) → 0 by hypothesis, so it suffices to show √nσᵃ{m̄(ĥ) − E[m̄(ĥ)]} →d N(0,V). For i.i.d. data it suffices, by the Liapunov central limit theorem, to show that σ^{2a}Var(p_σ(xᵢ)yᵢ) → V and σ^{4a}E[‖p_σ(xᵢ)yᵢ‖⁴]/n → 0. Let 𝒯 be a compact, convex set containing the support of ω(t) in its interior. For x₂ ∈ 𝒯, a change of variables and a mean value expansion give

(A.6)    σ^{k+ℓ}p_σ(x₁(x₂)−σu, x₂) = ∫ ω(x₂+σv)[I⊗𝓀([x₁(x₂+σv)−x₁(x₂)]/σ + u, v)]dv,

which, by continuous differentiability of x₁(t) and ω bounded and zero outside 𝒯, is bounded over small σ and converges to ∫ ω(x₂)[I⊗𝓀(u + [∂x₁(x₂)/∂t]v, v)]dv for almost every (u,x₂), and is zero for x₂ outside a bounded set and for ‖u‖ large, by 𝒦 having bounded support. Therefore σ^{2a}E[p_σ(xᵢ)yᵢyᵢ′p_σ(xᵢ)′] → V and σᵃ‖E[p_σ(xᵢ)yᵢ]‖ → 0, so σ^{2a}Var(p_σ(xᵢ)yᵢ) → V. Also, σ^{4a}E[‖p_σ(xᵢ)yᵢ‖⁴]/n ≤ C/(nσ^{k₁+2ℓ}) → 0. The conclusion then follows by the Liapunov central limit theorem. QED.

Proof of Lemma 5.4: Let Dᵢⱼ = D(zᵢ, yⱼK_σ(·−xⱼ)), so that by linearity of D(z,h) in h, Σᵢ D(zᵢ,ĥ)/n = ΣᵢΣⱼ Dᵢⱼ/n². Let D̄ᵢ• = E[Dᵢⱼ|zᵢ], D̄•ⱼ = E[Dᵢⱼ|zⱼ], and note that ‖Dᵢⱼ‖ ≤ b(zᵢ)‖yⱼ‖σ^{−k−Δ₂}C, so that E[‖Dᵢⱼ‖²] ≤ CE[b(z)²]E[‖y‖²]σ^{−2k−2Δ₂}. Then by a V-statistic projection on the basic observations (e.g. Serfling, 1980, Section 5.2.2) and Chebyshev's inequality, √nσᵃ{ΣᵢΣⱼ[Dᵢⱼ − D̄ᵢ• − D̄•ⱼ + E[Dᵢⱼ]]/n²} → 0 in probability, since σ^{2a}E[‖Dᵢⱼ‖²]/n ≤ C/(nσ^{2k+2Δ₂−2a}) → 0. Also, D̄ᵢ• = D(zᵢ, E[ĥ]) by linearity, and ‖E[ĥ] − h₀‖_{Δ₂} → 0 by Lemma B.2, so by Chebyshev's inequality, σᵃΣᵢ{D̄ᵢ• − E[D̄ᵢ•] − D(zᵢ,h₀) + E[D(zᵢ,h₀)]}/√n → 0 in probability, while Σⱼ D̄•ⱼ/n = m̄(ĥ). Finally, by Lemma B.3 and the remainder bound of ii), √nσᵃ‖Σᵢ[m(zᵢ,ĥ) − m(zᵢ,h₀) − D(zᵢ,ĥ−h₀)]/n‖ ≤ √nσᵃ[Σᵢ b(zᵢ)/n]‖ĥ−h₀‖²_{Δ₁} = Oₚ(√nσᵃ[ln(n)/(nσ^{k+2Δ₁}) + σ^{2s}]) → 0. The conclusion then follows by T. QED.

Proof of Lemma 5.5: Let D̂ᵢⱼ = D(zᵢ, yⱼK_σ(·−xⱼ); β̂, ĥ) and Dᵢⱼ = D(zᵢ, yⱼK_σ(·−xⱼ); β₀, h₀), with δ̂ᵢ = Σⱼ D̂ⱼᵢ/n and δᵢ = Σⱼ Dⱼᵢ/n. By iv) and CS, ‖D̂ⱼᵢ − Dⱼᵢ‖ ≤ b(zⱼ)‖yᵢ‖σ^{−k−Δ₂}(‖β̂−β₀‖ + ‖ĥ−h₀‖_{Δ₂}), so that σ^{2a}Σᵢ‖δ̂ᵢ−δᵢ‖²/n ≤ Cσ^{2a−2k−2Δ₂}[Σⱼ b(zⱼ)²/n][Σᵢ‖yᵢ‖²/n](‖β̂−β₀‖ + ‖ĥ−h₀‖_{Δ₂})² = oₚ(1) by the data i.i.d., M, and the rate conditions nσ^{2k+2Δ₂−2a} → ∞ and nσ^{3k+2Δ₁+2Δ₂−2a} → ∞. Under the conditions of Lemma 5.2 it was shown in the proof of Lemma 5.3 that E[δᵢδᵢ′] → V and E[‖δ̂ᵢ−δᵢ‖²] → 0, so that Σᵢ(δ̂ᵢ − Σⱼδ̂ⱼ/n)(δ̂ᵢ − Σⱼδ̂ⱼ/n)′/n → V in probability by M and the law of large numbers, and the conclusion follows by T. QED.

Appendix B: Technical Details

This appendix derives rates for uniform convergence in probability in Sobolev norms for derivatives of kernel estimators.
Lemma B.1: Suppose that Assumption K is satisfied, E[‖y‖^p] < ∞ for some p > 2, E[‖y‖^p | x]f₀(x) is bounded, and σ(n) → 0 slowly enough that n^{1−(2/p)}σ(n)^k/ln(n) → ∞. Recall from the text that for a closed set 𝒳, ‖h‖_J = max_{ℓ≤J} sup_𝒳 ‖∂^ℓ h(x)/∂x^ℓ‖. Then

(B.1)    ‖ĥ − E[ĥ]‖_J = Oₚ((ln(n))^{1/2}(nσ^{k+2J})^{−1/2}).

Proof: It suffices to prove the result for y a scalar and for each partial derivative of order ℓ ≤ J separately. By 𝒦(u) having bounded support, the order of differentiation and integration can be interchanged to obtain E[∂^ℓ ĥ(x)/∂x^ℓ] = ∂^ℓ E[ĥ](x)/∂x^ℓ. Let 𝓀(x) denote the corresponding ℓ-th order partial derivative of 𝒦(x), and let Ĥ(x) = n⁻¹σ^{−k−ℓ} Σᵢ yᵢ𝓀((x−xᵢ)/σ) denote the corresponding derivative of ĥ(x), with the order of differentiation suppressed for notational convenience. Also, for a constant P, let ỹᵢ = yᵢ for |yᵢ| ≤ Pn^{1/p}, ỹᵢ = Pn^{1/p} for yᵢ > Pn^{1/p}, ỹᵢ = −Pn^{1/p} for yᵢ < −Pn^{1/p}; let H̃(x) = n⁻¹σ^{−k−ℓ}Σᵢ ỹᵢ𝓀((x−xᵢ)/σ); and let δ = [ln(n)/(nσ^{k+2ℓ})]^{1/2}. Note that by Bonferroni's inequality, for fixed P,

(B.2)    Prob(H̃(x) ≠ Ĥ(x) for some x) ≤ n·Prob(|yᵢ| > Pn^{1/p}) ≤ n·E[|yᵢ|^p]/(P^p n) = E[|yᵢ|^p]/P^p,

which can be made small, uniformly in n, by choosing P large. Also, c(x) = E[|yᵢ|^p | xᵢ=x]f₀(x) is bounded, so that

(B.3)    |E[H̃(x)] − E[Ĥ(x)]| ≤ σ^{−k−ℓ}E[1(|yᵢ|>Pn^{1/p})|yᵢ||𝓀((x−xᵢ)/σ)|] ≤ C(Pn^{1/p})^{1−p}σ^{−ℓ}∫|𝓀(v)|c(x−σv)dv = O(n^{(1/p)−1}σ^{−ℓ}) = o(δ),

since n^{(1/p)−(1/2)}σ^{k/2}/(ln(n))^{1/2} → 0. Next, by 𝒦 Lipschitz with compact support, |H̃(x)−H̃(x′)| ≤ Cn^{1/p}σ^{−k−ℓ−1}‖x−x′‖. Since 𝒳 is compact, it can be covered by fewer than Cn^{3k} open balls of radius n^{−3}; let x_j (j ≤ J(n)) denote the centers of these balls and x_j(x) the center of a ball containing x. It follows that

(B.4)    sup_𝒳 |H̃(x)−E[H̃(x)]| ≤ sup_𝒳 {|H̃(x)−H̃(x_j(x))| + |E[H̃(x_j(x))]−E[H̃(x)]|} + max_j |H̃(x_j)−E[H̃(x_j)]| ≤ Cn^{(1/p)−3}σ^{−k−ℓ−1} + max_j |H̃(x_j)−E[H̃(x_j)]|,

where the first term is o(δ) by ln(n) ≤ Cn and nσ^{k+2ℓ} → ∞. Also, Var(σ^{−k−ℓ}ỹᵢ𝓀((x−xᵢ)/σ)) ≤ σ^{−2k−2ℓ}E[yᵢ²𝓀((x−xᵢ)/σ)²] = σ^{−k−2ℓ}∫𝓀(u)²E[yᵢ²|xᵢ=x−σu]f₀(x−σu)du ≤ Cσ^{−k−2ℓ}, by E[y²|x]f₀(x) bounded. Then by Bernstein's inequality, for M big enough,

(B.5)    Prob(max_j |H̃(x_j)−E[H̃(x_j)]| ≥ Mδ/2) ≤ Cn^{3k}exp(−Cnδ²/(σ^{−k−2ℓ} + n^{1/p}σ^{−k−ℓ}δ)) ≤ Cn^{3k}exp(−CM²ln(n)) ≤ C·exp(−[CM²−3k]ln(n)),

which goes to zero as M grows. Since these inequalities hold for any M and P, it follows that sup_𝒳|H̃(x)−E[H̃(x)]| = Oₚ(δ), and then by eqs. (B.2) and (B.3) and the triangle inequality that sup_𝒳|Ĥ(x)−E[Ĥ(x)]| = Oₚ(δ). Applying this conclusion to each derivative of ĥ of order up to J gives the conclusion. QED.

Lemma B.2: If Assumptions K, H, and Y are satisfied with d ≥ j+s and σ bounded, then ‖E[ĥ] − h₀‖_j = O(σˢ).

Proof: Note that E[ĥ](x) = E[yᵢK_σ(x−xᵢ)] = ∫h₀(t)[𝒦((x−t)/σ)/σᵏ]dt = ∫𝒦(u)h₀(x+uσ)du. By 𝒦(u) having finite support and ∫𝒦(u)du = I, it follows that ∂^j h₀(x)/∂x^j = ∫𝒦(u)du·∂^j h₀(x)/∂x^j and ∂^j E[ĥ](x)/∂x^j = ∫𝒦(u)[∂^j h₀(x+uσ)/∂x^j]du. Then by a Taylor expansion around σ = 0, using the moment conditions ∫𝒦(u){⊗_{r=1}^m u}du = 0 for 1 ≤ m < s of Assumption K and a mean value x̄(u) on the segment between x and x+uσ,

(B.6)    ‖∂^j E[ĥ](x)/∂x^j − ∂^j h₀(x)/∂x^j‖ = σˢ ‖∫𝒦(u){⊗_{r=1}^s u}⊗{∂^{j+s}h₀(x̄(u))/∂x^{j+s}}du‖ ≤ Cσˢ[∫|𝒦(u)|‖u‖ˢdu]·sup_x ‖∂^{j+s}h₀(x)/∂x^{j+s}‖ ≤ Cσˢ. QED.

Lemma B.3: If the hypotheses of Lemmas B.1 and B.2 are satisfied, then

(B.7)    ‖ĥ − h₀‖_J = Oₚ((ln(n))^{1/2}(nσ^{k+2J})^{−1/2} + σˢ).

Proof: Follows by Lemmas B.1 and B.2 and the triangle inequality. QED.

Lemma B.4: If Assumption K is satisfied, m(h) is linear, and |m(h)| ≤ C‖h‖_J, then E[m(ĥ)] = m(E[ĥ]).

Proof: By linearity of m(h), m(ĥ) = Σᵢ m(yᵢK_σ(·−xᵢ))/n, so it suffices to show that E[m(g(x)K_σ(·−x))] = m(E[ĥ]) for E[ĥ] = ∫g(x)K_σ(·−x)f₀(x)dx. Since ‖K_σ(·−x)‖_J < ∞ for fixed σ and each derivative of K_σ(·−x) up to order J is continuous and bounded in x on a compact set, m(g(x)K_σ(·−x)) is bounded and continuous on a compact set 𝒳 outside of which f₀ is zero. Let F_j be a sequence of measures with finite support (e.g. the empirical measures of a sequence of i.i.d. draws) that converge in distribution to the distribution of x. By linearity, ∫m(g(x)K_σ(·−x))F_j(dx) = m(∫g(x)K_σ(·−x)F_j(dx)), and as j → ∞, ∫m(g(x)K_σ(·−x))F_j(dx) → ∫m(g(x)K_σ(·−x))f₀(x)dx = E[m(g(x)K_σ(·−x))], while ‖∫g(x)K_σ(·−x)F_j(dx) − E[ĥ]‖_J → 0, so that by continuity of m, m(∫g(x)K_σ(·−x)F_j(dx)) → m(E[ĥ]).
T gives the conclusion. QED.

Lemma B.5: If Assumption K is satisfied and, for given h̄ with ‖h̄‖_J < ∞, there is a linear D(h) such that |m(h)−m(h̄)−D(h−h̄)| = o(‖h−h̄‖_J) as ‖h−h̄‖_J → 0, then dm(h̄ + ζyK_σ(·−x))/dζ|_{ζ=0} = D(yK_σ(·−x)).

Proof: Let h_ζ = h̄ + ζyK_σ(·−x), so that ‖h_ζ − h̄‖_J ≤ |ζ|·‖yK_σ(·−x)‖_J ≤ C|ζ|. Then |m(h_ζ) − m(h̄) − ζD(yK_σ(·−x))|/|ζ| = |m(h_ζ) − m(h̄) − D(h_ζ−h̄)|/|ζ| = o(‖h_ζ−h̄‖_J)/|ζ| = o(1) as ζ → 0. QED.

References

Bierens, H.J. (1987): "Kernel Estimators of Regression Functions," in Bewley, T.F., ed., Advances in Econometrics: Fifth World Congress, Cambridge: Cambridge University Press.

Breiman, L. and J.H. Friedman (1985): "Estimating Optimal Transformations for Multiple Regression and Correlation," Journal of the American Statistical Association, 80, 580-598.

Hall, P. (1992): The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

Hardle, W. and T. Stoker (1989): "Investigating Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

Hausman, J.A. and W.K. Newey (1992): "Nonparametric Estimation of Exact Consumer Surplus and Deadweight Loss," working paper, MIT Department of Economics.

Huber, P. (1967): "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley: University of California Press.

Lerman, S.R. and C.F. Manski (1981): "On the Use of Simulated Frequencies to Approximate Choice Probabilities," in Manski, C.F. and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, Cambridge: MIT Press.

Li, K.C. and N. Duan (1989): "Regression Analysis Under Link Violation," Annals of Statistics, 17, 1009-1052.

Matzkin, R. and W.K. Newey (1992): "Nonparametric Estimation of Structural Discrete Choice Models," mimeo, Department of Economics, Yale University.

Newey, W.K. and P.A.
Ruud (1992): "Density Weighted Least Squares Estimation," working paper, MIT Department of Economics.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica, 56, 931-954.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics, 32, 157-187.

Stock, J.H. (1989): "Nonparametric Policy Analysis," Journal of the American Statistical Association, 84, 567-575.

Tauchen, G.E. (1985): "Diagnostic Testing and Evaluation of Maximum Likelihood Models," Journal of Econometrics, 30, 415-443.

Willig, R. (1976): "Consumer's Surplus without Apology," American Economic Review, 66, 589-597.