Asymptotic Trimming for Bounded Density Plug-in Estimators

Arthur Lewbel
Boston College

October 2000

Abstract

This paper proposes a form of asymptotic trimming to obtain root N convergence of functions of kernel estimated objects. The trimming is designed to deal with the boundary effects that arise in applications where densities are bounded away from zero.

Arthur Lewbel, Department of Economics, Boston College, 140 Commonwealth Ave., Chestnut Hill, MA, 02467, USA. (781)-862-3678, lewbel@bc.edu. I'd like to thank Bo Honoré and Shakeeb Khan for helpful suggestions. This research was supported in part by the National Science Foundation, through grant number SES9905010.

1 Introduction

This paper obtains root N convergence of functions of kernel estimated objects using asymptotic trimming. The proposed trimming is designed to deal with the boundary effects that arise in applications where densities are bounded away from zero. The method is demonstrated by an example in which the object to be estimated is a weighted average of the inverse of a kernel estimated conditional density, that is, estimates of functions of the form $E[w/f(v|u)]$, where $f(v|u)$ denotes the conditional density of a scalar random variable $v$ given a random vector $u$.

The estimator is a special case of a two step estimator with a kernel estimated nonparametric first step. The general theory of such estimators is described in Newey and McFadden (1994), Newey (1994), and Sherman (1994). See also Robinson (1988), Powell, Stock, and Stoker (1989), Härdle and Stoker (1989), Lewbel (1995), and Andrews (1995). The difficulty with applying generic methods like Newey and McFadden (1994), or the types of trimming employed by Härdle and Stoker (1989) or Sherman (1994), is that, to obtain root N convergence of a sample average of $w/f$ (where a kernel estimator $\hat f$ is substituted for $f$) while avoiding boundary effects, such estimators assume that either $f$ or the terms being averaged go to zero on the boundary of the support of $(v,u)$. When the average has the form $w/f$, this would require that $w$ go to zero near the boundary, which is not the case in some applications.

This technicality is resolved here by bounding $f$ away from zero and introducing an asymptotic trimming function that sets to zero all terms in the average having data within a distance $\tau$ of the boundary. The estimator has $\tau$ go to zero more slowly than the bandwidth to eliminate boundary effects from kernel estimation, but also assumes $N^{1/2}\tau \to 0$, which makes the volume of the trimmed space vanish quickly enough to send the trimming induced bias to zero. Formally, this trimming requires that the support of $(v,u)$ be known. In practice, this support could be estimated, or trimming might be accomplished more easily by simply dropping a few of the most extreme observations of the data, e.g., observations where the estimated density is particularly small.

Based on Rice (1986), for one dimensional densities Hong and White (2000) use jackknife boundary kernels to deal with this same problem of boundary bias. Their technique (which also requires known support) could be generalized to higher dimensions as an alternative to the trimming proposed here.

The inverse density example derived here is directly applicable to Lewbel (2000) and Honoré and Lewbel (2000). Other estimation problems involving nonparametric density estimators in the denominator, to which the technique can likely be applied, include Stoker (1991), Robinson (1991), Newey and Ruud (1994), Granger and Lin (1994), and Hong and White (2000).
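To fix ideas before the formal assumptions, the following is a minimal sketch of the kind of estimator the paper studies: a trimmed sample average of $w_i/\hat f(v_i|u_i)$, where $\hat f(v|u) = \hat f_{vu}(v,u)/\hat f_u(u)$ is a ratio of kernel density estimates and trimming drops observations within distance $\tau$ of the boundary of a known support. The rectangular support, the biweight product kernel, and every function name below are illustrative assumptions, not part of the paper; the formal definitions and regularity conditions appear in Assumption A.1 and Theorem A below.

```python
import numpy as np

def biweight(t):
    """Bounded-support biweight kernel on [-1, 1]; integrates to one."""
    return np.where(np.abs(t) <= 1.0, (15.0 / 16.0) * (1.0 - t ** 2) ** 2, 0.0)

def product_kernel(diffs, h):
    """Product biweight kernel weights for an (n, d) array of differences."""
    return np.prod(biweight(diffs / h), axis=-1) / h ** diffs.shape[-1]

def trim_indicator(vu, lower, upper, tau):
    """I_tau: one if (v, u) is at least tau away from every face of an assumed
    known rectangular support [lower, upper], zero otherwise."""
    return np.all((vu >= lower + tau) & (vu <= upper - tau), axis=1).astype(float)

def trimmed_inverse_density_mean(w, v, u, lower, upper, h, tau):
    """Trimmed sample analogue of E[w / f(v|u)], using w/f(v|u) = w f_u(u)/f_vu(v,u)
    with the two densities replaced by kernel estimates.
    w, v: length-N arrays; u: an (N, k-1) array."""
    vu = np.column_stack([v, u])
    n = len(w)
    f_vu_hat = np.array([product_kernel(vu - vu[i], h).mean() for i in range(n)])
    f_u_hat = np.array([product_kernel(u - u[i], h).mean() for i in range(n)])
    keep = trim_indicator(vu, lower, upper, tau)
    return np.mean(w * (f_u_hat / f_vu_hat) * keep)
```

In the notation introduced below, the bandwidth $h$ and the trimming distance $\tau$ both shrink with $N$, with $h/\tau \to 0$ and $N^{1/2}\tau \to 0$; in this sketch they are simply inputs chosen by the user.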
For ease of notation, the theorem is stated and proved assuming all of the elements of $(v,u)$ are continuously distributed. The results can be readily extended to include discretely distributed variables as well, either by applying the density estimator separately to each discrete cell of data and averaging the results, or by smoothing across cells using kernels (essentially treating the discrete variables as if they were continuous). See, e.g., Racine and Li (2000) for details, or see the examples in Lewbel (2000) or Honoré and Lewbel (2000).

Assumption A.1: The data consist of $N$ observations $(w_i, v_i, u_i)$, $i = 1, \dots, N$, which are assumed to be i.i.d. Here $u_i$ is a $k-1$ vector and $v_i$ is a scalar. The vectors $u_i$ and $(v_i,u_i)$ are drawn from distributions that are absolutely continuous with respect to Lebesgue measure, with Radon-Nikodym densities $f_u(u)$ and $f_{vu}(v,u)$, supports denoted $\Omega_u$ and $\Omega_{vu}$, and kernel functions $K_u$ and $K_{vu}$. Denote kernel estimators of these densities $\hat f_u$ and $\hat f_{vu}$. Let $h$ be a kernel bandwidth and $\tau$ be a density trimming parameter. Assume $\Omega_{vu}$ is known, and define the trimming function $I_\tau(v,u)$ to equal zero if $(v,u)$ is within a distance $\tau$ of the boundary of $\Omega_{vu}$; otherwise $I_\tau(v,u)$ equals one. Define
$$h_{\tau i} = \frac{w_i f_u(u_i)}{f_{vu}(v_i,u_i)}\, I_\tau(v_i,u_i), \qquad h_i = \frac{w_i f_u(u_i)}{f_{vu}(v_i,u_i)}, \qquad \eta = E(h_i),$$
$$q_i = h_i + E(h_i \mid u_i) - E(h_i \mid v_i,u_i),$$
$$\hat f_{vu}(v,u) = \frac{1}{N h^k}\sum_{j=1}^N K_{vu}\!\left(\frac{v_j - v}{h}, \frac{u_j - u}{h}\right), \qquad \hat f_u(u) = \frac{1}{N h^{k-1}}\sum_{j=1}^N K_u\!\left(\frac{u_j - u}{h}\right),$$
$$\hat\eta = \frac{1}{N}\sum_{i=1}^N \frac{w_i \hat f_u(u_i)}{\hat f_{vu}(v_i,u_i)}\, I_\tau(v_i,u_i).$$

Assumption A.2: The functions $f_{vu}(v,u)$, $f_u(u)$, $\pi_{\tau vu}(v,u) = E(h_\tau \mid v,u)$, and $\pi_{\tau u}(u) = E(h_\tau \mid u)$ exist and are continuous in the components of $(v,u)$ for all $(v,u) \in \Omega_{vu}$, and are continuously differentiable in the components of $(v,u)$ for all $(v,u) \in \Omega_{vu}^*$, where $\Omega_{vu}^*$ differs from $\Omega_{vu}$ by a set of measure zero. Assume $w$, $\Omega_{vu}$, and $f_{vu}(v,u)$ are bounded, and $f_{vu}(v,u)$ is bounded away from zero.

Assumption A.3: There exist functions $m_{vu}$ and $m_u$ such that the following local Lipschitz conditions hold for some $\nu_u$ and some $(\nu_v,\nu_u)$ in an open neighborhood of zero, for all $\tau \ge 0$:
$$\| f_{vu}(v+\nu_v, u+\nu_u) - f_{vu}(v,u) \| \le m_{vu}(v,u)\, \|(\nu_v,\nu_u)\|,$$
$$\| f_u(u+\nu_u) - f_u(u) \| \le m_u(u)\, \|\nu_u\|,$$
$$\| \pi_{\tau vu}(v+\nu_v, u+\nu_u) - \pi_{\tau vu}(v,u) \| \le m_{vu}(v,u)\, \|(\nu_v,\nu_u)\|,$$
$$\| \pi_{\tau u}(u+\nu_u) - \pi_{\tau u}(u) \| \le m_u(u)\, \|\nu_u\|.$$
The conditional expectations $E[h_\tau^2 f_u(u)^{-2} \mid u]$ and $E[h_\tau^2 f_{vu}(v,u)^{-2} \mid v,u]$ exist and are continuous, and the following objects exist:
$$\sup_{\tau \ge 0,\,(v,u)\in\Omega_{vu}} E[h_\tau^2 f_{vu}(v,u)^{-2} \mid v,u], \qquad \sup_{\tau \ge 0,\,u\in\Omega_u} E[h_\tau^2 f_u(u)^{-2} \mid u],$$
$$\sup_{\tau \ge 0} E\Big[\big([\,|h_\tau/f_{vu}(v,u)| + 1\,]\, m_{vu}(v,u)\big)^2\Big], \qquad \sup_{\tau \ge 0} E\Big[\big([\,|h_\tau/f_u(u)| + 1\,]\, m_u(u)\big)^2\Big].$$

Assumption A.4: The kernel function $K_u(u)$ is defined on $R^{\dim(u)}$, with $K_u(u) = 0$ for all $u$ on the boundary of, and outside of, a convex bounded subset of $R^{\dim(u)}$. This subset has a nonempty interior and has the origin as an interior point. $K_u(u)$ is a bounded, differentiable, symmetric function with $\int K_u(u)\,du = 1$. The kernel $K_u(u)$ has order $p > 1$: all partial derivatives of $f_u(u)$ of order $p$ exist, and for all $0 \le \rho \le p$ and all $l_1 + \dots + l_k = \rho$,
$$\sup_{\tau \ge 0} \int_{u \in \Omega_u} \pi_{\tau u}(u)\, \frac{\partial^\rho f_u(u)}{\partial u_1^{l_1} \cdots \partial u_k^{l_k}}\, du$$
exists. The kernel function $K_{vu}(v,u)$ has all the same properties, replacing $u$ with $(v,u)$.

Theorem A: Let Assumptions A.1 to A.4 hold. If $N h^k \to \infty$, $N h^{2p} \to 0$, $h/\tau \to 0$, and $N\tau^2 \to 0$, then
$$\sqrt{N}\,(\hat\eta - \eta) = N^{-1/2}\sum_{i=1}^N [\,q_i - E(q_i)\,] + o_p(1),$$
$$\sqrt{N}\,(\hat\eta - \eta) \;\xrightarrow{d}\; N[\,0, \operatorname{var}(q)\,].$$

Theorem A is proved in the Appendix.
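Theorem A implies that standard errors for $\hat\eta$ follow from $\operatorname{var}(q)$. The paper does not spell out a variance estimator, so the sketch below is only one natural plug-in possibility, stated as an assumption for illustration: replace $h_i$ by $\hat h_i = w_i \hat f_u(u_i)\, I_\tau(v_i,u_i)/\hat f_{vu}(v_i,u_i)$ and replace the two conditional expectations in $q_i$ by Nadaraya-Watson regressions of $\hat h_i$ on $u_i$ and on $(v_i,u_i)$, reusing the product kernel and the density estimates from the earlier sketch. None of these smoothing choices come from the paper.

```python
import numpy as np

def nw_fit(y, x, h, kernel):
    """Nadaraya-Watson estimate of E[y | x] at each sample point.
    Assumes the bandwidth is large enough that every point has kernel neighbors."""
    fitted = np.empty(len(y))
    for i in range(len(y)):
        wts = kernel(x - x[i], h)          # kernel weights centered at x_i
        fitted[i] = np.sum(wts * y) / np.sum(wts)
    return fitted

def plug_in_std_error(w, v, u, f_u_hat, f_vu_hat, keep, h, kernel):
    """Hypothetical plug-in standard error for eta_hat based on var(q) in Theorem A,
    with q_i = h_i + E(h_i|u_i) - E(h_i|v_i,u_i).  The conditional means are replaced
    by Nadaraya-Watson fits; this construction is an illustration, not given in the paper."""
    h_hat = w * f_u_hat / f_vu_hat * keep
    vu = np.column_stack([v, u])
    q_hat = h_hat + nw_fit(h_hat, u, h, kernel) - nw_fit(h_hat, vu, h, kernel)
    return np.sqrt(np.var(q_hat) / len(w))
```

An asymptotic 95% confidence interval for $\eta$ would then be $\hat\eta \pm 1.96$ times this standard error, under the additional (unverified here) assumption that the plug-in steps do not affect the first-order asymptotics.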
2 Appendix: Proof

Begin by stating and proving Theorem B below, which supplies a set of regularity conditions for root N convergence of a weighted average of a kernel estimated density. Theorem A will then be proved using repeated application of Theorem B.

Just for Theorem B, let $x$ be a continuously distributed random $k$-vector, and let $f(x)$ be the probability density function of $x$. For some trimming parameter $\tau$, some random vector $s$, and some function $w_\tau = w_\tau(x,s)$, let $\mu_\tau = E[w_\tau f(x)]$ and define the estimators
$$\hat f(x) = \frac{1}{N h^k}\sum_{j=1}^N K\!\left(\frac{x_j - x}{h}\right), \qquad \hat\mu = N^{-1}\sum_{i=1}^N w_{\tau i}\,\hat f(x_i),$$
where $h$ is a bandwidth and $K$ is a kernel function. Note that $h$ and $\tau$ are implicitly functions of $N$.

Assumption B.1: The data consist of $N$ observations $(x_i, s_i)$, $i = 1, \dots, N$, which are assumed to be an i.i.d. random sample. The $k$-vector $x_i$ is drawn from a distribution that is absolutely continuous with respect to Lebesgue measure on $R^k$, with Radon-Nikodym density $f(x)$ having bounded support denoted $\Omega_x$. The underlying measure $\nu$ can be written in product form as $\nu = \nu_x \times \nu_s$. The trimming parameter $\tau$ is an element of a set $\Omega_\tau$.

Assumption B.2: The density function $f(x)$ and the function $\pi_\tau(x) = E(w_\tau \mid x)\, f(x)$ exist and are continuous in the components of $x$ for all $x \in \Omega_x$, and are continuously differentiable in the components of $x$ for all $x \in \Omega_x^*$, where $\Omega_x^*$ differs from $\Omega_x$ by a set of measure zero.

Assumption B.3: For some $\nu$ in an open neighborhood of zero there exist functions $m_f$ and $m_{\pi\tau}$ such that the following local Lipschitz conditions hold:
$$\| f(x+\nu) - f(x) \| \le m_f(x)\, \|\nu\|, \qquad \| \pi_\tau(x+\nu) - \pi_\tau(x) \| \le m_{\pi\tau}(x)\, \|\nu\|.$$
For all $\tau \in \Omega_\tau$, $E(w_\tau^2 \mid x)$ is continuous in $x$. Also, $\sup_{\tau\in\Omega_\tau,\,x\in\Omega_x} E(w_\tau^2 \mid x)$ and $\sup_{\tau\in\Omega_\tau} E\big[\big(|w_\tau|\, m_f(x) + m_{\pi\tau}(x)\big)^2\big]$ exist.

Assumption B.4: The kernel function $K(u)$ is defined on $R^k$, with $K(u) = 0$ for all $u$ on the boundary of, and outside of, a convex bounded subset of $R^k$. This subset has a nonempty interior and has the origin as an interior point. $K(u)$ is a bounded, differentiable, symmetric function, $\int K(u)\,du = 1$, and $\int u\, K(u)\,du = 0$.

Assumption B.5: The kernel $K(u)$ has order $p$: all partial derivatives of $f(x)$ of order $p$ exist, and for all $0 \le \rho \le p$ and all $l_1 + \dots + l_k = \rho$,
$$\sup_{\tau\in\Omega_\tau} \int_{x\in\Omega_x} \pi_\tau(x)\, \frac{\partial^\rho f(x)}{\partial x_1^{l_1}\cdots\partial x_k^{l_k}}\, dx$$
exists. The equality $w_\tau(x,s)\, f(x) = 0$ holds for all $x$ within a distance $\tau$ of the boundary of $\Omega_x$.

Theorem B: Let Assumptions B.1 to B.5 hold. Define $r_{\tau i}$ by
$$r_{\tau i} = [\,w_{\tau i} + E(w_{\tau i} \mid x_i)\,]\, f(x_i) = f(x_i)\, w_{\tau i} + \pi_\tau(x_i).$$
If $N h^k \to \infty$, $N h^{2p} \to 0$, and $h/\tau \to 0$, then
$$\sqrt{N}\,(\hat\mu - \mu_\tau) = N^{-1/2}\sum_{i=1}^N [\,r_{\tau i} - E(r_{\tau i})\,] + o_p(1).$$
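Before the proof, the two objects Theorem B is about can be written down directly. The sketch below computes $\hat\mu$ from data and, separately, the influence function $r_{\tau i} = f(x_i)w_{\tau i} + \pi_\tau(x_i)$ when $f$ and $\pi_\tau$ are supplied by the user (as they would be in a simulation design). The kernel is assumed to be the bounded-support product kernel from the earlier sketch, and all names here are illustrative.

```python
import numpy as np

def mu_hat(w_tau, x, h, kernel):
    """Theorem B estimator: the sample average of w_tau_i * f_hat(x_i), where
    f_hat is the kernel density estimate evaluated at each observation.
    x is an (N, k) array; w_tau is a length-N array."""
    n = len(w_tau)
    f_hat = np.array([kernel(x - x[i], h).mean() for i in range(n)])
    return np.mean(w_tau * f_hat)

def influence_r(w_tau, x, f, pi_tau):
    """r_tau_i = f(x_i) w_tau_i + pi_tau(x_i): the summand of the asymptotic
    linear representation.  f and pi_tau are callables returning one value per
    row of x; they are known only in the population or in a simulation design."""
    return f(x) * w_tau + pi_tau(x)
```

In a simulated design where $\mu_\tau$, $f$, and $\pi_\tau$ are known, Theorem B can be illustrated by comparing $\sqrt{N}(\hat\mu - \mu_\tau)$ with $N^{-1/2}\sum_i [r_{\tau i} - E(r_{\tau i})]$ across replications.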
Proof of Theorem B: Let $z_{\tau i} = (x_i, w_{\tau i})$, and define $p_N(z_{\tau i}, z_{\tau j})$ by
$$p_N(z_{\tau i}, z_{\tau j}) = \frac{w_{\tau i} + w_{\tau j}}{2 h^k}\, K\!\left(\frac{x_i - x_j}{h}\right).$$
The average kernel estimator $\hat\mu$ is equivalent to a data dependent U-statistic:
$$\hat\mu - \binom{N}{2}^{-1}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} p_N(z_{\tau i}, z_{\tau j}) = O_p(N^{-2} h^{-k}) = o_p(N^{-1}).$$
Following Powell, Stock, and Stoker (1989), first verify that $E[\|p_N(z_{\tau i}, z_{\tau j})\|^2] = o(N)$:
$$E[\|p_N(z_{\tau i}, z_{\tau j})\|^2] = \frac{1}{4h^{2k}}\int f(x_i) f(x_j)\,\big[E(w_{\tau i}^2 \mid x_i) + E(w_{\tau j}^2 \mid x_j) + 2 E(w_{\tau i} \mid x_i) E(w_{\tau j} \mid x_j)\big]\, K\!\left(\frac{x_i - x_j}{h}\right)^2 dx_i\, dx_j$$
$$= \frac{1}{4h^{k}}\int f(x_i) f(x_i + hu)\,\big[E(w_{\tau i}^2 \mid x_i) + E(w_{\tau i}^2 \mid x_i + hu) + 2 E(w_{\tau i} \mid x_i) E(w_{\tau i} \mid x_i + hu)\big]\, K(u)^2\, dx_i\, du$$
$$\le \Big[\sup_{\tau\in\Omega_\tau,\,x\in\Omega_x} E(w_\tau^2 \mid x)\Big]\, \frac{1}{h^k}\int f(x_i) f(x_i + hu)\, K(u)^2\, dx_i\, du = O(h^{-k}) = O[N (N h^k)^{-1}] = o(N),$$
using the change of variables from $(x_i, x_j)$ to $(x_i, u = (x_j - x_i)/h)$ with Jacobian $h^k$.

Let $r_{Ni} = 2 E[p_N(z_{\tau i}, z_{\tau j}) \mid z_{\tau i}]$. It follows from Lemma 3.1 in Powell, Stock, and Stoker (1989) that $N^{1/2}[\hat\mu - E(\hat\mu)] = N^{-1/2}\sum_{i=1}^N [r_{Ni} - E(r_{Ni})] + o_p(1)$. Next, define $t_{Ni}$ by
$$t_{Ni} = r_{Ni} - r_{\tau i} = 2E[p_N(z_{\tau i}, z_{\tau j}) \mid z_{\tau i}] - r_{\tau i}$$
$$= \frac{1}{h^k}\int [\,w_{\tau i} + E(w_{\tau j} \mid x_j)\,]\, K\!\left(\frac{x_i - x_j}{h}\right) f(x_j)\, dx_j - r_{\tau i}$$
$$= \frac{1}{h^k}\int [\,w_{\tau i}\, f(x_j) + \pi_\tau(x_j)\,]\, K\!\left(\frac{x_i - x_j}{h}\right) dx_j - r_{\tau i}$$
$$= \int [\,w_{\tau i}\, f(x_i + hu) + \pi_\tau(x_i + hu)\,]\, K(u)\, du - [\,w_{\tau i}\, f(x_i) + \pi_\tau(x_i)\,]$$
$$= \int \big( w_{\tau i}\,[f(x_i + hu) - f(x_i)] + [\pi_\tau(x_i + hu) - \pi_\tau(x_i)] \big)\, K(u)\, du.$$

Note that given the properties of $K(u)$, having $w_\tau(x,s) f(x) = 0$ hold for all $x$ within a distance $\tau$ of the boundary of $\Omega_x$, and having $h/\tau \to 0$, ensures that boundary effects do not interfere with the change of variables from $(x_i, x_j)$ to $(x_i, u = (x_j - x_i)/h)$ above. To illustrate this point, suppose that $K$ is a product kernel, the first element of which is nonzero only over the range $[-c, c]$, and suppose that the first element of $x_i$ has support given by the interval $[L^-, L^+]$. Then, after the change of variables, the first element of $u$ will be evaluated over the range $[(L^- - x_i)/h, (L^+ - x_i)/h]$, and so boundary effects regarding this first element can only arise at observations $i$ for which the interval $[(L^- - x_i)/h, (L^+ - x_i)/h]$ does not contain the interval $[-c, c]$, which is equivalent to $x_i$ being within a distance $ch$ of the boundary of $[L^-, L^+]$. However, $w_\tau(x,s) = 0$, and hence the integral equals zero, when $x_i$ is within a distance $\tau$ of the boundary, so having $h/\tau \to 0$ ensures that once the sample size is sufficiently large (for this element, once $h$ becomes smaller than $\tau/c$), the integral will equal zero at all observations $i$ for which boundary effects may arise.

Given the above, we have
$$N^{1/2}[\hat\mu - E(\hat\mu)] = N^{-1/2}\sum_{i=1}^N [\,r_{\tau i} - E(r_{\tau i})\,] + N^{-1/2}\sum_{i=1}^N [\,t_{Ni} - E(t_{Ni})\,]$$
and, using the local Lipschitz conditions,
$$|t_{Ni}| \le h\,[\,|w_{\tau i}|\, m_f(x_i) + m_{\pi\tau}(x_i)\,]\int \|u\|\, K(u)\, du,$$
$$E(t_{Ni}^2) \le h^2\,\sup_{\tau\in\Omega_\tau} E\big[\big(|w_\tau|\, m_f(x) + m_{\pi\tau}(x)\big)^2\big]\left(\int \|u\|\, K(u)\, du\right)^2 = O(h^2) = o(1).$$
Now $E(t_{Ni}^2) = o(1)$ implies that $N^{-1/2}\sum_{i=1}^N [t_{Ni} - E(t_{Ni})] = o_p(1)$, and therefore $N^{1/2}[\hat\mu - E(\hat\mu)] = N^{-1/2}\sum_{i=1}^N [r_{\tau i} - E(r_{\tau i})] + o_p(1)$.

Write $E(\hat\mu)$ as
$$E(\hat\mu) = \frac{1}{h^k}\, E\!\left[ w_{\tau i}\, K\!\left(\frac{x_i - x_j}{h}\right) \right] = \frac{1}{h^k}\int E(w_{\tau i} \mid x_i)\, K\!\left(\frac{x_i - x_j}{h}\right) f(x_i)\, f(x_j)\, dx_i\, dx_j = \int \pi_\tau(x_i)\, K(u)\, f(x_i + hu)\, dx_i\, du.$$
By Assumption B.5, we can substitute into this expression a $p$'th order Taylor expansion of $f(x_i + hu)$ around $f(x_i)$. Then, using $\mu_\tau = \int \pi_\tau(x_i)\, f(x_i)\, dx_i$, the existence of $\sup_{\tau\in\Omega_\tau}\int_{x\in\Omega_x} \pi_\tau(x)\,[\partial^\rho f(x)/\partial x_1^{l_1}\cdots\partial x_k^{l_k}]\, dx$, the fact that $w_\tau(x,s) f(x)$, and hence $\pi_\tau(x) f(x)$, vanishes in a sufficiently large neighborhood of the boundary of $\Omega_x$, and the assumption that $K(u)$ is a $p$'th order kernel, gives $E(\hat\mu) = \mu_\tau + O(h^p)$, so $N^{1/2}[E(\hat\mu) - \mu_\tau] = O(N^{1/2} h^p) = o(1)$, which finishes the proof of Theorem B.
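The boundary argument in the proof above is easy to check numerically for one coordinate: with a kernel component supported on $[-c,c]$ and data support $[L^-, L^+]$, the change of variables is affected by the boundary only when $x_i$ is within $ch$ of an endpoint, while trimming zeroes every term within $\tau$ of the boundary, so once $h < \tau/c$ the affected set is entirely trimmed. The numbers below are illustrative choices, not values from the paper.

```python
import numpy as np

# Illustrative choices: support [L_minus, L_plus], kernel component supported
# on [-c, c], bandwidth h, and trimming distance tau with h < tau / c.
L_minus, L_plus, c, h, tau = 0.0, 1.0, 1.0, 0.01, 0.05

x_grid = np.linspace(L_minus, L_plus, 2001)

# The rescaled integration range [(L_minus - x)/h, (L_plus - x)/h] fails to
# contain [-c, c] exactly when x is within c*h of an endpoint of the support.
boundary_affected = ((L_minus - x_grid) / h > -c) | ((L_plus - x_grid) / h < c)

# The trimming condition w_tau(x, s) f(x) = 0 removes every x within tau of
# the boundary.
trimmed = (x_grid < L_minus + tau) | (x_grid > L_plus - tau)

# With h < tau / c, every boundary-affected point is also trimmed.
assert h < tau / c
assert not np.any(boundary_affected & ~trimmed)
```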
Theorem B does not require that the trimming parameter $\tau$ equal distance to the boundary, although that is how it will be used in the proof of Theorem A. An extension of Theorem B (which might be useful in other applications) is that if we let $\tau$ be any function of $N$ and let $d(\tau)$ denote distance to the boundary, then Theorem B still holds as long as we replace $h/\tau \to 0$ with $h/d(\tau) \to 0$, and replace the last sentence of Assumption B.5 with $w_\tau(x,s) f(x) = 0$ holding for all $x$ within a distance $d(\tau)$ of the boundary of $\Omega_x$.

Before going on to prove Theorem A, Assumptions A.1 to A.4 are restated below as Assumptions C.1 to C.4, which are more convenient for the proof.

Assumption C.1: The data consist of $N$ observations $(w_i, v_i, u_i)$, $i = 1, \dots, N$, which are assumed to be i.i.d. Here $w_i$ is a vector, $u_i$ is a $k-1$ vector, and $v_i$ is a scalar. Let $f_{vu}$ denote the joint density function of an observation of $v$ and $u$, and let $f_u$ denote the marginal density function of an observation of $u$. Denote kernel estimators of these densities $\hat f_{vu}$ and $\hat f_u$, where the kernels have bandwidth $h$ and order $p$. Let $\Omega_{vu}$ equal the support of $(v,u)$, which is assumed to be known. Define the trimming function $I_\tau(v,u)$ to equal zero if $(v,u)$ is within a distance $\tau$ of the boundary of $\Omega_{vu}$; otherwise $I_\tau(v,u)$ equals one. Define
$$h_{\tau i} = \frac{w_i f_u(u_i)}{f_{vu}(v_i,u_i)}\, I_\tau(v_i,u_i), \qquad h_i = \frac{w_i f_u(u_i)}{f_{vu}(v_i,u_i)}, \qquad \eta = E(h_i),$$
$$q_i = h_i + E(h_i \mid u_i) - E(h_i \mid v_i,u_i), \qquad \hat\eta = \frac{1}{N}\sum_{i=1}^N \frac{w_i \hat f_u(u_i)}{\hat f_{vu}(v_i,u_i)}\, I_\tau(v_i,u_i).$$

Assumption C.2: Assumptions B.1 to B.5 hold, with $x = (v,u)$, $f = f_{vu}$, $\hat f = \hat f_{vu}$, $s = w$, and $w_{\tau i} = h_{\tau i}/f_{vu}(v_i,u_i)$.

Assumption C.3: Assumptions B.1 to B.5 hold, with $x = u$, $f = f_u$, $\hat f = \hat f_u$, $s = (v,w)$, and $w_{\tau i} = h_{\tau i}/f_u(u_i)$.

Assumption C.4: Assume $w$, $\Omega_{vu}$, and $f_{vu}(v,u)$ are bounded, and $f_{vu}(v,u)$ is bounded away from zero.

Proof of Theorem A: Define $\eta_\tau = E(h_{\tau i})$ and $q_{\tau i} = h_{\tau i} + E(h_{\tau i} \mid u_i) - E(h_{\tau i} \mid v_i,u_i)$. If $h \to 0$ then
$$\sup_{(v,u)\in\Omega_{vu}} \big| \hat f_{vu}(v,u) - f_{vu}(v,u) \big| = O_p\big[(N^{1-\varepsilon} h^{k})^{-1/2}\big], \qquad \sup_{(v,u)\in\Omega_{vu}} \big| \hat f_u(u) - f_u(u) \big| = O_p\big[(N^{1-\varepsilon} h^{k-1})^{-1/2}\big]$$
for any $\varepsilon > 0$. See, e.g., Silverman (1978) and Collomb and Härdle (1986).

To ease notation, let $f_{vui} = f_{vu}(v_i,u_i)$ and $f_{ui} = f_u(u_i)$. Consider a second order Taylor expansion of $\hat f_{ui}/\hat f_{vui}$ around $f_{ui}/f_{vui}$. The quadratic terms in the expansion involve the second derivatives of the ratio $f_{ui}/f_{vui}$, evaluated at $\tilde f_{ui}$ and $\tilde f_{vui}$, where $\tilde f_{ui}$ lies between $\hat f_{ui}$ and $f_{ui}$, and similarly $\tilde f_{vui}$ lies between $\hat f_{vui}$ and $f_{vui}$. Substituting the Taylor expansion of $\hat f_{ui}/\hat f_{vui}$ around $f_{ui}/f_{vui}$ into $\hat\eta$ gives
$$N^{1/2}\hat\eta = R_N + N^{-1/2}\sum_{i=1}^N w_i \left( \frac{f_{ui}}{f_{vui}} + \frac{\hat f_{ui} - f_{ui}}{f_{vui}} - \frac{f_{ui}\,(\hat f_{vui} - f_{vui})}{f_{vui}^2} \right) I_\tau(v_i,u_i),$$
where $R_N$ is a remainder term.
One component of $R_N$ is
$$N^{-1/2}\sum_{i=1}^N \frac{w_i \tilde f_{ui}}{\tilde f_{vui}^{3}}\,(\hat f_{vui} - f_{vui})^2\, I_\tau(v_i,u_i) \le N^{-1/2}\sum_{i=1}^N |w_i|\,[\sup f_{vu}]\,\tilde f_{vui}^{-3}\,\Big(\sup\big|\hat f_{vu} - f_{vu}\big|\Big)^2 = O_p\big(N^{\varepsilon - 1/2} h^{-k}\big) = o_p(1).$$
Similarly, the other components of $R_N$ are $O_p(N^{\varepsilon - 1/2} h^{-k-1/2})$ and $O_p(N^{\varepsilon - 1/2} h^{-k-1})$, which will also be $o_p(1)$. We now have
$$N^{1/2}\hat\eta = N^{-1/2}\sum_{i=1}^N w_i \left( \frac{f_{ui}}{f_{vui}} + \frac{\hat f_{ui} - f_{ui}}{f_{vui}} - \frac{f_{ui}\,(\hat f_{vui} - f_{vui})}{f_{vui}^2} \right) I_\tau(v_i,u_i) + o_p(1)$$
$$= N^{-1/2}\sum_{i=1}^N \left( h_{\tau i} + w_i\left[ \frac{\hat f_{ui} - f_{ui}}{f_{vui}} - \frac{f_{ui}\,(\hat f_{vui} - f_{vui})}{f_{vui}^2} \right] I_\tau(v_i,u_i) \right) + o_p(1).$$

Next, using Assumption C.3, apply Theorem B to obtain
$$N^{-1/2}\sum_{i=1}^N \frac{w_i\,\hat f_{ui}}{f_{vui}}\, I_\tau(v_i,u_i) = N^{1/2} E(h_{\tau i}) + N^{-1/2}\sum_{i=1}^N [\,r_{\tau i} - E(r_{\tau i})\,] + o_p(1),$$
where $r_{\tau i}$ is given by
$$r_{\tau i} = \left[ \frac{w_i}{f_{vui}}\, I_\tau(v_i,u_i) + E\!\left( \frac{w_i}{f_{vui}}\, I_\tau(v_i,u_i) \,\Big|\, u_i \right) \right] f_{ui} = h_{\tau i} + E(h_{\tau i} \mid u_i),$$
so
$$N^{-1/2}\sum_{i=1}^N \frac{w_i\,\hat f_{ui}}{f_{vui}}\, I_\tau(v_i,u_i) = N^{-1/2}\sum_{i=1}^N [\,h_{\tau i} + E(h_{\tau i} \mid u_i) - E(h_{\tau i})\,] + o_p(1).$$
Similarly, using Assumption C.2 and again applying Theorem B gives
$$N^{-1/2}\sum_{i=1}^N \frac{w_i\, f_{ui}\,\hat f_{vui}}{f_{vui}^2}\, I_\tau(v_i,u_i) = N^{-1/2}\sum_{i=1}^N [\,h_{\tau i} + E(h_{\tau i} \mid v_i,u_i) - E(h_{\tau i})\,] + o_p(1).$$
Substituting these results into the expansion above, using $w_i(\hat f_{ui} - f_{ui}) I_\tau(v_i,u_i)/f_{vui} = w_i \hat f_{ui} I_\tau(v_i,u_i)/f_{vui} - h_{\tau i}$ and $w_i f_{ui}(\hat f_{vui} - f_{vui}) I_\tau(v_i,u_i)/f_{vui}^2 = w_i f_{ui} \hat f_{vui} I_\tau(v_i,u_i)/f_{vui}^2 - h_{\tau i}$, gives
$$N^{1/2}\hat\eta = N^{-1/2}\sum_{i=1}^N \Big( h_{\tau i} + [\,h_{\tau i} + E(h_{\tau i} \mid u_i) - E(h_{\tau i})\,] - [\,h_{\tau i} + E(h_{\tau i} \mid v_i,u_i) - E(h_{\tau i})\,] \Big) + o_p(1),$$
so
$$N^{1/2}(\hat\eta - \eta_\tau) = N^{-1/2}\sum_{i=1}^N [\,q_{\tau i} - E(q_{\tau i})\,] + o_p(1).$$

Next, observe that
$$E\left[\left( N^{-1/2}\sum_{i=1}^N (h_i - h_{\tau i}) \right)^2\right] = E\big[\,h_i^2\,[1 - I_\tau(v_i,u_i)]\,\big] + (N-1)\,\big[ E\big( h_i\,[1 - I_\tau(v_i,u_i)] \big) \big]^2.$$
Now $E\big[h_i^2 [1 - I_\tau(v_i,u_i)]\big] = o(1)$, because the expectation of $h_i^2$ exists and $\tau \to 0$ makes $1 - I_\tau(v_i,u_i) \to 0$. For the second term above we have
$$(N-1)\,\big[ E\big( h_i\,[1 - I_\tau(v_i,u_i)] \big) \big]^2 \le \big[\, N^{1/2}\, \sup(h_i)\, E[1 - I_\tau(v_i,u_i)] \,\big]^2.$$
Now $E[1 - I_\tau(v_i,u_i)]$ equals the probability that $(v_i,u_i)$ is within a distance $\tau$ of the boundary of $\Omega_{vu}$, which is less than or equal to $\sup(f_{vu})$ times the volume of the space within a distance $\tau$ of the boundary of $\Omega_{vu}$. That volume is $O(\tau)$, so given boundedness of $h_i$ and $f_{vu}$, we have $N^{1/2}\sup(h_i)\, E[1 - I_\tau(v_i,u_i)] = O(N^{1/2}\tau) = o(1)$.

We therefore have $E\big[\big(N^{-1/2}\sum_{i=1}^N (h_i - h_{\tau i})\big)^2\big] = o(1)$, and so $\sqrt{N}\,(\eta_\tau - \eta) = o(1)$. The same analysis can be applied to $q_{\tau i}$, resulting in
$$N^{1/2}(\hat\eta - \eta) = N^{-1/2}\sum_{i=1}^N [\,q_i - E(q_i)\,] + o_p(1),$$
and $\sqrt{N}\,(\hat\eta - \eta) \xrightarrow{d} N[\,0, \operatorname{var}(q)\,]$ follows immediately.

References

[1] Andrews, D. W. K. (1995), "Nonparametric Kernel Estimation for Semiparametric Models," Econometric Theory, 11, 560-596.

[2] Collomb, G. and W. Härdle (1986), "Strong Uniform Convergence Rates in Robust Nonparametric Time Series Analysis and Prediction: Kernel Regression Estimation From Dependent Observations," Stochastic Processes and Their Applications, 23, 77-89.

[3] Granger, C. W. J. and J. L. Lin (1994), "Using the Mutual Information Coefficient to Identify Lags in Nonlinear Models," Journal of Time Series Analysis, 15, 371-384.

[4] Härdle, W. and T. M. Stoker (1989), "Investigating Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986-995.

[5] Honoré, B. E. and A. Lewbel (2000), "Semiparametric Binary Choice Panel Data Models Without Strictly Exogeneous Regressors," unpublished manuscript.

[6] Hong, Y. and H. White (2000), "Asymptotic Distribution Theory for Nonparametric Entropy Measures of Serial Dependence," unpublished manuscript.

[7] Lewbel, A. (1995), "Consistent Nonparametric Tests With an Application to Slutsky Symmetry," Journal of Econometrics, 67, 379-401.
[8] Lewbel, A. (2000), "Semiparametric Qualitative Response Model Estimation With Unknown Heteroscedasticity or Instrumental Variables," Journal of Econometrics, 97, 145-177.

[9] Newey, W. K. (1994), "The Asymptotic Variance of Semiparametric Estimators," Econometrica, 62, 1349-1382.

[10] Newey, W. K. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, vol. IV, ed. by R. F. Engle and D. L. McFadden, 2111-2245, Amsterdam: Elsevier.

[11] Newey, W. K. and P. A. Ruud (1994), "Density Weighted Linear Least Squares," University of California at Berkeley working paper.

[12] Powell, J. L., J. H. Stock, and T. M. Stoker (1989), "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403-1430.

[13] Racine, J. and Q. Li (2000), "Nonparametric Estimation of Conditional Distributions with Mixed Categorical and Continuous Data," unpublished manuscript.

[14] Rice, J. (1986), "Boundary Modification for Kernel Regression," Communications in Statistics, 12, 1215-1230.

[15] Robinson, P. M. (1988), "Root-N-Consistent Semiparametric Regression," Econometrica, 56, 931-954.

[16] Robinson, P. M. (1991), "Consistent Nonparametric Entropy Testing," Review of Economic Studies, 58, 437-453.

[17] Sherman, R. P. (1994), "U-Processes in the Analysis of a Generalized Semiparametric Regression Estimator," Econometric Theory, 10, 372-395.

[18] Silverman, B. W. (1978), "Weak and Strong Uniform Consistency of the Kernel Estimate of a Density Function and its Derivatives," Annals of Statistics, 6, 177-184.

[19] Stoker, T. M. (1991), "Equivalence of Direct, Indirect and Slope Estimators of Average Derivatives," in Nonparametric and Semiparametric Methods in Econometrics and Statistics, ed. by W. A. Barnett, J. Powell, and G. Tauchen, Cambridge University Press.