Bootstrapping Kernel-Based Semiparametric Estimators

Matias D. Cattaneo
Department of Economics, University of Michigan

Michael Jansson
Department of Economics, UC Berkeley and CREATES

July 30, 2014

Abstract. This paper develops alternative asymptotic results for a large class of two-step semiparametric estimators. The first main result is an asymptotic distribution result for such estimators and differs from those obtained in earlier work on classes of semiparametric two-step estimators by accommodating a non-negligible bias. A noteworthy feature of the assumptions under which the result is obtained is that reliance on a commonly employed stochastic equicontinuity condition is avoided. The second main result shows that the bootstrap provides an automatic method of correcting for the bias even when it is non-negligible.

We thank Stephane Bonhomme, Lutz Kilian, Adam Rosen, Andres Santos, Azeem Shaikh, Xiaoxia Shi, and seminar participants at Cambridge University, University of Chicago, Georgetown University, London School of Economics, Oxford University, Rice University, University College London, the 2013 Latin American Meetings of the Econometric Society, and the 2014 CEME/NSF Conference on Inference in Nonstandard Problems for comments. The first author gratefully acknowledges financial support from the National Science Foundation (SES 1122994). The second author gratefully acknowledges financial support from the National Science Foundation (SES 1124174) and the research support of CREATES (funded by the Danish National Research Foundation).

1. Introduction

This paper is concerned with basing inference about a finite-dimensional parameter on an estimator which is semiparametric in the sense that it employs a nonparametric estimator of some nuisance function. The importance of such estimators is widely recognized, as is the difficulty of obtaining accurate distributional approximations for estimators of this kind.
Regarding the latter, the consensus opinion (substantiated by Monte Carlo evidence) seems to be that the distributional properties of semiparametric estimators are much more sensitive to the properties of their (slowly converging) nonparametric ingredients than conventional asymptotic theory would suggest. In other words, the conventional approach to asymptotic analysis of semiparametric estimators, while delivering very tractable distributional approximations, effectively ignores certain features of these estimators which are important in samples of realistic size. Important progress in the direction of avoiding this shortcoming has been made by Linton (1995), Nishiyama and Robinson (2000, 2001, 2005), and Ichimura and Linton (2005), among others. These papers develop (Nagar- and Edgeworth expansion-type) higher-order asymptotic theory under assumptions implying in particular that the estimators under consideration are $\sqrt{n}$-consistent (where $n$ is the sample size). An alternative approach, and the one we take in this paper, was employed by Cattaneo, Crump, and Jansson (2013). That approach is conceptually similar to the "dimension asymptotics" approach taken in the seminal work of Mammen (1989), but we will refer to it as a "small bandwidth" approach for reasons that will become apparent below.[1,2]

[1] For an explanation of the connection between the approaches of Cattaneo et al. (2013) and Mammen (1989), see Enno Mammen's discussion of Cattaneo et al. (2013).
[2] The approach we take is also similar to the approach taken in a series of papers by Abadie and Imbens (2006, 2008, 2011), but our main conclusion regarding the bootstrap (and subsampling) is quite different from that of Abadie and Imbens (2008).
The "small bandwidth" approach is a first-order asymptotic approach whose goal is to produce more reliable distributional approximations by forcing the approximations to be more sensitive to the precise implementation of nonparametric ingredients than conventional first-order asymptotic approaches. Because its objective is similar to that of asymptotic analysis based on Edgeworth expansions, an obvious question is whether the "small bandwidth" approach shares with the Edgeworth approach the feature that developing results for general classes of estimators is (or at least would appear to be) prohibitively complicated (e.g., Nishiyama and Robinson (2005, p. 927)). One of the two main goals of this paper is to demonstrate by example that this is not the case. To do so, we study a class of estimators essentially coinciding with the class investigated by Newey and McFadden (1994, Section 8) and develop "small bandwidth" asymptotic results for it. These results turn out to be in perfect qualitative agreement with those obtained by Cattaneo et al. (2013) for a particular member of the class of estimators under study. To be specific, it turns out that in general semiparametric estimators utilizing kernel estimators of unknown functions suffer from bias problems whose magnitude is non-negligible and bandwidth-dependent. Being similar to that obtained by Cattaneo et al. (2013), this "small bandwidth" asymptotic finding is also in perfect analogy with the finding obtained under "dimension asymptotics" by Mammen (1989, Theorem 4). It therefore seems natural to ask whether positive results about the bootstrap analogous to those of Mammen (1989, Theorem 5) can be obtained also for the class of semiparametric estimators studied herein. Providing an affirmative answer to that question is the second main goal of this paper.
Achieving this goal turns out to require relatively little effort, essentially because the high-level conditions formulated in the process of achieving our first main goal have been designed partly with achievement of the second goal in mind.

Our main methodological prescription is a simple and constructive one: in semiparametric models, inference procedures based on the bootstrap are much less sensitive to the precise implementation of nonparametric ingredients than their main rivals. From a practical perspective this prescription is very attractive because implementation of bootstrap-based inference procedures does not require characterization of the asymptotic variance and/or the bias of the estimator upon which inference is to be based. Although the prescription is consistent with folklore, our theoretical justification for it would appear to be new. In particular, unlike Nishiyama and Robinson (2005), our theory is based on first-order asymptotic results, and partly for this reason we do not require studentization in order to show that the bootstrap provides "refinements" in the sense that bootstrap-based "percentile" confidence intervals enjoy first-order asymptotic validity in cases where no such validity is enjoyed by their main rivals, including "Efron" confidence intervals and subsampling-based confidence intervals.

Previous work on bootstrap validity for general classes of semiparametric models includes Chen, Linton, and van Keilegom (2003) and Cheng and Huang (2010). For the models studied in this paper our results generalize theirs by accommodating nonparametric ingredients implemented using bandwidths that are "small" in the sense that they converge to zero at a faster-than-usual rate. Accommodating such bandwidths turns out to prohibit reliance on certain stochastic equicontinuity conditions employed by Chen et al.
(2003), Cheng and Huang (2010), and most (if not all) previous developments of asymptotic distribution theory for semiparametric two-step estimators, including the results surveyed by Andrews (1994b), Newey and McFadden (1994), Chen (2007), and Ichimura and Todd (2007). As a consequence, avoiding reliance on such stochastic equicontinuity conditions turns out to be necessary in order to achieve the goals of this paper. The approach taken in this paper is to replace a key stochastic equicontinuity condition by an "asymptotic separability" condition, which is similar to, but weaker than, its stochastic equicontinuity counterpart. This condition exploits a special feature of kernel estimators, but may nevertheless be of independent interest.

The paper proceeds as follows. Section 2 provides a more detailed statement of and motivation for the questions addressed by this paper. Section 3 presents our "small bandwidth" asymptotic result about semiparametric two-step estimators, while Section 4 is concerned with verification of the high-level assumptions of that result. Section 5 presents bootstrap analogs of the results from Sections 3 and 4. Appendix I contains additional results, some of which may be of independent interest. Finally, Appendix II contains proofs of our main results while Appendix III provides additional details for the examples discussed in the paper.

2. Motivation

To fix ideas, suppose $\theta_0$ is a scalar parameter of interest admitting an estimator $\hat\theta_n$ satisfying

$$\sqrt{n}(\hat\theta_n - \theta_0) \rightsquigarrow N(0, \Sigma), \qquad (1)$$

where $n$ is the sample size, $\rightsquigarrow$ denotes weak convergence (as $n \to \infty$), and $\Sigma$ is positive.
In this scenario it is common to base inference on a distributional approximation of the form $\sqrt{n}(\hat\theta_n - \theta_0) \mathrel{\dot\sim} N(0, \hat\Sigma_n)$, where $\hat\Sigma_n$ is some estimator of $\Sigma$. If $\hat\Sigma_n$ is consistent, then the distributional approximation is itself consistent in the sense that

$$\sup_t \big| P[\sqrt{n}(\hat\theta_n - \theta_0) \le t] - P[N(0, \hat\Sigma_n) \le t] \big| \to_p 0, \qquad (2)$$

a fact which in turn implies for instance that the asymptotic coverage probability of the following confidence interval for $\theta_0$ is 95%:

$$CI_n = \big[\hat\theta_n - 1.96\sqrt{\hat\Sigma_n/n},\ \hat\theta_n + 1.96\sqrt{\hat\Sigma_n/n}\big].$$

An alternative distributional approximation is provided by the bootstrap. In standard notation, the bootstrap approximation to the cdf of $\sqrt{n}(\hat\theta_n - \theta_0)$ is given by $P^*[\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le \cdot\,]$, where $\hat\theta_n^*$ denotes a bootstrap analogue of $\hat\theta_n$ and $P^*$ denotes a probability computed under the bootstrap distribution conditional on the data. Assuming (1) holds, asymptotically valid inference procedures can be based on the bootstrap whenever the following bootstrap consistency condition is satisfied:

$$\sup_t \big| P[\sqrt{n}(\hat\theta_n - \theta_0) \le t] - P^*[\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t] \big| \to_p 0. \qquad (3)$$

For instance, two well-known examples of bootstrap-based confidence intervals for $\theta_0$ with asymptotic coverage probability 95% are the "percentile" interval

$$CI_{P,n} = [\hat\theta_n - \hat q_{n,0.975},\ \hat\theta_n - \hat q_{n,0.025}],$$

and the "Efron" interval

$$CI_{E,n} = [\hat\theta_n + \hat q_{n,0.025},\ \hat\theta_n + \hat q_{n,0.975}],$$

where $\hat q_{n,\alpha} = \inf\{q \in \mathbb{R} : P^*[(\hat\theta_n^* - \hat\theta_n) \le q] \ge \alpha\}$.

Being "automatic" in the sense that it can be obtained without characterizing and/or (explicitly) estimating $\Sigma$, the bootstrap distributional approximation is particularly attractive when $\Sigma$ is difficult to characterize and/or estimate, a phenomenon which occurs with some regularity for estimators $\hat\theta_n$ that are semiparametric in the sense that an estimator of an infinite-dimensional nuisance parameter is employed in the construction of $\hat\theta_n$. Chen et al.
(2003) and Cheng and Huang (2010) made this point and gave conditions under which (1) and (3) are satisfied also in models with infinite-dimensional nuisance parameters.

In addition to complicating the characterization of $\Sigma$, the presence of estimators of infinite-dimensional nuisance parameters often implies that a delicate choice of tuning parameters (e.g., bandwidths in the case of kernel estimators) is required in order to achieve (1) in the first place. Moreover, in such semiparametric settings the finite sample distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ has often been found in Monte Carlo experiments to be rather sensitive to the choice of these tuning parameters, suggesting in particular that distributional approximations not depending on tuning parameters can be quite unreliable unless sample sizes are very large. Acknowledging this, Cattaneo et al. (2013) investigated the consequences of relaxing the bandwidth conditions needed to achieve (1). Under weaker-than-usual bandwidth conditions and studying a particular kernel-based semiparametric estimator, Cattaneo et al. (2013) obtained a distributional result of the form

$$\sqrt{n}(\hat\theta_n - \theta_0 - B_n) \rightsquigarrow N(0, \Sigma), \qquad (4)$$

where $\Sigma$ is the same as in (1) while $B_n$ is some (possibly) non-negligible bias whose value depends in part on the bandwidth. Simulation evidence reported by Cattaneo et al. (2013) was found there to be consistent with the main prediction obtained by replacing (1) with the more general result (4), namely that kernel-based semiparametric estimators suffer from bias problems whose magnitude is non-negligible and bandwidth-dependent.

Replacing (1) with (4) can have severe consequences. For instance, the consistency property (2) fails when $B_n \ne o(n^{-1/2})$ in (4), implying in turn that inference procedures based on the distributional approximation $\sqrt{n}(\hat\theta_n - \theta_0) \mathrel{\dot\sim} N(0, \hat\Sigma_n)$ are invalid in general.
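As an aside on implementation, the "percentile" and "Efron" intervals defined in Section 2 differ only in the sign with which the bootstrap quantiles of $\hat\theta_n^* - \hat\theta_n$ enter. The following minimal sketch (an illustration, not from the paper) computes both from bootstrap draws, using a hypothetical i.i.d. sample and the sample mean as a stand-in for $\hat\theta_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=500)   # hypothetical i.i.d. data z_1, ..., z_n

def estimator(sample):
    # stand-in for theta_hat_n; any estimator of theta_0 could be used here
    return sample.mean()

theta_hat = estimator(z)

# Bootstrap draws of theta*_n - theta_hat_n
B = 2000
deltas = np.empty(B)
for b in range(B):
    zb = rng.choice(z, size=z.size, replace=True)
    deltas[b] = estimator(zb) - theta_hat

# q_{n,alpha} = inf{q : P*[theta*_n - theta_hat_n <= q] >= alpha}
q025, q975 = np.quantile(deltas, [0.025, 0.975])

# "Percentile" interval: [theta_hat - q_{n,0.975}, theta_hat - q_{n,0.025}]
ci_percentile = (theta_hat - q975, theta_hat - q025)
# "Efron" interval:      [theta_hat + q_{n,0.025}, theta_hat + q_{n,0.975}]
ci_efron = (theta_hat + q025, theta_hat + q975)
```

The two intervals coincide only when the bootstrap distribution is symmetric about zero; the point emphasized in Section 2 is that under (4) the percentile construction remains asymptotically valid while the Efron construction does not.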
For instance, the asymptotic coverage probability of the interval $CI_n$ is less than 95% when $B_n \ne o(n^{-1/2})$.[3] Likewise, if (3) and (4) hold, the asymptotic coverage probability of the "Efron" interval $CI_{E,n}$ is less than 95% when $B_n \ne o(n^{-1/2})$.[4] On the other hand, using the relation

$$\sup_t \big| P[\sqrt{n}(\hat\theta_n - \theta_0) \le t] - P^*[\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t] \big| = \sup_t \big| P[\sqrt{n}(\hat\theta_n - \theta_0 - B_n) \le t] - P^*[\sqrt{n}(\hat\theta_n^* - \hat\theta_n - B_n) \le t] \big|, \qquad (5)$$

it can be shown that (3) and (4) are sufficient to guarantee asymptotic validity of the "percentile" interval $CI_{P,n}$. For this bootstrap-based inference procedure the consequences of replacing (1) with (4) would therefore be benign if validity of (3) could be established also under (4).

The objective of this paper is twofold. First, we want to explore the generality of the finding that replacing (1) with (4) is necessary when characterizing the large-sample properties of semiparametric estimators under weaker-than-usual conditions on tuning parameters. To do so, we study an important class of kernel-based semiparametric two-step estimators and find that members of this class of estimators generally have the property that a result of the form (4) can be obtained under significantly weaker bandwidth conditions than those needed to achieve (1). Our development is constructive in the sense that it produces an explicit and interpretable formula for the bias $B_n$. It turns out that $B_n$ in (4) arises due to features of $\hat\theta_n$ that can be replicated by the bootstrap.

[3] If (4) holds and if $\hat\Sigma_n$ is consistent, then, with $\Phi$ denoting the standard normal cdf,
$$P[\theta_0 \in CI_n] = \Phi(1.96 - \sqrt{n}B_n/\sqrt{\Sigma}) - \Phi(-1.96 - \sqrt{n}B_n/\sqrt{\Sigma}) + o(1),$$
implying in particular that $\lim_{n\to\infty} P[\theta_0 \in CI_n] < 0.95$ when $B_n \ne o(n^{-1/2})$.
[4] If (3) and (4) hold, then
$$P[\theta_0 \in CI_{E,n}] = \Phi(1.96 - 2\sqrt{n}B_n/\sqrt{\Sigma}) - \Phi(-1.96 - 2\sqrt{n}B_n/\sqrt{\Sigma}) + o(1),$$
implying in particular that $\lim_{n\to\infty} P[\theta_0 \in CI_{E,n}] \le \lim_{n\to\infty} P[\theta_0 \in CI_n]$. In other words, $CI_{E,n}$ is even more sensitive to the bias $B_n$ than $CI_n$.
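The coverage formulas in footnotes 3 and 4 are easy to evaluate numerically. The sketch below (an illustration, not from the paper) tabulates the asymptotic coverage of $CI_n$ and $CI_{E,n}$ as a function of the limit $b$ of $\sqrt{n}B_n/\sqrt{\Sigma}$, assuming that limit exists:

```python
from math import erf, sqrt

def Phi(t):
    # standard normal cdf
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def coverage_CIn(b):
    # footnote 3: Phi(1.96 - b) - Phi(-1.96 - b), with b = lim sqrt(n) B_n / sqrt(Sigma)
    return Phi(1.96 - b) - Phi(-1.96 - b)

def coverage_efron(b):
    # footnote 4: the (normalized) bias enters with a factor of 2
    return Phi(1.96 - 2.0 * b) - Phi(-1.96 - 2.0 * b)

for b in (0.0, 0.5, 1.0):
    print(f"b = {b}: CI_n -> {coverage_CIn(b):.3f}, CI_E,n -> {coverage_efron(b):.3f}")
```

Both intervals attain 95% coverage at $b = 0$, and the Efron interval deteriorates faster as $b$ grows, in line with the remark at the end of footnote 4.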
As a consequence, it seems plausible that the bootstrap consistency property (3) could be valid also when $B_n \ne o(n^{-1/2})$ in (4). Formalizing and verifying the latter conjecture is the second objective of this paper.

In combination, our findings suggest that even though semiparametric estimators are likely to suffer from non-negligible bias problems in samples of moderate size, these biases can be corrected for in a fully automatic way by basing inference on the bootstrap. It seems to us that this is a "robustness" property of the bootstrap that is of both theoretical and practical importance.

Remarks. (i) Although our main emphasis is on obtaining constructive results, one negative conclusion emerging as a by-product of our development seems to be of sufficient theoretical and practical interest to be worth mentioning. As it turns out, the "robustness" of bootstrap-based inference with respect to tuning parameter choice is not shared by inference procedures based on subsampling, a possibly surprising finding in light of the fact that subsampling is often regarded as a "regularized" version of the bootstrap (e.g., Bickel and Li (2006)). Indeed, while it is well understood (e.g., Politis and Romano (1994)) that the subsampling approximation to the distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ is consistent under (1) whenever $\hat\theta_n$ is of the form $\hat\theta_n = T_n(z_1, \ldots, z_n)$ with $z_i$ i.i.d., a consequence of replacing (1) with (4) is that one of the "minimal" assumptions for asymptotic validity of subsampling is violated, and in fact it is not hard to show that inference procedures based on subsampling will be invalid (in general) as well. (For instance, subsampling-based inference procedures will be invalid if, for some $r < 1/2$, $n^r B_n$ converges to a non-zero limit.)
In particular, the examples analyzed below all have the feature that there are (bandwidth) conditions under which inference procedures based on the standard bootstrap are valid even though subsampling-based inference procedures are not.

(ii) Ibragimov and Müller (2010) have proposed an inference procedure which shares with subsampling-based inference procedures the feature that its asymptotic validity follows from (1) whenever $\hat\theta_n$ is of the form $\hat\theta_n = T_n(z_1, \ldots, z_n)$ with $z_i$ i.i.d. Like subsampling, that procedure ceases to be valid when (1) is replaced by (4).

(iii) Another well-known bootstrap-based confidence interval for $\theta_0$ is the "symmetric" interval

$$CI_{S,n} = [\hat\theta_n - \hat Q_{n,0.95},\ \hat\theta_n + \hat Q_{n,0.95}], \qquad \hat Q_{n,\alpha} = \inf\{Q \in \mathbb{R} : P^*[|\hat\theta_n^* - \hat\theta_n| \le Q] \ge \alpha\}.$$

The asymptotic coverage probability of this interval is 95% when (3) and (4) hold, but the interval is unnecessarily wide when $B_n \ne o(n^{-1/2})$ and is therefore not recommended.

(iv) The assumption that $\theta_0$ and $\hat\theta_n$ are scalar was made in this section only to simplify the exposition. In particular, the fact that "percentile" intervals are asymptotically valid when (3) and (4) hold is true also when $\theta_0$ and $\hat\theta_n$ are vector-valued, so in the remainder of the paper we allow for that possibility.

3. Asymptotics without Stochastic Equicontinuity

Suppose $\theta_0 \in \Theta \subseteq \mathbb{R}^k$ is a (possibly) vector-valued estimand that can be represented as the solution to a system of equations of the form $G(\theta, \gamma_0) = 0$, where $G(\theta, \gamma) = \mathrm{E}\,g(z, \theta, \gamma(\cdot, \theta))$, $g(\cdot)$ is a known function, $z$ is a random vector, and

$$\gamma_0(z, \theta) = \mathrm{E}[w(z, \theta) \mid x(z, \theta)]\, f_0[x(z, \theta); \theta],$$

with $w(\cdot)$ and $x(\cdot)$ being known functions and $f_0(\cdot; \theta)$ denoting the (unknown) density of $x(z, \theta)$, the latter being assumed to be continuously distributed.

Letting $z_1, \ldots, z_n$ denote i.i.d. copies of $z$, a natural estimator $\hat\theta_n$ of $\theta_0$ is given by an approximate minimizer (with respect to $\theta \in \Theta$) of

$$\hat G_n(\theta, \hat\gamma_n)' W \hat G_n(\theta, \hat\gamma_n), \qquad \hat G_n(\theta, \gamma) = \frac{1}{n}\sum_{i=1}^n g[z_i, \theta, \gamma(\cdot, \theta)],$$

where $W$ is some symmetric, positive semi-definite matrix and $\hat\gamma_n$ is a kernel-based estimator of $\gamma_0$ given by

$$\hat\gamma_n(z, \theta) = \frac{1}{n}\sum_{j=1}^n w(z_j, \theta) K_n[x(z, \theta) - x(z_j, \theta)], \qquad K_n(x) = \frac{1}{h_n^d} K\!\left(\frac{x}{h_n}\right),$$

where $K(\cdot)$ is a kernel and $d$ is the dimension of $x(z, \theta)$.[5]

The formulation just given is essentially the same as in Newey and McFadden (1994, Section 8), except that we follow Newey (1994a) and Chen et al. (2003), respectively, by allowing the dimension of $g$ to exceed that of $\theta_0$ and by accommodating profiling (i.e., allowing $\gamma_0(z, \theta)$ and $\hat\gamma_n(z, \theta)$ to depend on $\theta$) as well as models where $g(z, \theta, \gamma)$ is non-smooth.

[5] In our motivating examples, an (approximate) minimizer is one that solves $\hat G_n(\theta, \hat\gamma_n) = 0$. More generally, an (approximate) minimizer is one that satisfies Condition (i) of Lemma 3.

3.1. Conventional Asymptotics

Theorem 2 below provides a template for establishing (4). The approach summarized in that theorem is related to Newey and McFadden's (1994) approach to establishing (1). To facilitate a comparison between the approaches, we begin by presenting a version of that approach.

Lemma 1.
Suppose that:

(AL) [approximate linearity] for some matrix $J$ of rank $k$,

$$\sqrt{n}(\hat\theta_n - \theta_0) = J \frac{1}{\sqrt{n}} \sum_{i=1}^n g_0(z_i, \hat\gamma_n) + o_p(1),$$

where $g_0(z, \gamma) = g[z, \theta_0, \gamma(\cdot, \theta_0)]$;

(SE) [stochastic equicontinuity] for some function $\bar g_0$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_0(z_i, \hat\gamma_n) - \bar g_0(z_i, \hat\gamma_n) - g_0(z_i, \gamma_0) + \bar g_0(z_i, \gamma_0)] \to_p 0,$$

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_0(z_i, \hat\gamma_n) - \bar G_0(\hat\gamma_n) - \bar g_0(z_i, \gamma_0) + \bar G_0(\gamma_0)] \to_p 0,$$

where $\bar G_0(\gamma) = \mathrm{E}\,\bar g_0(z, \gamma)$;

(AN0) [asymptotic zero-mean normality] for some positive definite $\Omega$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_0(z_i, \gamma_0) + \bar G_0(\hat\gamma_n) - \bar G_0(\gamma_0)] \rightsquigarrow N(0, \Omega).$$

Then (1) holds with $\Sigma = J \Omega J'$.

Condition (AL) is standard and typically holds (with $J = (\dot G_0' W \dot G_0)^{-1} \dot G_0' W$) provided the error in the following linear approximation to $\hat G_n(\theta, \hat\gamma_n)$ is small:

$$\hat G_n(\theta, \hat\gamma_n) \approx \hat G_n(\theta_0, \hat\gamma_n) + \dot G_0 (\theta - \theta_0), \qquad \dot G_0 = \frac{\partial G(\theta, \gamma_0)}{\partial \theta'}\bigg|_{\theta = \theta_0}.$$

A sufficient condition for this to occur will be given in Section 4.1.

An implication of Condition (AL) is that the large sample properties of $\hat\theta_n$ are governed by $n^{-1/2}\sum_{i=1}^n g_0(z_i, \hat\gamma_n)$. Characterizing the distributional properties of such objects is potentially complicated because of the possible nonlinearity of $g_0(z_i, \cdot)$ and, in particular, the dependence/overlap between the arguments of $g_0(z_i, \hat\gamma_n)$ (i.e., between $z_i$ and $\hat\gamma_n$).
A common way to account for the nonlinearity and the overlap is to impose Condition (SE) with $\bar g_0(z_i, \cdot)$ being a linear approximation to $g_0(z_i, \cdot)$. Irrespective of the functional form of $\bar g_0$, Condition (SE) implies that

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n g_0(z_i, \hat\gamma_n) = \frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_0(z_i, \gamma_0) + \bar G_0(\hat\gamma_n) - \bar G_0(\gamma_0)] + o_p(1). \qquad (6)$$

Because the summands in this expansion depend on either $z_i$ or $\hat\gamma_n$ (but not both), Condition (SE) effectively achieves (asymptotic) "separability" between $z_i$ and $\hat\gamma_n$.

Verifying Condition (AN0) on the part of the leading term in the expansion (6) tends to be straightforward, as typically the $\bar g_0(z_i, \gamma_0)$ are mean zero random variables and $\bar G_0(\hat\gamma_n) - \bar G_0(\gamma_0)$ is a smooth functional of $\hat\gamma_n$. (For instance, $\bar G_0$ is linear whenever $\bar g_0(z_i, \cdot)$ is.) To be specific, assuming the smoothing bias of $\hat\gamma_n$ is small enough, it is usually not hard to show that Condition (AN0) holds (with $\Omega$ computable using the pathwise derivative formula of Newey (1994a)).

Remarks. (i) From the perspective of this paper the most problematic assumption in Lemma 1 is Condition (SE). One exceptional case where that condition is mild is the case where the model is "adaptive" in the sense that the first step (i.e., estimation of $\gamma_0$) has no effect on the asymptotic distribution of $\hat\theta_n$. This occurs when

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_0(z_i, \hat\gamma_n) - g_0(z_i, \gamma_0)] \to_p 0, \qquad (7)$$

in which case Condition (SE) is satisfied with $\bar g_0 = 0$ (and Condition (AN0) is satisfied with $\Omega = \mathrm{V}[g_0(z_i, \gamma_0)]$ whenever the latter exists and is positive definite). Achieving (7) under the "small bandwidth" asymptotics employed in this paper turns out to require stronger conditions than in cases where the nonparametric ingredient $\hat\gamma_n$ is $n^{1/4}$-consistent.
Indeed, the condition (7) seems quite restrictive in our setup, and in the sequel we therefore tacitly assume that we are dealing with a "non-adaptive" situation where the first step does have an effect on the asymptotic distribution of $\hat\theta_n$. (Although we focus on "non-adaptive" situations in this paper, the results developed below remain valid also in "adaptive" situations, so there is no need to explicitly rule out (7).) Nevertheless, because some of the technical effort used in the development of our results can be avoided when (7) holds, it is of interest to be able to identify situations where this occurs, so for completeness Section 7.1 provides a useful "heuristically necessary" condition for (7) when $\hat\gamma_n$ is $n^{1/6}$-consistent.

(ii) When $\bar g_0(z_i, \cdot)$ is linear, the first part of Condition (SE) typically holds provided $\|\hat\gamma_n - \gamma_0\| = o_p(n^{-1/4})$. More generally, when $\bar g_0(z_i, \cdot)$ is a polynomial approximation of order $R$, the first part of Condition (SE) typically holds provided $\|\hat\gamma_n - \gamma_0\| = o_p(n^{-1/(2(R+1))})$. In other words, the first part of Condition (SE) can typically be satisfied by judicious choice of $\bar g_0$. (Indeed, the first part of Condition (SE) can be rendered redundant by setting $\bar g_0 = g_0$.) The most interesting part of Condition (SE), and the part motivating the label "(SE)", is therefore the second part, which is a stochastic equicontinuity condition.

(iii) In important special cases, the second part of Condition (SE) reduces to conditions already in the literature, being equivalent to Assumption 5.2 of Newey (1994a) when $\bar g_0(z_i, \cdot)$ is linear and reducing to (2.8) of Andrews (1994a) and (3.34) of Andrews (1994b) when $\bar g_0 = g_0$.

3.2. Alternative Asymptotics

Like the approach outlined in Lemma 1, the approach taken in this paper can be summarized by means of three high-level conditions.
To state these, let

$$\hat\gamma_{n,i}(z, \theta) = \frac{1}{n-1} \sum_{j=1, j\ne i}^n w(z_j, \theta) K_n[x(z, \theta) - x(z_j, \theta)] \qquad (i = 1, \ldots, n)$$

denote the "leave-one-out" versions of $\hat\gamma_n(z, \theta)$ and define $\gamma_n(\cdot, \cdot) = \mathrm{E}\,\hat\gamma_n(\cdot, \cdot)$.

Theorem 2. Suppose that:

(AL) for some matrix $J$ of rank $k$,

$$\sqrt{n}(\hat\theta_n - \theta_0) = J \frac{1}{\sqrt{n}} \sum_{i=1}^n g_n(z_i, \hat\gamma_{n,i}) + o_p(1),$$

where $g_n(z, \gamma) = g[z, \theta_0, n^{-1} w(z, \theta_0) K_n[x(\cdot, \theta_0) - x(z, \theta_0)] + (1 - n^{-1}) \gamma(\cdot, \theta_0)]$;

(AS) [asymptotic separability] for some function $\bar g_n$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \hat\gamma_{n,i}) - \bar g_n(z_i, \hat\gamma_{n,i}) - g_n(z_i, \gamma_n) + \bar g_n(z_i, \gamma_n)] \to_p 0,$$

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_n(z_i, \hat\gamma_{n,i}) - \bar G_n(\hat\gamma_{n,i}) - \bar g_n(z_i, \gamma_n) + \bar G_n(\gamma_n)] \to_p 0,$$

where $\bar G_n(\gamma) = \mathrm{E}\,\bar g_n(z, \gamma)$;

(AN) [asymptotic normality] for some positive definite $\Omega$ and some $\bar B_n$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_n(z_i, \gamma_n) + \bar G_n(\hat\gamma_{n,i}) - \bar G_n(\gamma_n) - \bar B_n] \rightsquigarrow N(0, \Omega).$$

Then (4) holds with $B_n = J \bar B_n$ and $\Sigma = J \Omega J'$.

By construction, the functional form of $g_n$ is such that $g_n(z_i, \hat\gamma_{n,i}) = g_0(z_i, \hat\gamma_n)$ for every $i = 1, \ldots, n$. As a consequence, Condition (AL) of the theorem is simply an unorthodox restatement of Condition (AL) of Lemma 1. It is stated in terms of $g_n$ and $\hat\gamma_{n,i}$ in anticipation of the other conditions of the theorem.

As discussed in more detail in Section 4.2, Condition (SE) typically fails when the bandwidth $h_n$ is "small" in the sense that it converges to zero at a faster-than-usual rate. On the one hand, rapid convergence of $h_n$ to zero can result in failure of $n^{1/4}$-consistency of $\hat\gamma_n$, which in turn can lead to a failure of the first part of Condition (SE) when $\bar g_0(z_i, \cdot)$ is chosen to be linear. In isolation, this problem is conceptually straightforward to address.
Indeed, as alluded to in remark (ii) at the end of the previous subsection, the first part of Condition (SE) can usually be preserved also when smaller-than-usual bandwidths are employed, simply by working with a polynomial (e.g., quadratic) $\bar g_0(z_i, \cdot)$. On the other hand, it turns out that regardless of the functional form of $\bar g_0(z_i, \cdot)$ the stochastic equicontinuity (i.e., the second) part of Condition (SE) tends to fail when the bandwidth conditions ensuring $n^{1/4}$-consistency of $\hat\gamma_n$ are relaxed. Addressing this problem turns out to be necessary in order to achieve the goals of this paper. We are unaware of previous work pointing out the need to (let alone providing a solution to the question of how to) avoid reliance on stochastic equicontinuity when accommodating slowly converging nonparametric components.

The approach we take is to replace Condition (SE) with Condition (AS). On the surface, Condition (AS) is simply a "leave-one-out" version of Condition (SE), but perhaps remarkably it turns out that Condition (AS) is satisfied (with $\bar g_n(z_i, \cdot)$ being a quadratic approximation to $g_n(z_i, \cdot)$) in many cases even when Condition (SE) is not. Condition (AS) gives rise to the following counterpart of (6):

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n g_n(z_i, \hat\gamma_{n,i}) = \frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar g_n(z_i, \gamma_n) + \bar G_n(\hat\gamma_{n,i}) - \bar G_n(\gamma_n)] + o_p(1). \qquad (8)$$
The price to be paid for this extra generality is relatively small. In addition to the notational nuisance of having to employ additional subscripts in many places, a complication that must be addressed is that the terms gn (zi ; n) and Gn (^ n;i ) Gn ( n) in (8) both have a nonnegligible mean in general. Accordingly, Condition (AN) is the simplest counterpart of Condition (AN0 ) that one can hope for (in general). Thankfully, it turns out that Condition (AN) is veri…able under assumptions similar to those required for Condition (AN0 ) (with once again computable using the pathwise derivative formula of Newey (1994a) and n given by a formula presented below). 4. Verifying the Conditions of Theorem 2 The goal in this section is to demonstrate the plausibility of the conditions of Theorem 2. To do so, we outline general approaches to verifying the conditions and apply these approaches to the following three prominent examples.6 Example 1: Average Density. Suppose x1 ; : : : ; xn are i:i:d: copies of a continuously distributed random vector x 2 Rd with density f0 : Then a kernel-based 6 To avoid distractions, we omit statements of regularity conditions when presenting the examples and the associated results. Additional details for the examples are provided in Section 9. Bootstrapping Semiparametric Estimators estimator of 0 = R Rd 18 f0 (x)2 dx; the average density, is given by X ^n = 1 f^n (xi ); n i=1 n 1X f^n (x) = Kn (x n j=1 n xj ): When verifying the conditions of Theorem 2 for this example, we set z = x; ^ n (^n ; f^n ) = 0; ; ) = f0 ( ); and let ^n be de…ned by G x(z; ) = z; w(z; ) = 1; 0( where g(x; ; f ) = f (x) is a linear functional of f: Being a second-order V -statistic this estimator is very tractable. Owing partly to this tractability the estimator has been widely studied (in the statistics literature at least). 
We include it here mainly because it provides a dramatic demonstration of the fragility of the second part of Condition (SE) with respect to bandwidth choice.

Example 2: Weighted Average Derivative. Suppose $z_1, \ldots, z_n$ are i.i.d. copies of $z = (y, x')'$, where $y \in \mathbb{R}$ is a scalar dependent variable and the vector $x \in \mathbb{R}^d$ is a continuous explanatory variable with density $f_0$. A weighted average derivative of the regression function $r(x) = \mathrm{E}(y|x)$ is defined as $\theta_0 = \mathrm{E}[\omega(x)\, \partial r(x)/\partial x]$, where $\omega$ is a known scalar weight function. As an estimator of $\theta_0$, Cattaneo et al. (2013) considered

$$\hat\theta_n = \frac{1}{n} \sum_{i=1}^n y_i\, s(x_i, \hat f_n), \qquad s(x, f) = -\frac{\partial \omega(x)}{\partial x} - \omega(x) \frac{\partial f(x)/\partial x}{f(x)},$$

where $\hat f_n$ is defined as in Example 1. In this example, $z = (y, x')'$, $x(z, \theta) = x$, $w(z, \theta) = 1$, $\gamma_0(\cdot, \theta) = f_0(\cdot)$, and $\hat\theta_n$ is defined by $\hat G_n(\hat\theta_n, \hat f_n) = 0$, where $g(z, \theta, f) = y\, s(x, f) - \theta$.

This estimator is a representative member of the class of two-step semiparametric estimators insofar as it involves a nonlinear, but smooth, functional of its nonparametric ingredient $\hat f_n$. As will become apparent, this nonlinearity needs to be taken into account when choosing $\bar g_n$ in the process of verifying Condition (AS). More importantly, perhaps, it turns out that the nonlinearity manifests itself in the functional form of $\bar B_n$ in Condition (AN) and therefore in the form of the bias $B_n$ in (4).

Example 3: Hit Rate. Suppose $z_1, \ldots, z_n$ are i.i.d. copies of $z = (y, x')'$, where $y \in \mathbb{R}$ is a scalar and the vector $x \in \mathbb{R}^d$ is a continuous explanatory variable with density $f_0$. As an estimator of $\theta_0 = P[y \le f_0(x)]$, a particular example of a so-called 'hit rate', Chen et al. (2003) studied

$$\hat\theta_n = \frac{1}{n} \sum_{i=1}^n 1[y_i \le \hat f_n(x_i)],$$

where $1[\cdot]$ is the indicator function and $\hat f_n$ is as in the previous examples.
In this example, z = (y; x0 )0 ; x(z; ) = x; w(z; ) = 1; ^ n (^n ; f^n ) = 0; where g(z; ; f ) = 1[y G f (z)] 0( ; ) = f0 ( ); and : While simple in many respects, being a discontinuous functional of f^n this estimator is an interesting one to investigate when attempting to understand the extent to which the conclusions of Example 2 are consequences of the smoothness (with respect to f^n ) of the estimator considered there. In particular, because the empirical process methods utilized by Chen et al. (2003) to handle the lack of smoothness of ^n with respect to f^n cannot be applied when verifying Condition (AS) it would appear to be of interest to investigate the extent to which Condition (AS) can be veri…ed also in the absence of smoothness. 4.1. Approximate Linearity. Condition (AL) is simply a representation when ^n is de…ned as the solution to G ^ n ( ; ^ n ) = 0 for a function g with g(z; ; ) not depending on : Indeed, Condition (AL) holds with J = Ik and without any op (1) term in this important special case, which covers Examples 1-3. Bootstrapping Semiparametric Estimators 20 ^ n( 0; ^n) More generally, verifying Condition (AL) is usually straightforward when G 1=2 is assumed to be Op (n ); as in Conditions (SE) and (AN0 ) of Lemma 1, because then su¢ cient conditions for Condition (AL) can be formulated with the help of Pakes and Pollard (1989, Theorem 3.3). The situation is slightly more delicate in the scenarios of main interest to us because Conditions (AS) and (AN) of Theorem 2 imply ^ n ( 0 ; ^ n ) = Op (k only that G n k); where it turns out that k nk = o(n 1=3 ) 6= O(n 1=2 ) in the cases of primary interest. For completeness, we provide a set of su¢ cient conditions for Condition (AL) compatible with Conditions (AS) and (AN). To state these, let k k denote the Euclidian norm, let k k denote a norm on a function space to which ^ n (with probability approaching one), and for any and ( ) = f : k every 0 Lemma 3. 
Suppose that $\|\hat\theta_n-\theta_0\|=o_p(n^{-1/6})$, that $\dot G(\theta,\gamma)=\partial G(\theta,\gamma)/\partial\theta'$ exists for every $\theta$ in a neighborhood of $\theta_0$, and, setting $\dot G_0=\dot G(\theta_0,\gamma_0)$, that:

(i) $\hat G_n(\hat\theta_n,\hat\gamma_n)'W\hat G_n(\hat\theta_n,\hat\gamma_n)\le\inf_{\theta\in\Theta}\hat G_n(\theta,\hat\gamma_n)'W\hat G_n(\theta,\hat\gamma_n)+o_p(n^{-1})$;

(ii) $\dot G_0'W\dot G_0$ has rank $k$ and, for every positive $\epsilon_n=o(1)$, $\sup_{\theta\in\Theta(\epsilon_n),\,\gamma\in\Gamma(\epsilon_n)}\|\dot G(\theta,\gamma)-\dot G_0\|=o(1)$;

(iii) for $\delta\in\{0,1/3\}$ and for every positive $\epsilon_n=o(1)$,
$$\sup_{\theta\in\Theta(\epsilon_n n^{-\delta}),\,\gamma\in\Gamma(\epsilon_n)}\|\hat G_n(\theta,\gamma)-G(\theta,\gamma)-\hat G_n(\theta_0,\gamma)+G(\theta_0,\gamma)\|=o_p(n^{-1/3-\delta/2});$$

(iv) $\hat G_n(\theta_0,\hat\gamma_n)=o_p(n^{-1/3})$;

(v) for every positive $\epsilon_n=o(1)$, $\sup_{\theta\in\Theta(\epsilon_n)}\|G(\theta,\hat\gamma_n)-G(\theta_0,\hat\gamma_n)-\dot G_0(\theta-\theta_0)\|=o_p(\|\theta-\theta_0\|)$, and $\theta_0$ is an interior point of $\Theta$.

Then Condition (AL) holds with $J=(\dot G_0'W\dot G_0)^{-1}\dot G_0'W$.

Lemma 3 is in the spirit of Pakes and Pollard (1989, Theorem 3.3). An inspection of the proof of the lemma shows that the $\delta=1/3$ version of condition (iii) is redundant when condition (i) is strengthened to $\hat G_n(\hat\theta_n,\hat\gamma_n)=o_p(n^{-1/2})$, as is usually possible in the "just identified" case where $\theta$ and $g$ are of the same dimension. More details on the assumptions of the lemma, including in particular some remarks on our (stochastic equicontinuity) condition (iii), are provided in Section 7.2. Suffice it to say that overall the conditions of Lemma 3 seem sufficiently weak to support the view that, in perfect analogy with Newey and McFadden (1994, p. 2196), Condition (AL) is "not conceptually difficult, only technically difficult". Accordingly, and because those features that are allowed for by the general formulation but assumed away in examples where Condition (AL) is simply a representation are incidental to the main points of this paper, we have deliberately chosen illustrative examples satisfying $\sqrt n(\hat\theta_n-\theta_0)=n^{-1/2}\sum_{i=1}^n g_0(z_i,\hat\gamma_n)$.

4.2. Asymptotic Separability.
In many cases Condition (SE) is too strong when the goal is to prove (4) with $B_n\neq o(n^{-1/2})$. This shortcoming is not shared by Condition (AS), which often turns out to be verifiable under assumptions compatible with $\Lambda_n\neq o(n^{-1/2})$ in Condition (AN$_0$). A dramatic illustration of this phenomenon is provided by Example 1.

Example 1 (continued). Suppose the kernel is of order $P$ and satisfies standard conditions, and suppose the bandwidth satisfies $nh_n^{2P}\to0$ and $nh_n^d\to\infty$. Then (4) holds with $B_n=K(0)/(nh_n^d)$ and $\Sigma=4V[f_0(x)]$ provided $f_0$ is sufficiently smooth. (For details, see Section 9.)

Because $\sqrt nB_n=K(0)/\sqrt{nh_n^{2d}}$, the condition $nh_n^d\to\infty$ is weak enough to permit $B_n\neq o(n^{-1/2})$. On the other hand, (4) reduces to (1) when imposing conditions requiring $nh_n^{2d}\to\infty$, so it is necessary to guard against this when the goal is to obtain a result of the form (4) with $B_n\neq o(n^{-1/2})$.

Turning first to Condition (SE) and setting $\bar g_0=g_0$, the first part of that condition is automatically satisfied and the second part becomes
$$\frac{1}{\sqrt n}\sum_{i=1}^n[\hat f_n(x_i)-\bar f_n(x_i)-f_0(x_i)+\theta_0]=o_p(1),\qquad \bar f_n(\cdot)=E\hat f_n(\cdot).$$
It follows from a direct calculation that
$$\frac{1}{\sqrt n}\sum_{i=1}^n[\hat f_n(x_i)-\bar f_n(x_i)-f_0(x_i)+\theta_0]=\frac{K(0)}{\sqrt{nh_n^{2d}}}+o_p(1),$$
so Condition (SE) requires $nh_n^{2d}\to\infty$ and is therefore too strong for our purposes. On the other hand, setting $\bar g_n=g_n$, the first part of Condition (AS) is automatically satisfied and the second part becomes
$$\frac{1}{\sqrt n}\sum_{i=1}^n[\hat f_{n,i}(x_i)-2\bar f_n(x_i)+\bar\theta_n]=o_p(1),\qquad \bar\theta_n=\int_{\mathbb{R}^d}\bar f_n(x)f_0(x)dx,$$
where $\hat f_{n,i}(x)=(n-1)^{-1}\sum_{j=1,j\neq i}^nK_n(x-x_j)$. A simple variance calculation now shows that Condition (AS) is satisfied if $nh_n^d\to\infty$.

To interpret the bandwidth requirements $nh_n^{2d}\to\infty$ and $nh_n^d\to\infty$ implied by Conditions (SE) and (AS) in this example it is helpful to recall that the (pointwise) rate of convergence of $\hat f_n$ is $\sqrt{nh_n^d}$; that is, $\hat f_n(x)-\bar f_n(x)=O_p(1/\sqrt{nh_n^d})$ for any $x\in\mathbb{R}^d$. The conditions $nh_n^{2d}\to\infty$ and $nh_n^d\to\infty$ therefore correspond loosely to the requirements of $n^{1/4}$-consistency and consistency, respectively, on the part of the nonparametric ingredient $\hat f_n$.

The fact that any rate of convergence (on the part of the nonparametric ingredient $\hat f_n$) will suffice when verifying Condition (AS) is largely due to the linearity of $\hat\theta_n$ with respect to $\hat f_n$. This feature is clearly special, but in three important respects Example 1 turns out to be representative. First, in order to obtain results of the form (4) with $B_n\neq o(n^{-1/2})$ it is necessary to work under bandwidth conditions that can result in convergence rates slower than $n^{1/4}$ for nonparametric ingredients. Second, bandwidth conditions needed for stochastic equicontinuity (i.e., Condition (SE)) generally imply $n^{1/4}$-consistency on the part of nonparametric ingredients. Avoiding reliance on Condition (SE) therefore turns out to be crucial in order to achieve the goals of this paper. Third, as demonstrated by the examples that follow, the qualitative conclusion that Condition (AS) is compatible with convergence rates slower than $n^{1/4}$ on the part of nonparametric ingredients is true quite generally. Condition (AS) therefore emerges as an attractive alternative to Condition (SE).

Among estimators not satisfying (7) we are aware of only two types that provide exceptions to the rule that bandwidth conditions needed for Condition (SE) imply $n^{1/4}$-consistency on the part of nonparametric ingredients. These exceptions are those illustrated by Hausman and Newey's (1995) consumer surplus estimator and the "leave in" version of Powell, Stock, and Stoker's (1989) estimator, respectively.
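The distinction between "leave-in" and "leave-one-out" density estimates that underlies this discussion is an exact algebraic identity: $\hat f_n(x_i)=(1-n^{-1})\hat f_{n,i}(x_i)+K(0)/(nh_n^d)$, so the own-observation term contributes exactly the $K(0)/(nh_n^d)$ appearing in the bias of Example 1. A minimal numerical check of the identity (Gaussian kernel and simulated data are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 200, 1, 0.3
x = rng.normal(size=(n, d))

# pairwise Gaussian kernel evaluations K_h(x_i - x_j)
u = (x[:, None, :] - x[None, :, :]) / h
k = np.exp(-0.5 * (u ** 2).sum(axis=2)) / ((2 * np.pi) ** (d / 2) * h ** d)

f_in = k.sum(axis=1) / n                  # leave-in estimate f_hat_n(x_i)
k0 = k.copy()
np.fill_diagonal(k0, 0.0)                 # drop the own observation
f_out = k0.sum(axis=1) / (n - 1)          # leave-one-out estimate f_hat_{n,i}(x_i)

theta_in = f_in.mean()                    # leave-in estimator of E[f_0(x)] (Example 1 form)
theta_out = f_out.mean()                  # its leave-one-out counterpart
K0 = (2 * np.pi) ** (-d / 2)              # K(0) for the Gaussian kernel
# exact identity: theta_in - (1 - 1/n) * theta_out == K(0) / (n h^d)
gap = theta_in - (1 - 1 / n) * theta_out
```

The identity holds observation by observation, so the leave-in estimator carries the deterministic own-observation term $K(0)/(nh_n^d)$ regardless of the sample.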
Hausman and Newey's (1995) consumer surplus estimator (also discussed in Newey and McFadden (1994)) is one for which $g_0(z,\gamma)$ is additively separable between $z$ and $\gamma$.$^7$ In such special cases "separability" between $z$ and $\gamma$ is of course automatic and, indeed, both parts of Condition (SE) are satisfied (without any $o_p(1)$ terms) when $\bar g_0=g_0$. In the case of the "leave in" version of Powell et al.'s (1989) estimator, on the other hand, validity of Condition (SE) under weak bandwidth conditions would appear to be due to the fact that Condition (AS) holds and the fact that $\bar g_0$ and $\bar g_n$ coincide (apart from an unimportant factor of proportionality).$^8$

$^7$ Indeed, their $g_0(z,\gamma)$ does not depend on $z$.
$^8$ As pointed out by footnote 6 of Powell et al. (1989), $\bar g_0=(1-n^{-1})\bar g_n$ because symmetric kernels satisfy $\dot K(0)=0$.

While seemingly peculiar, the feature which makes the "leave in" version of Powell et al.'s (1989) estimator satisfy Condition (SE) turns out to be exploitable quite generally. To be specific, essentially because it involves $\hat\gamma_{n,i}$ rather than $\hat\gamma_n$, the "leave-one-out" version of Condition (SE) given by Condition (AS) turns out to be verifiable under remarkably weak bandwidth conditions.

As pointed out by Newey and McFadden (1994) and illustrated by Example 1, verification (or otherwise) of the second part of Condition (SE) involves nothing more than second-order $U$-statistic calculations when $\bar g_0(z,\gamma)$ is linear in $\gamma$. Example 1 also shows that very similar calculations can be used to verify the second part of Condition (AS) when $\bar g_n(z,\gamma)$ is linear in $\gamma$. To accommodate slower-than-$n^{1/4}$ rates of convergence while achieving nontrivial generality it is useful to have results that cover nonlinear-in-$\gamma$ functions. It turns out that in many nonlinear-in-$\gamma$ cases a variation on Newey's (1994a) approach can be used to verify Condition (AS) for a $\bar g_n$ which is quadratic (rather than linear, as in Newey (1994a)) in its second argument.
For such a $\bar g_n$, the second part of Condition (AS) can usually be verified by means of the following lemma.

Lemma 4. Suppose that $g_{n,\gamma}(z)[\cdot]$ is linear, $g_{n,\gamma\gamma}(z)[\cdot,\cdot]$ is bilinear, and that:
$$V(g_{n,\gamma}(z_1)[\delta_{n,2}])=o(n),\qquad V(g_{n,\gamma\gamma}(z_1)[\delta_{n,2},\delta_{n,2}])=o(n^3),$$
$$V(g_{n,\gamma\gamma}(z_1)[\delta_{n,2},\delta_{n,3}])=o(n^2),\qquad V[E(g_{n,\gamma\gamma}(z_1)[\delta_{n,2},\delta_{n,2}]|z_1)]=o(n^2),$$
where $\delta_{n,j}(z,\theta)=w(z_j,\theta)K_n[x(z,\theta)-x(z_j,\theta)]-\bar\gamma_n(z,\theta)$. Then
$$\bar g_n(z,\gamma)=g_n(z,\bar\gamma_n)+g_{n,\gamma}(z)[\gamma-\bar\gamma_n]+\frac12g_{n,\gamma\gamma}(z)[\gamma-\bar\gamma_n,\gamma-\bar\gamma_n]$$
satisfies the second part of Condition (AS).

The ease with which the first part of Condition (AS) can be verified with a quadratic $\bar g_n$ depends on the smoothness of $g_n(z,\cdot)$ with respect to $\gamma$. One widely applicable sufficient condition, similar in spirit to Newey (1994a, Condition 5.1), is given by the following lemma, in which $\|\cdot\|_\Gamma$ denotes a norm on a function space to which $\hat\gamma_{n,i}(\cdot,\theta_0)-\bar\gamma_n(\cdot,\theta_0)$ belongs (with probability approaching one).

Lemma 5. Suppose that $\max_{1\le i\le n}\|\hat\gamma_{n,i}(\cdot,\theta_0)-\bar\gamma_n(\cdot,\theta_0)\|_\Gamma=o_p(n^{-1/6})$ and that, for all $\gamma$ with $\|\gamma(\cdot,\theta_0)-\bar\gamma_n(\cdot,\theta_0)\|_\Gamma$ small enough,
$$\|g_n(z,\gamma)-\bar g_n(z,\gamma)\|\le b(z)\|\gamma(\cdot,\theta_0)-\bar\gamma_n(\cdot,\theta_0)\|_\Gamma^3,$$
where $\bar g_n$ is as in Lemma 4 and $Eb(z)<\infty$. Then the first part of Condition (AS) is satisfied.

The conditions of the lemma are satisfied in Example 2, although slightly better sufficient conditions for Condition (AS) can be obtained by exploiting a special feature of the estimator considered in that example.

Example 2 (continued). A quadratic approximation to $g_n(z,f)$ is given by
$$\bar g_n(z,f)=g_n(z,\bar f_n)+g_{n,f}(z)[f-\bar f_n]+\frac12g_{n,ff}(z)[f-\bar f_n,f-\bar f_n],$$
where, defining $\bar f_n^+(x)=n^{-1}K_n(0)+(1-n^{-1})\bar f_n(x)$,
$$g_{n,f}(z)[\delta]=-(1-n^{-1})\frac{y\omega(x)}{\bar f_n^+(x)}\left[\frac{\partial\delta(x)}{\partial x}-\frac{\partial\bar f_n^+(x)/\partial x}{\bar f_n^+(x)}\delta(x)\right],$$
$$g_{n,ff}(z)[\delta,\tilde\delta]=(1-n^{-1})^2\frac{y\omega(x)}{\bar f_n^+(x)^2}\left[\frac{\partial\delta(x)}{\partial x}\tilde\delta(x)+\frac{\partial\tilde\delta(x)}{\partial x}\delta(x)-2\frac{\partial\bar f_n^+(x)/\partial x}{\bar f_n^+(x)}\delta(x)\tilde\delta(x)\right].$$
If $h_n\to0$ and if $nh_n^{3d/2+3}/(\log n)^{3/2}\to\infty$, then the conditions of Lemma 5 are satisfied. Proceeding as in Cattaneo et al. (2013), the second bandwidth condition can in fact be relaxed to $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$. This bandwidth condition is also sufficient for the second part of Condition (AS), as the assumptions of Lemma 4 are satisfied whenever $nh_n^{d+2}\to\infty$.

In Example 1, Condition (AS) could be verified without imposing conditions on the rate of convergence of $\hat f_n$. In contrast, the condition $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$ imposed in Example 2 does impose a convergence rate condition on $\hat f_n$, corresponding roughly to an $n^{1/6}$-consistency requirement on $\hat f_n$ (and its derivative). While almost certainly not the weakest possible, this bandwidth condition is attractive from the perspective of this paper as it is weak enough to permit failure of $n^{1/4}$-consistency (which once again turns out to be crucial) yet strong enough to enable us to verify Condition (AS) with a fairly modest amount of effort.

Even in cases where $g_n(z,\cdot)$ is not smooth, Lemma 5 might be useful when attempting to verify that the first part of Condition (AS) is satisfied by a quadratic $\bar g_n$. Example 3 provides an illustration.

Example 3 (continued). Let $F_{y|x}(\cdot|x)$ denote the conditional cdf of $y$ given $x$, and let $f_{y|x}(\cdot|x)$ and $\dot f_{y|x}(\cdot|x)$ denote its first and second derivatives. By first conditioning on $x$ and then proceeding as in the smooth case we are led to consider
$$\bar g_n(x,f)=g_n(x,\bar f_n)+g_{n,f}(x)[f-\bar f_n]+\frac12g_{n,ff}(x)[f-\bar f_n,f-\bar f_n],$$
where $g_n(x,\bar f_n)=1-F_{y|x}[\bar f_n^+(x)|x]$ and
$$g_{n,f}(x)[\delta]=-(1-n^{-1})f_{y|x}[\bar f_n^+(x)|x]\delta(x),\qquad g_{n,ff}(x)[\delta,\tilde\delta]=-(1-n^{-1})^2\dot f_{y|x}[\bar f_n^+(x)|x]\delta(x)\tilde\delta(x).$$
If $h_n\to0$ and if $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$, then Lemma 5 can be used to show that the first part of Condition (AS) is satisfied. The same bandwidth conditions are sufficient for the second part of Condition (AS), as the assumptions of Lemma 4 are satisfied whenever $nh_n^d\to\infty$. In perfect analogy with Example 2, also in this nonsmooth case Condition (AS) is seen to hold under bandwidth conditions corresponding roughly to an $n^{1/6}$-consistency requirement on $\hat f_n$. Once again, the main thing to notice is the fact that Condition (AS) permits failure of $n^{1/4}$-consistency.

Remarks. (i) The second part of Condition (SE) can be verified by exhibiting a sequence $\Gamma_{n,0}$ of function classes satisfying $\lim_{n\to\infty}P[\hat\gamma_n(\cdot,\theta_0)\in\Gamma_{n,0}]=1$ and
$$\sup_{\gamma\in\Gamma_{n,0}}\left\|\frac{1}{\sqrt n}\sum_{i=1}^n[g_0(z_i,\gamma)-G_0(\gamma)-g_0(z_i,\gamma_0)+G_0(\gamma_0)]\right\|\to_p0,$$
where empirical process results (e.g., maximal inequalities) can be used to formulate primitive sufficient conditions for the latter. This approach, taken by Andrews (1994b, Condition (3.36)), Chen et al. (2003, Conditions (2.4) and (2.5$'$)), and many others, does not seem applicable when the goal is to formulate primitive sufficient conditions for the second part of Condition (AS). To be specific, the dependence of $\hat\gamma_{n,i}$ on $i$ in the second part of Condition (AS) implies that the desired convergence in probability result cannot be deduced with the help of a result of the form
$$\sup_{\gamma\in\Gamma_{n,0}}\left\|\frac{1}{\sqrt n}\sum_{i=1}^n[g_n(z_i,\gamma)-G_n(\gamma)-g_n(z_i,\bar\gamma_n)+G_n(\bar\gamma_n)]\right\|\to_p0.$$

(ii) For Example 3 arguments similar to those used in our verification of Condition (AS) can be used to show that Condition (SE) holds when $nh_n^{2d}\to\infty$ (which corresponds roughly to an $n^{1/4}$-consistency requirement on $\hat f_n$). This represents a considerable improvement over the condition $nh_n^{3d}\to\infty$ (which corresponds roughly to an $n^{1/3}$-consistency requirement on $\hat f_n$) apparently required when using empirical process methods to verify Condition (SE) by first exhibiting a function space $\mathcal H$ satisfying (3.3) of Chen et al. (2003)'s Theorem 3 (as is done by Chen et al. (2003, p. 1600)) and then formulating bandwidth conditions under which $\hat f_n$ ($\hat h$ in their notation) belongs to $\mathcal H$ with probability approaching one.

4.3. Asymptotic Normality.
Suppose $\bar g_n$ is as in Lemmas 4 and 5, in which case we have
$$\bar G_n(\gamma)=G_n(\bar\gamma_n)+G_{n,\gamma}[\gamma-\bar\gamma_n]+\frac12G_{n,\gamma\gamma}[\gamma-\bar\gamma_n,\gamma-\bar\gamma_n],$$
where $G_{n,\gamma}[\cdot]=Eg_{n,\gamma}(z)[\cdot]$ and $G_{n,\gamma\gamma}[\cdot,\cdot]=Eg_{n,\gamma\gamma}(z)[\cdot,\cdot]$. Then
$$\frac{1}{\sqrt n}\sum_{i=1}^n[g_n(z_i,\bar\gamma_n)+\bar G_n(\hat\gamma_{n,i})-\bar G_n(\bar\gamma_n)]=\frac{1}{\sqrt n}\sum_{i=1}^n\psi_n(z_i)+\sqrt nB_n,$$
where, defining
$$\psi_n(z)=g_n(z,\bar\gamma_n)-Eg_n(z,\bar\gamma_n)+G_{n,\gamma}[w(z,\theta_0)K_n(x(\cdot,\theta_0)-x(z,\theta_0))-\bar\gamma_n(\cdot,\theta_0)],$$
we have
$$B_n=Eg_n(z,\bar\gamma_n)+\frac12\frac{1}{n}\sum_{i=1}^nG_{n,\gamma\gamma}[\hat\gamma_{n,i}-\bar\gamma_n,\hat\gamma_{n,i}-\bar\gamma_n].$$
The assumptions of the following lemma are usually straightforward to verify.

Lemma 6. Suppose that
$$E[\|\psi_n(z)-\psi(z)\|^2]\to0\ \text{for some function }\psi\text{ with }\Sigma=E[\psi(z)\psi(z)'],\tag{9}$$
and that:
$$V(G_{n,\gamma\gamma}[\delta_{n,1},\delta_{n,1}])=o(n^2),\qquad V(G_{n,\gamma\gamma}[\delta_{n,1},\delta_{n,2}])=o(n).\tag{10}$$
Then Condition (AN) holds with $\Sigma$ and any sequence $\Lambda_n$ satisfying $\Lambda_n=EB_n+o(n^{-1/2})$.

Knowing the functional form of $\psi$ and $\Lambda_n$ is unnecessary for the purposes of verifying Condition (AN) and/or applying the bootstrap, but since $\psi$ and $\Lambda_n$ may nevertheless be objects of theoretical interest we comment briefly on them before verifying Condition (AN) for the examples.

The function $\psi$ in (9) will typically be given by $\psi(z)=g_0(z,\gamma_0)+\psi_0(z)$, where $\psi_0(z)$ is the "correction term" discussed by Newey (1994a). Regarding the "bias" sequence $\Lambda_n$, we have the decomposition
$$EB_n=\Lambda_n^S+\Lambda_n^{LI}+\Lambda_n^{NL},$$
where
$$\Lambda_n^S=Eg_0(z,\bar\gamma_n),\qquad\Lambda_n^{LI}=E[g_n(z,\bar\gamma_n)-g_0(z,\bar\gamma_n)],\qquad\Lambda_n^{NL}=\frac{1}{2n}EG_{n,\gamma\gamma}[\delta_{n,1},\delta_{n,1}].$$
Here, $\Lambda_n^S$ is a familiar (smoothing) bias term, which can typically be ignored because $\Lambda_n^S=o(n^{-1/2})$ under standard smoothness, kernel, and bandwidth conditions.$^9$ The term $\Lambda_n^{LI}$ is a "leave in" bias term (in the terminology of Cattaneo et al. (2013)) whose presence can be attributed to a failure of stochastic equicontinuity. This term is nonnegligible in all of our examples. The final term, $\Lambda_n^{NL}$, is what Cattaneo et al. (2013) refer to as a "nonlinearity" bias term. This term is zero in Example 1 (where $\hat\theta_n$ is a linear functional of $\hat f_n$), but is nonnegligible in Examples 2 and 3 (where $\hat\theta_n$ is a nonlinear functional of $\hat f_n$). To summarize, we can typically set $\Lambda_n=\Lambda_n^{LI}+\Lambda_n^{NL}$.

$^9$ To be specific, we typically have $\Lambda_n^S=O(h_n^P)$, where $P$ is the order of the kernel. In this case, $\Lambda_n^S$ can be ignored if $nh_n^{2P}\to0$.

Example 1 (continued). If $h_n\to0$, then (9) holds with $\psi(x)=2[f_0(x)-\theta_0]$. Also, $B_n$ is nonrandom and satisfies $B_n=K(0)/(nh_n^d)+O(h_n^P+n^{-1})$. As a consequence, Condition (AN) is satisfied with $\Sigma=4V[f_0(x)]$ and $\Lambda_n=K(0)/(nh_n^d)$ provided $nh_n^{2P}\to0$. In summary, if $nh_n^{2P}\to0$ and if $nh_n^d\to\infty$, then the conditions of Theorem 2 are satisfied and (4) holds with $B_n=O(1/(nh_n^d))$.

Example 2 (continued). If $h_n\to0$ and if $nh_n^d\to\infty$, then (9) holds with $\psi(z)=g_0(z,f_0)+\psi_0(z)$, where
$$\psi_0(z)=\omega(x)\frac{\partial}{\partial x}r(x)+r(x)\frac{\partial}{\partial x}\omega(x)+r(x)\omega(x)\frac{\partial f_0(x)/\partial x}{f_0(x)}.$$
Also, (10) is satisfied if $h_n\to0$ and if $nh_n^{d+2}\to\infty$. As a consequence, Condition (AN) is satisfied with $\Sigma=E[\psi(z)\psi(z)']$ if $h_n\to0$ and if $nh_n^{d+2}\to\infty$. If also $nh_n^{2P}\to0$, then $\Lambda_n^S=O(h_n^P)=o(n^{-1/2})$ and
$$\Lambda_n^{LI}=\frac{K(0)}{nh_n^d}\int_{\mathbb{R}^d}r(x)\omega(x)\frac{\partial f_0(x)/\partial x}{f_0(x)}dx+o(n^{-1/2}),$$
$$\Lambda_n^{NL}=\frac{1}{nh_n^{d+1}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_0(x)}K(u)\frac{\partial K(u)}{\partial u}f_0(x-uh_n)\,dx\,du-\frac{1}{nh_n^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_0(x)}\frac{\partial f_0(x)/\partial x}{f_0(x)}K(u)^2f_0(x-uh_n)\,dx\,du+o(n^{-1/2}),$$
where it follows from Cattaneo et al. (2013, Lemma 1) that $\Lambda_n^{NL}$ admits a polynomial-in-$h_n$ expansion of the form
$$\Lambda_n^{NL}=\frac{1}{nh_n^d}[B_0+B_1h_n^2+B_2h_n^4+\ldots],$$
the constants $B_0,B_1,B_2,\ldots$ being functionals of $K$ and the data generating process. (Symmetry of the kernel implies that $\Lambda_n^{NL}$ is of order $1/(nh_n^d)$ and that the polynomial expansion of $nh_n^d\Lambda_n^{NL}$ involves only even powers of $h_n$.)

In summary, if $nh_n^{2P}\to0$ and if $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$, then the conditions of Theorem 2 are satisfied and (4) holds with $B_n=O(1/(nh_n^d))$.

Example 3 (continued). If $h_n\to0$ and if $nh_n^d\to\infty$, then (9) holds with $\psi(z)=g_0(z,f_0)+\psi_0(x)$, where
$$\psi_0(x)=-f_{y|x}[f_0(x)|x]f_0(x)+\int_{\mathbb{R}^d}f_{y|x}[f_0(x)|x]f_0(x)^2dx.$$
Also, (10) is satisfied if $h_n\to0$ and if $nh_n^d\to\infty$. As a consequence, Condition (AN) is satisfied with $\Sigma=E[\psi(z)\psi(z)']$ if $h_n\to0$ and if $nh_n^d\to\infty$. If also $nh_n^{2P}\to0$, then $\Lambda_n^S=O(h_n^P)=o(n^{-1/2})$ and
$$\Lambda_n^{LI}=-\frac{K(0)}{nh_n^d}\int_{\mathbb{R}^d}f_{y|x}[f_0(x)|x]f_0(x)dx+o(n^{-1/2}),$$
$$\Lambda_n^{NL}=-\frac{1}{nh_n^d}\frac12\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\dot f_{y|x}[f_0(x)|x]K(u)^2f_0(x)f_0(x-uh_n)\,dx\,du+o(n^{-1/2}),$$
where $\Lambda_n^{NL}$ can be shown to admit a polynomial-in-$h_n$ expansion of the form
$$\Lambda_n^{NL}=\frac{1}{nh_n^d}[B_0+B_1h_n^2+B_2h_n^4+\ldots],$$
the constants $B_0,B_1,B_2,\ldots$ being functionals of $K$ and the data generating process.

In summary, if $nh_n^{2P}\to0$ and if $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$, then the conditions of Theorem 2 are satisfied and (4) holds with $B_n=O(1/(nh_n^d))$.

The findings for Examples 1 and 3 are in qualitative agreement with those for Example 2. For the model studied in Example 2, Cattaneo et al. (2013, Section 3.3) went slightly further and used a bias expansion to develop a bandwidth selection method. The bandwidths chosen by that procedure are of order $n^{-1/(P+d)}$ and were found by Cattaneo et al. (2013) to perform well. It seems plausible that analogous bandwidth selection procedures can be developed for Examples 1 and 3 (and perhaps even in fairly complete generality).

Remark.
From the perspective of this paper the "leave-one-out" estimators $\hat\gamma_{n,i}$ are simply a technical device used when analyzing the properties of the estimator $\hat\theta_n$ defined in terms of $\hat\gamma_n$. An estimator $\tilde\theta_n$ (say) naturally formulated in terms of $\hat\gamma_{n,i}$ is the "leave-one-out" counterpart of $\hat\theta_n$ defined as an approximate minimizer (with respect to $\theta\in\Theta$) of $\tilde G_n(\theta)'W\tilde G_n(\theta)$, where
$$\tilde G_n(\theta)=\frac{1}{n}\sum_{i=1}^ng[z_i,\theta,\hat\gamma_{n,i}(\cdot,\theta)].$$
Under conditions similar to those imposed when analyzing $\hat\theta_n$, the methodology developed in this paper can be used to show that $\tilde\theta_n$ satisfies (4) with $B_n=J\Lambda_n^{NL}$. In particular, the estimator $\tilde\theta_n$ does not suffer from "leave in" bias. Although $\tilde\theta_n$ itself is immune to a bias problem attributable to a failure of stochastic equicontinuity, this conclusion is not obtainable by means of methods relying on stochastic equicontinuity conditions, so from a technical point of view little simplicity is gained by analyzing $\tilde\theta_n$ instead of $\hat\theta_n$. Moreover, it turns out that the bootstrap consistency result (3) typically fails for $\tilde\theta_n$, so there are good practical reasons for focusing on $\hat\theta_n$ when part of the goal is to obtain constructive results. For this reason we do not analyze $\tilde\theta_n$ in this paper.

5. Bootstrap Consistency

One consequence of replacing (1) with (4) is that the statistics $\sqrt n(\hat\theta_n-\theta_0)$ cease to be tight. Proving bootstrap consistency without existence of limiting distributions (or even tightness) can be difficult in general (e.g., Radulovic (1998)), but thankfully the present setting has enough structure to enable us to give a simple characterization of bootstrap consistency. Indeed, in light of (5) the following condition is (necessary and) sufficient for (3) under (4):
$$\sqrt n(\hat\theta_n^*-\hat\theta_n-B_n)\rightsquigarrow_pN(0,\Sigma),\tag{11}$$
where $\rightsquigarrow_p$ denotes weak convergence in probability.
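As a concrete illustration of what (11) delivers in practice, here is a minimal sketch of the nonparametric bootstrap percentile interval for the Example 1 estimator $\hat\theta_n=n^{-1}\sum_i\hat f_n(x_i)$; the Gaussian kernel, bandwidth, number of replications, and data generating process are our illustrative choices:

```python
import numpy as np

def theta_hat(x, h):
    """Leave-in estimator of E[f_0(x)] (Example 1 form), Gaussian kernel."""
    n, d = x.shape
    u = (x[:, None, :] - x[None, :, :]) / h
    k = np.exp(-0.5 * (u ** 2).sum(axis=2)) / ((2 * np.pi) ** (d / 2) * h ** d)
    return k.sum() / n ** 2

rng = np.random.default_rng(1)
n, h, B = 200, 0.3, 199
x = rng.normal(size=(n, 1))
t = theta_hat(x, h)

boot = np.empty(B)
for b in range(B):                        # draw z_1*, ..., z_n* with replacement
    boot[b] = theta_hat(x[rng.integers(0, n, size=n)], h)

# percentile interval: no explicit bias correction is applied, because the
# recomputed statistic automatically reproduces the bias B_n, as in (11)
lo, hi = np.quantile(boot, [0.025, 0.975])
```

The design choice to recompute $\hat\theta_n$ itself, rather than a bias-corrected or studentized version, mirrors the paper's point that the bootstrap corrects for the bias automatically.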
This characterization is very useful for our purposes because it turns out that in many cases verification of (3) and (4) can proceed by first proving (4) and then, by imitating that proof, demonstrating (11). For example, the next result is a bootstrap analogue of Theorem 2. To state it, let $\hat\theta_n^*$ be an (approximate) minimizer of $\hat G_n^*(\theta,\hat\gamma_n^*)'W\hat G_n^*(\theta,\hat\gamma_n^*)$, where
$$\hat G_n^*(\theta,\gamma)=\frac{1}{n}\sum_{i=1}^ng[z_i^*,\theta,\gamma(\cdot,\theta)],\qquad\hat\gamma_n^*(z,\theta)=\frac{1}{n}\sum_{j=1}^nw(z_j^*,\theta)K_n[x(z,\theta)-x(z_j^*,\theta)],$$
and $z_1^*,\ldots,z_n^*$ is a random sample with replacement from $z_1,\ldots,z_n$. Also, let
$$\hat\gamma_{n,i}^*(z,\theta)=\frac{1}{n-1}\sum_{j=1,j\neq i}^nw(z_j^*,\theta)K_n[x(z,\theta)-x(z_j^*,\theta)]\qquad(i=1,\ldots,n).$$

Theorem 7. Suppose the assumptions of Theorem 2 are satisfied and that:

(AL*) for the same $J$ as in (AL),
$$\sqrt n(\hat\theta_n^*-\hat\theta_n)=J\frac{1}{\sqrt n}\sum_{i=1}^ng_n^*(z_i^*,\hat\gamma_{n,i}^*)+o_p(1),$$
where $g_n^*(z,\gamma)=g[z,\hat\theta_n,n^{-1}w(z,\hat\theta_n)K_n[x(\cdot,\hat\theta_n)-x(z,\hat\theta_n)]+(1-n^{-1})\gamma(\cdot,\hat\theta_n)]$;

(AS*) for some function $\bar g_n^*$,
$$\frac{1}{\sqrt n}\sum_{i=1}^n[g_n^*(z_i^*,\hat\gamma_{n,i}^*)-\bar g_n^*(z_i^*,\hat\gamma_{n,i}^*)]\to_p0$$
and
$$\frac{1}{\sqrt n}\sum_{i=1}^n[\bar g_n^*(z_i^*,\hat\gamma_{n,i}^*)-\bar G_n^*(\hat\gamma_{n,i}^*)-\bar g_n^*(z_i^*,\hat\gamma_n)+\bar G_n^*(\hat\gamma_n)]\to_p0,$$
where $\bar G_n^*(\gamma)=n^{-1}\sum_{i=1}^n\bar g_n^*(z_i,\gamma)$;

(AN*) for the same $\Lambda_n$ as in (AN),
$$\frac{1}{\sqrt n}\sum_{i=1}^n[\bar g_n^*(z_i^*,\hat\gamma_n)+\bar G_n^*(\hat\gamma_{n,i}^*)-\bar G_n^*(\hat\gamma_n)-\Lambda_n]\rightsquigarrow_pN(0,\Sigma).$$

Then (3) holds.

The conditions of Theorem 7 are natural bootstrap analogs of the corresponding conditions of Theorem 2 not only in appearance but also in the sense that they can usually be verified by mimicking the verification of the corresponding conditions of Theorem 2. Like (AL), Condition (AL*) is simply a representation in many important cases, including Examples 1-3. More generally, Condition (AL*) can often be verified with the help of a bootstrap analog of Lemma 3 given in Section 7.2.

Turning next to Conditions (AS*) and (AN*), suppose $\bar g_n^*$ satisfies
$$\bar g_n^*(z,\gamma)=g_n^*(z,\hat\gamma_n)+g_{n,\gamma}^*(z)[\gamma-\hat\gamma_n]+\frac12g_{n,\gamma\gamma}^*(z)[\gamma-\hat\gamma_n,\gamma-\hat\gamma_n],$$
where $g_{n,\gamma}^*(z)[\cdot]$ is linear and $g_{n,\gamma\gamma}^*(z)[\cdot,\cdot]$ is bilinear.
In perfect analogy with Lemma 4, the second part of Condition (AS*) is satisfied provided the following conditions hold:
$$V^*(g_{n,\gamma}^*(z_1^*)[\delta_{n,2}^*])=o_p(n),\qquad V^*(g_{n,\gamma\gamma}^*(z_1^*)[\delta_{n,2}^*,\delta_{n,2}^*])=o_p(n^3),$$
$$V^*(g_{n,\gamma\gamma}^*(z_1^*)[\delta_{n,2}^*,\delta_{n,3}^*])=o_p(n^2),\qquad V^*[E^*(g_{n,\gamma\gamma}^*(z_1^*)[\delta_{n,2}^*,\delta_{n,2}^*]|z_1^*)]=o_p(n^2),$$
where $\delta_{n,j}^*(z,\theta)=w(z_j^*,\theta)K_n[x(z,\theta)-x(z_j^*,\theta)]-\hat\gamma_n(z,\theta)$, $E^*(\cdot)=E(\cdot|z_1,\ldots,z_n)$, and $V^*(\cdot)=V(\cdot|z_1,\ldots,z_n)$.

As in the case of the first part of Condition (AS), the ease with (and methods by) which the first part of Condition (AS*) can be verified with a "quadratic" $\bar g_n^*$ depends on the specifics of the example, but verification by imitation is usually straightforward once verification of the first part of Condition (AS) has been achieved.

Finally, we have
$$\frac{1}{\sqrt n}\sum_{i=1}^n[g_n^*(z_i^*,\hat\gamma_n)+\bar G_n^*(\hat\gamma_{n,i}^*)-\bar G_n^*(\hat\gamma_n)]=\frac{1}{\sqrt n}\sum_{i=1}^n\psi_n^*(z_i^*)+\sqrt nB_n^*,$$
where, defining
$$\psi_n^*(z)=g_n^*(z,\hat\gamma_n)-\frac{1}{n}\sum_{i=1}^ng_n^*(z_i,\hat\gamma_n)+G_{n,\gamma}^*[w(z,\hat\theta_n)K_n(x(\cdot,\hat\theta_n)-x(z,\hat\theta_n))-\hat\gamma_n(\cdot,\hat\theta_n)],$$
we have
$$B_n^*=\frac{1}{n}\sum_{i=1}^ng_n^*(z_i,\hat\gamma_n)+\frac12\frac{1}{n}\sum_{i=1}^nG_{n,\gamma\gamma}^*[\hat\gamma_{n,i}^*-\hat\gamma_n,\hat\gamma_{n,i}^*-\hat\gamma_n].$$
If Condition (AN) holds, then so does Condition (AN*) if
$$\frac{1}{n}\sum_{i=1}^n\|\psi_n^*(z_i)-\psi_n(z_i)\|^2\to_p0$$
and if
$$V^*(G_{n,\gamma\gamma}^*[\delta_{n,1}^*,\delta_{n,1}^*])=o_p(n^2),\qquad V^*(G_{n,\gamma\gamma}^*[\delta_{n,1}^*,\delta_{n,2}^*])=o_p(n),\qquad E^*(B_n^*)=\Lambda_n+o_p(n^{-1/2}).$$
These conditions can usually be verified using straightforward, but possibly tedious, moment calculations for $U$-statistics.

In our examples it turns out that the conditions of Theorem 7 can be verified under the same bandwidth conditions as those used when verifying the conditions of Theorem 2. As a consequence, assuming for specificity that the bandwidth is of the form $h_n=Cn^{-1/\rho}$ (where $C>0$ and $\rho>0$ are user-chosen constants), the examples allow us to draw the following conclusions regarding the "robustness" of inference procedures with respect to bandwidth choice.
Asymptotic validity of (the Ibragimov and Müller (2010) procedure, "Efron" confidence intervals, and) standard confidence intervals motivated by the distributional approximation $\sqrt n(\hat\theta_n-\theta_0)\mathrel{\dot\sim}N(0,\hat\Sigma_n)$ requires $\rho\in(2d,2P)$, while asymptotic validity of subsampling-based confidence intervals requires $\rho\in[2d,2P)$. (With $h_n=Cn^{-1/\rho}$ we have $\sqrt nB_n=O(n^{d/\rho-1/2})\to0$ if and only if $\rho>2d$, while $\sqrt n\Lambda_n^S=O(n^{1/2-P/\rho})\to0$ if and only if $\rho<2P$.) On the other hand, there is an example-specific constant $\bar\rho<2d$ such that the bootstrap-based percentile confidence intervals are asymptotically valid whenever $\rho\in(\bar\rho,2P)$. The constant $\bar\rho$ equals $d$ in Example 1, and it follows from the results presented above that $\bar\rho$ is no greater than $\frac32d+1$ and $\frac32d$ in Examples 2 and 3, respectively.$^{10}$ In other words, the range of $\rho$-values for which bootstrap-based inference procedures enjoy asymptotic validity is wider than the range for which its main rivals are valid. As a result, there is a formal sense in which bootstrap-based inference procedures are more "robust" with respect to bandwidth choice than their main rivals.

$^{10}$ The results of Cattaneo, Crump, and Jansson (2014) suggest that $\bar\rho=d$ cannot be improved in Example 1. In contrast, for Examples 2 and 3 we conjecture that even smaller values of $\bar\rho$ can be obtained with some additional effort.

Remarks. (i) We deliberately study only the simplest version of the bootstrap. As in Hahn (1996), doing so is sufficient when the goal is to establish first-order asymptotic validity, but we conjecture that results analogous to Theorem 7 can be obtained for various modifications of the simple nonparametric bootstrap, including those proposed by Brown and Newey (2002) and Hall and Horowitz (1996) to handle overidentified models.

(ii) Similarly, to highlight the fact that asymptotic pivotality plays no role in our theory, we use the bootstrap to approximate the distribution of $\sqrt n(\hat\theta_n-\theta_0)$ rather than a studentized version thereof.
(iii) The condition $n^{-1}\sum_{i=1}^n\|\psi_n^*(z_i)-\psi_n(z_i)\|^2\to_p0$ implies in particular that if (9) holds, then
$$\hat\Sigma_n=\frac{1}{n}\sum_{i=1}^n\psi_n^*(z_i)\psi_n^*(z_i)'\to_pE[\psi(z)\psi(z)']=\Sigma.$$
Although the estimator $\hat\Sigma_n$ emerges here as a by-product of our analysis of the bootstrap, it may be of interest to note that it can be interpreted as a variant of the "delta-method" variance estimator of Newey (1994b).

(iv) A small simulation study based on Example 3 produced results (not reported here) consistent with one of the main predictions of the theory developed in this paper, namely that in samples of moderate size the coverage probabilities of $CI_{E,n}$ (i.e., the "Efron" interval) and $CI_{P,n}$ (i.e., the "percentile" interval) can differ substantially and that when this occurs
$$P[\theta_0\in CI_{E,n}]<P[\theta_0\in CI_n]<P[\theta_0\in CI_{P,n}],$$
where in most cases $P[\theta_0\in CI_n]\approx0.95$.

6. Conclusion

This paper has developed "small bandwidth" asymptotic results for a class of two-step semiparametric estimators previously studied by Newey and McFadden (1994) and others. The first main result, Theorem 2, differs from those obtained in earlier work on semiparametric two-step estimators by accommodating a non-negligible bias, and a noteworthy feature of the assumptions of Theorem 2 is that reliance on a commonly employed stochastic equicontinuity condition is avoided. The second main result, Theorem 7, shows that the bootstrap provides an automatic method of correcting for the bias even when it is non-negligible.

The findings of this paper are pointwise in two distinct respects. First, the distribution of observables is held fixed when developing large sample theory. Second, the results are obtained for a fixed bandwidth sequence.
It would be of interest to develop uniform versions of Theorems 2 and 7 (along the lines of Romano and Shaikh (2012) and Einmahl and Mason (2005), respectively). Although the size of the class of estimators covered by our results is nontrivial it would be of interest to explore whether conclusions analogous to ours can be obtained for semiparametric two-step estimators whose …rst step involves other types of nonparametric estimators (e.g., local polynomial or sieve estimators of M -regression functions, possibly after model selection as in Belloni, Chernozhukov, Fernández-Val, Bootstrapping Semiparametric Estimators 39 and Hansen (2013)). In this paper we focus on kernel-based estimators because of their analytical tractability, but we conjecture that our results (and the methods by which they are obtained) can be extended to cover other nonparametric …rst-step estimators. In future work we intend to attempt to substantiate this conjecture. Bootstrapping Semiparametric Estimators 40 References Abadie, A., and G. W. Imbens (2006): “Large Sample Properties of Matching Estimators for Average Treatment E¤ects,”Econometrica, 74, 235–267. (2008): “On the Failure of the Bootstrap for Matching Estimators,”Econometrica, 76, 1537–1557. (2011): “Bias-Corrected Matching Estimators for Average Treatment Effects,”Journal of Business and Economic Statistics, 29, 1–11. Andrews, D. W. K. (1994a): “Asymptotics for Semiparametric Econometric Models via Stochastic Equicontinuity,”Econometrica, 62, 43–72. (1994b): “Empirical Process Methods in Econometrics,” in Handbook of Econometrics, Volume 4, ed. by R. F. Engle, and D. L. McFadden. New York: North Holland, 2247-2294. Belloni, A., V. Chernozhukov, I. Fernández-Val, and C. Hansen (2013): “Program Evaluation with High-Dimensional Data,”Working paper, MIT. Bickel, P. J., and B. Li (2006): “Regularization in Statistics,”Test, 15, 271–344. Brown, B. W., and W. K. 
Newey (2002): “Generalized Method of Moments, E¢ cient Bootstrapping, and Improved Inference,”Journal of Business and Economic Statistics, 20, 507–517. Cattaneo, M. D., R. K. Crump, and M. Jansson (2013): “Generalized Jackknife Estimators of Weighted Average Derivatives (with Discussion and Rejoinder),” Journal of the American Statistical Association, 108, 1243–1268. (2014): “Bootstrapping Density-Weighted Average Derivatives,”Econometric Theory, forthcoming. Bootstrapping Semiparametric Estimators 41 Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” in Handbook of Econometrics, Volume 6B, ed. by J. J. Heckman, and E. E. Leamer. New York: North Holland, 5549-5632. Chen, X., O. Linton, and I. van Keilegom (2003): “Estimation of Semiparametric Models When the Criterion Function is Not Smooth,” Econometrica, 71, 1591–1608. Cheng, G., and J. C. Huang (2010): “Bootstrap Consistency for General Semiparametric M -Estimation,”Annals of Statistics, 38, 2884–2915. Einmahl, U., and D. M. Mason (2005): “Uniform in Bandwidth Consistency of Kernel-Type Function Estimators,”Annals of Statistics, 33, 1380–1403. Hahn, J. (1996): “A Note on Bootstrapping Generalized Method of Moments Estimators,”Econometric Theory, 12, 187–197. Hall, P., and J. L. Horowitz (1996): “Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators,”Econometrica, 64, 891–916. Hausman, J. A., and W. K. Newey (1995): “Nonparametric Estimation of Exact Consumers Surplus and Deadweight Loss,”Econometrica, 63, 1445–1476. Ibragimov, R., and U. K. Müller (2010): “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 28, 453–468. Ichimura, H., and O. Linton (2005): “Asymptotic Expansions for some Semiparametric Program Evaluation Estimators,” in Identi…cation and Inference in Econometric Models: Essays in Honor of Thomas J. Rothenberg, ed. by D. W. K. Andrews, and J. H. Stock. 
New York: Cambridge University Press, 149-170. Bootstrapping Semiparametric Estimators 42 Ichimura, H., and P. E. Todd (2007): “Implementing Nonparametric and Semiparametric Estimators,” in Handbook of Econometrics, Volume 6B, ed. by J. J. Heckman, and E. E. Leamer. New York: North Holland, 5369-5468. Linton, O. (1995): “Second Order Approximation in the Partially Linear Regression Model,”Econometrica, 63, 1079–1112. Mammen, E. (1989): “Asymptotics with Increasing Dimension for Robust Regression with Applications to the Bootstrap,”Annals of Statistics, 17, 382–400. Newey, W. K. (1990): “E¢ cient Instrumental Variables Estimation of Nonlinear Models,”Econometrica, 58, 809–837. (1994a): “The Asymptotic Variance of Semiparametric Estimators,”Econometrica, 62, 1349–1382. (1994b): “Kernel Estimation of Partial Means and a General Variance Estimator,”Econometric Theory, 10, 233–253. Newey, W. K., F. Hsieh, and J. M. Robins (2004): “Twicing Kernels and a Small Bias Property of Semiparametric Estimators,”Econometrica, 72, 947–962. Newey, W. K., and D. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,”in Handbook of Econometrics, Volume 4, ed. by R. F. Engle, and D. L. McFadden. New York: North Holland, 2111-2245. Nishiyama, Y., and P. M. Robinson (2000): “Edgeworth Expansions for Semiparametric Averaged Derivatives,”Econometrica, 68, 931–979. (2001): “Studentization in Edgeworth Expansions for Estimates of Semiparametric Index Models,” in Nonlinear Statistical Modeling: Essays in Honor of Bootstrapping Semiparametric Estimators 43 Takeshi Amemiya, ed. by C. Hsiao, K. Morimune, and J. L. Powell. New York: Cambridge University Press, 197-240. (2005): “The Bootstrap and the Edgeworth Correction for Semiparametric Averaged Derivatives,”Econometrica, 73, 903–948. Pakes, A., and D. Pollard (1989): “Simulation and the Asymptotics of Optimization Estimators,”Econometrica, 57, 1027–1057. Politis, D. N., and J. P. 
Romano (1994): "Large Sample Confidence Regions Based on Subsamples under Minimal Assumptions," Annals of Statistics, 22, 2031–2050.

Powell, J. L., J. H. Stock, and T. M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica, 57, 1403–1430.

Radulovic, D. (1998): "Can We Bootstrap Even if CLT Fails?," Journal of Theoretical Probability, 11, 813–830.

Romano, J. P., and A. M. Shaikh (2012): "On the Uniform Asymptotic Validity of Subsampling and the Bootstrap," Annals of Statistics, 40, 2798–2822.

van der Vaart, A. W. (1998): Asymptotic Statistics. New York: Cambridge University Press.

7. Appendix I: Additional Results

7.1. Orthogonality. In the case where the nonparametric estimator $\hat{\gamma}_n$ satisfies $\hat{\gamma}_n - \gamma_0 = o_p(n^{-1/4})$, sufficient conditions for the "orthogonality" condition (7) to hold have been given by Newey (1994a), among others. For completeness, this subsection discusses conditions for (7) to hold also in the case where only $\hat{\gamma}_n - \gamma_0 = o_p(n^{-1/6})$ is assumed. In this case $g_0$ will typically admit linear and bilinear functionals $g_{0,\gamma}(z)[\cdot]$ and $g_{0,\gamma\gamma}(z)[\cdot,\cdot]$ such that the first part of Condition (SE) holds with
\[
g_0(z,\gamma) = g_0(z,\gamma_0) + g_{0,\gamma}(z)[\gamma - \gamma_0] + \frac{1}{2}\, g_{0,\gamma\gamma}(z)[\gamma - \gamma_0,\, \gamma - \gamma_0].
\]
Under mild additional moment conditions, (7) then holds if
\[
E\, g_{0,\gamma}(z)[\gamma - \gamma_0] = 0 \tag{12}
\]
and if
\[
E\, g_{0,\gamma\gamma}(z)[\gamma - \gamma_0,\, \gamma - \gamma_0] = 0. \tag{13}
\]
Both (12) and (13) are usually straightforward to verify (when they hold). For instance, verifying the conditions is easy when $g(z,\theta,\gamma) = A(x(z),\theta,\gamma)\, m(z,\theta)$, where $m(z,\theta)$ is a residual satisfying $E[m(z,\theta_0)\,|\,x(z)] = 0$ and $A(x(z),\theta,\gamma)$ is a matrix of (optimal) instruments depending on $\theta$ and $\gamma$ (e.g., Newey (1990)). The condition (12) is similar to Newey (1994a, Proposition 3).
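To make the instrument example concrete, the verification of (12) and (13) in that case takes one line (a sketch in our notation, with $\eta = \gamma - \gamma_0$ a generic direction):

```latex
\begin{align*}
% m(z,\theta) does not depend on \gamma, so the functional derivatives act on A alone:
g_{0,\gamma}(z)[\eta] &= A_{\gamma}(x(z),\theta_0,\gamma_0)[\eta]\, m(z,\theta_0), \\
g_{0,\gamma\gamma}(z)[\eta,\eta] &= A_{\gamma\gamma}(x(z),\theta_0,\gamma_0)[\eta,\eta]\, m(z,\theta_0), \\
\intertext{and, by iterated expectations together with $E[m(z,\theta_0)\mid x(z)] = 0$,}
E\, g_{0,\gamma}(z)[\eta] &= E\big(A_{\gamma}(x(z),\theta_0,\gamma_0)[\eta]\; E[m(z,\theta_0)\mid x(z)]\big) = 0, \\
E\, g_{0,\gamma\gamma}(z)[\eta,\eta] &= E\big(A_{\gamma\gamma}(x(z),\theta_0,\gamma_0)[\eta,\eta]\; E[m(z,\theta_0)\mid x(z)]\big) = 0,
\end{align*}
```

so (12) and (13) both hold. This presumes the derivatives of $A$ in $\gamma$ have enough moments; when $m$ itself depends on $\gamma$, extra terms appear and, as noted below, (13) can fail even though (12) holds.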
Under this condition, the "correction term" of Newey (1994a) is zero, and (12) therefore implies that the presence of the nonparametric estimator has no impact on the asymptotic variance of $\hat{\theta}_n$. Unlike (12), (13) has no counterpart in Newey (1994a), but is needed when the condition $\hat{\gamma}_n - \gamma_0 = o_p(n^{-1/4})$ is relaxed. The condition (13) implies that $B_n$ in (4) can be set equal to zero even under "small bandwidth" asymptotics. As a consequence, (13) can be thought of as giving a condition under which the presence of the nonparametric estimator has no impact on the asymptotic bias of $\hat{\theta}_n$. It is easy to give examples where (12) holds, but (13) and (7) do not. As a consequence, having a "correction term" equal to zero is only a necessary condition for (7) to hold when $\hat{\gamma}_n - \gamma_0 \neq o_p(n^{-1/4})$. In particular, even estimators having the "small bias property" discussed by Newey, Hsieh, and Robins (2004) can suffer from the bias problems highlighted in this paper.

7.2. Remarks on Lemma 3. The assumptions of Lemma 3 are comparable to those of Pakes and Pollard (1989, Theorem 3.3). As in Pakes and Pollard (1989, Theorem 3.3), one of the conditions of Lemma 3 is a consistency condition on $\hat{\theta}_n$. That condition can usually be verified by adapting Pakes and Pollard (1989, Corollary 3.2) to our setup. The condition $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$ has no counterpart in Pakes and Pollard (1989, Theorem 3.3), but is satisfied in the cases of primary interest in this paper and is therefore included as an assumption in Lemma 3. Our conditions (i) and (v) correspond exactly to the analogous conditions in Pakes and Pollard (1989, Theorem 3.3) (see footnote 11). Condition (iv) of Lemma 3 is weaker than its counterpart in Pakes and Pollard (1989, Theorem 3.3), a weakening necessitated by the fact that in the cases of main interest in this paper only an assumption of the form $\hat{G}_n(\theta_0,\hat{\gamma}_n) = o_p(n^{-1/3})$ can be justified.
To compensate for this weakening, conditions (ii) and (iii) of Lemma 3 are somewhat stronger than the natural counterparts of conditions (ii) and (iii) of Pakes and Pollard (1989, Theorem 3.3). [Footnote 11: As in Pakes and Pollard (1989, pp. 1044–1046), it is possible to relax the assumption that the matrix $W$ in (i) is nonrandom. To be specific, the conclusion of Lemma 3 remains unaffected if $W$ is replaced with an estimator $\hat{W}_n$ satisfying $\hat{W}_n = W + o_p(n^{-1/6})$.] For instance, while the natural counterpart of condition (ii) of Pakes and Pollard (1989, Theorem 3.3) imposes only a full rank condition on $\dot{G}_0' W \dot{G}_0$, our condition (ii) also includes additional (mild) smoothness requirements on $G$. Translated into our notation, the natural counterpart of condition (iii) of Pakes and Pollard (1989, Theorem 3.3) is intended to ensure that
\[
\|\hat{G}_n(\hat{\theta}_n,\hat{\gamma}_n) - G(\hat{\theta}_n,\gamma_0) - \hat{G}_n(\theta_0,\hat{\gamma}_n) + G(\theta_0,\gamma_0)\| = o_p(n^{-1/2}).
\]
Under our condition (ii), the $\delta = 0$ version of our condition (iii) achieves the same goal when $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$ and $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$. The $\delta = 1/3$ version of condition (iii) has no obvious counterpart in Pakes and Pollard (1989, Theorem 3.3), but can be interpreted as a weakened version of a condition similar to Cheng and Huang (2010, Condition S1). This condition is used here to handle situations where $\hat{G}_n(\hat{\theta}_n,\hat{\gamma}_n) \neq o_p(n^{-1/2})$, as typically happens when $\theta_0 \in \mathbb{R}^k$ is "overidentified" in the sense that the dimension of $g$ exceeds $k$. When $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$ and $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$, a seemingly more elegant way of arriving at the displayed result would be to set $\varepsilon_n = o(n^{-1/6})$ and assume the following variant of Chen et al.
(2003)'s Condition (2.5′) (see footnote 12):

(SE′) for every positive $\varepsilon_n = o(n^{-1/3})$,
\[
\sup_{\theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n} \|\hat{G}_n(\theta,\gamma) - G(\theta,\gamma) - \hat{G}_n(\theta_0,\gamma_0) + G(\theta_0,\gamma_0)\| = o_p(n^{-1/2}).
\]

[Footnote 12: By assumption, $G(\theta_0,\gamma_0) = 0$. The purpose of retaining $G(\theta_0,\gamma_0)$ in the formulation of Condition (SE′) is to facilitate comparison with condition (iii) of Lemma 3. We use the label (SE′) in recognition of the fact that the condition is stronger than Condition (SE) whenever $\hat{\gamma}_n \in \Gamma_n$ (with probability approaching one).]

An implication of the discussion in Section 4.2 is that Condition (SE′) is violated in the examples of interest in this paper. In contrast, because the stochastic equicontinuity condition (iii) of Lemma 3 employs a formulation in which the second argument of each function inside the norm is the same, the condition often reduces to a condition concerning the fluctuations of a process indexed by the finite-dimensional parameter $\theta$, suggesting that it might be verifiable under weaker conditions than Condition (SE′), which concerns the fluctuations of a process which depends on the infinite-dimensional parameter $\gamma$ as well. Indeed, condition (iii) of Lemma 3 is automatically satisfied (with the $o_p(n^{-1/2-\delta/2})$ term equal to zero) in Examples 1–3. [Footnote 13: Because Condition (AL) itself is automatically satisfied in Examples 1–3, there is no need to verify it using Lemma 3.] More generally, the condition can often be verified by setting $s = 1/2$ and $\varepsilon_n = o(n^{-1/6})$ and verifying the condition of the following lemma.

Lemma 8. Suppose that
\[
\lim_{n\to\infty} \lim_{\varepsilon \downarrow 0}\, \sup_{\theta \in \Theta^{\partial}} \varepsilon^{-2s} E[F_n(z;\theta,\varepsilon)^2] < \infty
\]
for some neighborhood $\Theta^{\partial}$ of $\theta_0$ and some $s > 0$, where
\[
F_n(z;\theta,\varepsilon) = \sup_{\tilde{\theta} \in \Theta(\varepsilon;\theta),\, \gamma \in \Gamma_n} \|g(z,\tilde{\theta},\gamma) - g(z,\theta,\gamma)\|.
\]
Then, for every $\delta \geq 0$ and for every positive $\varepsilon_n = o(n^{-\delta})$,
\[
\sup_{\theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n} \|\hat{G}_n(\theta,\gamma) - G(\theta,\gamma) - \hat{G}_n(\theta_0,\gamma) + G(\theta_0,\gamma)\| = o_p(n^{-1/2}\varepsilon_n^{s}) \tag{14}
\]
and
\[
\sup_{\theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n} \|\hat{G}_n^*(\theta,\gamma) - G(\theta,\gamma) - \hat{G}_n^*(\theta_0,\gamma) + G(\theta_0,\gamma)\| = o_p(n^{-1/2}\varepsilon_n^{s}). \tag{15}
\]
Nevertheless, given the failure of Condition (SE′) in these examples, it seems comforting that condition (iii) of Lemma 3 is satisfied in the examples.

Like Chen et al. (2003, Theorem 3), Lemma 8 is in the spirit of Andrews (1994b, Section 5). The lemma is obtained by bounding the $L_2$ bracketing numbers of the classes
\[
\mathcal{F}_n = \{g(\cdot,\theta,\gamma) - g(\cdot,\theta_0,\gamma) : \theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n\}.
\]
These bracketing numbers are often considerably smaller in magnitude than the bracketing numbers of classes such as $\{g(\cdot,\theta,\gamma) : \theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n\}$, the latter being the classes of interest when verifying Condition (SE′). In Lemma 8, (15) is a natural bootstrap counterpart of (14). It is stated in anticipation of condition (iii*) of the following bootstrap analog of Lemma 3.

Lemma 9. Suppose that $\hat{\theta}_n^* - \hat{\theta}_n = o_p(1)$ and $\|\hat{\gamma}_n^* - \hat{\gamma}_n\| = o_p(n^{-1/6})$, that the assumptions of Lemma 3 are satisfied, and that:

(i*) $\hat{G}_n^*(\hat{\theta}_n^*,\hat{\gamma}_n^*)' W \hat{G}_n^*(\hat{\theta}_n^*,\hat{\gamma}_n^*) \leq \inf_{\theta} \hat{G}_n^*(\theta,\hat{\gamma}_n^*)' W \hat{G}_n^*(\theta,\hat{\gamma}_n^*) + o_p(n^{-1})$;

(iii*) for $\delta \in \{0, 1/3\}$ and for every positive $\varepsilon_n = o(n^{-\delta})$,
\[
\sup_{\theta \in \Theta(\varepsilon_n)} \|\hat{G}_n^*(\theta,\hat{\gamma}_n^*) - G(\theta,\hat{\gamma}_n^*) - \hat{G}_n^*(\theta_0,\hat{\gamma}_n^*) + G(\theta_0,\hat{\gamma}_n^*)\| = o_p(n^{-1/2-\delta/2});
\]

(iv*) $\hat{G}_n^*(\hat{\theta}_n,\hat{\gamma}_n^*) = o_p(n^{-1/3})$.

Then Condition (AL*) holds.

In perfect analogy with Lemma 3, an inspection of the proof of Lemma 9 shows that the $\delta = 1/3$ version of condition (iii*) is redundant when condition (i*) is strengthened to $\hat{G}_n^*(\hat{\theta}_n^*,\hat{\gamma}_n^*) = o_p(n^{-1/2})$, as is usually possible in the "just identified" case where $\theta_0$ and $g$ are of the same dimension.

7.3. Uniform Convergence Rates. Various results on uniform convergence rates for kernel estimators are used to verify the conditions of Theorems 2 and 7 in Examples 2 and 3. The results utilized are all special cases of Lemma 10 below.
Suppose that for every $n$, $Z_{in} = (W_{in}, X_i')'$ $(i = 1,\dots,n)$ are i.i.d. copies of $Z_n = (W_n, X')'$, where $X \in \mathbb{R}^d$ is continuous with bounded density $f_X(\cdot)$. The kernel estimators we consider are of the form
\[
\hat{\psi}_n(x) = \frac{1}{n}\sum_{j=1}^n W_{jn}\, \mathcal{K}_n(x - X_j), \qquad \mathcal{K}_n(x) = \frac{1}{h_n^d}\,\mathcal{K}\!\left(\frac{x}{h_n}\right),
\]
and
\[
\hat{\psi}_{n,i}(x) = \frac{1}{n-1}\sum_{j=1, j\neq i}^n W_{jn}\, \mathcal{K}_n(x - X_j) \quad (i = 1,\dots,n),
\]
where $h_n = o(1)$ is a bandwidth and $\mathcal{K} : \mathbb{R}^d \to \mathbb{R}$ is a bounded and integrable kernel-like function. Bootstrap analogs of these estimators are also of interest. Letting $\{Z_{1n}^*,\dots,Z_{nn}^*\}$ be a random sample with replacement from $\{Z_{1n},\dots,Z_{nn}\}$, define
\[
\hat{\psi}_n^*(x) = \frac{1}{n}\sum_{j=1}^n W_{jn}^*\, \mathcal{K}_n(x - X_j^*) \quad \text{and} \quad \hat{\psi}_{n,i}^*(x) = \frac{1}{n-1}\sum_{j=1, j\neq i}^n W_{jn}^*\, \mathcal{K}_n(x - X_j^*) \quad (i = 1,\dots,n).
\]
Defining $\psi_n(x) = E\hat{\psi}_n(x)$, the objective is to give conditions (on $h_n$, $\mathcal{K}$, and the distribution of $Z_n$) under which
\[
\max_{1\leq i\leq n} |\hat{\psi}_n(X_i) - \psi_n(X_i)| = O_p(\varrho_n), \tag{16}
\]
\[
\max_{1\leq i\leq n} |\hat{\psi}_{n,i}(X_i) - \psi_n(X_i)| = O_p(\varrho_n), \tag{17}
\]
\[
\max_{1\leq j\leq n} |\hat{\psi}_n^*(X_j) - \hat{\psi}_n(X_j)| = O_p(\varrho_n), \tag{18}
\]
\[
\max_{1\leq i,j\leq n} |\hat{\psi}_{n,i}^*(X_j) - \hat{\psi}_n(X_j)| = O_p(\varrho_n). \tag{19}
\]
To give a succinct statement, let $\mathrm{Gam}(\cdot)$ be the Gamma function and, for $s > 0$, let
\[
C(s) = \sup_n \left[ E(|W_n|^s) + \sup_{x \in \mathbb{R}^d} E(|W_n|^s \,|\, X = x)\, f_X(x) \right].
\]

Lemma 10. (a) If $C(S) < \infty$ for some $S \geq 2$ and if $n^{1-1/S} h_n^d / \log n \to \infty$, then (16)–(19) hold with $\varrho_n = \max(\sqrt{\log n/(n h_n^d)},\; \log n/(n^{1-1/S} h_n^d))$. (b) If $C(s) \leq \mathrm{Gam}(s) H^s$ for some $H < \infty$ and every $s$ and if $\lim_{n\to\infty} n h_n^d/(\log n)^3 > 0$, then (16)–(19) hold with $\varrho_n = \sqrt{\log n/(n h_n^d)}$. (c) If $C(s) \leq H^s$ for some $H < \infty$ and every $s$ and if $\lim_{n\to\infty} n h_n^d/\log n > 0$, then (16)–(19) hold with $\varrho_n = \sqrt{\log n/(n h_n^d)}$.

The condition $C(s) \leq H^s$ (for some $H < \infty$ and every $s$) is satisfied when $W_n$ is bounded (uniformly in $n$), so part (c) can be used to analyze $\hat{f}_n$ and its derivative, and we use this part in all of the examples. Part (b) covers certain distributions with full support (e.g., sub-Gaussian distributions), but is not used in our examples.
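The objects of Lemma 10 are easy to compute directly. The following minimal numerical sketch (our variable names throughout; a Gaussian product kernel and $W_n \equiv 1$ are illustrative assumptions, so part (c) applies) forms $\hat{\psi}_n$, its leave-one-out version, and the bootstrap analog, and inspects the maximum deviations in (16)–(18):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 500, 1, 0.3
X = rng.normal(size=(n, d))
W = np.ones(n)  # W_n bounded (uniformly in n), so part (c) of Lemma 10 applies

def kmat(x_eval, X, h):
    """Matrix of kernel weights K_n(x_eval_i - X_j) for a Gaussian product
    kernel (an illustrative choice; Lemma 10 only requires K bounded and
    integrable)."""
    d = X.shape[1]
    diff = (x_eval[:, None, :] - X[None, :, :]) / h
    return np.exp(-0.5 * np.sum(diff**2, axis=2)) / ((2 * np.pi) ** (d / 2) * h**d)

K = kmat(X, X, h)
full = K @ W / n                              # psi_hat_n(X_i)
loo = (K @ W - np.diag(K) * W) / (n - 1)      # psi_hat_{n,i}(X_i), leave-one-out

# Bootstrap analog: resample the (W_i, X_i) pairs with replacement and
# evaluate psi_hat_n^* at the original points X_j, as in (18).
idx = rng.integers(0, n, size=n)
full_star = kmat(X, X[idx], h) @ W[idx] / n

# With W = 1 and X ~ N(0, I), psi_n(x) = E K_n(x - X) is the N(0, 1 + h^2)
# density, so the maxima in (16)-(18) can be inspected against the rate
# rho_n of part (c).
psi_n = np.exp(-X[:, 0]**2 / (2 * (1 + h**2))) / np.sqrt(2 * np.pi * (1 + h**2))
rho_n = np.sqrt(np.log(n) / (n * h**d))
dev_16 = np.max(np.abs(full - psi_n))
dev_17 = np.max(np.abs(loo - psi_n))
dev_18 = np.max(np.abs(full_star - full))
```

At these sample sizes the three deviations are all of the same order as `rho_n`, in line with the lemma; shrinking `h` with `n * h**d / np.log(n)` bounded away from zero keeps them so.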
On the other hand, the $S = 4$ version of part (a) is used to verify Condition (AN*) in Example 2.

8. Appendix II: Proofs

8.1. Proof of Lemma 1. The proof is elementary:
\[
\sqrt{n}(\hat{\theta}_n - \theta_0) = J \frac{1}{\sqrt{n}} \sum_{i=1}^n g_0(z_i, \hat{\gamma}_n) + o_p(1)
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_0(z_i, \gamma_0) + g_0(z_i, \hat{\gamma}_n) - g_0(z_i, \gamma_0)] + o_p(1)
\]
\[
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_0(z_i, \gamma_0) + G_0(\hat{\gamma}_n) - G_0(\gamma_0)] + o_p(1) \rightsquigarrow N(0, J \Sigma J'),
\]
where the first equality uses Condition (AL), the second and third equalities use Condition (SE), and the last line uses Condition (AN0).

8.2. Proof of Theorem 2. The proof is elementary:
\[
\sqrt{n}(\hat{\theta}_n - \theta_n) = J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \hat{\gamma}_{n,i}) - \beta_n] + o_p(1)
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \gamma_n) + g_n(z_i, \hat{\gamma}_{n,i}) - g_n(z_i, \gamma_n) - \beta_n] + o_p(1)
\]
\[
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \gamma_n) + G_n(\hat{\gamma}_{n,i}) - G_n(\gamma_n) - \beta_n] + o_p(1) \rightsquigarrow N(0, J \Sigma J'),
\]
where the first equality uses Condition (AL), the second and third equalities use Condition (AS), and the last line uses Condition (AN).

8.3. Proof of Lemma 3. Let $\dot{G}_n = \dot{G}(\theta_0, \hat{\gamma}_n)$ and define $J_n = -(\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W$. Using $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$, (ii), and (iv),
\[
(J_n - J)\hat{G}_n(\theta_0, \hat{\gamma}_n) = o_p(n^{-1/2}).
\]
It therefore suffices to show that $\hat{\theta}_n - \theta_0 - J_n \hat{G}_n(\theta_0, \hat{\gamma}_n) = o_p(n^{-1/2})$. To do so, let $L_n(\theta) = \dot{G}_n' W [\hat{G}_n(\theta_0, \hat{\gamma}_n) + \dot{G}_n(\theta - \theta_0)]$. Because
\[
L_n(\hat{\theta}_n) = \dot{G}_n' W \dot{G}_n [\hat{\theta}_n - \theta_0 - J_n \hat{G}_n(\theta_0, \hat{\gamma}_n)]
\]
and because $\dot{G}_n' W \dot{G}_n \to_p \dot{G}_0' W \dot{G}_0 > 0$, it suffices to show that $L_n(\hat{\theta}_n) = o_p(n^{-1/2})$. Using $\hat{\theta}_n - \theta_0 = o_p(1)$, $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$, and (ii)–(iii) (with $\delta = 0$),
\[
\|\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - G(\hat{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\theta_0, \hat{\gamma}_n) + G(\theta_0, \hat{\gamma}_n)\| = o_p(n^{-1/2})
\]
and
\[
\|\dot{G}_n(\hat{\theta}_n - \theta_0) - G(\hat{\theta}_n, \hat{\gamma}_n) + G(\theta_0, \hat{\gamma}_n)\| = \|\hat{\theta}_n - \theta_0\|^2 O_p(1).
\]
As a consequence, by the triangle inequality and using $\|\dot{G}_n' W\| = O_p(1)$,
\[
\|L_n(\hat{\theta}_n)\| \leq \|\dot{G}_n' W\|\, \|\dot{G}_n(\hat{\theta}_n - \theta_0) - G(\hat{\theta}_n, \hat{\gamma}_n) + G(\theta_0, \hat{\gamma}_n)\|
+ \|\dot{G}_n' W\|\, \|G(\hat{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - G(\theta_0, \hat{\gamma}_n) + \hat{G}_n(\theta_0, \hat{\gamma}_n)\|
+ \|\dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)\|
\]
\[
= \|\hat{\theta}_n - \theta_0\|^2 O_p(1) + \|\dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)\| + o_p(n^{-1/2}),
\]
so it suffices to show that $\hat{\theta}_n - \theta_0$
$= o_p(n^{-1/3})$ and that $\dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) = o_p(n^{-1/2})$.

Proof of $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$: Condition (ii) implies the existence of a positive constant $C$ for which $\|\theta - \theta_0\| \leq C^{-1} \|G(\theta, \gamma_0)\|$ near $\theta_0$. Because $\hat{\theta}_n - \theta_0 = o_p(1)$, it therefore suffices to show that $\|G(\hat{\theta}_n, \gamma_0)\| = \|\hat{\theta}_n - \theta_0\|\, o_p(1) + o_p(n^{-1/3})$. Using (i) and (iv)–(v), we have
\[
\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) \leq \hat{G}_n(\theta_0, \hat{\gamma}_n)' W \hat{G}_n(\theta_0, \hat{\gamma}_n) + o_p(n^{-1}) = o_p(n^{-2/3}),
\]
implying in particular that $\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) = o_p(n^{-1/3})$. Also, using $\hat{\theta}_n - \theta_0 = o_p(1)$, $\|\hat{\gamma}_n - \gamma_0\| = o_p(n^{-1/6})$, and (ii)–(iii) (with $\delta = 0$),
\[
\|G(\hat{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - G(\theta_0, \hat{\gamma}_n) + \hat{G}_n(\theta_0, \hat{\gamma}_n)\| = o_p(n^{-1/2})
\]
and (with probability approaching one)
\[
\|G(\hat{\theta}_n, \gamma_0) - G(\hat{\theta}_n, \hat{\gamma}_n) + G(\theta_0, \hat{\gamma}_n)\| \leq \|\hat{\theta}_n - \theta_0\|\, o_p(1).
\]
Using these rates, the triangle inequality, and (iv),
\[
\|G(\hat{\theta}_n, \gamma_0)\| \leq \|G(\hat{\theta}_n, \gamma_0) - G(\hat{\theta}_n, \hat{\gamma}_n) + G(\theta_0, \hat{\gamma}_n)\|
+ \|G(\hat{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - G(\theta_0, \hat{\gamma}_n) + \hat{G}_n(\theta_0, \hat{\gamma}_n)\|
+ \|\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)\| + \|\hat{G}_n(\theta_0, \hat{\gamma}_n)\|
= \|\hat{\theta}_n - \theta_0\|\, o_p(1) + o_p(n^{-1/3}).
\]

Proof of $\dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) = o_p(n^{-1/2})$: Because $\dot{G}_n' W \dot{G}_n \to_p \dot{G}_0' W \dot{G}_0 > 0$, it suffices to show that
\[
\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \dot{G}_n (\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) = o_p(n^{-1}).
\]
To do so, let $\tilde{\theta}_n = \hat{\theta}_n + J_n \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)$, which satisfies $\tilde{\theta}_n - \theta_0 = o_p(n^{-1/3})$ because $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$ and $\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) = o_p(n^{-1/3})$. By (ii) and (iii) (with $\delta = 1/3$), we therefore have
\[
R_n = \hat{G}_n(\tilde{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - \dot{G}_n(\tilde{\theta}_n - \hat{\theta}_n) = o_p(n^{-2/3}).
\]
As a consequence, using (i) and (v), with probability approaching one,
\[
\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) \leq \hat{G}_n(\tilde{\theta}_n, \hat{\gamma}_n)' W \hat{G}_n(\tilde{\theta}_n, \hat{\gamma}_n) + o_p(n^{-1})
\]
\[
= \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) - \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \dot{G}_n (\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)
+ 2 R_n'[W - W \dot{G}_n (\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W] \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) + R_n' W R_n + o_p(n^{-1}),
\]
which rearranges as
\[
\hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)' W \dot{G}_n (\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n)
\leq 2 R_n'[W - W \dot{G}_n (\dot{G}_n' W \dot{G}_n)^{-1} \dot{G}_n' W] \hat{G}_n(\hat{\theta}_n, \hat{\gamma}_n) + R_n' W R_n + o_p(n^{-1}) = o_p(n^{-1}).
\]

8.4. Proof of Lemma 4. When $\bar{g}_n$ is defined as in the lemma, we have:
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar{g}_n(z_i, \hat{\gamma}_{n,i}) - G_n(\hat{\gamma}_{n,i}) - \bar{g}_n(z_i, \gamma_n) + G_n(\gamma_n)]
= \frac{1}{\sqrt{n}(n-1)} \sum_{i=1}^n \sum_{j=1, j\neq i}^n (g_{n,\gamma}(z_i)[\gamma_{n,j}] - G_{n,\gamma}[\gamma_{n,j}])
\]
\[
+ \frac{1}{2\sqrt{n}(n-1)^2} \sum_{i=1}^n \sum_{j=1, j\neq i}^n (g_{n,\gamma\gamma}(z_i)[\gamma_{n,j}, \gamma_{n,j}] - G_{n,\gamma\gamma}[\gamma_{n,j}, \gamma_{n,j}])
\]
\[
+ \frac{1}{2\sqrt{n}(n-1)^2} \sum_{i=1}^n \sum_{j=1, j\neq i}^n \sum_{k=1, k\notin\{i,j\}}^n (g_{n,\gamma\gamma}(z_i)[\gamma_{n,j}, \gamma_{n,k}] - G_{n,\gamma\gamma}[\gamma_{n,j}, \gamma_{n,k}]),
\]
where $G_{n,\gamma}[\cdot] = E g_{n,\gamma}(z)[\cdot]$ and $G_{n,\gamma\gamma}[\cdot,\cdot] = E g_{n,\gamma\gamma}(z)[\cdot,\cdot]$. By construction,
\[
E\left( \frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar{g}_n(z_i, \hat{\gamma}_{n,i}) - G_n(\hat{\gamma}_{n,i}) - \bar{g}_n(z_i, \gamma_n) + G_n(\gamma_n)] \right) = 0.
\]
The result therefore follows from Chebyshev's inequality because
\[
V\left( \frac{1}{\sqrt{n}} \sum_{i=1}^n [\bar{g}_n(z_i, \hat{\gamma}_{n,i}) - G_n(\hat{\gamma}_{n,i}) - \bar{g}_n(z_i, \gamma_n) + G_n(\gamma_n)] \right)
\]
\[
= n^{-1} O(V(g_{n,\gamma}(z_1)[\gamma_{n,2}])) + n^{-2} O(V[E(g_{n,\gamma\gamma}(z_1)[\gamma_{n,2}, \gamma_{n,2}] \,|\, z_1)]) + n^{-3} O(V(g_{n,\gamma\gamma}(z_1)[\gamma_{n,2}, \gamma_{n,2}])) + n^{-2} O(V(g_{n,\gamma\gamma}(z_1)[\gamma_{n,2}, \gamma_{n,3}])) = o(1),
\]
where the first equality uses Hoeffding's theorem for $U$-statistics.

8.5. Proof of Lemma 5. Because $\max_{1\leq i\leq n} \|\hat{\gamma}_{n,i} - \gamma_n\| = o_p(n^{-1/6})$, we have (with probability approaching one)
\[
\left\| \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \hat{\gamma}_{n,i}) - \bar{g}_n(z_i, \hat{\gamma}_{n,i}) - g_n(z_i, \gamma_n) + \bar{g}_n(z_i, \gamma_n)] \right\|
\leq \frac{1}{\sqrt{n}} \sum_{i=1}^n \|g_n(z_i, \hat{\gamma}_{n,i}) - \bar{g}_n(z_i, \hat{\gamma}_{n,i})\|
\]
\[
\leq \frac{1}{\sqrt{n}} \sum_{i=1}^n b(z_i)\, \|\hat{\gamma}_{n,i} - \gamma_n\|^3
\leq \left( n^{1/6} \max_{1\leq i\leq n} \|\hat{\gamma}_{n,i} - \gamma_n\| \right)^3 \frac{1}{n} \sum_{i=1}^n b(z_i) = o_p(1),
\]
where the last equality uses $E|b(z)| = E b(z) < \infty$.

8.6. Proof of Lemma 6.
If (9) holds and if $V(B_n) = o(n^{-1})$, then Condition (AN) holds with $\Sigma = E[\psi(z)\psi(z)']$ and $\beta_n = E B_n + o(n^{-1/2})$, because then
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(z_i, \gamma_n) + G_n(\hat{\gamma}_{n,i}) - G_n(\gamma_n) - \beta_n]
= \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi_n(z_i) + \sqrt{n}(B_n - E B_n) + \sqrt{n}(E B_n - \beta_n)
= \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(z_i) + o_p(1) \rightsquigarrow N(0, \Sigma).
\]
Now,
\[
\frac{1}{n} \sum_{i=1}^n G_{n,\gamma\gamma}[\hat{\gamma}_{n,i} - \gamma_n,\, \hat{\gamma}_{n,i} - \gamma_n]
= \frac{1}{n(n-1)^2} \sum_{i=1}^n \sum_{j=1, j\neq i}^n G_{n,\gamma\gamma}[\gamma_{n,j}, \gamma_{n,j}]
+ \frac{1}{n(n-1)^2} \sum_{i=1}^n \sum_{j=1, j\neq i}^n \sum_{k=1, k\notin\{i,j\}}^n G_{n,\gamma\gamma}[\gamma_{n,j}, \gamma_{n,k}]
\]
\[
= \frac{1}{n(n-1)} \sum_{i=1}^n G_{n,\gamma\gamma}[\gamma_{n,i}, \gamma_{n,i}]
+ \frac{n-2}{n(n-1)^2} \sum_{i=1}^n \sum_{j=1, j\neq i}^n G_{n,\gamma\gamma}[\gamma_{n,i}, \gamma_{n,j}],
\]
so it follows from Hoeffding's theorem for $U$-statistics that
\[
V(B_n) = n^{-3} O(V(G_{n,\gamma\gamma}[\gamma_{n,1}, \gamma_{n,1}])) + n^{-2} O(V(G_{n,\gamma\gamma}[\gamma_{n,1}, \gamma_{n,2}])),
\]
implying in particular that $V(B_n) = o(n^{-1})$ if (10) holds.

8.7. Proof of Theorem 7. It suffices to show that (11) holds with $B_n = J\beta_n$ and variance matrix $J \Sigma J'$. The proof of that result is elementary:
\[
\sqrt{n}(\hat{\theta}_n^* - \hat{\theta}_n) = J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n^*(z_i^*, \hat{\gamma}_{n,i}^*) - \beta_n] + o_p(1)
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n^*(z_i^*, \hat{\gamma}_n) + g_n^*(z_i^*, \hat{\gamma}_{n,i}^*) - g_n^*(z_i^*, \hat{\gamma}_n) - \beta_n] + o_p(1)
\]
\[
= J \frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n^*(z_i^*, \hat{\gamma}_n) + G_n^*(\hat{\gamma}_{n,i}^*) - G_n^*(\hat{\gamma}_n) - \beta_n] + o_p(1) \rightsquigarrow_p N(0, J \Sigma J'),
\]
where the first equality uses Condition (AL*), the second and third equalities use Condition (AS*), and the last line uses Condition (AN*).

8.8. Proof of Lemma 8. Let $\delta \geq 0$ be given and let $\varepsilon_n = o(n^{-\delta})$ be positive. Studying one component at a time, if necessary, we may assume without loss of generality that $g(\cdot)$ is scalar.
Suppose $E[\bar{F}_n(z)^2] = O(1)$, where $\bar{F}_n(z) = \sup_{f \in \mathcal{F}_n} |f(z)|$. By van der Vaart (1998, Lemma 19.35),
\[
E\left[ \sup_{\theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n} \|\hat{G}_n(\theta, \gamma) - G(\theta, \gamma) - \hat{G}_n(\theta_0, \gamma) + G(\theta_0, \gamma)\| \right] = O[n^{-1/2} J_{[\,]}(\mathcal{F}_n)],
\]
where, with $N_{[\,]}(\varepsilon; \mathcal{F}_n)$ denoting the minimum number of $\varepsilon$-brackets in $L_2$ needed to cover $\mathcal{F}_n$,
\[
J_{[\,]}(\mathcal{F}_n) = \int_0^{\sqrt{E[\bar{F}_n(z)^2]}} \sqrt{\log N_{[\,]}(\varepsilon; \mathcal{F}_n)}\, d\varepsilon.
\]
To show (14) it therefore suffices to show that $J_{[\,]}(\mathcal{F}_n) = o(\varepsilon_n^{s})$. To do so, it suffices to show that $E[\bar{F}_n(z)^2] = o(n^{-2s\delta})$ and that, for some $K > 0$, some $r > 0$, and for all $n$ large enough,
\[
\log N_{[\,]}(\varepsilon; \mathcal{F}_n) \leq \max(K - r \log[\varepsilon^{-1} \varepsilon_n^{s}],\, 0).
\]
By assumption,
\[
E[\bar{F}_n(z)^2] = E[F_n(z; \theta_0, \varepsilon_n)^2] \leq \sup_{\theta \in \Theta^{\partial}} E[F_n(z; \theta, \varepsilon_n)^2] = O(\varepsilon_n^{2s}) = o(n^{-2s\delta}).
\]
Also, if $M > \lim_{n\to\infty} \lim_{\varepsilon \downarrow 0} \sup_{\theta \in \Theta^{\partial}} \varepsilon^{-2s} E[F_n(z; \theta, \varepsilon)^2]$, then
\[
\sup_{\theta \in \Theta^{\partial}} E[F_n(z; \theta, \varepsilon)^2] \leq M \varepsilon^{2s}
\]
for all $(n, \varepsilon)$ with $n$ large enough and $\varepsilon$ small enough. Given such a pair $(n, \varepsilon)$, let $\{\theta_j : j = 1,\dots,N_n(\varepsilon)\}$ be an $\varepsilon$-cover for $\Theta(\varepsilon_n)$. The cover may be assumed to be chosen in such a way that, for some constant $C$,
\[
N_n(\varepsilon) \leq \max\left( C \left[\frac{\varepsilon_n}{\varepsilon}\right]^{k},\, 1 \right).
\]
Then an $\bar{\varepsilon}$-bracketing for $\mathcal{F}_n$, with $\bar{\varepsilon} = (4M)^{1/2} \varepsilon^{s}$, is given by the brackets with endpoints
\[
g(\cdot, \theta_j, \gamma) - g(\cdot, \theta_0, \gamma) \mp [F_n(\cdot; \theta_j, \varepsilon) + F_n(\cdot; \theta_0, \varepsilon_n)], \qquad j = 1,\dots,N_n(\varepsilon).
\]
As a consequence,
\[
\log N_{[\,]}(\bar{\varepsilon}; \mathcal{F}_n) \leq \max(\log[C (4M)^{k/2s}] - (k/s) \log[\bar{\varepsilon}\, \varepsilon_n^{-s}],\, 0),
\]
as was to be shown. Finally, using $n^{-1} \sum_{i=1}^n \bar{F}_n(z_i)^2 = O_p(E[\bar{F}_n(z)^2])$ and proceeding as in the proof of (14), it can be shown that
\[
\sup_{\theta \in \Theta(\varepsilon_n),\, \gamma \in \Gamma_n} \|\hat{G}_n^*(\theta, \gamma) - \hat{G}_n(\theta, \gamma) - \hat{G}_n^*(\theta_0, \gamma) + \hat{G}_n(\theta_0, \gamma)\| = o_p(n^{-1/2} \varepsilon_n^{s}),
\]
a result which is equivalent to (15) when (14) holds.

8.9. Proof of Lemma 9. The proof is analogous to that of Lemma 3.
Let $\dot{G}_n^* = \dot{G}(\theta_0, \hat{\gamma}_n^*)$ and define $J_n^* = -(\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W$. Using $\|\hat{\gamma}_n^* - \gamma_0\| = o_p(n^{-1/6})$, (ii), and (iv*),
\[
(J_n^* - J)\hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*) = o_p(n^{-1/2}).
\]
It therefore suffices to show that $\hat{\theta}_n^* - \hat{\theta}_n - J_n^* \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*) = o_p(n^{-1/2})$. To do so, let $L_n^*(\theta) = \dot{G}_n^{*\prime} W [\hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*) + \dot{G}_n^*(\theta - \hat{\theta}_n)]$. Because
\[
L_n^*(\hat{\theta}_n^*) = \dot{G}_n^{*\prime} W \dot{G}_n^* [\hat{\theta}_n^* - \hat{\theta}_n - J_n^* \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)]
\]
and because $\dot{G}_n^{*\prime} W \dot{G}_n^* \to_p \dot{G}_0' W \dot{G}_0 > 0$, it suffices to show that $L_n^*(\hat{\theta}_n^*) = o_p(n^{-1/2})$. Using $\hat{\theta}_n^* - \theta_0 = o_p(1)$, $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$, $\|\hat{\gamma}_n^* - \gamma_0\| = o_p(n^{-1/6})$, (ii), and (iii*) (with $\delta = 0$),
\[
\|\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - G(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*) + G(\hat{\theta}_n, \hat{\gamma}_n^*)\| = o_p(n^{-1/2})
\]
and
\[
\|\dot{G}_n^*(\hat{\theta}_n^* - \hat{\theta}_n) - G(\hat{\theta}_n^*, \hat{\gamma}_n^*) + G(\hat{\theta}_n, \hat{\gamma}_n^*)\| = \|\hat{\theta}_n^* - \hat{\theta}_n\|^2 O_p(1) + o_p(n^{-1/2}).
\]
As a consequence, by the triangle inequality and using $\|\dot{G}_n^{*\prime} W\| = O_p(1)$,
\[
\|L_n^*(\hat{\theta}_n^*)\| \leq \|\dot{G}_n^{*\prime} W\|\, \|\dot{G}_n^*(\hat{\theta}_n^* - \hat{\theta}_n) - G(\hat{\theta}_n^*, \hat{\gamma}_n^*) + G(\hat{\theta}_n, \hat{\gamma}_n^*)\|
+ \|\dot{G}_n^{*\prime} W\|\, \|G(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - G(\hat{\theta}_n, \hat{\gamma}_n^*) + \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)\|
+ \|\dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)\|
\]
\[
= \|\hat{\theta}_n^* - \hat{\theta}_n\|^2 O_p(1) + \|\dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)\| + o_p(n^{-1/2}),
\]
so it suffices to show that $\hat{\theta}_n^* - \hat{\theta}_n = o_p(n^{-1/3})$ and that $\dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) = o_p(n^{-1/2})$.

Proof of $\hat{\theta}_n^* - \hat{\theta}_n = o_p(n^{-1/3})$: Condition (ii) implies the existence of a positive constant $C$ for which $\|\theta - \theta_0\| \leq C^{-1} \|G(\theta, \gamma_0)\|$ near $\theta_0$. Because $\hat{\theta}_n^* - \theta_0 = o_p(1)$, it therefore suffices to show that $\|G(\hat{\theta}_n^*, \gamma_0)\| = \|\hat{\theta}_n^* - \theta_0\|\, o_p(1) + o_p(n^{-1/3})$. Using (i*), (iv*), and $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$, we have (with probability approaching one)
\[
\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) \leq \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)' W \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*) + o_p(n^{-1}) = o_p(n^{-2/3}),
\]
implying in particular that $\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) = o_p(n^{-1/3})$. Also, using $\hat{\theta}_n^* - \theta_0 = o_p(1)$, $\hat{\theta}_n - \theta_0 = o_p(n^{-1/3})$, $\|\hat{\gamma}_n^* - \gamma_0\| = o_p(n^{-1/6})$, (ii), and (iii*) (with $\delta = 0$),
\[
\|G(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - G(\hat{\theta}_n, \hat{\gamma}_n^*) + \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)\| = o_p(n^{-1/2})
\]
and (with probability approaching one)
\[
\|G(\hat{\theta}_n^*, \gamma_0) - G(\hat{\theta}_n^*, \hat{\gamma}_n^*) + G(\hat{\theta}_n, \hat{\gamma}_n^*)\| \leq \|\hat{\theta}_n^* - \theta_0\|\, o_p(1) + o_p(n^{-1/3}).
\]
Using
these rates, the triangle inequality, and (iv*),
\[
\|G(\hat{\theta}_n^*, \gamma_0)\| \leq \|G(\hat{\theta}_n^*, \gamma_0) - G(\hat{\theta}_n^*, \hat{\gamma}_n^*) + G(\hat{\theta}_n, \hat{\gamma}_n^*)\|
+ \|G(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - G(\hat{\theta}_n, \hat{\gamma}_n^*) + \hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)\|
+ \|\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)\| + \|\hat{G}_n^*(\hat{\theta}_n, \hat{\gamma}_n^*)\|
= \|\hat{\theta}_n^* - \theta_0\|\, o_p(1) + o_p(n^{-1/3}).
\]

Proof of $\dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) = o_p(n^{-1/2})$: Because $\dot{G}_n^{*\prime} W \dot{G}_n^* \to_p \dot{G}_0' W \dot{G}_0 > 0$, it suffices to show that
\[
\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \dot{G}_n^* (\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) = o_p(n^{-1}).
\]
To do so, let $\tilde{\theta}_n^* = \hat{\theta}_n^* + J_n^* \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)$, which satisfies $\tilde{\theta}_n^* - \theta_0 = o_p(n^{-1/3})$ because $\hat{\theta}_n^* - \theta_0 = o_p(n^{-1/3})$ and $\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) = o_p(n^{-1/3})$. By (ii) and (iii*) (with $\delta = 1/3$), we therefore have
\[
R_n^* = \hat{G}_n^*(\tilde{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \dot{G}_n^*(\tilde{\theta}_n^* - \hat{\theta}_n^*) = o_p(n^{-2/3}).
\]
As a consequence, using (i*) and (v), with probability approaching one,
\[
\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) \leq \hat{G}_n^*(\tilde{\theta}_n^*, \hat{\gamma}_n^*)' W \hat{G}_n^*(\tilde{\theta}_n^*, \hat{\gamma}_n^*) + o_p(n^{-1})
\]
\[
= \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) - \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \dot{G}_n^* (\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)
+ 2 R_n^{*\prime}[W - W \dot{G}_n^* (\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W] \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) + R_n^{*\prime} W R_n^* + o_p(n^{-1}),
\]
which rearranges as
\[
\hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)' W \dot{G}_n^* (\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*)
\leq 2 R_n^{*\prime}[W - W \dot{G}_n^* (\dot{G}_n^{*\prime} W \dot{G}_n^*)^{-1} \dot{G}_n^{*\prime} W] \hat{G}_n^*(\hat{\theta}_n^*, \hat{\gamma}_n^*) + R_n^{*\prime} W R_n^* + o_p(n^{-1}) = o_p(n^{-1}).
\]

8.10. Proof of Lemma 10. For $i = 1,\dots,n$, we have
\[
\hat{\psi}_n(X_i) = (1 - n^{-1})\, \hat{\psi}_{n,i}(X_i) + \frac{\mathcal{K}(0)}{n h_n^d} W_{in}
\]
and therefore $\max_{1\leq i\leq n} |\hat{\psi}_n(X_i) - \psi_n(X_i)| \leq \max_{1\leq i\leq n} |\hat{\psi}_{n,i}(X_i) - \psi_n(X_i)| + R_n$, where
\[
R_n = \frac{1}{n} \sup_{x \in \mathbb{R}^d} |\psi_n(x)| + \frac{C}{n h_n^d} \max_{1\leq i\leq n} |W_{in}|,
\]
with $C$ a bound on $|\mathcal{K}|$, because
\[
\frac{1}{n} \sup_{x \in \mathbb{R}^d} |\psi_n(x)| \leq \frac{1}{n}\, C(1) \int_{\mathbb{R}^d} |\mathcal{K}(t)|\, dt = O(n^{-1}) = O(\varrho_n).
\]
By Chebyshev's inequality,
\[
P\left[\max_{1\leq i\leq n} |W_{in}| > M \lambda_n\right] \leq n P[|W_n| > M \lambda_n] \leq \frac{n C(S_n)}{M^{S_n} \lambda_n^{S_n}}
\]
for every $M$ and every $(S_n, \lambda_n)$. Therefore, $\max_{1\leq i\leq n} |W_{in}| = O_p(\lambda_n)$ if the $\lim_{n\to\infty}$ of the majorant can be made arbitrarily small by choosing $S_n$ appropriately and making $M$ large.
In case (a), setting $(S_n, \lambda_n) = (S, n^{1/S})$ we have $\lambda_n/(n h_n^d) = O(\varrho_n)$ and
\[
\frac{n C(S_n)}{M^{S_n} \lambda_n^{S_n}} = \frac{C(S)}{M^{S}},
\]
whose $\lim_{n\to\infty}$ can be made arbitrarily small by making $M$ large.

In case (b), setting $(S_n, \lambda_n) = (\log n, \log n)$ we have $\lambda_n/(n h_n^d) = O(\varrho_n)$ and
\[
\frac{n C(S_n)}{M^{S_n} \lambda_n^{S_n}} = \frac{n C(\log n)}{M^{\log n} (\log n)^{\log n}} \leq \frac{n\, \mathrm{Gam}(\log n) H^{\log n}}{M^{\log n} (\log n)^{\log n}} = n \left(\frac{H}{eM}\right)^{\log n} O(1/\sqrt{\log n}),
\]
where the second equality uses Stirling's formula and the $\lim_{n\to\infty}$ of the majorant can be made arbitrarily small by making $M$ large.

In case (c), setting $(S_n, \lambda_n) = (\log n, 1)$ we have $\lambda_n/(n h_n^d) = O(\varrho_n)$ and
\[
\frac{n C(S_n)}{M^{S_n} \lambda_n^{S_n}} = \frac{n C(\log n)}{M^{\log n}} \leq n \left(\frac{H}{M}\right)^{\log n},
\]
where the $\lim_{n\to\infty}$ of the majorant can be made arbitrarily small by making $M$ large.

In all cases, $R_n = O_p(\varrho_n)$ because $\lambda_n/(n h_n^d) = O(\varrho_n)$. The proof of (16) can therefore be completed by showing that (17) holds.

Proof of (17): With $(S_n, \lambda_n)$ as before, let
\[
\tilde{\psi}_{n,i}(x) = \frac{1}{n-1} \sum_{j=1, j\neq i}^n \tilde{W}_{jn}\, \mathcal{K}_n(x - X_j), \qquad \tilde{W}_{jn} = W_{jn}\, \mathbb{1}[|W_{jn}| \leq C \lambda_n],
\]
where $C$ is a constant to be chosen. We have
\[
P[\tilde{\psi}_{n,i}(\cdot) \neq \hat{\psi}_{n,i}(\cdot) \text{ for some } i] \leq P\left[\max_{1\leq i\leq n} |W_{in}| > C \lambda_n\right],
\]
whose $\lim_{n\to\infty}$ can be made arbitrarily small by making $C$ large. Also,
\[
\max_{1\leq i\leq n} \sup_{x \in \mathbb{R}^d} |E[\tilde{\psi}_{n,i}(x) - \hat{\psi}_{n,i}(x)]| = O(n^{-1/2}) = O(\varrho_n)
\]
because
\[
n\, |E[\tilde{\psi}_{n,i}(x) - \hat{\psi}_{n,i}(x)]| = n\, |E[W_n\, \mathbb{1}(|W_n| > C \lambda_n)\, \mathcal{K}_n(x - X)]| \leq \frac{n C(S_n)\, C \lambda_n}{C^{S_n} \lambda_n^{S_n}} \int_{\mathbb{R}^d} |\mathcal{K}(t)|\, dt,
\]
whose $\lim_{n\to\infty}$ can be made arbitrarily small by making $C$ large.
To show the desired result it therefore suffices to show that $\max_{1\leq i\leq n} |\tilde{\psi}_{n,i}(X_i) - \bar{\psi}_n(X_i)| = O_p(\varrho_n)$ for every $C$, where $\bar{\psi}_n(x) = E\tilde{\psi}_n(x) = E\tilde{\psi}_{n,i}(x)$. For any $M$,
\[
P\left[\max_{1\leq i\leq n} |\tilde{\psi}_{n,i}(X_i) - \bar{\psi}_n(X_i)| > M \varrho_n\right]
\leq n \max_{1\leq i\leq n} P[|\tilde{\psi}_{n,i}(X_i) - \bar{\psi}_n(X_i)| > M \varrho_n]
\leq n \max_{1\leq i\leq n} \sup_{x \in \mathbb{R}^d} P[|\tilde{\psi}_{n,i}(x) - \bar{\psi}_n(x)| > M \varrho_n],
\]
where the last inequality uses the fact that $X_i$ is independent of $\tilde{\psi}_{n,i}$. Because $|\tilde{W}_{jn}\, \mathcal{K}_n(x - X_j) - \bar{\psi}_n(x)| = O(\lambda_n h_n^{-d})$ and $V[\tilde{W}_{jn}\, \mathcal{K}_n(x - X_j)] = O(h_n^{-d})$, it follows from Bernstein's inequality that
\[
n \max_{1\leq i\leq n} \sup_{x \in \mathbb{R}^d} P[|\tilde{\psi}_{n,i}(x) - \bar{\psi}_n(x)| > M \varrho_n] \leq 2n \exp\left( -\frac{M^2 n \varrho_n^2 h_n^d}{O(1 + M \varrho_n \lambda_n)} \right).
\]
To complete the proof of (17) it therefore suffices to show that
\[
\lim_{n\to\infty} \frac{1}{\log n}\, \frac{M^2 n \varrho_n^2 h_n^d}{1 + M \varrho_n \lambda_n}
\]
can be made arbitrarily large by making $M$ large. In case (a), the desired result follows from the proof of Cattaneo et al. (2013, Lemma B-1). In case (b),
\[
\frac{1}{\log n}\, \frac{M^2 n \varrho_n^2 h_n^d}{1 + M \varrho_n \lambda_n} = \frac{M^2}{1 + M C \varrho_n \log n},
\]
whose $\lim_{n\to\infty}$ can be made arbitrarily large (by making $M$ large) if $\varrho_n \log n = \sqrt{(\log n)^3/(n h_n^d)}$ is bounded. In case (c),
\[
\frac{1}{\log n}\, \frac{M^2 n \varrho_n^2 h_n^d}{1 + M \varrho_n \lambda_n} = \frac{M^2}{1 + M C \varrho_n},
\]
whose $\lim_{n\to\infty}$ can be made arbitrarily large (by making $M$ large) if $\varrho_n$ is bounded.

Proof of (18): For any $M$,
\[
P\left[\max_{1\leq j\leq n} |\hat{\psi}_n^*(X_j) - \hat{\psi}_n(X_j)| > M \varrho_n\right] = E\, P^*\left[\max_{1\leq j\leq n} |\hat{\psi}_n^*(X_j) - \hat{\psi}_n(X_j)| > M \varrho_n\right]
\]
and
\[
P^*\left[\max_{1\leq j\leq n} |\hat{\psi}_n^*(X_j) - \hat{\psi}_n(X_j)| > M \varrho_n\right] \leq n \sup_{x \in \mathbb{R}^d} P^*[|\hat{\psi}_n^*(x) - \hat{\psi}_n(x)| > M \varrho_n].
\]
Because $|W_{jn}^*\, \mathcal{K}_n(x - X_j^*) - \hat{\psi}_n(x)| = O_p(\lambda_n h_n^{-d})$ and $V^*[W_{jn}^*\, \mathcal{K}_n(x - X_j^*)] = O_p(h_n^{-d})$, it follows from Bernstein's inequality that
\[
P^*[|\hat{\psi}_n^*(x) - \hat{\psi}_n(x)| > M \varrho_n] \leq 2 \exp\left( -\frac{M^2 n \varrho_n^2 h_n^d}{O_p(1 + M \varrho_n \lambda_n)} \right).
\]
Validity of (18) follows from this bound and the fact that
\[
\lim_{n\to\infty} \frac{1}{\log n}\, \frac{M^2 n \varrho_n^2 h_n^d}{1 + M \varrho_n \lambda_n}
\]
can be made arbitrarily large by making $M$ large.
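The exact leave-one-out identity from the start of the proof of Lemma 10, $\hat{\psi}_n(X_i) = (1 - n^{-1})\hat{\psi}_{n,i}(X_i) + W_{in}\mathcal{K}(0)/(n h_n^d)$, can also be verified numerically; the $\mathcal{K}(0)/(n h_n^d)$ term it isolates is exactly the order of the leave-in bias $\beta_n = \mathcal{K}(0)/(n h_n^d)$ encountered in Example 1 below. A minimal sketch (Gaussian kernel with $d = 1$ and $W_{in} \equiv 1$; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h = 200, 0.25
x = rng.normal(size=n)

def Kn(u, h):
    # K_n(u) = h^{-1} K(u/h) with K the standard Gaussian density
    # (an illustrative choice of kernel).
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h)

K = Kn(x[:, None] - x[None, :], h)
psi_full = K.mean(axis=1)                          # psi_hat_n(x_i), W_in = 1
psi_loo = (K.sum(axis=1) - np.diag(K)) / (n - 1)   # psi_hat_{n,i}(x_i)

# Identity: psi_hat_n(x_i) - (1 - 1/n) psi_hat_{n,i}(x_i) = K_n(0)/n = K(0)/(n h),
# the leave-in term whose order K(0)/(n h^d) drives the bias beta_n.
gap = psi_full - (1 - 1 / n) * psi_loo
beta_n = Kn(0.0, h) / n
print(np.max(np.abs(gap - beta_n)))  # numerically zero (up to rounding)
```

Nothing here depends on the Gaussian choice: for any kernel the diagonal term contributes exactly $\mathcal{K}_n(0)/n$ per observation, which is why the leave-in bias survives whenever $n h_n^d$ does not diverge fast enough.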
Proof of (19): Because
\[
\hat{\psi}_{n,i}^*(x) = (1 - n^{-1})^{-1} \hat{\psi}_n^*(x) - (n-1)^{-1} W_{in}^*\, \mathcal{K}_n(x - X_i^*),
\]
we have the bound
\[
(1 - n^{-1}) \max_{1\leq i,j\leq n} |\hat{\psi}_{n,i}^*(X_j) - \hat{\psi}_n(X_j)| \leq \max_{1\leq j\leq n} |\hat{\psi}_n^*(X_j) - \hat{\psi}_n(X_j)| + R_n^*,
\]
where
\[
R_n^* = \frac{1}{n} \max_{1\leq i\leq n} |\hat{\psi}_n(X_i)| + \frac{C}{n h_n^d} \max_{1\leq i\leq n} |W_{in}^*|
\leq \frac{1}{n} \max_{1\leq i\leq n} |\hat{\psi}_n(X_i) - \psi_n(X_i)| + \frac{1}{n} \sup_{x \in \mathbb{R}^d} |\psi_n(x)| + \frac{C}{n h_n^d} \max_{1\leq i\leq n} |W_{in}| = O_p(\varrho_n).
\]
In particular, (19) holds because (18) holds.

9. Appendix III: Details for the Examples

The purpose of this section is to give explicit regularity conditions and to provide details on the derivations of the results for Examples 1–3.

9.1. Example 1. To obtain primitive bandwidth conditions for the conditions of Theorems 2 and 7, suppose that for some $P > d/2$:

$f_0$ is $P$ times differentiable, and $f_0$ and its first $P$ derivatives are bounded and continuous.

$\mathcal{K}$ is even and bounded with $\int_{\mathbb{R}^d} |\mathcal{K}(u)|(1 + \|u\|^P)\, du < \infty$ and
\[
\int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d}\, \mathcal{K}(u)\, du =
\begin{cases}
1, & \text{if } l_1 = \cdots = l_d = 0, \\
0, & \text{if } (l_1,\dots,l_d)' \in \mathbb{Z}_+^d \text{ and } l_1 + \cdots + l_d < P.
\end{cases}
\]

Conditions (AL) and (AL*) hold with $J = I_k$ and without any $o_p(1)$ terms.

Condition (AS). Because
\[
g_n(x, f) = g_n(x, f_n) + g_{n,f}(x)[f - f_n], \qquad g_{n,f}(x)[\eta] = (1 - n^{-1})\, \eta(x),
\]
Condition (AS) holds with $\bar{g}_n = g_n$ if $V(g_{n,f}(z_1)[\gamma_{n,2}]) = o(n)$. A sufficient condition for this to occur is that $n h_n^d \to \infty$, because then
\[
V(g_{n,f}(z_1)[\gamma_{n,2}]) = (1 - n^{-1})^2 V[\mathcal{K}_n(x_1 - x_2) - f_n(x_1)] = O(1/h_n^d) = o(n).
\]

Condition (SE). If $n h_n^{2P} \to 0$ and if $n h_n^d \to \infty$, then, using Condition (AS),
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n [\hat{f}_n(x_i) - f_n(x_i) - f_0(x_i) + \theta_0]
= \frac{1}{\sqrt{n}} \sum_{i=1}^n [n^{-1} \mathcal{K}_n(0) + (1 - n^{-1}) \hat{f}_{n,i}(x_i) - f_n(x_i) - f_0(x_i) + \theta_0]
\]
\[
= \frac{\mathcal{K}(0)}{\sqrt{n h_n^{2d}}} + \frac{1}{\sqrt{n}} \sum_{i=1}^n [(1 - n^{-1})(2 f_n(x_i) - \theta_n) - f_n(x_i) - f_0(x_i) + \theta_0] + o_p(1)
\]
\[
= \frac{\mathcal{K}(0)}{\sqrt{n h_n^{2d}}} + \frac{1}{\sqrt{n}} \sum_{i=1}^n [f_n(x_i) - f_0(x_i)] - \sqrt{n}(\theta_n - \theta_0) + o_p(1) = \frac{\mathcal{K}(0)}{\sqrt{n h_n^{2d}}} + o_p(1),
\]
where the last equality uses $E(|f_n(x) - f_0(x)|^2) = o(1)$ and $\theta_n - \theta_0 = O(h_n^P) = o(n^{-1/2})$. As a consequence, Condition (SE) requires $n h_n^{2d} \to$
$\infty$.

Condition (AN). We have:
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n(x_i, f_n) + G_n(\hat{f}_{n,i}) - G_n(f_n)] = \frac{1}{\sqrt{n}} \sum_{i=1}^n [\psi_n(x_i) + B_n],
\]
where
\[
\psi_n(x) = 2[f_n^+(x) - \theta_n^+], \qquad \theta_n^+ = \int_{\mathbb{R}^d} f_n^+(x) f_0(x)\, dx,
\]
and $B_n = n^{-1} \mathcal{K}_n(0) + \theta_n - \theta_0$. If $h_n \to 0$, then $\psi_n(x) \to \psi(x)$ for every $x$ and it follows from the dominated convergence theorem that (9) is satisfied. Also, (10) is satisfied because $B_n = \mathcal{K}(0)/(n h_n^d) + O(h_n^P + n^{-1})$ is nonrandom. Condition (AN) is therefore satisfied (with $\Sigma = 4 V[f_0(x)]$) if $h_n \to 0$ and, in fact, we can take $\beta_n = \mathcal{K}(0)/(n h_n^d)$ when $n h_n^{2P} \to 0$.

Condition (AS*). Because
\[
g_n^*(x, f) = g_n^*(x, \hat{f}_n) + g_{n,f}^*(x)[f - \hat{f}_n], \qquad g_{n,f}^*(x)[\eta] = (1 - n^{-1})\, \eta(x),
\]
Condition (AS*) holds with $\bar{g}_n^* = g_n^*$ if $V^*(g_{n,f}^*(z_1^*)[\gamma_{n,2}^*]) = o_p(n)$. A sufficient condition for this to occur is that $n h_n^d \to \infty$, because then
\[
E\, V^*(g_{n,f}^*(z_1^*)[\gamma_{n,2}^*]) = (1 - n^{-1})^2 E\, V^*[\mathcal{K}_n(x_1^* - x_2^*) - \hat{f}_n(x_1^*)] = O(1/h_n^d) = o(n).
\]

Condition (AN*). We have:
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^n [g_n^*(x_i^*, \hat{f}_n) + G_n^*(\hat{f}_{n,i}^*) - G_n^*(\hat{f}_n)] = \frac{1}{\sqrt{n}} \sum_{i=1}^n [\hat{\psi}_n(x_i^*) + B_n^*],
\]
where, defining $\hat{f}_n^+(x) = n^{-1} \mathcal{K}_n(0) + (1 - n^{-1}) \hat{f}_n(x)$ and $\hat{\theta}_n^+ = \frac{1}{n} \sum_{i=1}^n \hat{f}_n^+(x_i)$,
\[
\hat{\psi}_n(x) = 2[\hat{f}_n^+(x) - \hat{\theta}_n^+], \qquad B_n^* = n^{-1} \mathcal{K}_n(0) - n^{-1} \hat{\theta}_n.
\]
Suppose $h_n \to 0$ and $n h_n^d \to \infty$. Then
\[
B_n^* = n^{-1} \mathcal{K}_n(0) - n^{-1} \hat{\theta}_n = \beta_n + o_p(n^{-1/2})
\]
because $\hat{\theta}_n = O_p(1)$. Because $\hat{\theta}_n - \theta_n \to_p 0$, $\frac{1}{n} \sum_{i=1}^n |\hat{\psi}_n(x_i) - \psi_n(x_i)|^2 \to_p 0$ also holds provided
\[
\frac{1}{n} \sum_{i=1}^n |\hat{f}_n(x_i) - f_n(x_i)|^2 \to_p 0.
\]
A sufficient condition for this to occur is that $\max_i |\hat{f}_n(x_i) - f_n(x_i)| = o_p(1)$, which in turn will hold if $n h_n^d / \log n \to \infty$. Sufficiency of the slightly weaker condition $n h_n^d \to \infty$ can be demonstrated by using a direct calculation to show that if $n h_n^d \to \infty$, then
\[
E\left( \frac{1}{n} \sum_{i=1}^n |\hat{f}_n(x_i) - f_n(x_i)|^2 \right) = O\left( \frac{1}{n h_n^d} \right) \to 0.
\]
In other words, Condition (AN*) holds if $h_n \to 0$ and if $n h_n^d \to \infty$.

9.2. Example 2. To obtain primitive bandwidth conditions for the conditions of Theorems 2 and 7, suppose that for some $P > d$:

$E[|y|^4] + \sup_x E[|y|^4 \,|\, x]\, f_0(x) < \infty$.

$\Sigma = V[\psi(z)]$ is positive definite.

$\omega$
is continuously differentiable, and $\omega$ and its first derivative are bounded.

$\inf_{x : \omega(x) > 0} f_0(x) > 0$.

$f_0$ is $P + 1$ times differentiable, and $f_0$ and its first $P + 1$ derivatives are bounded and continuous.

$r$ is continuously differentiable, and $\bar{r}$ and its first derivative are bounded, where $\bar{r}(x) = r(x) f_0(x)$.

$\lim_{\|x\| \to \infty} [f_0(x) + |\bar{r}(x)|] = 0$, where $\|\cdot\|$ is the Euclidean norm.

$\mathcal{K}$ is even and differentiable, and $\mathcal{K}$ and its first derivative are bounded.

$\int_{\mathbb{R}^d} |\mathcal{K}(u)|(1 + \|u\|^P)\, du + \int_{\mathbb{R}^d} \|\partial \mathcal{K}(u)/\partial u\|\, du < \infty$ and
\[
\int_{\mathbb{R}^d} u_1^{l_1} \cdots u_d^{l_d}\, \mathcal{K}(u)\, du =
\begin{cases}
1, & \text{if } l_1 = \cdots = l_d = 0, \\
0, & \text{if } (l_1,\dots,l_d)' \in \mathbb{Z}_+^d \text{ and } l_1 + \cdots + l_d < P.
\end{cases}
\]

Conditions (AL) and (AL*) hold with $J = I_k$ and without any $o_p(1)$ terms.

Condition (AS). Suppose $h_n \to 0$. As in the text, let
\[
\bar{g}_n(z, f) = g_n(z, f_n) + g_{n,f}(z)[f - f_n] + \frac{1}{2} g_{n,ff}(z)[f - f_n,\, f - f_n],
\]
where, defining $f_n^+(x) = n^{-1} \mathcal{K}_n(0) + (1 - n^{-1}) f_n(x)$,
\[
g_{n,f}(z)[\eta] = (1 - n^{-1}) \frac{y\, \omega(x)}{f_n^+(x)} \left[ \frac{\partial \eta(x)}{\partial x} - \frac{\partial f_n^+(x)/\partial x}{f_n^+(x)}\, \eta(x) \right],
\]
\[
g_{n,ff}(z)[\eta, \tilde{\eta}] = (1 - n^{-1})^2 \frac{y\, \omega(x)}{f_n^+(x)^2} \left[ 2 \frac{\partial f_n^+(x)/\partial x}{f_n^+(x)}\, \eta(x) \tilde{\eta}(x) - \frac{\partial \eta(x)}{\partial x} \tilde{\eta}(x) - \frac{\partial \tilde{\eta}(x)}{\partial x} \eta(x) \right].
\]
The assumptions of Lemma 5 are satisfied if $\max(\varepsilon_n, \dot{\varepsilon}_n) = o_p(n^{-1/6})$, where
\[
\varepsilon_n = \max_i |\hat{f}_{n,i}(x_i) - f_n(x_i)|, \qquad \dot{\varepsilon}_n = \max_i \left| \frac{\partial}{\partial x_i} \hat{f}_{n,i}(x_i) - \frac{\partial}{\partial x_i} f_n(x_i) \right|.
\]
More generally, proceeding as in Cattaneo et al. (2013) it can be shown that the first part of Condition (AS) is satisfied if $\varepsilon_n = o_p(1)$ and if $\varepsilon_n^2 \max(\varepsilon_n, \dot{\varepsilon}_n) = o_p(n^{-1/2})$. Because (by Lemma 10)
\[
\varepsilon_n = O_p(\sqrt{\log n/(n h_n^d)}), \qquad \dot{\varepsilon}_n = O_p(\sqrt{\log n/(n h_n^{d+2})}),
\]
a sufficient condition for this to occur is that $n h_n^{\frac{3d}{2}+1}/(\log n)^{3/2} \to \infty$. Moreover,
\[
V(g_{n,f}(z_1)[\gamma_{n,2}]) = O(1/h_n^{d+2}), \qquad V[E(g_{n,ff}(z_1)[\gamma_{n,2}, \gamma_{n,2}] \,|\, z_1)] = O(1/h_n^{2d+2}),
\]
\[
V(g_{n,ff}(z_1)[\gamma_{n,2}, \gamma_{n,2}]) = O(1/h_n^{3d+2}), \qquad V(g_{n,ff}(z_1)[\gamma_{n,2}, \gamma_{n,3}]) = O(1/h_n^{2d+2}),
\]
so the assumptions of Lemma 4 will be satisfied provided $n h_n^{d+2} \to \infty$.

Condition (AN).
Condition (AN). We have
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n(z_i,f_n)+G_n(\hat f_{n,i})-G_n(f_n)\bigr]
=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[\psi_n(z_i)+B_n\bigr],
\]
where
\[
\psi_n(z)=g_n(z,f_n)-\mathbb{E}g_n(z,f_n)+\phi_n(x),\qquad
\phi_n(x)=(1-n^{-1})\int_{\mathbb{R}^d}\phi_0(s)\,\frac{f_0(s)}{f_n^+(s)}\bigl[K_n(s-x)-f_n(s)\bigr]\,ds,
\]
\[
\phi_0(x)=\omega(x)\frac{\partial}{\partial x}r(x)+r(x)\frac{\partial}{\partial x}\omega(x)+r(x)\omega(x)\frac{\partial f_0(x)/\partial x}{f_0(x)},
\]
and
\[
B_n=\mathbb{E}g_n(z,f_n)+\frac12\,\frac1n\sum_{i=1}^{n}G_{n,ff}[\hat f_{n,i}-f_n,\hat f_{n,i}-f_n].
\]
If $h_n\to0$ and if $nh_n^d\to\infty$, then $\psi_n(z)\to\psi(z)=g_0(z,f_0)+\phi_0(x)$ for every $z$ and it follows from the dominated convergence theorem that (9) is satisfied. Also, (10) is satisfied if $h_n\to0$ and if $nh_n^{d+2}\to\infty$ because the representation
\[
G_{n,ff}[\delta,\eta]=(1-n^{-1})^2\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_n^+(x)^2}\Bigl[\delta(x)\frac{\partial\eta(x)}{\partial x}+\eta(x)\frac{\partial\delta(x)}{\partial x}\Bigr]f_0(x)\,dx
-2(1-n^{-1})^2\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_n^+(x)^2}\,\frac{\partial f_n^+(x)/\partial x}{f_n^+(x)}\,\delta(x)\eta(x)f_0(x)\,dx
\]
can be used to show that if $h_n\to0$, then
\[
\mathbb{E}(\|G_{n,ff}[\delta_{n,1},\delta_{n,1}]\|^2)=O(1/h_n^{2d+2}),\qquad
\mathbb{E}(\|G_{n,ff}[\delta_{n,1},\delta_{n,2}]\|^2)=O(1/h_n^{d+2}).
\]
Condition (AN) is therefore satisfied (with $\Sigma=\mathbb{E}[\psi(z)\psi(z)']$) if $h_n\to0$ and if $nh_n^{d+2}\to\infty$. It can be shown that if also $nh_n^{2P}\to0$, then $\mathbb{E}B_n=\bar B_n+o(n^{-1/2})$, where
\[
\bar B_n=-\frac{K(0)}{nh_n^d}\int_{\mathbb{R}^d}r(x)\omega(x)\frac{\partial f_0(x)/\partial x}{f_0(x)}\,dx
+\frac{1}{nh_n^{d+1}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_0(x)}\,K(u)\frac{\partial K(u)}{\partial u}\,f_0(x-uh_n)\,dx\,du
-\frac{1}{nh_n^{d}}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\frac{r(x)\omega(x)}{f_0(x)}\frac{\partial f_0(x)/\partial x}{f_0(x)}\,K(u)^2 f_0(x-uh_n)\,dx\,du.
\]
Condition (AS*). A quadratic approximation to $g_n^*(z,f)$ is given by
\[
\bar g_n^*(z,f)=g_n^*(z,\hat f_n)+g_{n,f}^*(z)[f-\hat f_n]+\tfrac12 g_{n,ff}^*(z)[f-\hat f_n,f-\hat f_n],
\]
where
\[
g_{n,f}^*(z)[\delta]=-(1-n^{-1})\frac{y\,\omega(x)}{\hat f_n^+(x)}\Bigl[\frac{\partial\delta(x)}{\partial x}-\frac{\partial\hat f_n^+(x)/\partial x}{\hat f_n^+(x)}\,\delta(x)\Bigr],
\]
\[
g_{n,ff}^*(z)[\delta,\eta]=(1-n^{-1})^2\frac{y\,\omega(x)}{\hat f_n^+(x)^2}\Bigl[\delta(x)\frac{\partial\eta(x)}{\partial x}+\eta(x)\frac{\partial\delta(x)}{\partial x}-2\,\frac{\partial\hat f_n^+(x)/\partial x}{\hat f_n^+(x)}\,\delta(x)\eta(x)\Bigr].
\]
Suppose $h_n\to0$ and $\hat\delta_n=o_p(1)$. Then the first part of Condition (AS*) is satisfied if $\hat\delta_n^2\max(\hat\delta_n,\dot{\hat\delta}_n)=o_p(n^{-1/2})$, where
\[
\hat\delta_n=\max_i\bigl|\hat f_{n,i}^*(x_i^*)-\hat f_n(x_i^*)\bigr|,\qquad
\dot{\hat\delta}_n=\max_i\Bigl\|\frac{\partial}{\partial x}\hat f_{n,i}^*(x_i^*)-\frac{\partial}{\partial x}\hat f_n(x_i^*)\Bigr\|.
\]
Because (by Lemma 10)
\[
\hat\delta_n=O_p\bigl(1/\sqrt{nh_n^d/\log n}\bigr),\qquad
\dot{\hat\delta}_n=O_p\bigl(1/\sqrt{nh_n^{d+2}/\log n}\bigr),
\]
a sufficient condition for this to occur is that $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$. Moreover, if $nh_n^d\to\infty$, then
\[
\mathbb{V}^*(g_{n,f}^*(z_1^*)[\hat\delta_{n,2}])=O_p(1/h_n^{d+2}),\qquad
\mathbb{V}^*[\mathbb{E}^*(g_{n,ff}^*(z_1^*)[\hat\delta_{n,2},\hat\delta_{n,2}]\,|\,z_1^*)]=O_p(1/h_n^{2d+2}),
\]
\[
\mathbb{V}^*(g_{n,ff}^*(z_1^*)[\hat\delta_{n,2},\hat\delta_{n,3}])=O_p(1/h_n^{2d+2}),\qquad
\mathbb{V}^*(g_{n,ff}^*(z_1^*)[\hat\delta_{n,2},\hat\delta_{n,2}])=O_p(1/h_n^{3d+2}),
\]
so the second part of Condition (AS*) will be satisfied provided $nh_n^{d+2}\to\infty$.

Condition (AN*). We have
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n^*(z_i^*,\hat f_n)+G_n^*(\hat f_{n,i}^*)-G_n^*(\hat f_n)\bigr]
=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[\psi_n^*(z_i^*)+B_n^*\bigr],
\]
where
\[
\psi_n^*(z)=g_n^*(z,\hat f_n)-\frac1n\sum_{i=1}^{n}g_n^*(z_i,\hat f_n)+\phi_n^*(z),
\]
\[
\phi_n^*(z)=-(1-n^{-1})\frac1n\sum_{i=1}^{n}\frac{y_i\,\omega(x_i)}{\hat f_n^+(x_i)}\Bigl[\frac{\partial}{\partial x_i}K_n(x_i-x)-\frac{\partial}{\partial x_i}\hat f_n(x_i)\Bigr]
+(1-n^{-1})\frac1n\sum_{i=1}^{n}\frac{y_i\,\omega(x_i)}{\hat f_n^+(x_i)}\,\frac{\partial\hat f_n^+(x_i)/\partial x_i}{\hat f_n^+(x_i)}\bigl[K_n(x_i-x)-\hat f_n(x_i)\bigr],
\]
\[
B_n^*=\frac1n\sum_{i=1}^{n}g_n^*(z_i,\hat f_n)+\frac12\,\frac1n\sum_{i=1}^{n}G_{n,ff}^*[\hat f_{n,i}^*-\hat f_n,\hat f_{n,i}^*-\hat f_n].
\]
Suppose $h_n\to0$ and $nh_n^{d+2}/\log n\to\infty$. Using Lemma 10 and the fact that $\hat\delta_n\to_p0$, it can be shown that $n^{-1}\sum_{i=1}^{n}\|\psi_n^*(z_i)-\psi_n(z_i)\|^2\to_p0$. Also, the bootstrap analog of (10) is satisfied because the representation
\[
G_{n,ff}^*[\delta,\eta]=(1-n^{-1})^2\frac1n\sum_{i=1}^{n}\frac{y_i\,\omega(x_i)}{\hat f_n^+(x_i)^2}\Bigl[\delta(x_i)\frac{\partial\eta(x_i)}{\partial x_i}+\eta(x_i)\frac{\partial\delta(x_i)}{\partial x_i}\Bigr]
-2(1-n^{-1})^2\frac1n\sum_{i=1}^{n}\frac{y_i\,\omega(x_i)}{\hat f_n^+(x_i)^2}\,\frac{\partial\hat f_n^+(x_i)/\partial x_i}{\hat f_n^+(x_i)}\,\delta(x_i)\eta(x_i)
\]
can be used to show that
\[
\mathbb{E}^*(\|G_{n,ff}^*[\hat\delta_{n,1},\hat\delta_{n,1}]\|^2)=O_p(1/h_n^{2d+2}),\qquad
\mathbb{E}^*(\|G_{n,ff}^*[\hat\delta_{n,1},\hat\delta_{n,2}]\|^2)=O_p(1/h_n^{d+2}).
\]
Finally, it can be shown that $\mathbb{E}^*(B_n^*)=\bar B_n+o_p(n^{-1/2})$ if $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$. In other words, Condition (AN*) holds if $h_n\to0$ and if $nh_n^{3d/2+1}/(\log n)^{3/2}\to\infty$.
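Example 2 has the structure of a kernel-based weighted average derivative estimator. The following is a purely illustrative one-dimensional sketch (our stylized choice of moment function, weight, and design, not the paper's exact formulation): it uses the integration-by-parts identity $\theta_0=\mathbb{E}[\omega(x)r'(x)]=-\mathbb{E}[y(\omega'(x)+\omega(x)f_0'(x)/f_0(x))]$ and plugs in leave-one-out kernel estimates of $f_0$ and $f_0'$. The ratio $\hat f'/\hat f$ is where the derivative rate $\dot\delta_n$ enters, which is why the bandwidth condition involves $h_n^{3d/2+1}$ rather than $h_n^{3d/2}$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, h = 4000, 0.4
x = rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)   # r(x) = E[y|x] = 2x, so r'(x) = 2

w  = np.exp(-x**2)                      # weight omega(x) (decays fast: implicit trimming)
dw = -2.0 * x * np.exp(-x**2)           # omega'(x)

# Leave-one-out kernel estimates of f0 and f0' at the sample points (Gaussian kernel).
u = (x[:, None] - x[None, :]) / h
K  = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
Kp = -u * K                             # K'(u) for the Gaussian kernel
np.fill_diagonal(K, 0.0)
np.fill_diagonal(Kp, 0.0)
f_hat  = K.sum(axis=1) / ((n - 1) * h)
df_hat = Kp.sum(axis=1) / ((n - 1) * h**2)

# Plug-in estimator of theta0 = E[omega(x) r'(x)] via integration by parts.
theta_hat = -np.mean(y * (dw + w * df_hat / f_hat))
```

With this design $\theta_0=2\,\mathbb{E}[e^{-x^2}]=2/\sqrt{3}\approx1.155$; the estimate should land nearby, with a bias driven by the second-order kernel and the moderate bandwidth.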
n;2 ]k d+1 d+1 2 ) = Op (1=hd+2 n ): = (log n)3=2 ! 1: In = (log n)3=2 ! 1: Example 3. To obtain primitive bandwidth conditions for the conditions of Theorems 2 and 7, suppose that for some P > 3d=4; f0 is P times di¤erentiable, and f0 and its …rst P derivatives are bounded and continuous. Fyjx ( jx); the conditional cdf of y given x; has three bounded (uniformly in x) derivatives. R K is even and bounded with Rd jK(u)j(1 + kukP )du < 1 and 8 Z < 1; if l1 = = ld = 0; ld l1 u1 ud K(u)du = : 0; Rd if (l ; : : : ; l )0 2 Zd and l + 1 d + 1 . + ld < P Bootstrapping Semiparametric Estimators 78 Conditions (AL) and (AL*) hold with J = Ik and without any op (1) terms. Condition (AS). Let Fyjx ( jx) denote the conditional cdf of y given x; let fyjx ( jx) and f_yjx ( jx) denote its …rst and second derivatives, and de…ne gn (x; f ) = E[gn (z; f )jx] (1 0) = Fyjx [n 1 Kn (0) + (1 n 1 )f (x)jx]: Being a de…ned through a projection, gn (x; f ) is likely to be close to gn (z; f ) in the the appropriate sense and, indeed, 1 X p [gn (zi ; f^n;i ) n i=1 n if n gn (xi ; f^n;i ) gn (zi ; fn ) + gn (xi ; fn )] = op (1) = op (1); because then 2 n 1 X E4 p [gn (zi ; f^n;i ) n i=1 n X 1 = V [gn (zi ; f^n;i ) n i=1 gn (xi ; f^n;i ) gn (zi ; fn )]jXn ! !2 gn (zi ; fn ) + gn (xi ; fn )] supr;s fyjx (rjs) n 3 jXn 5 = op (1); where Xn = (x1 ; : : : ; xn )0 : Next, being smooth gn (x; f ) admits the quadratic approximation gn (x; f ) = gn (x; fn ) + gn;f (x)[f where 1 fn ] + gn;f f (x)[f 2 fn ; f fn ]; Bootstrapping Semiparametric Estimators gn;f (x)[ ] = (1 n 1 )fyjx [fn+ (x)jx] (x); gn;f f (x)[ ; ] = (1 n 1 )2 f_yjx [fn+ (x)jx] (x) (x): 79 It follows from standard bounding arguments (e.g., Lemma 5) that 1 X p [gn (xi ; f^n;i ) n i=1 n provided n = op (n 1=6 gn (xi ; f^n;i ) gn (xi ; fn ) + gn (xi ; fn )] = op (1) ): This condition, and therefore the …rst part of Condition 3 d (AS), is satis…ed when nhn2 =(log n)3=2 ! 
Moreover,
\[
\mathbb{V}(\bar g_{n,f}(x_1)[\delta_{n,2}])=O(1/h_n^{d}),\qquad
\mathbb{V}[\mathbb{E}(\bar g_{n,ff}(x_1)[\delta_{n,2},\delta_{n,2}]\,|\,z_1)]=O(1/h_n^{2d}),
\]
\[
\mathbb{V}(\bar g_{n,ff}(x_1)[\delta_{n,2},\delta_{n,3}])=O(1/h_n^{2d}),\qquad
\mathbb{V}(\bar g_{n,ff}(x_1)[\delta_{n,2},\delta_{n,2}])=O(1/h_n^{3d}),
\]
so the assumptions of Lemma 4 will be satisfied provided $nh_n^d\to\infty$.

Condition (AN). We have
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n(z_i,f_n)+G_n(\hat f_{n,i})-G_n(f_n)\bigr]
=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[\psi_n(z_i)+B_n\bigr],
\]
where
\[
\psi_n(z)=g_n(z,f_n)-\mathbb{E}g_n(z,f_n)+\phi_n(x),\qquad
\phi_n(x)=(1-n^{-1})\int_{\mathbb{R}^d}f_{y|x}[f_n^+(s)|s]\bigl[K_n(s-x)-f_n(s)\bigr]f_0(s)\,ds,
\]
and
\[
B_n=\mathbb{E}g_n(z,f_n)+\frac12\,\frac1n\sum_{i=1}^{n}G_{n,ff}[\hat f_{n,i}-f_n,\hat f_{n,i}-f_n].
\]
If $h_n\to0$ and if $nh_n^d\to\infty$, then
\[
\psi_n(z)\to\psi(z)=g_0(z,f_0)+\phi_0(x),\qquad
\phi_0(x)=f_{y|x}[f_0(x)|x]f_0(x)-\int_{\mathbb{R}^d}f_{y|x}[f_0(x)|x]f_0(x)^2\,dx,
\]
for every $z$ and it follows from the dominated convergence theorem that (9) is satisfied. Also, (10) is satisfied if $h_n\to0$ and if $nh_n^d\to\infty$ because the representation
\[
G_{n,ff}[\delta,\eta]=(1-n^{-1})^2\int_{\mathbb{R}^d}\dot f_{y|x}[f_n^+(x)|x]\,\delta(x)\eta(x)f_0(x)\,dx
\]
can be used to show that if $h_n\to0$, then
\[
\mathbb{E}(\|G_{n,ff}[\delta_{n,1},\delta_{n,1}]\|^2)=O(1/h_n^{2d}),\qquad
\mathbb{E}(\|G_{n,ff}[\delta_{n,1},\delta_{n,2}]\|^2)=O(1/h_n^{d}).
\]
Condition (AN) is therefore satisfied (with $\Sigma=\mathbb{E}[\psi(z)\psi(z)']$) if $h_n\to0$ and if $nh_n^d\to\infty$. It can be shown that if also $nh_n^{2P}\to0$, then $\mathbb{E}B_n=\bar B_n+o(n^{-1/2})$, where
\[
\bar B_n=\frac{K(0)}{nh_n^d}\int_{\mathbb{R}^d}f_{y|x}[f_0(x)|x]f_0(x)\,dx
+\frac{1}{nh_n^d}\,\frac12\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\dot f_{y|x}[f_0(x)|x]\,K(u)^2 f_0(x)f_0(x-uh_n)\,dx\,du.
\]
Condition (AS*). Let $\bar g_n^*(x,f)=\bar g_n(x,f)$ and define
\[
\tilde g_n^*(x,f)=\bar g_n^*(x,\hat f_n)+\bar g_{n,f}^*(x)[f-\hat f_n]+\tfrac12\bar g_{n,ff}^*(x)[f-\hat f_n,f-\hat f_n],
\]
where
\[
\bar g_{n,f}^*(x)[\delta]=(1-n^{-1})f_{y|x}[\hat f_n^+(x)|x]\,\delta(x),\qquad
\bar g_{n,ff}^*(x)[\delta,\eta]=(1-n^{-1})^2\dot f_{y|x}[\hat f_n^+(x)|x]\,\delta(x)\eta(x).
\]
Defining $N_i=\sum_{j=1}^{n}\mathbf{1}(x_j^*=x_i)$ and using the fact (about the multinomial distribution) that $n^{-1}\sum_{i=1}^{n}N_i^2=O_p(1)$, it can be shown that
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n^*(z_i^*,\hat f_{n,i}^*)-\bar g_n^*(x_i^*,\hat f_{n,i}^*)-g_n^*(z_i^*,\hat f_n)+\bar g_n^*(x_i^*,\hat f_n)\bigr]=o_p(1)
\]
if $\hat\delta_n=o_p(1)$, because then the conditional second moment
\[
\mathbb{E}^*\Bigl[\Bigl(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n^*(z_i^*,\hat f_{n,i}^*)-\bar g_n^*(x_i^*,\hat f_{n,i}^*)-g_n^*(z_i^*,\hat f_n)+\bar g_n^*(x_i^*,\hat f_n)\bigr]\Bigr)^2\Big|\,\mathbf{X}_n,\mathbf{X}_n^*\Bigr]
=\frac1n\sum_{i=1}^{n}\mathbb{V}^*\bigl[g_n^*(z_i^*,\hat f_{n,i}^*)-g_n^*(z_i^*,\hat f_n)\,\big|\,\mathbf{X}_n,\mathbf{X}_n^*\bigr]
\]
is bounded by a constant multiple of
\[
\Bigl(\frac1n\sum_{i=1}^{n}N_i^2\Bigr)\Bigl(\sup_{r,s}f_{y|x}(r|s)\Bigr)\hat\delta_n=o_p(1),
\]
where $\mathbf{X}_n^*=(x_1^*,\ldots,x_n^*)'$. Also, it follows from standard bounding arguments that
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[\bar g_n^*(x_i^*,\hat f_{n,i}^*)-\tilde g_n^*(x_i^*,\hat f_{n,i}^*)-\bar g_n^*(x_i^*,\hat f_n)+\tilde g_n^*(x_i^*,\hat f_n)\bigr]=o_p(1)
\]
provided $\hat\delta_n=o_p(n^{-1/6})$. This condition, and therefore the first part of Condition (AS*), is satisfied when $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$. Moreover,
\[
\mathbb{V}^*(\bar g_{n,f}^*(x_1^*)[\hat\delta_{n,2}])=O_p(1/h_n^{d}),\qquad
\mathbb{V}^*[\mathbb{E}^*(\bar g_{n,ff}^*(x_1^*)[\hat\delta_{n,2},\hat\delta_{n,2}]\,|\,x_1^*)]=O_p(1/h_n^{2d}),
\]
\[
\mathbb{V}^*(\bar g_{n,ff}^*(x_1^*)[\hat\delta_{n,2},\hat\delta_{n,3}])=O_p(1/h_n^{2d}),\qquad
\mathbb{V}^*(\bar g_{n,ff}^*(x_1^*)[\hat\delta_{n,2},\hat\delta_{n,2}])=O_p(1/h_n^{3d}),
\]
so the second part of Condition (AS*) will be satisfied provided $nh_n^d\to\infty$.

Condition (AN*). Finally, we have
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[g_n^*(z_i^*,\hat f_n)+G_n^*(\hat f_{n,i}^*)-G_n^*(\hat f_n)\bigr]
=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl[\psi_n^*(z_i^*)+B_n^*\bigr],
\]
where
\[
\psi_n^*(z)=g_n^*(z,\hat f_n)-\frac1n\sum_{i=1}^{n}g_n^*(z_i,\hat f_n)+\phi_n^*(z),
\]
\[
\phi_n^*(z)=(1-n^{-1})\frac1n\sum_{i=1}^{n}f_{y|x}[\hat f_n^+(x_i)|x_i]\bigl[K_n(x_i-x)-\hat f_n(x_i)\bigr],
\]
\[
B_n^*=\frac1n\sum_{i=1}^{n}g_n^*(z_i,\hat f_n)+\frac12\,\frac1n\sum_{i=1}^{n}G_{n,ff}^*[\hat f_{n,i}^*-\hat f_n,\hat f_{n,i}^*-\hat f_n].
\]
Suppose $h_n\to0$ and $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$. Using Lemma 10 and the fact that $\hat\delta_n\to_p0$, it can be shown that $n^{-1}\sum_{i=1}^{n}\|\psi_n^*(z_i)-\psi_n(z_i)\|^2\to_p0$. Also, the bootstrap analog of (10) is satisfied because the representation
\[
G_{n,ff}^*[\delta,\eta]=(1-n^{-1})^2\frac1n\sum_{i=1}^{n}\dot f_{y|x}[\hat f_n^+(x_i)|x_i]\,\delta(x_i)\eta(x_i)
\]
can be used to show that
\[
\mathbb{E}^*(\|G_{n,ff}^*[\hat\delta_{n,1},\hat\delta_{n,1}]\|^2)=O_p(1/h_n^{2d}),\qquad
\mathbb{E}^*(\|G_{n,ff}^*[\hat\delta_{n,1},\hat\delta_{n,2}]\|^2)=O_p(1/h_n^{d}).
\]
Finally, it can be shown that $\mathbb{E}^*(B_n^*)=\bar B_n+o_p(n^{-1/2})$ if $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$. In other words, Condition (AN*) holds if $h_n\to0$ and if $nh_n^{3d/2}/(\log n)^{3/2}\to\infty$.
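The bootstrap procedure these conditions validate recomputes the nonparametric first step on every bootstrap sample. A stylized end-to-end sketch (our toy design, reading the Example 3 estimator as a plug-in frequency of the form $n^{-1}\sum_i\mathbf{1}(y_i\le\hat f_{n,i}(x_i))$; not the paper's implementation) is:

```python
import numpy as np

def kde_loo(x, h):
    """Leave-one-out Gaussian kernel density estimates at the sample points, d = 1."""
    n = len(x)
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(K, 0.0)
    return K.sum(axis=1) / ((n - 1) * h)

def theta_hat(y, x, h):
    """Stylized Example-3-type estimator: fraction of y_i below the estimated density."""
    return float(np.mean(y <= kde_loo(x, h)))

rng = np.random.default_rng(7)
n, h, B = 300, 0.4, 200
x = rng.standard_normal(n)
y = 0.2 * rng.standard_normal(n) + 0.2   # y concentrated near typical density values

t_hat = theta_hat(y, x, h)
t_boot = np.empty(B)
for b in range(B):                        # nonparametric bootstrap: resample pairs
    idx = rng.integers(0, n, n)
    t_boot[b] = theta_hat(y[idx], x[idx], h)   # re-estimate f_hat on each resample
```

The key design choice, in line with the paper's second main result, is that `kde_loo` is rerun inside the bootstrap loop: the bootstrap distribution of $\hat\theta^*$ then reproduces the first-step estimation error, including the nonnegligible bias, rather than treating $\hat f_n$ as fixed.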