Penn Institute for Economic Research
Department of Economics, University of Pennsylvania
3718 Locust Walk, Philadelphia, PA 19104-6297
pier@econ.upenn.edu
http://www.econ.upenn.edu/pier

PIER Working Paper 08-006
http://ssrn.com/abstract=1097564

Bootstrap Tests of Stochastic Dominance with Asymptotic Similarity on the Boundary

Oliver Linton[1], Kyungchul Song[2] and Yoon-Jae Whang[3]
London School of Economics, University of Pennsylvania and Seoul National University

February 19, 2008

Abstract

We propose a new method of testing stochastic dominance that improves on existing tests based on the bootstrap or subsampling. Our test requires estimation of the contact sets between the marginal distributions. Our tests have asymptotic sizes that are exactly equal to the nominal level uniformly over the boundary points of the null hypothesis and are therefore valid over the whole null hypothesis. We also allow the prospects to be indexed by infinite-dimensional as well as finite-dimensional unknown parameters, so that the variables may be residuals from nonparametric and semiparametric models. Our simulation results show that our tests are indeed more powerful than the existing subsampling and recentered bootstrap tests.

Key words and Phrases: Set estimation; Size of test; Unbiasedness; Similarity; Bootstrap; Subsampling.
JEL Classifications: C12, C14, C52.

1 Introduction

There has been a growth of interest in testing stochastic dominance relations among variables such as investment strategies and income distributions. The ordering by stochastic dominance covers a large class of utility functions and hence is more useful in general than

[1] Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom. E-mail address: o.linton@lse.ac.uk. Thanks to the ESRC and Leverhulme foundations for financial support.
[2] Department of Economics, University of Pennsylvania, 528 McNeil Building, 3718 Locust Walk, Philadelphia, Pennsylvania 19104-6297. E-mail address: kysong@sas.upenn.edu
[3] Department of Economics, Seoul National University, Seoul 151-742, Korea. E-mail address: whang@snu.ac.kr. Thanks to the Korea Research Foundation for financial support.

the notions of partial orderings that are specific to a certain class of utility functions. However, the main difficulty in testing stochastic dominance lies in the complexity of computing approximate critical values for the test. A stochastic dominance relation is a weak inequality relation between distribution functions or integral transforms of distribution functions. Unlike for nonparametric or semiparametric tests of the equality of functions, the convergence of test statistics of Kolmogorov-Smirnov type or Cramér-von Mises type is not uniform over the probabilities under the null hypothesis. The discontinuity of convergence arises precisely between the "interior points" and the "boundary points" of the null hypothesis, where boundary points are those probabilities under which all pairs of the distribution functions meet at one or more points in the interior of the support, and interior points are probabilities under which at least one pair of the distribution functions does not meet at any point in the interior of the support. In general, the boundary points do not coincide with the least favorable subset of the null hypothesis, i.e., the set of probabilities under which all the pairs of competing distribution functions are equal.
As noted by Linton, Maasoumi, and Whang (2005), henceforth LMW, the usual approach of recentered bootstrap or wild bootstrap does not provide tests with asymptotically valid sizes uniformly over the probabilities under the null hypothesis: the asymptotic sizes of the tests are exact only at the least favorable points and invalid at other points under the null hypothesis. LMW proposed a subsampling method to obtain tests with exact asymptotic sizes over the boundary points that are asymptotically valid for each probability under the null hypothesis.

This paper proposes a bootstrap-based test that has asymptotically valid size uniformly under the null hypothesis and asymptotically exact size on the boundary points. The method is based on bootstrap test statistics that are constructed to mimic the phenomenon of discontinuous convergence by using estimated boundary points. This requires the estimation of a certain set, which we call the "contact set"; set estimation is a topic of considerable recent interest in microeconometrics, see for example Moon and Schorfheide (2006), Chernozhukov, Hong, and Tamer (2007), and Fan and Park (2007). Our bootstrap method has the following remarkable properties. First, it is asymptotically similar on the boundary. Second, and more importantly, it provides asymptotically exact sizes. Furthermore, the rejection probability under the null hypothesis can be made to converge to the true rejection probability only slightly more slowly than the asymptotic approximation; this implies that the convergence of the rejection probability can be made faster than that of subsampling. The proposed test of stochastic dominance admits variables that contain unknown finite-dimensional and infinite-dimensional parameters, as long as those parameters have consistent estimators.
Hence the proposed test can be used to test stochastic dominance relations among variables in the form of regression errors from semiparametric regression models such as single index restrictions or partially parametric regressions. For example, one can detect the influence of a certain factor upon the stochastic dominance relation by comparing test results with and without the factor of interest while controlling for other factors. In some situations, it is desirable to control for publicly observed variables. It is a potential concern for the researcher in such cases that the result of testing may depend sensitively on the regression specification chosen to control for certain factors; a semiparametric specification of the regression can be useful in this situation. We perform Monte Carlo simulations that compare three methods: the recentered bootstrap, the subsampling method, and the bootstrap on the estimated boundary domain proposed in this paper, and we verify the superior performance of our method.

Testing stochastic dominance has drawn attention in the literature over the last decade. McFadden (1986), Anderson (1996) and Davidson and Duclos (2000) are among the early works that considered testing stochastic dominance. Barrett and Donald (2003) proposed a consistent bootstrap test that has an asymptotically exact size on the least favorable points for the special case where the prospects are mutually independent. LMW suggested a subsampling method that has asymptotically valid size on the boundary points, applies to variables that contain unknown finite-dimensional parameters, and allows the prospects to be mutually dependent.

In Section 2, we define the null hypothesis of stochastic dominance and introduce notation. In Section 3, we suggest test statistics and develop asymptotic theory both under the null hypothesis and under local alternatives.
Section 4 is devoted to the bootstrap procedure, explaining the method of obtaining bootstrap test statistics and establishing their asymptotic properties. In Section 5, we describe Monte Carlo simulation studies and discuss their results. Section 6 concludes. All the technical proofs are relegated to the appendix.

2 Stochastic Dominance

2.1 The Null Hypothesis

Let $\{X_k\}_{k=1}^K$ be a set of continuous outcome variables which may, for example, represent different points in time or different regions. Let $F_k(x)$ be the distribution function of the $k$-th random variable $X_k$. Let $D_k^{(1)}(x) = F_k(x)$ and
$$D_k^{(s)}(x) = \int_{-\infty}^x D_k^{(s-1)}(t)\,dt.$$
Then we say that $X_1$ stochastically dominates $X_2$ at order $s$ if $D_1^{(s)}(x) \le D_2^{(s)}(x)$ for all $x$, with strict inequality for some $x$. This is equivalent to an ordering of expected utility over a certain class of utility functions $\mathcal{U}_s$; see LMW. Let $D_{kl}^{(s)}(x) = D_k^{(s)}(x) - D_l^{(s)}(x)$ and define
$$d_s^* = \min_{k \ne l} \sup_{x \in \mathcal{X}} D_{kl}^{(s)}(x) \quad \text{and} \quad c_s^* = \min_{k \ne l} \int_{\mathcal{X}} \max\left\{ D_{kl}^{(s)}(x), 0 \right\}^2 dx,$$
where $\mathcal{X}$ denotes a set contained in the union of the supports of $X_k$, $k = 1, 2, \ldots, K$. We do not assume that the set $\mathcal{X}$ is bounded. The null hypothesis that this paper focuses on takes the following form:
$$H_{0,A}: d_s^* \le 0 \quad \text{vs.} \quad H_{1,A}: d_s^* > 0. \tag{1}$$
The null hypothesis represents the presence of a stochastic dominance relationship between a pair of variables in $\{X_k\}_{k=1}^K$; the alternative hypothesis corresponds to no such incidence. Alternatively, we might also consider the following form of the null and alternative hypotheses:
$$H_{0,B}: c_s^* = 0 \quad \text{vs.} \quad H_{1,B}: c_s^* > 0. \tag{2}$$
Obviously the null hypothesis $d_s^* \le 0$ implies the null hypothesis $c_s^* = 0$. Conversely, when $c_s^* = 0$, $D_{kl}^{(s)}(x) \le 0$, $\mu$-a.e., for some pair $(k, l)$, where $\mu$ denotes the Lebesgue measure on $\mathcal{X}$. Therefore, if the $D_k^{(s)}$'s are continuous functions, the null hypotheses in (1) and (2) are equivalent, but in general the null hypothesis in (1) is stronger than that in (2).
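As a quick numerical illustration of the recursive definition of $D_k^{(s)}$, the following Python sketch (our own function names and discretization, not the paper's) approximates the integral by cumulative trapezoids and checks that, for the Uniform(0,1) distribution, $D^{(2)}(x) = x^2/2$ on $[0,1]$.

```python
import numpy as np

def D(s, F, grid):
    """Evaluate D^(s) on a grid from D^(1) = F via the recursion
    D^(s)(x) = integral of D^(s-1) from -inf to x (trapezoid rule)."""
    vals = F(grid)
    for _ in range(s - 1):
        dx = np.diff(grid)
        # running trapezoid integral, starting at the left end of the grid
        vals = np.concatenate(([0.0], np.cumsum(0.5 * (vals[1:] + vals[:-1]) * dx)))
    return vals

grid = np.linspace(0.0, 1.0, 2001)
F_unif = lambda x: np.clip(x, 0.0, 1.0)   # Uniform(0,1) cdf
D2 = D(2, F_unif, grid)                   # should match x^2 / 2 on [0,1]
```

The same routine applies to any estimated or theoretical c.d.f. supported (up to negligible mass) on the grid's range.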
Of crucial importance in the sequel are the "contact" sets: for each pair $k \ne l$, the contact set $B_{kl}^{(s)}$ is the subset of $\mathcal{X}$ on which $D_k^{(s)}(x)$ and $D_l^{(s)}(x)$ are equal, i.e.,
$$B_{kl}^{(s)} = \left\{ x \in \mathcal{X} : D_{kl}^{(s)}(x) = 0 \right\}. \tag{3}$$
These sets can be empty, can contain a countable number of isolated points, or can be a countable union of disjoint intervals. The limiting distributions of our test statistics depend only on these subsets of $\mathcal{X}$.

2.2 Test Statistics and Asymptotic Theory

In many cases of practical interest the outcome variable may be the residual from some model. This arises in particular when data are limited and one may want to use a model to adjust for systematic differences. Common practice is to cut the data into subsets, say of families with two children, and then make comparisons across homogeneous populations. When data are limited this can be difficult, and a modelling approach can overcome the difficulty. We suppose that the variables $X_k$ depend on an unknown finite-dimensional parameter $\theta_0 \in \mathbb{R}^{d_\Theta}$ and on an infinite-dimensional parameter $\tau_0 \in \mathcal{T}$ (here, $\mathcal{T}$ is a totally bounded class of functions with respect to a certain metric), so that we write $X_k = X_k(\theta_0, \tau_0)$. For example, the variable $X_k$ may be the residual from the partially parametric regression, $X_k(\theta_0, \tau_0) = Y_k - Z_{1k}^{\top}\theta_0 - \tau_0(Z_{2k})$, or from the single index framework, $X_k(\theta_0, \tau_0) = Y_k - \tau_0(Z_{1k}^{\top}\theta_0)$. Generically we let $X_k(\theta, \tau)$ be specified as $X_k(\theta, \tau) = \varphi_k(W; \theta, \tau)$, where $\varphi_k(\cdot; \theta, \tau)$ is a real-valued function known up to the parameter $(\theta, \tau) \in \Theta \times \mathcal{T}$. In the example of $X_k(\theta_0, \tau_0) = Y_k - \tau_0(Z_{1k}^{\top}\theta_0)$, $W$ denotes the vector of $Y_k$ and $Z_{1k}$, $k = 1, \ldots, K$. This set-up is more general than that of LMW, who allowed only linear regression. We next specify further the precise properties that we require of the data generating process. Let $B_{\Theta \times \mathcal{T}}(\delta) = \{(\theta, \tau) \in \Theta \times \mathcal{T} : \|\theta - \theta_0\| < \delta,\ \sup_{P \in \mathcal{P}} \|\tau - \tau_0\|_{P,2} < \delta\}$, where the norm $\|\cdot\|_{P,2}$ denotes the usual $L_2(P)$-norm.
We introduce a bounded weight function $q(x)$ (see Assumption 3(ii)) and define $h_x^{(s)}(\varphi) = (x - \varphi)^{s-1} 1\{\varphi \le x\}\,q(x)$. Let $N_{[]}(\varepsilon, \mathcal{T}, \|\cdot\|_{P,r})$ denote the bracketing number of $\mathcal{T}$ with respect to the $L_r(P)$-norm, i.e., the smallest number of $\varepsilon$-brackets needed to cover the space $\mathcal{T}$ (e.g., van der Vaart and Wellner (1996)). The conditions in Assumptions 1 and 2 below concern the data generating process of $W$ and the map $\varphi_k$. Let $\mathcal{P}$ be the collection of all the potential distributions of $W$ that satisfy Assumptions 1-3 below.

Assumption 1 : (i) $\{W_i\}_{i=1}^N$ is a random sample.
(ii) $\sup_{P \in \mathcal{P}} \log N_{[]}(\varepsilon, \mathcal{T}, \|\cdot\|_{P,2}) \le C\varepsilon^{-d}$ for some $d \in (0, 1]$.
(iii) For some $\delta > 0$ and $s > 0$, $\sup_{P \in \mathcal{P}} E_P\big[\sup_{(\theta,\tau) \in B_{\Theta \times \mathcal{T}}(\delta)} |\varphi_k(W; \theta, \tau)|^{2((s-1)\vee 1)+\delta}\big] < \infty$.
(iv) For some $\delta > 0$, there exists a functional $\Gamma_{k,P}(x)[\theta - \theta_0, \tau - \tau_0]$ of $(\theta - \theta_0, \tau - \tau_0)$, $(\theta, \tau) \in B_{\Theta \times \mathcal{T}}(\delta)$, such that
$$\left| E_P\big[h_x^{(s)}(\varphi_k(W; \theta, \tau))\big] - E_P\big[h_x^{(s)}(\varphi_k(W; \theta_0, \tau_0))\big] - \Gamma_{k,P}(x)[\theta - \theta_0, \tau - \tau_0] \right| \le C_1\|\theta - \theta_0\|^2 + C_2\|\tau - \tau_0\|_{P,2}^2$$
with constants $C_1$ and $C_2$ that do not depend on $P$, and for each $\varepsilon > 0$,
$$\limsup_{N \to \infty}\, \sup_{P \in \mathcal{P}} P\left\{ \sup_{x \in \mathcal{X}} \left| \sqrt{N}\,\Gamma_{k,P}(x)[\hat\theta - \theta_0, \hat\tau - \tau_0] - \frac{1}{\sqrt{N}} \sum_{i=1}^N \psi_{x,k,P}(W_i) \right| > \varepsilon \right\} = 0, \tag{4}$$
where $\psi_{x,k,P}(\cdot)$ satisfies the following: there exist $\eta, C > 0$ and $s_1 \in (0, 1]$ such that for all $x \in \mathcal{X}$, $E_P[\psi_{x,k,P}(W_i)] = 0$, $\sup_{P \in \mathcal{P}} \big\| \sup_{x \in \mathcal{X}} |\psi_{x,k,P}| \big\|_{P,2+\eta} < \infty$, and
$$\sup_{P \in \mathcal{P}} E_P\left[ \sup_{x_2 \in \mathcal{X} : |x - x_2| \le \varepsilon} \left|\psi_{x,k,P}(W_i) - \psi_{x_2,k,P}(W_i)\right|^2 \right] \le C\varepsilon^{2s_1}, \text{ for all } \varepsilon > 0.$$

The bracketing entropy condition for $\mathcal{T}$ in (ii) is satisfied by many classes of functions. For example, it holds when $\mathcal{T}$ is a Hölder class of smoothness $\alpha$ whose common domain of $\tau(\cdot) \in \mathcal{T}$ is convex and bounded in the $d_{\mathcal{T}}$-dimensional Euclidean space with $d_{\mathcal{T}}/\alpha \in (0, 1]$ (e.g., Corollary 2.7.2 in van der Vaart and Wellner (1996)). The condition is also satisfied when $\mathcal{T}$ is contained in a class of uniformly bounded functions of bounded variation.
Condition (iii) is a moment condition with local uniform boundedness; such moment conditions are widely used in the semiparametric inference literature. In the example of single-index restrictions where $Y_k = \tau_0(Z_{1k}^{\top}\theta_0) + \varepsilon_k$, we can write $\varphi_k(w; \theta, \tau) = \tau_0(z_{1k}^{\top}\theta_0) - \tau(z_{1k}^{\top}\theta) + \varepsilon_k$. If $\tau$ is uniformly bounded in the neighborhood of $\tau_0$ in $L_2(P)$, the moment condition is immediately satisfied when $E[|\varepsilon_k|^{2((s-1)\vee 1)+\delta}] < \infty$. Alternatively, if $\tau$ is Lipschitz continuous with a uniformly bounded Lipschitz coefficient, $|\tau_0(v) - \tau(v)| \le C|v|^p$, $p \ge 1$, and $E[\|Z_{1k}\|^{p \vee (2((s-1)\vee 1)+\delta)}] < \infty$, then the moment condition holds. We may check this condition for other semiparametric specifications in a similar manner. Condition (iv) concerns the pathwise differentiability of the functional $\int h_x^{(s)}(\varphi_k(W; \theta, \tau))\,dP$ in $(\theta, \tau) \in B_{\Theta \times \mathcal{T}}(\delta)$. Let $\varphi_1 = \varphi(W; \theta, \tau)$ and $\varphi_2 = \varphi(W; \theta_0, \tau_0)$. Then we can write
$$h_x^{(s)}(\varphi_1) - h_x^{(s)}(\varphi_2) = \left\{(x - \varphi_1)^{s-1} - (x - \varphi_2)^{s-1}\right\}1\{\varphi_1 \le x\}\,q(x) + (x - \varphi_2)^{s-1}\left\{1\{\varphi_1 \le x\} - 1\{\varphi_2 \le x\}\right\}q(x).$$
Hence the pathwise differentiability is immediate when $\varphi(w; \theta, \tau)$ is continuously differentiable in $(\theta, \tau)$ and $W$ is continuous with a bounded density. The condition in (4) requires that the functional $\Gamma_{k,P}$, evaluated at the estimators, have an asymptotically linear representation. This condition can be established using the standard method of expanding the functional in terms of the estimators $\hat\theta$ and $\hat\tau$ and using the asymptotic linear representations of these estimators, which are available in many semiparametric models. Since our procedure does not make use of any particular characteristic beyond the condition in (4), we keep this condition at a high level for the sake of brevity.

Assumption 2 : (i) $\varphi(W; \theta_0, \tau_0)$ is absolutely continuous with bounded density.
(ii) Condition (A) below holds when $s = 1$, and Condition (B) when $s > 1$.
(A) There exist $\delta, C > 0$ and a subvector $W_1$ of $W$ such that (a) the conditional density of $\varphi(W; \theta, \tau)$ given $W_1$ is bounded uniformly over $(\theta, \tau) \in B_{\Theta \times \mathcal{T}}(\delta)$; (b) for each $(\theta, \tau)$ and $(\theta', \tau') \in B_{\Theta \times \mathcal{T}}(\delta)$, $\varphi_k(W; \theta, \tau) - \varphi_k(W; \theta', \tau')$ is measurable with respect to the $\sigma$-field of $W_1$; and (c) for each $(\theta_1, \tau_1) \in B_{\Theta \times \mathcal{T}}(\delta)$ and for each $\varepsilon > 0$,
$$\sup_{P \in \mathcal{P}}\, \sup_{w_1} E_P\left[ \sup_{(\theta_2, \tau_2) \in B_{\Theta \times \mathcal{T}}(\varepsilon)} \left|\varphi_k(W; \theta_1, \tau_1) - \varphi_k(W; \theta_2, \tau_2)\right|^2 \,\Big|\, W_1 = w_1 \right] \le C\varepsilon^{2s_2} \tag{5}$$
for some $s_2 \in (d, 1]$, with $d$ as in Assumption 1(ii), where the supremum over $w_1$ runs over the support of $W_1$.

(B) There exist $\delta, C > 0$ such that Condition (c) above is satisfied.

Assumption 2(ii) contains two different conditions suited to the cases $s = 1$ and $s > 1$, respectively. This separate treatment is due to the nature of the function $h_x^{(s)}(\varphi) = (x - \varphi)^{s-1}1\{\varphi \le x\}\,q(x)$, which is discontinuous in $\varphi$ when $s = 1$ and smooth in $\varphi$ when $s > 1$. Condition (A) can be viewed as a generalization of the set-up of LMW. Condition (A)(a) is analogous to Assumption 1(iii) of LMW. Condition (A)(b) is satisfied by many semiparametric models. For example, in the case of a partially parametric specification, $X_k(\theta_0, \tau_0) = Y_k - Z_{1k}^{\top}\theta_0 - \tau_0(Z_{2k})$, we take $W = (Y, Z_1, Z_2)$ and $W_1 = (Z_1, Z_2)$; in the case of single index restrictions, $X_k(\theta_0, \tau_0) = Y_k - \tau_0(Z_k^{\top}\theta_0)$, $W = (Y, Z)$ and $W_1 = Z$. The condition in (5) requires the function $\varphi_k(W; \theta, \tau)$ to be (conditionally) locally uniformly $L_2(P)$-continuous in $(\theta, \tau) \in B_{\Theta \times \mathcal{T}}(\delta)$, uniformly over $P \in \mathcal{P}$. Sufficient conditions and discussions can be found in Chen, Linton, and van Keilegom (2003). We can weaken this condition to its unconditional version when we consider only the case $s > 1$.

We now turn to the definition of the test statistics.
Let $\bar F_{kN}(x, \theta, \tau) = \frac{1}{N}\sum_{i=1}^N 1\{X_{ki}(\theta, \tau) \le x\}$ and
$$\bar D_{kl}^{(s)}(x, \theta, \tau) = \bar D_k^{(s)}(x, \theta, \tau) - \bar D_l^{(s)}(x, \theta, \tau) \text{ for } s \ge 1,$$
where $\bar D_k^{(1)}(x, \theta, \tau) = \bar F_{kN}(x, \theta, \tau)$ and $\bar D_k^{(s)}(x, \theta, \tau)$ is defined through the following recursive relation:
$$\bar D_k^{(s)}(x, \theta, \tau) = \int_{-\infty}^x \bar D_k^{(s-1)}(t, \theta, \tau)\,dt \text{ for } s \ge 2. \tag{6}$$
The test statistics we consider are based on the weighted empirical analogues of $d_s^*$ and $c_s^*$, namely,
$$D_N^{(s)} = \min_{k \ne l} \sup_{x \in \mathcal{X}} q(x)\sqrt{N}\,\bar D_{kl}^{(s)}(x, \hat\theta, \hat\tau) \quad \text{and} \quad C_N^{(s)} = \min_{k \ne l} \int_{\mathcal{X}} \max\left\{ q(x)\sqrt{N}\,\bar D_{kl}^{(s)}(x, \hat\theta, \hat\tau),\ 0 \right\}^2 dx.$$
Horváth, Kokoszka, and Zitikis (2006) showed that in the case of the Kolmogorov-Smirnov functional with $s \ge 3$, it is necessary to employ a weight function $q(x)$ such that $q(x) \to 0$ as $x \to \infty$ in order to obtain nondegenerate asymptotics. The numerical integration in (6) can be cumbersome in practice. Integrating by parts, we have the alternative form
$$\bar D_k^{(s)}(x, \theta, \tau) = \frac{1}{N(s-1)!}\sum_{i=1}^N (x - X_{ki}(\theta, \tau))^{s-1}\, 1\{X_{ki}(\theta, \tau) \le x\}.$$
Since $\bar D_k^{(s)}$ and $D_k^{(s)}$ are obtained by applying a linear operator to $\bar F_{kN}$ and $F_k$, the estimated function $\bar D_k^{(s)}$ is an unbiased estimator of $D_k^{(s)}$. Regarding the estimators $\hat\theta$ and $\hat\tau$ and the choice of the weight function $q$, we assume the following.

Assumption 3 : (i) $\|\hat\theta - \theta_0\| = o_P(N^{-1/4})$ and $\|\hat\tau - \tau_0\|_{P,2} = o_P(N^{-1/4})$ uniformly in $P \in \mathcal{P}$, and $P\{\hat\tau \in \mathcal{T}\} \to 1$ as $N \to \infty$ uniformly in $P \in \mathcal{P}$.
(ii) $\sup_{x \in \mathcal{X}} \left(1 + |x|^{(s-1)\vee(1+\delta)}\right) q(x) < \infty$ for some $\delta > 0$, where $q$ is a nonnegative, first-order continuously differentiable function on $\mathcal{X}$ with a bounded derivative.

When $\hat\theta$ is a solution of an M-estimation problem, its rate of convergence can be obtained by following the procedure of Theorem 3.2.5 of van der Vaart and Wellner (1996). The uniformity in $P$ in this case can be ensured by combining the uniform (in $P$) oscillation behavior of the population objective function and the associated empirical processes.
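The integration-by-parts formula lends itself to a one-line implementation. The sketch below (our own notation, taking $q \equiv 1$ and a plain sample in place of estimated residuals) evaluates $\bar D_k^{(s)}(x)$ directly and checks it against the empirical c.d.f. for $s = 1$ and against $\frac{1}{N}\sum_i \max(x - X_i, 0)$ for $s = 2$.

```python
import numpy as np
from math import factorial

def D_bar(s, x, sample):
    """Empirical integrated cdf via the integration-by-parts formula:
    D_bar^(s)(x) = (1 / (N (s-1)!)) * sum_i (x - X_i)^(s-1) * 1{X_i <= x}."""
    sample = np.asarray(sample, dtype=float)
    N = sample.size
    vals = (x - sample) ** (s - 1) * (sample <= x)
    return vals.sum() / (N * factorial(s - 1))

rng = np.random.default_rng(1)
X = rng.uniform(size=500)
x0 = 0.7
d1 = D_bar(1, x0, X)   # empirical cdf at x0
d2 = D_bar(2, x0, X)   # equals the sample mean of max(x0 - X_i, 0)
```

Because the formula is a plain sample average, no numerical integration over the recursion (6) is needed at any evaluation point.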
For instance, one may employ the uniform oscillation exponential bound for empirical processes established in Theorem 6.1 of Giné (1997). The rate of convergence for the nonparametric component $\hat\tau$ can also be established by controlling the bias and variance parts uniformly in $\mathcal{P}$. In order to control the variance part, one may employ the framework of Giné and Zinn (1991) and establish a central limit theorem that is uniform in $P$. Sufficient conditions for the last condition in (i) can be checked, for example, from the results of Andrews (1994). Condition (ii) is very convenient and is fulfilled by an appropriate choice of the weight function. The use of the weight function is convenient because $\mathcal{X}$ is allowed to be unbounded. Condition (ii) is stronger than that of Horváth, Kokoszka, and Zitikis (2006), who, under a set-up simpler than this paper's, imposed that $\sup_{x \in \mathcal{X}} \left(1 + \max(x, 0)^{(s-2)\vee 1}\right) q(x) < \infty$. Note that the condition in (ii) implies that when $\mathcal{X}$ is a bounded set, we may simply take $q(x) = 1$. When $s = 1$, so that our focus is on the first order stochastic dominance relation, we may transform the variable $X_{ki}(\theta, \tau)$ into one that has a bounded support by taking a smooth strictly monotone transformation; after this transformation, we can simply take $q(x) = 1$.

The first result below is concerned with the convergence in distribution of $D_N^{(s)}$ and $C_N^{(s)}$ under the null hypothesis, uniformly in a subset of $\mathcal{P}$. With $h_x^{(s)}$ defined prior to Assumption 1, we define $h_{kl,i}^{\Delta}(x) = h_x^{(s)}(X_{ki}) - h_x^{(s)}(X_{li})$ and $\psi_{kl,i}^{\Delta}(x) = \psi_{x,k,P}(W_i) - \psi_{x,l,P}(W_i)$. Let $\nu_{kl}(\cdot)$ be a Gaussian process on $\mathcal{X}$ with covariance kernel given by
$$C_P(x_1, x_2) = \mathrm{Cov}_P\left(h_{kl,i}^{\Delta}(x_1) + \psi_{kl,i}^{\Delta}(x_1),\ h_{kl,i}^{\Delta}(x_2) + \psi_{kl,i}^{\Delta}(x_2)\right),$$
where $\mathrm{Cov}_P$ denotes the covariance under $P$. The asymptotic critical values are based on this Gaussian process. By convention, we take the supremum over an empty set to be $-\infty$.
Let $\mathcal{P}_0$ be the set of probabilities under the null hypothesis and let $\mathcal{P}_{00}^{KS}$ be the set of probabilities in $\mathcal{P}_0$ such that the contact set $B_{kl}^{(s)}$ is nonempty for all pairs $k \ne l$.[4] The set of probabilities $\mathcal{P}_{00}^{KS}$ represents the "boundary" of the null hypothesis; in other words, the boundary points are the probabilities under which the contact set $B_{kl}^{(s)}$ is nonempty for all $k \ne l$. We define the set of boundary points differently for the Cramér-von Mises test $C_N^{(s)}$: let $\mathcal{P}_{00}^{CM}$ be the set of probabilities in $\mathcal{P}_0$ such that for each $k \ne l$, the contact set $B_{kl}^{(s)}$ has positive Lebesgue measure. Hence, in general, $\mathcal{P}_{00}^{CM} \subset \mathcal{P}_{00}^{KS}$.

Theorem 1 : Suppose that Assumptions 1-3 hold. Then under the null hypothesis, as $N \to \infty$,
$$D_N^{(s)} \to_D \begin{cases} \min_{k \ne l} \sup_{x \in B_{kl}^{(s)}} \nu_{kl}(x), & \text{if } P \in \mathcal{P}_{00}^{KS} \\ -\infty, & \text{if } P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{KS}, \end{cases} \quad \text{and}$$
$$C_N^{(s)} \to_D \begin{cases} \min_{k \ne l} \int_{B_{kl}^{(s)}} \max\{\nu_{kl}(x), 0\}^2\,dx, & \text{if } P \in \mathcal{P}_{00}^{CM} \\ 0, & \text{if } P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{CM}. \end{cases}$$

Under a fixed probability $P \in \mathcal{P}_{00}^{KS}$ on the boundary, the test statistic is asymptotically tight uniformly over $P \in \mathcal{P}_{00}^{KS}$. But under a fixed probability $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{KS}$, we have $\sup_{x \in \mathcal{X}} D_{kl}^{(s)}(x) < 0$ for some $k, l$, and hence the test statistic diverges to $-\infty$. The limiting distribution of the test statistic $D_N^{(s)}$ thus exhibits discontinuity: the statistic is not asymptotically tight at each $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{KS}$, while it is at each $P \in \mathcal{P}_{00}^{KS}$. This phenomenon of discontinuity arises often in moment inequality models (e.g., Moon and Schorfheide (2006), Chernozhukov, Hong, and Tamer (2007), and Andrews and Guggenberger (2006)). As for $C_N^{(s)}$, a similar discontinuity arises in the limiting distribution: when $P \in \mathcal{P}_{00}^{CM}$, $C_N^{(s)}$ has a nondegenerate limiting distribution, but when $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{CM}$, the limiting distribution is degenerate at zero. We introduce the following definition of a test having an asymptotically exact size.
Definition 1 : (i) A test $\varphi_\alpha$ with a nominal level $\alpha$ is said to have an asymptotically exact size if there exists a nonempty subset $\mathcal{P}_{00} \subset \mathcal{P}_0$ such that:
$$\limsup_{N \to \infty}\, \sup_{P \in \mathcal{P}_0} E_P \varphi_\alpha \le \alpha \quad \text{and} \quad \limsup_{N \to \infty}\, \sup_{P \in \mathcal{P}_{00}} E_P \varphi_\alpha = \alpha. \tag{7}$$
(ii) When a test $\varphi_\alpha$ satisfies (7), we say that the test is asymptotically similar on $\mathcal{P}_{00}$.

The Gaussian process $\nu_{kl}$ in Theorem 1 depends on the unknown elements of the null hypothesis, precluding the use of first order asymptotic critical values in practice. Barrett and Donald (2003) suggested a bootstrap procedure and LMW a subsampling approach. Neither study paid attention to the issue of uniformity in the convergence of the tests. In order to deal with uniformity, we separate $\mathcal{P}_0$ into the boundary points and the interior points and develop the asymptotic theory of the test statistic separately for each. The proof of Theorem 1 establishes the weak convergence of $q(\cdot)\sqrt{N}\{\bar D_{kl}^{(s)}(\cdot, \hat\theta, \hat\tau) - D_{kl}^{(s)}(\cdot)\}$ to $\nu_{kl}(\cdot)$ uniformly over $P \in \mathcal{P}$. This result is obtained by showing that the class of functions indexing the empirical process is a uniform Donsker class (Giné and Zinn (1991)). The remaining step is to deal with the limiting behavior of the test statistic uniformly over the interior points.

One might consider a typical bootstrap procedure for this situation. The difficulty with bootstrap tests of stochastic dominance lies mainly in the fact that it is hard to impose the null hypothesis upon the test. Some existing approaches consider only the least favorable subset of the models of the null hypothesis, namely
$$F_1(x) = \cdots = F_K(x) \text{ for all } x \in \mathcal{X}. \tag{8}$$

[4] Hence, $P \in \mathcal{P}_{00}^{KS}$ implies $d_s^* = 0$ because $\mathcal{P}_{00}^{KS} \subset \mathcal{P}_0$. However, the converse is not true, in particular when $\mathcal{X}$ is unbounded.
This leads to the problem of asymptotic nonsimilarity, in the sense that when the true data generating process lies away from the least favorable subset of the models of the null hypothesis, and yet still lies at the boundary points, the bootstrap sizes become misleading even in large samples. LMW call this phenomenon asymptotic nonsimilarity on the boundary. Bootstrap procedures that employ the usual recentering implicitly impose restrictions that do not hold outside the least favorable set and hence are asymptotically biased against probabilities outside that set.

To illustrate heuristically why a test that uses critical values from the least favorable case of a composite null hypothesis can be asymptotically biased, let us consider a simple finite-dimensional example. Suppose that the observations $\{X_i = (X_{1i}, X_{2i}) : i = 1, \ldots, N\}$ are mutually independently and identically distributed with unknown mean $\mu = (\mu_1, \mu_2)$ and known variance $\Sigma = \mathrm{diag}(1, 1)$. Let the hypotheses of interest be given by:
$$H_0: \mu_1 \le 0 \text{ and } \mu_2 \le 0 \quad \text{vs.} \quad H_1: \mu_1 > 0 \text{ or } \mu_2 > 0.$$
These hypotheses are equivalent to:
$$H_0: d^* \le 0 \quad \text{vs.} \quad H_1: d^* > 0, \tag{9}$$
where $d^* = \max\{\mu_1, \mu_2\}$. The "boundary" of the null hypothesis is given by $B_{BD} = \{(\mu_1, \mu_2) : \mu_1 \le 0 \text{ and } \mu_2 \le 0\} \cap \{(\mu_1, \mu_2) : \mu_1 = 0 \text{ or } \mu_2 = 0\}$, while the "least favorable case (LFC)" is given by $B_{LF} = \{(\mu_1, \mu_2) : \mu_1 = 0 \text{ and } \mu_2 = 0\} \subset B_{BD}$. To test (9), one may consider the following t-statistic:
$$T_N = \max\left\{N^{1/2}\bar X_1,\ N^{1/2}\bar X_2\right\},$$
where $\bar X_k = \sum_{i=1}^N X_{ki}/N$. The asymptotic null distribution of $T_N$ is non-degenerate provided the true $\mu$ lies on the boundary $B_{BD}$, but the distribution depends on the location of $\mu$. That is, we have
$$T_N \to_d \begin{cases} \max\{Z_1, Z_2\} & \text{if } \mu = (0, 0) \\ Z_1 & \text{if } \mu_1 = 0,\ \mu_2 < 0 \\ Z_2 & \text{if } \mu_1 < 0,\ \mu_2 = 0, \end{cases}$$
where $Z_1$ and $Z_2$ are mutually independent standard normal random variables. On the other hand, $T_N$ diverges to $-\infty$ in the "interior" of the null hypothesis.
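The non-similarity on the boundary in this example can be seen in a short Monte Carlo sketch (our own illustration, not from the paper). With the LFC critical value $z_\alpha^*$ solving $P(\max\{Z_1, Z_2\} > z_\alpha^*) = 1 - \Phi(z_\alpha^*)^2 = \alpha$, the rejection rate of $T_N$ equals $\alpha$ at $\mu = (0,0)$ but falls well below $\alpha$ at another boundary point such as $\mu = (0, -1)$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
alpha, N, reps = 0.05, 100, 20_000

# LFC critical value: 1 - Phi(z)^2 = alpha  =>  z = Phi^{-1}(sqrt(1 - alpha))
z_star = NormalDist().inv_cdf((1 - alpha) ** 0.5)

def rejection_rate(mu1, mu2):
    """Monte Carlo rejection rate of T_N = max(sqrt(N) Xbar_1, sqrt(N) Xbar_2)
    against the LFC critical value z_star."""
    X = rng.normal([mu1, mu2], 1.0, size=(reps, N, 2))
    T = np.sqrt(N) * X.mean(axis=1).max(axis=1)
    return np.mean(T > z_star)

r_lfc = rejection_rate(0.0, 0.0)    # least favorable point: rate close to alpha
r_bd = rejection_rate(0.0, -1.0)    # another boundary point: rate well below alpha
```

Since $\sqrt{N}\bar X_k$ is exactly standard normal here, `r_lfc` is close to $\alpha = 0.05$ while `r_bd` is close to $1 - \Phi(z_\alpha^*) \approx 0.025$, illustrating the non-similarity on the boundary.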
Suppose that $z_\alpha^*$ satisfies $P(\max\{Z_1, Z_2\} > z_\alpha^*) = \alpha$ for $\alpha \in (0, 1)$. Then the test based on the least favorable case is asymptotically non-similar on the boundary because, for example, $\lim_{N \to \infty} P(T_N > z_\alpha^* \mid \mu = (0, 0)) \ne \lim_{N \to \infty} P(T_N > z_\alpha^* \mid \mu = (0, -1))$. Now consider the following sequence of local alternatives: for $\delta > 0$,
$$H_N: \mu_1 = \frac{\delta}{\sqrt{N}} \text{ and } \mu_2 < 0.$$
Then, under $H_N$, it is easy to see that $T_N \to_d N(\delta, 1)$. However, the test based on the LFC critical value may be biased against these local alternatives, because $\lim_{N \to \infty} P(T_N > z_\alpha^*) = P(N(\delta, 1) > z_\alpha^*) < \alpha$ for some values of $\delta$. To see this, Figure 1 draws the c.d.f.'s of $\max\{Z_1, Z_2\}$ and of $N(\delta, 1)$ for $\delta = 0.0$, $0.2$, and $1.5$. Clearly, the distribution of $\max\{Z_1, Z_2\}$ first-order stochastically dominates that of $N(0.2, 1)$, i.e., $T_N$ is asymptotically biased against $H_N$ for $\delta = 0.2$.

[Figure 1. The test $T_N$ is asymptotically biased against $H_N$ for $\delta = 0.2$.]

3 Bootstrap Procedure

In this section we discuss our method for obtaining asymptotically valid critical values. We propose the following wild bootstrap procedure. Let $X_{ki}(\theta, \tau) = Y_{ki} - g(Z_{ki}; \theta, \tau)$ for some function $g$. Among the necessary identification conditions for $\theta_0$ and $\tau_0$ is the condition that $E[Y_{ki} - g(Z_{ki}; \theta_0, \tau_0) \mid Z_{ki}] = 0$. Let $X_i(\theta, \tau)$ be a $K$-dimensional vector whose $k$-th entry is given by $X_{ki}(\theta, \tau)$, and let $(\hat X_i)_{i=1}^N = (X_i(\hat\theta, \hat\tau))_{i=1}^N$. Following the wild bootstrap procedure (Härdle and Mammen (1993)), we draw $\{(\varepsilon_{i,b}^*)_{i=1}^N\}_{b=1}^B$ such that $E^*[\varepsilon_{ki,b}^* \mid Z_{ki}] = 0$ and $E^*[\varepsilon_{ki,b}^{*2} \mid Z_{ki}] = \hat X_{ki}^2$, where $E^*[\cdot \mid Z_{ki}]$ denotes the expectation under the bootstrap conditional distribution given $Z_{ki}$ and $\varepsilon_{ki,b}^*$ is the $k$-th element of $\varepsilon_{i,b}^*$. Let $Y_{ki,b}^* = g(Z_{ki}; \hat\theta, \hat\tau) + \varepsilon_{ki,b}^*$, $k = 1, \ldots, K$, and take $W_{i,b}^* = (Y_{ki,b}^*, Z_{ki})_{k=1}^K$. Using this bootstrap sample, we construct estimators $\hat\theta_b^*$ and $\hat\tau_b^*$ for each $b = 1, \ldots, B$.
Then we define $X_{ki,b}^* = Y_{ki,b}^* - g(Z_{ki}; \hat\theta_b^*, \hat\tau_b^*)$. Now, we introduce the following bootstrap empirical process:
$$\bar D_{kl,b}^{(s)*}(x) = \frac{1}{N(s-1)!}\sum_{i=1}^N \left\{ h_x(X_{ki,b}^*) - h_x(X_{li,b}^*) - E_N\left[h_x(\hat X_{ki}) - h_x(\hat X_{li})\right] \right\}, \quad b = 1, 2, \ldots, B,$$
where $E_N$ denotes the expectation with respect to the empirical measure of $\{\hat X_i\}_{i=1}^N$. The quantity $\bar D_{kl,b}^{(s)*}(x)$ is the bootstrap counterpart of $\bar D_{kl}^{(s)}(x)$. Given a sequence $c_N \to 0$ with $c_N\sqrt{N} \to \infty$, we construct the estimated contact set:
$$\hat B_{kl}^{(s)} = \left\{ x \in \mathcal{X} : q(x)\big|\bar D_{kl}^{(s)}(x)\big| < c_N \right\}. \tag{10}$$
As for the weight function $q(x)$, we may consider the following type of function: for $z_1 < z_2$ and a constant $a > 0$, we set
$$q(x) = \begin{cases} 1 & \text{if } x \in [z_1, z_2] \\ a/(a + |x - z_2|^{(s-1)\vee 1}) & \text{if } x > z_2 \\ a/(a + |x - z_1|^{(s-1)\vee 1}) & \text{if } x < z_1. \end{cases}$$
This is a modification of the weight function considered by Horváth, Kokoszka, and Zitikis (2006). We propose the Kolmogorov-Smirnov type bootstrap test statistic as
$$D_{N,b}^{(s)*} = \begin{cases} \min_{k \ne l} \sup_{x \in \hat B_{kl}^{(s)}} \sqrt{N}\,q(x)\bar D_{kl,b}^{(s)*}(x) & \text{if } \hat B_{kl}^{(s)} \ne \emptyset \text{ for all } k \ne l \\ \pi_N & \text{if } \hat B_{kl}^{(s)} = \emptyset \text{ for some } k \ne l, \end{cases} \tag{11}$$
where $\pi_N$ is a positive sequence such that $\pi_N \to \infty$. From this, we obtain the bootstrap critical values $c_{\alpha,N,B}^{*KS} = \inf\{t : B^{-1}\sum_{b=1}^B 1\{D_{N,b}^{(s)*} \le t\} \ge 1 - \alpha\}$, yielding an $\alpha$-level bootstrap test: $\varphi_\alpha^{KS} \equiv 1\{D_N^{(s)} > c_{\alpha,N,B}^{*KS}\}$. The sequence $\pi_N$ governs the asymptotic behavior of the size of the test: when $\pi_N$ increases faster, the size of the test converges to zero faster at the interior points $P \in \mathcal{P}_0 \setminus \mathcal{P}_{00}^{KS}$. Likewise, we define the Cramér-von Mises type bootstrap test statistic as
$$C_{N,b}^{(s)*} = \begin{cases} N \min_{k \ne l} \int_{\hat B_{kl}^{(s)}} \max\left\{ q(x)\bar D_{kl,b}^{(s)*}(x),\ 0 \right\}^2 dx & \text{if } \hat B_{kl}^{(s)} \ne \emptyset \text{ for all } k \ne l \\ \pi_N & \text{if } \hat B_{kl}^{(s)} = \emptyset \text{ for some } k \ne l, \end{cases} \tag{12}$$
where $\pi_N$ is a positive sequence such that $\pi_N \to \infty$. The bootstrap critical values are obtained by $c_{\alpha,N,B}^{*CM} = \inf\{t : B^{-1}\sum_{b=1}^B 1\{C_{N,b}^{(s)*} \le t\} \ge 1 - \alpha\}$, yielding an $\alpha$-level bootstrap test: $\varphi_\alpha^{CM} \equiv 1\{C_N^{(s)} > c_{\alpha,N,B}^{*CM}\}$.
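The estimated contact set in (10) and the weight function above are straightforward to compute on a grid. The following minimal sketch (our own function names; parameters $z_1 = -3$, $z_2 = 3$, $a = 1$, $s = 1$ are illustrative choices) evaluates both for a known difference function $D_{kl}(x) = \min(x, 0)$, which vanishes exactly on $[0, 1]$.

```python
import numpy as np

def q_weight(x, z1=-3.0, z2=3.0, a=1.0, s=1):
    """Weight function from the text: equal to 1 on [z1, z2] and decaying
    like a / (a + |x - z|^((s-1) v 1)) outside that interval."""
    x = np.asarray(x, dtype=float)
    p = max(s - 1, 1)
    out = np.ones_like(x)
    out = np.where(x > z2, a / (a + (x - z2) ** p), out)
    out = np.where(x < z1, a / (a + (z1 - x) ** p), out)
    return out

def contact_set(grid, D_kl_vals, c_N, **q_kwargs):
    """Estimated contact set (10): grid points where q(x) |D_kl(x)| < c_N."""
    return grid[q_weight(grid, **q_kwargs) * np.abs(D_kl_vals) < c_N]

# Toy check: D_kl(x) = min(x, 0) is zero exactly on [0, 1]
grid = np.linspace(-1.0, 1.0, 201)
B_hat = contact_set(grid, np.minimum(grid, 0.0), c_N=0.1)
```

With the cut-off $c_N = 0.1$, the estimated set covers the true contact region $[0, 1]$ plus a small margin on which $|D_{kl}|$ is below $c_N$, consistent with the requirement $c_N \to 0$, $c_N\sqrt{N} \to \infty$.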
The main step needed to justify the bootstrap procedure uniformly in $P \in \mathcal{P}$ in this general semiparametric environment is to establish a bootstrap uniform central limit theorem for the empirical processes in the test statistic, where uniformity holds over the probability measures in $\mathcal{P}$. The uniform central limit theorem for empirical processes was developed by Giné and Zinn (1991) and Sheehy and Wellner (1992) through the characterization of uniform Donsker classes. The bootstrap central limit theorem for empirical processes was established by Giné and Zinn (1990), who considered a nonparametric bootstrap procedure; see also the lecture notes on the bootstrap by Giné (1997). We cannot directly use these results because the bootstrap distribution for each sample $\{W_{i,b}^*\}_{i=1}^N$ is non-identically distributed, both in the case of the wild bootstrap and in that of the residual-based bootstrap. In the Appendix, we show how the bootstrap CLT result of Giné (1997) can be made to accommodate this case by slightly modifying his result.[5]

This paper's bootstrap procedure does not depend on a specific estimation method for $\theta_0$ and $\tau_0$. Due to this generality of our framework, we introduce the following additional assumptions about the bootstrap sample $W_{i,b}^*$, the bootstrap estimators $(\hat\theta^*, \hat\tau^*)$, and the function $g(z; \theta, \tau)$ introduced at the beginning of this section. Let $\mathcal{G}_N$ be the $\sigma$-field generated by $\{W_i\}_{i=1}^N$.

Assumption 4 : (i) For arbitrary $\varepsilon > 0$,
$$P\left\{ \|\hat\theta^* - \hat\theta\| + \|\hat\tau^* - \hat\tau\|_{P,2} > N^{-1/4}\varepsilon \,\Big|\, \mathcal{G}_N \right\} \to_P 0, \text{ uniformly in } P \in \mathcal{P}.$$
(ii) For $\psi_{x,k,P}(\cdot)$ in Assumption 1(iv), $E[\psi_{x,k,P}(W_i) \mid Z_{ki}] = 0$, and for some $r_N \to 0$ with $\limsup_{N \to \infty} r_N/c_N < \infty$,
$$P\left\{ \sup_{x \in \mathcal{X}} \left| \sqrt{N}\,\hat\Gamma_{k,P}(x) - \frac{1}{\sqrt{N}}\sum_{i=1}^N \psi_{x,k,P}(W_{i,b}^*) \right| > r_N\varepsilon \,\Big|\, \mathcal{G}_N \right\} \to_P 0$$
uniformly in $P \in \mathcal{P}$ for any $\varepsilon > 0$, where $\hat\Gamma_{k,P}(x) = E_N[h_x^{(s)}(\varphi_k(W_i^*; \hat\theta^*, \hat\tau^*))] - E_N[h_x^{(s)}(\varphi_k(W_i^*; \hat\theta, \hat\tau))]$ and $E^*[\psi_{x,k,P}(W_{i,b}^*) \mid Z_{ki}] = 0$.
(iii) There exists $C$ such that for each $(\theta_1, \tau_1) \in B_{\Theta \times \mathcal{T}}(\delta)$ and for each $\varepsilon > 0$,
$$\sup_{z \in \mathcal{Z}_k}\, \sup_{(\theta_2, \tau_2) \in B_{\Theta \times \mathcal{T}}(\varepsilon)} |g(z; \theta_1, \tau_1) - g(z; \theta_2, \tau_2)| \le C\varepsilon^{2s_2} \tag{13}$$
for some $s_2 \in (d, 1]$, where $\mathcal{Z}_k$ is the support of $Z_{ki}$.
(iv) For each $k \ne l$, there exist $\delta > 0$ and constants $c_{kl} > 0$ and $p_{kl} \ge 1$ such that for each $x, x' \in \mathcal{X}$,
$$\left| q(x)\big|D_{kl}^{(s)}(x)\big| - q(x')\big|D_{kl}^{(s)}(x')\big| \right| \ge c_{kl}\left\{ E\left[ \left( h_x^{(s)}(X_{ki}) - h_x^{(s)}(X_{li}) - \left\{ h_{x'}^{(s)}(X_{ki}) - h_{x'}^{(s)}(X_{li}) \right\} \right)^2 \right] \wedge \delta \right\}^{p_{kl}/2}.$$

Condition (i) requires that the bootstrap estimators be consistent at a rate faster than $N^{-1/4}$, uniformly over $P \in \mathcal{P}$. The bootstrap validity of M-estimators is established by Arcones and Giné (1992), and infinite-dimensional Z-estimators are dealt with by Wellner and Zhan (1996); see also Abrevaya and Huang (2005) for a bootstrap inconsistency result in a case where the estimators are not asymptotically linear. Condition (iii) implies the condition in (5) by choosing $W_{1i} = Z_{ki}$ and can be weakened to locally uniform $L_2$-continuity (e.g., Chen, Linton, and van Keilegom (2003)) when we confine our attention to the case $s > 1$. Condition (iv) is similar to Condition C.2 in Chernozhukov, Hong, and Tamer (2007) and is used to control the convergence rate of the estimated set with respect to an appropriate norm. This condition is needed only for the convergence rate of the rejection probability. The wild bootstrap procedure is introduced to ensure Condition (ii), which assumes the bootstrap analogue of the asymptotic linearity of $\sqrt{N}\,\Gamma_{k,P}(x)[\hat\theta^* - \hat\theta, \hat\tau^* - \hat\tau]$ with the same influence function $\psi_{x,k,P}$.

[5] For weak convergence of the residual-based bootstrap process in a regression model, see Koul and Lahiri (1994). For a parametric bootstrap procedure for goodness-of-fit tests, see, among others, Andrews (1997). See also Härdle and Mammen (1993) and Whang (2000) for wild-bootstrap procedures that contain finite-dimensional estimators.
In the case of a parametric regression specification X_{ki}(θ_0) = Y_{ki} − g(Z_{ki}; θ_0) without the nonparametric function τ_0, the parameter θ_0 is typically identified by the condition E[X_{ki}(θ_0)] = 0, and in this case we may use the residual-based bootstrap procedure, replacing ε*_{ki,b} by draws resampled from {X̂_i − X̄_N}_{i=1}^N, where X̄_N = (1/N) Σ_{i=1}^N X_i(θ̂, τ̂). Then we need Condition (ii) with the centered influence function (1/√N) Σ_{i=1}^N {ψ_{x,k,P}(W*_{i,b}) − E_N ψ_{x,k,P}(Ŵ_i)}, where Ŵ_i = (Ŷ_i, Z_i) and Ŷ_{ki} = g(Z_{ki}; θ̂) + X_{ki}(θ_0). (See, e.g., Koul and Lahiri (1994).) One of the important conditions needed in this case is the uniform oscillation behavior of the bootstrap empirical process, as established in the proof of Theorem 2.2 of Giné (1997). We slightly modify his result so that empirical processes constructed from the residual-based bootstrap procedure are accommodated. This latter result of Giné is also crucial for establishing our bootstrap central limit theorem that is uniform in P.

Theorem 2: Suppose that the conditions of Theorem 1 and Assumption 4 hold and that c_N → 0 and c_N √N → ∞. Then the following holds.
(i) Uniformly over P ∈ P_0,

P{ D_N^{(s)} > c*^{KS}_{α,∞} } ≤ α + O_P(c_N √(−log c_N)), and
P{ C_N^{(s)} > c*^{CM}_{α,∞} } ≤ α + O_P(c_N √(−log c_N)).

(ii) Furthermore, suppose that for all P ∈ P^{KS}_{00}, the distribution of min_{k≠l} sup_{x∈B_kl^{(s)}} ν_kl^{(s)}(x) is continuous, and for all P ∈ P^{CM}_{00}, the distribution of min_{k≠l} ∫_{B_kl^{(s)}} max{ν_kl^{(s)}(x), 0}^2 dx is continuous. Then

P{ D_N^{(s)} > c*^{KS}_{α,∞} } = α + O_P(c_N √(−log c_N)) uniformly over P ∈ P^{KS}_{00}, and
P{ C_N^{(s)} > c*^{CM}_{α,∞} } = α + O_P(c_N √(−log c_N)) uniformly over P ∈ P^{CM}_{00}.

The first result says that the bootstrap tests have asymptotically correct sizes. The second result tells us that the bootstrap tests are asymptotically similar on the boundary. The two results combined establish that the bootstrap tests have exact asymptotic size equal to α.
The result is uniform over P ∈ P_0. The caveat of absolute continuity of the limiting distributions is necessary because in certain cases the distribution can be discrete or even degenerate at one point. For example, consider the case where K = 2, s = 1, and X = [0, 1] is the common support of X_1 and X_2, whose distribution functions meet only at 0 and 1 within X. Then the Gaussian process ν_kl^{(s)}(·) is a variant of a Brownian bridge process taking the value 0 at the end points 0 and 1, and hence min_{k≠l} sup_{x∈B_kl^{(s)}} ν_kl^{(s)}(x) is a discrete random variable degenerate at zero. In this case, the asymptotic size is zero, violating the equality in (ii).

The limit behavior of the bootstrap test statistic mimics the discontinuity in Theorem 1. Both the original and bootstrap test statistics diverge when the data generating process is away from the boundary points. However, the directions of divergence are opposite. When the bootstrap distribution is away from "the boundary points," i.e., if B̂_kl^{(s)} = ∅ for some k ≠ l, the bootstrap test statistic diverges to ∞, whereas outside the boundary points the original test statistic diverges to −∞. This opposite direction is imposed to keep the asymptotic size of the bootstrap test below α uniformly over all the probabilities P ∈ P_0\P^{KS}_{00}.

Our requirement that the sequence π_N increase to infinity is necessary for our results to be uniform in P. To see this, consider the following decomposition:

√N q(x) D̄_kl^{(s)}(x, θ̂, τ̂) = √N q(x) { D̄_kl^{(s)}(x, θ̂, τ̂) − D_kl^{(s)}(x) } + √N q(x) D_kl^{(s)}(x).   (14)

The first component on the right-hand side is uniformly asymptotically tight, and its tail probability goes to zero only when we increase the cut-off value to infinity. The behavior of the last term depends on the sequence of probabilities P_N ∈ P_0\P^{KS}_{00}. The worst case arises when P_N converges to a boundary point so fast that for some k ≠ l, sup_{x∈X} √N q(x) D_kl^{(s)}(x) also converges to zero.
Considering uniformity in P, we must take this case into account. Therefore, π_N should increase to infinity in order to control the tail probability of the supremum of the combined quantities in (14) appropriately. On the other hand, for merely pointwise results for each P ∈ P_0\P^{KS}_{00}, it suffices to set π_N to be any nonnegative number.

It is worth noting that the convergence of the rejection probability on the boundary points is slightly slower than c_N. Since we can take c_N to converge to zero faster than N^{-1/3}, the convergence of the rejection probability can be made faster than that of the subsampling-based test.6

Suppose that the null hypotheses in (1) and (2) are equivalent. Let us compare the asymptotic similarity properties of the two tests D_N^{(s)} and C_N^{(s)} that use bootstrap critical values as proposed above. In general P^{CM}_{00} ⊂ P^{KS}_{00}. Hence the Kolmogorov-Smirnov tests are asymptotically similar over a wider set of probabilities than the Cramér-von Mises type tests. Such a comparison is relevant when, for all k ≠ l, the contact sets B_kl^{(s)} are nonempty and, for some k ≠ l, B_kl^{(s)} is a finite set, as would happen, for example, when D_k^{(s)} and D_l^{(s)} touch on a finite set of points and D_k^{(s)}(x) ≤ D_l^{(s)}(x). In that case, the Kolmogorov-Smirnov test has an exact asymptotic size of α while the asymptotic size of the Cramér-von Mises test is zero.

The result of Theorem 2 allows a wide range of choices for the sequence π_N in (11) and (12) in the bootstrap test statistic. In practice, we may consider the two extreme cases π_N = log(N) and π_N = ∞ and check the robustness of the results. The latter case, π_N = ∞, is tantamount to the rule of setting the test ϕ = 0 (i.e., not rejecting the null) when the estimated contact set B̂ is empty.

4 Asymptotic Power Properties

In this section, we establish asymptotic power properties of the bootstrap test. First, we consider consistency of the test.
Theorem 3: Suppose that the conditions of Theorem 2 hold and that we are under a fixed alternative P ∈ P\P_0. Then

lim_{N→∞} P{ D_N^{(s)} > c*^{KS}_{α,∞} } = 1 and lim_{N→∞} P{ C_N^{(s)} > c*^{CM}_{α,∞} } = 1.

Therefore, the bootstrap test is consistent against all types of alternatives. This property is shared by other tests, for example those of LMW and Barrett and Donald (2003).

Let us turn to asymptotic local power properties. We consider a sequence of probabilities P_N ∈ P\P_0 and let D_{k,N}^{(s)}(x) denote D_k^{(s)}(x) under P_N; that is, D_{k,N}^{(s)}(x) = E_{P_N} h_x^{(s)}(ϕ_k(W; θ_0, τ_0)), using the notation of h_x^{(s)} and the specification of X_k in a previous section, where E_{P_N} denotes the expectation under P_N. We confine our attention to sequences {P_N} such that for each k ∈ {1, . . . , K}, there exist functions H_k^{(s)}(·) and δ_k^{KS(s)}(·) such that

D_{k,N}^{(s)}(x) = H_k^{(s)}(x) + δ_k^{KS(s)}(x)/√N   (15)

as N → ∞. In the case of Cramér-von Mises type tests, we consider the following local alternatives: for the functions H_k^{(s)}(·) in (15), we assume that

D_{k,N}^{(s)}(x) = H_k^{(s)}(x) + δ_k^{CM(s)}(x)/√N.   (16)

We assume the following for the functions H_k^{(s)}(·), δ_k^{KS(s)}(·) and δ_k^{CM(s)}(·).

Assumption 5: (i) C_kl = {x ∈ X : H_k^{(s)}(x) − H_l^{(s)}(x) = 0} is nonempty for all k ≠ l.
(ii) min_{k≠l} sup_{x∈X} (H_k^{(s)}(x) − H_l^{(s)}(x)) ≤ 0.
(iii) sup_{x∈C_kl} (δ_k^{KS(s)}(x) − δ_l^{KS(s)}(x)) > 0 and ∫_{C_kl} (max{δ_k^{CM(s)}(x) − δ_l^{CM(s)}(x), 0})^2 dx > 0, for all k ≠ l.

Assumption 5 comprises conditions for the sequences in (15) and (16) to constitute non-void local alternatives.

6 The slower convergence rate of the bootstrap test relative to the asymptotic approximation is due to the presence of the estimation error in the contact set B̂_kl^{(s)}.
Note that when H_k^{(s)}(x) = E_P h_x^{(s)}(ϕ_k(W; θ_0, τ_0)) for some P ∈ P, these conditions for H_k^{(s)} imply that P ∈ P_0; that is, the probability associated with H_k^{(s)} belongs to the null hypothesis, and in particular to the boundary points.7 The conditions for δ_k^{KS(s)}(x) and δ_k^{CM(s)}(x) ensure that for each N, the probability P_N belongs to the alternative hypothesis P\P_0. Therefore, the sequence P_N represents local alternatives that converge to the null hypothesis (in particular, to the boundary points P^{KS}_{00}), maintaining the convergence of D_{k,N}^{(s)}(x) to H_k^{(s)}(x) at the rate √N in the direction of δ_k^{KS(s)}(x). The following theorem shows that the tests are asymptotically unbiased against such sequences of local alternatives.

Theorem 4: Suppose that the conditions of Theorem 2 and Assumption 5 hold.
(i) Under the local alternatives P_N ∈ P\P_0 satisfying the condition in (15),

lim_{N→∞} P_N{ D_N^{(s)} > c*^{KS}_{α,∞} } ≥ α.

(ii) Under the local alternatives P_N ∈ P\P_0 satisfying the condition in (16),

lim_{N→∞} P_N{ C_N^{(s)} > c*^{CM}_{α,∞} } ≥ α.

As in LMW, the asymptotic unbiasedness of the test follows from Anderson's Lemma (Anderson (1952)).8 It is interesting to compare the asymptotic power with that of a bootstrap procedure based on the least favorable set of the null hypothesis. Using the same bootstrap sample {X*_{ki,b} : k = 1, . . . , K}_{i=1}^N, b = 1, . . . , B, this bootstrap procedure alternatively considers the following bootstrap test statistics:

D_{N,b}^{(s)*LF} = min_{k≠l} sup_{x∈X} √N q(x) D̄_{kl,b}^{(s)*}(x) and
C_{N,b}^{(s)*LF} = N min_{k≠l} ∫_X max{ q(x) D̄_{kl,b}^{(s)*}(x), 0 }^2 dx.   (17)

7 Since the distributions are continuous, the contact sets are nonempty under the alternatives. Therefore, any local alternatives that converge to the null hypothesis have a limit in the set of boundary points.
Let the bootstrap critical values as B → ∞ be denoted by c*^{KS-LF}_{α,∞} and c*^{CM-LF}_{α,∞}, respectively. The results of this paper readily imply that this bootstrap procedure is inferior to the procedure this paper proposes. For brevity, we state the result for D_{N,b}^{(s)*LF}; a similar result holds for C_{N,b}^{(s)*LF}.

Corollary 5: Suppose that the conditions of Theorem 4 hold. Under the local alternatives P_N ∈ P\P_0 satisfying the condition in (15),

lim_{N→∞} P_N{ D_N^{(s)} > c*^{KS}_{α,∞} } ≥ lim_{N→∞} P_N{ D_N^{(s)} > c*^{KS-LF}_{α,∞} }.

Furthermore, assume that for all k ≠ l, the union of the closures of the contact sets C_kl is a proper subset of the interior of X and the Gaussian processes ν_kl^{(s)}(x) in Theorem 1 satisfy E[{sup_{x∈C_kl} ν_kl^{(s)}(x)}^2] > 0. Then the inequality above is strict.

The result of Corollary 5 is remarkable in that the bootstrap test of this paper weakly dominates the bootstrap in (17) regardless of the Pitman local alternative directions. Furthermore, when the union of the closures of the contact sets is a proper subset of the interior of X, our test strictly dominates the bootstrap in (17) uniformly over the Pitman local alternatives. However, this latter condition on the contact sets can be strong when the number K of prospects is large; in this case, a full characterization of the conditions for strict dominance appears to be complicated. The result of Corollary 5 is based on the nonsimilarity on the boundary of the bootstrap tests in (17). In fact, Corollary 5 implies that the test based on the bootstrap procedure using (17) is inadmissible. This result is related to Hansen (2003)'s finding, in an environment of finite-dimensional composite hypothesis testing, that a test that is not asymptotically similar on the boundary is asymptotically inadmissible.

8 Anderson's Lemma does not apply to one-sided Kolmogorov-Smirnov type tests.
Song (2008) demonstrates that there exists a class of √n-converging Pitman local alternatives against which one-sided Kolmogorov-Smirnov type tests are asymptotically biased. See also the discussion at the end of Section 2.

5 Monte Carlo Experiments

5.1 Simulation Designs

In this section, we examine the finite-sample performance of our tests using Monte Carlo simulations. In particular, we compare our procedure with the subsampling and recentered bootstrap methods suggested by LMW. For a fair comparison of the simulation results, the simulation designs we consider are exactly the same as those considered by LMW: the Burr distributions also examined by Tse and Zhang (2004), the lognormal distributions also considered by Barrett and Donald (2003), and the exchangeable normal processes of Klecan et al. (1991).

We first describe the details of the simulation designs. We consider five different designs from the Burr Type XII distribution B(α, β), a two-parameter family with cdf defined by

F(x) = 1 − (1 + x^α)^{−β}, x ≥ 0.

The parameters and the population values of d*_1, d*_2 of the Burr designs are given below.

Design   X1             X2             d*_1           d*_2
1a       B(4.7, 0.55)   B(4.7, 0.55)   0.0000 (FSD)   0.0000 (SSD)
1b       B(2.0, 0.65)   B(2.0, 0.65)   0.0000 (FSD)   0.0000 (SSD)
1c       B(4.7, 0.55)   B(2.0, 0.65)   0.1395         0.0784
1d       B(4.6, 0.55)   B(2.0, 0.65)   0.1368         0.0773
1e       B(4.5, 0.55)   B(2.0, 0.65)   0.1340         0.0761

Next, we consider four different designs from the lognormal distribution LN(μ_k, σ_k^2); that is, we consider the distributions of X_k = exp(μ_k + σ_k Z_k), where the Z_k are i.i.d. N(0, 1). The parameters and the population values of d*_1, d*_2 of the lognormal designs are given below.
Design   X1               X2               d*_1           d*_2
2a       LN(0.85, 0.6^2)  LN(0.85, 0.6^2)  0.0000 (FSD)   0.0000 (SSD)
2b       LN(0.85, 0.6^2)  LN(0.7, 0.5^2)   0.0000 (FSD)   0.0000 (SSD)
2c       LN(0.85, 0.6^2)  LN(1.2, 0.2^2)   0.0834         0.0000 (SSD)
2d       LN(0.85, 0.6^2)  LN(0.2, 0.1^2)   0.0609         0.0122

To define the multivariate normal designs, we let

X_ki = (1 − λ)[α_k + β_k(√ρ Z_{0i} + √(1 − ρ) Z_{ki})] + λ X_{k,i−1},

where (Z_{0i}, Z_{1i}, Z_{2i}) are i.i.d. standard normal random variables, mutually independent. The parameters λ = ρ = 0.1 determine the mutual correlation of X_{1i} and X_{2i} and their autocorrelation. The parameters α_k, β_k are the mean and standard deviation of the marginal distributions of X_{1i} and X_{2i}. The marginals and the true values of the multivariate normal designs are:

Design   X1         X2          d*_1           d*_2
3a       N(0, 1)    N(−1, 16)   0.1981         0.0000 (SSD)
3b       N(0, 16)   N(1, 16)    0.0000 (FSD)   0.0000 (SSD)
3c       N(0, 1)    N(1, 16)    0.1981         0.5967

In computing the suprema in D_N^{(s)}, we took the maximum over an equally spaced grid of size n on the range of the pooled empirical distribution. To compute the integral for C_N^{(2)}, we took the sum over the same grid points. We chose a total of 5 different subsample sizes for each sample size N ∈ {50, 500, 1000}. We took an equally spaced grid of subsample sizes: for N = 50, the subsample sizes are {20, 25, . . . , 45}; for N = 500, they are {50, 100, . . . , 250}; and for N = 1000, they are {100, 200, . . . , 500}.9 This grid of subsample sizes was then used to compute the mean of the critical values over the grid, which had the best overall performance among the automatic methods considered by LMW. We used a total of 200 bootstrap repetitions in each case. In computing the suprema and integrals for each subsample and bootstrap test statistic, we took the same grid of points as was used in the original test statistic.

9 LMW considered a finer grid, i.e., 20 different subsample sizes, but their simulation results are very close to ours.
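As a concrete illustration of the designs and the grid-based computation just described, the following is a minimal sketch in Python (our own illustration, not the authors' code): it simulates the exchangeable normal process and evaluates a first-order (s = 1) Kolmogorov-Smirnov type statistic over the pooled sample points with q(x) = 1. The function names are ours, and the subsampling/bootstrap critical-value steps are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_design3(n, alpha, beta, lam=0.1, rho=0.1, burn=100):
    """Exchangeable normal process of Klecan et al. (1991):
    X_ki = (1-lam)*(alpha_k + beta_k*(sqrt(rho)*Z0 + sqrt(1-rho)*Zk)) + lam*X_{k,i-1}."""
    K = len(alpha)
    x = np.zeros(K)
    out = np.empty((n, K))
    for i in range(n + burn):
        z0 = rng.standard_normal()
        z = rng.standard_normal(K)
        innov = (1 - lam) * (np.asarray(alpha) + np.asarray(beta) *
                             (np.sqrt(rho) * z0 + np.sqrt(1 - rho) * z))
        x = innov + lam * x          # AR(1)-type autocorrelation
        if i >= burn:
            out[i - burn] = x
    return out

def ks_stat_first_order(X):
    """D_N^(1) = min_{k != l} sup_x sqrt(N)*(Fbar_k(x) - Fbar_l(x)),
    supremum over the pooled sample points, weight q(x) = 1."""
    n, K = X.shape
    grid = np.sort(X.ravel())
    # ECDF of each prospect evaluated on the pooled grid
    ecdf = [np.searchsorted(np.sort(X[:, k]), grid, side="right") / n
            for k in range(K)]
    stats = [np.sqrt(n) * np.max(ecdf[k] - ecdf[l])
             for k in range(K) for l in range(K) if k != l]
    return min(stats)

X = simulate_design3(500, alpha=[0.0, 1.0], beta=[1.0, 4.0])  # design 3c
print(ks_stat_first_order(X))
```

Since design 3c is under the alternative (d*_1 = 0.1981), the statistic is typically large and positive in repeated samples.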
To estimate the contact set

B̂_kl^{(s)} = { x ∈ X : |D̄_kl^{(s)}(x)| < c_N },

we took the tuning parameter to be c_N = cN^{−1/3}, where c ∈ {0.25, 0.50, 0.75} for s = 1 and c ∈ {2.0, 3.0, 4.0} for s = 2. To compute the test statistics and the contact set, we took the weight function q(x) = 1 for all x ∈ X, i.e., no weighting. In each experiment, the number of replications was 1,000.

5.2 Simulation Results

Tables 1F-3S present the rejection probabilities for the tests with nominal size 5%. The simulation standard error is approximately 0.007. The overall impression is that all the methods considered work reasonably well in samples of 500 and above. The full-sample methods work better than the subsampling method under the null hypothesis in small samples. Our procedure is always more powerful than the recentered bootstrap method. In the cases (1c, 1d, 1e, and 2d below) where the subsampling method is better than the recentered bootstrap, our procedure is strictly more powerful than the subsampling method.

The performance of our procedure depends on the choice of the tuning parameter c_N. For larger values of c_N, the rejection probabilities get closer to those of the recentered bootstrap, and the rejection probabilities increase as c_N decreases. This is consistent with our theory, because the rejection probability is a monotonically non-increasing function of c_N. Over a reasonable range of the tuning parameter, our procedure strictly dominates the subsampling and recentered bootstrap methods in terms of power, without sacrificing size, in all of the designs we considered.

The first two designs in Tables 1F and 1S are intended to evaluate size performance, especially in the least favorable case of equality of the two distributions, while the other three are intended to evaluate power performance.
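The contact-set estimation step used in the simulations can be sketched as follows; this is a minimal illustration for s = 1 with q(x) = 1, and the function and variable names are ours, not the paper's.

```python
import numpy as np

def contact_set(grid, dbar_kl, n, c=0.5):
    """Estimated contact set Bhat_kl = {x : |Dbar_kl(x)| < c_N},
    with tuning parameter c_N = c * N^(-1/3) and weight q(x) = 1.
    `dbar_kl` holds the empirical differences Dbar_kl on `grid`."""
    c_n = c * n ** (-1.0 / 3.0)
    mask = np.abs(dbar_kl) < c_n
    return grid[mask], mask

# Toy illustration: two normal samples whose ECDFs coincide in the far tails.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(0, 1, n), rng.normal(0, 1.5, n)
grid = np.sort(np.concatenate([x1, x2]))
f1 = np.searchsorted(np.sort(x1), grid, side="right") / n
f2 = np.searchsorted(np.sort(x2), grid, side="right") / n
contact, mask = contact_set(grid, f1 - f2, n)
# If the estimated contact set were empty, the bootstrap statistic would be
# replaced by the fixed cut-off pi_N (e.g. pi_N = log N), as in the text.
print(len(contact), mask.any())
```

The bootstrap statistic then takes suprema (or integrals) over the estimated contact set rather than over all of X, which is what delivers the asymptotic similarity on the boundary.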
The subsampling method tends to over-reject under the null when N = 50, but the size distortions are negligible when N ≥ 500. Our procedure has good size performance over all the tuning parameters we considered in Table 1F, but case 1b in Table 1S shows that it tends to over-reject when the tuning parameter is chosen too small. The last three designs (cases 1c, 1d, and 1e in Tables 1F and 1S) are quite conclusive. For moderate and large samples (N ≥ 500), the subsampling method is more powerful than the recentered bootstrap method. However, in these cases, our procedure strictly dominates the subsampling method uniformly over all values of c_N we considered, even in small samples (N = 50). These remarkable results forcefully demonstrate the power of our procedure.

Tables 2F and 2S give the rejection probabilities under the lognormal designs. The results under the least favorable case 2a are similar to those of 1a and 1b. The design 2b (in Table 2F) is quite instructive, because it corresponds to the boundary case in which the two distributions "kiss" at points in the interior of the support. The recentered bootstrap tends to under-reject the null hypothesis and performs very differently from the least favorable case, while our procedure tends to have correct size for a suitable value of c_N. The design 2c in Table 2F shows that, in the small sample N = 50, the full-sample methods are more powerful than the subsampling method. In design 2d (Tables 2F and 2S), the subsampling method is more powerful than the recentered bootstrap method for all sample sizes, but again our procedure dominates the subsampling method.

Tables 3F and 3S consider the multivariate normal designs. The designs 3a (Table 2F) and 3b (Tables 3F and 3S) are intended to evaluate the size characteristics of the tests. The latter designs correspond to the "interior" of the null hypothesis, and the rejection probabilities tend to zero as the sample size increases. This is consistent with our theory.
The other designs are intended to evaluate power performance. They show that the subsampling method is less powerful than the recentered bootstrap, and the latter is in turn dominated by our procedure. Between the Kolmogorov-Smirnov and Cramér-von Mises type tests, the power performance depends on the alternatives: in designs 1c, 1d, 1e, and 2d, the Kolmogorov-Smirnov type tests are more powerful, while in designs 2c, 3a, and 3c, the Cramér-von Mises type tests are more powerful.

6 Conclusion

This paper proposes a new method for testing stochastic dominance that improves on existing methods. Specifically, our tests have asymptotic sizes that are exactly correct uniformly over the entire null hypothesis. In addition, we have extended the domain of applicability of our tests to a more general class of situations where the outcome variable is the residual from some semiparametric model. Our simulation study demonstrates that our method works better than existing methods for quite modest sample sizes.

Our setting throughout has been i.i.d. data, but many time series applications call for the treatment of dependent data. In that case, one may have to use a block bootstrap algorithm in place of our wild bootstrap method. We expect, however, that similar results will obtain in that case.

7 Appendix: Mathematical Proofs

Throughout the proofs, C denotes a constant that can assume different values in different places.

7.1 Proofs of the Main Results

Lemma A1: Let V(x) = x q(x) and Ṽ(x) = |D_kl(x)| q(x), x ∈ X, and introduce the pseudo-metrics d_V(x, x′) = |V(x) − V(x′)| and d_Ṽ(x, x′) = |Ṽ(x) − Ṽ(x′)|. Then

log N_[](ε, X, d_V) ≤ C{ log h^{−1}(C/ε) − log(ε) } and
log N_[](ε, X, d_Ṽ) ≤ C{ log h^{−1}(C/ε) − log(ε) },

where h(·) is an arbitrary strictly increasing function.

Proof of Lemma A1: Fix ε > 0. Note that |V| is uniformly bounded by Assumption 3(ii), and hence for some C > 0, V(x) < C/h(x) for all x > 1, where h(x) is an increasing function.
Put x = h^{−1}(2C/ε), where h^{−1}(y) = inf{x ∈ X : h(x) ≥ y}, so that we obtain V(h^{−1}(2C/ε)) < ε/2. Then we can choose L(ε) ≤ Ch^{−1}(2C/ε) such that L(ε) ≥ 1 and sup_{|x|≥L(ε)} V(x) < ε/2. Partition [−L(ε), L(ε)] into intervals {X_m}_{m=1}^{N_L} of length ε, with the number of intervals N_{L(ε)} not greater than 2L(ε)/ε. Let X_0 = X \ [−L(ε), L(ε)]. Then (∪_{m≥1} X_m) ∪ X_0 constitutes a partition of X. In terms of d_V, the diameter of the set X_0 is bounded by ε, because sup_{x,x′∈X_0} |V(x) − V(x′)| ≤ ε/2 + ε/2 = ε, and the diameter of X_m, m ≥ 1, is bounded by Cε, because d_V(x, x′) = |V(x) − V(x′)| ≤ C|x − x′| ≤ Cε for all x, x′ ∈ X_m. The second inequality uses the fact that q(x) is Lipschitz continuous. Therefore, we have

N(ε, X, d_V) ≤ 2L(ε)/ε + 1 ≤ Ch^{−1}(2C/ε)/ε + 1.

Hence log N(ε, X, d_V) ≤ C log h^{−1}(C/ε) − log(ε). We can obtain the same result for d_Ṽ in precisely the same manner: the properties of V(x) that we used are that it is uniformly bounded and that |V(x) − V(x′)| ≤ C|x − x′|, and by Assumption 2(i), Ṽ(x) also satisfies these properties.

Lemma A2: Let F = { h_x^{(s)}(ϕ(·; θ, τ)) : (x, θ, τ) ∈ X × B_{Θ×T}(δ) }. Then

sup_{P∈P} log N_[](ε, F, ||·||_{P,2}) ≤ Cε^{−2d/s_2} − C log ε, if s = 1, and
sup_{P∈P} log N_[](ε, F, ||·||_{P,2}) ≤ Cε^{−d/s_2} + C/ε, if s > 1.

Proof of Lemma A2: Let H = {h_x^{(s)} : x ∈ R} and γ_x(y) = 1{y ≤ x}. Observe that the function (x − ϕ)^{s−1} γ_x(ϕ) is monotone decreasing in ϕ for all x, z. Define Φ = {ϕ(·; θ, τ) : (θ, τ) ∈ Θ × T}. Then, by the local uniform L_p-continuity condition in Assumption 2(i)(c), we have

log N_[](ε, Φ, ||·||_{P,2}) ≤ log N_[](Cε^{1/s_2}, Θ × T, ||·||_{P,2}) ≤ Cε^{−d/s_2}.   (18)

Choose ε-brackets (ϕ_j, ∆_{1,j})_{j=1}^{N_1} of Φ such that ∫ ∆_{1,j}^2 dP ≤ ε^2. For each j, let Q_{j,P} be the distribution of ϕ_j(W), where W is distributed as P. Suppose s = 1. In Lemma A1, we take h(x) = (log x)^{s_2/d} and obtain log N(ε, X, d_V) ≤ Cε^{−d/s_2} − C log(ε). Fix ε > 0 and choose the partition X = ∪_{m=0}^{N_L} X_m as above.
Let x_m be the center of X_m, m ≥ 1, and let x_0 be any point in X_0. Then write

sup_{ϕ∈Φ_j} sup_{x∈X_m} |γ_x(ϕ(w)) q(x) − γ_{x_m}(ϕ_j(w)) q(x_m)|
≤ sup_{ϕ∈Φ_j} sup_{x∈X_m} |γ_x(ϕ(w)) q(x) − γ_{x_m}(ϕ_j(w)) q(x)|
  + sup_{ϕ∈Φ_j} sup_{x∈X_m} |γ_{x_m}(ϕ_j(w)) q(x) − γ_{x_m}(ϕ_j(w)) q(x_m)|
= ∆*_{m,j}(w), say.

Then,

E[ sup_{ϕ∈Φ_j} sup_{x∈X_m} |γ_x(ϕ(W)) − γ_{x_m}(ϕ_j(W))|^2 q(x)^2 | W_1 ]
≤ E[ sup_{x∈X_m} 1{x − ∆_{1,j}(W_1) − |x − x_m| ≤ ϕ_j(W) ≤ x + ∆_{1,j}(W_1) + |x − x_m|} q(x)^2 | W_1 ]
≤ sup_{x∈X_0} q(x)^2 1{m = 0} + C{ε + ∆_{1,j}(W_1)} 1{m ≥ 1}
≤ Cε^2 1{m = 0} + C{ε + ∆_{1,j}(W_1)} 1{m ≥ 1}
≤ Cε + C∆_{1,j}(W_1).   (19)

The second inequality is obtained by splitting the supremum into the case m = 0 and the case m ≥ 1. The third and fourth inequalities follow because sup_{x∈X_0} |q(x)|^2 ≤ sup_{|x|>L(ε)} |x|^2 |q(x)|^2 ≤ ε^2/4. The last inequality follows by subsuming Cε^2 into Cε. Similarly, we can show that

E[ sup_{ϕ∈Φ_j} sup_{x∈X_m} |γ_{x_m}(ϕ_j(W)) q(x) − γ_{x_m}(ϕ_j(W)) q(x_m)|^2 | W_1 ] ≤ C|q(x) − q(x_m)|^2 ≤ Cε^2.

We conclude that ||∆*_{m,j}||_{P,2} ≤ Cε^{1/2}. Hence

sup_{P∈P} log N_[](Cε^{1/2}, F, L_p(P)) ≤ log N_[](ε, Φ, L_p(P)) + log N_[](ε, X, d_V) ≤ Cε^{−d/s_2} − C log ε.

Suppose s > 1. Then (x − ϕ)^{s−1} γ_x(ϕ) q(x) is infinitely many times continuously differentiable in ϕ with bounded derivatives. Choose ε-brackets (ϕ_j, ∆_{1,j})_{j=1}^{N_1} of Φ as before. Then for each j, we also choose ε-brackets (h_{k,j}, ∆_{2,k,j})_{k=1}^{N_2(j)} such that for all h ∈ H, |h − h_{k,j}| ≤ ∆_{2,k,j} and ∫ ∆_{2,k,j}^p dQ_{j,P} ≤ ε^p. Since H is monotone increasing and uniformly bounded, we can take N_2(j) ≤ N_[](ε, H, L_p(Q_{j,P})) ≤ C/ε by Birman and Solomjak (1967). Now observe that

|h(ϕ(w)) − h_{k,j}(ϕ_j(w))| ≤ |h(ϕ(w)) − h(ϕ_j(w))| + |h(ϕ_j(w)) − h_{k,j}(ϕ_j(w))|
≤ |h(ϕ_j(w) − ∆_{1,j}(w)) − h(ϕ_j(w) + ∆_{1,j}(w))| + ∆_{2,k,j}(ϕ_j(w))
≤ C∆_{1,j}(w) + ∆_{2,k,j}(ϕ_j(w)).

The last inequality is due to the fact that h is Lipschitz with a uniformly bounded coefficient. Take ∆*_{k,j}(w) = C∆_{1,j}(w) + ∆_{2,k,j}(ϕ_j(w)) and note that ∫ (∆*_{k,j})^2 dP ≤ Cε.
Therefore,

log N_[](ε, F, ||·||_{P,2}) ≤ log N_[](Cε, Φ, ||·||_{P,2}) + C/ε.

Observe that the constants above do not depend on the choice of the measure P. By (18), we obtain the desired results.

Proof of Theorem 1: Consider the following empirical process:

v_{kN}^{(s)}(f) = (1/(√N (s−1)!)) Σ_{i=1}^N { f(W_i) − E[f(W_i)] },  f ∈ F,   (20)

where F is as defined in Lemma A2 above. Recall that F is uniformly bounded. Therefore, by Lemma A2 and d/s_2 < 1, F is asymptotically equicontinuous uniformly in P ∈ P and totally bounded for all P ∈ P. By Proposition 3.1 and Theorem 2.3 of Giné and Zinn (1991) (see also Theorem 2.8.4 of van der Vaart and Wellner (1996)), the weak convergence of v_{kN}^{(s)}(·) in l^∞(F), uniform over P ∈ P, immediately follows. Here l^∞(F) denotes the space of uniformly bounded real functions on F.

Let F_N = {f(·; θ, τ) : (θ, τ) ∈ Θ_N × T_N}. Let us consider the covariance kernel of the process v_{kN}^{(s)}(f) for f ∈ F_N. Choose f_{1,N}(w) = f(w; θ_{1,N}, τ_{1,N}) and f_{2,N}(w) = f(w; θ_{2,N}, τ_{2,N}), and let f_{1,0}(w) = f(w; θ_0, τ_0) and f_{2,0}(w) = f(w; θ_0, τ_0). Then,

|P f_{1,N} f_{2,N} − P f_{1,N} P f_{2,N} − {P f_{1,0} f_{2,0} − P f_{1,0} P f_{2,0}}|
≤ C|P(f_{1,N} − f_{1,0})| + C|P(f_{2,N} − f_{2,0})|
≤ C{||θ_{1,N} − θ_0|| + ||τ_{1,N} − τ_{1,0}||_{P,2}} + C{||θ_{2,N} − θ_0|| + ||τ_{2,N} − τ_{2,0}||_{P,2}} → 0.

The convergence is uniform over all P ∈ P. Hence, the uniform weak convergence of v_{kN}^{(s)}(·) in l^∞(F_N) to a Gaussian process follows by Theorem 2.11.23 of van der Vaart and Wellner (1996). The covariance kernel of the Gaussian process is given by P f_{1,0} f_{2,0} − P f_{1,0} P f_{2,0}.

Suppose that we are under the null hypothesis. Note that

√N D̄_k^{(s)}(x, θ̂, τ̂) − (√N/(s−1)!) E h_x^{(s)}(ϕ_k(W; θ_0, τ_0))
= (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(ϕ_k(W_i; θ̂, τ̂)) − E[h_x^{(s)}(ϕ_k(W_i; θ̂, τ̂))] }
  + (√N/(s−1)!) { E[h_x^{(s)}(ϕ_k(W_i; θ̂, τ̂))] − E[h_x^{(s)}(ϕ_k(W_i; θ_0, τ_0))] }
= (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(ϕ_k(W_i; θ_0, τ_0)) − E[h_x^{(s)}(ϕ_k(W; θ_0, τ_0))] }
  + (√N/(s−1)!) Γ_{k,P}(x)[θ̂ − θ_0, τ̂ − τ_0] + o_P(1)

uniformly over P ∈ P. Hence we write

√N D̄_kl^{(s)}(x, θ̂, τ̂) − (√N/(s−1)!) E h^∆_{x,kl}(W)
= (1/(√N (s−1)!)) Σ_{i=1}^N { h^∆_{x,kl}(W_i) + ψ^∆_{x,kl}(W_i) − E[h^∆_{x,kl}(W) + ψ^∆_{x,kl}(W)] } + o_P(1).

Under P ∈ P_0, let δ_kl(x) = E[h^∆_{x,kl}(W)]. Therefore,

√N D̄_kl^{(s)}(x, θ̂, τ̂) − (√N/(s−1)!) δ_kl(x)
= (1/(√N (s−1)!)) Σ_{i=1}^N { h^∆_{x,kl}(W_i) + ψ^∆_{x,kl}(W_i) − E[h^∆_{x,kl}(W) + ψ^∆_{x,kl}(W)] } + o_P(1).

Let Ψ_kl = {ψ^∆_{x,kl} : x ∈ X} and H_kl = {h^∆_{x,kl} : x ∈ X}. Consider the class of functions F_kl = {h + ψ : (h, ψ) ∈ H_kl × Ψ_kl}. By Lemma A2 above and Theorem 6 of Andrews (1994), we have sup_{P∈P} log N_[](ε, H_kl, ||·||_{P,2}) < Cε^{−((2d/s_2)∨1)}, and by Assumption 1(iv), sup_{P∈P} log N_[](ε, Ψ_kl, ||·||_{P,2}) < Cε^{−d}. Note that H_kl is uniformly bounded and Ψ_kl has an envelope 2ψ̄ such that ||2ψ̄||_{P,2+δ} < C. Therefore, F_kl is a P-Donsker class. The computation of the covariance kernel of the limiting Gaussian process is straightforward. Therefore,

√N D̄_kl^{(s)}(·, θ̂, τ̂) − (√N/(s−1)!) δ_kl(·) ⟹ ν_kl^{(s)},

uniformly in P ∈ P. This yields the limit results both for the Kolmogorov-Smirnov type test and for the Cramér-von Mises type test.

Let F be as defined in Lemma A2 above. Define the bootstrap process

ν*_{k,b}^{(s)}(x) = (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(ϕ_k(W*_{i,b}; θ̂*_b, τ̂*_b)) − E_N[h_x^{(s)}(ϕ_k(W_i*; θ̂, τ̂))] },  x ∈ X,

where W_i* = (Y_i*, Z_i). The following lemma is a key step for the bootstrap consistency of the test (Theorem 2). In the following, we use the stochastic convergence notation o_{P*} and O_{P*} with respect to the conditional distribution given G_N.

Lemma A3: Under the assumptions of Theorem 2,

ν*_{k,b}^{(s)}(x) = (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(ε*_{ki,b}(θ_0, τ_0)) − E_N[h_x^{(s)}(ε*_{ki,b}(θ_0, τ_0))] }
  + (1/(√N (s−1)!)) Σ_{i=1}^N ψ_{x,P}(ε*_{ib}(θ_0, τ_0)) + o_{P*}(1)

in P, uniformly in P ∈ P. Furthermore, let F_N be the class defined in the proof of Theorem 1.
Then ν*_{kl,b}^{(s)} → ν_kl^{(s)} weakly in l^∞(F) in P, uniformly in P ∈ P.10

10 For the definition of bootstrap uniform weak convergence, see Section 7.2 below.

Proof of Lemma A3: Observe that

X*_{ki,b}(θ, τ, θ̃, τ̃) = g(Z_{ki}, θ̃, τ̃) − g(Z_{ki}, θ, τ) + ε*_{ki,b}(θ̃, τ̃).

Then we are interested in the weak convergence of the following bootstrap empirical process:

(1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂*_b, τ̂*_b)) − E_N[h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂, τ̂))] }   (21)
= (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂*_b, τ̂*_b)) − h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂, τ̂)) }
  + (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂, τ̂)) − E_N[h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂, τ̂))] }.

We consider the bootstrap empirical process

ν*_1(x, θ, τ) = (1/(√N (s−1)!)) Σ_{i=1}^N { h_x^{(s)}(ε*_{ki,b}(θ, τ)) − E_N[h_x^{(s)}(ε*_{ki,b}(θ, τ))] }.

We can view this process as indexed by h_x^{(s)} ∘ ϕ(·; θ, τ) ∈ F, and as an l^∞(F)-valued random (potentially nonmeasurable) element, where F is the class defined in Lemma A2. By Proposition 3.1 of Giné and Zinn (1991), Lemma A2 implies that F is finitely uniformly pregaussian. Theorem 2.3 of Giné and Zinn (1991), combined with Proposition B1 below, gives the central limit theorem for the bootstrap empirical process ν*_1, and so the law of the process converges weakly to the law of a centered Gaussian process ν_1 on X conditionally on G_N, in P uniformly in P ∈ P. The Gaussian process ν_1 has covariance kernel

Cov_P(h^∆_{x_1,kl}(W; θ_1, τ_1), h^∆_{x_2,kl}(W; θ_2, τ_2)),

where h^∆_{x,kl}(W; θ, τ) = h_x^{(s)}(ϕ_k(W; θ, τ)) − h_x^{(s)}(ϕ_l(W; θ, τ)). The asymptotic equicontinuity associated with this bootstrap uniform CLT implies that

sup_x |ν*_1(x, θ̂, τ̂) − ν*_1(x, θ_0, τ_0)| = o_{P*}(1) in P, uniformly in P.

We turn to the term in the second line of (21). Write it as

(√N/(s−1)!) { E_N h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂*_b, τ̂*_b)) − E_N h_x^{(s)}(X*_{ki,b}(θ̂, τ̂, θ̂, τ̂)) }.   (22)
By Assumptions 1(iv) and 4(ii), we deduce that the process in (22) is asymptotically equivalent to

(1/√N) Σ_{i=1}^N ψ_{x,k,P}(W*_{i,b}) + o_{P*}(c_N) in P,

uniformly in P ∈ P. Recall that W*_{i,b} = ([Y_{ki} − g(Z_{ki}, θ̂, τ̂)]* + g(Z_{ki}, θ̂, τ̂), Z_{ki})_{k=1}^K, where [·]* denotes the part obtained through the wild bootstrap procedure. The above process is a bootstrap empirical process that is already centered. The desired result follows from the asymptotic equicontinuity condition, Assumption 3(i), and the bootstrap CLT in Proposition B1.

Proof of Theorem 2: (i) Let B_kl^{(s)} = {x : D_kl^{(s)}(x) = 0} and denote [B_kl^{(s)}]^ε = {x : q(x)|D_kl^{(s)}(x)| < ε}. We will show the following at the end of the proof:

P{ B_kl^{(s)} ⊂ B̂_kl^{(s)} ⊂ [B_kl^{(s)}]^{2c_N} } → 1, uniformly in P ∈ P.   (23)

Note that [B_kl]^{2c_N} ⊂ X from some sufficiently large N on, by Assumption 1(i)(b). Let ν̄*_{kl,b}(x) = √N D̄_{kl,b}^{(s)*}(x), and define 1_K = 1{B_kl^{(s)} ≠ ∅ for all k ≠ l} and 1_K^{2c_N} = 1{[B_kl^{(s)}]^{2c_N} ≠ ∅ for all k ≠ l}. Also define 1̂_K = 1{B̂_kl^{(s)} ≠ ∅ for all k ≠ l} and 1̂_K^{2c_N} = 1{[B̂_kl^{(s)}]^{2c_N} ≠ ∅ for all k ≠ l}. Since P{B_kl^{(s)} ⊂ B̂_kl^{(s)} ⊂ [B_kl^{(s)}]^{2c_N}} → 1 by (23) and c_N → 0, it follows that, with probability (uniformly over P ∈ P) approaching one,

sup_{x∈B_kl} ν̄*_{kl,b}(x) 1_K + π_N(1 − 1_K^{2c_N})   (24)
≤ sup_{x∈B̂_kl} ν̄*_{kl,b}(x) 1̂_K + π_N(1 − 1̂_K)
≤ sup_{x∈[B_kl]^{2c_N}} ν̄*_{kl,b}(x) 1_K^{2c_N} + π_N(1 − 1_K).

We can apply the weak convergence of the process ν̄*_{kl,b} in Lemma A3, so that for each c > 0,

P{ sup_h |E[h(ν̄*_{kl,b})|G_N] − E h(ν_kl^{(s)})| > c, i.o. } → 0,

where the supremum is over the space of bounded linear functionals h on l^∞(F), and the convergence is uniform over P ∈ P (Corollary 2.7 of Giné and Zinn (1991); see also Giné (1997)).
Furthermore, since the class of functions indexing the bootstrap empirical process ν̄*_{kl,b} is uniformly pregaussian (as implied by the bracketing entropy conditions in Lemma A2 and Proposition 3.1 of Giné and Zinn (1991)), the process ν satisfies

  lim_{ε→0} sup_{P∈P} E_P sup_{f_1,f_2∈F: ||f_1−f_2||_{P,2}<ε} |ν(f_1) − ν(f_2)| = 0.   (25)

(See Definition 2.4 of Giné and Zinn (1991).)

Suppose that B_{kl} ≠ ∅ for all k, l, so that P ∈ P^{KS}_{00}. From (24), with probability approaching one,

  sup_{x∈B_{kl}} ν̄*_{kl,b}(x) ≤ sup_{x∈B̂_{kl}} ν̄*_{kl,b}(x) 1̂_K + π_N(1 − 1̂_K) ≤ sup_{x∈[B_{kl}]^{2c_N}} ν̄*_{kl,b}(x).

Therefore, we deduce that with probability approaching one,

  | sup_{x∈B̂_{kl}} ν̄*_{kl,b}(x) 1̂_K + π_N(1 − 1̂_K) − sup_{x∈B_{kl}} ν̄*_{kl,b}(x) |   (26)
  ≤ sup_{x∈[B_{kl}]^{2c_N}} ν̄*_{kl,b}(x) − sup_{x∈B_{kl}} ν̄*_{kl,b}(x)
  ≤ sup_{x∈X: d_Ṽ(x,x′)≤2c_N} | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) |.

Given the sample {W_i}_{i=1}^N, we define a random metric:

  d̂_{kl}(x, x′) = { (1/N) Σ_{i=1}^N | {h(s)_x(X̂_{ki}) − h(s)_x(X̂_{li})} − {h(s)_{x′}(X̂_{ki}) − h(s)_{x′}(X̂_{li})} |² }^{1/2}.

From the proof of Theorem 1, the class F is uniform Donsker, and hence it follows that

  sup_{d_Ṽ(x,x′)<δ} d̂_{kl}(x, x′) = sup_{d_Ṽ(x,x′)<δ} d_{kl}(x, x′) + O_P(N^{−1/2})

uniformly in P ∈ P, where d_{kl}(x, x′) = { E| {h(s)_x(X_{ki}) − h(s)_x(X_{li})} − {h(s)_{x′}(X_{ki}) − h(s)_{x′}(X_{li})} |² }^{1/2}. Let us define the following process:

  ν*(s)_{k,b}(x; θ̂, τ̂) = (1/(√N (s−1)!)) Σ_{i=1}^N { h(s)_x(ε*_{ki,b}(θ̂, τ̂)) − E_N[h(s)_x(ε*_{ki,b}(θ̂, τ̂))] } + (1/(√N (s−1)!)) Σ_{i=1}^N ψ_{x,k,P}(W*_{i,b}),

where W*_{i,b} = [g(Z_{ki}; θ̂, τ̂) + ε*_{ki}(θ̂, τ̂), Z_{ki}]_{k=1}^K. By Assumption 4(iv), we have

  c_{kl} {d̂_{kl}(x, x′)}^{p_{kl}} ≤ C d_Ṽ(x, x′) + O_P(N^{−p_{kl}/2}) = C d_Ṽ(x, x′) + O_P(N^{−1/2}).   (27)

Therefore,

  sup_{x∈X: d_Ṽ(x,x′)≤2c_N} | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) | ≤ sup{ | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) | : x ∈ X, c_{kl}{d̂_{kl}(x, x′)}^{p_{kl}} ≤ 3c_N },

with probability approaching one. The sequence 3c_N is introduced to take into account the term O_P(N^{−1/2}) in (27).
Following the proof of Lemma A3, we obtain that for some δ > 0,

  E_P[ sup_{x∈X: c_{kl}{d̂_{kl}(x,x′)}^{p_{kl}} ≤ 3c_N} | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) | | G_N ]
  = E_P[ sup_{x∈X: c_{kl}{d̂_{kl}(x,x′)}^{p_{kl}} ≤ 3c_N} | ν*(s)_{k,b}(x; θ̂, τ̂) − ν*(s)_{k,b}(x′; θ̂, τ̂) | | G_N ] + o_{P*}(c_N)

in P, uniformly in P. Conditional on G_N, the expected supnorm in the second line is bounded by that of a symmetrized one by the symmetrization lemma. This symmetrized process is sub-Gaussian with respect to d̂_{kl} by Hoeffding's inequality (e.g. van der Vaart and Wellner (1996), p. 101), and hence by Corollary 2.2.8 of van der Vaart and Wellner (1996), we have

  E_P[ sup_{x∈X: d̂_{kl}(x,x′) ≤ 3c_N^{1/p_{kl}}/c_{kl}} | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) | | G_N ] ≤ C ∫_0^{C c_N^{1/p_{kl}}} √( log D(ε, d̂_{kl}) ) dε,

where D(ε, d̂_{kl}) is the packing number of X with respect to d̂_{kl}. Note that d_Ṽ^{1/p_{kl}}(x, x′)/c_{kl} < max{ε/2, c_N/2} implies d̂_{kl}(x, x′) < ε with probability approaching one by (27). Hence we observe that

  log D(ε, d̂_{kl}) ≤ log D(max{ε/2, c_N/2}, C d_Ṽ^{1/p_{kl}}) ≤ log D(max{ε^{p_{kl}}/2, c_N^{p_{kl}}/2}, C d_Ṽ) ≤ log D(ε^{p_{kl}}/2, C d_Ṽ) ≤ −C log(ε),

with probability approaching one. The last inequality is obtained by choosing h(x) = C_1 x^{1−C_2} with appropriate constants C_1 and C_2 and applying Lemma A1. Therefore,

  E_P[ sup_{x∈X: d̂_{kl}(x,x′) ≤ C c_N} | ν̄*_{kl,b}(x) − ν̄*_{kl,b}(x′) | | G_N ] = O_P(c_N √(−log c_N))

uniformly in P ∈ P. From (26), we conclude that

  sup_{x∈B̂_{kl}} ν̄*_{kl,b}(x) 1̂_K + π_N(1 − 1̂_K) = sup_{x∈B_{kl}} ν̄*_{kl,b}(x) + O_{P*}(c_N √(−log c_N)) in P, uniformly over P ∈ P^{KS}_{00}.

Since sup_{x∈B(s)_{kl}} ν̄*_{kl,b}(x) →_{D*} sup_{x∈B(s)_{kl}} ν(s)_{kl}(x) uniformly in P, we deduce that

  c*KS_{α,∞} = cKS_α + O_P(1/√N) + O_P(c_N √(−log c_N)) = cKS_α + O_P(c_N √(−log c_N)),

uniformly in P ∈ P^{KS}_{00}, where cKS_α is such that

  cKS_α = inf{ c ∈ R : P{ min_{k≠l} sup_{x∈B(s)_{kl}} ν(s)_{kl}(x) ≤ c } ≥ 1 − α }.

Hence P{ D(s)_N > c*KS_{α,∞} } = α + O_P(c_N √(−log c_N)). This yields the first result of asymptotic similarity.

Now suppose that P ∈ P^{KS}_0 \ P^{KS}_{00}.
In this case, for some k, l, we have D(s)_{kl}(x) < 0 for all x ∈ X, so that for some k, l, B_{kl} = ∅ and 1_K = 0. From (24),

  π_N(1 − 1^{2c_N}_K) ≤ min_{k≠l} sup_{x∈B̂_{kl}} ν̄*_{kl,b}(x) 1̂_K + π_N(1 − 1̂_K),   (28)

with probability approaching one. Recall that 1^{2c_N}_K = 0 when [B_{kl}]^{2c_N} = ∅ for some k ≠ l. For k, l such that B_{kl} = ∅, we have inf_{x∈X} q(x)|D(s)_{kl}(x)| > 0, so that there exists N_0 such that for all N ≥ N_0, inf_{x∈X} q(x)|D(s)_{kl}(x)| > 2c_N > 0, and hence [B_{kl}]^{2c_N} = ∅. Therefore, from some large N on, [B_{kl}]^{2c_N} = ∅ and 1^{2c_N}_K = 0. The bootstrap test statistic on the right-hand side of (28) is bounded from below by π_N from some large N on. Therefore,

  c*KS_{α,∞} ≥ π_N,   (29)

from some large N on. Now, by Theorem 1,

  P[ min_{k≠l} sup_{x∈X} √N q(x) D̄(s)_{kl}(x, θ̂, τ̂) ≥ c*KS_{α,∞} ]
  ≤ P[ min_{k≠l} sup_{x∈X} √N q(x)( D̄(s)_{kl}(x, θ̂, τ̂) − D(s)_{kl}(x) + D(s)_{kl}(x) ) ≥ π_N ]   (from some large N on)
  ≤ P[ max_{k≠l} sup_{x∈X} √N q(x){ D̄(s)_{kl}(x, θ̂, τ̂) − D(s)_{kl}(x) } ≥ π_N/2 ] + P[ min_{k≠l} sup_{x∈X} √N q(x) D(s)_{kl}(x) ≥ π_N/2 ].

Since the term max_{k≠l} sup_{x∈X} √N q(x){ D̄(s)_{kl}(x, θ̂, τ̂) − D(s)_{kl}(x) } is asymptotically tight uniformly over P ∈ P and π_N → ∞, the first probability vanishes. Under H_0, the term min_{k≠l} sup_{x∈X} √N q(x) D(s)_{kl}(x) is less than or equal to zero uniformly over P ∈ P_0, and hence the second probability also vanishes uniformly over P ∈ P_0. Hence we obtain the wanted result.

Now let us turn to the result for Cramér-von Mises type tests. Noting that P^{CM}_{00} ⊂ P^{KS}_{00} and using the previous arguments, we can show that uniformly in P ∈ P^{CM}_{00},

  c*CM_{α,N} = cCM_α + O_P(c_N √(−log c_N)),

where cCM_α is such that

  cCM_α = inf{ c ∈ R : P{ min_{k≠l} ∫_{B(s)_{kl}} ( max{ν(s)_{kl}(x), 0} )² dx ≤ c } ≥ 1 − α }.

For P ∈ P_0 \ P^{CM}_{00}, observe that c*CM_{α,N} ≥ π_N as before, and consider

  P[ min_{k≠l} N ∫_{B(s)_{kl}} ( max{ q(x) D̄(s)_{kl}(x, θ̂, τ̂), 0 } )² dx ≥ c*CM_{α,N} ].

Observe that max(a + b, 0) ≤ max(a, 0) + max(b, 0).
Hence we can bound the probability above by

  P[ max_{k≠l} 2N ∫_{B(s)_{kl}} ( max{ q(x)( D̄(s)_{kl}(x, θ̂, τ̂) − D(s)_{kl}(x) ), 0 } )² dx ≥ π_N/2 ]
  + P[ min_{k≠l} 2N ∫_{B(s)_{kl}} ( max{ q(x) D(s)_{kl}(x), 0 } )² dx ≥ π_N/2 ],

from some sufficiently large N on. By the asymptotic tightness of the empirical process, the first probability converges to zero. The second probability also converges to zero because, under the null hypothesis, D(s)_{kl}(x) ≤ 0 for some k ≠ l.

Let us turn to (23). Since the empirical process √N q(x){ D(s)_k(x) − D̄(s)_k(x) } is asymptotically tight for each k = 1, . . . , K, we have P{ sup_x ( q(x)|D(s)_l(x) − D̄(s)_l(x)| + q(x)|D(s)_k(x) − D̄(s)_k(x)| ) > c_N } → 0 by the choice of c_N → 0 and c_N √N → ∞. For any x ∈ B_{kl}, so that D(s)_{kl}(x) = 0, the triangle inequality gives

  q(x)|D̄(s)_{kl}(x)| ≤ q(x)|D(s)_l(x) − D̄(s)_l(x)| + q(x)|D(s)_k(x) − D̄(s)_k(x)| ≤ c_N

with probability approaching one. Thus we deduce that P{ B(s)_{kl} ⊂ B̂(s)_{kl} } → 1. Now, for any x ∈ B̂(s)_{kl}, the triangle inequality gives

  q(x)|D(s)_{kl}(x)| ≤ c_N + q(x)|D(s)_l(x) − D̄(s)_l(x)| + q(x)|D(s)_k(x) − D̄(s)_k(x)| ≤ 2c_N,

with probability approaching one. Therefore, P{ B̂(s)_{kl} ⊂ [B(s)_{kl}]^{2c_N} } → 1. The uniformity of the convergence over P ∈ P follows immediately from the fact that √N{ D(s)_k(x) − D̄(s)_k(x) } is asymptotically tight uniformly over P ∈ P.

Proof of Theorem 3: (i) The contact sets B(s)_{kl} are non-empty for all k ≠ l under each alternative hypothesis, because the marginal distributions of X_k and X_l are continuous. Following the proof of Theorem 2 in the case of d*(s) = 0, we have for the bootstrap critical values c*KS_{α,∞},

  c*KS_{α,∞} = cKS_α + O_P(c_N √(−log c_N)).

Now, following the proof of Theorem 1, we can show that when d*(s) > 0,

  min_{k≠l} sup_{x∈X} √N D̄(s)_{kl}(x, θ̂, τ̂) → ∞.

Hence the result follows.

(ii) The proof is very similar to (i) and hence is omitted.
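To make the objects in the preceding proofs concrete, the following minimal sketch computes, for two prospects with s = 1 and weight q ≡ 1, the empirical difference D̄_kl built from ECDFs, the KS-type statistic min_{k≠l} sup_x √N D̄_kl(x), and the estimated contact set B̂_kl = {x : |D̄_kl(x)| ≤ c_N} with c_N = c · N^{−1/3} (the tuning used in the simulations reported below). The grid, the sample data, and the function names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of a sample, evaluated on a grid of points."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def ks_stat_and_contact_set(sample_k, sample_l, grid, c=0.25):
    """First-order (s = 1, q = 1) versions of the objects in the proofs:
    D_bar_kl(x) = F_k(x) - F_l(x); the KS-type statistic, i.e. the minimum
    over the two ordered pairs of sup_x sqrt(N) * D_bar; and the estimated
    contact set {x : |D_bar_kl(x)| <= c_N} with c_N = c * N**(-1/3)."""
    N = len(sample_k)  # equal sample sizes assumed for simplicity
    D_kl = ecdf(sample_k, grid) - ecdf(sample_l, grid)
    ks = np.sqrt(N) * min(np.max(D_kl), np.max(-D_kl))
    c_N = c * N ** (-1.0 / 3.0)
    contact = grid[np.abs(D_kl) <= c_N]
    return ks, contact

grid = np.linspace(0.0, 10.0, 11)
x1 = np.array([1.0, 2.0, 3.0, 4.0])  # dominated prospect
x2 = np.array([5.0, 6.0, 7.0, 8.0])  # first-order dominates x1

ks, contact = ks_stat_and_contact_set(x1, x2, grid)
print(ks)            # 0.0: sup(F_2 - F_1) = 0, consistent with dominance
print(len(contact))  # 4: grid points where the two empirical CDFs coincide
```

With x2 dominating x1, sup_x(F̂_2(x) − F̂_1(x)) = 0, so the minimum over ordered pairs is zero; the estimated contact set collects the grid points where the two ECDFs touch, mirroring the set B̂_kl whose consistency is established in (23).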
Proof of Theorem 4: (i) For any sequence {P_N}_{N=1}^∞ ∈ P, the weak convergence result in the proof of Theorem 1 gives

  √N D̄(s)_k(x, θ̂, τ̂) − (√N/(s−1)!) E_N[h(s)_x(φ_k(W; θ_0, τ_0))]
  = (1/(√N (s−1)!)) Σ_{i=1}^N { h(s)_x(φ_k(W_i; θ̂, τ̂)) − E_N[h(s)_x(φ_k(W_i; θ̂, τ̂))] }
  + (√N/(s−1)!) E_N[ h(s)_x(φ_k(W_i; θ̂, τ̂)) − h(s)_x(φ_k(W_i; θ_0, τ_0)) ]
  ⟹ ν(s)_k(x),

where E_N denotes the expectation with respect to P_N. This result follows because the functions constituting the process form a uniform Donsker class. Under any sequence P_N such that

  E_N[h(s)_x(φ_k(W; θ_0, τ_0))]/(s−1)! = H(s)_k(x) + δ_k(x)/√N

as N → ∞, we obtain that under the local sequence {P_N},

  √N D̄(s)_k(x, θ̂, τ̂) − √N H(s)_k(x) ⟹ δ_k(x) + ν(s)_k(x), or √N D̄(s)_{kl}(x, θ̂, τ̂) − √N H(s)_{kl}(x) ⟹ ν(s)_{kl}(x) + δ_{kl}(x),

where H(s)_{kl}(x) = H(s)_k(x) − H(s)_l(x). Hence

  min_{k≠l} sup_{x∈X} √N D̄(s)_{kl}(x, θ̂, τ̂) = min_{k≠l} sup_{x∈X} { √N H(s)_{kl}(x) + ν(s)_{kl}(x) + δ_{kl}(x) } + o_P(1)
  = min_{k≠l} sup_{x∈X: H(s)_{kl}(x)=0} { √N H(s)_{kl}(x) + ν(s)_{kl}(x) + δ_{kl}(x) } + o_P(1)
  = min_{k≠l} sup_{x∈X: H(s)_{kl}(x)=0} { ν(s)_{kl}(x) + δ_{kl}(x) } + o_P(1).

Therefore, the asymptotic unbiasedness result follows from Theorem 2 by applying Anderson's Lemma (e.g. Bickel, Klaassen, Ritov, and Wellner (1993), p. 466).

(ii) The asymptotic unbiasedness follows similarly by Anderson's Lemma. Observe that the set of functions {(f_k)_{k=1}^K : min_{k≠j} ∫ (max{f_k − f_j, 0})² dx < c} is convex and symmetric around zero.

Proof of Corollary 5: Since the test statistics are the same, it suffices to compare the bootstrap critical values as B → ∞. By the construction of the local alternatives, we are under the probability on the boundary.
Since this bootstrap test statistic is recentered, it converges in distribution (conditional on G_N) to the distribution of min_{k≠l} sup_{x∈X} ν(s)_{kl}(x), while the distribution of the bootstrap test statistic D(s)*_{N,b} converges to that of min_{k≠l} sup_{x∈C_{kl}} ν(s)_{kl}(x). Note that

  min_{k≠l} sup_{x∈X} ν(s)_{kl}(x) ≥ min_{k≠l} sup_{x∈C_{kl}} ν(s)_{kl}(x)

because C_{kl} ⊂ X. Hence c*KS−LF_{α,∞} ≥ c*KS_{α,∞}. This implies that

  lim_{N→∞} P_N{ D(s)_N > c*KS_{α,∞} } = P{ min_{k≠l} sup_{x∈C_{kl}} ( ν(s)_{kl}(x) + δ_{kl}(x) ) > c*KS_{α,∞} }   (30)
  ≥ P{ min_{k≠l} sup_{x∈C_{kl}} ( ν(s)_{kl}(x) + δ_{kl}(x) ) > c*KS−LF_{α,∞} }.

Now assume that the set X \ ∪_{k≠l} cl(C_{kl}) contains a nonempty interior, so that

  sup_{x∈X\∪_{k≠l}cl(C_{kl})} ν(s)_{kl}(x) > sup_{x∈∪_{k≠l}cl(C_{kl})} ν(s)_{kl}(x) for all k ≠ l

with positive probability, because ν(s)_{kl}(x) is a Gaussian process. Therefore, c*KS−LF_{α,∞} > c*KS_{α,∞}. In this case, by Gaussianity of the process ν(s)_{kl}(x) + δ_{kl}(x), the inequality in (30) is strict.

7.2 Bootstrap Uniform Central Limit Theorem

In this section, we show how the bootstrap CLT result of Giné (1997) can be extended to accommodate the situation in which the bootstrap samples are non-identically distributed. The essential feature of the residual-based bootstrap is that it resamples only from a marginal distribution of a random vector while leaving the remaining part intact. For example, consider the nonlinear regression model Y_i = g(X_i, θ) + ε_i. Then a bootstrap sample is obtained as {Y_i*, X_i}_{i=1}^n or as {g(X_i, θ̂) + [Y_i − g(X_i, θ̂)]*, X_i}_{i=1}^n, where [·]* indicates the variable obtained from resampling. The bootstrap central limit theorem in this section is appropriate for this situation.

We adapt the notation and definitions of Giné and Zinn (1991) and Giné (1997) to our case of residual-based bootstrap processes. Let (S, S, P) be a probability space on a Banach space S, and let (X_i, Z_i) : (S^N, S^N, P^N) → (S, S, P) be the i-th coordinate function. Hence {(X_i, Z_i)}_{i∈N} is a sequence of pairs of B-valued (i.e. taking values in a Banach space) random elements taking values in X × Z. Let F be a class of uniformly bounded measurable functions on (S, S) with an envelope F, and let F′ = {f(·, z) : (f, z) ∈ F × Z}. Then we define

  P_{n,X}(ω) := (1/n) Σ_{i=1}^n δ_{X_i(ω)}  and  P_n(ω) := (1/n) Σ_{i=1}^n { δ_{(X̂_i,Z_i)}(ω) − (1/n) Σ_{j=1}^n δ_{(X_j,Z_i)}(ω) },

where X̂_1, . . . , X̂_n are i.i.d. P_{n,X}(ω). The focus of this paper is the distribution of (X̂_i, Z_i)_{i=1}^n. The main difference from the usual nonparametric bootstrap is that the conditional distribution of (X̂_i, Z_i) given {X_i, Z_i}_{i∈N} is not identical across different i's.

The measure P_n(ω) is a signed measure, and we write E_{P_n} f(X, Z) = (1/n) Σ_{i=1}^n { f(X̂_i, Z_i) − (1/n) Σ_{j=1}^n f(X_j, Z_i) }. Let G_P be a P-Brownian bridge, i.e. the centered Gaussian process on L²(P) with the covariance kernel given by Cov_P(f, g). Let ν*_n := √n P_n and define

  d_{P_n}(ν*_n, G_P; F) := sup_{H∈BL(F)} | E_{P_n} H(ν*_n) − E_{P_n} H(G_P) |,

where BL(F) = { H : l∞(F) → R : |H(v_1) − H(v_2)| ≤ sup_{f∈F} |v_1(f) − v_2(f)| and sup_{v∈l∞(F)} |H(v)| < ∞ }. Conditional on {X_i, Z_i}_{i∈N}, the process ν*_n is an empirical process of independent but non-identically distributed random elements. Similarly, we define ν_n := √n(P_{n,X} − P_X) and d_P(ν_n, G_P; F′). Then we say that F ∈ BUCLT(P) if F is P-pregaussian and sup_{P∈P} d_{BL,P_n}(ν*_n, G_P; F) → 0 in P uniformly in P ∈ P. (For details, see Giné and Zinn (1991).) The primary focus is to identify sufficient conditions for F ∈ BUCLT(P).

We follow the proof of the direct implication part of Theorem 2.2 of Giné (1997). For this we need the following lemma, which is a slight variant of Proposition 2.5 of Giné (1997).

Lemma B1: Let B be a Banach space, and for any n ∈ N, let w_{ji} = (x_j, z_i), where x_j, z_i, i, j ∈ N, are points in B, and let w_i := (1/n) Σ_{j=1}^n (x_j, z_i). Let W_1, . . . , W_n be independent, non-identically distributed B-valued random variables with each W_i having law P{W_i = w_{ji}} = 1/n, j = 1, . . . , n.
Let N_{ji}, i, j ∈ {1, . . . , n}, be i.i.d. Poisson random variables with parameter 1, and assume that these random vectors and variables are all independent. Then

  E|| Σ_{i=1}^n (W_i − w_i) || ≤ (e/(n(e − 1))) E|| Σ_{i=1}^n Σ_{j=1}^n (N_{ji} − 1)(w_{ji} − w_i) ||.

Proof of Lemma B1: For any probability measure L, define Pois(L) = exp(L − 1), and let L(X) denote the distribution of X. By the Poissonization lemma (e.g. (2.5′) of Giné (1997)), we obtain that

  E|| Σ_{i=1}^n (W_i − w_i) || ≤ (e/(e − 1)) ∫ ||x|| dPois( Σ_{i=1}^n L(W_i − w_i) ).

Note that Σ_{i=1}^n L(W_i − w_i) = (1/n) Σ_{i=1}^n Σ_{j=1}^n δ_{w_{ji}−w_i} and that

  Pois( Σ_{j=1}^n δ_{w_{ji}−w_i} ) = Pois(δ_{w_{1i}−w_i}) ∗ · · · ∗ Pois(δ_{w_{ni}−w_i}) = L( Σ_{j=1}^n N_{ji}(w_{ji} − w_i) ) = L( Σ_{j=1}^n (N_{ji} − 1)(w_{ji} − w_i) ).   (31)

Therefore, Pois( Σ_{i=1}^n Σ_{j=1}^n δ_{w_{ji}−w_i} ) = L( Σ_{i=1}^n Σ_{j=1}^n (N_{ji} − 1)(w_{ji} − w_i) ). By (2.7) of Giné (1997) and by (31), we obtain the wanted result.

Proposition B1: Suppose that sup_{P∈P} ∫_0^∞ √( log N_[](ε, F, ||·||_{P,2}) ) dε < ∞ and sup_{P∈P} ∫ F² dP < ∞. Then F ∈ BUCLT(P).

Proof of Proposition B1: We follow the proof of Theorem 2.2 of Giné (1997). A close examination of the proof shows that the only modification needed for this extension is bounding the oscillation of the bootstrap empirical process. Let E* denote the expectation with respect to the bootstrap distribution. Similarly as in the proof there, we proceed as follows:

  E E*|| ν*_n ||_{F′_δ} = E E*|| (1/√n) Σ_{i=1}^n ( δ_{(X̂_i,Z_i)} − P_{n,i} ) ||_{F′_δ}   (32)
  ≤ (e/(e − 1)) E|| (1/(n√n)) Σ_{i=1}^n Σ_{j=1}^n (N_{ji} − 1){ δ_{(X_j,Z_i)} − (1/n) Σ_{k=1}^n δ_{(X_k,Z_i)} } ||_{F′_δ}
  ≤ (e/(e − 1)) { E|| (1/(n√n)) Σ_{i=1}^n Σ_{j=1}^n (N_{ji} − 1) δ_{(X_j,Z_i)} ||_{F′_δ} + E|| (1/(n²√n)) Σ_{i=1}^n Σ_{j=1}^n (N_{ji} − 1) Σ_{k=1}^n δ_{(X_k,Z_i)} ||_{F′_δ} }.

We can show that the lim_{δ→0} limsup_{n→∞} of both terms is equal to zero for each P ∈ P by proceeding exactly in the same manner as in the proof of Theorem 4.10 of Arcones and Giné (1993). Uniformity over P ∈ P can be ensured by applying to the chaining argument a uniform version of Bernstein's inequality (Proposition 2.3 of Arcones and Giné (1993)) that involves sup_{P∈P} ∫ F_k² dP, where F_k is the class of functions bracketed by the k-th bracket. The number of brackets can be chosen not to depend on P, due to the condition that sup_{P∈P} ∫_0^∞ √( log N_[](ε, F, ||·||_{P,2}) ) dε < ∞. Details are omitted.

References

[1] Abadie, A. (2002), "Bootstrap tests of distributional treatment effects in instrumental variables," Journal of the American Statistical Association, 97, 284-292.
[2] Anderson, G. J. (1996), "Nonparametric tests of stochastic dominance in income distributions," Econometrica, 64, 1183-1193.
[3] Andrews, D. W. K. (1994), "Empirical process methods in econometrics," in The Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden, Amsterdam: North-Holland.
[4] Andrews, D. W. K. (1997), "A conditional Kolmogorov test," Econometrica, 65, 1097-1128.
[5] Andrews, D. W. K. and P. Guggenberger (2007), "The limit of exact size and a problem with subsampling and with the m out of n bootstrap," Working paper.
[6] Arcones, M. A. and E. Giné (1992), "On the bootstrap of M-estimators and other statistical functionals," in Exploring the Limits of Bootstrap, ed. by R. LePage and L. Billard, Wiley, 13-47.
[7] Arcones, M. A. and E. Giné (1993), "Limit theorems for U-processes," Annals of Probability, 21, 1494-1542.
[8] Barrett, G. and S. Donald (2003), "Consistent tests for stochastic dominance," Econometrica, 71, 71-104.
[9] Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1993), Efficient and Adaptive Estimation for Semiparametric Models, Springer Verlag, New York.
[10] Birman, M. S.
and M. Z. Solomjak (1967), "Piecewise-polynomial approximation of functions of the classes W_p^α," Mathematics of the USSR-Sbornik, 2, 295-317.
[11] Chen, X., O. Linton, and I. van Keilegom (2003), "Estimation of semiparametric models when the criterion function is not smooth," Econometrica, 71, 1591-1608.
[12] Chernozhukov, V., H. Hong, and E. Tamer (2007), "Estimation and confidence regions for parameter sets in econometric models," Econometrica, 75, 1243-1284.
[13] Davidson, R. and J.-Y. Duclos (2000), "Statistical inference for stochastic dominance and for the measurement of poverty and inequality," Econometrica, 68, 1435-1464.
[14] Giné, E. (1997), Lecture Notes on Some Aspects of the Bootstrap, École d'Été de Probabilités de Saint-Flour, Lecture Notes in Mathematics, 1665, Springer, Berlin.
[15] Giné, E. and J. Zinn (1991), "Gaussian characterization of uniform Donsker classes of functions," Annals of Probability, 19, 758-782.
[16] Hansen, P. (2003), "Asymptotic tests of composite hypotheses," Working paper.
[17] Härdle, W. and E. Mammen (1993), "Comparing nonparametric versus parametric regression fits," Annals of Statistics, 21, 1926-1947.
[18] Horváth, L., P. Kokoszka, and R. Zitikis (2006), "Testing for stochastic dominance using the weighted McFadden-type statistic," Journal of Econometrics, 133, 191-205.
[19] Kim, J. and D. Pollard (1990), "Cube-root asymptotics," Annals of Statistics, 18, 191-219.
[20] Koul, H. L. and S. N. Lahiri (1994), "On bootstrapping M-estimated residual processes in multiple linear regression models," Journal of Multivariate Analysis, 49, 255-265.
[21] Linton, O., E. Maasoumi, and Y.-J. Whang (2005), "Consistent testing for stochastic dominance under general sampling schemes," Review of Economic Studies, 72, 735-765.
[22] McFadden, D. (1989), "Testing for stochastic dominance," in Studies in the Economics of Uncertainty: In Honor of Josef Hadar, ed. by T. B. Fomby and T. K. Seo, New York, Berlin, London and Tokyo: Springer.
[23] Sheehy, A.
and J. A. Wellner (1992), "Uniform Donsker classes of functions," Annals of Probability, 20, 1983-2030.
[24] Song, K. (2008), "Testing distributional inequalities and asymptotic bias," Working paper.
[25] Tse, Y. K. and X. B. Zhang (2004), "A Monte Carlo investigation of some tests for stochastic dominance," Journal of Statistical Computation and Simulation, 74, 361-378.
[26] van der Vaart, A. W. and J. A. Wellner (1996), Weak Convergence and Empirical Processes, Springer-Verlag.
[27] Wellner, J. A. and Y. Zhan (1996), "Bootstrapping Z-estimators," Working paper.
[28] Whang, Y.-J. (2000), "Consistent bootstrap tests of parametric regression functions," Journal of Econometrics, 98, 27-46.

Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 1a, d*_1 = 0 (N = 50, 500, 1000): .137 .046 .052 .063 .056 .049 .070 .063 .063 .057 .056 .056 .049 .049 .049 .105 .050 .046 .064 .057 .060 .084 .064 .064 .058 .057 .057 .061 .060 .060
Design 1b, d*_1 = 0 (N = 50, 500, 1000): .122 .052 .048 .055 .051 .059 .061 .055 .055 .051 .051 .051 .060 .051 .051 .082 .058 .056 .071 .059 .046 .080 .072 .071 .061 .059 .059 .048 .046 .046
Design 1c, d*_1 > 0 (N = 50, 500, 1000): .399 .981 .994 .685 .983 .995 .762 .707 .689 .983 .983 .983 .995 .995 .995 .279 .975 .993 .411 .981 .994 .581 .457 .424 .983 .982 .982 .994 .994 .994
Design 1d, d*_1 > 0 (N = 50, 500, 1000): .381 .986 .995 .684 .984 .994 .781 .706 .689 .987 .985 .985 .994 .994 .994 .262 .985 .995 .426 .982 .993 .578 .464 .437 .983 .982 .982 .993 .993 .993
Design 1e, d*_1 > 0 (N = 50, 500, 1000): .397 .986 .987 .646 .992 .992 .738 .670 .653 .992 .992 .992 .992 .992 .992 .265 .975 .987 .371 .991 .989 .528 .408 .387 .991 .991 .991 .991 .991 .991

Table 1F. Rejection frequencies for the test of First Order Stochastic Dominance for Design 1. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap.
LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 0.25, 0.50, and 0.75 for cases I, II, and III, respectively.

Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 1a, d*_2 = 0 (N = 50, 500, 1000): .110 .055 .052 .066 .055 .050 .068 .066 .066 .055 .055 .055 .050 .050 .050 .090 .049 .048 .062 .057 .050 .063 .062 .062 .057 .057 .057 .050 .050 .050
Design 1b, d*_2 = 0 (N = 50, 500, 1000): .084 .068 .073 .061 .060 .050 .132 .075 .066 .178 .091 .071 .177 .084 .058 .064 .063 .060 .066 .060 .051 .154 .080 .071 .210 .096 .070 .219 .086 .063
Design 1c, d*_2 > 0 (N = 50, 500, 1000): .261 .942 .993 .336 .451 .545 .846 .525 .368 .995 .985 .983 .997 .996 .995 .159 .738 .969 .200 .003 .003 .850 .380 .212 .996 .986 .983 .997 .996 .995
Design 1d, d*_2 > 0 (N = 50, 500, 1000): .233 .958 .995 .329 .423 .524 .841 .544 .368 .994 .987 .985 .999 .996 .994 .149 .710 .955 .197 .004 .003 .851 .370 .209 .994 .987 .984 .999 .998 .994
Design 1e, d*_2 > 0 (N = 50, 500, 1000): .233 .933 .984 .299 .424 .484 .798 .478 .326 .995 .993 .992 .997 .995 .992 .138 .653 .934 .179 .009 .001 .810 .346 .189 .996 .993 .991 .998 .997 .992

Table 1S. Rejection frequencies for the test of Second Order Stochastic Dominance for Design 1. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap. LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 2.0, 3.0, and 4.0 for cases I, II, and III, respectively.
Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 2a, d*_1 = 0 (N = 50, 500, 1000): .130 .049 .070 .054 .055 .044 .058 .054 .054 .055 .055 .055 .044 .044 .044 .082 .054 .050 .058 .057 .061 .071 .059 .058 .057 .057 .057 .062 .061 .061
Design 2b, d*_1 = 0 (N = 50, 500, 1000): .078 .010 .020 .072 .026 .018 .088 .075 .073 .104 .046 .028 .170 .076 .034 .031 .002 .007 .056 .004 .001 .068 .056 .056 .028 .005 .004 .031 .009 .002
Design 2c, d*_1 > 0 (N = 50, 500, 1000): .300 .979 1.00 .453 1.00 1.00 .787 .626 .574 1.00 1.00 1.00 1.00 1.00 1.00 .272 .988 1.00 .718 1.00 1.00 .918 .872 .831 1.00 1.00 1.00 1.00 1.00 1.00
Design 2d, d*_1 > 0 (N = 50, 500, 1000): .257 .969 1.00 .173 .988 1.00 .555 .427 .377 1.00 1.00 1.00 1.00 1.00 1.00 .184 .954 1.00 .016 .644 .995 .396 .223 .159 1.00 1.00 1.00 1.00 1.00 1.00

Table 2F. Rejection frequencies for the test of First Order Stochastic Dominance for Design 2. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap. LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 0.25, 0.50, and 0.75 for cases I, II, and III, respectively.

Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 2a, d*_2 = 0 (N = 50, 500, 1000): .077 .061 .063 .056 .046 .065 .063 .056 .056 .050 .046 .046 .065 .065 .065 .068 .054 .057 .061 .058 .064 .074 .061 .061 .066 .058 .058 .064 .064 .064
Design 2b, d*_2 = 0 (N = 50, 500, 1000): .063 .002 .004 .078 .006 .001 .119 .078 .078 .154 .018 .006 .215 .023 .001 .037 .001 .001 .077 .002 .000 .135 .077 .077 .171 .006 .002 .239 .014 .000
Design 2c, d*_2 = 0 (N = 50, 500, 1000): .032 .000 .000 .006 .000 .000 .007 .006 .006 .000 .000 .000 .000 .000 .000 .001 .000 .000 .006 .000 .000 .006 .006 .006 .000 .000 .000 .000 .000 .000
Design 2d, d*_2 > 0 (N = 50, 500, 1000): .177 .936 1.00 .027 .332 .860 .823 .499 .069 1.00 1.00 1.00 1.00 1.00 1.00 .145 .857 .991 .001 .000 .000 .851 .407 .006 1.00 1.00 1.00 1.00 1.00 1.00

Table 2S.
Rejection frequencies for the test of Second Order Stochastic Dominance for Design 2. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap. LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 2.0, 3.0, and 4.0 for cases I, II, and III, respectively.

Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 3a, d*_1 > 0 (N = 50, 500, 1000): .637 1.00 1.00 .959 1.00 1.00 .993 .986 .977 1.00 1.00 1.00 1.00 1.00 1.00 .612 1.00 1.00 .972 1.00 1.00 .997 .992 .988 1.00 1.00 1.00 1.00 1.00 1.00
Design 3b, d*_1 = 0 (N = 50, 500, 1000): .060 .000 .000 .025 .000 .000 .032 .026 .025 .000 .000 .000 .000 .000 .000 .037 .000 .000 .026 .000 .000 .040 .028 .026 .000 .000 .000 .000 .000 .000
Design 3c, d*_1 > 0 (N = 50, 500, 1000): .611 1.00 1.00 .949 1.00 1.00 .993 .989 .985 1.00 1.00 1.00 1.00 1.00 1.00 .575 1.00 1.00 .970 1.00 1.00 .996 .992 .990 1.00 1.00 1.00 1.00 1.00 1.00

Table 3F. Rejection frequencies for the test of First Order Stochastic Dominance for Design 3. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap. LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 0.25, 0.50, and 0.75 for cases I, II, and III, respectively.
Columns: KS panel (SUB, BTS, LSW I, II, III) followed by CM panel (SUB, BTS, LSW I, II, III).
Design 3a, d*_2 = 0 (N = 50, 500, 1000): .031 .000 .000 .021 .000 .000 .025 .022 .021 .000 .000 .000 .000 .000 .000 .000 .000 .000 .016 .000 .000 .022 .016 .016 .000 .000 .000 .000 .000 .000
Design 3b, d*_2 = 0 (N = 50, 500, 1000): .048 .000 .000 .044 .000 .000 .123 .053 .046 .020 .005 .002 .011 .001 .000 .039 .000 .000 .050 .000 .000 .149 .065 .051 .030 .006 .002 .018 .002 .000
Design 3c, d*_2 > 0 (N = 50, 500, 1000): .546 1.00 1.00 .934 1.00 1.00 .942 .940 .938 1.00 1.00 1.00 1.00 1.00 1.00 .458 1.00 1.00 .920 1.00 1.00 .939 .926 .924 1.00 1.00 1.00 1.00 1.00 1.00

Table 3S. Rejection frequencies for the test of Second Order Stochastic Dominance for Design 3. SUB refers to the subsampling method with critical values computed by the automatic method "Mean" described by LMW (2005) for the 5% null rejection probabilities. BT refers to the recentered bootstrap. LSW refers to the recentered bootstrap with set estimation, where the tuning parameter is given by c_N = c · N^{−1/3} with c = 2.0, 3.0, and 4.0 for cases I, II, and III, respectively.
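For reference, the LSW tuning parameter in the captions, c_N = c · N^{−1/3}, can be evaluated directly at the simulation sample sizes; the constants below are the first-order-test values c = 0.25, 0.50, 0.75 from the captions:

```python
# Evaluate the set-estimation tuning parameter c_N = c * N**(-1/3)
# from the table captions at the simulation sample sizes.
def c_N(c, N):
    return c * N ** (-1.0 / 3.0)

for c in (0.25, 0.50, 0.75):
    print(c, [round(c_N(c, N), 4) for N in (50, 500, 1000)])
```

The bandwidth shrinks slowly with N, so the estimated contact set [as in (23)] only gradually tightens around the true contact set as the sample grows.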