SOME LIMIT THEOREMS AND INEQUALITIES FOR WEIGHTED AND NON-IDENTICALLY DISTRIBUTED EMPIRICAL PROCESSES

by

Kenneth Sidney Alexander

B.S., University of Washington (1979)

SUBMITTED TO THE DEPARTMENT OF MATHEMATICS IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE DEGREE OF DOCTOR OF PHILOSOPHY at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 1982

© Kenneth Sidney Alexander

The author hereby grants to M.I.T. permission to reproduce and to distribute copies of this thesis document in whole or in part.

Signature of Author: Department of Mathematics, August 6, 1982

Certified by: Richard M. Dudley, Thesis Supervisor

Accepted by: Nesmith C. Ankeny, Chairman, Departmental Graduate Committee

ABSTRACT

Submitted to the Department of Mathematics on August 6, 1982 in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Sharp bounds in probability are derived for the supremum of the normalized empirical process $\nu_n$ over classes of sets satisfying a combinatorial condition, including the non-identically distributed case. From this is derived a bounded law of the iterated logarithm for this supremum, and a central limit theorem for truncated weighted empirical processes. In the identically distributed case with law $P$, conditions are found on $q$ so that an invariance principle holds for the weighted process $\nu_n/q\circ P$, where $q$ is a function on $[0,1]$. As a corollary, a functional law of the iterated logarithm for $\nu_n/q\circ P$, and necessary and sufficient conditions on $q$ for the central limit theorem to hold, are obtained.

Thesis Supervisor: Richard M. Dudley
Title: Professor of Mathematics

TABLE OF CONTENTS

1. Introduction
2. Background and Preliminaries
3. Exponential Bounds for the Suprema of Normalized Empirical Processes
4. A Bounded LIL for the Normalized Empirical Process
5. Weighted Empirical Processes with Truncation
6. Weighted Empirical Processes without Truncation
Index of Notation
References

ACKNOWLEDGEMENTS

I would like to thank my thesis advisor, Professor Richard M. Dudley, for his devotion of much time and effort to guidance of my work and careful checking for my errors. I am grateful to everyone else with whom I consulted in the course of this project, especially Professors Walter Philipp, Ronald Pyke, and Galen Shorack. My fiancée Chris Czarnecki and my family provided moral support and encouragement throughout this endeavor, heightening my successes and cushioning my failures, making my efforts a pleasure. I am indebted to Professor Bruce Erickson for fostering my original interest in probability, and to the National Science Foundation which supported me financially as I pursued that interest. I wish to thank Phyllis Ruby for her excellent work typing this thesis.

I. Introduction

Given a sequence $X_1, X_2, \ldots$ of independent random variables taking values in a measurable space $(X,\mathcal{A})$ with laws $\mathcal{L}(X_i) = P_{(i)}$, we consider for each $A \in \mathcal{A}$ the fraction $P_n(A)$ of the first $n$ r.v.'s which fall in $A$. Our particular interest is in the deviation of $P_n(A)$ from its expected value

$$\bar{P}_{(n)}(A) := n^{-1}\sum_{i=1}^{n} P_{(i)}(A)$$

as $A$ varies over some subclass $\mathcal{C}$ of $\mathcal{A}$, especially for large $n$. (We use ":=" to mean "equals by definition.") We call $\nu_n := n^{1/2}(P_n - \bar{P}_{(n)})$ the nth normalized empirical process for $\{P_{(i)}, i \ge 1\}$. We can view $\nu_n$ as a stochastic process indexed by $\mathcal{C}$, or equivalently as a random element of a space of functions on $\mathcal{C}$.

We seek two main types of results: first, bounds on the tail of $\sup_{C\in\mathcal{C}}|\nu_n(C)|$, and second, limit theorems for $\nu_n$, including CLT's (central limit theorems) and LIL's (laws of the iterated logarithm). The first type of result is usually the key to proving the second.

If $\mathcal{C}$ is a finite set $\{C_1,\ldots,C_k\}$ and all $P_{(i)}$ = some $P$, then by the finite dimensional CLT we have

$$(1.1)\qquad \mathcal{L}((\nu_n(C_1),\ldots,\nu_n(C_k))) \to N(0,\Sigma) \quad\text{as } n\to\infty,$$

where $\Sigma_{ij} := P(C_i\cap C_j) - P(C_i)P(C_j)$. Hence, letting $\alpha := \max_{i\le k}\Sigma_{ii} = \sup_{\mathcal{C}} \mathrm{var}(\nu_n(C))$, we have for all $M > 0$:

$$\lim_{n\to\infty}\Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \le \sum_{i=1}^{k}\lim_{n\to\infty}\Pr[|\nu_n(C_i)| > M]$$

$$(1.2)\qquad = \sum_{i=1}^{k} 2\bigl(1-\Phi(M/\mathrm{var}^{1/2}(\nu_1(C_i)))\bigr) \le 2k\bigl(1-\Phi(M/\alpha^{1/2})\bigr) \le K\exp(-M^2/2\alpha),$$

where $K$ is a constant depending on $k$ and $\Phi$ is the distribution function of the normal law $N(0,1)$.

When we allow $\mathcal{C}$ to be infinite, however, the situation is more complex. If, for example, $X = [0,1]$, $\mathcal{C}$ is all the Borel sets in $[0,1]$, and all $P_{(i)}$ are the uniform law $P$ on $[0,1]$, then the (random) set $C := \{X_1,\ldots,X_n\}$ satisfies $P_n(C) = 1$ and $P(C) = 0$. Hence $\sup_{\mathcal{C}}|\nu_n(C)| = n^{1/2}$ a.s. Vapnik and Červonenkis (1971), though, showed that if $\mathcal{C}$ satisfies a certain combinatorial condition, stated by saying $\mathcal{C}$ is a Vapnik-Červonenkis (or VC) class and satisfied by many classes of interest in applications, then

$$(1.3)\qquad \Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \le 4(n^v+1)\exp(-M^2/8) \quad\text{for } M \ge 2^{1/2},$$

where $v$ is a constant depending on $\mathcal{C}$. Devroye (1982) improved (1.3) to

$$(1.4)\qquad \Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \le K(n^v+1)\exp(-2M^2),$$

where $K$ is a universal constant. But (1.3) and (1.4) cannot be improved to an inequality as good as (1.2), for Kiefer and Wolfowitz (1958) show that for the VC class $\mathcal{C} := \{(-\infty,x]: x\in\mathbb{R}^2\}$ there is a law $P$ which gives

$$\lim_{n\to\infty}\Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \sim 8M^2\exp(-M^2/2\alpha) \quad\text{as } M\to\infty,$$

where again $\alpha := \sup_{\mathcal{C}}\mathrm{var}(\nu_n(C))$. Kiefer (1961) did prove the next best thing to (1.2) in a special case: he showed that when $\mathcal{C}$ is $\{(-\infty,x]: x\in\mathbb{R}^d\}$ and all $P_{(i)}$ are equal to some $P$, for each $\varepsilon > 0$ there is a constant $K$ depending on $d$ and $\varepsilon$ such that

$$(1.5)\qquad \Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \le K\exp(-(2-\varepsilon)M^2)$$

for all $M > 0$. (1.5) is nearly as good as (1.2) if $\alpha = 1/4$, its largest possible value since $\sup_{\mathcal{C}}\mathrm{var}(\nu_n(C)) = \sup_{\mathcal{C}}P(C)(1-P(C)) \le 1/4$.
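The comparison between (1.2) and (1.5) can be checked numerically in the simplest case. The following is a minimal sketch, assuming the uniform law on $[0,1]$ and the one-dimensional class of half-lines $(-\infty,x]$; the function names, the seed, and the placeholder constant `K = 1` are illustrative assumptions and are not taken from the thesis.

```python
import math
import random

def sup_nu_halflines(sample):
    """sup over C = (-inf, x] of |nu_n(C)| when P is uniform on [0, 1].

    The supremum is attained at, or just to the left of, a sample point,
    so only the two one-sided deviations at each order statistic matter.
    """
    xs = sorted(sample)
    n = len(xs)
    dev = 0.0
    for i, x in enumerate(xs):
        # P_n jumps from i/n to (i+1)/n at x, while P((-inf, x]) = x.
        dev = max(dev, abs((i + 1) / n - x), abs(i / n - x))
    return math.sqrt(n) * dev

def exponential_bound(M, alpha=0.25, K=1.0, eps=0.0):
    """K * exp(-(1 - eps) * M^2 / (2 * alpha)); K = 1 is only a placeholder."""
    return K * math.exp(-(1.0 - eps) * M * M / (2.0 * alpha))

def tail_frequency(n, M, trials=2000, seed=0):
    """Monte Carlo estimate of Pr[sup |nu_n(C)| > M] (illustrative only)."""
    rng = random.Random(seed)
    hits = sum(
        sup_nu_halflines([rng.random() for _ in range(n)]) > M
        for _ in range(trials)
    )
    return hits / trials
```

With `alpha = 0.25` and `eps = 0` the exponent in `exponential_bound` is $-2M^2$, which is the sense in which (1.5) is nearly as good as (1.2) when $\alpha = 1/4$.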
The inequalities (1.3) and (1.4) actually require some measurability assumptions since the events they refer to may not be measurable in general. We omit these here and in the remainder of this introduction.

All of this motivates us to seek generalizations of (1.5) of the form

$$\Pr[\sup_{\mathcal{C}}|\nu_n(C)| > M] \le K\exp(-(1-\varepsilon)M^2/2\alpha)$$

for all $\varepsilon > 0$. We further seek to include the non-identically distributed case, where the $P_{(i)}$ may be distinct. These goals will be accomplished in Chapter 3.

Returning to (1.1), we are led to ask whether the processes $\nu_n$ on a possibly infinite class $\mathcal{C}$ converge in law to a Gaussian process, i.e. whether a CLT holds. This information would enable us to find the limiting distribution of many functions of $\nu_n$. Whether or not the CLT holds depends of course on the σ-algebra on which we wish the weak convergence of laws to occur; a reasonable choice turns out to be the σ-algebra generated by all balls in the space of functions on $\mathcal{C}$ in which we view $\nu_n$ as taking values. Dudley (1978) showed this convergence does occur in the identically distributed case when $\mathcal{C}$ is a VC class. In Chapter 5 we generalize this result to the non-identically distributed case.

Kuelbs and Dudley (1980) showed that $\{\nu_n\}$ satisfies a compact LIL in the identically distributed case when $\mathcal{C}$ is a VC class. In the non-identically distributed case there may be no compact LIL, but we can look for a bounded LIL as follows. If $\mathcal{C}$ is a finite class $\{C_1,\ldots,C_k\}$, and $\sum_{i=1}^{\infty}P_{(i)}(C_j) = \infty$ for each $j \le k$, then by a classical LIL for real-valued r.v.'s (see Stout (1974)),

$$\limsup_n \frac{|\nu_n(C_j)|}{(2\sigma_{nj}^2\log\log(n\sigma_{nj}^2))^{1/2}} = 1 \quad\text{a.s.,}$$

where $\sigma_{nj}^2 = \mathrm{var}(\nu_n(C_j))$, so

$$(1.6)\qquad \limsup_n \frac{\sup_{\mathcal{C}}|\nu_n(C)|}{(2\sigma_n^2\log\log(n\sigma_n^2))^{1/2}} \le 1 \quad\text{a.s.,}$$

where $\sigma_n^2 := \max_j\sigma_{nj}^2 = \sup_{\mathcal{C}}\mathrm{var}(\nu_n(C))$. In Chapter 4 we generalize (1.6) to possibly infinite VC classes $\mathcal{C}$.

Even more detailed information about $\nu_n$ may be obtained by weighting it at each set $C\in\mathcal{C}$ according to the size of $C$, that is, by studying the weighted empirical process $\nu_n(C)/q(\bar{P}_{(n)}(C))$, $C\in\mathcal{C}$, where $q$ is a non-negative function on $[0,1]$. If $q$ is taken to be small when $\bar{P}_{(n)}(C)$ is small, then the process $\nu_n/q\circ\bar{P}_{(n)}$ in effect magnifies the normalized deviation $\nu_n(C)$ of $P_n(C)$ from $\bar{P}_{(n)}(C)$ for small sets $C$, where the true deviation is likely to be small. A natural question to ask is: given $q$ and a VC class $\mathcal{C}$, for which $\{P_{(i)}\}$ do the processes $\nu_n/q\circ\bar{P}_{(n)}$ satisfy a CLT or compact LIL?

In the identically distributed case, if $q$ is continuous and positive this CLT follows easily from Dudley's (1978) CLT for the unweighted processes $\nu_n$. On the other hand, if $q(t) = 0$ for some $0 < t < 1$, or if $q(t)\to 0$ too fast as $t\to 0$ or 1, then $\nu_n/q\circ\bar{P}_{(n)}$ may be very large on some sets with high probability, precluding any CLT or LIL. Thus the cases of interest are those which satisfy $q(t) > 0$ for $t\in(0,1)$ and $q(t)\to 0$ as $t\to 0$ or 1, and the question is how fast $q(t)$ can approach 0 as $t\to 0$ and 1 if we wish the CLT or LIL to hold.

O'Reilly (1974) showed that for $P$ uniform on $[0,1]$, $\mathcal{C} := \{[0,x]: x\in[0,1]\}$, all $P_{(i)} = P$, and $q$ satisfying certain regularity assumptions, the processes $\nu_n/q\circ P$ satisfy a CLT if and only if

$$\int_0^1 t^{-1}\exp(-\varepsilon q^2(t)/t)\,dt < \infty \quad\text{for all } \varepsilon > 0$$

and

$$\int_0^1 t^{-1}\exp(-\varepsilon q^2(1-t)/t)\,dt < \infty \quad\text{for all } \varepsilon > 0.$$

James (1975) showed that in the same setting, $\nu_n/q\circ P$ satisfies a compact LIL if and only if

$$\int_0^1 \frac{dt}{q^2(t)\log\log(1/(t(1-t)))} < \infty.$$

In Chapters 5 and 6 we extend these results to more general classes $\mathcal{C}$ and laws $P_{(i)}$.

II. Background and Preliminaries

Throughout this thesis we shall work in and assume the following context: $(X,\mathcal{A})$ is a measurable space, $\mathcal{C}\subset\mathcal{A}$, and $\{P_{(i)}, i\ge 1\}$ is a sequence of laws (i.e. probability measures) on $(X,\mathcal{A})$. In the identically distributed case, i.e. when all $P_{(i)}$ are equal, we call this law $P$.
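The context just described, together with the normalized empirical process $\nu_n = n^{1/2}(P_n - \bar{P}_{(n)})$ from the introduction, can be made concrete on a finite sample space. The following is a minimal sketch under that assumption: each law $P_{(i)}$ is represented as a Python dict from points to probabilities, and all names (`empirical_measure`, `mean_law`, `nu_n`, `draw`) are hypothetical, chosen only for illustration.

```python
import math
import random

def empirical_measure(sample, C):
    """P_n(C): fraction of the sample points that fall in the set C."""
    return sum(1 for x in sample if x in C) / len(sample)

def mean_law(laws, C):
    """Average of P_(i)(C) over the given laws, i.e. the expected value of P_n(C)."""
    return sum(
        sum(p for x, p in law.items() if x in C) for law in laws
    ) / len(laws)

def nu_n(sample, laws, C):
    """nu_n(C) = n^(1/2) * (P_n(C) - averaged law of C), with n = len(sample)."""
    n = len(sample)
    return math.sqrt(n) * (empirical_measure(sample, C) - mean_law(laws, C))

def draw(laws, rng):
    """Draw X_i independently with law P_(i), one point per law."""
    return [
        rng.choices(list(law.keys()), weights=list(law.values()))[0]
        for law in laws
    ]

# Two distinct laws on the two-point space {0, 1}.
laws = [{0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}]
value = nu_n([0, 1], laws, {0})  # sqrt(2) * (0.5 - 0.35)
```

In the identically distributed case every entry of `laws` is the same dict, and `mean_law` reduces to $P(C)$.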
Let

$$P^{\infty} := \prod_{i=1}^{\infty} P_{(i)}$$

be the product measure on $(X^{\infty},\mathcal{A}^{\infty})$. The nth empirical measure $P_n$ is defined by

$$P_n(A) := n^{-1}\sum_{i=1}^{n}\delta_{X_i}(A), \qquad A\in\mathcal{A},$$

where $X_i$ is the ith coordinate function on $X^{\infty}$. Thus $P_n(A)$ is the fraction of the first $n$ coordinates which fall in $A$. $P_n$ is a function from $(X^{\infty},\mathcal{A}^{\infty})$ to the space $\ell^{\infty}(\mathcal{A})$ of all bounded functions on $\mathcal{A}$ with the sup norm, and if we put a law on $(X^{\infty},\mathcal{A}^{\infty})$, then $P_n$ and each $P_n(A)$ become r.v.'s. For example, on $(X^{\infty},\mathcal{A}^{\infty},P^{\infty})$ we can view $P_n$ as a measure obtained from independent random points $X_1,\ldots,X_n$ sampled according to laws $P_{(1)},\ldots,P_{(n)}$ respectively. In general if $H_{(1)}, H_{(2)},\ldots$ are any laws on $(X,\mathcal{A})$, $H := \prod_{i=1}^{\infty}H_{(i)}$, and $B\subset\ell^{\infty}(\mathcal{A})$, then $H[P_n\in B]$ means the probability that $P_n$ is in the set $B$ when each $X_i$ is an r.v. with law $H_{(i)}$. Note that when confusion is possible we always use subscripts in parentheses for non-random objects.

Next define, as in the introduction,

$$\bar{P}_{(n)} := n^{-1}\sum_{i=1}^{n}P_{(i)} \quad\text{and}\quad \nu_n := n^{1/2}(P_n - \bar{P}_{(n)}),$$

so $\bar{P}_{(n)}(A) = EP_n(A)$ for $A\in\mathcal{A}$. We wish to study the restriction of $\nu_n$ to classes $\mathcal{C}\subset\mathcal{A}$.

Given a finite subset $F$ of $X$ and a subset $E$ of $F$, we say $E$ is cut out by $\mathcal{C}$ from $F$ if $E = F\cap C$ for some $C\in\mathcal{C}$. Let $\Delta^{\mathcal{C}}(F)$ be the number of subsets of $F$ cut out by $\mathcal{C}$, and

$$m^{\mathcal{C}}(n) := \max\{\Delta^{\mathcal{C}}(F): F\subset X, \mathrm{card}(F) = n\}, \qquad n\ge 1.$$

Then $m^{\mathcal{C}}(n)\le 2^n$ for all $n$. We say $\mathcal{C}$ is a Vapnik-Červonenkis (or VC) class if

$$(2.1)\qquad m^{\mathcal{C}}(n) < 2^n \quad\text{for some } n\ge 1,$$

and define the Vapnik-Červonenkis index $V(\mathcal{C})$ of $\mathcal{C}$ to be the least $n\ge 1$ such that the inequality in (2.1) holds. Thus a VC class is one which does not contain enough sets to cut out all subsets of any n-element subset of $X$ when $n$ is large.

The concept of a VC class is quite a general one, as the following lemmas from Dudley (1978) show.

Lemma 2.1. Let $\mathcal{F}$ be a d-dimensional ($d < \infty$) vector space of real-valued functions on $X$, and $\mathcal{C} := \{\{x: f(x) > 0\}: f\in\mathcal{F}\}$. Then $\mathcal{C}$ is a VC class with $V(\mathcal{C}) = d+1$.

Lemma 2.2. Let $\mathcal{C}$ be a VC class and for all $k < \infty$ let $\mathcal{D}_k := \bigcup\{a(C_1,\ldots,C_k): C_i\in\mathcal{C}, i\le k\}$, where $a(C_1,\ldots,C_k)$ is the algebra generated by $C_1,\ldots,C_k$. Then $\mathcal{D}_k$ is a VC class.

Note that $\mathcal{D}_k$ consists of those sets which can be formed from at most $k$ sets in $\mathcal{C}$ by the operations of union, intersection, and complementation.

Perhaps the most significant fact about VC classes is the following, due to Vapnik and Červonenkis (1971).

Lemma 2.3. If $\mathcal{C}$ is a VC class then $m^{\mathcal{C}}(n)\le n^{V(\mathcal{C})}+1$ for all $n\ge 1$.

Thus for any class $\mathcal{C}\subset\mathcal{A}$, either $m^{\mathcal{C}}(n) = 2^n$ for all $n$ (if $\mathcal{C}$ is not a VC class) or $m^{\mathcal{C}}(n)$ grows only like a power of $n$ (if $\mathcal{C}$ is a VC class).

We use $A\setminus B$ to denote the set difference $A\cap B^c$. A special case of Lemma 2.2 is the following:

Lemma 2.4. Let $\mathcal{C}$ be a VC class, and let $\mathcal{C}_1$ be a subset of $\{A\setminus B: A,B\in\mathcal{C}\}$. Let $v_1$ be the least positive integer such that $(v_1^{V(\mathcal{C})}+1)^2 < 2^{v_1}$. Then $V(\mathcal{C}_1)\le v_1$.

Proof. This follows from Lemma 2.3 and the fact that $\Delta^{\mathcal{C}_1}(F)\le(\Delta^{\mathcal{C}}(F))^2$ for all finite $F\subset X$.

Using Lemmas 2.1 and 2.3 we can show the following are each VC classes:

(1) Any subset of a VC class
(2) A finite union of VC classes
(3) All rectangles in $\mathbb{R}^d$
(4) All ellipsoids in $\mathbb{R}^d$
(5) All half spaces in $\mathbb{R}^d$
(6) The class $\{(-\infty,x]: x\in\mathbb{R}^d\}$ of all lower rectangles in $\mathbb{R}^d$, where $(-\infty,x] = \prod_{i=1}^{d}(-\infty,x_i]$.

Given a law $H$ on $(X,\mathcal{A})$, define the pseudometric $d_H$ on $\mathcal{A}$ by $d_H(A,B) := H(A\,\Delta\,B)$. Then for $\varepsilon > 0$ and $\mathcal{C}\subset\mathcal{A}$ define

$$N(\varepsilon,\mathcal{C},H) := \min\{k: \text{there exist } C_1,\ldots,C_k\in\mathcal{C} \text{ such that } \min_{i\le k}d_H(C,C_i) < \varepsilon \text{ for all } C\in\mathcal{C}\}.$$

The most important property of VC classes for our purposes is the following modified version of Lemma 7.13 of Dudley (1978).

Lemma 2.5. There exists an increasing sequence of constants $\{A(v), v\ge 1\}$ such that if $\mathcal{C}$ is a VC class and $v\ge V(\mathcal{C})$ then

$$N(\varepsilon,\mathcal{C},H)\le A(v)\varepsilon^{-2v}$$

for all laws $H$ and $0 < \varepsilon\le 1$. Hence in particular $\mathcal{C}$ is totally bounded for $d_H$.

Note that the bound in Lemma 2.5 does not depend on the law $H$.

We turn our attention next to the convergence in law of $\{\nu_n, n\ge 1\}$ and related processes. Let $\mathcal{C}$ be a VC class. For $P$ a law on $(X,\mathcal{A})$, let $W_P$ be a mean-zero Gaussian process indexed by $\mathcal{A}$ with covariance $\sigma(A,B) := P(A\cap B)$, and let $G_P(C) := W_P(C) - P(C)W_P(X)$, $C\in\mathcal{C}$. Then $G_P$ has covariance $\rho(A,B) := P(A\cap B) - P(A)P(B)$. It follows from Theorem 1.1 of Dudley (1973) and Lemma 2.5 that we may take $G_P$ to have bounded uniformly $d_P$-continuous sample paths on $\mathcal{C}$. As discussed in the introduction, $G_P$ is the proper limit in law to seek for $\nu_n$ in the identically distributed case.

$\nu_n$ and $G_P$ are random elements of $\ell^{\infty}(\mathcal{C})$, so we look in a subspace $D$ of $\ell^{\infty}(\mathcal{C})$, large enough to contain $\nu_n$ and $G_P$ almost surely, for convergence in law of $\nu_n$ to $G_P$. Such a $D$ must in general be non-separable if $X$ is uncountable, for then the set $\{\delta_x - P: x\in X\}$ of possible values for $\nu_1$ may form an uncountable discrete set in $D$. It is consistent with standard set theory to assume that any law $\mu$ on the Borel sets of a non-separable metric space is concentrated on some separable subspace, i.e. $\mu(E) = 1$ for some separable $E$ (Marczewski and Sikorski (1948)), so the law $\mathcal{L}(\nu_n)$ is not in general defined on all Borel sets in $D$. Thus we must consider weak convergence on smaller σ-algebras. But the limit law $\mathcal{L}(G_P)$ is concentrated on the space $C_{b,u}(\mathcal{C},d_P)$ of bounded uniformly $d_P$-continuous functions on $\mathcal{C}$, which is separable since (by Lemma 2.5) $(\mathcal{C},d_P)$ is totally bounded. Hence we examine weak convergence to laws concentrated on separable subspaces.

Given a metric space $(S,d)$, a σ-algebra $\Sigma$ of subsets of $S$, and a law $\mu$ and sequence of laws $\{\mu_n, n\ge 1\}$ on $(S,\Sigma)$, we say $\{\mu_n\}$ converges weakly to $\mu$ on $\Sigma$ (or "in $(S,\Sigma)$") if $\int f\,d\mu_n\to\int f\,d\mu$ for all bounded continuous real-valued functions $f$ on $S$ which are measurable with respect to $\Sigma$. If $\{Y_n, n\ge 1\}$ and $Y$ are S-valued r.v.'s measurable with respect to $\Sigma$, we say $\{Y_n\}$ converges in law to $Y$ on $\Sigma$ (or "in $(S,\Sigma)$") if the laws $\{\mathcal{L}(Y_n)\}$ converge weakly to $\mathcal{L}(Y)$ on $\Sigma$.

Let $\mathcal{B}_B(S,d)$ denote the Borel σ-algebra of $(S,d)$ and $\mathcal{B}_b(S,d)$ the σ-algebra generated by all balls $\{x: d(x,x_0) < r\}$ ($x_0\in S$, $r > 0$) in $S$. (We write only $\mathcal{B}_B$ or $\mathcal{B}_b$ when no confusion is possible.) We have $\mathcal{B}_b\subset\mathcal{B}_B$. If $E$ is a separable subspace of $S$ then $\mathcal{B}_b$ and $\mathcal{B}_B$ have the same relativization to $E$, that is,

$$(2.2)\qquad \{A\cap E: A\in\mathcal{B}_b\} = \{A\cap E: A\in\mathcal{B}_B\}.$$

If $E$ is also closed then $E\in\mathcal{B}_b$; in fact, letting $T$ be a countable dense subset of $E$ and $B(x,r)$ the open ball of radius $r$ about $x\in S$, we have

$$E = \bigcap_{m=1}^{\infty}\bigcup_{t\in T}B(t,1/m).$$

Hence a law $\mu$ on $(S,\mathcal{B}_b)$ which is concentrated on a closed separable subspace $E$ of $S$ can be viewed as a law on $(S,\mathcal{B}_B)$, by setting $\mu(A) := \mu(A\cap E)$ for all $A\in\mathcal{B}_B$.

The following theorem is due to Wichura (1970). Another proof is in Fernandez (1974).

Theorem 2.6. Let $(S,d)$ be a metric space and let $\Sigma_n$, $n\ge 1$, be σ-algebras with $\mathcal{B}_b\subset\Sigma_n\subset\mathcal{B}_B$ for all $n$. Let $\mu_n$ be a law defined on $\Sigma_n$ for each $n$, and let $\mu$ be a law on $\mathcal{B}_b$ which is concentrated on some separable subspace of $S$. Suppose $\mu_n$ converges weakly to $\mu$ on $\mathcal{B}_b$. Then there exist r.v.'s $Y$ and $Y_n$, $n\ge 1$, all defined on the same probability space, such that $Y_n$ is measurable into $(S,\Sigma_n)$ with $\mathcal{L}(Y_n) = \mu_n$, $Y$ is measurable into $(S,\mathcal{B}_B)$ with $\mathcal{L}(Y) = \mu$, and $Y_n\to Y$ a.s.

Corollary 2.7. Let $(S,d)$ be a metric space and $\Sigma$ a σ-algebra with $\mathcal{B}_b\subset\Sigma\subset\mathcal{B}_B$. Let $\mu_n$, $n\ge 1$, and $\mu$ be laws on $(S,\Sigma)$. If $\mu_n\to\mu$ weakly on $\mathcal{B}_b$, then $\mu_n\to\mu$ weakly on $\Sigma$.

Proof. Let $Y$ and $Y_n$, $n\ge 1$, be as in Theorem 2.6, with $\Sigma_n = \Sigma$ for all $n$. If $f$ is a bounded continuous function on $S$ measurable with respect to $\Sigma$, then by bounded convergence, $\int f\,d\mu_n = Ef(Y_n)\to Ef(Y) = \int f\,d\mu$.

This corollary justifies restricting our attention to weak convergence on $\mathcal{B}_b$.

Given a set $A$ in a metric space $(S,d)$ and $\varepsilon > 0$, let $A^{\varepsilon}$ denote the set $\{x\in S: d(x,A) < \varepsilon\}$. If $\mu$ is a law on $(S,\mathcal{B}_b)$, a set $A\in\mathcal{B}_b$ is called a continuity set for $\mu$ if $\mu(\partial A) = 0$, where $\partial A$ is the boundary of $A$. The next theorem is a characterization of weak convergence.
Theorem 2.8. Let separable subspace of on (S, Z b) , with be a metric space, (S,d) y S, and i and concentrated on E a closed y n, n > 1, laws E. Then the following are equivalent: (i) (ii) in weakly on + yPn (A) y (A) + for all continuity sets for (iii) yn(A) -+ (2A) b' A E '8 which are v; for every A which is a finite union of open balls which are continuity sets for yq 21. Suppose first that (i) holds. Proof. y we may consider A E on f Since A n E Tl E) ,0). (1 - md(x,K is separable we have s > 0, so all f is e > 0. yi, and let f(x):= max by S Let such that y ((AOE)l/m\ (AfE)) < s. m > 1 Then we can choose Define (S,BB). to be a law on be a continuity set for yb As discussed above, for (A n E)6 E 1Bb 13b-measurable, and ii(A) < f fdy < y(A) + E. S (2.3) Let F:= A fl x E E so for each in the complement {x: f(x) Fc F. of {x n} C E. contained B(x,r(x)) Taking a 6 > 0. Let E C E n F = 0, Then 6}. there is a ball countable subcover we have some sequence < 1 - U B(x ,6r(x n)) n=1 such that N Choose for N y'( U B(x ,6r(x ))) n n n=l > 1 - 6, and define N U : B(xn,6r(x n)). U n=l g(x):= min (1, Define g = 1 on on min d(x,x n)/r(x n)). l<n<N continuous, bounded and and g F. Hence i S Then b-measurable, by g g < 6 is on U6 , 22. im sup n y (A n {x: f(x) < 1 - } < lim sup f gdyn n S gdyp f = S 6 + ya(U ) < 26. It follows that y n (A n {x: f(x) < 1 - E} 0 as n - o so using (2.3) we see that lim sup yn (A) = lim su n -pyn (A n' {x: f(x) > 1 - E n lim sup f fdan S n - fdy S 1 ((A Since the same holds for Ac + ). in place of A, and c is arbitrary, (ii) follows. Clearly (ii) implies prove (i). Let 1B b-measurable on Then T f (iii), so we assume (iii) and be continuous, bounded, and S. Let T:= {t E R: y({x: f(x) =tl) > 0}. is at most countable. Fix t 9 T, and let 23. > t}. A:= {x: f(x) an r(x) > 0 Then for each B(x,r(x)) C A such that = 0. yt(aB(x,r(x))) x E A n there is E and Taking a countable subcover we have 00 that A n E C U B(x for some sequence {xn I C AnE. 
,r(xn)) n=l Fix E > 0 k < m0 such that and choose k y(U B(xi ,r(x - > y (A n E) C. (iii) By we have i=l k lim inf n(A) n > lim inf un ( U n i=1 ,r(x B(x i))) k = y(U B(x.,r(x ))) i=1 > y1(A) Since the same holds for (2.4) un ({x: f(x) > tI) . - A = -+ {x: ({x: f(x) f(x) < t}, > t}) From (2.4) it i-s easy to prove that and (i) follows. The functions we have for all t 9 T. f fdyn+ ffdy, S S I-I 1IC($): the coordinate functions. $P(C) , C E C , If v indexed by C , then the laws on are called is a stochastic process Id of o the r.v.'Is 24. (v(Ci),...,v(Cd)), where d > 1 and C1 ,...,Cd are called the finite dimensional distributions of A stochastic process v v. indexed by a topological space is called sample-continuous if some version of v has continuous sample paths, i.e. if there is a stochastic process with continuous sample paths and the same finitedimensional distributions as a law on B (X, CI), v. 6, e > 0 For H(AAB) Pr H let (H):= {f E k (t ): There exist For any law and let < 6, Pr* Jf(A) and - Pr* f(B)I A, B E C with > 6}. denote the corresponding outer and inner measures respectively. The following is a variant of Theorem 1.2 of Dudley (1978). Theorem 2.9. The proof is essentially the same. Let c Es C and let n , n > 1, be Z (C )-valued r.v.'s on a probability space with $ (C) a law on containing into measurable for each (XCI ). Let D Cb,u(C ,d ) (D,1Zb). Gaussian process Then G $n C E C . (Q,8 ,Pr) Let $n converges in law on with sample paths in be 9oo (C ) be a subspace of and suppose each P is measurable 1 b to a Cb,u(C ,dp) 25. if (2.5) e > 0 for every n > n0 that C (2.6) is there exist implies 6 > 0 and Pr*[$ n e B6,E(P)] totally bounded for n0 such < dp and the finite dimensional distributions of (2.7) in converge to those of G, ~_ and only if (2.5) and (2.7) hold. 
Using his similar theorem, Dudley (1978) showed that for a particular choice of the space in law to G on tb(Drd) D, vn converges in the identically dis- tributed case, under certain measurability assumptions. We refer the reader to Dudley's paper for details. Measurability assumptions are in general needed because, as discussed above, into (Z(C ), BB vn may not be measurable ' and because such r.v.'s as supCIvn(C)1 may not be measurable. Hence we define here a measura- bility condition to suit our purposes. ' 1 2n P := n n ni l 6-X, ,F ni=n+1 i v := n 1'/2 (P n n - P n ), Let 26. and let P n TI P :) (i) (i (n) be the product measure on z CL c n). (Xn We say a class is n-deviation-measurable for (or just "for P" if all sup in1 /2(P (2.8) > l} (D) - bP (D))f, (n) sup fv0(D)|, n Z_ |P2n (D) - bP(n) (D)l (2.10) sup are each 2n)-completion measurable on b = 1. when ,i = some P) if P n (2.9) {P We say constants for for all 2n) is n-deviation measurable with ,i)' i > 11 {P b > 0. Z (X2 n if this measurability holds The proof of the following lemma is obvious. Lemma 2.10. of k(O ) for each Let Z such that D E. C CA f(D) and let and g(IY) be random elements are measurable Suppose that with probability 1 there exists a countable subset D E .D f,g there exists a sequence of D {Dn} such that for each in 30 with 27. f(Dn) - f(D) and g(D n) is a measurable r.v. are satisfied for bP (n)) (P2n (f,g) = ), (P ,bT n {yn I Given a sequence (P (n) Z b > 0, then for all C({y n) sup wn '... P ), (2n log log n) 1/2 Y and |~j {y }, i.e. denote the cluster set of B g(D) in a topological space, {yn}. Let B Yf-|B' a-B-valted rv., Y, Sn independent copies of compact LIL in nn , i > l}. {P be a Banach space with norm 2 - is n-deviation the set of all limits of subsequences of YlY f(D) In particular, if these hypotheses measurable with constants for we let Then g(D). + n=1i, and is said to satisfy the if there is a compact set K in B such that lim n J|Sn /wn - KI 1B = 0 a.s. 
and C({Sn/wn}) = K a.s. A discussion of the nature of K Kuelbs, and Zinn (1980). C B[0,1] Let of continuous functions from fli C: n sup 0 <t<lllf(t)I|B. CB[0,1] by can be found in Goodman, [0,1] to denote the space B with norm Define random elements 28, Cnt) = Snt Cn with t = k/n, k = 0,1,...,n if linear in between points satisfy the functional LIL in set K in CB[0,1] and C({C n/w}) = K a.s. B such that lim n k/n. Y is said to if there is a compact ||Cn/Wn - By considering only KI1C = 0 a.s. n (1) we see that the functional LIL implies the compact LIL. Y is said to satisfy the CLT in converges weakly on l (B,-Z ) if (Sn/n 1 /2 to a Gaussian law. When no confusion is possible we may abuse notation {S n/n /2 and say satisfies the compact or functional Y LIL or the CLT when does. A bounded LIL is any result which says a sequence {Yn: n > 11 satisfies of B-valued r.v.'s with partial sums |Sn lim sup n 11/w n < Sn C' The compact LIL implies a bounded LIL, since the compact limit set K is bounded. shown (see Pisier (1975)) that if Y In fact, it may be B is separable and satisfies the compact LIL in B, then sup {I|xI|B: =sup {var (f(Y))l/2 : fE B*, f(x) < lfor allH x B 29. where B* is the dual of B. In particular, we have the following. Lemma 2.11. Y C If C Q is totally bounded and an r.v. satisfies the compact LIL in lim sup ||Sn||/wn Cb,u C = sup {var d ), then (Y(C)) 1/2 C a.s. n I -I' The next lemma is from Kuelbs and Le Page (1973). Lemma 2.12. A Gaussian r.v. taking values in a separable Banach space satisfies the CLT and the compact and functional LIL's. ID This lemma makes the following invariance principle of Dudley and Philipp (1982) quite useful. Theorem 2.13. Let T C L2 (X,Q ,P) be a class of functions such that (2.12) (2.13) ±E is totally bounded in for every that for c > 0 L2, and there exists 6 > 0 and n 0 such n > n' P*[sup {ff(f-g)dvn: fig E Then there exists a sequence I f(f-g) {Y., j > 11 2 dP < }> ] <E. of independent 30. 
identically distributed Gaussian processes indexed by with sample functions a.s. uniformly continuous on with respect to the (2.14) L2 EY (f) = 0 (2.15) , t norm, such that for all EY 1 (fy 1 (g) = ffgdP - f E T f,g E T, ffdP - fgdP for all and k n -1/2 max sup j f (X.) k<n fET j=l (2.16) in probability, and in L - f fdP - Y .(f) for all + 0 as n + p < 2. If in addition there exists a measurable function F with If! for all <F (2.17) fET and fF dP < co, then the convergence in (2.16) can be obtained in 2 also. If (2.18) then the fF /log Y. J log F dP < o0, can be chosen so that, instead of (2,16), n (2.19) f (X) -ffdP - Y.(f) I = 6((n log log n) 1/2) a.s. sup fET j=l 31. The r.v. in (2.16) may not be measurable in general. of r.v.'s all Therefore we take convergence in probability Zn to Z to mean Pr*[IZn-ZI e > 0, and convergence in for some measurable r.v. Lebesgue measure. to mean -+ EW n 0 - for 0 Wn > IZn-Z|, The Gaussian processes defined on the product of Lp > e] Yi (XC, in Theorem 2.13 are CL ,]P) and [0,1] with 32. III. Exponential Bounds for the Suprema of Normalized Empirical Processes C C For 0 n > 1 and a (S,n):= sup va =sup For x = - (vn(C)) 1n P ()(C) l''''' o ti nb (1 - P ()(C)). define (x 1 ,x 2 ,...) E X, x (n) define chater Our main theorem of this chapter will be the following. Theorem 3.1. Let n > 1,and let CC( k be a VC class such that (3.1) t is k-deviation measurable for i < n and P for all k > 1, and sup Iv (C) (3.2) is measurable. Cn Let v > V(C), 0 < e < 1, and constants that if M = M 0 (E,ca,v), a > 0. Then there exist n 0 = n0 (E,,a,v), K = K(v) such 33. 3n1/2ae/8 if (3.3) n > nO' a > a(C ,n), and M M < 00 if then ]P[sup IVn(C)j Given 6 > 0 (3.4) a < a0 Since > MI < K exp there exists implies (-(1-E)M /2a). a 0 = a0 (,v,6) such that II M (E,a,v) < 6. a(C ,n) < 1/4 always, this theorem is never vacuous, in that (3.3) is always satisfied for and M,n a = 1/4 large. 
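The quantity $\alpha(\mathcal{C},n) = \sup_{\mathcal{C}} n^{-1}\sum_{i\le n}P_{(i)}(C)(1-P_{(i)}(C))$ and the exponential factor appearing in Theorem 3.1 are easy to evaluate for a finite class. The following is a minimal sketch, assuming each set $C$ is described only by the list of its probabilities $P_{(1)}(C),\ldots,P_{(n)}(C)$; the function names are hypothetical, and `K = 1` is a placeholder, since the constant $K(v)$ of the theorem is not given explicitly.

```python
import math

def alpha_class(probs_by_set):
    """alpha(C, n): max over sets C of (1/n) * sum_i P_(i)(C) * (1 - P_(i)(C)).

    probs_by_set is a list with one entry per set C; each entry is the
    list [P_(1)(C), ..., P_(n)(C)].
    """
    return max(
        sum(p * (1.0 - p) for p in ps) / len(ps) for ps in probs_by_set
    )

def exponential_factor(M, alpha, eps, K=1.0):
    """K * exp(-(1 - eps) * M^2 / (2 * alpha)); K = 1 is a placeholder."""
    return K * math.exp(-(1.0 - eps) * M * M / (2.0 * alpha))

# Each p * (1 - p) is at most 1/4, so alpha_class never exceeds 1/4.
a = alpha_class([[0.5, 0.5], [0.1, 0.9]])  # first set gives 0.25, second 0.09
```

A class of "small" sets has all its probabilities near 0, so `alpha_class` is small and the exponent $-(1-\varepsilon)M^2/2\alpha$ is correspondingly large in magnitude.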
The property (3.4) enables us to apply the theorem to classes of "small" sets, which is useful in proving statements like (2.5). The following lemma uses ideas from Pollard (1981). In its proof, and several that follow, we use iterated integrals in place of conditional expectations to avoid measurability difficulties. Lemma 3.2. Let H( 1 ) ,.,H(m) M H:= H i=1 be laws on 2 H , and EI:= H 2m on (X m a < (X,CsJ, 2m m). Let a >1 -4 34. Hm m Z m m m I lxi y ml/2 (H ~ m), cO(1, 0, and y> 6 .FH 1' .2m m i=m+ 1X1 (m) : 2m i=lH (i) m m := m1/2 (Hm - H'). and 0 > sup 1= zm - i (D)(1 H Let H (D)). - (2) 1/21 Then (3.5) H[sup Iym (D) I > y] < 2H[sup > y D) IP0 pm(D| whenever both events in (3.5) are H-completion measurable. Proof. y' := m Let 2 (H' - if and let O the sup norm for functions on sup E(y' (D) s m z m l X x 1 > | 0 > y - > 1/2. (1 - H {x x ()E (2 0 )1/2 dH(x (D)) : ,...,x2m) (26) 1/2 1dH (xm+ 1' XM1 >_ 1 - (D) H then D 0 E Z, for some E(P' (DO 20 im0 2 denote We have . so by Chebyshev's inequality, if 11-1' '''' x2m) IUm(D 0 ) I > y} 35. Hence ]Hl (26) 1/2 'PO 1 =f f m X e fl [II I m I>y = 1/2 dH(x m+1 '''''X ] m I >y - (20) H[I IyPmII > 1/2 dH(x 1 ,..,xm) ..,xM) ID y]t For integers Lemma 3.3. 2 m)dH(x 1/2 2 -~ < 3 r > 0 -2-r/2 j=r+1 so if f (x): Let Proof . r Then f'(x) < 0 then > l 1/2.-2~ < f f (x)dx j=r+1 = 2 x r f 2u22-u du rl1/2 rl/2 2-r log 2 r 1/2 2-r <log 2 + f 1 log 2f 2-u2 du r1/ 00 2u log 2 + f + 1/2 2r/2 (log 2) 2 2 2- du for x > 1, 36. -r = rl/2 log 2 2r/2 (log 2) 2 3rl/2 2-r The case r = 0 follows easily from the case r = 1. The next lemma also uses ideas from Pollard We define K 1 (v):= Lemma 3.4. in Let Lemma 3.2, 3A(V) 2 , 2A(v) + H ,...,H Z and let E( (3.6) 2 r:= [128 (b+S)V(Z )log 2 where (3.7) u > 0, 1/2 (213 be as Let b > 0, and denotes the integer part. [-] 0 be a VC class. CO (1981). > 1. ,M H, and , H S > sup z (D), v I-I 2 -r/2 Suppose < u/2. Then JR [sup |p ((D) I > u] <J [sup H (D) - ( D) I > b] (3.8) + K 1 (V(Z ))exp (-u2/32 (b+ ) ) 37. 
provided the events in (3.8) are ]-completion measurable. Proof. Let - 1 denote the sup norm over Z Z:= {-1,1}, let E (z):= z , z = G:= 1 and let Z, and let be the power set of F({-1}):= F({1}):= 1/2. Let (z1 1 z 2 ,...) E z 2m (X2m x Zm on Fm x Let x Define m m/2 ~0 Pm : W:= {x (2m): I x on e 6 i~Xl Zm Xn+i H(m) II - H2m > b}C X 2m Then (3.9) H[I jy 0 > u] = G[Ill IM I > u]. (This is clear conditional on any fixed values of the We have G[||IP- 1 > u] f =f X2m Zm 1 [I 00 |PM| dFmdH I>u] (3.10) < 1(W) + sup (2m) W FmI I I I > u]. E .) 38. Fix x (2m) 9 W. (3.11) Let Then from the definition of sup H2m(D) < b + S. m := N(2 exist sets J ,Z D. '... j > 0 Then for each ,H 2 m) E Djm. such that there < 2 -2j min H2m (DAD.) J for every Define j TI , > 0, by 2 J:= 480V(Z )(log By (3.7) and Lemma 3.3 we have CO (3.12) Ti. < u/2. 3 L j=r+l For D ., i < D E0 mi, su and ch that large enough so 2 -2j D E Z , so ym (D) all ~0 1pm(D) Also ~0 j ~0 = ym(D r(D)) > 0 let H 2m(DAD < 1/2m, ~0 = y 3 (D)) then (D (D)). -2j < 2 0 m( D [p ~ j=r+1 . If (D)) H 2m(DAD j = 0 is for It follows that 00 + be one of the D.(D) (D)) ~0 -y (D - (D))]. 39. H 2m(D (D) AD j1(D) < H 2m(DAD ) (D)) + H (DAD 1 (D) ) < 2-2j + 2-2(j-1) = 5 -2~' Letting it follows using (3.12) y J:= Fm [sup I that > u] (D) [sup Ipm (Dr (D)) I > u/2] < F + j=r+1 p m(D Fm[sup z (D)) -/y ( )} ) (D > 1j 1] (3.13) < mr max {Fm[ yD - + P Ir mm where so h r max {Fm[I ym D 3=r+1 i < m Now u/2]: i < mr ~0 ym (D ) := lD , k < m ((D (j -l ,1 H 2m (D j-AD (j-1)-k) =M -1/2 Swm(D (jm-l)k D(j-l)k) (X ) - ) -D(lD )k) < > 1nj] Y.}.- m h , - D(j-l)k)(Xm+ZdI 40. m Z=1 2 2m hZ < 2 1 (lD %= jiL D( )k2 X < 4mH 2m (D iiAD (j-l)k) < 20m2-2j if H 2m(D. AD ) < Y.. . Hence by Theroem 2 of Hoeffding (1963), the terms on the right side of (3.13) satisfy F tlI0m(D*~ mm 0 (D m ) I > nil 2 mj < 2 exp (-2mn /4 hZ) Z=1 (3.14) < 2 exp (-q /(40-2-2j = 2 exp (-12jV(d3)log 2). 
Similarly, gz:= 1 D k=1 (Xk ym (Dri) - m-1 / 2 1D eg , where (Xm+z), so by (3.11), g < 2 mH 2 m(Dri) < 2m(b+S). 211 2m, gz - Hence applying Theorem 2 of Hoeffding (1963) again, 41. m y0(Dr) F[ > u/21 < 2 exp (-2mu2 /16 2 1gz) Ao=1. (3.15) < 2 exp Now by Lemma 2.5 we have (3.13), (3.14), F m [sup < m n and (3.15) | (D) I 2A(V(Z ))e <_ A (V( , so by > u] 4rV(D)log 2 -u 2/16(b+S) -. 2A(V(Z )) + 5 ) )2 4j( and the definition of 00 (3.16) (-U2/16 (b+S) ) . 2 8jV() log 2 -l12jV(Z) log 2 j=r+1 < 2A(V(,Z ) )e-u2/32(b+ ) + 2A(V(Z )) 2e-4(r+1)V(Dlog 2 /l < [2A(V(JD )) + 3A( V(Z )) The lemma now follows from (3.9), Let and define N e-4V (,Z)log 2) e-u 2/32(b+S) (3.10), and (3.16). I-I be a positive integer, to be specified later, 42. Y:= {l,...,N} 1b Y:= Q(fi}):= 1/N i E Y for all 0:= Q , a measure on for y Y) = (y ,y 2 ,...) (x,y) E X we get probability measures x Y defined as follows: pl) := 1 n n 1 (2) 1 n n n i=l X X jN 1 j,N ' P (j) : n = 1 nN n 3'j,N nl/2 (j) n -P n 1/2 (1) v~0 n:= n (P n (n 2(N) x ' N i i= (j -)N+1 Note that _ E Y (i-1)N + yN+i (y) For each (YO, 13 (i-1)N + yi, aii) (y): T (i) Y the power set of 0 .- nN P(2) - Pnn N (i) ), Define j = 1,2 43. Pr (N) (N) Pr:= P x Q, P1 n and on (Xw x Y, G0 x 'o) x Under the law Pn a measure P (2) n Pr (N)on x Y , we can think of as each being obtained by sampling n groups of N points each from according to X P (i) X, with the ith group sampled then choosing one point randomly from each of the n groups. For fixed x (nN) Pn) is a version of the empirical n measure which has n sample points chosen according to Pl,n'''' 2,n The latter measures are of respectively. course non-random for fixed normalized empirical process is independent copy (for fixed The corresponding x(nN). . v x (nN)) P (2) of is an (l) n The next lemma will in a sense make it sufficient to prove Theorem 3.1 with P(i) replaced by Pi,N. 
After such replacement we can take advantage of the fact that P is bounded below in the sense that i,N if P , (C) > 1/N - N(C) > 0. Lemma 3.5. Let n > 1 and let C C0 satisfying the measurability assumptions Let P. 0 < 6<lit c> 0, v> V(Cj,, and be a VC class (3.1) and (3.2). M <n 1/2 .Suppose 44. (3.17) N > max (3.18) n > 2 16 217n log n e2 a c M 2 18V and let (3.19) { W:= j=1 I> (C) sup IP jN(C) - P c (n EM/16n 1 /2 1. Then :P [sup IVn (C) c- I > M] < 2K 1 (v) exp (-M /2a) (3.20) + sup (nN) Proof. x (nN) Let I I- II [sup Iv C (C) | denote the sup norm over (1 - )M] If C % W then Iv ( - nl/2 ( (1) _ T(n)) = = Iin in- 1 /2 (P nN 1/ 2 1 j=1 < cM/16. It > follows that (n) - P (i)) ~ 45. =f f (3.21) x( 2 |nl/ JP II I on| I > M] = Pr (N) (1) n _ (n) ) I I > M] dQd (N) 1 In1/2 (P(1) n Y00 <P(N)(W + (n) ) vO [ sup >M] )I (1 > - $M (nN) Now n (3. 22) PN JP (N) (W) (3) j=1 [I IN 1 / we wish to bound each of these := C Lemmas 3.2 and 3.4 to y:= EMNl/ 2 /16nl H P ) P N(j [I 2, (, n P(j))II>EMN /2/16n 1/2 j terms by applying , m:= N, 6:= 1/2, /32nl/2 , b:= 1, S:= 1, and u:= eMN for all 2 i < N. From Lemma 3.2 and (3.17) we get (3.23) ( ) 1,N , - ()' |N 1/2~ 1II < 2P (IN1/2 [ < 2P 2N[IN Nl/2 Let r M l /2 /16nl1/2 1,N -P2,N) 1,PN -P2 ,N) M l /2 /16nl1/2 > EMNl1/2 /32nl1/2 be as in (3.6); then by (3.17) and (3.18), 1 1;N 46. 2 r> [256v log 2 vn log 2 2 (3.24) > - log n _- 2v log 2 > 8. Combining (3.24) with (3.17) (213 V( and (3.18) we get ))1/22-r/2 < (25 v) 1 / 2 < u/2, so (3.7) holds and we may apply Lemma 3.4. Noting that the event on the right side of (3.8) is empty, we get P [N |Nl1/2 1,N - 2,N) > EMNl/2/32nl/2 (3.25) < K1 (v)exp (-c2M2N/2 16n). By (3.17), log n - c2M2N < 2 16 M2N 2 17 (3.26) M /2a. Combining (3.22), (3.23), (3.25), and (3.26) gives 47. 2 (N) (W) < 2nK 1 (v) exp (-2M 2N/2 16n) (-M /2a) . < 2K 1 (v) exp This and (3.21) prove the lemma. C For C., (y, define Lemma 3.6. o < C cx, y > 0, ,H):= Let a. 
'C C a > a(C ,n), s < 1, H and {A\B: II E C AB (X,(a), a law on , H(A\B) be a VC class, v > V(t ) , and yn < Y}. n > 1, and M < n1/ 2 .if a < 1/4 we suppose that M < 3n 1/2 (3.27) Let H( 1 ) ... ,H(n),Hf , be as in Lemma 3.2, and suppose that (3.28) Let sup IH (C) - P (i) (C) [sup C yn- (C) |> <M/16n1 /2 for all i < n. Then y:= exp (-CM2/16av). H I (1 - ) M] (3.29) < 2A(v) exp 2 * (-(1- )M /2a)+H C1 sup _ (YC ,H(n) jy (C) |>3 EM/32] . 48. Proof . Let m:= N(y , C , . . ., Cm E C C E C such that yp( If C)j hence or Ci \C and C\Ci |yn (1 - such that Ci) I [sup (3.30) Iyn( C) (1 > < m max H(J-yn(Ci) i<m sup (yC,H(n) Fix f (t) C E C := t (1-t) , t [ 0,1] ; (1 - | yn (C) |I ) , and let E (3.31) < -nl > 3 EM/32] . a2:= varH (n (C)) . then f(H (i) (C)) f (P (C)) < ac(C ,n) > 3eM/32 })M] f(t+h) < f(t) using (3.28), a2 =n n i i) M - > * + H[ |pn or )M - ) ' and It follows that (Ci\C)I > 3eM/32. H (1 ,(n) CE C (CAC.) < y, H C (yt > < y for all for some )M are in I yn( either- > if (n)(CAC) 1 min i<m i < m then there exists so ); then there exist ,(n) I (i) + eM/l6n 1 /2 + EM/16n 1 /2 Define + jhj , so 49. If a < 1/4 then by (3.27) , EM/16n/2 < 3ae2 /128 If a > 1/4 < as/4. M < n 1 /2 then since EM/16n 1 / 2 < e/16 < as/4. Hence by (3.31) , 2< (3.32) Let M:= (1 - (1 + M/nl )M, A: 2 a2 , and A2 : M/nl > MI < 2 1/2 MX exp (-n /2(1 (l + and (3.32), By Bernstein's inequality (see Hoeffding (1963)) H[Ipn (C) 2 + 1 X )) (3.33) < 2 exp If a < 1/4, then H [Iypn(C) (3.34) < 2 exp < 2 exp (-nl 2M 2/2(1 so < 3s/8 2 < M/n1/2a > MI (-(1 - < 2 ex )2M (-(1 - )M p (-nl/ 2 /2a(1 /2a) + 2M + 2 (3.33) gives 2 /2(1 ) (1 + + 6) )) 50. If a > 1/4 then Theorem 1 of Hoeffding (1963) gives > M] < 2 exp (-2M H[Ilyn(C)I (3.35) < 2 exp (-(l 2 7 -e)M /2a) By Lemma 2.5, (3.36) m < A(v)y-2v = A(v)exp Combining (3.30), v > 1 For all Define also v. Let Lemma 3.7. 2 k. Then be a VC class, a > 0. 
Let N-l < max 216 22 n 217 E: a n log n E: M Then there exist constants MO = M 0 (E,a,v), n 1 = n 1 (Ea,v) such that if n > n and M > Mo, then for + 8A(v 1 (v)). v > V(C ), y:= exp and suppose (3.37) k > l v 1 (v) > v (1) = 6 K2 (v):= 2K 1 (v 1 (v)) ZCQ . 0 < s < 1, M > 0, and be the least integer v 1 (v) (kV + 1)2 < such that (3.35), and (3.36) gives (3.29). (3.34), let (cM /8a) (-EM /16av) ~ 51. sup 42[ > 3EM/32] (C) Iv < K 2 (v) exp (-M /2a). 1 (YIPnN Let Proof. let | v := v (v) l denote the sup norm over 1-1 Z apply Lemmas 3.2 and 3.4 to Pi,N m:= n, H u:= eM/16. ,nN), and We wish to y:= 3EM/32, e:= y, , , := i, for all := ac2/213, b:= and Note that measurability is not a problem because all functions of Let , := and 'Y-measurable. are or v( be as in (3.6); then using Lemma 2.4, r u2 r = > Let [1 28(b+S)V ( M2 [32a v M 0 (c,a,v) )log 2 log 2 be the least real number such that the following all hold for all M > M0 (-EM 2 /32av) < eM/32 (3.38) 2 1 / 2 exp (3.39) (213v )l/22-r/2 < cM/32 (3.40) exp (-eM /16av) < ac 2 /213 (3.41) 4av 1 < M 2 . Let 49 46 -13/3 6-32/3 n(s,a,v):= max (2~ ,c,24 6 vaJ e 33 ,2 vct -2 -5 s ). 52. Fix M > M0 and n > n1 . Now n sup n 1=1 C Pi,N (C)( -P < sup PnN (C) C1 i,N(C)) so the hypotheses of Lemma 3.2 are satisfied; using (3.38), (3.5) becomes Q[ I |V (1) (3.42) || > 3sM/32] By (3.40) we have sup < 6, while (3.7) (3,8)' becomes >eM/16] O[ IIv Y < PnN (C) Thus the hypotheses cf Lemma 3.4 follows from (3.39) . are satisfied, and > CM/16] . 20 [Iv | < <_ + K > b] ( (1) +P (2)-_PnNII ([ (v)exp (-u2/32(b+ )) (3.43) < 20[ IP (l-Pn 1 + K 1 (v 1 )exp Let C, . . . , Ck for some also. > b] (-M /2a) . k:= N(l/nN,Z, ,1PnN) ; then there exist E C, i < k. Thus such that But then C E C 1) P nN (CCi) implies PnN (CAC ) <l/nN nN 53. [HiP n(1 (3.44) = 1[max i<k > b] -P nN H P (C ) n PnN(C i) < k max I > b] P nN(C.)| - > b]. i<k Define log O < y < ((1-y)/yp)/(1-2i), g (y):= 1 Then g(y) > 2 (see Hoeffding g () > 2 > log g () > log (1963)) < 1. 
so -2 1> 1log 1 2 (3.45) ( P- - 1) > (log - 1 > log , and then also g~l~ >~ 1 2 11 log _ > $ >1 12 log _ (3.46) 2p (1-y') By Theoren 1 of Hoeffding - 2I - (1963), p - (3.45), and (3.46), < e-2 54. p[P() < (3.47) (C) n - 2 > b] PnN(C ) nN2 exp(-nb 2g(P nN(C.))) +exp(-nb 2g(l <2exp1(< 2 x n( b 2 - P (C.))) 1 log -) = 2 exp(-nae5 M /2 33v). Case 1: M2/a < 2 log n, so the second term on the Now right side of (3.37) is the larger. (log n)/n < (log 2 49)/249 Lemma 2.5, so using (3.37), and (3.41) we get log (k/A(v )) < 2v log (nN) < 2v log (2 18n 2(log n)/E2 M 2 < 2v log (n3/230 2 Case 2: right side of have, n > 2 9, since M 2/a > 2 log n, so the first term on the Again since (3.37) is larger. using (3.37) : log ). (k/A(v )) < 2v log (nN) 2 < 2v log (2 < 2v log (n3 /2 30C 2 a) n2 n > 2 , we 55. Now (in both cases) and t-/6log t n 3/230 2 a > 268 is a decreasing function for so using the definition of t > e6 n, and (3.41) we get log (n3 /2 3 0 E2) log (k/A(v1 )) < 2v < 2v1 (2-68/668 log 2)n 1 < n 1 /2v 29( (3.48) n > n , since 2 < nac 5 2 2 /2 5 2 )-1 / 6 (n1 / 2 al3/6 1/6 1 6/ 3 2- 2 3 v~) v 1 /2 3 2 v < nEs 5 M 2 /2 3 4v. Combining (3.42), (3.44), (3.43), (3.47), and (3.48) now gives I,[)(' I I > 3 EM/3 2] < 8A (v1 )exp(-nts 5M 2/2 + 2K 1 34 v) (v 1 )exp (-M /2a) < K2 (v)exp(-M /2a) , the last inequality following from the definition of n1 . j_ We are now ready to assemble Lemmas 3.5-3.7 into a proof of our main theorem. 56. < nl/ 2 IVn(C)[ Let N (3.17); then be the least integer satisfying N (3.4) holds. as in Lemmas 3.6 and 3.7. M Let (3.37). satisfies as in Lemma 3.7, and let to see that M < n 1 / 2 , since We may assume Proof of Theorem 3.1. and n 0 := max(n,2 18v). Let n1 be It is easy be as in (3.19), and W y We first apply Lemma 3.5, then apply Lemma 3.6 to the second term on the right side of (3.20), with Pi,N H the definition of for all x (nN) W, and for fixed obvious correspondence between the (3.28) follows from i; there is an of Lemma 3.6 and H V. 
Finally we apply Lemma 3.7 to the second term on the right side of (3.29). All of this gives
$$\mathbb{P}[\sup_{\mathcal{C}} |\nu_n(C)| > M] \le 2K_1(v) \exp(-M^2/2\alpha) + \sup_{x^{(nN)} \notin W} \mathbb{Q}[\sup_{\mathcal{C}} |\nu_n^{(1)}(C)| > (1 - \epsilon/16)M]$$
$$\le (2K_1(v) + 2A(v)) \exp(-(1-\epsilon)M^2/2\alpha) + \sup_{x^{(nN)} \notin W} \mathbb{Q}[\sup_{\mathcal{C}_1(\gamma, \mathcal{C}, P_{nN})} |\nu_n^{(1)}(C)| > 3\epsilon M/32]$$
$$\le (2K_1(v) + 2A(v) + K_2(v)) \exp(-(1-\epsilon)M^2/2\alpha).$$

From the proof above we see that if $\alpha \ge \alpha(\mathcal{C}, n)$, in place of (3.3) it is sufficient to verify that (3.38)-(3.41) all hold and that
$$n \ge \max(2^{18} v,\ 2^{49},\ 2^{46} v \alpha^{-13/3} \epsilon^{-32/3},\ 2^{33} v \alpha^{-2} \epsilon^{-5}). \tag{3.49}$$
To prove limit theorems for $\nu_n$ we may wish to apply Theorem 3.1 with $\alpha = \alpha_n := \alpha(\mathcal{C}, n) \to 0$ as $n \to \infty$. To satisfy (3.49) in this situation requires that $n^{-3/13} = O(\alpha_n)$. The next two theorems, however, are variants of Theorem 3.1 which are valid under conditions which may be weaker than (3.49); more about this later. The proofs are also simpler.

Given $\mathcal{D} \subset \mathcal{A}$ define
$$\delta(\mathcal{D}, n) := \sup_{\mathcal{D}} \bar P^{(n)}(D).$$

Theorem 3.8. Let $n \ge 1$ and $\mathcal{D}$ be a VC class which is $n$-deviation-measurable for $\{P^{(i)}, i \ge 1\}$. Let $v \ge V(\mathcal{D})$, $\delta \ge \delta(\mathcal{D}, n)$, and $u > 0$. Define
$$r^{(1)} := \left[\frac{u^2}{512 v \delta \log 2}\right], \qquad r^{(2)} := \left[\frac{u n^{1/2}}{2048 v \log 2}\right], \tag{3.50}$$
and suppose
$$(2^{13} v)^{1/2}\, 2^{-r^{(i)}/2} \le u/2, \quad i = 1, 2, \tag{3.51}$$
$$u^2/\delta \ge 36 \log 2, \tag{3.52}$$
and
$$u n^{1/2} \ge 512 \log 2. \tag{3.53}$$
Then
$$\mathbb{P}[\sup_{\mathcal{D}} |\nu_n(D)| > u] \le 4K_1(v) \exp(-u^2/256\delta) + 4K_1(v) \exp(-n^{1/2} u/1024). \tag{3.54}$$

Proof. We proceed by iterating Lemmas 3.2 and 3.4. Let $\|\cdot\|$ denote the sup norm for functions on $\mathcal{D}$. Let
$$\eta_j := 4^j u, \quad u_j := \eta_j/2^{1/2} = 4^j u/2^{1/2}, \quad b_j := \eta_{j+1}/n^{1/2}, \quad r_j := [u_j^2/128 v (b_j + \delta) \log 2]$$
for $j \ge 0$. We first apply Lemma 3.2 with $H^{(i)} := P^{(i)}$ and observe that
$$\eta_j - (2\delta)^{1/2} \ge \eta_j (1 - (18 \log 2)^{-1/2}) \ge \eta_j / 2^{1/2} = u_j$$
by (3.52), since $\eta_j \ge u$. We then apply Lemma 3.4, noting that (3.7) with $r = r_j$ follows from (3.51), since $r_j \ge r_0 \ge \min(r^{(1)}, r^{(2)})$. We get
$$\mathbb{P}[\|\nu_n\| > \eta_j] \le 2\,\mathbb{P}^{(n,2)}[\|\nu_n^0\| > u_j] \le 2\,\mathbb{P}^{(n,2)}[\|P_{2n} - \bar P^{(n)}\| > b_j] + 2K_1(v) \exp(-u_j^2/32(b_j + \delta))$$
$$\le 4\,\mathbb{P}[\|P_n - \bar P^{(n)}\| > b_j] + 2K_1(v) \exp(-u_j^2/32(b_j + \delta)) = 4\,\mathbb{P}[\|\nu_n\| > \eta_{j+1}] + 2K_1(v) \exp(-u_j^2/32(b_j + \delta)). \tag{3.55}$$
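Two facts used in setting up the iteration, namely $r_j \ge r_0 \ge \min(r^{(1)}, r^{(2)})$ and the symmetrization loss $\eta_j - (2\delta)^{1/2} \ge u_j$ under (3.52), are elementary but easy to get wrong. The sketch below checks them numerically on a grid of hypothetical parameter values. It assumes the sequences $\eta_j = 4^j u$, $u_j = \eta_j/2^{1/2}$, $b_j = \eta_{j+1}/n^{1/2}$ and a loss term of $(2\delta)^{1/2}$ in Lemma 3.2; the function names are illustrative, and the quantities compared are the arguments of the integer parts rather than the integer parts themselves.

```python
from math import log, sqrt

def r_arg(u, n, delta, v, j):
    """Argument of the integer part defining r_j (assumed form of the sequences)."""
    u_j = 4 ** j * u / sqrt(2)
    b_j = 4 ** (j + 1) * u / sqrt(n)
    return u_j ** 2 / (128 * v * (b_j + delta) * log(2))

def r1_arg(u, delta, v):
    """Argument of the integer part defining r^(1) in (3.50)."""
    return u ** 2 / (512 * v * delta * log(2))

def r2_arg(u, n, v):
    """Argument of the integer part defining r^(2) in (3.50)."""
    return u * sqrt(n) / (2048 * v * log(2))

def loss_ok(u, delta, j):
    """eta_j - (2 delta)^{1/2} >= u_j, expected whenever u^2/delta >= 36 log 2."""
    eta_j = 4 ** j * u
    return eta_j - sqrt(2 * delta) >= eta_j / sqrt(2)

# Hypothetical grid of parameters.
grid = [(u, n, d, v) for u in (0.5, 2.0, 10.0) for n in (10, 1000, 10 ** 6)
        for d in (0.001, 0.05, 0.25) for v in (1, 3)]
for u, n, d, v in grid:
    print(u, n, d, v, r_arg(u, n, d, v, 0), min(r1_arg(u, d, v), r2_arg(u, n, v)))
```

The first inequality reflects the case split $\delta \ge 4u/n^{1/2}$ versus $\delta < 4u/n^{1/2}$: in either case $b_0 + \delta$ is at most twice the larger of the two terms.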
By obvious induction we get from (3.55) that for $J \ge 1$,
$$\mathbb{P}[\|\nu_n\| > u] \le 4^J\, \mathbb{P}[\|\nu_n\| > \eta_J] + \sum_{j=0}^{J-1} 2K_1(v)\, 4^j \exp(-u_j^2/32(b_j + \delta)).$$
But if $J$ is large enough so $\eta_J > n^{1/2}$ it follows that
$$\mathbb{P}[\|\nu_n\| > u] \le \sum_{j=0}^\infty 2K_1(v)\, 4^j \exp(-u_j^2/32(b_j + \delta)) \le \sum_{j=0}^\infty 2K_1(v)\, 4^j \exp(-u_j^2/64 b_j) + \sum_{j=0}^\infty 2K_1(v)\, 4^j \exp(-u_j^2/64\delta). \tag{3.56}$$

Now for $\tau \ge 2$ the function $f(j) = j\tau^{-j}$ on the nonnegative integers reaches its maximum value $\tau^{-1}$ at $j = 1$. It follows that
$$j \le \tau^{j-1} \quad \text{for } \tau \ge 2 \text{ and integers } j \ge 0. \tag{3.57}$$
Also
$$\tau^j \ge 1 + (\tau - 1) j \quad \text{for } \tau > 0 \text{ and integers } j \ge 0, \tag{3.58}$$
since $f(x) = \tau^x$ is convex and equality holds at $j = 0$ and $j = 1$. Applying (3.57) with $\tau = 16$ to (3.52) we get
$$u^2/\delta \ge 288 (\log 4)\, j\, 4^{-2j}, \quad j \ge 0,$$
so
$$j \log 4 \le 4^{2j} u^2/288\delta \le u_j^2/128\delta. \tag{3.59}$$
Using (3.58) with $\tau = 16$ we get
$$u_j^2/128\delta = 4^{2j} u^2/256\delta \ge (1 + 8j) u^2/256\delta. \tag{3.60}$$
Combining (3.59), (3.60), and (3.52),
$$\sum_{j=0}^\infty 2K_1(v)\, 4^j \exp(-u_j^2/64\delta) \le \sum_{j=0}^\infty 2K_1(v) \exp(-u_j^2/128\delta) \le \sum_{j=0}^\infty 2K_1(v) \exp(-(1 + 8j) u^2/256\delta)$$
$$= 2K_1(v) \exp(-u^2/256\delta) / (1 - \exp(-u^2/32\delta)) \le 4K_1(v) \exp(-u^2/256\delta). \tag{3.61}$$

Next we apply (3.57) with $\tau = 4$ to (3.53) and get
$$u n^{1/2} \ge 256 j (\log 4)/4^{j-1}, \quad j \ge 0,$$
so
$$j \log 4 \le 4^{j-1} u n^{1/2}/256 = u_j^2/128 b_j. \tag{3.62}$$
Using (3.58) with $\tau = 4$ we get
$$u_j^2/128 b_j = 4^j u n^{1/2}/1024 \ge (1 + 2j) u n^{1/2}/1024. \tag{3.63}$$
Combining (3.62), (3.63), and (3.53),
$$\sum_{j=0}^\infty 2K_1(v)\, 4^j \exp(-u_j^2/64 b_j) \le \sum_{j=0}^\infty 2K_1(v) \exp(-u_j^2/128 b_j) \le \sum_{j=0}^\infty 2K_1(v) \exp(-(1 + 2j) u n^{1/2}/1024)$$
$$= 2K_1(v) \exp(-u n^{1/2}/1024) / (1 - \exp(-2u n^{1/2}/1024)) \le 4K_1(v) \exp(-u n^{1/2}/1024). \tag{3.64}$$
Combining (3.56), (3.61), and (3.64) finishes the proof.

For $v \ge 1$ define
$$K_3(v) := 2A(v) + 2K_1(v) + 8K_1(v_1(v)).$$

Theorem 3.9. Let $\mathcal{C} \subset \mathcal{A}$ be a VC class satisfying the measurability assumptions (3.1) and (3.2). Let $0 < \epsilon < 1$, $\alpha \ge \alpha(\mathcal{C}, n)$, $v \ge V(\mathcal{C})$, $v_1 := v_1(v)$, and $M > 0$. Let
$$\gamma := \exp(-\epsilon M^2/16\alpha v), \qquad r^{(1)} := \left[\frac{9\epsilon^2 M^2}{2^{19} v_1 \gamma \log 2}\right], \qquad r^{(2)} := \left[\frac{3\epsilon M n^{1/2}}{2^{16} v_1 \log 2}\right].$$
If
$$\gamma \le 9\alpha\epsilon^2/2^{17}, \tag{3.65}$$
$$M \le 3 n^{1/2} \alpha\epsilon/2^{14}, \tag{3.66}$$
$$\gamma \le \epsilon^2 M^2/2^{12} \log 2, \tag{3.67}$$
$$M^2/\alpha \ge \log 2, \tag{3.68}$$
$$n \ge 2^{18} v, \tag{3.69}$$
and
$$(2^{13} v_1)^{1/2}\, 2^{-r^{(i)}/2} \le 3\epsilon M/64, \quad i = 1, 2, \tag{3.70}$$
then
$$\mathbb{P}[\sup_{\mathcal{C}} |\nu_n(C)| > M] \le K_3(v) \exp(-(1-\epsilon)M^2/2\alpha). \tag{3.71}$$

Proof. Let $N$ and $W$ be as in Lemma 3.5. Applying Lemma 3.5, then Lemma 3.6 with $H^{(i)} := P_{i,N}$, we get
$$\mathbb{P}[\sup_{\mathcal{C}} |\nu_n(C)| > M] \le 2K_1(v) \exp(-M^2/2\alpha) + \sup_{x^{(nN)} \notin W} \mathbb{Q}[\sup_{\mathcal{C}} |\nu_n^{(1)}(C)| > (1 - \epsilon/16)M]$$
$$\le 2K_1(v) \exp(-M^2/2\alpha) + 2A(v) \exp(-(1-\epsilon)M^2/2\alpha) + \sup_{x^{(nN)} \notin W} \mathbb{Q}[\sup_{\mathcal{C}_1(\gamma, \mathcal{C}, P_{nN})} |\nu_n^{(1)}(C)| > 3\epsilon M/32]. \tag{3.72}$$
We now apply Theorem 3.8 to the last term on the right of (3.72), with $\mathcal{D} := \mathcal{C}_1(\gamma, \mathcal{C}, P_{nN})$, $v_1$ in place of $v$, $u := 3\epsilon M/32$, and $\delta := \gamma$. Then $V(\mathcal{D}) \le v_1$, so (3.51) follows from (3.70), (3.52) from (3.67), and (3.53) from (3.66) and (3.68). From (3.65) we have $u^2/256\delta \ge M^2/2\alpha$, and from (3.66) we have $n^{1/2} u/1024 \ge M^2/2\alpha$. Thus
$$\sup_{x^{(nN)} \notin W} \mathbb{Q}[\sup_{\mathcal{C}_1(\gamma, \mathcal{C}, P_{nN})} |\nu_n^{(1)}(C)| > 3\epsilon M/32] \le 4K_1(v_1)(\exp(-u^2/256\delta) + \exp(-n^{1/2} u/1024)) \le 8K_1(v_1) \exp(-M^2/2\alpha),$$
and the theorem follows.

From the conditions (3.65)-(3.70) the following is immediate.

Corollary 3.10. For $n \ge 1$ let $\mathcal{C}_n \subset \mathcal{A}$ be VC classes satisfying the measurability assumptions (3.1) and (3.2). Suppose $V(\mathcal{C}_n) \le v$ and $\alpha(\mathcal{C}_n, n) \le \alpha_n \le 1/4$ for all $n$. Let $0 < \epsilon < 1$, and $M_n > 0$. If
$$(\alpha_n \log \alpha_n^{-1})^{1/2} = o(M_n) \tag{3.73}$$
and
$$M_n = o(n^{1/2} \alpha_n), \tag{3.74}$$
then
$$\mathbb{P}[\sup_{\mathcal{C}_n} |\nu_n(C)| > M_n] \le K_3(v) \exp(-(1-\epsilon)M_n^2/2\alpha_n) \tag{3.75}$$
for all sufficiently large $n$.

Note that (3.73) and (3.74) imply that $n^{-1} \log n = o(\alpha_n)$.

In the proof of Theorem 3.9, the use of Lemma 3.5, and of the discrete approximations $P_{i,N}$ for $P^{(i)}$, could be avoided by applying Lemma 3.6 directly with $H^{(i)} = P^{(i)}$ instead of $H^{(i)} = P_{i,N}$. But we would then need to assume $\mathcal{C}_1(\gamma, \mathcal{C}, \bar P^{(n)})$ to be $n$-deviation-measurable, a hypothesis which may be a nuisance to verify in specific cases.
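The constants in (3.65) and (3.66) are exactly what the last step of the proof of Theorem 3.9 needs, and this can be confirmed with exact rational arithmetic. The sketch below is illustrative: parameter values are hypothetical, the function names are not from the text, and $n^{1/2}$ is passed as a rational stand-in `sqrt_n` (so $n$ should be thought of as a perfect square). With $u = 3\epsilon M/32$ and $\delta = \gamma$, it checks that $u^2/256\delta \ge M^2/2\alpha$ holds precisely when $\gamma \le 9\alpha\epsilon^2/2^{17}$, and that $n^{1/2}u/1024 \ge M^2/2\alpha$ holds precisely when $M \le 3n^{1/2}\alpha\epsilon/2^{14}$.

```python
from fractions import Fraction as Fr

def exponents(eps, alpha, gamma, M, sqrt_n):
    """The two exponents from (3.54) with u = 3*eps*M/32, delta = gamma, and the target M^2/(2 alpha)."""
    u = 3 * eps * M / 32
    first = u ** 2 / (256 * gamma)
    second = sqrt_n * u / 1024
    target = M ** 2 / (2 * alpha)
    return first, second, target

def cond_365(eps, alpha, gamma):
    """Condition (3.65): gamma <= 9 alpha eps^2 / 2^17."""
    return gamma <= 9 * alpha * eps ** 2 / 2 ** 17

def cond_366(eps, alpha, M, sqrt_n):
    """Condition (3.66): M <= 3 n^{1/2} alpha eps / 2^14."""
    return M <= 3 * sqrt_n * alpha * eps / 2 ** 14

# Hypothetical values; sqrt_n = 1024 corresponds to n = 2^20.
eps, alpha, sqrt_n = Fr(1, 2), Fr(1, 8), Fr(1024)
for gamma in (Fr(1, 10 ** 7), Fr(9, 2 ** 22), Fr(1, 1000)):
    for M in (Fr(1, 100), Fr(3, 256), Fr(1, 2)):
        first, second, target = exponents(eps, alpha, gamma, M, sqrt_n)
        print(gamma, M, first >= target, second >= target)
```

Since $n^{1/2}u/1024 \ge M^2/2\alpha$ is the same as $un^{1/2} \ge 512M^2/\alpha$, the second equivalence also shows how (3.66) and (3.68) together yield (3.53).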
In comparing Theorem 3.1 to Theorem 3.9, it should be noted that the condition (3.66) is stronger than the last part of (3.3), but (3.66), (3.69), and (3.70) together may put weaker conditions on $n$ and $\alpha$ asymptotically than (3.49) does. To satisfy (3.49) for large $n$ and $\alpha = \alpha_n$ requires that $n^{-3/13} = O(\alpha_n)$, but (3.66), (3.69), and (3.70) only require that $n^{-1} \log n = O(\alpha_n)$. Corollary 3.10 could not be derived from Theorem 3.1, because (3.73) and (3.74) may hold without (3.49) becoming true for large $n$ and $\alpha = \alpha_n$.

In Theorem 3.1 the requirements that $n \ge n_0$ and $M \ge M_0$ may be dispensed with if we allow $K$ to depend also on $\epsilon$ and $\alpha$. In particular, letting
$$M_2(\epsilon, \alpha, v) := \max(M_0(\epsilon, \alpha, v),\ n_0(\epsilon, \alpha, v)^{1/2}),$$
$$K_0(\epsilon, \alpha, v) := \max(K(v),\ \exp(M_2^2(\epsilon, \alpha, v)/2\alpha)),$$
we get the following from that theorem.

Corollary 3.11. Let $\mathcal{C} \subset \mathcal{A}$ be a VC class satisfying the measurability assumptions (3.1) and (3.2). Let $v \ge V(\mathcal{C})$, $\alpha > 0$, and $0 < \epsilon < 1$. Then there exists a constant $K_0 = K_0(\epsilon, \alpha, v)$ such that if $n \ge 1$, $\alpha \ge \alpha(\mathcal{C}, n)$, and
$$0 < M \le \begin{cases} 3 n^{1/2} \alpha\epsilon/8 & \text{if } \alpha < 1/4, \\ n^{1/2} & \text{if } \alpha \ge 1/4, \end{cases}$$
then
$$\mathbb{P}[\sup_{\mathcal{C}} |\nu_n(C)| > M] \le K_0 \exp(-(1-\epsilon)M^2/2\alpha).$$

Theorem 3.1 is not true if the quantity $3 n^{1/2} \alpha\epsilon/8$ in (3.3) is replaced by $n^{1/2}$. To see this, let all $P^{(i)}$ be the same law $P$, let $\mathcal{C}$ consist of a single set $C$ with $0 < P(C) = \delta < 1/4$, let $\alpha := \delta(1-\delta)$, let $\epsilon > 0$ be small enough so $\delta > 4^{-(1-\epsilon)(1-\delta)/3\delta}$, and let $K > 0$. For $n \ge 1$ let $M := (2n(\log 4)/3)^{1/2}(1-\delta)$. Then if $n$ is sufficiently large we have
$$\mathbb{P}[|\nu_n(C)| > M] \ge \mathbb{P}[P_n(C) = 1] = \delta^n > K \cdot 4^{-n(1-\delta)(1-\epsilon)/3\delta} = K \exp(-(1-\epsilon)M^2/2\alpha).$$

IV. A Bounded LIL for the Normalized Empirical Process

In this chapter we will use Corollary 3.10 to establish a bounded LIL for $\nu_n$, following a standard technique for such results. Let $Lx := \log(\max(x, e))$ and $LLx := L(Lx)$.

Theorem 4.1. If $\mathcal{C}$ is a VC class satisfying the measurability conditions (3.1) and (3.2), $\alpha_n := \alpha(\mathcal{C}, n)$, and
$$\log \alpha_n^{-1} = o(LLn), \tag{4.1}$$
then
$$\limsup_n \frac{\|\nu_n\|}{(2\alpha_n LLn)^{1/2}} = 1 \quad \text{a.s.}, \tag{4.2}$$
where $\|\cdot\|$
denotes the sup norm for functions on Conversely if Sn, C n > 1, are positive real numbers such that (4.3) LLn = o(Sn' then there exists a sequence the integers such that Z and a VC P (, i class C > 1, of p.m.'s on of subsets of Z 69. log (4.4) L = o () n but (4.5) lim sup 'n 2 = m (2anLLn)1 n a.s. I-I The following lemma is from Stout (1974) p. 262. Lemma 4.2. 6 > 0 For each there are constants ff(6) let Z , 1 < i < n, be independent mean - 0 S:= ap < satisfying the following: Z Z= and jzij and Tr(6), s2:= Pr[S/s > P] 1 EZ. < as a.s. If and r.v.'s. Let a > 0, p > p(6), for all > exp(-(l+6)p /2). Proof of Theorem 4.1. Let n > 1 and p(6) i < n, then I-I To prove the first half of the theorem we first show that (4.6) n1/2 < 1 + 96 a.s. for all lim sup n (2an LIn)1/ Fix 0 < 6 < 1/36 Un := (2anLLn) 1 /2 and define 6 > 0. 70. M := (l+26)u . n n By (4.1) we have (4.7) nacn + o as n +4 m so we can define n(k):= min{n > 1: nan > (1+46)k k > 1, n(0):= 0. Define the events Ak: [ Iv II > (l+96)u for some n(k-l) < j < n(k)] Bk:= [IIVn(k) II > Mn(k) ' We will show that (4.8) ]P(Ak) < 2 JP(Bk), k large, and (4.9) Y k=l so F (Bk ) < CO, P[Ak i.o.] = 0 and (4.6) will follow. All further inequalities in this proof should be interpreted as being valid for sufficiently large (but 71. nonrandom) Fix k. k > 1 and define v.|| > (1+96)uj} < 0 > n(k-1): T:= min{j m 1I < m. i=z Then n(k) P(Bk) > j iP(Bk j=n (k-1) +1 n (T = j] ) (4.10) n (k) d(j+1,n (k)) dP lIj 1 j=n (k-l) +1X Fix U:= X.. with and ' Iv (C)I > (1+96)u E [T = j+1 < i < n(k), for (i) Iul < IVn(k)(C)I jl/ 2 .1/2 (C) and v(C) - n(k) 1 /2Mn(k) (1+96)u > Mn(k). j It follows that Then i. Y(k) Then jj+1 Y. i Vn(k) so < n(k), C E C there exists Y j j,n(k-l)+l < Xn(k)-j implies . Define 72. 
1 n (k)j Bk C d (j+l,n (k)) (4.11) 1 > f 1 /2 [Iu1.jjl/2 (1+96)u -n(k) l2n(k) xn(k)-j Now by (4.12) as k (4.1) , LLn(k) nu LL(n(k)an(k)) - oo, LL(1+46)k so n(k)u 2 n (k) (4.13) % 'b (1+46) klog k and then (4.14) It n (k-1) u (k-1) follows, j n (1+46)1 n (k)u 2 using (4.7), that (1+96) 2u > (1+96) 2n(k-l)u (k) > (1+96)2 (1-4 6 )n(k)un(k) > (1+6) 2n(k)M and then that ( '' log k (j+1,n(k)) 73. j (1+96) u n (k) - 2 Mk 6n (k) 1 2 Mn (k) n (k) ) 1/2 > (2n (k) > 1 (2EU 2) 1/2 Hence by Chebyshev's inequality, the right side of (4.11) is at least 1/2. Since E [T = x j] is arbitrary, (4.10) then gives n(k) 1 n (k -[=j] j=n(k-1)+l n (k) 1 j3 :P[r= j ] j=n (k-l) +1 = 1(Ak)/2 and (4.8) is proved. To prove (4.9) we apply Corollary 3 .10. follows from (4.1), and (3.74) is clear. Thus, using 2 k 3(V(( ))exp(-(1-6)M < K 3 (V (C )exp < K3 (- (1+ 26) LLn (k)) 'C)k-(1+6) (3.73) n(k) (4.12), 74. and (4.9), and then (4.6), follow. To prove the reverse inequality in use Lemma 4.2. and n > 1 (4.15) Again fix define sup yn (C) T > 2/6 n(0):= 0. 0 < 6 < 1/36. (C):= n var(v (C)) For C so = nan' C Fix y (4.2) we will and this time define B y (4.15) for each k > 1 n(k):= min{n > 1: na there exists with (4.16) For yn(k) (Ck) > (1- k > 1 6 )fln(k)an(k)' define Xn(k-l)+i ~ P(n(k-l)+i))(Ck m(k):= n(k) - n(k- 1) m (k) Sk.= . i=1 2 m (k) Zk, i 2 EZk,i i=1 sk .S ak:= 1 /sk k i >1 Ck E > Tk 75. (2LLn(k)) 1 /2 S1-76 ) 1-6- Wk:= n(k) -/2Sk rk := (1-7 6 )un(k)* Similarly to (4.17) n (k-1) u (k-) T~ while (4.14) we have n(k)u n(k) as k co, -* < 6/2 < 6 (1-6) , so 2 Sk (4.18) T n(k) (Ck) > (1- 6 Y. - (k)-1(Ck) - n(k-1) an (k)-l )n (k) an(k) > (1-6) 2 n(k)an (k) Let n (6) and and (4.1) we have Similarly to be as in p (6) a 2P < T (4.12) we have with Lemma 4.2 and Lemma 4.2; from (4.18) (6) , while clearly LLn(k) (4.18) gives , log k, k >_ p(6). which along 76. 
:[Wk k > n(k) 1/2rk/sk] Sk/s>k > P [S k/s k > rk/(1-6)a 1/2 =P[S ks k >k > exp(-(1+6) (1-76) 2 (1-6)2LLn (k)) k -(1-6) Since the (4.19) are independent it follows that Wk limk sup Wk/un(k) >1- 76 a.s. Using (4.6) we have Ivn (k) (Ck) im sup WkI u n(k) k n (k-l)1/2 IV (k-1) (Ck) I = lim sup k n(k) 1/2un(k) n(k-1) 1/2u < lim sup k (4.20) n(k) 12u n(k) n (k-l) 1/2 1/2 < lim sup k T- 1/ 2 61/2 n (k)1/21/2 n(k) 77. (4.19), is arbitrary, (4.6), 6 Since and (4.20) prove (4.2) . Proceeding to the converse, let (4.3). We take to be {{i}: min b., so j>n b ( bnI:=n/LLn, an inductively R :=1 > 1 E ZI, i and + + 0. n/R and Rn 2 denotes the integer part. i so we can choose a sequen such that exactly Since (4.21) for all 2(k) for each Q (n) Z(k+l) - are equal to k > 1, -n n + [ . ] equal to n > 1, k9,(k), (n) where Define a n + o. Let 2 Z (k) := [exp (:exp (16k (log 2) /3R j(k): and define and Rn:= min(a 1/, n1/4 ,(n(n-l)) l/ 2 R n-_, n n so be as in {n} 2(k) Q,,, P (j (k)+1) k > 1. > 1, of of the first 1 < n < 2 k, of the p.m.'s Then j(k) for all 2(k) p.m. '5 + z on P (1's are k > 1. {P.i : j (k) < i < j (k+l) I we may take the sequence to be such that ~ (j (k)+2) = * * * (j (k)+2(k+l)-Z(k)) =(k) 78. For k > 1 let be an r.v. with binomial dis- Yk t (k) tribution with parameters for vn ({i}), we have that r.v. 's for fixed Now, using n, = { Vn (i) : i > 1} v ni) are independent = 4 ( j (k) -1/2 (v(i)) 1 Writing (k 2 k (4.21), sup 1ard a= with and 1/2. P ()= ({m: card({m: P (m) Q (i) m < n}) = Q (1) m < n}) (4.22) (k(k) + n - j (k)) /4n if k (k+l) /4n if j (k) j (k) < n <j (k) + Z (k+l) - k (k) + Z (k+l) - k (k) < n In particular (4.23) acj (k) = k(k)/4j(k) = 2 -(k+2) Let Pk:= P[ vj (k) (1) = using Pr[Z(k) 1-/2kY (4.23). > Rkuj (k)I k _ > R (LLj (k)) 1/2/2] 2(k)) ~ -k Define events Dk by < j (k+1) 79. 
Dk : [Vj (k Then the (4.24) (i) /u j (k) - Rk 2 k-l for some < i < 2k I are independent, with Dk 2 k-l :'(Dk) =1-Pk Applying Lemma 4.2 to the definition of pk 2/2, s = 9(k)1/2/2, 6:= 1/4, p:= Rk (LLj(k))1 n:= 9.(k), with and a:= k(k) we get (for large k, as always): > exp(-5R (LLj(k))/32) > exp (-3R 2(LL(k))/16) log 2) . > exp (-k cl2k- k Hence Thus : [Dk i.o.] = 1, To prove (4.4) , k(n):= k Using (4.22) n < j(k(n)) + n for so by (4.24), SI and (4.5) Ik ? (Dk) = CO follows. set j(k) < n < j(k+l), k > 1. and (4.23) , we have if (k(n)+1) - (k(n)) 1 P 4 j.(k (n)) + n + n - 9 (k (n) ) j (k (n)) j (k (n)) that SZ(k(n)) 4j(k(n)) - ca. j(k(n)) 1/2 80. n + Z(k(n)+1) - n > j(k(n)) while if - 9 (k (n)+1) Hence using log a n 4n > - 9 (k (n)+1) 4j(k(n)+1) (4.23) = 0 (log =0 - j (k (n))I (k (n) ) 0(Rkn)LLk (k(n))) = O(a 1/LLj(k(n))) O(a/2LLn) n = o(Sn) I--- I = Z(k(n)) j(k(n)+1) we have = a .~ )/2. j(k(n)) 81. Weighted Empirical Processes with Truncation V. In this chapter we study weighted empirical processes, with the weight at each set depending only on the size of C c on O , where q on vn /"T(n) that is, we consider processes that set; is a non-negative continuous function In particular we will prove a CLT for a [0,1]. modification of such processes. q(P(n) (C)) If for some C E . 0 < P with Therefore we restrict our attention to functions q > 0 (C) q with (0,1). on As discussed in the introduction, we will be interested in the behavior of vn /qo(n) studying studying it on q near 0 and 1. C on Since vn(C) = -Vn(C ), is essentially the same as {Cc: C e C }. consider the behavior of q Therefore we need only near 0, not near 1. Thus we define Q:= {q E C[0,1]: q > 0 and suppose on to include the case where both Suppose of X:= [0,1], [0,1], and all C are easily modified q(0) and Let may be 0. q(l) consists of all subintervals are the uniform law P q(0) = 0. (0,1] our results q E Q; all Example 5.1 < , is infinite with positive probability. 
Let $\{C_k\}$ be a sequence of intervals shrinking to the (random) point $X_1$. Then $\limsup_k \nu_n(C_k) \ge n^{-1/2}$ for all $n$, but $q(\bar P^{(n)}(C_k)) \to 0$ as $k \to \infty$ for all $n$. Thus $\nu_n/q \circ \bar P^{(n)}$ is unbounded a.s.; Theorem 2.9, in particular (2.5), shows we cannot therefore expect a CLT to hold.

To avoid this difficulty we "truncate" the process $\nu_n/q \circ \bar P^{(n)}$, replacing it with 0 on "small" sets $C$. Thus define for $n \ge 1$ and $0 \le \tau < 1$:
$$\zeta_{\tau,n}(C) := \begin{cases} \nu_n(C)/q(\bar P^{(n)}(C)) & \text{if } \bar P^{(n)}(C) \ge \tau, \\ 0 & \text{if } \bar P^{(n)}(C) < \tau. \end{cases}$$
We consider sequences $\tau(n) \to 0$ and look for convergence in law of $\zeta_{\tau(n),n}$ to a Gaussian process. An alternative way around the difficulty in Example 5.1 will be considered in Chapter 6.

Whenever an expression appears which evaluates as $0/q(0)$ or $0/q^2(0)$ (e.g., $\nu_n(C)/q(\bar P^{(n)}(C))$ when $\nu_n(C) = q(\bar P^{(n)}(C)) = 0$) we define this by convention to be 0. In all situations where this arises it will be apparent from the proofs that this is consistent with definition by continuity.

For $\mathcal{C} \subset \mathcal{A}$, $t > 0$, and $H$ a law on $(X, \mathcal{A})$, define
$$\mathcal{C}_2(t, \mathcal{C}, H) := \{C \in \mathcal{C}: H(C) \le t\}.$$
For $\delta > 0$ we say $\mathcal{C}$ is size-$\delta$ measurable (with constants) for $\{P^{(i)}, i \ge 1\}$ (or "for $P$" in the case of $P^{(i)} = P$ for all $i$) if $\mathcal{C}_1(t, \mathcal{C}, \bar P^{(n)})$ and $\mathcal{C}_2(t, \mathcal{C}, \bar P^{(n)})$ are each $n$-deviation-measurable (with constants) for $\{P^{(i)}, i \ge 1\}$ (or for $P$) for all $t \in (0, \delta]$ and $n \ge 1$.

Given a sequence $\{\psi_n\}$ of $\ell^\infty(\mathcal{C})$-valued r.v.'s on $(X^\infty, \mathcal{A}^\infty, \mathbb{P})$ and a law $P$ on $(X, \mathcal{A})$, we say a subspace $D$ of $\ell^\infty(\mathcal{C})$ is admissible for $\{\psi_n\}$, $\{P^{(i)}, i \ge 1\}$, and $P$, if $\psi_n \in D$ a.s. for all $n$, $\psi_n$ is $\mathbb{P}$-completion measurable into $(D, \mathcal{B}_b)$ for all $n$, and $D$ contains $C_{b,u}(\mathcal{C}, d_P)$.

At times we will need regularity conditions on $q$ to facilitate our proofs; for these occasions we define
$$Q_1 := \{q \in Q: q(t) \text{ non-decreasing and } q(t)/t \text{ non-increasing on } [0, \delta] \text{ for some } \delta > 0\}.$$
Define finally
$$\rho_n(A, B) := \mathrm{cov}(\nu_n(A), \nu_n(B)) = \frac{1}{n} \sum_{i=1}^n \big(P^{(i)}(A \cap B) - P^{(i)}(A) P^{(i)}(B)\big), \tag{5.1}$$
$$\rho_P(A, B) := \mathrm{cov}(G_P(A), G_P(B)) = P(A \cap B) - P(A) P(B)$$
for all $A, B \in \mathcal{C}$.

Theorem 5.2.
Suppose $q \in Q$ satisfies
$$q^2(t) \Big/ t \log \frac{1}{t} \to \infty \quad \text{as } t \to 0. \tag{5.2}$$
Let $\mathcal{C} \subset \mathcal{A}$ be a VC class which is size-$\delta$ measurable for $\{P^{(i)}, i \ge 1\}$ for some $\delta > 0$. Let $\tau(n) \to 0$ with either
$$n^{-1/2} \log n = o(q(\tau(n))) \tag{5.3}$$
or
$$n^{-1} \log n = O(\tau(n)). \tag{5.4}$$
Let $P$ be a law on $(X, \mathcal{A})$, and let $D$ be a subspace of $\ell^\infty(\mathcal{C})$ admissible for $\{\zeta_{\tau(n),n}\}$, $\{P^{(i)}\}$, and $P$. Suppose that

(5.5) for every $\lambda > 0$ there exist $\delta > 0$ and $n_0$ such that $\bar P^{(n)}(A \,\Delta\, B) < \lambda$ whenever $P(A \,\Delta\, B) < \delta$, for all $n \ge n_0$ and $A, B \in \mathcal{C}$,

(5.6) $\rho_n(A, B) \to \rho_P(A, B)$ for all $A, B \in \mathcal{C}$,

and

(5.7) $q(\bar P^{(n)}(C)) \to q(P(C))$ for all $C \in \mathcal{C}$.

Then $\zeta_{\tau(n),n}$ converges in law to $G_P/q \circ P$ in $(D, \mathcal{B}_b)$.

To satisfy (5.6) and (5.7) it is sufficient that $P^{(n)}(A \cap B) \to P(A \cap B)$ for all $A, B \in \mathcal{C}$. In the identically distributed case, (5.5), (5.6), and (5.7) always hold.

The condition (5.2) is natural because, by Theorem 2.2 of Dudley (1973) and Lemma 2.5 above, $\phi(t) := (t \log \frac{1}{t})^{1/2}$ is a sample modulus for $G_P$. In fact, in the special case of Example 5.1, we have $G_P([a, b]) = W_0(b) - W_0(a)$ where $W_0$ is a Brownian bridge. By a theorem of Levy (1937, 1954) $\phi$ is the best possible sample modulus for $W_0$, hence also for $G_P$. Thus $\phi$ is in some cases the best possible sample modulus for $G_P$ on a VC class $\mathcal{C}$.

In the special case of Example 5.1, our Theorem 5.2 is related to Theorem 1.2 of Shorack and Wellner (1982).

If $q(0) > 0$ then (5.3) is satisfied with $\tau(n) = 0$ for all $n$. In particular, in case $q \equiv 1$, Theorem 5.2 provides a CLT for the unmodified processes $\nu_n$, generalizing Theorem 7.1 of Dudley (1978).

We point out the similarity between the conditions (5.2), (5.3), and (5.4) and the hypotheses of Corollary 3.10.

Lemma 5.3. Suppose $q \in Q$ satisfies
$$q^2(t)/t \to \infty \quad \text{as } t \to 0. \tag{5.8}$$
Let $\mathcal{C} \subset \mathcal{A}$ be a VC class which is size-$\delta$ measurable for some $\delta > 0$, let $\tau(n) \to 0$, and let $P$ be a law on $(X, \mathcal{A})$. Suppose (5.5), (5.6), and (5.7) hold, and suppose for every $\epsilon, \eta > 0$ there exist $\delta > 0$ and $n_1$ such that
$$\mathbb{P}^*[\sup\{|\nu_n(C)|/q(\bar P^{(n)}(C)): C \in \mathcal{C},\ \tau(n) \le \bar P^{(n)}(C) \le \delta\} > \epsilon] < \eta \tag{5.9}$$
for all $n \ge n_1$. If not all $P^{(i)}$ are equal, suppose also that (5.3) holds. Then (2.5), (2.6), and (2.7) hold for $\psi_n = \zeta_{\tau(n),n}$.
P Then (2.5), : C E C , are equal, suppose also that (5.3) holds. (2.6), and (2.7) hold for is totally bounded for $n = d. (n),n by Lemma 2.5, so we must verify (2.5) and (2.7). We begin with (2.7). Using (5.3) in the non-i.i.d. case and Corollary 18.2 of Bhattacharya and Rao (1976), we see that it is sufficient to show (5.10) (B)) (A), cov(? T(n),n T(n),n as n - for all - A, B E O. p (AB) q(P(A))q(P(B)) Now 87. p n(AB) (5.11) cov (CC (n) ,n (A) , CT (n) ,n (B) )-= A)) q (P(n) (B)) (n) if P (A) > T (n) and > P(n) (B) otherwise, 0 so if neither P(n) (B) (n) < T (n) from (5.6), T(n(k)) infinitely ofte n, then (5.10) follows q(0) that so we have by P (n) (A) Ip q (P~ so (5.10) -+ - is P (A,B), that bounded away from 0. q( ( (A)) (A)) q(P n) (B) 2 (A) 1 / P I ~ (P ( (A), so (5.10) q(P(n(k)) (A)) Then (5.7) 0 =p < Pn(k)) as -+ 0 = q(P(A)), But 0. (A, B) pn(k) (AB) P n (A,B) q (A) f(n(k)) fn(k)}. Since q(0) = 0. Case 2: so > 0. (5.11) since follows from <, Thus suppose for some subsequence we have by (5.6) - infinitely often nor < T (n) (5.7) and (5.8). Case 1: k (A) T(n) (A)) * (B) 1 /2 q (P(n) (B) follows from (5.11) and (5.8). 88. We proceed to (2.5). exist a > 0 (5.12) and n2 Suppose there such that E B F*[CT n),n 6 > 0. Fix By (5.5) we can find (T n)] 6 > 0 < E: for all such that for large n, and (2.5) follows. (P) B C B (P n) Thus it is sufficient to prove (5.12). By (5.9) we can find < 6/2 T(n) (5.13) if n > n P*[ (5.14) such that n (C) T (n) I > E/2] < E/2 'T'(n) n > n . By Theorem 3.8, applied with M > 0 can choose and and sup 2 for all 6 > 0 = 1, we such that F[supIv n(C)I > M] < E/4 for all n > 1. C Let w:= inf{q(t): v:= V(C (5.15) 6/2 < t (l,C.,P)). s,t Choose E [6/2,1], We wish to choose C1 O' F (n) ). ', Fix < l}, 0 < a < 6/2 Is-tI < Sa > 0 > 0 u:= sw/2, a imply and small enough so I1/q(s)-l/q(t)I < /2M. and apply Theorem 3.8 to small enough so (3.52) holds, 89. 
i = 1 holds for a, (3.51) as in r (with (3.50)), is size-s measurable, and (5.16) 4K 1 (v)exp(-u2/256S) < E/8. Choose n2 > n1 for i = 2, large enough so (3.53) holds, (3.51) holds and 4K 1 (v) exp (-n 1 /2u/1024) (5.17) P[ for all n > n2 ' (5.16) and (5.17) we have By Theorem 3.8, (5.18) < E/8 sup (C) I > ew/2] |v for all n > n ' 2 < c/4 01, with P(n) (A) CT(n) ,n (A) T (n) and suppose there exist n > n2 Fix - T (n) ,n(B) > c. P(n) (AAB) Then since C < S, and < a and q(P ( (B)) n(B) V n(A) E < 6/2, B E (5.15) : we have using < 6/2 > P n) (B) > 6/2, A, (A)) Iq ( 7(n (A) ) q (P (n) (B)) Vn (A)-V n(B) (5.1J.9) q(P ( (A)) < w- 1 max (Ivn (A\ B) 1 n (B) ( q(P (n) (A)) I, I n (B\ A)I ) + ( e/2M) Ivn (B) I 90. < 6/2, Now since > 6/2 and P (n) (A) and ~(n (B) < 6. (5.18), if < P*[ then either P or (A) Hence we get from (5.13), (T. B (5.19), P(n) (A) < c/2 + Tr - n (A) > 6/2, [ (n) ,n (C) I CT (n) n (B) I : A, Ivn (C) I B E P(n) (AAB) > cw/21 I (n) follows. I I d efine (t) := sup{S E [0,1]: q(s) where we take E/2] > MI + P[supIvn(C) q E Q1 > (n) (B) > 6/2, sup_ I, t and (5.12) (n) sup C2 ( ' C 'P (n) ) + P*[sup{ I CT (n) q~ > 6/2 P (n) (B) < S and (5.14): T (n) ,n For (n) (AAB) sup 0 to be 0. < t}, t > 0, S} > E] 91. The following lemma summarizes a calculation, based on facts from O'Reilly (1974), which we will use several times. Lemma 5.4. f 0 (5.20) Let Let exp(- with q 2(t)/t) u > 0, 8 > 0, a constant t q E Q1 and 60(u,O,X,q) t q(0) = 0 and < 00 for all 0 < X < 1/3. c > 0. Then there exists such that for 0 < 6 < 60 and := q~ 1(q(6A), j (5.21) exp(-uq (t.)/t ) < 6. j=0 Furthermore if n > 1 and N > 0 (with 6 now arbitrary), and (5.22) u 1 (X~-1)~ log 2 < M < I then N (5.23) 7 1/2 < 2 exp(-uM). j=0 Proof. and By (5.20) we can fix q(t)'/t are monotone in 60 small enough so [0,q 1(q(S 0 )/X)] a nd q(t) 92. q1( 2 (5.24) Since q EQ exp(-uq 2(t)/t) A < 1/3 and q((1-A)tj X 60 /2) > tE < 0. we have (1-A,)q(t _- )/2 > q(6)AX , > 0, so t < (1-X)tj1/2 and hence 2(t. -1 -t.) 
1 Using this and (5.24) we get if 6 < 60 2 exp(-uq (t)/t.) j=0 J J , 2 <= t +. - t. )/t xp(-u2 2(t o t. <~i: If J0O t. exp(-uA2 q 2(t)/t)t J < e. To prove (5.23), we have, using and reversing the order of summation: (3.58) and (5.22) 93. N exp (-un q(t )) j=0 N )Nj-N) exp(-unl/2 j=0 N < exp(-uMX~j) j=0 N < j -1 exp(-uM(1 + 1)j)) - j=0 = exp(-uM)/(l - exp(-(X (5.25) Let Suppose C c (, 6 > 0. for some (5.26) be a + VC Let 1)uM)) satisfies q E Q1 q2 (t)/t log - I-I < 2 exp(-uM). Lemma 5.5. 1 0 as t -- 0. class which is size-6 measurable T(n) -+ 0 with n- /2log n = o( q(T (n))). Then (5.9) holds. Proof. If q(0) > 0 of Theor em 3.9. then (5.9) is an easy consequence Hence we assume q(0) = 0. 94. 0, n > 0. Fix M > 0 and let be large enough so (X() - 1)~ log 2 < M (5.27) 1024() (5.28) 8K 1 (v)exp(-OXM/1024) Let 6 > 0 and (5.20), implies < 1/3, 0 < v:= V(C), fix Let < n/2. t,:= q~ 1 (q(6)X), j J > 0; since (5.25) by Lemma 5.4 and (5.25) we can take be small enough so q q (5.30) 62 X2 2(t)/t > 36 log 2 (5.31) [512vt (5.32) t-l/2q(t) > (215v)/2/OX is monotone on > [0,6] and t. E for all 1 log -/log 2 [0,6] J t < 6, for all for all for all j, t < 6, t < 6, and 00 1exp(-6 2 X2 q2 (t )/256t )< (5.33) nI/8K Mv j=0 where [ - ] can find n1 to is size-6 measurable and (5.29) 2X2(t) o 6 denotes the integer part. such that for n > n1 , By (5.26) we 95. (5.34) t(n) < 6. (5.35) exn1 / 2 q(T(n)) [eq(T(n))nl/2] (5.36) (5.37) 2048v log 2 > 512 log 2, >log n I log 2 - M < nl/2q T n) and (5.38) (2 13v) /2n n > n ." Fix integer (5.39) N > 0 t 1/2 < Xq( T(n))/2. By (5.29) and (5.34) there is an such that > T(n) if and only if j < N. Hence by (5.29), JP*[sup{Ivn(C)I/q((n) (C)) CE < 6} > 0] ,T(n) <P(n) (C) N [sup{ IVn(C)I : C E C , tj+1< (n) .} > eq(t j+) j=0 N (5.40) sup < j=0 2(t F ,C _ ,P(n) |Vn(C) > eXq(t )]. 96. Fix j < N; we wish to apply Theorem 3.8 to the right side of (5.40), with S:= t and u:= 6Aq(t. ). (3.52) follows from (5.30), and (3.53) from (5.35). 
Let $r(1)$ and $r(2)$ be as in (3.50). Then by (5.31), $r(1) \ge \log(1/t_j)/\log 2$, so by (5.32),

$(2^{13}v)^{1/2}2^{-r(1)/2} \le (2^{13}v)^{1/2}t_j^{1/2} \le \theta\lambda q(t_j)/2,$

and (3.51) follows for $i = 1$. By (5.36), $r(2) \ge \log n/\log 2$, so by (5.38),

$(2^{13}v)^{1/2}2^{-r(2)/2} \le (2^{13}v)^{1/2}n^{-1/2} \le \theta\lambda q(\tau(n))/2 \le \theta\lambda q(t_j)/2,$

and (3.51) follows for $i = 2$. Theorem 3.8 can therefore be applied to give

$\sum_{j=0}^{N} P[\sup_{\mathcal{C}_2(t_j,\mathcal{C},P_{(n)})} |\nu_n(C)| > \theta\lambda q(t_j)]$
(5.41) $\le \sum_{j=0}^{N} 4K_1(v)\exp(-\theta^2\lambda^2 q^2(t_j)/256t_j) + \sum_{j=0}^{N} 4K_1(v)\exp(-n^{1/2}\theta\lambda q(t_j)/1024)$.

We next apply Lemma 5.4, with $u := \theta\lambda/1024$; (5.22) follows from (5.27), (5.37), and (5.39), so using also (5.33) and (5.28), we have

(5.42) $\sum_{j=0}^{N} 4K_1(v)\exp(-\theta^2\lambda^2 q^2(t_j)/256t_j) + \sum_{j=0}^{N} 4K_1(v)\exp(-n^{1/2}\theta\lambda q(t_j)/1024) < \eta$.

The lemma now follows from (5.40), (5.41), and (5.42). □

Proof of Theorem 5.2. We need only combine Theorem 2.9 with Lemmas 5.3 and 5.5, after noting that (5.2) and (5.4) imply (5.3). □

VI. Weighted Empirical Processes Without Truncation

For this chapter we consider only the identically distributed case, where all $P_{(i)}$ are the same law $P$. We will look for relations between $\mathcal{C}$, $P$, and $q$ which enable a CLT to hold for the processes $\nu_n/q\circ P$ without the truncation at $\tau(n)$ used in Chapter 5.

Fix a law $P$ and a VC class $\mathcal{C} \subset \mathcal{A}$, and define

$E_t := \bigcup\{C \in \mathcal{C} : P(C) \le t\}$, $a(t) := P^*(E_t)$, $a_*(t) := P_*(E_t)$, $t \ge 0$.

If $\mathcal{C}$ contains only a few "small" sets, in the sense that $a(t) \to 0$ as $t \to 0$, then the difficulty illustrated in Example 5.1 may occur only with small probability, and a CLT may hold. Such a CLT, as well as a functional LIL, will be corollaries of the following, our main theorem.

Theorem 6.1. Suppose the VC class $\mathcal{C}$ is size-$\delta$ measurable with constants for $P$ for some $\delta > 0$. Let $q \in Q_1$ with

(6.1) $q^2(t)/a(t) \to \infty$ as $t \to 0$,

(6.2) $\int_0 \exp(-\varepsilon q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$.

Then there exists a sequence $\{G_j, j \ge 1\}$ of independent copies of $G_P/q\circ P$, with sample functions a.s.
$d_P$-continuous, such that

(6.3) $n^{-1/2}\max_{k \le n}\sup_{C}\,|k^{1/2}\nu_k(C)/q(P(C)) - \sum_{j=1}^{k} G_j(C)| \to 0$ as $n \to \infty$

in probability, and in $L^p$ for all $p < 2$. If

(6.4) $\int_0^\infty a(q^{-1}(x^{-1/2}))\,dx < \infty$

then the convergence in (6.3) can be obtained in $L^2$ also. If in addition

(6.5) $\int_0^\infty a(q^{-1}((xLLx)^{-1/2}))\,dx < \infty$,

then the $G_j$'s can be chosen so (instead of (6.3))

(6.6) $\sup_{C}\,|n^{1/2}\nu_n(C)/q(P(C)) - \sum_{j=1}^{n} G_j(C)| = o((nLLn)^{1/2})$ a.s.

From Lemmas 2.12 and 2.11 the following is then immediate.

Corollary 6.2. Under the hypotheses of Theorem 6.1 (including (6.5)), $\{\nu_n/q\circ P\}$ satisfies the compact and functional LIL's in $\ell^\infty(\mathcal{C})$, and

$\limsup_n \sup_C |\nu_n(C)|/((2LLn)^{1/2}q(P(C))) = \sup_C\,(P(C)(1-P(C)))^{1/2}/q(P(C))$ a.s.

Theorem 6.3. Suppose the VC class $\mathcal{C}$ is size-$\delta$ measurable with constants for $P$ for some $\delta > 0$. Let $q \in Q_1$ with

(6.7) $q^2(t)/a(t) \to \infty$ as $t \to 0$,

(6.8) $\int_0 \exp(-\varepsilon q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$.

Let $D$ be a subspace of $\ell^\infty(\mathcal{C})$ admissible for $\nu_n/q\circ P$ and $P$. Then $\{\nu_n/q\circ P\}$ converges in law to $G_P/q\circ P$ in $(D, \mathcal{B}_b)$, and the latter process has sample paths a.s. in $C_{b,u}(\mathcal{C}, d_P)$.

Conversely if $\nu_n/q\circ P$ converges in law in $(D, \mathcal{B}_b)$ to a Gaussian process with sample paths a.s. in $C_{b,u}(\mathcal{C}, d_P)$, then

(6.9) $\int_0 \exp(-\varepsilon \tilde q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$

for some $\tilde q \in Q_1$ with $\tilde q\circ P = q\circ P$, and if $q(0) = 0$ then

(6.10) $q^2(t)/a_*(t) \to \infty$ as $t \to 0$.

The first half of Theorem 6.3 is an immediate consequence of Theorem 6.1, Lemma 2.12, and Theorem 2.8. We prove the converse half in the following two propositions.

Let $A_t$ and $F_t$ be sets in $\mathcal{A}$ with $A_t \subset E_t \subset F_t$, $P(A_t) = P_*(E_t)$, and $P(F_t) = P^*(E_t)$.

Proposition 6.4. If $\nu_n/q\circ P$ on the VC class $\mathcal{C}$ converges in law in $(D, \mathcal{B}_b)$ to a Gaussian process with sample paths a.s. in $C_{b,u}(\mathcal{C}, d_P)$ for some $D \subset \ell^\infty(\mathcal{C})$, and $q(0) = 0$, then

(6.11) $q^2(t)/a_*(t) \to \infty$ as $t \to 0$.

Proof. We may assume $a_*(t) > 0$ for all $t > 0$; otherwise (6.11) is obvious. Suppose first $a_*(0+) > 0$. We may assume $A_1 \supset A_{1/2} \supset A_{1/3} \supset \cdots$. Let $A := \bigcap_{m \ge 1} A_{1/m}$. If $X_i \in A$ for some $1 \le i \le n$, then $X_i \in C_m$ for some $C_m \in \mathcal{C}$ with $P(C_m) \le 1/m$, for each $m \ge 1$. Thus $\limsup_{m\to\infty} |\nu_n(C_m)|/q(P(C_m)) = \infty$.
I for some Hence 102. P(supIVn(C) I/q(P(C)) = ] > P[X E A some i < n] C for all is n, which contradicts the assumption that Z0(C )-valued, since Hence we assume not hold. P(A) a*(Q+) = 0. Then there exist q2(t ) < Ma*(t ). = a,(0+) If i vn /qoP > 0. Suppose (6.11) does M > 1/2 and t. -+ is sufficiently large 0 with (an assumption made tacitly in all inequalities for the remainder of this proof) then there exists (8n M) 1 < a*(t.) (6.12) P(C) < (4n M) -. < < a(P(C)) If a(t P(C) n. with < t. then (4n M) )< 1< (2n )- and (6. 13) If X. J with q 2 (P (C)) E A t. P(C) V < t < q 2 < Ma*(t ) (ti j < n. for some and then there exists C E C E C, so by (6.12) and (6.13), X = 1. (4n ) 1 It follows that for each ni ( 4 ni) - -l - (2n.) n11/2 (n-1~ (C) q(P(C)) large < for which /2 e > 0 there are arbitrarily 103. P,[sup{ > JP[X (6.14) = (C)f/q(P(C)): C E (: |vn I 1 (1 - P(AtPAti)) - - < P(C) e} > 11 < nil ' (8n M) 1) >l -(1- > (1 j for some E A , exp(-l/8M))/2. From its covariance we see that the limiting Gaussian process must be var(G in (C)/q(P(C))) = - P(C ) (1 P(C))/q (P(C)) (r t Because this process if )-valued, must be bounded C, so (6.15) Fix G /qoP. P(C)/q(P(C)) 0 > 0 and as P(C) (C E C a*(t) > 0 Since n > 1. it follows from (6.15) that we can find nl1/2 P(C)/q(P(C)) < P (C) < 6, JP[X E C for some ) + 0. for all C E C Thus for each 0 > 0 and < n] < 1/2, (1 - n > 1, > 0, such that and j t exp(-l/8M))/4. 104. JP*[inf{Ivn(C)|/q(P(C)): C E C, , P(C) < e} > 1/2] (6.16) < (1 - exp(-1/8M))/4. From (6.14) and (6.16) we see that (2.5) cannot hold for I-I $n = v n/qoP, so the convergence in law cannot occur. Proposition 6.5. If verges in law in (D,18b) v n/qoP VC con- C class to a Gaussian process with sample ) Cb,u (Cd paths a.s. in on the for some subspace D of then (6.17) f 1-2 dt exp(-cq (t)/t) < for all 0t E Q some Proof. then If with q(0) > 0 or a*(0+) = 0, for so t > 0. for goP = goP. 
a*(t) = 0 (6.17) is trivial, so we assume a,(t) > 0 s > 0 for some g(0) = 0 t > 0, and By Proposition 6.4 we have contains sets of arbitrarily small C positive probability. Suppose (6.17) does not hold. there exists u > 0 (6.18) a sequence such that P (Ci+1 P( iU {C } in We will show that and a constant Z'(C 105. and 2 exp(-uq (P(Ci))/P(Ci)) =0 { (6.19) i=1 Fix 13 > 0 6 such that q(t) q(t)/t are T:= {P(C): C E C }. [0,61], and let monotone on and Then we can write (0,1)\T (rnISn)' U = 0<n<N 0 < N < o. a union of disjoint open intervals, for some Let M = card(I), I:= {n > 0: rn < s /3}, and S:= U (rn' n)' nEI Then we can write intervals, with S = U <m<M(a m,bm), b0 > b a union of disjoint Let > ... inf(S n [0,6 1) if S n [O,6i] / 6 if S n [0,6 ] = 0. 0 s:= a*(0+) = 0 Since am 0. can be Case 1: m > some m0. but Thus either M = w. Then a*(t) M = w or bm - 0, so Define cm:= q(bm)am/q(am), > 0 m > m0' for t > 0, s > 0. bM <1 for no 106. Then since q(t):= q E Qi, Then - d1 , we have bm Cm < bm, am S q(t) if t q(bm) if t E [Cm bm) q(am)t/am if t E so there exists Define [amfcmI. E > 0 such that b 0=~ f S dt exp(-q (t)/t)- 0 b exp(-Eq2 b )/t)d C ~ m m m < 0 (6.20) m cm f I + t exp(-Eq2 (a )t/a )- mm0 a a0 m I mm + 0 2 dt f exp(-cq (t)/t) bm+1 Thus we get three subcases. Case la. is infinite. The first sum on the right side of (6.20) bm E T, we have Since Proposition 6.4 we have q(O) = 0. to some q2 (t)/a(t) +o It follows that q2 (bm) /bm - 6 1ilog 2 m 1 > M0. a*(bm+) > bm. q2 (bm)/bm for all We have then m +w as as t 0 + m + By if m, so greater than or equal 107. b m 00 (b exp (-q2 f 0 m=m1 2-k I m0m m /t)dt exp (-eq2 (bm)/t) I fn 00 (6.21) < Y (2-kb -2-k- bm) (2 -k-lb ) k=0 InI m=m 1 exp (-2k q2 (b) /b m 00 m=m k=0 <x 2 2 exp (-eq (bm)/bm). M=M Now for each b <P(C m > l bM/3 and we can find a set Cm E C q2 (P(Cm))/P(Cm 2 bm)/2bm with (6.18) and (6.19), with u = E/2, then follow from (6.21). Case lb. The second sum on the right side of is infinite. 
as m + Similarly to Case la, we have o, so a/m E (6.22) <_ (am) < 1 m2 > M0 . equal to some O =0 2 for m Hence 00 exp(-eq (a )t/a )m=M2 am 00 ' 00 < m=m exp (-6q 2 (am)x/am) dx 002 2 exp (-sq (am) /am)' 2 q 2 (6.20) (a )/aM greater than or m 108. Now for each we can find m > 0 and 3aM+1 < P(Cm) < am Cm e C with 2 (am)/2 am. q2 (P(Cm))> (6.18) u = E/2, then follow from (6.22). and (6.19), with The third sum on the right side of (6.20) Case lc. For is infinite. let m > m0 £ > 0} [bm+1'am) n {32 bm+1 if bm+ 1 < am if bm+1 = am' Wm {b m+} {tk, k > 0} Let U W m>m0 be a sequence consisting of the set Then in decreasing order. m 9 tk 00~ CO= f J Y' exp(-Eq 2 (t)/t)dt dt m=m 0 tk E[bm+1 am I tk (6.23) < 8 exp (-eq 2 (tk)/ 9 t k) k=0 Now inf([tk,x) n T) < Ck E C with interval k > 1, so we can find tk < P(Ck) < 3t k [bm+ 1 ,am] as tk. holds, and (6.19), with Case 2. 0 we have s > 0. Let and Then q2 (P(Ck))/P(Ck) < 3q2 (tk)/tk and c > 3 tk, in the same P(Ck) P(Ck+l) < P(Ck)/ 3 for all k, so (6.18) u = e/3, follows from (6.23). vm:= s/9m, m > 0. For some 109. V 0 2 /t)-d dt exp(-E~q 2(t) Qo= f 0 .0 m = f m0 v m+1 1 8 7 exp(-q 2(vm )/v ). Now as in case lc we have can find 2 inm+l)vm+l mn m )/9vm m 1 exp (-Eq (vm m+1 0 mL(v = exp(-Eq2 (t)/t)t Cm eC inf([vm,m) n T) < 3 vm, so we vm < P(Cm) < 3vm = vm 1 /3, m > 1. with Then (6.18) and (6.19), with u = e/3, follow as in case lc. {C } Thus we have (6.19) in all cases. D Cmm U and u satisfying (6.18) and Define C., m > l, j>mI so the for D are disjoint. Since P(C) < 3-(-m)P(Cm) j > m, we have (6.24) P(Dm) >) ()P (C > P(Cm /4. 3>m Let W be a mean-0 Gaussian process indexed by with covariance EW(A)W(B):= P(A n B). Then Os , 110. (6 .25) Let ; ..(G P (C) ) pm ' > ul/2/2] , m > )/(P(C )) q(P(Cm))/p(cM Using the fact that S Now so by Proposition 6.4, > P (CM) , we have for large - P (C) W(X) ) , C E C (W (C) P [|W(Dm a* (P (CM)+) (6.26) = 1/2 + 0 as x -l m -- o. 
2 for large m: P (DM 1/2 u1/2 q (P (C )) P(C )l/2 2 P (C l/2- P D )l/2 |JW(DM) u1/2q (P CM W(Dm P (DM)1/2 P (Cm 1/2 > ( (27T) 1/22ul1/2q (P (CM) ) P (CM) -1/2 ) -lexp -uq2 (P ( m) ) /2P (Cm > Hence by exp (-uq 2 P( Cm) )/P ( Cm)' (6.19) , pm . = Since W (Dm), m > independent, it follows that (6.27) P [IW(Dm) I/q(P() ) > ul We can therefore define /2 T (j , j i.o. ] > , = 1. to be the are 111. j th index Then m such that IW(Dm)I/q(P(D is an r.v., and T(j) [T(j) in the c-algebra generated by k > 1 so Ck\ Dk is j)=k] 1 W(C m) D, W (D1 ) , .. . ,Dk X > 0. Fix = W(D m) = k] .. is an event . ,W(Dk), is independent of disjoint fzom on disjoint sets. )) > ul/2 / + W(Cm \Dm) for all since W(C k\ Dk) and W is independent Now a.s. for all m > 1, so by (6.26) and Chebyshev's inequality, for k > some we have )I/q (P(CT(j) )) > ul/2/41T(j) = k] P[JW(C >_ P[ W(C k \Dk) I/q(P(C k) ) (6.28) < ul/2 /41T = P[IW(Ck\Dk Sul/2q (P (Ck > P [W(C k\Dk0 - AP (Ck\DDk) 1/2 j) = k] >-2 Since T(j) > j, and k > k0 is arbitrary in (6.28) , it follows that P ( W(C T( ))I/q (P (C T( ) )) > u 1/2 /4] / >/4]>l -X -2 jj > k 112. so since X P(IW(C) is arbitrary, i.o.] /(Pm(C = 1. From (6.25) and (6.26) it then follows that P[ IGP(Cm) 1/1 (P(Cm)) > ul/2 /8 (6.29) n > 1 However, for each 00 :P[Pn (Cm) / 0] i.o.] = 1. we have 1 - (1 1 - (1 - - P(Cm) ) n m=1 m=1 00 3-m+1 n m=1 < so P (Cm) as m 3-m+1n M=1 a.s. -+ When combined with (6.26) this shows |V n (Cm) |/q(P (Cm)) - 0 as m- - a.s. This and (6.29) show the convergence in law cannot occur. The next three lemmas will be used in the proof of Theorem 6.1. (1974)0. The first is derived from a proof in O'Reilly I ___I 113. Lemma 6.6. f 0 (6.30) then q E Q + as o Suppose i > d- < o t 1, Hence for t q (t)/t is monotone on t, and exp (-sq2 (t) /t) q2 (t)/t Proof. q If with -+ 0. 4 0. [0,6]. t1 < t t /2 < t < t c > 0, for all Choose 6 > 0 M > 0 Then there exist /2, t1 < we have 6, and such that q2 t and i) < Mt . 
$q^2(t)/t \le Mt_i/t \le 2M$, so

$\int_0^{t_1} \exp(-q^2(t)/t)\,\frac{dt}{t} \ge \sum_{i=1}^{\infty}\int_{t_i/2}^{t_i} \exp(-2M)\,\frac{dt}{t} = \infty.$ □

Lemma 6.7. Suppose $\mathcal{C} \subset \mathcal{A}$ and $q \in Q$. Let $g$ and $h$ be nonnegative functions on $[0,\infty)$, with $h$ strictly increasing, $h(0) = 0$, and

(6.31) $g(x) \sim h^{-1}(x)$ as $x \to \infty$.

Then

(6.32) $\int_0^\infty a(q^{-1}(1/h(x)))\,dx < \infty$

if and only if there exists a measurable r.v. $F$ on $(X, \mathcal{A}, P)$ such that both

(6.33) $1_C/q(P(C)) \le F$ for all $C \in \mathcal{C}$

and

(6.34) $\int g(F)\,dP < \infty$.

Proof. By (6.31), (6.34) is equivalent to $\int h^{-1}(F)\,dP < \infty$, so we may assume $g = h^{-1}$.

Suppose first (6.32) holds. Let $S$ be the set of points of discontinuity of $a(t)$, $T$ the set of rational numbers, and $U := S \cup T$. For $t \ge 0$ let

$H_t := \bigcap\{F_s : s \ge t,\ s \in U\}.$

Then $H_t \in \mathcal{A}$, since $U$ is countable, and $E_t \subset H_t$ and $H_s \subset H_t$ for $s < t$. Recalling that $E_t \subset F_t$ and $P^*(E_t) = P(F_t)$, we see that $a(t) = P^*(E_t) = P(H_t)$ for all $t$. Define

$F(x) := 1/\inf\{q(t) : x \in H_t\}$, $x \in X$.

Then for $u > 0$,

(6.35) $F(x) > u$ if and only if $x \in H_t$ and $q(t) < 1/u$ for some $t \ge 0$.

Since $q$ is continuous and the sets $H_t$ increase with $t$, we may take the $t$ in (6.35) to be rational; it follows that $F$ is measurable. If $u > 0$ and $1_C(x)/q(P(C)) > u$, then $q(P(C)) < 1/u$ and $x \in C$, so $x \in H_t$ and $q(t) < 1/u$ for some $t$, so $F(x) > u$. From this (6.33) follows. Using (6.35) we have

$\int g(F)\,dP = \int_0^\infty P[h^{-1}(F) > u]\,du = \int_0^\infty P[F > h(u)]\,du \le \int_0^\infty P(H_{q^{-1}(1/h(u))})\,du = \int_0^\infty a(q^{-1}(1/h(u)))\,du < \infty.$

Conversely assume (6.33) and (6.34) hold. If $q(0) > 0$ then $a(q^{-1}(1/h(x))) = a(0) = 0$ for large $x$, and (6.32) holds; hence we assume $q(0) = 0$. Then there exists $K > 0$ such that for $u \ge K$, $t \le q^{-1}(1/h(u))$ implies $q(t) \le 1/h(u)$. If $u \ge K$ and $x \in E_{q^{-1}(1/h(u))}$, then $x \in C$ for some $C$ with $q(P(C)) \le 1/h(u)$, so $F(x) \ge 1_C(x)/q(P(C)) \ge h(u)$. Hence

$\int_0^\infty a(q^{-1}(1/h(u)))\,du = \int_0^\infty P^*(E_{q^{-1}(1/h(u))})\,du \le K + \int_K^\infty P[F \ge h(u)]\,du = K + \int_K^\infty P[g(F) \ge u]\,du < \infty.$ □

Lemma 6.8. Suppose the VC class $\mathcal{C}$ is size-$\delta$ measurable with constants for $P$ for some $\delta > 0$, and let $q \in Q_1$ with

(6.36) $q^2(t)/a(t) \to \infty$ as $t \to 0$,

(6.37) $\int_0 \exp(-\varepsilon q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$.

Let $\theta > 0$ and $0 < \eta < 1$; then there exist $\delta > 0$ and $n_1$ such that for $n \ge n_1$,

(6.38) $P^*[\sup\{|\nu_n(C)|/q(P(C)) : C \in \mathcal{C},\ P(C) \le \delta\} > \theta] < \eta$.
with - 0, exp (-eq 2 (t) /t)!-at and C for all E > 0. 6 > 0 < 1; then there exist n > n1, (6.38) 3P*[sup{|vn(C)I/q(P(C)): C E C , P(C) < 6} > Proof. If q(0) > 0 of Theorem 3.8. Hence we assume v:= V(C ), large enough so eI < n. then (6.38) is an easy consequence q(0) = 0. a(0+) = 0, and clearly we may assume Let and fix 0 < X < 1/3, By a(t) > 0 and let (6.36), for all M > 0 t be > 0. 117. (6.39) 2048 (eAnl/27 (6.40) (16K (v) + 8)exp(-eAn 1 /2M/2048) < n/ Let r be the least integer such that Let 6 > 0, and t.: -1-)-llog 2 < M, qi) ( as defined above Lemma 5.4. C 6 q is increasing on [0,6] (6.42) 2 x2 q 2 (t)/16a(t+) > 1 (6.43) 2 x2 2 (t)/8t > 36 log 2 [ (6.44) 6 2log 2 X (6.45) j=0 Let > r is (6.37), and to be small enough and for all for all for all q t E P [0,6] and for all t < 6, t < 6, t < 6, exp(-O2 x2 q 2(t )/2048t.) < n/(16K (v) + 8). (6v sup{t: a(t) < n/2n}, n > 1. a n /2 that for (6.46) 0, where is size-6 measurable with constants for (6.41) have > . (2 13V)1/ 2 2-r/ 2 < 1/2. By (6.36), Lemmas 5.4 and 6.6, we can take so j ), 4 (an) +O as n -+ By (6.36) we o, so we can find n > n eAnl/2q(a n)/2 > 512 log 2, n1 such j, 118. (6.47) .en l/-2 q(an) [4096v log 21 (6.48) n /2q(an (6.49) 8Xn 1 / 2 q(an )/6 (6.50) an < 6. Fix such that (6.51) t. > an If - M C ce , 1/2 (6.50) and (6.41) there is an integer By n > n 1. N > 0 ->r if and only if P (C) < tN+1. j and < N. P (C) = 0, then by 1 Ivn (C) I/q(P(C)) = n /2P (C) /q (P (C) ) < (na (tN+1)) 1/2P (C)1/ 2 /q (P (C) ) It follows that < (rI/2)1/2 < e. 22/288 log 2)1/2 (6.43) , 119. P*[sup{Ivn (C) /q(P (C)): < P*[P C E C > 0 for some (C) C E C , P(C) < tN+l with P(C) > e] < tN+1 (6.52) < na(tN+1I < -n/2. Define g(t):= max(a(t)/t,1), h(t):= Xq(t)min(g(t) l/2,(na(t+) )1/2) for t > 0. Then C E C F*[sup{Ivn(C) I/q(P(C)) , tN+1 < P(C) < 6} > e] N P[sup{ Vn(C) I: , tj+1- C E C P(C) < t } > eq(t j+l) j=0 (6.53) N > Oh(t )/2] <[Ivn(Ft J=0J N + I J=0 0 < Fix sup F[ 2 (t i j < N, IVn(C) , t I > Aq(t ) Ivn (F t)I <Qh(t )/2]. 
,P) and let kl, k2 be nonnegative integers such that (6.54) Ivn (Ft. ) J < eh(t )/2 if and only if k1 < nP n(Ft ) J < k ' 2 120. H(.):= P(.IFt.) Define the measure on , and let J CO k > 1 For H:= H let such that i > 1 k least indices the following functions on ~ l k k . '_ H := 1 -6 1 6 X. i 2k , Xi 2 (Hk - H), Pk: k yk: k/2 (Hk - H) k1/2 ( - Hk) yk:= k (Hk , c): Xi i=k+l P0 (X Xi E Ft., and define l iELk Hk- be the (random) set of the Lk (Hk will be well defined a.s. in our context.) the set [nPn(Ft.) = k] Then on we have J (6.55) C E C , P(C) < t Also (6.56) Hk = Hk 1HI- a.s. implies kH k(C) = nPn(C). 121. Fix k < k < k2 - IkH (C) if C and nP (C.) nP (C) (F P (C) < t. then by (6.54), P( k )P n JJ < nl1/2 IVn(Ft.) /g(t) (6.57) J By for 0 2 (t assumption, o Let P. 1- , C,P); J k lIH2k - is size-6 measurable with constants denote the sup norm for functions on then - HII kl/2 (P = k P (Ft. P2k P (F P ) t. so since H 2n « P 2n it follows that k-deviation-measurable for and (6.57), H. S(t , Hence by (6.55), P) is (6.56), 122. JP[l lvn > QAq(t ) nP i = Ei[I JkHk - nPI n (F) =k] > xn 1 /2 q(t )] > t/ E)Xn 1/2 q(t.)/2] (6.58) <H[IIkHk - kH Ij ="[I lkI i > e" (n/k)1/2q (t )/21. We wish to apply Theorem 3.8 to the right side of (6.58), and with . C 2 (t , C , P) , u:= e X (n/k) 1/2q (t ) /2, := 6:= 1/g (t ) . J:= {j: We may assume j 0 < n/k > (a(t. ) + eh (t (6.59) while by (6.60) u2 2 x2 # 0. 2 Then by (6.54), ) /2nl1/2 )-1 > (2a(t )) 1 (t )/8a(t ) > 1, (6.43), u 2/ > Let < N, Qh(t ) < 2n 1 /2a(t )}. j E J. Case 1: k 2 2 2 (t ( )/8t > 36 log 2, which establishes (3.52); by (6 .46)1, so by (6.42) 123. k 1 /2u (6.61) > eXn 1 /2q(t )/2 > 512 log 2, which establishes (3.53) with and r (1) by r (2) k in place of be as in (3.50), but with k; by (6.60), the left half of (6,61), (6.47) we have r > r and (3.51) follows from (6.59) . n n. 
replaced (6,44), and r (2) > r, so Thus we may apply Theorem 3.8 and the left halves of (6.60) and (6.61) to get >y e (n/k) 1/ 2 q(t)/2] (6.62) < 4K (v)exp(-2x2 2 (t )/2048t) + 4K (v)exp(-6Xn1 /2 q(tk)/2048) Case 2: J. Then by (6.54), n/k > (a (t ) + eh (t (6.63) /2nl/2 > nl1/2 /8h(t) 1/2 -1 > (0Xq(t )a(t +) so by (6.42) , Let - if j E J. 124. u 2 > eXq(t )/4a(t +) 1/2 > (6.64) while by the second inequality in (6.63) and (6.46), u2 2n/2 2 (t.)(t) ) )/4 > enl/2q(t (6.65) /4h (t >36 log 2. again be as in (3.50), but with r (2) r Let Also (6,61) is valid in this case also. and rep.aced by n k; by the second inequality in (6.65), the first in (6.61), and (6.47) we have follows from (6.64). r (1) > r and r (2) > r, so (3.51) Thus we may apply Theorem 3.8, the second inequality in (6.65), and the first in (6.61) to get JH[IIpk j > 6X (n/k)l/2q(t )/2] (6.66) < 8K1 (v)exp(-6Xnl/2q(t )/2048) if We turn next (for both cases) to the first sum on the right side of (6.53). j9 jth Define J. term of the 125. w(s):= s/2(1 + s/3), y:= eh(t )/2n T:= s > 0, a(t )(1 - a(t )) , enl/2h(t )/2. w(s) > min(s/6,1), and Then T = eXnl/ 2 q (t )min (g (t.) //2, (na (t +) )1/2/2) > e r1/2 n 1/2q(t )/4 so by Bernstein's inequality (see Hoeffding (1963) ) and (6.49), P[ vn (Ft. > Oh(t )/21 < 2 exp(-Tw(y)) < 2 exp(-T) + 2 exp(-TY/6) < 2 exp(-xrI1 / 2 n1 /2q(t)/4) + 2 exp(-6 2 h 2 (t )/24a(t )) (6.67) < 2 exp(-eXn 1/2 n1 / 2 q(t )/4) + 2 exp(- 2 X2 2 (t )/24t ) + 2 exp(-2 x 2 nq2 (t.)/24) <4 exp (-e6 A11/2nl1/2 q t )/4 ) + 2 exp(-62 x2 q 2 (t )/24t.). 126. j Since (6.53), < N (6.54), are arbitrary, from k1< k < k2 and (6.62), (6.58), P* [sup{| v n(C) I/q(P (C)): (6.66), and (6.67) we get C E C, , tN+1: l P (C) < 6} > 6] N 2 1/q 4 exp(-6-n t )/4) j=0 + N 7 2 exp(-e2 x2 q2 (t )/24t.) j=0 N + 4K (v)exp(-2X2 q2 (t.)/2048t.) I j=0 (6.6s) N + 8K 1 (v)exp(-Xnl/2q(t )/2048) I j=0 N < (8K 1 (v) + 4) exp (-Xnl /2n /2q(t )/4) 7 j=0 + exp(-6 2 X 2 (4K (v) + 2) 1 2 (t.)/ 2048t ) . 
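Since (5.22) with $u := \theta\lambda\eta^{1/2}/2048$ follows from (6.39), (6.48), and (6.51), we can apply Lemma 5.4, (6.41), (6.45), and (6.40) to the right side of (6.68) to bound it by $\eta/2$. In combination with (6.52), this proves the lemma. □

Proof of Theorem 6.1. We use Theorem 2.13. Define a map $\varphi : \mathcal{C} \to L^2(X,\mathcal{A},P)$ by $\varphi(C) = 1_C$, and a map $\psi$ from the closure $T$ of $\varphi(\mathcal{C})$ to $L^2(X,\mathcal{A},P)$ by $\psi(1_C) := 1_C/q(P(C))$. Let $f_C := \psi(\varphi(C))$, $C \in \mathcal{C}$, and $\mathcal{F} := \psi(\varphi(\mathcal{C}))$. Since $\|\psi(1_C)\|_2 = P(C)^{1/2}/q(P(C))$, by Lemma 6.6 $\psi$ is continuous. But $\varphi(\mathcal{C})$ is totally bounded, so $T$ is compact, so $\psi$ is a homeomorphism. Hence $\mathcal{F}$ is totally bounded, and given $\delta > 0$ there exists $\gamma > 0$ such that

(6.69) $\int (f_A - f_B)^2\,dP < \gamma^2$ implies $P(A \Delta B) < \delta$.

Let $\tau(n)$ be 0 for all $n$; then (5.8) and (5.9) follow from Lemmas 6.6 and 6.8, so by Lemma 5.3 we have (2.5) holding for $\xi_n = \nu_n/q\circ P$. Hence using (6.69), given $\varepsilon > 0$ we can find $\gamma, \delta > 0$ and $n_0$ such that for $n \ge n_0$,

$P^*[\sup\{|\int (f-g)\,d\nu_n| : f, g \in \mathcal{F},\ \int (f-g)^2\,dP < \gamma^2\} > \varepsilon]$
(6.70) $\le P^*[\nu_n/q\circ P \in B_{\delta,\varepsilon}(P)] < \varepsilon$,

i.e. (2.13) holds. Thus we get $Y_j$, $j \ge 1$, satisfying (2.14)-(2.16); let $G_j := Y_j\circ\psi\circ\varphi$. Then $\mathcal{L}(G_j) = \mathcal{L}(G_P/q\circ P)$ and $G_j$ has a.s. $d_P$-continuous sample paths; the remainder of the theorem follows from Theorem 2.13 and Lemma 6.7, using the latter first with $g(x) = x^2$, $h(x) = x^{1/2}$, then with $g(x) = x^2/LLx$, $h(x) = (xLLx)^{1/2}$. □

Following Dudley (1978), let $D_0(\mathcal{C}, d_P)$ denote the space of all functions in $\ell^\infty(\mathcal{C})$ which are the sum of a $d_P$-continuous function and a finite linear combination $\sum_i b_i\delta_{x_i}$ of point masses, where $\delta_x(C) = 1_C(x)$ for $x \in X$. For $q \in C[0,1]$, let

$D_0(\mathcal{C},P,q) := \{f/q\circ P : f \in D_0(\mathcal{C}, d_P),\ f/q\circ P \in \ell^\infty(\mathcal{C})\}.$

Example 6.9. Let $d \ge 1$, $X := [0,1]^d$, $C_x := [0,x] := \prod_{i=1}^{d}[0,x_i]$ for $x \in X$, $\mathcal{C} := \{C_x : x \in X\}$, and let $P$ be the uniform law on the Borel sets of $X$. Then $\nu_n(C_x)$ is the normalized empirical distribution function at $x$. Here $E_t = \{x \in X : x_1\cdots x_d \le t\}$, so $a(t) = a_*(t)$. We claim that

$a(t) = t\sum_{j=0}^{d-1}(\log(1/t))^j/j!$, $0 < t \le 1$.

Proceeding by induction on $d$, we suppose this to be true for $d = k$, and let $a_k(t)$ be the function $a(t)$ corresponding to $d = k$. Then

$a_{k+1}(t) = t + \int_t^1 a_k(t/x)\,dx = t + \sum_{j=0}^{k-1}\int_t^1 (t/x)(\log(x/t))^j\,dx/j! = t + t\sum_{j=0}^{k-1}\int_0^{\log(1/t)} u^j\,du/j! = t\sum_{j=0}^{k}(\log(1/t))^j/j!,$

and the claim follows.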
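The closed form claimed for $a(t)$ in Example 6.9 can be checked by simulation. The sketch below is in Python and is not part of the thesis; the function names `a_closed_form` and `a_monte_carlo` are hypothetical. It compares $t\sum_{j<d}(\log(1/t))^j/j!$ with a Monte Carlo estimate of the probability that $x_1\cdots x_d \le t$ for $x$ uniform on $[0,1]^d$.

```python
import math
import random

def a_closed_form(t, d):
    """a(t) = t * sum_{j<d} (log(1/t))**j / j!  for 0 < t <= 1 (Example 6.9)."""
    s = math.log(1.0 / t)
    return t * sum(s ** j / math.factorial(j) for j in range(d))

def a_monte_carlo(t, d, n_samples, seed=0):
    """Estimate P(x_1 * ... * x_d <= t) for x uniform on [0,1]^d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        prod = 1.0
        for _ in range(d):
            prod *= rng.random()
        if prod <= t:
            hits += 1
    return hits / n_samples

exact = a_closed_form(0.1, 3)
estimate = a_monte_carlo(0.1, 3, 200_000)
print(abs(exact - estimate) < 0.01)
```

For $d = 1$ the formula reduces to $a(t) = t$, and for every $d$ it gives $a(1) = 1$, which are convenient sanity checks on the closed form.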
Hence $a(t) \sim t(\log(1/t))^{d-1}/(d-1)!$ as $t \to 0$, so (6.1) implies (6.2) if $d \ge 2$. Let $b_d(t) := t(\log(1/t))^{d-1}$, and let $V$ be a countable subset of $X$ such that for each $x \in X$ there exists a sequence $x^{(n)}$ in $V$ with $x^{(n)}_i \to x_i$ as $n \to \infty$ for each $i \le d$. Applying Lemma 2.10 with $\{C_x : x \in V,\ P(C_x) \le t\}$ and $\mathcal{C}_1(t,\mathcal{C},P)$, or with $\{C_x \setminus C_y : x, y \in V,\ P(C_x \setminus C_y) \le t\}$ and $\mathcal{C}_2(t,\mathcal{C},P)$, we see that $\mathcal{C}$ is size-1 measurable with constants for $P$, and that $D_0(\mathcal{C},P,q)$ is admissible for $\nu_n/q\circ P$ and $P$ if $\{\nu_n/q\circ P\}$ is $\ell^\infty(\mathcal{C})$-valued a.s. for all $n$. Hence Theorem 6.1 and Corollary 6.2 give the following, which was proved in part by O'Reilly (1974) in the case $d = 1$; it is also related to the functional LIL of James (1975).

Corollary 6.10. Let $q \in Q_1$ and let $P$ be the uniform law on $[0,1]^d$. Then $\nu_n/q\circ P$ converges in law in $(D_0(\mathcal{C},P,q), \mathcal{B}_b)$ to $G_P/q\circ P$, and the latter has a.s. uniformly $d_P$-continuous sample paths, if and only if

(6.71) $\int_0 \exp(-\varepsilon q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$, if $d = 1$;
$q^2(t)/(t(\log(1/t))^{d-1}) \to \infty$ as $t \to 0$, if $d \ge 2$.

If $\int_0^\infty b_d(q^{-1}((xLLx)^{-1/2}))\,dx < \infty$ and (6.71) holds, then $\{\nu_n/q\circ P\}$ satisfies the compact and functional LIL's in $\ell^\infty(\mathcal{C})$.

When $d = 1$, as discussed by O'Reilly (1974), the condition (6.71) for convergence in law is the same as that for sample continuity of the limiting Gaussian process $G_P/q\circ P$. This is not true for $d \ge 2$, however; by Theorem 2.2 of Pruitt and Orey (1973), the function $q(t) = (t|\log t|)^{1/2}$ makes $G_P/q\circ P$ sample-continuous but does not satisfy (6.71).

For the class $\{C^c : C \in \mathcal{C}\}$, $d \ge 1$, it is easy to show that $a(t) \sim dt$ as $t \to 0$.

Example 6.11. Let $d \ge 1$, $X := \mathbb{R}^d$, and $C_{w,b} := \{x \in \mathbb{R}^d : x\cdot w \ge b\}$ for $w$ in the sphere $S^{d-1}$ and $b \in \mathbb{R}$. Let $P$ be a nondegenerate normal law on $\mathbb{R}^d$, and let $\mathcal{C} := \{C_{w,b} : (w,b) \in S^{d-1} \times \mathbb{R}\}$ be the class of all closed half spaces in $\mathbb{R}^d$. Then there is an affine map $A : \mathbb{R}^d \to \mathbb{R}^d$ such that $P\circ A^{-1} = N(0,I)$.
Since $A$ maps $\mathcal{C}$ one-to-one onto itself, $a(t)$ is the same for $P$ as for $N(0,I)$. Hence we may assume $P = N(0,I)$. Let $\Phi$ be the distribution function of $N(0,1)$, and let $\chi^2_d$ be a chi-squared r.v. with $d$ degrees of freedom. Now $P(C_{w,b}) \le t$ if and only if $b \ge \Phi^{-1}(1-t)$, so $E_t = \{x \in \mathbb{R}^d : \|x\| \ge \Phi^{-1}(1-t)\}$. Let $r_t := \Phi^{-1}(1-t)$; since

$t \sim (2\pi)^{-1/2}r_t^{-1}\exp(-r_t^2/2)$ and $r_t \sim (2\log(1/t))^{1/2}$ as $t \to 0$,

it follows that as $t \to 0$,

$a(t) = P[\|x\| \ge r_t] = \Pr[\chi^2_d \ge r_t^2] \sim C_1 r_t^{d-2}\exp(-r_t^2/2) \sim C_2\,t(\log(1/t))^{(d-1)/2},$

where $C_1$ and $C_2$ are constants. Let $V$ be a countable dense subset of $S^{d-1} \times \mathbb{R}$; similarly to Example 6.9, we see that $\mathcal{C}$ is size-1 measurable with constants for $P$, and that $D_0(\mathcal{C},P,q)$ is admissible for $\nu_n/q\circ P$ and $P$ provided $\{\nu_n/q\circ P\}$ is a.s. $\ell^\infty(\mathcal{C})$-valued for all $n$. Thus Theorem 6.1 and Corollary 6.2 give the following.

Corollary 6.12. Let $q \in Q_1$, let $P$ be a nondegenerate normal law on $\mathbb{R}^d$, and let $\mathcal{C}$ be the class of all closed half spaces in $\mathbb{R}^d$. Then $\nu_n/q\circ P$ converges in law in $(D_0(\mathcal{C},P,q), \mathcal{B}_b)$ to $G_P/q\circ P$, and the latter has a.s. uniformly $d_P$-continuous sample paths, if and only if

(6.72) $\int_0 \exp(-\varepsilon q^2(t)/t)\,\frac{dt}{t} < \infty$ for all $\varepsilon > 0$, if $d = 1$;
$q^2(t)/(t(\log(1/t))^{(d-1)/2}) \to \infty$ as $t \to 0$, if $d \ge 2$.

If $\int_0^\infty f_d(q^{-1}((xLLx)^{-1/2}))\,dx < \infty$ and (6.72) holds, where $f_d(t) = t(\log(1/t))^{(d-1)/2}$, then $\{\nu_n/q\circ P\}$ satisfies the compact and functional LIL's in $\ell^\infty(\mathcal{C})$.

The functions $a(t)$ which can occur for some $\mathcal{C}$ and $P$ are quite general. In fact, let $a(t)$ be any left-continuous increasing function on $[0,1]$ with $a(0) = 0$ and $a(1) \le 1$. Then letting $P$ be the uniform law on $[0,1]$ and

$\mathcal{C} := \{[\alpha,\beta] : 0 \le \alpha \le \beta \le a(\beta-\alpha)\} \cup \{[0,t] : a(t+) = t\}$

gives rise to the given function $a(t)$. Note that in general $a(t)$ is left-continuous if $E_t$ is measurable for all $t$.
REFERENCES

Bhattacharya, R. N., and Rao, R. Ranga (1976). Normal Approximation and Asymptotic Expansions. John Wiley and Sons, New York.

Devroye, L. (1982). Bounds for the uniform deviation of empirical measures. J. Multivar. Anal. 12, 72-79.

Dudley, R. M. (1973). Sample functions of the Gaussian process. Ann. Probability 1, 66-103.

Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probability 6, 899-929.

Dudley, R. M., and Philipp, W. (1982). Invariance principles for sums of Banach space valued random elements and empirical processes. Preprint.

Fernandez, P. J. (1974). Almost surely convergent versions of sequences which converge weakly. Bol. Soc. Brasil. Mat. 5, 51-61.

Goodman, V., Kuelbs, J., and Zinn, J. (1981). Some results on the LIL in Banach space with applications to weighted empirical processes. Ann. Probability 9, 713-752.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

James, B. R. (1975). A functional law of the iterated logarithm for weighted empirical distributions. Ann. Probability 3, 767-772.

Kiefer, J. (1961). On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm. Pacific J. Math. 11, 649-660.

Kiefer, J., and Wolfowitz, J. (1958). On the deviations of the empiric distribution function of vector chance variables. Trans. Amer. Math. Soc. 87, 173-186.

Kuelbs, J., and Dudley, R. M. (1980). Log log laws for empirical measures. Ann. Probability 8, 405-418.

Kuelbs, J., and LePage, R. (1973). The law of the iterated logarithm for Brownian motion in a Banach space. Trans. Amer. Math. Soc. 185, 253-264.

Lévy, P. (1937, 1954). Théorie de l'addition des variables aléatoires. Gauthier-Villars, Paris.

Marczewski, E., and Sikorski, R. (1948). Measures in non-separable metric spaces. Colloq. Math. 1, 133-139.

O'Reilly, N. (1974). On the convergence of empirical processes in sup-norm metrics. Ann. Probability 2, 642-651.

Pisier, G. (1975). Le théorème de la limite centrale et la loi du logarithme itéré dans les espaces de Banach. Séminaire Maurey-Schwartz 1975-76, Exposé IV, Ecole Polytechnique, Paris.

Pollard, D. (1981). A central limit theorem for empirical processes. To appear in J. Australian Math. Soc.

Pruitt, W. E., and Orey, S. (1973). Sample functions of the N-parameter Wiener process. Ann. Probability 1, 138-163.

Shorack, G. R., and Wellner, J. A. (1982). Limit theorems and inequalities for the uniform empirical process indexed by intervals. To appear in Ann. Probability.

Stout, W. (1974). Almost Sure Convergence. Academic, New York.

Vapnik, V. N., and Červonenkis, A. Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theor. Probability Appl. 16, 264-280. (Teor. Verojatnost. i Primenen. 16, 264-279, in Russian.)

Wichura, M. J. (1970). On the construction of almost uniformly convergent random variables with given weakly convergent image laws. Ann. Math. Statist. 41, 284-291.

About the Author

Kenneth Alexander was born March 3, 1958 and raised in Seattle, Washington, where he attended Shoreline High School and the University of Washington, receiving his B.S. in 1979. Since 1979 he has studied at M.I.T. under a National Science Foundation Graduate Fellowship. In August 1982 he will be married to Chris Czarnecki.