Useful Probability Theorems

Shiu-Tang Li

Finished: March 23, 2013. Last updated: November 2, 2013.

1 Convergence in distribution

Theorem 1.1. TFAE:
(i) µn ⇒ µ, where µn, µ are probability measures.
(ii) Fn(x) → F(x) at each continuity point of F.
(iii) E[f(Xn)] → E[f(X)] for every bounded continuous function f.
(iv) φn(x) → φ(x) uniformly on every finite interval. [Breiman]
(v) φn(x) → φ̃(x), where φ̃ is continuous at 0.
(Here φn, φ denote the corresponding characteristic functions.)

Theorem 1.2. Let Xn → c in dist., where c is a constant. Then Xn → c in pr.

Theorem 1.3. Let Xn → X in dist., Yn → 0 in dist., Wn → a in dist., and Zn → b in dist., where a, b are constants. Then:
(1) aXn + b → aX + b in dist. (Consider continuity intervals.)
(2) Xn + Yn → X in dist.
(3) XnYn → 0 in dist.
(4) WnXn + Zn = aXn + b + (Wn − a)Xn + (Zn − b) → aX + b in dist.
Remark. We only need to memorize (4). (Chung)

Theorem 1.4. If Xn → X in dist. and f ∈ C(R), then f(Xn) → f(X) in dist. (Chung; consider bounded continuous test functions.)

Theorem 1.5. (Lindeberg–Feller; a generalization of the CLT.) For each n, let Xn,m, 1 ≤ m ≤ n, be independent random variables with E[Xn,m] = 0. Suppose
(1) Σ_{m=1}^{n} E[Xn,m^2] → σ^2 > 0 as n → ∞;
(2) for all ε > 0, lim_{n→∞} Σ_{m=1}^{n} E[Xn,m^2; |Xn,m| > ε] = 0.
Then Sn = Xn,1 + · · · + Xn,n converges to N(0, σ^2) in distribution. (Durrett)

2 Convergence in probability

Theorem 2.1. (Chebyshev's inequality.) If φ is a strictly positive and increasing function on (0, ∞) with φ(u) = φ(−u), and X is an r.v. such that E[φ(X)] < ∞, then for each u ≥ 0 we have
P(|X| ≥ u) ≤ E[φ(X)] / φ(u). (Chung)

Theorem 2.2. (Useful lower bound.) Let X ≥ 0, E[X^2] < ∞, and 0 ≤ a < E[X]. Then P(X > a) ≥ (E[X] − a)^2 / E[X^2]. (Durrett)

Theorem 2.3. Let Xn → X in pr., Yn → Y in pr., and f ∈ C(R). Then f(Xn) → f(X) in pr., Xn ± Yn → X ± Y in pr., and XnYn → XY in pr.

Theorem 2.4. Xn → X in pr. if and only if for every subsequence {Xn_j} of {Xn} we may pick a further subsequence {Xn_{j_k}} so that Xn_{j_k} → X a.s. (Durrett, Theorem 1.6.2)

3 Convergence a.s.

Theorem 3.1.
(Borel–Cantelli Lemma.)

Theorem 3.2. (Kolmogorov's maximal inequality.) Suppose Sn = X1 + · · · + Xn, where the Xj's are independent and in L^2(P). Then for all λ > 0 and n ≥ 1,
P(max_{1≤k≤n} |Sk − E[Sk]| > λ) ≤ Var(Sn) / λ^2.

Theorem 3.3. (Kronecker's Lemma.) Let {xn} be a sequence of real numbers and {an} a positive sequence that increases to infinity. If Σ_{n=1}^{∞} xn/an converges, then (1/an) Σ_{j=1}^{n} xj → 0. (Chung)

Theorem 3.4. Let {Xn} be a sequence of independent r.v.'s with E[Xn] = 0 for every n, and let 0 < an ↑ ∞. If φ ∈ C(R) is positive and even, with φ(x)/|x| ↑ and φ(x)/x^2 ↓ as |x| increases, and Σ_{n=1}^{∞} E[φ(Xn)]/φ(an) converges, then Σ_{n=1}^{∞} Xn/an converges a.s. (Chung; how to use this?)

Theorem 3.5. Let {Xn} be i.i.d. and p > 0. Then TFAE: (exercise 6.10 of Davar)
(1) E|Xn|^p < ∞.
(2) Xn = o(n^{1/p}) a.s.
(3) max_{1≤j≤n} |Xj| = o(n^{1/p}) a.s.

Theorem 3.6. If {Xn} are i.i.d. and in L^1(P), then lim_{n→∞} (1/ln n) Σ_{j=1}^{n} Xj/j = E[X1] a.s. (exercise 6.29 of Davar)

Theorem 3.7. Let {Xn} be i.i.d. with E[Xn] = 0, Sn := X1 + · · · + Xn, and assume E|Xn|^p < ∞. Then: (Durrett)
(1) When p = 1, Sn/n → 0 a.s. (SLLN)
(2) When 1 < p ≤ 2, Sn/n^{1/p} → 0 a.s.
(3) When p = 2, Sn / (n^{1/2} (log n)^{1/2+ε}) → 0 a.s. for all ε > 0.
Remark. When E|Xn|^2 = 1, we have lim sup_{n→∞} Sn / (2n log log n)^{1/2} = 1 a.s. (Durrett)

Theorem 3.8. Let X1, X2, · · · be i.i.d. mean-0, variance-σ^2 random variables, and put Sn := X1 + · · · + Xn for each n ≥ 1. Then lim sup_n Sn = ∞ a.s. (U of Utah spring 2008 qualifying exam; may use the CLT and Kolmogorov's 0–1 law to prove it.)

Theorem 3.9. Let X1, X2, · · · be i.i.d. with E|Xn| = ∞. Then lim sup_n |Sn|/n = ∞ a.s. (Chung; hint: prove P(|Xn| > n i.o.) = 1, and then P(|Sn| > n i.o.) = 1.)

4 Random Series

Theorem 4.1. (Kolmogorov's three-series theorem.) Let X1, X2, · · · be independent. Let A > 0 and Yn = Xn 1_{|Xn| ≤ A}. Then Σ_{n=1}^{∞} Xn converges a.s. if and only if all of the following three conditions hold:
(1) Σ_{n=1}^{∞} P(|Xn| > A) < ∞.
(2) Σ_{n=1}^{∞} E[Yn] converges.
(3) Σ_{n=1}^{∞} Var(Yn) < ∞.

Theorem 4.2. (Lévy's theorem.) If X1, X2, · · · , Xn, · · · is a sequence of independent random variables and Sn := X1 + X2 + · · · + Xn, then TFAE: (Varadhan)
(1) Sn converges weakly to some r.v. S.
(2) Sn converges in pr. to some r.v. S.
(3) Sn converges a.s. to some r.v. S.
Remark. For (2) ⇒ (3), we may refer to Theorem 5.3.4 of Chung.

5 Uniform Integrability

Theorem 5.1. The family {Xα} is uniformly integrable if and only if the following two conditions are satisfied: (Chung)
(1) sup_α E|Xα| < ∞.
(2) For every ε > 0 there exists some δ(ε) > 0 such that for any event E with P(E) < δ(ε), we have ∫_E |Xα| dP < ε for all α.

Theorem 5.2. Let Xn → X in dist., and suppose that for some p > 0 we have sup_n E|Xn|^p < ∞. Then E|Xn|^r → E|X|^r for every r < p. (Chung; this provides a criterion to check uniform integrability; see Theorem 5.6.)

Theorem 5.3. If sup_n E|Xn|^p < ∞ for some p > 1, then {Xn} is u.i. (Consider E[|Xn|; |Xn| > A] ≤ (E|Xn|^p)^{1/p} (E|Xn|^p / A^p)^{1/q}.) (Chung)

Theorem 5.4. If {Xn} is dominated by some Y ∈ L^p and converges in dist. to X, then E|Xn|^p → E|X|^p. (Chung; to prove it, first show that E|X|^p < ∞ — otherwise we may pick A large so that E[|X|^p ∧ A] > E[|Y|^p]; also use the convergence E[|Xn|^p ∧ A] → E[|X|^p ∧ A] for every finite A > 0. This theorem can also be used to check uniform integrability; see Theorem 5.6.)

Theorem 5.5. Let sup_n |Xn| ∈ L^p and Xn → X. Then X ∈ L^p (by Fatou) and Xn → X in L^p (by the DCT). (Chung)

Theorem 5.6. Let 0 < p < ∞, Xn ∈ L^p, and Xn → X in pr. Then TFAE: (Chung)
(1) {|Xn|^p} is u.i.
(2) Xn → X in L^p.
(3) E|Xn|^p → E|X|^p < ∞.

Theorem 5.7. Given (Ω, F, P) and X ∈ L^1(P), the family {E[X|G] : G is a σ-algebra, G ⊂ F} is u.i. (Durrett; prove it with Theorem 5.1.)

Theorem 5.8. (Dunford–Pettis.) Let (X, Σ, µ) be any measure space and A a subset of L^1 = L^1(µ). Then A is uniformly integrable iff it is relatively compact in L^1 for the weak topology of L^1.
(Measure Theory, Vol. II, by D. H. Fremlin)

6 Martingales

6.1 General submartingales

Theorem 6.1. (Producing submartingales from submartingales.) If {Xn} is a martingale and φ is convex, then φ(X) is a submartingale, provided that φ(Xn) ∈ L^1(P) for all n. If X is a submartingale and φ is a nondecreasing convex function, then φ(X) is a submartingale, provided that φ(Xn) ∈ L^1(P) for all n. (Davar)

Theorem 6.2. (Doob's Decomposition.) Any submartingale {Xn} can be written as Xn = Yn + Zn, where {Yn} is a martingale and {Zn} is a nonnegative, previsible, a.s. increasing process with Zn ∈ L^1(P) for all n. (Davar)
Remark. By the inequalities E[Zn] ≤ E|Xn| − E[Y1] and E|Yn| ≤ E|Xn| + E[Zn], {Xn} is L^1-bounded if and only if both {Yn} and {Zn} are L^1-bounded; and since E[lim_n Zn] < ∞ in that case, {Xn} is u.i. if and only if both {Yn} and {Zn} are u.i. (Chung)

Theorem 6.3. (Doob's maximal inequality.) If {Xn} is a submartingale, then for all λ > 0 and n ≥ 1 we have (Davar)
(1) λ P(max_{1≤j≤n} Xj ≥ λ) ≤ E[Xn; max_{1≤j≤n} Xj ≥ λ] ≤ E[Xn^+].
(2) λ P(min_{1≤j≤n} Xj ≤ −λ) ≤ E[Xn^+] − E[X1].
(3) Combining (1) and (2), λ P(max_{1≤j≤n} |Xj| ≥ λ) ≤ 2 E|Xn| − E[X1].
Remarks.
(1) If {Xn} is a martingale, then P(max_{1≤j≤n} |Xj| ≥ λ) ≤ E|Xn|^p / λ^p for p ≥ 1 and λ > 0. This is because |Xn|^p is a submartingale.
(2) We may take p = 2 and replace Xn with Sn − E[Sn] in (1) to obtain Kolmogorov's maximal inequality.

6.2 Good submartingales (a.s. convergence)

Theorem 6.4. (The Martingale Convergence Theorem.) Let {Xn} be an L^1-bounded submartingale. Then {Xn} converges a.s. to a finite limit. (Chung)
Remarks.
(1) As a corollary, every nonnegative supermartingale and every nonpositive submartingale converges a.s. to a finite limit.
(2) It suffices to assume sup_n E[Xn^+] < ∞ (⇔ sup_n E|Xn| < ∞). This is due to E|Xn| = 2 E[Xn^+] − E[Xn] ≤ 2 E[Xn^+] − E[X1].
(3) The converse does not hold. For example, we may keep tossing a fair coin and bet 10^n dollars on the n-th round.
We stop when our money becomes negative.

Theorem 6.5. (Krickeberg's Decomposition.) Suppose {Xn} is a submartingale which is bounded in L^1(P). Then we can write Xn = Yn − Zn, where {Yn} is a martingale and {Zn} is a nonnegative supermartingale.
Remarks.
(1) By the inequalities E|Zn| = |E[Zn]| ≤ E|Xn| + |E[Y1]| and E|Yn| ≤ E|Xn| + E|Zn| = E|Xn| + E[Zn], both {Yn} and {Zn} are L^1-bounded. (Davar)
(2) Unlike Theorem 6.2, uniform integrability of {Xn} does not imply that both {Yn} and {Zn} are u.i. To see this, we may simply take {Xn} to be a nonpositive submartingale which is not u.i.

6.3 Better submartingales (L^1 convergence or L^p convergence, p > 1)

Theorem 6.6. For a submartingale, TFAE: (Durrett)
(1) It is u.i.
(2) It converges a.s. and in L^1.
(3) It converges in L^1.
Remark. Say it converges to X∞; then X∞ can be regarded as "the last term" of the original submartingale.

Theorem 6.7. For a martingale, TFAE: (Durrett)
(1) It is u.i.
(2) It converges a.s. and in L^1.
(3) It converges in L^1.
(4) There exists X with E|X| < ∞ so that Xn = E[X|Fn].

Theorem 6.8. Let Fn ↑ F. Then E[X|Fn] → E[X|F] a.s. and in L^1. (Durrett)
Remark. For a generalization, see Chung, Theorem 9.4.8.

Theorem 6.9. (L^p maximal inequality.) If Xn is a nonnegative submartingale such that sup_n E[Xn^p] < ∞, then for 1 < p < ∞,
E[(max_{1≤j≤n} Xj)^p] ≤ (p/(p−1))^p E[Xn^p]. (Davar; Chung)
Remark. That is, if a nonnegative submartingale is L^p-bounded, then it is bounded by an L^p function. This inequality also holds if we replace Xn with |Xn| and submartingales with martingales.

Theorem 6.10. (L^1 maximal inequality.) If Xn is a submartingale, then
E[max_{1≤j≤n} Xj^+] ≤ (1 − 1/e)^{−1} (1 + E[Xn^+ max(log(Xn^+), 0)]). (Durrett)
Remark. By Theorem 5.5 and this theorem, we have another sufficient condition for L^1 convergence of martingales.
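The maximal inequalities above are easy to sanity-check numerically. The following is a hedged Monte Carlo sketch (not from any of the cited references; the function name and parameters are illustrative) of Theorem 6.9 with p = 2: for a simple symmetric random walk Sn, which is a martingale, |Sn| is a nonnegative submartingale, and Doob's inequality gives E[(max_{1≤j≤n} |Sj|)^2] ≤ (2/(2−1))^2 E[Sn^2] = 4 E[Sn^2].

```python
import random

# Hedged numerical sketch of Doob's L^p maximal inequality (Theorem 6.9)
# with p = 2, applied to |S_n| for the simple symmetric random walk S_n:
#   E[(max_{1<=j<=n} |S_j|)^2] <= 4 E[S_n^2].
# The function name and parameter choices below are illustrative.

def simulate(n_steps=100, n_paths=10_000, seed=0):
    rng = random.Random(seed)
    lhs = 0.0  # accumulates (max_{j<=n} |S_j|)^2 over paths
    rhs = 0.0  # accumulates S_n^2 over paths
    for _ in range(n_paths):
        s, m = 0, 0
        for _ in range(n_steps):
            s += 1 if rng.random() < 0.5 else -1
            m = max(m, abs(s))
        lhs += m * m
        rhs += s * s
    return lhs / n_paths, rhs / n_paths

lhs, rhs = simulate()
print(lhs, 4 * rhs)  # empirically, lhs stays below 4 * rhs
assert lhs <= 4 * rhs
```

The empirical gap is large here (for the simple walk, E[Sn^2] = n while the left side is only a small multiple of n), which is consistent with the constant (p/(p−1))^p being far from tight for this example.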
Theorem 6.11. (L^p convergence theorem for martingales or nonnegative submartingales.) If Xn is a nonnegative submartingale or a martingale with sup_n E|Xn|^p < ∞, p > 1, then Xn → X a.s. and in L^p. (Durrett)
Remark. For a proof, see Durrett, Theorem 4.4.5, or Varadhan, Theorem 5.6; we may also use (a) Theorems 5.5 + 6.9, or (b) Theorems 5.4 + 5.6 + 6.9, to prove it.

Theorem 6.12. (L^p convergence theorem for submartingales.) If Xn is a submartingale with sup_n E|Xn|^p < ∞, p > 1, then Xn → X a.s. and in L^r for every 1 ≤ r < p. (A combination of various theorems.)
Remark. We may use Theorems 5.2 + 5.6 + 6.4 (the martingale convergence theorem) to prove it. (Here we don't use Theorem 6.9.)

7 Stopping times

Theorem 7.1. ("Generalized" Optional Stopping Theorem.) If L ≤ M are stopping times and {Y_{M∧n}} is a uniformly integrable submartingale, then E[Y_L] ≤ E[Y_M] and Y_L ≤ E[Y_M | F_L]. (Durrett)
Remarks.
(1) If L ≤ M are two bounded stopping times (say, L ≤ M ≤ k), then |Y_{M∧n}| ≤ |Y_1| + · · · + |Y_k|, which means {Y_{M∧n}} is u.i. This case is usually called the Optional Stopping Theorem.
(2) There are two sufficient conditions for {Y_{M∧n}} to be u.i. (Durrett):
(a) {Yn} is a u.i. submartingale.
(b) E|Y_M| < ∞ and {Yn 1_{M>n}} is u.i.

8 Truncation

Theorem 8.1. (L^1 truncation.) Let X1, X2, · · · be i.i.d. with E|X1| < ∞, and let Yn = Xn 1_{|Xn| ≤ n}. Then P(Xn ≠ Yn i.o.) = 0.

Theorem 8.2. (L^p truncation, p > 1.) Let X1, X2, · · · be i.i.d. with E|X1|^p < ∞, and let Yn = Xn 1_{|Xn| ≤ n^{1/p}}. Then P(Xn ≠ Yn i.o.) = 0.

9 Examples of Martingales

Example 9.1. Let X1, X2, · · · be independent with E[Xn] = 0 for all n ≥ 1. Then Sn := X1 + · · · + Xn is a martingale.

Example 9.2. Let X1, X2, · · · be independent with E[Xn] = 1 for all n ≥ 1, and let Yn := X1 × · · · × Xn. Then {Yn} is a martingale.

Example 9.3. (Related to densities: likelihood ratios.) Suppose f and g are two strictly positive probability density functions on R. If {Xi}_{i=1}^{∞} are i.i.d.
random variables with probability density f, then Π_{j=1}^{n} [g(Xj)/f(Xj)] defines a mean-one martingale. (Davar)

Example 9.4. Let X1, X2, · · · be independent with E|Xn|^2 < ∞ and E[Xn] = 0 for all n ≥ 1, and let Sn := X1 + · · · + Xn. Then {Sn^2 − Σ_{j=1}^{n} E[Xj^2]} is a martingale.

Example 9.5. If Xn and Yn are submartingales w.r.t. Fn, then so is Xn ∨ Yn. (Durrett)

Example 9.6. Suppose E|Xn| < ∞ for all n ≥ 1, and E[X_{n+1} | X1, · · · , Xn] = (X1 + · · · + Xn)/n. Then {(X1 + · · · + Xn)/n, σ(X1, · · · , Xn)} is a martingale. (Chung)

Example 9.7. (Martingales induced from Markov chains.) Let {Xn} be a Markov chain with transition operator P, and let f = Pf (that is, f is P-harmonic). Then f(Xn) is a martingale. Note that if {Xn} is recurrent, then f is trivial. (Karlin)

Example 9.8. (Martingales induced from Brownian motion (I).) exp(θBt − θ^2 t/2) is a martingale for all θ ∈ R. (Durrett)

Example 9.9. (Martingales induced from Brownian motion (II).) Bt, Bt^2 − t, Bt^3 − 3tBt, Bt^4 − 6tBt^2 + 3t^2, · · · are all martingales. (Durrett)
Remark. When we're given a 1-parameter family of martingales, we may differentiate with respect to that parameter and see what happens.

Example 9.10. Let Y1, Y2, · · · be independent random variables. Define X0 = 0 and Xn = Σ_{j=1}^{n} (Yj − E[Yj | Y1, · · · , Yj−1]) for n ≥ 1. Then {Xn} is a martingale. (A Probability Path; Resnick)

10 References

Main references: Chung, Durrett, Davar.
Other references: Varadhan, Shiryaev, Resnick, Karatzas, Karlin, Breiman.
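Addendum (not part of the original notes): the mean-one property of the likelihood-ratio martingale in Example 9.3 can be sanity-checked by simulation. In the hedged sketch below, f is the N(0,1) density and g is the N(0.5,1) density — arbitrary illustrative choices, for which g(x)/f(x) = exp(0.5x − 0.125) — and the function name and parameters are likewise illustrative.

```python
import math
import random

# Hedged Monte Carlo check of Example 9.3: with X_j i.i.d. with density f,
#   Y_n = prod_{j<=n} g(X_j)/f(X_j)
# is a mean-one martingale, so E[Y_n] = 1 for every n.  Here f = N(0,1)
# and g = N(0.5,1), so each factor is exp(0.5*x - 0.125).

def likelihood_ratio_mean(n=5, n_paths=100_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        y = 1.0
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)          # X_j ~ f = N(0,1)
            y *= math.exp(0.5 * x - 0.125)   # g(x)/f(x) for g = N(0.5,1)
        total += y
    return total / n_paths

print(likelihood_ratio_mean())  # should be close to 1
```

Note that although E[Yn] = 1 for every n, this nonnegative martingale converges a.s. to 0 when g ≠ f (the product is exp(0.5 Sn − 0.125 n), and Sn/n → 0 a.s.), so it is a natural example of an L^1-bounded martingale that is not uniformly integrable.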