5. Modes of Convergence

5.1. Some important inequalities. We start this section by proving a result known as Markov's inequality.

Lemma 5.1 (Markov's inequality). If $X$ is a non-negative random variable whose expected value exists, then for all $a > 0$,
\[ P(X \ge a) \le \frac{E(X)}{a}. \]

Proof. Observe that, since $X$ is non-negative,
\[ E[X] = E\big[X\mathbf{1}_{\{X \ge a\}} + X\mathbf{1}_{\{X < a\}}\big] \ge E\big[X\mathbf{1}_{\{X \ge a\}}\big] \ge a\,P(X \ge a). \]
Hence the result follows.

Corollary 5.2. If $X$ is a random variable such that $E[|X|] < +\infty$, then for all $a > 0$,
\[ P(|X| \ge a) \le \frac{E(|X|)}{a}. \]

Example 5.1. A coin is weighted so that its probability of landing on heads is $20\%$. Suppose the coin is flipped $20$ times. We want to find a bound for the probability that it lands on heads at least $16$ times. Let $X$ be the number of times the coin lands on heads. Then $X \sim B(20, \frac{1}{5})$. We use Markov's inequality to find the required bound:
\[ P(X \ge 16) \le \frac{E(X)}{16} = \frac{4}{16} = \frac{1}{4}. \]
The actual probability that this happens is
\[ P(X \ge 16) = \sum_{k=16}^{20} \binom{20}{k} (0.2)^k (0.8)^{20-k} \approx 1.38 \times 10^{-8}. \]

Lemma 5.3 (Chebyshev's inequality). Let $Y$ be an integrable random variable such that $\mathrm{Var}(Y) < +\infty$. Then for any $\varepsilon > 0$,
\[ P(|Y - E(Y)| \ge \varepsilon) \le \frac{\mathrm{Var}(Y)}{\varepsilon^2}. \]

Proof. To get the result, take $X = |Y - E(Y)|^2$ and $a = \varepsilon^2$ in Markov's inequality.

Example 5.2. Is there any random variable $X$ for which
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) = \frac{1}{2}, \]
where $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$?
Solution: Observe that
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) = P(|X - \mu| \le 3\sigma) = 1 - P(|X - E(X)| > 3\sigma). \]
By Chebyshev's inequality, we get that
\[ P(|X - E(X)| > 3\sigma) \le P(|X - E(X)| \ge 3\sigma) \le \frac{\sigma^2}{9\sigma^2} = \frac{1}{9}, \]
and hence
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) \ge 1 - \frac{1}{9} = \frac{8}{9}. \]
Since $\frac{1}{2} < \frac{8}{9}$, there exists NO random variable $X$ satisfying the given condition.

Although Chebyshev's inequality measures distance from the mean in either direction, it can still be used to bound how often a random variable takes large values, and it will usually give much better bounds than Markov's inequality. For example, consider Example 5.1. Markov's inequality gives the bound $\frac{1}{4}$. Using Chebyshev's inequality, with $E(X) = 4$ and $\mathrm{Var}(X) = 20(0.2)(0.8) = 3.2$, we see that
\[ P(X \ge 16) = P(X - 4 \ge 12) \le P(|X - 4| \ge 12) \le \frac{\mathrm{Var}(X)}{144} = \frac{1}{45}. \]

Lemma 5.4 (One-sided Chebyshev inequality). Let $X$ be a random variable with mean $0$ and variance $\sigma^2 < +\infty$. Then for any $a > 0$,
\[ P(X \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}. \]

Proof. For any $b \ge 0$, we see that $X \ge a$ is equivalent to $X + b \ge a + b$. Hence by Markov's inequality, we have
\[ P(X \ge a) = P(X + b \ge a + b) \le P\big((X + b)^2 \ge (a + b)^2\big) \le \frac{E[(X + b)^2]}{(a + b)^2} = \frac{\sigma^2 + b^2}{(a + b)^2} \]
\[ \Longrightarrow\ P(X \ge a) \le \min_{b \ge 0} \frac{\sigma^2 + b^2}{(a + b)^2} = \frac{\sigma^2}{\sigma^2 + a^2}, \]
the minimum being attained at $b = \sigma^2/a$.

One can use the one-sided Chebyshev inequality to arrive at the following corollary.

Corollary 5.5. If $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, then for any $a > 0$,
\[ P(X \ge \mu + a) \le \frac{\sigma^2}{\sigma^2 + a^2}, \qquad P(X \le \mu - a) \le \frac{\sigma^2}{\sigma^2 + a^2}. \]

Example 5.3. Let $X$ be a Poisson random variable with mean $20$. Show that the one-sided Chebyshev inequality gives a better upper bound on $P(X \ge 26)$ compared to the Markov and Chebyshev inequalities. Indeed, by Markov's inequality, we have
\[ p = P(X \ge 26) \le \frac{E[X]}{26} = \frac{10}{13}. \]
By Chebyshev's inequality, we get
\[ p = P(X - 20 \ge 6) \le P(|X - 20| \ge 6) \le \frac{\mathrm{Var}(X)}{36} = \frac{10}{18}. \]
The one-sided Chebyshev inequality gives
\[ p = P(X - 20 \ge 6) \le \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + 36} = \frac{10}{28}. \]

Theorem 5.6 (Weak Law of Large Numbers). Let $\{X_i\}$ be a sequence of i.i.d. random variables with finite mean $\mu$ and variance $\sigma^2$. Then for any $\varepsilon > 0$,
\[ P\Big(\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big) \le \frac{\sigma^2}{n\varepsilon^2}, \]
where $S_n = \sum_{i=1}^n X_i$. In particular,
\[ \lim_{n \to \infty} P\Big(\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big) = 0. \]

Proof. The first inequality follows from Chebyshev's inequality applied to $\frac{S_n}{n}$, since $E[\frac{S_n}{n}] = \mu$ and $\mathrm{Var}(\frac{S_n}{n}) = \frac{\sigma^2}{n}$. Sending $n$ to infinity in the first inequality, we arrive at the second result.
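The bound in Theorem 5.6 is easy to check numerically. Below is a minimal Python sketch, with illustrative parameters of our own choosing ($p = 0.2$ as in Example 5.1, $\varepsilon = 0.05$), that estimates $P(|S_n/n - \mu| > \varepsilon)$ for Bernoulli($p$) trials by simulation and prints it next to the Chebyshev bound $\sigma^2/(n\varepsilon^2)$:

```python
import random

def empirical_tail(n, p, eps, trials=10_000):
    """Estimate P(|S_n/n - p| > eps), where S_n is a sum of n Bernoulli(p) draws."""
    count = 0
    for _ in range(trials):
        s = sum(random.random() < p for _ in range(n))
        if abs(s / n - p) > eps:
            count += 1
    return count / trials

p, eps = 0.2, 0.05
sigma2 = p * (1 - p)                   # variance of a single Bernoulli(p) draw
for n in (100, 400, 1600):
    bound = sigma2 / (n * eps ** 2)    # Chebyshev bound from Theorem 5.6
    print(n, empirical_tail(n, p, eps), min(bound, 1.0))
```

The empirical frequency typically decays much faster than the $O(1/n)$ bound; Chebyshev's inequality guarantees convergence but is far from sharp here.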
We shall discuss various modes of convergence for a given sequence of random variables $\{X_n\}$ defined on a given probability space $(\Omega, \mathcal{F}, P)$.

Definition 5.1 (Convergence in probability). We say that $\{X_n\}$ converges to a random variable $X$, defined on the same probability space $(\Omega, \mathcal{F}, P)$, in probability if for every $\varepsilon > 0$,
\[ \lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0. \]
We denote it by $X_n \xrightarrow{P} X$.

Example 5.4. Let $\{X_n\}$ be a sequence of random variables such that $P(X_n = 0) = 1 - \frac{1}{n}$ and $P(X_n = n) = \frac{1}{n}$. Then $X_n \xrightarrow{P} 0$. Indeed, for any $\varepsilon > 0$,
\[ P(|X_n| > \varepsilon) = \begin{cases} \frac{1}{n} & \text{if } \varepsilon < n, \\ 0 & \text{if } \varepsilon \ge n. \end{cases} \]
Hence $\lim_{n \to \infty} P(|X_n| > \varepsilon) = 0$.

Example 5.5. Let $\{X_n\}$ be a sequence of i.i.d. random variables with $P(X_n = 1) = \frac{1}{2}$ and $P(X_n = -1) = \frac{1}{2}$. Then $\frac{1}{n}\sum_{i=1}^n X_i$ converges to $0$ in probability. Indeed, for any $\varepsilon > 0$, thanks to the weak law of large numbers, we have
\[ P\Big(\Big|\frac{1}{n} S_n - \mu\Big| > \varepsilon\Big) \le \frac{\mathrm{Var}(X_1)}{n\varepsilon^2}, \]
where $\mu = E(X_1)$. Observe that $\mu = 0$ and $\mathrm{Var}(X_1) = 1$. Hence
\[ P\Big(\Big|\frac{1}{n}\sum_{i=1}^n X_i\Big| > \varepsilon\Big) \le \frac{1}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty. \]

Theorem 5.7. $X_n \xrightarrow{P} X$ if and only if $\lim_{n \to \infty} E\Big[\frac{|X_n - X|}{1 + |X_n - X|}\Big] = 0$.

Proof. Without loss of generality, take $X = 0$. Thus, we want to show that $X_n \xrightarrow{P} 0$ if and only if $\lim_{n \to \infty} E\big[\frac{|X_n|}{1 + |X_n|}\big] = 0$.

Suppose $X_n \xrightarrow{P} 0$. Then given $\varepsilon > 0$, we have $\lim_{n \to \infty} P(|X_n| > \varepsilon) = 0$. Now,
\[ \frac{|X_n|}{1 + |X_n|} = \frac{|X_n|}{1 + |X_n|}\mathbf{1}_{|X_n| > \varepsilon} + \frac{|X_n|}{1 + |X_n|}\mathbf{1}_{|X_n| \le \varepsilon} \le \mathbf{1}_{|X_n| > \varepsilon} + \varepsilon \]
\[ \Longrightarrow\ E\Big[\frac{|X_n|}{1 + |X_n|}\Big] \le P(|X_n| > \varepsilon) + \varepsilon \ \Longrightarrow\ \limsup_{n \to \infty} E\Big[\frac{|X_n|}{1 + |X_n|}\Big] \le \varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, we have $\lim_{n \to \infty} E\big[\frac{|X_n|}{1 + |X_n|}\big] = 0$.

Conversely, let $\lim_{n \to \infty} E\big[\frac{|X_n|}{1 + |X_n|}\big] = 0$. Observe that the function $f(x) = \frac{x}{1+x}$ is strictly increasing on $[0, \infty)$. Thus,
\[ \frac{\varepsilon}{1 + \varepsilon}\mathbf{1}_{|X_n| > \varepsilon} \le \frac{|X_n|}{1 + |X_n|}\mathbf{1}_{|X_n| > \varepsilon} \le \frac{|X_n|}{1 + |X_n|} \]
\[ \Longrightarrow\ P(|X_n| > \varepsilon) \le \frac{1 + \varepsilon}{\varepsilon}\, E\Big[\frac{|X_n|}{1 + |X_n|}\Big] \ \Longrightarrow\ \lim_{n \to \infty} P(|X_n| > \varepsilon) \le \frac{1 + \varepsilon}{\varepsilon}\lim_{n \to \infty} E\Big[\frac{|X_n|}{1 + |X_n|}\Big] = 0 \ \Longrightarrow\ X_n \xrightarrow{P} 0. \]

Definition 5.2 (Convergence in $r$-th mean). Let $X, \{X_n\}$ be random variables defined on a given probability space $(\Omega, \mathcal{F}, P)$ such that for $r \in \mathbb{N}$, $E[|X|^r] < \infty$ and $E[|X_n|^r] < \infty$ for all $n$. We say that $\{X_n\}$ converges in the $r$-th mean to $X$, denoted by $X_n \xrightarrow{r} X$, if the following holds:
\[ \lim_{n \to \infty} E[|X_n - X|^r] = 0. \]

Example 5.6. Let $\{X_n\}$ be i.i.d. random variables with $E[X_n] = \mu$ and $\mathrm{Var}(X_n) = \sigma^2$. Define $Y_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then $Y_n \xrightarrow{2} \mu$. Indeed,
\[ E[|Y_n - \mu|^2] = E\Big[\Big|\frac{\sum_{i=1}^n X_i - n\mu}{n}\Big|^2\Big] = \frac{1}{n^2}\,E[|S_n - E(S_n)|^2] = \frac{1}{n^2}\,\mathrm{Var}(S_n) = \frac{\sigma^2}{n}, \]
where $S_n = \sum_{i=1}^n X_i$. Hence $Y_n \xrightarrow{2} \mu$.

Theorem 5.8. The following hold:
i) $X_n \xrightarrow{r} X \Longrightarrow X_n \xrightarrow{P} X$ for any $r \ge 1$.
ii) Let $f$ be a given continuous function. If $X_n \xrightarrow{P} X$, then $f(X_n) \xrightarrow{P} f(X)$.

Proof. Proof of i) follows from Markov's inequality. Indeed, for any given $\varepsilon > 0$,
\[ P(|X_n - X| > \varepsilon) \le \frac{E[|X_n - X|^r]}{\varepsilon^r} \ \Longrightarrow\ \lim_{n \to \infty} P(|X_n - X| > \varepsilon) \le \frac{1}{\varepsilon^r}\lim_{n \to \infty} E[|X_n - X|^r] = 0. \]
Proof of ii): For any $k > 0$, we see that
\[ \{|f(X_n) - f(X)| > \varepsilon\} \subset \{|f(X_n) - f(X)| > \varepsilon,\ |X| \le k\} \cup \{|X| > k\}. \]
Since $f$ is continuous, it is uniformly continuous on any bounded interval. Therefore, for any given $\varepsilon > 0$, there exists $\delta > 0$ such that $|f(x) - f(y)| \le \varepsilon$ whenever $|x - y| \le \delta$ for $x$ and $y$ in $[-k, k]$. This means that
\[ \{|f(X_n) - f(X)| > \varepsilon,\ |X| \le k\} \subset \{|X_n - X| > \delta,\ |X| \le k\} \subset \{|X_n - X| > \delta\}. \]
Thus we have
\[ \{|f(X_n) - f(X)| > \varepsilon\} \subset \{|X_n - X| > \delta\} \cup \{|X| > k\} \ \Longrightarrow\ P(|f(X_n) - f(X)| > \varepsilon) \le P(|X_n - X| > \delta) + P(|X| > k). \]
Since $X_n \xrightarrow{P} X$ and $\lim_{k \to \infty} P(|X| > k) = 0$, we obtain that $\lim_{n \to \infty} P(|f(X_n) - f(X)| > \varepsilon) = 0$. This completes the proof.

In general, convergence in probability does not imply convergence in $r$-th mean. To see this, consider the following example.

Example 5.7. Let $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$ and $P(dx) = dx$. Let $X_n = n\mathbf{1}_{(0,1/n)}$. Then $X_n \xrightarrow{P} 0$ but $X_n \nrightarrow 0$ in the $r$-th mean for all $r \ge 1$. To show this, observe that
\[ P(|X_n| > \varepsilon) \le \frac{1}{n} \ \Longrightarrow\ \lim_{n \to \infty} P(|X_n| > \varepsilon) = 0, \]
i.e., $X_n \xrightarrow{P} 0$. On the other hand, for $r \ge 1$,
\[ E[|X_n|^r] = \int_0^{\frac{1}{n}} n^r\,dx = n^{r-1} \nrightarrow 0 \quad \text{as } n \to \infty. \]
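Example 5.7 can also be checked by simulation; the following is a rough Python sketch (sample sizes are arbitrary illustrative choices). For the case $r = 1$ we have $E[|X_n|] = 1$ for every $n$, even though $P(|X_n| > \varepsilon) = \frac{1}{n} \to 0$:

```python
import random

def simulate(n, trials=100_000):
    """Monte Carlo estimates of P(|X_n| > eps) and E[X_n] for X_n = n * 1_{(0,1/n)}."""
    hits, total = 0, 0.0
    for _ in range(trials):
        omega = random.random()          # omega ~ Uniform[0,1] (Lebesgue measure)
        x = n if 0.0 < omega < 1.0 / n else 0.0
        hits += (x > 0)                  # event {|X_n| > eps} for any eps in (0, n)
        total += x
    return hits / trials, total / trials

for n in (10, 100, 1000):
    p_tail, mean = simulate(n)
    print(f"n={n}: P(|X_n|>eps) ~ {p_tail:.4f} (exact {1/n:.4f}), E[X_n] ~ {mean:.3f} (exact 1)")
```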
Definition 5.3 (Almost sure convergence). Let $X, \{X_n\}$ be random variables defined on a given probability space $(\Omega, \mathcal{F}, P)$. We say that $\{X_n\}$ converges to $X$ almost surely, or with probability $1$, if the following holds:
\[ P\big(\lim_{n \to \infty} X_n = X\big) = 1. \]
We denote it by $X_n \xrightarrow{a.s.} X$.

Example 5.8. Let $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$ and $P(dx) = dx$. Define
\[ X_n(\omega) = \begin{cases} 1 & \text{if } \omega \in \big(0, 1 - \frac{1}{n}\big), \\ n & \text{otherwise}. \end{cases} \]
It is easy to check that if $\omega = 0$ or $\omega = 1$, then $\lim_{n \to \infty} X_n(\omega) = \infty$. For any $\omega \in (0,1)$, we can find $n_0 \in \mathbb{N}$ such that $\omega \in (0, 1 - \frac{1}{n})$ for all $n \ge n_0$. As a consequence, $X_n(\omega) = 1$ for all $n \ge n_0$. In other words, for $\omega \in (0,1)$, $\lim_{n \to \infty} X_n(\omega) = 1$. Define $X(\omega) = 1$ for all $\omega \in [0,1]$. Then
\[ P\big(\omega \in [0,1] : \{X_n(\omega)\} \text{ does not converge to } X(\omega)\big) = P(\{0,1\}) = 0 \ \Longrightarrow\ X_n \xrightarrow{a.s.} 1. \]

Sufficient condition for almost sure convergence: Let $\{A_n\}$ be a sequence of events in $\mathcal{F}$. Define
\[ \limsup_n A_n = \bigcap_{n=1}^\infty \bigcup_{m \ge n} A_m = \lim_{n \to \infty} \bigcup_{m \ge n} A_m. \]
This can be interpreted probabilistically as $\limsup_n A_n =$ "$A_n$ occurs infinitely often". We denote this as $\{A_n \text{ i.o.}\} = \limsup_n A_n$.

Theorem 5.9 (Borel–Cantelli lemma). Let $\{A_n\}$ be a sequence of events in $(\Omega, \mathcal{F}, P)$.
i) If $\sum_{n=1}^\infty P(A_n) < +\infty$, then $P(A_n \text{ i.o.}) = 0$.
ii) If the $A_n$ are mutually independent events, and if $\sum_{n=1}^\infty P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$.

Remark 5.1. For mutually independent events $A_n$, since $\sum_{n=1}^\infty P(A_n)$ is either finite or infinite, the event $\{A_n \text{ i.o.}\}$ has probability either $0$ or $1$. This is sometimes called a zero–one law.

As a consequence of the Borel–Cantelli lemma, we have the following proposition.

Proposition 5.10. Let $\{X_n\}$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. If $\sum_{n=1}^\infty P(|X_n| > \varepsilon) < +\infty$ for any $\varepsilon > 0$, then $X_n \xrightarrow{a.s.} 0$.

Proof. Fix $\varepsilon > 0$. Let $A_n = \{|X_n| > \varepsilon\}$. Then $\sum_{n=1}^\infty P(A_n) < +\infty$, and hence by the Borel–Cantelli lemma, $P(A_n \text{ i.o.}) = 0$. Now
\[ \big(\limsup_n A_n\big)^c = \{\omega : \exists\, n_0(\omega) \text{ such that } |X_n(\omega)| \le \varepsilon\ \forall\, n \ge n_0(\omega)\} =: B_\varepsilon. \]
Thus $P(B_\varepsilon) = 1$. Let $B = \bigcap_{r=1}^\infty B_{\frac{1}{r}}$, so that $B^c = \bigcup_{r=1}^\infty B_{\frac{1}{r}}^c$. Moreover, since $P(B_{\frac{1}{r}}) = 1$, we have $P(B_{\frac{1}{r}}^c) = 0$. Observe that
\[ \{\omega : \lim_{n \to \infty} |X_n(\omega)| = 0\} = \bigcap_{r=1}^\infty B_{\frac{1}{r}} = B. \]
Again, $P(B^c) \le \sum_{r=1}^\infty P(B_{\frac{1}{r}}^c) = 0$, and hence $P(B) = 1$. In other words,
\[ P\big(\{\omega : \lim_{n \to \infty} |X_n(\omega)| = 0\}\big) = 1, \quad \text{i.e., } X_n \xrightarrow{a.s.} 0. \]

Example 5.9. Let $\{X_n\}$ be a sequence of i.i.d. random variables such that $P(X_n = 1) = \frac{1}{2}$ and $P(X_n = -1) = \frac{1}{2}$. Let $S_n = \sum_{i=1}^n X_i$. Then $\frac{1}{n^2} S_{n^2} \xrightarrow{a.s.} 0$. To show the result, we use Proposition 5.10. Note that $E[|S_{n^2}|^2] = n^2$, so
\[ P\Big(\frac{1}{n^2}|S_{n^2}| > \varepsilon\Big) \le \frac{E[|S_{n^2}|^2]}{n^4\varepsilon^2} = \frac{1}{n^2\varepsilon^2} \ \Longrightarrow\ \sum_{n=1}^\infty P\Big(\frac{1}{n^2}|S_{n^2}| > \varepsilon\Big) < \infty \ \Longrightarrow\ \frac{1}{n^2} S_{n^2} \xrightarrow{a.s.} 0. \]

Let us consider the following example.

Example 5.10. Let $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$ and $P(dx) = dx$. Define
\[ X_n = \mathbf{1}_{\big[\frac{j}{2^k}, \frac{j+1}{2^k}\big]}, \qquad n = 2^k + j,\ j \in \{0, \dots, 2^k - 1\},\ k = \lfloor \log_2(n) \rfloor. \]
Note that, for each positive integer $n$, there exist uniquely determined integers $j$ and $k$ such that $n = 2^k + j$ with $j \in \{0, \dots, 2^k - 1\}$ and $k = \lfloor \log_2(n) \rfloor$ (for $n = 1$, $k = j = 0$; for $n = 5$, $k = 2$, $j = 1$; and so on). Let $A_n = \{X_n > 0\}$. Then clearly $P(A_n) = 2^{-k} \to 0$. Consequently, $X_n \xrightarrow{P} 0$, but $X_n(\omega) \nrightarrow 0$ for every $\omega \in \Omega$: each $\omega$ belongs to one dyadic interval of every generation $k$, so $X_n(\omega) = 1$ for infinitely many $n$.
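The sketch below inspects this "typewriter" sequence directly for one fixed $\omega$ (the point $\omega = 0.3$ is an arbitrary choice): exactly one index $n$ in each dyadic block $\{2^k, \dots, 2^{k+1} - 1\}$ satisfies $X_n(\omega) = 1$, so the sequence keeps returning to $1$, while $P(X_n > 0) = 2^{-\lfloor \log_2 n \rfloor} \to 0$.

```python
import math

def X(n, omega):
    """Typewriter indicator: X_n = 1 on [j/2^k, (j+1)/2^k] with n = 2^k + j."""
    k = int(math.log2(n))                  # floor of log2(n); exact for these small n
    j = n - 2 ** k
    return 1 if j / 2 ** k <= omega <= (j + 1) / 2 ** k else 0

omega = 0.3
ones = [n for n in range(1, 129) if X(n, omega) == 1]
print("indices n <= 128 with X_n(omega) = 1:", ones)  # one per block: no convergence at omega
print("P(X_100 > 0) =", 2.0 ** -int(math.log2(100)))  # = 1/64; tends to 0 as n grows
```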
Theorem 5.11. The following hold.
i) If $X_n \xrightarrow{a.s.} X$, then $X_n \xrightarrow{P} X$.
ii) If $X_n \xrightarrow{P} X$, then there exists a subsequence $X_{n_k}$ of $X_n$ such that $X_{n_k} \xrightarrow{a.s.} X$.
iii) If $X_n \xrightarrow{a.s.} X$, then for any continuous function $f$, $f(X_n) \xrightarrow{a.s.} f(X)$.

Proof. Proof of i): For any $\varepsilon > 0$, define $A_n^\varepsilon = \{|X_n - X| > \varepsilon\}$ and $B_m^\varepsilon = \bigcup_{n \ge m} A_n^\varepsilon$. Since $X_n \xrightarrow{a.s.} X$, we have $P(\bigcap_m B_m^\varepsilon) = 0$ (indeed, $\bigcap_m B_m^\varepsilon = \{|X_n - X| > \varepsilon \text{ i.o.}\}$ is contained in the event that $X_n$ does not converge to $X$). Note that $\{B_m^\varepsilon\}$ is a nested, decreasing sequence of events. Hence, from the continuity of the probability measure $P$, we have
\[ \lim_{m \to \infty} P(B_m^\varepsilon) = P\Big(\bigcap_m B_m^\varepsilon\Big) = 0. \]
Since $A_m^\varepsilon \subset B_m^\varepsilon$, we have $P(A_m^\varepsilon) \le P(B_m^\varepsilon)$. This implies that $\lim_{m \to \infty} P(A_m^\varepsilon) = 0$. In other words, $X_n \xrightarrow{P} X$.

Proof of ii): We will use the Borel–Cantelli lemma. Since $X_n \xrightarrow{P} X$, we can choose a subsequence $X_{n_k}$ such that $P(|X_{n_k} - X| > \frac{1}{k}) \le \frac{1}{2^k}$. Let $A_k := \{|X_{n_k} - X| > \frac{1}{k}\}$. Then $\sum_{k=1}^\infty P(A_k) < +\infty$. Hence, by the Borel–Cantelli lemma, $P(A_k \text{ i.o.}) = 0$. This implies that
\[ P\Big(\bigcup_{n=1}^\infty \bigcap_{k \ge n} A_k^c\Big) = 1 \ \Longrightarrow\ P\big(\{\omega \in \Omega : \exists\, n_0 : \forall\, k \ge n_0,\ |X_{n_k} - X| \le \tfrac{1}{k}\}\big) = 1 \ \Longrightarrow\ X_{n_k} \xrightarrow{a.s.} X. \]

Proof of iii): Let $N = \{\omega : \lim_{n \to \infty} X_n(\omega) \ne X(\omega)\}$. Then $P(N) = 0$. If $\omega \notin N$, then by the continuity of $f$, we have
\[ \lim_{n \to \infty} f(X_n(\omega)) = f\big(\lim_{n \to \infty} X_n(\omega)\big) = f(X(\omega)). \]
This is true for any $\omega \notin N$ and $P(N) = 0$. Hence $f(X_n) \xrightarrow{a.s.} f(X)$.

Definition 5.4 (Convergence in distribution). Let $X, X_1, X_2, \dots$ be real-valued random variables with distribution functions $F_X, F_{X_1}, F_{X_2}, \dots$ respectively. We say that $(X_n)$ converges to $X$ in distribution, denoted by $X_n \xrightarrow{d} X$, if
\[ \lim_{n \to \infty} F_{X_n}(x) = F_X(x) \]
for all continuity points $x$ of $F_X$.

Remark 5.2. In the above definition, the random variables $X, \{X_n\}$ need not be defined on the same probability space.

Example 5.11. Let $X_n = \frac{1}{n}$ and $X = 0$. Then
\[ F_{X_n}(x) = P(X_n \le x) = \begin{cases} 1 & \text{if } x \ge \frac{1}{n}, \\ 0 & \text{otherwise}, \end{cases} \qquad F_X(x) = \begin{cases} 1 & x \ge 0, \\ 0 & x < 0. \end{cases} \]
Observe that $0$ is the only discontinuity point of $F_X$, and $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ for $x \ne 0$. Thus, $X_n \xrightarrow{d} 0$.

Example 5.12. Let $X$ be a real-valued random variable with distribution function $F$. Define $X_n = X + \frac{1}{n}$. Then
\[ F_{X_n}(x) = P\big(X + \tfrac{1}{n} \le x\big) = F\big(x - \tfrac{1}{n}\big) \ \Longrightarrow\ \lim_{n \to \infty} F_{X_n}(x) = F(x-) = F(x) \quad \text{for every continuity point } x \text{ of } F. \]
This implies that $X_n \xrightarrow{d} X$.

Theorem 5.12. $X_n \xrightarrow{P} X$ implies that $X_n \xrightarrow{d} X$.

Proof. Let $\varepsilon > 0$. Since $F_{X_n}(t) = P(X_n \le t)$, we have
\[ F_{X_n}(t) = P(X_n \le t,\ |X_n - X| > \varepsilon) + P(X_n \le t,\ |X_n - X| \le \varepsilon) \le P(|X_n - X| > \varepsilon) + P(X \le t + \varepsilon) = P(|X_n - X| > \varepsilon) + F_X(t + \varepsilon), \]
and
\[ F_X(t - \varepsilon) = P(X \le t - \varepsilon,\ |X_n - X| > \varepsilon) + P(X \le t - \varepsilon,\ |X_n - X| \le \varepsilon) \le P(|X_n - X| > \varepsilon) + P(X_n \le t) = P(|X_n - X| > \varepsilon) + F_{X_n}(t). \]
Thus, since $\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0$, we obtain from the above inequalities
\[ F_X(t - \varepsilon) \le \liminf_{n \to \infty} F_{X_n}(t) \le \limsup_{n \to \infty} F_{X_n}(t) \le F_X(t + \varepsilon). \]
Let $t$ be a continuity point of $F_X$. Then sending $\varepsilon \to 0$ in the above inequality, we get
\[ \lim_{n \to \infty} F_{X_n}(t) = F_X(t), \quad \text{i.e., } X_n \xrightarrow{d} X. \]

The converse of this theorem is NOT true in general.

Example 5.13. Let $X \sim N(0,1)$. Define $X_n = -X$ for $n = 1, 2, 3, \dots$. Then $X_n \sim N(0,1)$ and hence $X_n \xrightarrow{d} X$. But
\[ P(|X_n - X| > \varepsilon) = P(|2X| > \varepsilon) = P\big(|X| > \tfrac{\varepsilon}{2}\big) \nrightarrow 0 \ \Longrightarrow\ X_n \nrightarrow X \text{ in probability}. \]

Theorem 5.13 (Continuity theorem). Let $X, \{X_n\}$ be random variables having characteristic functions $\phi_X, \{\phi_{X_n}\}$ respectively. Then the following are equivalent.
i) $X_n \xrightarrow{d} X$.
ii) $E(g(X_n)) \to E(g(X))$ for all bounded Lipschitz continuous functions $g$.
iii) $\lim_{n \to \infty} \phi_{X_n}(t) = \phi_X(t)$ for all $t \in \mathbb{R}$.
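Returning to Example 5.13, a short simulation sketch makes the distinction concrete (the threshold $\varepsilon = 1$ and the sample size are arbitrary choices): $X_n = -X$ has exactly the distribution of $X$, yet $|X_n - X| = 2|X|$ stays large with fixed probability.

```python
import random
import statistics

random.seed(1)
N = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]   # samples of X ~ N(0,1)
xns = [-x for x in xs]                            # X_n = -X, the same for every n

# Same distribution: sample moments of X_n and X agree.
print(statistics.mean(xs), statistics.mean(xns))            # both ~ 0
print(statistics.pvariance(xs), statistics.pvariance(xns))  # both ~ 1

# No convergence in probability: |X_n - X| = 2|X|, and
# P(|2X| > 1) = P(|X| > 1/2) = 2(1 - Phi(0.5)) ~ 0.617 for every n.
eps = 1.0
print(sum(abs(a - b) > eps for a, b in zip(xns, xs)) / N)
```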
Example 5.14. Let $\{X_n\}$ be a sequence of Poisson random variables with parameter $\lambda_n = n$. Define $Z_n = \frac{X_n - n}{\sqrt{n}}$. Then $Z_n \xrightarrow{d} Z$, where $\mathcal{L}(Z) = N(0,1)$.
Solution: To see this, we use Lévy's continuity theorem. Let $\Phi_{Z_n} : \mathbb{R} \to \mathbb{C}$ be the characteristic function of $Z_n$. Then we have
\[ \Phi_{Z_n}(u) = E\big[e^{iuZ_n}\big] = e^{-iu\sqrt{n}}\, E\big[e^{i\frac{u}{\sqrt{n}} X_n}\big] = e^{-iu\sqrt{n}}\, e^{n\big(e^{i\frac{u}{\sqrt{n}}} - 1\big)}. \]
Using the Taylor series expansion, we have
\[ e^{i\frac{u}{\sqrt{n}}} - 1 = \frac{iu}{\sqrt{n}} - \frac{u^2}{2n} - \frac{iu^3}{6n^{3/2}} + \dots, \]
and hence we get
\[ \Phi_{Z_n}(u) = e^{-iu\sqrt{n}}\, e^{n\big(\frac{iu}{\sqrt{n}} - \frac{u^2}{2n} - \frac{iu^3}{6n^{3/2}} + \dots\big)} = e^{-\frac{u^2}{2} - \frac{h(u,n)}{\sqrt{n}}}, \]
where $h(u,n)$ stays bounded in $n$ for each $u$, and hence $\lim_{n \to \infty} \frac{h(u,n)}{\sqrt{n}} = 0$. Therefore, we have
\[ \lim_{n \to \infty} \Phi_{Z_n}(u) = e^{-\frac{u^2}{2}}. \]
Since $e^{-\frac{u^2}{2}}$ is the characteristic function of $N(0,1)$, we conclude that $Z_n \xrightarrow{d} Z$.

Theorem 5.14 (Strong law of large numbers (SLLN)). Let $\{X_i\}$ be a sequence of i.i.d. random variables with finite mean $\mu$ and variance $\sigma^2$. Then
\[ \frac{S_n}{n} \xrightarrow{a.s.} \mu, \qquad \text{where } S_n = \sum_{i=1}^n X_i. \]
The special case of this theorem for Bernoulli random variables (which have moments of all orders, in particular fourth moments) is referred to as Borel's SLLN. Before proving the theorem, let us consider some examples.

Example 5.15. 1) Let $\{X_n\}$ be a sequence of i.i.d. random variables that are bounded, i.e., there exists $C < \infty$ such that $P(|X_1| \le C) = 1$. Then $\frac{S_n}{n} \xrightarrow{a.s.} E(X_1)$.
2) Let $\{X_n\}$ be a sequence of i.i.d. Bernoulli($p$) random variables. Then $\mu = p$ and $\sigma^2 = p(1-p)$. Hence by the SLLN theorem, we have
\[ \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n X_i = p \quad \text{with probability } 1. \]

To prove the theorem, we need the following lemma.

Lemma 5.15. Let $\{X_i\}$ be a sequence of random variables defined on a given probability space $(\Omega, \mathcal{F}, P)$.
i) If the $X_n$ are positive, then
\[ E\Big[\sum_{n=1}^\infty X_n\Big] = \sum_{n=1}^\infty E[X_n]. \tag{5.1} \]
ii) If $\sum_{n=1}^\infty E[|X_n|] < \infty$, then $\sum_{n=1}^\infty X_n$ converges almost surely and (5.1) holds as well.

Proof of the Strong law of large numbers: Without loss of generality we can assume that $\mu = 0$ (otherwise, if $\mu \ne 0$, set $\tilde{X}_i = X_i - \mu$ and work with $\tilde{X}_i$). Set $Y_n = \frac{S_n}{n}$. Observe that, thanks to independence,
\[ E[Y_n] = 0, \qquad E[Y_n^2] = \frac{1}{n^2}\sum_{1 \le j,k \le n} E(X_j X_k) = \frac{1}{n^2}\sum_{j=1}^n E[X_j^2] = \frac{\sigma^2}{n}. \]
Thus $\lim_{n \to \infty} E[Y_n^2] = 0$, and hence along a subsequence, $Y_n$ converges to $0$ almost surely. But we need to show that the original sequence converges to $0$ with probability $1$. To do so, we proceed as follows. Since $E[Y_n^2] = \frac{\sigma^2}{n}$, we see that $\sum_{n=1}^\infty E[Y_{n^2}^2] = \sum_{n=1}^\infty \frac{\sigma^2}{n^2} < +\infty$, and hence by Lemma 5.15, ii), $\sum_{n=1}^\infty Y_{n^2}^2$ converges almost surely. Thus,
\[ \lim_{n \to \infty} Y_{n^2} = 0 \quad \text{with probability } 1. \tag{5.2} \]
Let $n \in \mathbb{N}$. Then there exists $m(n) \in \mathbb{N}$ such that $(m(n))^2 \le n < (m(n)+1)^2$. Now
\[ Y_n - \frac{(m(n))^2}{n}\,Y_{(m(n))^2} = \frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^{(m(n))^2} X_i = \frac{1}{n}\sum_{i=1+(m(n))^2}^n X_i \]
\[ \Longrightarrow\ E\Big[\Big(Y_n - \frac{(m(n))^2}{n}\,Y_{(m(n))^2}\Big)^2\Big] = \frac{1}{n^2}\sum_{i=1+(m(n))^2}^n E[X_i^2] = \frac{n - (m(n))^2}{n^2}\,\sigma^2 \le \frac{2m(n)+1}{n^2}\,\sigma^2 \quad (\because\ n < (m(n)+1)^2) \]
\[ \le \frac{2\sqrt{n}+1}{n^2}\,\sigma^2 \le \frac{3\sigma^2}{n^{3/2}} \quad (\because\ m(n) \le \sqrt{n}) \]
\[ \Longrightarrow\ \sum_{n=1}^\infty E\Big[\Big(Y_n - \frac{(m(n))^2}{n}\,Y_{(m(n))^2}\Big)^2\Big] \le \sum_{n=1}^\infty \frac{3\sigma^2}{n^{3/2}} < +\infty. \]
Thus, again by Lemma 5.15, ii), we conclude that
\[ \lim_{n \to \infty} \Big(Y_n - \frac{(m(n))^2}{n}\,Y_{(m(n))^2}\Big) = 0 \quad \text{with probability } 1. \tag{5.3} \]
Observe that $\lim_{n \to \infty} \frac{(m(n))^2}{n} = 1$. Thus, in view of (5.2) and (5.3), we conclude that
\[ \lim_{n \to \infty} Y_n = 0 \quad \text{with probability } 1. \]
This completes the proof.

Theorem 5.16 (Kolmogorov's strong law of large numbers). Let $\{X_n\}$ be a sequence of i.i.d. random variables and $\mu \in \mathbb{R}$. Then
\[ \lim_{n \to \infty} \frac{S_n}{n} = \mu \ \text{ a.s.} \quad \text{if and only if} \quad E[X_n] = \mu. \]
In this case, the convergence also holds in $L^1$.

Example 5.16 (Monte Carlo approximation). Let $f$ be a measurable function on $[0,1]$ such that $\int_0^1 |f(x)|\,dx < \infty$. Let $\alpha = \int_0^1 f(x)\,dx$. In general we cannot obtain a closed-form expression for $\alpha$ and need to estimate it. Let $\{U_j\}$ be a sequence of independent uniform random variables on $[0,1]$. Then by Theorem 5.16,
\[ \lim_{n \to \infty} \frac{1}{n}\sum_{j=1}^n f(U_j) = E[f(U_1)] = \int_0^1 f(x)\,dx \]
a.s. and in $L^1$. Thus, to get an approximation of $\int_0^1 f(x)\,dx$, we only need to simulate the uniform random variables $U_j$ (by using a random number generator).
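A minimal Python sketch of this scheme (the integrand $f(x) = e^{-x^2}$ is a hypothetical choice; it has no elementary antiderivative, and $\int_0^1 e^{-x^2}\,dx \approx 0.7468$):

```python
import math
import random

def mc_integral(f, n):
    """SLLN-based Monte Carlo estimate of the integral of f over [0,1] (Example 5.16)."""
    return sum(f(random.random()) for _ in range(n)) / n

def f(x):
    return math.exp(-x * x)      # no closed-form antiderivative

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, mc_integral(f, n))  # estimates settle near 0.74682 as n grows
```

By Example 5.6, the mean-square error of this estimator is $\mathrm{Var}(f(U_1))/n$, so the typical error decays like $n^{-1/2}$.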
Let {Uj } be a sequence of independent uniformly random variables on [0, 1]. Then by Theorem 5.16, Z 1 n 1X lim f (Uj ) = E[f (Uj )] = f (x) dx n→∞ n 0 j=1 R1 a.s. and in L2 . Thus, to get an approximation of 0 f (x) dx, we need to simulate the uniform random variables Uj (by using a random number generator). Theorem 5.17 (Central limit theorem). Let {Xn } be a sequence of i.i.d. random variables −nµ . Then Yn converges in with finite mean µ and variance σ 2 with 0 < σ 2 < +∞. Let Yn = Sσn√ n distribution to Y , where L(Y ) = N (0, 1). Proof. With out loss of generality, we assume that µ = 0. Let Φ, ΦYn be the characteristic function of Xj and Yn respectively. Since {Xj } are i.i.d., we have ΦYn (u) = E[e iuYn iu σS√nn iu Pn X i=1 √ i σ n ] = E[e ] = E[e ] n n hY X i Y iu X√i u n iu √i =E e σ n = E e σ n = Φ( √ ) . σ n i=1 i=1 Since E[|Xj |2 ] < +∞, the function Φ has two continuous derivatives. In particular, Φ0 (u) = iE[Xj eiuXj ], Φ00 (u) = −E[Xj2 eiuXj ] =⇒ Φ0 (0) = 0, Φ00 (0) = −σ 2 . Expanding Φ in a Taylor expansion about u = 0, we have Φ(u) = 1 − σ 2 u2 + h(u)u2 , 2 where h(u) → 0 as u → 0. Thus, we get u )) n log(Φ( σ√ n ΦYn (u) = e 2 =⇒ lim ΦYn (u) = e− n→∞ n log(1− u + 2n =e u2 2 = ΦY (u) u2 u )) h( σ√ n nσ 2 (by L’Hôpital rule). Hence by Levy’s continuity theorem, we conclude that Yn converges in distribution to Y with L(Y ) = N (0, 1). Remark 5.3. If σ 2 = 0, then Xj = µ a.s. for all j, and hence Sn n = µ a.s. One can weaken slightly the hypotheses of Theorem 5.17. Indeed, we have the following Central limit theorem. Theorem 5.18. Let {Xn } be independent but not necessarily identicallu distributed. Let E[Xn ] = 0 for all n, ane let σn2 = Var(Xn ). Assume that sup E[|Xn |2+ε ] < +∞ for some ε > 0, n Then √PSnn i=1 σi2 ∞ X σn2 = ∞. n=1 converges in distribution to Y with L(Y ) = N (0, 1). Example 5.17. Let {Xn } be a sequence of i.i.d random variables such that P(Xn = 1) = 12 and P(Xn = 0) = 12 . Then µ = 12 and σ 2 = 14 . Hence by central limit theorem, Yn = 2S√nn−n converges in distribution to Y with L(Y ) = N (0, 1). PROBABILITY AND STOCHASTIC PROCESS 61 Example 5.18. Let X ∼ B(n, p). For any given 0 ≤ α ≤ 1, we want to find n such that P(X > n2 ) ≤ 1 − α. We can thaink of X as a sum of n i.i.d. random variable Xi such that Xi ∼ B(1, p). Hence by central limit theorem, for large n, Z x p u2 1 e− 2 du. P(X − np ≤ x np(1 − p)) = Φ(x) = √ 2π −∞ √ p Choose x such that np + x np(1 − p) = n2 . This implies that x = 2n √1−2p . Thus, p(1−p) Z √ n 2 √1−2p p(1−p) u2 n n 1 e− 2 du . ) = 1 − P(X ≤ ) = 1 − √ 2 2 2π −∞ Therefore, we need to choose n such that p Z √n √1−2p 2 2 √ 2 p(1 − p) −1 1 p(1−p) − u e 2 du =⇒ n ≥ α≤ √ Φ (α) 1 − 2p 2π −∞ 4p(1 − p) −1 2 Φ (α) . =⇒ n ≥ (1 − 2p)2 P(X > Example 5.19. Let {Xi } be sequence of i.i.d. random variables with exp(1) distributed. Let Pn i=1 Xi . How large should n be such that X̄ = n P(0.9 ≤ X̄ ≤ 1.1) ≥ 0.95? Since Xi ’s are exp(1) distributed, µ = E[Xi ] = 1 and σ 2 = Var(Xi ) = 1. Let Y = Then by central limit theorem, Y√−n is approximately N (0, 1). Now n Pn i=1 Xi . (0.9)n − n Y −n (1.1)n − n √ √ ≤ √ ≤ ) n n n √ √ √ √ Y −n = P − (0.1) n ≤ √ ≤ (0.1) n = Φ((0.1) n) − Φ(−(0.1) n) n √ = 2Φ((0.1) n) − 1 (∴ Φ(−x) = 1 − Φ(x)) P(0.9 ≤ X̄ ≤ 1.1) = P((0.9)n ≤ Y ≤ (1.1)n) = P( Hence we need to find n such that √ √ √ 2Φ((0.1) n) − 1 ≥ 0.95 =⇒ Φ((0.1) n) ≥ 0.975 =⇒ (0.1) n ≥ Φ−1 (0.975) = 1.96 √ =⇒ n ≥ 19.6 =⇒ n ≥ 384.16 =⇒ n ≥ 385 (∴ n ∈ N).