# 2201-MTL106-51-61 by Amaiya Singhal

```PROBABILITY AND STOCHASTIC PROCESS
51
5. Modes of Convergence
5.1. Some important inequalities. We start this section by proving a result known as Markov
inequality.
Lemma 5.1 (Markov inequality). If X is a non-negative random variable whose expected
value exists, then for all a &gt; 0,
P(X ≥ a) ≤
E(X)
.
a
Proof. Observe that, since X is non-negative
h
i
E[X] = E X1{X≥a} + X1{X&lt;a} ≥ E[X1{X≥a} ] ≥ aP(X ≥ a).
Hence the result follows.
Corollary 5.2. If X is a random variable such that E[|X|] &lt; +∞, then for all a &gt; 0
P(|X| ≥ a) ≤
E(|X|)
.
a
Example 5.1. A coin is weighted so that its probability of landing on heads is 20%. Suppose
the coin is flipped 20 times. We want to find a bound for the probability it lands on heads at
least 16 times. Let X be the number of times the coin lands on heads. Then X ∼ B(20, 15 ). We
use Markov inequality to find the required bound:
P(X ≥ 16) ≤
E(X)
4
1
=
= .
16
16
4
The actual probability that this happen is
20 X
20
P(X ≥ 16) =
(0.2)k (0.8)20−k ≈ 1.38 &times; 10−8 .
k
k=16
Lemma 5.3 (Chebyschev’s inequality). Let Y be an integrable random variable such that
Var(Y ) &lt; +∞. Then for any ε &gt; 0
P(|Y − E(Y )| ≥ ε) ≤
Var(Y )
.
ε2
Proof. To get the result, take X = |Y − E(Y )|2 and a = ε2 in Markov inequality.
Example 5.2. Is there any random variable X for which
1
P &micro; − 3σ ≤ X ≤ &micro; + 3σ = ,
2
2
where &micro; = E(X) and σ = Var(X).
Solution: Observe that
P &micro; − 3σ ≤ X ≤ &micro; + 3σ = P |X − &micro;| ≤ 3σ = 1 − P |X − E(X)| &gt; 3σ .
By Chebyschev’s inequality, we get that
σ2
P |X − E(X)| &gt; 3σ ≤ P |X − E(X)| ≥ 3σ ≤ 2 ,
9σ
and hence
Since
1
2
8
1
P &micro; − 3σ ≤ X ≤ &micro; + 3σ ≥ 1 − = .
9
9
&lt; 89 , there exists NO random variable X satisfying the given condition.
52
A. K. MAJEE
In principle Chebyshev’s inequality asks about distance from the mean in either direction, it
can still be used to give a bound on how often a random variable can take large values, and will
usually give much better bounds than Markov’s inequality. For example consider Example 5.1.
Markov’s inequality gives a bound of 14 . Using Chebyshev’s inequality, we see that
Var(X)
1
=
.
144
45
Lemma 5.4 (One-sided Chebyschev inequality). Let X be a random variable with mean 0 and
variance σ 2 &lt; +∞. Then for any a &gt; 0,
P(X ≥ 16) = P(X − 4 ≥ 12) ≤ P(|X − 4| ≥ 12) ≤
σ2
.
σ 2 + a2
Proof. For any b ≥ 0, we see that X ≥ a is equivalent to X + b ≥ a + b. Hence by Markov’s
inequality, we have
P(X ≥ a) ≤
P(X ≥ a) = P(X + b ≥ a + b) ≤ P((X + b)2 ≥ (a + b)2 ) ≤
=⇒ P(X ≥ a) ≤ min
b≥0
E[(X + b)2 ]
σ 2 + b2
=
(a + b)2
(a + b)2
σ2
σ 2 + b2
=
.
(a + b)2
σ 2 + a2
One can use one-sided Chebyschev inequality to arrive at the following corollary.
Corollary 5.5. If E[X] = &micro; and Var(X) = σ 2 , then for any a &gt; 0,
σ2
σ 2 + a2
σ2
P(X ≤ &micro; − a) ≤ 2
.
σ + a2
Example 5.3. Let X be a Poisson random variable with mean 20. Show that one-sided Chebyshev inequality gives better upper bound on P(X ≥ 26) compare to Markov and Chebyschev
inequalities. Indeed, by Markov inequality, we have
E[X]
10
p = P(X ≥ 26) ≤
=
.
26
13
By Chebyschev inequality, we get
Var(X)
10
=
.
p = P(X − 20 ≥ 6) ≤ P(|X − 20| ≥ 6) ≤
36
18
One-sided Chebyschev inequality gives
Var(X)
10
p = P(X − 20 ≥ 6) ≤
=
.
Var(X) + 36
28
P(X ≥ &micro; + a) ≤
Theorem 5.6 (Weak Law of Large Number). Let {Xi } be a sequence of iid random variables
with finite mean &micro; and variance σ 2 . Then for any ε &gt; 0
Sn
σ2
P |
− &micro;| &gt; ε ≤ 2 ,
n
nε
P
where Sn = ni=1 Xi . In particular,
Sn
lim P |
− &micro;| &gt; ε = 0 .
n→∞
n
Proof. First inequality follows from Chebyschev’s inequality. Sending limit as n tends to infinity
in the first inequality, we arrive at the second result.
PROBABILITY AND STOCHASTIC PROCESS
53
We shall discuss various modes of convergence for a given sequence of random variables {Xn }
defined on a given probability space (Ω, F, P).
Definition 5.1 (Convergence in probability). We say that {Xn } converges to a random
variable X, defined on the same probability space (Ω, F, P), in probability if for every ε &gt; 0,
lim P(|Xn − X| &gt; ε) = 0.
n→∞
P
We denote it by Xn → X.
Example 5.4. Let {Xn } be a sequence of random variables such that P(Xn = 0) = 1 −
P(Xn = n) =
1
n.
1
n
and
P
Then Xn → 0. Indeed for any ε &gt; 0,
(
1
if ε &lt; n,
P(|Xn | &gt; ε) = n
0 if ε ≥ n .
Hence, limn→∞ P(|Xn | &gt; ε) = 0.
Example 5.5. Let {Xn } be a sequence of i.i.d. random variables with P(Xn = 1) = 21 and
P
P(Xn = −1) = 12 . Then n1 ni=1 Xi converges to 0 in probability. Indeed for any ε &gt; 0, thanks
to weak law of large number, we have
1
Var(X1 )
P(| Sn − &micro;| &gt; ε) ≤
n
nε2
where &micro; = E(X1 ). Observe that &micro; = 0 and Var(X1 ) = 1. Hence
n
1
1X
Xi | &gt; ε) ≤ 2 → 0
P(|
n
nε
as n → ∞.
i=1
P
Theorem 5.7. Xn → X if and only if lim E
n→∞
|Xn −X| 1+|Xn −X|
= 0.
P
Proof. With out loss of generality, take X = 0. Thus, we want to show that Xn → 0 if and only
|Xn | = 0.
if lim E 1+|X
n|
n→∞
P
Suppose Xn → 0. Then given ε &gt; 0, we have limn→∞ P(|Xn | &gt; ε) = 0. Now,
|Xn |
|Xn |
|Xn |
=
1
+
1
≤ 1|Xn |&gt;ε + ε
1 + |Xn |
1 + |Xn | |Xn |&gt;ε 1 + |Xn | |Xn |≤ε
|Xn | |Xn | ≤ P(|Xn | &gt; ε) + ε =⇒ lim E
≤ ε.
=⇒ E
n→∞
1 + |Xn |
1 + |Xn |
|Xn | Since ε &gt; 0 is arbitrary, we have limn→∞ E 1+|X
= 0.
n|
|Xn |
x
Conversely, let limn→∞ E 1+|X
= 0. Observe that the function f (x) = 1+x
is strictly
n|
increasing on [0, ∞). Thus,
ε
|Xn |
|Xn |
1|Xn |&gt;ε ≤
1|Xn |&gt;ε ≤
1+ε
1 + |Xn |
1 + |Xn |
ε
|Xn | =⇒
P(|Xn | &gt; ε) ≤ E
1+ε
1 + |Xn |
1+ε
|Xn | P
=⇒ lim P(|Xn | &gt; ε) ≤
lim E
= 0 =⇒ Xn → 0.
n→∞
ε n→∞ 1 + |Xn |
54
A. K. MAJEE
Definition 5.2 (Convergence in r-th mean). Let X, {Xn } be random variables defined on
a given probability space (Ω, F, P) such that for r ∈ N, E[|X|r ] &lt; ∞ and E[|Xn |r ] &lt; ∞ for all
r
n. We say that {Xn } converges in the r-th mean to X, denoted by Xn → X, if the following
holds:
lim E[|Xn − X|r ] = 0.
n→∞
Example 5.6. Let {Xn } be i.i.d. random variables with E[Xn ] = &micro; and Var(Xn ) = σ 2 . Define
P
2
Yn = n1 ni=1 Xi . Then Yn → &micro;. Indeed
Pn
Xi − n&micro; 2
1
1
σ2
2
E[|Yn − &micro;| ] = E[ i=1
] = 2 E[|Sn − E(Sn )|2 ] = 2 Var(Sn ) =
n
n
n
n
Pn
2
where Sn = i=1 Xi . Hence Yn → &micro;.
Theorem 5.8. The following holds:
r
P
i) Xn → X =⇒ Xn → X for any r ≥ 1.
P
P
ii) Let f be a given continuous function. If Xn → X, then f (Xn ) → f (X).
Proof. Proof of (i) follows from Markov’s inequality. Indeed, for any given ε &gt; 0,
E[|Xn − X|r ]
εr
1
=⇒ lim P(|Xn − X| &gt; ε) ≤ r lim E[|Xn − X|r ] = 0.
n→∞
ε n→∞
Proof of (ii): For any k &gt; 0, we see that
P(|Xn − X| &gt; ε) ≤
{|f (Xn ) − f (X)| &gt; ε} ⊂ {|f (Xn ) − f (X)| &gt; ε, |X| ≤ k} ∪ {|X| &gt; k} .
Since f is continuous, it is uniformly continuous on any bounded interval. Therefore, for any
given ε &gt; 0, there exists δ &gt; 0 such that |f (x) − f (y)| ≤ ε if |x − y| ≤ δ for x and y in [−k, k].
This means that
{|f (Xn ) − f (X)| &gt; ε, |X| ≤ k} ⊂ {|Xn − X| &gt; δ, |X| ≤ k} ⊂ {|Xn − X| &gt; δ} .
Thus we have
{|f (Xn ) − f (X)| &gt; ε} ⊂ {|Xn − X| &gt; δ} ∪ {|X| &gt; k}
=⇒ P(|f (Xn ) − f (X)| &gt; ε) ≤ P(|Xn − X| &gt; δ) + P(|X| &gt; k).
P
Since Xn → X and limk→∞ P(|X| &gt; k) = 0, we obtain that limn→∞ P(|f (Xn ) − f (X)| &gt; ε) = 0.
This completes the proof.
In general, convergence in probability does not imply convergence in r-th mean. To see it,
consider the following example.
P
Example 5.7. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Let Xn = n1(0,1/n) . Then Xn → 0
r
but Xn 9 0 for all r ≥ 1. To show this, observe that
P(|Xn | &gt; ε) ≤
1
=⇒ lim P(|Xn | &gt; ε) = 0
n→∞
n
On the other hand, for r ≥ 1
r
Z
E[|Xn | ] =
0
1
n
nr dx = nr−1 9 0
P
i.e., Xn → 0.
as n → ∞.
PROBABILITY AND STOCHASTIC PROCESS
55
Definition 5.3 (Almost sure convergence). Let X, {Xn } be random variables defined on
a given probability space (Ω, F, P). We say that {Xn } converges to X almost surely or with
probability 1 if the following holds:
P( lim Xn = X) = 1.
n→∞
a.s
We denote it by Xn → X.
Example 5.8. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Define
(
1 if ω ∈ (0, 1 − n1 )
Xn (ω) =
n, otherwise.
It is easy to check that if ω = 0 or ω = 1, then limn→∞ Xn (ω) = ∞. For any ω ∈ (0, 1), we
can find n0 ∈ N such that ω ∈ (0, 1 − n1 ) for any n ≥ n0 . As a consequence, Xn (ω) = 1 for any
n ≥ n0 . In other words, for ω ∈ (0, 1), limn→∞ Xn (ω) = 1. Define X(ω) = 1 for all ω ∈ [0, 1].
Then
a.s.
P(ω ∈ [0, 1] : {Xn (ω)} does not converge to X(ω)) = P({0, 1}) = 0 =⇒ Xn → 1.
Sufficient condition for almost sure convergence: Let {An } be a sequence of events in F.
Define
lim sup An = ∩∞
∪m≥n Am .
n=1 ∪m≥n Am = lim
n→∞
n
This can be interpreted probabilistically as
lim sup An = “ An occurs infinitely often”.
n
We denote this as
{An i.o.} = lim sup An .
n
Theorem 5.9 (Borel-Cantelli lemma). Let {An } be a sequence of events in (Ω, F, P).
P
i) If ∞
n=1 P(An ) &lt; +∞, then P(An i.o.) = 0.
P
ii) If An are mutually independent events, and if ∞
n=1 P(An ) = ∞, then
P(An i.o.) = 1.
P
Remark 5.1. For mutually independent events An , since ∞
n=1 P(An ) is either finite or infinite,
the event {An i.o.} has probability either 0 or 1. This is sometimes called zero-one law.
As a consequence of Borel-Cantelli lemma, we have the following proposition.
Proposition 5.10. Let {Xn } be a sequence of random variables defined on a probability space
P
a.s.
(Ω, F, P). If ∞
n=1 P(|Xn | &gt; ε) &lt; +∞ for any ε &gt; 0, then Xn → 0.
P
Proof. Fix ε &gt; 0. Let An = {|Xn | &gt; ε}. Then ∞
n=1 P(An ) &lt; +∞, and hence by Borel-Cantelli
lemma, P(An i.o.) = 0. Now
c
lim sup An = {ω : ∃ n0 (ω) such that |Xn (ω)| ≤ ε ∀ n ≥ n0 (ω)} := Bε .
n
c
∞
c
Thus, P(Bε ) = 1. Let B = ∩∞
r=1 B 1 =⇒ B = ∪r=1 B 1 . Moreover, since P(Bε= 1 ) = 1, we have
r
r
r
P(B c1 ) = 0. Observe that
r
{ω : lim |Xn (ω)| = 0} = ∩∞
r=1 B 1 .
n→∞
Again, P(B c ) ≤
P∞
r
c
r=1 P(B 1 ) = 0, and hence P(B) = 1. In other words,
r
a.s.
P {ω : lim |Xn (ω)| = 0} = 1, i.e., Xn → 0.
n→∞
56
A. K. MAJEE
Example 5.9. Let {Xn } be a sequence of i.i.d. random variables such that P(Xn = 1) = 12 and
P
a.s.
P(Xn = −1) = 12 . Let Sn = ni=1 Xi . Then n12 Sn2 → 0. To show the result, we use Proposition
5.10. Note that
∞
X
E[|Sn2 |2 ]
1
1
1
1
a.s.
≤ 2 2 =⇒
P( 2 |Sn2 | &gt; ε) ≤
P( 2 |Sn2 | &gt; ε) &lt; ∞ =⇒ 2 Sn2 → 0.
4
2
n
n ε
n ε
n
n
n=1
Let us consider the following example.
Example 5.10. Let Ω = [0, 1], F = B([0, 1]) and P(dx) = dx. Define
Xn = 1[
j j+1
,
]
2k 2k
n = 2k + j, j ∈ {0, . . . , 2k − 1},
,
k = blog2 (n)c.
Note that, for each positive integer n, there exist integers j and k(uniquely determined) such
that
n = 2k + j, , j ∈ {0, . . . , 2k − 1}, k = blog2 (n)c.
( for n = 1, k = j = 0, and for n = 5, k = 2, j = 1 and so on). Let An = {Xn &gt; 0}. Then,
P
clearly P(An ) → 0. Consequently, Xn → 0 but Xn (ω) 9 0 for all ω ∈ Ω.
Theorem 5.11. The followings hold.
a.s
P
i) If Xn → X, then Xn → X.
a.s.
P
ii) If Xn → X, then there exists a subsequence Xnk of Xn such that Xnk → X.
a.s
a.s
iii) If Xn → X, then for any continuous function f , f (Xn ) → f (X).
ε = ∪∞ Aε . Since
Proof. Proof of i): For any ε &gt; 0, define Aεn = {|Xn − X| &gt; ε} and Bm
n=m n
a.s
ε ) = 0. Note that {B ε } are nested and decreasing sequence of events. Hence
Xn → X, P(∩m Bm
m
from the continuity of probability measure P, we have
ε
ε
lim P(Bm
) = P(∩m Bm
) = 0.
m→∞
ε , we have P(Aε ) ≤ P(B ε ). This implies that lim
ε
Since Aεm ⊂ Bm
m→∞ P(Am ) = 0. In other
m
m
P
words, Xn → X.
P
Proof of ii): To prove ii), we will use Borel-Cantelli lemma. Since Xn → X, we can choose
a subsequence Xnk such that P(|Xnk − X| &gt; k1 ) ≤ 21k . Let Ak := {|Xnk − X| &gt; k1 }. Then
P∞
k=1 P(Ak ) &lt; +∞. Hence, by Borel-Cantelli lemma P(Ak i.o.) = 0. This implies that
1
∞
c
P(∪∞
n=1 ∩m=n Am ) = 1 =⇒ P {ω ∈ Ω : ∃ n0 : ∀k ≥ n0 , |Xnk − X| ≤ }) = 1
k
a.s.
=⇒ Xnk → X.
Proof of iii): Let N = {ω : limn→∞ Xn (ω) 6= X(ω)}. Then P(N ) = 0. If ω ∈
/ N , then by the
continuity property of f , we have
lim f (Xn (ω)) = f ( lim Xn (ω)) = f (X(ω)) .
n→∞
n→∞
a.s
This is true for any ω ∈
/ N and P(N ) = 0. Hence f (Xn ) → f (X).
PROBABILITY AND STOCHASTIC PROCESS
57
Definition 5.4 (Convergence in distribution). Let X, X1 , X2 , . . . be real-valued random
variables with distribution functions FX , FX1 , FX2 , . . . respectively. We say that (Xn ) converges
d
to X is distribution, denoted by Xn → X, if
lim FXn (x) = FX (x)
n→∞
for all continuity points x of FX .
Remark 5.2. In the above definition, the random variables X, {Xn } need not be defined on
the same probability space.
Example 5.11. Let Xn =
1
n
and X = 0. Then
(
1 if x ≥ n1
FXn (x) = P(Xn ≤ x) =
0, otherwise
and
(
1 x ≥ 0,
FX (x) =
0, x &lt; 0.
Observe that 0 is the only discontinuity point of FX , and limn→∞ FXn (x) = FX (x) for x 6= 0.
d
Thus, Xn → 0.
Example 5.12. Let X be a real-valued random variable with distribution function F . Define
Xn = X + n1 . Then
1
1
FXn (x) = P(X + ≤ x) = F (x − )
n
n
=⇒ lim FXn (x) = F (x−) = F (x) for continuity point x of F .
n→∞
d
This implies that Xn → X.
d
P
Theorem 5.12. Xn → X implies that Xn → X.
Proof. Let ε &gt; 0. Since FXn (t) = P(Xn ≤ t), we have
FXn (t) = P(Xn ≤ t, |Xn − X| &gt; ε) + P (Xn ≤ t, |Xn − X| ≤ ε)
≤ P(|Xn − X| &gt; ε) + P(Xn ≤ t, |Xn − X| ≤ ε)
≤ P(|Xn − X| &gt; ε) + P(X ≤ t + ε)
≤ P(|Xn − X| &gt; ε) + FX (t + ε),
and
FX (t − ε) = P(X ≤ t − ε) = P(X ≤ t − ε, |Xn − X| &gt; ε) + P(X ≤ t − ε, |Xn − X| ≤ ε)
≤ P(|Xn − X| &gt; ε) + P(X ≤ t − ε, |Xn − X| ≤ ε)
≤ P(|Xn − X| &gt; ε) + P(Xn ≤ t)
≤ P(|Xn − X| &gt; ε) + FXn (t) .
Thus, since limn→∞ P(|Xn − X| &gt; ε) = 0, we obtain from the above inequalities
FX (t − ε) ≤ lim inf FXn (t) ≤ lim sup FXn (t) ≤ FX (t + ε).
n→∞
n→∞
Thet t be the continuity point of F . Then sending ε → 0 in the above inequality, we get
lim FXn (t) = FX (t),
n→∞
d
i.e., Xn → X.
Converse of this theorem is NOT true in general.
Example 5.13. Let X ∼ N (0, 1). Define Xn = −X for n = 1, 2, 3, . . .. Then Xn ∼ N (0, 1)
d
and hence Xn → X. But
ε
P
P(|Xn − X| &gt; ε) = P(|2X| &gt; ε) = P(|X| &gt; ) 6= 0 =⇒ Xn 9 X.
2
58
A. K. MAJEE
Theorem 5.13 (Continuity theorem). Let X, {Xn } be random variables having the characteristic function φX , {φXn } respectively. Then the followings are equivalent.
d
i) Xn → X.
ii) E(g(Xn )) → E(g(X)) for all bounded Lipschitz continuous function.
iii) limn→∞ φXn (t) = φX (t) for all t ∈ R.
Example 5.14. Let {Xn } be a sequence of Poisson random variables with parameter λn = n.
n −n
Define Zn = X√
. Then
n
d
Zn → Z, where L(Z) = N (0, 1).
Solution: To see this, we use Levy’s continuity theorem. Let ΦZn : R → C be the characteristic
function of Zn . Then we have
iu
√
√
√
i √u X n
ΦZn (u) = E eiuZn = e−iu n E e n n = e−iu n en(e −1) .
Using Taylor‘s series expansion, we have
e
iu
√
n
iu
iu3
u2
−1= √ −
−
+ ...
n 2n 6n 32
and hence we get
2
ΦZn (u) = e
3
iu
√ n( √
−u
− iu 3 +...)
2n
n
−iu n
6n 2
e
= e−iu
√
2
√
h(u,n)
n+−iu n − u2 − √n
e
where h(u, n) stays bounded in n for each u and hence limn→∞
lim ΦZn (u) = lim e−iu
n→∞
Since e−
u2
2
√
h(u,n)
√
n
2
√
h(u,n)
n+−iu n − u2 − √n
e
n→∞
e
e
= 0. Therefore, we have
= e−
u2
2
.
d
is the characteristic function of N (0, 1), we conclude that Zn → Z.
Theorem 5.14 (Strong law of large number (SLLN)). Let {Xi } be a sequence of i.i.d.
random variables with finite mean &micro; and variance σ 2 . Then
n
X
Sn a.s.
→ &micro;, where Sn =
Xi .
n
i=1
The special case of 4-th order moment, above theorem is refered as Borel’s SLLN. Before
going to prove the theorem, let us consider some examples.
Example 5.15.
1) Let {Xn } be a sequence of i.i.d. random variables that are bounded,
a.s.
i.e., there exists C &lt; ∞ such that P(|X1 | ≤ C) = 1. Then Snn → E(X1 ).
2) Let {Xn } be a sequence of i.i.d. Bernoulli(p) random variables. Then &micro; = p and σ 2 =
p(1 − p). Hence by SLLN theorem, we have
n
1X
lim
Xi = p with probability 1.
n→∞ n
i=1
To prove the theorem, we need following lemma.
Lemma 5.15. Let {Xi } be a sequence of random variables defined on a given probability space
(Ω, F, P).
i) If {Xn } are positive, then
∞
∞
X
X
E
Xn =
E[Xn ] .
(5.1)
n=1
ii) If
P∞
n=1 E[|Xn |]
&lt; ∞, then
P∞
i=1 Xi
n=1
converges almost surely and (5.1) holds as well.
PROBABILITY AND STOCHASTIC PROCESS
59
Proof of Strong law of large number: With out loss of generality we can assume that &micro; = 0
(otherwise, if &micro; 6= 0, then set X̃i = Xi − &micro;, and work with X̃i ). Set Yn = Snn . Observe that,
thanks to independent property,
E[Yn2 ]
E[Yn ] = 0,
1
= 2
n
X
1≤j,k≤n
n
1 X
σ2
E(Xj Xk ) = 2
E[Xj2 ] =
.
n
n
j=1
Thus, limn→∞ E[Yn2 ] = 0, and hence along a subsequence, Yn converges to 0 almost surely. But
we need to show that original sequence converges to 0 with probability 1. To do so, we proceed
P∞ σ2
P
2
2
as follows. Since E[Yn2 ] = σn , we see that ∞
n=1 n2 &lt; +∞ and hence by Lemma
n=1 E[Yn2 ] =
P∞
5.15, ii), n=1 Yn22 converges almost surely. Thus,
lim Yn2 = 0
n→∞
with probability 1.
(5.2)
Let n ∈ N. Then there exists m(n) ∈ N such that (m(n))2 ≤ n &lt; (m(n) + 1)2 . Now
2
(m(n))
n
X
(m(n))2
1X
(m(n))2
1
Yn −
Y(m(n))2 =
Xi −
Xi
n
n
n
(m(n))2
i=1
=
1
n
i=1
n
X
Xi
i=1+(m(n))2
2 i
h
1
(m(n))2
= 2
=⇒ E Yn −
Y(m(n))2
n
n
n
X
E[Xi2 ]
i=1+(m(n))2
n − (m(n))2 2 2m(n) + 1 2
σ ≤
σ (∴ n &lt; (m(n) + 1)2 )
n2
n2
√
√
2 n + 1 2 3σ 2
≤
σ ≤ 3 (∴ m(n) ≤ n)
2
n
n2
∞
∞
h
i
2
2
X 3σ
X
2
(m(n))
Y(m(n))2
≤
=⇒
E Yn −
3 &lt; +∞ .
n
2
n
n=1
n=1
=
Thus, again by Lemma 5.15, ii), we conclude that
lim Yn −
n→∞
Obsere that limn→∞
(m(n))2
n
(m(n))2
Y(m(n))2 = 0
n
with probability 1.
(5.3)
= 1. Thus, in view of (5.2) and (5.3), we conclude that
lim Yn = 0
n→∞
with probability 1.
This completes the proof.
Theorem 5.16 (Kolmogorov’s strong law of large numbers). Let {Xn } be a sequence of
i.i.d. random variables and &micro; ∈ R. Then
lim
n→∞
Sn
= &micro; a.s. if and only if E[Xn ] = &micro;.
n
In this case, the convergence also holds in L1 .
Example 5.16 (Monte Carlo approximation). Let f be a measurable function in [0, 1] such
R1
R1
that 0 |f (x)| dx &lt; ∞. Let α = 0 f (x) dx. In general we cannot obtain a closed form expression
60
A. K. MAJEE
for α and need to estimate it. Let {Uj } be a sequence of independent uniformly random variables
on [0, 1]. Then by Theorem 5.16,
Z 1
n
1X
lim
f (Uj ) = E[f (Uj )] =
f (x) dx
n→∞ n
0
j=1
R1
a.s. and in L2 . Thus, to get an approximation of 0 f (x) dx, we need to simulate the uniform
random variables Uj (by using a random number generator).
Theorem 5.17 (Central limit theorem). Let {Xn } be a sequence of i.i.d. random variables
−n&micro;
. Then Yn converges in
with finite mean &micro; and variance σ 2 with 0 &lt; σ 2 &lt; +∞. Let Yn = Sσn√
n
distribution to Y , where L(Y ) = N (0, 1).
Proof. With out loss of generality, we assume that &micro; = 0. Let Φ, ΦYn be the characteristic
function of Xj and Yn respectively. Since {Xj } are i.i.d., we have
ΦYn (u) = E[e
iuYn
iu σS√nn
iu
Pn
X
i=1
√ i
σ n
] = E[e
] = E[e
]
n
n
hY
X i
Y iu X√i u n
iu √i
=E
e σ n =
E e σ n = Φ( √ ) .
σ n
i=1
i=1
Since E[|Xj |2 ] &lt; +∞, the function Φ has two continuous derivatives. In particular,
Φ0 (u) = iE[Xj eiuXj ],
Φ00 (u) = −E[Xj2 eiuXj ] =⇒ Φ0 (0) = 0, Φ00 (0) = −σ 2 .
Expanding Φ in a Taylor expansion about u = 0, we have
Φ(u) = 1 −
σ 2 u2
+ h(u)u2 ,
2
where h(u) → 0 as u → 0.
Thus, we get
u
))
n log(Φ( σ√
n
ΦYn (u) = e
2
=⇒ lim ΦYn (u) = e−
n→∞
n log(1− u
+
2n
=e
u2
2
= ΦY (u)
u2
u
))
h( σ√
n
nσ 2
(by L’Hôpital rule).
Hence by Levy’s continuity theorem, we conclude that Yn converges in distribution to Y with
L(Y ) = N (0, 1).
Remark 5.3. If σ 2 = 0, then Xj = &micro; a.s. for all j, and hence
Sn
n
= &micro; a.s.
One can weaken slightly the hypotheses of Theorem 5.17. Indeed, we have the following
Central limit theorem.
Theorem 5.18. Let {Xn } be independent but not necessarily identicallu distributed.
Let E[Xn ] = 0 for all n, ane let σn2 = Var(Xn ). Assume that
sup E[|Xn |2+ε ] &lt; +∞ for some ε &gt; 0,
n
Then √PSnn
i=1
σi2
∞
X
σn2 = ∞.
n=1
converges in distribution to Y with L(Y ) = N (0, 1).
Example 5.17. Let {Xn } be a sequence of i.i.d random variables such that P(Xn = 1) = 12 and
P(Xn = 0) = 12 . Then &micro; = 12 and σ 2 = 14 . Hence by central limit theorem, Yn = 2S√nn−n converges
in distribution to Y with L(Y ) = N (0, 1).
PROBABILITY AND STOCHASTIC PROCESS
61
Example 5.18. Let X ∼ B(n, p). For any given 0 ≤ α ≤ 1, we want to find n such that
P(X &gt; n2 ) ≤ 1 − α. We can thaink of X as a sum of n i.i.d. random variable Xi such that
Xi ∼ B(1, p). Hence by central limit theorem, for large n,
Z x
p
u2
1
e− 2 du.
P(X − np ≤ x np(1 − p)) = Φ(x) = √
2π −∞
√
p
Choose x such that np + x np(1 − p) = n2 . This implies that x = 2n √1−2p . Thus,
p(1−p)
Z
√
n
2
√1−2p
p(1−p)
u2
n
n
1
e− 2 du .
) = 1 − P(X ≤ ) = 1 − √
2
2
2π −∞
Therefore, we need to choose n such that
p
Z √n √1−2p
2
2
√
2 p(1 − p) −1
1
p(1−p) − u
e 2 du =⇒ n ≥
α≤ √
Φ (α)
1 − 2p
2π −∞
4p(1 − p) −1 2
Φ (α) .
=⇒ n ≥
(1 − 2p)2
P(X &gt;
Example
5.19. Let {Xi } be sequence of i.i.d. random variables with exp(1) distributed. Let
Pn
i=1 Xi
. How large should n be such that
X̄ =
n
P(0.9 ≤ X̄ ≤ 1.1) ≥ 0.95?
Since Xi ’s are exp(1) distributed, &micro; = E[Xi ] = 1 and σ 2 = Var(Xi ) = 1. Let Y =
Then by central limit theorem, Y√−n
is approximately N (0, 1). Now
n
Pn
i=1 Xi .
(0.9)n − n
Y −n
(1.1)n − n
√
√
≤ √
≤
)
n
n
n
√
√
√
√ Y −n
= P − (0.1) n ≤ √
≤ (0.1) n = Φ((0.1) n) − Φ(−(0.1) n)
n
√
= 2Φ((0.1) n) − 1 (∴ Φ(−x) = 1 − Φ(x))
P(0.9 ≤ X̄ ≤ 1.1) = P((0.9)n ≤ Y ≤ (1.1)n) = P(
Hence we need to find n such that
√
√
√
2Φ((0.1) n) − 1 ≥ 0.95 =⇒ Φ((0.1) n) ≥ 0.975 =⇒ (0.1) n ≥ Φ−1 (0.975) = 1.96
√
=⇒ n ≥ 19.6 =⇒ n ≥ 384.16 =⇒ n ≥ 385 (∴ n ∈ N).
```