Lecture 13: The Law of Large Numbers

PROBABILITY AND STATISTICS
FOR ENGINEERING
The Weak Law and the Strong Law
of Large Numbers
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology
Timeline
- 1700: Bernoulli, weak law of large numbers (WLLN)
- 1800: Poisson, generalized Bernoulli's theorem
- 1866: Tchebychev, discovered his method
- Markov, used Tchebychev's reasoning to extend Bernoulli's theorem to dependent random variables as well
- 1909: Borel, the strong law of large numbers (SLLN) that further generalizes Bernoulli's theorem
- 1926: Kolmogorov, necessary and sufficient conditions for a set of mutually independent r.vs
Weak Law of Large Numbers
- Let the X_i be i.i.d. Bernoulli r.vs such that
      P(X_i = 1) = p,    P(X_i = 0) = 1 - p = q,
  and let k = X_1 + X_2 + \cdots + X_n be the number of "successes" in n trials.
- Then the weak law due to Bernoulli states that
      P\{ |k/n - p| \ge \epsilon \} \le \frac{pq}{n \epsilon^2},
- i.e., the ratio "total number of successes to the total number of trials" tends to p in probability as n increases.
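- A small Python sketch (not part of the original slides; the values of p, ε and the number of runs are arbitrary choices) that estimates P{|k/n - p| ≥ ε} by simulation and compares it with the weak-law bound pq/(nε²):

    import random

    # Illustrative sketch (p, eps and the number of runs are arbitrary choices):
    # estimate P{|k/n - p| >= eps} for i.i.d. Bernoulli(p) trials by Monte Carlo
    # and compare it with the weak-law bound pq/(n*eps^2).
    random.seed(0)
    p, eps, runs = 0.3, 0.05, 2000
    q = 1 - p
    for n in (50, 200, 1000):
        exceed = 0
        for _ in range(runs):
            k = sum(random.random() < p for _ in range(n))   # number of successes
            exceed += abs(k / n - p) >= eps
        bound = p * q / (n * eps ** 2)
        print(f"n={n:5d}  empirical={exceed / runs:.3f}  weak-law bound={min(bound, 1):.3f}")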
Strong Law of Large Numbers
Borel and Cantelli:
- The ratio k/n tends to p not only in probability, but with probability 1.
- This is the strong law of large numbers (SLLN).
- What is the difference?
- The SLLN states that if {\epsilon_n} is a sequence of positive numbers converging to zero, then
      \sum_{n=1}^{\infty} P\{ |k/n - p| \ge \epsilon_n \} < \infty.
- From the Borel-Cantelli lemma, when this sum is finite the events A_n = \{ |k/n - p| \ge \epsilon_n \} can occur only for a finite number of indices n in an infinite sequence,
- or equivalently, the events \{ |k/n - p| < \epsilon_n \} occur for all but finitely many n, i.e., k/n converges to p almost surely.
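- The following Python sketch (my own illustration, with an arbitrary p, seed and horizon) shows what this means along a single sample path: with ε_n = 1/n^{1/8}, the sequence used in the proof below, the deviation |k/n - p| exceeds ε_n only at finitely many indices n:

    import random

    # Path-wise illustration (p, the seed and the horizon are arbitrary):
    # along a single simulated sequence of Bernoulli(p) trials, |k/n - p| >= eps_n
    # with eps_n = n**(-1/8) happens only for finitely many indices n.
    random.seed(1)
    p, N = 0.5, 200_000
    k, violations, last = 0, 0, None
    for n in range(1, N + 1):
        k += random.random() < p
        if abs(k / n - p) >= n ** (-1 / 8):
            violations += 1
            last = n
    print("indices n with |k/n - p| >= n**(-1/8):", violations, " last such n:", last)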
Proof
- Since
      |k/n - p| \ge \epsilon \iff (k - np)^4 \ge \epsilon^4 n^4,
  we have
      \sum_{k=0}^{n} (k - np)^4 p_n(k) \ge \sum_{|k - np| \ge \epsilon n} (k - np)^4 p_n(k) \ge \epsilon^4 n^4 \, P\{ |k/n - p| \ge \epsilon \},
  and hence
      P\{ |k/n - p| \ge \epsilon \} \le \frac{ \sum_{k=0}^{n} (k - np)^4 p_n(k) }{ \epsilon^4 n^4 },
  where
      p_n(k) = P\{ \sum_{i=1}^{n} X_i = k \} = \binom{n}{k} p^k q^{n-k}.
Proof – continued
- With Y_i = X_i - p (so that E(Y_i) = 0),
      \sum_{k=0}^{n} (k - np)^4 p_n(k) = E\{ ( \sum_{i=1}^{n} X_i - np )^4 \} = E\{ ( \sum_{i=1}^{n} (X_i - p) )^4 \}
          = E\{ ( \sum_{i=1}^{n} Y_i )^4 \} = \sum_{i=1}^{n} \sum_{k=1}^{n} \sum_{j=1}^{n} \sum_{l=1}^{n} E( Y_i Y_k Y_j Y_l ).
- Since E(Y_i) = 0, every term containing an isolated factor vanishes; for each i = 1, ..., n the index i can coincide with j, k or l, and the second variable takes (n - 1) values. Hence
      \sum_{i} \sum_{k} \sum_{j} \sum_{l} E( Y_i Y_k Y_j Y_l )
          = \sum_{i=1}^{n} E( Y_i^4 ) + 4 n (n-1) E( Y_i^3 ) E( Y_j ) + 3 n (n-1) E( Y_i^2 ) E( Y_j^2 )
          = n ( p^3 + q^3 ) pq + 3 n (n-1) (pq)^2   (the middle term is zero since E(Y_j) = 0)
          \le [\, n + 3 n (n-1) \,] pq \le 3 n^2 pq,
- since
      p^3 + q^3 = ( p + q )^3 - 3 p^2 q - 3 p q^2 \le ( p + q )^3 = 1,    and    pq \le 1/4 < 1.
- So we obtain
      P\{ |k/n - p| \ge \epsilon \} \le \frac{3 pq}{n^2 \epsilon^4}.
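- As a sanity check of the moment computation (not in the original lecture; p and n are arbitrary small values), the snippet below enumerates all 2^n outcomes and confirms that E{(k - np)^4} equals n pq(p³ + q³) + 3n(n - 1)(pq)² and stays below 3n²pq:

    from itertools import product

    # Brute-force check of the fourth-moment identity for a small n (p and n arbitrary):
    # E{(k - np)^4} should equal n*pq*(p^3 + q^3) + 3*n*(n-1)*(pq)^2 and be at most 3*n^2*pq.
    p, n = 0.3, 6
    q = 1 - p
    fourth = 0.0
    for outcome in product((0, 1), repeat=n):
        prob = 1.0
        for x in outcome:
            prob *= p if x else q
        fourth += prob * (sum(outcome) - n * p) ** 4
    closed = n * p * q * (p ** 3 + q ** 3) + 3 * n * (n - 1) * (p * q) ** 2
    print(fourth, closed, 3 * n * n * p * q)   # first two agree; both are below the third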
Proof – continued
- Let \epsilon = \epsilon_n = 1/n^{1/8}, so that the above inequality reads
      P\{ |k/n - p| \ge 1/n^{1/8} \} \le \frac{3 pq}{n^{3/2}},
- and hence
      \sum_{n=1}^{\infty} P\{ |k/n - p| \ge 1/n^{1/8} \} \le 3 pq \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}
          \le 3 pq \left( 1 + \int_{1}^{\infty} x^{-3/2} \, dx \right) = 3 pq (1 + 2) = 9 pq < \infty,
- thus proving the strong law by exhibiting a sequence of positive numbers \epsilon_n = 1/n^{1/8} that converges to zero and satisfies
      \sum_{n=1}^{\infty} P\{ |k/n - p| \ge \epsilon_n \} < \infty.
What is the difference?
- The weak law states that for every n that is large enough, the ratio ( \sum_{i=1}^{n} X_i )/n = k/n is likely to be near p, with a probability that tends to 1 as n increases.
- It does not say that k/n is bound to stay near p if the number of trials is increased.
- Suppose P\{ |k/n - p| \ge \epsilon \} \le pq/(n\epsilon^2) is satisfied for a given \epsilon in a certain number of trials n_0.
- If additional trials are conducted beyond n_0, the weak law does not guarantee that the new k/n is bound to stay near p for such trials.
- There can be events for which |k/n - p| > \epsilon, for n > n_0, in some regular manner.
What is the difference?
- The probability of such an event is the sum of a large number of very small probabilities.
- The weak law is unable to say anything specific about the convergence of that sum.
- However, the strong law states not only that all such sums converge, but also that the total number of events for which |k/n - p| > \epsilon is in fact finite, since
      \sum_{n=1}^{\infty} P\{ |k/n - p| \ge \epsilon_n \} < \infty.
Bernstein's inequality
- This implies that the probability P\{ |k/n - p| \ge \epsilon \} becomes and remains small as n increases,
- since with probability 1 only finitely many violations of the above inequality take place as n \to \infty.
- It is possible to arrive at the same conclusion using a powerful bound known as Bernstein's inequality, which is based on the WLLN.
Bernstein's inequality
- Note that
      k/n - p \ge \epsilon \iff k \ge n(p + \epsilon),
  and for any \lambda > 0 this gives e^{\lambda (k - n(p+\epsilon))} \ge 1.
- Thus
      P\{ k/n - p \ge \epsilon \} = \sum_{k \ge n(p+\epsilon)} \binom{n}{k} p^k q^{n-k}
          \le \sum_{k \ge n(p+\epsilon)} e^{\lambda (k - n(p+\epsilon))} \binom{n}{k} p^k q^{n-k}
          \le \sum_{k=0}^{n} e^{\lambda (k - n(p+\epsilon))} \binom{n}{k} p^k q^{n-k}.
Bernstein's inequality
      P\{ k/n - p \ge \epsilon \} \le e^{-\lambda n \epsilon} \sum_{k=0}^{n} \binom{n}{k} ( p e^{\lambda q} )^k ( q e^{-\lambda p} )^{n-k}
          = e^{-\lambda n \epsilon} ( p e^{\lambda q} + q e^{-\lambda p} )^n.
- Since e^{x} \le x + e^{x^2} for any real x,
      p e^{\lambda q} + q e^{-\lambda p} \le p ( \lambda q + e^{\lambda^2 q^2} ) + q ( -\lambda p + e^{\lambda^2 p^2} )
          = p e^{\lambda^2 q^2} + q e^{\lambda^2 p^2} \le e^{\lambda^2}.
- We can obtain
      P\{ k/n - p \ge \epsilon \} \le e^{\lambda^2 n - \lambda n \epsilon}.
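- A numerical spot-check (my own, on an arbitrary grid of values) of the two elementary inequalities used in this derivation:

    import math

    # Spot-check (arbitrary grid) of the inequalities used above:
    # e^x <= x + e^(x^2) for real x, and p*e^(lam*q) + q*e^(-lam*p) <= e^(lam^2).
    for x in (i / 10 for i in range(-50, 51)):
        assert math.exp(x) <= x + math.exp(x * x) + 1e-12
    for p in (0.1, 0.3, 0.5, 0.9):
        q = 1 - p
        for lam in (0.1, 0.5, 1.0, 2.0):
            assert p * math.exp(lam * q) + q * math.exp(-lam * p) <= math.exp(lam ** 2) + 1e-12
    print("both inequalities hold on the test grid")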
Bernstein's inequality
- But \lambda^2 n - \lambda n \epsilon is minimum for \lambda = \epsilon/2, and hence
      P\{ k/n - p \ge \epsilon \} \le e^{-n \epsilon^2 / 4},    \epsilon > 0.
- Similarly,
      P\{ k/n - p \le -\epsilon \} \le e^{-n \epsilon^2 / 4},
  and hence we obtain Bernstein's inequality
      P\{ |k/n - p| \ge \epsilon \} \le 2 e^{-n \epsilon^2 / 4}.
- This is more powerful than Tchebychev's inequality, as it states that the chances of the relative frequency k/n deviating from its probability p by more than \epsilon tend to zero exponentially fast as n \to \infty.
- Chebyshev's inequality gives the probability of k/n lying between p - \epsilon and p + \epsilon for a specific n.
- We can use Bernstein's inequality to estimate the probability of k/n lying between p - \epsilon and p + \epsilon for all large n.
- Towards this, let
      y_n = \{ p - \epsilon < k/n < p + \epsilon \},
  so that
      P( y_n^c ) = P\{ |k/n - p| \ge \epsilon \} \le 2 e^{-n \epsilon^2 / 4}.
- To compute the probability of \bigcap_{n=m}^{\infty} y_n, note that its complement is given by
      ( \bigcap_{n=m}^{\infty} y_n )^c = \bigcup_{n=m}^{\infty} y_n^c.
- We have
      P( \bigcup_{n=m}^{\infty} y_n^c ) \le \sum_{n=m}^{\infty} P( y_n^c ) \le \sum_{n=m}^{\infty} 2 e^{-n\epsilon^2/4} = \frac{2 e^{-m\epsilon^2/4}}{1 - e^{-\epsilon^2/4}}.
- This gives
      P( \bigcap_{n=m}^{\infty} y_n ) = 1 - P( \bigcup_{n=m}^{\infty} y_n^c ) \ge 1 - \frac{2 e^{-m\epsilon^2/4}}{1 - e^{-\epsilon^2/4}} \to 1  as  m \to \infty,
- or,
      P\{ p - \epsilon < k/n < p + \epsilon,  for all  n \ge m \} \to 1  as  m \to \infty.
- Thus k/n is bound to stay near p for all large enough n, in probability, as stated by the SLLN.
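- A Monte Carlo sketch (my own; all parameter values are arbitrary, and a finite N stands in for infinity) of the statement just derived:

    import math, random

    # Estimate P{ p - eps < k/n < p + eps for every n with m <= n <= N } and compare it
    # with the lower bound 1 - 2*e^(-m*eps^2/4) / (1 - e^(-eps^2/4)) derived above.
    random.seed(2)
    p, eps, m, N, runs = 0.5, 0.1, 4000, 8000, 200
    stay = 0
    for _ in range(runs):
        k, ok = 0, True
        for n in range(1, N + 1):
            k += random.random() < p
            if n >= m and abs(k / n - p) >= eps:
                ok = False
                break
        stay += ok
    bound = 1 - 2 * math.exp(-m * eps ** 2 / 4) / (1 - math.exp(-eps ** 2 / 4))
    print("empirical:", stay / runs, "  lower bound:", round(bound, 4))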
Discussion
- Let \epsilon = 0.1. If we toss a fair coin 1,000 times, the weak law
      P\{ |k/n - p| \ge \epsilon \} \le \frac{pq}{n \epsilon^2}
  gives
      P\{ |k/n - 1/2| \ge 0.1 \} \le \frac{(1/2)(1/2)}{1000 \times (0.1)^2} = \frac{1}{40}.
- Thus on the average 39 out of 40 such experiments, each with 1,000 or more trials, will satisfy the inequality \{ |k/n - 1/2| < 0.1 \}.
- It is quite possible that one out of 40 such experiments may not satisfy it.
- Continuing the experiment for 1,000 more trials, with k successes out of n trials for n = 1000 to 2000, it is quite possible that for a few such n the above inequality may be violated.
Discussion - continued
- This is still consistent with the weak law,
- but according to the strong law such violations can occur only a finite number of times, each with a finite probability, in an infinite sequence of trials;
- hence almost always the above inequality will be satisfied, i.e., the sample value of k/n coincides with p as n \to \infty.
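- A quick simulation of this discussion (my own illustration):

    import random

    # 40 independent runs of 1,000 fair-coin tosses, counting how many runs violate
    # |k/n - 1/2| < 0.1. The weak-law bound only promises that, on average, at most
    # 1 run in 40 violates it; the actual violation probability is far smaller,
    # so typically none do.
    random.seed(3)
    violations = 0
    for _ in range(40):
        k = sum(random.random() < 0.5 for _ in range(1000))
        violations += abs(k / 1000 - 0.5) >= 0.1
    print("runs violating |k/n - 1/2| < 0.1:", violations, "out of 40")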
Example
- 2n red cards and 2n black cards (all distinct) are shuffled together to form a single deck, and then split in half.
- What is the probability that each half will contain n red and n black cards?
Solution
- From a deck of 4n cards, 2n cards can be chosen in \binom{4n}{2n} different ways.
- Consider the draws in which each half contains n red and n black cards.
Example – continued
- Among those 2n red cards, n of them can be chosen in \binom{2n}{n} different ways; similarly, for each such draw there are \binom{2n}{n} ways of choosing n black cards.
- Thus the total number of favorable draws containing n red and n black cards in each half is \binom{2n}{n} \binom{2n}{n}, among a total of \binom{4n}{2n} draws.
- This gives the desired probability p_n to be
      p_n = \frac{ \binom{2n}{n} \binom{2n}{n} }{ \binom{4n}{2n} } = \frac{ ( (2n)! )^4 }{ (4n)! \, ( n! )^4 }.
Example – continued
- For large n, using Stirling's formula we get
      p_n \approx \frac{ [ \sqrt{2\pi (2n)} \, (2n)^{2n} e^{-2n} ]^4 }{ \sqrt{2\pi (4n)} \, (4n)^{4n} e^{-4n} \, [ \sqrt{2\pi n} \, n^{n} e^{-n} ]^4 } = \sqrt{ \frac{2}{\pi n} }.
- For a full deck of 52 cards, we have n = 13, which gives p_n \approx 0.221.
- For a partial deck of 20 cards, we have n = 5 and p_n \approx 0.3568.
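- For reference (my own verification), the values 0.221 and 0.3568 above are the Stirling approximations; the exact probabilities \binom{2n}{n}^2 / \binom{4n}{2n} are about 0.218 for n = 13 and 0.3437 for n = 5, as the snippet below shows:

    import math

    # Exact card-splitting probability p_n = C(2n,n)^2 / C(4n,2n) next to the
    # Stirling approximation sqrt(2/(pi*n)).
    for n in (5, 13):
        exact = math.comb(2 * n, n) ** 2 / math.comb(4 * n, 2 * n)
        approx = math.sqrt(2 / (math.pi * n))
        print(f"n={n:2d}  exact={exact:.4f}  Stirling approx={approx:.4f}")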
An Experiment
- 20 cards were given to a 5-year-old child to split into two equal halves;
- the outcome was declared a success if each half contained exactly 5 red and 5 black cards.
- With adult supervision (in terms of shuffling), the experiment was repeated 100 times that very same afternoon. The results are tabulated below.
Expt  Number of    Expt  Number of    Expt  Number of    Expt  Number of    Expt  Number of
      successes          successes          successes          successes          successes
  1       0         21       8         41      14         61      23         81      29
  2       0         22       8         42      14         62      23         82      29
  3       1         23       8         43      14         63      23         83      30
  4       1         24       8         44      14         64      24         84      30
  5       2         25       8         45      15         65      25         85      30
  6       2         26       8         46      16         66      25         86      31
  7       3         27       9         47      17         67      25         87      31
  8       4         28      10         48      17         68      25         88      32
  9       5         29      10         49      17         69      26         89      32
 10       5         30      10         50      18         70      26         90      32
 11       5         31      10         51      19         71      26         91      33
 12       5         32      10         52      20         72      26         92      33
 13       5         33      10         53      20         73      26         93      33
 14       5         34      10         54      21         74      26         94      34
 15       6         35      11         55      21         75      27         95      34
 16       6         36      12         56      22         76      27         96      34
 17       6         37      12         57      22         77      28         97      34
 18       7         38      13         58      22         78      29         98      34
 19       7         39      14         59      22         79      29         99      34
 20       8         40      14         60      22         80      29        100      35
Results of an experiment of 100 trials.
[Figure: the relative frequency of successes k/n plotted against the number of trials n, showing its convergence toward p_n = 0.3437182, the exact probability for n = 5.]
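- A Monte Carlo re-creation of this experiment (my own sketch; the seed and trial count are arbitrary):

    import random

    # Shuffle 10 red and 10 black cards, split the deck in half, and track the
    # relative frequency of "each half holds exactly 5 red and 5 black cards".
    random.seed(4)
    deck = ["R"] * 10 + ["B"] * 10
    successes = 0
    for trial in range(1, 101):
        random.shuffle(deck)
        successes += deck[:10].count("R") == 5
        if trial % 20 == 0:
            print(f"after {trial:3d} trials: {successes:2d} successes, k/n = {successes / trial:.3f}")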