# Lecture 7: Chebyshev`s Inequality, LLN, and the CLT ```Markov’s &amp; Chebyshev’s Inequalities
Markov’s Inequality
For any random variable X ≥ 0 and constant a &gt; 0, then
Lecture 7: Chebyshev’s Inequality, LLN, and the CLT
P(X ≥ a) ≤
E (X )
a
Corollary - Chebyshev’s Inequality:
Sta 111
Colin Rundel
P(|X − E (X )| ≥ a) ≤
May 22, 2014
Var (X )
a2
“The inequality says that the probability that X is far away from its mean
is bounded by a quantity that increases as Var (X ) increases.”
Sta 111 (Colin Rundel)
Markov’s &amp; Chebyshev’s Inequalities
Lecture 7
May 22, 2014
1 / 28
Markov’s &amp; Chebyshev’s Inequalities
Derivation of Markov’s Inequality
Derivation of Chebyshev’s Inequality
Let X be a random variable such that X ≥ 0 then
Proposition - if f (x) is a non-decreasing function then
P(X ≥ a) = P f (X ) ≥ f (a) .
Therefore,
E f (X )
P(X ≥ a) ≤
.
f (a)
If we define the positive valued random variable to be |X − E (X )| and
f (x) = x 2 then
2
P |X − E (X )| ≥ a = P (X − E (X )) ≥ a
2
E (X − E (X ))2
Var (X )
≤
=
a2
a2
Var (X )
P |X − E (X )| ≥ a ≤
a2
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
2 / 28
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
3 / 28
Markov’s &amp; Chebyshev’s Inequalities
Markov’s &amp; Chebyshev’s Inequalities
Using Markov’s and Chebyshev’s Inequalities
Chebyshev’s Inequality - Example
Suppose that it is known that the number of items produced in a factory during a week
is a random variable X with mean 50. (Note that we don’t know anything about the
distribution of the pmf)
Lets use Chebyshev’s inequality to make a statement about the bounds for
the probability of being with in 1, 2, or 3 standard deviations of the mean
for all random variables.
p
If we define a = kσ where σ = Var (X ) then
(a) What can be said about the probability that this week’s production will exceed 75
units?
(b) If the variance of a week’s production is known to equal 25, then what can be said
about the probability that this week’s production will be between 40 and 60 units?
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
4 / 28
P(|X − E (X )| ≥ kσ) ≤
Sta 111 (Colin Rundel)
Markov’s &amp; Chebyshev’s Inequalities
X −&micro;
σ
Lecture 7
May 22, 2014
5 / 28
Law of Large Numbers
Accuracy of Chebyshev’s inequality, cont.
If X ∼ N (&micro;, σ 2 ) then let Z =
Var (X )
1
= 2
2
2
k σ
k
Independent and Identically Distributed (iid)
∼ N (0, 1).
A collection of random variables that share the same probability
distribution and all are mutually independent.
Empirical Rule:
Chebyshev’s inequality:
1 − P(−1 ≤ Z ≤ 1) ≈ 1 − 0.66 = 0.34
1 − P(−1 ≤ U ≤ 1) ≤
1
11
1 − P(−2 ≤ Z ≤ 2) ≈ 1 − 0.96 = 0.04
1 − P(−2 ≤ U ≤ 2) ≤
1
22
= 0.25
1 − P(−3 ≤ Z ≤ 3) ≈ 1 − 0.997 = 0.003
1 − P(−3 ≤ U ≤ 3) ≤
1
32
= 0.11
Example
=1
If X ∼ Binom(n, p) then X =
Pn
i=1 Yi
iid
where Y1 , &middot; &middot; &middot; , Yn ∼ Bern(p)
Why do we care if it is so inaccurate?
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
6 / 28
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
7 / 28
Law of Large Numbers
Law of Large Numbers
Sums of iid Random Variables
Average of iid Random Variables
iid
iid
Let X1 , X2 , &middot; &middot; &middot; , Xn ∼ D where D is some probability distribution with
E (Xi ) = &micro; and Var (Xi ) = σ 2 .
Let X1 , X2 , &middot; &middot; &middot; , Xn ∼ D where D is some probability distribution with
E (Xi ) = &micro; and Var (Xi ) = σ 2 .
We defined Sn = X1 + X2 + &middot; &middot; &middot; + Xn
We defined X n = (X1 + X2 + &middot; &middot; &middot; + Xn )/n = Sn /n then
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
8 / 28
Sta 111 (Colin Rundel)
Law of Large Numbers
Lecture 7
May 22, 2014
9 / 28
Law of Large Numbers
Weak Law of Large Numbers
Law of Large Numbers
Based on these results and Markov’s Inequality we can show the following:
Weak Law of Large Numbers (X̄n converges in probability to &micro;):
lim P(|X̄n − &micro;| &gt; ) = 0
n→∞
Strong Law of Large Numbers (X̄n converges almost surely to &micro;):
P lim X̄n = &micro; = 1
n→∞
Strong LLN is a more powerful result (Strong LLN implies Weak LLN),
but its proof is more complicated.
Therefore, as long as σ 2 &lt; ∞
lim P(|X̄n − &micro;| ≥ ) = 0
n→∞
Sta 111 (Colin Rundel)
⇒
Lecture 7
lim P(|X̄n − &micro;| &lt; ) = 1
n→∞
May 22, 2014
10 / 28
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
11 / 28
Law of Large Numbers
Law of Large Numbers
LLN - Example
LLN and CLT
How large a random sample must be taken from a given distribution in
order for the probability to be at least 0.99 that the sample mean will be
within 2 standard deviations of the mean of the distribution?
Law of large numbers shows us that
Sn − n&micro;
= lim (X̄n − &micro;) → 0
n→∞
n→∞
n
lim
which shows that for large n, n S̄n − n&micro;.
What happens if we divide by something that grows slower than n like
What about 0.95 probability to be within 1 standard deviations of the
mean?
√
n?
√
Sn − n&micro;
d
√
= lim n(X̄n − &micro;) → N(0, σ 2 )
n→∞
n→∞
n
lim
This is the Central Limit Theorem, of which the DeMoivre-Laplace
theorem for the normal approximation to the binomial is a special case.
Hopefully by the end of this class we will have the tools to prove this.
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
12 / 28
Sta 111 (Colin Rundel)
Lecture 7
Law of Large Numbers
May 22, 2014
13 / 28
Law of Large Numbers
Equivalence of de Moivre-Laplace with CLT
Equivalence of de Moivre-Laplace with CLT, cont.
We have already seen that a Binomial random variable is equivalent to the
sum of n iid Bernoulli random variables.
P
iid
Let X ∼ Binom(n,p) where X = ni=1 Yi with Y1 , &middot; &middot; &middot; , Yn ∼ Bern(p) and
E (Yi ) = p, Var (Yi ) = p(1 − p).
The Central Limit theorem gives us,
Sn − n&micro;
√
P c≤
≤ d ≈ Φ(d) − Φ(c)
σ n
P
then for X = Sn = Yi
de Moivre-Laplace tells us:
P(a ≤ X ≤ b) = P
≈Φ
X − np
b − np
a − np
p
≤p
≤p
np(1 − p)
np(1 − p)
np(1 − p)
!
!
b − np
a − np
p
−Φ p
np(1 − p)
np(1 − p)
P(a ≤ X ≤ b) = P
!
=P
≈Φ
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
14 / 28
Sta 111 (Colin Rundel)
a − n&micro;
X − n&micro;
b − n&micro;
√
≤ √
≤ √
nσ
nσ
nσ
a − np
Sn − np
b − np
p
≤p
≤p
np(1 − p)
np(1 − p)
np(1 − p)
!
!
b − np
a − np
p
−Φ p
np(1 − p)
np(1 − p)
Lecture 7
May 22, 2014
!
15 / 28
Law of Large Numbers
Law of Large Numbers
Why do we care? (Statistics)
Astronomy Example
In general we are interested in making statements about how the world
works, we usually do this based on empirical evidence. This tends to
involve observing or measuring something a bunch of times.
An astronomer is interested in measuring, in light years, the distance from his
observatory to a distant star. Although the astronomer has a measuring
technique, he knows that, because of changing atmospheric conditions and
measurement error, each time a measurement is made it will not yield the exact
distance but merely an estimate. As a result the astronomer plans to make a
series of measurements and then use the average value of these measurements as
his estimated value of the actual distance.
LLN tells us that if we average our results we’ll get closer and closer
to the true value as we take more measurements
CLT tells us the distribution of our average is normal if we take
enough measurements, which tells us our uncertainty about the ‘true’
value
Example - Polling: I can’t interview everyone about their opinions but if I
interview enough my estimate should be close and I can use the CLT to
approximate my margin of error.
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
16 / 28
If the astronomer believes that the values of the measurements are independent
and identically distributed random variables having a common mean d (the actual
distance) and a common variance of 4 (light years2 ), how many measurements
should he make to be reasonably sure that his estimated distance is accurate to
within &plusmn;0.5 light years?
Sta 111 (Colin Rundel)
Law of Large Numbers
Lecture 7
May 22, 2014
17 / 28
Law of Large Numbers
Astronomy Example, cont.
Astronomy Example, cont.
We need to decide on a level to account for being “reasonably sure”,
usually taken to be 95% but the choice somewhat arbitrary.
Our previous estimate depends on how long it takes for the distribution of
the average of the measurements to converge to the normal distribution.
CLT only guarantees convergence as n → ∞, but most distributions
converge much more quickly (think about the np ≥ 10, np ≥ 10
requirement for Normal approximation to Binomial).
We can also solve this problem using Chebyshev’s inequality
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
18 / 28
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
19 / 28
Central Limit Theorem
Central Limit Theorem
Moment Generating Function - Properties
Moment Generating Function - Unit Normal
If X and Y are independent random variables then the moment generating
function for the distribution of X + Y is
Let Z ∼ N (0, 1) then
Z ∞
2
1
MZ (t) = E [e tZ ] =
e tx √ e −x /2 dx
2π
−∞
Z ∞
2
x −tx
1
=√
e − 2 dx
2π −∞
Z ∞
(x−t)2
t2
1
=√
e − 2 + 2 dx
2π −∞
Z ∞
(x−t)2
1
t 2 /2
√ e − 2 dx
=e
2π
−∞
MX +Y (t) = E [e t(X +Y ) ] = E [e tX e tY ] = E [e tX ]E [e tY ] = MX (t)MY (t)
Similarly, the moment generating function for Sn , the sum of iid random
variables X1 , X2 , . . . , Xn is
MSn (t) = [MXi (t)]n
Sta 111 (Colin Rundel)
Lecture 7
= et
May 22, 2014
20 / 28
Sta 111 (Colin Rundel)
Central Limit Theorem
Sketch of Proof
Let X1 , . . . , Xn be a sequence of independent and identically distributed
random variables each having mean &micro; and variance σ 2 . Then the
distribution of
Proposition
X1 + &middot; &middot; &middot; + Xn − n&micro;
√
σ n
Sta 111 (Colin Rundel)
May 22, 2014
21 / 28
Let X1 , X2 , . . . be a sequence of independent and identically distributed random
variables and Sn = X1 + &middot; &middot; &middot; + Xn . The distribution of Sn is given by the distribution
function fSn which has a moment generating function MSn with n ≥ 1.
If MSn (t) → MZ (t) for all t, then fSn (t) → fZ (t) for all t at which fZ (t) is
continuous.
That is, for −∞ &lt; a &lt; ∞,
P
Lecture 7
Let Z being a random variable with distribution function fZ and moment generating function MZ .
tends to the unit normal as n → ∞.
X1 + &middot; &middot; &middot; + Xn − n&micro;
√
≤a
σ n
/2
Central Limit Theorem
Central Limit Theorem
2
1
→√
2π
Lecture 7
Z
a
e
−x 2 /2
dx = Φ(a) as n → ∞
−∞
May 22, 2014
22 / 28
We can prove the CLT by letting Z ∼ N (0, 1), MZ (t) = e t
2
showing for any Sn that MSn /√n → e t /2 as n → ∞.
Sta 111 (Colin Rundel)
Lecture 7
2 /2
and then
May 22, 2014
23 / 28
Central Limit Theorem
Central Limit Theorem
Proof of the CLT
Proof of the CLT, cont.
√
The moment generating function of Xi / n is given by
tX
t
i
MXi /√n (t) = E exp √
= MXi √
n
n
Some simplifying assumptions and notation:
E (Xi ) = 0
Var (Xi ) = 1
P
√
√
and this the moment generating function of Sn / n = ni=1 Xi / n is
given by
n
t
MSn /√n (t) = MXi √
n
MXi (t) exists and is finite
LX (t) = log MX (t)
Also, remember L’Hospital’s Rule:
f (x)
f 0 (x)
= lim 0
x→∞ g (x)
x→∞ g (x)
lim
Sta 111 (Colin Rundel)
Lecture 7
Therefore in order to show MSn /√n → MZ (t) we need to show
n
t
2
MXi √
→ e t /2
n
May 22, 2014
24 / 28
Sta 111 (Colin Rundel)
Central Limit Theorem
√
2
[MXi (t/ n)]n → e t /2
√
nLXi (t/ n) → t 2 /2
LXi (t) = log MXi (t)
MX0 (t)
d
i
log MXi (t) =
dt
MXi (t)
MX0 (0)
&micro;
i
=
=0
L0Xi (0) =
MXi (0)
1
L00
Xi (0) =
=
Sta 111 (Colin Rundel)
25 / 28
Proof of the CLT, cont.
LXi (0) = log MXi (0) = log 1 = 0
L00
Xi (t) =
May 22, 2014
Central Limit Theorem
Proof of the CLT, cont.
L0Xi (t) =
Lecture 7
lim
n→∞
0
MXi (t)MX00 (t) − [MX0 (t)]2
d MXi (t)
i
i
=
dt MXi (t)
[MXi (t)]2
MXi (0)MX00 (0) − [MX0 (0)]2
i
i
[MXi (0)]2
E (Xi0 )E (Xi2 ) − E (Xi )2
= E (Xi2 ) − E (Xi )2 = σ 2 = 1
E (Xi0 )2
Lecture 7
May 22, 2014
26 / 28
√
√
L0 (t/ n)(− 21 tn−3/2 )
L(t/ n)
=
lim
n→∞
n−1
−n−2
√
0
L (t/ n)t
= lim
n→∞ 2n−1/2
√
L00 (t/ n)t(− 12 tn−3/2 )
= lim
n→∞
−n−3/2
√ t2
= lim L00 (t/ n)
n→∞
2
t2
=
2
Sta 111 (Colin Rundel)
Lecture 7
by L’Hospital’s rule
by L’Hospital’s rule
May 22, 2014
27 / 28
Central Limit Theorem
Proof of the CLT, Final Comments
The preceding proof assumes that E (Xi ) = 0 and Var (Xi ) = 1.
We can generalize this result to any collection of random variables Yi by
considering the standardized form Yi∗ = (Yi − &micro;)/σ.
√
Y1 + &middot; &middot; &middot; + Yn − n&micro;
Y1 − &micro;
Yn − &micro;
√
n
=
+ &middot;&middot;&middot; +
σ
σ
σ n
√
= (Y1∗ + &middot; &middot; &middot; + Yn∗ ) / n
E (Yi∗ ) = 0
Var (Yi∗ ) = 1
Sta 111 (Colin Rundel)
Lecture 7
May 22, 2014
28 / 28
```