# W1-442 Class Notes

## Week 1: Review of Probability

Key Concepts:

- Sample space and events
- Rules of probability
- Conditional probability and independence
- Computing probabilities
### Probability: Definition and Properties

A probability P assigns a number to each event A in a sample space S such that:

(i) 0 ≤ P(A) ≤ 1
(ii) P(S) = 1
(iii) If A₁, A₂, · · · are mutually exclusive, i.e. Aᵢ ∩ Aⱼ = ∅ for i ≠ j, then
$$P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i)$$
(iv) It is not hard to see that P(∅) = 0.
(v) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

More generally (inclusion–exclusion), if A₁, A₂, · · ·, Aₙ are n events, then
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1}\, P\Big(\bigcap_{i=1}^{n} A_i\Big)$$
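The inclusion–exclusion formula can be checked by brute-force enumeration on a small finite sample space. A minimal sketch; the two-dice sample space and the events A, B, C below are illustrative choices, not from the notes.

```python
from itertools import product
from fractions import Fraction

# Sample space: two fair six-sided dice, each outcome equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Three illustrative events.
A = lambda w: w[0] + w[1] == 7   # sum is 7
B = lambda w: w[0] == w[1]       # doubles
C = lambda w: w[0] <= 2          # first die shows 1 or 2

# Left side: P(A ∪ B ∪ C) computed directly.
lhs = prob(lambda w: A(w) or B(w) or C(w))

# Right side: the inclusion–exclusion expansion for n = 3.
rhs = (prob(A) + prob(B) + prob(C)
       - prob(lambda w: A(w) and B(w))
       - prob(lambda w: A(w) and C(w))
       - prob(lambda w: B(w) and C(w))
       + prob(lambda w: A(w) and B(w) and C(w)))

assert lhs == rhs
print(lhs)  # → 5/9
```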
### Conditional Probability & Independence

The conditional probability of A given B is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Events A and B are independent if
$$P(A \cap B) = P(A)\,P(B)$$

### Discrete Probability

- List of values x₁, x₂, · · ·, xₙ
- Associated probabilities P(x₁), P(x₂), · · ·, P(xₙ) satisfying
  (i) P(xᵢ) ≥ 0
  (ii) $\sum_i P(x_i) = 1$
- Examples: Bernoulli(p), Binomial(n, p), hypergeometric, and Poisson.

### Continuous Probability

A continuous probability distribution is completely defined via the probability density function f. The p.d.f. satisfies:

(i) f ≥ 0
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(iii) Probabilities are calculated by
$$P(a, b) = \int_a^b f(x)\,dx$$
(iv) Note: for a continuous distribution, p(x) = 0 for all x. So P(a, b) = P[a, b].

Examples: normal, uniform, exponential, and Pareto.
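The conditional-probability and independence definitions can be verified by enumeration as well. A sketch on the same illustrative two-dice sample space; the events chosen here happen to be independent.

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0      # first die is even
B = lambda w: w[0] + w[1] == 7   # sum is 7
AB = lambda w: A(w) and B(w)

# Conditional probability: P(B | A) = P(A ∩ B) / P(A)
p_B_given_A = prob(AB) / prob(A)
print(p_B_given_A)  # → 1/6

# Independence: P(A ∩ B) = P(A) P(B)
assert prob(AB) == prob(A) * prob(B)
```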
### Continuous Probability on R²

Given a p.d.f. f(x₁, x₂), for any a < b and c < d,
$$P(a < X_1 < b,\; c < X_2 < d) = \int_c^d \int_a^b f(x_1, x_2)\,dx_1\,dx_2$$

More generally, for densities on Rⁿ: for any A ⊂ Rⁿ,
$$P(A) = \int \cdots \int_A f(x_1, x_2, \cdots, x_n)\,dx_1\,dx_2 \cdots dx_n$$

### Random Variables

- A real-valued random variable X is a function from a probability space (Ω, P) to R:
  $$X : \Omega \mapsto \mathbb{R}$$
- The distribution of X is the probability on R given by
  $$P(X \in A) = P(X^{-1}(A)) = P(\{\omega : X(\omega) \in A\})$$
- Typically, all that matters is the distribution of X. The underlying sample space is not very relevant.
### Cumulative Distribution Function

- The function F : R ↦ [0, 1] defined by F(t) = P(X ≤ t) is called the (cumulative) distribution function of X.
- F is increasing in t, right continuous, and
  $$\lim_{t \to -\infty} F(t) = 0, \qquad \lim_{t \to \infty} F(t) = 1$$
- If P(X = x) > 0, then F has a jump at x with jump size equal to p(x).
- If P is continuous, then F is continuous.
- If f is the p.d.f., then
  $$F(t) = \int_{-\infty}^{t} f(x)\,dx \quad \text{and} \quad f(x) = \left.\frac{dF(t)}{dt}\right|_{x}$$

### Moment Generating Function

- The function M_X(t) := E(e^{tX}) is called the moment generating function (MGF) of X.
- If the MGF exists in an interval containing 0, it uniquely determines the distribution, and
  $$E(X) = \left.\frac{dM_X(t)}{dt}\right|_{t=0}$$
- The MGF may not always exist.
- E(e^{itX}) is called the characteristic function. It always exists and has nice properties.
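The moment identity E(X) = M′(0) (and likewise E(X²) = M″(0)) can be checked numerically against a closed-form MGF. A sketch assuming a Bernoulli(p) variable, whose MGF is M(t) = 1 − p + pe^t; the value p = 0.3 and the step size h are arbitrary choices.

```python
import math

# MGF of a Bernoulli(p) random variable: M(t) = 1 - p + p e^t.
p = 0.3  # illustrative choice
M = lambda t: 1 - p + p * math.exp(t)

h = 1e-5  # finite-difference step

# E(X) = M'(0), approximated by a central difference; equals p for Bernoulli.
mean_from_mgf = (M(h) - M(-h)) / (2 * h)
assert abs(mean_from_mgf - p) < 1e-8

# E(X^2) = M''(0); for Bernoulli, E(X^2) = p as well.
second_moment = (M(h) - 2 * M(0) + M(-h)) / h**2
assert abs(second_moment - p) < 1e-4
```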
### Joint Distribution: Discrete

If (X, Y) are two discrete random variables, their joint probabilities are described by the joint probability distribution
$$P(x_i, y_j) = P(X = x_i, Y = y_j)$$
for all xᵢ, yⱼ, with P(xᵢ, yⱼ) ≥ 0 and $\sum_{i,j} P(x_i, y_j) = 1$.

$P_X(x_i) = \sum_j P(x_i, y_j)$ is called the marginal distribution of X.

### Joint Distribution: Continuous

The joint density of (X, Y) is given by the joint p.d.f. f(x, y):
$$P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$$

The marginal density is given by
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

The conditional density of Y given X = x is
$$f(y \mid x) = \frac{f(x, y)}{f_X(x)}$$

### Independence

- X and Y are independent if and only if f(x, y) = f_X(x) f_Y(y) for all x, y.
- If X and Y are independent, then ρ(X, Y) = 0. The converse is not true.
- If X and Y are independent, then V(X + Y) = V(X) + V(Y).
- If X₁ and X₂ are independent, then M_{X₁+X₂}(t) = M_{X₁}(t) × M_{X₂}(t).

### Inequalities

- Markov's inequality: if X ≥ 0, then
  $$P(X \geq a) \leq \frac{E(X)}{a}$$
- Chebyshev's inequality:
  $$P(|X - E(X)| > a) \leq \frac{V(X)}{a^2}$$
- Hoeffding's inequality: if X₁, X₂, · · ·, Xₙ are i.i.d. Ber(p), then
  $$P(|\bar{X}_n - p| \geq a) \leq 2e^{-2na^2}$$
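Both tail bounds can be checked empirically against the actual exceedance frequency of a Bernoulli sample mean. A sketch; p, n, a, the seed, and the trial count are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Empirical check of Chebyshev and Hoeffding for X̄_n with X_i ~ Ber(p);
# p, n, a, and the trial count are illustrative choices.
p, n, a, trials = 0.5, 100, 0.1, 20_000

def sample_mean():
    return sum(random.random() < p for _ in range(n)) / n

# Fraction of trials where |X̄_n - p| > a.
exceed = sum(abs(sample_mean() - p) > a for _ in range(trials)) / trials

chebyshev = (p * (1 - p) / n) / a**2     # V(X̄_n)/a² = 0.25
hoeffding = 2 * math.exp(-2 * n * a**2)  # 2e^{-2na²} ≈ 0.27

# The true exceedance probability (and its estimate) sits below both bounds.
assert exceed <= chebyshev
assert exceed <= hoeffding
```

Both bounds are loose here: the true probability is roughly 0.04, an order of magnitude below either bound, which is typical for these distribution-free inequalities.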
## Week 1: Limit Theorems

Key Concepts:

- Notions of convergence
- Law of Large Numbers (LLN)
- Central Limit Theorem (CLT)
### Notions of Convergence

Let (Ω, P) be a probability space. For each ω ∈ Ω, we define a sequence of random variables X₁(ω), X₂(ω), · · ·

- We know what it means to say a sequence of numbers aₙ converges to a.
- X₁(ω), X₂(ω), · · · are functions. A natural definition would be: Xₙ → X if Xₙ(ω) → X(ω) for all ω.
- Note that the underlying probability plays no role here. We need a notion of convergence that uses P.

### Convergence in Probability

We say that Xₙ → X in probability if, for every positive ε,
$$P(|X_n - X| > \varepsilon) \to 0 \quad \text{as } n \to \infty,$$
or equivalently,
$$P(|X_n - X| \leq \varepsilon) \to 1 \quad \text{as } n \to \infty.$$

This means that by taking n sufficiently large, one can achieve arbitrarily high probability that Xₙ is arbitrarily close to X.
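Convergence in probability can be illustrated by simulation. A sketch using the sequence Xₙ = X + Uₙ/n with Uₙ ∼ U(−1, 1), so that |Xₙ − X| = |Uₙ|/n; the choice of ε, the seed, and the n values are illustrative assumptions, not from the notes.

```python
import random

random.seed(1)

# X_n = X + U_n/n with U_n ~ U(-1, 1), so |X_n - X| = |U_n|/n.
eps, trials = 0.05, 10_000

def tail_prob(n):
    """Estimate P(|X_n - X| > eps) = P(|U_n|/n > eps) by simulation."""
    return sum(abs(random.uniform(-1, 1)) / n > eps for _ in range(trials)) / trials

# The tail probability shrinks to 0 as n grows.
probs = [tail_prob(n) for n in (1, 5, 20, 100)]
print(probs)  # decreasing toward 0
```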
### Convergence in Distribution

- Xₙ → X in distribution if F_{Xₙ}(t) → F_X(t) for all t such that F_X is continuous at t.
- The restriction to points of continuity makes this definition look artificial (we will discuss this in a moment).
- Note: if the limit X has a continuous distribution, then F_X is continuous and we have F_{Xₙ}(t) → F_X(t) for all t.

Now, suppose Yₙ ∼ N(0, σₙ²) with σₙ → 0. Then
$$P(-a < Y_n < a) = P\left(\frac{-a}{\sigma_n} < Z < \frac{a}{\sigma_n}\right) \to 1$$
(why? because a/σₙ → ∞), so we would like to say that the distribution of Yₙ converges to the probability concentrated at 0.

- But Fₙ(0) = 0.5 for all n and F(0) = 1, so Fₙ(0) does not converge to F(0).
- F is not continuous at 0. At all other t, Fₙ(t) → F(t).
- In the topics we cover, F will typically be a normal distribution, so we do not have to worry about discontinuity points: there are none.

A useful tool for showing convergence in distribution is the following: if M_{Xₙ}(t) → M_X(t), then Xₙ converges to X in distribution.

### Example: Poisson Approximation to Binomial

If npₙ → λ, then Bin(n, pₙ) converges to Poisson(λ).

- MGF of Bin(n, pₙ): Mₙ(t) = (1 − pₙ + pₙe^t)ⁿ
- Write pₙ = λ/n, so that
  $$M_n(t) = \left(1 - \frac{\lambda}{n}(1 - e^t)\right)^n$$
- Since (1 − x/n)ⁿ → e^{−x}, we get Mₙ(t) → exp(λ(e^t − 1)), the MGF of Poisson(λ).
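The Poisson approximation can be seen directly at the level of the pmfs. A sketch comparing Bin(n, λ/n) to Poisson(λ) term by term; λ = 3 and n = 2000 are illustrative choices.

```python
import math

# Compare the Bin(n, λ/n) pmf to the Poisson(λ) pmf.
lam, n = 3.0, 2000
p = lam / n

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Largest pointwise gap over the region carrying essentially all the mass.
max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(20))
print(f"max pmf difference for k < 20: {max_gap:.2e}")
assert max_gap < 1e-3
```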
### Law of Large Numbers

**Theorem (WLLN).** Let X₁, X₂, · · ·, Xₙ, · · · be a sequence of independent random variables with E(Xᵢ) = μ and Var(Xᵢ) = σ². Let $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$. Then, for any ε > 0,
$$P(|\bar{X}_n - \mu| > \varepsilon) \to 0 \quad \text{as } n \to \infty$$

**Proof.** We first find E(X̄ₙ) and Var(X̄ₙ):
$$E(\bar{X}_n) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$
Since the Xᵢ's are independent,
$$\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n}$$
The desired result now follows immediately from Chebyshev's inequality, which gives
$$P(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty. \qquad \square$$
### Problems

1. Let X₁, X₂, · · · be a sequence of independent random variables with E(Xᵢ) = μ and Var(Xᵢ) = σᵢ². Show that if $n^{-2}\sum_{i=1}^{n} \sigma_i^2 \to 0$, then X̄ₙ → μ in probability.
2. Let Xᵢ be as in Problem 1 but with E(Xᵢ) = μᵢ and $n^{-1}\sum_{i=1}^{n} \mu_i \to \mu$. Show that X̄ₙ → μ in probability.

**Solution (Problem 1).** X₁, X₂, · · · is a sequence of independent random variables with E(Xᵢ) = μ and Var(Xᵢ) = σᵢ². If $n^{-2}\sum \sigma_i^2 \to 0$, show that the WLLN holds.

**Proof.** We first find E(X̄ₙ) and Var(X̄ₙ):
$$E(\bar{X}_n) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$
Since the Xᵢ's are independent,
$$\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma_i^2$$
By Chebyshev's inequality,
$$P(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{1}{\varepsilon^2} \cdot \frac{\sum_{i=1}^{n} \sigma_i^2}{n^2}$$
The last term goes to zero by assumption. ∎
### Example: Monte Carlo Integration

- Suppose we want to evaluate $\int_0^1 f(x)\,dx$, which is difficult to evaluate analytically.
- Note: $\int_0^1 f(x)\,dx = E(f(X))$ with respect to the uniform distribution on (0, 1).
- Simulate x₁, x₂, · · ·, xₙ (large n) from the U(0, 1) distribution.
- By the WLLN,
  $$\frac{1}{n}\sum_{i=1}^{n} f(x_i) \approx \int_0^1 f(x)\,dx$$
- This is called Monte Carlo integration.

### Homework 1, Problems 19 & 20

- Find a Monte Carlo approximation to $\int_0^1 \cos(2\pi x)\,dx$.
- Find an estimate of the standard deviation of the approximation.
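The recipe above can be sketched in a few lines for the homework integral (whose true value is 0, since cosine integrates to zero over a full period); the seed and sample size n are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)

# Monte Carlo approximation of ∫₀¹ cos(2πx) dx (true value 0).
n = 100_000
values = [math.cos(2 * math.pi * random.random()) for _ in range(n)]

estimate = statistics.fmean(values)
# The standard deviation of the estimator is sd(f(X))/√n,
# estimated here by the sample standard deviation of the f(x_i).
sd_estimate = statistics.stdev(values) / math.sqrt(n)

print(f"estimate = {estimate:.5f} ± {sd_estimate:.5f}")
```

Since sd(cos(2πU)) = 1/√2 ≈ 0.707, the estimator's standard deviation is about 0.707/√n ≈ 0.0022 here, and the estimate lands within a few multiples of that of 0.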
### Central Limit Theorem (CLT)

**Theorem (CLT).** Let X₁, X₂, · · · be a sequence of independent random variables with E(Xᵢ) = 0 and Var(Xᵢ) = σ², with common distribution F (that is, X₁, X₂, · · · i.i.d. ∼ F). Assume that F has MGF M(t) defined in an interval around 0. Let $S_n = \sum_{i=1}^{n} X_i$. Then for all −∞ < x < ∞,
$$P\left(\frac{S_n}{\sigma\sqrt{n}} \leq x\right) \to \Phi(x) \quad \text{as } n \to \infty,$$
where Φ(x) = P(Z ≤ x) is the CDF of the standard normal.

### Extensions of the CLT

The central limit theorem can be proved in greater generality.

- Note: sd(Sₙ) = √n σ, so Sₙ/(σ√n) has mean 0 and s.d. 1.
- Dividing the numerator and denominator of Sₙ/(σ√n) by n, we get
  $$P\left(\frac{\sqrt{n}\,\bar{X}_n}{\sigma} \leq x\right) \to \Phi(x) \quad \text{as } n \to \infty$$
- If E(Xᵢ) = μ, we can apply the CLT to Xᵢ − μ (which has expected value 0), and so in this case
  $$P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \leq x\right) \to \Phi(x) \quad \text{as } n \to \infty$$
- Typically we use the CLT to get an approximation of $P\left(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \leq x\right)$.

How good is the approximation?

- If F is symmetric and has tails that die rapidly, the approximation is good.
- When F is highly skewed or has tails that go to 0 very slowly, we need a large n to get a good approximation.
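The CLT statement can be checked by simulation: standardized means of a skewed distribution still look standard normal for moderate n. A sketch using Exp(1), for which μ = σ = 1; the sample size, seed, and trial count are illustrative choices.

```python
import math
import random

random.seed(0)

# Standardized means of Exp(1) samples (μ = σ = 1) should be ≈ N(0, 1).
n, trials = 200, 5000

def standardized_mean():
    xs = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - 1.0) / 1.0  # √n (X̄ - μ)/σ

zs = [standardized_mean() for _ in range(trials)]

# Compare an empirical probability with Φ: P(Z ≤ 1) ≈ 0.8413.
empirical = sum(z <= 1.0 for z in zs) / trials
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
assert abs(empirical - phi_1) < 0.03
```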
### Proof of the CLT

**Proof.** Let M(t) = E(e^{tX}), so that M_{Sₙ}(t) = E(e^{tSₙ}) = [M(t)]ⁿ. Let
$$Z_n = \frac{S_n}{\sigma\sqrt{n}}, \qquad M_{Z_n}(t) = E\left(e^{\,t S_n / \sigma\sqrt{n}}\right) = M_{S_n}\!\left(\frac{t}{\sigma\sqrt{n}}\right)$$
Thus,
$$M_{Z_n}(t) = \left[M\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$$
We want to show that, as n → ∞, this goes to $e^{t^2/2}$, the MGF of the standard normal. We will make use of the following result:
$$\left(1 + \frac{b + a_n}{n}\right)^n \to e^b \quad \text{as } a_n \to 0$$
So we need to express M_{Zₙ}(t) in this form. Note M(0) = 1; since E(X) = 0 we have M′(0) = 0, and E(X²) = σ² gives M″(0) = σ².

By Taylor expansion,
$$M(s) = M(0) + sM'(0) + \frac{s^2}{2}M''(0) + \frac{s^3}{6}M'''(0) = 1 + \frac{s^2}{2}\sigma^2 + \frac{s^3}{6}M'''(0)$$
With $s = \frac{t}{\sigma\sqrt{n}}$,
$$M\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \epsilon_n, \qquad \text{where } \epsilon_n = \frac{t^3}{6\,\sigma^3 n^{3/2}}\, M'''(0)$$
So $M\left(\frac{t}{\sigma\sqrt{n}}\right)$ can be written as
$$1 + \frac{t^2/2 + a_n}{n}, \qquad \text{with } a_n = n\,\epsilon_n \to 0.$$
We then have
$$M_{Z_n}(t) = \left[1 + \frac{t^2/2 + a_n}{n}\right]^n \to e^{t^2/2}$$
as required. ∎
### Problems

17. Suppose that a measurement has mean μ and variance σ² = 25. Let X̄ be the average of n such independent measurements. How large should n be so that P{|X̄ − μ| < 1} = .95?

By the CLT,
$$P\{|\bar{X} - \mu| < 1\} = P\left\{\frac{\sqrt{n}\,|\bar{X} - \mu|}{5} < \frac{\sqrt{n}}{5}\right\} \approx P\left\{|Z| < \frac{\sqrt{n}}{5}\right\} = .95$$
But we also know that P{|Z| < 1.96} = .95, so
$$\frac{\sqrt{n}}{5} = 1.96, \qquad n = (1.96)^2 \times 5^2 \approx 96.04,$$
so n = 97 measurements suffice.
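The arithmetic in Problem 17 is a one-liner; rounding up to the next integer gives the required sample size.

```python
import math

# Sample size for P{|X̄ - μ| < 1} ≈ .95 with σ = 5.
z = 1.96      # 97.5th percentile of the standard normal
sigma = 5.0
n = (z * sigma) ** 2
print(n)      # ≈ 96.04
n_required = math.ceil(n)
assert n_required == 97
```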
### Problem: Package Weights

Let Xᵢ be the weight of the i-th package, with E(Xᵢ) = 15 and σ = 10, and let the total weight be $T = \sum_{i=1}^{100} X_i$. Then E(T) = 1500 and sd(T) = 10 × 10 = 100, so
$$P(T > 1700) = P\left(\frac{T - 1500}{10 \times 10} > \frac{1700 - 1500}{10 \times 10}\right) \approx P(Z > 2)$$
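The normal tail probability P(Z > 2) can be evaluated with the complementary error function, since P(Z > x) = ½ erfc(x/√2).

```python
import math

# P(Z > 2) for standard normal Z, via the complementary error function.
p = 0.5 * math.erfc(2 / math.sqrt(2))
print(f"P(Z > 2) ≈ {p:.4f}")  # → P(Z > 2) ≈ 0.0228
```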
### Problem: Maximum of Uniforms

Let X₁, X₂, · · ·, Xₙ ∼ U(0, 1) and M = max(X₁, X₂, · · ·, Xₙ). Then
$$P\{1 - M < t\} = P\{M > 1 - t\} = 1 - (1 - t)^n$$
So
$$P\left\{1 - M < \frac{t}{n}\right\} = P\{n(1 - M) < t\} = 1 - \left(1 - \frac{t}{n}\right)^n$$
As n → ∞,
$$1 - \left(1 - \frac{t}{n}\right)^n \to 1 - e^{-t},$$
so n(1 − M) → Exp(1) in distribution.
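This limit is easy to see by simulation: the empirical CDF of n(1 − M) matches the Exp(1) CDF. A sketch; n, the evaluation point t, the seed, and the trial count are illustrative choices.

```python
import math
import random

random.seed(7)

# Simulate n(1 - M), M = max of n U(0,1) draws, and compare its CDF
# at t = 1 with the Exp(1) CDF.
n, trials = 500, 10_000

def scaled_gap():
    m = max(random.random() for _ in range(n))
    return n * (1 - m)

t = 1.0
empirical = sum(scaled_gap() < t for _ in range(trials)) / trials
exact = 1 - math.exp(-t)  # Exp(1) CDF at t=1, ≈ 0.6321
assert abs(empirical - exact) < 0.02
```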