# Probability Theory

September 27, 2012

## 1 Basic Probability
Definition: Algebra
If S is a set then a collection A of subsets of S is an algebra if
• S ∈ A
• A ∈ A =⇒ A^c ∈ A
• {A_i}_{i=1}^n ⊆ A =⇒ ⋃_{i=1}^n A_i ∈ A
Definition: σ-Algebra
If S is a set then a collection F of subsets of S is a σ-algebra if
• S ∈ F
• A ∈ F =⇒ A^c ∈ F
• {A_i}_{i=1}^∞ ⊆ F =⇒ ⋃_{i=1}^∞ A_i ∈ F
Lemma:
The following are basic properties of σ-algebras:
• ∅ ∈ F
• {A_i}_{i=1}^∞ ⊆ F =⇒ ⋂_{i=1}^∞ A_i ∈ F
Lemma:
If S is a non-empty set and G ⊆ P(S)
then ∃F := σ(G), the smallest σ-algebra containing G
Proof:
We know that P(S) is a σ-algebra containing G
Define Γ := {A : A is a σ-algebra on S with G ⊆ A}
Claim: F = ⋂_{A∈Γ} A
If this is a σ-algebra it is clearly the smallest one containing G by definition of intersection.
• S ∈ A ∀A ∈ Γ since these are σ-algebras
hence S ∈ ⋂_{A∈Γ} A
• Let A ∈ F; then A ∈ A ∀A ∈ Γ
hence A^c ∈ A ∀A ∈ Γ
hence A^c ∈ F
• Let {A_i}_{i=1}^∞ ⊆ F; then {A_i}_{i=1}^∞ ⊆ A ∀A ∈ Γ
hence ⋃_{i=1}^∞ A_i ∈ A ∀A ∈ Γ
hence ⋃_{i=1}^∞ A_i ∈ F
Remark:
From the above lemma we say that F is the σ-algebra generated by G and that G is the generator of F
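On a finite base set an algebra is automatically a σ-algebra, so σ(G) can be computed directly by closing G under complements and unions. A small illustrative sketch; the function name and example sets are our own:

```python
from itertools import combinations

def generate_sigma_algebra(S, G):
    """Close G under complement and pairwise union; on a finite S this yields sigma(G)."""
    family = {frozenset(S)} | {frozenset(A) for A in G}
    changed = True
    while changed:
        changed = False
        for A in list(family):
            comp = frozenset(S) - A          # closure under complement
            if comp not in family:
                family.add(comp)
                changed = True
        for A, B in combinations(list(family), 2):
            if A | B not in family:          # closure under union
                family.add(A | B)
                changed = True
    return family

S = {1, 2, 3, 4}
F = generate_sigma_algebra(S, [{1}, {2}])
# sigma({{1},{2}}) consists of all unions of the blocks {1}, {2}, {3,4}: 8 sets
```

The closure stabilises because P(S) is finite; the generator G here plays exactly the role of G in the remark above.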
Definition: Borel σ-Algebra
Let S = R^n, G := {∏_{i=1}^n [a_i, b_i] : a_i ≤ b_i, a_i, b_i ∈ Q}
Then the Borel σ-algebra is B(R^n) = σ(G)
Remark:
If S is a topological space then B(S) is the σ-algebra generated by open sets of S
Definition: Measurable Space
(S, F) is a measurable space where
• S is a set
• F is a σ-algebra
Definition: Measurable Set
A is measurable if A ∈ F where F is a σ-algebra
Definition: Additive
If S is a set and A is an algebra then
µ : A → [0, ∞]
is additive if F, G ∈ A, F ∩ G = ∅ =⇒ µ(F ∪ G) = µ(F) + µ(G)
Definition: σ-Additive
If S is a set and A is an algebra then
µ : A → [0, ∞]
is σ-additive if {A_i}_{i=1}^∞ ⊆ A with ⋃_{i=1}^∞ A_i ∈ A and A_i ∩ A_j = ∅ ∀i ≠ j =⇒ µ(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)
Definition: Measure
If (S, F) is a measurable space then µ : F → [0, ∞] is a measure if µ(∅) = 0 and µ is σ-additive
Definition: Finite Measure
A measure µ on measurable space (S, F) is finite if µ(S) < ∞
Definition: σ-Finite Measure
A measure µ on measurable space (S, F) is σ-finite if
∃{A_i}_{i=1}^∞ ⊆ F s.t.
• µ(A_i) < ∞ ∀i
• ⋃_{i=1}^∞ A_i = S
Definition: Probability Measure
A measure µ on S is a probability measure if µ(S) = 1
Definition: Measure Space
A measure space is (S, F, µ) where S is a set, F is a σ-algebra on S and µ is a measure on (S, F)
Definition: Probability Space
A measure space (S, F, µ) is a probability space if µ is a probability measure
Definition: π-System
A family Y of subsets of set S is a π-system if Y is closed under intersection
i.e. A, B ∈ Y =⇒ A ∩ B ∈ Y
Definition: Dynkin System
If S is a set and D is a set of subsets of S then D is a Dynkin system of S if
• S ∈ D
• A, B ∈ D, A ⊆ B =⇒ B \ A ∈ D
• {A_i}_{i=1}^∞ ⊆ D, A_i ∩ A_j = ∅ ∀i ≠ j =⇒ ⋃_{i=1}^∞ A_i ∈ D
Remark:
We denote by d(G) the smallest Dynkin system containing G, which exists by the same intersection argument as for σ(G)
Lemma:
If G is a π-system then σ(G) = d(G)
Proof:
Clearly every σ-algebra is also a Dynkin system hence d(G) ⊆ σ(G) is trivial
hence it remains to show that d(G) is a σ-algebra
• Clearly S ∈ d(G) by definition
• Suppose A ∈ d(G); then B \ A ∈ d(G) ∀B ∈ d(G) with A ⊆ B
hence A^c = S \ A ∈ d(G)
• Want to show that d(G) is a π-system
define D_1 := {A ⊆ S : A ∩ B ∈ d(G) ∀B ∈ G}
This is a Dynkin system and G ⊆ D_1 (since G is a π-system)
hence d(G) ⊆ D_1
define D_2 := {A ⊆ S : A ∩ B ∈ d(G) ∀B ∈ d(G)} ⊆ d(G)
This is a Dynkin system and G ⊆ D_2 (since d(G) ⊆ D_1)
hence d(G) ⊆ D_2
hence d(G) is closed under intersection, i.e. is a π-system
• Let {A_i}_{i=1}^∞ ⊆ d(G); we need to show that ⋃_{i=1}^∞ A_i ∈ d(G)
define B_i := A_i \ ⋃_{j=1}^{i−1} A_j ∈ d(G) ∀i
⋃_{i=1}^∞ A_i = ⋃_{i=1}^∞ B_i ∈ d(G)
since this is a disjoint union
hence indeed d(G) is a σ-algebra
Theorem:
Let (S, F, µ) be a probability space and suppose F = σ(G)
for some π-system G ⊆ P(S)
If ν is another probability measure on (S, F) s.t.
µ(A) = ν(A) ∀A ∈ G
then µ = ν
Proof:
write
D = {A ∈ F : µ(A) = ν(A)}
This is a Dynkin system with G ⊆ D hence
d(G) ⊆ D
hence by the previous lemma F = σ(G) = d(G) ⊆ D ⊆ F, so D = F
Definition: Content
µ is a content on (S, A) if
• µ(∅) = 0
• µ(A ∪ B) = µ(A) + µ(B) ∀A, B ∈ A : A ∩ B = ∅
Theorem: Carathéodory's Theorem
If µ is a σ-additive content on (S, A) where A is an algebra on S then µ extends to a measure on (S, σ(A))
Lemma: Cantor
If {K_n}_{n=1}^∞ are compact sets in a metric space s.t.
K_n ≠ ∅ ∀n and
K_{n+1} ⊆ K_n ∀n
then
⋂_{n=1}^∞ K_n ≠ ∅
Proof:
Choose x_n ∈ K_n for each n ∈ N, possible since K_n ≠ ∅
Notice that {x_n}_{n=r}^∞ ⊆ K_r since the K_n are decreasing
By compactness ∃{x_{n_k}}_{k=1}^∞ and x_0 ∈ K_1 s.t.
lim_{k→∞} x_{n_k} = x_0
Since {x_{n_k}}_{k=r}^∞ ⊆ K_r and K_r is closed we get x_0 ∈ K_r ∀r
hence x_0 ∈ ⋂_{n=1}^∞ K_n
Lemma:
Suppose {A_n}_{n=1}^∞ ⊆ A and µ is a σ-additive content s.t.
• A = a({Int(a, b) : −∞ ≤ a ≤ b ≤ ∞}), the algebra generated by the intervals
Int(a, b) = (a, b] if b < ∞, and (a, ∞) if b = ∞
• A_{n+1} ⊆ A_n ∀n
• lim_{n→∞} A_n = ⋂_{n=1}^∞ A_n = ∅
• µ(A_n) < ∞ ∀n
then
lim_{n→∞} µ(A_n) = 0
Proof:
By contradiction assume that lim_{n→∞} µ(A_n) = δ > 0
then µ(A_n) ≥ δ ∀n ∈ N
We can choose sets {F_n}_{n=1}^∞ ⊆ A s.t.
• the closure F̄_n ⊆ A_n for each n
• µ(A_n) − µ(F_n) ≤ 2^{−n}δ
• F_{n+1} ⊆ F_n
then we have that
µ(F_n) ≥ µ(A_n) − δ2^{−n} ≥ δ − δ2^{−n} ≥ δ/2
hence F_n ≠ ∅ and moreover F̄_n ≠ ∅
Each F̄_n is compact so by the previous lemma
⋂_{n=1}^∞ F̄_n ≠ ∅
but
⋂_{n=1}^∞ F̄_n ⊆ ⋂_{n=1}^∞ A_n = lim_{n→∞} A_n = ∅
a contradiction.
Proposition:
The Lebesgue measure L on (R, A) where
A = a({Int(a, b) : −∞ ≤ a ≤ b ≤ ∞})
extends to a measure on (R, σ(A)) = (R, B(R))
Proof:
By Caratheodory’s theorem it suffices to show that L is σ-additive
i.e. ∀{A_i}_{i=1}^∞ ⊆ A pairwise disjoint s.t. ⋃_{i=1}^∞ A_i ∈ A we have that
L(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ L(A_i)
For A_finite := {A ∈ A : L(A) < ∞}
suppose {A_i}_{i=1}^∞ ⊆ A_finite are pairwise disjoint s.t. A = ⋃_{i=1}^∞ A_i ∈ A_finite
Define B_n := ⋃_{i=1}^n A_i
L(A) = L(A \ B_n) + Σ_{i=1}^n L(A_i)
by the previous lemma lim_{n→∞} L(A \ B_n) = 0
hence
L(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ L(A_i)
so it remains to show this holds for A ∈ A \ A_finite
suppose A = (−∞, a]
let {A_i}_{i=1}^∞ ⊆ A be pairwise disjoint s.t. ⋃_{i=1}^∞ A_i = A
Since the {A_i}_{i=1}^∞ are pairwise disjoint, ∃A_k ∈ {A_i}_{i=1}^∞ s.t. A_k = (−∞, a_k] for some a_k
L is non-negative since it is a content, hence
∞ = L(A_k) ≤ Σ_{i=1}^∞ L(A_i)
∞ = L(A_k) ≤ L(⋃_{i=1}^∞ A_i)
so both sides are infinite and indeed the proposition holds.
Definition: Measurable Function
If (S, F), (S 0 , F 0 ) are measurable spaces then
f : S → S 0 is (S, S 0 )-measurable if
∀A0 ∈ F 0 f −1 (A0 ) ∈ F
Lemma:
If (S, F), (S 0 , F 0 ) are measurable spaces and
F 0 = σ(G), then it is sufficient that
f −1 (A) ∈ F ∀A ∈ G
for f to be measurable.
Corollary:
Let (S, F) be a measurable space and f : S → R
then if {f ≤ a} ∈ F ∀a ∈ R then f is (S, R)-measurable.
Proof:
By the previous lemma take G = {(−∞, a] : a ∈ R} which generates B(R)
Lemma:
If h_1, h_2 and {h_i}_{i=1}^∞ are measurable then
• h_1 h_2
• αh_1 + βh_2
• h_1 ◦ h_2
• inf_i h_i
• liminf_i h_i
• limsup_i h_i
are measurable when well defined.
Corollary:
Let S be a topological space with σ-algebra B(S)
If f : S → R is continuous then it is Borel measurable.
Proof:
Open sets generate B(R)
Since the preimage of open sets by a continuous function are open we have that ∀A open f −1 (A) is open hence is
in the σ-algebra generated by open sets in S
Definition: Random Variable
If (Ω, F, P) is a probability space and (S, F 0 ) is measurable then
X : Ω → S is a random variable if it is a measurable mapping
i.e.
∀A0 ∈ F 0 X −1 (A0 ) ∈ F
Definition: Push Forward Measure
If X : Ω → S is a random variable on (Ω, F, P) then the distribution of X is the probability measure
PX (A) = P({X ∈ A})
∀A ∈ F 0
Definition: Cumulative Distribution Function
Define
FX (x) = PX ((−∞, x]) = P({X ≤ x})
Lemma:
Suppose F = FX for some random variable X then
• F : R → [0, 1] is non decreasing
• F is right-continuous
• limx→−∞ F (x) = 0
• limx→∞ F (x) = 1
Lemma:
If F : R → [0, 1] s.t.
• F : R → [0, 1] is non decreasing
• F is right-continuous
• limx→−∞ F (x) = 0
• limx→∞ F (x) = 1
then ∃(Ω, F, P) and random variable X s.t. FX = F
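The standard construction behind this lemma takes Ω = (0, 1) with Lebesgue measure and X(ω) = inf{x : F(x) ≥ ω} (inverse transform sampling). A hedged numerical sketch with the Exp(1) distribution function; `generalized_inverse` and the bracketing interval are our own choices:

```python
import math
import random

def generalized_inverse(F, u, lo=-1.0, hi=50.0, tol=1e-9):
    """Approximate inf{x : F(x) >= u} by bisection (F non-decreasing, right-continuous)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

def F_exp(x):
    # Exp(1) distribution function
    return 1 - math.exp(-x) if x >= 0 else 0.0

random.seed(0)
samples = [generalized_inverse(F_exp, random.random()) for _ in range(20000)]
empirical = sum(1 for s in samples if s <= 1.0) / len(samples)
# empirical P(X <= 1) should be close to F_exp(1) = 1 - 1/e
```

Sampling ω uniformly and applying the generalized inverse produces a random variable whose distribution function is F, which is exactly the content of the lemma.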
Definition: Product σ-Algebra
If (S_i, F_i)_{i=1}^n are measurable spaces then the product σ-algebra on these is
⨂_{i=1}^n F_i = σ({A_1 × ... × A_n : A_i ∈ F_i})
Lemma:
If (Ω, F), (S_i, F_i)_{i=1}^n are measurable spaces then
{X_i : Ω → S_i}_{i=1}^n are all random variables iff
Z := (X_1, ..., X_n) : Ω → S_1 × ... × S_n is (F, ⨂_{i=1}^n F_i)-measurable
Proof:
Suppose Z is a random variable
let πi : S1 &times; ... &times; Sn → Si be the ith canonical projection
then Xi = πi ◦ Z
Compositions of measurable functions are measurable so it is sufficient that the projections π_i are measurable,
which is true since
π_i^{−1}(A) = S_1 × ... × S_{i−1} × A × S_{i+1} × ... × S_n
Suppose {X_i}_{i=1}^n are random variables and let A = A_1 × ... × A_n with A_i ∈ F_i (such sets generate ⨂_{i=1}^n F_i)
Z^{−1}(A) = {ω : Z(ω) ∈ A}
= {ω : X_i(ω) ∈ A_i ∀i}
= {ω : ω ∈ X_i^{−1}(A_i) ∀i}
= ⋂_{i=1}^n X_i^{−1}(A_i) ∈ F
Definition: Joint Distribution
If {X_i}_{i=1}^n : (Ω, F) → (S_i, F_i)
are random variables for measurable spaces (Ω, F), (S_i, F_i)_{i=1}^n
then the distribution of Z = (X_1, ..., X_n) : (Ω, F) → (S_1 × ... × S_n, ⨂_{i=1}^n F_i)
is called the joint distribution of (X_1, ..., X_n):
P_Z(A) = P({Z ∈ A}), so that for product sets P_Z(A_1 × ... × A_n) = P(⋂_{i=1}^n {X_i ∈ A_i})
## 2 Independence
Definition: Independence
Let (Ω, F, P) be a probability space then
• Sub-σ-algebras {F_i}_{i=1}^n are independent if
whenever {i_k}_{k=1}^n are distinct and A_{i_k} ∈ F_{i_k} then
P(⋂_{k=1}^n A_{i_k}) = ∏_{k=1}^n P(A_{i_k})
• Random variables {X_i}_{i=1}^n are independent if
the sub-σ-algebras {σ(X_i)}_{i=1}^n are independent
where σ(X_i) = σ({X_i^{−1}(A) : A ∈ F_i})
• Events {E_i}_{i=1}^n ⊆ F are independent if
the sub-σ-algebras {σ(E_i)}_{i=1}^n are independent
where σ(E_i) = {∅, Ω, E_i, Ω \ E_i}
Lemma:
Suppose G, H are sub-σ algebras of F on the probability space (Ω, F, P)
s.t. G = σ(I), H = σ(J )
for π-systems I, J
then G, H are independent iff
P(I ∩ J) = P(I)P(J)
∀I ∈ I, J ∈ J
Proof:
Independence =⇒ P(I ∩ J) = P(I)P(J) ∀I ∈ I, J ∈ J
is trivial since independence gives P(I ∩ J) = P(I)P(J) ∀I ∈ G ⊇ I, J ∈ H ⊇ J
Conversely, for fixed I ∈ I define µ(A) = P(A ∩ I), ν(A) = P(A)P(I) for A ∈ H
clearly these are measures which agree on the π-system J, so by the uniqueness theorem they coincide on H
Now for fixed H ∈ H define η(A) = P(A ∩ H), κ(A) = P(A)P(H) for A ∈ G
clearly these are measures which agree on the π-system I, so they coincide on G
So indeed we have independence.
Corollary:
Let (Ω, F, P) be a probability space and X, Y : Ω → R be random variables.
If
P({X ≤ a} ∩ {Y ≤ b}) = P(X ≤ a)P(Y ≤ b)
∀a, b ∈ R
then X, Y are independent.
Proof:
σ({(−∞, a] : a ∈ R}) = B(R) hence this holds by the previous lemma.
Corollary:
X, Y are independent iff
FX (a)FY (b) = P(X ≤ a)P(Y ≤ b)
= P({X ≤ a} ∩ {Y ≤ b})
= FX,Y (a, b)
∀a, b ∈ R
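This CDF criterion is easy to probe empirically. A sketch with two independent uniform samples; the sample size and the test point (a, b) are arbitrary choices of ours:

```python
import random

random.seed(1)
n = 50000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]  # drawn independently of xs

a, b = 0.3, 0.7
Fx = sum(1 for x in xs if x <= a) / n                            # estimates P(X <= a) = 0.3
Fy = sum(1 for y in ys if y <= b) / n                            # estimates P(Y <= b) = 0.7
Fxy = sum(1 for x, y in zip(xs, ys) if x <= a and y <= b) / n    # estimates P(X <= a, Y <= b)
# independence: Fxy should be close to Fx * Fy (about 0.21 here)
```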
Lemma:
Let (Ω_i, F_i, µ_i)_{i=1,2} be σ-finite measure spaces; then
∃! µ = µ_1 ⊗ µ_2 which is σ-finite and
µ(A_1 × A_2) = µ_1(A_1)µ_2(A_2) ∀A_i ∈ F_i, i = 1, 2
µ is a measure on (Ω_1 × Ω_2, F_1 ⊗ F_2)
Corollary:
Random variables X, Y with joint distribution P_{X,Y} are independent iff
P_{X,Y} = P_X ⊗ P_Y
Definition: Tail σ-Algebra
If {F_i}_{i=1}^∞ is a sequence of σ-algebras then the tail σ-algebra is defined as
T = ⋂_{n=1}^∞ σ(⋃_{k=n}^∞ F_k)
If {X_i}_{i=1}^∞ is a sequence of random variables then the tail σ-algebra is defined as
T = ⋂_{n=1}^∞ σ(⋃_{k=n}^∞ σ(X_k))
Definition: Deterministic
A random variable X is deterministic if ∃c ∈ [−∞, ∞] s.t. P(X = c) = 1
Definition: P-Trivial
T is P-trivial if
• A ∈ T =⇒ P(A) ∈ {0, 1}
• If X is a T-measurable random variable then X is deterministic.
Lemma:
Let µ be a measure on measurable space (Ω, F) and {A_i}_{i=1}^∞ ⊆ F
• If A_i ⊆ A_{i+1} ∀i then
µ(lim_{i→∞} A_i) = lim_{i→∞} µ(A_i)
• If A_i ⊇ A_{i+1} ∀i and µ(A_1) < ∞ then
µ(lim_{i→∞} A_i) = lim_{i→∞} µ(A_i)
Theorem: Kolmogorov
Suppose that {F_i}_{i=1}^∞ is an independent sequence of σ-algebras; then the associated tail σ-algebra T is P-trivial
Proof:
P(A) ∈ {0, 1} ⇐⇒ P(A)² = P(A)
⇐⇒ P(A)P(A) = P(A ∩ A)
⇐⇒ A ⊥ A
Let T_n = σ(⋃_{k=n+1}^∞ F_k)
and T = ⋂_{n=1}^∞ T_n
H_n := σ(⋃_{k=1}^n F_k)
{F_k}_{k=1}^∞ are independent hence
H_n ⊥ T_n hence
H_n ⊥ T
therefore A ∈ T is independent from every H_n
⋃_{n=1}^∞ H_n is a π-system, hence A is independent of
σ(⋃_{n=1}^∞ H_n) = σ(⋃_{k=1}^∞ F_k) = T_0 ⊇ T
hence A ⊥ A
Now let X be a T-measurable random variable. For c ∈ R, {X ≤ c} ∈ T so by the first part of the theorem P(X ≤ c) ∈ {0, 1}
define c := sup{x : P(X ≤ x) = 0}
If c = −∞ then P(X = −∞) = 1
If c = ∞ then P(X = ∞) = 1
If |c| < ∞ then
P(X ≤ c − 1/n) = 0 ∀n ∈ N
P(X ≤ c + 1/n) = 1 ∀n ∈ N
hence by the previous lemma
P(⋃_{n=1}^∞ {X ≤ c − 1/n}) = P(X < c) = 0
P(⋂_{n=1}^∞ {X ≤ c + 1/n}) = P(X ≤ c) = 1
P(X = c) = P(X ≤ c) − P(X < c) = 1
Definition: Limsup
Let (Ω, F, P) be a probability space with {A_i}_{i=1}^∞ ⊆ F then
{A_n i.o.} = ⋂_{m=1}^∞ ⋃_{n≥m} A_n
= {ω : ∀m ∃n ≥ m s.t. ω ∈ A_n}
= limsup_n A_n
Definition: Liminf
Let (Ω, F, P) be a probability space with {A_i}_{i=1}^∞ ⊆ F then
{A_n eventually} = ⋃_{m=1}^∞ ⋂_{n≥m} A_n
= {ω : ∃m s.t. ∀n ≥ m, ω ∈ A_n}
= liminf_n A_n
Lemma: Fatou
Let (Ω, F, µ) be a measure space and {A_n}_{n=1}^∞ ⊆ F then
• µ(liminf_n A_n) ≤ liminf_n µ(A_n)
• If µ(Ω) < ∞ then
µ(limsup_n A_n) ≥ limsup_n µ(A_n)
Proof:
• Define Z_m := ⋂_{n≥m} A_n; this sequence is increasing and µ(Z_m) ≤ µ(A_n) ∀n ≥ m
µ(liminf_n A_n) = µ(⋃_{m=1}^∞ Z_m)
= lim_{m→∞} µ(Z_m)
≤ lim_{m→∞} inf_{n≥m} µ(A_n)
= liminf_n µ(A_n)
• µ(Ω) − µ(limsup_n A_n) = µ(Ω \ limsup_n A_n)
= µ(liminf_n (Ω \ A_n))
≤ liminf_n µ(Ω \ A_n)
= µ(Ω) − limsup_n µ(A_n)
Since µ(Ω) < ∞ we have that
µ(limsup_n A_n) ≥ limsup_n µ(A_n)
Lemma: Borel-Cantelli
Let (Ω, F, P) be a probability space and {A_i}_{i=1}^∞ ⊆ F
then
• If Σ_{n=1}^∞ P(A_n) < ∞ then P(A_n i.o.) = 0
• If {A_i}_{i=1}^∞ are independent and Σ_{n=1}^∞ P(A_n) = ∞ then P(A_n i.o.) = 1
Proof:
• define G_m := ⋃_{n≥m} A_n
then for k ∈ N we have ⋂_{m=1}^∞ G_m ⊆ G_k
P({A_n i.o.}) = P(limsup_n A_n)
= P(⋂_{m=1}^∞ G_m)
≤ P(G_k)
= P(⋃_{n≥k} A_n)
≤ Σ_{n=k}^∞ P(A_n) ∀k
and since the series converges,
lim_{k→∞} Σ_{n=k}^∞ P(A_n) = 0
• {A_i}_{i=1}^∞ independent =⇒ {A_i^c}_{i=1}^∞ are independent.
P(⋂_{n=m}^r A_n^c) = ∏_{n=m}^r P(A_n^c) by independence
= ∏_{n=m}^r (1 − P(A_n))
≤ ∏_{n=m}^r e^{−P(A_n)}
= e^{−Σ_{n=m}^r P(A_n)}
lim_{r→∞} e^{−Σ_{n=m}^r P(A_n)} = 0
hence we have that P(⋂_{n=m}^∞ A_n^c) = 0
(limsup_n A_n)^c = ⋃_{m=1}^∞ ⋂_{n≥m} A_n^c
is the countable union of null sets hence is itself a null set hence
P(limsup_n A_n) = 1
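The estimate 1 − p ≤ e^{−p} used in the second part can be checked numerically: when Σ P(A_n) diverges the product ∏(1 − P(A_n)) collapses to 0, while for a convergent series it stays bounded away from 0. A small sketch; the helper and the choice P(A_n) = n^{−α} are ours:

```python
import math

def tail_product(m, r, alpha):
    """prod_{n=m}^{r} (1 - n^{-alpha}) together with the proof's bound exp(-sum n^{-alpha})."""
    prod, s = 1.0, 0.0
    for k in range(m, r + 1):
        p = k ** (-alpha)
        prod *= 1.0 - p
        s += p
    return prod, math.exp(-s)

div_prod, div_bound = tail_product(2, 100000, 1.0)   # sum diverges  -> product tends to 0
conv_prod, _ = tail_product(2, 100000, 2.0)          # sum converges -> product stays positive
```

For α = 1 the product telescopes to 1/r, matching the bound's collapse; for α = 2 it settles near 1/2.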
Corollary:
If {A_n}_{n=1}^∞ are independent then P(A_n i.o.) ∈ {0, 1}
Corollary:
If we write F_n = σ(A_n) then {A_n i.o.} is a tail event of T = ⋂_{n=1}^∞ σ(⋃_{i=n}^∞ F_i)
Lemma: Existence Of Independent Random Variables
Let (Ω_i, F_i, P_i), i = 1, 2 be probability spaces with respective random variables X_1, X_2 which have distributions P_{X_1}, P_{X_2}
If we set Ω = Ω_1 × Ω_2, F = F_1 ⊗ F_2, P = P_1 ⊗ P_2
then for ω = (ω_1, ω_2) ∈ Ω
X̃_1(ω) = X_1(ω_1)
X̃_2(ω) = X_2(ω_2)
are defined on (Ω, F, P) and are independent.
## 3 Expectation And Convergence
Definition: Simple Function
A function f : S → R on measure space (S, F, µ) is simple if
∃{α_i}_{i=1}^n ⊆ [0, ∞] and a partition {A_i}_{i=1}^n ⊆ F s.t.
f = Σ_{i=1}^n α_i 1_{A_i}
Definition: Integration Of Simple Functions
If f : S → R is simple with f = Σ_{i=1}^n α_i 1_{A_i}
then
∫ f dµ = Σ_{i=1}^n α_i µ(A_i)
Definition: Integration Of Positive Measurable Functions
If f : S → [0, ∞] is measurable
then
∫ f dµ = sup{∫ g dµ : g ≤ f, g simple}
Definition: µ-Integrable
f : S → R is µ-integrable if
∫ |f| dµ < ∞
Definition: Integration Of µ-Integrable Functions
If f : S → R is µ-integrable
then
∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ
Remark:
We denote the space of µ-integrable functions as L¹(µ) = L¹(S, F, µ)
Lemma:
The following properties hold for L¹(µ)
• f, g ∈ L¹(µ), α, β ∈ R =⇒ ∫ (αf + βg) dµ = α ∫ f dµ + β ∫ g dµ
• 0 ≤ f ∈ L¹(µ) =⇒ ∫ f dµ ≥ 0
• {f_n}_{n=1}^∞, f non-negative measurable s.t. lim_{n→∞} f_n = f and f_n ≤ f_{n+1} ∀n =⇒ lim_{n→∞} ∫ f_n dµ = ∫ f dµ
• f = 0 a.e. ⇐⇒ ∫ |f| dµ = 0
Theorem: Fubini's Theorem
Let (S_1 × S_2, F_1 ⊗ F_2, µ_1 ⊗ µ_2) be a measure space and f ∈ L¹(µ_1 ⊗ µ_2); let µ = µ_1 ⊗ µ_2
∫∫ f(s_1, s_2) dµ_1(s_1) dµ_2(s_2) = ∫ f(s) dµ(s) = ∫∫ f(s_1, s_2) dµ_2(s_2) dµ_1(s_1)
Moreover it is sufficient that f ≥ 0 be measurable for the above equality to hold.
Definition: Expectation
If X is a r.v. on probability space (Ω, F, P) with X ∈ L¹(P) then
E[X] := ∫ X dP
Lemma:
If X is a r.v. on (Ω, F, P) with distribution P_X and h : R → R is Borel measurable then
• h ◦ X ∈ L¹(P) ⇐⇒ h ∈ L¹(P_X)
• E[h ◦ X] = ∫ h ◦ X dP = ∫ h dP_X
Corollary:
E[X] = ∫ z dP_X(z) = ∫ X dP
Corollary:
Take h = 1_B for B ∈ B(R) then
E[1_B ◦ X] = ∫ 1_B ◦ X dP
= ∫ 1_{X^{−1}(B)} dP
= P(X^{−1}(B))
= P(X ∈ B)
= P_X(B)
= ∫ 1_B dP_X
This holds for all indicators, hence by linearity it holds for all simple functions, and therefore by the monotone convergence theorem for all positive measurable functions and all integrable functions.
Lemma:
If X, Y ∈ L¹ are independent r.vs on probability space (Ω, F, P) then
E[XY] = E[X]E[Y]
Proof:
E[XY] = ∫ xy dP_{X,Y}
= ∫∫ xy dP_X dP_Y since P_{X,Y} = P_X ⊗ P_Y, by Fubini
= ∫ y (∫ x dP_X) dP_Y
= (∫ x dP_X)(∫ y dP_Y)
= E[X]E[Y]
Remark:
If X is a r.v. on (Ω, F, P) and A ∈ F then we use the following notation:
E[X; A] := E[X 1_A] = ∫ X 1_A dP
Lemma: Markov's Inequality
If X is a random variable, ε > 0 and g : R → [0, ∞] is Borel measurable and non-decreasing then
• εP(g(X) > ε) ≤ E[g(X)]
• g(ε)P(X > ε) ≤ E[g(X)]
Proof:
g ◦ X is measurable and non-negative by properties of measurable functions hence
• E[g(X)] = ∫ g(X) dP
≥ ∫ g(X) 1_{g(X)>ε} dP
≥ ∫ ε 1_{g(X)>ε} dP
= ε ∫ 1_{g(X)>ε} dP
= εP(g(X) > ε)
• E[g(X)] = ∫ g(X) dP
≥ ∫ g(X) 1_{X>ε} dP
≥ ∫ g(ε) 1_{X>ε} dP
= g(ε) ∫ 1_{X>ε} dP
= g(ε)P(X > ε)
Corollary: Chebyshev's Inequality
If X is a r.v. with finite variance then
P(|X − E[X]| > ε) ≤ Var[X]/ε² ∀ε > 0
Proof:
P(|X − E[X]| > ε) = P((X − E[X])² > ε²)
≤ E[(X − E[X])²]/ε²
= Var[X]/ε²
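A quick Monte Carlo sanity check of Chebyshev's bound for X ~ Uniform(0, 1), where E[X] = 1/2 and Var[X] = 1/12; the sample size and ε values are our own choices:

```python
import random

random.seed(2)
n = 100000
xs = [random.random() for _ in range(n)]

eps = 0.5
freq = sum(1 for x in xs if abs(x - 0.5) > eps) / n    # true probability is 0 here
cheb = (1.0 / 12.0) / eps ** 2                         # Chebyshev bound = 1/3

eps2 = 0.25
freq2 = sum(1 for x in xs if abs(x - 0.5) > eps2) / n  # true probability is 0.5
cheb2 = (1.0 / 12.0) / eps2 ** 2                       # bound 4/3: valid but vacuous
```

The bound is honoured in both cases, though it can be far from tight (even exceeding 1).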
Theorem:
If {X_i}_{i=1}^n are independent random variables each with mean µ and variance σ² then for ε > 0
P(|Σ_{i=1}^n X_i/n − µ| > ε) ≤ σ²/(nε²)
Proof:
define Z = Σ_{i=1}^n X_i/n; then
E[Z] = µ
Var[Z] = σ²/n
hence by Chebyshev's inequality this clearly holds.
Lemma: Jensen's Inequality
If X is a random variable with E[|X|] < ∞ s.t. f is convex on an interval containing the range of X and
E[|f ◦ X|] < ∞ then
f(E[X]) ≤ E[f(X)]
Proof:
Suppose that X is simple, so let X = Σ_{i=1}^n a_i 1_{A_i}
f(E[X]) = f(Σ_{i=1}^n a_i P(A_i))
≤ Σ_{i=1}^n P(A_i) f(a_i) by convexity
= E[f(X)]
Suppose that X is a positive r.v.; then ∃{X_n}_{n=1}^∞ simple r.vs which are increasing and converge to X a.s.
then
f(E[X]) = lim_{n→∞} f(E[X_n])
≤ lim_{n→∞} E[f(X_n)]
= E[f(X)]
## 4 Convergence

Definition: Almost Sure Convergence
If {X_n}_{n=1}^∞ is a sequence of r.vs on (Ω, F, P) then {X_n}_{n=1}^∞ converges a.s. to r.v. X on (Ω, F, P) if
P({ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1
Remark:
Almost sure limits are unique only up to equality a.s.
Corollary:
X = Y a.s. iff
P({ω ∈ Ω : X(ω) = Y(ω)}) = 1
Lemma:
{X_n}_{n=1}^∞ converges to X a.s. iff
P(|X_n − X| > ε i.o.) = 0 ∀ε > 0
Corollary: Stability Properties
• If {X_n}_{n=1}^∞ are r.vs s.t. X_n → X a.s. and g : R → R is continuous then g(X_n) → g(X) a.s.
• If {X_n}_{n=1}^∞, {Y_n}_{n=1}^∞ are r.vs s.t. X_n → X a.s. and Y_n → Y a.s. with α, β ∈ R then αX_n + βY_n → αX + βY a.s.
Definition: Convergence In Probability
If {X_n}_{n=1}^∞, X are r.vs on (Ω, F, P) then X_n → X in probability if
lim_{n→∞} P(|X_n − X| > ε) = 0 ∀ε > 0
Definition: Cauchy Convergence In Probability
If {X_n}_{n=1}^∞ are r.vs on (Ω, F, P) then {X_n}_{n=1}^∞ is Cauchy in probability if
lim_{n,m→∞} P(|X_n − X_m| > ε) = 0 ∀ε > 0
Lemma:
Let {X_n}_{n=1}^∞, X be r.vs on (Ω, F, P)
If X_n → X a.s. then X_n → X in probability
Proof:
∀ε > 0
P(|X_n − X| > ε) ≤ P(sup_{k≥n} |X_k − X| > ε)
hence we have
lim_{n→∞} P(|X_n − X| > ε) ≤ lim_{n→∞} P(sup_{k≥n} |X_k − X| > ε)
= P(|X_n − X| > ε i.o.)
= 0
since X_n → X a.s.
Lemma:
Let {X_n}_{n=1}^∞, X be r.vs on (Ω, F, P)
X_n → X in probability iff {X_n}_{n=1}^∞ is Cauchy in probability.
Proof:
Suppose X_n → X in probability.
then
|X_n − X_m| ≤ |X_n − X| + |X − X_m|
{|X_n − X_m| > ε} ⊆ {|X_n − X| > ε/2} ∪ {|X − X_m| > ε/2}
P(|X_n − X_m| > ε) ≤ P(|X_n − X| > ε/2) + P(|X − X_m| > ε/2)
lim_{n,m→∞} P(|X_n − X_m| > ε) ≤ lim_{n,m→∞} (P(|X_n − X| > ε/2) + P(|X − X_m| > ε/2)) = 0
Now suppose {X_n}_{n=1}^∞ is Cauchy in probability
for any k ≥ 1 choose ε = 2^{−k}
∃m_k s.t. ∀n > m ≥ m_k we have that
P(|X_n − X_m| > 2^{−k}) < 2^{−k}
set n_1 = m_1 and recursively define n_{k+1} = max{n_k + 1, m_{k+1}}
define X'_k := X_{n_k}
define A_k := {|X'_{k+1} − X'_k| > 2^{−k}}
then Σ_{k=1}^∞ P(A_k) < ∞
then by Borel-Cantelli there is A with P(A) = 0 s.t. ∀ω ∈ Ω \ A there is K_0(ω) with
|X'_{k+1}(ω) − X'_k(ω)| ≤ 2^{−k} ∀k ≥ K_0(ω)
hence ∀ω ∈ Ω \ A
lim_{n→∞} sup_{m>n} |X'_m − X'_n| ≤ lim_{n→∞} Σ_{k=n}^∞ |X'_{k+1} − X'_k| ≤ lim_{n→∞} Σ_{k=n}^∞ 2^{−k} = 0
hence
P(limsup_n X'_n = liminf_n X'_n < ∞) = 1
define X := liminf_n X'_n, so that X_{n_k} → X a.s.
by the previous lemma we have that X_{n_k} → X in probability
hence ∀ε > 0
lim_{k→∞} P(|X_k − X| > ε) ≤ lim_{k→∞} (P(|X_k − X_{n_k}| > ε/2) + P(|X_{n_k} − X| > ε/2)) = 0
Lemma:
Let {X_n}_{n=1}^∞, X be random variables on (Ω, F, P)
If Σ_{n=1}^∞ P(|X_n − X| > ε) < ∞ ∀ε > 0 then
X_n → X a.s.
Proof:
This follows directly from Borel-Cantelli.
Lemma:
If {X_n}_{n=1}^∞, X are r.vs then
X_n → X in probability iff
every subsequence {X_{n_k}}_{k=1}^∞ has a further subsequence {X_{n_{k_r}}}_{r=1}^∞ s.t. X_{n_{k_r}} → X a.s.
Proof:
Suppose X_n → X in probability
then given a subsequence, for each r find n_{k_r} ∈ N s.t. P(|X_{n_{k_r}} − X| > 1/r) < 2^{−r}
then
Σ_{r=1}^∞ P(|X_{n_{k_r}} − X| > 1/r) < Σ_{r=1}^∞ 2^{−r} < ∞
moreover ∀ε > 0
Σ_{r>1/ε} P(|X_{n_{k_r}} − X| > ε) ≤ Σ_{r>1/ε} P(|X_{n_{k_r}} − X| > 1/r)
≤ Σ_{r>1/ε} 2^{−r}
< ∞
hence by the previous lemma the further subsequence {X_{n_{k_r}}}_{r=1}^∞ converges to X a.s.
Conversely assume that every subsequence has a further subsequence converging to X a.s.
fix ε > 0
b_n := P(|X_n − X| > ε) ∈ [0, 1] and [0, 1] is compact, so suppose b_{n_k} is a convergent subsequence
by our assumption X_{n_k} has a subsequence X_{n_{k_r}} → X a.s.
lim_{r→∞} b_{n_{k_r}} = lim_{r→∞} P(|X_{n_{k_r}} − X| > ε) = 0
since a.s. convergence implies convergence in probability
hence since b_{n_k} is convergent we must have that lim_{k→∞} b_{n_k} = 0
hence every convergent subsequence of b_n converges to 0, and by compactness of [0, 1] this forces lim_{n→∞} b_n = 0
Theorem: The Weak Law Of Large Numbers
If {X_n}_{n=1}^∞ are i.i.d. r.vs with X_i ∈ L¹(Ω, F, P) then
Σ_{k=1}^n X_k/n → E[X_1] in probability as n → ∞
Proof:
Write
X*_k := X_k 1_{|X_k| ≤ k^{1/4}}
X'_k := X_k 1_{|X_k| > k^{1/4}}
Y*_n := Σ_{k=1}^n (X*_k − E[X*_k])/n
Y'_n := Σ_{k=1}^n (X'_k − E[X'_k])/n
It suffices to show that Y*_n, Y'_n → 0 in probability.
Var[X*_k] ≤ E[(X*_k)²] ≤ k^{1/2}
Var[Y*_n] ≤ Σ_{k=1}^n Var[X*_k]/n² ≤ Σ_{k=1}^n k^{1/2}/n² ≤ Σ_{k=1}^n n^{1/2}/n² = n^{−1/2}
by Chebyshev's inequality
P(|Y*_n| > ε) ≤ Var[Y*_n]/ε²
but
lim_{n→∞} Var[Y*_n]/ε² = 0
hence Y*_n → 0 in probability
E[|X'_k|] = E[|X_1| 1_{|X_1| > k^{1/4}}] → 0 by the dominated convergence theorem, since X_1 ∈ L¹
hence
P(|Y'_n| ≥ ε) ≤ E[|Y'_n|]/ε ≤ (2/(nε)) Σ_{k=1}^n E[|X'_k|]
and since the Cesàro means of a null sequence tend to 0,
lim_{n→∞} P(|Y'_n| ≥ ε) = 0
hence Y'_n → 0 in probability.
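The weak law is easy to observe in simulation: the frequency of an ε-deviation of the sample mean from E[X_1] shrinks as n grows. A hedged sketch with Uniform(0, 1) variables; trial counts and ε are arbitrary choices of ours:

```python
import random

random.seed(3)

def sample_mean(n):
    # mean of n i.i.d. Uniform(0,1) draws; E[X_1] = 0.5
    return sum(random.random() for _ in range(n)) / n

eps = 0.05
trials = 400

def deviation_freq(n):
    # empirical P(|sample mean - 0.5| > eps) over repeated experiments
    return sum(1 for _ in range(trials) if abs(sample_mean(n) - 0.5) > eps) / trials

f_small = deviation_freq(10)
f_large = deviation_freq(1000)
# f_large should be far below f_small
```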
Theorem: The Strong Law Of Large Numbers
Let {X_n}_{n=1}^∞ be independent r.vs s.t.
• E[|X_n|²] < ∞ ∀n
• v := sup_n Var[X_n] < ∞
then
Σ_{k=1}^n (X_k − E[X_k])/n → 0 a.s.
Proof:
WLOG assume E[X_k] = 0
define S_n = Σ_{i=1}^n X_i/n
Notice that S_n → 0 in probability, so we first extract a subsequence converging a.s.
P(|S_{n²}| > ε) ≤ Var[S_{n²}]/ε² ≤ v/(n²ε²)
hence
Σ_{n=1}^∞ P(|S_{n²}| > ε) < ∞ ∀ε > 0
so we have that S_{n²} → 0 a.s.
For m ∈ N let n = n_m be s.t. n² ≤ m ≤ (n + 1)²
write y_k = k S_k = Σ_{i=1}^k X_i
P(|y_m − y_{n²}| > εn²) ≤ ε^{−2} n^{−4} Var[Σ_{i=n²+1}^m X_i]
≤ ε^{−2} n^{−4} v(m − n²)
Σ_{m=1}^∞ P(|y_m − y_{n_m²}| > εn_m²) ≤ (v/ε²) Σ_{n=1}^∞ Σ_{m=n²}^{(n+1)²−1} (m − n²)/n⁴
≤ (v/ε²) Σ_{n=1}^∞ (1/n⁴) Σ_{k=1}^{2n} k
= (v/ε²) Σ_{n=1}^∞ 2n(2n + 1)/(2n⁴)
≤ (v/ε²) Σ_{n=1}^∞ 3/n²
< ∞
hence by Borel-Cantelli |y_m − y_{n_m²}|/n_m² → 0 a.s.; since the intersection of two sets of full measure is also a set of full measure
we get that
|S_m| = |y_m|/m ≤ |y_m|/n_m² ≤ |y_{n_m²}|/n_m² + |y_m − y_{n_m²}|/n_m² → 0 a.s.
Definition: Absolute Moment
Let X be a random variable on probability space (Ω, F, P)
then for p ∈ [1, ∞) the pth absolute moment of X is defined as
E[|X|^p] = ∫ |X|^p dP
Definition: L^p Space
L^p(Ω, F, P) := {X : Ω → R : X measurable, E[|X|^p] < ∞}
Definition: L^p Norm
The L^p norm is defined as
||X||_p := E[|X|^p]^{1/p}
Lemma: Hölder's Inequality
For p, q ∈ (1, ∞) s.t. 1/p + 1/q = 1
we have that for X ∈ L^p, Y ∈ L^q
E[|XY|] ≤ ||X||_p ||Y||_q
moreover this holds for p = 1, q = ∞ where
||Y||_∞ := inf{s ≥ 0 : P(|Y| > s) = 0}
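Hölder's inequality holds in particular for the empirical (uniform) measure on a finite sample, which gives a quick numerical check; the conjugate pair p = 3, q = 3/2 and the Gaussian data are our own choices:

```python
import random

random.seed(4)
n = 1000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [random.gauss(0, 1) for _ in range(n)]

def norm_p(Z, p):
    # ||Z||_p under the empirical measure (each point has mass 1/n)
    return (sum(abs(z) ** p for z in Z) / len(Z)) ** (1 / p)

p, q = 3, 1.5          # conjugate exponents: 1/3 + 1/1.5 = 1
lhs = sum(abs(x * y) for x, y in zip(X, Y)) / n   # E|XY|
rhs = norm_p(X, p) * norm_p(Y, q)                 # ||X||_p ||Y||_q
# Holder: lhs <= rhs for any sample
```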
Lemma: Minkowski's Inequality
For p ∈ [1, ∞], X, Y ∈ Lp we have
||X + Y ||p ≤ ||X||p + ||Y ||p
Corollary:
||X||p = 0 =⇒ X = 0 a.s.
Definition: Quotient Space
Define N_0 = {X ∈ L^p : X = 0 P-a.s.}
then the quotient space of L^p is L^p/N_0
Definition: Equivalence Class
We denote the equivalence class of X ∈ L^p to be
[X] := {Y : X = Y a.s.}
Definition: Convergence in L^p
Let p ∈ [1, ∞] and {X_n}_{n=1}^∞, X ∈ L^p
then X_n → X in L^p if
lim_{n→∞} ||X_n − X||_p = 0
Definition: Cauchy Sequence
{X_n}_{n=1}^∞ is a Cauchy sequence in L^p if
∀ε > 0 ∃N ∈ N s.t. ∀n, m ≥ N we have that
||X_n − X_m||_p < ε
Theorem:
For p ∈ [1, ∞] let {X_n}_{n=1}^∞ ⊆ L^p be a Cauchy sequence in L^p; then
∃X ∈ L^p s.t. X_n → X in L^p
Lemma:
If 1 ≤ s ≤ r then
X_n → X in L^r =⇒ X_n → X in L^s
Proof:
For s = 1, r = 2
E[|X_n − X|^s] = ∫ |X_n − X| dP
≤ (∫ 1² dP)^{1/2} (∫ |X_n − X|² dP)^{1/2} by the Cauchy-Schwarz inequality
= E[|X_n − X|²]^{1/2}
lim_{n→∞} E[|X_n − X|²] = 0 =⇒ lim_{n→∞} E[|X_n − X|²]^{1/2} = 0
hence indeed the lemma holds for s = 1, r = 2
Lemma:
Let {X_n}_{n=1}^∞, X be r.vs s.t. X_n → X in L^p for p ≥ 1; then
X_n → X in probability.
Proof:
let ε > 0
P(|X_n − X| > ε) = P(|X_n − X|^p > ε^p)
≤ E[|X_n − X|^p]/ε^p by Markov's inequality
lim_{n→∞} E[|X_n − X|^p]/ε^p = 0 by convergence in L^p
Lemma:
X ∈ L¹(Ω, F, P) iff
lim_{k→∞} ∫_{|X|>k} |X| dP = 0
Definition: Uniformly Integrable
A set of random variables C on (Ω, F, P) is uniformly integrable if
lim_{k→∞} sup_{X∈C} ∫_{|X|>k} |X| dP = 0
Theorem:
If X, {X_n}_{n=1}^∞ ∈ L¹(Ω, F, P) then
X_n → X in L¹ iff
• X_n → X in probability
• {X_n}_{n=1}^∞ is uniformly integrable.
Proof:
X_n → X in L¹ =⇒ X_n → X in probability by a previous lemma.
moreover if X_n → X in L¹ then {X_n}_{n=1}^∞ is uniformly integrable, using |X_n| ≤ |X_n − X| + |X| to reduce to the integrability of X.
Assume that X_n → X in probability and {X_n}_{n=1}^∞ is uniformly integrable.
for k > 0 define
φ_k(x) = k if x > k, x if x ∈ [−k, k], −k if x < −k
then we have that
E[|X_n − X|] ≤ E[|X_n − φ_k(X_n)|] + E[|φ_k(X_n) − φ_k(X)|] + E[|φ_k(X) − X|]
notice that
|φ_k(X_n) − X_n| = 1_{|X_n|>k} (|X_n| − k) ≤ 2|X_n| 1_{|X_n|>k}
hence let ε > 0; then by uniform integrability we have that
∃k_1 ∈ [0, ∞) s.t. ∀k ≥ k_1
sup_n ∫_{|X_n|>k} |X_n| dP ≤ ε/6
hence ∀n we have that k ≥ k_1 =⇒
E[|X_n − φ_k(X_n)|] ≤ 2 sup_n ∫_{|X_n|>k} |X_n| dP ≤ ε/3
Since X ∈ L¹ we have that
lim_{k→∞} ∫_{|X|>k} |X| dP = 0
hence ∃k_2 ∈ [0, ∞) s.t. ∀k > k_2
E[|X − φ_k(X)|] ≤ ε/3
set k_0 = max{k_1, k_2}
P(|φ_{k_0}(X_n) − φ_{k_0}(X)| > ε) ≤ P(|X_n − X| > ε)
hence since X_n → X in probability we have that φ_{k_0}(X_n) → φ_{k_0}(X) in probability
since |φ_{k_0}| ≤ k_0, by the Dominated Convergence Theorem
∃N ∈ N s.t. ∀n ≥ N we have that
E[|φ_{k_0}(X_n) − φ_{k_0}(X)|] < ε/3
hence indeed for k = k_0 and n ≥ N we have that
E[|X_n − X|] ≤ E[|X_n − φ_k(X_n)|] + E[|φ_k(X_n) − φ_k(X)|] + E[|φ_k(X) − X|] ≤ ε
Definition: Probability Set
We define Prob(R) to be the set of probability measures on (R, B(R))
Definition: Continuous Bounded Functions
We define C_b(R) to be the set of continuous bounded functions f : R → R
Definition: Weak Convergence
• If µ, {µ_n}_{n=1}^∞ ∈ Prob(R) then
µ_n → µ weakly if
lim_{n→∞} ∫ f dµ_n = ∫ f dµ ∀f ∈ C_b(R)
• If F, {F_n}_{n=1}^∞ are distribution functions then
F_n → F weakly if µ_n → µ weakly
where µ, {µ_n}_{n=1}^∞ are the associated probability measures
F(x) = µ((−∞, x])
• If X, {X_n}_{n=1}^∞ are random variables then
X_n → X weakly if P_{X_n} → P_X weakly
where P_X(A) = P(X ∈ A)
Definition: Convergence In Distribution
• If F, {F_n}_{n=1}^∞ are distribution functions then
F_n → F in distribution if for all continuity points x of F
lim_{n→∞} F_n(x) = F(x)
• If X, {X_n}_{n=1}^∞ are random variables then
X_n → X in distribution if F_{X_n} → F_X in distribution
where F_X(x) = P(X ≤ x)
Theorem:
If {F_n}_{n=1}^∞, F are distribution functions of probability measures {µ_n}_{n=1}^∞, µ then
F_n → F weakly iff F_n → F in distribution.
Proof:
Suppose F_n → F weakly
fix x ∈ R s.t. F is continuous at x and let δ > 0
define
h(y) = 1 if y ≤ x, 1 − (y − x)/δ if y ∈ (x, x + δ), 0 if y ≥ x + δ
and
g(y) = 1 if y ≤ x − δ, (x − y)/δ if y ∈ (x − δ, x), 0 if y ≥ x
then since F_n(x) = ∫ 1_{(−∞,x]} dµ_n we have that
∫ g dµ_n ≤ F_n(x) ≤ ∫ h dµ_n
limsup_n F_n(x) ≤ limsup_n ∫ h dµ_n = ∫ h dµ ≤ F(x + δ)
liminf_n F_n(x) ≥ liminf_n ∫ g dµ_n = ∫ g dµ ≥ F(x − δ)
using the weak convergence and h, g ∈ C_b(R)
By continuity of F at x we have that
lim_{δ→0} F(x + δ) = F(x) = lim_{δ→0} F(x − δ)
hence
lim_{n→∞} F_n(x) = F(x)
hence F_n → F in distribution.
Suppose F_n → F in distribution.
then
lim_{n→∞} ∫ 1_{(−∞,x]} dµ_n = ∫ 1_{(−∞,x]} dµ at every continuity point x of F
Since the continuity points are dense and any continuous bounded function can be approximated by linear combinations of such indicator functions, linearity of the integral gives that F_n → F weakly
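A tiny numerical illustration of convergence in distribution: if X ~ N(0, 1) and X_n = X + 1/n then F_n(x) = F(x − 1/n) → F(x) at every point, since F is continuous everywhere. The symbols and the test point are ours:

```python
import math

def F(x):
    # standard normal distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def F_n(x, n):
    # distribution function of X + 1/n when X ~ N(0, 1)
    return F(x - 1.0 / n)

x = 0.7  # F is continuous at every x here
gaps = [abs(F_n(x, n) - F(x)) for n in (1, 10, 100, 1000)]
# the gaps shrink towards 0 as n grows
```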
Lemma:
Let {X_i}_{i=1}^∞, X be random variables on (Ω, F, P); then if X_n → X in probability then X_n → X weakly
Proof:
There are two different proofs of this; let f ∈ C_b(R)
• Since X_n → X in probability and f is continuous we have that f(X_n) → f(X) in probability.
f(X_n) is bounded by the properties of f, hence by the dominated convergence theorem
lim_{n→∞} ∫ f(X_n) dP = ∫ f(X) dP
hence indeed X_n → X weakly.
• Since f is bounded we know that ∃k > 0 s.t. |f(X_n)| ≤ k ∀n hence
lim_{c→∞} sup_n ∫_{|f(X_n)|>c} |f(X_n)| dP = 0
so {f(X_n)}_{n=1}^∞ is uniformly integrable.
Since f is continuous and X_n → X in probability we have that f(X_n) → f(X) in probability.
This along with uniform integrability implies that f(X_n) → f(X) in L¹
i.e. lim_{n→∞} E[|f(X_n) − f(X)|] = 0
hence indeed we have weak convergence.
Corollary:
Let {X_i}_{i=1}^∞, X be random variables on (Ω, F, P); then if X_n → X a.s. then X_n → X weakly
Definition: Conditional Probability
If (Ω, F, P) is a probability space with A, B ∈ F
then the conditional probability of B given A (where P(A) > 0) is
P(B|A) = P(A ∩ B)/P(A)
Definition: Conditional Expectation
If (Ω, F, P) is a probability space with A ∈ F of strictly positive probability
then the conditional expectation of r.v. X given A is
E[X|A] = (1/P(A)) ∫_A X dP
moreover if F_0 = σ(G) for a countable partition G = {A_i}_{i=1}^∞ of Ω into measurable sets
then
E[X|F_0] = Σ_{i:P(A_i)>0} (1_{A_i}/P(A_i)) ∫_{A_i} X dP
is a random variable.
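For a finite partition the formula above is just "average X over the block containing ω". A sketch verifying the defining property E[X 1_A] = E[1_A E[X|F_0]] under the empirical measure; the partition and the choice X(ω) = ω² are our own:

```python
import random

random.seed(5)
n = 20000
omegas = [random.random() for _ in range(n)]     # Omega = [0,1) with (empirical) Lebesgue P
X = [w ** 2 for w in omegas]                     # X(w) = w^2
blocks = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]  # partition generating F_0

means = {}
for lo, hi in blocks:
    vals = [x for om, x in zip(omegas, X) if lo <= om < hi]
    means[(lo, hi)] = sum(vals) / len(vals)      # average of X over each block

def cond_exp(w):
    """E[X|F_0](w): the block average for the block containing w."""
    for lo, hi in blocks:
        if lo <= w < hi:
            return means[(lo, hi)]

# check E[X 1_A] = E[1_A E[X|F_0]] on the block A = [0.25, 0.5)
lo, hi = blocks[1]
lhs = sum(x for om, x in zip(omegas, X) if lo <= om < hi) / n
rhs = sum(cond_exp(om) for om in omegas if lo <= om < hi) / n
```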
Theorem:
Let X ∈ L¹(Ω, F, P), F_0 = σ(G) s.t. G is a countable partition {A_i}_{i=1}^∞ of Ω
Then E[X|F_0] has the following properties:
• E[X|F_0] is F_0-measurable.
• ∀A ∈ F_0 we have that
E[X 1_A] = E[1_A E[X|F_0]]
Proof:
• This is trivial since each E[X; A_i]/P(A_i) is constant, the 1_{A_i} are measurable since A_i ∈ F_0, and a countable sum of measurable functions is measurable.
• Let A ∈ G with P(A) > 0
E[1_A E[X|F_0]] = E[1_A Σ_{i:P(A_i)>0} (1_{A_i}/P(A_i)) E[X; A_i]]
= E[(1_A/P(A)) E[X; A]] since {A_i}_{i=1}^∞ form a partition
= (E[1_A]/P(A)) E[X; A]
= E[X; A]
= E[X 1_A]
This extends to A ∈ F_0 by standard operations on elements of σ-algebras. If P(A) = 0 then
X1_A = 0 a.s. and E[X|F_0]1_A = 0 a.s., hence both sides of the equation are null.
Corollary:
Taking A = Ω in the second property we have that
E[X] = E[E[X|F_0]]
Definition: Version Of Conditional Expectation
A random variable X_0 ∈ L¹(Ω, F, P) is called a version of E[X|F_0] if
• X_0 is F_0-measurable
• ∀A ∈ F_0 we have
E[1_A X] = E[1_A X_0]
Theorem:
Let X, X_1, X_2 ∈ L¹(Ω, F, P); then
• E[X_1 + X_2|F_0] = E[X_1|F_0] + E[X_2|F_0] P-a.s.
• ∀c ∈ R: E[cX|F_0] = cE[X|F_0] P-a.s.
Proof:
For any versions E[X_i|F_0], the sum E[X_1|F_0] + E[X_2|F_0] is F_0-measurable, and for A ∈ F_0
E[1_A(E[X_1|F_0] + E[X_2|F_0])] = E[1_A E[X_1|F_0] + 1_A E[X_2|F_0]]
= E[1_A E[X_1|F_0]] + E[1_A E[X_2|F_0]]
= E[1_A X_1] + E[1_A X_2]
= E[1_A(X_1 + X_2)]
= E[1_A E[X_1 + X_2|F_0]]
Theorem:
Let X1, X2 ∈ L1(Ω, F, P) s.t. X1 ≤ X2. Then
E[X1|F0] ≤ E[X2|F0]   P-a.s.
Proof:
Let B0 := {ω ∈ Ω : E[X1|F0] > E[X2|F0]} ∈ F0; we want to show that P(B0) = 0.
0 ≤ E[1B0(E[X1|F0] − E[X2|F0])]   since the integrand is positive on B0 and zero elsewhere
= E[1B0(X1 − X2)]   since B0 ∈ F0
≤ 0   since X1 ≤ X2
So the non-negative integrand 1B0(E[X1|F0] − E[X2|F0]) has zero expectation and is therefore 0 a.s.; being strictly positive on B0, this forces P(B0) = 0 as required.
Theorem:
If Z, W are both versions of the conditional expectation E[X|F0] for some X ∈ L1 then Z = W P-a.s.
Proof:
Z, W are F0-measurable so write
A0 := {Z > W} ∈ F0
then we have
E[1A0(Z − W)] = E[1A0 Z] − E[1A0 W]
= E[1A0 X] − E[1A0 X]
= 0
Since 1A0(Z − W) ≥ 0 we have that 1A0(Z − W) = 0 a.s.,
but since Z > W on A0 it must be that 1A0 = 0 a.s.,
hence P(A0) = 0.
The same holds for A1 := {W > Z}, so
P(Z ≠ W) = P({Z > W} ∪ {Z < W}) ≤ P(A0) + P(A1) = 0
Definition: Density With Respect To A Measure
If μ, ν are measures on a measurable space (Ω, F) then ν has a density with respect to μ if
∃f : Ω → [0, ∞) that is F-measurable s.t. ∀A ∈ F we have
ν(A) = ∫_A f dμ
Theorem: Radon–Nikodym
If μ, ν are finite measures on a measurable space (Ω, F) then the following are equivalent:
• μ(A) = 0 =⇒ ν(A) = 0
• ν has a density w.r.t. μ
Lemma:
If 0 ≤ X ∈ L1 and F0 ⊆ F is a sub-σ-algebra then take
μ := P|F0 and ν : F0 → [0, ∞) defined by
ν(A) = ∫_A X dP   ∀A ∈ F0
Then every μ-null A ∈ F0 is also ν-null, so by Radon–Nikodym ν has a density g w.r.t. μ.
Lemma:
If g is the density of ν w.r.t. μ above, where
ν(A) := ∫_A X dP
then g is a version of the conditional expectation E[X|F0].
Proof:
Let A0 ∈ F0. Then
E[g1A0] = ∫_{A0} g dP
= ∫_{A0} g dμ
= ∫ 1A0 dν
= ∫ 1A0 X dP
= E[1A0 X]
Corollary:
If X ∈ L1 then X = X+ − X−, and if
g+, g− are versions of E[X+|F0], E[X−|F0] then by linearity
E[X|F0] = g+ − g−
Theorem: Tower Property
If X ∈ L1(Ω, F, P) and F0 ⊆ F1 ⊆ F are sub-σ-algebras then
X0 := E[E[X|F1]|F0] = E[X|F0]
Proof:
We want to show that E[X0 1A0] = E[X1A0] ∀A0 ∈ F0:
E[X0 1A0] = E[1A0 E[E[X|F1]|F0]]
= E[1A0 E[X|F1]]   since A0 ∈ F0
= E[X1A0]   since A0 ∈ F0 ⊆ F1
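On a finite sample space the tower property can be checked directly with nested partitions: a coarse partition generates F0 and a refinement of it generates F1, so F0 ⊆ F1. A minimal sketch (hypothetical names, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
P = np.full(N, 1.0 / N)
X = rng.normal(size=N)

# F1 is generated by a fine partition, F0 by a coarser one that the fine
# partition refines, so F0 is a sub-sigma-algebra of F1.
fine = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7])  # 8 cells -> F1
coarse = fine // 2                                                  # 4 cells -> F0

def cond_exp(Z, labels, P):
    # E[Z | sigma(partition)]: cell-average of Z on each cell
    out = np.empty_like(Z)
    for i in np.unique(labels):
        c = labels == i
        out[c] = (Z[c] * P[c]).sum() / P[c].sum()
    return out

inner = cond_exp(X, fine, P)          # E[X|F1]
tower = cond_exp(inner, coarse, P)    # E[E[X|F1]|F0]
direct = cond_exp(X, coarse, P)       # E[X|F0]
assert np.allclose(tower, direct)     # tower property, pointwise
```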
Theorem:
If X ∈ L1(Ω, F, P) then
E[X|F0] ∈ L1(Ω, F, P)
Proof:
Define A0 := {E[X|F0] ≥ 0} ∈ F0;
then
E[E[X|F0]+] = E[E[X|F0]1A0]
= E[X1A0]
< ∞   since X ∈ L1
This holds similarly for E[E[X|F0]−],
hence it holds for E[X|F0] by linearity.
Theorem:
Let (Ω, F, P) be a probability space with sub-σ-algebra F0 ⊆ F and X ∈ L1(Ω, F, P), where F0 ⊥ σ(X) (i.e. F0 and σ(X) are independent).
Then
E[X|F0] = E[X]   P-a.s.
Proof:
E[X] is F0-measurable since it is constant, so let A ∈ F0:
E[1A E[X]] = E[X]E[1A]   since E[X] is constant
= E[1A X]   by independence
hence the constant E[X] satisfies the defining property of E[X|F0].
Theorem: MCT For Conditional Expectations
Let {Xn}_{n=1}^∞ ∈ L1 be an increasing sequence s.t. Xn → X P-a.s. for some X ∈ L1.
Then
E[Xn|F0] → E[X|F0]
both P-a.s. and in L1.
Proof:
Since {E[Xn|F0]}_{n=1}^∞ is (P-a.s.) an increasing sequence of random variables, ∃Z an F0-measurable r.v. s.t.
E[Xn|F0] → Z   P-a.s.
For A ∈ F0 we have that
E[Z1A] = E[lim_{n→∞} E[Xn|F0]1A]
= lim_{n→∞} E[E[Xn|F0]1A]   by monotone convergence of E
= lim_{n→∞} E[Xn1A]
= E[X1A]   by MCT
hence Z = E[X|F0] P-a.s.
For L1 convergence,
0 ≤ E[X|F0] − E[Xn|F0] ≤ E[X|F0] − E[X1|F0] ∈ L1   P-a.s.
hence by the dominated convergence theorem
lim_{n→∞} E[|E[X|F0] − E[Xn|F0]|] = 0
Theorem: DCT For Conditional Expectations
Let {Xn}_{n=1}^∞ ∈ L1 converge to X P-a.s.
If ∃Z ∈ L1 s.t. ∀n ∈ N we have that |Xn| ≤ Z P-a.s., then
• X ∈ L1
• E[Xn|F0] → E[X|F0] P-a.s.
• E[Xn|F0] → E[X|F0] in L1
Proof:
• X ∈ L1 since |Xn| ≤ Z P-a.s. ∀n =⇒ |X| ≤ Z P-a.s.
• Define Un := inf_{m≥n} Xm, an increasing sequence,
and Vn := sup_{m≥n} Xm, a decreasing sequence;
then Un, Vn → X P-a.s.
Moreover
−Z ≤ Un ≤ Xn ≤ Vn ≤ Z
so by MCT for conditional expectations (applied to Un + Z and Z − Vn) we have that
E[Un|F0] → E[X|F0]
E[Vn|F0] → E[X|F0]
both P-a.s. and in L1,
and by monotonicity of conditional expectations we have that
E[Un|F0] ≤ E[Xn|F0] ≤ E[Vn|F0]
so indeed E[Xn|F0] → E[X|F0] in L1 and P-a.s.
Theorem:
Let X ∈ L1(Ω, F, P) and Y be F0-measurable s.t. XY ∈ L1. Then
E[XY|F0] = Y E[X|F0]
Proof:
Suppose Y = 1A for some A ∈ F0,
then take B ∈ F0:
E[1B Y E[X|F0]] = E[1_{B∩A} E[X|F0]]
= E[1_{B∩A} X]   since A ∩ B ∈ F0
= E[1B Y X]
= E[1B E[XY|F0]]
hence the theorem holds for indicator functions.
Suppose Y = ∑_{n=1}^∞ αn 1An:
E[XY|F0] = E[∑_{n=1}^∞ αn 1An X | F0]
= ∑_{n=1}^∞ αn E[1An X|F0]   by linearity
= ∑_{n=1}^∞ αn 1An E[X|F0]   by the first part of the theorem
= Y E[X|F0]
hence the theorem holds for simple functions.
Suppose Y is positive:
let {Yn}_{n=1}^∞ be an increasing sequence of simple functions s.t. Yn → Y;
then |YnX| ≤ |Y X| ∈ L1 and YnX → Y X,
hence by DCT for conditional expectations we have that
Yn E[X|F0] = E[YnX|F0] → E[Y X|F0]   P-a.s.
Moreover
Yn E[X|F0] → Y E[X|F0]   P-a.s.
hence the theorem holds for positive functions.
For the general case write Y = Y+ − Y−:
E[XY|F0] = E[X(Y+ − Y−)|F0]
= E[XY+|F0] − E[XY−|F0]
= Y+ E[X|F0] − Y− E[X|F0]
= Y E[X|F0]
Lemma:
If X ∈ L1 and F0 ⊆ F is a sub-σ-algebra then
||E[X|F0]||1 ≤ ||X||1
Proof:
||E[X|F0]||1 = ∫ |E[X|F0]| dP
= ∫ E[X|F0]+ dP + ∫ E[X|F0]− dP
= ∫ E[X|F0] 1_{E[X|F0]>0} dP − ∫ E[X|F0] 1_{E[X|F0]<0} dP
= E[X1_{E[X|F0]>0}] − E[X1_{E[X|F0]<0}]
≤ E[|X|]
= ∫ |X| dP
= ||X||1
Theorem: Conditional Jensen's Inequality
Let I ⊆ R be open, ϕ : I → R be convex and X ∈ L1 a r.v. with X : Ω → I s.t. ϕ ◦ X ∈ L1.
Then
E[ϕ ◦ X|F0] ≥ ϕ(E[X|F0])   P-a.s.
Proof:
From the supporting-line inequality for convex functions (with D−ϕ the left derivative) we have that
ϕ(X) ≥ ϕ(E[X|F0]) + D−ϕ(E[X|F0])(X − E[X|F0])
Let An = {ω ∈ Ω : |D−ϕ(E[X|F0])| ≤ n} ∈ F0.
Then, since 1An is F0-measurable and bounded,
1An E[ϕ(X)|F0] = E[1An ϕ(X)|F0]
≥ E[1An ϕ(E[X|F0])|F0] + E[1An D−ϕ(E[X|F0])(X − E[X|F0])|F0]
= 1An ϕ(E[X|F0]) + 1An D−ϕ(E[X|F0]) E[X − E[X|F0]|F0]
= 1An ϕ(E[X|F0]) + 1An D−ϕ(E[X|F0])(E[X|F0] − E[X|F0])
= 1An ϕ(E[X|F0])
This holds for all n, and An ↑ Ω, hence the result follows by letting n → ∞.
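The inequality can be observed numerically on a finite sample space, where conditional expectation given a partition is a cell average. A small sketch (hypothetical names, assuming numpy), using the convex function ϕ(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 12
P = np.full(N, 1.0 / N)
X = rng.normal(size=N)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])  # partition -> F0

def cond_exp(Z, labels, P):
    # E[Z|F0]: cell-average of Z on each cell of the partition
    out = np.empty_like(Z)
    for i in np.unique(labels):
        c = labels == i
        out[c] = (Z[c] * P[c]).sum() / P[c].sum()
    return out

phi = lambda x: x**2                  # a convex function
lhs = cond_exp(phi(X), labels, P)     # E[phi(X)|F0]
rhs = phi(cond_exp(X, labels, P))     # phi(E[X|F0])
assert np.all(lhs >= rhs - 1e-12)     # conditional Jensen, pointwise
```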
Theorem:
Let p ≥ 1, X ∈ Lp. Then
||E[X|F0]||p ≤ ||X||p
Proof:
ϕ(x) = |x|^p is convex, hence by conditional Jensen's inequality
||E[X|F0]||p^p = ∫ |E[X|F0]|^p dP
= ∫ ϕ(E[X|F0]) dP
≤ ∫ E[ϕ(X)|F0] dP
= ∫ E[|X|^p|F0] dP
= E[|X|^p] = ||X||p^p
and taking p-th roots gives the result.
Lemma: Factorisation
If (Ω, F), (Ω′, F′) are measurable spaces,
Y : (Ω, F) → (Ω′, F′) is F−F′ measurable and
Z : Ω → R, then
Z is σ(Y)-measurable iff
∃g : (Ω′, F′) → (R, B(R)) measurable s.t. Z = g ◦ Y
Proof:
Clearly if ∃g : (Ω′, F′) → (R, B(R)) s.t. Z = g ◦ Y then Z is σ(Y)-measurable, since the composition of measurable functions is measurable.
For the converse, suppose first that
Z = ∑_{k=1}^n αk 1Ak
for {Ak}_{k=1}^n ∈ σ(Y).
Notice that σ(Y) = {Y⁻¹(A′) : A′ ∈ F′},
hence Ak = Y⁻¹(A′k) for some A′k ∈ F′,
so
Z = ∑_{k=1}^n αk 1Ak
= ∑_{k=1}^n αk 1_{Y⁻¹(A′k)}
= ∑_{k=1}^n αk (1_{A′k} ◦ Y)
= (∑_{k=1}^n αk 1_{A′k}) ◦ Y
= g ◦ Y
where clearly g is F′−B(R) measurable since {A′k}_{k=1}^n ∈ F′.
Suppose Z is positive:
then ∃{Zn}_{n=1}^∞ simple increasing functions s.t. lim_{n→∞} Zn = Z,
and ∃{gn}_{n=1}^∞ : (Ω′, F′) → (R, B(R)) s.t. Zn = gn ◦ Y;
then g := sup_n gn is clearly F′−B(R) measurable and Z = g ◦ Y.
For general Z write Z = Z+ − Z−:
then ∃g+, g− : (Ω′, F′) → (R, B(R)) s.t. Z+ = g+ ◦ Y, Z− = g− ◦ Y,
so write g = g+ − g−, hence
Z = (g+ − g−) ◦ Y = g ◦ Y
Definition: Factorised Conditional Expectation
A measurable f : (R, B(R)) → (R, B(R)) is a factorised conditional expectation (FCE) of the r.v. X given the r.v. Y : Ω → R if
f ◦ Y is a version of the conditional expectation of X given Y,
i.e.
E[X|Y] := E[X|σ(Y)] = f(Y)
Corollary:
Let Y : (Ω, F) → (R, B(R)) and X ∈ L1(Ω, F, P).
Then ∃f an FCE of X given Y, and ∀A ∈ B(R)
∫_A f dPY = E[1_{Y⁻¹(A)} X]
Proof:
By the factorisation lemma,
writing Z = E[X|Y], which is σ(Y)-measurable, indeed gives
∃f an FCE of X given Y.
Moreover
∫_A f dPY = ∫ 1_{Y⁻¹(A)} X dP
= E[1_{Y⁻¹(A)} X]
Remark:
The function f, written f(y) =: E[X|Y = y], is unique PY-a.s.
Theorem:
Let Y : (Ω, F) → (R, B(R)) and X ∈ L1(Ω, F, P).
If λ, μ are σ-finite measures on (R, B(R)) s.t. (X, Y) has density g w.r.t. λ ⊗ μ,
then
f(y) := ( ∫_R x g(x, y) dλ(x) ) / ( ∫_R g(x, y) dλ(x) )
is an FCE of X given Y (defined arbitrarily where the denominator vanishes).
Proof:
Recall that
P((X, Y) ∈ A1 × A2) = ∫_{A1×A2} g(x, y) d(λ ⊗ μ)(x, y)
E[h(X, Y)] = ∫_{R²} h(x, y) g(x, y) d(λ ⊗ μ)(x, y)
With f of the required form, f is clearly Borel measurable.
Let A ∈ B(R):
E[X1_{Y⁻¹(A)}] = ∫ x 1A(y) g(x, y) d(λ ⊗ μ)(x, y)
= ∫ 1A(y) ∫ x g(x, y) dλ(x) dμ(y)   by Fubini
= ∫ 1A(y) f(y) ∫ g(x, y) dλ(x) dμ(y)
= ∫ 1A(y) f(y) g(x, y) d(λ ⊗ μ)(x, y)
= E[1A(Y) f(Y)]
hence indeed f(Y) = E[X|Y].
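The density-ratio formula can be sanity-checked numerically. The sketch below (assuming numpy; the correlation rho and grid resolution are arbitrary choices) uses a standard bivariate normal, for which the conditional mean E[X|Y = y] = ρy is known in closed form, and compares it with f(y) computed as a ratio of numerical integrals:

```python
import numpy as np

# Standard bivariate normal with correlation rho: the closed-form conditional
# mean E[X|Y=y] = rho*y serves as the reference for the density-ratio formula.
rho = 0.6
xs = np.linspace(-8.0, 8.0, 2001)   # integration grid; lambda = Lebesgue on R

def g(x, y):
    # joint density of (X, Y) w.r.t. Lebesgue x Lebesgue
    z = (x**2 - 2 * rho * x * y + y**2) / (1 - rho**2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

def f(y):
    # f(y) = (int x g(x,y) dlambda(x)) / (int g(x,y) dlambda(x));
    # the common grid spacing cancels in the ratio of Riemann sums
    gy = g(xs, y)
    return (xs * gy).sum() / gy.sum()

for y in (-1.3, 0.0, 0.7, 2.1):
    assert abs(f(y) - rho * y) < 1e-6
```

The grid is wide enough that the truncated Gaussian tails are negligible at this tolerance.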
5 Martingales
Definition: Filtration
If (Ω, F) is a measurable space then sub-σ-algebras {Fi}_{i=1}^∞ of F are a filtration if Fi ⊆ Fi+1 ⊆ F ∀i.
Definition: Adapted Sequence
A sequence of random variables {Xi}_{i=1}^∞ is called adapted to {Fi}_{i=1}^∞ if
Xn is Fn-measurable ∀n.
Definition: Martingale
If (Ω, F, P) is a probability space and {Fi}_{i=1}^∞ is a filtration of (Ω, F) then
{Xn}_{n=1}^∞ is a martingale if:
• E[|Xn|] < ∞ ∀n
• {Xn}_{n=1}^∞ is adapted to {Fn}_{n=1}^∞
• E[Xn+1|Fn] = Xn ∀n
Remark:
If {Xn}_{n=1}^∞ has the first two properties of a martingale but
E[Xn+1|Fn] ≥ Xn, then {Xn}_{n=1}^∞ is a sub-martingale;
if E[Xn+1|Fn] ≤ Xn, then {Xn}_{n=1}^∞ is a super-martingale.
Lemma:
If {Xn}_{n=1}^∞ is a martingale w.r.t. {Fn}_{n=1}^∞ then ∀m ≥ 1
E[Xn+m|Fn] = Xn
Proof:
E[Xn+m|Fn] = E[E[Xn+m|Fn+m−1]|Fn]   by the tower property, since Fn ⊆ Fn+m−1
= E[Xn+m−1|Fn]
= ⋯
= E[Xn+1|Fn]   iterating
= Xn
Theorem:
If {Xn}_{n=1}^∞ ∈ L2 is a martingale w.r.t. {Fn}_{n=1}^∞ and r ≤ s ≤ t,
then
• E[Xr(Xs − Xr)|Fr] = 0
• E[Xr(Xt − Xs)|Fs] = 0
• E[(Xt − Xs)(Xs − Xr)|Fs] = 0
• E[Xr(Xs − Xr)] = 0
• E[Xr(Xt − Xs)] = 0
• E[(Xt − Xs)(Xs − Xr)] = 0
Proof:
•
E[Xr(Xs − Xr)|Fr] = E[XrXs − Xr²|Fr]
= E[XrXs|Fr] − E[Xr²|Fr]
= Xr E[Xs|Fr] − Xr²
= 0
•
E[Xr(Xt − Xs)|Fs] = E[XrXt − XrXs|Fs]
= E[XrXt|Fs] − E[XrXs|Fs]
= Xr E[Xt|Fs] − XrXs
= 0
•
E[(Xt − Xs)(Xs − Xr)|Fs] = E[XtXs − Xs² − XtXr + XsXr|Fs]
= Xs E[Xt|Fs] − Xs² − Xr E[Xt|Fs] + XsXr
= Xs² − Xs² − XrXs + XsXr
= 0
• The last three follow by taking expectations of the first three:
E[Xr(Xs − Xr)] = E[E[Xr(Xs − Xr)|Fr]] = E[0] = 0
E[Xr(Xt − Xs)] = E[E[Xr(Xt − Xs)|Fs]] = E[0] = 0
E[(Xt − Xs)(Xs − Xr)] = E[E[(Xt − Xs)(Xs − Xr)|Fs]] = E[0] = 0
Definition: Ln Bounded
A set of r.vs C is Ln bounded if
sup_{X∈C} E[|X|^n] < ∞
Theorem:
Every L2-bounded martingale {Xn}_{n=1}^∞ w.r.t. {Fn}_{n=1}^∞ converges in L2,
i.e. ∃Y ∈ L2 s.t. lim_{n→∞} E[|Xn − Y|²] = 0.
Proof:
L2 is complete, hence every Cauchy sequence converges.
∑_{k=1}^{n−1} E[(Xk+1 − Xk)²] = ∑_{k=1}^{n−1} E[Xk+1² + Xk² − 2XkXk+1]
= E[(Xn − X1)²]   by a telescoping sum and orthogonality of increments
≤ 2c   where c := sup_n E[Xn²] < ∞ since {Xn} is L2-bounded
This yields
limsup_{m≥n≥k} E[(Xm − Xn)²] = limsup_{m≥n≥k} ∑_{l=n}^{m−1} E[(Xl+1 − Xl)²]
≤ ∑_{l=k}^∞ E[(Xl+1 − Xl)²]
→ 0   as k → ∞, being the tail of a convergent series
so {Xn} is Cauchy in L2 and therefore converges.
Remark:
The previous theorem also gives us that Y ∈ L2(Ω, F∞, P),
where F∞ := σ(∪_{k=1}^∞ Fk).
Theorem: Levy
Let X ∈ L2(Ω, F, P) and {Fk}_{k=1}^∞ a filtration;
define Xn := E[X|Fn].
Then Xn → E[X|F∞] in L2.
Proof:
{Xn}_{n=1}^∞ is a martingale (by the tower property) and is L2-bounded (since ||Xn||2 ≤ ||X||2), so Xn converges in L2; let Xn → Y.
We want to show that E[Y1A] = E[X1A] ∀A ∈ F∞.
First let A ∈ Fn and m ≥ n; then notice that
E[Xm1A] = E[E[X|Fm]1A]
= E[X1A]
Since Xm → Y in L2 we have that
lim_{m→∞} |E[Xm1A − Y1A]| ≤ lim_{m→∞} (E[(Xm − Y)²]E[1A])^{1/2}   by the Cauchy–Schwarz inequality
= 0
hence indeed we have that E[X1A] = E[Y1A] ∀A ∈ Fn.
We need this ∀A ∈ F∞:
define
D := {A ∈ F∞ : E[X1A] = E[Y1A]}
We want to show that D is a Dynkin system.
Ω ∈ D holds trivially since Ω ∈ Fn ∀n.
Let A ∈ D:
E[Y1Ac] = E[Y1Ω] − E[Y1A]
= E[X1Ω] − E[X1A]
= E[X1Ac]
as required.
Suppose {Ai}_{i=1}^∞ ∈ D are disjoint and A := ∪_{i=1}^∞ Ai:
E[Y1A] = E[Y lim_{n→∞} ∑_{k=1}^n 1Ak]
= lim_{n→∞} E[Y ∑_{k=1}^n 1Ak]   by DCT
= lim_{n→∞} ∑_{k=1}^n E[Y1Ak]
= lim_{n→∞} ∑_{k=1}^n E[X1Ak]
= lim_{n→∞} E[X ∑_{k=1}^n 1Ak]
= E[X lim_{n→∞} ∑_{k=1}^n 1Ak]   by DCT
= E[X1A]
Since D is a Dynkin system, if I ⊆ D is a π-system then σ(I) ⊆ D.
I := ∪_{n=1}^∞ Fn ⊆ D is a π-system since
if {Ai}_{i=1}^n ∈ I then ∀i ∃ki s.t. Ai ∈ Fki;
hence, the collection being finite, ∃K = maxi{ki},
and since {Fn}_{n=1}^∞ is a filtration, Ai ∈ FK ∀i;
hence, since FK is a σ-algebra, ∩_{i=1}^n Ai ∈ FK ⊆ I.
Thus F∞ = σ(I) ⊆ D ⊆ F∞, i.e. D = F∞.
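Levy's theorem can be illustrated on Ω = [0, 1) with the dyadic filtration, where E[X|Fn] averages X over dyadic intervals of length 2⁻ⁿ. A Monte Carlo sketch (hypothetical names, assuming numpy; sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Omega = [0, 1) with Lebesgue measure, approximated by M uniform samples;
# F_n = sigma(dyadic intervals of length 2^-n), so F_infinity = B([0, 1)).
# For X(omega) = omega^2, X_n = E[X|F_n] is the average of X over the dyadic
# interval containing omega, and Levy's theorem says X_n -> X in L2.
M = 200_000
omega = rng.uniform(size=M)
X = omega**2

def cond_exp_dyadic(n):
    # E[X|F_n](omega): replace X by its sample average over [k 2^-n, (k+1) 2^-n)
    k = np.floor(omega * 2**n).astype(int)
    sums = np.bincount(k, weights=X, minlength=2**n)
    cnts = np.bincount(k, minlength=2**n)
    return sums[k] / np.maximum(cnts[k], 1)

# empirical L2 errors E[(X_n - X)^2] shrink as the filtration grows
errs = [np.mean((cond_exp_dyadic(n) - X) ** 2) for n in (1, 3, 5, 7)]
assert all(a > b for a, b in zip(errs, errs[1:]))
```

Here X is already F∞-measurable, so E[X|F∞] = X and the L2 errors tend to zero.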
Lemma: Doob Decomposition
Let {Fn}_{n=1}^∞ be a filtration of (Ω, F, P) and {Xn}_{n=1}^∞ ∈ L1 a sequence adapted to {Fn}_{n=1}^∞. Then
• ∃{An}_{n=1}^∞ ⊆ L1 and a martingale {Mn}_{n=1}^∞ ⊆ L1 with
– A1 = 0
– An+1 Fn-measurable
– Xn = Mn + An
which is a P-a.s. unique decomposition.
• For Xn = Mn + An above we have that
{Xn}_{n=1}^∞ is a sub-martingale iff {An}_{n=1}^∞ is increasing P-a.s.
Proof:
• If we construct the sequence recursively then the decomposition is P-a.s. unique:
A1 = 0 =⇒ M1 = X1
From Xn+1 = Mn+1 + An+1,
E[Xn+1|Fn] = E[Mn+1|Fn] + E[An+1|Fn]
= Mn + An+1   since An+1 is Fn-measurable
so set
An+1 := E[Xn+1|Fn] − Mn
which is clearly Fn-measurable. Then
E[Mn+1|Fn] = E[Xn+1 − An+1|Fn]
= E[Xn+1|Fn] − E[An+1|Fn]
= An+1 + Mn − An+1
= Mn
hence {Mn}_{n=1}^∞ is a martingale.
•
An+1 − An = E[Xn+1|Fn] − Mn − An
= E[Xn+1|Fn] − Xn
which is non-negative iff Xn ≤ E[Xn+1|Fn],
i.e. iff {Xn}_{n=1}^∞ is a sub-martingale.
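A concrete instance of the recursion: for the sub-martingale Xn = Sn² built from a simple symmetric random walk, the increment is E[Xn+1|Fn] − Xn = 1, giving An = n − 1 and Mn = Sn² − (n − 1). A Monte Carlo sketch checking that the candidate M is a martingale (hypothetical names, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)

# Sub-martingale X_n = S_n^2, with S_n a simple symmetric random walk
# (sub-martingale by conditional Jensen). The Doob recursion gives
# A_{n+1} - A_n = E[(S_n + xi)^2|F_n] - S_n^2 = 1, so A_n = n - 1 (A_1 = 0)
# and M_n = S_n^2 - (n - 1) should be a martingale.
paths, steps = 100_000, 20
xi = rng.choice([-1, 1], size=(paths, steps))
S = xi.cumsum(axis=1)                 # S_n for n = 1, ..., steps
n = np.arange(1, steps + 1)
M = S**2 - (n - 1)                    # candidate martingale part

# empirical check: E[M_n] is constant in n (= E[M_1] = 1)
means = M.mean(axis=0)
assert means[0] == 1.0                # S_1^2 = 1 exactly
assert np.allclose(means, 1.0, atol=0.5)
```

The tolerance absorbs Monte Carlo noise; constancy of E[Mn] is of course necessary but not sufficient for the martingale property.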
Definition: Stopping Time
Let {Fn}_{n=1}^∞ be a filtration on (Ω, F, P) and let
T : Ω → N ∪ {∞} be a random variable.
Then T is a stopping time if
{T ≤ n} ∈ Fn   ∀n
Lemma:
Let (Ω, F, P) be a probability space with stopping times T, S on filtration {Fn}_{n=1}^∞.
Then T ∩ S := min(T, S) and T ∪ S := max(T, S) are stopping times.
Proof:
Clearly T ∩ S, T ∪ S are r.vs, and
{T ∩ S ≤ n} = {T ≤ n} ∪ {S ≤ n}
{T ∪ S ≤ n} = {T ≤ n} ∩ {S ≤ n}
Since T, S are stopping times, {T ≤ n}, {S ≤ n} ∈ Fn,
hence {T ∩ S ≤ n}, {T ∪ S ≤ n} ∈ Fn.
Lemma:
Let (Ω, F, P) be a probability space with stopping time T on filtration {Fn}_{n=1}^∞.
Then the set of events which are observable up to time T,
FT := {A ∈ F : A ∩ {T ≤ n} ∈ Fn ∀n},
is a σ-algebra.
Lemma:
Let (Ω, F, P) be a probability space with stopping time T on filtration {Fn}_{n=1}^∞.
If {Xn}_{n=1}^∞ is adapted to {Fn}_{n=1}^∞ then
XT(ω) := X_{T(ω)}(ω) if T(ω) < ∞, and 0 otherwise,
is FT-measurable.
Proof:
Let Xn : (Ω, F) → (Ω′, F′) and let A ∈ F′:
{XT ∈ A} = ({0 ∈ A} ∩ {T = ∞}) ∪ ∪_{n=1}^∞ ({Xn ∈ A} ∩ {T = n}) ∈ F
hence XT is a random variable.
Moreover
{XT ∈ A} ∩ {T = n} = {Xn ∈ A} ∩ {T = n} ∈ Fn
since {Xn} is adapted and T is a stopping time,
hence {XT ∈ A} ∈ FT.
Lemma:
Let {Xn}_{n=1}^∞ be a martingale on {Fn}_{n=1}^∞ and T a stopping time s.t. T ≤ m a.s.
Then E[Xm|FT] = XT P-a.s.
Proof:
XT is FT-measurable.
For k ∈ N|[1,m] and A ∈ FT:
E[XT1A1_{T=k}] = E[Xk1_{A∩{T=k}}]
= E[E[Xm|Fk]1_{A∩{T=k}}]
= E[E[Xm1_{A∩{T=k}}|Fk]]   since A ∩ {T = k} ∈ Fk
= E[Xm1_{A∩{T=k}}]
hence
E[XT1A] = ∑_{k=1}^m E[XT1A1_{T=k}]
= ∑_{k=1}^m E[Xm1A1_{T=k}]
= E[Xm1A]
hence XT = E[Xm|FT].
Lemma:
Let S ≤ T be stopping times on {Fn}_{n=1}^∞. Then
FS ⊆ FT
Proposition: Optional Stopping
Let S ≤ T be two bounded stopping times on {Fn}_{n=1}^∞
and {Xn}_{n=1}^∞ a martingale on the same filtration.
Then E[XT|FS] = XS P-a.s.
Proof:
T ≤ m for some m, hence
E[XT|FS] = E[E[Xm|FT]|FS]   by the previous lemma
= E[Xm|FS]   by the tower property, since FS ⊆ FT
= XS
Proposition:
Let S ≤ T be P-a.s. finite stopping times and {Xn}_{n=1}^∞ a sub-martingale;
then
E[XT|FS] ≥ XS
Proposition:
Let S ≤ T be P-a.s. finite stopping times and {Xn}_{n=1}^∞ a super-martingale;
then
E[XT|FS] ≤ XS
Theorem: First Doob Inequality
Let {Xn}_{n=1}^∞ be a sub-martingale, X̄n := max_{i≤n} Xi and a > 0. Then
aP(X̄n ≥ a) ≤ E[Xn1_{X̄n≥a}]
Proof:
Let T = inf{n ∈ N : Xn ≥ a},
which is a stopping time.
Then
{X̄n ≥ a} = {T ≤ n} = {X_{T∩n} ≥ a}
hence
aP(X̄n ≥ a) = aP(X_{T∩n} ≥ a)
= aE[1_{T≤n}]
= E[a1_{T≤n}]
≤ E[X_{T∩n}1_{T≤n}]   since X_{T∩n} ≥ a on {T ≤ n}
≤ E[E[Xn|F_{T∩n}]1_{T≤n}]   by optional stopping for sub-martingales
= E[E[Xn1_{T≤n}|F_{T∩n}]]   since {T ≤ n} ∈ F_{T∩n}
= E[Xn1_{T≤n}]
= E[Xn1_{X̄n≥a}]
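The maximal inequality can be checked by simulation for the non-negative sub-martingale |Sn| built from a simple random walk. A sketch (hypothetical names, assuming numpy; path count, horizon and level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Non-negative sub-martingale X_n = |S_n| for a simple random walk S_n,
# checked against a * P(max_{i<=n} X_i >= a) <= E[X_n 1{max >= a}].
paths, n, a = 100_000, 50, 8.0
S = rng.choice([-1, 1], size=(paths, n)).cumsum(axis=1)
X = np.abs(S)
running_max = X.max(axis=1)          # max_{i<=n} X_i per path
hit = running_max >= a

lhs = a * hit.mean()                 # a P(max X_i >= a)
rhs = (X[:, -1] * hit).mean()        # E[X_n 1{max >= a}]
assert lhs <= rhs + 1e-3             # small slack for Monte Carlo error
```

The gap between the two sides is E[(Xn − a)1{X̄n ≥ a}], which is visibly positive here since the walk keeps moving after first hitting level a.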
Corollary:
If {Xn}_{n=1}^∞ is a non-negative sub-martingale then ∀a > 0, p ≥ 1
a^p P(X̄n ≥ a) ≤ E[Xn^p]
Theorem: Second Doob Inequality
If {Xn}_{n=1}^∞ is a non-negative sub-martingale and p > 1 then
E[X̄n^p] ≤ (p/(p−1))^p E[Xn^p]
Definition: Crosses
The sequence {Xn}_{n=1}^∞ crosses the interval [a, b] if ∃i < j s.t. Xi < a < b < Xj.
Lemma:
Let {Xn}_{n=1}^∞ cross every rational interval only finitely often.
Then {Xn}_{n=1}^∞ converges in R ∪ {±∞}.
Proof:
Suppose liminf_n Xn < limsup_n Xn;
then ∃a, b ∈ Q s.t. liminf_n Xn < a < b < limsup_n Xn,
and then {Xn}_{n=1}^∞ crosses [a, b] infinitely often, a contradiction.
Definition: Up-Crossing
If {Xn}_{n=1}^∞ is a sequence adapted to {Fn}_{n=1}^∞ and a < b, then
define T0 := 0 and recursively
Sk := inf{n ≥ Tk−1 : Xn ≤ a}
Tk := inf{n ≥ Sk : Xn ≥ b}
Then {Xn}_{n=Sk}^{Tk} is the kth up-crossing of [a, b].
Remark:
We write the number of up-crossings of [a, b] by time n as
U_n^{a,b} := ∑_{k=1}^n 1_{Tk≤n}
Lemma:
Let {Xn}_{n=1}^∞ be a martingale; then
E[U_n^{a,b}] ≤ E[(Xn − a)−]/(b − a)
Proof:
Let
Z := ∑_{k=1}^∞ (X_{Tk∩n} − X_{Sk∩n})
then
E[X_{Tk∩n} − X_{Sk∩n}] = E[X_{Tk∩n}] − E[X_{Sk∩n}]
= E[E[X_{Tk∩n}|F_{Sk∩n}]] − E[X_{Sk∩n}]
= E[X_{Sk∩n}] − E[X_{Sk∩n}]   by optional stopping
= 0
hence E[Z] = 0.
Suppose U_n^{a,b} = m; then
Z ≥ m(b − a) + (Xn − X_{S_{m+1}∩n})
since Z completes m up-crossings of [a, b] and there could be one uncompleted crossing.
Also Xn − X_{S_{m+1}∩n} ≥ Xn − a, hence
Z ≥ ∑_{m=0}^∞ (1_{U_n^{a,b}=m} m(b − a) + 1_{U_n^{a,b}=m} (Xn − a))
so, since Xn − a ≥ −(Xn − a)−,
0 = E[Z] ≥ ∑_{m=0}^∞ P(U_n^{a,b} = m) m(b − a) − E[(Xn − a)−]
= E[U_n^{a,b}](b − a) − E[(Xn − a)−]
hence we have the required result.
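The Sk/Tk recursion is easy to implement literally; the small sketch below (hypothetical function name) counts completed up-crossings of a finite path:

```python
def upcrossings(path, a, b):
    """Completed up-crossings of [a, b] by `path`, following the S_k/T_k
    recursion: wait until the path drops to <= a (time S_k), then until it
    rises to >= b (time T_k), and count each completed pair."""
    count, below = 0, False
    for x in path:
        if not below:
            if x <= a:
                below = True          # S_k reached
        elif x >= b:
            count += 1                # T_k reached: one up-crossing completed
            below = False
    return count

# hand-checked examples
assert upcrossings([0, -1, 2, -1, 2], a=0, b=1) == 2
assert upcrossings([2, 3, 4], a=0, b=1) == 0          # never drops to a
assert upcrossings([-1, 0.5, -1, 2], a=0, b=1) == 1   # one completed crossing
```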
Theorem: Martingale Convergence Theorem
Let {Xn}_{n=1}^∞ be a martingale s.t. sup_n E[(Xn)−] < ∞.
Then X∞ := lim_{n→∞} Xn exists P-a.s.;
moreover X∞ is F∞-measurable and X∞ ∈ L1.
Proof:
By the previous lemma
E[U_n^{a,b}] ≤ E[(Xn − a)−]/(b − a) ≤ (|a| + sup_n E[(Xn)−])/(b − a) < ∞
uniformly in n, so by monotone convergence of expectations we have
E[U_∞^{a,b}] < ∞
hence U_∞^{a,b} < ∞ P-a.s.
Since a, b ∈ Q there are only a countable number of intervals, hence
∃N ∈ F s.t. P(N) = 0
and ∀a < b ∈ Q, ∀ω ∈ Ω \ N
we have that U_∞^{a,b}(ω) < ∞;
moreover lim_{n→∞} Xn(ω) then exists in R ∪ {±∞} ∀ω ∈ Ω \ N.
Y := lim_{n→∞} Xn is F∞-measurable since {Xn}_{n=1}^∞ is adapted to {Fn}_{n=1}^∞.
E[Y−] = E[liminf (Xn)−]
≤ liminf E[(Xn)−]   by Fatou's lemma
≤ sup_n E[(Xn)−] < ∞
E[Y+] = E[liminf (Xn)+]
≤ liminf E[(Xn)+]   by Fatou's lemma
= liminf (E[Xn] + E[(Xn)−])   since x+ = x + x−
= E[X1] + liminf E[(Xn)−]   since E[Xn] = E[X1] for a martingale
< ∞
hence E[|Y|] < ∞, therefore Y = X∞ ∈ L1.
Remark:
• The up-crossing lemma and martingale convergence theorem remain true for super-martingales.
• In general we do not have L1 convergence.
Theorem: Convergence For Uniformly Integrable Martingales
Suppose that {Xn}_{n=1}^∞ is a uniformly integrable super-martingale.
Then Xn converges P-a.s. and in L1(Ω, F, P) to some r.v. X∞ ∈ L1(Ω, F∞, P),
and E[X∞|Fn] ≤ Xn.
Proof:
By uniform integrability
lim_{k→∞} sup_n ∫_{|Xn|>k} |Xn| dP = 0
and for any fixed k
∫ (Xn)− dP ≤ ∫ |Xn| dP
= ∫_{|Xn|>k} |Xn| dP + ∫_{|Xn|≤k} |Xn| dP
≤ sup_n ∫_{|Xn|>k} |Xn| dP + kP(|Xn| ≤ k)
< ∞   uniformly in n
so sup_n E[(Xn)−] < ∞, and by the martingale convergence theorem lim_{n→∞} Xn = X∞ P-a.s.
Since a.s. convergence and uniform integrability together ensure L1 convergence, we have convergence in L1(Ω, F, P),
i.e. lim_{n→∞} ||X∞ − Xn||1 = 0.
Finally, for n ≥ m
||E[X∞|Fm] − E[Xn|Fm]||1 = ||E[X∞ − Xn|Fm]||1
≤ ||X∞ − Xn||1 → 0   as n → ∞
and E[Xn|Fm] ≤ Xm ∀n ≥ m, hence E[X∞|Fm] ≤ Xm.
Corollary:
Suppose that {Xn}_{n=1}^∞ is a uniformly integrable sub-martingale.
Then Xn converges P-a.s. and in L1(Ω, F, P) to some r.v. X∞ ∈ L1(Ω, F∞, P),
and E[X∞|Fn] ≥ Xn.
Corollary:
Suppose that {Xn}_{n=1}^∞ is a uniformly integrable martingale.
Then Xn converges P-a.s. and in L1(Ω, F, P) to some r.v. X∞ ∈ L1(Ω, F∞, P),
and E[X∞|Fn] = Xn.
Theorem: Backwards Martingale Theorem
Let (Ω, F, P) be a probability space and {G−n : n ∈ N} sub-σ-algebras of F s.t.
G−∞ := ∩_{k∈N} G−k ⊆ G−(n+1) ⊆ G−n ⊆ G−1
For X ∈ L1(Ω, F, P) define
X−n := E[X|G−n]
Then lim_{n→∞} X−n exists P-a.s. and in L1.
Theorem: Strong Law Of Large Numbers
Let {Xi}_{i=1}^∞ be i.i.d. r.vs with E[|Xi|] < ∞, μ = E[Xi] and Sn = ∑_{i=1}^n Xi.
Then Sn/n → μ P-a.s.
Proof:
Define G−n := σ({Si}_{i=n}^∞) and G−∞ := ∩_{n=1}^∞ G−n;
then E[X1|G−n] = E[X1|Sn] = Sn/n
since G−n = σ(Sn, {Xi}_{i=n+1}^∞), by independence of {Xi}_{i=1}^∞ and by symmetry (E[Xi|Sn] is the same for every i ≤ n, and these n terms sum to Sn).
Hence by the backwards martingale theorem
L := lim_{n→∞} Sn/n
exists P-a.s. and in L1.
Moreover
L = limsup_n (∑_{i=1}^n Xi)/n = limsup_n (∑_{i=k}^n Xi)/n
is measurable w.r.t. τk := σ({Xi}_{i=k}^∞) for every k,
hence measurable w.r.t. the tail σ-algebra τ := ∩_{k=1}^∞ τk.
Hence by Kolmogorov's zero-one law ∃c s.t. P(L = c) = 1.
Moreover μ = E[Sn/n] ∀n and lim_{n→∞} E[Sn/n] = E[L] = c,
hence indeed μ = c must be the limit.
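The a.s. convergence of Sn/n is easy to watch along a single simulated path. A sketch (hypothetical names, assuming numpy; the Exp(1) distribution with μ = 1 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)

# Running averages S_n/n of i.i.d. Exp(1) variables (mu = 1) along one long
# path: the running mean should settle near mu.
n = 1_000_000
X = rng.exponential(scale=1.0, size=n)
running_mean = X.cumsum() / np.arange(1, n + 1)

assert abs(running_mean[-1] - 1.0) < 0.01

# fluctuations shrink along the path: the worst deviation over the last 10%
# of the path is smaller than over the first 10%
early = np.abs(running_mean[: n // 10] - 1.0).max()
late = np.abs(running_mean[-n // 10:] - 1.0).max()
assert late < early
```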
Theorem: Central Limit Theorem
Let {Xk}_{k=1}^∞ ∈ L2 be i.i.d. random variables;
denote μ := E[Xi], ν := Var[Xi].
Then
Sn := ∑_{k=1}^n (Xk − μ)/√(νn) → N(0, 1)
weakly.
Proof:
WLOG let μ = 0, ν = 1: otherwise apply the transformation Zk := (Xk − μ)/√ν, which has mean 0 and variance 1, and note
∑_{k=1}^n (Xk − μ)/√(νn) = ∑_{k=1}^n Zk/√n
hence indeed it suffices to show the case μ = 0, ν = 1.
We need to show that ∀f ∈ Cb(R)
lim_{n→∞} E[f(Sn)] = ∫ f(x) e^{−x²/2}/√(2π) dx
However any f ∈ Cb(R) can be approximated by some f ∈ Cb(R) which has bounded and uniformly continuous first and second derivatives, so we may assume f has these properties.
Let {Yk}_{k=1}^∞ ∼ i.i.d. N(0, 1) be independent of {Xk}_{k=1}^∞;
then, since
Tn := ∑_{k=1}^n Yk/√n ∼ N(0, 1),
it suffices to show that
lim_{n→∞} E[f(Sn) − f(Tn)] = 0
Denote X_{k,n} := Xk/√n, Y_{k,n} := Yk/√n
and define
W_{k,n} := ∑_{j=1}^{k−1} Y_{j,n} + ∑_{j=k+1}^n X_{j,n}
Then
f(Sn) − f(Tn) = ∑_{k=1}^n (f(W_{k,n} + X_{k,n}) − f(W_{k,n} + Y_{k,n}))
by a telescoping sum; hence by Taylor's theorem we have
f(W_{k,n} + X_{k,n}) = f(W_{k,n}) + f′(W_{k,n})X_{k,n} + (1/2)f″(W_{k,n})X_{k,n}² + R_{X_{k,n}}
f(W_{k,n} + Y_{k,n}) = f(W_{k,n}) + f′(W_{k,n})Y_{k,n} + (1/2)f″(W_{k,n})Y_{k,n}² + R_{Y_{k,n}}
f(W_{k,n} + X_{k,n}) − f(W_{k,n} + Y_{k,n}) = f′(W_{k,n})(X_{k,n} − Y_{k,n}) + (1/2)f″(W_{k,n})(X_{k,n}² − Y_{k,n}²) + R_{X_{k,n}} − R_{Y_{k,n}}
so
E[f(W_{k,n} + X_{k,n}) − f(W_{k,n} + Y_{k,n})] ≤ |E[f′(W_{k,n})(X_{k,n} − Y_{k,n})]| + |E[(1/2)f″(W_{k,n})(X_{k,n}² − Y_{k,n}²)]| + E[|R_{X_{k,n}}| + |R_{Y_{k,n}}|]
Also from Taylor's theorem we have
|R_{X_{k,n}}| ≤ X_{k,n}² ||f″||∞
and by uniform continuity of f″ we have that ∀ε > 0 ∃δ > 0 s.t.
|R_{X_{k,n}}| ≤ X_{k,n}² ε   whenever |X_{k,n}| < δ
thus
|R_{X_{k,n}}| ≤ X_{k,n}² ε 1_{|X_{k,n}|<δ} + X_{k,n}² ||f″||∞ 1_{|X_{k,n}|≥δ}
Moreover
E[∑_{k=1}^n |R_{X_{k,n}}| + |R_{Y_{k,n}}|]
≤ ∑_{k=1}^n E[X_{k,n}²(ε1_{|X_{k,n}|<δ} + ||f″||∞1_{|X_{k,n}|≥δ}) + Y_{k,n}²(ε1_{|Y_{k,n}|<δ} + ||f″||∞1_{|Y_{k,n}|≥δ})]
= (1/n) ∑_{k=1}^n E[X1²(ε1_{|X1|<δ√n} + ||f″||∞1_{|X1|≥δ√n}) + Y1²(ε1_{|Y1|<δ√n} + ||f″||∞1_{|Y1|≥δ√n})]
≤ 2εE[X1²] + E[X1²||f″||∞1_{|X1|≥δ√n}] + E[Y1²||f″||∞1_{|Y1|≥δ√n}]
→ 2εE[X1²]   as n → ∞, by the dominated convergence theorem
and since ε was arbitrary, the remainder terms vanish.
It remains to show that the first two terms are null:
E[f′(W_{k,n})X_{k,n}] = E[f′(W_{k,n})]E[X_{k,n}] = 0   by independence
E[f′(W_{k,n})Y_{k,n}] = E[f′(W_{k,n})]E[Y_{k,n}] = 0   by independence
E[f″(W_{k,n})(X_{k,n}² − Y_{k,n}²)] = E[f″(W_{k,n})]E[X_{k,n}² − Y_{k,n}²]   by independence
= E[f″(W_{k,n})](1/n − 1/n)
= 0
hence indeed the theorem holds.
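The statement can be checked by simulation: normalized sums of i.i.d. uniforms should match the standard normal CDF. A sketch (hypothetical names, assuming numpy; sample sizes are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)

# Normalized sums S_n = sum_k (X_k - mu)/sqrt(nu*n) for i.i.d. Uniform(0,1)
# draws (mu = 1/2, nu = 1/12), compared with the standard normal CDF at a
# few points; weak convergence gives convergence of P(S_n <= t) here since
# the normal CDF is continuous.
mu, nu = 0.5, 1.0 / 12.0
paths, n = 100_000, 100
X = rng.uniform(size=(paths, n))
S = (X - mu).sum(axis=1) / np.sqrt(nu * n)

Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))   # standard normal CDF

for t in (-1.0, 0.0, 1.5):
    assert abs((S <= t).mean() - Phi(t)) < 0.01
```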