Lecture 20 (Oct. 24) Comments on ergodic theorem:

1. Main idea of proof: break an orbit into pieces, each of which has time average at least (essentially) the biggest it can be.
2. We recover the strong law of large numbers (since iid(p) is ergodic):
lim_{n→∞} (1/n) ∑_{k=0}^{n−1} X_k = E_p X a.e.
——- apply the ergodic theorem to f = X
3. We recover the Weyl equidistribution theorem, since irrational rotations of the circle are ergodic:
Theorem (Weyl, 1909): If α ∈ [0, 1] is irrational, then {α, 2α, 3α, . . .} is uniformly distributed mod 1 in [0, 1]: for any subinterval (a, b) ⊂ [0, 1],
|{i ∈ [0, n − 1] : iα mod 1 ∈ (a, b)}| / n → b − a
Proof: apply the ergodic theorem to T_α, f = χ_{(a,b)}.
Note: we have rescaled by 2π.
For a.e. x ∈ [0, 1], get
|{i ∈ [0, n − 1] : (x + iα) mod 1 ∈ (a, b)}| / n → b − a
For a.e. x ∈ [0, 1], get this statement for all (a, b) with rational a, b.
For a.e. x ∈ [0, 1], get this statement for all (a, b) (why? fill in the details).
Need only one x for which this holds for all (a, b).
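The equidistribution statement lends itself to a quick numerical check; a minimal sketch, with α = √2 − 1 as an arbitrary irrational and (a, b) = (0.25, 0.6) an arbitrary subinterval:

```python
import math

# Fraction of i in [0, n-1] with (i*alpha mod 1) in (a, b); by Weyl's
# theorem this should approach b - a for irrational alpha.
alpha = math.sqrt(2) - 1     # an arbitrary irrational in [0, 1]
a, b = 0.25, 0.6
n = 100_000

freq = sum(1 for i in range(n) if a < (i * alpha) % 1 < b) / n
print(freq, b - a)           # the two numbers should be close
```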
Theorem: Let T be an MPT. TFAE:
1. T is ergodic.
6. For all f ∈ L1,
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} f ◦ T^i(x) = ∫ f dµ a.e.
7. For all A, B ∈ A,
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} µ(B ∩ T^{−i}(A)) = µ(A)µ(B).
8. Let B be a semi-algebra which generates A. For all A, B ∈ B,
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} µ(B ∩ T^{−i}(A)) = µ(A)µ(B).
So, it suffices to just check cylinder sets to establish ergodicity for stationary processes.
Proof:
1 ⇒ 6: apply the Ergodic Theorem.
6 ⇒ 7: apply 6 with f = χ_A. Then
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} (χ_A ◦ T^i)(x) χ_B = µ(A)χ_B a.e.
Apply the bounded convergence theorem.
7 ⇒ 8: trivial.
8 ⇒ 1: Let A satisfy T^{−1}(A) = A. Let ε > 0 and let A0 be in the algebra generated by B s.t.
µ(A∆A0) < ε.    (1)
Then
|µ(A)² − µ(A)| ≤ |µ(A)² − µ(A0)²| + |µ(A0)² − (1/n) ∑_{i=0}^{n−1} µ(T^{−i}(A0) ∩ A0)|
+ |(1/n) ∑_{i=0}^{n−1} [µ(T^{−i}(A0) ∩ A0) − µ(T^{−i}(A) ∩ A0)]|
+ |(1/n) ∑_{i=0}^{n−1} [µ(T^{−i}(A) ∩ A0) − µ(T^{−i}(A) ∩ A)]|
≤ 2|µ(A) − µ(A0)| + ε + ε + ε < 5ε
where, in the last expression, the first occurrence of ε comes from condition 8, if n is sufficiently large, and the last two occurrences of ε come from (1), the fact that T is an MPT, and the triangle inequality.
Thus, µ(A)² − µ(A) = 0. QED
Note: If T is an IMPT, can swap T with T^{−1} everywhere.
Characterize ergodicity of stationary finite-state first-order Markov chains:
Assume the MC is defined by a stochastic matrix P and a stochastic vector p with
pP = p
Let A = A_{a_0,...,a_ℓ}. Then
µ(A) = p_{a_0} P_{a_0 a_1} P_{a_1 a_2} · · · P_{a_{ℓ−1} a_ℓ}.
WMA: p > 0 (delete states with zero stationary probability; this does not affect the MPT)
Main Fact: Let A_i = {x : x_0 = i}. Then
µ(σ^{−k}(A_j) ∩ A_i) = p_i (P^k)_{ij}
Defn: P is irreducible if for all i, j, there exists n = n(i, j) s.t. (P^n)_{ij} > 0.
Defn: directed graph G = G(P) of P:
V = {1, . . . , m}
Directed edge from i to j iff P_ij > 0.
Note: (P^n)_{ij} > 0 iff there exists a path in G of length n from i to j.
So, P is irreducible iff G is strongly connected.
Example 1:
P =
[  0    1    0  ]
[ 1/3   0   2/3 ]
[  0    1    0  ]
is irreducible.
Example 2:
P =
[  1    0    0  ]
[  0   1/3  2/3 ]
[  0   1/4  3/4 ]
is reducible.
Example 3:
P =
[ 1/3  1/3  1/3 ]
[  0   1/3  2/3 ]
[  0   1/4  3/4 ]
is reducible.
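The graph criterion is easy to check by machine; a sketch (the reachability test via powers of I + B is our own device, not from the notes), applied to Examples 1 and 2:

```python
import numpy as np

# P is irreducible iff G(P) is strongly connected. With B the 0/1 adjacency
# matrix of G(P), the (i, j) entry of (I + B)^(m-1) is positive iff j is
# reachable from i in at most m-1 steps, so strong connectivity is equivalent
# to (I + B)^(m-1) > 0 entrywise.
def is_irreducible(P):
    m = len(P)
    B = (np.asarray(P) > 0).astype(int)
    R = np.linalg.matrix_power(np.eye(m, dtype=int) + B, m - 1)
    return bool((R > 0).all())

P1 = [[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]]      # Example 1
P2 = [[1, 0, 0], [0, 1/3, 2/3], [0, 1/4, 3/4]]  # Example 2
print(is_irreducible(P1), is_irreducible(P2))   # True False
```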
Theorem: σ is ergodic (w.r.t. (p, P)) iff P is irreducible.
Lecture 21 (Oct. 26):
Lemma: (assume p > 0)
Q = lim_{n→∞} (1/n) ∑_{k=0}^{n−1} P^k
exists (entry by entry).
Note: easy to prove by linear algebra if you know that no eigenvalue has modulus > 1 and the Jordan form is trivial for eigenvalues of modulus 1 (this would follow from P-F).
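A numerical sanity check of the Lemma, using Example 1 above (irreducible but periodic, so P^k itself oscillates while the Cesàro averages settle down):

```python
import numpy as np

# Approximate Q = lim (1/n) sum_{k=0}^{n-1} P^k at two values of n and
# check that the averages have (approximately) stopped moving.
P = np.array([[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]])

def cesaro(P, n):
    S, Pk = np.zeros_like(P), np.eye(len(P))
    for _ in range(n):
        S += Pk
        Pk = Pk @ P
    return S / n

Q1, Q2 = cesaro(P, 2000), cesaro(P, 4000)
print(np.round(Q1, 3))
assert np.allclose(Q1, Q2, atol=2e-3)   # the limit exists numerically
```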
Proof:
Q_ij = (1/p_i) lim_{n→∞} (1/n) p_i ∑_{k=0}^{n−1} (P^k)_{ij}
= (1/p_i) lim_{n→∞} (1/n) ∑_{k=0}^{n−1} µ(σ^{−k}(A_j) ∩ A_i)
= (1/p_i) lim_{n→∞} (1/n) ∑_{k=0}^{n−1} ∫ (χ_{A_j} ◦ σ^k)(χ_{A_i}) dµ
= (1/p_i) ∫ lim_{n→∞} (1/n) ∑_{k=0}^{n−1} (χ_{A_j} ◦ σ^k)(χ_{A_i}) dµ
= (1/p_i) ∫ (χ*_{A_j}) χ_{A_i} dµ
(by the ergodic theorem and the bounded convergence theorem)
Corollary:
– QP = Q = PQ
– Q² = Q
– pQ = p
– Q is stochastic
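The Corollary can be verified numerically on a sample chain (the matrix below is an arbitrary irreducible aperiodic example; Q is only approximated at finite n, so the checks use a tolerance):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.4, 0.2]])
n = 5000
S, Pk = np.zeros((3, 3)), np.eye(3)
for _ in range(n):
    S += Pk
    Pk = Pk @ P
Q = S / n                        # approximation to lim (1/n) sum P^k

p = Q[0]                         # for this irreducible chain, each row of Q
tol = 1e-2                       # is (approximately) the stationary vector
assert np.allclose(Q @ P, Q, atol=tol) and np.allclose(P @ Q, Q, atol=tol)
assert np.allclose(Q @ Q, Q, atol=tol)   # Q^2 = Q
assert np.allclose(p @ Q, p, atol=tol)   # pQ = p
assert np.allclose(Q.sum(axis=1), 1.0)   # Q is stochastic
```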
Theorem: TFAE (assume p > 0):
1. σ is ergodic w.r.t. (p, P)
2. Q_ij = p_j
3. P is irreducible.
Proof:
1 ⇒ 2: If σ is ergodic, then
Q_ij = (1/p_i) ∫ (χ*_{A_j}) χ_{A_i} dµ = (1/p_i) µ(A_j) µ(A_i) = p_j
(alternatively, use condition 8 of the previous theorem)
2 ⇒ 1:
If Q_ij = p_j, then
lim_{n→∞} (1/n) ∑_{k=0}^{n−1} µ(σ^{−k}(A_j) ∩ A_i) = p_i Q_ij = p_i p_j
and so condition 8 holds for cylinder sets with only one cylinder coordinate.
Easily extends to general cylinder sets:
Let A = A_{a_0,...,a_{ℓ−1}}, so
µ(A) = p_{a_0} P_{a_0 a_1} P_{a_1 a_2} · · · P_{a_{ℓ−2} a_{ℓ−1}}.
Let B = A_{b_0,...,b_{u−1}}, so
µ(B) = p_{b_0} P_{b_0 b_1} P_{b_1 b_2} · · · P_{b_{u−2} b_{u−1}}.
And for suff. large k,
µ(σ^{−k}(A) ∩ B) = p_{b_0} P_{b_0 b_1} · · · P_{b_{u−2} b_{u−1}} (P^{k−u+1})_{b_{u−1} a_0} P_{a_0 a_1} · · · P_{a_{ℓ−2} a_{ℓ−1}}.
Thus, (1/n) ∑_{k=0}^{n−1} µ(σ^{−k}(A) ∩ B) converges to
p_{b_0} P_{b_0 b_1} · · · P_{b_{u−2} b_{u−1}} p_{a_0} P_{a_0 a_1} · · · P_{a_{ℓ−2} a_{ℓ−1}} = µ(A)µ(B).
2 ⇒ 3: Q > 0, and so by defn. of Q, for all i, j there exists n s.t. (P^n)_{ij} > 0.
3 ⇒ 2: Q = QP = QP² = QP^n for all n.
Fix i, j. Show Q_ij > 0:
Q_ij = ∑_k Q_ik (P^n)_{kj}
Since Q is stochastic, there exists k s.t. Q_ik > 0.
Since P is irreducible, there exists n s.t. (P^n)_{kj} > 0.
Thus, Q_ij > 0.
Since Q = Q², Q_ij = ∑_k Q_ik Q_kj.
Since Q is stochastic, Q_ij is a weighted average, with strictly positive weights, of Q_1j, Q_2j, . . . , Q_mj. But this holds for all i. Thus, Q_ij depends only on j; call the common value q_j.
But pQ = p. Thus, p_j = ∑_k p_k Q_kj = ∑_k p_k q_j = q_j. QED
Defn: An MPT T is mixing if for all A, B ∈ A,
µ(T^{−n}(A) ∩ B) → µ(A)µ(B) as n → ∞.
Note: mixing implies ergodicity, since convergence implies Cesaro
convergence.
Note: to check mixing, it suffices to check mixing on a generating
semi-algebra (for the same reason as condition 8 above is equivalent
to condition 7).
Note (***): Mixing implies: given any finite set of sets of positive measure, A_1, . . . , A_m, ∃n ∀i, j, µ(T^{−n}(A_j) ∩ A_i) > 0.
Note: blob picture of mixing.
Check examples:
1. Rotation of circle:
Intuitively, rigid transformations can never be mixing.
Subdivide into small arcs A1, . . . , Am, then we cannot have (***).
If irrational, then ergodic, but not mixing.
2. iid process (one-sided or two-sided): is mixing because of independence (on cylinder sets).
3 and 4. It follows that the doubling map and the Baker's map are mixing (mixing is an isomorphism invariant).
Picture of Baker.
Lecture 22: Oct. 28
Mixing for Markov chains:
Defn: P is primitive if P^n > 0 (entry-wise) for some n.
Graph interpretation: there is a uniform time n in which you can get from any state to any state.
Examples:
P =
[  0    1    0  ]
[ 1/3   0   2/3 ]
[  0    1    0  ]
P^n is never positive: look at the graph.
P =
[  0   1/2  1/2 ]
[ 1/3   0   2/3 ]
[  0    1    0  ]
P^4 > 0.
Prop: P is primitive iff P is irreducible and aperiodic (i.e., the gcd of cycle lengths of G(P) is 1).
(check examples)
Proof: gcd of cycle lengths = gcd{n : trace(P^n) > 0}.
Only If: P^n > 0 implies P^{n+1} > 0 (think of the graph interpretation). gcd(n, n + 1) = 1.
If: Special case: there exists a self-loop. Connect i to j via the self-loop at k.
In general, use a combination of cycle lengths that are relatively prime.
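Both sides of the Prop can be tested computationally; a sketch (the bound (m − 1)² + 1 on the power needed for a primitive matrix is a classical fact of Wielandt, quoted here without proof):

```python
import math
import numpy as np

def is_primitive(P):
    # P is primitive iff B^n > 0 for some n, where B is the 0/1 adjacency
    # matrix; for primitive P this happens by n = (m-1)^2 + 1 (Wielandt).
    B = (np.asarray(P) > 0).astype(int)
    m = len(B)
    Bn = np.eye(m, dtype=int)
    for _ in range((m - 1) ** 2 + 1):
        Bn = np.minimum(Bn @ B, 1)       # cap entries to avoid overflow
        if (Bn > 0).all():
            return True
    return False

def period(P):
    # gcd of cycle lengths = gcd{n : trace(B^n) > 0}; simple cycles have
    # length <= m, so checking n up to m suffices.
    B = (np.asarray(P) > 0).astype(int)
    m = len(B)
    g, Bn = 0, np.eye(m, dtype=int)
    for n in range(1, m + 1):
        Bn = np.minimum(Bn @ B, 1)
        if np.trace(Bn) > 0:
            g = math.gcd(g, n)
    return g

P1 = [[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]]      # first example: period 2
P2 = [[0, 1/2, 1/2], [1/3, 0, 2/3], [0, 1, 0]]  # second example: primitive
print(is_primitive(P1), period(P1))   # False 2
print(is_primitive(P2), period(P2))   # True 1
```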
TFAE
1. σ is mixing.
2. (P^k)_{ij} → π_j.
3. P is primitive.
So, Example 1 is ergodic but not mixing, and Example 2 is mixing.
Proof:
1 ⇒ 2: Let A_i and A_j be thin cylinder sets. Mixing implies:
π_i (P^n)_{ij} → π_i π_j
Thus, (P^n)_{ij} → π_j
2 ⇒ 1: Just like the ergodic proof: verify for cylinder sets:
For thin cylinder sets A_i, A_j, show that µ(T^{−n}(A_j) ∩ A_i) → µ(A_i)µ(A_j).
Extend to general cylinder sets.
2 ⇒ 3: Follows since p > 0.
3 ⇒ 2: geometric contraction proof (first in two dimensions).
Let W = {(x_1, . . . , x_d) ∈ R^d : x_i ≥ 0, ∑ x_i = 1}.
P : W → W, x ↦ xP
Well-defined since P ≥ 0 and xP · 1̄ = x · (P 1̄) = x · 1̄ = 1, where 1̄ is the all-ones vector.
Note: P^k(W) is nested decreasing.
Claim: ∆ ≡ ∩_{k=0}^∞ P^k(W) is a single point {z}.
—- If true, then z = π because:
———- π = πP^k → z.
—- Thus, for all i, e_i P^k → π and so (P^k)_{ij} → π_j.
Proof of claim for d = 2:
∆ is a closed interval in W. If it is not a single point, its two (distinct) endpoints x, y are linearly independent, and P permutes {x, y} (since P(∆) = ∆).
Thus x and y are fixed by P², and since they span R², P² is the identity, contrary to primitivity.
Proof of claim for general d:
Use the contraction mapping theorem w.r.t. the Hilbert metric on W°:
ρ(v, w) = max_{i,j} log [(w_i/w_j)/(v_i/v_j)]
If P^n > 0, then P^n(W) is a compact subset of W°.
Apply the contraction mapping theorem:
———– P has a unique fixed point in P^n(W) (which must be π) and, for all x ∈ P^n(W), xP^k → π;
apply this to each x = e_i P^n. QED
Can also apply the renewal theorem.
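The Hilbert-metric contraction can be observed numerically; a small sketch (the positive stochastic matrix and the two vectors are arbitrary choices):

```python
import numpy as np

# rho(v, w) = max_{i,j} log[(w_i/w_j)/(v_i/v_j)], computed as the spread
# of log(w_i/v_i); for strictly positive P, x -> xP contracts rho.
def rho(v, w):
    r = np.log(np.asarray(w) / np.asarray(v))
    return r.max() - r.min()

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.1, 0.8]])
v = np.array([0.7, 0.2, 0.1])
w = np.array([0.1, 0.3, 0.6])

d0, d1 = rho(v, w), rho(v @ P, w @ P)
print(d0, d1)
assert d1 < d0   # strict contraction, since P > 0
```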
Let A be a nonnegative square matrix, i.e., entries are all nonnegative.
Define irreducible, primitive and aperiodic for A in the exact same
way as we defined for stochastic matrices.
Skipped for now:
The spectral radius of A, denoted λ(A), is the max of the absolute values of the eigenvalues.
Perron-Frobenius Theorem:
Let A be an irreducible matrix with spectral radius λ(A). Then
1. λ(A) > 0 and is an eigenvalue of A.
2. λ(A) is a simple eigenvalue.
3. λ(A) has a (strictly) positive eigenvector (left is v and right is w).
4. If A is primitive, then
A^n / λ(A)^n → (w_i v_j)
(where v · w = 1).
5. If A is primitive, then λ(A) is the only eigenvalue of modulus λ(A).
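Item 4 can be illustrated numerically (the 2×2 matrix is an arbitrary primitive example; here w denotes the right and v the left Perron eigenvector, normalized so that v · w = 1):

```python
import numpy as np

# For primitive A, A^n / lambda(A)^n converges to the rank-one matrix
# whose (i, j) entry is w_i * v_j.
A = np.array([[1.0, 2.0], [3.0, 2.0]])    # eigenvalues 4 and -1

vals, vecs = np.linalg.eig(A)
lam = vals.real.max()                      # spectral radius lambda(A)
w = vecs[:, vals.real.argmax()].real       # right: A w = lam w

valsT, vecsT = np.linalg.eig(A.T)
v = vecsT[:, valsT.real.argmax()].real     # left: v A = lam v
v = v / (v @ w)                            # normalize so v . w = 1

approx = np.linalg.matrix_power(A, 30) / lam ** 30
assert np.allclose(approx, np.outer(w, v), atol=1e-8)
print(lam)   # ~4.0
```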
Lecture 23: Oct. 31
ENTROPY FOR MPT’S
Let (M, A, µ) be a probability space.
Let α = {A_1, . . . , A_n} be a FINITE MEASURABLE PARTITION of M.
Defn: H(α) = H(X), where X is a r.v. with distribution p = (µ(A_1), . . . , µ(A_n)):
H(α) = −∑_i µ(A_i) log µ(A_i)
Schematic:
Defn: H(α|β) = H(X|Y), where (X, Y) are jointly dist. r.v.'s with distribution P(X = i, Y = j) = µ(A_i ∩ B_j):
H(α|β) = −∑_{i,j} µ(A_i ∩ B_j) log [µ(A_i ∩ B_j)/µ(B_j)].
Schematic:
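Both H(α) and H(α|β) can be computed directly on a finite probability space; a sketch (the six-point space and the two partitions are illustrative, chosen so that α ⊥ β):

```python
import math
from itertools import product

mu = {x: 1/6 for x in range(6)}          # uniform measure on 6 points
alpha = [{0, 1, 2}, {3, 4, 5}]           # partition elements A_i
beta = [{0, 3}, {1, 4}, {2, 5}]          # partition elements B_j

def m(S):                                # measure of a set
    return sum(mu[x] for x in S)

def H(part):
    return -sum(m(A) * math.log(m(A)) for A in part if m(A) > 0)

def H_cond(alpha, beta):                 # H(alpha | beta)
    return -sum(m(A & B) * math.log(m(A & B) / m(B))
                for A, B in product(alpha, beta) if m(A & B) > 0)

print(H(alpha))                          # log 2 for this two-set partition
# alpha and beta are independent here, so H(alpha|beta) = H(alpha):
assert abs(H_cond(alpha, beta) - H(alpha)) < 1e-12
```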
Relations between partitions:
1. We identify two partitions if they agree, element by element, up to a set of measure zero, i.e.,
α = β:
α = {A_1, . . . , A_n}, β = {B_1, . . . , B_n}, µ(A_i∆B_i) = 0.
Note: H(α) is well-defined.
2. β ≤ α:
each element of β is a union of elements of α.
Note: α is finer than β, and β is coarser than α.
3. α ∨ β = {A ∩ B : A ∈ α, B ∈ β}
(but delete sets of measure zero)
Note: this is like joining two r.v.'s with some joint distribution.
4. α ⊥ β: µ(A_i ∩ B_j) = µ(A_i)µ(B_j).
5. If T is an MPT,
T^{−1}(α) = {T^{−1}(A) : A ∈ α}
TRANSLATE ENTROPY PROPERTIES (from r.v.'s to partitions):
X ≈ α
Y ≈ β
Properties II:
1. H(α ∨ β) = H(α) + H(β|α)
2. H(β|α) ≤ H(β) with equality iff α ⊥ β
3. H(β|α) ≥ 0 with equality iff β ≤ α
4. H(α ∨ β) = H(α) iff β ≤ α
5. H(α_1 ∨ · · · ∨ α_n) = ∑_{i=1}^n H(α_i|α_1 ∨ · · · ∨ α_{i−1})
6. H(α_1 ∨ · · · ∨ α_n) ≤ ∑_i H(α_i) with equality iff the α_i are independent.
7. Assume β ≤ α. Then H(β) ≤ H(α), and equality holds iff β = α.
8. Assume γ ≤ α. Then H(β|α) ≤ H(β|γ)
9. H(α ∨ β|γ) = H(α|γ) + H(β|α, γ)
10. H(β|γ, α) ≤ H(β|γ) with equality iff α ⊥_γ β
11. If α ⊥_γ (β, δ), then H(β|δ, γ, α) = H(β|δ, γ).
12. If γ ≤ α, then H(γ|β) ≤ H(α|β)
13. If T is an MPT, then for all i, j, k,
H(T^{−i}α ∨ . . . ∨ T^{−j}α) = H(T^{−i−k}α ∨ . . . ∨ T^{−j−k}α)
Want to define:
H(α|B)
where α is a finite partition and B is a sub-sigma-algebra of A.
Recall the information function of a r.v.: I_X(x) = − log p(x)
I_α(x) = −∑_i χ_{A_i}(x) log µ(A_i)
Recall: H(α) = ∫ I_α dµ.
Defn:
I_{α|B}(x) = −∑_i χ_{A_i}(x) log µ(A_i|B)(x) = −∑_i χ_{A_i}(x) log E(χ_{A_i}|B)(x)
(conditional expectation)
Defn: H(α|B) = ∫ I_{α|B} dµ.
We want this to agree with H(α|β):
Given a finite partition β, let B = B(β) be all finite unions of elts. of β, which is a σ-algebra.
Claim:
H(α|B) = H(α|β)
proof:
µ(A_i|B) = ∑_j χ_{B_j}(x) [µ(A_i ∩ B_j)/µ(B_j)]
(since the RHS is B-measble. and, for all j,
∫_{B_j} RHS dµ = µ(A_i ∩ B_j) = ∫_{B_j} χ_{A_i} dµ)
Thus,
I_{α|B}(x) = −∑_i χ_{A_i}(x) log ∑_j χ_{B_j}(x) [µ(A_i ∩ B_j)/µ(B_j)]
= −∑_{i,j} χ_{A_i∩B_j}(x) log [µ(A_i ∩ B_j)/µ(B_j)].
Thus,
H(α|B) = ∫ −∑_{i,j} χ_{A_i∩B_j}(x) log [µ(A_i ∩ B_j)/µ(B_j)] dµ
= −∑_{i,j} µ(A_i ∩ B_j) log [µ(A_i ∩ B_j)/µ(B_j)] = H(α|β)
NOTE: Every finite sub-sigma-algebra (which is simply a finite algebra) has a unique generating partition.
Correspondence: Finite Partition ←→ Finite σ-algebra
β ↦ σ(β):
σ(β) = unions of elts. of β
B ↦ π(B):
π(B) = {nonempty sets of the form ∩_{B∈B} B*, where each B* = B or B^c}
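On a finite space the map B ↦ π(B) can be carried out mechanically; a sketch (the eight-point space and the two generating sets are illustrative):

```python
from itertools import product

# Atoms of the algebra generated by 'gens': intersect each generator or its
# complement over all choices and keep the nonempty results. These atoms
# form the generating partition pi(B).
M = set(range(8))
gens = [{0, 1, 2, 3}, {2, 3, 4, 5}]

def atoms(M, gens):
    result = []
    for choice in product([True, False], repeat=len(gens)):
        atom = set(M)
        for B, keep in zip(gens, choice):
            atom &= B if keep else (M - B)
        if atom:
            result.append(atom)
    return result

print(atoms(M, gens))   # [{2, 3}, {0, 1}, {4, 5}, {6, 7}]
```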
Lecture 24: Nov. 7
RECALL:
(M, A, µ): probability space
α : finite measble. partition
B: sub-sigma-algebra
Defn: (conditional information function)
I_{α|B}(x) = −∑_i χ_{A_i}(x) log µ(A_i|B) = −∑_i χ_{A_i}(x) log E(χ_{A_i}|B)
Defn: (conditional entropy)
H(α|B) = ∫ I_{α|B} dµ
Recall:
If β is a (finite, msble) partition, then H(α|σ(β)) = H(α|β)
If B is a finite sigma-algebra, H(α|B) = H(α|π(B))
NOTE:
I_{α|B} ≠ E(I_α|B)
Reason:
— I_{α|B} is typically not B-measurable.
— E(I_α|B) is B-measurable.
Also: this alternative would be a poor choice:
∫ E(I_α|B) dµ = ∫ I_α dµ = H(α), not H(α|B).
Trivial examples: H(α|A) = 0.
H(α|{M, ∅}) = H(α).
Theorem: (continuity of conditional entropy)
Let B be a sub-sigma-algebra of A.
Let B_1 ⊂ B_2 ⊂ B_3 ⊂ . . . be σ-algebras and B = σ(∪_{n=1}^∞ B_n).
Let α be a finite measble. partition.
Then
I_{α|B_n} → I_{α|B} a.e. and in L1
and thus,
H(α|B_n) → H(α|B)
Proof:
(µ(A|B_n), B_n) = (E(χ_A|B_n), B_n) is a martingale, since B_n ↑.
By a version of the Martingale Convergence Theorem,
µ(A|B_n) → µ(A|B) a.e.
Thus,
I_{α|B_n} = −∑_i χ_{A_i} log µ(A_i|B_n) → −∑_i χ_{A_i} log µ(A_i|B) = I_{α|B} a.e.
For L1 convergence, it suffices to show that f* ≡ sup_n I_{α|B_n} ∈ L1; for then, apply the DCT.
Note: The L1 convergence can be established by another version of the Martingale Convergence Theorem, but we will need to know that f* ∈ L1 for the proof of the SMB Theorem.
For this, show µ(f* > t) dies exponentially fast as t → ∞:
Fix A = A_i ∈ α and t > 0.
Let f_{n,A}(x) = − log µ(A|B_n)(x).
Let
C_{n,A}(t) = {x : f_{i,A}(x) ≤ t, i = 1, . . . , n − 1, and f_{n,A}(x) > t}
Since C_{n,A}(t) ∈ B_n,
µ(A ∩ C_{n,A}(t)) = ∫_{C_{n,A}(t)} χ_A dµ = ∫_{C_{n,A}(t)} µ(A|B_n) dµ
= ∫_{C_{n,A}(t)} e^{−f_{n,A}(x)} dµ ≤ e^{−t} µ(C_{n,A}(t))
Thus,
µ(A ∩ {f* > t}) = ∑_{n=1}^∞ µ(A ∩ C_{n,A}(t)) ≤ e^{−t} ∑_{n=1}^∞ µ(C_{n,A}(t)) ≤ e^{−t}.
Thus,
µ({f* > t}) ≤ M e^{−t}
where M is the number of sets in α. Thus,
∫ f* dµ = ∫_0^∞ µ(f* > t) dt ≤ M ∫_0^∞ e^{−t} dt < ∞.
QED
Now, we can formulate another list of entropy inequalities involving conditioning on σ-algebras.
Defn: A measure space has a countable basis if there is a countable collection of measurable sets C s.t. for all A ∈ A and ε > 0, there exists C ∈ C s.t. µ(A∆C) < ε.
Fact: (M, A, µ) has a countable basis iff L2(M, A, µ) is separable,
i.e. has a countable dense set.
Fact: The Borel sigma-algebra for any metric space has this property. And so does its completion.
Fact: Given B ⊆ A, the map
L2(M, A, µ) → L2(M, B, µ), f ↦ E(f|B)
is continuous (it is a continuous projection).
Corollary: Let B ⊆ A. If (M, A, µ) has a countable basis, so does (M, B, µ).
Let (M, A, µ) have a countable basis. Let B be a sub-sigma-algebra.
Let {C_1, C_2, . . .} be a countable basis for B.
Let B_n be the sigma-algebra (equiv., the algebra) generated by {C_1, C_2, . . . , C_n}.
Then B_n ↑ B.
Let β_n denote the partition determined by B_n.
By the continuity theorem, H(α|β_n) = H(α|B_n) → H(α|B).
Note: β_n ≤ β_{n+1}. So, H(α|B_n) ↓ H(α|B)
Apply this to derive the new list of properties from the old list.
Recall:
Property 12: If γ ≤ α, then H(γ|β) ≤ H(α|β)
Prop: If γ ≤ α, then H(γ|B) ≤ H(α|B).
Proof: H(γ|β_n) ≤ H(α|β_n). QED
Recall:
Property 8: Assume γ ≤ δ. Then H(α|δ) ≤ H(α|γ)
Prop: Assume C ⊆ D. Then H(α|D) ≤ H(α|C).
Proof: Let γ_n, δ_n be s.t. σ(γ_n) ↑ C and σ(δ_n) ↑ D.
Let η_n = γ_n ∨ δ_n.
Then γ_n ≤ η_n,
σ(γ_n) ↑ C and σ(η_n) ↑ D.
Thus,
H(α|D) = lim_n H(α|η_n) ≤ lim_n H(α|γ_n) = H(α|C).
QED
Lecture 25: Nov. 9
Recall: We are assuming (M, A, µ) has a countable basis and thus so does (M, B, µ), where B is a sub-sigma-algebra.
Thus, there is a sequence of partitions β_1 ≤ β_2 ≤ . . ., with corresponding (finite) sigma-algebras B_n = σ(β_n) ↑ B.
α: finite partition;
H(α|B_n) = H(α|β_n)
Then, by continuity of information/entropy:
I_{α|B_n} → I_{α|B} a.e. and in L1
and thus,
H(α|B_n) → H(α|B)
——————————————–
Recall:
Property 2: H(α|β) ≤ H(α) with equality iff α ⊥ β.
Property 2': H(α|B) ≤ H(α) with equality iff α ⊥ B iff σ(α) ⊥ B
Proof:
H(α|B) ≤ H(α):
Proof: B_n ↑ B.
So, H(α|B) = lim_n H(α|B_n) = lim_n H(α|β_n) ≤ H(α).
H(α|B) = H(α) iff B ⊥ α:
Proof:
If: B ⊥ α implies that for all n, we have β_n ⊥ α.
Thus, H(α|β_n) = H(α).
Thus, H(α|B) = lim_n H(α|β_n) = H(α).
Only If:
H(α|β_n) ↓ H(α|B).
So, H(α|B) ≤ H(α|β_n) ≤ H(α).
So, if H(α|B) = H(α), then for all n, H(α|β_n) = H(α).
So, for all n, α ⊥ β_n. Thus, α ⊥ B. QED
Recall:
Property 3: H(α|β) ≥ 0 with equality iff α ≤ β
Property 3': H(α|B) ≥ 0 with equality iff α ⊆ B iff σ(α) ⊆ B
Proof:
Recall:
H(α|B) = ∫ ∑_i −χ_{A_i}(x) log µ(A_i|B)(x) dµ
If:
If each A_i ∈ B, then E(χ_{A_i}|B) = χ_{A_i}. Thus,
H(α|B) = ∫ ∑_i −χ_{A_i}(x) log χ_{A_i}(x) dµ = 0
since χ_{A_i} takes on only the values 0 and 1.
Only if:
If H(α|B) = 0, then, since each term in the formula for H is nonnegative, they must all be 0, and so each µ(A_i|B)(x) is 0/1-valued.
Since µ(A_i|B)(x) is 0/1-valued, µ(A_i|B) = χ_C for some C ∈ B.
Claim: A_i = C ∈ B.
Proof of this:
By properties of conditional expectation,
µ(A_i ∩ C) = ∫_C χ_{A_i} dµ = ∫_C µ(A_i|B) dµ = ∫_C χ_C dµ = µ(C)
and
µ(A_i ∩ C^c) = ∫_{C^c} χ_{A_i} dµ = ∫_{C^c} µ(A_i|B) dµ = ∫_{C^c} χ_C dµ = 0
So, up to a set of measure zero, A_i ⊆ C.
But since µ(A_i ∩ C) = µ(C), we have A_i = C mod 0.
QED
Properties III:
1. H(α ∨ β) = H(α) + H(β|α)
2. H(β|F) ≤ H(β) with equality iff F ⊥ β
3. H(β|F) ≥ 0 with equality iff β ⊆ F
4. H(α ∨ β) = H(α) iff β ≤ α
5. H(α_1 ∨ · · · ∨ α_n) = ∑_{i=1}^n H(α_i|α_1 ∨ · · · ∨ α_{i−1})
6. H(α_1 ∨ · · · ∨ α_n) ≤ ∑_i H(α_i) with equality iff the α_i are independent.
7. Assume β ≤ α. Then H(β) ≤ H(α), and equality holds iff β = α.
8. Assume E ⊆ F. Then H(β|F) ≤ H(β|E)
9. H(α ∨ β|F) = H(α|F) + H(β|σ(α ∪ F)) ≤ H(α|F) + H(β|F)
10. H(β|σ(F ∪ E)) ≤ H(β|F) with equality iff E ⊥_F β
11. If G ⊥_F σ(β ∪ E), then H(β|σ(E ∪ F ∪ G)) = H(β|σ(E ∪ F)).
12. If γ ≤ α, then H(γ|B) ≤ H(α|B)
13. If T is an MPT, then for all i, j, k,
H(T^{−i}α ∨ . . . ∨ T^{−j}α) = H(T^{−i−k}α ∨ . . . ∨ T^{−j−k}α)
14. If T is an MPT, then for all i, j, k,
H(T^{−i}α ∨ . . . ∨ T^{−j}α | F) = H(T^{−i−k}α ∨ . . . ∨ T^{−j−k}α | T^{−k}(F))
NOTE: we get similar properties for I_α, I_{α|B}.
NOTE:
H(α|B) = ∫ ∑_i −χ_{A_i}(x) log µ(A_i|B)(x) dµ
= ∫ ∑_i −µ(A_i|B)(x) log µ(A_i|B)(x) dµ
(since whenever g is B-measble., we have: ∫ f g dµ = ∫ E(f|B) g dµ)
Recall:
For a stationary process X,
h(X) = lim_{n→∞} H(X_1, . . . , X_n)/n = lim_{n→∞} H(X_n|X_{n−1} . . . X_1)
Given an MPT T and a partition α = {A_1, . . . , A_M}, we correspond the stationary process taking values in {1, . . . , M}, where
P(X_0 = x_0, . . . , X_n = x_n) = µ(A_{x_0} ∩ T^{−1}(A_{x_1}) ∩ · · · ∩ T^{−n}(A_{x_n}))
Define:
h(T, α) = h(X).
equivalently,
h(T, α) = lim_{n→∞} H(α_0 ∨ · · · ∨ α_{n−1})/n = lim_{n→∞} H(α_n|α_0 ∨ · · · ∨ α_{n−1})
= lim_{n→∞} H(α_0|α_1 ∨ · · · ∨ α_n)
where α_i = T^{−i}(α).
Both limits are non-increasing.
Let α_m^n = ∨_{i=m}^n α_i.
Let α_m^∞ = σ(∪_{i=m}^∞ α_i).
Using continuity of conditional entropy, we see:
h(T, α) = ∫ I_{α|α_1^∞} dµ = H(α|α_1^∞)
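For a stationary Markov chain (p, P) the first limit can be watched converging by enumerating words; the comparison value −∑_i p_i ∑_j P_ij log P_ij for the Markov entropy rate is a standard fact quoted here without proof (the 2-state chain is an arbitrary example):

```python
import math
from itertools import product
import numpy as np

P = np.array([[0.9, 0.1], [0.4, 0.6]])
p = np.array([0.8, 0.2])                 # stationary: p P = p

def block_entropy(n):
    # H of the joint distribution of (X_0, ..., X_{n-1})
    H = 0.0
    for word in product(range(2), repeat=n):
        prob = p[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a, b]
        if prob > 0:
            H -= prob * math.log(prob)
    return H

h = -sum(p[i] * P[i, j] * math.log(P[i, j]) for i in range(2) for j in range(2))
rates = [block_entropy(n) / n for n in (2, 6, 10)]
print(rates, h)   # non-increasing, approaching h from above
assert rates[0] >= rates[1] >= rates[2] >= h - 1e-12
```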
Define h(T ) = supα h(T, α)
We will see how to compute h(T ), at least in some cases.
For right now, focus on h(T, α).
SMB Theorem: Let T be an ergodic MPT and α a (finite, measble.) partition. Then
(1/n) I_{α_0^n} → h(T, α) = H(α|α_1^∞) a.e. and in L1
Proof will use Ergodic Theorem and Theorem on Convergence of
conditional information.
NOTE: Let α(x) be the index of the element of α to which x belongs.
Let µ_n(x) = µ(∩_{i=0}^n T^{−i}(A_{α(T^i(x))})).
I_{α_0^n}(x) = − log µ_n(x)
and the SMB theorem says that
µ_n(x) ≈ e^{−n h(T,α)}
In stochastic language,
P(X_0 = x_0, . . . , X_n = x_n) ≈ 2^{−n h(X)}
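An empirical illustration for an iid(q, 1 − q) process, which is ergodic (q = 0.7 is an arbitrary choice):

```python
import math
import random

# mu_n(x) = P(x_0, ..., x_{n-1}) for an observed iid sample x, and SMB says
# -(1/n) log mu_n(x) -> h = -(q log q + (1-q) log(1-q)) a.e.
random.seed(0)
q, n = 0.7, 10_000
x = [0 if random.random() < q else 1 for _ in range(n)]

log_mu_n = sum(math.log(q) if s == 0 else math.log(1 - q) for s in x)
h = -(q * math.log(q) + (1 - q) * math.log(1 - q))
print(-log_mu_n / n, h)   # the two numbers should be close
```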
NOTE: There is a version of SMB for non-ergodic MPT’s.