Lecture 18:
Recall:
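Let $\alpha = \{A_1, \ldots, A_k\}$ be a finite measurable partition
and $\mathcal{B}$ a sub-σ-algebra.
Defn:
$$I_{\alpha|\mathcal{B}}(x) := -\sum_i \chi_{A_i}(x) \log \mu(A_i|\mathcal{B}) = -\sum_i \chi_{A_i}(x) \log E(\chi_{A_i}|\mathcal{B})$$
Defn:
$$H(\alpha|\mathcal{B}) := \int I_{\alpha|\mathcal{B}} \, d\mu.$$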
Prop: Let $\alpha, \beta$ be finite measurable partitions. Let $\mathcal{B}$ be the σ-algebra generated by $\beta$, i.e., the collection of all finite unions of elements of $\beta$. Then
$$H(\alpha|\mathcal{B}) = H(\alpha|\beta)$$
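On a finite probability space the conditional information function and the conditional entropy given a partition can be computed directly from the definitions. The following is a minimal sketch, assuming a finite space in which every point has positive mass; the function names and the 4-point example are illustrative assumptions, not part of the lecture.

```python
import math
from collections import defaultdict

def conditional_information(mu, alpha, beta):
    """Return the list of values I_{alpha|beta}(x) = -log mu(A(x) | B(x)).

    mu[x] is the mass of point x (assumed > 0); alpha[x] and beta[x] are
    labels of the partition elements A(x), B(x) containing x.
    """
    mass_b = defaultdict(float)    # mu(B) for each element B of beta
    mass_ab = defaultdict(float)   # mu(A ∩ B) for each pair (A, B)
    for x, m in enumerate(mu):
        mass_b[beta[x]] += m
        mass_ab[(alpha[x], beta[x])] += m
    return [-math.log(mass_ab[(alpha[x], beta[x])] / mass_b[beta[x]])
            for x in range(len(mu))]

def conditional_entropy(mu, alpha, beta):
    """H(alpha | beta): the integral of the conditional information function."""
    info = conditional_information(mu, alpha, beta)
    return sum(m * i for m, i in zip(mu, info))

# Hypothetical example: uniform measure on 4 points.
mu = [0.25, 0.25, 0.25, 0.25]
alpha = [0, 0, 1, 1]
beta = [0, 1, 0, 1]   # independent of alpha under the uniform measure
print(conditional_entropy(mu, alpha, beta))
print(conditional_entropy(mu, alpha, alpha))
```

In this example the first value is H(α) because α and β are independent, and the second vanishes because α is measurable with respect to itself.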
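Continuity of conditional entropy:
Let $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{B}_3 \subset \ldots$ be σ-algebras
and $\mathcal{B} = \sigma(\bigcup_{n=1}^{\infty} \mathcal{B}_n)$.
Let $\alpha$ be a finite measurable partition.
Then
$I_{\alpha|\mathcal{B}_n} \to I_{\alpha|\mathcal{B}}$ a.e. and in $L^1$
and
$H(\alpha|\mathcal{B}_n) \to H(\alpha|\mathcal{B})$.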
Proof:
By the Martingale Convergence Theorem,
$\mu(A_i|\mathcal{B}_n) \to \mu(A_i|\mathcal{B})$ a.e. and in $L^1$.
Thus,
$$I_{\alpha|\mathcal{B}_n} = -\sum_i \chi_{A_i} \log \mu(A_i|\mathcal{B}_n) \to -\sum_i \chi_{A_i} \log \mu(A_i|\mathcal{B}) =: I_{\alpha|\mathcal{B}} \quad \text{a.e.}$$
One can show $\sup_n I_{\alpha|\mathcal{B}_n} \in L^1$ (this may be done in the student talk on the Shannon-McMillan-Breiman Theorem). Apply the DCT to get $L^1$ convergence of the conditional information function and convergence of conditional entropy.
Alternative proof, only of convergence of conditional entropy:
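Fact:
$$H(\alpha|\mathcal{B}) = \int -\sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \, d\mu$$
because whenever $g$ is $\mathcal{B}$-measurable, we have:
$$\int f g \, d\mu = \int E(f|\mathcal{B}) \, g \, d\mu$$
The integrand above is bounded by $\log k$ because $\{\mu(A_i|\mathcal{B})(x)\}_{i=1}^{k}$ is a probability vector.
And
$$-\sum_i \mu(A_i|\mathcal{B}_n) \log \mu(A_i|\mathcal{B}_n) \to -\sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \quad \text{a.e.}$$
Apply the bounded convergence theorem.
----------
We can formulate another list of entropy inequalities involving conditioning on sub-σ-algebras of $\mathcal{A}$, instead of conditioning on finite measurable partitions.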
Defn: A measure space $(M, \mathcal{A}, \mu)$ has a countable basis if there is a countable collection of measurable sets $\mathcal{C}$ s.t. for all $A \in \mathcal{A}$ and $\epsilon > 0$, there exists $C \in \mathcal{C}$ s.t. $\mu(A \Delta C) < \epsilon$.
Fact: The Borel sigma-algebra for any separable metric space has
a countable basis. And so does its completion.
Proof: The σ-algebra A is generated by finite unions of open balls
of radius 1/n centered at the points of a countable dense set.
Fact: $(M, \mathcal{A}, \mu)$ has a countable basis iff $L^2(M, \mathcal{A}, \mu)$ is separable.
Proof:
only if: given a countable basis $\mathcal{C}$, the collection of functions $\sum_{i=1}^{n} a_i \chi_{A_i}$ with $a_i \in \mathbb{Q}$ and $A_i \in \mathcal{C}$ is $L^2$-dense.
if: given a countable $L^2$-dense subset $D$, the collection of sets $f^{-1}(a, b)$ with $a, b \in \mathbb{Q}$ and $f \in D$ is a countable basis.
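Fact: Given $\mathcal{B} \subseteq \mathcal{A}$, the map
$$L^2(M, \mathcal{A}, \mu) \to L^2(M, \mathcal{B}, \mu), \quad f \mapsto E(f|\mathcal{B})$$
is continuous.
Proof:
$$\int |E(f|\mathcal{B}) - E(g|\mathcal{B})|^2 \, d\mu = \int E(f - g|\mathcal{B})^2 \, d\mu \le \int E(|f - g|^2 \,|\, \mathcal{B}) \, d\mu = \int |f - g|^2 \, d\mu$$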
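The contraction inequality in the proof above can be checked numerically when the sub-σ-algebra is generated by a finite partition, since conditional expectation is then just averaging over cells. This is a minimal sketch under that assumption; the helper names `cond_exp` and `l2_dist_sq` and the random test data are hypothetical.

```python
import random

def cond_exp(mu, f, beta):
    """E(f | sigma(beta)) on a finite space: mu-average of f over each cell of beta."""
    mass, total = {}, {}
    for m, v, b in zip(mu, f, beta):
        mass[b] = mass.get(b, 0.0) + m
        total[b] = total.get(b, 0.0) + m * v
    return [total[b] / mass[b] for b in beta]

def l2_dist_sq(mu, f, g):
    """Squared L^2(mu) distance between f and g."""
    return sum(m * (a - b) ** 2 for m, a, b in zip(mu, f, g))

random.seed(0)
n = 12
weights = [random.random() + 0.1 for _ in range(n)]
mu = [w / sum(weights) for w in weights]      # a probability vector with positive entries
beta = [x % 3 for x in range(n)]              # cell labels of a 3-element partition
f = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]

lhs = l2_dist_sq(mu, cond_exp(mu, f, beta), cond_exp(mu, g, beta))
rhs = l2_dist_sq(mu, f, g)
print(lhs <= rhs)
```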
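Corollary: Let $\mathcal{B} \subseteq \mathcal{A}$. If $(M, \mathcal{A}, \mu)$ has a countable basis, so does $(M, \mathcal{B}, \mu)$.
Tool to develop conditional entropy inequalities, conditioning on sub-σ-algebras:
Let $(M, \mathcal{A}, \mu)$ have a countable basis. Let $\mathcal{B}$ be a sub-σ-algebra.
Fix $\{C_1, C_2, \ldots\}$, a countable basis for $\mathcal{B}$.
Let $\mathcal{B}_n$ be the σ-algebra (equivalently, the algebra) generated by $\{C_1, C_2, \ldots, C_n\}$.
Then $\mathcal{B}_n \uparrow \mathcal{B}$.
Let $\beta_n$ denote the partition determined by $\mathcal{B}_n$.
By the continuity theorem, $H(\alpha|\beta_n) = H(\alpha|\mathcal{B}_n) \to H(\alpha|\mathcal{B})$.
Since $\beta_n \preceq \beta_{n+1}$ (that is, $\beta_{n+1}$ refines $\beta_n$), $H(\alpha|\mathcal{B}_n) \downarrow H(\alpha|\mathcal{B})$.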
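The monotone convergence of conditional entropy along an increasing sequence of finite σ-algebras can be seen in a toy example. The sketch below assumes a hypothetical space {0, ..., 2^N - 1} with uniform measure, takes the generating sets to be C_k = {x : bit k-1 of x is 1}, and uses an arbitrary two-set partition alpha; none of these choices come from the lecture, and here the limit is reached after finitely many steps.

```python
import math
from collections import Counter

N = 6
points = range(2 ** N)                              # uniform measure on 2**N points
alpha = [1 if x % 3 == 0 else 0 for x in points]    # hypothetical partition into two sets

def atom(x, n):
    """Label of the atom of B_n containing x, where B_n is generated by
    C_1, ..., C_n and C_k = {x : bit k-1 of x is 1}."""
    return x & ((1 << n) - 1)

def cond_entropy(alpha, n):
    """H(alpha | beta_n) = -sum over (A, B) of mu(A ∩ B) log(mu(A ∩ B) / mu(B))."""
    joint = Counter((alpha[x], atom(x, n)) for x in points)
    cell = Counter(atom(x, n) for x in points)
    total = len(points)
    return -sum(c / total * math.log(c / cell[b]) for (a, b), c in joint.items())

values = [cond_entropy(alpha, n) for n in range(N + 1)]
for n, v in enumerate(values):
    print(n, round(v, 6))
```

The printed values start at H(α) for n = 0, never increase, and end at 0 for n = N, where the atoms are single points and α is measurable with respect to the generated σ-algebra.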
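Properties of entropy:
(Here $\beta \preceq \alpha$ means that $\alpha$ refines $\beta$.)
1. $H(\alpha \vee \beta) = H(\alpha) + H(\beta|\alpha)$
2. $H(\alpha|\mathcal{B}) \le H(\alpha)$ with equality iff $\alpha \perp \mathcal{B}$
3. $H(\alpha|\mathcal{B}) \ge 0$ with equality iff $\alpha \subseteq \mathcal{B}$
4. $H(\alpha \vee \beta) = H(\alpha)$ iff $\beta \preceq \alpha$
5. $H(\alpha_1 \vee \cdots \vee \alpha_n) = \sum_{i=1}^{n} H(\alpha_i \,|\, \alpha_1 \vee \cdots \vee \alpha_{i-1})$
6. $H(\alpha_1 \vee \cdots \vee \alpha_n) \le \sum_i H(\alpha_i)$ with equality iff the $\alpha_i$ are independent.
7. Assume $\beta \preceq \alpha$. Then $H(\beta) \le H(\alpha)$ and equality holds iff $\beta = \alpha$.
8. Assume $\mathcal{B} \subseteq \mathcal{C}$. Then $H(\alpha|\mathcal{C}) \le H(\alpha|\mathcal{B})$
9. $H(\alpha \vee \beta|\mathcal{C}) = H(\alpha|\mathcal{C}) + H(\beta|\sigma(\alpha \cup \mathcal{C})) \le H(\alpha|\mathcal{C}) + H(\beta|\mathcal{C})$
10. $H(\alpha|\sigma(\mathcal{B} \cup \mathcal{C})) \le H(\alpha|\mathcal{B})$ with equality iff $\alpha \perp_{\mathcal{B}} \mathcal{C}$
11. If $\mathcal{B} \perp_{\mathcal{C}} \sigma(\beta \cup \mathcal{D})$, then $H(\beta|\sigma(\mathcal{D} \cup \mathcal{C} \cup \mathcal{B})) = H(\beta|\sigma(\mathcal{D} \cup \mathcal{C}))$.
12. If $\gamma \preceq \alpha$, then $H(\gamma|\mathcal{B}) \le H(\alpha|\mathcal{B})$
13. If $T$ is an MPT, then for all $i, j, k$,
$H(T^{-i}\alpha \vee \ldots \vee T^{-j}\alpha) = H(T^{-i-k}\alpha \vee \ldots \vee T^{-j-k}\alpha)$
14. If $T$ is an MPT, then for all $i, j, k$,
$H(T^{-i}\alpha \vee \ldots \vee T^{-j}\alpha \,|\, \mathcal{B}) = H(T^{-i-k}\alpha \vee \ldots \vee T^{-j-k}\alpha \,|\, T^{-k}(\mathcal{B}))$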
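Several of the properties above (1, 2 and 6, with two partitions and conditioning on a partition) can be sanity-checked numerically. The sketch below assumes a hypothetical random joint distribution `joint[a][b]` standing for the masses of the sets A_a ∩ B_b; it is an illustration of the inequalities, not a proof.

```python
import math
import random

def H(probs):
    """Shannon entropy of a probability vector, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

random.seed(1)
raw = [[random.random() for _ in range(4)] for _ in range(3)]
total = sum(map(sum, raw))
joint = [[v / total for v in row] for row in raw]    # joint[a][b] = mu(A_a ∩ B_b)

p_alpha = [sum(row) for row in joint]                                # mu(A_a)
p_beta = [sum(joint[a][b] for a in range(3)) for b in range(4)]      # mu(B_b)

H_join = H([v for row in joint for v in row])                        # H(alpha ∨ beta)
# H(beta | alpha) computed directly as an average of entropies of conditional laws.
H_beta_given_alpha = sum(p_alpha[a] * H([v / p_alpha[a] for v in joint[a]])
                         for a in range(3))

print(abs(H_join - (H(p_alpha) + H_beta_given_alpha)) < 1e-12)   # property 1
print(H_beta_given_alpha <= H(p_beta))                           # property 2
print(H_join <= H(p_alpha) + H(p_beta))                          # property 6

# Equality case of property 6: an independent (product) joint distribution.
product_joint = [pa * pb for pa in p_alpha for pb in p_beta]
print(abs(H(product_joint) - (H(p_alpha) + H(p_beta))) < 1e-12)
```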
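Proof of 2:
$H(\alpha|\mathcal{B}) \le H(\alpha)$:
Proof: $\mathcal{B}_n \uparrow \mathcal{B}$.
So, $H(\alpha|\mathcal{B}) = \lim_n H(\alpha|\mathcal{B}_n) = \lim_n H(\alpha|\beta_n) \le H(\alpha)$.
$H(\alpha|\mathcal{B}) = H(\alpha)$ iff $\mathcal{B} \perp \alpha$.
Proof:
If: $\mathcal{B} \perp \alpha$ implies that for all $n$, we have $\beta_n \perp \alpha$.
Thus, $H(\alpha|\beta_n) = H(\alpha)$.
Thus, $H(\alpha|\mathcal{B}) = \lim_n H(\alpha|\beta_n) = H(\alpha)$.
Only if:
$H(\alpha|\beta_n) \downarrow H(\alpha|\mathcal{B})$.
So, $H(\alpha|\mathcal{B}) \le H(\alpha|\beta_n) \le H(\alpha)$.
So, if $H(\alpha|\mathcal{B}) = H(\alpha)$, then for all $n$, $H(\alpha|\beta_n) = H(\alpha)$.
So, for all $n$, $\alpha \perp \beta_n$. Thus, $\alpha \perp \mathcal{B}$. QED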
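Proof of 3:
If:
If each $A_i \in \mathcal{B}$, then $E(\chi_{A_i}|\mathcal{B}) = \chi_{A_i}$. Thus,
$$H(\alpha|\mathcal{B}) = \int \sum_i -\chi_{A_i}(x) \log \chi_{A_i}(x) \, d\mu = 0$$
since $\chi_{A_i}$ takes on only the values 0 and 1 (with the convention $0 \log 0 = 0$).
Only if:
If $H(\alpha|\mathcal{B}) = 0$, then since the integrand is nonnegative in
$$H(\alpha|\mathcal{B}) = -\int \sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \, d\mu$$
the integrand must be zero a.e., and so each $\mu(A_i|\mathcal{B})(x)$ is 0/1 valued. So, $\mu(A_i|\mathcal{B}) = \chi_C$ for some $C \in \mathcal{B}$.
Claim: $A_i = C \in \mathcal{B}$ mod 0 (and so $\alpha \subset \mathcal{B}$)
Proof:
$$\mu(A_i \cap C) = \int_C \chi_{A_i} = \int_C \mu(A_i|\mathcal{B}) = \int_C \chi_C = \mu(C)$$
and
$$\mu(A_i \cap C^c) = \int_{C^c} \chi_{A_i} = \int_{C^c} \mu(A_i|\mathcal{B}) = \int_{C^c} \chi_C = 0$$
So, $A_i = C$ mod 0.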
Lecture 19:
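Let $T := (i \mapsto T^i)$ be an MP $\mathbb{Z}^d_+$ action. Let $\alpha$ be a finite measurable partition. For a finite $G \subset \mathbb{Z}^d_+$, let
$$\alpha^G := \bigvee_{g \in G} T^{-g}\alpha$$
Let $D_n = [0, n-1]^d \cap \mathbb{Z}^d$.
Defn:
$$h(T, \alpha) := \lim_{n \to \infty} (1/n^d) H(\alpha^{D_n})$$
Note: this defines entropy of $\mathbb{Z}^d$ actions as well.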
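Theorem: The limit exists. In fact, for $n = (n_1, \ldots, n_d) \in \mathbb{N}^d_+$, let $D_n = ([0, n_1 - 1] \times \cdots \times [0, n_d - 1]) \cap \mathbb{Z}^d$. Then
$$\lim_{n \to \infty} \frac{1}{n_1 \cdots n_d} H(\alpha^{D_n}) = \inf_{n \in \mathbb{N}^d} \frac{1}{n_1 \cdots n_d} H(\alpha^{D_n});$$
the limit exists (independently of how $n \to \infty$).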
Proof:
Defn: A function $f : \mathbb{N}^d_+ \to \mathbb{R}$ is subadditive if for all $k = 1, \ldots, d$ and all $i, j$,
$$f(n_1, \ldots, n_{k-1}, i + j, n_{k+1}, \ldots, n_d) \le f(n_1, \ldots, n_{k-1}, i, n_{k+1}, \ldots, n_d) + f(n_1, \ldots, n_{k-1}, j, n_{k+1}, \ldots, n_d)$$
The theorem follows immediately from the following two lemmas:
Lemma 1: The function $f(n) := H(\alpha^{D_n})$ is subadditive.
Lemma 2 (sometimes called Fekete's lemma): For any subadditive function $f$,
$$\lim_{n \to \infty} \frac{1}{n_1 \cdots n_d} f(n) = \inf_{n \in \mathbb{N}^d} \frac{1}{n_1 \cdots n_d} f(n)$$
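Proof of Lemma 1: ($d = 2$):
$$\alpha^{D_{n_1, i+j}} = \alpha^{D_{n_1, i}} \vee \alpha^{D_{n_1, j} + (0, i)} = \alpha^{D_{n_1, i}} \vee T^{-(0, i)}(\alpha^{D_{n_1, j}}).$$
So,
$$H(\alpha^{D_{n_1, i+j}}) \le H(\alpha^{D_{n_1, i}}) + H(T^{-(0, i)}(\alpha^{D_{n_1, j}})) = H(\alpha^{D_{n_1, i}}) + H(\alpha^{D_{n_1, j}})$$
Similarly,
$$H(\alpha^{D_{i+j, n_2}}) \le H(\alpha^{D_{i, n_2}}) + H(\alpha^{D_{j, n_2}})$$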
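Proof of Lemma 2: Idea for $d = 1$:
$f(rs) \le r f(s)$ and so $f(rs)/(rs) \le f(s)/s$.
Proof for ($d = 2$):
Fix $(t_1, t_2) \in \mathbb{N}^2_+$. For $(n_1, n_2) \in \mathbb{N}^2_+$, write
$$n_1 = q_1 t_1 + r_1, \quad n_2 = q_2 t_2 + r_2, \quad 0 \le r_1 < t_1, \quad 0 \le r_2 < t_2$$
Then
$$\begin{aligned}
f(n_1, n_2) &\le f(q_1 t_1, n_2) + f(r_1, n_2) \le q_1 f(t_1, n_2) + f(r_1, n_2) \\
&\le q_1 (f(t_1, q_2 t_2) + f(t_1, r_2)) + f(r_1, q_2 t_2) + f(r_1, r_2) \\
&\le q_1 q_2 f(t_1, t_2) + q_1 f(t_1, r_2) + q_2 f(r_1, t_2) + f(r_1, r_2)
\end{aligned}$$
Then
$$\frac{1}{n_1 n_2} f(n_1, n_2) \le \frac{q_1}{n_1} \frac{q_2}{n_2} f(t_1, t_2) + \frac{q_1}{n_1} \frac{1}{n_2} f(t_1, r_2) + \frac{1}{n_1} \frac{q_2}{n_2} f(r_1, t_2) + \frac{1}{n_1 n_2} f(r_1, r_2)$$
Since $q_1/n_1 \to 1/t_1$ as $n_1 \to \infty$ and $q_2/n_2 \to 1/t_2$ as $n_2 \to \infty$,
$$\limsup_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \frac{1}{t_1 t_2} f(t_1, t_2)$$
Since this holds for all $(t_1, t_2)$,
$$\liminf_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \limsup_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \inf_{n \in \mathbb{N}^d} \frac{1}{n_1 n_2} f(n_1, n_2)$$
The infimum is trivially at most the lim inf, so all three quantities coincide.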
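Lemma 2 can be illustrated numerically for d = 2. The sketch below uses a hypothetical function f(n1, n2) = n1*n2 + n1 + n2, chosen only because it is easy to verify that it is subadditive in each coordinate; it is not the entropy function from Lemma 1. Its normalized values f(n)/(n1*n2) = 1 + 1/n1 + 1/n2 approach the infimum 1 regardless of how n1 and n2 tend to infinity.

```python
def f(n1, n2):
    # Hypothetical subadditive function, used for illustration only.
    return n1 * n2 + n1 + n2

def is_subadditive(func, bound):
    """Check subadditivity in each coordinate for all arguments below `bound`."""
    for n in range(1, bound):
        for i in range(1, bound):
            for j in range(1, bound):
                if func(i + j, n) > func(i, n) + func(j, n):
                    return False
                if func(n, i + j) > func(n, i) + func(n, j):
                    return False
    return True

def normalized(n1, n2):
    """f(n1, n2) / (n1 * n2), the quantity whose limit Lemma 2 describes."""
    return f(n1, n2) / (n1 * n2)

print(is_subadditive(f, 12))
# Two different routes to infinity give values close to the same limit.
print(normalized(10**6, 10**6), normalized(10**3, 10**6))
```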
For a finite alphabet $F$ let $M_+ := F^{\mathbb{Z}^d_+}$.
The law of a stationary random field $X_{i_1, \ldots, i_d}$ defines a Borel probability measure $\mu_+$ on $M_+$, and the $\mathbb{Z}^d_+$ shift action $T_+$ on $(M_+, \mathcal{A}_+, \mu_+)$ is an MP semigroup action.
Similarly, letting $M := F^{\mathbb{Z}^d}$, the same stationary process defines a Borel probability measure $\mu$ on $M$, and the $\mathbb{Z}^d$ shift action $T$ on $(M, \mathcal{A}, \mu)$ is an MP group action.
For $a \in F$, let
$$A^a_+ := \{x \in M_+ : x_0 = a\}$$
and $\alpha_+ := \{A^a_+ : a \in F\}$. Similarly, let
$$A^a := \{x \in M : x_0 = a\}$$
and $\alpha := \{A^a : a \in F\}$.
Fact: $h(T, \alpha) = h(T_+, \alpha_+)$
Proof: the definitions give the same result:
$H(\alpha^{D_n}) = H(\alpha_+^{D_n})$ is the entropy of the joint distribution of the random variables $X_i$, $i \in D_n$.
In particular, for the iid process with marginal distribution $p$, iid($p$),
$$h(T, \alpha) = h(T_+, \alpha_+) = H(p)$$
because
$$H(\alpha^{D_n}) = n^d H(\alpha) = n^d H(p).$$
For any $d$: if $p = (1/2, 1/2)$, then $h(T, \alpha) = \log 2$; if $p = (1/3, 1/3, 1/3)$, then $h(T, \alpha) = \log 3$.
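The identity H(α^{D_n}) = n^d H(p) for the iid process can be verified by brute force for small n and d, by enumerating every configuration on D_n. This is a minimal sketch; the function names are hypothetical, and the enumeration is only feasible for very small boxes.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy with natural logarithm and the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def block_entropy_iid(p, n, d):
    """H(alpha^{D_n}) for the iid(p) field: the entropy of the joint law of
    (X_i : i in D_n), computed by enumerating all len(p)**(n**d) configurations."""
    sites = n ** d
    joint = [math.prod(p[a] for a in config)
             for config in product(range(len(p)), repeat=sites)]
    return entropy(joint)

for p in ([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], [0.2, 0.8]):
    n, d = 2, 2
    # Both columns should agree up to rounding: H(alpha^{D_n}) / n^d and H(p).
    print(block_entropy_iid(p, n, d) / n ** d, entropy(p))
```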
We often write h(X) := h(T , α).
In fact, every MP $\mathbb{Z}^d_+$ action has a corresponding MP $\mathbb{Z}^d$ action that is minimal in a categorical (universal mapping property) way; the two actions share all of their ergodic properties: ergodicity, mixing, entropy.
Natural extension: Given a $\mathbb{Z}^d_+$ action $T$ on $(M_+, \mathcal{A}_+, \mu_+)$, define
$$M = \{x = (x_i)_{i \in \mathbb{Z}^d} \in M_+^{\mathbb{Z}^d} : T^j(x_i) = x_{i+j} \text{ for all } i \in \mathbb{Z}^d,\ j \in \mathbb{Z}^d_+\}$$
Define $S^i(x)_j = x_{i+j}$.
There is a measure $\mu$ on the product σ-algebra $\mathcal{A}$ of $M_+^{\mathbb{Z}^d}$ s.t. $S$ is an MP $\mathbb{Z}^d$ action on $(M, \mathcal{A}, \mu)$ and the map
$$\pi : M \to M_+, \quad x \mapsto x_0$$
is MP and for all $i \in \mathbb{Z}^d_+$, $\pi \circ S^i = T^i \circ \pi$
AND
$\pi$ is minimal in the sense that for any other $(M', \mathcal{A}', \mu')$, $\mathbb{Z}^d$ action $(S')^i$ and MP $\pi' : M' \to M_+$ s.t. for all $i \in \mathbb{Z}^d_+$, $\pi' \circ (S')^i = T^i \circ \pi'$, the map $\pi'$ factors through $\pi$.
$(M, \mathcal{A}, \mu, S, \pi)$ is called the natural extension of $T$. See section 2.5 of Keller for more on this.
The $\mathbb{Z}^d$ action is better than the $\mathbb{Z}^d_+$ action!
Lecture 20:
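Recall some of the entropy properties.
Recall:
Let $T := (i \mapsto T^i)$ be an MP $\mathbb{Z}^d_+$ action. Let $\alpha$ be a finite measurable partition. For a finite $G \subset \mathbb{Z}^d_+$, let
$$\alpha^G := \bigvee_{g \in G} T^{-g}\alpha$$
Let $D_n = [0, n-1]^d \cap \mathbb{Z}^d$.
Defn: For an MP $\mathbb{Z}^d_+$ action $T$ and (finite, measurable) partition $\alpha$,
$$h(T, \alpha) := \lim_{n \to \infty} (1/n^d) H(\alpha^{D_n})$$
The limit exists.
This also defines $h(T, \alpha)$ for an MP $\mathbb{Z}^d$ action $T$.
For any $G \subset \mathbb{Z}^d$, let
$$\alpha^G := \bigvee_{g \in G} T^{-g}\alpha$$
If $G$ is infinite, we take $\alpha^G$ to mean the σ-algebra generated by the partitions $T^{-g}\alpha$, $g \in G$.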
Let $\prec$ denote lexicographic order on $\mathbb{Z}^d$: $i = (i_1, \ldots, i_d) \prec j = (j_1, \ldots, j_d)$ if there exists $1 \le k \le d$ s.t. $i_k < j_k$ and $i_m = j_m$ for all $m > k$ (look for the "last" disagreement; this is really anti-lexicographic).
For $j \in \mathbb{Z}^d$, let
$$P^-(j) := \{i \in \mathbb{Z}^d : i \prec j\}$$
the lexicographic past of $j$, and $P^- := P^-(0)$.
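Theorem Past: Let $T$ be a $\mathbb{Z}^d$ action.
$$h(T, \alpha) = H(0 \,|\, P^-)$$
(the entropy of the present conditioned on the past).
Here and below, $H(G \,|\, G')$ abbreviates $H(\alpha^{G} \,|\, \alpha^{G'})$, $H(G)$ abbreviates $H(\alpha^G)$, and a point $j$ stands for the set $\{j\}$.
Note: this yields a result for $\mathbb{Z}^d_+$ actions via the natural extension.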
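For d = 1 the lexicographic past of 0 is the set of negative integers, and the theorem can be illustrated on a stationary Markov chain, where (by the Markov property, a standard fact assumed here rather than taken from the lecture) conditioning on the whole past reduces to conditioning on the previous symbol. The sketch below uses a hypothetical two-state transition matrix `P` with stationary vector `pi` and compares the block entropies H(α^{D_n}) with the conditional entropy of the present given the past.

```python
import math
from itertools import product

# Hypothetical two-state stationary Markov chain (an illustrative assumption).
P = [[0.9, 0.1],
     [0.4, 0.6]]     # transition probabilities P[a][b]
pi = [0.8, 0.2]      # stationary distribution: pi P = pi

def block_entropy(n):
    """H(alpha^{D_n}): entropy of the law of (X_0, ..., X_{n-1}), by enumeration."""
    total = 0.0
    for word in product(range(2), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        total -= prob * math.log(prob)
    return total

# Entropy of the present given the past; for a Markov chain only X_{-1} matters.
h_past = -sum(pi[a] * P[a][b] * math.log(P[a][b])
              for a in range(2) for b in range(2))

for n in (1, 2, 4, 8, 12):
    print(n, block_entropy(n) / n, h_past)
```

The normalized block entropies decrease toward `h_past` as n grows; in this example each extra symbol adds exactly `h_past` to the block entropy.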
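Proof: ($d = 2$): Decompose
$$H(D_n) = \sum_{j \in D_n} H(j \,|\, P^-(j) \cap D_n)$$
Let $B_m := [-m, m]^d$.
By continuity of entropy, $H(0 \,|\, P^- \cap B_m) \downarrow H(0 \,|\, P^-)$. So, given $\epsilon > 0$, there exists $m$ s.t.
$$H(0 \,|\, P^-) \le H(0 \,|\, P^- \cap B_m) \le H(0 \,|\, P^-) + \epsilon$$
Let
$$S_{m,n} = \{j \in D_n : (P^- \cap B_m) + j \subseteq P^-(j) \cap D_n\}$$
Then $|S_{m,n}| \ge n^2 - 3nm$.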
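The order, the truncated past, and the set of sites in the proof above can be enumerated directly, which gives a quick check of the counting bound for d = 2. The following is a sketch with hypothetical helper names (`precedes`, `past_window`, `good_sites`); it follows the "last disagreement" convention stated for the order.

```python
from itertools import product

def precedes(i, j):
    """i ≺ j: at the last coordinate where i and j differ, i is smaller."""
    for a, b in zip(reversed(i), reversed(j)):
        if a != b:
            return a < b
    return False   # i == j

def past_window(m, d=2):
    """P^- ∩ B_m: the points of [-m, m]^d that precede the origin."""
    origin = (0,) * d
    return [i for i in product(range(-m, m + 1), repeat=d) if precedes(i, origin)]

def good_sites(m, n, d=2):
    """S_{m,n}: sites j in D_n with (P^- ∩ B_m) + j contained in P^-(j) ∩ D_n."""
    window = past_window(m, d)
    sites = []
    for j in product(range(n), repeat=d):
        shifted = [tuple(a + b for a, b in zip(i, j)) for i in window]
        if all(precedes(s, j) and all(0 <= c < n for c in s) for s in shifted):
            sites.append(j)
    return sites

m, n = 2, 10
print(len(good_sites(m, n)), n * n - 3 * n * m)
```

For d = 2 and n ≥ 2m, a direct count gives (n - 2m)(n - m) = n^2 - 3nm + 2m^2 sites, which is consistent with the bound used in the proof.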
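For each $j \in S_{m,n}$,
$$H(0 \,|\, P^-) = H(j \,|\, P^-(j)) \le H(j \,|\, P^-(j) \cap D_n) \le H(j \,|\, (P^- \cap B_m) + j) = H(0 \,|\, P^- \cap B_m) \le H(0 \,|\, P^-) + \epsilon.$$
Thus,
$$\left| \frac{H(D_n)}{n^2} - H(0 \,|\, P^-) \right| \le \frac{\sum_{j \in S_{m,n}} \left| H(j \,|\, P^-(j) \cap D_n) - H(0 \,|\, P^-) \right|}{n^2} + \frac{\sum_{j \in D_n \setminus S_{m,n}} H(j \,|\, P^-(j) \cap D_n)}{n^2} \le \epsilon + \frac{3mn \log |\alpha|}{n^2} < 2\epsilon$$
for sufficiently large $n$.

Prop 1: Let $T$ be an MP $\mathbb{Z}^d_+$ action and $\alpha$ a (finite, measurable)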
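partition. Then for all $m$,
$$h(T, \alpha) = h(T, \alpha^{D_m})$$
Proof: Since $(\alpha^{D_m})^{D_n} = \alpha^{D_m + D_n} = \alpha^{D_{n+m-1}}$ and $(n+m-1)^d / n^d \to 1$,
$$h(T, \alpha^{D_m}) = \lim_{n \to \infty} (1/n^d) H((\alpha^{D_m})^{D_n}) = \lim_{n \to \infty} (1/n^d) H(\alpha^{D_{n+m-1}}) = \lim_{n \to \infty} \left( \frac{(n+m-1)^d}{n^d} \right) \left( \frac{1}{(n+m-1)^d} \right) H(\alpha^{D_{n+m-1}}) = h(T, \alpha)$$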
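Prop 2: Let $T$ be an MP $\mathbb{Z}^d_+$ action and $\alpha, \beta$ be (finite, measurable) partitions. Then
$$h(T, \beta) \le h(T, \alpha) + H(\beta|\alpha)$$
Proof:
$$H(\beta^{D_n}) \le H(\beta^{D_n} \vee \alpha^{D_n}) = H(\alpha^{D_n}) + H(\beta^{D_n} \,|\, \alpha^{D_n})$$
$$\text{2nd term} \le \sum_{j \in D_n} H(\beta^j \,|\, \vee_{i \in D_n} \alpha^i) \le \sum_{j \in D_n} H(\beta^j \,|\, \alpha^j) = n^d H(\beta|\alpha)$$
Thus,
$$(1/n^d) H(\beta^{D_n}) \le (1/n^d) H(\alpha^{D_n}) + H(\beta \,|\, \alpha).$$
Let $n \to \infty$.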