Lecture 18:

Recall: Let $\alpha = \{A_1, \ldots, A_k\}$ be a finite measurable partition and $\mathcal{B}$ a sub-σ-algebra.

Defn:
$$I_{\alpha|\mathcal{B}}(x) := -\sum_i \chi_{A_i}(x) \log \mu(A_i|\mathcal{B})(x) = -\sum_i \chi_{A_i}(x) \log E(\chi_{A_i}|\mathcal{B})(x)$$

Defn: $H(\alpha|\mathcal{B}) := \int I_{\alpha|\mathcal{B}} \, d\mu$.

Prop: Let $\alpha, \beta$ be finite measurable partitions. Let $\mathcal{B}$ be the σ-algebra generated by $\beta$, i.e., the collection of all finite unions of elements of $\beta$. Then $H(\alpha|\mathcal{B}) = H(\alpha|\beta)$.

Continuity of conditional entropy: Let $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{B}_3 \subset \ldots$ be σ-algebras and $\mathcal{B} = \sigma(\cup_{n=1}^\infty \mathcal{B}_n)$. Let $\alpha$ be a finite measurable partition. Then $I_{\alpha|\mathcal{B}_n} \to I_{\alpha|\mathcal{B}}$ a.e. and in $L^1$, and $H(\alpha|\mathcal{B}_n) \to H(\alpha|\mathcal{B})$.

Proof: By the Martingale Convergence Theorem, $\mu(A_i|\mathcal{B}_n) \to \mu(A_i|\mathcal{B})$ a.e. and in $L^1$. Thus,
$$I_{\alpha|\mathcal{B}_n} = -\sum_i \chi_{A_i} \log \mu(A_i|\mathcal{B}_n) \to -\sum_i \chi_{A_i} \log \mu(A_i|\mathcal{B}) =: I_{\alpha|\mathcal{B}} \quad \text{a.e.}$$
Can show $\sup_n I_{\alpha|\mathcal{B}_n} \in L^1$ (may be done in student talk on the Shannon-McMillan-Breiman Theorem). Apply DCT to get $L^1$ convergence of the conditional information function and convergence of conditional entropy.

Alternative proof only of convergence of conditional entropy:

Fact:
$$H(\alpha|\mathcal{B}) = -\int \sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \, d\mu$$
because whenever $g$ is $\mathcal{B}$-measurable, we have
$$\int f g \, d\mu = \int E(f|\mathcal{B}) \, g \, d\mu.$$
And the integrand above is bounded by $\log k$ because $\{\mu(A_i|\mathcal{B})(x)\}_{i=1}^k$ is a probability vector. And
$$-\sum_i \mu(A_i|\mathcal{B}_n) \log \mu(A_i|\mathcal{B}_n) \to -\sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \quad \text{a.e.}$$
Apply the bounded convergence theorem.

----------

We can formulate another list of entropy inequalities involving conditioning on sub-σ-algebras of $\mathcal{A}$, instead of conditioning on finite measurable partitions.

Defn: A measure space $(M, \mathcal{A}, \mu)$ has a countable basis if there is a countable collection $\mathcal{C}$ of measurable sets s.t. for all $A \in \mathcal{A}$ and $\epsilon > 0$, there exists $C \in \mathcal{C}$ s.t. $\mu(A \Delta C) < \epsilon$.

Fact: The Borel σ-algebra for any separable metric space has a countable basis. And so does its completion.

Proof: The σ-algebra $\mathcal{A}$ is generated by finite unions of open balls of radius $1/n$ centered at the points of a countable dense set.

Fact: $(M, \mathcal{A}, \mu)$ has a countable basis iff $L^2(M, \mathcal{A}, \mu)$ is separable.
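Proof:
Only if: given a countable basis $\mathcal{C}$, the collection of functions $\sum_{i=1}^n a_i \chi_{A_i}$ with $a_i \in \mathbb{Q}$ and $A_i \in \mathcal{C}$ is $L^2$-dense.
If: given a countable $L^2$-dense subset $D$, the collection of sets $f^{-1}(a, b)$ with $a, b \in \mathbb{Q}$ and $f \in D$ is a countable basis.

Fact: Given $\mathcal{B} \subseteq \mathcal{A}$, the map $L^2(M, \mathcal{A}, \mu) \to L^2(M, \mathcal{B}, \mu)$, $f \mapsto E(f|\mathcal{B})$, is continuous.

Proof:
$$\int |E(f|\mathcal{B}) - E(g|\mathcal{B})|^2 \, d\mu = \int E(f - g|\mathcal{B})^2 \, d\mu \le \int E(|f - g|^2 \,|\, \mathcal{B}) \, d\mu = \int |f - g|^2 \, d\mu$$

Corollary: Let $\mathcal{B} \subseteq \mathcal{A}$. If $(M, \mathcal{A}, \mu)$ has a countable basis, so does $(M, \mathcal{B}, \mu)$.

Tool to develop conditional entropy inequalities, conditioning on sub-σ-algebras: Let $(M, \mathcal{A}, \mu)$ have a countable basis. Let $\mathcal{B}$ be a sub-σ-algebra. Fix $\{C_1, C_2, \ldots\}$, a countable basis for $\mathcal{B}$. Let $\mathcal{B}_n$ be the σ-algebra (equivalently, the algebra) generated by $\{C_1, C_2, \ldots, C_n\}$. Then $\mathcal{B}_n \uparrow \mathcal{B}$. Let $\beta_n$ denote the partition determined by $\mathcal{B}_n$.

By the continuity theorem, $H(\alpha|\beta_n) = H(\alpha|\mathcal{B}_n) \to H(\alpha|\mathcal{B})$. Since $\beta_n \preceq \beta_{n+1}$, $H(\alpha|\mathcal{B}_n) \downarrow H(\alpha|\mathcal{B})$. (Here $\beta \preceq \alpha$ means that $\alpha$ refines $\beta$.)

Properties of entropy:
1. $H(\alpha \vee \beta) = H(\alpha) + H(\beta|\alpha)$
2. $H(\alpha|\mathcal{B}) \le H(\alpha)$ with equality iff $\alpha \perp \mathcal{B}$
3. $H(\alpha|\mathcal{B}) \ge 0$ with equality iff $\alpha \subseteq \mathcal{B}$
4. $H(\alpha \vee \beta) = H(\alpha)$ iff $\beta \preceq \alpha$
5. $H(\alpha_1 \vee \cdots \vee \alpha_n) = \sum_{i=1}^n H(\alpha_i \,|\, \alpha_1 \vee \cdots \vee \alpha_{i-1})$
6. $H(\alpha_1 \vee \cdots \vee \alpha_n) \le \sum_i H(\alpha_i)$ with equality iff the $\alpha_i$ are independent.
7. Assume $\beta \preceq \alpha$. Then $H(\beta) \le H(\alpha)$ and equality holds iff $\beta = \alpha$.
8. Assume $\mathcal{B} \subseteq \mathcal{C}$. Then $H(\alpha|\mathcal{C}) \le H(\alpha|\mathcal{B})$
9. $H(\alpha \vee \beta|\mathcal{C}) = H(\alpha|\mathcal{C}) + H(\beta|\sigma(\alpha \cup \mathcal{C})) \le H(\alpha|\mathcal{C}) + H(\beta|\mathcal{C})$
10. $H(\alpha|\sigma(\mathcal{B} \cup \mathcal{C})) \le H(\alpha|\mathcal{B})$ with equality iff $\alpha \perp_{\mathcal{B}} \mathcal{C}$
11. If $\mathcal{B} \perp_{\mathcal{C}} \sigma(\beta \cup \mathcal{D})$, then $H(\beta|\sigma(\mathcal{D} \cup \mathcal{C} \cup \mathcal{B})) = H(\beta|\sigma(\mathcal{D} \cup \mathcal{C}))$.
12. If $\gamma \preceq \alpha$, then $H(\gamma|\mathcal{B}) \le H(\alpha|\mathcal{B})$
13. If $T$ is an MPT, then for all $i, j, k$, $H(T^{-i}\alpha \vee \ldots \vee T^{-j}\alpha) = H(T^{-i-k}\alpha \vee \ldots \vee T^{-j-k}\alpha)$
14. If $T$ is an MPT, then for all $i, j, k$, $H(T^{-i}\alpha \vee \ldots \vee T^{-j}\alpha \,|\, \mathcal{B}) = H(T^{-i-k}\alpha \vee \ldots \vee T^{-j-k}\alpha \,|\, T^{-k}(\mathcal{B}))$

Proof of 2:

$H(\alpha|\mathcal{B}) \le H(\alpha)$:
Proof: $\mathcal{B}_n \uparrow \mathcal{B}$. So, $H(\alpha|\mathcal{B}) = \lim_n H(\alpha|\mathcal{B}_n) = \lim_n H(\alpha|\beta_n) \le H(\alpha)$.

$H(\alpha|\mathcal{B}) = H(\alpha)$ iff $\mathcal{B} \perp \alpha$:
Proof:
If: $\mathcal{B} \perp \alpha$ implies that for all $n$, we have $\beta_n \perp \alpha$. Thus, $H(\alpha|\beta_n) = H(\alpha)$. Thus, $H(\alpha|\mathcal{B}) = \lim_n H(\alpha|\beta_n) = H(\alpha)$.
Only if: $H(\alpha|\beta_n) \downarrow H(\alpha|\mathcal{B})$. So, $H(\alpha|\mathcal{B}) \le H(\alpha|\beta_n) \le H(\alpha)$.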
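So, if $H(\alpha|\mathcal{B}) = H(\alpha)$, then for all $n$, $H(\alpha|\beta_n) = H(\alpha)$. So, for all $n$, $\alpha \perp \beta_n$. Thus, $\alpha \perp \mathcal{B}$. QED

Proof of 3:
If: If each $A_i \in \mathcal{B}$, then $E(\chi_{A_i}|\mathcal{B}) = \chi_{A_i}$. Thus,
$$H(\alpha|\mathcal{B}) = \int \sum_i -\chi_{A_i}(x) \log \chi_{A_i}(x) \, d\mu = 0$$
since $\chi_{A_i}$ takes on only the values 0 and 1 (with the convention $0 \log 0 = 0$).

Only if: If $H(\alpha|\mathcal{B}) = 0$, then since the integrand is nonnegative in
$$H(\alpha|\mathcal{B}) = -\int \sum_i \mu(A_i|\mathcal{B}) \log \mu(A_i|\mathcal{B}) \, d\mu,$$
the integrand must be zero a.e., and so each $\mu(A_i|\mathcal{B})(x)$ is 0/1 valued. So, $\mu(A_i|\mathcal{B}) = \chi_C$ for some $C \in \mathcal{B}$.

Claim: $A_i = C \in \mathcal{B}$ mod 0 (and so $\alpha \subset \mathcal{B}$).
Proof:
$$\mu(A_i \cap C) = \int_C \chi_{A_i} \, d\mu = \int_C \mu(A_i|\mathcal{B}) \, d\mu = \int_C \chi_C \, d\mu = \mu(C)$$
and
$$\mu(A_i \cap C^c) = \int_{C^c} \chi_{A_i} \, d\mu = \int_{C^c} \mu(A_i|\mathcal{B}) \, d\mu = \int_{C^c} \chi_C \, d\mu = 0.$$
So, $A_i = C$ mod 0.

Lecture 19:

Let $T := (i \mapsto T^i)$ be an MP $\mathbb{Z}^d_+$ action. Let $\alpha$ be a finite measurable partition. For a finite $G \subset \mathbb{Z}^d_+$, let
$$\alpha^G := \vee_{g \in G} T^{-g}\alpha.$$
Let $D_n = [0, n-1]^d \cap \mathbb{Z}^d$.

Defn:
$$h(T, \alpha) := \lim_{n \to \infty} (1/n^d) H(\alpha^{D_n})$$

Note: this defines entropy of $\mathbb{Z}^d$ actions as well.

Theorem: The limit exists. In fact, for $n = (n_1, \ldots, n_d) \in \mathbb{N}^d_+$, let $D_n = ([0, n_1 - 1] \times \cdots \times [0, n_d - 1]) \cap \mathbb{Z}^d$. Then
$$\lim_{n \to \infty} \frac{1}{n_1 \cdots n_d} H(\alpha^{D_n}) = \inf_{n \in \mathbb{N}^d_+} \frac{1}{n_1 \cdots n_d} H(\alpha^{D_n});$$
the limit exists (independently of how $n \to \infty$).

Proof:
Defn: A function $f : \mathbb{N}^d_+ \to \mathbb{R}$ is subadditive if for all $k = 1, \ldots, d$ and all $i, j$,
$$f(n_1, \ldots, n_{k-1}, i + j, n_{k+1}, \ldots, n_d) \le f(n_1, \ldots, n_{k-1}, i, n_{k+1}, \ldots, n_d) + f(n_1, \ldots, n_{k-1}, j, n_{k+1}, \ldots, n_d).$$

The theorem follows immediately from the following two lemmas:

Lemma 1: The function $f(n) := H(\alpha^{D_n})$ is subadditive.

Lemma 2 (sometimes called Fekete's lemma): For any subadditive function $f$,
$$\lim_{n \to \infty} \frac{1}{n_1 \cdots n_d} f(n) = \inf_{n \in \mathbb{N}^d_+} \frac{1}{n_1 \cdots n_d} f(n).$$

Proof of Lemma 1 ($d = 2$):
$$\alpha^{D_{n_1, i+j}} = \alpha^{D_{n_1, i}} \vee \alpha^{D_{n_1, j} + (0, i)} = \alpha^{D_{n_1, i}} \vee T^{-(0,i)}(\alpha^{D_{n_1, j}}).$$
So,
$$H(\alpha^{D_{n_1, i+j}}) \le H(\alpha^{D_{n_1, i}}) + H(T^{-(0,i)}(\alpha^{D_{n_1, j}})) = H(\alpha^{D_{n_1, i}}) + H(\alpha^{D_{n_1, j}}).$$
Similarly, $H(\alpha^{D_{i+j, n_2}}) \le H(\alpha^{D_{i, n_2}}) + H(\alpha^{D_{j, n_2}})$.

Proof of Lemma 2:
Idea for $d = 1$: $f(rs) \le r f(s)$ and so $f(rs)/(rs) \le f(s)/s$.

Proof for $d = 2$: Fix $(t_1, t_2) \in \mathbb{N}^2_+$.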
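For $(n_1, n_2) \in \mathbb{N}^2_+$, write
$$n_1 = q_1 t_1 + r_1, \quad n_2 = q_2 t_2 + r_2, \quad 0 \le r_1 < t_1, \quad 0 \le r_2 < t_2.$$
Then (with the convention that $f$ is $0$ when one of its arguments is $0$)
$$f(n_1, n_2) \le f(q_1 t_1, n_2) + f(r_1, n_2) \le q_1 f(t_1, n_2) + f(r_1, n_2)$$
$$\le q_1 (f(t_1, q_2 t_2) + f(t_1, r_2)) + f(r_1, q_2 t_2) + f(r_1, r_2)$$
$$\le q_1 q_2 f(t_1, t_2) + q_1 f(t_1, r_2) + q_2 f(r_1, t_2) + f(r_1, r_2).$$
Then
$$\frac{1}{n_1 n_2} f(n_1, n_2) \le \frac{q_1}{n_1} \frac{q_2}{n_2} f(t_1, t_2) + \frac{q_1}{n_1} \frac{1}{n_2} f(t_1, r_2) + \frac{1}{n_1} \frac{q_2}{n_2} f(r_1, t_2) + \frac{1}{n_1 n_2} f(r_1, r_2).$$
Since $q_1/n_1 \to 1/t_1$ as $n_1 \to \infty$ and $q_2/n_2 \to 1/t_2$ as $n_2 \to \infty$, and the remainders $r_1, r_2$ take only finitely many values,
$$\limsup_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \frac{1}{t_1 t_2} f(t_1, t_2).$$
Since this holds for all $(t_1, t_2)$,
$$\liminf_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \limsup_{n \to \infty} \frac{1}{n_1 n_2} f(n_1, n_2) \le \inf_{n \in \mathbb{N}^2_+} \frac{1}{n_1 n_2} f(n_1, n_2).$$
The infimum is trivially at most the lim inf, so all three quantities agree.

For a finite alphabet $F$ let $M_+ := F^{\mathbb{Z}^d_+}$. The law of a stationary random field $X_{i_1, \ldots, i_d}$ defines a Borel probability measure $\mu_+$ on $M_+$, and the $\mathbb{Z}^d_+$ shift action $T_+$ on $(M_+, \mathcal{A}_+, \mu_+)$ is an MP semigroup action.

Similarly, letting $M := F^{\mathbb{Z}^d}$, the same stationary process defines a Borel probability measure $\mu$ on $M$, and the $\mathbb{Z}^d$ shift action $T$ on $(M, \mathcal{A}, \mu)$ is an MP group action.

For $a \in F$, let $A^a_+ := \{x \in M_+ : x_0 = a\}$ and $\alpha_+ := \{A^a_+ : a \in F\}$. Similarly, let $A^a := \{x \in M : x_0 = a\}$ and $\alpha := \{A^a : a \in F\}$.

Fact: $h(T, \alpha) = h(T_+, \alpha_+)$
Proof: the definitions give the same result: $H(\alpha^{D_n}) = H(\alpha_+^{D_n})$ is the entropy of the joint distribution of the random variables $X_i$, $i \in D_n$.

In particular, for the iid process iid($p$), $h(T, \alpha) = h(T_+, \alpha_+) = H(p)$ because $H(\alpha^{D_n}) = n^d H(\alpha) = n^d H(p)$. For any $d$ and $p = (1/2, 1/2)$, $h(T, \alpha) = \log 2$; for $p = (1/3, 1/3, 1/3)$, $h(T, \alpha) = \log 3$.

We often write $h(X) := h(T, \alpha)$.

In fact, every MP $\mathbb{Z}^d_+$ action has a corresponding MP $\mathbb{Z}^d$ action that is minimal in a categorical (universal mapping property) way; the two actions share all of their ergodic properties: ergodicity, mixing, entropy.

Natural extension: Given a $\mathbb{Z}^d_+$ action $T = T_+$ on $(M_+, \mathcal{A}_+, \mu_+)$, define
$$M = \{x = (x_i)_{i \in \mathbb{Z}^d} \in M_+^{\mathbb{Z}^d} : T^j(x_i) = x_{i+j} \text{ for all } i \in \mathbb{Z}^d, \ j \in \mathbb{Z}^d_+\}.$$
Define $S^i(x)_j = x_{i+j}$.

There is a measure $\mu$ on the product σ-algebra $\mathcal{A}$ of $M_+^{\mathbb{Z}^d}$ s.t.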
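$S$ is an MP $\mathbb{Z}^d$ action on $(M, \mathcal{A}, \mu)$, the map $\pi : M \to M_+$, $x \mapsto x_0$, is MP, and for all $i \in \mathbb{Z}^d_+$,
$$\pi \circ S^i = T^i \circ \pi,$$
AND $\pi$ is minimal in the sense that for any other $(M', \mathcal{A}', \mu')$, $\mathbb{Z}^d$ action $S'$ and MP $\pi' : M' \to M_+$ s.t. for all $i \in \mathbb{Z}^d_+$, $\pi' \circ (S')^i = T^i \circ \pi'$, the map $\pi'$ factors through $\pi$.

$(M, \mathcal{A}, \mu, S, \pi)$ is called the natural extension of $T$. See section 2.5 of Keller for more on this.

The $\mathbb{Z}^d$ action is better than the $\mathbb{Z}^d_+$ action!

Lecture 20:

Recall some of the entropy properties.

Recall: Let $T := (i \mapsto T^i)$ be an MP $\mathbb{Z}^d_+$ action. Let $\alpha$ be a finite measurable partition. For a finite $G \subset \mathbb{Z}^d_+$, let
$$\alpha^G := \vee_{g \in G} T^{-g}\alpha.$$
Let $D_n = [0, n-1]^d \cap \mathbb{Z}^d$.

Defn: For an MP $\mathbb{Z}^d_+$ action $T$ and (finite, measurable) partition $\alpha$,
$$h(T, \alpha) := \lim_{n \to \infty} (1/n^d) H(\alpha^{D_n}).$$
The limit exists. The same formula defines $h(T, \alpha)$ for an MP $\mathbb{Z}^d$ action $T$.

For any $G \subset \mathbb{Z}^d$, let
$$\alpha^G := \vee_{g \in G} T^{-g}\alpha;$$
if $G$ is infinite, we take $\alpha^G$ to mean the σ-algebra generated by the partitions $T^{-g}\alpha$, $g \in G$.

Let $\prec$ denote lexicographic order on $\mathbb{Z}^d$: $i = (i_1, \ldots, i_d) \prec j = (j_1, \ldots, j_d)$ if there exists $1 \le k \le d$ s.t. $i_k < j_k$ and $i_m = j_m$ for all $m > k$ (look for the "last" disagreement; this is really antilexicographic).

For $j \in \mathbb{Z}^d$, let
$$P^-(j) := \{i \in \mathbb{Z}^d : i \prec j\},$$
the lexicographic past of $j$, and $P^- := P^-(0)$.

Notation: below, $H(G)$ abbreviates $H(\alpha^G)$ and $H(G|G')$ abbreviates $H(\alpha^G|\alpha^{G'})$; a single point $j$ stands for the set $\{j\}$.

Theorem Past: Let $T$ be an MP $\mathbb{Z}^d$ action. Then $h(T, \alpha) = H(0|P^-)$ (the entropy of the present conditioned on the past).

Note: this yields a result for $\mathbb{Z}^d_+$ actions via the natural extension.

Proof ($d = 2$): Decompose (chain rule, property 5, listing the points of $D_n$ in the order $\prec$)
$$H(D_n) = \sum_{j \in D_n} H(j \,|\, P^-(j) \cap D_n).$$
Let $B_m := [-m, m]^d \cap \mathbb{Z}^d$. By continuity of entropy, $H(0|P^- \cap B_m) \downarrow H(0|P^-)$. So, given $\epsilon > 0$, there exists $m$ s.t.
$$H(0|P^-) \le H(0|P^- \cap B_m) \le H(0|P^-) + \epsilon.$$
Let
$$S_{m,n} = \{j \in D_n : (P^- \cap B_m) + j \subseteq P^-(j) \cap D_n\}.$$
Then $|S_{m,n}| \ge n^2 - 3nm$. For each $j \in S_{m,n}$,
$$H(0|P^-) = H(j|P^-(j)) \le H(j|P^-(j) \cap D_n) \le H(j|(P^- \cap B_m) + j) = H(0|P^- \cap B_m) \le H(0|P^-) + \epsilon.$$
Thus,
$$\left| \frac{H(D_n)}{n^2} - H(0|P^-) \right| \le \frac{\sum_{j \in S_{m,n}} |H(j|P^-(j) \cap D_n) - H(0|P^-)|}{n^2} + \frac{\sum_{j \in D_n \setminus S_{m,n}} H(j|P^-(j) \cap D_n)}{n^2} \le \epsilon + \frac{3mn \log |\alpha|}{n^2} < 2\epsilon$$
for sufficiently large $n$.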
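A small numerical illustration of Theorem Past in the case $d = 1$ may help. The sketch below assumes a hypothetical two-state stationary Markov chain (the transition matrix `P` and the function names are illustrative choices, not part of the lecture). For such a chain the Markov property gives $H(0|P^-) = H(X_0|X_{-1})$, and the normalized block entropies $(1/n)H(\alpha^{D_n})$ should decrease to that value.

```python
import itertools
import math

# Hypothetical two-state stationary Markov chain (illustration only).
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain, solving pi P = pi.
pi = [P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0])]


def block_entropy(n):
    """H(alpha^{D_n}): entropy of the joint law of (X_0, ..., X_{n-1})."""
    total = 0.0
    for word in itertools.product(range(2), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        if prob > 0:
            total -= prob * math.log(prob)
    return total


def entropy_given_past():
    """H(X_0 | X_{-1}); for a Markov chain this equals H(0 | P^-)."""
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(2) for j in range(2) if P[i][j] > 0)


if __name__ == "__main__":
    h = entropy_given_past()
    for n in (1, 2, 4, 8, 12):
        # The middle column decreases toward the last column as n grows.
        print(n, block_entropy(n) / n, h)
```

For a stationary Markov chain the chain rule gives $H(\alpha^{D_n}) = H(\alpha) + (n-1) H(X_0|X_{-1})$, so the gap between $(1/n)H(\alpha^{D_n})$ and the limit is of order $1/n$, matching the boundary term $3mn \log|\alpha| / n^2$ in the proof.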
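Prop 1: Let $T$ be an MP $\mathbb{Z}^d_+$ action and $\alpha$ a (finite, measurable) partition. Then for all $m$,
$$h(T, \alpha) = h(T, \alpha^{D_m}).$$

Proof: Since $(\alpha^{D_m})^{D_n} = \alpha^{D_m + D_n}$ and $D_m + D_n = D_{n+m-1}$,
$$h(T, \alpha^{D_m}) = \lim_{n \to \infty} (1/n^d) H((\alpha^{D_m})^{D_n}) = \lim_{n \to \infty} (1/n^d) H(\alpha^{D_{n+m-1}})$$
$$= \lim_{n \to \infty} \left( \frac{(n+m-1)^d}{n^d} \right) \left( \frac{1}{(n+m-1)^d} \right) H(\alpha^{D_{n+m-1}}) = h(T, \alpha).$$

Prop 2: Let $T$ be an MP $\mathbb{Z}^d_+$ action and $\alpha, \beta$ be (finite, measurable) partitions. Then
$$h(T, \beta) \le h(T, \alpha) + H(\beta|\alpha).$$

Proof: Write $\alpha^i := T^{-i}\alpha$ and $\beta^j := T^{-j}\beta$. Then
$$H(\beta^{D_n}) \le H(\beta^{D_n} \vee \alpha^{D_n}) = H(\alpha^{D_n}) + H(\beta^{D_n}|\alpha^{D_n}).$$
The second term satisfies
$$H(\beta^{D_n}|\alpha^{D_n}) \le \sum_{j \in D_n} H(\beta^j \,|\, \vee_{i \in D_n} \alpha^i) \le \sum_{j \in D_n} H(\beta^j|\alpha^j) = n^d H(\beta|\alpha).$$
Thus, $(1/n^d) H(\beta^{D_n}) \le (1/n^d) H(\alpha^{D_n}) + H(\beta|\alpha)$. Let $n \to \infty$.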
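The finite-$n$ inequality at the end of the proof of Prop 2 can be checked numerically in a toy case. The sketch below is only an illustration under stated assumptions: it takes $d = 1$, a hypothetical three-state stationary Markov chain with transition matrix `P`, lets $\beta$ be the partition by state, and lets $\alpha$ be the coarser partition that merges states 1 and 2 (the map `LUMP`). All names are illustrative, not from the lecture.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical 3-state stationary Markov chain (illustration only).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
pi = [1 / 3, 1 / 3, 1 / 3]  # P is doubly stochastic, so uniform is stationary.
LUMP = {0: 0, 1: 1, 2: 1}   # alpha merges states 1 and 2; beta keeps all three.


def entropy(probs):
    """Shannon entropy (natural log) of a collection of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def word_law(n):
    """Joint law of (X_0, ..., X_{n-1}): the measures of the atoms of beta^{D_n}."""
    law = {}
    for word in itertools.product(range(3), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        law[word] = prob
    return law


def lumped_law(law):
    """Push the word law through LUMP: the measures of the atoms of alpha^{D_n}."""
    out = defaultdict(float)
    for word, prob in law.items():
        out[tuple(LUMP[s] for s in word)] += prob
    return out


def prop2_terms(n):
    """Return ((1/n) H(beta^{D_n}), (1/n) H(alpha^{D_n}), H(beta | alpha))."""
    law = word_law(n)
    one = word_law(1)
    # beta refines alpha, so H(beta | alpha) = H(beta) - H(alpha).
    cond = entropy(one.values()) - entropy(lumped_law(one).values())
    return entropy(law.values()) / n, entropy(lumped_law(law).values()) / n, cond


if __name__ == "__main__":
    for n in range(1, 7):
        hb, ha, cond = prop2_terms(n)
        print(n, hb, ha + cond)  # second column should not exceed the third
```

Here $\beta$ is the finer partition, so $H(\beta|\alpha) > 0$ and the bound is not trivial; at $n = 1$ it holds with equality by property 1.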