Lecture 16:

Recall defns of ergodicity and mixing for Z^d_+ and Z^d actions.

Examples:
1. Z^2 rotations: ergodic if α or β is irrational; never mixing.
2.* Z^2_+ action generated by the doubling and tripling maps: mixing.
3. Z^2 iid shift action: clearly mixing.
4.* Ledrappier 3-dot: mixing, but not mixing of "higher orders".
5.* Ising (with temperature): µ^+ and µ^− are always mixing.

Until 1957, no invariant could distinguish (d = 1) iid(1/2, 1/2) from iid(1/3, 1/3, 1/3).

ENTROPY FOR MPTs AND MP ACTIONS

Defn: Let (M, A, µ) be a probability space. Let α = {A_1, ..., A_n} be a finite measurable partition of M, and

H(α) := − Σ_i µ(A_i) log µ(A_i).

Letting p := (µ(A_1), ..., µ(A_n)), then H(α) = H(p) as defined before.

Restating the earlier Proposition in terms of the entropy of a partition:
0) H(α) ≥ 0.
1) H(α) = 0 iff α is deterministic, i.e., for some (unique) i, µ(A_i) = 1.
2) H(α) is continuous and strictly concave (as a function of p).
3) H(α) ≤ log n, with equality iff each µ(A_i) = 1/n.

Idea: H(α) represents the uncertainty, information, or randomness of α:
— How much uncertainty there is in drawing a sample and observing the A_i to which it belongs.
— How much new information is revealed, on average, when you draw a sample and observe A_i.

Relations between partitions: Let β := {B_1, ..., B_m} be another finite measurable partition on the same probability space.

Defn: β ≤ α: each element of β is a union of elements of α. Note: α is finer than β, and β is coarser than α.

Defn: α ⊥ β: for all i, j, µ(A_i ∩ B_j) = µ(A_i)µ(B_j).

Defn: α ∨ β := {A ∩ B : A ∈ α, B ∈ β}, deleting sets of measure zero.

Defn: If T is an MPT, T^{−1}(α) := {T^{−1}(A) : A ∈ α}.

Note: H(α ∨ β) = − Σ_{i,j} µ(A_i ∩ B_j) log µ(A_i ∩ B_j).

Defn: H(β|α) := − Σ_{i,j} µ(A_i ∩ B_j) log µ(B_j|A_i),

which can be rewritten as

Σ_i µ(A_i) ( − Σ_j µ(B_j|A_i) log µ(B_j|A_i) ).

So H(β|α) is a weighted average of the entropies

H(β|A_i) := − Σ_j µ(B_j|A_i) log µ(B_j|A_i).

Another viewpoint: given a random variable X taking values 1, ..., n, let A_i := X^{−1}({i}) and α := {A_1, ..., A_n}. Up to renaming its values, X is determined by α, and we define H(X) := H(α).

Let X ∼ α, Y ∼ β (on the same probability space).

Dictionary:
β ≤ α: "Y is a function of X"
α ⊥ β: "X and Y are independent"
H(α ∨ β): "H(X, Y)"
H(β|α): "H(Y|X)"

Properties of entropy:

1. H(α ∨ β) = H(α) + H(β|α)

Proof:
LHS = − Σ_{i,j} µ(A_i ∩ B_j) log µ(A_i ∩ B_j)
    = − Σ_{i,j} µ(A_i ∩ B_j) log [µ(B_j|A_i) µ(A_i)]
    = − Σ_{i,j} µ(A_i ∩ B_j) log µ(B_j|A_i) − Σ_{i,j} µ(A_i ∩ B_j) log µ(A_i)
    = H(β|α) − Σ_i µ(A_i) log µ(A_i) = RHS.

"H(X, Y) = H(X) + H(Y|X)"

2. H(β|α) ≤ H(β), with equality iff α ⊥ β

Proof:
H(β|α) = − Σ_{i,j} µ(A_i ∩ B_j) log µ(B_j|A_i)
       = Σ_j µ(B_j) Σ_i µ(A_i|B_j) log(1/µ(B_j|A_i))
       ≤ Σ_j µ(B_j) log( Σ_i µ(A_i|B_j)/µ(B_j|A_i) )   (by Jensen)
       = Σ_j µ(B_j) log( Σ_i µ(A_i)/µ(B_j) )
       = Σ_j µ(B_j) log(1/µ(B_j)) = H(β),

using µ(A_i|B_j)/µ(B_j|A_i) = µ(A_i)/µ(B_j). Equality holds iff for each j, µ(B_j|A_i) is a constant c_j (over i); the only possible constant is c_j = µ(B_j), so µ(B_j ∩ A_i) = µ(B_j)µ(A_i), therefore α ⊥ β.

Note: literally, we have shown that µ(A_i ∩ B_j) = µ(A_i)µ(B_j) only when µ(A_i ∩ B_j) > 0. But it then follows for all (i, j).

"H(Y|X) ≤ H(Y), with equality iff X ⊥ Y."

3. H(β|α) ≥ 0, with equality iff β ≤ α

Proof: H(β|α) is a weighted average (weighted by the µ(A_i)) of the entropies H(β|A_i). Each of these entropies is nonnegative, so H(β|α) ≥ 0. Equality holds iff for each i, H(β|A_i) = 0, iff for each i there is a unique j such that µ(B_j|A_i) = 1, i.e., B_j ⊇ A_i mod 0; that is, iff β ≤ α.
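These identities are easy to sanity-check numerically. Below is a minimal Python sketch (not part of the original notes): starting from an illustrative joint table µ(A_i ∩ B_j), it computes H(α), H(β), H(β|α), and H(α ∨ β) directly from the definitions above and checks Properties 1 and 2. The particular table `joint` is a made-up example, not anything from the lectures.

```python
import numpy as np

def H(p):
    """Shannon entropy of a probability vector, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Illustrative joint table mu(A_i ∩ B_j): rows = cells of alpha, cols = cells of beta.
joint = np.array([[0.20, 0.10, 0.05],
                  [0.05, 0.30, 0.30]])

p_alpha = joint.sum(axis=1)   # mu(A_i)
p_beta  = joint.sum(axis=0)   # mu(B_j)

# H(beta|alpha) as the weighted average of the entropies H(beta|A_i)
H_beta_given_alpha = sum(p_alpha[i] * H(joint[i] / p_alpha[i])
                         for i in range(len(p_alpha)))

# Property 1 (chain rule): H(alpha ∨ beta) = H(alpha) + H(beta|alpha)
assert np.isclose(H(joint.ravel()), H(p_alpha) + H_beta_given_alpha)

# Property 2: H(beta|alpha) <= H(beta); strict here, since alpha and beta
# are not independent in this table
assert H_beta_given_alpha <= H(p_beta)
```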
Lecture 17:

Recall Properties 1–3:

1. H(α ∨ β) = H(α) + H(β|α)
"H(X, Y) = H(X) + H(Y|X)"

2. H(β|α) ≤ H(β), with equality iff α ⊥ β
"H(Y|X) ≤ H(Y), with equality iff X ⊥ Y."

3. H(β|α) ≥ 0, with equality iff β ≤ α
"H(Y|X) ≥ 0, with equality iff Y is a function of X."

4. H(α ∨ β) = H(α) iff β ≤ α

Proof: Follows from 1 and 3.

5. H(α_1 ∨ ··· ∨ α_n) = Σ_{i=1}^n H(α_i | α_1 ∨ ··· ∨ α_{i−1})

Proof: Induction based on 1.

6. H(α_1 ∨ ··· ∨ α_n) ≤ Σ_i H(α_i), with equality iff the α_i are independent.

Proof: Follows from 5 and 2.

7. Assume β ≤ α. Then H(β) ≤ H(α), and equality holds iff β = α.

Proof: By 1,
H(α) = H(α ∨ β) = H(β) + H(α|β).
Since H(α|β) ≥ 0, we have H(β) ≤ H(α). And H(β) = H(α) iff H(α|β) = 0 iff α ≤ β; combined with β ≤ α, this holds iff α = β.

"H(f(X)) ≤ H(X), and equality holds iff f is 1-1 a.e."

8. Assume γ ≤ α. Then H(β|α) ≤ H(β|γ)

Proof:
LHS = Σ_k Σ_{i: A_i ⊆ C_k} Σ_j −µ(A_i ∩ B_j) log µ(B_j|A_i)
RHS = Σ_k Σ_j −µ(B_j ∩ C_k) log µ(B_j|C_k)

We show that the inequality holds term by term for each j and k, or equivalently (after dividing by µ(C_k)),

Σ_{i: A_i ⊆ C_k} (µ(A_i)/µ(C_k)) (−µ(B_j|A_i) log µ(B_j|A_i)) ≤ −µ(B_j|C_k) log µ(B_j|C_k).

Letting f(x) = −x log x, this becomes

Σ_{i: A_i ⊆ C_k} (µ(A_i)/µ(C_k)) f(µ(B_j|A_i)) ≤ f(µ(B_j|C_k)),

which is true by Jensen (f is concave), since

Σ_{i: A_i ⊆ C_k} µ(A_i)/µ(C_k) = 1 and Σ_{i: A_i ⊆ C_k} (µ(A_i)/µ(C_k)) µ(B_j|A_i) = µ(B_j|C_k).

"H(Y|X) ≤ H(Y|f(X))"

9. H(α ∨ β|γ) = H(α|γ) + H(β|α, γ)

10. H(β|γ, α) ≤ H(β|γ), with equality iff α ⊥_γ β, i.e., for each element C of γ, the restrictions of α to C and β to C are independent.

11. If α ⊥_γ (β, δ), then H(β|δ, γ, α) = H(β|δ, γ).

12. If γ ≤ α, then H(γ|β) ≤ H(α|β).

13. If T is an MPT, then for all i, j, k,
H(T^{−i}α ∨ ··· ∨ T^{−j}α) = H(T^{−i−k}α ∨ ··· ∨ T^{−j−k}α).

Defn: The information function of a finite measurable partition α:

I_α(x) := − Σ_i χ_{A_i}(x) log µ(A_i).

Note: H(α) = ∫ I_α dµ.

Defn: Given B, a sub-σ-algebra of A, define

I_{α|B}(x) := − Σ_i χ_{A_i}(x) log µ(A_i|B) = − Σ_i χ_{A_i}(x) log E(χ_{A_i}|B).

Defn: H(α|B) := ∫ I_{α|B} dµ.

Trivial examples:

H(α|A) = − ∫ Σ_i χ_{A_i}(x) log µ(A_i|A) dµ = − ∫ Σ_i χ_{A_i}(x) log χ_{A_i}(x) dµ = 0.

H(α|{M, ∅}) = − ∫ Σ_i χ_{A_i}(x) log µ(A_i|{M, ∅}) dµ = − ∫ Σ_i χ_{A_i}(x) log µ(A_i) dµ = H(α).

Prop: Let α, β be finite measurable partitions. Let B be the σ-algebra generated by β, i.e., the collection of all finite unions of elements of β. Then H(α|B) = H(α|β).

Proof:
µ(A_i|B) = Σ_j χ_{B_j}(x) µ(A_i ∩ B_j)/µ(B_j).

Thus,
I_{α|B}(x) = − Σ_i χ_{A_i}(x) log [ Σ_j χ_{B_j}(x) µ(A_i ∩ B_j)/µ(B_j) ]
           = − Σ_{i,j} χ_{A_i ∩ B_j}(x) log [ µ(A_i ∩ B_j)/µ(B_j) ].

Thus,
H(α|B) = − ∫ Σ_{i,j} χ_{A_i ∩ B_j}(x) log [ µ(A_i ∩ B_j)/µ(B_j) ] dµ
       = − Σ_{i,j} µ(A_i ∩ B_j) log [ µ(A_i ∩ B_j)/µ(B_j) ] = H(α|β).

Correspondence: Finite Partition ←→ Finite σ-algebra
β ↦ σ(β), where σ(β) = unions of elements of β;
B ↦ π(B), where π(B) = { ∩_{B ∈ B} B* : B* = B or B^c } (the unique generating partition for B).

So H(α|B) for finite σ-algebras B is really the same as H(α|β) for finite measurable partitions.

Theorem (continuity of conditional entropy): Let (M, A, µ) be a probability space. Let B_1 ⊂ B_2 ⊂ B_3 ⊂ ··· be sub-σ-algebras and B = σ(∪_{n=1}^∞ B_n). Let α be a finite measurable partition. Then

I_{α|B_n} → I_{α|B} a.e. and in L^1,

and thus H(α|B_n) → H(α|B).

Proof: Main idea: by the Martingale Convergence Theorem, µ(A_i|B_n) → µ(A_i|B) a.e. and in L^1. More details next time.
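To make the theorem concrete, here is a small Python sketch (again not from the notes; all choices are illustrative): on [0, 1) with Lebesgue measure, take α to be the partition into thirds, and let B_n be generated by the dyadic partition β_n into 2^n intervals, so H(α|B_n) = H(α|β_n) by the Prop above. Since σ(∪_n B_n) is the Borel σ-algebra, which contains σ(α), the theorem predicts H(α|β_n) → H(α|B) = 0, and the printed values do decrease toward 0.

```python
import numpy as np

def H(p):
    """Shannon entropy with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# alpha: the partition of [0,1) into thirds (an illustrative choice)
alpha = [(0.0, 1/3), (1/3, 2/3), (2/3, 1.0)]

def cond_entropy_dyadic(alpha, n):
    """H(alpha | beta_n), where beta_n is the partition of [0,1) into 2^n
    dyadic intervals and mu is Lebesgue measure: a weighted average of the
    entropies H(alpha | D_k) over the dyadic cells D_k."""
    w = 2.0 ** (-n)                      # mu(D_k) for each dyadic cell D_k
    total = 0.0
    for k in range(2 ** n):
        lo, hi = k * w, (k + 1) * w
        # conditional probabilities mu(A_i | D_k) = |A_i ∩ D_k| / |D_k|
        p = [max(0.0, min(b, hi) - max(a, lo)) / w for (a, b) in alpha]
        total += w * H(p)
    return total

for n in range(1, 10):
    print(n, cond_entropy_dyadic(alpha, n))   # decreases toward H(alpha|B) = 0
```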