Lecture 20 (Oct. 24)

Comments on the ergodic theorem:

1. Main idea of the proof: break an orbit into pieces, on each of which the time average is at least (nearly) as big as it can be.

2. We recover the strong law of large numbers, since an iid(p) process is ergodic:

      lim_{n→∞} (1/n) Σ_{k=0}^{n−1} X_k = E_p X   a.e.

   (Apply the ergodic theorem to f = X.)

3. We recover the Weyl equidistribution theorem, since irrational rotations of the circle are ergodic.

Theorem (Weyl, 1909): If α ∈ [0, 1] is irrational, then {α, 2α, 3α, ...} is uniformly distributed mod 1 in [0, 1]: for any subinterval (a, b) ⊂ [0, 1],

      |{i ∈ [0, n−1] : iα mod 1 ∈ (a, b)}| / n → b − a.

Proof: Apply the ergodic theorem to the rotation T_α with f = χ_{(a,b)}. (Note: we have rescaled the circle by 2π.)

— For a.e. x ∈ [0, 1], we get |{i ∈ [0, n−1] : x + iα mod 1 ∈ (a, b)}| / n → b − a.
— For a.e. x ∈ [0, 1], we get this statement for all (a, b) with rational a, b.
— For a.e. x ∈ [0, 1], we get this statement for all (a, b). (Why? Sandwich an arbitrary (a, b) between slightly smaller and slightly larger rational intervals; fill in the details.)
— We need only one x for which this holds for all (a, b): the orbit of 0 is a translate of the orbit of x, and translating does not change the limiting frequencies.

Theorem: Let T be an MPT. TFAE:

1. T is ergodic.
6. For all f ∈ L¹,  lim_{n→∞} (1/n) Σ_{i=0}^{n−1} f∘T^i(x) = ∫ f dμ  a.e.
7. For all A, B ∈ 𝒜,  lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(B ∩ T^{−i}(A)) = μ(A)μ(B).
8. Let ℬ be a semi-algebra which generates 𝒜. For all A, B ∈ ℬ,  lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(B ∩ T^{−i}(A)) = μ(A)μ(B).

So, to establish ergodicity for a stationary process it suffices to check cylinder sets.

Proof:
1 ⇒ 6: Apply the Ergodic Theorem.
6 ⇒ 7: Apply 6 with f = χ_A. Then
      lim_{n→∞} (1/n) Σ_{i=0}^{n−1} (χ_A ∘ T^i) χ_B = μ(A) χ_B   a.e.
Integrate, using the bounded convergence theorem.
7 ⇒ 8: Trivial.
8 ⇒ 1: Suppose T^{−1}(A) = A. Let ε > 0 and choose A′ in the algebra generated by ℬ s.t.

      μ(A Δ A′) < ε.      (1)

Then

      |μ(A)² − μ(A)| ≤ |μ(A)² − μ(A′)²|
         + |μ(A′)² − (1/n) Σ_{i=0}^{n−1} μ(T^{−i}(A′) ∩ A′)|
         + (1/n) Σ_{i=0}^{n−1} |μ(T^{−i}(A′) ∩ A′) − μ(T^{−i}(A) ∩ A′)|
         + (1/n) Σ_{i=0}^{n−1} |μ(T^{−i}(A) ∩ A′) − μ(T^{−i}(A) ∩ A)|
      ≤ 2|μ(A) − μ(A′)| + ε + ε + ε < 5ε,

where we used μ(A) = (1/n) Σ_{i=0}^{n−1} μ(T^{−i}(A) ∩ A), since A is invariant; in the last expression, the first ε comes from condition 8 (for n sufficiently large), and the last two come from (1), the fact that T is an MPT, and the triangle inequality. Since ε was arbitrary, μ(A)² − μ(A) = 0, so μ(A) ∈ {0, 1}. QED

Note: If T is an IMPT, we can swap T with T^{−1} everywhere.

Characterize ergodicity of stationary finite-state first-order Markov chains:

Assume the MC is defined by a stochastic matrix P and a stochastic vector p with pP = p. For a cylinder set A = A_{a_0,...,a_ℓ},

      μ(A) = p_{a_0} P_{a_0 a_1} P_{a_1 a_2} ··· P_{a_{ℓ−1} a_ℓ}.

WMA p > 0 (delete states with zero stationary probability; this does not affect the MPT).

Main Fact: Let A_i = {x : x_0 = i}. Then

      μ(σ^{−k}(A_j) ∩ A_i) = p_i (P^k)_{ij}.

Defn: P is irreducible if for all i, j, there exists n = n(i, j) s.t. (P^n)_{ij} > 0.

Defn: The directed graph G = G(P) of P has vertex set V = {1, ..., m} and a directed edge from i to j iff P_{ij} > 0.

Note: (P^n)_{ij} > 0 iff there exists a path in G of length n from i to j. So P is irreducible iff G is strongly connected.

Example 1:
      P = [  0    1    0
            1/3   0   2/3
             0    1    0  ]
is irreducible.

Example 2:
      P = [  1    0    0
             0   1/3  2/3
             0   1/4  3/4 ]
is reducible.

Example 3:
      P = [ 1/3  1/3  1/3
             0   1/3  2/3
             0   1/4  3/4 ]
is reducible.

Theorem: σ is ergodic (w.r.t. (p, P)) iff P is irreducible.

Lecture 21 (Oct. 26)

Lemma (assume p > 0): Q = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} P^k exists (entry by entry).

Note: this is easy to prove by linear algebra if you know that no eigenvalue has modulus > 1 and that the Jordan form is trivial for eigenvalues of modulus 1 (which would follow from Perron–Frobenius).
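The Cesàro limit in the Lemma can be watched numerically on Example 1, whose powers P^k oscillate with period 2 while the averages settle. A sketch (the matrix helpers are ours; the stationary vector p = (1/6, 1/2, 1/3) is obtained by solving pP = p by hand):

```python
def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def cesaro_average(P, n):
    """(1/n) * (P^0 + P^1 + ... + P^(n-1)), computed entry by entry."""
    m = len(P)
    power = [[float(i == j) for j in range(m)] for i in range(m)]  # P^0 = I
    total = [[0.0] * m for _ in range(m)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(m)] for i in range(m)]
        power = mat_mul(power, P)
    return [[x / n for x in row] for row in total]

P = [[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]]   # Example 1 (P^k oscillates, period 2)
Q = cesaro_average(P, 20000)
# every row of Q is close to the stationary vector p = (1/6, 1/2, 1/3)
```

That every row of Q equals p is exactly condition 2 of the theorem proved next; here the averaging kills the period-2 oscillation at rate O(1/n).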
Proof:

      Q_{ij} = (1/p_i) lim_{n→∞} (1/n) p_i Σ_{k=0}^{n−1} (P^k)_{ij}
             = (1/p_i) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(σ^{−k}(A_j) ∩ A_i)
             = (1/p_i) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫ (χ_{A_j} ∘ σ^k) χ_{A_i} dμ
             = (1/p_i) ∫ ( lim_{n→∞} (1/n) Σ_{k=0}^{n−1} χ_{A_j} ∘ σ^k ) χ_{A_i} dμ
             = (1/p_i) ∫ χ*_{A_j} χ_{A_i} dμ

(by the ergodic theorem and the bounded convergence theorem); in particular, the limit exists. QED

Corollary:
— QP = Q = PQ
— Q² = Q
— pQ = p
— Q is stochastic

Theorem: TFAE (assume p > 0):
1. σ is ergodic w.r.t. (p, P).
2. Q_{ij} = p_j for all i, j.
3. P is irreducible.

Proof:

1 ⇒ 2: If σ is ergodic, then
      Q_{ij} = (1/p_i) ∫ χ*_{A_j} χ_{A_i} dμ = (1/p_i) μ(A_j)μ(A_i) = p_j.
(Alternatively, use condition 8 of the previous theorem.)

2 ⇒ 1: If Q_{ij} = p_j, then
      lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(σ^{−k}(A_j) ∩ A_i) = p_i Q_{ij} = p_i p_j,
and so condition 8 holds for cylinder sets with a single coordinate. This easily extends to general cylinder sets: let
      A = A_{a_0,...,a_{ℓ−1}},  μ(A) = p_{a_0} P_{a_0 a_1} ··· P_{a_{ℓ−2} a_{ℓ−1}},
      B = A_{b_0,...,b_{u−1}},  μ(B) = p_{b_0} P_{b_0 b_1} ··· P_{b_{u−2} b_{u−1}}.
For sufficiently large k,
      μ(σ^{−k}(A) ∩ B) = p_{b_0} P_{b_0 b_1} ··· P_{b_{u−2} b_{u−1}} (P^{k−u+1})_{b_{u−1} a_0} P_{a_0 a_1} ··· P_{a_{ℓ−2} a_{ℓ−1}}.
Averaging over k, and using (1/n) Σ_{k=0}^{n−1} (P^{k−u+1})_{b_{u−1} a_0} → Q_{b_{u−1} a_0} = p_{a_0}, we get
      (1/n) Σ_{k=0}^{n−1} μ(σ^{−k}(A) ∩ B) → μ(B) μ(A).

2 ⇒ 3: Q > 0, and so by the definition of Q, for all i, j there exists n s.t. (P^n)_{ij} > 0.

3 ⇒ 2: Q = QP = QP² = QP^n for all n. Fix i, j. First show Q_{ij} > 0:
      Q_{ij} = Σ_k Q_{ik} (P^n)_{kj}.
Since Q is stochastic, there exists k s.t. Q_{ik} > 0; since P is irreducible, there exists n s.t. (P^n)_{kj} > 0. Thus Q_{ij} > 0.
Since Q = Q², Q_{ij} = Σ_k Q_{ik} Q_{kj}: each Q_{ij} is a weighted average, with strictly positive weights, of Q_{1j}, Q_{2j}, ..., Q_{mj}. This holds for all i; in particular, the largest of the numbers Q_{1j}, ..., Q_{mj} is a strictly positive average of all of them, which forces them all to be equal. Thus Q_{ij} =: q_j depends only on j.
But pQ = p. Thus p_j = Σ_k p_k Q_{kj} = Σ_k p_k q_j = q_j. QED

Defn: An MPT T is mixing if for all A, B ∈ 𝒜, μ(T^{−n}(A) ∩ B) → μ(A)μ(B).

Note: mixing implies ergodicity, since convergence implies Cesàro convergence.
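Since the theorem reduces ergodicity to irreducibility, i.e. strong connectivity of G(P), it can be checked mechanically. A sketch (the function names are ours):

```python
def reachable(P, i):
    """States reachable from i in the directed graph G(P): edge u -> v iff P[u][v] > 0."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, w in enumerate(P[u]):
            if w > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(P):
    """P is irreducible iff G(P) is strongly connected."""
    m = len(P)
    return all(reachable(P, i) == set(range(m)) for i in range(m))

P1 = [[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]]       # Example 1: irreducible
P2 = [[1, 0, 0], [0, 1/3, 2/3], [0, 1/4, 3/4]]   # Example 2: reducible (state 1 absorbs)
```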
Note: to check mixing, it suffices to check it on a generating semi-algebra (for the same reason that condition 8 above is equivalent to condition 7).

Note (***): mixing implies: given any finite collection of sets of positive measure A_1, ..., A_m, ∃n ∀i, j, μ(T^{−n}(A_j) ∩ A_i) > 0.

Note: blob picture of mixing.

Check the examples:

1. Rotation of the circle: intuitively, rigid transformations can never be mixing. Subdivide the circle into small arcs A_1, ..., A_m; then (***) fails. So an irrational rotation is ergodic but not mixing.

2. An iid process (one-sided or two-sided) is mixing, by independence (check on cylinder sets).

3 and 4. It follows that the doubling map and the Baker's transformation are mixing (mixing is an isomorphism invariant). Picture of the Baker's transformation.

Lecture 22 (Oct. 28)

Mixing for Markov chains:

Defn: P is primitive if P^n > 0 (entry-wise) for some n.

Graph interpretation: there is a uniform time n at which you can get from any state to any state.

Examples:

      P = [  0    1    0
            1/3   0   2/3
             0    1    0  ]
P^n is never positive: look at the graph (every cycle has even length).

      P = [  0   1/2  1/2
            1/3   0   2/3
             0    1    0  ]
Here P⁴ > 0.

Prop: P is primitive iff P is irreducible and aperiodic (i.e., the gcd of the cycle lengths of G(P) is 1). (Check on the examples.)

Proof: Note that the gcd of cycle lengths = gcd{n : trace(P^n) > 0}.
Only if: P^n > 0 implies P^{n+1} > 0 (think of the graph interpretation), and gcd(n, n+1) = 1.
If: Special case: there exists a self-loop; connect i to j via the self-loop at k. In general, use a combination of cycle lengths that are relatively prime.

TFAE:
1. σ is mixing.
2. (P^k)_{ij} → p_j.
3. P is primitive.

So the first example above is ergodic but not mixing, and the second is mixing.

Proof:
1 ⇒ 2: Let A_i and A_j be thin cylinder sets. Mixing implies p_i (P^k)_{ij} = μ(σ^{−k}(A_j) ∩ A_i) → p_i p_j, thus (P^k)_{ij} → p_j.
2 ⇒ 1: Just like the ergodicity proof: verify on cylinder sets. For thin cylinder sets A_i, A_j, show that μ(σ^{−n}(A_j) ∩ A_i) → μ(A_i)μ(A_j); extend to general cylinder sets.
2 ⇒ 3: Follows since p > 0.
3 ⇒ 2: A geometric contraction proof. Let W = {(x_1, ..., x_d) ∈ R^d : x_i ≥ 0, Σ_i x_i = 1}.
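Before the contraction argument, a quick computational rendering of primitivity: P is primitive iff some power is entrywise positive, and for an m×m matrix it suffices to check exponents up to (m−1)² + 1 (Wielandt's bound; the bound is not in the notes but keeps the loop finite). A sketch:

```python
def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def is_primitive(P):
    """True iff some power P^n, n up to Wielandt's bound, is entrywise positive."""
    m = len(P)
    power = P
    for _ in range((m - 1) ** 2 + 1):
        if all(x > 0 for row in power for x in row):
            return True
        power = mat_mul(power, P)
    return False

P_periodic  = [[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]]      # irreducible, period 2
P_primitive = [[0, 1/2, 1/2], [1/3, 0, 2/3], [0, 1, 0]]  # P^4 > 0
```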
P acts on W: x ↦ xP. This is well-defined since xP ≥ 0 and xP · 𝟙 = x · (P𝟙) = x · 𝟙 = 1.

Note: the sets P^k(W) are nested decreasing.

Claim: Δ ≡ ∩_{k=0}^∞ P^k(W) is a single point {z}.
— If true, then z = p, because p = pP^k ∈ P^k(W) for all k, so p ∈ Δ.
— Since the nested compact sets P^k(W) shrink to the single point p, for all i we get e_i P^k → p, and so (P^k)_{ij} → p_j.

Proof of claim for d = 2: Δ is a closed subinterval of W. If it is not a single point, then its two endpoints x, y are linearly independent, and (since P maps Δ onto Δ, permuting the endpoints) both are fixed by P². Thus P² is the identity, contrary to primitivity.

Proof of claim for general d: Use the contraction mapping theorem w.r.t. the Hilbert metric on the interior W°:

      ρ(v, w) = max_{i,j} log [ (w_i/w_j) / (v_i/v_j) ].

If P^n > 0, then P^n(W) is a compact subset of W°. Apply the contraction mapping theorem:
— P has a unique fixed point in W° (which must be p), and for all x ∈ P^n(W), xP^k → p;
— apply this to each x = e_i P^n. QED

(One can also apply the renewal theorem.)

Let A be a nonnegative square matrix, i.e., all entries are nonnegative. Define irreducible, primitive and aperiodic for A in exactly the same way as for stochastic matrices.

Skipped for now:

The spectral radius of A, denoted λ(A), is the max of the absolute values of the eigenvalues.

Perron–Frobenius Theorem: Let A be an irreducible matrix with spectral radius λ(A). Then
1. λ(A) > 0 and λ(A) is an eigenvalue of A.
2. λ(A) is a simple eigenvalue.
3. λ(A) has (strictly) positive eigenvectors (left is v, right is w).
4. If A is primitive, then A^n / λ(A)^n → (w_i v_j)_{ij} (normalized so that v · w = 1).
5. If A is primitive, then λ(A) is the only eigenvalue of modulus λ(A).

Lecture 23 (Oct. 31)

ENTROPY FOR MPT'S

Let (M, 𝒜, μ) be a probability space. Let α = {A_1, ..., A_n} be a FINITE MEASURABLE PARTITION of M.

Defn: H(α) = H(X), where X is a r.v. with distribution p = (μ(A_1), ..., μ(A_n)):

      H(α) = − Σ_i μ(A_i) log μ(A_i).

(Schematic.)

Defn: H(α|β) = H(X|Y), where (X, Y) are jointly distributed r.v.'s with P(X = i, Y = j) = μ(A_i ∩ B_j):

      H(α|β) = − Σ_{i,j} μ(A_i ∩ B_j) log [ μ(A_i ∩ B_j) / μ(B_j) ].

(Schematic.)
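The two entropy definitions above can be exercised on a small joint table. The sketch below (the joint measures are made-up values) computes H(α), H(α ∨ β) and H(β|α), and confirms the chain rule H(α ∨ β) = H(α) + H(β|α), which reappears as Property 1 of the list that follows:

```python
import math

def entropy(ps):
    """Entropy of a finite distribution (natural log)."""
    return -sum(p * math.log(p) for p in ps if p > 0)

# made-up joint measures mu(A_i ∩ B_j): a 2-set partition alpha, 3-set partition beta
joint = [[0.10, 0.20, 0.10],
         [0.25, 0.05, 0.30]]

mu_A = [sum(row) for row in joint]                   # mu(A_i)
H_alpha = entropy(mu_A)
H_join = entropy([p for row in joint for p in row])  # H(alpha v beta)
# H(beta|alpha) = -sum mu(A_i ∩ B_j) log[ mu(A_i ∩ B_j) / mu(A_i) ]
H_beta_given_alpha = -sum(p * math.log(p / mu_A[i])
                          for i, row in enumerate(joint) for p in row if p > 0)
# chain rule: H_join equals H_alpha + H_beta_given_alpha
```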
Relations between partitions:

1. We identify two partitions if they agree, element by element, up to sets of measure zero, i.e., α = β means α = {A_1, ..., A_n}, β = {B_1, ..., B_n} with μ(A_i Δ B_i) = 0 for all i. Note: H(α) is then well-defined.

2. β ≼ α: each element of β is a union of elements of α. Note: α is finer than β, and β is coarser than α.

3. α ∨ β = {A ∩ B : A ∈ α, B ∈ β}, deleting sets of measure zero. Note: this is like forming the joint distribution of two r.v.'s.

4. α ⊥ β: μ(A_i ∩ B_j) = μ(A_i)μ(B_j) for all i, j.

5. If T is an MPT, T^{−1}(α) = {T^{−1}(A) : A ∈ α}.

TRANSLATE ENTROPY PROPERTIES (from r.v.'s to partitions): X ≈ α, Y ≈ β.

Properties II:
1. H(α ∨ β) = H(α) + H(β|α)
2. H(β|α) ≤ H(β), with equality iff α ⊥ β
3. H(β|α) ≥ 0, with equality iff β ≼ α
4. H(α ∨ β) = H(α) iff β ≼ α
5. H(α_1 ∨ ··· ∨ α_n) = Σ_{i=1}^n H(α_i | α_1 ∨ ··· ∨ α_{i−1})
6. H(α_1 ∨ ··· ∨ α_n) ≤ Σ_i H(α_i), with equality iff the α_i are independent
7. If β ≼ α, then H(β) ≤ H(α), with equality iff β = α
8. If γ ≼ α, then H(β|α) ≤ H(β|γ)
9. H(α ∨ β | γ) = H(α|γ) + H(β | α ∨ γ)
10. H(β | γ ∨ α) ≤ H(β|γ), with equality iff α ⊥_γ β
11. If α ⊥_γ (β ∨ δ), then H(β | δ ∨ γ ∨ α) = H(β | δ ∨ γ)
12. If γ ≼ α, then H(γ|β) ≤ H(α|β)
13. If T is an MPT, then for all i, j, k, H(T^{−i}α ∨ ··· ∨ T^{−j}α) = H(T^{−i−k}α ∨ ··· ∨ T^{−j−k}α)

We want to define H(α|ℬ), where α is a finite partition and ℬ is a sub-σ-algebra of 𝒜.

Recall the information function of a r.v.: I_X(x) = − log p(x). Correspondingly,

      I_α(x) = − Σ_i χ_{A_i}(x) log μ(A_i).

Recall H(α) = ∫ I_α dμ.

Defn:

      I_{α|ℬ}(x) = − Σ_i χ_{A_i}(x) log μ(A_i|ℬ)(x) = − Σ_i χ_{A_i}(x) log E(χ_{A_i}|ℬ)(x)

(conditional expectation).

Defn: H(α|ℬ) = ∫ I_{α|ℬ} dμ.

We want this to agree with H(α|β): given a finite partition β, let ℬ = ℬ(β) be the collection of all finite unions of elements of β, which is a σ-algebra.

Claim: H(α|ℬ) = H(α|β).

Proof:

      μ(A_i|ℬ) = Σ_j χ_{B_j}(x) μ(A_i ∩ B_j)/μ(B_j),

since the RHS is ℬ-measurable and for all j,

      ∫_{B_j} RHS dμ = μ(A_i ∩ B_j) = ∫_{B_j} χ_{A_i} dμ.

Thus,

      I_{α|ℬ}(x) = − Σ_i χ_{A_i}(x) log Σ_j χ_{B_j}(x) μ(A_i ∩ B_j)/μ(B_j)
                 = − Σ_{i,j} χ_{A_i ∩ B_j}(x) log [ μ(A_i ∩ B_j) / μ(B_j) ].
Thus,

      H(α|ℬ) = − ∫ Σ_{i,j} χ_{A_i ∩ B_j}(x) log [ μ(A_i ∩ B_j)/μ(B_j) ] dμ
             = − Σ_{i,j} μ(A_i ∩ B_j) log [ μ(A_i ∩ B_j)/μ(B_j) ] = H(α|β).   QED

NOTE: Every finite sub-σ-algebra of 𝒜 (which is simply a finite algebra) has a unique generating partition.

Correspondence: finite partition ←→ finite σ-algebra:
      β ↦ σ(β), where σ(β) = all unions of elements of β;
      ℬ ↦ π(ℬ), where π(ℬ) = { ∩_{B ∈ ℬ} B̃ : each B̃ = B or B^c } (discarding empty intersections).

Lecture 24 (Nov. 7)

RECALL: (M, 𝒜, μ): probability space; α: finite measurable partition; ℬ: sub-σ-algebra.

Defn (conditional information function):

      I_{α|ℬ}(x) = − Σ_i χ_{A_i}(x) log μ(A_i|ℬ)(x) = − Σ_i χ_{A_i}(x) log E(χ_{A_i}|ℬ)(x).

Defn (conditional entropy):

      H(α|ℬ) = ∫ I_{α|ℬ} dμ.

Recall: If β is a (finite, measurable) partition, then H(α|σ(β)) = H(α|β); if ℬ is a finite σ-algebra, then H(α|ℬ) = H(α|π(ℬ)).

NOTE: I_{α|ℬ} ≠ E(I_α|ℬ). Reason:
— I_{α|ℬ} is typically not ℬ-measurable;
— E(I_α|ℬ) is ℬ-measurable.
Also, that alternative would be a poor choice: ∫ E(I_α|ℬ) dμ = ∫ I_α dμ = H(α), not H(α|ℬ).

Trivial examples: H(α|𝒜) = 0; H(α|{M, ∅}) = H(α).

Theorem (continuity of conditional entropy): Let ℬ be a sub-σ-algebra of 𝒜, let ℬ_1 ⊂ ℬ_2 ⊂ ℬ_3 ⊂ ··· be σ-algebras with ℬ = σ(∪_{n=1}^∞ ℬ_n), and let α be a finite measurable partition. Then

      I_{α|ℬ_n} → I_{α|ℬ}   a.e. and in L¹,

and thus H(α|ℬ_n) → H(α|ℬ).

Proof: (μ(A|ℬ_n), ℬ_n) = (E(χ_A|ℬ_n), ℬ_n) is a martingale, since ℬ_n ↑. By a version of the Martingale Convergence Theorem, μ(A|ℬ_n) → μ(A|ℬ) a.e. Thus

      I_{α|ℬ_n} = − Σ_i χ_{A_i} log μ(A_i|ℬ_n) → − Σ_i χ_{A_i} log μ(A_i|ℬ) = I_{α|ℬ}   a.e.

For L¹ convergence, it suffices to show that f* ≡ sup_n I_{α|ℬ_n} ∈ L¹; then apply the DCT.

Note: the L¹ convergence can also be obtained from another version of the Martingale Convergence Theorem, but we will need f* ∈ L¹ anyway for the proof of the SMB Theorem.

To show f* ∈ L¹, we show that μ(f* > t) dies exponentially fast as t → ∞. Fix A = A_i ∈ α and t > 0. Let f_{n,A}(x) = − log μ(A|ℬ_n)(x), and let

      C_{n,A}(t) = {x : f_{i,A}(x) ≤ t for i = 1, ..., n−1, and f_{n,A}(x) > t}.
Since C_{n,A}(t) ∈ ℬ_n,

      μ(A ∩ C_{n,A}(t)) = ∫_{C_{n,A}(t)} χ_A dμ = ∫_{C_{n,A}(t)} μ(A|ℬ_n) dμ
                        = ∫_{C_{n,A}(t)} e^{−f_{n,A}(x)} dμ ≤ e^{−t} μ(C_{n,A}(t)).

Thus,

      μ(A ∩ {f* > t}) = Σ_{n=1}^∞ μ(A ∩ C_{n,A}(t)) ≤ e^{−t} Σ_{n=1}^∞ μ(C_{n,A}(t)) ≤ e^{−t}.

Thus μ({f* > t}) ≤ M e^{−t}, where M is the number of sets in α. Thus

      ∫ f* dμ = ∫_0^∞ μ(f* > t) dt ≤ M ∫_0^∞ e^{−t} dt < ∞.   QED

Now we can formulate another list of entropy inequalities, involving conditioning on σ-algebras.

Defn: A measure space has a countable basis if there is a countable collection 𝒞 of measurable sets s.t. for all A ∈ 𝒜 and ε > 0, there exists C ∈ 𝒞 s.t. μ(A Δ C) < ε.

Fact: (M, 𝒜, μ) has a countable basis iff L²(M, 𝒜, μ) is separable, i.e., has a countable dense set.

Fact: The Borel σ-algebra of any separable metric space has this property, and so does its completion.

Fact: Given ℬ ⊆ 𝒜, the map L²(M, 𝒜, μ) → L²(M, ℬ, μ), f ↦ E(f|ℬ), is continuous (it is an orthogonal projection).

Corollary: Let ℬ ⊆ 𝒜. If (M, 𝒜, μ) has a countable basis, so does (M, ℬ, μ).

Let (M, 𝒜, μ) have a countable basis and let ℬ be a sub-σ-algebra. Let {C_1, C_2, ...} be a countable basis for ℬ, and let ℬ_n be the σ-algebra (equivalently, the algebra) generated by {C_1, ..., C_n}. Then ℬ_n ↑ ℬ. Let β_n denote the partition determined by ℬ_n. By the continuity theorem,

      H(α|β_n) = H(α|ℬ_n) → H(α|ℬ).

Note: β_n ≼ β_{n+1}, so in fact H(α|ℬ_n) ↓ H(α|ℬ).

Apply this to derive the new list of properties from the old list.

Recall Property 12: if γ ≼ α, then H(γ|β) ≤ H(α|β).
Prop: If γ ≼ α, then H(γ|ℬ) ≤ H(α|ℬ).
Proof: H(γ|β_n) ≤ H(α|β_n); let n → ∞. QED

Recall Property 8: if γ ≼ δ, then H(α|δ) ≤ H(α|γ).
Prop: If 𝒞 ⊆ 𝒟, then H(α|𝒟) ≤ H(α|𝒞).
Proof: Choose γ_n, δ_n s.t. σ(γ_n) ↑ 𝒞 and σ(δ_n) ↑ 𝒟. Let η_n = γ_n ∨ δ_n. Then γ_n ≼ η_n, σ(γ_n) ↑ 𝒞 and σ(η_n) ↑ 𝒟. Thus

      H(α|𝒟) = lim_n H(α|η_n) ≤ lim_n H(α|γ_n) = H(α|𝒞).   QED
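The monotonicity H(α|ℬ_n) ↓ H(α|ℬ) along refining partitions can be seen concretely on a finite space. A sketch (the measure and the partitions are made-up):

```python
import math

def cond_entropy(mu, alpha, beta):
    """H(alpha|beta) for partitions of a finite set, given as lists of blocks."""
    h = 0.0
    for B in beta:
        mu_B = sum(mu[x] for x in B)
        for A in alpha:
            mu_AB = sum(mu[x] for x in A & B)
            if mu_AB > 0:
                h -= mu_AB * math.log(mu_AB / mu_B)
    return h

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.15, 4: 0.25}   # made-up measure on 5 points
alpha = [{0, 2, 4}, {1, 3}]
beta1 = [{0, 1, 2, 3, 4}]               # trivial partition: H(alpha|beta1) = H(alpha)
beta2 = [{0, 1, 2}, {3, 4}]
beta3 = [{0, 1}, {2}, {3, 4}]           # beta3 refines beta2
hs = [cond_entropy(mu, alpha, b) for b in (beta1, beta2, beta3)]
# hs is decreasing: finer conditioning gives smaller conditional entropy
```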
Lecture 25 (Nov. 9)

Recall: We are assuming (M, 𝒜, μ) has a countable basis, and thus so does (M, ℬ, μ), where ℬ is a sub-σ-algebra. Thus there is a sequence of partitions β_1 ≼ β_2 ≼ ··· with corresponding (finite) σ-algebras ℬ_n = σ(β_n) ↑ ℬ, and for a finite partition α, H(α|ℬ_n) = H(α|β_n). Then, by continuity of information/entropy,

      I_{α|ℬ_n} → I_{α|ℬ}   a.e. and in L¹,   and thus H(α|ℬ_n) → H(α|ℬ).

Recall Property 2: H(α|β) ≤ H(α), with equality iff α ⊥ β.
Property 2′: H(α|ℬ) ≤ H(α), with equality iff α ⊥ ℬ (iff σ(α) ⊥ ℬ).

Proof:
H(α|ℬ) ≤ H(α): since ℬ_n ↑ ℬ, H(α|ℬ) = lim_n H(α|ℬ_n) = lim_n H(α|β_n) ≤ H(α).
Equality iff ℬ ⊥ α:
If: ℬ ⊥ α implies β_n ⊥ α for all n, so H(α|β_n) = H(α) and H(α|ℬ) = lim_n H(α|β_n) = H(α).
Only if: H(α|β_n) ↓ H(α|ℬ), so H(α|ℬ) ≤ H(α|β_n) ≤ H(α). So if H(α|ℬ) = H(α), then H(α|β_n) = H(α) for all n, so α ⊥ β_n for all n. Thus α ⊥ ℬ. QED

Recall Property 3: H(α|β) ≥ 0, with equality iff α ≼ β.
Property 3′: H(α|ℬ) ≥ 0, with equality iff α ⊆ ℬ (iff σ(α) ⊆ ℬ).

Proof: Recall

      H(α|ℬ) = ∫ Σ_i −χ_{A_i}(x) log μ(A_i|ℬ)(x) dμ.

If: If each A_i ∈ ℬ, then E(χ_{A_i}|ℬ) = χ_{A_i}. Thus

      H(α|ℬ) = ∫ Σ_i −χ_{A_i}(x) log χ_{A_i}(x) dμ = 0,

since χ_{A_i} takes on only the values 0 and 1.

Only if: If H(α|ℬ) = 0, then since each term in the formula for H is nonnegative, all must be 0, and so each μ(A_i|ℬ)(x) is {0, 1}-valued. Hence μ(A_i|ℬ) = χ_C for some C ∈ ℬ.

Claim: A_i = C ∈ ℬ (mod 0).

Proof of claim: By properties of conditional expectation,

      μ(A_i ∩ C) = ∫_C χ_{A_i} = ∫_C μ(A_i|ℬ) = ∫_C χ_C = μ(C)

and

      μ(A_i ∩ C^c) = ∫_{C^c} χ_{A_i} = ∫_{C^c} μ(A_i|ℬ) = ∫_{C^c} χ_C = 0.

So, up to a set of measure zero, A_i ⊆ C; and since μ(A_i ∩ C) = μ(C), we have A_i = C mod 0. QED

Properties III:
1. H(α ∨ β) = H(α) + H(β|α)
2. H(β|ℱ) ≤ H(β), with equality iff ℱ ⊥ β
3. H(β|ℱ) ≥ 0, with equality iff β ⊆ ℱ
4. H(α ∨ β) = H(α) iff β ≼ α
5. H(α_1 ∨ ··· ∨ α_n) = Σ_{i=1}^n H(α_i | α_1 ∨ ··· ∨ α_{i−1})
6. H(α_1 ∨ ··· ∨ α_n) ≤ Σ_i H(α_i), with equality iff the α_i are independent
7. If β ≼ α, then H(β) ≤ H(α), with equality iff β = α
8. If ℰ ⊆ ℱ, then H(β|ℱ) ≤ H(β|ℰ)
9. H(α ∨ β | ℱ) = H(α|ℱ) + H(β | σ(α ∪ ℱ)) ≤ H(α|ℱ) + H(β|ℱ)
10. H(β | σ(ℱ ∪ ℰ)) ≤ H(β|ℱ), with equality iff ℰ ⊥_ℱ β
11. If 𝒢 ⊥_ℱ σ(β ∪ ℰ), then H(β | σ(ℰ ∪ ℱ ∪ 𝒢)) = H(β | σ(ℰ ∪ ℱ)).
12. If γ ≼ α, then H(γ|ℬ) ≤ H(α|ℬ).
13. If T is an MPT, then for all i, j, k, H(T^{−i}α ∨ ··· ∨ T^{−j}α) = H(T^{−i−k}α ∨ ··· ∨ T^{−j−k}α).
14. If T is an MPT, then for all i, j, k, H(T^{−i}α ∨ ··· ∨ T^{−j}α | ℱ) = H(T^{−i−k}α ∨ ··· ∨ T^{−j−k}α | T^{−k}(ℱ)).

NOTE: we get similar properties for I_α, I_{α|ℬ}.

NOTE:

      H(α|ℬ) = ∫ Σ_i −χ_{A_i}(x) log μ(A_i|ℬ)(x) dμ = ∫ Σ_i −μ(A_i|ℬ)(x) log μ(A_i|ℬ)(x) dμ

(since whenever g is ℬ-measurable, we have ∫ f g dμ = ∫ E(f|ℬ) g dμ).

Recall: For a stationary process X,

      h(X) = lim_{n→∞} H(X_1, ..., X_n)/n = lim_{n→∞} H(X_n | X_{n−1}, ..., X_1).

Given an MPT T and a partition α = {A_1, ..., A_M}, we associate the stationary process taking values in {1, ..., M} defined by

      P(X_0 = x_0, ..., X_n = x_n) = μ(A_{x_0} ∩ T^{−1}(A_{x_1}) ∩ ··· ∩ T^{−n}(A_{x_n})).

Define h(T, α) = h(X). Equivalently, with α_i = T^{−i}(α),

      h(T, α) = lim_{n→∞} (1/n) H(α_0 ∨ ··· ∨ α_{n−1}) = lim_{n→∞} H(α_0 | α_1 ∨ ··· ∨ α_n).

Both limits are non-increasing.

Let α_m^n = ∨_{i=m}^n α_i and α_m^∞ = σ(∪_{i=m}^∞ α_i). Using continuity of conditional entropy, we see:

      h(T, α) = ∫ I_{α|α_1^∞} dμ = H(α | α_1^∞).

Define h(T) = sup_α h(T, α). We will see how to compute h(T), at least in some cases. For right now, focus on h(T, α).

SMB Theorem: Let T be an ergodic MPT and α a (finite, measurable) partition. Then

      (1/n) I_{α_0^n} → h(T, α) = H(α | α_1^∞)   a.e. and in L¹.

The proof will use the Ergodic Theorem and the theorem on convergence of conditional information.

NOTE: Let α(x) be the index of the element of α to which x belongs, and let

      μ_n(x) = μ( ∩_{i=0}^n T^{−i}(A_{α(T^i(x))}) ).

Then I_{α_0^n}(x) = − log μ_n(x), and the SMB theorem says that μ_n(x) ≈ e^{−n h(T,α)}. In stochastic language (with log base 2),

      P(X_0 = x_0, ..., X_n = x_n) ≈ 2^{−n h(X)}.

NOTE: There is a version of SMB for non-ergodic MPT's.
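For a mixing Markov chain, the SMB limit h(T, α) (with α the time-0 partition) is the familiar entropy rate −Σ_i p_i Σ_j P_{ij} log P_{ij}. The sketch below (the 2-state matrix is a made-up example) simulates one long trajectory and compares −(1/n) log μ_n(x) with h:

```python
import math, random

P = [[0.9, 0.1], [0.4, 0.6]]    # made-up 2-state chain
p = [0.8, 0.2]                  # stationary: pP = p
h = -sum(p[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2))

random.seed(0)
n = 200000
x = 0
logprob = math.log(p[x])        # log of the cylinder measure mu_n along the orbit
for _ in range(n):
    y = 0 if random.random() < P[x][0] else 1
    logprob += math.log(P[x][y])
    x = y
estimate = -logprob / n         # -(1/n) log mu_n(x) for this trajectory
# by SMB, estimate should be close to the entropy rate h
```

The initial term log p_{x_0} is washed out by the 1/n factor, exactly as in the a.e. statement of the theorem.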