Lecture 26: Nov. 14

SMB Theorem. Let T be an ergodic MPT and α a (finite, measble.) partition. Then

(1/n) I_{α_0^n} → h(T, α) = ∫ I_{α|α_1^∞} dμ = H(α|α_1^∞) a.e. and in L^1.

History:
- Shannon (1948): irreducible stationary Markov chains; convergence in probability.
- McMillan (1953): ergodic stationary processes; convergence in L^1.
- Breiman (1957): ergodic stationary processes; convergence a.e.

Proof: Recall:
- α(x) is defined by x ∈ A_{α(x)}.
- I_α(x) = -log μ(A_{α(x)})
- I_{α|β}(x) = -log μ(A_{α(x)} | B_{β(x)})

Lemma 1: I_{α∨β} = I_β + I_{α|β}

Proof:

I_{α∨β}(x) = -log μ(A_{α(x)} ∩ B_{β(x)}) = -log [ μ(A_{α(x)} ∩ B_{β(x)}) / μ(B_{β(x)}) ] - log μ(B_{β(x)}) = I_{α|β}(x) + I_β(x)

Lemma 2: I_α ∘ T = I_{T^{-1}(α)}

Proof:

I_α ∘ T(x) = -log μ(A_{α(T(x))}) = -log μ(A_{T^{-1}(α)(x)}) = I_{T^{-1}(α)}(x)

Thus,

I_{α ∨ T^{-1}α} = I_{T^{-1}(α)} + I_{α|T^{-1}α} = I_{α|T^{-1}α} + I_α ∘ T

By induction:

I_{α_0^n} = I_{α|α_1^n} + I_{α|α_1^{n-1}} ∘ T + ... + I_{α|α_1^1} ∘ T^{n-1} + I_α ∘ T^n = Σ_{k=0}^n I_{α|α_1^{n-k}} ∘ T^k

Hence

(1/n) I_{α_0^n} = (1/n) Σ_{k=0}^n I_{α|α_1^{n-k}} ∘ T^k
              = (1/n) Σ_{k=0}^n I_{α|α_1^∞} ∘ T^k + (1/n) Σ_{k=0}^n [ I_{α|α_1^{n-k}} ∘ T^k - I_{α|α_1^∞} ∘ T^k ]
              = R_n + S_n

Since I_{α|α_1^∞} ∈ L^1 and ∫ I_{α|α_1^∞} dμ = H(α|α_1^∞), we can apply the Ergodic Theorem to get R_n → H(α|α_1^∞) a.e. and in L^1.

So it suffices to show that S_n → 0 a.e. and in L^1. (Note: in the IID case, S_n = 0.)

For L^1 convergence, observe:

∫ |S_n| dμ ≤ (1/n) Σ_{k=0}^n ∫ | I_{α|α_1^{n-k}} ∘ T^k - I_{α|α_1^∞} ∘ T^k | dμ = (1/n) Σ_{k=0}^n ∫ | I_{α|α_1^{n-k}} - I_{α|α_1^∞} | dμ

which converges to 0 by the L^1 convergence in the theorem on continuity of conditional information (in fact, we only need the Cesaro convergence in L^1 from that theorem).

Remains to prove: S_n → 0 a.e. (the interesting part).

Proof: Let h_N = sup_{n≥N} | I_{α|α_1^n} - I_{α|α_1^∞} |. By the theorem on continuity of conditional information, h_N → 0 a.e. and h_N ≤ 2f*, where f* = sup_n I_{α|α_1^n} ∈ L^1. By the DCT,

∫ h_N dμ → 0

Fix N. Since | I_{α|α_1^{n-k}} - I_{α|α_1^∞} | ≤ h_N for k ≤ n - N,

limsup |S_n| ≤ limsup( (1/n) Σ_{k=0}^{n-N} h_N ∘ T^k + (1/n) Σ_{k=n-N+1}^n h_1 ∘ T^k ) = limsup( U_n + V_n )

Let n → ∞. By the ergodic theorem applied to h_N, U_n → ∫ h_N dμ a.e. And V_n consists of the last N terms of (1/n) Σ_{k=0}^n h_1 ∘ T^k, which converges by the ergodic theorem. So V_n → 0 a.e.
Thus, limsup |S_n| ≤ ∫ h_N dμ. Let N → ∞. Thus, limsup |S_n| = 0 a.e. QED

Recall:

h(T, α) = lim_n (1/n) H(∨_{i=0}^n α_i)
h(T) = sup_α h(T, α)

Fact: h(T) is an isomorphism invariant, i.e., if T and S are isomorphic, then h(T) = h(S).

Proof: An isomorphism φ : M → N sets up a bijection between finite measble. partitions of M and those of N: α ↦ φ(α), with h(T, α) = h(S, φ(α)).

Prop 1: Let T be an MPT, α a (finite, measble.) partition and α_i = T^{-i}(α). Then for all m, h(T, α) = h(T, ∨_{i=0}^m α_i).

Proof: h(T, ∨_{i=0}^m α_i) = lim_n (1/n) H(∨_{i=0}^{n+m} α_i) = lim_n (1/n) H(∨_{i=0}^n α_i) = h(T, α)

Prop 2: Let T be an MPT, α, β (finite, measble.) partitions. Then h(T, β) ≤ h(T, α) + H(β|α).

Proof:

H(∨_{j=0}^n β_j) ≤ H((∨_{i=0}^n β_i) ∨ (∨_{i=0}^n α_i)) = H(∨_{i=0}^n α_i) + H(∨_{j=0}^n β_j | ∨_{i=0}^n α_i)

H(∨_{j=0}^n β_j | ∨_{i=0}^n α_i) ≤ Σ_{j=0}^n H(β_j | ∨_{i=0}^n α_i) ≤ Σ_{j=0}^n H(β_j | α_j)

By stationarity, each H(β_j | α_j) = H(β | α). Thus,

H(∨_{j=0}^n β_j) ≤ H(∨_{i=0}^n α_i) + (n+1) H(β | α)

Thus, (1/n) H(∨_{i=0}^n β_i) ≤ (1/n) H(∨_{i=0}^n α_i) + ((n+1)/n) H(β | α). Let n → ∞.

Lecture 27: Nov. 16

Recall:

Prop 1: Let T be an MPT, α a (finite, measble.) partition and α_i = T^{-i}(α). Then for all m, h(T, α) = h(T, ∨_{i=0}^m α_i).

Prop 2: Let T be an MPT, α, β (finite, measble.) partitions. Then h(T, β) ≤ h(T, α) + H(β|α).

Notation: ∨_{i=0}^∞ α_i = σ(∪_{i=0}^∞ α_i)

Defn: A 1-sided generator for an MPT T on (M, A, μ) is a (finite, measble.) partition α such that ∨_{i=0}^∞ α_i = A.

Theorem (Kolmogorov-Sinai, 1958-1959). Let α be a 1-sided generator for an MPT T. Then h(T) = h(T, α) (and so we have some hope of computing h(T)).

History:
- Kolmogorov (1958): slightly different notion, for a restricted class of processes; not quite right; but in a 1957 lecture gave the correct defn. for iid.
- Sinai (1959): correct definition, completely general.

Proof: Enough to show that for any (finite, measble.) partition β, we have h(T, β) ≤ h(T, α). By Prop 2 applied to ∨_{i=0}^n α_i and then Prop 1, for all n,

h(T, β) ≤ h(T, ∨_{i=0}^n α_i) + H(β | ∨_{i=0}^n α_i) = h(T, α) + H(β | ∨_{i=0}^n α_i)

But σ(∨_{i=0}^n α_i) ↑ A.
By continuity of conditional entropy, H(β | ∨_{i=0}^n α_i) → H(β|A). But since β ⊂ A, we have, by Property 3, H(β|A) = 0. Thus, h(T, β) ≤ h(T, α). QED

Example: If T is the MPT corresponding to a one-sided stationary process X, then h(T) = h(X).

Proof: Recall that h(T, α) = h(X) where α is the partition A_i = {x ∈ F^{Z^+} : x_0 = i}. Clearly α is a 1-sided generator.

So:

h(T_{iid^+(p)}) = H(p)
h(T_{iid^+(1/2,1/2)}) = log 2, h(T_{iid^+(1/3,1/3,1/3)}) = log 3
h(T_{Doubling}) = log 2. (Show a one-sided generator for the doubling map.)

For Markov P with stationary vector π:

h(T_{P^+}) = -Σ_{ij} π_i p_{ij} log p_{ij}

Theorem: Let T be an invertible MPT. If T has a 2-sided generator α, i.e., ∨_{i=-∞}^∞ α_i = A, then h(T) = h(T, α).

Proof: In Prop 1, replace ∨_{i=0}^m α_i with ∨_{i=-m}^m α_i.

Example: If T is the MPT corresponding to a two-sided stationary process X, then h(T) = h(X).

So, h(T_{iid(p)}) = H(p):

h(T_{iid(1/2,1/2)}) = log 2, h(T_{iid(1/3,1/3,1/3)}) = log 3

Thus, these MPT's are not isomorphic.

Existence of two-sided generators (Krieger, 1970): if T is an invertible ergodic MPT and h(T) is finite, then it has a two-sided generator with at most ⌊e^{h(T)}⌋ + 1 sets.

Note: this bound is necessary:

Proof: If α is a two-sided generator, then

h(T) = h(T, α) = H(α|α_1^∞) ≤ H(α) ≤ log |α|

Thus e^{h(T)} ≤ |α|, and so ⌈e^{h(T)}⌉ ≤ |α|. The LHS equals ⌊e^{h(T)}⌋ + 1 unless e^{h(T)} is an integer. But we will see an example where e^{h(T)} is an integer k and there is no generator with k sets; namely h(T) = 0 (rotation of the circle).

Prop: h(T, α) = 0 iff α ⊂ α_1^∞.

Proof: h(T, α) = H(α|α_1^∞). But Property 3 says that the RHS is 0 iff α ⊂ α_1^∞.

Interpretation in terms of predictability (determinism).

Lecture 28: Nov. 18

Recall:

Prop: h(T, α) = 0 iff α ⊂ α_1^∞. Interpretation in terms of predictability (determinism).

Cor: h(T) = 0 iff α ⊂ α_1^∞ for all α.

Prop: If T is invertible and has a 1-sided generator, then h(T) = 0.

Proof: For any α: α ⊂ A = T^{-1}(A) = T^{-1}(α_0^∞) = α_1^∞.

So, we do not generally expect to have one-sided generators for invertible MPT's.
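The Markov entropy formula lends itself to a quick numerical check. The following sketch (mine, not part of the notes; the helper names are made up) computes π and h(T_{P^+}) for the 2-state chain that appears later in these notes:

```python
import numpy as np

def stationary_vector(P):
    # Left eigenvector of P for eigenvalue 1, normalized to sum to 1.
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1))])
    return v / v.sum()

def markov_entropy(P):
    # h = -sum_{ij} pi_i P_ij log P_ij, with the convention 0 log 0 = 0.
    pi = stationary_vector(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    return -float(pi @ plogp.sum(axis=1))

P = np.array([[1/3, 2/3],
              [1.0, 0.0]])
print(stationary_vector(P))   # [0.6 0.4], i.e. pi = [3/5, 2/5]
print(markov_entropy(P))
```

Here π = [3/5, 2/5] matches the (P, π) example from Lecture 31, and h = -(3/5)((1/3) log(1/3) + (2/3) log(2/3)) ≈ 0.382 nats.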
Example 1: For the Baker's transformation, h(T) = log 2, and since Baker is invertible, it has no 1-sided generator. (Show a 2-sided generator.)

Example 2: Irrational rotation (where T_r is rotation of the circle by angle r, with r/(2π) ∉ Q).

Claim: h(T_r) = 0.

Let a = e^{ir}, so T_r(z) = az. Partition α = {A_up, A_down}: the upper and lower semicircles bounded by 1 and -1. Then T^{-n}(α) consists of the semicircles bounded by a^{-n} and -a^{-n}. But {a^{-n}}_{n ∈ Z^+} is dense (note: -r is just as irrational as r). Thus, every arc belongs to α_0^∞. Thus, α_0^∞ = A, so α is a 1-sided generator, and since T_r is invertible, h(T_r) = 0 by the previous Prop. QED

Note: T_r has a two-sided generator with 2 sets, but not with 1 set. Thus, in this case, we need more than ⌈e^{h(T)}⌉ sets in a two-sided generator (compare with the Krieger generator theorem).

Example: For a rational rotation T_r, i.e., r/(2π) = p/q ∈ Q, h(T_r) = 0.

Proof: T^q = I. For any partition α, α_0^{nq-1} = α_0^{q-1}. Thus,

h(T, α) = lim_n (1/nq) H(α_0^{nq-1}) = lim_n (1/nq) H(α_0^{q-1}) = 0

Note: there is no finite generator for T_r, r rational (non-ergodic; compare with the Krieger generator theorem).

Note: more generally, any rotation on a compact group has zero entropy.

Defn: Kolmogorov (K) MPT: T is invertible and there exists a sub-sigma-algebra B s.t.
1. B ⊂ T(B)
2. ∨_{n=0}^∞ T^n(B) = A
3. ∩_n T^{-n}(B) = {∅, M} (mod 0)

Think of B as α_0^∞; condition 3 says that T satisfies the Kolmogorov 0-1 law.

Deep Theorem (Kolmogorov, Rohlin): Let T be invertible. Then T is K iff for all non-trivial α, h(T, α) > 0.

K-MPT's are extreme opposites of zero-entropy MPT's:

for all α, h(T, α) > 0, versus: for all α, h(T, α) = 0.

At one time, it was conjectured that every invertible MPT is a direct product of a K MPT and a zero entropy MPT. This is false, but there is a weaker notion of product for which this is true.

Deep Theorem (Ornstein 1969): T_{iid(p)} ∼ T_{iid(q)} iff H(p) = H(q).

Defn: An MPT is B (Bernoulli) if it is isomorphic to T_{iid(p)} for some p.

Note that B ⊆ K by virtue of the Kolmogorov 0-1 Law. At one time, it was conjectured that B = K, but this is false.

Lecture 29: Nov.
21

- Posted Exercises 3 and 4 on website.
- Student talks: reserve Math 126 for every weekday Dec. 1-15, 2-3:30.
- Teaching evaluations, enrolled students.
- The problem of isomorphism of MPT's is unsolved, except for special classes.
- Lebesgue spaces are quite general.

Defn: A topological dynamical system (TDS) is a continuous map T : M → M on a compact metric space M.

Defn: An invertible top. dyn. sys. (ITDS) is a homeomorphism T : M → M (equivalently, a bijective continuous map, since M is compact).

Orbit:
- If non-invertible: {T^n(x)}_{n≥0}
- If invertible: {T^n(x)}_{n∈Z}

Iterate dynamics, just as in ergodic theory.

Analogue of ergodicity:

Defn: Topologically transitive: for all nonempty open A, B, there exists n > 0 s.t. T^{-n}(A) ∩ B ≠ ∅.

Fact: Top. transitive iff there exists a point with a forward dense orbit, i.e., {T^n(x)}_{n≥0} is dense in M; sometimes we say that x is a dense orbit (in fact, top. transitivity gives a dense G_δ of points with dense orbits).

Proof: Only if: Let {A_m} be a basis for the topology. Then

∩_{m=1}^∞ (∪_{n=1}^∞ T^{-n}(A_m))

is a countable intersection of dense open sets, hence (by Baire) nonempty, and each of its points has a forward dense orbit.

If: Let x have a forward dense orbit. Given nonempty open A, B, choose m ≥ 0 with T^m(x) ∈ B and then n > 0 with T^{m+n}(x) ∈ A. Then T^m(x) ∈ T^{-n}(A) ∩ B, so T^{-n}(A) ∩ B ≠ ∅.

Analogue of mixing:

Defn: Topologically mixing: for all nonempty open A, B, there exists N > 0 s.t. for all n ≥ N, T^{-n}(A) ∩ B ≠ ∅.

Note: it suffices to check top. mixing and top. transitivity on a basis of open sets.

Note that if there is a T-invariant Borel probability measure μ which is strictly positive on all nonempty open sets, then:
- Ergodicity (wrt μ) implies top. transitive.
- Mixing (wrt μ) implies top. mixing.

Defn: Periodic point: T^p(x) = x (x is periodic with period p; sometimes we say that the orbit is periodic). E.g., p = 1 means fixed point. (No real analogue in ergodic theory.)

Examples:

1. Circle rotation: z ↦ az, a = e^{ir}, r ∈ [0, 2π)
- r/(2π) ∈ Q: all orbits are periodic; not top. trans.; therefore not top. mixing.
- r/(2π) ∉ Q: all orbits are dense; no periodic orbits; top. trans., since ergodic w.r.t. Lebesgue; not top.
mixing.

2. Doubling map on the circle: z ↦ z² (e^{iθ} ↦ e^{2iθ}). This is similar to the doubling map on [0, 1], x ↦ 2x mod 1, but rescaled. Top. trans. and top. mixing, since the doubling map on [0, 1] is mixing w.r.t. Lebesgue. We will see that it has a dense set of periodic points (so, a rich mixture of periodic and "ergodic" behaviour).

3. Smale horseshoe with caps (a continuous analogue of the Baker's transformation). T : M → M, where M is the union of a square S and left and right caps. T is not onto, but can be extended to a self-homeomorphism of R².

Dynamics: Let Ω = ∩_{n∈Z} T^n(red ∪ blue). T has a fixed point x_0 in the left cap.

Claim: If x ∉ Ω, then lim_{n→∞} T^n(x) = x_0.

Proof: black → right cap → left cap → x_0. If x ∉ Ω, then for some m ∈ Z, T^m(x) ∈ left cap ∪ black ∪ right cap, and so T^{n+m}(x) → x_0. QED

The interesting dynamics is on Ω. It looks like Baker, except it is continuous. We will see that T|_Ω is top. mixing, therefore also top. trans., and will also show that it has dense periodic points.

Lecture 30: Nov. 23

Example: Full two-sided shift. Let F be a finite set. Let M = F^Z. Let T = σ, the left shift.

Metric: d(x, y) = 2^{-k} where k is maximal such that x_{[-k,k]} = y_{[-k,k]}; if x = y, set d(x, y) = 0; if x_0 ≠ y_0, set d(x, y) = 2. This is a metric.

Meaning: x and y are close if they agree in a large central block. The topology generated is the product topology, which is the same as the topology generated by cylinder sets. Note that agreement in a large central block is equivalent, in the sense of generating the topology, to agreement in a large block (central or otherwise).

Top. mixing, because for A = A_{a_0 ... a_{ℓ-1}} and B = A_{b_0 ... b_{u-1}},

σ^{-n}(A) ∩ B ≠ ∅ for n ≥ u

Thus, also top. transitive.

Periodic points: periodic sequences. Periodic points are dense: given x ∈ M, define y = (x_{[-n,n]})^∞. Then y is periodic and d(x, y) ≤ 2^{-n}.

Note: the same holds for the one-sided shift.

Defn: Topological conjugacy: T ∼ S if there exists a homeomorphism φ : M → N s.t. φ ∘ T = S ∘ φ.
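As a small illustration (my own sketch, not part of the notes), the shift metric and the periodic-approximation argument can be coded directly, representing a bi-infinite point as a function n ↦ x_n:

```python
# Full two-sided shift on F = {0, 1}; a point is a function n -> symbol.

def d(x, y, max_k=64):
    # d(x, y) = 2^{-k}, with k maximal such that x_[-k,k] = y_[-k,k];
    # d = 2 if x_0 != y_0; treated as 0 if they agree out to radius max_k.
    if x(0) != y(0):
        return 2.0
    for k in range(1, max_k + 1):
        if x(k) != y(k) or x(-k) != y(-k):
            return 2.0 ** -(k - 1)
    return 0.0

def periodic_approx(x, n):
    # y = (x_[-n,n])^infinity: repeat the central block with period 2n+1.
    block = [x(i) for i in range(-n, n + 1)]
    return lambda m: block[(m + n) % (2 * n + 1)]

# An aperiodic-looking point: x_m = 1 iff |m| is a perfect square.
x = lambda m: 1 if round(abs(m) ** 0.5) ** 2 == abs(m) else 0

for n in (2, 5, 8):
    y = periodic_approx(x, n)
    assert y(17 + (2 * n + 1)) == y(17)   # y is periodic
    assert d(x, y) <= 2.0 ** -n           # and 2^{-n}-close to x
```

The assertions mirror the density argument above: agreement on the central block [-n, n] forces d(x, y) ≤ 2^{-n}.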
(Here, homeomorphism is equivalent to bijective and continuous, since M is compact.)

Invariants of top. conjugacy:
a. Top. transitivity
b. Top. mixing
c. Density of periodic points

Claim: The full two-sided shift on F = {R, B} is conjugate to the horseshoe restricted to Ω.

Define φ : Ω → F^Z: if x ∈ Ω, define φ(x) = z ∈ F^Z, where

z_n = R if T^n(x) ∈ red
z_n = B if T^n(x) ∈ blue

Proof that φ ∘ T = σ ∘ φ:

φ(Tx)_n = R if T^{n+1}(x) ∈ red
φ(Tx)_n = B if T^{n+1}(x) ∈ blue

Thus, φ(Tx)_n = φ(x)_{n+1} = (σ(φ(x)))_n. QED

Lemma: For all N and all sequences z_{-N} ... z_0 ... z_N ∈ {B, R}^{2N+1}, the set S(z_{-N} ... z_0 ... z_N) = ∩_{n=-N}^N T^{-n}(z_n)
- is compact,
- is nonempty,
- has diameter ≈ 2^{-N}.

(Draw pictures of S(3-blocks).)

Proof of 1-1: Let x ∈ Ω and z = φ(x). Then ∩_{n=-∞}^∞ T^{-n}(z_n) = {x}. So x can be uniquely recovered from φ(x).

Proof of onto: Let z ∈ {B, R}^Z. Then ∩_{n=-N}^N T^{-n}(z_n) is a nested decreasing sequence of compact sets (as N grows), and thus the infinite intersection is nonempty. In fact, since diameters decrease to zero, the intersection is a single point x ∈ Ω, and z = φ(x).

Proof that φ^{-1} is continuous: If d(z, z') is small, then for a large N, z_{-N} ... z_0 ... z_N = z'_{-N} ... z'_0 ... z'_N. Now, φ^{-1}(z) ∈ S(z_{-N} ... z_N) and φ^{-1}(z') ∈ S(z'_{-N} ... z'_N), and so ρ(φ^{-1}(z), φ^{-1}(z')) < 2^{-N} (where ρ denotes the metric on Ω).

Note: Ω is a Cantor set.

Defn: Topological factor: T ↓ S if there exists an onto, continuous φ : M → N s.t. φ ∘ T = S ∘ φ.

If T has one of the properties a, b or c, and S is a topological factor of T (e.g., under a conjugacy), then so does S.

Example: Claim: The full one-sided shift on F = {0, 1} factors onto the doubling map on the circle.

φ(x_0 x_1 ...) = e^{2πi(.x_0 x_1 ...)}

φ ∘ σ = T ∘ φ:

φ ∘ σ(x_0 x_1 ...) = φ(x_1 x_2 ...) = e^{2πi(.x_1 x_2 ...)} = e^{2πi · 2(.x_0 x_1 ...)} = (φ(x_0 x_1 ...))²

φ is continuous: if x_{[0,n]} = y_{[0,n]}, then .x_0 x_1 ... and .y_0 y_1 ... are close.

Onto: every angle is of the form 2π(.x_0 x_1 ...).

Note: the full one-sided shift and the doubling map are not top. conjugate because F^{Z^+} and the circle are not homeomorphic (e.g., disconnected vs.
connected).

Thus, both the doubling map on the circle and the horseshoe map on Ω are top. mixing, therefore top. transitive, and have dense periodic points.

Lecture 31: Nov. 25

Another example:

Defn: Let C be a square 0-1 matrix (say m × m). Let F = {0, ..., m-1}. Let

M_C = {x ∈ F^Z : C_{x_i, x_{i+1}} = 1 for all i ∈ Z}

Consider the graph of C. σ|_{M_C} is called a Topological Markov Chain (TMC). Notation: M_C refers both to the set and to σ|_{M_C}.

Examples:

1. C =
   [ 1 1 ]
   [ 1 0 ]
(Draw graph.) Then M_C is the set of all bi-infinite binary sequences such that 11 is forbidden.

2. C =
   [ 1 1 ]
   [ 1 1 ]
Then M_C is a full shift.

WMA: C has no all-zero rows and no all-zero columns.

Recall that we defined irreducibility and primitivity only for stochastic matrices, but the same concepts apply to nonnegative matrices.

Prop:
1. σ on M_C is top. transitive iff C is irreducible.
2. σ on M_C is top. mixing iff C is primitive.

The proof is nearly identical to the proofs of ergodicity and mixing for irreducible and primitive Markov chains.

Reasons for our interest:

a. TMC's model many interesting smooth dynamical systems, e.g., the partial horseshoe.

b. Prop: TFAE:
1. M is the support of a stationary Markov chain (i.e., the smallest closed subset of F^Z that has measure one).
2. M = M_C is a TMC which is the disjoint union of finitely many irreducible TMC's.
3. M = M_C has dense periodic points.

Example: P =
   [ 1/3 2/3 ]
   [  1   0  ]
with π = [3/5, 2/5].

TOPOLOGICAL ENTROPY:

For simplicity, assume invertible!

Define, for open covers α, β (possibly infinite!):

α ∨ β = {A_i ∩ B_j : nonempty}
T^{-1}(α) = {T^{-1}(A_i)}
α_m^n = T^{-m}(α) ∨ ... ∨ T^{-n}(α)
α ≺ β (β refines α): every element of β is a subset of an element of α

Recall: every open cover of a compact space has a finite subcover.

Defn: Let M be a compact metric space. Let α be an open cover of M. Let N(α) be the size of the smallest subcover. Let H(α) = log N(α).

Easy properties:
- H(α ∨ β) ≤ H(α) + H(β)
- If α ≺ β, then H(α) ≤ H(β).
- H(α) = H(T^{-1}(α)).

Defn: h(T, α) = lim_{n→∞} (1/n) H(α_0^{n-1})

Prop: The limit exists.

Proof: Let a_n = H(α_0^{n-1}).
Then a_{n+m} ≤ a_n + a_m (a_n is subadditive).

Proof: α_0^{n+m-1} = α_0^{n-1} ∨ T^{-n}(α_0^{m-1}). Thus,

a_{n+m} = H(α_0^{n+m-1}) ≤ H(α_0^{n-1}) + H(α_0^{m-1}) = a_n + a_m

QED

Fekete's Lemma: For a subadditive sequence a_n,

lim_n (1/n) a_n = inf_n (1/n) a_n

Rough idea of proof: a_{kn} ≤ k a_n, and so a_{kn}/(kn) ≤ a_n/n. So, if the limit exists, then it must be the inf.

Defn: h(T) = sup_α h(T, α)

Notation: If T is an ITDS and also has an invariant Borel probability measure μ, then we use h_μ(T) to denote the measure-theoretic entropy and h(T) to denote the topological entropy.

Lecture 32: Nov. 28

Recall the defns of h(T, α) and h(T). For simplicity, assume T is an ITDS.

Prop: Topological entropy is an invariant of topological conjugacy, i.e., if S ≈ T, then h(S) = h(T).

We want to compute h(T).

Easy facts:
- h(T, α) = h(T, α_m^n) for all m ≤ n (in particular, h(T, α) = h(T, α_{-n}^n)).
- If α ≺ β, then h(T, α) ≤ h(T, β).

Defn: The diameter of an open cover is the sup of the diameters of its elements.

Prop: If α_m is a sequence of open covers and diameter(α_m) → 0, then h(T, α_m) → h(T).

Proof: Assume h(T) < ∞. Let ε > 0. Let β be an open cover such that h(T, β) > h(T) - ε. Choose M such that for m ≥ M, the diameter of α_m is less than the Lebesgue number of β, and so β ≺ α_m. Thus,

h(T) ≥ h(T, α_m) ≥ h(T, β) > h(T) - ε

If h(T) = ∞: let N > 0 and β be an open cover such that h(T, β) > N. Continue as above. QED

Example: Let T be a rotation of the circle. Let α_m be the cover consisting of all open intervals of diameter < 1/m. Then T^{-1}(α_m) = α_m, and so (α_m)_{-n}^n = α_m, and so h(T, α_m) = 0. Thus, h(T) = 0. QED

Defn: A topological generator for an ITDS is a finite open cover α = {A_1, ..., A_m} such that every intersection ∩_{k=-∞}^∞ T^{-k}(A_{i_k}) contains at most one point.

Theorem: If α is a top. generator, then h(T) = h(T, α).

Claim: diam(α_{-n}^n) → 0.

Proof of Claim: Use compactness. If not, then there exist δ > 0 and x_n, y_n in the same element of α_{-n}^n with d(x_n, y_n) > δ. Let the x_n, y_n accumulate on x, y. Then x ≠ y. By diagonalization, we can find an infinite sequence i_k s.t. x, y ∈ ∩_{k=-∞}^∞ T^{-k}(A_{i_k}), contradicting the generator property.
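The subadditivity and Fekete's Lemma above can also be seen numerically. In this sketch (mine, not from the notes), a_n = log N_n, where N_n counts binary words of length n with no adjacent 1's; these are the allowed words of the golden-mean TMC introduced later, and (1/n) a_n decreases toward its infimum:

```python
import math
from itertools import product

def count_words(n):
    # Binary words of length n avoiding the factor "11".
    return sum(1 for w in product((0, 1), repeat=n)
               if all(not (w[i] and w[i + 1]) for i in range(n - 1)))

a = {n: math.log(count_words(n)) for n in range(1, 16)}

# Subadditivity: restricting a valid word of length n+m to its first n and
# last m symbols gives valid words, so N_{n+m} <= N_n * N_m.
for n in range(1, 8):
    for m in range(1, 8):
        assert a[n + m] <= a[n] + a[m] + 1e-12

phi = (1 + 5 ** 0.5) / 2
print([round(a[n] / n, 3) for n in (3, 7, 15)])   # decreasing toward...
print(round(math.log(phi), 3))                    # ... log(phi) ~ 0.481
```

By Fekete, (1/n) a_n ≥ inf_n (1/n) a_n for every n, and here the infimum is the limit log((1+√5)/2).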
Proof of Theorem: By the previous Prop, h(T, α_{-n}^n) → h(T). By the easy fact, h(T, α_{-n}^n) = h(T, α). QED

For the full shift, the standard partition α given by A_i = {x ∈ F^Z : x_0 = i} is a top. generator (in particular, it is an open cover). Thus, h(σ_{F^Z}) = h(σ_{F^Z}, α). Observe N(α_0^{n-1}) = |F|^n. Thus,

h(σ_{F^Z}) = log |F|

Corollary: the entropy of the horseshoe is log 2 (the complement of the invariant set contributes zero entropy).

The standard partition for a TMC M_C is also a top. generator (for the same reason). Notation: σ_C denotes the shift on M_C.

Proposition: For a TMC M_C, h(σ_C) = log λ_C, where λ_C is the spectral radius of C, i.e., λ_C = max{|λ| : λ an eigenvalue of C}.

Will use:

Perron-Frobenius Theorem: Let C be an irreducible matrix with spectral radius λ_C. (Irreducible: C is a nonnegative square matrix s.t. for all i, j there exists n s.t. (C^n)_{ij} > 0.) Then:
1. λ_C > 0 and is an eigenvalue of C.
2. λ_C has a (strictly) positive eigenvector and is the only eigenvalue with a (strictly) positive eigenvector (the right Perron eigenvector is denoted v and the left Perron eigenvector is denoted w).
3. λ_C is a simple eigenvalue.
4. If C is primitive (i.e., for some n > 0, C^n > 0), then

C^n / λ_C^n → (v_i w_j)

(where w · v = 1).
5. If C is primitive, then λ_C is the only eigenvalue with modulus λ_C.

Note the consistency of the Proposition above with the computation above for the full shift: there C is the all-1's matrix of size |F|.

Claim: λ_C = |F|.

Proof of Claim: |F| is an eigenvalue of C with eigenvector 1. Apply parts 1 and 2 of the P-F Theorem.

Lecture 33: Nov. 30

Re-state the P-F Theorem.

Brief discussion of the proof of P-F: Let S be the unit simplex in R^n. Define

f : S → S, x ↦ xC / ||xC||_1

In the primitive case, C^n > 0 for some n, and so f contracts S, in the limit, to a single point, giving a unique eigenvalue λ with a positive eigenvector. And λ is the only eigenvalue of largest modulus; for otherwise we would have leakage out of S. Modify to take care of the general irreducible case.

Compare with the stochastic case.
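The P-F proof sketch can be tried numerically. The following (my sketch, not from the notes) runs the simplex map x ↦ xC/||xC||_1 on the golden-mean matrix, and then, anticipating the variational principle below, builds the chain P_ij = C_ij v_j/(λ v_i) and checks that its entropy is log λ_C:

```python
import numpy as np

C = np.array([[1.0, 1.0],
              [1.0, 0.0]])          # golden-mean TMC matrix

# Simplex iteration from the P-F proof sketch: x -> xC / |xC|_1.
x = np.array([0.5, 0.5])
for _ in range(200):
    y = x @ C
    lam = y.sum()                   # normalizer -> Perron eigenvalue
    x = y / lam                     # x -> left Perron direction

phi = (1 + 5 ** 0.5) / 2
assert abs(lam - phi) < 1e-12       # lambda_C is the golden mean

# Right Perron eigenvector: C [lam, 1]^T = lam [lam, 1]^T since lam^2 = lam+1.
v = np.array([lam, 1.0])

# Markov chain P_ij = C_ij v_j / (lam v_i) and its stationary vector.
P = C * v[None, :] / (lam * v[:, None])
assert np.allclose(P.sum(axis=1), 1.0)          # stochastic
pi = np.array([lam, 1.0]) * v                   # pi_i proportional to w_i v_i
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi)                  # stationary

h = -sum(pi[i] * P[i, j] * np.log(P[i, j])
         for i in range(2) for j in range(2) if P[i, j] > 0)
print(h, np.log(lam))               # equal: the chain attains log(lambda_C)
```

The simplex iteration converges here because the second eigenvalue of C has modulus 1/λ < λ.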
Proposition: For a TMC M_C, h(σ_C) = log λ_C, where λ_C is the spectral radius of C, i.e., λ_C = max{|λ| : λ an eigenvalue of C}.

Proof of Proposition: We will prove the Prop in the special case that C is irreducible (in general, reduce to a disjoint union of irreducible components plus transient connections).

Proof: Let α be the standard partition, a top. generator. Then N(α_0^{n-1}) is the number of vertex sequences of length n that occur in the graph G(C). Thus, N(α_0^{n-1}) = 1 C^{n-1} 1 (row vector of 1's, times C^{n-1}, times column vector of 1's). So,

h(σ_C) = lim_{n→∞} (1/n) log 1 C^n 1

The P-F Theorem guarantees a positive (right) eigenvector v corresponding to λ = λ_C.

Claim: There exist constants K_1, K_2 > 0 s.t. K_1 λ^n ≤ 1 C^n 1 ≤ K_2 λ^n.

Proof of Claim:

1 C^n 1 ≤ 1 C^n (v / min(v)) = (1 · v) λ^n / min(v)
1 C^n 1 ≥ 1 C^n (v / max(v)) = (1 · v) λ^n / max(v)

Take logs, divide by n, and take lim_n, to get h(σ_C) = log λ. QED

Example: Golden mean.

C =
   [ 1 1 ]
   [ 1 0 ]

λ_C = (1 + √5)/2.

For an ITDS T, let M = M_T denote the set of all T-invariant Borel probability measures μ, i.e., measures μ for which T is an MPT with respect to (M, Borel, μ).

Theorem (Variational Principle for an irreducible TMC): Let M = M_{σ_C}. Then

h(σ_C) = log λ_C = sup_{μ ∈ M} h_μ(σ_C)

and the sup is achieved uniquely by an explicitly describable Markov chain:

P_{ij} = C_{ij} v_j / (λ_C v_i)

where C v = λ_C v.

Lecture 34: Dec. 2

Proof of the Variational Principle:

Terminology: λ = λ_C is called the Perron eigenvalue of C, and its corresponding eigenvectors are called Perron eigenvectors.

Step 1: For all μ ∈ M, h_μ(σ_C) ≤ log λ = h(σ_C).

Proof: Let α be the standard partition. Then

h_μ(σ_C) = lim_{n→∞} (1/n) H_μ(α_0^{n-1}) ≤ lim_{n→∞} (1/n) log 1 C^{n-1} 1 = log λ = h(σ_C)

Step 2: Let M_1 denote the set of all first-order stationary Markov measures ν ∈ M. Then

sup_{μ ∈ M} h_μ(σ_C) = sup_{ν ∈ M_1} h_ν(σ_C)

Proof: Let μ ∈ M. Let S = {i : μ(A_i) > 0} (recall A_i = {x ∈ M_C : x_0 = i}). Let ν denote the first-order stationary Markov chain defined by the |S| × |S| matrix P:

P_{ij} = μ(A_{ij}) / μ(A_i)

where A_{ij} = {x ∈ M_C : x_0 = i, x_1 = j}. Then ν ∈ M and is called the Markovization of μ.
Then

h_μ(σ_C) ≤ H_μ(α | σ^{-1}(α)) = -Σ_{ij} μ(A_{ij}) log P_{ij} = h_ν(σ_C)

Step 3: Find the optimal first-order stationary Markov ν.

What is a good guess for the joint (NOT conditional) distribution Q_{ij} = ν(A_{ij})? Make the distribution on long words roughly uniform. Assuming primitivity of C, we get from P-F (part 4):

C^n / λ^n → (v_i w_j)

(where C v = λ v, w C = λ w, and w · v = 1). So (C^n)_{ki} ~ v_k w_i λ^n and (C^n)_{jℓ} ~ v_j w_ℓ λ^n. So we guess:

Q_{ij} ∝ (1 C^n e_i) C_{ij} (e_j^T C^n 1) ~ C_{ij} w_i v_j λ^{2n}, i.e., Q_{ij} ∝ C_{ij} w_i v_j

Then

π_i ∝ Σ_j C_{ij} w_i v_j = λ w_i v_i

Thus,

P_{ij} = Q_{ij} / π_i = C_{ij} w_i v_j / (λ w_i v_i) = C_{ij} v_j / (λ v_i)

is a good guess. We can verify directly that this Markov chain achieves the sup, namely log λ:

h_ν = -Σ_{ij} Q_{ij} log( C_{ij} v_j / (λ v_i) ) = log λ - Σ_{ij} Q_{ij} (log v_j - log v_i) = log λ

since

Σ_{ij} Q_{ij} (log v_j - log v_i) = Σ_j π_j log v_j - Σ_i π_i log v_i = 0

(Note: this is valid assuming irreducibility.)
(Note: we get the same result if Q is replaced by any other stationary Q'.)

Thus, the sup is achieved and equals log λ = h(σ_C).

Step 4: Uniqueness.

a. If h_μ(σ_C) = h(σ_C), then h_μ(σ_C) = h_ν(σ_C), where ν is the Markovization of μ. By Exercise 6 on Exercise Set 1, μ is Markov, and μ = ν.

b. Let P and Q be the conditional and joint probability matrices for ν as above (i.e., P_{ij} = C_{ij} v_j / (λ v_i)). Let P' and Q' be the conditional and joint probability matrices for an arbitrary first-order stationary Markov ν'. If h_{ν'} = h_ν, then

-E_{Q'} log P' = -E_Q log P = -E_{Q'} log P

(the second equality is the "same result for any stationary Q'" note above), and so

E_{Q'} log(P / P') = 0

And

log E_{Q'}(P / P') = log Σ_{ij} Q'_{ij} P_{ij} / P'_{ij} = log Σ_{ij} π'_i P_{ij} = 0

(here, π' = π' P'). Thus,

E_{Q'} log(P / P') = 0 = log E_{Q'}(P / P')

By Jensen, equality happens only if P' = P on the support of Q'. But if the support of Q' is strictly smaller than the support of Q, then P' cannot be stochastic.

Example: Golden mean. v = w ∝ [λ, 1] (normalized so that w · v = 1). So

P =
   [ 1/λ  1/λ² ]
   [  1    0   ]

where λ is the golden mean.

Defn: Let C be an irreducible m × m 0-1 matrix. Let f_1, ..., f_m ∈ R.
Let C_f be the matrix defined by:

(C_f)_{ij} = C_{ij} e^{f_j}

The pressure is defined:

P_{σ_C}(f) = lim_{n→∞} (1/n) log 1 (C_f)^{n-1} 1

Note: If f ≡ 0, then P_{σ_C}(f) = h(σ_C).

Theorem (Variational Principle for pressure of irreducible TMC's): Let M = M_{σ_C}. Then

P_{σ_C}(f) = log λ_{C_f} = sup_{μ ∈ M} ( h_μ(σ_C) + ∫_{M_C} f dμ )

where f(x) = f_{x_0}. And the sup is uniquely achieved by the Markov chain ν:

P_{ij} = (C_f)_{ij} v_j / (λ v_i)

where v is the right Perron eigenvector of C_f and λ = λ_{C_f}.

Compute: Let Q_{ij} = ν(A_{ij}). Then

h_ν = -Σ_{ij} Q_{ij} log( C_{ij} e^{f_j} v_j / (λ v_i) ) = log λ - Σ_{ij} Q_{ij} f_j - Σ_{ij} Q_{ij} (log v_j - log v_i) = log λ - ∫_{M_C} f dν + 0

as before. Thus,

h_ν + ∫_{M_C} f dν = log λ

The main point: the solution to a variational problem (an equilibrium state) has an explicit exponential form (a Gibbs state). This generalizes the result discussed in Lectures 1 and 2.

Further generalization:

Defn: Let T be an ITDS. Let f : M → R be continuous. The pressure is defined as follows.

Define S_n f(x) = Σ_{i=0}^{n-1} f ∘ T^i(x).

An (n, ε)-spanning set is a set F such that for all x ∈ M, there exists y ∈ F such that d(T^i x, T^i y) < ε for i = 0, ..., n-1.

Let

P_T(f, ε, n) = inf_{(n,ε)-spanning sets F} Σ_{x ∈ F} e^{S_n f(x)}

Let

P_T(f, ε) = limsup_{n→∞} (1/n) log P_T(f, ε, n)

Define the pressure as

P_T(f) = lim_{ε→0} P_T(f, ε)

Note: when f = 0, P_T(f) = h(T); to see this, use the spanning-set definition of topological entropy given in Walters or Keller.

Theorem (generalized variational principle):

P_T(f) = sup_{μ ∈ M} ( h_μ(T) + ∫ f dμ )

Note: the sup is not always achieved. When it is achieved, it need not be unique.
Note: but there are many important classes where the sup is achieved uniquely.
Note: the RHS above can be expressed as a constrained maximization of h_μ(T), subject to a constraint on ∫ f dμ.
Note: and cases where the sup is achieved non-uniquely correspond to phase transitions.

THE END