Ergodic Theory

March 14, 2013

Contents

1  Uniform Distribution
   1.1  Generalisation To Higher Dimension
   1.2  Generalisation To Polynomials
2  Dynamical Systems
   2.1  Subshifts of Finite Type
3  Measure Theory
4  Measures On Compact Metric Spaces
5  Measure Preserving Transformations
6  Ergodicity
7  Recurrence and Unique Ergodicity
8  Birkhoff's Ergodic Theorem
9  Entropy
10 Functional Analysis

1  Uniform Distribution

Definition 1.1. Orbit: For a space X and a transformation T : X → X the orbit of x ∈ X is defined as

    O_x := {T^n x}_{n=0}^∞

Definition 1.2. Fixed Point: For a space X and a transformation T : X → X we say that x ∈ X is a fixed point of T if T x = x

Definition 1.3. Periodic Point: For a space X and a transformation T : X → X we say that x ∈ X is a periodic point of T if T^n x = x for some n ∈ N

Definition 1.4. Indicator Function: For a set A ⊆ X we denote the indicator function

    χ_A(x) = 1 if x ∈ A,  0 if x ∉ A

Corollary 1.1. With this definition, the frequency with which the orbit of x lies in A is

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_A(T^j x)

If we say that a property holds for a typical x ∈ X we mean that the property holds almost everywhere with respect to the measure on the space X.

Definition 1.5. Preserves: We say that T preserves µ if for every measurable A ⊆ X we have that µ(T^{−1}A) = µ(A)

For x ∈ R we denote ⌊x⌋ := max{m ∈ Z : m ≤ x}, the integer part of x, and {x} := x − ⌊x⌋, the fractional part.

Definition 1.6. Uniformly Distributed: We say that a sequence {x_n}_{n=0}^∞ is uniformly distributed mod 1 if for every 0 ≤ a < b < 1 we have that

    lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = b − a

Lemma 1.1. If {x_n}_{n=0}^∞ is uniformly distributed mod 1 then its fractional parts are dense in [0, 1).

Proof. Suppose for contradiction that the fractional parts of {x_n}_{n=0}^∞ are not dense.
Then there exist 0 ≤ a < b < 1 such that no {x_n} lies in [a, b]. So {j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = ∅ for all n, and hence

    lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = 0 ≠ b − a

which contradicts the uniform distribution of {x_n}_{n=0}^∞.

Lemma 1.2. If log_10(m) ∈ R \ Q then the frequency with which the leading digit of the sequence {m^n}_{n=1}^∞ equals r ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} is log_10((r+1)/r).

Proof. The leading digit of m^n is r iff r·10^l ≤ m^n < (r+1)·10^l for some l ∈ N. Taking logarithms base 10 of all sides of this inequality gives

    log_10(r) + l ≤ n log_10(m) < log_10(r+1) + l

i.e. the leading digit is r iff {n log_10(m)} ∈ [log_10(r), log_10(r+1)). By our assumption log_10(m) is irrational, hence the sequence n log_10(m) is uniformly distributed mod 1 (Lemma 1.3 below), so we have that

    lim_{n→∞} (1/n) #{j ∈ [0, n−1] : {j log_10(m)} ∈ [log_10(r), log_10(r+1))} = log_10(r+1) − log_10(r) = log_10((r+1)/r)

Theorem 1.1. Weyl's Criterion: The following are equivalent:

• {x_n}_{n=0}^∞ is uniformly distributed mod 1
• For each l ∈ Z \ {0} we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πil x_j} = 0

Proof.

• Firstly we will prove that uniform distribution mod 1 implies the exponential averages tend to 0. WLOG we can assume that x_n ∈ [0, 1), since e^{2πil x_j} = e^{2πil {x_j}}. Suppose {x_n}_{n=0}^∞ is uniformly distributed mod 1. Then for every [a, b] ⊂ [0, 1) we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) = b − a = ∫ χ_[a,b](x) dx

From this we can deduce that for a step function g we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} g(x_j) = ∫ g(x) dx

Let f be a continuous function on [0, 1) and let ε > 0; we can find a step function g s.t. ||f − g||_∞ < ε. Since {x_j}_{j=0}^∞ is uniformly distributed and g is a step function we can find n sufficiently large such that
n−1 Z 1 1 X g(xj ) − g(x)dx < ε n 0 j=0 So we have that: n−1 n−1 Z 1 n−1 Z 1 Z 1 Z 1 1 X 1 X 1 X f (xj ) − f (x)dx ≤ (f (xj ) − g(xj )) + g(xj ) − g(x)dx + g(x)dx − f (x)dx n 0 0 0 0 n j=0 n j=0 j=0 n−1 Z 1 1 X ε + ε + < εdx 0 n j=0 = 3ε 3 Since ε > 0 was chosen arbitrarily we have that n−1 Z 1 1 X f (xj ) − f (x)dx = 0 lim n→∞ n 0 j=0 Moreover Z 1 e2πilx dx = 0 0 hence we have the required result. • Now we will prove the reverse implication. Suppose that for each l ∈ Z \ {0} we have that n−1 1 X 2πilxj lim e =0 n→∞ n j=0 Then for trigonometric polynomial g(x) = n X αk e2πilk x k=1 we have that Z 1 n−1 1X g(xj ) = g(x)dx lim n→∞ n 0 j=0 Let f be a continuous function on [0, 1] with f (0) = f (1) and fix ε > 0. We can find a trigonometric polynomial g s.t. ||g − f ||∞ < ε and as in the previous part of the proof w can see that Z 1 n−1 X lim f (xj ) = f (x)dx n→∞ 0 j=0 If we take [a, b] ⊂ [0, 1) then we can find f1 , f2 continuous functions s.t. f1 ≤ χ[a,b] ≤ f2 where f1 (0) = f1 (1), f2 (0), f2 (1) and Z 1 f2 (x) − f1 (x)dx < ε 0 This gives us that: lim inf n→∞ n−1 n−1 1X 1X χ[a,b] (x) ≥ lim inf f1 (xj ) n j=0 n j=0 Z 1 = f1 (x)dx 0 Z 1 ≥ f2 (x)dx − ε 0 1 Z ≥ χ[a,b] (x)dx − ε 0 lim sup n→∞ n−1 n−1 1X 1X χ[a,b] (x) ≤ lim inf f2 (xj ) n j=0 n j=0 Z 1 = f2 (x)dx 0 Z 1 ≤ f1 (x)dx + ε 0 Z ≤ χ[a,b] (x)dx + ε 0 4 1 But since ε > 0 was chosen arbitrarily we have that Z 1 n−1 1X χ[a,b] (x)dx = b − a χ[a,b] (x) = lim n→∞ n 0 j=0 hence indeed {xn }∞ n=0 are uniformly distributed. Lemma 1.3. The sequence xn = αn is uniformly distributed mod 1 for α ∈ R \ Q and not uniformly distributed for α ∈ Q Proof. We shall split the proof into the two cases: • α∈Q We can write α = p/q for p, q ∈ Z, q > 0 where p, q are coprime. n oq−1 The sequence xn then only takes values np of which there are finitely many hence the set q n=0 cannot be dense and therefore xn is not uniformly distributed. 
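Between the two cases of this proof, a numerical check is easy to run. The sketch below (the helper names `weyl_sum` and `interval_freq` are my own, not from the notes) shows the Weyl averages staying at 1 for a rational α with a resonant frequency l, while for an irrational α both the Weyl average and the empirical interval frequencies behave as the criterion predicts:

```python
import cmath
import math

def weyl_sum(alpha, l, n):
    # |(1/n) * sum_{k=0}^{n-1} e^{2*pi*i*l*k*alpha}|: by Weyl's criterion this
    # tends to 0 for every l != 0 exactly when (n*alpha) is u.d. mod 1.
    total = sum(cmath.exp(2j * math.pi * l * k * alpha) for k in range(n))
    return abs(total) / n

def interval_freq(alpha, a, b, n):
    # Empirical frequency of the fractional parts {k*alpha} landing in [a, b).
    return sum(1 for k in range(n) if a <= (k * alpha) % 1.0 < b) / n
```

For α = 1/2 and l = 2 every summand e^{2πi·2·k·(1/2)} equals 1, so the average stays at 1 (the rational case fails the criterion), whereas for α = √2 the averages vanish and `interval_freq(2 ** 0.5, a, b, n)` approaches b − a.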
• α ∈ R \ Q
Let l ∈ Z \ {0}; then lα ∉ Z, so e^{2πilα} ≠ 1. Summing the geometric series gives

    (1/n) Σ_{j=0}^{n−1} e^{2πiljα} = (1/n) · (1 − e^{2πilnα}) / (1 − e^{2πilα})

so we have that

    lim_{n→∞} |(1/n) Σ_{j=0}^{n−1} e^{2πiljα}| ≤ lim_{n→∞} (1/n) · 2/|1 − e^{2πilα}| = 0

Hence by Weyl's criterion we have that x_n is uniformly distributed mod 1.

Corollary 1.2. We have that x_n = nα + β is uniformly distributed mod 1 iff α ∈ R \ Q

Proof. We shall split the proof into the two cases:

• α ∈ Q
We can write α = p/q for p, q ∈ Z, q > 0, where p, q are coprime. The sequence x_n then only takes the values {np/q + β mod 1}_{n=0}^{q−1}, of which there are finitely many, hence the set cannot be dense and therefore x_n is not uniformly distributed.

• α ∈ R \ Q
Let l ∈ Z \ {0}; then

    (1/n) Σ_{j=0}^{n−1} e^{2πil(jα+β)} = e^{2πilβ} · (1/n) Σ_{j=0}^{n−1} e^{2πiljα}

and since |e^{2πilβ}| = 1, by the previous lemma we have the same convergence result.

1.1  Generalisation To Higher Dimension

For this subsection we will consider sequences {x_n}_{n=0}^∞ with x_n = (x_n^(1), ..., x_n^(k)) ∈ R^k.

Definition 1.7. Uniformly Distributed: A sequence {x_n}_{n=0}^∞ ⊂ R^k is uniformly distributed mod 1 if for each choice of k intervals {[a_s, b_s]}_{s=1}^k we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} Π_{s=1}^k χ_[a_s,b_s](x_j^(s)) = Π_{s=1}^k (b_s − a_s)

Theorem 1.2. Multi-Dimensional Weyl's Criterion: The sequence {x_n}_{n=0}^∞ ⊂ R^k is u.d. mod 1 iff for every l ∈ Z^k \ {0}

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πi Σ_{s=1}^k l_s x_j^(s)} = 0

Definition 1.8. Rationally Independent: The numbers {λ_i}_{i=1}^k are rationally independent if whenever {r_s}_{s=1}^k ⊂ Q satisfies Σ_{s=1}^k r_s λ_s = 0 we must have that r_s = 0 ∀s

Theorem 1.3. x_n^(j) = nα_j is uniformly distributed mod 1 iff {α_s}_{s=1}^k, 1 are rationally independent.

Proof.
Suppose {αs }ks=1 , 1 are rationally independent then for any l ∈ Zk \ {0} we have that k X ls αs ∈ /Z s=1 So we have that e2πi Pk s=1 ls nαs 6= 1 Hence P n−1 2πi k 1 X 2πi Pk l jα s=1 ls nαs 1 1 − e s s=1 s e = n 1 − e2πi Pks=1 ls αs n j=0 ≤ 1 2 P n 1 − e2πi ks=1 ls αs 1 2 Pk =0 2πi n→∞ n 1 − e s=1 ls αs lim hence by Weyl’s criterion we have that xn is uniformly distributed mod 1. Now suppose that {αs }ks=1 , 1 are not rationally independent then for some l ∈ Zk \ {0} we have that k X ls αs ∈ Z s=1 hence we have that e2πi So we have that Pk s=1 ls nαs = 1 ∀n ∈ N n 1 X 2πi Pks=1 ls nαs e =1 n→∞ n j=0 lim so by Weyl’s criterion xn is not uniformly distributed mod 1. 6 1.2 Generalisation To Polynomials For this subsection we will write p(n) = k X αs ns s=0 Lemma 1.4. Van-der Corput’s Inequality Let {zj }n−1 j=0 ∈ C and let 1 ≤ m ≤ n − 1 then n−1 2 n−1−j n−1 n−1 X X X X zj ≤ m(n + m − 1) |zj |2 + 2(n + m − 1)< m2 (m − j) zs+j z s j=0 s=0 j=0 j=1 (m) For a sequence {xn }∞ n=0 let xn := xn+m − xn (m) Lemma 1.5. Let {xn }∞ n=0 ∈ R be a sequence, if for each m ≥ 1 we have that xn xn is u.d. mod 1. is u.d. mod 1 then Proof. We need to show that for any l ∈ Z \ {0} we have that lim n→∞ n−1 X e2πilxj = 0 j=0 Let zj := e2πilxj then |zj | = 1. For 1 ≤ m ≤ n we have that by Van-der Corbut’s inequality: 2 n−1 n−1−j m−1 m 2(n + m − 1) X m − j X 2πil(xs+j −xs ) m2 X 2πilxj e < e ≤ n2 (n + m − 1)n + n2 j=0 n n s=0 j=1 n−1−j m−1 m 2(n + m − 1) X 1 X 2πil(x(s) ) j = (n + m − 1) + < (m − j) e n n n s=0 j=1 We have that n−1−j 1 X 2πil(x(s) j ) = 0 e n→∞ n s=0 lim by Weyl’s criterion hence n−1−j m−1 1 X 2πil(x(s) 2(n + m − 1) X j ) = 0 < (m − j) e lim n→∞ n n s=0 j=1 Hence 2 m m(n + m − 1) limsupn→∞ 2 e2πilxj ≤ limsupn→∞ n j=0 n 2 n−1 X =m n−1 1 X 2πilx 1 j limsupn→∞ e ≤ √m n j=0 7 which holds ∀m ≥ 1 so we can choose m arbitrarily large then n−1 1 X 2πilxj =0 e n→∞ n j=0 lim hence by Weyl’s criterion we have that xn is u.d. mod 1 Theorem 1.4. 
If αr ∈ R \ Q for any αr ∈ {αs }ks=0 then p(n) is u.d. mod 1. Proof. Suppose αk ∈ R \ Q. We want to show this inductively so let ∆(k) be the event that any polynomial with irrational leading coefficient of degree k is u.d. mod 1. From corollary 1.2 we have that ∆(1) holds so suppose for some k ≥ 2 we have that ∆(k − 1) holds. Let k X p(n) = αi ni : αk ∈ R \ Q i=0 Foor any m ≥ 1 we have that p(m + n) − p(n) = k X αi (n + m)i − i=0 = = = k X k X αi ni i=0 αi i X i j i=0 j=0 k−1 X i X i αi j i=0 j=0 i X i i=0 j=0 k−1 X i X i−j − n m k X αi ni i=0 k−1 X αi j j i−j n m + αk k X k j=0 j nj mi−j + αk k−1 X j=0 k−1 X j j n m k−j − k X αi ni i=0 k nj mk−j j + αk nk − k X αi ni i=0 k−1 X i k nj mi−j + αk nj mk−j − αi ni j j i=0 j=0 j=0 i=0 k = αk−1 nk−1 + αk mnk−1 − αk−1 nk−1 + q(k − 2) k−1 = αi Where q(k − 2) is some polynomial of degree k − 2 and hence p(m + n) − p(n) = αk (k − 1)mnk−1 + q(k − 2) which is a polynomial of degree k − 1 with an irrational leading coefficient hence by ∆(k − 1) we have that p(m + n) − p(n) is u.d. mod 1 for any m ≥ 1 and therefore by lemma 1.5 we have that p(n) is u.d. mod 1. So by induction we have that ∆(k) holds for any k ∈ N 2 Dynamical Systems Definition 2.1. Circle We write S := {x + Z : x ∈ R} to we the circle which is an equivalence class over the real line. Definition 2.2. Rotation: T : S → S defined as T (x) = x + α is a rotation of degree α on the circle. Lemma 2.1. If α ∈ Q then a rotation of degree α has every point periodic and if α ∈ R \ Q there are no periodic points. 8 Proof. Suppose α = p/q for p, q ∈ Z, q > 0 coprime. T q (x) = x + qα mod 1 = x + p mod 1 =x so any point x is periodic. If α ∈ R \ Q then the sequence nα + x is u.d. mod 1 hence every orbit is dense and therefore there cannot be any periodic points. Definition 2.3. 
Cylinder Set: For a function T : S → S we denote the cylinder set of the sequence {xi }ni=0 to be I(x0 , ..., xn ) = {0 ∈ S : T k (x) ∈ Cxk ∀k ∈ [0, n]} where ( Ci = [0, 1/2) [1/2, 1) i=0 i=1 Lemma 2.2. The following statements are true of cylinder sets: • If x ∈ [0, 1) has associated sequence {xn }∞ n=0 then ∞ \ I(x0 , ..., xn ) = {x} n=0 • The set of cylinder sets of rank n form a partition for any n. Proposition 2.1. For the doubling map T : S → S defined as T (x) = 2x mod 1 we have that: • There are 2n periodic points of period n. • The periodic points are dense. • There exists a dense orbit. Proof. • Notice that T n (x) = 2n x mod 1. Suppose T n (x) = x mod 1. This happens iff 2n x = x + p for some p ∈ Z. Which is equivalent to saying x = 2np−1 . Hence each choice of p ∈ [0, 2n − 1) ∩ Z gives a distinct periodic point of which there are precisely 2n − 1. • Let y ∈ [0, 1) and ε > 0. We want to find some periodic point x ∈ (y − ε, y + ε) so find n sufficiently large such that ε > (2n − 1)−1 . Notice that x = 2np−1 for p = 0, ..., 2n − 2 are periodic points distributed evenly with distance (2n − 1)−1 between consequtive values hence clearly some periodic point x must lie in the ball of radius ε around y. • For any x ∈ [0, 1) associate the sequence ( 0 xn := 1 T n (x) ∈ [0, 1/2) T n (x) ∈ [1/2, 1) 9 Now suppose x̃ = Then because ∞ X xn 2n+1 n=0 ∞ X 1 =1 n+1 2 n=0 we have that for almost every sequence xn that x̃ ∈ [0, 1/2) iff x0 = 0. Moreover T (x̃) = 2x̃ mod 1 ∞ X 2xn = mod 1 n+1 2 n=0 = x0 + = ∞ X xn+1 mod 1 2n+1 n=0 ∞ X xn+1 mod 1 2n+1 n=0 so we have that T (x̃) ∈ [0, 1/2) iff x1 = 0. Furthermore we can see by iterating this that T n (x̃) ∈ [0, 1/2) iff xn = 0. So we must have that almost every x can be written x= ∞ X xn 2n+1 n=0 where xn is the sequence associated with x. Since cylinder sets are dense and for the doubling map any cylinder of rank n is an interval of width 2−n it suffices to find a point x ∈ [0, 1) such that T n (x) visits every cylinder. 
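As a sanity check on the proposition above: the proof exhibits exactly 2^n − 1 points of period n, namely p/(2^n − 1), and encodes each point by its 0/1 itinerary through the sets [0, 1/2) and [1/2, 1). A short sketch in exact rational arithmetic (the helper names are my own, not from the notes):

```python
from fractions import Fraction

def doubling(x):
    # One step of the doubling map T(x) = 2x mod 1, in exact rational arithmetic.
    return (2 * x) % 1

def itinerary(x, n):
    # The associated 0/1 sequence: digit j is 0 if T^j(x) lies in [0, 1/2), else 1.
    digits = []
    for _ in range(n):
        digits.append(0 if x < Fraction(1, 2) else 1)
        x = doubling(x)
    return digits

def is_period_n(x, n):
    # Check whether T^n(x) = x, i.e. x is a fixed point of T^n.
    y = x
    for _ in range(n):
        y = doubling(y)
    return y == x
```

For example `itinerary(Fraction(3, 8), 4)` returns the binary digits of 3/8 = 0.0110..., and every point p/(2^5 − 1) with 0 ≤ p < 31 satisfies `is_period_n(x, 5)`, giving the 2^5 − 1 points of period 5 found in the proof.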
If we order the cylinders 0, 1, 00, 01, 10, 11, 000, ... then we cand define x to be the the point with binary expansion x = 0100011011000... in this way T 0 (x) ∈ [0, 1/2), T 1 (x) ∈ [1/2, 1), T 2 (x) ∈ [0, 1/4), T 4 (x) ∈ [1/4, 1/2), T 6 (x) ∈ [1/2, 3/4), ... and so on so indeed the iterates {T n (x)}∞ n=0 visit every cylinder. 2.1 Subshifts of Finite Type Definition 2.4. One-Sided Shift Space: For S = {0, ..., k} let A be a k × k matrix with entries {0, 1}. The set of one-sided shifts of finite type generated by A is defined as ∞ Σ+ A := {{xj }j=0 : Axj ,xj+1 = 1 ∀j} Definition 2.5. Two-Sided Shift Space: For S = {0, ..., k} let A be a k × k matrix with entries {0, 1}. The set of one-sided shifts of finite type generated by A is defined as ΣA := {{xj }j∈Z : Axj ,xj+1 = 1 ∀j} Definition 2.6. One Sided Shift: + + The one sided shift is σ + : Σ+ A → ΣA defined as σ (x)i = xi+1 Definition 2.7. Two Sided Shift: The two sided shift is σ : ΣA → ΣA defined as σ(x)i = xi+1 10 Definition 2.8. Irreducible: A k × k matrix A of zeros and ones is called irreducible if for any pair i, j ∈ S we can find some n ∈ N s.t. (An )i,j > 0 Notice that for n ∈ N, i, j ∈ S we have that (An )i,j is the number of paths from i → j in n steps in the graph with vertices S and directed edges i → j iff Ai,j = 1. Definition 2.9. Aperiodic: A k × k matrix A of zeros and ones is called aperiodic if ∃n ∈ N such that for any pair i, j ∈ S we have that (An )i,j > 0. In the graphical representation this says that starting from any vertex we can get to any other vertex in exactly n steps. Definition 2.10. Two Sided Cylinder: A two sided cylinder for a partial sequence {yi }ni=m of ΣA is an open set: [ym , ..., yn ]m,n := {x ∈ ΣA : xj = yj ∀m ≤ j ≤ n} Definition 2.11. One Sided Cylinder: A one sided cylinder for a partial sequence {yi }ni=m of Σ+ A is an open set: [ym , ..., yn ]m,n := {x ∈ ΣA : xj = yj ∀m ≤ j ≤ n} Lemma 2.3. 
A shift space along with the metric d(x, y) := 2−min{|n|:xn 6=yn } is a metric space. Proof. • d(x, y) = 0 ⇐⇒ x = y Notice that d(x, y) = 0 ⇐⇒ −min{|n| : xn 6= yn } = ∞ ⇐⇒ x = y so this clearly holds. • d(x, y) = d(y, x) Clearly min{|n| : xn 6= yn } = min{|n| : yn 6= xn } so indeed this holds. • d(x, y) + d(y, z) ≥ d(x, z) n0 := min{|n| : xn 6= zn } ≥ min{min{|n| : xn 6= yn }, min{|n| : yn 6= zn }} =: m since otherwise xn0 = yn0 = zn0 which is a contradiction. This gives us that 0 d(x, z) = 2−n ≤ 2−m ≤ 2−min{|n|:xn 6=yn } + 2−min{|n|:yn 6=zn } = d(x, y) + d(y, z) Theorem 2.1. For shift space Σ+ A we have that • Σ+ A is compact. • σ + is continuous. Proof. + • If Σ+ A = φ then we are done so assume that ΣA is non-empty. + (m) ∞ Let {x }m=1 ∈ ΣA be a sequence of elements of Σ+ A. Since the cylinders of a given degree form a disjoint countable partition we have that Σ+ A = k [ [i]0,0 i=1 11 Since there are finitely many in this union we must have that ∃i0 ∈ [1, k] such that there are infinitely many elements of x(m) inside [i0 ]0,0 . Denote these the subsequence x(m0 ) . Furthermore we can write k [ [i0 ]0,0 = [i0 , i]0,1 i=1 similarly ∃i1 ∈ [1, k] s.t. there are infinitely many elements of x(m0 ) in [i0 , i1 ]0,1 . Denote these the subsequence x(m1 ) . (m) Inductively we have that ∃{ik }∞ inside k=0 ∈ [1, k] s.t. there are infinitely many elements of x [i0 , ..., ik ]0,k for any k. (m ) We have some element xk k ∈ [i0 , ..., ik ]0,k for every k and since y ∈ [i0 , ..., ik ]0,k we must have (m ) that d(xk k , y) ≤ 2−k so indeed we have a convergent subsequence and therefore the space is sequentially compact and therefore compact. • Let ε > 0 then find n large such that 2−n < ε. Choose δ = 2−(n+1) then d(x, y) < δ =⇒ y ∈ [x0 , ..., xn+1 ]0,n+1 This gives us that σ + (y) ∈ [x1 , ..., xn+1 ]0,n So indeed d(σ + (x), σ + (y)) < 2−n < ε. Definition 2.12. 
Continued Fraction Map: T : [0, 1) → [0, 1) defined by

    T(x) = 0 if x = 0,  T(x) = 1/x mod 1 if x ≠ 0

is called the continued fraction map.

Definition 2.13. Continued Fraction Expansion: If x ∈ (0, 1) then the continued fraction expansion of x is

    x = 1/(x_0 + 1/(x_1 + 1/(x_2 + ...)))

where the digits {x_i}_{i=0}^∞ lie in N ∪ {∞} (a digit equal to ∞ truncates the expansion).

Lemma 2.4. x ∈ (0, 1) has a finite continued fraction expansion iff x ∈ Q.

Lemma 2.5. If x ∈ (0, 1) \ Q then x has a unique continued fraction expansion.

Lemma 2.6. If T is the continued fraction map and x has the continued fraction expansion with digits {x_i}_{i=0}^∞ then

    x_i = ⌊1/(T^i x)⌋

Proof. We argue inductively. For the base case,

    1/x = x_0 + 1/(x_1 + 1/(x_2 + ...))

where x_0 ∈ N and 1/(x_1 + 1/(x_2 + ...)) ∈ [0, 1), so indeed ⌊1/x⌋ = x_0. Assume that for all i ≤ n we have x_i = ⌊1/(T^i x)⌋. Then

    1/(T^{n+1} x) = x_{n+1} + 1/(x_{n+2} + ...)

where the remaining continued fraction is again less than one and x_{n+1} ∈ N, so indeed ⌊1/(T^{n+1} x)⌋ = x_{n+1}. So by induction we have the required result.

Definition 2.14. Linear Toral Endomorphism: If A is a k × k matrix with entries in Z such that det(A) ≠ 0 then A defines a linear map of R^k, and T_A : R^k/Z^k → R^k/Z^k defined as T_A x = Ax mod 1 is called a linear toral endomorphism.

Lemma 2.7. The linear toral endomorphism is well defined.

Proof. Suppose x, y ∈ R^k are such that x = y + n for some integer vector n, so that x, y are in the same equivalence class of R^k/Z^k. Then

    Ax = A(y + n) = Ay + An = Ay mod 1

since n is an integer vector and A has integer entries, so An is an integer vector.

Definition 2.15. Linear Toral Automorphism: A linear toral endomorphism T_A is a linear toral automorphism if det(A) = ±1

Lemma 2.8. If T_A is a linear toral automorphism then T_A^{−1} = T_{A^{−1}}

Definition 2.16. Hyperbolic Toral Automorphism: A linear toral automorphism T_A is a hyperbolic toral automorphism if A doesn't have eigenvalues of modulus 1.

Proposition 2.2. Let T_A be a hyperbolic toral automorphism of R²/Z². Then Q²/Z² is the set of all periodic points of T_A.

Proof.
Suppose (x1 , x2 ) = pq1 , pq2 where 0 ≤ p1 , p2 < q are integers. We have that ! (n) (n) p1 p2 n TA (x1 , x2 ) = , q q (n) (n) where 0 ≤ p1 , p2 < q are integers representing the transformed points. Notice that q remains unchanged since we are always multiplying by intergers. Moreover there are at most q possible choices (n) (n) (n) for pi and hence q 2 possible distinct combinations (p1 , p2 ) so we must have that there are some n > m ≥ 0 such that TAn (x1 , x2 ) = TAm (x1 , x2 ) But since TA is invertible and TA−1 = TA−1 we have that TAn−m (x1 , x2 ) = (x1 , x2 ) so indeed (x1 , x2 ) are periodic and since (x1 , x2 ) was chosen arbitrarily all rational points are periodic. Suppose (x1 , x2 ) is periodic then ∃n s.t. T n (x1 , x2 ) = (x1 , x2 ). 13 This is equivalent to saying An (x1 , x2 ) = (x1 , x2 ) + (n1 , n2 ) for ni ∈ Z. This gives us that (An − I)(x1 , x2 ) = (n1 , n2 ). Now since TA doesn’t have any eigenvalues with modulus one we have that 1 is not an eigenvalue of A (and therefore An ) so An − I is invertible. Therefore (x1 , x2 ) = (An − I)−1 (n1 , n2 ) But (An − I)−1 is a rational matrix and (n1 , n2 ) is an integer vector so their product must be a rational vector and hence all periodic points are rational. 3 Measure Theory Definition 3.1. Algebra: For a set X, a collection A of subsets of X is called a σ-algebra if: • φ∈F • A ∈ F =⇒ Ac ∈ F • A, B ∈ F =⇒ A ∩ B ∈ F Definition 3.2. σ-Algebra: For a set X, a collection F of subsets of X is called a σ-algebra if: • φ∈F • A ∈ F =⇒ Ac ∈ F S∞ • {Ai }∞ i=1 ∈ F =⇒ i=1 Ai ∈ F Lemma 3.1. If F is a σ-algebra of subsets of X then • X∈X • {Ai }∞ i=1 ∈ F =⇒ T∞ i=1 Ai ∈ F Proof. • X = φc ∈ F T∞ S∞ c • i=1 Ai = ( i=1 Aci ) ∈ F Definition 3.3. Borel σ-Algebra: For a given set X the Borel σ-algebra B(X) on X is the smallest σ-algebra containing all open sets. Definition 3.4. Measurable Space: If X is a set and F a σ-algebra on X then (X, F) is called a measurable space. Definition 3.5. 
Measure: If (X, F) is a measurable space then µ : F → [0, ∞] is a measure on (X, F) if:

• µ(∅) = 0
• for {A_i}_{i=1}^∞ ⊆ F with A_i ∩ A_j = ∅ ∀i ≠ j we have

    µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

Definition 3.6. Measure Space: If X is a set, F a σ-algebra on X and µ a measure on (X, F) then (X, F, µ) is called a measure space.

Definition 3.7. Finite Measure: A measure µ on (X, F) is finite if µ(X) < ∞

Definition 3.8. σ-Finite Measure: A measure µ on (X, F) is σ-finite if ∃{A_i}_{i=1}^∞ ⊆ F such that

• µ(A_i) < ∞ ∀i
• X = ∪_{i=1}^∞ A_i

Definition 3.9. Almost Everywhere: For a measure space (X, F, µ), a property P holds almost everywhere with respect to µ if µ({x : P fails at x}) = 0

Theorem 3.1. Kolmogorov Extension Theorem: If A is an algebra on X and µ : A → [0, ∞] satisfies

• µ(∅) = 0
• µ is σ-finite
• whenever {A_i}_{i=1}^∞ ⊆ A are pairwise disjoint with ∪_{i=1}^∞ A_i ∈ A we have

    µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

then there is a unique measure µ* : σ(A) → [0, ∞] extending µ, where σ(A) is the σ-algebra generated by A.

Definition 3.10. Stieltjes Measure: For X = [0, 1], A the algebra generated by open intervals and ρ : X → R^+ increasing with ρ(1) − ρ(0) = 1, we define the Stieltjes measure with respect to ρ by µ((a, b)) = ρ(b) − ρ(a), which extends to the entire space by KET.

Definition 3.11. Dirac Measure: For X an arbitrary space and F any non-empty σ-algebra we define the Dirac measure with respect to x ∈ X to be δ_x(A) = χ_A(x)

Definition 3.12. Measurable: If (X, F, µ) is a measure space then f : X → R is called F-measurable if f^{−1}(A) ∈ F ∀A ∈ B(R)

Definition 3.13. Simple: f : X → R is called simple if ∃{A_i}_{i=1}^k ⊆ F, {a_i}_{i=1}^k ⊂ R such that

    f = Σ_{i=1}^k a_i χ_{A_i}

Theorem 3.2. For f : X → R^+ measurable there exists an increasing sequence {f_n}_{n=1}^∞ of simple functions converging pointwise to f.

Definition 3.14.
Integral: We split the definition of an integral into three seperate cases: • If f : X → R is simple then we can write f= k X ai IAi i=1 and then Z f dµ = k X i=1 15 ai µ(Ai ) • If f : X → R+ then by theorem 3.2 we can find an increasing sequence {fn }∞ n=1 of simple functions converging pointwise to f but less than f everywhere then Z Z f dµ = lim fn dµ n→∞ • If f : X → R such that Z |f |dµ < ∞ then we write f + = max{f, 0}, f − = max{−f, 0} so that f = f + − f − then define Z Z Z f dµ = f + dµ − f − dµ All of which are consistent. Definition 3.15. Equivalent: If f, g : X → R are measurable then they are equivalent with respect to the measure µ if f = g µ − a.e. We write L1 (X, F, µ) to be the space of equivalence classes of integrable functions f : X → R and define Z ||f ||1 = |f |dµ to be its norm which defines a metric via d(f, g) = ||f − g||1 Furthermore for p ≥ 1 we write Lp (X, F, µ) to be the space of equivalence classes of functions f : X → R such that |f |p is integrable and define Z ||f ||p = p1 |f | dµ p to be its norm. Lemma 3.2. If (X, F, µ) is a finite measure space then for 1 ≤ p < q we have that Lq (X, F, µ) ⊂ Lp (X, F, µ) Theorem 3.3. Monotone Convergence Theorem: R If {fn }∞ fn dµ is n=1 : X → R is an increasing sequence of integrable functions on (X, F, µ) such that a bounded sequence then limn→∞ fn exists µ a.e, is integrable and Z Z lim fn dµ = lim fn dµ n→∞ n→∞ Theorem 3.4. Dominated Convergence Theorem: If {fn }∞ n=1 : X → R is a sequence of measurable functions on (X, F, µ) such that |fn | ≤ g for some integarble function g : X → R and limn→∞ fn = f µ a.e. then f is integrable and Z Z lim fn dµ = f dµ n→∞ 4 Measures On Compact Metric Spaces Lemma 4.1. C(X, R) := {f : X → R continuous} equiped with the metric d(f, g) = ||f − g||∞ := supx∈X |f (x) − g(x)| is a metric space. 16 We denote M (X) to be the set of probability measures on (XB) then for µ ∈ M (X) we write Z µ(f ) := f dµ Proposition 4.1. 
If µ ∈ M (X) then: • µ is continuous: fn ∈ C(X, R), limn→∞ fn = f then limn→∞ µ(fn ) = µ(f ) • µ is bounded: f ∈ C(X, R) then |µ(f )| eq||f ||∞ • µ is linear: λ1 , λ2 ∈ R, f1 , f2 ∈ C(X, R) then µ(λ1 f1 + λ2 f2 ) = λ1 µ(f1 ) + λ2 µ(f2 ) • µ is positive: f ≥ 0 then µ(f ) ≥ 0 • µ is normalised: µ(1) = 1 Theorem 4.1. Riesz Representation Theorem: Let w : C(X, R) → R be a linear, bounded, positive, normalised functional then ∃1 µ ∈ M (X) such that Z w(f ) = f dµ Definition 4.1. Complete: A metric space is complete if every Cauchy sequence converges. Definition 4.2. Seperable: A metric space is seperable if it contains a countable dense subset. Proposition 4.2. M (X) is convex. i.e. µ1 , µ2 ∈ M (X), α ∈ [0, 1] then αµ1 + (1 − α)µ2 ∈ M (X) Definition 4.3. Weak Convergence: If µ, {µn }∞ n=1 ∈ M (X) then µn converges to µ weakly if ∀f ∈ C(X, R) we have that Z Z lim f dµn = f dµ n→∞ Lemma 4.2. ∃{fn }∞ n=1 ∈ C(X, R) countable dense subset and ∀µ, ν ∈ M (X) we have that ∞ X Z Z 1 d(µ, ν) := f dµ − f dν 2n ||fn ||∞ n=1 is a metric on M (X) compatable with the notation of weak convergence. Theorem 4.2. If X is a compact metric space then M (X) is weakly compact. Proof. It suffices to show that M (X) is sequentially compact i.e. that: ∞ ∀{µn }∞ n=1 ∈ M (X) ∃{µnk }k=1 which converges weakly. C(X, R) is seperable so choose a countable dense subset {fi }∞ i=1 ∈ C(X, R). ∈ M (X) we have that Given {µn }∞ n=1 µn (f1 ) ≤ ||f1 ||∞ ∀n by boundedness of µn hence since this sequence is in R we have that there is some convergent ∞ subsequence {µnk (1) }∞ k=1 ⊆ {µn }n=1 17 Similarly for each r = 2, 3, ... 
we have that µn (fr ) ≤ ||fr ||∞ ∀n by boundedness of µn hence since this sequence is in R we have that there is some convergent ∞ subsequence {µnk (r) }∞ k=1 ⊆ {µnk (r−1) }k=1 In particular let νn := µnk(n) be the diagonal sequence then νn (fn ) converges ∀n ≥ 1 Since fn are dense we have that for any f ∈ C(X, R) and fixed ε > 0 we can find some fi in our counable set such that ||f − fi ||∞ < ε. Since νn converges we can find N ∈ N such that ∀m, n ≥ N we have that |νn (fi ) − νm (fi )| < ε hence |νn (f ) − νm (f )| ≤ |νn (f ) − νn (fi )| + |νn (fi ) − νm (fi )| + |νm (fi ) − νm (f )| ≤ 3ε So indeed νn (f ) converges. Moreover by writing w(f ) = lim νn (f ) n→∞ we have that W satisfies the Riesx representation theorem criteria hence ∃1 µ ∈ M (X) s.t. Z w(f ) = f dµ so we must have that Z lim n→∞ Z f dνn = f dµ ∀f ∈ C(X, R) so indeed νn converges weakly to µ so we have some convergent subsequence. 5 Measure Preserving Transformations Definition 5.1. Measure Preserving Transformation: T : X → X measurable on (X, B) is a measure preserving transform if µ(T −1 (A)) = µ(A) Lemma 5.1. TFAE: 1. T is a measure preserving transform. 2. ∀f ∈ L1 (X, B, µ) we have that Z Z f ◦ T dµ = Proof. 2 =⇒ 1): For A ∈ B we have that χA ∈ L1 (X, B, µ) Z µ(A) = χA dµ Z χA ◦ T dµ = Z = χT −1 (A) dµ = µ(T −1 (A)) 18 f dµ ∀A ∈ B 1 =⇒ 2): Suppose T is a measure preserving transform then for any characteristic function we have that: Z χA dµ = µ(A) = µ(T −1 (A)) Z = χA ◦ T dµ This extends to simple functions by linearity and for f ∈ L1 (X, B, µ) we can find an increasing sequence of simple functions fn ∈ L1 (X, B, µ) s.t. fn → f pointwise. In particular fn ◦ T → f ◦ T pointwise so: Z Z fn dµ f dµ = lim n→∞ Z = lim fn ◦ T dµ n→∞ Z = f ◦ T dµ Definition 5.2. Push Forward Measure: For T : X → X continuous on compact X we define T∗ : M (X) → M (X) by T∗ µ(A) = µ(T −1 A) and call T∗ µ the push forward measure. Notice that µ is T -invariant iff T∗ µ = µ Lemma 5.2. 
For f ∈ C(X, R) we have that Z Z f d(T∗ µ) = f ◦ T dµ Lemma 5.3. If T : X → X is continuous on compact metric space X then TFAE: 1. T∗ µ = µ 2. ∀f ∈ C(X, R) we have that: Z Z f dµ = f ◦ T dµ Proof. 1 =⇒ 2) follows from lemma 5.1. 2 =⇒ 1): Let w1 , w2 : C(X, R) → R be defined by Z w1 (f ) = f dµ Z w2 (f ) = f d(T∗ µ) which satisfy the criteria for Riesz representation theorem so Z w2 (f ) = f d(T∗ µ) Z = f ◦ T dµ Z = f dµ = w1 (f ) 19 but by uniqueness from RRT we must have that µ = T∗ µ Theorem 5.1. Let T : X → X be a continuous mapping of compact metric space X then there is at least one T -invariant probability measure. Proof. Let σ ∈ M (X) and define µn := then we have that Z n−1 1X j T σ n j=0 ∗ n−1 Z 1 X f ◦ T j dσ f dµn = n j=0 Since M (X) is weakly compact we have that µn has a convergent subsequence µnk converging to some µ ∈ M (X). We need to show that µ is T -invariant. Let f : X → R be continuous then Z Z Z Z f ◦ T dµ − f dµ = lim f ◦ T dµn − f dµn k k k→∞ nX Z n−1 X 1 Z k −1 1 = lim f ◦ T j+1 dσ − f ◦ T j dσ k→∞ nk n j=0 j=0 k Z Z 1 = lim f ◦ T nk dσ − f dσ k→∞ nk 2||f ||∞ ≤ lim k→∞ nk =0 Theorem 5.2. For a compact metric space X and T : X → X continuous mapping we have that: • M (X, T ) is convex. • M (X, T ) is closed. Proof. • Let µ1 , µ2 ∈ M (X, T ) and α ∈ (0, 1) (αµ1 + (1 − α)µ2 )(T −1 B) = αµ1 (T −1 B) + (1 − α)µ2 (T −1 B) = αµ1 (B) + (1 − α)µ2 (B) = (αµ1 + (1 − α)µ2 )(B) • Let {µn }∞ n=1 ∈ M (X, T ) be a sequence of T -invariant probability measures converging to some µ ∈ M (X) weakly. For f ∈ C(X, R) Z Z f ◦ T dµ = lim f ◦ T dµn n→∞ Z = lim f dµn n→∞ Z = f dµ 20 Corollary 5.1. In order to show that a continuous mapping T : X → X is µ-invariant we can simply check µ(T −1 B) = µ(B) for open intervals. Open intervals generate the Borel σ algebra and hence by Kolmogorov’s extension theorem we must have that the measures are unique and hence T∗ µ, µ coincide on the entire σ-algebra. Definition 5.3. 
Fourier Series: For f ∈ L1 (R \ Z, B, µ) we have the Fourier series ∞ X cn e2πinx n=−∞ where Z 1 cn = f (x)e−2πinx dµ(x) 0 For general f we do not have that this series necessarily converges. Lemma 5.4. Riemann-Lebesgue: If f ∈ L1 then limn→∞ cn = 0 We denote n X Sn (x) := cn e2πirx r=−n to be the partial sum of the Fourier series and σn (x) := n−1 1X Sk (x) n k=0 to be the Cesaro average. Theorem 5.3. Riesz-Fischer: For f ∈ L2 (R \ Z, B, µ) we have that Sn converges to f in L2 Lemma 5.5. For f ∈ L2 (R \ Z, B, µ) we have that Sn converges to f µ-a.e. Theorem 5.4. Feyer’s: If f is continuous then σn converges uniformly to f Corollary 5.2. In order to determine whether a measure µ is T -invariant it suffices to check that Z Z σn dµ = σn ◦ T dµ By Feyer’s theorem. Moreover to check this it suffices to show that Z Z Sn dµ = Sn ◦ T dµ so long as f ∈ L2 by Riesz-Fischer. Definition 5.4. Equivalent: Two measures µ, ν on the same measurable space are equivalent if they have the same collection of null sets. Lemma 5.6. If T : Rk /Zk is a linear toral endomorphism defined as T (x) = Ax mod 1 then T is Lebesgue invariant. 21 Proof. Let f ∈ L1 (Rk /Zk , mathcalB, λ) then f has Fourier series X cn e2πi<n,x> n∈Z where Z cn := f (x)e−2πi<n,x> dλ Rk /Zk Moreover Z e Rk /Zk 2πi<n,x> ( 0 dλ = 1 n 6= 0 n=0 Hence since det(A) 6= 0 we have that nA = 0 ⇐⇒ n = 0 so Z Z X f ◦ T dλ = cn e2πi<n,Ax> dλ n∈Zk = Z X cn e2πi<nA,x> dλ n∈Zk = X Z cn e2πi<nA,x> dλ n∈Zk = c0 Z = f dλ Theorem 5.5. Perron-Frobenum: If B is a non-negative, aperiodic, k × k matrix then: • ∃λ > 0 eigenvalue of B such that |λ̃| < λ for all other eigenvalues λ̃. of B • λ is simple i.e. the eigenspace of λ is one dimensional. • ∃1 v right-eigenvector s.t. vi > 0 ∀i, Bv = λv and k X vi = 1 i=1 • ∃1 u left-eigenvector s.t. ui > 0 ∀i, uB = λu and k X ui = 1 i=1 • Eigenvectors corresponding to other eigenvalues have at least one negative entry. Definition 5.5. 
Stochastic Matrix: A k × k matrix P is called stochastic if • Pi,j ≥ 0 Pk • j=1 Pi,j = 1 ∀i Definition 5.6. Compatible: Stochastic matrix P is compatible with 0, 1 matrix A if Pi,j > 0 ⇐⇒ Ai,j = 1 22 Corollary 5.3. P is aperiodic if and only if A is aperiodic where P is compatible with A. Lemma 5.7. If P is a stochastic matrix then P satisfies the hypothesis of the P-F theorem and hence ∃λ > 0 strictly largest eigenvalue moreover, λ = 1 and has corresponding right-eigenvector v = 1 which is the unique eigenvector with positive entries. Definition 5.7. Markov Measure: If P is a stochastic matrix and p the left-eigenvector of P then µP [y0 , ..., yn ] := py0 n Y Pyi−1 ,yi i=1 defines the Markov measure on cylinder sets which extends to the shift space by KET. Lemma 5.8. A Markov measure is σ-invariant. Proof. σ∗ µP [y0 , ..., yn ] = µP (σ −1 [y0 , ..., yn ]) = µP k [ ! [i, y0 , ..., yn ] i=1 = k X µP ([i, y0 , ..., yn ]) i=1 = k X pi Pi,y0 i=1 = py0 n Y Pyj−1 ,yj j=1 n Y Pyj−1 ,yj since p is a left eigenvector of P j=1 = µP ([y0 , ..., yn ]) There are uncountably many σ-invariant probability measures. Definition 5.8. Bernoulli Measure: For a full shift we define the Bernoulli measure by the stochastic matrix Pi,j = pj so that µP ([y0 , ..., yn ]) := n Y pyi i=0 Definition 5.9. Parry Measure: For a 0, 1 matrix A with eigenvalue λ and eigenvectors u, v determined by P-F we define the Parry measure to be that generated by Ai,j vj Pi,j = λvi and ui vi pi = Pk j=1 uj vj 23 6 Ergodicity Definition 6.1. Ergodic: If (X, B, µ) is a probability space then the measure preserving transformation T : X → X is ergodic if B ∈ B s.t. T −1 (B) = B implies that µ(B) ∈ {0, 1} Theorem 6.1. If T is an ergodic measure preserving transformation of probability space (X, B, µ) and f ∈ L1 (X, B, µ) then Z n−1 1X j lim f (T x) = f dµ n→∞ n j=0 Lemma 6.1. 
If ∃A ∈ B such that T^{−1}(A) = A but µ(A) ∈ (0, 1) then T is not ergodic for µ, but µ_A defined by

µ_A(B) = µ(B ∩ A) / µ(A)

is invariant with respect to T.

Lemma 6.2. If B ∈ B is such that µ(T^{−1}B Δ B) = 0 then ∃B_∞ ∈ B such that T^{−1}(B_∞) = B_∞ and µ(B_∞ Δ B) = 0

Corollary 6.1. If T is ergodic and µ(T^{−1}(B) Δ B) = 0 then µ(B) ∈ {0, 1}

Proposition 6.1. Let T be a measure preserving transformation of the probability space (X, B, µ); then TFAE:
1. T is ergodic.
2. Whenever f ∈ L¹(X, B, µ) satisfies f ∘ T = f µ-a.e., f is constant µ-a.e.

Proof.
• 1 ⟹ 2) Suppose T is ergodic and f ∈ L¹(X, B, µ) with f ∘ T = f µ-a.e. For k ∈ ℤ, n ∈ ℕ define

X(k, n) := {x ∈ X : k/2ⁿ ≤ f(x) < (k+1)/2ⁿ} = f^{−1}([k/2ⁿ, (k+1)/2ⁿ))

Since f is measurable we have X(k, n) ∈ B. Moreover

T^{−1}(X(k, n)) Δ X(k, n) ⊂ {x ∈ X : f(x) ≠ f(Tx)}

hence µ(T^{−1}(X(k, n)) Δ X(k, n)) = 0, so by Corollary 6.1 we must have µ(X(k, n)) ∈ {0, 1}. For fixed n ∈ ℕ,

X = ∪_{k∈ℤ} X(k, n) ∪ X_∞

forms a disjoint partition, where X_∞ := {x : f(x) = ±∞}. Furthermore µ(X_∞) = 0 since f ∈ L¹, which means that

1 = µ(X) = Σ_{k∈ℤ} µ(X(k, n))

Since each µ(X(k, n)) ∈ {0, 1}, for each n ∈ ℕ there is a unique k_n ∈ ℤ with µ(X(k_n, n)) = 1. If we let

Y = ∩_{n=1}^{∞} X(k_n, n) = {x ∈ X : f(x) = c}

for some constant c, then µ(Y) = 1 and f is constant on Y, so f is constant µ-a.e.

• 2 ⟹ 1) Suppose B ∈ B with T^{−1}(B) = B. Then χ_B ∈ L¹ and χ_B ∘ T = χ_{T^{−1}B} = χ_B, so by our assumption χ_B is constant µ-a.e., hence

µ(B) = ∫ χ_B dµ ∈ {0, 1}

Hence T is ergodic. The same proof gives the equivalence with L²(X, B, µ) in place of L¹.

Theorem 6.2. If T : ℝ/ℤ → ℝ/ℤ is the rotation T(x) = x + α mod 1 then T is ergodic for the Lebesgue measure iff α ∈ ℝ \ ℚ

Proof. Suppose α ∈ ℚ, say α = p/q with p, q ∈ ℤ coprime. Define f(x) = e^{2πiqx} ∈ L², which isn't constant. Then

f(Tx) = e^{2πiq(x+α)} = e^{2πiqx} e^{2πiqα} = e^{2πiqx} e^{2πip} = e^{2πiqx}

hence f ∘ T = f µ-a.e. but f isn't constant µ-a.e., so T cannot be ergodic.
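The dichotomy in Theorem 6.2 can be seen numerically; a minimal sketch (the choices of α, q, starting point and iterate count are illustrative) contrasting the two cases. For rational α = p/q the invariant function f(x) = e^{2πiqx} has Birkhoff averages frozen at f(x₀) ≠ ∫f dλ = 0, while for irrational α the averages of e^{2πix} decay to 0:

```python
import cmath
import math

def birkhoff_avg(alpha, freq, x0, n):
    """Average of f(x) = e^{2*pi*i*freq*x} along the rotation orbit of x0."""
    s = sum(cmath.exp(2j * math.pi * freq * (x0 + j * alpha))
            for j in range(n))
    return s / n

x0, n = 0.1234, 20_000

# rational alpha = 2/7: f(x) = e^{2*pi*i*7x} is T-invariant, so the
# average equals f(x0) and its modulus stays at 1
rational = birkhoff_avg(2 / 7, 7, x0, n)

# irrational alpha = sqrt(2): the average of e^{2*pi*i*x} tends to 0
irrational = birkhoff_avg(math.sqrt(2), 1, x0, n)
```

For the rational case the sum is a constant repeated n times, so no averaging takes place; for the irrational case the sum is a geometric series bounded by 1/(n|sin πα|).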
Suppose α ∈ ℝ \ ℚ and f ∈ L² with f ∘ T = f µ-a.e. Then f has Fourier series

Σ_{n=−∞}^{∞} c_n e^{2πinx}

and f ∘ T has Fourier series

Σ_{n=−∞}^{∞} c_n e^{2πinα} e^{2πinx}

Since f ∘ T = f µ-a.e. the Fourier series coincide, so comparing coefficients, for n ≠ 0

c_n = c_n e^{2πinα}

Since α is irrational, e^{2πinα} ≠ 1 for n ≠ 0, so this forces c_n = 0. Both functions therefore have Fourier series c_0, which is constant, hence f = c_0 µ-a.e. and T is ergodic.

Theorem 6.3. If T : ℝ/ℤ → ℝ/ℤ is the doubling map T(x) = 2x mod 1 then T is ergodic with respect to the Lebesgue measure.

Proof. Suppose f ∈ L² with f ∘ T = f µ-a.e.; then f ∘ T^j = f µ-a.e. for any j ∈ ℕ. f has Fourier series

Σ_{n∈ℤ} c_n e^{2πinx}

and f ∘ T^j has Fourier series

Σ_{n∈ℤ} c_n e^{2πin2^j x}

Since f = f ∘ T^j µ-a.e. the Fourier series coincide, hence c_{n2^j} = c_n for any j ≥ 0. If n ≠ 0 then lim_{j→∞} |n2^j| = ∞, but by the Riemann–Lebesgue lemma the coefficients must converge to zero: lim_{j→∞} c_{n2^j} = 0. This means c_n = 0 whenever n ≠ 0; in particular f = c_0 µ-a.e., hence T is ergodic.

Lemma 6.3. If T : ℝ^k/ℤ^k → ℝ^k/ℤ^k is a linear toral automorphism T(x) = Ax mod 1 then TFAE:
1. T is ergodic with respect to the Lebesgue measure.
2. The only n ∈ ℤ^k such that ∃p ∈ ℕ with e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩} µ-a.e. is n = 0

Proof. (1 ⟹ 2) Suppose T is ergodic with respect to µ and that ∃n ∈ ℤ^k, p ∈ ℕ with

e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}   µ-a.e.

WLOG let p be the smallest such p for this n and define

f(x) = Σ_{j=0}^{p−1} e^{2πi⟨n,A^j x⟩} ∈ L²

Notice that f ∘ T = f µ-a.e. Since T is ergodic f must be constant, which is only the case if n = 0.

(2 ⟹ 1) Suppose the only n ∈ ℤ^k with ∃p ∈ ℕ such that e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩} µ-a.e. is n = 0. Let f ∈ L² with f ∘ T = f µ-a.e. f has Fourier series

Σ_{n∈ℤ^k} c_n e^{2πi⟨n,x⟩}

and f ∘ T^p has Fourier series

Σ_{n∈ℤ^k} c_n e^{2πi⟨n,A^p x⟩} = Σ_{n∈ℤ^k} c_n e^{2πi⟨nA^p,x⟩}

Since f = f ∘ T^p µ-a.e.
we can equate coefficients, which gives c_n = c_{nA^p} for any p ≥ 0. Suppose c_n ≠ 0; then c_{nA^p} ≠ 0 for all p. If lim_{p→∞} ||nA^p|| = ∞ then by Riemann–Lebesgue lim_{p→∞} c_{nA^p} = 0, contradicting c_{nA^p} = c_n ≠ 0, so the sequence nA^p must have repeats. This means ∃l > l' such that nA^l = nA^{l'}, and since A is invertible nA^p = n for some p ∈ ℕ. This gives e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}, so by our initial assumption n = 0. Hence f = c_0 is constant and T is ergodic.

Proposition 6.2. If T : ℝ^k/ℤ^k → ℝ^k/ℤ^k is a linear toral automorphism T(x) = Ax mod 1 then T is ergodic with respect to the Lebesgue measure iff A has no roots of unity as eigenvalues.

Proof. Suppose T is not ergodic; then by the previous lemma ∃n ∈ ℤ^k \ {0}, p ∈ ℕ such that

e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}

So nA^p = n, and since n ≠ 0 we must have that 1 is an eigenvalue of A^p, so A has an eigenvalue which is a root of unity.

Conversely suppose A has a pth root of unity as an eigenvalue. Then A^p has 1 as an eigenvalue, hence ∃n ∈ ℝ^k \ {0} such that n(A^p − I) = 0. In particular, since A^p has integer entries we can choose n ∈ ℤ^k \ {0}. This means nA^p = n and e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}, so by the previous lemma T is not ergodic with respect to µ.

Corollary 6.2. Hyperbolic toral automorphisms are ergodic with respect to the Lebesgue measure.

Definition 6.2. Extremal: For a convex set Y we say that y ∈ Y is extremal if y = αy_1 + (1−α)y_2 with y_1, y_2 ∈ Y and α ∈ (0, 1) implies y_1 = y_2 = y

Theorem 6.4. For µ ∈ M(X, T) we have that if µ is extremal then µ is ergodic.

Proof. Suppose µ is not ergodic; then ∃B ∈ B with T^{−1}(B) = B and µ(B) ∈ (0, 1). Define

µ_1(A) := µ(A ∩ B)/µ(B)   and   µ_2(A) := µ(A ∩ (X \ B))/µ(X \ B)

which are both T-invariant probability measures with µ_1 ≠ µ_2 and

µ = µ(B)µ_1 + (1 − µ(B))µ_2

hence µ cannot be extremal.

Theorem 6.5.
If T : X → X is a continuous mapping on a compact metric space then M(X, T) contains at least one ergodic measure.

Proof. By the previous theorem it suffices to exhibit an extremal µ ∈ M(X, T). C(X, ℝ) is separable, so choose a countable dense set {f_n}_{n=0}^{∞} ⊂ C(X, ℝ). The map µ ↦ ∫ f_0 dµ is continuous in the weak* topology, so since M(X, T) is compact ∃ν ∈ M(X, T) such that

∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ

which means that

M_0 := {ν ∈ M(X, T) : ∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ}

is a non-empty, closed subset of a compact space and hence is compact. Continuing inductively we can define

M_n := {ν ∈ M_{n−1} : ∫ f_n dν = sup_{µ∈M_{n−1}} ∫ f_n dµ}

each of which is non-empty and compact. If we define M_∞ := ∩_{n=0}^{∞} M_n then M_∞ is non-empty, since a countable intersection of nested non-empty compact sets is non-empty. We therefore have ∃µ_∞ ∈ M_∞. We claim that µ_∞ is extremal.

Suppose µ_∞ = αµ_1 + (1−α)µ_2 for α ∈ (0, 1) and µ_1, µ_2 ∈ M(X, T); we want to show µ_1 = µ_2. By the Riesz representation theorem µ_1 = µ_2 iff

∫ f dµ_1 = ∫ f dµ_2   ∀f ∈ C(X, ℝ)

and it suffices to check this on a dense subset. Now

∫ f_0 dµ_∞ = α ∫ f_0 dµ_1 + (1−α) ∫ f_0 dµ_2

so, since µ_1, µ_2 ∈ M(X, T),

sup_{µ∈M(X,T)} ∫ f_0 dµ = ∫ f_0 dµ_∞ ≤ max(∫ f_0 dµ_1, ∫ f_0 dµ_2) ≤ sup_{µ∈M(X,T)} ∫ f_0 dµ

A convex combination with α ∈ (0, 1) can only attain the supremum if both measures attain it, so µ_1, µ_2 ∈ M_0. Suppose inductively that µ_1, µ_2 ∈ M_{n−1}; then

∫ f_n dµ_∞ = α ∫ f_n dµ_1 + (1−α) ∫ f_n dµ_2

so

sup_{µ∈M_{n−1}} ∫ f_n dµ = ∫ f_n dµ_∞ ≤ max(∫ f_n dµ_1, ∫ f_n dµ_2) ≤ sup_{µ∈M_{n−1}} ∫ f_n dµ

and since α ∈ (0, 1) this forces

∫ f_n dµ_∞ = ∫ f_n dµ_1 = ∫ f_n dµ_2

so µ_1, µ_2 ∈ M_n. This holds ∀n ∈ ℕ, hence ∫ f dµ_1 = ∫ f dµ_2 ∀f ∈ {f_n}_{n=0}^{∞}, so indeed µ_1 = µ_2.

Corollary 6.3. If σ : Σ_k → Σ_k is the full shift and p is a probability vector then µ_p[z_0, ..., z_{n−1}] := Π_{i=0}^{n−1} p_{z_i} is ergodic for σ.

Corollary 6.4. µ(B) = (1/log 2) ∫_B 1/(1+x) dx is ergodic for the continued fraction map.

7 Recurrence and Unique Ergodicity

Theorem 7.1.
Poincaré Recurrence Theorem: Let T : X → X be a measure preserving transformation of the probability space (X, B, µ). If A ∈ B with µ(A) > 0 then for µ-almost every x ∈ A the orbit {T^n x}_{n=0}^{∞} returns to A infinitely often.

Proof. Let

E = {x ∈ A : ∃m ∈ ℕ such that T^n x ∉ A ∀n ≥ m}

which is the set of x ∈ A whose orbit returns to A only finitely often. We want to show that µ(E) = 0. Let F = {x ∈ A : T^n x ∉ A ∀n ≥ 1}; then

T^{−k}F = {x ∈ X : T^k x ∈ A, T^n x ∉ A ∀n > k}

So in particular we have that

E = ∪_{k=0}^{∞} (T^{−k}F ∩ A)

and hence

µ(E) = µ(∪_{k=0}^{∞} (T^{−k}F ∩ A)) ≤ µ(∪_{k=0}^{∞} T^{−k}F) ≤ Σ_{k=0}^{∞} µ(T^{−k}F) = Σ_{k=0}^{∞} µ(F)

so it suffices to show that µ(F) = 0. Suppose n > m and x ∈ T^{−n}F ∩ T^{−m}F. Then T^m x ∈ F and also T^{n−m}(T^m x) = T^n x ∈ F ⊂ A, which contradicts T^m x ∈ F (no point of F returns to A). So the sets {T^{−k}F}_{k=0}^{∞} are pairwise disjoint. This gives

µ(∪_{k=0}^{∞} T^{−k}F) = Σ_{k=0}^{∞} µ(T^{−k}F) = Σ_{k=0}^{∞} µ(F)

The left-hand side lies in [0, 1] since µ is a probability measure, while the right-hand side, being an infinite sum of a single non-negative value, can only take values in {0, +∞}; hence both sides equal zero and therefore µ(F) = 0.

Definition 7.1. Unique Ergodicity: If (X, B) is a measurable space with X compact and T : X → X has a unique invariant probability measure µ then T is called uniquely ergodic.

Theorem 7.2. Let X be a compact metric space and T : X → X continuous; then the following are equivalent:
1. T is uniquely ergodic.
2. ∀f ∈ C(X, ℝ) ∃ a constant c_f such that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = c_f

uniformly over x.

Proof.
We split the proof into the two separate implications:

• 2 ⟹ 1) Suppose µ, ν are T-invariant probability measures. Then for f ∈ C(X, ℝ):

∫ f dµ = (1/n) Σ_{j=0}^{n−1} ∫ f ∘ T^j dµ
= lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f ∘ T^j dµ
= ∫ lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f ∘ T^j dµ   (by DCT, the averages being bounded by ||f||_∞)
= ∫ c_f dµ = c_f

Similarly ∫ f dν = c_f, hence ∫ f dµ = ∫ f dν for any f ∈ C(X, ℝ), so µ and ν coincide by the Riesz representation theorem.

• 1 ⟹ 2) We prove the contrapositive. Let µ be an invariant measure; note that if 2 holds then necessarily c_f = ∫ f dµ. Suppose 2 fails; we want to show that 1 also fails. Then ∃f ∈ C(X, ℝ), a sequence {n_k}_{k=1}^{∞} ⊂ ℕ and associated points {x_k}_{k=1}^{∞} ⊂ X such that

lim_{k→∞} (1/n_k) Σ_{j=0}^{n_k−1} f(T^j x_k) exists and ≠ ∫ f dµ

For k ≥ 1 define ν_k ∈ M(X) by

ν_k = (1/n_k) Σ_{j=0}^{n_k−1} T^j_* δ_{x_k}

Then

∫ f dν_k = (1/n_k) Σ_{j=0}^{n_k−1} f(T^j x_k)

By weak* compactness ν_k has a subsequence ν_{k_r} converging to some probability measure ν, which is T-invariant since T_*ν_k − ν_k = (1/n_k)(T^{n_k}_* δ_{x_k} − δ_{x_k}) → 0. Then

∫ f dν = lim_{r→∞} ∫ f dν_{k_r} = lim_{r→∞} (1/n_{k_r}) Σ_{j=0}^{n_{k_r}−1} f(T^j x_{k_r}) ≠ ∫ f dµ

So µ ≠ ν and T is not uniquely ergodic.

8 Birkhoff's Ergodic Theorem

Definition 8.1. Absolutely Continuous: If µ, ν are measures on (X, B) then ν is absolutely continuous with respect to µ if µ(B) = 0 ⟹ ν(B) = 0 for any B ∈ B.
We have that if ν(B) := ∫_B f dµ then ν is absolutely continuous with respect to µ.

Theorem 8.1. Radon–Nikodym: Let (X, B, µ) be a probability space and ν a measure on (X, B) absolutely continuous with respect to µ; then there is a unique non-negative measurable function f such that

ν(B) = ∫_B f dµ   ∀B ∈ B

Definition 8.2. Conditional Expectation: For a sub-σ-algebra 𝒜 ⊆ B, µ|_𝒜 is a measure. For f ≥ 0 with f ∈ L¹(X, B, µ),

ν(A) = ∫_A f dµ

is a measure absolutely continuous with respect to µ|_𝒜, so by Radon–Nikodym there is a unique 𝒜-measurable function E[f|𝒜] such that

ν(A) = ∫_A E[f|𝒜] dµ   ∀A ∈ 𝒜

called the conditional expectation of f given 𝒜.

Corollary 8.1. E[f|𝒜] is uniquely determined by the requirements that
• E[f|𝒜] is 𝒜-measurable.
• ∫_A f dµ = ∫_A E[f|𝒜] dµ ∀A ∈ 𝒜

Lemma 8.1. I := {B ∈ B : T^{−1}B = B a.e.} is a σ-algebra, the σ-algebra of invariant sets.

Theorem 8.2.
Birkhoff's Ergodic Theorem: Let (X, B, µ) be a probability space and T : X → X a measure preserving transformation. ∀f ∈ L¹(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = E[f|I](x)

for a.e. x ∈ X.

Corollary 8.2. Let (X, B, µ) be a probability space and T : X → X an ergodic measure preserving transformation. ∀f ∈ L¹(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ

for a.e. x ∈ X.

Proof. If T is ergodic then I consists only of sets of measure 0 or 1, so for f ∈ L¹(X, B, µ)

E[f|I] = ∫ f dµ

and the result follows by Birkhoff's ergodic theorem.

Corollary 8.3. If T : X → X is an ergodic transformation of (X, B, µ) and B ∈ B then, for a.e. x,

lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, T^j x ∈ B} = µ(B)

Theorem 8.3. If T : X → X is a measure preserving transformation of the probability space (X, B, µ) then the following are equivalent:
1. T is ergodic.
2. ∀A, B ∈ B

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) = µ(A)µ(B)

Proof.
• 1 ⟹ 2) Suppose T is ergodic; then χ_A ∈ L¹, so

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_A ∘ T^j = ∫ χ_A dµ = µ(A)   a.e.

and hence

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} (χ_A ∘ T^j) χ_B = µ(A) χ_B   a.e.

Since the left-hand side is bounded by 1, by DCT we have

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) = lim_{n→∞} ∫ (1/n) Σ_{j=0}^{n−1} (χ_A ∘ T^j) χ_B dµ = ∫ µ(A) χ_B dµ = µ(A)µ(B)

• 2 ⟹ 1) Suppose 2 holds and that T^{−1}A = A; set B = A, which gives µ((T^{−j}A) ∩ B) = µ(A ∩ B) = µ(A). So

µ(A) = (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) → µ(A)µ(B) = µ(A)²

hence µ(A) = µ(A)², so µ(A) ∈ {0, 1} and therefore T is ergodic.

Definition 8.3. Weak-Mixing: T is weak mixing if ∀A, B ∈ B we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |µ((T^{−j}A) ∩ B) − µ(A)µ(B)| = 0

Definition 8.4. Strong-Mixing: T is strong mixing if ∀A, B ∈ B we have that

lim_{n→∞} µ((T^{−n}A) ∩ B) = µ(A)µ(B)

Definition 8.5. Normal: We call x ∈ [0, 1) normal if it has a unique binary expansion x = Σ_{i=1}^{∞} x_i/2^i with x_i ∈ {0, 1} and

lim_{n→∞} (1/n) #{j : 1 ≤ j ≤ n, x_j = 0} = 1/2

9 Entropy

Definition 9.1.
Topologically Conjugate: For compact spaces X, Y we say continuous maps T : X → X, S : Y → Y are topologically conjugate if there exists a homeomorphism h : X → Y such that h ∘ T = S ∘ h

Definition 9.2. Isomorphic: If T, S are measure preserving transformations of the probability spaces (X, B, µ), (Y, C, ν) respectively then T, S are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, S N ⊆ N
• µ(M) = 1 = ν(N)
• ∃ϕ : M → N bijection s.t.
  – ϕ, ϕ^{−1} are measurable: ϕ(B) ∈ C ∀B ∈ B, ϕ^{−1}(C) ∈ B ∀C ∈ C
  – ϕ, ϕ^{−1} are measure preserving: µ(ϕ^{−1}C) = ν(C) ∀C ∈ C, ν(ϕB) = µ(B) ∀B ∈ B
  – ϕ ∘ T = S ∘ ϕ

Definition 9.3. Conditional Probability: Let (X, B, µ) be a probability space; if 𝒜 ⊂ B is a sub-σ-algebra and B ∈ B then µ(B|𝒜) := E[χ_B|𝒜] is the conditional probability of B given 𝒜

Definition 9.4. Countable Partition: α is a measure theoretic countable partition of the probability space (X, B, µ) if α = {A_i}_{i=1}^{∞} s.t.
• A_i ∈ B ∀i
• µ(A_i ∩ A_j) = 0 ∀i ≠ j
• µ(∪_{i=1}^{∞} A_i) = 1

Corollary 9.1. For a measurable function f we have that

E[f|σ(α)](x) = Σ_{A∈α} (χ_A(x)/µ(A)) ∫_A f dµ

Furthermore, for x ∈ A,

µ(B|σ(α))(x) = µ(A ∩ B)/µ(A)

Theorem 9.1. Increasing Martingale Theorem: Let {𝒜_i}_{i=1}^{∞} be an increasing sequence of sub-σ-algebras of 𝒜 such that σ(∪_{i=1}^{∞} 𝒜_i) = 𝒜; then
• lim_{n→∞} E[f|𝒜_n] = E[f|𝒜] µ-a.e.
• lim_{n→∞} ∫ |E[f|𝒜_n] − E[f|𝒜]| dµ = 0

Definition 9.5. Join: If α, β are countable partitions of X then the join of α, β is the partition

α ∨ β := {A ∩ B : A ∈ α, B ∈ β}

We say that α, β are independent if µ(A ∩ B) = µ(A)µ(B) ∀A ∈ α, B ∈ β.

Definition 9.6. Information: Given a partition α we define the information I(α) : X → ℝ⁺ obtained from observing α to be

I(α)(x) := −Σ_{A∈α} χ_A(x) log(µ(A))

Corollary 9.2. I(α) is measurable.

Corollary 9.3. If α, β are independent partitions then I(α ∨ β) = I(α) + I(β)

Proof.
I(α ∨ β)(x) = −Σ_{C∈α∨β} χ_C(x) log(µ(C))
= −Σ_{A∈α, B∈β} χ_{A∩B}(x) log(µ(A ∩ B))
= −Σ_{A∈α} Σ_{B∈β} χ_A(x)χ_B(x) log(µ(A)µ(B))   by independence
= −Σ_{A∈α} Σ_{B∈β} χ_A(x)χ_B(x)(log(µ(A)) + log(µ(B)))
= −Σ_{A∈α} χ_A(x) log(µ(A)) − Σ_{B∈β} χ_B(x) log(µ(B))
= I(α)(x) + I(β)(x)

Definition 9.7. Entropy: Given a partition α we define the entropy to be

H(α) = ∫ I(α) dµ = −Σ_{A∈α} µ(A) log(µ(A))

Definition 9.8. Conditional Information: Given a sub-σ-algebra 𝒜 ⊆ B and partition α we define the conditional information of α given 𝒜 to be

I(α|𝒜)(x) := −Σ_{A∈α} χ_A(x) log(µ(A|𝒜)(x))

Definition 9.9. Conditional Entropy: Given a sub-σ-algebra 𝒜 ⊆ B and partition α we define the conditional entropy of α given 𝒜 to be

H(α|𝒜) := ∫ I(α|𝒜) dµ = −Σ_{A∈α} ∫ µ(A|𝒜) log(µ(A|𝒜)) dµ

Lemma 9.1. For countable partitions α, β, γ we have that

I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ)

Moreover H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ)

Proof. Let x ∈ X; since α, β, γ are partitions there are A ∈ α, B ∈ β, C ∈ γ with x ∈ A ∩ B ∩ C. Then

I(α ∨ β|γ)(x) = −Σ_{Y∈α∨β} χ_Y(x) log(µ(Y|γ)(x))
= −log(µ(A ∩ B|γ)(x))
= −log(µ(A ∩ B ∩ C)/µ(C))
= −log(µ(A ∩ B ∩ C)) + log(µ(C))

and similarly

I(α|γ)(x) = −log(µ(A ∩ C)) + log(µ(C))
I(β|α ∨ γ)(x) = −log(µ(A ∩ B ∩ C)) + log(µ(A ∩ C))

Hence indeed I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ), and integrating over x gives H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ).

Definition 9.10. Refinement: For countable partitions α, β we say that β is a refinement of α (written α ≤ β) if every set A ∈ α can be written as a union of sets in β.

Corollary 9.4. For countable partitions α ≤ β we have that I(α|β) = 0

Proof. Since α ≤ β we have β = α ∨ β, and for x ∈ A ∈ α we have µ(A|β)(x) = 1, so I(α|β) = I(α|α ∨ β) = 0

Corollary 9.5. If α, β, γ are countable partitions and γ ≥ β then I(α ∨ β|γ) = I(α|γ). Moreover H(α ∨ β|γ) = H(α|γ)

Proof. β ≤ γ ≤ α ∨ γ, so by Lemma 9.1 and Corollary 9.4

I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ) = I(α|γ)

The final result follows by integration.

Corollary 9.6. If α, β, γ are countable partitions s.t.
α ≥ β then I(α|γ) ≥ I(β|γ). Moreover H(α|γ) ≥ H(β|γ)

Proof. α ≥ β so α = α ∨ β, and we have that

I(α|γ) = I(α ∨ β|γ) = I(β|γ) + I(α|β ∨ γ) ≥ I(β|γ)

The final result then follows by integration.

Proposition 9.1. Jensen's Inequality: Let ϕ : [0, 1] → ℝ⁺ be continuous and concave. If f ∈ L¹(X, B, µ) and 𝒜 ⊂ B is a sub-σ-algebra then

ϕ(E[f|𝒜])(x) ≥ E[ϕ(f)|𝒜](x)   µ-a.e.

Lemma 9.2. If γ ≥ β are countable partitions then H(α|β) ≥ H(α|γ)

Proof. Set ϕ(t) = −t log(t), which is continuous and concave on [0, 1] and hence satisfies the requirements of Jensen's inequality. Choose A ∈ α and define

f(x) := µ(A|γ)(x) = E[χ_A|γ](x)

By the tower property of conditional expectation (β ≤ γ means σ(β) ⊆ σ(γ)):

E[f|β] = E[E[χ_A|γ]|β] = E[χ_A|β] = µ(A|β)

By Jensen's inequality ϕ(E[f|β]) ≥ E[ϕ(f)|β], hence

−µ(A|β) log(µ(A|β)) ≥ −E[µ(A|γ) log(µ(A|γ))|β]

Integrating with respect to µ yields

−∫ µ(A|β) log(µ(A|β)) dµ ≥ −∫ µ(A|γ) log(µ(A|γ)) dµ

and summing over A ∈ α gives H(α|β) ≥ H(α|γ).

Definition 9.11. Sub-Additive: A sequence {a_n}_{n=1}^{∞} is called sub-additive if a_{n+m} ≤ a_n + a_m

Lemma 9.3. (Fekete) If {a_n}_{n=1}^{∞} is a non-negative sub-additive sequence then a_n/n converges, with limit inf_n a_n/n.

Proof. Sub-additivity gives a_n ≤ n a_1, so 0 ≤ a_n/n ≤ a_1 and L := inf_n a_n/n is finite. Given ε > 0 choose m with a_m/m < L + ε. Writing n = qm + r with 0 ≤ r < m (and the convention a_0 = 0), sub-additivity gives a_n ≤ q a_m + a_r, so

a_n/n ≤ (qm/n)(a_m/m) + a_r/n → a_m/m < L + ε   as n → ∞

Hence limsup a_n/n ≤ L ≤ liminf a_n/n, and a_n/n → L.

For a measure preserving transformation T and countable partition α we denote T^{−1}α := {T^{−1}A : A ∈ α} and

H_n(α) := H(∨_{i=0}^{n−1} T^{−i}α)

Lemma 9.4. If T is a measure preserving transformation and α a countable partition then H_n(α) is a sub-additive sequence.

Proof.

H_{n+m}(α) = H(∨_{i=0}^{n+m−1} T^{−i}α)
= H((∨_{i=0}^{n−1} T^{−i}α) ∨ (∨_{j=n}^{n+m−1} T^{−j}α))
≤ H(∨_{i=0}^{n−1} T^{−i}α) + H(∨_{j=n}^{n+m−1} T^{−j}α)
= H(∨_{i=0}^{n−1} T^{−i}α) + H(T^{−n} ∨_{j=0}^{m−1} T^{−j}α)
= H(∨_{i=0}^{n−1} T^{−i}α) + H(∨_{j=0}^{m−1} T^{−j}α)   (T measure preserving)
= H_n(α) + H_m(α)

Definition 9.12.
Relative Entropy: If T : X → X is a measure preserving transformation of the probability space (X, B, µ) and α a countable partition of X such that H(α) < ∞, then the entropy of T relative to α is defined to be

h(T, α) := lim_{n→∞} (1/n) H(∨_{i=0}^{n−1} T^{−i}α)

The limit always exists by the previous two lemmas (H_n(α) is sub-additive, so H_n(α)/n converges).

Corollary 9.7. 0 ≤ h(T, α) ≤ H(α)

Corollary 9.8. h(T, α) = H(α | ∨_{i=1}^{∞} T^{−i}α)

Proof. Denote α_n := ∨_{i=0}^{n−1} T^{−i}α. Then, using Lemma 9.1 and H(∨_{i=1}^{n−1} T^{−i}α) = H(T^{−1}α_{n−1}) = H(α_{n−1}),

H(α_n) = H(α | ∨_{i=1}^{n−1} T^{−i}α) + H(∨_{i=1}^{n−1} T^{−i}α)
= H(α | ∨_{i=1}^{n−1} T^{−i}α) + H(α_{n−1})
= Σ_{k=1}^{n} H(α | ∨_{i=1}^{k−1} T^{−i}α)

Hence

H(α_n)/n = (1/n) Σ_{k=1}^{n} H(α | ∨_{i=1}^{k−1} T^{−i}α)

By the increasing martingale theorem

lim_{n→∞} H(α | ∨_{i=1}^{n−1} T^{−i}α) = H(α | ∨_{i=1}^{∞} T^{−i}α)

and since Cesàro averages of a convergent sequence converge to the same limit,

h(T, α) = lim_{n→∞} H(α_n)/n = H(α | ∨_{i=1}^{∞} T^{−i}α)

Definition 9.13. Entropy of a Measure Preserving Transformation: If T is a measure preserving transformation then

h(T) := sup{h(T, α) : α a countable partition, H(α) < ∞}

Theorem 9.2. Let T : X → X, S : Y → Y be measure preserving transformations of (X, B, µ), (Y, C, ν) respectively. If T, S are isomorphic then h(T) = h(S)

Proof. Recall that T, S are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, S N ⊆ N
• µ(M) = 1 = ν(N)
• ∃ϕ : M → N bijection such that ϕ, ϕ^{−1} are measurable, measure preserving and ϕ ∘ T = S ∘ ϕ

If α is a countable partition of Y then it is also a countable partition of N. ϕ^{−1}α is a partition of M and hence of X.
We have that

H_µ(ϕ^{−1}α) = −Σ_{A∈α} µ(ϕ^{−1}A) log(µ(ϕ^{−1}A)) = −Σ_{A∈α} ν(A) log(ν(A)) = H_ν(α)

More generally we have that

H_µ(∨_{i=0}^{n−1} T^{−i}(ϕ^{−1}α)) = H_µ(ϕ^{−1} ∨_{i=0}^{n−1} S^{−i}α) = H_ν(∨_{i=0}^{n−1} S^{−i}α)

Dividing by n and taking the limit as n → ∞ gives

h(T, ϕ^{−1}α) = lim_{n→∞} (1/n) H_µ(∨_{i=0}^{n−1} T^{−i}(ϕ^{−1}α)) = lim_{n→∞} (1/n) H_ν(∨_{i=0}^{n−1} S^{−i}α) = h(S, α)

So we have that

h(S) = sup{h(S, α) : α countable partition of Y, H_ν(α) < ∞}
= sup{h(T, ϕ^{−1}α) : α countable partition of Y, H_ν(α) < ∞}
≤ sup{h(T, β) : β countable partition of X, H_µ(β) < ∞}
= h(T)

By symmetry we also have h(T) ≤ h(S), and hence h(T) = h(S).

Theorem 9.3. Abramov's Theorem: If {α_n}_{n=1}^{∞} is an increasing sequence of partitions on (X, B, µ) such that H(α_n) < ∞ and σ(∪_{n=1}^{∞} α_n) = B then

h(T) = lim_{n→∞} h(T, α_n)

Proof. Let α, β be partitions with H(α), H(β) < ∞; then

H(∨_{i=0}^{n−1} T^{−i}α) ≤ H((∨_{i=0}^{n−1} T^{−i}α) ∨ (∨_{j=0}^{n−1} T^{−j}β))
= H(∨_{j=0}^{n−1} T^{−j}β) + H(∨_{i=0}^{n−1} T^{−i}α | ∨_{j=0}^{n−1} T^{−j}β)
≤ H(∨_{j=0}^{n−1} T^{−j}β) + Σ_{i=0}^{n−1} H(T^{−i}α | ∨_{j=0}^{n−1} T^{−j}β)
≤ H(∨_{j=0}^{n−1} T^{−j}β) + Σ_{i=0}^{n−1} H(T^{−i}α | T^{−i}β)
= H(∨_{j=0}^{n−1} T^{−j}β) + nH(α|β)

which gives us that

h(T, α) = lim_{n→∞} (1/n) H(∨_{i=0}^{n−1} T^{−i}α) ≤ lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−j}β) + H(α|β) = h(T, β) + H(α|β)

In particular h(T, α) ≤ h(T, α_n) + H(α|α_n) for any countable partition α. Furthermore, for an increasing sequence of partitions {α_n}_{n=1}^{∞} with H(α_n) < ∞ and σ(∪_{n=1}^{∞} α_n) = B, together with an arbitrary partition α with H(α) < ∞, the increasing martingale theorem gives

lim_{n→∞} H(α|α_n) = 0

which means that

h(T, α) ≤ lim_{n→∞} h(T, α_n)

for any countable partition α, so indeed

h(T) = sup_α h(T, α) ≤ lim_{n→∞} h(T, α_n) ≤ h(T)

which means that h(T) = lim_{n→∞} h(T, α_n).

Definition 9.14. Generator: For T an invertible measure preserving transformation of (X, B, µ) we say that a countable partition α is a generator if

∨_{j=−(n−1)}^{n−1} T^{−j}α → B as n → ∞

Definition 9.15.
Strong Generator: For T a measure preserving transformation of (X, B, µ) we say that a countable partition α is a strong generator if

∨_{j=0}^{n−1} T^{−j}α → B as n → ∞

Corollary 9.9. If for a.e. x, y ∈ X there is n such that x, y lie in different elements of the partition ∨_{j=−(n−1)}^{n−1} T^{−j}α, then α is a generator.

Corollary 9.10. If for a.e. x, y ∈ X there is n such that x, y lie in different elements of the partition ∨_{j=0}^{n−1} T^{−j}α, then α is a strong generator.

Theorem 9.4. Sinai's Theorem: If either
• α is a strong generator, or
• T is invertible and α is a generator,
then h(T) = h(T, α)

Proof.
• Suppose α is a strong generator. Then

h(T, ∨_{j=0}^{n} T^{−j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k−1} T^{−i}(∨_{j=0}^{n} T^{−j}α))
= lim_{k→∞} (1/k) H(∨_{i=0}^{n+k−1} T^{−i}α)
= lim_{k→∞} ((n+k)/k) · (1/(n+k)) H(∨_{i=0}^{n+k−1} T^{−i}α)
= h(T, α)

This holds for every n, so applying Abramov's theorem to the increasing sequence α_n := ∨_{j=0}^{n} T^{−j}α (which generates B since α is a strong generator) gives

h(T) = lim_{n→∞} h(T, α_n) = h(T, α)

• Suppose T is invertible and α is a generator. Then

h(T, ∨_{j=−n}^{n} T^{−j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k−1} T^{−i}(∨_{j=−n}^{n} T^{−j}α))
= lim_{k→∞} (1/k) H(∨_{i=−n}^{n+k−1} T^{−i}α)
= lim_{k→∞} (1/k) H(∨_{i=0}^{2n+k−1} T^{−i}α)   (shifting indices by n, T invertible and measure preserving)
= lim_{k→∞} ((2n+k)/k) · (1/(2n+k)) H(∨_{i=0}^{2n+k−1} T^{−i}α)
= h(T, α)

Again this holds for every n, so by Abramov's theorem applied to α_n := ∨_{j=−n}^{n} T^{−j}α we have h(T) = lim_{n→∞} h(T, α_n) = h(T, α).

Theorem 9.5. If T is a measure preserving transformation of (X, B, µ) then
• For k ∈ ℕ₀ we have that h(T^k) = kh(T).
• If T is invertible then h(T^{−1}) = h(T).
• If T is invertible and k ∈ ℤ then h(T^k) = |k|h(T).

Proof.
• For k = 0, T⁰ = Id, so if α is a countable partition with H(α) < ∞ then H(∨_{i=0}^{n−1} Id^{−i}α) = H(α) and hence

h(Id, α) = lim_{n→∞} (1/n) H(α) = 0

so indeed the statement holds for k = 0. Now let k ≥ 1 and choose a countable partition α with H(α) < ∞.
Then

h(T^k, ∨_{j=0}^{k−1} T^{−j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{nk−1} T^{−j}α) = k lim_{n→∞} (1/nk) H(∨_{j=0}^{nk−1} T^{−j}α) = kh(T, α)

Hence

kh(T) = sup_{α : H(α)<∞} kh(T, α) = sup_{α : H(α)<∞} h(T^k, ∨_{j=0}^{k−1} T^{−j}α) ≤ sup_{α : H(α)<∞} h(T^k, α) = h(T^k)

Conversely

h(T^k, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−jk}α) ≤ lim_{n→∞} (1/n) H(∨_{j=0}^{nk−1} T^{−j}α) = k lim_{n→∞} (1/nk) H(∨_{j=0}^{nk−1} T^{−j}α) = kh(T, α) ≤ kh(T)

and taking the supremum over α gives h(T^k) ≤ kh(T). So indeed h(T^k) = kh(T).

• Since T is invertible and measure preserving,

H(∨_{j=0}^{n−1} T^{−j}α) = H(T^{n−1} ∨_{j=0}^{n−1} T^{−j}α) = H(∨_{j=0}^{n−1} T^{j}α)

so

h(T, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{j}α) = h(T^{−1}, α)

Taking the supremum with respect to α then gives h(T) = h(T^{−1}).

• The third point follows directly from the previous two.

Lemma 9.5. The Parry measure µ_Pr of a k × k matrix A with entries 0, 1 and largest eigenvalue λ has entropy h(µ_Pr) = log(λ)

Proof. Let

P_{i,j} = A_{i,j} v_j / (λ v_i),   p_i = u_i v_i / c

where u, v are the left and right eigenvectors respectively and c = Σ_{i=1}^{k} u_i v_i is a normalising constant. P is a stochastic matrix, so

h(µ_Pr) = −Σ_{i,j=1}^{k} p_i P_{i,j} log(P_{i,j})
= −Σ_{i,j=1}^{k} (u_i v_i / c)(A_{i,j} v_j / (λ v_i)) log(A_{i,j} v_j / (λ v_i))
= −Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(A_{i,j}) + Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(λ) + Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc))(log(v_i) − log(v_j))

Since A has entries 0, 1 we always have A_{i,j} log(A_{i,j}) = 0, and the third term vanishes: using Av = λv and uA = λu, both Σ_{i,j} (u_i A_{i,j} v_j/(λc)) log(v_i) and Σ_{i,j} (u_i A_{i,j} v_j/(λc)) log(v_j) equal Σ_i (u_i v_i/c) log(v_i). Hence

h(µ_Pr) = Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(λ) = log(λ) Σ_{j=1}^{k} λ u_j v_j / (λc) = log(λ) (Σ_{j=1}^{k} u_j v_j) / c = log(λ)

In general h(T) = h(S) does not imply that T, S are isomorphic; however, the following two theorems give some cases where this is a sufficient condition.

Theorem 9.6. Ornstein: Two 2-sided Bernoulli shifts with the same entropy are isomorphic.

Theorem 9.7. Ornstein–Friedman: Two aperiodic Markov shifts of finite type with the same entropy are isomorphic.

10 Functional Analysis

Definition 10.1. Banach Space: A Banach space is a complete normed space.
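The norms used in this section can be explored numerically. A minimal sketch (the choice of f and the grid size are illustrative) for f(x) = x on [0, 1) with Lebesgue measure: ||f||₁ = 1/2, while the quotient norm inf_c ||f − c||₁ from Proposition 10.1 below is 1/4, attained at c = 1/2, a median value of f:

```python
# Discretise [0, 1) by midpoints so Riemann sums of piecewise linear
# integrands are exact on each cell.
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]

def l1_distance(c):
    """Riemann sum approximating ||f - c||_1 for f(x) = x."""
    return sum(abs(x - c) for x in grid) / N

norm_f = l1_distance(0.0)                         # ||f||_1 = 1/2
# coarse minimisation of c -> ||f - c||_1 over c in {0, 0.05, ..., 1}
quotient = min(l1_distance(c / 20) for c in range(21))
```

The minimiser is a median because moving c past a median increases the integrand on a set of measure more than 1/2.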
For a probability space (X, B, µ) we denote by L¹(X, B, µ) the space of integrable functions, a Banach space with the norm

||f||₁ := ∫ |f| dµ

Furthermore we denote by L^∞(X, B, µ) the space of essentially bounded functions, a Banach space with the norm

||f||_∞ := inf_{Y⊂X : µ(Y)=1} sup_{y∈Y} |f(y)|

Proposition 10.1. We have the following facts concerning L¹, L^∞:
• L^∞ ⊂ L¹
• Every bounded linear functional W : L¹ → ℝ can be written as W(f) = ∫ fg dµ for some g ∈ L^∞
• L¹₀ := {f ∈ L¹ : ∫ f dµ = 0} is a Banach space with respect to the norm ||f|| = inf_{c∈ℝ} ||f − c||₁

Lemma 10.1. If E is a Banach space and F ⊂ E is a proper closed subspace then there exists a non-zero bounded linear functional W : E → ℝ with W(f) = 0 ∀f ∈ F

Lemma 10.2. Let m, p ∈ ℕ with 1 ≤ m ≤ p and x_1, ..., x_p ∈ ℝ⁺. For ε > 0 let

S_ε := {i ∈ {1, ..., p−m} : ∃n ∈ ℕ ∩ [1, m] s.t. nε ≤ Σ_{j=i}^{i+n−1} x_j}

Then

Σ_{j=1}^{p} x_j ≥ ε Σ_{i=1}^{p−m} χ_{S_ε}(i)

Proof. If S_ε = ∅ then since x_i ≥ 0 ∀i the statement is trivial, so suppose S_ε ≠ ∅. Let j_1 = min{i ∈ S_ε}, which exists since S_ε is finite and non-empty, and let n_1 be the least value for which n_1ε ≤ Σ_{i=j_1}^{j_1+n_1−1} x_i. Then inductively define j_r := min{j ∈ S_ε : j > j_{r−1} + n_{r−1} − 1} (so long as the right-hand side is non-empty) and n_r to be the least value for which n_rε ≤ Σ_{i=j_r}^{j_r+n_r−1} x_i. This yields the finite collections {j_r}_{r=1}^{k}, {n_r}_{r=1}^{k} and

Σ_{j=1}^{p} x_j ≥ Σ_{r=1}^{k} Σ_{i=j_r}^{j_r+n_r−1} x_i ≥ Σ_{r=1}^{k} n_rε ≥ ε Σ_{r=1}^{k} Σ_{i=j_r}^{j_r+n_r−1} χ_{S_ε}(i) = ε Σ_{i=1}^{p−m} χ_{S_ε}(i)

since any index not in one of the strings [j_r, j_r + n_r − 1] cannot lie in S_ε.

For ε > 0 and f ∈ L¹ define

E_ε(f) := {x ∈ X : limsup_{n→∞} (1/n) |Σ_{j=0}^{n−1} f(T^j x)| ≥ ε}

Lemma 10.3. µ(E_{2ε}(f)) ≤ ||f||₁ / ε

Proof.
Write f = f⁺ − f⁻ where f⁺, f⁻ ≥ 0, and for m ≥ 1 define

E_ε^m(f⁺) = {x ∈ X : ∃n ≤ m, Σ_{j=0}^{n−1} f⁺(T^j x) ≥ εn}
E_ε^m(f⁻) = {x ∈ X : ∃n ≤ m, Σ_{j=0}^{n−1} f⁻(T^j x) ≥ εn}

Then by applying Lemma 10.2 with x_j = f⁺(T^{j−1}x), so that S_ε consists of the indices i with T^{i−1}x ∈ E_ε^m(f⁺), we have for p > m

Σ_{j=0}^{p−1} f⁺(T^j x) ≥ ε Σ_{i=0}^{p−m−1} χ_{E_ε^m(f⁺)}(T^i x)

and similarly for f⁻ we get

Σ_{j=0}^{p−1} f⁻(T^j x) ≥ ε Σ_{i=0}^{p−m−1} χ_{E_ε^m(f⁻)}(T^i x)

It follows that

p ∫ f⁺ dµ = ∫ Σ_{j=0}^{p−1} f⁺(T^j x) dµ   (T preserves µ)
≥ ε Σ_{i=0}^{p−m−1} ∫ χ_{E_ε^m(f⁺)}(T^i x) dµ = ε(p−m) µ(E_ε^m(f⁺))

Similarly we have that p ∫ f⁻ dµ ≥ ε(p−m) µ(E_ε^m(f⁻)). Dividing by p and taking the limit as p → ∞ we get

∫ f⁺ dµ ≥ ε µ(E_ε^m(f⁺))   and   ∫ f⁻ dµ ≥ ε µ(E_ε^m(f⁻))

Furthermore E_{2ε}(f) ⊂ ∪_{m} (E_ε^m(f⁺) ∪ E_ε^m(f⁻)), and the sets E_ε^m increase with m, so

µ(E_{2ε}(f)) ≤ lim_{m→∞} µ(E_ε^m(f⁺)) + lim_{m→∞} µ(E_ε^m(f⁻)) ≤ (1/ε)(∫ f⁺ dµ + ∫ f⁻ dµ) = ||f||₁ / ε

Lemma 10.4. Let T be ergodic. Given f ∈ L¹₀ and δ > 0 we have that ∃h ∈ L^∞ s.t. ||f − (h ∘ T − h)||₁ < δ

Proof. Let C = {h ∘ T − h : h ∈ L^∞} ⊂ L¹₀, which is a vector subspace. We want to show that C is dense in L¹₀; by Lemma 10.1 it suffices to show that any bounded linear functional which vanishes on C also vanishes on L¹₀ (otherwise the closure of C would be a proper closed subspace). We know that any bounded linear functional W on L¹ can be written as

W(f) = ∫ fg dµ

for some g ∈ L^∞. Suppose W vanishes on C; then ∀h ∈ L^∞ we have that

∫ (h ∘ T − h) g dµ = 0

In particular, taking h = g,

∫ (g ∘ T − g) g dµ = 0,   i.e.   ∫ (g ∘ T) g dµ = ∫ g² dµ

Furthermore, since T preserves µ we also have ∫ (g ∘ T)² dµ = ∫ g² dµ, so

∫ (g ∘ T − g)² dµ = ∫ (g ∘ T)² dµ + ∫ g² dµ − 2 ∫ (g ∘ T) g dµ = 2 ∫ g² dµ − 2 ∫ (g ∘ T) g dµ = 0

so we must have g ∘ T − g = 0 almost everywhere, hence since T is ergodic g is some constant c. For f ∈ L¹₀ we then have

W(f) = ∫ fg dµ = c ∫ f dµ = 0

so indeed W vanishes on L¹₀, C is dense in L¹₀, and the result follows.

Theorem 10.1.
Birkhoff's Ergodic Theorem: If T : X → X is an ergodic measure preserving transformation of the probability space (X, B, µ) then for any f ∈ L¹(X, B, µ) we have, for almost every x ∈ X,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ

Proof. Firstly suppose that h ∈ L^∞. Then h ∘ T − h ∈ L¹₀, since

∫ (h ∘ T − h) dµ = ∫ h ∘ T dµ − ∫ h dµ = ∫ h dµ − ∫ h dµ = 0

Furthermore

|(1/n) Σ_{j=0}^{n−1} (h ∘ T − h)(T^j x)| = |(1/n) Σ_{j=0}^{n−1} (h(T^{j+1}x) − h(T^j x))| = (1/n)|h(T^n x) − h(x)| ≤ (1/n)(|h(T^n x)| + |h(x)|) ≤ 2||h||_∞ / n

which converges to 0 as n → ∞. So the theorem holds for coboundaries h ∘ T − h.

We need to extend this to f ∈ L¹₀, for which we want to show

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = 0   a.e.

Fix δ > 0; by the previous lemma we can find h ∈ L^∞ with ||f − (h ∘ T − h)||₁ < δ, so ∀ε > 0 we have that

E_ε(f) ⊂ E_{ε/2}(f − (h ∘ T − h)) ∪ E_{ε/2}(h ∘ T − h)

and hence

µ(E_ε(f)) ≤ µ(E_{ε/2}(f − (h ∘ T − h))) + µ(E_{ε/2}(h ∘ T − h))
= µ(E_{ε/2}(f − (h ∘ T − h)))   (by the earlier part of the proof)
≤ 4||f − (h ∘ T − h)||₁ / ε   (by Lemma 10.3)
≤ 4δ / ε

Since δ was chosen arbitrarily it follows that µ(E_ε(f)) = 0 for every ε > 0, so the averages converge to 0 µ-a.e. and the theorem holds for f ∈ L¹₀.

Finally, for g ∈ L¹ write f = g − ∫ g dµ. The theorem holds for f since f ∈ L¹₀, and by rearrangement the theorem holds for g.
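As a closing illustration, Birkhoff's theorem (via Corollary 8.3) can be tested numerically for the continued fraction map T(x) = 1/x mod 1, which is ergodic for the Gauss measure of Corollary 6.4. The frequency of the partial quotient 1, i.e. of visits to B = (1/2, 1], should approach µ(B) = log(4/3)/log 2 ≈ 0.415. The starting point and iterate count are illustrative, and floating point round-off makes the orbit only pseudo-typical, though the statistics are robust in practice:

```python
import math

x = math.pi - 3                  # a "typical" starting point in (0, 1)
n = 200_000
hits = 0
for _ in range(n):
    if x > 0.5:                  # partial quotient floor(1/x) equals 1
        hits += 1
    x = 1.0 / x
    x -= int(x)                  # Gauss map T(x) = 1/x mod 1
    if x == 0.0:                 # guard: restart if the orbit hits a rational
        x = 0.5772156649
freq = hits / n

expected = math.log(4 / 3) / math.log(2)   # Gauss measure of (1/2, 1]
```

Runs of this sketch typically give a frequency within a percent or so of the predicted value, a concrete instance of time averages matching space averages.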