Ergodic Theory

March 14, 2013

Contents

1  Uniform Distribution
   1.1  Generalisation To Higher Dimension
   1.2  Generalisation To Polynomials
2  Dynamical Systems
   2.1  Subshifts of Finite Type
3  Measure Theory
4  Measures On Compact Metric Spaces
5  Measure Preserving Transformations
6  Ergodicity
7  Recurrence and Unique Ergodicity
8  Birkhoff's Ergodic Theorem
9  Entropy
10 Functional Analysis

1  Uniform Distribution

Definition 1.1. Orbit: For a space X and a transformation T : X → X the orbit of x ∈ X is defined as

    O_x := {T^n x}_{n=0}^∞

Definition 1.2. Fixed Point: For a space X and a transformation T : X → X we say that x ∈ X is a fixed point of T if T x = x

Definition 1.3. Periodic Point: For a space X and a transformation T : X → X we say that x ∈ X is a periodic point of T if T^n x = x for some n ∈ N

Definition 1.4. Indicator Function: For a set A ⊆ X we denote the indicator function

    χ_A(x) = 1 if x ∈ A,  0 if x ∉ A

Corollary 1.1. With this definition, the frequency with which the orbit of x lies in A is

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_A(T^j x)

If we say that a property holds for a typical x ∈ X we mean that the property holds almost everywhere with respect to the measure on the space X.

Definition 1.5. Preserves: We say that T preserves µ if for every measurable A ⊆ X we have that µ(T^{−1}A) = µ(A)

For x ∈ R we denote ⌊x⌋ := max{m ∈ Z : m ≤ x}, the integer part of x, and {x} := x − ⌊x⌋, the fractional part.

Definition 1.6. Uniformly Distributed: We say that a sequence {x_n}_{n=0}^∞ is uniformly distributed mod 1 if for every 0 ≤ a < b < 1 we have that

    lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = b − a

Lemma 1.1. If {x_n}_{n=0}^∞ is uniformly distributed mod 1 then its fractional parts are dense in [0, 1).

Proof. Suppose for contradiction that the fractional parts of {x_n}_{n=0}^∞ are not dense.
Then there exist 0 ≤ a < b < 1 such that no {x_n} lies in [a, b]. So {j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = ∅ for all n, and hence

    lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, {x_j} ∈ [a, b]} = 0 ≠ b − a

which contradicts the uniform distribution of {x_n}_{n=0}^∞.

Lemma 1.2. If log_10(m) ∈ R \ Q then the frequency with which the leading digit of the sequence {m^n}_{n=1}^∞ equals r ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} is log_10((r+1)/r).

Proof. The leading digit of m^n is r iff r·10^l ≤ m^n < (r+1)·10^l for some l ∈ N. Taking logarithms base 10 of all sides of this inequality gives

    log_10(r) + l ≤ n log_10(m) < log_10(r+1) + l

i.e. the leading digit is r iff {n log_10(m)} ∈ [log_10(r), log_10(r+1)). By our assumption log_10(m) is irrational, hence the sequence n log_10(m) is uniformly distributed mod 1 (Lemma 1.3 below), so we have that

    lim_{n→∞} (1/n) #{j ∈ [0, n−1] : {j log_10(m)} ∈ [log_10(r), log_10(r+1))} = log_10(r+1) − log_10(r) = log_10((r+1)/r)

Theorem 1.1. Weyl's Criterion: The following are equivalent:

• {x_n}_{n=0}^∞ is uniformly distributed mod 1
• For each l ∈ Z \ {0} we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πil x_j} = 0

Proof.

• Firstly we will prove that uniform distribution mod 1 implies the exponential averages tend to 0. WLOG we can assume that x_n ∈ [0, 1), since e^{2πil x_j} = e^{2πil {x_j}}. Suppose {x_n}_{n=0}^∞ is uniformly distributed mod 1. Then for every [a, b] ⊂ [0, 1) we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) = b − a = ∫ χ_[a,b](x) dx

From this we can deduce that for a step function g we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} g(x_j) = ∫ g(x) dx

Let f be a continuous function on [0, 1) and let ε > 0; we can find a step function g s.t. ||f − g||_∞ < ε. Since {x_j}_{j=0}^∞ is uniformly distributed and g is a step function we can find n sufficiently large such that
n−1 Z 1 1 X g(xj ) − g(x)dx < ε n 0 j=0 So we have that: n−1 n−1 Z 1 n−1 Z 1 Z 1 Z 1 1 X 1 X 1 X f (xj ) − f (x)dx ≤ (f (xj ) − g(xj )) + g(xj ) − g(x)dx + g(x)dx − f (x)dx n 0 0 0 0 n j=0 n j=0 j=0 n−1 Z 1 1 X ε + ε + < εdx 0 n j=0 = 3ε 3 Since ε > 0 was chosen arbitrarily we have that n−1 Z 1 1 X f (xj ) − f (x)dx = 0 lim n→∞ n 0 j=0 Moreover Z 1 e2πilx dx = 0 0 hence we have the required result. • Now we will prove the reverse implication. Suppose that for each l ∈ Z \ {0} we have that n−1 1 X 2πilxj lim e =0 n→∞ n j=0 Then for trigonometric polynomial g(x) = n X αk e2πilk x k=1 we have that Z 1 n−1 1X g(xj ) = g(x)dx lim n→∞ n 0 j=0 Let f be a continuous function on [0, 1] with f (0) = f (1) and fix ε > 0. We can find a trigonometric polynomial g s.t. ||g − f ||∞ < ε and as in the previous part of the proof w can see that Z 1 n−1 X lim f (xj ) = f (x)dx n→∞ 0 j=0 If we take [a, b] ⊂ [0, 1) then we can find f1 , f2 continuous functions s.t. f1 ≤ χ[a,b] ≤ f2 where f1 (0) = f1 (1), f2 (0), f2 (1) and Z 1 f2 (x) − f1 (x)dx < ε 0 This gives us that: lim inf n→∞ n−1 n−1 1X 1X χ[a,b] (x) ≥ lim inf f1 (xj ) n j=0 n j=0 Z 1 = f1 (x)dx 0 Z 1 ≥ f2 (x)dx − ε 0 1 Z ≥ χ[a,b] (x)dx − ε 0 lim sup n→∞ n−1 n−1 1X 1X χ[a,b] (x) ≤ lim inf f2 (xj ) n j=0 n j=0 Z 1 = f2 (x)dx 0 Z 1 ≤ f1 (x)dx + ε 0 Z ≤ χ[a,b] (x)dx + ε 0 4 1 But since ε > 0 was chosen arbitrarily we have that Z 1 n−1 1X χ[a,b] (x)dx = b − a χ[a,b] (x) = lim n→∞ n 0 j=0 hence indeed {xn }∞ n=0 are uniformly distributed. Lemma 1.3. The sequence xn = αn is uniformly distributed mod 1 for α ∈ R \ Q and not uniformly distributed for α ∈ Q Proof. We shall split the proof into the two cases: • α∈Q We can write α = p/q for p, q ∈ Z, q > 0 where p, q are coprime. n oq−1 The sequence xn then only takes values np of which there are finitely many hence the set q n=0 cannot be dense and therefore xn is not uniformly distributed. 
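Between the two cases of this proof, a numerical check is easy to run. The sketch below (the helper names `weyl_sum` and `interval_freq` are my own, not from the notes) shows the Weyl averages staying at 1 for a rational α with a resonant frequency l, while for an irrational α both the Weyl average and the empirical interval frequencies behave as the criterion predicts:

```python
import cmath
import math

def weyl_sum(alpha, l, n):
    # |(1/n) * sum_{k=0}^{n-1} e^{2*pi*i*l*k*alpha}|: by Weyl's criterion this
    # tends to 0 for every l != 0 exactly when (n*alpha) is u.d. mod 1.
    total = sum(cmath.exp(2j * math.pi * l * k * alpha) for k in range(n))
    return abs(total) / n

def interval_freq(alpha, a, b, n):
    # Empirical frequency of the fractional parts {k*alpha} landing in [a, b).
    return sum(1 for k in range(n) if a <= (k * alpha) % 1.0 < b) / n
```

For α = 1/2 and l = 2 every summand e^{2πi·2·k·(1/2)} equals 1, so the average stays at 1 (the rational case fails the criterion), whereas for α = √2 the averages vanish and `interval_freq(2 ** 0.5, a, b, n)` approaches b − a.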
• α ∈ R \ Q
Let l ∈ Z \ {0}; then lα ∉ Z, so e^{2πilα} ≠ 1. Summing the geometric series gives

    (1/n) Σ_{j=0}^{n−1} e^{2πiljα} = (1/n) · (1 − e^{2πilnα}) / (1 − e^{2πilα})

so we have that

    lim_{n→∞} |(1/n) Σ_{j=0}^{n−1} e^{2πiljα}| ≤ lim_{n→∞} (1/n) · 2/|1 − e^{2πilα}| = 0

Hence by Weyl's criterion we have that x_n is uniformly distributed mod 1.

Corollary 1.2. We have that x_n = nα + β is uniformly distributed mod 1 iff α ∈ R \ Q

Proof. We shall split the proof into the two cases:

• α ∈ Q
We can write α = p/q for p, q ∈ Z, q > 0, where p, q are coprime. The sequence x_n then only takes the values {np/q + β mod 1}_{n=0}^{q−1}, of which there are finitely many, hence the set cannot be dense and therefore x_n is not uniformly distributed.

• α ∈ R \ Q
Let l ∈ Z \ {0}; then

    (1/n) Σ_{j=0}^{n−1} e^{2πil(jα+β)} = e^{2πilβ} · (1/n) Σ_{j=0}^{n−1} e^{2πiljα}

and since |e^{2πilβ}| = 1, by the previous lemma we have the same convergence result.

1.1  Generalisation To Higher Dimension

For this subsection we will consider sequences {x_n}_{n=0}^∞ with x_n = (x_n^(1), ..., x_n^(k)) ∈ R^k.

Definition 1.7. Uniformly Distributed: A sequence {x_n}_{n=0}^∞ ⊂ R^k is uniformly distributed mod 1 if for each choice of k intervals {[a_s, b_s]}_{s=1}^k we have that

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} Π_{s=1}^k χ_[a_s,b_s](x_j^(s)) = Π_{s=1}^k (b_s − a_s)

Theorem 1.2. Multi-Dimensional Weyl's Criterion: The sequence {x_n}_{n=0}^∞ ⊂ R^k is u.d. mod 1 iff for every l ∈ Z^k \ {0}

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πi Σ_{s=1}^k l_s x_j^(s)} = 0

Definition 1.8. Rationally Independent: The numbers {λ_i}_{i=1}^k are rationally independent if whenever {r_s}_{s=1}^k ⊂ Q satisfies Σ_{s=1}^k r_s λ_s = 0 we must have that r_s = 0 ∀s

Theorem 1.3. x_n^(j) = nα_j is uniformly distributed mod 1 iff {α_s}_{s=1}^k, 1 are rationally independent.

Proof.
Suppose {αs }ks=1 , 1 are rationally independent then for any l ∈ Zk \ {0} we have that k X ls αs ∈ /Z s=1 So we have that e2πi Pk s=1 ls nαs 6= 1 Hence P n−1 2πi k 1 X 2πi Pk l jα s=1 ls nαs 1 1 − e s s=1 s e = n 1 − e2πi Pks=1 ls αs n j=0 ≤ 1 2 P n 1 − e2πi ks=1 ls αs 1 2 Pk =0 2πi n→∞ n 1 − e s=1 ls αs lim hence by Weyl’s criterion we have that xn is uniformly distributed mod 1. Now suppose that {αs }ks=1 , 1 are not rationally independent then for some l ∈ Zk \ {0} we have that k X ls αs ∈ Z s=1 hence we have that e2πi So we have that Pk s=1 ls nαs = 1 ∀n ∈ N n 1 X 2πi Pks=1 ls nαs e =1 n→∞ n j=0 lim so by Weyl’s criterion xn is not uniformly distributed mod 1. 6 1.2 Generalisation To Polynomials For this subsection we will write p(n) = k X αs ns s=0 Lemma 1.4. Van-der Corput’s Inequality Let {zj }n−1 j=0 ∈ C and let 1 ≤ m ≤ n − 1 then n−1 2 n−1−j n−1 n−1 X X X X zj ≤ m(n + m − 1) |zj |2 + 2(n + m − 1)< m2 (m − j) zs+j z s j=0 s=0 j=0 j=1 (m) For a sequence {xn }∞ n=0 let xn := xn+m − xn (m) Lemma 1.5. Let {xn }∞ n=0 ∈ R be a sequence, if for each m ≥ 1 we have that xn xn is u.d. mod 1. is u.d. mod 1 then Proof. We need to show that for any l ∈ Z \ {0} we have that lim n→∞ n−1 X e2πilxj = 0 j=0 Let zj := e2πilxj then |zj | = 1. For 1 ≤ m ≤ n we have that by Van-der Corbut’s inequality: 2 n−1 n−1−j m−1 m 2(n + m − 1) X m − j X 2πil(xs+j −xs ) m2 X 2πilxj e < e ≤ n2 (n + m − 1)n + n2 j=0 n n s=0 j=1 n−1−j m−1 m 2(n + m − 1) X 1 X 2πil(x(s) ) j = (n + m − 1) + < (m − j) e n n n s=0 j=1 We have that n−1−j 1 X 2πil(x(s) j ) = 0 e n→∞ n s=0 lim by Weyl’s criterion hence n−1−j m−1 1 X 2πil(x(s) 2(n + m − 1) X j ) = 0 < (m − j) e lim n→∞ n n s=0 j=1 Hence 2 m m(n + m − 1) limsupn→∞ 2 e2πilxj ≤ limsupn→∞ n j=0 n 2 n−1 X =m n−1 1 X 2πilx 1 j limsupn→∞ e ≤ √m n j=0 7 which holds ∀m ≥ 1 so we can choose m arbitrarily large then n−1 1 X 2πilxj =0 e n→∞ n j=0 lim hence by Weyl’s criterion we have that xn is u.d. mod 1 Theorem 1.4. 
If αr ∈ R \ Q for any αr ∈ {αs }ks=0 then p(n) is u.d. mod 1. Proof. Suppose αk ∈ R \ Q. We want to show this inductively so let ∆(k) be the event that any polynomial with irrational leading coefficient of degree k is u.d. mod 1. From corollary 1.2 we have that ∆(1) holds so suppose for some k ≥ 2 we have that ∆(k − 1) holds. Let k X p(n) = αi ni : αk ∈ R \ Q i=0 Foor any m ≥ 1 we have that p(m + n) − p(n) = k X αi (n + m)i − i=0 = = = k X k X αi ni i=0 αi i X i j i=0 j=0 k−1 X i X i αi j i=0 j=0 i X i i=0 j=0 k−1 X i X i−j − n m k X αi ni i=0 k−1 X αi j j i−j n m + αk k X k j=0 j nj mi−j + αk k−1 X j=0 k−1 X j j n m k−j − k X αi ni i=0 k nj mk−j j + αk nk − k X αi ni i=0 k−1 X i k nj mi−j + αk nj mk−j − αi ni j j i=0 j=0 j=0 i=0 k = αk−1 nk−1 + αk mnk−1 − αk−1 nk−1 + q(k − 2) k−1 = αi Where q(k − 2) is some polynomial of degree k − 2 and hence p(m + n) − p(n) = αk (k − 1)mnk−1 + q(k − 2) which is a polynomial of degree k − 1 with an irrational leading coefficient hence by ∆(k − 1) we have that p(m + n) − p(n) is u.d. mod 1 for any m ≥ 1 and therefore by lemma 1.5 we have that p(n) is u.d. mod 1. So by induction we have that ∆(k) holds for any k ∈ N 2 Dynamical Systems Definition 2.1. Circle We write S := {x + Z : x ∈ R} to we the circle which is an equivalence class over the real line. Definition 2.2. Rotation: T : S → S defined as T (x) = x + α is a rotation of degree α on the circle. Lemma 2.1. If α ∈ Q then a rotation of degree α has every point periodic and if α ∈ R \ Q there are no periodic points. 8 Proof. Suppose α = p/q for p, q ∈ Z, q > 0 coprime. T q (x) = x + qα mod 1 = x + p mod 1 =x so any point x is periodic. If α ∈ R \ Q then the sequence nα + x is u.d. mod 1 hence every orbit is dense and therefore there cannot be any periodic points. Definition 2.3. 
Cylinder Set: For a function T : S → S we denote the cylinder set of the sequence {xi }ni=0 to be I(x0 , ..., xn ) = {0 ∈ S : T k (x) ∈ Cxk ∀k ∈ [0, n]} where ( Ci = [0, 1/2) [1/2, 1) i=0 i=1 Lemma 2.2. The following statements are true of cylinder sets: • If x ∈ [0, 1) has associated sequence {xn }∞ n=0 then ∞ \ I(x0 , ..., xn ) = {x} n=0 • The set of cylinder sets of rank n form a partition for any n. Proposition 2.1. For the doubling map T : S → S defined as T (x) = 2x mod 1 we have that: • There are 2n periodic points of period n. • The periodic points are dense. • There exists a dense orbit. Proof. • Notice that T n (x) = 2n x mod 1. Suppose T n (x) = x mod 1. This happens iff 2n x = x + p for some p ∈ Z. Which is equivalent to saying x = 2np−1 . Hence each choice of p ∈ [0, 2n − 1) ∩ Z gives a distinct periodic point of which there are precisely 2n − 1. • Let y ∈ [0, 1) and ε > 0. We want to find some periodic point x ∈ (y − ε, y + ε) so find n sufficiently large such that ε > (2n − 1)−1 . Notice that x = 2np−1 for p = 0, ..., 2n − 2 are periodic points distributed evenly with distance (2n − 1)−1 between consequtive values hence clearly some periodic point x must lie in the ball of radius ε around y. • For any x ∈ [0, 1) associate the sequence ( 0 xn := 1 T n (x) ∈ [0, 1/2) T n (x) ∈ [1/2, 1) 9 Now suppose x̃ = Then because ∞ X xn 2n+1 n=0 ∞ X 1 =1 n+1 2 n=0 we have that for almost every sequence xn that x̃ ∈ [0, 1/2) iff x0 = 0. Moreover T (x̃) = 2x̃ mod 1 ∞ X 2xn = mod 1 n+1 2 n=0 = x0 + = ∞ X xn+1 mod 1 2n+1 n=0 ∞ X xn+1 mod 1 2n+1 n=0 so we have that T (x̃) ∈ [0, 1/2) iff x1 = 0. Furthermore we can see by iterating this that T n (x̃) ∈ [0, 1/2) iff xn = 0. So we must have that almost every x can be written x= ∞ X xn 2n+1 n=0 where xn is the sequence associated with x. Since cylinder sets are dense and for the doubling map any cylinder of rank n is an interval of width 2−n it suffices to find a point x ∈ [0, 1) such that T n (x) visits every cylinder. 
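As a sanity check on the proposition above: the proof exhibits exactly 2^n − 1 points of period n, namely p/(2^n − 1), and encodes each point by its 0/1 itinerary through the sets [0, 1/2) and [1/2, 1). A short sketch in exact rational arithmetic (the helper names are my own, not from the notes):

```python
from fractions import Fraction

def doubling(x):
    # One step of the doubling map T(x) = 2x mod 1, in exact rational arithmetic.
    return (2 * x) % 1

def itinerary(x, n):
    # The associated 0/1 sequence: digit j is 0 if T^j(x) lies in [0, 1/2), else 1.
    digits = []
    for _ in range(n):
        digits.append(0 if x < Fraction(1, 2) else 1)
        x = doubling(x)
    return digits

def is_period_n(x, n):
    # Check whether T^n(x) = x, i.e. x is a fixed point of T^n.
    y = x
    for _ in range(n):
        y = doubling(y)
    return y == x
```

For example `itinerary(Fraction(3, 8), 4)` returns the binary digits of 3/8 = 0.0110..., and every point p/(2^5 − 1) with 0 ≤ p < 31 satisfies `is_period_n(x, 5)`, giving the 2^5 − 1 points of period 5 found in the proof.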
If we order the cylinders 0, 1, 00, 01, 10, 11, 000, ... then we cand define x to be the the point with binary expansion x = 0100011011000... in this way T 0 (x) ∈ [0, 1/2), T 1 (x) ∈ [1/2, 1), T 2 (x) ∈ [0, 1/4), T 4 (x) ∈ [1/4, 1/2), T 6 (x) ∈ [1/2, 3/4), ... and so on so indeed the iterates {T n (x)}∞ n=0 visit every cylinder. 2.1 Subshifts of Finite Type Definition 2.4. One-Sided Shift Space: For S = {0, ..., k} let A be a k × k matrix with entries {0, 1}. The set of one-sided shifts of finite type generated by A is defined as ∞ Σ+ A := {{xj }j=0 : Axj ,xj+1 = 1 ∀j} Definition 2.5. Two-Sided Shift Space: For S = {0, ..., k} let A be a k × k matrix with entries {0, 1}. The set of one-sided shifts of finite type generated by A is defined as ΣA := {{xj }j∈Z : Axj ,xj+1 = 1 ∀j} Definition 2.6. One Sided Shift: + + The one sided shift is σ + : Σ+ A → ΣA defined as σ (x)i = xi+1 Definition 2.7. Two Sided Shift: The two sided shift is σ : ΣA → ΣA defined as σ(x)i = xi+1 10 Definition 2.8. Irreducible: A k × k matrix A of zeros and ones is called irreducible if for any pair i, j ∈ S we can find some n ∈ N s.t. (An )i,j > 0 Notice that for n ∈ N, i, j ∈ S we have that (An )i,j is the number of paths from i → j in n steps in the graph with vertices S and directed edges i → j iff Ai,j = 1. Definition 2.9. Aperiodic: A k × k matrix A of zeros and ones is called aperiodic if ∃n ∈ N such that for any pair i, j ∈ S we have that (An )i,j > 0. In the graphical representation this says that starting from any vertex we can get to any other vertex in exactly n steps. Definition 2.10. Two Sided Cylinder: A two sided cylinder for a partial sequence {yi }ni=m of ΣA is an open set: [ym , ..., yn ]m,n := {x ∈ ΣA : xj = yj ∀m ≤ j ≤ n} Definition 2.11. One Sided Cylinder: A one sided cylinder for a partial sequence {yi }ni=m of Σ+ A is an open set: [ym , ..., yn ]m,n := {x ∈ ΣA : xj = yj ∀m ≤ j ≤ n} Lemma 2.3. 
A shift space along with the metric d(x, y) := 2−min{|n|:xn 6=yn } is a metric space. Proof. • d(x, y) = 0 ⇐⇒ x = y Notice that d(x, y) = 0 ⇐⇒ −min{|n| : xn 6= yn } = ∞ ⇐⇒ x = y so this clearly holds. • d(x, y) = d(y, x) Clearly min{|n| : xn 6= yn } = min{|n| : yn 6= xn } so indeed this holds. • d(x, y) + d(y, z) ≥ d(x, z) n0 := min{|n| : xn 6= zn } ≥ min{min{|n| : xn 6= yn }, min{|n| : yn 6= zn }} =: m since otherwise xn0 = yn0 = zn0 which is a contradiction. This gives us that 0 d(x, z) = 2−n ≤ 2−m ≤ 2−min{|n|:xn 6=yn } + 2−min{|n|:yn 6=zn } = d(x, y) + d(y, z) Theorem 2.1. For shift space Σ+ A we have that • Σ+ A is compact. • σ + is continuous. Proof. + • If Σ+ A = φ then we are done so assume that ΣA is non-empty. + (m) ∞ Let {x }m=1 ∈ ΣA be a sequence of elements of Σ+ A. Since the cylinders of a given degree form a disjoint countable partition we have that Σ+ A = k [ [i]0,0 i=1 11 Since there are finitely many in this union we must have that ∃i0 ∈ [1, k] such that there are infinitely many elements of x(m) inside [i0 ]0,0 . Denote these the subsequence x(m0 ) . Furthermore we can write k [ [i0 ]0,0 = [i0 , i]0,1 i=1 similarly ∃i1 ∈ [1, k] s.t. there are infinitely many elements of x(m0 ) in [i0 , i1 ]0,1 . Denote these the subsequence x(m1 ) . (m) Inductively we have that ∃{ik }∞ inside k=0 ∈ [1, k] s.t. there are infinitely many elements of x [i0 , ..., ik ]0,k for any k. (m ) We have some element xk k ∈ [i0 , ..., ik ]0,k for every k and since y ∈ [i0 , ..., ik ]0,k we must have (m ) that d(xk k , y) ≤ 2−k so indeed we have a convergent subsequence and therefore the space is sequentially compact and therefore compact. • Let ε > 0 then find n large such that 2−n < ε. Choose δ = 2−(n+1) then d(x, y) < δ =⇒ y ∈ [x0 , ..., xn+1 ]0,n+1 This gives us that σ + (y) ∈ [x1 , ..., xn+1 ]0,n So indeed d(σ + (x), σ + (y)) < 2−n < ε. Definition 2.12. 
Continued Fraction Map: T : [0, 1) → [0, 1) defined by

    T(x) = 0 if x = 0,  T(x) = 1/x mod 1 if x ≠ 0

is called the continued fraction map.

Definition 2.13. Continued Fraction Expansion: If x ∈ (0, 1) then the continued fraction expansion of x is

    x = 1/(x_0 + 1/(x_1 + 1/(x_2 + ...)))

where the digits {x_i}_{i=0}^∞ lie in N ∪ {∞} (a digit equal to ∞ truncates the expansion).

Lemma 2.4. x ∈ (0, 1) has a finite continued fraction expansion iff x ∈ Q.

Lemma 2.5. If x ∈ (0, 1) \ Q then x has a unique continued fraction expansion.

Lemma 2.6. If T is the continued fraction map and x has the continued fraction expansion with digits {x_i}_{i=0}^∞ then

    x_i = ⌊1/(T^i x)⌋

Proof. We argue inductively. For the base case,

    1/x = x_0 + 1/(x_1 + 1/(x_2 + ...))

where x_0 ∈ N and 1/(x_1 + 1/(x_2 + ...)) ∈ [0, 1), so indeed ⌊1/x⌋ = x_0. Assume that for all i ≤ n we have x_i = ⌊1/(T^i x)⌋. Then

    1/(T^{n+1} x) = x_{n+1} + 1/(x_{n+2} + ...)

where the remaining continued fraction is again less than one and x_{n+1} ∈ N, so indeed ⌊1/(T^{n+1} x)⌋ = x_{n+1}. So by induction we have the required result.

Definition 2.14. Linear Toral Endomorphism: If A is a k × k matrix with entries in Z such that det(A) ≠ 0 then A defines a linear map of R^k, and T_A : R^k/Z^k → R^k/Z^k defined as T_A x = Ax mod 1 is called a linear toral endomorphism.

Lemma 2.7. The linear toral endomorphism is well defined.

Proof. Suppose x, y ∈ R^k are such that x = y + n for some integer vector n, so that x, y are in the same equivalence class of R^k/Z^k. Then

    Ax = A(y + n) = Ay + An = Ay mod 1

since n is an integer vector and A has integer entries, so An is an integer vector.

Definition 2.15. Linear Toral Automorphism: A linear toral endomorphism T_A is a linear toral automorphism if det(A) = ±1

Lemma 2.8. If T_A is a linear toral automorphism then T_A^{−1} = T_{A^{−1}}

Definition 2.16. Hyperbolic Toral Automorphism: A linear toral automorphism T_A is a hyperbolic toral automorphism if A doesn't have eigenvalues of modulus 1.

Proposition 2.2. Let T_A be a hyperbolic toral automorphism of R²/Z². Then Q²/Z² is the set of all periodic points of T_A.

Proof.
Suppose (x1 , x2 ) = pq1 , pq2 where 0 ≤ p1 , p2 < q are integers. We have that ! (n) (n) p1 p2 n TA (x1 , x2 ) = , q q (n) (n) where 0 ≤ p1 , p2 < q are integers representing the transformed points. Notice that q remains unchanged since we are always multiplying by intergers. Moreover there are at most q possible choices (n) (n) (n) for pi and hence q 2 possible distinct combinations (p1 , p2 ) so we must have that there are some n > m ≥ 0 such that TAn (x1 , x2 ) = TAm (x1 , x2 ) But since TA is invertible and TA−1 = TA−1 we have that TAn−m (x1 , x2 ) = (x1 , x2 ) so indeed (x1 , x2 ) are periodic and since (x1 , x2 ) was chosen arbitrarily all rational points are periodic. Suppose (x1 , x2 ) is periodic then ∃n s.t. T n (x1 , x2 ) = (x1 , x2 ). 13 This is equivalent to saying An (x1 , x2 ) = (x1 , x2 ) + (n1 , n2 ) for ni ∈ Z. This gives us that (An − I)(x1 , x2 ) = (n1 , n2 ). Now since TA doesn’t have any eigenvalues with modulus one we have that 1 is not an eigenvalue of A (and therefore An ) so An − I is invertible. Therefore (x1 , x2 ) = (An − I)−1 (n1 , n2 ) But (An − I)−1 is a rational matrix and (n1 , n2 ) is an integer vector so their product must be a rational vector and hence all periodic points are rational. 3 Measure Theory Definition 3.1. Algebra: For a set X, a collection A of subsets of X is called a σ-algebra if: • φ∈F • A ∈ F =⇒ Ac ∈ F • A, B ∈ F =⇒ A ∩ B ∈ F Definition 3.2. σ-Algebra: For a set X, a collection F of subsets of X is called a σ-algebra if: • φ∈F • A ∈ F =⇒ Ac ∈ F S∞ • {Ai }∞ i=1 ∈ F =⇒ i=1 Ai ∈ F Lemma 3.1. If F is a σ-algebra of subsets of X then • X∈X • {Ai }∞ i=1 ∈ F =⇒ T∞ i=1 Ai ∈ F Proof. • X = φc ∈ F T∞ S∞ c • i=1 Ai = ( i=1 Aci ) ∈ F Definition 3.3. Borel σ-Algebra: For a given set X the Borel σ-algebra B(X) on X is the smallest σ-algebra containing all open sets. Definition 3.4. Measurable Space: If X is a set and F a σ-algebra on X then (X, F) is called a measurable space. Definition 3.5. 
Measure: If (X, F) is a measurable space then µ : F → [0, ∞] is a measure on (X, F) if:

• µ(∅) = 0
• for {A_i}_{i=1}^∞ ⊆ F with A_i ∩ A_j = ∅ ∀i ≠ j we have

    µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

Definition 3.6. Measure Space: If X is a set, F a σ-algebra on X and µ a measure on (X, F) then (X, F, µ) is called a measure space.

Definition 3.7. Finite Measure: A measure µ on (X, F) is finite if µ(X) < ∞

Definition 3.8. σ-Finite Measure: A measure µ on (X, F) is σ-finite if ∃{A_i}_{i=1}^∞ ⊆ F such that

• µ(A_i) < ∞ ∀i
• X = ∪_{i=1}^∞ A_i

Definition 3.9. Almost Everywhere: For a measure space (X, F, µ), a property P holds almost everywhere with respect to µ if µ({x : P fails at x}) = 0

Theorem 3.1. Kolmogorov Extension Theorem: If A is an algebra on X and µ : A → [0, ∞] satisfies

• µ(∅) = 0
• µ is σ-finite
• whenever {A_i}_{i=1}^∞ ⊆ A are pairwise disjoint with ∪_{i=1}^∞ A_i ∈ A we have

    µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

then there is a unique measure µ* : σ(A) → [0, ∞] extending µ, where σ(A) is the σ-algebra generated by A.

Definition 3.10. Stieltjes Measure: For X = [0, 1], A the algebra generated by open intervals and ρ : X → R^+ increasing with ρ(1) − ρ(0) = 1, we define the Stieltjes measure with respect to ρ by µ((a, b)) = ρ(b) − ρ(a), which extends to the entire space by KET.

Definition 3.11. Dirac Measure: For X an arbitrary space and F any non-empty σ-algebra we define the Dirac measure with respect to x ∈ X to be δ_x(A) = χ_A(x)

Definition 3.12. Measurable: If (X, F, µ) is a measure space then f : X → R is called F-measurable if f^{−1}(A) ∈ F ∀A ∈ B(R)

Definition 3.13. Simple: f : X → R is called simple if ∃{A_i}_{i=1}^k ⊆ F, {a_i}_{i=1}^k ⊂ R such that

    f = Σ_{i=1}^k a_i χ_{A_i}

Theorem 3.2. For f : X → R^+ measurable there exists an increasing sequence {f_n}_{n=1}^∞ of simple functions converging pointwise to f.

Definition 3.14.
Integral: We split the definition of an integral into three seperate cases: • If f : X → R is simple then we can write f= k X ai IAi i=1 and then Z f dµ = k X i=1 15 ai µ(Ai ) • If f : X → R+ then by theorem 3.2 we can find an increasing sequence {fn }∞ n=1 of simple functions converging pointwise to f but less than f everywhere then Z Z f dµ = lim fn dµ n→∞ • If f : X → R such that Z |f |dµ < ∞ then we write f + = max{f, 0}, f − = max{−f, 0} so that f = f + − f − then define Z Z Z f dµ = f + dµ − f − dµ All of which are consistent. Definition 3.15. Equivalent: If f, g : X → R are measurable then they are equivalent with respect to the measure µ if f = g µ − a.e. We write L1 (X, F, µ) to be the space of equivalence classes of integrable functions f : X → R and define Z ||f ||1 = |f |dµ to be its norm which defines a metric via d(f, g) = ||f − g||1 Furthermore for p ≥ 1 we write Lp (X, F, µ) to be the space of equivalence classes of functions f : X → R such that |f |p is integrable and define Z ||f ||p = p1 |f | dµ p to be its norm. Lemma 3.2. If (X, F, µ) is a finite measure space then for 1 ≤ p < q we have that Lq (X, F, µ) ⊂ Lp (X, F, µ) Theorem 3.3. Monotone Convergence Theorem: R If {fn }∞ fn dµ is n=1 : X → R is an increasing sequence of integrable functions on (X, F, µ) such that a bounded sequence then limn→∞ fn exists µ a.e, is integrable and Z Z lim fn dµ = lim fn dµ n→∞ n→∞ Theorem 3.4. Dominated Convergence Theorem: If {fn }∞ n=1 : X → R is a sequence of measurable functions on (X, F, µ) such that |fn | ≤ g for some integarble function g : X → R and limn→∞ fn = f µ a.e. then f is integrable and Z Z lim fn dµ = f dµ n→∞ 4 Measures On Compact Metric Spaces Lemma 4.1. C(X, R) := {f : X → R continuous} equiped with the metric d(f, g) = ||f − g||∞ := supx∈X |f (x) − g(x)| is a metric space. 16 We denote M (X) to be the set of probability measures on (XB) then for µ ∈ M (X) we write Z µ(f ) := f dµ Proposition 4.1. 
If µ ∈ M (X) then: • µ is continuous: fn ∈ C(X, R), limn→∞ fn = f then limn→∞ µ(fn ) = µ(f ) • µ is bounded: f ∈ C(X, R) then |µ(f )| eq||f ||∞ • µ is linear: λ1 , λ2 ∈ R, f1 , f2 ∈ C(X, R) then µ(λ1 f1 + λ2 f2 ) = λ1 µ(f1 ) + λ2 µ(f2 ) • µ is positive: f ≥ 0 then µ(f ) ≥ 0 • µ is normalised: µ(1) = 1 Theorem 4.1. Riesz Representation Theorem: Let w : C(X, R) → R be a linear, bounded, positive, normalised functional then ∃1 µ ∈ M (X) such that Z w(f ) = f dµ Definition 4.1. Complete: A metric space is complete if every Cauchy sequence converges. Definition 4.2. Seperable: A metric space is seperable if it contains a countable dense subset. Proposition 4.2. M (X) is convex. i.e. µ1 , µ2 ∈ M (X), α ∈ [0, 1] then αµ1 + (1 − α)µ2 ∈ M (X) Definition 4.3. Weak Convergence: If µ, {µn }∞ n=1 ∈ M (X) then µn converges to µ weakly if ∀f ∈ C(X, R) we have that Z Z lim f dµn = f dµ n→∞ Lemma 4.2. ∃{fn }∞ n=1 ∈ C(X, R) countable dense subset and ∀µ, ν ∈ M (X) we have that ∞ X Z Z 1 d(µ, ν) := f dµ − f dν 2n ||fn ||∞ n=1 is a metric on M (X) compatable with the notation of weak convergence. Theorem 4.2. If X is a compact metric space then M (X) is weakly compact. Proof. It suffices to show that M (X) is sequentially compact i.e. that: ∞ ∀{µn }∞ n=1 ∈ M (X) ∃{µnk }k=1 which converges weakly. C(X, R) is seperable so choose a countable dense subset {fi }∞ i=1 ∈ C(X, R). ∈ M (X) we have that Given {µn }∞ n=1 µn (f1 ) ≤ ||f1 ||∞ ∀n by boundedness of µn hence since this sequence is in R we have that there is some convergent ∞ subsequence {µnk (1) }∞ k=1 ⊆ {µn }n=1 17 Similarly for each r = 2, 3, ... 
we have that µn (fr ) ≤ ||fr ||∞ ∀n by boundedness of µn hence since this sequence is in R we have that there is some convergent ∞ subsequence {µnk (r) }∞ k=1 ⊆ {µnk (r−1) }k=1 In particular let νn := µnk(n) be the diagonal sequence then νn (fn ) converges ∀n ≥ 1 Since fn are dense we have that for any f ∈ C(X, R) and fixed ε > 0 we can find some fi in our counable set such that ||f − fi ||∞ < ε. Since νn converges we can find N ∈ N such that ∀m, n ≥ N we have that |νn (fi ) − νm (fi )| < ε hence |νn (f ) − νm (f )| ≤ |νn (f ) − νn (fi )| + |νn (fi ) − νm (fi )| + |νm (fi ) − νm (f )| ≤ 3ε So indeed νn (f ) converges. Moreover by writing w(f ) = lim νn (f ) n→∞ we have that W satisfies the Riesx representation theorem criteria hence ∃1 µ ∈ M (X) s.t. Z w(f ) = f dµ so we must have that Z lim n→∞ Z f dνn = f dµ ∀f ∈ C(X, R) so indeed νn converges weakly to µ so we have some convergent subsequence. 5 Measure Preserving Transformations Definition 5.1. Measure Preserving Transformation: T : X → X measurable on (X, B) is a measure preserving transform if µ(T −1 (A)) = µ(A) Lemma 5.1. TFAE: 1. T is a measure preserving transform. 2. ∀f ∈ L1 (X, B, µ) we have that Z Z f ◦ T dµ = Proof. 2 =⇒ 1): For A ∈ B we have that χA ∈ L1 (X, B, µ) Z µ(A) = χA dµ Z χA ◦ T dµ = Z = χT −1 (A) dµ = µ(T −1 (A)) 18 f dµ ∀A ∈ B 1 =⇒ 2): Suppose T is a measure preserving transform then for any characteristic function we have that: Z χA dµ = µ(A) = µ(T −1 (A)) Z = χA ◦ T dµ This extends to simple functions by linearity and for f ∈ L1 (X, B, µ) we can find an increasing sequence of simple functions fn ∈ L1 (X, B, µ) s.t. fn → f pointwise. In particular fn ◦ T → f ◦ T pointwise so: Z Z fn dµ f dµ = lim n→∞ Z = lim fn ◦ T dµ n→∞ Z = f ◦ T dµ Definition 5.2. Push Forward Measure: For T : X → X continuous on compact X we define T∗ : M (X) → M (X) by T∗ µ(A) = µ(T −1 A) and call T∗ µ the push forward measure. Notice that µ is T -invariant iff T∗ µ = µ Lemma 5.2. 
For f ∈ C(X, R) we have that Z Z f d(T∗ µ) = f ◦ T dµ Lemma 5.3. If T : X → X is continuous on compact metric space X then TFAE: 1. T∗ µ = µ 2. ∀f ∈ C(X, R) we have that: Z Z f dµ = f ◦ T dµ Proof. 1 =⇒ 2) follows from lemma 5.1. 2 =⇒ 1): Let w1 , w2 : C(X, R) → R be defined by Z w1 (f ) = f dµ Z w2 (f ) = f d(T∗ µ) which satisfy the criteria for Riesz representation theorem so Z w2 (f ) = f d(T∗ µ) Z = f ◦ T dµ Z = f dµ = w1 (f ) 19 but by uniqueness from RRT we must have that µ = T∗ µ Theorem 5.1. Let T : X → X be a continuous mapping of compact metric space X then there is at least one T -invariant probability measure. Proof. Let σ ∈ M (X) and define µn := then we have that Z n−1 1X j T σ n j=0 ∗ n−1 Z 1 X f ◦ T j dσ f dµn = n j=0 Since M (X) is weakly compact we have that µn has a convergent subsequence µnk converging to some µ ∈ M (X). We need to show that µ is T -invariant. Let f : X → R be continuous then Z Z Z Z f ◦ T dµ − f dµ = lim f ◦ T dµn − f dµn k k k→∞ nX Z n−1 X 1 Z k −1 1 = lim f ◦ T j+1 dσ − f ◦ T j dσ k→∞ nk n j=0 j=0 k Z Z 1 = lim f ◦ T nk dσ − f dσ k→∞ nk 2||f ||∞ ≤ lim k→∞ nk =0 Theorem 5.2. For a compact metric space X and T : X → X continuous mapping we have that: • M (X, T ) is convex. • M (X, T ) is closed. Proof. • Let µ1 , µ2 ∈ M (X, T ) and α ∈ (0, 1) (αµ1 + (1 − α)µ2 )(T −1 B) = αµ1 (T −1 B) + (1 − α)µ2 (T −1 B) = αµ1 (B) + (1 − α)µ2 (B) = (αµ1 + (1 − α)µ2 )(B) • Let {µn }∞ n=1 ∈ M (X, T ) be a sequence of T -invariant probability measures converging to some µ ∈ M (X) weakly. For f ∈ C(X, R) Z Z f ◦ T dµ = lim f ◦ T dµn n→∞ Z = lim f dµn n→∞ Z = f dµ 20 Corollary 5.1. In order to show that a continuous mapping T : X → X is µ-invariant we can simply check µ(T −1 B) = µ(B) for open intervals. Open intervals generate the Borel σ algebra and hence by Kolmogorov’s extension theorem we must have that the measures are unique and hence T∗ µ, µ coincide on the entire σ-algebra. Definition 5.3. 
Fourier Series: For f ∈ L1 (R \ Z, B, µ) we have the Fourier series ∞ X cn e2πinx n=−∞ where Z 1 cn = f (x)e−2πinx dµ(x) 0 For general f we do not have that this series necessarily converges. Lemma 5.4. Riemann-Lebesgue: If f ∈ L1 then limn→∞ cn = 0 We denote n X Sn (x) := cn e2πirx r=−n to be the partial sum of the Fourier series and σn (x) := n−1 1X Sk (x) n k=0 to be the Cesaro average. Theorem 5.3. Riesz-Fischer: For f ∈ L2 (R \ Z, B, µ) we have that Sn converges to f in L2 Lemma 5.5. For f ∈ L2 (R \ Z, B, µ) we have that Sn converges to f µ-a.e. Theorem 5.4. Feyer’s: If f is continuous then σn converges uniformly to f Corollary 5.2. In order to determine whether a measure µ is T -invariant it suffices to check that Z Z σn dµ = σn ◦ T dµ By Feyer’s theorem. Moreover to check this it suffices to show that Z Z Sn dµ = Sn ◦ T dµ so long as f ∈ L2 by Riesz-Fischer. Definition 5.4. Equivalent: Two measures µ, ν on the same measurable space are equivalent if they have the same collection of null sets. Lemma 5.6. If T : Rk /Zk is a linear toral endomorphism defined as T (x) = Ax mod 1 then T is Lebesgue invariant. 21 Proof. Let f ∈ L1 (Rk /Zk , mathcalB, λ) then f has Fourier series X cn e2πi<n,x> n∈Z where Z cn := f (x)e−2πi<n,x> dλ Rk /Zk Moreover Z e Rk /Zk 2πi<n,x> ( 0 dλ = 1 n 6= 0 n=0 Hence since det(A) 6= 0 we have that nA = 0 ⇐⇒ n = 0 so Z Z X f ◦ T dλ = cn e2πi<n,Ax> dλ n∈Zk = Z X cn e2πi<nA,x> dλ n∈Zk = X Z cn e2πi<nA,x> dλ n∈Zk = c0 Z = f dλ Theorem 5.5. Perron-Frobenum: If B is a non-negative, aperiodic, k × k matrix then: • ∃λ > 0 eigenvalue of B such that |λ̃| < λ for all other eigenvalues λ̃. of B • λ is simple i.e. the eigenspace of λ is one dimensional. • ∃1 v right-eigenvector s.t. vi > 0 ∀i, Bv = λv and k X vi = 1 i=1 • ∃1 u left-eigenvector s.t. ui > 0 ∀i, uB = λu and k X ui = 1 i=1 • Eigenvectors corresponding to other eigenvalues have at least one negative entry. Definition 5.5. 
Stochastic Matrix: A k × k matrix P is called stochastic if • Pi,j ≥ 0 Pk • j=1 Pi,j = 1 ∀i Definition 5.6. Compatible: Stochastic matrix P is compatible with 0, 1 matrix A if Pi,j > 0 ⇐⇒ Ai,j = 1 22 Corollary 5.3. P is aperiodic if and only if A is aperiodic where P is compatible with A. Lemma 5.7. If P is a stochastic matrix then P satisfies the hypothesis of the P-F theorem and hence ∃λ > 0 strictly largest eigenvalue moreover, λ = 1 and has corresponding right-eigenvector v = 1 which is the unique eigenvector with positive entries. Definition 5.7. Markov Measure: If P is a stochastic matrix and p the left-eigenvector of P then µP [y0 , ..., yn ] := py0 n Y Pyi−1 ,yi i=1 defines the Markov measure on cylinder sets which extends to the shift space by KET. Lemma 5.8. A Markov measure is σ-invariant. Proof. σ∗ µP [y0 , ..., yn ] = µP (σ −1 [y0 , ..., yn ]) = µP k [ ! [i, y0 , ..., yn ] i=1 = k X µP ([i, y0 , ..., yn ]) i=1 = k X pi Pi,y0 i=1 = py0 n Y Pyj−1 ,yj j=1 n Y Pyj−1 ,yj since p is a left eigenvector of P j=1 = µP ([y0 , ..., yn ]) There are uncountably many σ-invariant probability measures. Definition 5.8. Bernoulli Measure: For a full shift we define the Bernoulli measure by the stochastic matrix Pi,j = pj so that µP ([y0 , ..., yn ]) := n Y pyi i=0 Definition 5.9. Parry Measure: For a 0, 1 matrix A with eigenvalue λ and eigenvectors u, v determined by P-F we define the Parry measure to be that generated by Ai,j vj Pi,j = λvi and ui vi pi = Pk j=1 uj vj 23 6 Ergodicity Definition 6.1. Ergodic: If (X, B, µ) is a probability space then the measure preserving transformation T : X → X is ergodic if B ∈ B s.t. T −1 (B) = B implies that µ(B) ∈ {0, 1} Theorem 6.1. If T is an ergodic measure preserving transformation of probability space (X, B, µ) and f ∈ L1 (X, B, µ) then Z n−1 1X j lim f (T x) = f dµ n→∞ n j=0 Lemma 6.1. 
If ∃A ∈ B such that T^{−1}(A) = A but µ(A) ∈ (0, 1) then T is not ergodic for µ, but µ_A defined by

µ_A(B) = µ(B ∩ A) / µ(A)

is invariant with respect to T.

Lemma 6.2. If B ∈ B is such that µ(T^{−1}B Δ B) = 0 then ∃B_∞ ∈ B such that T^{−1}(B_∞) = B_∞ and µ(B_∞ Δ B) = 0

Corollary 6.1. If T is ergodic and µ(T^{−1}(B) Δ B) = 0 then µ(B) ∈ {0, 1}

Proposition 6.1. Let T be a measure preserving transformation of the probability space (X, B, µ); then TFAE:
1. T is ergodic.
2. Whenever f ∈ L¹(X, B, µ) satisfies f ∘ T = f µ-a.e., f is constant µ-a.e.

Proof.
• 1 ⟹ 2) Suppose T is ergodic and f ∈ L¹(X, B, µ) with f ∘ T = f µ-a.e. For k ∈ ℤ, n ∈ ℕ define

X(k, n) := {x ∈ X : k/2ⁿ ≤ f(x) < (k+1)/2ⁿ} = f^{−1}([k/2ⁿ, (k+1)/2ⁿ))

Since f is measurable we have X(k, n) ∈ B. Moreover

T^{−1}(X(k, n)) Δ X(k, n) ⊂ {x ∈ X : f(x) ≠ f(Tx)}

hence µ(T^{−1}(X(k, n)) Δ X(k, n)) = 0, so by Corollary 6.1 we must have µ(X(k, n)) ∈ {0, 1}. For fixed n ∈ ℕ,

X = ∪_{k∈ℤ} X(k, n) ∪ X_∞

forms a disjoint partition, where X_∞ := {x : f(x) = ±∞}. Furthermore µ(X_∞) = 0 since f ∈ L¹, which means that

1 = µ(X) = Σ_{k∈ℤ} µ(X(k, n))

Since each µ(X(k, n)) ∈ {0, 1}, for each n ∈ ℕ there is a unique k_n ∈ ℤ with µ(X(k_n, n)) = 1. If we let

Y = ∩_{n=1}^{∞} X(k_n, n) = {x ∈ X : f(x) = c}

for some constant c, then µ(Y) = 1 and f is constant on Y, so f is constant µ-a.e.

• 2 ⟹ 1) Suppose B ∈ B with T^{−1}(B) = B. Then χ_B ∈ L¹ and χ_B ∘ T = χ_{T^{−1}B} = χ_B, so by our assumption χ_B is constant µ-a.e., hence

µ(B) = ∫ χ_B dµ ∈ {0, 1}

Hence T is ergodic. The same proof gives the equivalence with L²(X, B, µ) in place of L¹.

Theorem 6.2. If T : ℝ/ℤ → ℝ/ℤ is the rotation T(x) = x + α mod 1 then T is ergodic for the Lebesgue measure iff α ∈ ℝ \ ℚ

Proof. Suppose α ∈ ℚ, say α = p/q with p, q ∈ ℤ coprime. Define f(x) = e^{2πiqx} ∈ L², which isn't constant. Then

f(Tx) = e^{2πiq(x+α)} = e^{2πiqx} e^{2πiqα} = e^{2πiqx} e^{2πip} = e^{2πiqx}

hence f ∘ T = f µ-a.e. but f isn't constant µ-a.e., so T cannot be ergodic.
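The dichotomy in Theorem 6.2 can be seen numerically; a minimal sketch (the choices of α, q, starting point and iterate count are illustrative) contrasting the two cases. For rational α = p/q the invariant function f(x) = e^{2πiqx} has Birkhoff averages frozen at f(x₀) ≠ ∫f dλ = 0, while for irrational α the averages of e^{2πix} decay to 0:

```python
import cmath
import math

def birkhoff_avg(alpha, freq, x0, n):
    """Average of f(x) = e^{2*pi*i*freq*x} along the rotation orbit of x0."""
    s = sum(cmath.exp(2j * math.pi * freq * (x0 + j * alpha))
            for j in range(n))
    return s / n

x0, n = 0.1234, 20_000

# rational alpha = 2/7: f(x) = e^{2*pi*i*7x} is T-invariant, so the
# average equals f(x0) and its modulus stays at 1
rational = birkhoff_avg(2 / 7, 7, x0, n)

# irrational alpha = sqrt(2): the average of e^{2*pi*i*x} tends to 0
irrational = birkhoff_avg(math.sqrt(2), 1, x0, n)
```

For the rational case the sum is a constant repeated n times, so no averaging takes place; for the irrational case the sum is a geometric series bounded by 1/(n|sin πα|).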
Suppose α ∈ ℝ \ ℚ and f ∈ L² with f ∘ T = f µ-a.e. Then f has Fourier series

Σ_{n=−∞}^{∞} c_n e^{2πinx}

and f ∘ T has Fourier series

Σ_{n=−∞}^{∞} c_n e^{2πinα} e^{2πinx}

Since f ∘ T = f µ-a.e. the Fourier series coincide, so comparing coefficients, for n ≠ 0

c_n = c_n e^{2πinα}

Since α is irrational, e^{2πinα} ≠ 1 for n ≠ 0, so this forces c_n = 0. Both functions therefore have Fourier series c_0, which is constant, hence f = c_0 µ-a.e. and T is ergodic.

Theorem 6.3. If T : ℝ/ℤ → ℝ/ℤ is the doubling map T(x) = 2x mod 1 then T is ergodic with respect to the Lebesgue measure.

Proof. Suppose f ∈ L² with f ∘ T = f µ-a.e.; then f ∘ T^j = f µ-a.e. for any j ∈ ℕ. f has Fourier series

Σ_{n∈ℤ} c_n e^{2πinx}

and f ∘ T^j has Fourier series

Σ_{n∈ℤ} c_n e^{2πin2^j x}

Since f = f ∘ T^j µ-a.e. the Fourier series coincide, hence c_{n2^j} = c_n for any j ≥ 0. If n ≠ 0 then lim_{j→∞} |n2^j| = ∞, but by the Riemann–Lebesgue lemma the coefficients must converge to zero: lim_{j→∞} c_{n2^j} = 0. This means c_n = 0 whenever n ≠ 0; in particular f = c_0 µ-a.e., hence T is ergodic.

Lemma 6.3. If T : ℝ^k/ℤ^k → ℝ^k/ℤ^k is a linear toral automorphism T(x) = Ax mod 1 then TFAE:
1. T is ergodic with respect to the Lebesgue measure.
2. The only n ∈ ℤ^k such that ∃p ∈ ℕ with e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩} µ-a.e. is n = 0

Proof. (1 ⟹ 2) Suppose T is ergodic with respect to µ and that ∃n ∈ ℤ^k, p ∈ ℕ with

e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}   µ-a.e.

WLOG let p be the smallest such p for this n and define

f(x) = Σ_{j=0}^{p−1} e^{2πi⟨n,A^j x⟩} ∈ L²

Notice that f ∘ T = f µ-a.e. Since T is ergodic f must be constant, which is only the case if n = 0.

(2 ⟹ 1) Suppose the only n ∈ ℤ^k with ∃p ∈ ℕ such that e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩} µ-a.e. is n = 0. Let f ∈ L² with f ∘ T = f µ-a.e. f has Fourier series

Σ_{n∈ℤ^k} c_n e^{2πi⟨n,x⟩}

and f ∘ T^p has Fourier series

Σ_{n∈ℤ^k} c_n e^{2πi⟨n,A^p x⟩} = Σ_{n∈ℤ^k} c_n e^{2πi⟨nA^p,x⟩}

Since f = f ∘ T^p µ-a.e.
we can equate coefficients, which gives c_n = c_{nA^p} for any p ≥ 0. Suppose c_n ≠ 0; then c_{nA^p} ≠ 0 for all p. If lim_{p→∞} ||nA^p|| = ∞ then by Riemann–Lebesgue lim_{p→∞} c_{nA^p} = 0, contradicting c_{nA^p} = c_n ≠ 0, so the sequence nA^p must have repeats. This means ∃l > l' such that nA^l = nA^{l'}, and since A is invertible nA^p = n for some p ∈ ℕ. This gives e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}, so by our initial assumption n = 0. Hence f = c_0 is constant and T is ergodic.

Proposition 6.2. If T : ℝ^k/ℤ^k → ℝ^k/ℤ^k is a linear toral automorphism T(x) = Ax mod 1 then T is ergodic with respect to the Lebesgue measure iff A has no roots of unity as eigenvalues.

Proof. Suppose T is not ergodic; then by the previous lemma ∃n ∈ ℤ^k \ {0}, p ∈ ℕ such that

e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}

So nA^p = n, and since n ≠ 0 we must have that 1 is an eigenvalue of A^p, so A has an eigenvalue which is a root of unity.

Conversely suppose A has a pth root of unity as an eigenvalue. Then A^p has 1 as an eigenvalue, hence ∃n ∈ ℝ^k \ {0} such that n(A^p − I) = 0. In particular, since A^p has integer entries we can choose n ∈ ℤ^k \ {0}. This means nA^p = n and e^{2πi⟨n,A^p x⟩} = e^{2πi⟨n,x⟩}, so by the previous lemma T is not ergodic with respect to µ.

Corollary 6.2. Hyperbolic toral automorphisms are ergodic with respect to the Lebesgue measure.

Definition 6.2. Extremal: For a convex set Y we say that y ∈ Y is extremal if y = αy_1 + (1−α)y_2 with y_1, y_2 ∈ Y and α ∈ (0, 1) implies y_1 = y_2 = y

Theorem 6.4. For µ ∈ M(X, T) we have that if µ is extremal then µ is ergodic.

Proof. Suppose µ is not ergodic; then ∃B ∈ B with T^{−1}(B) = B and µ(B) ∈ (0, 1). Define

µ_1(A) := µ(A ∩ B)/µ(B)   and   µ_2(A) := µ(A ∩ (X \ B))/µ(X \ B)

which are both T-invariant probability measures with µ_1 ≠ µ_2 and

µ = µ(B)µ_1 + (1 − µ(B))µ_2

hence µ cannot be extremal.

Theorem 6.5.
If T : X → X is a continuous mapping on a compact metric space then M(X, T) contains at least one ergodic measure.

Proof. By the previous theorem it suffices to exhibit an extremal µ ∈ M(X, T). C(X, ℝ) is separable, so choose a countable dense set {f_n}_{n=0}^{∞} ⊂ C(X, ℝ). The map µ ↦ ∫ f_0 dµ is continuous in the weak* topology, so since M(X, T) is compact ∃ν ∈ M(X, T) such that

∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ

which means that

M_0 := {ν ∈ M(X, T) : ∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ}

is a non-empty, closed subset of a compact space and hence is compact. Continuing inductively we can define

M_n := {ν ∈ M_{n−1} : ∫ f_n dν = sup_{µ∈M_{n−1}} ∫ f_n dµ}

each of which is non-empty and compact. If we define M_∞ := ∩_{n=0}^{∞} M_n then M_∞ is non-empty, since a countable intersection of nested non-empty compact sets is non-empty. We therefore have ∃µ_∞ ∈ M_∞. We claim that µ_∞ is extremal.

Suppose µ_∞ = αµ_1 + (1−α)µ_2 for α ∈ (0, 1) and µ_1, µ_2 ∈ M(X, T); we want to show µ_1 = µ_2. By the Riesz representation theorem µ_1 = µ_2 iff

∫ f dµ_1 = ∫ f dµ_2   ∀f ∈ C(X, ℝ)

and it suffices to check this on a dense subset. Now

∫ f_0 dµ_∞ = α ∫ f_0 dµ_1 + (1−α) ∫ f_0 dµ_2

so, since µ_1, µ_2 ∈ M(X, T),

sup_{µ∈M(X,T)} ∫ f_0 dµ = ∫ f_0 dµ_∞ ≤ max(∫ f_0 dµ_1, ∫ f_0 dµ_2) ≤ sup_{µ∈M(X,T)} ∫ f_0 dµ

A convex combination with α ∈ (0, 1) can only attain the supremum if both measures attain it, so µ_1, µ_2 ∈ M_0. Suppose inductively that µ_1, µ_2 ∈ M_{n−1}; then

∫ f_n dµ_∞ = α ∫ f_n dµ_1 + (1−α) ∫ f_n dµ_2

so

sup_{µ∈M_{n−1}} ∫ f_n dµ = ∫ f_n dµ_∞ ≤ max(∫ f_n dµ_1, ∫ f_n dµ_2) ≤ sup_{µ∈M_{n−1}} ∫ f_n dµ

and since α ∈ (0, 1) this forces

∫ f_n dµ_∞ = ∫ f_n dµ_1 = ∫ f_n dµ_2

so µ_1, µ_2 ∈ M_n. This holds ∀n ∈ ℕ, hence ∫ f dµ_1 = ∫ f dµ_2 ∀f ∈ {f_n}_{n=0}^{∞}, so indeed µ_1 = µ_2.

Corollary 6.3. If σ : Σ_k → Σ_k is the full shift and p is a probability vector then µ_p[z_0, ..., z_{n−1}] := Π_{i=0}^{n−1} p_{z_i} is ergodic for σ.

Corollary 6.4. µ(B) = (1/log 2) ∫_B 1/(1+x) dx is ergodic for the continued fraction map.

7 Recurrence and Unique Ergodicity

Theorem 7.1.
Poincaré Recurrence Theorem: Let T : X → X be a measure preserving transformation of the probability space (X, B, µ). If A ∈ B with µ(A) > 0 then for µ-almost every x ∈ A the orbit {T^n x}_{n=0}^{∞} returns to A infinitely often.

Proof. Let

E = {x ∈ A : ∃m ∈ ℕ such that T^n x ∉ A ∀n ≥ m}

which is the set of x ∈ A whose orbit returns to A only finitely often. We want to show that µ(E) = 0. Let F = {x ∈ A : T^n x ∉ A ∀n ≥ 1}; then

T^{−k}F = {x ∈ X : T^k x ∈ A, T^n x ∉ A ∀n > k}

So in particular we have that

E = ∪_{k=0}^{∞} (T^{−k}F ∩ A)

and hence

µ(E) = µ(∪_{k=0}^{∞} (T^{−k}F ∩ A)) ≤ µ(∪_{k=0}^{∞} T^{−k}F) ≤ Σ_{k=0}^{∞} µ(T^{−k}F) = Σ_{k=0}^{∞} µ(F)

so it suffices to show that µ(F) = 0. Suppose n > m and x ∈ T^{−n}F ∩ T^{−m}F. Then T^m x ∈ F and also T^{n−m}(T^m x) = T^n x ∈ F ⊂ A, which contradicts T^m x ∈ F (no point of F returns to A). So the sets {T^{−k}F}_{k=0}^{∞} are pairwise disjoint. This gives

µ(∪_{k=0}^{∞} T^{−k}F) = Σ_{k=0}^{∞} µ(T^{−k}F) = Σ_{k=0}^{∞} µ(F)

The left-hand side lies in [0, 1] since µ is a probability measure, while the right-hand side, being an infinite sum of a single non-negative value, can only take values in {0, +∞}; hence both sides equal zero and therefore µ(F) = 0.

Definition 7.1. Unique Ergodicity: If (X, B) is a measurable space with X compact and T : X → X has a unique invariant probability measure µ then T is called uniquely ergodic.

Theorem 7.2. Let X be a compact metric space and T : X → X continuous; then the following are equivalent:
1. T is uniquely ergodic.
2. ∀f ∈ C(X, ℝ) ∃ a constant c_f such that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = c_f

uniformly over x.

Proof.
We split the proof into the two separate implications:

• 2 ⟹ 1) Suppose µ, ν are T-invariant probability measures. Then for f ∈ C(X, ℝ):

∫ f dµ = (1/n) Σ_{j=0}^{n−1} ∫ f ∘ T^j dµ
= lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f ∘ T^j dµ
= ∫ lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f ∘ T^j dµ   (by DCT, the averages being bounded by ||f||_∞)
= ∫ c_f dµ = c_f

Similarly ∫ f dν = c_f, hence ∫ f dµ = ∫ f dν for any f ∈ C(X, ℝ), so µ and ν coincide by the Riesz representation theorem.

• 1 ⟹ 2) We prove the contrapositive. Let µ be an invariant measure; note that if 2 holds then necessarily c_f = ∫ f dµ. Suppose 2 fails; we want to show that 1 also fails. Then ∃f ∈ C(X, ℝ), a sequence {n_k}_{k=1}^{∞} ⊂ ℕ and associated points {x_k}_{k=1}^{∞} ⊂ X such that

lim_{k→∞} (1/n_k) Σ_{j=0}^{n_k−1} f(T^j x_k) exists and ≠ ∫ f dµ

For k ≥ 1 define ν_k ∈ M(X) by

ν_k = (1/n_k) Σ_{j=0}^{n_k−1} T^j_* δ_{x_k}

Then

∫ f dν_k = (1/n_k) Σ_{j=0}^{n_k−1} f(T^j x_k)

By weak* compactness ν_k has a subsequence ν_{k_r} converging to some probability measure ν, which is T-invariant since T_*ν_k − ν_k = (1/n_k)(T^{n_k}_* δ_{x_k} − δ_{x_k}) → 0. Then

∫ f dν = lim_{r→∞} ∫ f dν_{k_r} = lim_{r→∞} (1/n_{k_r}) Σ_{j=0}^{n_{k_r}−1} f(T^j x_{k_r}) ≠ ∫ f dµ

So µ ≠ ν and T is not uniquely ergodic.

8 Birkhoff's Ergodic Theorem

Definition 8.1. Absolutely Continuous: If µ, ν are measures on (X, B) then ν is absolutely continuous with respect to µ if µ(B) = 0 ⟹ ν(B) = 0 for any B ∈ B.
We have that if ν(B) := ∫_B f dµ then ν is absolutely continuous with respect to µ.

Theorem 8.1. Radon–Nikodym: Let (X, B, µ) be a probability space and ν a measure on (X, B) absolutely continuous with respect to µ; then there is a unique non-negative measurable function f such that

ν(B) = ∫_B f dµ   ∀B ∈ B

Definition 8.2. Conditional Expectation: For a sub-σ-algebra 𝒜 ⊆ B, µ|_𝒜 is a measure. For f ≥ 0 with f ∈ L¹(X, B, µ),

ν(A) = ∫_A f dµ

is a measure absolutely continuous with respect to µ|_𝒜, so by Radon–Nikodym there is a unique 𝒜-measurable function E[f|𝒜] such that

ν(A) = ∫_A E[f|𝒜] dµ   ∀A ∈ 𝒜

called the conditional expectation of f given 𝒜.

Corollary 8.1. E[f|𝒜] is uniquely determined by the requirements that
• E[f|𝒜] is 𝒜-measurable.
• ∫_A f dµ = ∫_A E[f|𝒜] dµ ∀A ∈ 𝒜

Lemma 8.1. I := {B ∈ B : T^{−1}B = B a.e.} is a σ-algebra, the σ-algebra of invariant sets.

Theorem 8.2.
Birkhoff's Ergodic Theorem: Let (X, B, µ) be a probability space and T : X → X a measure preserving transformation. ∀f ∈ L¹(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = E[f|I](x)

for a.e. x ∈ X.

Corollary 8.2. Let (X, B, µ) be a probability space and T : X → X an ergodic measure preserving transformation. ∀f ∈ L¹(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ

for a.e. x ∈ X.

Proof. If T is ergodic then I consists only of sets of measure 0 or 1, so for f ∈ L¹(X, B, µ)

E[f|I] = ∫ f dµ

and the result follows by Birkhoff's ergodic theorem.

Corollary 8.3. If T : X → X is an ergodic transformation of (X, B, µ) and B ∈ B then, for a.e. x,

lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n−1, T^j x ∈ B} = µ(B)

Theorem 8.3. If T : X → X is a measure preserving transformation of the probability space (X, B, µ) then the following are equivalent:
1. T is ergodic.
2. ∀A, B ∈ B

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) = µ(A)µ(B)

Proof.
• 1 ⟹ 2) Suppose T is ergodic; then χ_A ∈ L¹, so

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_A ∘ T^j = ∫ χ_A dµ = µ(A)   a.e.

and hence

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} (χ_A ∘ T^j) χ_B = µ(A) χ_B   a.e.

Since the left-hand side is bounded by 1, by DCT we have

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) = lim_{n→∞} ∫ (1/n) Σ_{j=0}^{n−1} (χ_A ∘ T^j) χ_B dµ = ∫ µ(A) χ_B dµ = µ(A)µ(B)

• 2 ⟹ 1) Suppose 2 holds and that T^{−1}A = A; set B = A, which gives µ((T^{−j}A) ∩ B) = µ(A ∩ B) = µ(A). So

µ(A) = (1/n) Σ_{j=0}^{n−1} µ((T^{−j}A) ∩ B) → µ(A)µ(B) = µ(A)²

hence µ(A) = µ(A)², so µ(A) ∈ {0, 1} and therefore T is ergodic.

Definition 8.3. Weak-Mixing: T is weak mixing if ∀A, B ∈ B we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |µ((T^{−j}A) ∩ B) − µ(A)µ(B)| = 0

Definition 8.4. Strong-Mixing: T is strong mixing if ∀A, B ∈ B we have that

lim_{n→∞} µ((T^{−n}A) ∩ B) = µ(A)µ(B)

Definition 8.5. Normal: We call x ∈ [0, 1) normal if it has a unique binary expansion x = Σ_{i=1}^{∞} x_i/2^i with x_i ∈ {0, 1} and

lim_{n→∞} (1/n) #{j : 1 ≤ j ≤ n, x_j = 0} = 1/2

9 Entropy

Definition 9.1.
Topologically Conjugate: For compact spaces X, Y we say continuous maps T : X → X, S : Y → Y are topologically conjugate if there exists a homeomorphism h : X → Y such that h ∘ T = S ∘ h

Definition 9.2. Isomorphic: If T, S are measure preserving transformations of the probability spaces (X, B, µ), (Y, C, ν) respectively then T, S are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, S N ⊆ N
• µ(M) = 1 = ν(N)
• ∃ϕ : M → N bijection s.t.
  – ϕ, ϕ^{−1} are measurable: ϕ(B) ∈ C ∀B ∈ B, ϕ^{−1}(C) ∈ B ∀C ∈ C
  – ϕ, ϕ^{−1} are measure preserving: µ(ϕ^{−1}C) = ν(C) ∀C ∈ C, ν(ϕB) = µ(B) ∀B ∈ B
  – ϕ ∘ T = S ∘ ϕ

Definition 9.3. Conditional Probability: Let (X, B, µ) be a probability space; if 𝒜 ⊂ B is a sub-σ-algebra and B ∈ B then µ(B|𝒜) := E[χ_B|𝒜] is the conditional probability of B given 𝒜

Definition 9.4. Countable Partition: α is a measure theoretic countable partition of the probability space (X, B, µ) if α = {A_i}_{i=1}^{∞} s.t.
• A_i ∈ B ∀i
• µ(A_i ∩ A_j) = 0 ∀i ≠ j
• µ(∪_{i=1}^{∞} A_i) = 1

Corollary 9.1. For a measurable function f we have that

E[f|σ(α)](x) = Σ_{A∈α} (χ_A(x)/µ(A)) ∫_A f dµ

Furthermore, for x ∈ A,

µ(B|σ(α))(x) = µ(A ∩ B)/µ(A)

Theorem 9.1. Increasing Martingale Theorem: Let {𝒜_i}_{i=1}^{∞} be an increasing sequence of sub-σ-algebras of 𝒜 such that σ(∪_{i=1}^{∞} 𝒜_i) = 𝒜; then
• lim_{n→∞} E[f|𝒜_n] = E[f|𝒜] µ-a.e.
• lim_{n→∞} ∫ |E[f|𝒜_n] − E[f|𝒜]| dµ = 0

Definition 9.5. Join: If α, β are countable partitions of X then the join of α, β is the partition

α ∨ β := {A ∩ B : A ∈ α, B ∈ β}

We say that α, β are independent if µ(A ∩ B) = µ(A)µ(B) ∀A ∈ α, B ∈ β.

Definition 9.6. Information: Given a partition α we define the information I(α) : X → ℝ⁺ obtained from observing α to be

I(α)(x) := −Σ_{A∈α} χ_A(x) log(µ(A))

Corollary 9.2. I(α) is measurable.

Corollary 9.3. If α, β are independent partitions then I(α ∨ β) = I(α) + I(β)

Proof.
I(α ∨ β)(x) = −Σ_{C∈α∨β} χ_C(x) log(µ(C))
= −Σ_{A∈α, B∈β} χ_{A∩B}(x) log(µ(A ∩ B))
= −Σ_{A∈α} Σ_{B∈β} χ_A(x)χ_B(x) log(µ(A)µ(B))   by independence
= −Σ_{A∈α} Σ_{B∈β} χ_A(x)χ_B(x)(log(µ(A)) + log(µ(B)))
= −Σ_{A∈α} χ_A(x) log(µ(A)) − Σ_{B∈β} χ_B(x) log(µ(B))
= I(α)(x) + I(β)(x)

Definition 9.7. Entropy: Given a partition α we define the entropy to be

H(α) = ∫ I(α) dµ = −Σ_{A∈α} µ(A) log(µ(A))

Definition 9.8. Conditional Information: Given a sub-σ-algebra 𝒜 ⊆ B and partition α we define the conditional information of α given 𝒜 to be

I(α|𝒜)(x) := −Σ_{A∈α} χ_A(x) log(µ(A|𝒜)(x))

Definition 9.9. Conditional Entropy: Given a sub-σ-algebra 𝒜 ⊆ B and partition α we define the conditional entropy of α given 𝒜 to be

H(α|𝒜) := ∫ I(α|𝒜) dµ = −Σ_{A∈α} ∫ µ(A|𝒜) log(µ(A|𝒜)) dµ

Lemma 9.1. For countable partitions α, β, γ we have that

I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ)

Moreover H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ)

Proof. Let x ∈ X; since α, β, γ are partitions there are A ∈ α, B ∈ β, C ∈ γ with x ∈ A ∩ B ∩ C. Then

I(α ∨ β|γ)(x) = −Σ_{Y∈α∨β} χ_Y(x) log(µ(Y|γ)(x))
= −log(µ(A ∩ B|γ)(x))
= −log(µ(A ∩ B ∩ C)/µ(C))
= −log(µ(A ∩ B ∩ C)) + log(µ(C))

and similarly

I(α|γ)(x) = −log(µ(A ∩ C)) + log(µ(C))
I(β|α ∨ γ)(x) = −log(µ(A ∩ B ∩ C)) + log(µ(A ∩ C))

Hence indeed I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ), and integrating over x gives H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ).

Definition 9.10. Refinement: For countable partitions α, β we say that β is a refinement of α (written α ≤ β) if every set A ∈ α can be written as a union of sets in β.

Corollary 9.4. For countable partitions α ≤ β we have that I(α|β) = 0

Proof. Since α ≤ β we have β = α ∨ β, and for x ∈ A ∈ α we have µ(A|β)(x) = 1, so I(α|β) = I(α|α ∨ β) = 0

Corollary 9.5. If α, β, γ are countable partitions and γ ≥ β then I(α ∨ β|γ) = I(α|γ). Moreover H(α ∨ β|γ) = H(α|γ)

Proof. β ≤ γ ≤ α ∨ γ, so by Lemma 9.1 and Corollary 9.4

I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ) = I(α|γ)

The final result follows by integration.

Corollary 9.6. If α, β, γ are countable partitions s.t.
α ≥ β then I(α|γ) ≥ I(β|γ). Moreover H(α|γ) ≥ H(β|γ)

Proof. α ≥ β so α = α ∨ β, and we have that

I(α|γ) = I(α ∨ β|γ) = I(β|γ) + I(α|β ∨ γ) ≥ I(β|γ)

The final result then follows by integration.

Proposition 9.1. Jensen's Inequality: Let ϕ : [0, 1] → ℝ⁺ be continuous and concave. If f ∈ L¹(X, B, µ) and 𝒜 ⊂ B is a sub-σ-algebra then

ϕ(E[f|𝒜])(x) ≥ E[ϕ(f)|𝒜](x)   µ-a.e.

Lemma 9.2. If γ ≥ β are countable partitions then H(α|β) ≥ H(α|γ)

Proof. Set ϕ(t) = −t log(t), which is continuous and concave on [0, 1] and hence satisfies the requirements of Jensen's inequality. Choose A ∈ α and define

f(x) := µ(A|γ)(x) = E[χ_A|γ](x)

By the tower property of conditional expectation (β ≤ γ means σ(β) ⊆ σ(γ)):

E[f|β] = E[E[χ_A|γ]|β] = E[χ_A|β] = µ(A|β)

By Jensen's inequality ϕ(E[f|β]) ≥ E[ϕ(f)|β], hence

−µ(A|β) log(µ(A|β)) ≥ −E[µ(A|γ) log(µ(A|γ))|β]

Integrating with respect to µ yields

−∫ µ(A|β) log(µ(A|β)) dµ ≥ −∫ µ(A|γ) log(µ(A|γ)) dµ

and summing over A ∈ α gives H(α|β) ≥ H(α|γ).

Definition 9.11. Sub-Additive: A sequence {a_n}_{n=1}^{∞} is called sub-additive if a_{n+m} ≤ a_n + a_m

Lemma 9.3. (Fekete) If {a_n}_{n=1}^{∞} is a non-negative sub-additive sequence then a_n/n converges, with limit inf_n a_n/n.

Proof. Sub-additivity gives a_n ≤ n a_1, so 0 ≤ a_n/n ≤ a_1 and L := inf_n a_n/n is finite. Given ε > 0 choose m with a_m/m < L + ε. Writing n = qm + r with 0 ≤ r < m (and the convention a_0 = 0), sub-additivity gives a_n ≤ q a_m + a_r, so

a_n/n ≤ (qm/n)(a_m/m) + a_r/n → a_m/m < L + ε   as n → ∞

Hence limsup a_n/n ≤ L ≤ liminf a_n/n, and a_n/n → L.

For a measure preserving transformation T and countable partition α we denote T^{−1}α := {T^{−1}A : A ∈ α} and

H_n(α) := H(∨_{i=0}^{n−1} T^{−i}α)

Lemma 9.4. If T is a measure preserving transformation and α a countable partition then H_n(α) is a sub-additive sequence.

Proof.

H_{n+m}(α) = H(∨_{i=0}^{n+m−1} T^{−i}α)
= H((∨_{i=0}^{n−1} T^{−i}α) ∨ (∨_{j=n}^{n+m−1} T^{−j}α))
≤ H(∨_{i=0}^{n−1} T^{−i}α) + H(∨_{j=n}^{n+m−1} T^{−j}α)
= H(∨_{i=0}^{n−1} T^{−i}α) + H(T^{−n} ∨_{j=0}^{m−1} T^{−j}α)
= H(∨_{i=0}^{n−1} T^{−i}α) + H(∨_{j=0}^{m−1} T^{−j}α)   (T measure preserving)
= H_n(α) + H_m(α)

Definition 9.12.
Relative Entropy: If T : X → X is a measure preserving transformation of the probability space (X, B, µ) and α a countable partition of X such that H(α) < ∞, then the entropy of T relative to α is defined to be

h(T, α) := lim_{n→∞} (1/n) H(∨_{i=0}^{n−1} T^{−i}α)

The limit always exists by the previous two lemmas (H_n(α) is sub-additive, so H_n(α)/n converges).

Corollary 9.7. 0 ≤ h(T, α) ≤ H(α)

Corollary 9.8. h(T, α) = H(α | ∨_{i=1}^{∞} T^{−i}α)

Proof. Denote α_n := ∨_{i=0}^{n−1} T^{−i}α. Then, using Lemma 9.1 and H(∨_{i=1}^{n−1} T^{−i}α) = H(T^{−1}α_{n−1}) = H(α_{n−1}),

H(α_n) = H(α | ∨_{i=1}^{n−1} T^{−i}α) + H(∨_{i=1}^{n−1} T^{−i}α)
= H(α | ∨_{i=1}^{n−1} T^{−i}α) + H(α_{n−1})
= Σ_{k=1}^{n} H(α | ∨_{i=1}^{k−1} T^{−i}α)

Hence

H(α_n)/n = (1/n) Σ_{k=1}^{n} H(α | ∨_{i=1}^{k−1} T^{−i}α)

By the increasing martingale theorem

lim_{n→∞} H(α | ∨_{i=1}^{n−1} T^{−i}α) = H(α | ∨_{i=1}^{∞} T^{−i}α)

and since Cesàro averages of a convergent sequence converge to the same limit,

h(T, α) = lim_{n→∞} H(α_n)/n = H(α | ∨_{i=1}^{∞} T^{−i}α)

Definition 9.13. Entropy of a Measure Preserving Transformation: If T is a measure preserving transformation then

h(T) := sup{h(T, α) : α a countable partition, H(α) < ∞}

Theorem 9.2. Let T : X → X, S : Y → Y be measure preserving transformations of (X, B, µ), (Y, C, ν) respectively. If T, S are isomorphic then h(T) = h(S)

Proof. Recall that T, S are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, S N ⊆ N
• µ(M) = 1 = ν(N)
• ∃ϕ : M → N bijection such that ϕ, ϕ^{−1} are measurable, measure preserving and ϕ ∘ T = S ∘ ϕ

If α is a countable partition of Y then it is also a countable partition of N. ϕ^{−1}α is a partition of M and hence of X.
We have that

H_µ(ϕ^{−1}α) = −Σ_{A∈α} µ(ϕ^{−1}A) log(µ(ϕ^{−1}A)) = −Σ_{A∈α} ν(A) log(ν(A)) = H_ν(α)

More generally we have that

H_µ(∨_{i=0}^{n−1} T^{−i}(ϕ^{−1}α)) = H_µ(ϕ^{−1} ∨_{i=0}^{n−1} S^{−i}α) = H_ν(∨_{i=0}^{n−1} S^{−i}α)

Dividing by n and taking the limit as n → ∞ gives

h(T, ϕ^{−1}α) = lim_{n→∞} (1/n) H_µ(∨_{i=0}^{n−1} T^{−i}(ϕ^{−1}α)) = lim_{n→∞} (1/n) H_ν(∨_{i=0}^{n−1} S^{−i}α) = h(S, α)

So we have that

h(S) = sup{h(S, α) : α countable partition of Y, H_ν(α) < ∞}
= sup{h(T, ϕ^{−1}α) : α countable partition of Y, H_ν(α) < ∞}
≤ sup{h(T, β) : β countable partition of X, H_µ(β) < ∞}
= h(T)

By symmetry we also have h(T) ≤ h(S), and hence h(T) = h(S).

Theorem 9.3. Abramov's Theorem: If {α_n}_{n=1}^{∞} is an increasing sequence of partitions on (X, B, µ) such that H(α_n) < ∞ and σ(∪_{n=1}^{∞} α_n) = B then

h(T) = lim_{n→∞} h(T, α_n)

Proof. Let α, β be partitions with H(α), H(β) < ∞; then

H(∨_{i=0}^{n−1} T^{−i}α) ≤ H((∨_{i=0}^{n−1} T^{−i}α) ∨ (∨_{j=0}^{n−1} T^{−j}β))
= H(∨_{j=0}^{n−1} T^{−j}β) + H(∨_{i=0}^{n−1} T^{−i}α | ∨_{j=0}^{n−1} T^{−j}β)
≤ H(∨_{j=0}^{n−1} T^{−j}β) + Σ_{i=0}^{n−1} H(T^{−i}α | ∨_{j=0}^{n−1} T^{−j}β)
≤ H(∨_{j=0}^{n−1} T^{−j}β) + Σ_{i=0}^{n−1} H(T^{−i}α | T^{−i}β)
= H(∨_{j=0}^{n−1} T^{−j}β) + nH(α|β)

which gives us that

h(T, α) = lim_{n→∞} (1/n) H(∨_{i=0}^{n−1} T^{−i}α) ≤ lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−j}β) + H(α|β) = h(T, β) + H(α|β)

In particular h(T, α) ≤ h(T, α_n) + H(α|α_n) for any countable partition α. Furthermore, for an increasing sequence of partitions {α_n}_{n=1}^{∞} with H(α_n) < ∞ and σ(∪_{n=1}^{∞} α_n) = B, together with an arbitrary partition α with H(α) < ∞, the increasing martingale theorem gives

lim_{n→∞} H(α|α_n) = 0

which means that

h(T, α) ≤ lim_{n→∞} h(T, α_n)

for any countable partition α, so indeed

h(T) = sup_α h(T, α) ≤ lim_{n→∞} h(T, α_n) ≤ h(T)

which means that h(T) = lim_{n→∞} h(T, α_n).

Definition 9.14. Generator: For T an invertible measure preserving transformation of (X, B, µ) we say that a countable partition α is a generator if

∨_{j=−(n−1)}^{n−1} T^{−j}α → B as n → ∞

Definition 9.15.
Strong Generator: For T a measure preserving transformation of (X, B, µ) we say that a countable partition α is a strong generator if

∨_{j=0}^{n−1} T^{−j}α → B as n → ∞

Corollary 9.9. If for a.e. x, y ∈ X there is n such that x, y lie in different elements of the partition ∨_{j=−(n−1)}^{n−1} T^{−j}α, then α is a generator.

Corollary 9.10. If for a.e. x, y ∈ X there is n such that x, y lie in different elements of the partition ∨_{j=0}^{n−1} T^{−j}α, then α is a strong generator.

Theorem 9.4. Sinai's Theorem: If either
• α is a strong generator, or
• T is invertible and α is a generator,
then h(T) = h(T, α)

Proof.
• Suppose α is a strong generator. Then

h(T, ∨_{j=0}^{n} T^{−j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k−1} T^{−i}(∨_{j=0}^{n} T^{−j}α))
= lim_{k→∞} (1/k) H(∨_{i=0}^{n+k−1} T^{−i}α)
= lim_{k→∞} ((n+k)/k) · (1/(n+k)) H(∨_{i=0}^{n+k−1} T^{−i}α)
= h(T, α)

This holds for every n, so applying Abramov's theorem to the increasing sequence α_n := ∨_{j=0}^{n} T^{−j}α (which generates B since α is a strong generator) gives

h(T) = lim_{n→∞} h(T, α_n) = h(T, α)

• Suppose T is invertible and α is a generator. Then

h(T, ∨_{j=−n}^{n} T^{−j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k−1} T^{−i}(∨_{j=−n}^{n} T^{−j}α))
= lim_{k→∞} (1/k) H(∨_{i=−n}^{n+k−1} T^{−i}α)
= lim_{k→∞} (1/k) H(∨_{i=0}^{2n+k−1} T^{−i}α)   (shifting indices by n, T invertible and measure preserving)
= lim_{k→∞} ((2n+k)/k) · (1/(2n+k)) H(∨_{i=0}^{2n+k−1} T^{−i}α)
= h(T, α)

Again this holds for every n, so by Abramov's theorem applied to α_n := ∨_{j=−n}^{n} T^{−j}α we have h(T) = lim_{n→∞} h(T, α_n) = h(T, α).

Theorem 9.5. If T is a measure preserving transformation of (X, B, µ) then
• For k ∈ ℕ₀ we have that h(T^k) = kh(T).
• If T is invertible then h(T^{−1}) = h(T).
• If T is invertible and k ∈ ℤ then h(T^k) = |k|h(T).

Proof.
• For k = 0, T⁰ = Id, so if α is a countable partition with H(α) < ∞ then H(∨_{i=0}^{n−1} Id^{−i}α) = H(α) and hence

h(Id, α) = lim_{n→∞} (1/n) H(α) = 0

so indeed the statement holds for k = 0. Now let k ≥ 1 and choose a countable partition α with H(α) < ∞.
Then

h(T^k, ∨_{j=0}^{k−1} T^{−j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{nk−1} T^{−j}α) = k lim_{n→∞} (1/nk) H(∨_{j=0}^{nk−1} T^{−j}α) = kh(T, α)

Hence

kh(T) = sup_{α : H(α)<∞} kh(T, α) = sup_{α : H(α)<∞} h(T^k, ∨_{j=0}^{k−1} T^{−j}α) ≤ sup_{α : H(α)<∞} h(T^k, α) = h(T^k)

Conversely

h(T^k, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−jk}α) ≤ lim_{n→∞} (1/n) H(∨_{j=0}^{nk−1} T^{−j}α) = k lim_{n→∞} (1/nk) H(∨_{j=0}^{nk−1} T^{−j}α) = kh(T, α) ≤ kh(T)

and taking the supremum over α gives h(T^k) ≤ kh(T). So indeed h(T^k) = kh(T).

• Since T is invertible and measure preserving,

H(∨_{j=0}^{n−1} T^{−j}α) = H(T^{n−1} ∨_{j=0}^{n−1} T^{−j}α) = H(∨_{j=0}^{n−1} T^{j}α)

so

h(T, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{−j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n−1} T^{j}α) = h(T^{−1}, α)

Taking the supremum with respect to α then gives h(T) = h(T^{−1}).

• The third point follows directly from the previous two.

Lemma 9.5. The Parry measure µ_Pr of a k × k matrix A with entries 0, 1 and largest eigenvalue λ has entropy h(µ_Pr) = log(λ)

Proof. Let

P_{i,j} = A_{i,j} v_j / (λ v_i),   p_i = u_i v_i / c

where u, v are the left and right eigenvectors respectively and c = Σ_{i=1}^{k} u_i v_i is a normalising constant. P is a stochastic matrix, so

h(µ_Pr) = −Σ_{i,j=1}^{k} p_i P_{i,j} log(P_{i,j})
= −Σ_{i,j=1}^{k} (u_i v_i / c)(A_{i,j} v_j / (λ v_i)) log(A_{i,j} v_j / (λ v_i))
= −Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(A_{i,j}) + Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(λ) + Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc))(log(v_i) − log(v_j))

Since A has entries 0, 1 we always have A_{i,j} log(A_{i,j}) = 0, and the third term vanishes: using Av = λv and uA = λu, both Σ_{i,j} (u_i A_{i,j} v_j/(λc)) log(v_i) and Σ_{i,j} (u_i A_{i,j} v_j/(λc)) log(v_j) equal Σ_i (u_i v_i/c) log(v_i). Hence

h(µ_Pr) = Σ_{i,j=1}^{k} (u_i A_{i,j} v_j / (λc)) log(λ) = log(λ) Σ_{j=1}^{k} λ u_j v_j / (λc) = log(λ) (Σ_{j=1}^{k} u_j v_j) / c = log(λ)

In general h(T) = h(S) does not imply that T, S are isomorphic; however, the following two theorems give some cases where this is a sufficient condition.

Theorem 9.6. Ornstein: Two 2-sided Bernoulli shifts with the same entropy are isomorphic.

Theorem 9.7. Ornstein–Friedman: Two aperiodic Markov shifts of finite type with the same entropy are isomorphic.

10 Functional Analysis

Definition 10.1. Banach Space: A Banach space is a complete normed space.
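The norms used in this section can be explored numerically. A minimal sketch (the choice of f and the grid size are illustrative) for f(x) = x on [0, 1) with Lebesgue measure: ||f||₁ = 1/2, while the quotient norm inf_c ||f − c||₁ from Proposition 10.1 below is 1/4, attained at c = 1/2, a median value of f:

```python
# Discretise [0, 1) by midpoints so Riemann sums of piecewise linear
# integrands are exact on each cell.
N = 10_000
grid = [(i + 0.5) / N for i in range(N)]

def l1_distance(c):
    """Riemann sum approximating ||f - c||_1 for f(x) = x."""
    return sum(abs(x - c) for x in grid) / N

norm_f = l1_distance(0.0)                         # ||f||_1 = 1/2
# coarse minimisation of c -> ||f - c||_1 over c in {0, 0.05, ..., 1}
quotient = min(l1_distance(c / 20) for c in range(21))
```

The minimiser is a median because moving c past a median increases the integrand on a set of measure more than 1/2.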
For a probability space (X, B, µ) we denote by L¹(X, B, µ) the space of integrable functions, a Banach space with the norm

||f||₁ := ∫ |f| dµ

Furthermore we denote by L^∞(X, B, µ) the space of essentially bounded functions, a Banach space with the norm

||f||_∞ := inf_{Y⊂X : µ(Y)=1} sup_{y∈Y} |f(y)|

Proposition 10.1. We have the following facts concerning L¹, L^∞:
• L^∞ ⊂ L¹
• Every bounded linear functional W : L¹ → ℝ can be written as W(f) = ∫ fg dµ for some g ∈ L^∞
• L¹₀ := {f ∈ L¹ : ∫ f dµ = 0} is a Banach space with respect to the norm ||f|| = inf_{c∈ℝ} ||f − c||₁

Lemma 10.1. If E is a Banach space and F ⊂ E is a proper closed subspace then there exists a non-zero bounded linear functional W : E → ℝ with W(f) = 0 ∀f ∈ F

Lemma 10.2. Let m, p ∈ ℕ with 1 ≤ m ≤ p and x_1, ..., x_p ∈ ℝ⁺. For ε > 0 let

S_ε := {i ∈ {1, ..., p−m} : ∃n ∈ ℕ ∩ [1, m] s.t. nε ≤ Σ_{j=i}^{i+n−1} x_j}

Then

Σ_{j=1}^{p} x_j ≥ ε Σ_{i=1}^{p−m} χ_{S_ε}(i)

Proof. If S_ε = ∅ then since x_i ≥ 0 ∀i the statement is trivial, so suppose S_ε ≠ ∅. Let j_1 = min{i ∈ S_ε}, which exists since S_ε is finite and non-empty, and let n_1 be the least value for which n_1ε ≤ Σ_{i=j_1}^{j_1+n_1−1} x_i. Then inductively define j_r := min{j ∈ S_ε : j > j_{r−1} + n_{r−1} − 1} (so long as the right-hand side is non-empty) and n_r to be the least value for which n_rε ≤ Σ_{i=j_r}^{j_r+n_r−1} x_i. This yields the finite collections {j_r}_{r=1}^{k}, {n_r}_{r=1}^{k} and

Σ_{j=1}^{p} x_j ≥ Σ_{r=1}^{k} Σ_{i=j_r}^{j_r+n_r−1} x_i ≥ Σ_{r=1}^{k} n_rε ≥ ε Σ_{r=1}^{k} Σ_{i=j_r}^{j_r+n_r−1} χ_{S_ε}(i) = ε Σ_{i=1}^{p−m} χ_{S_ε}(i)

since any index not in one of the strings [j_r, j_r + n_r − 1] cannot lie in S_ε.

For ε > 0 and f ∈ L¹ define

E_ε(f) := {x ∈ X : limsup_{n→∞} (1/n) |Σ_{j=0}^{n−1} f(T^j x)| ≥ ε}

Lemma 10.3. µ(E_{2ε}(f)) ≤ ||f||₁ / ε

Proof.
Write f = f⁺ − f⁻ where f⁺, f⁻ ≥ 0, and for m ≥ 1 define

E_ε^m(f⁺) = {x ∈ X : ∃n ≤ m, Σ_{j=0}^{n−1} f⁺(T^j x) ≥ εn}
E_ε^m(f⁻) = {x ∈ X : ∃n ≤ m, Σ_{j=0}^{n−1} f⁻(T^j x) ≥ εn}

Then by applying Lemma 10.2 with x_j = f⁺(T^{j−1}x), so that S_ε consists of the indices i with T^{i−1}x ∈ E_ε^m(f⁺), we have for p > m

Σ_{j=0}^{p−1} f⁺(T^j x) ≥ ε Σ_{i=0}^{p−m−1} χ_{E_ε^m(f⁺)}(T^i x)

and similarly for f⁻ we get

Σ_{j=0}^{p−1} f⁻(T^j x) ≥ ε Σ_{i=0}^{p−m−1} χ_{E_ε^m(f⁻)}(T^i x)

It follows that

p ∫ f⁺ dµ = ∫ Σ_{j=0}^{p−1} f⁺(T^j x) dµ   (T preserves µ)
≥ ε Σ_{i=0}^{p−m−1} ∫ χ_{E_ε^m(f⁺)}(T^i x) dµ = ε(p−m) µ(E_ε^m(f⁺))

Similarly we have that p ∫ f⁻ dµ ≥ ε(p−m) µ(E_ε^m(f⁻)). Dividing by p and taking the limit as p → ∞ we get

∫ f⁺ dµ ≥ ε µ(E_ε^m(f⁺))   and   ∫ f⁻ dµ ≥ ε µ(E_ε^m(f⁻))

Furthermore E_{2ε}(f) ⊂ ∪_{m} (E_ε^m(f⁺) ∪ E_ε^m(f⁻)), and the sets E_ε^m increase with m, so

µ(E_{2ε}(f)) ≤ lim_{m→∞} µ(E_ε^m(f⁺)) + lim_{m→∞} µ(E_ε^m(f⁻)) ≤ (1/ε)(∫ f⁺ dµ + ∫ f⁻ dµ) = ||f||₁ / ε

Lemma 10.4. Let T be ergodic. Given f ∈ L¹₀ and δ > 0 we have that ∃h ∈ L^∞ s.t. ||f − (h ∘ T − h)||₁ < δ

Proof. Let C = {h ∘ T − h : h ∈ L^∞} ⊂ L¹₀, which is a vector subspace. We want to show that C is dense in L¹₀; by Lemma 10.1 it suffices to show that any bounded linear functional which vanishes on C also vanishes on L¹₀ (otherwise the closure of C would be a proper closed subspace). We know that any bounded linear functional W on L¹ can be written as

W(f) = ∫ fg dµ

for some g ∈ L^∞. Suppose W vanishes on C; then ∀h ∈ L^∞ we have that

∫ (h ∘ T − h) g dµ = 0

In particular, taking h = g,

∫ (g ∘ T − g) g dµ = 0,   i.e.   ∫ (g ∘ T) g dµ = ∫ g² dµ

Furthermore, since T preserves µ we also have ∫ (g ∘ T)² dµ = ∫ g² dµ, so

∫ (g ∘ T − g)² dµ = ∫ (g ∘ T)² dµ + ∫ g² dµ − 2 ∫ (g ∘ T) g dµ = 2 ∫ g² dµ − 2 ∫ (g ∘ T) g dµ = 0

so we must have g ∘ T − g = 0 almost everywhere, hence since T is ergodic g is some constant c. For f ∈ L¹₀ we then have

W(f) = ∫ fg dµ = c ∫ f dµ = 0

so indeed W vanishes on L¹₀, C is dense in L¹₀, and the result follows.

Theorem 10.1.
Birkhoff's Ergodic Theorem: If T : X → X is an ergodic measure preserving transformation of the probability space (X, B, µ) then for any f ∈ L¹(X, B, µ) we have, for almost every x ∈ X,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ

Proof. Firstly suppose that h ∈ L^∞. Then h ∘ T − h ∈ L¹₀, since

∫ (h ∘ T − h) dµ = ∫ h ∘ T dµ − ∫ h dµ = ∫ h dµ − ∫ h dµ = 0

Furthermore

|(1/n) Σ_{j=0}^{n−1} (h ∘ T − h)(T^j x)| = |(1/n) Σ_{j=0}^{n−1} (h(T^{j+1}x) − h(T^j x))| = (1/n)|h(T^n x) − h(x)| ≤ (1/n)(|h(T^n x)| + |h(x)|) ≤ 2||h||_∞ / n

which converges to 0 as n → ∞. So the theorem holds for coboundaries h ∘ T − h.

We need to extend this to f ∈ L¹₀, for which we want to show

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = 0   a.e.

Fix δ > 0; by the previous lemma we can find h ∈ L^∞ with ||f − (h ∘ T − h)||₁ < δ, so ∀ε > 0 we have that

E_ε(f) ⊂ E_{ε/2}(f − (h ∘ T − h)) ∪ E_{ε/2}(h ∘ T − h)

and hence

µ(E_ε(f)) ≤ µ(E_{ε/2}(f − (h ∘ T − h))) + µ(E_{ε/2}(h ∘ T − h))
= µ(E_{ε/2}(f − (h ∘ T − h)))   (by the earlier part of the proof)
≤ 4||f − (h ∘ T − h)||₁ / ε   (by Lemma 10.3)
≤ 4δ / ε

Since δ was chosen arbitrarily it follows that µ(E_ε(f)) = 0 for every ε > 0, so the averages converge to 0 µ-a.e. and the theorem holds for f ∈ L¹₀.

Finally, for g ∈ L¹ write f = g − ∫ g dµ. The theorem holds for f since f ∈ L¹₀, and by rearrangement the theorem holds for g.
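As a closing illustration, Birkhoff's theorem (via Corollary 8.3) can be tested numerically for the continued fraction map T(x) = 1/x mod 1, which is ergodic for the Gauss measure of Corollary 6.4. The frequency of the partial quotient 1, i.e. of visits to B = (1/2, 1], should approach µ(B) = log(4/3)/log 2 ≈ 0.415. The starting point and iterate count are illustrative, and floating point round-off makes the orbit only pseudo-typical, though the statistics are robust in practice:

```python
import math

x = math.pi - 3                  # a "typical" starting point in (0, 1)
n = 200_000
hits = 0
for _ in range(n):
    if x > 0.5:                  # partial quotient floor(1/x) equals 1
        hits += 1
    x = 1.0 / x
    x -= int(x)                  # Gauss map T(x) = 1/x mod 1
    if x == 0.0:                 # guard: restart if the orbit hits a rational
        x = 0.5772156649
freq = hits / n

expected = math.log(4 / 3) / math.log(2)   # Gauss measure of (1/2, 1]
```

Runs of this sketch typically give a frequency within a percent or so of the predicted value, a concrete instance of time averages matching space averages.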