Lecture 14 (Oct. 7)

Ergodic Theory:

Defn: A measure-preserving transformation (MPT) on a probability space $(M, \mathcal{A}, \mu)$ is a map $T : M \to M$ which is measurable and measure-preserving, i.e., for $A \in \mathcal{A}$, $T^{-1}(A) \in \mathcal{A}$ and $\mu(T^{-1}(A)) = \mu(A)$.

Defn: Invertible MPT (IMPT): $T$ is MPT + bijection a.e. + $T^{-1}$ is MPT.

Note: for IMPT, need only assume that $T$ is MPT, bijective and $T^{-1}$ is measurable:
$$\mu(T(A)) = \mu(T^{-1} \circ T(A)) = \mu(A)$$

Theorem: $T$ is MPT iff $\int f \circ T \, d\mu = \int f \, d\mu$ for all $f \in L^1$.

Proof:
If: Apply to $f = \chi_A$, $A \in \mathcal{A}$, and note that $\chi_A \circ T = \chi_{T^{-1}(A)}$.
Only If: split $f$ into positive and negative parts; approximate $f^+$ by a monotone sequence of simple functions $\phi_n$. Then $f^+ \circ T$ is approximated by $\phi_n \circ T$, and for each $n$, $\int \phi_n \circ T \, d\mu = \int \phi_n \, d\mu$. Use the monotone convergence theorem.

Every MPT is recurrent:

Poincaré Recurrence Theorem (1890): Let $T$ be an MPT and $\mu(A) > 0$. Then a.e. $x \in A$ visits $A$ infinitely often, i.e., for a.e. $x \in A$, there are infinitely many $n > 0$ s.t. $T^n(x) \in A$.

Proof:
Lemma: There exists $n > 0$ such that $\mu(T^{-n}(A) \cap A) > 0$.
Proof: If not, we claim that $\{T^{-n}(A)\}_{n=0}^{\infty}$ are pairwise measure disjoint (i.e., for $n \ne m$, $\mu(T^{-n}(A) \cap T^{-m}(A)) = 0$); this follows because (assuming $n > m$):
$$T^{-n}(A) \cap T^{-m}(A) = T^{-m}(T^{-(n-m)}(A) \cap A)$$
Each $T^{-n}(A)$ has measure $\mu(A) > 0$. Thus, $\mu(\cup_{n=0}^{\infty} T^{-n}(A)) = \infty$, contradicting $\mu(M) = 1$.
End of proof of lemma.

Let $G$ be the set of points in $A$ which do not visit $A$ infinitely often. We want to show that $\mu(G) = 0$.
Let $G_n$ be the set of points in $A$ that visit $A$ exactly $n$ times. Then $G$ is the disjoint union of the $G_n$ and so it suffices to show that $\mu(G_n) = 0$.
If $\mu(G_n) > 0$, then by the Lemma (applied to $G_n$), there exists $n_0 > 0$ s.t.
$$\mu(T^{-n_0}(G_n) \cap G_n) > 0.$$
If $x$ is in this intersection, then $x \in G_n$ and $T^{n_0}(x) \in G_n$ and so $x$ visits $A$ at least $n + 1$ times, a contradiction to the defn. of $G_n$.

Lecture 15 (Oct. 12):

Comments from last time:

1. Recall the characterization of MPT in terms of functions:
Theorem: $T$ is MPT iff $\int f \circ T \, d\mu = \int f \, d\mu$ for all $f \in L^1$.
– This indicates why, in the defn., we require the measure of $T^{-1}(A)$ to be preserved, instead of $T(A)$; other reasons:
   – parallel with the defn. of measurability
   – allows certain non-invertible transformations to be MPT
– In fact, this result holds for $L^p$, any $1 \le p \le \infty$
(on a finite measure space all such $L^p$ are contained in $L^1$:
$$\int |f| \le \int_{|f| \le 1} |f| + \int_{|f| > 1} |f| \le 1 + \int_{|f| > 1} |f|^p \; )$$

Induced operators on $L^p$: $U_T f = f \circ T$.
– Note: $f \circ T \in L^p$ since $|f|^p \in L^1$ and on $L^1$, $U_T$ preserves the integral.
– most important for $p = 2$; Hilbert space.
– von Neumann, Koopman, Kakutani

2. Recall the Lemma in the Poincaré Recurrence Theorem (1890): For MPT $T$ and $\mu(A) > 0$.
Lemma: There exists $n > 0$ such that $\mu(T^{-n}(A) \cap A) > 0$.
Note: an inductive application of the lemma shows that for any $k > 0$, there exist $n_k > n_{k-1} > \dots > n_2 > n_1 > 0$ s.t.
$$\mu(A \cap T^{-n_1}(A) \cap T^{-n_2}(A) \cap \dots \cap T^{-n_k}(A)) > 0.$$

Multiple recurrence theorem (Furstenberg, 1977): Let $T$ be an MPT and $k > 0$. Let $\mu(A) > 0$. Then there exists $n > 0$ s.t.
$$\mu(A \cap T^{-n}(A) \cap T^{-2n}(A) \cap \dots \cap T^{-kn}(A)) > 0.$$
Proof: Deep.

This theorem turns out to be equivalent to

Szemerédi Theorem (1975): Let $\Lambda \subset \mathbb{N}$ have positive upper density, i.e.,
$$\limsup_{n \to \infty} \frac{|\Lambda \cap [1, n]|}{n} > 0.$$
Then $\Lambda$ contains arithmetic progressions of arbitrary length.
Proof: delicate combinatorics. Even non-trivial for $k = 2$.

Will explain the link later (or good topic for a student talk). For now, corresponding to $\Lambda$, you construct an MPT $T$ on $M$ and a subset $A \subset M$ s.t. $\Lambda$ has an arithmetic progression of length $k + 1$ iff
$$\mu(A \cap T^{-n}(A) \cap T^{-2n}(A) \cap \dots \cap T^{-kn}(A)) > 0.$$

Ergodic theory provided the first proofs for similar results on patterns that must occur, say, in a subset of $\mathbb{Z}^2$ with "upper positive density."

Next, how do you verify measure-preserving? Check on a small collection of sets (alluded to this last time):

Defn: A semi-algebra $\mathcal{B}$ is a collection of sets s.t.
1. $\mathcal{B}$ is closed under finite intersections
2. for $B \in \mathcal{B}$, $B^c$ is a finite disjoint union of elements of $\mathcal{B}$.
Note: weaker concept than algebra

Examples:
• Intervals in $\mathbb{R}$
• (literal) rectangles in $\mathbb{R}^2$
• cylinder sets in $F^{\mathbb{Z}}$ or $F^{\mathbb{Z}^+}$: $A = \{x : x_{i_1} = a_1, \dots, x_{i_k} = a_k\}$

A semi-algebra $\mathcal{B}$ generates a σ-algebra $\mathcal{A}$ if $\mathcal{A}$ is the smallest σ-algebra containing $\mathcal{B}$.
Examples: Borel sets in $\mathbb{R}$, $\mathbb{R}^2$ and the product σ-algebras on $F^{\mathbb{Z}}$, $F^{\mathbb{Z}^+}$.

Theorem: $T$ is MPT iff for a generating semi-algebra $\mathcal{B}$ of $\mathcal{A}$, for all $B \in \mathcal{B}$, $T^{-1}(B) \in \mathcal{A}$ and $\mu(T^{-1}(B)) = \mu(B)$.

Proof:
Only if: obvious.
If: argue using the monotone class lemma:
Let $\mathcal{C} = \{A \in \mathcal{A} : T^{-1}(A) \in \mathcal{A} \text{ and } \mu(T^{-1}(A)) = \mu(A)\}$.
Then $\mathcal{C}$ contains $\mathcal{B}$ and hence the algebra generated by $\mathcal{B}$ (the algebra is the set of all finite disjoint unions of elts. of $\mathcal{B}$).
And $\mathcal{C}$ is a monotone class (i.e., closed under countable increasing sequences of sets and countable decreasing sequences of sets).
Then $\mathcal{C} = \mathcal{A}$, by the Monotone class lemma.

Examples: Check MPT on semi-algebra:

1. Recall: Circle rotation (w.r.t. normalized Lebesgue measure on the circle)
$M$: circle identified as $[0, 2\pi]$ with normalized Lebesgue measure
$T_\alpha(\theta) = \theta + \alpha \mod 2\pi$
MPT because Lebesgue measure is translation invariant.
Can also be viewed as a map from $M = [0, 2\pi]$ to itself. Graph has slope 1. Inverse image of an interval $I$ is one or two intervals whose total length is $\ell(I)$.

2. Doubling map (w.r.t. normalized Lebesgue measure on the circle)
$M = [0, 1]$, $\mu$ = Lebesgue measure
$T(x) = 2x \mod 1$
Draw graph, which has two pieces of slope 2: inverse image of an interval $I$ is the union of two intervals each with length $(1/2)\ell(I)$.
Note: would not be an MPT if forward images were required to preserve measure.

3. Recall: Baker's transformation (w.r.t. Lebesgue measure on the square)
$M$: unit square with Lebesgue measure
$T(x, y) = (2x \mod 1, (1/2)y)$ if $0 \le x < 1/2$
$T(x, y) = (2x \mod 1, (1/2)y + 1/2)$ if $1/2 \le x < 1$
Draw the inverse image of a rectangle contained in the bottom (blue) or top (red). The inverse image has half the width and twice the height.
If a rectangle intersects top and bottom, then split it into a top part and a bottom part.
Since IMPT, can alternatively check $\mu(T(A)) = \mu(A)$.

4. Recall: stationary stochastic process, one-sided or two-sided, e.g., iid or stationary Markov
say one-sided: $M = F^{\mathbb{Z}^+} = \{x_0 x_1 x_2 \cdots : x_i \in F\}$ ($F$ is a finite alphabet)
– $T = \sigma$, the left shift map:
for a cylinder set $A = \{x \in M : x_{i_1} = a_1, \dots, x_{i_k} = a_k\}$ define
$$\mu(A) = p(X_{i_1} = a_1, \dots, X_{i_k} = a_k)$$
where $p$ is the law of the process. Extend to the product sigma-algebra to define $\mu$.
$$\sigma^{-1}(A) = \{x \in M : x_{i_1 + 1} = a_1, \dots, x_{i_k + 1} = a_k\}$$
$$\mu(\sigma^{-1}(A)) = \mu(A) \quad \text{(by stationarity)}$$
Note that sometimes $\mu(\sigma(A)) \ne \mu(A)$, e.g., $A = \{x : x_0 = 1\}$, $\sigma(A) = M$, the entire space ($x_0$ falls off the cliff).

The link from Furstenberg to Szemerédi:
Given $\Lambda$, let $\chi_\Lambda$ be the characteristic function on $\mathbb{N}$.
Let $M$ be the subset of all $x \in \{0, 1\}^{\mathbb{N}}$ such that every finite word in $x$ appears in $\chi_\Lambda$ infinitely often.
Let $\mu$ be the measure on $M$ which counts the frequency of 1's.
Then apply MRT to $\sigma$ on $(M, \mathcal{A}, \mu)$ (where $\sigma$ is the left shift).

Lecture 16 (Oct. 14):

Defn: An MPT $T$ is ergodic if whenever $\mu(A) > 0$, then a.e. $x \in M$ visits $A$, i.e., $\mu(\cup_{n=1}^{\infty} T^{-n}A) = 1$.

Note: in fact, a.e. $x \in M$ visits $A$ infinitely often. Why?
Let $B = \cup_{n=1}^{\infty} T^{-n}A$. Then $\mu(\cap_{k=0}^{\infty} T^{-k}(B)) = 1$.

TFAE
1. $T$ is ergodic
2. If $\mu(A), \mu(B) > 0$, then for some $n > 0$, $\mu(T^{-n}(A) \cap B) > 0$.
3. If $T^{-1}(A) \subseteq A$, then $\mu(A) = 0$ or 1.
3'. If $\mu(T^{-1}(A) \setminus A) = 0$, then $\mu(A) = 0$ or 1.
4. (the usual definition) If $T^{-1}(A) = A$, then $\mu(A) = 0$ or 1.
4'. If $\mu(T^{-1}(A) \Delta A) = 0$, then $\mu(A) = 0$ or 1.
5p. If $f \in L^p$ and $f \circ T = f$, then $f$ is constant a.e. (here, $0 \le p \le \infty$)
5p'. If $f \in L^p$ and $f \circ T = f$ a.e., then $f$ is constant a.e. (here, $0 \le p \le \infty$)

Note:
– 1 and 2 express universal explorer
– 3 and 4 express irreducibility; you can't split the space into two non-trivial invariant pieces
– 5 expresses a functional analysis version: the number 1 is a simple eigenvalue of the induced operator on $L^p$, $1 \le p < \infty$

Note: 3 is equivalent to: if $T^{-1}(A) \supseteq A$, then $\mu(A) = 0$ or 1.
Proof (exercise): Assume 3. If $T^{-1}(A) \supseteq A$, then $T^{-1}(A^c) = (T^{-1}(A))^c \subseteq A^c$. Thus, $\mu(A^c) = 0$ or 1. Thus $\mu(A) = 0$ or 1.

Note: Most important equivalent defns. left out; need ergodic theorem.

Proof:

1 implies 2: the measure of the intersection of a set of measure 1 and a set of measure $b$ is $b$. Thus,
$$0 < \mu(B) = \mu\Big(B \cap \bigcup_{n=1}^{\infty} T^{-n}(A)\Big) \le \sum_{n=1}^{\infty} \mu(B \cap T^{-n}(A)),$$
so some term of the sum is positive.

2 implies 3: If $T^{-1}(A) \subseteq A$, then $\cup_{n=1}^{\infty} T^{-n}(A) \subseteq A$.
Let $B = A^c$. If $0 < \mu(A) < 1$, then $\mu(A), \mu(B) > 0$.
Then, by 2, for some $n > 0$, $\mu(B \cap T^{-n}(A)) > 0$. But $\cup_{n=1}^{\infty} T^{-n}(A)$ and $B$ are disjoint.

3 implies 3': Let $B = T^{-1}(A) \setminus A$. Then $\mu(B) = 0$.
Let $C = \cup_{n=0}^{\infty} T^{-n}(B)$. Then $\mu(C) = 0$.
Let $D = A \cup C$. Then $T^{-1}(D) \subseteq D$:
$$T^{-1}(D) = T^{-1}(A) \cup T^{-1}(C) \subseteq (A \cup B) \cup C = A \cup C = D$$
By 3, $\mu(D) = 0$ or 1. But $\mu(D \Delta A) \le \mu(C) = 0$. Thus, $\mu(A) = 0$ or 1.

3' implies 3: obvious
4' implies 4: obvious
3 implies 4: obvious
3' implies 4': obvious

4 implies 5p: Let $A_r = f^{-1}((-\infty, r])$. Then $T^{-1}(A_r) = A_r$ and so, by 4, $\mu(A_r) = 0$ or 1.
But the $A_r$ are increasing and
$$1 = \mu(\cup_r A_r) = \lim_{r \to \infty} \mu(A_r).$$
Let $r_0 = \inf\{r : \mu(A_r) = 1\}$. Then $f = r_0$ a.e.
Note: use the fact that distribution functions are right continuous.

4' implies 5p': similar
5p' implies 5p: obvious
5p implies 4: let $f = \chi_A$.

4 implies 1:
Proof: Let $\mu(A) > 0$ and $B = \cup_{n=1}^{\infty} T^{-n}(A)$.
Let $C = \cap_{k=0}^{\infty} T^{-k}(B)$.
Enough to show that $\mu(C) = 1$: since $T^{-k}(B)$ is a decreasing sequence of sets all with the same measure, $\mu(C) = \mu(B)$.
Observe that $T^{-1}(C) = C$ because $T^{-k}(B)$ is a decreasing sequence.
Thus, $\mu(C) = 0$ or 1.
If $\mu(C) = 0$, then $\mu(B) = 0$ and so $\mu(A) = 0$, a contradiction.
Note: $C$ is the lim sup of the sets $T^{-n}(A)$.

Lecture 17 (Oct. 17):

Midterm review: Friday, Oct. 21, 4:15 PM, Math 126

Lebesgue ∼ 1902. So, how could Poincaré (1890) prove something about measure theory before measure theory was really developed?

Recall: equivalent conditions for ergodicity.
Note: If $T$ is IMPT, can replace $T^{-1}$ with $T$.
Note: If $T$ is not ergodic, it can be decomposed into ergodic pieces (ergodic decomposition: Keller, sec. 2.3).
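A finite toy model can make the equivalence of conditions 1 and 4 concrete. The following is a minimal sketch, not part of the lecture: it assumes $M = \{0, \dots, n-1\}$ with the uniform measure and $T$ a permutation (so $T$ is an IMPT). The function names and the two sample permutations are hypothetical, chosen only for illustration. In this setting a point's orbit has at most $n$ elements, so checking $T^{-k}(A)$ for $k = 1, \dots, n$ suffices.

```python
from itertools import combinations

def preimage(T, A):
    """T^{-1}(A) for a map given as a list: T[x] is the image of x."""
    return frozenset(x for x in range(len(T)) if T[x] in A)

def subsets(n):
    """All subsets of {0, ..., n-1}."""
    for k in range(n + 1):
        for c in combinations(range(n), k):
            yield frozenset(c)

def ergodic_by_invariant_sets(T):
    """Condition 4: every A with T^{-1}(A) = A has measure 0 or 1."""
    n = len(T)
    return all(len(A) in (0, n) for A in subsets(n) if preimage(T, A) == A)

def ergodic_by_visits(T):
    """Condition 1: for every nonempty A, the union of T^{-k}(A),
    k = 1..n, is the whole space."""
    n = len(T)
    for A in subsets(n):
        if not A:
            continue
        union, P = set(), A
        for _ in range(n):
            P = preimage(T, P)   # P is now T^{-k}(A)
            union |= P
        if len(union) != n:
            return False
    return True

one_cycle = [1, 2, 3, 4, 0]    # x -> x + 1 mod 5: a single 5-cycle
two_cycles = [1, 0, 3, 4, 2]   # cycles (0 1) and (2 3 4)

print(ergodic_by_invariant_sets(one_cycle), ergodic_by_visits(one_cycle))
print(ergodic_by_invariant_sets(two_cycles), ergodic_by_visits(two_cycles))
```

In this toy model the two checks agree: the single cycle passes both, while for the second permutation the invariant set $\{0, 1\}$ has measure $2/5$, and points of $\{2, 3, 4\}$ never visit it.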
Check examples:

Example 1: Rotation of Circle
View as $T = T_\alpha(z) = az$ on $M = \{|z| = 1\}$, where $a = e^{i\alpha}$,
with measure $\mu(A) = (1/(2\pi)) \, \text{Lebesgue}((\log(A))/i)$.
Note this is IMPT.

Case 1: $\alpha/(2\pi) = p/q \in \mathbb{Q}$, $\gcd(p, q) = 1$:
$\{1, a, a^2, \dots, a^{q-1}\}$ is invariant under $T_\alpha$.
Using rigidity of $T$, a thickening of this set is invariant under $T_\alpha$ and has measure in $(0, 1)$. Thus, $T$ is not ergodic.
Alternatively, the function $f(z) = z^q$ satisfies $f \circ T(z) = (az)^q = z^q = f(z)$, but is not constant a.e.

Case 2: $\alpha/(2\pi) \notin \mathbb{Q}$:
Then for all $n \in \mathbb{Z} \setminus \{0\}$, $a^n \ne 1$.
Apply condition 5p' with $p = 2$:
An orthonormal basis of functions for $L^2(M)$ is $\{z^n : n \in \mathbb{Z}\}$ and each $f \in L^2(M)$ is represented by its Fourier series:
$$f = \sum_{n=-\infty}^{\infty} b_n z^n \quad \text{in } L^2$$
Then, since $T$ is MPT,
$$f \circ T = \sum_{n=-\infty}^{\infty} b_n (z^n \circ T) = \sum_{n=-\infty}^{\infty} a^n b_n z^n \quad \text{in } L^2$$
So, if $f \circ T = f$ a.e., then $f \circ T = f$ in $L^2$, and so each $(a^n - 1) b_n = 0$.
Since for $n \ne 0$, $a^n - 1 \ne 0$, we have for each $n \ne 0$, $b_n = 0$, and so $f$ is constant a.e.
So, $T$ is ergodic.

Example 4 (special case): iid (2-sided or 1-sided)
Proof of ergodicity: (alternative proof: apply $L^2$ strong law?)
Will need the fact that $|\mu(A) - \mu(B)| \le \mu(A \Delta B)$.
Recall $\mathcal{A}$ is the product σ-algebra on $F^{\mathbb{Z}}$ or $F^{\mathbb{Z}^+}$.
Apply condition 4. Suppose $A = \sigma^{-1}(A)$ for some $A \in \mathcal{A}$.
Suffices to show: $\mu(A) = \mu(A)^2$.
Let $\mathcal{B}$ be the algebra of finite disjoint unions of cylinder sets (not the semi-algebra of cylinder sets).
Given $\epsilon > 0$, there exists $B \in \mathcal{B}$ such that $\mu(A \Delta B) < \epsilon$.
Choose $n$ s.t. the cylinder coordinates of $B$ are disjoint from those of $\sigma^{-n}(B)$.
Let $C = \sigma^{-n}(B)$. Then $\mu(B \cap C) = \mu(B)^2$.
Now,
$$\mu(A \Delta C) = \mu(\sigma^{-n}(A) \Delta \sigma^{-n}(B)) = \mu(A \Delta B) < \epsilon$$
Since symmetric difference is a metric (on measurable sets, mod measure zero), it follows that $\mu(B \Delta C) \le \mu(B \Delta A) + \mu(A \Delta C) < 2\epsilon$. Hence
$$|\mu(A) - \mu(A)^2| \le |\mu(A) - \mu(B)| + |\mu(B) - \mu(B \cap C)| + |\mu(B \cap C) - \mu(A)^2|$$
$$\le \mu(A \Delta B) + \mu(B \Delta C) + |\mu(B)^2 - \mu(A)^2|$$
$$\le \epsilon + 2\epsilon + 2\epsilon$$
In summary:
Since $\mu(B \Delta C) < 2\epsilon$, $\mu(B)^2 = \mu(B \cap C) \approx \mu(B)$.
Since $\mu(A \Delta B) < \epsilon$, $\mu(A) \approx \mu(B)$.
Thus, $\mu(A)^2 \approx \mu(A)$.
Thus $\mu(A) \approx 0$ or 1, and since $\epsilon$ was arbitrary, $\mu(A) = 0$ or 1.
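The independence step $\mu(B \cap \sigma^{-n}(B)) = \mu(B)^2$ in the iid argument can be checked by brute force on a small case. The sketch below is an illustration only: it assumes a binary alphabet, an iid measure with $P(x_i = 1) = 1/3$, and the hypothetical cylinder set $B = \{x : x_0 = 1, x_2 = 0\}$; none of these specific choices come from the lecture. Measures are computed exactly by summing over all words of length 6.

```python
from fractions import Fraction
from itertools import product

def measure(event, length, p):
    """iid measure of an event depending only on x_0 .. x_{length-1}.

    event maps a tuple of 0/1 symbols to True/False; p = P(x_i = 1).
    """
    total = Fraction(0)
    for word in product((0, 1), repeat=length):
        if event(word):
            weight = Fraction(1)
            for s in word:
                weight *= p if s == 1 else 1 - p
            total += weight
    return total

def in_B(word, shift=0):
    """Membership in sigma^{-shift}(B), where B = {x : x_0 = 1, x_2 = 0}."""
    return word[shift] == 1 and word[shift + 2] == 0

p = Fraction(1, 3)
L = 6  # enough coordinates for shifts up to 3

mu_B = measure(lambda w: in_B(w), L, p)
# n = 3: coordinates {0, 2} and {3, 5} are disjoint
mu_disjoint = measure(lambda w: in_B(w) and in_B(w, 3), L, p)
# n = 2: coordinates {0, 2} and {2, 4} overlap at coordinate 2
mu_overlap = measure(lambda w: in_B(w) and in_B(w, 2), L, p)

print(mu_B, mu_disjoint, mu_overlap)
```

With disjoint coordinates the measure of the intersection is exactly $\mu(B)^2$; with the overlapping shift $n = 2$ the two cylinders demand $x_2 = 0$ and $x_2 = 1$ at once, so the intersection is empty. This is why $n$ must be chosen large enough in the proof.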
Note!: all we really needed is asymptotic independence for cylinder sets. Will pick up on this thread later.

Non-example: $\mu = (1/2)\,\text{iid}(p) + (1/2)\,\text{iid}(q)$ where $p$ and $q$ are distinct prob. vectors of length 2. Say one-sided shift.
We claim that w.r.t. $\mu$, $\sigma$ satisfies all equiv. of the defn. of ergodicity for all cylinder sets or even the algebra of finite unions of cylinder sets (e.g., version 1). However, it is not ergodic.

Proof: Check condition 1:
Let $A$ be a cylinder set. Since $\mu$ dominates $\text{iid}(p)$, $\text{iid}(q)$, $\mu(A) > 0$.
Since $\text{iid}(p)$, $\text{iid}(q)$ are ergodic, for $\nu = \text{iid}(p), \text{iid}(q)$,
$$\nu(\cup_{n=1}^{\infty} \sigma^{-n}A) = 1$$
and so
$$\mu(\cup_{n=1}^{\infty} \sigma^{-n}A) = 1.$$
However, let
$$A = \Big\{x \in F^{\mathbb{Z}^+} : \lim_{n \to \infty} \frac{x_0 + \dots + x_{n-1}}{n} = q_1\Big\}$$
Then by the strong law of large numbers $\mu(A) \ge 1/2$. But $A^c$ contains
$$B = \Big\{x \in F^{\mathbb{Z}^+} : \lim_{n \to \infty} \frac{x_0 + \dots + x_{n-1}}{n} = p_1\Big\}$$
and $\mu(B) \ge 1/2$ for the same reason.
Thus, $\mu(A) = 1/2$. But clearly $\sigma^{-1}(A) = A$. So, $\sigma$ is not ergodic w.r.t. $\mu$.

Lecture 18 (Oct. 19)

Can show directly that the doubling map and Baker are ergodic. But here is an indirect way.
We can consider measure-preserving maps and invertible measure-preserving maps from one probability space to another.

Defn: If $T$ is an MPT on $(M, \mathcal{A}, \mu)$ and $S$ is an MPT on $(N, \mathcal{B}, \nu)$, we say $T$ is isomorphic to $S$ if there exists an invertible measure-preserving map $\phi$ from $M$ to $N$ such that
$$\phi \circ T = S \circ \phi$$

Major Problem: classify MPT's up to isomorphism.

Prop: Ergodicity is an isomorphism invariant.
Proof: Suppose $T$ is ergodic. Let $B \in \mathcal{B}$ s.t. $S^{-1}(B) = B$.
Let $A = \phi^{-1}(B)$. Then
$$T^{-1}(A) = T^{-1} \circ \phi^{-1}(B) = (\phi \circ T)^{-1}(B) = (S \circ \phi)^{-1}(B) = \phi^{-1} \circ S^{-1}(B) = \phi^{-1}(B) = A.$$
Thus, $\mu(A) = 0$ or 1. Thus, since $\nu(B) = \mu(\phi^{-1}(B)) = \mu(A)$, $\nu(B) = 0$ or 1.

Example 3: Recall: Isomorphism of Baker and uniform i.i.d. binary two-sided process
$$\phi(\cdots x_{-1} \hat{x}_0 x_1 x_2 \cdots) = (.x_0 x_1 x_2 \cdots, \; .x_{-1} x_{-2} \cdots) \quad \text{(in binary)}$$

Example 2: Isomorphism of doubling map and uniform i.i.d. binary one-sided process
$$\phi(x_0 x_1 x_2 \cdots) = .x_0 x_1 x_2 \cdots \quad \text{(in binary)}$$

So, both are ergodic.

Ergodic Theorem: Let $T$ be MPT. Let $f \in L^1$.
1.
The limit $\lim_{n \to \infty} (1/n) \sum_{i=0}^{n-1} f \circ T^i(x)$ exists a.e. $x$. Call the limit function $f^*(x)$.
2. $f^* \in L^1$
3. $f^* \circ T = f^*$ a.e.
4. Let $\mathcal{I}$ be the σ-algebra of invariant sets mod 0. If $A \in \mathcal{I}$, then
$$\int_A f^* = \int_A f$$
5. $f^* = E(f \mid \mathcal{I})$
6. $\lim_{n \to \infty} (1/n) \sum_{i=0}^{n-1} f \circ T^i = f^*$ in $L^1$.
7. If $f \in L^p$, then $\lim_{n \to \infty} (1/n) \sum_{i=0}^{n-1} f \circ T^i = f^*$ in $L^p$.
8. If $T$ is ergodic, then
$$f^* = \int_M f \quad \text{a.e. } x$$
(time average = space average)

Note: item 3 is clear from the defn.
Note: item 5 follows from items 3 and 4, because 3 implies that each set $(f^*)^{-1}((-\infty, r]) \in \mathcal{I}$ and so $f^*$ is $\mathcal{I}$-measurable.
Note: item 8 follows since if $T$ is ergodic, then $\mathcal{I}$ is trivial and so $f^* = E(f \mid \mathcal{I})$ is constant a.e. But also $\int_M f^* = \int_M f$ and so $f^* = \int_M f$ a.e.
Note that this recovers the strong law of large numbers for iid processes.

Lecture 19 (Oct. 21)

Midterm Review: Today at 4:15 in Math 126.

Proof of Ergodic Theorem:
Part 1 is the guts.
By splitting into positive and negative parts, WMA $f \ge 0$.
Let
$$S_n f(x) = \sum_{i=0}^{n-1} f \circ T^i(x), \quad A_n f(x) = (1/n) S_n f(x), \quad A^+ f(x) = \limsup_{n} A_n f(x), \quad A^- f(x) = \liminf_{n} A_n f(x)$$
Show: $A^+ f = A^- f$ a.e.
Will show: (*) $\int f \, d\mu \ge \int A^+ f \, d\mu$.
It will follow that $\int f \, d\mu \le \int A^- f \, d\mu$.
Thus, since $A^- f \le A^+ f$, $A^+ f = A^- f$ a.e.

Proof of (*):
Let $r > 0$ be large and $0 < \epsilon < 1$ be small.
Let $H = H_{r,\epsilon} = (1 - \epsilon) \min(A^+ f, r)$ (a lower approx. to $A^+ f$).
[Picture]
Clearly, $A^+ f \circ T = A^+ f$. Thus, $H \circ T = H$.
Let $\tau(x) = \min\{n \ge 1 : A_n f(x) \ge H(x)\}$.
Note $\tau < \infty$ a.e.
[Picture]
Let $M > 0$ be large.
Define inductively: $\tau_k = \tau_k(x)$ and $t_k = t_k(x)$.
$\tau_0 = 0$, $t_0 = 0$.
$\tau_k = \tau(T^{t_{k-1}}(x))$ if $\tau(T^{t_{k-1}}(x)) \le M$, and $\tau_k = 1$ otherwise.
$t_k = t_{k-1} + \tau_k$.
[Picture]
$$S_n f(x) \ge \sum_{k : t_k \le n} S_{\tau_k} f(T^{t_{k-1}}(x))$$
$$\ge H(x) \Big( \sum_{k : t_k \le n} \tau_k \Big) - H(x) \Big( \sum_{k : t_k \le n, \; \tau(T^{t_{k-1}}(x)) > M} 1 \Big)$$
$$\ge H(x)(n - M) - H(x) S_n \chi_{\tau > M}(x)$$
$$\ge H(x)(n - M) - r S_n \chi_{\tau > M}(x)$$
So,
$$S_n f(x) \ge H(x)(n - M) - r S_n \chi_{\tau > M}(x)$$
Integrate and divide by $n$:
$$\int f \, d\mu \ge \frac{n - M}{n} \int (1 - \epsilon) \min(A^+ f, r) \, d\mu - r \mu(\{\tau > M\})$$
Let $n \to \infty$.
$$\int f \, d\mu \ge \int (1 - \epsilon) \min(A^+ f, r) \, d\mu - r \mu(\{\tau > M\})$$
Let $M \to \infty$.
$$\int f \, d\mu \ge \int (1 - \epsilon) \min(A^+ f, r) \, d\mu$$
Let $\epsilon \to 0$.
$$\int f \, d\mu \ge \int \min(A^+ f, r) \, d\mu$$
Let $r \to \infty$ and apply MCT.
$$\int f \, d\mu \ge \int A^+ f \, d\mu$$
as desired.

Now, we show: (**) $\int f \, d\mu \le \int A^- f \, d\mu$.
Proof: First, assume $f$ is bounded: $f \le N$. By (*),
$$\int (N - f) \, d\mu \ge \int A^+(N - f) \, d\mu = \int (N + A^+(-f)) \, d\mu,$$
which, since $A^+(-f) = -A^- f$, is equivalent to:
$$N - \int f \, d\mu \ge N - \int A^- f \, d\mu$$
Thus, $\int f \, d\mu \le \int A^- f \, d\mu$.
In the general case, let $f_N = \min(f, N)$. Then
$$\int A^- f \, d\mu \ge \int A^- f_N \, d\mu \ge \int f_N \, d\mu$$
Let $N \to \infty$ and apply MCT. This gives (**).

Then, as outlined above, we get $A^+ f = A^- f$ a.e., and this proves 1.
In the course of proving this, we have shown $\int f^* \, d\mu = \int f \, d\mu$ and thus $f^* \in L^1$, giving 2.
As mentioned above, 3 follows from 1.
4: If $A \in \mathcal{I}$ and $\mu(A) = 0$, this clearly holds. If $\mu(A) > 0$, apply the result $\int f^* \, d\mu = \int f \, d\mu$ to the MPT $T|_A$.
As mentioned above, 5 follows from 3 and 4.
6: uses a generalized version of DCT.
7: von Neumann's mean ergodic theorem generalized from Hilbert space to Banach space.
8: as mentioned above, easily follows from 1 and 5.
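Item 8 (time average = space average) can be seen numerically on the circle rotation of Example 1. The following is a minimal sketch under stated assumptions, not part of the lecture: the circle is normalized to $[0, 1)$ instead of $[0, 2\pi)$, the observable is $f(x) = x$ (space average $1/2$), and the starting point $0.1$, the golden-ratio rotation number, and the helper names are all hypothetical choices. Floating-point orbits only approximate true orbits, so this is an illustration, not a proof.

```python
import math

def time_average(f, T, x0, n):
    """Birkhoff average (1/n) * sum_{i=0}^{n-1} f(T^i(x0))."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

def rotation(alpha):
    """Circle rotation x -> x + alpha mod 1 on [0, 1)."""
    return lambda x: (x + alpha) % 1.0

def f(x):
    """Observable with space average 1/2 w.r.t. Lebesgue measure on [0, 1)."""
    return x

golden = (math.sqrt(5) - 1) / 2   # irrational rotation number (ergodic case)

avg_irrational = time_average(f, rotation(golden), 0.1, 100_000)
avg_rational = time_average(f, rotation(0.25), 0.1, 100_000)

print(avg_irrational)  # close to the space average 0.5
print(avg_rational)    # close to 0.475, the mean over the 4-point orbit of 0.1
```

For the rational rotation by $1/4$ the orbit of $0.1$ is $\{0.1, 0.35, 0.6, 0.85\}$, so the time average is $0.475$ and depends on the starting point, consistent with Case 1 of Example 1 (not ergodic). For the irrational rotation the time average approaches the space average.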