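Lecture 7:

Recall the definitions of $S_n f$ and $A_n f$. Recall that WMA $f \ge 0$.

Proof of (*): $\int f\,d\mu \ge \int A^+ f\,d\mu$ (recall that this, together with the analogous statement regarding $A^- f$, is sufficient to prove Parts 1 and 2 of the individual ergodic theorem).

Observe that $A^+ f \circ T = A^+ f$ for all $x$ (same reason as Part 3).

LET'S ASSUME THAT $f$ is bounded (e.g., a characteristic function), so that $A^+ f$ is finite-valued.

Let $\epsilon > 0$. Let
$$\tau(x) := \min\{n \ge 1 : A_n f(x) \ge (1-\epsilon)\, A^+ f(x)\}.$$
Note that $\tau$ is finite-valued.

Define inductively $\tau_k = \tau_k(x)$ and $t_k = t_k(x)$: $\tau_0 = 0$, $t_0 = 0$, and
$$\tau_k := \tau(T^{t_{k-1}}(x)), \qquad t_k := t_{k-1} + \tau_k.$$

Picture:

$$
\begin{aligned}
S_n f(x) &\ge \sum_{k:\, t_k \le n} S_{\tau_k} f(T^{t_{k-1}}(x)) \\
&\ge \sum_{k:\, t_k \le n} \tau_k\,(1-\epsilon)\, A^+ f(T^{t_{k-1}} x) \\
&= \Big(\sum_{k:\, t_k \le n} \tau_k\Big)(1-\epsilon)\, A^+ f(x) \\
&= \max\{t_k : t_k \le n\}\,(1-\epsilon)\, A^+ f(x).
\end{aligned}
$$
The first inequality uses $f \ge 0$, the second uses the definition of $\tau$, and the first equality uses $A^+ f \circ T = A^+ f$.

SUPPOSE $\tau$ is bounded, i.e., $\tau(x) \le L$ for some $L$. Then
$$S_n f(x) \ge (n-L)(1-\epsilon)\, A^+ f(x),$$
and so
$$A_n f(x) \ge \frac{n-L}{n}\,(1-\epsilon)\, A^+ f(x).$$
Integrate and use MP:
$$\int f\,d\mu = \int A_n f\,d\mu \ge \frac{n-L}{n}\,(1-\epsilon) \int A^+ f\,d\mu.$$
Let $n \to \infty$, and then let $\epsilon \to 0$.

Full proof: Since $\tau(x)$ is measurable and finite-valued, $\mu(\{\tau(x) > L\}) \to 0$ as $L \to \infty$.

Let $r, \epsilon > 0$ ($r$ is large and $\epsilon$ is small). Let
$$H(x) := H_{r,\epsilon}(x) = (1-\epsilon)\min(A^+ f, r)(x)$$
(a lower approximation to $A^+ f(x)$). Thus, $H \circ T(x) = H(x)$.

Picture:

Redefine $\tau(x) := \min\{n \ge 1 : A_n f(x) \ge H(x)\}$, and correspondingly redefine $\tau_k$ and $t_k$ as follows. Fix $L > 0$. Let
$$\tau_k := \tau(T^{t_{k-1}}(x)) \ \text{ if } \ \tau(T^{t_{k-1}}(x)) \le L, \qquad \tau_k := 1 \ \text{ otherwise},$$
with $t_k := t_{k-1} + \tau_k$ as before. Then
$$
\begin{aligned}
S_n f(x) &\ge \sum_{k:\, t_k \le n} S_{\tau_k} f(T^{t_{k-1}}(x)) \\
&\ge H(x)\Big(\sum_{k:\, t_k \le n} \tau_k\Big) - H(x)\Big(\sum_{k:\, t_k \le n,\ \tau(T^{t_{k-1}}(x)) > L} \tau_k\Big) \\
&\ge H(x)(n-L) - H(x)\, S_n \chi_{\{\tau > L\}}(x) \\
&\ge H(x)(n-L) - r\, S_n \chi_{\{\tau > L\}}(x).
\end{aligned}
$$
So,
$$S_n f(x) \ge H(x)(n-L) - r\, S_n \chi_{\{\tau > L\}}(x).$$
Integrate, use MP, and divide by $n$:
$$\int f\,d\mu \ge (1-\epsilon)\,\frac{n-L}{n} \int \min(A^+ f, r)\,d\mu - r\,\mu(\{\tau > L\}).$$
Let $n \to \infty$:
$$\int f\,d\mu \ge (1-\epsilon) \int \min(A^+ f, r)\,d\mu - r\,\mu(\{\tau > L\}).$$
Let $L \to \infty$:
$$\int f\,d\mu \ge (1-\epsilon) \int \min(A^+ f, r)\,d\mu.$$
Let $r \to \infty$ and apply MCT:
$$\int f\,d\mu \ge (1-\epsilon) \int A^+ f\,d\mu.$$
Since $\epsilon > 0$ was arbitrary, this gives (*).

Now, we show: (**) $\int f\,d\mu \le \int A^- f\,d\mu$.

Proof: First, assume $f$ is bounded: $f \le N$. By (*), applied to $N - f \ge 0$,
$$\int (N - f)\,d\mu \ge \int A^+(N - f)\,d\mu = \int (N + A^+(-f))\,d\mu,$$
and so, since $A^+(-f) = -A^- f$,
$$N - \int f\,d\mu \ge N - \int A^- f\,d\mu.$$
Thus, $\int f\,d\mu \le \int A^- f\,d\mu$.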
In the general case, let $f_N = \min(f, N)$. Then
$$\int A^- f\,d\mu \ge \int A^- f_N\,d\mu \ge \int f_N\,d\mu.$$
Let $N \to \infty$ and apply MCT. This gives (**).

Then, as outlined above, we get $A^+ f = A^- f$ a.e., and this proves Part 1.

In the course of proving this, we have shown $\int f^*\,d\mu = \int f\,d\mu$ and thus $f^* \in L^1$, giving Part 2.

We already showed Parts 3 and 4.

Note: this proof (taken mainly from Keller, Theorem 2.1.5) is clunkier than the classical proof, via the maximal inequality, but more intuitive.

Corollary: Let $\mathcal{I}$ be the collection of invariant sets mod 0, i.e., sets $A$ s.t. $\mu(A \,\Delta\, T^{-1}(A)) = 0$. Then $\mathcal{I}$ is a $\sigma$-algebra and $f^* = E(f \mid \mathcal{I})$ a.e.

Proof: $\mathcal{I}$ is a $\sigma$-algebra: left as an exercise.

$f^*$ is $\mathcal{I}$-measurable since, by Part 3 of the ergodic theorem, for all $r$, $(f^*)^{-1}((-\infty, r]) \in \mathcal{I}$.

So it suffices to show that if $A \in \mathcal{I}$, then
$$\int_A f^*\,d\mu = \int_A f\,d\mu.$$
Clearly true if $\mu(A) = 0$. Otherwise, apply Part 2 of the ergodic theorem to $f|_A$ (with normalized measure on $A$).

$L^p$ ergodic theorem (von Neumann ($p = 2$), 1932): Let $T$ be an MPT and $f \in L^p$, $1 \le p < \infty$. Then there exists $f^* \in L^p$ s.t. $f^* \circ T = f^*$ a.e. and
$$\Big\| \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) - f^* \Big\|_p \to 0.$$

Proof: For $f \in L^\infty$ apply the individual ergodic theorem and the bounded convergence theorem. Then approximate $L^p$ functions by $L^\infty$ functions (see Walters, Corollary 1.14.1 (GTM edition (1982)), or Theorem 1.5 (online SLN edition (1975))).

Note: Legend has it that von Neumann proved his theorem before Birkhoff proved his theorem (and did not use the individual ergodic theorem).

Lecture 8:

From the ergodic theorem, we recover

- the strong law of large numbers for finite-valued iid processes $X_i$, with $X_0 \sim p$: apply the ergodic theorem to $(F^{\mathbb{Z}^+}, \text{Borel}, \mu)$, where $\mu$ is the law of iid($p$), $T = \sigma$ (left shift) and $f(x) = x_0$, and
- the Weyl equidistribution theorem:

Theorem (Weyl, 1909): If $\alpha \in [0, 1)$ is irrational, then $\{\alpha, 2\alpha, 3\alpha, \ldots\}$ is uniformly distributed mod 1 in $[0, 1)$: for any subinterval $(a, b) \subset [0, 1)$,
$$\frac{|\{i \in [0, n-1] : i\alpha \bmod 1 \in (a, b)\}|}{n} \to b - a.$$

Proof: Apply the ergodic theorem to the circle rotation $T_{2\pi\alpha}$ (ergodic since $\alpha$ is irrational) and $f = \chi_{(2\pi a, 2\pi b)}$, for fixed $a, b$. For a.e. $x \in [0, 1)$, get
$$\frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a, b)\}|}{n} \to b - a \qquad (1)$$
(here and below, $x + i\alpha$ is taken mod 1). Thus, there is a set $K$ of full measure s.t. for all $x \in K$, we get (1) for all $(a, b)$ with rational $a, b$.

Let $\epsilon > 0$ and $a < b$. Choose $a', b', a'', b'' \in \mathbb{Q}$ s.t.
$$a'' < a < a' < b' < b < b'', \qquad a' - a'' < \epsilon, \qquad b'' - b' < \epsilon.$$
Fix $x \in K$ and choose $N$ s.t. for $n \ge N$,
$$\Big| \frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a', b')\}|}{n} - (b' - a') \Big| < \epsilon$$
and
$$\Big| \frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a'', b'')\}|}{n} - (b'' - a'') \Big| < \epsilon.$$
Since $(a', b') \subset (a, b) \subset (a'', b'')$, it follows that
$$\Big| \frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a, b)\}|}{n} - (b - a) \Big| < 3\epsilon.$$
Thus, by translation, the same holds for $x = 0$ (note that all we needed was one $x$ s.t. (1) holds for all $(a, b)$).

Corollary of Ergodic Theorem: Let $T$ be an MPT. TFAE:

i. $T$ is ergodic.

ii. For all $f \in L^1$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f \circ T^i(x) = \int f\,d\mu \quad \text{a.e.}$$

iii. For all $A, B \in \mathcal{A}$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B) = \mu(A)\mu(B).$$

iv. Let $\mathcal{B}$ be a semi-algebra which generates $\mathcal{A}$. For all $A, B \in \mathcal{B}$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B) = \mu(A)\mu(B).$$

(So, it suffices to check part iv on cylinder sets to establish ergodicity for stationary processes.)

Proof:

i ⇒ ii: apply the Ergodic Theorem.

ii ⇒ iii: apply ii with $f = \chi_A$. Then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \chi_A \circ T^i(x)\, \chi_B(x) = \mu(A)\chi_B(x) \quad \text{a.e.}$$
Integrate both sides and apply the bounded convergence theorem.

iii ⇒ iv: trivial.

iv ⇒ i: Note that iv automatically holds for all sets $A, B$ in the algebra generated by $\mathcal{B}$ (i.e., finite disjoint unions of elements of $\mathcal{B}$).

Let $T^{-1}(A) = A$. Let $\epsilon > 0$ and let $A'$ be in the algebra generated by $\mathcal{B}$ s.t. $\mu(A \,\Delta\, A') < \epsilon$.
(Call the inequality $\mu(A \,\Delta\, A') < \epsilon$ (2).)

Then by the triangle inequality and the fact $T^{-1}(A) = A$,
$$
\begin{aligned}
|\mu(A)^2 - \mu(A)| &\le |\mu(A)^2 - \mu(A')^2| + \Big| \mu(A')^2 - \frac{1}{n} \sum_{i=0}^{n-1} \mu(T^{-i}(A') \cap A') \Big| \\
&\quad + \Big| \frac{1}{n} \sum_{i=0}^{n-1} \big( \mu(T^{-i}(A') \cap A') - \mu(T^{-i}(A) \cap A') \big) \Big| \\
&\quad + \Big| \frac{1}{n} \sum_{i=0}^{n-1} \big( \mu(T^{-i}(A) \cap A') - \mu(T^{-i}(A) \cap A) \big) \Big| \\
&\le 2|\mu(A) - \mu(A')| + \epsilon + \epsilon + \epsilon < 5\epsilon,
\end{aligned}
$$
where, in the last expression, the first occurrence of $\epsilon$ comes from iv, applied to the algebra generated by $\mathcal{B}$, if $n$ is sufficiently large, and the last two occurrences of $\epsilon$ come from (2), the fact that $T$ is an MPT, and the triangle inequality. Thus, $\mu(A)^2 - \mu(A) = 0$, i.e., $\mu(A) \in \{0, 1\}$.

Note: If $T$ is an IMPT, can swap $T$ with $T^{-1}$ everywhere.

Characterize ergodicity of stationary finite-state first-order Markov chains (MC), one-sided or two-sided: $(F^{\mathbb{Z}^+}, \text{Borel}, \mu)$, where $\mu$ is the law of the MC and $T = \sigma$. The MC is defined by an $m \times m$ stochastic matrix $P$ and a stochastic vector $\pi$ with $\pi P = \pi$; the states of the chain are the $m$ indices of the matrix.

The law: define $A_{a_0, \ldots, a_\ell} := \{x \in F^{\mathbb{Z}^+} : x_0 = a_0, \ldots, x_\ell = a_\ell\}$; then
$$\mu(A_{a_0, \ldots, a_\ell}) = \pi_{a_0} P_{a_0 a_1} P_{a_1 a_2} \cdots P_{a_{\ell-1} a_\ell}$$
extends to a $\sigma$-invariant measure $\mu$ on $F^{\mathbb{Z}^+}$.

WMA: $\pi > 0$ (delete states with zero stationary probability; this does not affect the MPT).

Main Fact: $\mu(\sigma^{-k}(A_j) \cap A_i) = \pi_i (P^k)_{ij}$.

Defn: $P$ is irreducible if for all $i, j$, there exists $n = n(i, j)$ s.t. $(P^n)_{ij} > 0$.

Defn: directed graph $G = G(P)$ of $P$: $V = \{1, \ldots, m\}$; directed edge from $i$ to $j$ iff $P_{ij} > 0$.

Note: $(P^n)_{ij} > 0$ iff there exists a path in $G$ of length $n$ from $i$ to $j$; hence $P$ is irreducible iff $G$ is strongly connected.

Note: if $P$ is positive entry-by-entry, it is irreducible.

Example 0:
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
is irreducible.

Example 1:
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 0 & 2/3 \\ 0 & 1 & 0 \end{pmatrix}$$
is irreducible.

Example 2:
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/3 & 2/3 \\ 0 & 1/4 & 3/4 \end{pmatrix}$$
is reducible.

Example 3:
$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 2/3 \\ 0 & 1/4 & 3/4 \end{pmatrix}$$
is reducible.

Theorem: An MC is ergodic iff $P$ is irreducible.

Lecture 9:

Recall that $T$ is ergodic iff:

iv. Let $\mathcal{B}$ be a semi-algebra which generates $\mathcal{A}$. For all $A, B \in \mathcal{B}$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(T^{-k}(A) \cap B) = \mu(A)\mu(B).$$

In the following, $P$ is a stochastic matrix, $\pi$ is a stochastic vector with all strictly positive entries, and $\pi P = \pi$ ($P$ is not necessarily irreducible).

Recall: $\pi_i (P^k)_{ij} = \mu(T^{-k}(A_j) \cap A_i)$, where $A_i = \{x : x_0 = i\}$.

Lemma:
$$Q := \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} P^k$$
exists (entry by entry), and
$$Q_{ij} = \frac{1}{\pi_i} \int \chi^*_{A_j}\, \chi_{A_i}\,d\mu.$$

Proof:
$$
\begin{aligned}
Q_{ij} &= \frac{1}{\pi_i} \lim_{n \to \infty} \frac{1}{n}\, \pi_i \sum_{k=0}^{n-1} (P^k)_{ij} = \frac{1}{\pi_i} \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(\sigma^{-k}(A_j) \cap A_i) \\
&= \frac{1}{\pi_i} \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \int (\chi_{A_j} \circ \sigma^k)\, \chi_{A_i}\,d\mu = \frac{1}{\pi_i} \int \Big( \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} (\chi_{A_j} \circ \sigma^k)\, \chi_{A_i} \Big) d\mu \\
&= \frac{1}{\pi_i} \int \chi^*_{A_j}\, \chi_{A_i}\,d\mu
\end{aligned}
$$
(by the ergodic theorem and the bounded convergence theorem).

Note that the proof assumes that there exists a strictly positive $\pi$ that accompanies $P$, and it gives the same result for all such $\pi$.

Exercise: interpret $\chi^*_{A_i}$ concretely in terms of the Markov chain.

Alternate proof: use linear algebra and the fact that no eigenvalue has modulus $> 1$ and the Jordan form is trivial for eigenvalues of modulus 1 (follows from the Perron-Frobenius Theorem).

Corollary:
- $QP = Q = PQ$
- $Q^2 = Q$
- $\pi Q = \pi$
- $Q$ is stochastic

Theorem: TFAE:

1. The MC defined by $\pi, P$ is ergodic.
2. For all $i, j$, $Q_{ij} = \pi_j$.
3. $P$ is irreducible.

Proof:

1 ⇔ 2: Let $A_i$ and $A_j$ be initial cylinder sets. Since $\pi_i (P^k)_{ij} = \mu(T^{-k}(A_j) \cap A_i)$,

$Q_{ij} = \pi_j$ iff $\frac{1}{n} \sum_{k=0}^{n-1} (P^k)_{ij} \to \pi_j$

iff $\frac{1}{n} \sum_{k=0}^{n-1} \mu(T^{-k}(A_j) \cap A_i) \to \mu(A_i)\mu(A_j)$.

So, condition 2 holds iff condition (iv) for ergodicity holds for all cylinder sets $A_i, A_j$.

Exercise: extend "only if" to general initial cylinder sets. Sketch: let $A = A_{a_0, \ldots, a_{\ell-1}}$, so
$$\mu(A) = \pi_{a_0} P_{a_0 a_1} P_{a_1 a_2} \cdots P_{a_{\ell-2} a_{\ell-1}},$$
and let $B = A_{b_0, \ldots, b_{u-1}}$, so
$$\mu(B) = \pi_{b_0} P_{b_0 b_1} P_{b_1 b_2} \cdots P_{b_{u-2} b_{u-1}}.$$
For sufficiently large $k$ (namely $k \ge u$),
$$\mu(\sigma^{-k}(A) \cap B) = \pi_{b_0} P_{b_0 b_1} \cdots P_{b_{u-2} b_{u-1}}\, (P^{k-u+1})_{b_{u-1}, a_0}\, P_{a_0 a_1} \cdots P_{a_{\ell-2} a_{\ell-1}}.$$
Thus, by condition 2, $\frac{1}{n} \sum_{k=0}^{n-1} \mu(\sigma^{-k}(A) \cap B)$ converges to
$$\pi_{b_0} P_{b_0 b_1} \cdots P_{b_{u-2} b_{u-1}}\, \pi_{a_0} P_{a_0 a_1} \cdots P_{a_{\ell-2} a_{\ell-1}} = \mu(A)\mu(B).$$

2 ⇒ 3: $Q > 0$ (since $\pi > 0$), and so by the definition of $Q$, for all $i, j$ there exists $n$ s.t. $(P^n)_{ij} > 0$.

3 ⇒ 2: $Q = QP = QP^2 = \cdots = QP^n$ for all $n$. Fix $i, j$.
Show $Q_{ij} > 0$:
- $Q_{ij} = \sum_k Q_{ik} (P^n)_{kj}$ for every $n$.
- Since $Q$ is stochastic, there exists $k$ s.t. $Q_{ik} > 0$.
- Since $P$ is irreducible, there exists $n$ s.t. $(P^n)_{kj} > 0$.
- Thus, $Q_{ij} > 0$.

Since $Q = Q^2$, $Q_{ij} = \sum_k Q_{ik} Q_{kj}$. Since $Q$ is stochastic, $Q_{ij}$ is a weighted average, with strictly positive weights, of $Q_{1j}, Q_{2j}, \ldots, Q_{mj}$. But this holds for all $i$, in particular for the $i$ that maximizes $Q_{ij}$, which forces all of $Q_{1j}, \ldots, Q_{mj}$ to be equal. Thus, $Q_{ij}$ depends only on $j$: $Q_{ij} = q_j$.

But $\pi Q = \pi$. Thus, $\pi_j = \sum_k \pi_k Q_{kj} = \sum_k \pi_k q_j = q_j$.

Note: This result shows that for irreducible $P$, the stationary vector is unique (but this also follows from linear algebra, via the Perron-Frobenius Theorem).

Defn: An MPT $T$ is mixing if for all $A, B \in \mathcal{A}$, $\mu(T^{-n}(A) \cap B) \to \mu(A)\mu(B)$.

Fact: For an IMPT, can replace $T^{-n}$ with $T^n$ in the definition of mixing.

Fact: Mixing implies ergodicity, since convergence implies Cesaro convergence (see condition (iv) of the earlier Theorem).

- For an ergodic MPT, the orbit of a set of positive measure gets uniformly distributed.
- For a mixing MPT, the time-$n$ images of sets of positive measure persistently get uniformly distributed.

Fact (exercise): to check mixing, it suffices to check mixing on a generating semi-algebra (e.g., initial cylinder sets for stationary processes); this is an approximation argument.

Fact: Mixing is an isomorphism invariant; in fact, if there is an MP homomorphism from an MPT $T$ to an MPT $S$ and $T$ is mixing, then so is $S$.

Fact: If $T$ is mixing, then given any finite collection of sets of positive measure $A_1, \ldots, A_m$, $\exists n\ \forall i, j$: $\mu(T^{-n}(A_j) \cap A_i) > 0$ (and therefore $T^{-n}(A_j)$ and $A_i$ are not disjoint).

Note: blob picture of mixing.

Example: Rotation $T_\alpha$ of the circle is never mixing.

Proof: Partition the circle into three disjoint (half-open) arcs $A, B, C$ of the same size. Then for all $n$, $T^n(A)$ is disjoint from one of $A, B, C$.

Example: An iid process (one-sided or two-sided) is mixing because of independence (on cylinder sets).

Example: It follows that the doubling map and the Baker map are mixing (since mixing is an isomorphism invariant).

Picture of Baker.
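
Numerical check (a sketch, not part of the lectures): the Lecture 9 theorem says that the MC is ergodic iff $Q_{ij} = \pi_j$ for all $i, j$ iff $P$ is irreducible. The code below approximates $Q$ by a finite Cesaro average and tests irreducibility through strong connectivity of $G(P)$, on Examples 1 and 2 from Lecture 8. The function names `cesaro_limit` and `is_irreducible`, the cutoff `n=20000`, and the use of NumPy are assumptions made for illustration. The connectivity test relies on the standard fact that a directed graph on $m$ vertices is strongly connected iff $(I + A)^{m-1}$ has all entries positive, where $A$ is its adjacency matrix.

```python
import numpy as np

def cesaro_limit(P, n=20000):
    """Finite-n approximation of Q = lim (1/n) * sum_{k=0}^{n-1} P^k."""
    P = np.asarray(P, dtype=float)
    m = P.shape[0]
    power = np.eye(m)            # current power P^k, starting at P^0 = I
    total = np.zeros((m, m))
    for _ in range(n):
        total += power
        power = power @ P
    return total / n

def is_irreducible(P):
    """True iff G(P) is strongly connected, tested via (I + A)^(m-1) > 0."""
    A = (np.asarray(P) > 0).astype(np.int64)   # adjacency matrix of G(P)
    m = A.shape[0]
    reach = np.linalg.matrix_power(np.eye(m, dtype=np.int64) + A, m - 1)
    return bool((reach > 0).all())

# Example 1 (irreducible) and Example 2 (reducible) from Lecture 8.
EXAMPLE_1 = np.array([[0, 1, 0], [1/3, 0, 2/3], [0, 1, 0]])
EXAMPLE_2 = np.array([[1, 0, 0], [0, 1/3, 2/3], [0, 1/4, 3/4]])

# Stationary vector of Example 1, computed by hand from pi P = pi.
PI_1 = np.array([1/6, 1/2, 1/3])

if __name__ == "__main__":
    for name, P in [("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)]:
        print(name, "irreducible:", is_irreducible(P))
        print(np.round(cesaro_limit(P), 4))
```

For Example 1, every row of the approximate $Q$ should be close to $\pi = (1/6, 1/2, 1/3)$, as condition 2 predicts; for Example 2 the rows differ, consistent with reducibility. Note also that in Example 1 the powers alternate ($P^k = P$ for odd $k$ and $P^k = P^2$ for even $k \ge 2$), so $(P^k)_{11}$ is $0$ for odd $k$ and $1/3$ for even $k \ge 2$. By the Main Fact, $\mu(\sigma^{-k}(A_1) \cap A_1) = \pi_1 (P^k)_{11}$ does not converge, so this chain is ergodic but not mixing.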