Lecture 7:
Recall the definitions of $S_n f$ and $A_n f$.
Recall that WMA $f \ge 0$.
Proof of (*): $\int f\,d\mu \ge \int A^+ f\,d\mu$ (recall that this, together with the analogous statement regarding $A^- f$, is sufficient to prove parts 1 and 2 of the individual ergodic theorem).
Observe that $A^+ f \circ T = A^+ f$ for all $x$ (same reason as Part 3).
LET'S ASSUME THAT $f$ is bounded (e.g., a characteristic function) so that $A^+ f$ is finite-valued.
Let $\epsilon > 0$.
Let $\tau(x) := \min\{n \ge 1 : A_n f(x) \ge (1-\epsilon) A^+ f(x)\}$. Note that $\tau$ is finite-valued.
Define inductively $\tau_k = \tau_k(x)$ and $t_k = t_k(x)$:
$\tau_0 = 0$, $t_0 = 0$,
$\tau_k := \tau(T^{t_{k-1}}(x))$,
$t_k = t_{k-1} + \tau_k$.
Picture:
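$$S_n f(x) \ge \sum_{k:\, t_k \le n} S_{\tau_k} f(T^{t_{k-1}}(x)) \ge \sum_{k:\, t_k \le n} \tau_k (1-\epsilon) A^+ f(T^{t_{k-1}} x)$$
$$= \Big(\sum_{k:\, t_k \le n} \tau_k\Big)(1-\epsilon) A^+ f(x) = \max\{t_k : t_k \le n\}\,(1-\epsilon) A^+ f(x).$$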
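The block decomposition above can be tried out numerically. The following is only a sketch under stated assumptions: the system is taken to be an irrational circle rotation, $f$ is the indicator of $[0, 1/2)$, and $A^+ f$ is assumed to equal $1/2$ (the space average, which is what ergodicity of the rotation gives). The names `ALPHA`, `A_PLUS`, `tau_at`, `block_times` and the search cap are hypothetical choices made for this illustration.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # assumed irrational rotation number
A_PLUS = 0.5                     # assumed value of A^+ f: the measure of [0, 1/2)

def f_on_orbit(x, i):
    """f(T^i x) for T(y) = y + ALPHA mod 1 and f = indicator of [0, 1/2)."""
    return 1 if (x + i * ALPHA) % 1.0 < 0.5 else 0

def tau_at(x, t, eps, cap=10_000):
    """tau(T^t x): the least n >= 1 with S_n f(T^t x) >= n (1 - eps) A^+ f."""
    s = 0
    for n in range(1, cap + 1):
        s += f_on_orbit(x, t + n - 1)
        if s >= n * (1 - eps) * A_PLUS:
            return n
    raise RuntimeError("tau not found below cap")

def block_times(x, n, eps):
    """The times t_0 = 0 < t_1 < ... with t_k = t_{k-1} + tau_k and t_k <= n."""
    times = [0]
    while True:
        t_next = times[-1] + tau_at(x, times[-1], eps)
        if t_next > n:
            return times
        times.append(t_next)

def birkhoff_sum(x, n):
    """S_n f(x)."""
    return sum(f_on_orbit(x, i) for i in range(n))

x, n, eps = 0.9, 2000, 0.1
times = block_times(x, n, eps)
lower_bound = times[-1] * (1 - eps) * A_PLUS   # max{t_k <= n} (1 - eps) A^+ f
print(birkhoff_sum(x, n), ">=", lower_bound)
```

Each block $[t_{k-1}, t_k)$ is chosen so that the average of $f$ over it is at least $(1-\epsilon)A^+ f$; since $f \ge 0$, discarding the tail after the last $t_k \le n$ can only decrease the sum.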
SUPPOSE $\tau$ is bounded, i.e., $\tau(x) \le L$ for some $L$. Then
$$S_n f(x) \ge (n-L)(1-\epsilon) A^+ f(x)$$
and so
$$A_n f(x) \ge \frac{n-L}{n}(1-\epsilon) A^+ f(x).$$
Integrate and use MP:
$$\int f\,d\mu = \int A_n f\,d\mu \ge \frac{n-L}{n}(1-\epsilon)\int A^+ f\,d\mu.$$
Let $n \to \infty$.
Full proof: Since $\tau(x)$ is measurable and finite-valued, $\mu(\{\tau > L\}) \to 0$ as $L \to \infty$.
Let $r, \epsilon > 0$ ($r$ is large and $\epsilon$ is small).
Let $H(x) := H_{r,\epsilon}(x) = (1-\epsilon)\min(A^+ f, r)(x)$ (a lower approximation to $A^+ f(x)$).
Thus, $H \circ T(x) = H(x)$.
Picture:
Redefine: $\tau(x) := \min\{n \ge 1 : A_n f(x) \ge H(x)\}$, and correspondingly redefine $\tau_k$ and $t_k$ as follows:
Fix $L > 0$.
$\tau_k := \tau(T^{t_{k-1}(x)}(x))$ if $\tau(T^{t_{k-1}(x)}(x)) \le L$, and $\tau_k := 1$ otherwise.
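$$S_n f(x) \ge \sum_{k:\, t_k \le n} S_{\tau_k} f(T^{t_{k-1}}(x))$$
$$\ge H(x)\Big(\sum_{k:\, t_k \le n} \tau_k\Big) - H(x)\Big(\sum_{k:\, t_k \le n,\ \tau(T^{t_{k-1}}(x)) > L} 1\Big)$$
$$\ge H(x)(n-L) - H(x)\, S_n \chi_{\{\tau > L\}}(x) \ge H(x)(n-L) - r\, S_n \chi_{\{\tau > L\}}(x).$$
So,
$$S_n f(x) \ge H(x)(n-L) - r\, S_n \chi_{\{\tau > L\}}(x).$$
Integrate, use MP, and divide by $n$:
$$\int f\,d\mu \ge \frac{n-L}{n}(1-\epsilon)\int \min(A^+ f, r)\,d\mu - r\,\mu(\{\tau > L\}).$$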
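Let $n \to \infty$:
$$\int f\,d\mu \ge (1-\epsilon)\int \min(A^+ f, r)\,d\mu - r\,\mu(\{\tau > L\}).$$
Let $L \to \infty$:
$$\int f\,d\mu \ge (1-\epsilon)\int \min(A^+ f, r)\,d\mu.$$
Let $r \to \infty$ and apply MCT:
$$\int f\,d\mu \ge (1-\epsilon)\int A^+ f\,d\mu.$$
Since $\epsilon > 0$ was arbitrary, this gives (*).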
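Now, we show (**): $\int f\,d\mu \le \int A^- f\,d\mu$.
Proof: First, assume $f$ is bounded: $f \le N$.
By (*),
$$\int (N - f)\,d\mu \ge \int A^+(N - f)\,d\mu = \int (N + A^+(-f))\,d\mu,$$
and so, since $A^+(-f) = -A^- f$,
$$N - \int f\,d\mu \ge N - \int A^- f\,d\mu.$$
Thus, $\int f\,d\mu \le \int A^- f\,d\mu$.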
In the general case, let $f_N = \min(f, N)$. Then
$$\int A^- f\,d\mu \ge \int A^- f_N\,d\mu \ge \int f_N\,d\mu.$$
Let $N \to \infty$ and apply MCT. This gives (**).
Then, as outlined above, we get $A^+ f = A^- f$ a.e., and this proves Part 1.
In the course of proving this, we have shown $\int f^*\,d\mu = \int f\,d\mu$ and thus $f^* \in L^1$, giving Part 2.
We already showed Parts 3 and 4.
Note: this proof (taken mainly from Keller, Theorem 2.1.5) is
clunkier than the classical proof, via maximal inequality, but more
intuitive.
Corollary: Let $\mathcal{I}$ be the collection of invariant sets mod 0, i.e., sets $A$ s.t. $\mu(A \,\Delta\, T^{-1}(A)) = 0$. Then $\mathcal{I}$ is a σ-algebra and $f^* = E(f \mid \mathcal{I})$ a.e.
Proof: $\mathcal{I}$ is a σ-algebra: left as an exercise.
$f^*$ is $\mathcal{I}$-measurable since, by Part 3 of the ergodic theorem, for all $r$, $(f^*)^{-1}((-\infty, r]) \in \mathcal{I}$. So it suffices to show that if $A \in \mathcal{I}$, then
$$\int_A f^*\,d\mu = \int_A f\,d\mu.$$
Clearly true if $\mu(A) = 0$. Otherwise, apply Part 2 of the ergodic theorem to $f|_A$ (with normalized measure on $A$).
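A toy check of the corollary, as a sketch only: the system below is a hypothetical non-ergodic example (a permutation of six points with uniform measure, not taken from the lecture), where the invariant sets are unions of cycles, so $E(f \mid \mathcal{I})$ is the average of $f$ over the cycle containing $x$. The names `T`, `f`, `cycle_of` and `conditional_expectation` are illustrative.

```python
from fractions import Fraction

# Hypothetical toy system: T is a permutation of {0,...,5} with cycles
# (0 1 2), (3 4), (5); mu is the uniform probability measure.
T = [1, 2, 0, 4, 3, 5]
f = [Fraction(v) for v in (6, 0, 3, 1, 4, 2)]

def birkhoff_average(x, n):
    """A_n f(x) = (1/n) * sum_{i<n} f(T^i x), computed exactly."""
    total, y = Fraction(0), x
    for _ in range(n):
        total += f[y]
        y = T[y]
    return total / n

def cycle_of(x):
    """The orbit of x, i.e. the smallest invariant set containing x."""
    orbit, y = [x], T[x]
    while y != x:
        orbit.append(y)
        y = T[y]
    return orbit

def conditional_expectation(x):
    """E(f | I)(x): the average of f over the invariant atom containing x."""
    orbit = cycle_of(x)
    return sum(f[y] for y in orbit) / len(orbit)

# n = 6 is a common multiple of the cycle lengths 3, 2, 1, so A_6 f already
# coincides with the limit f*.
f_star = [birkhoff_average(x, 6) for x in range(6)]
print(f_star == [conditional_expectation(x) for x in range(6)])
```

Here $f^*$ is constant on each cycle but not globally constant, which is what one expects when $T$ is not ergodic.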
$L^p$ ergodic theorem (von Neumann ($p = 2$), 1932): Let $T$ be an MPT and $f \in L^p$, $1 \le p < \infty$. Then there exists $f^* \in L^p$ s.t. $f^* \circ T = f^*$ a.e. and
$$\Big\| \frac{1}{n}\sum_{i=0}^{n-1} f(T^i(x)) - f^* \Big\|_p \to 0.$$
Proof: For $f \in L^\infty$ apply the individual ergodic theorem and the bounded convergence theorem. Then approximate $L^p$ functions by $L^\infty$ functions (see Walters, Corollary 1.14.1 (GTM edition (1982)), or Theorem 1.5 (online SLN edition (1975))).
Note: Legend has it that von Neumann proved his theorem before
Birkhoff proved his theorem (and did not use the individual ergodic
theorem).
Lecture 8:
From the ergodic theorem, we recover
– the strong law of large numbers for finite-valued iid processes $X_i$, with $X_0 \sim p$: apply the ergodic theorem to $(F^{\mathbb{Z}^+}, \text{Borel}, \mu)$ where $\mu$ is the law of iid($p$), $T = \sigma$ (left shift) and $f(x) = x_0$, and
– Weyl equidistribution theorem:
Theorem (Weyl, 1909): If $\alpha \in [0, 1)$ is irrational, then $\{\alpha, 2\alpha, 3\alpha, \ldots\}$ is uniformly distributed mod 1 in $[0, 1)$: for any subinterval $(a, b) \subset [0, 1)$,
$$\frac{|\{i \in [0, n-1] : i\alpha \bmod 1 \in (a, b)\}|}{n} \to b - a.$$
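A quick numerical illustration of the statement, as a sketch only: the choices $\alpha = \sqrt{2} - 1$ and $(a, b) = (0.2, 0.5)$ are arbitrary assumptions for the example, and floating-point arithmetic stands in for exact reduction mod 1.

```python
import math

def visit_frequency(alpha, a, b, n):
    """Fraction of i in [0, n-1] with (i * alpha mod 1) in (a, b)."""
    hits = sum(1 for i in range(n) if a < (i * alpha) % 1.0 < b)
    return hits / n

alpha = math.sqrt(2) - 1   # an irrational number in [0, 1) (assumed example)
a, b = 0.2, 0.5
for n in (100, 10_000, 100_000):
    print(n, visit_frequency(alpha, a, b, n), "target", b - a)
```

The printed frequencies should approach $b - a$ as $n$ grows; replacing `alpha` by a rational number generally breaks this, since the orbit is then finite.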
Proof: apply the ergodic theorem to $T_{\alpha(2\pi)}$, $f = \chi_{(2\pi a, 2\pi b)}$, for fixed $a, b$.
For a.e. $x \in [0, 1)$, get
$$\frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a, b)\}|}{n} \to b - a. \qquad (1)$$
Thus, there is a set $K$ of full measure s.t. for all $x \in K$, we get (1) for all $(a, b)$ with rational $a, b$.
Let $\epsilon > 0$ and $a < b$. Choose $a', b', a'', b'' \in \mathbb{Q}$ s.t.
$$a'' < a < a' < b' < b < b'', \quad a' - a'' < \epsilon, \quad b'' - b' < \epsilon.$$
Fix $x \in K$ and choose $N$ s.t. for $n \ge N$,
$$\Big|\frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a', b')\}|}{n} - (b' - a')\Big| < \epsilon$$
and
$$\Big|\frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a'', b'')\}|}{n} - (b'' - a'')\Big| < \epsilon.$$
Then
$$\Big|\frac{|\{i \in [0, n-1] : (x + i\alpha) \in (a, b)\}|}{n} - (b - a)\Big| < 3\epsilon.$$
Thus, by translation, the same holds for $x = 0$ (note that all we needed was one $x$ s.t. (1) holds for all $(a, b)$).
Corollary of Ergodic Theorem: Let $T$ be an MPT. TFAE:
i. $T$ is ergodic.
ii. For all $f \in L^1$,
$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=0}^{n-1} f \circ T^i(x) = \int f\,d\mu \quad \text{a.e.}$$
iii. For all $A, B \in \mathcal{A}$,
$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B) = \mu(A)\mu(B).$$
iv. Let $\mathcal{B}$ be a semi-algebra which generates $\mathcal{A}$. For all $A, B \in \mathcal{B}$,
$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}(A) \cap B) = \mu(A)\mu(B).$$
(So, it suffices to just check part iv on cylinder sets to establish ergodicity for stationary processes).
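Condition iii can be checked exactly on a finite toy system. The sketch below assumes a hypothetical example not in the lecture: $T(x) = x + \text{step} \bmod m$ on $\mathbb{Z}_m$ with uniform measure. The sequence $\mu(T^{-i}(A) \cap B)$ is then periodic in $i$, so its average over one period equals the Cesàro limit. The function names are illustrative.

```python
from fractions import Fraction

def mu(m, A):
    """Uniform probability measure on Z_m."""
    return Fraction(len(A), m)

def cesaro_limit(m, step, A, B):
    """Average of mu(T^{-i}(A) & B) over i = 0..m-1 for T(x) = x + step mod m.
    The sequence has period dividing m, so this is the Cesaro limit."""
    total = Fraction(0)
    for i in range(m):
        preimage = {(x - i * step) % m for x in A}   # T^{-i}(A)
        total += mu(m, preimage & B)
    return total / m

A, B = {0, 2}, {0, 1}
# step = 1: a single cycle, so T is ergodic and the limit is mu(A) mu(B).
print(cesaro_limit(4, 1, A, B), mu(4, A) * mu(4, B))
# step = 2: A = {0, 2} is invariant, and the criterion fails for (A, A).
print(cesaro_limit(4, 2, A, A), mu(4, A) * mu(4, A))
```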
Proof:
i ⇒ ii: apply the Ergodic Theorem.
ii ⇒ iii: apply ii with $f = \chi_A$. Then
$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=0}^{n-1} \chi_A \circ T^i(x)\, \chi_B(x) = \mu(A)\chi_B(x) \quad \text{a.e.}$$
Apply the bounded convergence theorem.
iii ⇒ iv: trivial.
iv ⇒ i: Note that iv automatically holds for all sets $A, B$ in the algebra generated by $\mathcal{B}$ (i.e., finite disjoint unions of elements of $\mathcal{B}$).
Let $T^{-1}(A) = A$. Let $\epsilon > 0$ and $A'$ be in the algebra generated by $\mathcal{B}$ s.t.
$$\mu(A \,\Delta\, A') < \epsilon. \qquad (2)$$
Then by the triangle inequality and the fact $T^{-1}(A) = A$,
$$|\mu(A)^2 - \mu(A)| \le |\mu(A)^2 - \mu(A')^2| + \Big|\mu(A')^2 - \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}(A') \cap A')\Big|$$
$$+ \Big|\frac{1}{n}\sum_{i=0}^{n-1} \big(\mu(T^{-i}(A') \cap A') - \mu(T^{-i}(A) \cap A')\big)\Big|$$
$$+ \Big|\frac{1}{n}\sum_{i=0}^{n-1} \big(\mu(T^{-i}(A) \cap A') - \mu(T^{-i}(A) \cap A)\big)\Big|$$
$$\le 2|\mu(A) - \mu(A')| + \epsilon + \epsilon + \epsilon < 5\epsilon,$$
where, in the last expression, the first occurrence of $\epsilon$ comes from iv, applied to the algebra generated by $\mathcal{B}$, if $n$ is sufficiently large, and the last two occurrences of $\epsilon$ come from (2), the fact that $T$ is MPT, and the triangle inequality.
Thus, $\mu(A)^2 - \mu(A) = 0$, i.e., $\mu(A) \in \{0, 1\}$.
Note: If $T$ is IMPT, can swap $T$ with $T^{-1}$ everywhere.
Characterize ergodicity of stationary finite-state first-order Markov chains (MC), one-sided or two-sided: $(F^{\mathbb{Z}^+}, \text{Borel}, \mu)$, where $\mu$ is the law of the MC and $T = \sigma$.
MC defined by an $m \times m$ stochastic matrix $P$ and stochastic vector $\pi$ with $\pi P = \pi$; states of the chain are the $m$ indices of the matrix.
The law: define $A_{a_0,\ldots,a_\ell} := \{x \in F^{\mathbb{Z}^+} : x_0 = a_0, \ldots, x_\ell = a_\ell\}$; then
$$\mu(A_{a_0,\ldots,a_\ell}) = \pi_{a_0} P_{a_0 a_1} P_{a_1 a_2} \cdots P_{a_{\ell-1} a_\ell}$$
extends to a σ-invariant measure $\mu$ on $F^{\mathbb{Z}^+}$.
WMA: $\pi > 0$ (delete states with zero stationary probability; does not affect the MPT).
Main Fact:
$$\mu(\sigma^{-k}(A_j) \cap A_i) = \pi_i (P^k)_{ij}, \quad \text{where } A_i = \{x : x_0 = i\}.$$
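The Main Fact can be verified by brute force on a small chain: sum the cylinder measures of all words that start at $i$ and are at $j$ at time $k$, and compare with $\pi_i (P^k)_{ij}$. This is only a sketch; it uses the $3 \times 3$ matrix that appears as Example 1 in these notes, the stationary vector $\pi = (1/6, 1/2, 1/3)$ was computed by hand for this illustration, and states are indexed from 0 in the code rather than from 1.

```python
from fractions import Fraction as F
from itertools import product

P = [[F(0), F(1), F(0)],
     [F(1, 3), F(0), F(2, 3)],
     [F(0), F(1), F(0)]]
pi = [F(1, 6), F(1, 2), F(1, 3)]   # satisfies pi P = pi

def cylinder_measure(word):
    """mu(A_{a_0,...,a_l}) = pi_{a_0} P_{a_0 a_1} ... P_{a_{l-1} a_l}."""
    prob = pi[word[0]]
    for s, t in zip(word, word[1:]):
        prob *= P[s][t]
    return prob

def joint_by_enumeration(i, j, k):
    """mu(sigma^{-k}(A_j) & A_i): sum over all words with x_0 = i and x_k = j."""
    if k == 0:
        return pi[i] if i == j else F(0)
    m = len(P)
    return sum(cylinder_measure((i, *mid, j))
               for mid in product(range(m), repeat=k - 1))

def mat_power(M, k):
    """M^k by repeated exact multiplication."""
    m = len(M)
    R = [[F(int(r == c)) for c in range(m)] for r in range(m)]
    for _ in range(k):
        R = [[sum(R[r][t] * M[t][c] for t in range(m)) for c in range(m)]
             for r in range(m)]
    return R

print(joint_by_enumeration(0, 2, 3), pi[0] * mat_power(P, 3)[0][2])
```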
Defn: $P$ is irreducible if for all $i, j$, there exists $n = n(i, j)$ s.t. $(P^n)_{ij} > 0$.
Defn: directed graph $G = G(P)$ of $P$:
$V = \{1, \ldots, m\}$; directed edge from $i$ to $j$ iff $P_{ij} > 0$.
Note: $(P^n)_{ij} > 0$ iff there exists a path in $G$ of length $n$ from $i$ to $j$; hence $P$ is irreducible iff $G$ is strongly connected.
Note: if $P$ is positive entry-by-entry, it is irreducible.
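Example 0:
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
is irreducible.
Example 1:
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/3 & 0 & 2/3 \\ 0 & 1 & 0 \end{pmatrix}$$
is irreducible.
Example 2:
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1/3 & 2/3 \\ 0 & 1/4 & 3/4 \end{pmatrix}$$
is reducible.
Example 3:
$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 2/3 \\ 0 & 1/4 & 3/4 \end{pmatrix}$$
is reducible.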
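The four examples can be checked mechanically via the graph criterion: $P$ is irreducible iff every vertex of $G(P)$ can reach every other vertex. The following is a minimal sketch using a plain depth-first search; the function name `is_irreducible` is a choice made here, and states are indexed from 0.

```python
from fractions import Fraction as F

def is_irreducible(P):
    """True iff G(P) (edge i -> j iff P[i][j] > 0) is strongly connected."""
    m = len(P)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in range(m):
                if P[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return seen

    return all(len(reachable(i)) == m for i in range(m))

examples = [
    [[0, 1], [1, 0]],                                                   # Example 0
    [[0, 1, 0], [F(1, 3), 0, F(2, 3)], [0, 1, 0]],                      # Example 1
    [[1, 0, 0], [0, F(1, 3), F(2, 3)], [0, F(1, 4), F(3, 4)]],          # Example 2
    [[F(1, 3), F(1, 3), F(1, 3)], [0, F(1, 3), F(2, 3)],
     [0, F(1, 4), F(3, 4)]],                                            # Example 3
]
results = [is_irreducible(P) for P in examples]
print(results)
```

In Examples 2 and 3 the second and third states cannot reach the first, which is why the search from those states does not cover all of $V$.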
Theorem: An MC is ergodic iff P is irreducible.
Lecture 9:
Recall that $T$ is ergodic iff:
iv. Let $\mathcal{B}$ be a semi-algebra which generates $\mathcal{A}$. For all $A, B \in \mathcal{B}$,
$$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(T^{-k}(A) \cap B) = \mu(A)\mu(B).$$
In the following, $P$ is a stochastic matrix, $\pi$ is a stochastic vector with all strictly positive entries and $\pi P = \pi$ ($P$ is not necessarily irreducible).
Recall: $\pi_i (P^k)_{ij} = \mu(T^{-k}(A_j) \cap A_i)$, where $A_i = \{x : x_0 = i\}$.
Lemma:
$$Q := \lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} P^k$$
exists (entry by entry) and
$$Q_{ij} = \frac{1}{\pi_i}\int \chi^*_{A_j}\, \chi_{A_i}\,d\mu.$$
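Proof:
$$Q_{ij} = \frac{1}{\pi_i}\lim_{n \to \infty}\frac{1}{n}\,\pi_i \sum_{k=0}^{n-1} (P^k)_{ij} = \frac{1}{\pi_i}\lim_{n \to \infty}\frac{1}{n}\sum_{k=0}^{n-1} \mu(\sigma^{-k}(A_j) \cap A_i)$$
$$= \frac{1}{\pi_i}\lim_{n \to \infty}\frac{1}{n}\sum_{k=0}^{n-1} \int (\chi_{A_j} \circ \sigma^k)\,\chi_{A_i}\,d\mu = \frac{1}{\pi_i}\int \Big(\lim_{n \to \infty}\frac{1}{n}\sum_{k=0}^{n-1} \chi_{A_j} \circ \sigma^k\Big)\chi_{A_i}\,d\mu$$
$$= \frac{1}{\pi_i}\int \chi^*_{A_j}\,\chi_{A_i}\,d\mu$$
(by the ergodic theorem and the bounded convergence theorem).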
Note that the proof assumes that there exists a strictly positive π
that accompanies P and gives the same result for all such π.
Exercise: interpret χ∗Ai concretely in terms of the Markov chain.
Alternate proof: use linear algebra and the facts that no eigenvalue of $P$ has modulus > 1 and that the Jordan form is trivial for eigenvalues of modulus 1 (follows from the Perron-Frobenius Theorem).
Corollary:
– $QP = Q = PQ$
– $Q^2 = Q$
– $\pi Q = \pi$
– $Q$ is stochastic
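The Lemma and Corollary can be checked exactly on Example 0, where the powers $P^k$ alternate between $I$ and $P$ and so do not converge, while the Cesàro averages do. This is a sketch with exact rational arithmetic; for this particular $P$ and even $n$, the finite average already equals the limit $Q$. The helper names are choices made here.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Exact product of two square matrices given as lists of rows."""
    m = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(m)) for c in range(m)]
            for r in range(m)]

def cesaro_average(P, n):
    """(1/n)(P^0 + P^1 + ... + P^(n-1)), computed entry by entry."""
    m = len(P)
    power = [[F(int(r == c)) for c in range(m)] for r in range(m)]  # P^0 = I
    total = [[F(0)] * m for _ in range(m)]
    for _ in range(n):
        total = [[total[r][c] + power[r][c] for c in range(m)] for r in range(m)]
        power = matmul(power, P)
    return [[entry / n for entry in row] for row in total]

P = [[F(0), F(1)], [F(1), F(0)]]   # Example 0
pi = [F(1, 2), F(1, 2)]            # its stationary vector
Q = cesaro_average(P, 1000)

print(Q)
print(matmul(Q, P) == Q == matmul(P, Q), matmul(Q, Q) == Q)
```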
Theorem TFAE:
1. The MC defined by π, P is ergodic
2. For all i, j, Qij = πj
3. P is irreducible.
Proof:
1 ⇔ 2:
Let $A_i$ and $A_j$ be initial cylinder sets.
Since $\pi_i (P^k)_{ij} = \mu(T^{-k}(A_j) \cap A_i)$,
$Q_{ij} = \pi_j$ iff $\frac{1}{n}\sum_{k=0}^{n-1} (P^k)_{ij} \to \pi_j$
iff $\frac{1}{n}\sum_{k=0}^{n-1} \mu(T^{-k}(A_j) \cap A_i) \to \mu(A_i)\mu(A_j)$.
So, (2) holds iff condition (iv) for ergodicity holds for all cylinder sets $A_i, A_j$.
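Exercise: extend “only if” to general initial cylinder sets.
Let $A = A_{a_0,\ldots,a_\ell}$, so
$$\mu(A) = \pi_{a_0} P_{a_0 a_1} P_{a_1 a_2} \cdots P_{a_{\ell-1} a_\ell}.$$
Let $B = A_{b_0,\ldots,b_u}$, so
$$\mu(B) = \pi_{b_0} P_{b_0 b_1} P_{b_1 b_2} \cdots P_{b_{u-1} b_u},$$
and for suff. large $k$ (namely $k > u$),
$$\mu(\sigma^{-k}(A) \cap B) = \pi_{b_0} P_{b_0 b_1} \cdots P_{b_{u-1} b_u}\, (P^{k-u})_{b_u, a_0}\, P_{a_0 a_1} \cdots P_{a_{\ell-1} a_\ell}.$$
Thus, $\frac{1}{n}\sum_{k=0}^{n-1} \mu(\sigma^{-k}(A) \cap B)$ converges to
$$\pi_{b_0} P_{b_0 b_1} \cdots P_{b_{u-1} b_u}\, \pi_{a_0} P_{a_0 a_1} \cdots P_{a_{\ell-1} a_\ell} = \mu(A)\mu(B).$$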
2 ⇒ 3: $Q > 0$ and so by defn. of $Q$, for all $i, j$ there exists $n$ s.t. $(P^n)_{ij} > 0$.
3 ⇒ 2: $Q = QP = QP^2 = QP^n$ for all $n$.
Fix $i, j$. Show $Q_{ij} > 0$.
– $Q_{ij} = \sum_k Q_{ik} (P^n)_{kj}$
– Since $Q$ is stochastic, there exists $k$ s.t. $Q_{ik} > 0$.
– Since $P$ is irreducible, there exists $n$ s.t. $(P^n)_{kj} > 0$.
– Thus, $Q_{ij} > 0$.
Since $Q = Q^2$, $Q_{ij} = \sum_k Q_{ik} Q_{kj}$.
Since $Q$ is stochastic, $Q_{ij}$ is a weighted average, with strictly positive weights, of $Q_{1j}, Q_{2j}, \ldots, Q_{mj}$. But this holds for all $i$. Thus, $Q_{ij}$ depends only on $j$: $Q_{ij} = q_j$.
But $\pi Q = \pi$. Thus, $\pi_j = \sum_k \pi_k Q_{kj} = \sum_k \pi_k q_j = q_j$.
Note: This result shows that for irreducible $P$, the stationary vector is unique (but this also follows from linear algebra, Perron-Frobenius Theorem).
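The theorem can also be seen in simulation: for an irreducible $P$, the chain is ergodic, so along a typical sample path the fraction of time spent in state $j$ should approach $\pi_j$. The sketch below simulates one path of the chain from Example 1; the starting state, seed and path length are arbitrary assumptions, and a single finite path only approximates the almost-sure limit. States are indexed from 0.

```python
import random

P = [[0.0, 1.0, 0.0],
     [1 / 3, 0.0, 2 / 3],
     [0.0, 1.0, 0.0]]          # Example 1
PI = [1 / 6, 1 / 2, 1 / 3]     # its stationary vector (pi P = pi)

def visit_frequencies(P, start, n, seed=0):
    """Fraction of the first n steps that one simulated path spends in each state."""
    rng = random.Random(seed)
    states = range(len(P))
    counts = [0] * len(P)
    state = start
    for _ in range(n):
        counts[state] += 1
        state = rng.choices(states, weights=P[state])[0]
    return [c / n for c in counts]

freqs = visit_frequencies(P, start=0, n=200_000)
print(freqs, "vs", PI)
```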
Defn: An MPT $T$ is mixing if for all $A, B \in \mathcal{A}$,
$$\mu(T^{-n}(A) \cap B) \to \mu(A)\mu(B).$$
Fact: For an IMPT, can replace $T^{-n}$ with $T^n$ in the defn. of mixing.
Fact: Mixing implies ergodicity, since convergence implies Cesàro convergence (see condition (iv) of earlier Theorem).
– For an ergodic MPT, the orbit of a set of positive measure gets
uniformly distributed.
– For a mixing MPT, the time-n images of sets of positive measure
persistently get uniformly distributed.
Fact (exercise): to check mixing, it suffices to check mixing on
a generating semi-algebra (e.g., initial cylinder sets for stationary
processes); an approximation argument.
Fact: Mixing is an isomorphism invariant; in fact, if there is an MP homomorphism from an MPT $T$ to an MPT $S$ and $T$ is mixing, then so is $S$.
Fact: If $T$ is mixing, given any finite collection of sets of positive measure, $A_1, \ldots, A_m$, $\exists n\ \forall i, j$: $\mu(T^{-n}(A_j) \cap A_i) > 0$ (and therefore $T^{-n}(A_j)$ and $A_i$ are not disjoint).
Note: blob picture of mixing.
Example: Rotation $T_\alpha$ of the circle is never mixing.
Proof: Partition the circle into three disjoint (half-open) arcs, $A, B, C$, of the same size. Then for all $n$, $T^n(A)$ is disjoint from one of $A, B, C$.
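The proof can be checked numerically. The sketch below identifies the circle with $[0, 1)$, takes $A, B, C$ to be the arcs $[0, 1/3)$, $[1/3, 2/3)$, $[2/3, 1)$, and computes the length of $T^n(A) \cap X$ for each arc $X$; the angle `alpha` is an arbitrary assumed example and the function names are choices made here. Mixing would require all three overlaps to tend to $1/9$, but one of them vanishes for every $n$.

```python
import math

THIRD = 1.0 / 3.0

def overlap_with_arc(s, j):
    """Length of [s, s + 1/3) mod 1 intersected with X_j = [j/3, (j+1)/3)."""
    total = 0.0
    for shift in (0.0, 1.0):   # unwrap: compare with X_j and with X_j + 1
        lo, hi = j * THIRD + shift, (j + 1) * THIRD + shift
        total += max(0.0, min(s + THIRD, hi) - max(s, lo))
    return total

def overlaps_after(alpha, n):
    """Lengths of T^n(A) & X_j for j = 0, 1, 2, where A = X_0, T = rotation by alpha."""
    s = (n * alpha) % 1.0
    return [overlap_with_arc(s, j) for j in range(3)]

alpha = math.sqrt(2) - 1   # assumed example rotation (as a fraction of a full turn)
smallest = [min(overlaps_after(alpha, n)) for n in range(1, 501)]
print(max(smallest))
```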
Example: iid process (one-sided or two-sided) is mixing because of independence (on cylinder sets).
Example: It follows that doubling map and Baker are mixing
(since mixing is an isomorphism invariant).
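The notes deduce mixing of the doubling map from the isomorphism with an iid process. As a complementary sanity check (a different route, not the argument of the notes), one can compute $\mu(T^{-n}(A) \cap B)$ exactly for intervals, using $T^{-n}[a, b) = \bigcup_{j < 2^n} [(j+a)/2^n, (j+b)/2^n)$ for $T(x) = 2x \bmod 1$. The intervals chosen below are arbitrary assumptions for the example.

```python
from fractions import Fraction as F

def interval_overlap(lo1, hi1, lo2, hi2):
    """Length of [lo1, hi1) intersected with [lo2, hi2)."""
    return max(F(0), min(hi1, hi2) - max(lo1, lo2))

def correlation(a, b, c, d, n):
    """mu(T^{-n}[a, b) & [c, d)) for the doubling map T(x) = 2x mod 1."""
    N = 2 ** n
    return sum(interval_overlap((j + a) / N, (j + b) / N, c, d) for j in range(N))

a, b = F(0), F(1, 3)        # A = [0, 1/3)
c, d = F(1, 5), F(7, 10)    # B = [1/5, 7/10)
target = (b - a) * (d - c)  # mu(A) mu(B)
for n in range(1, 13):
    print(n, float(correlation(a, b, c, d, n) - target))
```

The differences shrink at least like $2 \cdot 2^{-n}$, because $B$ is a union of full dyadic cells of length $2^{-n}$ (on each of which $T^{-n}(A)$ has relative measure exactly $\mu(A)$) plus at most two partial cells.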
Picture of Baker.