Lecture 26: Nov. 14
SMB Theorem
Let T be an ergodic MPT and α a (finite, measurable) partition.
Then

(1/n) I_{α_0^n} → h(T, α) = ∫ I_{α|α_1^∞} dµ = H(α|α_1^∞)   a.e. and in L^1
Shannon-1948: irreducible stationary Markov chains; convergence
in probability
McMillan-1953: ergodic stationary processes; convergence in L1
Breiman-1957: ergodic stationary processes; convergence a.e.
Proof:
Recall:
— α(x) defined by x ∈ A_{α(x)}.
— I_α(x) = − log µ(A_{α(x)})
— I_{α|β}(x) = − log µ(A_{α(x)} | B_{β(x)})
Lemma 1: I_{α∨β} = I_β + I_{α|β}
Proof:
I_{α∨β}(x) = − log µ(A_{α(x)} ∩ B_{β(x)}) = − log [µ(A_{α(x)} ∩ B_{β(x)}) / µ(B_{β(x)})] − log µ(B_{β(x)}) = I_{α|β}(x) + I_β(x)
Lemma 2: I_α ∘ T = I_{T^{−1}(α)}
Proof:
I_α ∘ T(x) = − log µ(A_{α(T(x))}) = − log µ(A_{T^{−1}(α)(x)}) = I_{T^{−1}(α)}(x)
Thus, I_{α∨T^{−1}α} = I_{T^{−1}(α)} + I_{α|T^{−1}α} = I_{α|T^{−1}α} + I_α ∘ T
By induction (with the convention I_{α|α_1^0} = I_α):

I_{α_0^n} = I_{α|α_1^n} + I_{α|α_1^{n−1}} ∘ T + · · · + I_{α|α_1^1} ∘ T^{n−1} + I_α ∘ T^n = Σ_{k=0}^{n} I_{α|α_1^{n−k}} ∘ T^k

(1/n) I_{α_0^n} = (1/n) Σ_{k=0}^{n} I_{α|α_1^{n−k}} ∘ T^k
= (1/n) Σ_{k=0}^{n} I_{α|α_1^∞} ∘ T^k + (1/n) Σ_{k=0}^{n} ( I_{α|α_1^{n−k}} ∘ T^k − I_{α|α_1^∞} ∘ T^k )
= R_n + S_n
Since I_{α|α_1^∞} ∈ L^1, with

∫ I_{α|α_1^∞} dµ = H(α|α_1^∞)

we can apply the Ergodic Theorem to get

R_n → H(α|α_1^∞)   a.e. and in L^1
So, it suffices to show that Sn → 0 a.e. and in L1.
(Note: In IID case, Sn = 0)
For L^1 convergence, observe:

∫ |S_n| dµ ≤ (1/n) Σ_{k=0}^{n} ‖ I_{α|α_1^{n−k}} ∘ T^k − I_{α|α_1^∞} ∘ T^k ‖_1 = (1/n) Σ_{k=0}^{n} ‖ I_{α|α_1^{n−k}} − I_{α|α_1^∞} ‖_1

which converges to 0 by L^1 convergence in the theorem on continuity of conditional information.
(in fact, we only need the Cesaro convergence in L1 from that
theorem).
Remains to prove:
Sn → 0 a.e.
(the interesting part).
Proof: Let h_N = sup_{n≥N} | I_{α|α_1^n} − I_{α|α_1^∞} |.
By the theorem on continuity of conditional entropy,
h_N → 0 a.e.
and
h_N ≤ 2f^*
where f^* = sup_n I_{α|α_1^n} ∈ L^1.
By DCT,
∫ h_N dµ → 0
Fix N. Since for k ≤ n − N, | I_{α|α_1^{n−k}} − I_{α|α_1^∞} | ≤ h_N,

lim sup |S_n| ≤ lim sup ( (1/n) Σ_{k=0}^{n−N} h_N ∘ T^k + (1/n) Σ_{k=n−N+1}^{n} h_1 ∘ T^k ) = lim sup (U_n + V_n)

Let n → ∞. By the ergodic theorem applied to h_N, U_n → ∫ h_N dµ a.e.
And V_n consists of the last N terms of (1/n) Σ_{k=0}^{n} h_1 ∘ T^k, which converges by the ergodic theorem. So V_n → 0 a.e. Thus,
lim sup |S_n| ≤ ∫ h_N dµ.
Let N → ∞. Thus, lim sup |Sn| = 0 a.e. QED
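As a quick numerical sanity check on the theorem, consider the iid case (where S_n = 0 and the convergence reduces to the strong law of large numbers): for a Bernoulli(p) process, µ of an n-cylinder is p^k (1−p)^{n−k}, and the empirical information rate should approach H(p). A minimal Python sketch (the function names and parameters are ours, not from the notes):

```python
import math
import random

def smb_estimate(p, n, seed=0):
    """-(1/n) log mu(x_0 ... x_{n-1}) along one sample path of an
    iid Bernoulli(p) process; mu(cylinder) = p^k (1-p)^(n-k)."""
    rng = random.Random(seed)
    k = sum(1 for _ in range(n) if rng.random() < p)  # number of 1's
    return -(k * math.log(p) + (n - k) * math.log(1 - p)) / n

p = 0.3
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # H(alpha) = h(T, alpha) for iid
est = smb_estimate(p, 200_000)
print(est, H)  # the two numbers should be close
```

With n = 200,000 samples the estimate is within a few thousandths of H(p).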
Recall:
h(T, α) = lim_n (1/n) H(∨_{i=0}^{n} α_i)

h(T) = sup_α h(T, α)
Fact: h(T) is an isomorphism invariant, i.e. if T and S are isomorphic, then h(T) = h(S).
Proof: An isomorphism φ : M → N sets up a bijection between finite measurable partitions of M and those of N:
α ↦ φ(α),
h(T, α) = h(S, φ(α))
Prop 1: Let T be an MPT, α a (finite, measurable) partition and α_i = T^{−i}(α). Then for all m,

h(T, α) = h(T, ∨_{i=0}^{m} α_i)

Proof:
h(T, ∨_{i=0}^{m} α_i) = lim_n (1/n) H(∨_{i=0}^{n+m} α_i) = lim_n (1/n) H(∨_{i=0}^{n} α_i) = h(T, α)
Prop 2: Let T be an MPT, α, β (finite, measurable) partitions. Then
h(T, β) ≤ h(T, α) + H(β|α)
Proof:

H(∨_{j=0}^{n} β_j) ≤ H( (∨_{j=0}^{n} β_j) ∨ (∨_{i=0}^{n} α_i) ) = H(∨_{i=0}^{n} α_i) + H(∨_{j=0}^{n} β_j | ∨_{i=0}^{n} α_i)

H(∨_{j=0}^{n} β_j | ∨_{i=0}^{n} α_i) ≤ Σ_{j=0}^{n} H(β_j | ∨_{i=0}^{n} α_i) ≤ Σ_{j=0}^{n} H(β_j | α_j)

Thus,

H(∨_{j=0}^{n} β_j) ≤ H(∨_{i=0}^{n} α_i) + Σ_{j=0}^{n} H(β_j | α_j) = H(∨_{i=0}^{n} α_i) + (n+1) H(β | α)

Thus, (1/n) H(∨_{j=0}^{n} β_j) ≤ (1/n) H(∨_{i=0}^{n} α_i) + ((n+1)/n) H(β | α).
Let n → ∞.
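The single-step inequality driving Prop 2 is the distributional fact H(β) ≤ H(α) + H(β|α), together with the chain rule H(α ∨ β) = H(α) + H(β|α). A quick numerical sketch with a random joint distribution (helper names are ours):

```python
import math
import random

def H(p):
    """Entropy of a probability vector (nats)."""
    return -sum(x * math.log(x) for x in p if x > 0)

rng = random.Random(1)
joint = [[rng.random() for _ in range(4)] for _ in range(3)]
s = sum(map(sum, joint))
joint = [[x / s for x in row] for row in joint]           # joint pmf p(a, b)

pA = [sum(row) for row in joint]                          # marginal of alpha
pB = [sum(joint[a][b] for a in range(3)) for b in range(4)]
H_B_given_A = sum(pA[a] * H([joint[a][b] / pA[a] for b in range(4)])
                  for a in range(3))
print(H(pB), H(pA) + H_B_given_A)   # H(beta) <= H(alpha) + H(beta|alpha)
```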
Lecture 27: Nov. 16
Recall:
Prop 1: Let T be an MPT, α a (finite, measble.) partition and
αi = T −i(α). Then for all m,
h(T, α) = h(T, ∨m
i=0 αi )
Prop 2: Let T be an MPT, α, β (finite, measble.) partitions. Then
h(T, β) ≤ h(T, α) + H(β|α)
Notation: ∨_{i=0}^{∞} α_i = σ(∪_{i=0}^{∞} α_i)
Defn: A 1-sided generator for an MPT T on (M, A, µ) is a (finite, measurable) partition α such that
∨_{i=0}^{∞} α_i = A
Theorem (Kolmogorov-Sinai, 1958-1959). Let α be a 1-sided generator for an MPT T . Then
h(T ) = h(T, α)
(and so we have some hope of computing h(T )).
Kolmogorov - 1958: slightly different notion, for a restricted class of processes; not quite right; but in a 1957 lecture gave the correct defn. for iid
Sinai - 1959: correct definition, completely general
Proof: Enough to show that for any (finite, measble.) partition β,
we have h(T, β) ≤ h(T, α).
By Prop 2 applied to ∨_{i=0}^{n} α_i and then Prop. 1, for all n,
h(T, β) ≤ h(T, ∨_{i=0}^{n} α_i) + H(β | ∨_{i=0}^{n} α_i) = h(T, α) + H(β | ∨_{i=0}^{n} α_i)
But σ(∨_{i=0}^{n} α_i) ↑ A. By continuity of entropy,
H(β | ∨_{i=0}^{n} α_i) → H(β|A)
But since β ⊂ A, we have, by Property 3, H(β|A) = 0. Thus,
h(T, β) ≤ h(T, α).
Example: If T is the MPT corresponding to a one-sided stationary
process X, then
h(T ) = h(X)
Proof: Recall that h(T, α) = h(X) where α is the partition A_i = {x ∈ F^{Z^+} : x_0 = i}.
Clearly α is a 1-sided generator.
So,
h(T_{iid+(p)}) = H(p)
h(T_{iid+(1/2,1/2)}) = log 2,
h(T_{iid+(1/3,1/3,1/3)}) = log 3
h(T_{Doubling}) = log 2.
Show one-sided generator for doubling map.
For Markov P with stationary vector π:

h(T_{P^+}) = − Σ_{ij} π_i p_{ij} log p_{ij}
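For a concrete check of this formula, here is a short sketch (helper names ours) computing h for the 2-state chain with P = [[1/3, 2/3], [1, 0]] and stationary vector π = [3/5, 2/5], which appears again later in these notes:

```python
import math

def markov_entropy(P, pi):
    """h = - sum_{i,j} pi_i P_ij log P_ij for a stationary Markov chain."""
    return -sum(pi[i] * pij * math.log(pij)
                for i, row in enumerate(P) for pij in row if pij > 0)

P = [[1/3, 2/3], [1.0, 0.0]]
pi = [3/5, 2/5]               # stationary: pi P = pi
h = markov_entropy(P, pi)
expected = (3/5) * (-(1/3) * math.log(1/3) - (2/3) * math.log(2/3))
print(h, expected)
```

Here only state 0 contributes (rows with a single 1 contribute zero entropy), so h = (3/5) H(1/3, 2/3).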
Theorem: Let T be an invertible MPT. If T has a 2-sided generator α, i.e.,
∨_{i=−∞}^{∞} α_i = A
then h(T) = h(T, α).
Proof: In Prop. 1, replace ∨_{i=0}^{m} α_i with ∨_{i=−m}^{m} α_i.
Example: If T is the MPT corresponding to a two-sided stationary
process X, then
h(T ) = h(X)
So,
h(T_{iid(p)}) = H(p)
h(T_{iid(1/2,1/2)}) = log 2,
h(T_{iid(1/3,1/3,1/3)}) = log 3
Thus, these MPT’s are not isomorphic.
Existence of two-sided generators (Krieger, 1970): if T is an invertible ergodic MPT and h(T) is finite, then it has a two-sided generator with at most ⌊e^{h(T)}⌋ + 1 sets.
Note: this bound is necessary:
Proof: If α is a two-sided generator, then
h(T) = h(T, α) = H(α|α_1^∞) ≤ H(α) ≤ log |α|
Thus
e^{h(T)} ≤ |α|
Thus,
⌈e^{h(T)}⌉ ≤ |α|
The LHS is ⌊e^{h(T)}⌋ + 1 unless e^{h(T)} is an integer.
But we will see an example where e^{h(T)} is an integer k and there is no generator with k sets; namely h(T) = 0 (rotation of the circle).
Prop: h(T, α) = 0 iff α ⊂ α1∞.
Proof: h(T, α) = H(α|α1∞).
But Property 3 says that RHS is 0 iff α ⊂ α1∞.
Interpretation in terms of predictability (determinism).
Lecture 28: Nov. 18
Recall:
Prop: h(T, α) = 0 iff α ⊂ α1∞.
Interpretation in terms of predictability (determinism).
Cor: h(T ) = 0 iff α ⊂ α1∞ for all α.
Prop: If T is invertible and has a 1-sided generator, then h(T ) = 0.
Proof:
α ⊂ A = T^{−1}(A) = T^{−1}(α_0^∞) = α_1^∞
So, we do not generally expect to have one-sided generators for
invertible MPT’s.
Example 1:
For the Baker's transformation, h(T) = log 2, and since Baker is invertible, it has no 1-sided generator. Show 2-sided generator.
Example 2:
Irrational rotation (where T_r is rotation of the circle by angle r, with r/(2π) ∉ Q).
Claim: h(T_r) = 0
Let a = e^{ir}, so
T_r(z) = az
Partition α = {A_up, A_down}: the upper and lower semicircles bounded by 1 and −1. Then T^{−n}(α) consists of the semicircles bounded by a^{−n} and −a^{−n}. But {a^{−n}}_{n∈Z^+} is dense (note: −r is just as irrational as r).
Thus, all arcs belong to α_0^∞. Thus, α_0^∞ = A. QED
Note: T_r has a two-sided generator with 2 sets, but not with 1 set. Thus, in this case, we need more than ⌈e^{h(T)}⌉ sets in a two-sided generator (compare with the Krieger generator theorem).
Example: For a rational rotation Tr , i.e., r/(2π) = p/q ∈ Q,
h(Tr ) = 0.
Proof: T^q = I. For any partition α, α_0^{nq−1} = α_0^{q−1}. Thus,
h(T, α) = lim_n (1/(nq)) H(α_0^{nq−1}) = lim_n (1/(nq)) H(α_0^{q−1}) = 0
for every α, so h(T_r) = 0.
Note: there is no finite generator for Tr , r rational (non-ergodic,
compare with Krieger generator theorem)
Note: More generally, any rotation on a compact group has zero
entropy.
Defn: Kolmogorov (K) MPT
Invertible and:
There exists a sub-sigma-algebra B s.t.
1. B ⊂ T(B)
2. ∨_{n=0}^{∞} T^n(B) = A
3. ∩_n T^{−n}(B) = {∅, M} (mod 0)
Think of B as α_0^∞.
i.e., condition 3 says that B satisfies the Kolmogorov 0-1 law.
Deep Theorem (Kolmogorov, Rohlin): Let T be invertible. Then
T is K iff for all non-trivial α, h(T, α) > 0.
K-MPT's are extreme opposites of zero entropy MPT's:
For all non-trivial α, h(T, α) > 0 vs. for all α, h(T, α) = 0
At one time, it was conjectured that every invertible MPT is a
direct product of a K MPT and a zero entropy MPT. This is false,
but there is a weaker notion of product for which this is true.
Deep Theorem (Ornstein 1969) Tiid(p) ∼ Tiid(q) iff H(p) = H(q).
Defn: An MPT is B (Bernoulli) if it is isomorphic to T_{iid(p)} for some p.
Note that B ⊆ K by virtue of the Kolmogorov 0-1 Law.
At one time, it was conjectured that B = K, but this is false.
Lecture 29: Nov. 21
– Posted Exercises 3 and 4 on website
– Student talks: reserve Math 126 for every weekday Dec. 1 - 15,
2-3:30
– Teaching evaluations, enrolled students.
– Problem of isomorphism of MPT’s is unsolved, except for special
classes
– Lebesgue spaces are quite general
Defn: A topological dynamical system (TDS) is a continuous map
on a compact metric space T : M → M .
Defn: An invertible top dyn sys (ITDS) is a homeomorphism T : M → M (equivalently, a bijective continuous map, since M is compact).
Orbit:
—- If non-invertible, {T n(x)}n≥0
—- If invertible, {T n(x)}n∈Z
Iterate dynamics, just as in ergodic theory.
Analogue of ergodicity:
Defn: Topologically Transitive: for all nonempty open A, B, there
exists n > 0 s.t. T −n(A) ∩ B 6= ∅.
Fact: Top. transitive iff there exists a point with a forward dense orbit, i.e., {T^n(x)}_{n≥0} is dense in M; sometimes we say that x is a dense orbit (in fact, there is then a dense G_δ of points with dense orbits).
Proof:
Only if: Let {A_m} be a basis for the topology. Then
∩_{m=1}^{∞} ( ∪_{n=1}^{∞} T^{−n}(A_m) )
is a countable intersection of dense open sets.
If: Let x ∈ B have a forward dense orbit. Then x ∈ T^{−n}(A) ∩ B for some n ≥ 0.
Analogue of mixing:
Defn: Topologically mixing: for all nonempty open A, B, there
exists N > 0 s.t. for all n ≥ N , T −n(A) ∩ B 6= ∅.
Note: suffices to check Top. mixing and Top. transitive on a basis
of open sets.
Note that if there is a T-invariant Borel probability measure µ which is strictly positive on all nonempty open sets, then:
— Ergodicity (wrt µ) implies Top. Transitive.
— Mixing (wrt µ) implies Top. Mixing.
Defn: Periodic point: T p(x) = x (x is periodic with period p;
sometimes we say that the orbit is periodic).
e.g., p = 1 means fixed point.
(no real analogue in ergodic theory)
Examples:
1. circle rotation: z ↦ az, a = e^{ir}, r ∈ [0, 2π)
—– r/(2π) ∈ Q: all orbits are periodic; not top. trans.; therefore, not top. mixing
—– r/(2π) ∉ Q: all orbits are dense; no periodic orbits; top. trans., since ergodic w.r.t. Lebesgue; not top. mixing
2. doubling map on circle: z ↦ z² (e^{iθ} ↦ e^{2iθ}).
this is similar to the doubling map on [0, 1]: x ↦ 2x mod 1, but rescaled.
top. trans. and top. mixing, since the doubling map on [0, 1] is mixing w.r.t. Lebesgue
will see that it has a dense set of periodic points.
(so, a rich mixture of periodic and “ergodic” behaviour).
3. Smale horseshoe with caps (a continuous analogue of Baker’s
transformation).
T : M → M , where M is union of square S and left and right
caps.
T is not onto, but can be extended to a self-homeomorphism of
R2 .
Dynamics:
Let Ω = ∩_{n∈Z} T^n(red ∪ blue).
T has a fixed point x0 in Left Cap.
Claim: If x ∉ Ω, then lim_{n→∞} T^n(x) = x_0.
Proof: black → right cap → left cap → x_0.
If x ∉ Ω, then for some m ∈ Z,
T^m(x) ∈ left cap ∪ black ∪ right cap, and so T^{n+m}(x) → x_0.
QED
Interesting dynamics is on Ω.
Looks like Baker, except it is continuous.
Will see T |Ω is top. mixing, therefore also top. trans., and also
show that it has dense periodic points.
Lecture 30: Nov. 23
Example: Full two-sided Shift
Let F be a finite set. Let M = F Z. Let T = σ, the left shift.
Metric: d(x, y) = 2^{−k} where k is maximal such that
x_{[−k,k]} = y_{[−k,k]}
if x = y, set d(x, y) = 0; if x_0 ≠ y_0, set d(x, y) = 2; this is a metric.
meaning: x and y are close if they agree in a large central block.
the topology generated is the product topology, which is the same
as the topology generated by cylinder sets.
note that agreement in a large central block is equivalent, in the
sense of generating the topology, to agreement in a large block (central or otherwise).
Top. mixing because for A = A_{a_0...a_{ℓ−1}} and B = A_{b_0...b_{u−1}},
σ^{−n}(A) ∩ B ≠ ∅ for n ≥ u.
Thus, also top. transitive.
Periodic points: periodic sequences.
Periodic points are dense: Given x ∈ M, define y = (x_{[−n,n]})^∞, the central (2n+1)-block repeated periodically.
Then y is periodic and d(x, y) ≤ 2^{−n}.
Note: Same holds for one-sided shift.
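A small sketch of the metric and of this periodic approximation (representing points of F^Z as functions Z → F is our own choice; the metric is probed only on a finite window):

```python
def dist(x, y, K=64):
    """d(x, y) = 2^{-k}, k maximal with x_[-k,k] = y_[-k,k];
    x, y are functions Z -> F.  Probes only |i| <= K."""
    if x(0) != y(0):
        return 2.0
    k = 0
    while k < K and x(k + 1) == y(k + 1) and x(-(k + 1)) == y(-(k + 1)):
        k += 1
    return 2.0 ** (-k)

def periodic_approx(x, n):
    """y = (x_[-n,n])^infty: the central (2n+1)-block repeated periodically."""
    block = [x(i) for i in range(-n, n + 1)]
    return lambda i: block[(i + n) % (2 * n + 1)]

# a sample point: x_i = 1 iff i is a nonnegative perfect square
x = lambda i: 1 if i >= 0 and int(i ** 0.5) ** 2 == i else 0
y = periodic_approx(x, 10)
print(dist(x, y))  # y agrees with x on [-10, 10], so d(x, y) <= 2^{-10}
```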
Defn: Topological conjugacy: T ∼ S if there exists homeomorphism φ : M → N s.t. φ ◦ T = S ◦ φ.
(in this case, homeo is equivalent to bijective and continuous).
Invariants of top. conjugacy:
a. Top. Trans.
b. Top. Mixing
c. Density of Periodic points
Claim: Full two-sided shift on F = {R, B} is conjugate to the horseshoe restricted to Ω.
Define φ : Ω → F Z: If x ∈ Ω, define φ(x) = z ∈ F Z, where
zn = R if T n(x) ∈ red
zn = B if T n(x) ∈ blue
Proof of: φ ◦ T = σ ◦ φ:
φ(T x)n = R if T n+1(x) ∈ red
φ(T x)n = B if T n+1(x) ∈ blue
Thus, φ(T x)n = φ(x)n+1 = (σ(φ(x)))n. QED
Lemma: For all N and all sequences z_{−N} ... z_0 ... z_N ∈ {B, R}^{2N+1}, the set
S(z_{−N} ... z_0 ... z_N) = ∩_{n=−N}^{N} T^{−n}(z_n)
– is compact
– nonempty
– has diameter ≈ 2^{−N}
Draw pictures of S(3-blocks)
Proof of 1-1: Let x ∈ Ω and z = φ(x).
∩_{n=−∞}^{∞} T^{−n}(z_n) = {x}
So, x can be uniquely recovered from z = φ(x).
Proof of onto: Let z ∈ {B, R}^Z.
Then ∩_{n=−N}^{N} T^{−n}(z_n) is a nested decreasing sequence of compact sets and thus its infinite intersection is nonempty. In fact, since diameters decrease to zero, its intersection is a single point x ∈ Ω, and z = φ(x).
Proof that φ^{−1} is continuous:
If d(z, z′) is small, then for a large N,
z_{−N} ... z_0 ... z_N = z′_{−N} ... z′_0 ... z′_N.
Now, φ^{−1}(z) ∈ S(z_{−N} ... z_N) and φ^{−1}(z′) ∈ S(z′_{−N} ... z′_N), and so
ρ(φ^{−1}(z), φ^{−1}(z′)) < 2^{−N}.
Note: Ω is a Cantor set.
Defn: Topological factor: T ↓ S if there exists an onto, continuous φ : M → N s.t. φ ∘ T = S ∘ φ.
If T has one of properties a, b or c, and S is a topological factor of T (e.g., via a conjugacy), then so does S.
Example:
Claim: Full one-sided shift on F = {0, 1} factors onto the doubling map on the circle.
φ(x_0x_1 ...) = e^{2πi(.x_0x_1...)}
φ ∘ σ = T ∘ φ:
φ ∘ σ(x_0x_1 ...) = φ(x_1x_2 ...) = e^{2πi(.x_1x_2...)} = e^{2πi·2(.x_0x_1...)} = (φ(x_0x_1 ...))²
φ is continuous: if x_{[0,n]} = y_{[0,n]}, then .x_0x_1 ... and .y_0y_1 ... are close.
onto: every angle is of the form 2π(.x_0x_1 ...).
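The intertwining φ ∘ σ = T ∘ φ can be checked in exact arithmetic on finite binary strings, since doubling an angle with expansion .x_0x_1... mod 1 exactly shifts the bits; a sketch (the sample string is ours):

```python
def angle(bits):
    """theta in [0, 1) with binary expansion .b0 b1 b2 ... (finite, so exact)."""
    return sum(b / 2.0 ** (i + 1) for i, b in enumerate(bits))

x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
theta = angle(x)                 # phi(x), as a point of [0,1) ~ the circle
theta_shifted = angle(x[1:])     # phi(sigma(x))
doubled = (2 * theta) % 1.0      # the doubling map applied to phi(x)
print(doubled, theta_shifted)    # equal: doubling the angle shifts the bits
```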
Note: the full one-sided shift and the doubling map are not top. conjugate because F^{Z^+} and the circle are not homeomorphic (e.g., disconnected vs. connected).
Thus, both doubling map on circle and horseshoe map on Ω are
top. mixing, therefore top. transitive, and have dense periodic
points.
Lecture 31: Nov. 25
Another Example:
Defn: Let C be a square 0-1 matrix (say m × m).
Let F = {0, . . . , m − 1}.
Let M_C = {x ∈ F^Z : C_{x_i,x_{i+1}} = 1 for all i ∈ Z}
Consider the graph of C
σ|_{M_C} is called a Topological Markov Chain (TMC).
Notation: M_C refers both to the set and to σ|_{M_C}.
Examples:
1.
C =
[ 1 1 ]
[ 1 0 ]
Draw graph
Then M_C is the set of all bi-infinite binary sequences such that 11 is forbidden.
2.
C =
[ 1 1 ]
[ 1 1 ]
Then M_C is a full shift.
We may assume: C has no all-zero rows and no all-zero columns.
Recall that we defined irreducibility and primitivity only for stochastic matrices, but the same concepts apply to nonnegative matrices.
Prop:
1. σ on MC is top. transitive iff C is irreducible.
2. σ on MC is top. mixing iff C is primitive.
Proof is nearly identical to proofs of ergodicity and mixing for
irreducible and primitive Markov chains.
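These matrix conditions are easy to test mechanically; a sketch (function names ours) that checks irreducibility via I + C + ... + C^{m−1} > 0 and primitivity by searching powers up to Wielandt's bound (m−1)² + 1 on the exponent:

```python
import numpy as np

def is_irreducible(C):
    """Irreducible iff every i reaches every j, iff I + C + ... + C^{m-1} > 0."""
    m = C.shape[0]
    A = (C > 0).astype(int)
    S, P = np.eye(m, dtype=int), np.eye(m, dtype=int)
    for _ in range(m - 1):
        P = np.minimum(P @ A, 1)   # 0-1 matrix recording paths of each length
        S = np.minimum(S + P, 1)
    return bool((S > 0).all())

def is_primitive(C):
    """Primitive iff C^n > 0 for some n; n <= (m-1)^2 + 1 suffices (Wielandt)."""
    m = C.shape[0]
    A = (C > 0).astype(int)
    P = A.copy()
    for _ in range((m - 1) ** 2 + 1):
        if (P > 0).all():
            return True
        P = np.minimum(P @ A, 1)   # clip to 0-1 to avoid overflow
    return bool((P > 0).all())

golden = np.array([[1, 1], [1, 0]])   # golden-mean TMC: primitive
cycle  = np.array([[0, 1], [1, 0]])   # a 2-cycle: irreducible but not primitive
print(is_primitive(golden), is_irreducible(cycle), is_primitive(cycle))
```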
Reasons for our interest:
a. TMC’s model many interesting smooth dynamical systems, e.g.
partial horseshoe.
b. Prop: TFAE
1. M is the support of a stationary Markov chain (i.e., the smallest closed subset of F^Z that has measure one).
2. M = M_C is a TMC which is the disjoint union of finitely many irreducible TMC's.
3. M = M_C has dense periodic points
Example:
P =
[ 1/3 2/3 ]
[ 1   0  ]
π = [3/5, 2/5].
TOPOLOGICAL ENTROPY:
For simplicity, assume Invertible!
Define: for open covers α, β (possibly infinite!)
α ∨ β = {A_i ∩ B_j : nonempty}
T^{−1}(α) = {T^{−1}(A_i)}
α_m^n = T^{−m}(α) ∨ · · · ∨ T^{−n}(α)
α ≺ β (β refines α): every element of β is a subset of an element of α
Recall: every open cover of a compact space has a finite subcover.
Defn: Let M be compact metric space. Let α be an open cover of
M . Let N (α) be size of smallest subcover. Let H(α) = log N (α).
Easy properties:
– H(α ∨ β) ≤ H(α) + H(β)
– If α ≺ β, then H(α) ≤ H(β).
– H(α) = H(T −1(α)).
Defn:
h(T, α) = lim_{n→∞} (1/n) H(α_0^{n−1})
Prop: The limit exists.
Proof:
Let a_n = H(α_0^{n−1}). Then a_{n+m} ≤ a_n + a_m (a_n is subadditive).
Proof:
α_0^{n+m−1} = α_0^{n−1} ∨ T^{−n}(α_0^{m−1}).
Thus,
a_{n+m} = H(α_0^{n+m−1}) ≤ H(α_0^{n−1}) + H(α_0^{m−1}) = a_n + a_m
QED
Fekete's Lemma: For a subadditive sequence a_n,
lim_n (1/n) a_n = inf_n (1/n) a_n
Rough idea of proof: a_{kn} ≤ k a_n and so a_{kn}/(kn) ≤ a_n/n. So, if the limit exists, then it must be the inf.
Defn:
h(T) = sup_α h(T, α)
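Both the subadditivity of a_n = H(α_0^{n−1}) and Fekete's conclusion can be observed numerically; a sketch using word counts in the golden-mean shift (matrix and ranges are our choices), where the number of admissible words of length n is 1C^{n−1}1:

```python
import math
import numpy as np

C = np.array([[1, 1], [1, 0]], dtype=np.int64)   # golden-mean TMC

def a(n):
    """a_n = H(alpha_0^{n-1}) = log(number of admissible words of length n)."""
    return math.log(np.linalg.matrix_power(C, n - 1).sum())

# subadditivity: a word of length n+m splits into admissible words of lengths n, m
subadditive = all(a(n + m) <= a(n) + a(m) + 1e-12
                  for n in range(1, 15) for m in range(1, 15))

phi = (1 + math.sqrt(5)) / 2
print(subadditive, a(30) / 30, math.log(phi))  # (1/n) a_n approaches inf = log phi
```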
Notation: If T is an ITDS and also has an invariant Borel probability measure µ, then we use h_µ(T) to denote the measure-theoretic entropy and h(T) to denote the topological entropy.
Lecture 32: Nov. 28
Recall defn of h(T, α) and h(T )
For simplicity, assume T is ITDS.
Prop: Topological entropy is an invariant of topological conjugacy,
i.e., if S ≈ T , then h(S) = h(T ).
Want to compute h(T )
Easy facts:
– h(T, α) = h(T, α_m^n).
– If α ≺ β, then h(T, α) ≤ h(T, β).
Defn: Diameter of an open cover is the sup of diameters of its
elements.
Prop: If α_m is a sequence of open covers and diameter(α_m) approaches 0, then h(T, α_m) → h(T).
Proof: Assume h(T) < ∞. Let ε > 0. Let β be an open cover such that h(T, β) > h(T) − ε. Choose M such that for m ≥ M, the diameter of α_m is less than the Lebesgue number of β, and so β ≺ α_m. Thus,
h(T) ≥ h(T, α_m) ≥ h(T, β) > h(T) − ε.
If h(T) = ∞: let N > 0 and β be an open cover such that h(T, β) > N. Continue as above. QED
Example: Let T be rotation of circle. Let αm be cover consisting
of all open intervals of diameter < 1/m.
Then T^{−1}(α_m) = α_m, and so (α_m)_{−n}^{n} = α_m, and so h(T, α_m) = 0.
Thus, h(T) = 0. QED
Defn: A topological generator for an ITDS is a finite open cover α = {A_1, ..., A_m} such that for every sequence (i_k), the intersection ∩_{k=−∞}^{∞} T^{−k}(A_{i_k}) contains at most one point.
Theorem: If α is a top. generator, then h(T ) = h(T, α).
Claim: diam(α_{−n}^{n}) → 0.
Proof of Claim: Use compactness:
If not, then there exist δ > 0 and x_n, y_n in the same element of α_{−n}^{n} with d(x_n, y_n) > δ. Let the x_n, y_n accumulate on x, y. Then x ≠ y. By diagonalization we can find an infinite sequence (i_k) s.t.
x, y ∈ ∩_{k=−∞}^{∞} T^{−k}(A_{i_k}).
Proof of Theorem: By the previous Prop, h(T, α_{−n}^{n}) → h(T).
By the easy Prop., h(T, α_{−n}^{n}) = h(T, α). QED
For the full shift, the standard partition α given by A_i = {x ∈ F^Z : x_0 = i} is a top. generator (in particular it is an open cover).
Thus, h(σ_{F^Z}) = h(σ_{F^Z}, α).
Observe N(α_0^{n−1}) = |F|^n.
Thus, h(σ_{F^Z}) = log |F|
Corollary: the entropy of the horseshoe is log 2 (the complement of the invariant set contributes zero entropy).
The standard partition for a TMC MC is also a top. generator
(for the same reason).
Notation: σC denotes the shift on MC .
Proposition: For a TMC M_C,
h(σ_C) = log λ_C
where λ_C is the spectral radius of C, i.e.,
λ_C = max{|λ| : λ an eigenvalue of C}
Will use:
Perron-Frobenius Theorem:
Let C be an irreducible matrix with spectral radius λ_C. (C is irreducible if it is a nonnegative square matrix s.t. for all i, j there exists n s.t. (C^n)_{ij} > 0.) Then
1. λ_C > 0 and is an eigenvalue of C.
2. λ_C has a (strictly) positive eigenvector and is the only eigenvalue with a (strictly) positive eigenvector (the right Perron eigenvector is denoted v and the left Perron eigenvector is denoted w).
3. λ_C is a simple eigenvalue.
4. If C is primitive (i.e., for some n > 0, C^n > 0), then
C^n/λ_C^n → (v_i w_j)
(where w · v = 1).
5. If C is primitive, then λ_C is the only eigenvalue with modulus = λ_C.
Note the consistency of the Proposition above with the computation above for the full shift:
C is the all-1's matrix of size |F|. So,
Claim: λ_C = |F|
Proof of Claim: |F| is an eigenvalue of C with eigenvector 1.
Apply parts 1 and 2 of the P-F Theorem.
Lecture 33: Nov. 30
Re-state P-F Theorem
Brief discussion of proof of P-F:
Let S be the unit simplex in R^n. Define
f : S → S,   x ↦ xC/‖xC‖_1
In the primitive case, C^n > 0 for some n, and so f contracts S into a single point, giving a unique eigenvalue λ with positive eigenvector. And λ is the only eigenvalue with largest modulus; for otherwise we would have leakage out of S.
Modify to take care of the general irreducible case.
Compare with the stochastic case.
Proposition: For a TMC M_C,
h(σ_C) = log λ_C
where λ_C is the spectral radius of C, i.e.,
λ_C = max{|λ| : λ an eigenvalue of C}
Proof of Proposition:
Will prove the Prop in the special case that C is irreducible (in general, reduce to a disjoint union of irreducible components plus transient connections).
Proof: Let α be the standard partition, a top. generator.
Then N(α_0^{n−1}) is the number of vertex sequences of length n that occur in the graph G(C).
Thus, N(α_0^{n−1}) = 1C^{n−1}1.
So,
h(σ_C) = lim_{n→∞} (1/n) log 1C^{n−1}1
The P-F Theorem guarantees a positive (right) eigenvector v corresponding to λ = λ_C.
Claim: There exist constants K_1, K_2 > 0 s.t.
K_1 λ^n ≤ 1C^n1 ≤ K_2 λ^n
Proof of Claim:
1C^n1 ≤ 1C^n (v/min(v)) = (1 · v) λ^n/min(v)
1C^n1 ≥ 1C^n (v/max(v)) = (1 · v) λ^n/max(v)
Take log, divide by n, take lim_n, to get:
h(σ_C) = log λ
QED
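A numerical sketch of this Proposition for a 3-state matrix of our own choosing: compare (1/n) log 1C^{n−1}1 against log λ_C, with λ_C computed from the eigenvalues:

```python
import math
import numpy as np

# a 3-state 0-1 matrix (irreducible: state 3 -> 1 -> 2 -> 3)
C = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=np.int64)

lam = max(abs(np.linalg.eigvals(C)))                  # spectral radius lambda_C
n = 40
count = int(np.linalg.matrix_power(C, n - 1).sum())   # 1 C^{n-1} 1
print(math.log(count) / n, math.log(lam))             # should be close
```

Here λ_C is the largest root of x³ − 2x² + x − 1 (≈ 1.7549), and the word-count rate agrees with log λ_C up to an O(1/n) correction.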
Example: Golden Mean.
C =
[ 1 1 ]
[ 1 0 ]
λ_C = (1 + √5)/2.
For an ITDS T, let M = M_T denote the set of all T-invariant Borel probability measures µ, i.e., measures µ for which T is an MPT with respect to (M, Borel, µ).
Theorem (Variational Principle for irreducible TMC): Let M = M_{σ_C}. Then
h(σ_C) = log λ_C = sup_{µ∈M} h_µ(σ_C)
and the sup is achieved uniquely by an explicitly describable Markov chain:
P_ij = C_ij v_j/(λ_C v_i)
where Cv = λ_C v.
Lecture 34: Dec. 2
Proof of Variational Principle:
Terminology: λ = λ_C is called the Perron eigenvalue of C and its corresponding eigenvectors are called Perron eigenvectors.
Step 1: For all µ ∈ M, h_µ(σ_C) ≤ log λ = h(σ_C).
Proof: Let α be the standard partition. Then
h_µ(σ_C) = lim_{n→∞} (1/n) H_µ(α_0^{n−1}) ≤ lim_{n→∞} (1/n) log 1C^{n−1}1 = log λ = h(σ_C).
Step 2: Let M′ denote the set of all first-order stationary Markov measures ν ∈ M. Then
sup_{µ∈M} h_µ(σ_C) = sup_{ν∈M′} h_ν(σ_C)
Proof: Let µ ∈ M. Let
S = {i : µ(A_i) > 0} (recall A_i = {x ∈ M_C : x_0 = i}).
Let ν denote the first-order stationary Markov chain defined by the |S| × |S| matrix P:
P_ij = µ(A_ij)/µ(A_i)
where
A_ij = {x ∈ M_C : x_0 = i, x_1 = j}
Then ν ∈ M and is called the Markovization of µ. Then
h_µ(σ_C) ≤ H_µ(α|σ^{−1}(α)) = − Σ_ij µ(A_ij) log P_ij = h_ν(σ_C)
Step 3: Find the optimal ν ∈ M′.
What is a good guess for the joint (NOT conditional) distribution Q_ij = ν(A_ij)?
Make the distribution on long words roughly uniform.
Assuming primitivity of C, we get from P-F (part 4):
C^n/λ^n → (v_i w_j)
(where Cv = λv, wC = λw, and w · v = 1).
So,
(C^n)_{ki} ∼ v_k w_i λ^n
and
(C^n)_{jℓ} ∼ v_j w_ℓ λ^n
So, we guess:
Q_ij ∼ (1C^n e_i) C_ij (e_j C^n 1) ∼ C_ij w_i v_j λ^{2n} ∼ C_ij w_i v_j
Then
π_i ∼ Σ_j C_ij w_i v_j = λ w_i v_i
Thus,
P_ij = Q_ij/π_i = C_ij w_i v_j/(λ w_i v_i) = C_ij v_j/(λ v_i)
is a good guess. Can verify directly that this Markov chain achieves the sup, namely log λ:
h_ν = − Σ_ij Q_ij log ( C_ij v_j/(λ v_i) ) = log λ − Σ_ij Q_ij (log v_j − log v_i) = log λ
since
Σ_ij Q_ij (log v_j − log v_i) = Σ_j π_j log v_j − Σ_i π_i log v_i = 0.
(Note: this is valid assuming irreducibility.)
(Note: we get the same result if Q is replaced by any other stationary Q′.)
Thus, the sup is achieved and equals log λ = h(σ_C).
Step 4: Uniqueness
a. If h_µ(σ_C) = h(σ_C), then h_µ(σ_C) = h_ν(σ_C), where ν is the Markovization of µ. By exercise 6 on Exercise Set 1, µ is Markov and µ = ν.
b. Let P and Q be the conditional and joint probability matrices for ν as above (i.e.,
P_ij = C_ij v_j/(λ v_i) ).
Let P′ and Q′ be the conditional and joint probability matrices for an arbitrary ν′ ∈ M′.
If h_{ν′} = h_ν, then
−E_{Q′} log P′ = −E_Q log P = −E_{Q′} log P
and so
E_{Q′} log (P/P′) = 0
And
log E_{Q′} (P/P′) = log Σ_ij Q′_ij P_ij/P′_ij = log Σ_ij π′_i P_ij = 0.
(here, π′ = π′P′). Thus,
E_{Q′} log (P/P′) = 0 = log E_{Q′} (P/P′)
By Jensen, equality happens only if P′ = P on the support of Q′.
But if the support of Q′ is strictly smaller than the support of Q, then P′ cannot be stochastic.
Example: Golden Mean
v = w = [λ, 1]/√(λ² + 1)   (normalized so that w · v = 1)
So,
P =
[ 1/λ 1/λ² ]
[ 1   0    ]
where λ is the golden mean.
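A sketch verifying this example numerically: build P_ij = C_ij v_j/(λ v_i) from the computed Perron eigenvector, check that it is stochastic with the matrix above, and check that its entropy is log λ (helper code is ours):

```python
import math
import numpy as np

C = np.array([[1.0, 1.0], [1.0, 0.0]])        # golden mean
eigvals, eigvecs = np.linalg.eig(C)
k = int(np.argmax(eigvals.real))
lam = eigvals[k].real                         # Perron eigenvalue = golden mean
v = np.abs(eigvecs[:, k].real)                # right Perron eigenvector

P = C * v[None, :] / (lam * v[:, None])       # P_ij = C_ij v_j / (lam v_i)
pi = v * v / (v @ v)                          # here C is symmetric, so w = v^T

h = -sum(pi[i] * P[i, j] * math.log(P[i, j])
         for i in range(2) for j in range(2) if P[i, j] > 0)
print(P)                    # rows sum to 1: [[1/lam, 1/lam^2], [1, 0]]
print(h, math.log(lam))     # entropy of this chain equals log lambda
```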
Defn: Let C be an irreducible m×m 0-1 matrix. Let f_1, ..., f_m ∈ R. Let C_f be the matrix defined by:
(C_f)_ij = C_ij e^{f_j}
The pressure is defined:
P_{σ_C}(f) = lim_{n→∞} (1/n) log 1(C_f)^{n−1}1
Note: If f ≡ 0, then PσC (f ) = h(σC ).
Theorem (Variational Principle for pressure of irreducible TMC's): Let M = M_{σ_C}. Then
P_{σ_C}(f) = log λ_{C_f} = sup_{µ∈M} ( h_µ(σ_C) + ∫_{M_C} f dµ )
where f(x) = f_{x_0}. And the sup is uniquely achieved by the Markov chain ν:
P_ij = (C_f)_ij v_j/(λ v_i)
where v is the right Perron eigenvector of C_f and λ = λ_{C_f}.
Compute:
Let Q_ij = ν(A_ij).
h_ν = − Σ_ij Q_ij log ( C_ij e^{f_j} v_j/(λ v_i) ) = log λ − Σ_ij Q_ij f_j − Σ_ij Q_ij (log v_j − log v_i)
= log λ − ∫_{M_C} f dν + 0
as before. Thus,
h_ν + ∫_{M_C} f dν = log λ
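A numerical sketch of this identity for the golden-mean matrix, with a potential f of our own choosing (helper function ours):

```python
import math
import numpy as np

C = np.array([[1.0, 1.0], [1.0, 0.0]])   # golden mean
f = np.array([0.2, -0.5])                # an arbitrary potential
Cf = C * np.exp(f)[None, :]              # (C_f)_ij = C_ij e^{f_j}

def perron(M):
    """Perron eigenvalue and positive eigenvector of an irreducible M >= 0."""
    eigvals, eigvecs = np.linalg.eig(M)
    k = int(np.argmax(eigvals.real))
    return eigvals[k].real, np.abs(eigvecs[:, k].real)

lam, v = perron(Cf)                      # right Perron data of C_f
_, w = perron(Cf.T)                      # left Perron eigenvector of C_f
P = Cf * v[None, :] / (lam * v[:, None]) # the optimizing Markov chain
pi = w * v / (w @ v)                     # its stationary vector

h = -sum(pi[i] * P[i, j] * math.log(P[i, j])
         for i in range(2) for j in range(2) if P[i, j] > 0)
print(h + pi @ f, math.log(lam))         # h_nu + integral f dnu = log lambda_{C_f}
```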
The main point:
Solution to a variational problem (equilibrium state) has an explicit exponential form (Gibbs state).
This generalizes the result discussed in lectures 1 and 2.
Further generalization:
Defn: Let T be an ITDS. Let f : M → R be continuous.
The pressure is defined as follows.
Define S_n f(x) = Σ_{i=0}^{n−1} f ∘ T^i(x).
An (n, ε)-spanning set is a set F such that for all x ∈ M, there exists y ∈ F such that d(T^i x, T^i y) < ε, for i = 0, ..., n − 1. Let
P_T(f, ε, n) = inf_{(n,ε)-spanning sets F} Σ_{x∈F} e^{S_n f(x)}
Let
P_T(f, ε) = lim sup_{n→∞} (1/n) log P_T(f, ε, n)
Define the pressure as
P_T(f) = lim_{ε→0} P_T(f, ε)
Note: when f = 0, PT (f ) = h(T ); to see this, use the spanning
set definition of topological entropy given in Walters or Keller.
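For the doubling map T(x) = 2x mod 1 with f = 0, an explicit uniform grid is an (n, ε)-spanning set (distances grow by at most a factor 2 per step), so its size bounds P_T(0, ε, n) from above, and the rate tends to log 2 = h(T). A sketch; the constants and random sampling are our own choices:

```python
import math
import random

def circ(x, y):
    """Circle metric on [0, 1)."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

n, eps = 30, 0.05
delta = 0.99 * eps * 2.0 ** (2 - n)   # grid spacing: 2^{n-1} * delta/2 < eps
N = math.ceil(1.0 / delta)            # size of the grid F = {k/N : 0 <= k < N}

def spanned(x):
    """Does the nearest grid point eps-shadow x for n steps of T(x) = 2x mod 1?"""
    y = round(x * N) / N
    return all(circ((2.0 ** i * x) % 1.0, (2.0 ** i * y) % 1.0) < eps
               for i in range(n))

rng = random.Random(0)
spanning_ok = all(spanned(rng.random()) for _ in range(100))
est = math.log(N) / n                 # upper bound on (1/n) log P_T(0, eps, n)
print(spanning_ok, est, math.log(2))
```

The estimate exceeds log 2 by an O(1/n) term coming from log(1/ε).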
Theorem (generalized variational principle):
P_T(f) = sup_{µ∈M} ( h_µ(T) + ∫ f dµ )
Note: sup is not always achieved. When it is achieved, it need not
be unique.
Note: But there are many important classes where the sup is
achieved uniquely.
Note: The RHS above can be expressed as a constrained maximization of h_µ(T), subject to a constraint on ∫ f dµ.
Note: And cases where it is achieved non-uniquely correspond to
phase transitions.
THE END