Ergodic Theory
March 14, 2013

Contents
1 Uniform Distribution
  1.1 Generalisation To Higher Dimension
  1.2 Generalisation To Polynomials
2 Dynamical Systems
  2.1 Subshifts of Finite Type
3 Measure Theory
4 Measures On Compact Metric Spaces
5 Measure Preserving Transformations
6 Ergodicity
7 Recurrence and Unique Ergodicity
8 Birkhoff's Ergodic Theorem
9 Entropy
10 Functional Analysis

1 Uniform Distribution
Definition 1.1. Orbit
For a space X and transformation T : X → X the orbit of x ∈ X is defined as
O_x := {T^n x}_{n=0}^∞
Definition 1.2. Fixed Point
For a space X and transformation T : X → X we say that x ∈ X is a fixed point of T if T x = x
Definition 1.3. Periodic Point
For a space X and transformation T : X → X we say that x ∈ X is a periodic point of T if T n x = x
for some n ∈ N
Definition 1.4. Indicator Function
For a set A ⊆ X we denote the indicator function:
χ_A(x) = 1 if x ∈ A, 0 if x ∉ A
Corollary 1.1. Given this definition, the frequency with which the orbit of x lies in A is
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_A(T^j x)
If we say that a property holds for a typical x ∈ X we mean that the property holds almost
everywhere with respect to the measure on the space X.
Definition 1.5. Preserves
We say that T preserves µ if for any measurable A ⊆ X we have that µ(T^{−1}A) = µ(A).
For x ∈ R we denote by ⌊x⌋ := max{m ∈ Z : m ≤ x} the integer part of x and by {x} := x − ⌊x⌋ the fractional part.
Definition 1.6. Uniformly Distributed
We say that a sequence {x_n}_{n=0}^∞ is uniformly distributed mod 1 if for every 0 ≤ a < b < 1 we have that
lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n − 1, {x_j} ∈ [a, b]} = b − a
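The defining limit lends itself to a quick numerical check. The following sketch (not part of the notes; α = √2 is an assumed example, shown to be u.d. mod 1 in Lemma 1.3 below) estimates the frequency of fractional parts landing in an interval and compares it to b − a:

```python
import math

def empirical_frequency(xs, a, b):
    """Proportion of the sequence whose fractional parts land in [a, b)."""
    return sum(1 for x in xs if a <= x % 1.0 < b) / len(xs)

# Assumed example sequence: x_n = n * sqrt(2).
alpha = math.sqrt(2)
xs = [n * alpha for n in range(100_000)]
freq = empirical_frequency(xs, 0.25, 0.75)
print(freq)  # should be close to b - a = 0.5
```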
Lemma 1.1. If {x_n}_{n=0}^∞ is uniformly distributed mod 1 then its fractional parts are dense in [0, 1).
Proof. Suppose for contradiction that {x_n}_{n=0}^∞ is not dense. Then there exist 0 ≤ a < b < 1 such that no {x_n} lies in [a, b]. So {j : 0 ≤ j ≤ n − 1, {x_j} ∈ [a, b]} = ∅ for all n, and hence
lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n − 1, {x_j} ∈ [a, b]} = 0 ≠ b − a
which contradicts the uniform distribution of {x_n}_{n=0}^∞.
Lemma 1.2. When log10(m) ∈ R \ Q, the frequency with which the leading digit of the sequence {m^n}_{n=1}^∞ is r ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9} equals
log10((r + 1)/r)
Proof. The leading digit of m^n is r iff
r · 10^l ≤ m^n < (r + 1) · 10^l for some l ∈ N
Taking logarithms base 10 throughout:
log10(r) + l ≤ n log10(m) < log10(r + 1) + l
i.e. the fractional part {n log10(m)} lies in [log10(r), log10(r + 1)). By our assumption log10(m) is irrational, hence the sequence n log10(m) mod 1 is uniformly distributed (by Lemma 1.3 below), so
lim_{n→∞} (1/n) #{j ∈ [0, n − 1] : {j log10(m)} ∈ [log10(r), log10(r + 1))} = log10(r + 1) − log10(r) = log10((r + 1)/r)
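This is Benford's law for powers. As an illustrative sketch (m = 2 is an assumed example; log10(2) is irrational), the leading digits can be tallied with exact integer arithmetic and compared to the predicted frequencies:

```python
import math
from collections import Counter

# Tally leading digits of 2^1, ..., 2^N exactly using big-integer arithmetic.
N = 10_000
counts = Counter()
p = 1
for _ in range(N):
    p *= 2
    counts[int(str(p)[0])] += 1

observed = {r: counts[r] / N for r in range(1, 10)}
predicted = {r: math.log10((r + 1) / r) for r in range(1, 10)}
print(observed[1], predicted[1])  # both close to log10(2) ≈ 0.301
```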
Theorem 1.1. Weyl's Criterion
The following are equivalent:
• {x_n}_{n=0}^∞ is uniformly distributed mod 1
• For each l ∈ Z \ {0} we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πilx_j} = 0
Proof.
• Firstly we prove that uniform distribution mod 1 implies the vanishing of the exponential averages.
WLOG we can assume that x_n ∈ [0, 1), since e^{2πilx_j} = e^{2πil{x_j}}.
Suppose {x_n}_{n=0}^∞ is uniformly distributed mod 1. Then for every [a, b] ⊂ [0, 1) we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) = b − a = ∫_0^1 χ_[a,b](x) dx
From this we can deduce that for a step function g we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} g(x_j) = ∫_0^1 g(x) dx
Let f be a continuous function on [0, 1) and let ε > 0; we can find a step function g s.t. ||f − g||_∞ < ε. Since {x_j}_{j=0}^∞ is uniformly distributed and g is a step function we can find n sufficiently large s.t.
|(1/n) Σ_{j=0}^{n−1} g(x_j) − ∫_0^1 g(x) dx| < ε
So we have that:
|(1/n) Σ_{j=0}^{n−1} f(x_j) − ∫_0^1 f(x) dx| ≤ |(1/n) Σ_{j=0}^{n−1} (f(x_j) − g(x_j))| + |(1/n) Σ_{j=0}^{n−1} g(x_j) − ∫_0^1 g(x) dx| + |∫_0^1 g(x) dx − ∫_0^1 f(x) dx| < ε + ε + ε = 3ε
Since ε > 0 was chosen arbitrarily we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(x_j) = ∫_0^1 f(x) dx
for every continuous f (and, applying this to real and imaginary parts, for complex-valued f). Moreover
∫_0^1 e^{2πilx} dx = 0 for l ≠ 0
hence we have the required result.
• Now we will prove the reverse implication. Suppose that for each l ∈ Z \ {0} we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πilx_j} = 0
Then for any trigonometric polynomial
g(x) = α_0 + Σ_{k=1}^m α_k e^{2πil_k x}, l_k ≠ 0
we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} g(x_j) = α_0 = ∫_0^1 g(x) dx
Let f be a continuous function on [0, 1] with f(0) = f(1) and fix ε > 0. We can find a trigonometric polynomial g s.t. ||g − f||_∞ < ε and, as in the previous part of the proof, we see that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(x_j) = ∫_0^1 f(x) dx
If we take [a, b] ⊂ [0, 1) then we can find continuous functions f_1, f_2 s.t. f_1 ≤ χ_[a,b] ≤ f_2, where f_1(0) = f_1(1), f_2(0) = f_2(1) and
∫_0^1 (f_2(x) − f_1(x)) dx < ε
This gives us that:
liminf_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) ≥ liminf_{n→∞} (1/n) Σ_{j=0}^{n−1} f_1(x_j) = ∫_0^1 f_1(x) dx ≥ ∫_0^1 f_2(x) dx − ε ≥ ∫_0^1 χ_[a,b](x) dx − ε
limsup_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) ≤ limsup_{n→∞} (1/n) Σ_{j=0}^{n−1} f_2(x_j) = ∫_0^1 f_2(x) dx ≤ ∫_0^1 f_1(x) dx + ε ≤ ∫_0^1 χ_[a,b](x) dx + ε
But since ε > 0 was chosen arbitrarily we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_[a,b](x_j) = ∫_0^1 χ_[a,b](x) dx = b − a
hence indeed {x_n}_{n=0}^∞ is uniformly distributed.
Lemma 1.3. The sequence x_n = nα is uniformly distributed mod 1 for α ∈ R \ Q and not uniformly distributed for α ∈ Q.
Proof. We split the proof into the two cases:
• α ∈ Q:
We can write α = p/q for p, q ∈ Z, q > 0, where p, q are coprime. The sequence x_n mod 1 then only takes the values {np/q mod 1}_{n=0}^{q−1}, of which there are finitely many, hence the set cannot be dense and therefore by Lemma 1.1 x_n is not uniformly distributed.
• α ∈ R \ Q:
Let l ∈ Z \ {0}; then lα ∉ Z, so e^{2πilα} ≠ 1. Summing the geometric series gives
(1/n) Σ_{j=0}^{n−1} e^{2πiljα} = (1/n) (1 − e^{2πilnα}) / (1 − e^{2πilα})
so we have that
limsup_{n→∞} |(1/n) Σ_{j=0}^{n−1} e^{2πiljα}| ≤ lim_{n→∞} (1/n) · 2/|1 − e^{2πilα}| = 0
Hence by Weyl's criterion we have that x_n is uniformly distributed mod 1.
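Both cases can be observed numerically. In this sketch, weyl_sum is a hypothetical helper computing the averaged exponential sum from Weyl's criterion; the irrational case decays like the geometric-series bound, while a rational rotation number (with q dividing l) gives no decay at all:

```python
import cmath
import math

def weyl_sum(alpha, l, n):
    """|(1/n) * sum_{t=0}^{n-1} e^{2 pi i l t alpha}|, the average in Weyl's criterion."""
    total = sum(cmath.exp(2j * math.pi * l * t * alpha) for t in range(n))
    return abs(total) / n

# Irrational alpha: the average is small, within the bound 2 / (n |1 - e^{2 pi i l alpha}|).
print(weyl_sum(math.sqrt(2), 1, 10_000))
# Rational alpha = 1/2 with l = 2: every term equals 1, so the average stays at 1.
print(weyl_sum(0.5, 2, 10_000))
```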
Corollary 1.2. The sequence x_n = nα + β is uniformly distributed mod 1 iff α ∈ R \ Q.
Proof. We split the proof into the two cases:
• α ∈ Q:
We can write α = p/q for p, q ∈ Z, q > 0, where p, q are coprime. The sequence x_n mod 1 then only takes the values {np/q + β mod 1}_{n=0}^{q−1}, of which there are finitely many, hence the set cannot be dense and therefore x_n is not uniformly distributed.
• α ∈ R \ Q:
Let l ∈ Z \ {0}; then lα ∉ Z, so e^{2πilα} ≠ 1. This gives us that
(1/n) Σ_{j=0}^{n−1} e^{2πil(jα+β)} = e^{2πilβ} (1/n) Σ_{j=0}^{n−1} e^{2πiljα}
so by the previous lemma we have the same convergence to 0, and by Weyl's criterion x_n is uniformly distributed mod 1.
1.1 Generalisation To Higher Dimension
For this subsection we consider sequences {x_n}_{n=1}^∞ with x_n = (x_n^(1), ..., x_n^(k)) ∈ R^k.
Definition 1.7. Uniformly Distributed
A sequence {x_n}_{n=0}^∞ ∈ R^k is uniformly distributed mod 1 if for each choice of k intervals {[a_s, b_s]}_{s=1}^k we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} Π_{s=1}^k χ_[a_s,b_s]({x_j^(s)}) = Π_{s=1}^k (b_s − a_s)
Theorem 1.2. Multi-Dimensional Weyl's Criterion
The sequence {x_n}_{n=0}^∞ ∈ R^k is u.d. mod 1 iff for every l ∈ Z^k \ {0}
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πi Σ_{s=1}^k l_s x_j^(s)} = 0
Definition 1.8. Rationally Independent
The numbers {λ_s}_{s=1}^k are rationally independent if whenever {r_s}_{s=1}^k ∈ Q satisfy Σ_{s=1}^k r_s λ_s = 0 we must have that r_s = 0 ∀s.
Theorem 1.3. The sequence x_n = (nα_1, ..., nα_k) is uniformly distributed mod 1 iff {α_s}_{s=1}^k, 1 are rationally independent.
Proof. Suppose {α_s}_{s=1}^k, 1 are rationally independent. Then for any l ∈ Z^k \ {0} we have that
Σ_{s=1}^k l_s α_s ∉ Z
so that
e^{2πi Σ_{s=1}^k l_s α_s} ≠ 1
Hence, summing the geometric series,
|(1/n) Σ_{j=0}^{n−1} e^{2πij Σ_{s=1}^k l_s α_s}| = (1/n) |1 − e^{2πin Σ_{s=1}^k l_s α_s}| / |1 − e^{2πi Σ_{s=1}^k l_s α_s}| ≤ (1/n) · 2 / |1 − e^{2πi Σ_{s=1}^k l_s α_s}| → 0 as n → ∞
hence by Weyl's criterion we have that x_n is uniformly distributed mod 1.
Now suppose that {α_s}_{s=1}^k, 1 are not rationally independent. Then (after clearing denominators) for some l ∈ Z^k \ {0} we have that
Σ_{s=1}^k l_s α_s ∈ Z
hence e^{2πij Σ_{s=1}^k l_s α_s} = 1 ∀j ∈ N, so we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πij Σ_{s=1}^k l_s α_s} = 1 ≠ 0
so by Weyl's criterion x_n is not uniformly distributed mod 1.
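As an empirical sketch of Theorem 1.3 with k = 2 (an assumed example): 1, √2, √3 are rationally independent, so the pairs (n√2, n√3) mod 1 should fill the unit square uniformly, hitting any box with frequency equal to its area:

```python
import math

# Assumed example: alpha = (sqrt(2), sqrt(3)); count visits to the box [0, 1/2)^2.
a1, a2 = math.sqrt(2), math.sqrt(3)
N = 100_000
hits = sum(1 for n in range(N) if (n * a1) % 1 < 0.5 and (n * a2) % 1 < 0.5)
freq2d = hits / N
print(freq2d)  # should approach the box area 0.5 * 0.5 = 0.25
```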
1.2 Generalisation To Polynomials
For this subsection we will write
p(n) = Σ_{s=0}^k α_s n^s
Lemma 1.4. Van der Corput's Inequality
Let {z_j}_{j=0}^{n−1} ∈ C and let 1 ≤ m ≤ n − 1. Then
m² |Σ_{j=0}^{n−1} z_j|² ≤ m(n + m − 1) Σ_{j=0}^{n−1} |z_j|² + 2(n + m − 1) Re( Σ_{j=1}^{m−1} (m − j) Σ_{s=0}^{n−1−j} z_{s+j} z̄_s )
For a sequence {x_n}_{n=0}^∞ let x_n^(m) := x_{n+m} − x_n.
Lemma 1.5. Let {x_n}_{n=0}^∞ ∈ R be a sequence. If for each m ≥ 1 the difference sequence x_n^(m) is u.d. mod 1, then x_n is u.d. mod 1.
Proof. We need to show that for any l ∈ Z \ {0} we have that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πilx_j} = 0
Let z_j := e^{2πilx_j}, so |z_j| = 1. For 1 ≤ m ≤ n, Van der Corput's inequality gives:
(m²/n²) |Σ_{j=0}^{n−1} e^{2πilx_j}|² ≤ (1/n²) m(n + m − 1) n + (2(n + m − 1)/n²) Re( Σ_{j=1}^{m−1} (m − j) Σ_{s=0}^{n−1−j} e^{2πil(x_{s+j} − x_s)} )
= m(n + m − 1)/n + (2(n + m − 1)/n) Re( Σ_{j=1}^{m−1} (m − j) (1/n) Σ_{s=0}^{n−1−j} e^{2πil x_s^(j)} )
For each fixed j ≥ 1 the sequence x_s^(j) is u.d. mod 1, so by Weyl's criterion
lim_{n→∞} (1/n) Σ_{s=0}^{n−1−j} e^{2πil x_s^(j)} = 0
hence
lim_{n→∞} (2(n + m − 1)/n) Re( Σ_{j=1}^{m−1} (m − j) (1/n) Σ_{s=0}^{n−1−j} e^{2πil x_s^(j)} ) = 0
Hence
limsup_{n→∞} (m²/n²) |Σ_{j=0}^{n−1} e^{2πilx_j}|² ≤ limsup_{n→∞} m(n + m − 1)/n = m
so
limsup_{n→∞} |(1/n) Σ_{j=0}^{n−1} e^{2πilx_j}| ≤ 1/√m
This holds ∀m ≥ 1, so letting m be arbitrarily large we get
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πilx_j} = 0
hence by Weyl's criterion we have that x_n is u.d. mod 1.
Theorem 1.4. If the leading coefficient satisfies α_k ∈ R \ Q then p(n) is u.d. mod 1.
Proof. Suppose α_k ∈ R \ Q. We argue inductively: let ∆(k) be the statement that any polynomial of degree k with irrational leading coefficient is u.d. mod 1.
From Corollary 1.2 we have that ∆(1) holds, so suppose for some k ≥ 2 that ∆(k − 1) holds. Let
p(n) = Σ_{i=0}^k α_i n^i with α_k ∈ R \ Q
For any m ≥ 1 we have, by the binomial theorem,
p(n + m) − p(n) = Σ_{i=0}^k α_i ((n + m)^i − n^i) = Σ_{i=0}^k α_i Σ_{j=0}^{i−1} (i choose j) n^j m^{i−j}
The term i = k contributes α_k k m n^{k−1} plus terms of degree at most k − 2 in n, while each term with i ≤ k − 1 has degree at most k − 2 (the n^i terms cancel). Hence
p(n + m) − p(n) = α_k k m n^{k−1} + q(n)
where q is some polynomial of degree at most k − 2. This is a polynomial of degree k − 1 with irrational leading coefficient α_k k m, hence by ∆(k − 1) we have that p(n + m) − p(n) is u.d. mod 1 for every m ≥ 1, and therefore by Lemma 1.5 p(n) is u.d. mod 1.
So by induction we have that ∆(k) holds for any k ∈ N.
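The theorem can be checked empirically. In this sketch p(n) = √2·n² is an assumed example with irrational leading coefficient, so its fractional parts should equidistribute:

```python
import math

# Assumed example: p(n) = sqrt(2) * n^2; count fractional parts below 0.3.
N = 200_000
hits = sum(1 for n in range(N) if (math.sqrt(2) * n * n) % 1.0 < 0.3)
freq_poly = hits / N
print(freq_poly)  # should approach 0.3
```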
2 Dynamical Systems
Definition 2.1. Circle
We write S := {x + Z : x ∈ R} for the circle, the set of equivalence classes of real numbers modulo Z.
Definition 2.2. Rotation:
T : S → S defined as T(x) = x + α is a rotation of degree α on the circle.
Lemma 2.1. If α ∈ Q then a rotation of degree α has every point periodic, and if α ∈ R \ Q there are no periodic points.
Proof. Suppose α = p/q for p, q ∈ Z, q > 0 coprime. Then
T^q(x) = x + qα mod 1 = x + p mod 1 = x
so any point x is periodic.
If α ∈ R \ Q then the sequence nα + x is u.d. mod 1, hence every orbit is dense (in particular infinite) and therefore there cannot be any periodic points.
Definition 2.3. Cylinder Set:
For a map T : S → S we denote the cylinder set of the sequence {x_i}_{i=0}^n to be
I(x_0, ..., x_n) = {x ∈ S : T^k(x) ∈ C_{x_k} ∀k ∈ [0, n]}
where C_0 = [0, 1/2) and C_1 = [1/2, 1).
Lemma 2.2. The following statements are true of cylinder sets:
• If x ∈ [0, 1) has associated sequence {x_n}_{n=0}^∞ then
∩_{n=0}^∞ I(x_0, ..., x_n) = {x}
• For each n, the cylinder sets of rank n form a partition.
Proposition 2.1. For the doubling map T : S → S defined as T(x) = 2x mod 1 we have that:
• There are 2^n − 1 periodic points of period n (i.e. fixed points of T^n).
• The periodic points are dense.
• There exists a dense orbit.
Proof.
• Notice that T^n(x) = 2^n x mod 1. Suppose T^n(x) = x mod 1. This happens iff 2^n x = x + p for some p ∈ Z, which is equivalent to saying x = p/(2^n − 1). Hence each choice of p ∈ [0, 2^n − 1) ∩ Z gives a distinct periodic point, of which there are precisely 2^n − 1.
• Let y ∈ [0, 1) and ε > 0. We want to find some periodic point x ∈ (y − ε, y + ε), so find n sufficiently large such that ε > (2^n − 1)^{−1}. Notice that the points x = p/(2^n − 1) for p = 0, ..., 2^n − 2 are periodic and evenly spaced with distance (2^n − 1)^{−1} between consecutive values, hence clearly some periodic point x must lie in the ball of radius ε around y.
• For any x ∈ [0, 1) associate the sequence
x_n := 0 if T^n(x) ∈ [0, 1/2), 1 if T^n(x) ∈ [1/2, 1)
Now suppose
x̃ = Σ_{n=0}^∞ x_n / 2^{n+1}
Then because Σ_{n=0}^∞ 1/2^{n+1} = 1 we have, for almost every sequence x_n, that x̃ ∈ [0, 1/2) iff x_0 = 0. Moreover
T(x̃) = 2x̃ mod 1 = Σ_{n=0}^∞ 2x_n / 2^{n+1} mod 1 = x_0 + Σ_{n=0}^∞ x_{n+1} / 2^{n+1} mod 1 = Σ_{n=0}^∞ x_{n+1} / 2^{n+1} mod 1
so we have that T(x̃) ∈ [0, 1/2) iff x_1 = 0. Furthermore, iterating this, T^n(x̃) ∈ [0, 1/2) iff x_n = 0. So almost every x can be written
x = Σ_{n=0}^∞ x_n / 2^{n+1}
where x_n is the sequence associated with x.
Since for the doubling map any cylinder of rank n is an interval of width 2^{−n}, it suffices to find a point x ∈ [0, 1) whose iterates T^n(x) visit every cylinder. If we order the binary words labelling the cylinders as 0, 1, 00, 01, 10, 11, 000, ... then we can define x to be the point whose binary expansion is the concatenation of all these words, x = 0.0100011011000... In this way T^0(x) ∈ [0, 1/2), T^1(x) ∈ [1/2, 1), T^2(x) ∈ [0, 1/4), T^4(x) ∈ [1/4, 1/2), T^6(x) ∈ [1/2, 3/4), ... and so on, so indeed the iterates {T^n(x)}_{n=0}^∞ visit every cylinder and the orbit of x is dense.
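The periodic point count in the first bullet can be verified with exact rational arithmetic, which avoids the floating-point pitfall that repeated doubling of a double discards mantissa bits:

```python
from fractions import Fraction

# Fixed points of T^n for T(x) = 2x mod 1 are x = p/(2^n - 1), p = 0, ..., 2^n - 2.
def doubling(x):
    return (2 * x) % 1

n = 5
periodic = [Fraction(p, 2**n - 1) for p in range(2**n - 1)]
for x in periodic:
    y = x
    for _ in range(n):
        y = doubling(y)
    assert y == x  # each candidate really is fixed by T^n
print(len(periodic))  # 2^n - 1 = 31 periodic points of period n
```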
2.1 Subshifts of Finite Type
Definition 2.4. One-Sided Shift Space:
For S = {1, ..., k} let A be a k × k matrix with entries in {0, 1}. The one-sided shift space of finite type generated by A is defined as
Σ_A^+ := {{x_j}_{j=0}^∞ : A_{x_j, x_{j+1}} = 1 ∀j}
Definition 2.5. Two-Sided Shift Space:
For S = {1, ..., k} let A be a k × k matrix with entries in {0, 1}. The two-sided shift space of finite type generated by A is defined as
Σ_A := {{x_j}_{j∈Z} : A_{x_j, x_{j+1}} = 1 ∀j}
Definition 2.6. One-Sided Shift:
The one-sided shift is σ^+ : Σ_A^+ → Σ_A^+ defined as σ^+(x)_i = x_{i+1}
Definition 2.7. Two-Sided Shift:
The two-sided shift is σ : Σ_A → Σ_A defined as σ(x)_i = x_{i+1}
Definition 2.8. Irreducible:
A k × k matrix A of zeros and ones is called irreducible if for any pair i, j ∈ S we can find some n ∈ N
s.t. (An )i,j > 0
Notice that for n ∈ N, i, j ∈ S we have that (An )i,j is the number of paths from i → j in n steps in the
graph with vertices S and directed edges i → j iff Ai,j = 1.
Definition 2.9. Aperiodic:
A k × k matrix A of zeros and ones is called aperiodic if ∃n ∈ N such that for any pair i, j ∈ S we have
that (An )i,j > 0.
In the graphical representation this says that starting from any vertex we can get to any other vertex
in exactly n steps.
Definition 2.10. Two Sided Cylinder:
A two sided cylinder for a finite word {y_i}_{i=m}^n over Σ_A is the open set:
[y_m, ..., y_n]_{m,n} := {x ∈ Σ_A : x_j = y_j ∀m ≤ j ≤ n}
Definition 2.11. One Sided Cylinder:
A one sided cylinder for a finite word {y_i}_{i=m}^n (m ≥ 0) over Σ_A^+ is the open set:
[y_m, ..., y_n]_{m,n} := {x ∈ Σ_A^+ : x_j = y_j ∀m ≤ j ≤ n}
Lemma 2.3. A shift space together with
d(x, y) := 2^{−min{|n| : x_n ≠ y_n}}   (with the convention d(x, y) = 0 when x = y)
is a metric space.
Proof.
• d(x, y) = 0 ⟺ x = y:
Notice that d(x, y) = 0 ⟺ min{|n| : x_n ≠ y_n} = ∞ ⟺ x_n = y_n ∀n ⟺ x = y, so this clearly holds.
• d(x, y) = d(y, x):
Clearly min{|n| : x_n ≠ y_n} = min{|n| : y_n ≠ x_n}, so indeed this holds.
• d(x, y) + d(y, z) ≥ d(x, z):
Let n_0 := min{|n| : x_n ≠ z_n} and m := min{min{|n| : x_n ≠ y_n}, min{|n| : y_n ≠ z_n}}. Then n_0 ≥ m, since for |n| < m we have x_n = y_n = z_n. This gives us that
d(x, z) = 2^{−n_0} ≤ 2^{−m} ≤ 2^{−min{|n| : x_n ≠ y_n}} + 2^{−min{|n| : y_n ≠ z_n}} = d(x, y) + d(y, z)
Theorem 2.1. For the shift space Σ_A^+ we have that
• Σ_A^+ is compact.
• σ^+ is continuous.
Proof.
• If Σ_A^+ = ∅ then we are done, so assume that Σ_A^+ is non-empty. Let {x^(m)}_{m=1}^∞ be a sequence of elements of Σ_A^+. Since the cylinders of a given rank form a finite disjoint partition we have that
Σ_A^+ = ∪_{i=1}^k [i]_{0,0}
Since there are finitely many sets in this union we must have that ∃i_0 ∈ [1, k] such that infinitely many elements x^(m) lie inside [i_0]_{0,0}; denote these the subsequence x^(m_0). Furthermore we can write
[i_0]_{0,0} = ∪_{i=1}^k [i_0, i]_{0,1}
and similarly ∃i_1 ∈ [1, k] s.t. infinitely many elements of x^(m_0) lie in [i_0, i_1]_{0,1}; denote these the subsequence x^(m_1). Inductively we obtain {i_r}_{r=0}^∞ ⊂ [1, k] s.t. infinitely many elements of x^(m) lie inside [i_0, ..., i_r]_{0,r} for every r. The point y defined by y_r = i_r lies in Σ_A^+, and picking one element x^(m_r) ∈ [i_0, ..., i_r]_{0,r} for each r, since y ∈ [i_0, ..., i_r]_{0,r} we must have d(x^(m_r), y) ≤ 2^{−r}. So indeed we have a convergent subsequence, and therefore the space is sequentially compact and therefore compact.
• Let ε > 0 and find n large such that 2^{−n} < ε. Choose δ = 2^{−(n+1)}; then d(x, y) < δ ⟹ y ∈ [x_0, ..., x_{n+1}]_{0,n+1}. This gives us that σ^+(y) ∈ [x_1, ..., x_{n+1}]_{0,n}, so indeed d(σ^+(x), σ^+(y)) ≤ 2^{−n} < ε.
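The metric and the continuity estimate for σ^+ can be sketched on finite truncations of sequences (an assumption of this illustration: truncations that agree are treated as distance 0). One application of the shift at most doubles the distance, matching the δ = 2^{−(n+1)} choice above:

```python
# Shift metric d(x, y) = 2^(-min{n : x_n != y_n}) on one-sided sequence truncations.
def shift_distance(x, y):
    """Metric on one-sided sequences, given here as equal-length tuples (truncations)."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** (-n)
    return 0.0  # agreeing truncations: distance 0 for these finite approximations

x = (1, 0, 1, 1, 0, 0)
y = (1, 0, 1, 0, 0, 0)
print(shift_distance(x, y))          # sequences first differ at n = 3, so 2^-3 = 0.125
print(shift_distance(x[1:], y[1:]))  # after one shift they differ at n = 2: 0.25
```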
Definition 2.12. Continued Fraction Map:
T : [0, 1) → [0, 1) defined by
T(x) = 0 if x = 0, and T(x) = 1/x mod 1 if x ≠ 0
is called the continued fraction map.
Definition 2.13. Continued Fraction Expansion:
If x ∈ (0, 1) then the continued fraction expansion of x is
x = 1/(x_0 + 1/(x_1 + 1/(x_2 + ...)))
where {x_i}_{i=0}^∞ ∈ N ∪ {∞}.
Lemma 2.4. x ∈ (0, 1) has a finite continued fraction expansion iff x ∈ Q.
Lemma 2.5. If x ∈ (0, 1) \ Q then x has a unique continued fraction expansion.
Lemma 2.6. If T is the continued fraction map and x has the continued fraction expansion with sequence {x_i}_{i=0}^∞ then
x_i = ⌊1/(T^i x)⌋
Proof. For the base case we have
1/x = x_0 + 1/(x_1 + 1/(x_2 + ...))
with x_0 ∈ N and 1/(x_1 + 1/(x_2 + ...)) ∈ [0, 1), so indeed
⌊1/x⌋ = x_0
Assume that ∀i ≤ n we have that x_i = ⌊1/(T^i x)⌋. Then
1/(T^{n+1} x) = x_{n+1} + 1/(x_{n+2} + ...)
where the remaining continued fraction is again less than one and x_{n+1} ∈ N, so indeed
x_{n+1} = ⌊1/(T^{n+1} x)⌋
So indeed by induction we have the required result.
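Lemma 2.6 gives a direct algorithm for reading off continued fraction digits by iterating the map. A minimal sketch (the rational test value 7/16 is an assumed example; exact arithmetic via Fraction avoids float drift):

```python
import math
from fractions import Fraction

def gauss_map(x):
    """The continued fraction map T(x) = 1/x mod 1, with T(0) = 0."""
    if x == 0:
        return 0
    return 1 / x % 1

def cf_digits(x, k):
    """First (at most) k continued fraction digits x_i = floor(1 / T^i x), per Lemma 2.6."""
    digits = []
    for _ in range(k):
        if x == 0:  # rational inputs terminate (Lemma 2.4)
            break
        digits.append(math.floor(1 / x))
        x = gauss_map(x)
    return digits

# 7/16 = 1/(2 + 1/(3 + 1/2)), so the digits are [2, 3, 2].
print(cf_digits(Fraction(7, 16), 5))  # [2, 3, 2]
```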
Definition 2.14. Linear Toral Endomorphism:
If A is a k × k matrix with entries in Z such that det(A) ≠ 0 then A defines a linear map of R^k, and T_A : R^k/Z^k → R^k/Z^k defined as T_A x = Ax mod 1 is called a linear toral endomorphism.
Lemma 2.7. The linear toral endomorphism is well defined.
Proof. Suppose x, y ∈ R^k are such that x = y + n for some integer vector n, so x, y are in the same equivalence class in R^k/Z^k. Then
Ax = A(y + n) = Ay + An = Ay mod 1
since n is an integer vector and A has integer entries, which implies that An is an integer vector.
Definition 2.15. Linear Toral Automorphism:
A linear toral endomorphism TA is a linear toral automorphism if det(A) = ±1
Lemma 2.8. If T_A is a linear toral automorphism then T_A^{−1} = T_{A^{−1}}
Definition 2.16. Hyperbolic Toral Automorphism:
A linear toral automorphism TA is a hyperbolic toral automorphism if A doesn’t have eigenvalues of
modulus 1.
Proposition 2.2. Let TA be a hyperbolic toral automorphism of R2 /Z2 then Q2 /Z2 is the set of all
periodic points of TA .
Proof. Suppose (x_1, x_2) = (p_1/q, p_2/q) where 0 ≤ p_1, p_2 < q are integers. We have that
T_A^n(x_1, x_2) = (p_1^(n)/q, p_2^(n)/q)
where 0 ≤ p_1^(n), p_2^(n) < q are integers representing the transformed points. Notice that q remains unchanged since we only ever multiply by integers. Moreover there are at most q possible choices for each p_i^(n), hence at most q² possible distinct pairs (p_1^(n), p_2^(n)), so we must have that there are some n > m ≥ 0 such that
T_A^n(x_1, x_2) = T_A^m(x_1, x_2)
But since T_A is invertible with T_A^{−1} = T_{A^{−1}} we have that
T_A^{n−m}(x_1, x_2) = (x_1, x_2)
so indeed (x_1, x_2) is periodic, and since (x_1, x_2) was an arbitrary rational point, all rational points are periodic.
Conversely suppose (x_1, x_2) is periodic; then ∃n s.t. T_A^n(x_1, x_2) = (x_1, x_2). This is equivalent to saying A^n(x_1, x_2) = (x_1, x_2) + (n_1, n_2) for some n_i ∈ Z, which gives us that (A^n − I)(x_1, x_2) = (n_1, n_2). Now since A has no eigenvalues of modulus one, neither A nor A^n has 1 as an eigenvalue (the eigenvalues of A^n are n-th powers of those of A), so A^n − I is invertible. Therefore
(x_1, x_2) = (A^n − I)^{−1}(n_1, n_2)
But (A^n − I)^{−1} is a rational matrix and (n_1, n_2) is an integer vector, so their product must be a rational vector, and hence all periodic points are rational.
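The periodicity of rational points can be observed concretely with Arnold's cat map (an assumed example of a hyperbolic toral automorphism: A = [[2, 1], [1, 1]] has det 1 and eigenvalues (3 ± √5)/2, neither of modulus 1). Exact rational arithmetic finds the period of an orbit:

```python
from fractions import Fraction

def cat_map(p):
    """One step of the cat map T_A(x, y) = (2x + y, x + y) mod 1."""
    x, y = p
    return ((2 * x + y) % 1, (x + y) % 1)

start = (Fraction(1, 5), Fraction(2, 5))
orbit = [start]
p = cat_map(start)
while p != start:        # must terminate: denominators stay 5, so at most 25 states
    orbit.append(p)
    p = cat_map(p)
print(len(orbit))  # period of this orbit: 2
```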
3 Measure Theory
Definition 3.1. Algebra:
For a set X, a collection A of subsets of X is called an algebra if:
• ∅ ∈ A
• A ∈ A ⟹ A^c ∈ A
• A, B ∈ A ⟹ A ∩ B ∈ A
Definition 3.2. σ-Algebra:
For a set X, a collection F of subsets of X is called a σ-algebra if:
• φ∈F
• A ∈ F =⇒ Ac ∈ F
• {A_i}_{i=1}^∞ ∈ F ⟹ ∪_{i=1}^∞ A_i ∈ F
Lemma 3.1. If F is a σ-algebra of subsets of X then
• X ∈ F
• {A_i}_{i=1}^∞ ∈ F ⟹ ∩_{i=1}^∞ A_i ∈ F
Proof.
• X = ∅^c ∈ F
• ∩_{i=1}^∞ A_i = (∪_{i=1}^∞ A_i^c)^c ∈ F
Definition 3.3. Borel σ-Algebra:
For a given set X the Borel σ-algebra B(X) on X is the smallest σ-algebra containing all open sets.
Definition 3.4. Measurable Space:
If X is a set and F a σ-algebra on X then (X, F) is called a measurable space.
Definition 3.5. Measure:
If (X, F) is a measurable space then µ : F → R^+ ∪ {∞} is a measure on (X, F) if:
• µ(∅) = 0
• for {A_i}_{i=1}^∞ ∈ F with A_i ∩ A_j = ∅ ∀i ≠ j we have
µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)
Definition 3.6. Measure Space:
If X is a set, F a σ-algebra on X and µ a measure on (X, F) then (X, F, µ) is called a measure space.
Definition 3.7. Finite Measure:
A measure µ on (X, F) is finite if µ(X) < ∞
Definition 3.8. σ-Finite Measure:
A measure µ on (X, F) is σ-finite if ∃{A_i}_{i=1}^∞ ∈ F such that
• µ(A_i) < ∞ ∀i
• X = ∪_{i=1}^∞ A_i
Definition 3.9. Almost Everywhere:
For a measure space (X, F, µ), a property P holds almost everywhere with respect to µ if µ({x : P fails at x}) = 0
Theorem 3.1. Kolmogorov Extension Theorem:
If A is an algebra on X and µ : A → R^+ ∪ {∞} satisfies
• µ(∅) = 0
• µ is σ-finite
• for pairwise disjoint {A_i}_{i=1}^∞ ∈ A with ∪_{i=1}^∞ A_i ∈ A we have
µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)
then there exists a unique extension µ* : B(A) → R^+ ∪ {∞} of µ to the σ-algebra B(A) generated by A.
Definition 3.10. Stieltjes Measure:
For X = [0, 1], A the algebra generated by open intervals and ρ : X → R+ increasing with
ρ(1) − ρ(0) = 1 we define the Stieltjes measure with respect to ρ by µ(a, b) = ρ(b) − ρ(a) which extends
to the entire space by KET.
Definition 3.11. Dirac Measure:
For X an arbitrary space and A any non-empty σ-algebra we define the Dirac measure with respect to
x ∈ X to be δx (A) = I{x∈A}
Definition 3.12. Measurable:
If (X, F, µ) is a measure space then f : X → R is called F-measurable if f −1 (A) ∈ F
∀A ∈ B(R)
Definition 3.13. Simple:
f : X → R is called simple if ∃{A_i}_{i=1}^k ∈ F, {a_i}_{i=1}^k ∈ R such that
f = Σ_{i=1}^k a_i I_{A_i}
Theorem 3.2. For measurable f : X → R^+ there exists an increasing sequence {f_n}_{n=1}^∞ : X → R^+ of simple functions converging pointwise to f.
Definition 3.14. Integral:
We split the definition of the integral into three separate cases:
• If f : X → R is simple then we can write f = Σ_{i=1}^k a_i I_{A_i} and then
∫ f dµ = Σ_{i=1}^k a_i µ(A_i)
• If f : X → R^+ then by Theorem 3.2 we can find an increasing sequence {f_n}_{n=1}^∞ of simple functions converging pointwise to f, with f_n ≤ f everywhere, and then
∫ f dµ = lim_{n→∞} ∫ f_n dµ
• If f : X → R is such that ∫ |f| dµ < ∞, then we write f^+ = max{f, 0}, f^− = max{−f, 0}, so that f = f^+ − f^−, and define
∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ
All of which are consistent.
Definition 3.15. Equivalent:
If f, g : X → R are measurable then they are equivalent with respect to the measure µ if f = g µ-a.e.
We write L¹(X, F, µ) for the space of equivalence classes of integrable functions f : X → R and define
||f||_1 = ∫ |f| dµ
to be its norm, which defines a metric via d(f, g) = ||f − g||_1.
Furthermore for p ≥ 1 we write L^p(X, F, µ) for the space of equivalence classes of functions f : X → R such that |f|^p is integrable, and define
||f||_p = (∫ |f|^p dµ)^{1/p}
to be its norm.
Lemma 3.2. If (X, F, µ) is a finite measure space then for 1 ≤ p < q we have that
Lq (X, F, µ) ⊂ Lp (X, F, µ)
Theorem 3.3. Monotone Convergence Theorem:
If {f_n}_{n=1}^∞ : X → R is an increasing sequence of integrable functions on (X, F, µ) such that (∫ f_n dµ) is a bounded sequence, then lim_{n→∞} f_n exists µ-a.e., is integrable, and
lim_{n→∞} ∫ f_n dµ = ∫ lim_{n→∞} f_n dµ
Theorem 3.4. Dominated Convergence Theorem:
If {f_n}_{n=1}^∞ : X → R is a sequence of measurable functions on (X, F, µ) such that |f_n| ≤ g for some integrable function g : X → R and lim_{n→∞} f_n = f µ-a.e., then f is integrable and
lim_{n→∞} ∫ f_n dµ = ∫ f dµ
4 Measures On Compact Metric Spaces
Lemma 4.1. C(X, R) := {f : X → R continuous} equipped with the metric
d(f, g) = ||f − g||_∞ := sup_{x∈X} |f(x) − g(x)|
is a metric space.
We denote by M(X) the set of probability measures on (X, B(X)); for µ ∈ M(X) we write
µ(f) := ∫ f dµ
Proposition 4.1. If µ ∈ M(X) then:
• µ is continuous:
if f_n ∈ C(X, R) and f_n → f uniformly then lim_{n→∞} µ(f_n) = µ(f)
• µ is bounded:
f ∈ C(X, R) ⟹ |µ(f)| ≤ ||f||_∞
• µ is linear:
λ_1, λ_2 ∈ R, f_1, f_2 ∈ C(X, R) ⟹ µ(λ_1 f_1 + λ_2 f_2) = λ_1 µ(f_1) + λ_2 µ(f_2)
• µ is positive:
f ≥ 0 ⟹ µ(f) ≥ 0
• µ is normalised:
µ(1) = 1
Theorem 4.1. Riesz Representation Theorem:
Let w : C(X, R) → R be a linear, bounded, positive, normalised functional. Then there is a unique µ ∈ M(X) such that
w(f) = ∫ f dµ
Definition 4.1. Complete:
A metric space is complete if every Cauchy sequence converges.
Definition 4.2. Separable:
A metric space is separable if it contains a countable dense subset.
Proposition 4.2. M (X) is convex.
i.e. µ1 , µ2 ∈ M (X), α ∈ [0, 1] then αµ1 + (1 − α)µ2 ∈ M (X)
Definition 4.3. Weak Convergence:
If µ, {µ_n}_{n=1}^∞ ∈ M(X) then µ_n converges to µ weakly if ∀f ∈ C(X, R) we have that
lim_{n→∞} ∫ f dµ_n = ∫ f dµ
Lemma 4.2. There exists a countable dense subset {f_n}_{n=1}^∞ ⊂ C(X, R), and for all µ, ν ∈ M(X)
d(µ, ν) := Σ_{n=1}^∞ (1 / (2^n ||f_n||_∞)) |∫ f_n dµ − ∫ f_n dν|
is a metric on M(X) compatible with the notion of weak convergence.
Theorem 4.2. If X is a compact metric space then M(X) is weakly compact.
Proof. It suffices to show that M(X) is sequentially compact, i.e. that every {µ_n}_{n=1}^∞ ∈ M(X) has a weakly convergent subsequence.
C(X, R) is separable, so choose a countable dense subset {f_i}_{i=1}^∞ ⊂ C(X, R). Given {µ_n}_{n=1}^∞ ∈ M(X) we have that
|µ_n(f_1)| ≤ ||f_1||_∞ ∀n
by boundedness of the µ_n; since this is a bounded sequence in R there is some convergent subsequence {µ_{n_k(1)}}_{k=1}^∞ ⊆ {µ_n}_{n=1}^∞.
Similarly for each r = 2, 3, ... we have that
|µ_n(f_r)| ≤ ||f_r||_∞ ∀n
so there is some convergent subsequence {µ_{n_k(r)}}_{k=1}^∞ ⊆ {µ_{n_k(r−1)}}_{k=1}^∞.
In particular let ν_n := µ_{n_n(n)} be the diagonal sequence; then ν_n(f_r) converges ∀r ≥ 1.
Since the f_r are dense we have that for any f ∈ C(X, R) and fixed ε > 0 we can find some f_i in our countable set such that ||f − f_i||_∞ < ε. Since ν_n(f_i) converges we can find N ∈ N such that ∀m, n ≥ N we have that
|ν_n(f_i) − ν_m(f_i)| < ε
hence
|ν_n(f) − ν_m(f)| ≤ |ν_n(f) − ν_n(f_i)| + |ν_n(f_i) − ν_m(f_i)| + |ν_m(f_i) − ν_m(f)| ≤ 3ε
So indeed ν_n(f) converges. Moreover, writing
w(f) = lim_{n→∞} ν_n(f)
we have that w satisfies the Riesz representation theorem criteria, hence there is a unique µ ∈ M(X) s.t.
w(f) = ∫ f dµ
so we must have that
lim_{n→∞} ∫ f dν_n = ∫ f dµ ∀f ∈ C(X, R)
so indeed ν_n converges weakly to µ and we have a convergent subsequence.
5 Measure Preserving Transformations
Definition 5.1. Measure Preserving Transformation:
A measurable T : X → X on (X, B, µ) is a measure preserving transformation if µ(T^{−1}(A)) = µ(A) ∀A ∈ B
Lemma 5.1. TFAE:
1. T is a measure preserving transformation.
2. ∀f ∈ L¹(X, B, µ) we have that
∫ f ∘ T dµ = ∫ f dµ
Proof. 2 ⟹ 1):
For A ∈ B we have that χ_A ∈ L¹(X, B, µ) and
µ(A) = ∫ χ_A dµ = ∫ χ_A ∘ T dµ = ∫ χ_{T^{−1}(A)} dµ = µ(T^{−1}(A))
1 ⟹ 2):
Suppose T is a measure preserving transformation. Then for any characteristic function we have that:
∫ χ_A dµ = µ(A) = µ(T^{−1}(A)) = ∫ χ_A ∘ T dµ
This extends to simple functions by linearity, and for non-negative f ∈ L¹(X, B, µ) we can find an increasing sequence of simple functions f_n ∈ L¹(X, B, µ) s.t. f_n → f pointwise. In particular f_n ∘ T → f ∘ T pointwise, so by monotone convergence:
∫ f dµ = lim_{n→∞} ∫ f_n dµ = lim_{n→∞} ∫ f_n ∘ T dµ = ∫ f ∘ T dµ
The general case follows by splitting f = f^+ − f^−.
Definition 5.2. Push Forward Measure:
For T : X → X continuous on compact X we define T∗ : M (X) → M (X) by T∗ µ(A) = µ(T −1 A) and
call T∗ µ the push forward measure.
Notice that µ is T -invariant iff T∗ µ = µ
Lemma 5.2. For f ∈ C(X, R) we have that
∫ f d(T_*µ) = ∫ f ∘ T dµ
Lemma 5.3. If T : X → X is continuous on a compact metric space X then TFAE:
1. T_*µ = µ
2. ∀f ∈ C(X, R) we have that:
∫ f dµ = ∫ f ∘ T dµ
Proof. 1 ⟹ 2) follows from Lemma 5.1, since continuous functions on a compact space are integrable.
2 ⟹ 1): Let w_1, w_2 : C(X, R) → R be defined by
w_1(f) = ∫ f dµ
w_2(f) = ∫ f d(T_*µ)
which satisfy the criteria for the Riesz representation theorem, and
w_2(f) = ∫ f d(T_*µ) = ∫ f ∘ T dµ = ∫ f dµ = w_1(f)
but by uniqueness from the RRT we must have that µ = T_*µ.
Theorem 5.1. Let T : X → X be a continuous mapping of a compact metric space X. Then there is at least one T-invariant probability measure.
Proof. Let σ ∈ M(X) and define
µ_n := (1/n) Σ_{j=0}^{n−1} T_*^j σ
Then we have that
∫ f dµ_n = (1/n) Σ_{j=0}^{n−1} ∫ f ∘ T^j dσ
Since M(X) is weakly compact, µ_n has a convergent subsequence µ_{n_k} converging weakly to some µ ∈ M(X). We need to show that µ is T-invariant.
Let f : X → R be continuous; then
|∫ f ∘ T dµ − ∫ f dµ| = lim_{k→∞} |∫ f ∘ T dµ_{n_k} − ∫ f dµ_{n_k}|
= lim_{k→∞} (1/n_k) |Σ_{j=0}^{n_k−1} ∫ f ∘ T^{j+1} dσ − Σ_{j=0}^{n_k−1} ∫ f ∘ T^j dσ|
= lim_{k→∞} (1/n_k) |∫ f ∘ T^{n_k} dσ − ∫ f dσ|
≤ lim_{k→∞} 2||f||_∞ / n_k = 0
so by Lemma 5.3 µ is T-invariant.
Theorem 5.2. For a compact metric space X and a continuous mapping T : X → X we have that:
• M(X, T), the set of T-invariant probability measures, is convex.
• M(X, T) is closed.
Proof.
• Let µ_1, µ_2 ∈ M(X, T) and α ∈ (0, 1). For any measurable B:
(αµ_1 + (1 − α)µ_2)(T^{−1}B) = αµ_1(T^{−1}B) + (1 − α)µ_2(T^{−1}B) = αµ_1(B) + (1 − α)µ_2(B) = (αµ_1 + (1 − α)µ_2)(B)
• Let {µ_n}_{n=1}^∞ ∈ M(X, T) be a sequence of T-invariant probability measures converging weakly to some µ ∈ M(X). For f ∈ C(X, R):
∫ f ∘ T dµ = lim_{n→∞} ∫ f ∘ T dµ_n = lim_{n→∞} ∫ f dµ_n = ∫ f dµ
so µ ∈ M(X, T) by Lemma 5.3.
Corollary 5.1. In order to show that a measure µ is invariant for a continuous mapping T : X → X we can simply check µ(T^{−1}B) = µ(B) for open intervals B.
Open intervals generate the Borel σ-algebra, and hence by the uniqueness in Kolmogorov's extension theorem the measures T_*µ and µ coincide on the entire σ-algebra.
Definition 5.3. Fourier Series:
For f ∈ L¹(R/Z, B, µ) we have the Fourier series
Σ_{n=−∞}^∞ c_n e^{2πinx}
where
c_n = ∫_0^1 f(x) e^{−2πinx} dµ(x)
For general f we do not have that this series necessarily converges.
Lemma 5.4. Riemann–Lebesgue:
If f ∈ L¹ then lim_{|n|→∞} c_n = 0
We denote by
S_n(x) := Σ_{r=−n}^n c_r e^{2πirx}
the partial sums of the Fourier series and by
σ_n(x) := (1/n) Σ_{k=0}^{n−1} S_k(x)
the Cesàro averages.
Theorem 5.3. Riesz–Fischer:
For f ∈ L²(R/Z, B, µ) we have that S_n converges to f in L².
Lemma 5.5. For f ∈ L²(R/Z, B, µ) we have that S_n converges to f µ-a.e.
Theorem 5.4. Fejér:
If f is continuous then σ_n converges uniformly to f.
Corollary 5.2. By Fejér's theorem, in order to determine whether a measure µ is T-invariant it suffices to check that
∫ σ_n dµ = ∫ σ_n ∘ T dµ for all n
Moreover, by Riesz–Fischer, so long as f ∈ L² it suffices to show that
∫ S_n dµ = ∫ S_n ∘ T dµ for all n
Definition 5.4. Equivalent:
Two measures µ, ν on the same measurable space are equivalent if they have the same collection of null
sets.
Lemma 5.6. If T : R^k/Z^k → R^k/Z^k is a linear toral endomorphism defined as T(x) = Ax mod 1, then T preserves the Lebesgue measure λ.
Proof. Let f ∈ L¹(R^k/Z^k, B, λ); then f has Fourier series
Σ_{n∈Z^k} c_n e^{2πi⟨n,x⟩}
where
c_n := ∫_{R^k/Z^k} f(x) e^{−2πi⟨n,x⟩} dλ
Moreover
∫_{R^k/Z^k} e^{2πi⟨n,x⟩} dλ = 1 if n = 0, and 0 if n ≠ 0
Writing ⟨n, Ax⟩ = ⟨nA, x⟩ (with n a row vector), since det(A) ≠ 0 we have nA = 0 ⟺ n = 0, so
∫ f ∘ T dλ = ∫ Σ_{n∈Z^k} c_n e^{2πi⟨n,Ax⟩} dλ = Σ_{n∈Z^k} c_n ∫ e^{2πi⟨nA,x⟩} dλ = c_0 = ∫ f dλ
Theorem 5.5. Perron–Frobenius:
If B is a non-negative, aperiodic, k × k matrix then:
• ∃λ > 0 an eigenvalue of B such that |λ̃| < λ for all other eigenvalues λ̃ of B.
• λ is simple, i.e. the eigenspace of λ is one dimensional.
• There is a unique right-eigenvector v s.t. v_i > 0 ∀i, Bv = λv and Σ_{i=1}^k v_i = 1.
• There is a unique left-eigenvector u s.t. u_i > 0 ∀i, uB = λu and Σ_{i=1}^k u_i = 1.
• Eigenvectors corresponding to other eigenvalues have at least one negative entry.
Definition 5.5. Stochastic Matrix:
A k × k matrix P is called stochastic if
• P_{i,j} ≥ 0
• Σ_{j=1}^k P_{i,j} = 1 ∀i
Definition 5.6. Compatible:
A stochastic matrix P is compatible with a 0-1 matrix A if P_{i,j} > 0 ⟺ A_{i,j} = 1
Corollary 5.3. P is aperiodic if and only if A is aperiodic where P is compatible with A.
Lemma 5.7. If P is a stochastic matrix then P satisfies the hypotheses of the Perron-Frobenius theorem, so it has a strictly largest eigenvalue λ > 0; moreover λ = 1, with corresponding right-eigenvector v = (1, ..., 1), which is the unique eigenvector with all entries positive.
Definition 5.7. Markov Measure:
If P is a stochastic matrix and p the left-eigenvector of P then
µ_P[y_0, ..., y_n] := p_{y_0} ∏_{i=1}^{n} P_{y_{i-1},y_i}
defines the Markov measure on cylinder sets, which extends to the shift space by the Kolmogorov extension theorem.
Lemma 5.8. A Markov measure is σ-invariant.
Proof.
σ_*µ_P[y_0, ..., y_n] = µ_P(σ^{-1}[y_0, ..., y_n])
= µ_P(⋃_{i=1}^{k} [i, y_0, ..., y_n])
= ∑_{i=1}^{k} µ_P([i, y_0, ..., y_n])
= ∑_{i=1}^{k} p_i P_{i,y_0} ∏_{j=1}^{n} P_{y_{j-1},y_j}
= p_{y_0} ∏_{j=1}^{n} P_{y_{j-1},y_j}     since p is a left eigenvector of P with eigenvalue 1
= µ_P([y_0, ..., y_n])
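The cylinder computation above is easy to sanity-check by machine. A small pure-Python sketch (the matrix P and stationary vector p are an arbitrary example of ours, not from the notes): on every length-3 cylinder, µ_P(σ^{-1}[y]) = ∑_i µ_P([i, y_0, ..., y_n]) equals µ_P([y]).

```python
from itertools import product

# An arbitrary 2x2 stochastic matrix and its left eigenvector p P = p.
P = [[0.5, 0.5],
     [0.25, 0.75]]
p = [1 / 3, 2 / 3]

def mu(word):
    # mu_P([y_0, ..., y_n]) = p_{y_0} * prod_i P_{y_{i-1}, y_i}
    m = p[word[0]]
    for a, b in zip(word, word[1:]):
        m *= P[a][b]
    return m

for word in product(range(2), repeat=3):
    # shift-invariance on this cylinder: mu([y]) = sum_i mu([i, y])
    assert abs(mu(word) - sum(mu((i,) + word) for i in range(2))) < 1e-12
print("mu_P is shift-invariant on all length-3 cylinders")
```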
There are uncountably many σ-invariant probability measures.
Definition 5.8. Bernoulli Measure:
For a full shift we define the Bernoulli measure by the stochastic matrix P_{i,j} = p_j so that
µ_P([y_0, ..., y_n]) := ∏_{i=0}^{n} p_{y_i}
Definition 5.9. Parry Measure:
For a 0,1 matrix A with eigenvalue λ and eigenvectors u, v determined by Perron-Frobenius we define the Parry measure to be that generated by
P_{i,j} = A_{i,j} v_j / (λ v_i)
and
p_i = u_i v_i / ∑_{j=1}^{k} u_j v_j
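A concrete instance of this construction, as a pure-Python sketch (the golden-mean shift is our example choice, not one worked in the notes): A = [[1, 1], [1, 0]] has largest eigenvalue λ = (1 + √5)/2, and since A is symmetric the left and right eigenvectors coincide.

```python
import math

A = [[1, 1], [1, 0]]
lam = (1 + math.sqrt(5)) / 2
v = [lam, 1.0]                   # right eigenvector: A v = lam v
u = [lam, 1.0]                   # left eigenvector:  u A = lam u

k = 2
c = sum(u[i] * v[i] for i in range(k))
P = [[A[i][j] * v[j] / (lam * v[i]) for j in range(k)] for i in range(k)]
p = [u[i] * v[i] / c for i in range(k)]

# P is stochastic and p is stationary, so (p, P) define a sigma-invariant
# Markov measure supported on the sequences allowed by A.
assert all(abs(sum(row) - 1) < 1e-12 for row in P)
assert all(abs(sum(p[i] * P[i][j] for i in range(k)) - p[j]) < 1e-12
           for j in range(k))
print(P, p)
```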
6 Ergodicity
Definition 6.1. Ergodic:
If (X, B, µ) is a probability space then the measure preserving transformation T : X → X is ergodic if
B ∈ B s.t. T −1 (B) = B implies that µ(B) ∈ {0, 1}
Theorem 6.1. If T is an ergodic measure preserving transformation of a probability space (X, B, µ) and f ∈ L^1(X, B, µ) then
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = ∫ f dµ
for µ-a.e. x ∈ X.
Lemma 6.1. If ∃A ∈ B such that T^{-1}(A) = A but µ(A) ∈ (0, 1) then T is not ergodic for µ, but µ_A defined by
µ_A(B) = µ(B ∩ A)/µ(A)
is invariant with respect to T.
Lemma 6.2. If B ∈ B is such that µ(T −1 B∆B) = 0 then ∃B∞ ∈ B such that T −1 (B∞ ) = B∞ and
µ(B∞ ∆B) = 0
Corollary 6.1. If T is ergodic and µ(T −1 (B)∆B) = 0 then µ(B) ∈ {0, 1}
Proposition 6.1. Let T be a measure preserving transformation of the probability space (X, B, µ) then
TFAE:
• T is ergodic.
• Whenever f ∈ L1 (X, B, µ) such that f ◦ T = f
µ a.e. we have that f is constant µ a.e.
Proof.
• 1 ⟹ 2)
Suppose T is ergodic and f ∈ L^1(X, B, µ) is such that f ∘ T = f µ-a.e.
For k ∈ Z, n ∈ N define
X(k, n) := {x ∈ X : k/2^n ≤ f(x) < (k+1)/2^n} = f^{-1}([k/2^n, (k+1)/2^n))
Since f is measurable we have that X(k, n) ∈ B.
Moreover T^{-1}(X(k, n)) ∆ X(k, n) ⊂ {x ∈ X : f(x) ≠ f(Tx)}, hence µ(T^{-1}(X(k, n)) ∆ X(k, n)) = 0, so we must have that µ(X(k, n)) ∈ {0, 1} by Corollary 6.1.
For fixed n ∈ N we have that
X = ⋃_{k∈Z} X(k, n) ∪ X_∞
forms a disjoint partition, where X_∞ := {x : f(x) = ±∞}.
Furthermore µ(X_∞) = 0 since f ∈ L^1, which means that
1 = µ(X) = ∑_{k∈Z} µ(X(k, n))
Since each X(k, n) satisfies µ(X(k, n)) ∈ {0, 1}, for each n ∈ N there is a unique k_n ∈ Z such that µ(X(k_n, n)) = 1.
If we let
Y = ⋂_{n=1}^{∞} X(k_n, n) = {x ∈ X : f(x) = c}
for some c, then µ(Y) = 1 and f is constant on Y, so f is constant µ-a.e.
• 2 ⟹ 1)
Suppose B ∈ B is such that T^{-1}(B) = B.
Then χ_B ∈ L^1 and χ_B ∘ T = χ_{T^{-1}B} = χ_B, so by our assumption χ_B is constant µ-a.e., hence
µ(B) = ∫ χ_B dµ ∈ {0, 1}
Hence T is ergodic.
This extends to f ∈ L^2(X, B, µ) using the same proof.
Theorem 6.2. If T : R/Z → R/Z is the rotation T(x) = x + α mod 1 then T is ergodic for the Lebesgue measure iff α ∈ R \ Q
Proof. Suppose α ∈ Q and let α = p/q for p, q ∈ Z coprime.
Define
f(x) = e^{2πiqx} ∈ L^2
which isn't constant. Then
f(Tx) = e^{2πiq(x+α)} = e^{2πiqx} e^{2πiqα} = e^{2πiqx} e^{2πip} = e^{2πiqx}
hence f ∘ T = f µ-a.e. but f isn't constant µ-a.e., hence T cannot be ergodic.
Suppose α ∈ R \ Q and f ∈ L^2 with f ∘ T = f µ-a.e.
Then f has Fourier series
∑_{n=-∞}^{∞} c_n e^{2πinx}
Moreover f ∘ T has Fourier series
∑_{n=-∞}^{∞} c_n e^{2πinα} e^{2πinx}
Since f ∘ T = f µ-a.e. the Fourier series must coincide, so comparing coefficients gives, for n ≠ 0, c_n = c_n e^{2πinα}; since α is irrational we have e^{2πinα} ≠ 1 for n ≠ 0, so this forces c_n = 0.
We therefore have that f has Fourier series c_0, which is a constant, hence f = c_0 µ-a.e. and T is ergodic.
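A numeric companion to this theorem (f, α and the starting point are our choices): for irrational α the Birkhoff averages of a continuous f under T(x) = x + α mod 1 settle down to ∫_0^1 f dx.

```python
import math

alpha = math.sqrt(2) - 1                         # irrational
f = lambda x: math.cos(2 * math.pi * x) ** 2     # int_0^1 f dx = 1/2

def birkhoff_average(x, n):
    # (1/n) sum_{j=0}^{n-1} f(T^j x) for the rotation T(x) = x + alpha mod 1
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / n

avg = birkhoff_average(0.1, 10**5)
print(round(avg, 4))          # close to 0.5
```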
Theorem 6.3. If T : R/Z → R/Z is the doubling map T (x) = 2x mod 1 then T is ergodic with respect
to the Lebesgue measure.
Proof. Suppose f ∈ L^2 with f ∘ T = f µ-a.e.; then f ∘ T^j = f µ-a.e. for any j ∈ N.
f has Fourier series
∑_{n∈Z} c_n e^{2πinx}
and f ∘ T^j has Fourier series
∑_{n∈Z} c_n e^{2πin2^j x}
Since f = f ∘ T^j µ-a.e. the Fourier series must coincide, hence c_n = c_{n2^j} for any j ≥ 0.
If n ≠ 0 then lim_{j→∞} |n2^j| = ∞, but by the Riemann-Lebesgue lemma the coefficients must converge to zero: lim_{j→∞} c_{n2^j} = 0.
This means that c_n = 0 whenever n ≠ 0; in particular f = c_0 µ-a.e., hence T is ergodic.
Lemma 6.3. If T : R^k/Z^k → R^k/Z^k is a linear toral automorphism T(x) = Ax mod 1 then TFAE:
• T is ergodic with respect to the Lebesgue measure.
• The only n ∈ Z^k s.t. ∃p ∈ N with e^{2πi<n,A^p x>} = e^{2πi<n,x>} µ-a.e. is n = 0.
Proof. Suppose T is ergodic with respect to µ and that ∃n ∈ Z^k, p ∈ N s.t.
e^{2πi<n,A^p x>} = e^{2πi<n,x>} µ-a.e.
WLOG let p be the smallest such p for this n and define
f(x) = ∑_{j=0}^{p-1} e^{2πi<n,A^j x>} ∈ L^2
Notice that f ∘ T = f µ-a.e.
Since T is ergodic we must have that f is constant, which is only the case if n = 0.
Conversely, suppose the only n ∈ Z^k s.t. ∃p ∈ N with e^{2πi<n,A^p x>} = e^{2πi<n,x>} µ-a.e. is n = 0.
Let f ∈ L^2 s.t. f ∘ T = f µ-a.e.
f has Fourier series
∑_{n∈Z^k} c_n e^{2πi<n,x>}
and f ∘ T^p has Fourier series
∑_{n∈Z^k} c_n e^{2πi<n,A^p x>} = ∑_{n∈Z^k} c_n e^{2πi<nA^p,x>}
Since f = f ∘ T^p µ-a.e. we can equate coefficients, which gives that c_n = c_{nA^p} for any p ≥ 0.
Suppose c_n ≠ 0; then c_{nA^p} ≠ 0 for every p.
If lim_{p→∞} ||nA^p|| = ∞ then by Riemann-Lebesgue we must have lim_{p→∞} c_{nA^p} = 0, a contradiction, so the sequence nA^p must have repeats.
This means ∃l > l' s.t. nA^l = nA^{l'}, and so (as A is invertible) nA^p = n for some p ∈ N.
This gives e^{2πi<n,A^p x>} = e^{2πi<n,x>}, and so by our assumption n = 0.
This means that f = c_0 is constant and hence T is ergodic.
Proposition 6.2. If T : R^k/Z^k → R^k/Z^k is a linear toral automorphism T(x) = Ax mod 1 then T is ergodic with respect to the Lebesgue measure iff A has no roots of unity as eigenvalues.
Proof. Suppose that T is not ergodic; then by the previous lemma ∃n ∈ Z^k \ {0}, p ∈ N s.t.
e^{2πi<n,A^p x>} = e^{2πi<n,x>}
So we have that nA^p = n, and since n ≠ 0 we must have that 1 is an eigenvalue of A^p, so A has an eigenvalue which is a root of unity.
Conversely, suppose A has a pth root of unity as an eigenvalue.
Then A^p has 1 as an eigenvalue, hence ∃n ∈ R^k \ {0} such that n(A^p − I) = 0.
In particular, since A^p has integer entries, we can choose n such that n ∈ Z^k \ {0}.
This means that nA^p = n and e^{2πi<n,A^p x>} = e^{2πi<n,x>}, so by the previous lemma T is not ergodic with respect to µ.
Corollary 6.2. Hyperbolic toral automorphisms are ergodic with respect to the Lebesgue measure.
Definition 6.2. Extremal:
For a convex set Y we say that y ∈ Y is extremal if y = αy1 + (1 − α)y2 where y1 , y2 ∈ Y and
α ∈ (0, 1) implies that y1 = y2 = y
Theorem 6.4. For µ ∈ M(X, T) we have that if µ is extremal then µ is ergodic.
Proof. Suppose that µ is not ergodic; then ∃B ∈ B such that T^{-1}(B) = B and µ(B) ∈ (0, 1).
Define
µ_1(A) := µ(A ∩ B)/µ(B)
µ_2(A) := µ(A ∩ (X \ B))/µ(X \ B)
which are both T-invariant probability measures; moreover µ_1 ≠ µ_2 and
µ = µ(B)µ_1 + (1 − µ(B))µ_2
hence µ cannot be extremal.
Theorem 6.5. If T : X → X is a continuous mapping on a compact metric space then M (X, T )
contains at least one ergodic measure.
Proof. By the previous theorem it suffices to show that ∃µ ∈ M(X, T) extremal.
C(X, R) is separable, so choose a countable dense set {f_n}_{n=0}^{∞} ⊂ C(X, R).
The map µ ↦ ∫ f_0 dµ is continuous in the weak* topology, so since M(X, T) is compact ∃ν ∈ M(X, T) s.t.
∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ
Which means that
M_0 := {ν ∈ M(X, T) : ∫ f_0 dν = sup_{µ∈M(X,T)} ∫ f_0 dµ}
is a non-empty, closed subset of a compact space and hence is compact.
Continuing inductively we can define
M_n := {ν ∈ M_{n-1} : ∫ f_n dν = sup_{µ∈M_{n-1}} ∫ f_n dµ}
each of which is non-empty and compact.
If we define M_∞ := ⋂_{n=0}^{∞} M_n then M_∞ is non-empty, since a countable intersection of nested, non-empty compact sets is non-empty.
We therefore have ∃µ_∞ ∈ M_∞. We claim that µ_∞ is extremal.
Suppose µ_∞ = αµ_1 + (1 − α)µ_2 for α ∈ (0, 1) and µ_1, µ_2 ∈ M(X, T); we want to show that µ_1 = µ_2.
By the Riesz representation theorem we have that µ_1 = µ_2 iff
∫ f dµ_1 = ∫ f dµ_2   ∀f ∈ C(X, R)
and it suffices to check this on a dense subset.
Now
∫ f_0 dµ_∞ = α ∫ f_0 dµ_1 + (1 − α) ∫ f_0 dµ_2
so we must have that
sup_{µ∈M(X,T)} ∫ f_0 dµ = ∫ f_0 dµ_∞ ≤ max{∫ f_0 dµ_1, ∫ f_0 dµ_2} ≤ sup_{µ∈M(X,T)} ∫ f_0 dµ
since µ_1, µ_2 ∈ M(X, T). As α ∈ (0, 1), a strict inequality ∫ f_0 dµ_i < ∫ f_0 dµ_∞ for either i is impossible, so
∫ f_0 dµ_1 = ∫ f_0 dµ_2 = ∫ f_0 dµ_∞
and therefore µ_1, µ_2 ∈ M_0.
Suppose inductively that µ_1, µ_2 ∈ M_{n-1}; then
∫ f_n dµ_∞ = α ∫ f_n dµ_1 + (1 − α) ∫ f_n dµ_2
and, arguing exactly as before with sup_{µ∈M_{n-1}} in place of sup_{µ∈M(X,T)}, since α ∈ (0, 1) we get
∫ f_n dµ_1 = ∫ f_n dµ_2 = ∫ f_n dµ_∞
so µ_1, µ_2 ∈ M_n.
This holds ∀n ∈ N, hence
∫ f dµ_1 = ∫ f dµ_2   ∀f ∈ {f_n}_{n=0}^{∞}
so indeed µ_1 = µ_2.
Corollary 6.3. If σ : Σ_k → Σ_k is the full shift and p is a probability vector then
µ_p[z_0, ..., z_{n-1}] := ∏_{i=0}^{n-1} p_{z_i}
is ergodic for σ.
Corollary 6.4. The Gauss measure
µ(B) = (1/log 2) ∫_B 1/(1+x) dx
is ergodic for the continued fraction map.
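A numeric sketch of this corollary (the test function f and the starting point are arbitrary choices of ours): Birkhoff averages of f along orbits of the Gauss map G(x) = 1/x mod 1 approach the integral of f against the Gauss measure dµ = (1/log 2) dx/(1+x). Floating-point orbits of G are only shadows of true orbits, so this is an illustration, not a proof.

```python
import math

f = lambda x: x

def gauss_average(x, n):
    # (1/n) sum_{j=0}^{n-1} f(G^j x) for the Gauss map G(x) = 1/x mod 1
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = 1.0 / x
        x -= math.floor(x)
        if x == 0.0:          # guard: rounding landed on a rational endpoint
            x = 0.5
    return total / n

# closed form: (1/log 2) * int_0^1 x/(1+x) dx = (1 - log 2)/log 2
expected = (1 - math.log(2)) / math.log(2)
avg = gauss_average(math.pi - 3, 10**5)
print(round(avg, 3), round(expected, 3))
```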
7 Recurrence and Unique Ergodicity
Theorem 7.1. Poincaré Recurrence Theorem:
Let T : X → X be a measure preserving transformation of the probability space (X, B, µ) for compact X. If A ∈ B is such that µ(A) > 0 then for µ-almost every x ∈ A the orbit {T^n x}_{n=0}^{∞} returns to A infinitely often.
Proof. Let E = {x ∈ A : ∃m ∈ N s.t. T^n x ∉ A ∀n ≥ m}, which is exactly the set of x ∈ A whose orbit returns to A only finitely often.
We want to show that µ(E) = 0.
Let F = {x ∈ A : T^n x ∉ A ∀n ≥ 1}; then T^{-k}F = {x ∈ X : T^k x ∈ A, T^n x ∉ A ∀n > k}.
So in particular we have that
E = ⋃_{k=0}^{∞} (T^{-k}F ∩ A)
So we have that
µ(E) = µ(⋃_{k=0}^{∞} (T^{-k}F ∩ A)) ≤ µ(⋃_{k=0}^{∞} T^{-k}F) ≤ ∑_{k=0}^{∞} µ(T^{-k}F) = ∑_{k=0}^{∞} µ(F)
So it suffices to show that µ(F) = 0.
Suppose n > m and x ∈ T^{-n}F ∩ T^{-m}F.
We must have that T^m x ∈ F and also that T^{n-m}(T^m x) = T^n x ∈ F ⊂ A, which contradicts T^m x ∈ F (points of F never return to A), so the sets {T^{-k}F}_{k=0}^{∞} are pairwise disjoint.
This gives us that
µ(⋃_{k=0}^{∞} T^{-k}F) = ∑_{k=0}^{∞} µ(T^{-k}F) = ∑_{k=0}^{∞} µ(F)
The left-hand side lies in the interval [0, 1] since µ is a probability measure, and the right-hand side, being an infinite sum of the same non-negative value, can only take the values 0 or +∞; hence both sides must equal zero and therefore µ(F) = 0.
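The recurrence phenomenon is easy to watch numerically (all parameters below are our illustration choices): the rotation T(x) = x + α mod 1 preserves Lebesgue measure, and the orbit of a point of A = [0, 0.05) keeps returning to A; since this irrational rotation is in fact ergodic, the return frequency even equals µ(A) = 0.05.

```python
import math

alpha = math.sqrt(3) - 1        # irrational rotation number
x = 0.0                         # a point of A = [0, 0.05)
N = 200000
# times n at which T^n x lands back in A
return_times = [n for n in range(1, N + 1)
                if ((x + n * alpha) % 1.0) < 0.05]
freq = len(return_times) / N
print(len(return_times), round(freq, 4))   # many returns; freq near 0.05
```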
Definition 7.1. Unique Ergodicity:
If (X, B) is a measurable space for X compact and T : X → X has a unique invariant measure µ then
T is called uniquely ergodic.
Theorem 7.2. Let X be a compact metric space and T : X → X continuous; then the following are equivalent:
1. T is uniquely ergodic.
2. ∀f ∈ C(X, R) ∃ a constant c_f such that
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = c_f
uniformly over x ∈ X.
Proof. We split the proof into the two separate implications:
• 2 ⟹ 1)
Suppose µ, ν are T-invariant probability measures; then for f ∈ C(X, R)
∫ f dµ = (1/n) ∑_{j=0}^{n-1} ∫ f ∘ T^j dµ
= lim_{n→∞} ∫ (1/n) ∑_{j=0}^{n-1} f ∘ T^j dµ
= ∫ lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f ∘ T^j dµ     by DCT, since f is bounded
= ∫ c_f dµ
= c_f
Similarly the same must hold for ν, hence
∫ f dµ = ∫ f dν
for any f ∈ C(X, R), so indeed µ, ν coincide by the Riesz representation theorem.
• 1 ⟹ 2)
Note that if µ is the invariant measure and 2 holds then ∫ f dµ = c_f.
Suppose 2 fails; we want to show that 1 also fails.
Then ∃f ∈ C(X, R), a sequence {n_k}_{k=1}^{∞} ⊂ N and associated points {x_k}_{k=1}^{∞} ⊂ X such that
lim_{k→∞} (1/n_k) ∑_{j=0}^{n_k-1} f(T^j x_k) exists but differs from ∫ f dµ
For k ≥ 1 define ν_k ∈ M(X) by
ν_k = (1/n_k) ∑_{j=0}^{n_k-1} T^j_* δ_{x_k}
then
∫ f dν_k = (1/n_k) ∑_{j=0}^{n_k-1} f(T^j x_k)
so ν_k has some subsequence ν_{k_r} which converges weakly to some invariant probability measure ν, and
∫ f dν = lim_{r→∞} ∫ f dν_{k_r} = lim_{r→∞} (1/n_{k_r}) ∑_{j=0}^{n_{k_r}-1} f(T^j x_{k_r}) ≠ ∫ f dµ
So ν ≠ µ and T is not uniquely ergodic.
8 Birkhoff's Ergodic Theorem
Definition 8.1. Absolutely Continuous:
If µ, ν are measures on (X, B) then ν is absolutely continuous with respect to µ if
µ(B) = 0 =⇒ ν(B) = 0 for any B ∈ B
We have that if
ν(B) := ∫_B f dµ
then ν is absolutely continuous with respect to µ.
Theorem 8.1. Radon-Nikodym:
Let (X, B, µ) be a probability space and ν a measure on (X, B) absolutely continuous with respect to µ; then there is a unique (up to µ-a.e. equality) non-negative measurable function f such that
ν(B) = ∫_B f dµ   ∀B ∈ B
Definition 8.2. Conditional Expectation:
For A ⊆ B a sub-σ-algebra, µ|_A is a measure.
For f ≥ 0 with f ∈ L^1(X, B, µ),
ν(A) = ∫_A f dµ,   A ∈ A
is a measure absolutely continuous with respect to µ|_A, so by Radon-Nikodym there is a unique A-measurable function E[f|A] s.t.
ν(A) = ∫_A E[f|A] dµ   ∀A ∈ A
called the conditional expectation of f given A.
Corollary 8.1. E[f|A] is uniquely determined by the requirements that
• E[f|A] is A-measurable.
• ∫_A f dµ = ∫_A E[f|A] dµ   ∀A ∈ A
Lemma 8.1. I := {B ∈ B : T^{-1}B = B a.e.} is a σ-algebra of invariant sets.
Theorem 8.2. Birkhoff's Ergodic Theorem:
Let (X, B, µ) be a probability space and T : X → X a measure preserving transformation.
∀f ∈ L^1(X, B, µ) we have that
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = E[f|I](x)
for a.e. x ∈ X.
Corollary 8.2. Let (X, B, µ) be a probability space and T : X → X an ergodic measure preserving transformation.
∀f ∈ L^1(X, B, µ) we have that
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = ∫ f dµ
for a.e. x ∈ X.
Proof. If T is ergodic then I consists only of sets of measure 0 or 1, so for f ∈ L^1(X, B, µ)
E[f|I] = ∫ f dµ
and the result follows by Birkhoff's ergodic theorem.
Corollary 8.3. If T : X → X is an ergodic transformation of (X, B, µ) and B ∈ B then, for a.e. x ∈ X,
lim_{n→∞} (1/n) #{j : 0 ≤ j ≤ n − 1, T^j x ∈ B} = µ(B)
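A sketch of this corollary for the full 2-shift with the (1/2, 1/2) Bernoulli measure (the modelling below is our choice, not from the notes): a µ-typical point is an i.i.d. fair coin sequence, T is the left shift, and B is the cylinder of sequences starting "01", so µ(B) = 1/4. The visit frequency of the orbit to B approaches 1/4.

```python
import random

random.seed(0)                  # fixed seed: a deterministic "typical" point
n = 200000
bits = [random.randrange(2) for _ in range(n + 1)]
# T^j x starts "01" exactly when bits[j] == 0 and bits[j+1] == 1
visits = sum(1 for j in range(n) if bits[j] == 0 and bits[j + 1] == 1)
freq = visits / n
print(round(freq, 3))           # close to 0.25
```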
Theorem 8.3. If T : X → X is a measure preserving transformation of the probability space (X, B, µ) then the following are equivalent:
1. T is ergodic.
2. ∀A, B ∈ B
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} µ((T^{-j}A) ∩ B) = µ(A)µ(B)
Proof.
• 1 ⟹ 2)
Suppose T is ergodic; then χ_A ∈ L^1, so
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} χ_A(T^j x) = ∫ χ_A dµ = µ(A)   a.e.
and therefore
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} (χ_A ∘ T^j)χ_B = µ(A)χ_B   a.e.
Since the left-hand side is bounded by 1, by DCT we have that
(1/n) ∑_{j=0}^{n-1} µ((T^{-j}A) ∩ B) = (1/n) ∑_{j=0}^{n-1} ∫ (χ_A ∘ T^j)χ_B dµ = ∫ (1/n) ∑_{j=0}^{n-1} (χ_A ∘ T^j)χ_B dµ
and so
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} µ((T^{-j}A) ∩ B) = lim_{n→∞} ∫ (1/n) ∑_{j=0}^{n-1} (χ_A ∘ T^j)χ_B dµ = ∫ µ(A)χ_B dµ = µ(A)µ(B)
• 2 ⟹ 1)
Suppose 2 holds and that T^{-1}A = A; then set B = A, which gives us that
µ((T^{-j}A) ∩ B) = µ(A ∩ B) = µ(A)
So
µ(A) = (1/n) ∑_{j=0}^{n-1} µ(A) = (1/n) ∑_{j=0}^{n-1} µ((T^{-j}A) ∩ B) → µ(A)µ(B) = µ(A)^2
So µ(A) = µ(A)^2, i.e. µ(A) ∈ {0, 1}, and therefore T is ergodic.
Definition 8.3. Weak-Mixing:
T is weak mixing if ∀A, B ∈ B we have that
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} |µ((T^{-j}A) ∩ B) − µ(A)µ(B)| = 0
Definition 8.4. Strong-Mixing:
T is strong mixing if ∀A, B ∈ B we have that
lim_{n→∞} µ((T^{-n}A) ∩ B) = µ(A)µ(B)
Definition 8.5. Normal:
We call x ∈ [0, 1) normal if it has a unique binary expansion
x = ∑_{i=1}^{∞} x_i/2^i
for x_i ∈ {0, 1} with
lim_{n→∞} (1/n) #{j : 1 ≤ j ≤ n, x_j = 0} = 1/2
9 Entropy
Definition 9.1. Topologically Conjugate:
For compact spaces X, Y we say continuous functions T : X → X, S : Y → Y are topologically
conjugate if ∃h : X → Y homeomorphism such that h ◦ T = S ◦ h
Definition 9.2. Isomorphic:
If T, S are measure preserving transforms on probability spaces (X, B, µ), (Y, C, ν) respectively then T, S
are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, SN ⊆ N
• µ(M ) = 1 = ν(N )
• ∃ϕ : M → N bijection s.t.
– ϕ, ϕ^{-1} are measurable: ϕ(B) ∈ C ∀B ∈ B, ϕ^{-1}(C) ∈ B ∀C ∈ C
– ϕ, ϕ^{-1} are measure preserving: µ(ϕ^{-1}C) = ν(C) ∀C ∈ C, ν(ϕB) = µ(B) ∀B ∈ B
– ϕ ∘ T = S ∘ ϕ
Definition 9.3. Conditional Probability:
Let (X, B, µ) be a probability space; if A ⊂ B is a sub-σ-algebra and B ∈ B then
µ(B|A) := E[χ_B|A]
is the conditional probability of B given A
Definition 9.4. Countable Partition:
α is a measure theoretic countable partition of the probability space (X, B, µ) if α = {A_i}_{i=1}^{∞} s.t.
• A_i ∈ B ∀i
• µ(A_i ∩ A_j) = 0 ∀i ≠ j
• µ(⋃_{i=1}^{∞} A_i) = 1
Corollary 9.1. For a measurable function f we have that
E[f|σ(α)](x) = ∑_{A∈α} (χ_A(x)/µ(A)) ∫_A f dµ
Furthermore
µ(B|σ(α))(x) = µ(A ∩ B)/µ(A)   for x ∈ A
Theorem 9.1. Increasing Martingale Theorem:
Let {A_i}_{i=1}^{∞} be an increasing sequence of sub-σ-algebras of A such that σ(⋃_{i=1}^{∞} A_i) = A; then
• lim_{n→∞} E[f|A_n] = E[f|A]   µ-a.e.
• lim_{n→∞} ∫ |E[f|A_n] − E[f|A]| dµ = 0
Definition 9.5. Join:
If α, β are countable partitions of X then the join of α, β is the partition
α ∨ β := {A ∩ B : A ∈ α, B ∈ β}
Notice that α, β are independent if µ(A ∩ B) = µ(A)µ(B) ∀A ∈ α, B ∈ β.
Definition 9.6. Information:
Given a partition α we define the information I(α) : X → R^+ obtained from observing α to be
I(α)(x) := −∑_{A∈α} χ_A(x) log(µ(A))
Corollary 9.2. I(α) is continuous.
Corollary 9.3. If α, β are independent partitions then I(α ∨ β) = I(α) + I(β)
Proof.
I(α ∨ β)(x) = −∑_{C∈α∨β} χ_C(x) log(µ(C))
= −∑_{A∈α, B∈β} χ_{A∩B}(x) log(µ(A ∩ B))
= −∑_{A∈α} ∑_{B∈β} χ_A(x)χ_B(x) log(µ(A)µ(B))     by independence
= −∑_{A∈α} ∑_{B∈β} χ_A(x)χ_B(x) (log(µ(A)) + log(µ(B)))
= −∑_{A∈α} χ_A(x) log(µ(A)) − ∑_{B∈β} χ_B(x) log(µ(B))
= I(α)(x) + I(β)(x)
Definition 9.7. Entropy:
Given a partition α we define the entropy to be
H(α) = ∫ I(α) dµ = −∑_{A∈α} µ(A) log(µ(A))
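This formula is a one-liner in code. A direct transcription (the function name is ours), with the usual convention 0 log 0 = 0:

```python
import math

def partition_entropy(weights):
    # H(alpha) = -sum_{A in alpha} mu(A) log(mu(A)), skipping zero-mass cells
    return -sum(w * math.log(w) for w in weights if w > 0)

print(partition_entropy([0.5, 0.5]))    # log 2: two equally likely cells
print(partition_entropy([1.0, 0.0]))    # 0: a trivial partition carries no information
```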
Definition 9.8. Conditional Information:
Given a sub-σ-algebra A ⊆ B and a partition α we define the conditional information of α given A to be
I(α|A)(x) := −∑_{A∈α} χ_A(x) log(µ(A|A)(x))
Definition 9.9. Conditional Entropy:
Given a sub-σ-algebra A ⊆ B and a partition α we define the conditional entropy of α given A to be
H(α|A) := ∫ I(α|A) dµ = −∫ ∑_{A∈α} µ(A|A) log(µ(A|A)) dµ
Lemma 9.1. For countable partitions α, β, γ we have that
I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ)
Moreover
H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ)
Proof. Let x ∈ X; since α, β, γ are partitions there are A ∈ α, B ∈ β, C ∈ γ s.t. x ∈ A ∩ B ∩ C.
I(α ∨ β|γ)(x) = −∑_{Y∈α∨β} χ_Y(x) log(µ(Y|γ)(x))
= −log(µ(A ∩ B|γ)(x))
= −log(∑_{C'∈γ} χ_{C'}(x) µ(A ∩ B ∩ C')/µ(C'))
= −log(µ(A ∩ B ∩ C)) + log(µ(C))
Similarly
I(α|γ)(x) = −log(µ(A ∩ C)) + log(µ(C))
I(β|α ∨ γ)(x) = −log(µ(A ∩ B ∩ C)) + log(µ(A ∩ C))
Hence indeed
I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ)
so by integrating over x we have that
H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ)
Definition 9.10. Refinement:
For countable partitions α, β we say that β is a refinement of α (written α ≤ β) if any set A ∈ α can
be written as the union of sets in β.
Corollary 9.4. For countable partitions α ≤ β we have that
I(α|β) = 0
Proof. Since α ≤ β we have that β = α ∨ β so
I(α|β) = I(α|α ∨ β) = 0
Corollary 9.5. If α, β, γ are countable partitions and γ ≥ β then
I(α ∨ β|γ) = I(α|γ)
Moreover
H(α ∨ β|γ) = H(α|γ)
Proof. β ≤ γ ≤ α ∨ γ, so I(β|α ∨ γ) = 0 and
I(α ∨ β|γ) = I(α|γ) + I(β|α ∨ γ) = I(α|γ)
The final result follows by integration.
Corollary 9.6. If α, β, γ are countable partitions s.t. α ≥ β then
I(α|γ) ≥ I(β|γ)
Moreover
H(α|γ) ≥ H(β|γ)
Proof. α ≥ β so α = α ∨ β, hence
I(α|γ) = I(α ∨ β|γ) = I(β|γ) + I(α|β ∨ γ) ≥ I(β|γ)
The final result then follows by integration.
Proposition 9.1. Jensen's Inequality:
Let ϕ : [0, 1] → R^+ be continuous and concave. If f ∈ L^1(X, B, µ) and A ⊂ B is a sub-σ-algebra then
ϕ(E[f|A])(x) ≥ E[ϕ(f)|A](x)   µ-a.e.
Lemma 9.2. If γ ≥ β are countable partitions then for any countable partition α
H(α|β) ≥ H(α|γ)
Proof. Set ϕ(t) = −t log(t), which is continuous and concave on [0, 1] and hence satisfies the requirements for Jensen's inequality.
Choose A ∈ α and define
f(x) := µ(A|γ)(x) = E[χ_A|γ](x)
By properties of conditional expectation (the tower property, since σ(β) ⊆ σ(γ)) we have that
E[f|β] = E[E[χ_A|γ]|β] = E[χ_A|β] = µ(A|β)
By Jensen's inequality ϕ(E[f|β]) ≥ E[ϕ(f)|β], hence
−µ(A|β) log(µ(A|β)) ≥ E[−µ(A|γ) log(µ(A|γ))|β]
Integrating with respect to µ yields
−∫ µ(A|β) log(µ(A|β)) dµ ≥ −∫ µ(A|γ) log(µ(A|γ)) dµ
and summing over A ∈ α gives
H(α|β) ≥ H(α|γ)
Definition 9.11. Sub-Additive:
A sequence {a_n}_{n=1}^{∞} is called sub-additive if a_{n+m} ≤ a_n + a_m
Lemma 9.3. If {a_n}_{n=1}^{∞} is a non-negative sub-additive sequence then a_n/n converges.
Proof. Sub-additivity gives a_n ≤ na_1, so
0 ≤ a_n/n ≤ a_1
and the sequence a_n/n is bounded; by the standard subadditivity (Fekete) argument the limit exists and equals inf_n a_n/n.
For a measure preserving transformation T and countable partition α we denote
T^{-1}α := {T^{-1}A : A ∈ α}
and
H_n(α) := H(∨_{i=0}^{n-1} T^{-i}α)
Lemma 9.4. If T is a measure preserving transformation and α a countable partition then Hn (α) is a
sub-additive sequence.
Proof.
H_{n+m}(α) = H(∨_{i=0}^{n+m-1} T^{-i}α)
= H((∨_{i=0}^{n-1} T^{-i}α) ∨ (∨_{j=n}^{n+m-1} T^{-j}α))
≤ H(∨_{i=0}^{n-1} T^{-i}α) + H(∨_{j=n}^{n+m-1} T^{-j}α)
= H(∨_{i=0}^{n-1} T^{-i}α) + H(T^{-n} ∨_{j=0}^{m-1} T^{-j}α)
= H(∨_{i=0}^{n-1} T^{-i}α) + H(∨_{j=0}^{m-1} T^{-j}α)     since T is measure preserving
= H_n(α) + H_m(α)
Definition 9.12. Relative Entropy:
If T : X → X is a measure preserving transformation of the probability space (X, B, µ) and α a countable partition of X such that H(α) < ∞ then the entropy of T relative to α is defined to be
h(T, α) := lim_{n→∞} (1/n) H(∨_{i=0}^{n-1} T^{-i}α)
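This limit can be computed explicitly for small examples. A numeric sketch (the stochastic matrix P and its stationary vector p below are an arbitrary example of ours): for the Markov measure of (p, P), take α to be the partition into 1-cylinders; then H(∨_{i=0}^{n-1} T^{-i}α) is a sum over n-cylinders, and H_n/n approaches the entropy rate −∑_{i,j} p_i P_{i,j} log P_{i,j}.

```python
import math
from itertools import product

P = [[0.9, 0.1],
     [0.4, 0.6]]
p = [0.8, 0.2]                               # stationary: p P = p

def H_n(n):
    # -sum over n-cylinders of mu log mu, mu = p_{y_0} prod P_{y_{i-1}, y_i}
    total = 0.0
    for word in product(range(2), repeat=n):
        mu = p[word[0]]
        for a, b in zip(word, word[1:]):
            mu *= P[a][b]
        if mu > 0:
            total -= mu * math.log(mu)
    return total

rate = -sum(p[i] * P[i][j] * math.log(P[i][j])
            for i in range(2) for j in range(2))
print([round(H_n(n) / n, 4) for n in (1, 4, 12)], round(rate, 4))
```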
The relative entropy always exists due to the previous lemmas.
Corollary 9.7.
0 ≤ h(T, α) ≤ H(α)
Corollary 9.8.
h(T, α) = H(α| ∨_{i=1}^{∞} T^{-i}α)
Proof. Denote
α^n := ∨_{i=0}^{n-1} T^{-i}α
Then we have that
H(α^n) = H(α| ∨_{i=1}^{n-1} T^{-i}α) + H(∨_{i=1}^{n-1} T^{-i}α)
= H(α| ∨_{i=1}^{n-1} T^{-i}α) + H(α^{n-1})     since T is measure preserving
= ∑_{k=1}^{n} H(α| ∨_{i=1}^{k-1} T^{-i}α)
(with the k = 1 term read as H(α)). Hence
H(α^n)/n = (1/n) ∑_{k=1}^{n} H(α| ∨_{i=1}^{k-1} T^{-i}α)
By the increasing martingale theorem we have that
lim_{n→∞} H(α| ∨_{i=1}^{n-1} T^{-i}α) = H(α| ∨_{i=1}^{∞} T^{-i}α)
and the Cesàro averages of a convergent sequence converge to the same limit, so we have that
h(T, α) = lim_{n→∞} H(α^n)/n = H(α| ∨_{i=1}^{∞} T^{-i}α)
Definition 9.13. Entropy of a Measure Preserving Transformation:
If T is a measure preserving transformation then
h(T) := sup{h(T, α) : α a countable partition with H(α) < ∞}
Theorem 9.2. Let T : X → X, S : Y → Y be measure preserving transformations of
(X, B, µ), (Y, C, ν) respectively. If T, S are isomorphic then h(T ) = h(S)
Proof. Recall that T, S are isomorphic if ∃M ∈ B, N ∈ C s.t.
• T M ⊆ M, SN ⊆ N
• µ(M) = 1 = ν(N)
• ∃ϕ : M → N bijection s.t. ϕ, ϕ^{-1} are measurable, measure preserving and ϕ ∘ T = S ∘ ϕ
If α is a countable partition of Y then it is also a countable partition of N.
ϕ^{-1}α is a partition of M and hence of X.
We have that
H_µ(ϕ^{-1}α) = −∑_{A∈α} µ(ϕ^{-1}A) log(µ(ϕ^{-1}A)) = −∑_{A∈α} ν(A) log(ν(A)) = H_ν(α)
More generally we have that
H_µ(∨_{i=0}^{n-1} T^{-i}(ϕ^{-1}α)) = H_µ(ϕ^{-1} ∨_{i=0}^{n-1} S^{-i}α) = H_ν(∨_{i=0}^{n-1} S^{-i}α)
Dividing by n and taking the limit as n → ∞ gives
h(T, ϕ^{-1}α) = lim_{n→∞} (1/n) H_µ(∨_{i=0}^{n-1} T^{-i}(ϕ^{-1}α)) = lim_{n→∞} (1/n) H_ν(∨_{i=0}^{n-1} S^{-i}α) = h(S, α)
So we have that
h(S) = sup{h(S, α) : α countable partition of Y, H_ν(α) < ∞}
= sup{h(T, ϕ^{-1}α) : α countable partition of Y, H_ν(α) < ∞}
≤ sup{h(T, β) : β countable partition of X, H_µ(β) < ∞}
= h(T)
By symmetry we also have h(T) ≤ h(S), and hence h(T) = h(S)
Theorem 9.3. Abramov's Theorem:
If {α_n}_{n=1}^{∞} is an increasing sequence of partitions on (X, B, µ) such that H(α_n) < ∞ and σ(⋃_{n=1}^{∞} α_n) = B then
h(T) = lim_{n→∞} h(T, α_n)
Proof. Let α, β be partitions with H(α), H(β) < ∞; then
H(∨_{i=0}^{n-1} T^{-i}α) ≤ H((∨_{i=0}^{n-1} T^{-i}α) ∨ (∨_{j=0}^{n-1} T^{-j}β))
= H(∨_{j=0}^{n-1} T^{-j}β) + H(∨_{i=0}^{n-1} T^{-i}α| ∨_{j=0}^{n-1} T^{-j}β)
≤ H(∨_{j=0}^{n-1} T^{-j}β) + ∑_{i=0}^{n-1} H(T^{-i}α| ∨_{j=0}^{n-1} T^{-j}β)
≤ H(∨_{j=0}^{n-1} T^{-j}β) + ∑_{i=0}^{n-1} H(T^{-i}α|T^{-i}β)
= H(∨_{j=0}^{n-1} T^{-j}β) + nH(α|β)
which gives us that
h(T, α) = lim_{n→∞} (1/n) H(∨_{i=0}^{n-1} T^{-i}α) ≤ lim_{n→∞} (1/n) H(∨_{i=0}^{n-1} T^{-i}β) + H(α|β) = h(T, β) + H(α|β)
In particular
h(T, α) ≤ h(T, α_n) + H(α|α_n)
for any countable partition α.
Furthermore, for an increasing sequence of partitions {α_n}_{n=1}^{∞} with H(α_n) < ∞ and σ(⋃_{n=1}^{∞} α_n) = B, and an arbitrary partition α with H(α) < ∞,
lim_{n→∞} H(α|α_n) = 0
Which means that
h(T, α) ≤ lim_{n→∞} h(T, α_n)
for any countable partition α, so indeed
h(T) = sup_α h(T, α) ≤ lim_{n→∞} h(T, α_n) ≤ h(T)
which means that
h(T) = lim_{n→∞} h(T, α_n)
Definition 9.14. Generator:
For T an invertible measure preserving transformation on (X, B, µ) we say that a countable partition α is a generator if
lim_{n→∞} σ(∨_{j=-(n-1)}^{n-1} T^{-j}α) = B
Definition 9.15. Strong Generator:
For T a measure preserving transformation on (X, B, µ) we say that a countable partition α is a strong generator if
lim_{n→∞} σ(∨_{j=0}^{n-1} T^{-j}α) = B
Corollary 9.9. If for a.e. x, y ∈ X ∃n s.t. x, y are in different elements of the partition ∨_{j=-(n-1)}^{n-1} T^{-j}α then α is a generator.
Corollary 9.10. If for a.e. x, y ∈ X ∃n s.t. x, y are in different elements of the partition ∨_{j=0}^{n-1} T^{-j}α then α is a strong generator.
Theorem 9.4. Sinai's Theorem:
If either
• α is a strong generator, or
• T is invertible and α is a generator,
then
h(T) = h(T, α)
Proof.
• Suppose α is a strong generator. Then
h(T, ∨_{j=0}^{n} T^{-j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k-1} T^{-i}(∨_{j=0}^{n} T^{-j}α))
= lim_{k→∞} (1/k) H(∨_{i=0}^{n+k-1} T^{-i}α)
= lim_{k→∞} ((n+k)/k) · (1/(n+k)) H(∨_{i=0}^{n+k-1} T^{-i}α)
= h(T, α)
In particular this holds for every n, so applying Abramov's theorem to the increasing sequence α_n := ∨_{j=0}^{n} T^{-j}α gives
h(T) = lim_{n→∞} h(T, α_n) = h(T, α)
• Suppose T is invertible and α is a generator. Then
h(T, ∨_{j=-n}^{n} T^{-j}α) = lim_{k→∞} (1/k) H(∨_{i=0}^{k-1} T^{-i}(∨_{j=-n}^{n} T^{-j}α))
= lim_{k→∞} (1/k) H(∨_{i=-n}^{n+k-1} T^{-i}α)
= lim_{k→∞} (1/k) H(∨_{i=0}^{2n+k-1} T^{-i}α)     since T is measure preserving
= lim_{k→∞} ((2n+k)/k) · (1/(2n+k)) H(∨_{i=0}^{2n+k-1} T^{-i}α)
= h(T, α)
In particular this holds for every n, so applying Abramov's theorem to α_n := ∨_{j=-n}^{n} T^{-j}α gives
h(T) = lim_{n→∞} h(T, α_n) = h(T, α)
Theorem 9.5. If T is a measure preserving transformation of (X, B, µ) then
• For k ∈ N0 we have that h(T k ) = kh(T ).
• If T is invertible then h(T −1 ) = h(T ).
• If T is invertible and k ∈ Z then h(T k ) = |k|h(T ).
Proof.
• For k = 0, T^0 = Id, so if α is a countable partition such that H(α) < ∞ then
H(∨_{i=0}^{n-1} Id^{-i}α) = H(α)
and hence
h(Id, α) = lim_{n→∞} (1/n) H(α) = 0
so indeed the statement holds for k = 0.
For k ≥ 1 choose a countable partition α with H(α) < ∞. Then
h(T^k, ∨_{j=0}^{k-1} T^{-j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{nk-1} T^{-j}α) = k lim_{n→∞} (1/(nk)) H(∨_{j=0}^{nk-1} T^{-j}α) = kh(T, α)
so
kh(T) = sup_{α: H(α)<∞} kh(T, α) = sup_{α: H(α)<∞} h(T^k, ∨_{j=0}^{k-1} T^{-j}α) ≤ sup_{α: H(α)<∞} h(T^k, α) = h(T^k)
Conversely
h(T^k, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n-1} T^{-jk}α) ≤ lim_{n→∞} (1/n) H(∨_{j=0}^{nk-1} T^{-j}α) = k lim_{n→∞} (1/(nk)) H(∨_{j=0}^{nk-1} T^{-j}α) = kh(T, α) ≤ kh(T)
So indeed
h(T^k) = kh(T)
• Since T is invertible and measure preserving,
H(∨_{j=0}^{n-1} T^{-j}α) = H(T^{n-1} ∨_{j=0}^{n-1} T^{-j}α) = H(∨_{j=0}^{n-1} T^{j}α)
so
h(T, α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n-1} T^{-j}α) = lim_{n→∞} (1/n) H(∨_{j=0}^{n-1} T^{j}α) = h(T^{-1}, α)
Taking the supremum with respect to α then gives us that
h(T) = h(T^{-1})
• This follows directly from the last two points.
Lemma 9.5. The Parry measure µ_Pr of a k × k matrix A with entries 0, 1 and largest eigenvalue λ has entropy
h(µ_Pr) = log(λ)
Proof. Let
P_{i,j} = A_{i,j}v_j/(λv_i),   p_i = u_iv_i/c
where u, v are the left and right eigenvectors respectively and c = ∑_{i=1}^{k} u_iv_i is a normalising constant.
P is a stochastic matrix, so:
h(µ_Pr) = −∑_{i,j=1}^{k} p_i P_{i,j} log(P_{i,j})
= −∑_{i,j=1}^{k} (u_iv_i/c)(A_{i,j}v_j/(λv_i)) log(A_{i,j}v_j/(λv_i))
= −∑_{i,j=1}^{k} (u_iA_{i,j}v_j/(λc)) log(A_{i,j}v_j/(λv_i))
= −∑_{i,j=1}^{k} (u_iA_{i,j}v_j/(λc)) log(A_{i,j}) + ∑_{i,j=1}^{k} (u_iA_{i,j}v_j/(λc)) log(λ) + ∑_{i,j=1}^{k} (u_iA_{i,j}v_j/(λc))(log(v_i) − log(v_j))
Since A has entries 0, 1 we always have A_{i,j} log(A_{i,j}) = 0, and using uA = λu and Av = λv the third term is a telescoping sum which cancels to zero, so we have that
h(µ_Pr) = ∑_{i,j=1}^{k} (u_iA_{i,j}v_j/(λc)) log(λ) = log(λ) ∑_{j=1}^{k} λu_jv_j/(λc) = log(λ) (∑_{j=1}^{k} u_jv_j)/(∑_{i=1}^{k} u_iv_i) = log(λ)
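A numeric check of this lemma for the golden-mean shift (our example choice): with A = [[1, 1], [1, 0]] and λ = (1 + √5)/2, the entropy of the Parry measure computed from the stochastic-matrix formula should come out as log λ.

```python
import math

A = [[1, 1], [1, 0]]
lam = (1 + math.sqrt(5)) / 2
v = [lam, 1.0]                    # A v = lam v
u = [lam, 1.0]                    # u A = lam u (A is symmetric)
c = sum(ui * vi for ui, vi in zip(u, v))
P = [[A[i][j] * v[j] / (lam * v[i]) for j in range(2)] for i in range(2)]
p = [u[i] * v[i] / c for i in range(2)]

# entropy of the Markov measure: -sum p_i P_{ij} log P_{ij}
h = -sum(p[i] * P[i][j] * math.log(P[i][j])
         for i in range(2) for j in range(2) if P[i][j] > 0)
print(h, math.log(lam))
```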
In general h(T) = h(S) does not imply that T, S are isomorphic; however, the following two theorems give cases where equal entropy is sufficient.
Theorem 9.6. Ornstein:
Two 2-sided Bernoulli shifts with the same entropy are isomorphic.
Theorem 9.7. Ornstein-Friedman:
Two aperiodic Markov shifts of finite type with the same entropy are isomorphic.
10 Functional Analysis
Definition 10.1. Banach Space:
A Banach space is a complete normed space.
For a probability space (X, B, µ) we denote by L^1(X, B, µ) the space of integrable functions, which is a Banach space with the norm
||f||_1 := ∫ |f| dµ
Furthermore we denote by L^∞(X, B, µ) the space of functions which are almost everywhere bounded, which is a Banach space with the norm
||f||_∞ := inf_{Y⊂X: µ(Y)=1} sup_{y∈Y} |f(y)|
Proposition 10.1. We have the following facts concerning L^1, L^∞:
• L^∞ ⊂ L^1
• Every bounded linear functional W : L^1 → R can be written as W(f) = ∫ fg dµ for some g ∈ L^∞
• L^1_0 := {f ∈ L^1 : ∫ f dµ = 0} is a Banach space with respect to the norm ||f|| = inf_{c∈R} ||f − c||_1
Lemma 10.1. If E is a Banach space and F ⊂ E is a proper closed subspace then ∃W : E → R a non-zero, bounded linear functional s.t. W(f) = 0 ∀f ∈ F
Lemma 10.2. Let m, p ∈ N be such that 1 ≤ m ≤ p and let x_1, ..., x_p ∈ R^+.
For ε > 0 let
S_ε := {i ∈ {1, ..., p − m} : ∃n ∈ N ∩ [1, m] s.t. nε ≤ ∑_{j=i}^{i+n-1} x_j}
then
∑_{j=1}^{p} x_j ≥ ε ∑_{i=1}^{p-m} χ_{S_ε}(i)
Proof. If S_ε = ∅ then since x_i ≥ 0 ∀i the statement is trivial, so suppose S_ε ≠ ∅.
Let j_1 = min{i ∈ S_ε}, which exists since S_ε is finite and non-empty, and let n_1 be the least value for which n_1ε ≤ ∑_{i=j_1}^{j_1+n_1-1} x_i.
Then inductively define j_r := min{j ∈ S_ε : j > j_{r-1} + n_{r-1} − 1} (so long as the right-hand side is non-empty) and n_r to be the least value for which n_rε ≤ ∑_{i=j_r}^{j_r+n_r-1} x_i.
This yields the finite collections {j_r}_{r=1}^{k}, {n_r}_{r=1}^{k} and
∑_{j=1}^{p} x_j ≥ ∑_{r=1}^{k} ∑_{i=j_r}^{j_r+n_r-1} x_i ≥ ∑_{r=1}^{k} n_rε ≥ ε ∑_{r=1}^{k} ∑_{i=j_r}^{j_r+n_r-1} χ_{S_ε}(i) = ε ∑_{i=1}^{p-m} χ_{S_ε}(i)
since any element not in one of the blocks [j_r, j_r + n_r − 1] cannot be in S_ε.
For ε > 0 and f ∈ L^1 define
E_ε(f) := {x ∈ X : limsup_{n→∞} |(1/n) ∑_{j=0}^{n-1} f(T^j x)| ≥ ε}
Lemma 10.3.
µ(E_{2ε}(f)) ≤ ||f||_1/ε
Proof. Write f = f^+ − f^- where f^+, f^- ≥ 0; then for m ≥ 1 define
E_ε^m(f^+) = {x ∈ X : ∃n ≤ m, ∑_{j=0}^{n-1} f^+(T^j x) ≥ εn}
E_ε^m(f^-) = {x ∈ X : ∃n ≤ m, ∑_{j=0}^{n-1} f^-(T^j x) ≥ εn}
Then by applying Lemma 10.2 with x_j = f^+(T^{j-1}x) and S_ε = E_ε^m(f^+) we have that for p > m
∑_{j=0}^{p-1} f^+(T^j x) ≥ ε ∑_{i=0}^{p-m-1} χ_{E_ε^m(f^+)}(T^i x)
and similarly for f^- we get
∑_{j=0}^{p-1} f^-(T^j x) ≥ ε ∑_{i=0}^{p-m-1} χ_{E_ε^m(f^-)}(T^i x)
It follows that
p ∫ f^+ dµ = ∫ ∑_{j=0}^{p-1} f^+(T^j x) dµ     since T preserves µ
≥ ε ∑_{i=0}^{p-m-1} ∫ χ_{E_ε^m(f^+)}(T^i x) dµ
= ε(p − m)µ(E_ε^m(f^+))
Similarly we have that
p ∫ f^- dµ ≥ ε(p − m)µ(E_ε^m(f^-))
By dividing by p and taking the limit as p → ∞ we get that
∫ f^+ dµ ≥ εµ(E_ε^m(f^+))   and   ∫ f^- dµ ≥ εµ(E_ε^m(f^-))
Furthermore we have that E_{2ε}(f) ⊂ ⋃_m (E_ε^m(f^+) ∪ E_ε^m(f^-)), and the sets E_ε^m increase with m, so
µ(E_{2ε}(f)) ≤ lim_{m→∞} µ(E_ε^m(f^+)) + lim_{m→∞} µ(E_ε^m(f^-)) ≤ (1/ε)(∫ f^+ dµ + ∫ f^- dµ) = ||f||_1/ε
Lemma 10.4. Let T be an ergodic measure preserving transformation of (X, B, µ). Given f ∈ L^1_0 and δ > 0 we have that ∃h ∈ L^∞ s.t.
||f − (h ∘ T − h)||_1 < δ
Proof. Let C = {h ∘ T − h : h ∈ L^∞} ⊂ L^1_0, which is a vector subspace.
We want to show that C is dense in L^1_0; by Lemma 10.1 it suffices to show that any bounded linear functional which vanishes on C also vanishes on L^1_0.
We know that any bounded linear functional W on L^1 can be written as
W(f) = ∫ fg dµ
for some g ∈ L^∞.
Suppose W vanishes on C; then ∀h ∈ L^∞ we have that
∫ (h ∘ T − h)g dµ = 0
In particular, taking h = g,
∫ (g ∘ T − g)g dµ = 0
which means that
∫ (g ∘ T)g dµ = ∫ g^2 dµ
Furthermore, since T preserves µ we have ∫ (g ∘ T)^2 dµ = ∫ g^2 dµ, so
∫ (g ∘ T − g)^2 dµ = ∫ (g ∘ T)^2 dµ + ∫ g^2 dµ − 2∫ (g ∘ T)g dµ = 2∫ g^2 dµ − 2∫ (g ∘ T)g dµ = 0
so we must have that g ∘ T − g = 0 almost everywhere; hence, since T is ergodic, g is a.e. equal to some constant c.
For f ∈ L^1_0 we then have that
W(f) = ∫ fg dµ = c ∫ f dµ = 0
so indeed W vanishes on L^1_0, so C is dense in L^1_0 and the result follows.
Theorem 10.1. Birkhoff's Ergodic Theorem:
If T : X → X is an ergodic measure preserving transformation of the probability space (X, B, µ) then for any f ∈ L^1(X, B, µ) we have that, for almost every x ∈ X,
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = ∫ f dµ
Proof. Firstly suppose that h ∈ L^∞; then h ∘ T − h ∈ L^1_0 since
∫ (h ∘ T − h) dµ = ∫ h ∘ T dµ − ∫ h dµ = ∫ h dµ − ∫ h dµ = 0
Furthermore
|(1/n) ∑_{j=0}^{n-1} (h ∘ T − h)(T^j x)| = |(1/n) ∑_{j=0}^{n-1} (h(T^{j+1}x) − h(T^j x))| = (1/n)|h(T^n x) − h(x)| ≤ (1/n)(|h(T^n x)| + |h(x)|) ≤ 2||h||_∞/n
which converges to 0 as n → ∞.
We need to extend this to f ∈ L^1_0, so suppose that f ∈ L^1_0; we want to show that
lim_{n→∞} (1/n) ∑_{j=0}^{n-1} f(T^j x) = 0   for a.e. x
Fix δ > 0; then by the previous lemma we can find h ∈ L^∞ s.t. ||f − (h ∘ T − h)||_1 < δ, so ∀ε > 0 we have that
E_ε(f) ⊂ E_{ε/2}(f − (h ∘ T − h)) ∪ E_{ε/2}(h ∘ T − h)
µ(E_ε(f)) ≤ µ(E_{ε/2}(f − (h ∘ T − h))) + µ(E_{ε/2}(h ∘ T − h))
= µ(E_{ε/2}(f − (h ∘ T − h)))     by the earlier part of the proof
≤ 4||f − (h ∘ T − h)||_1/ε     by Lemma 10.3
≤ 4δ/ε
Since δ was chosen arbitrarily it follows that µ(E_ε(f)) = 0 for every ε > 0, so the theorem holds for f ∈ L^1_0.
Finally, for g ∈ L^1 write f = g − ∫ g dµ. The theorem holds for f since f ∈ L^1_0, and by rearrangement the theorem holds for g.