Lecture 16:
Recall defns of ergodicity and mixing for Z^d_+ and Z^d actions.
Examples:
1. Z^2 rotation action (generated by rotations by α and β): ergodic if α or β is irrational; never mixing
2.* Z^2_+ action generated by the doubling and tripling maps: mixing
3. Z^2 iid shift action: clearly mixing
4.* Ledrappier 3-dot: mixing, but not mixing of “higher orders”.
5.* Ising (with temperature): µ+ and µ− are always mixing.
Until 1957, no invariant could distinguish (d = 1) iid(1/2, 1/2)
from iid(1/3, 1/3, 1/3).
ENTROPY FOR MPT’s and MP actions
Defn: Let (M, A, µ) be a probability space. Let α = {A1, . . . , An}
be a finite measurable partition of M and
H(α) := −Σ_i µ(Ai) log µ(Ai)
Letting p := (µ(A1), . . . , µ(An)), then H(α) = H(p) as defined
before.
Restate earlier Proposition in terms of entropy of partition:
0) H(α) ≥ 0
1) H(α) = 0 iff α is deterministic, i.e., for some unique i, µ(Ai) = 1.
2) H(α) is continuous and strictly concave
3) H(α) ≤ log n with equality iff each µ(Ai) = 1/n.
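A quick numerical check of 0)–3) (a minimal Python sketch; the helper H and the sample vectors are our own illustrations, not from the lecture):

    import numpy as np

    def H(p):
        # Shannon entropy -sum_i p_i log p_i, with the convention 0 log 0 = 0
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    print(H([1.0, 0.0]))                         # 0.0: a deterministic partition, as in 1)
    print(np.isclose(H([0.5, 0.5]), np.log(2)))  # True: equality in 3) at the uniform vector
    print(H([0.7, 0.3]) < np.log(2))             # True: strict inequality in 3) otherwise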
Idea: H(α) represents the uncertainty, information, or randomness of α:
— How much uncertainty is there in drawing a sample and observing the Ai to which it belongs?
— How much new information is revealed, on average, when you draw a sample and observe Ai?
Relations between partitions:
Let β := {B1, . . . , Bm} be another finite measurable partition on
the same probability space.
Defn: β ≤ α: each element of β is a union of elements of α.
Note: α is finer than β, and β is coarser than α.
Defn: α ⊥ β: µ(Ai ∩ Bj ) = µ(Ai)µ(Bj ).
Defn: α ∨ β = {A ∩ B : A ∈ α, B ∈ β}, deleting sets of measure zero.
Defn: If T is an MPT, T^{-1}(α) = {T^{-1}(A) : A ∈ α}.
Note:
H(α ∨ β) = −Σ_{i,j} µ(Ai ∩ Bj) log µ(Ai ∩ Bj)
Defn:
H(β|α) := −Σ_{i,j} µ(Ai ∩ Bj) log µ(Bj|Ai),
which can be rewritten:
Σ_i µ(Ai) (−Σ_j µ(Bj|Ai) log µ(Bj|Ai))
So, H(β|α) is a weighted average of the entropies
H(β|Ai) := −Σ_j µ(Bj|Ai) log µ(Bj|Ai)
Another viewpoint: given a random variable X taking values
1, . . . , n: Ai := X^{-1}({i}) and α := {A1, . . . , An}.
Up to renaming its values, X is determined by α and we define
H(X) := H(α)
Let X ∼ α, Y ∼ β (on the same probability space)
Dictionary:
β ≤ α: “Y is a function of X”
α ⊥ β: “X and Y are independent”
H(α ∨ β): “H(X, Y )”
H(β|α): “H(Y |X)”
Properties of entropy:
1. H(α ∨ β) = H(α) + H(β|α)
Proof: LHS = −Σ_{i,j} µ(Ai ∩ Bj) log µ(Ai ∩ Bj) = −Σ_{i,j} µ(Ai ∩ Bj) log [µ(Bj|Ai)µ(Ai)]
= −Σ_{i,j} µ(Ai ∩ Bj) log µ(Bj|Ai) − Σ_{i,j} µ(Ai ∩ Bj) log µ(Ai)
= −Σ_{i,j} µ(Ai ∩ Bj) log µ(Bj|Ai) − Σ_i µ(Ai) log µ(Ai) = RHS
“H(X, Y ) = H(X) + H(Y |X)”
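To see the dictionary and Property 1 in action, here is a minimal Python sketch (the 2×2 joint table is an arbitrary example of ours; joint[i, j] plays the role of µ(Ai ∩ Bj)):

    import numpy as np

    def H(p):
        # Shannon entropy of any array of probabilities summing to 1
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    joint = np.array([[0.3, 0.1],
                      [0.2, 0.4]])  # rows = values of X (cells of alpha), cols = values of Y

    H_XY = H(joint)                     # H(alpha ∨ beta) = "H(X, Y)"
    H_X = H(joint.sum(axis=1))          # H(alpha) = "H(X)"
    cond = joint / joint.sum(axis=1, keepdims=True)     # cond[i, j] = mu(B_j | A_i)
    H_Y_given_X = float(-np.sum(joint * np.log(cond)))  # H(beta | alpha) = "H(Y | X)"
    print(np.isclose(H_XY, H_X + H_Y_given_X))          # True: Property 1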
2. H(β|α) ≤ H(β) with equality iff α ⊥ β
Proof:
H(β|α) = −Σ_{i,j} µ(Ai ∩ Bj) log µ(Bj|Ai)
= Σ_j µ(Bj) Σ_i µ(Ai|Bj) log(1/µ(Bj|Ai))
≤ Σ_j µ(Bj) log(Σ_i µ(Ai|Bj)/µ(Bj|Ai))   (by Jensen)
= Σ_j µ(Bj) log(Σ_i µ(Ai)/µ(Bj))
= Σ_j µ(Bj) log(1/µ(Bj)) = H(β)
with equality iff, for each j, µ(Bj|Ai) is a constant cj ; the only possible constant is cj = µ(Bj), since summing µ(Bj ∩ Ai) = cj µ(Ai) over i gives cj = µ(Bj). Therefore α ⊥ β.
Note: literally, we have shown that µ(Ai ∩ Bj) = µ(Ai)µ(Bj) only when µ(Ai ∩ Bj) > 0. But it then follows for all (i, j).
“H(Y |X) ≤ H(Y ) with equality iff X ⊥ Y .”
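Numerically (same sketch conventions and the same hypothetical table as above):

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    joint = np.array([[0.3, 0.1],
                      [0.2, 0.4]])
    cond = joint / joint.sum(axis=1, keepdims=True)     # mu(B_j | A_i)
    H_beta_given_alpha = float(-np.sum(joint * np.log(cond)))
    print(H_beta_given_alpha < H(joint.sum(axis=0)))    # True: strict, table is not a product

    # Equality at independence: replace the table by the product of its marginals
    prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    cond_p = prod / prod.sum(axis=1, keepdims=True)
    print(np.isclose(-np.sum(prod * np.log(cond_p)), H(prod.sum(axis=0))))  # True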
3. H(β|α) ≥ 0 with equality iff β ≤ α
Proof: H(β|α) is a weighted average (weighted by µ(Ai)) of the entropies H(β|Ai). Each of these entropies is nonnegative, so H(β|α) ≥ 0.
Equality holds iff for each i, H(β|Ai) = 0, iff for each i there is a unique j such that µ(Bj|Ai) = 1, iff Bj ⊇ Ai mod 0.
Lecture 17:
Recall Properties 1-3:
1. H(α ∨ β) = H(α) + H(β|α)
“H(X, Y ) = H(X) + H(Y |X)”
2. H(β|α) ≤ H(β) with equality iff α ⊥ β
“H(Y |X) ≤ H(Y ) with equality iff X ⊥ Y .”
3. H(β|α) ≥ 0 with equality iff β ≤ α
“H(Y |X) ≥ 0 with equality iff Y is a function of X.”
4. H(α ∨ β) = H(α) iff β ≤ α
Proof: Follows from 1 and 3.
5. H(α1 ∨ · · · ∨ αn) = Σ_{i=1}^n H(αi | α1 ∨ · · · ∨ αi−1)
Proof: Induction based on 1.
6. H(α1 ∨ · · · ∨ αn) ≤ Σ_i H(αi) with equality iff the αi are independent.
Proof: Follows from 5 and 2.
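A numerical check of 5 and 6 on a random joint distribution of three partitions (a minimal Python sketch; the array shape and the seed are arbitrary choices of ours):

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    rng = np.random.default_rng(0)
    joint = rng.random((2, 3, 2))   # joint[i, j, k] ~ mu(cell i of a1 ∩ cell j of a2 ∩ cell k of a3)
    joint /= joint.sum()

    h123 = H(joint)
    h1 = H(joint.sum(axis=(1, 2)))
    p12 = joint.sum(axis=2)
    h2_g1 = float(-np.sum(p12 * np.log(p12 / p12.sum(axis=1, keepdims=True))))        # H(a2 | a1)
    h3_g12 = float(-np.sum(joint * np.log(joint / joint.sum(axis=2, keepdims=True))))  # H(a3 | a1 ∨ a2)
    print(np.isclose(h123, h1 + h2_g1 + h3_g12))   # True: Property 5
    h2, h3 = H(joint.sum(axis=(0, 2))), H(joint.sum(axis=(0, 1)))
    print(h123 <= h1 + h2 + h3)                    # True: Property 6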
7. Assume β ≤ α. Then H(β) ≤ H(α) and equality holds iff β = α.
Proof: By 1
H(α) = H(α ∨ β) = H(β) + H(α|β)
Since H(α|β) ≥ 0, we have H(β) ≤ H(α).
And H(β) = H(α) iff H(α|β) = 0 iff α ≤ β, which together with β ≤ α gives α = β.
“H(f (X)) ≤ H(X) and equality holds iff f is 1-1 a.e.”
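For instance (our example): if X is uniform on four values and f glues them in pairs, the entropy drops from log 4 to log 2:

    import numpy as np

    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    pX = np.array([0.25, 0.25, 0.25, 0.25])         # X uniform on {1, 2, 3, 4}
    pfX = np.array([pX[0] + pX[1], pX[2] + pX[3]])  # f glues {1,2} and {3,4}: beta coarser than alpha
    print(H(pfX) < H(pX))                           # True: log 2 < log 4, and f is not 1-1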
8. Assume γ ≤ α. Then H(β|α) ≤ H(β|γ)
Proof:
LHS = Σ_k Σ_{i: Ai ⊆ Ck} Σ_j −µ(Ai ∩ Bj) log µ(Bj|Ai)
RHS = Σ_k Σ_j −µ(Bj ∩ Ck) log µ(Bj|Ck)
We show that the inequality holds term by term for each j and k, or equivalently (after dividing by µ(Ck)):
−Σ_{i: Ai ⊆ Ck} (µ(Ai)/µ(Ck)) µ(Bj|Ai) log µ(Bj|Ai) ≤ −µ(Bj|Ck) log µ(Bj|Ck)
Letting f(x) = −x log x, this becomes
Σ_{i: Ai ⊆ Ck} (µ(Ai)/µ(Ck)) f(µ(Bj|Ai)) ≤ f(µ(Bj|Ck))
which is true by Jensen (f is concave) since
Σ_{i: Ai ⊆ Ck} µ(Ai)/µ(Ck) = 1
and
Σ_{i: Ai ⊆ Ck} (µ(Ai)/µ(Ck)) µ(Bj|Ai) = µ(Bj|Ck)
“H(Y |X) ≤ H(Y |f (X))”
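A numerical check of 8 (minimal Python sketch; the 4×2 table and the way γ merges the cells of α are our choices):

    import numpy as np

    def H_given(joint):
        # joint[i, j] = mu(conditioning cell i ∩ B_j); returns H(beta | row partition)
        rows = joint.sum(axis=1, keepdims=True)
        return float(-np.sum(joint * np.log(joint / rows)))

    # alpha has 4 cells; gamma merges rows {0,1} and rows {2,3}, so gamma ≤ alpha
    joint_ab = np.array([[0.10, 0.15],
                         [0.05, 0.20],
                         [0.20, 0.05],
                         [0.15, 0.10]])
    joint_gb = np.array([joint_ab[0] + joint_ab[1],
                         joint_ab[2] + joint_ab[3]])
    print(H_given(joint_ab) <= H_given(joint_gb))   # True: finer conditioning reduces entropy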
9. H(α ∨ β|γ) = H(α|γ) + H(β|α, γ)
10. H(β|γ, α) ≤ H(β|γ) with equality iff α ⊥_γ β, i.e., for each element C of γ, the restrictions of α and β to C are independent.
11. If α ⊥_γ (β, δ), then H(β|δ, γ, α) = H(β|δ, γ).
12. If γ ≤ α, then H(γ|β) ≤ H(α|β)
13. If T is an MPT, then for all i, j, k,
H(T^{-i}α ∨ · · · ∨ T^{-j}α) = H(T^{-i-k}α ∨ · · · ∨ T^{-j-k}α)
Defn: The information function of a finite measble. partition:
I_α(x) := −Σ_i χ_{Ai}(x) log µ(Ai)
Note: H(α) = ∫ I_α dµ.
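On a finite sample space the integral is a weighted sum, so the Note is easy to check (a minimal sketch; the four cell masses are our example):

    import numpy as np

    p = np.array([0.4, 0.3, 0.2, 0.1])    # p[i] = mu(A_i) for a four-cell partition
    I = -np.log(p)                        # I_alpha is constant on each A_i with value -log mu(A_i)
    print(float(np.sum(p * I)))           # integral of I_alpha dmu
    print(float(-np.sum(p * np.log(p))))  # H(alpha): the two numbers agree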
Defn: Given B, a sub-σ-algebra of A, define
I_{α|B}(x) := −Σ_i χ_{Ai}(x) log µ(Ai|B) = −Σ_i χ_{Ai}(x) log E(χ_{Ai}|B)
Defn: H(α|B) := ∫ I_{α|B} dµ.
Trivial examples:
H(α|A) = −∫ Σ_i χ_{Ai}(x) log µ(Ai|A) dµ = −∫ Σ_i χ_{Ai}(x) log χ_{Ai}(x) dµ = 0.
H(α|{M, ∅}) = −∫ Σ_i χ_{Ai}(x) log µ(Ai|{M, ∅}) dµ = H(α)
i
Prop: Let α, β be finite measble. partitions. Let B be the σ-algebra generated by β, i.e., the collection of all finite unions of elements of β. Then
H(α|B) = H(α|β)
Proof:
µ(Ai|B) = Σ_j χ_{Bj}(x) µ(Ai ∩ Bj)/µ(Bj)
Thus,
I_{α|B}(x) = −Σ_i χ_{Ai}(x) log Σ_j χ_{Bj}(x) µ(Ai ∩ Bj)/µ(Bj) = −Σ_{i,j} χ_{Ai∩Bj}(x) log [µ(Ai ∩ Bj)/µ(Bj)].
Thus,
H(α|B) = ∫ −Σ_{i,j} χ_{Ai∩Bj}(x) log [µ(Ai ∩ Bj)/µ(Bj)] dµ = −Σ_{i,j} µ(Ai ∩ Bj) log [µ(Ai ∩ Bj)/µ(Bj)] = H(α|β)
Correspondence: Finite Partition ←→ Finite σ-algebra
β ↦ σ(β): σ(β) = unions of elts. of β
B ↦ π(B): π(B) = the nonempty intersections ∩_{B∈B} B*, where each B* is B or B^c (the unique generating partition for B)
So, H(α|B) for finite algebras B is really the same as H(α|β) for
finite measurable partitions.
Theorem: (continuity of conditional entropy)
(M, A, µ): probability space.
Let B1 ⊂ B2 ⊂ B3 ⊂ · · · be σ-algebras
and B = σ(∪_{n=1}^∞ Bn).
Let α be a finite measble. partition.
8
Then
I_{α|Bn} → I_{α|B} a.e. and in L^1
and thus,
H(α|Bn) → H(α|B)
Proof:
Main idea: by Martingale Convergence Theorem,
µ(Ai|Bn) → µ(Ai|B) a.e. and in L^1.
More details next time.
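An illustration of the theorem (a minimal Python sketch; the example is ours: M = [0, 1) with Lebesgue measure, α = {[0, 0.3), [0.3, 1)}, and Bn generated by the dyadic intervals of level n; here B contains σ(α), so H(α|Bn) should decrease to 0):

    import numpy as np

    def H2(q):
        # entropy of the two-cell conditional distribution (q, 1 - q)
        return sum(-x * np.log(x) for x in (q, 1.0 - q) if x > 0)

    theta = 0.3   # alpha = {[0, theta), [theta, 1)}
    for n in (1, 2, 4, 8, 16):
        N = 2 ** n
        h = 0.0
        for k in range(N):
            a, b = k / N, (k + 1) / N
            q = (min(theta, b) - min(theta, a)) * N   # mu([0, theta) | dyadic cell)
            h += H2(q) / N                            # mu(cell) * H(alpha | cell)
        print(n, h)   # H(alpha | B_n) decreases to 0 = H(alpha | B)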