MA3H2 Markov Processes and Percolation Theory (Support Class 3)
Yulong Lu (yulong.lu@warwick.ac.uk)
1 Markov Chain
Definition 1.1. A Markov chain on a general state space X is a sequence of X-valued random variables {X_n}_{n∈N_0} satisfying that, for every n > 0 and every measurable bounded function f : X → R, one has

E(f(X_n) | F_0^{n−1}) = E(f(X_n) | F_{n−1}),

where the σ-algebras are F_m^n = σ(X_m, X_{m+1}, ···, X_n) and F_n = σ(X_n).
A Markov chain is a discrete-time Markov process: conditionally on the present, the future of the process is independent of the past.
Definition 1.2. A Markov process is time-homogeneous if there exists a measurable map P from X into P(X), the space of probability measures on X, such that

P(X_n ∈ A | X_{n−1} = a) = P(a; A)

for every A ∈ B(X), almost every a ∈ X, and every n > 0. We call P the transition probability (or kernel) of the Markov chain.
With the transition kernel P, one can define the n-step transition probability P^n by

P(X_n ∈ A | X_0 = a) = P^n(a; A) = ∫_X P(x; A) P^{n−1}(a; dx).
Given an initial probability measure µ0 and a transition kernel P , it is natural to ask whether there exists
a Markov chain with the prescribed initial measure and transition kernel. The following result answers
this question affirmatively.
Proposition 1.1. Let P be a measurable map from X to P (X ) and let µ0 be a probability measure on
X . Then, there exists a (unique in law) Markov process X with transition probabilities P such that the
law of X0 is µ0 .
Proof. Define the sequence of finite-dimensional probability measures µ_n on X^n by

µ_{n+1}(A_0 × ··· × A_n) = ∫_{A_0} ∫_{A_1} ··· ∫_{A_{n−1}} P(x_{n−1}, A_n) P(x_{n−2}, dx_{n−1}) ··· P(x_1, dx_2) P(x_0, dx_1) µ_0(dx_0).
It is easy to check that this sequence of measures fulfills the compatibility condition. By Kolmogorov's Extension Theorem, there exists a unique measure µ on X^∞ such that the restriction of µ to X^n is given by µ_n. We choose Ω = X^∞ as our probability space, endowed with the probability measure µ. Then one can define the process X = {X_n}_{n∈N_0} by X_n(ω_0, ω_1, ···) = ω_n, so that X_n has law µ_0 P^n, and it is clear that the law of (X_0, ···, X_n) is equal to µ_{n+1} for every n.
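In the discrete setting the construction above turns directly into a sampling procedure: draw X_0 from µ_0, then repeatedly draw X_n from the kernel evaluated at the previous state. A minimal sketch in Python, assuming a finite state space and NumPy; the two-state kernel below is an illustrative choice, not an object from these notes:

import numpy as np

def simulate_chain(mu0, P, n_steps, rng=None):
    """Sample a path X_0, ..., X_{n_steps} of a finite-state Markov chain."""
    rng = np.random.default_rng() if rng is None else rng
    states = np.arange(len(mu0))
    path = [rng.choice(states, p=mu0)]                   # X_0 ~ mu0
    for _ in range(n_steps):
        path.append(rng.choice(states, p=P[path[-1]]))   # X_n ~ P(X_{n-1}; .)
    return np.array(path)

P = np.array([[0.9, 0.1],      # hypothetical two-state transition matrix
              [0.4, 0.6]])
mu0 = np.array([1.0, 0.0])     # start in state 0
print(simulate_chain(mu0, P, 10))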
2 Discrete State Space
Now we consider a Markov chain X defined on a finite or countable state space S. For all n ≥ 0 and (i_0, ···, i_n, j) ∈ S^{n+2},

P(X_{n+1} = j | X_0 = i_0, ···, X_n = i_n) = P_{i_n j},

where P is the transition matrix: all of its entries are non-negative and each of its rows sums to 1. It is clear that the n-step transition probability is given by the matrix power P^n.
Let µ be the row vector with i-th entry µ_i = P(X_0 = i); then µ is called the initial distribution of the chain, and

P(X_n = j) = (µP^n)_j,  n ≥ 0 and j ∈ S,

that is, the row vector µP^n is the distribution of X_n when the chain has initial distribution µ. Since we are treating discrete state space Markov chains, for convenience we will use row vectors to represent measures. Furthermore, given a measure (or a row vector) ρ, we define

‖ρ‖_v = Σ_{i∈S} |ρ_i|

as the variation norm (or ℓ^1 norm) of ρ. Then we have

‖ρP‖_v ≤ ‖ρ‖_v.
Indeed,

‖ρP‖_v = Σ_{j∈S} | Σ_{i∈S} ρ_i P_{ij} | ≤ Σ_{j∈S} Σ_{i∈S} |ρ_i| P_{ij} = Σ_{i∈S} |ρ_i| = ‖ρ‖_v.
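As a quick numerical check of this contraction, one can draw a random stochastic matrix and an arbitrary signed vector and compare the two norms. A sketch assuming NumPy; the matrix and the vector are generated purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)        # rows sum to 1, entries non-negative

rho = rng.normal(size=5)                 # arbitrary signed row vector
var_norm = lambda v: np.abs(v).sum()     # variation (l^1) norm

print(var_norm(rho @ P) <= var_norm(rho) + 1e-12)   # True: ||rho P||_v <= ||rho||_v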
Sometimes it is also important to compute expectations of functions of X_n. To that end, we think of a function f on the state space S as the column vector f whose j-th entry is equal to f evaluated at j. If µ is a row vector representing a probability measure on S and f is the column vector representing a positive or bounded function, then µf = Σ_{i∈S} f(i)µ({i}) is the expected value of f with respect to µ.
Moreover, we can compute the conditional expectation of f(X_n) given X_0 = i as follows:

E[f(X_n) | X_0 = i] = Σ_{j∈S} f(j) P(X_n = j | X_0 = i) = Σ_{j∈S} (P^n)_{ij} f_j = (P^n f)_i.
In particular, if µ is the initial distribution of the chain X, then
E[f(X_n)] = µP^n f.
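In matrix terms this identity is just a row-vector/matrix/column-vector product. A small sketch assuming NumPy; the matrix P, the distribution µ and the function f are arbitrary illustrative choices:

import numpy as np

P = np.array([[0.5, 0.5, 0.0],           # transition matrix
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
mu = np.array([1.0, 0.0, 0.0])           # initial distribution (row vector)
f = np.array([0.0, 1.0, 2.0])            # function on S as a column vector

n = 4
Pn = np.linalg.matrix_power(P, n)        # n-step transition probabilities
print(mu @ Pn @ f)                       # E[f(X_n)] = mu P^n f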
Just as ‖·‖_v was used to measure distances between probability measures, the distance between functions can be measured in the uniform norm ‖·‖_u:

‖f‖_u = sup_{j∈S} |f_j|.
By Hölder's inequality (with exponents 1 and ∞), |µf| ≤ ‖µ‖_v ‖f‖_u. In particular, we have

‖Pf‖_u ≤ ‖f‖_u.
In many applications, one cares about how a Markov chain behaves over long time horizons, and in particular whether the chain approaches some equilibrium distribution. To be more precise, we want to know whether µP^n becomes nearly independent of µ for sufficiently large n. If so, we define π = lim_{n→∞} µP^n. It is easy to see that π satisfies π = πP. Such a π is called a stationary distribution of the transition matrix P.
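Numerically, a stationary distribution of a finite chain can be found by solving πP = π together with the normalisation Σ_i π_i = 1. A sketch assuming NumPy; the 3-state transition matrix is hypothetical:

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.7, 0.2],
              [0.2, 0.3, 0.5]])

# pi P = pi and sum(pi) = 1, written as an overdetermined linear system.
A = np.vstack([(P - np.eye(3)).T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)                        # stationary distribution
print(np.allclose(pi @ P, pi))   # pi = pi P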
The following theorem provides a way to justify the existence of the stationary distribution.
Theorem 2.1. (Doeblin's Theorem) Let P be a transition probability matrix satisfying that, for some state j_0 ∈ S and ε > 0, P_{ij_0} ≥ ε for all i ∈ S. Then P has a unique stationary probability measure π, π_{j_0} ≥ ε, and for every initial distribution µ,

‖µP^n − π‖_v ≤ 2(1 − ε)^n,  n ≥ 0.
Proof. The proof relies upon the following observations: if ρ ∈ R^S is a row vector with ‖ρ‖_v < ∞, then

Σ_{j∈S} (ρP)_j = Σ_{i∈S} ρ_i    (1)

and

Σ_{i∈S} ρ_i = 0 =⇒ ‖ρP^n‖_v ≤ (1 − ε)^n ‖ρ‖_v for n ≥ 1.    (2)
Indeed, (1) follows from Fubini's Theorem:

Σ_{j∈S} (ρP)_j = Σ_{j∈S} Σ_{i∈S} ρ_i P_{ij} = Σ_{i∈S} ρ_i ( Σ_{j∈S} P_{ij} ) = Σ_{i∈S} ρ_i.
As for (2), it suffices to prove the case n = 1. Suppose that Σ_{i∈S} ρ_i = 0; note that

|(ρP)_j| = | Σ_{i∈S} ρ_i P_{ij} | = | Σ_{i∈S} ρ_i (P_{ij} − εδ_{j,j_0}) | ≤ Σ_{i∈S} |ρ_i| (P_{ij} − εδ_{j,j_0})
and hence

‖ρP‖_v ≤ Σ_{j∈S} ( Σ_{i∈S} |ρ_i| (P_{ij} − εδ_{j,j_0}) ) = Σ_{i∈S} |ρ_i| ( Σ_{j∈S} (P_{ij} − εδ_{j,j_0}) ) = (1 − ε)‖ρ‖_v.
For a probability vector µ, define µ_n = µP^n. Then µ_n = µ_{n−m}P^m and Σ_i (µ_{n−m} − µ)_i = 1 − 1 = 0, so by (2)

‖µ_n − µ_m‖_v ≤ (1 − ε)^m ‖µ_{n−m} − µ‖_v ≤ 2(1 − ε)^m

for 1 ≤ m < n. Hence {µ_n}_{n=1}^∞ is a Cauchy sequence, and thus there exists a probability vector π such that ‖µ_n − π‖_v → 0. In addition, π = lim_{n→∞} µP^{n+1} = lim_{n→∞} (µP^n)P = πP, that is, π is stationary. In particular,

π_{j_0} = Σ_{i∈S} π_i P_{ij_0} ≥ ε Σ_{i∈S} π_i = ε.
For any probability vector ν, note that Σ_i (ν − π)_i = 0, so by (2) we have

‖νP^m − π‖_v = ‖(ν − π)P^m‖_v ≤ (1 − ε)^m ‖ν − π‖_v ≤ 2(1 − ε)^m,

which justifies the convergence result and also implies that π is the unique stationary measure of P.
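The geometric rate in Doeblin's Theorem is easy to observe numerically. The following sketch (assuming NumPy; the 3-state matrix is a made-up example whose first column is bounded below by ε = 0.2) compares ‖µP^n − π‖_v with the bound 2(1 − ε)^n:

import numpy as np

P = np.array([[0.2, 0.5, 0.3],   # every entry of column 0 is at least eps
              [0.6, 0.2, 0.2],
              [0.3, 0.3, 0.4]])
eps = P[:, 0].min()              # Doeblin's condition with j_0 = 0, eps = 0.2

mu = np.array([0.0, 1.0, 0.0])              # arbitrary initial distribution
pi = np.linalg.matrix_power(P, 200)[0]      # numerically stationary row

for n in range(1, 6):
    err = np.abs(mu @ np.linalg.matrix_power(P, n) - pi).sum()
    print(n, err, 2 * (1 - eps) ** n)       # err stays below the Doeblin bound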
Examples
(a) Any sequence of independent random variables taking values in the discrete space S is a Markov
chain.
Proof. The sequence X_1, X_2, ··· of independent random variables satisfies

P(X_{n+1} = j | X_1 = i_1, ···, X_n = i_n) = P(X_{n+1} = j) = P(X_{n+1} = j | X_n = i_n),

thus the sequence is a Markov chain. In addition, the chain is time-homogeneous if the X_i are identically distributed.
(b) A die is rolled repeatedly. Let N_n be the number of sixes in n rolls. Then {N_n}_{n∈N} is a Markov chain.
Proof. Clearly, N_{n+1} − N_n is independent of N_1, ···, N_n, so {N_n}_{n∈N} is a Markov chain. Furthermore, the transition matrix P is given by

P(i, j) = 1/6 if j = i + 1,
P(i, j) = 5/6 if j = i,
P(i, j) = 0 otherwise.
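A short simulation makes this transition mechanism transparent: the one-step increments N_{n+1} − N_n equal 1 with probability 1/6 and 0 with probability 5/6. A sketch assuming NumPy:

import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)   # repeated fair die rolls
N = np.cumsum(rolls == 6)                  # N_n = number of sixes in n rolls

jumps = np.diff(N)                         # N_{n+1} - N_n is 0 or 1
print((jumps == 1).mean())                 # approx 1/6  (j = i + 1)
print((jumps == 0).mean())                 # approx 5/6  (j = i)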
3 Optional Stopping Theorem
A discrete-time martingale is a stochastic process {X_n}_{n∈N_0} with respect to a filtration F = {F_n}_{n∈N_0} satisfying
(i) E(|X_n|) < ∞;
(ii) X_n is adapted to F_n (i.e. F_n-measurable);
(iii) E(X_{n+1} | F_n) = X_n.
Clearly, a martingale {X_n} has constant expectation, i.e. E(X_n) = E(X_0) for every n ∈ N_0. A more interesting question is under what conditions this remains true when we stop at a random time T; that is, when does E(X_T) = E(X_0) hold? This is answered by the Optional Stopping Theorem, which we now discuss.
A stopping time T is a random variable such that {T ≤ n} ∈ F_n for every n ∈ N_0. Suppose that T is finite almost surely, and let (X, F) be a martingale. Then T ∧ n → T as n → ∞, so X_{T∧n} → X_T a.s. Thus E(X_0) = E(X_{T∧n}) → E(X_T) if the family {X_{T∧n}} is uniformly integrable.
Theorem 3.1. (Optional Stopping Theorem) Let (X, F) be a martingale and let T be a stopping time. Then E(X_T) = E(X_0) if:
(i) P(T < ∞) = 1;
(ii) E(|X_T|) < ∞;
(iii) E(X_n 1_{T>n}) → 0 as n → ∞.
Proof. We write X_T as X_T = X_{T∧n} + (X_T − X_n)1_{T>n}. Note that E(X_{T∧n}) = E(X_0), since T ∧ n is a bounded stopping time. Taking expectations in the identity above gives

E(X_T) = E(X_0) + E(X_T 1_{T>n}) − E(X_n 1_{T>n}).
The last term tends to zero as n → ∞ by (iii). As for the second term,

E(X_T 1_{T>n}) = Σ_{k=n+1}^∞ E(X_T 1_{T=k}) → 0
as n → ∞, since the series converges absolutely by (ii). Therefore, we have E(X_T) = E(X_0).
The conditions in the above theorem are nearly the weakest needed for the conclusion to hold. It is worth noting that there are stronger conditions that are more useful in practice. We list a few below; the Optional Stopping Theorem holds if one of the following holds:
(i) T is bounded;
(ii) T is a.s. finite and the stopped process {X_{T∧n}} is bounded;
(iii) E(T) < ∞ and X has bounded increments;
(iv) X is uniformly integrable.
We note that Theorem 1.24 in the lecture notes uses the Optional Stopping Theorem under condition (ii) above.
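As a concrete illustration of condition (ii), consider a simple symmetric random walk started at 0 and stopped at the first hitting time T of {−a, b}: the stopped walk is bounded and T is a.s. finite, so E(X_T) = E(X_0) = 0, which forces P(hit b) = a/(a + b). A Monte Carlo sketch assuming NumPy; the boundaries a = 3 and b = 5 are arbitrary:

import numpy as np

rng = np.random.default_rng(2)
a, b = 3, 5                                  # stop when the walk hits -a or b

def hit_value(rng):
    """Run a symmetric random walk from 0 until it hits -a or b."""
    x = 0
    while -a < x < b:
        x += rng.choice((-1, 1))             # fair +/-1 step
    return x

samples = np.array([hit_value(rng) for _ in range(20_000)])
print(samples.mean())                        # approx 0 = E(X_T) = E(X_0)
print((samples == b).mean(), a / (a + b))    # P(hit b) approx a/(a+b) = 0.375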
References
[1] G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford University Press, 1992.
[2] M. Hairer, Ergodic Properties of Markov Processes. Lecture Notes, 2006.
[3] D. W. Stroock, An Introduction to Markov Processes. Vol. 230. Springer, 2005.