MA3H2 Markov Processes and Percolation Theory (Support Class 3)

Yulong Lu (yulong.lu@warwick.ac.uk)

1 Markov Chain

Definition 1.1. A Markov chain on a general state space X is a sequence of X-valued random variables {X_n}_{n∈N_0} satisfying that, for every n > 0 and every bounded measurable function f : X → R, one has

E(f(X_n) | F_0^{n−1}) = E(f(X_n) | F_{n−1}),

where the σ-algebras are F_m^n = σ(X_m, X_{m+1}, . . . , X_n) and F_n = σ(X_n).

A Markov chain is a discrete-time Markov process, i.e. the future of the process is determined by the present and is independent of the past.

Definition 1.2. A Markov process is time-homogeneous if there exists a measurable map P from X into P(X), the space of probability measures on X, such that

P(X_n ∈ A | X_{n−1} = a) = P(a; A)

for every A ∈ B(X), almost every a ∈ X, and every n > 0. We call P the transition probability (or kernel) of the Markov chain. With the transition kernel P, one can define the n-step transition probability P^n by

P(X_n ∈ A | X_0 = a) = P^n(a; A) = ∫_X P(x; A) P^{n−1}(a; dx).

Given an initial probability measure µ_0 and a transition kernel P, it is natural to ask whether there exists a Markov chain with the prescribed initial measure and transition kernel. The following result answers this question affirmatively.

Proposition 1.1. Let P be a measurable map from X to P(X) and let µ_0 be a probability measure on X. Then there exists a (unique in law) Markov process X with transition probabilities P such that the law of X_0 is µ_0.

Proof. Define the sequence of finite-dimensional probability measures µ_n on X^n by

µ_{n+1}(A_0 × · · · × A_n) = ∫_{A_0} ∫_{A_1} · · · ∫_{A_{n−1}} P(x_{n−1}; A_n) P(x_{n−2}; dx_{n−1}) · · · P(x_1; dx_2) P(x_0; dx_1) µ_0(dx_0).

It is easy to check that this sequence of measures fulfils the compatibility condition. By Kolmogorov's Extension Theorem, there exists a unique measure µ on X^∞ such that the restriction of µ to X^n is given by µ_n. We choose Ω = X^∞ as our probability space, endowed with the probability measure µ. Then one can define the process X = {X_n}_{n∈N_0} by X_n(ω_0, ω_1, . . .) = ω_n, so that X_n ∼ µ_0 P^n, and it is clear that the law of (X_0, . . . , X_n) is equal to µ_{n+1} for every n.

2 Discrete State Space

Now we consider a Markov chain X defined on a finite or countable state space S. For all n ≥ 0 and (i_0, . . . , i_n, j) ∈ S^{n+2},

P(X_{n+1} = j | X_0 = i_0, . . . , X_n = i_n) = P_{i_n j},

where P is the transition matrix, all of whose entries are non-negative and each of whose rows sums to 1. It is clear that the n-step transition probability is given by P^n. Let µ be the row vector with i-th entry µ_i = P(X_0 = i); then µ is called the initial distribution of the chain, and

P(X_n = j) = (µP^n)_j for n ≥ 0 and j ∈ S,

that is, the row vector µP^n is the distribution of X_n when the chain starts from the initial distribution µ. Since we are treating Markov chains on a discrete state space, for convenience we use row vectors to represent measures. Furthermore, given a measure (or row vector) ρ, we define

‖ρ‖_v = Σ_{i∈S} |ρ_i|

as the variation norm (or ℓ^1 norm) of ρ. Then we have ‖ρP‖_v ≤ ‖ρ‖_v. Indeed,

‖ρP‖_v = Σ_{j∈S} |Σ_{i∈S} ρ_i P_{ij}| ≤ Σ_{j∈S} Σ_{i∈S} |ρ_i| P_{ij} = Σ_{i∈S} |ρ_i| = ‖ρ‖_v.
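The linear-algebra picture above (measures as row vectors, µP^n as the law of X_n, and the contraction ‖ρP‖_v ≤ ‖ρ‖_v) can be checked numerically. The following Python snippet is only a minimal sketch: the three-state matrix P, the initial distribution mu and the signed vector rho are invented for the illustration and are not part of the notes.

```python
import numpy as np

# A made-up transition matrix on S = {0, 1, 2}: all entries are
# non-negative and every row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

mu = np.array([1.0, 0.0, 0.0])        # initial distribution as a row vector

# The law of X_n is the row vector mu P^n.
mu_5 = mu @ np.linalg.matrix_power(P, 5)
print("law of X_5:", mu_5, "total mass:", mu_5.sum())

# The variation norm is the l^1 norm of the row vector, and it never
# increases under right-multiplication by P: ||rho P||_v <= ||rho||_v.
rho = np.array([0.4, -0.7, 0.3])      # an arbitrary signed row vector
print(np.abs(rho @ P).sum(), "<=", np.abs(rho).sum())
```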
Sometimes it is also important to compute expectations of functions of X_n. To that end, we think of a function f on the state space S as the column vector f whose j-th entry is equal to f evaluated at j. If µ is the row vector representing a probability measure on S and f is the column vector representing a positive or bounded function, then

µf = Σ_{i∈S} f(i) µ({i})

is the expected value of f with respect to µ. Moreover, we can compute the conditional expectation of f(X_n) given X_0 = i as follows:

E[f(X_n) | X_0 = i] = Σ_{j∈S} f(j) P(X_n = j | X_0 = i) = Σ_{j∈S} (P^n)_{ij} f_j = (P^n f)_i.

In particular, if µ is the initial distribution of the chain X, then E[f(X_n)] = µP^n f.

Just as ‖·‖_v was used as the metric on probability measures, the distance between functions can be measured in the uniform norm ‖·‖_u:

‖f‖_u = sup_{j∈S} |f_j|.

By the triangle inequality, |µf| ≤ ‖µ‖_v ‖f‖_u. In particular, we have ‖Pf‖_u ≤ ‖f‖_u.

In many applications one cares about how a Markov chain behaves over long time periods; in particular, one wants to know whether the chain reaches some equilibrium distribution. To be more precise, we want to know whether µP^n becomes nearly independent of µ for sufficiently large n. If so, we define π = lim_{n→∞} µP^n. It is easy to see that π satisfies π = πP. Such a π is called a stationary distribution of the transition matrix P. The following theorem provides a way to establish the existence of a stationary distribution.

Theorem 2.1 (Doeblin's Theorem). Let P be a transition probability matrix satisfying that, for some state j_0 ∈ S and ε > 0, P_{i j_0} ≥ ε for all i ∈ S. Then P has a unique stationary probability measure π, π_{j_0} ≥ ε, and for every initial distribution µ,

‖µP^n − π‖_v ≤ 2(1 − ε)^n, n ≥ 0.

Proof. The proof relies upon the following observations: if ρ ∈ R^S is a row vector with ‖ρ‖_v < ∞, then

Σ_{j∈S} (ρP)_j = Σ_{i∈S} ρ_i,    (1)

and

Σ_{i∈S} ρ_i = 0  =⇒  ‖ρP^n‖_v ≤ (1 − ε)^n ‖ρ‖_v for n ≥ 1.    (2)

Indeed, (1) follows from Fubini's Theorem:

Σ_{j∈S} (ρP)_j = Σ_{j∈S} Σ_{i∈S} ρ_i P_{ij} = Σ_{i∈S} ρ_i Σ_{j∈S} P_{ij} = Σ_{i∈S} ρ_i.

As for (2), it suffices to prove it in the case n = 1. Suppose that Σ_{i∈S} ρ_i = 0. Note that

|(ρP)_j| = |Σ_{i∈S} ρ_i P_{ij}| = |Σ_{i∈S} ρ_i (P_{ij} − ε δ_{j,j_0})| ≤ Σ_{i∈S} |ρ_i| (P_{ij} − ε δ_{j,j_0}),

and hence

‖ρP‖_v ≤ Σ_{j∈S} Σ_{i∈S} |ρ_i| (P_{ij} − ε δ_{j,j_0}) = Σ_{i∈S} |ρ_i| Σ_{j∈S} (P_{ij} − ε δ_{j,j_0}) = (1 − ε) ‖ρ‖_v.

For a probability vector µ, define µ_n = µP^n. Then µ_n = µ_{n−m} P^m and Σ_{i∈S} (µ_{n−m} − µ)_i = 1 − 1 = 0, so by (2),

‖µ_n − µ_m‖_v = ‖(µ_{n−m} − µ)P^m‖_v ≤ (1 − ε)^m ‖µ_{n−m} − µ‖_v ≤ 2(1 − ε)^m for 1 ≤ m < n.

Hence {µ_n}_{n=1}^∞ is a Cauchy sequence, and thus there exists a probability vector π such that ‖µ_n − π‖_v → 0. In addition, π = lim_{n→∞} µP^{n+1} = lim_{n→∞} (µP^n)P = πP, that is, π is stationary. In particular,

π_{j_0} = Σ_{i∈S} π_i P_{i j_0} ≥ ε Σ_{i∈S} π_i = ε.

For any probability vector ν, using (2) we have

‖νP^m − π‖_v = ‖(ν − π)P^m‖_v ≤ 2(1 − ε)^m,

which justifies the convergence result and also implies that π is the unique stationary measure of P.

Examples

(a) Any sequence of independent random variables taking values in the discrete space S is a Markov chain.

Proof. A sequence X_1, X_2, . . . of independent random variables satisfies P(X_{n+1} = j | X_1 = i_1, . . . , X_n = i_n) = P(X_{n+1} = j), so the sequence is a Markov chain. In addition, the chain is homogeneous if the X_i are identically distributed.

(b) A die is rolled repeatedly. Let N_n be the number of sixes in n rolls. Then {N_n}_{n∈N} is a Markov chain.

Proof. Clearly, N_{n+1} − N_n is independent of N_1, . . . , N_n, so {N_n}_{n∈N} is a Markov chain. Furthermore, the transition matrix P is given by

P(i, j) = 1/6 if j = i + 1, 5/6 if j = i, 0 otherwise.
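To close this section, the geometric bound in Doeblin's Theorem (Theorem 2.1) can be illustrated numerically. The sketch below is only an illustration: the three-state matrix P (whose column j_0 = 0 is bounded below by ε = 0.1) and the initial distributions are invented for the example, and the stationary distribution π is approximated by iterating µP^n.

```python
import numpy as np

# A made-up transition matrix whose first column is bounded below by
# eps = 0.1, so Doeblin's condition holds with j0 = 0.
P = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.7, 0.2],
              [0.4, 0.1, 0.5]])
eps = P[:, 0].min()                      # = 0.1

# Approximate the stationary distribution pi by iterating mu P^n.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P

# Check the geometric bound ||mu P^n - pi||_v <= 2 (1 - eps)^n.
mu = np.array([0.0, 0.0, 1.0])           # an arbitrary initial distribution
for n in range(1, 6):
    dist = np.abs(mu @ np.linalg.matrix_power(P, n) - pi).sum()
    print(n, dist, "<=", 2 * (1 - eps) ** n)
```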
3 Optional Stopping Theorem

A discrete-time martingale is a discrete-time stochastic process {X_n}_{n∈N_0} with respect to a filtration F = {F_n}_{n∈N_0} satisfying

(i) E(|X_n|) < ∞;
(ii) X_n is adapted to F_n;
(iii) E(X_{n+1} | F_n) = X_n.

Clearly, a martingale {X_n} has constant expectation, i.e. E(X_n) = E(X_0) for every n ∈ N_0. However, a more interesting question is under what conditions this remains true if we stop at a random time T; that is, when does E(X_T) = E(X_0) hold? This is answered by the Optional Stopping Theorem, which we now discuss.

A stopping time T is a random variable such that {T ≤ n} ∈ F_n for every n ∈ N_0. Suppose that T is finite almost surely and let (X, F) be a martingale. Then T ∧ n → T as n → ∞, so X_{T∧n} → X_T a.s. Thus E(X_0) = E(X_{T∧n}) → E(X_T) if the family {X_{T∧n}} is uniformly integrable.

Theorem 3.1 (Optional Stopping Theorem). Let (X, F) be a martingale and let T be a stopping time. Then E(X_T) = E(X_0) if:

(i) P(T < ∞) = 1;
(ii) E(|X_T|) < ∞;
(iii) E(X_n 1_{T>n}) → 0 as n → ∞.

Proof. We write X_T as X_T = X_{T∧n} + (X_T − X_n) 1_{T>n}. Note that E(X_{T∧n}) = E(X_0). Taking expectations in the last identity gives

E(X_T) = E(X_0) + E(X_T 1_{T>n}) − E(X_n 1_{T>n}).

The last term goes to zero as n → ∞ by (iii). As for the second term,

E(X_T 1_{T>n}) = Σ_{k=n+1}^∞ E(X_T 1_{T=k}) → 0

as n → ∞, since the series converges absolutely by (ii). Therefore E(X_T) = E(X_0).

The conditions in the above theorem are almost the weakest under which the conclusion holds. It is worth noting that there are stronger conditions that are more useful in practice. We list a few: the Optional Stopping Theorem holds if one of the following is satisfied.

(i) T is almost surely bounded.
(ii) X is bounded and T is almost surely finite.
(iii) E(T) < ∞ and X has bounded increments.
(iv) X is uniformly integrable.

We note that Theorem 1.24 in the lecture notes uses the Optional Stopping Theorem under condition (ii) above.
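As a sanity check on condition (ii) above, one can simulate a bounded stopped martingale. The sketch below uses a simple symmetric random walk started at 0 and stopped at the first time T it hits ±a: the stopped walk is bounded by a and T is almost surely finite, so the Optional Stopping Theorem predicts E(X_T) = E(X_0) = 0. The barrier a = 5 and the sample size are arbitrary choices made for this illustration.

```python
import random

def stopped_walk(a=5):
    """Run a simple symmetric random walk from 0 until it first hits +a
    or -a, and return its value at that stopping time T."""
    x = 0
    while abs(x) < a:
        x += random.choice((-1, 1))
    return x

# Monte Carlo estimate of E(X_T); optional stopping (condition (ii)) gives
# E(X_T) = E(X_0) = 0 since |X_{T ∧ n}| <= a and T < infinity a.s.
samples = [stopped_walk() for _ in range(20000)]
print(sum(samples) / len(samples))   # should be close to 0
```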