Report on Finite Markov Chains

Yuxiang Hu
7796732
7 December 2017

1 Introduction and Foundations

Since its introduction by A. A. Markov in 1907 and its development by a number of mathematicians, the theory of Markov chains has found applications in almost every field associated with statistical study, such as meteorology, information and computer science, and economics. Roughly speaking, a Markov chain is a stochastic process satisfying the memoryless property: given the present state, one can make predictions about the future without reference to the past. Therefore, if the transition matrix of a Markov chain is given together with the initial state, the entire process is determined in a predictable way. This underlies the importance of the study of nonnegative matrices in finite Markov chains. This report reviews the basics of finite Markov chains and compares two different theories for determining several quantities of the process. The group generalized inverse, the foundation of one of the theories, is studied. Finally, the computational complexity of the two theories is considered. The following notation and definitions are used throughout the report. Unless otherwise specified, other notation follows the conventions used in class.
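The claim that the transition matrix and the initial state determine the process can be seen concretely: if x' is the initial distribution (row vector), the distribution after n steps is x'T^n. The following minimal Python sketch is not part of the original report; it reuses the cyclic chain from Section 4 and propagates a distribution step by step.

```python
# Sketch: propagating a probability distribution through a Markov chain.
# T is the 3-state cyclic chain from Section 4 of the report.
T = [[0.0, 1/3, 2/3],
     [1/2, 0.0, 1/2],
     [3/4, 1/4, 0.0]]

def step(x, T):
    """One step of the chain: x' -> x'T (row vector times matrix)."""
    m = len(T)
    return [sum(x[i] * T[i][j] for i in range(m)) for j in range(m)]

x = [1.0, 0.0, 0.0]        # start deterministically in state 1
for _ in range(5):
    x = step(x, T)
print(x)                    # the distribution after 5 steps
```

Since T is stochastic, the entries of x remain a probability distribution at every step.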
For sets:
N^c - the complement of N;
For probability:
M(X) - the expected value of X;
Var(X) - the variance of X;
For X ∈ M_n(C):
X' - the transpose of X;
X_dg - the diagonal matrix obtained from X by setting all elements off the main diagonal equal to 0;
X_sq - the matrix obtained from X by squaring each entry;
σ(X) - the set of eigenvalues of X;
For a Markov chain:
T - the one-step transition matrix of an m-state homogeneous Markov chain;
j - a column vector with each component equal to 1;
J - the matrix jj';
w' - the fixed probability (row) vector for the transition matrix of an ergodic chain;
W - the matrix I − AA#, where A# is given in the following definition;
t_ij - the transition probability from state i to state j;
s_j - some state in a Markov chain;
d - the GCD of the cycle lengths in an ergodic set.

Definition 1.0.1. For A ∈ M_n(C), the group inverse of A, when it exists, is defined to be the matrix A# which satisfies the conditions AA#A = A, A#AA# = A# and AA# = A#A.

For A ∈ M_n(C), it can be shown that the three equations AXA = A, XAX = X and AX = XA possess a solution for X if and only if rank(A) = rank(A²). Furthermore, when it exists, the solution is unique.

Definition 1.0.2. If A− is a matrix which satisfies AA−A = A, then A− is said to be a (1)-inverse for A.

Lemma 1.0.3. [2] Suppose M ∈ M_n(C) has the form

    M = [ U  C ]
        [ 0  L ]

where U and L are square. If U is nonsingular and if L# exists, then M# exists.

Theorem 1.0.4. [2] For every transition matrix T, the matrix A# exists, where A = I − T.

Proof. The proof is divided into two parts: the first deals with the case when T is irreducible, while the second handles the case when T is reducible. For the details, refer to Meyer's paper.

Definition 1.0.5. Let (s_n) be a sequence. Denote

    t_n = (1/n) Σ_{i=0}^{n−1} s_i,    u_n = Σ_{i=0}^{n} C(n,i) k^{n−i} (1 − k)^i s_i

for some k such that 0 < k < 1.
1. If t_n converges to a limit t, then we say that the original sequence is Cesaro-summable to t.
2.
If u_n converges to a limit u, then we say that the original sequence is Euler-summable to u.

There are two facts one needs to know. If a sequence converges, then it is summable by each method to its limit. If a sequence is summable by both methods, the two sums must be the same.

Theorem 1.0.6. [1] If the sequence A^n is summable to 0 by some averaging method, then the matrix I − A has an inverse, and the series I + A + A² + ... is summable by the same method to (I − A)^{−1}.

Fundamental Facts about the One-step Transition Matrix

The following results are suggested by Meyer and were collected by the author while studying Meyer's paper.
• T^n is a stochastic matrix.
• 1 ∈ σ(T), with corresponding eigenvector j.
• ρ(T) = 1.
• T is primitive for a regular Markov chain; T is irreducible and periodic for a cyclic Markov chain.
• For ergodic chains, the eigenvalue 1 has algebraic multiplicity 1 and geometric multiplicity 1. Thus the Jordan canonical form of T is

    [ 1  0 ]
    [ 0  K ]    where 1 ∉ σ(K).

2 Remedies for Absorbing Chains

Recall that an absorbing chain contains at least one transient set and one ergodic set, where every ergodic set consists of a single absorbing state. An absorbing state is characterized by t_ii = 1; in other words, once such a state is entered, it can never be left. From this one may guess that after a sufficiently large number of steps, an absorbing chain will inevitably end up in some absorbing state. This result is generalized by Kemeny and Snell to all finite Markov chains.

Theorem 2.0.1. [1] In any finite Markov chain with transient states, no matter where the process starts, the probability that the process is in an ergodic state after n steps tends to 1 as n → ∞.

Proof. [1] Kemeny and Snell give a probabilistic proof. The case is easy if the process starts in an ergodic set. Now assume that the chain starts in some transient state s_i, and assume that it takes at most k steps to get from any transient state to an ergodic state.
The latter assumption can be made since it is possible to travel from any transient state to some ergodic state in finitely many steps. Hence there is a positive number p such that, from any transient state, the probability of entering an ergodic state within k steps is at least p. Therefore, starting in s_i, the probability of still being in a transient state after kn steps is at most (1 − p)^n, which tends to 0 as n → ∞.

In the language of matrix analysis, however, it is natural to think of this convergence problem in terms of the spectral radius of a matrix. This idea is suggested but not proved by Meyer. Daniel Johnson provides a beautiful proof on the Mathematics Stack Exchange website, recorded here as Theorem 2.0.2. Meyer then generalizes the idea to ergodic chains.

Theorem 2.0.2. Let T be the transition matrix of some Markov chain that has transient states. Let Q denote the principal submatrix of T that corresponds to an entire transient set. Then ρ(Q) < 1.

Proof. [3] Let λ be the Perron value of Q, and let n be the size of Q. The Perron-Frobenius theorem supplies a positive normalized left eigenvector v' for λ, so that

    |λ| = ‖λv'‖₁ = ‖v'Q‖₁ = Σ_j Σ_k v_j q_jk.

Let ε_j = (1/n)(1 − Σ_{k=1}^{n} q_jk); adding ε_j to each element of the j-th row of Q makes that row sum equal to 1. Let ε be the column vector containing the values ε_j, so that Q + εj' is a stochastic matrix. Since the transient set is not closed, some row of Q has sum less than 1, hence some ε_j > 0; since v' is positive, it follows that v'ε > 0. Then

    |λ| = Σ_j Σ_k v_j (q_jk + ε_j − ε_j) = ‖v'(Q + εj')‖₁ − n v'ε = 1 − n v'ε < 1.

Theorem 2.0.3. [2] Let T be the transition matrix of an m-state ergodic chain, and let A = I − T. Then every principal submatrix of A of size k × k, k = 1, 2, 3, ..., m − 1, is nonsingular. Furthermore, the inverse of each principal submatrix of A is a nonnegative matrix.

Proof. (sketch) One can form an absorbing chain by turning the states in the complement of those corresponding to the principal submatrix into absorbing states. The result then follows immediately.
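Theorem 2.0.2 can be checked numerically: since ρ(Q) < 1, the powers of a transient block Q shrink to 0. The following Python sketch is an illustration only; the 2 × 2 matrix Q below is a made-up transient block, not taken from the report.

```python
# Sketch: powers of a hypothetical transient block Q of an absorbing
# chain.  Its row sums are below 1 (probability "leaks" out of the
# transient set), so Q^n -> 0, illustrating ρ(Q) < 1.
Q = [[0.0, 1/2],       # made-up transient-to-transient probabilities
     [1/4, 1/4]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = Q
for _ in range(49):     # compute Q^50
    P = matmul(P, Q)
max_entry = max(abs(e) for row in P for e in row)
print(max_entry)        # tiny: hence the series I + Q + Q^2 + ... converges
```

Because Q^n → 0, the series Σ Q^i converges to (I − Q)^{−1}, which is exactly the fundamental matrix N introduced in the next part of this section.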
The reader should bear in mind the idea behind this proof: creating a new Markov chain from an existing one by turning some transient states into absorbing states. This idea shows up a few times later and was used by Kemeny and Snell to extend results from absorbing chains to a larger class of chains.

From the above it is easy to see that for an absorbing chain the powers T^n essentially converge. The result is summarized in the following theorem, after which the "fundamental matrix" for absorbing chains, so named by Kemeny and Snell, can be introduced. Since Meyer does not cover absorbing chains in much depth, the rest of this section is a solo exhibition of Kemeny and Snell, with a focus on applications of the fundamental matrix. These results were later used by Kemeny and Snell as part of their treatment of ergodic chains. For convenience, the canonical form of T will be used from now on. Suppose an absorbing chain has q transient states and s absorbing states. Then there is a permutation matrix P such that

    P'TP = [ I  0 ]
           [ R  Q ]

where I corresponds to the s absorbing states and Q corresponds to the q transient states.

Theorem 2.0.4. [1] Let Q be the principal submatrix of the transition matrix of some absorbing chain corresponding to all the transient states. Then Q^n converges to 0 as n → ∞.

Proof. It follows immediately from Theorem 2.0.2.

Definition 2.0.5. [1] For an absorbing Markov chain, the fundamental matrix is defined to be N = (I − Q)^{−1}. Note that the existence of N is guaranteed by the convergence of Q^n.

An interpretation of N is given in the following setting:

Definition 2.0.6. Define n_ij to be the total number of times that the process is in s_j, given that the initial state is s_i. (This is defined only for transient states s_i and s_j.)

Theorem 2.0.7. [1] Let M be the matrix whose (i, j)-th entry is the expected value M(n_ij). Then M = N.

Proof.
Note that M = Σ_{i=0}^{∞} Q^i = (I − Q)^{−1} = N.

Remark. Indeed, the fundamental matrix of an absorbing chain can be used to study one type of question about ergodic chains, concerning the number of steps spent in some states before ending up in some other state, by treating the latter state as if it were an absorbing state. Again, one can see the power of forming a new chain in this way.

One application of the fundamental matrix of an absorbing chain is to calculate the variance of n_ij; the details are omitted here. Another application, as mentioned in the remark, is to study a similar question in a wider setting. Here the notion of an open set is introduced.

Definition 2.0.8. A set S of states is an open set if from every state in S it is possible to go to a state in S^c.

Here is a characterization of open sets:

Theorem 2.0.9. [1] A set S of states is open if and only if no ergodic set is a subset of S.

Theorem 2.0.10. [1] Let S be an open set of s states. Let Q be the s by s submatrix of the transition matrix corresponding to these states. Let the process start in s_i ∈ S. Then the (i, j)-th entry of N = (I − Q)^{−1} gives the mean number of times the process is in s_j before leaving S.

Proof. Again, the idea of creating a new absorbing chain can be used here.

3 Treatment of Ergodic Chains

Recall that an ergodic chain is either regular or cyclic, with the corresponding transition matrix being either primitive or irreducible and periodic. Meyer, and Kemeny and Snell, independently defined fundamental matrices for the study of ergodic chains. It will be shown that although the two fundamental matrices differ in general, one may be used in a manner very similar to the way the other is used. Moreover, the computational advantage of Meyer's method will be unraveled.

3.1 Foundations of Kemeny and Snell's Theory

Indeed, Meyer used A# as his fundamental matrix.
Before establishing Kemeny and Snell's fundamental matrix, another matrix, the so-called limiting matrix, needs to be introduced. In this section, definitions and theorems are first given for regular chains and then generalized to all types of ergodic chains.

Lemma 3.1.1. [1] Let T be the transition matrix of a regular chain. Then
1. There is a unique positive vector w' such that w'T = w' and w'j = 1; w' is called the unique fixed probability vector of T.
2. T^n converges to jw' as n → ∞; W = jw' is called the limiting matrix of T.
3. For any probability vector x', x'T^n converges to w'.

Remark. This is a well-known fact about primitive stochastic matrices, and the proof is omitted here. The analyses of Kemeny and Snell and of Meyer are all very elegant; the author used a different approach in the assignments, which may be consulted. One should notice that w'T = w' provides a way of calculating w' and W.

Theorem 3.1.2. [1] Let T be the transition matrix of a regular Markov chain, and let W be its limiting matrix. Then Z = (I − (T − W))^{−1} exists.

Proof. From the facts that TW = WT = W and W² = W, one can show that (T − W)^n = T^n − W, so that (T − W)^n → 0 as n → ∞. This establishes the existence of Z.

Definition 3.1.3. [1] Let T be the transition matrix of a regular chain. The matrix Z = (I − (T − W))^{−1} is called the fundamental matrix of the regular chain.

It will now be shown that the notion of a fundamental matrix can be extended to all ergodic chains in a similar fashion. However, for cyclic chains T^n does not converge. Therefore, W needs to be redefined in order to generalize the fundamental matrix.

Lemma 3.1.4. [1] If the sequence T^n is summable to 0 by some averaging method, then the matrix I − T has an inverse, and the series I + T + T² + · · · is summable by the same method to (I − T)^{−1}.

Lemma 3.1.5.
T is irreducible if and only if (1 − k)T + kI is primitive for some k between 0 and 1.

Theorem 3.1.6. [1] Let T be the transition matrix of an ergodic chain. Then T^n is Euler-summable to a limiting matrix W, and this limiting matrix is of the form W = jw', with w' a positive probability vector.

Proof. The case is easy when the chain is regular. Now assume that the chain is cyclic, so that T is periodic with period d > 1. By Lemma 3.1.5, (1 − k)T + kI is primitive, hence it is the transition matrix of some regular chain, and ((1 − k)T + kI)^n converges to a limiting matrix W = jw' with w' a positive probability vector as n → ∞. Expanding this expression, one sees that T^n is Euler-summable to W.

Results about W for regular chains can then be generalized to W for ergodic chains.

Theorem 3.1.7. [1] If T is an ergodic transition matrix with limiting matrix W and fixed probability vector w', then
1. For any probability vector x', x'T^n is Euler-summable to w'.
2. w' is the unique fixed probability vector of T.
3. TW = WT.

3.2 Foundations of Meyer's Theory

Meyer adopts the group inverse of A = I − T as his fundamental matrix. As it turns out, for almost every quantity one can derive from Kemeny and Snell's theory, there is a corresponding expression in terms of A# which has its own computational advantage. The careful reader might notice that the limiting matrix was written as W in the last section, whereas W was defined to be W = I − AA# in the notation section. The validity of this is established in the following results.

Lemma 3.2.1. [2] Assume T is the transition matrix of a regular chain. Then W = I − AA# = lim_{n→∞} T^n.

Proof. [2] Consider the Jordan canonical form of T. The Perron-Frobenius theorem implies that the Perron value 1 has multiplicity 1. Hence there exists a nonsingular matrix S such that

    T = S [ 1  0 ] S^{−1}.
          [ 0  K ]

Furthermore, 1 is not an eigenvalue of K, so I − K is nonsingular, and

    A# = S [ 0  0            ] S^{−1}.
           [ 0  (I − K)^{−1} ]

From this one can easily verify that W = I − AA# = lim_{n→∞} T^n.

Theorem 3.2.2. [2] Assume T is the transition matrix of an ergodic chain. Then

    W = I − AA# = lim_{n→∞} Σ_{i=0}^{n} C(n,i) k^{n−i} (1 − k)^i T^i.

Proof. The proof follows directly from the preceding results for regular chains.

The following result gives an expression for the fixed probability vector of T of an ergodic chain in terms of A#.

Theorem 3.2.3. [2] If T is the transition matrix of an m-state ergodic chain and A = I − T, then the fixed probability vector of T is given by w' = e_i' − r_i'A for each i = 1, 2, ..., m, where r_i' is the i-th row of A#.

Proof. The theorem follows directly from the facts that W = I − AA#, e_i'W = w', and AA# = A#A.

The following theorem shows how the two fundamental matrices are related.

Theorem 3.2.4. [2] Let T be the transition matrix of an ergodic chain, let A = I − T, and let W = jw', where w' is the unique fixed probability vector of T. The matrix Z = (I − (T − W))^{−1} is given by Z = A# + W = I + TA#.

Remark. One can verify the theorem directly by multiplying the two matrices. It is worth mentioning, however, that the inverse of I − (T − W) was not found by trial and error; one can refer to Meyer's paper for the details. Notice that whenever a result can be derived from Z, it can now be expressed in terms of A# by the last theorem. However, Meyer suggests that the theory can be developed directly from A#, without introducing Z at all.

3.3 Applications of the Fundamental Matrices

In this section, certain classical statistical quantities are introduced and calculated in terms of the two fundamental matrices. It will be shown that in most cases the two fundamental matrices can be interchanged with no change, or only a subtle change, in the expressions for these quantities.

Application #1

Definition 3.3.1.
Let T be the transition matrix of a regular chain, and let N(n) denote the matrix whose (i, j)-th entry is N_ij(n) = the expected number of times the process is in state s_j during the first n stages (i.e., the initial stage plus (n − 1) further stages) when the process starts in state s_i.

Theorem 3.3.2. [1][2] In the above setting,
1. N(n) − nW tends to Z − W;
2. N(n) − nW tends to A#.
In particular, A# = lim_{n→∞} (N(n) − nW).

Remark. The theorem follows directly from the fact that N(n) = Σ_{k=0}^{n−1} T^k. The two statements agree, since Z − W = A# by Theorem 3.2.4. Note that this theorem gives an interpretation of the entries of A#.

Application #2

Definition 3.3.3. For an ergodic chain with transition matrix T, let M denote the matrix whose (i, j)-th entry is M_ij = the expected number of steps before entering state s_j for the first time after the initial state s_i. M is called the mean first passage matrix.

Kemeny and Snell's representation of M

Now let M be as above, except that M is taken for a regular chain.

Lemma 3.3.4. [1] The matrix M satisfies the equation

    X = T(X − X_dg) + J.    (1)

Lemma 3.3.5. [1] M_ii = 1/w_i, where w_i is the i-th component of the fixed probability vector w' of T.

Theorem 3.3.6. [1] The mean first passage matrix M is the unique solution of (1) and is given by M = (I − Z + JZ_dg)D, where D is the diagonal matrix with diagonal elements d_ii = 1/w_i.

Remark. Kemeny and Snell then argue that the same expression for M generalizes to ergodic chains, since many basic properties of Z in the regular-chain setting carry over to ergodic chains, and since the period d does not appear explicitly. Although the author is not fully convinced by this argument, various examples have been tested and follow Kemeny and Snell's claim.

Meyer's representation of M

Theorem 3.3.7. [2] For an ergodic chain, the unique solution of equation (1) is given by

    M = (I − A# + J A#_dg) D.    (2)
Remark. Plug (2) into (1) to verify. Notice that the two expressions for M can be interchanged by substituting one fundamental matrix for the other.

Application #3

Definition 3.3.8. For an ergodic chain, let V be the matrix whose (i, j)-th entry is V_ij = the variance of the number of steps required to reach state s_j for the first time after the initial state s_i.

Kemeny and Snell's representation of V

Theorem 3.3.9. [1] Let P be the matrix whose (i, j)-th entry is P_ij = the expected value of the square of the number of steps before entering state s_j for the first time after the initial state s_i. Then P satisfies the equation

    X = T(X − X_dg) − 2T(Z − JZ_dg)D + J.    (3)

The unique solution to (3) is given by P = M(2Z_dg D − I) + 2(ZM − J(ZM)_dg).

Remark. This theorem can be proved similarly to the theorem for M. V is then given by V = P − M_sq.

Meyer's representation of V

Theorem 3.3.10. [2] The unique solution of (3) is given by P = M(2A#_dg D + I) + 2(A#M − J(A#M)_dg).

Again, one can notice that Kemeny and Snell's expression for V can be rewritten with essentially the only change being that Z is replaced by A# (note that 2Z_dg D − I = 2A#_dg D + I, since W_dg D = I).

3.4 Calculation of A# and w'

Low computational complexity is important when dealing with large data sets. From the last section, one sees that both the fundamental matrix and the fixed probability vector are needed for the calculations. This section carries out the calculation of A# and w' for ergodic chains following Meyer's theory. The following theorem provides a way of calculating A#.

Theorem 3.4.1. [2] Let T be the transition matrix of an m-state ergodic chain, and let A = I − T. Write A as

    A = [ U   c ]
        [ d'  α ]

where U is (m − 1) × (m − 1) (by Theorem 2.0.3, U^{−1} exists), and adopt the following notation:

    h' = d'U^{−1},    δ = −h'U^{−1}j,    β = 1 − h'j,    F = U^{−1} − (δ/β)I.

The scalars δ and β are each nonzero (in fact, δ > 0 and β > 1), and A# is given by

    A# = [ U^{−1} + (U^{−1}jh'U^{−1} − Fjh'F)/δ    −Fj/β ]
         [ h'F/β                                   δ/β²  ]

From Theorem 3.2.3, one can see that in order to calculate w', only one row of A# needs to be known.
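Theorem 3.4.2 below translates directly into a short computation: obtain h' by solving against U, then normalize by β. The following Python sketch is an illustration, not Meyer's code; the 3 × 3 ergodic matrix T is made up for this example and is not taken from the report.

```python
# Sketch: w' = (1/beta) * (-d'U^{-1}, 1) for a made-up 3-state ergodic
# chain, following the partition A = [[U, c], [d', alpha]].
T = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.5, 0.0]]
m = 3
A = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(m)] for i in range(m)]

# Partition: U is the leading 2 x 2 principal submatrix, d' the last row.
U = [[A[0][0], A[0][1]], [A[1][0], A[1][1]]]
d = [A[2][0], A[2][1]]

# h' = d'U^{-1}, via the explicit 2 x 2 inverse (U is nonsingular by
# Theorem 2.0.3).
det = U[0][0]*U[1][1] - U[0][1]*U[1][0]
Uinv = [[ U[1][1]/det, -U[0][1]/det],
        [-U[1][0]/det,  U[0][0]/det]]
h = [d[0]*Uinv[0][0] + d[1]*Uinv[1][0],
     d[0]*Uinv[0][1] + d[1]*Uinv[1][1]]

beta = 1 - (h[0] + h[1])            # beta = 1 - h'j
w = [-h[0]/beta, -h[1]/beta, 1/beta]
print(w)                             # the fixed probability vector w'

# Sanity check: w'T should reproduce w'.
wT = [sum(w[i]*T[i][j] for i in range(m)) for j in range(m)]
```

Note that only one (m − 1) × (m − 1) system involving U is needed, which is the source of the computational advantage discussed at the end of this section.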
The next theorem deals with the calculation of w'.

Theorem 3.4.2. [2] If T is the transition matrix of an m-state ergodic chain and A = I − T is partitioned as above, then w' is given by

    w' = (1/β) ( −d'U^{−1}   1 ).

With w', one knows W = jw' and is thus able to calculate all the quantities listed above. On the other hand, Kemeny and Snell suggest that w' be calculated by solving the linear system w'T = w', after which the fundamental matrix Z = (I − (T − W))^{−1} can be computed. One can immediately see the advantage of Meyer's theory.

4 Example

The report is concluded with an example. Consider the transition matrix of a cyclic chain:

    T = [ 0     1/3   2/3 ]
        [ 1/2   0     1/2 ]
        [ 3/4   1/4   0   ]

so that

    A = [ 1     −1/3  −2/3 ]
        [ −1/2  1     −1/2 ]
        [ −3/4  −1/4  1    ]

Consider the following cases. First assume that only w' is desired. Then h can be computed as the solution of the system U'x = d, where

    U = [ 1     −1/3 ]    and    d' = ( −3/4  −1/4 ).
        [ −1/2  1    ]

Now if A# is desired, compute U^{−1} and obtain h' as h' = d'U^{−1}. In our case,

    U^{−1} = [ 6/5  2/5 ]    and    h' = ( −21/20  −3/5 ).
             [ 3/5  6/5 ]

Now calculate β = 1 − h'j = 53/20. The fixed probability vector w' is given by

    w' = ( 21/53  12/53  20/53 ).

Therefore

    AA# = I − W = [ 32/53   −12/53  −20/53 ]
                  [ −21/53  41/53   −20/53 ]
                  [ −21/53  −12/53  33/53  ]

The matrix D is

    D = [ 53/21  0      0     ]
        [ 0      53/12  0     ]
        [ 0      0      53/20 ]

References

[1] John G. Kemeny, J. Laurie Snell, Finite Markov Chains. D. Van Nostrand Company, Inc., New Jersey, 1960.
[2] Carl Meyer, The role of the group inverse in the theory of finite Markov chains. SIAM Review 17 (3), 443-464, 1975.
[3] Daniel Johnson, Substochastic matrix spectral radius. Mathematics Stack Exchange, https://math.stackexchange.com/questions/36828/substochastic-matrix-spectral-radius