Report on Finite Markov Chains
Yuxiang Hu 7796732
7 December 2017
1 Introduction and Foundations
Since its introduction by A. A. Markov in 1907 and development by a number of mathematicians, the theory of Markov chains has found applications in almost every field associated with statistical study, such as meteorology, information and computer science, and economics. Roughly speaking, a Markov chain is a stochastic process satisfying the memoryless property: given the present information, one can make predictions about the future regardless of the past. Therefore, if the transition matrix of a Markov chain and the initial state are given, then the entire process is determined in a predictable way. This underlies the importance of the study of nonnegative matrices in finite Markov chains.
This report reviews the basic theory of finite Markov chains and compares two different theories for determining several quantities of the process. The group generalized inverse, the foundation of one of the theories, is studied. Finally, the computational complexity of the two theories is considered.
The following notation and definitions are used throughout the report. Unless otherwise specified, other notation follows the conventions used in class.
Set
N^c - the complement of N.
Probability
M(X) - the expected value of X;
Var(X) - the variance of X.
For X ∈ Mn(C):
X' - the transpose of X;
Xdg - the diagonal matrix obtained from X by setting all elements off the main diagonal equal to 0;
Xsq - the matrix obtained from X by squaring each entry;
σ(X) - the set of eigenvalues of X.
For a Markov chain:
T - the one-step transition matrix of an m-state homogeneous Markov chain;
j - a column vector with each component equal to 1;
J - the matrix jj';
w' - the fixed probability (row) vector for the transition matrix of an ergodic chain;
W - the matrix I − AA#, where A# is given in the following definition;
tij - the transition probability from i to j;
sj - a state in a Markov chain;
d - the GCD of the cycle lengths in an ergodic set.
Definition 1.0.1. For A ∈ Mn(C), the group inverse of A, when it exists, is defined to be the matrix A# which satisfies the conditions

    AA#A = A,  A#AA# = A#  and  AA# = A#A.

For A ∈ Mn(C), it can be shown that the three equations AXA = A, XAX = X and AX = XA possess a solution X if and only if rank(A) = rank(A^2). Furthermore, when it exists, the solution is unique.
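The three defining conditions can be checked mechanically. Below is a minimal sketch in exact rational arithmetic; the projector example P and the helper names are the author's illustrative choices, not taken from the literature.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_group_inverse(A, X):
    # Definition 1.0.1: AXA = A, XAX = X, AX = XA.
    return (matmul(matmul(A, X), A) == A and
            matmul(matmul(X, A), X) == X and
            matmul(A, X) == matmul(X, A))

# An idempotent matrix (a projector) is its own group inverse.
P = [[F(1), F(1)], [F(0), F(0)]]
print(is_group_inverse(P, P))  # True
```

Note that the identity matrix satisfies the first condition for P but fails the second, so checking all three conditions is essential.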
Definition 1.0.2. If A− is a matrix which satisfies
AA− A = A,
then A− is said to be a (1)-inverse for A.
Lemma 1.0.3. [2] Suppose M ∈ Mn(C) has the form

    M = [ U  C
          0  L ],

where U and L are square. If U is nonsingular and if L# exists, then M# exists.
Theorem 1.0.4. [2] For every transition matrix T, the matrix A# exists, where A = I − T.
Proof. The proof is divided into two parts. The first part deals with the case when T is irreducible, while the second part handles the case when T is reducible. Details of the proof can be found in Meyer's paper.
Definition 1.0.5. Let sn be a sequence. Denote

    tn = (1/n) Σ_{i=0}^{n−1} si,    un = Σ_{i=0}^{n} C(n, i) k^{n−i} (1 − k)^i si

for some k such that 0 < k < 1, where C(n, i) denotes the binomial coefficient.
1. If tn converges to a limit t, then we say that the original sequence is Cesaro-summable to t.
2. If un converges to a limit u, then we say that the original sequence is Euler-summable to u.
There are two facts one needs to know. If a sequence converges, then it
is summable by each method to its limit. If a sequence is summable by both
methods, the two sums must be the same.
Theorem 1.0.6. [1] If the sequence A^n is summable to 0 by some averaging method, then the matrix I − A has an inverse, and the series I + A + A^2 + ... is summable by the same method to (I − A)^{−1}.
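For an ordinarily convergent sequence this is just the Neumann series. A quick numerical sketch, with a 2×2 matrix A chosen by the author so that ρ(A) = 1/2:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A has spectral radius 1/2 < 1, so I + A + A^2 + ... converges to (I - A)^{-1}.
A = [[0.5, 0.25],
     [0.0, 0.5]]
S = [[1.0, 0.0], [0.0, 1.0]]   # partial sum, starts at I
P = [[1.0, 0.0], [0.0, 1.0]]   # current power A^k
for _ in range(200):
    P = matmul(P, A)
    S = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(S, P)]
print(S)  # approaches (I - A)^{-1} = [[2, 1], [0, 2]]
```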
Fundamental Facts of the One-step Transition Matrix
The following results are suggested by Meyer and were compiled by the author while studying Meyer's paper.
• T^n is a stochastic matrix.
• 1 ∈ σ(T) with corresponding eigenvector j.
• ρ(T) = 1.
• T is primitive for a regular Markov chain; T is irreducible and periodic for a cyclic Markov chain.
• For ergodic chains, the eigenvalue 1 has algebraic multiplicity 1 and geometric multiplicity 1. Thus the Jordan canonical form of T is

    [ 1  0
      0  K ],

where 1 ∉ σ(K).
2 Remedies for Absorbing Chains
Recall that an absorbing chain contains at least one transient set and one ergodic set, where every ergodic set consists of a single absorbing state. An absorbing state is characterized by tii = 1; in other words, once such a state is entered, it can never be left. From this one may guess that after a sufficiently large number of steps, an absorbing chain will inevitably end up in some absorbing state. This result was generalized by Kemeny and Snell to arbitrary finite Markov chains.
Theorem 2.0.1. [1] In any finite Markov chain with transient states, no matter where the process starts, the probability that the process is in an ergodic state after n steps tends to 1 as n → ∞.
Proof. [1] Kemeny and Snell give a probabilistic proof. The case is easy if the process starts in an ergodic set. Now assume that the chain starts in some transient state si, and note that there is a fixed k such that it takes at most k steps to get from any transient state to an ergodic state; such a k exists because it is possible to travel from any transient state to some ergodic state in finitely many steps. Hence there is a positive number p such that, from any transient state, the probability of entering an ergodic state within k steps is at least p. Therefore, starting in si, the probability of still being in a transient state after kn steps is at most (1 − p)^n, which tends to 0 as n → ∞.
However, in the language of matrix analysis it is natural to think of the problem of matrix convergence in terms of the spectral radius of the matrix. This idea is suggested but not proved by Meyer. Daniel Johnson provides an elegant proof on the Mathematics Stack Exchange website, presented as Theorem 2.0.2 below. Meyer then generalizes the idea to ergodic chains.
Theorem 2.0.2. Let T be the transition matrix of some Markov chain that has transient states. Let Q denote the principal submatrix of T corresponding to an entire transient set. Then ρ(Q) < 1.
Proof. [3] Let λ be the Perron value of Q. The Perron–Frobenius theorem gives a positive normalized left eigenvector v' corresponding to λ, so that

    |λ| = ||λv'||_1 = ||v'Q||_1 = Σ_j Σ_k vj qjk.

Let εj = (1/n)(1 − Σ_{k=1}^{n} qjk); adding εj to each element of the jth row of Q makes that row sum equal to 1. Let ε be the column vector containing the values εj, and note that some εj is positive since the chain can leave the transient set. Then

    |λ| = Σ_j Σ_k vj (qjk + εj − εj) = ||v'(Q + εj')||_1 − n(v'ε) = 1 − n(v'ε) < 1.
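Theorem 2.0.2 can also be observed numerically: power iteration on a substochastic block converges to a Perron value strictly below 1. The 2-state Q below is a hypothetical example chosen by the author, not taken from the report.

```python
def spectral_radius(Q, iters=500):
    # Power iteration: for a nonnegative matrix, the normalized growth of
    # Q^n v approaches the Perron value rho(Q).
    v = [1.0] * len(Q)
    lam = 1.0
    for _ in range(iters):
        w = [sum(q * x for q, x in zip(row, v)) for row in Q]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

# Rows sum to less than 1: from each transient state the chain can leave.
Q = [[0.5, 0.3],
     [0.2, 0.4]]
print(spectral_radius(Q))  # about 0.7, strictly less than 1
```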
Theorem 2.0.3. [2] Let T be the transition matrix of an m-state ergodic chain, and let A = I − T. Then every principal submatrix of A of size k × k, k = 1, 2, ..., m − 1, is nonsingular. Furthermore, the inverse of each principal submatrix of A is a nonnegative matrix.
Proof. (sketch) One can form an absorbing chain by making the states in the complement of those corresponding to the principal submatrix absorbing. Then the result follows immediately.
The reader should bear in mind the idea behind the proof: creating a new Markov chain from the existing one by turning some states into absorbing states. This idea shows up a few times later and was adopted by Kemeny and Snell to extend the results from absorbing chains to a larger class of chains.
From the above, it is easy to see that the powers T^n of the transition matrix of an absorbing chain converge. The result is summarized in the following theorem, and the "fundamental matrix" for absorbing chains, so named by Kemeny and Snell, can then be introduced. Since Meyer did not cover absorbing chains in depth, the last part of this section is a solo exhibition of Kemeny and Snell, with focus on applications of the fundamental matrix. These results were later used by Kemeny and Snell as part of their treatment of ergodic chains.
For convenience, the canonical form of T will be used from now on. Suppose an absorbing chain has q transient states and s absorbing states. Then there is a permutation matrix P such that

    P'TP = [ I  0
             R  Q ],

where I corresponds to the s absorbing states and Q corresponds to the q transient states.
Theorem 2.0.4. [1] Let Q be the principal submatrix of the transition matrix of an absorbing chain corresponding to all the transient states. Then Q^n converges to 0 as n → ∞.
Proof. It follows immediately from Theorem 2.0.2.
Definition 2.0.5. [1] For an absorbing Markov chain, the fundamental matrix is defined to be

    N = (I − Q)^{−1}.

Note that the existence of N is guaranteed by the convergence of Q^n.
An interpretation of N is given in the following setting:
Definition 2.0.6. Define nij to be the function giving the total number of
times that the process is in sj with the initial state being si . (This is defined
only for transient states si and sj .)
Theorem 2.0.7. [1] Let M be the matrix whose (i, j)-th entry is given by M(nij), the expected value of nij. Then M = N.
Proof. Note that M = Σ_{k=0}^{∞} Q^k = (I − Q)^{−1} = N.
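As a small illustration, N can be computed exactly. The 2-state transient block Q below is a hypothetical example, and the Gauss-Jordan helper is the author's own, not part of the theory.

```python
from fractions import Fraction as F

def inverse(A):
    # Gauss-Jordan elimination in exact rational arithmetic.
    n = len(A)
    M = [list(A[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Hypothetical transient block of an absorbing chain (two transient states
# that swap with probability 1/2 and otherwise get absorbed).
Q = [[F(0), F(1, 2)],
     [F(1, 2), F(0)]]
I2 = [[F(1), F(0)], [F(0), F(1)]]
N = inverse([[I2[i][j] - Q[i][j] for j in range(2)] for i in range(2)])
print(N)  # expected numbers of visits: [[4/3, 2/3], [2/3, 4/3]]
```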
Remark Indeed, the fundamental matrix of an absorbing chain can be used to study one type of problem in ergodic chains, associated with the number of steps spent in given states before ending up in some other state, by treating the latter state as if it were absorbing. Again, one can see the power of forming a new chain in this way.
One application of the fundamental matrix of an absorbing chain is to calculate the variance of nij; the details are omitted here. Another application, as mentioned in the remark, is to study a similar question in a wider setting. Here the notion of an open set is introduced.
Definition 2.0.8. A set S of states is an open set if from every state in S it is possible to go to a state in S^c.
Here is a characterization of open sets:
Theorem 2.0.9. [1]A set S of states is open if and only if no ergodic set is
a subset of S.
Theorem 2.0.10. [1] Let S be an open set of s states. Let Q be the s by s submatrix of the transition matrix corresponding to these states, and let the process start in si ∈ S. Then the (i, j)-th entry of N = (I − Q)^{−1} gives the mean number of times the process is in sj before leaving S.
Proof. Again, the idea of creating a new absorbing chain can be used here.
3 Treatment of Ergodic Chains
Recall that an ergodic chain is either regular or cyclic, with the corresponding transition matrix being either primitive or irreducible and periodic. Meyer, and Kemeny and Snell, independently defined fundamental matrices for studying ergodic chains. It will be shown that although the two fundamental matrices differ in general, one may be used in a manner very similar to the way the other is used. Moreover, the computational advantage of Meyer's method will be unraveled.
3.1 Foundations of Kemeny and Snell's Theory
Indeed, Meyer used A# as his fundamental matrix. Before establishing Kemeny and Snell's fundamental matrix, another matrix, the so-called limiting matrix, needs to be introduced. In this section definitions and theorems are first given for regular chains and are then generalized to all types of ergodic chains.
Lemma 3.1.1. [1] Let T be the transition matrix of a regular chain. Then
1. There is a unique positive vector w' such that w'T = w' and w'j = 1; w' is defined to be the unique fixed probability vector of T.
2. T^n converges to jw' as n → ∞. The matrix W = jw' is defined to be the limiting matrix of T.
3. For any probability vector x', x'T^n converges to w'.
Remark This is a well-known fact for primitive stochastic matrices and the proof is omitted here. The analyses of Kemeny and Snell and of Meyer are both elegant; the author used a different approach in the assignment, which may also be consulted. One should notice that w'T = w' provides a way of calculating w' and W.
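The convergence T^n → jw' in part 2 is easy to watch numerically. The 2-state chain below is an illustrative choice of the author, not the report's example.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A hypothetical 2-state regular chain; solving w'T = w' gives w' = (1/3, 2/3).
T = [[0.50, 0.50],
     [0.25, 0.75]]
P = [row[:] for row in T]
for _ in range(100):
    P = matmul(P, T)
print(P)  # every row approaches w' = (1/3, 2/3)
```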
Theorem 3.1.2. [1] Let T be the transition matrix of a regular Markov chain and let W be its limiting matrix. Then Z = (I − (T − W))^{−1} exists.
Proof. From the facts that TW = WT = W and W^2 = W, one can show that (T − W)^n = T^n − W, so that (T − W)^n → 0 as n → ∞. This establishes the existence of Z.
Definition 3.1.3. [1] Let T be the transition matrix of a regular chain. The matrix Z = (I − (T − W))^{−1} is called the fundamental matrix of the regular chain.
Now it will be shown that the notion of fundamental matrix can be extended to arbitrary ergodic chains in a similar fashion. However, for cyclic chains, T^n does not converge. Therefore W needs to be redefined in order to generalize the fundamental matrix.
Lemma 3.1.4. [1] If the sequence T^n is summable to 0 by some averaging method, then the matrix I − T has an inverse, and the series I + T + T^2 + · · · is summable by the same method to (I − T)^{−1}.
Lemma 3.1.5. T is irreducible if and only if (1 − k)T + kI is primitive for
some k between 0 and 1.
Theorem 3.1.6. [1] Let T be the transition matrix of an ergodic chain. Then T^n is Euler-summable to a limiting matrix W, and this limiting matrix is of the form W = jw', with w' a positive probability vector.
Proof. The case is easy when the chain is regular. Now assume that the chain is cyclic, so that T is periodic with period d > 1. By Lemma 3.1.5, (1 − k)T + kI is primitive, so it is the transition matrix of some regular chain, and ((1 − k)T + kI)^n converges to a limiting matrix W = jw', with w' a positive probability vector, as n → ∞. Expanding the expression, one can see that T^n is Euler-summable to W.
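This argument can be illustrated numerically: for a cyclic chain T^n oscillates, but ((1 − k)T + kI)^n converges. The 3-state cyclic permutation chain below is an illustrative choice of the author, with k = 1/2.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Cyclic chain with period 3: T^n itself never converges.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
k = 0.5
S = [[(1 - k) * T[i][j] + k * (i == j) for j in range(3)] for i in range(3)]
P = [row[:] for row in S]
for _ in range(100):
    P = matmul(P, S)
print(P)  # every row approaches w' = (1/3, 1/3, 1/3)
```

Since T here is doubly stochastic, its fixed probability vector is uniform, which matches the limit of the smoothed chain.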
Results established for W in the regular case can then be generalized to W for ergodic chains.
Theorem 3.1.7. [1] Let T be an ergodic transition matrix, and let W and w' be its limiting matrix and fixed probability vector. Then
1. For any probability vector x', x'T^n is Euler-summable to w'.
2. w' is the unique fixed probability vector of T.
3. TW = WT = W.
3.2 Foundations of Meyer's Theory
Meyer adopts the group inverse of A = I − T as his fundamental matrix. As it turns out, for almost every quantity one can derive from Kemeny and Snell's theory there is a corresponding expression in terms of A#, which has its own computational advantage. A careful reader might notice that the limiting matrix was written as W = jw' in the last section, whereas W is defined to be W = I − AA# in the notation section. The consistency of the two definitions is established in the following theorem.
Lemma 3.2.1. [2] Assume T is the transition matrix of a regular chain. Then

    W = I − AA# = lim_{n→∞} T^n.

Proof. [2] Consider the Jordan canonical form of T. The Perron–Frobenius theorem implies that the Perron value 1 has multiplicity 1. Hence there exists a nonsingular matrix S such that

    T = S^{−1} [ 1  0
                 0  K ] S.

Furthermore, 1 is not an eigenvalue of K, so I − K is nonsingular, and

    A# = S^{−1} [ 0  0
                  0  (I − K)^{−1} ] S.

From this one can easily verify that W = I − AA# = lim_{n→∞} T^n.
Theorem 3.2.2. [2] Assume T is the transition matrix of an ergodic chain. Then

    W = I − AA# = lim_{n→∞} Σ_{i=0}^{n} C(n, i) k^{n−i} (1 − k)^i T^i.
Proof. The proof follows directly from the preceding results for regular chains.
The following result expresses the fixed probability vector of T for an ergodic chain in terms of A#.
Theorem 3.2.3. [2] If T is the transition matrix of an m-state ergodic chain and A = I − T, then the fixed probability vector of T is given by w' = ei' − ri'A for each i = 1, 2, ..., m, where ri' is the i-th row of A#.
Proof. The theorem follows directly from the facts that W = I − AA#, ei'W = w', and AA# = A#A.
The following theorem shows how the two fundamental matrices are related.
Theorem 3.2.4. [2] Let T be the transition matrix of an ergodic chain, let A = I − T, and let W = jw', where w' is the unique fixed probability vector of T. Then the matrix Z = (I − (T − W))^{−1} is given by Z = A# + W = I + TA#.
Remark One can verify the theorem directly by multiplying the two matrices. However, it is worth mentioning that the inverse of I − (T − W) was not found by trial and error; one can refer to Meyer's paper for the details.
Notice that whenever a result can be derived from Z, it can now be expressed in terms of A# by using the last theorem. However, Meyer suggests that the theory can be developed directly from A#, without the need to introduce Z.
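Theorem 3.2.4 can be verified exactly on a small example: compute Z = (I − (T − W))^{−1}, form the candidate A# = Z − W, and check both the group inverse conditions and Z = I + TA#. The 2-state chain and helper names below are the author's illustrative choices.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    # Gauss-Jordan elimination in exact rational arithmetic.
    n = len(A)
    M = [list(A[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Hypothetical 2-state regular chain with fixed probability vector w' = (1/3, 2/3).
T = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
w = [F(1, 3), F(2, 3)]
I2 = [[F(1), F(0)], [F(0), F(1)]]
W = [w[:], w[:]]                                   # W = jw'
A = [[I2[i][j] - T[i][j] for j in range(2)] for i in range(2)]
Z = inverse([[I2[i][j] - T[i][j] + W[i][j] for j in range(2)] for i in range(2)])
Ag = [[Z[i][j] - W[i][j] for j in range(2)] for i in range(2)]  # candidate A# = Z - W

# Ag really is the group inverse of A ...
assert matmul(matmul(A, Ag), A) == A
assert matmul(matmul(Ag, A), Ag) == Ag
assert matmul(A, Ag) == matmul(Ag, A)
# ... and Z = I + T A#, as Theorem 3.2.4 states.
TA = matmul(T, Ag)
assert Z == [[I2[i][j] + TA[i][j] for j in range(2)] for i in range(2)]
print("Theorem 3.2.4 verified on this example")
```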
3.3 Application of the Fundamental Matrices
In this section, certain classical statistical quantities are introduced and calculated in terms of the two fundamental matrices. It will be shown that in most cases the two fundamental matrices can be interchanged with no change, or only a subtle change, in the expressions for these quantities.
Application #1
Definition 3.3.1. Let T be the transition matrix of a regular chain, and let N(n) denote the matrix whose (i, j)-th entry is
Nij(n) = the expected number of times the process is in state sj in the first n stages (i.e. the initial stage plus (n − 1) further stages) when the process was initially in state si.
Theorem 3.3.2. [1][2] In the above setting,
1. N(n) − nW tends to Z − W;
2. N(n) − nW tends to A#.
In either formulation,

    A# = lim_{n→∞} (N(n) − nW).

Remark The theorem follows directly from the fact that N(n) = Σ_{k=0}^{n−1} T^k. Note that this theorem gives an interpretation for the entries of A#.
Application #2
Definition 3.3.3. For an ergodic chain with transition matrix T, let M denote the matrix whose (i, j)-th entry is given by Mij = the expected number of steps before entering state sj for the first time after the initial state si. M is called the mean first passage matrix.
Kemeny and Snell's representation of M
Now let M be as in the setting above, except that the chain is assumed to be regular.
Lemma 3.3.4. [1] The matrix M satisfies the equation

    X = T(X − Xdg) + J.    (1)
Lemma 3.3.5. [1] Mii = 1/wi, where wi is the ith component of the fixed probability vector w' of T.
Theorem 3.3.6. [1] The mean first passage matrix M is the unique solution of (1) and is given by M = (I − Z + JZdg)D, where D is the diagonal matrix with diagonal elements dii = 1/wi.
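The formula can be checked exactly on a small chain. The sketch below (a hypothetical 2-state regular chain, with the author's own helper functions) computes M = (I − Z + JZdg)D and confirms Lemma 3.3.5's Mii = 1/wi.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    # Gauss-Jordan elimination in exact rational arithmetic.
    n = len(A)
    M = [list(A[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Hypothetical 2-state regular chain with fixed probability vector w' = (1/3, 2/3).
T = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
w = [F(1, 3), F(2, 3)]
I2 = [[F(1), F(0)], [F(0), F(1)]]
W = [w[:], w[:]]                                   # W = jw'
Z = inverse([[I2[i][j] - T[i][j] + W[i][j] for j in range(2)] for i in range(2)])
Zdg = [[Z[i][j] if i == j else F(0) for j in range(2)] for i in range(2)]
J = [[F(1), F(1)], [F(1), F(1)]]
JZ = matmul(J, Zdg)
D = [[1 / w[0], F(0)], [F(0), 1 / w[1]]]           # dii = 1/wi
X = [[I2[i][j] - Z[i][j] + JZ[i][j] for j in range(2)] for i in range(2)]
M = matmul(X, D)                                   # M = (I - Z + JZdg)D
print(M)  # mean first passage times [[3, 2], [4, 3/2]]; note Mii = 1/wi
```

For this chain the entries agree with a direct first-step argument: from state 1 the chain enters state 2 with probability 1/2 each step, so M12 = 2, and similarly M21 = 4.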
Remark Kemeny and Snell then suggest that the same expression for M generalizes to ergodic chains, since many basic properties of Z in the regular setting carry over to ergodic chains and since the period d does not appear explicitly. Although the author is not fully convinced by this argument, various examples have been tested and all follow Kemeny and Snell's claim.
Meyer’s representation of M
Theorem 3.3.7. [2] For an ergodic chain, the unique solution of equation (1) is given by

    M = (I − A# + JA#dg)D.    (2)

Remark Plug (2) into (1) to verify. Notice that the two expressions for M can be interchanged by substituting one fundamental matrix for the other.
Application #3
Definition 3.3.8. For an ergodic chain, let V be the matrix whose (i, j)-th entry is Vij = the variance of the number of steps required to reach state sj for the first time after the initial state si.
Kemeny and Snell’s representation of V
Theorem 3.3.9. [1] Let P be the matrix whose (i, j)-th entry is Pij = the expected value of the square of the number of steps before entering state sj for the first time after the initial state si. Then P satisfies the equation

    X = T(X − Xdg) − 2T(Z − JZdg)D + J.    (3)

The unique solution to (3) is given by

    P = M(2ZdgD − I) + 2(ZM − J(ZM)dg).

Remark This theorem can be proved in the same way as the theorem for M. V is then given by P − Msq.
Meyer's representation of V
Theorem 3.3.10. [2] The unique solution of (3) is given by

    P = M(2A#dgD + I) + 2(A#M − J(A#M)dg).

Again, one can notice that Kemeny and Snell's expression for V can be rewritten with the only change being that Z is replaced by −A#.
3.4 Calculation of A# and w'
Low computational complexity is important when dealing with large data sets. From the last section, one sees that both the fundamental matrix and the fixed probability vector are needed for calculation purposes. This section carries out the calculation of A# and w' for ergodic chains.
The following theorem provides a way of calculating A#.
Theorem 3.4.1. [2] Let T be the transition matrix of an m-state ergodic chain, and let A = I − T. Write A as

    A = [ U   c
          d'  α ],

where U is (m − 1) × (m − 1) (by Theorem 2.0.3, U^{−1} exists), and adopt the following notation:

    h' = d'U^{−1},  δ = −h'U^{−1}j,  β = 1 − h'j,  F = U^{−1} − (δ/β)I.

The scalars δ and β are each nonzero (in fact, δ > 0 and β > 1), and A# is given by

    A# = [ U^{−1} + U^{−1}jh'U^{−1}/δ − Fjh'F/δ   −Fj/β
           h'F/β                                  δ/β^2 ].
From Theorem 3.2.3, one can see that in order to calculate w', only one row of A# needs to be known. The next theorem deals with the calculation of w'.
Theorem 3.4.2. [2] If T is the transition matrix of an m-state ergodic chain and A = I − T is partitioned as above, then w' is given by

    w' = (1/β) [ −d'U^{−1}   1 ].
With w' one obtains W and is thus able to calculate all the quantities listed above. On the other hand, Kemeny and Snell suggest that the calculation of w' be done by solving the linear system w'T = w', after which the fundamental matrix Z = (I − (T − W))^{−1} can be calculated; this requires a further matrix inversion. One can immediately see the computational advantage of Meyer's theory.
4 Example
The report is concluded with an example. Consider the transition matrix of a cyclic chain:

    T = [ 0    1/3  2/3
          1/2  0    1/2
          3/4  1/4  0   ],

so that

    A = I − T = [ 1     −1/3  −2/3
                  −1/2  1     −1/2
                  −3/4  −1/4  1    ].
Consider the following cases. First assume only w' is desired. Then h can be computed as the solution of the system U'x = d, where

    U = [ 1     −1/3
          −1/2  1    ]   and   d' = (−3/4  −1/4).

Now if A# is desired, compute U^{−1} and obtain h' as h' = d'U^{−1}. In our case,

    U^{−1} = [ 6/5  2/5
               3/5  6/5 ]   and   h' = (−21/20  −3/5).

Now calculate

    β = 1 − h'j = 53/20.

The fixed probability vector w' is given by

    w' = (1/β)(−h'  1) = (21/53  12/53  20/53).

Therefore

    AA# = I − W = [ 32/53   −12/53  −20/53
                    −21/53  41/53   −20/53
                    −21/53  −12/53  33/53  ].

The matrix D is

    D = [ 53/21  0      0
          0      53/12  0
          0      0      53/20 ].
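The partition formulas of Theorem 3.4.2 are easy to check mechanically. The sketch below, in exact rational arithmetic (the Gauss-Jordan helper is the author's own, not part of the theory), recomputes h', β and w' for this chain and verifies w'T = w'.

```python
from fractions import Fraction as F

def inverse(A):
    # Gauss-Jordan elimination in exact rational arithmetic.
    n = len(A)
    M = [list(A[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# The report's cyclic transition matrix.
T = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 2), F(0), F(1, 2)],
     [F(3, 4), F(1, 4), F(0)]]
A = [[F(int(i == j)) - T[i][j] for j in range(3)] for i in range(3)]
U = [[A[0][0], A[0][1]], [A[1][0], A[1][1]]]
d = [A[2][0], A[2][1]]                              # d' = last row of A without alpha
Ui = inverse(U)
h = [sum(d[i] * Ui[i][j] for i in range(2)) for j in range(2)]  # h' = d'U^{-1}
beta = 1 - (h[0] + h[1])                            # beta = 1 - h'j
w = [-h[0] / beta, -h[1] / beta, 1 / beta]          # w' = (1/beta)(-h', 1)
print(w)  # (21/53, 12/53, 20/53)
# Sanity check: w' is a fixed probability vector, w'T = w'.
wT = [sum(w[i] * T[i][j] for i in range(3)) for j in range(3)]
assert wT == w and sum(w) == 1
```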
References
[1] John G. Kemeny and J. Laurie Snell, Finite Markov Chains. D. Van Nostrand Company, Inc., New Jersey, 1960.
[2] Carl Meyer, The role of the group inverse in the theory of finite Markov chains. SIAM Review 17 (3), 443-464, 1975.
[3] Daniel Johnson, Substochastic matrix spectral radius. Mathematics Stack Exchange, https://math.stackexchange.com/questions/36828/substochasticmatrix-spectral-radius