Lumping the Markov Chain

LUMPING THE MARKOV CHAIN FOR CO-OCCURRENCE MATRICES
USING SUBSETS OF THE ROW AND COLUMN SWAP GROUP
Ana M. Mocanu
June 17, 2001
Definition 1: A co-occurrence matrix is a binary matrix with fixed row and column sums.
Definition 2: A checkerboard unit is a 2x2 submatrix of a co-occurrence matrix that has
one of the following forms:
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
Definition 3: Performing a 2x2 swap on a matrix means swapping the 1s and the 0s of a
checkerboard unit in the matrix.
Notation: (r1, r2, c1, c2) is the 2x2 swap performed by swapping the elements of the
matrix that are found at the intersections between the rows r1 and r2 and the columns c1 and c2.
Observation 1: 2x2 swaps conserve the row and column sums of the matrix.
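As an illustration of Definitions 2 and 3 and of Observation 1, here is a minimal Python sketch. The function names (is_checkerboard, swap_2x2, row_sums, col_sums) and the list-of-lists representation with 0-based indices are assumptions made for this example and do not come from the paper.

```python
def is_checkerboard(m, r1, r2, c1, c2):
    """True if rows r1, r2 and columns c1, c2 of m form a checkerboard unit."""
    a, b = m[r1][c1], m[r1][c2]
    c, d = m[r2][c1], m[r2][c2]
    return a == d and b == c and a != b


def swap_2x2(m, r1, r2, c1, c2):
    """Return a copy of m with the 2x2 swap (r1, r2, c1, c2) applied."""
    if not is_checkerboard(m, r1, r2, c1, c2):
        raise ValueError("selected submatrix is not a checkerboard unit")
    out = [row[:] for row in m]
    for r in (r1, r2):
        for c in (c1, c2):
            out[r][c] = 1 - out[r][c]  # exchange the 1s and the 0s
    return out


def row_sums(m):
    return [sum(row) for row in m]


def col_sums(m):
    return [sum(col) for col in zip(*m)]


m = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]
s = swap_2x2(m, 0, 1, 0, 1)
print(row_sums(m) == row_sums(s), col_sums(m) == col_sums(s))
```

The swap changes four entries, but each affected row and column loses one 1 and gains one 1, which is why the sums are unchanged.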
Observation 2: The method of picking a pair of rows and a pair of columns at random and
performing a 2x2 swap if the resulting submatrix is a checkerboard unit is an instance
of Markov Chain Monte Carlo [1].
Observation 3: The graph representing the Markov Chain mentioned above is connected.
The vertices of the graph are co-occurrence matrices and the edges are 2x2 swaps.
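One step of the random procedure in Observation 2 might be sketched as follows. This is only a sketch: it assumes the row pair and the column pair are drawn uniformly and that the chain stays in place when the chosen submatrix is not a checkerboard unit. The name mcmc_step and the seeded generator are choices made for this example.

```python
import random


def mcmc_step(m, rng):
    """Pick two rows and two columns at random; swap if they form a checkerboard unit."""
    r1, r2 = rng.sample(range(len(m)), 2)
    c1, c2 = rng.sample(range(len(m[0])), 2)
    a, b = m[r1][c1], m[r1][c2]
    c, d = m[r2][c1], m[r2][c2]
    if a == d and b == c and a != b:
        m = [row[:] for row in m]
        for r in (r1, r2):
            for col in (c1, c2):
                m[r][col] = 1 - m[r][col]
    return m  # unchanged when no swap was possible


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [[1, 0, 1],
         [0, 1, 0],
         [1, 1, 0]]
state = start
for _ in range(1000):
    state = mcmc_step(state, rng)
print(state)
```

Every visited state has the same row and column sums as the starting matrix, so the walk never leaves the set of co-occurrence matrices with those sums.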
Definition 4: Performing a row swap on a co-occurrence matrix means interchanging two
rows that have the same sum. Performing a column swap on a co-occurrence matrix means
interchanging two columns that have the same sum.
Observation 4: Row and column swaps conserve the row and column sums and the
number of checkerboard units of the matrix.
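Observation 4 can be checked numerically on a small example. The sketch below counts checkerboard units by brute force and compares the count before and after a row swap and a column swap. The helper names are hypothetical and the matrix is an arbitrary example.

```python
from itertools import combinations


def count_checkerboards(m):
    """Number of (row pair, column pair) choices that form a checkerboard unit."""
    count = 0
    for r1, r2 in combinations(range(len(m)), 2):
        for c1, c2 in combinations(range(len(m[0])), 2):
            a, b = m[r1][c1], m[r1][c2]
            c, d = m[r2][c1], m[r2][c2]
            if a == d and b == c and a != b:
                count += 1
    return count


def swap_rows(m, i, j):
    """Interchange rows i and j, which must have the same sum (Definition 4)."""
    if sum(m[i]) != sum(m[j]):
        raise ValueError("rows have different sums")
    out = [row[:] for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def swap_cols(m, i, j):
    """Interchange columns i and j, which must have the same sum (Definition 4)."""
    if sum(row[i] for row in m) != sum(row[j] for row in m):
        raise ValueError("columns have different sums")
    out = [row[:] for row in m]
    for row in out:
        row[i], row[j] = row[j], row[i]
    return out


m = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]
print(count_checkerboards(m),
      count_checkerboards(swap_rows(m, 0, 2)),
      count_checkerboards(swap_cols(m, 0, 1)))
```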
Definition 5: Let A = A(r, c) be the set of all n x m co-occurrence matrices with row sums
r = (r1, r2, …, rn) and column sums c = (c1, c2, …, cm).
Observation 5: Row or column swaps are bijective functions from A to A. 2x2 swaps are
bijective functions from a subset of A to a different subset of A.
Theorem 1: The set of possible row and column swaps on A generates a group G.
[1] G. W. Cobb, Discrete Markov Chain Monte Carlo, page 2
Observation 6: Some elements of the group are row or column swaps; others are
combinations of row and/or column swaps. Therefore, the notion of a row or column swap and
the notion of an element of the group G are distinct.
Observation 7: Any row swap and any column swap commute.
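A quick numerical check of Observation 7, written as a sketch with hypothetical helper names: a row swap only reorders rows and a column swap only reorders entries within each row, so the order in which they are applied does not matter.

```python
def swap_rows(m, i, j):
    """Return a copy of m with rows i and j interchanged."""
    out = [row[:] for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def swap_cols(m, i, j):
    """Return a copy of m with columns i and j interchanged."""
    out = [row[:] for row in m]
    for row in out:
        row[i], row[j] = row[j], row[i]
    return out


m = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]
rows_then_cols = swap_cols(swap_rows(m, 0, 2), 0, 1)
cols_then_rows = swap_rows(swap_cols(m, 0, 1), 0, 2)
print(rows_then_cols == cols_then_rows)
```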
Theorem 2: Any subset S of G generates a partition of the set A. The elements of the
partition are sets $A_i$ ($1 \le i \le n(S)$, where n(S) is the number of sets, which depends on S) of co-occurrence matrices with the following property:
$$A_i = \{a \in A \mid \forall T \in S,\ \exists b \in A_i \text{ so that } b = T(a), \text{ and } \forall b \in A_i,\ \exists T \in S \text{ so that } b = T(a)\}$$
Proof:
The sets Ai contain all the elements of A. Any transformation T in S acts on each co-occurrence matrix, the result being the same matrix or a different matrix.
Suppose that there is a matrix a in A that belongs to two distinct sets of the partition. For simplicity,
let us assume that these sets are A1 and A2. Let b be an arbitrary matrix in A1. According to the
definition of the partition, $\exists T \in S$ so that $b = T(a)$. At the same time,
$\exists c \in A_2$ so that $c = T(a)$. But as T is a bijective function, b = c. Consequently, $\forall b \in A_1$, $b \in A_2$.
Similarly we can prove that $\forall b \in A_2$, $b \in A_1$. Therefore, A1 = A2. This is a contradiction: the sets A1 and A2
are not distinct. Our supposition is false. It follows that the sets Ai are disjoint.
Consequently, the sets Ai form a partition of A.
Definition 6: A Markov Chain is lumpable with respect to a partition of the states A =
{A1, A2, …, Ar} if, for every pair of sets Ai and Aj, the probability of going from a to a
state in Aj is the same for every element a of Ai [2].
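Definition 6 translates directly into a check on a transition matrix. The sketch below is a generic test written for this purpose. The function name is_lumpable, the use of state indices for the partition, and the 3-state example matrix are all assumptions made for illustration. Exact fractions are used so that equal probabilities compare equal.

```python
from fractions import Fraction as F


def is_lumpable(P, partition):
    """P[a][b] is the probability of going from state a to state b.

    partition is a list of blocks, each a list of state indices.
    """
    for block_i in partition:
        for block_j in partition:
            # Probability of entering block_j, computed from each state of block_i.
            probs = {sum(P[a][b] for b in block_j) for a in block_i}
            if len(probs) > 1:
                return False
    return True


P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 3), F(1, 6), F(1, 2)]]
print(is_lumpable(P, [[0], [1, 2]]))   # states 1 and 2 both enter {0} with probability 1/3
print(is_lumpable(P, [[0, 1], [2]]))   # states 0 and 1 enter {2} with probabilities 1/4 and 1/3
```

In the first partition the two states of the block {1, 2} agree on every block-to-block probability, so the chain is lumpable. In the second they disagree, so it is not.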
The main purpose of this paper is to provide a proof for the following:
Theorem: The Markov Chain for co-occurrence matrices is lumpable with respect to any
partition generated by a subset of the group G of row and column swaps.
Proof:
Observation: Since row or column swaps conserve the number of checkerboard
units in a matrix, all matrices in a set of the partition A have the same number of
checkerboard units. In terms of the graph representing the Markov Chain, this means
that all matrices in a set of the partition have the same degree, i.e. the same number of
incoming edges.
[2] John G. Kemeny and J. Laurie Snell, Finite Markov Chains (D. Van Nostrand Company, Inc., Princeton, NJ,
1967), page 124
The probability of going from a matrix a in a set Ai to a set Aj in the partition A
= {A1, A2, …, An(S)} is k/n, where k is the number of edges from a to elements in Aj, and
n is the degree of a.
Proving that the Markov Chain is lumpable with respect to the partition A is
equivalent to proving that each matrix in a set Ai of the partition is connected by 2x2
swaps to exactly k(i,j) elements in the set Aj.
Let S be the subset of G that generates the partition A.
Let Ai and Aj be two sets in the partition A = {A1, A2, …, Ar} so that Ai contains
at least two elements. ( If Ai contains only one element, the proof is trivial.)
Let a be a matrix in Ai that is connected by 2x2 swaps to k > 0 matrices in Aj.
(If we cannot choose a matrix with this property, i.e. if every matrix in Ai is connected
by 2x2 swaps to 0 matrices in Aj, then k(i,j) = 0 and there is nothing to prove.)
Let b be a matrix in Ai distinct from a.
Let Si = (ri1, ri2, ci1, ci2), $1 \le i \le k$, be the 2x2 swaps that connect a to the k
matrices in Aj. These matrices can be written as Si(a), $1 \le i \le k$.
Let $T \in S$ be the transformation which has the property that T(a) = b and T(b) = a.
As T is an element of the group generated by row and column swaps, it is a
combination of α row swaps and β column swaps. Any row swap and any column swap
commute. Therefore, we can separate the row swaps from the column swaps in the expression
for T:
$$T = R_1 \circ R_2 \circ \cdots \circ R_\alpha \circ C_1 \circ C_2 \circ \cdots \circ C_\beta$$
$R_1 \circ R_2 \circ \cdots \circ R_\alpha$ is a permutation $\sigma_r$ of the rows of a.
$C_1 \circ C_2 \circ \cdots \circ C_\beta$ is a permutation $\sigma_c$ of the columns of a.
Consequently, we can write $T = \sigma_r \circ \sigma_c$.
Let $S_i' = (\sigma_r(r_{i1}), \sigma_r(r_{i2}), \sigma_c(c_{i1}), \sigma_c(c_{i2}))$, $1 \le i \le k$.
Using elementary matrix algebra, we can prove the following:
$$T(S_i(a)) = S_i'(T(a)) = S_i'(b), \quad 1 \le i \le k.$$
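The identity above can be checked numerically on a small case. The sketch below assumes a particular convention that the paper does not spell out: a permutation is stored as a list sigma in which sigma[r] is the new position of row (or column) r. The helper names apply_perm and flip, and the example matrix, are hypothetical.

```python
def apply_perm(m, sigma_r, sigma_c):
    """Move entry (r, c) of m to position (sigma_r[r], sigma_c[c])."""
    out = [[0] * len(m[0]) for _ in m]
    for r in range(len(m)):
        for c in range(len(m[0])):
            out[sigma_r[r]][sigma_c[c]] = m[r][c]
    return out


def flip(m, r1, r2, c1, c2):
    """Exchange the 1s and the 0s at the four intersections of rows r1, r2 and columns c1, c2."""
    out = [row[:] for row in m]
    for r in (r1, r2):
        for c in (c1, c2):
            out[r][c] = 1 - out[r][c]
    return out


a = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]
sigma_r = [2, 1, 0]   # interchanges rows 0 and 2, which have equal sums
sigma_c = [1, 0, 2]   # interchanges columns 0 and 1, which have equal sums
r1, r2, c1, c2 = 0, 1, 0, 1   # a checkerboard unit of a

lhs = apply_perm(flip(a, r1, r2, c1, c2), sigma_r, sigma_c)     # T(S(a))
b = apply_perm(a, sigma_r, sigma_c)                             # b = T(a)
rhs = flip(b, sigma_r[r1], sigma_r[r2], sigma_c[c1], sigma_c[c2])  # S'(b)
print(lhs == rhs)
```

Relabelling rows and columns moves the four flipped entries to exactly the positions named by the relabelled swap, which is why the two sides agree.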
Since T is bijective, the k matrices T(Si(a)) are distinct, and by the definition of the
partition T maps each matrix of Aj to a matrix of Aj.
This tells us that matrix b is connected by 2x2 swaps to at least k matrices in Aj.
If we start from b, assuming that b is connected to l matrices in Aj, we obtain
by the same reasoning that a is connected to at least l matrices in Aj.
From the last two statements we can infer that k = l, so a and b are connected
to the same number of matrices in Aj.
As a and b have been chosen arbitrarily, it follows that each matrix in Ai is
connected by 2x2 swaps to exactly k(i,j) (k(i,j) = k = l) matrices in Aj.
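The result can be checked by brute force on a small instance. The sketch below rests on several assumptions of its own: the row sums (1, 1, 2) and column sums (1, 1, 2) are an arbitrary example, the partition is taken to be the orbits under all swaps of equal-sum rows and equal-sum columns, and the chain follows Observation 2 with each (row pair, column pair) choice equally likely and the state left unchanged when the choice is not a checkerboard unit. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def enumerate_matrices(r, c):
    """All binary matrices (tuples of tuples) with row sums r and column sums c."""
    n, m = len(r), len(c)
    found = []
    for bits in product((0, 1), repeat=n * m):
        mat = tuple(bits[i * m:(i + 1) * m] for i in range(n))
        if (tuple(map(sum, mat)) == tuple(r)
                and tuple(map(sum, zip(*mat))) == tuple(c)):
            found.append(mat)
    return found


def moves(mat):
    """Resulting state for every (row pair, column pair) choice."""
    outcomes = []
    for r1, r2 in combinations(range(len(mat)), 2):
        for c1, c2 in combinations(range(len(mat[0])), 2):
            a, b = mat[r1][c1], mat[r1][c2]
            c, d = mat[r2][c1], mat[r2][c2]
            if a == d and b == c and a != b:
                rows = [list(row) for row in mat]
                for i in (r1, r2):
                    for j in (c1, c2):
                        rows[i][j] = 1 - rows[i][j]
                outcomes.append(tuple(map(tuple, rows)))
            else:
                outcomes.append(mat)  # not a checkerboard unit: stay put
    return outcomes


def orbits(states, r, c):
    """Partition states into orbits under swaps of equal-sum rows and columns."""
    row_pairs = [(i, j) for i, j in combinations(range(len(r)), 2) if r[i] == r[j]]
    col_pairs = [(i, j) for i, j in combinations(range(len(c)), 2) if c[i] == c[j]]
    seen, blocks = set(), []
    for s in states:
        if s in seen:
            continue
        block, stack = {s}, [s]
        while stack:
            cur = stack.pop()
            images = []
            for i, j in row_pairs:
                rows = list(cur)
                rows[i], rows[j] = rows[j], rows[i]
                images.append(tuple(rows))
            for i, j in col_pairs:
                cols = list(zip(*cur))
                cols[i], cols[j] = cols[j], cols[i]
                images.append(tuple(zip(*cols)))
            for img in images:
                if img not in block:
                    block.add(img)
                    stack.append(img)
        seen |= block
        blocks.append(block)
    return blocks


def is_lumpable(blocks):
    """Definition 6, applied to the chain described by moves()."""
    for bi in blocks:
        for bj in blocks:
            probs = set()
            for a in bi:
                out = moves(a)
                probs.add(Fraction(sum(t in bj for t in out), len(out)))
            if len(probs) > 1:
                return False
    return True


r, c = (1, 1, 2), (1, 1, 2)
states = enumerate_matrices(r, c)
blocks = orbits(states, r, c)
print(len(states), len(blocks), is_lumpable(blocks))
```

Exhaustive enumeration is only feasible for very small matrices. The check is meant as a sanity test of the statement, not as a substitute for the proof.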