LUMPING THE MARKOV CHAIN FOR CO-OCCURRENCE MATRICES USING SUBSETS OF THE ROW AND COLUMN SWAP GROUP

Ana M. Mocanu

June 17, 2001

Definition 1: A co-occurrence matrix is a binary matrix with fixed row and column sums.

Definition 2: A checkerboard unit is a 2x2 submatrix of a co-occurrence matrix that has one of the following forms: $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ or $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

Definition 3: Performing a 2x2 swap on a matrix means swapping the 1s and the 0s of a checkerboard unit in the matrix.

Notation: $(r_1, r_2, c_1, c_2)$ denotes the 2x2 swap performed on the elements of the matrix found at the intersections of rows $r_1$ and $r_2$ with columns $c_1$ and $c_2$.

Observation 1: 2x2 swaps conserve the row and column sums of the matrix.

Observation 2: The method of repeatedly picking a pair of rows and a pair of columns at random and performing a 2x2 swap if the resulting submatrix is a checkerboard unit is an instance of Markov Chain Monte Carlo [1].

Observation 3: The graph representing the Markov Chain mentioned above is connected. Its vertices are co-occurrence matrices and its edges are 2x2 swaps.

Definition 4: Performing a row swap on a co-occurrence matrix means interchanging two rows that have the same sum. Performing a column swap on a co-occurrence matrix means interchanging two columns that have the same sum.

Observation 4: Row and column swaps conserve the row and column sums and the number of checkerboard units of the matrix.

Definition 5: Let $A = A(r, c)$ be the set of all $n \times m$ co-occurrence matrices with row sums $r = (r_1, r_2, \dots, r_n)$ and column sums $c = (c_1, c_2, \dots, c_m)$.

Observation 5: Row and column swaps are bijective functions from A to A. A 2x2 swap is a bijective function from a subset of A (the matrices that contain the corresponding checkerboard unit) to a different subset of A.

Theorem 1: The row and column swaps on A generate a group G.

[1] G. W.
Cobb, Discrete Markov Chain Monte Carlo, page 2.

Observation 6: Some elements of the group G are single row or column swaps; others are compositions of several row and/or column swaps. Therefore, the notions of row or column swap and of element of the group G are distinct.

Observation 7: Any row swap commutes with any column swap.

Theorem 2: Any subset S of G generates a partition of the set A. The elements of the partition are sets $A_i$ ($1 \le i \le n(S)$, where $n(S)$, the number of sets, depends on $S$) of co-occurrence matrices with the following property:

$$A_i = \{\, a \in A \mid \forall T \in S\ \exists b \in A_i \text{ such that } b = T(a), \text{ and } \forall b \in A_i\ \exists T \in S \text{ such that } b = T(a) \,\}$$

Proof: The sets $A_i$ together contain all the elements of A. Any transformation T in S acts on each co-occurrence matrix, the result being either the same matrix or a different one. Suppose that there is a matrix a in A that belongs to two distinct sets of the partition; for simplicity, assume that these sets are $A_1$ and $A_2$. Let b be an arbitrary matrix in $A_1$. By the definition of the partition, $\exists T \in S$ such that $b = T(a)$. At the same time, $\exists c \in A_2$ such that $c = T(a)$. But since T is a bijective function, $b = c$. Consequently, $\forall b \in A_1,\ b \in A_2$. Similarly, we can prove that $\forall b \in A_2,\ b \in A_1$. Therefore $A_1 = A_2$, which contradicts the assumption that $A_1$ and $A_2$ are distinct. Our supposition is false, so the sets $A_i$ are disjoint. Consequently, the sets $A_i$ form a partition of A.

Definition 6: A Markov Chain is lumpable with respect to a partition of the states $A = \{A_1, A_2, \dots, A_r\}$ if, for every pair of sets $A_i$ and $A_j$, the probability of going from an element a of $A_i$ to the set $A_j$ is the same for every a in $A_i$ [2].

The main purpose of this paper is to prove the following:

Theorem: The Markov Chain for co-occurrence matrices is lumpable with respect to any partition generated by a subset of the group G of row and column swaps.

Proof: Observation: Since row and column swaps conserve the number of checkerboard units in a matrix, all matrices in a set of the partition A have the same number of checkerboard units.
In terms of the graph representing the Markov Chain, this means that all matrices in a set of the partition have the same degree, i.e. the same number of incident edges.

[2] See John G. Kemeny and J. Laurie Snell, Finite Markov Chains (D. Van Nostrand Company, Inc., Princeton, NJ, 1967), page 124.

The probability of going from a matrix a in a set $A_i$ to a set $A_j$ of the partition $A = \{A_1, A_2, \dots, A_{n(S)}\}$ is $k/d$, where k is the number of edges from a to elements of $A_j$ and d is the degree of a. Since d is the same for all matrices in $A_i$, proving that the Markov Chain is lumpable with respect to the partition A is equivalent to proving that each matrix in a set $A_i$ of the partition is connected by 2x2 swaps to exactly $k(i,j)$ elements of the set $A_j$, where $k(i,j)$ depends only on i and j.

Let S be the subset of G that generates the partition A. Let $A_i$ and $A_j$ be two sets in the partition $A = \{A_1, A_2, \dots, A_{n(S)}\}$ such that $A_i$ contains at least two elements. (If $A_i$ contains only one element, the statement is trivial.) Let a be a matrix in $A_i$ that is connected by 2x2 swaps to $k > 0$ matrices in $A_j$. (If no matrix with this property exists, i.e. if every matrix in $A_i$ is connected by 2x2 swaps to 0 matrices in $A_j$, then $k(i,j) = 0$.) Let b be a matrix in $A_i$ distinct from a. Let $S_i = (r_{i1}, r_{i2}, c_{i1}, c_{i2})$, $1 \le i \le k$, be the 2x2 swaps that connect a to the k matrices in $A_j$; these matrices can be written as $S_i(a)$, $1 \le i \le k$.

Let $T \in S$ be the transformation with the property that $T(a) = b$ and $T(b) = a$. As T is an element of the group generated by row and column swaps, it is a composition of $\alpha$ row swaps and $\beta$ column swaps. Since any row swap commutes with any column swap, we can group the row swaps and the column swaps in the expression of T:

$$T = R_1 \circ R_2 \circ \dots \circ R_\alpha \circ C_1 \circ C_2 \circ \dots \circ C_\beta$$

$R_1 \circ R_2 \circ \dots \circ R_\alpha$ is a permutation $\sigma_r$ of the rows of a, and $C_1 \circ C_2 \circ \dots \circ C_\beta$ is a permutation $\sigma_c$ of the columns of a. Consequently, we can write $T = \sigma_r \circ \sigma_c$.

Let $S_i' = (\sigma_r(r_{i1}), \sigma_r(r_{i2}), \sigma_c(c_{i1}), \sigma_c(c_{i2}))$, $1 \le i \le k$. Using elementary matrix algebra, we can prove the following:

$$T(S_i(a)) = S_i'(T(a)) = S_i'(b), \quad 1 \le i \le k.$$
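The identity can be checked on a small example. The Python sketch below is a minimal illustration under assumptions of our own: matrices are tuples of 0/1 tuples, rows and columns are indexed from 0, and $T = \sigma_r \circ \sigma_c$ is encoded by two hypothetical lists `sigma_r` and `sigma_c` such that entry (i, j) of a is moved to $(\sigma_r(i), \sigma_c(j))$. For every checkerboard unit $S_i$ of a, it compares $T(S_i(a))$ with $S_i'(T(a))$.

```python
# Illustrative check of T(S_i(a)) = S_i'(T(a)); all names are hypothetical.


def is_checkerboard(a, r1, r2, c1, c2):
    """True if rows r1, r2 and columns c1, c2 of a form [[0,1],[1,0]] or [[1,0],[0,1]]."""
    return (a[r1][c1] == a[r2][c2]
            and a[r1][c2] == a[r2][c1]
            and a[r1][c1] != a[r1][c2])


def swap_2x2(a, r1, r2, c1, c2):
    """Apply the 2x2 swap (r1, r2, c1, c2): exchange the 1s and 0s of the unit."""
    if not is_checkerboard(a, r1, r2, c1, c2):
        raise ValueError("not a checkerboard unit")
    b = [list(row) for row in a]
    for r in (r1, r2):
        for c in (c1, c2):
            b[r][c] = 1 - b[r][c]
    return tuple(tuple(row) for row in b)


def checkerboard_units(a):
    """All 2x2 swaps (r1, r2, c1, c2) with r1 < r2 and c1 < c2 that apply to a."""
    n, m = len(a), len(a[0])
    return [(r1, r2, c1, c2)
            for r1 in range(n) for r2 in range(r1 + 1, n)
            for c1 in range(m) for c2 in range(c1 + 1, m)
            if is_checkerboard(a, r1, r2, c1, c2)]


def permute(a, sigma_r, sigma_c):
    """T = sigma_r o sigma_c: entry (i, j) of a moves to (sigma_r[i], sigma_c[j])."""
    n, m = len(a), len(a[0])
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            b[sigma_r[i]][sigma_c[j]] = a[i][j]
    return tuple(tuple(row) for row in b)


def check_conjugation(a, sigma_r, sigma_c):
    """Check T(S_i(a)) == S_i'(T(a)) for every 2x2 swap S_i that applies to a."""
    t_a = permute(a, sigma_r, sigma_c)
    for r1, r2, c1, c2 in checkerboard_units(a):
        lhs = permute(swap_2x2(a, r1, r2, c1, c2), sigma_r, sigma_c)
        rhs = swap_2x2(t_a, sigma_r[r1], sigma_r[r2], sigma_c[c1], sigma_c[c2])
        if lhs != rhs:
            return False
    return True


a = ((1, 0, 1, 0),
     (0, 1, 0, 1),
     (1, 1, 0, 0))          # row sums (2, 2, 2), column sums (2, 2, 1, 1)
sigma_r = [1, 0, 2]         # row swap (0 1): rows 0 and 1 both sum to 2
sigma_c = [0, 1, 3, 2]      # column swap (2 3): columns 2 and 3 both sum to 1
print(len(checkerboard_units(a)), check_conjugation(a, sigma_r, sigma_c))  # 6 True
```

Both permutations only exchange rows or columns with equal sums, so T is an element of G; the example matrix has 6 checkerboard units, and each one satisfies the identity.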
The identity $T(S_i(a)) = S_i'(b)$ tells us that matrix b is connected by 2x2 swaps to at least k matrices in $A_j$: since $S_i(a) \in A_j$ and $T \in S$, each $T(S_i(a))$ also lies in $A_j$, and since T is a bijection, these k matrices are distinct. If we start from b, assuming that b is connected to l matrices in $A_j$, the same reasoning shows that a is connected to at least l matrices in $A_j$. From the last two statements we infer that $k = l$, so a and b are connected to the same number of matrices in $A_j$. As a and b were chosen arbitrarily, it follows that each matrix in $A_i$ is connected by 2x2 swaps to exactly $k(i,j)$ matrices in $A_j$, with $k(i,j) = k = l$.
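The theorem can also be tested by brute force on tiny cases. The Python sketch below rests on assumptions not made in the text: it enumerates $A(r, c)$ directly (feasible only for very small r and c), takes the sets $A_i$ to be the orbits of the subgroup generated by S (computed with a union-find over the pairs a, T(a) for T in S), encodes each element of S as a pair (`sigma_r`, `sigma_c`) as in the proof, and tests the condition used in the proof: within each $A_i$, every matrix has the same number of 2x2-swap neighbours in each $A_j$. Because all matrices in a set have the same degree, this matches Definition 6 for the orbit partitions considered here. All function names are hypothetical.

```python
# Brute-force check of the lumpability theorem on tiny examples; names are hypothetical.
from collections import Counter
from itertools import combinations, product


def enumerate_matrices(r, c):
    """All binary matrices with row sums r and column sums c (small cases only)."""
    m = len(c)
    row_choices = [
        [tuple(1 if j in ones else 0 for j in range(m)) for ones in combinations(range(m), s)]
        for s in r
    ]
    return [rows for rows in product(*row_choices)
            if all(sum(row[j] for row in rows) == c[j] for j in range(m))]


def permute(a, sigma_r, sigma_c):
    """Apply T = sigma_r o sigma_c: entry (i, j) moves to (sigma_r[i], sigma_c[j])."""
    n, m = len(a), len(a[0])
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            b[sigma_r[i]][sigma_c[j]] = a[i][j]
    return tuple(tuple(row) for row in b)


def swap_neighbours(a):
    """Matrices reachable from a by one 2x2 swap (one per checkerboard unit)."""
    n, m = len(a), len(a[0])
    result = []
    for r1, r2 in combinations(range(n), 2):
        for c1, c2 in combinations(range(m), 2):
            if (a[r1][c1] == a[r2][c2] and a[r1][c2] == a[r2][c1]
                    and a[r1][c1] != a[r1][c2]):
                b = [list(row) for row in a]
                for r in (r1, r2):
                    for c in (c1, c2):
                        b[r][c] = 1 - b[r][c]
                result.append(tuple(tuple(row) for row in b))
    return result


def orbit_partition(states, generators):
    """Label each state by its orbit under the group generated by `generators`."""
    index = {a: i for i, a in enumerate(states)}
    parent = list(range(len(states)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in states:
        for sigma_r, sigma_c in generators:
            b = permute(a, sigma_r, sigma_c)
            if b not in index:
                raise ValueError("generator does not map A(r, c) into itself")
            parent[find(index[a])] = find(index[b])
    roots = sorted({find(i) for i in range(len(states))})
    label = {root: k for k, root in enumerate(roots)}
    return {a: label[find(index[a])] for a in states}


def is_lumpable(states, block_of):
    """True if each matrix in a block has the same number of 2x2-swap neighbours
    in every block. Assumes, as in the proof, equal degrees within a block."""
    profile = {}
    for a in states:
        counts = Counter(block_of[b] for b in swap_neighbours(a))
        if profile.setdefault(block_of[a], counts) != counts:
            return False
    return True


states = enumerate_matrices((1, 1, 1), (1, 1, 1))  # the 3x3 permutation matrices
blocks = orbit_partition(states, [([1, 0, 2], [0, 1, 2])])  # S = {row swap (0 1)}
print(len(states), len(set(blocks.values())), is_lumpable(states, blocks))  # 6 3 True

identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
arbitrary = {a: int(a != identity) for a in states}  # hand-made partition {I}, {rest}
print(is_lumpable(states, arbitrary))  # False
```

For the 3x3 permutation matrices, S = {row swap (0 1)} splits A into 3 sets of 2 matrices and the partition passes the test. The hand-made partition {identity}, {all other matrices} fails it: each transposition matrix has one neighbour in the identity's set, while each 3-cycle matrix has none.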