Math 1b Practical — February 25, 2011 — revised February 28, 2011

Probability matrices

A probability vector is a nonnegative vector whose coordinates sum to 1. A square matrix P is called a probability matrix (or a left-stochastic matrix or a column-stochastic matrix) when all of its columns are probability vectors. [Caution: some references use "probability matrix" to mean a row-stochastic matrix.]

Probability matrices arise as "transition matrices" in Markov chains. Let A = (a_ij) be a probability matrix and u = (u_1, . . . , u_n) a probability vector. If we think of u_j as the proportion of some commodity or population in 'state' j at a given moment (or as the probability that a member of the population is in state j), and of a_ij as the proportion of (or probability that) the commodity or population in state j that will change to state i after a unit of time, we find the proportion of the commodity or population in state i after one unit of time to be

    a_i1 u_1 + a_i2 u_2 + . . . + a_in u_n.

That is, the vector that describes the new proportions of the commodity or population in the various states after one unit of time is Au. After k units of time, that vector is A^k u.

If a_ij ≠ 0, the edge directed from i to j in the digraph of a probability matrix A may be labeled with the number a_ij; this may help to 'visualize' the matrix and its meaning.

Here is an example (taken from Wikipedia) similar to Story 1 about smoking. Assume that weather observations at some location show that a sunny day is 90% likely to be followed by another sunny day, and a rainy day is 50% likely to be followed by another rainy day. We are asked to predict the proportion of sunny days in a year.

Let x_n = (a_n, b_n), where a_n is the probability that it is sunny on day n, and b_n = 1 − a_n. Then

    [ a_{n+1} ]       [ a_n ]                   [ .9  .5 ]
    [ b_{n+1} ]  = A  [ b_n ]    where    A  =  [ .1  .5 ] .

[Digraph of A omitted: two vertices, 'sunny' and 'rainy', with loops labeled .9 and .5 and edges between them labeled .1 and .5.]

Theorem 1. Any probability matrix A has 1 as an eigenvalue.
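The recurrence x_{n+1} = A x_n for the weather example is easy to iterate by machine. Here is a minimal sketch in Python (an assumed tool, not part of the handout; plain lists, no libraries) showing that the proportions settle down to (5/6, 1/6) regardless of the starting vector:

```python
# Iterate x_{n+1} = A x_n for the weather example.
# Column-stochastic convention: A[i][j] = probability of moving to state i from state j.
A = [[0.9, 0.5],   # row "sunny"
     [0.1, 0.5]]   # row "rainy"

def apply_matrix(A, x):
    """Multiply the matrix A by the column vector x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

x = [0.0, 1.0]          # start with a certainly rainy day
for _ in range(30):     # 30 steps is ample for convergence here
    x = apply_matrix(A, x)

print(x)  # close to [5/6, 1/6], i.e. about [0.8333, 0.1667]
```

The second eigenvalue of A is 0.4, so the error shrinks by that factor each step; after 30 steps it is negligible.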
Proof: Let 𝟙 = (1, 1, . . . , 1) be the row vector of all ones. Since each column of A sums to 1, we have 𝟙A = 𝟙. (We could call 𝟙 a left eigenvector and 1 a left eigenvalue.) This means 𝟙(A − I) = 0, so the rows of A − I are linearly dependent, so the columns of A − I are linearly dependent, so (A − I)u = 0 for some nonzero u, i.e. Au = u. (In general, the left eigenvalues of a matrix are the same as the right eigenvalues, and each eigenvalue has the same geometric and algebraic multiplicities on the left and on the right.)

A probability vector s which is an eigenvector corresponding to eigenvalue 1 is called, in this subject, a stable vector or a production vector or an equilibrium. While we know a probability matrix has eigenvalue 1, we don't automatically know that it has a stable vector. If we find a nonnegative vector s with As = s, then we can multiply s by a scalar to ensure that the sum of its coordinates is 1, and then it is a stable vector. But there may be vectors s with As = s that have both positive and negative coordinates, e.g. if A = I.

A probability matrix A is said to be regular when A^m > O (every entry in A^m is strictly positive) for some positive integer m. (The poor word 'regular' is overused in mathematics. It means a lot of things. But we'll use it in this sense today.)

Theorem 2. Let A be a regular probability matrix. Then A has a unique stable vector e. For this stable vector, e > 0 (in every coordinate). We have A^k → U as k → ∞, where U is the matrix all of whose columns are equal to e, and hence A^k u → e for any probability vector u.

For our example, the stable vector is (5/6, 1/6). So for any probability vector u,

    A^k u = [ .9  .5 ]^k  u   →   [ 0.833 ]
            [ .1  .5 ]           [ 0.167 ] .

This means that once k is not too small, the probability of sun on day k will be about 0.833. So over any period of time beyond the first few days, we can expect about 83% sunny days.

Partial proof: Let x = (x_1, x_2, . . . , x_n) be an eigenvector (a column vector) corresponding to eigenvalue 1. Then x = Ax.
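The conclusion of Theorem 2 can be watched directly for the weather example: the powers A^k converge to the matrix U whose columns both equal the stable vector (5/6, 1/6). A small sketch using exact rational arithmetic (Python's stdlib fractions module; the tooling is an assumption of this note, not the handout's):

```python
from fractions import Fraction as F

# Powers of the weather matrix approach U, whose columns both equal e = (5/6, 1/6).
A = [[F(9, 10), F(1, 2)],
     [F(1, 10), F(1, 2)]]

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = A
for _ in range(40):
    P = matmul(P, A)   # P is now A^41

# Both columns of P agree with (5/6, 1/6) to very high precision.
print(float(P[0][0]), float(P[0][1]))
```

Since the other eigenvalue of A is 2/5, the entries of A^k differ from those of U by a multiple of (2/5)^k.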
With A = (a_ij), we have x_i = sum_{j=1}^n a_ij x_j, so

    |x_i| ≤ sum_{j=1}^n a_ij |x_j|.

That is, x̂ ≤ Ax̂, where x̂ = (|x_1|, |x_2|, . . . , |x_n|), and where the inequality holds in each coordinate. If strict inequality held in any coordinate, then

    𝟙x̂ < 𝟙(Ax̂) = (𝟙A)x̂ = 𝟙x̂,

a contradiction. So Ax̂ = x̂. We have established that A has an eigenvector with eigenvalue 1, namely x̂, with all coordinates nonnegative. We can multiply this vector by a suitable scalar so that it becomes a probability vector e. In summary so far: any probability matrix, regular or not, has a stable vector e.

The digraph of a regular probability matrix is obviously strongly connected, so we can apply the Perron-Frobenius Theorem (Theorem 3 of the handout on nonnegative matrices) to establish the uniqueness of the stable vector and the fact that all of its entries are positive. But since we didn't prove that theorem, we will give independent proofs of these two facts in the next two paragraphs.

If A is a regular probability matrix, let m be such that A^m > O (in all positions). If x ≥ 0 but x ≠ 0, then A^m x > 0. In particular, if u ≥ 0 is a nonnegative eigenvector of A corresponding to eigenvalue 1, then u = A^m u > 0.

Suppose a is a nonnegative nonzero vector and b any real vector. We can add or subtract some scalar (possibly zero) multiple of b from a to get a nonnegative vector with at least one coordinate 0. (For example, for a = (1, 2, 3, 4) and b = (1, 2, 5, 5), we would take a − 0.6b = (.4, .8, 0, 1); for a = (1, 2, 3, 4) and b = (−1, 1, 1, 1), we could take a + b or a − 2b.) So if we had two stable vectors e and e′, we could find a linear combination e − αe′ that is nonnegative with at least one zero coordinate. But e − αe′, if nonzero, would also be a nonnegative eigenvector with eigenvalue 1, and this contradicts the last sentence of the previous paragraph. So e = αe′, and since the entries of both sum to 1, α must be 1.

Let e be the stable vector of A.
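When b has a positive coordinate, the scalar in the combination trick above can be computed explicitly: take α to be the smallest ratio a_i / b_i over the coordinates where b_i > 0. Here is a short sketch (an illustration of this note, not from the handout) that reproduces the handout's first numerical example:

```python
from fractions import Fraction as F

def zero_out(a, b):
    """Given a >= 0 and b with at least one positive entry, return a - alpha*b,
    which is nonnegative and has at least one zero coordinate, where
    alpha = min over {i : b_i > 0} of a_i / b_i."""
    alpha = min(ai / bi for ai, bi in zip(a, b) if bi > 0)
    return [ai - alpha * bi for ai, bi in zip(a, b)]

# The handout's example: a = (1, 2, 3, 4), b = (1, 2, 5, 5) gives alpha = 3/5 = 0.6.
c = zero_out([F(1), F(2), F(3), F(4)], [F(1), F(2), F(5), F(5)])
print(c)  # values 2/5, 4/5, 0, 1, matching the handout's (.4, .8, 0, 1)
```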
The matrix U all of whose columns are equal to e is U = e𝟙 (a column vector times a row vector). Check that AU = UA = U² = U, and it then follows by induction that (A − U)^k = A^k − U for any positive integer k (see Problem Set 8). The next step is to show that A − U has spectral radius < 1, but we omit the details here (this may also be on the problem set). By Theorem 2 from the Leontief model handout, A^k − U = (A − U)^k → O as k → ∞.

The uniqueness of a stable vector follows under weaker conditions than 'regularity'. But we really do need regularity to prove that all eigenvalues μ ≠ 1 satisfy |μ| < 1 (and we didn't do this). Here is a probability matrix that is not regular:

    A = [ 0  1  0 ]
        [ 0  0  1 ]
        [ 1  0  0 ] .

Note that A³ = I, so the eigenvalues λ satisfy λ³ = 1; in fact the eigenvalues are the three (complex) cube roots of unity. It is not true that A^k has a limit as k → ∞, since the sequence of powers of A is periodic: I, A, A², I, A, A², I, A, . . .. It is not true that A^k u approaches a stable vector as k increases. Still, it is true that A has a unique stable vector (all coordinates 1/3).

Theorem 3. If A is a probability matrix whose digraph is strongly connected, then A has a unique stable vector. All coordinates of that stable vector are positive.

Proof: If the digraph has n vertices and is strongly connected, then

    B = I + A + A² + A³ + . . . + A^{n−1} > O.

By the second paragraph of the proof of Theorem 2, A has a stable vector e. If u ≥ 0 is an eigenvector of A with eigenvalue 1, then 0 < Bu = nu. So a nonnegative eigenvector u of eigenvalue 1 (in particular, a stable vector) of A has all coordinates strictly positive. Now proceed as in the fifth paragraph of the proof of Theorem 2 to establish uniqueness.

∗ ∗ ∗

We conclude with an example that shows that a probability matrix may have a unique stable vector even if its digraph is not strongly connected.
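Both claims about the cyclic 3×3 example, the periodicity of its powers and the positivity of I + A + A² used in Theorem 3, are easy to verify mechanically. A small Python check (again an assumed tool, not part of the handout):

```python
from fractions import Fraction as F

# The cyclic permutation matrix: not regular (its powers cycle I, A, A^2, I, ...),
# yet I + A + A^2 > O, so Theorem 3 still applies.
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
A2 = matmul(A, A)
A3 = matmul(A2, A)
assert A3 == I                                  # powers are periodic with period 3

B = [[I[i][j] + A[i][j] + A2[i][j] for j in range(3)] for i in range(3)]
assert all(b > 0 for row in B for b in row)     # B = I + A + A^2 > O

# (1/3, 1/3, 1/3) is the stable vector: A e = e.
e = [F(1, 3)] * 3
assert [sum(A[i][j] * e[j] for j in range(3)) for i in range(3)] == e
print("all checks passed")
```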
It is possible to give a smaller example, but we copy (with notation changes) one with a story from http://en.wikipedia.org/wiki/Stochastic_matrix . (You are not responsible for the terminology or any other material below.)

There are five boxes in a row. At time 0 there is a cat in box 1 and a mouse in box 5. After one unit of time, the cat and the mouse each jump to a randomly chosen adjacent box. (So if the mouse is in box 5, it must jump to box 4, but if it is in box 4, there are two possibilities.) This is continued indefinitely. We distinguish five 'states':

    State 1: cat in box 1, mouse in box 3
    State 2: cat in box 1, mouse in box 5
    State 3: cat in box 2, mouse in box 4
    State 4: cat in box 3, mouse in box 5
    State 5: the cat and the mouse are in the same box.

If State 5 is reached, the cat eats the mouse and the game officially ends. But we may imagine the game continuing with the cat and mouse-skeleton still jumping according to the rules. (This game is an example of a Markov chain. There is no 'memory' in the game; what happens next does not depend on what has happened before, but only on the current state.)

The transition matrix of this Markov chain is

    A = [  0    0   1/4   0   0 ]
        [  0    0   1/4   0   0 ]
        [ 1/2   1    0   1/2  0 ]
        [  0    0   1/4   0   0 ]
        [ 1/2   0   1/4  1/2  1 ] .

The eigenvalues are 1, 1/√2, −1/√2, 0, 0. There is only one nonnegative eigenvector, namely (0, 0, 0, 0, 1), and it is true that A^k approaches the matrix all of whose columns are (0, 0, 0, 0, 1) as k → ∞ (details omitted, though you can give evidence with Mathematica or Matlab). So no matter what state we start in, the poor mouse eventually gets eaten.

The Wikipedia page shows (without proof) how to calculate the expected lifespan of the mouse. If we start in State 2, it is

    (1, 1, 1, 1)(I − B)^{-1} (0, 1, 0, 0)ᵀ = 4.5,    where    B = [  0    0   1/4   0  ]
                                                                  [  0    0   1/4   0  ]
                                                                  [ 1/2   1    0   1/2 ]
                                                                  [  0    0   1/4   0  ] ,

i.e. B is A with row and column 5 dropped.
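Evidence that A^k approaches the matrix with all columns (0, 0, 0, 0, 1) can indeed be produced by machine; here is a Python alternative (assumed tooling) to the Mathematica or Matlab check suggested above:

```python
# Transition matrix of the cat-and-mouse chain, written out row by row.
A = [[0,   0,   0.25, 0,   0],
     [0,   0,   0.25, 0,   0],
     [0.5, 1,   0,    0.5, 0],
     [0,   0,   0.25, 0,   0],
     [0.5, 0,   0.25, 0.5, 1]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = A
for _ in range(100):
    P = matmul(P, A)   # P is now A^101

# Every column of P is now essentially (0, 0, 0, 0, 1): from any starting
# state, the chain is absorbed in State 5 with probability 1.
err = max(abs(P[i][j] - (1 if i == 4 else 0)) for i in range(5) for j in range(5))
print(err)  # tiny, of order (1/sqrt(2))^100
```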
To see what happens if we start in State 3, replace (0, 1, 0, 0)ᵀ by (0, 0, 1, 0)ᵀ; of course, we get 3.5 as the expected life of the mouse. Starting in State 1 or State 4 gives an expected life of 2.75. How do we calculate the expected life when starting in State 5? We don't need matrices at all in this case; the answer is 0.

We remark that (I − B)^{-1} is nonnegative; cf. Theorem 1 of the handout on the Leontief model.
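The expected-lifespan values 4.5, 3.5 and 2.75 can be checked without inverting a matrix: since (I − B)^{-1} = I + B + B² + · · ·, the row vector t = (1, 1, 1, 1)(I − B)^{-1} is the limit of the iteration t ← (1, 1, 1, 1) + tB. A short Python sketch (assumed tooling, not part of the handout):

```python
# B is the cat-and-mouse transition matrix A with row and column 5 dropped.
B = [[0,   0, 0.25, 0],
     [0,   0, 0.25, 0],
     [0.5, 1, 0,    0.5],
     [0,   0, 0.25, 0]]

# Iterate t <- (1,1,1,1) + t B; the partial sums (1,1,1,1)(I + B + ... + B^k)
# converge because the spectral radius of B is 1/sqrt(2) < 1.
t = [1.0, 1.0, 1.0, 1.0]
for _ in range(200):
    t = [1 + sum(t[i] * B[i][j] for i in range(4)) for j in range(4)]

print(t)  # approximately [2.75, 4.5, 3.5, 2.75]
```

Coordinate j of t is the expected lifespan starting from State j, so the printout recovers 2.75, 4.5, 3.5, 2.75 for States 1 through 4.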