11 - Markov Chains
Jim Vallandingham

Outline
• Irreducible Markov Chains
  – Outline of Proof of Convergence to Stationary Distribution
  – Convergence Example
  – Reversible Markov Chains
• Monte Carlo Methods
  – Hastings-Metropolis Algorithm
  – Gibbs Sampling
  – Simulated Annealing
• Absorbing Markov Chains

Stationary Distribution
• As $n \to \infty$, $P^n$ approaches a matrix in which each row is the stationary distribution $\pi'$

Stationary Dist. Example
• Long-term averages:
  – 24% of the time spent in state E1
  – 39% of the time spent in state E2
  – 21% of the time spent in state E3
  – 17% of the time spent in state E4

Stationary Distribution
• Any finite, aperiodic, irreducible Markov chain will converge to a stationary distribution
  – Regardless of the starting distribution
• The outline of the proof requires linear algebra
  – Appendix B.19

L.A.: Eigenvalues
• Let P be an s x s matrix
• P has s eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$
  – Found as the s solutions to $\det(P - \lambda I) = 0$
  – Assume all eigenvalues of P are distinct

L.A.: Left & Right Eigenvectors
• Corresponding to each eigenvalue $\lambda_i$
  – is a right (column) eigenvector $r_i$
  – and a left (row) eigenvector $l_i'$
  – for which $P r_i = \lambda_i r_i$ and $l_i' P = \lambda_i l_i'$
  – Assume they are normalized so that $l_i' r_i = 1$

L.A.: Spectral Expansion
• P can be expressed in terms of its eigenvalues and eigenvectors:
  $P = \sum_{i=1}^{s} \lambda_i r_i l_i'$
• This is called a spectral expansion of P
• If $\lambda_i$ is an eigenvalue of P with corresponding left and right eigenvectors $l_i'$ and $r_i$, then $\lambda_i^n$ is an eigenvalue of $P^n$ with the same left and right eigenvectors
• This implies that the spectral expansion of $P^n$ can be written as:
  $P^n = \sum_{i=1}^{s} \lambda_i^n r_i l_i'$

Outline of Proof
• Going back to the proof...
  – P is the transition matrix of a finite, aperiodic, irreducible Markov chain
• P has one eigenvalue, $\lambda_1 = 1$
  – All other eigenvalues have absolute value < 1
• Choose the left and right eigenvectors of $\lambda_1 = 1$ to satisfy two requirements:
  – $l_1'$ is a probability vector (its entries sum to 1)
  – $l_1' r_1 = 1$ (normalization)
  – Taking $r_1 = (1, 1, \ldots, 1)'$ works, since each row of P sums to 1, and then $l_1' r_1 = 1$ is exactly the requirement that the entries of $l_1'$ sum to 1
• $l_1'$ also satisfies $l_1' P = l_1'$ (the definition of a left eigenvector with eigenvalue 1)
  – This is the same equation satisfied by the stationary distribution: $\pi' P = \pi'$
  – It can be shown that this equation has a unique solution satisfying $\sum_j \pi_j = 1$, so that $l_1' = \pi'$
• $P^n$ gives the n-step transition probabilities, and its spectral expansion is
  $P^n = r_1 l_1' + \sum_{i=2}^{s} \lambda_i^n r_i l_i'$
  – Only one eigenvalue equals 1; the rest are less than 1 in absolute value, so $\lambda_i^n \to 0$ for $i \geq 2$
• So as n increases, $P^n$ approaches $r_1 l_1' = \mathbf{1}\pi'$, the matrix each of whose rows is $\pi'$

Convergence Example
• (The example transition matrix was shown as a figure and is not recoverable here)
• It has one eigenvalue equal to 1; the remaining eigenvalues are less than 1 in absolute value
• Its left and right eigenvectors for $\lambda_1 = 1$ satisfy the requirements above, and the left eigenvector is the stationary distribution
• In the spectral expansion of $P^n$, the terms for the eigenvalues with $|\lambda| < 1$ decay to 0 as n grows, leaving each row of $P^n$ equal to the stationary distribution
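To see the convergence numerically, here is a minimal numpy sketch. The slides' example matrix is not recoverable from the text, so the 3-state chain below is made up; any finite, aperiodic, irreducible transition matrix shows the same behavior.

```python
import numpy as np

# Hypothetical 3-state transition matrix (the slides' example matrix
# is not recoverable, so this one is invented for illustration).
P = np.array([[0.50, 0.25, 0.25],
              [0.20, 0.60, 0.20],
              [0.25, 0.25, 0.50]])

# Eigenvalues: exactly one equals 1, the rest have |lambda| < 1.
print("eigenvalues:", np.round(np.linalg.eigvals(P), 4))

# The left eigenvector of P for lambda = 1 (a right eigenvector of
# P transposed), scaled to sum to 1, is the stationary distribution pi.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
print("stationary distribution:", np.round(pi, 4))

# As n grows, every row of P^n approaches pi, regardless of the start.
print("P^50:\n", np.round(np.linalg.matrix_power(P, 50), 4))
```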
Reversible Markov Chains
• Typically we move forward in 'time' in a Markov chain:
  – 1, 2, 3, ..., t
• What about moving backward in this chain?
  – t, t-1, t-2, ..., 1
• (Figure: an ancestor species giving rise to Species A and Species B)

Reversible Markov Chains
• Have a finite, irreducible, aperiodic Markov chain
  – with stationary distribution $\pi$
  – During t transitions, the chain moves through a sequence of states $E_{i_1}, E_{i_2}, \ldots, E_{i_t}$
• Reverse chain
  – Defined by reading this sequence backward
  – The reverse chain moves through the states $E_{i_t}, E_{i_{t-1}}, \ldots, E_{i_1}$
• We want to show that the structure determining the reverse-chain sequence is also a Markov chain
• A typical element of its transition matrix $P^*$ is found from a typical element of P, using:
  $p^*_{jk} = \dfrac{\pi_k\, p_{kj}}{\pi_j}$
• This is shown by using Bayes' rule to invert the conditional probability
• Intuitively:
  – The future is independent of the past, given the present
  – Equivalently: the past is independent of the future, given the present
• The stationary distribution of the reverse chain is still $\pi$
  – Follows from the stationary distribution property:
    $\sum_j \pi_j p^*_{jk} = \sum_j \pi_k p_{kj} = \pi_k$
• A Markov chain is said to be reversible if $P^* = P$
  – This holds only if $\pi_j p_{jk} = \pi_k p_{kj}$ for all j and k

Monte Carlo Methods

Markov Chain Monte Carlo
• A class of algorithms for sampling from probability distributions
  – They involve constructing a Markov chain
  – We want the chain's stationary distribution to be the target distribution $\pi$
  – The state of the chain after a large number of steps is used as a sample from the desired distribution
• We discuss two algorithms:
  – Gibbs Sampling
  – Simulated Annealing

Basic Problem
• Find a transition matrix P such that its stationary distribution is the target distribution
• We know a Markov chain will converge to its stationary distribution, regardless of the initial distribution
  – How can we find such a P whose stationary distribution is the target distribution?

Basic Idea
• Construct a transition matrix Q
  – the "candidate-generating matrix"
  – then modify it to have the correct stationary distribution
• The modification involves inserting factors $a_{ij}$
  – so that $p_{ij} = q_{ij} a_{ij}$ for $i \neq j$
• There are various ways of picking the a's

Hastings-Metropolis
• Goal: construct an aperiodic, irreducible Markov chain
  – having a prescribed stationary distribution $\pi$
• Produces a correlated sequence of draws from a target density that may be difficult to sample by classical independent-draw methods

Hastings-Metropolis Process
• Choose a set of constants $a_{ij}$
  – such that $0 < a_{ij} \leq 1$
  – and $a_{ij} = \min\!\left(1, \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}}\right)$
• Define:
  – $p_{ij} = q_{ij} a_{ij}$ for $i \neq j$ (accept the state change)
  – $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$ (reject the state change: the chain doesn't change value)

Hastings-Metropolis Example
• Target distribution $\pi = (0.4, 0.6)$ and candidate-generating matrix
  $Q = \begin{pmatrix} 0.5 & 0.5 \\ 0.9 & 0.1 \end{pmatrix}$
• The construction above gives
  $P = \begin{pmatrix} 0.5 & 0.5 \\ 0.33 & 0.67 \end{pmatrix}$
• Powers of P converge to the target distribution:
  $P^2 = \begin{pmatrix} 0.415 & 0.585 \\ 0.386 & 0.614 \end{pmatrix}, \qquad
   P^{50} = \begin{pmatrix} 0.398 & 0.602 \\ 0.398 & 0.602 \end{pmatrix}$

Algorithmic Description
1. Start with state $E_1$, then iterate:
2. Propose $E'$ from $q(E_t, E')$
3. Calculate the ratio $a = \dfrac{\pi(E')\, q(E', E_t)}{\pi(E_t)\, q(E_t, E')}$
4. If a > 1:
   – Accept: $E_{t+1} = E'$
5. Else:
   – Accept with probability a
   – If rejected, $E_{t+1} = E_t$
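A minimal numpy sketch of this construction, applied to the worked example above; the P it builds, and its powers, should match the values on the slides.

```python
import numpy as np

# Target distribution and candidate-generating matrix from the example.
pi = np.array([0.4, 0.6])
Q = np.array([[0.5, 0.5],
              [0.9, 0.1]])

# Hastings-Metropolis construction:
#   p_ij = q_ij * min(1, pi_j q_ji / (pi_i q_ij))  for i != j,
# with the rejected probability mass kept on the diagonal.
s = len(pi)
P = np.zeros((s, s))
for i in range(s):
    for j in range(s):
        if i != j:
            P[i, j] = Q[i, j] * min(1.0, pi[j] * Q[j, i] / (pi[i] * Q[i, j]))
    P[i, i] = 1.0 - P[i].sum()

print("P:\n", np.round(P, 2))    # [[0.5 0.5] [0.33 0.67]], as on the slides
print("P^50:\n", np.round(np.linalg.matrix_power(P, 50), 3))  # rows -> (0.4, 0.6)

# Detailed balance pi_i p_ij = pi_j p_ji holds by construction.
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```

The detailed-balance check ties back to the reversibility condition above: Hastings-Metropolis works precisely by building a chain that is reversible with respect to the target distribution.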
Gibbs Sampling

Gibbs Sampling Definitions
• Let $Y = (Y_1, Y_2, \ldots, Y_d)$ be the random vector of interest
• Let $\pi(y)$ be the distribution of Y
• Assume each component of Y takes only finitely many values
• We define a Markov chain whose states are the possible values of Y

Gibbs Sampling Process
• Enumerate the possible vectors in some order: 1, 2, ..., s
  – identifying vector j with the jth state of the chain
• $p_{ij}$:
  – 0 if vectors i and j differ by more than 1 component
  – If they differ in at most one component, say the mth, then $p_{ij}$ is proportional to the conditional probability of the new value of that component, given the values of all the other components

Gibbs Sampling (two-component case)
• Assume a joint distribution p(X, Y)
• We are looking to sample k values of X
• Begin with a value $y_0$
• Sample $x_i$ using $p(X \mid Y = y_{i-1})$
• Once $x_i$ is found, use it to find $y_i$ from $p(Y \mid X = x_i)$
• Repeat k times

Visual Example
• (Figure not reproduced)

Gibbs Sampling
• Allows us to work with univariate conditional distributions
  – instead of a complex joint distribution
• The chain has stationary distribution $\pi$

Why is this Hastings-Metropolis?
• If we define the candidate-generating probabilities $q_{ij}$ to be the Gibbs transition probabilities (the full conditional distributions above)
• Then one can check that for Gibbs, $\pi_i q_{ij} = \pi_j q_{ji}$, so the Hastings-Metropolis ratio equals 1
• Hence the acceptance probability a is always 1: every proposed move is accepted
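The slides keep p(X, Y) abstract. As a concrete stand-in, the sketch below assumes a standard bivariate normal with correlation rho, whose full conditionals are the well-known normals $N(\rho \cdot \text{other}, 1 - \rho^2)$, and runs exactly the two-step loop just described.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint distribution: standard bivariate normal with
# correlation rho. Its full conditionals are
#   X | Y=y ~ N(rho*y, 1 - rho^2)  and  Y | X=x ~ N(rho*x, 1 - rho^2).
rho = 0.8
k = 50_000

x, y = 0.0, 0.0                      # arbitrary starting value y_0
samples = np.empty((k, 2))
for i in range(k):
    # Sample x_i from p(X | Y = y_{i-1}), then y_i from p(Y | X = x_i).
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[i] = (x, y)

# The long-run draws behave like samples from the joint distribution.
print("empirical correlation:", np.round(np.corrcoef(samples.T)[0, 1], 3))
```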
Simulated Annealing
• Goal: find the (approximate) minimum of some positive function f
  – The function is defined on an extremely large number of states, s
  – And we want to find those states where the function is minimized
• The value of the function for state $E_j$ is $f(E_j)$

Simulated Annealing Process
• Construct a neighborhood of each state
  – A set of states "close" to that state
  – The variable in the Markov chain can move to a neighbor in one step
  – Moves outside the neighborhood are not allowed
• Requirements on neighborhoods:
  – Symmetry: if $E_k$ is in the neighborhood of $E_j$, then $E_j$ is in the neighborhood of $E_k$
  – The number of states in a neighborhood (N) is the same for every state
  – Neighborhoods are linked so that the chain can eventually get from any $E_j$ to any $E_m$
  – If in state $E_j$, the next move must be within the neighborhood of $E_j$

Simulated Annealing
• Uses a positive parameter T
• The aim is for the stationary distribution over the Markov chain states to be
  $\pi_j = \dfrac{e^{-f(E_j)/T}}{\sum_k e^{-f(E_k)/T}}$
  – The denominator is the constant ensuring the probabilities sum to 1
  – States with low values of f() then have high stationary probability, so the chain visits them often enough for them to become recognizable
• Large T values:
  – All states in the current state's neighborhood are chosen with roughly equal probability
  – The stationary distribution of the chain tends toward uniform
• Small T values:
  – Different states in a neighborhood have very different stationary probabilities
  – Too small, and the chain might get stuck near a local minimum
• The art is in picking the T value:
  – Want rapid movement from one neighborhood to another (large T)
  – But also to pick out the states in each neighborhood with large stationary probabilities (small T)

SA Example
• (Figure not reproduced)

Absorbing Markov Chains
• Absorbing state:
  – A state that is impossible to leave
  – $p_{ii} = 1$
• Transient state:
  – A non-absorbing state in an absorbing chain
• Questions to answer:
  – Given the chain starts at a particular state, what is the expected number of steps before it is absorbed?
  – Given the chain starts at a particular state, what is the probability it will be absorbed by a particular absorbing state?

General Process
• Use the explanation from Introduction to Probability (Grinstead & Snell)
• Convert the matrix into canonical form
  – Use this form to answer both questions
• Use a simple example throughout

Canonical Form
• Rearrange the states so that the transient states come first in P:
  $P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$
  – t: number of transient states; r: number of absorbing states
  – Q: t x t matrix (transient-to-transient transitions)
  – R: t x r matrix (transient-to-absorbing transitions)
  – 0: r x t zero matrix
  – I: r x r identity matrix

Drunkard's Walk Example
• A man walks home from a bar
  – 4 blocks to walk
  – 5 states total (corners 0 through 4)
• Absorbing states:
  – Corner 4 – home
  – Corner 0 – the bar
• At each block he has an equal probability of going forward or backward

Drunkard's Walk: Canonical Form
• With the transient states 1, 2, 3 first:
  $Q = \begin{pmatrix} 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1/2 & 0 \end{pmatrix}, \qquad
   R = \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \\ 0 & 1/2 \end{pmatrix}$
  (columns of R: absorbed at the bar, absorbed at home)

Fundamental Matrix
• For an absorbing Markov chain P, the fundamental matrix is
  $N = (I - Q)^{-1}$
• The entry $n_{ij}$ gives the expected number of times the process is in transient state $s_j$ if it started in transient state $s_i$
  – (before being absorbed)

Proof
• Let $s_i$ and $s_j$ be two transient states
• Let $X^{(k)}$ be a random variable that is
  – 1 if the chain is in state $s_j$ after k steps
  – 0 otherwise
• For each k, $E[X^{(k)}] = q^{(k)}_{ij}$, the (i, j) entry of $Q^k$
• So the expected number of times the chain is in state $s_j$ in the first n steps is
  $q^{(0)}_{ij} + q^{(1)}_{ij} + \cdots + q^{(n)}_{ij}$
• As n goes to infinity,
  $N = I + Q + Q^2 + \cdots = (I - Q)^{-1}$

Example: Fundamental Matrix
• For the drunkard's walk:
  $N = (I - Q)^{-1} = \begin{pmatrix} 1.5 & 1 & 0.5 \\ 1 & 2 & 1 \\ 0.5 & 1 & 1.5 \end{pmatrix}$

Time to Absorption
• What is the expected number of steps before the chain is absorbed?
• $t_i$ is the expected number of steps before the chain is absorbed,
  – given it started in $s_i$
• $t = N c$
  – t: the vector with elements $t_i$
  – c: a column vector of 1's

Proof
• The sum of the ith row of N is
  – the expected number of times the chain is in any transient state, for a given starting state $s_i$
  – which is the expected time required before absorption
  – and this is exactly what each value of t is

Example: Time to Absorption
• For the drunkard's walk: $t = Nc = (3, 4, 3)'$
  – e.g., starting at corner 2 (the middle), absorption takes 4 steps on average

Absorption Probabilities
• $b_{ij}$: the probability that the chain will be absorbed in absorbing state $s_j$ if it starts in transient state $s_i$
• B: the t x r matrix with entries $b_{ij}$
• $B = N R$, where R is the other component of the canonical matrix

Proof
• Starting from $s_i$, the chain is absorbed into $s_j$ by staying among the transient states for some number of steps n and then moving to $s_j$:
  $b_{ij} = \sum_{n=0}^{\infty} \sum_{k} q^{(n)}_{ik} r_{kj} = (NR)_{ij}$

Example: Absorption Probabilities
• For the drunkard's walk:
  $B = NR = \begin{pmatrix} 0.75 & 0.25 \\ 0.5 & 0.5 \\ 0.25 & 0.75 \end{pmatrix}$

Absorbing Markov Chains
• We can now answer both questions:
  – Given that the chain starts at a particular state, the expected number of steps before absorption comes from $t = Nc$
  – Given that the chain starts at a particular state, the probability of absorption by a particular absorbing state comes from $B = NR$

Interesting Markov Chain Use

Sentence Creator
• Feed text into a Markov chain to build a transition matrix
  – It holds the probability of going from word i to word j in a sentence
• Start at a particular word in the chain and use the transition distributions to create new sentences
• Dracula + Huckleberry Finn:
  "This afternoon I don't know of humbug talky-talk, just set in, and perpetually violent. Then I saw, and looking tired them pens was a few minutes our sight."
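A minimal sketch of such a sentence creator. The corpus below is a toy stand-in (the slides fed in Dracula and Huckleberry Finn); storing, for each word, the list of words that followed it encodes the same transition probabilities as an explicit matrix.

```python
import random
from collections import defaultdict

def build_chain(text):
    """For each word, collect the words that followed it in the source
    text; repeats in the list encode the transition probabilities."""
    chain = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length=12):
    """Random walk on the word chain, starting from a chosen word."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:        # dead end: word never had a successor
            break
        out.append(random.choice(followers))
    return " ".join(out)

# Toy corpus standing in for the novels used on the slides.
corpus = "the dog saw the cat and the cat saw the dog run"
chain = build_chain(corpus)
print(generate(chain, "the"))
```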
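Finally, returning to the absorbing-chain discussion, a short numpy sketch that reproduces the drunkard's-walk quantities N, t, and B from the canonical-form blocks given above.

```python
import numpy as np

# Canonical-form blocks for the drunkard's walk (transient states:
# corners 1, 2, 3; absorbing states: corner 0 = bar, corner 4 = home).
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],     # columns: absorbed at bar, at home
              [0.0, 0.0],
              [0.0, 0.5]])

# Fundamental matrix N = (I - Q)^-1: expected number of visits to
# each transient state before absorption.
N = np.linalg.inv(np.eye(3) - Q)
print("N:\n", N)                 # [[1.5 1 0.5] [1 2 1] [0.5 1 1.5]]

# t = N c: expected number of steps before absorption.
print("t:", N @ np.ones(3))      # [3. 4. 3.]

# B = N R: probability of ending in each absorbing state.
print("B:\n", N @ R)             # [[0.75 0.25] [0.5 0.5] [0.25 0.75]]
```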