Markov Chains

Markov Chains (1)

A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by transition probabilities. Suppose the random variables $X_0, X_1, \dots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that corresponds to the network

$$X_0 \to X_1 \to \dots \to X_{t-1} \to X_t \to X_{t+1} \to \dots$$

Markov Chains (2)

The current state of a Markov chain depends only on the most recent previous state:

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)$$

(the transition probability), where $i_0, \dots, i_{t-1}, i, j \in \Omega$.

http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf

An Example of Markov Chains

$\Omega = \{1, 2, 3, 4, 5\}$ and $X = (X_0, X_1, \dots, X_t, \dots)$, where $X_0$ is the initial state and so on. $P$ is the transition matrix:

$$P = \begin{pmatrix}
0.4 & 0.6 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.3 & 0.0 & 0.7 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.3 & 0.6 \\
0.0 & 0.3 & 0.0 & 0.5 & 0.2
\end{pmatrix}$$

Definition (1)

Define the probability of going from state $i$ to state $j$ in $n$ time steps as
$$p_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i).$$
A state $j$ is accessible from state $i$ if there exists $n \in \{0, 1, \dots\}$ such that $p_{ij}^{(n)} > 0$. A state $i$ is said to communicate with state $j$ (denoted $i \leftrightarrow j$) if both $i$ is accessible from $j$ and $j$ is accessible from $i$.

Definition (2)

A state $i$ has period $d(i)$ if any return to state $i$ must occur in multiples of $d(i)$ time steps. Formally, the period of a state is defined as
$$d(i) = \gcd\{n : p_{ii}^{(n)} > 0\}.$$
If $d(i) = 1$, the state is said to be aperiodic; otherwise ($d(i) > 1$), the state is said to be periodic with period $d(i)$.

Definition (3)

A set of states $C$ is a communicating class if every pair of states in $C$ communicates with each other. Every state in a communicating class must have the same period.

Example: (state diagram omitted)

Definition (4)

A finite Markov chain is said to be irreducible if its state space $\Omega$ is a single communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.

Example: (state diagram omitted)

Definition (5)

A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.

Example: (state diagram omitted)

Definition (6)

A state $i$ is said to be transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$. Formally, let the random variable $T_i$ be the next return time to state $i$ (the "hitting time"):
$$T_i = \min\{n \geq 1 : X_n = i \mid X_0 = i\}.$$
Then state $i$ is transient iff $P(T_i < \infty) < 1$, i.e. there is positive probability that the chain never returns.

Definition (7)

A state $i$ is said to be recurrent (or persistent) iff $P(T_i < \infty) = 1$. The mean recurrence time is $\mu_i = E[T_i]$. State $i$ is positive recurrent if $\mu_i$ is finite; otherwise, state $i$ is null recurrent. A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

Stationary Distributions

Theorem: If a Markov chain is irreducible and aperiodic, then $p_{ij}^{(n)} \to \pi_j$ as $n \to \infty$, for all $i, j$.

Theorem: If a Markov chain is irreducible and aperiodic, then there exists a unique $\pi$ with $\pi_j = \lim_{n \to \infty} P(X_n = j)$ and
$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1,$$
where $\pi$ is the stationary distribution.
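To make these limit theorems concrete, here is a minimal numerical sketch (in Python with NumPy; the code is an addition of this write-up, not part of the original slides). It raises the five-state example matrix $P$ above to a large power, so that every row visibly approaches the same vector, and then recovers that vector directly as the normalized left eigenvector of $P$ for eigenvalue 1.

```python
import numpy as np

# Transition matrix of the five-state example chain above.
P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])

# The n-step transition probabilities p_ij^(n) are the entries of P^n.
# By the theorem, every row of P^n converges to the stationary
# distribution pi as n grows.
print(np.round(np.linalg.matrix_power(P, 100), 4))

# The same pi, computed directly as the left eigenvector of P for
# eigenvalue 1 (pi P = pi), normalized so its entries sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
print(np.round(pi, 4))
```

Both printouts should agree: this chain is irreducible (every state can reach every other) and aperiodic (state 1 has a self-loop), so the hypotheses of the theorems hold.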
Definition (8)

A Markov chain is said to be reversible if there is a stationary distribution $\pi$ such that
$$\pi_i P_{ij} = \pi_j P_{ji} \quad \forall i, j.$$
Theorem: if a Markov chain is reversible, then $\pi_j = \sum_i \pi_i P_{ij}$.

An Example of Stationary Distributions

A Markov chain on states $\{1, 2, 3\}$ (diagram: self-loops with probabilities 0.7, 0.4, 0.7 on states 1, 2, 3, and transitions $1 \leftrightarrow 2$ and $2 \leftrightarrow 3$, each with probability 0.3) has transition matrix
$$P = \begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}.$$
The stationary distribution is $\pi = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$, since
$$\left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)
\begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}
= \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right).$$

Properties of Stationary Distributions

Regardless of the starting point, an irreducible and aperiodic Markov chain will converge to its stationary distribution. The rate of convergence depends on properties of the transition probability matrix.
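As a quick numerical check of the reversibility definition and the example above (again a sketch in Python with NumPy, not part of the original slides), the three-state chain is in fact reversible: its uniform stationary distribution satisfies both $\pi P = \pi$ and the detailed-balance condition $\pi_i P_{ij} = \pi_j P_{ji}$.

```python
import numpy as np

# The three-state chain from the example above.
P = np.array([
    [0.7, 0.3, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.3, 0.7],
])
pi = np.array([1 / 3, 1 / 3, 1 / 3])

# Stationarity: pi P = pi.
print(np.allclose(pi @ P, pi))  # True

# Reversibility: the matrix F with F[i, j] = pi_i * P_ij is symmetric
# exactly when pi_i P_ij = pi_j P_ji for all i, j.
F = pi[:, None] * P
print(np.allclose(F, F.T))  # True
```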
Markov Chain Monte Carlo

MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution.

http://en.wikipedia.org/wiki/MCMC

Metropolis-Hastings Algorithm (1)

The Metropolis-Hastings algorithm can draw samples from any probability distribution $\Pi(x)$, requiring only that a function proportional to the density can be calculated at $x$. The process has three steps:
- Set up a Markov chain;
- Run the chain until stationary;
- Estimate with Monte Carlo methods.

http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm

Metropolis-Hastings Algorithm (2)

Let $\pi = (\pi_1, \dots, \pi_n)$ be a probability density (or mass) function (pdf or pmf). Let $f$ be any function; we want to estimate
$$I = E_\pi[f] = \sum_{i=1}^n f(i)\,\pi_i.$$
Construct $P = \{P_{ij}\}$, the transition matrix of an irreducible Markov chain with states $1, 2, \dots, n$, where
$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i), \qquad X_t \in \{1, 2, \dots, n\},$$
and $\Pi$ is its unique stationary distribution.

Metropolis-Hastings Algorithm (3)

Run this Markov chain for times $t = 1, \dots, N$ and calculate the Monte Carlo sum
$$\hat{I} = \frac{1}{N} \sum_{t=1}^N f(X_t);$$
then $\hat{I} \to I$ as $N \to \infty$.

Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models, 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt

Metropolis-Hastings Algorithm (4)

In order to perform this method for a given distribution $\Pi$, we must construct a Markov chain transition matrix $P$ with $\Pi$ as its stationary distribution, i.e. $\Pi P = \Pi$. Suppose the matrix $P$ satisfies the reversibility condition: for all $i$ and $j$,
$$\pi_i P_{ij} = \pi_j P_{ji}.$$
This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence that $\Pi$ is a stationary distribution for $P$.

Metropolis-Hastings Algorithm (5)

Let a proposal $Q = \{Q_{ij}\}$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and let the range of $Q$ equal the range of $\Pi$. But $\Pi$ does not have to be a stationary distribution of $Q$. The process is to tweak $Q_{ij}$ so that the resulting chain yields $\Pi$:

states from $Q_{ij}$ (not $\pi$) $\xrightarrow{\text{tweak}}$ states from $P_{ij}$ ($\pi$)

Metropolis-Hastings Algorithm (6)

We assume that $P_{ij}$ has the form
$$P_{ij} = Q_{ij}\,\alpha(i, j) \quad (i \neq j), \qquad P_{ii} = 1 - \sum_{j \neq i} P_{ij},$$
where $\alpha(i, j)$ is called the acceptance probability: given $X_t = i$, take $X_{t+1} = j$ with probability $\alpha(i, j)$, and take $X_{t+1} = i$ with probability $1 - \alpha(i, j)$.

Metropolis-Hastings Algorithm (7)

For $i \neq j$, reversibility requires
$$\pi_i P_{ij} = \pi_j P_{ji} \iff \pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i). \quad (*)$$
WLOG, suppose that for some $(i, j)$, $\pi_i Q_{ij} > \pi_j Q_{ji}$. In order to achieve equality in $(*)$, one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.

Metropolis-Hastings Algorithm (8)

Then
$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) = \pi_j Q_{ji}
\implies \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.$$
These arguments imply that the acceptance probability $\alpha(i, j)$ must be
$$\alpha(i, j) = \min\!\left(1, \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).$$

Metropolis-Hastings Algorithm (9)

M-H Algorithm:
- Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probabilities $Q_{ij}$.
- Step 2: Let $t = 0$ and initialize $X_0$ from the states of $Q$.
- Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{i\cdot}$.

Metropolis-Hastings Algorithm (10)

M-H Algorithm (cont.):
- Step 4 (Acceptance Step): Generate a random number $U$ from $\mathrm{Unif}(0, 1)$. If $U \leq \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.
- Step 5: Let $t = t + 1$ and repeat Steps 3-5 until convergence.
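Putting Steps 1-5 together, here is a minimal sketch of the algorithm in Python with NumPy. The uniform proposal $Q$ and the unnormalized target weights are illustrative assumptions of this write-up, not taken from the slides; note that the target only enters through ratios, so an unnormalized $\pi$ suffices, exactly as Metropolis-Hastings Algorithm (1) promises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target pmf on states {0, ..., 4}; M-H only needs a
# function proportional to the target, so we never normalize it.
weights = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
n = len(weights)

# Step 1: an irreducible proposal matrix Q (here, uniform over states).
Q = np.full((n, n), 1.0 / n)

# Step 2: t = 0, initialize X_0.
x = 0
N = 100_000
samples = np.empty(N, dtype=int)

for t in range(N):
    # Step 3 (proposal): given X_t = i, sample Y = j from row i of Q.
    i = x
    j = rng.choice(n, p=Q[i])

    # Acceptance probability alpha(i, j) = min(1, pi_j Q_ji / (pi_i Q_ij)).
    alpha = min(1.0, (weights[j] * Q[j, i]) / (weights[i] * Q[i, j]))

    # Step 4 (acceptance): U ~ Unif(0, 1); accept if U <= alpha.
    if rng.uniform() <= alpha:
        x = j  # X_{t+1} = Y = j
    # else: X_{t+1} = X_t = i (the chain stays put)

    samples[t] = x  # Step 5: t <- t + 1 and repeat

# Empirical state frequencies should approach the normalized target.
print("empirical:", np.round(np.bincount(samples, minlength=n) / N, 3))
print("target:   ", np.round(weights / weights.sum(), 3))
```

With a symmetric proposal such as this uniform $Q$, the ratio $Q_{ji}/Q_{ij}$ cancels and the rule reduces to the original Metropolis acceptance probability $\min(1, \pi_j/\pi_i)$.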