11 - Markov Chains Jim Vallandingham

• Irreducible Markov Chains
– Outline of Proof of Convergence to Stationary
– Convergence Example
– Reversible Markov Chain
• Monte Carlo Methods
– Hastings-Metropolis Algorithm
– Gibbs Sampling
– Simulated Annealing
• Absorbing Markov Chains
Stationary Distribution
• As $n \to \infty$, $P^n$ converges to a limiting matrix
– Each row is the stationary distribution $\pi$
Stationary Dist. Example
• Long-term averages (rounded to whole percents):
– 24% time spent in state E1
– 39% time spent in state E2
– 21% time spent in state E3
– 17% time spent in state E4
Stationary Distribution
• Any finite, aperiodic, irreducible Markov
chain will converge to a stationary distribution
– Regardless of starting distribution
• Outline of Proof requires linear algebra
– Appendix B.19
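The convergence claim can be checked numerically. The sketch below is only an illustration: the 3-state matrix `P` is a hypothetical example (it is not the chain from the slides), and `step` and `iterate` are helper names introduced here. Two different starting distributions are pushed through the chain and end up at the same vector.

```python
def step(dist, P):
    """One transition: multiply the row vector dist by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def iterate(dist, P, steps):
    """Apply `steps` transitions to the starting distribution."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical 3-state chain (assumption, not from the slides). Every entry
# is positive, so the chain is irreducible and aperiodic.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]

from_state_0 = iterate([1.0, 0.0, 0.0], P, 100)  # start surely in state 0
from_state_2 = iterate([0.0, 0.0, 1.0], P, 100)  # start surely in state 2

print(from_state_0)
print(from_state_2)  # agrees with the line above up to rounding
```

Both printed vectors also satisfy `step(v, P) == v` up to rounding, which is the defining property of a stationary distribution.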
L.A. : Eigenvalues
• Let P be an s x s matrix.
• P has s eigenvalues
– Found as the s solutions to $\det(P - \lambda I) = 0$
– Assume all eigenvalues of P are distinct
L.A. : left & right eigenvectors
• Corresponding to each eigenvalue $\lambda_j$
– Is a right eigenvector $r_j$
– And a left eigenvector $l_j'$
– For which: $P r_j = \lambda_j r_j$ and $l_j' P = \lambda_j l_j'$
– Assume they are normalized: $l_j' r_j = 1$
L.A. : Spectral Expansion
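• Can express P in terms of its
eigenvectors and eigenvalues: $P = \sum_{j=1}^{s} \lambda_j r_j l_j'$
($\lambda_j$: eigenvalues; $r_j$, $l_j'$: right and left eigenvectors)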
• Called a spectral expansion of P
L.A. : Spectral Expansion
• If $\lambda$ is an eigenvalue of P with
corresponding left and right eigenvectors $l'$ and $r$
• Then $\lambda^n$ is an eigenvalue of $P^n$ with
the same left and right eigenvectors
L.A. : Spectral Expansion
• Implies spectral expansion of $P^n$ can be
written as: $P^n = \sum_{j=1}^{s} \lambda_j^n r_j l_j'$
Outline of Proof
• Going back to proof…
– P is transition matrix for finite aperiodic
irreducible Markov chain
• P has one eigenvalue, $\lambda_1$,
equal to 1
– All other eigenvalues have absolute value less than 1
Outline of Proof
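• Choosing left and right eigenvectors for the eigenvalue $\lambda_1 = 1$
– Requirements:
Left eigenvector $l_1'$ is a probability vector
(entries sum to 1)
Right eigenvector $r_1 = (1, 1, \ldots, 1)'$, so that $l_1' r_1 = 1$
– $l_1'$ also satisfies: $l_1' P = l_1'$
(definition of a left eigenvector with eigenvalue 1)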
Outline of Proof
Same equation satisfied by the stationary distribution: $\pi P = \pi$
• Also:
– Can be shown that there is a unique
solution of this equation that also satisfies $\sum_j \pi_j = 1$,
so that the left eigenvector for eigenvalue 1 is $\pi$
Outline of Proof
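• $P^n$ gives the n-step transition probabilities
• Spectral expansion of $P^n$ is: $P^n = \sum_{j=1}^{s} \lambda_j^n r_j l_j'$
Only one eigenvalue is equal to 1. The rest are less than 1 in absolute value
• So as n increases, $P^n$ approaches $r_1 l_1'$, the matrix in which every row is the stationary distribution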
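A two-state chain makes the argument concrete, because its eigenvalues and eigenvectors can be written down by hand. In the sketch below, the probabilities `a` and `b` are hypothetical values chosen for illustration, and the helper names are introduced here. For $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ the eigenvalues are 1 and $1 - a - b$, and the code compares the spectral expansion of $P^n$ with direct matrix multiplication.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matpow(P, n):
    """P multiplied by itself n times (n >= 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, P)
    return result


a, b = 0.3, 0.2  # hypothetical transition probabilities (assumption)
P = [[1 - a, a], [b, 1 - b]]

eigenvalues = [1.0, 1 - a - b]
right = [[1.0, 1.0], [a, -b]]                     # right eigenvectors r_1, r_2
left = [[b / (a + b), a / (a + b)],               # l_1: sums to 1
        [1 / (a + b), -1 / (a + b)]]              # l_2: normalized so l_2 . r_2 = 1


def spectral_power(n):
    """P^n from the spectral expansion: sum over j of lambda_j^n * r_j * l_j'."""
    return [[sum(eigenvalues[j] ** n * right[j][row] * left[j][col]
                 for j in range(2)) for col in range(2)] for row in range(2)]


print(matpow(P, 5))
print(spectral_power(5))    # same matrix up to rounding
print(spectral_power(200))  # every row is close to (0.4, 0.6)
```

As n grows, the second term is multiplied by $0.5^n$ and vanishes, so only the eigenvalue-1 term remains.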
Convergence Example
• Transition matrix P of the example has one eigenvalue equal to 1
– Remaining eigenvalues are less than 1 in absolute value
• Left & right eigenvectors are chosen satisfying the normalization $l_j' r_j = 1$
– Left eigenvector for eigenvalue 1 is the stationary distribution
• In the spectral expansion of $P^n$, only the term for eigenvalue 1 survives as n grows
– Each of its rows is the stationary distribution
Reversible Markov Chains
• Typically moving forward in ‘time’ in a
Markov chain
– 1 → 2 → 3 → … → t
• What about moving backward in this
chain?
– t → t-1 → t-2 → … → 1
Reversible Markov Chains
[Figure: Species A, Species B]
Reversible Markov Chains
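• Have a finite irreducible aperiodic
Markov chain
– with stationary distribution $\pi$
– During t transitions, chain will move
through states: $X_1, X_2, \ldots, X_t$
• Reverse chain
– Define $X^*_k = X_{t-k+1}$
– Then reverse chain will move through
states: $X^*_1, X^*_2, \ldots, X^*_t$ (that is, $X_t, X_{t-1}, \ldots, X_1$)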
Reversible Markov Chains
• Want to show structure determining the
reverse chain sequence is also a
Markov chain
• Typical element $p^*_{ij}$ of the reverse chain's transition matrix
found from typical
element of P, $p_{ij}$
Reversible Markov Chains
• Shown by using Bayes rule to invert
conditional probability: $p^*_{ij} = \pi_j p_{ji} / \pi_i$
– Forward chain: the future is independent of the past,
given the present
– Reverse chain: the past is independent of the future,
given the present
Reversible Markov Chains
• Stationary distribution of reverse chain
is still $\pi$
• Follows from the stationary distribution property $\pi P = \pi$
Reversible Markov Chains
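• Markov chain is said to be reversible if the reverse chain has the same transition probabilities: $p^*_{ij} = p_{ij}$
• This only holds if $\pi_i p_{ij} = \pi_j p_{ji}$ for all i, j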
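The reversibility condition is easy to test on small matrices. The sketch below assumes the standard reverse-chain formula $p^*_{ij} = \pi_j p_{ji} / \pi_i$. Both example chains and the function names are hypothetical illustrations, not taken from the slides.

```python
def reverse_chain(P, pi):
    """Transition matrix of the time-reversed chain: p*_ij = pi_j * p_ji / pi_i."""
    n = len(P)
    return [[pi[j] * P[j][i] / pi[i] for j in range(n)] for i in range(n)]


def is_reversible(P, pi, tol=1e-12):
    """Check pi_i * p_ij == pi_j * p_ji for every pair of states."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))


# Hypothetical chain that only moves between adjacent states: reversible.
P_walk = [[0.50, 0.50, 0.00],
          [0.25, 0.50, 0.25],
          [0.00, 0.50, 0.50]]
pi_walk = [0.25, 0.50, 0.25]

# Hypothetical chain that mostly cycles 0 -> 1 -> 2 -> 0: not reversible.
P_cycle = [[0.1, 0.8, 0.1],
           [0.1, 0.1, 0.8],
           [0.8, 0.1, 0.1]]
pi_cycle = [1 / 3, 1 / 3, 1 / 3]

print(is_reversible(P_walk, pi_walk))    # True
print(is_reversible(P_cycle, pi_cycle))  # False
print(reverse_chain(P_cycle, pi_cycle))  # the cycle run backwards (transpose of P_cycle)
```

For the cycling chain the reverse chain is still a valid Markov chain with the same stationary distribution, but it prefers the opposite direction, so the two are distinguishable.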
Monte Carlo Methods
Markov Chain Monte Carlo
• Class of algorithms for sampling from
probability distributions
– Involve constructing a Markov Chain
– Want the chain to have the desired distribution as its stationary distribution
– State of chain after large number of steps
is used as a sample of desired distribution
• We discuss 2 algorithms
– Gibbs Sampling
– Simulated Annealing
Basic Problem
• Find transition matrix P such that
– Its stationary distribution is the target distribution
• Know that Markov chain will converge to
stationary distribution, regardless of
initial distribution
– How can we find such a P with its
stationary distribution as the target?
Basic Idea
• Construct transition matrix Q
– “candidate generating matrix”
– Modify to have correct stationary distribution
• Modification involves inserting factors $a_{ij}$
• So that $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$
• Various ways of picking the a’s
Hastings-Metropolis Algorithm
• Goal: construct aperiodic irreducible
Markov chain
• Having prescribed stationary distribution
• Produces a correlated sequence of
draws from the target density that may
be difficult to sample using a classical
independence method.
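• Choose set of constants $s_{ij}$
– Such that $s_{ij} = s_{ji}$
– And $0 \le a_{ij} \le 1$
• Define $a_{ij} = \dfrac{s_{ij}}{1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}}}$
– Accept state change: $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$
– Reject state change: $p_{ii} = 1 - \sum_{j \ne i} p_{ij}$
(chain doesn’t change)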
Hastings-Metropolis Example
• Target stationary distribution: $\pi$ = (.4 .6)
• [Matrices from the slides omitted, including $P^{50}$]
Algorithmic Description
1. Start with State E1, then iterate
2. Propose E’ from q(Et,E’)
3. Calculate ratio $a = \dfrac{\pi(E') \, q(E', E_t)}{\pi(E_t) \, q(E_t, E')}$
4. If a > 1,
– Accept E(t+1) = E’
5. Else
– Accept with probability of a
– If rejected, E(t+1) = Et
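The steps above can be sketched as a short sampler. This is a minimal illustration, not the slides' implementation. The target (.4, .6) is the one from the example, while the candidate matrix `Q`, the function name, the step count, and the seed are assumptions made here. The ratio is the usual Metropolis-Hastings ratio $\pi(E') q(E', E_t) / (\pi(E_t) q(E_t, E'))$.

```python
import random


def metropolis_hastings(pi, Q, start, steps, rng):
    """Run the chain and return the fraction of time spent in each state."""
    state = start
    visits = [0] * len(pi)
    for _ in range(steps):
        # Step 2: propose a candidate from row `state` of Q.
        proposal = rng.choices(range(len(pi)), weights=Q[state])[0]
        # Step 3: acceptance ratio.
        ratio = (pi[proposal] * Q[proposal][state]) / (pi[state] * Q[state][proposal])
        # Steps 4-5: accept outright if ratio >= 1, else with probability ratio.
        if ratio >= 1 or rng.random() < ratio:
            state = proposal
        visits[state] += 1
    return [count / steps for count in visits]


pi = [0.4, 0.6]                 # target distribution from the example
Q = [[0.5, 0.5], [0.5, 0.5]]    # hypothetical candidate-generating matrix

frequencies = metropolis_hastings(pi, Q, start=0, steps=100_000, rng=random.Random(0))
print(frequencies)  # close to [0.4, 0.6]
```

Note that only ratios of $\pi$ values are used, so the target needs to be known only up to a constant factor.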
Gibbs Sampling
• Let $Y = (Y_1, Y_2, \ldots, Y_k)$ be the random vector
• Let $p_Y(y)$ be the distribution of $Y$
• We define a Markov chain whose states are the possible values of Y
Gibbs Sampling
• Enumerate vectors in some order
– 1, 2,…,s
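• Identify vector j with the jth state in the chain
• pij :
– 0 : if vectors i & j differ in more than 1 component
– If they differ in at most 1 component, say the first ($y_1$ in vector i, $y_1^*$ in vector j): $p_{ij} = \frac{1}{k} P(Y_1 = y_1^* \mid Y_2 = y_2, \ldots, Y_k = y_k)$, where k is the number of components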
Gibbs Sampling
• Assume joint distribution p(X, Y)
• Looking to sample k values of X
• Begin with value of y0
• Sample xi using p(X | Y = yi-1)
• Once xi is found, use it to find yi
– p(Y | X = xi)
• Repeat k times
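The two-variable procedure above can be tried on a tiny discrete example. In this sketch the joint table `joint`, the function names, and the seed are all hypothetical. X and Y are binary, so each conditional distribution is a single probability read off the table.

```python
import random

# Hypothetical joint distribution p(X, Y) over two binary variables.
joint = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.4, (1, 1): 0.2}


def sample_x_given_y(y, rng):
    """Draw X from p(X | Y = y)."""
    p_x1 = joint[(1, y)] / (joint[(0, y)] + joint[(1, y)])
    return 1 if rng.random() < p_x1 else 0


def sample_y_given_x(x, rng):
    """Draw Y from p(Y | X = x)."""
    p_y1 = joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])
    return 1 if rng.random() < p_y1 else 0


def gibbs(k, y0, rng):
    """Alternate the two conditional draws k times and return the x values."""
    xs = []
    y = y0
    for _ in range(k):
        x = sample_x_given_y(y, rng)   # x_i from p(X | Y = y_{i-1})
        y = sample_y_given_x(x, rng)   # y_i from p(Y | X = x_i)
        xs.append(x)
    return xs


xs = gibbs(k=50_000, y0=0, rng=random.Random(0))
print(sum(xs) / len(xs))  # close to the marginal P(X = 1) = 0.4 + 0.2 = 0.6
```

The sampler never touches the joint distribution directly when drawing, only the two univariate conditionals.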
Visual Example
Gibbs Sampling
• Allows us to deal with univariate
conditional distributions
• Instead of complex joint distributions
• Chain has the joint distribution as its stationary distribution
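Why is it Hastings-Metropolis?
• If we define the candidate probabilities $q_{ij}$ as the Gibbs transition probabilities
• Can see that for Gibbs: $\pi_i q_{ij} = \pi_j q_{ji}$
• So a is always 1 (every proposed move is accepted)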
Simulated Annealing
• Goal: Find (approximate) minimum of
some positive function
– Function defined on an extremely large
number of states, s
• And to find those states where this
function is minimized
• Value of the function for state Ej is f(Ej)
Simulated Annealing
• Construct neighborhood of each state
– Set of states “close” to the state
– Variable in Markov chain can move to a
neighbor in one step
– Moves outside neighborhood not allowed
Simulated Annealing
• Requirements of neighborhood
– If Ek is in the neighborhood of Ej, then Ej
is in the neighborhood of Ek
– Number of states in a neighborhood (N) is
independent of that state
– Neighborhoods are linked so that chain
can eventually make it from any Ej to any Ek
– If in state Ej, then the next move must be in
neighborhood of Ej.
Simulated Annealing
• Uses a positive parameter T
• Aim is to have the stationary distribution
of each Markov chain state Ej be $\pi_j = K e^{-f(E_j)/T}$
– K : constant to ensure sum of probabilities is 1
– Visit states often enough to allow those with a low value of f() to become recognizable
Simulated Annealing
• Large T values
– All states in the current state’s neighborhood
are chosen with ~ equal probability
– Stationary distribution of chain tends to be uniform
• Small T values
– Different states in neighborhoods have
very different stationary probabilities
– Too small: chain might get stuck in a local minimum of f()
Simulated Annealing
• Art of picking T value
– Want rapid movement from one
neighborhood to another
• (Large T)
– Picks out states in neighborhoods with
large stationary probabilities
• (Small T)
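A minimal sketch of the idea follows, under several assumptions that are not in the slides: the states are 0..99 arranged on a ring so that every state has exactly two neighbors, `f` is a made-up positive function with small local bumps, a move to a worse state is accepted with probability $e^{-\Delta f / T}$, and T is lowered in stages (large T first for rapid movement, small T last to settle).

```python
import math
import random


def f(j):
    """Hypothetical positive function: a bowl centered at 70 plus local bumps."""
    return 1.0 + (j - 70) ** 2 / 200 + (0.0 if j % 5 == 0 else 2.0)


def simulated_annealing(f, n_states, start, temperatures, steps_per_temp, rng):
    """Return the best state visited while T is lowered through `temperatures`."""
    state = start
    best = state
    for T in temperatures:
        for _ in range(steps_per_temp):
            # Neighborhood: the two adjacent states on the ring.
            candidate = (state + rng.choice((-1, 1))) % n_states
            delta = f(candidate) - f(state)
            # Always accept improvements; accept worse moves with prob exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                state = candidate
            if f(state) < f(best):
                best = state
    return best


start = 0
best = simulated_annealing(f, n_states=100, start=start,
                           temperatures=[10.0, 5.0, 1.0, 0.5, 0.1],
                           steps_per_temp=2000, rng=random.Random(0))
print(best, f(best))
```

The ring neighborhood satisfies the requirements listed earlier: it is symmetric, has the same size for every state, and links all states together. With a symmetric proposal like this one, the acceptance rule gives a stationary distribution proportional to $e^{-f/T}$ at each fixed T.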
SA Example
Absorbing Markov Chains
• Absorbing state:
– State which is impossible to leave
– pii = 1
• Transient state:
– Non-absorbing state in absorbing chain
Absorbing Markov Chains
• Questions to answer:
– Given chain starts at a particular state,
what is the expected number of steps
before being absorbed?
– Given chain starts at a particular state,
what is the probability it will be absorbed
by a particular absorbing state?
General Process
• Use Explanation from
– Introduction to Probability – Grinstead
• Convert matrix into canonical form
– Use the canonical form to answer these questions
• Use simple example throughout
Canonical Form
• Rearrange states so that the transient
states come first in P
$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$$
– Q : t x t matrix
– R : t x r matrix
– 0 : r x t zero matrix
– I : r x r identity matrix
– t : # of transient states
– r : # of absorbing states
Drunkard’s Walk Example
• Man walking home from a bar
– 4 blocks to walk
– 5 states total
• Absorbing states:
– Corner 4 – Home
– Corner 0 – Bar
• Each block he has an equal probability
of going forward or backward
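Drunkard’s Walk Example
• Transition matrix (states ordered 0, 1, 2, 3, 4):
$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Drunkard’s Walk : Canonical Form
• Canonical form (states ordered 1, 2, 3, 0, 4):
$$P = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$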
Fundamental Matrix
• For an absorbing Markov Chain P
• Fundamental Matrix for P is: $N = (I - Q)^{-1}$, where Q is the transient-to-transient block of the canonical form
• nij entry gives expected number of times
that the process is in the transient state
sj if started in transient state si
– (Before being absorbed)
• Let si and sj be two transient states
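• Let $X^{(k)}$
be random variable
– 1 : if chain is in state sj after k steps
– 0 : otherwise
• Expected # of times chain is in state sj
in the first n steps, starting from si: $E(X^{(0)} + X^{(1)} + \cdots + X^{(n)}) = q^{(0)}_{ij} + q^{(1)}_{ij} + \cdots + q^{(n)}_{ij}$, where $q^{(k)}_{ij}$ is the ij entry of $Q^k$
• As n goes to infinity, this sum converges to $n_{ij}$, since $N = I + Q + Q^2 + \cdots$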
Example Fundamental Matrix
Canonical form
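For the drunkard's walk, the fundamental matrix can be computed exactly. The sketch below takes $N = (I - Q)^{-1}$, the standard definition, with Q the block of transitions among the transient corners 1, 2, 3. The `inverse` helper is written here for illustration (Gauss-Jordan elimination on exact fractions) and is not part of the slides.

```python
from fractions import Fraction


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


half = Fraction(1, 2)
# Transient block Q for corners 1, 2, 3 of the drunkard's walk.
Q = [[0, half, 0],
     [half, 0, half],
     [0, half, 0]]

I_minus_Q = [[Fraction(int(i == j)) - Q[i][j] for j in range(3)] for i in range(3)]
N = inverse(I_minus_Q)

for row in N:
    print([str(x) for x in row])
# Rows of N: (3/2, 1, 1/2), (1, 2, 1), (1/2, 1, 3/2)
```

For example, a walk started at corner 2 is expected to be at corner 2 twice (counting the start) before it is absorbed.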
Time to Absorption
• Expected number of steps before chain
is absorbed.
• ti is expected number of steps before
chain is absorbed,
– Given it started in si.
• $t = Nc$
– t : column vector with elements ti
– c : column vector of 1’s
• Sum of the ith row of N:
– Expected number of times in any transient
state for a given starting state si
– Expected time required before absorption
– This is what each value of t is
Example: Time to Absorption
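For the drunkard's walk the exact expected times are 3, 4, and 3 steps from corners 1, 2, and 3 (the row sums of the fundamental matrix). As a cross-check, the walk can also be simulated directly. The function names, trial count, and seed below are assumptions made for this sketch.

```python
import random


def steps_until_absorbed(start, rng, bar=0, home=4):
    """Walk one block left or right with equal probability until bar or home."""
    position, steps = start, 0
    while position not in (bar, home):
        position += rng.choice((-1, 1))
        steps += 1
    return steps


def mean_absorption_time(start, trials, rng):
    """Average number of steps to absorption over many simulated walks."""
    return sum(steps_until_absorbed(start, rng) for _ in range(trials)) / trials


rng = random.Random(0)
estimates = {corner: mean_absorption_time(corner, 20_000, rng) for corner in (1, 2, 3)}
print(estimates)  # close to {1: 3, 2: 4, 3: 3}
```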
Absorption Probabilities
• bij – probability that chain will be
absorbed in absorbing state sj if starts in
transient state si
• B – t x r matrix with entries of bij
• $B = NR$
– R : other component of canonical matrix (the t x r block)
Example: Absorption Probabilities
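The absorption probabilities for the drunkard's walk can be computed from $B = NR$. In this sketch N is approximated by the truncated series $I + Q + Q^2 + \cdots$ rather than by inverting $I - Q$. The truncation at 200 terms and the helper names are choices made here, and the columns of R are ordered (Bar, Home).

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


# Canonical-form blocks for the drunkard's walk, transient corners 1, 2, 3.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],   # corner 1 -> Bar (corner 0)
     [0.0, 0.0],
     [0.0, 0.5]]   # corner 3 -> Home (corner 4)

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# N = I + Q + Q^2 + ...; the terms shrink geometrically, so 200 terms is plenty.
N = identity
term = identity
for _ in range(200):
    term = matmul(term, Q)
    N = matadd(N, term)

B = matmul(N, R)
for row in B:
    print(row)
# Rows are close to (0.75, 0.25), (0.5, 0.5), (0.25, 0.75)
```

Each row of B sums to 1, since the walk is eventually absorbed at either the bar or home.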
Absorbing Markov Chains
– Given chain starts at a particular state,
what is the expected number of steps
before being absorbed? Row sums of the fundamental matrix: $t = Nc$
– Given chain starts at a particular state,
what is the probability it will be absorbed
by a particular absorbing state? $B = NR$
Interesting Markov Chain use
Sentence Creator
• Feed text into Markov chain to create
transition matrix
– Holds the probability of going from word i
to word j in a sentence
• Start at a particular word in the chain
and use distributions to create new sentences
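A minimal sketch of such a sentence creator is shown below. The corpus string, function names, and seed are hypothetical. Rather than storing an explicit matrix, each word keeps a list of the words that followed it in the text, so a uniform draw from that list has the same effect as sampling from the word's row of transition probabilities.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)  # repeats encode the probabilities
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from `start`, producing at most `length` words."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # word never had a successor in the text
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical toy corpus; a real run would feed in whole books.
corpus = "the cat sat on the mat and the dog sat on the rug and the cat ran"
chain = build_chain(corpus)
sentence = generate(chain, start="the", length=8, rng=random.Random(0))
print(sentence)
```

Every adjacent word pair in the output occurs somewhere in the source text, which is why the generated sentences look locally plausible but drift in meaning.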
Sentence Creator
Dracula + Huckleberry Finn:
This afternoon I don't know of
humbug talky-talk, just set in, and
perpetually violent. Then I saw,
and looking tired them pens was
a few minutes our sight.