11 - Markov Chains
Jim Vallandingham
Outline
• Irreducible Markov Chains
– Outline of Proof of Convergence to Stationary
Distribution
– Convergence Example
– Reversible Markov Chain
• Monte Carlo Methods
– Hastings-Metropolis Algorithm
– Gibbs Sampling
– Simulated Annealing
• Absorbing Markov Chains
Stationary Distribution
• As n → ∞, P^n approaches a limit matrix:
    \lim_{n \to \infty} P^n = (1, 1, …, 1)′ π′
• Each row of this limit is the stationary distribution π
Stationary Dist. Example
• Long-term averages:
– 24% time spent in state E1
– 39% time spent in state E2
– 21% time spent in state E3
– 17% time spent in state E4
Stationary Distribution
• Any finite, aperiodic irreducible Markov
chain will converge to a stationary
distribution
– Regardless of starting distribution
• Outline of Proof requires linear algebra
– Appendix B.19
L.A. : Eigenvalues
• Let P be an s × s matrix.
• P has s eigenvalues λ_1, λ_2, …, λ_s
  – Found as the s solutions to det(P − λI) = 0
  – Assume all eigenvalues of P are distinct
L.A. : left & right eigenvectors
• Corresponding to each eigenvalue
– Is a right eigenvector – And a left eigenvector – For which:
– Assume they are normalized:
L.A. : Spectral Expansion
• Can express P in terms of its eigenvectors and eigenvalues:
    P = λ_1 r_1 l_1′ + λ_2 r_2 l_2′ + … + λ_s r_s l_s′
• Called a spectral expansion of P
L.A. : Spectral Expansion
• If λ is an eigenvalue of P with corresponding left and right eigenvectors l′ & r,
• then λ^n is an eigenvalue of P^n with the same left and right eigenvectors l′ & r
L.A. : Spectral Expansion
• Implies the spectral expansion of P^n can be written as:
    P^n = λ_1^n r_1 l_1′ + λ_2^n r_2 l_2′ + … + λ_s^n r_s l_s′
Outline of Proof
• Going back to the proof…
  – P is the transition matrix of a finite, aperiodic, irreducible Markov chain
• P has one eigenvalue, λ_1, equal to 1
  – All other eigenvalues have absolute value < 1
Outline of Proof
• Choosing left and right eigenvectors of λ_1 = 1
  – Requirements:
      l_1 is a probability vector (its entries sum to 1)
      l_1′ r_1 = 1 (normalization)
  – Since every row of P sums to 1, the right eigenvector can be taken as r_1 = (1, 1, …, 1)′
  – l_1 also satisfies:
      l_1′ P = l_1′
    (definition of a left eigenvector with eigenvalue 1)
Outline of Proof
• l_1′ P = l_1′ is the same equation satisfied by the stationary distribution, π′P = π′
• Also:
  – It can be shown that there is a unique solution of this equation that also satisfies
      l_11 + l_12 + … + l_1s = 1
    so l_1′ = π′, so that l_1 is the stationary distribution
Outline of Proof
• P^n gives the n-step transition probabilities.
• Spectral expansion of P^n is:
    P^n = r_1 l_1′ + λ_2^n r_2 l_2′ + … + λ_s^n r_s l_s′
  (only one eigenvalue is = 1; the rest have absolute value < 1)
• So as n increases, P^n approaches r_1 l_1′ = (1, 1, …, 1)′ π′, the matrix with π in every row
Convergence Example
• (Worked example on the slides: a specific transition matrix P.)
• P has one eigenvalue equal to 1; all its other eigenvalues are less than 1 in absolute value
• Its left & right eigenvectors satisfying the normalization requirements give l_1 = π, the stationary distribution
• In the spectral expansion of P^n, every term with |λ_i| < 1 goes to 0 as n grows, leaving the stationary distribution in each row
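A minimal numeric sketch of this convergence using NumPy. The 3-state matrix here is my own illustration, not the matrix from the slides: the left eigenvector for eigenvalue 1, scaled to sum to 1, is the stationary distribution, and P^n converges to a matrix with that distribution in every row.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Left eigenvectors of P are the right eigenvectors of P transpose.
eigvals, eigvecs = np.linalg.eig(P.T)
print(np.round(eigvals, 4))   # one eigenvalue is 1, the rest are smaller in size

# Left eigenvector for eigenvalue 1, scaled to sum to 1: the stationary distribution.
i = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()
print("stationary:", np.round(pi, 4))

# Every row of P^n approaches pi as n grows.
print(np.round(np.linalg.matrix_power(P, 50), 4))
```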
Reversible Markov Chains
• Typically we move forward in ‘time’ in a Markov chain
  – 1 → 2 → 3 → … → t
• What about moving backward in this chain?
  – t → t−1 → t−2 → … → 1
Reversible Markov Chains
• (Figure: a phylogenetic tree with an Ancestor evolving into Species A and Species B.)
Reversible Markov Chains
• Have a finite, irreducible, aperiodic Markov chain
  – with stationary distribution π
  – During t transitions, the chain will move through states:
      X_1, X_2, …, X_t
• Reverse chain
  – Define Y_j = X_{t−j+1}
  – Then the reverse chain will move through states:
      Y_1 = X_t, Y_2 = X_{t−1}, …, Y_t = X_1
Reversible Markov Chains
• Want to show that the structure determining the reverse-chain sequence is also a Markov chain
• A typical element q_{ij} of its transition matrix Q is found from the typical element p_{ij} of P, using:
    q_{ij} = π_j p_{ji} / π_i
Reversible Markov Chains
• Shown by using Bayes’ rule to invert the conditional probability:
    q_{ij} = P(X_t = j | X_{t+1} = i)
           = P(X_{t+1} = i | X_t = j) P(X_t = j) / P(X_{t+1} = i)
           = p_{ji} π_j / π_i
Intuitively:
  The future is independent of the past, given the present
  ⇔ The past is independent of the future, given the present
Reversible Markov Chains
• The stationary distribution of the reverse chain is still π
• Follows from the stationary distribution property:
    Σ_i π_i q_{ij} = Σ_i π_j p_{ji} = π_j
Reversible Markov Chains
• The Markov chain is said to be reversible if Q = P
• This only holds if
    π_i p_{ij} = π_j p_{ji}   for all i, j
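A small sketch of checking this reversibility (detailed-balance) condition numerically, assuming P and its stationary distribution π are already in hand; the 2-state matrix is an arbitrary illustration.

```python
import numpy as np

def is_reversible(P, pi, tol=1e-10):
    """True if pi[i] * P[i, j] == pi[j] * P[j, i] for all i, j."""
    flow = pi[:, None] * P   # flow[i, j] = pi_i * p_ij
    return np.allclose(flow, flow.T, atol=tol)

# A symmetric walk on 2 states is reversible:
P = np.array([[0.7, 0.3],
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])
print(is_reversible(P, pi))  # True
```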
Monte Carlo Methods
Markov Chain Monte Carlo
• Class of algorithms for sampling from probability distributions
  – Involve constructing a Markov chain
  – Want it to have stationary distribution π, the target
  – The state of the chain after a large number of steps is used as a sample of the desired distribution
• We discuss the Hastings-Metropolis algorithm and two related methods
  – Gibbs Sampling
  – Simulated Annealing
Basic Problem
• Find transition matrix P such that
– Its stationary distribution is the target
distribution
• Know that Markov chain will converge to
stationary distribution, regardless of
initial distribution
– How can we find such a P whose stationary
distribution is the target distribution?
Basic Idea
• Construct a transition matrix Q
  – the “candidate-generating matrix”
  – Modify it to have the correct stationary distribution
• Modification involves inserting factors a_{ij}
  – So that p_{ij} = q_{ij} a_{ij} for i ≠ j
  – There are various ways of picking the a’s
Hastings-Metropolis
• Goal: construct an aperiodic, irreducible Markov chain
• Having the prescribed stationary distribution π
• Produces a correlated sequence of draws from the target density, which may be difficult to sample by a classical independence method
Hastings-Metropolis
Process:
• Choose a set of constants a_{ij}
  – Such that 0 < a_{ij} ≤ 1
  – And a_{ij} = min(1, (π_j q_{ji}) / (π_i q_{ij}))
• Define p_{ij} = q_{ij} a_{ij} for i ≠ j
  – With probability a_{ij} : accept the state change
  – Otherwise : reject the state change; the chain doesn’t change value
    (so p_{ii} = 1 − Σ_{j≠i} p_{ij})
Hastings-Metropolis Example
π = (.4  .6)

Q =       1    2
     1   .5   .5
     2   .9   .1
Hastings-Metropolis Example
π = (.4  .6)

Q =       1    2
     1   .5   .5
     2   .9   .1

P =       1    2
     1   .5   .5
     2  .33  .67
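A sketch of how P above is computed from Q and π with the a_{ij} factors; the matrices match the slide example, and powers of P show convergence to π.

```python
import numpy as np

pi = np.array([0.4, 0.6])
Q = np.array([[0.5, 0.5],
              [0.9, 0.1]])

s = len(pi)
P = np.zeros((s, s))
for i in range(s):
    for j in range(s):
        if i != j:
            # Hastings-Metropolis factor a_ij = min(1, pi_j q_ji / (pi_i q_ij))
            a = min(1.0, (pi[j] * Q[j, i]) / (pi[i] * Q[i, j]))
            P[i, j] = Q[i, j] * a
    P[i, i] = 1.0 - P[i].sum()   # leftover probability: chain stays in state i

print(np.round(P, 2))                              # [[.5 .5] [.33 .67]]
print(np.round(np.linalg.matrix_power(P, 50), 3))  # rows approach (.4, .6)
```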
Hastings-Metropolis Example
π = (.4  .6)

P =        1     2
      1   .5    .5
      2  .33   .67

P^2 =      1     2
      1  .415  .585
      2  .386  .614

P^50 =     1     2
      1  .398  .602
      2  .398  .602
Algorithmic Description
1. Start with state E_1, then iterate
2. Propose E′ from q(E_t, E′)
3. Calculate the ratio
     a = [π(E′) q(E′, E_t)] / [π(E_t) q(E_t, E′)]
4. If a ≥ 1,
   – Accept: E_{t+1} = E′
5. Else
   – Accept with probability a
   – If rejected, E_{t+1} = E_t
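A minimal sketch of this algorithm in Python, run on the same 2-state example; the step count and seed are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.4, 0.6])
Q = np.array([[0.5, 0.5],
              [0.9, 0.1]])

state = 0                      # 1. start in state E_1
samples = []
for _ in range(100_000):
    # 2. propose E' from q(E_t, .)
    prop = rng.choice(2, p=Q[state])
    # 3. ratio a = pi(E') q(E', E_t) / (pi(E_t) q(E_t, E'))
    a = (pi[prop] * Q[prop, state]) / (pi[state] * Q[state, prop])
    # 4./5. accept if a >= 1, else accept with probability a
    if a >= 1 or rng.random() < a:
        state = prop           # accepted: E_{t+1} = E'
    samples.append(state)      # if rejected, the chain keeps its value

# Long-run fraction of time in each state approaches pi = (.4, .6).
print(np.bincount(samples) / len(samples))
```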
Gibbs Sampling
Definitions
• Let Y = (Y_1, Y_2, …, Y_d) be the random vector of interest
• Let π(y) be the distribution of Y
• Assume each component Y_i takes only finitely many values
• We define a Markov chain whose states are the possible values of Y
Gibbs Sampling
Process
• Enumerate the possible vectors in some order
  – 1, 2, …, s
• Identify vector j with the jth state of the chain
• Transition probabilities p_{ij} :
  – 0 : if vectors i & j differ in more than one component
  – If they differ in at most one component (say the first, with new value y_1*), then p_{ij} is proportional to the conditional probability of y_1* given the values of the remaining components
Gibbs Sampling
• Assume a joint distribution p(X, Y)
• Looking to sample k values of X
• Begin with a value y_0
• Sample x_i using p(X | Y = y_{i−1})
• Once x_i is found, use it to find y_i
  – p(Y | X = x_i)
• Repeat k times
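A minimal sketch of this two-variable process. The joint distribution here is my own choice, a standard bivariate normal with correlation ρ, picked because both conditionals are then simple univariate normals, N(ρ · other, 1 − ρ²).

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
sd = np.sqrt(1 - rho**2)

y = 0.0                          # begin with a value y_0
xs, ys = [], []
for _ in range(50_000):          # repeat k times
    x = rng.normal(rho * y, sd)  # sample x_i from p(X | Y = y_{i-1})
    y = rng.normal(rho * x, sd)  # then use x_i to sample y_i from p(Y | X = x_i)
    xs.append(x)
    ys.append(y)

# The draws reproduce the joint: means ~ 0, correlation ~ rho.
print(np.mean(xs), np.mean(ys), np.corrcoef(xs, ys)[0, 1])
```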
Visual Example
Gibbs Sampling
• Allows us to deal with univariate conditional distributions
• Instead of complex joint distributions
• The chain has stationary distribution π
Why is this Hastings-Metropolis?
• If we define the candidate-generating probabilities q_{ij} to be the Gibbs conditional probabilities
• Then for Gibbs the acceptance factor works out to
    a_{ij} = min(1, (π_j q_{ji}) / (π_i q_{ij})) = 1
• So Gibbs sampling is Hastings-Metropolis in which a is always 1 (no proposal is ever rejected)
Simulated Annealing
• Goal: find the (approximate) minimum of some positive function
  – Function defined on an extremely large number of states, s
• And to find those states where this function is minimized
• Value of the function for state E_j is:
    f(E_j) = f_j
Simulated Annealing
Process
• Construct a neighborhood for each state
  – The set of states “close” to that state
  – The variable in the Markov chain can move to a neighbor in one step
  – Moves outside the neighborhood are not allowed
Simulated Annealing
• Requirements of neighborhoods
  – If E_j is in the neighborhood of E_m, then E_m is in the neighborhood of E_j
  – The number of states in a neighborhood (N) is the same for every state
  – Neighborhoods are linked so that the chain can eventually make it from any E_j to any E_m
  – If in state E_j, the next move must be within the neighborhood of E_j
Simulated Annealing
• Uses a positive parameter T (the “temperature”)
• Aim is to have the stationary probability of each Markov chain state E_j be:
    π_j = K e^{−f_j / T}
  – K : constant to ensure the probabilities sum to 1
  – States with low values of f() get high stationary probability, so the chain visits them often enough for them to become recognizable
Simulated Annealing
• Large T values
– All states in current states neighborhood
are chosen with ~ equal probability
– Stationary distribution of chain tends to be
uniform
• Small T values
– Different states in neighborhoods have
much different stationary distribution
probabilities
– Too small might get stuck in local maxima
Simulated Annealing
• The art of picking the T value
  – Want rapid movement from one neighborhood to another
    • (Large T)
  – Want to pick out the states in each neighborhood with large stationary probabilities
    • (Small T)
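A minimal sketch of simulated annealing under the requirements above, on a made-up function f over states 0…s−1 with two-neighbor (wrap-around) neighborhoods, so N = 2 for every state. The Metropolis-style acceptance rule, the cooling schedule, and f itself are all illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
s = 200
# A positive function with a global minimum near state 120 (illustrative).
f = lambda j: 2.0 + np.sin(j / 7.0) + 0.01 * (j - 120) ** 2

state = 0
T = 10.0
for step in range(20_000):
    # Propose a move only within the neighborhood (the two adjacent states).
    neighbor = (state + rng.choice([-1, 1])) % s
    # Accept with Metropolis probability under pi_j = K * exp(-f(E_j) / T).
    a = np.exp((f(state) - f(neighbor)) / T)
    if rng.random() < min(1.0, a):
        state = neighbor
    T = max(0.05, T * 0.9995)   # slowly lower the temperature

print(state, f(state))   # should end near the global minimum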
SA Example
Absorbing Markov Chains
Absorbing Markov Chains
• Absorbing state:
  – A state which is impossible to leave
  – p_ii = 1
• Transient state:
  – A non-absorbing state in an absorbing chain
Absorbing Markov Chains
• Questions to answer:
– Given chain starts at a particular state,
what is the expected number of steps
before being absorbed?
– Given chain starts at a particular state,
what is the probability it will be absorbed
by a particular absorbing state?
General Process
• Use the explanation from
  – Introduction to Probability – Grinstead & Snell
• Convert the matrix into canonical form
  – Use the conversion to answer these questions
• Use a simple example throughout
Canonical Form
• Rearrange the states so that the transient states come first in P:

    P = | Q   R |
        | 0   I |

  – Q : t × t matrix (transient → transient)
  – R : t × r matrix (transient → absorbing)
  – 0 : r × t zero matrix
  – I : r × r identity matrix
  – t : # of transient states,  r : # of absorbing states
Drunkard’s Walk Example
• A man is walking home from a bar
  – 4 blocks to walk
  – 5 states total
• Absorbing states:
  – Corner 4 – Home
  – Corner 0 – Bar
• At each block he has an equal probability of going forward or backward
Drunkard’s Walk : Canonical Form
• With transient states 1, 2, 3 first and absorbing states 0 (bar) and 4 (home) last:

    Q = | 0   ½   0 |      R = | ½   0 |
        | ½   0   ½ |          | 0   0 |
        | 0   ½   0 |          | 0   ½ |
Fundamental Matrix
• For an absorbing Markov chain P
• The fundamental matrix for P is:
    N = (I − Q)^{-1}
• The entry n_ij gives the expected number of times that the process is in the transient state s_j if it started in the transient state s_i
  – (Before being absorbed)
Proof
• Let s_i and s_j be two transient states
• Let X^(k) be a random variable:
  – 1 : if the chain is in state s_j after k steps
  – 0 : otherwise
• Then P(X^(k) = 1) = q^(k)_ij, the (i, j) entry of Q^k
• Expected # of times the chain is in state s_j in the first n steps:
    E[X^(0) + X^(1) + … + X^(n)] = q^(0)_ij + q^(1)_ij + … + q^(n)_ij
• As n goes to infinity:
    I + Q + Q^2 + … = (I − Q)^{-1} = N
Example Fundamental Matrix
• For the drunkard’s walk:

    N = (I − Q)^{-1} = | 1.5   1   0.5 |
                       |  1    2    1  |
                       | 0.5   1   1.5 |
Time to Absorption
• Expected number of steps before the chain is absorbed
• t_i is the expected number of steps before the chain is absorbed,
  – Given it started in s_i
• In matrix form:
    t = N c
  – t : vector with elements t_i
  – c : column vector of 1’s
Proof
• The sum of the ith row of N:
  – Expected number of times in each transient state, totaled over all transient states, for a given starting state s_i
  – This total is the expected time required before absorption
  – And that is exactly what each entry t_i of t = Nc computes
Example: Time to Absorption
• For the drunkard’s walk:
    t = N c = (3, 4, 3)′
  – Expected 3, 4, and 3 steps to absorption starting from corners 1, 2, 3
Absorption Probabilities
• b_ij : probability that the chain will be absorbed in the absorbing state s_j if it starts in the transient state s_i
• B : t × r matrix with entries b_ij
    B = N R
  – R is the other component of the canonical matrix
Proof
• Absorption in s_j starting from s_i means staying transient for some number of steps n, then moving to s_j:
    b_ij = Σ_n Σ_k q^(n)_ik r_kj = (N R)_ij
Example: Absorption Probabilities
• For the drunkard’s walk:

    B = N R = | .75  .25 |
              | .5   .5  |
              | .25  .75 |

  – e.g., starting at corner 1, the man ends at the bar with probability .75 and at home with probability .25
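A sketch computing all three quantities (N, t, B) for the drunkard’s walk with NumPy, following the canonical form above.

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],    # transient -> transient (corners 1, 2, 3)
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],         # transient -> absorbing (bar, home)
              [0.0, 0.0],
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(3) - Q)  # fundamental matrix N = (I - Q)^-1
t = N @ np.ones(3)                # expected steps to absorption
B = N @ R                         # absorption probabilities

print(np.round(N, 2))   # [[1.5 1. 0.5] [1. 2. 1.] [0.5 1. 1.5]]
print(t)                # [3. 4. 3.]
print(B)                # [[0.75 0.25] [0.5 0.5] [0.25 0.75]]
```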
Absorbing Markov Chains
– Given the chain starts at a particular state, what is the expected number of steps before being absorbed?
  • Answer: t = N c
– Given the chain starts at a particular state, what is the probability it will be absorbed by a particular absorbing state?
  • Answer: B = N R
Interesting Markov Chain Use
Sentence Creator
• Feed text into a Markov chain to create a transition matrix
  – Holds the probability of going from word i to word j in a sentence
• Start at a particular word in the chain and use the transition distributions to create new sentences (a sketch of this follows below)
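A minimal sketch of such a sentence creator: build word → next-word transition counts from input text, then walk the chain. The tiny corpus here is a stand-in for the Dracula + Huckleberry Finn text used on the slides.

```python
import random
from collections import defaultdict

text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()

# Transition "matrix" as counts: key = current word, values = observed next words.
transitions = defaultdict(list)
for w, nxt in zip(words, words[1:]):
    transitions[w].append(nxt)

# Start at a particular word and sample successors from its distribution.
random.seed(1)
word = "the"
sentence = [word]
for _ in range(8):
    if word not in transitions:   # reached a word with no recorded successor
        break
    word = random.choice(transitions[word])   # draw next word ~ p(j | i)
    sentence.append(word)
print(" ".join(sentence))
```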
Sentence Creator
Dracula + Huckleberry Finn:
This afternoon I don't know of
humbug talky-talk, just set in, and
perpetually violent. Then I saw,
and looking tired them pens was
a few minutes our sight.
End