
Slides for Introduction to Stochastic Search
and Optimization (ISSO) by J. C. Spall
CHAPTER 16
MARKOV CHAIN MONTE CARLO
•Organization of chapter in ISSO
–Background on MCMC
–Metropolis-Hastings algorithm
–Numerical example of Metropolis-Hastings
–Gibbs sampling
–Numerical example of Gibbs sampling
–Optional in these slides: Non-Gaussian state estimation (not in ISSO)
Background
• Process generating random vector X
• Want to compute E[f(X)] for function f(·)
• Standard method for approximating E[f(X)] is to generate many independent sample values of X and compute sample mean of f(X) values (see the sketch below)
• Only useful in “trivial” cases where X can be
generated directly
• Many practical problems have non-trivial
distribution for X
– E.g., state in nonlinear/non-Gaussian state-space
model
16-2
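For contrast with the MCMC methods that follow, here is a minimal sketch of the standard independent-sampling approach mentioned above, assuming X can be generated directly; the bivariate normal example and the choice of f are illustrative assumptions, not part of the slides.

```python
# Standard Monte Carlo: draw independent samples of X directly and average f(X).
# The Gaussian target and the sample size are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sum(x)  # example f; any scalar-valued function of X works

# Only possible when X can be generated directly (here, a standard bivariate normal)
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2), size=10_000)
estimate = np.mean([f(x) for x in samples])  # sample mean approximates E[f(X)]
print(estimate)
```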
Markov Chains
• Not necessary to generate independent X to
estimate E[f(X)]
• Consider dependent sequence X0, X1, X2,…
• Generate Xk+1 according to “easy” conditional
distribution for {Xk+1|Xk}
– {Xk} process is a Markov chain
– Xk dependence on fixed number of early states
disappears as k gets large
• Above implies distribution of Xk approaches a
stationary form as k gets large
– Stationary form corresponds to target distribution
(density) p(·) if conditional distribution chosen properly
16-3
Ergodic Averaging
• Let M denote the “burn-in” period for the Markov
chain
• The ergodic average of n – M values of f(X) with Xk
generated via a Markov chain is
$$\frac{1}{n - M} \sum_{k = M + 1}^{n} f(X_k)$$
• Summands above are dependent via the Markov
property for the {Xk}
• Above sum approaches E[f(X)] as n gets large by ergodic theorem (see the sketch below)
16-4
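A minimal sketch of the ergodic average above, assuming a chain X_0, ..., X_n is already stored in an array `chain` and a burn-in length M has been chosen (both names are illustrative assumptions).

```python
import numpy as np

def ergodic_average(chain, f, M):
    """Average f(X_k) over k = M+1, ..., n, discarding the burn-in samples."""
    post_burn_in = chain[M + 1:]                  # X_{M+1}, ..., X_n
    return np.mean([f(x) for x in post_burn_in])  # (1/(n - M)) * sum of f(X_k)
```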
Metropolis-Hastings (M-H) Algorithm
• M-H algorithm is one of two most popular forms for
MCMC (other is Gibbs sampling)
• M-H relies on proposal distribution and
Metropolis criterion
• Let proposal distribution be q(·|·); used to generate
candidate points W ~ q(·|X = x)
• Candidate point W = w is accepted with probability
given by Metropolis criterion:
$$\rho(x, w) = \min\left\{ \frac{p(w)\, q(x \mid w)}{p(x)\, q(w \mid x)},\; 1 \right\}$$
• In practice, in going from Xk to Xk+1, x above is Xk
and W becomes Xk+1 if W is accepted
16-5
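A small sketch of the Metropolis criterion above, assuming the target density p(·) and the proposal density q(·|·) are supplied as Python callables, with q(a, b) standing for q(a | b); these names and the calling convention are assumptions for illustration.

```python
def acceptance_probability(x, w, p, q):
    """rho(x, w) = min{ p(w) q(x|w) / [p(x) q(w|x)], 1 }."""
    ratio = (p(w) * q(x, w)) / (p(x) * q(w, x))
    return min(ratio, 1.0)
```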
M-H Algorithm for Estimating E[f(X)]
Step 0 (initialization) Choose length of “burn-in” period M
and initial state X0. Set k = 0.
Step 1 (candidate point) Generate a candidate point W
according to proposal distribution q(|Xk).
Step 2 (accept/reject) Generate point U from U(0, 1)
distribution. Set Xk+1 = W if U ≤ ρ(Xk, W) (Metropolis
criterion). Otherwise set Xk+1 = Xk.
Step 3 (iterate) Repeat Steps 1 and 2 until XM is
available. Terminate “burn-in” process and proceed to
step 4 with Xk = XM.
Steps 4–6 (ergodic average) Repeat process and
compute average of f(XM+1),…, f(Xn). This ergodic average
is estimate of E[f(X)].
16-6
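A minimal Python sketch of Steps 0–6 above, assuming user-supplied callables: a target density p, a proposal density q(a, b) = q(a | b), a proposal sampler sample_q(x) drawing W ~ q(·|x), and the function f. All names and signatures are illustrative, not from ISSO.

```python
import numpy as np

def metropolis_hastings(p, q, sample_q, f, x0, M, n, rng=None):
    """Return the ergodic average of f over X_{M+1}, ..., X_n (Steps 0-6)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)          # Step 0: initial state X_0
    values = []
    for k in range(n):
        w = sample_q(x)                      # Step 1: candidate point W ~ q(.|X_k)
        rho = min(p(w) * q(x, w) / (p(x) * q(w, x)), 1.0)
        if rng.uniform() <= rho:             # Step 2: Metropolis criterion
            x = w                            # accept: X_{k+1} = W
        # otherwise X_{k+1} = X_k (x is left unchanged)
        if k >= M:                           # Steps 4-6: collect post burn-in values
            values.append(f(x))
    return np.mean(values)                   # ergodic average estimates E[f(X)]
```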
Example: Estimating E[f(X)] from a
Bivariate Normal Distribution
(Example 16.1 from ISSO)
 0   1 0.9 
• Suppose X ~ N    , 


 0  0.9 1  
• Use M-H to estimate sum of the two mean
components (true value = 0): f(X) = [1, 1]X
• Standard (unit length) uniform proposal distribution
and burn-in period of M = 500
• Following plot shows three independent runs
– Acceptance rate (Metropolis criterion) about 70%
– Better performance possible with lower acceptance
rate (requires “tuning”—not always feasible in practice)
16-7
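An illustrative use of the metropolis_hastings sketch above on this example: bivariate normal target with correlation 0.9, f(X) = [1, 1]X, and burn-in M = 500. Reading the "unit length" uniform proposal as a symmetric random walk uniform on a unit square centered at the current point is an assumption, as are the run length and starting point.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)

def p(x):
    return np.exp(-0.5 * x @ cov_inv @ x)    # unnormalized N(0, cov) density

sample_q = lambda x: x + rng.uniform(-0.5, 0.5, size=2)  # uniform random-walk proposal
q = lambda a, b: 1.0        # proposal is symmetric, so the q-ratio cancels
f = lambda x: x[0] + x[1]   # f(X) = [1, 1] X; true value E[f(X)] = 0

print(metropolis_hastings(p, q, sample_q, f, x0=[0.0, 0.0], M=500, n=15_500, rng=rng))
```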
Example (cont’d): M-H Algorithm with
Uniform Proposal Distribution;
Mean Zero Target
[Plot: three independent M-H runs; vertical axis from –3 to 3; horizontal axis: Iterations (Post Burn-In), 0 to 15,000]
16-8
Gibbs Sampling
• Gibbs sampling is implementation of M-H on element-by-element basis
• Gibbs sampling uniquely designed for multivariate
problems, i.e., dim(X) ≥ 2
• Gibbs sampling based on idea of “full conditional”
distributions
– ith full conditional distribution is conditional distribution
for ith component of X conditioned on most recent
values of all other components of X
• In contrast to M-H, Gibbs sampling updates
components of X one-at-a-time
16-9
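A minimal sketch of one element-by-element Gibbs scan, assuming the user supplies a list full_conditionals whose i-th entry, called as full_conditionals[i](x, rng), draws the i-th component of X given the most recent values of all other components; these names are illustrative assumptions.

```python
import numpy as np

def gibbs_sweep(x, full_conditionals, rng):
    """One full Gibbs scan: update each component of x from its full conditional."""
    x = np.array(x, dtype=float)
    for i, sample_ith in enumerate(full_conditionals):
        x[i] = sample_ith(x, rng)   # draw X_i given current values of the others
    return x
```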
Relationship of Gibbs Sampling to M-H
• Gibbs sampling is special case of M-H on element-by-element basis
• Gibbs sampling and M-H developed largely independently of each other
– M-H introduced in Hastings (1970) as implementation of
Metropolis sampling from statistical physics
– Gibbs introduced in Tanner and Wong (1987) and Gelfand
and Smith (1990), with special focus on Bayesian problems
• Gibbs sampling uses particular form of full conditionals as
the proposal distribution in M-H
– Eliminates need to “tune” proposal distribution as in general
M-H
– Requires stronger assumptions to construct full conditionals
– Acceptance rate for new points is 100%
16-10
Example: Truncated Exponential Distribution
(Example 16.5 from ISSO)
• Consider two-variable problem where conditional
random variables {X|Y} and {Y|X} have exponential
distributions over finite interval (length = 5)
– Distributions for {X|Y} and {Y|X} are two full conditionals for
Gibbs sampling
• Suppose interested in marginal distribution for X
• Can determine exact marginal distribution for X
– Useful for comparison purposes; not usually available in
practice
• Plot shows Gibbs output relative to true density for X
– Histogram based on terminal X value from 5000
independent replications
– Burn-in period of M = 10; terminal value occurs 30 iterations
past burn-in
16-11
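A sketch of Gibbs sampling for this example, assuming full conditionals of the standard truncated-exponential form p(x|y) ∝ exp(−yx) on (0, 5), and symmetrically for p(y|x). The slides state only that the conditionals are exponential on an interval of length 5, so this exact form, and the starting point, are assumptions.

```python
import numpy as np

B = 5.0  # length of the truncation interval

def sample_trunc_exp(rate, rng):
    """Inverse-CDF draw from a density proportional to exp(-rate * t) on (0, B)."""
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-rate * B))) / rate

def gibbs_terminal_x(M=10, extra=30, rng=None):
    """Terminal X value: burn-in of M Gibbs sweeps plus `extra` further sweeps."""
    rng = rng or np.random.default_rng()
    x, y = 1.0, 1.0                      # arbitrary starting point (assumption)
    for _ in range(M + extra):
        x = sample_trunc_exp(y, rng)     # draw X | Y = y from its full conditional
        y = sample_trunc_exp(x, rng)     # draw Y | X = x from its full conditional
    return x

# Terminal X values from 5000 independent replications, as in the histogram
rng = np.random.default_rng(0)
terminal_values = [gibbs_terminal_x(rng=rng) for _ in range(5000)]
```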
Example (cont’d): Histogram of Gibbs
Sampling Output vs. Known Density
16-12
Optional (not in ISSO): Non-Gaussian State
Estimation
• Consider state-space model with non-Gaussian
noises (xk is state; zk is measurement)
• Represent p(xk|xk–1) and p(zk|xk) as Gaussian
mixtures
– Gaussian mixtures can be used to approx. many non-Gaussian distributions
• Gibbs sampling used to estimate state based on
Gaussian full conditionals
• Further information on pp. 4344 of: Spall, J. C.
(2003), “Estimation via Markov Chain Monte Carlo,”
IEEE Control Systems Magazine, vol. 23(2), pp. 3445
16-13
Non-Gaussian State Estimation: Basic Idea
• Let θ represent parameters in Gaussian mixture
• Xn and Zn are complete collection of all (n) states and
measurements
• Gibbs sampling operates from full conditionals:
{xk | xk–1, θ, Zn} — Gaussian distribution
{θ | Xn, Zn} — non-Gaussian distribution
• Above non-Gaussian distribution known for many
cases
• Iterative sampling from above full conditionals
produces samples from p(xk| Zn) for all k
– Average the samples to get E(xk|Zn)
16-14
Concluding Remarks
• M-H and Gibbs sampling two notable examples of
MCMC
– Methods for “easy” generation of random samples and
estimates
• M-H more general, but Gibbs especially useful in
specific applications
• Not “magic”—still need relevant assumptions
• Widespread use in statistics, computer science,
simulation, etc.
• Limited current use in control and signal processing
– But non-Gaussian/nonlinear state estimation one
growing area
16-15
Exercise 16.3: Four replications of M-H
Algorithm with Mean Zero Target
16-16
Exercise 16.8: Histogram (2000
samples) and Known Density for X
16-17