Introduction to Markov Chain Monte Carlo
Fall 2013
By Yaohang Li, Ph.D.

Review
• Last Class
  – Linear Operator Equations
  – Monte Carlo
• This Class
  – Markov Chain
  – Metropolis Method
• Next Class
  – Presentations

Markov Chain
• Markov Chain
  – Consider chains whose transitions occur at discrete times
• States S1, S2, ...
• X_t is the state that the chain is in at time t
• Conditional probability
  – P(X_t = S_j | X_{t1} = S_{i1}, X_{t2} = S_{i2}, ..., X_{tn} = S_{in})
  – The system is a Markov Chain if the distribution of X_t is independent of all previous states except for its immediate predecessor X_{t-1}
  – P(X_t = S_j | X_1 = S_{i1}, X_2 = S_{i2}, ..., X_{t-1} = S_{i,t-1}) = P(X_t = S_j | X_{t-1} = S_{i,t-1})

Characteristics of Markov Chain
• Irreducible Chain
• Aperiodic Chain
• Stationary Distribution
  – A Markov Chain can gradually forget its initial state
  – and eventually converge to a unique stationary distribution
    » also called the invariant distribution
• Ergodic average
  – f̄ = (1/(n − m)) Σ_{t=m+1}^{n} f(X_t)

Target Distribution
• Target Distribution Function
  – π(x) = c e^(−h(x))
• h(x)
  – in physics, the potential function
  – in other systems, the score function
• c
  – normalization constant
    » makes sure the integral of π(x) is 1
• Presumably, all pdfs can be written in this form

Metropolis Method
• Basic Idea
  – Evolve a Markov process to achieve sampling of π
• Metropolis Algorithm (see the code sketch after the detailed balance slide)
  – Start with any configuration x_0 and iterate the following two steps
  – Step 1: Propose a random "perturbation" of the current state, i.e., x_t -> x', where x' can be seen as generated from a symmetric probability transition function Q(x_t -> x')
    » i.e., Q(x -> x') = Q(x' -> x)
    » Calculate the change Δh = h(x') − h(x)
  – Step 2: Generate a random number u ~ U[0, 1). Let x_{t+1} = x' if u <= e^(−Δh); otherwise x_{t+1} = x_t

Simple Example for Hard-Shell Balls
• Simulation
  – Uniformly distributed positions of K hard-shell balls in a box
  – The balls are assumed to have equal diameter d
• (X, Y) = {(x_i, y_i), i = 1, ..., K}
  – denotes the positions of the balls
• Target Distribution π(X, Y)
  – = constant if the balls are all in the box and have no overlaps
  – = 0 otherwise
• Metropolis algorithm
  – (a) pick a ball at random; say its position is (x_i, y_i)
  – (b) move it to a tentative position (x_i', y_i') = (x_i + ε_1, y_i + ε_2), where ε_1, ε_2 are normally distributed
  – (c) accept the proposal if it does not violate the constraints; otherwise stay put

Simulation of K Hard Balls
(figure)

Hastings' Generalization
• Metropolis Method
  – A symmetric transition rule
    » Q(x -> x') = Q(x' -> x)
• Hastings' Generalization
  – An arbitrary transition rule
    » Q(x -> x')
  – Q() is called a proposal function

Metropolis-Hastings Method
• Metropolis-Hastings Method
  – Given current state x_t, draw a candidate y from the proposal Q(x_t -> y)
  – Draw U ~ U[0, 1) and update
    » x_{t+1} = y, if U <= α(x_t, y)
    » x_{t+1} = x_t, otherwise
  – α(x, y) = min{1, π(y)Q(y -> x) / (π(x)Q(x -> y))}

Detailed Balance Condition
• Transition Kernel for the Metropolis-Hastings Algorithm
  – P(x_{t+1} | x_t) = Q(x_t -> x_{t+1}) α(x_t, x_{t+1}) + I(x_t = x_{t+1}) [1 − ∫ Q(x_t -> y) α(x_t, y) dy]
  – I(.) denotes the indicator function
    » taking the value 1 when its argument is true, and 0 otherwise
  – The first term arises from acceptance of a candidate y = x_{t+1}
  – The second term arises from rejection, over all possible candidates y
• Detailed Balance Equation
  – π(x_t) P(x_{t+1} | x_t) = π(x_{t+1}) P(x_t | x_{t+1})
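Following up on the slides above, here is a minimal Python sketch of the Metropolis step in the symmetric-proposal case, where the Hastings ratio reduces to e^(−Δh). The target h(x) = x²/2 (a standard normal up to the constant c), the Gaussian random-walk proposal, and the burn-in length m are illustrative assumptions, not taken from the slides:

    import math, random

    def h(x):
        # potential/score function; target is pi(x) = c * exp(-h(x))
        # illustrative choice: h(x) = x^2 / 2, i.e. a standard normal target
        return 0.5 * x * x

    def metropolis_step(x, step=1.0):
        # Step 1: symmetric random-walk proposal, Q(x -> x') = Q(x' -> x)
        x_new = x + random.gauss(0.0, step)
        dh = h(x_new) - h(x)
        # Step 2: accept with probability min(1, exp(-dh))
        if dh <= 0.0 or random.random() <= math.exp(-dh):
            return x_new
        return x

    # equilibration (discarded) followed by production (measured)
    x = 0.0
    m, n = 1000, 11000
    total = 0.0
    for t in range(n):
        x = metropolis_step(x)
        if t >= m:
            total += x * x
    print("ergodic average of f(x) = x^2:", total / (n - m))  # near 1 for N(0,1)

The final line is the ergodic average f̄ from the earlier slide, computed over the production phase only.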
Gibbs Sampling
• Gibbs Sampler
  – A special MCMC scheme
  – The underlying Markov Chain is constructed from a sequence of conditional distributions, chosen so that π is invariant with respect to each of these "conditional" moves
• Gibbs Sampling
  – Definition
    » X_{−i} = {X_1, ..., X_{i−1}, X_{i+1}, ..., X_n}
  – Proposal Distribution
    » Updating the ith component of X
    » Q_i(X_i -> Y_i, X_{−i}) = π(Y_i | X_{−i})

MCMC Simulations
• Two Phases
  – Equilibration
    » Try to reach the equilibrium distribution
    » Measurements of steps before equilibrium are discarded
  – Production
    » After reaching equilibrium
    » Measurements become meaningful
• Question
  – How fast can the simulation reach equilibrium?

Autocorrelations
• Given a time series of N measurements O_1, ..., O_N from a Markov process
• Estimator of the expectation value: Ō = (1/N) Σ_{i=1}^{N} O_i
• Autocorrelation function
  – Definition: C(t) = ⟨O_i O_{i+t}⟩ − ⟨O_i⟩⟨O_{i+t}⟩
  – where ⟨·⟩ denotes the expectation value and the process is assumed stationary

Behavior of Autocorrelation Function
• Autocorrelation function
  – Asymptotic behavior for large t: C(t) ~ e^(−t/τ_exp)
  – τ_exp is called the (exponential) autocorrelation time
  – τ_exp is related to the second largest eigenvalue of the transition matrix
  – The variance is a special case of the autocorrelations: σ²(O) = C(0)

Integrated Autocorrelation Time
• Variance of the estimator: σ²(Ō) = (σ²(O)/N) τ_int
• Integrated autocorrelation time: τ_int = 1 + 2 Σ_{t=1}^{∞} C(t)/C(0)
• When the measurements are uncorrelated, τ_int = 1 and the variance reduces to the independent-sampling result σ²(O)/N

Reaching Equilibrium
• How many steps does it take to reach equilibrium?

Example of Autocorrelation
(figures: target distribution, and autocorrelations for small (d = 1), medium (d = 4), and large (d = 8) step sizes)

Microstate and Macrostate
• Macrostate
  – Characterized by the following fixed values
    » N: number of particles
    » V: volume
    » E: total energy
• Microstate
  – A configuration by which the specific macrostate (E, V, N) can be realized
  – accessible
    » a microstate is accessible if its properties are consistent with the specified macrostate

Ensemble
• A system of N particles is characterized by the macro variables N, V, E
  – macrostate refers to a set of these variables
• There are many microstates which give the same values of {N, V, E}, or macrostate
  – microstates refer to points in phase space
• All these microstates constitute an ensemble

Microcanonical Ensemble
• Isolated System
  – N particles in volume V
  – Total energy is conserved
  – External influences can be ignored
• Microcanonical Ensemble
  – The set of all microstates corresponding to the macrostate with values N, V, E is called the microcanonical ensemble
• Generating the Microcanonical Ensemble
  – Start with an initial microstate
  – Use the demon algorithm to produce the other microstates

The Demon Algorithm
• An extra degree of freedom, the demon, goes to every particle and exchanges energy with it
• Demon Algorithm (a runnable sketch follows)
  – For each Monte Carlo step (for j = 1 to mcs)
  – for i = 1 to N
    » Choose a particle at random and make a trial change in its state
    » Compute ΔE, the change in the energy of the system due to the change
    » If ΔE <= 0, the system gives the amount |ΔE| to the demon, i.e., E_d = E_d − ΔE, and the trial configuration is accepted
    » If ΔE > 0 and the demon has sufficient energy for this change (E_d >= ΔE), the demon gives the necessary energy to the system, i.e., E_d = E_d − ΔE, and the trial configuration is accepted. Otherwise, the trial configuration is rejected and the configuration is not changed
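A minimal Python sketch of the demon algorithm, applied to the 1-D classical ideal gas discussed on a later slide (for an ideal gas the trial change acts on a velocity, since the energy is purely kinetic). The initial velocities, the trial-change size delta, and the run lengths are illustrative assumptions, not from the slides:

    import random

    N = 100                       # number of particles
    v = [1.0] * N                 # initial velocities; total energy fixes the macrostate
    Ed = 0.0                      # demon energy, must stay non-negative
    delta = 0.4                   # maximum size of a trial velocity change

    def sweep():
        # one Monte Carlo step per particle (mcs)
        global Ed
        for _ in range(N):
            i = random.randrange(N)                     # choose a particle at random
            v_new = v[i] + random.uniform(-delta, delta)
            dE = 0.5 * (v_new * v_new - v[i] * v[i])    # kinetic-energy change (m = 1)
            if dE <= 0.0 or Ed >= dE:                   # demon absorbs or supplies |dE|
                Ed -= dE                                # Ed = Ed - dE in both accepted cases
                v[i] = v_new
            # otherwise the trial is rejected and the configuration is unchanged

    for _ in range(200):          # equilibration
        sweep()
    mean_Ed = 0.0
    for _ in range(2000):         # production: <Ed> should approach kT of the gas
        sweep()
        mean_Ed += Ed / 2000
    print("mean demon energy:", mean_Ed)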
Monte Carlo Step
• In the process of changing the microstate, the program attempts to change the state of each particle. This is called a sweep, or one Monte Carlo step per particle (mcs)
• In each mcs the demon tries to change the energy of each particle once
• mcs provides a useful unit of 'time'

MD and MC
• Molecular Dynamics
  – a system of many particles with E, V, and N fixed, simulated by integrating Newton's equations of motion for each particle
  – time-averaged values of the physical quantities of interest
• Monte Carlo
  – sampling the ensemble
  – ergodic-averaged values of the physical quantities of interest
• How can we know that the Monte Carlo simulation of the microcanonical ensemble yields results equivalent to the time-averaged results of molecular dynamics?
  – Ergodic hypothesis
    » the two have not been shown identical in general
    » they have been found to yield equivalent results in all cases of interest

One-Dimensional Classical Ideal Gas
• Ideal Gas
  – The energy of a configuration is independent of the positions of the particles
  – The total energy is the sum of the kinetic energies of the individual particles
• Interesting physical quantity
  – velocity
• A Java Program of the 1-D Classical Ideal Gas
  – http://electron.physics.buffalo.edu/gonsalves/phy411506_spring01/Chapter16/feb23.html
  – uses the demon algorithm

Physical Interpretation of Demon
• The demon may be thought of as a thermometer
• A simple MC simulation of the ideal gas shows
  – the mean demon energy is twice the mean kinetic energy per particle of the gas
• The ideal gas and demon may be thought of
  – as a heat bath (the gas) characterized by temperature T
    » related to its average kinetic energy
  – and a subsystem, or laboratory system (the demon)
  – the temperature of the demon is that of the bath
• The demon is in macrostate T
  – its canonical ensemble of microstates is the set of demon energies E_d

Canonical ensemble
• Normally a system is not isolated
  – it is surrounded by a much bigger system
  – and exchanges energy with it
• The composite system of laboratory system and surroundings may be considered isolated
• Analogy:
  – lab system <=> demon
  – surroundings <=> ideal gas
• The surroundings have temperature T, which also characterizes the macrostate of the lab system

Boltzmann distribution
• In the canonical ensemble the demon's energy fluctuates about a mean energy <E_d>
• The probability that the demon has energy E_d is given by the Boltzmann distribution
  – P(E_d) = (1/Z) e^(−E_d/kT)
  – proved in statistical mechanics
  – shown by the output of the demon energy
• Mean demon energy
  – <E_d> = ∫₀^∞ E e^(−E/kT) dE / ∫₀^∞ e^(−E/kT) dE = kT

Phase transitions
• Examples:
  – gas - liquid, liquid - solid
  – magnets, pyroelectrics
  – superconductors, superfluids
• Below a certain temperature Tc the state of the system changes structure
• Characterized by an order parameter
  – zero above Tc and non-zero below Tc
  – e.g. magnetisation M in magnets, the gap in superconductors

Ising model
• Consider a set of spins localized on the sites of a lattice. Each spin s_i is allowed to point either up (s_i = +1) or down (s_i = −1)
• Each spin i interacts with an external field B_i and with the other spins via an exchange interaction J_ij
• The simplest form of the Hamiltonian is (a code sketch of this energy follows):
  – H = −J Σ_⟨ij⟩ s_i s_j − B Σ_{i=1}^{N} s_i
• We have assumed isotropic J and uniform B; ⟨ij⟩ denotes that only nearest-neighbour pairs are to be counted
• For J > 0, the lowest energy or ground state is where all spins point up; thus positive J gives a ferromagnetic coupling. For J < 0 an antiferromagnetic ground state is expected
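A small Python sketch of the Ising energy under the Hamiltonian above, with B = 0 and the periodic boundaries used later in the slides. The lattice size and the value of J are illustrative assumptions:

    import random

    L = 16                        # L x L lattice with periodic boundaries
    J = 1.0                       # ferromagnetic coupling; external field B = 0
    s = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

    def energy(s):
        # H = -J * sum over nearest-neighbour pairs <ij> of s_i * s_j
        E = 0.0
        for x in range(L):
            for y in range(L):
                # count each pair once via the right and down neighbours, wrapping around
                E -= J * s[x][y] * (s[(x + 1) % L][y] + s[x][(y + 1) % L])
        return E

    print(energy(s))              # all spins aligned (ground state) would give -2*J*L*L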
Why Ising model?
• Simplest model which exhibits a phase transition in two or more dimensions
• Can be mapped to models of the lattice gas and the binary alloy
• Exactly solvable in one and two dimensions
• No kinetic energy to complicate things
• Theoretically and computationally tractable
  – one can make a dedicated 'Ising machine'

2-D Ising Model
• Parallel nearest-neighbour spins contribute E = −J
• Antiparallel nearest-neighbour spins contribute E = +J

Physical Quantity
• Energy
  – Average energy <E>
  – Mean square energy fluctuation <(ΔE)²> = <E²> − <E>²
• Magnetization M
  – Given by M = Σ_{i=1}^{N} s_i
  – Mean square magnetization fluctuation <(ΔM)²> = <M²> − <M>²
• Temperature (microcanonical)
  – kT/J = 4 / ln(1 + 4J/<E_d>)

Simulation of Ising Model
• Microcanonical Ensemble of the Ising Model
  – Demon Algorithm
• Canonical Ensemble of the Ising Model
  – Metropolis Algorithm

Metropolis for Ising Model (a code sketch follows)
1. Establish an initial microstate.
2. Make a random trial change in the microstate.
3. Compute ΔE = E_trial − E_old, the change in energy due to the trial change.
4. If ΔE <= 0, accept the new microstate and go to step 8.
5. If ΔE > 0, compute w = exp(−ΔE/T).
6. Generate a random number r in the unit interval.
7. If r <= w, accept the new microstate; otherwise retain the old state.
8. Determine the values of the desired physical quantities.
9. Repeat steps (2) through (8).
10. Periodically compute averages over microstates.
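A Python sketch of steps 1-10 for the 2-D Ising model with single-spin-flip trial changes and periodic boundaries. The lattice size, temperature, sweep count, and units with k = 1 are illustrative assumptions:

    import math, random

    L, J = 16, 1.0
    s = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]   # step 1

    def metropolis_sweep(T):
        # steps 2-7 of the slide, attempted once per lattice site
        for _ in range(L * L):
            x, y = random.randrange(L), random.randrange(L)   # step 2: trial spin flip
            nb = (s[(x + 1) % L][y] + s[(x - 1) % L][y] +
                  s[x][(y + 1) % L] + s[x][(y - 1) % L])
            dE = 2.0 * J * s[x][y] * nb                       # step 3: dE of flipping s[x][y]
            if dE <= 0.0 or random.random() <= math.exp(-dE / T):   # steps 4-7
                s[x][y] = -s[x][y]

    for _ in range(1000):              # step 9: repeat, discarding early sweeps as equilibration
        metropolis_sweep(T=2.0)
    M = sum(sum(row) for row in s)     # step 10: e.g. the magnetization M
    print("M =", M)

Flipping one spin only changes the four bonds around it, so dE is computed from the local field rather than from the full energy sum.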
Simulating Infinite System
• Simulating an Infinite System
  – Periodic Boundaries

Energy Landscape
• Energy Landscape
  – A complex biological or physical system has a complicated and rough energy landscape

Local Minima and Global Minima
• Local Minima
  – The physical system may transiently reside in a local minimum
• Global Minimum
  – The most stable physical/biological system
  – Usually the native configuration
• Goal of Global Optimization
  – Escape the traps of local minima
  – Converge to the global minimum

Local Optimization
• Local Optimization
  – Quickly reach a local minimum
  – Approaches
    » Gradient-based Method
    » Quasi-Newton Method
    » Powell Method
    » Hill-climbing Method
    » Simplex
• Global Optimization
  – Find the global minimum
  – More difficult than local optimization

Rosenbrock Function
(figure: contour plot of f over x and y, with the global minimum marked)

Consequences of the Occasional Ascents
• Desired effect: helps escape the local optima
• Adverse effect: might pass the global optimum after reaching it (easy to avoid by keeping track of the best-ever state)

Temperature is the Key
• The probability of accepting energy changes varies as the temperature is raised

Boltzmann distribution
• At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:
  – P(A)/P(B) = exp(−(E(A) − E(B))/T) = exp(−E(A)/T) / exp(−E(B)/T)
• where E(A) and E(B) are the energies associated with states A and B

Real annealing: Sword
• The swordsmith heats the metal, then slowly cools it as he hammers the blade into shape
  – If he cools the blade too quickly, the metal will form patches of different composition
  – If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy

Simulated Annealing Algorithm
• Idea: escape local extrema by allowing "bad moves," but gradually decrease their size and frequency
  – proposed by Kirkpatrick et al., 1983
• Simulated Annealing Algorithm (a code sketch appears at the end of these notes)
  – A modification of the Metropolis Algorithm
    » Start with any configuration x_0 and iterate the following steps
    » Initialize T to a high temperature
    » Step 1: Propose a random "perturbation" of the current state, i.e., x_t -> x', where x' can be seen as generated from a symmetric probability transition function Q(x_t -> x')
      - i.e., Q(x -> x') = Q(x' -> x)
      - Calculate the change Δh = h(x') − h(x)
    » Step 2: Generate a random number u ~ U[0, 1). Let x_{t+1} = x' if u <= e^(−Δh/T); otherwise x_{t+1} = x_t
    » Step 3: Slightly reduce T and go to Step 1, until T reaches your target temperature

Note on simulated annealing: limit cases
• Boltzmann distribution: accept a "bad move" with ΔE < 0 (here the goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)
• If T is large: ΔE < 0, so ΔE/T < 0 and small in magnitude
  – exp(ΔE/T) is close to 1, so bad moves are accepted with high probability (random-walk behavior)
• If T is near 0: ΔE < 0, so ΔE/T < 0 and large in magnitude
  – exp(ΔE/T) is close to 0, so bad moves are accepted with low probability (deterministic down-hill behavior)

Simulated Tempering
• Basic Idea
  – Allow the temperature to go up and down randomly
  – Proposed by Geyer, 1993
  – Can be applied to a very rough energy landscape
• Simulated Tempering Algorithm
  – At the current sampler i, update x using a Metropolis-Hastings update for π_i(x)
  – Set j = i ± 1 according to probabilities p_{i,j}, where p_{1,2} = p_{m,m−1} = 1.0 and p_{i,i+1} = p_{i,i−1} = 0.5 if 1 < i < m
  – Calculate the equivalent of the Metropolis-Hastings ratio for the ST method,
    » r_ST = (c(j) π_j(x) p_{j,i}) / (c(i) π_i(x) p_{i,j})
    » where the c(i) are tunable constants, and accept the transition from i to j with probability min(r_ST, 1)
  – The distribution c(i) is called the pseudo-prior, because the function c(i)π_i(x) formally resembles the product of a likelihood and a prior

Parallel Tempering
• Basic Idea
  – Allow multiple walks at different temperature levels
  – Switch configurations between temperature levels
    » based on the Metropolis-Hastings ratio
  – Also called the replica exchange method
  – Very powerful in complicated energy landscapes

Multi-Transition Metropolis
• Multi-Transition Metropolis
  – Allows steps of different sizes
  – Proposed by J. Liu
• Basic Idea
  – A large step size can escape a deep local minimum trap more easily
  – However, the acceptance rate is much lower

Accelerated Simulated Tempering
• Accelerated Simulated Tempering [Li, Protopopescu, Gorin, 2004]
  – A modified scheme of simulated tempering
    » Allows larger transition steps at high temperature levels
  – Large transition steps at high temperature levels
  – Small transition steps at low temperature levels

Rugged Energy Function Landscape
(figure)
• Lean the ladder
  – Bias the transition probability between temperature levels
  – Favor the low temperature levels

Hybrid PT/SA Method
• Hybrid PT/SA Method [Li, Protopopescu, Nikita, Gorin, 2009]
  – System Setup
    » Subsystems: x_i at different temperature levels
    » Composite System: macrostate X = {x_1, x_2, ..., x_N}
  – Transition Types
    » Parallel Local Transitions
      - Local Monte Carlo moves at a specific temperature level
      - w_Local(x_i^t -> x_i^{t+1}) = e^(−β_i ΔE) = e^(−β_i (E(x_i^{t+1}) − E(x_i^t)))
      - Acceptance probability: min(1, w_Local(x_i^t -> x_i^{t+1}))
    » Replica Exchange
      - Replica exchange x_i^{t+1} <-> x_{i+1}^{t+1} between two randomly chosen temperature levels
      - Acceptance probability: min(1, e^((β_i − β_{i+1})(E(x_i^{t+1}) − E(x_{i+1}^{t+1}))))
  – Cooling Scheme
    » Examine the macrostate for equilibrium
      - Boltzmann distribution of the subsystems: P(E(x_i)) = exp(−E(x_i)/T_i) / Σ_j exp(−E(x_j)/T_j)

Application of Global Optimization: VLSI Floorplan Design
• VLSI Design
  – Goal: placing chips on the circuit floor without overlaps

Summary
• Markov Chain
• Markov Chain Monte Carlo
• Metropolis Method
• Hastings' Generalization
• Detailed Balance Condition
• Gibbs Sampler
• Autocorrelation Function and Integrated Autocorrelation Time

What do I want you to do?
• Review the slides
• Review basic probability/statistics concepts
• Prepare for your presentation topic and term paper
• Work on your Assignment 5
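Supplementary Example: Simulated Annealing Sketch
As promised on the simulated annealing slide, a minimal Python sketch of Steps 1-3. The one-dimensional multi-well objective, the cooling rate, and the step size are illustrative assumptions, not from the slides:

    import math, random

    def h(x):
        # illustrative multi-well score function to minimize (two unequal wells)
        return x ** 4 - 3.0 * x ** 2 + 0.5 * x

    x = 5.0                                  # arbitrary starting configuration x_0
    best = x
    T, T_target, cooling = 10.0, 0.01, 0.999
    while T > T_target:
        x_new = x + random.gauss(0.0, 1.0)   # Step 1: symmetric perturbation
        dh = h(x_new) - h(x)
        if dh <= 0.0 or random.random() <= math.exp(-dh / T):   # Step 2
            x = x_new
        if h(x) < h(best):                   # keep track of the best-ever state
            best = x                         # (see "Consequences of the Occasional Ascents")
        T *= cooling                         # Step 3: slightly reduce T
    print("best x:", best, "h(best):", h(best))

At high T this behaves like the random walk described in the limit-case slide; as T approaches the target temperature it reduces to a nearly deterministic down-hill search.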