NOTES ON PERFECT SIMULATION

Wilfrid S. Kendall
Department of Statistics, University of Warwick
March 2004

This work was carried out while the author was visiting the Institute for Mathematical Sciences, National University of Singapore in 2004. The visit was supported by the Institute. Version 1.7

Contents

1 CFTP: the classic case
2 CFTP and regeneration
3 Dominated CFTP
4 Theory and connections

Perfect simulation refers to the art of converting suitable Markov chain Monte Carlo algorithms into algorithms which return exact draws from the target distribution, instead of long-time approximations. The underlying ideas of perfect simulation stretch back a long way, but they sprang into prominence as a result of a groundbreaking paper of Propp and Wilson (1996), which showed how to obtain exact draws from (for example) the critical Ising model. In these lectures I will describe several themes of perfect simulation:

Lecture 1: the original or classic Coupling From The Past algorithm (CFTP);
Lecture 2: variations which exploit regeneration ideas: small-set or split-chain constructions from Markov chain theory (small-set CFTP);
Lecture 3: generalizations which deal with non-monotonic and non-uniformly ergodic examples (dominated CFTP); and
Lecture 4: striking results relating CFTP to an apparently different algorithm due originally to Fill, as well as other theoretical complements.

The talks (which will be organized approximately under the four headings above) will be illustrated with online animations of CFTP simulations and will describe related theoretical issues.
The objective of these tutorial lectures is to give a sound and clear exposition of these themes, and hence to promote an interplay of ideas with practitioners of Markov chain Monte Carlo.

Useful reading

• Books: Thorisson (2000), Häggström (2002), Møller and Waagepetersen (2004).
• Online bibliography: http://research.microsoft.com/~dbwilson/exact/

Lecture 1 CFTP: the classic case

1.1 Coupling
1.2 Random walk CFTP
1.3 The CFTP theorem
1.4 Falling leaves
1.5 Ising CFTP
1.6 Point process CFTP
1.7 CFTP in space
1.8 Pre-history

Recall basic facts concerning Markov chain Monte Carlo (MCMC):

• Fundamental paradigm:
  – specify the target distribution indirectly;
  – realize it as the equilibrium of a Markov chain;
  – sample approximately from the target distribution by running the Markov chain for a long time (till near-equilibrium).
• How long is the "burn-in" period?
  – Guess or diagnose (Cowles and Carlin 1995; Brooks and Roberts 1997),
  – estimate, whether analytically (Diaconis and Stroock 1991; Roberts and Polson 1994; Roberts and Tweedie 1996a; Roberts and Tweedie 1996b) or empirically (Johnson 1996),
  – or do better?

MCMC arises in a number of different areas of the mathematical sciences, with different emphases. (This makes interaction interesting and fruitful!) Here are some examples:

• Statistical mechanics. Infinite-dimensional systems: are there phase transition phenomena? How do they behave?
• Computer science. Approximate counting: how does the algorithm behave as the scale of the problem increases? Does the run-time increase exponentially, or polynomially?
• Image analysis.
Given a noisy picture with some kind of geometric content: can we clean it up using modelling by spatial random fields? Can we identify significant features?
• Statistics.
  – Bayesian. Can we draw accurately (and quickly if possible!) from the posterior distribution on a space which may be low-dimensional but not at all symmetric?
  – Frequentist. What does the likelihood surface look like?

The Propp and Wilson (1996) CFTP idea:

• Design modified MCMC algorithms which deliver exact draws after a randomly long run-time.
• Construct actual implementations for interesting cases.
• The approach is based on probabilistic coupling methods.
• Competitive with ordinary MCMC for amenable cases.

This is exact or perfect simulation.[1] Practical applications use a large number of different ideas, which will be introduced in these lectures.

[1] Mark Huber suggests a useful taxonomy: divide exact sampling methods into (a) direct (loosely: use knowledge of the normalizing constant) and (b) perfect (loosely: don't use this knowledge).

1.1. Coupling and convergence: the binary switch

We commence by introducing the fundamental idea of coupling (see Lindvall 2002, or my online notes from the 2003 Durham symposium or the 2003 GTP program for more on coupling). Consider the simplest possible case: a continuous-time Markov chain with just two states, which makes transitions from one state to the other at constant rate 1/α (the binary switch).

Construction: supply (a) a Poisson process (rate 1/α) of 0 → 1 transitions, and (b) independently another such process of 1 → 0 transitions. Use the transitions to build coupled processes X, Y begun at 0, 1. Clearly X, Y are (coupled) copies of the binary switch, coupling at the time T of the first Poisson incident.
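This two-state construction is easy to simulate. The following Python sketch (illustrative names only; α = 1 for concreteness) superposes the two Poisson streams: each incident sends every copy of the switch to the incident's target state, so X and Y agree from the first incident onwards. It then checks the coupling inequality dist_TV(X_t, π) ≤ P[T > t] by Monte Carlo.

```python
import math
import random

def coupled_run(t_end, alpha, rng):
    """Run the coupled pair (X, Y) from (0, 1) up to time t_end.

    The two Poisson streams (rate 1/alpha each) are superposed into a
    single stream of rate 2/alpha; an incident of the 0 -> 1 stream sends
    every chain to state 1, an incident of the 1 -> 0 stream sends every
    chain to state 0.  Hence X and Y coincide from the first incident."""
    x, y, t = 0, 1, 0.0
    while True:
        t += rng.expovariate(2.0 / alpha)
        if t > t_end:
            return x, y
        x = y = rng.randint(0, 1)   # incident type: 0->1 or 1->0, prob 1/2 each

rng = random.Random(0)
t_end, alpha, n = 1.0, 1.0, 20000
at_zero = uncoupled = 0
for _ in range(n):
    x, y = coupled_run(t_end, alpha, rng)
    at_zero += (x == 0)
    uncoupled += (x != y)
tv_estimate = abs(at_zero / n - 0.5)   # equilibrium pi is uniform on {0, 1}
bound_estimate = uncoupled / n         # estimates P[T > t]
```

For this chain the exact values are dist_TV = (1/2) e^{−2t/α} ≈ 0.068 and P[T > t] = e^{−2t/α} ≈ 0.135 at t = α = 1, so the bound holds with room to spare.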
Then the classic coupling inequality argument shows

    dist_TV(X_t, π) = sup_A { P[X_t ∈ A] − π(A) } ≤ P[T > t]    (1.1)

where
• π is the equilibrium distribution,
• and dist_TV is the total variation distance.

Diaconis (1996) reviews interesting features of convergence to equilibrium for random walks on the hypercube; the above chain is a basic building block of such Markov chains.

The coupling argument generalizes to arbitrary Markov chains:
• if we can couple a general Markov chain X to a version Y in statistical equilibrium, then such a coupling bounds the approach to equilibrium through Equation (1.1);
• if we allow non-adapted couplings, the bound is sharp (Griffeath 1975; Goldstein 1979);
• non-adapted couplings can be very difficult to construct! Co-adapted couplings can supply usable bounds but need not be sharp.
• Diaconis (1996) uses hypercube random walks to illustrate the cutoff phenomenon in simple contexts. Here is an interesting (but probably very hard) research question: how to relate CFTP to cutoff phenomena? (Useful hints in Propp and Wilson 1996; Wilson 2004.)

Can we use such a coupling to draw from equilibrium? The binary switch example is deceptive: X(T) is in equilibrium in the case of the binary switch, but not in general . . . .

1.2. Random walk CFTP

Consider coupling applied to the random walk X on {1, 2, . . . , N}, reflected at the boundary points. Use the synchronous coupling (which generalizes the coupling above!): then X only couples at 1, N. Thus X(T) cannot be a draw from equilibrium if N > 2.

The Propp-Wilson idea draws on a well-established theme from ergodic theory: realize the Markov chain as a stochastic flow, and evolve it not into the future but from the past!
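As a concrete preview of the construction developed in the following paragraphs, here is a minimal Python sketch of monotone CFTP for the reflected walk on {1, . . . , N}: sandwiching paths started at 1 and N, common ±1 moves which are re-used on each restart, and binary-search doubling of the start time. (Names are illustrative; the walk here holds at a boundary whenever a move would leave the state space, a convention under which the equilibrium distribution is uniform on {1, . . . , N}.)

```python
import random

def step(x, move, N):
    """Synchronous move of the walk: hold at a boundary if the move
    would leave {1, ..., N}."""
    return min(max(x + move, 1), N)

def cftp(N, rng):
    """Classic monotone CFTP for the reflected walk on {1, ..., N}."""
    moves = {}                        # common randomness, indexed by (past) time
    n = 1
    while True:
        for t in range(-n, 0):        # extend, and crucially RE-USE, randomness
            if t not in moves:
                moves[t] = rng.choice((-1, 1))
        lower, upper = 1, N           # sandwiching starts at time -n
        for t in range(-n, 0):
            lower = step(lower, moves[t], N)
            upper = step(upper, moves[t], N)
        if lower == upper:
            return lower              # the value at time 0, not at coupling time
        n *= 2                        # otherwise restart further back in the past

draw = cftp(10, random.Random(42))
```

Repeated calls give independent exact draws; over many calls the empirical distribution is uniform on {1, . . . , N}.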
However if we do this then we need to consider coupled realizations of the Markov chain started at all possible starting points. By monotonicity, we can focus on:

• X^{lower,−n} begun at 1 at time −n,
• X^{upper,−n} begun at N at time −n;

since these can be arranged to sandwich all other realizations begun at time −n.

• Run upper and lower processes from time −n.
• If they have coupled by time 0, return the common value.
• Otherwise, repeat but start at time −2n (say).

Issues:

• Run from the past, not into the future;
• Use common randomness and re-use it;
• Sample at time 0, not at the coupling time;
• There is a rationale for n → 2n: binary search.

Exercise 1.1 Use simple R statistical package scripts and elementary calculations to investigate the bias resulting from the following.

• Running into the future, not from the past, stopping (say) at the first time t = 2n after coupling has occurred;
• Failing to re-use randomness (we'd expect this to bias towards "earlier" coalescence);
• Sampling at the coupling time instead of at time 0 (obviously a bad idea; sampling an independent random walk at this time still gives a biased result).

See my GTP online notes for further details.

1.3. The CFTP theorem

Morally the proof of classic CFTP is just 3 lines long. Express the coupling for X in terms of random input-output maps F_{(−u,v]}.

Theorem 1.2 (Propp and Wilson 1996) Suppose X converges to an equilibrium π in total variation dist_TV. If coalescence is almost sure then CFTP samples from equilibrium.

Proof: For each time-range [−n, ∞) use the F_{(−n,t]} to define

    X_t^{−n} = F_{(−n,t]}(0)   for −n ≤ t.

A finite coalescence time −T is assumed, so

    X_0^{−n} = X_0^{−T}   whenever −n ≤ −T,   and   L(X_0^{−n}) = L(X_n^0).

Hence

    dist_TV(L(X_0^{−T}), π) = lim_n dist_TV(L(X_0^{−n}), π) = lim_n dist_TV(L(X_n^0), π) = 0,

and the result follows.

Remark 1.3 We are free to choose any "backwards stopping time" −T so long as we can guarantee coalescence of F_{(−T,0]}. The binary search approach of random walk CFTP is deservedly popular, but there are alternatives: see for example the "block-by-block" strategy of read-once CFTP.

Remark 1.4 Monotonicity of the target process is convenient for CFTP, but not essential. Propp and Wilson (1996, §3.2) formalize the use of monotonicity. Kendall (1998) describes a "crossover" trick for use in anti-monotonic situations; Kendall and Møller (2000) generalize the trick to cases where monotonicity is absent (see also Huber 2003 on bounding chains); Häggström and Nelander (1998) apply the trick to lattice systems.

1.4. The falling leaves of Fontainebleau

Kendall and Thönnes (1999) describe a visual and geometric application of CFTP in mathematical geology, this particular example being well known to workers in the field before the introduction of CFTP itself: occlusion CFTP for the falling leaves of Fontainebleau (the "dead leaves" model).

Why "occlusion"? David Wilson's terminology: this CFTP algorithm builds up the result piece by piece with no back-tracking.

(Dead leaves in recent image analysis work: see Lee, Mumford, and Huang 2001; Gousseau and Roueff 2003.)

We can think of the "dead leaves" process as a Markov chain whose states are points in "pattern space".

Corollary 1.5 Occlusion CFTP as described above delivers a sample from the dead leaves distribution.

Proof: Use the notation of Theorem 1.2: F_{(−n,t]} now represents superposition of random leaves falling over the period (−n, t].
It suffices to note that general Markov chain theory[2] can be combined with the apparatus of stochastic geometry to show that the random pattern converges in distribution to a limiting random pattern.

[2] e.g. exploit the regeneration which occurs when the window is completely covered.

Exercise 1.6 Show that bias results from simulating forwards in time till the image is completely covered. Hint: consider a field of view small enough for it to be covered completely by a single leaf; argue by comparison that the forwards simulation is relatively more likely to result in a pattern made up of just one large leaf!

Remark 1.7 Web animations of perfect simulation for the dead leaves model can be found at www.warwick.ac.uk/go/wsk/abstracts/dead/

Remark 1.8 Other examples of occlusion CFTP include the Aldous-Broder algorithm for generating random spanning trees (Broder 1989; Aldous 1990; Wilson and Propp 1996; Wilson 1996).

1.5. Ising CFTP

Propp and Wilson (1996) showed how to make exact draws from the critical Ising model using Sweeny (1983)'s single-bond heat-bath (for Swendsen-Wang, see Huber 2003). A simpler application uses the heat-bath sampler to get exact draws from the sub-critical Ising model.

Classic CFTP for the Ising model (simple, sub-critical case): heat-bath dynamics run from the past; compare maximal and minimal starts.

The simulation displays the results of a systematic-scan Gibbs sampler. The left-most image is generated by the lower chain and the right-most image by the upper chain, while the middle red image expresses the difference. The classic CFTP recipe is followed: after cycling through a time interval [−T, 0], the simulation extends back to cycle through [−2T, 0] unless the upper and lower chains coincide at time 0 (the middle image is blank at the end of the cycle).
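The heat-bath CFTP recipe just described can be sketched in a few lines of Python. This is an illustrative toy, not the lecture's actual simulation: it assumes periodic boundary conditions, a sub-critical interaction β, and single-site heat-bath updates driven by uniforms that are stored per sweep and re-used on each restart from further in the past.

```python
import math
import random

def sweep(grid, n, beta, uniforms):
    """One systematic-scan heat-bath sweep on an n x n periodic grid,
    driven by a shared list of n*n uniforms (one per site)."""
    k = 0
    for i in range(n):
        for j in range(n):
            m = sum(grid[(i + di) % n][(j + dj) % n]
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * m))
            grid[i][j] = 1 if uniforms[k] < p_plus else -1
            k += 1

def ising_cftp(n, beta, rng):
    """Monotone CFTP: sandwich between all-plus and all-minus starts,
    re-using the same sweep uniforms, doubling the start time T."""
    sweeps = []                       # sweeps[t] drives the sweep at time -(t+1)
    T = 1
    while True:
        while len(sweeps) < T:        # extend the common randomness backwards
            sweeps.append([rng.random() for _ in range(n * n)])
        upper = [[+1] * n for _ in range(n)]
        lower = [[-1] * n for _ in range(n)]
        for t in range(T - 1, -1, -1):   # evolve from time -T up to time 0
            sweep(upper, n, beta, sweeps[t])
            sweep(lower, n, beta, sweeps[t])
        if upper == lower:
            return upper              # exact draw at time 0
        T *= 2
```

Since p_plus is increasing in the neighbour sum m and both chains share the same uniforms, the heat-bath update preserves the ordering upper ≥ lower, which is exactly the monotonicity the sandwich argument needs.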
Details of the target Ising measure are configurable (neighbourhood structure varying between 4 nearest neighbours, and 4+4 nearest and next-nearest neighbours, with a "symmetry" option fixing the next-nearest interaction as 1/√2 of the nearest interaction). Coalescence becomes very slow as parameters approach super-criticality.

We can also use this for the conditioned Ising model,[3] as used in image analysis applications (see also Besag's lectures).[4]

Classic CFTP for the Ising model (super-critical case, conditioned): heat-bath dynamics run from the past; compare maximal and minimal starts. Simulation display and options as before. Coalescence speed now depends on the interaction between correlation parameters and conditioning, as could be predicted from theoretical considerations (Kindermann and Snell 1980).

[3] Conditioning ≡ external magnetic field!
[4] Note however that the Ising model is a very poor model for this particular (dithered) image!

Remark 1.9 Web animations of perfect simulations of conditioned Ising models (using simple images which are rather more suitable for 4-neighbour Ising models!) can be found at www.warwick.ac.uk/go/wsk/ising-animations/

1.6. Point process CFTP

Classic CFTP is not limited to discrete models; we have already seen this in the case of the falling leaves model. Here is a further example: a perfect simulation procedure for attractive area-interaction point processes (Häggström et al. 1999).[5]

Area-interaction point process: weight a Poisson process realization according to the area of the region closer than r to the pattern:

    pattern density ∝ γ^{−(area of region within distance r of pattern)}.    (1.2)
Exercise 1.10 If γ > 1 (the attractive case only!), show the above density is proportional to the probability that an independent Poisson process places no points within distance r of the pattern.

[5] Also known as the Widom and Rowlinson (1970) model.

Hence the area-interaction point process may be represented as the pattern of red points, where red and blue points are distributed as Poisson patterns conditioned to be at least distance r from blue and red points respectively. This can be implemented as a (typically impracticable) rejection sampler; a more practical option is to use a Gibbs sampler, which is monotonic and so lends itself to (classic!) CFTP. First we describe the Gibbs sampler, then we demonstrate the CFTP procedure using an animation.

• Construct a Poisson point process (centres of red crosses).
• Construct a new Poisson process (centres of blue discs), but censor all points of the new process for which a disc centred on the point would overlap a red point.
• Discard the old red points, construct a new Poisson process (centres of red crosses), but censor all points of the new process which would fall on a blue disc.
• Continue this procedure . . . (notice the red/blue duality!)

Gibbs sampler CFTP for the attractive area-interaction point process, as a marginal of a two-type soft-core repulsion point process.

The CFTP construction is based on the observations:

1. there is a partial order for states (red pattern, blue pattern);
2. the "highest" and "lowest" states are (X, ∅) and (∅, X) where X is the full ground space. Note that these are "pseudo-states", not achieved by the pattern itself!
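The red/blue Gibbs sweep just described can be sketched as follows. This is only the forwards sampler, not the CFTP version; the ground space is taken to be the unit square, parameter names are illustrative, and Poisson counts are drawn by Knuth's product-of-uniforms method.

```python
import math
import random

def poisson(lam, rng):
    """Poisson variate via Knuth's product-of-uniforms method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < limit:
            return k
        k += 1

def fresh_pattern(lam, other, r, rng):
    """New Poisson pattern on the unit square, censoring every point
    that falls within distance r of the other colour's pattern."""
    n = poisson(lam, rng)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [p for p in pts
            if all(math.hypot(p[0] - q[0], p[1] - q[1]) >= r for q in other)]

def gibbs_sweeps(lam, r, n_sweeps, rng):
    """Alternate the red and blue updates (note the red/blue symmetry)."""
    red, blue = [], []
    for _ in range(n_sweeps):
        blue = fresh_pattern(lam, red, r, rng)
        red = fresh_pattern(lam, blue, r, rng)
    return red, blue
```

By construction, after any complete sweep every red point lies at distance at least r from every blue point, which is the two-type soft-core constraint.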
The fact that there are highest and lowest pseudo-states is the key to this algorithm's quick convergence (at least when there is no phase-transition effect): the underlying Markov chain is uniformly ergodic.

1.7. CFTP in space and time

When interactions are sufficiently weak (certainly weak enough that phase transitions cannot occur!) the CFTP idea can be applied in space as well as time. In effect, one aims to capture a fragment of a virtual simulation in perfect equilibrium, where the fragment is spatially limited as well as temporally limited. See Kendall (1997b) for a description and Häggström and Steif (2000) for related theory; see also §4.5 on the BFA algorithm. In this case the binary search rationale guides how we should extend the CFTP algorithm in both space and time.

1.8. Pre-history of CFTP

"The conceptual ingredients of CFTP were in the air" (Propp and Wilson 1998) for a long time. For example, elements can be found in:

• Kolmogorov (1936)[6]
• Doeblin (1937)
• Loynes (1962), Kendall (1983)
• Letac (1986), Kifer (1986)
• Borovkov (1984)
• Broder (1989), Aldous (1990)
• Thorisson (1988)

[6] Thanks to Thorisson (2000) for this reference . . . .

For example consider Kolmogorov (1936). Consider an inhomogeneous (finite state) Markov transition kernel P_{ij}(s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past? Yes! Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts:

    P[X(0) = k] = Q_k(0)
    P[X(−1) = j, X(0) = k] = Q_j(−1) P_{jk}(−1, 0)
    P[X(−2) = i, X(−1) = j, X(0) = k] = Q_i(−2) P_{ij}(−2, −1) P_{jk}(−1, 0)
    ...

Lecture 2 CFTP and regeneration

2.1 Small sets
2.2 Small-set CFTP
2.3 Slice sampler CFTP
2.4 Multi-shift sampler
2.5 Catalytic CFTP
2.6 Read-once CFTP
2.7 Simulated tempering
2.8 Pre-history

2.1. Small sets

Suppose we want a coupling between two random variables X, Y yielding a maximal positive chance of X = Y and otherwise no constraint.
(This is related to the notion of convergence stationnaire, or "parking convergence", from stochastic process theory.) Clearly this is relevant to CFTP, where we aspire to coalescence!

Given two overlapping probability densities f and g, we can implement such a coupling (X, Y) as follows:

• Compute α = ∫ (f ∧ g)(x) dx, and with probability α return a draw of X = Y from the density (f ∧ g)/α.
• Otherwise draw X from (f − f ∧ g)/(1 − α) and Y from (g − f ∧ g)/(1 − α).

This can be combined usefully with the method of rejection sampling in stochastic simulation.

From Doeblin's time onwards, probabilists have applied this to study Markov chain transition probability kernels:

Definition 2.1 (Small set condition) Let X be a Markov chain on a state space S, with transition kernel p(x, dy). The set C ⊆ S is a small set of order k if for some probability measure ν and some α > 0

    p^(k)(x, dy) ≥ I[C](x) × α ν(dy).    (2.1)

Here I[C](x) is the indicator function for the set C.

The animation displays the evolution of a Markov chain on the unit interval whose transition density p(x, y) is triangular with peak at x. Here the small set is the whole state space [0, 1], of order k = 1, with α = 1/2 and with ν having the isosceles triangle density over [0, 1].

It is a central result that (non-trivial) small sets (possibly of arbitrarily high order) exist for any modestly regular Markov chain (classical proof: see either of Nummelin 1984; Meyn and Tweedie 1993).

The trouble with general Markov chains is that the chain may have zero probability of returning to any starting point. However if there is a small set of order 1 then we may re-model the chain to fix this.

Theorem 2.2 Let X be a Markov chain on a (non-discrete) state space S, with transition kernel p(x, dy) and small set C of order 1.
Then X can be represented using a new Markov chain on S ∪ {c}, for c a regenerative pseudo-state.

For details see Athreya and Ney (1978), and also Nummelin (1978). Higher-order small sets can be used if we are prepared to sub-sample the chain . . . .

Small sets (perhaps of higher order) can be used systematically to attack general state space theory using discrete state space methods. See Meyn and Tweedie (1993), also Roberts and Rosenthal (2001).[1]

[1] Roberts and Rosenthal (2001) also introduce "pseudo-small sets", which relate to coupling as small sets relate to CFTP.

It is natural to ask whether we can go further, and use small sets to re-model the chain as an entirely discrete affair. However . . .

Remark 2.3 Non-trivial small sets of order 1 need not exist: however they do exist if (a) the kernel p(x, dy) has a measurable density and (b) the chain is sub-sampled at even times. (Both conditions are needed: example in Kendall and Montana 2002.)

Theorem 2.4 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p^(2)(x, y) can be expressed (non-uniquely) as a non-negative countable sum

    p^(2)(x, y) = Σ_i f_i(x) g_i(y).

Proof: Key Lemma, a variation on Egoroff's Theorem: let p(x, y) be an integrable function on [0, 1]². Then we can find subsets A_ε ⊂ [0, 1], increasing as ε decreases, such that

(a) for any fixed A_ε the "L¹-valued function" p_x is uniformly continuous on A_ε: for any η > 0 we can find δ > 0 such that ∫₀¹ |p_x(z) − p_{x′}(z)| dz < η for |x − x′| < δ and x, x′ ∈ A_ε;
(b) every point x in A_ε is of full relative density: as u, v → 0 so Leb([x − u, x + v] ∩ A_ε)/(u + v) → 1.
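Returning to the f ∧ g construction at the start of this section: for discrete distributions the two-bullet recipe can be implemented directly. A sketch with illustrative names (distributions given as dicts from values to probabilities); P[X = Y] then equals α = 1 − dist_TV(f, g).

```python
import random

def maximal_coupling(f, g, rng):
    """Maximal coupling of two discrete distributions: with probability
    alpha = sum of (f ^ g) the draws coincide, so P[X = Y] = 1 - dist_TV(f, g);
    otherwise X and Y come from the (disjoint) normalized residuals."""
    support = sorted(set(f) | set(g))
    overlap = {v: min(f.get(v, 0.0), g.get(v, 0.0)) for v in support}
    alpha = sum(overlap.values())

    def draw(weights):
        """Draw from a (sub-)probability weight table, normalizing."""
        u, acc = rng.random() * sum(weights.values()), 0.0
        for v, w in weights.items():
            if w <= 0.0:
                continue
            acc += w
            if u < acc:
                return v
        return v                      # guard against round-off at the top end

    if rng.random() < alpha:
        v = draw(overlap)
        return v, v                   # coalesced draw from (f ^ g) / alpha
    x = draw({v: f.get(v, 0.0) - overlap[v] for v in support})
    y = draw({v: g.get(v, 0.0) - overlap[v] for v in support})
    return x, y                       # residuals are disjoint, so X != Y here
```

For instance, with f uniform on {0, 1} and g uniform on {1, 2}, the overlap mass is α = 1/2, so the coupled draws agree about half the time while each marginal is preserved.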
2.2. Murdoch-Green small-set CFTP

Green and Murdoch (1999) showed how to use small sets to carry out CFTP when the state space is continuous with no helpful ordering.

Small-set CFTP in nearly the simplest possible case: the triangular kernel on [0, 1].

Exercise 2.5 Express small-set CFTP as a composition of a Geometrically-distributed number of kernels (conditioned to avoid the small-set effect), with the starting point randomized by the small-set distribution ν.

2.3. Slice sampler CFTP

The task is to draw from a density f(x). Here is a one-dimensional example! (But note, this is only interesting because it can be made to work in many dimensions . . . )

Suppose f is unimodal. Define g0(y), g1(y) implicitly by: [g0(y), g1(y)] is the super-level set {x : f(x) ≥ y}. Alternate between drawing y uniformly from [0, f(x)] and drawing x uniformly from [g0(y), g1(y)].

Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (of the order of 530 iterations!) under a specific variation of log-concavity. Mira, Møller, and Roberts (2001) describe a perfect variant.

Exercise 2.6 Design your own perfect slice sampler for the case when the unimodal density f(x) has bounded support.

HINT: You need to figure out how to make uniform choices for two versions of the process simultaneously, so as to preserve the partial ordering

    (x, y) ≼ (x′, y′)   if f(x) ≤ f(x′),    (2.2)

but also so as to have positive chances of coalescing.

2.4. Multi-shift sampler

Question 2.7 How to draw simultaneously for all x ∈ R from Uniform(x, x + 1), and couple the draws?

Answer: a random unit-span integer lattice (Wilson 2000b).

Wilson (2000b) also considers more general distributions!
For example, we can express a normal distribution as a mixture of uniform distributions using slice sampler ideas, and this allows us to apply the same trick in several ways. The method also deals with multivariate and even multi-modal cases.

Corcoran and Schneider (2003) carry this even further, inventing and exploiting a scheme to couple draws from Uniform distributions with different ranges.

2.5. Catalytic CFTP

Breyer and Roberts (2002) have devised an "automatic" variation on small-set CFTP: catalytic CFTP. The underlying idea is to perform simultaneous Metropolis-Hastings updates for all possible states, using a common Uniform[0, 1] random variable U to determine rejection or acceptance. For suitable proposal random fields Φ_x, it may be possible to identify when the input-output map F_{(−t,0]} coalesces into a finite range; moreover the choice of construction for Φ_x can be varied from time point to time point. Simulations can be viewed at www.lbreyer.com/fcoupler.html

2.6. Read-once CFTP

Wilson (2000a): build F_{(−n,0]} of Theorem 1.2 from i.i.d. blocks

    F_{(−nt,0]} = F_{(−t,0]} ◦ . . . ◦ F_{(−(n−1)t,−(n−2)t]} ◦ F_{(−nt,−(n−1)t]}.

The blocking length t is chosen so that there is a positive chance of the map B (distributed as F_{(−t,0]}) being coalescent. By a simple computation, the CFTP procedure is identical to the following forwards procedure:

• Repeatedly draw independent realizations of B till a coalescent block is obtained; note the coalesced output x.
• Repeatedly draw independent realizations of B; while these are not coalescent, compute the update x ← B(x).
• When a coalescent realization of B is obtained, return x without updating!
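The forwards procedure above can be sketched for the reflected random walk of Section 1.2. This is illustrative code: block maps are stored as lists over the small state space, the walk holds at a boundary (so its equilibrium is uniform), and the block length t is chosen large enough that a block has positive chance of being coalescent (t ≥ N − 1 suffices, since a run of t identical steps drives every state to a boundary).

```python
import random

def random_block(N, t, rng):
    """One i.i.d. block: the input-output map of t synchronous steps of
    the reflected walk on {1, ..., N} (holding at the boundaries)."""
    image = list(range(1, N + 1))
    for _ in range(t):
        move = rng.choice((-1, 1))
        image = [min(max(x + move, 1), N) for x in image]
    return image                      # image[i - 1] is where state i lands

def read_once_cftp(N, t, rng):
    """Wilson's forwards procedure for read-once CFTP."""
    while True:                       # phase 1: wait for a coalescent block
        block = random_block(N, t, rng)
        if len(set(block)) == 1:
            x = block[0]
            break
    while True:                       # phase 2: update until the next one
        block = random_block(N, t, rng)
        if len(set(block)) == 1:
            return x                  # return WITHOUT applying this block
        x = block[x - 1]
```

Repeated calls give exact equilibrium draws, here uniform on {1, . . . , N}, while storing only the current state and the current block.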
There are strong resonances with small-set CFTP (the possibility of a coalescent B corresponds to the whole state space being a small set of order t) and with the dead leaves CFTP example (one can view the argument as a re-ordering in time).

Exercise 2.8 Prove the validity of read-once CFTP by establishing that the above forwards procedure produces a sequence of B maps which have the same distribution as would be produced by carrying out classic CFTP, but checking for coalescence only block-by-block. Exercise 2.5 should be helpful!

The choice of the blocking length t is of course crucial! A feature of major importance for this method is that storage requirements are minimized: one needs only (a) to flag whether a block is coalescent, (b) to compute the output of a coalescent block, and (c) to track the current state of the chain as the blocks are produced.

2.7. Perfect simulated tempering

Recall that simulated tempering embeds the target distribution π = π₀ as one of a family {π_i : i ∈ I} of distributions: for each i ∈ I one devises a chain X_i leaving π_i invariant, and carries out simulated tempering by evolving according to chain X_i while making occasional proposals to change i to another value i′ ∈ I. The distribution of the resulting chain is proportional to the target π₀ when restricted to i = 0.

Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002) devises schemes in which the chain at i = N (say) coalesces very fast; moreover it is possible to construct upper and lower i-chains after the manner of random walk CFTP. One can then carry out CFTP by evolving the upper and lower i-chains till (a) they coalesce at i = N, and (b) during the period of i-coalescence the X_N chain itself coalesces.
This combines well with the ideas of read-once CFTP applied to the upper and lower i-chains (Brooks et al. 2002).

2.8. Pre-history of small-set CFTP

• Athreya and Ney (1978), Nummelin (1978)
• Asmussen, Glynn, and Thorisson (1992)
• Mykland, Tierney, and Yu (1995).

Lecture 3 Dominated CFTP

3.1 Queues
3.2 Ergodicity
3.3 Birth and death
3.4 Perpetuities
3.5 Point processes
3.6 General theorem

3.1. Queues

Consider a GI/G/1 queue. Lindley noticed a beautiful representation for the waiting time W_n of customer n in terms of services S_n and inter-arrivals X_n . . .

Theorem 3.1 (Lindley's equation) Consider the waiting time identity for the GI/G/1 queue, writing η_n = S_n − X_{n+1} (for a queue begun empty):

    W_{n+1} = max{0, W_n + S_n − X_{n+1}} = max{0, W_n + η_n}
            = max{0, η_n, η_n + η_{n−1}, . . . , η_n + η_{n−1} + . . . + η_1}
            =_D max{0, η_1, η_1 + η_2, . . . , η_1 + η_2 + . . . + η_n}

(where =_D denotes equality in distribution), and thus we obtain the steady-state expression

    W_∞ =_D max{0, η_1, η_1 + η_2, . . .}.

SLLN/CLT/random walk theory shows W_∞ will be finite if and only if E[η_i] < 0 or η_i ≡ 0.

Suppose we lose independence? Loynes (1962) discovered a coupling application to queues with (for example) general dependent stationary inputs and associated service times, pre-figuring CFTP.

Theorem 3.2 Suppose queue arrivals follow a stationary point process stretching back to time −∞. Denote arrivals and associated service times in (s, t] by N_{s,t} (stationarity: the statistics of the process {N_{s,s+u} : u ≥ 0} do not depend on s). Let Q^T denote the behaviour of the queue observed from time 0 onwards if begun with 0 customers at time −T.
The queue converges to statistical equilibrium if and only if

lim_{T→∞} QT exists almost surely.

Remark 3.3 Stoyan (1983) develops this kind of idea. Kendall (1983) gives an application to storage problems.

Remark 3.4 Coupling and CFTP ideas enter into Theorem 3.1 at the crucial time-reversal step:

max{0, ηn, ηn + ηn−1, . . . , ηn + ηn−1 + . . . + η1} =^D max{0, η1, η1 + η2, . . . , η1 + η2 + . . . + ηn} .

Compare Section 1.4 on falling leaves . . . .

Remark 3.5 The problem about applying Theorem 3.1 in CFTP is that we don't know which partial sum η1 + η2 + . . . + ηn attains the supremum max{0, η1, η1 + η2, . . .}.

Exercise 3.6 Use simple R statistical package scripts and elementary calculations to implement the Loynes coupling. See my GTP online notes for further details.

3.2. Uniform and Geometric Ergodicity

There is a theoretical issue which forms a road block to the use of Lindley's representation in CFTP. Recall the notions of uniform ergodicity and geometric ergodicity.

Definition 3.7 A Markov chain with transition kernel p(x, ·) possesses geometric ergodicity if there are 0 < ρ < 1 and R(x) > 0 with

‖π − p(n)(x, ·)‖TV ≤ R(x) ρ^n

for all n and all starting points x.

So a chain exhibits geometric ergodicity if equilibrium is approached at a geometric rate. Note that the constant R(x) in the geometric bound depends on the chain's starting point. However . . .

Definition 3.8 A Markov chain with transition kernel p(x, ·) possesses uniform ergodicity if there are ρ ∈ (0, 1) and R > 0 not depending on the starting point x such that

‖π − p(n)(x, ·)‖TV ≤ R ρ^n

for all n, uniformly over all starting points x.

So a chain exhibits uniform ergodicity if the geometric rate is not affected by the chain's starting point.
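Exercise 3.6 suggests R; by way of illustration, here is an equivalent Python sketch of the Lindley/Loynes computation (function names and the choice of increment distribution are ours):

```python
import random

def lindley_from_past(etas):
    """Waiting time at time 0 for a queue started empty at time -T,
    driven by the increments etas = (eta_{-T+1}, ..., eta_0)."""
    w = 0.0
    for eta in etas:
        w = max(0.0, w + eta)        # Lindley recursion
    return w

def running_max_form(etas):
    """max{0, S_1, S_2, ...} where S_k sums the last k increments:
    the time-reversed representation appearing in Theorem 3.1."""
    s, m = 0.0, 0.0
    for eta in reversed(etas):
        s += eta
        m = max(m, s)
    return m

def loynes_sequence(etas):
    """Loynes' construction: Q^T for T = 1, ..., len(etas), computed on
    one fixed realization of the past; the sequence is nondecreasing."""
    return [lindley_from_past(etas[-T:]) for T in range(1, len(etas) + 1)]
```

On any fixed increment sequence the recursion from an empty queue at time −T agrees pathwise with the running-max representation, and Q^T is nondecreasing in T: exactly the monotonicity driving Loynes' coupling.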
Uniform ergodicity corresponds loosely to a "virtually finite state space". However chains may still be uniformly ergodic even if the state space is far from finite: the Häggström et al. (1999) chain in Section 1.6 is a good example.

Foss and Tweedie (1998) show that the (theoretical) possibility of classic CFTP is equivalent (under modest regularity) to uniform ergodicity (rate of convergence to equilibrium does not depend on the start-point). For classic CFTP needs vertical coalescence: every possible start from time −T leads to the same result at time 0, and this implies a uniformly geometric rate of convergence to equilibrium.

Figure 3.1: Vertical coalescence

The converse, that under modest regularity uniform ergodicity implies classic CFTP is possible in principle, follows from small-set theory.

Theorem 3.9 (Foss and Tweedie 1998) Suppose a Markov chain X on a general state space X has positive probability of hitting a specified small set C of order k, where the probability may depend on the starting position but is always positive. Then X is uniformly ergodic if and only if classic CFTP is possible in principle (disregarding all computational problems!).

Proof (Outline): It is clear from the CFTP construction that classic CFTP forces uniform ergodicity. On the other hand, uniform ergodicity means we can choose n such that p(n)(x, ·) is close to equilibrium in total variation, uniformly in x. It follows that in principle we can design a split chain which has a positive chance of applying regeneration every n + k time steps, and this permits the construction of small-set CFTP.

The practical obstacle here is that we will have to gain knowledge of p(n)(x, ·) to build the split chain.
But in general we expect p(n)(x, ·) to be less accessible than the equilibrium distribution itself!

It follows from the very existence of CFTP constructions that all our chains so far have been uniformly ergodic. Can we lift this uniform ergodicity requirement?

CFTP almost works with horizontal coalescence as exhibited in the Lindley representation: "all sufficiently early starts from a specific location * lead to the same result at time 0". But, as the Lindley representation highlights, the question is how to identify when this has happened.

Figure 3.2: Horizontal coalescence

Remark 3.10 Dominated CFTP idea (domCFTP): generate target chains coupled to a dominating process for which the equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given by the dominating process.

If we can make this work then we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Thönnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity ≠ domCFTP!).

Theorem 3.11 (Kendall 1998; Kendall and Møller 2000) If coalescence is almost sure then domCFTP samples from equilibrium.

Proof: For simplicity we suppose the target process X and the dominating process Y are non-negative. Let X^{upper,−n} and X^{lower,−n} = X^{−n} be versions of the target chain started at time −n at Y(−n) and 0 respectively. Let −T be the latest time such that X^{upper,−T}(0) = X^{lower,−T}(0) = X^{−T}(0) (so −T is the coalescence time).
Now argue as in Theorem 1.2 for classic CFTP: if X converges to an equilibrium π in total variation distTV then

distTV(L(X^{−T}(0)), π) = lim_n distTV(L(X^{−n}(0)), π) = lim_n distTV(L(X^{0}(n)), π) = 0,

hence the result.¹

¹ Note: the very last step makes implicit use of the stationarity of Y!

Figure 3.3: Dominated CFTP

Thus we can use realizations of the target process started from the dominating process to identify horizontal coalescence.

3.3. Non-linear birth-death processes

To illustrate domCFTP in detail, we describe a simple example taken from Kendall (1997a). Consider a continuous-time non-linear birth-death process X, with transition rates

X → X + 1 at rate λX ,
X → X − 1 at rate Xµ ,

for positive λX, µ. We suppose the birth rate λX is bounded above² by λ. Of course it is possible to compute the equilibrium distribution using detailed balance. However here the object of the exercise is to construct a domCFTP method to draw exactly from the target distribution.

² Kendall (1997a) requires monotonicity for λX, which is unnecessarily restrictive!

Note first that the non-linear birth-death process X can be bounded above, or dominated, by the linear birth-death process Y with transition rates

Y → Y + 1 at rate λ ,
Y → Y − 1 at rate Yµ .

Here domination means: if 0 ≤ X(0) ≤ Y(0) then we can construct coupled copies of X and Y such that the relationship X ≤ Y is maintained for all time. Indeed we can go further: given the process Y, then for any x with 0 ≤ x ≤ Y(0) we can construct a copy Xx of X begun at x, such that 0 ≤ Xa ≤ Xb ≤ Y for all time whenever a ≤ b ≤ Y(0).

We do this as follows.
We construct X from Y by supposing that X can have a birth only if Y has a birth, and similarly for deaths. Suppose that to each birth incident and each death incident of Y there is attached an independent Uniform[0, 1] random mark U. The construction of X is then specified by determining, for each Y incident, whether or not it corresponds to an X incident.

• A birth incident Y → Y + 1 at time t with mark U is allowed to generate an X birth incident exactly when

U ≤ λX(t−) / λ ;  (3.1)

• A death incident Y → Y − 1 at time t with mark U is allowed to generate an X death incident exactly when

U ≤ µX(t−) / (µY(t−)) = X(t−) / Y(t−) .  (3.2)

It is apparent from X(t−) ≤ Y(t−) and the bound λX ≤ λ that the U-based criteria above use probabilities λX(t−)/λ ≤ 1 and X(t−)/Y(t−) ≤ 1 respectively. This permits an inductive argument, iterating through the birth and death incidents of Y, which shows X ≤ Y for all time, and which indeed also demonstrates 0 ≤ Xa ≤ Xb ≤ Y if 0 ≤ Xa(0) ≤ Xb(0) ≤ Y(0).

Now carry out the CFTP construction, but making starts at times −n, −2n, . . . using a stationary realization of the dominating process, rather than the top-most state. To do this it is necessary to be able to

1. draw from the equilibrium of the dominating process (easy here: detailed balance identifies the equilibrium distribution as Geometric);
2. simulate the reversed process in equilibrium (easy here: by detailed balance the process is reversible).

3.4. A simple application of domCFTP: Perpetuities

Consider the simplest perpetuity, with distribution defined by

L(X) = L(U(1 + X))  (3.3)

where U is a Uniform(0, 1) random variable which is independent of X. Clearly X should have limiting distribution given by the infinite product

U1(1 + U2(1 + U3(1 + . . .)))  (3.4)

where the Ui are independent Uniform(0, 1) random variables; so Ui can be viewed as the effect of random inflation on an annual payment.

We want to know about this limiting distribution of X: in effect we want to know about the equilibrium behaviour of the Markov chain whose update is

Xn+1 = Un+1(1 + Xn) .

Monotonicity and CFTP ideas can be applied to the infinite product (3.4): certainly the limiting distribution satisfies

L(X) ≥ lim_n {U1(1 + U2(1 + U3(1 + . . . (1 + Un × 0) . . .)))}

and it is possible to establish equality using small sets and Foster-Lyapunov theory. However we are left with the dilemma already faced when considering the Lindley representation in Section 3.1 on queues: we have horizontal coalescence here rather than vertical coalescence, so classical CFTP is inapplicable.

Here is how to apply domCFTP. Take logs and produce an upper bound on the Markov chain log Xn which is a reflecting random walk:

log Xn+1 = log Un+1 + log((1 + Xn)/Xn) + log Xn
         ≤ −Vn+1 + log(1 + 1/k) + log Xn   if Xn ≥ k ,

where Vn+1 is an Exponential(1) random variable. Thus log X ≤ Z if Z is the reflecting random walk given by

Zn+1 = κ − Vn+1 + Zn   if κ − Vn+1 + Zn ≥ log k ,
Zn+1 = log k           otherwise,   (3.5)

where κ = log(1 + 1/k).

We need an equilibrium distribution for Z, which can be viewed as the workload process of a modest variation on a discrete M/D/1 queue.
Actually it is easiest to bound Z above by a reversible reflecting simple random walk W: we approximate κ − Vn+1 from above by a discrete random variable

∆Wn+1 = κ    if Vn+1 ≤ κ        (probability 1 − e^{−κ}),
∆Wn+1 = 0    if κ ≤ Vn+1 < 2κ   (probability e^{−κ} − e^{−2κ}),
∆Wn+1 = −κ   if 2κ ≤ Vn+1       (probability e^{−2κ}),

and note that the corresponding reflecting simple random walk W on {nκ + log k : n = 0, 1, 2, . . .} dominates Z (use coupling!), is positive-recurrent if e^{−2κ} + e^{−κ} > 1 (in which case e^κ(e^κ − 1) < 1), and then has an equilibrium distribution which is Geometric:

P[W = nκ + log k] = (e^κ(e^κ − 1))^n (1 − e^κ(e^κ − 1)) .  (3.6)

So we can draw from this Geometric distribution, then simulate the (reversible!) reflecting simple random walk W backwards in time, and use exp(W) as dominating process. Calculations show that a choice of k = 2.76796 . . . is optimal, in the sense of minimizing the mean equilibrium value of W.

There is still one problem left. If we apply the same value of Ui to all possible starts lying below the dominating process, then we will not achieve coalescence. (We will get convergence at a geometric rate, which will eventually be absorbed by rounding error, but that is not the same thing!) It is best to use Wilson's multi-shift trick.

A perfect perpetuity. Note that the approximate algorithm, "run the recurrence till the initial conditions are lost in floating point error", is slower! See Devroye (2001) for a different (but still perfect!) approach.

3.5. Point processes

Dominated CFTP works, for example, on both attractive and repulsive area-interaction point processes (Kendall 1998; Kendall and Møller 2000): using as target chain a spatial birth-and-death process, which gives birth to points at a rate determined by the local interaction with pre-existing points, and which kills points at unit rate per point.
This allows the use of domCFTP in a manner very similar to that of Section 3.3, but with Uniform[0, 1] marks replaced by marks which are Poisson clusters, as described in Kendall (1997a).

[Illustration: dominated CFTP for an attractive area-interaction point process, with geometric marking using Poisson processes in disks.]

It is of interest in stochastic geometry that this expresses such point processes as explicit but highly dependent thinnings of Poisson point processes.

How do we adapt the method of domCFTP (or indeed CFTP in general) to non-monotonic cases? We can use the crossover trick (Kendall 1998), which we explain in the context of repulsive area-interaction γ < 1. Create two chains to bound the target chain X above (by X^upper) and below (by X^lower). Cross over the rules for birth: a proposed point is born in X^upper if it would pass the test for X^lower, and vice versa. Then automatically X^lower ⊆ X ⊆ X^upper, so CFTP can be applied. Kendall and Møller (2000) give a general formulation for point processes.

See also Huber (1999)'s use of "swap moves" in the context of bounding chains, which he uses to estimate a rapid-mixing regime. If a birth proposal is blocked by just one point, then replace the blocking point by the new proposal in a swap, with swap probability p_swap which we are free to choose; fix the bounding chains accordingly(!); this results in a provable speed-up of the CFTP algorithm.

3.6. A general theorem for CFTP

Moving from CFTP to domCFTP suggests still further abstraction. This is useful, for example, when considering CFTP for conditioned point processes as in Cai and Kendall (2002): it can be convenient to allow the upper- and lower-processes to move out of the conditioned state space.

Let X be a Markov chain on X which exhibits equilibrium behaviour.
Embed the state space X in a partially ordered space (Y, ≼) so that X is at the bottom of Y, in the sense that for any y ∈ Y and x ∈ X, y ≼ x implies y = x.

We may then use the methods of Theorem 1.2 (CFTP) and Theorem 3.11 (domCFTP) to show:

Theorem 3.12 Define a Markov chain Y on Y such that Y evolves as X after it hits X; let Y(−u, t) be the value at time t of a version of Y begun at time −u,

(a) of fixed initial distribution: L(Y(−T, −T)) = L(Y(0, 0)), and
(b) obeying funnelling: if −v ≤ −u ≤ t then Y(−v, t) ≼ Y(−u, t).

Suppose coalescence occurs: P[Y(−T, 0) ∈ X] → 1 as T → ∞. Then lim_T Y(−T, 0) can be used for a CFTP draw from the equilibrium of X.

Exercise 3.13 Murdoch (2000) points out that an MCMC algorithm can be forced to become uniformly ergodic by altering the move to allow a small chance of an independence sampler move. This can be expressed in terms of the above general framework.

Lecture 4: Theory and connections

4.1 FMMR
4.2 Recycling randomness
4.3 Efficiency
4.4 Foster-Lyapunov
4.5 BFA algorithm

4.1. Siegmund duality and FMMR

An important alternative to CFTP makes fuller use of the notion of time reversal, as in the dead-leaves example and Section 3.1 on queues. We begin with a beautiful duality.

Theorem 4.1 (Siegmund duality) Suppose X is a process on [0, ∞). When is there another process Y satisfying the following?

P[Xt ≥ y | X0 = x] = P[Yt ≤ x | Y0 = y]  (4.1)

Answer (Siegmund 1976): exactly when X is (suitably regular and) stochastically monotone: x ≤ x′ implies

P[Xt ≥ y | X0 = x] ≤ P[Xt ≥ y | X0 = x′] .
Proof (Outline): Use Equation (4.1) and Fubini's Theorem to derive the Chapman-Kolmogorov equations.

Remark 4.2 If X is not stochastically monotone then we get negative transition probabilities for Y!

Remark 4.3 Consequence: Y is absorbed at 0, and X at ∞.

Remark 4.4 Intuition: think of the Siegmund dual this way. Couple monotonically the copies X^{(x)} begun at the different starting points x (use stochastic monotonicity!), and set Y_t^{(y)} = x if X_t^{(x)} = y.

This beautiful idea grew into a method of simulation, and then into a method of perfect simulation, Fill's method (Fill 1998), an alternative to CFTP. Fill's method considered on its own is harder to explain than CFTP. On the other hand it has advantages too:

• it can provide user-interruptibility (perfect draws are not biased by censoring draws which take longer than a specified threshold);
• as we will see, it can be viewed as a conditional version of CFTP, and the conditioning can be used to speed up the algorithm.

Fill's method (Fill 1998; Thönnes 1999) is at first sight quite different from CFTP. It is based on the notion of a strong uniform time T (Diaconis and Fill 1990) and is related to Siegmund duality. However Fill et al. (2000) establish a profound and informative link with CFTP. We explain using "blocks" as input-output maps for a chain.

First recall that CFTP can be viewed in a curiously redundant fashion as follows:

• Draw from equilibrium X(−T) and run forwards;
• continue to increase T until X(0) is coalesced;
• return X(0); note that by construction T and X(0) are independent of X(−T).

Key observation: By construction, X(0) and T are independent of X(−T), so we can condition on them!
• Condition on a convenient X(0);
• run X backwards to a fixed time −T;
• draw blocks conditioned on the X transitions;
• if coalescence then return X(−T), else repeat.

This makes it apparent that there are gains to be obtained over CFTP by careful selection of the convenient X(0). These gains can be dramatic! (See for example Dobrow and Fill 2003.)

Question 4.5 Is there a dominated version of Fill's method?

4.2. Recycling randomness

Fill and Huber (2000) introduce a quite different form of perfect simulation! Here is how they apply their Randomness Recycler algorithm to the problem of drawing a random independent subset X of a graph G, weighted exponentially by the number of points in X. (Here G maps each vertex to its set of neighbours.)

Start: V = ∅, x ≡ 0. End: V = G, x indicates X membership.

    import random

    def randomness_recycler(G, alpha, rng=random):
        V = set()
        x = {v: 0 for v in G}
        while V != set(G):
            v = rng.choice(sorted(set(G) - V))  # choose v from G \ V
            V.add(v)
            if rng.random() <= 1 / (1 + alpha):
                x[v] = 0                # skip v with prob 1/(1 + alpha)
            else:
                x[v] = 1                # or tentatively include it ...
                nbd = []                # ... iterate thro' neighbours
                for w in G[v]:          # valid?
                    nbd.append(w)
                    if x[w] == 1:       # if not valid ...
                        x[w] = 0        # ... remove all
                        x[v] = 0        # "contaminated" vertices
                        V -= {v, *nbd}
                        break           # and move on
        return {v for v in G if x[v] == 1}

4.3.
Efficiency and the price of perfection

How efficient might CFTP be? When there is strong enough monotonicity then useful bounds have been derived, even as early as Propp and Wilson (1996).

Remark 4.6 In general CFTP has to involve coupling, usually co-adapted. One expects co-adapted coupling to happen at some exponential rate, and convergence to equilibrium (in total variation norm distTV!) likewise. From the Coupling Inequality (1.1) we know that coupling cannot happen faster than convergence to equilibrium.

In the case of monotonic CFTP on a finite partially ordered space, Propp and Wilson (1996) present a strong bound. Let ℓ be the length of the longest chain in the space, let T* be the coalescence time, and let

d(k) = max_{x,y} ‖P_x^{(k)} − P_y^{(k)}‖TV .

Then

P[T* > k]/ℓ ≤ d(k) ≤ P[T* > k] ,  (4.2)

so CFTP is within a factor ℓ of being as good as possible.

But can it happen slower? And for relatively simple Markov chains? Burdzy and Kendall (2000) show how to use coupling to find out about this coupling problem!
Coupling of couplings: suppose that

|pt(x1, y) − pt(x2, y)| ≈ c2 exp(−µ2 t)

while

P[τ > t | X(0) = (x1, x2)] ≈ c exp(−µt) .

Then

|pt(x1, y) − pt(x2, y)| = |P[X1(t) = y | X1(0) = x1] − P[X2(t) = y | X2(0) = x2]|
  = |P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]| × P[τ > t | X(0) = (x1, x2)] .

Let X* be an independently coupled copy of X but begun at (x2, x1):

|P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]|
  = |P[X1(t) = y | τ > t, X(0) = (x1, x2)] − P[X1*(t) = y | τ* > t, X*(0) = (x2, x1)]|
  ≤ P[σ > t | τ > t, τ* > t, X(0) = (x1, x2)]   (≈ c′ exp(−µ′t))

for σ the time at which X and X* couple. Thus µ2 ≥ µ′ + µ.

Remark 4.7 So co-adapted coupling is strictly slower than convergence to equilibrium when coupled chains can swap round before coupling (the opposite of monotonicity!). Burdzy and Kendall (2000) present heuristics to expand on this. Here is a simple continuous-time Markov chain case. Mountford and Cranston (2000) describe a still simpler example!
Remark 4.8 Kumar and Ramesh (2001) give a computer-science type example of graph matchings: coupling becomes much slower than convergence to equilibrium as the problem size increases.

Remark 4.9 A different question about efficiency is: how best to use CFTP to obtain repeated samples? See Murdoch and Rosenthal (1999).

4.4. Dominated CFTP and Foster-Lyapunov conditions

Corcoran and Tweedie (2001) describe how to mix domCFTP and small-set CFTP. The upper envelope process must be formulated carefully . . . . When the dominating process visits a small set, then one can attempt small-set coupling. However the dominating process must be large enough to dominate instances when small-set coupling is attempted and fails!

There are similarities to Foster-Lyapunov conditions for assessing geometric ergodicity etc. for Markov chains. Such conditions use a Lyapunov function V to deliver a controlled supermartingale off a small set. We begin by discussing a Foster-Lyapunov condition for positive-recurrence.

Theorem 4.10 (Meyn and Tweedie 1993) Positive-recurrence on a set C holds if C is a small set and one can find a constant β > 0 and a non-negative function V bounded on C such that for all n > 0

E[V(Xn+1) | Xn] ≤ V(Xn) − 1 + β I[Xn ∈ C] .  (4.3)

Proof: Let N be the random time at which X first (re-)visits C. It suffices¹ to show E[N | X0] < V(X0) + constant < ∞ (then use small-set regeneration). By iteration of (4.3), we deduce E[V(Xn) | X0] < ∞ for all n. If X0 ∉ C then (4.3) tells us that

n ↦ V(X_{n∧N}) + n ∧ N

defines a non-negative supermartingale (since I[X_{n∧N} ∈ C] = 0 if n < N). Consequently

E[N | X0] ≤ E[V(X_N) + N | X0] ≤ V(X0) .
¹ This martingale approach can be reformulated as an application of Dynkin's formula.

If X0 ∈ C then the above can be used to show

E[N | X0] = E[1 × I[X1 ∈ C] | X0] + E[E[N | X1] I[X1 ∉ C] | X0]
          ≤ P[X1 ∈ C | X0] + E[1 + V(X1) | X0]
          ≤ P[X1 ∈ C | X0] + V(X0) + β

where the last step uses Inequality (4.3) applied when I[X0 ∈ C] = 1.

Now we consider a strengthened condition for geometric ergodicity.

Theorem 4.11 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1 and β, and a function V ≥ 1 bounded on C such that

E[V(Xn+1) | Xn] ≤ λ V(Xn) + β I[Xn ∈ C] .  (4.4)

Proof: Define N as in Theorem 4.10. Iterating (4.4), we deduce E[V(Xn) | X0] < ∞, and more specifically that

n ↦ V(X_{n∧N}) / λ^{n∧N}

is a non-negative supermartingale. Consequently

E[V(X_N)/λ^N | X0] ≤ V(X0) .

Using the facts that V ≥ 1 and λ ∈ (0, 1), together with Markov's inequality, we deduce

P[N > n | X0] ≤ λ^n V(X0) ,

which delivers the required geometric ergodicity.

Temptation: define a dominating process using V. There is an interesting link (Rosenthal 2002 draws it even closer), but:

Existence of a Lyapunov function doesn't ensure domCFTP.

The expectation inequality of supermartingale type,

E[V(Xn+1) | Xn] ≤ λ V(Xn) + β I[Xn ∈ C] ,

is enough to control the rate at which X visits C; but for domCFTP based on the ordering implied by V we require well-behaved distributional bounds on the families of distributions

D_x = {L(V(Xn+1) | Xn = u) : V(u) ≤ V(x)} ,

and it is easy to construct badly behaved examples.
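To make the drift condition (4.4) concrete, here is a small self-checking Python sketch for the perpetuity chain Xn+1 = Un+1(1 + Xn) of Section 3.4; the choices V(x) = 1 + x, λ = 0.75, β = 0.75 and C = [0, 3] are ours, purely for illustration:

```python
# Perpetuity update X_{n+1} = U_{n+1}(1 + X_n), with U ~ Uniform(0,1).
# With V(x) = 1 + x we have, exactly,
#   E[V(X_{n+1}) | X_n = x] = 1 + (1 + x)/2.
def drift_lhs(x):
    return 1.0 + (1.0 + x) / 2.0

# Illustrative constants (our choice): lambda, beta, and small set C = [0, 3].
LAM, BETA, C_MAX = 0.75, 0.75, 3.0

def drift_rhs(x):
    return LAM * (1.0 + x) + (BETA if x <= C_MAX else 0.0)

# Verify the geometric drift condition (4.4) on a grid of states.
for i in range(2001):
    x = i * 0.01                     # grid over [0, 20]
    assert drift_lhs(x) <= drift_rhs(x) + 1e-12
```

Here the conditional expectation is available in closed form, so the check merely confirms an elementary inequality; as the surrounding discussion warns, such a drift bound on its own does not deliver a dominating process for domCFTP.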
Exercise 4.12 Construct a Markov chain on [0, ∞) which satisfies the conditions of Theorem 4.11, using V(x) = x and C = {0}, but such that any V-dominating process for X (a process U for which V(Un+1) ≥ V(Xn+1) whenever V(Un) ≥ V(Xn)) must be transient!

However I have recently discovered how to circumvent this issue using sub-sampling . . . .

4.5. Backward-forward algorithm

A careful look at domCFTP for the area-interaction process, or at generalizations to other point processes as described in Kendall and Møller (2000), shows that the construction is as follows:

• build a space-time Poisson process of free points;
• convert free points into initial points for time-like line segments, hence a space-time birth and death process;
• mark free points independently;
• apply a causal thinning procedure in time order;
• domCFTP succeeds if the time-zero result of this thinning procedure stabilizes when thinning begins far enough back in time; apply a binary search procedure to capture a time early enough to ensure stabilization at time zero.

Ferrari et al. (2002) describe a variant of perfect simulation (the Backwards-Forwards Algorithm, or BFA) which avoids the need to iterate back through successive starts −T, −2T, −4T, . . . :

• conduct a recursive backwards sweep, identifying free points (ancestors) which by thinning might conceivably influence subsequent points already identified as potentially influencing points in the region of interest;
• work forwards through time in a forwards sweep, thinning out ancestors to obtain the required perfect sample at time zero (assuming the backwards sweep generates only finitely many ancestors).
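For contrast, the search back through successive starts −T, −2T, −4T, . . . that BFA avoids can be sketched generically. This minimal Python driver uses a toy monotone chain (a reflected simple random walk with uniform equilibrium); the chain and all constants are illustrative choices, not a point-process example:

```python
import random

def cftp_doubling(n_states=5, rng=random):
    """Classic CFTP driver searching back through start times -1, -2, -4, ...
    The randomness for each time step is generated once and reused on every
    later pass (essential for correctness).  Toy monotone chain: reflected
    simple random walk on {0, ..., n_states-1}, with uniform equilibrium."""
    updates = []      # updates[k] drives the step from time -(k+1) to -k
    T = 1
    while True:
        while len(updates) < T:               # extend the noise into the past
            updates.append(rng.random())
        xs = list(range(n_states))            # every possible start at time -T
        for u in reversed(updates[:T]):       # evolve from time -T up to 0
            if u < 0.5:
                xs = [min(x + 1, n_states - 1) for x in xs]
            else:
                xs = [max(x - 1, 0) for x in xs]
        if len(set(xs)) == 1:
            return xs[0]                      # coalesced value at time 0
        T *= 2                                # otherwise look further back
```

BFA replaces this repeated re-running from ever-earlier starts by a single backwards sweep (collecting ancestors) followed by a single forwards sweep (thinning them).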
Instead of coalescence, we now require sub-criticality of the oriented percolation implicit in the backwards sweep; computable conditions arise from standard branching process comparisons, and these conditions will generally apply under sufficiently weak interactions.

The BFA generalizes easily to deal with space windows of infinite-volume processes (compare the "space-time CFTP" mentioned in Section 1.7).

An enlightening theoretical example arises if one reformulates the Ising model using Peierls contours (lines separating ±1 values). As is well known, these form a "non-interacting hard-core gas", interacting by perimeter exclusion, to which the Backwards-Forwards Algorithm may in principle be applied: see Fernández et al. (2001).²

Exercise 4.13 Use the BFA to implement a perfect simulation of the Peierls contour model for low-temperature Ising models.

² Ferrari et al. (2002) point out that, especially for infinite-volume processes, there is a "user-impatience" bias, for which they estimate the effects.

References

This is a rich hypertext bibliography. Journals are linked to their homepages, and stable URL links (as provided for example by JSTOR or Project Euclid) have been added where known. Access to such URLs is not universal: in case of difficulty you should check whether you are registered (directly or indirectly) with the relevant provider.

Aldous, D. J. [1990]. The random walk construction of uniform spanning trees and uniform labelled trees. SIAM J. Discrete Math. 3(4), 450–465.

Asmussen, S., P. W. Glynn, and H. Thorisson [1992]. Stationarity detection in the initial transient problem. ACM Trans. Model. Comput. Simul. 2(2), 130–157.

Athreya, K. B. and P. Ney [1978]. A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society 245, 493–501.
In the case of preprints, icons linking to homepage locations are inserted where available: note that these are probably less stable than journal links!

Borovkov, A. A. [1984]. Asymptotic methods in queuing theory. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Chichester: John Wiley & Sons Ltd.

Breyer, L. A. and G. O. Roberts [2002]. A new method for coupling random fields. LMS J. Comput. Math. 5, 77–94 (electronic).

Broder, A. [1989]. Generating random spanning trees. In Thirtieth Annual Symposium on Foundations of Computer Science, New York, pp. 442–447. IEEE.

Brooks, S. and G. Roberts [1997, May]. Assessing convergence of Markov Chain Monte Carlo algorithms. Technical report, University of Cambridge.

Brooks, S. P., Y. Fan, and J. S. Rosenthal [2002]. Perfect forward simulation via simulated tempering. Research report, University of Cambridge.

Burdzy, K. and W. S. Kendall [2000, May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied Probability 10(2), 362–409. Also University of Warwick Department of Statistics Research Report 331.

Cai, Y. and W. S. Kendall [2002, July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and Computing 12, 229–243. Also: University of Warwick Department of Statistics Research Report 349.

Corcoran, J. N. and U. Schneider [2003]. Shift and scale coupling methods for perfect simulation. Probab. Engrg. Inform. Sci. 17(3), 277–303.

Corcoran, J. N. and R. L. Tweedie [2001]. Perfect sampling of ergodic Harris chains. The Annals of Applied Probability 11(2), 438–451.

Cowles, M. and B. Carlin [1995, May]. Markov Chain Monte Carlo convergence diagnostics: A comparative review. Technical report, University of Minnesota, Div. of Biostatistics, USA.

Devroye, L. [2001]. Simulating perpetuities. Methodol. Comput. Appl. Probab.
3(1), 97–115.

Diaconis, P. [1996]. The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. U.S.A. 93(4), 1659–1664.

Diaconis, P. and J. Fill [1990]. Strong stationary times via a new form of duality. The Annals of Probability 18, 1483–1522.

Diaconis, P. and D. Stroock [1991]. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab. 1(1), 36–61.

Dobrow, R. P. and J. A. Fill [2003]. Speeding up the FMMR perfect sampling algorithm: a case study revisited. Random Structures Algorithms 23(4), 434–452.

Doeblin, W. [1937]. Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci. 39(1,2), 57–115, 3–61.

Fernández, R., P. A. Ferrari, and N. L. Garcia [2001]. Loss network representation of Peierls contours. The Annals of Probability 29(2), 902–937.

Ferrari, P. A., R. Fernández, and N. L. Garcia [2002]. Perfect simulation for interacting point processes, loss networks and Ising models. Stochastic Processes and Their Applications 102(1), 63–88.

Fill, J. A. [1998]. An interruptible algorithm for exact sampling via Markov Chains. The Annals of Applied Probability 8, 131–162.

Fill, J. A. and M. Huber [2000]. The randomness recycler: a new technique for perfect sampling. In 41st Annual Symposium on Foundations of Computer Science (Redondo Beach, CA, 2000), pp. 503–511. Los Alamitos, CA: IEEE Comput. Soc. Press.

Fill, J. A., M. Machida, D. J. Murdoch, and J. S. Rosenthal [2000]. Extension of Fill’s perfect rejection sampling algorithm to general chains. Random Structures and Algorithms 17(3-4), 290–316.

Foss, S. G. and R. L. Tweedie [1998]. Perfect simulation and backward coupling. Stochastic Models 14, 187–203.

Goldstein, S. [1978/1979]. Maximal coupling. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 46(2), 193–204.

Gousseau, Y. and F. Roueff [2003, December].
The dead leaves model: general results and limits at small scales. Preprint, arXiv:math.PR/0312035.

Green, P. J. and D. J. Murdoch [1999]. Exact sampling for Bayesian inference: towards general purpose algorithms (with discussion). In J. Bernardo, J. Berger, A. Dawid, and A. Smith (Eds.), Bayesian Statistics 6, pp. 301–321. The Clarendon Press, Oxford University Press. Presented as an invited paper at the 6th Valencia International Meeting on Bayesian Statistics, Alcossebre, Spain, June 1998.

Griffeath, D. [1974/1975]. A maximal coupling for Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 31, 95–106.

Häggström, O. [2002]. Finite Markov chains and algorithmic applications, Volume 52 of London Mathematical Society Student Texts. Cambridge: Cambridge University Press.

Häggström, O. and K. Nelander [1998]. Exact sampling from anti-monotone systems. Statistica Neerlandica 52(3), 360–380.

Häggström, O. and J. E. Steif [2000]. Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Comput. 9, 425–439.

Häggström, O., M. N. M. van Lieshout, and J. Møller [1999]. Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli 5(5), 641–658. Was: Aalborg Mathematics Department Research Report R-96-2040.

Huber, M. [1999]. The swap move: a tool for building better Markov chains. In 10th Annual Symposium on Discrete Algorithms.

Huber, M. [2003]. A bounding chain for Swendsen-Wang. Random Structures and Algorithms 22(1), 43–59.

Johnson, V. E. [1996]. Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths. Journal of the American Statistical Association 91(433), 154–166.

Kendall, W. S. [1983]. Coupling methods and the storage equation. Journal of Applied Probability 20, 436–441.

Kendall, W. S. [1997a].
On some weighted Boolean models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, Singapore, pp. 105–120. World Scientific. Also: University of Warwick Department of Statistics Research Report 295.

Kendall, W. S. [1997b]. Perfect simulation for spatial point processes. In Bulletin ISI, 51st session proceedings, Istanbul (August 1997), Volume 3, pp. 163–166. Also: University of Warwick Department of Statistics Research Report 308.

Kendall, W. S. [1998]. Perfect simulation for the area-interaction point process. In L. Accardi and C. C. Heyde (Eds.), Probability Towards 2000, New York, pp. 218–234. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 292.

Kendall, W. S. and J. Møller [2000, September]. Perfect simulation using dominating processes on ordered state spaces, with application to locally stable point processes. Advances in Applied Probability 32(3), 844–865. Also University of Warwick Department of Statistics Research Report 347.

Kendall, W. S. and G. Montana [2002, May]. Small sets and Markov transition densities. Stochastic Processes and Their Applications 99(2), 177–194. Also University of Warwick Department of Statistics Research Report 371.

Kendall, W. S. and E. Thönnes [1999]. Perfect simulation in stochastic geometry. Pattern Recognition 32(9), 1569–1586. Also: University of Warwick Department of Statistics Research Report 323.

Kifer, Y. [1986]. Ergodic theory of random transformations, Volume 10 of Progress in Probability and Statistics. Boston, MA: Birkhäuser Boston Inc.

Kindermann, R. and J. L. Snell [1980]. Markov random fields and their applications, Volume 1 of Contemporary Mathematics. Providence, R.I.: American Mathematical Society.

Kolmogorov, A. N. [1936]. Zur Theorie der Markoffschen Ketten. Math. Annalen 112, 155–160.

Kumar, V. S. A. and H. Ramesh [2001]. Coupling vs.
conductance for the Jerrum-Sinclair chain. Random Structures and Algorithms 18(1), 1–17.

Lee, A. B., D. Mumford, and J. Huang [2001]. Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model. International Journal of Computer Vision 41, 35–59.

Letac, G. [1986]. A contraction principle for certain Markov chains and its applications. In Random matrices and their applications (Brunswick, Maine, 1984), Volume 50 of Contemp. Math., pp. 263–273. Providence, RI: Amer. Math. Soc.

Lindvall, T. [2002]. Lectures on the coupling method. Mineola, NY: Dover Publications Inc. Corrected reprint of the 1992 original.

Loynes, R. [1962]. The stability of a queue with non-independent inter-arrival and service times. Math. Proc. Camb. Phil. Soc. 58(3), 497–520.

Meyn, S. P. and R. L. Tweedie [1993]. Markov Chains and Stochastic Stability. New York: Springer-Verlag.

Mira, A., J. Møller, and G. O. Roberts [2001]. Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 63(3), 593–606. See (Mira, Møller, and Roberts 2002) for corrigendum.

Mira, A., J. Møller, and G. O. Roberts [2002]. Corrigendum: Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 64(3), 581.

Møller, J. and G. Nicholls [1999]. Perfect simulation for sample-based inference. Technical Report R-99-2011, Department of Mathematical Sciences, Aalborg University.

Møller, J. and R. Waagepetersen [2004]. Statistical inference and simulation for spatial point processes, Volume 100 of Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.

Mountford, T. S. and M. Cranston [2000]. Efficient coupling on the circle. In Game theory, optimal stopping, probability and statistics, Volume 35 of IMS Lecture Notes Monogr. Ser., pp. 191–203. Beachwood, OH: Institute of Mathematical Statistics.

Murdoch, D. and J.
Rosenthal [1999]. Efficient use of exact samples. Statistics and Computing 10, 237–243.

Murdoch, D. J. [2000]. Exact sampling for Bayesian inference: unbounded state spaces. In Monte Carlo methods (Toronto, ON, 1998), Volume 26 of Fields Inst. Commun., pp. 111–121. Providence, RI: Amer. Math. Soc.

Mykland, P., L. Tierney, and B. Yu [1995]. Regeneration in Markov chain samplers. Journal of the American Statistical Association 90(429), 233–241.

Nummelin, E. [1978]. A splitting technique for Harris-recurrent chains. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 43, 309–318.

Nummelin, E. [1984]. General irreducible Markov chains and nonnegative operators. Cambridge: Cambridge University Press.

Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223–252.

Propp, J. and D. Wilson [1998]. Coupling from the past: a user’s guide. In Microsurveys in discrete probability (Princeton, NJ, 1997), Volume 41 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci., pp. 181–192. Providence, RI: American Mathematical Society.

Roberts, G. O. and N. G. Polson [1994]. On the geometric convergence of the Gibbs sampler. Journal of the Royal Statistical Society (Series B: Methodological) 56(2), 377–384.

Roberts, G. O. and J. S. Rosenthal [1999]. Convergence of slice sampler Markov chains. Journal of the Royal Statistical Society (Series B: Methodological) 61(3), 643–660.

Roberts, G. O. and J. S. Rosenthal [2001]. Small and pseudo-small sets for Markov chains. Stochastic Models 17(2), 121–145.

Roberts, G. O. and R. L. Tweedie [1996a]. Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341–364.

Roberts, G. O. and R. L. Tweedie [1996b]. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms.
Biometrika 83, 96–110.

Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability 7, no. 13, 123–128 (electronic).

Siegmund, D. [1976]. The equivalence of absorbing and reflecting barrier problems for stochastically monotone Markov processes. The Annals of Probability 4(6), 914–924.

Stoyan, D. [1983]. Comparison methods for queues and other stochastic models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Chichester: John Wiley & Sons. Translation from the German edited by Daryl J. Daley.

Sweeny, M. [1983]. Monte Carlo study of weighted percolation clusters relevant to the Potts models. Physical Review B 27(7), 4445–4455.

Thönnes, E. [1999]. Perfect simulation of some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87. Also University of Warwick Statistics Research Report 317.

Thorisson, H. [1988]. Backward limits. The Annals of Probability 16(2), 914–924.

Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York: Springer-Verlag.

Widom, B. and J. S. Rowlinson [1970]. New model for the study of liquid-vapor phase transitions. Journal of Chemical Physics 52, 1670–1684.

Wilson, D. B. [1996]. Generating random spanning trees more quickly than the cover time. In Proceedings of the Twenty-eighth Annual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), New York, pp. 296–303. ACM.

Wilson, D. B. [2000a]. How to couple from the past using a read-once source of randomness. Random Structures and Algorithms 16(1), 85–113.

Wilson, D. B. [2000b]. Layered Multishift Coupling for use in Perfect Sampling Algorithms (with a primer on CFTP). In N.
Madras (Ed.), Monte Carlo Methods, Volume 26 of Fields Institute Communications, pp. 143–179. American Mathematical Society.

Wilson, D. B. [2004]. Mixing times of lozenge tiling and card shuffling Markov chains. The Annals of Applied Probability 14(1), 274–325.

Wilson, D. B. and J. G. Propp [1996]. How to get an exact sample from a generic Markov chain and sample a random spanning tree from a directed graph, both within the cover time. In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (Atlanta, GA, 1996), New York, pp. 448–457. ACM.