NOTES ON PERFECT SIMULATION

Wilfrid S. Kendall
Department of Statistics, University of Warwick

March 2004
This work was carried out while the author was
visiting the Institute for Mathematical Sciences,
National University of Singapore in 2004.
The visit was supported by the Institute.
Version 1.7
Contents

1. CFTP: the classic case
2. CFTP and regeneration
3. Dominated CFTP
4. Theory and connections
Perfect simulation refers to the art of converting suitable Markov chain Monte Carlo algorithms into algorithms which return exact draws from the target distribution, instead of long-time approximations. The underlying ideas of perfect simulation stretch back a long way, but they sprang into prominence as a result of a ground-breaking paper of Propp and Wilson (1996), which showed how to obtain exact draws from (for example) the critical Ising model.
In these lectures I will describe several themes of perfect simulation:
Lecture 1 the original or classic Coupling From The Past algorithm (CFTP);
Lecture 2 variations which exploit regeneration ideas: small-set or split-chain constructions from Markov chain theory (small-set CFTP);
Lecture 3 generalizations which deal with non-monotonic and non-uniformly ergodic
examples (dominated CFTP); and
Lecture 4 striking results relating CFTP to an apparently different algorithm due originally to Fill, as well as other theoretical complements.
The talks (which will be organized approximately under the four headings above)
will be illustrated with online animations of CFTP simulations and will describe
related theoretical issues. The objective of these tutorial lectures is to give a sound
and clear exposition of these themes, and hence to promote an interplay of ideas with
practitioners of Markov Chain Monte Carlo.
Useful reading
• Books: Thorisson (2000), Häggström (2002), Møller and Waagepetersen (2004).
• Online bibliography: http://research.microsoft.com/~dbwilson/exact/
Lecture 1
CFTP: the classic case
1.1 Coupling
1.2 Random walk CFTP
1.3 The CFTP theorem
1.4 Falling leaves
1.5 Ising CFTP
1.6 Point process CFTP
1.7 CFTP in space
1.8 Pre-history
Recall basic facts concerning Markov chain Monte Carlo (MCMC):
• Fundamental paradigm:
  – specify the target distribution indirectly;
  – realize it as the equilibrium of a Markov chain;
  – sample approximately from the target distribution by running the Markov chain for a long time (till near-equilibrium).

• How long is the “burn-in” period?
  – Guess or diagnose (Cowles and Carlin 1995; Brooks and Roberts 1997),
  – estimate, whether analytically (Diaconis and Stroock 1991; Roberts and Polson 1994; Roberts and Tweedie 1996a; Roberts and Tweedie 1996b) or empirically (Johnson 1996),
  – or do better?
MCMC arises in a number of different areas of the mathematical sciences, with different emphases. (This makes interaction interesting and fruitful!) Here are some examples:

• Statistical mechanics. Infinite-dimensional systems: are there phase transition phenomena? How do they behave?
• Computer science. Approximate counting: how does the algorithm behave as the scale of the problem increases? Does the run-time increase exponentially, or polynomially?
• Image analysis. Given a noisy picture with some kind of geometric content: can we clean it up using modelling by spatial random fields? Can we identify significant features?
• Statistics.
  – Bayesian. Can we draw accurately (and quickly if possible!) from the posterior distribution on a space which may be low-dimensional but not at all symmetric?
  – Frequentist. What does the likelihood surface look like?
The Propp and Wilson (1996) CFTP idea:

• Design modified MCMC algorithms which deliver exact draws after a randomly long run-time.
• Construct actual implementations for interesting cases.
• The approach is based on probabilistic coupling methods.
• Competitive with ordinary MCMC for amenable cases.

This is exact or perfect simulation.[1]

Practical applications use a large number of different ideas, which will be introduced in these lectures.

[1] Mark Huber suggests a useful taxonomy: divide exact sampling methods into (a) direct (loosely: use knowledge of the normalizing constant) and (b) perfect (loosely: don’t use this knowledge).
1.1. Coupling and convergence: the binary switch

We commence by introducing the fundamental idea of coupling (see Lindvall 2002, or my online notes from the 2003 Durham symposium or the 2003 GTP program for more on coupling). Consider the simplest possible case: a continuous-time Markov chain with just two states, which makes transitions from one state to the other at constant rate 1/α (the binary switch).

Construction: Supply
(a) a Poisson process (rate 1/α) of 0 → 1 transitions,
(b) independently ditto of 1 → 0 transitions.

Use the transitions to build coupled processes X, Y begun at 0, 1. Clearly X, Y are (coupled) copies of the binary switch, coupling at the time T of the first Poisson incident.
Then the classic coupling inequality argument shows

    distTV(Xt, π) = sup_A { P[Xt ∈ A] − π(A) } ≤ P[T > t] ,    (1.1)

where

• π is the equilibrium distribution,
• and distTV is the total variation distance.

Diaconis (1996) reviews interesting features of convergence to equilibrium for random walks on the hypercube; the above chain is a basic building block of such Markov chains.
The coupling argument generalizes to arbitrary Markov chains:

• if we can couple a general Markov chain X to a version Y in statistical equilibrium, then such a coupling bounds the approach to equilibrium through Equation (1.1);
• if we allow non-adapted couplings, the bound is sharp (Griffeath 1975, Goldstein 1979);
• non-adapted couplings can be very difficult to construct! Co-adapted couplings can supply usable bounds but need not be sharp.
• Diaconis (1996) uses hypercube random walks to illustrate the cutoff phenomenon in simple contexts. Here is an interesting (but probably very hard) research question: how to relate CFTP to cutoff phenomena? (Useful hints in Propp and Wilson 1996, Wilson 2004.)

Can we use such a coupling to draw from equilibrium? The binary switch example is deceptive: X(T) is in equilibrium in the case of the binary switch, but not in general . . . .
1.2. Random walk CFTP

Consider coupling applied to the random walk X on {1, 2, . . . , N}, reflected at the boundary points. Use the synchronous coupling (which generalizes the coupling above!): then X only couples at 1, N. Thus X(T) cannot be a draw from equilibrium if N > 2.

The Propp–Wilson idea draws on a well-established theme from ergodic theory: realize a Markov chain as a stochastic flow and evolve it not into the future but from the past! However if we do this then we need to consider coupled realizations of the Markov chain started at all possible starting points. By monotonicity, we can focus on:

• X^{lower,−n} begun at 1 at time −n,
• X^{upper,−n} begun at N at time −n;

since these can be arranged to sandwich all other realizations begun at time −n.
• Run upper and lower processes from time −n.
• If they have coupled by time 0, return the common value.
• Otherwise, repeat but start at time −2n (say).

Issues:

• Run from the past, not into the future;
• Use common randomness and re-use it;
• Sample at time 0, not at coupling time;
• There is a rationale for n → 2n: binary search.
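The recipe above fits in a few lines of code. The following is a minimal sketch (Python; the function name, N = 10, and the fair-coin walk are my illustrative choices, not from the notes) of monotone CFTP for the reflected random walk, with the stored uniforms re-used as n doubles:

```python
import random

def random_walk_cftp(N=10, seed=0):
    """Classic monotone CFTP for simple random walk on {1, ..., N},
    reflected at the boundary.  Upper and lower chains start at N and 1
    at time -n; the SAME stored uniforms drive every level of the
    recursion (randomness is re-used), and n doubles until the chains
    meet at time 0, when the common value is an exact equilibrium draw."""
    rng = random.Random(seed)
    us = []                               # us[k] drives the step into time -k

    def step(x, u):
        x += 1 if u < 0.5 else -1
        return min(N, max(1, x))          # reflect at the boundary

    n = 1
    while True:
        while len(us) < n:                # extend common randomness backwards
            us.append(rng.random())
        lo, hi = 1, N
        for k in range(n - 1, -1, -1):    # evolve from time -n up to time 0
            lo, hi = step(lo, us[k]), step(hi, us[k])
        if lo == hi:
            return lo                     # sample taken at time 0
        n *= 2                            # otherwise restart further back
```

With this reflection rule the transition matrix is doubly stochastic, so the exact draw should be uniform on {1, . . . , N}.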
Exercise 1.1 Use simple R statistical package scripts and elementary calculations to investigate the bias resulting from the following.

• Running into the future, not from the past, stopping (say) at the first time t = 2n after coupling has occurred;
• Failing to re-use randomness (we’d expect this to bias towards “earlier” coalescence);
• Sampling at coupling time instead of time 0 (obviously a bad idea; sampling an independent random walk at this time still gives a biased result).

See my GTP online notes for further details.
1.3. The CFTP theorem

Morally the proof of classic CFTP is just 3 lines long. Express the coupling for X in terms of random input-output maps F(−u,v].

Theorem 1.2 (Propp and Wilson 1996) If coalescence is almost sure then CFTP samples from equilibrium.

Proof: For each time-range [−n, ∞) use the F(−n,t] to define

    X_t^{−n} = F(−n,t](0)    for −n ≤ t .

Finite coalescence time −T is assumed, so

    X_0^{−n} = X_0^{−T}    whenever −n ≤ −T ,

while by time-homogeneity L(X_0^{−n}) = L(X_n^0). If X converges to an equilibrium π in total variation distTV then

    distTV(L(X_0^{−T}), π) = lim_n distTV(L(X_0^{−n}), π) = lim_n distTV(L(X_n^0), π) = 0 ,

hence the result.
Remark 1.3 We are free to choose any “backwards stopping time” −T so long as we can guarantee coalescence of F(−T,0]. The binary search approach of random walk CFTP is deservedly popular, but there are alternatives: see for example the “block-by-block” strategy of read-once CFTP.

Remark 1.4 Monotonicity of the target process is convenient for CFTP, but not essential. Propp and Wilson (1996, §3.2) formalize the use of monotonicity. Kendall (1998) describes a “crossover” trick for use in anti-monotonic situations; Kendall and Møller (2000) generalize the trick to cases where monotonicity is absent (see also Huber 2003 on bounding chains); Häggström and Nelander (1998) apply the trick to lattice systems.
1.4. The falling leaves of Fontainebleau

Kendall and Thönnes (1999) describe a visual and geometric application of CFTP in mathematical geology: this particular example being well-known to workers in the field previous to the introduction of CFTP itself.

Occlusion CFTP for the falling leaves of Fontainebleau. (The “dead leaves” model.)

Why “occlusion”? David Wilson’s terminology: this CFTP algorithm builds up the result piece-by-piece with no back-tracking.

(Dead leaves in recent image analysis work: see Lee, Mumford, and Huang 2001, Gousseau and Roueff 2003.)
We can think of the “dead leaves” process as a Markov chain with states which are points in “pattern space”.

Corollary 1.5 Occlusion CFTP as described above delivers a sample from the dead leaves distribution.

Proof: Use the notation of Theorem 1.2: F(−n,t] now represents superposition of random leaves falling over the period (−n, t]. It suffices to note that general Markov chain theory[2] can be combined with the apparatus of stochastic geometry to show the random pattern converges in distribution to a limiting random pattern.

[2] e.g.: exploit the regeneration which occurs when the window is completely covered.
Exercise 1.6 Show that bias results from simulating forwards in time till the image is completely covered.

Hint: consider a field of view small enough for it to be covered completely by a single leaf: argue by comparison that the forwards simulation is relatively more likely to result in a pattern made up of just one large leaf!
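As a concrete (and much simplified) sketch of the occlusion algorithm itself, here is a one-dimensional dead leaves model on a pixel grid; the leaves-as-intervals setting and all names are illustrative assumptions, not the notes’ own construction:

```python
import random

def dead_leaves_cftp(n_points=50, leaf_len=0.3, seed=0):
    """Occlusion CFTP for a 1-D 'dead leaves' model on [0, 1].

    Leaves (intervals of length leaf_len with a random 'colour') are
    generated working BACKWARDS in time: the first leaf to cover a
    pixel determines its colour, since leaves met later in backwards
    order fell earlier and so are occluded.  The algorithm stops as
    soon as every pixel is covered -- no back-tracking, and the result
    is an exact draw from the equilibrium pattern (restricted to the
    window)."""
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    colour = [None] * n_points
    uncovered = n_points
    while uncovered > 0:
        centre = rng.uniform(-leaf_len, 1 + leaf_len)  # leaf centre
        c = rng.random()                               # leaf 'colour'
        for i, x in enumerate(xs):
            if colour[i] is None and abs(x - centre) <= leaf_len / 2:
                colour[i] = c                          # first cover wins
                uncovered -= 1
    return colour
```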
Remark 1.7 Web animations of perfect simulation for the dead leaves model can be found at www.warwick.ac.uk/go/wsk/abstracts/dead/

Remark 1.8 Other examples of occlusion CFTP include the Aldous–Broder algorithm for generating random spanning trees (Broder 1989; Aldous 1990; Wilson and Propp 1996; Wilson 1996).
1.5. Ising CFTP

Propp and Wilson (1996) showed how to make exact draws from the critical Ising model using Sweeny (1983)’s single-bond heat-bath (for Swendsen–Wang, see Huber 2003). A simpler application uses the heat-bath sampler to get exact draws from the sub-critical Ising model.

Classic CFTP for the Ising model (simple, sub-critical case). Heat-bath dynamics run from the past; compare maximal and minimal starts.

The simulation displays the results of a systematic-scan Gibbs sampler. The left-most image is generated by the lower chain; the right-most image by the upper chain, while the middle red image expresses the difference. The classic CFTP recipe is followed: after cycling through a time interval [−T, 0], the simulation extends back to cycle through [−2T, 0] unless the upper and lower chains coincide at time 0 (the middle image is blank at the end of the cycle).

Details of the target Ising measure are configurable (neighbourhood structure varying between 4 nearest neighbours and 4+4 nearest and next-nearest neighbours, with a “symmetry” option fixing the next-nearest interaction as 1/√2 of the nearest interaction). Coalescence becomes very slow as parameters approach super-criticality.
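The animation is not reproduced here, but the underlying algorithm is easy to sketch. The fragment below runs monotone heat-bath CFTP for a small sub-critical Ising model; it uses random-site updates rather than the systematic scan of the animation, and the grid size, β = 0.3, and all names are my illustrative choices:

```python
import math
import random

def ising_cftp(n=4, beta=0.3, seed=1):
    """Monotone CFTP for an n x n Ising model (free boundary) with
    random-site heat-bath dynamics at inverse temperature beta.
    Upper chain starts all +1, lower all -1; a common (site, uniform)
    stream is stored and re-used as the start time is pushed back."""
    rng = random.Random(seed)
    moves = []                               # moves[k] drives time -(k+1)

    def nbr_sum(spin, i, j):
        s = 0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:
                s += spin[a][b]
        return s

    def update(spin, site, u):
        i, j = site
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * nbr_sum(spin, i, j)))
        spin[i][j] = 1 if u < p_plus else -1  # common u preserves ordering

    T = n * n
    while True:
        while len(moves) < T:                 # extend randomness backwards
            moves.append(((rng.randrange(n), rng.randrange(n)), rng.random()))
        upper = [[1] * n for _ in range(n)]
        lower = [[-1] * n for _ in range(n)]
        for k in range(T - 1, -1, -1):        # evolve from -T up to time 0
            site, u = moves[k]
            update(upper, site, u)
            update(lower, site, u)
        if upper == lower:
            return upper                      # exact draw at time 0
        T *= 2
```

Monotonicity holds because the heat-bath acceptance probability is increasing in the neighbour sum, so sharing u keeps the upper chain above the lower one.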
We can also use this for the conditioned Ising model,[3] as used in image analysis applications (see also Besag’s lectures).[4]

Classic CFTP for the Ising model (super-critical case, conditioned). Heat-bath dynamics run from the past; compare maximal and minimal starts.

Simulation display and options as before. Coalescence speed now depends on the interaction between correlation parameters and conditioning, as could be predicted from theoretical considerations (Kindermann and Snell 1980).

[3] Conditioning ≡ external magnetic field!
[4] Note however, the Ising model is a very poor model for this particular (dithered) image!
Remark 1.9 Web animations of perfect simulations of conditioned Ising models (using simple images which are rather more suitable for 4-neighbour Ising models!) can be found at www.warwick.ac.uk/go/wsk/ising-animations/
1.6. Point process CFTP

Classic CFTP is not limited to discrete models. We have already seen this in the case of the falling leaves model. Here is a further example: a perfect simulation procedure for attractive area-interaction point processes (Häggström et al. 1999).[5]

Area-interaction point process: weight a Poisson process realization according to the area of the region closer than r to the pattern:

    pattern density ∝ γ^{−(area of region within distance r of pattern)} .    (1.2)

Exercise 1.10 If γ > 1 (the attractive case only!), show the above density is proportional to the probability that an independent Poisson process places no points within distance r of the pattern.

[5] Also known as the Widom and Rowlinson (1970) model.
Hence the area-interaction point process may be represented as the pattern of red points, where red and blue points are distributed as Poisson patterns conditioned to be at least distance r from blue and red points respectively. This can be implemented as a (typically impracticable) rejection sampler: a more practical option is to use a Gibbs sampler, which is monotonic and so lends itself to (classic!) CFTP.

First we describe the Gibbs sampler, then we demonstrate the CFTP procedure using an animation.
Construct a Poisson point process (centres of red crosses).

Construct a new Poisson process (centres of blue discs), but censor all points of the new process such that a disc centred on the point would overlap a red point.

Discard the old red points, construct a new Poisson process (centres of red crosses), but censor all points of the new process which would fall on a blue disc.

Continue this procedure . . . (notice the red/blue duality!)
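One sweep of this red/blue Gibbs sampler can be sketched as follows (Python; the intensity λ, radius r, and function name are illustrative assumptions, with leaves of discs replaced by a simple distance test):

```python
import math
import random

def wr_gibbs_sweep(red, lam=30.0, r=0.1, rng=None):
    """One sweep of the two-type Widom-Rowlinson Gibbs sampler on the
    unit square: redraw blue given red, then red given blue.  Each
    colour is a fresh Poisson(lam) pattern thinned to keep only points
    further than r from every point of the other colour."""
    rng = rng or random.Random()

    def poisson_count():
        # Knuth's method; exp(-30) is still representable as a float
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    def thinned(other):
        pts = [(rng.random(), rng.random()) for _ in range(poisson_count())]
        return [p for p in pts
                if all(math.dist(p, q) > r for q in other)]

    blue = thinned(red)        # censor blue near red
    new_red = thinned(blue)    # discard old red, censor new red near blue
    return new_red, blue
```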
Gibbs sampler CFTP for the attractive area-interaction point process as a marginal of a two-type soft-core repulsion point process.

The CFTP construction is based on the observations:

1. there is a partial order for states (red pattern, blue pattern);
2. “highest” and “lowest” states are (X, ∅) and (∅, X), where X is the full ground space. Note that these are “pseudo-states”, not achieved by the pattern itself!

The fact that there are highest and lowest pseudo-states is the key to this algorithm’s quick convergence (at least when there is no phase-transition effect): the underlying Markov chain is uniformly ergodic.
1.7. CFTP in space and time

When interactions are sufficiently weak (certainly weak enough that phase transitions cannot occur!) then the CFTP idea can be applied in space as well as time. In effect, one aims to capture a fragment of a virtual simulation in perfect equilibrium, for which the fragment is spatially limited as well as temporally limited. See Kendall (1997b) for a description and Häggström and Steif (2000) for related theory, also §4.5 on the BFA algorithm.

In this case the binary search rationale guides how we should extend the CFTP algorithm in both space and time.
1.8. Pre-history of CFTP

“The conceptual ingredients of CFTP were in the air” (Propp and Wilson 1998) for a long time. For example, elements can be found in:

• Kolmogorov (1936)[6]
• Doeblin (1937)
• Loynes (1962), Kendall (1983)
• Letac (1986), Kifer (1986)
• Borovkov (1984)
• Broder (1989), Aldous (1990)
• Thorisson (1988)

[6] Thanks to Thorisson (2000) for this reference . . . .
For example consider Kolmogorov (1936):

Consider an inhomogeneous (finite state) Markov transition kernel Pij(s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past?

Yes!

Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts:

    P[X(0) = k] = Qk(0)
    P[X(−1) = j, X(0) = k] = Qj(−1) Pjk(−1, 0)
    P[X(−2) = i, X(−1) = j, X(0) = k] = Qi(−2) Pij(−2, −1) Pjk(−1, 0)
    . . .
Lecture 2
CFTP and regeneration

2.1 Small sets
2.2 Small-set CFTP
2.3 Slice sampler CFTP
2.4 Multi-shift sampler
2.5 Catalytic CFTP
2.6 Read-once CFTP
2.7 Simulated tempering
2.8 Pre-history
2.1. Small sets

Suppose we want a coupling between two random variables X, Y yielding a maximal positive chance of X = Y and otherwise no constraint. (This is related to the notion of convergence stationnaire, or “parking convergence”, from stochastic process theory.) Clearly this is relevant to CFTP, where we aspire to coalescence!

Given two overlapping probability densities f and g, we can implement such a coupling (X, Y) as follows:

• Compute α = ∫ (f ∧ g)(x) dx, and with probability α return a draw of X = Y from the density (f ∧ g)/α.
• Otherwise draw X from (f − f ∧ g)/(1 − α) and Y from (g − f ∧ g)/(1 − α).

This can be combined usefully with the method of rejection sampling in stochastic simulation.
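A discrete analogue of this construction is easy to implement and test. The sketch below couples two probability mass functions so that X = Y with the maximal probability α; the function name and the representation of the pmfs as dicts are my choices:

```python
import random

def maximal_coupling(f, g, rng=None):
    """Maximal coupling of two pmfs f, g given as dicts over a common
    finite set: returns (X, Y) with correct marginals and with
    P[X = Y] equal to the overlap alpha = sum_x min(f(x), g(x))."""
    rng = rng or random.Random()
    keys = sorted(set(f) | set(g))
    overlap = {x: min(f.get(x, 0.0), g.get(x, 0.0)) for x in keys}
    alpha = sum(overlap.values())

    def draw(weights, total):
        u = rng.random() * total            # inversion over the weights
        for x in keys:
            u -= weights.get(x, 0.0)
            if u <= 0:
                return x
        return keys[-1]

    if rng.random() < alpha:                # coalesced draw from (f^g)/alpha
        x = draw(overlap, alpha)
        return x, x
    fres = {x: f.get(x, 0.0) - overlap[x] for x in keys}
    gres = {x: g.get(x, 0.0) - overlap[x] for x in keys}
    return draw(fres, 1 - alpha), draw(gres, 1 - alpha)
```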
From Doeblin’s time onwards, probabilists have applied this to study Markov chain transition probability kernels:

Definition 2.1 (Small set condition) Let X be a Markov chain on a state space S, with transition kernel p(x, dy). The set C ⊆ S is a small set of order k if for some probability measure ν and some α > 0

    p^{(k)}(x, dy) ≥ I[C](x) × α ν(dy) .    (2.1)

Here I[C](x) is the indicator function for the set C.

The animation displays the evolution of a Markov chain on the unit interval whose transition density p(x, y) is triangular with peak at x. Here the small set is the whole state space [0, 1], of order k = 1, with α = 1/2, and ν has the isosceles triangle density over [0, 1].

It is a central result that (non-trivial) small sets (possibly of arbitrarily high order) exist for any modestly regular Markov chain (classical proof: see either of Nummelin 1984; Meyn and Tweedie 1993).
The trouble with general Markov chains is that the chain may have zero probability of returning to any starting point. However if there is a small set of order 1 then we may re-model the chain to fix this.

Theorem 2.2 Let X be a Markov chain on a (non-discrete) state space S, with transition kernel p(x, dy) and a small set C of order 1. Then X can be represented using a new Markov chain on S ∪ {c}, for c a regenerative pseudo-state.

For details see Athreya and Ney (1978), also Nummelin (1978). Higher-order small sets can be used if we are prepared to sub-sample the chain . . . .

Small sets (perhaps of higher order) can be used systematically to attack general state space theory using discrete state space methods. See Meyn and Tweedie (1993), also Roberts and Rosenthal (2001).[1]

[1] Roberts and Rosenthal (2001) also introduce “pseudo-small sets”, which relate to coupling as small sets relate to CFTP.
It is natural to ask whether we can go further, and use small sets to re-model the chain as an entirely discrete affair. However . . .

Remark 2.3 Non-trivial small sets of order 1 need not exist: however they do exist if (a) the kernel p(x, dy) has a measurable density and (b) the chain is sub-sampled at even times. (Both conditions are needed: see the example in Kendall and Montana 2002.)
Theorem 2.4 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p^{(2)}(x, y) can be expressed (non-uniquely) as a non-negative countable sum

    p^{(2)}(x, y) = Σ_i f_i(x) g_i(y) .

Proof: Key Lemma, a variation on Egoroff’s Theorem: let p(x, y) be an integrable function on [0, 1]². Then we can find subsets A_ε ⊂ [0, 1], increasing as ε decreases, such that

(a) for any fixed A_ε the “L1-valued function” p_x is uniformly continuous on A_ε: for any η > 0 we can find δ > 0 such that ∫₀¹ |p_x(z) − p_{x′}(z)| dz < η for |x − x′| < δ and x, x′ ∈ A_ε;
(b) every point x in A_ε is of full relative density: as u, v → 0 so Leb([x − u, x + v] ∩ A_ε)/(u + v) → 1.
2.2. Murdoch–Green small-set CFTP

Green and Murdoch (1999) showed how to use small sets to carry out CFTP when the state space is continuous with no helpful ordering.

Small-set CFTP in nearly the simplest possible case: triangular kernel on [0, 1].

Exercise 2.5 Express small-set CFTP as a composition of Geometrically many kernels (conditioned to avoid the small-set effect), with starting point randomized by the small-set distribution ν.
2.3. Slice sampler CFTP

The task is to draw from a density f(x). Here is a one-dimensional example! (But note, this is only interesting because it can be made to work in many dimensions . . . ) Suppose f is unimodal. Define g0(y), g1(y) implicitly by: [g0(y), g1(y)] is the super-level set {x : f(x) ≥ y}. Alternate between drawing y uniformly from [0, f(x)] and drawing x uniformly from [g0(y), g1(y)].
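Before making it perfect, the plain forwards slice sampler is worth seeing in code. A minimal sketch for the unnormalized density f(x) = exp(−x²/2) (a standard normal), whose super-level sets are available in closed form; all names are illustrative:

```python
import math
import random

def slice_sampler(n_steps=5000, x0=0.0, seed=0):
    """Forwards (non-perfect) slice sampler targeting the unnormalized
    density f(x) = exp(-x^2 / 2).  The super-level set {x : f(x) >= y}
    is [g0(y), g1(y)] = [-sqrt(-2 log y), +sqrt(-2 log y)].
    Returns the list of visited states."""
    rng = random.Random(seed)
    f = lambda x: math.exp(-0.5 * x * x)
    x, out = x0, []
    for _ in range(n_steps):
        y = rng.uniform(0.0, f(x))            # vertical move: y ~ U(0, f(x))
        half = math.sqrt(-2.0 * math.log(y))  # level set is [-half, half]
        x = rng.uniform(-half, half)          # horizontal move over level set
        out.append(x)
    return out
```

The empirical mean and variance of the output should be close to 0 and 1 respectively.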
Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (of the order of 530 iterations!) under a specific variation of log-concavity. Mira, Møller, and Roberts (2001) describe a perfect variant.
Exercise 2.6 Design your own perfect slice sampler for the case when the unimodal density f(x) has bounded support.

HINT: You need to figure out how to make uniform choices for two versions of the process simultaneously, so as to preserve the partial ordering

    (x, y) ≼ (x′, y′)   if   f(x) ≤ f(x′) ,    (2.2)

but also so as to have positive chances of coalescing.
2.4. Multi-shift sampler

Question 2.7 How can we draw simultaneously for all x ∈ R from Uniform(x, x + 1), coupling the draws?

Answer: a random unit-span integer lattice (Wilson 2000b).
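The lattice trick can be sketched directly (Python; the function name is mine): draw V uniform on [0, 1) once, and for every x return the unique point of the lattice V + Z lying in (x, x + 1]:

```python
import math
import random

def multishift_uniform(xs, rng=None):
    """Multi-shift coupling via a random unit-span lattice: one draw
    V ~ Uniform[0, 1) supplies, for every x, the unique lattice point
    V + k in (x, x + 1] -- a draw from Uniform(x, x + 1] -- and all
    the draws are coupled monotonically (equal x gives equal draws)."""
    rng = rng or random.Random()
    v = rng.random()
    return [v + math.floor(x - v) + 1 for x in xs]
```

Note that every draw lies in (x, x + 1], all draws share the one lattice, and the map x ↦ draw is non-decreasing.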
Wilson (2000b) also considers more general distributions! For example, we can express a normal distribution as a mixture of uniform distributions using slice sampler ideas, and this allows us to do the same trick in several ways. The method also deals with multivariate and even multi-modal cases.

Corcoran and Schneider (2003) carry this even further, inventing and exploiting a scheme to couple draws from Uniform distributions with different ranges.
2.5. Catalytic CFTP

Breyer and Roberts (2002) have devised an “automatic” variation on small-set CFTP: catalytic CFTP. The underlying idea is to perform simultaneous Metropolis–Hastings updates for all possible states, using a common Uniform[0, 1] random variable U to determine rejection or acceptance. For suitable proposal random fields Φx, it may be possible to identify when the input-output map F(−t,0] coalesces into a finite range; moreover the choice of construction for Φx can be varied from time point to time point.

Simulations can be viewed at www.lbreyer.com/fcoupler.html
2.6. Read-once CFTP

Wilson (2000a): build F(−nt,0] of Theorem 1.2 from i.i.d. blocks

    F(−nt,0] = F(−t,0] ◦ . . . ◦ F(−(n−1)t,−(n−2)t] ◦ F(−nt,−(n−1)t] .

The blocking length t is chosen so that there is a positive chance of the map B (distributed as F(−t,0]) being coalescent. By a simple computation, the CFTP procedure is identical to the following forwards procedure:

• Repeatedly draw independent realizations of B till a coalescent block is obtained; note the coalesced output x.
• Repeatedly draw independent realizations of B; while these are not coalescent compute the update x ← B(x).
• When a coalescent realization of B is obtained, return x without updating!
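The forwards procedure can be sketched for the reflected random walk of Lecture 1 (Python; the choices N = 8, t = 100 and all names are illustrative assumptions):

```python
import random

def read_once_cftp(N=8, t=100, seed=0):
    """Read-once CFTP for monotone reflected random walk on {1, ..., N}.

    Draw i.i.d. blocks B of t common-random updates.  A block is
    coalescent iff it maps the extreme starts 1 and N to the same
    state (monotonicity: the extremes sandwich everything).  Once a
    coalescent block appears, push its output through subsequent
    non-coalescent blocks, and return the state held when the NEXT
    coalescent block arrives -- without applying that block."""
    rng = random.Random(seed)

    def step(x, u):
        x += 1 if u < 0.5 else -1
        return min(N, max(1, x))

    def block():
        return [rng.random() for _ in range(t)]

    def apply_block(b, x):
        for u in b:
            x = step(x, u)
        return x

    def coalescent(b):
        return apply_block(b, 1) == apply_block(b, N)

    b = block()
    while not coalescent(b):          # wait for the first coalescent block
        b = block()
    x = apply_block(b, 1)             # its common output
    while True:
        b = block()
        if coalescent(b):
            return x                  # return WITHOUT updating by b
        x = apply_block(b, x)
```

With this doubly stochastic walk the returned draw should again be uniform on {1, . . . , N}.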
There are strong resonances with small-set CFTP (the possibility of a coalescent B corresponds to the whole state space being a small set of order t) and with the dead leaves CFTP example (one can view the argument as a re-ordering in time).

Exercise 2.8 Prove the validity of read-once CFTP by establishing that the above forwards procedure produces a sequence of B maps which have the same distribution as would be produced by carrying out classic CFTP, but checking for coalescence only block-by-block. Exercise 2.5 should be helpful!

The choice of the blocking length t is of course crucial!

A feature of major importance of this method is that storage requirements are minimized: one needs only (a) to flag whether a block is coalescent, (b) to compute the output of a coalescent block, and (c) to track the current state of the chain as the blocks are produced.
2.7. Perfect simulated tempering

Recall that simulated tempering embeds the target distribution π = π0 as one of a family πi : i ∈ I of distributions: for each i ∈ I one devises a chain Xi leaving πi invariant, and carries out simulated tempering by evolving according to the chain Xi while making occasional proposals to change i to another value i′ ∈ I. The distribution of the resulting chain is proportional to the target π0 when restricted to i = 0.

Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002) devises schemes in which the chain at i = N (say) coalesces very fast; moreover it is possible to construct upper and lower i-chains after the manner of random walk CFTP. One can then carry out CFTP by evolving the upper and lower i-chains till (a) they coalesce at i = N, and (b) during the period of i-coalescence the XN chain itself coalesces.

This combines well with the ideas of read-once CFTP applied to the upper and lower i-chains (Brooks et al. 2002).
2.8. Pre-history of small-set CFTP

• Athreya and Ney (1978), Nummelin (1978)
• Asmussen, Glynn, and Thorisson (1992)
• Mykland, Tierney, and Yu (1995).
Lecture 3
Dominated CFTP

3.1 Queues
3.2 Ergodicity
3.3 Birth and death
3.4 Perpetuities
3.5 Point processes
3.6 General theorem
3.1. Queues
Home Page
Consider a GI/G/1 queue. Lindley noticed a beautiful representation for waiting time Wn of customer n in terms of services Sn and
inter-arrivals Xn . . .
Title Page
Contents
Theorem 3.1 (Lindley's equation) Consider the waiting time identity for the GI/G/1 queue, writing ηn = Sn − Xn+1 and =D for equality in distribution:
Wn+1 = max{0, Wn + Sn − Xn+1} = max{0, Wn + ηn}
= max{0, ηn, ηn + ηn−1, . . . , ηn + ηn−1 + . . . + η1}
=D max{0, η1, η1 + η2, . . . , η1 + η2 + . . . + ηn}
and thus we obtain the steady-state expression
W∞ =D max{0, η1, η1 + η2, . . .} .
SLLN/CLT/random walk theory shows W∞ will be finite if and only
if E [ηi ] < 0 or ηi ≡ 0.
What if we lose independence? Loynes (1962) discovered a coupling application to queues with (for example) general dependent stationary inputs and associated service times, pre-figuring CFTP.
Theorem 3.2 Suppose queue arrivals follow a stationary point process stretching back to time −∞. Denote arrivals and associated service times in (s, t] by Ns,t (stationarity: the statistics of the process {Ns,s+u : u ≥ 0} do not depend on s). Let QT denote the behaviour of the queue observed from time 0 onwards if begun with 0 customers at time −T. The queue converges to statistical equilibrium if and only if
limT→∞ QT
exists almost surely.
Remark 3.3 Stoyan (1983) develops this kind of idea.
Kendall (1983) gives an application to storage problems.
Remark 3.4 Coupling and CFTP ideas enter into Theorem 3.1 at the crucial time-reversal step:
max{0, ηn, ηn + ηn−1, . . . , ηn + ηn−1 + . . . + η1}
=D max{0, η1, η1 + η2, . . . , η1 + η2 + . . . + ηn} .
Compare Section 1.4 on falling leaves . . . .
Remark 3.5 The problem with applying Theorem 3.1 in CFTP is that we don't know which partial sum η1 + η2 + . . . + ηn attains the supremum max{0, η1, η1 + η2, . . .}.
Exercise 3.6 Use simple R statistical package scripts and
elementary calculations to implement the Loynes coupling.
See my GTP online notes for further details.
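Exercise 3.6 asks for R scripts; the same Loynes coupling can be sketched in a few lines of Python instead (the service and inter-arrival rates and the seed below are invented for illustration): drive Lindley's recursion with a single fixed noise sequence, start the queue empty further and further back in time, and observe the monotone stabilization.

```python
import random

random.seed(42)

# One fixed realization of eta_j = S_j - X_{j+1}, with eta[0] the increment
# of the most recent customer before time 0.  Services Exp(mean 1),
# inter-arrivals Exp(mean 1.25), so E[eta] = -0.25 < 0 and the queue is stable.
eta = [random.expovariate(1.0) - random.expovariate(0.8) for _ in range(5000)]

def waiting_time(T):
    """Lindley recursion for the queue started empty T customers back,
    driven by the SAME noise sequence: this is the Loynes coupling."""
    w = 0.0
    for j in reversed(range(T)):      # apply the oldest increment first
        w = max(0.0, w + eta[j])
    return w

# Loynes monotonicity: starting the empty queue earlier can only increase
# the time-0 waiting time, and (since E[eta] < 0) the values stabilize.
ws = [waiting_time(T) for T in (10, 50, 250, 1250, 5000)]
assert all(a <= b for a, b in zip(ws, ws[1:]))
```

Unrolling the recursion shows that `waiting_time(T)` is exactly the maximum of the first T partial sums of the (time-reversed) increments, which is why it is nondecreasing in T.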
3.2. Uniform and Geometric Ergodicity
There is a theoretical issue which forms a road block to the use of
Lindley’s representation in CFTP. Recall the notions of uniform ergodicity and geometric ergodicity.
Definition 3.7 A Markov chain with transition kernel p(x, ·) possesses geometric ergodicity if there are 0 < ρ < 1, R(x) > 0 with
‖π − p^(n)(x, ·)‖TV ≤ R(x)ρ^n
for all n, all starting points x.
So a chain exhibits geometric ergodicity if equilibrium is approached
at a geometric rate. Note that the geometric bound is shifted by the
chain’s starting point. However . . .
Definition 3.8 A Markov chain with transition kernel p(x, ·) possesses uniform ergodicity if there are ρ ∈ (0, 1), R > 0 not depending on the starting point x such that
‖π − p^(n)(x, ·)‖TV ≤ Rρ^n
for all n, uniformly for all starting points x.
So a chain exhibits uniform ergodicity if the geometric rate is not
affected by the chain’s starting point.
Uniform ergodicity corresponds loosely to “virtually finite state space”.
However chains may still be uniformly ergodic even if the state space
is far from finite: the Häggström et al. (1999) chain in Section 1.6 is
a good example.
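As a toy numerical illustration of these definitions (the two-state chain and its transition probabilities are invented for the example), one can watch the total variation distance to equilibrium decay geometrically, with constants R and ρ valid uniformly in the starting point:

```python
# Invented two-state chain illustrating Definitions 3.7/3.8: the TV distance
# to equilibrium decays like rho^n uniformly in the starting point.
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]   # stationary distribution: pi P = pi

def step(mu):
    """One step of the chain acting on a distribution mu."""
    return [mu[0] * P[0][j] + mu[1] * P[1][j] for j in range(2)]

def tv(mu, nu):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

rho = 0.7   # second eigenvalue of P: the geometric rate
for start in ([1.0, 0.0], [0.0, 1.0]):
    mu = start[:]
    for n in range(1, 25):
        mu = step(mu)
        # uniform ergodicity: ||pi - p^(n)(x,.)||_TV <= R rho^n with R = 1
        assert tv(mu, pi) <= rho ** n + 1e-12
```

Here the second eigenvalue of the transition matrix gives the exact geometric rate, and the same R works for both starting points: uniform ergodicity in miniature.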
Foss and Tweedie (1998) show that the (theoretical) possibility of classic CFTP is equivalent (under modest regularity) to uniform ergodicity (a rate of convergence to equilibrium which does not depend on the start-point). This is because classic CFTP needs vertical coalescence: every possible start from time −T leads to the same result at time 0, and this implies a uniformly geometric rate of convergence to equilibrium.
Figure 3.1: Vertical coalescence
The converse, that under modest regularity uniform ergodicity implies classic CFTP is possible in principle, follows from small-set theory.
Theorem 3.9 (Foss and Tweedie 1998) Suppose a Markov
chain X on a general state space X has positive probability of hitting a specified small set C of order k, where the
probability may depend on the starting position but is always positive. Then X is uniformly ergodic if and only if
classic CFTP is possible in principle (disregarding all computation problems!).
Proof (Outline):
It is clear from the CFTP construction that classic CFTP forces uniform ergodicity.
On the other hand, uniform ergodicity means we can choose n such
that p(n) (x, ·) is close to equilibrium in total variation, uniformly in
x. It follows that in principle we can design a split chain which has positive chance of applying regeneration every n + k time steps, and this permits construction of small-set CFTP.
The practical obstacle here is that we will have to gain knowledge of
p(n) (x, ·) to build the split chain. But in general we expect p(n) (x, ·)
to be less accessible than the equilibrium distribution itself!
It follows from the very existence of CFTP constructions that all our chains so far have been uniformly ergodic. Can we lift this uniform ergodicity requirement? CFTP almost works with horizontal coalescence as exhibited in the Lindley representation: "all sufficiently early starts from a specific location * lead to the same result at time 0". But, as the Lindley representation highlights, the question is how to identify when this has happened.
Figure 3.2: Horizontal coalescence
Remark 3.10 Dominated CFTP idea (domCFTP): Generate target chains coupled to a dominating process for which
equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given
by the dominating process.
If we can make this work then we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Thönnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity ≠ domCFTP!).
Theorem 3.11 (Kendall 1998; Kendall and Møller 2000):
If coalescence is almost sure then domCFTP samples from
equilibrium.
Proof: For simplicity we suppose the target process X and the dominating process Y are non-negative.
Let X^{upper,−n} and X^{lower,−n} = X^{−n} be versions of the target chain started at time −n at Y(−n) and 0 respectively. Let −T be the latest time such that X^{upper,−T}(0) = X^{lower,−T}(0) = X^{−T}(0) (so −T is the coalescence time). Now argue as in Theorem 1.2 for classic CFTP: if X converges to an equilibrium π in total variation distTV then
distTV(L(X^{−T}(0)), π) = limn distTV(L(X^{−n}(0)), π) = limn distTV(L(X^0(n)), π) = 0 ,
hence the result.¹
¹ Note: the very last step makes implicit use of stationarity of Y!
Figure 3.3: Dominated CFTP
Thus we can use realizations of the target process started from the
dominating process to identify horizontal coalescence.
3.3. Non-linear birth-death processes
To illustrate domCFTP in detail, we describe a simple example taken from Kendall (1997a). Consider a continuous-time non-linear birth-death process X, with transition rates
X → X + 1 at rate λX ,
X → X − 1 at rate Xμ ,
for positive λX, μ. We suppose the birth rate λX is bounded above² by λ.
Of course it is possible to compute the equilibrium distribution using detailed balance. However here the object of the exercise is to
construct a domCFTP method to draw exactly from the target distribution.
² Kendall (1997a) requires monotonicity for λX, which is unnecessarily restrictive!
Note first that the non-linear birth-death process X can be bounded
above, or dominated by, the linear birth-death process Y with transition rates
Y → Y + 1 at rate λ ,
Y → Y − 1 at rate Y μ .
Here domination means: if 0 ≤ X(0) ≤ Y(0) then we can construct coupled copies of X and Y such that the relationship X ≤ Y is maintained for all time.
Indeed we can go further: given the process Y then for any x, 0 ≤
x ≤ Y (0), we can construct a copy Xx of X begun at x such that
0 ≤ Xa ≤ Xb ≤ Y for all time whenever a ≤ b ≤ Y (0).
We do this as follows.
We construct X from Y by supposing that X can have a birth only if Y has a birth, and similarly for deaths.
Suppose to each birth incident and each death incident of Y there
is attached an independent Uniform[0, 1] random mark U . So the
construction of X is specified by determining for each Y incident
whether or not this corresponds to an X incident.
• A birth incident Y → Y + 1 at time t with mark U is allowed to generate an X birth incident exactly when
U ≤ λX(t−) / λ ;   (3.1)
• A death incident Y → Y − 1 at time t with mark U is allowed to generate an X death incident exactly when
U ≤ μX(t−) / (μY(t−)) = X(t−) / Y(t−) .   (3.2)
It is apparent from X(t−) ≤ Y(t−) and the bound λX ≤ λ that the U-based criteria above use probabilities λX(t−)/λ ≤ 1 and X(t−)/Y(t−) ≤ 1 respectively. This permits an inductive argument, iterating through the birth and death incidents of Y, which shows X ≤ Y for all time, and which indeed also demonstrates 0 ≤ Xa ≤ Xb ≤ Y if 0 ≤ Xa(0) ≤ Xb(0) ≤ Y(0).
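The mark-based thinning can be sketched in Python as follows. This is a minimal sketch only: the particular non-linear birth rate λX = λ(X + 1)/(X + 2) (positive and bounded above by λ) and all numerical constants are invented for illustration.

```python
import random

random.seed(1)
lam, mu = 2.0, 1.0

def birth_rate(x):
    # invented non-linear birth rate: positive and bounded above by lam
    return lam * (x + 1.0) / (x + 2.0)

def coupled_run(x0, y0, horizon):
    """Evolve the dominating linear chain Y forwards in time; thin its
    incidents using the attached uniform marks, as in (3.1)-(3.2),
    to drive the target chain X.  Requires 0 <= x0 <= y0."""
    x, y, t = x0, y0, 0.0
    while True:
        total = lam + y * mu              # total incident rate for Y
        t += random.expovariate(total)
        if t > horizon:
            return x, y
        u = random.random()               # the mark U of this incident
        if random.random() < lam / total:
            if u <= birth_rate(x) / lam:  # criterion (3.1): X birth too?
                x += 1
            y += 1                        # Y birth incident
        else:
            if x > 0 and u <= x / y:      # criterion (3.2): X(t-)/Y(t-)
                x -= 1
            y -= 1                        # Y death incident (here y >= 1)
        assert 0 <= x <= y                # domination X <= Y is preserved

x, y = coupled_run(0, 3, 50.0)
assert 0 <= x <= y
```

Note that when x = y a Y death forces an X death (the criterion X(t−)/Y(t−) = 1 always passes), which is exactly what keeps the domination intact.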
Now carry out the CFTP construction, but making starts
at times −n, −2n, . . . using a stationary realization of the
dominating process, rather than the top-most state. To do
this it is necessary to be able to
1. draw from the equilibrium of the dominating process (easy here: detailed balance identifies the equilibrium distribution as Poisson(λ/μ));
2. simulate the reversed process in equilibrium (easy
here: by detailed balance the process is reversible).
3.4. A simple application of domCFTP:
Perpetuities
Consider the simplest perpetuity, with distribution defined by
L(X) = L(U(1 + X))   (3.3)
where U is a Uniform(0, 1) random variable which is independent of X. Clearly X should have limiting distribution given by the infinite product
U1(1 + U2(1 + U3(1 + . . .)))   (3.4)
where the Ui are independent Uniform(0, 1) random variables, so Ui can be viewed as the effect of random inflation on an annual payment.
We want to know about this limiting distribution of X: in effect we want to know about the equilibrium behaviour of the Markov chain whose update is
Xn+1 = Un+1(1 + Xn) .
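A naive forward iteration of this update yields approximate draws only: the initial condition is forgotten at a geometric rate but never exactly. It does, however, let us sanity-check moments: taking expectations in (3.3) gives E X = (1 + E X)/2, so E X = 1. A quick Python check (the sample sizes and seed are arbitrary):

```python
import random

random.seed(7)

def approx_perpetuity(n_iter=200):
    """Iterate X <- U(1 + X): an APPROXIMATE draw only -- the initial
    condition is forgotten geometrically fast, but never exactly."""
    x = 0.0
    for _ in range(n_iter):
        x = random.random() * (1.0 + x)
    return x

samples = [approx_perpetuity() for _ in range(10000)]
mean = sum(samples) / len(samples)
# Taking expectations in L(X) = L(U(1+X)), with U and X independent:
# E X = (1/2)(1 + E X), hence E X = 1.
assert abs(mean - 1.0) < 0.05
```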
Monotonicity and CFTP ideas can be applied to the infinite product (3.4): certainly the limiting distribution satisfies
L(X) ≥ limn {U1(1 + U2(1 + U3(1 + . . . (1 + Un × 0) . . .)))}
and it is possible to establish equality using small sets and Foster-Lyapunov theory. However we are left with the dilemma already faced when considering the Lindley representation in Section 3.1 on queues: we have horizontal coalescence here rather than vertical coalescence, so classic CFTP is inapplicable.
Here is how to apply domCFTP. Take logs and produce an upper bound on the Markov chain log Xn which is a reflecting random walk:
log Xn+1 = log Un+1 + log(1 + 1/Xn) + log Xn
≤ −Vn+1 + log(1 + 1/k) + log Xn   if Xn ≥ k ;
where Vn+1 is an Exponential(1) random variable.
Thus log X ≤ Z if Z is the reflecting random walk given by
Zn+1 = κ − Vn+1 + Zn   if κ − Vn+1 + Zn ≥ log k ,
Zn+1 = log k   otherwise,   (3.5)
where κ = log(1 + 1/k).
We need an equilibrium distribution for Z, which can be viewed as the workload process of a modest variation on a discrete M/D/1 queue. Actually it is easiest to bound Z above by a reversible reflecting simple random walk W: we approximate κ − Vn+1 by a discrete random variable
∆Wn+1 = κ   if Vn+1 ≤ κ   (probability 1 − e^−κ),
∆Wn+1 = 0   if κ ≤ Vn+1 < 2κ   (probability e^−κ − e^−2κ),
∆Wn+1 = −κ   if 2κ ≤ Vn+1   (probability e^−2κ),
and note that the corresponding reflecting simple random walk W on
{nκ + log k : n = 0, 1, 2, . . .}
dominates Z (use coupling!), is positive-recurrent if e^−2κ + e^−κ > 1 (in which case e^κ(e^κ − 1) < 1), and then has equilibrium distribution which is Geometric:
P[W = nκ + log k] = (e^κ(e^κ − 1))^n (1 − e^κ(e^κ − 1)) .   (3.6)
So we can draw from this Geometric distribution, then simulate the
(reversible!) reflecting simple random walk W backwards in time,
and use exp(W ) as dominating process.
Calculations show that a choice of k = 2.76796 . . . is optimal, in the
sense of minimizing the mean equilibrium value of W .
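The Geometric equilibrium (3.6) is easy to sanity-check by a long-run simulation of W (the choice of k and the run length below are arbitrary):

```python
import math
import random

random.seed(5)
k = 2.768                                        # close to the optimal k
kappa = math.log(1.0 + 1.0 / k)
r = math.exp(kappa) * (math.exp(kappa) - 1.0)    # Geometric ratio in (3.6)
assert math.exp(-2 * kappa) + math.exp(-kappa) > 1.0   # positive recurrence
assert r < 1.0

# Long-run occupation frequencies of W on the levels {n*kappa + log k}:
# by (3.6), level 0 should be occupied a fraction (1 - r) of the time.
level, visits0, N = 0, 0, 200000
for _ in range(N):
    v = random.expovariate(1.0)                  # V_{n+1}
    if v <= kappa:
        level += 1                               # step +kappa
    elif v > 2.0 * kappa:
        level = max(level - 1, 0)                # step -kappa, reflected at log k
    visits0 += (level == 0)
assert abs(visits0 / N - (1.0 - r)) < 0.02
```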
There is still one problem left. If we apply the same value of Ui to all
possible starts lying below the dominating process, then we will not
achieve coalescence. (We will get convergence at a geometric rate,
which will eventually be absorbed by rounding error, but that is not
the same thing!) It is best to use Wilson’s multi-shift trick.
A perfect perpetuity. Note that the approximate algorithm ("run the recurrence till initial conditions are lost in floating-point error") is slower!
See Devroye (2001) for a different (but still perfect!) approach.
3.5. Point processes
Dominated CFTP works, for example, on both attractive and repulsive area-interaction point processes (Kendall 1998; Kendall and Møller 2000): using as target chain a spatial birth-and-death process, which gives birth to points at a rate determined by the local interaction with pre-existing points, and which kills points at unit rate per point. This allows the use of domCFTP in a manner very similar to that of Section 3.3, but with Uniform[0, 1] marks replaced by marks which are Poisson clusters, as described in Kendall (1997a).
Dominated CFTP for the attractive area-interaction point process, with geometric marking using Poisson processes in disks.
It is of interest in stochastic geometry that this expresses such point
processes as explicit but highly dependent thinnings of Poisson point
processes.
How do we adapt the method of domCFTP (or indeed CFTP in general) to non-monotonic cases? We can use the crossover trick (Kendall 1998), which we explain in the context of repulsive area-interaction γ < 1. Create two chains to bound the target chain X above (by X^upper) and below (by X^lower). Cross over the rules for birth: a proposed point is born in X^upper if it would pass the test for X^lower, and vice versa. Then automatically
X^lower ⊆ X ⊆ X^upper
so CFTP can be applied. Kendall and Møller (2000) give a general formulation for point processes.
See also Huber's (1999) use of "swap moves" in the context of bounding chains, which he uses to estimate a rapid-mixing regime. If a birth proposal is blocked by just one point, then replace the blocking point by the new proposal in a swap, with a swap probability pswap which we are free to choose; fix the bounding chains accordingly(!); this results in a provable speed-up of the CFTP algorithm.
3.6. A general theorem for CFTP
Moving from CFTP to domCFTP suggests still further abstraction. This is useful, for example, when considering CFTP for conditioned point processes as in Cai and Kendall (2002): it can be convenient to allow the upper and lower processes to move out of the conditioned state space.
Let X be a Markov chain on X which exhibits equilibrium behaviour. Embed the state space X in a partially ordered space (Y, ≼) so that X is at the bottom of Y, in the sense that for any y ∈ Y, x ∈ X,
y ≼ x implies y = x.
We may then use the methods of Theorem 1.2 (CFTP) and Theorem
3.11 (domCFTP) to show:
Theorem 3.12 Define a Markov chain Y on Y such that Y evolves as X after it hits X; let Y(−u, t) be the value at t of a version of Y begun at time −u,
(a) of fixed initial distribution: L(Y(−T, −T)) = L(Y(0, 0)), and
(b) obeying funnelling: if −v ≤ −u ≤ t then Y(−v, t) ≼ Y(−u, t).
Suppose coalescence occurs: P [Y (−T, 0) ∈ X ] → 1 as
T → ∞. Then lim Y (−T, 0) can be used for a CFTP draw
from the equilibrium of X.
Exercise 3.13 Murdoch (2000) points out that an MCMC algorithm can be forced to become uniformly ergodic by altering the move to allow a small chance of an independence-sampler move. This can be expressed in terms of the above general framework.
Lecture 4
Theory and connections
4.1 FMMR
4.2 Recycling randomness
4.3 Efficiency
4.4 Foster-Lyapunov
4.5 BFA algorithm
4.1. Siegmund duality and FMMR
An important alternative to CFTP makes fuller use of the notion
of time reversal, as in the dead-leaves example, and Section 3.1 on
queues. We begin with a beautiful duality.
Theorem 4.1 (Siegmund duality) Suppose X is a process on [0, ∞). When is there another process Y satisfying the following?
P[Xt ≥ y | X0 = x] = P[Yt ≤ x | Y0 = y]   (4.1)
Answer (Siegmund 1976): exactly when X is (suitably regular and) stochastically monotone: x ≤ x′ implies
P[Xt ≥ y | X0 = x] ≤ P[Xt ≥ y | X0 = x′] .
Proof (Outline):
Use Equation (4.1) and Fubini's Theorem to derive the Chapman-Kolmogorov equations.
Remark 4.2 If X is not stochastically monotone then we
get negative transition probabilities for Y !
Remark 4.3 Consequence:
Y is absorbed at 0, and X at ∞.
Remark 4.4 Intuition: think of the Siegmund dual this way. Couple monotonically the X^(x) begun at different x (use stochastic monotonicity!), and set Yt^(y) = x if Xt^(x) = y.
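The one-step content of this duality is easy to exhibit numerically. In the sketch below (the chain is an invented example, with an absorbing top state so that on a finite space the dual needs no extra coffin state), the dual kernel Q is built from the tail probabilities of X, and stochastic monotonicity is exactly what makes Q nonnegative:

```python
# A stochastically monotone chain on {0,1,2,3}: a lazy reflected walk with
# absorbing top state.  (Invented example for illustration.)
P = [
    [0.7, 0.3, 0.0, 0.0],
    [0.3, 0.4, 0.3, 0.0],
    [0.0, 0.3, 0.4, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]
n = len(P)

# G[x][y] = P[X_1 >= y | X_0 = x]; monotonicity: G is nondecreasing in x.
G = [[sum(P[x][y:]) for y in range(n)] for x in range(n)]
assert all(G[x][y] <= G[x + 1][y] + 1e-12
           for x in range(n - 1) for y in range(n))

# Siegmund dual: P[Y_1 <= x | Y_0 = y] = G[x][y], so the dual kernel is
# Q[y][x] = G[x][y] - G[x-1][y]  (with G[-1][y] := 0).
Q = [[G[x][y] - (G[x - 1][y] if x > 0 else 0.0) for x in range(n)]
     for y in range(n)]
assert all(q >= -1e-12 for row in Q for q in row)      # needs monotonicity!
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Q)   # Q is a kernel
assert abs(Q[0][0] - 1.0) < 1e-12                      # Y absorbed at 0
```

Dropping monotonicity would make some entries of Q negative, which is Remark 4.2; and the first row of Q shows the dual absorbed at 0, as in Remark 4.3.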
This beautiful idea grew into a method of simulation, and then a method of perfect simulation called Fill's method (Fill 1998), an alternative to CFTP. Fill's method considered on its own is harder to explain than CFTP.
On the other hand it has advantages too:
• it can provide user-interruptibility (perfect draws are not biased by censoring draws which take longer than a specified
threshold);
• as we will see, it can be viewed as a conditional version of
CFTP, and the conditioning can be used to speed up the algorithm.
Fill’s method (Fill 1998; Thönnes 1999) is at first sight quite different from CFTP. It is based on the notion of a strong uniform time T
(Diaconis and Fill 1990) and is related to Siegmund duality. However Fill et al. (2000) establish a profound and informative link with
CFTP. We explain using “blocks” as input-output maps for a chain.
First recall that CFTP can be viewed in a curiously redundant fashion
as follows:
• Draw from equilibrium X(−T ) and run forwards;
• continue to increase T until X(0) is coalesced;
• return X(0); note that by construction T and X(0) are independent of X(−T ).
Key observation: By construction, X(0) and T are independent of
X(−T ) so we can condition on them!
• Condition on a convenient X(0);
• Run X backwards to a fixed time −T ;
• Draw blocks conditioned on the X transitions;
• If coalescence then return X(−T ) else repeat.
This makes it apparent that there are gains to be obtained over CFTP
by careful selection of the convenient X(0). These gains can be
dramatic! (See for example Dobrow and Fill 2003.)
Question 4.5 Is there a dominated version of Fill’s
method?
4.2. Recycling randomness
Fill and Huber (2000) introduce a quite different form of perfect simulation! Here is how they apply their Randomness Recycler algorithm to the problem of drawing a random independent subset X of
a graph G, weighted exponentially by number of points in X.
Start: V = ∅, x ≡ 0. End: V = G, x indicates X membership.
while V != G:
    v = choice(G - V)                 # choose v uniformly from G \ V
    V.add(v)
    if uniform(0, 1) <= 1 / (1 + alpha):
        x[v] = 0                      # skip v with probability 1/(1 + alpha)
    else:
        x[v] = 1                      # or tentatively include it ...
        nbd = []                      # ... iterating through neighbours
        for w in neighbourhood(v):    # is the inclusion valid?
            nbd.append(w)
            if x[w] == 1:             # if not valid ...
                x[w] = 0              # ... remove all
                x[v] = 0              # "contaminated" vertices
                V = V - {v} - set(nbd)
                break                 # and move on
4.3. Efficiency and the price of perfection
How efficient might CFTP be? When there is strong enough monotonicity then useful bounds have been derived – even as early as
Propp and Wilson (1996).
Remark 4.6 In general CFTP has to involve coupling, usually co-adapted. One expects co-adapted coupling to happen at some exponential rate, and convergence to equilibrium (in total variation norm distTV !) likewise. From the
Coupling Inequality (1.1) we know that coupling cannot
happen faster than convergence to equilibrium.
In the case of monotonic CFTP on a finite partially ordered space, Propp and Wilson (1996) present a strong bound. Let ℓ be the longest chain in the space; let T* be the coalescence time, and let
d(k) = max over x, y of ‖Px^(k) − Py^(k)‖ .
Then
P[T* > k] / ℓ ≤ d(k) ≤ P[T* > k] ,   (4.2)
so CFTP is within a factor ℓ of being as good as possible.
But can coupling happen strictly slower? And for relatively simple Markov chains? Burdzy and Kendall (2000) show how to use coupling to find out about this coupling problem!
Coupling of couplings: suppose
|pt(x1, y) − pt(x2, y)| ≈ c2 exp(−μ2 t)
while
P[τ > t | X(0) = (x1, x2)] ≈ c exp(−μt) .
Then (since the coupled chains agree on the event {τ ≤ t})
|pt(x1, y) − pt(x2, y)| = |P[X1(t) = y | X1(0) = x1] − P[X2(t) = y | X2(0) = x2]|
= |P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]| × P[τ > t | X(0) = (x1, x2)] .
Let X* be an independently coupled copy of X but begun at (x2, x1):
|P[X1(t) = y | τ > t, X1(0) = x1] − P[X2(t) = y | τ > t, X2(0) = x2]|
= |P[X1(t) = y | τ > t, X(0) = (x1, x2)] − P[X1*(t) = y | τ* > t, X*(0) = (x2, x1)]|
≤ P[σ > t | τ > t, τ* > t, X(0) = (x1, x2)]   (≈ c′ exp(−μ′t))
for σ the time when X and X* couple. Thus μ2 ≥ μ′ + μ.
Remark 4.7 So co-adapted coupling is strictly slower than
convergence to equilibrium when coupled chains can swap
round before coupling (the opposite of monotonicity!). Burdzy and Kendall (2000) present heuristics to expand on this.
Here is a simple continuous-time Markov chain case.
Mountford and Cranston (2000) describe a still simpler example!
Remark 4.8 Kumar and Ramesh (2001) give a computer-science-type example of graph matchings: coupling becomes much slower than convergence to equilibrium as problem size increases.
Remark 4.9 A different question about efficiency is how best to use CFTP to obtain repeated samples: see Murdoch and Rosenthal (1999).
4.4. Dominated CFTP and Foster-Lyapunov conditions
Corcoran and Tweedie (2001) describe how to mix domCFTP and small-set CFTP. The upper envelope process must be formulated carefully . . . . When the dominating process visits a small set, one can attempt small-set coupling. However the dominating process must be large enough to dominate instances when small-set coupling is attempted and fails!
There are similarities to Foster-Lyapunov conditions for assessing geometric ergodicity etc. for Markov chains. Such conditions use a Lyapunov function V to deliver a controlled supermartingale off a small set.
We begin by discussing a Foster-Lyapunov condition for positive recurrence.
Theorem 4.10 (Meyn and Tweedie 1993) Positive recurrence on a set C holds if C is a small set and one can find a constant β > 0 and a non-negative function V bounded on C such that for all n > 0
E[V(Xn+1) | Xn] ≤ V(Xn) − 1 + β I[Xn ∈ C] .   (4.3)
Proof: Let N be the random time at which X first (re-)visits C. It suffices¹ to show E[N | X0] < V(X0) + constant < ∞ (then use small-set regeneration).
By iteration of (4.3), we deduce E[V(Xn) | X0] < ∞ for all n.
If X0 ∉ C then (4.3) tells us that
n ↦ V(X(n∧N)) + n∧N
defines a nonnegative supermartingale (since I[X(n∧N) ∈ C] = 0 if n < N). Consequently
E[N | X0] ≤ E[V(XN) + N | X0] ≤ V(X0) .
¹ This martingale approach can be reformulated as an application of Dynkin's formula.
If X0 ∈ C then the above can be used to show
E[N | X0] = E[1 × I[X1 ∈ C] | X0] + E[E[N | X1] I[X1 ∉ C] | X0]
≤ P[X1 ∈ C | X0] + E[1 + V(X1) | X0]
≤ P[X1 ∈ C | X0] + V(X0) + β
where the last step uses Inequality (4.3) applied when I[Xn−1 ∈ C] = 1.
Now we consider a strengthened condition for geometric ergodicity.
Theorem 4.11 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1, β, and a function V ≥ 1 bounded on C such that
E[V(Xn+1) | Xn] ≤ λV(Xn) + β I[Xn ∈ C] .   (4.4)
Proof: Define N as in Theorem 4.10.
Iterating (4.4), we deduce E[V(Xn) | X0] < ∞ and more specifically that
n ↦ V(X(n∧N)) / λ^(n∧N)
is a nonnegative supermartingale. Consequently
E[V(XN) / λ^N | X0] ≤ V(X0) .
Using the facts that V ≥ 1, λ ∈ (0, 1) and Markov's inequality we deduce
P[N > n | X0] ≤ λ^n V(X0) ,
which delivers the required geometric ergodicity.
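A hedged numerical illustration (the chain, Lyapunov function and constants below are all invented for the example): a walk on {0, 1, 2, . . .} which steps down with probability 0.8 satisfies (4.4) with V(x) = 2^x, C = {0} and λ = 0.8, and the tail bound P[N > n | X0] ≤ λ^n V(X0) from the proof can be checked by simulation:

```python
import random

random.seed(3)
p_up, lam = 0.2, 0.8

def V(x):
    # Lyapunov function: V >= 1 everywhere, bounded on C = {0}
    return 2.0 ** x

# Exact drift check of (4.4) off C:
# E[V(X_{n+1}) | X_n = x] = 0.8 V(x-1) + 0.2 V(x+1) = 0.8 V(x) = lam V(x).
for x in range(1, 12):
    assert 0.8 * V(x - 1) + 0.2 * V(x + 1) <= lam * V(x) + 1e-9

def return_time(x0):
    """Number of steps N until the walk (down w.p. 0.8, up w.p. 0.2)
    first hits C = {0}."""
    x, n = x0, 0
    while x > 0:
        x += 1 if random.random() < p_up else -1
        n += 1
    return n

runs = 20000
tail = sum(return_time(2) > 10 for _ in range(runs)) / runs
assert tail <= lam ** 10 * V(2)   # the bound P[N > n | X0] <= lam^n V(X0)
```

The empirical tail is in fact far below the bound, consistent with the polynomial correction terms that the supermartingale argument throws away.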
Temptation: define the dominating process using V. There is an interesting link (Rosenthal (2002) draws it even closer), but:
Existence of a Lyapunov function doesn't ensure domCFTP.
The expectation inequality of supermartingale type,
E[V(Xn+1) | Xn] ≤ λV(Xn) + β I[Xn ∈ C] ,
is enough to control the rate at which X visits C, but for domCFTP based on the ordering implied by V we require well-behaved distributional bounds on the families of distributions
Dx = {L(V(Xn+1) | Xn = u) : V(u) ≤ V(x)} ,
and it is easy to construct badly behaved examples. :-(
Exercise 4.12 Construct a Markov chain on [0, ∞) which
satisfies the conditions of Theorem 4.11, using V (x) = x
and C = {0}, but such that any V -dominating process for
X (a process U for which V (Un+1 ) ≥ V (Xn+1 ) whenever
V (Un ) ≥ V (Xn )) must be transient!
However I have recently discovered how to circumvent this issue
using sub-sampling . . . .
4.5. Backward-forward algorithm
A careful look at domCFTP for the area-interaction process, or generalizations to other point processes as described in Kendall and
Møller (2000), shows that the construction is as follows:
• build space-time Poisson process of free points;
• convert free points into initial points for time-like line
segments, hence space-time birth and death process;
• mark free points independently;
• apply causal thinning procedure in time order;
• domCFTP succeeds if the time-zero result of this thinning procedure stabilizes when thinning begins far enough back in time; apply a binary search procedure to capture a time early enough to ensure stabilization at time zero.
Ferrari et al. (2002) describe a variant of perfect simulation (the Backwards-Forwards Algorithm, or BFA) which avoids the need to iterate back through successive starts −T, −2T, −4T, . . . .
• Conduct a recursive backwards sweep, identifying free points (ancestors) which by thinning might conceivably influence subsequent points already identified as potentially influencing points in the region of interest;
• Work forwards through time in a forwards sweep, thinning out ancestors to obtain the required perfect sample at time zero (assuming the backwards sweep generates only finitely many ancestors).
Instead of coalescence, we now require sub-criticality of the oriented
percolation implicit in the backwards sweep; computable conditions
arise from standard branching process comparisons, and these conditions will generally apply under sufficiently weak interactions.
The BFA generalizes easily to deal with space windows of infinite
volume processes (compare the “space-time CFTP” mentioned in
Section 1.7).
An enlightening theoretical example arises if one reformulates the Ising model using Peierls contours (lines separating ±1 values). As is well known, these form a "non-interacting hard-core gas" of contours, interacting by perimeter exclusion, to which the Backwards-Forwards Algorithm may in principle be applied: see Fernández et al. (2001).²
Exercise 4.13 Use BFA to implement a perfect simulation
of the Peierls contour model for low temperature Ising models.
² Ferrari et al. (2002) point out that, especially for infinite volume processes, there is a "user-impatience" bias, for which they estimate the effects.
References
Aldous, D. J. [1990]. The random walk construction of uniform spanning trees and
uniform labelled trees. SIAM J. Discrete
Math. 3(4), 450–465.
Asmussen, S., P. W. Glynn, and H. Thorisson [1992]. Stationarity detection in
the initial transient problem. ACM Trans.
Model. Comput. Simul. 2(2), 130–157.
Athreya, K. B. and P. Ney [1978]. A new
approach to the limit theory of recurrent Markov chains. Transactions of
the American Mathematical Society 245,
493–501.
This is a rich hypertext bibliography. Journals are linked to their homepages, and stable URL links (as provided for example by JSTOR or Project Euclid) have been added where known. Access to such URLs is not universal: in case of difficulty you should check whether you are registered (directly or indirectly) with the relevant provider. In the case of preprints, icons linking to homepage locations are inserted where available: note that these are probably less stable than journal links!
Borovkov, A. A. [1984]. Asymptotic methods
in queuing theory. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Chichester: John Wiley & Sons Ltd.
Breyer, L. A. and G. O. Roberts [2002]. A new
method for coupling random fields. LMS
J. Comput. Math. 5, 77–94 (electronic).
Broder, A. [1989]. Generating random spanning trees. In Thirtieth Annual Symposium on Foundations of Computer Science, New York, pp. 442–447. IEEE.
Brooks, S. and G. Roberts [1997, May]. Assessing convergence of Markov Chain
Monte Carlo algorithms. Technical report, University of Cambridge.
Brooks, S. P., Y. Fan, and J. S. Rosenthal
[2002]. Perfect forward simulation via
simulated tempering. Research report,
University of Cambridge.
Burdzy, K. and W. S. Kendall [2000, May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied Probability 10(2), 362–409. Also: University of Warwick Department of Statistics Research Report 331.
Cai, Y. and W. S. Kendall [2002, July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and Computing 12, 229–243. Also: University of Warwick Department of Statistics Research Report 349.
Corcoran, J. N. and U. Schneider [2003].
Shift and scale coupling methods for perfect simulation. Probab. Engrg. Inform.
Sci. 17(3), 277–303.
Corcoran, J. N. and R. L. Tweedie [2001]. Perfect sampling of ergodic Harris chains.
The Annals of Applied Probability 11(2),
438–451.
Cowles, M. and B. Carlin [1995, May].
Markov Chain Monte Carlo convergence diagnostics: A comparative review. Technical report, University of
Minnesota, Div. of Biostatistics, USA.
Devroye, L. [2001]. Simulating perpetuities.
Methodol. Comput. Appl. Probab. 3(1),
97–115.
Diaconis, P. [1996]. The cutoff phenomenon
in finite Markov chains. Proc. Nat. Acad.
Sci. U.S.A. 93(4), 1659–1664.
Diaconis, P. and J. Fill [1990]. Strong stationary times via a new form of duality. The
Annals of Probability 18, 1483–1522.
Diaconis, P. and D. Stroock [1991]. Geometric bounds for eigenvalues of Markov
chains. Ann. Appl. Probab. 1(1), 36–61.
Dobrow, R. P. and J. A. Fill [2003]. Speeding up the FMMR perfect sampling algorithm: a case study revisited. Random
Structures Algorithms 23(4), 434–452.
Doeblin, W. [1937]. Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci. 39(1,2), 57–115, 3–61.
Fernández, R., P. A. Ferrari, and N. L. Garcia [2001]. Loss network representation of Peierls contours. The Annals of Probability 29(2), 902–937.
Ferrari, P. A., R. Fernández, and N. L. Garcia
[2002]. Perfect simulation for interacting
point processes, loss networks and Ising
models. Stochastic Processes and Their
Applications 102(1), 63–88.
Fill, J. A. [1998]. An interruptible algorithm
for exact sampling via Markov Chains.
The Annals of Applied Probability 8,
131–162.
Fill, J. A. and M. Huber [2000]. The randomness recycler: a new technique for
perfect sampling. In 41st Annual Symposium on Foundations of Computer
Science (Redondo Beach, CA, 2000),
pp. 503–511. Los Alamitos, CA: IEEE
Comput. Soc. Press.
Fill, J. A., M. Machida, D. J. Murdoch,
and J. S. Rosenthal [2000]. Extension
of Fill’s perfect rejection sampling algorithm to general chains. Random Structures and Algorithms 17(3-4), 290–316.
Foss, S. G. and R. L. Tweedie [1998]. Perfect simulation and backward coupling.
Stochastic Models 14, 187–203.
Goldstein, S. [1978/1979]. Maximal coupling. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 46(2), 193–204.
Gousseau, Y. and F. Roueff [2003, December]. The dead leaves model: general results and limits at small scales. Preprint, arXiv:math.PR/0312035.
Green, P. J. and D. J. Murdoch [1999]. Exact sampling for Bayesian inference: towards general purpose algorithms (with
discussion). In J. Bernardo, J. Berger,
A. Dawid, and A. Smith (Eds.), Bayesian
Statistics 6, pp. 301–321. The Clarendon
Press Oxford University Press. Presented
as an invited paper at the 6th Valencia International Meeting on Bayesian Statistics, Alcossebre, Spain, June 1998.
Griffeath, D. [1974/1975]. A maximal coupling for Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 31, 95–106.
Häggström, O. [2002]. Finite Markov chains
and algorithmic applications, Volume 52 of London Mathematical Society
Student Texts. Cambridge: Cambridge
University Press.
Häggström, O. and K. Nelander [1998]. Exact sampling from anti-monotone systems. Statistica Neerlandica 52(3), 360–380.
Häggström, O. and J. E. Steif [2000]. Propp-Wilson algorithms and finitary codings
for high noise Markov random fields.
Combin. Probab. Computing 9, 425–
439.
Häggström, O., M. N. M. van Lieshout,
and J. Møller [1999]. Characterisation results and Markov chain Monte
Carlo algorithms including exact simulation for some spatial point processes.
Bernoulli 5(5), 641–658. Was: Aalborg
Mathematics Department Research Report R-96-2040.
Huber, M. [1999]. The swap move: a tool
for building better Markov chains. In
10th Annual Symposium on Discrete Algorithms.
Huber, M. [2003]. A bounding chain for
Swendsen-Wang. Random Structures
and Algorithms 22(1), 43–59.
Johnson, V. E. [1996]. Studying convergence
of Markov chain Monte Carlo algorithms using coupled sample paths. Journal of the American Statistical Association 91(433), 154–166.
Kendall, W. S. [1983]. Coupling methods and
the storage equation. Journal of Applied
Probability 20, 436–441.
Kendall, W. S. [1997a]. On some weighted
Boolean models. In D. Jeulin (Ed.), Advances in Theory and Applications of
Random Sets, Singapore, pp. 105–120.
World Scientific. Also: University of
Warwick Department of Statistics Research Report 295.
Kendall, W. S. [1997b]. Perfect simulation
for spatial point processes. In Bulletin
ISI, 51st session proceedings, Istanbul
(August 1997), Volume 3, pp. 163–166.
Also: University of Warwick Department of Statistics Research Report 308.
Kendall, W. S. [1998]. Perfect simulation
for the area-interaction point process. In
L. Accardi and C. C. Heyde (Eds.), Probability Towards 2000, New York, pp.
218–234. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 292.
Kendall, W. S. and J. Møller [2000, September]. Perfect simulation using dominating processes on ordered state spaces,
with application to locally stable point
processes. Advances in Applied Probability 32(3), 844–865.
Also University of Warwick Department of Statistics
Research Report 347.
Kendall, W. S. and G. Montana [2002, May]. Small sets and Markov transition densities. Stochastic Processes and Their Applications 99(2), 177–194. Also: University of Warwick Department of Statistics Research Report 371.
Kendall, W. S. and E. Thönnes [1999]. Perfect simulation in stochastic geometry.
Pattern Recognition 32(9), 1569–1586.
Also: University of Warwick Department of Statistics Research Report 323.
Kifer, Y. [1986]. Ergodic theory of random
transformations, Volume 10 of Progress
in Probability and Statistics. Boston,
MA: Birkhäuser Boston Inc.
Kindermann, R. and J. L. Snell [1980].
Markov random fields and their applications, Volume 1 of Contemporary Mathematics. Providence, R.I.: American
Mathematical Society.
Kolmogorov, A. N. [1936]. Zur Theorie der Markoffschen Ketten. Math. Annalen 112, 155–160.
Kumar, V. S. A. and H. Ramesh [2001]. Coupling vs. conductance for the Jerrum-Sinclair chain. Random Structures and Algorithms 18(1), 1–17.
Lee, A. B., D. Mumford, and J. Huang [2001].
Occlusion models for natural images: A
statistical study of a scale-invariant dead
leaves model. International Journal of
Computer Vision 41, 35–59.
Letac, G. [1986]. A contraction principle for
certain Markov chains and its applications. In Random matrices and their
applications (Brunswick, Maine, 1984),
Volume 50 of Contemp. Math., pp. 263–
273. Providence, RI: Amer. Math. Soc.
Lindvall, T. [2002]. Lectures on the coupling
method. Mineola, NY: Dover Publications Inc. Corrected reprint of the 1992
original.
Loynes, R. [1962]. The stability of a queue
with non-independent inter-arrival and
service times. Math. Proc. Camb. Phil.
Soc. 58(3), 497–520.
Meyn, S. P. and R. L. Tweedie [1993]. Markov
Chains and Stochastic Stability. New
York: Springer-Verlag.
Mira, A., J. Møller, and G. O. Roberts
[2001]. Perfect slice samplers. Journal
of the Royal Statistical Society (Series
B: Methodological) 63(3), 593–606. See
(Mira, Møller, and Roberts 2002) for
corrigendum.
Mira, A., J. Møller, and G. O. Roberts [2002].
Corrigendum: Perfect slice samplers.
Journal of the Royal Statistical Society
(Series B: Methodological) 64(3), 581.
Møller, J. and G. Nicholls [1999]. Perfect
simulation for sample-based inference.
Technical Report R-99-2011, Department of Mathematical Sciences, Aalborg
University.
Møller, J. and R. Waagepetersen [2004]. Statistical inference and simulation for spatial point processes, Volume 100 of
Monographs on Statistics and Applied
Probability. Boca Raton: Chapman and
Hall / CRC.
Mountford, T. S. and M. Cranston [2000]. Efficient coupling on the circle. In Game
theory, optimal stopping, probability and
statistics, Volume 35 of IMS Lecture
Notes Monogr. Ser., pp. 191–203. Beachwood, OH: Institute of Mathematical
Statistics.
Murdoch, D. and J. Rosenthal [1999]. Efficient use of exact samples. Statistics and
Computing 10, 237–243.
Murdoch, D. J. [2000]. Exact sampling
for Bayesian inference: unbounded
state spaces. In Monte Carlo methods (Toronto, ON, 1998), Volume 26
of Fields Inst. Commun., pp. 111–121.
Providence, RI: Amer. Math. Soc.
Mykland, P., L. Tierney, and B. Yu [1995].
Regeneration in Markov chain samplers.
Journal of the American Statistical Association 90(429), 233–241.
Roberts, G. O. and N. G. Polson [1994].
On the geometric convergence of the
Gibbs sampler. Journal of the Royal Statistical Society (Series B: Methodological) 56(2), 377–384.
Nummelin, E. [1978]. A splitting technique for Harris-recurrent chains. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 43, 309–318.
Nummelin, E. [1984]. General irreducible
Markov chains and nonnegative operators. Cambridge: Cambridge University
Press.
Propp, J. and D. Wilson [1998]. Coupling
from the past: a user’s guide. In Microsurveys in discrete probability (Princeton, NJ, 1997), Volume 41 of DIMACS
Ser. Discrete Math. Theoret. Comput.
Sci., pp. 181–192. Providence, RI:
American Mathematical Society.
Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223–252.
Roberts, G. O. and J. S. Rosenthal [1999].
Convergence of slice sampler Markov
chains. Journal of the Royal Statistical Society (Series B: Methodological) 61(3), 643–660.
Roberts, G. O. and J. S. Rosenthal [2001].
Small and pseudo-small sets for Markov
chains. Stochastic Models 17(2), 121–
145.
Roberts, G. O. and R. L. Tweedie [1996a]. Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341–364.
Roberts, G. O. and R. L. Tweedie [1996b].
Geometric convergence and central
limit theorems for multidimensional
Hastings and Metropolis algorithms.
Biometrika 83, 96–110.
Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A simple account. Electronic Communications
in Probability 7, no. 13, 123–128 (electronic).
Siegmund, D. [1976]. The equivalence of absorbing and reflecting barrier problems
for stochastically monotone Markov processes. The Annals of Probability 4(6),
914–924.
Stoyan, D. [1983]. Comparison methods for
queues and other stochastic models. Wiley Series in Probability and Mathematical Statistics: Applied Probability
and Statistics. Chichester: John Wiley
& Sons. Translation from the German
edited by Daryl J. Daley.
Thorisson, H. [1988]. Backward limits. The
Annals of Probability 16(2), 914–924.
Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York: Springer-Verlag.
Widom, B. and J. S. Rowlinson [1970].
New model for the study of liquid-vapor
phase transitions. Journal of Chemical
Physics 52, 1670–1684.
Sweeny, M. [1983]. Monte Carlo study of
weighted percolation clusters relevant
to the Potts models. Physical Review
B 27(7), 4445–4455.
Wilson, D. B. [1996]. Generating random
spanning trees more quickly than the
cover time. In Proceedings of the
Twenty-eighth Annual ACM Symposium
on the Theory of Computing (Philadelphia, PA, 1996), New York, pp. 296–
303. ACM.
Thönnes, E. [1999]. Perfect simulation of some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87. Also: University of Warwick Statistics Research Report 317.
Wilson, D. B. [2000a]. How to couple from
the past using a read-once source of randomness. Random Structures and Algorithms 16(1), 85–113.
Wilson, D. B. [2000b]. Layered Multishift
Coupling for use in Perfect Sampling Algorithms (with a primer on CFTP). In
N. Madras (Ed.), Monte Carlo Methods,
Volume 26 of Fields Institute Communications, pp. 143–179. American Mathematical Society.
Wilson, D. B. [2004]. Mixing times of lozenge
tiling and card shuffling Markov chains.
The Annals of Applied Probability 14(1),
274–325.
Wilson, D. B. and J. G. Propp [1996]. How
to get an exact sample from a generic
Markov chain and sample a random
spanning tree from a directed graph, both
within the cover time. In Proceedings
of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (Atlanta,
GA, 1996), New York, pp. 448–457.
ACM.