Models and simulation techniques from stochastic geometry

Wilfrid S. Kendall
Department of Statistics, University of Warwick

A course of 5 × 75 minute lectures at the Madison Probability Intern Program, June-July 2003
Introduction

Stochastic geometry is the study of random patterns, whether of points, line segments, or objects.

Related reading:
• Stoyan et al. (1995);
• Barndorff-Nielsen et al. (1999);
• Møller and Waagepetersen (2003) and the recent expository essays edited by Møller (2003);
• the Stochastic Geometry and Statistical Applications section of Advances in Applied Probability.
Contents

1  A rapid tour of point processes
2  Simulation for point processes
3  Perfect simulation
4  Deformable templates and vascular trees
5  Multiresolution Ising models
A  Notes on the proofs in Chapter 5
Chapter 1

A rapid tour of point processes
Contents

1.1  The Poisson point process
1.2  Boolean models
1.3  General definitions and concepts
1.4  Markov point processes
1.5  Further issues
1.1. The Poisson point process

The simplest example of a random point pattern is the Poisson point process, which models a "completely random" distribution of points in the plane R2, in 3-space R3, or in other spaces. It is a basic building block in stochastic geometry and elsewhere, with many symmetries which typically reduce calculations to computations of area or volume.
Figure 1.1: Which of these point clouds exhibits interaction?
Definition 1.1 Let X be a random point process in a window W ⊆ Rn, defined as a collection of random variables X(A), where A runs through the Borel subsets of W. The point process X is Poisson of intensity λ > 0 if the following are true:
1. X(A) is a Poisson random variable of mean λ Leb(A);
2. X(A1), . . . , X(An) are independent for disjoint A1, . . . , An.

Examples:
• stars in the night sky;
• patterns used to define collections of random discs;
• . . . or space-time structures;
• Brownian excursions indexed by local time(!).
• In dimension 1 we can construct X using a Markov chain (“number of points
in [0, t]”) and simulate it using Exponential random variables for inter-point
spacings. Higher dimensions are trickier! See Theorem 1.5 below.
• We can replace λ Leb (and W ⊆ Rn) by a nonnegative σ-finite measure µ on a measurable space X, in which case X is an inhomogeneous Poisson point process with intensity measure µ. We should also require µ to be diffuse (µ({x}) = 0 for every point x) in order to avoid the point process faux pas of placing more than one point onto a single location.
Exercise 1.2 Show that any measurable map will carry one (perhaps
inhomogeneous) Poisson point process into another, so long as
– the image of the intensity measure is a diffuse measure;
– and we note that the image Poisson point process is degenerate
if the image measure fails to be σ-finite.
Exercise 1.3 Show that in particular we can superpose two independent Poisson point processes to obtain a third, since the sum of independent Poisson random variables is itself Poisson.
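The one-dimensional "Markov chain" construction mentioned above is easy to sketch in code: accumulate independent Exponential inter-point spacings until the window is exhausted. A minimal sketch in Python (the function name and interface are mine, not from the notes):

```python
import random

def poisson_points_1d(rate, t_max):
    """Points of a rate-`rate` Poisson process on (0, t_max],
    built from independent Exponential(rate) inter-point spacings."""
    points, t = [], 0.0
    while True:
        t += random.expovariate(rate)  # Exponential spacing to the next point
        if t > t_max:
            return points
        points.append(t)
```

The points emerge already ordered; the expected number of them is rate × t_max. Higher-dimensional windows need the conditioning construction of Theorem 1.5 below or the Quine-Watson method described later.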
• It is a theorem that the second, independent scattering, requirement of Definition 1.1 is redundant! In fact even the first requirement is over-elaborate:

Theorem 1.4 Suppose a random point process X satisfies
P [X(A) = 0] = exp (−µ(A))    for all measurable A .
Then X is an (inhomogeneous) Poisson point process of intensity measure µ.

– This can be proved directly. I find it more satisfying to view the result as a consequence of Choquet capacity theory, since this approach extends to cover a theory of random closed sets (Kendall 1974). Compare inclusion-exclusion arguments and "correlation functions" used in the lattice case in statistical mechanics (Minlos 1999).
– We don't have to check every measurable set:1 it is enough to consider sets A running through the finite unions of rectangles!
– But we cannot be too constrained in our choice of various A (the family of convex A is not enough).

1 Certainly we need check only the BOUNDED measurable sets.
• An alternative approach2 lends itself to simulation methods for Poisson point processes in bounded windows: condition on the total number of points.

Theorem 1.5 Suppose the window W ⊆ Rn has finite area. The following constructs X as a Poisson point process of finite intensity λ:
– X(W) is Poisson of mean λ Leb(W);
– conditional on X(W) = n, the random variables X(A) are constructed as point-counts of a pattern of n independent points U1, . . . , Un uniformly distributed over W. So
X(A) = # {i : Ui ∈ A} .

Exercise 1.6 Generalize Theorem 1.5 to (inhomogeneous) Poisson point processes and to the case of W of infinite area/volume.

2 The construction described in Theorem 1.5 is the same as the one producing the link via conditioning between Poisson and multinomial contingency table models.
Exercise 1.7 Independent thinning. Suppose each point ξ of X, produced as in Theorem 1.5, is deleted independently of all other points with probability 1 − p(ξ) (so p(ξ) is the retention probability). Show that the resulting thinned random point pattern is an inhomogeneous Poisson point process with intensity measure λ p(u) Leb( du).
Exercise 1.8 Formulate the generalization of the above exercise
which uses position-dependent thinning on inhomogeneous Poisson
point processes.
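Independent thinning as in Exercise 1.7 is one line of code. A hedged sketch (the function name is mine), stated for an arbitrary list of points and retention function p:

```python
import random

def thin(points, p):
    """Keep each point xi independently with retention probability p(xi).
    When `points` is a realization of a Poisson process of intensity lambda,
    the result is inhomogeneous Poisson of intensity lambda * p(u) Leb(du)."""
    return [xi for xi in points if random.random() < p(xi)]
```

The same function implements position-dependent thinning of an inhomogeneous process (Exercise 1.8): only the interpretation of the resulting intensity measure changes.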
• The process produced by conditioning on X(W ) = n has a claim to being
even more primitive than the Poisson point process.
Definition 1.9 A Binomial point process with n points defined on a
window W is produced by scattering n points independently and uniformly over W .
However you should notice
– point counts in disjoint subsets are now (perhaps weakly) negatively
correlated: the more in one region, the fewer can fall in another;
– the total number n of points is fixed, so we cannot extend the notion of
a (homogeneous) Binomial point process to windows of infinite measure;
– if you like that sort of thing, the Binomial point process is the simplest
representative of a class of point processes (with finite total number of
points) analogous to canonical ensembles in statistical physics. The
Poisson point process relates similarly to grand canonical ensembles
(grand, because points are indistinguishable and we randomize over
the total number of particles). Of course these are trivial cases because
there is no interaction between points — but more of that later!
Here’s a worry.3 It is easy to prove that the construction of Theorem 1.5 produces
point count random variables which satisfy Definition 1.1 and therefore delivers a
Poisson point process. But what about the other way round? Can we travel from
Poisson point counts to a random point pattern? Or is there some subtle concealed
random behaviour in Theorem 1.5 going beyond the labelling of points, which
cannot be extracted from a point count specification?
The answer is that the construction and the point count definition are equivalent.
3 Perhaps you think this is more properly a neurosis to be dealt with by personal counselling?!
Consider that a neurotic concern here is what allows us to introduce measure-theoretic foundations
which are most valuable in further theory.
Theorem 1.10 Any Poisson point process given by Definition 1.1 induces
a unique probability measure on the space of locally finite point patterns,
when this is furnished with the σ-algebra generated by point counts.
Here is a sketch of the proof.
Take W = [0, 1)2. For each n = 1, 2, . . . , make a dyadic dissection of [0, 1)2 into 4^n different sub-rectangles [k 2^{−n}, (k + 1) 2^{−n}) × [` 2^{−n}, (` + 1) 2^{−n}). For a given level n, visualize the pattern Pn obtained by painting sub-rectangles black if they are occupied by points, white otherwise. The point counts for each of these 4^n sub-rectangles are independent Poisson of mean 4^{−n} λ, so by subadditivity of probability
P [one or more point counts > 1] ≤ 4^n (1 − (1 + 4^{−n} λ) exp(−4^{−n} λ)) = 4^n (4^{−n} λ)^2 θ exp(−4^{−n} λ θ)
for some θ ∈ (0, 1) (use the mean value theorem for x 7→ 1 − (1 + x) exp(−x)), which is summable in n. So by Borel-Cantelli we deduce that for all large enough n the black sub-rectangles in pattern Pn contain just one Poisson point each. Ordering these points randomly yields the construction in Theorem 1.5.
Simulation techniques for Poisson point processes are rather direct and straightforward: here are two possibilities.
1. We can use Theorem 1.5 to reduce the problem to the algorithm
• draw N = n from a Poisson distribution of mean λ Leb(W);
• draw n independent values from a Uniform distribution over the required window.
def poisson process (region ← W, intensity ← alpha) :
    pattern ← [ ]
    for i ∈ range (poisson (alpha × Leb(W ))) :
        pattern.append (uniform (region ← W ))
    return pattern
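As a concrete, runnable counterpart to the pseudocode above, here is a sketch for a rectangular window using only the Python standard library (the function names and the use of Knuth's multiplicative Poisson draw are my own choices, not from the notes):

```python
import math
import random

def poisson_draw(mean):
    """Draw N ~ Poisson(mean) by Knuth's multiplicative method
    (adequate for the modest means used here)."""
    limit, n, p = math.exp(-mean), 0, 1.0
    while True:
        n += 1
        p *= random.random()
        if p <= limit:
            return n - 1

def poisson_process_rect(x0, y0, x1, y1, lam):
    """Poisson(lam) point process on [x0, x1] x [y0, y1]:
    first N ~ Poisson(lam * area), then N i.i.d. uniform points."""
    n = poisson_draw(lam * (x1 - x0) * (y1 - y0))
    return [(random.uniform(x0, x1), random.uniform(y0, y1)) for _ in range(n)]
```

For large means one would swap `poisson_draw` for a library routine, but the two-step structure — a Poisson count, then uniform scattering — is exactly Theorem 1.5.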
2. Quine and Watson (1984) describe a generalization of the 1-dimensional "Markov chain" method mentioned above. Suppose we wish to simulate a Poisson point process of intensity λ in ball(o, R). We describe the two-dimensional case; the generalization to higher dimensions is straightforward.4
Using polar coordinates and a power transform of the radius, produce a measure-preserving map
(0, R2/2] × (0, 2π] → ball(o, R)
(s, θ) 7→ (√(2s), θ)    (1.1)
where (√(2s), θ) is expressed in polar coordinates, and (0, R2/2] × (0, 2π] and the ball(o, R) are endowed with the measure λ Leb. Being measure-preserving, the map preserves point processes. But the following Exercise 1.11 shows a Poisson(λ) point process on the product space (0, R2/2] × (0, 2π] can be thought of as a 1-dimensional Poisson(2λπ) point process of "half-squared radii" on (0, R2/2] (which may be constructed using a sequence of independent Exponential(2λπ) random variables), each point of which is associated with an "angle" which is uniformly distributed on (0, 2π].
4 The Quine-Watson method is particularly suited to the construction of Poisson point processes
which are to be used to build geometric tessellations . . .
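A sketch of this construction in the plane, assuming the set-up above (the function name is mine): successive half-squared radii accumulate Exponential(2πλ) spacings until R²/2 is exceeded, and each point receives an independent uniform angle.

```python
import math
import random

def poisson_disk(lam, R):
    """Quine-Watson construction of a Poisson(lam) process in ball(o, R):
    half-squared radii s form a rate-(2*pi*lam) Poisson process on
    (0, R**2/2]; each point gets an independent uniform angle."""
    points, s = [], 0.0
    while True:
        s += random.expovariate(2 * math.pi * lam)
        if s > R * R / 2:
            return points
        r, theta = math.sqrt(2 * s), random.uniform(0, 2 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
```

A pleasant by-product: the points are generated in increasing order of distance from the centre o.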
Exercise 1.11 Suppose X is a Poisson point process of intensity λ
on the rectangle [a, b] × [0, 1]. Show that it may be realized as a
one-dimensional Poisson point process on [a, b] of intensity λ, enriched by attaching to each one-dimensional point an independent
Uniform([0, 1]) random variable to provide the second coordinate.
Simply suppose the construction is given by Theorem 1.5, show the first coordinate of each point provides the required one-dimensional Poisson point
process, note the second coordinate is uniformly distributed as required!
Exercise 1.12 Why is it irrelevant to this proof that the map image
omits the centre o of ball(o, R)?
Exercise 1.13 Check the details of modifying the Quine-Watson construction for higher dimensions, to show how to build a d-dimensional Poisson point process using independent sequences of Exponential random variables and random variables uniformly distributed on the sphere S^{d−1}.
Exercise 1.14 Consider the following method of parametrizing lines: measure the perpendicular distance s from the origin and the angle θ made by the perpendicular with a reference line. How should you modify the Quine-Watson construction to produce a Poisson line process which is invariant under isometries of the plane? What about invariant Poisson hyperplane processes?
What do large cells look like? In the early 1940s D.G. Kendall conjectured a large cell should approximate a disk. We now know this
is true, and more besides! (Miles 1995; Kovalenko 1997; Kovalenko
1999; Hug et al. 2003)
Given a Poisson point process, we may construct a locally finite point pattern
using the methods of Theorem 1.10, and enumerate the points of the pattern using
some arbitrary rule. So we may speak of summing h(ξ) over the pattern points
ξ; by Fubini the result will not depend on the arbitrary ordering so long as h is
nonnegative.
Theorem 1.15 If X is a Poisson point process of intensity λ and if h(x, φ) is a nonnegative measurable function of a point x and a locally finite point pattern φ then
E [ Σ_{ξ∈X} h(ξ, X \ {ξ}) ] = ∫_{Rd} E [h(u, X)] λ Leb( du) ,    (1.2)
and indeed by direct generalization
E [ Σ_{disjoint ξ1,...,ξn∈X} h(ξ1, . . . , ξn, X \ {ξ1, ξ2, . . . , ξn}) ] = ∫_{Rd} · · · ∫_{Rd} E [h(u1, . . . , un, X)] λ^n Leb( du1) · · · Leb( dun) .    (1.3)
Exercise 1.16 Use Theorem 1.5 to prove (1.2) for h(u, X) which is positive only when u lies in a bounded window W and which depends on X only through the intersection of the pattern X with the window W. Hence deduce the full Theorem 1.15.
This delivers a variation on the independent scattering property of the Poisson
point process.
Corollary 1.17 If g is a nonnegative measurable function of a locally finite point pattern φ then
E [g(X \ {u}) | u ∈ X] = E [g(X)] .

Use Theorem 1.15 to calculate E [ Σ_{ξ∈X} I [ξ ∈ ball(u, ε)] g(X \ {ξ}) ]. Interpret the expectation as ∫_{ball(u,ε)} E [g(X \ {u}) | u ∈ X] λ Leb( du).
We say the Palm distribution for a Poisson point process is the Poisson point process itself with the conditioning point added in!
1.2. Boolean models
In stochastic geometry the counterpart to the Poisson point process is the Boolean model: the set-union of (possibly random) geometric figures or grains located one at each point or germ of an underlying Poisson point process. Much of the amenability of the Poisson point process is inherited by the Boolean model, though its analysis can nevertheless lead to very substantial calculations and theory.
Boolean models have the advantage of being simple enough that one can carry out explicit computations. They have a long history of application, for example being used by Robbins (1944, 1945) to study bombing problems (how heavily to bomb a beach in order to ensure a high probability of landmine-clearance . . . ).
Bombing problems and coverage: refer to Hall (1988, §3.5), Molchanov
(1996a); also note application of Exercise 1.14.
Definition 1.18 A Boolean model Ξ is defined in terms of a germ process, which is a Poisson point process X of intensity λ, and a grain distribution, which defines a random set5. It is the random set formed by the union
Ξ = ∪_{ξ∈X} (U(ξ) + ξ) ,
where the U(ξ) are independent realizations of the random set, one for each germ or point ξ of the germ process. (Notice: the grains should also be independent of the germs!)
The classic computation for a Boolean model Ξ is to determine the probability P [o ∈ Ξ] that the resulting random set covers a specified point (here, the origin o). This reduces to a matter of geometry: if o is covered then it must be covered by at least one grain, so just think about where the corresponding germ might lie. If U is non-random then the germ will have to lie in the reflection Û of U in the origin, so restrict attention to germs lying in Û. If U is random then generalize this: thin each germ ξ of the germ process using the probability P [ξ ∈ Û] that the corresponding grain covers o. The result is an inhomogeneous Poisson point process of intensity measure λ P [u ∈ Û] Leb( du): calculate the probability of this process containing no points at all to produce the following:

5 The random set is often not random at all (a disk of fixed radius, say) or is random in a simple parametrized manner (a disk of random radius, or a random convex polygon defined as the intersection of a sequence of random half-spaces); so we will follow Robbins (1944,1945) in treating this informally, rather than developing the later and profound theory of random sets (Matheron 1975)!
Theorem 1.19 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Then the area / volume fraction is
P [o ∈ Ξ] = 1 − exp (−λ E [Leb(U)]) .    (1.4)

A similar argument produces the generalization

Theorem 1.20 Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Let K be a compact region. Then the probability of Ξ hitting K is
P [Ξ ∩ K ≠ ∅] = 1 − exp (−λ E [Leb(Û ⊕ K)]) ,    (1.5)
where Û ⊕ K is a Minkowski sum:
K1 ⊕ K2 = {x1 + x2 : x1 ∈ K1, x2 ∈ K2} .    (1.6)
Remember, from the discussion prior to Theorem 1.15, that we may view the
germs as a sequence of points (using some arbitrary method of ordering); so simulation of Ξ need only depend on generating a suitable sequence of independent
random sets U1 , U2 , . . . .
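Theorem 1.19 lends itself to a quick Monte Carlo check. The sketch below (all names mine) simulates one realization of a Boolean model with fixed disks of radius r, generating germs in a square window just large enough to catch every germ whose disk could cover the origin:

```python
import math
import random

def boolean_covers_origin(lam, r, half):
    """One draw of a Boolean model with fixed disks of radius r and
    germs Poisson(lam) on the square [-half, half]^2; returns True
    iff the origin is covered.  Take half >= r so that no germ whose
    disk could cover o falls outside the window."""
    # Knuth's multiplicative Poisson draw for the germ count
    limit, n, p = math.exp(-lam * (2 * half) ** 2), 0, 1.0
    while True:
        n += 1
        p *= random.random()
        if p <= limit:
            break
    germs = [(random.uniform(-half, half), random.uniform(-half, half))
             for _ in range(n - 1)]
    # o is covered iff some germ lies within distance r of it
    return any(x * x + y * y <= r * r for x, y in germs)
```

Averaging over many realizations, the empirical coverage fraction should approach 1 − exp(−λπr²), the volume fraction of (1.4) with E[Leb(U)] = πr².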
Harder analytical questions include:
• how to estimate the probability of complete coverage?
• in a high-intensity Boolean model, what is the approximate statistical shape of a vacant region?
• as intensity increases, how to describe the transition to infinite connected regions of coverage?

Useful further reading for these questions: Hall (1988), Meester and Roy (1996), Molchanov (1996b).
1.3. General definitions and concepts

1.3.1. General Point Processes
Other approaches to point processes include:
• random measures (nonnegative integer-valued, charge singletons with 0, 1);
• measures on exponential measure spaces;
These approaches are reflected in notation: a point process X can have points belonging to it (ξ ∈ X), can be used for point counts (X(E) when E is a subset of the ground space), and can be used to provide a reference measure so as to define other point processes via densities. Carter and Prenter (1972) establish essential equivalences.
We can define general point processes as probability measures on the space of locally finite point patterns, following Theorem 1.10. This allows us immediately to
define stationary point processes (the distribution respects translation symmetries,
at least locally) and isotropic point processes (the same, but for rotations). The
distribution of a general point process is characterized by its void probabilities as
in Theorem 1.4. To each point process X (whether stationary, isotropic, or not)
there is an intensity measure given by Λ(E) = E [X(E)]. (For stationary point
processes this has to be a multiple of Lebesgue measure!)
We can also define the Palm distribution for X:
Definition 1.21 The Palm expectation E [g(X) | u ∈ X] is required to solve
E [ Σ_{ξ∈X∩ball(x,ε)} f(ξ) g(X \ {ξ}) ] = ∫_{ball(x,ε)} f(u) E [g(X) | u ∈ X] Λ( du) .    (1.7)
A somewhat dual notion is the conditional intensity: informally, if ξ is a location and x is a locally finite point pattern not including the location ξ then the conditional intensity λ∗(x; ξ) is the infinitesimal probability of a point pattern X containing ξ given that X \ {ξ} = x:
λ∗(x; ξ) = P [ξ ∈ X | X \ {ξ} = x] .    (1.8)
We shall not define this in rigorous generality, as its definition will be immediate
for the Markov point processes to which we will turn below.
These considerations provide a general context for figuring out constructions which
move away from the “no interactions” property of Poisson point processes.
1. Poisson cluster processes (can be viewed as Boolean models in which the
grains are simply finite random clusters of points). Neyman-Scott processes
arise from clusters of independent points.
2. Randomize intensity measure: “doubly stochastic” or Cox process.
Exercise 1.22 If the number of points in a Neyman-Scott cluster is Poisson, then the Neyman-Scott point process is also a Cox process.

Line processes present an interesting issue here! Davidson conjectured that a second-order stationary line process with no parallel lines must be Cox. Kallenberg (1977) found a counterexample; Stoyan et al. (1995, Figure 8.3) provides an illustration. There are 3D variants with applications in Parkhouse and Kelly (1998).
3. Or apply rejection sampling to a Poisson point process, rejection probability
depending on statistics computed from the point pattern. Compare Gibbs
measure formalism of statistical mechanics.
What sort of statistics might we use?
• estimation of intensity λ by λ̂(X) = X(W)/ Leb(W) (using sufficient statistic X(W));
• measuring how close the point pattern is to a specific location using the spherical contact distribution function
Hs(r) = P [dist(o, X) ≤ r] ,    (1.9)
estimated via a Boolean model Ξr based on balls U of fixed radius r by
Leb(Ξr ∩ W)/ Leb(W) = Leb( ∪_{ξ∈X} (U + ξ) ∩ W )/ Leb(W) ;    (1.10)
• measuring how close pattern points come to each other using the nearest neighbour distance function
D(r) = P [dist(o, X) ≤ r | o ∈ X] .    (1.11)
Interpret conditioning on the null event "o ∈ X" using Palm distributions;
• the related reduced second moment measure
K(r) = (1/λ) E [#{ξ ∈ X : dist(o, ξ) ≤ r} − 1 | o ∈ X] ,    (1.12)
estimating λK(r) by the number sr(X) of (ordered) neighbour pairs (ξ1, ξ2) in X for which dist(ξ1, ξ2) ≤ r (use the Slivnyak-Mecke Theorem 1.15).
We ignore the important issue of edge effects!
Exercise 1.23 Use Theorem 1.15 to prove that for a homogeneous Poisson
point process D(r) = Hs (r) and K(r) = Leb(ball(o, r)).
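The pair-count statistic sr(X) used above for estimating λK(r) can be computed by a double loop over ordered pairs; a minimal sketch for planar patterns (the function name is mine):

```python
def s_r(points, r):
    """Number of ordered pairs (xi, xj), i != j, of points of the
    pattern at distance at most r from one another."""
    count = 0
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            # ordered pairs, so (i, j) and (j, i) both count
            if i != j and (x1 - x2) ** 2 + (y1 - y2) ** 2 <= r * r:
                count += 1
    return count
```

For large patterns one would use a spatial index rather than the O(n²) loop, and in practice an edge-correction would be applied near the boundary of W; we ignore both refinements here, as the notes do.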
Here is a prototype of use of these statistics in rejection algorithms:
Exercise 1.24 Consider the following rejection simulation algorithm, noting that λ̂ = X(W )/ Leb(W ) to connect with previous discussion:
def reject () :
    X ← poisson process (region ← W, intensity ← lamda)
    while uniform (0, 1) > theta ∗∗ number (X) :
        X ← poisson process (region ← W, intensity ← lamda)
    return X

Show that the resulting point process is Poisson of intensity λθ.
Use Theorem 1.5 to reduce this to a question about rejection sampling of Poisson
random variables.
1.3.2. Marked Point Processes
We can think of Boolean models as derived from marked Poisson point processes;
to each germ point assign a random mark encoding the geometric data (such as
disk radius) of the grain. In effect the Poisson point process now lives on Rd × M
where M is the mark space.
• Poisson cluster processes as mentioned above: the grains are finite random
clusters;
• segment and disk processes;
• line and (hard) rod processes;
• fibre and surface processes.
Compare Exercise 1.11; the intensity measure will factorize as µ × ν so long as the
marks are independent, where µ is the intensity of the basic Poisson point process
and the measure ν is the distribution of the mark.
But beware! It can be meaningful to consider improper mark distributions. Examples include:
• decomposition of Brownian motion into excursions;
• a marked planar Poisson point process for which the marks are disks of random radius R, with density proportional to 1/r if r ≤ threshold.
1.4. Markov point processes
Consider interacting point processes generated by rejection sampling (Exercise 1.24 is a simple example!).
For the sake of simplicity, we suppose here that all our processes are two-dimensional.
The general procedure is to use a statistic t(X) in the algorithm
def reject () :
X ← poisson process (region ← W, intensity ← lamda)
while uniform (0, 1) > exp (theta ∗ t (X)) :
X ← poisson process (region ← W, intensity ← lamda)
return X
We can now use standard arguments from rejection simulation. The resulting point process distribution has a density
exp(θt(X)) / constant_θ    (1.13)
with respect to the Poisson point process X of intensity λ, for normalizing constant
constant_θ = E [exp(θt(X)) | X is Poisson(λ)] .
If θt(X) > 0 is possible then the above algorithm runs into trouble (how do you
reject a point pattern with probability greater than 1?!) However the probability
density at Equation (1.13) can still make sense. Importance sampling could of
course implement this. We can do better if the following holds:
Definition 1.25 A point pattern density f(X) satisfies Ruelle stability if
f(X) ≤ A exp(B #(X))    (1.14)
for constants A > 0, B; where #(X) is the number of points in X.
Exercise 1.26 6 Show that if the density exp(θt(X)) is Ruelle stable with constants as in 1.25 then the rejection algorithm above will produce a point pattern with the correct density if applied to a Poisson point process of intensity λe^B instead of λ, and the acceptance probability is changed to exp(θt(X) − B#(X))/A instead of exp(θt(X)).
Use t(X) = sr(X), the number of ordered pairs of points separated by at most r, with γ = exp(θ) ∈ (0, 1), to obtain the celebrated Strauss point process (Strauss 1975).
6 Thanks
for the comment, Tom!
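For γ = exp(θ) ∈ (0, 1] the rejection algorithm above can be run as stated. Here is a sketch of a Strauss sampler on a square window (all names are mine, and the Poisson proposal is generated with Knuth's multiplicative method — not a choice made in the notes):

```python
import math
import random

def poisson_square(lam, half):
    """Poisson(lam) pattern on [-half, half]^2 via a Poisson count
    (Knuth's multiplicative draw) followed by uniform scattering."""
    limit, n, p = math.exp(-lam * (2 * half) ** 2), 0, 1.0
    while True:
        n += 1
        p *= random.random()
        if p <= limit:
            break
    return [(random.uniform(-half, half), random.uniform(-half, half))
            for _ in range(n - 1)]

def strauss_sample(lam, gamma, r, half):
    """Rejection sampler for the Strauss process, gamma in (0, 1]:
    propose Poisson(lam), accept with probability gamma ** s_r(X)."""
    while True:
        pts = poisson_square(lam, half)
        s = sum(1 for i, p in enumerate(pts) for j, q in enumerate(pts)
                if i != j and (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= r * r)
        if random.random() < gamma ** s:
            return pts
```

The acceptance probability γ^{s_r(X)} penalizes close pairs; with γ = 1 the sampler accepts its first proposal and returns the plain Poisson process. For γ near 0 at high intensity, rejection becomes painfully slow — one motivation for the Markov chain methods of the next lecture.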
The statistic t(X) = Leb(Ξr ∩ W ), for γ = exp(−θ) results in a process known
as the area-interaction point process (Baddeley and van Lieshout 1995), also
known in statistical physics as the Widom-Rowlinson interpenetrating spheres
model (Widom and Rowlinson 1970) in the case θ > 1.
The (Papangelou) conditional intensity is easily defined for this kind of weighted Poisson point process, which (borrowing statistical mechanics terminology) is commonly known as a Gibbs point process: if the density of the Gibbs process with respect to a unit rate Poisson point process is given by f(X) then the conditional intensity at ξ may be defined as
λ∗(x; ξ) = f(x ∪ {ξ}) / f(x) .    (1.15)
Exercise 1.27 Check the Nguyen-Zessin formula (compare Theorem 1.15): for nonnegative h,
E [ Σ_{ξ∈X} h(ξ, X \ {ξ}) ] = ∫ E [h(u, X) λ∗(X; u)] Leb( du) .    (1.16)
Exercise 1.28 Compute the conditional intensity for the Poisson point process of intensity λ (use its density with respect to a unit-intensity Poisson
point process; see Exercise 1.24).
Exercise 1.29 Compute the conditional intensity for the Strauss point process.
Exercise 1.30 Compute the conditional intensity for the area-interaction
point process.
All these conditional intensities λ∗ (x; ξ) share a striking property: they do not
depend on the point pattern x outside of some neighborhood of ξ. This is an
appealing property for a point process intended to model interactions: it is captured
in Ripley and Kelly’s (1977) definition of a Markov point process:
Definition 1.31 A Gibbs point process is a Markov point process with respect to a neighbourhood relation ∼ if its density is everywhere positive7, and its conditional intensity λ∗(x; ξ) depends only on ξ and {η ∈ x : η ∼ ξ}.
The celebrated Hammersley-Clifford theorem tells us that the probability density
for a Markov point process can always be factorized as a product of interactions,
functions of ∼-cliques of points of the process. Ripley and Kelly (1977) presents
a proof for the point process case, while Baddeley and Møller (1989) provides a
significant extension. A recent treatment of Markov point processes in book form
is given by Van Lieshout (2000).
However care must be taken to ensure they are well-defined: the danger of simply
writing down a statistic t(X) and defining the density by Equation (1.13) is that
the total integral
E [exp(θt(X)) | X is Poisson(λ)]
may diverge so that the normalizing constant is not well-defined. (This is the case
for the Strauss point process when γ > 1, as shown by Kelly and Ripley 1976. This
particular point process cannot therefore be used to model clustering — a major
motivation for considering the more amenable area-interaction point process.)
7 The positivity requirement ensures that the ratio expression for λ∗ (x; ξ) in Equation (1.15) is
well-defined! It can be relaxed to a hereditary property.
Exercise 1.32 Show that E [γ^{sr(X)}] diverges for γ > 1 if the interaction radius r exceeds the maximum separation of potential points in the window W.
Exercise 1.33 Bound exp(−θ Leb(Ξr ∩ W )) above in terms of the number
of points in W (show the interaction satisfies Ruelle-stability), and hence
deduce that the area-interaction point process is defined for all real θ.
Can we bias using perimeter instead of area of Ξr in area-interaction, or even "holiness" (Euler characteristic: number-of-components minus number-of-holes)? Like area, these characteristics are local (they lead to formal densities whose conditional intensities satisfy the local requirement of Definition 1.31). The Hadwiger characterization theorem (Hadwiger 1957) more-or-less says that linear combinations of these three measures exhaust all local geometric possibilities! Can the result be normalized?
Exercise 1.34 Use the argument of Exercise 1.33 to show that
exp(θ perimeter(Ξr ∩ W )) can be normalized for all θ.
Exercise 1.35 Argue similarly to show that exp(θ euler(Ξr ∩ W )) can be
normalized for all θ > 0.
There can be no more components than disks/grains of the Boolean model Ξr .
What about exp(θ euler(Ξr ∩ W )) for θ < 0? Key remark is Theorem 4.3 of Kendall et al. (1999): a union of
N closed disks in the plane (of whatever radii, not necessarily the same for each disk) can have at most 2N − 5
holes.
Separate, trickier argument: disks can be replaced by
polygons satisfying a “uniform wedge” condition.
There are counterexamples . . . .
1.5. Further issues
Before moving on to develop machinery for simulating realizations of Markov
point processes, we make brief mention of a couple of recent pieces of work.
1.5.1. Poisson point processes and set-indexed martingales
There are striking applications of set-indexed martingales to distributional results for the areas of adaptively random regions. Recall the notion of stopping time T for a filtration {Ft}. Here is a variant on the standard definition: we require

{ω : ∆(ω) ⊆ [0, t]} ∈ Ft    where    ∆(ω) = {t : t ≤ T(ω)} .    (1.17)
Makes sense for partially-ordered family of closed sets (using set-inclusion); hence
martingales, and Optional Sampling Theorem (Kurtz 1980). Zuyev (1999) used
this to give an elegant proof of results of Møller and Zuyev (1996):
Theorem 1.36 Suppose ∆ is a compact stopping set for X a Poisson point process of intensity α, such that

P[X(∆) = n] > 0 and does not depend on α .

Then Leb(∆) given X(∆) = n is Gamma(n, α).
Examples: minimal centred ball containing n points (easy: compare Exercise
1.11); regions constructed using Delaunay or Voronoi tessellations.
Proof: use likelihood ratio for α (furnishing a martingale), hence

E[e^{z Leb(∆)} | X(∆) = n, α] = E[e^{z Leb(∆)} I[X(∆) = n] αⁿ α0⁻ⁿ e^{−(α−α0) Leb(∆)} | α = α0] / P[X(∆) = n | α = α0] .
[Figure: (a) Delaunay circumdisk; (b) Voronoi flower.]
Cowan et al. (2003) use this technique to consider the extent to which these examples lead to decompositions as in the minimal centred ball.
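The minimal centred ball example can be checked by a quick Monte Carlo experiment. Here is a sketch in Python (the helper names and parameter choices are mine): simulate a planar Poisson process of rate α in a large disk, and compare the area of the smallest origin-centred disk containing n points with the Gamma(n, α) mean n/α.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's method; adequate for moderate lam
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def area_nth_ball(alpha, n, R, rng):
    """Area of the smallest origin-centred disk containing n points of a
    rate-alpha Poisson process, simulated inside the disk of radius R."""
    N = poisson_sample(alpha * math.pi * R * R, rng)
    # only the radial coordinates matter; r = R*sqrt(U) is uniform-in-disk
    dists = sorted(R * math.sqrt(rng.random()) for _ in range(N))
    return math.pi * dists[n - 1] ** 2

rng = random.Random(1)
alpha, n = 1.0, 5
samples = [area_nth_ball(alpha, n, R=6.0, rng=rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # Theorem 1.36 predicts mean n / alpha = 5
```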
1.5.2. Stationary random countable dense sets
We conclude this lecture with a cautionary example! Without straying very far from the theory of random point processes one can end up in the bad part of the Measure Theory town. Suppose for whatever reason one wishes to consider random point sets X which are countable but are also dense (the set of directions of a line process?). We need to be precise about what we can observe, so that we genuinely treat the different points of X on equal terms. So suppose we can observe any event in the hit-or-miss σ-algebra generated by the events [X ∩ A ≠ ∅], as A runs through the Borel sets. Suppose that X is stationary.
Suppose that X is stationary.
(We thus exclude the situations considered by Aldous and Barlow (1981), where
account is taken of how various points are constructed — in effect the dense point
process is obtained by projection from a conventional locally finite point process.)
There are two ways to define such sets:
(a) measurable subset of Ω × R;
(b) Borel-set-valued random variable, hit-or-miss measurable.
Equivalent for case of closed sets (Himmelberg 1975, Thm. 3.5). But (warning!)
the set of countable sets is not hit-or-miss measurable, though constructive and
weak countability are equivalent (section theorem).
What can we say about stationary random countable dense sets?
Theorem 1.37 (Kendall 2000, Thm. 4.3, specialized to Euclidean case.) If
E is any hit-or-miss measurable event then either P [E] = 0 or P [E] = 1.
Whether 0 or 1 depends on E but not on X!
Exercise 1.38 Compare these two stationary random countable dense sets:
(a) For Y a unit-intensity Poisson point process on R × (0, ∞), the countable set obtained by projection onto the first coordinate.
(b) The set Q + Z, where Q is the set of rational numbers, and Z is a
Uniform([0, 1]) random variable.
They are very different, but appear indistinguishable . . . .
This bad behaviour should not be surprising: for example Vitali’s non-measurable
set depends on there being no measurable random variable taking values in Q + Z.
(Contrast our use of enumeration for locally finite point sets in Theorem 1.15.)
Careless thought betrays intuition!
Chapter 2
Simulation for point processes
Contents
2.1 General theory . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2 Measure-theoretic approach . . . . . . . . . . . . . . . . . 51
2.3 MCMC for Markov point processes . . . . . . . . . . . . . 57
2.4 Uniqueness and convergence . . . . . . . . . . . . . . . . . 63
2.5 The MCMC zoo . . . . . . . . . . . . . . . . . . . . . . . . 64
2.6 Speed of convergence . . . . . . . . . . . . . . . . . . . . . 70
Computations with Poisson process, Boolean model, cluster processes can be very
direct, and simulation is usually direct too. But what about Markov point processes?
• Simple-minded rejection procedures (§1.4) are usually infeasible.
• What about accept-reject procedures sequentially in small sub-windows,
thus constructing a point-pattern-valued Markov chain whose invariant distribution is the desired point process?
• Or proceed to limit: add/delete single points?
Markov chain Monte Carlo (MCMC) Gibbs’ sampler, Metropolis-Hastings!
Poisson point process as invariant distribution of spatial immigration-death
process after Preston (1977), also Ripley (1977, 1979).
To deal with more general Markov point processes it is convenient first to work
through the general theory of detailed balance equations for Markov chains on
general state-spaces. We will show how to use this to derive simulation algorithms
for our example point processes above, and discuss the more general Metropolis-Hastings (MH) formulation after Metropolis et al. (1953) and Hastings (1970),
which allows us to view MCMC as a sequence of proposals of random moves
which may or may not be rejected.
Simple example of detailed balance
Underlying the Poisson point process example: the process X of total number of
points alters as follows
• birth in time interval [0, ∆t) at rate α∆t;
• death in time interval [0, ∆t) at rate X∆t.
Then we can solve
P [X(t) = n − 1 and birth in [t, t + ∆t)] ≈ π(n − 1) × α∆t
= π(n) × n∆t ≈ P [X(t) = n and death in [t, t + ∆t)]
by

π(n) = (α/n) π(n − 1) = . . . = αⁿ e^{−α} / n! .
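This detailed balance solution is easy to check numerically; here is a sketch (α = 2.5 is an arbitrary choice):

```python
import math

alpha = 2.5

def pi(n):
    # candidate invariant distribution: Poisson(alpha)
    return alpha ** n * math.exp(-alpha) / math.factorial(n)

for n in range(1, 30):
    up = pi(n - 1) * alpha    # birth flux from state n-1 up to n
    down = pi(n) * n          # death flux from state n down to n-1
    assert math.isclose(up, down, rel_tol=1e-9)
print("detailed balance holds")
```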
2.1. General theory
• Ergodic theory for Markov chains: invariant distribution

π(y) = Σx π(x) p(x, y)

or (continuous time)

Σx π(x) q(x, y) = 0 .
• Detailed balance;
– reversibility π(x)p(x, y) = π(y)p(y, x) ;
– dynamic reversibility π(x)p(x, y) = π(ỹ)p(ỹ, x̃) ;
Exercise 2.1 Show detailed balance (whether reversibility or dynamical reversibility!) implies invariant distribution.
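For a finite state space the claim of Exercise 2.1 is easy to verify numerically. A sketch (the three-state target and uniform proposal are arbitrary choices; the kernel is built so that detailed balance holds, and invariance then follows):

```python
n = 3
pi = [0.5, 0.3, 0.2]                 # target distribution
p = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if x != y:
            # uniform proposal over the other states, Metropolis acceptance
            p[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
    p[x][x] = 1.0 - sum(p[x])        # remaining mass stays put

# detailed balance pi(x)p(x,y) = pi(y)p(y,x) ...
for x in range(n):
    for y in range(n):
        assert abs(pi[x] * p[x][y] - pi[y] * p[y][x]) < 1e-12
# ... implies invariance: sum_x pi(x)p(x,y) = pi(y)
for y in range(n):
    assert abs(sum(pi[x] * p[x][y] for x in range(n)) - pi[y]) < 1e-12
print("pi is invariant")
```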
• improper distributions: does Σx π(x) converge?
• estimate ∫ θ(y) dπ(y) using limn→∞ (1/n) Σ_{m=1}^{n} θ(Ym) ;
Definition 2.2 Geometric ergodicity: there are ρ ∈ (0, 1) and R(x) > 0 with

‖π − p⁽ⁿ⁾(x, ·)‖tv ≤ R(x) ρⁿ

for all n, all starting points x.
Definition 2.3 Uniform ergodicity: there are ρ ∈ (0, 1), R > 0 not depending on the starting point x such that

‖π − p⁽ⁿ⁾(x, ·)‖tv ≤ R ρⁿ

for all n, uniformly for all starting points x.
2.1.1. History
• Metropolis et al. (1953)
• Hastings (1970)
• Simulation tests
• Geman and Geman (1984)
• Gelfand and Smith (1990)
• Geyer and Møller (1994), Green (1995)
• ...
Definition 2.4 Generic Metropolis algorithm:
• Propose: move x → y with probability r(x, y);
• Accept: move with probability α(x, y).
Required: balance π(x)r(x, y)α(x, y) = π(y)r(y, x)α(y, x).
Definition 2.5 Hastings ratio: set α(x, y) = min{Λ(x, y), 1} where

Λ(x, y) = π(y)r(y, x) / (π(x)r(x, y)) .
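As an illustration, here is a minimal sketch of the recipe of Definitions 2.4 and 2.5 for a small discrete target (the five-state target and the ±1 random-walk proposal are arbitrary choices; since the proposal is symmetric between neighbouring states, the Hastings ratio Λ reduces to the ratio of target weights):

```python
import random

def mh_step(x, weights, rng):
    # propose a +/-1 random-walk move; r(x, y) = 1/2 for each neighbour
    y = x + rng.choice((-1, 1))
    if y < 0 or y >= len(weights):
        return x  # proposal off the state space counts as a rejection
    # symmetric proposal, so Lambda(x, y) = pi(y)/pi(x) (unnormalized is fine)
    lam = weights[y] / weights[x]
    return y if rng.random() < min(1.0, lam) else x

weights = [1, 2, 3, 2, 1]            # unnormalized target distribution
rng = random.Random(0)
counts = [0] * len(weights)
x = 0
for _ in range(200_000):
    x = mh_step(x, weights, rng)
    counts[x] += 1
freqs = [c / 200_000 for c in counts]
target = [w / sum(weights) for w in weights]
print([round(f, 2) for f in freqs])  # should approximate target
```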
2.2. Measure-theoretic approach
We follow Tierney (1998) closely in presenting a treatment of the Hastings ratio for
general discrete-time Markov chains, since in stochastic geometry one frequently
deals with chains on quite complicated state spaces.
Here is the key detailed balance calculation in general form: compare the "fluxes" π( dx)p(x, dy) and π( dy)p(y, dx). Both are absolutely continuous with respect to their sum: set

Λ(y, x)/(Λ(y, x) + 1) = π( dx)p(x, dy) / (π( dx)p(x, dy) + π( dy)p(y, dx)) ,

defining a possibly infinite Λ(y, x) up to null-sets of π( dx)p(x, dy) + π( dy)p(y, dx). Now Λ(y, x) is positive off a null-set N1 of π( dx)p(x, dy) and finite off a null-set N2 of π( dy)p(y, dx). Moreover N1 and N2 can be chosen as transposes of each other under the symmetry (x, y) ↔ (y, x), because they are defined in terms of whether Λ(y, x)/(Λ(y, x) + 1) is 0 or 1, and so we may arrange¹ for
Λ(y, x)/(Λ(y, x) + 1) + Λ(x, y)/(Λ(x, y) + 1) = 1 .    (2.1)

¹ working up to null-sets of π( dx)p(x, dy) + π( dy)p(y, dx) throughout!
Applying simple algebra, we deduce Λ(y, x)Λ(x, y) = 1 off N1 ∪ N2. We know

0 < Λ(y, x), Λ(x, y) < ∞    off N1 ∪ N2,

so

min{1, Λ(x, y)} × Λ(y, x)/(Λ(y, x) + 1)
    = min{Λ(y, x), Λ(y, x)Λ(x, y)} / (Λ(y, x) + 1)
    = min{Λ(y, x), 1} × Λ(x, y)/(Λ(x, y) + 1)    (2.2)

(using Λ(y, x)Λ(x, y) = 1, so that 1/(Λ(y, x) + 1) = Λ(x, y)/(Λ(x, y) + 1)), and therefore

min{1, Λ(x, y)} π( dx)p(x, dy) = min{1, Λ(y, x)} π( dy)p(y, dx) .
The kernel min{1, Λ(x, y)}p(x, dy) may have a deficit (integral over y less than one): compensate by adding an atom

( 1 − ∫ min{1, Λ(x, u)}p(x, du) ) δx( dy)

to obtain a reversible kernel (actually a Metropolis-Hastings kernel!)

( 1 − ∫ min{1, Λ(x, u)}p(x, du) ) δx( dy) + min{1, Λ(x, y)}p(x, dy) .
2.2.1. Projection arguments
In case π( dx)p(x, dy) and π( dy)p(y, dx) are mutually absolutely continuous,

Λ(x, y) = π( dy)p(y, dx) / π( dx)p(x, dy)    (2.3)

is exactly the Hastings ratio. Otherwise it arises via the Metropolis map L1 projection arguments of Billera and Diaconis (2001), measurable case.
Sketch: Let K be the family of all kernels and π( dx) be the target probability measure. Set

R = {q(x, dy) ∈ K : π( dx)q(x, dy) = π( dy)q(y, dx)}

and introduce a distance (via the Hahn-Jordan decomposition of signed measures)

d(p, q) = ∫∫_{x≠y} |π( dx) (q(x, dy) − p(x, dy))|

between p ∈ K and q ∈ R.
Remembering q ∈ R, introduce a deficit signed kernel ε(x, dy) by

π( dy)q(y, dx) = π( dx)p(x, dy) − π( dx)ε(x, dy)

and define

H+ = {(x, y) : Λ(x, y) > Λ(y, x)} ,
H− = {(x, y) : Λ(x, y) < Λ(y, x)}    (2.4)
where Λ(x, y) is defined in terms of p using Equation (2.3).
Notice H− = {(x, y) : (y, x) ∈ H+}, and (x, x) ∉ H− ∪ H+ for all x.
Furthermore, set p̃(x, dy) = α(x, y)p(x, dy) with

α(x, y) = min{ 1, Λ(y, x)/Λ(x, y) }

(use Λ(x, y) to compute acceptance probabilities to produce p̃ ∈ R). Now

d(p, p̃) = ∫∫_{x≠y} |π( dx) (p̃(x, dy) − p(x, dy))|
        = ∫∫_{x≠y} |π( dx) (α(x, y)p(x, dy) − p(x, dy))|
        = ∫∫_{x≠y} π( dx) | p(x, dy) − min{1, Λ(y, x)/Λ(x, y)} p(x, dy) |
        = ∫∫_{H+} π( dx) ( p(x, dy) − (Λ(y, x)/Λ(x, y)) p(x, dy) )
        = ∫∫_{H+} (π( dx)p(x, dy) − π( dy)p(y, dx))

using 0 ≤ α(x, y) ≤ 1 and the definition of Λ(x, y) via a density in Equation (2.1).
Note also that replacing p̃(x, dy) by α′(x, y)p(x, dy), for α′(x, y) ∈ [0, α(x, y)], increases the distance by

∫∫_{x≠y} π( dx) (α(x, y) − α′(x, y)) p(x, dy) .    (2.5)
Now compute

d(p, q) = ∫∫_{x≠y} |π( dx) (q(x, dy) − p(x, dy))|
        ≥ ∫∫_{H+} [ |π( dx) (q(x, dy) − p(x, dy))| + |π( dy) (q(y, dx) − p(y, dx))| ]
        = ∫∫_{H+} [ |π( dx)ε(x, dy)| + |π( dx)p(x, dy) − π( dx)ε(x, dy) − π( dy)p(y, dx)| ]
        ≥ ∫∫_{H+} (π( dx)p(x, dy) − π( dy)p(y, dx))
        = d(p, p̃) .
Thus p̃(x, dy), defined by p̃(x, dy) = α(x, y)p(x, dy), is the kernel in R which is closest to the original kernel p(x, dy) in the distance d. Moreover equality in the last step can hold only when π( dx)ε(x, dy) restricted to H+ is a non-negative measure dominated by π( dx)p(x, dy), which is to say when

π( dx)ε(x, dy) = (1 − α′(x, y)) π( dx)p(x, dy)    on H+
for some α′(x, y) ∈ [0, 1]. Exactly the same arguments applied to the transposed measure π( dy)p(y, dx), and Equation (2.4), lead us to deduce that if q(x, dy) is a kernel in R which minimizes d(q, p) then

q(x, dy) = α′(x, y)p(x, dy) .

We deduce now from the discussion before Equation (2.5) that the distance is minimized only when

α′(x, y) = α(x, y) = min{ 1, Λ(y, x)/Λ(x, y) }
in L1(π( dx)p(x, dy)). In this sense the Metropolis-Hastings sampler

p̃(x, dy) = α(x, y)p(x, dy)

is an L1 projection of p(x, dy) onto the class of reversible kernels.
We note in passing that the Hastings ratio (as opposed to other possible accept/reject
rules) yields the largest spectral gap and minimizes the asymptotic variance of additive functionals (Peskun 1973; Tierney 1998).
2.3. MCMC for Markov point processes
We now apply the above to simulation of a point process in a bounded window,
whose distribution² has density f(ξ) with respect to a Poisson point process measure π1( dξ) of unit intensity (Geyer and Møller 1994; Green 1995). We need
to design a point-pattern-valued Markov chain whose limiting distribution will be
f (ξ)π1 ( dξ), and we will constrain the Markov chain to evolve solely by adding
and deleting points.
Let λ∗(x; ξ) = f(x ∪ {ξ})/f(x) be the Papangelou conditional intensity for the target point process, as defined in Equation (1.15). Suppose the point process satisfies some condition such as local stability, namely

λ∗(x; ξ) ≤ γ    (2.6)

for some positive constant γ. We can weaken this: we need require only that

∫_W λ∗(x; u) du ≤ ∫_W γ(u) d Leb(u) < ∞ .

However the stronger condition is useful later, in CFTP, and implies Ruelle stability.
² Here we use the measurable space of locally finite point patterns, introduced in Theorem 1.10. Notice that we are dealing with a bounded window, so in fact the total number of points in the pattern will always be finite.
2.3.1. Discrete time case
Consider a Markov chain X which evolves as follows. Fix αn so that α0 = 0 and

αn + αn+1 ∫_W λ∗(x; u) du ≤ 1 .    (2.7)
• suppose the current value of X is X = x with #(x) = n. Choose from:
– Death: with probability αn−1, choose a point η of x at random and delete it from x.
– Birth: generate a new point ξ with sub-probability density

αn λ∗(x; ·)/(n + 1)

and add it to x (in case of probability deficit, do nothing!);
Arrange matters so that birth and death are exclusive (feasible by (2.7) above).
It is tedious to write down the transition kernel for X: however the message of
Λ(x, y) (as given in (2.3)) is that equality of probability fluxes suffices. Point
pattern configurations are tricky (patterns are not ordered); however an equivalent
task is to arrange for equality of birth and death fluxes when the target distribution
is given by rejection sampling based on the construction of Theorem 1.5, recalling
that f is a density with respect to a unit Poisson point process. Set Leb(W ) = A.
We consider the exchange between x̃ and x where x̃ = x ∪ {ξ}, x = x̃ \ {ξ}.
Suppose #(x) = n, so #(x̃) = n + 1.
Death: the flux from x̃ to x is

f(x̃) × [e^{−A} dx1 . . . dxn dξ / (n + 1)!] × (n + 1)! × αn/(n + 1) = αn f(x̃) e^{−A} dx1 . . . dxn dξ / (n + 1) .

The factor (n + 1)! in the numerator of the left-hand side allows for the (n + 1)! ways of permuting x1, . . . , xn, ξ to represent the same configuration x̃. The death probability αn/(n + 1) allows for the requirement that the point chosen to die should be ξ and not one of x1, . . . , xn.
Birth: the flux from x to x̃ is

f(x) × [e^{−A} dx1 . . . dxn / n!] × n! × αn λ∗(x; ξ) dξ/(n + 1) = αn f(x) λ∗(x; ξ) e^{−A} dx1 . . . dxn dξ / (n + 1) .    (2.8)

The factor n! in the numerator of the left-hand side allows for the n! ways of permuting x1, . . . , xn to represent the same configuration x.
Using the formula (1.15) for the Papangelou conditional intensity, we deduce
equality between these two fluxes. It follows that X has the desired target distribution as an invariant distribution.
Exercise 2.6 Consider the special case λ∗(x; ξ) = β, choosing say

αn = 1/(1 + β Leb(W))

for all n > 0. Write down the simulation algorithm. Check that the resulting kernel is reversible with respect to πβ, the distribution measure of the Poisson point process of rate β.
There is a very easy approach: recognize that there can be no spatial interaction, so
it suffices to set up detailed balance equations for the Markov chain which counts the
total number of points . . . .
2.3.2. Continuous time case
Page 61 of 167
In the treatment above it is tempting to make αn as large as possible, to maximize
the rate of change per time-step. However we can also consider the limiting case
of a continuous-time model and reformulate the simulation in event-based terms,
thus producing a spatial birth-and-death process. For small δ, set αn = (n + 1)δ
for n to maintain Equation (2.7). (For larger n use smaller αn . . . .) Then
• suppose the current value of X is X = x. While #(x) is not too large,
choose from:
– Death: with probability #(x)δ, choose a point η of x at random and
delete it from x.
– Birth: generate a new point ξ with sub-probability density λ∗ (x; ξ)δ,
and add it to x (in case of probability deficit, do nothing!);
Taking the limit as δ → 0, and reformulating as a continuous time Markov chain:
• suppose the current value of X is X = x:
• wait an Exponential(#(x) + ∫_W λ∗(x; ξ) dξ) random time, then
– Death: with probability #(x)/(#(x) + ∫_W λ∗(x; ξ) dξ), choose a point η of x at random and delete it from x;
– Birth: otherwise generate a new point ξ with probability density proportional to λ∗(x; ξ), and add it to x.
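Here is a sketch of this continuous-time algorithm for a Strauss process target (compare Exercise 2.8). Rather than computing ∫_W λ∗(x; ξ) dξ exactly, this equivalent version uses the local stability bound λ∗ ≤ β (valid for γ ≤ 1) to propose births at the dominating rate β Leb(W) and thin them; the function names and parameter values are my choices, not the lecturer's.

```python
import math
import random

def strauss_cond_intensity(x, xi, beta, gamma, r):
    """Papangelou conditional intensity of the Strauss process:
    lambda*(x; xi) = beta * gamma ** t, with t the number of points
    of the configuration x lying within distance r of xi."""
    t = sum(1 for p in x if math.dist(p, xi) < r)
    return beta * gamma ** t

def spatial_birth_death(beta, gamma, r, time_horizon, rng, side=1.0):
    """Spatial birth-and-death process on the window [0, side]^2,
    with births thinned against the dominating rate beta."""
    x, t = [], 0.0
    total_birth = beta * side * side        # dominating total birth rate
    while True:
        rate = len(x) + total_birth
        t += rng.expovariate(rate)          # exponential holding time
        if t > time_horizon:
            return x
        if rng.random() < len(x) / rate:
            x.pop(rng.randrange(len(x)))    # death: uniformly chosen point
        else:
            xi = (rng.random() * side, rng.random() * side)
            # thinning: accept the proposed birth w.p. lambda*(x; xi)/beta
            if rng.random() < strauss_cond_intensity(x, xi, beta, gamma, r) / beta:
                x.append(xi)

rng = random.Random(3)
pattern = spatial_birth_death(beta=100.0, gamma=0.5, r=0.05,
                              time_horizon=20.0, rng=rng)
print(len(pattern))   # fewer points than the Poisson(100) mean, due to inhibition
```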
Exercise 2.7 Verify that the above algorithm produces an invariant distribution with Papangelou conditional intensity λ∗ (x; ξ).
A coupling argument shows this is geometrically ergodic – but we will do better!
Exercise 2.8 Determine a spatial birth-and-death process with invariant
distribution which is a Strauss point process.
Example of Strauss process target.
Exercise 2.9 Determine a spatial birth-and-death process with invariant
distribution which is an area-interaction point process.
Note that the technique underlying these exercises suggests the following: a local
stability bound λ∗ (x; ξ) ≤ M implies that the point process can be expressed as
some complicated and dependent thinning of a Poisson point process of intensity
M . We shall substantiate this in Chapter 3, when we come to discuss perfect
simulation for point processes.
2.4. Uniqueness and convergence
We have been careful to describe the target distributions as invariant distributions rather than equilibrium distributions, because we have not yet introduced useful methods to determine whether our chains actually converge to their invariant distributions! As we have already hinted in Chapter 1, this can be a problem; indeed one can write down a Markov chain which purports to have the attractive-case Strauss point process as invariant distribution, even though this point process is not well-defined!
The full story here involves discussions of φ-recurrence and Harris recurrence,
which we reluctantly omit for reasons of time. Fortunately there is a relatively
simple technique which often works to ensure uniqueness of and convergence to
an invariant distribution, once existence is established. Suppose we can ensure that
the chain X visits the empty-pattern state infinitely often, and with finite mean
recurrence time (the empty-pattern state ∅ is an ergodic atom). Then the coupling
approach to renewal theory (Lindvall 1982; Lindvall 2002) can be applied to show
that the distribution of Xn converges in total variation to its invariant distribution,
which therefore must be unique.
This pleasantly low-tech approach foreshadows the discussion of small sets later
in this chapter, and also anticipates important techniques in perfect simulation.
2.5. The MCMC zoo
The attraction of MCMC is that it allows us to produce simulation algorithms for
all sorts of point processes and geometric processes. It is helpful however to have
in mind a general categorization: a kind of “zoo” of different MCMC algorithms.
• Here are some important special cases of the MH sampler classified by type
of proposal distribution. Let the target distribution be π (could be Bayesian
posterior . . . ).
– Independence sampler: Choose q(x, y) = q(y) (independent of current location x) which you hope is “well-matched” to π;
– Random walk sampler: Choose q(x, y) = q(y − x) to be a random
walk increment based on current location x. Tails of target distribution
π must not be too heavy and (eg) density contours not too sharply
curved (Roberts and Tweedie 1996b);
– Langevin algorithms (also called "Metropolis-adjusted Langevin"):
Choose q(x, y) = q(y − σ 2 g(x)/2) where g(x) is formed using x
and the “logarithmic gradient” of the density of π( dx) at x (possibly
adjusted) (σ 2 is the jump variance). Behaves badly if logarithmic gradient tends to zero at infinity; a condition requiring something of the
order “tails must not be too light” (Roberts and Tweedie 1996a).
The Gibbs’ sampler corresponds to the case when proposal distributions are chosen to be conditional distributions: in a simple example, if state is described by X1 , X2 then propose new X1 according to L(X1 |X2 ) (under equilibrium distribution) and vice versa.
The continuous-time algorithms for simulation of point patterns using
birth-and-death processes and conditional intensities are examples of
this.
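A minimal sketch of such a two-component Gibbs sampler, for a bivariate normal target (correlation ρ = 0.6, an arbitrary choice) where both conditional distributions L(X1|X2) and L(X2|X1) are available exactly:

```python
import math
import random

rho = 0.6
sd = math.sqrt(1 - rho * rho)       # conditional sd: L(X1|X2) = N(rho*X2, 1-rho^2)
rng = random.Random(7)
x1, x2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    x1 = rng.gauss(rho * x2, sd)    # refresh X1 from its conditional
    x2 = rng.gauss(rho * x1, sd)    # refresh X2 from its conditional
    samples.append((x1, x2))

n = len(samples)
m1 = sum(a for a, _ in samples) / n
m2 = sum(b for _, b in samples) / n
cov = sum(a * b for a, b in samples) / n - m1 * m2
v1 = sum(a * a for a, _ in samples) / n - m1 * m1
v2 = sum(b * b for _, b in samples) / n - m2 * m2
corr = cov / math.sqrt(v1 * v2)
print(round(corr, 2))               # should approximate rho
```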
• Slice sampler: task is to draw from density f (x). Here is a one-dimensional
example! (But note, this is only interesting because it can be made to work in
many dimensions . . . ) Suppose f unimodal. Define g0(y), g1(y) implicitly
by the requirement that [g0(y), g1(y)] is the superlevel set {x : f (x) ≥ y}.
def mcmc_slice_move(x):
    y ← uniform(0, f(x))
    return uniform(g0(y), g1(y))
Figure 2.1: Slice sampler
Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (order
of 530 iterations!) under a specific variation of log-concavity. Relates to
notion of “auxiliary variables”.
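Figure 2.1 can be made runnable for a concrete unimodal density. Here is a sketch for the standard normal kernel f(x) = exp(−x²/2), where the superlevel set {x : f(x) ≥ y} is the interval [−√(−2 log y), √(−2 log y)]:

```python
import math
import random

def f(x):
    return math.exp(-x * x / 2)

def mcmc_slice_move(x, rng):
    # auxiliary "height" variable, drawn in (0, f(x)] to avoid log(0)
    y = f(x) * (1.0 - rng.random())
    half = math.sqrt(-2.0 * math.log(y))
    return rng.uniform(-half, half)   # uniform draw from the superlevel set

rng = random.Random(11)
x, xs = 0.0, []
for _ in range(100_000):
    x = mcmc_slice_move(x, rng)
    xs.append(x)
mean = sum(xs) / len(xs)
var = sum(v * v for v in xs) / len(xs) - mean * mean
print(round(mean, 2), round(var, 2))  # should approximate N(0, 1) moments
```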
• Simulated tempering:
– Embed target distribution in a family of distributions indexed by some "temperature" parameter. For example, embed density f(x) (defined over bounded region for x) in the family of normalized densities

(f(x))^{1/τ} / ∫ (f(u))^{1/τ} du ;
– Choose weights β0, β1, . . . , βk for selected temperatures τ0 = 1 < τ1 < · · · < τk ;
– Now do Metropolis-Hastings for target density
β0 (f(x))^{1/τ0} = β0 f(x)    at level 0
β1 (f(x))^{1/τ1} / ∫ (f(u))^{1/τ1} du    at level 1
. . .
βk (f(x))^{1/τk} / ∫ (f(u))^{1/τk} du    at level k
where moves either propose changes in x, or propose changes in level
i; (NB: choice of βi ?)
– Finally, sample chain only when it is at level 0.
• Markov-deterministic hybrid: Need to sample exp(−E(q))/Z. Introduce "momenta" p and aim to draw from

π(p, q) = exp( −E(q) − ½|p|² ) / Z′ .

Fact: for Hamiltonian H(p, q) = E(q) + ½|p|², the dynamical system

dq = (∂H/∂p) dt = p dt ,
dp = −(∂H/∂q) dt = −(∂E/∂q) dt

has π(p, q) as invariant distribution (Liouville's theorem). So alternate between (a) evolving using dynamics and (b) drawing from the Gaussian marginal for p. Discretization means (a) is not exact; can fix this by use of MH rejection techniques.
Cf : dynamic reversibility.
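Here is a sketch of the hybrid idea in one dimension for E(q) = q²/2 (standard normal target), using a leapfrog discretization of the Hamiltonian dynamics and an MH rejection step to correct the discretization error (the step size and trajectory length are arbitrary choices):

```python
import math
import random

def leapfrog(q, p, eps, steps):
    grad = lambda q: q                 # dE/dq for E(q) = q^2 / 2
    p -= 0.5 * eps * grad(q)           # half step in momentum
    for _ in range(steps - 1):
        q += eps * p
        p -= eps * grad(q)
    q += eps * p
    p -= 0.5 * eps * grad(q)           # final half step
    return q, p

def hmc_step(q, rng, eps=0.2, steps=10):
    p = rng.gauss(0.0, 1.0)            # (b): refresh momentum from Gaussian marginal
    H0 = q * q / 2 + p * p / 2
    q_new, p_new = leapfrog(q, p, eps, steps)   # (a): approximate dynamics
    H1 = q_new * q_new / 2 + p_new * p_new / 2
    # MH correction: accept with probability min{1, exp(H0 - H1)}
    return q_new if rng.random() < math.exp(min(0.0, H0 - H1)) else q

rng = random.Random(5)
q, qs = 0.0, []
for _ in range(50_000):
    q = hmc_step(q, rng)
    qs.append(q)
var = sum(v * v for v in qs) / len(qs)
print(round(var, 2))                   # should approximate the N(0, 1) variance
```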
• Dynamic importance weighting (Liu, Liang, and Wong 2001): augment Markov chain X by adding an importance parameter W; replace the search for equilibrium (E[θ(Xn)] → ∫ θ(x)π( dx)) by the requirement

lim_{n→∞} E[Wn θ(Xn, Wn)] ∝ ∫ θ(x)π( dx) .
• Evolutionary Monte Carlo (Liang and Wong 2000, 2001): use ideas of genetic algorithms (multiple samples, "exchanging" sub-components) and tempering.
• and many more ideas besides . . . .
2.6. Speed of convergence
There is of course an issue as to how fast is the convergence to equilibrium. This
can be analytically challenging (Diaconis 1988)!
2.6.1. Convergence and symmetry: card shuffling
Example: transposition random walk X on permutation group of n cards. Equilibrium occurs at strong stationary times T (Aldous and Diaconis 1987; Diaconis
and Fill 1990):
Definition 2.10 The random stopping time T is a strong stationary time for the
Markov chain X (whose equilibrium distribution is π) if
P[T = k, Xk = s] = πs × P[T = k] .
Broder: notion of checked cards. Transpose by choosing 2 cards at random; LH,
RH. Check card pointed to by LH if
either LH, RH are the same unchecked card;
or LH is unchecked, RH is checked.
Inductive claim: given number of checked cards, positions in pack of checked cards, list of values of cards; the map determining which checked card has which value is uniformly random. Set Tm to be the time till m cards checked. The independent sum T = Σ_{m=0}^{n−1} Tm is a strong stationary time by induction.
How big is T?
Now Tm+1 − Tm is Geometrically distributed with success probability (n − m)(m + 1)/n². So the mean of T is

E[ Σ_{m=0}^{n−1} Tm ] = Σ_{m=0}^{n−1} (n²/(n + 1)) ( 1/(n − m) + 1/(m + 1) ) ≈ 2n log n .
MOREOVER: T = Σ_{m=0}^{n−1} Tm is a discrete version of the time to complete infection for a simple epidemic (without recovery) in n individuals, starting with 1 infective, individuals contacting at rate n². A classic calculation tells us T/(2n log n) has a limiting distribution.
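The mean computation can be checked directly by summing the means of the geometric increments (a sketch; n = 52 is the usual deck size):

```python
import math

def mean_T(n):
    # E[T] = sum over m of the mean of Geometric((n - m)(m + 1) / n^2)
    return sum(n * n / ((n - m) * (m + 1)) for m in range(n))

n = 52
ratio = mean_T(n) / (2 * n * math.log(n))
print(round(ratio, 3))   # close to 1, approaching it slowly as n grows
```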
NOTE: group representation theory: the correct asymptotic is ½ n log n.
The MCMC approach to substitution decoding also produces a Markov chain moving by transpositions on a permutation group of around n = 60 symbols. However
it takes much longer to reach equilibrium than order of 60 log(60) ≈ 250 transpositions. Conditioning on data can have a drastic effect on convergence rates!
2.6.2. Convergence and small sets
The most accessible theory (as expounded for example in Meyn and Tweedie
1993) uses the notion of small sets for Markov chains. In effect, small set theory allows us to argue as if (suitably regular) Markov chains are actually discrete,
even if defined on very general state-spaces. In fact this discreteness can even be
formalized as the existence of a latent discrete structure.
Definition 2.11 C is a small set (of order k) if there is a positive integer k, positive c and a probability measure µ such that for all x ∈ X and E ⊆ X

p⁽ᵏ⁾(x, E) ≥ c × I[x ∈ C] µ(E) .
Markov chains can be viewed as regenerating at small sets. The notion is useful
for recurrence conditions of Lyapunov type: compare low-tech “empty pattern”
method.
Small sets play a crucial rôle in notions of periodicity, recurrence, positive
recurrence, geometric ergodicity, making link to discrete state-space theory
(Meyn and Tweedie 1993).
Foster-Lyapunov recurrence on small sets (Mykland, Tierney, and Yu 1995;
Roberts and Tweedie 1996a; Roberts and Tweedie 1996b; Roberts and
Rosenthal 2000; Roberts and Rosenthal 1999; Roberts and Polson 1994)
Theorem 2.12 (Meyn and Tweedie 1993) Positive recurrence holds if one can find a small set C, a constant β > 0, and a non-negative function V bounded on C such that for all n > 0

E[V(Xn+1)|Xn] ≤ V(Xn) − 1 + β I[Xn ∈ C] .    (2.9)
Proof:
Let N be the random time at which X first (re-)visits C. It suffices³ to show E[N |X0] ≤ V(X0) + constant < ∞ (then use small-set regeneration).
By iteration of (2.9), we deduce E[V(Xn)|X0] < ∞ for all n.
If X0 ∉ C then (2.9) tells us n ↦ V(Xn∧N) + n ∧ N defines a nonnegative supermartingale (I[Xn∧N ∈ C] = 0 if n < N). Consequently

E[N |X0] ≤ E[V(XN) + N |X0] ≤ V(X0) .

³ This martingale approach can be reformulated as an application of Dynkin's formula.
If X0 ∈ C then the above can be used to show
E [N |X0 ]
= E [1 × I [X1 ∈ C]|X0 ] + E [E [N |X1 ] I [X1 6∈ C]|X0 ]
≤ P [X1 ∈ C|X0 ] + E [1 + V (X1 )|X0 ]
≤ P [X1 ∈ C|X0 ] + V (X0 ) + β
where the last step uses Inequality (2.9) applied when I [Xn−1 ∈ C] = 1.
This idea can be extended to give bounds on convergence rates.
Theorem 2.13 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1, β, and a function V ≥ 1 bounded on C such that

E[V(Xn+1)|Xn] ≤ λV(Xn) + β I[Xn ∈ C] .    (2.10)
Proof:
Define N as in Theorem 2.12. Iterating (2.10), we deduce E[V(Xn)|X0] < ∞ and more specifically

n ↦ V(Xn∧N)/λ^{n∧N}

is a nonnegative supermartingale. Consequently

E[ V(XN)/λ^N | X0 ] ≤ V(X0) .

Using the facts that V ≥ 1, λ ∈ (0, 1) and Markov's inequality we deduce

P[N > n|X0] ≤ λⁿ V(X0) ,

which delivers the required geometric ergodicity.
Bounds may not be tight, but there are many small sets for most Markov chains
(not just the trivial singleton sets, either!). In fact small sets can be used to show:
Theorem 2.14 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p⁽²⁾(x, y) can be expressed (non-uniquely) as a non-negative countable sum

p⁽²⁾(x, y) = Σi fi(x) gi(y) .

Hence in such cases small sets of order 2 abound!
This relates to the classical theory of small sets and would not surprise its originators; see for example Nummelin 1984.
Proof:
Key Lemma, variation of Egoroff's Theorem:
Let p(x, y) be an integrable function on [0, 1]². Then we can find subsets Aε ⊂ [0, 1], increasing as ε decreases, such that

(a) for any fixed Aε the "L1-valued function" px is uniformly continuous on Aε: for any η > 0 we can find δ > 0 such that |x − x′| < δ and x, x′ ∈ Aε implies

∫₀¹ |px(z) − px′(z)| dz < η ;

(b) every point x in Aε is of full relative density: as u, v → 0 so

Leb([x − u, x + v] ∩ Aε) / (u + v) → 1 .
Moreover one can discover latent discrete Markov chains representing all
such Markov chains at period 2.
However one cannot get below order 2:
Theorem 2.15 (Kendall and Montana 2002) There are Markov chains
which possess no small sets of order 1.
Proof:
Key Theorem:
There exist Borel measurable subsets E ⊂ [0, 1]2 of positive area which are
rectangle-free, so that if A × B ⊆ E then Leb(A × B) = 0.
(Stirling’s formula, discrete approximations, Borel-Cantelli.)
Figure 2.2: No small sets!
Chapter 3
Perfect simulation
Contents
Go Back
Full Screen
Close
Quit
3.1  CFTP . . . 81
3.2  Fill's (1998) method . . . 97
3.3  Sundry other matters . . . 99
Establishing bounds on convergence to equilibrium is important if we are to get
a sense of how to do MCMC. It would be very useful if one could finesse the
whole issue by modifying the MCMC algorithm so as to produce an exact draw
from equilibrium, and this is precisely the idea of perfect simulation. A graphic
illustration of this in the context of stochastic geometry (specifically, the dead
leaves model) is to be found at
http://www.warwick.ac.uk/statsdept/staff/WSK/dead.html.
3.1. CFTP
There are two main approaches to perfect simulation. One is Coupling from the
Past (CFTP), introduced by Propp and Wilson (1996). Many variations have
arisen: we will describe the original Classic CFTP, and two variants Small-set
CFTP and Dominated CFTP. Part of the fascination of this area is the interplay
between theory and actual implementation: we will present illustrative simulations.
3.1.1. Classic CFTP
Consider reflecting random walk run From The Past (Propp and Wilson 1996):
• no fixed start: run all starts simultaneously (Coupling From The Past);
• extend past till all starts coalesce by t = 0;
• 3-line proof: perfect draw from equilibrium.
Figure 3.1: Classic CFTP.
Implementation:
- "re-use" randomness;
- sample at t = 0, not at coalescence;
- control coupling efficiently using monotonicity, boundedness;
- uniformly ergodic.
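A minimal sketch of the scheme just listed, for the reflecting random walk on {0, 1, …, M} (M and all names here are illustrative choices, not code from the lectures): all starts are run simultaneously through shared update maps, the past is extended by doubling, previously drawn randomness is re-used, and sampling is at t = 0. By monotonicity it suffices to track the two extreme starts.

```python
import random

M = 10  # reflecting random walk on {0, 1, ..., M} (illustrative choice)

def step(x, u):
    # deterministic update map driven by shared randomness u;
    # monotone in x, so tracking the extreme starts suffices
    x = x + 1 if u < 0.5 else x - 1
    return min(max(x, 0), M)

def classic_cftp(seed=1):
    rng = random.Random(seed)
    innovations = []           # re-used randomness; later entries lie further in the past
    T = 1
    while True:
        innovations += [rng.random() for _ in range(T)]   # extend further into the past
        lo, hi = 0, M          # extreme starts at time -len(innovations)
        for u in reversed(innovations):                   # oldest innovation first
            lo, hi = step(lo, u), step(hi, u)
        if lo == hi:           # vertical coalescence by t = 0: perfect draw
            return lo
        T *= 2

sample = classic_cftp()
```

Note that sampling happens at t = 0 even though coalescence typically occurs earlier; sampling at the coalescence time itself would bias the draw.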
Here is the 3-line proof, with a couple of extra lines to help explain!
Theorem 3.1 If coalescence is almost sure then CFTP delivers a sample
from the equilibrium distribution of the Markov chain X corresponding to
the random input-output maps F(−u,v] .
Proof:
For each time-range [−n, ∞) use input-output maps F_(−n,t] to define

X_t^{−n}  =  F_(−n,t](0)    for −n ≤ t .

We have assumed finite coalescence time −T for F. Then

X_0^{−n}  =  X_0^{−T}    whenever −n ≤ −T ;     (3.1)
L(X_0^{−n})  =  L(X_n^0) .                      (3.2)

If X converges to an equilibrium π then

dist(L(X_0^{−T}), π)  =  lim_n dist(L(X_0^{−n}), π)  =  lim_n dist(L(X_n^0), π)  =  0     (3.3)

(dist is total variation), hence giving the required result.
Here is a simple version of perfect simulation applied to the Ising model,
following Propp and Wilson (1996) (but they do something much cleverer!).
Here the probability of a given ±1-valued configuration S is proportional to

exp( β Σ_{x∼y} S_x S_y )

and our recipe is: pick a site x, and compute the odds ratio ρ of S_x = −1 against S_x = +1 given the rest of the configuration S,

ρ  =  exp(β Σ_{y:y∼x} (−1) × S_y) / exp(β Σ_{y:y∼x} (+1) × S_y)  =  exp(−2β Σ_{y:y∼x} S_y) ,

then set S_x = +1 with probability 1/(1 + ρ), else set S_x = −1.
Boundary conditions are “free” in the simulation: for small β we can make a
perfect draw from Ising model over infinite lattice, by coupling in space as well as
time (Kendall 1997b; Häggström and Steif 2000).
We can do better for Ising: Huber (2003) shows how to convert Swendsen-Wang
methods to CFTP.
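A sketch of how the single-site recipe combines with monotone CFTP on a small free-boundary grid (the inverse temperature, grid size, and all function names below are illustrative, not the Propp-Wilson code): since the conditional probability of +1 is increasing in the neighbour sum, the all-(+1) and all-(−1) starts sandwich every other start, and their coalescence signals coalescence of all starts.

```python
import math
import random

BETA, N = 0.25, 4   # illustrative inverse temperature and grid side

def nbrs(i, j):
    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= a < N and 0 <= b < N:
            yield a, b

def heat_bath(S, site, u):
    # set S_x = +1 with probability 1/(1 + rho), rho = exp(-2*beta*(neighbour sum))
    i, j = site
    s = sum(S[a][b] for a, b in nbrs(i, j))
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * BETA * s))
    S[i][j] = +1 if u < p_plus else -1

def ising_cftp(seed=0):
    rng, moves, T = random.Random(seed), [], 1
    while True:
        moves += [((rng.randrange(N), rng.randrange(N)), rng.random())
                  for _ in range(T)]          # extend the past, re-use old moves
        hi = [[+1] * N for _ in range(N)]     # maximal start
        lo = [[-1] * N for _ in range(N)]     # minimal start
        for mv in reversed(moves):            # oldest innovations first
            heat_bath(hi, *mv)
            heat_bath(lo, *mv)
        if hi == lo:                          # coalescence by t = 0
            return hi
        T *= 2

spins = ising_cftp()
```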
3.1.2. A little history
Kolmogorov (1936):¹ Consider an inhomogeneous (finite state) Markov transition kernel P_ij(s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past?

Yes!

Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts:

P[X(0) = k]                        =  Q_k(0)
P[X(−1) = j, X(0) = k]             =  Q_j(−1) P_jk(−1, 0)
P[X(−2) = i, X(−1) = j, X(0) = k]  =  Q_i(−2) P_ij(−2, −1) P_jk(−1, 0)
...

¹ Thanks to Thorisson (2000) for this reference . . . .
3.1.3. Dead Leaves
Nearly the simplest possible example of perfect simulation.
• Dead leaves falling in the forests at Fontainebleau;
• Equilibrium distribution models complex images;
• Computationally amenable even in classic MCMC framework;
• But can be sampled perfectly with no extra effort! (And uniformly ergodic.)
How can this be done? See Kendall and Thönnes (1999).
[HINT: try to think non-anthropomorphically . . . .]
Occlusion CFTP. Related work: Jeulin (1997), Lee, Mumford, and Huang (2001).
Figure 3.2: Falling leaves of perfection.
3.1.4. Small-set CFTP
Does CFTP depend on monotonicity? No: we can use small sets. Consider a
simple Markov chain on [0, 1] with triangular kernel.
Figure 3.3: Small-set CFTP.
• Note "fixed" small triangle;
• this can couple Markov chain steps;
• basis of small-set CFTP (Murdoch and Green 1998).

Extends to more general cases ("partitioned multi-gamma sampler").
Uniformly ergodic.
Exercise 3.2 Write out the details of small-set CFTP. Deduce how to represent the output of simple small-set CFTP (uses just one small set) in terms of
a Geometric random variable and draws from a single kernel.
Exercise 3.3 One small set is usually not enough - coalescence takes much
too long! Work out how to use several small sets to do partitioned small-set
CFTP (“partitioned multi-gamma sampler”) (Murdoch and Green 1998).
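A sketch of the representation asked for in Exercise 3.2, under an illustrative uniform minorization (the residual kernel and all names below are invented for the example): if every one-step kernel satisfies p(x, ·) ≥ ε ν(·), then looking back from time 0 the most recent regeneration lies a Geometric(ε) number of steps in the past, and the CFTP output is a draw from ν pushed forward through Geometric − 1 residual-kernel steps.

```python
import random

EPS = 0.3   # assumed minorization weight: p(x, .) >= EPS * Uniform(0, 1) for all x

def residual_step(rng, x):
    # one draw from the residual kernel (p - EPS * nu) / (1 - EPS);
    # here an illustrative reflected perturbation on [0, 1]
    y = x + rng.uniform(-0.25, 0.25)
    return -y if y < 0 else (2 - y if y > 1 else y)

def small_set_cftp(seed=0):
    rng = random.Random(seed)
    g = 1                            # steps back to the latest regeneration
    while rng.random() >= EPS:       # Geometric(EPS) on {1, 2, ...}
        g += 1
    x = rng.random()                 # regeneration: a draw from nu = Uniform(0, 1)
    for _ in range(g - 1):           # run the residual kernel forward to time 0
        x = residual_step(rng, x)
    return x

draw = small_set_cftp()
```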
3.1.5. Point patterns
Recall Widom-Rowlinson model (Widom and Rowlinson 1970) from physics,
“area-interaction” model in statistics (Baddeley and van Lieshout 1995).
The point pattern X has density (with respect to Poisson process):
exp (− log(γ) × (area of union of discs centred on X))
Figure 3.4: Perfect area-interaction.
• Represent as two-type pattern (red crosses, blue disks);
• inspect structure to detect monotonicity;
• attractive case γ > 1: CFTP uses "bounded" state-space (Häggström et al. 1999) in the sense of being uniformly ergodic.

All cases can be dealt with using Dominated CFTP (Kendall 1998).
3.1.6. Dominated CFTP
Foss and Tweedie (1998) show classic CFTP is essentially equivalent to uniform ergodicity (rate of convergence to equilibrium does not depend on start-point). Classic CFTP needs vertical coalescence: every possible start from time −T leads to the same result at time 0.
Can we lift this uniform ergodicity requirement?
Figure 3.5: Horizontal CFTP.
CFTP almost works with horizontal coalescence: all sufficiently early starts from
a specific location * lead to same result at time 0. But how to identify when this
has happened?
Recall Lindley's identity for the GI/G/1 queue,

W_{n+1}  =  max{0, W_n + S_n − X_{n+1}}  =  max{0, W_n + η_n} ,

and its development

W_{n+1}  =  max{0, η_n, η_n + η_{n−1}, . . . , η_n + η_{n−1} + . . . + η_1} ,

leading by time-reversal to

W_{n+1}  =  max{0, η_1, η_1 + η_2, . . . , η_1 + η_2 + . . . + η_n}

and thus the steady-state expression

W_∞  =  max{0, η_1, η_1 + η_2, . . .} .
Loynes (1962) discovered a coupling application to queues with general inputs, which pre-figures CFTP.
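The development above is an exact pathwise identity, easily checked numerically (the η sequence below is arbitrary test data with negative drift, so the queue is stable):

```python
import random

def waiting_time_recursive(etas):
    # Lindley recursion: W_1 = 0, W_{k+1} = max(0, W_k + eta_k)
    w = 0.0
    for eta in etas:
        w = max(0.0, w + eta)
    return w

def waiting_time_maximum(etas):
    # W_{n+1} = max{0, eta_n, eta_n + eta_{n-1}, ..., eta_n + ... + eta_1}
    best, partial = 0.0, 0.0
    for eta in reversed(etas):
        partial += eta
        best = max(best, partial)
    return best

rng = random.Random(7)
etas = [rng.uniform(-1.0, 0.8) for _ in range(200)]
assert abs(waiting_time_recursive(etas) - waiting_time_maximum(etas)) < 1e-12
```

The time-reversal step of the slide replaces this backwards maximum by the forwards one, which has the same distribution when the η are i.i.d.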
Dominated CFTP idea: Generate target chains using a dominating process
for which equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given by the dominating
process.
Thus we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Thönnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity ≠ Dominated CFTP!).
Figure 3.6: Two examples of non-uniformly ergodic CFTP: (a) perfect Strauss point process; (b) perfect conditioned Boolean model.
Description² of dominated CFTP (Kendall 1998; Kendall and Møller 2000) for the birth-death algorithm for a point process satisfying the local stability condition

λ*(ξ; x)  ≤  γ .

Original Algorithm: points η die at unit death rate, and are born in pattern x at total birth rate ∫_W λ*(x; u) du, with density λ*(x; ·)/∫_W λ*(x; u) du, so local birth rate λ*(x; ξ).

Perfect modification: the underlying supply of randomness is a birth-and-death process with unit death rate, uniform birth rate γ per unit area. Mark points independently with (for example) Uniform(0, 1) marks. Define lower and upper processes L and U as exhibiting minimal and maximal birth rates available in the range of configurations between L and U:

minimal birth rate at u  =  min{λ*(ξ; u) : L ⊆ ξ ⊆ U} ,
maximal birth rate at u  =  max{λ*(ξ; u) : L ⊆ ξ ⊆ U} .

Use marks to do CFTP till the current iteration delivers L(0) = U(0), then return the agreed configuration! Example uses Poisson point patterns as marks for area-interaction, following Kendall (1997a).

² Recall remark about dependent thinning!
"It's a thinning procedure, Jim, but not as we know it."
Dr McCoy, Star Trek, stardate unknown.
All this suggests a general formulation:
Let X be a Markov chain on X which exhibits equilibrium behaviour. Embed the state-space X in a partially ordered space (Y, ⪯) so that X is at the bottom of Y, in the sense that for any y ∈ Y, x ∈ X,

y ⪯ x   implies   y = x .

We may then use Theorem 3.1 to show:

Theorem 3.4 Define a Markov chain Y on Y such that Y evolves as X after it hits X; let Y(−u, t) be the value at time t of a version of Y begun at time −u,
(a) of fixed initial distribution L(Y(−T, −T)) = L(Y(0, 0)), and
(b) obeying funnelling: if −v ≤ −u ≤ t then Y(−v, t) ⪯ Y(−u, t).
Suppose coalescence occurs: P[Y(−T, 0) ∈ X] → 1 as T → ∞. Then lim_T Y(−T, 0) can be used for a CFTP draw from the equilibrium of X.
So how far can we go? It looks nearly possible to define the dominating process using the "Liapunov function" V of Theorem 2.13. There is an interesting link – Rosenthal (2002) draws it even closer – nevertheless:

Existence of a Liapunov function doesn't ensure dominated CFTP!

The relevant Equation (2.10) is:

E[V(X_{n+1}) | X_n]  ≤  λ V(X_n) + β I[X_n ∈ C] .

However there are perverse examples satisfying the supermartingale inequality, but failing the stochastic dominance required to use V(X) for dominated CFTP . . . :-(.³
³ Later: I have discovered how to fix this using sub-sampling . . .
3.2. Fill’s (1998) method
The alternative to CFTP is Fill’s algorithm (Fill 1998; Thönnes 1999), at first sight
quite different, based on the notion of a strong stationary time T . Fill et al. (2000)
show a profound link, best explained using “blocks” as input-output maps for a
chain.
Figure 3.7: The “block” model for input-output maps.
First recall that CFTP can be viewed in a curiously redundant fashion as follows:
• Draw from equilibrium X(−T ) and run forwards;
• continue to increase T until X(0) is coalesced;
• return X(0).
Figure 3.8: Conditioning the blocks on the reversed chain.
Key observation: By construction, X(−T ) is independent of X(0) and T so . . .
• Condition on a convenient X(0);
• Run X backwards to a fixed time −T ;
• Draw blocks conditioned on the X transitions;
• If coalescence then return X(−T ) else repeat.
Exercise 3.5 Can you figure out how to do a dominated version of Fill’s
method?
Exercise 3.6 Produce a perfect version of the slice sampler! (Mira, Møller,
and Roberts 2001)
3.3. Sundry other matters
Layered Multishift: Wilson (2000b) and further work by Corcoran and others. Basic idea: how to draw simultaneously from Uniform(x, x + 1) for all x ∈ R, and to try to couple the draws? Answer: draw a uniformly random unit-span integer lattice . . . .⁴
Exercise 3.7 Figure out how to replace “Uniform(x, x + 1)” by any
other location-shifted family of continuous densities. (Hint: mixtures
of uniforms . . . .)
Montana used these ideas to do perfect simulation of nonlinear AR(1) in his
PhD thesis.
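The basic idea can be sketched directly (the function names are illustrative, not Wilson's code): one shared uniform U determines the random lattice U + Z, and for each x the draw is the unique lattice point in [x, x + 1); the draws are monotone in x and frequently coincide for nearby x.

```python
import math
import random

def multishift_coupler(seed=0):
    # one shared draw: a uniformly random unit-span lattice U + Z
    u = random.Random(seed).random()

    def draw(x):
        # the unique point of U + Z in [x, x + 1): a Uniform(x, x+1) draw,
        # simultaneously coupled over all x
        return u + math.ceil(x - u)

    return draw

draw = multishift_coupler()
```

For example, `draw(0.0)` and `draw(0.1)` agree whenever no lattice point separates the two intervals, which is how the coupler transfers monotone CFTP to continuous state spaces.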
⁴ Uniformly random lattices in dimension 2 play a part in line process theory.
Read-once randomness: Wilson (2000a). The clue is in:
Exercise 3.8 Figure out how to do CFTP, using the “input-output”
blocks picture of Figure 3.7, but under the constraint that you aren’t
allowed to save any blocks!
Perfect simulation for Peierls contours (Ferrari et al. 2002). The Ising model can be reformulated in an important way by looking only at the contours (lines separating ±1 values). In fact these form a "non-interacting hard-core gas", thus permitting (at least in theory) Ferrari et al. (2002) to apply their variant of perfect simulation (the Backwards-Forwards Algorithm)! Compare also Propp and Wilson (1996).
Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002).
3.3.1. Efficiency
Burdzy and Kendall (2000): coupling of couplings . . .

|p_t(x_1, y) − p_t(x_2, y)|
  ≤ |P[X_1(t) = y | X_1(0) = x_1] − P[X_2(t) = y | X_2(0) = x_2]|
  ≤ |P[X_1(t) = y | τ > t, X_1(0) = x_1] − P[X_2(t) = y | τ > t, X_2(0) = x_2]| × P[τ > t | X(0) = (x_1, x_2)] .

Suppose |p_t(x_1, y) − p_t(x_2, y)| ≈ c exp(−μ_2 t), while P[τ > t | X(0) = (x_1, x_2)] ≈ c exp(−μ t).

Let X* be a coupled copy of X but begun at (x_2, x_1):

|P[X_1(t) = y | τ > t, X_1(0) = x_1] − P[X_2(t) = y | τ > t, X_2(0) = x_2]|
  = |P[X_1(t) = y | τ > t, X(0) = (x_1, x_2)] − P[X_1*(t) = y | τ > t, X*(0) = (x_2, x_1)]| .

So let σ be the time when X, X* couple; the above is

  ≤ P[σ > t | τ > t, X(0) = (x_1, x_2)]    (≈ c exp(−μ_0 t)) .

Thus μ_2 ≥ μ_0 + μ.
See also Kumar and Ramesh (2001).
3.3.2. Example: perfect epidemic
Context
• many models: SIR, SEIR, Reed-Frost
• data: knowledge about removals
• draw inferences about other aspects
  – infection times
  – infectivity parameter
  – removal rate
  – ...
Perfect MCMC for Reed-Frost model
• based on explicit formula for size of epidemic
• (O'Neill 2001; Clancy and O'Neill 2002)
Question
Can we apply perfect simulation to SIR epidemic conditioned on removals?
• IDEA
  – SIR → compartment model
  – conditioning on removals is reminiscent of coverage conditioning for Boolean model
• consider SIR-epidemic-valued process, considered as event-driven simulation, driven by birth-and-event processes producing infection and removal events . . .
• then use reversibility, perpetuation, crossover to derive perfect simulation algorithm for conditioned case.
• PRO: conditioning on removals appears to make structure "monotonic"
• CON: perpetuation and nonlinear infectivity rate combine to make funnelling difficult
• So: attach independent streams of infection proposals to each level k = I + R to "localize" perpetuation
• Speed-up by cycling through:
  – replace all unobserved marked removal proposals (thresholds)
  – try to alter all observed removal marks (perpetuate . . . )
  – replace all infection proposals (perpetuate . . . )
(Reminiscent of H-vL-M method for perfect simulation of attractive area-interaction point process)
Consistent theoretical framework
View Poisson processes in time as 0/1-valued processes over "infinitesimal pixels":

P[ X(infinitesimal pixel) = 1 ]  =  λ × measure(infinitesimal pixel)

• replace thresholds: work sequentially through "infinitesimal pixels", reject replacement if it violates conditioning
• replace removal marks: can incorporate into above step
• replace infection proposals: work sequentially through "infinitesimal pixels" first at level 1, then level 2, then level 3 . . . (TV scan); reject replacement if it violates conditioning.
Theorem 3.9 Resulting construction is monotonic (so enables CFTP)
Proof:
Establish monotonicity for
• fastest path of infection (what is observed)
• slowest feasible path of infection (what is used for perpetuation)
Implementation:
It works!
• speed-up?
• scaling?
• refine?
• generalize?
Applications
• “omnithermal” variant to produce estimate of likelihood surface
• generalization using crossover to produce full Bayesian analysis
Chapter 4
Deformable templates and
vascular trees
Contents
4.1  MCMC and statistical image analysis . . . 108
4.2  Deformable templates . . . 110
4.3  Vascular systems . . . 111
4.1. MCMC and statistical image analysis
How to use MCMC on images? Huge literature, beginning with Geman and Geman (1984).
Simple example: reconstruct a white/black ±1 image: observation X_ij = ±1, "true image" S_ij = ±1, X|S independent with

P[X_ij = x_ij | S_ij]  =  exp(φ S_ij x_ij) / (2 cosh(φ)) ,

while the prior for S is an Ising model, probability mass function proportional to

exp( β Σ_{(i,j)∼(i′,j′)} S_ij S_i′j′ ) .

The posterior for S|X is proportional to

exp( β Σ_{(i,j)∼(i′,j′)} S_ij S_i′j′ + φ Σ_ij X_ij S_ij )

(an Ising model with inhomogeneous external magnetic field φX), and we can sample from this using MCMC algorithms.
Easy CFTP for this example.
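A sketch of a single-site Gibbs (heat-bath) sweep for this posterior (β, φ, the grid size and the toy data are all illustrative): the conditional odds at site (i, j) combine the neighbour field with the external field φ X_ij.

```python
import math
import random

BETA, PHI, N = 0.3, 0.8, 8   # illustrative coupling, field strength, image side

def gibbs_sweep(S, X, rng):
    # one systematic-scan heat-bath sweep for the posterior of S given X
    for i in range(N):
        for j in range(N):
            field = sum(S[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < N and 0 <= b < N)
            # P(S_ij = +1 | rest, X) = sigmoid(2 * (beta * field + phi * X_ij))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * (BETA * field + PHI * X[i][j])))
            S[i][j] = +1 if rng.random() < p_plus else -1

rng = random.Random(0)
X = [[rng.choice((-1, +1)) for _ in range(N)] for _ in range(N)]  # toy observation
S = [row[:] for row in X]                                         # start at the data
for _ in range(50):
    gibbs_sweep(S, X, rng)
```

Since p_plus is increasing in both the neighbour field and X_ij, this update is monotone, which is what makes "easy CFTP" possible here.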
“Low-level image analysis”: contrast “high-level image analysis”.
For example, pixel array {(i, j)} representing small squares in a window [0, 1]2 ,
and replace S by a Boolean model (Baddeley and van Lieshout 1993).
Figure 4.1: Scheme for high-level image analysis.
Van Lieshout and Baddeley (2001), Loizeaux and McKeague (2001) provide perfect examples.
4.2. Deformable templates
Grenander and Miller (1994): to answer "how can knowledge in complex systems be represented?", use "deformable templates" (image algebras).
• configuration space;
• configurations (prior probability measure from interactions) → image;
• groups of similarities allow deformations;
• template is representation within similarity equivalence class;
• "jump-diffusion": analogue of Metropolis-Hastings. Besag makes the link in discussion of Grenander and Miller (1994), suggesting discrete-time Langevin-Metropolis-Hastings moves.
Expect to need long MCMC runs; Grenander and Miller (1994) use SIMD technology.
4.3. Vascular systems
We illustrate using an application to vascular structure in medical images (Bhalerao
et al. 2001, Thönnes et al. 2002, 2003). Figures:
Figure 4.2(a) cerebral vascular system (application: preplanning of surgical
procedures).
Contents
Figure 4.2(b) retinal vascular system (application: diabetic retinopathy).
Figure 4.2: Examples of vascular structures: (a) AVI movie of cerebral vascular system; (b) retinal vascular system: 400² pixel image.
Plan: use random forests of planar (or spatial!) trees as configurations, since the
underlying vascular system is nearly (but not exactly!) tree-like.
Note: our tree edges are graph vertices in the Grenander-Miller formalism.
Vary tree number, topologies and geometries using Metropolis-Hastings moves.
Hence sample from posterior so as to partly reconstruct the vascular system.
Issues:
• figuring out effective reduction of the very large datasets which can be involved, and how to link the model to the data,
• careful formulation of the prior stochastic geometry model,
• choice of the moves so as not to slow down the MCMC algorithm,
• and use of multiscale information.
4.3.1. Data
For computational feasibility, use multiresolution Fourier Transform (Wilson, Calway, and Pearson 1992) based on dyadic recursive partitioning using quadtree.
(Compare wavelets.)
Figure 4.3: Multiresolution summary (planar case!).
Partitioning: based on a mean-squared error criterion and the extent to which the summarizing feature is elongated; an alternative approach uses Akaike's information criterion (Li and Bhalerao 2003). Figure 4.4(a) shows a typical feature, which in the planar case is a weighted bivariate Gaussian kernel, while Figure 4.4(b) shows the result of summarizing a 400 × 400 pixel retinal image using 1000 weighted bivariate Gaussian kernels.
Figure 4.4: Decomposition into features using quadtree: (a) a typical feature for the MFT; (b) 1000 kernels for a retina.
This procedure can be viewed as a spatial variant of a treed model (Chipman et al.
2002), a variation on CART (Breiman et al. 1984) in which different models are
fitted to divisions of the data into disjoint subsets, with further divisions being
made recursively until satisfactory models are obtained.
Thus subsequent Bayesian inference can be viewed as conditional on the treed
model inference.
Figure 4.5: Final result of Multiresolution Fourier Transform
4.3.2. Likelihood
Link summarized data (a mixture of weighted bivariate Gaussian kernels k(x, y, e)) to the configuration, assuming additive Gaussian noise:

f(x, y)  =  Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) + noise   →   f̃(x, y)  =  Σ_{e∈E} k(x, y, e) .

Split the likelihood by rectangles R_i:

ℓ(F | f̃)  =  Σ_{i=1}^{N} ℓ_i(F | f̃) + constant

ℓ_i(F | f̃)  =  −(1/2σ²) Σ_{(x,y)∈R_i} [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]²
            ≈  −(1/2σ²) ∫∫ [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]² h_i(x, y) dx dy

Extend from R_i to all of R², enabling closed-form computation:

ℓ_i(F | f̃)  ≈  −(1/2σ²) ∫∫_{R²} [ f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) ]² g_i(x, y) dx dy     (4.1)
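The closed-form computation rests on the fact that products of Gaussian kernels integrate in closed form once the square is expanded into pairwise terms; a one-dimensional check (parameter values are arbitrary):

```python
import math

def overlap(a, s, b, t):
    # closed form for  integral of exp(-(x-a)^2/(2 s^2)) * exp(-(x-b)^2/(2 t^2)) dx
    v = s * s + t * t
    return (math.sqrt(2.0 * math.pi) * s * t / math.sqrt(v)
            * math.exp(-(a - b) ** 2 / (2.0 * v)))

def overlap_numeric(a, s, b, t, lo=-10.0, hi=10.0, n=200000):
    # brute-force midpoint Riemann sum for comparison
    h = (hi - lo) / n
    return sum(math.exp(-(lo + (k + 0.5) * h - a) ** 2 / (2 * s * s))
               * math.exp(-(lo + (k + 0.5) * h - b) ** 2 / (2 * t * t))
               for k in range(n)) * h

assert abs(overlap(0.3, 0.5, -0.2, 0.7) - overlap_numeric(0.3, 0.5, -0.2, 0.7)) < 1e-9
```

The bivariate version used in (4.1) factorizes the same way once the kernels are rotated to principal axes.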
4.3.3. Prior
Prior using AR(1) processes for location, width, extending along forests of trees
τ produced by subcritical binary Galton-Watson processes.
length_e  =  α × length_{e′} + Gaussian innovation     (4.2)

for autoregressive parameter α ∈ (0, 1).

Figure 4.6: Components of the prior: (a) sub-critical binary branching process; (b) AR(1) process on edges; (c) each edge represented by a weighted Gaussian kernel.
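A sketch of sampling this prior (the offspring probability, AR(1) parameter and innovation scale are illustrative): each node has Binomial(2, p) daughters, giving family sizes 0, 1, 2 with probabilities (1 − p)², 2p(1 − p), p², subcritical since 2p < 1, and an AR(1) length is propagated along each parent-child edge as in (4.2).

```python
import random

P_CHILD, ALPHA, SIGMA = 0.4, 0.8, 0.1   # illustrative: subcritical since 2p < 1

def sample_tree(rng, length=1.0, depth=0, max_depth=50):
    # node = (edge length, list of child subtrees)
    children = []
    if depth < max_depth:                 # guard, although extinction is a.s.
        n = sum(rng.random() < P_CHILD for _ in range(2))   # Binomial(2, p) family
        for _ in range(n):
            child_len = ALPHA * length + rng.gauss(0.0, SIGMA)   # AR(1) along edge
            children.append(sample_tree(rng, child_len, depth + 1, max_depth))
    return (length, children)

def size(tree):
    return 1 + sum(size(c) for c in tree[1])

rng = random.Random(1)
forest = [sample_tree(rng) for _ in range(5)]   # a small forest of trees
```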
4.3.4. MCMC implementation
We have now formulated likelihood and model. It remains to decide on a suitable
MCMC process.
Figure 4.7: Refined scheme for high-level image analysis.
Changes are local (hence slow). Compute MH accept-reject probabilities using AR(1) location distributions and

P[τ]  =  Π_{n=0,1,2} Π_{v∈τ : v has n children} P[ node has n children ]     (4.3)
Figure 4.8: Reversible jump MCMC proposals.
Computations for topological moves. Suppose the family-size distribution is

(1 − p)² ,  2p(1 − p) ,  p² .

The Hastings ratio for adding a specified leaf ℓ ∈ ∂_ext(τ) is

H(τ; ℓ)  =  ( P[τ ∪ {ℓ}] / #(∂_int(τ ∪ {ℓ}) ∪ ∂_ext(τ ∪ {ℓ})) ) / ( P[τ] / #(∂_int(τ) ∪ ∂_ext(τ)) ) .

Use induction on τ: P[τ ∪ {ℓ}] = P[τ] p(1 − p), . . .

H(τ; ℓ)  =  p(1 − p) × (1 + #(V(τ)) + #(∂_ext(τ))) / (2 + #(V(τ)) + #(∂_ext(τ)) (+1))

where the +1 is added to the denominator if ℓ is added to a vertex which already has one daughter.
It is clear that this Hastings ratio is always smaller than p(1 − p), and is approximately equal to p(1 − p) for large trees τ . Consequently removals of leaves will
always be accepted, and additions will usually be accepted, but there is a computational slowdown as the tree size increases.
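Both claims are easy to check numerically; the function below just transcribes the displayed formula (the counts fed to it are illustrative):

```python
def hastings_add_leaf(p, n_vertices, n_ext, parent_already_has_daughter):
    # H(tau; l) = p(1-p) * (1 + #V + #ext) / (2 + #V + #ext (+1))
    extra = 1 if parent_already_has_daughter else 0
    return p * (1 - p) * (1 + n_vertices + n_ext) / (2 + n_vertices + n_ext + extra)

for nv in (1, 10, 100, 1000):
    h = hastings_add_leaf(0.4, nv, nv + 1, False)
    assert h < 0.4 * 0.6                  # always strictly below p(1 - p)

# for large trees the ratio approaches p(1 - p) = 0.24
assert abs(hastings_add_leaf(0.4, 10**6, 10**6, False) - 0.24) < 1e-5
```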
Geometric ergodicity is impossible: consider the second eigenvalue λ_2 of the Markov kernel by Rayleigh-Ritz:

1 − λ_2  =  inf_{φ≠0}  [ Σ_x Σ_y |φ(x) − φ(y)|² p(x, y) π(y) ] / [ Σ_x Σ_y |φ(x) − φ(y)|² π(x) π(y) ]
         ≤  2 [ Σ_{x∈S} Σ_{y∈S^c} p(x, y) π(y) ] / π(S)

whenever π(S) < 1/2. Taking S to be the set of all trees of depth n or more, and using the fact that such trees must have at least n sites, it follows that

1 − λ_2  ≤  2 / (p(1 − p) n)  →  0 .
Experiments:
MCMC for prior model
First we implement only the moves relating to a single tree, and only to that tree's topology. We change the tree just one leaf at a time. Convergence is clearly very slow! Moreover the animation clearly indicates the problem is to do with incipient criticality phenomena.

MCMC for prior model with more global changes
Moves dealing with sub-trees will speed things up greatly! We won't use these as such – however we will be working with a whole forest of such trees, and using more "global" moves of join and divide.

MCMC for occupancy conditioned prior model
Conditioning on data will also help! Here is a crude parody of conditioning (pixels are black/white depending on whether or not they intersect the planar tree).
4.3.5. Results
Figure 4.9 presents snapshots (over about 50k moves) of a first try, using accept/reject probabilities for a Metropolis-Hastings algorithm based on the above
prior and likelihood.
Figure 4.9: AVI movie of fixed-width MCMC.
We improve the algorithm by adding
(1) a further AR(1) process to model the width of blood vessels;
(2) a Langevin-related component to the Metropolis-Hastings proposal distribution, biasing the proposal according to local modes of the posterior density (compare Grenander and Miller 1994 and especially Besag's remarks!);
(3) the possibility of proposing addition (and deletion) of features of length 2;
(4) gradual refinement of feature proposals from coarse to fine levels.
Figure 4.10: AVI movies of improved MCMC.
4.3.6. Refining the starting state
We improve matters by choosing a better starting state for the algorithm, using a
variant of Kruskal’s algorithm.
This generates a minimal spanning tree of a graph as follows: Let G = (V, E) be
a graph with edges weighted by cost. A spanning tree is a subgraph of G which is
(a) a tree and (b) spans all vertices. We wish to find a spanning tree τ of minimal
cost:
def kruskal (edges) :
edgelist ← edges.sort () # sort so cheapest edge is first
tree ← [ ]
for e ∈ edgelist :
if cyclefree (tree, e) : # check no cycle is created
tree.append (e)
return tree
(Key remark: at the end, swapping in a new edge e would create a circuit which must be broken by removal of some e′. By construction cost(e) ≥ cost(e′).)
Actually we need a binary tree (so amend cyclefree accordingly!) and the cost
depends on the edges to which it is proposed to join e, but a heuristic modification
of Kruskal’s algorithm produces a binary forest which is a good candidate for a
starting position for our algorithm. Figure 4.11 shows a result; notice how now
the algorithm begins with many small trees, which are progressively dropped or
joined to other trees.
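The pseudocode above can be made concrete with a union-find structure standing in for cyclefree (the graph data below is invented for the test; this is plain Kruskal, without the binary-forest heuristics described in the text):

```python
def kruskal(n, edges):
    # edges: list of (cost, u, v); returns a minimum spanning forest on n vertices
    parent = list(range(n))

    def find(x):                      # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # sort so cheapest edge is first
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding (u, v) creates no cycle
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree

edges = [(4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
assert sum(c for c, _, _ in mst) == 6   # the edges of cost 1, 2, 3
```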
Figure 4.11: AVI movie of MCMC using starting position provided by Kruskal’s
algorithm.
Chapter 5
Multiresolution Ising models
Contents
5.1  Generalized quad-trees . . . 129
5.2  Random clusters . . . 133
5.3  Percolation . . . 136
5.4  Comparison . . . 143
5.5  Simulation . . . 145
5.6  Future work . . . 147
The vascular problem discussed above involves multi-resolution image analysis
described, for example, in Wilson and Li (2002). This represents the image on
different scales: the simplest example is that of a quad-tree (a tree graph in which
each node has four daughters and just one parent), for which each node carries
a random value 0 (coding the colour white) or 1 (coding the colour black), and
the random values interact in the manner of a Markov point process with their
neighbours. In the vascular case the 0/1 value is replaced by a line segment summarizing the linear structure which best explains the local data at the appropriate
resolution level, but for now we prefer to analyze the simpler 0/1 case.
Conventional multi-resolution models have interactions
only running up and down the tree; however in our case
we clearly need interactions between neighbours at the
same resolution level as well. This complicates the analysis!
A major question is to determine the extent to which values at high resolution
levels are correlated with values at lower resolution levels (Kendall and Wilson
2003). This is related to much recent interest in the question of phase transitions
for Ising models on non-Euclidean random graphs. We shall show how geometric
arguments lead to a schematic phase portrait.
5.1. Generalized quad-trees
Define Q_d as the graph whose vertices are cells of all dyadic tessellations of R^d, with edges connecting each cell u to its 2d neighbours, and also to its parent M(u) (the covering cell in the tessellation at the next resolution down) and its 2^d daughters (the cells which it covers in the next resolution up).
Case d = 1:
Neighbours at same level also are connected.
Remark: No spatial symmetry!
However: Qd can be constructed using an iterated function system generated by
elements of a skew-product group R+ ⊗s Rd .
Further define:
• Q_{d;r} as the subgraph of Q_d at resolution levels r or higher;
• Q_d(o) as the subgraph formed by o and all its descendants.
• Remark: there are many graph-isomorphisms S_{u;v} between Q_{d;r} and Q_{d;s}, with natural Z^d-action;
• Remark: there are graph homomorphisms injecting Q(o) into itself, sending o to x ∈ Q(o) (semi-transitivity).
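For the case d = 1 the cell structure is easy to index explicitly (the pair (level, position) representation below is just one convenient choice): cell (r, k) covers the dyadic interval [k 2^{−r}, (k + 1) 2^{−r}), and the graph edges of Q_1 connect it as follows.

```python
def parent(cell):
    r, k = cell
    return (r - 1, k // 2)            # covering cell at the next resolution down

def daughters(cell):
    r, k = cell
    return [(r + 1, 2 * k), (r + 1, 2 * k + 1)]   # the 2^d cells it covers (d = 1)

def neighbours(cell):
    r, k = cell
    return [(r, k - 1), (r, k + 1)]   # the 2d same-level neighbours (d = 1)

cell = (3, 5)                         # covers [5/8, 6/8)
assert all(parent(d) == cell for d in daughters(cell))
assert parent(cell) == (2, 2)         # [5/8, 6/8) is covered by [1/2, 3/4)
```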
Simplistic analysis
Define Jλ to be strength of neighbour interaction, Jτ to be strength of parent interaction. If Sx = ±1 then probability of configuration is proportional to exp(−H)
where
X
1
H=−
Jhx,yi (Sx Sy − 1) ,
(5.1)
2
hx,yi∈E(G)
for Jhx,yi = Jλ , Jτ as appropriate.
If Jλ = 0 then the free Ising model on Qd(o) is a branching process (Preston 1977; Spitzer 1975); if Jτ = 0 then the Ising model on Qd(o) decomposes into a sequence of d-dimensional classical (finite) Ising models. So we know there is a phase change at (Jλ, Jτ) = (0, ln(5/3)) (results from branching processes), and expect one at (Jλ, Jτ) = (ln(1 + √2), 0+) (results from the 2-dimensional Ising model).
But is this all that there is to say?
There has been a lot of work on percolation and Ising models on non-Euclidean
graphs (Grimmett and Newman 1990; Newman and Wu 1990; Benjamini and
Schramm 1996; Benjamini et al. 1999; Häggström and Peres 1999; Häggström
et al. 1999; Lyons 2000; Häggström et al. 2002). Mostly this concerns graphs
admitting substantial symmetry (unimodular, quasi-transitive), so that one can use
Häggström’s remarkable Mass Transport method:
Theorem 5.1 Let Γ ⊂ Aut(G) be unimodular and quasi-transitive. Given a mass transport m(x, y, ω) ≥ 0 which is "diagonally invariant", set M(x, y) = ∫_Ω m(x, y, ω) dω. Then the expected total transport out of any x equals the expected total transport into x:

    Σ_{y ∈ Γx} M(x, y)  =  Σ_{y ∈ Γx} M(y, x) .
Sadly our graphs do not have quasi-transitive symmetry :-(.
Furthermore the literature mostly deals with phase transition results “for sufficiently large values of the parameter”. This is an invaluable guide, but image
analysis needs actual estimates!
5.2. Random clusters
A similar problem, concerning Ising models on products of trees with Euclidean lattices, is treated by Newman and Wu (1990). We follow them by exploiting the celebrated Fortuin–Kasteleyn random-cluster representation (Fortuin and Kasteleyn 1972; Fortuin 1972a; Fortuin 1972b):
The Ising model is the marginal site process at q = 2 of a site/bond process derived from a dependent bond percolation model, with configuration probability Pq,p proportional to

    q^C × Π_{⟨x,y⟩ ∈ E(G)} (p⟨x,y⟩)^{b⟨x,y⟩} (1 − p⟨x,y⟩)^{1 − b⟨x,y⟩}

(where b⟨x,y⟩ = 1 exactly when the bond ⟨x, y⟩ is open, and C is the number of connected clusters of vertices). Site spins are chosen to be the same within each cluster, independently from cluster to cluster, with equal probabilities for ±1. We find

    p⟨x,y⟩ = 1 − exp(−J⟨x,y⟩) .    (5.2)
Argue for general q (Potts model): spot the Gibbs sampler!
(I) given the bonds, choose connected-component colours independently and uniformly from 1, . . . , q;
(II) given the colouring, open bonds ⟨x, y⟩ between like-coloured endpoints independently, each with probability p⟨x,y⟩ (bonds between unlike colours stay closed).
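The two steps (I)/(II) are the Swendsen–Wang dynamics, and they fit in a few lines of code. Here is a minimal sketch of one sweep on an arbitrary finite graph (data layout and helper names are our own); components under the open bonds are found with a simple union–find.

```python
import random

# Sketch of one sweep of the two-step Gibbs sampler (I)/(II) for the
# q-state Potts model (Swendsen-Wang dynamics) on a finite graph.

def sweep(colours, edges, p, q, rng=random):
    # (II) open each like-colour bond independently with probability p
    open_bonds = [(x, y) for (x, y) in edges
                  if colours[x] == colours[y] and rng.random() < p]
    # union-find to identify connected components under the open bonds
    comp = {v: v for v in colours}
    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]     # path halving
            v = comp[v]
        return v
    for x, y in open_bonds:
        comp[find(x)] = find(y)
    # (I) recolour each component independently and uniformly from 0..q-1
    new_colour = {}
    return {v: new_colour.setdefault(find(v), rng.randrange(q)) for v in colours}
```

With p = 1 all like-coloured bonds open, so a monochromatic configuration is recoloured as a single component; with p = 0 every vertex is recoloured independently.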
Exercise 5.2 Check step (II) gives correct conditional probabilities!
Now compute conditional probability that site o gets colour 1 given neighbours
split into components C1 , . . . , Cr with colour 1, and other neighbours in set C0 .
Exercise 5.3 Use inclusion-exclusion to show it is this:
    P[colour(o) = 1 | neighbours] ∝ q × (1/q) Π_y (1 − p⟨o,y⟩)
        + Π_{y ∈ C0} (1 − p⟨o,y⟩) × ( 1 − Π_{z ∈ C1∪...∪Cr} (1 − p⟨o,z⟩) )
        = Π_{y : colour(y) ≠ 1} (1 − p⟨o,y⟩) .
Deduce interaction strength is Jhx,yi = − log(1 − phx,yi ) as stated in Equation
(5.2) above.
FK-comparison inequalities
If q ≥ 1 and A is an increasing event then

    Pq,p(A) ≤ P1,p(A) ,     (5.3)
    Pq,p(A) ≥ P1,p′(A) ,    (5.4)

where

    p′⟨x,y⟩ = p⟨x,y⟩ / ( p⟨x,y⟩ + (1 − p⟨x,y⟩) q ) = p⟨x,y⟩ / ( q − (q − 1) p⟨x,y⟩ ) .
Since P1,p is bond percolation (bonds open or not independently of each other),
we can find out about phase transitions by studying independent bond percolation.
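The two conversions used here are worth putting side by side in code. As a sanity check (our own, not from the lectures): the branching-process critical point Jτ = ln(5/3) quoted earlier maps, via (5.2) followed by the q = 2 comparison formula, to p′ = 1/4 = 2^−d, which for d = 2 is the critical probability for independent percolation down the quad-tree.

```python
import math

# Sketch: p = 1 - exp(-J) from (5.2), and the FK comparison probability
# p' = p / (q - (q - 1) p) from (5.3)-(5.4).

def bond_prob(J):
    return 1.0 - math.exp(-J)

def comparison_prob(p, q):
    return p / (q - (q - 1.0) * p)
```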
5.3. Percolation
Independent bond percolation on products of trees with Euclidean lattices has been studied by Grimmett and Newman (1990), and these results were used in the Newman and Wu work on the Ising model. So we can make good progress by studying independent bond percolation on Qd, using pτ for parental bonds and pλ for neighbour bonds.
Theorem 5.4 There is almost surely no infinite cluster in Qd;0 (and consequently in Qd(o)) if

    2^d τ Xλ ( 1 + √(1 − 1/Xλ) ) < 1 ,

where Xλ is the mean size of the percolation cluster at the origin for λ-percolation in Zd.
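The sufficient condition of Theorem 5.4 is simple to evaluate numerically; in the sketch below (a helper of our own) the mean cluster size Xλ is taken as an input, since computing it is a separate percolation question. Note that at λ = 0 we have Xλ = 1 and the condition reduces to τ < 2^−d, the tree threshold in the phase diagram.

```python
import math

# Sketch: the sufficient condition of Theorem 5.4,
#   2^d * tau * X_lambda * (1 + sqrt(1 - 1/X_lambda)) < 1.

def no_infinite_cluster_sufficient(d, tau, X):
    return (2 ** d) * tau * X * (1.0 + math.sqrt(1.0 - 1.0 / X)) < 1.0
```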
Modelled on Grimmett and Newman (1990, §3 and §5). The factor 1 + √(1 − 1/Xλ) comes from matrix spectral asymptotics.
[Figure: schematic phase diagram in the (τ, λ) unit square — "no infinite clusters here" near the origin (τ below 2^−d), "infinite clusters here (may or may not be unique)" for larger τ, and a region marked "?".]
The story so far: small λ, small to moderate τ .
Case of small τ
Need d = 2 for mathematical convenience. Use a Borel–Cantelli argument and planar duality to show, for supercritical λ > 1/2 (that is, supercritical with respect to planar bond percolation!), that all but finitely many of the resolution layers Ln = [1, 2^n] × [1, 2^n] of Q2(o) have just one large cluster each, of diameter larger than constant × n.
Hence . . .
Theorem 5.5 When λ > 1/2 and τ is positive there is one and only one
infinite cluster in Q2 (o).
[Figure: phase diagram for d = 2 — "just one unique infinite cluster" for λ > 1/2; "no infinite clusters" near the origin (τ < 1/4, small λ); "infinite clusters (may or may not be unique)" for larger τ; a remaining region marked "?".]
The story so far: adds small τ for case d = 2.
Uniqueness of infinite clusters
The Grimmett and Newman (1990) work was remarkable in pointing out that as τ increases there is a further phase change, from many infinite clusters to just one, for λ > 0. The work of Grimmett and Newman carries through for Qd(o). However the relevant bound is improved by a factor of √2 if we take into account the hyperbolic structure of Qd(o)!
Theorem 5.6 If τ < 2^(−(d−1)/2) and λ > 0 then there cannot be just one infinite cluster in Qd;0.
Method: sum weights of "up-paths" in Qd;0 starting and ending at level 0. For a fixed s and start point there are infinitely many such up-paths containing s λ-bonds, but no more than (1 + 2d + 2^d)^s which cannot be reduced by "shrinking" excursions. Hence control the mean number of open up-paths stretching more than a given distance at level 0.
Contribution to upper bound on second phase transition:
Theorem 5.7 If τ > √(2/3) then the infinite cluster of Q2;0 is almost surely unique for all positive λ.
Method: prune bonds, branching processes, 2-dim comparison . . .
[Figure: phase diagram for d = 2 — "just one unique infinite cluster" for λ > 1/2 and also for τ > 0.816 ≈ √(2/3); "many infinite clusters" for small λ when 1/4 < τ < 0.707 ≈ 2^(−1/2); "no infinite clusters" for τ < 1/4 and small λ; a remaining region marked "?".]
The story so far: includes uniqueness transition for case d = 2.
5.4. Comparison
We need to apply the Fortuin-Kasteleyn comparison inequalities (5.3) and (5.4).
The event “just one infinite cluster” is not increasing, so we need more. Newman
and Wu (1990) show it suffices to establish a finite island property for the site
percolation derived under adjacency when all infinite clusters are removed. Thus:
[Figure: the (τ, λ) diagram with the region where the finite island property holds marked "finite islands" (λ > 1/2; axis marks at τ = 1/4, 0.707, 0.816).]
Comparison arguments then show the following schematic phase diagram for the
Ising model on Q2 (o):
[Figure: schematic phase diagram for the Ising model on Q2(o) in the (Jτ, Jλ) plane — a "unique Gibbs state" region near the origin; a region where all nodes are substantially correlated with each other under free boundary conditions; a region where the free boundary condition is a mixture of the two extreme Gibbs states (spin +1 at the boundary, spin −1 at the boundary); and a region where the root is influenced by the wired boundary. Axis marks at Jλ = 0.693, 1.099 and Jτ = 0.288, 0.511, 0.762, 1.190, 1.228, 2.292.]
5.5. Simulation
Approximate simulations confirm the general story (compare our original animation, which had Jτ = 2, Jλ = 1/4):
http://www.dcs.warwick.ac.uk/˜rgw/sira/sim.html
(1) Only 200 resolution levels;
(2) At each level, 1000 sweeps in scan order;
(3) At each level, simulate a square sub-region of 128 × 128 pixels conditioned by the mother 64 × 64 pixel region;
(4) Impose periodic boundary conditions on the 128 × 128 square region;
(5) At the coarsest resolution, all pixels set white. At subsequent resolutions, 'all black' initial state.
[Figure panels: (a) Jλ = 1, Jτ = 0.5; (b) Jλ = 1, Jτ = 1; (c) Jλ = 1, Jτ = 2; (d) Jλ = 0.5, Jτ = 0.5; (e) Jλ = 0.5, Jτ = 1; (f) Jλ = 0.5, Jτ = 2; (g) Jλ = 0.25, Jτ = 0.5; (h) Jλ = 0.25, Jτ = 1; (i) Jλ = 0.25, Jτ = 2.]
5.6. Future work
This is about the free Ising model on Q2(o). Image analysis more naturally concerns the case of prescribed boundary conditions (say, the image at the finest resolution level . . . ).
Question: will boundary conditions at "infinite fineness" propagate back to finite resolution?
Series and Sinaĭ (1990) show the answer is yes for the analogous problem on the hyperbolic disk (2-dim, all bond probabilities the same).
Gielis and Grimmett (2002) point out (e.g., in the Z3 case) that these boundary conditions translate to a conditioning for the random-cluster model, and investigate it using large deviations.
Project: do same for Q2 (o) . . . and get quantitative bounds?
Project: generalize from Qd to graphs with appropriate hyperbolic structure.
Appendix A
Notes on the proofs in Chapter 5
A.1. Notes on proof of Theorem 5.4
The mean size of the cluster at o is bounded above by

    Σ_{n≥0} Σ_{t:|t|=n} Xλ τ^n (Xλ − 1)^{T(t)} Xλ^{n−T(t)}
        ≤ Σ_{n≥0} Xλ (τ Xλ)^n Σ_{t:|t|=n} (1 − 1/Xλ)^{T(t)}
        ≤ Σ_{n≥0} Xλ (2^d τ Xλ)^n Σ_{j:|j|=n} (1 − 1/Xλ)^{T(j)}
        ≈ Σ_{n≥0} Xλ (2^d τ Xλ)^n ( 1 + √(1 − 1/Xλ) )^n .
For the last step, use spectral analysis of the matrix representation

    Σ_{j:|j|=n} (1 − 1/Xλ)^{T(j)} = (1  1) M^n (1  1)^T ,   where M = ( 1 , 1 ; 1 − 1/Xλ , 1 ) ;

the largest eigenvalue of M is 1 + √(1 − 1/Xλ).
A.2. Notes on proof of Theorem 5.5
Uniqueness: for the negative exponent ξ(1 − λ) of the dual connectivity function, set

    ℓn = ( n log 4 + (2 + ε) log n ) / ξ(1 − λ) .

More than one "ℓn-large" cluster in Ln forces the existence of an open path in the dual lattice longer than ℓn. Now use Borel–Cantelli . . . .
On the other hand, super-criticality will mean some distant points in Ln are interconnected.
Existence: consider 4^(n−[n/2]) points in Ln−1 and specified daughters in Ln. Study the probability that
(a) the parent percolates more than ℓn−1;
(b) parent and child are connected;
(c) the child percolates more than ℓn.
Now use Borel–Cantelli again . . . .
A.3. Notes on proof of Theorem 5.6
Two relevant lemmata:
Lemma A.1 Consider u ∈ Ls+1 ⊂ Qd and v = M(u) ∈ Ls ⊂ Qd. There are exactly 2^d solutions in Ls+1 of

    M(x) = Su;v(x) .

One is x = u. The others are the remaining 2^d − 1 vertices y such that the closure of the cell representing y intersects the vertex shared by the closures of the cells representing u and M(u). Finally, if x ∈ Ls+1 does not solve M(x) = Su;v(x) then

    ‖Su;v(x) − Su;v(u)‖s,∞  >  ‖M(x) − M(u)‖s,∞ .
Lemma A.2 Given distinct v and y in the same resolution level, count the pairs of vertices u, x in the resolution level one step higher such that
(a) M(u) = v; (b) M(x) = y; (c) Su;v(x) = y.
There are at most 2^(d−1) such pairs.
A.4. Notes on proof of Theorem 5.7
Prune! Then a direct connection is certainly established across the boundary between the cells corresponding to two neighbouring vertices u, v in L0 if
(a) the τ-bond leading from u to the relevant boundary is open;
(b) a τ-branching process (formed by using τ-bonds mirrored across the boundary) survives indefinitely, where this branching process has family-size distribution Binomial(2, τ²);
(c) the τ-bond leading from v to the relevant boundary is open.
Then there are infinitely many chances of making a connection across the cell boundary.
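The survival probability of the Binomial(2, τ²) branching process in step (b) is easy to compute: the extinction probability is the smallest fixed point of the offspring generating function f(s) = (1 − τ² + τ² s)², found by iterating f from 0. The sketch below (helper names ours) also records the supercriticality criterion: mean family size 2τ² > 1, i.e. τ > 2^(−1/2) ≈ 0.707, the same threshold appearing in the phase diagrams.

```python
# Sketch: extinction probability of a Binomial(trials, p) branching process,
# as the smallest fixed point of its generating function, by iteration from 0.

def extinction_prob(p, trials=2, iters=10000):
    s = 0.0
    for _ in range(iters):
        s = (1.0 - p + p * s) ** trials
    return s

def survives_with_positive_prob(tau):
    # the tau-branching process above has family sizes Binomial(2, tau^2);
    # it is supercritical exactly when 2 * tau^2 > 1
    return extinction_prob(tau * tau) < 1.0 - 1e-9
```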
A.5. Notes on the proof of the finite island property
Notion of the "cone boundary" ∂c(S) of a finite subset S of vertices: the collection of daughters v of S such that Qd(v) ∩ S = ∅.
Use induction on S, building it layer Ln on layer Ln−1, to obtain an isoperimetric bound: #(∂c(S)) ≥ (2^d − 1) #(S). Hence deduce
    P[S in island at u]  ≤  ( 1 − pτ (1 − η) )^{(2^d − 1) n} ,

where #(S) = n and η = P[u not in infinite cluster of Qd(u)].
Upper bound on the number N(n) of self-avoiding paths S of length n beginning at u0:

    N(n) ≤ (1 + 2d + 2^d) (2d + 2^d)^n .
Hence an upper bound on the mean size of the island:

    Σ_{n≥0} (1 + 2d + 2^d) (2d + 2^d)^n ηbr^{n (1 − 2^−d)} ,

where ηbr is the extinction probability for the branching process based on the Binomial(2^d, pτ) family distribution.
Bibliography
Aldous, D. and M. T. Barlow [1981]. On
countable dense random sets. Lecture Notes in Mathematics 850, 311–
327.
Aldous, D. J. and P. Diaconis [1987].
Strong uniform times and finite random walks. Advances in Applied
Mathematics 8(1), 69–97.
Baddeley, A. J. and J. Møller [1989].
Nearest-neighbour Markov point
processes and random sets. Int.
Statist. Rev. 57, 89–121.
Baddeley, A. J. and M. N. M.
van Lieshout [1993]. Stochastic
geometry models in high-level
vision. Journal of Applied Statistics 20, 233–258.
Baddeley, A. J. and M. N. M. van Lieshout [1995]. Area-interaction point processes. Annals of the Institute of Statistical Mathematics 47, 601–619.
Barndorff-Nielsen, O. E., W. S. Kendall,
and M. N. M. van Lieshout (Eds.)
[1999]. Stochastic Geometry: Likelihood and Computation. Number 80
in Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.
Benjamini, I., R. Lyons, Y. Peres, and
O. Schramm [1999]. Group-invariant
percolation on graphs. Geom. Funct.
Anal. 9(1), 29–66.
Benjamini, I. and O. Schramm [1996].
Percolation beyond Zd , many questions and a few answers. Electronic
Communications in Probability 1,
no. 8, 71–82 (electronic).
Bhalerao, A., E. Thönnes, W. S. Kendall,
and R. G. Wilson [2001]. Inferring
vascular structure from 2d and 3d imagery. In W. J. Niessen and M. A.
Viergever (Eds.), Medical Image
Computing and Computer-Assisted
Intervention, proceedings of MICCAI 2001, Volume 2208 of Springer
Lecture Notes in Computer Science,
pp. 820–828. Springer-Verlag. Also:
University of Warwick Department
of Statistics Research Report 392.
Billera, L. J. and P. Diaconis [2001].
A geometric interpretation of the
Metropolis-Hastings algorithm. Statistical Science 16(4), 335–339.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone [1984]. Classification and Regression Trees. Statistics/Probability. Wadsworth.
Brooks, S. P., Y. Fan, and J. S. Rosenthal [2002]. Perfect forward simulation via simulated tempering. Research report, University of Cambridge.
Burdzy, K. and W. S. Kendall [2000,
May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied
Probability 10(2), 362–409.
Also University of Warwick Department of Statistics Research Report
331.
Cai, Y. and W. S. Kendall [2002,
July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and
Computing 12, 229–243. Also: University of Warwick Department of
Statistics Research Report 349.
Carter, D. S. and P. M. Prenter [1972].
Exponential spaces and counting
processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 21, 1–19.
Chipman, H., E. George, and R. McCulloch [2002]. Bayesian treed models.
Machine Learning 48, 299–320.
Clancy, D. and P. D. O’Neill [2002]. Perfect simulation for stochastic models
of epidemics among a community of
households. Research report, School
of Mathematical Sciences, University of Nottingham.
Cowan, R., M. Quine, and S. Zuyev
[2003]. Decomposition of gammadistributed domains constructed
from Poisson point processes. Advances in Applied Probability 35(1),
56–69.
Diaconis, P. [1988]. Group Representations in Probability and Statistics,
Volume 11 of IMS Lecture Notes Series. Hayward, California: Institute
of Mathematical Statistics.
Diaconis, P. and J. Fill [1990]. Strong stationary times via a new form of duality. The Annals of Probability 18,
1483–1522.
Ferrari, P. A., R. Fernández, and N. L.
Garcia [2002]. Perfect simulation for
interacting point processes, loss networks and Ising models. Stochastic
Process. Appl. 102(1), 63–88.
Fill, J. [1998]. An interruptible algorithm for exact sampling via Markov
Chains. The Annals of Applied Probability 8, 131–162.
Fill, J. A., M. Machida, D. J. Murdoch, and J. S. Rosenthal [2000].
Extension of Fill’s perfect rejection sampling algorithm to general
chains. Random Structures and Algorithms 17(3-4), 290–316.
Fortuin, C. M. [1972a]. On the random-cluster model. II. The percolation model. Physica 58, 393–418.
Fortuin, C. M. [1972b]. On the random-cluster model. III. The simple random-cluster model. Physica 59, 545–570.
Fortuin, C. M. and P. W. Kasteleyn [1972]. On the random-cluster
model. I. Introduction and relation to
other models. Physica 57, 536–564.
Foss, S. G. and R. L. Tweedie [1998].
Perfect simulation and backward
coupling. Stochastic Models 14,
187–203.
Gelfand, A. E. and A. F. M. Smith
[1990]. Sampling-based approaches
to calculating marginal densities.
Journal of the American Statistical
Association 85(410), 398–409.
Geman, S. and D. Geman [1984].
Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images. IEEE Transactions on Pattern
Analysis and Machine Intelligence 6,
721–741.
Geyer, C. J. and J. Møller [1994]. Simulation and likelihood inference for
spatial point processes. Scand. J.
Statist. 21, 359–373.
Gielis, G. and G. R. Grimmett [2002]. Rigidity of the interface for percolation and random-cluster models. Journal of Statistical Physics 109, 1–37. See also arXiv:math.PR/0109103.
Green, P. J. [1995]. Reversible jump
Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.
Grenander, U. and M. I. Miller [1994].
Representations of knowledge in
complex systems (with discussion
and a reply by the authors). Journal
of the Royal Statistical Society (Series B: Methodological) 56(4), 549–
603.
Grimmett, G. R. and C. Newman [1990].
Percolation in ∞ + 1 dimensions.
In Disorder in physical systems, pp.
167–190. New York: The Clarendon
Press Oxford University Press. Also
at
Hadwiger, H. [1957]. Vorlesungen über
Inhalt, Oberfläche und Isoperimetrie. Berlin: Springer-Verlag.
Häggström, O., J. Jonasson, and
R. Lyons [2002]. Explicit isoperimetric constants and phase transitions in the random-cluster model.
The Annals of Probability 30(1),
443–473.
Häggström, O. and Y. Peres [1999].
Monotonicity of uniqueness for percolation on Cayley graphs: all infinite clusters are born simultaneously. Probability Theory and Related Fields 113(2), 273–285.
Häggström, O., Y. Peres, and R. H.
Schonmann [1999]. Percolation on
transitive graphs as a coalescent process: relentless merging followed by
simultaneous uniqueness. In Perplexing problems in probability, pp. 69–
90. Boston, MA: Birkhäuser Boston.
Häggström, O. and J. E. Steif [2000].
Propp-Wilson algorithms and finitary codings for high noise Markov
random fields. Combin. Probab.
Computing 9, 425–439.
Häggström, O., M. N. M. van Lieshout,
and J. Møller [1999]. Characterisation results and Markov chain Monte
Carlo algorithms including exact
simulation for some spatial point
processes. Bernoulli 5(5), 641–658.
Was: Aalborg Mathematics Department Research Report R-96-2040.
Hall, P. [1988]. Introduction to the theory of coverage processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John
Wiley & Sons.
Hastings, W. K. [1970]. Monte Carlo
sampling methods using Markov
chains and their applications.
Biometrika 57, 97–109.
Himmelberg, C. J. [1975]. Measurable
relations. Fundamenta mathematicae 87, 53–72.
Huber, M. [2003]. A bounding chain
for Swendsen-Wang. Random Structures and Algorithms 22(1), 43–59.
Hug, D., M. Reitzner, and R. Schneider [2003]. The limit shape of the zero cell in a stationary Poisson hyperplane tessellation. Preprint 03-03, Albert-Ludwigs-Universität Freiburg.
Jeulin, D. [1997]. Conditional simulation
of object-based models. In D. Jeulin
(Ed.), Advances in Theory and Ap-
plications of Random Sets, pp. 271–
288. World Scientific.
Kallenberg, O. [1977]. A counterexample to R. Davidson’s conjecture on
line processes. Math. Proc. Camb.
Phil. Soc. 82(2), 301–307.
Kelly, F. P. and B. D. Ripley [1976]. A
note on Strauss’s model for clustering. Biometrika 63, 357–360.
Kendall, D. G. [1974]. Foundations of
a theory of random sets. In E. F.
Harding and D. G. Kendall (Eds.),
Stochastic Geometry. John Wiley &
Sons.
Kendall, W. S. [1997a]. On some
weighted Boolean models. In
D. Jeulin (Ed.), Advances in Theory
and Applications of Random Sets,
Singapore, pp. 105–120. World Scientific. Also: University of Warwick
Department of Statistics Research
Report 295.
Kendall, W. S. [1997b]. Perfect simulation for spatial point processes. In
Bulletin ISI, 51st session proceed-
ings, Istanbul (August 1997), Volume 3, pp. 163–166. Also: University of Warwick Department of
Statistics Research Report 308.
Kendall, W. S. [1998]. Perfect simulation
for the area-interaction point process. In L. Accardi and C. C. Heyde
(Eds.), Probability Towards 2000,
New York, pp. 218–234. SpringerVerlag. Also: University of Warwick Department of Statistics Research Report 292.
Kendall, W. S. [2000, March]. Stationary countable dense random sets. Advances in Applied Probability 32(1),
86–100.
Also University of
Warwick Department of Statistics
Research Report 341.
Kendall, W. S. and J. Møller [2000,
September]. Perfect simulation using dominating processes on ordered
state spaces, with application to locally stable point processes. Advances in Applied Probability 32(3),
844–865.
Also University of
Warwick Department of Statistics
Research Report 347.
Kendall, W. S. and G. Montana [2002,
May]. Small sets and Markov transition densities. Stochastic Processes
and Their Applications 99(2), 177–
194.
Also University of Warwick
Department of Statistics Research
Report 371
Kendall, W. S. and E. Thönnes [1999].
Perfect simulation in stochastic geometry. Pattern Recognition 32(9),
1569–1586. Also: University of
Warwick Department of Statistics
Research Report 323.
Kendall, W. S., M. van Lieshout,
and A. Baddeley [1999]. Quermassinteraction processes: conditions for
stability. Advances in Applied Probability 31(2), 315–342.
Also University of Warwick Department of
Statistics Research Report 293.
Kendall, W. S. and R. G. Wilson [2003,
March]. Ising models and multiresolution quad-trees. Advances in Applied Probability 35(1), 96–122.
Also University of Warwick De-
partment of Statistics Research Report 402.
Kingman, J. F. C. [1993]. Poisson processes, Volume 3 of Oxford Studies in Probability. New York: The
Clarendon Press Oxford University
Press. Oxford Science Publications.
Kolmogorov, A. N. [1936]. Zur theorie
der markoffschen ketten. Math. Annalen 112, 155–160.
Kovalenko, I. [1997]. A proof of a conjecture of David Kendall on the
shape of random polygons of large
area. Kibernet. Sistem. Anal. (4), 3–
10, 187. English translation: Cybernet. Systems Anal. 33 461-467.
Kovalenko, I. N. [1999]. A simplified proof of a conjecture of D.
G. Kendall concerning shapes of
random polygons. J. Appl. Math.
Stochastic Anal. 12(4), 301–310.
Kumar, V. S. A. and H. Ramesh
[2001]. Coupling vs. conductance for
the Jerrum-Sinclair chain. Random
Structures and Algorithms 18(1), 1–
17.
Kurtz, T. G. [1980]. The optional sampling theorem for martingales indexed by directed sets. The Annals of
Probability 8, 675–681.
Lee, A. B., D. Mumford, and J. Huang
[2001]. Occlusion models for natural images: A statistical study of
a scale-invariant dead leaves model.
International Journal of Computer
Vision 41, 35–59.
Li, W. and A. Bhalerao [2003]. Model
based segmentation for retinal fundus images. In Proceedings of Scandinavian Conference on Image Analysis SCIA’03, Göteborg, Sweden, pp.
to appear.
Liang, F. and W. H. Wong [2000]. Evolutionary Monte Carlo sampling: Applications to Cp model sampling
and change-point problem. Statistica
Sinica 10, 317–342.
Liang, F. and W. H. Wong [2001,
June]. Real-parameter evolutionary
Monte Carlo with applications to
Bayesian mixture models. Journal
of the American Statistical Association 96(454), 653–666.
Lindvall, T. [1982]. On coupling of
continuous-time renewal processes.
Journal of Applied Probability 19(1),
82–89.
Lindvall, T. [2002]. Lectures on the coupling method. Mineola, NY: Dover
Publications Inc. Corrected reprint of
the 1992 original.
Liu, J. S., F. Liang, and W. H. Wong
[2001, June]. A theory for dynamic
weighting in Monte Carlo computation. Journal of the American Statistical Association 96(454), 561–573.
Loizeaux, M. A. and I. W. McKeague
[2001]. Perfect sampling for posterior landmark distributions with an
application to the detection of disease clusters. In Selected Proceedings of the Symposium on Inference
for Stochastic Processes, Volume 37
of Lecture Notes - Monograph Series, pp. 321–331. Institute of Mathematical Statistics.
Loynes, R. [1962]. The stability of a
queue with non-independent interarrival and service times. Math. Proc.
Camb. Phil. Soc. 58(3), 497–520.
Lyons, R. [2000]. Phase transitions
on nonamenable graphs. J. Math.
Phys. 41(3), 1099–1126. Probabilistic techniques in equilibrium and
nonequilibrium statistical physics.
Matheron, G. [1975]. Random Sets and
Integral Geometry. Chichester: John
Wiley & Sons.
Meester, R. and R. Roy [1996]. Continuum percolation, Volume 119 of
Cambridge Tracts in Mathematics.
Cambridge: Cambridge University
Press.
Metropolis, N., A. V. Rosenbluth,
M. N. Rosenbluth, A. H. Teller,
and E. Teller [1953]. Equation of
state calculations by fast computing machines. Journal of Chemical
Physics 21, 1087–1092.
Meyn, S. P. and R. L. Tweedie [1993].
Markov Chains and Stochastic Sta-
bility. New York: Springer-Verlag.
Miles, R. E. [1995]. A heuristic proof
of a long-standing conjecture of D.
G. Kendall concerning the shapes of
certain large random polygons. Advances in Applied Probability 27(2),
397–417.
Minlos, R. A. [1999]. Introduction
to Mathematical Statistical Physics,
Volume 19 of University Lecture Series. American Mathematical Society.
Mira, A., J. Møller, and G. O. Roberts
[2001]. Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 63(3),
593–606. See (Mira, Møller, and
Roberts 2002) for corrigendum.
Mira, A., J. Møller, and G. O. Roberts
[2002]. Corrigendum: perfect slice
samplers. Journal of the Royal Statistical Society (Series B: Methodological) 64(3), 581.
Molchanov, I. S. [1996a]. A limit
theorem for scaled vacancies of
the Boolean model. Stochastics and
Stochastic Reports 58(1-2), 45–65.
Molchanov, I. S. [1996b]. Statistics of
the Boolean Models for Practitioners and Mathematicians. Chichester:
John Wiley & Sons.
Møller, J. (Ed.) [2003]. Spatial Statistics
and Computational Methods, Volume 173 of Lecture Notes in Statistics. Springer-Verlag.
Møller, J. and G. Nicholls [1999]. Perfect
simulation for sample-based inference. Technical Report R-99-2011,
Department of Mathematical Sciences, Aalborg University.
Møller, J. and R. Waagepetersen [2003]. Statistical inference and simulation for spatial point processes. Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.
Møller, J. and S. Zuyev [1996]. Gammatype results and other related properties of Poisson processes. Advances
in Applied Probability 28(3), 662–
673.
Murdoch, D. J. and P. J. Green [1998].
Exact sampling from a continuous
state space. Scandinavian Journal
of Statistics Theory and Applications 25, 483–502.
Mykland, P., L. Tierney, and B. Yu
[1995]. Regeneration in Markov
chain samplers. Journal of the American Statistical Association 90(429),
233–241.
Newman, C. and C. Wu [1990]. Markov
fields on branching planes. Probability Theory and Related Fields 85(4),
539–552.
Nummelin, E. [1984]. General irreducible Markov chains and nonnegative operators. Cambridge: Cambridge University Press.
O’Neill, P. D. [2001]. Perfect simulation for Reed-Frost epidemic models. Statistics and Computing 13, 37–
44.
Parkhouse, J. G. and A. Kelly [1998].
The regular packing of fibres in three
dimensions. Philosophical Transactions of the Royal Society, London,
Series A 454(1975), 1889–1909.
Peskun, P. H. [1973]. Optimum Monte
Carlo sampling using Markov
chains. Biometrika 60, 607–612.
Preston, C. J. [1977]. Spatial birth-and-death processes. Bull. Inst. Internat. Statist. 46, 371–391.
Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov
chains and applications to statistical
mechanics. Random Structures and
Algorithms 9, 223–252.
Quine, M. P. and D. F. Watson [1984].
Radial generation of n-dimensional
Poisson processes. Journal of Applied Probability 21(3), 548–557.
Ripley, B. D. [1977]. Modelling spatial
patterns (with discussion). Journal of
the Royal Statistical Society (Series
B: Methodological) 39, 172–212.
Ripley, B. D. [1979]. Simulating spatial
patterns: dependent samples from a
multivariate density. Applied Statis-
tics 28, 109–112.
Ripley, B. D. and F. P. Kelly [1977].
Markov point processes. J. Lond.
Math. Soc. 15, 188–192.
Robbins, H. E. [1944]. On the measure of
a random set I. Annals of Mathematical Statistics 15, 70–74.
Robbins, H. E. [1945]. On the measure of
a random set II. Annals of Mathematical Statistics 16, 342–347.
Roberts, G. O. and N. G. Polson [1994].
On the geometric convergence of
the Gibbs sampler. Journal of the
Royal Statistical Society (Series B:
Methodological) 56(2), 377–384.
Roberts, G. O. and J. S. Rosenthal
[1999]. Convergence of slice sampler Markov chains. Journal of the
Royal Statistical Society (Series B:
Methodological) 61(3), 643–660.
Roberts, G. O. and J. S. Rosenthal
[2000]. Recent progress on computable bounds and the simple slice
sampler. In Monte Carlo methods
(Toronto, ON, 1998), pp. 123–130.
Providence, RI: American Mathematical Society.
Roberts, G. O. and R. L. Tweedie
[1996a]. Exponential convergence of
Langevin diffusions and their discrete approximations. Bernoulli 2,
341–364.
Roberts, G. O. and R. L. Tweedie
[1996b]. Geometric convergence and
central limit theorems for multidimensional Hastings and Metropolis
algorithms. Biometrika 83, 96–110.
Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A
simple account. Electronic Communications in Probability 7, no. 13,
123–128 (electronic).
Series, C. and Y. Sinaĭ [1990]. Ising
models on the Lobachevsky plane.
Comm. Math. Phys. 128(1), 63–76.
Spitzer, F. [1975]. Markov random fields
on an infinite tree. The Annals of
Probability 3(3), 387–398.
Stoyan, D., W. S. Kendall, and J. Mecke
[1995]. Stochastic geometry and its
applications (Second ed.). Chichester: John Wiley & Sons. (First edition in 1987 jointly with Akademie
Verlag, Berlin).
Strauss, D. J. [1975]. A model for clustering. Biometrika 62, 467–475.
Thönnes, E. [1999]. Perfect simulation of
some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87. Also University
of Warwick Statistics Research Report 317.
Thönnes, E., A. Bhalerao, W. S. Kendall,
and R. G. Wilson [2002]. A Bayesian
approach to inferring vascular tree
structure from 2d imagery. In W. J.
Niessen and M. A. Viergever (Eds.),
Proceedings IEEE ICIP-2002, Volume II, Rochester, NY, pp. 937–939.
Also: University of Warwick Department of Statistics Research Report
391.
Thönnes, E., A. Bhalerao, W. S. Kendall,
and R. G. Wilson [2003]. Bayesian
analysis of vascular structure. In
LASR 2003, Volume to appear.
Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York:
Springer-Verlag.
Tierney, L. [1998]. A note on Metropolis-Hastings kernels for general state
spaces. The Annals of Applied Probability 8(1), 1–9.
Van Lieshout, M. N. M. [2000]. Markov
point processes and their applications. Imperial College Press.
Van Lieshout, M. N. M. and A. J. Baddeley [2001, October]. Extrapolating and interpolating spatial patterns.
CWI Report PNA-R0117, CWI, Amsterdam.
Widom, B. and J. S. Rowlinson [1970].
New model for the study of liquid-vapor phase transitions. Journal of
Chemical Physics 52, 1670–1684.
Wilson, D. B. [2000a]. How to couple from the past using a read-once source of randomness. Random
Structures and Algorithms 16(1),
85–113.
Wilson, D. B. [2000b]. Layered Multishift Coupling for use in Perfect
Sampling Algorithms (with a primer
on CFTP). In N. Madras (Ed.),
Monte Carlo Methods, Volume 26
of Fields Institute Communications,
pp. 143–179. American Mathematical Society.
Wilson, R., A. Calway, and E. R. S. Pearson [March 1992]. A Generalised
Wavelet Transform for Fourier Analysis: The Multiresolution Fourier
Transform and Its Application to
Image and Audio Signal Analysis.
IEEE Transactions on Information
Theory 38(2), 674–690.
Wilson, R. G. and C. T. Li [2002]. A class
of discrete multiresolution random
fields and its application to image
segmentation. IEEE Transactions on
Pattern Analysis and Machine Intelligence 25, 42–56.
Zuyev, S. [1999]. Stopping sets: gamma-type results and hitting properties. Advances in Applied Probability 31(2), 355–366.