Models and simulation techniques from stochastic geometry

Wilfrid S. Kendall
Department of Statistics, University of Warwick

A course of 5 × 75 minute lectures at the Madison Probability Intern Program, June-July 2003

Introduction

Stochastic geometry is the study of random patterns, whether of points, line segments, or objects. Related reading: Stoyan et al. (1995); Barndorff-Nielsen et al. (1999); Møller and Waagepetersen (2003); the recent expository essays edited by Møller (2003); and the Stochastic Geometry and Statistical Applications section of Advances in Applied Probability.

Contents

1  A rapid tour of point processes
2  Simulation for point processes
3  Perfect simulation
4  Deformable templates and vascular trees
5  Multiresolution Ising models
A  Notes on the proofs in Chapter 5

Chapter 1: A rapid tour of point processes

Contents: 1.1 The Poisson point process; 1.2 Boolean models; 1.3 General definitions and concepts; 1.4 Markov point processes; 1.5 Further issues.

1.1. The Poisson point process

The simplest example of a random point pattern is the Poisson point process, which models a "completely random" distribution of points in the plane R², in 3-space R³, or in other spaces. It is a basic building block in stochastic geometry and elsewhere, with many symmetries which typically reduce calculations to computations of area or volume.

[Figure 1.1: Which of these point clouds exhibits interaction?]

Definition 1.1  Let X be a random point process in a window W ⊆ Rⁿ, defined as a collection of random variables X(A), where A runs through the Borel subsets of W. The point process X is Poisson of intensity λ > 0 if the following are true:
1. X(A) is a Poisson random variable of mean λ Leb(A);
2. X(A₁), ..., X(Aₙ) are independent for disjoint A₁, ..., Aₙ.

Examples: stars in the night sky; patterns used to define collections of random discs ... or space-time structures; Brownian excursions indexed by local time(!).

• In dimension 1 we can construct X using a Markov chain ("number of points in [0, t]") and simulate it using Exponential random variables for inter-point spacings. Higher dimensions are trickier! See Theorem 1.5 below.

• We can replace λ Leb (and W ⊆ Rⁿ) by a nonnegative σ-finite measure µ on a measurable space X, in which case X is an inhomogeneous Poisson point process with intensity measure µ. We should also require µ to be diffuse (µ({x}) = 0 for every point x) in order to avoid the point process faux pas of placing more than one point at a single location.
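The one-dimensional construction just mentioned can be sketched directly: accumulate independent Exponential inter-point spacings. (A hedged sketch; the function name `poisson_1d` and the use of Python's `random` module are illustrative choices, not from the lectures.)

```python
import random

def poisson_1d(t_max, lam):
    """Simulate a rate-lam Poisson point process on [0, t_max] by
    accumulating independent Exponential(lam) inter-point spacings."""
    points = []
    t = random.expovariate(lam)
    while t <= t_max:
        points.append(t)
        t += random.expovariate(lam)
    return points
```

The number of points in [0, t] is then Poisson of mean λt, recovering condition 1 of Definition 1.1 on intervals.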
Exercise 1.2  Show that any measurable map will carry one (perhaps inhomogeneous) Poisson point process into another, so long as
– the image of the intensity measure is a diffuse measure;
– and we note that the image Poisson point process is degenerate if the image measure fails to be σ-finite.

Exercise 1.3  Show that in particular we can superpose two independent Poisson point processes to obtain a third, since the sum of independent Poisson random variables is itself Poisson.

• It is a theorem that the second, independent scattering, requirement of Definition 1.1 is redundant! In fact even the first requirement is over-elaborate:

Theorem 1.4  Suppose a random point process X satisfies
    P[X(A) = 0] = exp(−µ(A))    for all measurable A.
Then X is an (inhomogeneous) Poisson point process of intensity measure µ.

– This can be proved directly. I find it more satisfying to view the result as a consequence of Choquet capacity theory, since this approach extends to cover a theory of random closed sets (Kendall 1974). Compare inclusion-exclusion arguments and "correlation functions" used in the lattice case in statistical mechanics (Minlos 1999).
– We don't have to check every measurable set (certainly we need check only the bounded measurable sets): it is enough to consider sets A running through the finite unions of rectangles!
– But we cannot be too constrained in our choice of the various A (the family of convex A is not enough).

• An alternative approach lends itself to simulation methods for Poisson point processes in bounded windows: condition on the total number of points. (The construction described in Theorem 1.5 is the same as the one producing the link via conditioning between Poisson and multinomial contingency table models.)

Theorem 1.5  Suppose the window W ⊆ Rⁿ has finite area.
The following constructs X as a Poisson point process of finite intensity λ:
– X(W) is Poisson of mean λ Leb(W);
– conditional on X(W) = n, the random variables X(A) are constructed as point-counts of a pattern of n independent points U₁, ..., Uₙ uniformly distributed over W. So
    X(A) = #{i : Uᵢ ∈ A}.

Exercise 1.6  Generalize Theorem 1.5 to (inhomogeneous) Poisson point processes and to the case of W of infinite area/volume.

Exercise 1.7  Independent thinning. Suppose each point ξ of X, produced as in Theorem 1.5, is deleted independently of all other points with probability 1 − p(ξ) (so p(ξ) is the retention probability). Show that the resulting thinned random point pattern is an inhomogeneous Poisson point process with intensity measure p(u) Leb(du).

Exercise 1.8  Formulate the generalization of the above exercise which uses position-dependent thinning on inhomogeneous Poisson point processes.

• The process produced by conditioning on X(W) = n has a claim to being even more primitive than the Poisson point process.

Definition 1.9  A Binomial point process with n points defined on a window W is produced by scattering n points independently and uniformly over W.
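Definition 1.9 takes only a couple of lines to sketch in code (a hedged sketch; the function name and the rectangular window are illustrative choices):

```python
import random

def binomial_point_process(n, width, height):
    """Scatter exactly n independent uniform points over the
    rectangular window [0, width] x [0, height] (Definition 1.9)."""
    return [(random.uniform(0.0, width), random.uniform(0.0, height))
            for _ in range(n)]
```

Replacing the fixed n by a Poisson(λ Leb(W)) draw recovers the construction of Theorem 1.5.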
However you should notice:
– point counts in disjoint subsets are now (perhaps weakly) negatively correlated: the more points in one region, the fewer can fall in another;
– the total number n of points is fixed, so we cannot extend the notion of a (homogeneous) Binomial point process to windows of infinite measure;
– if you like that sort of thing, the Binomial point process is the simplest representative of a class of point processes (with finite total number of points) analogous to canonical ensembles in statistical physics. The Poisson point process relates similarly to grand canonical ensembles (grand, because points are indistinguishable and we randomize over the total number of particles). Of course these are trivial cases because there is no interaction between points — but more of that later!

Here's a worry. (Perhaps you think this is more properly a neurosis to be dealt with by personal counselling?! Consider that a neurotic concern here is what allows us to introduce the measure-theoretic foundations which are most valuable in further theory.) It is easy to prove that the construction of Theorem 1.5 produces point count random variables which satisfy Definition 1.1 and therefore delivers a Poisson point process. But what about the other way round? Can we travel from Poisson point counts to a random point pattern? Or is there some subtle concealed random behaviour in Theorem 1.5, going beyond the labelling of points, which cannot be extracted from a point count specification? The answer is that the construction and the point count definition are equivalent.

Theorem 1.10  Any Poisson point process given by Definition 1.1 induces a unique probability measure on the space of locally finite point patterns, when this is furnished with the σ-algebra generated by point counts.

Here is a sketch of the proof.
Take W = [0, 1)². For each n = 1, 2, ..., make a dyadic dissection of [0, 1)² into 4ⁿ different sub-rectangles [k2⁻ⁿ, (k + 1)2⁻ⁿ) × [ℓ2⁻ⁿ, (ℓ + 1)2⁻ⁿ). For a given level n, visualize the pattern Pₙ obtained by painting sub-rectangles black if they are occupied by points, white otherwise. The point counts for each of these 4ⁿ sub-rectangles are independent Poisson of mean 4⁻ⁿλ; so by subadditivity of probability,
    P[one or more point counts > 1] ≤ 4ⁿ (1 − (1 + 4⁻ⁿλ) exp(−4⁻ⁿλ)) = 4ⁿ (4⁻ⁿλ)² θ exp(−4⁻ⁿλθ)
for some θ ∈ (0, 1) (use the mean value theorem for x ↦ 1 − (1 + x) exp(−x)), which is summable in n. So by Borel-Cantelli we deduce that for all large enough n the black sub-rectangles in pattern Pₙ contain just one Poisson point each. Ordering these points randomly yields the construction in Theorem 1.5.

Simulation techniques for Poisson point processes are rather direct and straightforward: here are two possibilities.

1. We can use Theorem 1.5 to reduce the problem to the algorithm:
   • draw N = n from a Poisson distribution of mean λ Leb(W);
   • draw n independent values from a Uniform distribution over the required window.

       def poisson_process (region ← W, intensity ← alpha) :
           pattern ← [ ]
           for i ∈ range (poisson (alpha ∗ leb (region))) :
               pattern.append (uniform (region))
           return pattern

2. Quine and Watson (1984) describe a generalization of the 1-dimensional "Markov chain" method mentioned above. Suppose we wish to simulate a Poisson point process of intensity λ in ball(o, R). We describe the two-dimensional case; the generalization to higher dimensions is straightforward.
Using polar coordinates and a power transform of the radius, produce a measure-preserving map
    (0, R²/2] × (0, 2π] → ball(o, R),    (s, θ) ↦ (√(2s), θ),    (1.1)
where (√(2s), θ) is expressed in polar coordinates, and (0, R²/2] × (0, 2π] and ball(o, R) are endowed with the measure λ Leb. Being measure-preserving, the map preserves point processes. But the following Exercise 1.11 shows a Poisson(λ) point process on the product space (0, R²/2] × (0, 2π] can be thought of as a 1-dimensional Poisson(2λπ) point process of "half-squared radii" on (0, R²/2] (which may be constructed using a sequence of independent Exponential(2λπ) random variables), each point of which is associated with an "angle" which is uniformly distributed on (0, 2π]. (The Quine-Watson method is particularly suited to the construction of Poisson point processes which are to be used to build geometric tessellations ...)

Exercise 1.11  Suppose X is a Poisson point process of intensity λ on the rectangle [a, b] × [0, 1]. Show that it may be realized as a one-dimensional Poisson point process on [a, b] of intensity λ, enriched by attaching to each one-dimensional point an independent Uniform([0, 1]) random variable to provide the second coordinate. Simply suppose the construction is given by Theorem 1.5; show the first coordinate of each point provides the required one-dimensional Poisson point process; note the second coordinate is uniformly distributed as required!

Exercise 1.12  Why is it irrelevant to this proof that the map image omits the centre o of ball(o, R)?

Exercise 1.13  Check the details of modifying the Quine-Watson construction for higher dimensions, to show how to build a d-dimensional Poisson point process using independent sequences of Exponential random variables and random variables uniformly distributed on S^(d−1).
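The two-dimensional Quine-Watson construction can be sketched as follows (a hedged sketch under the assumptions above; the function name is an illustrative choice):

```python
import math
import random

def quine_watson_disc(lam, R):
    """Poisson(lam) point process on ball(o, R) in the plane, built
    radially: the half-squared radii form a rate-2*pi*lam Poisson
    process on (0, R**2/2], and each point gets an independent
    uniform angle on (0, 2*pi]."""
    points = []
    s = random.expovariate(2.0 * math.pi * lam)
    while s <= R * R / 2.0:
        r, theta = math.sqrt(2.0 * s), random.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
        s += random.expovariate(2.0 * math.pi * lam)
    return points
```

Note that the points are produced in order of increasing distance from the centre, which is exactly what makes the method convenient for building tessellations outward from o.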
Exercise 1.14  Consider the following method of parametrizing lines: measure s, the perpendicular distance from the origin, and θ, the angle made by the perpendicular to a reference line. How should you modify the Quine-Watson construction to produce a Poisson line process which is invariant under isometries of the plane? What about invariant Poisson hyperplane processes?

What do large cells look like? In the early 1940s D.G. Kendall conjectured a large cell should approximate a disk. We now know this is true, and more besides! (Miles 1995; Kovalenko 1997; Kovalenko 1999; Hug et al. 2003)

Given a Poisson point process, we may construct a locally finite point pattern using the methods of Theorem 1.10, and enumerate the points of the pattern using some arbitrary rule. So we may speak of summing h(ξ) over the pattern points ξ; by Fubini the result will not depend on the arbitrary ordering so long as h is nonnegative.

Theorem 1.15  If X is a Poisson point process of intensity λ and if h(x, φ) is a nonnegative measurable function of a point x and a locally finite point pattern φ, then
    E[ Σ_{ξ∈X} h(ξ, X \ {ξ}) ] = ∫_{R^d} E[h(u, X)] λ Leb(du),    (1.2)
and indeed by direct generalization
    E[ Σ_{disjoint ξ₁,...,ξₙ∈X} h(ξ₁, ..., ξₙ, X \ {ξ₁, ξ₂, ..., ξₙ}) ]
        = ∫_{R^d} ··· ∫_{R^d} E[h(u₁, ..., uₙ, X)] λⁿ Leb(du₁) ··· Leb(duₙ).    (1.3)

Exercise 1.16  Use Theorem 1.5 to prove (1.2) for h(u, X) which is positive only when u lies in a bounded window W, and which depends on X only through the intersection of the pattern X with the window W. Hence deduce the full Theorem 1.15.

Theorem 1.15 delivers a variation on the independent scattering property of the Poisson point process, made precise in Corollary 1.17 below.
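The identity (1.2) can be checked numerically. Here is a hedged Monte Carlo sketch for h(u, x) = #{points of x within distance r of u}, on the unit square; all function names are illustrative choices, and the Poisson sampler uses Knuth's classical product-of-uniforms method:

```python
import math
import random

def poisson_int(mean):
    """Poisson random integer (Knuth's product-of-uniforms method)."""
    limit, p, k = math.exp(-mean), 1.0, 0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def poisson_square(lam):
    """Poisson(lam) pattern on the unit square, via Theorem 1.5."""
    return [(random.random(), random.random())
            for _ in range(poisson_int(lam))]

def mecke_both_sides(lam, r, reps):
    """Monte Carlo estimates of the two sides of (1.2) for
    h(u, x) = number of points of x within distance r of u."""
    lhs = rhs = 0.0
    for _ in range(reps):
        x = poisson_square(lam)
        # left side: sum over pattern points xi of h(xi, X \ {xi})
        lhs += sum(1 for p in x for q in x
                   if p is not q and math.dist(p, q) <= r)
        # right side: lam * Leb(W) * E[h(U, X)], U uniform on W = [0,1]^2
        u = (random.random(), random.random())
        rhs += lam * sum(1 for q in x if math.dist(u, q) <= r)
    return lhs / reps, rhs / reps
```

For this h, both sides equal λ² ∬_{W×W} I[|u − v| ≤ r] du dv exactly, so the two estimates should agree up to Monte Carlo error.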
Corollary 1.17  If g is a nonnegative measurable function of a locally finite point pattern φ, then
    E[g(X \ {u}) | u ∈ X] = E[g(X)].

Use Theorem 1.15 to calculate E[ Σ_{ξ∈X} I[ξ ∈ ball(u, ε)] g(X \ {ξ}) ]; interpret the expectation as ∫_{ball(u,ε)} E[g(X \ {u}) | u ∈ X] λ Leb(du). We say the Palm distribution for a Poisson point process is the Poisson point process itself, with the conditioning point added in!

1.2. Boolean models

In stochastic geometry the counterpart to the Poisson point process is the Boolean model: the set-union of (possibly random) geometric figures or grains, located one at each point or germ of an underlying Poisson point process. Much of the amenability of the Poisson point process is inherited by the Boolean model, though nevertheless its analysis can lead to very substantial calculations and theory.

Boolean models have the advantage of being simple enough that one can carry out explicit computation. They have a long application history, for example being used by Robbins (1944, 1945) to study bombing problems (how heavily to bomb a beach in order to ensure a high probability of landmine-clearance ...).

Bombing problems and coverage: refer to Hall (1988, §3.5), Molchanov (1996a); also note the application of Exercise 1.14.

Definition 1.18  A Boolean model Ξ is defined in terms of a germ process, which is a Poisson point process X of intensity λ, and a grain distribution, which defines a random set. It is the random set formed by the union
    Ξ = ∪_{ξ∈X} (U(ξ) + ξ),
where the U(ξ) are independent realizations of the random set, one for each germ or point ξ of the germ process. (Notice: the grains should also be independent of the germs!)
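A fixed-radius disc Boolean model is easy to simulate, and its coverage probability (computed in the next paragraphs) can then be checked against simulation. A hedged sketch, with illustrative names, testing coverage of the centre of the unit square:

```python
import math
import random

def poisson_int(mean):
    """Poisson random integer (Knuth's product-of-uniforms method)."""
    limit, p, k = math.exp(-mean), 1.0, 0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def coverage_estimate(lam, rho, reps):
    """Estimate P[z in Xi] for a Boolean model of fixed radius-rho discs
    with Poisson(lam) germs on the unit square, testing the centre
    z = (0.5, 0.5); take rho < 0.5 so germs outside the window could
    not have reached z anyway."""
    z, hits = (0.5, 0.5), 0
    for _ in range(reps):
        germs = [(random.random(), random.random())
                 for _ in range(poisson_int(lam))]
        if any(math.dist(g, z) <= rho for g in germs):
            hits += 1
    return hits / reps
```

For fixed-radius discs the estimate should agree with 1 − exp(−λπρ²), the area-fraction formula derived below.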
The classic computation for a Boolean model Ξ is to determine the probability P[o ∈ Ξ] that the resulting random set covers a specified point (here, the origin o). This reduces to a matter of geometry: if o is covered then it must be covered by at least one grain, so just think about where the corresponding germ might lie. If U is non-random then the germ will have to lie in the reflection Û of U in the origin, so restrict attention to germs lying in Û. If U is random then generalize this: thin each germ ξ of the germ process using the probability P[ξ ∈ Û] that the corresponding grain covers o. The result is an inhomogeneous Poisson point process of intensity measure P[u ∈ Û] Leb(du): calculate the probability of this process containing no points at all to produce the following.

(The random set is often not random at all (a disk of fixed radius, say) or is random in a simple parametrized manner (a disk of random radius, or a random convex polygon defined as the intersection of a sequence of random half-spaces); so we will follow Robbins (1944, 1945) in treating this informally, rather than developing the later and profound theory of random sets (Matheron 1975)!)

Theorem 1.19  Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Then the area / volume fraction is
    P[o ∈ Ξ] = 1 − exp(−λ E[Leb(U)]).    (1.4)

A similar argument produces the generalization:

Theorem 1.20  Suppose Ξ is a Boolean model with germ intensity λ and typical random grain U. Let K be a compact region. Then the probability of Ξ hitting K is
    P[Ξ ∩ K ≠ ∅] = 1 − exp(−λ E[Leb(Û ⊕ K)]),    (1.5)
where Û ⊕ K is a Minkowski sum:
    K₁ ⊕ K₂ = {x₁ + x₂ : x₁ ∈ K₁, x₂ ∈ K₂}.    (1.6)

Remember, from the discussion prior to Theorem 1.15, that we may view the germs as a sequence of points (using some arbitrary method of ordering); so simulation of Ξ need only depend on generating a suitable sequence of independent random sets U₁, U₂, ....

Harder analytical questions include:
• how to estimate the probability of complete coverage?
• in a high-intensity Boolean model, what is the approximate statistical shape of a vacant region?
• as intensity increases, how to describe the transition to infinite connected regions of coverage?

Useful further reading for these questions: Hall (1988), Meester and Roy (1996), Molchanov (1996b).

1.3. General definitions and concepts

1.3.1. General Point Processes

Other approaches to point processes include:
• random measures (nonnegative integer-valued measures charging singletons with 0 or 1);
• measures on exponential measure spaces;
and these are reflected in notation: a point process X can have points belonging to it (ξ ∈ X), can be used for point counts (X(E), when E is a subset of the ground space), and can be used to provide a reference measure so as to define other point processes via densities. Carter and Prenter (1972) establish the essential equivalences.

We can define general point processes as probability measures on the space of locally finite point patterns, following Theorem 1.10. This allows us immediately to define stationary point processes (the distribution respects translation symmetries, at least locally) and isotropic point processes (the same, but for rotations). The distribution of a general point process is characterized by its void probabilities as in Theorem 1.4. To each point process X (whether stationary, isotropic, or not) there is an intensity measure given by Λ(E) = E[X(E)].
(For stationary point processes this intensity measure has to be a multiple of Lebesgue measure!) We can also define the Palm distribution for X:

Definition 1.21  E[g(X) | u ∈ X] is required to solve
    E[ Σ_{ξ∈X∩ball(x,ε)} f(ξ) g(X \ {ξ}) ] = ∫_{ball(x,ε)} f(u) E[g(X) | u ∈ X] Λ(du).    (1.7)

A somewhat dual notion is the conditional intensity: informally, if ξ is a location and x is a locally finite point pattern not including the location ξ, then the conditional intensity λ*(x; ξ) is the infinitesimal probability of a point pattern X containing ξ if X \ {ξ} = x:
    λ*(x; ξ) = P[ξ ∈ X and X \ {ξ} = x] / P[X \ {ξ} = x].    (1.8)
We shall not define this in rigorous generality, as its definition will be immediate for the Markov point processes to which we will turn below.

These considerations provide a general context for figuring out constructions which move away from the "no interactions" property of Poisson point processes.

1. Poisson cluster processes (these can be viewed as Boolean models in which the grains are simply finite random clusters of points). Neyman-Scott processes arise from clusters of independent points.

2. Randomize the intensity measure: the "doubly stochastic" or Cox process.

   Exercise 1.22  If the number of points in a Neyman-Scott cluster is Poisson, then the Neyman-Scott point process is also a Cox process.

   Line processes present an interesting issue here! Davidson conjectured that a second-order stationary line process with no parallel lines is Cox. Kallenberg (1977) provided a counterexample: Stoyan et al. (1995, Figure 8.3) gives an illustration. 3D variants with applications appear in Parkhouse and Kelly (1998).

3. Or apply rejection sampling to a Poisson point process, with rejection probability depending on statistics computed from the point pattern. Compare the Gibbs measure formalism of statistical mechanics.
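The simplest instance of item 2 is a "mixed Poisson" process, where the random intensity measure is just a random scalar multiple of Lebesgue measure. A hedged sketch (the Gamma choice of mixing distribution and the function names are illustrative assumptions):

```python
import math
import random

def poisson_int(mean):
    """Poisson random integer (Knuth's product-of-uniforms method)."""
    limit, p, k = math.exp(-mean), 1.0, 0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def mixed_poisson_square(shape, scale):
    """A simple Cox ('doubly stochastic Poisson') process on the unit
    square: first draw a random intensity from a Gamma(shape, scale)
    distribution, then scatter a Poisson pattern with that realized
    intensity."""
    lam = random.gammavariate(shape, scale)
    return [(random.random(), random.random())
            for _ in range(poisson_int(lam))]
```

Unlike the Poisson case, the total count is overdispersed: its variance exceeds its mean by the variance of the random intensity, a first hint of "clustered" behaviour without any direct interaction between points.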
What sort of statistics might we use?

• estimation of the intensity λ by λ̂(X) = X(W)/Leb(W) (using the sufficient statistic X(W));

• measuring how close the point pattern comes to a specific location, using the spherical contact distribution function
    Hₛ(r) = P[dist(o, X) ≤ r],    (1.9)
  estimated via a Boolean model Ξr based on balls U of fixed radius r by
    Leb(Ξr ∩ W)/Leb(W) = Leb( (∪_{ξ∈X} (U + ξ)) ∩ W )/Leb(W);    (1.10)

• measuring how close pattern points come to each other, using the nearest neighbour distance function
    D(r) = P[dist(o, X) ≤ r | o ∈ X].    (1.11)
  Interpret conditioning on the null event "o ∈ X" using Palm distributions;

• the related reduced second moment measure
    K(r) = (1/λ) E[ #{ξ ∈ X : dist(o, ξ) ≤ r} − 1 | o ∈ X ],    (1.12)
  estimating λK(r) using the number sr(X) of (ordered) neighbour pairs (ξ₁, ξ₂) in X for which dist(ξ₁, ξ₂) ≤ r (use the Slivnyak-Mecke Theorem 1.15). We ignore the important issue of edge effects!

Exercise 1.23  Use Theorem 1.15 to prove that for a homogeneous Poisson point process D(r) = Hₛ(r) and K(r) = Leb(ball(o, r)).

Here is a prototype of the use of these statistics in rejection algorithms:

Exercise 1.24  Consider the following rejection simulation algorithm (noting that λ̂ = X(W)/Leb(W), to connect with the previous discussion):

    def reject () :
        X ← poisson process (region ← W, intensity ← lamda)
        while uniform (0, 1) > theta ∗∗ number (X) :
            X ← poisson process (region ← W, intensity ← lamda)
        return X

Show that the resulting point process is Poisson of intensity λθ. Use Theorem 1.5 to reduce this to a question about rejection sampling of Poisson random variables.

1.3.2. Marked Point Processes

We can think of Boolean models as derived from marked Poisson point processes: to each germ point assign a random mark encoding the geometric data (such as disk radius) of the grain. In effect the Poisson point process now lives on R^d × M, where M is the mark space. Examples include:

• Poisson cluster processes as mentioned above: the grains are finite random clusters;
• segment and disk processes;
• line and (hard) rod processes;
• fibre and surface processes.

Compare Exercise 1.11; the intensity measure will factorize as µ × ν so long as the marks are independent, where µ is the intensity of the basic Poisson point process and the measure ν is the distribution of the mark. But beware! It can be meaningful to consider improper mark distributions. Examples include:

• the decomposition of Brownian motion into excursions;
• a marked planar Poisson point process for which the marks are disks of random radius R, with density proportional to 1/r for r below some threshold.

1.4. Markov point processes

Consider interacting point processes generated by rejection sampling (Exercise 1.24 is a simple example!). For the sake of simplicity, we suppose here that all our processes are two-dimensional. The general procedure is to use a statistic t(X) in the algorithm

    def reject () :
        X ← poisson process (region ← W, intensity ← lamda)
        while uniform (0, 1) > exp (theta ∗ t (X)) :
            X ← poisson process (region ← W, intensity ← lamda)
        return X

We can now use standard arguments from rejection simulation. The resulting point process distribution has a density
    constant_θ × exp(θ t(X))    (1.13)
with respect to the Poisson point process X of intensity λ, for normalizing constant
    constant_θ = 1 / E[exp(θ t(X)) | X is Poisson(λ)].
If θt(X) > 0 is possible then the above algorithm runs into trouble (how do you reject a point pattern with probability greater than 1?!). However the probability density at Equation (1.13) can still make sense. Importance sampling could of course implement this. We can do better if the following holds:

Definition 1.25  A point pattern density f(X) satisfies Ruelle stability if
    f(X) ≤ A exp(B #(X))    (1.14)
for constants A > 0 and B, where #(X) is the number of points in X.

Exercise 1.26  Show that if the density exp(θt(X)) is Ruelle stable with constants as in Definition 1.25, then the rejection algorithm above will produce a point pattern with the correct density if it is applied to a Poisson point process of intensity λe^B instead of λ, and the rejection probability is changed to exp(θt(X) − B#(X))/A instead of exp(θt(X)). (Thanks for the comment, Tom!)

Use t(X) = sr(X), the number of ordered pairs separated by at most r, with γ = exp(θ) ∈ (0, 1), to obtain the celebrated Strauss point process (Strauss 1975).

The statistic t(X) = Leb(Ξr ∩ W), with γ = exp(−θ), results in a process known as the area-interaction point process (Baddeley and van Lieshout 1995), also known in statistical physics as the Widom-Rowlinson interpenetrating spheres model (Widom and Rowlinson 1970) in the case γ > 1.

The (Papangelou) conditional intensity is easily defined for this kind of weighted Poisson point process, which (borrowing statistical mechanics terminology) is commonly known as a Gibbs point process: if the density of the Gibbs process with respect to a unit rate Poisson point process is given by f(X), then the conditional intensity at ξ may be defined as
    λ*(x; ξ) = f(x ∪ {ξ}) / f(x).    (1.15)

Exercise 1.27  Check the Nguyen-Zessin formula (compare Theorem 1.15): for nonnegative h,
    E[ Σ_{ξ∈X} h(ξ, X \ {ξ}) ] = ∫ E[h(u, X) λ*(X; u)] Leb(du).    (1.16)

Exercise 1.28  Compute the conditional intensity for the Poisson point process of intensity λ (use its density with respect to a unit-intensity Poisson point process; see Exercise 1.24).

Exercise 1.29  Compute the conditional intensity for the Strauss point process.

Exercise 1.30  Compute the conditional intensity for the area-interaction point process.

All these conditional intensities λ*(x; ξ) share a striking property: they do not depend on the point pattern x outside of some neighborhood of ξ. This is an appealing property for a point process intended to model interactions; it is captured in Ripley and Kelly's (1977) definition of a Markov point process:

Definition 1.31  A Gibbs point process is a Markov point process with respect to a neighbourhood relation ∼ if its density is everywhere positive (the positivity requirement ensures that the ratio expression for λ*(x; ξ) in Equation (1.15) is well-defined; it can be relaxed to a hereditary property), and its conditional intensity λ*(x; ξ) depends only on ξ and {η ∈ x : η ∼ ξ}.

The celebrated Hammersley-Clifford theorem tells us that the probability density for a Markov point process can always be factorized as a product of interactions: functions of ∼-cliques of points of the process. Ripley and Kelly (1977) present a proof for the point process case, while Baddeley and Møller (1989) provide a significant extension. A recent treatment of Markov point processes in book form is given by Van Lieshout (2000).

However care must be taken to ensure such processes are well-defined: the danger of simply writing down a statistic t(X) and defining the density by Equation (1.13) is that the total integral E[exp(θt(X)) | X is Poisson(λ)] may diverge, so that the normalizing constant is not well-defined.
(This is the case for the Strauss point process when γ > 1, as shown by Kelly and Ripley 1976. This particular point process cannot therefore be used to model clustering — a major motivation for considering the more amenable area-interaction point process.)

Exercise 1.32  Show that E[γ^(sr(X))] diverges for γ > 1 if the interaction radius r exceeds the maximum separation of potential points in the window W.

Exercise 1.33  Bound exp(−θ Leb(Ξr ∩ W)) above in terms of the number of points in W (show the interaction satisfies Ruelle stability), and hence deduce that the area-interaction point process is defined for all real θ.

Can we bias using the perimeter instead of the area of Ξr in the area-interaction process, or even its "holiness" (Euler characteristic: number-of-components minus number-of-holes)? Like area, these characteristics are local (they lead to formal densities whose conditional intensities satisfy the local requirement of Definition 1.31). The Hadwiger characterization theorem (Hadwiger 1957) more-or-less says that linear combinations of these three measures exhaust all local geometric possibilities! Can the result be normalized?

Exercise 1.34  Use the argument of Exercise 1.33 to show that exp(θ perimeter(Ξr ∩ W)) can be normalized for all θ.

Exercise 1.35  Argue similarly to show that exp(θ euler(Ξr ∩ W)) can be normalized for all θ > 0. (There can be no more components than disks/grains of the Boolean model Ξr.)

What about exp(θ euler(Ξr ∩ W)) for θ < 0? The key remark is Theorem 4.3 of Kendall et al. (1999): a union of N closed disks in the plane (of whatever radii, not necessarily the same for each disk) can have at most 2N − 5 holes.
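The area statistic Leb(Ξr ∩ W) underlying Exercise 1.33 is easy to approximate on a grid, and the approximation makes the Ruelle bound Leb(Ξr ∩ W) ≤ #(X) · πr² visible directly. A hedged sketch (the grid resolution and function name are illustrative assumptions):

```python
import math
import random

def union_area(pattern, r, grid=200):
    """Midpoint-grid approximation to Leb(Xi_r ∩ W): the area of the
    union of radius-r discs centred at the pattern points, within the
    unit-square window W."""
    step = 1.0 / grid
    hits = 0
    for i in range(grid):
        for j in range(grid):
            u = ((i + 0.5) * step, (j + 0.5) * step)
            if any(math.dist(u, p) <= r for p in pattern):
                hits += 1
    return hits * step * step
```

Since each disc contributes at most πr² of area, the union area is at most #(X) · πr², which is the bound needed for Ruelle stability of the area-interaction density.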
A separate, trickier argument shows that the disks can be replaced by polygons satisfying a "uniform wedge" condition. There are counterexamples ....

1.5. Further issues

Before moving on to develop machinery for simulating realizations of Markov point processes, we make brief mention of a couple of recent pieces of work.

1.5.1. Poisson point processes and set-indexed martingales

There are striking applications of set-indexed martingales to distributional results for the areas of adaptively random regions. Recall the notion of a stopping time T for a filtration {Fₜ}. Here is a variant on the standard definition: we require
    {ω : ∆(ω) ⊆ [0, t]} ∈ Fₜ, where ∆(ω) = {s : s ≤ T(ω)}.    (1.17)
This makes sense for a partially-ordered family of closed sets (using set-inclusion); hence martingales, and an Optional Sampling Theorem (Kurtz 1980). Zuyev (1999) used this to give an elegant proof of results of Møller and Zuyev (1996):

Theorem 1.36  Suppose ∆ is a compact stopping set for X a Poisson point process of intensity α, such that P[X(∆) = n] > 0 and does not depend on α. Then Leb(∆) given X(∆) = n is Gamma(n, α).

Examples: the minimal centred ball containing n points (easy: compare Exercise 1.11); regions constructed using Delaunay or Voronoi tessellations. Proof: use the likelihood ratio for α (furnishing a martingale), hence
    E[e^(z Leb(∆)) | X(∆) = n, α]
        = E[e^(z Leb(∆)) I[X(∆) = n] (α/α₀)ⁿ e^(−(α−α₀) Leb(∆)) | α = α₀] / P[X(∆) = n | α = α₀].

[Figure: (a) Delaunay circumdisk; (b) Voronoi flower]

Cowan et al. (2003) use this technique to consider the extent to which these examples lead to decompositions as in the minimal centred ball.

1.5.2. Stationary random countable dense sets

We conclude this lecture with a cautionary example!
Without straying very far from the theory of random point processes one can end up in the bad part of the Measure Theory town. Suppose for whatever reason one wishes to consider random point sets X which are countable but are also dense (the set of directions of a line process?). We need to be precise about what we can observe, so that we genuinely treat the different points of X on equal terms. So suppose we can observe any event in the hit-or-miss σ-algebra generated by [X ∩ A ≠ ∅], as A runs through the Borel sets. Suppose that X is stationary. (We thus exclude the situations considered by Aldous and Barlow (1981), where account is taken of how various points are constructed — in effect the dense point process is there obtained by projection from a conventional locally finite point process.)

There are two ways to define such sets: (a) as a measurable subset of Ω × R; (b) as a Borel-set-valued random variable, hit-or-miss measurable. These are equivalent for the case of closed sets (Himmelberg 1975, Thm. 3.5). But (warning!) the set of countable sets is not hit-or-miss measurable, though constructive and weak countability are equivalent (section theorem).

What can we say about stationary random countable dense sets?

Theorem 1.37 (Kendall 2000, Thm. 4.3, specialized to the Euclidean case.) If E is any hit-or-miss measurable event then either P[E] = 0 or P[E] = 1. Whether 0 or 1 depends on E but not on X!

Exercise 1.38 Compare these two stationary random countable dense sets: (a) for Y a unit-intensity Poisson point process on R × (0, ∞), the countable set obtained by projection onto the first coordinate; (b) the set Q + Z, where Q is the set of rational numbers and Z is a Uniform([0, 1]) random variable. They are very different, but appear indistinguishable . . . .
This bad behaviour should not be surprising: for example, Vitali's non-measurable set construction depends on there being no measurable random variable taking values in Q + Z. (Contrast our use of enumeration for locally finite point sets in Theorem 1.15.) Careless thought betrays intuition!

Chapter 2
Simulation for point processes

Contents
2.1 General theory
2.2 Measure-theoretic approach
2.3 MCMC for Markov point processes
2.4 Uniqueness and convergence
2.5 The MCMC zoo
2.6 Speed of convergence

Computations with the Poisson process, the Boolean model, and cluster processes can be very direct, and simulation is usually direct too. But what about Markov point processes?

• Simple-minded rejection procedures (§1.4) are usually infeasible.
• What about accept-reject procedures applied sequentially in small sub-windows, thus constructing a point-pattern-valued Markov chain whose invariant distribution is the desired point process?
• Or proceed to a limit: add/delete single points?

Markov chain Monte Carlo (MCMC): Gibbs' sampler, Metropolis-Hastings! The Poisson point process arises as the invariant distribution of a spatial immigration-death process after Preston (1977), also Ripley (1977, 1979). To deal with more general Markov point processes it is convenient first to work through the general theory of detailed balance equations for Markov chains on general state-spaces. We will show how to use this to derive simulation algorithms for our example point processes above, and discuss the more general Metropolis-Hastings (MH) formulation after Metropolis et al.
(1953) and Hastings (1970), which allows us to view MCMC as a sequence of proposals of random moves which may or may not be rejected.

Simple example of detailed balance

Underlying the Poisson point process example: the process X of the total number of points alters as follows:

• birth in time interval [0, ∆t) at rate α∆t;
• death in time interval [0, ∆t) at rate X∆t.

Then we can solve

P[X(t) = n − 1 and birth in [t, t + ∆t)] ≈ π(n − 1) × α∆t = π(n) × n∆t ≈ P[X(t) = n and death in [t, t + ∆t)]

by

π(n) = (α/n) π(n − 1) = . . . = (α^n / n!) e^{−α} .

2.1. General theory

• Ergodic theory for Markov chains: invariant distribution

π(y) = Σ_x π(x) p(x, y)

or (in continuous time, with generator q)

Σ_x π(x) q(x, y) = 0 .

• Detailed balance:
– reversibility: π(x)p(x, y) = π(y)p(y, x);
– dynamic reversibility: π(x)p(x, y) = π(ỹ)p(ỹ, x̃).

Exercise 2.1 Show detailed balance (whether reversibility or dynamic reversibility!) implies invariant distribution.

• Improper distributions: does Σ_x π(x) converge?
• Estimate ∫ θ(y) dπ(y) using lim_{n→∞} (1/n) Σ_{m=1}^{n} θ(Y_m).

Definition 2.2 Geometric ergodicity: there are ρ ∈ (0, 1) and R(x) > 0 with

‖π − p^{(n)}(x, ·)‖_tv ≤ R(x) ρ^n

for all n and all starting points x.

Definition 2.3 Uniform ergodicity: there are ρ ∈ (0, 1), R > 0 not depending on the starting point x such that

‖π − p^{(n)}(x, ·)‖_tv ≤ R ρ^n

for all n, uniformly over all starting points x.

2.1.1. History

• Metropolis et al. (1953)
• Hastings (1970)
• Simulation tests
• Geman and Geman (1984)
• Gelfand and Smith (1990)
• Geyer and Møller (1994), Green (1995)
• ...

Definition 2.4 Generic Metropolis algorithm:
• Propose: move x → y with probability r(x, y);
• Accept: move with probability α(x, y).
Required: balance π(x)r(x, y)α(x, y) = π(y)r(y, x)α(y, x).

Definition 2.5 Hastings ratio: set α(x, y) = min{Λ(x, y), 1} where

Λ(x, y) = π(y)r(y, x) / (π(x)r(x, y)) .

2.2. Measure-theoretic approach

We follow Tierney (1998) closely in presenting a treatment of the Hastings ratio for general discrete-time Markov chains, since in stochastic geometry one frequently deals with chains on quite complicated state spaces.

Here is the key detailed balance calculation in general form: compare the "fluxes" π(dx)p(x, dy) and π(dy)p(y, dx). Both are absolutely continuous with respect to their sum: set

Λ(y, x) / (Λ(y, x) + 1) = π(dx)p(x, dy) / (π(dx)p(x, dy) + π(dy)p(y, dx)) ,

defining a possibly infinite Λ(y, x) up to null-sets of π(dx)p(x, dy) + π(dy)p(y, dx). Now Λ(y, x) is positive off a null-set N1 of π(dx)p(x, dy) and finite off a null-set N2 of π(dy)p(y, dx). Moreover N1 and N2 can be chosen as transposes of each other under the symmetry (x, y) ↔ (y, x), because they are defined in terms of whether Λ(y, x)/(Λ(y, x) + 1) is 0 or 1, and so we may arrange¹ for

Λ(x, y)/(Λ(x, y) + 1) + Λ(y, x)/(Λ(y, x) + 1) = 1 .   (2.1)

¹ working up to null-sets of π(dx)p(x, dy) + π(dy)p(y, dx) throughout!

Applying simple algebra, we deduce Λ(y, x)Λ(x, y) = 1 off N1 ∪ N2. We know 0 < Λ(y, x), Λ(x, y) < ∞ off N1 ∪ N2, so

min{1, Λ(x, y)} × Λ(y, x)/(Λ(y, x) + 1) = min{Λ(y, x), Λ(x, y)Λ(y, x)}/(Λ(y, x) + 1) = min{Λ(y, x), 1}/(Λ(y, x) + 1)   (2.2)

off N1 ∪ N2, and by symmetry the same expression results with the roles of (x, y) and (y, x) exchanged (substitute Λ(x, y) = 1/Λ(y, x)). Multiplying through by π(dx)p(x, dy) + π(dy)p(y, dx), we therefore obtain

min{1, Λ(x, y)} π(dx)p(x, dy) = min{1, Λ(y, x)} π(dy)p(y, dx) .
The kernel min{1, Λ(x, y)}p(x, dy) may have a deficit (integral over y less than one): compensate by adding an atom

(1 − ∫ min{1, Λ(x, u)}p(x, du)) δ_x(dy)

to obtain a reversible kernel (actually a Metropolis-Hastings kernel!)

(1 − ∫ min{1, Λ(x, u)}p(x, du)) δ_x(dy) + min{1, Λ(x, y)}p(x, dy) .

2.2.1. Projection arguments

In case π(dx)p(x, dy) and π(dy)p(y, dx) are mutually absolutely continuous,

Λ(x, y) = π(dy)p(y, dx) / π(dx)p(x, dy)   (2.3)

is exactly the Hastings ratio. Otherwise it arises via the Metropolis map L¹ projection arguments of Billera and Diaconis (2001), measurable case. Sketch: let K be the family of all kernels and π(dx) be the target probability measure. Set

R = {q(x, dy) ∈ K : π(dx)q(x, dy) = π(dy)q(y, dx)}

and introduce the distance (via the Hahn-Jordan decomposition of signed measures)

d(p, q) = ∫∫_{x≠y} |π(dx) (q(x, dy) − p(x, dy))|

between p ∈ K and q ∈ R. Remembering q ∈ R, introduce a deficit signed kernel ε(x, dy) by

π(dy)q(y, dx) = π(dx)p(x, dy) − π(dx)ε(x, dy)

and define

H+ = {(x, y) : Λ(x, y) > Λ(y, x)} ,
H− = {(x, y) : Λ(x, y) < Λ(y, x)} ,   (2.4)

where Λ(x, y) is defined in terms of p using Equation (2.3). Notice H− = {(x, y) : (y, x) ∈ H+}, and (x, x) ∉ H− ∪ H+ for all x. Furthermore, set p̃(x, dy) = α(x, y)p(x, dy) with

α(x, y) = min{1, Λ(y, x)/Λ(x, y)}

(use Λ(x, y) to compute acceptance probabilities to produce p̃ ∈ R). Now

d(p, p̃) = ∫∫_{x≠y} |π(dx) (p̃(x, dy) − p(x, dy))|
= ∫∫_{x≠y} |π(dx) (α(x, y)p(x, dy) − p(x, dy))|
= ∫∫_{x≠y} |π(dx) (p(x, dy) − min{1, Λ(y, x)/Λ(x, y)} p(x, dy))|
= ∫∫_{H+} π(dx) (p(x, dy) − (Λ(y, x)/Λ(x, y)) p(x, dy))
= ∫∫_{H+} (π(dx)p(x, dy) − π(dy)p(y, dx))

using 0 ≤ α(x, y) ≤ 1 and the definition of Λ(x, y) via a density in Equation (2.1).
Note also that replacing p̃(x, dy) by α₀(x, y)p(x, dy), for α₀(x, y) ∈ [0, α(x, y)], increases the distance by

∫∫_{x≠y} π(dx) (α(x, y) − α₀(x, y)) p(x, dy) .   (2.5)

Now compute

d(p, q) = ∫∫_{x≠y} |π(dx) (q(x, dy) − p(x, dy))|
≥ ∫∫_{H+} [ |π(dx) (q(x, dy) − p(x, dy))| + |π(dy) (q(y, dx) − p(y, dx))| ]
= ∫∫_{H+} [ |π(dx)ε(x, dy)| + |π(dx)p(x, dy) − π(dx)ε(x, dy) − π(dy)p(y, dx)| ]
≥ ∫∫_{H+} (π(dx)p(x, dy) − π(dy)p(y, dx)) = d(p, p̃) .

Thus p̃(x, dy), defined by p̃(x, dy) = α(x, y)p(x, dy), is the kernel in R which is closest to the original kernel p(x, dy) in this distance d. Moreover equality in the last step can hold only when π(dx)ε(x, dy) restricted to H+ is a non-negative measure dominated by π(dx)p(x, dy), which is to say when

π(dx)ε(x, dy) = (1 − α₀(x, y)) π(dx)p(x, dy) on H+

for some α₀(x, y) ∈ [0, 1]. Exactly the same arguments applied to the transposed measure π(dy)p(y, dx), and Equation (2.4), lead us to deduce that if q(x, dy) is a kernel in R which minimizes d(q, p) then

q(x, dy) = α₀(x, y)p(x, dy) .

We deduce now from the discussion around Equation (2.5) that the distance is minimized only when

α₀(x, y) = α(x, y) = min{1, Λ(y, x)/Λ(x, y)}

in L¹(π(dx)p(x, dy)). In this sense the Metropolis-Hastings sampler p̃(x, dy) = α(x, y)p(x, dy) is an L¹ projection of p(x, dy) onto the class of reversible kernels.

We note in passing that the Hastings ratio (as opposed to other possible accept/reject rules) yields the largest spectral gap and minimizes the asymptotic variance of additive functionals (Peskun 1973; Tierney 1998).
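The projection statement can be sanity-checked on a toy finite chain. In the sketch below the 3-state target π and proposal kernel p are hypothetical choices made purely for illustration, and the acceptance probability is written directly as the Hastings ratio of Definition 2.5; the code verifies both reversibility of the Metropolised kernel p̃ and the flux-difference formula for d(p, p̃) over H+.

```python
# Hypothetical 3-state example: target pi and proposal kernel p are
# illustrative choices only.
pi = [0.5, 0.3, 0.2]
p = [[0.2, 0.5, 0.3],
     [0.4, 0.4, 0.2],
     [0.6, 0.2, 0.2]]
n = 3

def accept(x, y):
    # Hastings acceptance probability min{1, pi(y)p(y,x) / (pi(x)p(x,y))}
    return min(1.0, (pi[y] * p[y][x]) / (pi[x] * p[x][y]))

# Metropolised kernel p_tilde: off-diagonal mass thinned by accept(x,y),
# with the rejected mass returned to the diagonal atom at x.
p_tilde = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if y != x:
            p_tilde[x][y] = accept(x, y) * p[x][y]
    p_tilde[x][x] = 1.0 - sum(p_tilde[x][y] for y in range(n) if y != x)

# Reversibility: pi(x) p_tilde(x,y) = pi(y) p_tilde(y,x) for x != y.
rev_err = max(abs(pi[x] * p_tilde[x][y] - pi[y] * p_tilde[y][x])
              for x in range(n) for y in range(n) if x != y)

# Distance d(p, p_tilde) agrees with the flux-difference formula over H+,
# the set of ordered pairs where outgoing flux exceeds incoming flux.
d = sum(abs(pi[x] * (p_tilde[x][y] - p[x][y]))
        for x in range(n) for y in range(n) if x != y)
d_flux = sum(pi[x] * p[x][y] - pi[y] * p[y][x]
             for x in range(n) for y in range(n)
             if x != y and pi[x] * p[x][y] > pi[y] * p[y][x])
```

Any other choice of acceptance rule dominated by this one strictly increases the distance, in line with (2.5).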
MCMC for Markov point processes

We now apply the above to simulation of a point process in a bounded window, whose distribution² has density f(ξ) with respect to a Poisson point process measure π₁(dξ) of unit intensity (Geyer and Møller 1994; Green 1995). We need to design a point-pattern-valued Markov chain whose limiting distribution will be f(ξ)π₁(dξ), and we will constrain the Markov chain to evolve solely by adding and deleting points.

² Here we use the measurable space of locally finite point patterns, introduced in Theorem 1.10. Notice that we are dealing with a bounded window, so in fact the total number of points in the pattern will always be finite.

Let λ*(x; ξ) = f(x ∪ {ξ})/f(x) be the Papangelou conditional intensity for the target point process, as defined in Equation (1.15). Suppose the point process satisfies some condition such as local stability, namely

λ*(x; ξ) ≤ γ   (2.6)

for some positive constant γ. We can weaken this: we need require only that

∫_W λ*(x; u) du ≤ ∫_X γ(u) Leb(du) < ∞ .

However the stronger condition is useful later, in CFTP, and implies Ruelle stability.

2.3.1. Discrete time case

Consider a Markov chain X which evolves as follows. Fix α_n so that α_0 = 0 and

α_n + α_{n+1} ∫_W λ*(x; u) du ≤ 1 .   (2.7)

• Suppose the current value of X is X = x with #(x) = n. Choose from:
– Death: with probability α_{n−1}, choose a point η of x at random and delete it from x.
– Birth: generate a new point ξ with sub-probability density α_n λ*(x; ·)/(n + 1) and add it to x (in case of probability deficit, do nothing!).

Arrange matters so that birth and death are exclusive (feasible by (2.7) above). It is tedious to write down the transition kernel for X: however the message of Λ(x, y) (as given in (2.3)) is that equality of probability fluxes suffices.
Point pattern configurations are tricky (patterns are not ordered); however an equivalent task is to arrange for equality of birth and death fluxes when the target distribution is given by rejection sampling based on the construction of Theorem 1.5, recalling that f is a density with respect to a unit Poisson point process. Set Leb(W) = A. We consider the exchange between x̃ and x where x̃ = x ∪ {ξ}, x = x̃ \ {ξ}. Suppose #(x) = n, so #(x̃) = n + 1.

Death: the flux from x̃ to x is

f(x̃) × (e^{−A} dx_1 . . . dx_n dξ / (n + 1)!) × (n + 1)! × α_n/(n + 1) = α_n f(x̃) e^{−A} dx_1 . . . dx_n dξ / (n + 1) .

The factor (n + 1)! in the numerator allows for the (n + 1)! ways of permuting x_1, . . . , x_n, ξ to represent the same configuration x̃. The death probability α_n/(n + 1) allows for the requirement that the point chosen to die should be ξ and not one of x_1, . . . , x_n.

Birth: the flux from x to x̃ is

f(x) × (e^{−A} dx_1 . . . dx_n / n!) × n! × α_n λ*(x; ξ) dξ / (n + 1) = α_n f(x) λ*(x; ξ) e^{−A} dx_1 . . . dx_n dξ / (n + 1) .   (2.8)

The factor n! in the numerator allows for the n! ways of permuting x_1, . . . , x_n to represent the same configuration x.

Using the formula (1.15) for the Papangelou conditional intensity, we deduce equality between these two fluxes. It follows that X has the desired target distribution as an invariant distribution.

Exercise 2.6 Consider the special case λ*(x; ξ) = β, choosing say α_n = 1/(1 + β Leb(W)) for all n > 0. Write down the simulation algorithm. Check that the resulting kernel is reversible with respect to π_β, the distribution of the Poisson point process of rate β. There is a very easy approach: recognize that there can be no spatial interaction, so it suffices to set up detailed balance equations for the Markov chain which counts the total number of points . . . .
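A deliberately simple sketch towards Exercise 2.6: with λ* = β there is no spatial interaction, so it is enough to track the total count n. In the code below the α_n bookkeeping is replaced by a symmetric Metropolis-style propose-then-accept birth/death step (a variant choice, not exactly the scheme above), and β and Leb(W) are hypothetical values; detailed balance against the Poisson(β Leb(W)) count distribution can then be checked exactly.

```python
import math
import random

beta, area = 2.0, 1.5     # hypothetical intensity and window area Leb(W)
mu = beta * area          # target count distribution is Poisson(mu)

def step(n, rng):
    """One Metropolis birth/death move on the total point count."""
    if rng.random() < 0.5:                             # propose a birth
        if rng.random() < min(1.0, mu / (n + 1)):
            return n + 1
    elif n > 0 and rng.random() < min(1.0, n / mu):    # propose a death
        return n - 1
    return n

def pi(n):
    """Poisson(mu) probability mass function."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

# Detailed balance holds exactly: pi(n) P(n, n+1) = pi(n+1) P(n+1, n).
db_err = max(abs(pi(n) * 0.5 * min(1.0, mu / (n + 1))
                 - pi(n + 1) * 0.5 * min(1.0, (n + 1) / mu))
             for n in range(20))

# Run the chain and record the time-average of the count.
rng = random.Random(1)
n, total = 0, 0
for _ in range(10000):
    n = step(n, rng)
    total += n
mean_count = total / 10000.0
```

The time-average settles near β Leb(W) = 3, as the invariant Poisson count distribution predicts.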
2.3.2. Continuous time case

In the treatment above it is tempting to make α_n as large as possible, to maximize the rate of change per time-step. However we can also consider the limiting case of a continuous-time model and reformulate the simulation in event-based terms, thus producing a spatial birth-and-death process. For small δ, set α_n = (n + 1)δ for smaller n, so as to maintain Equation (2.7). (For larger n use smaller α_n . . . .) Then:

• Suppose the current value of X is X = x. While #(x) is not too large, choose from:
– Death: with probability #(x)δ, choose a point η of x at random and delete it from x.
– Birth: generate a new point ξ with sub-probability density λ*(x; ξ)δ, and add it to x (in case of probability deficit, do nothing!).

Taking the limit as δ → 0, and reformulating as a continuous-time Markov chain:

• Suppose the current value of X is X = x.
• Wait an Exponential(#(x) + ∫_W λ*(x; ξ) dξ) random time, then:
– Death: with probability #(x)/(#(x) + ∫_W λ*(x; ξ) dξ), choose a point η of x at random and delete it from x;
– Birth: otherwise generate a new point ξ with probability density proportional to λ*(x; ξ), and add it to x.

Exercise 2.7 Verify that the above algorithm produces an invariant distribution with Papangelou conditional intensity λ*(x; ξ).

A coupling argument shows this is geometrically ergodic – but we will do better!

Exercise 2.8 Determine a spatial birth-and-death process with invariant distribution which is a Strauss point process. (Example of Strauss process target.)

Exercise 2.9 Determine a spatial birth-and-death process with invariant distribution which is an area-interaction point process.
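A hedged sketch towards Exercise 2.8 (Strauss target). Rather than computing ∫_W λ* at each event, it uses an equivalent thinning formulation: birth proposals arrive at the dominating rate β Leb(W) and are accepted with probability λ*(x; ξ)/β, while each point dies at unit rate. All parameter values are illustrative, and γ = 0 is the hard-core extreme of the Strauss model, so that the repulsion is directly visible.

```python
import math
import random

beta, gamma_, r = 5.0, 0.0, 0.3   # illustrative Strauss parameters
A = 1.0                            # unit square window, area Leb(W) = 1

def t_close(xi, pts):
    """Number of points of pts within distance r of xi."""
    return sum(1 for q in pts if math.dist(xi, q) < r)

def papangelou(xi, pts):
    """Strauss conditional intensity lambda*(pts; xi) = beta * gamma^t."""
    return beta * gamma_ ** t_close(xi, pts)

def run(horizon, rng):
    """Spatial birth-and-death process by thinning: proposals at rate
    beta*A, accepted with probability lambda*/beta; unit death rate."""
    pts, t = [], 0.0
    while True:
        rate = beta * A + len(pts)            # total event rate
        t += rng.expovariate(rate)
        if t > horizon:
            return pts
        if rng.random() < beta * A / rate:    # birth proposal
            xi = (rng.random(), rng.random())
            if rng.random() < papangelou(xi, pts) / beta:
                pts.append(xi)
        else:                                 # death of a uniform point
            pts.pop(rng.randrange(len(pts)))

pts = run(50.0, random.Random(7))
```

With γ = 0 a birth within distance r of an existing point is always rejected, so (starting from the empty pattern) the simulated pattern is genuinely hard-core; with γ = 1 the dynamics reduce to the immigration-death process with Poisson equilibrium.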
Note that the technique underlying these exercises suggests the following: a local stability bound λ*(x; ξ) ≤ M implies that the point process can be expressed as some complicated and dependent thinning of a Poisson point process of intensity M. We shall substantiate this in Chapter 3, when we come to discuss perfect simulation for point processes.

2.4. Uniqueness and convergence

We have been careful to describe the target distributions as invariant distributions rather than equilibrium distributions, because we have not yet introduced useful methods to determine whether our chains actually converge to their invariant distributions! As we have already hinted in Chapter 1, this can be a problem; indeed one can write down a Markov chain which purports to have the attractive-case Strauss point process as invariant distribution, even though this point process is not well-defined!

The full story here involves discussions of φ-irreducibility and Harris recurrence, which we reluctantly omit for reasons of time. Fortunately there is a relatively simple technique which often works to ensure uniqueness of, and convergence to, an invariant distribution once existence is established. Suppose we can ensure that the chain X visits the empty-pattern state infinitely often, and with finite mean recurrence time (so the empty-pattern state ∅ is an ergodic atom). Then the coupling approach to renewal theory (Lindvall 1982; Lindvall 2002) can be applied to show that the distribution of Xn converges in total variation to its invariant distribution, which therefore must be unique. This pleasantly low-tech approach foreshadows the discussion of small sets later in this chapter, and also anticipates important techniques in perfect simulation.

2.5.
The MCMC zoo

The attraction of MCMC is that it allows us to produce simulation algorithms for all sorts of point processes and geometric processes. It is helpful however to have in mind a general categorization: a kind of "zoo" of different MCMC algorithms.

• Here are some important special cases of the MH sampler, classified by type of proposal distribution. Let the target distribution be π (it could be a Bayesian posterior . . . ).

– Independence sampler: choose q(x, y) = q(y) (independent of current location x) which you hope is "well-matched" to π.

– Random walk sampler: choose q(x, y) = q(y − x) to be a random walk increment based on the current location x. Tails of the target distribution π must not be too heavy and (e.g.) density contours must not be too sharply curved (Roberts and Tweedie 1996b).

– Langevin algorithms (also called "Metropolis-adjusted Langevin"): choose q(x, y) = q(y − g(x)), where g(x) is formed using x and the "logarithmic gradient" of the density of π(dx) at x, scaled by σ²/2 (possibly adjusted; σ² is the jump variance). Behaves badly if the logarithmic gradient tends to zero at infinity; a condition requiring something of the order of "tails must not be too light" (Roberts and Tweedie 1996a).

• The Gibbs' sampler corresponds to the case when proposal distributions are chosen to be conditional distributions: in a simple example, if the state is described by (X1, X2) then propose a new X1 according to L(X1 | X2) (under the equilibrium distribution) and vice versa. The continuous-time algorithms for simulation of point patterns using birth-and-death processes and conditional intensities are examples of this.

• Slice sampler: the task is to draw from a density f(x). Here is a one-dimensional example! (But note, this is only interesting because it can be made to work in many dimensions . . . ) Suppose f is unimodal.
Define g0(y), g1(y) implicitly by the requirement that [g0(y), g1(y)] is the superlevel set {x : f(x) ≥ y}.

def mcmc_slice_move(x):
    y ← Uniform(0, f(x))
    return Uniform(g0(y), g1(y))

Figure 2.1: Slice sampler.

Roberts and Rosenthal (1999, Theorem 12) show rapid convergence (of the order of 530 iterations!) under a specific variation of log-concavity. This relates to the notion of "auxiliary variables".

• Simulated tempering:

– Embed the target distribution in a family of distributions indexed by some "temperature" parameter. For example, embed the density f(x) (defined over a bounded region for x) in the family of normalized densities

(f(x))^{1/τ} / ∫ (f(u))^{1/τ} du ;

– Choose weights β_0, β_1, . . . , β_k for selected temperatures τ_0 = 1 < τ_1 < . . . < τ_k;

– Now do Metropolis-Hastings for the target density

β_0 (f(x))^{1/τ_0} = β_0 f(x)   at level 0,
β_1 (f(x))^{1/τ_1} / ∫ (f(u))^{1/τ_1} du   at level 1,
. . .
β_k (f(x))^{1/τ_k} / ∫ (f(u))^{1/τ_k} du   at level k,

where moves either propose changes in x, or propose changes in the level i. (NB: how to choose the β_i?)

– Finally, sample the chain only when it is at level 0.

• Markov-deterministic hybrid: we need to sample from exp(−E(q))/Z. Introduce "momenta" p and aim to draw from

π(p, q) = exp(−E(q) − |p|²/2) / Z′ .

Fact: for the Hamiltonian H(p, q) = E(q) + |p|²/2, the dynamical system

dq = (∂H/∂p) dt = p dt ,
dp = −(∂H/∂q) dt = −(∂E/∂q) dt

has π(p, q) as invariant distribution (Liouville's theorem). So alternate between (a) evolving using the dynamics and (b) drawing from the Gaussian marginal for p. Discretization means (a) is not exact; one can fix this by use of MH rejection techniques. Cf. dynamic reversibility.
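The slice move pseudocode above can be made concrete for a unimodal target. Here f is a hypothetical illustrative choice, the unnormalised N(0, 1) density, for which the superlevel set {x : f(x) ≥ y} is available in closed form as [−g1, g1] with g1 = sqrt(−2 log y).

```python
import math
import random

def f(x):
    """Unnormalised standard Gaussian density (illustrative target)."""
    return math.exp(-x * x / 2.0)

def slice_move(x, rng):
    """One slice-sampler move: uniform height, then uniform on the slice."""
    y = f(x) * (1.0 - rng.random())     # Uniform on (0, f(x)]; avoids log(0)
    g1 = math.sqrt(-2.0 * math.log(y))  # superlevel set is [-g1, g1]
    return rng.uniform(-g1, g1)

rng = random.Random(0)
x, s, s2, n_iter = 0.0, 0.0, 0.0, 200000
for _ in range(n_iter):
    x = slice_move(x, rng)
    s += x
    s2 += x * x
mean, var = s / n_iter, s2 / n_iter - (s / n_iter) ** 2
```

The sample moments land close to the N(0, 1) values, reflecting the rapid mixing the slice sampler enjoys for log-concave targets.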
• Dynamic importance weighting (Liu, Liang, and Wong 2001): augment the Markov chain X by adding an importance weight W; replace the search for equilibrium (E[θ(X_n)] → ∫ θ(x) π(dx)) by the requirement

lim_{n→∞} E[W_n θ(X_n)] ∝ ∫ θ(x) π(dx) .

• Evolutionary Monte Carlo (Liang and Wong 2000, 2001): use ideas from genetic algorithms (multiple samples, "exchanging" sub-components) and tempering.

• . . . and many more ideas besides.

2.6. Speed of convergence

There is of course an issue as to how fast the convergence to equilibrium is. This can be analytically challenging (Diaconis 1988)!

2.6.1. Convergence and symmetry: card shuffling

Example: the transposition random walk X on the permutation group of n cards. Equilibrium occurs at strong stationary times T (Aldous and Diaconis 1987; Diaconis and Fill 1990):

Definition 2.10 The random stopping time T is a strong stationary time for the Markov chain X (whose equilibrium distribution is π) if

P[T = k, X_k = s] = π_s × P[T = k] .

Broder: the notion of checked cards. Transpose by choosing 2 cards at random, LH and RH. Check the card pointed to by LH if either LH, RH are the same unchecked card, or LH is unchecked and RH is checked.

Inductive claim: given the number of checked cards, the positions in the pack of the checked cards, and the list of values of the cards, the map determining which checked card has which value is uniformly random.

Set T_m to be the time till m cards are checked. Then T = T_n = Σ_{m=0}^{n−1} (T_{m+1} − T_m), a sum of independent increments, is a strong stationary time, by induction. How big is T?

Now T_{m+1} − T_m is Geometrically distributed with success probability (n − m)(m + 1)/n². So the mean of T is

E[T] = Σ_{m=0}^{n−1} n²/((n − m)(m + 1)) = (n²/(n + 1)) Σ_{m=0}^{n−1} (1/(n − m) + 1/(m + 1)) ≈ 2n log n .

MOREOVER: T is a discrete version of the time to complete infection for a simple epidemic (without recovery) in n individuals, starting with 1 infective, with individuals contacting at rate n². A classic calculation tells us T/(2n log n) has a limiting distribution.

NOTE: group representation theory shows the correct asymptotic for mixing is (1/2) n log n.

The MCMC approach to substitution decoding also produces a Markov chain moving by transpositions on a permutation group of around n = 60 symbols. However it takes much longer to reach equilibrium than the order of 60 log(60) ≈ 250 transpositions. Conditioning on data can have a drastic effect on convergence rates!

2.6.2. Convergence and small sets

The most accessible theory (as expounded for example in Meyn and Tweedie 1993) uses the notion of small sets for Markov chains. In effect, small set theory allows us to argue as if (suitably regular) Markov chains were actually discrete, even if defined on very general state-spaces. In fact this discreteness can even be formalized as the existence of a latent discrete structure.

Definition 2.11 C is a small set (of order k) if there is a positive integer k, a positive constant c, and a probability measure µ such that for all x ∈ X and E ⊆ X

p^{(k)}(x, E) ≥ c × I[x ∈ C] µ(E) .

Markov chains can be viewed as regenerating at small sets. The notion is useful for recurrence conditions of Lyapunov type: compare the low-tech "empty pattern" method. Small sets play a crucial rôle in notions of periodicity, recurrence, positive recurrence, and geometric ergodicity, making the link to discrete state-space theory (Meyn and Tweedie 1993).
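The minorisation of Definition 2.11 can be exhibited numerically for a concrete chain. The sketch below uses a hypothetical Gaussian AR(1) chain (all parameter choices illustrative, and k = 1) and computes a valid constant c for the candidate small set C = [−1, 1] by integrating the pointwise infimum of the transition density over C.

```python
import math

# Hypothetical AR(1) chain X' = X/2 + sigma*Z with transition density p.
sigma = 0.5

def p(x, y):
    """Transition density of the AR(1) chain."""
    z = (y - x / 2.0) / sigma
    return math.exp(-z * z / 2.0) / (sigma * math.sqrt(2.0 * math.pi))

# Candidate small set C = [-1, 1].  Compute h(y) = inf over x in C of
# p(x, y) on a grid (the infimum is attained at an endpoint of C); then
# p(x, E) >= c * mu(E) for all x in C, with c the integral of h and
# mu = h/c the normalised minorising probability measure.
xs = [-1.0 + i * 0.05 for i in range(41)]
ys = [-4.0 + i * 0.02 for i in range(401)]
h = [min(p(x, y) for x in xs) for y in ys]
c = sum(h) * 0.02   # Riemann-sum approximation of the minorisation constant
```

For this chain c comes out close to 2Φ(−1) ≈ 0.32: roughly a third of the transition mass regenerates from µ on every step from C, which is exactly the regeneration structure the text describes.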
Foster-Lyapunov recurrence on small sets (Mykland, Tierney, and Yu 1995; Roberts and Tweedie 1996a; Roberts and Tweedie 1996b; Roberts and Rosenthal 2000; Roberts and Rosenthal 1999; Roberts and Polson 1994):

Theorem 2.12 (Meyn and Tweedie 1993) Positive recurrence holds if one can find a small set C, a constant β > 0, and a non-negative function V bounded on C such that for all n > 0

E[V(X_{n+1}) | X_n] ≤ V(X_n) − 1 + β I[X_n ∈ C] .   (2.9)

Proof: Let N be the random time at which X first (re-)visits C. It suffices³ to show E[N | X_0] ≤ V(X_0) + constant < ∞ (then use small-set regeneration). By iteration of (2.9), we deduce E[V(X_n) | X_0] < ∞ for all n. If X_0 ∉ C then (2.9) tells us that

n ↦ V(X_{n∧N}) + n ∧ N

defines a nonnegative supermartingale (since I[X_{n∧N} ∈ C] = 0 if n < N). Consequently

E[N | X_0] ≤ E[V(X_N) + N | X_0] ≤ V(X_0) .

³ This martingale approach can be reformulated as an application of Dynkin's formula.

If X_0 ∈ C then the above can be used to show
E[N | X_0] = E[1 × I[X_1 ∈ C] | X_0] + E[E[N | X_1] I[X_1 ∉ C] | X_0]
≤ P[X_1 ∈ C | X_0] + E[1 + V(X_1) | X_0]
≤ P[X_1 ∈ C | X_0] + V(X_0) + β ,

where the last step uses Inequality (2.9) applied when I[X_{n−1} ∈ C] = 1.

This idea can be extended to give bounds on convergence rates.

Theorem 2.13 (Meyn and Tweedie 1993) Geometric ergodicity holds if one can find a small set C, positive constants λ < 1, β, and a function V ≥ 1 bounded on C such that

E[V(X_{n+1}) | X_n] ≤ λV(X_n) + β I[X_n ∈ C] .   (2.10)

Proof: Define N as in Theorem 2.12. Iterating (2.10), we deduce E[V(X_n) | X_0] < ∞ and more specifically that

n ↦ V(X_{n∧N}) / λ^{n∧N}

is a nonnegative supermartingale. Consequently

E[V(X_N)/λ^N | X_0] ≤ V(X_0) .

Using the facts that V ≥ 1, λ ∈ (0, 1) and Markov's inequality, we deduce

P[N > n | X_0] ≤ λ^n V(X_0) ,

which delivers the required geometric ergodicity.

Bounds may not be tight, but there are many small sets for most Markov chains (not just the trivial singleton sets, either!). In fact small sets can be used to show:

Theorem 2.14 (Kendall and Montana 2002) If the Markov chain has a measurable transition density p(x, y) then the two-step density p^{(2)}(x, y) can be expressed (non-uniquely) as a non-negative countable sum

p^{(2)}(x, y) = Σ_i f_i(x) g_i(y) .

Hence in such cases small sets of order 2 abound! This relates to the classical theory of small sets and would not surprise its originators; see for example Nummelin 1984.
Home Page Title Page Contents JJ II J I Chapter 3 Perfect simulation Page 79 of 167 Contents Go Back Full Screen Close Quit 3.1 3.2 3.3 CFTP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fill’s (1998) method . . . . . . . . . . . . . . . . . . . . . Sundry other matters . . . . . . . . . . . . . . . . . . . . 81 97 99 Home Page Title Page Contents JJ II J I Page 80 of 167 Go Back Full Screen Close Quit Establishing bounds on convergence to equilibrium is important if we are to get a sense of how to do MCMC. It would be very useful if one could finesse the whole issue by modifying the MCMC algorithm so as to produce an exact draw from equilibrium, and this is precisely the idea of perfect simulation. A graphic illustration of this in the context of stochastic geometry (specifically, the dead leaves model) is to be found at http://www.warwick.ac.uk/statsdept/staff/WSK/dead.html. 3.1. CFTP Home Page Title Page Contents JJ II J I Page 81 of 167 Go Back Full Screen Close Quit There are two main approaches to perfect simulation. One is Coupling from the Past (CFTP), introduced by Propp and Wilson (1996). Many variations have arisen: we will describe the original Classic CFTP, and two variants Small-set CFTP and Dominated CFTP. Part of the fascination of this area is the interplay between theory and actual implementation: we will present illustrative simulations. 3.1.1. Classic CFTP Home Page Title Page Consider reflecting random walk run From The Past (Propp and Wilson 1996): • no fixed start: run all starts simultaneously (Coupling From The Past); • extend past till all starts coalesce by t = 0; Contents JJ II J I • 3-line proof: perfect draw from equilibrium. Page 82 of 167 Go Back Figure 3.1: Classic CFTP. Implementation: - “re-use” randomness; Full Screen Close - sample at t = 0 not at coalescence; - control coupling efficiently using monotonicity, boundedness; - uniformly ergodic. 
Here is the 3-line proof, with a couple of extra lines to help explain!

Theorem 3.1 If coalescence is almost sure then CFTP delivers a sample from the equilibrium distribution of the Markov chain X corresponding to the random input-output maps F_(−u,v].

Proof: For each time-range [−n, ∞) use input-output maps F_(−n,t] to define

X_t^{−n} = F_(−n,t](0) for −n ≤ t.

We have assumed a finite coalescence time −T for F. Then

X_0^{−n} = X_0^{−T} whenever −n ≤ −T;  (3.1)
L(X_0^{−n}) = L(X_n^0).  (3.2)

If X converges to an equilibrium π then

dist(L(X_0^{−T}), π) = lim_n dist(L(X_0^{−n}), π) = lim_n dist(L(X_n^0), π) = 0  (3.3)

(dist is total variation), hence giving the required result.

Here is a simple version of perfect simulation applied to the Ising model, following Propp and Wilson (1996) (but they do something much cleverer!). Here the probability of a given ±1-valued configuration S is proportional to

exp(β Σ_{x∼y} S_x S_y)

and our recipe is: pick a site x, compute the conditional probability of S_x = +1 given the rest of the configuration S,

ρ = exp(β Σ_{y:y∼x} (−1) × S_y) / exp(β Σ_{y:y∼x} (+1) × S_y) = exp(−2β Σ_{y:y∼x} S_y),

and set S_x = +1 with probability ρ, else set S_x = −1.

Boundary conditions are "free" in the simulation: for small β we can make a perfect draw from the Ising model over the infinite lattice, by coupling in space as well as time (Kendall 1997b; Häggström and Steif 2000).

We can do better for Ising: Huber (2003) shows how to convert Swendsen-Wang methods to CFTP.

3.1.2. A little history

Kolmogorov (1936): Consider an inhomogeneous (finite state) Markov transition kernel P_ij(s, t). Can it be represented as the kernel of a Markov chain begun in the indefinite past? Yes!
Apply diagonalization arguments to select a convergent subsequence from the probability systems arising from progressively earlier and earlier starts:

P[X(0) = k] = Q_k(0)
P[X(−1) = j, X(0) = k] = Q_j(−1) P_jk(−1, 0)
P[X(−2) = i, X(−1) = j, X(0) = k] = Q_i(−2) P_ij(−2, −1) P_jk(−1, 0)
. . .

(Thanks to Thorisson (2000) for this reference . . . .)

3.1.3. Dead Leaves

Nearly the simplest possible example of perfect simulation.
• Dead leaves falling in the forests at Fontainebleau;
• equilibrium distribution models complex images;
• computationally amenable even in the classic MCMC framework;
• but can be sampled perfectly with no extra effort! (And uniformly ergodic.)

How can this be done? See Kendall and Thönnes (1999). [HINT: try to think non-anthropomorphically . . . .] Occlusion CFTP. Related work: Jeulin (1997), Lee, Mumford, and Huang (2001).

Figure 3.2: Falling leaves of perfection.

3.1.4. Small-set CFTP

Does CFTP depend on monotonicity? No: we can use small sets. Consider a simple Markov chain on [0, 1] with triangular kernel.

Figure 3.3: Small-set CFTP.

• Note the "fixed" small triangle;
• this can couple Markov chain steps;
• basis of small-set CFTP (Murdoch and Green 1998).

Extends to more general cases ("partitioned multi-gamma sampler"). Uniformly ergodic.

Exercise 3.2 Write out the details of small-set CFTP. Deduce how to represent the output of simple small-set CFTP (which uses just one small set) in terms of a Geometric random variable and draws from a single kernel.

Exercise 3.3 One small set is usually not enough: coalescence takes much too long!
Work out how to use several small sets to do partitioned small-set CFTP (the "partitioned multi-gamma sampler") (Murdoch and Green 1998).

3.1.5. Point patterns

Recall the Widom-Rowlinson model (Widom and Rowlinson 1970) from physics, the "area-interaction" model in statistics (Baddeley and van Lieshout 1995). The point pattern X has density (with respect to a Poisson process):

exp(−log(γ) × (area of union of discs centred on X)).

Figure 3.4: Perfect area-interaction.

• Represent as a two-type pattern (red crosses, blue disks);
• inspect structure to detect monotonicity;
• attractive case γ > 1: CFTP uses a "bounded" state-space (Häggström et al. 1999), in the sense of being uniformly ergodic.

All cases can be dealt with using Dominated CFTP (Kendall 1998).

3.1.6. Dominated CFTP

Foss and Tweedie (1998) show classic CFTP is essentially equivalent to uniform ergodicity (rate of convergence to equilibrium does not depend on the start-point). Classic CFTP needs vertical coalescence: every possible start from time −T leads to the same result at time 0. Can we lift this uniform ergodicity requirement?

Figure 3.5: Horizontal CFTP.

CFTP almost works with horizontal coalescence: all sufficiently early starts from a specific location lead to the same result at time 0. But how to identify when this has happened?

Recall Lindley's equation for the GI/G/1 queue,

W_{n+1} = max{0, W_n + S_n − X_{n+1}} = max{0, W_n + η_n},

and its development

W_{n+1} = max{0, η_n, η_n + η_{n−1}, . . . , η_n + η_{n−1} + . . . + η_1},

leading by time-reversal to the equality in distribution

W_{n+1} =_d max{0, η_1, η_1 + η_2, . . . , η_1 + η_2 + . . . + η_n},

and thus the steady-state expression

W_∞ = max{0, η_1, η_1 + η_2, . . .}.
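Lindley's development can be checked numerically; the sketch below (the negative-drift innovation distribution is an arbitrary choice of ours) iterates the recursion and compares it with the time-reversed maximum of partial sums:

```python
import random

# Numerical check of the Lindley development above: iterating
# W_{n+1} = max(0, W_n + eta_n) from W_0 = 0 agrees with the
# time-reversed maximum max(0, eta_n, eta_n + eta_{n-1}, ...).
rng = random.Random(7)
etas = [rng.uniform(-1.0, 0.5) for _ in range(200)]   # negative drift

W = 0.0
for eta in etas:                     # Lindley recursion, W_0 = 0
    W = max(0.0, W + eta)

partial, best = 0.0, 0.0
for eta in reversed(etas):           # eta_n, eta_n + eta_{n-1}, ...
    partial += eta
    best = max(best, partial)
```

The negative drift is what makes the steady-state maximum W_∞ finite; with the time indices reversed, extending the past can only increase the running maximum, which is exactly the monotone structure Dominated CFTP exploits.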
Loynes (1962) discovered a coupling application to queues with general inputs, which pre-figures CFTP.

Dominated CFTP idea: generate target chains using a dominating process for which equilibrium is known. Domination allows us to identify horizontal coalescence by checking starts from maxima given by the dominating process. Thus we can apply CFTP to Markov chains which are merely geometrically ergodic (Kendall 1998; Kendall and Møller 2000; Kendall and Thönnes 1999; Cai and Kendall 2002) or worse (geometric ergodicity ≠ Dominated CFTP!).

(a) Perfect Strauss point process. (b) Perfect conditioned Boolean model.
Figure 3.6: Two examples of non-uniformly ergodic CFTP.

Description^2 of dominated CFTP (Kendall 1998; Kendall and Møller 2000) for the birth-death algorithm for a point process satisfying the local stability condition λ*(x; u) ≤ γ.

Original algorithm: each point η of the pattern x dies at unit death rate; points are born at total birth rate ∫_W λ*(x; u) du, with density λ*(x; ·)/∫_W λ*(x; u) du, so the local birth rate at ξ is λ*(x; ξ).

Perfect modification: the underlying supply of randomness is a birth-and-death process with unit death rate and uniform birth rate γ per unit area. Mark points independently with (for example) Uniform(0, 1) marks. Define lower and upper processes L and U as exhibiting the minimal and maximal birth rates available over the range of configurations between L and U:

minimal birth rate at u = min{λ*(ξ; u) : L ⊆ ξ ⊆ U},
maximal birth rate at u = max{λ*(ξ; u) : L ⊆ ξ ⊆ U}.

Use the marks to do CFTP till the current iteration delivers L(0) = U(0), then return the agreed configuration! The example uses Poisson point patterns as marks for area-interaction, following Kendall (1997a).

^2 Recall remark about dependent thinning!
"It's a thinning procedure, Jim, but not as we know it." Dr McCoy, Star Trek, stardate unknown.

All this suggests a general formulation. Let X be a Markov chain on X which exhibits equilibrium behaviour. Embed the state-space X in a partially ordered space (Y, ⪯) so that X is at the bottom of Y, in the sense that for any y ∈ Y, x ∈ X,

y ⪯ x implies y = x.

We may then use Theorem 3.1 to show:

Theorem 3.4 Define a Markov chain Y on Y such that Y evolves as X after it hits X; let Y(−u, t) be the value at time t of a version of Y begun at time −u,

(a) of fixed initial distribution L(Y(−T, −T)) = L(Y(0, 0)), and
(b) obeying funnelling: if −v ≤ −u ≤ t then Y(−v, t) ⪯ Y(−u, t).

Suppose coalescence occurs: P[Y(−T, 0) ∈ X] → 1 as T → ∞. Then lim Y(−T, 0) can be used for a CFTP draw from the equilibrium of X.

So how far can we go? It looks nearly possible to define the dominating process using the "Liapunov function" V of Theorem 2.13. There is an interesting link (Rosenthal (2002) draws it even closer); nevertheless:

Existence of a Liapunov function doesn't ensure dominated CFTP! The relevant Equation (2.10) is:

E[V(X_{n+1}) | X_n] ≤ λ V(X_n) + β I[X_n ∈ C].

However there are perverse examples satisfying the supermartingale inequality, but failing the stochastic dominance required to use V(X) for dominated CFTP . . . :-(.^3

^3 Later: I have discovered how to fix this using sub-sampling . . .

3.2. Fill's (1998) method

The alternative to CFTP is Fill's algorithm (Fill 1998; Thönnes 1999), at first sight quite different, based on the notion of a strong stationary time T. Fill et al. (2000) show a profound link, best explained using "blocks" as input-output maps for a chain.
Figure 3.7: The "block" model for input-output maps.

First recall that CFTP can be viewed in a curiously redundant fashion as follows:
• draw from equilibrium X(−T) and run forwards;
• continue to increase T until X(0) is coalesced;
• return X(0).

Figure 3.8: Conditioning the blocks on the reversed chain.

Key observation: by construction, X(−T) is independent of X(0) and T, so . . .
• condition on a convenient X(0);
• run X backwards to a fixed time −T;
• draw blocks conditioned on the X transitions;
• if coalescence then return X(−T), else repeat.

Exercise 3.5 Can you figure out how to do a dominated version of Fill's method?

Exercise 3.6 Produce a perfect version of the slice sampler! (Mira, Møller, and Roberts 2001)

3.3. Sundry other matters

Layered Multishift: Wilson (2000b) and further work by Corcoran and others. Basic idea: how to draw simultaneously from Uniform(x, x + 1) for all x ∈ R, and to try to couple the draws? Answer: draw a uniformly random unit-span integer lattice, . . . .^4

Exercise 3.7 Figure out how to replace "Uniform(x, x + 1)" by any other location-shifted family of continuous densities. (Hint: mixtures of uniforms . . . .)

Montana used these ideas to do perfect simulation of a nonlinear AR(1) in his PhD thesis.

^4 Uniformly random lattices in dimension 2 play a part in line process theory.

Read-once randomness: Wilson (2000a). The clue is in:

Exercise 3.8 Figure out how to do CFTP, using the "input-output" blocks picture of Figure 3.7, but under the constraint that you aren't allowed to save any blocks!

Perfect simulation for Peierls contours (Ferrari et al. 2002).
The Ising model can be reformulated in an important way by looking only at the contours (lines separating ±1 values). In fact these form a "non-interacting hard-core gas", thus permitting (at least in theory) Ferrari et al. (2002) to apply their variant of perfect simulation (the Backwards-Forwards Algorithm)! Compare also Propp and Wilson (1996).

Perfect simulated tempering (Møller and Nicholls 1999; Brooks et al. 2002).

3.3.1. Efficiency

Burdzy and Kendall (2000): coupling of couplings:

|p_t(x_1, y) − p_t(x_2, y)| = |P[X_1(t) = y | X_1(0) = x_1] − P[X_2(t) = y | X_2(0) = x_2]|
≤ |P[X_1(t) = y | τ > t, X_1(0) = x_1] − P[X_2(t) = y | τ > t, X_2(0) = x_2]| × P[τ > t | X(0) = (x_1, x_2)].

Suppose |p_t(x_1, y) − p_t(x_2, y)| ≈ c exp(−μ_2 t) while P[τ > t | X(0) = (x_1, x_2)] ≈ c exp(−μt). Let X* be a coupled copy of X but begun at (x_2, x_1):

|P[X_1(t) = y | τ > t, X_1(0) = x_1] − P[X_2(t) = y | τ > t, X_2(0) = x_2]|
= |P[X_1(t) = y | τ > t, X(0) = (x_1, x_2)] − P[X_1*(t) = y | τ > t, X*(0) = (x_2, x_1)]|.

So let σ be the time when X, X* couple:

. . . ≤ P[σ > t | τ > t, X(0) = (x_1, x_2)] (≈ c exp(−μ′t)).

Thus μ_2 ≥ μ′ + μ. See also Kumar and Ramesh (2001).

3.3.2. Example: perfect epidemic

Context
• many models: SIR, SEIR, Reed-Frost;
• data: knowledge about removals;
• draw inferences about other aspects:
  – infection times
  – infectivity parameter
  – removal rate
  – . . .

Perfect MCMC for the Reed-Frost model
• based on an explicit formula for the size of the epidemic
• (O'Neill 2001; Clancy and O'Neill 2002)

Question
Can we apply perfect simulation to an SIR epidemic conditioned on removals?
• IDEA
  – SIR → compartment model;
  – conditioning on removals is reminiscent of coverage conditioning for the Boolean model;
• consider an SIR-epidemic-valued process, considered as an event-driven simulation, driven by birth-and-event processes producing infection and removal events . . .
• then use reversibility, perpetuation, crossover to derive a perfect simulation algorithm for the conditioned case.

• PRO: conditioning on removals appears to make the structure "monotonic"
• CON: perpetuation and the nonlinear infectivity rate combine to make funnelling difficult
• So: attach independent streams of infection proposals to each level k = I + R to "localize" perpetuation
• Speed-up by cycling through:
  – replace all unobserved marked removal proposals (thresholds);
  – try to alter all observed removal marks (perpetuate . . . );
  – replace all infection proposals (perpetuate . . . ).

(Reminiscent of the H-vL-M method for perfect simulation of the attractive area-interaction point process.)

Consistent theoretical framework

View Poisson processes in time as 0/1-valued processes over "infinitesimal pixels":

P[X(infinitesimal pixel) = 1] = λ × measure(infinitesimal pixel).

• replace thresholds: work sequentially through "infinitesimal pixels", rejecting a replacement if it violates the conditioning;
• replace removal marks: this can be incorporated into the above step;
• replace infection proposals: work sequentially through "infinitesimal pixels", first at level 1, then level 2, then level 3 . . . (TV scan); reject a replacement if it violates the conditioning.
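This "infinitesimal pixels" picture is easy to simulate directly. The sketch below (rate, horizon and pixel width are arbitrary illustrative choices of ours) approximates a Poisson process in time by independent 0/1 pixel values:

```python
import random

# "Infinitesimal pixels" sketch: a Poisson process of rate lam on [0, T]
# approximated by independent 0/1 pixels of width h, with
# P[pixel = 1] = lam * h.  All parameter values are illustrative.
def pixel_poisson(lam, T, h, rng):
    n = int(T / h)
    return [i * h for i in range(n) if rng.random() < lam * h]

rng = random.Random(1)
pts = pixel_poisson(2.0, 100.0, 0.001, rng)   # mean count lam * T = 200
```

The pixel-by-pixel independence is what makes the sequential replacement sweeps above legitimate: each pixel can be resampled on its own, subject only to the conditioning.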
Theorem 3.9 The resulting construction is monotonic (so enables CFTP).

Proof: Establish monotonicity for
• the fastest path of infection (what is observed);
• the slowest feasible path of infection (what is used for perpetuation).

Implementation: it works!
• speed-up? • scaling? • refine? • generalize?

Applications
• "omnithermal" variant to produce an estimate of the likelihood surface;
• generalization using crossover to produce a full Bayesian analysis.

Chapter 4
Deformable templates and vascular trees

Contents
4.1 MCMC and statistical image analysis . . . 108
4.2 Deformable templates . . . 110
4.3 Vascular systems . . . 111

4.1. MCMC and statistical image analysis

How to use MCMC on images? Huge literature, beginning with Geman and Geman (1984). Simple example: reconstruct a white/black ±1 image: observation X_ij = ±1, "true image" S_ij = ±1, X|S independent with

P[X_ij = x_ij | S_ij] = exp(φ S_ij x_ij) / (2 cosh(φ)),

while the prior for S is an Ising model, probability mass function proportional to

exp(β Σ_{(i,j)∼(i′,j′)} S_ij S_i′j′).

The posterior for S|X is proportional to

exp(β Σ_{(i,j)∼(i′,j′)} S_ij S_i′j′ + φ Σ_ij X_ij S_ij)

(an Ising model with inhomogeneous external magnetic field φX), and we can sample from this using MCMC algorithms. Easy CFTP for this example.

"Low-level image analysis": contrast "high-level image analysis". For example, take a pixel array {(i, j)} representing small squares in a window [0, 1]^2, and replace S by a Boolean model (Baddeley and van Lieshout 1993).

Figure 4.1: Scheme for high-level image analysis.
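A heat-bath (Gibbs) sampler for this posterior can be sketched as follows; the grid size, free boundary conditions and parameter values are illustrative assumptions of ours, and this is plain MCMC rather than the CFTP version:

```python
import math
import random

# Gibbs-sampler sketch for the posterior above: an Ising model with
# inhomogeneous external field phi * X.  Free boundary, small grid.
def gibbs_denoise(X, beta, phi, sweeps, rng):
    n = len(X)
    S = [row[:] for row in X]                  # start from the data
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # local field: beta * (sum of neighbour spins) + phi * X_ij
                h = beta * sum(S[a][b]
                               for a, b in ((i - 1, j), (i + 1, j),
                                            (i, j - 1), (i, j + 1))
                               if 0 <= a < n and 0 <= b < n)
                h += phi * X[i][j]
                # heat-bath conditional probability of S_ij = +1
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * h))
                S[i][j] = 1 if rng.random() < p_plus else -1
    return S

rng = random.Random(0)
X = [[1] * 8 for _ in range(8)]                # "true" all-white image ...
for i, j in ((2, 3), (5, 5), (6, 1)):          # ... with a few noisy pixels
    X[i][j] = -1
S = gibbs_denoise(X, beta=0.8, phi=0.7, sweeps=20, rng=rng)
```

With these (fairly strong) parameter values the posterior concentrates on configurations agreeing with the bulk of the data, so the isolated noisy pixels tend to be flipped back.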
Van Lieshout and Baddeley (2001) and Loizeaux and McKeague (2001) provide perfect examples.

4.2. Deformable templates

Grenander and Miller (1994): to answer "how can knowledge in complex systems be represented?" use "deformable templates" (image algebras).
• configuration space;
• configurations (prior probability measure from interactions) → image;
• groups of similarities allow deformations;
• template is the representation within a similarity equivalence class;
• "jump-diffusion": analogue of Metropolis-Hastings.

Besag makes a link in the discussion of Grenander and Miller (1994), suggesting discrete-time Langevin-Metropolis-Hastings moves. Expect to need long MCMC runs; Grenander and Miller (1994) use SIMD technology.

4.3. Vascular systems

We illustrate using an application to vascular structure in medical images (Bhalerao et al. 2001; Thönnes et al. 2002, 2003). Figures:

Figure 4.2(a): cerebral vascular system (application: preplanning of surgical procedures).
Figure 4.2(b): retinal vascular system (application: diabetic retinopathy).

(a) AVI movie of cerebral vascular system. (b) Retina vascular system: 400^2 pixel image.
Figure 4.2: Examples of vascular structures.

Plan: use random forests of planar (or spatial!) trees as configurations, since the underlying vascular system is nearly (but not exactly!) tree-like. Note: our tree edges are graph vertices in the Grenander-Miller formalism. Vary tree number, topologies and geometries using Metropolis-Hastings moves. Hence sample from the posterior so as to partly reconstruct the vascular system.
Issues:
• figuring out effective reduction of the very large datasets which can be involved, and how to link the model to the data;
• careful formulation of the prior stochastic geometry model;
• choice of the moves so as not to slow down the MCMC algorithm;
• use of multiscale information.

4.3.1. Data

For computational feasibility, use the multiresolution Fourier transform (Wilson, Calway, and Pearson 1992), based on dyadic recursive partitioning using a quadtree. (Compare wavelets.)

Figure 4.3: Multiresolution summary (planar case!).

Partitioning: based on a mean-squared error criterion and the extent to which the summarizing feature is elongated; an alternative approach uses Akaike's information criterion (Li and Bhalerao 2003). Figure 4.4(a) shows a typical feature, which in the planar case is a weighted bivariate Gaussian kernel, while Figure 4.4(b) shows the result of summarizing a 400 × 400 pixel retinal image using 1000 weighted bivariate Gaussian kernels.

(a) A typical feature for the MFT. (b) 1000 kernels for a retina.
Figure 4.4: Decomposition into features using quadtree.

This procedure can be viewed as a spatial variant of a treed model (Chipman et al. 2002), a variation on CART (Breiman et al. 1984) in which different models are fitted to divisions of the data into disjoint subsets, with further divisions being made recursively until satisfactory models are obtained. Thus subsequent Bayesian inference can be viewed as conditional on the treed model inference.

Figure 4.5: Final result of the Multiresolution Fourier Transform.

4.3.2. Likelihood

Link the summarized data (a mixture of weighted bivariate Gaussian kernels k(x, y, e)) to the configuration, assuming additive Gaussian noise:

f(x, y) = Σ_{τ∈F} Σ_{e∈τ} k(x, y, e) + noise  →  f̃(x, y) = Σ_{e∈E} k(x, y, e).

Split the likelihood by rectangles R_i:

ℓ(F | f̃) = Σ_{i=1}^N ℓ_i(F | f̃) + constant,

ℓ_i(F | f̃) = −(1/2σ^2) Σ_{(x,y)∈R_i} [f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e)]^2
≈ −(1/2σ^2) ∫∫ [f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e)]^2 h_i(x, y) dx dy.

Extend from R_i to all of R^2, enabling closed-form computation:

ℓ_i(F | f̃) ≈ −(1/2σ^2) ∫∫_{R^2} [f̃(x, y) − Σ_{τ∈F} Σ_{e∈τ} k(x, y, e)]^2 g_i(x, y) dx dy.  (4.1)

4.3.3. Prior

Prior using AR(1) processes for location and width, extending along forests of trees τ produced by subcritical binary Galton-Watson processes:

length_e = α × length_e′ + Gaussian innovation  (4.2)

for autoregressive parameter α ∈ (0, 1).

(a) Sub-critical binary branching process. (b) AR(1) process on edges. (c) Each edge represented by a weighted Gaussian kernel.
Figure 4.6: Components of the prior.

4.3.4. MCMC implementation

We have now formulated likelihood and model. It remains to decide on a suitable MCMC process.

Figure 4.7: Refined scheme for high-level image analysis.

Changes are local (hence slow). Compute Metropolis-Hastings accept-reject probabilities using the AR(1) location distributions and

P[τ] = Π_{n=0,1,2} Π_{v∈τ: v has n children} P[node has n children].  (4.3)

Figure 4.8: Reversible jump MCMC proposals.

Computations for topological moves.
Suppose the family-size distribution is

(1 − p)^2, 2p(1 − p), p^2

(for 0, 1, 2 children respectively). The Hastings ratio for adding a specified leaf ℓ ∈ ∂_ext(τ) is

H(τ; ℓ) = [P[τ ∪ {ℓ}] / #(∂_int(τ ∪ {ℓ}) ∪ ∂_ext(τ ∪ {ℓ}))] / [P[τ] / #(∂_int(τ) ∪ ∂_ext(τ))].

Use induction on τ: P[τ ∪ {ℓ}] = P[τ] p(1 − p), . . .

H(τ; ℓ) = p(1 − p) × (1 + #(V(τ)) + #(∂_ext(τ))) / (2 + #(V(τ)) + #(∂_ext(τ)) [+1]),

where the +1 is added to the denominator if ℓ is added to a vertex which already has one daughter. It is clear that this Hastings ratio is always smaller than p(1 − p), and is approximately equal to p(1 − p) for large trees τ. Consequently removals of leaves will always be accepted, and additions will usually be accepted, but there is a computational slowdown as the tree size increases.

Geometric ergodicity is impossible: consider the second eigenvalue λ_2 of the Markov kernel, by Rayleigh-Ritz:

1 − λ_2 = inf_{φ≠0} [Σ_x Σ_y |φ(x) − φ(y)|^2 p(x, y) π(y)] / [Σ_x Σ_y |φ(x) − φ(y)|^2 π(x) π(y)]
≤ [2 Σ_{x∈S} Σ_{y∈S^c} p(x, y) π(y)] / π(S)

whenever π(S) < 1/2. Taking S to be the set of all trees of depth n or more, and using the fact that such trees must have at least n sites, it follows that

1 − λ_2 ≤ 2 (p(1 − p))^n → 0.

Experiments:

MCMC for the prior model

First we implement only the moves relating to a single tree, and only to that tree's topology. We change the tree just one leaf at a time. Convergence is clearly very slow! Moreover the animation clearly indicates the problem is to do with incipient criticality phenomena.

MCMC for the prior model with more global changes

Moves dealing with sub-trees will speed things up greatly! We won't use these as such; however we will be working with a whole forest of such trees, and using more "global" moves of join and divide.
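The leaf-addition Hastings ratio above reduces to a one-line function; a sketch (the argument names are our own: n_vertices for #(V(τ)), n_ext for #(∂_ext(τ)), and sibling flags whether the leaf joins a vertex that already has one daughter), confirming that the ratio stays below p(1 − p) and approaches it for large trees:

```python
# Sketch of the leaf-addition Hastings ratio H(tau; l) derived above.
def hastings_ratio(p, n_vertices, n_ext, sibling):
    denom = 2 + n_vertices + n_ext + (1 if sibling else 0)
    return p * (1 - p) * (1 + n_vertices + n_ext) / denom

# always below p(1 - p), and close to it for large trees
r_small = hastings_ratio(0.3, 5, 4, False)
r_large = hastings_ratio(0.3, 5000, 4000, False)
```

This makes the computational slowdown concrete: for large trees every proposal is accepted with essentially the same probability, so single-leaf moves explore the space very slowly.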
MCMC for the occupancy-conditioned prior model

Conditioning on data will also help! Here is a crude parody of conditioning (pixels are black/white depending on whether or not they intersect the planar tree).

4.3.5. Results

Figure 4.9 presents snapshots (over about 50k moves) of a first try, using accept/reject probabilities for a Metropolis-Hastings algorithm based on the above prior and likelihood.

Figure 4.9: AVI movie of fixed-width MCMC.

We improve the algorithm by adding

(1) a further AR(1) process to model the width of blood vessels;
(2) a Langevin-related component to the Metropolis-Hastings proposal distribution, biasing the proposal according to local modes of the posterior density (compare Grenander and Miller 1994, and especially Besag's remarks!);
(3) the possibility of proposing addition (and deletion) of features of length 2;
(4) gradual refinement of feature proposals from coarse to fine levels.

Figure 4.10: AVI movies of improved MCMC.

4.3.6. Refining the starting state

We improve matters by choosing a better starting state for the algorithm, using a variant of Kruskal's algorithm. This generates a minimal spanning tree of a graph as follows: let G = (V, E) be a graph with edges weighted by cost. A spanning tree is a subgraph of G which is (a) a tree and (b) spans all vertices. We wish to find a spanning tree τ of minimal cost:

def kruskal(edges):
    edgelist = sorted(edges, key=cost)   # sort so cheapest edge is first
    tree = []
    for e in edgelist:
        if cyclefree(tree, e):           # check no cycle is created
            tree.append(e)
    return tree

(Key remark: at the end, swapping in a new edge e would create a circuit, which must be broken by removal of some edge e′. By construction cost(e) ≥ cost(e′).)
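The cyclefree test is conveniently implemented with a disjoint-sets (union-find) structure, the standard companion to Kruskal's algorithm; below is a sketch in which edges are assumed to be (cost, u, v) tuples:

```python
# Kruskal's algorithm with a union-find cycle check; edges are assumed
# to be (cost, u, v) tuples.
class DisjointSets:
    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False            # u and v already connected: a cycle
        self.parent[ru] = rv
        return True

def kruskal(edges):
    ds = DisjointSets()
    tree = []
    for cost, u, v in sorted(edges):    # cheapest edge first
        if ds.union(u, v):              # keep only cycle-free edges
            tree.append((cost, u, v))
    return tree
```

For example, on a four-cycle a-b-c-d plus a chord a-c, the algorithm keeps the three cheapest acyclic edges and rejects the two that would close circuits.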
Actually we need a binary tree (so amend cyclefree accordingly!), and the cost depends on the edges to which it is proposed to join e; but a heuristic modification of Kruskal's algorithm produces a binary forest which is a good candidate starting position for our algorithm. Figure 4.11 shows a result; notice how the algorithm now begins with many small trees, which are progressively dropped or joined to other trees.

Figure 4.11: AVI movie of MCMC using the starting position provided by Kruskal's algorithm.

Chapter 5
Multiresolution Ising models

Contents
5.1 Generalized quad-trees . . . 129
5.2 Random clusters . . . 133
5.3 Percolation . . . 136
5.4 Comparison . . . 143
5.5 Simulation . . . 145
5.6 Future work . . . 147

The vascular problem discussed above involves multi-resolution image analysis as described, for example, in Wilson and Li (2002). This represents the image on different scales: the simplest example is that of a quad-tree (a tree graph in which each node has four daughters and just one parent), for which each node carries a random value 0 (coding the colour white) or 1 (coding the colour black), and the random values interact in the manner of a Markov point process with their neighbours. In the vascular case the 0/1 value is replaced by a line segment summarizing the linear structure which best explains the local data at the appropriate resolution level, but for now we prefer to analyze the simpler 0/1 case.
Conventional multi-resolution models have interactions only running up and down the tree; however in our case we clearly need interactions between neighbours at the same resolution level as well. This complicates the analysis!

A major question is to determine the extent to which values at high resolution levels are correlated with values at lower resolution levels (Kendall and Wilson 2003). This is related to much recent interest in the question of phase transitions for Ising models on non-Euclidean random graphs. We shall show how geometric arguments lead to a schematic phase portrait.

5.1. Generalized quad-trees

Define Q_d as the graph whose vertices are the cells of all dyadic tessellations of R^d, with edges connecting each cell u to its 2d neighbours, and also to its parent M(u) (the covering cell in the tessellation at the next resolution down) and its 2^d daughters (the cells which it covers at the next resolution up).

Case d = 1: neighbours at the same level are also connected.

Remark: no spatial symmetry! However: Q_d can be constructed using an iterated function system generated by elements of a skew-product group R^+ ⊗_s R^d.

Further define:
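The edge structure of Q_d is easy to code for d = 2; the sketch below assumes a (level, i, j) indexing of the dyadic cells (our convention, not from the text):

```python
# Cell bookkeeping for Q_2, with cells indexed (level, i, j): one parent
# at the next coarser level, 2^d = 4 daughters at the next finer level,
# and 2d = 4 same-level neighbours.
def parent(cell):
    level, i, j = cell
    return (level - 1, i // 2, j // 2)

def daughters(cell):
    level, i, j = cell
    return [(level + 1, 2 * i + a, 2 * j + b) for a in (0, 1) for b in (0, 1)]

def neighbours(cell):
    level, i, j = cell
    return [(level, i + 1, j), (level, i - 1, j),
            (level, i, j + 1), (level, i, j - 1)]
```

The edges of Q_2 at a vertex u are then exactly neighbours(u) together with parent(u) and daughters(u), as in the definition above.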
If Jλ = 0 then the free Ising model on Qd (o) is a branching process (Preston 1977; Spitzer 1975); if Jτ = 0 then the Ising model on Qd (o) decomposes into sequence of d-dimensional classical (finite) Ising models. So we know there is a phase change at (Jλ , Jτ ) = (0, ln(5/3)) (results from branching processes), √ and expect one at (Jλ , Jτ ) = (ln(1 + 2), 0+) (results from 2-dimensional Ising model). But is this all that there is to say? Go Back Full Screen Close Quit Home Page Title Page There has been a lot of work on percolation and Ising models on non-Euclidean graphs (Grimmett and Newman 1990; Newman and Wu 1990; Benjamini and Schramm 1996; Benjamini et al. 1999; Häggström and Peres 1999; Häggström et al. 1999; Lyons 2000; Häggström et al. 2002). Mostly this concerns graphs admitting substantial symmetry (unimodular, quasi-transitive), so that one can use Häggström’s remarkable Mass Transport method: Contents JJ II J I Page 132 of 167 Go Back Full Screen Close Quit Theorem 5.1 Let Γ ⊂ Aut(G) be unimodular and quasi-transitive. Given Rmass transport m(x, y, ω) ≥ 0 “diagonally invariant”, set M (x, y) = m(x, y, ω) dω. Then the expected total transport out of any x equals Ω the expected total transport into x: X X M (x, y) = M (y, x) . y∈Γx y∈Γx Sadly our graphs do not have quasi-transitive symmetry :-(. Furthermore the literature mostly deals with phase transition results “for sufficiently large values of the parameter”. This is an invaluable guide, but image analysis needs actual estimates! 5.2. Random clusters Home Page Title Page Contents JJ II J I Page 133 of 167 Go Back Full Screen Close Quit A similar problem, concerning Ising models on products of trees with Euclidean lattices, is treated by Newman and Wu (1990). 
We follow them by exploiting the celebrated Fortuin-Kasteleyn random cluster representation (Fortuin and Kasteleyn 1972; Fortuin 1972a; Fortuin 1972b): The Ising model is the marginal site process at q = 2 of a site/bond process derived from a dependent bond percolation model with configuration probability Pq,p proportional to qC × Y (phx,yi )bhx,yi × (1 − phx,yi )1−bhx,yi . hx,yi∈E(G) (where bhx,yi indicates whether or not hx, yi is closed, and C is the number of connected clusters of vertices). Site spins are chosen to be the same in each cluster independently of other clusters with equal probabilities for ±1. We find phx,yi = 1 − exp(−Jhx,yi ) ; (5.2) Argue for general q (Potts model): spot the Gibbs’ sampler! (I) given bonds, chose connected component colours independently and uniformly from 1, . . . , q; (II) given colouring, open bonds hx, yi independently, probability phx,yi . Home Page Title Page Exercise 5.2 Check step (II) gives correct conditional probabilities! Now compute conditional probability that site o gets colour 1 given neighbours split into components C1 , . . . , Cr with colour 1, and other neighbours in set C0 . Contents Exercise 5.3 Use inclusion-exclusion to show it is this: JJ II J I Page 134 of 167 ! Y 1 P [colour(o) = 1|neighbours] ∝ ×q (1 − phx,yi ) + q y ! Y Y + (1 − phx,yi ) 1− 1 − phx,zi y∈C0 Go Back z∈C1 ∪...∪Cr = Y (1 − phx,yi ) . y:colour(y)6=1 Full Screen Close Quit Deduce interaction strength is Jhx,yi = − log(1 − phx,yi ) as stated in Equation (5.2) above. FK-comparison inequalities Home Page If q ≥ 1 and A is an increasing event then Pq,p (A) ≤ P1,p (A) Pq,p (A) ≥ P1,p0 (A) Title Page Contents where p0hx,yi = JJ II J I Page 135 of 167 Go Back Full Screen Close Quit (5.3) (5.4) phx,yi phx,yi = . phx,yi + (1 − phx,yi )q q − (q − 1)phx,yi Since P1,p is bond percolation (bonds open or not independently of each other), we can find out about phase transitions by studying independent bond percolation. 5.3. 
5.3. Percolation

Independent bond percolation on products of trees with Euclidean lattices has been studied by Grimmett and Newman (1990), and these results were used in the Newman and Wu work on the Ising model. So we can make good progress by studying independent bond percolation on Q_d, using p_τ for parental bonds and p_λ for neighbour bonds.

Theorem 5.4 There is almost surely no infinite cluster in Q_{d;0} (and consequently in Q_d(o)) if

    2^d τ X_λ (1 + √(1 − X_λ^{−1})) < 1,

where X_λ is the mean size of the percolation cluster at the origin for λ-percolation in Z^d.

Modelled on Grimmett and Newman (1990, §3 and §5). Get the factor 1 + √(1 − X_λ^{−1}) from matrix spectral asymptotics.

[Phase diagram, λ versus τ, with regions labelled "No infinite clusters here", "Infinite clusters here (may or may not be unique)", and "?"; the boundary meets the τ-axis at τ = 2^{−d}.]

The story so far: small λ, small to moderate τ.

Case of small τ

Need d = 2 for mathematical convenience. Use a Borel-Cantelli argument and planar duality to show, for supercritical λ > 1/2 (that is, supercritical with respect to planar bond percolation!), all but finitely many of the resolution layers L_n = [1, 2^n] × [1, 2^n] of Q_2(o) have just one large cluster each, of diameter larger than constant × n. Hence . . .

Theorem 5.5 When λ > 1/2 and τ is positive there is one and only one infinite cluster in Q_2(o).

[Phase diagram for d = 2, λ versus τ, with regions labelled "Just one unique infinite cluster here" (λ > 1/2), "No infinite clusters here", "Infinite clusters here (may or may not be unique)", and "?"; axis marks at λ = 1/2 and τ = 1/4.]

The story so far: adds small τ for the case d = 2.
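The condition of Theorem 5.4 can be read as an explicit upper bound on τ. A small numeric sketch (the helper name is hypothetical) solving 2^d τ X_λ (1 + √(1 − X_λ^{−1})) < 1 for τ:

```python
from math import sqrt

def tau_bound(X_lambda, d):
    """Largest tau for which Theorem 5.4 guarantees no infinite cluster:
    the condition 2^d * tau * X * (1 + sqrt(1 - 1/X)) < 1, solved for tau."""
    return 1.0 / (2 ** d * X_lambda * (1.0 + sqrt(1.0 - 1.0 / X_lambda)))
```

At λ = 0 the cluster at the origin is a single point, so X_λ = 1 and tau_bound(1, d) = 2^{−d}, reproducing the intercept τ = 2^{−d} in the phase diagram (τ = 1/4 when d = 2); the bound shrinks as X_λ grows.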
Uniqueness of infinite clusters

The Grimmett and Newman (1990) work was remarkable in pointing out that, as τ increases, there is a further phase change, from many infinite clusters to just one, for λ > 0. The work of Grimmett and Newman carries through for Q_d(o). However the relevant bound is improved by a factor of √2 if we take into account the hyperbolic structure of Q_d(o)!

Theorem 5.6 If τ < 2^{−(d−1)/2} and λ > 0 then there cannot be just one infinite cluster in Q_{d;0}.

Method: sum weights of "up-paths" in Q_{d;0} starting and ending at level 0. For fixed s and start point there are infinitely many such up-paths containing s λ-bonds; but no more than (1 + 2^d + 2d)^s which cannot be reduced by "shrinking" excursions. Hence control the mean number of open up-paths stretching more than a given distance at level 0.

Contribution to upper bound on second phase transition:

Theorem 5.7 If τ > √(2/3) then the infinite cluster of Q_{2;0} is almost surely unique for all positive λ.

Method: prune bonds, branching processes, 2-dim comparison . . .

[Phase diagram for d = 2, λ versus τ, with regions labelled "Just one unique infinite cluster here" (λ > 1/2), "No infinite clusters here", "Infinite clusters here (may or may not be unique)", "Many infinite clusters", and "?"; axis marks at λ = 1/2 and τ = 1/4, 0.707, 0.816.]

The story so far: includes the uniqueness transition for the case d = 2.

5.4. Comparison

We need to apply the Fortuin-Kasteleyn comparison inequalities (5.3) and (5.4). The event "just one infinite cluster" is not increasing, so we need more. Newman and Wu (1990) show it suffices to establish a finite island property for the site percolation derived under adjacency when all infinite clusters are removed.
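The τ-axis marks 0.707 and 0.816 in the phase diagram are just the bounds of Theorems 5.6 and 5.7 evaluated at d = 2; a two-line check:

```python
from math import sqrt

# The two uniqueness thresholds for d = 2:
# below non_uniqueness_bound the infinite cluster cannot be unique (Theorem 5.6);
# above uniqueness_bound the infinite cluster is unique (Theorem 5.7).
d = 2
non_uniqueness_bound = 2 ** (-(d - 1) / 2)   # tau < 2^{-(d-1)/2}
uniqueness_bound = sqrt(2 / 3)               # tau > sqrt(2/3)

print(round(non_uniqueness_bound, 3), round(uniqueness_bound, 3))  # prints: 0.707 0.816
```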
Thus:

[Phase diagram for d = 2, λ versus τ, with the region marked "finite islands"; axis marks at λ = 1/2 and τ = 1/4, 0.707, 0.816.]

Comparison arguments then show the following schematic phase diagram for the Ising model on Q_2(o):

[Schematic Ising phase diagram, Jλ versus Jτ, with regions labelled "Unique Gibbs state"; "All nodes substantially correlated with each other in case of free boundary conditions"; "Free boundary condition is mixture of the two extreme Gibbs states (spin 1 at boundary, spin −1 at boundary)"; "Root influenced by wired boundary"; marks on the two axes at 0.288, 0.511, 0.693, 0.762, 1.099, 1.190, 1.228, 2.292.]

5.5. Simulation

Approximate simulations confirm the general story (compare our original animation, which had Jτ = 2, Jλ = 1/4):

http://www.dcs.warwick.ac.uk/~rgw/sira/sim.html

(1) Only 200 resolution levels;
(2) At each level, 1000 sweeps in scan order;
(3) At each level, simulate a square sub-region of 128 × 128 pixels conditioned by the mother 64 × 64 pixel region;
(4) Impose periodic boundary conditions on the 128 × 128 square region;
(5) At the coarsest resolution, all pixels set white. At subsequent resolutions, 'all black' initial state.

[Array of simulation snapshots: (a) Jλ = 1, Jτ = 0.5; (b) Jλ = 1, Jτ = 1; (c) Jλ = 1, Jτ = 2; (d) Jλ = 0.5, Jτ = 0.5; (e) Jλ = 0.5, Jτ = 1; (f) Jλ = 0.5, Jτ = 2; (g) Jλ = 0.25, Jτ = 0.5; (h) Jλ = 0.25, Jτ = 1; (i) Jλ = 0.25, Jτ = 2.]

5.6. Future work

This is about the free Ising model on Q_2(o). Image analysis more naturally concerns the case of prescribed boundary conditions (say, the image at the finest resolution level . . . ). Question: will boundary conditions at "infinite fineness" propagate back to finite resolution?
Series and Sinaı̆ (1990) show the answer is yes for the analogous problem on the hyperbolic disk (2-dim, all bond probabilities the same). Gielis and Grimmett (2002) point out that (eg, in the Z^3 case) these boundary conditions translate into a conditioning for the random-cluster model, and investigate using large deviations.

Project: do the same for Q_2(o) . . . and get quantitative bounds?

Project: generalize from Q_d to graphs with appropriate hyperbolic structure.

Appendix A

Notes on the proofs in Chapter 5

A.1. Notes on proof of Theorem 5.4

Mean size of the cluster at o is bounded above by

    ∑_{n=0}^∞ ∑_{t:|t|=n} τ^n (X_λ − 1)^{T(t)} X_λ^{n−T(t)} X_λ
        ≤ ∑_{n=0}^∞ X_λ (τ X_λ)^n ∑_{t:|t|=n} (1 − X_λ^{−1})^{T(t)}
        ≤ ∑_{n=0}^∞ X_λ (2^d τ X_λ)^n ∑_{j:|j|=n} (1 − X_λ^{−1})^{T(j)}
        ≈ ∑_{n=0}^∞ X_λ (2^d τ X_λ)^n (1 + √(1 − X_λ^{−1}))^n.

For the last step, use spectral analysis of the matrix representation

    ∑_{j:|j|=n} (1 − X_λ^{−1})^{T(j)} = (1 1) M^n (1 1)^T,   where M = [[1, 1], [1 − X_λ^{−1}, 1]],

whose largest eigenvalue is 1 + √(1 − X_λ^{−1}).

A.2. Notes on proof of Theorem 5.5

Uniqueness: for the negative exponent ξ(1 − λ) of the dual connectivity function, set

    ℓ_n = (n log 4 + (2 + ε) log n) / ξ(1 − λ).

More than one "ℓ_n-large" cluster in L_n forces the existence of an open path in the dual lattice longer than ℓ_n. Now use Borel-Cantelli . . . . On the other hand, super-criticality will mean some distant points in L_n are interconnected.

Existence: consider 4^{n−⌊n/2⌋} points in L_{n−1} and specified daughters in L_n. Study the probability that

(a) the parent percolates more than ℓ_{n−1},
(b) parent and child are connected,
(c) the child percolates more than ℓ_n.

Now use Borel-Cantelli again . . . .

A.3. Notes on proof of Theorem 5.6

Two relevant lemmata:

Lemma A.1 Consider u ∈ L_{s+1} ⊂ Q_d and v = M(u) ∈ L_s ⊂ Q_d.
There are exactly 2^d solutions in L_{s+1} of

    M(x) = S_{u;v}(x).

One is x = u. The others are the remaining 2^d − 1 vertices y such that the closure of the cell representing y intersects the vertex shared by the closures of the cells representing u and M(u). Finally, if x ∈ L_{s+1} does not solve M(x) = S_{u;v}(x) then

    ‖S_{u;v}(x) − S_{u;v}(u)‖_{s,∞} > ‖M(x) − M(u)‖_{s,∞}.

Lemma A.2 Given distinct v and y in the same resolution level, count pairs of vertices u, x in the resolution level one step higher such that (a) M(u) = v; (b) M(x) = y; (c) S_{u;v}(x) = y. There are at most 2^{d−1} such pairs.

A.4. Notes on proof of Theorem 5.7

Prune! Then a direct connection is certainly established across the boundary between the cells corresponding to two neighbouring vertices u, v in L_0 if

(a) the τ-bond leading from u to the relevant boundary is open;
(b) a τ-branching process (formed by using τ-bonds mirrored across the boundary) survives indefinitely, where this branching process has family-size distribution Binomial(2, τ²);
(c) the τ-bond leading from v to the relevant boundary is open.

Then there are infinitely many chances of making a connection across the cell boundary.

A.5. Notes on proof of the finite island property

Notion of the "cone boundary" ∂_c(S) of a finite subset S of vertices: the collection of daughters v of S such that Q_d(v) ∩ S = ∅. Use induction on S, building it layer L_n on layer L_{n−1}, to obtain an isoperimetric bound: #(∂_c(S)) ≥ (2^d − 1)#(S). Hence deduce

    P[S is the island at u] ≤ (1 − p_τ(1 − η))^{(2^d − 1)n},

where #(S) = n and η = P[u not in the infinite cluster of Q_d(u)].

Upper bound on the number N(n) of self-avoiding paths S of length n beginning at u_0:

    N(n) ≤ (1 + 2^d + 2d)(2^d + 2d)^n.
Hence an upper bound on the mean size of the island:

    ∑_{n=0}^∞ (1 + 2^d + 2d)(2^d + 2d)^n η_br^{n(1 − 2^{−d})},

where η_br is the extinction probability for the branching process based on the Binomial(2^d, p_τ) family distribution.

Bibliography

Aldous, D. and M. T. Barlow [1981]. On countable dense random sets. Lecture Notes in Mathematics 850, 311–327.
Aldous, D. J. and P. Diaconis [1987]. Strong uniform times and finite random walks. Advances in Applied Mathematics 8(1), 69–97.
Baddeley, A. J. and J. Møller [1989]. Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89–121.
Baddeley, A. J. and M. N. M. van Lieshout [1993]. Stochastic geometry models in high-level vision. Journal of Applied Statistics 20, 233–258.
Baddeley, A. J. and M. N. M. van Lieshout [1995]. Area-interaction point processes. Annals of the Institute of Statistical Mathematics 47, 601–619.
Barndorff-Nielsen, O. E., W. S. Kendall, and M. N. M. van Lieshout (Eds.) [1999]. Stochastic Geometry: Likelihood and Computation. Number 80 in Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.
Benjamini, I., R. Lyons, Y. Peres, and O. Schramm [1999]. Group-invariant percolation on graphs. Geom. Funct. Anal. 9(1), 29–66.
Benjamini, I. and O. Schramm [1996]. Percolation beyond Z^d, many questions and a few answers. Electronic Communications in Probability 1, no. 8, 71–82 (electronic).
Bhalerao, A., E. Thönnes, W. S. Kendall, and R. G. Wilson [2001]. Inferring vascular structure from 2d and 3d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Medical Image Computing and Computer-Assisted Intervention, proceedings of MICCAI 2001, Volume 2208 of Springer Lecture Notes in Computer Science, pp. 820–828. Springer-Verlag.
Also: University of Warwick Department of Statistics Research Report 392.
Billera, L. J. and P. Diaconis [2001]. A geometric interpretation of the Metropolis-Hastings algorithm. Statistical Science 16(4), 335–339.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone [1984]. Classification and Regression Trees. Wadsworth Statistics/Probability Series. Wadsworth.
Brooks, S. P., Y. Fan, and J. S. Rosenthal [2002]. Perfect forward simulation via simulated tempering. Research report, University of Cambridge.
Burdzy, K. and W. S. Kendall [2000, May]. Efficient Markovian couplings: examples and counterexamples. The Annals of Applied Probability 10(2), 362–409. Also University of Warwick Department of Statistics Research Report 331.
Cai, Y. and W. S. Kendall [2002, July]. Perfect simulation for correlated Poisson random variables conditioned to be positive. Statistics and Computing 12, 229–243. Also: University of Warwick Department of Statistics Research Report 349.
Carter, D. S. and P. M. Prenter [1972]. Exponential spaces and counting processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 21, 1–19.
Chipman, H., E. George, and R. McCulloch [2002]. Bayesian treed models. Machine Learning 48, 299–320.
Clancy, D. and P. D. O'Neill [2002]. Perfect simulation for stochastic models of epidemics among a community of households. Research report, School of Mathematical Sciences, University of Nottingham.
Cowan, R., M. Quine, and S. Zuyev [2003]. Decomposition of gamma-distributed domains constructed from Poisson point processes. Advances in Applied Probability 35(1), 56–69.
Diaconis, P. [1988]. Group Representations in Probability and Statistics, Volume 11 of IMS Lecture Notes Series. Hayward, California: Institute of Mathematical Statistics.
Diaconis, P. and J. Fill [1990]. Strong stationary times via a new form of duality. The Annals of Probability 18, 1483–1522.
Ferrari, P. A., R. Fernández, and N. L.
Garcia [2002]. Perfect simulation for interacting point processes, loss networks and Ising models. Stochastic Process. Appl. 102(1), 63–88.
Fill, J. [1998]. An interruptible algorithm for exact sampling via Markov chains. The Annals of Applied Probability 8, 131–162.
Fill, J. A., M. Machida, D. J. Murdoch, and J. S. Rosenthal [2000]. Extension of Fill's perfect rejection sampling algorithm to general chains. Random Structures and Algorithms 17(3-4), 290–316.
Fortuin, C. M. [1972a]. On the random-cluster model. II. The percolation model. Physica 58, 393–418.
Fortuin, C. M. [1972b]. On the random-cluster model. III. The simple random-cluster model. Physica 59, 545–570.
Fortuin, C. M. and P. W. Kasteleyn [1972]. On the random-cluster model. I. Introduction and relation to other models. Physica 57, 536–564.
Foss, S. G. and R. L. Tweedie [1998]. Perfect simulation and backward coupling. Stochastic Models 14, 187–203.
Gelfand, A. E. and A. F. M. Smith [1990]. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85(410), 398–409.
Geman, S. and D. Geman [1984]. Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Geyer, C. J. and J. Møller [1994]. Simulation and likelihood inference for spatial point processes. Scand. J. Statist. 21, 359–373.
Gielis, G. and G. R. Grimmett [2002]. Rigidity of the interface for percolation and random-cluster models. Journal of Statistical Physics 109, 1–37. See also arXiv:math.PR/0109103.
Green, P. J. [1995]. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732.
Grenander, U. and M. I. Miller [1994]. Representations of knowledge in complex systems (with discussion and a reply by the authors).
Journal of the Royal Statistical Society (Series B: Methodological) 56(4), 549–603.
Grimmett, G. R. and C. Newman [1990]. Percolation in ∞ + 1 dimensions. In Disorder in physical systems, pp. 167–190. New York: The Clarendon Press Oxford University Press.
Hadwiger, H. [1957]. Vorlesungen über Inhalt, Oberfläche und Isoperimetrie. Berlin: Springer-Verlag.
Häggström, O., J. Jonasson, and R. Lyons [2002]. Explicit isoperimetric constants and phase transitions in the random-cluster model. The Annals of Probability 30(1), 443–473.
Häggström, O. and Y. Peres [1999]. Monotonicity of uniqueness for percolation on Cayley graphs: all infinite clusters are born simultaneously. Probability Theory and Related Fields 113(2), 273–285.
Häggström, O., Y. Peres, and R. H. Schonmann [1999]. Percolation on transitive graphs as a coalescent process: relentless merging followed by simultaneous uniqueness. In Perplexing problems in probability, pp. 69–90. Boston, MA: Birkhäuser Boston.
Häggström, O. and J. E. Steif [2000]. Propp-Wilson algorithms and finitary codings for high noise Markov random fields. Combin. Probab. Computing 9, 425–439.
Häggström, O., M. N. M. van Lieshout, and J. Møller [1999]. Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli 5(5), 641–658. Was: Aalborg Mathematics Department Research Report R-96-2040.
Hall, P. [1988]. Introduction to the theory of coverage processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John Wiley & Sons.
Hastings, W. K. [1970]. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Himmelberg, C. J. [1975]. Measurable relations. Fundamenta Mathematicae 87, 53–72.
Huber, M. [2003]. A bounding chain for Swendsen-Wang.
Random Structures and Algorithms 22(1), 43–59.
Hug, D., M. Reitzner, and R. Schneider [2003]. The limit shape of the zero cell in a stationary Poisson hyperplane tessellation. Preprint 03-03, Albert-Ludwigs-Universität Freiburg.
Jeulin, D. [1997]. Conditional simulation of object-based models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, pp. 271–288. World Scientific.
Kallenberg, O. [1977]. A counterexample to R. Davidson's conjecture on line processes. Math. Proc. Camb. Phil. Soc. 82(2), 301–307.
Kelly, F. P. and B. D. Ripley [1976]. A note on Strauss's model for clustering. Biometrika 63, 357–360.
Kendall, D. G. [1974]. Foundations of a theory of random sets. In E. F. Harding and D. G. Kendall (Eds.), Stochastic Geometry. John Wiley & Sons.
Kendall, W. S. [1997a]. On some weighted Boolean models. In D. Jeulin (Ed.), Advances in Theory and Applications of Random Sets, Singapore, pp. 105–120. World Scientific. Also: University of Warwick Department of Statistics Research Report 295.
Kendall, W. S. [1997b]. Perfect simulation for spatial point processes. In Bulletin ISI, 51st session proceedings, Istanbul (August 1997), Volume 3, pp. 163–166. Also: University of Warwick Department of Statistics Research Report 308.
Kendall, W. S. [1998]. Perfect simulation for the area-interaction point process. In L. Accardi and C. C. Heyde (Eds.), Probability Towards 2000, New York, pp. 218–234. Springer-Verlag. Also: University of Warwick Department of Statistics Research Report 292.
Kendall, W. S. [2000, March]. Stationary countable dense random sets. Advances in Applied Probability 32(1), 86–100. Also University of Warwick Department of Statistics Research Report 341.
Kendall, W. S. and J. Møller [2000, September]. Perfect simulation using dominating processes on ordered state spaces, with application to locally stable point processes.
Advances in Applied Probability 32(3), 844–865. Also University of Warwick Department of Statistics Research Report 347.
Kendall, W. S. and G. Montana [2002, May]. Small sets and Markov transition densities. Stochastic Processes and Their Applications 99(2), 177–194. Also University of Warwick Department of Statistics Research Report 371.
Kendall, W. S. and E. Thönnes [1999]. Perfect simulation in stochastic geometry. Pattern Recognition 32(9), 1569–1586. Also: University of Warwick Department of Statistics Research Report 323.
Kendall, W. S., M. van Lieshout, and A. Baddeley [1999]. Quermass-interaction processes: conditions for stability. Advances in Applied Probability 31(2), 315–342. Also University of Warwick Department of Statistics Research Report 293.
Kendall, W. S. and R. G. Wilson [2003, March]. Ising models and multiresolution quad-trees. Advances in Applied Probability 35(1), 96–122. Also University of Warwick Department of Statistics Research Report 402.
Kingman, J. F. C. [1993]. Poisson processes, Volume 3 of Oxford Studies in Probability. New York: The Clarendon Press Oxford University Press. Oxford Science Publications.
Kolmogorov, A. N. [1936]. Zur Theorie der Markoffschen Ketten. Math. Annalen 112, 155–160.
Kovalenko, I. [1997]. A proof of a conjecture of David Kendall on the shape of random polygons of large area. Kibernet. Sistem. Anal. (4), 3–10, 187. English translation: Cybernet. Systems Anal. 33, 461–467.
Kovalenko, I. N. [1999]. A simplified proof of a conjecture of D. G. Kendall concerning shapes of random polygons. J. Appl. Math. Stochastic Anal. 12(4), 301–310.
Kumar, V. S. A. and H. Ramesh [2001]. Coupling vs. conductance for the Jerrum-Sinclair chain. Random Structures and Algorithms 18(1), 1–17.
Kurtz, T. G. [1980].
The optional sampling theorem for martingales indexed by directed sets. The Annals of Probability 8, 675–681.
Lee, A. B., D. Mumford, and J. Huang [2001]. Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model. International Journal of Computer Vision 41, 35–59.
Li, W. and A. Bhalerao [2003]. Model based segmentation for retinal fundus images. In Proceedings of Scandinavian Conference on Image Analysis SCIA'03, Göteborg, Sweden, pp. to appear.
Liang, F. and W. H. Wong [2000]. Evolutionary Monte Carlo sampling: Applications to Cp model sampling and change-point problem. Statistica Sinica 10, 317–342.
Liang, F. and W. H. Wong [2001, June]. Real-parameter evolutionary Monte Carlo with applications to Bayesian mixture models. Journal of the American Statistical Association 96(454), 653–666.
Lindvall, T. [1982]. On coupling of continuous-time renewal processes. Journal of Applied Probability 19(1), 82–89.
Lindvall, T. [2002]. Lectures on the coupling method. Mineola, NY: Dover Publications Inc. Corrected reprint of the 1992 original.
Liu, J. S., F. Liang, and W. H. Wong [2001, June]. A theory for dynamic weighting in Monte Carlo computation. Journal of the American Statistical Association 96(454), 561–573.
Loizeaux, M. A. and I. W. McKeague [2001]. Perfect sampling for posterior landmark distributions with an application to the detection of disease clusters. In Selected Proceedings of the Symposium on Inference for Stochastic Processes, Volume 37 of Lecture Notes - Monograph Series, pp. 321–331. Institute of Mathematical Statistics.
Loynes, R. [1962]. The stability of a queue with non-independent interarrival and service times. Math. Proc. Camb. Phil. Soc. 58(3), 497–520.
Lyons, R. [2000]. Phase transitions on nonamenable graphs. J. Math. Phys. 41(3), 1099–1126. Probabilistic techniques in equilibrium and nonequilibrium statistical physics.
Matheron, G. [1975]. Random Sets and Integral Geometry. Chichester: John Wiley & Sons.
Meester, R. and R. Roy [1996]. Continuum percolation, Volume 119 of Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller [1953]. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1092.
Meyn, S. P. and R. L. Tweedie [1993]. Markov Chains and Stochastic Stability. New York: Springer-Verlag.
Miles, R. E. [1995]. A heuristic proof of a long-standing conjecture of D. G. Kendall concerning the shapes of certain large random polygons. Advances in Applied Probability 27(2), 397–417.
Minlos, R. A. [1999]. Introduction to Mathematical Statistical Physics, Volume 19 of University Lecture Series. American Mathematical Society.
Mira, A., J. Møller, and G. O. Roberts [2001]. Perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 63(3), 593–606. See (Mira, Møller, and Roberts 2002) for corrigendum.
Mira, A., J. Møller, and G. O. Roberts [2002]. Corrigendum: perfect slice samplers. Journal of the Royal Statistical Society (Series B: Methodological) 64(3), 581.
Molchanov, I. S. [1996a]. A limit theorem for scaled vacancies of the Boolean model. Stochastics and Stochastic Reports 58(1-2), 45–65.
Molchanov, I. S. [1996b]. Statistics of the Boolean Models for Practitioners and Mathematicians. Chichester: John Wiley & Sons.
Møller, J. (Ed.) [2003]. Spatial Statistics and Computational Methods, Volume 173 of Lecture Notes in Statistics. Springer-Verlag.
Møller, J. and G. Nicholls [1999]. Perfect simulation for sample-based inference. Technical Report R-99-2011, Department of Mathematical Sciences, Aalborg University.
Møller, J. and R. Waagepetersen [2003]. Statistical inference and simulation for spatial point processes.
Monographs on Statistics and Applied Probability. Boca Raton: Chapman and Hall / CRC.
Møller, J. and S. Zuyev [1996]. Gamma-type results and other related properties of Poisson processes. Advances in Applied Probability 28(3), 662–673.
Murdoch, D. J. and P. J. Green [1998]. Exact sampling from a continuous state space. Scandinavian Journal of Statistics Theory and Applications 25, 483–502.
Mykland, P., L. Tierney, and B. Yu [1995]. Regeneration in Markov chain samplers. Journal of the American Statistical Association 90(429), 233–241.
Newman, C. and C. Wu [1990]. Markov fields on branching planes. Probability Theory and Related Fields 85(4), 539–552.
Nummelin, E. [1984]. General irreducible Markov chains and nonnegative operators. Cambridge: Cambridge University Press.
O'Neill, P. D. [2001]. Perfect simulation for Reed-Frost epidemic models. Statistics and Computing 13, 37–44.
Parkhouse, J. G. and A. Kelly [1998]. The regular packing of fibres in three dimensions. Philosophical Transactions of the Royal Society, London, Series A 454(1975), 1889–1909.
Peskun, P. H. [1973]. Optimum Monte Carlo sampling using Markov chains. Biometrika 60, 607–612.
Preston, C. J. [1977]. Spatial birth-and-death processes. Bull. Inst. Internat. Statist. 46, 371–391.
Propp, J. G. and D. B. Wilson [1996]. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223–252.
Quine, M. P. and D. F. Watson [1984]. Radial generation of n-dimensional Poisson processes. Journal of Applied Probability 21(3), 548–557.
Ripley, B. D. [1977]. Modelling spatial patterns (with discussion). Journal of the Royal Statistical Society (Series B: Methodological) 39, 172–212.
Ripley, B. D. [1979]. Simulating spatial patterns: dependent samples from a multivariate density. Applied Statistics 28, 109–112.
Ripley, B. D. and F. P. Kelly [1977].
Markov point processes. J. Lond. Math. Soc. 15, 188–192.
Robbins, H. E. [1944]. On the measure of a random set I. Annals of Mathematical Statistics 15, 70–74.
Robbins, H. E. [1945]. On the measure of a random set II. Annals of Mathematical Statistics 16, 342–347.
Roberts, G. O. and N. G. Polson [1994]. On the geometric convergence of the Gibbs sampler. Journal of the Royal Statistical Society (Series B: Methodological) 56(2), 377–384.
Roberts, G. O. and J. S. Rosenthal [1999]. Convergence of slice sampler Markov chains. Journal of the Royal Statistical Society (Series B: Methodological) 61(3), 643–660.
Roberts, G. O. and J. S. Rosenthal [2000]. Recent progress on computable bounds and the simple slice sampler. In Monte Carlo methods (Toronto, ON, 1998), pp. 123–130. Providence, RI: American Mathematical Society.
Roberts, G. O. and R. L. Tweedie [1996a]. Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341–364.
Roberts, G. O. and R. L. Tweedie [1996b]. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 96–110.
Rosenthal, J. S. [2002]. Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability 7, no. 13, 123–128 (electronic).
Series, C. and Y. Sinaı̆ [1990]. Ising models on the Lobachevsky plane. Comm. Math. Phys. 128(1), 63–76.
Spitzer, F. [1975]. Markov random fields on an infinite tree. The Annals of Probability 3(3), 387–398.
Stoyan, D., W. S. Kendall, and J. Mecke [1995]. Stochastic geometry and its applications (Second ed.). Chichester: John Wiley & Sons. (First edition in 1987 joint with Akademie Verlag, Berlin).
Strauss, D. J. [1975]. A model for clustering. Biometrika 63, 467–475.
Thönnes, E. [1999]. Perfect simulation of some point processes for the impatient user. Advances in Applied Probability 31(1), 69–87.
Also University of Warwick Statistics Research Report 317.
Thönnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2002]. A Bayesian approach to inferring vascular tree structure from 2d imagery. In W. J. Niessen and M. A. Viergever (Eds.), Proceedings IEEE ICIP-2002, Volume II, Rochester, NY, pp. 937–939. Also: University of Warwick Department of Statistics Research Report 391.
Thönnes, E., A. Bhalerao, W. S. Kendall, and R. G. Wilson [2003]. Bayesian analysis of vascular structure. In LASR 2003, Volume to appear.
Thorisson, H. [2000]. Coupling, stationarity, and regeneration. New York: Springer-Verlag.
Tierney, L. [1998]. A note on Metropolis-Hastings kernels for general state spaces. The Annals of Applied Probability 8(1), 1–9.
Van Lieshout, M. N. M. [2000]. Markov point processes and their applications. Imperial College Press.
Van Lieshout, M. N. M. and A. J. Baddeley [2001, October]. Extrapolating and interpolating spatial patterns. CWI Report PNA-R0117, CWI, Amsterdam.
Widom, B. and J. S. Rowlinson [1970]. New model for the study of liquid-vapor phase transitions. Journal of Chemical Physics 52, 1670–1684.
Wilson, D. B. [2000a]. How to couple from the past using a read-once source of randomness. Random Structures and Algorithms 16(1), 85–113.
Wilson, D. B. [2000b]. Layered Multishift Coupling for use in Perfect Sampling Algorithms (with a primer on CFTP). In N. Madras (Ed.), Monte Carlo Methods, Volume 26 of Fields Institute Communications, pp. 143–179. American Mathematical Society.
Wilson, R., A. Calway, and E. R. S. Pearson [1992, March]. A Generalised Wavelet Transform for Fourier Analysis: The Multiresolution Fourier Transform and Its Application to Image and Audio Signal Analysis. IEEE Transactions on Information Theory 38(2), 674–690.
Wilson, R. G. and C. T. Li [2002]. A class of discrete multiresolution random fields and its application to image segmentation.
IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 42–56. Zuyev, S. [1999]. Stopping sets: gammatype results and hitting properties. Advances in Applied Probability 31(2), 355–366.