Interacting Stochastic Systems

Benjamin Graham
Fitzwilliam College and Statistical Laboratory
University of Cambridge

This dissertation is submitted for the degree of Doctor of Philosophy

August 2007

Preface

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. I would like to thank my PhD supervisor, Geoffrey Grimmett. Chapters 2 and 3 were done in collaboration with him. We have agreed that 65% of the work is mine. The presentation is entirely my own. The work has also been submitted to journals as joint publications [31, 32]. I am the sole author of Chapter 4.

I would like to acknowledge the financial support I have received from the Engineering and Physical Sciences Research Council under a Doctoral Training Award to the University of Cambridge.

Abstract

Statistical mechanics is the branch of physics that seeks to explain the properties of matter that emerge from microscopic-scale interactions. Mathematicians are attempting to acquire a rigorous understanding of the models and calculations arising from physicists' work. The field of interacting particle systems has proved a fruitful area for mathematical research.

Two of the most fundamental models are the Ising model and bond percolation. The Ising model was suggested by Lenz in 1920. It is a probabilistic model for ferromagnetism; magnetization can then be explored as correlation between spin random variables on a graph. Bond percolation was introduced by Simon Broadbent and John Hammersley in 1957. It is a model for long-range order. Edges of a lattice graph are declared open, independently, with some probability p, and clusters of open edges are studied. Both these models can be understood as aspects of the random-cluster model.

In this thesis we study various aspects of mathematical statistical mechanics. In Chapter 2 we create a diluted version of the random-cluster model. This allows the coupling of the Ising model to the random-cluster model to be extended to include the Blume–Capel model. Crucially, it retains some of the key properties of its parent model. This enables much of the random-cluster technology to be carried forward.

The key issue for bond percolation concerns the fraction of open edges required in order to have long-range connectivity. Harry Kesten proved that this fraction is precisely one half for the square planar lattice. Recent developments in the theory of influence and sharp thresholds allowed Béla Bollobás and Oliver Riordan to simplify parts of his proof. In Chapter 3 we extend an influence result to apply to monotonic measures. This allows sharp thresholds to be shown for certain families of stochastically increasing monotonic distributions, including random-cluster measures.

In Chapter 4 we study the time to convergence for a mean-field zero-range process. The problem was motivated by the canonical ensemble model of energy levels used in the derivation of Maxwell–Boltzmann distributions. By considering the entropy of the system, we show that the empirical distribution rapidly converges—in the sense of Kullback–Leibler divergence—to a geometric distribution. The proof utilizes arguments of spectral gap and log-Sobolev type.

Contents

1 Introduction
  1.1 Phase transition and statistical mechanics
  1.2 Bond percolation
  1.3 Gibbs distributions
  1.4 The Ising and Potts models
  1.5 The random-cluster measure
  1.6 The FKG inequality and Holley's theorem
  1.7 Influence and sharp threshold
  1.8 Microcanonical ensemble Markov chains

2 Blume–Capel model
  2.1 The Blume–Capel model
  2.2 Notation
  2.3 FKG and Holley's theorem
  2.4 The diluted-random-cluster representation of the BCP model
  2.5 The vertex DRC marginal measure
  2.6 Edge comparisons
  2.7 Infinite-volume measures
  2.8 Uniqueness by convexity of pressure
  2.9 Phases and comparison results

3 Influence and monotonic measures
  3.1 Influence beyond KKL
  3.2 Symmetric events
  3.3 Sharp thresholds for product measures
  3.4 Self-duality for bond percolation on L²
  3.5 Influence for monotonic measures
  3.6 Russo formula for monotonic measures
  3.7 Sharp thresholds for the random-cluster measure
  3.8 Self-duality for the random-cluster model
  3.9 Long-ways crossings and the critical point
  3.10 Influence for continuous random variables

4 The mean-field zero-range process
  4.1 Balls and boxes
  4.2 Notation
  4.3 The Ehrenfest urn model
  4.4 Markov chain convergence
  4.5 Entropy for the mean-field ZRP
  4.6 Total variation convergence
  4.7 A biased random walk
  4.8 Concentration results
  4.9 Coupling results
  4.10 Slow initial decrease in S
  4.11 Rapid decrease in S
  4.12 Final observations
  4.13 Proofs relating to entropy and the biased random walk
  4.14 Technical lemmas relating to ∆S

References

List of Figures

1.1 Approximate phase diagram for H₂O
1.2 Percolation with p = 0.48, 0.50 and 0.52 on the 20 × 20 torus and the 200 × 200 torus. The largest open cluster is black, the other open edges are grey.
2.1 Approximate Blume–Capel phase diagram
2.2 Approximate Blume–Capel phase diagram
2.3 Blume–Capel comparison results
3.1 Finite graph and dual
3.2 Square planar lattice and isomorphic dual
4.1 A sample path: the empirical distribution for N = 10⁷, R = 20 at intervals of 40N steps
4.2 Hydrodynamic limit XRf tends to 1/2
4.3 Graph of δ(x, y) as a function of y

Chapter 1

Introduction

1.1 Phase transition and statistical mechanics

Statistical mechanics seeks to explain the properties of matter that emerge from microscopic-scale interactions. It has long been a source of interesting mathematical problems. The field goes back to the work of Boltzmann, Gibbs and Maxwell in the late 19th and early 20th centuries. Most models are not exactly solvable: simple local interactions produce complex long-range behaviour.

In 1895 Pierre Curie discovered that ferromagnetic substances exhibit a phase transition with temperature—in the case of iron at 1043 K. The phenomenon had been observed 60 years earlier by Claude Pouillet; however, he had lacked the means to measure temperature as precisely. In the classic physics experiment, an iron bar is placed in a strong magnetic field parallel to its axis. The bar becomes magnetized. When the magnetic field is reduced to zero, one of two things happens. If the temperature is above the Curie point, the bar retains no magnetism. If the temperature is below the Curie point, it does retain a magnetized state. This is a physical example of a threshold phenomenon.

Statistical physicists have devised stochastic models for magnetic spin systems in thermal equilibrium. They allow the study of the phase transition phenomenon. The most famous model for ferromagnetism is the Ising model. It exhibits phase transitions in two and higher dimensions. Another type of phase transition is that exhibited by H₂O. The phase diagram has a triple point where the three phases meet—the triple point of water, at 273 K and 612 Pa. See Figure 1.1.

Figure 1.1: Approximate phase diagram for H₂O (axes: temperature and pressure; regions: ice, water, steam).

1.2 Bond percolation

It would be almost impossible to begin a thesis on statistical mechanics without first mentioning percolation. Bond percolation was introduced by Simon Broadbent and John Hammersley in 1957 [12]. It is one of the simplest spatial models with a phase transition phenomenon, yet some basic questions have still proved difficult to answer.

Bond percolation is defined on a lattice. Rather than defining exactly what we mean by a lattice, we will just give the standard example. Let Z = {. . . , −1, 0, 1, . . . } be the set of integers and let Z^d, d ≥ 2, be the set of d-tuples of integers,

Z^d = {(x₁, . . . , x_d) : x₁, . . . , x_d ∈ Z}.

Define the L¹ distance l : Z^d × Z^d → [0, ∞) by

l(x, y) = Σ_{i=1}^{d} |x_i − y_i|, x, y ∈ Z^d.

Take E^d to be the set of pairs of lattice points at distance one, so that l is the normal graph-theoretic distance:

E^d = {⟨x, y⟩ : x, y ∈ Z^d, l(x, y) = 1}.

This is the set of 'nearest-neighbour' edges of Z^d. The graph L^d = (Z^d, E^d) is called the d-dimensional hypercubic lattice.

Let Ω = {0, 1}^{E^d}. For ω ∈ Ω, we say that an edge e is ω-open (or just open) if ω(e) = 1, and ω-closed otherwise. Let (Ω, F, µ_p) be the product probability space such that the edge states are independent and each edge is open with probability p.
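As an editorial aside (not part of the thesis), the measure µ_p is straightforward to sample on a finite torus, in the spirit of Figure 1.2. The following minimal Python sketch samples the edge states and reports the number of vertices in the largest open cluster; all function names are illustrative.

```python
import random
from collections import deque

def largest_open_cluster(n, p, seed=0):
    """Sample bond percolation on the n-by-n torus with edge density p
    and return the number of vertices in the largest open cluster."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n * n)]
    for x in range(n):
        for y in range(n):
            u = x * n + y
            # Each vertex owns its 'right' and 'up' edges on the torus,
            # so every edge is sampled exactly once.
            for v in (((x + 1) % n) * n + y, x * n + (y + 1) % n):
                if rng.random() < p:  # declare the edge open
                    adj[u].append(v)
                    adj[v].append(u)
    best, seen = 0, [False] * (n * n)
    for s in range(n * n):
        if seen[s]:
            continue
        seen[s], size, queue = True, 1, deque([s])
        while queue:  # breadth-first search of the open cluster of s
            u = queue.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    size += 1
                    queue.append(v)
        best = max(best, size)
    return best

for p in (0.48, 0.50, 0.52):
    print(p, largest_open_cluster(200, p))
```

Near p = 1/2 the size of the largest cluster changes sharply with p, echoing the three panels of Figure 1.2.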
For x, y ∈ Z^d, write {x ↔ y} for the event that there is a chain of open edges connecting x to y. Write {x ↔ ∞} for the event that x is the starting point of an arbitrarily long chain of open edges. The open clusters are the maximal sets of connected vertices. The model is said to percolate, or to be supercritical, if µ_p(0 ↔ ∞) > 0. By Kolmogorov's zero–one law, the system almost surely has an infinite cluster if and only if µ_p(0 ↔ ∞) > 0. It is clear that µ_p(0 ↔ ∞) is non-decreasing in p. The critical probability is defined

p_c = inf{p : µ_p(0 ↔ ∞) > 0}.

The exact critical point for the hypercubic lattice is only known for d = 2; this case is particularly amenable to analysis due to the 'self-duality' of the planar lattice.

1.3 Gibbs distributions

Suppose a system can be described by a configuration σ ∈ Σ with Σ finite. Let H be the energy function, or Hamiltonian, H : Σ → R. Let T be the temperature of the system and take β = (k_B T)^{−1} to be the inverse temperature; the constant k_B is Boltzmann's constant. Define the partition function

Z(β) = Σ_σ e^{−βH(σ)}.

Figure 1.2: Percolation with p = 0.48, 0.50 and 0.52 on the 20 × 20 torus and the 200 × 200 torus. The largest open cluster is black, the other open edges are grey.

Then the Gibbs distribution G_β is defined by

G_β(σ) = e^{−βH(σ)} / Z(β).

We are abusing notation by using G_β to mean both a probability measure and the corresponding mass function. The motivation for the measure is as follows [45]. Assume now for simplicity that H is an integer-valued function. As Σ is finite, it does not have interesting statistical properties by itself. Consider a canonical ensemble of N copies of the system, with states σ^(1), . . . , σ^(N). The total energy of the ensemble is

E_tot = Σ_{i=1}^{N} H(σ^(i)).

Imagine that the copies have been weakly interacting in some way that conserves the total energy, and have reached an equilibrium state. Let

Ω(E_tot) = Σ_{σ^(1)} · · · Σ_{σ^(N)} δ_{E_tot, Σ_i H(σ^(i))},

counting the number of configurations of the canonical ensemble with total energy E_tot. The basic assumption of statistical mechanics is that all compatible configurations are equally likely once the system has reached equilibrium. Such systems are said to have Maxwell–Boltzmann statistics. This gives

P(σ^(1), . . . , σ^(N) | E_tot) = δ_{E_tot, Σ_i H(σ^(i))} / Ω(E_tot).

Take the limit N → ∞ with E = E_tot/N a fixed integer. We want to show that the marginal distribution of one system converges to a Gibbs distribution. If

p(·) = lim_{N→∞} P(σ^(1) = · | E_tot = N E),

then the energy of p is Σ_σ p(σ)H(σ) = E. Define the entropy

S(p) = −Σ_σ p(σ) log p(σ).

Let δ : (0, ∞) × [0, ∞) → R be given by

δ(x, y) = y log y − x log x − (y − x) (d/dt)(t log t)|_{t=x} ≥ 0.

Take the inverse temperature β such that G_β has energy E,

E = −(∂/∂β) log Z(β).

For all distributions p with energy E,

S(G_β) − S(p) = Σ_σ δ(G_β(σ), p(σ)) ≥ 0.

Therefore G_β is the maximum-entropy distribution. This implies

G_β(σ) = lim_{N→∞} P(σ^(1) = σ | E_tot = N E).

1.4 The Ising and Potts models

The Ising model was suggested by Lenz in 1920, and first analysed by his student Ising in 1925 [39]. It is a probabilistic model for ferromagnetism. Magnetization can be explored as correlation between simple random variables on a graph. Ising analysed the model on the set of integers with nearest-neighbour interactions. At finite temperature, the one-dimensional Ising model is simply an ergodic two-state Markov chain—there is no long-range order.
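This absence of long-range order is easy to see numerically (an editorial illustration, not part of the thesis). For the one-dimensional chain with h = 0, the transfer matrix T = [[e^{βJ}, e^{−βJ}], [e^{−βJ}, e^{βJ}]] has eigenvalues 2 cosh βJ and 2 sinh βJ, so the two-point correlation ⟨σ₀σ_n⟩ = (tanh βJ)^n decays exponentially at every finite β. A minimal sketch:

```python
import math

def ising_1d_correlation(beta_J, n):
    """Two-point correlation <sigma_0 sigma_n> of the 1d Ising chain
    (h = 0) via the transfer-matrix eigenvalues: the ratio of the
    subleading to the leading eigenvalue is tanh(beta_J), and the
    correlation at distance n is that ratio to the power n."""
    lam1 = 2.0 * math.cosh(beta_J)  # leading eigenvalue
    lam2 = 2.0 * math.sinh(beta_J)  # subleading eigenvalue
    return (lam2 / lam1) ** n       # equals tanh(beta_J) ** n

for n in (1, 5, 10, 50):
    print(n, ising_1d_correlation(1.0, n))
```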
Certain associated calculations may be done exactly on the d = 2 hypercubic lattice with uniform nearest-neighbour interactions and no external magnetic field [45, 47]. There are many similar models for magnetism [2].

Let G = (V, E) be a graph. A spin model consists of a set of spins S that each vertex can take, and a Hamiltonian function H : Σ → R from the set of configurations Σ = S^V to energy values. Hamiltonians are often written in an ill-defined way, as sums that do not converge on unbounded graphs. However, if two configurations σ₁, σ₂ ∈ Σ differ only at a finite number of vertices, it is standard to interpret the sums in such a way that H(σ₁) − H(σ₂) is well defined.

Let k_B be Boltzmann's constant and T ∈ (0, ∞) be the temperature. Define the inverse temperature β = (k_B T)^{−1}. The Ising model has spin states S = {+1, −1}. Let J ∈ R be the interaction strength and let h ∈ R be the strength of the external magnetic field. The Hamiltonian H : Σ → R is given by

H(σ) = −J Σ_{⟨x,y⟩∈E} σ(x)σ(y) − h Σ_{x∈V} σ(x), σ ∈ Σ = S^V.

The first sum is over the set of unordered edges and the second is over the set of vertices. If J > 0 the model is called ferromagnetic; the spins are positively correlated. This is easily seen using the random-cluster representation that will be introduced in the next section. An important special case is h = 0, the Ising model with no external magnetic field.

If G is finite we can immediately define the Ising measure π = π_{G,βJ,βh} by the Gibbs formalism,

π(σ) = (1/Z^I) exp(−βH(σ)), σ ∈ Σ,

with

Z^I = Z^I_{βJ,βh}(G) = Σ_{σ∈Σ} exp(−βH(σ))

the necessary normalizing constant. Z^I is called the Ising partition function.

If G is infinite, constructing the Ising measure is not quite so straightforward. Define a region Λ = (V, E) of G as follows. Let V be a finite set of vertices of G. Let ∂V be the external vertex boundary of V, the vertices of G outside V that are adjacent to a member of V. Let E be the set of edges of G with both end vertices in V ∪ ∂V. Note that (V ∪ ∂V, E) is a subgraph of G. Let ξ ∈ Σ; call ξ the boundary conditions. Let Σ^ξ_Λ be the set {σ ∈ Σ : σ(x) = ξ(x) for x ∉ V}. The Ising measure on Λ with boundary conditions ξ is defined by

π^ξ_{Λ,βJ,βh}(σ) = (1/Z^I) exp(βJ Σ_{⟨x,y⟩∈E} σ(x)σ(y) + βh Σ_{x∈V} σ(x)), σ ∈ Σ^ξ_Λ.

Again Z^I = Z^I_{βJ,βh}(ξ, Λ) is the required normalizing constant.

There are two standard ways to define the Ising model on an infinite graph G. They are both measures on (Σ, F), where F is the σ-algebra generated by the product topology on Σ = S^V.

The first is as the infinite-volume limit of finite Ising measures. Let (Λ_n) be an increasing sequence of regions that tend towards G as n → ∞. W_{βJ,βh} is defined as the set of accumulation points of the sequences of measures π^ξ_{Λ_n,βJ,βh}. There is no loss of generality in using a single ξ instead of a sequence of boundary conditions (ξ_n), see [33]. Write co W_{βJ,βh} for the convex closure of W_{βJ,βh}. If, for some ξ ∈ Σ and all increasing sequences (Λ_n) that tend to G, there is a single accumulation point, we can define that point to be π^ξ_{βJ,βh}. We can also define the infinite-volume limit with free boundary conditions: let (Λ_n) be an increasing sequence of subgraphs that tend to G, and consider the accumulation points of the measures π_{Λ_n,βJ,βh}.

The second is the Dobrushin–Lanford–Ruelle (DLR) formalism. Let T_Λ be the σ-algebra generated by the states of vertices outside Λ.
A DLR measure is a measure π such that for π-almost every ξ ∈ Σ and all regions Λ,

π(A | T_Λ)(ξ) = π^ξ_{Λ,βJ,βh}(A), ∀A ∈ F.

The Ising model can be seen as the 'q = 2' case of the Potts model [49]. For q ∈ {1, 2, . . . } the q-state Potts model has spin states S = {1, 2, . . . , q}. Let J ∈ R be the interaction strength. The Hamiltonian is given by

H(σ) = −J Σ_{⟨x,y⟩∈E} δ_{σ(x),σ(y)}, σ ∈ Σ = S^V.

This defines measures for inverse temperature β on a graph G, in the same ways as the Ising model. The Potts model may be viewed as a generalization of the h = 0 Ising model: the −1, +1 Ising states are represented in the q = 2 Potts model by the states 1, 2.

1.5 The random-cluster measure

A key step in the understanding of the Ising and Potts models was the development of the random-cluster model by Fortuin and Kasteleyn [25]. It provides a connection between bond percolation and the magnetic spin models. For a recent account of the random-cluster measure see [30, 34]. It is an edge model like bond percolation: edges are open or closed, and the maximal sets of vertices connected by open edge paths are called clusters.

Let G = (V, E) be a finite graph. Let Ω = {0, 1}^E. For ω ∈ Ω, let k(ω) be the number of ω-open clusters. The random-cluster measure with parameters p ∈ (0, 1), q ∈ (0, ∞) is given by

µ(ω) = (1/Z^RC_{p,q}(G)) q^{k(ω)} Π_{e∈E} (p/(1−p))^{ω(e)},

with normalizing constant, the random-cluster partition function,

Z^RC_{p,q}(G) = Σ_{ω∈{0,1}^E} q^{k(ω)} Π_{e∈E} (p/(1−p))^{ω(e)}.   (1.1)

If q = 1 this is just a product measure with density p. If q > 1 then the measure favours having more clusters and so the edge density is driven down. The reverse holds for q < 1. This is a generalisation of percolation with a dependence structure. As described in the next section, the measure µ has the property of monotonicity if q ≥ 1.

The density of an edge e, conditional on the states of the other edges, takes a particularly simple form. Let J_e be the event that edge e is open and let T_e be the σ-algebra generated by {J_f : f ≠ e}. The value of µ(J_e | T_e) depends on whether or not the edges E \ {e} contain an open path between the end vertices of e:

µ(J_e | T_e) = p if the end vertices of e are joined, and p/(p + q(1−p)) otherwise.   (1.2)

The random-cluster measure can also be defined with boundary conditions on a region Λ = (V, E) of an infinite graph G. The number of clusters k(ω) is then taken to be the number of open edge clusters of G that intersect V ∪ ∂V.

The random-cluster model is naturally coupled to the Potts model. Let G = (V, E) be a finite graph with q-state Potts measure φ at inverse temperature β, and (p, q) random-cluster measure µ. If p = 1 − e^{−βJ} there is a measure π on {1, . . . , q}^V × {0, 1}^E with the following properties.

(i) The marginal measure on {1, . . . , q}^V is φ.

(ii) The marginal measure on {0, 1}^E is µ.

(iii) Given the edge configuration, spins are constant on connected clusters. Between the different clusters, the spins are independent. The common spin of each cluster is uniformly distributed.

(iv) Given the vertex spins, an edge ⟨x, y⟩ is open with probability p if the spins at x and y agree, and closed otherwise.

We get the following formula relating long-range connectivity in the random-cluster model and long-range correlation in the Potts model. The two-point correlation function for the Potts model is

τ(x, y) = φ(σ_x = σ_y) − q^{−1} = (1 − q^{−1}) µ(x ↔ y).
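The coupling is also of algorithmic interest: alternating its two conditional descriptions—(iv) given the spins, open each agreeing edge independently with probability p; (iii) given the edges, resample a uniform spin on each cluster—is the Swendsen–Wang algorithm for sampling the Potts measure. The following minimal Python sketch of one such update step is an editorial illustration, not part of the thesis; the graph encoding and function names are assumptions made for the example.

```python
import math
import random

def swendsen_wang_step(spins, edges, p, q, rng):
    """One alternation of the Edwards-Sokal coupling: sample compatible
    random-cluster edges given the spins (property (iv)), then resample
    spins uniformly on each open cluster (property (iii))."""
    n = len(spins)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Property (iv): an edge may be open only where the spins agree,
    # and then it is open independently with probability p.
    for x, y in edges:
        if spins[x] == spins[y] and rng.random() < p:
            parent[find(x)] = find(y)

    # Property (iii): spins are constant on clusters, and independent
    # and uniform on {1, ..., q} between clusters.
    cluster_spin = {}
    for v in range(n):
        r = find(v)
        if r not in cluster_spin:
            cluster_spin[r] = rng.randint(1, q)
        spins[v] = cluster_spin[r]
    return spins

# Example: 100 sweeps for the q = 3 Potts model on a 10 x 10 torus,
# at inverse temperature with beta*J = 1, so p = 1 - exp(-beta*J).
n, q = 10, 3
edges = [(x * n + y, ((x + 1) % n) * n + y) for x in range(n) for y in range(n)]
edges += [(x * n + y, x * n + (y + 1) % n) for x in range(n) for y in range(n)]
rng = random.Random(0)
spins = [rng.randint(1, q) for _ in range(n * n)]
for _ in range(100):
    swendsen_wang_step(spins, edges, 1 - math.exp(-1.0), q, rng)
```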
In Chapter 2 we consider a variant of the Potts model, the Blume–Capel model. We analyse the Blume–Capel model by extending the coupling between the Potts model and the random-cluster model.

1.6 The FKG inequality and Holley's theorem

Let Ω be a partially ordered set. Ω is called a lattice if for all ω₁, ω₂ ∈ Ω there exist a least upper bound ω₁ ∨ ω₂ ∈ Ω and a greatest lower bound ω₁ ∧ ω₂ ∈ Ω. A lattice is distributive if for all ω₁, ω₂, ω₃ ∈ Ω,

ω₁ ∧ (ω₂ ∨ ω₃) = (ω₁ ∧ ω₂) ∨ (ω₁ ∧ ω₃),
ω₁ ∨ (ω₂ ∧ ω₃) = (ω₁ ∨ ω₂) ∧ (ω₁ ∨ ω₃).

Let S ⊂ R and let Ω = S^I for a set I. Then Ω is a distributive lattice with respect to the coordinate-wise partial order: ω₁ ≤ ω₂ if ω₁(i) ≤ ω₂(i) for all i ∈ I. For ω₁, ω₂ ∈ Ω,

(ω₁ ∨ ω₂)(i) = max{ω₁(i), ω₂(i)}, (ω₁ ∧ ω₂)(i) = min{ω₁(i), ω₂(i)}.

A set A ⊂ Ω is called increasing if it is closed upwards with respect to the partial order:

∀ω₁, ω₂ ∈ Ω : ω₁ ≤ ω₂ and ω₁ ∈ A ⇒ ω₂ ∈ A.

Let F be the set of all subsets of Ω. Again we abuse notation by using µ to mean both a measure on (Ω, F) and the corresponding mass function from Ω to [0, 1]. A probability measure µ is positively associated if for all increasing A, B ⊂ Ω,

µ(A ∩ B) ≥ µ(A)µ(B).

An early result in percolation theory is that product measures on sets {0, 1}^I are positively associated [37]. Motivated by the Potts measure, Fortuin, Kasteleyn and Ginibre showed the following [26].

Theorem 1.3. Let µ be a probability measure on a finite distributive lattice Ω. If for all ω₁, ω₂ ∈ Ω,

µ(ω₁ ∧ ω₂)µ(ω₁ ∨ ω₂) ≥ µ(ω₁)µ(ω₂)   (1.4)

then µ is positively associated.

Inequality (1.4) is known as the FKG lattice condition. Holley proved a related result [38]; we actually state a slight generalisation, see for example [50]. Given measures µ₁, µ₂ on Ω, say that µ₁ is stochastically dominated by µ₂, written µ₁ ≤st µ₂, if µ₁(A) ≤ µ₂(A) for all increasing A ⊂ Ω.

Theorem 1.5. Let Ω be a finite distributive lattice. Let µ₁, µ₂ be two measures on (Ω, F) such that for all ω₁, ω₂ ∈ Ω,

µ₁(ω₁ ∧ ω₂)µ₂(ω₁ ∨ ω₂) ≥ µ₁(ω₁)µ₂(ω₂).

Then µ₁ ≤st µ₂.

We will also define strong positive association and monotonicity. We call µ positive if µ(ω) > 0 for all ω ∈ Ω. Let µ be a positive measure on a finite distributive lattice Ω = S^I. Let J ⊂ I. Define Ω_J = S^J and for ξ ∈ Ω let Ω^ξ_J = {ω ∈ Ω : ω(i) = ξ(i) for i ∈ I \ J}. Let µ^ξ_J be the measure µ conditioned on the coordinates in I \ J taking the corresponding values from ξ. Let X_i be the random variable X_i = ω(i). For η ∈ Ω_J,

µ^ξ_J(η) = µ(X_j = η(j) for j ∈ J | X_i = ξ(i) for i ∈ I \ J).

We say that µ is strongly positively associated if for all J ⊂ I and all ξ ∈ Ω, µ^ξ_J is positively associated. We say that µ is monotonic if for all J ⊂ I and all increasing A ⊂ Ω, µ^ξ_J(A) is increasing in ξ. If Ω = {0, 1}^I is finite and µ is a positive measure on (Ω, F), then these results take a particularly simple form [34].

Theorem 1.6. The following are equivalent: (i) µ satisfies the FKG lattice condition (1.4); (ii) µ is strongly positively associated; (iii) µ is monotonic.

1.7 Influence and sharp threshold

Another type of random graph is due to Erdős and Rényi [21]. The Erdős–Rényi graph G(n, p) is a bond percolation model on the complete graph. Starting with the graph on n vertices and all n(n−1)/2 possible edges, edges are open with probability p and otherwise closed. An alternative model is G(n, M): the M-edge subgraphs of the complete graph are assigned equal probability. Many properties of Erdős–Rényi graphs have sharp thresholds.
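One of the best-known examples is the emergence of the giant cluster, discussed next. As an editorial illustration (not from the thesis), the following Python sketch estimates the largest cluster of G(n, c/n) for values of c on either side of 1; the function name is illustrative.

```python
import random

def er_largest_cluster(n, c, seed=0):
    """Size of the largest cluster of the Erdos-Renyi graph G(n, p)
    with p = c/n, using union-find over all vertex pairs."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for x in range(n):
        for y in range(x + 1, n):
            if rng.random() < p:  # the edge {x, y} is open
                parent[find(x)] = find(y)

    sizes = {}
    for u in range(n):
        r = find(u)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

for c in (0.5, 1.0, 1.5):
    print(c, er_largest_cluster(2000, c))  # small vs giant cluster
```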
It is natural to consider the limit n → ∞ with p = c/n for constant c. If c < 1 then the largest connected cluster has size Θ(log n). If c > 1 then there is a unique giant cluster of size Θ(n), and the second largest cluster has size Θ(log n).

A concept that has proved to be closely related to sharp thresholds is that of influence. Ben-Or and Linial [3] considered influence in the context of computer science. They were studying coin-tossing algorithms involving fair coins: how important is any given coin?

For N ∈ N let Ω = {0, 1}^N, the N-dimensional Hamming space. Recall that Ω has the coordinate-wise partial order. Let µ be the uniform measure on Ω, so the coordinates are independent Bernoulli(1/2) random variables. We will use the following notation to manipulate single coordinates of vectors in Ω:

ω^j(i) = 1 if i = j, and ω^j(i) = ω(i) if i ≠ j;
ω_j(i) = 0 if i = j, and ω_j(i) = ω(i) if i ≠ j.

For a Boolean function f : Ω → {0, 1} define the influence of the j-th coordinate,

I_f(j) = µ({ω ∈ Ω : f(ω_j) ≠ f(ω^j)}).   (1.7)

Equivalently, for A ⊂ Ω, let f be the indicator function of A, so we have the influence I_A(j) = I_f(j). Clearly if µ(f) is very close to either 0 or 1, then the influence of every coordinate can be small. To avoid this, Ben-Or and Linial asked how small the maximum influence could be if µ(f) = 1/2.

In this context, to study functions that minimize influence it is sufficient to consider only increasing functions. Let f be a general Boolean function. Then by Proposition 2.1 of [3], there is an increasing Boolean function g such that

µ(f) = µ(g), and I_f(j) ≥ I_g(j) for all j.

This can be shown using a technique from extremal set theory: the values of f on pairs (ω_i, ω^i) where f(ω_i) > f(ω^i) can be swapped, making the function 'more' monotonic without increasing the coordinate-wise influences.

Let f : Ω → {0, 1} be a Boolean function such that µ(f) = 1/2. Consider for example f(ω) = ω(i), the dictatorship of coordinate i. Then I_f(j) = 1 if i = j and I_f(j) = 0 otherwise. An example with much smaller maximum influence is the 'majority' function. Suppose for convenience that N is odd, and set f(ω) = 1 if more than half of the ω(i) are 1. The influence of each coordinate is Θ(N^{−1/2}), the probability that exactly half of the other N − 1 coordinates take the value 1.

Ben-Or and Linial found they could do better, giving the 'tribes' function with maximum influence Θ(N^{−1} log N). The N coordinates are split into approximately N/log₂ N tribes of size log₂ N − log₂ log₂ N + O(1). Take f(ω) = 1 when at least one tribe is unanimously 1. The influence of a coordinate is the probability that all the other members of its tribe take the value 1, probability Θ(N^{−1} log N), and that no other tribe is unanimously 1, probability Θ(1). They conjectured that for all f with µ(f) = 1/2, there is some j with I_f(j) = Ω(N^{−1} log N). This was proved by a set of authors known as KKL [41]. Their proof uses the observation that the influences of an increasing function f, equation (1.7), are Fourier coefficients of the function on the group Z₂^N. They actually prove a more general result, without the restriction µ(f) = 1/2. One can take C = 1/5 below.

Theorem 1.8 (KKL). There is an absolute constant C > 0 such that for all Boolean functions f there exists j with

I_f(j) ≥ C min{µ(f), 1 − µ(f)} (log N)/N.
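For small N, influences can be computed exactly by enumeration; the following sketch (an editorial illustration, with illustrative names) reproduces the dictatorship and majority examples from definition (1.7).

```python
from itertools import product

def influence(f, n, j):
    """I_f(j) = mu({omega : f(omega_j) != f(omega^j)}) under the uniform
    measure on {0,1}^n, by enumerating each pair (omega_j, omega^j) once."""
    count = 0
    for omega in product((0, 1), repeat=n):
        if omega[j] == 0:  # omega is omega_j; flip coordinate j for omega^j
            flipped = omega[:j] + (1,) + omega[j + 1:]
            if f(omega) != f(flipped):
                count += 1
    return count / 2 ** (n - 1)

n = 9
dictator = lambda w: w[0]
majority = lambda w: int(sum(w) > n // 2)
print([influence(dictator, n, j) for j in range(n)])  # 1.0, then zeros
print([influence(majority, n, j) for j in range(n)])  # all Theta(n^-1/2)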
This definition of influence can be extended to more general product measures, see [27]. Let Ω = [0, 1]^N and let λ be the uniform measure on Ω. For x = (x₁, . . . , x_N) and j ∈ {1, . . . , N}, define the set

S_j(x) = {y ∈ Ω : y(i) = x(i) for i ≠ j}.

This is the subset of Ω generated by freeing the j-th coordinate. Given a random variable f : [0, 1]^N → {0, 1}, define the influence of coordinate j on f,

I_f(j) = λ({x : f is not constant on S_j(x)}).

The result of KKL has been extended to this setting by a superset of authors known as BKKKL. We use this result to study influence and sharp thresholds for monotonic measures in Chapter 3.

1.8 Microcanonical ensemble Markov chains

We have introduced the Gibbs formalism and shown how Gibbs measures correspond to maximum-entropy distributions. This has aroused some interest in Markov chain models for microcanonical ensembles, such as the Ehrenfest urn model [20, 44]. In Boltzmann's H-theorem, entropy always increases. Markov chain models often have a seemingly paradoxical property: it seems natural that they should be time reversible, but then by ergodicity any function of the system that goes up must also come down.

In the Ehrenfest urn model there is an urn of N coloured balls, each either red or green. A ball is picked at random and replaced with a ball of the other colour. To ensure aperiodicity, we will consider a lazy version: at each step, do nothing with probability one half. The equilibrium distribution is uniform on the set {red, green}^N; the system has Maxwell–Boltzmann statistics [40]. If the system is started from a highly ordered state, say all green balls, then after time O(N log N) it is in a disordered state, with an approximately equal number of red and green balls [19, 48].

In Chapter 4 we study a different Markov chain model. We start with N boxes, each containing R indistinguishable balls. The process evolves as follows: pick two boxes uniformly at random; if the first box contains any balls, move one ball from the first box to the second. It is a discrete-time 'mean-field' version of the zero-range process [53]. A zero-range process is defined on a directed graph. For simplicity I will only describe a finite one-dimensional version. Consider N boxes arranged in a circle with edges oriented clockwise. Distributed among the boxes are balls. Balls move along the directed edges. The rate at which balls leave a box may vary by location and box occupancy. The problem is hard to analyse; the model exhibits a phenomenon known as condensation, where a large fraction of all the balls end up in one box [22].

We consider a zero-range process on the complete graph; there are directed edges in both directions between each pair of boxes. The process can be interpreted as a model for a microcanonical ensemble: think of the boxes as particles and the balls as quanta of energy. The equilibrium distribution has Maxwell–Boltzmann statistics.
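The process is easy to simulate. The sketch below is an editorial illustration under stated assumptions—in particular, it allows the two chosen boxes to coincide, a convention the thesis may fix differently—and the function names are illustrative. In line with the abstract, the empirical occupancy distribution should settle near a geometric distribution with mean R.

```python
import random
from collections import Counter

def mean_field_zrp(N, R, steps, seed=0):
    """Simulate the discrete-time mean-field zero-range process: at each
    step pick an ordered pair of boxes uniformly at random; if the first
    box is non-empty, move one ball from it to the second box.  Returns
    the empirical occupancy distribution."""
    rng = random.Random(seed)
    boxes = [R] * N  # the ordered initial state: R balls in every box
    for _ in range(steps):
        i = rng.randrange(N)
        j = rng.randrange(N)
        if boxes[i] > 0:
            boxes[i] -= 1
            boxes[j] += 1
    counts = Counter(boxes)
    return {k: counts[k] / N for k in sorted(counts)}

# After O(N) sweeps the occupancies look approximately geometric.
print(mean_field_zrp(N=10000, R=2, steps=40 * 10000))
```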
Chapter 2

Blume–Capel model

2.1 The Blume–Capel model

We have introduced the Ising model for ferromagnetism, in which each vertex takes spin +1 or −1. The Blume–Capel model is a generalization of the Ising model, formulated by Blume in 1966 [7]. The set of spin states is S = {+1, 0, −1}. The Blume–Capel model is defined on a graph G = (V, E) by the Hamiltonian

H(σ) = −J Σ_{⟨x,y⟩∈E} σ(x)σ(y) + D Σ_{x∈V} σ(x)² − h Σ_{x∈V} σ(x), σ ∈ Σ = S^V.

This can be considered a dilution of the Ising model in the following sense. Let G be a finite graph. For a configuration σ ∈ Σ, define the undeleted graph to be G_σ = (V_σ, E_σ), where V_σ = {x : σ(x) ≠ 0} and E_σ = {⟨x, y⟩ ∈ E : x, y ∈ V_σ}. Condition the Blume–Capel measure on the event that the undeleted graph takes a particular value; the resulting measure is the Ising measure on the undeleted graph.

A first-order phase transition is one where a property of the model, such as the density of +1 spins, is discontinuous as a function of the parameters. A second-order phase transition is one without this phenomenon. Capel studied the ferromagnetic model (J > 0) by mean-field approximations [14, 15, 16]. He predicted that in the case h = 0, on a lattice with degree δ, there is a first-order phase transition when (1/3)Jδ log 4 < D < (1/2)Jδ. He predicted a second-order phase transition when D < (1/3)Jδ log 4, and no long-range order at any temperature when D > (1/2)Jδ. It is believed that even on two-dimensional lattices there is a first-order phase transition. It is predicted that there is a tricritical point at which the phase transition goes from first order to second order.

Some of the results we will show for the Blume–Capel model are in fact true in a more general setting. The generalization of the Ising model from two states to the q-state Potts model motivates an expansion of the Blume–Capel model with h = 0. The Blume–Capel zero state is kept, but the ±1 states are replaced by states 1, 2, and more generally 1, . . . , q. Define the Blume–Capel–Potts (BCP) Hamiltonian as follows. Let q be a positive integer, let S = {0, 1, . . . , q} and let Σ = S^V. Again, define the undeleted graph by V_σ = {x : σ(x) ≠ 0} and E_σ = {⟨x, y⟩ ∈ E : x, y ∈ V_σ}. Let

H(σ) = J|E_σ| − 2J Σ_{⟨x,y⟩∈E_σ} δ_{σ(x),σ(y)} + D|V_σ|, σ ∈ Σ.

Let K = βJ and ∆ = βD. Let Λ = (V, E) be a region of G and let ξ ∈ Σ. Define Σ^ξ_Λ = {σ ∈ Σ : σ(x) = ξ(x) for x ∉ V}. Within the region, write V_σ for V_σ ∩ V and E_σ for E_σ ∩ E. Define the BCP measure on the region Λ by

π^ξ_{Λ,K,∆,q}(σ) = (1/Z^BCP) exp(−K|E_σ| + 2K Σ_{⟨x,y⟩∈E_σ} δ_{σ(x),σ(y)} − ∆|V_σ|), σ ∈ Σ^ξ_Λ.

It is easy to see that the q = 2 case is equivalent to the Blume–Capel measure; a numerical check of this appears at the end of this section.

In Chapter 1 we described a coupling of the Potts spin model and the random-cluster bond model. In Section 2.4, we demonstrate a random-cluster representation for the Blume–Capel–Potts model—we call it the diluted-random-cluster model. Paths of open edges in the random-cluster model correspond to order amongst the spins of the Potts model. Likewise, paths of open edges in the diluted-random-cluster model correspond to order amongst the non-zero spins of the BCP model. We say that it is a diluted version of the random-cluster model: some vertices, and the adjacent edges, are removed—they correspond to zero spins in the BCP model.

In Section 2.5 we show that the distribution of vertices deleted in the diluted-random-cluster model is monotonic. We also give various comparison results in Sections 2.5 and 2.6, indicating how the diluted-random-cluster measure varies with the parameters of the model and with the boundary conditions. The comparison results relating to the boundary conditions allow us to define infinite-volume limit measures in Section 2.7, extending the coupling to the hypercubic lattice. In Section 2.8 we use the convexity of the BCP partition function to prove a uniqueness result: if q = 2, the diluted-random-cluster measure on the hypercubic lattice is independent of the boundary conditions except on a null subset of the parameter space. In Section 2.9 we use the coupling and comparison results to draw conclusions about the phase diagram of the q = 2 Blume–Capel–Potts model on the d = 2 hypercubic lattice.
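As promised, here is a brute-force check (an editorial sketch, not part of the thesis; couplings and names are illustrative) that the q = 2 BCP Hamiltonian agrees, configuration by configuration, with the h = 0 Blume–Capel Hamiltonian under the correspondence 0 ↔ 0, 1 ↔ +1, 2 ↔ −1, so the two Gibbs measures coincide.

```python
from itertools import product

J, D = 1.3, 0.7  # arbitrary illustrative couplings
edges = [(0, 1), (1, 2), (0, 2)]  # a triangle graph

def h_blume_capel(s):
    """Blume-Capel Hamiltonian with h = 0, spins in {-1, 0, +1}."""
    return (-J * sum(s[x] * s[y] for x, y in edges)
            + D * sum(v * v for v in s))

def h_bcp(t):
    """BCP Hamiltonian with q = 2, spins in {0, 1, 2}."""
    e_sigma = [(x, y) for x, y in edges if t[x] != 0 and t[y] != 0]
    agree = sum(1 for x, y in e_sigma if t[x] == t[y])
    v_sigma = sum(1 for v in t if v != 0)
    return J * len(e_sigma) - 2 * J * agree + D * v_sigma

spin = {0: 0, 1: +1, 2: -1}  # correspondence between the state spaces
for t in product((0, 1, 2), repeat=3):
    s = tuple(spin[v] for v in t)
    assert abs(h_bcp(t) - h_blume_capel(s)) < 1e-12
print("q = 2 BCP Hamiltonian matches Blume-Capel (h = 0) exactly")
```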
2.2 Notation

Let d ≥ 2. Let Z = {. . . , −1, 0, 1, . . . }. For x, y ∈ Z^d define the distance

l(x, y) = Σ_{i=1}^{d} |x_i − y_i|.

Then L^d = (Z^d, E^d), the d-dimensional hypercubic lattice, is the nearest-neighbour graph on Z^d,

E^d = {⟨x, y⟩ : x, y ∈ Z^d, l(x, y) = 1}.

It is regular: every vertex has degree δ = 2d. L^d is simple: it has no loop edges or parallel edges. If V ⊂ Z^d then the boundary ∂V is the set of vertices in Z^d \ V adjacent to a member of V. The corresponding region Λ = (V, E) of L^d contains the edges with both end vertices in V ∪ ∂V. Write Λ_n → L^d if (Λ_n) is a sequence of regions with vertices tending to Z^d.

If ω ∈ Ω = {0, 1}^I for some set I, define configurations ω^x, ω_x for x ∈ I by

ω^x(y) = 1 if y = x, and ω^x(y) = ω(y) otherwise;
ω_x(y) = 0 if y = x, and ω_x(y) = ω(y) otherwise.

For convenience when manipulating multiple coordinates, we will write ω^{x,y} meaning (ω^x)^y. We use the Kronecker delta symbol: δ_{x,y} = 1 if x = y, and 0 otherwise.

2.3 FKG and Holley's theorem

In Chapter 1 we stated the FKG theorem and Holley's theorem [26, 38]. They are useful tools in the study of the random-cluster model. In this section we will give two helpful additional results [34, 35]. Let I be a finite set and let Ω = {0, 1}^I. Let F be the set of all subsets of Ω, and consider the measurable space (Ω, F). We call a probability measure µ positive if µ(ω) > 0 for all ω ∈ Ω.

Theorem 2.1. Let µ be a positive probability measure on (Ω, F) such that for all i, j ∈ I and all ω ∈ Ω with ω(i) = ω(j) = 0,

µ(ω)µ(ω^{i,j}) ≥ µ(ω^i)µ(ω^j).

Then µ satisfies the FKG lattice condition (1.4). By the FKG theorem it is monotonic and strongly positively associated.

Theorem 2.2. Let µ₁, µ₂ be positive probability measures on (Ω, F) such that µ₁ is strongly positively associated. Suppose that for all ω ∈ Ω with ω(i) = 0,

µ₁(ω)µ₂(ω^i) ≥ µ₁(ω^i)µ₂(ω),

and for all ω ∈ Ω with ω(i) = ω(j) = 0,

µ₁(ω)µ₂(ω^{i,j}) ≥ µ₁(ω^i)µ₂(ω^j).

Then

µ₁(ω ∧ ω′)µ₂(ω ∨ ω′) ≥ µ₁(ω)µ₂(ω′) for all ω, ω′ ∈ Ω,

so by Holley's theorem µ₁ ≤st µ₂.

Let G = (V, E) be a finite graph, let p ∈ (0, 1) and q ≥ 1. The (p, q) random-cluster measure on G satisfies the FKG lattice condition and hence is monotonic: the more edges are open, the more closed edges can be opened without reducing the number of clusters.

2.4 The diluted-random-cluster representation of the BCP model

In the same sense that the BCP model is a dilution of the Potts model, the measure we now define is a diluted version of the random-cluster measure. The parameters of the model are a ∈ [0, 1], p ∈ [0, 1) and q ∈ (0, ∞).

Let G = (V, E) be a graph. Let Ψ = {0, 1}^V be the set of vertex configurations and let Ω = {0, 1}^E be the set of edge configurations. For ψ ∈ Ψ, call a vertex x open if ψ(x) = 1 and closed otherwise. Define the undeleted graph G_ψ = (V_ψ, E_ψ) by

V_ψ = {x ∈ V : ψ(x) = 1}, E_ψ = {⟨x, y⟩ ∈ E : x, y ∈ V_ψ}.

Call the edges not in E_ψ deleted. Let ω ∈ Ω. Call an edge e open if ω(e) = 1 and closed otherwise. If all deleted edges are closed, say that ψ and ω are compatible. Define

Θ = {(ψ, ω) : ψ, ω are compatible}.

Let Λ = (V, E) be a region, and let λ = (κ, ρ) ∈ Θ. Let Θ^λ_Λ = {θ ∈ Θ : θ = λ off Λ}, the set of configurations that agree with the boundary conditions λ on the vertices and edges outside Λ. Let V_ψ = V_ψ ∩ V and E_ψ = E_ψ ∩ E. Two open vertices are connected, written x ↔ y, if they are joined by a path of open edges. A cluster of the graph is a maximal set of connected open vertices.
Let k(θ) denote the number of clusters of the graph that include a vertex in V ∪ ∂V; do not count isolated closed vertices. For now assume a ∈ (0, 1), p ∈ [0, 1). For notational convenience, set r = √(1 − p). Define the (a, p, q) diluted-random-cluster (DRC) measure with boundary conditions λ on Λ by

φ^λ_{Λ,a,p,q}(θ) = (1/Z^DRC) r^{|E_ψ|} q^{k(θ)} Π_{x∈V} (a/(1−a))^{ψ(x)} Π_{e∈E} (p/(1−p))^{ω(e)}, θ = (ψ, ω) ∈ Θ^λ_Λ,

and φ^λ_{Λ,a,p,q}(θ) = 0 otherwise. Here Z^DRC = Z^DRC_{a,p,q}(λ, Λ) is a normalizing constant. The above formula may be interpreted with a = 0 and a = 1. If a = 0, all vertices and edges are closed. If a = 1, all vertices are required to be open, with the edges distributed according to the normal (p, q) random-cluster measure.

The motivation for this definition is provided by the following coupling. Let q be a positive integer and let s ∈ {0, 1, . . . , q}. We will abuse notation by using s to represent a BCP spin configuration: the member of Σ such that every vertex has spin s. If s = 0 let b = 0; if s ∈ {1, . . . , q} let b = 1. Again we will abuse notation, this time by taking b to represent a DRC configuration: the element of Θ that assigns each vertex and edge the state b. Let S be the set of triples of spin, vertex and edge configurations, (σ, ψ, ω) ∈ Σ × Ψ × Ω, such that

(i) σ ∈ Σ^s_Λ,
(ii) (ψ, ω) ∈ Θ^b_Λ,
(iii) for all x ∈ V, σ(x) = 0 if and only if ψ(x) = 0, and
(iv) for all e = ⟨x, y⟩ ∈ E, σ(x) ≠ σ(y) implies ω(e) = 0.

So S is the set of configurations that have E_σ = E_ψ and such that the σ-spins on (ψ, ω)-clusters are constant. Define a measure µ with support S and with parameters satisfying

p = 1 − e^{−2K}, a/(1 − a) = e^{−∆},   (2.3)

by

µ(σ, ψ, ω) = Z^{−1} r^{|E_ψ|} Π_{e∈E_ψ} (p/(1−p))^{ω(e)} Π_{x∈V} (a/(1−a))^{ψ(x)}, (σ, ψ, ω) ∈ S,

and µ(σ, ψ, ω) = 0 otherwise.

Theorem 2.4. The measure µ couples the measures π = π^s_{Λ,K,∆,q} and φ = φ^b_{Λ,a,p,q}, where a, p, K, ∆ satisfy (2.3).

Proof. Let σ ∈ Σ. This uniquely determines the vertex configuration ψ and the set of edges that may be open, A_σ = {⟨x, y⟩ ∈ E_σ : σ(x) = σ(y)}. Then

π(σ) = (1/Z^BCP) exp(−K|E_σ| + 2K Σ_{⟨x,y⟩∈E_σ} δ_{σ(x),σ(y)} − ∆|V_σ|)
  = (1/Z^BCP) e^{−K|E_ψ|} e^{−∆|V_σ|} e^{2K|A_σ|}
  = (1/Z^BCP) r^{|E_ψ|} Π_{x∈V} (a/(1−a))^{ψ(x)} Π_{e∈A_σ} (1 + p/(1−p))
  = (1/Z^BCP) r^{|E_ψ|} Σ_{ω:(σ,ψ,ω)∈S} Π_{x∈V} (a/(1−a))^{ψ(x)} Π_{e∈E} (p/(1−p))^{ω(e)}
  = (Z/Z^BCP) Σ_{ω:(σ,ψ,ω)∈S} µ(σ, ψ, ω).

By summing over σ, Z = Z^BCP and µ has first marginal π.

Now fix θ = (ψ, ω) ∈ Θ^b_Λ. There are q^{k(θ)−b} BCP configurations σ ∈ Σ^s_Λ such that (σ, ψ, ω) ∈ S; there are q possible spins on each of the k(θ) − b open clusters that do not meet the boundary. Hence

φ(ψ, ω) = (1/Z^DRC) r^{|E_ψ|} q^{k(θ)} Π_{x∈V} (a/(1−a))^{ψ(x)} Π_{e∈E} (p/(1−p))^{ω(e)}
  = (Z/Z^DRC) q^b Σ_σ µ(σ, ψ, ω).

So Z^DRC = q^b Z and the second marginal is φ. □

Note that Z^DRC = q^b Z^BCP. This is used to give a uniqueness result in Section 2.8.

Conditional on the BCP configuration, the coupled diluted-random-cluster edge states are (i) zero on the deleted edges, (ii) zero on the undeleted edges ⟨x, y⟩ where σ(x) ≠ σ(y), and (iii) independent Bernoulli(p) random variables on the other edges. Conditional on the diluted-random-cluster configuration, the coupled BCP spin states are (i) constant and non-zero on clusters, (ii) zero on deleted vertices, (iii) equal to the boundary condition s if the cluster meets the boundary, and (iv) independent between the other clusters, with the spins uniformly distributed on S = {1, . . . , q}.

Suppose ψ(x) = ψ(y) = 1. If x ↔ y then σ(x) = σ(y).
If x ↮ y then σ(x) and σ(y) are independent and they agree with probability 1/q. It follows that for q > 1,

π(σ(x) = σ(y) ≠ 0) − q^{−1} π(σ(x)σ(y) ≠ 0) = (1 − q^{−1}) φ(x ↔ y).   (2.5)

This will provide a precise connection between the phase diagrams of the DRC and BCP models. Existence of large open edge clusters in the DRC model implies that non-zero spins in the BCP model are correlated in a non-trivial way over large distances. Also, the distribution of the set of 0-spins is the same in both models: the DRC model has an infinite vertex cluster of zero states if and only if the same is true for the corresponding BCP model.

Unlike the random-cluster measure, the DRC measure is not monotonic. Let G be the graph with two vertices and one edge, V = {x, y} and E = {e = ⟨x, y⟩}. Let 0 < a, p < 1 and q ∈ (0, ∞). Set r = √(1 − p) < 1. Let φ be the (a, p, q) DRC measure for G. Then

φ(ψ(y) = 1 | ψ(x) = 0, ω(e) = 0) = qa/(qa + 1 − a)
  ≥ φ(ψ(y) = 1 | ψ(x) = 1, ω(e) = 0) = qar/(qar + 1 − a).

Hence φ is not monotonic with respect to {0, 1}^V × {0, 1}^E: the probability that y is open is not increasing in the other states. However, we will show in Section 2.5 that φ does have the weaker property of monotonicity in the vertex states.

There are some special cases of the diluted-random-cluster measure that are well understood; we will use them later for comparison purposes. If p = 0, the edge configuration is identically zero and the vertex states are independent Bernoulli random variables with density qa/(1 − a + qa). This corresponds to the K = 0 Blume–Capel–Potts measure with independent spins.

On graphs of regular degree, the q = 1 BCP and DRC measures may be viewed as the Ising model. For the BCP model, let η(x) = 2σ(x) − 1. This takes us from a BCP configuration σ ∈ {0, 1}^V to an Ising configuration η ∈ {+1, −1}^V:

π^ξ_{Λ,K,∆,1}(σ) ∝ exp(−K|E_σ| + 2K Σ_{⟨x,y⟩∈E_σ} δ_{σ(x),σ(y)} − ∆|V_σ|)
  = exp(−∆ Σ_{x∈V} σ(x) + K Σ_{⟨x,y⟩∈E} σ(x)σ(y))
  ∝ exp(((Kδ − 2∆)/4) Σ_{x∈V} η(x) + (K/4) Σ_{⟨x,y⟩∈E} η(x)η(y)).

By the coupling theorem, the same holds for the vertex states of the corresponding DRC measure.

2.5 The vertex DRC marginal measure

Let φ = φ^λ_{Λ,a,p,q} be a diluted-random-cluster measure on a region Λ = (V, E) in G with boundary conditions λ = (κ, ρ). Define the vertex DRC measure Φ = Φ^λ_{Λ,a,p,q} as the marginal measure of φ on the vertex states Ψ = {0, 1}^V,

Φ(ψ) = Σ_{ω : (ψ,ω)∈Θ^λ_Λ} φ(ψ, ω).

Let a ∈ (0, 1), p ∈ [0, 1) and q ∈ [1, 2]. Let r = √(1 − p). We exclude the cases a = 0 and a = 1; the corresponding vertex measures are trivial. Assume G is simple with maximum vertex degree δ.

Each vertex, conditional on the states of the other vertices, is open with probability bounded away from 0 and 1; Φ has what is known as the 'finite energy property' [13].

Theorem 2.6. Let J_x be the event that x ∈ V is open, and let T_x be the σ-field generated by the states of the vertices V \ {x}. Then

qa/(1 − a + qa) ≤ Φ(J_x | T_x) ≤ qa/(r^δ(1 − a) + qa).

Theorem 2.7. Φ is monotonic and strongly positively associated.

The following theorems will be used to show the existence of infinite-volume limit DRC measures.

Theorem 2.8. Let λ₁, λ₂ ∈ Θ. If λ₁ ≤ λ₂ then Φ^{λ₁}_{Λ,a,p,q} ≤st Φ^{λ₂}_{Λ,a,p,q}.

Theorem 2.9. Let Λ₁, Λ₂ be two regions in G such that Λ₁ is contained in Λ₂. Then

Φ^0_{Λ₁,a,p,q} ≤st Φ^0_{Λ₂,a,p,q}, Φ^1_{Λ₁,a,p,q} ≥st Φ^1_{Λ₂,a,p,q}.

We can identify a number of situations in which stochastic inequalities hold between versions of the vertex DRC measure with different parameter values.

Theorem 2.10.
Let a_i ∈ (0, 1), p_i ∈ [0, 1) and q_i ∈ [1, 2] for i = 1, 2. Let Φ_i = Φ^λ_{Λ,a_i,p_i,q_i} for i = 1, 2. Let δ be the maximum degree of G. Then Φ₁ ≤st Φ₂ if any of the following hold:

(a) a₁ ≤ a₂, p₁ ≤ p₂ and q₁ = q₂;

(b) q₂a₂/(1 − a₂) ≥ (q₁a₁/(1 − a₁)) (1 − p₁)^{−δ/2};

(c) p₁ ≤ p₂, q₁ ≥ q₂ and q₂ (a₂/(1 − a₂)) (1 − p₂)^{δ/2} ≥ q₁ (a₁/(1 − a₁)) (1 − p₁)^{δ/2};

(d) q₁ ≤ q₂, p₁/(q₁(1 − p₁)) ≤ p₂/(q₂(1 − p₂)) and q₂ (a₂/(1 − a₂)) (1 − p₂)^{δ/2} ≥ q₁ (a₁/(1 − a₁)) (1 − p₁)^{δ/2}.

Let Λ(λ, ψ) be the undeleted graph G_ψ with boundary conditions λ on Λ; identify the vertices in the boundary ∂V joined by open edges of λ outside Λ. Then with reference to (1.1) we can write

Φ(ψ) = (1/Z^DRC) r^{|E_ψ|} (a/(1−a))^{|V_ψ|} Z^RC_{p,q}(Λ(λ, ψ)).   (2.11)

Lemma 2.12. Let Φ_i, i = 1, 2, be the vertex DRC measures on Λ with boundary conditions λ ∈ Θ and parameters a_i ∈ (0, 1), p_i ∈ [0, 1), q₁ ∈ [1, ∞) and q₂ ∈ [1, 2]. Suppose that ψ ∈ {0, 1}^V and x ∈ V with ψ(x) = 0. Let b be the number of edges incident to x in Λ(λ, ψ^x) and let I_x be the event that all the edges incident to x are closed. Let µ^G_i be the (p_i, q_i) random-cluster measure on the graph G.

(i) If

(q₂a₂/(1 − a₂)) (1 − p₂)^{b/2} / µ^{Λ(λ,ψ^x)}_2(I_x) ≥ (q₁a₁/(1 − a₁)) (1 − p₁)^{b/2} / µ^{Λ(λ,ψ^x)}_1(I_x)   (2.13)

then Φ₁(ψ)Φ₂(ψ^x) ≥ Φ₁(ψ^x)Φ₂(ψ).

(ii) If y ∈ V \ {x} with ψ(y) = 0, let f ∈ {0, 1} be the number of edges between x and y. Then

q₂ (a₂/(1 − a₂)) (1 − p₂)^{(b+f)/2} / µ^{Λ(λ,ψ^{x,y})}_2(I_x) ≥ q₁ (a₁/(1 − a₁)) (1 − p₁)^{b/2} / µ^{Λ(λ,ψ^x)}_1(I_x)   (2.14)

implies Φ₁(ψ)Φ₂(ψ^{x,y}) ≥ Φ₁(ψ^x)Φ₂(ψ^y).

The following reduces the work needed to check that the conditions of Lemma 2.12 hold. Let r_i = √(1 − p_i), i = 1, 2.

Lemma 2.15. In the context of inequality (2.14),

µ^{Λ(λ,ψ^{x,y})}_2(I_x) ≤ µ^{Λ(λ,ψ^x)}_2(I_x) r₂^f.

Corollary 2.16. Condition (2.13) also implies (2.14) for all appropriate y.

Proof of Lemma 2.15. Let B and C be the sets of edges joining x and y, respectively, to ψ-open vertices. Let F be the set of f edges with end vertices x, y. Let B₀, C₀, F₀ be the decreasing events that all the edges in B, C, F, respectively, are closed. By the monotonicity of the random-cluster model, the probability of I_x is increased if the edges in C are deleted from Λ(λ, ψ^{x,y}):

µ^{Λ(λ,ψ^{x,y})}_2(I_x) = µ^{Λ(λ,ψ^{x,y})}_2(B₀ ∩ F₀) ≤ µ^{Λ(λ,ψ^{x,y})\C}_2(B₀ ∩ F₀).

Notice that

µ^{Λ(λ,ψ^{x,y})\C}_2(F₀) = 1 if f = 0, and q₂(1 − p₂)/(p₂ + q₂(1 − p₂)) if f = 1;

in either case it is at most r₂^f. This is by the conditional probability property of the random-cluster model (1.2) and the elementary inequality

q(1 − p)/(p + q(1 − p)) ≤ √(1 − p), p ∈ [0, 1], q ∈ [1, 2].

Therefore, deleting edges that have been conditioned closed,

µ^{Λ(λ,ψ^{x,y})\C}_2(B₀ ∩ F₀) ≤ µ^{Λ(λ,ψ^{x,y})\C}_2(B₀ | F₀) r₂^f = µ^{Λ(λ,ψ^x)}_2(I_x) r₂^f. □

Proof of Lemma 2.12. We will only show part (i), that inequality (2.13) implies Φ₁(ψ)Φ₂(ψ^x) ≥ Φ₁(ψ^x)Φ₂(ψ); part (ii) follows in the same way. Note that µ^{Λ(λ,ψ^x)}_i(I_x) = q_i Z^RC_{p_i,q_i}(Λ(λ, ψ)) / Z^RC_{p_i,q_i}(Λ(λ, ψ^x)): the configurations of Λ(λ, ψ^x) in I_x correspond to the configurations of Λ(λ, ψ) with x an additional isolated cluster. So we may use (1.1) to rewrite (2.13) as

(a₁/(1 − a₁)) r₁^b Z^RC_{p₁,q₁}(Λ(λ, ψ^x)) / Z^RC_{p₁,q₁}(Λ(λ, ψ)) ≤ (a₂/(1 − a₂)) r₂^b Z^RC_{p₂,q₂}(Λ(λ, ψ^x)) / Z^RC_{p₂,q₂}(Λ(λ, ψ)).

Notice that |V_{ψ^x}| − |V_ψ| = 1 and |E_{ψ^x}| − |E_ψ| = b. Therefore by (2.11) the result follows. □

The proofs of Theorems 2.7 and 2.10 now follow from Lemma 2.12. In particular, Corollary 2.16 means we only have to check that (2.13) holds.

Proof of Theorem 2.7. Let a_i = a, p_i = p and q_i = q for i = 1, 2. Clearly (2.13) holds with Φ = Φ₁ = Φ₂. Now apply Theorem 2.1. □

Proof of Theorem 2.10. It is sufficient to check (2.13) in each case.
Let x be a vertex incident to b edges in Λ(λ, ψ^x). Let µ_i = µ^{Λ(λ,ψ^x)}_i for i = 1, 2.

(a) For an edge e, let J_e be the event that e is open. By [5] and positive association,

(d/dp₂) µ₂(I_x) = (1/(p₂(1 − p₂))) Σ_{e∈E} cov(I_x, J_e) ≤ (1/(p₂(1 − p₂))) Σ_{e∈B} cov(I_x, J_e),

where B is the set of b edges incident to x. For e ∈ B, I_x and J_e are mutually exclusive, so

(d/dp₂) log µ₂(I_x) ≤ −(1/(p₂(1 − p₂))) Σ_{e∈B} µ₂(J_e)
  ≤ −(b/(p₂(1 − p₂))) p₂/(p₂ + q₂(1 − p₂))
  ≤ −b/(2(1 − p₂)), as q₂ ∈ [1, 2].

Integrating from p₁ to p₂,

µ₂(I_x)/µ₁(I_x) ≤ (1 − p₂)^{b/2}/(1 − p₁)^{b/2}.

As q₁ = q₂ and a₁ ≤ a₂, (2.13) holds and the result follows.

(b) By the conditional edge densities of the random-cluster measure (1.2), µ₂(I_x) ≤ (1 − p₂)^{b/2} and µ₁(I_x) ≥ (1 − p₁)^b, so (2.13) holds.

(c) Under these conditions, µ₂ ≥st µ₁, so µ₂(I_x) ≤ µ₁(I_x).

(d) Again µ₂ ≥st µ₁, so µ₂(I_x) ≤ µ₁(I_x). □

Proof of Theorem 2.8. Let λ_i = (κ_i, ρ_i), i = 1, 2. By the monotonicity of the vertex DRC measure we may assume κ₁ = κ₂. We can adapt the proof of Lemma 2.12: by the monotonicity of the random-cluster model,

µ^{Λ(λ₁,ψ^x)}_1(I_x) ≥ µ^{Λ(λ₂,ψ^x)}_2(I_x);

now apply Theorem 2.2. □

Proof of Theorem 2.9. Let A be the decreasing event that all vertices in Λ₂ \ Λ₁ are closed. By the definition of the DRC measure,

Φ^0_{Λ₁,a,p,q} = Φ^0_{Λ₂,a,p,q}( · | A).

The event A is decreasing, so by the monotonicity of Φ^0_{Λ₂,a,p,q},

Φ^0_{Λ₁,a,p,q} ≤st Φ^0_{Λ₂,a,p,q}.

Let λ = (κ, ρ) be a configuration such that κ = 1 and ρ = 1 off Λ₂. Then

Φ^1_{Λ₁,a,p,q} ≥st Φ^λ_{Λ₁,a,p,q}.

Notice that Φ^1_{Λ₂,a,p,q} can be written as a convex combination of terms of the form on the right-hand side above, so

Φ^1_{Λ₁,a,p,q} ≥st Φ^1_{Λ₂,a,p,q}. □

Proof of Theorem 2.6. Since q ∈ [1, 2], Φ^λ_{Λ,a,p,q} is monotonic by Theorem 2.7. As J_x is increasing, a lower bound for Φ(J_x | T_x) is obtained by conditioning on all vertices other than x being closed. In that case, x contributes a factor qa/(1 − a) when open and 1 when closed; the lower bound follows. An upper bound is obtained by conditioning on all the vertices other than x being open, and assuming they are in the same open cluster. This time x contributes no more than

r^δ q (a/(1 − a)) Σ_{ω∈{0,1}^δ} Π_{i=1}^{δ} (p/(1 − p))^{ω(i)} = q (a/(1 − a)) r^{−δ}

when open, and 1 when closed. □

2.6 Edge comparisons

In the previous section we dealt only with the vertex states. In this section we will give results for the edge states and the full DRC measure. Let Λ = (V, E) be a region in G. Let λ ∈ Θ, and let φ = φ^λ_{Λ,a,p,q} be the diluted-random-cluster measure. Define the DRC edge measure by

Υ(ω) = Σ_{ψ∈Ψ} φ(ψ, ω), ω ∈ Ω.

In the above sum, those ψ that are not compatible with the edge configuration ω in Λ and the boundary conditions λ on G \ Λ contribute nothing. Assume G is a simple graph with maximum vertex degree δ. Let

w_j = (r^j q a/(1 − a))^{−1}.

This can be thought of as the cost of closing a vertex under φ. Let (ψ, ω) be a configuration such that ψ(x) = 0. If j is the number of ψ-open neighbouring vertices of x, then

φ(ψ, ω) = φ(ψ^x, ω) w_j, with w_j ≤ w_δ.

Theorem 2.17. Let 0 < a ≤ a′ = 1, p, p′ ∈ (0, 1) and q ∈ [1, ∞). Let Υ = Υ^λ_{Λ,a,p,q} and Υ′ = Υ^λ_{Λ,a′,p′,q}. If

p/(1 − p) ≥ (1 + 2w_δ + w_δ w_{δ−1}) p′/(1 − p′)

then Υ ≥st Υ′.

Theorem 2.18. Let 0 < a < 1, 0 ≤ p < 1 and q ∈ [1, 2]. Let λ ∈ Θ and let Λ be a region in G.

(i) φ^λ_{Λ,a,p,q} is increasing in a, p and λ.
(ii) If λ = 0, φ^λ_{Λ,a,p,q} is increasing in Λ.
(iii) If λ = 1, φ^λ_{Λ,a,p,q} is decreasing in Λ.

Proof of Theorem 2.17. Υ′ is monotonic: it is simply a random-cluster measure.
To apply Theorem 2.2 we must show

Υ′(ω)Υ(ω^e) ≥ Υ′(ω^e)Υ(ω) for all ω with ω(e) = 0,   (2.19)
Υ′(ω)Υ(ω^{e,f}) ≥ Υ′(ω^e)Υ(ω^f) for all ω with ω(e) = ω(f) = 0.   (2.20)

Let ω be an edge configuration as in (2.20). By the definition of the random-cluster measure,

Υ′(ω)/Υ′(ω^e) = ((1 − p′)/p′) q^{k′},

where

k′ = k(1, ω) − k(1, ω^e) = 1 if e is an isthmus of the (1, ω^e)-open graph, and 0 otherwise.

Let x and y be the end vertices of e. Let A be the set of vertex configurations compatible with ω^{e,f}, A = {ψ : (ψ, ω^{e,f}) ∈ Θ}. Write A_x for the set of configurations of A with vertex x turned off, {ψ_x : ψ ∈ A}, and similarly A_y and A_{x,y}. Then (2.20) becomes

((1 − p′)/p′) q^{k′} Σ_{ψ∈A} φ(ψ, ω^{e,f}) ≥ Σ_{ψ∈A∪A_x∪A_y∪A_{x,y}} φ(ψ, ω^f).

This holds if

((1 − p′)/p′) q^{k′} Σ_{ψ∈A} φ(ψ, ω^{e,f}) ≥ (1 + 2w_δ + w_δ w_{δ−1}) Σ_{ψ∈A} ((1 − p)/p) q^k φ(ψ, ω^{e,f}),

with k = k(ψ, ω^f) − k(ψ, ω^{e,f}) ≤ k′; here the factor 1 + 2w_δ + w_δ w_{δ−1} accounts for the contributions of A_x, A_y and A_{x,y} relative to A. Splitting up the sum, we see that (2.20) holds. The proof of (2.19) is very similar. □

Proof of Theorem 2.18. In each case the statement holds for the vertex component of the measure; see Theorems 2.8, 2.9 and 2.10 (a). Recall that φ^λ_{Λ,a,p,q} has the same distribution as the random-cluster measure µ^{Λ(λ,ψ)}_{p,q} on the graph Λ(λ, ψ) generated by ψ ∼ Φ^λ_{Λ,a,p,q}. Let X(ψ, ω) be an increasing random variable defined on the vertex and edge states of G. For fixed ψ, define a random-cluster random variable X_ψ(ω) = X(ψ, ω), so that

φ(X) = Φ(µ^{Λ(λ,ψ)}_{p,q}(X_ψ)).

The random-cluster measure is increasing in p and in Λ(λ, ψ). X_ψ is an increasing random variable, and it is increasing in ψ. Therefore µ^{Λ(λ,ψ)}_{p,q}(X_ψ) is an increasing function of ψ. □

2.7 Infinite-volume measures

We are now in a position to discuss infinite-volume limit measures. We will work on the hypercubic lattice: take G = L^d with d ≥ 2. The results generalize easily to other lattices. Let 0 < a, p < 1 and q ∈ (0, ∞). Let F_Θ be the σ-algebra generated by the weak topology on Θ. Call a probability measure φ on (Θ, F_Θ) a limit diluted-random-cluster measure with parameters a, p, q and boundary conditions λ ∈ Θ if it is an accumulation point of a sequence of measures (φ^λ_{Λ_n,a,p,q}) with Λ_n → L^d. Let W_{a,p,q} be the set of such measures; let co W_{a,p,q} be the closure of the convex hull of W_{a,p,q}.

The configuration space Θ is compact: the product Ψ × Ω of discrete topological spaces is compact, and Θ is a closed subset of Ψ × Ω. It is standard by Prohorov's theorem [6] that any sequence of measures on Θ must have an accumulation point. Hence W_{a,p,q} is non-empty.

Using the results of the previous three sections, we can show that many results for the q ≥ 1 random-cluster model have analogues for the DRC model when q ∈ [1, 2]. Let 0 < a, p < 1, q ∈ [1, 2] and define the DRC measures with regular boundary conditions, φ^b_{a,p,q} = lim_{Λ→L^d} φ^b_{Λ,a,p,q} for b ∈ {0, 1}.

Theorem 2.21. Let φ be a limit DRC measure. Consider the vertex measure Φ obtained by projecting φ onto the set of vertex configurations.

(i) If φ ∈ W_{a,p,q} then Φ is positively associated.
(ii) If φ ∈ co W_{a,p,q} then Φ has a finite energy property.

By the conditional probability properties of the random-cluster model, if Φ has a finite energy property then the corresponding edge measure Υ has a finite energy property: given the states of all but one edge, that edge is open with probability bounded away from 0 and 1.

Theorem 2.22. The limit measures φ^b_{a,p,q} are (i) well defined, (ii) stochastically increasing in a, p and b, (iii) extremal in co W_{a,p,q}, (iv) tail trivial, and (v) translation invariant.
Also, the measures φ^b_{a,p,q} have the 0/1-infinite cluster property with respect to both nearest-neighbour vertex clusters and edge clusters: the number of infinite clusters of each type is a.s. constant and equal to either 0 or 1.

Theorem 2.23. If q ∈ {1, 2} and s ∈ {0, 1, . . . , q} then the limit Blume–Capel–Potts measure π^s_{K,∆,q} = lim_{Λ→L^d} π^s_{Λ,K,∆,q} exists.

We can also define DLR-type DRC measures. Let R_{a,p,q} be the set of measures φ such that

φ(A | T_Λ)(λ) = φ^λ_{Λ,a,p,q}(A), A ∈ F_Θ, φ-a.e. λ.

Let x, y ∈ Z^d and write 1_{x↔y} for the indicator function of the event that x is connected to y. A consequence of Theorem 2.22 is that the set of discontinuities of 1_{x↔y} is φ^b_{a,p,q}-a.s. null. By this we mean that for almost all configurations θ, there is a region Λ = Λ(θ) such that x ↔ y is decided inside Λ. If the event is not determined inside a region Λ′, then x and y must both be joined to the boundary ∂Λ′ by open paths that do not meet inside Λ′. This cannot be the case for arbitrarily large Λ′, for otherwise x and y would lie in distinct infinite edge clusters. It follows from the above that φ^b_{a,p,q} ∈ R_{a,p,q}.

The following theorem collects some standard results [34]: positive association and stochastic orderings are preserved under weak limits.

Theorem 2.24. Let I be a countable set. Let Ω = {0, 1}^I and let F be the σ-algebra generated by the product topology.

(a) Let (µ_n) and (ν_n) be two sequences of probability measures on (Ω, F), with weak limits µ, ν respectively. If µ_n ≤st ν_n for all n then µ ≤st ν.

(b) If the measures (µ_n) on (Ω, F) converge weakly to µ, and each µ_n is positively associated, then so is µ.

(c) Let µ, ν be measures on (Ω, F) and let J_i(ω) = ω(i). If µ ≤st ν and µ(J_i) = ν(J_i) for all i ∈ I, then µ = ν.

Corollary 2.25. The vertex comparison results of Theorem 2.10 and the edge comparison result of Theorem 2.17 extend to L^d.

The following lemma is needed in the proof of Theorem 2.23 for s = 1, . . . , q.

Lemma 2.26. Let A be an increasing closed event. Let X be a φ¹-a.s. continuous random variable. Then φ¹_Λ(X 1_A) → φ¹(X 1_A) as Λ → L^d.

Proof of Theorem 2.21. (i) Write Φ = lim_{Λ→L^d} Φ^λ_{Λ,a,p,q}. By Theorem 2.7, Φ is the limit of monotonic measures, so the result follows by Theorem 2.24 (b).

(ii) We adapt the proof of Theorem 3.3 in [33]. Suppose first that φ ∈ W_{a,p,q}, say φ = lim_{Λ→Z^d} φ^λ_Λ. We will write F_V for the σ-algebra generated by the states of vertices in Λ = (V, E), and T_V for the σ-algebra generated by the states of vertices in Z^d \ V. Let Λ′ = (V′, E′). By the martingale convergence theorem [36] and weak convergence,

φ^λ(J_x | T_x) = lim_{Λ′→Z^d} φ^λ(J_x | F_{V′\x}) = lim_{Λ′→Z^d} lim_{Λ→Z^d} φ^λ_Λ(J_x | F_{V′\x}).

By Theorem 2.6 this limit is bounded away from 0 and 1. The bounds depend only on a, p and q, so the result extends to the convex hull of W_{a,p,q} by linearity. If φ ∈ co W_{a,p,q} we can write φ = lim_n φ_n with each φ_n in the convex hull of W_{a,p,q}. Write I_{κ,V\x} for the cylinder event that the vertices in V \ x take the values from κ. Again, a.s. by martingale convergence,

φ(J_x | T_x)(κ) = lim_{Λ→Z^d} φ(J_x | F_{V\x})(κ)
  = lim_{Λ→Z^d} φ(J_x ∩ I_{κ,V\x}) / φ(I_{κ,V\x})
  = lim_{Λ→Z^d} lim_{n→∞} φ_n(J_x ∩ I_{κ,V\x}) / φ_n(I_{κ,V\x}).

The vertex states of φ have a finite energy property. □

An event is called a cylinder event if it depends only on the states of a finite number of vertices and edges. The set of cylinder events is convergence determining [6]; the same is true of just the set of increasing cylinder events.

Proof of Theorem 2.22.
(i) Let $A$ be an increasing cylinder event. Let $(\Lambda_n)$, $(\Delta_n)$ be two increasing sequences of regions with $\Lambda_n \to L^d$, $\Delta_n \to L^d$. Then by Theorem 2.18 the sequence $(\phi^b_{\Lambda_n,a,p,q}(A))$ is monotone. For every $\Lambda_n$ there exists an $m$ such that $\Lambda_n \subset \Delta_m$, and vice versa. Hence $\phi^b_{a,p,q}(A)$ is well defined.
(ii) This follows from Theorem 2.18; take the limit $\Lambda \to L^d$ using Theorem 2.24(a).
(iii) This follows from Theorem 2.18; take the limit $\Lambda \to L^d$ using Theorem 2.24(a).
(iv) The following argument is standard [4, 34]. First consider the case $b = 0$. Let $\Lambda \subset \Delta$ be regions in $L^d$. Let $T_\Lambda$ be the $\sigma$-field generated by the vertices and edges of $L^d \setminus \Lambda$. Let $A$ be an increasing cylinder event on $\Lambda$, and $T$ an event on $\Delta\setminus\Lambda$. Let $I_{0,\Delta\setminus\Lambda}$ be the decreasing event that the vertices and edges in $\Delta\setminus\Lambda$ are closed. By Theorem 2.18(ii),
$$\phi^0_\Lambda(A) = \phi^0_\Delta(A \mid I_{0,\Delta\setminus\Lambda}) \le \phi^0_\Delta(A \mid T).$$
Take the limits $\Delta \to \mathbb{Z}^d$ and then $\Lambda \to \mathbb{Z}^d$ to get $\phi^0(A)\phi^0(T) \le \phi^0(A \cap T)$ for $T$ in the tail $\sigma$-field $\mathcal{T} = \bigcap_{\Lambda \text{ a region of } L^d} T_\Lambda$. Replacing $T$ with $T^c$,
$$\phi^0(A)\phi^0(T^c) \le \phi^0(A \cap T^c).$$
Adding the two inequalities gives $\phi^0(A) \le \phi^0(A)$; therefore both inequalities hold with equality,
$$\phi^0(A)\phi^0(T) = \phi^0(A \cap T).$$
This holds for all increasing cylinder events $A$, and hence for all events. Taking $A = T$ gives $\phi^0(T) = \phi^0(T)^2$, so $\phi^0$ is tail-trivial. Similarly for $b = 1$.
(v) Let $A$ be an increasing cylinder event on $\Lambda$. Let $\tau : L^d \to L^d$ be a translation of the lattice. Then
$$\phi^0(A) \ge \phi^0_\Lambda(A) = \phi^0_{\tau\Lambda}(\tau A) \to \phi^0(\tau A), \qquad \Lambda \to \mathbb{Z}^d.$$
The above inequality also holds for $\tau^{-1}$, so $\phi^0(A) = \phi^0(\tau A)$. Similarly for 1 boundary conditions, with the inequalities reversed.

The 0/1-infinite cluster property follows from [13], as $\phi^b$ is translation invariant and the projected measures on vertices and edges have finite energy properties.

Proof of Lemma 2.26. Write $[1_A]^1_\Lambda(\theta)$ for the indicator function of $A$ evaluated at $\theta$ on $\Lambda$ with boundary condition 1. As $A$ is increasing and closed, for all $\theta \in \Theta$,
$$[1_A]^1_\Lambda(\theta) \downarrow 1_A(\theta) \quad\text{as } \Lambda \to L^d.$$
Let regions $\Lambda', \Lambda, \Lambda''$ satisfy $\Lambda' \subset \Lambda \subset \Lambda''$. By monotonicity, as used in the proof of Theorem 2.9,
$$\phi^1_\Lambda([1_A]^1_{\Lambda'}) \ge \phi^1_\Lambda(1_A) \ge \phi^1_{\Lambda''}([1_A]^1_\Lambda).$$
Let $\Lambda'' \to L^d$. By weak convergence,
$$\phi^1_\Lambda([1_A]^1_{\Lambda'}) \ge \phi^1_\Lambda(1_A) \ge \phi^1([1_A]^1_\Lambda).$$
Take the limit $\Lambda \to L^d$. By weak convergence and the monotone convergence theorem,
$$\phi^1([1_A]^1_{\Lambda'}) \ge \lim_{\Lambda\to L^d} \phi^1_\Lambda(1_A) \ge \phi^1(1_A).$$
Now let $\Lambda' \to L^d$. Again by the monotone convergence theorem,
$$\phi^1(1_A) \ge \lim_{\Lambda\to L^d} \phi^1_\Lambda(1_A) \ge \phi^1(1_A).$$
Let $(\Lambda_n)$ be a sequence of regions tending to $L^d$. The sequence of measures $(\phi^1_{\Lambda_n})$ is decreasing, so there is a monotonic coupling: let $\theta, (\theta_n)$ be random variables such that each $\theta_n$ has law $\phi^1_{\Lambda_n}$, $\theta$ has law $\phi^1$, and $\theta_n \downarrow \theta$ as $n \to \infty$. $X$ is $\phi^1$-a.s. continuous, so $X(\theta_n) \to X(\theta)$, $\phi^1$-a.s.; also $1_A(\theta_n) \to 1_A(\theta)$, $\phi^1$-a.s. The result follows by the dominated convergence theorem.

Proof of Theorem 2.23. By the coupling of the finite BCP and DRC models, and the existence of $\lim_{\Lambda\to L^d} \phi^b_\Lambda$, $b = 0, 1$, we can show that $\lim_{\Lambda\to L^d} \pi^s_\Lambda$ exists. Let $\sigma \in \Sigma$ be a spin configuration. Let $A$ be the BCP cylinder event that the spins in a region $\Lambda = (V, E)$ agree with $\sigma$. Events such as $A$ are convergence determining. Let $B$ be the DRC event that (i) for all $x \in V$, $\psi(x) = 0$ if and only if $\sigma(x) = 0$, and (ii) for all $x, y \in V$, $x \leftrightarrow y$ only if $\sigma(x) = \sigma(y)$. We can partition the event $B$ according to the number of open clusters that intersect $V$: let $B_i$ be the intersection of $B$ with the event that $V$ intersects $i$ open clusters. Let $\Delta$ be a region containing $\Lambda$.
Suppose first $b = 0$ and $s = 0$. By the properties of the coupling we can write
$$\pi^s_\Delta(A) = \sum_{i=0}^{|V|} q^{-i}\,\phi^0_\Delta(B_i).$$
Take the limit $\Delta \to \mathbb{Z}^d$.

In the case $b = 1$ and $s = 1, \ldots, q$ we must modify the expression for $\pi^s_\Delta(A)$ to take into account the fixed spin of the boundary cluster. Let
$$A_+ = \{\exists x \in V : \sigma(x) = s \text{ and } x \leftrightarrow \infty\}, \qquad A_- = \{\exists x \in V : \sigma(x) \ne s \text{ and } x \leftrightarrow \infty\}.$$
By Lemma 2.26 we can take the limit $\Delta \to L^d$ below to give the result,
$$\pi^s_\Delta(A) = \sum_{i=0}^{|V|} q^{-i}\big[\phi^1_\Delta(B_i) + (q-1)\,\phi^1_\Delta(B_i \cap A_+) - \phi^1_\Delta(B_i \cap A_-)\big].$$

2.8 Uniqueness by convexity of pressure

Fix $q \ge 1$. Except possibly for countably many $p$, limit random-cluster measures on $L^d$ are unique [33]. The proof uses the convexity of a certain function, the pressure, defined in terms of the random-cluster partition function. We extend this technique to give a uniqueness result for the $q = 2$ limit DRC measures $\phi^\lambda = \phi^\lambda_{a,p,q}$ on $L^d$. By the coupling, we also get a uniqueness result for $q = 2$ Blume–Capel–Potts limit measures $\pi^\xi = \pi^\xi_{K,\Delta,q}$. In this section, let (2.3) fix the relationship between $a, p$ and $K, \Delta$.

Theorem 2.27. Let $q = 2$. The set of points $(a, p)$ at which $\phi^0_{a,p,q} \ne \phi^1_{a,p,q}$, and $\pi^0_{K,\Delta,q} \ne (\pi^1_{K,\Delta,q} + \pi^2_{K,\Delta,q})/2$, can be covered by a countable collection of rectifiable curves of $[0, 1]^2$.

Let $\Lambda = (V, E)$ be a region in $L^d$. Take $b \in \{0, 1\}$ to be both a DRC boundary condition and a BCP boundary condition. By Theorem 2.4 we can define
$$G^b_\Lambda(K, \Delta) = \frac{1}{|V|} \log q^{-b} Z^{DRC}_{a,p,q}(b, \Lambda) = \frac{1}{|V|} \log Z^{BCP}_{K,\Delta,q}(b, \Lambda) = \frac{1}{|V|} \log \sum_\sigma \exp\Big[-K|E_\sigma| + 2K \sum_{\langle x,y\rangle\in E_\sigma} \delta_{\sigma(x),\sigma(y)} - \Delta|V_\sigma|\Big].$$
The sum is over BCP configurations $\sigma$ that are compatible with boundary condition $b$. Differentiating $G^b_\Lambda(K, \Delta)$ with respect to $\Delta$,
$$\frac{\partial}{\partial\Delta} G^b_\Lambda(K, \Delta) = -\frac{1}{|V|}\,\pi^b(|V_\sigma|) = -\frac{1}{|V|}\,\phi^b(|V_\psi|).$$
Also
$$\frac{\partial}{\partial K} G^b_\Lambda(K, \Delta) = -\frac{1}{|V|}\,\pi^b\Big(-|E_\sigma| + 2\sum_{\langle x,y\rangle\in E_\sigma} \delta_{\sigma(x),\sigma(y)}\Big) = -\frac{1}{|V|}\,\phi^b\Big(-|E_\psi| + \frac{2}{p}\sum_{e\in E} \omega(e)\Big).$$
Let $i = (i_1, i_2)$ be a unit vector in $\mathbb{R}^2$, and let Var denote variance with respect to $\pi^b$. Differentiating in the direction $i$ on the $(K, \Delta)$-plane,
$$\frac{\partial^2}{\partial i^2} G^b_\Lambda(K, \Delta) = \frac{1}{|V|}\,\mathrm{Var}\Big(-i_1|E_\sigma| + 2i_1 \sum_{\langle x,y\rangle\in E_\sigma} \delta_{\sigma(x),\sigma(y)} - i_2|V_\sigma|\Big) \ge 0.$$
$G^b_\Lambda$ is convex.

Lemma 2.28. Let $\Delta \in \mathbb{R}$, $K \in [0, \infty)$ and $b \in \{0, 1\}$. The limit
$$G(K, \Delta) = \lim_{\Lambda\to L^d} G^b_\Lambda(K, \Delta)$$
exists. It is a convex function of $K, \Delta$ and is independent of $b$.

Proof of Theorem 2.27. By Theorem 8.18 of [23] and the convexity of $G$, the subset of $[0, \infty) \times \mathbb{R}$ where $G(K, \Delta)$ is not differentiable can be covered by a countable collection of rectifiable curves. Let $J_x$ be the event that vertex $x$ is open. By translation invariance and the monotonicity of vertex DRC measures, for any region $\Lambda$ containing $x$,
$$\frac{1}{|V|}\sum_{y\in V} \phi^0_\Lambda(\psi(y)) \le \phi^0(J_x) \le \phi^1(J_x) \le \frac{1}{|V|}\sum_{y\in V} \phi^1_\Lambda(\psi(y)). \qquad (2.29)$$
The left- and right-most terms in (2.29) are $-\frac{\partial}{\partial\Delta} G^b_\Lambda$ with $b = 0, 1$ respectively, and $\frac{\partial}{\partial\Delta} G^b_\Lambda \to \frac{\partial}{\partial\Delta} G$ as $\Lambda \to L^d$. If $G$ is differentiable at $(a, p)$ then $\phi^0(J_x) = \phi^1(J_x)$. By Theorem 2.24(c), the corresponding vertex measures satisfy $\Phi^0 = \Phi^1$. As $\Phi^0 = \Phi^1$, $\lim_{\Lambda\to L^d} |V|^{-1}\phi^b(|E_\psi|)$ is independent of $b$. Take the limit of $\frac{\partial}{\partial K} G^b_\Lambda(K, \Delta)$ as $\Lambda \to L^d$: for all $e \in \mathbb{E}^d$, $\phi^0(J_e) = \phi^1(J_e)$. Applying Theorem 2.24(c) gives $\phi^0 = \phi^1$.

We will use the following lemmas in a subadditivity argument to show that $G^b_\Lambda$ converges as $\Lambda \to \mathbb{Z}^d$. The first bounds the BCP partition function; the second bounds how the partition function changes when edges are removed from the graph.
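Before stating the lemmas, the definition of $G^b_\Lambda$ and its convexity can be checked directly on a tiny graph. The sketch below is illustrative only: it assumes $q = 2$ (three spin values $\{0, 1, 2\}$, consistent with the $3^{|V|}$ terms of Lemma 2.30 below), takes $V_\sigma$ to be the set of vertices with non-zero spin and $E_\sigma$ the edges joining them, as in the Hamiltonian above, and ignores boundary conditions; the graph, base point and step size are arbitrary choices.

```python
import itertools, math

def G(K, D, q=2):
    """(1/|V|) log Z^BCP on a 2x2 grid with free boundary, spins {0,...,q}."""
    V = [(x, y) for x in range(2) for y in range(2)]
    E = [(u, v) for u in V for v in V
         if u < v and abs(u[0] - v[0]) + abs(u[1] - v[1]) == 1]
    Z = 0.0
    for spins in itertools.product(range(q + 1), repeat=len(V)):
        s = dict(zip(V, spins))
        Es = [(u, v) for (u, v) in E if s[u] != 0 and s[v] != 0]
        agree = sum(1 for (u, v) in Es if s[u] == s[v])
        n_occ = sum(1 for x in V if s[x] != 0)            # |V_sigma|
        Z += math.exp(-K * len(Es) + 2 * K * agree - D * n_occ)
    return math.log(Z) / len(V)

# Convexity check along an arbitrary direction i = (i1, i2) in the (K, D)
# plane: the second central difference of G should be non-negative.
K0, D0, i1, i2, h = 1.0, 0.5, 0.6, 0.8, 1e-3
d2 = G(K0 + h*i1, D0 + h*i2) - 2*G(K0, D0) + G(K0 - h*i1, D0 - h*i2)
print(d2 / h**2 >= 0)   # True: the directional second derivative is >= 0
```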
Write $Z^{BCP}_G$ for the BCP partition function on a graph $G$ with no boundary conditions.

Lemma 2.30. For a graph $G = (V, E)$,
$$-K|E| - |V|\,|\Delta| + |V|\log 3 \le \log Z^{BCP}_G \le K|E| + |V|\,|\Delta| + |V|\log 3.$$

Lemma 2.31. Suppose removing the edges $F \subset E$ from a graph $G = (V, E)$ creates disjoint subgraphs $G_i = (V_i, E_i)$, $i = 1, 2$. Then
$$\log Z^{BCP}_{G_1} + \log Z^{BCP}_{G_2} - K|F| \le \log Z^{BCP}_G \le \log Z^{BCP}_{G_1} + \log Z^{BCP}_{G_2} + K|F|.$$

Proof of Lemma 2.28. First assume $b = 0$. For a region $\Lambda = (V, E)$ the edges between $V$ and the boundary $\partial V$ make no contribution to $Z^{BCP}_{K,\Delta,q}(b, \Lambda)$ and can be ignored. Let $n = (n_1, \ldots, n_d) \in \mathbb{N}^d$ and let $\Lambda_n$ be the region defined by the $|n| = n_1 \cdots n_d$ vertices in $\prod_{i=1}^d [1, n_i]$. Let $k \in \mathbb{N}^d$ and write
$$(n, k) = \big(k_i \lfloor n_i/k_i \rfloor : 1 \le i \le d\big), \qquad \Big\lfloor \frac{n}{k} \Big\rfloor = \prod_{i=1}^d \Big\lfloor \frac{n_i}{k_i} \Big\rfloor.$$
Let $\Delta$ be the region defined by the vertices in $\Lambda_n$ but not in $\Lambda_{(n,k)}$. Then by Lemma 2.31,
$$\Big\lfloor \frac{n}{k} \Big\rfloor \Big(\log Z^{BCP}_{\Lambda_k} - K \sum_{i=1}^d \frac{|k|}{k_i}\Big) + \log Z^{BCP}_\Delta \le \log Z^{BCP}_{\Lambda_n}.$$
Apply Lemma 2.30 to $\Delta$,
$$\log Z^{BCP}_\Delta \ge (-dK - |\Delta| + \log 3) \sum_{i=1}^d \frac{|n|}{n_i} k_i.$$
Divide by $|n|$ and let the $n_i$ tend to $\infty$,
$$\frac{1}{|k|} \log Z^{BCP}_{\Lambda_k} - K \sum_{i=1}^d \frac{1}{k_i} \le \liminf_{n\to\infty} G^b_{\Lambda_n}.$$
Let $k \to \infty$,
$$\limsup_{k\to\infty} G^b_{\Lambda_k} \le \liminf_{n\to\infty} G^b_{\Lambda_n}.$$
Hence $G = \lim_{n\to\infty} G^b_{\Lambda_n}$ exists. Each $G^b_{\Lambda_n}$ is a convex function of $K, \Delta$, and so $G$ is too. If $b = 1$ we can ignore the boundary conditions by Lemma 2.31; the proportion of vertices in $\Lambda_n$ adjacent to the external vertex boundary tends to 0 as $n \to \infty$.

Proof of Lemma 2.30. The partition function $Z^{BCP}_G$ is a sum of $3^{|V|}$ terms, each bounded between $\exp(-K|E| - |V|\,|\Delta|)$ and $\exp(K|E| + |V|\,|\Delta|)$.

Proof of Lemma 2.31. Notice that $\log Z^{BCP}_{(V,E\setminus F)} = \log Z^{BCP}_{G_1} + \log Z^{BCP}_{G_2}$; the BCP measure is independent between disjoint graph components. For every edge added to a graph, the change to $\log Z^{BCP}$ is bounded by $\pm K$.

2.9 Phases and comparison results

In this section we discuss the phase diagrams of the BCP model and the DRC model. By the coupling of Theorem 2.4, and in particular equation (2.5), we can discuss them in parallel. For concreteness we specify 1 boundary conditions for both models, with $q = 2$ and $d = 2$. There are two properties of the DRC model that we are interested in:
(i) $\phi^1$ has an infinite open edge cluster.
(ii) $\phi^1$ has an infinite closed vertex cluster.

Figure 2.1: Approximate Blume–Capel phase diagram, on the $(\Delta, K)$-plane, showing regions (a), (b) and (c)

For all $a, p$ the probability of each property is either 0 or 1. The corresponding phases of the BCP model are:
(i) $\pi^1(\sigma(0) = 1) > \frac12 \pi^1(\sigma(0) \in \{1, 2\})$.
(ii) $\pi^1$ has an infinite 0-spin nearest-neighbour cluster.

Capel's predictions for the Blume–Capel phase diagram are illustrated in Figure 2.1 on the $(K, \Delta)$-plane and in Figure 2.2 on the $(a, p)$-plane. The areas indicated are:
(a) where property (i) holds,
(b) where property (ii) holds,
(c) where neither property (i) nor property (ii) holds; there is neither type of long range order.

Recall the original parameterization of the Blume–Capel model, $K = \beta J$ and $\Delta = \beta D$. With $J, D$ fixed, as temperature increases $(K, \Delta)$ moves towards $(0, 0)$ along a straight line, and $(a, p)$ moves towards $(1/2, 0)$ along the path
$$\frac{a}{1-a} = (1-p)^{D/(2J)}.$$

Figure 2.2: Approximate Blume–Capel phase diagram, on the $(a, p)$-plane

Such paths are indicated in Figures 2.1 and 2.2 by grey lines. A first order phase transition is one where a property of the model, such as the density of $+1$ spins, is discontinuous as a function of the parameters.
A second order phase transition is one without this phenomenon. Capel predicted that a tri-critical point exists where the three phases meet, lying on the curve
$$\frac{a}{1-a} = (1-p)^{\frac{2}{3}\log 4}.$$
The boundary between (a) and (b) is thought to be first-order, and to arrive at $(a, p) = (0, 1)$ at the same gradient as the curve $a/(1-a) = 1-p$. The boundary between (a) and (c) is thought to be second-order.

By the comparison inequalities, if $(a, p)$ lies in region (a) then so does the rectangle bounded by $(a, p)$ and $(1, 1)$. Similarly, if $(a, p)$ lies in region (b) then so does the rectangle bounded by $(0, 0)$ and $(a, p)$. We can use the comparison results with certain points of reference to draw more detailed inferences about the phase diagram. Three points are marked on Figure 2.3.

Figure 2.3: Blume–Capel comparison results

(i) $(a, p, q) = (1, \sqrt2/(1+\sqrt2), 2)$: The $a = 1$ DRC model is a random-cluster measure. The $q = 2$ random-cluster measure on $L^2$ has critical point $p = \sqrt2/(1+\sqrt2) \approx 0.59$. By Theorem 2.17, the $q = 2$ DRC measure has an infinite open edge cluster to the right of the curve from $(a, p) = (1, 0.59)$ to $(a, p) = (1, 1)$. The corresponding BCP measure has a preponderance of 1 states. Below the line $p = 0.59$, $\phi$ has no infinite open edge clusters; in the corresponding BCP measure, the 1-spins and the 2-spins have the same distribution.

(ii) $(a, p, q) = \big((1-p_c^{site})/(1+p_c^{site}), 0, 2\big)$: To the left of this point on the line $p = 0$, $\Phi$ is a product measure with an infinite cluster of closed vertices. Take $p_c^{site}$, the critical probability for independent site percolation on $L^2$, to be 0.59. (Site percolation is a nearest-neighbour vertex percolation model: vertices are open with probability $p$, and adjacent open vertices are connected.) By Theorem 2.10(b) this point stochastically dominates the area to the left of the curve from $(0.25, 0)$ to $(0, 1)$; there is an infinite nearest-neighbour cluster of closed vertices. This corresponds to a BCP measure that has an infinite cluster of 0 states.

(iii) $(a, p, q) = \big(p_c^{site}/(2-p_c^{site}), 0, 2\big)$: To the right of this point on the line $p = 0$, $\Phi$ is a product measure with an infinite cluster of open vertices. By part (c) of Theorem 2.10, the region to the right of the line $a = 0.41$ stochastically dominates $(0.41, 0)$, and therefore $\Phi$ has an infinite open vertex cluster.

In addition, the DRC measures on the line $a/(1-a) = 1-p$ with $q = 1$ correspond to an Ising model with zero external magnetic field. On the line segment above
$$(a, p) = \Big(\frac{1}{(1+\sqrt2)^4 + 1},\ 1 - \frac{1}{(1+\sqrt2)^4}\Big),$$
the Ising model is supercritical: the boundary conditions determine which spin is dominant. Below the supercritical line segment, the $q = 1$ DRC model has a preponderance of 0 spins; above it, a preponderance of 1 spins.

By part (c) of Theorem 2.10, compare $(a_1, p_1, 2)$ and $(a_2, p_2, 1)$ with $(a_2, p_2)$ 'just below' the supercritical portion of $a/(1-a) = 1-p$. We need
$$\frac{2(1-p_2)^3}{(1-p_1)^2} \ge \frac{a_1}{1-a_1}.$$
This occurs if
$$1 - p_2 \ge \frac{2a_1}{1-a_1} \quad\text{and}\quad p_1 \ge p_2 \ge 1 - (1+\sqrt2)^{-4}.$$
The area below the curve from $(a, p) = (.0145, .970)$ to $(0, 1)$ is in region (b).

By part (d) of Theorem 2.10, compare $(a_2, p_2, 2)$ and $(a_1, p_1, 1)$ with $(a_1, p_1)$ 'just above' the supercritical portion of $a/(1-a) = 1-p$. We need
$$\frac{2a_2}{1-a_2}(1-p_2)^2 \ge (1-p_1)^3 \quad\text{and}\quad \frac{p_1}{1-p_1} \le \frac{p_2}{2(1-p_2)}.$$
Set $p_1 = p_2/(2-p_2)$.
Then we need
$$\frac{2a_2}{1-a_2} \ge \frac{8(1-p_2)}{(2-p_2)^3} \quad\text{and}\quad p_2 \ge \frac{2 - 2(1+\sqrt2)^{-4}}{2 - (1+\sqrt2)^{-4}}.$$
The area to the right of the curve from $(a, p) = (.0541, .986)$ to $(a, p) = (0, 1)$ has an infinite open vertex cluster. Further, we can see that this region is in (a) as follows. Let $S$ be the set of dominant spins of a supercritical Ising model with parameter $\beta J = -\frac18 \log(1-p_1)$. By the coupling of the Ising and the random-cluster model, the critical probability for bond percolation on $S$ is a.s. less than $\pi$, where $\pi = 1 - \exp(-2\beta J)$. The set of 1 vertices under $\phi_{a_2,p_2,2}$ stochastically dominates $S$, and the $(p_2, 2)$ random-cluster measure stochastically dominates a product measure with density $\pi$. Hence $\phi_{a_2,p_2,2}$ has an infinite open edge cluster.

Chapter 3: Influence and monotonic measures

3.1 Influence beyond KKL

In the first chapter we introduced the topics of influence and sharp thresholds. We have so far stated the influence result of KKL only for Bernoulli product measure with density 1/2, the uniform measure on the Hamming space $\Omega = \{0, 1\}^N$. The theorem has been extended to more general product measures; a sharp threshold result follows by Russo's formula. In this chapter we extend the notion of influence to monotonic measures. We prove influence and sharp threshold results for monotonic measures, in particular the random-cluster measure.

A classic result in percolation theory is due to Kesten: the critical value for bond percolation on the square planar lattice $L^2$ is one half [42]. His proof settled an old conjecture. The lower bound $p_c \ge 1/2$ was shown by Harris in 1960; Harris's lower bound made use of the self duality of bond percolation on $L^2$ at $p = 1/2$. More recently, Bollobás and Riordan [9] produced a shorter proof using the newly developed sharp threshold technology.

The main motivation for this chapter is to explore the possibility of adapting their technique to the random-cluster setting. The random-cluster model on $L^2$ has a form of self duality at $p_{sd} = \sqrt q/(1+\sqrt q)$. We prove a sharp threshold result for short-ways rectangle crossings on a finite region of the square planar lattice with cyclic boundary conditions. This result is not strong enough to determine the critical value $p_c$. In Section 3.9 we show that a related but stronger result, a sharp threshold result for long-ways rectangle crossings on $L^2$, would determine $p_c$.

Friedgut and Kalai extended the KKL influence result to $\mu_p$, product measure with density $p \in (0, 1)$. Recall that we write $\omega \in \Omega$ for a general configuration and $X_i$ for the random variable $X_i(\omega) = \omega(i)$. Let $i \in I = \{1, \ldots, N\}$; let $A \subset \Omega$ be an increasing event. The influence of coordinate $i$ is
$$I_A(i) = \mu_p(A \mid X_i = 1) - \mu_p(A \mid X_i = 0) = \mu_p(\omega^i \in A,\ \omega_i \notin A). \qquad (3.1)$$
Throughout the chapter, $C > 0$ is an absolute constant.

Theorem 3.2 (Friedgut and Kalai [28]). There exists $i \in I$ such that
$$I_A(i) \ge C \min\{\mu_p(A), \mu_p(A^c)\}\,\frac{\log N}{N}.$$

The following is the most general influence result for product measure. Let $E = [0, 1]^N$, the Euclidean cube, and let $(E, \mathcal{E}, \lambda)$ be the uniform (Lebesgue) measure space. $E$ has the coordinate-wise partial order. Let $A \subset E$ be increasing: if $x \le y$ and $x \in A$ then $y \in A$. Let $U_i$ be the identity random variable for the $i$-th coordinate, $U_i(x) = x(i)$. A continuous version of influence is
$$I_A(i) = \lambda(A \mid U_i = 1) - \lambda(A \mid U_i = 0). \qquad (3.3)$$

Theorem 3.4 (BKKKL [10]). For some $i \in I$,
$$I_A(i) \ge C \min\{\lambda(A), \lambda(A^c)\}\,\frac{\log N}{N}.$$

In Section 3.10 we extend this to an influence result for continuous monotonic measures.
Notice that the result of Friedgut and Kalai follows from the result of BKKKL by embedding. Let $f : E \to \Omega$ be defined by $f(x(1), \ldots, x(N)) = (\omega(1), \ldots, \omega(N))$ where for all $i \in I$,
$$\omega(i) = \begin{cases} 1 & x(i) \ge 1 - p, \\ 0 & \text{otherwise.} \end{cases}$$
The function $f$ is increasing; if $A \subset \Omega$ is increasing then so is $B = f^{-1}(A) \subset E$, and for all $i$, $I_A(i) = I_B(i)$.

The definition of influence for $(E, \mathcal{E}, \lambda)$ requires some clarification, as it depends on the version of conditional expectation used. Interpret the conditional expectations in (3.3) as the $(N-1)$-dimensional Lebesgue measures on $\{1\} \times [0, 1]^{N-1}$ and $\{0\} \times [0, 1]^{N-1}$. We could have chosen a more robust definition of influence,
$$\hat I_A(i) = \lim_{\varepsilon\to 0} \lambda(A \mid U_i \in [1-\varepsilon, 1]) - \lambda(A \mid U_i \in [0, \varepsilon]).$$
Consider the two events
$$A = \{x \in [0, 1]^2 : x(1) \ge 1/2\}, \qquad B = A \cup \{x \in [0, 1]^2 : x(2) = 1\}.$$
They differ only on a null set, so $\hat I_A(2) = \hat I_B(2) = 0$. In contrast, $I_A(2) = 0$ while $I_B(2) = 1/2$; the influence for $B$ seems exaggerated. However, Theorem 3.4 also holds with $\hat I$ substituted for $I$. Given an increasing set $B$, construct a set $A$ that differs from $B$ only on a null set: remove points on the 'upper' faces, such as $(1, x(2), \ldots, x(N))$ if $\lim_{x(1)\to 1^-} 1_A(x) = 0$, and add points on the 'lower' faces, such as $(0, x(2), \ldots, x(N))$ if $\lim_{x(1)\to 0^+} 1_A(x) = 1$. Then $A$ is increasing and $\lambda(A) = \lambda(B)$. For all $i \in I$, $I_A(i) = \hat I_B(i)$; we can apply Theorem 3.4 to $A$.

3.2 Symmetric events

Let $S_N$ be the group of permutations of $I = \{1, \ldots, N\}$. Let $\Gamma$ be a subgroup of $S_N$ such that for all $i, j \in I$ there is a $\gamma \in \Gamma$ with $\gamma(i) = j$; $\Gamma$ is then said to act transitively on $I$. For an event $A$ and a permutation $\gamma$, define the event $A_\gamma$ by
$$(\omega(1), \ldots, \omega(N)) \in A \iff (\omega(\gamma(1)), \ldots, \omega(\gamma(N))) \in A_\gamma.$$
An event $A$ is $\Gamma$-invariant if $A = A_\gamma$ for all $\gamma \in \Gamma$. An event is symmetric if it is $\Gamma$-invariant for a group $\Gamma$ that acts transitively on $I$. Take for instance the majority event
$$A = \Big\{\omega \in \Omega : \sum_{i\in I} \omega(i) \ge N/2\Big\}.$$
$A$ is $S_N$-invariant. Similarly, the tribes event with $m$ tribes of $n$ members is invariant with respect to the group of size $m!(n!)^m$ generated by permutations within a tribe and permutations of whole tribes. For a symmetric event $A$ the influence is the same for each $i$, so
$$\sum_{i\in I} I_A(i) \ge C \min\{\mu_p(A), \mu_p(A^c)\} \log N. \qquad (3.5)$$
Contrast this with the dictatorship event $A = \{\omega : \omega(1) = 1\}$: $\sum_{i\in I} I_A(i) = 1$ for all $p$.

3.3 Sharp thresholds for product measures

Let $\omega \in \Omega$ and let $A$ be an increasing event. If $\omega_i \notin A$ and $\omega^i \in A$, we say that $i$ is critical for $A$. The probability that coordinate $i$ is critical is $I_A(i)$. The higher the expected number of critical coordinates, the more sensitive $\mu_p(A)$ is to small changes in $p$. Russo's formula [51] quantifies this,
$$\frac{d}{dp}\mu_p(A) = \sum_{i\in I} I_A(i).$$
Let $A$ be a symmetric event; by Russo's formula and (3.5),
$$\frac{d}{dp}\mu_p(A) \ge C \min\{\mu_p(A), \mu_p(A^c)\} \log N.$$
When $\mu_p(A) \le 1/2$, dividing both sides by $\mu_p(A)$ gives
$$\frac{d}{dp}\log\mu_p(A) \ge C \log N.$$
Similarly, for $\mu_p(A) \ge 1/2$,
$$\frac{d}{dp}\log\mu_p(A^c) \le -C \log N.$$
Integrating over $p$ gives a sharp threshold result [8, 28].

Theorem 3.6. Let $A \notin \{\emptyset, \Omega\}$ be an increasing symmetric event, and let $\varepsilon \in (0, 1/2]$. If $p, p' \in (0, 1)$ are such that $\mu_p(A) \ge \varepsilon$ and
$$p' - p = \frac{2\log 1/(2\varepsilon)}{C \log N}$$
then $\mu_{p'}(A) \ge 1 - \varepsilon$.

This technique was first used in [28] to prove a sharp threshold theorem for graph properties of Erdős–Rényi graphs.
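The narrowing of the threshold window with $N$ is easy to see by simulation. The following minimal sketch estimates $\mu_p(A)$ for the majority event by Monte Carlo; the sample sizes and the values of $N$ and $p$ are arbitrary choices of ours.

```python
import random

def mu_p_majority(N, p, trials=4000, rng=random.Random(1)):
    """Monte Carlo estimate of mu_p(A) for the majority event A."""
    hits = sum(
        sum(rng.random() < p for _ in range(N)) >= N / 2
        for _ in range(trials)
    )
    return hits / trials

# The window over which mu_p(A) rises from near 0 to near 1 narrows as N grows.
for N in (9, 81, 729):
    print(N, [round(mu_p_majority(N, p), 2)
              for p in (0.40, 0.45, 0.50, 0.55, 0.60)])
```

For majority the window actually has width $O(N^{-1/2})$, narrower than the $1/\log N$ width that Theorem 3.6 guarantees for a general symmetric event.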
The same technique can be used on graphs with less symmetry. Let $\mathbb{T}^d_n = (\mathbb{Z}/n\mathbb{Z})^d$ be the $d$-dimensional torus; the nearest-neighbour graph on $\mathbb{T}^d_n$ can be thought of as the subgraph $[0, n]^d$ of $L^d$ with cyclic boundary conditions imposed. Let $\Gamma$ be the group of permutations generated by the translations and rotations of $\mathbb{T}^d_n$. Every $\Gamma$-invariant increasing event has a sharp threshold.

3.4 Self duality for bond percolation on $L^2$

Let $G = (V, E)$ be a simple graph (no loops or multiple edges) drawn on the Euclidean plane such that no edges cross. In every face, including the unbounded face, mark a point; let $V'$ be the set of marked points. These are the dual vertices. Every edge $e \in E$ separates two faces; let $e'$ be a dual edge between the corresponding dual vertices. Let $E'$ be the set of dual edges; then $G' = (V', E')$ is the planar dual of $G$. $G'$ may depend on the particular planar embedding of $G$. Similarly, if a graph is embedded in a planar manifold, such as a torus, the dual with respect to that manifold can be constructed by the same procedure. Figure 3.1 shows a small finite graph (edges marked as black lines) and its dual graph (dual vertices marked as black dots, dual edges as dotted lines).

Figure 3.1: Finite graph and dual

Bond percolation with density $p$ on $G$ is naturally coupled to bond percolation with density $1-p$ on the dual graph. Every edge $e \in E$ crosses exactly one dual edge $e' \in E'$, and vice versa. Declare $e'$ open if $e$ is closed, and $e'$ closed if $e$ is open.

The square planar lattice $L^2 = (\mathbb{Z}^2, \mathbb{E}^2)$ is self dual. Figure 3.2 shows a subset of the lattice $L^2$ and the isomorphic dual lattice. Part of the figure is shaded grey, leaving in black a $[0, n+1] \times [0, n]$ rectangle $R$ ($n = 4$) and a similar rectangle $R'$ in the dual lattice, rotated through $90°$.

Figure 3.2: Square planar lattice and isomorphic dual

Write LW($R$) for the event that $R$ is crossed long-ways (horizontally in this case) by a path of open edges. Notice that exactly one of the following happens: either LW($R$), or $R'$ is crossed long-ways (vertically) by open dual edges, LW($R'$). This is a consequence of the max-flow min-cut theorem. As $\mu_{1-p}$ is the measure on the dual lattice, and $R'$ is isomorphic to $R$,
$$\mu_p(\mathrm{LW}(R)) + \mu_{1-p}(\mathrm{LW}(R)) = 1.$$
At $p = 1/2$, $p = 1-p$, so bond percolation is self-dual and $\mu_{1/2}(\mathrm{LW}(R)) = 1/2$.

Write $R_{m,n}$ for a $[0, m] \times [0, n]$ rectangle. Seymour and Welsh [52] proved that for $k = 1, 2, \ldots$,
$$\liminf_{n\to\infty} \mu_{1/2}(\mathrm{LW}(R_{n,kn})) > 0.$$
Kesten used an intricate path counting argument to show a sharp threshold for the event LW($R_{n,2n}$) for large $n$; it follows that $p_c = 1/2$. We will later give an argument for this involving a construction of rectangles. Bollobás and Riordan [9] used the sharp threshold result of Friedgut and Kalai to simplify Kesten's argument. Place an $R_{2n,4n}$ rectangle in a discrete torus large enough to prevent 'wrapping round', say $\mathbb{T}^2_{6n} = (\mathbb{Z}/6n\mathbb{Z})^2$. Let $A$ be the symmetric increasing event that there is a copy of $R_{n,5n}$ somewhere in $\mathbb{T}^2_{6n}$, allowing rotation and translation, that is crossed long-ways. The torus can be covered by a small number of copies of $R_{2n,4n}$ such that $A$ implies one of the copies is crossed long-ways. The sharp threshold result for $A$ implies a sharp threshold result for LW($R_{2n,4n}$).
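A crude simulation of long-ways crossings illustrates the threshold at $p = 1/2$. The sketch below works on a rectangle rather than a torus and detects an open left–right crossing with a union-find structure; the dimensions, trial counts and function names are our own choices.

```python
import random

def crosses_longways(m, n, p, rng):
    """Open each edge of the [0,m]x[0,n] grid with probability p; test for an
    open left-right crossing via union-find on the (m+1)*(n+1) vertices."""
    W, H = m + 1, n + 1
    parent = list(range(W * H))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)
    for x in range(W):
        for y in range(H):
            v = x * H + y
            if x + 1 < W and rng.random() < p:   # horizontal edge
                union(v, v + H)
            if y + 1 < H and rng.random() < p:   # vertical edge
                union(v, v + 1)
    left = {find(y) for y in range(H)}           # roots of the left column
    return any(find((W - 1) * H + y) in left for y in range(H))

rng = random.Random(0)
for p in (0.45, 0.50, 0.55):
    est = sum(crosses_longways(40, 20, p, rng) for _ in range(400)) / 400
    print(p, est)   # crossings of a 2:1 rectangle sharpen around p = 1/2
```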
3.5 Influence for monotonic measures

Let $\mu$ be a positive measure on the Hamming space $\Omega = \{0, 1\}^N$; that is, $\mu(\omega) > 0$ for all $\omega \in \Omega$. For $J \subset I$ and $\xi \in \Omega$, let
$$\mu^\xi_J(\eta) = \mu(X_j = \eta(j) \text{ for } j \in J \mid X_i = \xi(i) \text{ for } i \in I \setminus J).$$
Recall that the following are equivalent for $\mu$ positive [34].
(i) $\mu$ satisfies the FKG lattice condition (1.4).
(ii) $\mu$ is monotonic: for $J \subset I$ and $A$ increasing, $\mu^\xi_J(A)$ is increasing in $\xi \in \Omega$.
(iii) $\mu$ is strongly positively-associated: for $J \subset I$, $\xi \in \Omega$ and $A, B$ increasing, $\mu^\xi_J(A \cap B) \ge \mu^\xi_J(A)\mu^\xi_J(B)$.

Let $\mu$ be a positive monotonic measure on $(\Omega, F)$. For an increasing set $A \subset \Omega$, define the conditional influence of coordinate $i$ by
$$I_A(i) = \mu(A \mid X_i = 1) - \mu(A \mid X_i = 0).$$
As $\mu$ is monotonic, the influence is always non-negative. When $\mu$ is a product measure, conditional influence is exactly the influence; see (3.1). Equality (3.1) for product measures also motivates an alternative definition of influence,
$$\bar I_A(i) = \mu(\omega^i \in A,\ \omega_i \notin A).$$
Unlike with product measures, it is not generally the case that $I_A(i) = \bar I_A(i)$. Consider the following measure; take $N$ odd for convenience. Let $\mu_{1/3}, \mu_{2/3}$ be product measures on $\Omega$ with densities 1/3 and 2/3 respectively, and let $\mu = (\mu_{1/3} + \mu_{2/3})/2$; this corresponds to picking a coin at random, with bias 1/3 or 2/3, and then tossing it $N$ times. Let $A$ be the majority event, that more than half of the coordinates are 1. By symmetry $\mu(A) = 1/2$. For each coordinate, the conditional influence $I_A(i) = \Theta(1)$: the majority event is strongly correlated with the choice between the two coins. However, coordinate one is critical only if the other coordinates sum to exactly $(N-1)/2$. The probability of this, $\bar I_A(1)$, decays exponentially with $N$. While there can be no influence theorem for the $\bar I_A(i)$, we can extend the influence results for product measure to conditional influence results for monotonic measures.

Theorem 3.7. Let $\mu$ be a positive monotonic measure on $(\Omega, F)$. Let $A \subset \Omega$ be an increasing event. Then for some $i \in I$,
$$I_A(i) \ge C \min\{\mu(A), \mu(A^c)\}\,\frac{\log N}{N}.$$

Proof. Let $\lambda$ be the uniform measure on $[0, 1]^N$. We will construct a function $f : [0, 1]^N \to \{0, 1\}^N$ on a coordinate-wise basis, giving $f = (f_1, \ldots, f_N) \in \Omega$ for a given $x \in [0, 1]^N$. The function $f$ will have the following properties.
(i) For all $i \in I$, $f_i$ depends only on $x(1), \ldots, x(i)$.
(ii) For all $\omega \in \Omega$, $\lambda(f = \omega) = \mu(\omega)$.
(iii) $f$ is increasing with respect to the corresponding partial orders on the domain and co-domain.
Let $B = f^{-1}(A) = \{x \in [0, 1]^N : f(x) \in A\}$. The distribution of $f$ under $\lambda$ is equal to the distribution of the identity function under $\mu$, so $\lambda(B) = \mu(A)$. The result then follows by showing that $I_A(i) \ge I_B(i)$ for all $i \in I$. Notice that $I_B(i)$ is an influence with respect to the product measure on $[0, 1]^N$, while $I_A(i)$ is a conditional influence with respect to $\mu$.

We now construct $f(x)$. Let
$$f_1 = \begin{cases} 1 & x(1) \ge 1 - \mu(X_1 = 1), \\ 0 & \text{otherwise.} \end{cases}$$
Notice $f_1$ depends only on $x(1)$, $f_1$ is increasing in $x(1)$, and $\lambda(f_1 = 1) = \mu(X_1 = 1)$. For $i \in \{1, \ldots, N-1\}$, assume inductively:
(i) $f_i$ depends only on $x(1), \ldots, x(i)$,
(ii) $\mu(X_k = a_k$ for $k = 1, \ldots, i) = \lambda(f_k = a_k$ for $k = 1, \ldots, i)$ for all $a_1, \ldots, a_i \in \{0, 1\}$, and
(iii) $f_i$ is increasing in $x(1), \ldots, x(i)$.
Let
$$f_{i+1} = \begin{cases} 1 & x(i+1) \ge 1 - \mu(X_{i+1} = 1 \mid X_k = f_k \text{ for } k = 1, \ldots, i), \\ 0 & \text{otherwise.} \end{cases}$$
The first and second properties follow immediately. The third follows by the monotonicity of $\mu$: the event $X_{i+1} = 1$ is increasing, so $\mu(X_{i+1} = 1 \mid X_k = f_k$ for $k = 1, \ldots, i)$ is increasing in $f_1, \ldots, f_i$, which by assumption are increasing in $x(1), \ldots, x(i)$. By induction, $f$ has the properties claimed.
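The coordinate-wise construction of $f$ is straightforward to implement when $\mu$ can be enumerated. The sketch below samples from a small monotonic measure (a ferromagnetic Ising-type measure on a three-vertex path, which satisfies the FKG lattice condition) by computing each conditional probability by brute force; the measure, names and parameters are our own choices.

```python
import itertools, math, random

def sequential_sampler(mu, N, x):
    """Map x in [0,1]^N to omega in {0,1}^N coordinate by coordinate:
    omega_i = 1 iff x(i) >= 1 - mu(X_i = 1 | X_1,...,X_{i-1} as already fixed).
    Here mu is a dict {tuple in {0,1}^N: positive weight}."""
    fixed = []
    for i in range(N):
        den = sum(w for c, w in mu.items() if list(c[:i]) == fixed)
        num = sum(w for c, w in mu.items()
                  if list(c[:i]) == fixed and c[i] == 1)
        fixed.append(1 if x[i] >= 1 - num / den else 0)
    return tuple(fixed)

# A monotonic measure: ferromagnetic Ising-type weights on a 3-vertex path.
J = 0.8
mu = {c: math.exp(J * ((c[0] == c[1]) + (c[1] == c[2])))
      for c in itertools.product((0, 1), repeat=3)}
rng = random.Random(0)
counts = {}
for _ in range(50000):
    w = sequential_sampler(mu, 3, [rng.random() for _ in range(3)])
    counts[w] = counts.get(w, 0) + 1
Z = sum(mu.values())
for c in sorted(mu):
    print(c, round(counts.get(c, 0) / 50000, 3), round(mu[c] / Z, 3))
```

Property (ii) of the proof holds exactly here, so the empirical frequencies in the first column match the exact probabilities in the second up to Monte Carlo error.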
In the construction of $f$ we have made a choice, namely the order in which we construct the $f_i$ based on the $f_j$ already constructed; we arbitrarily took the natural order $1, \ldots, N$. Below we consider a different order.

We now claim that $I_A(i) \ge I_B(i)$ for $i \in I$, where $B = f^{-1}(A)$. The result follows from this claim by the BKKKL influence result applied to $B$. For $i = 1$ this is immediate: the measure $\mu$ is positive, so the conditional distributions of $f$ under $\lambda$ given $x(1) = 0$ and $x(1) = 1$ are exactly the conditional distributions of $\mu$ given $X_1 = 0$ and $X_1 = 1$, respectively.

Let $i > 1$. We construct a function $g : [0, 1]^N \to \{0, 1\}^N$ in a similar fashion to $f$, except that we construct the coordinates $g_j$ in the order $i, 1, \ldots, i-1, i+1, \ldots, N$. Let
$$g_i = \begin{cases} 1 & x(i) \ge 1 - \mu(X_i = 1), \\ 0 & \text{otherwise.} \end{cases}$$
For $j = 1, \ldots, i-1$, let
$$g_j = \begin{cases} 1 & x(j) \ge 1 - \mu(X_j = 1 \mid g_i, g_1, \ldots, g_{j-1}), \\ 0 & \text{otherwise,} \end{cases}$$
and then for $j = i+1, \ldots, N$, let
$$g_j = \begin{cases} 1 & x(j) \ge 1 - \mu(X_j = 1 \mid g_1, \ldots, g_{j-1}), \\ 0 & \text{otherwise.} \end{cases}$$
Let $D = g^{-1}(A)$. Then $I_D(i) = I_A(i)$, just as $I_B(1) = I_A(1)$. All that remains is to show that $I_D(i) \ge I_B(i)$. This follows from the monotonicity of $\mu$. Let $x \in [0, 1]^N$ and write
$$x^i = (x(1), \ldots, 1, \ldots, x(N)), \qquad x_i = (x(1), \ldots, 0, \ldots, x(N)),$$
with the change in the $i$-th coordinate. We show that $g(x^i) \ge f(x^i)$; combined with the converse result $g(x_i) \le f(x_i)$, the claim follows. The intuitive reason why $g(x^i) \ge f(x^i)$ is that $\mu$ is positively associated, so the sooner we process the 'good' information that coordinate $i$ takes the value 1, the greater the upward lift elsewhere. Firstly, $f_i(x^i) = g_i(x^i) = 1$. We can now check inductively that $g_j(x^i) \ge f_j(x^i)$ for $j = 1, \ldots, i-1, i+1, \ldots, N$. Looking at the definitions of $f_j$ and $g_j$: at every step, the values $f_k(x^i)$ already defined ($k = 1, \ldots, j-1$) are no larger than the corresponding $g_k(x^i)$, so by monotonicity $g_j(x^i) \ge f_j(x^i)$.

3.6 Russo formula for monotonic measures

Let $\mu$ be a positive measure on $\Omega = \{0, 1\}^N$; it need not be a probability measure. Suppose that $\mu$ satisfies the FKG lattice condition (1.4). Define a parameterized family of probability measures $\mu_p$ for $p \in (0, 1)$ by
$$\mu_p(A) = \frac{1}{Z} \sum_{\omega\in A} \mu(\omega) \prod_{i\in I} p^{\omega(i)}(1-p)^{1-\omega(i)}$$
with
$$Z = \sum_{\omega\in\Omega} \mu(\omega) \prod_{i\in I} p^{\omega(i)}(1-p)^{1-\omega(i)}.$$
It is immediate that $\mu_p$ is positive and satisfies the FKG lattice condition; $\mu_p$ is monotonic. The obvious example of such a family of measures is given by the random-cluster measures on a graph: fix $q \ge 1$, let $I$ be the edge set of a graph, and take $\mu(\omega) = q^{k(\omega)}$, where $k(\omega)$ counts the number of open clusters. Then $\mu_p$ is the $(p, q)$ random-cluster measure.

Russo's formula has been extended to cover random-cluster measures [5]; it extends naturally to this setting. For convenience, write $A$ also for the indicator function of an increasing event $A$.

Theorem 3.8. With $\mathrm{cov}_p$ the covariance with respect to $\mu_p$,
$$\frac{d}{dp}\mu_p(A) = \frac{1}{p(1-p)} \sum_{i\in I} \mathrm{cov}_p(X_i, A).$$

Proof. Differentiating $Z \times \mu_p(A)$,
$$\frac{d}{dp} \sum_{\omega\in A} \mu(\omega) \prod_{i\in I} p^{\omega(i)}(1-p)^{1-\omega(i)} = \sum_{\omega\in A} \mu(\omega) \prod_{i\in I} p^{\omega(i)}(1-p)^{1-\omega(i)} \sum_{i\in I} \Big[\frac{\omega(i)}{p} - \frac{1-\omega(i)}{1-p}\Big] = Z \sum_{i\in I} \Big[\frac{\mu_p(X_i A)}{p} - \frac{\mu_p((1-X_i)A)}{1-p}\Big].$$
The above with $A = \Omega$ gives
$$\frac{d}{dp} Z = Z \sum_{i\in I} \Big[\frac{\mu_p(X_i)}{p} - \frac{\mu_p(1-X_i)}{1-p}\Big].$$
Therefore,
$$\frac{d}{dp}\mu_p(A) = \sum_{i\in I} \Big[\frac{\mu_p(X_i A)}{p} - \frac{\mu_p((1-X_i)A)}{1-p}\Big] - \mu_p(A) \sum_{i\in I} \Big[\frac{\mu_p(X_i)}{p} - \frac{\mu_p(1-X_i)}{1-p}\Big] = \frac{1}{p(1-p)} \sum_{i\in I} \mathrm{cov}_p(X_i, A).$$

Conditional influence and covariance are related:
$$I_A(i) = \frac{\mu_p(AX_i)}{\mu_p(X_i)} - \frac{\mu_p(A(1-X_i))}{\mu_p(1-X_i)} = \frac{\mu_p(AX_i)\mu_p(1-X_i) - \mu_p(A(1-X_i))\mu_p(X_i)}{\mu_p(X_i)\mu_p(1-X_i)} = \frac{\mathrm{cov}_p(X_i, A)}{\mu_p(X_i)\mu_p(1-X_i)}.$$
Let
$$\xi_p = \min_{i\in I} \frac{\mu_p(X_i)\mu_p(1-X_i)}{p(1-p)},$$
so that
$$\frac{d}{dp}\mu_p(A) \ge \xi_p \sum_{i\in I} I_A(i).$$
Say that $\mu$ is $\Gamma$-invariant if $\mu(A) = \mu(A_\gamma)$ for all $\gamma \in \Gamma$ and all events $A$. If an increasing event $A$ and the measure $\mu$ are both $\Gamma$-invariant for some transitive $\Gamma$, then $I_A(i)$ is independent of $i$. Let $\mu$ be such a measure. Then
$$\frac{d}{dp}\mu_p(A) \ge C\,\xi_p \min\{\mu_p(A), \mu_p(A^c)\} \log N.$$
There is an additional complication compared with the product case: if we want to prove the existence of a sharp threshold in the neighbourhood of $p$, we need to bound $\xi_p$ away from zero.

3.7 Sharp thresholds for the random-cluster measure

Let $G = (V, E)$ be a finite simple graph. A graph automorphism is a permutation $\pi$ of the vertex set $V$ such that
$$\forall x, y \in V, \quad \langle x, y\rangle \in E \iff \langle \pi(x), \pi(y)\rangle \in E.$$
Let $\gamma$ be the induced permutation on the edge set $E$, $\gamma(e) = \langle \pi(x), \pi(y)\rangle$ for $e = \langle x, y\rangle \in E$; call $\gamma$ a graph edge automorphism. Let $\Pi$ be the group of graph automorphisms of $G$, and let $\Gamma$ be the corresponding group of graph edge automorphisms. We then say that $G$ is $\Gamma$-invariant. If $\Gamma$ acts transitively on $E$ we say that $G$ is symmetric.

Let $\phi_{p,q}$ be the $(p, q)$ random-cluster measure on $\{0, 1\}^E$ with $p \in (0, 1)$, $q \ge 1$,
$$\phi_{p,q}(\omega) = \frac{1}{Z} q^{k(\omega)} \prod_{e\in E} p^{\omega(e)}(1-p)^{1-\omega(e)}.$$
If $\Gamma$ is the group of graph edge automorphisms, $\phi_{p,q}$ is $\Gamma$-invariant. Recall that $\phi_{p,q}$ has a finite energy property. Let $C_e$ be the event that the end vertices of edge $e = \langle x, y\rangle$ are connected by an open path of edges in $E \setminus \{e\}$. Then
$$\phi_{p,q}(X_e \mid C_e) = p, \qquad \phi_{p,q}(X_e \mid C_e^c) = \frac{p}{p + q(1-p)}.$$
As $t \mapsto t(1-t)$ is concave,
$$\phi_{p,q}(X_e = 1)\,\phi_{p,q}(X_e = 0) \ge \min\Big\{t(1-t) : t \in \Big[\frac{p}{p+q(1-p)},\,p\Big]\Big\},$$
so
$$\xi_p \ge \frac{\min\{p(1-p),\ p(1-p)q/(p+q(1-p))^2\}}{p(1-p)} \ge 1/q.$$
If the graph $G$ and the increasing event $A$ are symmetric,
$$\frac{d}{dp}\phi_{p,q}(A) \ge \frac{C}{q}\min\{\phi_{p,q}(A), \phi_{p,q}(A^c)\}\log N.$$
As in Section 3.3, this gives a sharp threshold result.

Theorem 3.9. Let $A \notin \{\emptyset, \Omega\}$ be an increasing symmetric event on a symmetric graph $G$, and let $\varepsilon \in (0, 1/2]$. If $p, p' \in (0, 1)$ are such that $\phi_{p,q}(A) \ge \varepsilon$ and
$$p' - p = \frac{2q\log 1/(2\varepsilon)}{C\log N}$$
then $\phi_{p',q}(A) \ge 1 - \varepsilon$.

3.8 Self duality for the random-cluster model

The square planar lattice $L^2$ and the torus $\mathbb{T}^2_n$ are self dual graphs. Bond percolation with density $p$ is dual to bond percolation with density $1-p$ on the same graph. Let $q \ge 1$ and let $p, p'$ satisfy
$$\frac{p'}{1-p'} = q\,\frac{1-p}{p}.$$
The $(p, q)$ random-cluster model on a finite graph is dual to the $(p', q)$ random-cluster model on the dual graph. Let $\phi_{n,p}$ be the $(p, q)$ random-cluster model on $\mathbb{T}^2_n$; $\phi_{n,p}$ is dual to $\phi_{n,p'}$. The self dual point is $p_{sd} = \sqrt q/(1+\sqrt q)$. On $L^2$ we have to take boundary conditions into account: the $(p, q)$ random-cluster measure with free boundary conditions $\phi^0_{p,q}$ is dual to the wired $(p', q)$ random-cluster measure $\phi^1_{p',q}$, and $\phi^1_{p_{sd},q}$ is dual to $\phi^0_{p_{sd},q}$.

Let $k \ge 2$, $n \ge 1$. Let $R = R_{n,n+1}$ be the $[0, n] \times [0, n+1]$ rectangle in $\mathbb{T}^2_{kn}$ and let LW($R$) be the event that $R$ is crossed long-ways. By duality, $\phi_{kn,p_{sd}}(\mathrm{LW}(R)) = 1/2$.
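The dual parameter map and the self dual point can be checked directly; a minimal sketch:

```python
import math

def dual_p(p, q):
    """Dual edge density p' defined by p'/(1 - p') = q(1 - p)/p."""
    r = q * (1 - p) / p
    return r / (1 + r)

for q in (1.0, 2.0, 4.0):
    p_sd = math.sqrt(q) / (1 + math.sqrt(q))
    print(q, round(dual_p(p_sd, q), 6), round(p_sd, 6))  # p_sd is a fixed point
```

For $q = 1$ this recovers the bond percolation duality $p \mapsto 1-p$ with fixed point $1/2$.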
Let $A$ be the event that $\mathbb{T}^2_{kn}$ contains a copy of $R$, up to rotation and translation, that is crossed long-ways. With $\Gamma$ the group of graph edge automorphisms of $\mathbb{T}^2_{kn} = (V, E)$,
$$A = \{\omega \in \Omega : \exists\gamma \in \Gamma,\ \gamma(\omega) \in \mathrm{LW}(R)\}, \qquad \Omega = \{0, 1\}^E.$$
The increasing event $A$ and the measure $\phi_{kn,p}$ are $\Gamma$-invariant. Therefore, for $p \ge p_{sd}$, as $|E| = 2(kn)^2$,
$$\phi_{kn,p}(A) \ge 1 - \tfrac12 |E|^{-C(p-p_{sd})/q} \ge 1 - (kn)^{-2C(p-p_{sd})/q}.$$
We now use the 'square-root trick' [9] to produce a sharp threshold result for the asymmetric event that a given rectangle is crossed. Let $A_1, \ldots, A_M$ be increasing events of equal probability. Then by positive association
$$1 - \phi_{kn,p}\Big(\bigcup_{i=1}^M A_i\Big) = \phi_{kn,p}\Big(\bigcap_{i=1}^M A_i^c\Big) \ge \prod_{i=1}^M \phi_{kn,p}(A_i^c) = (1 - \phi_{kn,p}(A_1))^M,$$
so
$$\phi_{kn,p}(A_1) \ge 1 - \sqrt[M]{1 - \phi_{kn,p}\Big(\bigcup_{i=1}^M A_i\Big)}.$$
Let $1 < \alpha < k$, let $H_{n,\alpha} = [0, \alpha n] \times [0, n/\alpha]$ and let $V_{n,\alpha} = [0, n/\alpha] \times [0, \alpha n]$. We will cover $\mathbb{T}^2_{kn}$ with copies of $H_{n,\alpha}$ and $V_{n,\alpha}$ so that $A$ implies that one of the copies is crossed short-ways. Let
$$h_{n,\alpha} = \{(l_1 \lfloor n(\alpha-1)\rfloor,\ l_2 \lfloor n(1-\alpha^{-1})\rfloor) : l_1, l_2 \in \mathbb{Z}\} \cap [0, kn]^2,$$
$$v_{n,\alpha} = \{(l_1 \lfloor n(1-\alpha^{-1})\rfloor,\ l_2 \lfloor n(\alpha-1)\rfloor) : l_1, l_2 \in \mathbb{Z}\} \cap [0, kn]^2.$$
Let $\mathcal{H}$ be the set of translations of $H_{n,\alpha}$ by the vectors in $h_{n,\alpha}$; similarly let $\mathcal{V}$ be the set of translations of $V_{n,\alpha}$ by the vectors in $v_{n,\alpha}$. Let $M = |\mathcal{H} \cup \mathcal{V}|$. If $A$ occurs, then an element of $\mathcal{H} \cup \mathcal{V}$ is crossed short-ways. By the square-root trick, the probability that $H_{n,\alpha}$ is crossed short-ways satisfies
$$\phi_{kn,p}(\mathrm{SW}(H_{n,\alpha})) \ge 1 - \sqrt[M]{1 - \phi_{kn,p}(A)}.$$
Also
$$M = |h_{n,\alpha}| + |v_{n,\alpha}| = 2\Big\lceil\frac{kn}{\lfloor n(\alpha-1)\rfloor}\Big\rceil \Big\lceil\frac{kn}{\lfloor n(1-\alpha^{-1})\rfloor}\Big\rceil \le 2\Big(1 + \frac{k}{\alpha-1-n^{-1}}\Big)\Big(1 + \frac{k}{1-\alpha^{-1}-n^{-1}}\Big),$$
so $M \approx 2k^2\alpha/(\alpha-1)^2$ for $n, k$ large.

Theorem 3.10. Let $k \ge 2$, $n \ge 1$ and $\alpha \in (1, k)$. For $p \in [p_{sd}, 1]$,
$$\phi_{kn,p}(\mathrm{SW}([0, n\alpha] \times [0, n\alpha^{-1}])) \ge 1 - e^{-g(p - p_{sd})}$$
where
$$g = g(k, n, \alpha, q) = \frac{2C\log(kn)}{Mq}.$$

3.9 Long-ways crossings and the critical point

Let $q \ge 1$. The critical point for the random-cluster measure on $L^2$ is
$$p_c = \inf\{p : \phi^1_{p,q}(0 \leftrightarrow \infty) > 0\}.$$
It is known that $p_c \ge p_{sd}$, and it is conjectured that $p_c = p_{sd}$ [34]. For completeness, we now give a construction connecting rectangle crossings to percolation. The starting premise is stronger than the conclusion of Theorem 3.10: it involves long-ways crossings rather than short-ways crossings, and it is set on the square planar lattice rather than on a finite torus. Let $\phi^1_{p,q}$ be the $(p, q)$ random-cluster measure on $L^2$ with wired boundary conditions.

Theorem 3.11. Let $q \ge 1$ and let $p_k = \phi^1_{p,q}(\mathrm{LW}(R_{n,2n}))$ with $n = 2^k$. If for all $p > p_{sd}$
$$\prod_{k=1}^\infty p_k > 0,$$
then the critical point $p_c = p_{sd}$.

Proof. We use a construction from [17]. Let
$$A_k = \begin{cases} \mathrm{LW}([0, 2^k] \times [0, 2^{k+1}]) & k \text{ odd}, \\ \mathrm{LW}([0, 2^{k+1}] \times [0, 2^k]) & k \text{ even}. \end{cases}$$
By positive association and symmetry, for $p > p_{sd}$,
$$\phi^1_{p,q}\Big(\bigcap_k A_k\Big) \ge \prod_k \phi^1_{p,q}(A_k) > 0.$$
The intersection of the $A_k$ implies the existence of an infinite path: a path along the length of $A_k$ crosses a path along the length of $A_{k+1}$. Let $p > p_{sd} > p'$ satisfy the duality condition $p'/(1-p') = q(1-p)/p$. By the duality of $\phi^1_{p,q}$ and $\phi^0_{p',q}$, $1 - p_k$ is the probability of a short-ways crossing,
$$1 - p_k = \phi^0_{p',q}(\mathrm{SW}(R_{2^{k+1}-1, 2^k+1})).$$
Let rad($C$) be the maximum $n$ such that 0 is joined to the boundary of the box $[-n, n]^2$ by a path of open edges. Then
$$\sum_{k=1}^\infty (1 - p_k) \le \sum_{k=1}^\infty 2^{k+1}\,\phi^0_{p',q}(\mathrm{rad}(C) \ge 2^k + 1) \le 4\sum_{n=1}^\infty \phi^0_{p',q}(\mathrm{rad}(C) \ge n) = 4\,\phi^0_{p',q}(\mathrm{rad}(C)).$$
If $\phi^0_{p',q}(\mathrm{rad}(C)) < \infty$ then $\prod_k p_k > 0$. If the expectation of rad($C$) is finite for all $p' < p_{sd}$ then $p_c = p_{sd}$.

3.10 Influence for continuous random variables

We have extended Friedgut and Kalai's influence result from product measure to monotonic measures on $\{0, 1\}^N$.
The most general influence result for product measures is the result of BKKKL on the Euclidean cube $[0, 1]^N$. The definitions of monotonicity and positive association have been extended to this setting, and Holley's theorem and the FKG theorem also have continuous versions [1, 50]. It seems natural to ask how Theorem 3.7 can be extended beyond discrete monotonic measures.

With $E = [0, 1]^N$, let $(E, \mathcal{E}, \lambda)$ be the uniform (Lebesgue) measure on $E$. Let $U_i$ be the projection random variable from $E$ onto the $i$-th coordinate, $U_i(x) = x(i)$, for $i \in I = \{1, \ldots, N\}$. While it is not generally the case that increasing subsets of $E$ are Borel measurable, the following is presumably well known.

Lemma 3.12. Increasing subsets of $E$ are Lebesgue measurable.

Let $\rho : E \to [0, \infty)$ be a density function with $\lambda(\rho) = 1$, and let $\mu_\rho$ be the corresponding probability measure,
$$\mu_\rho(A) = \int_A \rho(x)\,dx, \qquad A \in \mathcal{E}.$$
We will state the results of this section in terms of density functions, though of course they are results for the corresponding measures. We say that $\rho$ is stochastically dominated by $\nu$, written $\rho \le_{st} \nu$, if $\mu_\rho(A) \le \mu_\nu(A)$ for all increasing sets $A$. The following theorem is an extension of Holley's theorem.

Theorem 3.13 (Preston [50]). Let $\rho, \nu$ be density functions on $E$. If
$$\forall x, y \in E, \quad \rho(x \wedge y)\,\nu(x \vee y) \ge \rho(x)\,\nu(y),$$
then $\rho \le_{st} \nu$.

We say that $\rho$ is positively associated if $\mu_\rho(A \cap B) \ge \mu_\rho(A)\mu_\rho(B)$ for all increasing $A, B$. Let $J \subset I$ and $\xi \in E$. Define
$$E^\xi_J = \{x \in E : x(i) = \xi(i) \text{ for } i \in I \setminus J\},$$
and define the conditional measure $\mu^\xi_{\rho,J}$ by the density
$$\rho^\xi_J(x) \propto \rho(x)\,1_{E^\xi_J}(x).$$
Call $\rho$ monotonic if for all $J \subset I$, $\rho^\xi_J$ is stochastically increasing in $\xi$. We say that a density function $\rho$ satisfies the FKG lattice condition if
$$\forall x, y \in E, \quad \rho(x \wedge y)\,\rho(x \vee y) \ge \rho(x)\,\rho(y).$$
As in the discrete case, the FKG lattice condition for $\rho$ positive implies strong positive-association [1]. This follows from Preston's theorem. Let $J \subset I$, let $\xi \in E$, and let $A, B$ be increasing subsets of $E^\xi_J$. For all $x, y \in E^\xi_J$,
$$\rho^\xi_J(x \wedge y)\,\rho^\xi_J(x \vee y)\,1_A(x \vee y) \ge \rho^\xi_J(x)\,\rho^\xi_J(y)\,1_A(y).$$
Therefore $\mu^\xi_J(\cdot) \le_{st} \mu^\xi_J(\cdot \mid A)$, so $\mu^\xi_J(A \cap B) \ge \mu^\xi_J(A)\mu^\xi_J(B)$. The FKG lattice condition also implies monotonicity. Let $J \subset I$ and $\xi \le \zeta \in E$. Let $x, y \in [0, 1]^J$, and write $x^\xi$ for the configuration in $E^\xi_J$ that agrees with $x$ on $J$. Putting $x^\xi$ and $y^\zeta$ into the FKG lattice condition gives
$$\rho^\xi_J((x \wedge y)^\xi)\,\rho^\zeta_J((x \vee y)^\zeta) \ge \rho^\xi_J(x^\xi)\,\rho^\zeta_J(y^\zeta),$$
so $\rho^\xi_J \le_{st} \rho^\zeta_J$. We can summarize these results as follows.

Theorem 3.14. If a strictly positive density function $\rho$ satisfies the FKG lattice condition, then
(i) $\rho$ is strongly positively-associated, and
(ii) $\rho$ is monotonic.

Define the conditional influence $I^\rho_A(i)$ for a monotonic density $\rho$ and increasing set $A$,
$$I^\rho_A(i) = \mu_\rho(A \mid U_i = 1) - \mu_\rho(A \mid U_i = 0).$$
Though it depends on the behaviour of $\rho$ on a null set, by monotonicity it satisfies
$$I^\rho_A(i) \ge \hat I^\rho_A(i) := \lim_{\varepsilon\to 0} \mu_\rho(A \mid U_i \in [1-\varepsilon, 1]) - \mu_\rho(A \mid U_i \in [0, \varepsilon]).$$

Theorem 3.15. Let $\rho$ be a strictly positive monotonic density. Then for some $i \in I$,
$$I^\rho_A(i) \ge C \min\{\mu_\rho(A), \mu_\rho(A^c)\}\,\frac{\log N}{N}.$$

However, there does not seem to be a corresponding sharp threshold result for $E = [0, 1]^N$. Let $\rho$ be a monotonic density function. As in the discrete case, we can define an increasing family of monotonic density functions,
$$\rho_p(x) = \frac{1}{Z}\,\rho(x) \prod_{i=1}^N p^{x(i)}(1-p)^{1-x(i)},$$
with normalizing constant $Z$. Russo's formula extends to this setting,
$$\frac{d}{dp}\mu_p(A) = \frac{1}{p(1-p)} \sum_{i=1}^N \mathrm{cov}_{\rho,p}(U_i, 1_A).$$
However, we cannot bound $\mathrm{cov}_{\rho,p}(U_i, 1_A)$ in terms of $I_A(i)$. Let $\rho = 1$, so that $\rho_p$ is a product density for all $p \in (0, 1)$. Let $\pi = p/(1-p)$; then
$$\rho_p(x) = \begin{cases} \Big(\dfrac{\log\pi}{2p-1}\Big)^N \prod_{i=1}^N p^{x(i)}(1-p)^{1-x(i)} & p \ne 1/2, \\ 1 & p = 1/2. \end{cases}$$
Let $A = [N^{-1}, 1]^N$. For all $i \in I$, $I_A(i) = 1$, but as $N \to \infty$,
$$\mu_p(A) \to \begin{cases} \pi^{-1/(\pi-1)} & p \ne 1/2, \\ e^{-1} & p = 1/2. \end{cases}$$
There is no sharp threshold.

Proof of Theorem 3.15. The proof is similar to the discrete case. We construct an increasing function $f : E \to E$ such that:
(i) the law of the identity function $U$ under $\mu_\rho$ is the same as the law of $f(U)$ under $\lambda$;
(ii) given $f_1, \ldots, f_i$ increasing functions of $U_1, \ldots, U_i$, we construct $f_{i+1}$ as an increasing function of $U_1, \ldots, U_{i+1}$ such that $(f_1, \ldots, f_{i+1})$ under $\lambda$ has the same distribution as $(U_1, \ldots, U_{i+1})$ under $\mu_\rho$;
(iii) with $B = f^{-1}(A)$, $I_A(1) = I_B(1)$, and $I_A(i) \ge I_B(i)$ for all $i \in I$.
We first choose a strictly increasing function $f_1 : E \to [0, 1]$ such that $f_1$ depends only on the first coordinate; for convenience we also write $f_1$ as a function of one variable. We need
$$\lambda(f_1(U_1) \le x(1)) = f_1^{-1}(x(1)) = \mu_\rho(U_1 \le x(1)).$$
Let $\phi : [0, 1] \to [0, 1]$ be the continuous, increasing bijection given by
$$\phi(x(1)) = \mu_\rho(U_1 \le x(1)) = \int_0^{x(1)} Z(t)\,dt, \qquad Z(t) = \int_{[0,1]^{I\setminus\{1\}}} \rho(t, y)\,dy.$$
Take $f_1 = \phi^{-1}$. For an integrable function $h : [0, 1] \to \mathbb{R}$, by change of variable,
$$\lambda(h(f_1(U_1))) = \int_0^1 h(\phi^{-1}(u(1)))\,du(1) = \int_0^1 h(x(1))\,Z(x(1))\,dx(1).$$
For every $x$, the density $\rho$ conditional on the first coordinate is
$$\rho^x_{\{1\}}(y) = \frac{\rho(x(1), y)}{Z(x(1))}.$$
The integrability of $\rho^x_{\{1\}}$ follows by Tonelli's theorem. For any increasing event $A$, by the change of variable formula,
$$\mu_\rho(A) = \int_0^1 \int_{[0,1]^{I\setminus\{1\}}} \rho(x(1), y)\,1_A(x(1), y)\,dy\,dx(1) = \int_0^1 \int_{[0,1]^{I\setminus\{1\}}} \rho^x_{\{1\}}(y)\,1_A(x(1), y)\,dy\,Z(x(1))\,dx(1) = \int_0^1 \int_{[0,1]^{I\setminus\{1\}}} \rho^{f(u)}_{\{1\}}(y)\,1_A(f_1(u(1)), y)\,dy\,du(1).$$
We have 'simplified' sampling from the $N$-dimensional density $\rho$: take the first coordinate to have the distribution of $f_1(U_1)$ and then sample from the $(N-1)$-dimensional density $\rho^{f(x)}_{\{1\}}$. Repeat this process a further $N-1$ times for coordinates $2, 3, \ldots, N$; then $\rho^{f(u)}_{\{1,\ldots,N\}} \equiv 1$ and
$$\mu_\rho(A) = \int_{[0,1]^N} 1_A(f(u))\,du.$$
Let $B = f^{-1}(A)$. As $f_1(x)$ depends only on $x(1)$, $I_A(1) = I_B(1)$. Let $i > 1$. Construct $g$ as we constructed $f$ but, as in the proof of Theorem 3.7, modify the order in which the coordinates are 'sampled': process the coordinates in the order $i, 1, \ldots, i-1, i+1, \ldots, N$ instead of $1, \ldots, N$. We must show that $f(x) \le g(x)$ when $x(i) = 1$, and similarly that $f(x) \ge g(x)$ when $x(i) = 0$. Given $x(i) = 1$, $f_1$ and $g_1$ depend only on $x(1)$. We must show that $f_1(x(1)) \le g_1(x(1))$; the result then follows by an inductive argument. Let
$$\phi^f(x) = \mu_\rho(U_1 \le x), \qquad \phi^g(x) = \mu_\rho(U_1 \le x \mid U_i = 1).$$
By monotonicity $\mu_\rho(U_1 \le x \mid U_i = a)$ is decreasing in $a$, so $\phi^f(t) \ge \phi^g(t)$. As $f_1 = (\phi^f)^{-1}$ and $g_1 = (\phi^g)^{-1}$, the result follows.
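The inverse-conditional-CDF construction above can be imitated numerically for small $N$. The following is a crude sketch only: integrals are Riemann sums on a grid, the density and grid size are arbitrary choices of ours, and the map is increasing only up to discretization error.

```python
import math
from itertools import product

def sample_f(rho, N, u, grid=64):
    """Increasing map f : [0,1]^N -> [0,1]^N from the proof: coordinate i is
    the inverse of the conditional CDF of rho given the coordinates fixed so
    far. All integrals are crude Riemann sums (fine for small N)."""
    ts = [(j + 0.5) / grid for j in range(grid)]
    fixed = []
    for i in range(N):
        # unnormalized conditional density of coordinate i, integrating rho
        # over the N-i-1 coordinates not yet fixed
        w = [sum(rho(fixed + [t] + list(rest))
                 for rest in product(ts, repeat=N - i - 1))
             for t in ts]
        total, acc, value = sum(w), 0.0, ts[-1]
        for t, wt in zip(ts, w):
            acc += wt
            if acc >= u[i] * total:
                value = t
                break
        fixed.append(value)
    return fixed

# An FKG density on [0,1]^2: rho(x) ~ exp(4*x1*x2); the map is increasing in u.
rho = lambda x: math.exp(4 * x[0] * x[1])
print(sample_f(rho, 2, [0.2, 0.2]), sample_f(rho, 2, [0.9, 0.9]))
```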
Proof of Lemma 3.12. This is immediate if $N = 1$. Suppose inductively that increasing subsets of $[0, 1]^N$ are Lebesgue measurable. Let $A$ be an increasing subset of $[0, 1]^N \times [0, 1]$. Define a function $f : [0, 1]^N \to [0, 1] \cup \{\infty\}$ by
$$f(x) = \inf\{y \in [0, 1] : (x, y) \in A\}.$$
The set $A$ is increasing, so $f$ is decreasing. By the inductive hypothesis the sets $\{x : f(x) < c\}$ are Lebesgue measurable; $f$ is a measurable function and its graph $G_f$ is null by Fubini's theorem. Consider the subset of $[0, 1]^{N+1}$ consisting of the 'area above the graph' $G_f$,
$$\bar A = \{(x, y) \in [0, 1]^{N+1} : y \ge f(x)\}.$$
$\bar A$ is measurable by the construction of the Lebesgue integral. Now $A \subset \bar A$, and $\bar A \setminus A \subset G_f$ is null, so $A$ is measurable. The result follows by induction.

Chapter 4: The mean-field zero-range process

4.1 Balls and boxes

In this chapter we consider a mean-field zero-range process (ZRP) motivated by a microcanonical ensemble. By considering the entropy of the system, we show that the empirical distribution rapidly converges, in the sense of Kullback–Leibler divergence, to a geometric distribution. The proof uses arguments of spectral gap and log Sobolev type.

Suppose there are a number of boxes, $N$, each containing $R$ indistinguishable balls. Iteratively, pick a source box and a sink box, both uniformly at random; if the source box is not empty then move a ball from the source to the sink. This is a Markov chain on the set
$$B_N = \{b \in \mathbb{N}^N : b_1 + \cdots + b_N = NR\}.$$
Here $b_i$ represents the number of balls in the $i$-th box. With probability $N^{-1}$ the source box and the sink box are the same, so no change occurs. The transition probability between neighbouring elements of $B_N$ is $N^{-2}$. The Markov chain is reversible with respect to the uniform distribution on $B_N$.

If we regard the boxes as distinguishable particles and the balls as quanta of energy, the system is a microcanonical ensemble with Maxwell–Boltzmann statistics [24, 40]. One can also consider the balls to be particles; the process is then said to have Bose–Einstein statistics [18].

Let $N_k$ be the number of boxes containing $k$ balls, $\sum_k N_k = N$. At equilibrium, the probability that $(N_0, N_1, \ldots) = (n_0, n_1, \ldots)$ is proportional to the number of ball configurations $b \in B_N$ compatible with $(n_0, n_1, \ldots)$. The number of ways of putting $b$ balls into $N$ boxes is $\binom{b+N-1}{N-1}$; the number of configurations is $|B_N| = \binom{NR+N-1}{N-1}$. Fix $R$. Under the equilibrium measure $\pi$,
$$\pi(B_1 \ge k) = \binom{NR-k+N-1}{N-1} \Big/ \binom{NR+N-1}{N-1} = \Big(\frac{R}{R+1}\Big)^k \big(1 + k^2 O(N^{-1})\big) \quad\text{as } N \to \infty.$$
The law of $B_1$ converges to a geometric distribution with mean $R$. The geometric distribution can be thought of as a Gibbs measure with respect to a linear energy function, say $H : \mathbb{N} \to \mathbb{R}$ given by $H(n) = n$. Recall from Section 1.3 that Gibbs measures are maximum entropy distributions: the equilibrium law of $B_1$ converges to the maximum entropy distribution on $\mathbb{N}$ with mean $R$.

In Section 4.3 we give an example to motivate our use of entropy to study the mean-field ZRP. The Ehrenfest urn model [20] is a simple example of a Markov chain that can be studied by looking at the entropy of the empirical distribution; it was proposed as a probabilistic model to help explain the deterministic nature of Boltzmann's H theory. In Section 4.5 we define an entropy for the mean-field ZRP and show how it is related to the equilibrium distribution; we also summarize our results on how the entropy evolves. In Section 4.4 we discuss some general convergence techniques for reversible Markov chains [19].

If $N_0$, the number of empty boxes, is 'fixed', any given box behaves like a biased random walk on $\mathbb{N}$, whose equilibrium measure is a geometric distribution. We study this random walk in detail in Section 4.7, using the techniques of Section 4.4, and later use the results to analyse the mean-field ZRP. In Section 4.8 we adapt a martingale concentration result [46]. We prove bounds on the expected step-wise change of the mean-field ZRP entropy; the concentration result is then used to show that, with high probability, the entropy decreases as the bounds suggest.
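The balls-and-boxes dynamics are simple to simulate. The sketch below runs the chain from the deterministic initial state and compares the empirical distribution of box occupancies with the geometric limit; $N$, $R$ and the run length are arbitrary choices of ours.

```python
import random

def zrp_step(b, rng):
    """One step of the mean-field ZRP: move a ball from a uniformly chosen
    source box to a uniformly chosen sink box, if the source is non-empty."""
    N = len(b)
    src, snk = rng.randrange(N), rng.randrange(N)
    if b[src] > 0:
        b[src] -= 1
        b[snk] += 1

N, R = 10000, 3
rng = random.Random(0)
b = [R] * N                        # start with R balls in every box
for _ in range(40 * N):
    zrp_step(b, rng)

# Empirical occupancy distribution vs the geometric limit R^k/(R+1)^(k+1).
counts = {}
for v in b:
    counts[v] = counts.get(v, 0) + 1
for k in range(8):
    print(k, round(counts.get(k, 0) / N, 4), round(R**k / (R + 1)**(k + 1), 4))
```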
Figure 4.1: A sample path: the empirical distribution for $N = 10^7$, $R = 20$ at intervals of $40N$ steps

In Section 4.9 we use coupling arguments to prove bounds on the empirical distribution of the mean-field ZRP. By coupling the process with a system of $N$ biased random walks on $\mathbb{N}$, we prove an upper bound on the tail of the empirical distribution: we show that if a significant fraction of the boxes are empty, then after $O(R^2 N \log N)$ steps, $\sum_{k\ge j} N_k$ decays exponentially in $j$. By a second coupling argument, we show that with high probability $N_0 \ge N(5R)^{-1}$ after $O(NR^2\log(R+1))$ steps.

In Sections 4.10 and 4.11 we use the results obtained from the biased random walk. We use a log Sobolev technique to prove that when the entropy is large, the entropy tends to halve over a period of $O(R^2 N \log N)$. Once the entropy is $O(R^{-2})$, we use a spectral gap type technique to show that the entropy tends to halve over a period of $O(R^3 N)$.

4.2 Notation

We make use of asymptotic notation: 'order less than', $O$; 'order equal to', $\Theta$; and 'order greater than', $\Omega$, to describe the behaviour of functions as $N \to \infty$. With $c_1, c_2, N_0$ positive constants:

if $\forall N \ge N_0$, $|f(N)| \le c_2|g(N)|$, write $f(N) = O(g(N))$;
if $\forall N \ge N_0$, $c_1|g(N)| \le |f(N)| \le c_2|g(N)|$, write $f(N) = \Theta(g(N))$;
if $\forall N \ge N_0$, $c_1|g(N)| \le |f(N)|$, write $f(N) = \Omega(g(N))$.

In Section 4.1 we made use of asymptotic notation with $R$ fixed and $N$ tending to infinity. We will want to express bounds that, given $R$, hold for all $N$ sufficiently large. When we write $f(R, N) = O(g(R, N))$ as $N \to \infty$ we mean that there is a constant $c_2$ and a function $N_0$ such that
$$\forall R,\ \forall N \ge N_0(R), \quad |f(R, N)| \le c_2|g(R, N)|;$$
similarly for $\Omega$ and $\Theta$.

The function log should be interpreted as the natural logarithm, unless indicated otherwise by a subscript; $\log_a x = \log x/\log a$. If $X(0), X(1), X(2), \ldots$ is a sequence of real valued random variables, we write $\Delta X(i)$ for the step-wise change $X(i+1) - X(i)$. For random variables $X, Y$, we say that $X$ is stochastically smaller than $Y$, written $X \le_{st} Y$, if there is a coupling $(X, Y)$ such that $X \le Y$ almost surely.

4.3 The Ehrenfest urn model

In Chapter 1 we introduced the lazy Ehrenfest urn model [20]. An urn contains $N$ balls, red or green. Start with the urn full of green balls. Each step, pick a ball uniformly at random and, with probability one half, replace it with a ball of the other colour. Assuming the balls are ordered, the process is time reversible with respect to the uniform distribution on the set $\{\text{red}, \text{green}\}^N$. Let $N_R$ be the number of red balls, and let $X_R = N_R/N$ be the fraction of red balls. After $i$ steps, each ball has been picked a binomial Bin($i, N^{-1}$) number of times, and each ball is red with probability $p_R(i) = (1 - [1 - N^{-1}]^i)/2$. The Markov chain is of course just a random walk on the Hamming cube; its mixing properties have been studied in great detail [19, 41, 48]. The fraction of red balls is close to 1/2 after $O(N \log N)$ steps.
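A quick simulation of the lazy urn shows the fraction of red balls tracking the hydrodynamic limit $(1 - e^{-t})/2$ derived in the next paragraph; the parameters below are arbitrary choices of ours.

```python
import math, random

# Lazy Ehrenfest urn: compare the simulated fraction of red balls with the
# hydrodynamic limit (1 - exp(-t))/2, where t counts steps in units of N.
N = 100000
rng = random.Random(0)
red = [False] * N
n_red = 0
for i in range(1, 5 * N + 1):
    j = rng.randrange(N)
    if rng.random() < 0.5:          # lazy: flip the chosen ball w.p. 1/2
        red[j] = not red[j]
        n_red += 1 if red[j] else -1
    if i % N == 0:
        t = i / N
        print(t, n_red / N, (1 - math.exp(-t)) / 2)
```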
An alternative approach to the problem is to take a hydrodynamic limit [43]. Condition on $X_R = x$. At the next step, $X_R$ increases by $N^{-1}$ with probability $(1-x)/2$ and decreases by $N^{-1}$ with probability $x/2$; the expected change is $E(\Delta X_R) = N^{-1}(1/2 - x)$. The hydrodynamic limit $X^f_R$ is the solution to the equation
$$\frac{d}{dt} X^f_R = 1/2 - X^f_R, \qquad X^f_R(0) = 0.$$
In this simple case we can easily find the hydrodynamic limit in closed form, $X^f_R(t) = (1 - e^{-t})/2$.

Figure 4.2: Hydrodynamic limit $X^f_R$ tends to 1/2

The hydrodynamic limit can also be studied by looking at the behaviour of a certain statistic. Let $X^f_G = 1 - X^f_R$ be the fraction of green balls, and define the entropy
$$S(X^f_R, X^f_G) = -X^f_R \log X^f_R - X^f_G \log X^f_G.$$
Entropy is initially 0; the maximum entropy $S_{max} = \log 2$ is obtained in the limit $t \to \infty$ when $X^f_R = X^f_G = 1/2$. Again we use the function $\delta$ introduced in Section 1.3,
$$\delta(x, y) = y\log y - x\log x - (y-x)\,\frac{d}{dt}(t\log t)\Big|_{t=x} = y\log\frac{y}{x} - (y-x).$$
The function is sketched for arbitrary $x$ in Figure 4.3. If $x = y$, $\delta(x, y) = 0$. By the convexity of $t \mapsto t\log t$, $\delta(x, y) \ge 0$ for all $x, y$. The function is bounded between $(x-y)^2/(3x)$ and $(x-y)^2/x$ when $0 \le y \le 2x$; the upper bound is also valid for all $y$.

Figure 4.3: Graph of $\delta(x, y)$ as a function of $y$

We can easily check that when $X^f_R = x$,
$$\frac{d}{dt} S = (1/2 - x)\log\frac{1-x}{x} \ge S_{max} - S.$$
Without having to solve the hydrodynamic limit differential equation, we can see that $S_{max} - S$ decays exponentially,
$$\frac{d}{dt}(S_{max} - S) \le -(S_{max} - S).$$
We can also perform an analogous calculation before taking the hydrodynamic limit,
$$E(\Delta S \mid X_R = k/N) \ge \frac{S_{max} - S}{N} + O(N^{-2}).$$
We will use this approach to analyse the mean-field zero-range process. There are two additional complications over the continuous case. Firstly, we have an 'error' term of order $N^{-2}$. Secondly, the Markov chain is ergodic; over large time scales we cannot simply say that $S$ is close to $S_{max}$.

4.4 Markov chain convergence

Let $\Omega$ be a finite set. Let $K : \Omega \times \Omega \to [0, 1]$ be a Markov kernel: for all $x$, $\sum_y K(x, y) = 1$. If $\mu$ is a distribution on $\Omega$, $\mu K^i$ gives the distribution after $i$ steps. We can also run the Markov chain in continuous time at rate 1; $K - I$ is the corresponding Q-matrix. We say that the Markov chain is lazy if $K(x, x) \ge 1/2$ for all $x \in \Omega$. We will assume that $K$ is irreducible, lazy and time reversible with respect to the equilibrium probability measure $\pi$: for all $x, y$, $\pi(x)K(x, y) = \pi(y)K(y, x)$. It is then standard that the left eigenvectors of $K$, suitably normalized, form a basis for $\mathbb{R}^\Omega$, orthonormal with respect to the inner product
$$\langle f, g\rangle_\pi = \langle f, \pi^{-1}g\rangle_{\mathbb{R}^\Omega} = \sum_{x\in\Omega} f(x)g(x)/\pi(x).$$
Label the eigenvectors $F_1, \ldots, F_n$, with eigenvalues $1 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n \ge 0$ respectively. The eigenvalues are all non-negative as the Markov chain is lazy. The first eigenvector is $F_1 = \pi$, the stationary distribution. For a probability measure $\mu$ on $\Omega$,
$$\mu(x) = \pi(x) + \sum_{i\ge 2} \alpha_i F_i(x), \qquad \alpha_i = \langle \mu, F_i\rangle_\pi.$$
The $\chi^2$-distance from $\mu$ to $\pi$, $\|\mu/\pi - 1\|^2_{2,\pi}$, can be written
$$\sum_{x\in\Omega} \frac{(\mu(x) - \pi(x))^2}{\pi(x)} = \langle \mu - \pi, \mu - \pi\rangle_\pi = \sum_{i\ge 2} \alpha_i^2.$$
The quantity $1 - \lambda_2$ is called the spectral gap. More generally, if the Markov chain is not lazy, the spectral gap is $1 - \max_{i\ge 2} |\lambda_i|$. Let $\mu_t = \mu e^{-t(I-K)}$. The eigenvalue of $F_i$ with respect to $e^{-t(I-K)}$ is $e^{-t(1-\lambda_i)}$, so
$$\frac{d}{dt} \sum_x \frac{(\mu_t(x) - \pi(x))^2}{\pi(x)} = -\sum_{x,y} \pi(x)K(x, y)\Big(\frac{\mu_t(x)}{\pi(x)} - \frac{\mu_t(y)}{\pi(y)}\Big)^2 \le -2(1-\lambda_2)\sum_x \frac{(\mu_t(x) - \pi(x))^2}{\pi(x)}. \qquad (4.1)$$
Define the total variation distance between $\mu$ and $\pi$,
$$\|\mu - \pi\|_{TV} = \sup_{A\subset\Omega} |\mu(A) - \pi(A)| = \frac12 \sum_x |\mu(x) - \pi(x)|.$$
Write $\tau_1(\varepsilon)$ for the time to convergence within $\varepsilon$ in total variation,
$$\tau_1(\varepsilon) = \inf\{t : \forall\mu,\ \|\mu_t - \pi\|_{TV} \le \varepsilon\}.$$
Inequality (4.1) can be used to show the following bound [48],
$$\tau_1(\varepsilon) \le \frac{\log(\varepsilon\pi_{min})}{\log\lambda_2}, \qquad \pi_{min} = \min_x \pi(x). \qquad (4.2)$$
We will also use a log Sobolev technique. Log Sobolev inequalities for discrete Markov chains are described in [19]. Define the Dirichlet form and the Laplacian for $f, g : \Omega \to \mathbb{R}$,
$$\mathcal{E}(f, g) = \frac12 \sum_{x,y} \pi(x)K(x, y)[f(x) - f(y)][g(x) - g(y)], \quad\text{and}\quad \mathcal{L}(f) = \sum_x |f(x)|^2 \log\frac{|f(x)|^2}{\|f\|^2_{2,\pi}}\,\pi(x).$$
The log Sobolev constant is defined
$$\nu = \min_f \Big\{\frac{\mathcal{E}(f, f)}{\mathcal{L}(f)} : \mathcal{L}(f) \ne 0\Big\}.$$
This is in a superficial way similar to the spectral gap; we can write the spectral gap in terms of $\mathcal{E}$,
$$1 - \lambda_2 = \min_f \Big\{\frac{\mathcal{E}(f, f)}{\mathrm{var}_\pi(f)} : \mathrm{var}_\pi(f) \ne 0\Big\}.$$
The entropy of $\mu$ relative to $\pi$, also called the Kullback–Leibler divergence from $\mu$ to $\pi$, is defined by
$$D_{KL}(\mu\|\pi) = \sum_x \mu(x)\log\frac{\mu(x)}{\pi(x)} = \sum_x \delta(\pi(x), \mu(x)) \ge 0.$$
It is not generally the case that $D_{KL}(\mu\|\pi) = D_{KL}(\pi\|\mu)$. The log Sobolev constant can be used to show convergence in Kullback–Leibler divergence under the action of $e^{-t(I-K)}$,
$$\frac{d}{dt} D_{KL}(\mu_t\|\pi) = -\mathcal{E}\Big(\frac{\mu_t}{\pi}, \log\frac{\mu_t}{\pi}\Big) \le -4\nu\,D_{KL}(\mu_t\|\pi). \qquad (4.3)$$
Corollary A.4 of [19] gives a general bound for the log Sobolev constant in terms of the spectral gap and $\pi_{min}$,
$$\nu \ge \frac{(1 - 2\pi_{min})(1 - \lambda_2)}{\log(1/\pi_{min} - 1)}.$$

4.5 Entropy for the mean-field ZRP

With $X_k = N_k/N$ the fraction of boxes with $k$ balls, the empirical distribution is the random variable $X = (X_0, X_1, \ldots)$. Let $D$ be the set of distributions with mean $R$: $x \in [0, 1]^{\mathbb{N}}$ with $\sum_k x_k = 1$ and $\sum_k k x_k = R$. Define the entropy function $S : D \to [0, (R+1)\log(R+1) - R\log R]$ by
$$S(x) = -\sum_{k=0}^\infty x_k \log x_k, \qquad 0\log 0 = 0.$$
Up to a factor of $\log 2$ this is the standard information-theoretic entropy. It is related to the thermodynamic entropy $\log(N!/(N_0!N_1!\cdots))$. The multinomial coefficient $N!/(N_0!N_1!\cdots)$ counts the permutations of $B_1, \ldots, B_N$; under the equilibrium measure $\pi$,
$$\pi(X = x) = |B_N|^{-1}\,\frac{N!}{n_0!n_1!\cdots}, \qquad x_k = n_k/N.$$
The entropy and the thermodynamic entropy are related: provided each box contains $O(R\log N)$ balls,
$$\Big|S(X) - \frac{1}{N}\log\frac{N!}{N_0!N_1!\cdots}\Big| = O(RN^{-1}\log^2 N).$$
This is simply by Stirling's approximation [11],
$$\log\frac{n!}{(n/e)^n} = O(\log n).$$
We can estimate the expected change in entropy in terms of the expected changes $E(\Delta X_k \mid X = x)$. This calculation is meant only to motivate two rather lengthy calculations, the proofs of Lemmas 4.16 and 4.20:
$$E(\Delta X_k \mid X = x) \approx \frac{1}{N}\begin{cases} -(x_k(1-x_0) - x_{k+1}) & k = 0, \\ (x_{k-1}(1-x_0) - x_k) - (x_k(1-x_0) - x_{k+1}) & k > 0, \end{cases}$$
$$E(\Delta S \mid X = x) \approx -\sum_{k=0}^\infty \frac{d}{dt}(t\log t)\Big|_{t=x_k} E(\Delta X_k \mid X = x) \approx \frac{1}{N}\sum_{k=0}^\infty (x_k(1-x_0) - x_{k+1})\log\frac{x_k(1-x_0)}{x_{k+1}} \ge \frac{1}{N}\sum_{k=0}^\infty \frac{(x_k(1-x_0) - x_{k+1})^2}{\max\{x_k(1-x_0), x_{k+1}\}} \ge 0.$$
It is interesting to consider the hydrodynamic limit for this process. It is the solution $X^f : [0, \infty) \to [0, 1]^{\mathbb{N}}$ to the differential equation
$$\frac{d}{dt} X^f_k = \begin{cases} -(X^f_k(1-X^f_0) - X^f_{k+1}) & k = 0, \\ (X^f_{k-1}(1-X^f_0) - X^f_k) - (X^f_k(1-X^f_0) - X^f_{k+1}) & k > 0, \end{cases}$$
with initial condition
$$X^f_k(0) = \begin{cases} 1 & k = R, \\ 0 & \text{otherwise.} \end{cases}$$
The solution $X^f(t)$ is also the marginal probability distribution of a certain continuous-time random walk on $\mathbb{N}$. Let the walk start at $C(0) = R$. At time $t$, if $C(t) > 0$ step left (decrement $C$ by 1) at rate 1; step right (increment $C$ by 1) at rate $1 - P(C(t) = 0)$. Then $P(C(t) = k) = X^f_k(t)$.
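The approach of the hydrodynamic limit to the geometric distribution can be seen by integrating the differential equation directly. The following is a crude Euler sketch with an arbitrary truncation level and step size; mass is conserved by absorbing the flux at the truncation boundary.

```python
# Euler integration of the hydrodynamic limit ODE (truncated at kmax);
# X^f approaches the geometric distribution with mean R.
R, kmax, dt, T = 3, 60, 0.01, 40.0
X = [0.0] * (kmax + 1)
X[R] = 1.0
t = 0.0
while t < T:
    flux = [X[k] * (1 - X[0]) - X[k + 1] for k in range(kmax)]  # net flow k -> k+1
    dX = [0.0] * (kmax + 1)
    dX[0] = -flux[0]
    for k in range(1, kmax):
        dX[k] = flux[k - 1] - flux[k]
    dX[kmax] = flux[kmax - 1]        # absorb flux at the truncation boundary
    X = [x + dt * d for x, d in zip(X, dX)]
    t += dt
print([round(x, 4) for x in X[:6]])
print([round(R**k / (R + 1)**(k + 1), 4) for k in range(6)])  # geometric target
```

The stationary points of the truncated system have zero flux, $x_{k+1} = x_k(1 - x_0)$, i.e. exactly the geometric distributions; the conserved mean $R$ then forces $x_0 = 1/(R+1)$.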
The distribution $x$ with maximum entropy for mean $R$ is the geometric distribution $x_k = R^k/(R+1)^{k+1}$. Call this distribution $G_R$. Define the entropy gap $\overline S$ to be the difference between the maximum entropy and the entropy, $\overline S(x) = S(G_R) - S(x)$. Note that $\Delta\overline S(X) = -\Delta S(X)$, and we expect $\overline S(X)$ to decrease to almost zero. We can now hope for inequalities of the form
\[ E(\Delta\overline S \mid X = x) \le -c\,\overline S(x), \qquad c \text{ constant}. \]
We also have the following.

Lemma 4.4. $\overline S(x)$ is the relative entropy of $x$ with respect to $G_R$, that is, the Kullback–Leibler divergence from $x$ to $G_R$. Also, $\overline S(x)$ is less than the Kullback–Leibler divergence from $x$ to the geometric distribution with first term $x_0$ and mean $(1-x_0)/x_0$:
\[ \overline S(x) = \sum x_k\log\frac{x_k}{R^k/(R+1)^{k+1}} = \sum\delta(R^k/(R+1)^{k+1}, x_k) \le \sum x_k\log\frac{x_k}{x_0(1-x_0)^k} = \sum\delta(x_0(1-x_0)^k, x_k). \]

The main results of this chapter are in Sections 4.9, 4.10 and 4.11. They can be summarized as follows. Consider the evolution of the Markov chain up to $N^{21/20}$ steps. With probability $1 - O(N^{-1/3})$ the following occurs.

(i) After time $O(NR^2\log(R+1))$, $X_0 \ge (5R)^{-1}$. As a consequence, we get a bound on the tail of the empirical distribution. After $O(R^2N\log N)$ steps, $\sum_{k\ge j} X_k \le 8(1-a)^j$ for $j$ such that $(1-a)^j \ge 2N^{-1}\log N$ (with $a = (5R)^{-1}$).

(ii) From then on,
\[ E(\Delta\overline S \mid X = x) \le -\frac{\overline S(x)}{500R^2N\log N} + \frac{250R\log N}{N^2}. \]
After $O(R^2N\log(R+1)\log N)$ steps, $\overline S \le \phi := 10^{-4}(R+1)^{-2}$. However, when $\overline S$ is small, this bound on $E(\Delta\overline S \mid X = x)$ is rather weak. It does not imply that $\overline S = O(R^3N^{-1}\log^2 N)$ until $\Omega(R^2N\log^2 N)$ steps.

(iii) The bound on $E(\Delta\overline S \mid X = x)$ improves to
\[ E(\Delta\overline S \mid X = x) \le -\frac{\overline S(x)}{4000(R+1)^3N} + \frac{100(R+1)\log N}{N^2}. \]
After $O(R^3N\log N)$ steps, $\overline S = O(N^{-1}\log^2 N)$. In addition, we can show that $\overline S$ tends to decrease to $O(R^4N^{-1}\log N)$.

In order to prove the above, we first analyse a biased random walk on $\mathbb{N}$. We use the equilibrium distribution and mixing time in Section 4.9 to obtain the bound on the tail of the empirical distribution. In Section 4.10 we use the Markov chain's log Sobolev constant. In Section 4.11 we use an inequality related to the spectral gap.

4.6 Total variation convergence

Let $K : B_N \times B_N \to [0,1]$ be the Markov kernel for the mean-field ZRP. A natural quantity of interest is the rate at which the total variation distance from equilibrium,
\[ \|\mu K^i - \pi\|_{TV} = \sup_{A \subset B_N}|\mu K^i(A) - \pi(A)|, \]
decays. The time to convergence within $\varepsilon$ in total variation is defined
\[ \tau_1(\varepsilon) = \inf\{i : \forall \mu,\ \|\mu K^i - \pi\|_{TV} \le \varepsilon\}. \]
Notice that $\tau_1(\varepsilon)$ is a maximum over all starting distributions $\mu$. It is standard to call $\tau_1(1/4)$ the total variation mixing time; it can be used to control the Markov chain convergence to an arbitrary degree of accuracy,
\[ \tau_1(2^{-k}) \le k\,\tau_1(1/4), \qquad k = 2, 3, \ldots. \]
The results for the entropy described in Section 4.5 assume that the Markov chain starts from a particular state, with $R$ balls in every box. Under the equilibrium measure, the maximum number of balls per box is $O(R\log N)$ with high probability. The results immediately generalize to allow any starting distribution such that every box contains $O(R\log N)$ balls. However, if we start with all $RN$ balls in one box, the process clearly does not reach equilibrium before time $RN^2/2$: one box will still contain half the balls. Trying to calculate the total variation mixing time would clearly be the wrong approach. One could consider a truncated version of the mean-field ZRP.
The mean-field ZRP is defined on the set $B_N$; $B_N$ can be thought of as a graph, with edges between the elements of $B_N$ that differ by the location of one ball. The mean-field ZRP is then the random walk on $B_N$ with uniform transition probability $N^{-2}$: each step, move to each of the adjacent states with probability $N^{-2}$. The equilibrium distribution is uniform. The process can be truncated as follows. Remove certain states, and any incident edges, from the graph $B_N$. Then consider the random walk on the new graph with uniform transition probability $N^{-2}$. The equilibrium distribution is again uniform. Provided the fraction of the vertices of $B_N$ that have been removed is small, the change to the equilibrium measure is small.

One such truncation is to remove all the ball configurations where any box holds more than $O(R\log N)$ balls. Rough calculations suggest that under the equilibrium measure, $\overline S(X)$ has order at least $RN^{-1}\log N$ with high probability. Given that we show $\overline S(X) = O(R^4N^{-1}\log N)$ with high probability after $O(R^3N\log N)$ steps, it is tempting to conjecture that for the restricted Markov chain, $\tau_1(1/4) = O(R^3N\log N)$. Our results deal with the process when it is `far' from equilibrium. Proving total variation convergence might well require a rather different technique.

The empirical distribution is a Markov chain on
\[ D_0 = \left\{x \in \{0, N^{-1}, 2N^{-1}, \ldots, 1\}^{\mathbb{N}} : \sum_k x_k = 1,\ \sum_k kx_k = R\right\}. \]
The Markov chain on $D_0$ is similar to the one analysed by Frieze, Kannan and Polson [29]. They looked at the problem of efficiently sampling from a log-concave distribution on a convex subset of Euclidean space. They considered a reversible Markov chain on a convex subset of the hypercubic lattice, chosen so that the equilibrium distribution approximates the log-concave distribution. The mixing time is bounded in terms of the dimension and the diameter of the state space.

Our results do seem to be a step in the right direction. We know that the empirical distribution is soon effectively confined to a subset of $D_0$,
\[ D_0' = \left\{x \in D_0 : \overline S(x) = O(N^{-1}\log^2 N) \text{ and } x_k = 0 \text{ for } k = \Omega(R\log N)\right\}. \]
It would therefore be interesting to know the total variation mixing time for the process when truncated such that the support of the empirical distribution is $D_0'$. The diameter of $D_0'$ is much smaller than the diameter of $D_0$.

4.7 A biased random walk

To study the evolution of a single box of the mean-field ZRP, one must know something about the state of the rest of the system. It is sufficient to know $X_0$, the fraction of empty boxes. Consider the step from time $i$ to $i+1$. If $B_1(i) > 0$, box one loses a ball with probability approximately $1/N$. Box one gains a ball with probability approximately $(1 - X_0(i))/N$. We consider the effect of `fixing' $X_0 = a$. Over a short period of time $X_0$ does not change much. The empirical distribution evolves almost as if each box is an independent copy of the following biased random walk, when `$n$' is sufficiently large.

Let $a \in (0,1)$ and let $n$ be a positive integer. Define a discrete Markov chain $C$ on state space $\Omega = \{0, \ldots, n-1\}$. Each step,

(i) decrease $C$ by one with probability $1/N$ (unless $C = 0$),
(ii) increase $C$ by one with probability $(1-a)/N$ (unless $C = n-1$).

That is, $C$ has Markov kernel $K : \Omega \times \Omega \to [0,1]$,
\[ K(j,k) = \begin{cases} (1-a)/N & k = j+1, \\ 1/N & k = j-1, \\ 0 & |k - j| > 1, \end{cases} \]
and $K(j,j)$ is such that $\sum_k K(j,k) = 1$. The Markov chain has stationary distribution $\pi(k) = \hat a(1-a)^k$, with $\hat a = a/(1 - (1-a)^n)$.
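As a numerical aside (illustrative only, with arbitrary parameter values), the kernel and its stationary distribution can be checked directly; the Python sketch below also compares the spectral gap with the lower bound $a^2/(4N)$ of Lemma 4.5(ii) below.

```python
import numpy as np

N, n, a = 50, 12, 0.3                 # illustrative values; the text takes N large

K = np.zeros((n, n))
for j in range(n):
    if j < n - 1:
        K[j, j + 1] = (1 - a) / N     # step right
    if j > 0:
        K[j, j - 1] = 1 / N           # step left
    K[j, j] = 1 - K[j].sum()          # holding probability; lazy once N >= 4

a_hat = a / (1 - (1 - a) ** n)
pi = a_hat * (1 - a) ** np.arange(n)  # pi(k) = a_hat (1 - a)^k
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(pi @ K, pi)        # pi is stationary

gap = 1 - np.sort(np.linalg.eigvals(K).real)[::-1][1]
print(f"spectral gap {gap:.6f} >= a^2/(4N) = {a * a / (4 * N):.6f}")
```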
Of course, $\pi(j)K(j,k) = \pi(k)K(k,j)$; the Markov chain is reversible. Unless $N \le 3$, the Markov chain is lazy.

Lemma 4.5. (i) Suppose $C(0) < n/2$. The probability that $C$ hits the right boundary, $n-1$, within $N^{21/20}$ steps is $O(N^{21/20}(1-a)^{n/2})$.

(ii) The spectral gap $1 - \lambda_2 \ge a^2/(4N)$. The Markov chain $C$ mixes, in total variation, to within $\varepsilon$ of $\pi$ in time $\tau_1(\varepsilon) \le 4Na^{-2}\log\frac{1}{\varepsilon\pi(n-1)}$.

(iii) Let $\mu$ be the law of $C(0)$. Choose $j \le l$ such that $\mu(C > l) \le \pi(j)/2$. Then $(\mu K^i)(k) \le 2\pi(k)$ for $k \le j$ when $i \ge 4Na^{-2}\log(2/\pi(l))$.

Arguably it would be more natural to take $n = \infty$ and consider a random walk on $\mathbb{N}$. However, with $n$ finite, the analysis is much simpler. Fortunately, provided $a = \Omega(R^{-1})$, we can take $n$ large enough that $C$ is very unlikely to reach $n-1$ over the first $N^{21/20}$ steps, while still having $C$ mix in time $O(R^2N\log N)$.

We can also run the Markov chain in continuous time. The inequalities of Section 4.4 can be simplified using $\pi(k+1) = (1-a)\pi(k)$ and
\[ N\frac{d}{dt}\mu_t(k)\Big|_{t=0} = \mathbf{1}_{\{k>0\}}[\mu(k-1)(1-a) - \mu(k)] - \mathbf{1}_{\{k<n-1\}}[\mu(k)(1-a) - \mu(k+1)]. \]
The spectral gap inequality (4.1) becomes
\[ \frac{1}{N}\sum_{k=0}^{n-2}\frac{(\mu(k)(1-a) - \mu(k+1))^2}{\pi(k+1)} \ge (1-\lambda_2)\sum_{k=0}^{n-1}\frac{(\mu(k) - \pi(k))^2}{\pi(k)}. \]
The left and right hand sides of the above inequality are very sensitive to changes in the tail of $\mu$. Obviously it is not generally possible to just ignore these terms. However, we can do exactly that if $\mu(0) = a$. We will state this in terms of the empirical distribution, so take $a = X_0$.

Lemma 4.6. For $j = 1, 2, \ldots$, the empirical distribution satisfies
\[ \sum_{k=0}^{j}\frac{(X_k(1-X_0) - X_{k+1})^2}{X_0(1-X_0)^{k+1}} \ge \frac{X_0^2}{4}\sum_{k=1}^{j+1}\frac{(X_k - X_0(1-X_0)^k)^2}{X_0(1-X_0)^k}. \]

Inequality (4.3), containing the log Sobolev constant, is equivalent to
\[ \frac{1}{N}\sum_{k=0}^{n-2}(\mu(k)(1-a) - \mu(k+1))\log\frac{\mu(k)(1-a)}{\mu(k+1)} \ge 4\nu\sum_{k=0}^{n-1}\mu(k)\log\frac{\mu(k)}{\pi(k)}. \]
We can adapt this to apply to the empirical distribution. As we may have $X_k = 0$, set $X_k' = X_k \vee (e^{-1}N^{-1})$ to avoid dividing by zero. Assume $N \ge n \ge 10$.

Lemma 4.7. If $X_0 > 0$ and $X_k = 0$ for $k \ge n-1$,
\[ \frac{1}{N}\sum_{k=0}^{n-2}(X_k'(1-X_0) - X_{k+1}')\log\frac{X_k'(1-X_0)}{X_{k+1}'} \ge 4\nu\sum_{k=0}^{n-1}X_k\log\frac{X_k}{X_0(1-X_0)^k} - 4\nu\,\frac{n\log N}{N}. \]

For fixed $a$, the log Sobolev constant
\[ \nu \ge \frac{(1 - 2\pi(n-1))(1-\lambda_2)}{\log(1/\pi(n-1) - 1)} = \Omega(N^{-1}n^{-1}) \]
by Corollary A.4 of [19]. It is clear that the lower bound $\nu = \Omega(N^{-1}n^{-1})$ cannot be greatly improved. Consider a probability distribution that places its mass far away from zero, and diffusely. Then the rate of change of the relative entropy is approximately $-a^{-1}N^{-1}\log(1-a)$, due to the downward bias of the random walk, while the entropy can have order $-n\log(1-a)$.

4.8 Concentration results

We will use a concentration result, Theorem 3.15 from [46], strengthened as mentioned in their Section 3.5.

Theorem 4.8. Let $X = (0 = X(0), X(1), \ldots, X(m))$ be a martingale with respect to the filtration $\{\emptyset, \Omega\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_m$. Let $b$ be the maximum step size, $|\Delta X(i)| \le b$. Let $\sigma_i^2$ be the conditional variance of $\Delta X(i)$ given $\mathcal{F}_i$, and let $\hat v$ be an upper bound on $\sum_{i=0}^{m-1}\sigma_i^2$. For any $\lambda \ge 0$,
\[ P\left(\max_{i\le m} X(i) \ge \lambda\right) \le \exp\left(\frac{-\lambda^2}{2\hat v + 2b\lambda/3}\right). \]

The above result can be applied to the following class of processes by constructing a suitable martingale. Let $(W(i) : 0 \le i \le N^{21/20})$ be a process adapted to a filtration $(\mathcal{F}_i)$ with $\Delta W(i) - E(\Delta W(i) \mid \mathcal{F}_i) \le b$. Suppose that with probability $1 - q$, from time $0$ to $N^{21/20}$,

(a) $W(i) \in [0, \alpha]$, and
(b) while $W(i) \in [\alpha/8, \alpha]$, $E(\Delta W(i) \mid \mathcal{F}_i) \le -\gamma$.
Our corollary will be applied to $\overline S$ once we have bounded the expected change $E(\Delta\overline S \mid X = x)$. The quantity $\hat v$ is defined in the proof. It is an upper bound on the sum of the step-wise variances of the constructed martingale $X$.

Corollary 4.9. If
\[ \frac{\alpha}{16} \ge 2b\log N + \sqrt{6\hat v\log N} \tag{4.10} \]
then $P(\forall i \ge 9\alpha/(16\gamma),\ W(i) \le \alpha/2) \ge 1 - q - N^{-1}$.

Proof. Assume $W(0) \ge 3\alpha/8$. Let $M$ be the stopping time triggered by any of the following:

(i) $W(i) \le \alpha/8$,
(ii) $\sum_{j=0}^{i-1}E(\Delta W(j) \mid \mathcal{F}_j) \le 3\alpha/16 - W(0)$,
(iii) $W(i) - W(0) - \sum_{j=0}^{i-1}E(\Delta W(j) \mid \mathcal{F}_j) \ge \alpha/16$,
(iv) $E(\Delta W(i) \mid \mathcal{F}_i) > -\gamma$, or
(v) $W(i) \ge \alpha$.

$M$ is less than $m := \lceil 13\alpha/(16\gamma)\rceil$. Let $W^M$ denote $W$ stopped at $M$; define a martingale $X$ by $X(0) = 0$ and $\Delta X(i) = \Delta W^M(i) - E(\Delta W^M(i) \mid \mathcal{F}_i)$. Let $\hat v$ be an upper bound on $\sum_{i=0}^{m-1}\operatorname{var}(\Delta X(i) \mid \mathcal{F}_i)$. Let $c \in (0,1)$ and suppose
\[ P\left(\sup_i X(i) \ge \alpha/16\right) \le c. \tag{4.11} \]
Then with probability $1 - q - c$, $W(i)$ first leaves the interval $[\alpha/4, \alpha]$ before time $13\alpha/(16\gamma)$, having decreased to less than $\alpha/4$. In particular, $W(i) \le \alpha/2$ after $9\alpha/(16\gamma)$ steps. Let $c = N^{-3}$, so (4.10) implies (4.11) by Theorem 4.8. If the process stops because of (i) or (ii), or if initially $W \le 3\alpha/16$, wait for $W \ge 3\alpha/8$. Then start the construction of $X$ over again. The number of times that the process is restarted is less than $N^{21/20}$. Therefore, the probability that $W \ge \alpha/2$ at any time between $9\alpha/(16\gamma)$ and $N^{21/20}$ is less than $q + N^{21/20}c$.

We have two other uses for Theorem 4.8. If $X \sim \operatorname{Bin}(n,p)$,
\[ P(|X - np| \ge \lambda) \le 2\exp\left(\frac{-\lambda^2}{2np + 2\lambda/3}\right). \]
Firstly, if $E(X) \ge 8\log N$ then $P(X \ge 2E(X)) \le 2N^{-3}$. We use this to bound the tail of the empirical distribution. Secondly, if $X$ is a Poisson random variable with parameter $\mu \le N^{1/4}$,
\[ P(|X - \mu| \ge 3N^{1/8}\sqrt{\log N}) \le 2N^{-3}. \]
We use this to show that the coupling defined in the proof of Theorem 4.12 has the properties claimed.

Over the next three sections we give the main results.

4.9 Coupling results

We have described, and stated mixing results for, a biased random walk. The random walk was motivated by the evolution of a single box. We will now give coupling results that make the relationship with the mean-field ZRP more precise.

We first show that if $X_0$, the fraction of empty boxes, is bounded below, then we can couple the individual boxes to independently evolving copies of the biased random walk. Choose $n = n(a)$ maximal such that $\pi(n-1) \ge 2N^{-10}$. This is an assumed upper bound on the number of balls any box will contain. Let $A_{a,s}$ be the event

(i) from time $s$ until time $N^{21/20}$, $X_0 \ge a + 6N^{-1/8}\sqrt{\log N}$, and
(ii) at time $s$, $X_k = 0$ for all $k \ge n/2$.

Starting from time $s$, define a process $C = (C_i)_{i=1}^N$. Let $C_i(s) = B_i(s)$ for $i = 1, \ldots, N$. Then let the $C_i$ evolve independently, each according to the continuous-time biased random walk of Section 4.7 on $\{0, 1, \ldots, n-1\}$ with bias $a$. We then want to find a coupling of $B$ and $C$ such that the following event has high probability: for all integers $t$ from $s$ to $N^{21/20}$, there exists $t' \in [t-N, t+N]$ such that

(i) $B_i(t) \le C_i(t') + 2$ for all $i$, and
(ii) $C_i(t') < n-2$ for all $i$.

Call this event $A'_{a,s}$.

Theorem 4.12. There exists a coupling $(B, C)$ with $P(A'_{a,s}) \ge P(A_{a,s}) - O(N^{-2/5})$.

Recall that by Lemma 4.5(ii), the $C_i$ mix to within $N^{-10}$ in total variation in time $\tau_1(N^{-10})$. For convenience write $\tau(a) = \tau_1(N^{-10}) + N = O(a^{-2}N\log N)$. After discrete time $s + \tau(a)$, we can assume the $B_i$ are bounded by $C_i$ close to equilibrium.
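Before the coupling arguments, it may help to see the quantities involved in miniature. The Python sketch below (an illustration only) simulates the mean-field ZRP under the step rule suggested by the uniform $N^{-2}$ transition probabilities of Section 4.6: pick a source box and a destination box uniformly at random, and move one ball whenever the source is nonempty and distinct from the destination. It prints the fraction of empty boxes $X_0$ and the entropy gap $\overline S(X)$; the sizes used are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 500, 3                          # illustrative sizes only
B = np.full(N, R)                      # start with R balls in every box

S_max = (R + 1) * np.log(R + 1) - R * np.log(R)

def entropy_gap(B):
    x = np.bincount(B) / N             # empirical distribution X
    S = -np.sum(x[x > 0] * np.log(x[x > 0]))
    return S_max - S

for i in range(200 * N):
    src, dst = rng.integers(N), rng.integers(N)
    if src != dst and B[src] > 0:      # move one ball if the source is nonempty
        B[src] -= 1
        B[dst] += 1
    if i % (40 * N) == 0:
        print(f"step {i:6d}  X0={np.mean(B == 0):.3f}  gap={entropy_gap(B):.4f}")
```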
Let $C_{a,s}$ be the event that from time $s + \tau(a)$ to $N^{21/20}$,

(i) $\sum_{k\ge j} X_k \le 8(1-a)^j$ for all $j$ with $(1-a)^j \ge 2N^{-1}\log N$, and
(ii) $\sum_{k\ge j} X_k = 0$ for $j$ such that $(1-a)^j \le N^{-5}$.

By the concentration calculations at the end of Section 4.8, $P(A'_{a,s} \setminus C_{a,s}) = O(N^{-2/5})$, so $P(C_{a,s}) \ge P(A_{a,s}) - O(N^{-2/5})$.

Before we can make use of Theorem 4.12, we need to know that $A_{a,s}$ occurs with high probability for some $a, s$. This follows from another coupling argument, this time involving an unbiased random walk on $\{0, 1, \ldots, 2R-1\}$.

Theorem 4.13. Let $a = (5R)^{-1}$ and $s = 10NR^2\log(R+1)$. Then $P(A_{a,s}) \ge 1 - O(N^{-2/5})$.

With $a \ge (5R)^{-1}$, the mixing time $\tau_1(N^{-10}) \le 2000R^2N\log N$.

Under the event $C_{a,s}$ we can perform some calculations that we will need later. So far we only have a lower bound on $X_0$. We can calculate an upper bound on $X_0$ using our upper bound on the tail of the empirical distribution and the identities $\sum X_k = 1$, $\sum kX_k = R$. We will also need a bound on the contribution of the tail of the distribution, $\sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k)$, to the upper bound $\sum_k\delta(X_0(1-X_0)^k, X_k) \ge \overline S(X)$.

Lemma 4.14. Let $a \in [(5R)^{-1}, (R+1)^{-1}]$. Under $C_{a,s}$, for integer $t$ from $s + \tau(a)$ to $N^{21/20}$:

(i) The fraction of empty boxes $X_0 \le 0.99$, if $N$ is sufficiently large.

(ii) For all $j$,
\[ \sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k) \le 8(1-a)^j\left[\log 40R + (j + 5R)\log\frac{1-a}{1-X_0}\right] + \frac{80\log^2 N}{N}. \]

Proof of Theorem 4.12. Define an alphabet of $2N$ letters, $U_i$ and $D_i$ for $i = 1, \ldots, N$. These should be understood to mean `move coordinate $i$ up' and `move coordinate $i$ down', respectively. We will use a Poisson process to produce a list of letters. Let $u = N^{1/4}$; do the following for time $u$, independently for $i = 1, \ldots, N$: write $U_i$ at rate $(1-a)/N$ and write $D_i$ at rate $1/N$.

Use this list of letters to run $C$ for time $u$. Read through the list, taking the appropriate actions. For each $U_i$ increment $C_i$ by one; for each $D_i$ decrement $C_i$ by one if $C_i > 0$.

We can use the same list to run $B$; each `$D_i$' corresponds to one step of process $B$. Go through the `$D_i$' symbols in the list in order:

(i) If $B_i = 0$, do nothing.
(ii) If $B_i > 0$, pick the first $U_j$ in the list that you have not already selected. If $i \ne j$, decrement $B_i$ and increment $B_j$.

Discard any remaining `$U_i$' letters. If we can follow this procedure, and if each coordinate $i$ has appeared at most once (as either $D_i$ or $U_i$), then the property $B_i \le C_i$ is preserved. However, the value of $B_i$ during the processing of the list may be one greater than the final value of $C_i$.

We require that there are enough `$U_j$' letters in the list for the letters `$D_i$' that correspond to $B_i > 0$. The number of `$U_i$' is a Poisson random variable with mean $u(1-a) \le N^{1/4}$; by concentration, with probability $1 - 2N^{-3}$ there are at least $u(1-a) - 3N^{1/8}\sqrt{\log N}$ in the list. Now condition on the `$U_j$' in the list. The `$D_i$' form a Poisson process with combined rate $1$. We can run the process $B$ as the $D_i$ arrive. The total rate at which `$D_i$' with $B_i > 0$ arrive is $1 - X_0$. Provided the event $A_{a,s}$ holds, $X_0 \ge a + 6N^{-1/8}\sqrt{\log N}$; the rate at which `$D_i$' with $B_i > 0$ arrive is bounded above by $1 - a - 6N^{-1/8}\sqrt{\log N}$. By another application of the concentration result, the number of $D_i$ corresponding to $B_i > 0$ is less than $u(1-a) - 3N^{1/8}\sqrt{\log N}$. Therefore with probability $P(A_{a,s}) - 4N^{-3}$ we can run the two processes coupled in this way for a `block' of time.
We have run process $C$ for time $u$, and process $B$ for as many steps as there are $D_i$ in the list: with probability $1 - 2N^{-3}$, between $u \pm 3N^{1/8}\sqrt{\log N}$ steps. Repeat the above block procedure $b := 2\lfloor N^{4/5}\rfloor$ times; with probability $P(A_{a,s}) - 6bN^{-3}$ we have run process $B$ for at least $N^{21/20}$ steps. When the construction fails, we can complete the coupling in an arbitrary manner.

The motivation for the choice of $b$, the number of blocks, and $u$, the size of each block, was to ensure that with high probability:

(i) $B$ is run for at least order $R^3N\log N$ steps,
(ii) $(\forall i)$ (coordinate $i$ appears at most twice per block), and
(iii) $(\forall i)$ (coordinate $i$ appears twice in no more than one block).

We can take $P(A_{a,s}) - O(N^{-2/5})$ as a lower bound on the probability of this event. Let integer $t$ in $[s, N^{21/20}]$ be a discrete time for process $B$, and let $t'$ be the time of the end of the corresponding block for process $C$. Then, if the above event holds, $B_i(t) \le C_i(t') + 2$ for all $i$.

At time $s$, $B_i \le C_i + k_i$ with $k_i = 0$. If coordinate $i$ does not appear twice in a block, and $B_i \le C_i + k_i$ at the start of the block, then $B_i$ may be equal to $C_i + k_i + 1$ during the block, but $B_i \le C_i + k_i$ at the end of the block. If coordinate $i$ does appear twice in a block, from then on it may be necessary to take $k_i = 1$. Suppose $C_i$ starts a block at $0$, is acted on by a $U_i$ and then by a $D_i$; $C_i = 0$ at the end of the block. Also starting from $0$, $B_i$ may be acted on by $D_i$, $U_i$, ending up at $1$.

Proof of Lemma 4.14. The empirical distribution under $C_{a,s}$ satisfies
\[ \sum_{k\ge j}X_k \le 8(1-a)^j \text{ if } (1-a)^j \ge 2N^{-1}\log N, \quad\text{and}\quad \sum_{k\ge j}X_k = 0 \text{ if } (1-a)^j \le N^{-5}. \]

(i) The upper bounds on $\sum_{k\ge j}X_k$ imply upper bounds on $\sum_{k\ge j}kX_k$. We choose $j$ sufficiently large, so that $\sum_{k\ge j}kX_k < R$, and sufficiently small, so that the upper bound $8(1-a)^j \ge \sum_{k\ge j}X_k$ is greater than $0.01$. Then $X_0 \le 1 - 8(1-a)^j$, or we get a contradiction, $\sum kX_k < R$. We can assume $a = (5R)^{-1}$. Take $j = 29R$, so $8(1-a)^j \ge 0.01$. Also,
\[ \sum_{k\ge 29R}kX_k \le \sum_{k\ge 29R}8ak(1-a)^k + 16N^{-1}\log N \times \log_{1-a}N^{-5} \le 0.9R + O(RN^{-1}\log^2 N) < R \]
if $N$ is sufficiently large.

(ii) Notice that $\delta(x,y)$ is linear in the sense that $\delta(x,y) = x\delta(1, y/x)$. Check that
\[ \delta(x,y) \le \begin{cases} x & y < x, \\ y\log(y/x) & y \ge x. \end{cases} \]
Therefore
\begin{align*}
\sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k) &\le \sum_{k\ge j}8a(1-a)^k\left[\log\frac{8}{X_0} + k\log\frac{1-a}{1-X_0}\right] + \delta(N^{-5}, 16N^{-1}\log N) \\
&\le 8(1-a)^j\left[\log\frac{8}{X_0} + \left(\frac{1-a}{a} + j\right)\log\frac{1-a}{1-X_0}\right] + \frac{80\log^2 N}{N}.
\end{align*}

Proof of Theorem 4.13. Apply Markov's inequality to the empirical distribution $X$. The mean is $R$, so at most half the boxes have $2R$ or more balls: $\sum_{k<2R}X_k \ge 1/2$. Think of the boxes as being indistinguishable, but with labels $1, \ldots, N$ attached to indicate how they correspond to $B_1, \ldots, B_N$. Let $l = \lceil N/2\rceil$. Initially $B_i < 2R$ for $i = 1, \ldots, l$. Run the Markov chain step by step. If at the completion of a step $B_i = 2R$ for some $i \le l$,

(i) pick $j > l$ with $B_j < 2R$,
(ii) swap the labels on boxes $i$ and $j$, so that $B_i < 2R$.

Following this procedure, the property $B_i < 2R$ for $i = 1, \ldots, l$ is preserved. With each step and any subsequent relabelling:

(a) $B_1$ increases by one with probability less than $(N-1)/N^2$.
(b) $B_1$ decreases by at least one with probability at least $(N-1)/N^2$, unless $B_1 = 0$. Box one may be picked as the source box, or it may get too full and be subject to a relabelling.

We introduced notation for a Markov chain on the set $\{0, 1, \ldots, n-1\}$ in Section 4.7. Set $n = 2R$ and $a = 0$, so the walk is unbiased. The spectral gap (see the proof of Lemma 4.5) is
\[ 1 - \lambda_2 = \frac{2}{N}\left(1 - \cos\frac{\pi}{2R}\right) \ge \frac{1}{R^2N}. \]
The stationary distribution $\pi$ is uniform on $\{0, 1, \ldots, 2R-1\}$. The time needed to mix to within $(20R)^{-1}$ of $\pi$ in total variation is
\[ \log\left((20R)^{-1}(2R)^{-1}\right)/\log\lambda_2 \le NR^2\log(40R^2) \le 10NR^2\log(R+1). \]
Let $C_1, \ldots, C_l$ be $l$ copies of the continuous-time Markov chain on state space $\{0, 1, \ldots, 2R-1\}$. Let $C_i = B_i = R$ for $i = 1, \ldots, l$ at time $0$, and let the $C_i$ evolve independently. Once the $C_i$ have mixed sufficiently, $|\{i : C_i = 0\}|$ is stochastically greater than a binomial $\operatorname{Bin}(l, (2R)^{-1} - (20R)^{-1})$ distribution. With high probability $|\{i : C_i = 0\}| \ge 9N/(40R) - 2N^{1/2}\sqrt{\log N}$.

Coupling $B$ and $C$ is much easier here than in the proof of Theorem 4.12. The result follows if we simply show that $B_i \le C_i$ for `almost every' $i$. Write letters $U_i$ and $D_i$ at rate $N^{-1}$, independently for $i = 1, \ldots, N$, for a block of time $u = N^{1/4}$. Use the $U_i$ and $D_i$, where $i \le l$, to run $C$ for time $u$. We will also use the letters to run $B$. The number of `$U_i$' in the block can be assumed to be within $N^{1/4} \pm 3N^{1/8}\sqrt{\log N}$; similarly for the `$D_i$'. Run $B$ for $N^{1/4} - 3N^{1/8}\sqrt{\log N}$ steps. Pick pairs $(D_i, U_j)$ from the list of letters, without replacement. For each pair, move a ball from box $i$ to box $j$ if possible. Repeat $b := 2\lfloor N^{4/5}\rfloor$ times.

Under the concentration bound, the total number of letters left unused is of order $O(N^{37/40}\sqrt{\log N})$. The probability that coordinate $i$ appears twice in at least one of the $b$ blocks is $O(N^{-7/10})$. It easily follows by concentration that, with high probability,
\[ |\{i : B_i = 0\}| \ge |\{i : C_i = 0\}| + 8N^{7/8}\sqrt{\log N} - N/(40R). \]
To bound the number of balls per box at $s = 10NR^2\log(R+1)$, consider only the number of balls box one receives up to time $s$. This is bounded by a binomial $\operatorname{Bin}(s, 1/N)$ distribution. Again we use concentration,
\[ B_1(s) \le R + 2\log N + \sqrt{60R^2\log(R+1)\log N} + 10R^2\log(R+1) \]
with probability $1 - N^{-3}$.

4.10 Slow initial decrease in $\overline S$

By Section 4.9 we may assume $C_{a,s}$ with $a = (5R)^{-1}$ and $s + \tau(a) = O(R^2N\log N)$. Let $\phi = 10^{-4}(R+1)^{-2}$. In this section we use the log Sobolev result from Lemma 4.7 to show that $\overline S$ decreases to $\phi$. This method could be pushed further, to show that $\overline S$ decreases to $O(R^3N^{-1}\log^2 N)$. However, the bound on the time needed would be $\Omega(R^2N\log^2 N)$. In the next section we show that once $\overline S \le \phi$, $\overline S$ decreases more quickly.

Theorem 4.15. With probability $1 - O(N^{-1/3})$, between $O(R^2N\log(R+1)\log N)$ and $N^{21/20}$ steps, $\overline S \le \phi$.

We need the following technical lemma to deal with the discrete nature of the process. Let $x_k' = x_k \vee (e^{-1}N^{-1})$.

Lemma 4.16. Assume $x_k = 0$ for $k \ge n' \ge 5\log N$. If $x_0 \le 1 - 10^{-9}$,
\[ E(\Delta S \mid X = x) \ge \frac{1}{2N}\sum_{k=0}^{n'-1}(x_k'(1-x_0) - x_{k+1}')\log\frac{x_k'(1-x_0)}{x_{k+1}'} - \frac{10n'}{N^2}. \]

Proof of Theorem 4.15. The inequality in Lemma 4.7 contains $\nu$, the log Sobolev constant for the biased random walk. Recall Corollary A.4 of [19], the general bound on log Sobolev constants in terms of the spectral gap and the minimum of the equilibrium distribution,
\[ \nu \ge \frac{(1 - 2\pi(n-1))(1-\lambda_2)}{\log(1/\pi(n-1) - 1)}. \]
By Lemma 4.5(ii) with $a = (5R)^{-1}$, the spectral gap $1 - \lambda_2 \ge (100R^2N)^{-1}$. With $n = n(a)$, the maximum $n$ such that $\pi(n-1) \ge 2N^{-10}$,
\[ \nu \ge (1000R^2N\log N)^{-1}. \]
By Lemma 4.14(i) we may apply Lemma 4.16 with $n' = n(a)/2$. Combining the inequalities,
\[ E(\Delta\overline S \mid X = x) \le -2\nu\overline S + \frac{250R\log N}{N^2}. \tag{4.17} \]
We can now apply Corollary 4.9 to show that $\overline S$ decreases as inequality (4.17) suggests.
Corollary 4.9 only gives a condition under which the upper bound on a process $(W(i) : 0 \le i \le N^{21/20})$ can be halved, so we need to apply it a number of times. Suppose inductively that with probability $1 - q_l$, from time $t_l$ to $N^{21/20}$, $\overline S \le \alpha = S(G_R)\cdot 2^{-l}$. Let $W(i) = \overline S(X((i + t_l) \wedge \lfloor N^{21/20}\rfloor))$. We need a lower bound $\gamma$ on the expected decrease of $W$ when $W \in [\alpha/8, \alpha]$. This is obtained by substituting $\overline S = \alpha/8$ into inequality (4.17); we can take $\gamma = \alpha(8000R^2N\log N)^{-1}$. The maximum time before $W$ hits $\alpha/4$, or the concentration bound fails, is $m = 13\alpha/(16\gamma) = 6500R^2N\log N$. The step size $\Delta W$ is bounded by $\pm 2N^{-1}(1 + \log N)$, so take $b = 4N^{-1}(1 + \log N)$ and $\hat v = mb^2/4$. Inequality (4.10) is satisfied if $\alpha \ge 10^5RN^{-1/2}\log^2 N$.

Take $1 - q_0 = P(C_{a,s})$, with $a = (5R)^{-1}$ and $s = 10NR^2\log(R+1)$. Let $t_0 = s + \tau(a)$. Then we can take $q_{l+1} = q_l + N^{-1}$ and $t_{l+1} = t_l + m$. If $N$ is sufficiently large, $\phi \ge 10^5RN^{-1/2}\log^2 N$, so the result follows.

We delay the proof of Lemma 4.16.

4.11 Rapid decrease in $\overline S$

In this section we use the spectral gap related inequality, Lemma 4.6, to prove that $\overline S$ decreases rapidly.

Theorem 4.18. With probability $1 - O(N^{-1/3})$, from $O(R^3N\log N)$ steps until $N^{21/20}$, $\overline S(X) = O(N^{-1}\log^2 N)$.

By Section 4.10 we may assume that $\overline S \le \phi = 10^{-4}(R+1)^{-2}$ from steps $O(R^2N\log(R+1)\log N)$ to $N^{21/20}$. When the entropy is close to the maximum entropy, $X_0$ must be close to $(R+1)^{-1}$. In Lemma 4.19 we bound the difference. This allows us to use Theorem 4.12 from Section 4.9 again, this time with an improved lower bound $a$. We also need two bounds related to $\Delta S$.

Lemma 4.19. If $\overline S(x) \le (R+1)/(3R^2)$ then
\[ \left(x_0 - \frac{1}{R+1}\right)^2 \le \frac{3R^2\,\overline S}{(R+1)^3}. \]

Lemma 4.20. Assume $x_k = 0$ for $k \ge n' \ge 5\log N$. Then
\[ E(\Delta S \mid X = x) \ge \frac{1}{N}\sum_{k=0}^{n'-1}\frac{(x_k(1-x_0) - x_{k+1})^2}{\max\{x_k(1-x_0),\, x_{k+1}\}} - \frac{10n'}{N^2}. \]

Lemma 4.21. Suppose $x_0 \le 2/3$ and $x_k = 0$ for $k \ge n'$. Then the variance of $\Delta S$ satisfies
\[ \operatorname{var}(\Delta S \mid X = x) \le \frac{12\log N}{N}\,E(\Delta S \mid X = x) + \frac{120n'\log N + 200n'}{N^3}. \]

Proof of Theorem 4.18. The proof is similar to the proof of Theorem 4.15. We start off with the assumption that $\overline S \le \phi$ between $O(R^2N\log(R+1)\log N)$ and $N^{21/20}$ steps. This allows us to bound $E(\Delta\overline S \mid X = x)$ when $\overline S$ is in a suitable range. We then use Corollary 4.9 inductively. However, there are some added complications. One problem is that the bound on $E(\Delta\overline S \mid X = x)$ will not apply `instantly'. Once $\overline S \le \phi 2^{-l}$, $l = 0, 1, \ldots$, we will have to wait before we can assume an improved bound on $E(\Delta\overline S \mid X = x)$. Secondly, we must use Lemma 4.21 to bound the variance of $\Delta S$.

We split the proof of Theorem 4.18 into two parts. In the first part we will show that $\overline S \le \psi = (R+1)/(3\log^2_{(R+1)/R}N)$. In the second, that $\overline S \le 10^5N^{-1}\log^2 N$. We collect and `daisy-chain' (i) Lemma 4.20, (ii) an elementary observation, (iii) Lemma 4.6, (iv) the bound $\delta(x,y) \le (x-y)^2/x$ for the function $\delta$, and (v) Lemma 4.4:

(i) $\displaystyle E(\Delta S \mid X = x) \ge N^{-1}\sum_{k=0}^{j-2}\frac{(x_k(1-x_0) - x_{k+1})^2}{\max\{x_k(1-x_0),\, x_{k+1}\}} - 100(R+1)N^{-2}\log N$,

(ii) $\displaystyle \sum_{k=0}^{j-2}\frac{(x_k(1-x_0) - x_{k+1})^2}{\max\{x_k(1-x_0),\, x_{k+1}\}} \ge \inf_{k\le j-1}\frac{x_0(1-x_0)^k}{x_k}\sum_{k=0}^{j-2}\frac{(x_k(1-x_0) - x_{k+1})^2}{x_0(1-x_0)^{k+1}}$,

(iii) $\displaystyle \sum_{k=0}^{j-2}\frac{(x_k(1-x_0) - x_{k+1})^2}{x_0(1-x_0)^{k+1}} \ge \frac{x_0^2}{4}\sum_{k=1}^{j-1}\frac{(x_k - x_0(1-x_0)^k)^2}{x_0(1-x_0)^k}$,

(iv) $\displaystyle \sum_{k=1}^{j-1}\frac{(x_k - x_0(1-x_0)^k)^2}{x_0(1-x_0)^k} \ge \sum_{k=1}^{j-1}\delta(x_0(1-x_0)^k, x_k)$,

(v) $\displaystyle \sum_{k=1}^{\infty}\delta(x_0(1-x_0)^k, x_k) \ge \overline S(x)$.

Therefore,
\[ E(\Delta\overline S \mid X = x) \le -\frac{x_0^3}{4N}\,\inf_{k\le j-1}\frac{(1-x_0)^k}{x_k}\left(\overline S - \sum_{k\ge j}\delta(x_0(1-x_0)^k, x_k)\right) + \frac{100(R+1)\log N}{N^2}. \]
The result now hinges on controlling, for some $j$,
\[ \text{(a)}\ \sum_{k\ge j}\delta(x_0(1-x_0)^k, x_k) \qquad\text{and}\qquad \text{(b)}\ \inf_{k\le j}\frac{(1-x_0)^k}{x_k}. \]
We need $j$ sufficiently large to make quantity (a) small, but not too large, or quantity (b) becomes too small.

Part One: For the first induction argument, take $\alpha = \phi 2^{-l}$, $l = 0, 1, \ldots, \lfloor\log_2 \phi/\psi\rfloor$. Assume inductively $\overline S \le \alpha \in [\psi, \phi]$. By Lemma 4.19,
\[ \left|X_0 - \frac{1}{R+1}\right| \le \frac{R}{R+1}\sqrt{\frac{3\alpha}{R+1}}. \]
Then, provided $N$ is sufficiently large, we find that $X_0$ and $a$ satisfy the requirements of Theorem 4.12 if we take
\[ a = a(\alpha) = \frac{1}{R+1} - \frac{R}{R+1}\,\varepsilon(\alpha), \qquad \varepsilon(\alpha) = 2\sqrt{\frac{3\alpha}{R+1}}. \tag{4.22} \]
Once $\overline S \le \psi$, we will not need to keep improving our bound $a \le X_0$ in this way. For each value $a = a(\alpha)$, we apply Theorem 4.12, producing a corresponding set of coupled biased random walks $C^a = (C_i^a)_{i=1}^N$. Once we have $\overline S \le \alpha$ with high probability, we must wait for the $C_i^a$ to mix, though not necessarily for as long as $\tau(a)$ steps. Choose $j \in \mathbb{N}$,
\[ j = j(\alpha) = \lfloor\varepsilon(\alpha)^{-1}\rfloor, \qquad \alpha \in [\psi, \phi]. \]

(a) When $\alpha = \phi$, wait $\tau(a)$ steps; by Lemma 4.5(ii) the $C_i^a$ mix to within $N^{-10}$ in total variation. Then by concentration we can assume $X_k \le 8(1-a(\phi))^k$ for $k$ such that $(1-a(\phi))^k \ge 2N^{-1}\log N$.

(b) When $\alpha = \phi 2^{-l}$, $l = 1, \ldots, \lfloor\log_2\phi/\psi\rfloor$, wait $O(RNj(\alpha))$ steps so that we can bound $X_k$ for $k \le j(\alpha)$ by Lemma 4.5(iii). Then $X_k \le 16(1-a(\alpha))^k$ for $k \le j(\alpha)$.

Note that if $k \le j(\alpha)$, by (4.22),
\[ \left(\frac{R}{R+1}\right)^k \le (1-a(\alpha))^k \le e\left(\frac{R}{R+1}\right)^k. \]
By (a) and Lemma 4.14, if $\alpha \in [\psi, \phi]$ and $\overline S \in [\alpha/8, \alpha]$,
\[ \sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k) \le 200j(1-a(\phi))^j \le \alpha/16. \]
By (b),
\[ \inf_{k\le j}\frac{(1-X_0)^k}{X_k} \ge \inf_{k\le j}\frac{e^{-1}(R/(R+1))^k}{16e(R/(R+1))^k} \ge \frac{1}{16e^2}. \]
Therefore, when $\overline S \in [\alpha/8, \alpha]$, $\alpha \in [\psi, \phi]$, and the $C_i^a$ have mixed,
\[ E(\Delta\overline S \mid X = x) \le -\frac{\alpha}{10^4(R+1)^3N}. \]

Part Two: From now on fix
\[ a = \frac{1}{R+1} - \frac{R}{R+1}\,\varepsilon, \qquad \varepsilon = 2\sqrt{\frac{3\psi}{R+1}}. \]
We apply Theorem 4.12 only once more; wait $\tau(a)$ for the $C_i^a$ to mix. For $k$ such that $(1-a)^k \ge 2N^{-1}\log N$, $X_k \le 8(1-a)^k$. Let $j = \lfloor\log_{1-a}(2N^{-1}\log N)\rfloor$. If $k \le j \le 2\varepsilon^{-1}$,
\[ \left(\frac{R}{R+1}\right)^k \le (1-a)^k \le e^2\left(\frac{R}{R+1}\right)^k. \]
Therefore
\[ \inf_{k\le j}\frac{(1-X_0)^k}{X_k} \ge \frac{1}{8e^4}. \]
By Lemma 4.14, with $N$ large,
\[ \sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k) \le 160N^{-1}\log^2 N. \]
For the second induction argument take $\alpha = \psi 2^{-l}$, $l = 0, 1, \ldots$. When $\overline S \in [\alpha/8, \alpha]$ and $\alpha \in [10^5N^{-1}\log^2 N, \psi]$,
\[ E(\Delta\overline S \mid X = x) \le -\frac{\alpha}{3\cdot 10^4(R+1)^3N}. \]
Using the bound on variance from Lemma 4.21,
\[ \hat v \le \frac{12\log N}{N}\,\alpha + \frac{10^8(R+1)^4\log N}{N^2}. \]
Inequality (4.10) holds for $\alpha \ge 10^5N^{-1}\log^2 N$ if $N$ is sufficiently large.

Proof of Lemma 4.19. We prove this by maximizing $S(x)$ with $x_0$ fixed. We can assume that $x_1, x_2, x_3, \ldots$ form a geometric series, $a, ar, ar^2, \ldots$. Then
\[ x_0 + \sum ar^k = 1, \qquad \sum(k+1)ar^k = R. \]
Solve to get
\[ a = \frac{(1-x_0)^2}{R}, \qquad r = \frac{R - 1 + x_0}{R}. \]
Therefore the maximum value for $S$ is
\[ -x_0\log x_0 - (1-x_0)\log\frac{(1-x_0)^2}{R} - (R-1+x_0)\log\frac{R-1+x_0}{R}. \]
Check that for $0 \le y \le 2x$, $(x-y)^2/(3x) \le \delta(x,y)$. If $x_0 \le 2(R+1)^{-1}$,
\begin{align*}
\overline S &\ge S(G_R) - S(x_0, a, ar, \ldots) \\
&= \delta\left(\frac{1}{R+1}, x_0\right) + 2\sqrt R\,\delta\left(\frac{\sqrt R}{R+1}, \frac{1-x_0}{\sqrt R}\right) + R\,\delta\left(\frac{R}{R+1}, \frac{R-1+x_0}{R}\right) \\
&\ge \frac{(R+1)^3}{3R^2}\left(x_0 - \frac{1}{R+1}\right)^2.
\end{align*}
This lower bound on $\overline S$ is a convex function of $x_0$.

4.12 Final observations

We have shown that the event $\overline S = O(N^{-1}\log^2 N)$, from time $O(R^3N\log N)$ to $N^{21/20}$, has high probability. In this section we modify our argument to show that $\overline S$ decreases to $O(R^4N^{-1}\log N)$. Rough calculations suggest that under the equilibrium measure, the expected value of $\overline S$ is $\Omega(RN^{-1}\log N)$; our result may in some sense be tight.
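To see why a figure of order $RN^{-1}\log N$ at equilibrium is plausible, one can sample the equilibrium measure directly: by Section 4.6 it is uniform over ball configurations, and uniform configurations can be generated by the classical stars-and-bars construction. The following Python sketch (an illustration only; the sample sizes and the comparison quantity $R\log N/N$ are assumptions) estimates the mean entropy gap at equilibrium.

```python
import numpy as np

rng = np.random.default_rng(1)
N, R, samples = 1000, 3, 200           # illustrative sizes only
S_max = (R + 1) * np.log(R + 1) - R * np.log(R)

def uniform_configuration():
    # Stars and bars: N - 1 bar positions among R*N + N - 1 slots give a
    # uniformly random placement of R*N balls into N boxes.
    bars = np.sort(rng.choice(R * N + N - 1, size=N - 1, replace=False))
    cuts = np.concatenate(([-1], bars, [R * N + N - 1]))
    return np.diff(cuts) - 1           # occupancies, summing to R*N

gaps = []
for _ in range(samples):
    x = np.bincount(uniform_configuration()) / N
    S = -np.sum(x[x > 0] * np.log(x[x > 0]))
    gaps.append(S_max - S)
print(f"mean gap {np.mean(gaps):.2e} vs R log(N)/N = {R * np.log(N) / N:.2e}")
```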
As in part two of the proof of Theorem 4.18, take
\[ a = \frac{1}{R+1} - \sqrt{\frac{12R^2\psi}{(R+1)^3}}, \qquad j = \lfloor\log_{1-a}(2N^{-1}\log N)\rfloor. \]
We can then assume that
\[ E(\Delta\overline S \mid X = x) \le -\frac{\overline S - \sum_{k\ge j}\delta(x_0(1-x_0)^k, x_k)}{2000(R+1)^3N} + \frac{100(R+1)\log N}{N^2}. \]
We sharpen Lemma 4.14. With $s = O(R^3N\log N)$, let $E$ be the subset of $C_{a,s}$ where $\overline S = O(N^{-1}\log^2 N)$ from $s$ to $N^{21/20}$. Let $E'$ be the intersection of the events $E$ and
\[ \left\{\forall i \in [s, N^{21/20}],\ \sum_{k\ge j}\delta(X_0(1-X_0)^k, X_k) \le \frac{1000(R+1)^2\log N}{N}\right\}. \]

Lemma 4.23. $P(E') \ge 1 - O(N^{-1/3})$.

We will now consider a modification of the entropy gap process $\overline S$. Let $\mathcal{F}_i$ be the $\sigma$-algebra generated by the mean-field ZRP up to step $i$. Let $W$ be a random process defined from $O(R^3N\log N)$ to $N^{21/20}$,
\[ W(i) = \begin{cases} \overline S(X(i)) & \text{if } P(E' \mid \mathcal{F}_i) > 0, \\ 0 & \text{otherwise}. \end{cases} \]
Let $n = \lfloor\log\log N\rfloor$. Let $C$ be the discrete-time biased random walk on $\Omega = \{0, 1, \ldots, n-1\}$ that decreases by one with probability $2/3$ if $C > 0$ and increases by one with probability $1/3$ if $C < n-1$. $C$ is a faster version of the biased random walk from Section 4.7, with bias $1/2$. The total variation mixing time for $C$ is $O(n)$ steps.

By the definition of $E'$, $W = O(N^{-1}\log^2 N)$. Say that $\log_2(W/\alpha)$ is dominated by $C$ with maximum step time $m$ if it has the following property. Whenever $2^kW$ is within a small interval of $\alpha/2$, with positive integer $k$, then with probability $2/3$, $2^kW$ decreases to below $\alpha/4$ within $m$ steps, without having reached $\alpha$.

Theorem 4.24. $\log_2(W/\alpha)$ is dominated by $C$ with maximum step time $O(R^3N)$.

Proof. We use the construction from Section 4.8. In Corollary 4.9 the construction is applied repeatedly with $c$, the probability of `error', very small each time. We now apply the construction one application at a time, and with relaxed error probability; let $c = 1/3$. Then for inequality (4.11), we need a weaker condition than (4.10), only that
\[ \frac{\alpha}{16} \ge \frac{2}{3}b\log 3 + \sqrt{2\hat v\log 3}. \tag{4.25} \]
Then, if $W$ is in a small neighbourhood of $\alpha/2$, with probability $2/3$, $W$ enters $[0, \alpha/4]$ without increasing beyond $\alpha$. If $\alpha \ge 10^7(R+1)^4N^{-1}\log N$ and $W \in [\alpha/8, \alpha]$, then
\[ E(\Delta W \mid X = x) \le -\frac{\alpha}{2\cdot 10^4(R+1)^3N} \qquad\text{and}\qquad \hat v \le \frac{12\log N}{N}\,\alpha + \frac{10^8(R+1)^4\log N}{N^2}. \]
Inequality (4.25) holds for $N$ sufficiently large.

Proof of Lemma 4.23. As $j \ge \frac12\log_{1-a}N^{-1}$, under $C_{a,s}$,
\[ \sum_{k\ge 10j}\delta(X_0(1-X_0)^k, X_k) = \sum_{k\ge 10j}\delta(X_0(1-X_0)^k, 0) = O(N^{-5}). \]
Similarly, we must bound the sum from $j$ to $10j$. Check,
\[ \delta(\lambda, x) \le \lambda(e^2 + 1) + [x\log(x/(\lambda e)) - e^2\lambda]_+. \]
Suppose $X \sim \operatorname{Bin}(N, \lambda/N)$ and consider $Y = [X\log(X/(\lambda e)) - e^2\lambda]_+$. Let $Z$ be the geometric random variable with distribution $((1-e^{-1})e^{-n})_{n=0}^\infty$; $P(Z \ge n) = \exp(-n)$. We will show that $P(Y \ge n) \le \exp(-n)$, so $Y \le_{st} 1 + Z$. We can write
\begin{align*}
\sum_{k=j}^{10j}\delta(X_0(1-X_0)^k, X_k) &\le \sum_{k=j}^{10j}\left[\delta(4(1-a)^k, X_k) + X_k\log\frac{4(1-a)^k}{X_0(1-X_0)^k}\right] \\
&\le \sum_{k=j}^{10j}\left[4(1-a)^k(e^2+1) + \left[X_k\log\frac{X_k}{4(1-a)^ke} - 4(1-a)^ke^2\right]_+ + X_k\log\frac{4(1-a)^k}{X_0(1-X_0)^k}\right].
\end{align*}
For $k \le 10j$, $(1-X_0)^k \le (1-a)^k \le e^{20}(1-X_0)^k$. Up to modification off $E'$, $N_k = NX_k \le_{st} \operatorname{Bin}(N, 4(1-a)^k)$. With $Z_i$ independent random variables equal to $Z$ in distribution,
\[ \sum_{k=j}^{10j}\delta(X_0(1-X_0)^k, X_k) \le_{st} N^{-1}\sum_{i=1}^{9j}Z_i + \frac{500(R+1)^2\log N}{N}. \]
We can bound the probability that the sum of $l := 9j$ independent random variables $Z_i$ is greater than $B := le^{-1/4}/(1 - e^{-1/4}) \le 4l$. Let $\hat Z_i$ be independent geometric $((1-e^{-1/4})e^{-k/4})_{k=0}^\infty$ random variables. By an elementary comparison,
\[ P\left(\sum_{i=1}^{l}Z_i \ge B\right) \le \left(\frac{1-e^{-1}}{1-e^{-1/4}}\right)^l\exp(-(3/4)B)\,P\left(\sum_{i=1}^{l}\hat Z_i \ge B\right) \le \exp(-l) = O(N^{-1}). \]
It remains to show that $P(Y \ge n) \le \exp(-n)$.
The set of positive values $Y$ can take,
\[ A_\lambda = \left\{k\log\frac{k}{\lambda e} - e^2\lambda : k \in \mathbb{N}\right\} \cap [0, \infty), \]
is sparser than $\mathbb{N} = \{0, 1, \ldots\}$; by this we mean that there is a one-to-one function $f : \mathbb{N} \to A_\lambda$ such that $n \le f(n)$ for all $n$. Let $y = k\log(k/(\lambda e)) - e^2\lambda \in A_\lambda$;
\[ P(Y = y) = \binom{N}{k}\left(\frac{\lambda}{N}\right)^k\left(1 - \frac{\lambda}{N}\right)^{N-k} \le \frac{\lambda^k}{k!} \le \left(\frac{\lambda}{k/e}\right)^k \le (1 - e^{-1})\exp(-y). \]
Therefore $P(Y \ge n) \le \exp(-n)$.

4.13 Proofs relating to entropy and the biased random walk

Proof of Lemma 4.4. If $x_k = t(1-t)^k$, then $x$ has mean $\sum kx_k = (1-t)t^{-1}$ and entropy $S(x) = -\log t - (1-t)t^{-1}\log(1-t)$. By definition, $\overline S(x) = S(G_R) - S(x)$. Using $\sum x_k = 1$ and $\sum kx_k = R$,
\[ \overline S(x) = \sum x_k\log\frac{x_k}{R^k/(R+1)^{k+1}}, \]
while for a general geometric comparison distribution,
\[ \sum x_k\log\frac{x_k}{t(1-t)^k} = \overline S(x) + \log\frac{(R+1)^{-1}(R/(R+1))^R}{t(1-t)^R}. \]
For $t \in [0,1]$, $t(1-t)^R$ is maximized by $t = (R+1)^{-1}$. The relative entropy of $x$ with respect to a geometric distribution is therefore minimized by $G_R$. The relative entropy can also be written using the function $\delta(x,y) = y\log(y/x) - (y-x)$:
\[ \sum\delta(R^k/(R+1)^{k+1}, x_k) = \sum x_k\log\frac{x_k}{R^k/(R+1)^{k+1}}, \quad\text{and}\quad \sum\delta(x_0(1-x_0)^k, x_k) = \sum x_k\log\frac{x_k}{x_0(1-x_0)^k}. \]

Proof of Lemma 4.5. We will work with the left eigenvectors of $K$. The equilibrium distribution $\pi(k) = F_1(k) = \hat a(1-a)^k$ is the unique (up to scalar multiples) eigenvector of $K$ with eigenvalue $1$. The other $n-1$ eigenvectors of the Markov chain can be written as the imaginary parts of geometric sequences. Let $A_j = \sqrt{1-a}\exp(i\pi(j-1)/n)$ for $j = 2, \ldots, n$. Let $F_j(k) = R_j^{-1/2}\Im[(1-A_j)A_j^k]$, with $R_j$ the normalizing constant,
\[ R_j = \frac{n}{\hat a}\left(1 - \frac{a}{2} - \sqrt{1-a}\cos\frac{\pi(j-1)}{n}\right). \]
For convenience, allow the definition of $F_j(k)$ to extend to $F_j(-1)$ and $F_j(n)$; note that $F_j(-1)(1-a) = F_j(0)$ and $F_j(n-1)(1-a) = F_j(n)$. Check,
\begin{align*}
(F_jK)(k) &= F_j(k) + \frac{1}{N}\big[(1-a)F_j(k-1) - (1 + (1-a))F_j(k) + F_j(k+1)\big] \\
&= F_j(k) - \frac{1}{N}R_j^{-1/2}\,\Im\big[(1-A_j)A_j^k(1 - (1-a)A_j^{-1})\big] \\
&= F_j(k)\left(1 - N^{-1}|1-A_j|^2\right),
\end{align*}
using $(1-a)A_j^{-1} = \bar A_j$. Therefore, for $j \ge 2$, the $j$-th eigenvalue is
\[ \lambda_j = 1 - N^{-1}|1-A_j|^2 = 1 - 2N^{-1}\left(1 - \frac{a}{2} - \sqrt{1-a}\cos\frac{(j-1)\pi}{n}\right) \le 1 - 2N^{-1}\big(1 - a/2 - \sqrt{1-a}\big) \le 1 - \frac{a^2}{4N}. \]

(i) Suppose the Markov chain $C$ starts at $k$ with probability $1$. $C$ can be restricted to the subset $\{k, k+1, \ldots, n-1\}$. It is then a translation of the random walk with `$n$' $= n - k$. The Markov chain on $\{0, \ldots, n-1\}$ started at $k$ is stochastically smaller than the Markov chain restricted to $\{k, \ldots, n-1\}$ and started at $k$, which in turn is stochastically smaller than the equilibrium distribution of the Markov chain restricted to $\{k, \ldots, n-1\}$. Therefore, for all $i$,
\[ (\mu K^i)(C \ge j + k) \le (1-a)^j. \]

(ii) With $\tau_1(\varepsilon)$ the time to convergence within $\varepsilon$ in the sense of total variation, we will show the standard bound [48],
\[ \tau_1(\varepsilon) \le \frac{\log(\varepsilon\pi_{\min})}{\log\lambda_2}. \]
Write $K^i(x,y)$ for the $i$-step transition probability from $x$ to $y$. Then
\[ K^i(x,y) = \sum_j\langle\delta_x, F_j\rangle_\pi F_j(y)\lambda_j^i = \pi(y) + \sum_{j\ge 2}\pi(x)^{-1}F_j(x)F_j(y)\lambda_j^i. \]
As
\[ 1 = K^0(x,x) \ge \pi(x)^{-1}\sum_{j\ge 2}F_j(x)^2, \qquad \sum_{j\ge 2}F_j(x)^2 \le \pi(x). \]
By the Cauchy–Schwarz inequality, with $\pi_{\min} = \min_x\pi(x)$,
\begin{align*}
|K^i(x,y) - \pi(y)| &= \Big|\pi(x)^{-1}\sum_{j\ge 2}F_j(x)F_j(y)\lambda_j^i\Big| \le \lambda_2^i\,\pi(x)^{-1}\sqrt{\sum_{j\ge 2}F_j(x)^2\sum_{j\ge 2}F_j(y)^2} \\
&\le \lambda_2^i\,\pi(x)^{-1}\sqrt{\pi(x)\pi(y)} = \pi(y)\,\frac{\lambda_2^i}{\sqrt{\pi(x)\pi(y)}} \le \pi(y)\,\frac{\lambda_2^i}{\pi_{\min}}.
\end{align*}
With $1 - \lambda_2 \ge a^2/(4N)$,
\[ \tau_1(\varepsilon) \le \log_{\lambda_2}(\varepsilon\pi(n-1)) \le -4Na^{-2}\log(\varepsilon\pi(n-1)). \]

(iii) Suppose first that $\mu(C = l) = 1$. The general result follows by linearity.
From part (ii),
\[ \frac{K^i(l,k)}{\pi(k)} - 1 \le \frac{\lambda_2^i}{\sqrt{\pi(k)\pi(l)}} \le \frac{\lambda_2^i}{\pi(l)} \le \frac12 \qquad\text{if } i \ge 4Na^{-2}\log(2/\pi(l)). \]

Proof of Lemma 4.6. For convenience, change variables as follows. Let $u_k = (X_k - X_0(1-X_0)^k)/\sqrt{X_0(1-X_0)^k}$ for $k = 0, 1, \ldots, j+1$, so that $u_0 = 0$. First consider the problem: minimize $f$ for fixed $g$ and $u_{j+1}$, where
\[ f = \sum_{k=0}^{j}\big(u_k\sqrt{1-X_0} - u_{k+1}\big)^2, \qquad g = \sum_{k=1}^{j+1}u_k^2. \]
Let $c = 1 - \sqrt{1-X_0} \ge X_0/2$. We will prove the lemma by showing that $f \ge c^2g$. Introduce a Lagrange multiplier $\hat\mu$,
\[ \hat\mu\,\frac{\partial}{\partial u_k}g = \frac{\partial}{\partial u_k}f, \qquad k = 1, \ldots, j. \]
With a further change of variables this is simply
\[ u_{k+1} - 2\mu u_k + u_{k-1} = 0, \qquad k = 1, \ldots, j. \]
When the equation $x^2 - 2\mu x + 1 = 0$ has distinct roots, the solution of the difference equation can be expressed in terms of powers of the roots, that is, $u_k = c_1(\mu - \sqrt{\mu^2-1})^k + c_2(\mu + \sqrt{\mu^2-1})^k$. The solution, up to a multiplicative factor, can be written as follows, with $\Theta$ a function of $\mu$:

$\mu$ | $u_k$ | $X_k - X_0(1-X_0)^k$
$\mu < -1$ | $(-1)^k\sinh\Theta k$ | $(-1)^k\sinh(\Theta k)\sqrt{X_0(1-X_0)^k}$
$\mu = -1$ | $(-1)^kk$ | $(-1)^kk\sqrt{X_0(1-X_0)^k}$
$\mu \in (-1,1)$ | $\sin\Theta k$ | $\sin(\Theta k)\sqrt{X_0(1-X_0)^k}$
$\mu = 1$ | $k$ | $k\sqrt{X_0(1-X_0)^k}$
$\mu > 1$ | $\sinh\Theta k$ | $\sinh(\Theta k)\sqrt{X_0(1-X_0)^k}$

If $u_k = k$, $f \ge c^2g$ follows by evaluating the quadratic sums $f$ and $g$:
\[ f = \sum_{k=0}^{j}\big(k\sqrt{1-X_0} - (k+1)\big)^2 = \sum_{k=0}^{j}(kc+1)^2 = c^2\frac{j(j+1)(2j+1)}{6} + 2c\,\frac{j(j+1)}{2} + (j+1), \]
\[ g = \sum_{k=1}^{j+1}k^2 = \frac{(j+1)(j+2)(2j+3)}{6}. \]
If $u_k = \sin\Theta k$, we can use the following identity to expand $f - c^2g$:
\[ \sum_{k=0}^{j}\sin\Theta(k+a)\sin\Theta k \equiv \sum_{k=0}^{j}\frac{\cos\Theta a - \cos\Theta(2k+a)}{2} \equiv \frac{j+1}{2}\cos\Theta a - \frac{\sin\Theta(2j+1+a) - \sin\Theta(a-1)}{4\sin\Theta}. \]
Therefore
\begin{align*}
f - c^2g &= 2\sqrt{1-X_0}\left(\sum_{k=0}^{j}\sin^2\Theta(k+1) - \sum_{k=0}^{j}\sin\Theta(k+1)\sin\Theta k\right) - (1-X_0)\left(\sum_{k=0}^{j}\sin^2\Theta(k+1) - \sum_{k=0}^{j}\sin^2\Theta k\right) \\
&= (j+1)\sqrt{1-X_0}(1 - \cos\Theta) + \frac{\sqrt{1-X_0}}{2}\tan\frac{\Theta}{2}\sin\Theta(2j+2) + \frac12\big(\sqrt{1-X_0} - (1-X_0)\big)\big(1 - \cos\Theta(2j+2)\big) \\
&\ge \sqrt{1-X_0}\left((j+1)(1-\cos\Theta) + \frac12\tan\frac{\Theta}{2}\sin\Theta(2j+2)\right) \\
&= \sqrt{1-X_0}\,(1-\cos\Theta)\left(j + 1 + \frac{\sin\Theta(2j+2)}{2\sin\Theta}\right) \ge 0.
\end{align*}
For $u_k = \sinh\Theta k$, we can apply Osborne's rule to convert the previous trigonometric identity to a hyperbolic one,
\begin{align*}
-f + c^2g &= (j+1)\sqrt{1-X_0}(1 - \cosh\Theta) - \frac{\sqrt{1-X_0}}{2}\tanh\frac{\Theta}{2}\sinh\Theta(2j+2) + \frac12\big(\sqrt{1-X_0} - (1-X_0)\big)\big(1 - \cosh\Theta(2j+2)\big) \\
&\le -\sqrt{1-X_0}\,(\cosh\Theta - 1)\left(j + 1 + \frac{\sinh\Theta(2j+2)}{2\sinh\Theta}\right) \le 0.
\end{align*}
We can ignore the solutions $(u_k)$ corresponding to $\mu \le -1$; they are clearly never optimal.

Proof of Lemma 4.7. Define $f$ by $f(k) = X_k'/\pi(k)$, where $X_k' = X_k \vee (e^{-1}/N)$. We use the log Sobolev constant, following the proof of Theorem 3.6 in [19].
\begin{align*}
\frac{1}{N}\sum_k(X_k'(1-a) - X_{k+1}')\log\frac{X_k'(1-a)}{X_{k+1}'} &= \frac{1}{N}\sum_k\pi(k+1)[f(k) - f(k+1)][\log f(k) - \log f(k+1)] \\
&= \sum_{j,k}\pi(j)K(j,k)f(j)[\log f(j) - \log f(k)] \\
&= \mathcal{E}(f, \log f) \ge 4\mathcal{E}(\sqrt f, \sqrt f) \ge 4\nu\mathcal{L}(\sqrt f).
\end{align*}
We can now switch back to the empirical distribution $X = (X_k)$ by admitting a small error term.
\begin{align*}
\mathcal{L}(\sqrt f) &= \sum_k X_k'\log\frac{X_k'}{\|\sqrt f\|_{2,\pi}^2\,\pi(k)} \\
&= \sum_k X_k\log\frac{X_k}{\pi(k)} + \sum_{\{k\in\Omega : X_k = 0\}}\frac{e^{-1}}{N}\log\frac{e^{-1}N^{-1}}{\pi(k)} - \Big(\sum_k X_k'\Big)\log\Big(\sum_k X_k'\Big) \\
&\ge \sum_k X_k\log\frac{X_k}{\pi(k)} - \frac{ne^{-1}}{N}\log eN - (1 + ne^{-1}N^{-1})\log(1 + ne^{-1}N^{-1}) \\
&\ge \sum_k X_k\log\frac{X_k}{\pi(k)} - \frac{ne^{-1}}{N}\log eN - (1 + ne^{-1}N^{-1})\,ne^{-1}N^{-1} \\
&\ge \sum_k X_k\log\frac{X_k}{X_0(1-X_0)^k} - \frac{n\log N}{N}.
\end{align*}

4.14 Technical lemmas relating to $\Delta S$

All that remains now is to prove Lemmas 4.16, 4.20 and 4.21. The proofs of Lemmas 4.16 and 4.20 are very similar. We express $E(\Delta S \mid X = x)$ as a sum, and then carefully bound the terms. The proof of Lemma 4.21 is similarly technical.
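As a purely numerical sanity check on Lemma 4.6, independent of the proof just given, one can test the inequality on random nonnegative sequences; the proof only used $u_0 = 0$, so arbitrary nonnegative weights with a prescribed first coordinate appear to be the natural generality. A Python sketch (the ranges and the tolerance are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def lemma_4_6_holds(x, j):
    # LHS and RHS of Lemma 4.6, with x[0] playing the role of X0.
    x0 = x[0]
    lhs = sum((x[k] * (1 - x0) - x[k + 1]) ** 2 / (x0 * (1 - x0) ** (k + 1))
              for k in range(j + 1))
    rhs = (x0 ** 2 / 4) * sum((x[k] - x0 * (1 - x0) ** k) ** 2
                              / (x0 * (1 - x0) ** k)
                              for k in range(1, j + 2))
    return lhs >= rhs - 1e-9           # small tolerance for rounding

for _ in range(10_000):
    j = int(rng.integers(1, 8))
    x = rng.random(j + 2) * 0.5        # arbitrary nonnegative weights
    x[0] = rng.uniform(0.05, 0.95)     # the fraction of empty boxes
    assert lemma_4_6_holds(x, j)
print("no counterexamples found")
```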
To make the calculations more manageable we define two functions. Let $f : [0,\infty) \to \mathbb{R}$ be given by
\[ f(x) = x\log x, \qquad f(0) = 0, \]
and let $g : [0,\infty) \to \mathbb{R}$ be the `discrete derivative' of $f$,
\[ g(x) = f(x + 1/N) - f(x). \]
If $x = 0$, take $x \times g(x - 1/N)$ to be equal to zero.

Proof of Lemma 4.16. Consider one step of the Markov chain starting from $X = x$. Let $R_k$ be the random variable, taking values in $\{-1/N, 0, +1/N\}$, measuring the movement from $X_k$ to $X_{k+1}$; that is, the fraction of boxes that hold $k$ balls and gain a ball, minus the fraction of boxes that hold $k+1$ balls and lose a ball. Let $m_k = x_k(1-x_0) - x_{k+1}$, so that $E(R_k) = m_kN^{-1} + O(N^{-2})$. Define $R_{-1} = m_{-1} = 0$. As $\sum_{k=0}^{n'-1}m_k = 0$ we can write:
\begin{align*}
E(\Delta S) &= \sum_{k=0}^\infty E[f(x_k) - f(x_k + R_{k-1} - R_k)] \\
&= \sum_{k=0}^\infty E[f(x_k) - f(x_k + R_{k-1}) + f(x_k + R_{k-1}) - f(x_k + R_{k-1} - R_k)] \\
&= \sum_{k=0}^{n'-1}E[f(x_k + R_{k-1}) - f(x_k + R_{k-1} - R_k) + f(x_{k+1}) - f(x_{k+1} + R_k)] \\
&= \sum_{k=0}^{n'-1}E[f(x_k + R_{k-1}) - f(x_k + R_{k-1} - R_k) + m_kN^{-1}\log(1-x_0) + f(x_{k+1}) - f(x_{k+1} + R_k)].
\end{align*}
We must then bound the difference, for $k = 0, 1, \ldots, n'-1$, between the terms
\[ Z_k := E[f(x_k + R_{k-1}) - f(x_k + R_{k-1} - R_k) + m_kN^{-1}\log(1-x_0) + f(x_{k+1}) - f(x_{k+1} + R_k)] \]
and
\[ \frac{1}{2N}(x_k'(1-x_0) - x_{k+1}')\log\frac{x_k'(1-x_0)}{x_{k+1}'}. \]
For $k \ge 1$, we list the possible values of the pair $(R_{k-1}, R_k)$ such that $R_k \ne 0$:

$r_{k-1}$ | $r_k$ | $P((R_{k-1}, R_k) = (r_{k-1}, r_k) \mid X = x)$
$+1/N$ | $-1/N$ | $x_{k-1}x_{k+1}$
$-1/N$ | $+1/N$ | $x_k(x_k - 1/N)$
$0$ | $-1/N$ | $x_{k+1}(1 - x_{k-1} - x_k - 1/N)$
$0$ | $+1/N$ | $x_k(1 - x_0 - x_k - x_{k+1})$

Using the function $g(x) = f(x + 1/N) - f(x)$, we split $Z_k$ into larger and smaller terms.
\begin{align*}
Z_k &= m_kN^{-1}\log(1-x_0) + [g(x_{k+1} - 1/N) - g(x_k + 1/N)]x_{k-1}x_{k+1} \\
&\quad + [g(x_k - 2/N) - g(x_{k+1})]x_k(x_k - 1/N) \\
&\quad + [g(x_{k+1} - 1/N) - g(x_k)]x_{k+1}(1 - x_{k-1} - x_k - 1/N) \\
&\quad + [g(x_k - 1/N) - g(x_{k+1})]x_k(1 - x_0 - x_k - x_{k+1}) \\
&= Z_k^A + Z_k^B,
\end{align*}
where
\begin{align*}
Z_k^A &= m_k[N^{-1}\log(1-x_0) + g(x_k) - g(x_{k+1})] - x_{k-1}x_{k+1}[g(x_k + 1/N) - g(x_k)], \\
Z_k^B &= -x_k(1 - x_0 - x_{k+1})[g(x_k) - g(x_k - 1/N)] - x_{k+1}(1 - x_k)[g(x_{k+1}) - g(x_{k+1} - 1/N)] \\
&\quad - x_k(x_k - 1/N)[g(x_k - 1/N) - g(x_k - 2/N)] - (x_k/N)[g(x_k - 1/N) - g(x_{k+1})] \\
&\quad - (x_{k+1}/N)[g(x_{k+1} - 1/N) - g(x_k)].
\end{align*}
We write the equivalent terms for $k = 0$:

$r_0$ | $P(R_0 = r_0 \mid X = x)$
$-1/N$ | $x_1(1 - x_0 - 1/N)$
$+1/N$ | $x_0(1 - x_0 - x_1)$

\begin{align*}
Z_0 &= m_0N^{-1}\log(1-x_0) + [g(x_1 - 1/N) - g(x_0)]x_1(1 - x_0 - 1/N) + [g(x_0 - 1/N) - g(x_1)]x_0(1 - x_0 - x_1) \\
&= Z_0^A + Z_0^B, \\
Z_0^A &= m_0[N^{-1}\log(1-x_0) + g(x_0) - g(x_1)], \\
Z_0^B &= -x_0(1 - x_0 - x_1)[g(x_0) - g(x_0 - 1/N)] - x_1(1 - x_0)[g(x_1) - g(x_1 - 1/N)] - (x_1/N)[g(x_1 - 1/N) - g(x_0)].
\end{align*}
So, by definition, $E(\Delta S \mid X = x) = \sum_k Z_k^A + Z_k^B$. We will first bound $\sum_k Z_k^B$. For $x, y \in \{0, 1/N, \ldots, 1\}$:

(i) $x[g(x) - g(x - 1/N)] \le 2N^{-2}\log 2$, and
(ii) $(x/N)(g(x - 1/N) - g(y)) \le xN^{-2}(1 + \log N)$.

Therefore $\sum_k Z_k^B \ge -6n'N^{-2}\log 2 - 2N^{-2}(1 + \log N) \ge -5n'N^{-2}$. We will now show that, if $x_0 \le 1 - 10^{-9}$,
\[ m_k\big(N^{-1}\log(1-x_0) + g(x_k) - g(x_{k+1})\big) - x_{k-1}x_{k+1}[g(x_k + 1/N) - g(x_k)] \ge \frac{1}{2N}(x_k'(1-x_0) - x_{k+1}')\log\frac{x_k'(1-x_0)}{x_{k+1}'} - \frac{5}{N^2}. \tag{4.26} \]
Write $x_k = a/N$, $x_{k+1} = b/N$. Let $a' = a \vee e^{-1}$, $b' = b \vee e^{-1}$. It is now sufficient to show $A \ge B/2 - 5$, where
\begin{align*}
A &= (a(1-x_0) - b)[\log(1-x_0) + f(a+1) - f(a) - f(b+1) + f(b)] - b[f(a+2) - 2f(a+1) + f(a)], \\
B &= (a'(1-x_0) - b')\log\frac{a'(1-x_0)}{b'}.
\end{align*}
We will use the following simple facts in our case-by-case analysis.

(a) If $k \ge 1$, $1 + \log k \le f(k+1) - f(k) \le 1 + \log k + (2k)^{-1}$.
(b) For all $c, x \ge 0$, $x\log x - cx \ge -\exp(c-1)$.

(c) For all $k \in \mathbb{N}$, $0 \le k[f(k+2) - 2f(k+1) + f(k)] \le 1$.

We now consider cases (i) $a = b = 0$, (ii) $0 = a < b$, (iii) $0 = b < a$, (iv) $0 < b < a(1-x_0)$ and (v) $0 < a(1-x_0) \le b$.

(i) If $a = b = 0$:
\[ A = 0, \qquad B = -x_0e^{-1}\log(1-x_0) \le 10, \qquad\text{so } A \ge B/2 - 5. \]

(ii) If $0 = a < b$:
\begin{align*}
A &= b\log\frac{b}{1-x_0} + b(b+1)\log(1 + b^{-1}) - 2b\log 2 \ge b\left[\log\frac{b}{1-x_0} + 1 - 2\log 2\right], \\
B &= (b - e^{-1})\left(\log\frac{b}{1-x_0} + 1\right), \\
A - B/2 &\ge b[\log b + 1 - 4\log 2]/2 \ge -5.
\end{align*}

(iii) If $0 = b < a$, set $t = a(1-x_0)$:
\[ A \ge t[\log t + 1], \qquad B = (t - e^{-1})[\log t + 1], \qquad A - B/2 \ge -5. \]

(iv) If $0 < b < a(1-x_0)$, set $t = a(1-x_0)/b \ge 1$:
\[ A \ge b(t-1)\left(\log t - \frac{1}{2b}\right) - 1, \qquad B = b(t-1)\log t, \qquad A - B/2 \ge (t-1)[\log t - 1]/2 - 1 \ge -5. \]

(v) If $0 < a(1-x_0) \le b$, set $t = b/(a(1-x_0)) \ge 1$:
\[ A \ge a(1-x_0)(t-1)\left(\log t - \frac{1}{2a}\right) - t(1-x_0), \qquad B = a(1-x_0)(t-1)\log t, \]
\[ A - B/2 \ge \frac12(1-x_0)(t-1)[\log t - 1] - t(1-x_0) \ge -5. \]

Proof of Lemma 4.20. The proof is similar to the proof of Lemma 4.16. In place of inequality (4.26) we must show
\[ m_k\big(N^{-1}\log(1-x_0) + g(x_k) - g(x_{k+1})\big) - x_{k-1}x_{k+1}[g(x_k + 1/N) - g(x_k)] \ge \frac{1}{N}\,\frac{m_k^2}{\max\{x_k(1-x_0),\, x_{k+1}\}} - \frac{5}{N^2}. \]
This is equivalent to showing $A \ge C - 5$, with
\begin{align*}
A &= (a(1-x_0) - b)(\log(1-x_0) + f(a+1) - f(a) - f(b+1) + f(b)) - b[f(a+2) - 2f(a+1) + f(a)], \\
C &= (a(1-x_0) - b)^2/\max\{a(1-x_0),\, b\}.
\end{align*}
We consider the following cases: (i) $a = b = 0$, (ii) $0 = b < a$, (iii) $0 = a < b$, (iv) $0 < b < a(1-x_0)$ and (v) $0 < a(1-x_0) \le b$.

(i) If $a = b = 0$ then $A = C = 0$.

(ii) If $0 = b < a$, set $t = a(1-x_0)$. Then $A - C \ge t\log t - t \ge -5$.

(iii) If $b > a = 0$:
\[ A - C \ge b\log\frac{b}{1-x_0} - b(1 + 2\log 2) \ge -5. \]

(iv) If $0 < b < a(1-x_0)$, set $t = a(1-x_0)/b > 1$:
\[ A \ge b(t-1)\left(\log t - \frac{1}{2b}\right) - 1, \qquad C = b(t-1)\left(1 - \frac1t\right), \qquad A - C \ge (t-1)[\log t - 3/2] - 1 \ge -5. \]

(v) If $0 < a(1-x_0) \le b$, set $t = b/(a(1-x_0))$:
\[ A \ge a(1-x_0)(t-1)\left(\log t - \frac{1}{2a}\right) - t(1-x_0), \qquad C = a(1-x_0)(t-1)\left(1 - \frac1t\right), \]
\[ A - C \ge (1-x_0)[(t-1)(\log t - 5/2) - 1] \ge -5. \]

Proof of Lemma 4.21. Of course $\operatorname{var}(\Delta S \mid X = x) \le E((\Delta S)^2 \mid X = x)$. Once we have bounded $E((\Delta S)^2 \mid X = x)$, the result follows by Lemma 4.16. Sum over the number of balls in the source box, $i$, and the sink box, $j$:
\begin{align*}
&\sum_{i>0}\sum_{j\ge 0}x_i(x_j - \delta_{ij}/N)\big[g(x_i - N^{-1}) - g(x_{i-1}) + g(x_j - N^{-1}) - g(x_{j+1})\big]^2 \\
&\quad\le \sum_{i>0}\sum_{j\ge 0}x_i(x_j - \delta_{ij}/N)\big[-g(x_{i-1}) - N^{-1}\log(1-x_0) + g(x_i - N^{-1}) + g(x_j - N^{-1}) + N^{-1}\log(1-x_0) - g(x_{j+1})\big]^2 \\
&\quad\le 2\sum_{i>0}\sum_{j\ge 0}x_ix_j\Big[\big(g(x_{i-1}) + N^{-1}\log(1-x_0) - g(x_i - N^{-1})\big)^2 + \big(g(x_j - N^{-1}) + N^{-1}\log(1-x_0) - g(x_{j+1})\big)^2\Big] \\
&\quad\le 2\sum_{k=0}^{n'-1}\Big[x_k(1-x_0)\big(g(x_k - N^{-1}) + N^{-1}\log(1-x_0) - g(x_{k+1})\big)^2 + x_{k+1}\big(g(x_k) + N^{-1}\log(1-x_0) - g(x_{k+1} - N^{-1})\big)^2\Big].
\end{align*}
Let
\begin{align*}
A &= x_k(1-x_0)\big(g(x_k - N^{-1}) + N^{-1}\log(1-x_0) - g(x_{k+1})\big)^2 + x_{k+1}\big(g(x_k) + N^{-1}\log(1-x_0) - g(x_{k+1} - N^{-1})\big)^2, \\
B &= (x_k'(1-x_0) - x_{k+1}')\log\big(x_k'(1-x_0)/x_{k+1}'\big).
\end{align*}
With $x_0 \le 2/3$ it is straightforward to check $N^3A \le 3NB\log N + 100$; consider the cases $x_k = x_{k+1} = 0$, $x_{k+1} > x_k = 0$, $x_k > x_{k+1} = 0$ and $x_k, x_{k+1} > 0$.

References

[1] C. J. K. Batty and H. W. Bollmann. Generalised Holley–Preston inequalities on measure spaces and their products. Z. für Wahrsch'theorie verw. Geb., 53:157–173, 1980.

[2] R. J. Baxter. Exactly Solved Models in Statistical Mechanics. Academic Press, London, 1982.

[3] M. Ben-Or and N. Linial. Collective coin flipping. In Randomness and Computation, pages 91–115. Academic Press, New York, 1990.

[4] I. Benjamini, R. Lyons, Y. Peres, and O. Schramm. Uniform spanning forests. Ann. Probab., 29:1–65, 2001.
[5] C. E. Bezuidenhout, G. R. Grimmett, and H. Kesten. Strict inequality for critical values of Potts models and random-cluster processes. Comm. Math. Phys., 158:1–16, 1993.

[6] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.

[7] M. Blume. Theory of the first-order magnetic phase change in UO2. Phys. Rev., 141:517–524, 1966.

[8] B. Bollobás and O. M. Riordan. The critical probability for random Voronoi percolation in the plane is 1/2. Probab. Theory Related Fields, 136(3):417–468, 2006.

[9] B. Bollobás and O. M. Riordan. A short proof of the Harris–Kesten theorem. Bull. London Math. Soc., 38(3):470–484, 2006.

[10] J. Bourgain, J. Kahn, G. Kalai, Y. Katznelson, and N. Linial. The influence of variables in product spaces. Israel J. Math., 77:55–64, 1992.

[11] L. Breiman. Probability. Classics in Applied Mathematics. SIAM, New York, 1992.

[12] S. R. Broadbent and J. M. Hammersley. Percolation processes. I. Crystals and mazes. Proceedings of the Cambridge Philosophical Society, 53:629–641, 1957.

[13] R. M. Burton and M. Keane. Density and uniqueness in percolation. Comm. Math. Phys., 121:501–505, 1989.

[14] H. W. Capel. On the possibility of first-order transitions in Ising systems of triplet ions with zero-field splitting. Physica, 32:966–988, 1966.

[15] H. W. Capel. On the possibility of first-order transitions in Ising systems of triplet ions with zero-field splitting. Physica, 33:295–331, 1967.

[16] H. W. Capel. On the possibility of first-order transitions in Ising systems of triplet ions with zero-field splitting. Physica, 37:423–441, 1967.

[17] J. T. Chayes and L. Chayes. Percolation and random media. In K. Osterwalder and R. Stora, editors, Critical Phenomena, Random Systems and Gauge Theories (Les Houches, Session XLIII, 1984), pages 1001–1142. Elsevier, Amsterdam, 1986.

[18] P. Diaconis. Analysis of a Bose–Einstein Markov chain. Ann. Inst. H. Poincaré Probab. Statist., 41(3):409–418, 2005.

[19] P. Diaconis and L. Saloff-Coste. Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab., 6(3):695–750, 1996.

[20] P. Ehrenfest and T. Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Phys. Z., 8, 1907.

[21] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5:17–61, 1960.

[22] M. R. Evans. Phase transitions in one-dimensional nonequilibrium systems. Brazilian Journal of Physics, 30:42, 2000.

[23] K. J. Falconer. The Geometry of Fractal Sets. Cambridge University Press, Cambridge, 1985.

[24] W. Feller. An Introduction to Probability Theory and its Applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.

[25] C. M. Fortuin and P. W. Kasteleyn. On the random-cluster model. I. Introduction and relation to other models. Physica, 57:536–564, 1972.

[26] C. M. Fortuin, P. W. Kasteleyn, and J. Ginibre. Correlation inequalities on some partially ordered sets. Comm. Math. Phys., 22:89–103, 1971.

[27] E. Friedgut. Influences in product spaces: KKL and BKKKL revisited. Combin. Probab. Comput., 13:17–29, 2004.

[28] E. Friedgut and G. Kalai. Every monotone graph property has a sharp threshold. Proc. Amer. Math. Soc., 124:2993–3002, 1996.

[29] A. Frieze, R. Kannan, and N. Polson. Sampling from log-concave distributions. Ann. Appl. Probab., 4(3):812–837, 1994.

[30] H.-O. Georgii, O. Häggström, and C. Maes. The random geometry of equilibrium phases. In Phase Transitions and Critical Phenomena, Vol. 18, pages 1–142. Academic Press, San Diego, CA, 2001.
[31] B. T. Graham and G. R. Grimmett. Influence and sharp-threshold theorems for monotonic measures. Ann. Probab., 34(5):1726–1745, 2006.

[32] B. T. Graham and G. R. Grimmett. Random-current representation of the Blume–Capel model. J. Stat. Phys., 125(2):287–320, 2006.

[33] G. R. Grimmett. The stochastic random-cluster process and the uniqueness of random-cluster measures. Ann. Probab., 23:1461–1510, 1995.

[34] G. R. Grimmett. The Random-Cluster Model. Springer, Berlin, 2006.

[35] G. R. Grimmett. The random-cluster model: Corrections. http://www.statslab.cam.ac.uk/~grg/books/rcmcorrections.html, 2006.

[36] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, second edition, 1993.

[37] T. E. Harris. A lower bound for the critical probability in a certain percolation process. Proc. Camb. Phil. Soc., 56:13–20, 1960.

[38] R. Holley. Remarks on the FKG inequalities. Comm. Math. Phys., 36:227–231, 1974.

[39] E. Ising. Beitrag zur Theorie des Ferromagnetismus. Z. Physik, 31:253–258, 1925.

[40] N. L. Johnson and S. Kotz. Urn Models and Their Application. John Wiley & Sons, New York–London–Sydney, 1977.

[41] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Symposium on the Foundations of Computer Science, pages 68–80. Computer Science Press, 1988.

[42] H. Kesten. The critical probability of bond percolation on the square lattice equals 1/2. Comm. Math. Phys., 74(1):41–59, 1980.

[43] C. Kipnis and C. Landim. The Scaling Limits of Interacting Particle Systems. Springer, Berlin, 1999.

[44] M. J. Klein. Entropy and the Ehrenfest urn model. Physica, 22:569–575, 1956.

[45] B. M. McCoy and T. T. Wu. The Two-Dimensional Ising Model. Harvard University Press, Cambridge, MA, 1973.

[46] C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer, Berlin, 1998.

[47] L. Onsager. Crystal statistics. I. A two-dimensional model with an order–disorder transition. Phys. Rev., 65:117–149, 1944.

[48] Y. Peres. Mixing for Markov chains and spin systems. Lecture notes, 2005 UBC Summer School, http://stat-www.berkeley.edu/~peres/.

[49] R. B. Potts. Some generalized order–disorder transformations. Proc. Camb. Phil. Soc., 48:106–109, 1952.

[50] C. J. Preston. A generalization of the FKG inequalities. Comm. Math. Phys., 36:233–241, 1974.

[51] L. Russo. A note on percolation. Z. für Wahrsch'theorie verw. Geb., 43:39–48, 1978.

[52] P. D. Seymour and D. J. A. Welsh. Percolation probabilities on the square lattice. In B. Bollobás, editor, Advances in Graph Theory, Annals of Discrete Mathematics 3, pages 227–245. North-Holland, Amsterdam, 1978.

[53] F. Spitzer. Interaction of Markov processes. Advances in Math., 5:246–290, 1970.