Interacting Stochastic Systems Benjamin Graham

advertisement
Interacting Stochastic Systems
Benjamin Graham
Fitzwilliam College and Statistical Laboratory
University of Cambridge
This dissertation is submitted for the degree of
Doctor of Philosophy
August 2007
Preface
This dissertation is the result of my own work and includes nothing
which is the outcome of work done in collaboration except where
specifically indicated in the text.
I would like to thank my PhD supervisor, Geoffrey Grimmett. Chapters 2 and 3 were done in collaboration with him. We have agreed
that 65% of the work is mine. The presentation is entirely my own.
The work has also been submitted to journals as joint publications
[31, 32]. I am the sole author of Chapter 4.
I would like to acknowledge the financial support I have received from
the Engineering and Physical Sciences Research Council under a Doctoral Training Award to the University of Cambridge.
Abstract
Statistical mechanics is the branch of physics that seeks to explain the
properties of matter that emerge from microscopic scale interactions.
Mathematicians are attempting to acquire a rigorous understanding
of the models and calculations arising from physicists’ work. The field
of interacting particle systems has proved a fruitful area for mathematical research. Two of the most fundamental models are the Ising
model and bond percolation.
The Ising model was suggested by Lenz in 1920. It is a probabilistic
model for ferromagnetism. Magnetization can then be explored as
correlation between spin random variables on a graph. Bond percolation was introduced by Simon Broadbent and John Hammersley in
1957. It is a model for long range order. Edges of a lattice graph are
declared open, independently, with some probability p, and clusters
of open edges are studied. Both these models can be understood as
aspects of the random-cluster model. In this thesis we study various
aspects of mathematical statistical mechanics.
In Chapter 2 we create a diluted version of the random-cluster model.
This allows the coupling of the Ising model to the random-cluster
model to be extended to include the Blume–Capel model. Crucially,
it retains some of the key properties of its parent model. This enables
much of the random-cluster technology to be carried forward.
The key issue for bond percolation concerns the fraction of open
edges required in order to have long range connectivity. Harry Kesten
proved that this fraction is precisely one half for the square planar lattice. Recent development in the theory of influence and sharp thresholds allowed Béla Bollobás and Oliver Riordan to simplify parts of
his proof. In Chapter 3 we extend an influence result to apply to
monotonic measures. This allows sharp thresholds to be shown for
certain families of stochastically increasing monotonic distributions,
including random-cluster measures.
In Chapter 4 we study time to convergence for a mean-field zerorange process. The problem was motivated by the canonical ensemble
model of energy levels used in the derivation of Maxwell–Boltzmann
distributions. By considering the entropy of the system, we show that
the empirical distribution rapidly converges—in the sense of KullbackLeibler divergence—to a geometric distribution. The proof utilizes
arguments of spectral gap and log Sobolev type.
Contents
1 Introduction
1.1 Phase transition and statistical mechanics . . . . . . . . . . . . .
1
1
1.2
Bond percolation . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
1.3
Gibbs distributions . . . . . . . . . . . . . . . . . . . . . . . . . .
3
1.4
1.5
The Ising and Potts models . . . . . . . . . . . . . . . . . . . . .
The random-cluster measure . . . . . . . . . . . . . . . . . . . . .
6
8
1.6
The FKG inequality and Holley’s theorem . . . . . . . . . . . . .
10
1.7
Influence and sharp threshold . . . . . . . . . . . . . . . . . . . .
12
1.8
Microcanonical ensemble Markov chains . . . . . . . . . . . . . .
14
2 Blume–Capel model
16
2.1
The Blume–Capel model . . . . . . . . . . . . . . . . . . . . . . .
16
2.2
2.3
Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
FKG and Holley’s theorem . . . . . . . . . . . . . . . . . . . . . .
18
19
2.4
The diluted-random-cluster representation of the BCP model . . .
20
2.5
The vertex DRC marginal measure . . . . . . . . . . . . . . . . .
24
2.6
2.7
Edge comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . .
Infinite-volume measures . . . . . . . . . . . . . . . . . . . . . . .
29
31
2.8
Uniqueness by convexity of pressure . . . . . . . . . . . . . . . . .
37
2.9
Phases and comparison results . . . . . . . . . . . . . . . . . . . .
40
3 Influence and monotonic measures
3.1 Influence beyond KKL . . . . . . . . . . . . . . . . . . . . . . . .
46
46
3.2
Symmetric events . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
3.3
Sharp thresholds for product measures . . . . . . . . . . . . . . .
49
iv
CONTENTS
3.4
Self duality for bond percolation on L2 . . . . . . . . . . . . . . .
50
3.5
3.6
Influence for monotonic measures . . . . . . . . . . . . . . . . . .
Russo formula for monotonic measures . . . . . . . . . . . . . . .
52
56
3.7
Sharp thresholds for the random-cluster measure . . . . . . . . . .
58
3.8
Self duality for the random-cluster model . . . . . . . . . . . . . .
59
3.9 Long-ways crossings and the critical point . . . . . . . . . . . . .
3.10 Influence for continuous random variables . . . . . . . . . . . . . .
61
62
4 The mean-field zero-range process
68
4.1
4.2
Balls and boxes . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
68
71
4.3
The Ehrenfest urn model . . . . . . . . . . . . . . . . . . . . . . .
72
4.4
Markov chain convergence . . . . . . . . . . . . . . . . . . . . . .
75
4.5
4.6
Entropy for the mean-field ZRP . . . . . . . . . . . . . . . . . . .
Total variation convergence . . . . . . . . . . . . . . . . . . . . .
77
80
4.7
A biased random walk . . . . . . . . . . . . . . . . . . . . . . . .
81
4.8
Concentration results . . . . . . . . . . . . . . . . . . . . . . . . .
84
4.9 Coupling results . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.10 Slow initial decrease in S . . . . . . . . . . . . . . . . . . . . . . .
85
92
4.11 Rapid decrease in S . . . . . . . . . . . . . . . . . . . . . . . . . .
93
4.12 Final observations . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
4.13 Proofs relating to entropy and the biased random walk . . . . . . 101
4.14 Technical lemmas relating to ∆S . . . . . . . . . . . . . . . . . . 106
References
117
v
List of Figures
1.1
Approximate phase diagram for H2 O . . . . . . . . . . . . . . . .
1.2
Percolation with p = 0.48, 0.50 and 0.52 on the 20 × 20 torus and
2
the 200 × 200 torus. The largest open cluster is black, the other
open edges are grey. . . . . . . . . . . . . . . . . . . . . . . . . . .
4
2.1
Approximate Blume–Capel phase diagram . . . . . . . . . . . . .
41
2.2
2.3
Approximate Blume–Capel phase diagram . . . . . . . . . . . . .
Blume–Capel comparison results . . . . . . . . . . . . . . . . . . .
42
43
3.1
Finite graph and dual . . . . . . . . . . . . . . . . . . . . . . . . .
51
3.2
Square planar lattice and isomorphic dual . . . . . . . . . . . . .
51
4.1
A sample path: the empirical distribution for N = 107 , R = 20 at
4.2
4.3
intervals of 40N steps . . . . . . . . . . . . . . . . . . . . . . . . .
70
Hydrodynamic limit XRf tends to 1/2 . . . . . . . . . . . . . . . .
Graph of δ(x, y) as a function of y . . . . . . . . . . . . . . . . . .
73
74
vi
Chapter 1
Introduction
1.1
Phase transition and statistical mechanics
Statistical mechanics seeks to explains the properties of matter that emerge from
microscopic scale interactions. It has for a long time been a source of interesting
mathematical problems. The field goes back to the work of Boltzmann, Gibbs
and Maxwell in the late 19th and early 20th century. Most models are not exactly
solvable. Simple local interactions produce complex long range behaviour.
In 1895 Pierre Curie discovered that ferromagnetic substances exhibit a phase
transition with temperature—in the case of iron at 1043 K. The phenomenon
had been observed 60 years earlier by Claude Pouillet; however, he had lacked
the means to measure temperature as precisely. In the classic physics experiment,
an iron bar is placed with a strong magnetic field parallel to its axis. The bar
becomes magnetized. When the magnetic field is reduced to zero, one of two
things happens. If the temperature is above the Curie point, the bar retains no
magnetism. When the temperature is below the Curie point, it does retain a
magnetized state. This is a physical example of a threshold phenomenon.
Statistical physicists have devised stochastic models for magnetic spin systems
in thermal equilibrium. They allow the study of the phase transition phenomenon.
The most famous model for ferromagnetism is the Ising model. It exhibits phase
transitions in two and higher dimensions. Another type of phase transition is
1
1.2 Bond percolation
Pressure
Water
Ice
Steam
Temperature
Figure 1.1: Approximate phase diagram for H2 O
that exhibited by H2 O. The phase diagram has a tri-critical point where three
phases meet—the triple point of water, 273 K, 612 Pa. See Figure 1.1.
1.2
Bond percolation
It would be almost impossible to begin a thesis on statistical mechanics without
first mentioning percolation. Bond percolation was introduced by Simon Broadbent and John Hammersley in 1957 [12]. It is one of the simplest spatial models
with a phase transition phenomenon, yet still some basic question have proved
difficult to answer. Bond percolation is defined on a lattice. Rather than defining
exactly what we mean by a lattice we will just give the standard example. Let
Z = {. . . , −1, 0, 1, . . . } be the set of integers and let Zd , d ≥ 2, be the set of
d-tuples of integers,
Zd = {(x1 , . . . , xd ) : x1 , . . . , xd ∈ Z}.
2
1.3 Gibbs distributions
Define the L1 distance l : Zd × Zd → [0, ∞) by
l(x, y) =
d
X
|xi − yi |,
x, y ∈ Zd .
i=1
d
Take E to be the pairs of lattice points at distance one, so l is the normal
graph-theoretic distance.
Ed = {hx, yi : x, y ∈ Zd , l(x, y) = 1}.
This is the set of ‘nearest-neighbours’ of Zd . The graph Ld = (Zd , Ed ) is called
the d-dimensional hypercubic lattice.
d
Let Ω = {0, 1}E . For ω ∈ Ω, we say that an edge is ω-open (or just open) if
ω(e) = 1, and ω-closed otherwise. Let (Ω, F, µp ) be the product probability space
such that the edge states are independent and each edge is open with probability
p. For x, y ∈ Zd , write {x ↔ y} for the event that there is a chain of open edges
connecting x to y. Write {x ↔ ∞} for the event that x is the starting point of
an arbitrarily long chain of open edges. The open clusters are the maximal sets
of connected vertices. The model is said to percolate, or to be supercritical, if
µp (0 ↔ ∞) > 0. By Kolmogorov’s zero-one law, the system has an infinite cluster
if and only if µp (0 ↔ ∞) > 0. It is clear that µp (0 ↔ ∞) is non-decreasing in p.
The critical probability is defined
pc = inf{p : µp (0 ↔ ∞) > 0}.
The exact critical point for the hypercubic lattice is only known for d = 2; this
case is particularly amenable to analysis due to the ‘self-duality’ of the planar
lattice.
1.3
Gibbs distributions
Suppose a system can be described by a configuration σ ∈ Σ with Σ finite. Let H
be the energy function, or Hamiltonian, H : Σ → R. Let T be the temperature
of the system and take β = (kB T )−1 to be the inverse-temperature; the constant
kB is Boltzmann’s constant. Define the partition function,
X
Z(β) =
e−βH(σ) .
σ
3
1.3 Gibbs distributions
Figure 1.2: Percolation with p = 0.48, 0.50 and 0.52 on the 20 × 20 torus and the
200 × 200 torus. The largest open cluster is black, the other open edges are grey.
4
1.3 Gibbs distributions
Then the Gibbs distribution Gβ is defined by
Gβ (σ) =
e−βH(σ)
.
Z(β)
We are abusing notation by using Gβ to mean both a probability measure and the
corresponding mass function. The motivation for the measure is as follows [45].
Assume now for simplicity that H is an integer valued function. As Σ is finite
it does not have interesting statistical properties by itself. Consider a canonical
ensemble of N copies of the system, with states σ (1) , . . . , σ (N ) . The total energy
of the ensemble is
N
X
Etot =
H σ (i) .
i=1
Imagine that the copies have been weakly interacting in some way that conserves
the total energy, and have reached an equilibrium state. Let
X
X
Ω(Etot ) =
···
δEtot ,Etot ,
σ (1)
σ (N )
counting the number of configurations of the canonical ensemble with total energy Etot . The basic assumption of statistical mechanics is that all compatible
configurations are equally likely once the system has reached equilibrium. Such
systems are said to have Maxwell–Boltzmann statistics. This gives
δE ,E
P σ (1) , . . . , σ (N ) | Etot = tot tot .
Ω(Etot )
Take the limit N → ∞ with E = Etot /N a fixed integer. We want to show that
the marginal distribution of one system converges to a Gibbs distribution. If
p(·) = lim P σ (1) = · | Etot = N E ,
N →∞
then the energy of p is
P
σ
p(σ)H(σ) = E. Define entropy,
X
S(p) = −
p(σ) log p(σ).
σ
Let δ : (0, ∞) × [0, ∞) → R be given by
d
δ(x, y) = y log y − x log x − (y − x)
t log t t=x ≥ 0.
dt
5
1.4 The Ising and Potts models
Take inverse temperature β such that Gβ has energy E,
E=−
∂
log Z(β).
∂β
For all distributions p with energy E,
X
S(Gβ ) − S(p) =
δ(Gβ (σ), p(σ)) ≥ 0.
σ
Therefore Gβ is the maximum entropy distribution. This implies,
Gβ (σ) = lim P(σ (1) = σ | Etot = N E).
N →∞
1.4
The Ising and Potts models
The Ising model was suggested by Lenz in 1920, and first analysed by his student
Ising in 1925 [39]. It is a probabilistic model for ferromagnetism. Magnetization
can be explored as correlation between simple random variables on a graph. Ising
analysed the model on the set of integers with nearest neighbour interactions. At
finite temperature, the Ising model is simply an ergodic two state Markov Chain—
there is no long range order. Certain associated calculations may be done exactly
on the d = 2 hypercubic lattice with uniform nearest neighbour interactions and
no external magnetic field [45, 47]. There are many similar models for magnetism
[2].
Let G = (V, E) be a graph. A spin model consists of a set of spins S that
each vertex can take, and a Hamiltonian function H : Σ → R from the set of
configurations Σ = S V to energy values. Hamiltonians are often written in an ill
defined way, as sums that do not converge on unbounded graphs. However, if two
configurations σ1 , σ2 ∈ Σ only differ at a finite number of vertices, it is standard
to interpret the sums in such a way that H(σ1 ) − H(σ2 ) is well-defined.
Let kB be Boltzmann’s constant and T ∈ (0, ∞) be the temperature. Define
inverse temperature β = (kB T )−1 . The Ising model has spin states S = {+1, −1}.
Let J ∈ R be the interaction strength and let h ∈ R be the strength of the external
magnetic field. The Hamiltonian H : Σ → R is given by
X
X
H(σ) = −J
σ(x)σ(y) − h
σ(x), σ ∈ Σ = S V .
x∈V
hx,yi∈E
6
1.4 The Ising and Potts models
The first sum is over the set of unordered edges and the second is over the set
of vertices. If J > 0 the model is called ferromagnetic; the spins are positively
correlated. This is easily seen using the random-cluster representation that will
be introduced in the next section. An important special case is h = 0, the Ising
model with no external magnetic field.
If G is finite we can immediately define the Ising measure π = πG,βJ,βh by the
Gibbs formalism,
1
π(σ) = I exp(−βH(σ)), σ ∈ Σ,
Z
with
X
I
Z I = ZβJ,βh
(G) =
exp(−βH(σ))
σ∈Σ
I
the necessary normalizing constant. Z is called the Ising partition function.
If G is infinite, constructing the Ising measure is not quite so straightforward.
Define Λ = (V, E) a region of G. Let V ⊂ V be a finite set of vertices. Let ∂V
be the external vertex boundary of V , the vertices in V \ V that are adjacent to a
member of V . Let E be the subset of edges E with both end vertices in V ∪ ∂V .
Note (V ∪ ∂V, E) is a subgraph of G. Let ξ ∈ Σ, call ξ the boundary conditions.
Let ΣξΛ be the set {σ ∈ Σ : σ(x) = ξ(x) for x ∈ V \ V }. The Ising measure on Λ
with boundary conditions ξ is defined by,




X
X
1
ξ
σ(x)σ(y) + βh
σ(x) , σ ∈ ΣξΛ .
πΛ,βJ,βh (σ) = I exp βJ


Z
x∈V
hx,yi∈E
I
Again Z I = ZβJ,βh
(ξ, Λ) is the required normalizing constant. There are two
standard ways to define the Ising model on an infinite graph G. They are both
measures on (Σ, F) where F is the σ-algebra generated by the product topology
on Σ = S V .
The first is as the infinite-volume limit of finite Ising measures. Let (Λn ) be an
increasing sequence of regions that tend towards G as n → ∞. WβJ,βh is defined
as the set of accumulation points of the sequences of measures πΛξ n ,βJ,βh . There is
no loss of generality in using ξ instead of a series of boundary conditions (ξn ), see
[33]. Write co WβJ,βh for the convex closure of WβJ,βh . If, for some ξ ∈ Σ and all
increasing sequences (Λn ) that tend to G there is a single accumulation point, we
7
1.5 The random-cluster measure
ξ
can define that point to be πβJ,βh
. We can also define the infinite-volume limit
with free boundary conditions. Let (Λn ) be an increasing sequence of subgraphs
that tend to G, and consider the accumulation points of the measures πΛn ,βJ,βh .
The second is the Dobrushin-Lanford-Ruelle (DLR) formalism. Let TΛ be the
σ-algebra generated by the states of vertices outside Λ. A DLR measure is a
measure π such that for π-almost every ξ ∈ Σ, for all regions Λ,
ξ
(A),
π(A | TΛ )(ξ) = πΛ,βJ,βh
∀A ∈ F.
The Ising model can be seen as the ‘q = 2’ case of the Potts model [49]. For
q ∈ {1, 2, . . . } the q-state Potts model has spin states S = {1, 2, . . . , q}. Let
J ∈ R be the interaction strength. The Hamiltonian is given by
H(σ) = −J
X
δσ(x),σ(y) ,
σ ∈ Σ = S V.
hx,yi∈E
This defines measures for inverse temperature β on graph G, in the same ways
as the Ising model. The Potts model may be viewed as a generalization of the
h = 0 Ising model. The −1, +1 Ising states are represented in the q = 2 Potts
model by states 1, 2.
1.5
The random-cluster measure
A key step in the understanding of the Ising and Potts models was the development of the random-cluster model by Fortuin and Kasteleyn [25]. It provides a
connection between bond percolation and the magnetic spin models. For a recent
account of the random-cluster measure see [30, 34]. It is an edge model like bond
percolation. Edges are open or closed, maximal set of vertices connected by open
edge paths are called clusters.
Let G = (V, E) be a finite graph. Let Ω = {0, 1}E . For ω ∈ Ω, let k(ω)
be the number of ω-open clusters. The random-cluster measure with parameters
p ∈ (0, 1), q ∈ (0, ∞) is given by,
Y p ω(e)
1
k(ω)
µ(ω) = RC
q
Zp,q (G)
1−p
e∈E
8
1.5 The random-cluster measure
with normalizing constant, the random-cluster partition function,
RC
(G)
Zp,q
=
X
q
k(ω)
ω∈{0,1}E
Y p ω(e)
.
1−p
e∈E
(1.1)
If q = 1 this is just a product measure with density p. If q > 1 then the measure favours having more clusters and so the edge density is driven down. The
reverse holds for q < 1. This is a generalisation of percolation with a dependence structure. As described in the next section, measure µ has the property of
monotonicity if q ≥ 1.
The density of an edge e, conditional on the states of the other edges, takes
a particularly simple form. Let Je be the event edge e is open and let Te be the
σ-algebra generated by {Jf : f 6= e}. The value of µ(Je | Te ) depends on whether
or not the edges E \ {e} contain an open path between the end vertices of e,
(
p
if the end vertices of e are joined,
(1.2)
µ(Je | Te ) =
p
otherwise.
p+q(1−p)
The random-cluster measure can also be defined with boundary conditions on a
region Λ = (V, E) of an infinite graph G. The number of clusters k(ω) is then
taken to be the number of open edge clusters of G that intersect V ∪ ∂V .
The random-cluster model is naturally coupled to the Potts model. Let
G = (V, E) be a finite graph with q-state Potts measure φ at inverse temperature β, and (p, q) random-cluster measure µ. If p = 1 − e−βJ there is a measure
π on {1, . . . , q}V × {0, 1}E with the following properties.
(i) The marginal measure on {1, . . . , q}V is φ.
(ii) The marginal measure on {0, 1}E is µ.
(iii) Given the edge configuration, spins are constant on connected clusters. Between the different clusters, the spins are independent. The common spin
of each cluster is uniformly distributed.
(iv) Given the vertex spins, an edge hx, yi is open with probability p if the spins
at x and y agree, and closed otherwise.
9
1.6 The FKG inequality and Holley’s theorem
We get the following formula relating long range connectivity in the randomcluster model and long range correlation in the Potts models. The two-point
correlation function for the Potts model is,
τ (x, y) = π(σx = σy ) − q −1 = (1 − q −1 ) φ(x ↔ y).
In Chapter 2 we consider a variant of the Potts model, the Blume–Capel model.
We analyse the Blume–Capel model by extending the coupling between the Potts
model and the random-cluster model.
1.6
The FKG inequality and Holley’s theorem
Let Ω be a partially ordered set. Ω is called a lattice if for all ω1 , ω2 ∈ Ω, there
exists a least upper bound ω1 ∨ ω2 ∈ Ω, and a greatest lower bound ω1 ∧ ω2 ∈ Ω.
A lattice is distributive if for all ω1 , ω2 , ω3 ∈ Ω,
ω1 ∧ (ω2 ∨ ω3 ) = (ω1 ∧ ω2 ) ∨ (ω1 ∧ ω3 ),
ω1 ∨ (ω2 ∧ ω3 ) = (ω1 ∨ ω2 ) ∧ (ω1 ∨ ω3 ).
Let S ⊂ R and let Ω = S I for a set I. Then Ω is a distributive lattice with
respect to the coordinate-wise partial order: ω1 ≤ ω2 if ∀i ∈ I, ω1 (i) ≤ ω2 (i). For
ω1 , ω2 ∈ Ω,
(ω1 ∨ ω2 )(i) = max{ω1 (i), ω2 (i)},
(ω1 ∧ ω2 )(i) = min{ω1 (i), ω2 (i)}.
A set A ⊂ Ω is called increasing if it is closed upwards with respect to the partial
order,
∀ω1 , ω2 ∈ Ω : ω1 ≤ ω2 and ω1 ∈ A → ω2 ∈ A.
Let F be the set of all subsets of Ω. Again we abuse notation by using µ to mean
both a measure on (Ω, F), and the corresponding mass function from Ω to [0, 1].
A probability measure µ is positively-associated if for all increasing A, B ⊂ Ω,
µ(A ∩ B) ≥ µ(A)µ(B).
10
1.6 The FKG inequality and Holley’s theorem
An early result in percolation theory is that product measures on sets {0, 1}I are
positively-associated [37]. Motivated by the Potts measure, Fortuin, Kasteleyn
and Ginibre showed the following [26].
Theorem 1.3. Let µ be a probability measure on a finite distributive lattice Ω.
If for all ω1 , ω2 ∈ Ω,
µ(ω1 ∧ ω2 )µ(ω1 ∨ ω2 ) ≥ µ(ω1 )µ(ω2 )
(1.4)
then µ is positively-associated.
Inequality (1.4) is known as the FKG lattice condition. Holley proved a
related result [38]. We actually state a slight generalisation, see for example [50].
Given measures µ1 , µ2 on Ω say that µ1 is stochastically dominated by µ2 , written
µ1 ≤st µ2 , if for all increasing A ⊂ Ω, µ1 (A) ≤ µ2 (A).
Theorem 1.5. Let Ω be a finite distributive lattice. Let µ1 , µ2 be two measures
on (Ω, F) such that for all ω1 , ω2 ∈ Ω,
µ1 (ω1 ∧ ω2 )µ2 (ω1 ∨ ω2 ) ≥ µ1 (ω1 )µ2 (ω2 ).
Then µ1 ≤st µ2 .
We will also define strong positive-association and monotonicity. We call µ
positive if for all ω ∈ Ω, µ(ω) > 0. Let µ be a positive measure on a finite
distributive lattice Ω = S I . Let J ⊂ I. Define ΩJ = S J and for ξ ∈ Ω let
ΩξJ = {ω ∈ Ω : ω(i) = ξ(i) for i ∈ I \ J}.
Let µξJ be the measure µ, conditioned on the coordinates in I \ J taking the
corresponding values from ξ. Let Xi be the random variable Xi = ω(i). For
η ∈ ΩJ ,
µξJ (η) = µ(Xj = η(j) for j ∈ J | Xi = ξ(i) for i ∈ I \ J).
We say that µ is strongly positively-associated if for all J ⊂ I and for all ξ ∈ Ω,
µξJ is positively associated. We say that µ is monotonic if for J ⊂ I and for all
increasing A ⊂ Ω, µξJ (A) is increasing in ξ.
If Ω = {0, 1}I finite and µ is a positive measure on (Ω, F) then these results
take a particularly simple form [34].
11
1.7 Influence and sharp threshold
Theorem 1.6. The following are equivalent:
(i) µ satisfies the FKG lattice condition (1.4).
(ii) µ is strongly positively-associated.
(iii) µ is monotonic.
1.7
Influence and sharp threshold
Another type of random graph is due to Erdős and Rényi [21]. The Erdős–Rényi
graph G(n, p) is a bond percolation model on the complete graph. Starting with
the graph of n vertices and all possible n2 edges, edges are open with probability
p and otherwise closed. An alternative model is G(n, M ). All M -edge subgraphs
n
of the complete graph are assigned probability 1/ M
. Many properties of Erdős–
Rényi graphs have sharp thresholds. It is natural to consider the limit n → ∞
with p = c/n for constant c. If c < 1 then the largest connected cluster has size
Θ(log n). If c > 1 then there is a unique giant cluster of size Θ(n), and the second
largest cluster has size Θ(log n).
A concept that has proved to be closely related to sharp-thresholds is that of
influence. Ben-Or and Linial [3] considered influence in the context of computer
science. They were studying coin tossing algorithms involving fair coins. How
important is any given coin? For N ∈ N let Ω = {0, 1}N , the N -dimensional
Hamming space. Recall that Ω has the coordinate-wise partial order. Let µ be the
uniform measure on Ω, so the coordinates are independent Bernoulli 1/2 random
variables. We will use the following notation to manipulate single coordinates of
vectors in Ω.
(
1
i=j
ω j (i) =
ω(i) i =
6 j
(
0
i=j
ωj (i) =
ω(i) i =
6 j
For a Boolean function f : Ω → {0, 1} define the influence of the j-th coordinate,
If (j) = µ({ω ∈ Ω : f (ωj ) 6= f (ω j )}).
(1.7)
Equivalently, for A ⊂ Ω, let f be the indicator function of A, so we have influence
IA (j) = If (j). Clearly if µ(f ) is very close to either 0 or 1, then the influence of
12
1.7 Influence and sharp threshold
every coordinate can be small. To avoid this, Ben-Or and Linial asked how small
the maximum influence could be be if µ(f ) = 1/2.
In this context, to study functions that minimize influence it is sufficient to
consider only increasing functions. Let f be a general Boolean function. Then
by Proposition 2.1 of [3], there is an increasing Boolean function g such that
µ(f ) = µ(g),
∀j, If (j) ≥ Ig (j).
This can be shown using a technique from extremal set theory. The values of
f on pairs (ωi , ω i ) where f (ωi ) > f (ω i ) can be swapped, making the function
‘more’ monotonic without increasing the coordinate-wise influences.
Let f : Ω → {0, 1} be a Boolean function such that µ(f ) = 1/2. Consider for
example f (ω) = ω(i), the dictatorship of coordinate i. Then If (j) = 1 if i = j
and If (j) = 0 otherwise. An example with much smaller maximum influence is
the ‘majority’ function. Suppose for convenience that N is odd. Set f (ω) = 1 if
more that half of the ω(i) are 1. The influence of each coordinate is Θ(N −1/2 ),
the probability exactly half of the other N − 1 coordinates take value 1.
They found they could do better, giving the ‘tribe’ function with maximum influence Θ(N −1 log N ). The N coordinates are split into approximately N/ log2 N
tribes of size log2 N − log2 log2 N + O(1). Take f (ω) = 1 when at least one tribe
is unanimously 1. The influence of a coordinate is the probability that all the
other members of the tribe take value 1, probability Θ(N −1 log N ), and that none
of the other tribes is unanimously 1, probability Θ(1). They conjectured that for
all f with µ(f ) = 1/2, for some j, If (j) = Ω(N −1 log N ). This was proved by
a set of authors known as KKL [41]. Their proof uses the observation that the
influences for increasing function f , equation (1.7), are Fourier coefficients of the
function on the group Zn2 . They actually prove a more general result, without
the restriction µ(f ) = 1/2. One can take C = 1/5 below.
Theorem 1.8 (KKL). There is an absolute constant C > 0 such that for all
Boolean functions f ,
∃j, If (j) ≥ C min{µ(f ), 1 − µ(f )}
13
log N
.
N
1.8 Microcanonical ensemble Markov chains
This definition of influence can be extended to more general product measure, see [27]. Let Ω = [0, 1]N and let λ be the uniform measure on Ω. For
x = (x1 , . . . , xN ) and j ∈ {1, . . . , N }, define the set
Sj (x) = {y ∈ Ω : y(i) = x(i) for i 6= j}.
This is the subset of Ω generated by freeing the j-th coordinate. Given a random
variable f : [0, 1]N → {0, 1}, define the influence of coordinate j on f ,
If (j) = λ({x : f not constant on Sj (x)}).
The result of KKL has been extended to this setting by a superset of authors
known as BKKKL. We use this result to study influence and sharp thresholds for
monotonic measures in Chapter 3.
1.8
Microcanonical ensemble Markov chains
We have introduced the Gibbs formalism and shown how Gibbs measures correspond to maximum entropy distributions. This has aroused some interest in
Markov chain models for microcanonical ensembles, such as the Ehrenfest urn
model [20, 44]. In Boltzmann’s H theory, entropy always increases. Markov
chain models often have a seemingly paradoxical property. It seems natural that
they should be time reversible. However, then by ergodicity any function of the
system that goes up must also come down.
In the Ehrenfest urn model there is an urn of N coloured balls, each either red
or green. A ball is picked at random and replaced with a ball of the other colour.
To ensure aperiodicity, we will consider a lazy version: each step do nothing
with probability one half. The equilibrium distribution is uniform on the set
{red, green}N , the system has Maxwell–Boltzmann statistics [40]. If the system
is started from a highly ordered state, say all green balls, after time O(N log N )
it is in a disordered state, with an approximately equal number of red and green
balls [19, 48].
In Chapter 4 we study a different Markov chain model. We start with N
boxes each containing R indistinguishable balls. The process evolves as follows.
14
1.8 Microcanonical ensemble Markov chains
Pick two boxes uniformly at random. If the first box has any balls in, move one
ball to the second box. It is a discrete time ‘mean-field’ version of the zero-range
process [53]. A zero-range process is defined on a directed graph. For simplicity
I will only describe a finite one-dimensional version. Consider N boxes arranged
in a circle with edges oriented clockwise. Distributed among the boxes are balls.
Balls move along the directed edges. The rate at which balls leave a box may
vary by location and box occupancy. The problem is hard to analyse; the model
exhibits a phenomenon known as condensation where a large fraction of all the
balls end up in one box [22].
We consider a zero-range process on the complete graph; there are directed
edges in both directions between each pair of boxes. The process can be interpreted as a model for a microcanonical ensemble. Think of the boxes as particles
and the balls as quanta of energy. The equilibrium distribution has Maxwell–
Boltzmann statistics.
15
Chapter 2
Blume–Capel model
2.1
The Blume–Capel model
We have introduced the Ising model for ferromagnetism. Each vertex takes spin
+1 or −1. The Blume–Capel model is a generalization of the Ising model, formulated by Blume in 1966 [7]. The set of spin states is S = {+1, 0, −1}. The
Blume–Capel model is defined on a graph G = (V, E) by the Hamiltonian,
X
X
X
H(σ) = −J
σ(x)σ(y) + D
σ(x)2 − h
σ(x), σ ∈ Σ = S V .
hx,yi∈E
x∈V
x∈V
This can be considered a dilution of the Ising model in the following sense. Let
G be a finite graph. For a configuration σ ∈ Σ, define the undeleted graph to
be Gσ = (Vσ , Eσ ), Vσ = {x : σ(x) 6= 0} and Eσ = {hx, yi ∈ E : x, y ∈ Vσ }.
Condition the Blume–Capel measure on the event that the undeleted graph takes
a particular value; the resulting measure is the Ising measure on the undeleted
graph.
A first order phase transition is one where a property of the model, such as
the density of +1 spins, is discontinuous as a function of the parameters. A
second order phase transition is one without this phenomenon. Capel studied
the ferromagnetic model (J > 0) by mean field approximations [14, 15, 16]. He
predicted that in the case h = 0, on a lattice with degree δ, there is a first order
phase transition when 31 Jδ log 4 < D < 12 Jδ. He predicted a second order phase
transition when D < 13 Jδ log 4 and no long range order for any temperature when
16
2.1 The Blume–Capel model
D > 21 Jδ. It is believed that even on two dimensional lattices there is a firstorder phase transition. It is predicted that there is a tri-critical point at which
the phase transition goes from first-order to second-order.
Some of the results we will show for the Blume–Capel model are in fact true
in a more general setting. The generalization of the Ising model from two states
to the q-state Potts model motivates an expansion of the Blume–Capel model
with h = 0. The Blume–Capel zero state is kept, but the ±1 states are replaced
by states 1, 2; and more generally 1, . . . , q.
Define the Blume–Capel–Potts (BCP) Hamiltonian as follows. Let q be a
positive integer, let S = {0, 1, . . . , q} and let Σ = S V . Again, define the deleted
graph by Vσ = {x : σ(x) 6= 0} and Eσ = {hx, yi ∈ E : x, y ∈ Vσ }. Let
H(σ) = J|Eσ | − 2J
X
δσ(x),σ(y) + D|Vσ |,
σ ∈ Σ.
hx,yi∈Eσ
Let K = βJ, ∆ = βD. Let Λ = (V, E) be a region of G and let ξ ∈ Σ. Define
ΣξΛ = {σ ∈ Σ : σ(x) = ξ(x) for x 6∈ V }. Let Vσ = Vσ ∩ V and Eσ = Eσ ∩ E.
Define the BCP measure on region Λ by


X
1
ξ
δσ(x),σ(y) − ∆|Vσ | ,
πΛ,K,∆,q
(σ) = BCP exp −K|Eσ | + 2K
Z
σ ∈ ΣξΛ .
hx,yi∈Eσ
It is easy to see that the q = 2 case is equivalent to the Blume–Capel measure.
In Chapter 1 we described a coupling of the Potts spin model and the randomcluster bond model. In Section 2.4, we demonstrate a random-cluster representation for the Blume–Capel–Potts model—we call it the diluted-random-cluster
model. Paths of open edges in the random-cluster model correspond to order
amongst the spins of the Potts model. Likewise, paths of open edges in the
diluted-random-cluster model correspond to order amongst the non-zero spins of
the BCP model. We say that it is a diluted version of the random-cluster model.
Some vertices, and the adjacent edges, are removed—they correspond to zero
spins in the BCP model.
In Section 2.5 we show that the distribution of vertices deleted in the dilutedrandom-cluster model is monotonic. We also give various comparison results in
Sections 2.5 and 2.6, indicating how the diluted-random-cluster measure varies
17
2.2 Notation
with the parameters of the model and with the boundary conditions. The comparison results relating to the boundary conditions allow us to define infinite-volume
limit measures in Section 2.7, extending the coupling to the hypercubic lattice.
In Section 2.8 we use the convexity of the BCP partition function to prove a
uniqueness result. If q = 2, the diluted-random-cluster measure on the hypercubic lattice is independent of the boundary conditions except on a null subset of
the parameter space. In Section 2.9 we use the coupling and comparison results
to draw conclusions about the phase diagram of the q = 2 Blume–Capel–Potts
model on the d = 2 hypercubic lattice.
2.2
Notation
Let d ≥ 2. Let Z = {. . . , −1, 0, 1, . . . }. For x, y ∈ Zd define distance
l(x, y) =
d
X
|xi − yi |.
i=1
Then Ld = (Zd , Ed ), the d-dimensional hypercubic lattice, is the nearest neighbour graph on Zd ,
Ed = {hx, yi : x, y ∈ Zd , l(x, y) = 1}.
It is regular, every vertex has degree δ = 2d. Ld is simple, it has no loop edges
or parallel edges. If V ⊂ Zd then the boundary ∂V is set of vertices in Zd \ V
adjacent to a member of V . The corresponding region Λ = (V, E) of Ld contains
the edges with both end vertices in V ∪ ∂V . Write Λn → Ld if (Λn ) is a sequence
of regions with vertices tending to Zd .
If ω ∈ Ω = {0, 1}I for some set I, define configurations ω x , ωx for x ∈ I by
(
(
1
x
=
y,
0
x = y,
ω x (y) =
ωx (y) =
ω(y) otherwise,
ω(y) otherwise.
For convenience when manipulating multiple coordinates, we will write ω x,y meaning (ω x )y . We use the Kronecker-delta symbol,
(
1 if x = y,
δx,y =
0 otherwise.
18
2.3 FKG and Holley’s theorem
2.3
FKG and Holley’s theorem
In Chapter 1 we stated the FKG theorem and Holley’s theorem [26, 38]. They are
useful tools in the study of the random-cluster model. In this section we will give
two helpful additional results [34, 35]. Let I be a finite set and let Ω = {0, 1}I .
Let (Ω, F) be the σ-algebra with F the set of all subset of Ω. We call probability
measure µ positive if µ(ω) > 0 for all ω ∈ Ω.
Theorem 2.1. Let µ be a positive probability measure on (Ω, F) such that for all
i, j ∈ I, for all ω ∈ Ω with ω(i) = ω(j) = 0,
µ(ω)µ(ω i,j ) ≥ µ(ω i )µ(ω j ).
Then µ satisfies the FKG lattice condition (1.4). By the FKG theorem it is
monotonic and strongly positively-associated.
Theorem 2.2. Let µ1 , µ2 be positive probability measures on (Ω, F) such that µ1
is strongly positively-associated. Suppose that for all ω ∈ Ω with ω(i) = 0,
µ1 (ω)µ2 (ω i ) ≥ µ1 (ω i )µ2 (ω),
and for all ω ∈ Ω with ω(i) = ω(j) = 0,
µ1 (ω)µ2 (ω i,j ) ≥ µ1 (ω i )µ2 (ω j ).
Then
∀ω, ω 0 ∈ Ω : µ1 (ω ∧ ω 0 )µ2 (ω ∨ ω 0 ) ≥ µ1 (ω)µ2 (ω 0 ),
so by Holley’s theorem µ1 ≤st µ2 .
Let G = (V, E) be a finite graph, let p ∈ (0, 1) and q ≥ 1. The (p, q) randomcluster measure on G satisfies the FKG lattice condition and hence is monotonic.
The more edges are open, the more closed edges can be opened without reducing
the number of clusters.
19
2.4 The diluted-random-cluster representation of the BCP model
2.4
The diluted-random-cluster representation
of the BCP model
In the same sense that the BCP model is a dilution of the Potts model, the
measure we now define is a diluted version of the random-cluster measure. The
parameters of the model are a ∈ [0, 1], p ∈ [0, 1) and q ∈ (0, ∞). Let G = (V, E)
be a graph. Let Ψ = {0, 1}V be the set of vertex configurations and let Ω = {0, 1}E
be the set of edge configurations. For ψ ∈ Ψ, call vertex x open if ψ(x) = 1 and
closed otherwise. Define the undeleted graph Gψ = (Vψ , Eψ ) by
Vψ = {x ∈ V : ψ(x) = 1},
Eψ = {hx, yi ∈ E : x, y ∈ Vψ }.
Call the edges not in Eψ deleted. Let ω ∈ Ω. Call edge e open if ω(e) = 1 and
closed otherwise. If all deleted edges are closed, say that ψ and ω are compatible.
Define Θ = {(ψ, ω) : ψ, ω are compatible}. Let Λ = (V, E) be a region, and let
λ = (κ, ρ) ∈ Θ. Let ΘλΛ = {θ ∈ Θ : θ = λ off Λ}, the set of configurations that
agree with boundary conditions λ on the vertices V \ V and edges E \ E. Let
Vψ = Vψ ∩ V and Eψ = Eψ ∩ E.
Two open vertices are connected, written x ↔ y, if they are joined by a path of
open edges. A cluster of the graph is a maximal set of connected open vertices. Let
k(θ) denote the number of clusters of the graph that include a vertex in V ∪ ∂V ;
do not count isolated closed vertices. For now assume a ∈ (0, 1), p ∈ [0, 1).
√
For notational convenience, set r = 1 − p. Define the (a, p, q) diluted-randomcluster (DRC) measure with boundary conditions λ on Λ. For θ = (ψ, ω) ∈ ΘλΛ ,
φλΛ,a,p,q (θ)
=
1
Z DRC
r
|Eψ | k(θ)
q
Y a ψ(x) Y p ω(e)
1−a
1−p
x∈V
e∈E
ψ
DRC
and φλΛ,a,p,q (θ) = 0 otherwise. Here Z DRC = Za,p,q
(λ, Λ) is a normalizing constant.
The above formula may be interpreted with a = 0 and a = 1. If a = 0, set all
vertices and edges closed. If a = 1, require all vertices be open, with edges are
distributed according to the normal (p, q) random-cluster measure.
The motivation for this definition is provided by the following coupling. Let
q be a positive integer and let s ∈ {0, 1, . . . , q}. We will abuse notation by using
20
2.4 The diluted-random-cluster representation of the BCP model
s to represent a BCP spin configuration: the member of Σ such that every vertex
has spin integer s. If s = 0 let b = 0; if s ∈ {1, . . . , q} let b = 1. Again we
will abuse notation, this time by taking b to represent a DRC configuration: the
element of Θ that assigns each vertex and edge state b. Let S be the set of triples
of spin, vertex and edge configurations, (σ, ψ, ω) ∈ Σ × Ψ × Ω, such that
(i) σ ∈ ΣsΛ ,
(ii) (ψ, ω) ∈ ΘbΛ ,
(iii) for all x ∈ V, σ(x) = 0 if and only if ψ(x) = 0, and
(iv) for all e = hx, yi ∈ E, σ(x) 6= σ(y) implies ω(e) = 0.
So S is the set of configuration that have Eσ = Eψ and such that the σ-spins on
(ψ, ω)-clusters are constant. Define measure µ with support S and parameters,
a
p = 1 − e−2K ,
= e−∆
(2.3)
1−a
by

ω(e)
ψ(x) Q
 −1 |Eψ | Q
p
a
(σ, ψ, ω) ∈ S,
Z r
e∈Eψ 1−p
x∈V 1−a
µ(σ, ψ, ω) =
0
otherwise.
s
Theorem 2.4. Measure µ couples measures π = πΛ,K,∆,q
and φ = φbΛ,a,p,q where
a, p, K, ∆ satisfy (2.3).
Proof. Let σ ∈ Σ. This uniquely determines vertex configuration ψ and the set
of edges that may be open, Aσ = {hx, yi ∈ Eσ : σ(x) = σ(y)}. Then


X
1
π(σ) = BCP exp −K|Eσ | + 2K
δσ(x),σ(y) − ∆|Vσ |
Z
hx,yi∈Eσ
1
e−K|Eψ | e−∆|Vσ | e2K|Aσ |
ψ(x) Y 1 |Eψ | Y
a
p
= BCP r
1+
Z
1−a
1−p
x∈V
e∈Aσ
X
Y a ψ(x) Y p ω(e)
1
|Eψ |
= BCP
r
Z
1−a
1−p
x∈V
e∈E
=
Z BCP
ω:(σ,ψ,ω)∈S
=
Z
Z BCP
X
ψ
µ(σ, ψ, ω).
ω:(σ,ψ,ω)∈S
21
2.4 The diluted-random-cluster representation of the BCP model
By summing over σ, Z = Z BCP and µ has first marginal π.
Now fix θ = (ψ, ω) ∈ ΘbΛ . There are q k(θ)−b BCP configurations σ ∈ ΣsΛ such
that (σ, ψ, ω) ∈ S; there are q possible spins on each of the k(θ) − b open clusters
that do not meet the boundary.
φ(ψ, ω) =
1
Z DRC
r
|Eψ | k(θ)
q
Y a ψ(x) Y p ω(e)
1−a
1−p
x∈V
e∈E
ψ
=
Z
Z DRC
qb
X
µ(σ, ψ, ω).
σ
So Z DRC = q b Z and the second marginal is φ.
Note that Z DRC = q b Z BCP . This is used to give a uniqueness result in Section
2.8. Conditional on the BCP configuration, the coupled diluted-random-cluster
edge states are
(i) zero on the deleted edges,
(ii) zero on the undeleted edges hx, yi where σ(x) 6= σ(y), and
(iii) independent Bernoulli p random variables on the other edges.
Conditional on the diluted-random-cluster configuration, the coupled BCP spin
states are
(i) constant and non-zero on clusters,
(ii) zero on deleted vertices,
(iii) equal to the boundary condition s if the cluster meets the boundary, and
(iv) independent between the other clusters, with the spins uniformly distributed
on S = {1, . . . , q}.
Suppose ψ(x) = ψ(y) = 1. If x ↔ y, σ(x) = σ(y). If x 6↔ y then σ(x) and σ(y)
are independent and they agree with probability 1/q. It follows that for q > 1,
π(σ(x) = σ(y) 6= 0) − q −1 π(σ(x)σ(y) 6= 0) = (1 − q −1 )φ(x ↔ y).
(2.5)
This will provide a precise connection between the phase diagrams of the DRC
and BCP models. Existence of large open edge clusters in the DRC model implies
22
2.4 The diluted-random-cluster representation of the BCP model
that non-zero spins in the BCP model are correlated in a non-trivial way over
large distances. Also, the distribution of the set of 0-spins in both models is the
same. The DRC model has an infinite vertex cluster of zero states if and only if
the same is true for the corresponding BCP model.
Unlike the random-cluster measure, the DRC measure is not monotonic. Let
G be the graph with two vertices and one edge, V = {x, y} and E = {e = hx, yi}.
√
Let 0 < a, p < 1 and q ∈ (0, ∞). Set r = 1 − p < 1. Let φ be the (a, p, q) DRC
measure for G. Then
φ(ψ(y) = 1 | ψ(x) = 0, ω(e) = 0) =
qa
qa + 1 − a
≥ φ(ψ(y) = 1 | ψ(x) = 1, ω(e) = 0) =
qar
.
qar + 1 − a
Hence φ is not monotonic with respect to {0, 1}V × {0, 1}E ; the probability that y
is open is not increasing with the other states. However, we will show in Section
2.5 that φ does have the weaker property of monotonicity in the vertex states.
There are some special cases of the diluted-random-cluster measure that are
well understood. We will use them later for comparison purposes. If p = 0, the
edge configuration is zero. The vertex states are independent Bernoulli random
variables with density qa/(1 − a + qa). This corresponds to the K = 0 Blume–
Capel–Potts measure with independent spins.
On graphs with regular degree, the q = 1 BCP and DRC measures may be
viewed as the Ising model. For the BCP model, let η(x) = 2σ(x) − 1. This takes
us from BCP configuration σ ∈ {0, 1}V to Ising configuration η ∈ {+1, −1}V .


X
ξ
πΛ,K,∆,1
(σ) ∝ exp −K|Eσ | + 2K
δσ(x),σ(y) − ∆|Vσ |
hx,yi∈Eσ

= exp −∆

X
σ(x) + K
x∈V
X
σ(x)σ(y)
hx,yi∈E


X
X
Kδ − 2∆ 1
∝ exp 
η(x)
+ K
η(x)η(y)
4
4
x∈V
hx,yi∈E
By the coupling theorem, the same holds for the vertex states of the corresponding
DRC measure.
23
2.5 The vertex DRC marginal measure
2.5
The vertex DRC marginal measure
Let φ = φλΛ,a,p,q be a diluted-random-cluster measure on a region Λ = (V, E) in
G with boundary conditions λ = (κ, ρ). Define vertex DRC measure Φ = ΦλΛ,a,p,q
as the marginal measure of φ on the vertex states Ψ = {0, 1}V ,
X
Φ(ψ) =
φ(ψ, ω).
(ψ,ω)∈Θλ
Λ
Let a ∈ (0, 1), p ∈ [0, 1) and q ∈ [1, 2]. Let r =
√
1 − p. We exclude the cases
a = 0 and a = 1; the corresponding vertex measures are trivial. Assume G is
simple with maximum vertex degree δ. Each vertex, conditional on the states of
the other vertices, is open with probability bounded away from 0 and 1; Φ has
what is known as the ‘finite energy property’ [13].
Theorem 2.6. Let Jx be the event that x ∈ V is open, and let Tx be the σ-field
generated by the states of vertices V \ {x},
qa
qa
≤ Φ(Jx | Tx ) ≤ δ
.
1 − a + qa
r (1 − a) + qa
Theorem 2.7. Φ is monotonic and strongly positively-associated.
The following theorems will be used to show the existence of infinite-volume limit
DRC measures.
1
2
Theorem 2.8. Let λ1 , λ2 ∈ Θ. If λ1 ≤ λ2 then ΦλΛ,a,p,q
≤st ΦλΛ,a,p,q
.
Theorem 2.9. Let Λ1 , Λ2 be two regions in G such that Λ1 is contained in Λ2 ,
Φ0Λ1 ,a,p,q ≤st Φ0Λ2 ,a,p,q ,
Φ1Λ1 ,a,p,q ≥st Φ1Λ2 ,a,p,q .
We can identify a number of situations in which stochastic inequalities hold between versions of the vertex DRC measure with different parameter values.
Theorem 2.10. Let ai ∈ (0, 1), pi ∈ [0, 1) and qi ∈ [1, 2] for i = 1, 2. Let
Φi = ΦλΛ,ai ,pi ,qi for i = 1, 2. Let δ be the maximum degree of G. Then Φ1 ≤st Φ2
if any of the following hold.
(a) That a1 ≤ a2 , p1 ≤ p2 and q1 = q2 .
24
2.5 The vertex DRC marginal measure
(b) That
q2
a2
1 − a2
≥ q1
a1
1 − a1
(1 − p1 )−δ/2 .
(c) That p1 ≤ p2 , q1 ≥ q2 and
a2
a1
δ/2
q2
(1 − p2 ) ≥ q1
(1 − p1 )δ/2 .
1 − a2
1 − a1
p2
≤ q2 (1−p
and
2)
a2
a1
δ/2
q2
(1 − p2 ) ≥ q1
(1 − p1 )δ/2 .
1 − a2
1 − a1
(d) That q1 ≤ q2 ,
p1
q1 (1−p1 )
Let Λ(λ, ψ) be the undeleted graph Gψ with boundary conditions λ on Λ; identify
the vertices in the boundary ∂V joined by open edges of λ outside Λ. Then with
reference to (1.1) we can write,
Φ(ψ) =
1
Z DRC
r
|Eψ |
a
1−a
|Vψ |
RC
Zp,q
(Λ(λ, ψ)).
(2.11)
Lemma 2.12. Let Φi , i = 1, 2, be the vertex DRC measures on Λ with boundary conditions λ ∈ Θ and parameters ai ∈ (0, 1), pi ∈ [0, 1), q1 ∈ [1, ∞) and
q2 ∈ [1, 2]. Suppose that ψ ∈ {0, 1}V , x ∈ V with ψ(x) = 0. Let b be the number
of edges incident to x in Λ(λ, ψ) and let Ix be the event that all the edges incident
to x are closed. Let µG
i be the (pi , qi ) random-cluster measure on graph G.
(i) If
q2
a2
1 − a2
(1 − p2 )b/2
Λ(λ,ψ x )
µ2
(Ix )
≥ q1
a1
1 − a1
(1 − p1 )b/2
Λ(λ,ψ x )
µ1
(2.13)
(Ix )
then
Φ1 (ψ)Φ2 (ψ x ) ≥ Φ1 (ψ x )Φ2 (ψ).
(ii) If y ∈ V \ {x} with ψ(y) = 0, let f ∈ {0, 1} be the number of edges between
x and y. Then
a2
(1 − p2 )(b+f )/2
a1
(1 − p1 )b/2
q2
≥
q
1
x,y
x)
)
1 − a2 µΛ(λ,ψ
1 − a1 µΛ(λ,ψ
(Ix )
(Ix )
2
1
implies
Φ1 (ψ)Φ2 (ψ x,y ) ≥ Φ1 (ψ x )Φ2 (ψ y ).
25
(2.14)
2.5 The vertex DRC marginal measure
The following reduces the work needed to check that the conditions for Lemma 2.12
√
hold. Let ri = 1 − pi , i = 1, 2.
Lemma 2.15. In the context of inequality (2.14),
Λ(λ,ψ x,y )
µ2
Λ(λ,ψ x )
(Ix ) ≤ µ2
(Ix )r2f .
Corollary 2.16. Condition (2.13) also implies (2.14) for all appropriate y.
Proof of Lemma 2.15. Let B and C be the sets of edges joining x and y to ψopen vertices, respectively. Let F be the set of f edges with end vertices x, y.
Let B0 , C0 , F0 be the decreasing events that all the edges in B, C, F are closed,
respectively.
By the monotonicity of the random-cluster model, the probability of Ix is
increased if the edges in C are deleted from Λ(λ, ψ x,y ),
Λ(λ,ψ x,y )
µ2
Λ(λ,ψ x,y )
(Ix ) = µ2
Λ(λ,ψ x,y )\C
(B0 ∩ F0 ) ≤ µ2
(B0 ∩ F0 ).
Notice that
(
Λ(λ,ψ x,y )\C
µ2
(F0 )
=
1
if f = 0
q2 (1−p2 )
p2 +q2 (1−p2 )
if f = 1
)
≤
r2f .
This is by the conditional probability property of the random-cluster model (1.2)
and the elementary inequality,
p
q(1 − p)
≤ 1 − p,
p + q(1 − p)
p ∈ [0, 1], q ∈ [1, 2].
Therefore, deleting edges that have been conditioned closed,
Λ(λ,ψ x,y )\C
µ2
Λ(λ,ψ x,y )\C
(B0 ∩ F0 ) ≤ µ2
Λ(λ,ψ x )
(B0 | F0 )r2f = µ2
(Ix )r2f .
Proof of Lemma 2.12. We will only show part (i), that inequality (2.13) implies
Φ1 (ψ)Φ2 (ψ x ) ≥ Φ1 (ψ x )Φ2 (ψ).
26
2.5 The vertex DRC marginal measure
Part (ii) follows in the same way. Use (1.1) to rewrite (2.13),
(Λ(λ, ψ x )) b
(Λ(λ, ψ x )) b
a2 ZpRC
a1 ZpRC
2 ,q2
1 ,q1
r
≤
r.
1 − a2 ZpRC
(Λ(λ, ψ)) 2 1 − a1 ZpRC
(Λ(λ, ψ)) 1
2 ,q2
1 ,q1
Notice that |Vψx | − |Vψ | = 1 and |Eψx | − |Eψ | = b. Therefore by (2.11) the result
follows.
The proofs of the Theorems 2.7 and 2.10 now follow from Lemma 2.12. In particular Corollary 2.16 means we only have to check that (2.13) holds. Let ai = a,
pi = p and qi = q for i = 1, 2.
Proof of Theorem 2.7. Clearly (2.13) holds with Φ = Φ1 = Φ2 . Now apply Theorem 2.1.
Proof of Theorem 2.10. It is sufficient to check (2.13) in each case. Let x be a
Λ(λ,ψ x )
vertex incident to b edges in Λ(λ, ψ x ). Let µi = µi
for i = 1, 2.
(a) For an edge e, let Je be the event that e is open. By [5] and positive
association,
X
d
1
µ2 (Ix ) =
cov(Ix , Je )
dp2
p2 (1 − p2 ) e∈E
X
1
cov(Ix , Je ).
≤
p2 (1 − p2 ) e∈B
For e ∈ B, Ix and Je are mutually exclusive,
X
d
1
µ2 (Je )
log µ2 (Ix ) ≤ −
dp2
p2 (1 − p2 ) e∈B
b
p2
p2 (1 − p2 ) p2 + q2 (1 − p2 )
b
≤−
as q2 ∈ [1, 2].
2(1 − p2 )
≤−
Integrating from p1 to p2 ,
µ2 (Ix )
(1 − p2 )b/2
.
≤
µ1 (Ix )
(1 − p1 )b/2
As q1 = q2 and a1 ≤ a2 , (2.13) holds and the result follows.
27
2.5 The vertex DRC marginal measure
(b) By the conditional edge densities of the random-cluster measure (1.2),
µ2 (Ix ) ≤ (1 − p2 )b/2
and µ1 (Ix ) ≥ (1 − p1 )b ,
so (2.13) holds.
(c) Under these conditions, µ2 ≥st µ1 so µ2 (Ix ) ≤ µ1 (Ix ).
(d) Again µ2 ≥st µ1 so µ2 (Ix ) ≤ µ1 (Ix ).
Proof of Theorem 2.8. Let λi = (κi , ρi ), i = 1, 2. By the monotonicity of the
vertex DRC measure we may assume κ1 = κ2 . We can adapt the proof of
Λ(λ1 ,ψ x )
Lemma 2.12. By the monotonicity of the random-cluster model, µ1
Λ(λ ,ψ x )
µ2 2 (Ix );
(Ix ) ≥
apply Theorem 2.2.
Proof of Theorem 2.9. Let A be the decreasing event that all vertices in Λ2 \ Λ1
are closed. By the definition of the DRC measure,
Φ0Λ1 ,a,p,q = Φ0Λ2 ,a,p,q ( · | A).
Event A is decreasing so by the monotonicity of Φ0Λ2 ,a,p,q , Φ0Λ1 ,a,p,q ≤st Φ0Λ2 ,a,p,q .
Let λ = (κ, ρ) be a configuration such that off Λ2 , κ = 1 and ρ = 1. Then
Φ1Λ1 ,a,p,q ≥st ΦλΛ1 ,a,p,q .
Notice Φ1Λ2 ,a,p,q can be written as a convex combination of terms on the right hand
side above, so Φ1Λ1 ,a,p,q ≥st Φ1Λ2 ,a,p,q .
Proof of Theorem 2.6. Since q ∈ [1, 2], ΦλΛ,a,p,q is monotonic by Theorem 2.7. As
Jx is increasing, a lower bound for Φ(Jx | Tx ) is obtained by conditioning on all
vertices other than x being closed. In that case, x contributes qa/(1 − a) when
open and 1 when closed; the lower bound follows.
An upper bound is obtained by conditioning on all the vertices other that
x being open, and assuming they are in the same open cluster. This time x
contributes no more than
δ
r q
a
1−a
X
ω∈{0,1}δ
when open, and 1 when closed.
28
ω(i)
δ Y
p
1−p
i=1
2.6 Edge comparisons
2.6
Edge comparisons
In the previous section, we only dealt with the vertex states. In this section we
will give results for the edge states and the full DRC measure. Let Λ = (V, E)
be a region in G. Let λ ∈ Θ, and let φ = φλΛ,a,p,q be the diluted-random-cluster
measure. Define the DRC edge measure by
Υ(ω) =
X
φ(ψ, ω),
ω ∈ Ω.
ψ∈Ψ
In the above sum, ψ that are not compatible with edge configuration ω in Λ and
boundary conditions λ on G \ Λ contribute nothing. Assume G is a simple graph
with maximum vertex degree δ. Let
−1
a
j
.
wj = r q
1−a
This can be thought of as the cost of closing a vertex under φ. Let (ψ, ω) be
a configuration such that ψ(x) = 0. If j is the number of ψ-open neighbouring
vertices of x,
φ(ψ, ω) = φ(ψ x , ω)wj ,
wj ≤ wδ .
Theorem 2.17. Let 0 < a ≤ a0 = 1, p, p0 ∈ (0, 1) and q ∈ [1, ∞). Let Υ = ΥλΛ,a,p,q
and Υ0 = ΥλΛ,a0 ,p0 ,q . If
p
p0
≥
(1 + 2wδ + wδ wδ−1 )
1−p
1 − p0
then Υ ≥st Υ0 .
Theorem 2.18. Let 0 < a < 1, 0 ≤ p < 1 and q ∈ [1, 2]. Let λ ∈ Θ. Let Λ be a
region in G.
(i) φλΛ,a,p,q is increasing in a, p and λ.
(ii) If λ = 0, φλΛ,a,p,q is increasing in Λ
(iii) If λ = 1, φλΛ,a,p,q is decreasing in Λ.
29
2.6 Edge comparisons
Proof of Theorem 2.17. Υ0 is monotonic, it is simply a random-cluster measure.
To apply Theorem 2.2 we must show
Υ0 (ω)Υ(ω e ) ≥ Υ0 (ω e )Υ(ω),
Υ0 (ω)Υ(ω e,f ) ≥ Υ0 (ω e )Υ(ω f ),
∀ω : ω(e) = 0,
(2.19)
∀ω : ω(e) = ω(f ) = 0.
(2.20)
Let ω be an edge configuration from (2.20). By the definition of the randomcluster measure,
Υ0 (ω)
=
Υ0 (ω e )
1 − p0
p0
qk
0
where

1 if e is an isthmus of the (1, ω e )-open graph,
0
e
k = k(1, ω) − k(1, ω ) =
0 otherwise.
Let x and y be the end vertices of e. Let A be the set of vertex configurations compatible with ω e,f , A = {ψ : (ψ, ω e,f ) ∈ Θ}. Write Ax for the set of configurations
in A with vertex x turned off, {ψx : ψ ∈ A}. Then (2.20) becomes,
X
X
1 − p0
k0
e,f
φ(ψ, ω f ).
q
φ(ψ,
ω
)
≥
0
p
ψ∈A∪A ∪A ∪A
ψ∈A
x
y
x,y
This holds if
X
X
1−p k
1 − p0
f
k0
f
q
φ(ψ, ω ) ≥
(1 + 2wδ + wδ wδ−1 )φ(ψ, ω )
q .
p0
p
ψ∈A
ψ∈A
with
k = k(ψ, ω f ) − k(ψ, ω e,f ) ≤ k 0 .
Splitting up the sum, we see that (2.20) holds. The proof of (2.19) is very
similar.
Proof of Theorem 2.18. In each case the statement holds for the vertex component of the measure, see Theorems 2.8, 2.9 and 2.10 (a). Recall that φλΛ,a,p,q has
Λ(λ,ψ)
the same distribution as the random-cluster measure µp,q
generated by ψ ∼
ΦλΛ,a,p,q .
on the graph Λ(λ, ψ)
Let X(ψ, ω) be an increasing random variable defined
30
2.7 Infinite-volume measures
on the vertex and edge states of G. For fixed ψ, define a random-cluster random
variable, Xψ (ω) = X(ψ, ω),
φ(X) = Φ(µΛ(λ,ψ)
(Xψ )).
p,q
The random-cluster measure in increasing in p and Λ(λ, ψ). Xψ is an increasing
Λ(λ,ψ)
random variable, and it is increasing in ψ. Therefore µp,q
(Xψ ) is an increasing
function of ψ.
2.7
Infinite-volume measures
We now are in a position to discuss infinite-volume limit measures. We will work
on the hypercubic lattice, take G = Ld with d ≥ 2. The results easily generalize to
other lattices. Let 0 < a, p < 1 and q ∈ (0, ∞). Let FΘ be the σ-algebra generated
by the weak topology on Θ. Call a probability measure φ on (Θ, FΘ ) a limit
diluted-random-cluster measure with parameters a, p, q and boundary conditions
λ ∈ Θ if it is an accumulation point of a sequence of measures (φλΛn ,a,p,q ) with
Λn → Ld . Let Wa,p,q be the set of such measures; let co Wa,p,q be the closure of
the convex hull of Wa,p,q .
The configuration space Θ is compact. The product Ψ × Ω of discrete topological spaces is compact, and Θ is a closed subset of Ψ × Ω. It is standard by
Prohorov’s theorem [6] that any sequence of measures on Θ must have a fixed
point. Hence Wa,p,q is non-empty. Using the results of the previous three sections, we can show that many results for the q ≥ 1 random-cluster model have
analogues for the DRC model when q ∈ [1, 2]. Let 0 < a, p < 1, q ∈ [1, 2] and
define the DRC model with regular boundary conditions, φba,p,q = limΛ→Ld φbΛ,a,p,q
for b ∈ {0, 1}.
Theorem 2.21. Let φ be a limit DRC measure. Consider the vertex measure Φ
obtained by projecting φ onto the set of vertex configurations.
(i) If φ ∈ Wa,p,q then Φ is positively associated.
(ii) If φ ∈ co Wa,p,q then Φ has a finite energy property.
31
2.7 Infinite-volume measures
By the conditional probability properties of the random-cluster model, if Φ has a
finite energy property then the corresponding edge measure Υ has a finite energy
property. Given the states of all but one edge, that edge is open with probability
bounded away from 0 and 1.
Theorem 2.22. Limit measures φba,p,q are
(i) well defined,
(ii) stochastically increasing in a, p and b,
(iii) extremal in co Wa,p,q ,
(iv) tail trivial, and
(v) translation invariant.
Also φba,p,q have the 0/1-infinite cluster property with respect to both nearestneighbour vertex clusters and edge clusters. The number of infinite clusters of
each type is a.s. constant and equal to either 0 or 1.
Theorem 2.23. If q ∈ {1, 2} and s ∈ {0, 1, . . . , q} then the limit Blume–Capel–
s
s
Potts measure πK,∆,q
= limΛ→Ld πΛ,K,∆,q
exists.
We can also define DLR type DRC measures. Let Ra,p,q be the set of measures
φ such that,
φ(A | TΛ )(λ) = φλΛ,a,p,q (A),
A ∈ FΘ , φ-a.e. λ.
Let x, y ∈ Zd and write 1{x↔y} for the indicator function of the event x is connected to y. A consequence of Theorem 2.22 is that the set of discontinuities of
1{x↔y} is φba,p,q -a.s. null. By this we mean that for almost all configurations θ,
there is a region Λ = Λ(θ) such that x ↔ y is decided inside Λ. If the event
is not determined inside a region Λ0 , then x and y must both be joined to the
boundary ∂Λ0 by open paths that do not meet inside Λ0 . This cannot be the case
for arbitrarily large Λ0 , or x and y are in distinct infinite edge clusters. It follows
from the above that φba,p,q ∈ Ra,p,q .
The following theorem collects some standard results [34]. Positive association
and stochastic orderings are preserved under weak limits.
32
2.7 Infinite-volume measures
Theorem 2.24. Let I be a countable set. Let Ω = {0, 1}I and let F be the σalgebra generated by the product topology.
(a) Let (µn ) and (νn ) be two sequences of probability measure on (Ω, F), with
weak limits µ, ν respectively. If µn ≤st νn for all n then µ ≤st ν.
(b) If measures (µn ) on (Ω, F) converges weakly to µ, and each µn is positively
associated, then so is µ.
(c) Let µ, ν be measures on (Ω, F). Let Ji (ω) = ω(i). If µ ≤st ν and µ(Ji ) = ν(Ji )
for all i ∈ I then µ = ν.
Corollary 2.25. The vertex comparison results of Theorem 2.10 and the edge
comparison result of Theorem 2.17 extend to Ld .
The following lemma is needed in the proof of Theorem 2.23 for s = 1, . . . , q.
Lemma 2.26. Let A be an increasing closed event. Let X be a φ1 -a.s. continuous
random variable. Then
φ1Λ (X1A ) → φ1 (X1A ) as Λ → Ld .
Proof of Theorem 2.21.
(i) Write Φ = limΛ→Ld ΦλΛ,a,p,q . By Theorem 2.7, Φ is the limit of monotonic
measures so the result follows by Theorem 2.24 (b).
(ii) We adapt the proof of Theorem 3.3 in [33]. Suppose first φ ∈ Wa,p,q , say
φ = lim φλΛ .
Λ→Zd
We will write FV for the σ-algebra generated by the states of vertices in
Λ = (V, E) and TV for σ-algebra generated by the states of vertices in Zd \V .
Let Λ0 = (V 0 , E 0 ). By the martingale convergence theorem [36] and weak
convergence,
φλ (Jx | Tx ) = lim φλ (Jx | FV 0 \x )
Λ0 →Zd
= lim lim φλΛ (Jx | FV 0 \x ).
Λ0 →Zd Λ→Zd
33
2.7 Infinite-volume measures
By Theorem 2.6 this limit is bounded away from 0 and 1. The bounds
depend only on a, p and q so the result extends to the convex hull of Wa,p,q
by linearity. If φ ∈ co Wa,p,q we can write φ = limn φn with each φn in the
convex hull of Wa,p,q . Write Iκ,V \x for the cylinder event that the vertices
in V \x take the values from κ. Again a.s. by martingale convergence,
φ(Jx | Tx )(κ) = lim φ(Jx | FV \x )(κ)
Λ→Zd
φ(Jx ∩ Iκ,V \x )
φ(Iκ,V \x )
Λ→Zd
φn (Jx ∩ Iκ,V \x )
= lim lim
.
d
n→∞
φn (Iκ,V \x )
Λ→Z
= lim
The vertex states of φ have a finite energy property.
An event is called a cylinder event if it only depends on the state of a finite
number of vertices and edges. The set of cylinder events is convergence determining [6]; the same is true for just the set of increasing cylinder events.
Proof of Theorem 2.22.
(i) Let A be an increasing cylinder event. Let Λn , ∆n be two increasing sequences of regions such that Λn → Ld , ∆n → Ld . Then by Theorem 2.18,
the sequence (φbΛn ,a,p,q (A)) is monotone. For every Λn there exists an m
such that Λn ⊂ ∆m , and vice versa. Hence φba,p,q (A) is well-defined.
(ii) This follows from Theorem 2.18, take the limit Λ → Ld by Theorem 2.24
(a).
(iii) This follows from Theorem 2.18, take the limit Λ → Ld by Theorem 2.24
(a).
(iv) The following argument is standard [4, 34]. First consider the case b = 0.
Let Λ ⊂ ∆ be regions in Ld . Let TΛ be the σ-field generated by the vertices
and edges of Ld \Λ. Let A be an increasing cylinder event on Λ, and T an
34
2.7 Infinite-volume measures
event on ∆\Λ. Let I0,∆\Λ be the decreasing event that vertices and edges
in ∆\Λ are closed. By Theorem 2.18 (ii),
φ0Λ (A) = φ0∆ (A | I0,∆\Λ ) ≤ φ0∆ (A | T ).
Take the limits ∆ → Zd and then Λ → Zd to get
φ0 (A)φ0 (T ) ≤ φ0 (A ∩ T )
for T in the tail σ-field T =
T
Λ a region of Ld
TΛ . Replace T with T c ,
φ0 (A)φ0 (T c ) ≤ φ0 (A ∩ T c ).
Adding the two inequalities above we get φ0 (A) ≤ φ0 (A). Therefore the
inequalities hold with equality,
φ0 (A)φ0 (T ) = φ0 (A ∩ T ).
This holds for all increasing cylinder events A, and so for all events. Let
A = T , φ0 (T ) = φ0 (T )2 , φ0 is tail-trivial. Similarly for b = 1.
(v) Let A be an increasing cylinder event on Λ. Let τ : Ld → Ld be a translation
of the lattice. Then
φ0 (A) ≥ φ0Λ (A) = φ0τ Λ (τ A) → φ0 (τ A),
Λ → Zd .
The above inequality also holds for τ −1 , so φ(A) = φ(τ A). Similarly for 1
boundary conditions with the inequalities reversed.
The 0/1-infinite cluster property follows from [13] as φb is translation invariant
and the projected measures on vertices and edges have finite energy properties.
Proof of Lemma 2.26. Write [1A ]1Λ (θ) for the indicator function of A evaluated at
θ on Λ with boundary condition 1. As A is increasing and closed, for all θ ∈ Θ,
[1A ]1Λ (θ) ↓ 1A (θ) as Λ → Ld .
35
2.7 Infinite-volume measures
Let regions Λ, Λ0 , Λ00 satisfy Λ0 ⊂ Λ ⊂ Λ00 . By monotonicity, as used in the proof
of Theorem 2.9,
φ1Λ ([1A ]1Λ0 ) ≥ φ1Λ (1A ) ≥ φ1Λ00 ([1A ]1Λ ).
Let Λ00 → Ld . By weak convergence,
φ1Λ ([1A ]1Λ0 ) ≥ φ1Λ (1A ) ≥ φ1 ([1A ]1Λ ).
Take the limit Λ → Ld . By weak convergence and the monotone convergence
theorem,
φ1 ([1A ]1Λ0 ) ≥ lim φ1Λ (1A ) ≥ φ1 (1A ).
Λ→Ld
Now let Λ0 → Ld . Again by the monotone convergence theorem,
φ1 (1A ) ≥ lim φ1Λ (1A ) ≥ φ1 (1A ).
Λ→Ld
Let (Λn ) be a sequence of regions that tend to Ld . The sequence of measures (φ1Λn )
is decreasing so there is a monotonic coupling. Let θ, (θn ) be random variables
such that each θn has law φ1Λn , θ has law φ1 and θn ↓ θ as n → ∞. X is φ1 -a.s.
continuous so by weak convergence X(θn ) → X(θ), φ1 -a.s.. Also 1A (θn ) → 1A (θ),
φ1 -a.s.. By the dominated convergence theorem the result follows.
Proof of Theorem 2.23. By the coupling of the finite BCP and DRC models, and
the existence of limΛ→Ld φbΛ , b = 0, 1, we can show that limΛ→Ld πΛs exists. Let
σ ∈ Σ be a spin configuration. Let A be the BCP cylinder event that the spins in
a region Λ = (V, E) agree with σ. Events such as A are convergence determining.
Let B be the DRC event
(i) for all x ∈ V , ψ(x) = 0 if and only if σ(x) = 0, and
(ii) for all x, y ∈ V , x ↔ y only if σ(x) = σ(y).
We can partition event B into disjoint events according to number of open clusters
that intersect V . Let Bi be the union of B with the event V intersects i open
clusters. Let ∆ be a region containing Λ. Suppose first b = 0 and s = 0. By the
P | −i 0
s
properties of the coupling we can write π∆
(A) = |V
i=0 q φ∆ (Bi ). Take the limit
∆ → Zd .
36
2.8 Uniqueness by convexity of pressure
s
In the case b = 1 and s = 1, . . . , q we must modify the expression for π∆
(A)
to take into account the fixed spin of the boundary cluster. Let
A+ = {∃x ∈ V : σ(x) = s and x ↔ ∞},
A− = {∃x ∈ V : σ(x) 6= s and x ↔ ∞}.
By Lemma 2.26 we can take the limit ∆ → Ld below to give the result,
s
π∆
(A)
=
|V |
X
q −i [φ1∆ (Bi ) + (q − 1)φ1∆ (Bi ∩ A+ ) − φ1∆ (Bi ∩ A− )].
i=0
2.8
Uniqueness by convexity of pressure
Fix q ≥ 1. Except possibly for countably many p, limit random-cluster measures on Ld are unique [33]. The proof uses the convexity of a certain function,
pressure, defined in terms of the random-cluster partition function. We extend
this technique to give a uniqueness result for the q = 2 limit DRC measures
φλ = φλa,p,q on Ld . By the coupling, we also get a uniqueness result for q = 2
ξ
Blume–Capel–Potts limit measures π ξ = πK,∆,q
. In this section, let (2.3) fix the
relationship between a, p and K, ∆.
Theorem 2.27. Let q = 2. The set of points a, p at which φ0a,p,q 6= φ1a,p,q , and
0
1
2
πK,∆,q
6= (πK,∆,q
+ πK,∆,q
)/2, can be covered by a countable collection of rectifiable
curves of [0, 1]2 .
Let Λ = (V, E) be a region in Ld . Take b ∈ {0, 1} to be both a DRC boundary
condition and a BCP boundary condition. By Theorem 2.4 we can define,
1
1
DRC
BCP
log q −b Za,p,q
(b, Λ) =
log ZK,∆,q
(b, Λ)
|V |
|V |
h
i
X
X
1
=
log
exp − K|Eσ | + 2K
δσ(x),σ(y) − ∆|Vσ | .
|V |
σ
GbΛ (K, ∆) =
hx,yi∈Eσ
37
2.8 Uniqueness by convexity of pressure
The sum is over BCP configurations σ that are compatible with boundary condition b. Differentiating GβΛ (K, ∆) with respect to ∆,
∂ b
1 b
GΛ (K, ∆) = −
π (|Vσ |)
∂∆
|V |
1 b
=−
φ (|Vψ |).
|V |
Also


∂ b
1 b
GΛ (K, ∆) = −
π −|Eσ | + 2
∂K
|V |
X
δσ(x),σ(y) 
hx,yi∈Eσ
!
1 b
2X
=−
φ −|Eψ | +
ω(e) .
|V |
p e∈E
Let i = (i1 , i2 ) be a unit vector in R2 . Let Var denotes variance with respect to
π b . Differentiating in the direction i on the (K, ∆)-plane,
X
1
∂2 b
G
(K,
∆)
=
Var
−
i
|E
|
+
2i
δ
−
i
|V
|
≥ 0.
1
σ
1
2 σ
σ(x),σ(y)
∂i2 Λ
|V |
hx,yi∈Eσ
GbΛ is convex.
Lemma 2.28. Let ∆ ∈ R, K ∈ [0, ∞) and b ∈ {0, 1}. The limit,
G(K, ∆) = lim GbΛ (K, ∆)
Λ→Ld
exists. It is a convex function of K, ∆ and independent of b.
Proof of Theorem 2.27. By Theorem 8.18 of [23] and the convexity of G, the
subset of [0, ∞) × R where G(K, ∆) is not differentiable can be covered by a
countable collection of rectifiable curves.
Let Jx be the event that vertex x is open. By translation invariance and the
monotonicity of vertex DRC measures, for any region Λ containing x,
1 X 0
1 X 1
φΛ (ψ(y)) ≤ φ0 (Jx ) ≤ φ1 (Jx ) ≤
φ (ψ(y)) .
|V | y∈V
|V | y∈V Λ
38
(2.29)
2.8 Uniqueness by convexity of pressure
The left and right most terms in (2.29) are
differentiable at (a, p),
∂
Gb
∂∆ Λ
→
∂
G
∂∆
∂
Gb ,
∂∆ Λ
b = 0, 1 respectively. If G is
as Λ → ∞; φ0 (Jx ) = φ1 (Jx ). By Theorem
2.24 (c), the corresponding vertex measures satisfy Φ0 = Φ1 .
As Φ0 = Φ1 , limΛ→Ld |V |−1 φb (|Eψ |) is independent of b. Take the limit of
∂
Gb (K, ∆)
∂K Λ
as Λ → Ld . For all e ∈ Ed , φ0 (Je ) = φ1 (Je ). Applying Theorem
2.24 (c) gives φ0 = φ1 .
We will use the following lemmas in a subadditivity argument to show GbΛ
converges as Λ → Zd . The first bounds the BCP partition function. The second
bounds how the partition function changes when edges are removed from the
graph. Write ZGBCP for the BCP partition function on a graph G with no boundary
conditions.
Lemma 2.30. For graph G = (V, E),
−K|E| − |V | · |∆| + |V | log 3 ≤ log ZGBCP ≤ K|E| + |V | · |∆| + |V | log 3.
Lemma 2.31. Suppose removing edges F ⊂ E from a graph G = (V, E) creates
disjoint subgraphs Gi = (Vi , Ei ), i=1,2. Then
log ZGBCP
+ log ZGBCP
− K|F | ≤ log ZGBCP ≤ log ZGBCP
+ log ZGBCP
+ K|F |.
1
2
1
2
Proof of Lemma 2.28. First assume b = 0. For a region Λ = (V, E) the edges
BCP
between V and the boundary ∂V make no contribution to ZK,∆,q
(b, Λ) and can
be ignored. Let n = (n1 , . . . , nd ) ∈ Nd and let Λn be the region defined by the
Q
|n| = n1 . . . nd vertices in di=1 [1, ni ]. Let k ∈ Nd and write
j k
ni
(n, k) = ki
:1≤i≤d ,
ki
jnk
k
=
d Y
ni
i=1
ki
.
Let ∆ be the region defined by the vertices in Λn but not in Λ(n,k) . Then by
Lemma 2.31,
j n k
k
log ZΛBCP
−K
k
d
X
|k| i=1
ki
39
BCP
≤ log ZΛBCP
.
+ log Z∆
n
2.9 Phases and comparison results
Apply Lemma 2.30 to ∆,
BCP
log Z∆
≥ (−dK − |∆| + log 3)
d
X
|n|
i=1
ni
ki .
Divide by |n| and let the ni tend to ∞,
d
X 1
1
−
K
log ZΛBCP
≤ lim inf GbΛn .
k
n→∞
|k|
k
i
i=1
Let k → ∞,
lim sup GbΛk ≤ lim inf GbΛn .
n→∞
k→∞
Hence G = limn→∞ GbΛn exists. Each GbΛn is a convex function of K, ∆ and so
G is too. If b = 1 we can ignore the boundary conditions by Lemma 2.31; the
proportion of vertices in Λn adjacent to the external vertex boundary tends to 0
as n → ∞.
Proof of Lemma 2.30. The partition function ZGBCP is a sum over 3|V | terms, each
bounded between exp(−K|E| − |V | · |∆|) and exp(K|E| + |V | · |∆|).
BCP
BCP
BCP
Proof of Lemma 2.31. Notice log Z(V,E\F
) = log ZG1 + log ZG2 , the BCP mea-
sure is independent between disjoint graph components. For every edge you add
to a graph, the change to log Z BCP is bounded by ±K.
2.9
Phases and comparison results
In this section we will discuss the phase diagrams of the BCP model and the DRC
model. By the coupling of Theorem 2.4, and in particular equation (2.5), we can
discuss them in parallel. For concreteness we will specify 1 boundary conditions
for both models with q = 2 and d = 2.
There are two properties of the DRC model that we are interested in:
(i) φ1 has an infinite open edge cluster.
(ii) φ1 has an infinite closed vertex cluster.
40
2.9 Phases and comparison results
K
(a)
(b)
(c)
∆
Figure 2.1: Approximate Blume–Capel phase diagram
For all a, p the probability of each property is either 0 or 1. The corresponding
phases of the BCP model are:
(i) π 1 (σ(0) = 1) > 12 π 1 (σ(0) ∈ {1, 2}).
(ii) π 1 has an infinite 0 spin nearest-neighbour cluster.
Capel’s predictions for the Blume–Capel phase diagram are illustrated in Figures 2.1 on the K, ∆ plane and in Figure 2.2 on the a, p plane. The areas indicated
are,
(a) where property (i) holds,
(b) where property (ii) holds,
(c) where neither property (i) nor property (ii) holds, there is neither type of
long range order.
Recall the original parameterization of the Blume–Capel model, K = βJ and
∆ = βD. With J, D fixed, as temperature increases (K, ∆) moves towards (0, 0)
along a straight line, (a, p) moves towards (1/2, 0) along the path
a
= (1 − p)D/(2J) .
1−a
41
2.9 Phases and comparison results
1
(a)
0.8
0.6
(b)
(c)
p
0.4
0.2
0
0
0.2
0.4
0.6
0.8
1
a
Figure 2.2: Approximate Blume–Capel phase diagram
Such paths are indicated in Figures 2.1 and 2.2 by grey lines.
A first order phase transition is one where a property of the model, such as
the density of +1 spins, is discontinuous as a function of the parameters. A
second order phase transition is one without this phenomenon. Capel predicted
that a tri-critical point existed where the three phases meet, lying on the curve
2
a/(1 − a) = (1 − p) 3 log 4 . The boundary between (a) and (b) is thought to
be first-order and arrive at (a, p) = (0, 1) at the same gradient as the curve
a/(1 − a) = (1 − p). The boundary between (a) and (c) is thought to be secondorder. By the comparison inequalities, if (a, p) lies in region (a) then so does the
rectangle bounded by (a, p) and (1, 1). Similarly, if (a, p) lies in region (b) then
42
2.9 Phases and comparison results
1
0.8
0.6
p
0.4
0.2
0
0
0.2
0.4
0.6
0.8
1
a
Figure 2.3: Blume–Capel comparison results
so does the rectangle bounded by (0, 0) and (a, p).
We can use the comparison results with certain points of reference to draw
more detailed inferences about the phase diagram. Three points are marked on
Figure 2.3.
(i) (a, p, q) = (1,
√
2/(1 +
√
2), 2): The a = 1 DRC model is a random-cluster
measure. The q = 2 random-cluster measure on L2 has critical point p =
√
√
2/(1 + 2) ∼ 0.57. By Theorem 2.17, the q = 2 DRC measure has an
infinite open edge cluster to the right of the curve from (a, p) = (1, 0.57)
to (a, p) = (1, 1). The corresponding BCP measure has a preponderance of
1 states. Below the line p = 0.57, φ has no infinite open edge clusters. In
43
2.9 Phases and comparison results
the corresponding BCP measure, the 1-spins and the 2-spins have the same
distribution.
site
(ii) (a, p, q) = (0, (1 − psite
c )/(1 + pc ), 2): To the left of this point on the line
p = 0, Φ is a product measure with an infinite cluster of closed vertices.
2
Take psite
c , the critical probability for independent site percolation on L , to
be 0.59. Site percolation is a nearest-neighbour vertex percolation model.
Vertices are open with probability p, adjacent open vertices are connected.
By Theorem 2.10 (b) this point stochastically dominates the area to the left
of the curve from (0.25, 0) to (0, 1). There is an infinite nearest-neighbour
cluster of closed vertices. This corresponds to a BCP measure that has an
infinite cluster of 0 states.
site
(iii) (a, p, q) = 0, psite
/(2
−
p
),
2
: To the right of this point on the line p = 0,
c
c
Φ is a product measure with an infinite cluster of open vertices. By part (c)
of Theorem 2.10, the region to the right of the line a = 0.41 stochastically
dominates (0.41, 0), and therefore Φ has an infinite open vertex cluster.
In addition, the DRC measure on the line a/(1−a) = 1−p with q = 1 correspond
to an Ising model with zero external magnetic field. On the line segment above
√
√
(a, p) = (1/((1 + 2)4 + 1), 1 − 1/(1 + 2)4 ), the Ising model is supercritical—the
boundary conditions determine which spin is dominant. Below the supercritical
line segment, the q = 1 DRC model has a preponderance of 0 spins. Above the
supercritical line segment, there is a preponderance of 1 spins.
By part (c) of Theorem 2.10, compare (a1 , p1 , 2) and (a2 , p2 , 1) with (a2 , p2 )
‘just below’ the supercritical portion of a/(1 − a) = 1 − p. We need
a1
3
(1 − p2 ) ≥ 2
(1 − p1 )2 .
1 − a1
This occurs if
1 − p2 ≥
2a1
1 − a1
,
√ −4
p1 ≥ p 2 ≥ 1 − 1 + 2
.
The area below the curve from (a, p) = (.0541, .986) to (0, 1) is in region (b).
44
2.9 Phases and comparison results
By part (d) of Theorem 2.10, compare (a2 , p2 , 2) and (a1 , p1 , 1) with (a1 , p1 )
‘just above’ the supercritical portion of a/(1 − a) = 1 − p. We need,
a2
p1
p2
2
(1 − p2 )2 ≥ (1 − p1 )3 and
≤
.
1 − a2
1 − p1
2(1 − p2 )
Set p1 =
p2
.
2−p2
Then we need,
2a2
8(1 − p2 )
≥
1 − a2
(2 − p2 )3
√
2 − 2(1 + 2)4
√
and p2 ≥
.
2 − (1 + 2)4
The area to the right of the curve from (a, p) = (.0145, .970) to (a, p) = (0, 1) has
an infinite open vertex cluster. Further, we can see that this region is in (a) as
follows. Let S be the set of dominant spins of a super-critical Ising model with
parameter βJ = − 18 log(1 − p1 ). By the coupling of the Ising and the randomcluster model, the critical probability for bond percolation on S is a.s. less than
π where π = 1 − exp(−2βJ). The set of 1 vertices under φa2 ,p2 ,2 stochastically
dominates S, and the (p2 , 2) random-cluster measure stochastically dominates a
product measure with density π. Hence φa2 ,p2 ,2 has an infinite open edge cluster.
45
Chapter 3
Influence and monotonic
measures
3.1
Influence beyond KKL
In the first chapter we introduced the topics of influence and sharp thresholds. We
have so far only stated the influence result of KKL for Bernoulli product measure
with density 1/2, the uniform measure on the Hamming space Ω = {0, 1}N .
The theorem has been extended to more general product measures. A sharp
threshold result follows by Russo’s formula. In this chapter we extend the notion
of influence to monotonic measures. We prove influence and sharp threshold
results for monotonic measures, in particular the random-cluster measure.
A classic result in percolation theory is due to Kesten; the critical value for
bond percolation on the square planar lattice L2 is one half [42]. His proof
settled an old conjecture. The lower bound pc ≥ 1/2 was shown by Harris in
1960. Harris’s lower bound made use of the self duality of bond percolation on
L2 at p = 1/2. More recently, Bollobás and Riordan [9] produced a shorter proof
using the newly developed sharp threshold technology.
The main motivation for this chapter is to explore the possiblility of adapting
their technique to the random-cluster setting. The random-cluster model on L2
√
√
has a form of self duality at psd = q/(1+ q). We prove a sharp threshold result
for short-ways rectangle crossings on a finite region of the square planar lattice
46
3.1 Influence beyond KKL
with cyclic boundary conditions. This result is not strong enough to determine
the critical value pc . In Section 3.9 we show that a related but stronger result, a
sharp threshold result for long-ways rectangle crossings on L2 , would determine
pc .
Friedgut and Kalai extended the KKL influence result to µp , product measure
with density p ∈ (0, 1). Recall that we write ω ∈ Ω for a general configuration
and Xi for the random variable Xi (ω) = ω(i). Let i ∈ I = {1, . . . , N }; let A ⊂ Ω
be an increasing event. The influence of coordinate i is,
IA (i) = µp (A | Xi = 1) − µp (A | Xi = 0) = µp (ω i ∈ A, ωi 6∈ A).
(3.1)
Throughout the chapter, C > 0 is an absolute constant.
Theorem 3.2 (Friedgut and Kalai [28]). There exists i ∈ I such that
IA (i) ≥ C min{µp (A), µp (Ac )}
log N
.
N
The following is the most general influence result for product measure. Let
E = [0, 1]N , the Euclidean cube, and let (E, E, λ) be the uniform (Lebesgue)
measure space. E has the coordinate-wise partial order. Let A ⊂ E be increasing,
if x ≤ y and x ∈ A then y ∈ A. Let Ui be the identity random variable for the
i-th coordinate, Ui (x) = x(i). A continuous version of influence is
IA (i) = λ(A | Ui = 1) − λ(A | Ui = 0).
(3.3)
Theorem 3.4 (BKKKL [10]). For some i ∈ I,
IA (i) ≥ C min{λ(A), λ(Ac )}
log N
.
N
In Section 3.10 we extend this to an influence result for continuous monotonic
measures.
Notice that the result of Friedgut and Kalai follows from the result of BKKKL
by embedding. Let f : E → Ω be defined by
f (x(1), . . . , x(N )) = (ω(1), . . . , ω(N ))
47
3.2 Symmetric events
where for all i ∈ I,
(
1 x(i) ≥ 1 − p,
ω(i) =
0 otherwise.
Function f is increasing, if A ⊂ Ω is increasing so is B = f −1 (A) ⊂ E. For all i,
IA (i) = IB (i).
The definition of influence for (E, E, λ) requires some clarification. It depends
on the version of conditional expectation used. Interpret the conditional expectations in (3.3) as the N − 1 dimensional Lebesgue measures on {1} × [0, 1]N −1
and {0} × [0, 1]N −1 . We could have chosen a more robust definition of influence,
let
IˆA (i) = lim λ(A | Ui ∈ [1 − ε, 1]) − λ(A | Ui ∈ [0, ε]).
ε→0
Consider the two events
A = {x ∈ [0, 1]2 : x(1) ≥ 1/2}, B = A ∪ {x ∈ [0, 1]2 : x(2) = 1}.
They only differ on a null set, so IˆA (2) = IˆB (2) = 0. In contrast, IA (2) = 0,
IB (2) = 1/2; the influence for B seems exaggerated. However, Theorem 3.4 also
holds with Iˆ substituted for I. Given an increasing set B, construct a set A that
only differs from B on a null set. Remove points on the ‘upper’ faces such as
(1, x(2), . . . , x(N )) if limx(1)→1− 1A (x) = 0, and add points on the ‘lower’ faces
such as (0, x(2), . . . , x(N )) if limx(1)→0+ 1A (x) = 1. Then A is increasing and
λ(A) = λ(B). For all i ∈ I, IA (i) = IˆB (i); we can apply Theorem 3.4 to A.
3.2
Symmetric events
Let SN be the group of permutations of I = {1, . . . , N }. Let Γ be a subgroup
of SN such that for all i, j ∈ I, there is a γ ∈ Γ with γ(i) = j; Γ is said to act
transitively on I. For an event A and a permutation γ, define event Aγ by
(ω(1), . . . , ω(N )) ∈ A ↔ (ω(γ(1)), . . . , ω(γ(N ))) ∈ Aγ .
An event A is Γ-invariant if for all γ ∈ Γ, A = Aγ . An event is symmetric if
it is Γ-invariant for a group Γ that acts transitively on I. Take for instance the
48
3.3 Sharp thresholds for product measures
majority event
(
A=
ω∈Ω:
)
X
ω(i) ≥ N/2 .
i∈I
A is SN -invariant. Similarly the tribe event with m tribes of n members is invariant with respect to the group of size m!(n!)m generated by permutations within a
tribe, and permutations of whole tribes. For a symmetric event A, the influence
is the same for each i so
X
IA (i) ≥ C min{µp (A), µp (Ac )} log N.
(3.5)
i∈I
Contrast this with the dictatorship event A = {ω : ω(1) = 1};
P
i∈I
IA (i) = 1 for
all p.
3.3
Sharp thresholds for product measures
Let ω ∈ Ω and let A be an increasing event. If ωi 6∈ A, ω i ∈ A, we say that i is
critical for A. The probability that coordinate i is critical is IA (i). The higher
the expected number of critical coordinates, the more sensitive µp (A) is to small
changes in p. Russo’s formula [51] quantifies this,
X
d
IA (i).
µp (A) =
dp
i∈I
Let A be a symmetric event, by Russo’s formula and (3.5),
d
µp (A) ≥ C min{µp (A), µp (Ac )} log N.
dp
When µp (A) ≤ 1/2, dividing both sides above by µp (A) gives,
d
log µp (A) ≥ C log N.
dp
Similarly for µp (A) ≥ 1/2,
d
log µp (Ac ) ≤ −C log N.
dp
Integrating over p gives a sharp threshold result [8, 28].
49
3.4 Self duality for bond percolation on L2
Theorem 3.6. Let A 6∈ {∅, Ω} be an increasing symmetric event. Let ε ∈ [0, 1/2].
If p, p0 ∈ (0, 1) are such that µp (A) ≥ ε and
p0 − p =
2 log 1/(2ε)
C log N
then µp0 (A) ≥ 1 − ε.
This technique was first used in [28] to prove a sharp threshold theorem for
graph properties on Erdős–Rényi graphs. The same technique can be used on
graphs with less symmetry. Let Td = (Z/nZ)d be the d-dimensional torus; the
nearest neighbour graph on Td can be thought of as the subgraph [0, n]d of Ld
with cyclic boundary conditions imposed. Let Γ be the group of permutations
generated by the translations and rotations of Td . Every Γ-invariant increasing
event has a sharp threshold.
3.4
Self duality for bond percolation on L2
Let G = (V, E) be a simple graph (no loops or multiple edges) drawn on the
Euclidean plane such that no edges cross. In every face, including in unbounded
faces, mark a point; let V 0 be the set of marked points. These are the dual
vertices. Every edge e ∈ E separates two faces; let e0 be a dual edge between the
corresponding dual vertices. Let E 0 be the set of dual edges; then G0 = (V 0 , E 0 )
is the planar dual of G. G0 may depend of the particular planar embedding of
G. Similarly, if a graph is embedded in a planar manifold, such as a torus, the
dual with respect to that manifold can be constructed by following the same
algorithm. In Figure 3.1 we show a small finite graph (edges marked as black
lines) and the dual graph (dual vertices marked as black dots, dual edges marked
as dotted lines).
Bond percolation with density p on G is naturally coupled to bond percolation
with density 1 − p on the dual graph. Every edge e ∈ E crosses exactly one dual
edge e0 ∈ E 0 , and vice versa. Define e0 open if e is closed, and e0 closed if e is
open.
The square planar lattice L2 = (Z2 , E2 ) is self dual. In Figure 3.2 we show a
subset of the lattice L2 and the isomorphic dual lattice. We have shaded part of
50
3.4 Self duality for bond percolation on L2
Figure 3.1: Finite graph and dual
1 0
0
1 0
1 0
1 0
1 0
1 0
1
1 0
0
1 0
1 0
1 0
1 0
1 0
1
1 0
0
1 0
1 0
1 0
1 0
1 0
1
1 0
0
1 0
1 0
1 0
1 0
1 0
1
1 0
0
1 0
1 0
1 0
1 0
1 0
1
1 0
0
1 0
1 0
1 0
1 0
1 0
1
Figure 3.2: Square planar lattice and isomorphic dual
51
3.5 Influence for monotonic measures
the figure grey. We have left in black a [0, n + 1] × [0, n] rectangle R (n = 4) and
a similar rectangle R0 in the dual lattice, rotated through 90◦ . Write LW(R) for
the event that R is crossed long-ways (horizontally in this case) by a path of open
edges. Notice that exactly one of the following happens: either LW(R), or R0 is
crossed long-ways (vertically) by open dual edges, LW(R0 ). This is a consequence
of the max-cut min-flow theorem. As µ1−p is the measure on the dual lattice,
µp (LW(R)) + µ1−p (LW(R)) = 1.
At p = 1/2, p = 1 − p so bond percolation is self-dual, µ1/2 (LW(R)) = 1/2.
Write Rm,n for a [0, m] × [0, n] rectangle. Seymour and Welsh [52] proved that
for k = 1, 2, . . . ,
lim inf µ1/2 (LW(Rn,kn )) > 0.
n→∞
Kesten used an intricate path counting argument to show a sharp threshold for
the event LW(Rn,2n ) for large n. It follows that pc = 1/2. We will later give an
argument for this involving a construction of rectangles. Bollobás and Riordan [9]
used the sharp threshold result of Friedgut and Kalai to simplify Kesten’s result.
Place an R2n,4n rectangle in a discrete torus, large enough to prevent ‘wrapping
round’, say T26n = (Z/6nZ)2 . Let A be the symmetric increasing event that there
is a copy of Rn,5n somewhere in T26n , allowing rotation and translation, that is
crossed long-ways. The torus can be covered by a small number of copies of R2n,4n
such that A implies one of the copies is crossed long-ways. The sharp threshold
result for A implies a sharp threshold result for LW (R2n,4n ).
3.5
Influence for monotonic measures
Let µ be a positive measure on the Hamming space Ω = {0, 1}N ; µ(ω) > 0 for all
ω ∈ Ω. For J ⊂ I and ξ ∈ Ω, let
µξJ (η) = µ(Xj = η(j) for j ∈ J | Xi = ξ(i) for i ∈ I \ J).
Recall that the following are equivalent for µ positive [34].
(i) µ satisfies the FKG lattice condition (1.4).
52
3.5 Influence for monotonic measures
(ii) µ is monotonic: for J ⊂ I and A increasing, µξJ (A) is increasing in ξ ∈ Ω.
(iii) µ is strongly positively-associated: for J ⊂ I, ξ ∈ Ω and A, B increasing,
µξJ (A ∩ B) ≥ µξJ (A)µξJ (B).
Let µ be a positive monotonic measure on (Ω, F). For an increasing set A ⊂ Ω,
define the conditional influence of coordinate i by,
IA (i) = µ(A | Xi = 1) − µ(A | Xi = 0).
As µ is monotonic, the influence is always non-negative. When µ is a product
measure, conditional influence is exactly the influence, see (3.1).
Equality (3.1) for product measures also motivates an alternative definition
for influence,
I A (i) = µ(ω i ∈ A, ωi 6∈ A).
Unlike with product measures, it is not generally the case that IA (i) = I A (i).
Consider the following measure. Take N odd for convenience. Let µ1/3 , µ2/3
be product measures on Ω with densities 1/3 and 2/3 respectively. Let µ =
(µ1/3 + µ2/3 )/2; this corresponds to picking a coin at random, with bias 1/3 or
2/3, and then tossing it N times. Let A be the majority event, that more than
half of the coordinates are 1. By symmetry µ(A) = 1/2.
For each coordinate, the conditional influence IA (i) = Θ(1). The majority
event is strongly correlated with the choice between the two coins. However,
coordinate one is only critical if the other coordinates sum to exactly (N − 1)/2.
The probability of this, I A (1), decay exponentially with N . While there can be
no influence theorem for the I A (i), we can extend the influence results for product
measure to conditional influence results for monotonic measures.
Theorem 3.7. Let µ be a positive monotonic measure on (Ω, F). Let A ⊂ Ω be
an increasing event. Then for some i ∈ I,
IA (i) ≥ C min{µ(A), µ(Ac )}
53
log N
.
N
3.5 Influence for monotonic measures
Proof. Let λ be the uniform measure on [0, 1]N . We will construct a function
f : [0, 1]N → {0, 1}N . We will do this on a coordinate-wise basis, constructing
f = (f1 , . . . , fN ) ∈ Ω for given x ∈ [0, 1]N . Function f will have the following
properties.
(i) For all i ∈ I, fi only depends on x(1), . . . , x(i).
(ii) For all ω ∈ Ω, λ(f = ω) = µ(ω).
(iii) Function f is increasing with respect to the corresponding partial orders on
the domain and co-domain.
Let B = f −1 (A) = {x ∈ [0, 1]N : f (x) ∈ A}. The distribution of f under λ
is equal to the distribution of the identity function under µ, so λ(B) = µ(A).
The result then follows by showing that IA (i) ≥ IB (i) for all i ∈ I. Notice that
IB (i) is an influence with respect to the product measure on [0, 1]N and IA (i) is
a conditional influence with respect to µ.
We now construct f (x). Let

1 x(1) ≥ 1 − µ(X1 = 1),
f1 =
0 otherwise.
Notice f1 only depends on x(1), f1 is increasing in x(1) and λ(f1 = 1) = µ(X1 = 1).
For i ∈ {1, . . . , N − 1}, assume inductively:
(i) fi only depends on x(1), . . . , x(i),
(ii) µ(Xk = ak for k = 1, . . . , i) = λ(fk = ak for k = 1, . . . , i) for all
a1 , . . . , ai ∈ {0, 1}, and
(iii) fi is increasing in x(1), . . . , x(i).
Let
fi+1

1 x(i + 1) ≥ 1 − µ(Xi+1 = 1 | Xk = fk for k = 1, . . . , i),
=
0 otherwise.
Then clearly the first and second property follow immediately. The third property follows by the monotonicity of µ. The event Xi+1 = 1 is increasing, so
54
3.5 Influence for monotonic measures
µ(Xi+1 = 1 | Xk = fk for k = 1, . . . , i) is increasing in f1 , . . . , fi , which by assumption are increasing in x(1), . . . , x(i).
By induction f has the properties claimed. In the construction of f we have
made a choice—the order in which we construct the fi based on the fj already
constructed—we arbitrarily took the natural order 1, . . . , n. Below we will consider a different order.
We now claim that IA (i) ≥ IB (i) for i ∈ I, where B = f −1 (A). The result
follows from this claim by the BKKKL influence result applied to B.
For i = 1, this is immediate. Measure µ is positive, so the conditional distributions of f under λ given x(1) = 0 and x(1) = 1 are exactly the conditional
distributions of µ given X1 = 0 and X1 = 1, respectively. Let i > 1, we construct
a function g : [0, 1]N → {0, 1}N in a similar fashion to f , except we construct
coordinates gj in the order i, 1, . . . , i − 1, i + 1, . . . , N . Let

1 x(i) ≥ 1 − µ(Xi = 1),
gi =
0 otherwise.
For j = 1, . . . , i − 1, let

1 x(j) ≥ 1 − µ(Xj = 1 | gi , g1 , . . . , gj−1 ),
gj =
0 otherwise.
Then for j = i + 1, . . . , N , let

1 x(j) ≥ 1 − µ(Xj = 1 | g1 , . . . , gj−1 ),
gj =
0 otherwise.
Let D = g −1 (A). Then ID (i) = IA (i), just as IB (1) = IA (1). All that remains
is to show that ID (i) ≥ IB (i). This follows from the monotonicity of µ. Let
x ∈ [0, 1]N and write
xi = (x(1), . . . , 1, . . . , x(N )),
xi = (x(1), . . . , 0, . . . , x(N )),
with the change in the i-th coordinate. We show that g(xi ) ≥ f (xi ). Then
combined with the converse result g(xi ) ≤ f (xi ) the claim follows. The intuitive
55
3.6 Russo formula for monotonic measures
reason why g(xi ) ≥ f (xi ) is that µ is positively associated, so the sooner we
process the ‘good’ information that coordinate i takes the value 1, the greater the
upward lift elsewhere. Firstly, fi (xi ) = gi (xi ) = 1. We can now check inductively
that gj (xi ) ≥ fj (xi ) for j = 1, . . . , i − 1, i + 1, . . . , N . Look at the definitions for
fj and gj . At every step, the values fk (xi ) already defined (k = 1, . . . , j − 1) are
smaller than the corresponding gk (xi ), so by monotonicity gj (xi ) ≥ fj (xi ).
3.6
Russo formula for monotonic measures
Let µ be a positive measure on Ω = {0, 1}N , it need not be a probability measure.
Suppose that µ satisfies the FKG lattice condition (1.4). Define a parameterized
set of probability measures µp for p ∈ (0, 1) by
µp (A) =
Y
1 X
µ(ω)
pω(i) (1 − p)1−ω(i)
Z ω∈A
i∈I
with
Z=
X
µ(ω)
ω∈Ω
Y
pω(i) (1 − p)1−ω(i) .
i∈I
Then it is immediate that µp is positive and satisfies the FKG lattice condition;
µp is monotonic. The obvious example of such a family of measures is given by
the random-cluster measures on a graph. Fix q ≥ 1 and let I be the edge set of
a graph. Take µ(ω) = q k(ω) ; k(ω) counts the number of open-clusters. Then µp
is the (p, q) random-cluster measure.
Russo’s formula has been extended to cover random-cluster measures [5]. It
extends naturally to this setting. For convenience, write A for the indicator
function of an increasing event A.
Theorem 3.8. With covp the covariance with respect to µp ,
X
d
1
µp (A) =
covp (Xi , A).
dp
p(1 − p) i∈I
56
3.6 Russo formula for monotonic measures
Proof. Differentiating Z × µp (A),
Y
d X
µ(ω)
pω(i) (1 − p)1−ω(i)
dp ω∈A
i∈I
XX
Y
1 − ω(i)
ω(i)
1−ω(i) ω(i)
−
=
µ(ω)
p (1 − p)
p
1−p
i∈I ω∈A
i∈I
X µp (Xi A) µp ((1 − Xi )A)
−
.
=Z
p
1−p
i∈I
The above with A = Ω gives
X µp (Xi ) µp (1 − Xi )
d
Z=Z
−
.
dp
p
1−p
i∈I
Therefore,
X µp (Xi A) µp ((1 − Xi )A)
d
µp (A) =
−
dp
p
1−p
i∈I
X µp (Xi ) µp (1 − Xi )
− µp (A)
−
p
1−p
i∈I
X
1
=
covp (Xi , A).
p(1
−
p)
i∈I
Conditional influence and covariance are related,
µp (AXi ) µp (A(1 − Xi ))
−
µp (Xi )
µp (1 − Xi )
µp (AXi )µp (1 − Xi ) − µp (A(1 − Xi ))µp (Xi )
=
µp (Xi )µp (1 − Xi )
covp (Xi , A)
=
.
µp (Xi )µp (1 − Xi )
IA (i) =
Let
ξp = min
i∈I
so
µp (Xi )µp (1 − Xi )
,
p(1 − p)
X
d
µp (A) ≥ ξp
IA (i).
dp
i∈I
57
3.7 Sharp thresholds for the random-cluster measure
Say that µ is Γ-invariant if for all γ ∈ Γ, for all events A, µ(A) = µ(Aγ ). If
increasing event A and measure µ are both Γ-invariant for some transitive Γ then
IA (i) is independent of i. Let µ be such a measure. Then
d
µp (A) ≥ Cξp min{µp (A), µp (Ac )} log N.
dp
There is an additional complication compared with the product case. If we want
to prove the existence of a sharp threshold in the neighbourhood of p, then we
need to bound ξp away from zero.
3.7
Sharp thresholds for the random-cluster measure
Let G = (V, E) be a finite simple graph. A graph automorphism is a permutation
π on the vertex set V such that
∀x, y ∈ V, hx, yi ∈ E ↔ hπ(x), π(y)i ∈ E.
Let γ be the induced permutation on edge set E,
γ(e) = hπ(x), π(y)i,
e = hx, yi ∈ E.
Call γ a graph edge automorphism. Let Π be the group of graph automorphisms
of G. Let Γ be the corresponding group of graph edge automorphisms. We then
say that G is Γ-invariant. If Γ acts transitively on E we say that G is symmetric.
Let φp,q be the (p, q) random-cluster measure on {0, 1}E with p ∈ (0, 1), q ≥ 1,
φp,q (ω) =
1 k(ω) Y ω(e)
q
p (1 − p)1−ω(e) .
Z
e∈E
If Γ is the group of graph edge automorphisms, φp,q is Γ-invariant.
Recall that φp,q has a finite energy property. Let Ce be the event that the end
vertices of edge e = hx, yi are connected by an open path of edges in E \ {e}.
Then
φp,q (Xe | Ce ) = p,
φp,q (Ee | Cec ) =
58
p
.
p + q(1 − p)
3.8 Self duality for the random-cluster model
As t → t(1 − t) is concave,
min{φp,q (Xe = 1)φp,q (Xe = 0)}
≥ min t(1 − t) : t ∈
p
,p
p + q(1 − p)
so
ξp ≥
min{p(1 − p), p(1 − p)q/(p + q(1 − p))2 }
≥ 1/q.
p(1 − p)
If graph G and increasing event A are symmetric,
d
C
φp,q (A) ≥ min{φp,q (A), φp,q (Ac )} log N.
dp
q
As in Section 3.3, this gives a sharp threshold result.
Theorem 3.9. Let A 6∈ {∅, Ω} be an increasing symmetric event on symmetric
graph G. Let ε ∈ [0, 1/2]. If p, p0 ∈ (0, 1) are such that φp,q (A) ≥ ε and
p0 − p =
2q log 1/(2ε)
C log N
then φp0 ,q (A) ≥ 1 − ε.
3.8
Self duality for the random-cluster model
The square planar lattice L2 and the torus T2n are self dual graphs. Bond percolation with density p is dual to bond percolation with density 1 − p on the same
graph. Let q ≥ 1 and let p, p0 satisfy
p0
1−p
=
q
.
1 − p0
p
The (p, q) random-cluster model on a finite graph is dual to the (p0 , q) randomcluster model on the dual graph. Let φn,p be the (p, q) random-cluster model on
√
√
T2n , φn,p is dual to φn,p0 . The self dual point is psd = q/(1 + q). On L2 we have
to take into account boundary conditions. The (p, q) random-cluster model with
free boundary conditions φ0p,q is dual to the wired (p0 , q) random-cluster measure
φ1p0 ,q , φ1psd ,q is dual to φ0psd ,q .
59
3.8 Self duality for the random-cluster model
Let k ≥ 2, n ≥ 1. Let R = Rn,n+1 be the [0, n] × [0, n + 1] rectangle
in T2kn and let LW (R) be the event that R is crossed long-ways. By duality,
φkn,psd (LW(R)) = 1/2. Let A be the event that T2kn contains a copy of R, up to
rotation and translation, that is crossed long-ways. With Γ the group of graph
edge automorphisms of T2kn = (V, E),
A = {ω ∈ Ω : ∃γ ∈ Γ, γ(ω) ∈ LW(R)},
Ω = {0, 1}E .
The increasing event A and measure φkn,p are Γ-invariant. Therefore, for p ≥ psd ,
as |E| = 2(kn)2 ,
1
φkn,p (A) ≥ 1 − |E|−C(p−psd )/q
2
≥ 1 − (kn)−2C(p−psd )/q .
We will now use the ‘square-root trick’ [9] to produce a sharp threshold result
for the asymmetric event that a given rectangle is crossed. Let A1 , . . . , AM be a
number of increasing events with equal probability. Then by positive association
!
!
M
M
M
\
Y
[
c
Ai ≥
φkn,p (Aci ) = (1 − φkn,p (A1 ))M
1 − φkn,p
Ai = φkn,p
i=1
i=1
so
i=1
v
u
u
M
φkn,p (A1 ) ≥ 1 − t1 − φkn,p
M
[
!
Ai .
i=1
Let 1 < α < k, let Hn,α = [0, αn] × [0, n/α] and let Vn,α = [0, n/α] × [0, αn].
We will cover T2kn with copies of Hn,α and Vn,α so that A implies that one of the
copies is crossed short-ways. Let
hn,α = {(l1 bn(α − 1)c, l2 bn(1 − α−1 )c : λ1 , λ2 ∈ Z} ∩ [0, kn]2 ,
vn,α = {(l1 bn(1 − α−1 )c, l2 bn(α − 1)c : λ1 , λ2 ∈ Z} ∩ [0, kn]2 .
Let H be the set of translations of Hn,α by the vectors in hn,α . Similarly let V be
the set of translations of Vn,α by the vectors vn,α . Let M = |H ∪ V|. If A occurs,
then an element of H ∪ V is crossed short-ways. By the square-root trick, the
probability Hn,α is crossed short-ways,
φkn,p (SW(Hn,α )) ≥ 1 −
60
q
M
1 − φkn,p (A).
3.9 Long-ways crossings and the critical point
M = |hn,α | + |vn,α |
kn
kn
=2
nbα − 1c
nb1 − α−1 c
k
k
≤2 1+
1+
,
α − 1 − n−1
1 − α−1 − n−1
so M ≈ 2k 2 α/(α − 1)2 for n, k large.
Theorem 3.10. Let k ≥ 2, n ≥ 1, and α ∈ (1, k). For p ∈ [psd , 1],
φkn,p (SW([0, nα] × [0, nα−1 ])) ≥ 1 − e−g(p−psd )
where
g = g(k, n, α, q) =
3.9
2C
log(kn).
Mq
Long-ways crossings and the critical point
Let q ≥ 1. The critical point for the random-cluster measure on L2 is,
pc = inf{p : φ1p,q (0 ↔ ∞) > 0}.
It is known that pc ≥ psd , and it is conjectured that pc = psd [34].
For completeness, we will now give a construction connecting rectangle crossings to percolation. The starting premise is stronger than the conclusion of Theorem 3.10. It involves long-ways crossings rather than short-ways crossings, and
it is set on the square planar lattice rather than on a finite torus. Let φ1p,q be the
(p, q) random-cluster measure on L2 with wired boundary conditions.
Theorem 3.11. Let q ≥ 1. Let pk = φ1p,q (LW (Rn,2n )) with n = 2k . If for all
p ≥ psd ,
∞
Y
pk > 0,
k=1
the critical point pc = psd .
Proof. We use a construction from [17]. Let

LW([0, 2k ] × [0, 2k+1 ]) k odd,
Ak =
LW([0, 2k+1 ] × [0, 2k ]) k even.
61
3.10 Influence for continuous random variables
By positive association and symmetry, for p > psd ,
!
\
Y
φ1p,q
Ak ≥
φ1p,q (Ak ) > 0.
k
k
The intersection of the Ak implies the existence of an infinite path; a path along
the length of Ak crosses a path along the length of Ak+1 .
Let p > psd > p0 satisfy the duality condition p0 /(1 − p0 ) = q(1 − p)/p. By the
duality of φ1p,q and φ0p0 ,q , 1 − pk is the probability of a short-ways crossing,
1 − pk = φ0p0 ,q (SW(R2k+1 −1,2k +1 )).
Let rad(C) be the maximum n such that 0 is joined to the boundary of the box
[−n, n]2 by a path of open edges.
∞
X
1 − pk ≤
k=1
∞
X
k=1
∞
X
≤4
=
If φ0p0 ,q (rad(C)) < ∞ then
2k+1 φ0p0 ,q (rad(C) ≥ 2k + 1)
Q
k
φ0p0 ,q (rad(C) ≥ n)
n=1
4φ0p0 ,q (rad(C)).
pk > 0. If the expectation of rad(C) is finite for all
p0 < psd then pc = psd .
3.10
Influence for continuous random variables
We have extended Friedgut and Kalai’s influence result from product measure to
monotonic measures on {0, 1}N . The most general influence result for product
measures is the result of BKKKL on the Euclidean cube [0, 1]N . The definitions of
monotonicity and positive association have been extended to this setting. Holley’s
theorem and the FKG theorem also have continuous versions [1, 50]. It seems
natural to ask how Theorem 3.7 can be extended beyond discrete monotonic
measures.
With E = [0, 1]N , let (E, E, λ) be the uniform (Lebesgue) measure on E.
Let Ui be the projection random variable from E onto the i-th coordinate,
62
3.10 Influence for continuous random variables
Ui (x) = x(i), for i ∈ I = {1, . . . , N }. While it is not generally the case that
increasing subsets of E are Borel measurable, the following is presumably well
known.
Lemma 3.12. Increasing subsets of E are Lebesgue measurable.
Let ρ be a density function ρ : E → [0, ∞) with λ(ρ) = 1. Let µρ be the
corresponding probability measure,
Z
µρ (A) =
ρ(x) dx, A ∈ E.
A
We will state results in this section in terms of density functions, though of course
they are results for the corresponding measures. We say that ρ is stochastically
dominated by ν, written ρ ≤st ν, if for all increasing sets A,
µρ (A) ≤ µν (A).
The following theorem is an extension of Holley’s theorem.
Theorem 3.13 (Preston [50]). Let ρ, ν be density functions on E. If
∀x, y ∈ E, ρ(x ∧ y)ν(x ∨ y) ≥ ρ(x)ν(y)
then ρ ≤st ν.
We say that ρ is positively associated if for all A, B increasing
µρ (A ∩ B) ≥ µρ (A)µρ (B).
Let J ⊂ I and let ξ ∈ E. Define
EJξ = {x ∈ E : x(i) = ξ(i) for i ∈ I \ J}.
Now define the conditional measure µξρ,J (A) by the density
ρξJ (x) ∝ ρ(x)1E ξ (x).
J
Call ρ monotonic if for all J ⊂ I, ρξJ is stochastically increasing in ξ. We say that
density function ρ satisfies the FKG lattice condition if
∀x, y ∈ E, ρ(x ∧ y)ρ(x ∨ y) ≥ ρ(x)ρ(y).
63
3.10 Influence for continuous random variables
As in the discrete case, the FKG lattice condition for ρ positive implies strong
positive-association [1]. This follows from Preston’s Theorem. Let J ⊂ I, let
ξ ∈ E, and let A, B be increasing subsets of EJξ . For all x, y ∈ EJξ ,
ρξJ (x ∧ y)ρξJ (x ∨ y)1A (x ∨ y) ≥ ρξJ (x)ρξJ (y)1A (y).
Therefore µξJ (·) ≤st µξJ (· | A), so
µξJ (A ∩ B) ≥ µξJ (A)µξJ (B).
The FKG lattice condition also implies monotonicity. Let J ⊂ I and ξ ≤ ζ ∈ E.
Let x, y ∈ [0, 1]J , and write xξ for the configuration in EJξ that agrees with x on
J. Putting xξ and y ζ into the FKG lattice condition gives
ρξJ ((x ∧ y)ξ )ρζJ ((x ∨ y)ζ ) ≥ ρξJ (xξ )ρζJ (y ζ ),
so ρξJ ≤st ρζJ . We can summarize these results as follows.
Theorem 3.14. If strictly positive density function ρ satisfies the FKG lattice
condition,
(i) ρ is strongly positively-associated, and
(ii) ρ is monotonic.
Define the conditional influence IAρ (i) for monotonic density ρ and increasing set
A,
IAρ (i) = µρ (A | Ui = 1) − µρ (A | Ui = 0).
Though it only depends on ρ on a null set, by monotonicity it satisfies
IAρ (i) ≥ IˆAρ (i) := lim µρ (A | Ui ∈ [1 − ε, 1]) − µρ (A | Ui ∈ [0, ε]).
ε→0
Theorem 3.15. Let ρ be a strictly positive monotonic density. Then
∃i ∈ I, IA (i) ≥ C min{µp (A), µp (Ac )}
64
log N
.
N
3.10 Influence for continuous random variables
However, there does not seem to be a corresponding sharp threshold result for
E = [0, 1]N . Let ρ be a monotonic density function. As in the discrete case, we
can define an increasing family of monotonic density functions,
N
Y
1
ρp (x) = ρ(x)
px(i) (1 − p)1−x(i) ,
Z
i=1
with normalising constant Z. Russo’s formula extends to this setting,
N
X
1
d
µp (A) =
covρ,p (Ui , 1A ).
dp
p(1 − p) i=1
However, we cannot bound covρ,p (Ui , 1A ) in terms of IA (i). Let ρ = 1, so ρp is a
product density for all p ∈ (0, 1). Let π = p/(1 − p), then

 log π N QN x(i)
(1 − p)1−x(i) p 6= 1/2,
i=1 p
2p−1
ρp (x) =
1
p = 1/2.
Let A = [N −1 , 1]N . For all i ∈ I, IA (i) = 1, but as N → ∞,
(
π −1/(π−1) p 6= 1/2,
µp (A) →
e−1
p = 1/2.
There is no sharp threshold.
Proof of Theorem 3.15. The proof is similar to the discrete case. We construct
an increasing function f : E → E such that:
(i) The law of the identity function U under µρ is the same as the law of f (U )
under λ.
(ii) Given f1 , . . . , fi increasing functions of U1 , . . . , Ui , we construct fi+1 as an
increasing function of U1 , . . . , Ui+1 such that (f1 , . . . , fi+1 ) under λ has the
same distribution as (U1 , . . . , Ui+1 ) under µρ .
(iii) With B = f −1 (A), IA (1) = IB (1), and IA (i) ≥ IB (i) for all i ∈ I.
We first choose a strictly increasing function f1 : E → [0, 1] such that f1 only
depends on the first coordinate. For convenience we will also write f1 as a function
of one variable. We need,
λ(f1 (U1 ) ≤ x(1)) = f1−1 (x(1)) = µρ (U1 ≤ x(1)).
65
3.10 Influence for continuous random variables
Let φ : [0, 1] → [0, 1] be the continuous, increasing bijection given by
x(1)
Z
φ(x(1)) = µρ (U1 ≤ x(1)) =
Z
Z(t) dt,
Z(t) =
ρ(t, y) dy.
[0,1]I\{1}
0
Take f1 = φ−1 . For an integrable function h : [0, 1] → R, by change of variable,
Z 1
Z 1
−1
λ(h(f1 (U1 ))) =
h(φ (u(1))) du(1) =
h(x(1))Z(x(1)) dx(1).
0
0
For every x, the density ρ conditional on the first coordinate is
ρx{1} (y) =
ρ(x(1), y)
.
Z(x(1))
The integrability of ρx{1} follows by Tonelli’s theorem. For any increasing event
A, by the change of variable formula,
Z 1Z
µρ (A) =
ρ(x(1), y)1A (x(1), y) dy dx(1)
[0,1]I\{1}
0
Z
1
Z
=
[0,1]I\{1}
0
Z
1
Z
f (u)
=
0
ρx{1} (y)1A (x(1), y) dy Z(x(1)) dx(1)
[0,1]I\{1}
ρ{1} (y)1A (f1 (u(1)), y) dy du(1).
We have ‘simplified’ sampling from the N dimensional density ρ. Take the first
coordinate to have distribution λ(f1 (U1 )) and then sample from the N − 1 dimenf (x)
sional density ρ{1} . Repeat this process a further N − 1 times for coordinates
f (u)
2, 3, . . . , N ; ρ{1,...,N } ≡ 1 and
Z
µρ (A) =
1A (f (u)) du.
[0,1]N
Let B = f −1 (A). As f1 (x) only depends on x(1), IA (1) = IB (1).
Let i > 1. Construct g as we constructed f but, as in the proof of Theorem 3.7,
modify the order in which that coordinates are ‘sampled’. Process the coordinates
in the order i, 1, . . . , i − 1, i + 1, . . . , N instead 1, . . . , N . We must show that
f (x) ≤ g(x) when x(i) = 1, and similarly that f (x) ≥ g(x) when x(i) = 0. Given
66
3.10 Influence for continuous random variables
x(i) = 1, f1 , g1 only depend on x(1). We must show that f1 (x(1)) ≤ g1 (x(1)),
the result then follows by an inductive argument. Let
φf (x) = µρ (U1 ≤ x),
φg (x) = µρ (U1 ≤ x | Ui = 1).
By monotonicity µρ (U1 ≤ x | Ui = a) is decreasing in a, so φf (t) ≥ φg (t). As
f1 = (φf )−1 and g1 = (φg )−1 , the result follows.
Proof of Lemma 3.12. This is immediate if N = 1. Suppose inductively that
increasing subsets of [0, 1]N are Lebesgue measurable. Let A be an increasing
subset on [0, 1]N × [0, 1]. Define a function f : [0, 1]N → [0, 1] ∪ ∞ by
f (x) = inf{y ∈ [0, 1] : f (x, y) ∈ A}.
The set A is increasing, so f is decreasing. By the inductive hypothesis, the sets
{x : f (x) < c} are Lebesgue measurable; f is a measurable function and the
graph Gf is null by Fubini’s theorem. Consider the subset of [0, 1]N +1 consisting
of the ‘area above the graph’ Gf ,
A = {(x, y) ∈ [0, 1]N +1 : y ≥ f (x)}.
A is measurable by the construction of the Lebesgue integral. Now A ⊂ A and
A \ A ⊂ Gf is null, so A is measurable. The result follows by induction.
67
Chapter 4
The mean-field zero-range
process
4.1
Balls and boxes
In this chapter we consider a mean-field zero-range process (ZRP) motivated by a
microcanonical ensemble. By considering the entropy of the system, we show that
the empirical distribution rapidly converges—in the sense of Kullback–Leibler
divergence—to a geometric distribution. The proof utilizes arguments of spectral
gap and log Sobolev type.
Suppose there are a number of boxes, N , each containing R indistinguishable
balls. Iteratively, pick a source box and a sink box, both uniformly at random; if
the source box is not empty then move a ball to the sink box. This is a Markov
chain on the set
BN = b ∈ NN : b1 + · · · + bN = N R .
Here bi represents the number of balls in the i-th box. With probability N −1 the
source box and the sink box are the same so no change occurs.
The transition probability between neighbouring elements of BN is N −2 . The
Markov chain is reversible with respect to the uniform distribution on BN . If we
regard the boxes as distinguishable particles, and the balls as quanta of energy,
the system is said to be a microcanonical ensemble with Maxwell–Boltzmann
68
4.1 Balls and boxes
statistics [24, 40]. One can also consider the balls to be particles; the process can
be said to have Bose–Einstein statistics [18].
P
Let Nk be the number of boxes containing k balls, Nk = N . At equilibrium,
the probability that (N0 , N1 , . . . ) = (n0 , n1 , . . . ) is proportional to the number
of ball configurations b ∈ BN compatible with (n0 , n1 , . . . ). The number of
−1
ways of putting b balls into N boxes is b+N
; the number of configurations is
N −1
−1
|BN | = N R+N
. Fix R. Under the equilibrium measure π,
N −1
NR − k + N − 1 . NR + N − 1
π(B1 ≥ k) =
N −1
N −1
k
R
=
1 + k 2 O N −1 as N → ∞.
R+1
The law of B1 converges to a geometric distribution with mean R. The geometric
distribution can be thought of as a Gibbs measure with respect to a linear energy
function, say H : N → R given by H(n) = n. Recall from Section 1.3 that
Gibbs measures are maximum entropy distributions. The equilibrium law of B1
converges to the maximum entropy distribution of N with mean R.
In Section 4.3 we give an example to motivate our use of entropy to study the
mean-field ZRP. The Ehrenfest urn model [20] is a simple example of a Markov
chain that can be studied by looking at the entropy of the empirical distribution.
The Ehrenfest model was proposed as a probabilistic model to help explain the
deterministic nature of Boltzmann’s H theory. In Section 4.5 we define an entropy
for the mean-field ZRP and show how it is related to the equilibrium distribution.
We also summarize our results for how the entropy evolves.
In Section 4.4 we discuss some general convergence techniques for reversible
Markov chains [19]. If N0 , the number of empty boxes, is ‘fixed’, any given
box behaves like a biased random walk on N. The equilibrium measure for the
random walk is a geometric distribution. We study this random walk in detail
in Section 4.7 using the techniques of Section 4.4. We will use these results to
analyse the mean-field ZRP.
In Section 4.8 we adapt a martingale concentration result [46]. We will prove
bounds on the expected step-wise change of the mean-field ZRP entropy. The
concentration result will be used to show that, with high probability, the entropy
decreases as the bounds suggest.
69
4.1 Balls and boxes
0.05
0.05
0.04
0.04
0.03
0.03
0.02
0.02
0.01
0.01
0
0
0
10
20
30
40
50
60
0.05
0.05
0.04
0.04
0.03
0.03
0.02
0.02
0.01
0.01
0
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
10
20
30
40
50
60
0
0
10
20
30
40
50
60
0.05
0.05
0.04
0.04
0.03
0.03
0.02
0.02
0.01
0.01
0
0
0
10
20
30
40
50
60
0.05
0.05
0.04
0.04
0.03
0.03
0.02
0.02
0.01
0.01
0
0
0
10
20
30
40
50
60
Figure 4.1: A sample path: the empirical distribution for N = 107 , R = 20 at
intervals of 40N steps
70
4.2 Notation
In Section 4.9 we use coupling arguments to prove bounds on the empirical
distribution of the mean-field ZRP. By coupling the process with a system of
N biased random walks on N, we prove an upper bound on the tail of the empirical distribution. We show if a significant fraction of the boxes are empty,
P
then after O(R2 N log N ) steps,
k≥j Nk decays exponentially in j. By a second coupling argument, we show that with high probability N0 ≥ N (5R)−1 after
O(N R2 log(R + 1)) steps.
In Sections 4.10 and 4.11 we use the results obtained from the biased random
walk. We use a log Sobolev technique to prove that when the entropy is large,
the entropy tends to halve over a period of O(R2 N log N ). Once the entropy is
O(R−2 ), we use a spectral gap type technique to show that the entropy tends to
halve over a period of O(R3 N ).
4.2
Notation
We make use of asymptotic notation: ‘order less than’, O; ‘order equal to’, Θ;
and ‘order greater than’, Ω, to describe the behaviour of functions as N → ∞.
With c1 , c2 , N0 positive constants,
if ∀N ≥ N0 ,
|f (N )| ≤ c2 |g(N )| write f (N ) = O(g(N )),
if ∀N ≥ N0 , c1 |g(N )| ≤ |f (N )| ≤ c2 |g(N )| write f (N ) = Θ(g(N )),
if ∀N ≥ N0 , c1 |g(N )| ≤ |f (N )|
write f (N ) = Ω(g(N )).
In Section 4.1 we made use of asymptotic notation with R fixed and N tending
to infinity. We will want to express bounds that for all R, given R, hold for all
N sufficiently large. When we write
f (R, N ) = O(g(R, N )) as N → ∞
we mean that there is a constant c2 and a function N0 such that,
∀R, ∀N ≥ N0 (R), |f (R, N )| ≤ c2 |g(R, N )|.
Similarly for Ω and Θ.
71
4.3 The Ehrenfest urn model
The function log should be interpreted as the natural logarithm, unless indicated otherwise by a subscript; loga x = log x/ log a.
If X(0), X(1), X(2), . . . is a sequence of real valued random variables, we will
write ∆X(i) for the step-wise change X(i + 1) − X(i).
For random variables X, Y , we say that X is stochastically smaller than Y ,
written X ≤st Y , if there is a coupling (X, Y ) such that X ≤ Y almost surely.
4.3
The Ehrenfest urn model
In Chapter 1 we introduced the lazy Ehrenfest urn model [20]. An urn contains
N balls, red or green. Start with the urn full of green balls. Each step pick a
ball uniformly at random, and with probability one half replace it with a ball of
the other colour. Assuming the balls are ordered, the process is time reversible
with respect to the uniform distribution on the set {red, green}N . Let NR be the
number of red balls, and let XR = NR /N be the fraction of red balls. After i
steps, each ball has been picked binomial Bin(i, N −1 ) times. Each ball is red with
probability
pR (i) = (1 − [1 − N −1 ]i )/2.
The Markov chain is of course just a random walk on the Hamming cube. The
mixing properties have been studied in great detail [19, 41, 48]. The fraction of
red balls is close to 1/2 after O(N log N ) steps.
An alternative approach to the problem is to take a hydrodynamic limit [43].
Condition on XR = x. The next step, XR increases by N −1 with probability
(1 − x)/2 and decreases by N −1 with probability x/2; the expected change is
E(∆XR ) = N −1 (1/2 − x). The hydrodynamic limit XRf is the solution to the
equation
d f
XR = 1/2 − XRf , XRf (0) = 0.
dt
In this simple case, we can easily find the hydrodynamic limit in closed form,
XRf (t) = (1 − e−t )/2.
72
4.3 The Ehrenfest urn model
1
1
2
0
0
1
2
3
Time, scaled by N
4
5
Figure 4.2: Hydrodynamic limit XRf tends to 1/2
The hydrodynamic limit can also be studied by looking at the behaviour of a
certain statistic. Let XGf = 1 − XRf be fraction of green balls, and define entropy
S(XRf , XGf ) = −XRf log XRf − XGf log XGf .
Entropy is initially 0; maximum entropy Smax = log 2 is obtained in the limit
t → ∞ when XRf = XGf = 1/2. Again, we use the function δ introduced in
Section 1.3,
d
y
δ(x, y) = y log y − x log x − (y − x)
t log t t=x = y log − (y − x).
dt
x
The function is sketched for arbitrary x in Figure 4.3. If x = y, δ(x, y) = 0. By
the convexity of t 7→ t log t, δ(x, y) ≥ 0 for all x, y. The function is bounded
between (x − y)2 /(3x) and (x − y)2 /x when 0 ≤ y ≤ 2x. The upper bound is also
73
4.3 The Ehrenfest urn model
x
0
0
x
2x
3x
Figure 4.3: Graph of δ(x, y) as a function of y
valid for all y. We can easily check that when XRf = x
d
1−x
S = (1/2 − x) log
≥ Smax − S.
dt
x
Without having to solve the hydrodynamic limit differential equation, we can see
that Smax − S decays exponentially,
d
(Smax − S) ≤ −(Smax − S).
dt
We can also perform an analogous calculation before taking the hydrodynamic
limit,
Smax − S
E(∆S | XR = k/N ) ≥
+ O(N −2 ).
N
We will use this approach to analyse the mean-field zero-range process. There
are two additional complications over the continuous case. Firstly, we have an
‘error’ term of order N −2 . Secondly, the Markov chain is ergodic; over large time
scales we cannot simply say that S is close to Smax .
74
4.4 Markov chain convergence
4.4
Markov chain convergence
Let Ω be a finite set. Let K : Ω × Ω → [0, 1] be a Markov kernel: for all
P
x, y K(x, y) = 1. If µ is a distribution on Ω, µK i gives the distribution after i steps. We can also run the Markov chain in continuous time at rate 1,
K − I is the corresponding Q-matrix. We say that the Markov chain is lazy if
K(x, x) ≥ 1/2 for all x ∈ Ω. We will assume that K is irreducible, lazy and
time reversible with respect to equilibrium probability measure π; for all x, y,
π(x)K(x, y) = π(y)K(y, x).
It is then standard that the left eigenvectors of K, suitably normalized, form
a basis for RΩ orthonormal with respect to the inner product
hf, giπ = hf, π −1 giRΩ =
X
f (x)g(x)/π(x).
x∈Ω
Label the eigenvectors F1 , . . . , Fn , with eigenvalues 1 = λ1 > λ2 ≥ λ3 ≥ · · · ≥
λn ≥ 0 respectively. The eigenvalues are all non-negative as the Markov Chain is
lazy. The first eigenvector F1 = π, the stationary distribution.
For a probability measure µ on Ω,
µ(x) = π(x) +
X
αi Fi (x),
αi = hµ, Fi iπ .
i≥2
The χ2 -distance from µ to π, kµ/π − 1k22,π , can be written
X (µ(x) − π(x))2
x∈Ω
π(x)
= hµ − π, µ − πiπ =
X
αi2 .
i≥2
The quantity 1 − λ2 is called the spectral gap. More generally, if the Markov chain
is not lazy, the spectral gap is 1 − maxi≥2 |λi |. Let µt = µe−t(I−K) . The eigenvalue
of Fi with respect to e−t(I−K) is e−t(1−λi ) so,
2
X
d X (µt (x) − π(x))2
µt (x) µt (y)
=−
π(x)K(x, y)
−
dt x
π(x)
π(x)
π(y)
x,y
≤ −2(1 − λ2 )
X (µt (x) − π(x))2
x
75
π(x)
.
(4.1)
4.4 Markov chain convergence
Define the total variation distance between µ and π,
1X
kµ − πkTV = sup |µ(A) − π(A)| =
|µ(x) − π(x)|.
2 x
A⊂Ω
Write τ1 (ε) for the time to convergence within ε in total variation,
τ1 (ε) = inf{t : ∀µ, kµt − πkTV ≤ ε}.
Inequality (4.1) can be used to show the following bound [48],
τ1 (ε) ≤
log(επmin )
,
log λ2
πmin = min π(x).
x
(4.2)
We will also use a log Sobolev technique. Log Sobolev inequalities for discrete
Markov chains are described in [19]. Define the Dirichlet form, and the Laplacian
for f, g : Ω → R,
1X
π(x)K(x, y)[f (x) − f (y)][g(x) − g(y)], and
2 x,y
X
|f (x)|2
2
L(f ) =
|f (x)| log
π(x).
kf k22,π
x
E(f, g) =
The log Sobolev constant is defined,
E(f, f )
: L(f ) 6= 0 .
ν = min
f
L(f )
This is in a superficial way similar to the spectral gap; we can write the spectral
gap in term of E,
1 − λ2 = min
f
E(f, f )
: varπ (f ) 6= 0 .
varπ (f )
The entropy of µ relative to π, also called the Kullback–Leibler divergence from
µ to π, is defined by
DKL (µkπ) =
X
x
µ(x) log
µ(x) X
=
δ(π(x), µ(x)) ≥ 0.
π(x)
x
It is not generally the case that DKL (µkπ) = DKL (πkµ). The log Sobolev constant can be used to show convergence in Kullback–Leibler divergence under the
action of e−t(I−K) ,
µ
d
µt t
DKL (µt kπ) = −E
, log
≤ −4ν DKL (µt kπ).
dt
π
π
76
(4.3)
4.5 Entropy for the mean-field ZRP
Corollary A.4 of [19] gives a general bound for the log Sobolev constant in terms
of the spectral gap and πmin ,
ν≥
4.5
(1 − 2πmin )(1 − λ2 )
.
log(1/πmin − 1)
Entropy for the mean-field ZRP
With Xk = Nk /N the fraction of boxes with k balls, the empirical distribution
is the random variable X = (X0 , X1 , . . . ). Let D be the set of distributions with
P
P
mean R; x ∈ [0, 1]N such that
xk = 1 and
kxk = R. Define the entropy
function S : D → [0, (R + 1) log(R + 1) − R log R] by
S(x) = −
∞
X
xk log xk ,
0 log 0 = 0.
k=0
Up to a factor of log 2 this is the standard information-theoretic entropy. It
is related to the thermodynamic entropy log N !/(N0 !N1 ! . . . ). The multinomial
coefficient N !/(N0 !N1 ! . . . ) counts the permutations of B1 , . . . , BN ; under the
equilibrium measure π,
π(X = x) = |BN |−1
N!
,
n0 !n1 ! . . .
xk = nk /N.
The entropy and thermodynamic entropy are related. Provided each box contains
O(R log N ) balls,
N!
S(X) − 1 log
= O(RN −1 log2 N ).
N
N0 !N1 ! . . . This is simply by Stirling’s approximation [11],
log
n!
= O(log n).
(n/e)n
We can estimate the expected change in entropy in terms of the expected
changes E(∆Xk | X = x). This calculation is meant only to motivate two rather
77
4.5 Entropy for the mean-field ZRP
lengthy calculations, the proofs of Lemmas 4.16 and 4.20.
1n
−(xk (1 − x0 ) − xk+1 )
N (xk−1 (1 − x0 ) − xk ) − (xk (1 − x0 ) − xk+1 )
∞
X
d
E(∆S | X = x) ≈ −
(t log t) t=xk E(∆Xk | X = x)
dt
k=0
E(∆Xk | X = x) ≈
k=0
k>0
∞
1 X
xk (1 − x0 )
≈
(xk (1 − x0 ) − xk+1 ) log
N k=0
xk+1
≥
∞
1 X (xk (1 − x0 ) − xk+1 )2
≥0
N k=0 max{xk (1 − x0 ), xk+1 }
It is interesting to consider the hydrodynamic limit for this process. It is the
solution Xf : [0, ∞) → [0, 1]N to the differential equation,
f
d f
− (Xkf (1 − X0f ) − Xk+1
)
Xk =
f
f
f
f
f
f
dt
(Xk−1 (1 − X0 ) − Xk ) − (Xk (1 − X0 ) − Xk+1 )
k = 0,
k > 0,
with initial condition,
(
1 k = R,
Xkf (0) =
0 otherwise.
The solution Xf (t) is also the marginal probability distribution of a certain continuous time random walk on N. Let the walk start C(0) = R. At time t, if
C(t) > 0 step left (decrement C by 1) at rate 1; step right (increment C by 1) at
rate 1 − P(C(t) = 0). Then P(C(t) = k) = Xkf (t).
The distribution x with maximum entropy for mean R is the geometric distribution xk = Rk /(R + 1)k+1 . Call this distribution GR . Define S to be the gap
between the maximum entropy and the entropy,
S(x) = S(GR ) − S(x).
Note that ∆S(X) = −∆S(X), and we expect S(X) to decrease to almost zero.
We can now hope for inequalities of the form
E(∆S | X = x) ≤ −c S(x),
We also have the following.
78
c constant.
4.5 Entropy for the mean-field ZRP
Lemma 4.4. S(x) is the relative entropy of x with respect to GR , that is the
Kullback–Leibler divergence from x to GR . Also, S(x) is less than Kullback–
Leibler divergence from x to the geometric distribution with first term x0 and
mean (1 − x0 )/x0 .
X
xk
=
δ(Rk /(R + 1)k+1 , xk )
k
k+1
R /(R + 1)
X
X
xk
≤
xk log
=
δ(x0 (1 − x0 )k , xk ).
k
x0 (1 − x0 )
S(x) =
X
xk log
The main results of this chapter are in Sections 4.9, 4.10 and 4.11. They can be
summarized as follows. Consider the evolution of the Markov chain up to N 21/20
steps. With probability 1 − O(N −1/3 ) the following occurs.
(i) After time O(N R2 log(R + 1)), X0 ≥ (5R)−1 . As a consequence, we get a
bound on the tail of the empirical distribution. After O(R2 N log N ) steps,
P
j
j
−1
log N .
k≥j Xk ≤ 8(1 − a) for j such that (1 − a) ≥ 2N
(ii) From then on,
E(∆S | X = x) ≤ −
S(x)
250R log N
+
.
2
500R N log N
N2
After time O(R2 N log(R + 1) log N ) steps, S ≤ φ := 10−4 (R + 1)−2 . However, when S is small, this bound on E(∆S | X = x) is rather weak. It does
not imply that S = O(R3 N −1 log2 N ) until Ω(R2 N log2 N ) steps.
(iii) The bound on E(∆S | X = x) improves to,
E(∆S | X = x) ≤ −
100(R + 1) log N
S(x)
+
.
3
4000(R + 1) N
N2
After O(R3 N log N ) steps, S = O(N −1 log2 N ).
In addition, we can show that S tends to decrease to O(R4 N −1 log N ).
In order to prove the above, we first analyse a biased random walk on N.
We use the equilibrium distribution and mixing time in Section 4.9 to obtain the
bound on the tail of the empirical distribution. In Section 4.10 we use the Markov
chain’s log Sobolev constant. In Section 4.11 we use an inequality related to the
spectral gap.
79
4.6 Total variation convergence
4.6
Total variation convergence
Let K : BN × BN → [0, 1] be the Markov kernel for the mean-field ZRP. A
natural quantity of interest is the rate at which the total variation distance from
equilibrium,
kµK i − πkTV = sup |µK i (A) − π(A)|,
A⊂BN
decays. The time to convergence within ε in total variation is defined,
τ1 (ε) = inf{i : ∀µ, kµK i − πkTV ≤ ε}.
Notice that τ1 (ε) is a maximum over all starting distributions µ. It is standard to
call τ1 (1/4) the total variation mixing time; it can be used to control the Markov
chain convergence to an arbitrary degree of accuracy,
τ1 (2−k ) ≤ kτ1 (1/4),
k = 2, 3, . . . .
The results for the entropy described in Section 4.5 assume that the Markov
chain starts from a particular state, with R balls in every box. Under the equilibrium measure, the maximum number of balls per box is O(R log N ) with high
probability. The results immediately generalize to allow any starting distribution such that every box contains O(R log N ) balls. However, if we start with all
RN balls in one box, the process clearly does not reach equilibrium before time
RN 2 /2—one box will still contain half the balls. Trying to calculate the total
variation mixing time would clearly be the wrong approach.
One could consider a truncated version of the mean-field ZRP. The meanfield ZRP is defined on the set BN ; BN can be thought of as a graph, with edges
between the elements of BN that differ by the location of one ball. The meanfield ZRP is then the random walk on BN with uniform transition probability
N −2 : each step, move to each of the adjacent states with probability N −2 . The
equilibrium distribution is uniform.
The process can be truncated as follows. Remove certain states, and any
incident edges, from the graph BN . Then consider the random walk on the new
graph with uniform transition probability N −2 . The equilibrium distribution is
again uniform. Provided the fraction of the vertices of BN that have been removed
is small, the change to the equilibrium measure is small.
80
4.7 A biased random walk
One such truncation is to remove all the ball configuration where any box
holds more than O(R log N ) balls. Rough calculations suggest that under the
equilibrium measure, S(X) has order at least RN −1 log N with high probability. Given that we show S(X) = O(R4 N −1 log N ) with high probability after
O(R3 N log N ) steps, it is tempting to conjecture that for the restricted Markov
chain, τ1 (1/4) = O(R3 N −1 log N ).
Our results deal with the process when it is ‘far’ from equilibrium. Proving
total variation convergence might well require a rather different technique. The
empirical distribution is a Markov chain on
(
D0 =
x ∈ {0, N −1 , 2N −1 , . . . , 1}N :
)
X
k
xk = 1,
X
kxk = R .
k
The Markov chain on D0 is similar to the one analysed by Frieze, Kannan and
Polson [29]. They looked at the problem of efficiently sampling from a log-concave
distribution on a convex subset of Euclidean space. They considered a reversible
Markov chain on a convex subset of the hypercubic lattice, chosen so that the
equilibrium distribution approximates the log-concave distribution. The mixing
time in bounded in terms of the dimension and the diameter of the state space.
Our results do seem to be a step in the right direction. We know that the
empirical distribution is soon effectively confined to a subset of D0 ,
D00 = x ∈ D0 : S(x) = O(N −1 log2 N ) and xk = 0 for k = Ω(R log N ) .
It would therefore be interesting to know the total variation mixing time for the
process when truncated such that the support of empirical distribution is D00 .
The diameter of D00 is much smaller than the diameter of D0 .
4.7
A biased random walk
To study the evolution of a single box of the mean-field ZRP, you must know
something about the state of the rest of the system. It is sufficient to know X0 ,
the fraction of empty boxes. Consider the step from time i to i + 1. If B1 (i) > 0,
box one loses a ball with probability approximately 1/N . Box one gains a ball
with probability approximately (1 − X0 (i))/N . We consider the effect of ‘fixing’
81
4.7 A biased random walk
X0 = a. Over a short period of time X0 does not change much. The empirical
distribution evolves almost as if each box is an independent copy of the following
biased random walk when ‘n’ is sufficiently large.
Let a ∈ (0, 1) and let n be a positive integer. Define a discrete Markov chain
C on state space Ω = {0, . . . , n − 1}. Each step,
(i) decrease C by one with probability 1/N (unless C = 0),
(ii) increase C by one with probability (1 − a)/N (unless C = n − 1).
That is, C has Markov kernel K : Ω × Ω → [0, 1],


(1 − a)/N k = j + 1,
K(j, k) = 1/N
k = j − 1,


0
|k − j| > 1,
P
and
k K(j, k) = 1. The Markov chain has stationary distribution π(k) =
â(1 − a)k , with â = a/(1 − (1 − a)n ). Of course, π(j)K(j, k) = π(k)K(k, j); the
Markov chain is reversible. Unless N ≤ 3, the Markov chain is lazy.
Lemma 4.5.
(i) Suppose C(0) < n/2. The probability that C hits the right boundary, n − 1,
within N 21/20 steps is O(N 21/20 (1 − a)n/2 ).
(ii) The spectral gap 1 − λ2 ≥ a2 /(4N ). The Markov chain C mixes, in total
variation to within ε of π, in time τ1 (ε) ≤ 4N a−2 log(επ(n − 1)).
(iii) Let µ be the law of C(0). Choose j ≤ l such that µ(C > l) ≤ π(j)/2. Then
(µK i )(k) ≤ 2π(k) for k ≤ j when i ≥ 4N a−2 log 2/π(l).
Arguably it would be more natural to take n = ∞ and consider a random walk
on N. However, with n finite, the analysis is much simpler. Fortunately, provided
a = Ω(R−1 ), we can take n large enough that C is very unlikely to reach n − 1
over the first N 21/20 steps, while still having C mix in time O(R2 N log N ).
We can also run the Markov chain in continuous time. The inequalities of
Section 4.4 can be simplified by π(k + 1) = (1 − a)π(k) and
N
d
µt (k) t=0 =1{k>0} [µ(k − 1)(1 − a) − µ(k)]
dt
− 1{k<n−1} [µ(k)(1 − a) − µ(k + 1)].
82
4.7 A biased random walk
The spectral gap inequality (4.1) becomes,
n−2
n−1
X
1 X (µ(k)(1 − a) − µ(k + 1))2
(µ(k) − π(k))2
≥ (1 − λ2 )
.
N k=0
π(k + 1)
π(k)
k=0
The left and right hand sides of the above inequality are very sensitive to changes
in the tail of µ. Obviously it is not generally possible to just ignore these terms.
However, we can do exactly that if µ(0) = a. We will state this in terms of the
empirical distribution, so take a = X0 .
Lemma 4.6. For j = 1, 2, . . . , the empirical distribution satisfies
j
X
(Xk (1 − X0 ) − Xk+1 )2
k=0
X0 (1 − X0 )k+1
≥
j+1
X02 X (Xk − X0 (1 − X0 )k )2
.
4 k=1
X0 (1 − X0 )k
Inequality (4.3), containing the log Sobolev constant, is equivalent to
n−1
n−2
X
µ(k)
1 X
µ(k)(1 − a)
µ(k) log
≥ 4ν
.
µ(k)(1 − a) − µ(k + 1) log
N k=0
µ(k + 1)
π(k)
k=0
We can adapt this to apply to the empirical distribution. As we may have Xk = 0,
set Xk0 = Xk ∨ (e−1 N −1 ) to avoid dividing by zero. Assume N ≥ n ≥ 10.
Lemma 4.7. If X0 > 0 and Xk = 0 for k ≥ n − 1,
n−2
1 X 0
Xk0 (1 − X0 )
0
(X (1 − X0 )−Xk+1 ) log
≥
0
N k=0 k
Xk+1
4ν
n−1
X
Xk log
k=0
n log N
Xk
− 4ν
.
k
X0 (1 − X0 )
N
For fixed a, the log Sobolev constant
ν≥
(1 − 2π(n − 1))(1 − λ2 )
= Ω(N −1 n−1 )
log(1/π(n − 1) − 1)
by Corollary A.4 of [19]. It is clear that the lower bound ν = Ω(N −1 n−1 ) cannot
be greatly improved. Consider a probability distribution that places its mass far
away from zero, and diffusely. Then the rate of change of the relative entropy
is approximately −a−1 N −1 log(1 − a), due to the downward bias of the random
walk, while the entropy can have order −n log(1 − a).
83
4.8 Concentration results
4.8
Concentration results
We will use a concentration result, Theorem 3.15 from [46], strengthened as mentioned in their Section 3.5.
Theorem 4.8. Let X = (0 = X(0), X(1), . . . , X(m)) be a martingale with respect
to filtration {0, Ω} = F0 ⊂ F1 · · · ⊂ Fm . Let b be the maximum step size,
|∆X(i)| ≤ b. Let σi2 be the conditional variance of ∆X(i) given Fi , and let v̂ be
P
2
an upper bound on m−1
i=0 σi . For any λ ≥ 0,
−λ2
P max X(i) ≥ λ ≤ exp
.
i≤m
2v̂ + 2bλ/3
The above result can be applied to the following class of processes by constructing
a suitable martingale.
Let (W (i) : 0 ≤ i ≤ N 21/20 ) be a process adapted to
filtration (Fi ) with ∆W (i) − E(∆W (i) | Fi ) ≤ b. Suppose that with probability
1 − q from time 0 to N 21/20 ,
(a) W (i) ∈ [0, α], and
(b) while W (i) ∈ [α/8, α], E(∆W (i) | Fi ) ≤ −γ.
Our corollary will be applied to S once we have bounded the expected change
E(∆S | X = x). The quantity v̂ is defined in the proof. It is an upper bound on
sum of the step-wise variances of the constructed martingale X.
Corollary 4.9. If
p
α
≥ 2b log N + 6v̂ log N
16
then P(∀i ≥ 9α/(16γ), W (i) ≤ α/2) ≥ 1 − q − N −1 .
(4.10)
Proof. Assume W (0) ≥ 3α/8. Let M be the stopping time triggered by any of
the following:
(i) W (i) ≤ α/8,
Pi−1
(ii)
j=0 E(∆W (j) | Fj ) ≤ 3α/16 − W (0),
P
(iii) W (i) − W (0) − i−1
j=0 E(∆W (j) | Fj ) ≥ α/16,
84
4.9 Coupling results
(iv) E(∆W (i) | Fi ) > −γ, or
(v) W (i) ≥ α.
M is less than m := d13α/(16γ)e. Let W M denote W stopped at M ; define a
martingale X by X(0) = 0 and ∆X(i) = ∆W M (i) − E(∆W M (i) | Fi ). Let v̂ be
P
an upper bound on m−1
i=0 var(∆X(i) | Fi ). Let c ∈ (0, 1) and suppose
M
P sup X (i) ≥ α/16 ≤ c.
(4.11)
i
Then with probability 1 − q − c, W (i) first leaves the interval [α/4, α] before time
13α/(16γ), having decreased to less than α/4. In particular, W (i) ≤ α/2 after
9α/16 steps. Let c = N −3 , so (4.10) implies (4.11) by Theorem 4.8. If the process
stops because of (i) or (ii), or if initially W ≤ 3α/16, wait for W ≥ 3α/8. Then
start the construction of X over again. The number of times that the process is
restarted is less than N 21/20 . Therefore, the probability that W ≥ α/2 at any
time between 9α/(16γ) and N 21/20 is less than q + N 21/20 c.
We have two other uses for Theorem 4.8. If X ∼ Bin(n, p),
−λ2
.
P(|X − np| ≥ λ) ≤ 2 exp
2np + 2λ/3
Firstly, if E(X) ≥ 8 log N then P(X ≥ 2E(X)) ≤ 2N −3 . We use this to bound
the tail of the empirical distribution. Secondly, if X is a Poisson random variable
with parameter µ ≤ N 1/4 ,
P(|X − µ| ≥ 3N 1/8
p
log N ) ≤ 2N −3 .
We use this to show that the coupling defined in the proof of Theorem 4.12 has
the properties claimed. Over the next three sections we give the main results.
4.9
Coupling results
We have described, and stated mixing results for, a biased random walk. The
random walk was motivated by the evolution of a single box. We will now give
coupling results that make the relationship with the mean-field ZRP more precise.
85
4.9 Coupling results
We first show that if X0 , the fraction of empty boxes, is bounded below then
we can couple the individual boxes to independently evolving copies of the biased
random walk. Choose n = n(a) maximal such that π(n − 1) ≥ 2N −10 . This is an
assumed upper bound on the number of balls any box will contain. Let Aa,s be
the event
√
(i) from time s until time N 21/20 , X0 ≥ a + 6N −1/8 log N , and
(ii) at time s, Xk = 0 for all k ≥ n/2.
Starting from time s, define a process C = (Ci )N
i=1 . Let Ci (s) = Bi (s) for i =
1, . . . , N . Then let the Ci evolve independently, each according to the continuous
time biased random walk in Section 4.7 on {0, 1, . . . , n − 1} with bias a. We
then want to find a coupling of B and C such that the following event has high
probability: for all integer t from s to N 21/20 , there exists t0 ∈ [t − N, t + N ] such
that,
(i) Bi (t) ≤ Ci (t0 ) + 2 for all i, and
(ii) Ci (t0 ) < n − 2 for all i.
Call this event A0a,s .
Theorem 4.12. There exists a coupling (B, C) with P(A0a,s ) ≥ P(Aa,s ) − O(N −2/5 ).
Recall that by Lemma 4.5 (ii), the Ci mix to within N −10 in total variation in
time τ1 (N −10 ). For convenience write
τ (a) = τ1 (N −10 ) + N = O(a−2 N log N ).
After discrete time s + τ (a), we can assume the Bi are bounded by Ci close to
equilibrium. Let Ca,s be the event that from time s + τ (a) to N 21/20 ,
P
j
j
−1
(i)
log N , and
k≥j Xk ≤ 8(1 − a) for all j with (1 − a) ≥ 2N
P
j
−5
(ii)
.
k≥j Xk = 0 for j such that (1 − a) ≤ N
By the concentration calculations at the end of Section 4.8, P(A0a,s \ Ca,s ) =
O(N −2/5 ), so P(Ca,s ) ≥ P(Aa,s ) − O(N −2/5 ).
Before we can make use of Theorem 4.12, we need to know that Aa,s occurs
with high probability for some a, s. This follows from another coupling argument,
this time involving an unbiased random walk on {0, 1, . . . , 2R − 1}.
86
4.9 Coupling results
Theorem 4.13. Let a = (5R)−1 and s = 10N R2 log(R + 1). Then P(Aa,s ) ≥
1 − O(N −2/5 ).
With a ≥ (5R)−1 , the mixing time τ1 (N −10 ) ≤ 2000R2 N log N .
Under the event Ca,s we can perform some calculations that we will need
later. So far we only have a lower bound on X0 . We can calculate an upper
bound on X0 using our upper bound on the tail of the empirical distribution
P
P
and the identities
Xk = 1,
kXk = R. We will also need a bound on the
P
contribution of the tail of the distribution k≥j δ(X0 (1 − X0 )k , Xk ) to the upper
P
bound
δ(X0 (1 − X0 )k , Xk ) ≥ S(X).
Lemma 4.14. Let a ∈ [(5R)−1 , (R + 1)−1 ]. Under Ca,s for integer t from s + τ (a)
to N 21/20 :
(i) The fraction of empty boxes X0 ≤ 0.99 if N is sufficiently large.
(ii) For all j,
1−a
δ(X0 (1 − X0 ) , Xk ) ≤ 8(1 − a) log 40R + (j + 5R) log
1 − X0
k≥j
X
k
j
+
80 log2 N
.
N
Proof of Theorem 4.12. Define an alphabet of 2N letters, Ui and Di for i =
1, . . . , N . These should be understood to mean ‘move coordinate i up’ and ‘move
coordinate i down’, respectively. We will use a Poisson process to produce a
list of letters. Let u = N 1/4 , do the following for time u and independently for
i = 1, . . . , N : write Ui at rate (1 − a)/N and write Di at rate 1/N . Use this list of
letters to run C for time u. Read through the list taking the appropriate actions.
For each Ui increment Ci by one, for each Di decrement Ci by one if Ci > 0.
We can use the same list to run B; each ‘Di ’ corresponds to one step of process
B. Go through the ‘Di ’ symbols in the list in order:
(i) If Bi = 0 do nothing.
(ii) If Bi > 0, pick the first Uj in the list that you have not already selected. If
i 6= j, decrement Bi and increment Bj .
87
4.9 Coupling results
Discard any remaining ‘Ui ’ letters. If we can follow this procedure, and if each
coordinate i has appeared at most once (as either Di or Ui ), then the property
Bi ≤ Ci is preserved. However, the value of Bi during the processing of the list
may be one greater than the final value of Ci .
We require there are enough ‘Uj ’ letters in the list for the letters ‘Di ’ that
correspond to Bi > 0. The number of ‘Ui ’ is a Poisson random variable with
mean u(1 − a) ≤ N 1/4 ; by concentration, with probability 1 − 2N −3 there are at
√
least u(1 − a) − 3N 1/8 log N in the list.
Now condition on the ‘Uj ’ in the list. The ‘Di ’ form a Poisson process with
combined rate 1. We can run the process B as the Di arrive. The total rate
at which ‘Di ’ with Bi > 0 arrive is 1 − X0 . Provided the event Aa,s holds,
√
X0 ≥ a + 6N −1/8 log N ; the rate at which ‘Di ’ with Bi > 0 arrive is bounded
√
above by 1 − a − 6N −1/8 log N . By another application of the concentration re√
sult, the number of Di corresponding to Bi > 0 is less than u(1−a)−3N 1/8 log N .
Therefore with probability P(Aa,s ) − 4N −3 we can run the two processes coupled in this way for a ‘block’ of time. We have run process C for time u, and
process B for as many steps as there are Di in the list, with probability 1 − 2N −3
√
between u ± 3N 1/8 log N steps. Repeat the above block procedure b := 2bN 4/5 c
times; with probability P(Aa,s ) − 6bN −3 we have run process B for at least N 21/20
steps. When the construction fails, we can complete the coupling in an arbitrary
manner.
The motivation for the choice of b, the number of blocks, and u, the size of
each block, was to ensure that with high probability:
(i) B is run for at least order R3 N log N steps,
(ii) (∀i)(coordinate i appears at most twice per block), and
(iii) (∀i)(coordinate i appears twice in no more than one block).
We can take P(Aa,s )−O(N −2/5 ) as a lower bound on the probability of this event.
Let integer t in [s, N 21/20 ] be a discrete time for process B, and let t0 be the
time of the end of the corresponding block for process C. Then if the above
event holds, Bi (t) ≤ Ci (t0 ) + 2 for all i. At time s, Bi ≤ Ci + ki with ki = 0. If
88
4.9 Coupling results
coordinate i does not appear twice is a block, and Bi ≤ Ci + ki at the start of a
block, then Bi may be equal to Ci + ki + 1 during the block but Bi ≤ Ci + ki at
the end of the block. If coordinate i does appear twice in a block, from then on
it may be necessary to take ki = 1. Suppose Ci starts a block at 0, is acted on
by a Ui and then by a Di ; Ci = 0 at the end of the block. Also starting from 0,
Bi may be acted on by Di , Ui , ending up at 1.
Proof of Lemma 4.14. The empirical distribution under Ca,s satisfies,
X
Xk ≤ 8(1 − a)j
if (1 − a)j ≥ 2N −1 log N,
and
k≥j
X
Xk = 0 if (1 − a)j ≤ N −5 .
k≥j
P
Xk imply upper bounds on
k≥j kXk . We
P
choose j sufficiently large, so that k≥j kXk < R, and sufficiently small,
P
so that the upper bound 8(1 − a)j ≥ k≥j Xk is greater than 0.01. Then
P
X0 ≤ 1 − 8(1 − a)j or we get a contradiction,
kXk < R.
(i) The upper bounds on
P
k≥j
We can assume a = (5R)−1 . Take j = 29R so 8(1 − a)j ≥ 0.01. Also,
X
k≥29R
kXk ≤
X
8ak(1 − a)k + 16N −1 log N × log1−a N −5
k≥29R
≤ 0.9R + O(RN −1 log2 N ) < R
if N is sufficiently large.
(ii) Notice that δ(x, y) is linear in the sense that δ(x, y) = xδ(1, y/x). Check
that

x
y < x,
δ(x, y) ≤
y log y/x y ≥ x.
89
4.9 Coupling results
Therefore
X
δ(X0 (1 − X0 )k , Xk )
k≥j
8
1−a
≤
+ k log
+ δ(N −5 , 16N −1 log N )
8a(1 − a) log
X
1
−
X
0
0
k≥j
8
1−a
1−a
80 log2 N
j
≤8(1 − a) log
+ j+
log
+
.
X0
a
1 − X0
N
X
k
Proof of Theorem 4.13. Apply Markov’s inequality to the empirical distribution
P
X. The mean is R, so at most half the boxes have 2R balls in, k<2R Xk ≥ 1/2.
Think of the boxes as being indistinguishable, but with labels 1, . . . , N attached
to indicate how they correspond to B1 , . . . , BN . Let l = dN/2e. Initially Bi < 2R
for i = 1, . . . , l. Run the Markov chain step by step. If at the completion of a
step Bi = 2R for i ≤ l,
(i) pick j > l with Bj < 2R,
(ii) swap the labels on the boxes i and j, so Bi < 2R.
Following this procedure, the property Bi < 2R for i = 1, . . . , l is preserved.
With each step and any subsequent relabelling:
(a) B1 increases by one with probability less than (N − 1)/N 2 .
(b) B1 decreases by at least one with probability at least (N − 1)/N 2 , unless
B1 = 0. Box one may be picked as the source box, or it may get too full
and be subject to a relabelling.
We introduced notation for a Markov chain on the set {0, 1, . . . , n − 1} in Section
4.7. Set n = 2R and a = 0, so the walk is unbiased. The spectral gap, see the
proof of Lemma 4.5, is
π 1
2 1 − λ2 =
1 − cos
≥ 2 .
N
2R
R N
90
4.9 Coupling results
The stationary distribution π is uniform on {0, 1, . . . , 2R − 1}. The time needed
to mix to within (20R)−1 of π in total variation is
log (20R)−1 · (2R)−1 / log(λ2 ) ≤ N R2 log(40R2 ) ≤ 10N R2 log(R + 1).
Let C1 , . . . , Cl be l copies of the continuous time Markov chain on state space
{0, 1, . . . , 2R − 1}. Let Ci = Bi = R for i = 1, . . . , l at time 0, and let the
Ci evolve independently. Once the Ci have mixed sufficiently, |{i : Ci = 0}|
is stochastically greater than a binomial Bin(l, (2R)−1 − (20R)−1 ) distribution.
√
With high probability |{i : Ci = 0}| ≥ 9N/(40R) − 2N 1/2 log N .
Coupling B and C is much easier here than in the proof of Theorem 4.12.
The result follows if we simply show that Bi ≤ Ci for ‘almost every’ i. Write
letters Ui and Di at rate N −1 , independently for i = 1, . . . , N for a block of time
u = N 1/4 . Use the Ui and Di , where i ≤ l, to run C for time u. We will also use
the letters to run B. The number of ‘Ui ’ in the block can be assumed to be within
√
√
N 1/4 ± 3N 1/8 log N ; similarly for the ‘Di ’. Run B for N 1/4 − 3N 1/8 log N steps.
Pick pairs (Di , Uj ) from the list of letters, without replacement. For each pair
move a ball from box i to box j if possible. Repeat b := 2bN 4/5 c times.
Under the concentration bound, the total number of letters left unused is
√
order O(N 37/40 log N ). The probability that coordinate i appears twice in at
least one of the b blocks is O(N −7/10 ). It easily follows by concentration that with
high probability
|{i : Bi = 0}| ≥ |{i : Ci = 0}| + 8N 7/8
p
log N − N/(40R).
To bound the number of balls per box at s = 10N R2 log(R + 1), consider only
the number of balls box one receives up to time s. This is bounded by a binomial
Bin(s, 1/N ) distribution. Again we use concentration,
B1 (s) ≤ R + 2 log N +
p
60R2 log(R + 1) log N + 10R2 log(R + 1)
with probability 1 − N −3 .
91
4.10 Slow initial decrease in S
4.10
Slow initial decrease in S
By Section 4.9 we may assume Ca,s with a = (5R)−1 and s+τ (a) = O(R2 N log N ).
Let φ = 10−4 (R + 1)−2 . In this section we use the log Sobolev result from Lemma
4.7 to show that S decreases to φ. This method could be pushed further, to
show that S decreases to O(R3 N −1 log2 N ). However, the bound on the time
needed would be Ω(R2 N log2 N ). In the next section we show that once S ≤ φ,
S decreases more quickly.
Theorem 4.15. With probability 1−O(N −1/3 ), between O(R2 N log(R+1) log N )
and N 21/20 , S ≤ φ.
We need the following technical lemma to deal with the discrete nature of the
process. Let x0k = xk ∨ (e−1 N −1 ).
Lemma 4.16. Assume xk = 0 for k ≥ n0 ≥ 5 log N . If x0 ≤ 1 − 10−9 ,
0
n −1
1 X 0
x0k (1 − x0 ) 10n0
0
E(∆S | X = x) ≥
(x (1 − x0 )−xk+1 ) log
− 2.
2N k=0 k
x0k+1
N
Proof of Theorem 4.15. The inequality in Lemma 4.7 contains ν, the log Sobolev
constant for the biased random walk. Recall Corollary A.4 of [19], the general
bound on log Sobolev constants in terms of the spectral gap and the minimum
of the equilibrium distribution,
ν≥
(1 − 2π(n − 1))(1 − λ2 )
.
log(1/π(n − 1) − 1)
By Lemma 4.5 (ii) with a = (5R)−1 , spectral gap 1 − λ2 ≥ (100R2 N )−1 . With
n = n(a), the maximum n such that π(n − 1) ≥ 2N −10 ,
ν ≥ (1000R2 N log N )−1 .
By Lemma 4.14 (i) we may apply Lemma 4.16 with n0 = n(a)/2. Combining the
inequalities,
E(∆S | X = x) ≤ −2νS +
92
250R log N
.
N2
(4.17)
4.11 Rapid decrease in S
We can now apply Corollary 4.9 to show that S decreases as inequality (4.17)
suggests. Corollary 4.9 only gives a condition under which the upper bound on a
process (W (i) : 0 ≤ i ≤ N 21/20 ) can be halved, so we need to apply it a number
of times. Suppose inductively that with probability 1 − ql from time tl to N 21/20 ,
S ≤ α = S(GR ) · 2−l .
Let
W (i) = S(X((i + tl ) ∧ bN 21/20 c)).
We need a lower bound γ on the expected decrease of W when W ∈ [α/8, α].
This is obtained by substituting S = α/8 into inequality (4.17); we can take
γ = α(8000R2 N log N )−1 . The maximum time before W hits α/4, or the concentration bound fails, is m = 13α/(16γ) = 6500R2 N log N . The step size ∆W
is bounded by ±2N −1 (1 + log N ) so take b = 4N −1 (1 + log N ) and v̂ = mb2 /4.
Inequality (4.10) is satisfied if α ≥ 105 RN −1/2 log2 N .
Take 1 − q0 = P(Ca,s ), with a = (5R)−1 and s = 10N R2 log(R + 1). Let
t0 = s + τ (a). Then we can take ql+1 = ql + N −1 and tl+1 = tl + m. If N is
sufficiently large,
φ ≥ 105 RN −1/2 log2 N
so the result follows.
We delay the proof of Lemma 4.16.
4.11
Rapid decrease in S
In this section we use the spectral gap related inequality, Lemma 4.6, to prove S
decreases rapidly.
Theorem 4.18. With probability 1−O(N −1/3 ), from O(R3 N log N ) until N 21/20 ,
S(X) = O(N −1 log2 N ).
By Section 4.10 we may assume that S ≤ φ = 10−4 (R + 1)−2 from steps
O(R2 N log(R + 1) log N ) to N 21/20 . When the entropy is close to the maximum
93
4.11 Rapid decrease in S
entropy, X0 must be close to (R + 1)−1 . In Lemma 4.19 we bound the difference.
This allows us to use Theorem 4.12 from Section 4.9 again, this time with an
improved lower bound a. We also need two bounds related to ∆S.
Lemma 4.19. If S(x) ≤ (R + 1)/(3R2 ) then
2
1
3R2 S
x0 −
≤
.
R+1
(R + 1)3
Lemma 4.20. Assume xk = 0 for k ≥ n0 ≥ 5 log N . Then
0
n −1
1 X (xk (1 − x0 ) − xk+1 )2
10n0
E(∆S | X = x) ≥
− 2.
N k=0 max{xk (1 − x0 ), xk+1 }
N
Lemma 4.21. Suppose x0 ≤ 2/3 and xk = 0 for k ≥ n0 . Then the variance of
∆S satisfies
var(∆S | X = x) ≤
120n0 log N + 200n0
12 log N
E(∆S | X = x) +
.
N
N3
Proof of Theorem 4.18. The proof is similar to the proof of Theorem 4.15. We
start off with the assumption that S ≤ φ between O(R2 N log(R + 1) log N ) and
N 21/20 steps. This allows us to bound E(∆S | X = x) when S is in a suitable
range. We then use Corollary 4.9 inductively.
However there are some added complications. One problem is that the bound
on E(∆S | X = x) will not apply ‘instantly’. Once S ≤ φ2−l , l = 0, 1, . . . , we
will have to wait before we can assume an improved bound on E(∆S | X = x).
Secondly, we must use Lemma 4.21 to bound the variance of ∆S. We split
the proof of Theorem 4.18 into two parts. In the first part we will show that
S ≤ ψ = (R + 1)/(3 log2R/(R+1) N ). In the second, that S ≤ 105 N −1 log2 N .
We collect and ‘daisy-chain’ (i) Lemma 4.20, (ii) an elementary observation,
(iii) Lemma 4.6, (iv) the bound δ(x, y) ≤ (x − y)2 /x for function δ, and (v)
Lemma 4.4.
P
(xk (1−x0 )−xk+1 )2
−2
(i) E(∆S | X = x) ≥ N −1 j−2
log N ,
k=0 max{xk (1−x0 ),xk+1 } − 100(R + 1)N
Pj−2 (xk (1−x0 )−xk+1 )2
Pj−2 (xk (1−x0 )−xk+1 )2
x0 (1−x0 )k
(ii)
inf
,
k≤j−1
k+1
k=0 max{xk (1−x0 ),xk+1 } ≥
k=0
xk
x0 (1−x0 )
Pj−2 (xk (1−x0 )−xk+1 )2
P
(xk −x0 (1−x0 )k )2
(iii)
≥ (x20 /4) j−1
,
k=0
k=1
x0 (1−x0 )k+1
x0 (1−x0 )k
94
4.11 Rapid decrease in S
(iv)
Pj−1
(xk −x0 (1−x0 )k )2
x0 (1−x0 )k
(v)
P∞
δ(x0 (1 − x0 )k , xk ) ≥ S(x).
k=1
k=1
≥
Pj−1
k=1
δ(x0 (1 − x0 )k , xk ),
Therefore,
x3
E(∆S | X = x) ≤ − 0
4N
!
S−
X
δ(x0 (1 − x0 )k , xk )
(1 − x0 )k
k≤j−1
xk
inf
k≥j
100(R + 1) log N
.
N2
+
The result now hinges on controlling, for some j,
(a)
X
δ(x0 (1 − x0 )k , xk )
and
k≥j
(b) inf
k≤j
(1 − x0 )k
.
xk
We need j sufficiently large to make quantity (a) small, but not too large or
quantity (b) becomes too small.
Part One:
For the first induction argument, we take α = φ2−l , l = 0, 1, . . . , blog2 φ/ψc.
Assume inductively S ≤ α ∈ [ψ, φ]. By Lemma 4.19,
r
3α
X0 − 1 ≤ R
.
R+1
R+1 R+1
Then, provided N is sufficiently large, we find that X0 , a satisfy the requirements
of Theorem 4.12 if we take,
1
R
a = a(α) =
−
ε(α),
R+1 R+1
r
ε(α) = 2
3α
.
R+1
(4.22)
Once S ≤ ψ, we will not need to keep improving our bound a ≤ X0 in this way.
For each value a = a(α), we apply Theorem 4.12, producing a corresponding set
of coupled biased random walks Ca = (Cia )N
i=1 . Once we have S ≤ α with high
probability, we must wait for the Cia to mix, though not necessarily for as long
as τ (a) steps. Choose j ∈ N,
j = j(α) = bε(α)−1 c,
95
α ∈ [ψ, φ].
4.11 Rapid decrease in S
(a) When α = φ wait τ (a) steps; by Lemma 4.5 (ii) the Cia mix to within N −10
in total variation. Then by concentration we can assume Xk ≤ 8(1 − a(φ))k
for k such that (1 − a(φ))k ≥ 2N −1 log N .
(b) When α = φ2−l , l = 1, . . . , blog2 φ/ψc, wait O(RN j(α)) steps so we can
bound Xk for k ≤ j(α) by Lemma 4.5 (iii). Then Xk ≤ 16(1 − a(α))k for
k ≤ j(α).
Note that if k ≤ j(α) by (4.22),
(R/(R + 1))k ≤ (1 − a(α))k ≤ e(R/(R + 1))k .
By (a) and Lemma 4.14, if α ∈ [ψ, φ] and S ∈ [α/8, α],
X
δ(X0 (1 − X0 )k , Xk ) ≤ 200j(1 − a(φ))j ≤ α/16.
k≥j
By (b),
inf
k≤j
(1 − X0 )k
e−1 (R/(R + 1))k
1
≥ inf
≥
.
k
k≤j 16e(R/(R + 1))
Xk
16e2
a(α)
Therefore when S ∈ [α/8, α], α ∈ [ψ, φ] and the Ci
E(∆S | X = x) ≤ −
104 (R
have mixed,
α
.
+ 1)3 N
Part Two:
From now on fix
1
R
a=
−
ε,
R+1 R+1
r
ε=2
3ψ
.
R+1
We apply Theorem 4.12 only once more; wait τ (a) for the Cia to mix. For k such
that (1 − a)k ≥ 2N −1 log N , Xk ≤ 8(1 − a)k . Let
j = blog1−a 2N −1 log N c.
If k ≤ j ≤ 2ε−1 , (R/(R + 1))k ≤ (1 − a)k ≤ e2 (R/(R + 1))k . Therefore
inf
k≤j
(1 − X0 )k
1
≤ 4.
Xk
8e
96
4.11 Rapid decrease in S
By Lemma 4.14 with N large,
X
δ(X0 (1 − X0 )k , Xk ) ≤ 160N −1 log2 N.
k≥j
For the second induction argument take α = ψ2−l , l = 0, 1, . . . . When S ∈ [α/8, α]
and α ∈ [105 N −1 log2 N, ψ],
E(∆S | X = x) ≤ −
α
3·
104 (R
+ 1)3 N
.
Using the bound on variance from Lemma 4.21,
v̂ ≤
12 log N
108 (R + 1)4 log N
α+
.
N
N2
Inequality (4.10) holds for α ≥ 105 N −1 log2 N if N is sufficiently large.
Proof of Lemma 4.19. We prove this by maximizing S(x) with x0 fixed. We can
assume that x1 , x2 , x3 , . . . form a geometric series, a, ar, ar2 , . . . . Then
x0 +
X
ark = 1,
X
(1 − x0 )2
,
R
r=
(k + 1)ark = R.
Solve to get,
a=
R − 1 + x0
.
R
Therefore the maximum value for S is
−x0 log x0 − (1 − x0 ) log
R − 1 + x0
(1 − x0 )2
− (R − 1 + x0 ) log
.
R
R
Check that for 0 ≤ y ≤ 2x, (x − y)2 /(3x) ≤ δ(x, y). If x0 ≤ 2(R + 1)−1 ,
S ≥ S GR − S(x0 , a, ar, . . . )
!
√
√
1
R 1 − x0
R R − 1 + x0
=δ
, x0 + 2 Rδ
, √
+ Rδ
,
R+1
R+1
R+1
R
R
2
(R + 1)3
1
1
≥
x0 −
.
3
R+1
R2
This lower bound on S is a convex function of x0 .
97
4.12 Final observations
4.12
Final observations
We have shown that the event S = O(N −1 log2 N ) from time O(R3 N log N ) to
N 21/20 has high probability. In this section we modify our argument to show
that S decreases to O(R4 N −1 log N ). Rough calculations suggest that under the
equilibrium measure, the expected value of S is Ω(RN −1 log N ); our result may
in some sense be tight.
As in part two of the proof of Theorem 4.18, take
s
1
12R2 ψ
a=
−
, j = blog1−a 2N −1 log N c.
R+1
(R + 1)3
We can then assume that
E(∆S | X = x) ≤ −
S−
δ(x0 (1 − x0 )k , xk ) 100(R + 1) log N
+
.
2000(R + 1)3 N
N2
P
k≥j
We sharpen Lemma 4.14. With s = O(R3 N log N ), let E be the subset of Ca,s
where S = O(N −1 log2 N ) from s to N 21/20 . Let E 0 be the intersection of events
E and
(
1000(R + 1)2 log N
δ(X0 (1 − X0 )k , Xk ) ≥
∀i ∈ [s, N 21/20 ],
N
k≥j
X
)
.
Lemma 4.23. P(E 0 ) ≥ 1 − O(N −1/3 ).
We will now consider a modification of the entropy gap process S. Let Fi be the
σ-algebra generated by the mean field ZRP up to step i. Let W be a random
process defined from O(R3 N log N ) to N 21/20 ,
(
S(X(i)) if E(E 0 | Fi ) > 0,
W (i) =
0
otherwise.
Let n = blog log N c. Let C be the discrete time biased random walk on
Ω = {0, 1, . . . , n − 1} that decreases by one with probability 2/3 if C > 0 and
increases by one with probability 1/3 if C < n − 1. C is a faster version of the
biased random walk from Section 4.7 with bias 1/2. The total variation mixing
time for C is O(n) steps.
98
4.12 Final observations
By the definition of E 0 , W = O(N −1 log2 N ). Say that log2 (W/α) is dominated by C with maximum step time m if it has the following property. Whenever
2k W is within a small interval of α/2 with positive integer k, then with probability 2/3, 2k W decreases to below α/4 within m steps without having reached
α.
Theorem 4.24. log2 (W/α) is dominated by C with maximum step time O(R3 N ).
Proof. We use the construction from Section 4.8. In Corollary 4.9 the construction is applied repeatedly with c, the probability of ‘error’, very small each time.
We now apply the construction once at a time and with relaxed error probability;
let c = 1/3. Then for inequality (4.11), we need a weaker condition than (4.10),
only that
p
α
≥ (2/3)b log 3 + 2v̂ log 3.
(4.25)
16
Then if W is in a small neighbourhood of α/2, with probability 2/3, W enters [0, α/4] without increasing beyond α. If α ≥ 107 (R + 1)4 N −1 log N and
W ∈ [α/8, α] then
E(∆W | X = x) ≤ −
α
2·
104 (R
+ 1)3 N
and
12 log N
108 (R + 1)4 log N
α+
.
N
N2
Inequality (4.25) holds for N sufficiently large.
v̂ ≤
Proof of Lemma 4.23. As j ≥ (1/2) log1−a N −1 , under Ca,s ,
X
k≥10j
δ(X0 (1 − X0 )k , Xk ) =
X
δ(X0 (1 − X0 )k , 0) = O(N −5 ).
k≥10j
Similarly, we must bound the sum from j to 10j. Check,
δ(λ, x) ≤ λ(e2 + 1) + [x log x/(λe) − λe2 ]+ .
Suppose X ∼ Bin(N, λ/N ) and consider Y = [X log X/(λe) − e2 λ]+ . Let Z be
the geometric random variable with distribution ((1 − e−1 )e−n )∞
n=0 ; P (Z ≥ n) =
99
4.12 Final observations
exp(−n). We will show that P (Y ≥ n) ≤ exp(−n), so Y ≤st 1 + Z. We can
write,
10j
X
k
δ(X0 (1 − X0 ) , Xk ) ≤
k=j
10j
X
δ(4(1 − a)k , Xk ) + Xk log
k=j
4(1 − a)k
≤
X0 (1 − X0 )k
10j
X
Xk
4(1 − a)k )
k 2
k 2
.
− 4(1 − a) e
+ 4(1 − a) (e + 1) + Xk log
Xk log
X0 (1 − X0 )k
4(1 − a)k e
+
k=j
For k ≤ 10j, (1 − X0 )k ≤ (1 − a)k ≤ e20 (1 − X0 )k . Up to modification off E 0 ,
Nk = N Xk ≤st Bin(N, 4(1 − a)k ). With Zi independent random variables equal
to Z in distribution,
10j
X
k
δ(X0 (1 − X0 ) , Xk ) ≤st N
−1
k=j
9j
X
Zi +
k=1
500(R + 1)2 log N
.
N
We can bound the probability that the sum of l := 9j independent random
variables Zi is greater than B := le−1/4 /(1 − e−1/4 ) ≤ 4l. Let Ẑi be independent
geometric ((1 − e−1/4 )e−k/4 )∞
k=0 random variables. By an elementary comparison,
!
!
l
l
l
X
X
1 − e−1
P
Zi ≥ B ≤ P
Ẑi ≥ B
exp(−(3/4)B)
−1/4
1
−
e
i=1
i=1
≤ exp(−l) = O(N −1 ).
It remains to show that P(Y ≥ n) ≤ exp(−n). The set of positive values Y can
take is,
Aλ =
k
2
k log
− e λ : k ∈ N ∩ [0, ∞)
λe
is sparser than N = {0, 1, . . . }; by this we mean that there is a one to one function
f : N → Aλ such that n ≤ f (n) for all n. Let y = k log k/(λe) ∈ Aλ ,
k N −k
N
λ
λ
P(Y = y) =
1−
k
N
N
k
(k/e)k
λ
≤
≤ (1 − e−1 ) exp (−y) .
k!
k/e
Therefore P(Y ≥ n) ≤ exp(−n).
100
4.13 Proofs relating to entropy and the biased random walk
4.13
Proofs relating to entropy and the biased
random walk
Proof of Lemma 4.4. If xk = t(1 − t)k , then x has mean
P
kxk = (1 − t)t−1 and
entropy S(x) = − log t − (1 − t)t−1 log(1 − t). By definition,
S(x) = S(GR ) −
Using
P
xk = 1 and
P
X
−xk log xk ,
k
k+1
GR
.
k = R /(R + 1)
kxk = R,
xk
xk log k
R /(R + 1)k+1
"
#
R
X
t(1 − t)
= log
xk log
R +
S(x) =
X
1
R+1
R
R+1
xk
.
t(1 − t)k
For t ∈ [0, 1], t(1 − t)R is maximized by t = (R + 1)−1 . The relative entropy
of x with respect to a geometric distribution is minimised by GR . The relative
entropy can also be written using the function δ(x, y) = y log y/x + (y − x),
X
xk
δ(Rk /(R + 1)k+1 , xk ) =
xk log k
, and
R /(R + 1)k+1
X
X
xk
δ(x0 (1 − x0 )k , xk ) =
xk log
.
x0 (1 − x0 )k
X
Proof of Lemma 4.5. We will work with the left eigenvalues of K. The equilibrium distribution π(k) = F1 (k) = â(1 − a)k is the unique (up to scalar multiples) eigenvector of K with eigenvalue 1. The other n − 1 eigenvalues of the
Markov chain can be written as the imaginary parts of a geometric sequences. Let
√
−1/2
Aj = 1 − a exp(iπ(j − 1)/n) for j = 2, . . . , n. Let Fj (k) = Rj =[(1 − Aj )Akj ]
with Rj the normalizing constant,
n
a √
π(j − 1)
Rj =
1 − − 1 − a cos
.
â
2
n
101
4.13 Proofs relating to entropy and the biased random walk
For convenience, allow the definition of Fj (k) to extend to Fj (−1) and Fj (n);
note that Fj (−1)(1 − a) = Fj (0) and Fj (n − 1)(1 − a) = Fj (n). Check,
1
=[(1 − a)Fj (k − 1) − (1 + (1 − a))Fj (k) + Fj (k + 1)]
N
1
= Fj (k) − =[(1 − Aj )Akj (1 − (1 − a)A−1
j )(1 − Aj )]
N
= Fj (k) 1 − N −1 |1 − Aj |2 .
(Fj K)(k) = Fj (k) +
Therefore, for j ≥ 2, the j-th eigenvalue is
λj = 1 − N −1 |1 − Aj |2
√
(j − 1)π
−1
= 1 − 2N
1 − a/2 − 1 − a cos
n
√
≤ 1 − 2N −1 (1 − a/2 − 1 − a)
≤ 1 − a2 /(4N ).
(i) Suppose the Markov chain C starts at k with probability 1. C can be
restricted to the subset {k, k + 1, . . . , n − 1}. It is then a translation of the
random walk with ‘n0 = n − k. The Markov chain on {0, . . . , n − 1} started
at k is stochastically smaller than the MC restricted to {k, . . . , n − 1} and
started at k, which in turn is stochastically smaller than the equilibrium
distribution of the Markov chain restricted to {k, . . . , n − 1}. Therefore for
all i, (µK i )(C ≥ j + k) ≤ (1 − a)j .
(ii) With τ1 (ε) the time to convergence within ε in the sense of total variation,
we will show the standard bound [48],
τ1 (ε) ≤
log(επmin )
.
log λ2
Write K i (x, y) for the i-step transition probability from x to y. Then
K i (x, y) =
X
hδx , Fj iπ Fj (y)λij
j
= π(y) +
X
j≥2
102
π(x)−1 Fj (x)Fj (y)λij .
4.13 Proofs relating to entropy and the biased random walk
As 1 = K 0 (x, x) ≥ π(x)−1
X
Fj (x)2 ,
j≥2
X
Fj (x)2 ≤ π(x).
j≥2
By the Cauchy-Schwarz inequality, with πmin = minx π(x),
X
Fj (x)Fj (y)λij |K i (x, y) − π(y)| = π(x)−1 j≥2
sX
X
Fj (x)2
Fj (y)2
≤ λi2 π(x)−1
j≥2
j≥2
p
≤ λi2 π(x)−1 π(x)π(y)
λi2
≤ π(y) p
π(x)π(y)
≤ π(y)
λi2
.
πmin
With 1 − λ2 ≥ a2 /(4N ),
τ1 (ε) ≤ logλ2 (επ(n − 1)) ≤ −4N a−2 log(επ(n − 1)).
(iii) Suppose first that µ(C = l) = 1. The general result follows by linearity.
From part (ii),
i
i
K (l, k)
≤ p λ2
−
1
π(k)
π(k)π(l)
λi
≤ 2
π(l)
≤ 1/2 if i ≥ 4N a−2 log 2/π(l).
Proof of Lemma 4.6. For convenience, change variables as follows. Let uk =
p
(Xk − X0 (1 − X0 )k )/ X0 (1 − X0 )k for k = 0, 1, . . . , j + 1, u0 = 0. First consider
the problem: minimize f for fixed g and uj+1 ,
f=
√
j
X
p
(uk 1 − X0 − uk+1 )2 ,
k=0
g=
j+1
X
u2k .
k=1
Let c = 1 − 1 − X0 ≥ X0 /2. We will prove the lemma by showing that f ≥ c2 g.
Introduce a Lagrange multiplier µ̂,
∂
∂
µ̂
g=
f,
∂uk
∂uk
103
k = 1, . . . , j.
4.13 Proofs relating to entropy and the biased random walk
With a further change of variables this is simply,
uk+1 − 2µ uk + uk−1 = 0,
k = 1, . . . , j.
When the equation x2 − 2µ x + 1 = 0 has distinct roots, the solution of the
difference
can
in terms of powers of the roots, that is
equation
k be expressed
k
p
p
uk = c1 µ − µ2 − 1 + c2 µ + µ2 − 1 . The solution, up to a multiplicative
factor, can be written as follows with Θ a function of µ.
µ
uk
(−1)k sinh Θk
< −1
(−1)k k
−1
(−1, 1) sin Θk
1
k
>1
sinh Θk
Xk − X0 (1 − X0 )k
p
X0 (1 − X0 )k (−1)k sinh Θk
p
X0 (1 − X0 )k (−1)k k
p
X0 (1 − X0 )k sin Θk
p
X0 (1 − X0 )k k
p
X0 (1 − X0 )k sinh Θk
If uk = k, f ≥ c2 g follows by evaluating the quadratic sums f and g.
j
X
p
f=
(k 1 − X0 − (k + 1))2
k=0
j
=
X
(kc + 1)2
k=0
j(j + 1)
j(j + 1)(2j + 1)
+ 2c
+ (j + 1)
6
2
j+1
X
(j + 1)(j + 2)(2j + 3)
g=
k2 =
6
k=1
= c2
If uk = sin Θk we can use the following identity to expand f − c2 g.
j
X
sin Θ(k + a) sin Θk
k=0
≡
j
X
cos Θa − cos Θ(2k + a)
k=0
≡
2
j+1
sin Θ(2j + 1 + a) − sin Θ(a − 1)
cos Θa −
2
4 sin Θ
104
4.13 Proofs relating to entropy and the biased random walk
Therefore
p
f − c2 g =2 1 − X0
j
X
sin2 Θ(k + 1) −
k=0
j
− (1 − X0 )
X
j
X
!
sin Θ(k + 1) sin Θk
k=0
j
sin2 Θ(k + 1) −
k=0
X
!
sin2 Θ(k)
k=0
p
1p
Θ
1 − X0 tan sin Θ(2j + 2)
=(j + 1) 1 − X0 (1 − cos Θ) +
2
2
1 p
+ ( 1 − X0 − (1 − X0 ))(1 − cos Θ(2j + 2))
2
p
Θ
1
≥ 1 − X0 (j + 1)(1 − cos Θ) + tan sin Θ(2j + 2)
2
2
p
sin Θ(2j + 2)
2 Θ
= 1 − X0 sin
(2j + 2) +
≥ 0.
2
sin Θ
For uk = sinh Θk, we can apply Osborne’s rule to convert the previous trigonometric identity to a hyperbolic one,
p
−f + c2 g =(j + 1) 1 − X0 (1 − cosh Θ)
1p
Θ
−
1 − X0 tanh sinh Θ(2j + 2)
2
2
1 p
+ ( 1 − X0 − (1 − X0 ))(1 − cosh Θ(2j + 2))
2
p
sinh Θ(2j + 2)
2 Θ
≤ − 1 − X0 sinh
(2j + 2) +
≤ 0.
2
sinh Θ
We can ignore the solutions (uk ) corresponding to µ ≤ −1, they are clearly never
optimal.
Proof of Lemma 4.7. Define f by f (k) = Xk0 /π(k) where Xk0 = Xk ∨ (e−1 /N ).
105
4.14 Technical lemmas relating to ∆S
We use the log Sobolev constant, following the proof of Theorem 3.6 in [19].
1 X 0
x0 (1 − a)
(xk (1 − a) − x0k+1 ) log k 0
N k
xk+1
1 X
=
π(k + 1)[f (k) − f (k + 1)][log f (k) − log f (k + 1)]
N k
X
=
π(j)K(j, k)f (j)[log f (j) − log f (k)]
j,k
= E(f, log f )
p p
≥ 4E( f , f )
p
≥ 4νL( f ).
We can now switch back to the empirical distribution X = (Xk ) by admitting a
small error term.
X
p
Xk0
0
√
L( f ) =
Xk log
k f k22,π π(k)
X
X
X
Xk
e−1
e−1 N −1 X 0
=
Xk log
+
log
−
Xk log
Xk0
π(k)
N
π(k)
{k∈Ω:Xk =0}
−1
X
ne
Xk
−1 −1
−1 −1
−
log eN + (1 + ne N ) log(1 + ne N )
≥
Xk log
π(k)
N
−1
X
Xk
ne
−1 −1
−1 −1
≥
Xk log
−
log eN + (1 + ne N )ne N
π(k)
N
X
Xk
n log N
≥
Xk log
−
.
k
X0 (1 − X0 )
N
4.14
Technical lemmas relating to ∆S
All that remains now is to prove Lemmas 4.16, 4.20 and 4.21. The proofs of
Lemmas 4.16 and 4.20 are very similar. We express E(∆S | X = x) as a sum,
and then carefully bound the terms. The proof of Lemma 4.21 is similarly technical. To make the calculations more manageable we define two functions. Let
f : [0, ∞) → R be given by
f (x) = x log x,
106
f (0) = 0,
4.14 Technical lemmas relating to ∆S
and let g : [0, ∞) → R be the ‘discrete-derivative’ of f ,
g(x) = f (x + 1/N ) − f (x).
If x = 0, take x × g(x − 1/N ) to be equal to zero.
Proof of Lemma 4.16. Consider one step of the Markov chain starting from X =
x. Let Rk be the random variable, taking values in {−1/N, 0, +1/N }, measuring
the movement from Xk to Xk+1 ; that is the fraction of boxes that hold k balls and
gain a ball, minus the fraction of boxes that hold k + 1 balls and lose a ball. Let
mk = xk (1 − x0 ) − xk+1 so E(Rk ) = mk N −1 + O(N −2 ). Define R−1 = m−1 = 0.
P 0 −1
As nk=0
mk = 0 we can write:
E(∆S) =
∞
X
E[f (xk ) − f (xk + Rk−1 − Rk )]
k=0
=
∞
X
E[f (xk ) − f (xk + Rk−1 ) + f (xk + Rk−1 ) − f (xk + Rk−1 − Rk )]
k=0
=
0 −1
n
X
E[f (xk + Rk−1 ) − f (xk + Rk−1 − Rk ) + f (xk+1 ) − f (xk+1 + Rk )]
k=0
=
0 −1
n
X
E[f (xk + Rk−1 ) − f (xk + Rk−1 − Rk ) + mk N −1 log(1 − x0 )
k=0
+ f (xk+1 ) − f (xk+1 + Rk )].
We must then bound the difference, for k = 0, 1, . . . , n0 − 1, between terms
Zk
:=
E[f (xk + Rk−1 ) − f (xk + Rk−1 − Rk ) + mk N −1 log(1 − x0 )
+f (xk+1 ) − f (xk+1 + Rk )]
1 0
x0 (1 − x0 )
and
(xk (1 − x0 ) − x0k+1 ) log k 0
.
2N
xk+1
For k ≥ 1, we list the possible values of the pair (Rk−1 , Rk ) such that Rk 6= 0.
P((Rk−1 , Rk ) = (rk−1 , rk ) | X = x)
rk−1
rk
+1/N
−1/N
xk−1 xk+1
−1/N
+1/N
xk (xk − 1/N )
0
−1/N
xk+1 (1 − xk−1 − xk − 1/N )
0
+1/N
xk (1 − x0 − xk − xk+1 )
107
4.14 Technical lemmas relating to ∆S
Using function g(x) = f (x + 1/N ) − f (x) we split Zk into larger and smaller
terms.
Zk =mk N −1 log(1 − x0 )
+ [g(xk+1 − 1/N ) − g(xk + 1/N )]xk−1 xk+1
+ [g(xk − 2/N ) − g(xk+1 )]xk (xk − 1/N )
+ [g(xk+1 − 1/N ) − g(xk )]xk+1 (1 − xk−1 − xk − 1/N )
+ [g(xk − 1/N ) − g(xk+1 )]xk (1 − x0 − xk − xk+1 )
=ZkA + ZkB
ZkA =mk [N −1 log(1 − x0 ) + g(xk ) − g(xk+1 )]
− xk−1 xk+1 [g(xk + 1/N ) − g(xk )]
ZkB = − xk (1 − x0 − xk+1 )[g(xk ) − g(xk − 1/N )]
− xk+1 (1 − xk )[g(xk+1 ) − g(xk+1 − 1/N )]
− xk (xk − 1/N )[g(xk − 1/N ) − g(xk − 2/N )]
− (xk /N )[g(xk − 1/N ) − g(xk+1 )]
− (xk+1 /N )[g(xk+1 − 1/N ) − g(xk )]
We write the equivalent terms for k = 0.
r0
P(R0 = r0 | X = x)
−1/N
x1 (1 − x0 − 1/N )
+1/N
x0 (1 − x0 − x1 )
Zk =mk N −1 log(1 − x0 )
+ [g(x1 − 1/N ) − g(x0 )]x1 (1 − x0 − 1/N )
+ [g(x0 − 1/N ) − g(x1 )]x0 (1 − x0 − x1 )
=Z0A + Z0B
108
4.14 Technical lemmas relating to ∆S
Z0A =mk [N −1 log(1 − x0 ) + g(x0 ) − g(x1 )]
Z0B = − x0 (1 − x0 − x1 )[g(x0 ) − g(x0 − 1/N )]
− x1 (1 − x0 )[g(x1 ) − g(x1 − 1/N )]
− (x1 /N )[g(x1 − 1/N ) − g(x0 )]
So by definition, E(∆S | X = x) =
P
k
ZkA + ZkB . We will first bound
P
k
ZkB .
For x, y ∈ {0, 1/N, . . . , 1},
(i) x [g(x) − g(x − 1/N )] ≤ 2N −2 log 2, and
(ii) (x/N )(g(x − 1/N ) − g(y)) ≤ xN −2 (1 + log N ).
P B
Therefore
Zk ≥ −6n0 N −2 log 2 + 2N −2 (1 + log N ) ≥ −5n0 N −2 . We will now
show that if x0 ≤ 1 − 10−9 ,
mk (N −1 log(1 − x0 ) + g(xk ) − g(xk+1 )) − xk−1 xk+1 [g(xk + 1/N ) − g(xk )]
≥ (2N )−1 (x0k (1 − x0 ) − x0k+1 ) log
x0k (1 − x0 )
− 5N −2 .
x0k+1
(4.26)
Write xk = a/N , xk+1 = b/N . Let a0 = a ∨ e−1 , b0 = b ∨ e−1 . It is now sufficient
to show A ≥ B/2 − 5,
A = (a(1 − x0 ) − b)[log(1 − x0 ) + f (a + 1) − f (a) − f (b + 1) + f (b)]
− b[f (a + 2) − 2f (a + 1) + f (a)],
B = (a0 (1 − x0 ) − b0 ) log
a0 (1 − x0 )
.
b0
We will use the following simple facts in our case by case analysis.
(a) If k ≥ 1,
1 + log k ≤ f (k + 1) − f (k) ≤ 1 + log k + (2k)−1 .
(b) For all c, x ≥ 0, x log x − cx ≥ − exp(c − 1).
(c) For all k ∈ N, 0 ≤ k[f (k + 2) − 2f (k + 1) + f (k)] ≤ 1.
We now consider cases (i) a = b = 0, (ii) 0 = a < b, (iii) 0 = b < a,
(iv) 0 < b < a(1 − x0 ) and (v) 0 < a(1 − x0 ) ≤ b.
109
4.14 Technical lemmas relating to ∆S
(i) If a = b = 0:
A=0
B = −x0 e−1 log(1 − x0 ) ≤ 10
A ≥ B/2 − 5
(ii) If 0 = a < b:
b
−1
A = b log
+ (b + 1) log(1 + b ) − b[2 log 2]
1 − x0
b
≥ b log
+ 1 − 2 log 2
1 − x0
b
−1
+1
B = (b − e ) log
1 − x0
A − B/2 ≥ b [log b + 1 − 4 log 2] /2
≥ −5
(iii) If 0 = b < a, set t = a(1 − x0 ).
A ≥ t[log t + 1]
B = (t − e−1 )[log t + 1]
A − B/2 ≥ −5
(iv) If 0 < b < a(1 − x0 ), set t = a(1 − x0 )/b ≥ 1.
1
A ≥ b(t − 1) log t −
−1
2b
B = b(t − 1) log t
A − B/2 ≥ (t − 1)[log t − 1]/2 − 1
≥ −5
(v) If 0 < a(1 − x0 ) ≤ b, set t = b/a(1 − x0 ) ≥ 1.
1
A ≥ a(1 − x0 )(t − 1) log t −
− t(1 − x0 )
2a
B ≥ a(1 − x0 )(t − 1) log t
1
A − B/2 ≥ (1 − x0 )(t − 1)[log t − 1] − t(1 − x0 ) ≥ −5
2
110
4.14 Technical lemmas relating to ∆S
Proof of Lemma 4.20. The proof is similar to the proof of Lemma 4.16. In place
of inequality (4.26) we must show,
mk (N −1 log(1 − x0 ) + g(xk ) − g(xk+1 )) − xk−1 xk+1 [g(xk + 1/N ) − g(xk )]
≥
1
m2k
5
− 2.
N max{xk (1 − x0 ), xk+1 } N
This is equivalent to showing A ≥ C − 5, with
A =(a(1 − x0 ) − b)(log(1 − x0 ) + f (a + 1) − f (a) − f (b + 1) + f (b))
− b[f (a + 2) − 2f (a + 1) + f (a)],
C =(a(1 − x0 ) − b)2 / max{a(1 − x0 ), b}.
We consider the following cases: (i) a = b = 0, (ii) 0 = b < a (iii) 0 = a < b,
(iv) 0 < b < a(1 − x0 ) and (v) 0 < a(1 − x0 ) ≤ b.
(i) If a = b = 0 then A = C = 0.
(ii) If 0 = b < a, set t = a(1 − x0 ).
A − C ≥ t log t − t ≥ −5
(iii) If b > a = 0:
A − C ≥ b log
b
− b (1 + 2 log 2)
1 − x0
≥ −5
(iv) If 0 < b < a(1 − x0 ), set t = a(1 − x0 )/b > 1.
1
A ≥ b(t − 1) log t −
−1
2b
1
C = b(t − 1) 1 −
t
A − C ≥ (t − 1)[log t − 3/2] − 1 ≥ −5
111
4.14 Technical lemmas relating to ∆S
(v) If 0 < a(1 − x0 ) ≤ b, set t = b/(a(1 − x0 )).
1
A ≥ a(1 − x0 )(t − 1) log t −
− t(1 − x0 )
2a
1
C = a(1 − x0 )(t − 1) 1 −
t
A − C ≥ (1 − x0 )[(t − 1)(log t − 5/2) − 1] ≥ −5
Proof of Lemma 4.21. Of course var(∆S | X = x) ≤ E((∆S)2 | X = x). Once
we have bounded E((∆S)2 | X = x) the result follows by Lemma 4.16. Sum over
the number of balls in the source box, i, and the sink boxes, j.
h
i2
XX
xi (xj − δij /N ) g(xi − N −1 ) − g(xi−1 ) + g(xj − N −1 ) − g(xj+1 )
i>0 j≥0
≤
XX
h
xi (xj − δij /N ) − g(xi−1 ) − N −1 log(1 − x0 ) + g(xi − N −1 )
i>0 j≥0
i2
+g(xj − N −1 ) + N −1 log(1 − x0 ) − g(xj+1 )
h
XX
2
≤2
xi xj g(xi−1 ) + N −1 log(1 − x0 ) − g(xi − N −1 )
i>0 j≥0
2 i
+ g(xj − N −1 ) + N −1 log(1 − x0 ) − g(xj+1 )
≤2
0 −1
n
h
X
2
xk (1 − x0 ) g(xk − N −1 ) + N −1 log(1 − x0 ) − g(xk+1 )
k=0
+xk+1 g(xk ) + N
−1
log(1 − x0 ) − g(xk+1 − N
−1
2 i
)
Let
2
A =xk (1 − x0 ) g(xk − N −1 ) + N −1 log(1 − x0 ) − g(xk+1 )
2
+ xk+1 g(xk ) + N −1 log(1 − x0 ) − g(xk+1 − N −1 ) ,
B =(x0k (1 − x0 ) − x0k+1 ) log x0k (1 − x0 )/x0k+1 .
With x0 ≤ 2/3 it is straightforward to check N 3 A ≤ 3N B log N + 100; consider
the cases xk = xk+1 = 0, xk+1 > xk = 0, xk > xk+1 = 0 and xk , xk+1 > 0.
112
References
[1] C. J. K. Batty and H. W. Bollmann. Generalised Holley–Preston inequalities
on measure spaces and their products. Z. für Wahrsch’theorie verw. Geb.,
53:157–173, 1980.
[2] R. J. Baxter. Exactly Solved Models in Statistical Mechanics. Academic
Press, London, 1982.
[3] M. Ben-Or and N. Linial. Collective coin flipping. In Randomness and
Computation, pages 91–115. Academic Press, New York, 1990.
[4] I. Benjamini, R. Lyons, Y. Peres, and O. Schramm. Uniform spanning forests.
Ann. Probab., 29:1–65, 2001.
[5] C. E. Bezuidenhout, G.R. Grimmett, and H. Kesten. Strict inequality for
critical values of Potts models and random-cluster processes. Communications in Mathematical Physics, 158:1–16, 1993.
[6] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
[7] M. Blume. Theory of the first-order magnetic phase change in UO2 . Phys.
Rev., 141:517–524, 1966.
[8] B. Bollobás and O. M. Riordan. The critical probability for random Voronoi
percolation in the plane is 1/2. Probab. Theory Related Fields, 136(3):417–
468, 2006.
[9] B. Bollobás and O. M. Riordan. A short proof of the Harris-Kesten theorem.
Bull. London Math. Soc., 38(3):470–484, 2006.
113
REFERENCES
[10] J. Bourgain, J. Kahn, G. Kalai, Y. Katznelson, and N. Linial. The influence
of variables in product spaces. Israel J. Math., 77:55–64, 1992.
[11] L. Breiman. Probability. Classics in Applied Mathematics. SIAM, New York,
1992.
[12] S.R Broadbent and J.M. Hammersley. Percolation processes. I. Crystals
and mazes. Proceedings of the Cambridge Philosophical Society, 52:629–641,
1957.
[13] R. M. Burton and M. Keane. Density and uniqueness in percolation. Comm.
Math. Phys., 121:501–505, 1989.
[14] H. W. Capel. On the possibility of first-order transitions in Ising systems of
triplet ions with zero-field splitting. Physica, 32:966–988, 1966.
[15] H. W. Capel. On the possibility of first-order transitions in Ising systems of
triplet ions with zero-field splitting. Physica, 33:295–331, 1967.
[16] H. W. Capel. On the possibility of first-order transitions in Ising systems of
triplet ions with zero-field splitting. Physica, 37:423–441, 1967.
[17] J. T. Chayes and L. Chayes. Percolation and random media. In K. Osterwalder and R. Stora, editors, Critical Phenomena, Random Systems and
Gauge Theories (Les Houches, Session XLIII, 1984), pages 1001–1142. Elsevier, Amsterdam, 1986.
[18] P. Diaconis. Analysis of a Bose-Einstein Markov chain. Ann. Inst. H.
Poincaré Probab. Statist., 41(3):409–418, 2005.
[19] P. Diaconis and L. Saloff-Coste. Logarithmic Sobolev inequalities for finite
Markov chains. Ann. Appl. Probab., 6(3):695–750, 1996.
[20] P. Ehrenfest and T. Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Phys. Z., 8, 1907.
[21] P. Erdős and A. Rényi. On the evolution of random graphs. Magyar Tud.
Akad. Mat. Kutató Int. Közl., 5:17–61, 1960.
[22] M. R. Evans. Phase transitions in one-dimensional nonequilibrium systems. Brazilian Journal of Physics, 30:42, 2000.
[23] K. J. Falconer. The Geometry of Fractal Sets. Cambridge University Press,
Cambridge, 1985.
[24] W. Feller. An Introduction to Probability Theory and its Applications. Vol.
I. Third edition. John Wiley & Sons Inc., New York, 1968.
[25] C. M. Fortuin and P. W. Kasteleyn. On the random-cluster model. I. Introduction and relation to other models. Physica, 57:536–564, 1972.
[26] C. M. Fortuin, P. W. Kasteleyn, and J. Ginibre. Correlation inequalities on
some partially ordered sets. Comm. Math. Phys., 22:89–103, 1971.
[27] E. Friedgut. Influences in product spaces: KKL and BKKKL revisited.
Combin. Probab. Comput., 13:17–29, 2004.
[28] E. Friedgut and G. Kalai. Every monotone graph property has a sharp
threshold. Proc. Amer. Math. Soc., 124:2993–3002, 1996.
[29] A. Frieze, R. Kannan, and N. Polson. Sampling from log-concave distributions. Ann. Appl. Probab., 4(3):812–837, 1994.
[30] H.-O. Georgii, O. Häggström, and C. Maes. The random geometry of equilibrium phases. In Phase Transitions and Critical Phenomena, Vol. 18, pages 1–142. Academic Press, San Diego, CA, 2001.
[31] B. T. Graham and G. R. Grimmett. Influence and sharp-threshold theorems
for monotonic measures. Ann. Probab., 34(5):1726–1745, 2006.
[32] B. T. Graham and G. R. Grimmett. Random-current representation of the Blume–Capel model. J. Stat. Phys., 125(2):287–320, 2006.
[33] G. R. Grimmett. The stochastic random-cluster process and the uniqueness
of random-cluster measures. Ann. Probab., 23:1461–1510, 1995.
[34] G. R. Grimmett. The Random-Cluster Model. Springer, Berlin, 2006.
[35] G. R. Grimmett. The random-cluster model: Corrections. http://www.statslab.cam.ac.uk/~grg/books/rcmcorrections.html, 2006.
[36] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes.
Oxford University Press, Oxford, second edition, 1993.
[37] T. E. Harris. A lower bound for the critical probability in a certain percolation process. Proc. Camb. Phil. Soc., 56:13–20, 1960.
[38] R. Holley. Remarks on the FKG inequalities. Comm. Math. Phys., 36:227–
231, 1974.
[39] E. Ising. Beitrag zur Theorie des Ferromagnetismus. Z. Physik, 31:253–258,
1925.
[40] N. L. Johnson and S. Kotz. Urn Models and Their Application. John Wiley & Sons, New York, 1977.
[41] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Symposium on the Foundations of Computer Science, pages 68–80. Computer Science Press, 1988.
[42] H. Kesten. The critical probability of bond percolation on the square lattice equals 1/2. Comm. Math. Phys., 74(1):41–59, 1980.
[43] C. Kipnis and C. Landim. Scaling Limits of Interacting Particle Systems. Springer, Berlin, 1999.
[44] M.J. Klein. Entropy and the Ehrenfest urn model. Physica, 22:569–575,
1956.
[45] B. M. McCoy and T. T. Wu. The Two-Dimensional Ising Model. Harvard University Press, Cambridge, MA, 1973.
[46] C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic
Discrete Mathematics, volume 16 of Algorithms Combin., pages 195–248.
Springer, Berlin, 1998.
[47] L. Onsager. Crystal statistics. I. A two-dimensional model with an order–disorder transition. Phys. Rev., 65:117–149, 1944.
[48] Y. Peres. Mixing for Markov chains and spin systems. Lecture notes, 2005 UBC Summer School, http://stat-www.berkeley.edu/~peres/.
[49] R. B. Potts. Some generalized order–disorder transformations. Proc. Camb.
Phil. Soc., 48:106–109, 1952.
[50] C. J. Preston. A generalization of the FKG inequalities. Comm. Math. Phys.,
36:233–241, 1974.
[51] L. Russo. A note on percolation. Z. für Wahrsch’theorie verw. Geb., 43:39–
48, 1978.
[52] P. D. Seymour and D. J. A. Welsh. Percolation probabilities on the square
lattice. In B. Bollobás, editor, Advances in Graph Theory, Annals of Discrete
Mathematics 3, pages 227–245. North-Holland, Amsterdam, 1978.
[53] F. Spitzer. Interaction of Markov processes. Advances in Math., 5:246–290,
1970.