Probability Surveys
Vol. 9 (2012) 103–252
ISSN: 1549-5787
DOI: 10.1214/11-PS188
Simply generated trees, conditioned
Galton–Watson trees, random
allocations and condensation
Svante Janson
e-mail: svante.janson@math.uu.se
Abstract: We give a unified treatment of the limit, as the size tends to
infinity, of simply generated random trees, including both the well-known
result in the standard case of critical Galton–Watson trees and similar but
less well-known results in the other cases (i.e., when no equivalent critical
Galton–Watson tree exists). There is a well-defined limit in the form of an
infinite random tree in all cases; for critical Galton–Watson trees this tree
is locally finite but for the other cases the random limit has exactly one
node of infinite degree.
The proofs use a well-known connection to a random allocation model
that we call balls-in-boxes, and we prove corresponding theorems for this
model.
This survey paper contains many known results from many different
sources, together with some new results.
AMS 2000 subject classifications: Primary 60C50; secondary 05C05,
60F05, 60J80.
Keywords and phrases: Random trees, simply generated trees, Galton–
Watson trees, random allocations, balls in boxes, size-biased Galton–Watson
tree, random forests.
Received December 2011.
Contents

1 Introduction
2 Simply generated trees
   2.1 Ordered rooted trees
   2.2 Galton–Watson trees
   2.3 Simply generated trees
3 Notation
   3.1 More notation
4 Equivalent weights
5 A modified Galton–Watson tree
6 The Ulam–Harris tree and convergence
7 Main result for simply generated random trees
8 Three different types of weights
9 The maximum degree
10 Examples of simply generated random trees
11 Balls-in-boxes
12 Examples of balls-in-boxes
13 Preliminaries
14 Proofs of Theorems 11.4–11.7
15 Trees and balls-in-boxes
16 Proof of Theorem 7.1
17 Proofs of Theorems 7.11 and 7.12
18 Asymptotics of the partition functions
19 Largest degrees and boxes
   19.1 The case λ ≤ ν
   19.2 The subcase σ² < ∞
   19.3 The subcase λ < ν
   19.4 The subcase wk+1/wk → 0 as k → ∞
   19.5 The subcase λ = ν and σ² = ∞
   19.6 The case λ > ν
   19.7 Applications to random forests
20 Large nodes in simply generated trees with ν < 1
21 Further results and problems
   21.1 Level widths
   21.2 Asymptotic normality
   21.3 Height and width
   21.4 Scaled trees
   21.5 Random walks
   21.6 Multi-type conditioned Galton–Watson trees
22 Different conditionings for Galton–Watson trees
   22.1 Conditioning on |T| ≥ n
   22.2 Conditioning on H(T) ≥ n
Acknowledgement
References
1. Introduction
The main purpose of this survey paper is to study the asymptotic shape of
simply generated random trees in complete generality; this includes conditioned
Galton–Watson trees as a special case, but we will also go beyond that case.
Definitions are given in Section 2; here we only recall that simply generated
trees are defined by a weight sequence (wk ), and that the case when the weight
sequence is a probability distribution yields conditioned Galton–Watson trees.
It is well-known that in the case of a critical conditioned Galton–Watson tree,
i.e., when the defining offspring distribution has expectation 1, the random tree
has a limit (as the size tends to infinity); this limit is an infinite random tree,
the size-biased Galton–Watson tree defined by Kesten [74], see also Aldous [4],
Aldous and Pitman [6] and Lyons, Pemantle and Peres [84]. It is also well-known
that this case is less special than it might seem; there is a notion of equivalent
weight sequences defining the same simply generated random tree, see Section 4,
and a large class of weight sequences have an equivalent probability weight sequence defining a critical conditioned Galton–Watson tree. Many probabilists,
including myself, have often concentrated on this “standard” case of critical
conditioned Galton–Watson trees and dismissed the remaining cases as uninteresting exceptional cases. However, some researchers, in particular mathematical
physicists, have studied such cases too. Bialas and Burda [13] studied one case
(Example 10.7 below) and found a phase transition as we leave the standard
case; this can be interpreted as a condensation making the tree bushy with one
or a few nodes of very high degree. This interesting condensation was studied
further by Jonsson and Stefánsson [67], who showed that (in the power-law
case), there is a limit tree of a different type, having one node of infinite degree.
We give in the present paper a unified treatment of the limit as the size tends
to infinity for all simply generated trees, including both the well-known result in
the standard case of critical Galton–Watson trees and the “exceptional” cases
(i.e., when no equivalent probability weight sequence exists, or when such a
sequence exists but not with mean 1). We will see that there is a well-defined
limit in the form of an infinite random tree for any weight sequence. In the
non-standard cases, this infinite random limit has exactly one node of infinite
degree, so its form differs from the standard case of a critical Galton–Watson
tree where all nodes in the limit tree have finite degrees, but nevertheless the
trees are similar; see Sections 5 and 7 for details.
Some important notation, used throughout the paper, is introduced in Section 3, while Sections 4 and 6 contain further preliminaries. The main limit
theorem for simply generated random trees is stated in Section 7, together with
some other, related, limit theorems concerning node degrees and fringe subtrees.
The differences between different types of weight sequences are discussed further in Section 8, and this is continued in Section 9 with a summary of the main
results from Section 19 on the maximum outdegree in the random tree.
The proofs of the limit theorems for random trees use a well-known connection to a random allocation model that we call balls-in-boxes; this model exhibits
a similar behaviour, with condensation in the non-classical cases, see e.g. Bialas,
Burda and Johnston [14]. The model is defined in Section 11, and the relation
between the models is described in Section 15. The balls-in-boxes model is interesting in its own right, and it has been used for several other applications;
we give some examples from probability theory, combinatorics and statistical
physics in Section 12. We therefore also develop the general theory for balls-in-boxes with arbitrary weight sequences (in the range where the mean occupancy
is bounded). In particular, we give in Section 11 theorems corresponding to (and
in some ways extending) our main theorems for random trees.
The limit theorems for balls-in-boxes are proved in Sections 13–14, and then
these results are used to prove the limit theorems for random trees in Sections
15–17.
The remaining sections contain additional results. Section 18 gives asymptotic results for the partition functions of the models. The very long Section 19
gives results on the largest degrees in random trees, and the largest numbers of
balls in a box in the balls-in-boxes model; the section is long because there are
several different cases with different types of behaviour. (See also the summary
in Section 9.) In particular, we study in Section 19.6 the case when there is
condensation, and investigate whether this appears as condensation to a single
box (or node), or whether the condensation is distributed over several boxes
(nodes); it turns out that both cases can occur. We give also, in Section 19.7,
applications to the size of the largest tree in random forests. In Section 20, the
condensation in random trees is discussed in further detail. Finally, some additional comments, results and open problems are given in Sections 21 and 22;
Section 21 mentions briefly various other types of asymptotic results for simply
generated random trees, and Section 22 discusses alternative ways to condition
Galton–Watson trees.
This paper contains many known results from many different sources, together with some new results. (We believe, for example, that the theorems in
Section 7 are new in the present generality.) We have tried to give relevant references, but the absence of references does not necessarily imply that a result
is new.
2. Simply generated trees
2.1. Ordered rooted trees
The trees that we consider are (with a few explicit exceptions) rooted and ordered
(such trees are also called plane trees). Recall that a tree is rooted if one node
is distinguished as the root o; this implies that we can arrange the nodes in a
sequence of generations (or levels), where generation x consists of all nodes of
distance x to the root. (Thus generation 0 is the root; generation 1 is the set of
neighbours of the root, and so on.) If v is a node with v ≠ o, then the parent of v is the neighbour of v on the path from v to o; thus, every node except the root has a unique parent, while the root has no parent. Conversely, for any node v, the neighbours of v that are further away from the root than v are the children of v. The number of children of v is the outdegree d+(v) ≥ 0 of v. Note that if v is in generation x, then its parent is in generation x − 1 and its children are in generation x + 1.
Recall further that a rooted tree is ordered if the children of each node are ordered in a sequence v1, . . . , vd, where d = d+(v) ≥ 0 is the outdegree of v.
See e.g. Drmota [33] for more information on these and other types of trees.
(The trees we consider are called planted plane trees in [33].) We identify trees
that are isomorphic in the obvious (order preserving) way. (Formally, we can
define our trees as equivalence classes. Alternatively, we may select a specific
representative in each equivalence class as in Section 6.)
Remark 2.1. Some authors prefer to add an extra (phantom) node as a parent
of the root; such trees are called planted. (An alternative version is to add only
a pendant edge at the root, with no second endpoint.) There is an obvious
one-to-one correspondence between trees with and without the extra node, so
the difference is just a matter of formulations, but when comparing results one
should be careful whether, for example, the extra node is counted or not. The
extra node yields the technical advantage that also the root has indegree 1 and
thus total degree = 1 + d+ (v); it further gives each embedding in the plane
a unique ordering of the children of every node (in clockwise order from the
parent, say). Nevertheless, we find this device less natural and we will not use
it in the present paper. (We use outdegrees instead of degrees and assume that
an ordering of the children as above is given; then there are no problems.)
We are primarily interested in (large) finite trees, but we will also consider
infinite trees, for example as limit objects in our main theorem (Theorem 7.1).
The infinite trees may have nodes with infinite outdegree d+ (v) = ∞; in this
case we assume that the children are ordered v1 , v2 , . . . (i.e., the order type of
the set of children is N).
We let Tn be the set of all ordered rooted trees with n nodes (including the root) and let Tf := ⋃_{n=1}^∞ Tn be the set of all finite ordered rooted trees; see further Section 6.
Remark 2.2. Note that Tn is a finite set. In fact, it is well-known that its size |Tn| is the (n − 1):th Catalan number

Cn−1 = (1/n) (2n−2 choose n−1) = (2n − 2)!/(n! (n − 1)!),    (2.1)

see e.g. [33, Section 1.2.2 and Theorem 3.2], [40, Section I.2.3] or [103, Exercise 6.19(e)], but we do not need this.
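As a quick cross-check of (2.1) (our illustration, not part of the original text): an ordered rooted tree is a root together with an ordered forest of subtrees, which gives a convolution recursion for |Tn| that can be compared with the Catalan numbers directly. A minimal Python sketch:

```python
from math import comb

N = 10
t = [0] * (N + 1)   # t[n] = |T_n|, the number of ordered rooted trees with n nodes
f = [0] * (N + 1)   # f[n] = number of ordered forests with n nodes in total
f[0] = 1
for n in range(1, N + 1):
    t[n] = f[n - 1]                                  # a root plus a forest of n-1 nodes
    f[n] = sum(t[j] * f[n - j] for j in range(1, n + 1))

for n in range(1, N + 1):
    assert t[n] == comb(2 * n - 2, n - 1) // n       # the Catalan number C_{n-1}
print(t[1:])   # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```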
For any tree T, we let |T| denote the number of nodes; we call |T| the size of T. As is well known, for any finite tree T,

∑_{v∈T} d+(v) = |T| − 1,    (2.2)

since every node except the root is the child of exactly one node.
2.2. Galton–Watson trees
An important class of examples of random ordered rooted trees is given by the Galton–Watson trees. These are defined as the family trees of Galton–Watson processes: Given a probability distribution (πk)_{k=0}^∞ on Z≥0, or, equivalently, a random variable ξ with distribution (πk)_{k=0}^∞, we build the tree T recursively, starting with the root and giving each node a number of children that is an independent copy of ξ. (We call (πk)_{k=0}^∞ the offspring distribution of T; we sometimes also abuse the language and call ξ the offspring distribution.) In other words, the outdegrees d+(v) are i.i.d. with the distribution (πk)_{k=0}^∞.

Recall that the Galton–Watson process is called subcritical, critical or supercritical as the expected number of children E ξ = ∑_{k=0}^∞ kπk satisfies E ξ < 1, E ξ = 1 or E ξ > 1. It is a standard basic fact of branching process theory that T is finite a.s. if E ξ ≤ 1 (i.e., in the subcritical and critical cases), but T is infinite with positive probability if E ξ > 1 (the supercritical case), see e.g. Athreya and Ney [8].
The Galton–Watson trees have random sizes. We are mainly interested in
random trees with a given size; we thus define Tn as T conditioned on |T | = n.
These random trees Tn are called conditioned Galton–Watson trees. By definition, Tn has size |Tn | = n.
It is well-known that several important classes of random trees can be seen
as conditioned Galton–Watson trees, see e.g. Aldous [4], Devroye [32], Drmota
[33] and Section 10.
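To make the recursive definition concrete, the following Python sketch (our illustration, not from the paper) samples the outdegree sequence of a Galton–Watson tree in depth-first order; the cap max_nodes is an added safeguard, since in the supercritical case the tree is infinite with positive probability.

```python
import random

def sample_gw_outdegrees(offspring, max_nodes=10**6):
    """Sample a Galton-Watson tree, where offspring() samples a copy of xi;
    return the outdegrees in depth-first order, or None if the tree grows
    beyond max_nodes (possible when E xi > 1)."""
    degrees, pending = [], 1          # pending = nodes not yet given children
    while pending > 0:
        if len(degrees) >= max_nodes:
            return None               # give up: the tree is (probably) infinite
        k = offspring()
        degrees.append(k)
        pending += k - 1              # one node processed, k children added
    return degrees                    # |T| = len(degrees); cf. (2.2)

# Example: the critical offspring distribution (1/4, 1/2, 1/4) on {0, 1, 2}.
xi = lambda: random.choices([0, 1, 2], weights=[1, 2, 1])[0]
print(sample_gw_outdegrees(xi))
```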
2.3. Simply generated trees
The random trees that we will study are a generalization of the Galton–Watson
trees. We suppose in this paper that we are given a fixed weight sequence w = (wk)_{k≥0} of non-negative real numbers. We then define the weight of a finite tree T ∈ Tf by

w(T) := ∏_{v∈T} w_{d+(v)},    (2.3)

taking the product over all nodes v in T. Trees with such weights are called simply generated trees and were introduced by Meir and Moon [85]. To avoid trivialities, we assume that w0 > 0 and that there exists some k ≥ 2 with wk > 0.

We let Tn be the random tree obtained by picking an element of Tn at random with probability proportional to its weight, i.e.,

P(Tn = T) = w(T)/Zn,    T ∈ Tn,    (2.4)

where the normalizing factor Zn is given by

Zn = Zn(w) := ∑_{T∈Tn} w(T);    (2.5)

Zn is known as the partition function. This definition makes sense only when Zn > 0; we tacitly consider only such n when we discuss Tn. Our assumptions w0 > 0 and wk > 0 for some k ≥ 2 imply that Zn > 0 for infinitely many n, see Corollary 15.6 for a more precise result. (In most applications, w1 > 0, and then Zn > 0 for every n ≥ 1, so there is no problem at all. The archetypical example with a parity restriction is given by the random (full) binary tree, see Example 10.3, for which Zn > 0 if and only if n is odd.)

One particularly important case is when ∑_{k=0}^∞ wk = 1, so the weight sequence (wk) is a probability distribution on Z≥0. (We then say that (wk) is a probability weight sequence.) In this case we let ξ be a random variable with the corresponding distribution: P(ξ = k) = wk; we further let T be the random Galton–Watson tree generated by ξ. It follows directly from the definitions that for every finite tree T ∈ Tf, P(T = T) = w(T). Hence

Zn = P(|T| = n)    (2.6)

and the simply generated random tree Tn is the same as the random Galton–Watson tree T conditioned on |T| = n, i.e., it equals the conditioned Galton–Watson tree Tn defined above.
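As a minimal numeric check of the identity P(T = T) = w(T) (our illustration; the weights and the tree are arbitrary choices):

```python
# Probability weights (w_0, w_1, w_2) = (1/2, 1/4, 1/4); the mean is 3/4 < 1,
# so the Galton-Watson tree T is a.s. finite.  The tree "root with two leaves"
# has outdegree sequence (2, 0, 0) in depth-first order, and its weight is
# w(T) = w_2 * w_0 * w_0 = 1/16, which equals P(T = T).
w = [0.5, 0.25, 0.25]

def weight(outdegrees):
    """w(T) as the product (2.3) over all nodes."""
    p = 1.0
    for d in outdegrees:
        p *= w[d]
    return p

print(weight([2, 0, 0]))   # 0.0625 = 1/16
```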
It is well-known, see Section 4 for details, that in many cases it is possible
to change the weight sequence (wk ) to a probability weight sequence without
changing the distribution of the random trees Tn ; in this case Tn can thus be
seen as a conditioned Galton–Watson tree. Moreover, in many cases this can
be done such that the resulting probability distribution has mean 1. In such
cases it thus suffices to consider the case of a probability weight sequence with
mean E ξ = 1; then Tn is a conditioned critical Galton–Watson tree. It turns
out that this is a nice and natural setting, with many known results proved by
many different authors. (In many papers it is further assumed that ξ has finite
variance, or even a finite exponential moment. This is not needed for the main
results presented here, but may be necessary for other results. See also Sections
8, 19 and 21.)
3. Notation
We consider a fixed weight sequence w = (wk)_{k≥0}. The support supp(w) of the weight sequence w = (wk) is {k : wk > 0}. We define

ω = ω(w) := sup supp(w) = sup{k : wk > 0} ≤ ∞.    (3.1)

(When considering Tn, we assume, as said above, w0 > 0 and wk > 0 for some k ≥ 2; this can be written 0 ∈ supp(w) and ω ≥ 2.)

We further define (assuming that the support contains at least two points)

span(w) := max{d ≥ 1 : d | (i − j) whenever wi, wj > 0}.    (3.2)

Since we assume w0 > 0, i.e., 0 ∈ supp(w), we can simplify this to

span(w) = max{d ≥ 1 : d | i whenever wi > 0},    (3.3)

the greatest common divisor of supp(w).

We let

Φ(z) := ∑_{k=0}^∞ wk z^k    (3.4)

be the generating function of the given weight sequence, and let ρ ∈ [0, ∞] be its radius of convergence. Thus

ρ = 1/ lim sup_{k→∞} wk^{1/k}.    (3.5)

Φ(ρ) is always defined, with 0 < Φ(ρ) ≤ ∞. Note that (assuming ω > 0) Φ(∞) = ∞; in particular, if ρ = ∞, then Φ(ρ) = ∞. On the other hand, if ρ < ∞, then both Φ(ρ) = ∞ and Φ(ρ) < ∞ are possible. If ρ > 0, then Φ(t) ↗ Φ(ρ) as t ↗ ρ by monotone convergence.
We further define, for t such that Φ(t) < ∞,

Ψ(t) := tΦ′(t)/Φ(t) = (∑_{k=0}^∞ k wk t^k)/(∑_{k=0}^∞ wk t^k);    (3.6)

Ψ(t) is thus defined and finite at least for 0 ≤ t < ρ, and if Φ(ρ) < ∞, then Ψ(ρ) is still defined by (3.6), with Ψ(ρ) ≤ ∞ (note that the numerator in (3.6) may diverge in this case, but not for 0 ≤ t < ρ). Moreover, if Φ(ρ) = ∞, we define Ψ(ρ) := lim_{t↗ρ} Ψ(t) ≤ ∞. (The limit exists by Lemma 3.1(i) below, but may be infinite.)

Alternatively, (3.6) may be written

Ψ(e^x) = e^x Φ′(e^x)/Φ(e^x) = (d/dx) log Φ(e^x).    (3.7)
The function Ψ will play a central role in the sequel. This is mainly because
of Lemma 4.2 below, which gives a probabilistic interpretation of Ψ(t). Its basic
properties are given by the following lemma, which is proved in Section 13.
Lemma 3.1. Let w = (wk)_{k=0}^∞ be a given weight sequence with w0 > 0 and wk > 0 for some k ≥ 1 (i.e., ω(w) > 0).

(i) If 0 < ρ ≤ ∞, then the function

Ψ(t) := tΦ′(t)/Φ(t) = (∑_{k=0}^∞ k wk t^k)/(∑_{k=0}^∞ wk t^k)    (3.8)

is finite, continuous and (strictly) increasing on [0, ρ), with Ψ(0) = 0.
(ii) If 0 < ρ ≤ ∞, then Ψ(t) → Ψ(ρ) ≤ ∞ as t ↗ ρ.
(iii) For any ρ, Ψ is continuous [0, ρ] → [0, ∞].
(iv) If ρ < ∞ and Φ(ρ) = ∞, then Ψ(ρ) := lim_{t→ρ} Ψ(t) = ∞.
(v) If ρ = ∞, then Ψ(ρ) := lim_{t→ρ} Ψ(t) = ω ≤ ∞.

Consequently, if ρ > 0, then

Ψ(ρ) = lim_{t↗ρ} Ψ(t) = sup_{0≤t<ρ} Ψ(t) ∈ (0, ∞].    (3.9)

We define

ν := Ψ(ρ).    (3.10)

In particular, if Φ(ρ) < ∞, then

ν = ρΦ′(ρ)/Φ(ρ) ≤ ∞.    (3.11)

It follows from Lemma 3.1 that ν = 0 ⟺ ρ = 0, and that if ρ > 0, then

ν := Ψ(ρ) = lim_{t↗ρ} Ψ(t) = sup_{0≤t<ρ} Ψ(t) ∈ (0, ∞].    (3.12)

It follows from (3.8) that ν ≤ ω.
Note that all these parameters depend on the weight sequence w = (wk ); we
may occasionally write e.g. ω(w) and ν(w), but usually we for simplicity do not
show w explicitly in the notation.
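As a small numeric illustration of these definitions (our example, not from the text): for the full binary weights w = (1, 0, 1) we have Φ(t) = 1 + t², so ρ = ∞ and Ψ(t) = 2t²/(1 + t²) increases from 0 towards ν = ω = 2, as Lemma 3.1(v) predicts.

```python
def Phi(t):                    # Phi(t) = 1 + t^2 for the weights w = (1, 0, 1)
    return 1.0 + t * t

def Psi(t):                    # Psi(t) = t Phi'(t) / Phi(t), cf. (3.6)
    return 2.0 * t * t / Phi(t)

for t in (0.5, 1.0, 10.0, 1e6):
    print(t, Psi(t))           # 0.4, 1.0, ~1.98, ~2.0: increasing towards nu = 2
```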
Remark 3.2. Let Z(z) denote the generating function Z(z) := ∑_{n=1}^∞ Zn z^n. Then

Z(z) = zΦ(Z(z)),    (3.13)

as shown already by Otter [93]. This equation is the basis of much work on simply generated trees using algebraic and analytic methods, see e.g. Drmota [33], but the present paper uses different methods and we will use (3.13) only in a few minor remarks.
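Although we make little use of (3.13), it gives a convenient way to compute the partition functions numerically: iterating Z ↦ zΦ(Z) on power series truncated at degree N makes Z1, . . . , ZN exact after N rounds, since Zn depends only on Z1, . . . , Zn−1. A minimal sketch (our illustration, assuming a finite weight sequence):

```python
def partition_functions(w, N):
    """Compute Z_1..Z_N for a finite weight sequence w = [w_0, w_1, ...]
    via the fixed-point iteration Z <- z * Phi(Z) on truncated series."""
    Z = [0.0] * (N + 1)                  # Z[n] = coefficient of z^n
    for _ in range(N):
        phi = [0.0] * (N + 1)            # Phi(Z(z)), truncated at degree N
        phi[0] = w[0]
        power = [0.0] * (N + 1)          # running power Z(z)^k
        power[0] = 1.0
        for k in range(1, len(w)):
            nxt = [0.0] * (N + 1)        # nxt = power * Z, truncated
            for i in range(N + 1):
                if power[i]:
                    for j in range(1, N + 1 - i):
                        nxt[i + j] += power[i] * Z[j]
            power = nxt
            for n in range(N + 1):
                phi[n] += w[k] * power[n]
        Z = [0.0] + phi[:N]              # multiply by z
    return Z[1:]

# Full binary trees (Example 10.3): w_0 = w_2 = 1, w_1 = 0.
print(partition_functions([1, 0, 1], 9))   # Z_1..Z_9 = 1, 0, 1, 0, 2, 0, 5, 0, 14
```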
3.1. More notation
We define N0 = Z≥0 := {0, 1, 2, . . .}, N1 = Z>0 := {1, 2, . . .}, N̄0 := N0 ∪ {∞} and N̄1 := N1 ∪ {∞}.

All unspecified limits are as n → ∞. Thus, an ∼ bn means an/bn → 1 as n → ∞. We use →p and →d for convergence in probability and distribution, respectively, of random variables, and =d for equality in distribution. We use op and Op in the standard senses: op(an) is an unspecified random variable Xn such that Xn/an →p 0 as n → ∞, and Op(an) is a random variable Xn such that Xn/an is stochastically bounded (usually called tight). We say that some event holds w.h.p. (with high probability) if its probability tends to 1 as n → ∞. (See further e.g. [62].)

A coupling of two random variables X and Y is formally a pair of random variables X′ and Y′ defined on a common probability space such that X =d X′ and Y =d Y′; with a slight abuse of notation we may continue to write X and Y, thus replacing the original variables with new ones having the same distributions.

We write Xn ≈d X′n for two sequences of random variables or vectors Xn and X′n if there exists a coupling of Xn and X′n with Xn = X′n w.h.p.; this is equivalent to dTV(Xn, X′n) → 0 as n → ∞, where dTV denotes the total variation distance.

We use C1, C2, . . . to denote unimportant constants, possibly different at different occurrences.

Recall that d+(v) = d+_T(v) always denotes the outdegree of a node v in a tree T. (We use the notation d+(v) rather than d(v) to emphasise this.) We will not use the total degree d(v) = 1 + d+(v) (when v ≠ o), but care should be taken when comparing with other papers.
4. Equivalent weights
If a, b > 0 and we change wk to

w̃k := a b^k wk,    (4.1)

then, for every tree T ∈ Tn, w(T) is changed to, using (2.2),

w̃(T) = a^n b^{∑_v d+(v)} w(T) = a^n b^{n−1} w(T).    (4.2)
Consequently, Zn is changed to

Z̃n := a^n b^{n−1} Zn,    (4.3)

and the probabilities in (2.4) are not changed. In other words, the new weight sequence (w̃k) defines the same simply generated random trees Tn as (wk). (This is essentially due to Kennedy [73], who did not consider trees but showed the corresponding result for Galton–Watson processes. See also Aldous [4].)

We say that weight sequences (wk) and (w̃k) related by (4.1) (for some a, b > 0) are equivalent. (This is clearly an equivalence relation on the set of weight sequences.)
Let us see how replacing (wk) by the equivalent weight sequence (w̃k) affects the parameters defined above. The support, span and ω are not affected at all. The generating function Φ(t) is replaced by

Φ̃(t) := ∑_{k=0}^∞ w̃k t^k = ∑_{k=0}^∞ a b^k wk t^k = aΦ(bt),    (4.4)

with radius of convergence ρ̃ = ρ/b. Further, Ψ(t) is replaced by

Ψ̃(t) := tΦ̃′(t)/Φ̃(t) = tabΦ′(bt)/(aΦ(bt)) = Ψ(bt).    (4.5)

Hence, if ρ > 0, ν is replaced by, using (3.12),

ν̃ := sup_{0≤t<ρ̃} Ψ̃(t) = sup_{0≤t<ρ/b} Ψ(bt) = sup_{0≤s<ρ} Ψ(s) = ν;

if ρ = 0 then ν̃ = ρ̃ = 0 = ν is trivial. In other words, ν is invariant and depends only on the equivalence class of the weight sequence.
Lemma 4.1. There exists a probability weight sequence equivalent to (wk) if and only if ρ > 0. In this case, the probability weight sequences equivalent to (wk) are given by

pk = t^k wk/Φ(t),    (4.6)

for any t > 0 such that Φ(t) < ∞.

Proof. The equivalent weight sequence (w̃k) given by (4.1) is a probability distribution if and only if

1 = ∑_{k=0}^∞ w̃k = a ∑_{k=0}^∞ wk b^k = aΦ(b),

i.e., if and only if Φ(b) < ∞ and a = Φ(b)^{−1}. Thus, there exists a probability weight sequence equivalent to (wk) if and only if there exists b > 0 with Φ(b) < ∞, i.e., if and only if ρ > 0; in this case we can choose any such b and take a := Φ(b)^{−1}, which yields (4.6) (with t = b).
We easily find the probability generating function and thus moments of the probability weight sequence in (4.6); we state this in a form including the trivial case t = 0.

Lemma 4.2. If t ≥ 0 and Φ(t) < ∞, then

pk := t^k wk/Φ(t),    k ≥ 0,    (4.7)

defines a probability weight sequence (pk). This probability distribution has probability generating function

Φt(z) := ∑_{k=0}^∞ pk z^k = Φ(tz)/Φ(t),    (4.8)

and a random variable ξ with this distribution has expectation

E ξ = Φt′(1) = tΦ′(t)/Φ(t) = Ψ(t)    (4.9)

and variance

Var ξ = tΨ′(t);    (4.10)

furthermore, for any s > 0 and x > 0,

P(ξ ≥ x) ≤ e^{−sx} Φ(e^s t)/Φ(t) ≤ e^{−sx} Φ(e^s t)/Φ(0).    (4.11)

If t < ρ, then E ξ and Var ξ are finite. If t = ρ, however, E ξ and Var ξ may be infinite (we define Var ξ = ∞ when E ξ = ∞, but Var ξ may be infinite also when E ξ is finite); (4.9)–(4.10) still hold, with Ψ′(ρ) ≤ ∞ defined as the limit lim_{s↗ρ} Ψ′(s). The tail estimate (4.11) is interesting only when t < ρ, when we may choose any s < log(ρ/t) and obtain the estimate O(e^{−sx}).
Proof. Direct summations yield

∑_{k=0}^∞ pk = (∑_{k=0}^∞ t^k wk)/Φ(t) = 1    (4.12)

and, more generally,

∑_{k=0}^∞ pk z^k = (∑_{k=0}^∞ wk t^k z^k)/Φ(t) = Φ(tz)/Φ(t),    (4.13)

showing that (pk) is a probability distribution with the probability generating function Φt given in (4.8).

The expectation E ξ = Φt′(1) is evaluated by differentiating (4.8) (for z < 1 and then taking the limit as z → 1 to avoid convergence problems if t = ρ), or directly from (4.7) as

E ξ = ∑_{k=0}^∞ k pk = (∑_{k=0}^∞ k wk t^k)/Φ(t) = Ψ(t).

Similarly, the variance is given by, using (4.8) and (4.9),

Var ξ = Φt″(1) + Φt′(1) − (Φt′(1))² = t²Φ″(t)/Φ(t) + tΦ′(t)/Φ(t) − (tΦ′(t)/Φ(t))² = tΨ′(t).

Alternatively,

tΨ′(t) = t (d/dt)[(∑_{k=0}^∞ k t^k wk)/Φ(t)] = (∑_{k=0}^∞ k² t^k wk)/Φ(t) − ((∑_{k=0}^∞ k t^k wk)/Φ(t))² = ∑_{k=0}^∞ k² pk − (∑_{k=0}^∞ k pk)² = E ξ² − (E ξ)² = Var ξ.

(In the case t = ρ and Var ξ = ∞, we use this calculation for t′ < t and let t′ → t.)

Finally, by (4.8),

P(ξ ≥ x) ≤ e^{−sx} E e^{sξ} = e^{−sx} Φt(e^s) = e^{−sx} Φ(e^s t)/Φ(t).

In particular, taking t = 1, we recover the standard facts that if (wk) is a probability distribution, so Φ(1) = 1, then it has expectation Φ′(1) = Ψ(1) and variance Ψ′(1).
Remark 4.3. We see from Lemma 4.1 that the probability weight sequences equivalent to (wk) are given by (4.6), where t ∈ (0, ρ] when Ψ(ρ) < ∞ and t ∈ (0, ρ) when Ψ(ρ) = ∞. By Lemma 3.1, t ↦ E ξ = Ψ(t) is an increasing bijection (0, ρ] → (0, ν] and (0, ρ) → (0, ν). Hence, any equivalent probability weight sequence is uniquely determined by its expectation, and the possible expectations are (0, ν] (when Ψ(ρ) < ∞) or (0, ν) (when Ψ(ρ) = ∞).
Remark 4.4. Note that we will frequently use (4.6) to define a new probability
weight sequence also if we start with a probability weight sequence (wk ). Probability distributions related in this way are called conjugated or tilted. Conjugate
distributions were introduced by Cramér [27] as an important tool in large deviation theory, see e.g. [31]. The reason is essentially the same as in the present
paper: by conjugating the distribution we can change its mean in a way that
enables us to keep control over sums Sn .
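Numerically, the tilting (4.6) is easy to carry out. The sketch below (our illustration; it assumes a finite weight sequence with wk > 0 for some k ≥ 2, so that ρ = ∞ and Ψ(t) = 1 has a solution) finds t by bisection, using that Ψ is increasing, and returns the equivalent probability weight sequence with mean 1, cf. Remark 4.3.

```python
def tilt_to_mean_one(w, tol=1e-12):
    """Return p_k = t^k w_k / Phi(t), cf. (4.6), with t chosen so that the
    mean Psi(t) equals 1.  Assumes a finite weight sequence with w_0 > 0 and
    w_k > 0 for some k >= 2, so rho = infinity and a solution exists."""
    Phi = lambda t: sum(wk * t**k for k, wk in enumerate(w))
    Psi = lambda t: sum(k * wk * t**k for k, wk in enumerate(w)) / Phi(t)
    lo, hi = 0.0, 1.0
    while Psi(hi) < 1.0:                 # bracket the root of Psi(t) = 1
        hi *= 2.0
    while hi - lo > tol * hi:            # bisection; Psi is increasing
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if Psi(mid) < 1.0 else (lo, mid)
    t = (lo + hi) / 2.0
    return [wk * t**k / Phi(t) for k, wk in enumerate(w)]

p = tilt_to_mean_one([1.0, 1.0, 1.0])             # w_k = 1 for k = 0, 1, 2
print(p, sum(k * pk for k, pk in enumerate(p)))   # (1/3, 1/3, 1/3), mean 1
```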
5. A modified Galton–Watson tree
Let (πk)_{k≥0} be a probability distribution on N0 and let ξ be a random variable on N0 with distribution (πk)_{k=0}^∞:

P(ξ = k) = πk,    k = 0, 1, 2, . . .    (5.1)

We assume that the expectation µ := E ξ = ∑_k kπk ≤ 1 (the subcritical or critical case).
In this case, we define (based on Kesten [74] and Jonsson and Stefánsson [67]) a modified Galton–Watson tree T̂ as follows: There are two types of nodes: normal and special, with the root being special. Normal nodes have offspring (outdegree) according to independent copies of ξ, while special nodes have offspring according to independent copies of ξ̂, where

P(ξ̂ = k) := kπk, k = 0, 1, 2, . . . , and P(ξ̂ = ∞) := 1 − µ.    (5.2)

(Note that this is a probability distribution on N̄1.) Moreover, all children of a normal node are normal; when a special node gets an infinite number of children, all are normal; when a special node gets a finite number of children, one of its children is selected uniformly at random and is special, while all other children are normal.

Thus, for a special node, and any integers j, k with 1 ≤ j ≤ k < ∞, the probability that the node has exactly k children and that the j:th of them is special is kπk/k = πk.

Since each special node has at most one special child, the special nodes form a path from the root; we call this path the spine of T̂. We distinguish two different cases:

(T1) If µ = 1 (the critical case), then ξ̂ < ∞ a.s. so each special node has a special child and the spine is an infinite path. Each outdegree d+(v) in T̂ is finite, so the tree is infinite but locally finite.

In this case, the distribution of ξ̂ in (5.2) is the size-biased distribution of ξ, and T̂ is the size-biased Galton–Watson tree defined by Kesten [74], see also Aldous [4], Aldous and Pitman [6], Lyons, Pemantle and Peres [84] and Remark 5.7 below. The underlying size-biased Galton–Watson process is the same as the Q-process studied in Athreya and Ney [8, Section I.14], which is an instance of Doob's h-transform. (See Lyons, Pemantle and Peres [84] for further related constructions in other contexts and Geiger and Kauffmann [45] for a generalization.)

An alternative construction of the random tree T̂ is to start with the spine (an infinite path from the root) and then at each node in the spine attach further branches; the number of branches at each node in the spine is a copy of ξ̂ − 1 and each branch is a copy of the Galton–Watson tree T with offspring distributed as ξ; furthermore, at a node where k new branches are attached, the number of them attached to the left of the spine is uniformly distributed on {0, . . . , k}. (All random choices are independent.) Since the critical Galton–Watson tree T is a.s. finite, it follows that T̂ a.s. has exactly one infinite path from the root, viz. the spine.

(T2) If µ < 1 (the subcritical case), then a special node has with probability 1 − µ no special child. Hence, the spine is a.s. finite and the number L of nodes in the spine has a (shifted) geometric distribution Ge(1 − µ),

P(L = ℓ) = (1 − µ)µ^{ℓ−1},    ℓ = 1, 2, . . . .    (5.3)

The tree T̂ has a.s. exactly one node with infinite outdegree, viz. the top of the spine. T̂ has a.s. no infinite path.

In this case, an alternative construction of T̂ is to start with a spine of random length L, where L has the geometric distribution (5.3). We attach as in (T1) further branches that are independent copies of the Galton–Watson tree T; at the top of the spine we attach an infinite number of branches and at all other nodes in the spine the number we attach is a copy of ξ∗ − 1 where ξ∗ =d (ξ̂ | ξ̂ < ∞) has the size-biased distribution P(ξ∗ = k) = kπk/µ. The spine thus ends with an explosion producing an infinite number of branches, and this is the only node with an infinite degree. This is the construction by Jonsson and Stefánsson [67].
Example 5.1. In the extreme case µ = 0, or equivalently ξ = 0 a.s., i.e., π0 = 1 and πk = 0 for k ≥ 1, (5.2) shows that ξ̂ = ∞ a.s. Hence, every normal node has no child and is thus a leaf, while every special node has an infinite number of children, all normal. Consequently, the root is the only special node, the spine consists of the root only (i.e., its length L = 1), and the tree T̂ consists of the root with an infinite number of leaves attached to it, i.e., T̂ is an infinite star. (This is also given directly by the alternative construction in (T2) above.) In contrast, T consists of the root only, so |T| = 1. In this case there is no randomness in T or T̂.
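For intuition, here is a small Python sketch (our illustration, not from the paper) of the construction above: it samples T̂ truncated at height m for a given critical or subcritical offspring distribution (πk) with finite support. An infinite offspring number is represented by a marker, and only the first K children of an exploding node are generated (K is an arbitrary display cap); nodes are the Ulam–Harris strings used in Section 6 below.

```python
import random

INF = float('inf')

def sample_That(pi, m, K=8):
    """Sample the modified Galton-Watson tree T-hat, truncated at height m.
    pi = (pi_0, ..., pi_r) must have mean mu <= 1.  Returns a dict mapping
    each node (a tuple, its Ulam-Harris string) to its outdegree; outdegree
    INF marks the explosion, of which only K children are kept."""
    def xi():                                  # normal offspring, law (pi_k)
        return random.choices(range(len(pi)), weights=pi)[0]
    def xi_hat():                              # special offspring, law (5.2)
        r, acc = random.random(), 0.0
        for k, p in enumerate(pi):
            acc += k * p                       # P(xi_hat = k) = k pi_k
            if r < acc:
                return k
        return INF                             # P(xi_hat = inf) = 1 - mu
    tree, stack = {}, [((), True)]             # (node, is_special); root special
    while stack:
        v, special = stack.pop()
        d = xi_hat() if special else xi()
        tree[v] = d
        if len(v) == m:                        # truncate at height m
            continue
        n_child = K if d == INF else d
        spec = random.randint(1, d) if special and d != INF and d > 0 else 0
        for i in range(1, n_child + 1):        # children v1, v2, ...
            stack.append((v + (i,), special and i == spec))
    return tree

# Subcritical example, mu = 1/2: the spine has the geometric length (5.3) and
# ends with an explosion, the unique node of infinite outdegree.
print(sample_That([0.5, 0.5], m=3))
```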
Remark 5.2. In case (T1), if we remove the spine, we obtain a random forest
that can be regarded as coming from a Galton–Watson process with immigration, where the immigration is described by an i.i.d. sequence of random
variables with the distribution of ξ̂ − 1, see Lyons, Pemantle and Peres [84]. (In the Poisson case, Grimmett [47] gave a slightly different description of T̂ using
a Galton–Watson process with immigration.)
In case (T2), we can do the same, but now the immigration is different: at a
random (geometric) time, there is an infinite immigration, and after that there
is no more immigration at all.
Remark 5.3. Some related modifications of Galton–Watson trees having a
finite spine have been considered previously. Sagitov and Serra [102] construct
(as a limit for a certain two-type branching process) a random tree similar
to the one in (T2) above (with a subcritical ξ), with a finite spine having a
length with the geometric distribution (5.3); the difference is that at the top of
the spine, only a finite number of Galton–Watson trees T are attached. (This
number may be a copy of ξ ∗ − 1 as at the other points of the spine, or it may
have a different distribution, see [102].) Thus there is no explosion, and the
tree is finite. Another modified Galton–Watson tree is used by Addario-Berry,
Devroye and Janson [1]; the proofs use a truncated version of (T1) above (with
a critical ξ), where the spine has a fixed length k; at the top of the spine the
special node becomes normal and reproduces normally with ξ children. Geiger
[44] studied T conditioned on its height being at least n, see Section 22, and
gave a construction of it using a spine of length n, but with more complicated
rules for the branches. See also the modified trees T̂_n^1, T̂_n^2, T̂_n^3 in Section 20.
The invariant random sin-tree constructed by Aldous [2] in a more general situation is, for a critical Galton–Watson process, another related tree; it has an infinite spine as T̂, but differs from T̂ in that the root has ξ + 1 children (and thus ξ normal children) instead of ξ̂. In this case, it may be better to reverse the orientation of the spine and consider the spine as an infinite path · · · v−2 v−1 v0 starting at −∞ (there is thus no root); we attach further branches (copies of T) as above, with all vi, i < 0, special (the number of children is a copy of ξ̂), but the top node v0 normal (the number of children is a copy of ξ, and all are normal).

Kurtz, Lyons, Pemantle and Peres [78] and Chassaing and Durhuus [23] have constructed related trees with infinite spines using multi-type Galton–Watson processes.
Remark 5.4. If ξ has the probability generating function ϕ(x) := E x^ξ = ∑_{k=0}^∞ πk x^k, then ξ̂ has by (5.2) the probability generating function

E x^{ξ̂} = ∑_{k=0}^∞ kπk x^k = xϕ′(x),    (5.4)

at least for 0 ≤ x < 1. (Also for µ < 1 when ξ̂ may take the value ∞.)
Remark 5.5. In case (T1), the random variable ξ̂ is a.s. finite and has mean

E ξ̂ = ∑_{k=0}^∞ k P(ξ̂ = k) = ∑_{k=0}^∞ k² πk = E ξ² = σ² + 1,    (5.5)

where σ² := Var ξ ≤ ∞. In case (T2), we have P(ξ̂ = ∞) > 0 and thus E ξ̂ = ∞. This suggests that in results that are known in the critical case (T1), and where σ² appears as a parameter (see e.g. Section 21), the correct generalization of σ² to the subcritical case (T2) is not Var ξ but E ξ̂ − 1 = ∞. (See Remark 5.6 below for a simple example.) We thus define, for any distribution (πk)_{k=0}^∞ with expectation µ ≤ 1,

σ̂² := E ξ̂ − 1 = σ² if µ = 1, and σ̂² := ∞ if µ < 1.    (5.6)

Remark 5.6. Let lk(T) denote the number of nodes with distance k to the root in a rooted tree T. (This is thus the size of the k:th generation.) Trivially, l0(T) = 1, while l1(T) = d+_T(o), the root degree.

It follows by the construction of T̂ and induction that in case (T1), using (5.5),

E lk(T̂) = 1 + k(E ξ̂ − 1) = kσ² + 1,    k ≥ 0.    (5.7)

In case (T2), we have if µ > 0 and k ≥ 1 a positive probability that L = k and then lk(T̂) = ∞. Thus E lk(T̂) = ∞. Consequently, using (5.6), if 0 < µ ≤ 1, then

E lk(T̂) = kσ̂² + 1,    k ≥ 1.    (5.8)

However, this fails if µ = 0; in that case, l1(T̂) = ∞ but lk(T̂) = 0 for k ≥ 2, see Example 5.1.
Remark 5.7. As said above, in the case µ = 1, the tree T̂ is the size-biased Galton–Watson tree, see [74, 6] and [84]. For comparison, we give the definition of the latter, for an arbitrary distribution (πk)_{k≥0} with finite mean µ > 0: Let, as above, ξ have the distribution (πk), see (5.1), and let ξ∗ have the size-biased distribution defined by

P(ξ∗ = k) = kπk/µ,    k = 0, 1, 2, . . .    (5.9)

(Note that this is a probability distribution on N1.) Construct T∗ as T̂ above, with normal and special nodes, with the only difference that the number of children of a special node has the distribution of ξ∗ in (5.9).

In the critical case µ = 1, we have ξ∗ = ξ̂ and thus T∗ = T̂, but in the subcritical case µ < 1, T∗ and T̂ are clearly different. (Note that T∗ always is locally finite, but T̂ is not when µ < 1.) When µ > 1, T̂ is not even defined, but T∗ is. (As remarked by Aldous and Pitman [6], in the supercritical case T∗ has a.s. an uncountable number of infinite paths from the root, in contrast to the case µ ≤ 1 when the spine a.s. is the only one.)

T∗ can also be constructed by the alternative construction in (T1) above starting with an infinite spine, again with the difference that ξ̂ − 1 is replaced by ξ∗ − 1. T∗ can also be seen as a Galton–Watson process with immigration in the same way as in Remark 5.2.

By (5.9), the probability that a given special node in T∗ has k ≥ 1 children, with a given one of them special, is

(1/k) P(ξ∗ = k) = kπk/(kµ) = πk/µ.    (5.10)

Let T be a fixed tree of height ℓ, and let u be a node in the ℓ:th (and last) generation in T. Let T^{∗(ℓ)} denote T∗ truncated at height ℓ. It follows from (5.10) and independence that the probability that T^{∗(ℓ)} = T and that u is special (i.e., u is the unique element of the spine at distance ℓ from the root) equals µ^{−ℓ} P(T^{(ℓ)} = T). Hence, summing over the lℓ(T) possible u,

P(T^{∗(ℓ)} = T) = µ^{−ℓ} lℓ(T) P(T^{(ℓ)} = T),    (5.11)

which explains the name size-biased Galton–Watson tree. (As an alternative, one can thus define T∗ directly by (5.11), noting that this gives consistent distributions for ℓ = 1, 2, . . . , see Kesten [74].) See further Section 22.2.
6. The Ulam–Harris tree and convergence
It is convenient, especially when discussing convergence, to regard our trees as
subtrees of the infinite Ulam–Harris tree defined as follows. (See e.g. Otter [93],
Harris [51, § VI.2], Neveu [91] and Kesten [74].)
Definition 6.1. The Ulam–Harris tree U∞ is the infinite rooted tree with node set V∞ := ⋃_{k=0}^∞ N1^k, the set of all finite strings i1 · · · ik of positive integers, including the empty string ∅ which we take as the root o, and with an edge joining i1 · · · ik and i1 · · · ik+1 for any k ≥ 0 and i1, . . . , ik+1 ∈ N1.

Thus every node v = i1 · · · ik has outdegree d+(v) = ∞; the children of v are the strings v1, v2, v3, . . . , and we let them have this order so U∞ becomes an infinite ordered rooted tree. The parent of i1 · · · ik (k > 0) is i1 · · · ik−1.

The family T of ordered rooted trees can be identified with the set of all rooted subtrees T of U∞ that have the property

i1 · · · ik i ∈ V(T) ⟹ i1 · · · ik j ∈ V(T) for all j ≤ i.    (6.1)

Equivalently, by identifying T and its node set V(T), we can regard T as the family of all subsets V of V∞ that satisfy

∅ ∈ V,    (6.2)
i1 · · · ik+1 ∈ V ⟹ i1 · · · ik ∈ V,    (6.3)
i1 · · · ik i ∈ V ⟹ i1 · · · ik j ∈ V for all j ≤ i.    (6.4)

We let Tf := {T ∈ T : |T| < ∞} be the set of all finite ordered rooted trees and Tn := {T ∈ T : |T| = n} the set of all ordered rooted trees of size n.
If T ∈ T, we let as above d+(v) = d+_T(v) denote the outdegree of v for every v ∈ V(T). For convenience, we also define d+(v) = 0 for v ∉ V(T); thus d+(v) is defined for every v ∈ V∞, and the tree T ∈ T is uniquely determined by the (out)degree sequence (d+_T(v))_{v∈V∞}. It is easily seen that this gives a bijection between T and the set of sequences (dv) ∈ N̄0^{V∞} with the property

d_{i1···ik i} = 0 when i > d_{i1···ik}.    (6.5)

The family Tlf of locally finite trees corresponds to the subset of all such sequences with all dv < ∞, and the family Tf of finite trees corresponds to the subset of all such sequences (dv) with all dv < ∞ and only finitely many dv ≠ 0. In this way we have Tf ⊂ Tlf ⊂ T ⊂ N̄0^{V∞}; note that Tlf = T ∩ N0^{V∞}, so Tf ⊂ Tlf ⊂ N0^{V∞}.

We give N̄0 the usual compact topology as the one-point compactification of the discrete space N0. Thus N̄0 is a compact metric space. (One metric, among many equivalent ones, is given by the homeomorphism n ↦ 1/(n + 1) onto {1/n}_{n=1}^∞ ∪ {0} ⊂ R.) We give N̄0^{V∞} the product topology and its subspaces Tf, Tlf and T the induced topologies. Thus N̄0^{V∞} is a compact metric space, and its subspaces Tf, Tlf and T are metric spaces. (The precise choice of metric on these spaces is irrelevant; we will not use any explicit metric except briefly in Section 20.) Moreover, the condition (6.5) defines T as a closed subset of N̄0^{V∞}; thus T is a compact metric space. (Tf and Tlf are not compact. In fact, it is easily seen that they are dense proper subsets of T. Tf is a countable discrete space.)
In other words, if Tn and T are trees in T, then Tn → T if and only if the outdegrees converge pointwise:

d+_{Tn}(v) → d+_T(v) for each v ∈ V∞.    (6.6)

It is easily seen that it suffices to consider v ∈ V(T), i.e., (6.6) is equivalent to

d+_{Tn}(v) → d+_T(v) for each v ∈ V(T),    (6.7)

since (6.7) implies that if v ∉ V(T), then v ∉ V(Tn) for sufficiently large n, and thus d+_{Tn}(v) = 0. (Consider the last node w in V(T) on the path from the root to v and use d+_{Tn}(w) → d+_T(w).)

Alternatively, we may as above consider the node set V(T) as a subset of V∞ and regard T as the family of all subsets of V∞ that satisfy (6.2)–(6.4). We identify the family of all subsets of V∞ with {0, 1}^{V∞}, and give this family the product topology, making it into a compact metric space. (Thus, convergence means convergence of the indicator 1{v ∈ ·} for each v ∈ V∞.) This induces a topology on T, where Tn → T means that, for each v ∈ V∞, if v ∈ V(T), then v ∈ V(Tn) for all large n, and, conversely, if v ∉ V(T), then v ∉ V(Tn) for all large n.

If v = i1 . . . ik with k > 0, then v ∈ V(T) if and only if ik ≤ d+_T(i1 . . . ik−1). It follows immediately that V(Tn) → V(T) in the sense just described, if and only if (6.6) holds. The two definitions of Tn → T above are thus equivalent (for T, and thus also for its subsets Tf and Tlf).
Furthermore, we see, e.g. from (6.6), that the convergence of trees can be described recursively: Let T(j) denote the j:th subtree of T, i.e., the subtree rooted at the j:th child of the root, for j = 1, . . . , d+_T(o). (We consider only finite j, even when d+_T(o) = ∞.) Then, Tn → T if and only if

(i) the root degrees converge: d+_{Tn}(o) → d+_T(o), and further,
(ii) for each j = 1, . . . , d+_T(o), Tn,(j) → T(j).

(Note that Tn,(j) is defined for large n, at least, by (i).)
It is important to realize that the notion of convergence used here is a local
(pointwise) one, so we consider only a single v at a time, or, equivalently, a finite
set of v; there is no uniformity in v required.
If T is a locally finite tree, T ∈ Tlf, then d+_T(v) < ∞ for each v, and thus (6.6) means that for each v, d+_{Tn}(v) = d+_T(v) for all sufficiently large n.

Let T^{(m)} denote the tree T truncated at height m, i.e., the subtree of T consisting of all nodes in generations 0, . . . , m. If T is locally finite, then each T^{(m)} is a finite tree, and it is easily seen from (6.7) that convergence to T can be characterised as follows:

Lemma 6.2. If T is locally finite, then, for any trees Tn ∈ T,

Tn → T ⟺ Tn^{(m)} → T^{(m)} for each m
     ⟺ Tn^{(m)} = T^{(m)} for each m and all large n.

(The last condition means for n larger than some n(m) depending on m.)
This notion of convergence for locally finite trees is widely used; see e.g. Otter [93] and Aldous and Pitman [6].

In general, if T is not locally finite, this characterization fails. (For example, if Sn, 1 ≤ n ≤ ∞, is a star where the root has outdegree n and its children all have outdegree 0, then Sn → S∞, but Sn^{(m)} ≠ S∞^{(m)} for all n and m ≥ 1.) Instead, we have to localise also horizontally: Let V^{[m]} := ⋃_{k=0}^m {1, . . . , m}^k, the subset of V∞ consisting of strings of length at most m with all elements at most m. For a tree T ∈ T, let T^{[m]} be the subtree with node set V(T) ∩ V^{[m]}, i.e., the tree T truncated at height m and pruned so that all outdegrees are at most m. It is then easy to see from (6.6) that the following analogue and generalization of Lemma 6.2 holds:

Lemma 6.3. For any trees T, Tn ∈ T,

Tn → T ⟺ Tn^{[m]} → T^{[m]} for each m
     ⟺ Tn^{[m]} = T^{[m]} for each m and all large n.

(The last condition means for n larger than some n(m) depending on m.)
Our notion of convergence for general trees T ∈ T was introduced in this form by Jonsson and Stefánsson [67] (where the truncation T^{[m]} is called a left ball).
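In concrete terms, if a tree is stored as its node set (a set of Ulam–Harris tuples satisfying (6.2)–(6.4)), the left ball T^{[m]} is a one-line filter. The small sketch below (our illustration) replays the star example above: the left balls of Sn stabilise as n grows, reflecting Sn → S∞.

```python
def left_ball(nodes, m):
    """T^[m]: keep the strings of length <= m whose entries are all <= m."""
    return {v for v in nodes if len(v) <= m and all(i <= m for i in v)}

def star(n):
    """S_n: a root with n children of outdegree 0, as Ulam-Harris tuples."""
    return {()} | {(i,) for i in range(1, n + 1)}

m = 3
print(left_ball(star(10), m) == left_ball(star(10**6), m))   # True:
# for n >= m all the left balls agree, illustrating S_n -> S_infinity.
```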
Remark 6.4. It is straightforward to obtain versions of Lemmas 6.2–6.3 for random trees T, Tn and convergence in probability or distribution. For example: For any random trees T, Tn ∈ T,

Tn →d T ⟺ Tn^{[m]} →d T^{[m]} for each m.    (6.8)

If T ∈ Tlf a.s., then we also have

Tn →d T ⟺ Tn^{(m)} →d T^{(m)} for each m,    (6.9)

see e.g. Aldous and Pitman [6]. The proofs are standard using the methods in e.g. Billingsley [15].
7. Main result for simply generated random trees
Our main result for trees is the following, proved in Section 16. The case when ν ≥ 1 and σ² < ∞ was shown implicitly by Kennedy [73] (who considered Galton–Watson processes and not trees), and explicitly by Aldous and Pitman [6], see also Grimmett [47], Kolchin [76], Kesten [74] and Aldous [4]. Special cases with 0 < ν < 1 and ν = 0 are given by Jonsson and Stefánsson [67] and Janson, Jonsson and Stefánsson [64], respectively.
Theorem 7.1. Let w = (wk)_{k≥0} be any weight sequence with w0 > 0 and wk > 0 for some k ≥ 2.

(i) If ν ≥ 1, let τ be the unique number in [0, ρ] such that Ψ(τ) = 1.
(ii) If ν < 1, let τ := ρ.

In both cases, 0 ≤ τ < ∞ and 0 < Φ(τ) < ∞. Let

πk := τ^k wk/Φ(τ),    k ≥ 0;    (7.1)

then (πk)_{k≥0} is a probability distribution, with expectation

µ = Ψ(τ) = min(ν, 1) ≤ 1    (7.2)

and variance σ² = τΨ′(τ) ≤ ∞. Let T̂ be the infinite modified Galton–Watson tree constructed in Section 5 for the distribution (πk)_{k≥0}. Then Tn →d T̂ as n → ∞, in the topology defined in Section 6.

Furthermore, in case (i), µ = 1 (the critical case) and T̂ is locally finite with an infinite spine; in case (ii), µ = ν < 1 (the subcritical case) and T̂ has a finite spine ending with an explosion.
Remark 7.2. Note that we can combine the two cases ν ≥ 1 and ν < 1 and define, using Lemma 3.1 and with Ψ(ρ) = ν,

τ := max{t ≤ ρ : Ψ(t) ≤ 1}.    (7.3)
Remark 7.3. In case (ii), there is no τ > 0 with Ψ(τ) = 1, see Lemma 3.1. Hence the definition of τ can also be expressed as follows, recalling Ψ(t) := tΦ′(t)/Φ(t) from (3.6): τ is the unique number in [0, ρ] such that

τΦ′(τ) = Φ(τ),    (7.4)

if there exists any such τ; otherwise τ := ρ. (Equation (7.4) is used in many papers to define τ, in the case ν ≥ 1.)
Remark 7.4. If 0 < t < ρ, then

(d/dt)(Φ(t)/t) = (tΦ′(t) − Φ(t))/t² = (Φ(t)/t²)(Ψ(t) − 1).

Since Ψ(t) is increasing by Lemma 3.1, it follows that Φ(t)/t decreases on [0, τ] and increases on [τ, ρ], so τ can, alternatively, be characterised as the (unique) minimum point in [0, ρ] of the convex function Φ(t)/t, cf. e.g. Minami [89] and Jonsson and Stefánsson [67]. Consequently,

Φ(τ)/τ = inf_{0≤t≤ρ} Φ(t)/t = inf_{0≤t<∞} Φ(t)/t.    (7.5)

(This holds also when ρ = 0, trivially, since then Φ(t)/t = ∞ for every t > 0.)
Remark 7.5. By Remark 7.4, τ is, equivalently, the (unique) maximum point in [0, ρ] of t/Φ(t), which by (3.13) is the inverse function of the generating function Z(z). It follows easily that

τ = Z(ρZ),    (7.6)

where ρZ = τ/Φ(τ) is the radius of convergence of Z; see also Corollary 18.17. Note that 0 ≤ ρZ < ∞ and that ρZ = 0 ⟺ τ = 0 ⟺ ρ = 0. Otter [93] uses (7.6) as the definition of τ (by him denoted a); see also Minami [89].
Remark 7.6. When ν = 0 (which is equivalent to ρ = 0), the limit T̂ is the non-random infinite star in Example 5.1, so Theorem 7.1 gives Tn →p T̂.
Remark 7.7. We consider briefly the cases excluded from Theorem 7.1. The case when w0 = 0 is completely trivial, since then w(T) = 0 for every finite tree, so Tn is undefined. The same holds (for n ≥ 2) when w0 > 0 but wk = 0 for all k ≥ 1, i.e., when ω = 0.

The case when w0 > 0 and w1 > 0 but wk = 0 for k ≥ 2, so ω = 1, is also trivial. Then w(T) = 0 unless T is a rooted path Pn for some n. Thus Zn = w(Pn) = w0 w1^{n−1}, and (a.s.) Tn = Pn, which converges as n → ∞ to the infinite path P∞. We have ν = 1 = ω, but, in contrast to Theorem 7.1, τ = ∞, with τ defined e.g. by (7.3). Further, interpreting (7.1) as a limit, we have πk = δk1, so (πk) is the distribution concentrated at 1; thus (5.2) yields ξ̂ = 1 a.s., so T̂ consists of an infinite spine only, i.e. T̂ = P∞. Consequently, Tn →d T̂ holds in this case too.
Remark 7.8. If we replace (wk) by the equivalent weight sequence (w̃k) given by (4.1), then (7.3) and (4.5) show that τ is replaced by

τ̃ := max{t ≤ ρ̃ : Ψ̃(t) ≤ 1} = max{t ≤ ρ/b : Ψ(bt) ≤ 1} = τ/b.    (7.7)

The corresponding probability weight sequence given by (7.1) thus is, using (4.4),

π̃k := τ̃^k w̃k/Φ̃(τ̃) = (τ/b)^k a b^k wk/(aΦ(τ)) = τ^k wk/Φ(τ) = πk,    (7.8)

so the distribution (πk) is invariant and depends only on the equivalence class of (wk).
Remark 7.9. If ρ > 0, then τ > 0 and the distribution (πk ) is a probability weight sequence equivalent to (wk ). There are other equivalent probability
weight sequences, see Lemma 4.1, but Theorem 7.1 and the theorems below
show that (πk ) has a special role and therefore is a canonical choice of a weight
sequence in its equivalence class. Remark 4.3 shows that (πk ) is the unique
probability distribution with mean 1 that is equivalent to (wk ), if any such
distribution exists. If no such distribution exists but ρ > 0, then (πk ) is the
probability distribution equivalent to (wk ) that has the maximal mean.
A heuristic motivation for this choice of probability weight sequence is that
when we construct Tn as a Galton–Watson tree T conditioned on |T | = n, it
is better to condition on an event of not too small probability; in the critical
case this probability decreases as n^{−3/2} provided σ² < ∞, see [93] (ν > 1) and [76, Theorem 2.3.1] (ν ≥ 1, σ² < ∞), and always subexponentially, but in the
subcritical and supercritical cases it typically decreases exponentially fast, see
Theorems 18.7 and 18.11.
As a special case of Theorem 7.1 we have the following result for the root degree d+_{Tn}(o), proved in Section 15.

Theorem 7.10. Let (wk)_{k≥0} and (πk)_{k≥0} be as in Theorem 7.1. Then, as n → ∞,

P(d+_{Tn}(o) = d) → dπd,    d ≥ 0.    (7.9)

Consequently, regarding d+_{Tn}(o) as a random number in N̄0,

d+_{Tn}(o) →d ξ̂,    (7.10)

where ξ̂ is a random variable in N̄0 with the distribution given in (5.2).

Note that the sum ∑_{0}^{∞} dπd = µ of the limiting probabilities in (7.9) may be less than 1; in that case we do not have convergence to a proper finite random variable, which is why we regard d+_{Tn}(o) as a random number in N̄0.
Theorem 7.10 describes the degree of the root. If we instead take a random
node, we obtain a different limit distribution, viz. (πk ). We state two versions
of this; the two results are of the types called annealed and quenched in statistical physics. In the first (annealed) version, we take a random tree Tn and,
simultaneously, a random node v in it. In the second (quenched) version we fix
a random tree Tn and study the distribution of outdegrees in it. (This yields
a random probability distribution. Equivalently, we study the outdegree of a
random node conditioned on the tree Tn .)
Theorem 7.11. Let (wk)_{k≥0} and (πk)_{k≥0} be as in Theorem 7.1.

(i) Let v be a uniformly random node in Tn. Then, as n → ∞,

P(d+_{Tn}(v) = d) → πd,    d ≥ 0.    (7.11)

(ii) Let Nd be the number of nodes in Tn of outdegree d. Then

Nd/n →p πd,    d ≥ 0.    (7.12)
The proof is given in Section 17. (When ν ≥ 1, this was proved by Otter [93],
see also Minami [89].) See Section 21.2 for further results.
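As a quick empirical illustration of (7.12) (our sketch, not from the paper): for the critical geometric offspring distribution πk = 2^{−(k+1)} (a probability weight sequence with mean 1), we can sample Tn by rejection, using (2.6), and compare the frequencies Nd/n with πd.

```python
import random
from collections import Counter

def gw_outdegrees(n_target):
    """Depth-first outdegree list of a Galton-Watson tree with offspring law
    pi_k = 2^-(k+1) (critical, mean 1); abort if the tree exceeds n_target."""
    degrees, pending = [], 1
    while pending > 0:
        if len(degrees) >= n_target:
            return None
        k = 0
        while random.random() < 0.5:   # geometric: P(xi = k) = 2^-(k+1)
            k += 1
        degrees.append(k)
        pending += k - 1
    return degrees

n, reps, counts, done = 20, 1000, Counter(), 0
while done < reps:                     # rejection sampling of T_n: keep |T| = n
    d = gw_outdegrees(n)
    if d is not None and len(d) == n:
        counts.update(d)
        done += 1

for k in range(4):
    print(k, counts[k] / (reps * n), 2.0 ** -(k + 1))   # N_d/n versus pi_d
```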
Instead of considering just the outdegree of a random node, i.e., its number of
children, we may obtain a stronger result by considering the subtree containing
its children, grandchildren and so on. (This random subtree is called a fringe
subtree by Aldous [2].) We have an analogous result, also proved in Section 17.
Cf. [2], which in particular contains (i) below in the case ν ≥ 1 and σ² < ∞; this was extended by Bennies and Kersting [11] to the general case ν ≥ 1. (Note that the limit distribution, i.e. the distribution of T, is a fringe distribution in the sense of [2] only if µ = 1, i.e., if and only if ν ≥ 1.)
Theorem 7.12. Let (wk)_{k≥0} and (πk)_{k≥0} be as in Theorem 7.1, and let T be the Galton–Watson tree with offspring distribution (πk). Further, if v is a node in Tn, let Tn;v be the subtree rooted at v.

(i) Let v be a uniformly random node in Tn. Then Tn;v →d T, i.e., for any fixed tree T,

P(Tn;v = T) → P(T = T).    (7.13)

(ii) Let T be an ordered rooted tree and let NT := |{v : Tn;v = T}| be the number of nodes in Tn such that the subtree rooted there equals T. Then

NT/n →p P(T = T).    (7.14)
Remark 7.13. Aldous [2] considers also the tree obtained by a random re-rooting of Tn, i.e., the tree obtained by declaring a uniformly random node v to be the root. Note that this re-rooted tree contains Tn;v as a subtree, and that, provided v ≠ o, there is exactly one branch from the new root not in this subtree, viz. the branch starting with the original parent of v. Aldous [2] shows, at least when ν ≥ 1 and σ² < ∞, convergence of this randomly re-rooted tree to the random sin-tree in Remark 5.3. The limit of the re-rooted tree is thus very similar to the limit of Tn in Theorem 7.1, but not identical to it.
8. Three different types of weights
Although Theorem 7.1 has only two cases, it makes sense to treat the case ρ = 0
separately. We thus have the following three (mutually exclusive) cases for the
weight sequence (wk ):
I. ν ≥ 1. Then 0 < τ < ∞ and τ ≤ ρ ≤ ∞. The weight sequence (wk) is equivalent to (πk), which is a probability distribution with mean µ = Ψ(τ) = 1 and probability generating function ∑_{k=0}^∞ πk z^k with radius of convergence ρ/τ ≥ 1.

II. 0 < ν < 1. Then 0 < τ = ρ < ∞. The weight sequence (wk) is equivalent to (πk), which is a probability distribution with mean µ = Ψ(τ) = ν < 1 and probability generating function ∑_{k=0}^∞ πk z^k with radius of convergence ρ/τ = 1.

III. ν = 0. Then τ = ρ = 0, and (wk) is not equivalent to any probability distribution.

If we consider the modified Galton–Watson tree in Theorem 7.1, then III is the case discussed in Example 5.1; excluding this case, I and II are the same as (T1) and (T2) in Section 5.
We can reformulate the partition into three cases in more probabilistic terms. If $\xi$ is a non-negative integer valued random variable with distribution given by $p_k = P(\xi = k)$, $k \ge 0$, then the exponential moments of $\xi$ are $E\,R^{\xi} = \sum_{k=0}^{\infty} p_k R^k$ for $R > 1$. (Equivalently, $E\,e^{r\xi}$ for $r := \log R > 0$.) We say that $\xi$, or the distribution $(p_k)$, has some finite exponential moment if $E\,R^{\xi} < \infty$ for some $R > 1$; this is equivalent to the probability generating function $\sum_{k=0}^{\infty} p_k z^k$ having radius of convergence strictly larger than 1.
Consider again a probability distribution $(\widetilde{w}_k)$ equivalent to $(w_k)$, with $\widetilde{w}_k = t^k w_k / \Phi(t)$ for some $t \le \rho$. By Section 4, the radius of convergence of the probability generating function $\widetilde{\Phi}(z)$ of this distribution is $\rho/t$, cf. (4.4). Hence, the distribution $(\widetilde{w}_k)$ has some finite exponential moment if and only if $0 < t < \rho$.
The cases I–III can thus be described as follows:
I. $\nu \ge 1$. Then $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ (with or without some exponential moment). Moreover, $(\pi_k)$ in (7.1) is the unique such distribution.

II. $0 < \nu < 1$. Then $(w_k)$ is equivalent to a probability distribution with mean $\mu < 1$ and no finite exponential moment. Moreover, $(\pi_k)$ in (7.1) is the unique such distribution.

III. $\nu = 0$. Then $(w_k)$ is not equivalent to any probability distribution.
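In concrete cases this classification can be carried out numerically from the weight sequence. The following minimal sketch (an illustration of mine, not from the text; it assumes $0 < \rho < \infty$ and truncates all series at an ad hoc cutoff $K$) evaluates $\Psi$ and locates $\tau$ by bisection, using that $\Psi$ is increasing; for the weights $w_k = 1$ of Example 10.1 below it returns $\tau = 1/2$:

\begin{verbatim}
def Phi(w, t, K=4000):
    # truncated generating function sum_{k<K} w(k) t^k (K is ad hoc)
    return sum(w(k) * t ** k for k in range(K))

def Psi(w, t, K=4000):
    return sum(k * w(k) * t ** k for k in range(K)) / Phi(w, t, K)

def classify(w, rho, K=4000):
    # tau solves Psi(tau) = 1 if nu >= 1 (case I); else tau = rho (case II)
    t_max = rho * (1 - 1e-9)            # stay inside [0, rho]
    if Psi(w, t_max, K) < 1:            # numerical stand-in for nu < 1
        return rho, "case II"
    lo, hi = 0.0, t_max
    for _ in range(100):                # bisection; Psi is increasing
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Psi(w, mid, K) < 1 else (lo, mid)
    return (lo + hi) / 2, "case I"

print(classify(lambda k: 1.0, rho=1.0))  # -> (0.5..., 'case I')
\end{verbatim}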
Case I may be further subdivided. From an analytic point of view, it is natural
to split I into two subcases:
Ia. $\nu > 1$; equivalently, $0 < \tau < \rho \le \infty$. The weight sequence $(w_k)$ is equivalent to $(\pi_k)$, which is a probability distribution with mean $\mu = 1$ and probability generating function $\sum_{k=0}^{\infty} \pi_k z^k$ with radius of convergence $\rho/\tau > 1$. In other words, $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ and some finite exponential moment. (Then $(\pi_k)$ is the unique such distribution.) By (7.6), the condition can also be written analytically as $Z(\rho_Z) < \rho$, a version used e.g. in [35]. (This case is called generic in [35] and [67].)

Ib. $\nu = 1$; then $0 < \tau = \rho < \infty$. The weight sequence $(w_k)$ is equivalent to $(\pi_k)$, which is a probability distribution with mean 1 and probability generating function $\sum_{k=0}^{\infty} \pi_k z^k$ with radius of convergence $\rho/\tau = 1$. In other words, $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ and no finite exponential moment. (Then $(\pi_k)$ is the unique such distribution.)
Case Ia is convenient when using analytic methods, since it says that the
point τ is strictly inside the domain of convergence of Φ, which is convenient for
methods involving contour integrations in the complex plane. (See e.g. Drmota
[33] for several such results of different types.) For that reason, many papers
using such methods consider only case Ia. However, it has repeatedly turned
out, for many different problems, that results proved by such methods often
hold, by other proofs, assuming only that we are in case I with finite variance of
(πk ). (In fact, as shown in [59], it is at least sometimes possible to use complex
analytic methods also in the case when τ = ρ and (πk ) has a finite second
moment.) Consequently, it is often more important to partition case I into the
following two cases:
Iα. $\nu \ge 1$ and $(\pi_k)$ has variance $\sigma^2 < \infty$. In other words, $(w_k)$ is equivalent to a probability distribution $(\pi_k)$ with mean $\mu = 1$ and finite second moment $\sigma^2$.

Iβ. $\nu = 1$ and $(\pi_k)$ has variance $\sigma^2 = \infty$. In other words, $(w_k)$ is equivalent to a probability distribution with mean $\mu = 1$ and infinite variance.
Note that Ia is a subcase of Iα, since a finite exponential moment implies
that the second moment is finite.
When $\nu \ge 1$, the quantity $\sigma^2$ is another natural parameter of the weight sequence $(w_k)$, which frequently occurs in asymptotic results, see e.g. Section 21. (When $\nu < 1$, the natural analogue is $\infty$, see Remark 5.5.) By Theorem 7.1 (or (4.10)), $\sigma^2 = \tau\Psi'(\tau)$, so (assuming $\nu \ge 1$) we have case Iα when $\Psi'(\tau) < \infty$ and Iβ when $\Psi'(\tau) = \infty$. Moreover, when $\nu \ge 1$, then $(\pi_k)$ has mean $\mu = 1$, and it follows from (4.8) that the variance $\sigma^2$ of $(\pi_k)$ also is given by the formula [4]
\[
  \sigma^2 = \Phi_\tau''(1) + \mu - \mu^2 = \Phi_\tau''(1) = \frac{\tau^2 \Phi''(\tau)}{\Phi(\tau)}. \tag{8.1}
\]
Hence Iα is the case $\nu \ge 1$ and $\Phi''(\tau) < \infty$; equivalently, either $\nu > 1$, or $\nu = 1$ and $\Phi''(\rho) < \infty$.
Remark 8.1. We have seen that except in case III, we may without loss of generality assume that the weight sequence $(w_k)$ is a probability weight sequence. If this
distribution is critical, i.e. has mean 1, we are in case I with πk = wk , so we do
not have to change the weights.
If the distribution (wk ) is supercritical, then ν > 1 and we are in case Ia; we
can change to an equivalent critical probability weight. Hence we never have to
consider supercritical weights. (Recall that by Remark 4.3, ν is the supremum
of the means of the equivalent probability weight sequences.)
If the distribution (wk ) is subcritical, we can only say that we are in case I
or II. We can often change to an equivalent critical probability weight, but not
always.
9. The maximum degree
Theorem 7.1 studies convergence of the random tree Tn in the topology defined
in Section 6, which really means local convergence close to the root; we have seen
that the limit is of somewhat different types depending on the weight sequence,
with condensation in the form of a node of infinite degree in the limit tree Tb in
cases II and III but not in case I.
An alternative way to study the condensation (or absence of it) is to study
the largest degree in the tree. This is discussed in detail in Section 19 (in the
more general setting of random allocations). We give here a short summary of
the main results, showing a similar picture: the maximum degree is typically
rather small (logarithmic) in case I but larger (of order n) in cases II and III,
which can be interpreted as a condensation; however, there are exceptions in the
latter cases, and we do not have general theorems covering all possible weight
sequences.
The relation between the two ways of looking at condensation is discussed in
Section 20.
We denote, as in Section 19, the maximum outdegree in the tree $T_n$ by $Y_{(1)}$; we use further the notation in Theorem 7.1 and Section 8.

Case Ia: $\nu > 1$

In this case $0 < \tau < \rho \le \infty$, and we have a logarithmic bound due to Meir and Moon [86] (Theorem 19.3):
\[
  Y_{(1)} \le \frac{1}{\log(\rho/\tau)} \log n + o_p(\log n); \tag{9.1}
\]
if further $w_k^{1/k} \to 1/\rho$ as $k \to \infty$, then
\[
  \frac{Y_{(1)}}{\log n} \overset{p}{\longrightarrow} \frac{1}{\log(\rho/\tau)}. \tag{9.2}
\]
In particular, if $\rho = \infty$, then $Y_{(1)} = o_p(\log n)$.

Moreover, if $w_{k+1}/w_k \to a > 0$ as $k \to \infty$, then $Y_{(1)} = k(n) + O_p(1)$ for some deterministic sequence $k(n)$, so $Y_{(1)}$ is essentially concentrated in an interval of length $O(1)$ (Theorem 19.16). The distribution of $Y_{(1)}$ is asymptotically given by a discretised Gumbel distribution (Theorem 19.19), but different subsequences may have different limits and no limit distribution exists.

Similarly, if $w_{k+1}/w_k \to 0$, then $Y_{(1)} \in \{k(n), k(n)+1\}$, so $Y_{(1)}$ is concentrated on at most two values, and often (but not always) on a single value (Theorems 19.16 and 19.23).
Case Iα: $\nu \ge 1$ and $\sigma^2 < \infty$

The maximum outdegree $Y_{(1)}$ is asymptotically distributed as the maximum $\xi_{(1)}$ of $n$ i.i.d. copies of $\xi$; this holds in the strong sense that the total variation distance
\[
  d_{\mathrm{TV}}\bigl(Y_{(1)}, \xi_{(1)}\bigr) \to 0 \tag{9.3}
\]
(Theorem 19.7 and Corollary 19.11). Since $E\,\xi^2 < \infty$, this implies in particular
\[
  Y_{(1)} = o_p(n^{1/2}). \tag{9.4}
\]

Case Iβ: $\nu \ge 1$ and $\sigma^2 = \infty$

We have
\[
  Y_{(1)} = o_p(n) \tag{9.5}
\]
(Theorem 19.2), and this is (more or less) best possible (Example 19.27). However, (9.3) does not always hold in this case (Example 19.27).
Case II: 0 < ν < 1
In this case, if further $(w_k)$ satisfies an asymptotic power-law $w_k \sim c k^{-\beta}$ as $k \to \infty$, then Jonsson and Stefánsson [67] showed that
\[
  Y_{(1)} = (1 - \nu)n + o_p(n), \tag{9.6}
\]
while the second largest node degree $Y_{(2)} = o_p(n)$ (Theorem 19.34 and Remark 19.35). However, if the weight sequence is more irregular, this is no longer always true; it is possible (at least along a subsequence) that $Y_{(1)} = o_p(n)$, which can be seen as incomplete condensation; it is also possible (at least along a subsequence) that $Y_{(2)}$ too is of order $n$, meaning condensation to two or more giant nodes (Example 19.37).
Case III: $\nu = \rho = 0$

This is similar to case II. In some regular cases we have (9.6), which now says $Y_{(1)} = n + o_p(n)$, and then necessarily $Y_{(2)} = o_p(n)$ (Example 19.36), but there are exceptions in other cases with an irregular weight sequence (Examples 19.38 and 19.39).
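A quick simulation (again an illustration of mine, with ad hoc names) shows the contrast between the cases. For the uniformly random ordered trees of Example 10.1 below (case Ia, with $\rho/\tau = 2$), the degrees can be sampled as i.i.d. $\mathrm{Ge}(1/2)$ variables conditioned on their sum (cf. Lemma 17.1 and Section 15 below), and the maximum outdegree grows like $\log n / \log 2$, in accordance with (9.2); the convergence is only logarithmic, so the agreement is rough at these sizes.

\begin{verbatim}
import random
from math import log

def ge_half():
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def max_outdegree(n):
    # max of i.i.d. Ge(1/2) degrees conditioned on sum n - 1 (rejection)
    while True:
        xs = [ge_half() for _ in range(n)]
        if sum(xs) == n - 1:
            return max(xs)

for n in (100, 400, 1600):
    avg = sum(max_outdegree(n) for _ in range(30)) / 30
    print(n, avg / log(n), 1 / log(2))  # (9.2): ratio tends to 1/log 2
\end{verbatim}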
10. Examples of simply generated random trees
One of the reasons for the interest in simply generated trees is that many kinds of random trees occurring in various applications can be seen as simply generated random trees and conditioned Galton–Watson trees. We give some important examples here; see further Aldous [3, 4], Devroye [32] and Drmota [33].
We see from Theorem 7.1 and Section 8 that any simply generated random
tree defined by a weight sequence with ρ > 0 can be defined by an equivalent
probability weight sequence, and then the tree is the corresponding conditioned
Galton–Watson tree. Moreover, the probability weight sequence (πk ) defined
in (7.1) is the canonical choice of offspring distribution. Recall that (πk ) is
characterised by having mean 1, whenever this is possible (i.e., in case I), i.e.,
we prefer to have critical Galton–Watson trees.
Example 10.1 (ordered trees). The simplest example is to take $w_k = 1$ for every $k \ge 0$. Thus every tree has weight 1, and $T_n$ is a uniformly random ordered rooted tree with $n$ nodes. Further, $Z_n$ is the number of such trees; thus $Z_n$ is the Catalan number $C_{n-1}$, see Remark 2.2 and (2.1). (For this reason, these random trees are sometimes called Catalan trees.)
We have
\[
  \Phi(t) = \sum_{k=0}^{\infty} t^k = \frac{1}{1-t} \tag{10.1}
\]
and
\[
  \Psi(t) = \frac{t\Phi'(t)}{\Phi(t)} = \frac{t}{1-t}. \tag{10.2}
\]
Thus $\rho = 1$ and $\nu = \infty$ (cf. Lemma 3.1(iv)), and $\Psi(\tau) = 1$ yields $\tau = 1/2$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = 2^{-k-1}, \qquad k \ge 0. \tag{10.3}
\]
In other words, the uniformly random ordered rooted tree is the conditioned
Galton–Watson tree with geometric offspring distribution ξ ∼ Ge(1/2). (This is
the geometric distribution with mean 1. Any other geometric distribution yields
an equivalent weight sequence, and thus the same conditioned Galton–Watson
tree.)
The size-biased random variable $\widehat\xi$ in (5.2) has the distribution
\[
  P(\widehat\xi = k) = k\pi_k = k\,2^{-k-1}, \qquad k \ge 1; \tag{10.4}
\]
thus $\widehat\xi - 1$ has a negative binomial distribution $\mathrm{NBin}(2, 1/2)$. It follows that in the infinite tree $\widehat T$, if $v$ is a node on the spine (for example the root) and $d_L(v)$, $d_R(v)$ are the numbers of children of it to the left and right of the spine, respectively, then
\[
  P\bigl(d_L(v) = j \text{ and } d_R(v) = k\bigr) = \frac{1}{j+k+1} P(\widehat\xi = j+k+1) = 2^{-j-k-2} = 2^{-j-1} \cdot 2^{-k-1}, \qquad j, k \ge 0; \tag{10.5}
\]
thus $d_L(v)$ and $d_R(v)$ are independent and both have the same distribution $\mathrm{Ge}(1/2)$ as $\xi$.
We have $\sigma^2 := \operatorname{Var}\xi = \tau\Psi'(\tau) = 2$, see Theorem 7.1 and (8.1), and $E\,\widehat\xi = \sigma^2 + 1 = 3$, see (5.5).
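For simulations it is convenient to generate $T_n$ itself and not only its degree statistics. A standard construction (cf. the coding of trees by depth-first degree sequences in Section 15; the sketch below is my own and its helper names are ad hoc) samples the outdegrees as i.i.d. $\mathrm{Ge}(1/2)$ conditioned on sum $n-1$ and then uses the cycle lemma: exactly one cyclic rotation of the sequence, the one beginning just after the first minimum of the associated walk, is the depth-first outdegree sequence of an ordered tree.

\begin{verbatim}
import random

def ge_half():
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

def uniform_ordered_tree(n):
    # returns (outdegrees in depth-first order, parent pointers)
    while True:                          # condition on sum = n - 1
        xs = [ge_half() for _ in range(n)]
        if sum(xs) == n - 1:
            break
    s = smin = rot = 0                   # first minimum of the walk
    for j, x in enumerate(xs):           # s = xs[0] + ... + xs[j] - (j+1)
        s += x - 1
        if s < smin:
            smin, rot = s, j + 1
    xs = xs[rot:] + xs[:rot]             # the unique valid rotation
    parent = [None] * n
    stack = [[0, xs[0]]] if xs[0] else []  # [node, children still to come]
    for v in range(1, n):
        top = stack[-1]
        parent[v] = top[0]
        top[1] -= 1
        if top[1] == 0:
            stack.pop()
        if xs[v]:
            stack.append([v, xs[v]])
    return xs, parent

print(uniform_ordered_tree(10))
\end{verbatim}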
Example 10.2 (unordered trees). We have assumed that our trees are ordered,
but it is possible to consider unordered labelled rooted trees too by imposing
a random order on the set of children of each node. Note first that for ordered
trees, the ordering of the children implicitly yields a labelling of all nodes as
in Section 6. Hence, any ordered tree with n nodes can be explicitly labeled by
1, . . . , n in exactly n! ways, and a uniformly random labelled ordered rooted tree
is the same as a uniformly random unlabelled ordered rooted tree with a random
labelling. (For unordered trees, a uniformly random labelled tree is different
from a uniformly random unlabelled tree. We consider only labelled unordered
trees here. In fact, unlabelled unordered trees are not simply generated trees;
more formally, there is no weight sequence such that the corresponding simply
generated random tree, with the orderings of the children of each node ignored,
is a uniformly random unlabelled unordered tree.)
An unordered labelled rooted tree with outdegrees $d_i$ corresponds to $\prod_i d_i!$ different ordered labelled rooted trees. If we take $w_k = 1/k!$, we give each of these ordered trees weight $\prod_i d_i!^{-1}$, so their total weight is 1. Hence, the simply generated random tree with the weight sequence $(1/k!)$ yields, by ignoring the orderings of the children of each node, a uniformly random unordered labelled rooted tree.
In this sense, a uniformly random unordered labelled rooted tree is equivalent
to a simply generated random tree with wk = 1/k!, and with a minor abuse of
notation, we may say that a uniformly random unordered labelled rooted tree
is simply generated (with wk = 1/k!).
The number of unordered labelled unrooted trees with $n$ nodes is $n^{n-2}$, see e.g. [103, Section 5.3], a result given by Cayley [22] and known as Cayley’s formula. (Although attributed by Cayley to Borchardt [17] and even earlier found by Sylvester [104], see e.g. [103, p. 66].) Equivalently, the number of unordered labelled rooted trees with $n$ nodes is $n^{n-1}$. Hence random such trees are sometimes called Cayley trees. However, this name is also used for regular infinite trees.
We have, with $w_k = 1/k!$,
\[
  \Phi(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} = e^t \tag{10.6}
\]
and
\[
  \Psi(t) = \frac{t\Phi'(t)}{\Phi(t)} = t. \tag{10.7}
\]
Thus $\nu = \infty$ and $\Psi(\tau) = 1$ yields $\tau = 1$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = \frac{e^{-1}}{k!}, \qquad k \ge 0. \tag{10.8}
\]
In other words, the uniformly random labelled unordered rooted tree is (equivalent to) the conditioned Galton–Watson tree with Poisson offspring distribution
ξ ∼ Po(1). (Any other Poisson distribution yields an equivalent weight sequence,
and thus the same conditioned Galton–Watson tree.)
The size-biased random variable $\widehat\xi$ in (5.2) has the distribution
\[
  P(\widehat\xi = k) = k\pi_k = \frac{e^{-1}}{(k-1)!}, \qquad k \ge 1; \tag{10.9}
\]
thus $\widehat\xi - 1$ also has the Poisson distribution $\mathrm{Po}(1)$, i.e., $\widehat\xi - 1 \overset{d}{=} \xi$. (It is only for a Poisson distribution that $\widehat\xi - 1 \overset{d}{=} \xi$.)

We have $\sigma^2 := \operatorname{Var}\xi = \tau\Psi'(\tau) = 1$ and $E\,\widehat\xi = \sigma^2 + 1 = 2$, cf. (8.1) and (5.5).
The partition function is given by
\[
  Z_n(\pi) = P(|\mathcal{T}| = n) = \frac{n^{n-1} e^{-n}}{n!}. \tag{10.10}
\]
This is a special case of the Borel distribution in (12.29) below; Borel [18] proved a result equivalent to (10.10) for a queueing problem, see also Otter [93], Tanner [107], Dwass [36], Takács [106], Pitman [99], Example 12.6 and Theorem 15.5 below. Equivalently, using (4.3),
\[
  Z_n(w) = e^n Z_n(\pi) = \frac{n^{n-1}}{n!}. \tag{10.11}
\]
Recall that $Z_n$ is defined by the sum (2.5) over unlabelled ordered rooted trees; if we sum over labelled ordered rooted trees, we obtain $n!\,Z_n$, which by the argument above corresponds to weight 1 on each labelled unordered rooted tree; i.e., the number of labelled unordered rooted trees is $n!\,Z_n(w) = n^{n-1}$. Thus (10.11) is equivalent to Cayley’s formula for the number of unordered trees given above.

By (10.11), the generating function $Z(z)$ is $\sum_{n=1}^{\infty} n^{n-1} z^n / n!$, known as the tree function; see (12.22)–(12.25) in Example 12.6.
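Formula (10.10) can be checked by direct simulation of the total progeny of a $\mathrm{Po}(1)$ Galton–Watson process. The sketch below (an illustration with ad hoc names; the exploration is stopped once the size exceeds the target, which does not affect the estimate) compares the empirical value of $P(|\mathcal{T}| = n)$ with $n^{n-1}e^{-n}/n!$:

\begin{verbatim}
import random
from math import exp, factorial

def poisson1():
    # Po(1) by the product-of-uniforms method
    L, k, p = exp(-1.0), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def gw_size(offspring, cap):
    # total progeny of a Galton-Watson tree, truncated at cap
    alive, size = 1, 0
    while alive > 0 and size < cap:
        alive += offspring() - 1
        size += 1
    return size

reps, n = 200000, 5
hits = sum(gw_size(poisson1, n + 1) == n for _ in range(reps))
print(hits / reps, n ** (n - 1) * exp(-n) / factorial(n))
\end{verbatim}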
Example 10.3 (binary trees I). The name binary tree is used in (at least) two
different, but related, meanings. The first version (Drmota [33, Section 1.2.1]),
sometimes called full binary tree or strict binary tree, is an ordered rooted tree
where every node has outdegree 0 or 2. We obtain a uniformly random full
binary tree by taking the weight sequence with w0 = w2 = 1, and wk = 0 for
k 6= 0, 2. Note that this weight sequence has span 2; this is the standard example
of a weight sequence with span > 1. As a consequence, a full binary tree of size
n exists only if n is odd. (This is easily seen directly; see Corollary 15.6 for a
general result.)
We have
\[
  \Phi(t) = 1 + t^2 \tag{10.12}
\]
and
\[
  \Psi(t) = \frac{t\Phi'(t)}{\Phi(t)} = \frac{2t^2}{1+t^2}. \tag{10.13}
\]
Thus $\rho = \infty$, $\nu = 2$ (cf. Lemma 3.1(v)), and $\Psi(\tau) = 1$ yields $\tau = 1$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = \tfrac{1}{2}, \qquad k = 0, 2. \tag{10.14}
\]
In other words, the random full binary tree is the conditioned Galton–Watson tree with offspring distribution $\xi = 2X$ where $X \sim \mathrm{Be}(1/2)$. (In the Galton–Watson tree $\mathcal{T}$, each node thus gets either twins or no children, each outcome with probability 1/2.)

The size-biased random variable $\widehat\xi$ has $P(\widehat\xi = 2) = 1$ by (10.14) and (5.2), so $\widehat\xi = 2$ and $\widehat\xi - 1 = 1$ a.s.

We have $\sigma^2 := \operatorname{Var}\xi = 1$ and $E\,\widehat\xi = \sigma^2 + 1 = 2$, cf. (8.1) and (5.5).
Example 10.4 (binary trees II). The second version of a binary tree (Drmota
[33, Example 1.3]) is a rooted tree where every node has at most one left child
and at most one right child. Thus, each outdegree is 0, 1 or 2; if there are two
children they are ordered, and, moreover, if there is only one child, it is marked
as either left or right. (There is a one-to-one correspondence between binary
trees of this type with n nodes and the full binary trees in Example 10.3 with
2n + 1 nodes, mapping a binary tree T to a full binary tree T ′ , where T ′ is
obtained from T by adding 2 − d external nodes at every node with outdegree d;
conversely, we obtain T by deleting all leaves in T ′ and keeping only the nodes
that have outdegree 2 in T ′ (the internal nodes).)
There are thus two types of nodes with outdegree 1, but only one type each of nodes with outdegrees 0 or 2. If we ignore the type of child at each node with only one child, we obtain an ordered tree with all outdegrees $\le 2$, and each such tree with $n_1$ nodes of outdegree 1 corresponds to $2^{n_1}$ binary trees. This number equals the weight given by the weight sequence $w_0 = 1$, $w_1 = 2$, $w_2 = 1$, and $w_k = 0$ for $k \ge 3$, i.e., $w_k = \binom{2}{k}$. Hence, a simply generated random tree with this weight sequence has the same distribution as the ordered tree obtained from a uniformly distributed random binary tree; conversely, we may obtain a uniformly distributed random binary tree by taking a simply generated random tree with $w_k = \binom{2}{k}$ and randomly labelling each single child as left or right. In this sense, we may say that a uniformly distributed random binary tree is (equivalent to) a simply generated random tree with $w_k = \binom{2}{k}$.

The choice $w_k = \binom{2}{k}$ yields
\[
  \Phi(t) = 1 + 2t + t^2 = (1+t)^2 \tag{10.15}
\]
and
\[
  \Psi(t) = \frac{t\Phi'(t)}{\Phi(t)} = \frac{2t}{1+t}. \tag{10.16}
\]
Thus $\rho = \infty$, $\nu = 2$, and $\Psi(\tau) = 1$ yields $\tau = 1$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = \frac{1}{4}\binom{2}{k}, \qquad k \ge 0. \tag{10.17}
\]
In other words, a uniformly random binary tree of this type is (equivalent to) the conditioned Galton–Watson tree with binomial offspring distribution $\xi \sim \mathrm{Bi}(2, 1/2)$. (Any other distribution $\mathrm{Bi}(2, p)$, $0 < p < 1$, is equivalent and yields the same conditioned Galton–Watson tree.)

The size-biased random variable $\widehat\xi$ has by (5.2) $P(\widehat\xi = 1) = P(\widehat\xi = 2) = \tfrac12$; thus $\widehat\xi - 1 \sim \mathrm{Bi}(1, 1/2)$.

We have $\sigma^2 := \operatorname{Var}\xi = 1/2$ and $E\,\widehat\xi = \sigma^2 + 1 = 3/2$, cf. (8.1) and (5.5).
Example 10.5 (Motzkin trees). A Motzkin tree is an ordered rooted tree with each outdegree $\le 2$. The difference from Example 10.4 is that there is only one type of a single child. Thus we count such trees and obtain uniformly random Motzkin trees by taking $w_0 = w_1 = w_2 = 1$ and $w_k = 0$, $k \ge 3$. (We thus have the same set of trees as in Example 10.4, but different probability distributions on it.)
We have
\[
  \Phi(t) = 1 + t + t^2 \tag{10.18}
\]
and
\[
  \Psi(t) = \frac{t(1+2t)}{1+t+t^2}. \tag{10.19}
\]
Thus $\rho = \infty$, $\nu = 2$, and $\Psi(\tau) = 1$ yields $\tau = 1$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = \tfrac{1}{3}, \qquad k = 0, 1, 2. \tag{10.20}
\]
In other words, a uniformly random Motzkin tree is the conditioned Galton–
Watson tree with offspring distribution ξ uniform on {0, 1, 2}.
The size-biased random variable $\widehat\xi$ has, by (5.2) and (10.20), the distribution $P(\widehat\xi = 1) = \tfrac13$, $P(\widehat\xi = 2) = \tfrac23$; thus $\widehat\xi - 1 \sim \mathrm{Bi}(1, 2/3)$.

We have $\sigma^2 := \operatorname{Var}\xi = 2/3$ and $E\,\widehat\xi = \sigma^2 + 1 = 5/3$, cf. (8.1) and (5.5).
Example 10.6 ($d$-ary trees). In a $d$-ary tree, each node has $d$ positions where a child may be attached, and there is at most one child per position. (Trees with children attached at different positions are regarded as different trees.) This generalises the binary trees in Example 10.4, which is the special case $d = 2$. Since $k$ children may be attached in $\binom{d}{k}$ ways (with a given order), the argument in Example 10.4 shows that a uniformly random $d$-ary tree is equivalent to a simply generated random tree with $w_k = \binom{d}{k}$. We have
\[
  \Phi(t) = (1+t)^d \tag{10.21}
\]
and
\[
  \Psi(t) = \frac{t\Phi'(t)}{\Phi(t)} = \frac{dt}{1+t}. \tag{10.22}
\]
Thus $\rho = \infty$, $\nu = \omega = d$, and $\Psi(\tau) = 1$ yields $\tau = 1/(d-1)$. Hence (7.1) yields the canonical probability weight sequence
\[
  \pi_k = \binom{d}{k}(d-1)^{d-k} d^{-d} = \binom{d}{k}\Bigl(\frac{1}{d}\Bigr)^k \Bigl(\frac{d-1}{d}\Bigr)^{d-k}, \qquad k \ge 0. \tag{10.23}
\]
In other words, a uniformly random d-ary tree is (equivalent to) the conditioned Galton–Watson tree with binomial offspring distribution ξ ∼ Bi(d, 1/d).
(Any other distribution Bi(d, p), 0 < p < 1, is equivalent and yields the same
conditioned Galton–Watson tree.)
The size-biased random variable $\widehat\xi$ has the distribution
\[
  P(\widehat\xi = k) = k\pi_k = \binom{d-1}{k-1}\Bigl(\frac{1}{d}\Bigr)^{k-1}\Bigl(\frac{d-1}{d}\Bigr)^{d-k}, \qquad k \ge 1; \tag{10.24}
\]
thus $\widehat\xi - 1$ has the binomial distribution $\mathrm{Bi}(d-1, 1/d)$.

We have $\sigma^2 := \operatorname{Var}\xi = 1 - 1/d$ and $E\,\widehat\xi = \sigma^2 + 1 = 2 - 1/d$, cf. (8.1) and (5.5).
Example 10.7. Let $\beta$ be a real constant and let $w_k = (k+1)^{-\beta}$. (The case $\beta = 0$ is Example 10.1.) Then $\rho = 1$.

If $-\infty < \beta \le 1$, then $\Phi(\rho) = \infty$, so $\nu = \infty$ by (3.10) and Lemma 3.1(iv). If $\beta > 1$, then $\Phi(\rho) = \zeta(\beta) < \infty$ and
\[
  \nu = \Psi(1) = \frac{\sum_k k w_k}{\Phi(1)} = \frac{\zeta(\beta-1) - \zeta(\beta)}{\zeta(\beta)}, \qquad \beta > 2, \tag{10.25}
\]
while $\nu = \Psi(1) = \infty$ if $\beta \le 2$. Hence, see also Bialas and Burda [13],
\[
  \nu = 1 \iff \zeta(\beta-1) = 2\zeta(\beta) \iff \beta = \beta_0 = 2.47875\ldots \tag{10.26}
\]
and $\nu > 1 \iff -\infty < \beta < \beta_0$. (It can be shown that $\nu$ is a decreasing function of $\beta$ for $\beta > 2$.) In the case $\beta = \beta_0$, when thus $\nu = 1$, we further have $\sigma^2 = \infty$ by (8.1), since $\Phi''(1) = \infty$ when $\beta \le 3$. This is thus case Iβ, in the notation of Section 8.

In the case $\beta > \beta_0$ we thus have $0 < \nu < 1$, and $T_n$ converges to a random tree $\widehat T$ with one node of infinite degree, see Theorem 7.1 and Section 5. If $\beta \le \beta_0$, then $\nu \ge 1$ and the limit tree $\widehat T$ is locally finite. We thus see a phase transition at $\beta = \beta_0$ when we vary $\beta$ in this example.
Note, however, that there is nothing special with the rate of decrease $k^{-\beta_0}$; the value of $\beta_0$ depends on the exact form of our choice of the weights $w_k$ in this example, and reflects the values for small $k$ rather than the asymptotic behaviour. For example, as remarked by Bialas and Burda [13], just changing $w_0$ would change $\beta_0$ to any desired value in $(2, \infty)$. With a different $w_0$, $\Phi(1) = \zeta(\beta) - 1 + w_0$, and a modification of (10.25) shows that the critical value $\beta_0$ yielding $\nu = 1$ is given by, see [13],
\[
  2\zeta(\beta_0) - \zeta(\beta_0 - 1) = 1 - w_0. \tag{10.27}
\]
In particular, $\beta_0 > 3$ for $w_0 < 1 + \zeta(2) - 2\zeta(3) = 0.24082\ldots$; in this case, for the critical $\beta = \beta_0$, we then have $\nu = 1$ and $\sigma^2 < \infty$, see (8.1).
See [13] for some further analytic properties. For example, if $\beta_0 < 3$ (for example when $w_0 = 1$), then, as $\beta \nearrow \beta_0$, we have $1 - \tau \sim c(\beta_0 - \beta)^{1/(\beta_0 - 2)}$, where $c > 0$ and the exponent can take any value $> 1$.
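The critical value in (10.26) is easy to compute numerically. The sketch below (mine; it evaluates $\zeta$ elementarily by truncation plus the first Euler–Maclaurin correction terms) solves $\zeta(\beta - 1) = 2\zeta(\beta)$ by bisection, using that $\nu$ is decreasing in $\beta$, and returns $\beta_0 \approx 2.47875$:

\begin{verbatim}
def zeta(s, N=10000):
    # Riemann zeta for s > 1: truncated sum plus Euler-Maclaurin tail
    return (sum(k ** -s for k in range(1, N))
            + N ** (1 - s) / (s - 1) + 0.5 * N ** -s)

def f(b):
    return zeta(b - 1) - 2 * zeta(b)   # f(beta_0) = 0, cf. (10.26)

lo, hi = 2.1, 3.0                      # f(2.1) > 0 > f(3.0)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
print((lo + hi) / 2)                   # 2.47875...
\end{verbatim}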
Example 10.8. Take $w_k = k!$. The generating function $\Phi(t) = \sum_{k=0}^{\infty} k!\,t^k$ has radius of convergence $\rho = 0$, so we are in case III, and there exists no equivalent conditioned Galton–Watson tree.
Theorem 7.1 shows that $T_n$ converges to an infinite star, see Remark 7.6 and Example 5.1. This means that the root degree converges in probability to $\infty$, and that the outdegree of any fixed child converges to 0 in probability, i.e., equals 0 w.h.p. Note, however, that we cannot draw the conclusion that the outdegrees of all children of the root are 0 w.h.p.; Theorem 7.1 and symmetry imply that the proportion of children of the root with outdegree $> 0$ tends to 0, but the number of such children may still be large. (Theorem 7.11(ii) yields the same conclusion.)

In fact, for this particular example $w_k = k!$, it is shown by Janson, Jonsson and Stefánsson [64], using direct calculations, that w.h.p. all subtrees attached to the root have size 1 or 2, and that the number of such subtrees of size 2 has an asymptotic Poisson distribution $\mathrm{Po}(1)$. (This number thus w.h.p. equals $N_1$, and $\ell_2(T_n)$, and also the number of children of the root with at least one child.)
Example 10.9. If we instead take $w_k = k!^{\alpha}$ with $0 < \alpha < 1$, then as in Example 10.8, $\rho = 0$ and $T_n$ converges to the infinite star in Example 5.1. In this case, if (for simplicity) $1/\alpha \notin \mathbb{N}$, then $N_i(T_n)/n^{1-i\alpha} \overset{p}{\longrightarrow} i!^{\alpha}$ for $1 \le i \le \lfloor 1/\alpha \rfloor$, while $N_i = 0$ w.h.p. for each fixed $i > \lfloor 1/\alpha \rfloor$; furthermore, among the subtrees attached to the root, w.h.p. there are subtrees of all sizes $\le \lfloor 1/\alpha \rfloor + 1$, and all possible shapes of these trees, with the number of each type tending to $\infty$ in probability, but no larger subtrees. See Janson, Jonsson and Stefánsson [64] for details.

If we take $w_k = k!^{\alpha}$ with $\alpha > 1$, then w.h.p. $T_n$ is a star with $n-1$ leaves, so $N_d = 0$ for $1 \le d < n-1$.
See also the examples in Section 12.
11. Balls-in-boxes
The balls-in-boxes model is a model for random allocation of $m$ (unlabelled) balls in $n$ (labelled) boxes; here $m \ge 0$ and $n \ge 1$ are given integers. The set of possible allocations is thus
\[
  \mathcal{B}_{m,n} := \Bigl\{ (y_1, \ldots, y_n) \in \mathbb{N}_0^n : \sum_{i=1}^n y_i = m \Bigr\}, \tag{11.1}
\]
where $y_i$ counts the number of balls in box $i$.
We suppose again that $w = (w_k)_{k=0}^{\infty}$ is a fixed weight sequence, and we define the weight of an allocation $y = (y_1, \ldots, y_n)$ as
\[
  w(y) := \prod_{i=1}^n w_{y_i}. \tag{11.2}
\]
Given $m$ and $n$, we choose a random allocation $B_{m,n}$ with probability proportional to its weight, i.e.,
\[
  P(B_{m,n} = y) = \frac{w(y)}{Z(m,n)}, \qquad y \in \mathcal{B}_{m,n}, \tag{11.3}
\]
where the normalizing factor $Z(m,n)$, again called the partition function, is given by
\[
  Z(m,n) = Z(m,n;w) := \sum_{y \in \mathcal{B}_{m,n}} w(y). \tag{11.4}
\]
We consider only $m$ and $n$ such that $Z(m,n) > 0$; otherwise $B_{m,n}$ is undefined. See further Lemma 13.3. We write $B_{m,n} = (Y_1^{(m,n)}, \ldots, Y_n^{(m,n)})$, which we usually simplify to $(Y_1, \ldots, Y_n)$, omitting the superscripts.
Remark 11.1. The names balls-in-boxes and balls-in-bins are used in the literature for several different allocation models. We use balls-in-boxes for the model
defined here, following e.g. Bialas, Burda and Johnston [14].
Example 11.2 (probability weights). In the special case when $(w_k)$ is a probability weight sequence, let $\xi_1, \xi_2, \ldots$ be i.i.d. random variables with the distribution $(w_k)$. Then $w(y) = P\bigl((\xi_1, \ldots, \xi_n) = y\bigr)$ for any $y = (y_1, \ldots, y_n)$. Hence
\[
  Z(m,n) = P\bigl((\xi_1, \ldots, \xi_n) \in \mathcal{B}_{m,n}\bigr) = P(S_n = m), \tag{11.5}
\]
where we define
\[
  S_n := \sum_{i=1}^n \xi_i. \tag{11.6}
\]
Moreover, $B_{m,n}$ has the same distribution as $(\xi_1, \ldots, \xi_n)$ conditioned on $S_n = m$:
\[
  (Y_1^{(m,n)}, \ldots, Y_n^{(m,n)}) \overset{d}{=} \bigl( (\xi_1, \ldots, \xi_n) \mid S_n = m \bigr). \tag{11.7}
\]
We will use this setting (and notation) several times below. (This construction of a random allocation $B_{m,n}$ is used by Kolchin [76] and there called the general scheme of allocation.)
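In the probability weight sequence case, (11.7) yields an immediate, if naive, sampler for $B_{m,n}$. The sketch below (an illustration only, not from the text) uses plain rejection; this is practical when the weights have first been replaced by an equivalent sequence (Lemma 11.3 below) whose mean is close to $m/n$, since the conditioning event then typically has probability of order $n^{-1/2}$.

\begin{verbatim}
import random

def sample_allocation(m, n, sample_xi):
    # B_{m,n} via (11.7): i.i.d. xi_1, ..., xi_n conditioned on S_n = m
    while True:
        y = [sample_xi() for _ in range(n)]
        if sum(y) == m:
            return y

def ge_half():
    # Ge(1/2); with m = n - 1 this is the tree case, cf. Section 15
    k = 0
    while random.random() < 0.5:
        k += 1
    return k

print(sample_allocation(49, 50, ge_half))
\end{verbatim}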
We can replace the weight sequence by an equivalent weight sequence for the
balls-in-boxes model just as we did for the random trees in Section 4.
Lemma 11.3. Suppose that we replace the weights $(w_k)$ by equivalent weights $(\widetilde{w}_k)$ where $\widetilde{w}_k := a b^k w_k$ with $a, b > 0$ as in (4.1). Then the weight of an allocation $y = (y_1, \ldots, y_n) \in \mathcal{B}_{m,n}$ is changed to
\[
  \widetilde{w}(y) = a^n b^m w(y), \tag{11.8}
\]
and the partition function $Z(m,n) = Z(m,n;w)$ is changed to
\[
  \widetilde{Z}(m,n) := Z(m,n;\widetilde{w}) = a^n b^m Z(m,n), \tag{11.9}
\]
while the distribution of $B_{m,n}$ is invariant. Thus $B_{m,n}$ depends only on the equivalence class of the weight sequence.
Proof. We have, by the definition (11.2),
\[
  \widetilde{w}(y) = \prod_{i=1}^n \widetilde{w}_{y_i} = \prod_{i=1}^n a b^{y_i} w_{y_i} = a^n b^{\sum_{i=1}^n y_i} \prod_{i=1}^n w_{y_i} = a^n b^m w(y), \tag{11.10}
\]
which shows (11.8), and (11.9) follows by (11.4). Consequently, for every $y \in \mathcal{B}_{m,n}$, we have $\widetilde{w}(y)/\widetilde{Z}(m,n) = w(y)/Z(m,n)$, so the probability $P(B_{m,n} = y)$ in (11.3) is unchanged, which completes the proof.
Our aim is to describe the asymptotic distribution of the random allocation $B_{m,n}$ as $m, n \to \infty$; we consider the case when $m/n \to \lambda$ for some real $\lambda$, and assume for simplicity that $0 \le \lambda < \omega = \omega(w)$. (Cases with $m/n \to \infty$ are interesting too in some applications, for example in Section 19.7, but will not be considered here. See e.g. Kolchin, Sevast’yanov and Chistyakov [77], Kolchin [76] and Pavlov [96] for such results in special cases.) The first step is to note that the distribution of $B_{m,n} = (Y_1, \ldots, Y_n)$ is exchangeable, i.e., invariant under any permutation of $Y_1, \ldots, Y_n$. Hence, the distribution is completely described by the (joint) distribution of the numbers of boxes with a certain number of balls, so it suffices to study these numbers.
For any allocation of balls $y = (y_1, \ldots, y_n) \in \mathbb{N}_0^n$, and $k \ge 0$, let
\[
  N_k(y) := |\{i : y_i = k\}|, \tag{11.11}
\]
the number of boxes with exactly $k$ balls. Thus, if $y \in \mathcal{B}_{m,n}$, then
\[
  \sum_{k=0}^{\infty} N_k(y) = n \qquad \text{and} \qquad \sum_{k=0}^{\infty} k N_k(y) = m. \tag{11.12}
\]
We thus want to find the asymptotic distribution of the random variables
Nk (Bm,n ), k = 0, 1, . . . . Our main result is the following, which will be proved
in Section 14 together with the other theorems in this section.
Theorem 11.4. Let $w = (w_k)_{k\ge0}$ be any weight sequence with $w_0 > 0$ and $w_k > 0$ for some $k \ge 1$. Suppose that $n \to \infty$ and $m = m(n)$ with $m/n \to \lambda$ with $0 \le \lambda < \omega$.

(i) If $\lambda \le \nu$, let $\tau$ be the unique number in $[0, \rho]$ such that $\Psi(\tau) = \lambda$.
(ii) If $\lambda > \nu$, let $\tau := \rho$.

In both cases, $0 \le \tau < \infty$ and $0 < \Phi(\tau) < \infty$. Let
\[
  \pi_k := \frac{w_k \tau^k}{\Phi(\tau)}, \qquad k \ge 0. \tag{11.13}
\]
Then $(\pi_k)_{k\ge0}$ is a probability distribution, with expectation
\[
  \mu = \Psi(\tau) = \min(\lambda, \nu) \tag{11.14}
\]
and variance $\sigma^2 = \tau\Psi'(\tau) \le \infty$. Moreover, for every $k \ge 0$,
\[
  N_k(B_{m,n})/n \overset{p}{\longrightarrow} \pi_k. \tag{11.15}
\]
If we regard the weight sequence w as fixed and vary λ (i.e., vary m(n)), we
see that if 0 < ν < ∞, there is a phase transition at λ = ν.
Note that τ and πk in Theorem 7.1 are the same as in Theorem 11.4 with
λ = 1. Indeed, we will later see that the random trees correspond to m = n − 1
and thus λ = 1.
Remark 11.5. The argument in Remark 7.4 extends and shows that $\tau$ is the (unique) minimum point in $[0, \rho]$ of $\Phi(t)/t^{\lambda}$; i.e.,
\[
  \frac{\Phi(\tau)}{\tau^{\lambda}} = \inf_{0 \le t \le \rho} \frac{\Phi(t)}{t^{\lambda}} = \inf_{0 \le t < \infty} \frac{\Phi(t)}{t^{\lambda}}. \tag{11.16}
\]
By (11.15), there are roughly $n\pi_k$ boxes with $k$ balls. Summing this approximation over all $k$ we would get $n$ boxes (as we should) with a total of $n\sum_{k=0}^{\infty} k\pi_k = n\mu$ balls. However, the total number of balls is $m \approx n\lambda$, so in the case $\lambda > \nu$, (11.14) shows that about $n(\lambda - \mu) = n(\lambda - \nu)$ balls are missing. Where are they?

The explanation is that the sums $\sum_{k=0}^{\infty} k N_k(B_{m,n})/n = m/n$ are not uniformly summable, and we cannot take the limit inside the summation sign. The “missing balls” appear in one or several boxes with very many balls, but these “giant”
boxes are not seen in the limit (11.15) for fixed k. In physical terminology, this
can be regarded as condensation of part of the mass (= balls). We study this
further in Section 19.6. The simplest case is that there is a single giant box with
≈ (λ − ν)n balls. We shall see that this happens in an important case (Theorem 19.34; see also Bialas, Burda and Johnston [14, Fig. 1] for some numerical
examples), but that there are also other possibilities (Examples 19.37–19.39).
Recall that for simply generated random trees, which as said above correspond to balls-in-boxes with λ = 1, Theorem 7.1 too shows that there is a
condensation when ν < λ = 1 (since then µ < 1 by (7.2)); in this case the
condensation appears as a node of infinite degree in the random limit tree Tb of
type (T2), see Section 5. We shall in Section 20 study the relation between the
forms of the condensation shown in Theorems 7.1 and 11.4.
We further have the following, essentially equivalent, version of Theorem 11.4,
where we assume only that m/n is bounded, but not necessarily convergent.
Theorem 11.6. Let $w = (w_k)_{k\ge0}$ be any weight sequence with $w_0 > 0$ and $w_k > 0$ for some $k \ge 1$. Suppose that $n \to \infty$ and $m = m(n)$ with $m/n \le C$ for some $C < \omega$.

Define the function $\tau : [0, \infty) \to [0, \infty]$ by $\tau(x) := \sup\{t \le \rho : \Psi(t) \le x\}$. Then $\tau(x)$ is the unique number in $[0, \rho]$ such that $\Psi(\tau(x)) = x$ when $x \le \nu$, and $\tau(x) = \rho$ when $x \ge \nu$; furthermore, the function $x \mapsto \tau(x)$ is continuous. We have $0 \le \tau(m/n) < \infty$ and $0 < \Phi(\tau(m/n)) < \infty$, and for every $k \ge 0$,
\[
  \frac{N_k(B_{m,n})}{n} - \frac{w_k (\tau(m/n))^k}{\Phi(\tau(m/n))} \overset{p}{\longrightarrow} 0. \tag{11.17}
\]
Furthermore, for any $C < \omega$, this holds uniformly as $n \to \infty$ for all $m = m(n)$ with $m/n \le C$.
Returning to the random variables Y1 , . . . , Yn , we have the following result,
which is shown by a physicists’ proof by Bialas, Burda and Johnston [14].
Theorem 11.7. Let $w = (w_k)_{k\ge0}$ be any weight sequence with $w_0 > 0$ and $w_k > 0$ for some $k \ge 1$. Suppose that $n \to \infty$ and $m = m(n)$ with $m/n \to \lambda$ where $0 \le \lambda < \omega$, and let $(\pi_k)_{k\ge0}$ be as in Theorem 11.4. Then, for every $\ell \ge 1$ and $y_1, \ldots, y_\ell \ge 0$,
\[
  P(Y_1^{(m,n)} = y_1, \ldots, Y_\ell^{(m,n)} = y_\ell) \to \prod_{i=1}^{\ell} \pi_{y_i}. \tag{11.18}
\]
In other words, for every fixed $\ell$, the random variables $Y_1, \ldots, Y_\ell$ converge jointly to independent random variables with the distribution $(\pi_k)_{k\ge0}$.
A fancier way of describing the same result is that the sequence $Y_1, \ldots, Y_n$, arbitrarily extended to infinite length, converges in distribution, as an element of $\mathbb{N}_0^{\infty}$, to a sequence of i.i.d. random variables with the distribution $(\pi_k)_{k\ge0}$. (See e.g. [15, Problem 3.7].)
Remark 11.8. We have assumed $w_0 > 0$ in the results above for convenience, and because this condition is necessary when discussing simply generated trees, which is our main topic. The balls-in-boxes model makes sense also when $w_0 = 0$, but this case is easily reduced to the case $w_0 > 0$: Let $\alpha := \min\{k : w_k > 0\}$. If $\alpha > 0$, then this means that each box has to have at least $\alpha$ balls. (In particular, we need $m \ge \alpha n$.) There is an obvious correspondence between such allocations in $\mathcal{B}_{m,n}$ and allocations in $\mathcal{B}_{m-\alpha n, n}$ obtained by removing $\alpha$ balls from each box. Formally, if $y = (y_1, \ldots, y_n) \in \mathcal{B}_{m,n}$, let $\widetilde{y} = (\widetilde{y}_1, \ldots, \widetilde{y}_n)$ with $\widetilde{y}_i := y_i - \alpha$, and note that if we shift the weight sequence to $\widetilde{w}_k := w_{k+\alpha}$, then $\widetilde{w}(\widetilde{y}) = w(y)$; thus $B_{m,n}$ has the same distribution as $B_{m-\alpha n, n}$ for $\widetilde{w}$, with $\alpha$ extra balls added in each box. It follows easily that the results above hold also in the case $w_0 = 0$. (We interpret $w_k \tau^k / \Phi(\tau)$ for $\tau = 0$ as the appropriate limit value. Note also that it is essential to use (3.2) and not (3.3) when $w_0 = 0$.)
Remark 11.9. Similarly, we can always reduce to the case $\operatorname{span}(w) = 1$: If $\operatorname{span}(w) = d$, then the number of balls in each box has to be a multiple of $d$, so we may instead consider an allocation of $m/d$ “superballs”, each consisting of $d$ balls. This means replacing each $Y_i$ by $Y_i/d$ and using the weight sequence $(w_{dk})$. We prefer, however, to allow a general span in our theorems, for ease of our applications to simply generated trees, where the corresponding reduction is more complicated. (For trees, we may replace each branch by a $d$-fold branch. In the probability weight sequence case with Galton–Watson trees, this replaces the random variable $\xi$ by $(\xi_1 + \cdots + \xi_d)/d$, with $\xi_i \overset{d}{=} \xi$ i.i.d., but the root gets a different offspring distribution $\xi/d$; more generally, for a general weight sequence $w$, we replace $\Phi(t)$ by $\Phi(t^{1/d})^d$, except at the root where we use different weights with the generating function $\Phi(t^{1/d})$. We will not use this and leave the details to the reader.)
Remark 11.10. We have assumed $m/n \to \lambda < \omega$ in Theorems 11.4 and 11.7, and similarly $m/n \le C < \omega$ in Theorem 11.6; hence, for $n$ large at least, $m/n < \omega$. In fact, $m/n \le \omega$ is trivially necessary, see Lemma 13.3. When $\omega < \infty$, the only remaining case (assuming $m/n$ converges) is thus $m/n \to \omega$ with $m/n \le \omega$; in this case, it is easy to see that (11.15) and (11.18) hold with $\pi_\omega = 1$ and $\pi_k = 0$, $k \ne \omega$. (This can be seen as a limiting case of (11.13) with $\tau = \infty$.) In fact, if $\omega < \infty$, so the boxes have a finite maximum capacity $\omega$, then the complementation $y_i \mapsto \omega - y_i$ yields a bijection of $\mathcal{B}_{m,n}$ onto $\mathcal{B}_{\omega n - m, n}$, which preserves weights if $(w_k)$ simultaneously is reflected to $\widetilde{w} := (w_{\omega - k})$. Hence, $B_{m,n}$ corresponds to $B_{\omega n - m, n}$ (for $\widetilde{w}$), and results for $m/n \to \omega < \infty$ follow from results for $m/n \to 0$.

As said above, we do not consider the case $\omega = \infty$ and $m/n \to \infty$, when the average occupancy tends to infinity.
12. Examples of balls-in-boxes
Apart from the connection with simply generated trees, see Section 15, the
balls-in-boxes model is interesting in its own right.
We begin with three classic examples of balls-in-boxes, see e.g. Feller [38, II.5]
and Kolchin [76], followed by further examples from probability theory, combinatorics and statistical physics, including several examples of random forests.
(We return to these examples of random forests in Section 19.7, where we study
the size of the largest tree in them.)
Example 12.1 (Maxwell–Boltzmann statistics; multinomial distribution). Consider a uniform random allocation of $m$ labelled balls in $n$ boxes. This is the same as throwing $m$ balls into $n$ boxes at random, independently and with each ball uniformly distributed. (In statistical mechanics, this is known as Maxwell–Boltzmann statistics.) It is elementary that the resulting random allocation $(Y_1, \ldots, Y_n)$ has a multinomial distribution
\[
  P\bigl((Y_1, \ldots, Y_n) = (y_1, \ldots, y_n)\bigr) = n^{-m} \binom{m}{y_1, \ldots, y_n} = m!\, n^{-m} \prod_{i=1}^n \frac{1}{y_i!}. \tag{12.1}
\]
If we take $w_k = 1/k!$, we see that the probabilities in (12.1) and (11.3) are proportional, and thus must be identical, so the weight sequence $(1/k!)$ yields the uniform random allocation of labelled balls. We see also that then
\[
  Z(m,n) = n^m / m!. \tag{12.2}
\]
Alternatively, we may take a Poisson distribution $\mathrm{Po}(a)$: $w_k = a^k e^{-a}/k!$; this is an equivalent weight sequence for any $a > 0$. We see directly that then $S_n \sim \mathrm{Po}(na)$, so (11.5) yields
\[
  Z(m,n) = (na)^m e^{-na} / m!; \tag{12.3}
\]
hence we see again that (11.3) and (12.1) agree.
Comparing with Example 10.2, and using Lemma 17.1 below, we see that the multiset of degrees in a random unordered labelled tree of size $n$ has exactly the distribution obtained when throwing $n-1$ balls into $n$ boxes at random.

With $w_k = 1/k!$ we have, as in Example 10.2, (10.6)–(10.7) and $\rho = \omega = \nu = \infty$. Hence, if $m/n \to \lambda$, we have $\tau = \lambda$ and thus $\pi_k = \lambda^k e^{-\lambda}/k!$, so $(\pi_k)$ is the $\mathrm{Po}(\lambda)$ distribution, which thus is the canonical choice of weights. (In the asymptotic case; for given $m$ and $n$ one might choose $\mathrm{Po}(m/n)$, cf. (11.17).) Theorem 11.7 (or (11.15)) shows that if $m/n \to \lambda < \infty$, then the asymptotic distribution of the number of balls in a given urn is $\mathrm{Po}(\lambda)$.

The idea to study the multinomial distribution as a vector of i.i.d. Poisson variables conditioned on the sum is an old one that has been used repeatedly, see e.g. Kolchin, Sevast’yanov and Chistyakov [77], Holst [52, 53], Kolchin [76], Janson [55].
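The Poisson limit can also be seen directly in simulation, since Maxwell–Boltzmann allocations require no conditioning at all; a minimal sketch (mine):

\begin{verbatim}
import random
from collections import Counter
from math import exp, factorial

n, lam = 10000, 2.0
m = int(lam * n)
boxes = [0] * n
for _ in range(m):                  # throw m labelled balls uniformly
    boxes[random.randrange(n)] += 1
N = Counter(boxes)                  # N[k] = number of boxes with k balls
for k in range(6):
    print(k, N[k] / n, lam ** k * exp(-lam) / factorial(k))  # vs Po(lam)
\end{verbatim}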
Example 12.2 (Bose–Einstein statistics). The weight sequence $w_k = 1$ yields a uniform distribution over all allocations of $m$ identical and indistinguishable balls in $n$ boxes; thus each allocation $(Y_1, \ldots, Y_n) \in \mathcal{B}_{m,n}$ has the same probability $1/|\mathcal{B}_{m,n}| = 1/\binom{n+m-1}{m}$.
This is known as Bose–Einstein statistics in statistical quantum mechanics;
it is the distribution followed by bosons. (In the simple case with no forces acting
on them.)
Comparing with Example 10.1, and using Lemma 17.1 below, we see that the
multiset of degrees in a random ordered tree of size n has exactly the distribution
obtained by a uniform random allocation of n − 1 balls into n boxes.
As in Example 10.1 we have (10.1)–(10.2) and $\rho = 1$, $\nu = \infty$. If $m/n \to \lambda < \infty$, then the equation $\Psi(\tau) = \lambda$ is, by (10.2), $\tau/(1-\tau) = \lambda$, and thus
\[
  \tau = \frac{\lambda}{1+\lambda}. \tag{12.4}
\]
Any geometric distribution $\mathrm{Ge}(p)$ with $0 < p < 1$ is a weight sequence equivalent to $(w_k)$, and (12.4) shows that the canonical choice (7.1) is, using (10.1),
\[
  \pi_k = (1-\tau)\tau^k = \frac{\lambda^k}{(\lambda+1)^{k+1}}, \tag{12.5}
\]
which is the distribution $\mathrm{Ge}(1-\tau) = \mathrm{Ge}(1/(\lambda+1))$. By Theorem 11.7, this is also the asymptotic distribution of the number of balls in a given urn.

See also Holst [52, 53] and Kolchin [76].
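Bose–Einstein allocations can be sampled exactly, with no rejection, via the classical stars-and-bars bijection: an allocation of $m$ balls in $n$ boxes corresponds to a choice of $n-1$ walls among $m+n-1$ slots. A sketch (mine; with $\lambda = 2$ the limit (12.5) is $\mathrm{Ge}(1/3)$):

\begin{verbatim}
import random
from collections import Counter

def bose_einstein(m, n):
    # uniform allocation: choose n - 1 wall positions among m + n - 1 slots
    walls = sorted(random.sample(range(m + n - 1), n - 1))
    b = [-1] + walls + [m + n - 1]
    return [b[i + 1] - b[i] - 1 for i in range(n)]

m, n = 2000, 1000                       # lambda = 2
counts = Counter(bose_einstein(m, n))
for k in range(5):
    print(k, counts[k] / n, (1 / 3) * (2 / 3) ** k)
\end{verbatim}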
Example 12.3 (Fermi–Dirac statistics). The other type of particles in statistical quantum mechanics is fermions; they exclude each other (the Pauli exclusion principle), so all allocations of them have to satisfy $Y_i \le 1$, i.e., $Y_i \in \{0, 1\}$. A random allocation uniform among all such possibilities is known as Fermi–Dirac statistics; this is thus equivalent to a uniform random choice of one of the $\binom{n}{m}$ subsets of $m$ boxes.

We obtain this distribution by the choice $w_0 = w_1 = 1$ and $w_k = 0$ for $k \ge 2$; thus
\[
  \Phi(t) = 1 + t \tag{12.6}
\]
and
\[
  \Psi(t) = \frac{t}{1+t}. \tag{12.7}
\]
We have $\rho = \infty$ and $\nu = \omega = 1$. (Formally, (12.6) is the case $d = 1$ of (10.21), but note that we assume $d \ge 2$ in Example 10.6.)
If $m/n \to \lambda < 1$, we thus have a rather trivial example of the general theory with $\tau/(1+\tau) = \lambda$ and thus
\[
  \tau = \frac{\lambda}{1-\lambda}, \tag{12.8}
\]
and $(\pi_k) = (1-\lambda, \lambda, 0, 0, \ldots)$, i.e., the Bernoulli distribution $\mathrm{Be}(\lambda)$. (Any Bernoulli distribution $\mathrm{Be}(p)$ with $0 < p < 1$ is equivalent.)

Since $\omega = 1$, the corresponding conditioned Galton–Watson tree is trivially the deterministic path $P_n$, a case which we have excluded above.
Example 12.4 (Pólya urn [53]). Consider a multicolour Pólya urn containing balls of $n$ different colours, see Eggenberger and Pólya [37]. Initially, the urn contains $a > 0$ balls of each colour. Balls are drawn at random, one at a time. After each drawing, the drawn ball is replaced together with $b > 0$ additional balls of the same colour. (It is natural to take $a$ and $b$ to be integers, but the model is easily interpreted also for arbitrary real $a, b > 0$, see e.g. [58].)

Make $m$ draws, and let $Y_i$ be the number of times that a ball of colour $i$ is drawn; then $(Y_1, \ldots, Y_n)$ is a random allocation in $\mathcal{B}_{m,n}$. A straightforward calculation, see [37, 66, 53], shows that
\[
  P\bigl((Y_1, \ldots, Y_n) = (y_1, \ldots, y_n)\bigr) = \binom{m}{y_1, \ldots, y_n} \frac{\prod_{i=1}^n a(a+b)\cdots(a+(y_i-1)b)}{na(na+b)\cdots(na+(m-1)b)} = \frac{\prod_{i=1}^n \binom{a/b+y_i-1}{y_i}}{\binom{na/b+m-1}{m}}. \tag{12.9}
\]
Hence, as noted by Holst [53], this equals the random allocation given by the weights
\[
  w_k = \binom{a/b+k-1}{k} = (-1)^k \binom{-a/b}{k}, \qquad k = 0, 1, \ldots. \tag{12.10}
\]
Note that the case $a = b$ yields $w_k = 1$ and the uniform random allocation in Example 12.2 (Bose–Einstein statistics). We have
\[
  \Phi(t) = \sum_{k=0}^{\infty} \binom{a/b+k-1}{k} t^k = (1-t)^{-a/b}, \tag{12.11}
\]
with radius of convergence $\rho = 1$, and thus
\[
  \Psi(t) = \frac{a}{b} \cdot \frac{t}{1-t}. \tag{12.12}
\]
Hence, $\nu = \Psi(1) = \infty$, and for any $\lambda \in [0, \infty)$,
\[
  \tau = \frac{b\lambda}{a+b\lambda}. \tag{12.13}
\]
The equivalent probability weight sequences are, by Lemma 4.1, given by
\[
  \frac{t^k w_k}{\Phi(t)} = \binom{a/b+k-1}{k} t^k (1-t)^{a/b}, \qquad 0 < t < 1, \tag{12.14}
\]
which is the negative binomial distribution $\mathrm{NBin}(a/b, 1-t)$ (where the parameter $a/b$ is not necessarily an integer). The canonical choice, which by Theorems 11.4 and 11.7 is the asymptotic distribution of the number of balls of a given colour, is $\mathrm{NBin}(a/b, 1-\tau) = \mathrm{NBin}(a/b, a/(a+b\lambda))$. See also Holst [53] and Kolchin [76].

Note that the case $b = 0$ (excluded above) means drawing with replacement; this is Example 12.1, which thus can be seen as a limit case. (This corresponds to the Poisson limit $\mathrm{NBin}(a/b, a/(a+b\lambda)) \overset{d}{\longrightarrow} \mathrm{Po}(\lambda)$ as $b \to 0$.)
Example 12.5 (drawing without replacement). Consider again an urn with balls of $n$ colours, with initially $a$ balls of each colour. (This time, $a \ge 1$ is an integer.) Draw $m$ balls without replacement, and let as above $Y_i$ be the number of drawn balls of colour $i$. (The case $a = 1$ yields the Fermi–Dirac statistics in Example 12.3.)

Formally, this is the case $b = -1$ of Example 12.4, and a similar calculation shows that
\[
  P\bigl((Y_1, \ldots, Y_n) = (y_1, \ldots, y_n)\bigr) = \frac{\prod_{i=1}^n \binom{a}{y_i}}{\binom{na}{m}}; \tag{12.15}
\]
hence this is the random allocation given by the weights
\[
  w_k = \binom{a}{k}, \qquad k = 0, 1, \ldots. \tag{12.16}
\]
We have thus $\Phi(t) = (1+t)^a$, exactly as in Example 10.6, with $d = a$.

The equivalent probability weight sequences are the binomial distributions $\mathrm{Bi}(a, p)$, $0 < p < 1$, and the canonical choice is, for $0 < \lambda < a$, $(\pi_k) = \mathrm{Bi}(a, \lambda/a)$, i.e.
\[
  \pi_k = \binom{a}{k} \Bigl(\frac{\lambda}{a}\Bigr)^k \Bigl(\frac{a-\lambda}{a}\Bigr)^{a-k} = \binom{a}{k} \frac{\lambda^k (a-\lambda)^{a-k}}{a^a}. \tag{12.17}
\]
See also Holst [53] and Kolchin [76].

Note that taking the limit as $a \to \infty$, we obtain drawing with replacement, which is Example 12.1; this corresponds to the Poisson limit $\mathrm{Bi}(a, \lambda/a) \overset{d}{\longrightarrow} \mathrm{Po}(\lambda)$ as $a \to \infty$.
Example 12.6 (random rooted forests [76]). Consider labelled rooted forests consisting of $n$ unordered rooted trees with together $m$ nodes, all of which are labelled. (Thus $m \ge n$.) We may assume that the $n$ roots are labelled $1, \ldots, n$; let $T_i$ be the tree with root $i$ and let $t_i := |T_i|$. Then the node sets $V(T_i)$ form a partition of $\{1, \ldots, m\}$, so $\sum_{i=1}^n t_i = m$ and $(t_1, \ldots, t_n)$ is an allocation in $\mathcal{B}_{m,n}$, with each $t_i \ge 1$. Furthermore, given $(t_1, \ldots, t_n) \in \mathcal{B}_{m,n}$ with all $t_i \ge 1$, the node sets $V(T_i)$ can be chosen in $\binom{m-n}{t_1-1, \ldots, t_n-1}$ ways, and given $V(T_i)$, the tree $T_i$ can by Cayley’s formula be chosen in $t_i^{t_i-2}$ ways. (The trees are rooted, but the roots are given.) Hence, the number of forests with the allocation $(t_1, \ldots, t_n)$ is
\[
  \binom{m-n}{t_1-1, \ldots, t_n-1} \prod_{i=1}^n t_i^{t_i-2} = (m-n)! \prod_{i=1}^n \frac{t_i^{t_i-2}}{(t_i-1)!} = (m-n)! \prod_{i=1}^n \frac{t_i^{t_i-1}}{t_i!}. \tag{12.18}
\]
Hence, a uniformly random labelled rooted forest corresponds to a random allocation $B_{m,n}$ with the weight sequence
\[
  w_k = \frac{k^{k-1}}{k!}, \quad k \ge 1, \qquad \text{and} \qquad w_0 = 0. \tag{12.19}
\]
Note that here $w_0 = 0$, unlike almost everywhere else in the present paper; in the notation of Remark 11.8, we have $\alpha = 1$. (As discussed in Remark 11.8, we can reduce to the case $w_0 > 0$ by considering $(t_1 - 1, \ldots, t_n - 1)$, which is an allocation in $\mathcal{B}_{m-n,n}$; this means that we count only non-root nodes. We prefer, however, to keep the setting above with $w_0 = 0$, noting that the results above still hold by Remark 11.8.)
If $F^r_{m,n}$ denotes the number of labelled rooted forests with $m$ labelled nodes of which $n$ are given as roots, then (12.18) implies
\[
  F^r_{m,n} = (m-n)!\, Z(m,n). \tag{12.20}
\]
It is well known that $F^r_{m,n} = n m^{m-n-1}$, a formula also given by Cayley [22], see e.g. [103, Proposition 5.3.2] or [99]; thus
\[
  Z(m,n) = \frac{n m^{m-n-1}}{(m-n)!}. \tag{12.21}
\]
We have
\[
  \Phi(t) := \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} t^k = T(t), \tag{12.22}
\]
the well-known tree function (known by this name since it is the exponential generating function for rooted unordered labelled trees, cf. Example 10.2). Note that $T(z)$ satisfies the functional equation
\[
  T(z) = z e^{T(z)}; \tag{12.23}
\]
see e.g. [40, Section II.5]. Equivalently,
\[
  z = T(z) e^{-T(z)}, \tag{12.24}
\]
which by differentiation leads to
\[
  T'(z) = \frac{T(z)}{z(1 - T(z))}. \tag{12.25}
\]
Hence,
\[
  \Psi(t) := \frac{t\Phi'(t)}{\Phi(t)} = \frac{1}{1 - T(t)}. \tag{12.26}
\]
By (12.22) and Stirling’s formula, $\Phi(t)$ has radius of convergence $\rho = e^{-1}$. Furthermore, (12.24) implies that $\Phi(\rho) = T(e^{-1}) = 1$. Hence, (12.26) yields $\nu = \Psi(\rho) = \infty$, and if $1 \le \lambda < \infty$, then $\lambda = \Psi(\tau)$ is solved by
\[
  T(\tau) = 1 - \frac{1}{\lambda} = \frac{\lambda-1}{\lambda} \tag{12.27}
\]
and thus, using (12.24),
\[
  \tau = \frac{\lambda-1}{\lambda}\, e^{-(\lambda-1)/\lambda}. \tag{12.28}
\]
The probability weight sequences equivalent to $(w_k)$ are by Lemma 4.1 given by, substituting $x = T(t)$, and thus $t = x e^{-x}$ by (12.24),
\[
  p_k = \frac{t^k}{T(t)} w_k = \frac{k^{k-1} t^k}{T(t)\, k!} = \frac{(kx)^{k-1} e^{-kx}}{k!}, \qquad k \ge 1, \tag{12.29}
\]
where $0 \le t \le e^{-1}$ and thus $0 \le x \le 1$. This is known as a Borel distribution; it appears for example as the distribution of the size $|\mathcal{T}|$ of the Galton–Watson tree with offspring distribution $\mathrm{Po}(x)$. (This was first proved by Borel [18]. It follows by Theorem 15.5 below, with the probability weight sequence $\mathrm{Po}(x)$; see also Otter [93], Tanner [107], Dwass [36], Takács [106], Pitman [99].) It follows that the random rooted forest considered here has the same distribution as the forest defined by a Galton–Watson process starting with $n$ individuals (the roots) and $\mathrm{Po}(x)$ offspring distribution, conditioned to have total size $m$; cf. Example 12.8 below. See further Kolchin [76] and Pavlov [96].
In particular, the choice $x = 1$ ($t = e^{-1}$) in (12.29) yields the equivalent probability weight sequence
\[
  \widetilde{w}_k = e^{-k} w_k = \frac{k^{k-1} e^{-k}}{k!}, \qquad k \ge 1, \tag{12.30}
\]
which by Stirling’s formula satisfies the asymptotic power-law
\[
  \widetilde{w}_k \sim \frac{1}{\sqrt{2\pi}}\, k^{-3/2}, \qquad \text{as } k \to \infty. \tag{12.31}
\]
Moreover, the canonical distribution for a given $\lambda \ge 1$ is, using (12.28),
\[
  \pi_k = \frac{k^{k-1} \tau^k}{T(\tau)\, k!} = \frac{k^{k-1}}{k!} \Bigl(\frac{\lambda-1}{\lambda}\Bigr)^{k-1} e^{-k(\lambda-1)/\lambda}. \tag{12.32}
\]
By Theorems 11.4 and 11.7, and Remark 11.8, this is the asymptotic distribution of the size of a given (or random) tree in the forest, say $T_1$. The asymptotic distribution of $|T_1|$ is thus the distribution of the size $|\mathcal{T}|$ of a Galton–Watson tree with offspring distribution $\mathrm{Po}(1 - 1/\lambda)$. Moreover, $T_1$ is, given its size $|T_1|$, uniformly distributed over all trees on $|T_1|$ nodes, and the same is true for the Poisson Galton–Watson tree $\mathcal{T}$ by Example 10.2. Consequently, $T_1 \overset{d}{\longrightarrow} \mathcal{T}$ as $n \to \infty$ with $m/n \to \lambda$. (We may regard $T_1$ as an ordered tree, ordering the children of a node e.g. by their labels.)

The same random allocation $B_{m,n}$ also describes the block lengths in hashing with linear probing; see Janson [56]. Indeed, there is a one-to-one correspondence between hash tables and rooted forests, see e.g. Knuth [75, Exercise 6.4-31] and Chassaing and Louchard [24].
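The limit law (12.32) is again easy to probe by simulation: the Borel distribution with parameter $x = 1 - 1/\lambda$ is the law of the total progeny of a $\mathrm{Po}(x)$ Galton–Watson tree, which can be generated directly. A sketch (mine, taking $\lambda = 2$, i.e. $x = 1/2$; the exploration is truncated once the size exceeds the largest tabulated value):

\begin{verbatim}
import random
from collections import Counter
from math import exp, factorial

def poisson(mu):
    L, k, p = exp(-mu), 0, 1.0          # product-of-uniforms method
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def gw_size(mu, cap):
    alive, size = 1, 0
    while alive > 0 and size < cap:
        alive += poisson(mu) - 1
        size += 1
    return size

x, reps, K = 0.5, 100000, 6
hist = Counter(min(gw_size(x, K + 1), K + 1) for _ in range(reps))
for k in range(1, K + 1):
    print(k, hist[k] / reps,
          (k * x) ** (k - 1) * exp(-k * x) / factorial(k))
\end{verbatim}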
Example 12.7 (random unrooted forests). Consider labelled unrooted forests consisting of $n$ trees with together $m$ nodes, all of which are labelled. (Thus $m \ge n$.) We may assume that the $n$ trees are labelled $T_1, \ldots, T_n$; let $t_i := |T_i|$. As in Example 12.6, the node sets $V(T_i)$ form a partition of $\{1, \ldots, m\}$, so $\sum_{i=1}^n t_i = m$ and $(t_1, \ldots, t_n)$ is an allocation in $\mathcal{B}_{m,n}$, with each $t_i \ge 1$. In the unrooted case, given $(t_1, \ldots, t_n) \in \mathcal{B}_{m,n}$ with all $t_i \ge 1$, the node sets $V(T_i)$ can be chosen in $\binom{m}{t_1, \ldots, t_n}$ ways, and given $V(T_i)$, the tree $T_i$ can by Cayley’s formula be chosen in $t_i^{t_i-2}$ ways. Hence, the number of unrooted forests with the allocation $(t_1, \ldots, t_n)$ is
\[
  \binom{m}{t_1, \ldots, t_n} \prod_{i=1}^n t_i^{t_i-2} = m! \prod_{i=1}^n \frac{t_i^{t_i-2}}{t_i!}. \tag{12.33}
\]
Hence, a uniformly random labelled unrooted forest corresponds to a random allocation $B_{m,n}$ with the weight sequence
\[
  w_k = \frac{k^{k-2}}{k!}, \quad k \ge 1, \qquad \text{and} \qquad w_0 = 0. \tag{12.34}
\]
As in Example 12.6, we have w0 = 0, but this is no problem by Remark 11.8.
If $F^u_{m,n}$ denotes the number of labelled unrooted forests with $m$ labelled nodes and $n$ labelled trees, then (12.33) implies
\[
  F^u_{m,n} = m!\, Z(m,n). \tag{12.35}
\]
There is no simple general formula for $F^u_{m,n}$, as there is for the rooted forests in Example 12.6, and hence no simple formula for $Z(m,n)$. Asymptotics are given by Britikov [20]. (See Example 18.16 for one case. The asymptotic formula when $m/n \to \lambda > 2$ follows similarly from Theorem 19.34(ii), and when $m/n \to \lambda < 2$ with $m = \lambda n + o(\sqrt{n})$ from Theorem 18.12.)
We have
\[
  \Phi(t) := \sum_{k=1}^{\infty} \frac{k^{k-2}}{k!} t^k = T(t) - \tfrac{1}{2} T(t)^2, \tag{12.36}
\]
where $T(t)$ is the tree function in (12.22). (The latter equality is well known, see e.g. [40, II.5.3]; it can be shown e.g. by showing that both sides have the same derivative $T(t)/t$; there are also combinatorial proofs.) Hence, using (12.25),
\[
  \Psi(t) := \frac{t\Phi'(t)}{\Phi(t)} = \frac{T(t)}{\Phi(t)} = \frac{1}{1 - T(t)/2}; \tag{12.37}
\]
cf. the similar (12.26) in the rooted case.
cf. the similar (12.26) in the rooted case.
As for (12.22), Φ has the radius of convergence ρ = e−1 , but now, by (12.37),
ν = Ψ(ρ) = 2 is finite, so there is a phase transition at λ = 2. The parameter
τ is by the definition in Theorem 7.1 and (12.37) given by T (τ ) = 2 − 2/λ =
2(λ − 1)/λ for λ 6 2; thus, using (12.24),
(
−2(λ−1)/λ
, λ 6 2.
2 λ−1
λ e
(12.38)
τ=
−1
e ,
λ > 2.
The probability weight sequences equivalent to $(w_k)$ are by Lemma 4.1 given by, again substituting $t = x e^{-x}$ or $x = T(t)$,
\[
  p_k = \frac{k^{k-2} t^k}{T(t)(1 - T(t)/2)\, k!} = \frac{x (kx)^{k-2} e^{-kx}}{(1 - x/2)\, k!}, \qquad k \ge 1, \tag{12.39}
\]
where $0 \le t \le e^{-1}$ and thus $0 \le x \le 1$. In particular, taking $x = 1$ ($t = e^{-1}$), we obtain the equivalent probability weight sequence
\[
  \widetilde{w}_k = 2 w_k e^{-k} = \frac{2 k^{k-2} e^{-k}}{k!}, \qquad k \ge 1, \tag{12.40}
\]
which by Stirling’s formula satisfies the asymptotic power-law
\[
  \widetilde{w}_k \sim \frac{2}{\sqrt{2\pi}}\, k^{-5/2}, \qquad \text{as } k \to \infty. \tag{12.41}
\]
Moreover, the canonical distribution for a given $\lambda \ge 1$ is, by (12.38) and (12.39), for $k \ge 1$,
\[
  \pi_k = \frac{k^{k-2} \tau^k}{T(\tau)(1 - T(\tau)/2)\, k!} = \begin{cases} \dfrac{k^{k-2}}{k!}\, \lambda \Bigl(2\dfrac{\lambda-1}{\lambda}\Bigr)^{k-1} e^{-2k(\lambda-1)/\lambda}, & \lambda \le 2, \\[1.5ex] \dfrac{2 k^{k-2} e^{-k}}{k!}, & \lambda > 2. \end{cases} \tag{12.42}
\]
By Theorems 11.4 and 11.7, and Remark 11.8, this is the asymptotic distribution of the size of a given (or random) tree in the forest, say $T_1$.
We shall see in Theorem 19.49 that the phase transition at λ = 2 is seen
clearly in the size of the largest tree in the forest: if m/n → λ < 2, then the
largest tree is of size Op (log n), while if m/n → λ > 2, then there is a unique
giant tree of size (λ − 2)n + op (n); for details see Theorems 19.34 and 19.49,
and, more generally, Luczak and Pittel [83]. This is thus an example of the
condensation discussed after Theorem 11.4 (and similar to the condensation in
Theorem 7.1 when ν < 1).
Example 12.8 (simply generated forests and Galton–Watson forests). A simply generated forest is a sequence $(T_1, \ldots, T_n)$ of rooted trees, with weight
\[
  w(T_1, \ldots, T_n) := \prod_{i=1}^n w(T_i), \tag{12.43}
\]
where $w(T_i)$ is given by (2.3), for some fixed weight sequence $w$. A simply generated random forest with $n$ trees and $m$ nodes, where $n$ and $m$ are given with $m \ge n$, is such a forest chosen at random, with probability proportional to its weight. Note that in the special case $n = 1$, this is the same as a simply generated random tree defined in Section 2. More generally, for any $n$, a simply generated random forest $(T_1, \ldots, T_n)$ is, conditioned on the sizes $|T_1|, \ldots, |T_n|$, a sequence of independent simply generated random trees with the given sizes (all defined by the same weight sequence $w$). Moreover, the sizes $(|T_1|, \ldots, |T_n|)$ form an allocation in $\mathcal{B}_{m,n}$, and it is easily seen that this is a random allocation $B_{m,n}$ defined by the weight sequence $(Z_k)_{k=0}^{\infty}$, where $Z_k$ is the partition function (2.5) for simply generated trees with weight sequence $w$ (and $Z_0 = 0$).
A simply generated random forest can thus be obtained by a two-stage process, combining the constructions in Sections 2 and 11. Note that equivalent
weight sequences w yield equivalent weight sequences (Zk ) by (4.3), and thus
the same simply generated random forest.
In the special case when w is a probability weight sequence, we also define
a Galton–Watson forest with n trees, for a given n, as a sequence (T1 , . . . , Tn )
of n i.i.d. Galton–Watson trees; it describes the evolution of a Galton–Watson
process started with n particles. (It can also be seen as a single Galton–Watson
tree T with the root chopped off, conditioned on the root degree being n, provided that this root degree is possible.) Note that the probability distribution of
the forest is given by the weights in (12.43). Hence, in the probability weight sequence case, the simply generated random forest equals the conditioned Galton–
Watson forest with n trees and m nodes, defined as a Galton–Watson forest
with n trees conditioned on the total size being m; in other words, it describes
a Galton–Watson process started with n particles conditioned on the total size
being m.
Random forests of this type are studied by Pavlov [96], see also Flajolet and
Sedgewick [40, Example III.21].
For example, taking $w_k = 1/k!$, we have by (10.11) $Z_k = k^{k-1}/k!$, $k \ge 1$; this is the weight sequence used in Example 12.6, so we obtain the same random allocation of tree sizes as there; moreover, given the tree sizes, the trees are uniformly random labelled unordered rooted trees by Example 10.2. Consequently, for this weight sequence, the simply generated random forest is the random labelled forest with unordered rooted trees in Example 12.6. The same random forest is obtained by the equivalent probability weight sequence $w_k = x^k e^{-x}/k!$, with $0 < x \le 1$, so it equals also the conditioned Galton–Watson forest with offspring distribution $\mathrm{Po}(x)$, cf. Example 12.6.

Another example is obtained by taking $w_k = 1$ for all $k \ge 0$. Then every forest has weight 1, so this simply generated random forest is a uniformly random forest of ordered rooted trees. (An ordered rooted forest.) By Example 10.1, the weight sequence $(Z_k)$ is then given by the Catalan numbers in (2.1): $Z_k = C_{k-1} = (2k-2)!/(k!\,(k-1)!)$, $k \ge 1$.
Further examples are given by starting with the other examples of random
trees in Section 10.
We shall see in Theorem 18.11 that if the weight sequence $w$ is as in Theorem 7.1, and further $\operatorname{span}(w) = 1$, $\nu \ge 1$ and $\sigma^2 < \infty$, then
\[
  Z_k \sim \frac{\tau}{\sqrt{2\pi\sigma^2}} \Bigl(\frac{\Phi(\tau)}{\tau}\Bigr)^k k^{-3/2}. \tag{12.44}
\]
Recalling $Z(\tau/\Phi(\tau)) = \tau$ by (7.6), we may replace $Z_k$ by the equivalent probability weight sequence
\[
  \widetilde{Z}_k := \frac{Z_k}{Z(\tau/\Phi(\tau))} \Bigl(\frac{\tau}{\Phi(\tau)}\Bigr)^k = \frac{Z_k}{\tau} \Bigl(\frac{\tau}{\Phi(\tau)}\Bigr)^k \sim \frac{1}{\sqrt{2\pi\sigma^2}}\, k^{-3/2}, \tag{12.45}
\]
so we have the asymptotic behaviour $\widetilde{Z}_k \sim c k^{-3/2}$ for every such weight sequence $w$, where only the constant $c = 1/\sqrt{2\pi\sigma^2}$ depends on $w$. This explains why random forests of this type have similar asymptotic behaviour, in contrast to the unrooted forests in Example 12.7, which are given by random allocations defined by a weight sequence $\sim c k^{-5/2}$, see (12.41); see further Example 12.10.
Example 12.9. Let, as in Example 10.7, $w_k = (k+1)^{-\beta}$ for some real constant $\beta$. Then $\rho = 1$. As shown in Example 10.7, $\nu = \infty$ if $\beta \le 2$, and $\nu < \infty$ if $\beta > 2$; in the latter case, $\nu$ is given by (10.25). This example is studied further in e.g. Bialas, Burda and Johnston [14].
Example 12.10 (power-law). More generally, suppose that wk ∼ c k^{−β} as k → ∞, for some real constant β and c > 0, i.e., that wk asymptotically satisfies a power-law. Qualitatively, we have the same behaviour as in Examples 10.7 and 12.9, but numerical values such as the critical β in (10.26) will in general be different.
We repeat some easy facts: first, ρ = 1, ω = ∞ and span(w) = 1.
If −∞ < β ≤ 1, then Φ(ρ) = Φ(1) = ∞; hence ν = ∞ by Lemma 3.1(iv).
If 1 < β ≤ 2, then Φ(ρ) < ∞ but Φ′(ρ) = Σ_{k=0}^∞ k wk = ∞; hence again ν = Ψ(ρ) = ∞ by (3.11).
On the other hand, if β > 2, then Φ(1) < ∞ and Φ′(1) < ∞, and thus ν < ∞ by (3.11). Summarising:
  ν < ∞ ⟺ β > 2.  (12.46)
In the case β > 2, there is thus a phase transition when we vary λ.
Suppose β > 2, so ν < ∞. If λ ≥ ν, then τ = ρ = 1, and the canonical distribution (πk) is by (11.13) given simply by πk = wk/Φ(1). This distribution then has mean μ = ν < ∞ by (11.14); since πk ≍ k^{−β} as k → ∞, the variance σ² = ∞ if 2 < β ≤ 3, while σ² < ∞ when β > 3.
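The dichotomy (12.46) is easy to see numerically; the following hedged Python sketch (an illustration only) approximates ν = Ψ(1) by partial sums for the weights wk = (k + 1)^{−β} of Example 12.9: the two cutoffs agree for β > 2 but drift apart for β ≤ 2.

    # nu = Psi(1) = sum_k k*w_k / sum_k w_k for w_k = (k+1)^(-beta); finite iff beta > 2.
    def nu_partial(beta, K):
        num = sum(k * (k + 1) ** -beta for k in range(K))
        den = sum((k + 1) ** -beta for k in range(K))
        return num / den

    for beta in [1.5, 2.0, 2.5, 3.5]:
        print(beta, nu_partial(beta, 10**4), nu_partial(beta, 10**6))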
Note that Examples 12.6 and 12.7 with random forests are of this type, provided we replace wk by the equivalent w̃k := e^{−k} wk; Stirling's formula shows that w̃k ∼ c k^{−β} where β = 3/2 for rooted forests and β = 5/2 for unrooted forests, see (12.31) and (12.41) (with a different choice of constant factor in the latter). The different values of β explain the different asymptotic behaviours
of these two types of random forests: by the results above, the tail behaviour
of wk implies that ν = ∞ for rooted forests but ν < ∞ for unrooted forests, as
we have shown by explicit calculations in Examples 12.6 and 12.7. Recall that
this means that there is a phase transition and condensation for high m/n in
the unrooted case but not in the rooted case.
More generally, (12.45) shows that simply generated random forests under weak assumptions have the same power-law behaviour of the weight sequence with β = 3/2 as the special case of (unordered) rooted forests in Example 12.6. Thus ν = ∞ and there is no phase transition. (At least not in the range m = O(n) that we consider; Pavlov [96] shows a phase transition at m = Θ(n²).)
Example 12.11 (unlabelled forests). Consider, as Pavlov [97], rooted forests consisting of n rooted unlabelled unordered trees, assuming that the trees, or equivalently the roots, are labelled 1, …, n, but otherwise the nodes are unlabelled. A uniformly random forest of this type with m nodes can be seen as balls-in-boxes with the weight sequence (tk), where tk is the number of unlabelled unordered rooted trees with k nodes. In this case there is no simple formula for the generating function Φ(z), but there is a functional equation, from which it can be shown that tk ∼ c1 k^{−3/2} ρ^{−k}, where ρ ≈ 0.3382 as usual is the radius of convergence of Φ(z) and c1 ≈ 0.4399, see Otter [92] or, e.g., Drmota [33, Section 3.1.5]. Furthermore, Φ(ρ) = 1; thus (tk ρ^k) gives an equivalent probability weight sequence with tk ρ^k ∼ c1 k^{−3/2} as k → ∞. The asymptotic behaviour of the weight sequence is thus the same as for labelled rooted forests in Example 12.6, and more generally for Galton–Watson forests (under weak conditions) in Example 12.8, and we expect the same type of asymptotic behaviour in spite of the fact that the unlabelled forest is not simply generated; this is seen in detail in Pavlov [96] for the size of the largest tree. In particular, we have ν = ∞ by Example 12.10 and (12.46), and thus there is no phase transition at finite λ.
Similarly, Bernikovich and Pavlov [12] considered unrooted forests consisting of n unordered trees labelled 1, …, n with a total of m unlabelled nodes. These are described by the weight sequence (ťk), where ťk is the number of unrooted unlabelled unordered trees with k nodes. Again, there is no simple formula for the generating function Φ̌(z) := Σ_k ťk z^k, but there is the relation Φ̌(z) = Φ(z) − ½Φ(z)² + ½Φ(z²) found by Otter [92], which leads to the asymptotic formula ťk ∼ c2 k^{−5/2} ρ^{−k}, where ρ is as above and c2 ≈ 0.5347, see also Drmota [33, Section 3.1.5]. In this case, (ťk ρ^k/Φ̌(ρ)) gives an equivalent probability weight sequence which is ∼ (c2/Φ̌(ρ)) k^{−5/2} as k → ∞, which is the same type of asymptotic behaviour as for the weight sequence for labelled unrooted forests in Example 12.7; we thus expect the same type of asymptotic behaviour as for those forests. In particular, ν < ∞ by Example 12.10; a numerical calculation gives ν := ρΦ̌′(ρ)/Φ̌(ρ) ≈ 2.0513, see Bernikovich and Pavlov [12].
Note that both types of “unlabelled” forests considered here have the n trees
labelled 1, . . . , n (but the individual nodes are not labelled). Completely unlabelled forests cannot be described by balls-in-boxes (as far as we know), since
the number of (non-isomorphic) ways to label the trees depends on the forest.
Example 12.12 (the backgammon model). The model with wk = 1/k! for k ≥ 1 as in Example 12.1, but w0 > 0 arbitrary, was considered by Ritort [100] and Franz and Ritort [41, 42], who called it the backgammon model. We have
  Φ(t) = w0 + Σ_{k=1}^∞ t^k/k! = e^t + w0 − 1  (12.47)
and
  Ψ(t) = t e^t/Φ(t) = t/(1 + (w0 − 1)e^{−t}).  (12.48)
Thus ρ = ν = ∞. The equation Ψ(τ) = λ can be written
  (τ − λ)e^τ = (w0 − 1)λ,  (12.49)
and the solution can be written
  τ = λ + W((w0 − 1)λe^{−λ}) = λ − T((1 − w0)λe^{−λ}),  (12.50)
where W(z) is the Lambert W function [26] defined by W(z)e^{W(z)} = z, and T(z) is the tree function in (12.22) (analytically extended to all real z < e^{−1}); note that W(z) = −T(−z) by (12.24), see [26].
The canonical probability weight sequence (11.13) is, using (12.48) and Ψ(τ) = λ,
  πk = τ^k/(Φ(τ) k!) = λτ^{k−1}e^{−τ}/k! = (λ/τ) · τ^k e^{−τ}/k!,  k ≥ 1,  (12.51)
and π0 = λτ^{−1}e^{−τ}w0.
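As a hedged numerical sketch (the parameter values below are arbitrary and only illustrative), (12.50) can be evaluated with a standard Lambert W routine and checked against (12.48):

    import math
    from scipy.special import lambertw

    # Solve Psi(tau) = lambda via (12.50) and verify with (12.48).
    w0, lam = 2.0, 1.5
    tau = lam + lambertw((w0 - 1) * lam * math.exp(-lam)).real
    psi = tau * math.exp(tau) / (math.exp(tau) + w0 - 1)
    print(tau, psi)  # psi recovers lam = 1.5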
Example 12.13 (random permutations and recursive forests). Consider permutations of {1, …, m} with exactly n cycles. We want to list the cycle lengths in some order; for convenience, we consider all possible orders and define a cycle-labelled permutation to be a permutation with the cycles labelled 1, …, n, in arbitrary order. Given a cycle-labelled permutation with exactly n cycles, let yi be the length of the i:th cycle. Then (y1, …, yn) is an allocation in Bm,n with each yi ≥ 1, and for each such (y1, …, yn) ∈ Bm,n, the number of cycle-labelled permutations with yi elements in cycle i is
  (m!/(y1! · · · yn!)) Π_{i=1}^n (yi − 1)! = m! Π_{i=1}^n (1/yi),  (12.52)
since there are (y − 1)! cycles with y given elements. Consequently, a uniformly random permutation of {1, …, m} with exactly n cycles (listed in random order, say) corresponds to a random allocation Bm,n defined by the weight sequence
  wk = 1/k, k ≥ 1, and w0 = 0.  (12.53)
Note that here, as in Example 12.6, w0 = 0, and Remark 11.8 applies with
α = 1.
The number of permutations with n (unlabelled) cycles is by (12.52)
m! Z(m, n)/n!,
(12.54)
where we divide by n! in order to ignore the labelling above.
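For small m and n, (12.52)–(12.54) can be verified by brute force; the following Python sketch (a hedged illustration, not part of the text) counts permutations of {1, …, m} with exactly n cycles and compares with m! Z(m, n)/n! computed from the weights (12.53).

    import itertools, math
    from fractions import Fraction

    def cycles(perm):
        # number of cycles of a permutation given in one-line notation
        seen, c = set(), 0
        for i in range(len(perm)):
            if i not in seen:
                c += 1
                j = i
                while j not in seen:
                    seen.add(j)
                    j = perm[j]
        return c

    def Z(m, n):
        # partition function for w_k = 1/k (k >= 1), w_0 = 0:
        # sum of prod 1/y_i over (y_1,...,y_n) with y_i >= 1 and sum y_i = m
        if n == 0:
            return Fraction(1 if m == 0 else 0)
        return sum(Fraction(1, y) * Z(m - y, n - 1) for y in range(1, m - n + 2))

    m, n = 6, 3
    count = sum(1 for p in itertools.permutations(range(m)) if cycles(p) == n)
    print(count, math.factorial(m) * Z(m, n) / math.factorial(n))  # both 225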
The same balls-in-boxes model with wk = 1/k, k ≥ 1, also describes random recursive forests, see Pavlov and Loseva [98].
We have
  Φ(t) = Σ_{k=1}^∞ t^k/k = − log(1 − t)  (12.55)
with radius of convergence ρ = 1 and
  Ψ(t) := tΦ′(t)/Φ(t) = t/(−(1 − t) log(1 − t)),  (12.56)
so ν = Ψ(1) = ∞, cf. Example 12.10 (β = 1).
The equivalent probability weight sequences are by Lemma 4.1 given by
  pk = x^k/(k |ln(1 − x)|), 0 < x < 1,  (12.57)
with probability generating function Φ(xz)/Φ(x) = log(1 − xz)/log(1 − x). This distribution is called the logarithmic distribution. See further Kolchin, Sevast'yanov and Chistyakov [77] and Kolchin [76].
By Remark 11.8, we obtain results on random permutations with n cycles as m/n → λ ∈ [1, ∞), see for example Kazimirov [71]. However, it is of greater interest to consider random permutations without constraining the number of cycles. This can be done using methods similar to the ones used here, but is outside the scope of the present paper; see e.g. Kolchin, Sevast'yanov and Chistyakov [77], Kolchin [76] and Arratia, Barbour and Tavaré [7]. Note that a typical random permutation of {1, …, m} has about log m cycles; hence, to capture the unconstrained case we would need n ≈ log m and thus m/n → ∞, which we do not consider here.
Other random objects that can be decomposed into components can be studied similarly, for example random mappings [76]; our results apply only to random objects with a given number of components (in some cases), but similar
methods are useful for the general case; see Kolchin [76] and Arratia, Barbour
and Tavaré [7].
13. Preliminaries
Proof of Lemma 3.1. (i): Since Φ′(t) = Σ_{k=0}^∞ k wk t^{k−1} has the same radius of convergence ρ as Φ, and Φ(t) ≥ w0 > 0 for t ≥ 0, it is immediate that Ψ is well-defined, finite and continuous for t ∈ [0, ρ). Furthermore, if 0 < t < ρ, then tΨ′(t) is by (4.10) the variance of a non-degenerate random variable, and thus tΨ′(t) > 0. Hence Ψ(t) is increasing, completing the proof of (i).
(ii): If Φ(ρ) = ∞, the claim is just the definition of Ψ(ρ) in Section 3. (Note that the existence of the limit follows from (i).) We may thus assume Φ(ρ) < ∞; then t ր ρ implies Φ(t) → Φ(ρ) < ∞ and Φ′(t) → Φ′(ρ) ≤ ∞ by monotone convergence, and thus
  Ψ(t) := tΦ′(t)/Φ(t) → ρΦ′(ρ)/Φ(ρ) = Ψ(ρ).
(iii): The case ρ = 0 is trivial, and the case ρ > 0 follows from (i) and (ii).
(iv): For any ℓ ≥ 0,
  Ψ(t) − ℓ = (Σ_{k=0}^∞ (k − ℓ)wk t^k)/(Σ_{k=0}^∞ wk t^k) ≥ (Σ_{k=0}^{ℓ−1} (k − ℓ)wk t^k)/(Σ_{k=0}^∞ wk t^k).  (13.1)
If ρ < ∞ and Φ(ρ) = ∞, we thus have
  Ψ(t) − ℓ ≥ O(1)/Φ(t) → 0 as t ր ρ,
so Ψ(ρ) − ℓ ≥ 0. Since ℓ is arbitrary, this shows Ψ(ρ) = ∞, proving (iv).
(v): If ρ = ∞, choose ℓ with wℓ > 0. Then (13.1) implies
  Ψ(t) − ℓ ≥ −ℓ Σ_{k=0}^{ℓ−1} wk t^k / (wℓ t^ℓ) → 0 as t → ∞,
so Ψ(∞) − ℓ ≥ 0. Hence, Ψ(∞) ≥ sup{ℓ : wℓ > 0} = ω.
Conversely,
  Ψ(t) = (Σ_{k=0}^ω k wk t^k)/(Σ_{k=0}^ω wk t^k) ≤ ω for all t ∈ [0, ρ),
so Ψ(ρ) ≤ ω, completing the proof of (v).
Finally, (3.9) follows from (i) and (ii).
Remark 13.1. Alternatively, the fact that Ψ(t) is increasing can also be seen as follows: Let 0 < a < b < ρ and let Y be a random variable with distribution P(Y = k) = wk a^k/Φ(a) (cf. Lemma 4.2). Then Ψ(a) = E Y and Ψ(b) = E[Y (b/a)^Y]/E[(b/a)^Y], so Ψ(a) ≤ Ψ(b) is equivalent to the correlation inequality E[Y (b/a)^Y] ≥ E Y · E[(b/a)^Y], which says that the two random variables f(Y) := Y and g(Y) := (b/a)^Y are positively correlated; it is well-known that this holds (as long as the expectations are finite) for any two increasing functions f and g and any Y, see [50, Theorem 236] where the result is attributed to Chebyshev, and it is easy to see that, in fact, strict inequality holds in the present case. (The latter inequality is an analogue of Harris' correlation inequality [51] for variables Y with values in a discrete cube {0, 1}^N; in fact, the inequalities have a common extension to variables with values in R^N. Cf. also the related FKG inequality, which extends Harris' inequality; see for example [48] where also its history is described.)
For a third proof that Ψ(t) is increasing, note that (3.7) shows that Ψ is (strictly) increasing if and only if log Φ(e^x) is (strictly) convex, which is an easy consequence of Hölder's inequality. (See e.g. [31, Lemma 2.2.5(a)] and note that Φ(e^x) = Σ_{k=0}^∞ e^{kx} wk is the moment generating function of (wk) in the case that (wk) is a probability weight sequence.)
Lemma 3.1 shows that Ψ is a bijection [0, ρ] → [0, Ψ(ρ)] = [0, ν], so it has a well-defined inverse Ψ^{−1} : [0, ν] → [0, ρ]. We extend this inverse to [0, ∞) as follows.
Lemma 13.2. For x ≥ 0 define τ = τ(x) ∈ [0, ∞] by
  τ(x) := sup{t ≤ ρ : Ψ(t) ≤ x}.  (13.2)
Then τ(x) is the unique number in [0, ρ] such that Ψ(τ(x)) = x when x ≤ ν, and τ(x) = ρ when x > ν. Furthermore, the function x ↦ τ(x) is continuous, and, for any x ≥ 0,
  Ψ(τ(x)) = min(x, ν).  (13.3)
If x < ω, then 0 ≤ τ(x) < ∞ and 0 < Φ(τ(x)) < ∞. On the other hand, if x ≥ ω, then τ(x) = Φ(τ(x)) = ∞.
Proof. By Lemma 3.1 and the definition (3.10), Ψ is an increasing continuous bijection [0, ρ] → [0, Ψ(ρ)] = [0, ν]; thus if 0 ≤ x ≤ ν, there exists a unique Ψ^{−1}(x) ∈ [0, ρ] with Ψ(Ψ^{−1}(x)) = x, and (13.2) yields τ(x) = Ψ^{−1}(x). Since Ψ is a continuous bijection of one compact space onto another, its inverse Ψ^{−1} : [0, ν] → [0, ρ] is continuous too; thus x ↦ τ(x) = Ψ^{−1}(x) is continuous on [0, ν]. Furthermore, (13.3) holds for x ≤ ν.
If x > ν = Ψ(ρ), then (13.2) yields τ(x) = ρ, and thus Ψ(τ(x)) = Ψ(ρ) = ν, so (13.3) holds in this case too.
Combining the two cases we see that x ↦ τ(x) is continuous on [0, ∞), and that (13.3) holds.
Now suppose that x < ω and τ(x) = ∞. Since τ(x) ≤ ρ we then have ρ = ∞, and Lemma 3.1(v) yields Ψ(τ(x)) = Ψ(ρ) = ω > x, contradicting (13.3). Thus τ(x) < ∞ when x < ω. Furthermore, if Φ(τ(x)) = ∞, then τ(x) = ρ, since Φ(t) < ∞ for t < ρ, and thus Φ(ρ) = ∞. If further x < ω, and thus ρ = τ(x) < ∞ as just shown, then Lemma 3.1(iv) would give Ψ(τ(x)) = Ψ(ρ) = ∞, again contradicting (13.3) since x < ∞. Thus Φ(τ(x)) < ∞ when x < ω.
Conversely, if x ≥ ω, then ω < ∞, so Φ(t) is a polynomial and ρ = ∞. Lemma 3.1(v) shows that Ψ(ρ) = ω ≤ x, so (13.2) yields τ(x) = ρ = ∞, whence also Φ(τ(x)) = Φ(∞) = ∞.
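The function τ(x) is easy to compute numerically. The following hedged Python sketch (an illustration under a toy choice of weights) evaluates (13.2) by bisection for wk ≡ 1, where Φ(t) = 1/(1 − t), Ψ(t) = t/(1 − t), ρ = 1 and ν = ∞, so that τ(x) = x/(1 + x) in closed form.

    # tau(x) = sup{t <= rho : Psi(t) <= x} by bisection, cf. (13.2).
    def psi(t):
        return t / (1 - t)          # Psi for w_k = 1, valid on [0, rho) = [0, 1)

    def tau(x, rho=1.0, eps=1e-12):
        lo, hi = 0.0, rho
        while hi - lo > eps:
            mid = (lo + hi) / 2
            if psi(mid) <= x:
                lo = mid
            else:
                hi = mid
        return lo

    for x in [0.5, 1.0, 2.0, 10.0]:
        print(x, tau(x), x / (1 + x))  # bisection matches the closed form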
Next, we investigate when Z(m, n) > 0. We say that an allocation (y1, …, yn) of m balls in n boxes is good if it has positive weight, i.e., if yi ∈ supp(w) for every i. Thus, Z(m, n) > 0 if and only if there is a good allocation in Bm,n; in this case, the random allocation Bm,n is defined and is always good.
Provided m is not too small or too large, the m and n for which good allocations exist are easily characterised; the following lemma shows that a simple necessary condition also is sufficient. (The exact behaviour for very small m is complicated. The largest m such that Z(m, n) = 0 for all n is called the Frobenius number of the set supp(w); it is a well-known, and in general difficult, problem to compute this, see e.g. [10]. The case when m is close to ωn (with finite ω) is essentially the same by the symmetry in Remark 11.10.)
Lemma 13.3. Suppose that w0 > 0.
(i) If Z(m, n) > 0, then span(w) | m and 0 ≤ m ≤ ωn.
(ii) If ω < ∞, then there exists a constant C (depending on w) such that if span(w) | m and C ≤ m ≤ ωn − C, then Z(m, n) > 0.
(iii) If ω = ∞, then for each C′ < ∞, there exists a constant C (depending on w and C′) such that if span(w) | m and C ≤ m ≤ C′n, then Z(m, n) > 0.
Proof. (i): Z(m, n) > 0 if and only if m = Σ_{i=1}^n yi for some yi with w_{yi} > 0, i.e., yi ∈ supp(w). This implies 0 ≤ yi ≤ ω and span(w) | yi for each i, and the necessary conditions in (i) follow immediately.
(ii): We may for convenience assume that span(w) = 1, see Remark 11.9; then, by (3.3), supp(w) \ {0} is a finite set of integers with greatest common divisor 1. Thus, by a well-known theorem by Schur, see e.g. [109, 3.15.2] or [40, Proposition IV.2], there is a constant C1 such that every integer m ≥ C1 can be written as a finite sum m = Σ_i yi with yi ∈ supp(w) (repetitions are allowed); i.e., we have a good allocation of m balls in some number ℓ(m) of boxes. Choose one such allocation for each m ∈ [C1, C1 + ω), and let C2 be the maximum number of boxes in any of them.
If C1 ≤ m ≤ ωn − C2ω, let a := ⌊(m − C1)/ω⌋. Then m − aω ∈ [C1, C1 + ω), and has thus a good allocation in at most C2 boxes. We add a boxes with ω balls each, and have obtained a good allocation of m balls using at most
  C2 + a = C2 + ⌊(m − C1)/ω⌋ ≤ C2 + ⌊(ωn − C2ω − C1)/ω⌋ ≤ n
boxes. Hence we may add empty boxes and obtain a good allocation in Bm,n. (Recall that 0 ∈ supp(w).) Thus Z(m, n) > 0 when C1 ≤ m ≤ ωn − C2ω.
(iii): We may again assume span(w) = 1. Let K be a large integer and consider the truncated weight sequence w^{(K)} = (wk^{(K)}) defined by
  wk^{(K)} := wk if k ≤ K, and wk^{(K)} := 0 if k > K;  (13.4)
we assume that K ∈ supp(w) and that K is so large that K ≥ C′ + 1 and span(w^{(K)}) = span(w) = 1. Then ω(w^{(K)}) = K, and (ii) shows that for some C3, if C3 ≤ m ≤ Kn − C3, then Z(m, n; w) ≥ Z(m, n; w^{(K)}) > 0. Hence, if C3 ≤ m ≤ C′n and Z(m, n) = 0, then Kn − C3 < m ≤ C′n ≤ (K − 1)n, and thus n < C3, whence m < C′C3. Consequently, if C′C3 ≤ m ≤ C′n, then Z(m, n) > 0.
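The role of the Frobenius number mentioned before Lemma 13.3 can be illustrated with a small Python sketch (hedged; supp(w) = {0, 3, 5} is an arbitrary toy choice): the m that are not sums of 3's and 5's are exactly 1, 2, 4 and 7, so for this weight sequence Z(m, n) = 0 for all n precisely when m ∈ {1, 2, 4, 7}.

    # Which m can be written as sums of elements of supp(w) \ {0} = {3, 5}?
    def representable(m, parts=(3, 5)):
        reachable = {0}
        for _ in range(m):
            reachable |= {s + p for s in reachable for p in parts if s + p <= m}
        return m in reachable

    print([m for m in range(1, 20) if not representable(m)])  # [1, 2, 4, 7]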
Remark 13.4. In the case ω = ∞, it is not always true that there is a constant C such that Z(m, n) > 0 whenever m ≥ C. For example, suppose that wk = 1 when k = 0 or k = j! for some j ≥ 0, and wk = 0 otherwise. Then Z(m, n) = 0 when m = (n + 1)! − 1 and n ≥ 2.
Remark 13.5. Lemma 13.3 is easily modified for the case w0 = 0; if α := min{k : wk > 0} as in Remark 11.8, then the necessary condition (i) is αn ≤ m ≤ ωn and span(w) | (m − αn), and again this is sufficient if m stays away from the boundaries.
14. Proofs of Theorems 11.4–11.7
We now prove the theorems in Section 11, which we for the reader’s convenience
repeat first.
Theorem 11.4. Let w = (wk)k≥0 be any weight sequence with w0 > 0 and wk > 0 for some k ≥ 1. Suppose that n → ∞ and m = m(n) with m/n → λ with 0 ≤ λ < ω.
(i) If λ ≤ ν, let τ be the unique number in [0, ρ] such that Ψ(τ) = λ.
(ii) If λ > ν, let τ := ρ.
In both cases, 0 ≤ τ < ∞ and 0 < Φ(τ) < ∞. Let
  πk := wk τ^k/Φ(τ), k ≥ 0.  (11.13)
Then (πk)k≥0 is a probability distribution, with expectation
  μ = Ψ(τ) = min(λ, ν)  (11.14)
and variance σ² = τΨ′(τ) ≤ ∞. Moreover, for every k ≥ 0,
  Nk(Bm,n)/n →p πk.  (11.15)
Theorem 11.6. Let w = (wk)k≥0 be any weight sequence with w0 > 0 and wk > 0 for some k ≥ 1. Suppose that n → ∞ and m = m(n) with m/n ≤ C for some C < ω.
Define the function τ : [0, ∞) → [0, ∞] by τ(x) := sup{t ≤ ρ : Ψ(t) ≤ x}. Then τ(x) is the unique number in [0, ρ] such that Ψ(τ(x)) = x when x ≤ ν, and τ(x) = ρ when x > ν; furthermore, the function x ↦ τ(x) is continuous. We have 0 ≤ τ(m/n) < ∞ and 0 < Φ(τ(m/n)) < ∞, and for every k ≥ 0,
  Nk(Bm,n)/n − wk(τ(m/n))^k/Φ(τ(m/n)) →p 0.  (11.17)
Furthermore, for any C < ω, this holds uniformly as n → ∞ for all m = m(n) with m/n ≤ C.
Theorem 11.7. Let w = (wk)k≥0 be any weight sequence with w0 > 0 and wk > 0 for some k ≥ 1. Suppose that n → ∞ and m = m(n) with m/n → λ where 0 ≤ λ < ω, and let (πk)k≥0 be as in Theorem 11.4. Then, for every ℓ ≥ 1 and y1, …, yℓ ≥ 0,
  P(Y1^{(m,n)} = y1, …, Yℓ^{(m,n)} = yℓ) → Π_{i=1}^ℓ π_{yi}.  (11.18)
In other words, for every fixed ℓ, the random variables Y1, …, Yℓ converge jointly to independent random variables with the distribution (πk)k≥0.
We begin with some lemmas. First we state and prove a version of the local
central limit theorem (for integer-valued variables) that is convenient for our
application below. We will need it for a triangular array, where the variables we
sum depend on n.
We define the span of an integer-valued random variable to be the span of
its distribution, defined as in (3.2).
Lemma 14.1. Let ξ and ξ^{(1)}, ξ^{(2)}, … be integer-valued random variables with ξ^{(n)} →d ξ as n → ∞, and let Sn^{(n)} := Σ_{i=1}^n ξi^{(n)}, where ξi^{(n)} are independent copies of ξ^{(n)}. Suppose further that ξ is non-degenerate, with span d and finite variance σ² > 0, and that sup_n E|ξ^{(n)}|³ < ∞. If d > 1, we assume for simplicity that d | ξ and d | ξ^{(n)} for each n.
Let m = m(n) be a sequence of integers that are multiples of d, and assume that E ξ^{(n)} = m(n)/n. Then, as n → ∞,
  P(Sn^{(n)} = m) = (d + o(1))/√(2πσ²n).  (14.1)
Proof. The proof uses standard arguments, see e.g. Kolchin [76, Theorem 1.4.2]; we only have to check uniformity in ξ^{(n)} of our estimates.
If the span d > 1, we may divide ξ, ξ^{(n)} and m by d, and reduce to the case d = 1. Hence we assume in the proof that span(ξ) = 1.
Let ϕ(t) := E e^{itξ} and ϕn(t) := E e^{itξ^{(n)}} be the characteristic functions of ξ and ξ^{(n)}. Further, let ϕ̃n(t) := e^{−itm/n} ϕn(t) be the characteristic function of the centred random variable ξ^{(n)} − E ξ^{(n)} = ξ^{(n)} − m/n.
Then Sn^{(n)} has characteristic function ϕn(t)^n, and thus, by the inversion formula and a change of variables,
  P(Sn^{(n)} = m) = (1/2π) ∫_{−π}^{π} e^{−imt} ϕn(t)^n dt = (1/2π) ∫_{−π}^{π} ϕ̃n(t)^n dt
    = (1/(2π√n)) ∫_{−π√n}^{π√n} ϕ̃n(x/√n)^n dx
    = (1/(2π√n)) ∫_{−∞}^{∞} ϕ̃n(x/√n)^n 1{|x| < π√n} dx.  (14.2)
Let σn² be the variance of ξ^{(n)}. Since E|ξ^{(n)}|³ are uniformly bounded, σn² < ∞; moreover, the random variables ξ^{(n)} are uniformly square integrable and it follows from ξ^{(n)} →d ξ that σn² → σ². (See e.g. Gut [49, Theorems 5.4.2 and 5.4.9] for this standard argument.) In particular, σ²/2 ≤ σn² ≤ 2σ² for all sufficiently large n; we consider in the remainder of the proof only such n.
Since ϕ̃n(t) is the characteristic function of ξ^{(n)} − E ξ^{(n)}, which has mean 0 and, by assumption, an absolute third moment that is uniformly bounded, we have by a standard expansion (see e.g. [49, Theorem 4.4.1])
  ϕ̃n(t) = 1 − ½σn²t² + O(E|ξ^{(n)}|³ |t|³) = 1 − ½σn²t² + O(|t|³),  (14.3)
uniformly in all n and t. In particular, for any fixed real x,
  ϕ̃n(x/√n) = 1 − σn²x²/(2n) + O(n^{−3/2}) = 1 − (σ²x² + o(1))/(2n),  (14.4)
and thus
  ϕ̃n(x/√n)^n → e^{−σ²x²/2}.  (14.5)
We are aiming at estimating the integral in (14.2) by dominated convergence, so we also need a suitable bound that is uniform in n. We write (14.3) as
  |ϕ̃n(t) − (1 − ½σn²t²)| ≤ C1|t|³.  (14.6)
Let δ := min{σ^{−1}, σ²/(8C1)} > 0. Then, if |t| ≤ δ, recalling our assumption σ²/2 ≤ σn² ≤ 2σ², we have 1 − ½σn²t² ≥ 1 − σ²t² ≥ 0 and, by (14.6),
  |ϕ̃n(t)| ≤ 1 − ½σn²t² + C1|t|³ ≤ 1 − ¼σ²t² + C1δt² ≤ 1 − ⅛σ²t².  (14.7)
For δ ≤ |t| ≤ π we claim that there exist n0 and η > 0 such that if n ≥ n0 and δ ≤ |t| ≤ π, then
  |ϕ̃n(t)| ≤ 1 − η.  (14.8)
In fact, if this were not true, then there would exist sequences nk ≥ k and tk ∈ [δ, π] (by symmetry, it suffices to consider t > 0) such that |ϕ_{nk}(tk)| = |ϕ̃_{nk}(tk)| > 1 − 1/k. By considering a subsequence, we may assume that tk → t∞ as k → ∞ for some t∞ ∈ [δ, π]. Since ξ^{(n)} →d ξ, ϕ_{nk}(t) → ϕ(t) uniformly for |t| ≤ π, and thus ϕ_{nk}(tk) → ϕ(t∞). It follows that |ϕ(t∞)| = 1 for some t∞ ∈ [δ, π], but this is impossible when span(ξ) = 1, as is well-known (and easily seen from E e^{it∞(ξ−ξ′)} = |ϕ(t∞)|² = 1, where ξ′ is an independent copy of ξ). This contradiction shows that (14.8) holds.
We can combine (14.7) and (14.8); we let c1 := min{σ²/8, η/π²} and obtain, for n ≥ n0,
  |ϕ̃n(t)| ≤ 1 − c1t² ≤ exp(−c1t²), |t| ≤ π,
and thus
  |ϕ̃n(x/√n)|^n ≤ exp(−c1x²), |x| ≤ π√n.
This justifies the use of dominated convergence in (14.2), and we obtain by (14.5)
  2π√n P(Sn^{(n)} = m) = ∫_{−∞}^{∞} ϕ̃n(x/√n)^n 1{|x| < π√n} dx → ∫_{−∞}^{∞} e^{−σ²x²/2} dx = √(2π/σ²),
which yields (14.1). (Recall that we have assumed d = 1.)
Remark 14.2. A simple modification of the proof shows that the result still holds if the condition E ξ^{(n)} = m(n)/n is relaxed to m(n) = n E ξ^{(n)} + o(√n). Furthermore, for any m = m(n), P(Sn^{(n)} = m) ≤ (1/2π) ∫_{−π}^{π} |ϕ̃n(t)|^n dt, and it follows by the proof above that
  P(Sn^{(n)} = m) ≤ (d + o(1))/√(2πσ²n),  (14.9)
uniformly in all m ∈ Z.
Moreover, both Lemma 14.1 and the remarks above hold, with only minor modifications in the proof, also if the condition sup_n E|ξ^{(n)}|³ < ∞ is relaxed to uniform square integrability of ξ^{(n)}. In particular, if ξ^{(n)} = ξ, this assumption is not needed at all; then the assumption σ² < ∞ is the only moment condition that we need. (This is the classical local central limit theorem for discrete distributions, see e.g. Gnedenko and Kolmogorov [46, § 49] or Kolchin [76, Theorem 1.4.2].)
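As a hedged numerical illustration of (14.1) in the classical case ξ^{(n)} = ξ (not part of the proof): for ξ ∼ Po(1) we have d = 1, σ² = 1 and Sn ∼ Po(n), so P(Sn = n) = e^{−n}n^n/n! should be close to 1/√(2πn).

    import math

    # P(Po(n) = n) * sqrt(2*pi*n) -> 1, matching (14.1) with d = sigma = 1.
    for n in [10, 100, 1000]:
        log_p = n * math.log(n) - n - math.lgamma(n + 1)
        print(n, math.exp(log_p) * math.sqrt(2 * math.pi * n))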
We use Lemma 14.1 to obtain lower bounds of the (rather weak) type exp(o(n))
for P(Sn = m) in the case of a probability weight sequence, for suitable m. We
treat the cases ρ > 1 and ρ = 1 separately.
Lemma 14.3. Let w be a probability weight sequence with 0 < w0 < 1 and ρ > 1. Let ξ1, ξ2, … be i.i.d. random variables with distribution w and let Sn := Σ_{i=1}^n ξi.
Assume that m = m(n) are integers that are multiples of d := span(w), and that m(n)/n → E ξ1. Then
  P(Sn = m) = Z(m, n) = e^{o(n)}.
Proof. Let ξ := ξ1 and λ := E ξ = Φ′(1) = Ψ(1). Since ρ > 1, we have ν > Ψ(1) = λ. Thus, by assumption, m/n → λ < ν, so m/n < ν for all large n; we consider in the sequel only such n. By Lemma 3.1 we may then define τn ∈ [0, ρ) by Ψ(τn) = m/n. Since Ψ^{−1} is continuous on [0, ν), and Ψ(1) = E ξ = λ, we have
  τn = Ψ^{−1}(m/n) → Ψ^{−1}(λ) = 1 as n → ∞.  (14.10)
Let ξ^{(n)} have the conjugate distribution
  P(ξ^{(n)} = k) = (τn^k/Φ(τn)) wk, k ≥ 0;  (14.11)
by Lemma 4.2 this is a probability distribution with expectation
  E ξ^{(n)} = Ψ(τn) = m/n.  (14.12)
The conditions of Lemma 14.1 are easily verified: Since τn → 1 by (14.10), we have P(ξ^{(n)} = k) → wk = P(ξ = k) and thus ξ^{(n)} →d ξ. Furthermore, taking any τ* ∈ (1, ρ) and considering only n that are so large that τn < τ*,
  E|ξ^{(n)}|³ = Σ_{k=0}^∞ k³ (τn^k/Φ(τn)) wk ≤ (1/Φ(0)) Σ_{k=0}^∞ k³ τ*^k wk < ∞.
Furthermore, if d = span(ξ), then wk > 0 ⟹ d | k by (3.3); thus d | ξ and d | ξ^{(n)} (a.s.). Lemma 14.1 thus applies, and if w^{(n)} denotes the distribution of ξ^{(n)} in (14.11), then by (11.5) and (14.1),
  Z(m, n; w^{(n)}) = P(Σ_{i=1}^n ξi^{(n)} = m) ∼ d/√(2πσ²n),  (14.13)
where σ² := Var ξ. By (11.9), we have Z(m, n; w^{(n)}) = Φ(τn)^{−n} τn^m Z(m, n), and thus, recalling that τn → 1 < ρ and hence Φ(τn) → Φ(1) = 1,
  P(Sn = m) = Z(m, n) = τn^{−m} Φ(τn)^n Z(m, n; w^{(n)}) = exp(−m log τn + n log Φ(τn) + log Z(m, n; w^{(n)})) = exp(o(n)).
Lemma 14.4. Let w be a probability weight sequence with 0 < w0 < 1 and ρ = 1. Let ξ1, ξ2, … be i.i.d. random variables with distribution w and let Sn := Σ_{i=1}^n ξi.
Assume that m = m(n) are integers that are multiples of d := span(w), and that m(n)/n → λ < ∞ with λ ≥ E ξ1. Then
  P(Sn = m) = e^{o(n)}.
Proof. Let K be a large integer and consider the truncated weight sequence w^{(K)} = (wk^{(K)}) defined, as in (13.4), by
  wk^{(K)} := wk if k ≤ K, and wk^{(K)} := 0 if k > K,  (14.14)
having generating function ΦK(t) = Σ_{k=0}^K wk t^k, and the corresponding ΨK(t) := tΦ′K(t)/ΦK(t). We assume that K is so large that span(w^{(K)}) = span(w), and that K ≥ k for some k > λ with wk > 0. (Such k > λ exists since ρ < ∞.)
Thus the weight sequence w^{(K)} has, by Lemma 3.1(v), ν(w^{(K)}) = ΨK(∞) = ω(w^{(K)}) > λ. Hence, by Lemma 3.1 again, there exists τK ∈ [0, ∞) such that ΨK(τK) = λ. Thus the probability distribution π^{(K)} = (πk^{(K)}) defined by
  πk^{(K)} := (τK^k/ΦK(τK)) wk^{(K)}  (14.15)
has expectation λ. Since this distribution has finite support it has radius of convergence ρK = ∞; furthermore, m/n → λ by assumption. Hence Lemma 14.3 applies to π^{(K)} and yields
  Z(m, n; π^{(K)}) = e^{o(n)}.  (14.16)
By (11.9) and (14.15),
  Z(m, n; π^{(K)}) = ΦK(τK)^{−n} τK^m Z(m, n; w^{(K)}).  (14.17)
Moreover, Z(m, n; w) ≥ Z(m, n; w^{(K)}) since wk ≥ wk^{(K)} for each k. Hence, by (14.16) and (14.17),
  Z(m, n; w) ≥ Z(m, n; w^{(K)}) = τK^{−m} ΦK(τK)^n Z(m, n; π^{(K)}) = τK^{−m} ΦK(τK)^n e^{o(n)}.  (14.18)
This holds for every large fixed K.
If 0 < t < ρ = 1, then ΦK(t) → Φ(t) and Φ′K(t) → Φ′(t) as K → ∞, so ΨK(t) → Ψ(t) < Ψ(1) = E ξ1 ≤ λ. Hence, for large K, ΨK(t) < λ = ΨK(τK), so τK > t. Consequently, lim inf_{K→∞} τK ≥ 1.
On the other hand, if t > ρ = 1, let ℓ := ⌈λ⌉ + 1 > λ, and assume K ≥ ℓ. Then
  ΨK(t) = Σ_{k=0}^K k wk t^k / ΦK(t) ≥ Σ_{k=ℓ}^K ℓ wk t^k / Σ_{k=0}^K wk t^k = ℓ − Σ_{k=0}^{ℓ−1} ℓ wk t^k / Σ_{k=0}^K wk t^k → ℓ > λ,  (14.19)
as K → ∞, since ΦK(t) → Φ(t) = ∞. Hence, for large K, ΨK(t) > λ = ΨK(τK), and thus τK < t. Consequently, lim sup_{K→∞} τK ≤ 1.
Combining these upper and lower bounds, we have
  τK → 1, as K → ∞.
If we take t < 1, we thus have for large K that τK > t and hence ΦK(τK) ≥ ΦK(t). Thus, lim inf_{K→∞} ΦK(τK) ≥ lim_{K→∞} ΦK(t) = Φ(t) for every t < 1, so
  lim inf_{K→∞} ΦK(τK) ≥ Φ(1) = 1.
Given any ε > 0, we may thus take K so large that τK < e^ε and ΦK(τK) > e^{−ε}. Then (14.18) yields
  Z(m, n; w) ≥ e^{−εm−εn+o(n)} ≥ e^{−εm−2εn}
for large n. Since ε is arbitrary and m = O(n), this shows Z(m, n; w) ≥ e^{o(n)}, and the result follows since Z(m, n) ≤ 1 for any probability weight sequence by (11.5).
We next prove Theorems 11.4 and 11.6. Theorem 11.6 follows easily from
Theorem 11.4, so it may seem natural to prove Theorem 11.4 first. However,
our proof of Theorem 11.4 uses in one case Theorem 11.6 (for another case).
We will therefore first show that Theorem 11.6 follows from Theorem 11.4, and
then show Theorem 11.4.
Proof of Theorem 11.6 from Theorem 11.4. We prove that Theorem 11.4 for
some weight sequence (wk ) implies Theorem 11.6 for the same weights. The
assertions about τ follow from Lemma 13.2, so we turn to (11.17).
Consider a subsequence of (m(n), n). It suffices to show that every such subsequence has a subsubsequence such that (11.17) holds. (See e.g. [49, Section
5.7], [65, p. 12] or [15, Theorem 2.3] for this standard argument.)
Since m/n 6 C by assumption, we can select a subsubsequence such that
m/n → λ for some λ 6 C < ω. Then Theorem 11.4 applies and thus (along the
subsubsequence),
  Nk(Bm,n)/n − wk(τ(λ))^k/Φ(τ(λ)) = Nk(Bm,n)/n − πk →p 0.  (14.20)
Furthermore, since m/n → λ and x ↦ τ(x) is continuous, τ(m/n) → τ(λ) (along the subsubsequence); hence
  wk(τ(m/n))^k/Φ(τ(m/n)) − wk(τ(λ))^k/Φ(τ(λ)) → 0.  (14.21)
Combining (14.20) and (14.21), we see that (11.17) holds along the subsubsequence, which as said above completes the proof of (11.17).
That (11.17) holds uniformly is, in fact, automatic since we have shown it for an arbitrary m(n) (although we stated it for emphasis): Let Xm,n denote the left-hand side of (11.17), and let ε > 0. Choose m(n) as the integer m ∈ [0, Cn] that maximises P(|Xm,n| > ε). Since (11.17) says that P(|Xm(n),n| > ε) → 0, we have sup_{m≤Cn} P(|Xm,n| > ε) → 0.
Proof of Theorem 11.4. First, Lemma 13.2 shows that τ defined by (i) and (ii)
is well-defined and equals τ (λ) defined in Lemma 13.2; since λ < ω we have
τ < ∞ and Φ(τ ) < ∞. Further, (13.3) yields
Ψ(τ ) = min(λ, ν).
(14.22)
Since τ < ∞ and Φ(τ ) < ∞, πk is well-defined by (11.13); furthermore,
by Lemma 4.2 and (14.22), (πk ) is a probability distribution with mean and
variance as asserted.
We now turn to proving (11.15), the main assertion. We study three cases
separately.
Case (a): τ > 0. Then π = (πk) is a probability weight sequence equivalent to w = (wk), so we may replace (wk) by (πk) without changing Bm,n. Note that this changes ρ and τ to ρ(π) = ρ(w)/τ and τ(π) = τ(w)/τ = 1 by (4.4) and (4.5). We may thus assume that (wk) equals the probability weight sequence (πk), and that ρ ≥ τ = 1. By (14.22), then Ψ(1) = min(λ, ν).
We employ the notation of Example 11.2. Note that by (11.14),
  E ξ1 = Ψ(1) = min(λ, ν) ≤ λ.  (14.23)
Moreover, if ρ > 1, then ν = Ψ(ρ) > Ψ(1) by Lemma 3.1, so (14.22) shows that in this case,
  E ξ1 = Ψ(1) = λ.  (14.24)
The allocation (ξ1, …, ξn) (with a random sum Sn) consists of n i.i.d. components, so
  Nk(ξ1, …, ξn) = Σ_{i=1}^n 1{ξi = k} ∼ Bi(n, πk)  (14.25)
has a binomial distribution. For every k and ε > 0, we have by Chernoff's inequality, see e.g. [65, Theorem 2.1 or Remark 2.5],
  P(|Nk(ξ1, …, ξn) − nπk| > εn) ≤ exp(−cε n),  (14.26)
for some constant cε > 0 depending on ε.
We condition on Sn = m, recalling that
  Bm,n =d ((ξ1, …, ξn) | Sn = m).  (14.27)
When ρ > 1 we apply Lemma 14.3, using m/n → λ and (14.24), and when ρ = 1 we apply Lemma 14.4, using (14.23). In both cases we obtain P(Sn = m) = exp(o(n)) and thus by (14.27),
  P(|Nk(Bm,n) − nπk| > εn) = P(|Nk(ξ1, …, ξn) − nπk| > εn | Sn = m)
    ≤ P(|Nk(ξ1, …, ξn) − nπk| > εn)/P(Sn = m) ≤ exp(−cε n + o(n)) → 0.
Since ε is arbitrary, this shows that
  Nk(Bm,n)/n − πk →p 0
as asserted, which completes the proof when τ > 0.
Case (b): τ = 0 and ρ > 0. We write Nk for Nk(Bm,n). By (11.13) we have π0 = 1 and πk = 0 for k > 0; hence, (11.15) says that N0/n →p 1 and Nk/n →p 0 for k > 0.
Since τ < ρ, we are in case (i), so λ = Ψ(τ) = Ψ(0) = 0. In other words, m/n → 0. The result is trivial (and deterministic) in this case. We have
  (1/n) Σ_{k=1}^∞ Nk ≤ (1/n) Σ_{k=1}^∞ kNk = m/n → λ = 0.  (14.28)
Hence Nk/n → 0 = πk for every k ≥ 1. Moreover, (14.28) also implies
  N0/n = (n − Σ_{k=1}^∞ Nk)/n → 1 = π0,  (14.29)
which completes the proof when τ = 0 < ρ.
Case (c): ρ = 0. We write again Nk for Nk(Bm,n), recalling that this is a random variable. In this case ν = 0 and τ = ρ = 0 for every λ ≥ 0. By (11.13) we thus have π0 = 1 and πk = 0 for k > 0; hence, as in case (b), we have to show that N0/n →p 1 and Nk/n →p 0 for k > 0. By assumption, m/n converges, so the sequence m/n is bounded; let C be a large constant such that m/n ≤ C. Further, let K be a large integer; we assume K > 2C and (for simplicity) wK > 0. (Note that such K exist since ω = ∞ when ρ = 0.)
We say that a box is small if it contains at most K balls, and large otherwise. Let N′ := Σ_{k=0}^K Nk be the number of small boxes and M′ := Σ_{k=0}^K kNk the number of balls in them. Note first that by our assumptions, m/n ≤ C < K/2. Hence,
  m ≥ m − M′ = Σ_{k=K+1}^∞ kNk ≥ K Σ_{k=K+1}^∞ Nk = K(n − N′) ≥ (2m/n)(n − N′).  (14.30)
Thus, n − N′ ≤ n/2 and N′ ≥ n/2; in particular N′ → ∞. Moreover,
  0 ≤ M′/N′ ≤ m/(n/2) ≤ 2C < K.  (14.31)
The weight w(y) in (11.2) factorizes as the product over the small boxes times the product over the large boxes. Thus, if we condition on M′ and N′, and moreover on the set of the N′ boxes that are small, then the allocations of the small boxes and the large boxes are independent; moreover, the allocations to the small boxes form a random allocation of the type B_{M′,N′} for the truncated weight sequence w^{(K)} given by (13.4) above. By assumption, wK > 0, and thus the truncated sequence has ω^{(K)} := ω(w^{(K)}) = K.
The truncated weight sequence w^{(K)} has a polynomial generating function Φ^{(K)}(t) = Σ_{k=0}^K wk t^k with an infinite radius of convergence ρ^{(K)} = ∞. We have already proved Theorem 11.4 in this case, and thus Theorem 11.6 also holds in this case, by the proof above. Applying Theorem 11.6 to the truncated weight sequence and the allocations of small boxes we see that there exists a continuous function τK : [0, K) → [0, ∞) such that, conditioned on (M′, N′),
  Nk/N′ − wk(τK(M′/N′))^k/Φ^{(K)}(τK(M′/N′)) →p 0,  k ≤ K.  (14.32)
Moreover, (14.32) holds uniformly in all (M′, N′) by Theorem 11.6 and (14.31). Hence, denoting the left-hand side of (14.32) by X, we have for every ε > 0 that P(|X| > ε | M′, N′) ≤ δ(n), for some function δ(n) → 0. Taking the expectation, it follows that also P(|X| > ε) ≤ δ(n) → 0, and thus (14.32) holds also unconditionally. Thus,
  Nk/N′ = wk(τK(M′/N′))^k/Φ^{(K)}(τK(M′/N′)) + op(1),  k ≤ K.  (14.33)
By (14.31), M′/N′ ≤ 2C, and thus, using Lemma 13.2 and 2C < K = ω(w^{(K)}), τK(M′/N′) ≤ τK(2C) < ∞. Hence, with C1 := τK(2C),
  w0 ≤ Φ^{(K)}(τK(M′/N′)) ≤ Φ^{(K)}(C1) = C2,
say. Taking k = 0 in (14.33) we now find
  N0/N′ = w0/Φ^{(K)}(τK(M′/N′)) + op(1) ≥ w0/C2 + op(1).  (14.34)
Since N′ ≥ n/2 this shows that there exists c1 > 0 (for example c1 := w0/(3C2)) such that w.h.p.
  N0/n ≥ c1.  (14.35)
It follows further from (14.34) that we can invert (14.33) for k = 0 (since x ↦ x^{−1} is continuous for x > 0); thus
  N′/N0 = Φ^{(K)}(τK(M′/N′))/w0 + op(1).  (14.36)
Multiplying (14.33) and (14.36) we find the simpler relation
  Nk/N0 = (wk/w0)(τK(M′/N′))^k + op(1),  k ≤ K.  (14.37)
Let ℓ := min{k > 0 : wk > 0} be the smallest non-zero index with positive weight, and define a random variable by
  τ* := (w0 Nℓ/(wℓ N0))^{1/ℓ}.  (14.38)
It follows from (14.37), with k = ℓ, that τ* = τK(M′/N′) + op(1). Consequently, (14.37) yields
  Nk/N0 = (wk/w0) τ*^k + op(1),  k ≤ K.  (14.39)
We have so far worked with a fixed, large K. However, the definition (14.38) does not depend on the choice of K, and since K may be chosen arbitrarily large, we see that, in fact, (14.39) holds for every k ≥ 0, with the same (random) τ*.
Fix again K > 0, and sum (14.39) for k ≤ K. This yields
  n/N0 ≥ Σ_{k=0}^K Nk/N0 = Σ_{k=0}^K (wk/w0) τ*^k + op(1) = Φ^{(K)}(τ*)/w0 + op(1).  (14.40)
Recall that N0/n ≥ c1 w.h.p. by (14.35). We thus have from (14.40)
  Φ^{(K)}(τ*) ≤ w0 n/N0 + op(1) ≤ w0/c1 + 1  (14.41)
w.h.p. By assumption, ρ = 0, so Φ(t) = ∞ for every t > 0. Hence, for every ε > 0 we have Φ^{(K)}(ε) → Φ(ε) = ∞ as K → ∞, so we may choose K with Φ^{(K)}(ε) > w0/c1 + 1. Then (14.41) shows that τ* < ε w.h.p.; since ε > 0 is arbitrary, this says that
  τ* →p 0.
We substitute this in (14.39), and obtain Nk/N0 →p 0 for every k ≥ 1; hence also
  Nk/n →p 0,  k ≥ 1.  (14.42)
Finally, we return to (14.30), and see that
  K(n − N′) ≤ m ≤ Cn.  (14.43)
Let ε > 0 and choose K ≥ C/ε; then (14.43) yields n − N′ ≤ εn and thus N′ ≥ (1 − ε)n. Further, by (14.42),
  N0 = N′ − Σ_{k=1}^K Nk = N′ + op(n) ≥ (1 − ε)n + op(n),
so w.h.p. N0 ≥ (1 − 2ε)n. This shows that N0/n →p 1, which together with (14.42) completes the proof in the case ρ = 0.
This completes the proof of Theorem 11.4, and thus also of Theorem 11.6.
Proof of Theorem 11.7. Conditioned on the numbers Nk = Nk(Bm,n), k = 0, 1, …, the numbers Y1, …, Yn are obtained by placing N0 0's, N1 1's, …, in (uniformly) random order; thus the conditional probability is
  P(Y1 = y1, …, Yℓ = yℓ | N0, N1, …) = Π_{i=1}^ℓ (N_{yi} − ci)/(n − i + 1) = Π_{i=1}^ℓ (N_{yi} + O(1))/(n + O(1)),  (14.44)
where ci := |{j < i : yj = yi}|. By Theorem 11.4, this product converges in probability to Π_{i=1}^ℓ π_{yi} as n → ∞, and the result follows by taking the expectation (using dominated convergence).
15. Trees and balls-in-boxes
The proofs of the results for random trees are based on a connection with the
balls-in-boxes model. This connection is well-known, see e.g. Otter [93], Dwass
[36], Kolchin [76], Pitman [99], but for completeness we give full proofs.
We consider a fixed weight sequence w = (wk) and the corresponding random trees Tn and random allocations Bm,n; we write as above Bm,n = (Y1, …, Yn) = (Y1^{(m,n)}, …, Yn^{(m,n)}).
We begin with some deterministic considerations. The idea is to regard the
outdegrees of the nodes of a tree T as an allocation; we regard the nodes as both
balls and boxes, and if v is a node, we put the children of v as balls in box v.
There are two complications, which will be dealt with in detail below: we have
to specify an ordering of the nodes and we will not obtain all allocations.
Let T be a finite tree, with |T | = n. Take the nodes in some prescribed order
v1, …, vn; for definiteness we use the depth-first order (this is the lexicographic
order on V∞ ), and list the outdegrees as d1 = d+ (v1 ), . . . , dn = d+ (vn ). We
call this the degree sequence of T and denote it by Λ(T ) := (d1 , . . . , dn ). Note
that the tree T can be reconstructed from (d1 , . . . , dn ), so T is determined by
Λ(T ) = (d1 , . . . , dn ).
By (2.2), d1 + · · · + dn = n − 1, so (d1 , . . . , dn ) can be seen as an allocation
of n − 1 balls in n boxes: Λ(T ) = (d1 , . . . , dn ) ∈ Bn−1,n . Consequently, Λ is an
injective map Tn → Bn−1,n . Note also that Λ preserves the weight:
w(T ) = w(Λ(T ))
(15.1)
by the definitions (2.3) and (11.2). However, not every allocation corresponds
to a tree, so Λ is not onto. We begin by characterizing the image Λ(Tn ). We use
a simple and well-known extension of (2.2).
Lemma 15.1. Let T be a tree and T′ a subtree with the same root. Let ∂T′ := {v ∈ V(T) \ V(T′) : v ∼ w for some w ∈ T′} be the set of nodes outside T′ with a parent inside it. Then,
  Σ_{v∈T′} d+_T(v) = |T′| + |∂T′| − 1.  (15.2)
Proof. The set of children of the nodes in T′ consists of (V(T′) \ {o}) ∪ ∂T′.
Lemma 15.2. A sequence (d1, …, dn) ∈ N0^n is the degree sequence of a tree T ∈ Tn if and only if
  Σ_{i=1}^k di ≥ k,  1 ≤ k < n,  (15.3)
  Σ_{i=1}^n di = n − 1.  (15.4)
Of course, (15.4) is just the requirement that (d1 , . . . , dn ) ∈ Bn−1,n .
Proof. For any k ≤ n, the nodes v1, …, vk form a subtree Tk of T, and Lemma 15.1 yields
  Σ_{i=1}^k d+_T(vi) = |∂Tk| + k − 1,  (15.5)
which yields (15.3) since |∂Tk| ≥ 1 when k < n.
Conversely, if (d1, …, dn) satisfies (15.3)–(15.4), a tree with degree sequence (d1, …, dn) is easily constructed. (We construct the tree by assigning degrees in depth-first order. First, v1 is the root and gets d1 children. Next, v2 is the first child of the root and gets d2 children. If d2 > 0, then v3 is the first child of v2, but if d2 = 0, we backtrack and let v3 be the second child of v1; in any case, v3 gets d3 children, and so on. The point is that (15.3) assures that the construction will not stop before we have n nodes.)
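The depth-first construction in the proof is short to implement; the following Python sketch (a hedged illustration) checks (15.3)–(15.4) and rebuilds the tree, returning each node's parent in the depth-first labelling v1, …, vn (0-indexed).

    def is_degree_sequence(d):
        # conditions (15.3)-(15.4) of Lemma 15.2
        n, partial = len(d), 0
        for k in range(1, n):
            partial += d[k - 1]
            if partial < k:
                return False
        return partial + d[-1] == n - 1

    def tree_from_degrees(d):
        # assign d[v] children to each node in depth-first order
        parent = [None] * len(d)
        stack = [[0, d[0]]]              # [node, children still to attach]
        for v in range(1, len(d)):
            while stack[-1][1] == 0:     # backtrack past exhausted nodes
                stack.pop()
            stack[-1][1] -= 1
            parent[v] = stack[-1][0]
            stack.append([v, d[v]])
        return parent

    print(is_degree_sequence([2, 0, 1, 0]), tree_from_degrees([2, 0, 1, 0]))
    # True [None, 0, 0, 2]: the root has children v2 and v3, and v3 has child v4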
The amazing fact is that for any allocation in Bn−1,n, exactly one of its cyclic shifts satisfies (15.3). (In particular, exactly 1/n of all allocations satisfy (15.3).) To see this, it is simplest to consider the sequence (di − 1)_{i=1}^n; we state a more general result that we will use later, see e.g. Takács [105], Wendel [108], Pitman [99].
Lemma 15.3. Let x1, …, xn ∈ {−1, 0, 1, …} with x1 + · · · + xn = −r ≤ 0. For j ∈ Z, let x1^{(j)}, …, xn^{(j)} be the cyclic shift defined by xi^{(j)} := x_{i+j}, with the index taken modulo n, and consider the corresponding partial sums Sk^{(j)} := Σ_{i=1}^k xi^{(j)}, k = 0, …, n. Then there are exactly r values of j ∈ {1, …, n} such that
  Sk^{(j)} > −r,  0 ≤ k < n.  (15.6)
Note that S0^{(j)} = 0 and Sn^{(j)} = −r for every j. The condition (15.6) thus says that the walk S0^{(j)}, …, Sn^{(j)} first reaches −r at time n. The case r = 0 is trivial: since S0^{(j)} = 0, (15.6) then is never satisfied for k = 0.
Proof. We extend the definition of xj to all j ∈ Z by taking the index modulo n; thus x_{j+n} = xj. We further define Sk for all k ∈ Z by S0 = 0 and Sk − S_{k−1} = xk, k ∈ Z; thus Sk = Σ_{i=1}^k xi when k ≥ 0 and Sk = −Σ_{i=k+1}^0 xi when k < 0. Then S_{k+n} = Sk − r for all k ∈ Z, and Sk^{(j)} = S_{k+j} − Sj.
Let further
  Mk := min_{−∞<i≤k} Si = min_{k−n<i≤k} Si;
note that Mk is finite and M_{k+n} = Mk − r. Moreover, M_{k+1} ≤ Mk and M_{k+1} − Mk is 0 or −1, since S_{k+1} = Sk + x_{k+1} ≥ Sk − 1. We have
  Sk^{(j)} > −r, for 0 ≤ k < n ⟺ S_{k+j} − Sj > −r, for 0 ≤ k < n
    ⟺ S_{k+j} + r > Sj, for 0 ≤ k < n
    ⟺ S_{k+j−n} > Sj, for 0 ≤ k < n
    ⟺ Si > Sj, for j − n ≤ i < j
    ⟺ M_{j−1} > Sj
    ⟺ M_{j−1} > Mj.
In each interval of n integers, M decreases by r in steps of 1, so there are exactly r steps down, which completes the proof.
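The counting in Lemma 15.3 is easy to test empirically; the following hedged Python sketch (illustration only) draws a random sequence with entries ≥ −1 and negative sum, and checks that exactly r of its n cyclic shifts satisfy (15.6).

    import random

    def good_shifts(x):
        # number of cyclic shifts whose partial sums S_1,...,S_{n-1} stay > -r
        n, r = len(x), -sum(x)
        count = 0
        for j in range(n):
            s, ok = 0, True
            for k in range(n - 1):
                s += x[(j + k) % n]
                if s <= -r:
                    ok = False
                    break
            count += ok
        return count, r

    random.seed(1)
    while True:
        x = [random.randint(-1, 2) for _ in range(12)]
        if sum(x) < 0:
            break
    print(good_shifts(x))  # the two numbers agree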
Corollary 15.4. If (d1, …, dn) ∈ Bn−1,n, then exactly one of the n cyclic shifts of (d1, …, dn) is the degree sequence Λ(T) of a tree T ∈ Tn.
Proof. Let xi := di − 1. Then Σ_{i=1}^k xi = Σ_{i=1}^k di − k, so (15.3) is equivalent to Σ_{i=1}^k xi ≥ 0 for k < n, which for the shifted sequence is (15.6) with r = 1; further, Σ_{i=1}^n xi = n − 1 − n = −1. Hence the result follows by Lemma 15.3 with r = 1.
We now use our fixed weight sequence (wk ). We begin with the partition
function for simply generated trees. This was proved (in the probability weight
sequence case, which is no real loss of generality) by Otter [93], see also Dwass
[36]; an algebraic proof uses the Lagrange inversion formula [79], see e.g. Boyd
[19] and Drmota [33, Theorem 2.11]; Kolchin [76] gives a different proof by
induction. See also Pitman [99] where the relation between different approaches
is discussed.
Theorem 15.5. Zn = Z(n − 1, n)/n.
Proof. By Corollary 15.4, the mapping (T, j) ↦ Λ(T)^{(j)}, where (j) denotes a cyclic shift as in Lemma 15.3, is a bijection of Tn × {1, …, n} onto Bn−1,n. Consequently, by (11.4), (15.1) and (2.5), since the weight w(y) is not changed by cyclic shifts,
  Z(n − 1, n) = Σ_{T∈Tn} Σ_{j=1}^n w(Λ(T)^{(j)}) = Σ_{T∈Tn} n w(Λ(T)) = Σ_{T∈Tn} n w(T) = nZn.
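For wk ≡ 1, Theorem 15.5 reduces to a classical identity, which the following hedged Python sketch verifies: Zn is the number of ordered trees with n nodes, the Catalan number C_{n−1}, while Z(n − 1, n) counts all of Bn−1,n, i.e. the C(2n−2, n−1) nonnegative integer solutions of d1 + · · · + dn = n − 1.

    from math import comb

    for n in range(2, 9):
        catalan = comb(2 * (n - 1), n - 1) // n   # Z_n = C_{n-1}
        allocations = comb(2 * n - 2, n - 1)      # Z(n-1, n) for w_k = 1
        print(n, catalan, allocations / n)        # equal, as Theorem 15.5 states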
Corollary 15.6. Suppose that w0 > 0 and ω(w) ≥ 2, with d := span(w) ≥ 1. If Zn > 0, then n ≡ 1 (mod d). Conversely, for some n0 (depending on w), if n ≡ 1 (mod d) and n ≥ n0, then Zn > 0.
Proof. By Theorem 15.5, Zn > 0 ⟺ Z(n − 1, n) > 0. The result follows from Lemma 13.3.
In the same way we can compute various probabilities for the random tree
Tn . We begin with the root degree d+ (o); note that for any tree T , v1 is the root
o, so d+ (o) = d+ (v1 ) = d1 .
Lemma 15.7. For any d ≥ 0 and n ≥ 2,
  P(d+_{Tn}(o) = d) = (n/(n − 1)) d P(Y1^{(n−1,n)} = d).  (15.7)
Thus, the distribution of the root degree d+_{Tn}(o) of Tn is the size-biased distribution of Y1^{(n−1,n)}.
Lemma 15.7 is a special case of Lemma 15.9 below, but we prefer to study
this simpler case first because it shows the main ideas in the proof without the
complications (notational and others) in the more general version.
Proof. Consider an allocation (d1, …, dn) ∈ Bn−1,n; if d1 = d, then (d2, …, dn) is an allocation in Bn−1−d,n−1. Furthermore, by Lemma 15.2, an allocation (d2, …, dn) ∈ Bn−1−d,n−1 is obtained by dropping the first term from the degree sequence of a tree T ∈ Tn with d1 = d if and only if
  d + Σ_{i=2}^k di ≥ k,  1 ≤ k < n,  (15.8)
or, equivalently,
  Σ_{i=1}^k d_{i+1} ≥ k + 1 − d,  0 ≤ k < n − 1.
We use Lemma 15.3 again, now with xi = d_{i+1} − 1 and r = d, and see that for any (d2, …, dn) ∈ Bn−1−d,n−1, exactly r = d of the n − 1 cyclic shifts of (d2, …, dn) satisfy (15.8). Thus, by considering all trees T with d1 = d and the n − 1 cyclic shifts of (d2, …, dn), we obtain each allocation (d1, …, dn) ∈ Bn−1,n with d1 = d exactly r = d times. (It is possible that some shifts of (d2, …, dn) coincide, but this does not matter.) Consequently, using (2.4) and (11.3), and recalling d+(o) = d1,
  (n − 1) Zn P(d+_{Tn}(o) = d) = (n − 1) Σ_{T∈Tn : d1(T)=d} w(T)
    = d Σ_{(d1,…,dn)∈Bn−1,n : d1=d} w((d1, …, dn)) = d Z(n − 1, n) P(Y1 = d).
This yields the result by Theorem 15.5.
Remark 15.8. More explicitly we have
  Z(n − 1, n) P(Y1^{(n−1,n)} = d) = Σ_{(d1,…,dn)∈Bn−1,n : d1=d} w((d1, …, dn))
    = Σ_{(d2,…,dn)∈Bn−1−d,n−1} wd w((d2, …, dn)) = wd Z(n − 1 − d, n − 1),
and thus
  P(d+_{Tn}(o) = d) = d wd · (n/(n − 1)) · Z(n − 1 − d, n − 1)/Z(n − 1, n).  (15.9)
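Formula (15.9) can be checked by brute force for small n; the Python sketch below (hedged, for wk ≡ 1 only) enumerates degree sequences via Lemma 15.2 and uses Z(m, N) = C(m + N − 1, N − 1) for this weight sequence.

    from itertools import product
    from math import comb

    n = 5
    trees = [d for d in product(range(n), repeat=n)
             if sum(d) == n - 1
             and all(sum(d[:k]) >= k for k in range(1, n))]   # (15.3)-(15.4)
    for deg in range(1, n):
        lhs = sum(1 for d in trees if d[0] == deg) / len(trees)
        rhs = (deg * (n / (n - 1))
               * comb(2 * n - 3 - deg, n - 2) / comb(2 * n - 2, n - 1))
        print(deg, lhs, rhs)  # the two columns agree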
Proof of Theorem 7.10. We have P(Y1^{(n−1,n)} = d) → πd by Theorem 11.7 (with m = n − 1 and λ = 1), and (7.9) follows from Lemma 15.7.
The space N̄0 = N0 ∪ {∞} is compact, so every sequence of random variables in it is tight, and therefore has a subsequence converging in distribution, see [15, Section 6]. It follows from (7.9) that if d+_{Tn}(o) →d X along a subsequence, then P(X = k) = kπk for every k ∈ N0, and thus P(X = ∞) = 1 − Σ_{k=0}^∞ kπk = 1 − μ. Consequently, X =d ξ̂, so d+_{Tn}(o) →d ξ̂ for every convergent subsequence, which means that the entire sequence converges to ξ̂, see [15, Theorem 2.3].
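As a hedged Monte Carlo sketch of this size-biasing (illustration only; the limit holds as n → ∞, so a small n gives only rough agreement), one can sample Po(1) Galton–Watson trees conditioned on |T| = n by rejection and compare the root degree with kπk = e^{−1}/(k − 1)!:

    import math, random

    random.seed(0)

    def po1():
        # Poisson(1) sampler by inversion
        u, k, p, s = random.random(), 0, math.exp(-1), math.exp(-1)
        while u > s:
            k += 1
            p /= k
            s += p
        return k

    def root_degree_if_size(n):
        # grow a Po(1) Galton-Watson tree; return its root degree if |T| = n
        d0 = po1()
        size, todo = 1 + d0, d0
        while todo:
            todo -= 1
            d = po1()
            size += d
            todo += d
            if size > n:
                return None
        return d0 if size == n else None

    n, samples = 10, []
    while len(samples) < 5000:
        r = root_degree_if_size(n)
        if r is not None:
            samples.append(r)
    for k in range(1, 5):
        print(k, samples.count(k) / len(samples), math.exp(-1) / math.factorial(k - 1))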
This proves the part of Theorem 7.1 that describes the root degree. It remains
to consider all other nodes. This will be done by similar arguments. We begin
with a generalization of Lemma 15.7.
Lemma 15.9. Let T′ ∈ Tf be a fixed finite subtree of the Ulam–Harris tree U∞, let ℓ := |T′| be its size, let v1, …, vℓ be its nodes in depth-first order, and let d′1, …, d′ℓ be its degree sequence. (I.e., d′i = d+_{T′}(vi).) Suppose that d1, …, dℓ ∈ N0 and that di ≥ d′i for every i. Then, for every n > ℓ,
  P(d+_{Tn}(vi) = di for i = 1, …, ℓ) = (Σ_{i=1}^ℓ di − ℓ + 1) (n/(n − ℓ)) P(Yi^{(n−1,n)} = di for i = 1, …, ℓ).  (15.10)
Note that d+_T(vi) ≥ d′i for i = 1, …, ℓ implies that T ⊃ T′.
Proof. We have earlier used the depth-first order of the nodes to define the degree sequence, but many other orders could be used. In this proof, we consider only trees T that contain the given T′ as a subtree, and then we choose the order which first takes the nodes of T′ in depth-first order (this is v1, …, vℓ), and then the remaining nodes of T in depth-first order; let Λ′(T) be the degree sequence in this order.
Let An be the set of trees T ∈ Tn with d+_T(vi) = di for all i (which implies T ⊃ T′). If T ∈ An, then the degree sequence Λ′(T) thus begins with the given d1, …, dℓ; furthermore, it satisfies (15.3)–(15.4). Conversely, every sequence beginning with the given d1, …, dℓ that satisfies (15.3)–(15.4) is the degree sequence Λ′(T) of a unique tree in An. Note also that (15.3) is automatically satisfied for k < ℓ, since then di ≥ d′i for i ≤ k and Σ_{i=1}^k d′i ≥ k by Lemma 15.2 applied to T′.
Let D := d1 + · · · + dℓ. Consider a sequence (d1, …, dn) ∈ Bn−1,n beginning with the given d1, …, dℓ, and let xi := d_{ℓ+i} − 1, for i = 1, …, n − ℓ. Then (d1, …, dn) satisfies (15.3) if and only if
  D + Σ_{i=1}^k (xi + 1) ≥ ℓ + k
for k = 0, …, n − ℓ − 1, which is equivalent to
  Σ_{i=1}^k xi ≥ −(D − ℓ),  0 ≤ k < n − ℓ.
Furthermore,
  Σ_{i=1}^{n−ℓ} xi = Σ_{i=ℓ+1}^n di − (n − ℓ) = (n − 1 − D) − (n − ℓ) = −(D − ℓ + 1).
Lemma 15.3 with r = D − ℓ + 1 thus shows that of the n − ℓ cyclic permutations of d_{ℓ+1}, …, dn, exactly D − ℓ + 1 yield a degree sequence Λ′(T) of a tree T ∈ An. In other words, if we take the degree sequences Λ′(T) for all trees T ∈ An and make these n − ℓ permutations of each of them, then we obtain every allocation y = (y1, …, yn) ∈ Bn−1,n with yi = di, i = 1, …, ℓ, exactly D − ℓ + 1 times each. Consequently,
  (n − ℓ) Zn P(Tn ∈ An) = (n − ℓ) Σ_{T∈An} w(T) = Σ_{T∈An} (n − ℓ) w(Λ′(T))
    = Σ_{y∈Bn−1,n : yi=di for i≤ℓ} (D − ℓ + 1) w(y)
    = (D − ℓ + 1) Z(n − 1, n) P(Yi = di for i ≤ ℓ).
The result follows by Theorem 15.5.
Remark 15.10. Arguing as in Remark 15.8, we obtain from Lemma 15.9 the explicit formula, generalizing (15.9), with D := Σ_{i=1}^ℓ di and other notations as above,
  P(d+_{Tn}(vi) = di for i = 1, …, ℓ) = (D − ℓ + 1) (n/(n − ℓ)) w_{d1} · · · w_{dℓ} Z(n − D − 1, n − ℓ)/Z(n − 1, n).  (15.11)
Remark 15.11. Note that Lemma 15.9 (or (15.11)) shows that the probability remains exactly the same if we permute d1, …, dℓ, provided that the permuted sequence (d_{σ(i)}) still is allowed, i.e., d_{σ(i)} ≥ d′i for all i ≤ ℓ. However, if the latter condition fails for some i, then the probability typically becomes 0. (This is an interesting case of a symmetry that is not complete.)
For example, considering only the root o and its first child 1, we have
  P(d+_{Tn}(o) = d and d+_{Tn}(1) = d′) = P(d+_{Tn}(o) = d′ and d+_{Tn}(1) = d)
whenever d, d′ ≥ 1; however, if, say, d ≥ 1 and d′ = 0, then the right-hand side is 0 while the left-hand side in general is not.
Remark 15.12. Lemma 15.9 extends with minor modifications (mainly notational) to arbitrary finite rooted subtrees T ′ of U∞ (not necessarily satisfying
(6.1)). We omit the details.
16. Proof of Theorem 7.1
For convenience, we first repeat the theorem.
Theorem 7.1. Let w = (wk)k≥0 be any weight sequence with w0 > 0 and wk > 0 for some k ≥ 2.
(i) If ν ≥ 1, let τ be the unique number in [0, ρ] such that Ψ(τ) = 1.
(ii) If ν < 1, let τ := ρ.
In both cases, 0 ≤ τ < ∞ and 0 < Φ(τ) < ∞. Let
  πk := τ^k wk/Φ(τ), k ≥ 0;  (7.1)
then (πk)k≥0 is a probability distribution, with expectation
  μ = Ψ(τ) = min(ν, 1) ≤ 1  (7.2)
and variance σ² = τΨ′(τ) ≤ ∞. Let T̂ be the infinite modified Galton–Watson tree constructed in Section 5 for the distribution (πk)k≥0. Then Tn →d T̂ as n → ∞, in the topology defined in Section 6.
Furthermore, in case (i), μ = 1 (the critical case) and T̂ is locally finite with an infinite spine; in case (ii), μ = ν < 1 (the subcritical case) and T̂ has a finite spine ending with an explosion.
Proof. First, as in the proof of Theorem 11.4, Lemma 13.2 shows that τ defined by (i) and (ii) is well-defined and equals τ(1) defined in Lemma 13.2; since 1 < 2 ≤ ω we have τ < ∞ and Φ(τ) < ∞. Further, (13.3) yields Ψ(τ) = min(1, ν).
Hence, by Lemma 4.2, (πk ) is a probability distribution with mean and variance
as asserted. (This is a special case of the corresponding claims in Theorem 11.4,
with λ = 1. We have λ = 1 here since we relate the random trees to allocations
with m = n − 1, and thus m/n → 1.)
The claims in the final paragraph are obvious from (7.2) and the construction
in Section 5.
We turn to the main assertion, Tn →d T̂. Since T is a compact metric space, any sequence of random trees in T is tight, and has thus a convergent subsequence. (See e.g. [15, Section 6].) In particular, this holds for Tn.
Consider a limiting random tree T in T such that Tn →d T along some subsequence. We will show that then T =d T̂, regardless of the subsequence; this implies Tn →d T̂ for the full sequence, which then completes the proof.
We have defined T in Section 6 such that T ⊂ N̄0^{V∞}, using the embedding T ↦ (d+_T(v))_{v∈V∞}. In order to show T =d T̂, it thus suffices to show that the distributions agree on cylinder sets, i.e., that (d+(v1), …, d+(vℓ)) ∈ N̄0^ℓ has the same distribution for T and T̂, for any finite set V = {v1, …, vℓ} ⊂ V∞. Since N̄0^ℓ is a countable set, this is equivalent to
  P(d+_T(v1) = d1, …, d+_T(vℓ) = dℓ) = P(d+_{T̂}(v1) = d1, …, d+_{T̂}(vℓ) = dℓ),  (16.1)
for any finite set V = {v1, …, vℓ} ⊂ V∞ and any d1, …, dℓ ∈ N̄0.
It thus suffices to show (16.1). Furthermore, given any finite set V ⊂ V∞, we may enlarge it to a finite set V satisfying (6.2)–(6.4), i.e., a set that is the node set of some finite tree in Tf. It thus suffices to show (16.1) for V = V(T′) with T′ ∈ Tf.
We make one more reduction. Suppose that V = V(T′) with T′ ∈ Tf and that (16.1) contains a condition d+(vi) = di with di < d+_{T′}(vi). Let v := vi and let u be the last child of v in T′; thus (recalling the notation in Section 6) u = vj for some integer j = d+_{T′}(v) > di. By (6.5), any tree T ∈ T with d+_T(v) = di has d+_T(u) = 0, and further (e.g. by (6.5) and induction) d+_T(s) = 0 for every descendant s of u. Thus, letting T′u denote the subtree of T′ rooted at u, for any s ∈ T′u, the event {d+_{T̂}(v) = di and d+_{T̂}(s) > 0} is impossible and has probability 0; furthermore, the same holds for T, i.e., P(d+_T(v) = di and d+_T(s) > 0) = 0. Consequently, if (16.1) contains a condition d+(vj) = dj with vj ∈ T′u and dj > 0, then both sides are trivially 0. On the other hand, if dj = 0 for all vj ∈ T′u, then the conditions d+(vj) = dj are redundant in (16.1) and may be deleted, so we may replace T′ by the smaller tree with T′u removed. Repeating this pruning, if necessary, we see that it suffices to show (16.1) for V = V(T′) when T′ ∈ Tf is a finite tree and di ≥ d+_{T′}(vi) for every i.
Recall that di in (16.1) may be infinite. We study three different cases separately.
Case (a): Every di < ∞. This is the case treated in Lemma 15.9; we take the limit as n → ∞ in (15.10) and obtain by Theorem 11.7 (with m = n − 1 and λ = 1 < ω(w)), letting again D := Σ_{i=1}^ℓ di,
  P(d+_{Tn}(vi) = di for i = 1, …, ℓ) → (D − ℓ + 1) Π_{i=1}^ℓ π_{di}.
Since we have assumed Tn →d T along a subsequence, this yields
  P(d+_T(vi) = di for i = 1, …, ℓ) = (D − ℓ + 1) Π_{i=1}^ℓ π_{di}.  (16.2)
Now consider the modified Galton–Watson tree T̂. (Recall its construction in Section 5.) If the tree T̂ has d+_{T̂}(vi) = di < ∞ for all vi ∈ T′, then the spine has to extend outside T′. The first point on the spine outside T′ is a node in ∂T′ (regarding T′ as a subtree of T̂). The condition d+_{T̂}(vi) = di for vi ∈ T′ determines the boundary ∂T′ of T′ in T̂, which thus does not depend on T̂, and Lemma 15.1 shows that |∂T′| = D − ℓ + 1.
Fix a node u ∈ ∂T′, and consider the event Eu that the spine of T̂ passes through u and that d+_{T̂}(vi) = di for i = 1, …, ℓ. The event Eu thus specifies the nodes in T′ that are special in the construction of T̂ (viz. the nodes on the path from o to u), and for each special node it specifies which of its children will be special; furthermore it specifies the number of children for each node in T′, special or not. Recall that the probability that a special node has d < ∞ children, with a given one of them being special, is πd, just as the probability that a normal node has d children. Thus, by independence, for every u ∈ ∂T′, P(Eu) = Π_{i=1}^ℓ π_{di}. This probability thus does not depend on u, so summing over the D − ℓ + 1 nodes u ∈ ∂T′ we obtain
  P(d+_{T̂}(vi) = di for i = 1, …, ℓ) = Σ_{u∈∂T′} P(Eu) = (D − ℓ + 1) Π_{i=1}^ℓ π_{di},
which together with (16.2) shows (16.1) in this case. (Cf. Remark 5.7 for a similar argument.)
Case (b): Exactly one di = ∞. Suppose that dj = ∞ and di < ∞ for i ≠ j. Define, for 0 ≤ k ≤ ∞,
  Ak := {T ∈ T : d+_T(vi) = di for i ≠ j and d+_T(vj) = k}.
We thus want to show P(T ∈ A∞) = P(T̂ ∈ A∞). We define further
  A≥K := ∪_{K≤k≤∞} Ak,
and note that since Tn →d T (along a subsequence), we have (along the subsequence), for any finite K,
  P(Tn ∈ A≥K) → P(T ∈ A≥K).  (16.3)
We define also (for finite k) the analogous
  Bk := {(y1, …, yn) ∈ Bn−1,n : yj = k and yi = di for i ≤ ℓ with i ≠ j}.
Then Lemma 15.9 can be written, with D′ := Σ_{i≠j} di, for k < ∞,
  P(Tn ∈ Ak) = (k + D′ − ℓ + 1) (n/(n − ℓ)) P(Bn−1,n ∈ Bk).  (16.4)
Consider, for simplicity, k > k0 := max_{i≠j} di. Then (14.44) shows that, with Ni = Ni(Bn−1,n),
  P(Bn−1,n ∈ Bk) = E P(Bn−1,n ∈ Bk | N0, N1, …) = E[(Nk/n) Π_{i≠j} (N_{di} + O(1))/(n + O(1))]
    = E[(Nk/n) Π_{i≠j} N_{di}/n] + O(E[Nk/n²]).
(The implicit constants in the O's in this proof may depend on ℓ and d1, …, dℓ, but not on n or k.) Consequently, by (16.4),
  P(Tn ∈ Ak) = (k + O(1))(1 + O(n^{−1})) (E[(Nk/n) Π_{i≠j} N_{di}/n] + O(E[Nk/n²]))
    = (1 + O(k^{−1})) E[(kNk/n) Π_{i≠j} N_{di}/n] + O(E[kNk/n²]).
Summing over k ≥ K, we obtain for any K > k0, using Σ_{k=0}^∞ kNk = n − 1 for any allocation Bn−1,n,
  P(Tn ∈ A≥K) = Σ_{k=K}^∞ P(Tn ∈ Ak)
    = (1 + O(K^{−1})) E[(Σ_{k≥K} kNk/n) Π_{i≠j} N_{di}/n] + O(E[Σ_{k≥K} kNk/n²])
    = (1 + O(K^{−1})) E[((n − 1 − Σ_{k<K} kNk)/n) Π_{i≠j} N_{di}/n] + O(n^{−1}).  (16.5)
By Theorem 11.4, for any fixed K, as n → ∞,
  ((n − 1 − Σ_{k<K} kNk)/n) Π_{i≠j} N_{di}/n →p (1 − Σ_{k<K} kπk) Π_{i≠j} π_{di}.
By dominated convergence, the expectation converges to the same limit, and thus (16.3) and (16.5) yield, for K > k0,
  P(T ∈ A≥K) = (1 + O(K^{−1})) (1 − Σ_{k<K} kπk) Π_{i≠j} π_{di}.  (16.6)
Finally, let K → ∞ to obtain
  P(T ∈ A∞) = (1 − Σ_{k<∞} kπk) Π_{i≠j} π_{di} = (1 − μ) Π_{i≠j} π_{di}.  (16.7)
Now consider T̂. If d+_{T̂}(vj) = dj = ∞, then the spine ends with an explosion at vj. This fixes the spine, and the event that d+_{T̂}(vi) = di for i ≠ j then means, just as in case (a) when we considered a specific Eu, that we have specified the number of children to be di for these nodes, and for the special nodes (except vj) we have also specified which child is special. The probability of this is π_{di} for each i ≠ j, and the probability that the special node vj has an infinite number of children is, by (5.2), 1 − μ. Hence, by independence,
  P(T̂ ∈ A∞) = (1 − μ) Π_{i≠j} π_{di},  (16.8)
which together with (16.7) shows P(T ∈ A∞) = P(T̂ ∈ A∞), which is (16.1) in this case.
Case (c): More than one di = ∞. By the definition of the modified Galton–
Watson tree Tb , there is at most one node with infinite degree, so in this case,
P d+
(v ) = di for i = 1, . . . , ℓ = 0.
Tb i
This means that the sum of these probabilities for all sequences (d1 , . . . , dn )
with at most one infinite value is 1. But we have shown that for such sequences,
the probability is the same for T as for Tb , so the probabilities for T for these
sequences also sum up to 1. Consequently, if more than one di = ∞, then
(v
)
=
d
for
i
=
1,
.
.
.
,
ℓ
=0
P d+
i
i
T
too, which shows (16.1) in this case.
This shows that (16.1) holds for any v1 , . . . , vm such that {v1 , . . . , vm } =
m
V (T ′ ) where T ′ ∈ Tf is a finite tree and (d1 , . . . , dn ) is any sequence in N0 with
+
di > dT ′ (vi ) for every i. As discussed above, this implies (16.1) in full generality
d
d
and thus T = Tb , which shows that Tn −→ Tb .
17. Proofs of Theorems 7.11 and 7.12
We begin by stating another version of the correspondence between simply generated trees and the balls-in-boxes model.
Lemma 17.1. We may couple Tn and Bn−1,n such that the degree sequence
Λ(Tn ) is a cyclic shift of Bn−1,n , and, conversely, Bn−1,n is a uniformly random
cyclic shift of Λ(Tn ).
178
S. Janson
Proof. Let Bn−1,n = (Y1 , . . . , Yn ) and let (Yσ(1) , . . . , Yσ(n) ) be the unique cyclic
shift of (Y1 , . . . , Yn ) that is the degree sequence of a tree in Tn , see Corollary 15.4.
d
Then (Yσ(1) , . . . , Yσ(n) ) = Λ(Tn ), as a consequence of Corollary 15.4 and the
invariance of the weight w(Y1 , . . . , Yn ) under cyclic shifts. Consequently, we
may couple Bn−1,n and Tn such that (Yσ(1) , . . . , Yσ(n) ) = Λ(Tn ), and the result
follows.
Proof of Theorem 7.11. We use the coupling in Lemma 17.1. Then Nd in Theorem 7.11 equals Nd (Bn−1,n ) in Theorem 11.4, and thus (7.12) follows by (11.15).
We obtain (7.11) as a simple consequence of (7.12), using P(d+
Tn (v) = d |
(v)
=
d)
=
E
N
/n,
cf.
the
proof
of
Theorem
Nd ) = Nd /n and thus P(d+
11.7.
d
Tn
Alternatively, we can arrange so that d+
(v)
=
Y
,
and
the
result
then
follows
1
Tn
by Theorem 11.7.
Proof of Theorem 7.12. We use again the coupling in Lemma 17.1. Let T be a
fixed tree of size ℓ and let its degree sequence be (d¯1 , . . . , d¯ℓ ). Recall that we
have defined the degree sequence using depth-first search. It follows that if a
tree has degree sequence (d1 , . . . , dn ) and a node v is visited as node vj in the
depth-first search, then the subtree rooted at v has degree sequence (dj , . . . , dk ),
where we stop when this is a degree sequence of a tree, i.e., when it satisfies the
condition in Lemma 15.2. In particular, the subtree rooted at v equals T if and
only if (dj , . . . , dj+ℓ−1 ) = (d¯1 , . . . , d¯ℓ ). (Clearly, this is impossible if j > n−ℓ+1,
since then a tree would be completed with less size than ℓ.)
Consequently, NT equals the number of substrings (d¯1 , . . . , d¯ℓ ) in (Y1 , . . . , Yn ),
regarded as a cyclic sequence. In other words, if we let Ij be the indicator of
the event (Yj , . . . , Yj+ℓ−1 ) = (d¯1 , . . . , d¯ℓ ), where we define Yi := Yi−n for i > n,
then
n
X
Ij .
(17.1)
NT =
j=1
In particular, taking the expectation and using the rotational symmetry,
P(Tn;v = T ) =
1
E NT = E I1 = P (Y1 , . . . , Yℓ ) = (d¯1 , . . . , d¯ℓ ) ,
n
and thus Theorem 11.7 yields
P(Tn;v = T ) →
ℓ
Y
πd̄i = P(T = T ),
i=1
which proves (7.13).
In order to show the stronger result (7.14), we condition as in the proof of
Theorem 11.7 on N0 , N1 . . . and obtain, see (14.44),
E(Ij | N0 , N1 , . . . ) = P (Y1 , . . . , Yℓ ) = (d¯1 , . . . , d¯ℓ ) | N0 , N1 , . . .
ℓ
ℓ
Y
Y
(17.2)
Nd̄i − ci
Nd̄i
1
=
,
=
+O
n − i + 1 i=1 n
n
i=1
Simply generated trees and random allocations
179
where ci := |{j < i : d¯j = d¯i }|. If |j − k| > ℓ and |j − k ± n| > ℓ (i.e., j and k
have distance at least ℓ, regarded as point on a circle of length n), then similarly,
with c′i := |{j 6 ℓ : d¯j = d¯i }|,
E(Ij Ik | N0 , N1 , . . . ) =
ℓ
ℓ
Y
Nd̄i − ci Y Nd̄i − ci − c′i
,
n − i + 1 i=1 n − ℓ − i + 1
i=1
and it follows that
Cov(Ij , Ik | N0 , N1 , . . . ) = O(1/n).
(17.3)
For j and k of distance less than ℓ, we use the trivial
| Cov(Ij , Ik | N0 , N1 , . . . )| 6 1.
(17.4)
There are less than n2 pairs (j, k) of the first type and O(n) pairs of the second
type, and thus by (17.1) and (17.3)–(17.4),
Var(NT | N0 , N1 , . . . ) =
n X
n
X
j=1 k=1
Cov(Ij , Ik | N0 , N1 , . . . ) = O(n).
p
Consequently, NT /n − E(NT /n | N0 , N1 , . . . ) −→ 0, and thus by (17.1), (17.2)
and Theorem 11.4,
ℓ
N Y
Nd̄i
NT
T =E
+ op (1)
N0 , N1 , . . . + op (1) =
n
n
n
i=1
p
−→
ℓ
Y
πd̄i = P(T = T ).
i=1
18. Asymptotics of the partition functions
We have a simple asymptotic result for the partition function Z(m, n) (to the
first order in the exponent, at least if ρ > 0):
Theorem 18.1. Let w = (wk )k>0 be a weight sequence with w0 > 0 and wk > 0
for some k > 1. Suppose that n → ∞ and m = m(n) with span(w) | m, m → ∞
and m/n → λ where 0 6 λ < ω, and let τ be as in Theorem 11.4.
(i) If ρ > 0, then
1
log Z(m, n) → log Φ(τ ) − λ log τ ∈ (−∞, ∞).
n
(18.1)
(ii) If ρ = 0 and λ > 0, then
1
log Z(m, n) → ∞.
n
(18.2)
180
S. Janson
In both cases, the result can be written
1
Φ(t)
Φ(t)
= log inf
6 ∞.
log Z(m, n) → log inf
06t<∞ tλ
06t6ρ tλ
n
(18.3)
If 0 6 λ 6 ν and ρ > 0, the limit can also be written log Φ(τ ) − Ψ(τ ) log τ .
The formula (18.1) is shown by a physicists’ proof by Bialas, Burda and
Johnston [14].
Remark 18.2. If λ = 0, then τ = 0, and we interpret the right-hand side of
(18.1) as log Φ(0) = log w0 ; this is in accordance with (18.3).
It is easily seen that the result holds, with this limit, also in the rather trivial
case when m is bounded, provided Z(m, n) > 0.
Remark 18.3. If ω < ∞, then the result holds also when λ = ω, provided
Z(m, n) > 0, if we let τ = ∞ as in Remark 11.10 and interpret the right-hand
side of (18.1) as the limit value log wω , which again is in accordance with (18.3).
This follows from Remark 18.2 by the symmetry argument in Remark 11.10.
Remark 18.4. Using the function τ (x) defined in Theorem 11.6, the result
(18.1) can also be written, using the continuity of τ (x) and an extra argument
(which we omit) when λ = 0,
log Z(m, n) = n log Φ(τ (x)) − m log τ (x) + o(n)
(18.4)
Z(m, n) = Φ(τ (x))n τ (x)−m eo(n) .
(18.5)
or, equivalently,
As in Theorem 11.6, it suffices here that m/n 6 C < ω (and m → ∞).
Proof of Theorem 18.1. Note that the assumptions imply that Z(m, n) > 0 (at
least for n, and thus m, large) by Lemma 13.3. The equivalence between (18.1)–
(18.2) and (18.3) follows from (11.16).
(i): Assume first λ > 0. Since ρ > 0 and λ > 0, we then have τ > 0. Thus
w = (wk ) is equivalent to π = (πk ), and Lemma 11.3 yields
Z(m, n) = Z(m, n; w) = Φ(τ )n τ −m Z(m, n; π).
We saw in the proof of Theorem 11.4, case (a), that Lemmas 14.3 and 14.4 yield
Z(m, n; π) = exp(o(n)), and thus
Z(m, n) = exp n log Φ(τ ) − m log τ + o(n) ,
which yields (18.1).
It remains to consider the case λ = 0. Then m/n → 0, and we may assume
m < n/2. In any allocation of m balls, there are at most m non-empty boxes.
Let us mark 2m boxes, including all non-empty boxes. For each choice of the
marked boxes, we have in them
an allocation in Bm,2m , and only empty boxes
n
outside; since there are 2m
choices of marked boxes,
n
Z(m, n) 6
wn−2m Z(m, 2m).
(18.6)
2m 0
181
Simply generated trees and random allocations
On the other hand, any allocation of m balls in 2m boxes can be extended to
an allocation in Bm,n with the last n − 2m boxes empty; thus
Z(m, n) > w0n−2m Z(m, 2m).
We have, by Stirling’s formula, using m/n → λ = 0,
en 2m
n
2m
e 2m
m
1
1
log
=
log −
log
→ 0.
6 log
2m
n
n
2m
n
2
n
n
(18.7)
(18.8)
Moreover, by the case λ > 0 just proved, we have from (18.1) log Z(m, 2m) =
O(m) = o(n). Consequently, (18.6)–(18.8) yield
log Z(m, n) = (n − 2m) log w0 + o(n) = n log w0 + o(n),
showing (18.1) in the case λ = 0.
(ii): As in the proof of Lemma 14.4, we use the truncated weight sequence
w(K) defined in (14.14), where K is so large that span(w(K) ) = span(w) and
ω(w(K) ) > λ, and we let again ΦK and ΨK be the corresponding functions for
w(K) and define τK by ΨK (τK ) = λ.
For any t > 0, ΦK (t) → Φ(t) = ∞ as K → ∞, and thus (14.19) holds,
showing that for large K, ΨK (t) > λ and thus τK < t. Since t is arbitrary, this
shows that τK → 0 as K → ∞. Applying (i) to w(K) and its partition function
ZK we obtain, for every large K,
lim inf
n→∞
1
1
log Z(m, n) > lim
log ZK (m, n) = log ΦK (τK ) − λ log τK
n→∞ n
n
> log w0 − λ log τK .
As K → ∞, τK → 0 so the right-hand side tends to ∞, which completes the
proof.
Remark 18.5. The case ρ = 0 and λ = 0 is excluded from Theorem 18.1; in this
case, almost anything can happen. To see this, note first that by (18.6)–(18.8),
if m/n → λ = 0, then
1
1
log Z(m, n) = log w0 + log Z(m, 2m) + o(1).
n
n
(18.9)
1
Furthermore, by Theorem 18.1(ii), m
log Z(m, 2m) → ∞ as m → ∞, and hence
m/ log Z(m, 2m) → 0. We can choose m = m(n) → ∞ with m/n → 0 so
rapidly that m/n ≪ m/ log Z(m, 2m); then n1 log Z(m, 2m) → 0 and (18.9)
yields n1 log Z(m, n) → log w0 = log Φ(0).
We can also choose m with m/n → 0 so slowly that m/n ≫ m/log Z(m, 2m);
then n1 log Z(m, 2m) → ∞ and (18.9) yields n1 log Z(m, n) → ∞.
Moreover, we can choose m(n) oscillating between these two cases, and then
lim inf n1 log Z(m, n) = log Φ(0) and lim sup n1 log Z(m, n) = ∞, and we can arrange so that every number in [log Φ(0), ∞) is a limit point of some subsequence.
182
S. Janson
For many weight sequences with ρ = 0, one can choose m(n) such that
log Z(m, n) → a for any given a ∈ [log Φ(0), ∞]. For example for wk = k! as in
Example 10.8, we have by [64] and Theorem 15.5 Z(n−1, n) ∼ en! and it follows,
1
log Z(m, 2m) = log m + O(1), so
arguing similarly to (18.6) and (18.7), that m
1
taking m ∼ an/ log n, we obtain n log Z(m, n) → a by (18.9).
However, if wk increases very rapidly, it may be impossible to obtain convergence of the full sequence to a limit different from log Φ(0) or ∞, so we
can only achieve convergence of subsequences. For example, if w0 = 1 and
wk+1 > Z(k, 2k)2 , then Z(k + 1, 2(k + 1)) > wk+1 > Z(k, 2k)2 , and it follows
easily from (18.9) that lim sup n1 log Z(m, n) > 2 lim inf n1 log Z(m, n).
1
n
We apply Theorem 18.1 to simply generated trees.
Theorem 18.6. Let w = (wk )k>0 be any weight sequence with w0 > 0 and
wk > 0 for some k > 2. Suppose that n → ∞ with n ≡ 1 (mod span(w)), and
let τ be as in Theorem 7.1. Then
Φ(t)
1
log Zn → log Φ(τ ) − log τ = log inf
∈ (−∞, ∞].
06t<∞ t
n
The limit is finite if ρ > 0, and +∞ if ρ = 0.
Proof. An immediate consequence of Theorems 15.5 and 18.1.
For probability weight sequences, Theorem 18.6 can be expressed as follows,
cf. Remark 7.9.
Theorem 18.7. Let T be a Galton–Watson tree with offspring distribution ξ,
and assume that P(ξ = 0) > 0 and P(ξ > 1) > 0. Suppose that n → ∞ with
n ≡ 1 (mod span(ξ)), and let τ be as in Theorem 7.1. Then
1
Φ(t)
log P(|T | = n) → log Φ(τ ) − log τ = log inf
∈ (−∞, 0].
06t<∞ t
n
If E ξ = 1, or if E ξ < 1 and ρ = 1, then the limit is 0; otherwise it is strictly
negative. In other words, P(|T | = n) decays exponentially fast in the supercritial
case (then τ < 1) and in the subcritical case with ρ > 1 (then τ > 1), but only
subexponentially in the critical case and in the subcritical case with ρ = 1 (then
τ = 1).
Proof. We have P(|T | = n) = Zn , see Section 2, and we apply Theorem 18.6.
Since now (wk ) is a probability weight sequence, we have ρ > 1; furthermore,
inf 06t<∞ Φ(t)/t 6 Φ(1)/1 = 1, with equality if and only if τ = 1, see Remark 7.4. The final claims follow using the definition of τ in Theorem 7.1.
When ρ > 0 and λ > 0 (which are equivalent to τ > 0), we can also prove
stronger “local” versions of Theorems 18.1 and 18.6, showing that the partition
function behaves smoothly for small changes in m or n.
Theorem 18.8. Let w = (wk )k>0 be a weight sequence with w0 > 0 and
wk > 0 for some k > 1. Suppose that n → ∞ and m = m(n) with m/n → λ
Simply generated trees and random allocations
183
where 0 < λ < ω, and let τ be as in Theorem 11.4. If ρ > 0, then, for every
fixed k ∈ Z such that span(w) | k,
Z(m + k, n)
→ τ −k .
Z(m, n)
(18.10)
Proof. For any k > 0, by (11.2)–(11.4),
P(Y1 = k) =
and thus
wk Z(m − k, n − 1)
,
Z(m, n)
(18.11)
P(Y1 = k)
wk Z(m − k, n − 1)
=
.
P(Y1 = 0)
w0 Z(m, n − 1)
Since Theorem 11.7 yields
wk
πk
P(Y1 = k)
= τk
,
→
P(Y1 = 0)
π0
w0
we see (replacing n by n + 1) that (18.10) holds when −k ∈ supp(w). Furthermore, the set of k ∈ Z such that (18.10) holds for any allowed sequence m(n) is
easily seen to be a subgroup of Z (since we may replace m by m±k ′ for any fixed
k ′ ). Consequently, by (3.3), this set contains every multiple of span(w).
Theorem 18.9. Let w = (wk )k>0 be a weight sequence with w0 > 0 and
wk > 0 for some k > 1. Suppose that n → ∞ and m = m(n) with m/n → λ
where 0 6 λ < ω, and let τ be as in Theorem 11.4. Then,
Z(m, n + 1)
→ Φ(τ ).
Z(m, n)
(18.12)
Proof. By (18.11) with k = 0 and Theorem 11.7,
w0 Z(m, n − 1)
w0
= P(Y1 = 0) → π0 =
,
Z(m, n)
Φ(τ )
and the result follows since w0 6= 0.
For trees we have a corresponding result:
Theorem 18.10. Let w = (wk )k>0 be a weight sequence with w0 > 0 and
wk > 0 for some k > 2. If ρ > 0 and span(w) = 1, then
Zn+1
Φ(τ )
→
.
Zn
τ
Proof. By Theorems 15.5 and 18.8–18.9,
Zn+1
nZ(n, n + 1)
n
Z(n, n + 1)
Z(n, n)
=
=
·
·
→ Φ(τ )τ −1 .
Zn
(n − 1)Z(n − 1, n)
n−1
Z(n, n)
Z(n − 1, n)
184
S. Janson
We assumed here span 1 for convenience only; if span(w) = d, we instead
obtain, by a similar argument, Zn+d /Zn → (Φ(τ )/τ )d .
In the case ν > 1 and σ 2 = τ Ψ′ (τ ) < ∞ (which is automatic if ν > 1), i.e.
our case Iα, Theorem 18.6 can be sharpened substantially as follows, see Otter
[93], Meir and Moon [85], Kolchin [76], Drmota [33].
Theorem 18.11. Let w = (wk ), τ and σ 2 be as in Theorem 7.1, and let
d := span(w). If ν > 1 and σ 2 < ∞, then, for n ≡ 1 (mod d),
s
n
Φ(τ )n τ 1−n
Φ(τ )
Φ(τ )
·
Zn ∼ √
n−3/2 .
=d
2πΦ′′ (τ )
τ
n3/2
2πσ 2
d
(18.13)
Proof. Replacing (wk ) by (πk ) and using (4.3), we see that it suffices to consider
the case of a probability weight sequence with τ = Φ(τ ) = 1. By Theorem 15.5,
(11.5) and (8.1), in this case the result is equivalent to
d
,
P(Sn = n − 1) ∼ √
2πσ 2 n
which is the local central limit theorem in this case, see e.g. Kolchin [76, Theorem
1.4.2] or use Lemma 14.1 and Remark 14.2.
There is a corresponding improvement of Theorem 18.1.
Theorem 18.12. Let w = (wk ), m = m(n), τ amd σ 2 be as in Theorem 11.4,
and let √
d := span(w). If 0 < λ < ν, or λ = ν and σ 2 < ∞, then, for m =
λn + o( n) with m ≡ 0 (mod d),
d
Z(m, n) ∼ √
Φ(τ )n τ −m .
2πσ 2 n
(18.14)
Proof. Again it suffices to consider the case of a probability weight sequence
with τ = Φ(τ ) = 1; this time using (11.9). In this case the result is by (11.5)
equivalent to
d
P(Sn = m) ∼ √
,
2πσ 2 n
which again is the local central limit theorem and follows e.g. by Lemma 14.1
and Remark 14.2.
Remark 18.13. The asymptotic formula (18.14) holds for arbitrary m = m(n)
with 0 < c 6 m/n 6 C < ω and m ≡ 0 (mod d), and either C < ν or
C = ν and Φ′′ (ρ) < ∞ (which means that Ψ′ (ρ) < ∞ and thus the distribution
(11.13) has finite variance for τ = ν), provided τ is replaced by τ (m/n) given by
Ψ(τ (m/n)) = m/n. (Cf. Theorem 11.6.) The proof is essentially the same (as in
the proof of Theorem 11.6, it suffices to consider subsequences where m(n)/n
converges); we omit the details.
185
Simply generated trees and random allocations
In the case ν = λ (ν = 1 in the tree case) and σ 2 = ∞, we have no general
results but we can obtain similar more precise versions of Theorems 18.6 and 18.1
in the important case of a power-law weight sequence, Example 12.10. (We need
1 < α 6 2 here; if α 6 1, then ν = ∞ > λ, and if α > 2, then σ 2 < ∞ so
Theorems 18.11 and 18.12 apply, see Example 12.10 with β = α + 1. Note
also that span(w) = 1.) The case λ > ν is treated in Theorem 19.34 and
Remark 19.35.
Theorem 18.14. Suppose for some c > 0 and α with 1 < α 6 2,
wk ∼ ck −α−1
as k → ∞.
(18.15)
(i) If ν = 1, then,
Zn ∼
Φ(1)1/α
Φ(1)n n−1−1/α ,
c1/α Γ(−α)1/α |Γ(−1/α)|
when 1 < α < 2,
(18.16)
and
Zn ∼
Φ(1) 1/2
πc
Φ(1)n n−3/2 (log n)−1/2 ,
when α = 2.
(18.17)
(ii) If m = νn + o(n1/α ), then
Z(m, n) ∼
Φ(1)1/α
c1/α Γ(−α)1/α |Γ(−1/α)|
Φ(1)n n−1/α ,
when 1 < α < 2,
(18.18)
and
Z(m, n) ∼
Φ(1) 1/2 Φ(1)n
√
,
πc
n log n
when α = 2.
(18.19)
Proof. This time, we did not assume w0 > 0, but we may do so without loss
of generality in the proof. In fact, if w0 = 0, then ν > 1, so in (i) we always
have w0 > 0, and in (ii) we can reduce to the case w0 > 0 by the method in
Remark 11.8.
(i) follows from Theorem 15.5 and (ii), taking m = n − 1; hence it suffices to
prove (18.18)–(18.19).
We have ρ = 1, and in the usual notation λ = ν and thus τ = ρ = 1. We
reduce to the probability weight sequence case by dividing each wk by Φ(1)
(which changes c to c/Φ(1)). Let ξ be a random variable with the distribution
(πk ) = (wk ). Then E ξ = ν. Furthermore, (18.15) yields
P(ξ > k) =
∞
X
l=k
wl ∼ cα−1 k −α .
(18.20)
186
S. Janson
Hence ξ is in the domain of attraction of an α-stable distribution, see Feller [39,
Section XVII.5]. More precisely, if we first consider the case 1 < α < 2, then
there exists an α-stable random variable Xα such that
Sn − nν d
−→ Xα .
n1/α
(18.21)
(The distribution of Xα is given by (19.93) and (19.113) below.) Moreover, a local limit law holds, see e.g. Gnedenko and Kolmogorov [46, § 50], Ibragimov and
Linnik [54, Theorem 4.2.1] or Bingham, Goldie and Teugels [16, Corollary 8.4.3],
which says
ℓ − nν +
o(1)
,
(18.22)
P(Sn = ℓ) = n−1/α g
n1/α
uniformly for all integers ℓ, where g is the density function of Xα . In particular,
Z(m, n) = P(Sn = m) ∼ n−1/α g(0).
(18.23)
The results in [39, Sections XVII.5–6] show, if we keep track of the constants
(see e.g. [63] for calculations), that
g(0) = (c Γ(−α))−1/α |Γ(−1/α)|−1 ,
(18.24)
and (18.18) follows.
In the case α = 2, [39, Section XVII.5] similarly yields
Sn
d
√
−→ N (0, c/2);
n log n
(18.25)
again a local limit theorem holds by [54, Theorem 4.2.1] or [16, Corollary 8.4.3],
and thus
ℓ − nν 1
g √
+ o(1) ,
(18.26)
P(Sn = ℓ) = √
n log n
n log n
uniformly in ℓ ∈ Z, where now g(x) is the density function (πc)−1/2 e−x
N (0, c/2). In particular,
1
1
1
Z(m, n) = P(Sn = m) ∼ √
g(0) = √
·√ ,
πc
n log n
n log n
2
/c
of
(18.27)
which proves (18.19).
Remark 18.15. The proof shows that (18.15) can be relaxed to (18.20) together
with span(w) = 1.
u
Example 18.16. Let Fm,n
be the number of labelled unrooted forests with
m labelled nodes and n labelled trees, see Example 12.7. Using the weights
wk = k k−2 /k! and w
ek = e−k wk ∼ (2π)−1/2 k −5/2 , we have by (12.35) and (11.9)
u
e
Fm,n
= m! Z(m, n; w) = m! em Z(m, n; w).
(18.28)
187
Simply generated trees and random allocations
e with α = 3/2. We
At the phase transition m = 2n, Theorem 18.14 applies to w
have c = (2π)−1/2 and, by (12.36), Φ(1) = Φ(ρ) = 1/2. Hence (18.18) yields,
after simplifications,
u
F2n,n
2−2/3 3−1/3 2n −n −2/3
e ∼
= Z(2n, n; w) = e2n Z(2n, n; w)
e 2 n
.
2n!
Γ(1/3)
(18.29)
(The constant can also be written 2−5/3 31/6 π −1 Γ(2/3).) A more general result
is proved by the same method by Britikov [20]. Flajolet and Sedgewick [40,
Proposition VIII.11], show (18.29) by a different method (although there is a
computational error in the constant given in the result there).
We end
section by considering the behaviour of the generating function
Pthis
∞
Z(z) := n=1 Zn z n . The following immediate corollary of Theorem 18.6 was
shown by Otter [93], see Minami [89] and, for ν > 1, Flajolet and Sedgewick
[40, Proposition IV.5]. See also also Remark 7.5.
Corollary 18.17. Let (wk )k>0 and τ be as in Theorem 7.1,
P∞and let ρZ be
the radius of convergence of the generating function Z(z) := n=1 Zn z n . Then
ρZ = τ /Φ(τ ).
Moreover, by (7.6), Z(ρZ ) = τ < ∞. Since the generating function Z(z) has
non-negative coefficients, it follows that Z(z) is continuous on the closed disc
|z| 6 ρZ , and |Z(z)| 6 τ there. If we, for simplicity, assume that span(w) = 1,
then |Z(z)| < |Z(ρZ )| = τ for |z| 6 ρZ , z 6= ρZ . Since |Z| < τ implies
∞
∞
X
X
|Φ(Z) − ZΦ′ (Z)| = w0 −
(k − 1)wk Z k > w0 −
(k − 1)wk |Z|k
k=1
> w0 −
∞
X
k=1
k=1
(k − 1)wk τ k = Φ(τ ) − τ Φ′ (τ ) = 0,
it follows that Φ(Z) − ZΦ′ (Z) 6= 0 if Z = Z(z) with |z| = ρZ , z 6= ρZ ; hence
the implicit function theorem and (3.13) show that Z(z) has an analytic continuation to some neighbourhood of z. Consequently, Z then can be extended
across |z| = ρZ everywhere except at z = ρZ . (If span(w) = d, the same holds
except at z = ρZ e2πij/d , j ∈ Z.)
In our case Ia (ν > 1, or equivalently τ < ρ), much more is known: Z has
a square root
p singularity at ρZ with a local expansion of Z(z) as an analytic
function of 1 − z/ρZ :
Z(z) = τ − b
p
1 − z/ρZ + . . . ,
where, with σ 2 := Var ξ given by (8.1),
s
2Φ(τ ) √ τ
b :=
= 2 ,
Φ′′ (τ )
σ
(18.30)
(18.31)
188
S. Janson
see Meir and Moon [85], Flajolet and Sedgewick [40, Theorem VI.6] and Drmota
[33, Section 3.1.4 and Theorem 2.19]; in particular, Z then extends analytically
to a neighbourhood of ρ cut at the ray [ρ, ∞). In fact, this extends (in a weaker
form) to the case ν > p
1 and σ 2 < ∞ (case Iα): (18.30) holds in a suitable region,
with an error term o( 1 − z/ρZ ), see Janson [59].
Remark 18.18. In the case ν > 1, (18.30) and (18.31) yield another proof
of (18.13) by standard singularity analysis, see e.g. Drmota [33, Theorem 3.6]
and Flajolet and Sedgewick [40, Theorem VI.6 and VII.2]; this argument can
be extended to the case ν > 1 and σ 2 < ∞, see Drmota [33, Remark 3.7] and
Janson [59, Appendix]. When ν > 1, an expansion with further terms can also
be obtained, see Minami [89] and Flajolet and Sedgewick [40, Theorem VI.6].
In the other cases (σ 2 = ∞ or ν < 1), the asymptotic behaviour of Z at the
singularity ρZ depends on the behaviour of Φ(z) at its singularity ρ. It seems
difficult to say anything detailed in general, so we study only a few examples.
We assume ν 6 1 and ω > 1; thus Lemma 3.1 implies that ρ < ∞, Φ(ρ) < ∞
and Φ′ (ρ) < ∞. We assume also ρ > 0 and span(w) = 1.
Example 18.19. Suppose that 0 < ρ < ∞ and that Φ(z) has an analytic
extension to a sector Dρ,δ := {z : | arg(ρ − z)| < π/2 + δ and |z − ρ| < δ} for
some δ > 0, and that in this sector Dρ,δ , for some a 6= 0 and non-integer α > 1,
and some f (z) analytic at ρ (which can be taken as a polynomial of degree < α),
as z → ρ.
(18.32)
Φ(z) = f (z) + a(ρ − z)α + o |ρ − z|α ,
(We have to have α > 1 since Φ′ (ρ) < ∞. For α > 2 integer, see instead
Example 18.20.) If we assume that Φ has no further singularities on |z| = ρ, this
implies by singularity analysis, see Flajolet and Sedgewick [40, Section VI.3],
wk ∼
a
k −α−1 ρα−k ,
Γ(−α)
as k → ∞.
(18.33)
The converse does not hold in general, but can be expected if the weight sequence
is very regular. For example, (18.32) holds (in the plane cut at [ρ, ∞)) if wk =
(k + 1)−β , k > 1, as in Example 10.7, with β = α + 1 > 2, ρ = 1 and a = Γ(−α),
see e.g. [40, Section VI.8].
Let F (Z) := Z/Φ(Z), so (3.13) can be written
F (Z(z)) = z.
(18.34)
Since ν 6 1, we have τ = ρ, and thus by Corollary 18.17 and (7.6) ρZ = F (τ ) =
F (ρ) and Z(ρZ ) = ρ. Note that
F ′ (ρ) =
1 − Ψ(ρ)
1−ν
Φ(ρ) − ρΦ′ (ρ)
=
=
.
2
Φ(ρ)
Φ(ρ)
Φ(ρ)
(18.35)
If ν < 1, then (18.35) yields F ′ (ρ) > 0 and (18.34) shows that ρ − Z(z) ∼
F ′ (ρ)−1 (ρZ − z) as z → ρZ . Moreover, F is defined in a sector Dρ,δ , and its
Simply generated trees and random allocations
189
image contains some similar sector DρZ ,δ′ (with 0 < δ ′ < δ) such that Z(z)
extends analytically to DρZ ,δ′ by (18.34), and it follows easily by (18.34) and
(18.32) that in DρZ ,δ′ , with some f1 (z) analytic at ρZ ,
as z → ρZ ,
(18.36)
Z(z) = f1 (z) + a1 (ρZ − z)α + o |ρZ − z|α ,
where
ρΦ(ρ)α−1
.
(1 − ν)α+1
(18.37)
ρ
a1
Φ(ρ)n−1 wn .
n−α−1 ρα−n
∼
Z
Γ(−α)
(1 − ν)α+1
(18.38)
a1 = a
As noted above, Z(z) has no other singularities on |z| = ρ, and singularity
analysis [40] applies and shows, using (18.33),
Zn ∼
However, we will show in greater generality in Theorem 19.34 and Remark 19.35
(by a straightforward reduction to the case ρ = 1 using (4.3)) that (18.33) always
implies (18.38) when ν < 1, without any assumption like (18.32) on Φ(z).
If ν = 1, we assume 1 < α < 2, since (18.32) with α > 2 implies Φ′′ (ρ) < ∞
and thus σ 2 < ∞, so (18.30) and Theorem 18.11 would apply. We now have
F ′ (ρ) = 0, and (18.32)–(18.34) yield, in some domain DρZ ,δ′ ,
1/α z 1/α
Φ(ρ)
1−
+ ....
(18.39)
Z(z) = ρ −
a
ρZ
Singularity analysis yields
Zn ∼
1
|Γ(−1/α)|
Φ(ρ)
a
1/α
n−1−1/α ρZ −n .
(18.40)
However, we have already proved in Theorem 18.14(i) (assuming ρ = 1, without loss of generality) that (18.33) implies (18.40) in this case, without any
assumption like (18.32) on Φ(z).
Example 18.20. If α > 2 is an integer, (18.32) does not exhibit a singularity.
Instead we consider w with, for some f analytic at ρ,
(18.41)
Φ(z) = f (z) + a(ρ − z)α log(ρ − z) + O |ρ − z|α ,
as z → ρ in some sector Dρ,δ . This includes the case wk = (k + 1)−α−1 , see
Flajolet and Sedgewick [40, Section VI.8].
In the case ν < 1, we obtain as above
(18.42)
Z(z) = f1 (z) + a1 (ρZ − z)α log(ρZ − z) + O |ρZ − z|α ,
as z → ρZ in some sector, with f1 (z) analytic at ρZ and a1 given by (18.37).
We again obtain by singularity analysis
ρ
Φ(ρ)n−1 wn ,
(18.43)
Zn ∼
(1 − ν)α+1
which is another instance of (19.118).
190
S. Janson
In the case ν = 1, we consider only α = 2, since σ 2 < ∞ if α > 2. Then
(18.41) yields (we have a < 0 in this case)
Z(z) = ρ −
2Φ(ρ)
−a
1/2
1 − z/ρZ
1/2
−1/2
+ . . . . (18.44)
− log (1 − z/ρZ )
Singularity analysis [40, Theorems VI.2–3] gives another proof of (18.17) in the
special case (18.41) (again assuming ρ = 1, as we may).
P∞
j
Example 18.21. Define w by Φ(z) = w0 + j=0 2−2j z 2 , for some w0 > 0; thus
supp(w) is the lacunary sequence {0} ∪ {2j }. Then ρ = 1, Φ(ρ) = w0 + 4/3 and
Φ′ (ρ) = 2; hence ν = Ψ(ρ) = 2/(w0 + 4/3). The function Φ(z) is analytic in the
unit disc and has the unit circle as a natural boundary; it cannot be extended
analytically at any point. (See e.g. Rudin [101, Remark 16.4 and Theorem 16.6].)
Taking w0 > 2/3, we have ν < 1; hence, F ′ (ρ) > 0 by (18.35). Thus F maps
the unit circle onto a closed curve Γ that goes vertically through F (1) = ρz , and
since F cannot be continued analytically across the unit circle, Z(z) cannot be
continued analytically across the curve Γ. In particular, Z(z) is not analytic in
any sector DρZ ,δ′ .
19. Largest degrees and boxes
Consider a random allocation Bm,n = (Y1 , . . . , Yn ) and arrange Y1 , . . . , Yn in
decreasing order as Y(1) > Y(2) > . . . . Thus, Y(1) is the largest number of balls
in any box, Y(2) is the second largest, and so on.
By Lemma 17.1, we may also consider the random tree Tn by taking m = n−1;
then Y(1) is the largest outdegree in Tn , Y(2) is the second largest outdegree, and
so on.
As usual, we consider asymptotics as n → ∞ and m/n → λ. (Thus λ = 1
in the tree case.) We usually ignore the cases m/n → 0 and m/n → ∞; these
are left to the reader as open problems. (See e.g. Kolchin, Sevast’yanov and
Chistyakov [77], Kolchin [76], Pavlov [96] and Kazimirov [70] for examples of
such results.)
The results in Sections 7 and 11 suggest that Y(1) is small when λ < ν,
but large (perhaps of order n) when λ > ν, which is one aspect of the phase
transition at λ = ν. We will see that this roughly is correct, but that the full
story is somewhat more complicated.
We study the cases λ 6 ν and λ > ν separately; we also consider separately
several subcases of the first case where we can give more precise results.
We first note that the case ω < ∞, when the box capacities (node degrees in
the tree case) are bounded is trivial: w.h.p. the maximum is attained in many
boxes.
Theorem 19.1. Let w = (wk )k>0 be a weight sequence with w0 > 0 and ω < ∞.
Suppose that n → ∞ and m = m(n) with m/n → λ > 0. Then Y(j) = ω w.h.p.
for every fixed j.
Simply generated trees and random allocations
191
Proof. Clearly, each Yi 6 ω, so Y(j) 6 Y(1) 6 ω.
We assume tacitly, as always, that Bm,n exists, i.e. Z(m, n) > 0, and thus
m 6 ωn, so λ 6 ω. By Theorem 11.4 if λ < ω, and Remark 11.10 if λ = ω,
p
p
Nω (Bm,n )/n −→ πω > 0. In particular, Nω (Bm,n ) −→ ∞, and thus P(Y(j) =
ω) → 1.
19.1. The case λ 6 ν
In the case λ 6 ν, we show that, indeed, all Yi are small. Theorems 19.2–19.3
yield (w.h.p.) a bound o(n) when λ = ν, and a much stronger logarithmic bound
O(log n) when λ < ν. (In the tree case, we have λ = 1, so these are the cases
ν = 1 and ν > 1.)
Example 19.27 shows that in general, the bound o(n) when λ = ν is essentially
best possible; at least, for any given ε > 0, we can have Y(1) > n1−ε w.h.p. (We
do not know precisely how fast Y(1) can grow; see Problem 19.31.)
Theorem 19.2. Let w = (wk )k>0 be a weight sequence with w0 > 0 and
wk > 0 for some k > 1. Suppose that n → ∞ and m = m(n) with m/n → λ
where 0 6 λ < ∞. If λ 6 ν, then Y(1) = op (n).
p
Equivalently, Y(1) /n −→ 0.
Proof. The case λ = 0 is trivial, since Y(1) /n 6 m/n → λ. The case λ = ω is also
trivial, since then ω < ∞ and Y(1) 6 ω. As above, λ > ω is impossible. Hence
we may assume 0 < λ < ω and ν > λ > 0, which implies τ > 0, where Ψ(τ ) = λ,
cf. Theorem 11.4. We may then for convenience replace (wk ) by the equivalent
weight sequence (πk ) in (11.13); we may thus assume that w is a probability
weight sequence with τ = 1, and thus ρ > τ = 1, and then the corresponding
random variable ξ has E ξ = λ.
By (18.11) and symmetry, for any k > 0,
P(Y(1) = k) 6 n P(Y1 = k) = n
wk Z(m − k, n − 1)
.
Z(m, n)
(19.1)
Furthermore, wk = πk = P(ξ = k) 6 1 and, using Example 11.2, Z(m, n) =
P(Sn = m) = eo(n) by Lemma 14.3 (ρ > 1) or 14.4 (ρ = 1). We turn to
estimating Z(m − k, n − 1).
Let 0 < ε < λ, and define τε by Ψ(τε ) = λ − ε. Since Ψ(τ ) = λ, we have
0 < τε < τ = 1.
For each n, choose k = k(n) ∈ [εn, m] such that Z(m−k, n−1) is maximal. We
have ε 6 k/n 6 m/n → λ; choose a subsequence such that k/n converges, say
k/n → γ with ε 6 γ 6 λ. Then, along the subsequence, (m− k)/(n− 1) → λ− γ.
By Theorem 18.1 (and Remark 18.2, ignoring the trivial case Z(m−k, n−1) =
0), using τε < 1, γ > ε and (11.16),
1
Φ(t)
Φ(t)
log Z(m − k, n − 1) → log inf λ−γ 6 log inf λ−γ
06t6τε t
t>0 t
n
Φ(t)
Φ(τε )
6 log inf λ−ε = log λ−ε =: cε ,
06t6τε t
τε
192
S. Janson
say, where Remark 11.5 shows that, since τε 6= 1,
cε < log Φ(1)/1λ−ε = 0.
(19.2)
We have shown that
lim sup
n→∞
1
log Z(m − k, n − 1) 6 cε
n
(19.3)
for k = k(n) and any subsequence such that k/n converges; it follows that (19.3)
holds for the full sequence. In other words,
log Z(m − k, n − 1) 6 cε n + o(n)
(19.4)
for our choice k = k(n) that maximises the left-hand side, and thus uniformly
for all k ∈ [εn, m]. Using (19.4) and, as said above, Lemma 14.4 in (19.1) we
obtain, recalling (19.2),
P(Y(1) > εn) =
m
X
k=εn
P(Y(1) = k) 6 mnecε n+o(n) eo(n) = ecε n+o(n) → 0.
In other words, for any ε > 0, Y(1) < εn w.h.p., which is equivalent to Y(1) =
op (n).
The following logarithmic bound when λ < ν is essentially due to Meir and
Moon [86] (who studied the tree case).
Theorem 19.3. Let w = (wk )k>0 be a weight sequence with w0 > 0 and wk > 0
for some k > 1. Suppose that n → ∞ and m = m(n) with m/n → λ. Assume
0 < λ < ν, and define τ ∈ (0, ρ) by Ψ(τ ) = λ.
(i) Then τ < ρ and
Y(1) 6
1
log n + op (log n).
log(ρ/τ )
(19.5)
(ii) In particular, if ρ = ∞, then Y(1) = op (log n).
1/k
(iii) If further wk
→ 1/ρ as k → ∞, then, for every fixed j > 1,
Y(j) p
1
−→
.
log n
log(ρ/τ )
1/k
(19.6)
1/k
Recall that 1/ρ = lim supk→∞ wk , see (3.5), so the extra assumption wk →
1/ρ as k → ∞ in (iii) holds unless the weight sequence is rather irregular. (The
proof shows that the assumption can be weakened to P(ξ > k)1/k → τ /ρ.)
It is not difficult to show Theorem 19.3 directly, but we prefer to postpone the
proof and use parts of the more refined Theorem 19.7 below, in order to avoid
some repetitions of arguments. Further results, under additional assumptions,
are given in Sections 19.3–19.4.
We conjecture that Theorem 19.3 holds also for λ = 0. Since then τ = 0, this
means the following. (This seems almost obvious given the result for positive λ
in Theorem 19.3, where the constant 1/ log(ρ/τ ) → 0 as λ → 0 and thus τ → 0,
but there is no general monotonicity and we leave this as an open problem.)
Conjecture 19.4. If ρ > 0 and m/n → 0, then Y(1) = op (log n).
193
Simply generated trees and random allocations
19.2. The subcase σ 2 < ∞
In the case σ 2 := Var ξ < ∞ (which includes the case λ < ν), there is a much
more precise result, which says that, simply, the largest numbers Y(1) , Y(2) . . .
asymptotically have the same distribution as the largest elements in the i.i.d.
sequence ξ1 , . . . , ξn . (Provided we choose the distribution of ξ correctly, and
possibly depending on n, see below for details.) In other words, the conditioning
in Example 11.2 then has asymptotically no effect on the largest elements of the
sequence. (When σ 2 = ∞ this is no longer necessarily true, however, as we shall
see in Example 19.27.)
In order to state this precisely, we now assume that ω = ∞ (see Theorem 19.1
otherwise) and 0 < λ 6 ν, and define as usual τ by Ψ(τ ) = λ, and let ξ be a
random variable with the distribution in (11.13).
If m/n 6 ν, we further define τn by Ψ(τn ) = m/n, and let ξ (n) be the random
variable with the distribution in (14.11). We will only use τn and ξ (n) in the
case λ < ν, so m/n → λ < ν and τn really is defined (at least for large n);
d
furthermore τn → τ < ρ and ξ (n) −→ ξ.
(n)
(n)
We further let ξ1 , . . . , ξn and (when λ < ν) ξ1 , . . . , ξn be i.i.d. sequences
(n)
of copies of ξ and ξ , respectively, and we arrange them in decreasing order
(n)
(n)
as ξ(1) > . . . > ξ(n) and ξ(1) > . . . > ξ(n) . Finally, we introduce the counting
variables, for any subset A ⊆ N0 ,
NA := |{i 6 n : Yi ∈ A}|,
N A := |{i 6 n : ξi ∈ A}|,
(n)
NA
(n)
:= |{i 6 n : ξi
∈ A}|.
(19.7)
(19.8)
(19.9)
(NA and N A also depend on n, but as usual, we for simplicity do not show this
(n)
in the notation.) Note that N A and N A simply have binomial distributions
(n)
NA
N A ∼ Bi(n, P(ξ ∈ A)) and
∼ Bi(n, P(ξ (n) ∈ A)).
We have
Y(j) 6 k ⇐⇒ N[k+1,∞) < j,
(19.10)
(n)
and similarly for ξ(j) and ξ(j) . Thus it is elementary to obtain asymptotic results
(n)
for the maximum ξ(1) of i.i.d. variables, and more generally for ξ(j) and ξ(j) , see
e.g. Leadbetter, Lindgren and Rootzén [82].
We introduce three different probability metrics to state the results. For discrete random variables X and Y with values in N0 (the case we are interested
in here), we define the Kolmogorov distance
dK (X, Y ) := sup | P(X 6 x) − P(Y 6 x)|
(19.11)
x∈N0
and the total variation distance
dTV (X, Y ) := sup | P(X ∈ A) − P(Y ∈ A)|.
A⊆N0
(19.12)
194
S. Janson
In order to treat also the case with variables tending to ∞, we further define
the modified Kolmogorov distance
| P(X 6 x) − P(Y 6 x)|
deK (X, Y ) := sup
.
1+x
x∈N0
(19.13)
For deK , we also allow random variables in N0 , i.e., we allow the value ∞. (Furthermore, the definitions of dK and dTV and the results for them in the lemma
below extend to random variables with values in Z. The definitions extend further to random variables with values in R for dK , and in any space for dTV , but
not all properties below hold in this generality.)
Note that these distances depend only on the distributions L(X) and L(Y ),
so d(L(X), L(Y )) might be a better notation, but we find it convenient to allow
both notations, as well as the mixed d(X, L(Y )).
It is obvious that the three distances above are metrics on the space of probability measures on N0 (or on N0 ).
We collect a few simple, and mostly well-known, facts for these three metrics
in a lemma; the proofs are left to the reader.
Lemma 19.5. (i) For any random variables X and Y with values in N0 ,
deK (X, Y ) 6 dK (X, Y ) 6 dTV (X, Y ).
(ii) For any X and X1 , X2 , . . . with values in N0 ,
d
Xn −→ X ⇐⇒ dTV (Xn , X) → 0 ⇐⇒ dK (Xn , X) → 0
⇐⇒ deK (Xn , X) → 0.
(iii) For any X and X1 , X2 , . . . with values in N0 ,
d
Xn −→ X ⇐⇒ deK (Xn , X) → 0.
In particular,
p
Xn −→ ∞ ⇐⇒ deK (Xn , ∞) → 0.
(iv) For any Xn and Xn′ with values in N0 , deK (Xn , Xn′ ) → 0 ⇐⇒ P(Xn 6
x) − P(Xn′ 6 x) → 0 for every fixed x > 0.
(v) For any Xn and Xn′ , dTV (Xn , Xn′ ) → 0 ⇐⇒ there exists a coupling
d
(Xn , Xn′ ) with Xn = Xn′ w.h.p. (We denote this also by Xn ≈ Xn′ .)
(vi) The supremum in (19.12) is attained, and the absolute value sign is
redundant. In fact, if A := {i : P(X = i) > P(Y = i)}, then dTV (X, Y ) =
P(X ∈ A) − P(Y ∈ A).
(vii) For any X and Y with values in N0 ,
X
X
dTV (X, Y ) =
P(X = x) − P(Y = x) + = 21
| P(X = x) − P(Y = x)|.
x∈N0
x∈N0
Simply generated trees and random allocations
195
Remark 19.6. The three metrics are, by Lemma 19.5(ii), equivalent in the
usual sense that they define the same topology, but they are not uniformly
equivalent. For example, if Xn ∼ Po(n), Xn′ := 2⌊Xn /2⌋ (i.e., Xn rounded
down to an even integer) and Xn′′ := Xn′ + 1, then dK (Xn′ , Xn′′ ) → 0 as n → ∞,
but dTV (Xn′ , Xn′′ ) = 1.
We define Po(∞) as the distribution of a random variable that equals ∞
identically.
After all these preliminaries, we state the result (together with some supplementary results). There are really two versions; it turns out that for general
sequences m(n), we have to use the random variables ξ (n) , with E ξ (n) = m(n)/n
exactly tuned to m(n), but under a weak assumption we can replace ξ (n) by ξ
and obtain a somewhat simpler statement, which we choose as our main formulation. (This goes back to Meir and Moon [87], who proved (i) in the tree case,
assuming λ < ν; see also Kolchin, Sevast’yanov and Chistyakov [77, Theorem
1.6.1] and Kolchin [76, Theorem 1.5.2] for Y(1) in the special case in Example 12.1.)
Theorem 19.7. Let w = (wk )k>0 be a weight sequence with
√ w0 > 0 and ω = ∞.
Suppose that n → ∞ and m = m(n) with m = λn + o( n) where 0 < λ 6 ν,
and use the notation above. Suppose further that σ 2 := Var ξ < ∞. (This is
redundant when λ < ν.)
(i) If (possibly for n in a subsequence) h(n) are integers such that
nP (ξ > h(n)) → α, for some α ∈ [0, ∞], then
d
N[h(n),∞) := |{i : Yi > h(n)}| −→ Po(α).
(ii) If h(n) are integers such that nP (ξ > h(n)) → 0, then w.h.p. Y(1) < h(n).
(iii) If h(n) are integers such that nP (ξ > h(n)) → ∞, then, for every fixed j,
w.h.p. Y(j) > h(n).
, N [h(n),∞) → 0.
(iv) For any sequence h(n), deK N[h(n),∞)
(v) For every fixed
j, dK Y(j) , ξ(j) → 0.
(vi) dTV Y(1) , ξ(1) → 0.
√
If λ < ν, the condition m = λn + o( n) can be weakened to m/n = λ +
o(1/ log n).
Moreover, if λ < ν, then the results hold for any m = m(n) with m/n → λ,
provided ξ is replaced by ξ (n) , N by N
(n)
(n)
and ξ(j) by ξ(j) .
Remark 19.8. In the version with ξ (n) , we do not need λ at all. By considering
subsequences, it follows that it suffices that 0 < c 6 m/n 6 C < ν. (Cf.
Theorem 11.6.) Furthermore, this version extends to the case λ = ν and m/n 6
ν, but we have ignored this case for simplicity.
Problem 19.9. Is Theorem 19.7 (in the ξ (n) version) true also for λ = 0 < ν?
The total variation approximation in (vi) is stronger than the Kolmogorov
distance approximation in (v), and our proof is considerably longer, but for
196
S. Janson
many purposes (v) is enough. We conjecture that total variation approximation
holds for every Y(j) , and not just for Y(1) ; presumably this can be shown by a
modification of the proof for Y(1) below, but we have not checked the details and
leave this as an open problem. Furthermore, we believe that the result extends
to the joint distribution of finitely many Y(j) . (The corresponding result in (v),
using a multivariate version of the Kolmogorov distance, is easily verified by the
methods below.)
Problem 19.10. Does dTV Y(j) , ξ(j) → 0 hold for every fixed j, under the
assumptions of Theorem 19.7?
Proof of Theorem 19.7. As in the proof of Theorem 19.2, we may replace (wk )
by the equivalent weight sequence (πk ) in (11.13). We may thus assume that
w is a probability weight sequence with τ = 1, and thus ρ > τ = 1, and the
corresponding random variable
ξ has E ξ = λ. We consider first the version with
√
ξ, assuming m = λn + o( n), and discuss afterwards the modifications for ξ (n) .
We begin by looking again at (18.11):
P(Y1 = k) =
wk Z(m − k, n − 1)
.
Z(m, n)
(19.14)
√
When m = λn + o( n), we may apply Lemma 14.1 and Remark 14.2 and thus,
with d := span(w),
d + o(1)
.
Z(m, n) = P(Sn = m) = √
2πσ 2 n
(19.15)
Moreover, by (14.9), for any k,
d + o(1)
Z(m − k, n − 1) = P(Sn−1 = m − k) 6 √
.
2πσ 2 n
(19.16)
Consequently, (19.14) yields, uniformly for all k,
P(Y1 = k) 6 (1 + o(1))wk = (1 + o(1)) P(ξ = k).
(19.17)
In particular, we may sum over k > K and obtain, for any K = K(n),
P(Y1 > K) 6 (1 + o(1)) P(ξ > K).
(19.18)
Since, by assumption, E ξ 2 < ∞, we
> K) = o(K −2 ) as K → ∞.
√ have P(ξ
−1
Hence, for every fixed δ > 0, P(ξ > δ n)√= o(n ). It follows that there exists
a sequence √δn → 0 such that P(ξ √> δn n) = o(n−1 ). Consequently, defining
B(n) := δn n, we have B(n) = o( n) and
P(ξ > B(n)) = o(n−1 ),
(19.19)
and thus, by (19.18) and symmetry,
P(Y(1) > B(n)) 6 n P(Y1 > B(n)) = n 1 + o(1) P(ξ > B(n)) = o(1). (19.20)
Hence, Y(1) < B(n) w.h.p.
197
Simply generated trees and random allocations
Similarly, P(ξ(1) > B(n)) 6 n P(ξ1 > B(n)) = o(1), so ξ(1) < B(n) w.h.p.
(i): Write, for convenience, N := N[h(n),B(n)] , and note that w.h.p. Y(1) 6
B(n) and then N = N[h(n),∞) . (We assume for simplicity h(n) 6 B(n); otherwise
we let N := 0, leaving the trivial
√
√ modifications in this case to the reader.)
Moreover, for k 6 B(n) = o( n), we have (m − k) − (n − 1)λ = o( n), and
thus Remark 14.2 shows that, for any k = k(n) 6 B(n),
d + o(1)
.
Z(m − k, n − 1) = P(Sn−1 = m − k) = √
2πσ 2 n
(19.21)
Since we here may take k = k(n) that maximises or minimises this for k 6
B(n), it follows that (19.21) holds uniformly for all k 6 B(n). Consequently, by
(19.14), (19.15) and (19.21),
P(Y1 = k) = (1 + o(1))wk = (1 + o(1)) P(ξ = k),
(19.22)
uniformly for all k 6 B(n). By the assumption and (19.19), this yields
B(n)
B(n)
EN = n
X
P(Y1 = k) = n
X
k=h(n)
k=h(n)
1 + o(1) P(ξ = k)
= 1 + o(1) n P h(n) 6 ξ 6 B(n)
= 1 + o(1) n P(ξ > h(n)) − P(ξ > B(n)) → α.
Similarly, again using the symmetry as well as Lemma 14.1 and Remark 14.2,
E N (N − 1) = n(n − 1) P Y1 , Y2 ∈ [h(n), B(n)]
B(n)
= n(n − 1)
X
B(n)
= n(n − 1)
X
wk1 wk2 Z(m − k1 − k2 , n − 2)
Z(m, n)
X
P(ξ = k1 ) P(ξ = k2 ) 1 + o(1)
k1 ,k2 =h(n)
B(n)
= n(n − 1)
P(Y1 = k1 and Y2 = k2 )
k1 ,k2 =h(n)
k1 ,k2 =h(n)
2
= 1 + o(1) n2 P(ξ > h(n)) − P(ξ > B(n))
→ α2 .
Moreover, the same argument works for any factorial moment E(N )ℓ and yields
d
E(N )ℓ → αℓ for every ℓ > 1. If α < ∞, we thus obtain N −→ Po(α) by the
method of moments, and the result follows, since N = N[h(n),∞) w.h.p.
If α = ∞, this argument yields
ℓ
E(N )ℓ ∼ n P(ξ > h(n)) → ∞
(19.23)
198
S. Janson
for every ℓ > 1, and we make a thinning: Let A be a constant and let q :=
A/ n P(ξ > h(n)) ; then q → A/α = 0. We consider only n that are so large that
q < 1. We then randomly, and independently, mark each box with probability
q. Let N ′ be the random number of marked boxes i such that Yi ∈ [h(n), B(n)].
Then, for every ℓ > 1, using (19.23),
E(N ′ )ℓ = (n)ℓ q ℓ P Y1 , . . . , Yℓ ∈ [h(n), B(n)] = q ℓ E(N )ℓ → Aℓ .
(19.24)
d
Consequently, by the method of moments, N ′ −→ Po(A). In particular, this
shows, for every fixed x,
P(N < x) 6 P(N ′ < x) → P(Po(A) < x),
which can be made arbitrarily small by taking A large. Hence, P(N < x) → 0
p
p
for every fixed x, i.e., N −→ ∞ and thus N[h(n),∞) −→ ∞, as we claim in this
case.
p
(ii): Part (i) applies with α = 0, and yields N[h(n),∞) −→ 0, which means
N[h(n),∞) = 0 w.h.p. Thus Y(j) < h(n) w.h.p. by (19.10).
p
(iii): Part (i) applies with α = ∞, and yields N[h(n),∞) −→ ∞. Thus, for
every fixed j, by (19.10), P(Y(j) < h(n)) = P(N[h(n),∞) < j) → 0.
(iv): Suppose not. Then there exists a sequence h(n) and an ε > 0 such that,
for some subsequence,
(19.25)
deK N[h(n),∞) , N [h(n),∞) > ε.
We may select a subsubsequence such that n P(ξ > h(n)) → α for some α ∈
[0, ∞]; then deK N[h(n),∞) , Po(α) → 0 by (i) and Lemma 19.5(iii). Moreover,
d
along the same subsubsequence, N [h(n),∞) ∼ Bi n, P(ξ > h(n) −→ Po(α),
by the standard Poisson approximation for binomial distributions (and rather
trivially if α = ∞); hence deK N [h(n),∞) , Po(α) → 0. The triangle inequality
yields deK N[h(n),∞) , N [h(n),∞) → 0 along the subsubsequence, which contradicts (19.25). This contradiction proves (iv).
(v): Suppose not. Then, by (19.11), there is an ε > 0 and a subsequence such
that for some h(n),
P(Y(j) 6 h(n)) − P(ξ(j) 6 h(n)) > ε.
(19.26)
However, by (19.10), (19.13) and (iv),
P(Y(j) 6 h(n))− P(ξ(j) 6 h(n))
= P(N[h(n)+1,∞) 6 j − 1) − P(N [h(n)+1,∞) 6 j − 1)
6 j deK N[h(n)+1,∞) , N [h(n)+1,∞) → 0,
which contradicts (19.26). This contradiction proves (v).
(vi): Let A = A(n) := {i : P(Y(j) = i) > P(ξ(j) = i)}; thus, see Lemma 19.5(vi),
dTV (Y(j) , ξ(j) ) = P(Y(j) ∈ A) − P(ξ(j) ∈ A).
(19.27)
Simply generated trees and random allocations
199
Let δ > 0. For each n, we partition N0 into a finite family P = {Jl }L
l=1 of
intervals as follows. First, each i ∈ N0 with P(ξ(1) = i) > δ/2 is a singleton {i};
note that there are at most 2/δ such i. The complement of the set of these i
consists of at most 2/δ + 1 intervals J˜k (of which one is infinite). We partition
each such interval J˜k further into intervals Jl with P(ξ(1) ∈ Jl ) 6 δ by repeatedly
chopping off the largest such subinterval starting at the left endpoint. Since only
points with P(ξ(1) = i) < δ/2 remain, each such interval Jl except the last in
each J˜k satisfies P(ξ(1) ∈ Jl ) > δ/2. Hence, our final partition {Jl } contains at
most 2/δ + 1 intervals Jl with P(ξ(1) ∈ Jl ) < δ/2, while the number of intervals
Jl with P(ξ(1) ∈ Jl ) > δ/2 is clearly at most 2/δ. Consequently, L, the total
number of intervals, is at most 4/δ + 1.
We write Jl = [al , bl ]. We say that an interval Jl ∈ P is fat if P(ξ(1) ∈ Jl ) > δ,
and thin otherwise. Note that by our construction, a fat interval is a singleton
{al }.
Next, fix a large number D. We say that an interval Jl = [al , bl ] ∈ P is good
if n P(ξ > al ) 6 D, and bad otherwise.
For any interval Jl ,
P(Y(1) ∈ Jl ) − P(ξ(1) ∈ Jl ) 6 2dK (Y(1) , ξ(1) ) = o(1)
(19.28)
by (v).
S
Let Al := A ∩ Jl . Thus A is the disjoint union l Al . (A, Jl and Al depend
on n.)
We note that if Jl is fat, then Jl is a singleton, and either Al = Jl or Al = ∅;
in both cases we have, using (19.28),
P(Y(1) ∈ Al ) − P(ξ(1) ∈ Al ) 6 2dK (Y(1) , ξ(1) ) = o(1).
(19.29)
We next turn to the good intervals. We claim that, uniformly for all good
intervals Jl , as n → ∞,
D
P(Y(1) ∈ Al ) 6 eδe P(ξ(1) ∈ Al ) + o(1).
(19.30)
As usual, we suppose that this is not true and derive a contradiction. Thus,
assume that there is an ε > 0 and, for each n in some subsequence, a good
interval Jl = [al , bl ] (depending on n) such that
D
P(Y(1) ∈ Al ) > eδe P(ξ(1) ∈ Al ) + ε.
(19.31)
If Jl is fat, then (19.31) contradicts (19.29) for large n, so we may assume that
Jl is thin, i.e., P(ξ(1) ∈ Jl ) 6 δ.
Let Acl := Jl \ Al and Bl := [bl + 1, ∞). Let αn := n P(ξ ∈ Al ), βn :=
n P(ξ ∈ Bl ) and γn := n P(ξ ∈ Acl ). The assumption that Jl is good implies that
αn + βn + γn = n P(ξ > al ) 6 D. By selecting a subsubsequence we may assume
that αn → α, βn → β and γn → γ for some real α, β, γ with α + β + γ 6 D.
d
Then (i) shows that NBl −→ Po(β); moreover, the proof extends easily (using
200
S. Janson
d
d
joint factorial moments) to show that NAl −→ Po(α), NBl −→ Po(β) and
d
NAcl −→ Po(γ), jointly and with independent limits.
Similarly, by the method of moments or otherwise (this is a standard Poisson
d
d
approximation of a multinomial distribution), N Al −→ Po(α), N Bl −→ Po(β)
d
and N Acl −→ Po(γ), jointly and with independent limits.
Note that
Y(1) ∈ Al =⇒ NAl > 1 and NBl = 0.
Conversely,
NAl > 1 and NBl = NAcl = 0 =⇒ Y(1) ∈ Al .
The corresponding results hold for ξ(1) . Thus,
P(Y(1) ∈ Al ) 6 P(NAl > 1, NBl = 0) → P Po(α) > 1 P Po(β) = 0 (19.32)
and
P(ξ(1) ∈ Al ) > P(N Al > 1, N Bl = N Acl = 0)
→ P Po(α) > 1 P Po(β) = 0 P Po(γ) = 0 . (19.33)
Since P Po(γ) = 0 = e−γ , (19.32)–(19.33) yield
P(Y(1) ∈ Al ) − eγ P(ξ(1) ∈ Al ) 6 o(1).
(19.34)
d
Moreover, N Jl = N Al + N Acl −→ Po(α + γ), and thus
P(ξ(1) ∈ Jl ) = P(N Jl > 1, N Bl = 0) > P(N Jl = 1, N Bl = 0)
→ (α + γ)e−α−γ e−β . (19.35)
We are assuming that Jl is thin, i.e., P(ξ(1) ∈ Jl ) 6 δ, and thus (19.35) yields
(α + γ)e−α−γ e−β 6 δ and consequently
γ 6 α + γ 6 δeα+β+γ 6 δeD .
Hence, (19.34) implies
D
P(Y(1) ∈ Al ) 6 eδe P(ξ(1) ∈ Al ) + o(1),
which contradicts (19.31). This contradiction shows that (19.30) holds uniformly
for all good intervals.
It remains to consider the bad intervals.
Let Jℓ = [aℓ , bℓ ] be the rightmost bad interval. If Jℓ is fat we use (19.29) and
if Jℓ is thin we use (19.28) which gives
P(Y(1) ∈ Aℓ ) 6 P(Y(1) ∈ Jℓ ) 6 P(ξ(1) ∈ Jℓ ) + o(1) 6 δ + o(1).
In both cases,
P(Y(1) ∈ Aℓ ) 6 P(ξ(1) ∈ Aℓ ) + δ + o(1).
(19.36)
Simply generated trees and random allocations
201
Finally, let A∗ be the union of the remaining bad intervals. Then A∗ =
[0, aℓ − 1] and by (v),
P(Y(1) ∈ A∗ ) = P(Y(1) < aℓ ) 6 P(ξ(1) < aℓ ) + o(1).
(19.37)
Furthermore, recalling n P(ξ > aℓ ) > D since Jℓ is bad,
n
P(ξ(1) < aℓ ) = P(N [aℓ ,∞) = 0) = 1 − P(ξ > aℓ ) 6 e−n P(ξ>aℓ ) 6 e−D .
(19.38)
We obtain by summing (19.30) for all good intervals together with (19.36)
and (19.37), recalling that the number of intervals is bounded (for a fixed δ)
and using (19.38),
X
P(Y(1) ∈ A) =
P(Y(1) ∈ Al )
l
6e
δeD
X
l
D
P(ξ(1) ∈ Al ) + o(1) + δ + P(ξ(1) < aℓ )
6 eδe P(ξ(1) ∈ A) + o(1) + δ + e−D .
Consequently,
and thus
dTV (Y(1) , ξ(1) ) = P(Y(1) ∈ A) − P(ξ(1) ∈ A)
D
6 eδe − 1 P(ξ(1) ∈ A) + δ + e−D + o(1)
D
6 eδe − 1 + δ + e−D + o(1),
D
lim sup dTV (Y(1) , ξ(1) ) 6 eδe − 1 + δ + e−D .
(19.39)
n→∞
Letting first δ → 0 and then D → ∞, we obtain dTV (Y(1) , ξ(1) ) → 0, which
proves (vi).
This completes the proof of the version with ξ and the assumption m =
λn+o(n1/2 ). Now remove this assumption, but assume λ < ν and thus τ < ρ. We
consider only n with 0 < m/n < ν and thus 0 < τn < ρ. Denote the distribution
(14.11) of ξ (n) by w(n) (this is a probability weight sequence equivalent to w)
(n)
(n)
(n)
and let Sn := ξ1 + · · · + ξn . Then, by Example 11.2 applied to w(n) , in
analogy with (19.14) (and equivalent to it by (11.9)),
(n)
wk Z(m − k, n − 1; w(n) )
Z(m − k, n − 1; w(n) )
=
P(ξ (n) = k).
(n)
Z(m, n; w )
Z(m, n; w(n) )
(19.40)
Furthermore, for any y > 0, using (14.13),
(n)
P(Y(1) > y) 6 n P(Y1 > y) = n P ξ1 > y Sn(n) = m
−1
(n)
(19.41)
6 n P ξ1 > y P Sn(n) = m
P(Y1 = k) =
6 n P(ξ (n) > y) · O(n1/2 ).
202
S. Janson
Choose τ∗ ∈ (τ, ρ). Then, for s > 0 and n so large than τn < τ∗ , by (4.11),
P(ξ (n) > y) 6 e−sy
Φ(es τ∗ )
Φ(es τn )
6 e−sy
.
Φ(τn )
Φ(0)
(19.42)
Choosing s > 0 with es < ρ/τ∗ , we thus find P(ξ (n) > y) = O(e−sy ) and, by
(19.41),
P(Y(1) > y) = O n3/2 e−sy .
We now define B(n) := 2s−1 log n, and obtain
P(Y(1) > B(n)) = O n3/2 e−sB(n) = O n−1/2 → 0.
(19.43)
Hence, Y(1) < B(n) w.h.p. Similarly, using (19.42) again,
P(ξ (n) > B(n)) = o(n−1 )
(19.44)
(n)
and thus P(ξ(1) > B(n)) 6 n P(ξ (n) > B(n)) → 0, so ξ(1) < B(n) w.h.p.
We have shown that (19.19) (with ξ (n) ) and (19.20) hold. Moreover, Lemma
14.1 yields, see (14.13) again, Z(m, n; w(n) ) ∼ d/(2πσ 2 n)1/2 , and for k 6
B(n) = O(log n), the same argument yields also, using Remark 14.2, Z(m −
k, n − 1; w(n) ) ∼ d/(2πσ 2 n)1/2 , because m − k − (n − 1) E ξ (n) = m − k − (n −
1)m/n = −k + m/n = o(n1/2 ). Consequently, (19.40) yields
P(Y1 = k) = 1 + o(1) P(ξ (n) = k),
(19.45)
uniformly for k 6 B(n).
(n)
(n)
We can now argue exactly as above, using ξ (n) , ξ(j) and N A , which proves
this version of the theorem.
Finally, if λ < ν and m/n = λ + o(1/ log n), then τn := Ψ−1 (m/n) = τ +
o(1/ log n), because Ψ−1 is differentiable on (0, ν). Since we are assuming τ = 1
and Φ(τ ) = 1 in the proof, we thus have, uniformly for all k 6 B(n) = O(log n),
P(ξ (n) = k) =
τnk
wk = 1 + o(1) wk = 1 + o(1) P(ξ = k).
Φ(τn )
(19.46)
Since also P(ξ (n) > B(n)) = o(n−1 ) and P(ξ > B(n)) = o(n−1 ), it follows that
n P(ξ (n) > h(n)) → α ⇐⇒ n P(ξ > h(n)) → α, and thus we may in (i)–(iii)
replace ξ (n) by ξ again. Finally, (iv)–(vi) follow as above in this case too.
Proof of Theorem 19.3. Recall that λ < ν ⇐⇒ τ < ρ by Lemma 3.1. We have
ν > λ > 0, so ρ > 0, and τ > 0. Thus 1 < ρ/τ 6 ∞.
(i): Fix a > 1/ log(ρ/τ ). Choose b with e1/a < b < ρ/τ . Then 0 6 1/ρ <
(bτ )−1 . Choose c with 1/ρ < c < (bτ )−1 .
1/k
1/k
Since lim supk→∞ wk = 1/ρ < c, we have wk < c for large k, and then,
(n)
defining τn and ξ
by (14.10)–(14.11),
P(ξ (n) = k) =
(cτn )k
τnk
wk 6
.
Φ(τn )
Φ(0)
(19.47)
Simply generated trees and random allocations
203
As n → ∞, cτn → cτ < b−1 . Let h := ⌊a log n⌋. For large n, (19.47) applies for
k > h, and cτn < b−1 < 1, and then
P(ξ (n) > h) 6
∞
∞
X
X
b−k
(cτn )k
= O b−h = O n−a log b
6
Φ(0)
w0
k=h
k=h
Since a log b > 1, thus n P(ξ (n) > h) → 0, and Theorem 19.7(ii) yields Y(1) 6
h 6 a log n w.h.p.
(ii): If ρ = ∞, then (i) applies with ρ/τ = ∞ and thus 1/ log(ρ/τ ) = 0.
(iii): If ρ = ∞, the result follows by (ii), so we may assume 1 = τ < ρ < ∞.
Let a := 1/ log(ρ/τ ) and 0 < ε < 1. The upper bound Y(j) 6 Y(1) 6 (a + ε) log n
w.h.p. follows from (i), and it remains to find a matching lower bound.
Let k := ⌈(1 − ε)a log n⌉. Then, since τn → τ ,
log P(ξ (n) = k) = log wk + k log τn − log Φ(τn )
= −k(log ρ + o(1)) + k(log τ + o(1)) + O(1)
= −k log(ρ/τ ) + o(k) = −(1 − ε + o(1)) log n
and thus
n P(ξ (n) > k) > n P(ξ (n) = k) = nε+o(1) → ∞.
By Theorem 19.7(iii) (and the last sentence in Theorem 19.7), this implies w.h.p.
Y(j) > k > (1 − ε)a log n
This completes the proof, since we can take ε arbitrarily small.
Specialising Theorem 19.7 to the tree case (m = n − 1), we obtain the following. (Recall that σ 2 < ∞ is automatic when ν > 1.)
Corollary 19.11. Let w = (wk )k>0 be a weight sequence with w0 > 0 and
wk > 0 for some k > 2, and let ξ have the distribution given by (πk ) in (7.1).
Suppose that ν > 1 and σ 2 := Var ξ < ∞. Then, as n → ∞, for the largest
degrees Y(1) > Y(2) > . . . in Tn , dTV (Y(1) , ξ(1) ) → 0 and, for every fixed j,
dK (Y(j) , ξ(j) ) → 0.
Proof. The case ω = ∞ is a special case of Theorem 19.7, with λ = 1.
The case ω < ∞ is trivial: for every fixed j, Y(j) = ω w.h.p. by Theorem 19.1,
and, trivially, ξ(j) = ω w.h.p.
The comparison with ξ(j) in Theorem 19.7 and Corollary 19.11 is appealing
since ξ(j) is the j:th largest of n i.i.d. random variables. For applications it is
often convenient to modify this a little by taking a Poisson number of variables
instead.
Consider an infinite i.i.d. sequence ξ1 , ξ2 , . . . , let as above ξ(j) be the j:th
largest among the first n elements of the sequence and define ξ̃(j) as the j:th
largest among the first N (n) elements ξ1 , . . . , ξN (n) , where N (n) ∼ Po(n) is a
random Poisson variable independent of ξ1 , ξ2 , . . . .
204
S. Janson
Lemma 19.12. W.h.p. ξ˜(j) = ξ(j) and thus dTV (ξ˜(j) , ξ(j) ) → 0 as n → ∞ for
every fixed j > 1.
−
Proof. Let n± := ⌊n ± n2/3 ⌋, and let ξ(j)
be the j:th largest of ξ1 , . . . , ξn− . By
symmetry, the positions of the j largest among ξ1 , . . . , ξn are uniformly random
(we resolve any ties in the ordering at random); thus the probability that one
of them has index > n− is at most j(n − n− )/n = o(1). Hence, w.h.p. all j are
−
.
among ξ1 , . . . , ξn− , and then ξ(j) = ξ(j)
Furthermore, w.h.p. n− 6 N (n) 6 n+ , and a similar argument (using condi−
−
tioning on N (n)) shows that w.h.p. ξ˜(j) = ξ(j)
. Hence, w.h.p. ξ(j) = ξ(j)
= ξ̃(j) .
Now use Lemma 19.5(v).
We can thus replace ξ(j) by ξ˜(j) in Theorem 19.7 and Corollary 19.11. (We can
(n)
(n)
similarly replace ξ(j) by ξ˜(j) defined in the same way.) The advantage is that,
by standard properties of the Poisson distribution, the corresponding counting
variables
ek := |{i 6 N (n) : ξ = k}|
N
ek ∼ Po(n P(ξ = k)). We similarly
are independent Poisson variables with N
P
∞
e[k,∞) :=
e
define N
l=k Nl ∼ Po n P(ξ > k) .
Remark 19.13. An equivalent way to express this is that the multiset Ξn :=
{ξi : i 6 N (n)} is a Poisson process on N0 with intensity measure Λn given by
Λn {k} = n P(ξ = k).
We thus have (exactly), for any j and k,
e[k+1,∞) < j = P Po(n P(ξ > k)) < j ;
P(ξ˜(j) 6 k) = P N
in particular
P(ξ˜(1) 6 k) = e−n P(ξ>k) .
(19.48)
(19.49)
This gives the following special case of Theorem 19.7. (There is a similar version
with ξ (n) .)
Corollary 19.14. Suppose that w0 > 0√ and ω = ∞. Suppose further that
n → ∞ and m = m(n) with m = λn + o( n) where 0 < λ 6 ν, and that either
λ < ν or σ 2 := Var ξ < ∞. Then, uniformly in all k > 0,
P(Y(j) 6 k) = P Po(n P(ξ > k)) < j + o(1)
(19.50)
for each fixed j > 1; in particular
P(Y(1) 6 k) = e−n P(ξ>k) + o(1).
(19.51)
Proof. Immediate by Theorem 19.7(v), Lemma 19.12 and (19.48)–(19.49).
e[h(n),∞) > j ⇐⇒ ξ˜(j) > h(n), it follows easily from
Remark 19.15. Since N
Lemmas 19.12 and 19.5(iv) that for any sequence h(n),
e[h(n),∞) → 0.
deK N [h(n),∞) , N
(19.52)
205
Simply generated trees and random allocations
e[h(n),∞) → 0, and thus
Hence, Theorem 19.7(iv) is equivalent to deK N[h(n),∞) , N
(19.53)
deK N[h(n),∞) , Po n P(ξ > h(n)) → 0.
This is another, essentially equivalent, way to express the results above.
19.3. The subcase λ < ν
When λ < ν, we have τ < ρ and the random variable ξ has some finite exponential moment, cf. Section 8; hence the probabilities πk decrease rapidly.
Theorem 19.7 and Corollary 19.14 show that Y(1) (and each Y(j) ) has its distribution concentrated on k such that P(ξ > k) is of the order 1/n. If the
decrease of πk is not too irregular, this implies strong concentration of Y(1) ,
with, rougly speaking, Y(1) ≈ k when P(ξ > k) ≈ 1/n. To make this precise,
we define three versions of a suitable such estimate k = k(n). Let, as above,
πk = P(ξ = k) = τ k wk /Φ(τ ) and let
Πk := P(ξ > k) =
∞
X
πl .
(19.54)
l=k
Define
k1 (n) := max{k : πk > 1/n},
k2 (n) := max{k : Πk > 1/n},
p
k3 (n) := max{k : Πk Πk+1 > 1/n}.
(19.55)
(19.56)
(19.57)
Note that k1 (n) 6 k2 (n) and k2 (n) − 1 6 k3 (n) 6 k2 (n).
We consider the typical case when wk+1 /wk converges as k → ∞. We assume
implicitly that wk+1 /wk is defined for all large k; thus wk > 0 and ω = ∞. If
wk+1 /wk → a as k → ∞, then (3.5) yields ρ = 1/a; hence ρ = ∞ if a = 0 and
0 < ρ < ∞ if a > 0.
Theorem 19.16. Suppose that w0 > 0 and that wk+1 /wk → a < ∞ √
as k → ∞.
Suppose further that n → ∞ and m = m(n) with m = λn + o( n) where
0 < λ < ν.
(i) Then, for each j > 1,
Y(j) = k1 (n) + Op (1) = k2 (n) + Op (1) = k3 (n) + Op (1).
(ii) If a = 0, then, moreover, w.h.p.,
|Y(j) − k1 (n)| 6 1,
|Y(j) − k2 (n)| 6 1,
Y(j) ∈ {k3 (n), k3 (n) + 1}.
Proof. (i): We have, as said above, ρ = 1/a > 0. Furthermore, since λ < ν, we
have τ < ρ and thus, as k → ∞,
wk+1
τ
πk+1
=τ
→ τ a = < 1.
πk
wk
ρ
(19.58)
206
S. Janson
It follows from (19.58) and (19.54), using dominated convergence, that, as
k → ∞,
∞
∞
X
X
1
πk+i
Πk
(τ a)i =
=
→
.
(19.59)
πk
π
1
−
τa
k
i=0
i=0
If ℓ is chosen such that (τ a)ℓ < 1−τ a, then (19.59) and (19.58) imply Πk+ℓ /πk →
(τ a)ℓ /(1 − τ a) < 1 as k → ∞, and thus, for large k, Πk+ℓ < πk < Πk ; hence,
for large n, k1 (n) 6 k2 (n) 6 k1 (n) + ℓ. Thus, recalling that |k2 (n) − k3 (n)| 6 1,
k1 (n) = k2 (n) + O(1) = k3 (n) + O(1).
(19.60)
Furthermore, (19.58) and (19.59) yield also
Πk+1
→ τ a < 1.
Πk
(19.61)
By (19.56), nΠk2 (n) > 1 > nΠk2 (n)+1 . This and (19.61) imply that if Ω(n) is
any sequence with Ω(n) → ∞, then nΠk2 (n)−Ω(n) → ∞ and nΠk2 (n)+Ω(n) → 0.
Consequently, recalling the definition (19.54), by Theorem 19.7(ii)–(iii) (or by
Corollary 19.14) w.h.p. Y(j) > k2 (n) − Ω(n) and Y(j) < k2 (n) + Ω(n). Since
Ω(n) → ∞ is arbitrary, this yields Y(j) = k2 (n) + Op (1). (See e.g. [62].) The
result follows by (19.60).
(ii): When a = 0, (19.59) yields Πk ∼ πk , (19.58) yields πk+1 /πk → 0
and (19.61) yields Πk+1 /Πk → 0 as k → ∞. It follows easily from (19.55)–
(19.57) that nΠk1 (n)−1 → ∞, nΠk1 (n)+2 → 0, nΠk2 (n)−1 → ∞, nΠk2 (n)+2 → 0,
nΠk3 (n) → ∞, nΠk3 (n)+2 → 0, and the results follow by Theorem 19.7(ii)–
(iii).
If a = 0, i.e. wk+1 /wk → 0 as k → ∞, thus Y(1) is asymptotically concentrated
at one or two values. (This was shown, in the tree case, by Meir and Moon [88],
after showing concentration to at most three values in [87]; see also Kolchin,
Sevast’yanov and Chistyakov [77], Kolchin [76] and Carr, Goh and Schmutz [21]
for special cases.) If a > 0, we still have a strong concentration, but not to any
finite number of values as is seen by Theorem 19.19 below.
We consider two important examples, where we apply this to random trees,
so m = n − 1 and λ = 1. (Recall that Y(1) then is the largest outdegree in Tn .
The largest degree is w.h.p. Y(1) + 1, since w.h.p. it is not attained at the root,
e.g. because the root degree is Op (1) by Theorem 7.10; this should be kept in
mind when comparing with results in other papers.)
Example 19.17. For uniform random labelled ordered rooted trees, we have
by Example 10.1 ξ ∼ Ge(1/2) with πk = 2−k−1 and thus P(ξ > k) = 2−k . Hence
Y(1) has asymptotically the same distribution as the maximum of n i.i.d. geometrically distributed random variables, which is a simple and well-studied example,
see e.g. Leadbetter, Lindgren and Rootzén [82]. Explicitly, Corollary 19.14 applies and (19.51) yields, uniformly in k > 0,
−k−1
P(Y(1) 6 k) = e−n2
+ o(1).
(This was, essentially, shown by Meir and Moon [87].)
(19.62)
207
Simply generated trees and random allocations
One way to express this is to introduce a random variable W with the Gumbel
distribution
−x
P(W 6 x) = e−e ,
−∞ < x < ∞.
(19.63)
Then (19.62) yields, uniformly for k ∈ Z,
P(Y(1) 6 k) = P W < (k + 1) log 2 − log n + o(1)
W + log n
=P
< k + 1 + o(1)
log 2
W + log n 6 k + o(1).
=P
log 2
In other words, extending dK to Z-valued random variables,
dK Y(1) , ⌊(W + log n)/ log 2⌋ → 0.
(19.64)
(19.65)
Thus, the maximum degree Y(1) can be approximated (in distribution) by ⌊(W +
log n)/ log 2⌋ = ⌊W/ log 2+log2 n⌋. Hence Y(1) −log2 n is tight but no asymptotic
distribution exists; Y(1) − log2 n can be approximated by ⌊W/ log 2 + log2 n⌋ −
log2 n = ⌊W/ log 2 + {log2 n}⌋ − {log2 n} (where we let {x} := x − ⌊x⌋ denote
the fractional part of x), which shows convergence in distribution for any subsequence such that {log2 n} converges to some α ∈ [0, 1], but the limit depends
on α. See further Janson [60, in particular Lemma 4.1 and Example 4.3].
In the same way we see that Y(j) can be approximated in distribution by
⌊Wj / log 2 + log2 n⌋ where Wj has the distribution
P(Wj 6 x) = P(Po(e−x ) < j) =
j−1 −ix
X
e
i!
i=0
−x
e−e
−∞ < x < ∞,
,
(19.66)
d
−x
with density function e−jx e−e /(j − 1)!; further Wj = − log Vj , where Vj has
the Gamma distribution Gamma(j, 1). (Cf. Leadbetter, Lindgren and Rootzén
[82, Section 2.2] for the relation between the distributions of ξ(j) and ξ(1) in the
i.i.d. case.)
Example 19.18. For uniform random labelled unordered rooted trees, we have
by Example 10.2 ξ ∼ Po(1) with πk = e−1 /k!. We have wk+1 /wk → 0, so
Theorem 19.16(ii) applies and shows that Y(1) is concentrated on at most two
values, as proved by Kolchin [76, Theorem 2.5.2]; see also Meir and Moon [88]
and Carr, Goh and Schmutz [21].
Explicitly, (19.51) yields (treating the rather trivial case n > k 1/2 · k! separately)
−1
P(Y(1) < k) = e−ne
/k!(1+O(1/k))
−1
+ o(1) = e−ne
/k!
+ o(1)
(19.67)
which by Stirling’s formula yields
1
√
P(Y(1) < k) = exp −elog n−(k+ 2 ) log k+k−log(e
2π)
+ o(1)
(19.68)
208
S. Janson
uniformly in k > 1, cf. Carr, Goh and Schmutz [21]. It follows easily from
Stirling’s formula, or from (19.68), that k1 (n), k2 (n), k3 (n) ∼ log n/ log log n,
and more precise asymptotics can be found too; cf. [90, 87, 21].
In fact, the simple Example 19.17 is typical for the case wk+1 /wk → a > 0 as
k → ∞; then Y(1) always has asymptotically the same distribution as the maximum of i.i.d. geometric random variables, provided we adjust the number of
these variables according to w. We state some versions of this in the next theorem. For simplicity we consider only the maximum Y(1) , and leave the extensions
to Y(j) for general fixed j to the reader.
Theorem 19.19. Suppose that w0 > 0 and that wk+1 /wk → a as k → ∞, √
with
0 < a < ∞. Suppose further that n → ∞ and m = m(n) with m = λn + o( n)
where 0 < λ < ν.
Let q := τ a = τ /ρ < 1. Let k(n) be any sequence such that πk(n) = Θ(1/n);
equivalently, k(n) = k1 (n) + O(1), and let N = N (n) be integers such that
N∼
nwk(n) a−k(n)
nπk(n) q −k(n)
=
.
1−q
Φ(τ )(1 − q)
(19.69)
(i) Let η1 , . . . , ηN be i.i.d. random variables with a geometric distribution
Ge(1 − q), i.e., P(ηi = k) = (1 − q)q −k , k > 0. Then
d
Y(1) ≈ max ηi .
i6N
(19.70)
(ii) Let W have the Gumbel distribution (19.63). Then
d
Y(1) ≈ ⌊W/ log(1/q) + log1/q N ⌋.
(19.71)
(iii) Let bn := nπk(n) ; thus bn = Θ(1). Then
d
Y(1) − k(n) ≈
W + log(bn /(1 − q)) / log(1/q) .
(19.72)
Thus Y(1) − k(n) is tight, and converges for every subsequence such that
bn converges.
Hence Y(1) − k(n) converges for every subsequence such that bn converges,
but the limit depends on the subsequence so Y(1) − k(n) does not have a limit
distribution. (For the distributions that appear as subsequence limits, see Janson
[60, Examples 4.3 and 2.7].) Note that necessarily k(n) → ∞ and thus N → ∞
as n → ∞.
We show first a simple lemma, similar to Lemma 19.5.
Lemma 19.20. Let Xn and Xn′ be integer-valued random variables and suppose that there exists a sequence of integers k(n) such that Xn − k(n) is tight.
(Equivalently: Xn = k(n) + Op (1).) Then the following are equivalent:
(i) P(Xn 6 k(n) + ℓ) − P(Xn′ 6 k(n) + ℓ) → 0 for each fixed ℓ ∈ Z;
Simply generated trees and random allocations
209
(ii) dK (Xn , Xn′ ) → 0;
d
(iii) Xn ≈ Xn′ , i.e., dTV (Xn , Xn′ ) → 0.
Proof. By considering Xn − k(n) and Xn′ − k(n) we may assume that k(n) = 0.
Let ε > 0. Since Xn is tight, there exists L such that P(|Xn | > L) < ε for every
n. Suppose that (i) holds. Then
dTV (Xn , Xn′ ) =
∞
X
ℓ=−∞
6
L
X
ℓ=−L
P(Xn = ℓ) − P(Xn′ = ℓ) +
P(Xn = ℓ) − P(Xn′ = ℓ) + + P(|Xn | > L) 6 o(1) + ε.
This shows (iii). The implications (iii) =⇒ (ii) and (ii) =⇒ (i) are trivial.
Proof of Theorem 19.19. By (19.58), πk+1 /πk → q as k → ∞, and it follows
from (19.55) that nπk1 (n) ∈ [1, q −1 + o(1)]. It follows further that πk(n) =
Θ(1/n) ⇐⇒ k(n) = k1 (n) + O(1), as asserted, and then πk(n) q −k(n) ∼
πk1 (n) q −k1 (n) ; thus we may replace k(n) by k1 (n) in (19.69).
(i): For each fixed ℓ ∈ Z, by (19.59), (19.58) and (19.69),
n P(ξ > k(n) + ℓ) = nΠk(n)+ℓ ∼ nπk(n)+ℓ /(1 − q) ∼ nπk(n) q ℓ /(1 − q)
(19.73)
∼ N q k(n)+ℓ = N P(η1 > k(n) + ℓ);
furthermore, this is Θ(1). Hence, (19.51) yields
P Y(1) < k(n) + ℓ = e−n P(ξ>k(n)+ℓ) + o(1) = e−N P(η1 >k(n)+ℓ) + o(1)
N
= 1 − P(η1 > k(n) + ℓ) + o(1)
= P max ηi < k(n) + ℓ + o(1),
i6N
and (19.70) follows by Lemma 19.20, since Y(1) −k1 (n) is tight by Theorem 19.16.
(ii): As in (19.64), uniformly in k ∈ Z,
N
k
P max ηi < k = 1 − q k
= e−N q + o(1)
i6N
= P W < k log(1/q) − log N + o(1)
(19.74)
W + log N
=P
< k + o(1).
log(1/q)
Hence, dTV maxi6N ηi , ⌊W/ log(1/q)+log1/q N ⌋ → 0, and (19.71) follows from
(19.70) and Lemma 19.20.
(iii): By (19.69), log1/q N = k(n) + log1/q (bn /(1 − q)) + o(1), and (19.72)
follows easily from (19.71), using Lemma 19.20 and the fact that W is absolutely
continuous.
210
S. Janson
Remark 19.21. For later use we note that Theorem 19.19, as other results,
extends to the case w0 = 0 by the argument in Remark 11.8; we now have to
assume λ > α := min{k : wk > 0}. The extension of Theorem 19.19(i) is perhaps
more subtle that other applications of this argument since N will change by a
factor ∼ q α , but (ii) and (iii) are straightforward, and then (i) follows by (19.74)
and Lemma 19.20.
If the weight sequence is very irregular, Y(1) can fail to be concentrated even
in the case λ < ν.
j
Example 19.22. Let ℓj := 22 and S := {ℓjP
}j>1 . Let wk = 2−k if k ∈ S,
and w1 := 1 −
w
/ S, and choose w0 := k∈S (k − 1)wk P
Pk = 0 if k > 2 and k ∈
∞
kw
.
Then
(w
)
is
a
probability
weight
sequence
with
µ
:=
k
k
k∈S
k=0 kwk = 1.
Furthermore, ρ = 2, Φ(ρ) = ∞ and, by Lemma 3.1(iv), ν := Ψ(ρ) = ∞. Choose
m = n − 1 (the tree case); thus λ = 1 < ν and τ = 1 so (πk ) = (wk ).
Note that ℓj+1 = ℓ2j . If n = 2ℓj , then P(ξ > ℓj ) ∼ 2−ℓj = n−1 , P(ξ >
ℓj+1 ) ∼ 2−ℓj+1 ≪ n−1 and P(ξ > ℓj−1 ) ∼ 2−ℓj−1 ≫ n−1 ; hence it follows from
(19.51) that for the subsequence n = 2ℓj with j ∈ S, P(Y(1) < ℓj+1 ) → 1,
P(Y(1) < ℓj ) → e−1 and P(Y(1) < ℓj−1 ) → 0. Thus, along this subsequence,
P(Y(1) = ℓj ) → 1 − e−1 and P(Y(1) = ℓj−1 ) → e−1 , i.e., P Y(1) = log2 n →
1/2 1 − e−1 and P Y(1) = log2 n → e−1 .
19.4. The subcase wk+1 /wk → 0 as k → ∞
We have seen in Theorem 19.16 that when wk+1 /wk → 0 as k → ∞, the maximum Y(1) is asymptotically concentrated at one or two values. We shall see that
for “most” (in a sense specified below) values of n, Y(1) is concentrated at one
value, but there are also rather large transition regions where Y(1) takes two
values with rather large probabilities.
We have, as said before Theorem 19.16, ω = ∞ and ρ = ∞. Furthermore, by
Lemma 3.1(v), ν = ∞.
We define
nk := ⌊1/πk ⌋,
(19.75)
noting that nk+1 /nk ∼ πk /πk+1 → ∞ as k → ∞; in particular, nk+1 > nk (for
large k, at least). The results above then can be stated as follows.
Theorem 19.23. Suppose that w0 > 0 and that wk+1 /wk → 0 as
√ k → ∞.
Suppose further that n → ∞ and m = m(n) with m = λn + o( n) where
0 < λ < ∞.
(i) Consider n in a subsequence such that for some k(n) and some x ∈ (0, ∞),
n/nk(n) → x. Then
P Y(1) = k(n) − 1 → e−x ,
P Y(1) = k(n) → 1 − e−x .
Simply generated trees and random allocations
211
S∞
(ii) Let Ωk → ∞ as k → ∞. If n → ∞ with n ∈
/ k=1 [Ω−1
k nk , Ωk nk ], then, for
k(n) such that nk(n) < n < nk(n)+1 ,
P Y(1) = k(n) → 1.
Proof. (i): Along the subsequence, using (19.54), (19.59) and (19.75),
n P(ξ > k(n)) = nΠk(n) ∼ nπk(n) ∼
n
→ x.
nk(n)
(19.76)
Hence, (19.51) yields P(Y(1) 6 k(n) − 1) → e−x . Furthermore, by (19.76) and
(19.61), n P(ξ > k(n)) → 0 and n P(ξ > k(n) − 1) → ∞; hence (19.51) yields
P(Y(1) 6 k(n)) → 1 and P(Y(1) 6 k(n) − 2) → 0.
(ii): We may assume Ωk > 1. Then the assumptions imply Ωk(n) nk(n) <
n < Ω−1
k(n)+1 nk(n)+1 , where k(n) → ∞ and thus Ωk(n) → ∞ as n → ∞. Hence,
similarly to (19.76),
n P ξ > k(n) ∼
n
> Ωk(n) → ∞,
nk(n)
n
n P ξ > k(n) + 1 ∼
< Ω−1
k(n)+1 → 0,
nk(n)+1
and the result follows by (19.51).
Roughly speaking, the values of n such that Y(1) takes two values with significant probabilities thus form intervals around each nk , of the same length on
a logarithmic scale; between these intervals, Y(1) is concentrated at one value.
Example 19.24. Consider again uniform random labelled unordered rooted
trees, as in Example 19.18. We have nk = ⌊k!/e⌋. In this case, it is simpler to
redefine nk := k!; Theorem 19.23(ii) is unaffected but (i) is modified to
P Y(1) = k(n) − 1 → e−x/e ,
(19.77)
P Y(1) = k(n) → 1 − e−x/e .
(19.78)
Cf. Carr, Goh and Schmutz [21].
Remark 19.25. We have for simplicity considered only the maximum value Y(1)
in Theorem 19.23. It is easily seen, by minor modifications in the proof, that for
any fixed j, in (ii) also Y(j) = k(n) w.h.p., while in (i) Y(j) ∈ {k(n) − 1, k(n)}
w.h.p., but the two probabilities have limits depending on j; in fact, the number
of j such that Y(j) = k(n) converges in distribution to Po(x). We omit the details.
To make the statement about “most” n precise, recall that the upper and lower
densities of a set A ⊆ N are defined as lim supn→∞ a(n)/n and lim inf n→∞ a(n)/n,
where a(n) := |{i 6 n : i ∈ A}|; if they coincide, i.e., if the limit limn→∞ a(n)/n
exists, it is called
the density. Similarly, the the logarithmic density of A is
P
limn→∞ log1 n i6n, i∈A 1i , when this limit exists, with upper and lower logarithmic densities defined using lim sup and lim inf. It is easily seen that if a set has
212
S. Janson
a density, then it also has a logarithmic density, and the two densities coincide.
(The converse does not hold.) Furthermore, define
p∗n := max P(Y(1) = k).
k
It follows from Theorem 19.16 that the second largest probability P(Y(1) = k) is
1 − p∗n + o(1). Thus, for n in a subsequence, Y(1) is asymptotically concentrated
at one value if and only if p∗n → 1; if p∗n stays away from 1, Y(1) takes two values
with large probabilities.
Theorem 19.26. Suppose that w0 > 0 and that wk+1 /wk → 0 as
√ k → ∞.
Suppose further that n → ∞ and m = m(n) with m = λn + o( n) where
0 < λ < ∞.
(i) If 21 < a < 1, then the set {n : p∗n < a} has upper density
1
a
/ log 1−a
> 0 and lower density 0.
log 1−a
(ii) There exists a subsequence of n with upper density 1 and logarithmic density 1 such that p∗n → 1.
Note that the upper density in (i) can be made arbitrarily close to 1 by
taking a close to 1. This was observed by Carr, Goh and Schmutz [21] for the
case in Example 19.24. (However, they failed to remark that the lower density
nevertheless is 0.)
Proof. (i): Let b1 := − log a and b2 := − log(1 − a); thus 0 < b1 < b2 < ∞. Then
max(e−x , 1 − e−x ) < a ⇐⇒ x ∈ (b1 , b2 ), and it follows from Theorem
S 19.23
(and a uniformity in x implicit in the proof) that for any
ε
>
0,
if
n
∈
k [(b1 +
S
ε)nk , (b2 − ε)nk ], then p∗n < a for large n, while if n ∈
/ k [(b1 − ε)nk , (b2 + ε)nk ],
then p∗n > a for large n. Since nk+1
S/nk → 0 as k → ∞, it is easily seen that for
any b′1 , b′2 with 0 < b′1 < b′2 < ∞, k [b′1 nk , b′2 nk ] has upper density (b′2 − b′1 )/b′2
and lower density 0; it follows by taking b′j := bj ± ε and letting ε → 0 that the
set {n : p∗n < a} has upper density (b1 − b2 )/b2 and lower density 0.
(ii): Let Ωk be an increasing
Ωk =
S sequence with Ωk ր ∞ so slowly that log
∗
o(log(nk /nk−1 )). Let A := k [Ω−1
n
,
Ω
n
].
By
Theorem
19.23(ii),
p
k
k k
n → 1
k
as n → ∞ with n ∈
/ A, so it suffices to prove that A has lower density 0 and
logarithmic density 0.
It is easily seen that for the upper logarithmic density of A, it suffices to
consider n ∈ {⌊Ωk nk ⌋}, which gives
lim sup
k→∞
Pk
j=1
PΩj nj
i=Ω−1
j nj
log(Ωk nk )
1/i
6 lim sup
k→∞
Pk
j=1 2 log Ωj + O(1)
Pk
j=1 log(nj /nj−1 )
→ 0.
Hence the logarithmic density exists and is 0.
The lower density is at most, considering the subsequence ⌊Ω−1
k nk ⌋,
lim inf
k→∞
a(Ω−1
Ωk−1 nk−1
Ωk−1 Ωk
k nk )
= 0,
6 lim inf
= lim
−1
k→∞
k→∞
n
Ω−1
n
Ω
n
k /nk−1
k
k
k
k
Simply generated trees and random allocations
213
since Ωk−1 6 Ωk < (nk /nk−1 )1/3 for large k. (Alternatively, it is a general fact
that the lower density is at most the (lower) logarithmic density, for any set
A ⊆ N.)
19.5. The subcase λ = ν and σ 2 = ∞
We give two examples of the case λ = ν and σ 2 = ∞. (In both examples, we may
assume that ν = 1 and m = n − 1, so the examples apply to simply generated
random trees.) The first example shows that Theorem 19.7 does not always hold
if σ 2 = ∞; the second shows that it sometimes does.
Example 19.27. Let 1 < α < 2 and let (wk ) be a probability weight sequence
with w0 > 0 and wk ∼ ck −α−1 as k → ∞, for some c > 0. (This is as in
Example 12.10 with β = α + 1 ∈ (2, 3). If (wk ) is not a probability weight
′
sequence,
P we may replace c by c := c/Φ(1).) We have ρ = 1, and thus ν =
Ψ(1) = kwk < ∞. (We may obtain any desired ν > 0, for example ν = 1, by
adjusting the first few wk .)
We consider the case m = νn + O(1); thus m/n → λ = ν. (This includes
the tree case m = n − 1 in the case ν = 1. Actually, it suffices to assume
m = νn + o(n1/α ).) Then τ = 1 = ρ, and πk = wk .
The random variable ξ thus satisfies E ξ = λ = ν. Note that σ 2 := Var ξ = ∞.
(This is the main reason for taking 1 < α < 2; if we take α > 2, then σ 2 < ∞
and Theorem 19.7 applies.) Furthermore,
P(ξ > k) =
∞
X
l=k
wl ∼ cα−1 k −α .
(19.79)
As in the proof of Theorem 18.14, there exists by [39, Section XVII.5] a stable
random variable Xα (satisfying (19.93) and (19.113)) such that
Sn − nν d
−→ Xα ;
n1/α
(19.80)
moreover, by [46, § 50], the local limit law (18.22) holds uniformly for all integers
ℓ. Note that the density function g is bounded and uniformly continuous on R,
and that g(0) > 0 by (18.24). (In fact, g(x) > 0 for all x. See also [39, Section
XVII.6] for an explicit formula for g as a power series; Xα is, after rescaling,
the extreme case γ = 2 − α, in the notation there.)
By (19.14) and (18.22),
P(Y1 = k) =
wk P(Sn−1 = m − k)
g(−k/n1/α ) + o(1)
= wk
P(Sn = m)
g(0) + o(1)
g(−k/n1/α ) + o(1)
= wk
,
g(0)
uniformly in k > 0.
(19.81)
214
S. Janson
For a non-negative function f on [0, ∞), define
Xnf :=
n
X
f (Yi /n1/α ).
(19.82)
i=1
In particular, if f is the indicator 1{a 6 x 6 b} of an interval [a, b], we write
Xna,b and have in the notation of (19.7)
Xna,b := |{i 6 n : an1/α 6 Yi 6 bn1/α }| = N[an1/α ,bn1/α ] .
(19.83)
Suppose that f is either the indicator of a compact interval [a, b] ⊂ (0, ∞),
or a continuous function with compact support in (0, ∞) (or, more generally,
any Riemann integrable function with support in a compact interval in (0, ∞)).
Then, using (19.81) and dominated convergence,
E Xnf = n
∞
X
f (k/n1/α ) P(Y1 = k) = n
k=0
∞
X
f (k/n1/α )wk
k=0
Z
∞
= n1+1/α
f (⌊xn1/α ⌋/n1/α )w⌊xn1/α ⌋
0
Z ∞
g(−x)
dx.
→
f (x)cx−α−1
g(0)
0
g(−k/n1/α ) + o(1)
g(0)
g(−⌊xn1/α ⌋/n1/α ) + o(1)
dx
g(0)
(19.84)
In the special case when f (x) = 1{a 6 x 6 b} with 0 < a < b < ∞, we further
similarly obtain,
X
E Xna,b (Xna,b − 1) = n(n − 1)
f (k/n1/α )f (j/n1/α ) P(Y1 = k, Y2 = j)
k,j>0
= n(n − 1)
→ c2
= c2
Z
Z
∞
0
b
a
Z
Z
X
f (k/n1/α )f (j/n1/α )wk wl
k,j>0
∞
f (x)f (y)x−α−1 y −α−1
0
b
x−α−1 y −α−1
a
P(Sn−2 = m − k − j)
P(Sn = m)
g(−x − y)
dx dy
g(0)
g(−x − y)
dx dy
g(0)
and, more generally, for any ℓ > 1,
E(Xna,b )ℓ → cℓ
Z
b
a
···
Z
ℓ
bY
a i=1
x−α−1
i
g(−x1 − · · · − xℓ )
dx1 · · · dxℓ .
g(0)
(19.85)
For each such interval [a, b], this integral is bounded by CRℓ for all ℓ > 1, for
some C and R (depending on a and b), and it follows by the method of moments
d
a,b
a,b
, where X∞
is determined by its factorial moments
that Xna,b −→ X∞
a,b
E(X∞
)ℓ = cℓ
Z
a
b
···
Z
ℓ
bY
a i=1
x−α−1
i
g(−x1 − · · · − xℓ )
dx1 · · · dxℓ .
g(0)
(19.86)
Simply generated trees and random allocations
215
a,b
(It follows that X∞
has a finite moment generating function, so the method of
moment applies.) Furthermore, joint convergence for several intervals holds by
the same argument. It follows also (by some modifications or by approximation
d
f
with step functions; we omit the details) that Xnf −→ X∞
for every continuous
f
f > 0 with compact support and some X∞ .
Let Ξn be the multiset {Yi /n1/α : Yi > 0}, regarded as
Pa point process on
(0, ∞). (I.e., formally we let Ξn be the discrete measure i:Yi >0 δYi /n1/α . See
e.g. Kallenberg [68] or [69] for details on point processes, or Janson [57, § 4] for
d
f
for every continuous f > 0 with
a brief summary.) The convergence Xnf −→ X∞
compact support in (0, ∞) implies, see [68, Lemma 5.1] or [69, Lemma 16.15
and Theorem 16.16], that Ξn converges in distribution, as a point process on
(0, ∞), to some point process Ξ on (0, ∞). The distribution of Ξ is determined
a,b
is the number of points of Ξ in [a, b]. By (19.86) or
by (19.86), where X∞
(19.84), the intensity measure is given by
E Ξ = cg(0)−1 x−α−1 g(−x) dx.
(19.87)
We can also consider
infinite intervals. Let a > 0. Then, using again (19.14)
P∞
and noting that k=−∞ P(Sn−1 = m − k) = 1,
E Xna,∞ = n
X
k>an1/α
6 nC1 (an1/α )−α−1
6 C1 a−α−1 n−1/α
6 C2 a
−α−1
X
P(Y1 = a) = n
P
wk
k>an1/α
k>an1/α
P(Sn−1 = m − k)
P(Sn = m)
P(Sn−1 = m − k)
P(Sn = m)
(19.88)
1
n−1/α (g(0)
+ o(1))
.
a,∞
a,∞
6 C2 a−α−1 < ∞. Hence, X∞
<∞
By Fatou’s lemma, (19.88) implies E X∞
a.s. for every a > 0, and we may order the points in Ξ in decreasing order as
Ξ = {ηj }Jj=1
with η1 > η2 > . . . .
(19.89)
0,∞
(Here J = X∞
6 ∞ is the random number of points in Ξ. We shall see that
J = ∞ a.s.)
The bound (19.88) is uniform in n, and tends to 0 as a → ∞. It follows, see
[57, Lemma 4.1], that if we regard Ξn and Ξ as point processes on [0, ∞], the
d
convergence Ξn −→ Ξ on (0, ∞) implies the stronger result
d
Ξn −→ Ξ on [0, ∞].
(19.90)
The points in Ξn , ordered in decreasing order, are Y(1) /n1/α > Y(2) /n1/α > . . . .
If we extend (19.89) by defining ηj := 0 when j > J, the convergence (19.90) of
point processes on [0, ∞] is by [57, Lemma 4.4] equivalent to joint convergence
of the ranked points, i.e.
d
Y(j) /n1/α −→ ηj ,
j > 1 (jointly).
(19.91)
216
S. Janson
0,∞
We claim that each ηj > 0 a.s., and thus J = X∞
= ∞ a.s. Suppose the opposite: P(ηj = 0) = δ > 0 for some j. Then, for every ε > 0, lim inf P(Y(j) /n1/α <
ε) > P(ηj < ε) > δ, and it follows that there exists a sequence εn → 0 such that
P(Y(j) /n1/α < εn ) > δ/2 for all n. We may assume that εn n1/α → ∞. Let A > 0
−1/α
and take (for large n) an := εn and bn := εn−α − αc−1 A
. Then an , bn → 0.
1/α
1/α
For k 6 bn n
= o(n ), (18.22) implies P(Sn−1 = m − k)/ P(Sn = m) → 1,
and the argument in (19.84)–(19.85) yields, for each ℓ > 1,
an ,bn
E Xn
ℓ
d

∼ n
bnX
n1/α
k=an
n1/α
ℓ
wk  ∼
−α
= cα−1 a−α
n − bn
Hence, Xnan ,bn −→ Po(A); in particular,
ℓ
c
Z
bn
an
−α−1
x
!ℓ
dx
(19.92)
= Aℓ .
δ/2 6 P(Y(j) /n1/α < εn ) 6 P(Xnan ,bn < j) → P(Po(A) < j).
Taking A large enough, we can make P(Po(A) < j) < δ/2, a contradiction which
proves our claim.
We have shown that (19.91) holds with ηj > 0. Furthermore, since the intensity (19.87) is absolutely continuous, each ηj has an absolutely continuous
distribution. Hence Y(1) , and every Y(j) , is of the order n−1/α , with a continuous
limit distribution ηj (and thus no strict concentration at some constant times
n−1/α ).
Note that if we consider i.i.d. variables ξ1 , . . . , ξn , then {ξi /n1/α : ξi > 0}
converges (as is easily verified) to a Poisson process on [0, ∞] with intensity
cx−α−1 dx. This intensity differs from the intensity of Ξ in (19.87), and, since
g(−x) → 0 as x → ∞, it is easy to see that ξ(1) /n1/α and Y(1) /n1/α have
different limit distributions. Thus, Theorem 19.7 does not hold in this case.
(However, Y(1) and ξ(1) are of the same order n−1/α .) Note also that, as an easy
consequence of (19.86), the limiting point process Ξ in this example is not a
Poisson process.
Remark 19.28. The distribution of the limiting point process Ξ in Example 19.27 is in principle determined by (19.86) and its extension to joint converai ,bi
. This can be made more explicit as follows. (See Luczak
gence for several X∞
and Pittel [83] for similar calculations.)
It follows from Feller [39, Section XVII.5], see e.g. [63] for detailed calculations, that Xα has the characteristic function
ϕ(t) = exp c Γ(−α)(−it)α ,
t ∈ R.
(19.93)
(Note that Γ(−α) > 0 and Re(−it)α < 0 for t 6= 0 since 1 < α < 2.) The
inversion formula gives
Z ∞
Z ∞
α
1
1
−ixt
e
ϕ(t) dt =
e−ixt+cΓ(−α)(−it) dt,
(19.94)
g(x) =
2π −∞
2π −∞
Simply generated trees and random allocations
217
and (19.86) yields
Z ∞ Y
Z b Z bY
ℓ
ℓ
1
eixj t ϕ(t) dt dx1 · · · dxℓ
x−α−1
cℓ
···
j
2πg(0)
−∞ j=1
a j=1
a
(19.95)
Z ∞ Z b
ℓ
1
−α−1 itx
c
x
e dx ϕ(t) dt.
=
2πg(0) −∞
a
a,b
E(X∞
)ℓ =
a,b
In particular, E(X∞
)ℓ = O(C ℓ ) for some C < ∞ (with C depending on a but
a,b
not on b). Hence, X∞ has probability generating function, convergent for all
complex z,
∞ a,b X
a,b
X∞
X∞
(z − 1)ℓ
=E
Ez
ℓ
ℓ=0
Z ∞ Z b
∞
ℓ
X
1
(z − 1)ℓ
(19.96)
c
x−α−1 eitx dx ϕ(t) dt
·
=
ℓ!
2πg(0) −∞
a
ℓ=0
Z ∞
Z b
1
=
exp (z − 1)c
x−α−1 eitx dx ϕ(t) dt.
2πg(0) −∞
a
We can here let b → ∞, so (19.96) holds for b = ∞ too. In particular, taking
z = 0, we obtain, using (19.91), the limit distribution of Y(1) /n1/α as
x,∞
P(η1 6 x) = P(X∞
= 0)
Z ∞
Z ∞
1
=
x−α−1 eitx dx ϕ(t) dt
exp −c
2πg(0) −∞
a
Z ∞
Z ∞
1
=
x−α−1 eitx dx + c Γ(−α)(−it)α dt
exp −c
2πg(0) −∞
a
Z ∞
Z a
a1−α a−α
1
dt,
− it
exp c
x−α−1 eitx − 1 − itx dx −
=
2πg(0) −∞
α
α−1
0
(19.97)
where the last equality holds because
Z ∞
Γ(−α)uα =
x−α−1 e−ux − 1 + ux du
(19.98)
0
when Re u > 0 and 1 < α < 2.
Furthermore, by extending (19.86) to joint factorial moments for several (disjoint) intervals,
it follows similarly, for step functions f , that the random variable
P∞
f
X∞
= j=1 f (ηj ) satisfies
Z ∞
Z ∞
f
1
ef (x) − 1 x−α−1 eitx dx ϕ(t) dt
exp c
E eX∞ =
2πg(0) −∞
0
Z ∞
Z ∞
1
ef (x) − 1 x−α−1 eitx dx + c Γ(−α)(−it)α dt
=
exp c
2πg(0) −∞
0
Z ∞
Z ∞
1
=
exp c
x−α−1 ef (x)+itx − 1 − itx dx dt.
2πg(0) −∞
0
(19.99)
218
S. Janson
By taking limits, (19.99) extends to, e.g., any bounded measurable f with comf
sf
pact support in (0, ∞]. Since E esX∞ = E eX∞ for s ∈ R, this formula determines
f
(in principle) the distribution of each X∞
and thus of Ξ.
Example 19.29. Let (wk ) be as in Example 19.27 but with α = 2, i.e., wk ∼
ck −3 as k → ∞, for some c > 0. (Example 12.10
P with β = 3.) We still have
kwk < ∞. (We may again
(19.79); further, ρ = 1, and thus ν = Ψ(1) =
obtain any desired ν > 0, for example ν = 1, by adjusting the first few wk .)
As in Example 19.27. we consider the case m = νn + O(1), including the tree
case m = n − 1 when ν = 1. Thus, again, m/n → λ = ν, τ = 1 = ρ, πk = wk ,
and the random variable ξ satisfies E ξ = λ = ν, while σ 2 := Var ξ = ∞.
As in the proof of Theorem 18.14, we have the central limit theorem (18.25),
and the local limit law (18.26) holds √
uniformly for all integers ℓ.
Choose B(n) := n1/2 log log n = o( n log n). Then, by (18.26),
g(0) + o(1)
Z(m, n) = P(Sn = m) = √
n log n
(19.100)
and, uniformly for all k 6 B(n),
g(0) + o(1)
.
Z(m − k, n − 1) = P(Sn−1 = m − k) = √
n log n
(19.101)
Hence, by (19.14), (19.22) holds. Furthermore, (18.26) yields also, since g(0) =
maxx∈R g(x),
g(0) + o(1)
Z(m − k, n − 1) = P(Sn−1 = m − k) 6 √
,
n log n
(19.102)
uniformly for all k > 0; hence (19.14) implies that (19.17)–(19.18) hold.
For our B(n) we have by (19.79)
P(ξ > B(n)) = O(B(n)−2 ) = o(n−1 ),
(19.103)
so (19.19) holds, and thus (19.20) holds.
The proof of Theorem 19.7 now holds without further modifications; hence
the conclusions of Theorem 19.7 holds for this example, although σ 2 = ∞.
Note that in Example 19.27, although the asymptotic distributions of Y(1)
and ξ(1) are different, they are still of the same order of magnitude. We do not
know whether this is true in general. This question can be formulated more
precisely as follows.
Problem 19.30. In the case λ = ν, do Theorem 19.7(ii)–(iii) hold also when
σ 2 = ∞?
In any case, we can ask about the possible rates of growth of Y(1) , for example
as follows, where we for definiteness consider the tree case m = n − 1 (and thus
λ = 1).
219
Simply generated trees and random allocations
Problem 19.31. For which sequences ω(n) does there exist a weight sequence
with ν = 1 such that, with m = n − 1, Y(1) > ω(n) w.h.p.?
As remarked earlier, the answer is positive for ω(n) = n1−ε , for any ε > 0,
as shown by Example 19.27 with α < 1/(1 − ε).
19.6. The case λ > ν
We turn to the case λ > ν. Then, as briefly discussed in Section 11, the
asymptotic
formula for the numbers Nk in Theorem 11.4 accounts only for
P∞
kπ
n
= µn = νn balls, so there are m − νn ≈ (λ − ν)n balls missk
k=0
ing. A more careful treatment of the limits show that the explanation is that
Theorem 11.4 really implies P
that the “small” boxes (i.e., those with rather few
∞
balls) have a total of about k=0 kπk n = µn = νn balls, while the remaining
≈ (λ − ν)n balls are in a few “large” boxes. One way to express this precisely is
the following simple result.
Lemma 19.32. Let w = (wk )k>0 be a weight sequence with w0 > 0 and ω = ∞.
Suppose that n → ∞ and m = m(n) with m/n → λ where ν < λ < ∞.
(i) For any sequence Kn → ∞,
X
kNk > νn + op (n)
k6Kn
X
and
k>Kn
kNk 6 (λ − ν)n + op (n).
(ii) There exists a sequence Ωn → ∞ such that for any sequence Kn → ∞
with Kn 6 Ωn we have
X
X
kNk = (λ − ν)n + op (n).
kNk = νn + op (n)
and
k>Kn
k6Kn
Proof. The two statements in each part are equivalent, since
∞
X
kNk = m = λn + o(n).
(19.104)
k=0
(i): For every fixed ℓ, Theorem 11.4 implies
1X
p X
kNk −→
kπk .
n
k6ℓ
(19.105)
k6ℓ
P
P∞
Let ε > 0. Since k=0 kπk = ν < ∞, there exists ℓ such that k6ℓ kπk > ν − ε,
and (19.105) implies that w.h.p.
1X
kNk > ν − ε.
n
k6ℓ
P
Since ε is arbitrary, this implies k6Kn kNk > νn + op (n).
P
P
(ii): For each fixed ℓ, k6ℓ kπk < ∞
k=0 kπk = ν, and thus (19.105) implies
P
P( k6ℓ kNk > νn) → 0. Hence, there exists an increasing sequence of integers
220
S. Janson
P
nℓ such that if n > nℓ , then
P
kN
>
νn
< 1/ℓ. Now define Ωn = ℓ
k
k6ℓ
P
for nℓ 6 n < nℓ+1 . Then k6Ωn kNk 6 νn w.h.p., which together with (i)
yields (ii).
Consider the “large” boxes. One obvious possibility is that there is a single
“giant” box with ≈ (λ − ν)n balls; more formally, (λ − ν)n + op (n) balls (a
“monopoly”). Applying Lemma 19.32(i) with Kn = o(n), we see that for every
ε > 0, w.h.p. there are then less than εn balls in all other boxes with more than
Kn balls each; thus, either Y(2) 6 Kn or Y(2) < εn. Consequently, this case is
defined by
Y(1) = (λ − ν)n + op (n),
Y(2) = op (n).
p
(19.106)
(19.107)
p
Equivalently, Y(1) /n −→ λ − ν and Y(2) /n −→ 0. This thus describes condensation of the missing balls to a single box.
We will see in Theorem 19.34 that, indeed, this is the case for the important example of weights with a power-law. Another, more extreme example is
Example 10.8, wk = k!, where ν = 0, see Example 19.36.
However, if (wk ) is very irregular, (19.106)–(19.107) do not always hold. Examples 19.37 and 19.38 give examples where, at least for a subsequence, either
p
Y(2) /n −→ a > 0, so there are at least two giant boxes with order n balls each
p
(an “oligopoly”), or Y(1) /n −→ 0, so there is no giant box with order n balls,
and the missing (λ − ν)n balls are distributed over a large number (necessarily
→ ∞ as n → ∞) of boxes, each with a large but o(n) number of balls.
Example 19.33. We consider Example 12.10; wk ∼ ck −β as k → ∞. If β 6 2,
then ν = ∞, see (12.46), and thus λ < ν and Theorems 19.3 and 19.7 apply. We
are interested in the case λ > ν, so we assume β > 2. In this case, Jonsson and
Stefánsson [67] showed (for the case of random trees) that when λ > ν we have
the simple situation with condensation to a single giant box. We state this in the
next theorem, which also includes further, more precise, results. (Note that the
case λ < ν is covered by Theorems 19.3 and 19.7, with Y(1) of order log n; the
case λ = ν is studied in Examples 19.27 and 19.29 for 2 < β 6 3, and is covered
by Theorem 19.7 when β > 3; in both cases Y(1) is of order n−1/(β−1) = o(n).)
Theorem 19.34. Suppose that wk ∼ ck −β as k → ∞ for some c > 0 and
β > 2. Then ν < ∞. Suppose further m/n → λ > ν. Let α := β − 1 > 1 and
c′ := c/Φ(1).
(i) The random allocation Bm,n = (Y1 , . . . , Yn ) has largest components
Y(1) = (λ − ν)n + op (n),
Y(2) = op (n).
(19.108)
(19.109)
(ii) The partition function is asymptotically given by
Z(m, n) ∼ c(λ − ν)−β Φ(1)n−1 n1−β .
(19.110)
Simply generated trees and random allocations
221
(iii) Furthermore,
n−1
X
d ′
′
,
, . . . , ξ(n−1)
ξi , ξ(1)
Y(1) , Y(2) , . . . , Y(n) ≈ m −
(19.111)
i=1
′
′
where ξ(1)
, . . . , ξ(n−1)
are the n − 1 i.i.d. random variables ξ1 , . . . , ξn−1 ,
with distribution (πk ), ordered in decreasing order.
(iv) Y(1) = m − νn + Op (n1/α ) and
d
n−1/α m − νn − Y(1) −→ Xα ,
(19.112)
where Xα is an α-stable random variable with Laplace transform
Re t > 0.
(19.113)
E e−tXα = exp c′ Γ(−α)tα ,
(v) Y(2) = Op (n1/α ) and
d
n−1/α Y(2) −→ W,
where W has the Fréchet distribution
c′
P(W 6 x) = exp − x−α ,
α
(19.114)
x > 0.
(19.115)
(vi) More generally, for each j > 2, Y(j) = Op (n1/α ) and
d
n−1/α Y(j) −→ Wj ,
(19.116)
where Wj has the density function
′ −α−1
cx
j−2
c′ α−1 x−α
exp −c′ α−1 x−α ,
(j − 2)!
x > 0,
(19.117)
and c′ α−1 Wj−α ∼ Γ(j − 1, 1).
Note that πk = wk /Φ(1) and that Γ(−α) > 0 in (19.113).
Part (iii) shows that Y(2) , . . . , Y(n) asymptotically are as order statistics of
n − 1 i.i.d. random variables ξi ; thus the giant box absorbs the dependency
between the variables Y1 , . . . , Yn introduced by the conditioning in (11.7).
Remark 19.35. Jonsson and Stefánsson [67] considered only trees, and thus
m = n − 1 and λ = 1, and then showed the tree versions of (i) and (ii). (They
further showed Theorem 7.1 when wk ∼ ck −β .) In the tree case (i) says that
the random tree Tn has w.h.p. a node of largest degree (1 − ν)n + o(n), while
all other nodes have degrees o(n); further, by Theorem 15.5, (ii) becomes
Zn ∼ c(1 − ν)−β Φ(1)n−1 n−β ∼ (1 − ν)−β Φ(1)n−1 wn .
(19.118)
Proof of Theorem 19.34. We may assume that w0 > 0 by the argument in Remark 11.8. Furthermore, using (11.9) for (ii), by dividing wk (and c) by Φ(1),
222
S. Janson
we may assume that (wk ) is a probability weight sequence, and thus Φ(1) = 1.
For λ > ν we have τ = ρ = 1, and thus then πk = wk .
P
(i): Φ(t) has radius of convergence ρ = 1, and since β > 2, Φ(1) = k wk < ∞
and ν = Φ′ (1)/Φ(1) < ∞.
Consider as in Example 11.2 i.i.d. random variables ξ1 , . . . , ξn with distribution (πk ) = (wk ) and mean µ = ν.
Fix a small ε > 0. We assume that ε < λ − ν.
p
By the law of large numbers, Sn−1 /n −→ µ = ν. We may thus find a sequence
δn → 0 such that |Sn−1 − nν| 6 nδn w.h.p.
Since m/n − ν − δn → λ − ν > ε, we have m − νn − δn n > εn for large n; we
consider only such n.
We separate the event Sn = m into four disjoint cases (subevents):
E1
E2
E3
E4
:
:
:
:
Exactly one ξi > εn, and that ξi satisfies |ξi − (m − νn)| 6 δn n.
Exactly one ξi > εn, and that ξi satisfies |ξi − (m − νn)| > δn n.
ξi > εn for at least two i ∈ {1, . . . , n}.
All ξi 6 εn.
We shall show that E1 is the dominating event. We define also the events
E1i
∗
E1i
∗
E2i
Dij
|ξi − (m − νn)| 6 δn n and ξj 6 εn for j 6= i.
|ξi − (m − νn)| 6 δn n.
|ξi − (m − νn)| > δn n, ξi > εn.
ξi > εn, ξj > εn.
S
Then E1 is the disjoint union ni=1 E1i , so by symmetry
:
:
:
:
Sn
Sn
Sn
Sn
= m,
= m,
= m,
= m,
P(E1 ) = n P(E11 ).
(19.119)
Furthermore, for any i,
∗
E1i ⊆ E1i
⊆ E1i ∪
[
j6=i
Dij
and thus, again using symmetry,
∗
∗
P(E11
) > P(E11 ) > P(E11
) − n P(D12 ).
(19.120)
Using the fact that |k − (m − νn)| 6 δn n implies wk ∼ ck −β ∼ c(λn − νn)−β ,
together with |Sn−1 − nν| 6 δn n w.h.p., we obtain
X
∗
P(ξ1 = k, Sn = m)
P(E11
)=
|k−(m−νn)|6δn n
=
X
|k−(m−νn)|6δn n
=
X
|k−(m−νn)|6δn n
P(ξ1 = k) P(Sn−1 = m − k)
(19.121)
c(λ − ν)−β n−β 1 + o(1) P(Sn−1 = m − k)
= c(λ − ν)−β n−β P |Sn−1 − nν| 6 δn n 1 + o(1)
= c(λ − ν)−β n−β 1 + o(1) .
Simply generated trees and random allocations
223
Similarly, allowing the constants Ci here and below to depend on ε,
X
∗
P(ξi = k, Sn = m)
P(E2i
)=
|k−(m−νn)|>δn n, k>εn
X
6 C1 (εn)−β
|k−(m−νn)|>δn n, k>εn
P(Sn−1 = m − k)
(19.122)
6 C2 n−β P |Sn−1 − νn| > δn n = o n−β .
For any i and j, by symmetry,
P(Dij ) = P(ξn > εn, ξn−1 > εn, Sn = m)
X
=
P(ξn = k) P(Sn−1 = m − k, ξn−1 > εn)
k>εn
X
6 C3 (εn)−β
k>εn
P(Sn−1 = m − k, ξn−1 > εn)
(19.123)
6 C3 (εn)−β P(ξn−1 > εn) 6 C4 (εn)1−2β .
Hence, (19.120) and (19.121) yield
P(E11 ) = c(λ − ν)−β n−β + o n−β + O n2−2β = c(λ − ν)−β n−β + o n−β
and hence, by (19.119),
P(E1 ) = c(λ − ν)−β n1−β + o n1−β .
(19.124)
Furthermore, (19.122) yields
P(E2 ) 6
n
X
i=1
∗
∗
P(E2i
) = n P(E21
) = o n1−β ,
and (19.123) also yields
X
P(E3 ) 6
P(Dij ) 6 n2 P(D12 ) = O n3−2β = o n1−β .
(19.125)
(19.126)
i<j
It remains to estimate
P(E4 ). We define the truncated variables ξ̄i := ξi 1{ξi 6
Pn
εn} and S̄n := i=1 ξ̄i . Thus E4 ⊆ {S̄n = m} and hence, for every real s,
n
(19.127)
P(E4 ) 6 e−sm E esS̄ = e−sm E esξ̄1 .
Let s := a log n/n, for a constant a > 0 chosen later. Then,
E esξ̄1 = 1 + s E ξ̄1 +
εn
X
k=1
πk esk − 1 − sk
2β/s
6 1 + sν + C5
X
k=1
k
−β 2 2
s k + C5
εn
X
k=2β/s
(19.128)
k
−β sk
e .
224
S. Janson
We have, treating the cases 2 < β < 3, β = 3 and β > 3 separately, using s → 0,
2β/s
X
k=1
s2 k 2−β 6 C6 s2 max 1, (2β/s)3−β , log(2β/s) = o(s).
Furthermore, for k > 2β/s,
1 β −s
k −β esk
e 6 eβ/k−s 6 es/2−s = e−s/2 .
=
1
+
k
(k + 1)−β es(k+1)
Hence, the final sum in (19.128) is dominated by a geometric series
X
(⌊εn⌋)−β es⌊εn⌋ e−s(⌊εn⌋−k)/2 6 C7 s−1 n−β esεn = C7 s−1 n−β eaε log n .
k6⌊εn⌋
If we assume aε 6 β − 2, the sum is thus 6 C8 n1−β+aε 6 C8 n−1 = o(s).
Consequently, (19.128) yields
E esξ̄1 6 1 + sν + o(s) 6 exp sν + o(s)
and thus (19.127) yields
P(E4 ) 6 exp −sm + nsν + o(ns) = exp −ns(λ − ν + o(1)) = n−a(λ−ν)+o(1) .
(19.129)
We choose first a := β/(λ − ν) and
then
ε
<
(β
−
2)/a,
and
see
by
(19.129)
that
then P(E4 ) = n−β+o(1) = o n1−β . Combining (19.124), (19.125), (19.126) and
(19.129), we find
P(Sn = m) = P(E1 ) + o n1−β = c(λ − ν)−β n1−β + o n1−β ,
(19.130)
and, in particular, P(E1 | Sn = m) → 1. Consequently, by conditioning on
Sn = m we see that w.h.p. |Y(1) − (m − νn)| 6 δn n and Y(2) 6 εn. Since ε can
be chosen arbitrarily small, this completes the proof of (19.108)–(19.109).
(ii): Z(m, n) = P(Sn = m), so (19.110) follows from (19.130), since we assume
Φ(1) = 1.
(iii): Since E1 ⊆ {Sn = m} and P(E1 | Sn = m) → 1,
d
d
(Y1 , . . . , Yn ) = (ξ1 , . . . , ξn ) | Sn = m ≈ (ξ1 , . . . , ξn ) | E1 .
(19.131)
When we consider the ordered variables Y(1) , . . . , Y(n) , we may by symmetry
condition on E1n instead of E1 . Note that E1n is the event (ξ1 , . . . , ξn ) ∈ A,
where A is the set
n−1
n−1
o
n
X
X
xi − νn 6 δn n .
xi , (x1 , . . . , xn ) : xj 6 εn for j 6 n − 1, xn = m −
i=1
i=1
Since (x1 , . . . , xn ) ∈ A implies |xn − (m − νn)| 6 δn n, we then have, similarly
to (19.121),
−β
P(ξn = xn ) ∼ cx−β
∼ c(λ − ν)−β n−β .
n ∼ c(m − νn)
225
Simply generated trees and random allocations
Furthermore, x1 , . . . , xn−1 determine xn by
formly for all (x1 , . . . , xn ) ∈ A,
Pn
1
xi = m. It follows that, uni-
P (ξ1 , . . . , ξn ) = (x1 , . . . , xn )
= 1 + o(1) c(λ − ν)−β n−β P (ξ1 , . . . , ξn−1 ) = (x1 , . . . , xn−1 )
= 1 + o(1) c(λ − ν)−β n−β P (ξ1 , . . . , ξn−1 , m − Sn−1 ) = (x1 , . . . , xn ) .
Hence, since the factor c(λ − ν)−β n−β is a constant for each n,
d
(ξ1 , . . . , ξn ) | E1n ≈ (ξ1 , . . . , ξn−1 , m − Sn−1 ) | Een ,
(19.132)
where Een is the event
(ξ1 , . . . , ξn−1 , m − Sn−1 ) ∈ A = ξj 6 εn for j 6 n − 1, Sn−1 − νn 6 δn n .
(19.133)
If Een holds, then m − Sn−1 > m − νn − δn n > εn (for large n), so the largest
variable among ξ1 , . . . , ξn−1 , m−Sn−1 is m−Sn−1 . Hence, ordering the variables,
we obtain using (19.131)–(19.132)
d
′
′
) | Een .
, . . . , ξ(n−1)
Y(1) , . . . , Y(n) ≈ (m − Sn−1 , ξ(1)
(19.134)
Finally, observe that |Sn−1 − νn| 6 δn n w.h.p. and
P ξj > εn for some j 6 n − 1 6 n P(ξ1 > εn) = O n2−β → 0.
Hence, P(Een ) → 1, and thus
d
′
′
′
′
, . . . , ξ(n−1)
.
(m − Sn−1 , ξ(1)
, . . . , ξ(n−1)
) | Een ≈ m − Sn−1 , ξ(1)
(19.135)
The result (19.111) follows from (19.134) and (19.135).
d Pn−1
(iv): By (iii), m − nν − Y(1) ≈ i=1 ξi − nν, and (19.112) follows by standard results on domains of attraction for stable distributions, see e.g. Feller [39,
Section XVII.5].
d
′
(v): By (iii), Y(2) ≈ ξ(1)
, and (19.114) follows by standard results on the
maximum of i.i.d. random variables, as in e.g. Leadbetter, Lindgren and Rootzén
[82]: using P(ξ > x) ∼ cα−1 x−α as x → ∞, we have
′
P(Y(1) 6 xn1/α ) = P(ξ(1)
6 xn1/α ) + o(1) = P(ξ 6 xn1/α )n−1 + o(1)
n−1
+ o(1)
= 1 − (cα−1 + o(1))(xn1/α )−α
−1 −α
.
→ exp −cα x
(vi): Similar, cf. Leadbetter, Lindgren and Rootzén [82, Section 2.2].
226
S. Janson
Example 19.36. If we take wk = k!, then ν = ρ = 0. Consider the tree case
m = n − 1. By Example 10.8, translating to balls-in-boxes, w.h.p. there are N1
boxes with 1 ball each and a single box with the remaining n − 1 − N1 balls,
d
while all other boxes are empty; furthermore, N1 −→ Po(1) so N1 = Op (1).
Hence, Y(1) = n + Op (1) and Y(2) 6 1 w.h.p.
If we take wk = k!α with 0 < α < 1, and still m = n−1, then by Example 10.9
and [64], Y(1) = n + Op (n1−α ) = n + op (n) and Y(2) 6 ⌊1/α⌋ w.h.p.
If we take wk = k!α with α > 1, and still m = n − 1, then by Example 10.9,
w.h.p. there is a single box containing all n − 1 balls; thus Y(1) = n − 1 and
Y(2) = 0 w.h.p.
In particular, (19.106)–(19.107) hold, with λ = 1 and ν = 0, for all three
cases. We guess that the same is true for any λ < ∞, but we have not checked
the details.
Example 19.37. We consider the tree case m = n − 1. Let S := {k0 , k1 , . . . } be
an infinite set with k0 = 0, k1 = 1, k2 = 2, and kj for j > 3 chosen recursively
as specified below. Let wk = (k + 1)−4 for k ∈ S, and wk = 0 otherwise; thus,
supp(w) = S. (S = N0 gives Example 10.7 with β = 4.) Then ρ = 1 and
P∞
P∞
k(k + 1)−4
kwk
k=0
6 k=0
ν = Ψ(1) = P∞
= ζ(3) − ζ(4) < 0.2 < 1; (19.136)
w0
k=0 wk
thus τ = ρ = 1.
To begin with, we require that kj > jkj−1 for j > 3. Take n = kj . A good
allocation of n − 1 balls in n boxes has at most kj−1 balls in any box, since
n − 1 < kj , so
Y(1) 6 kj−1 6 kj /j = n/j.
(19.137)
Hence, for n in the subsequence {kj }, the random allocation Bn−1,n has Y(1) =
o(n).
Next, suppose that k0 , . . . , kj−1 are given, and let w(kj−1 ) be w truncated
at kj−1 as in (13.4); for ease of notation we denote the corresponding generPj−1
ating function by Φj (t) := i=0 wki tki and write Ψj (t) := tΦ′j (t)/Φj (t) and
Zj (m, n) := Z(m, n; w(kj−1 ) ). Note that (19.136) applies to each Ψj too, and
thus
Ψj (1) < 0.2.
(19.138)
Take n = 3kj (where kj is not yet determined). A good allocation with n − 1
balls has at most 2 boxes with kj balls, and for the remaining boxes the weights
w and w(kj−1 ) coincide. We thus obtain
Z(3kj − 1, 3kj ) = Zj (3kj − 1, 3kj ) + 3kj wkj Zj (2kj − 1, 3kj − 1)
3kj
wk2j Zj (kj − 1, 3kj − 2). (19.139)
+
2
Let the three terms on the right-hand side be A0 , A1 , A2 , where Ai corresponds
to the case when i boxes have kj balls. The generating function Φj is a polynomial, with radius of convergence ρj = ∞ and, by Lemma 3.1, νj := Ψj (∞) =
Simply generated trees and random allocations
227
ω(w(kj−1 ) ) = kj−1 > 2. Define τ, τ ′ and τ ′′ by Ψj (τj ) = 1, Ψj (τj′ ) = 2/3,
Ψj (τj′′ ) = 1/3. Since Ψj (1) < 1/3 by (19.138), we have 1 < τj′′ < τj′ < τj < ∞.
Theorem 18.1 applies to each term Ai in (19.139), with λ = 1, 32 , 13 , respectively; hence, as kj → ∞,
Φj (τj )
+ o(kj ),
τj
Φj (τj′ )
log A1 = 3kj log ′ 2/3 + o(kj ),
(τj )
log A0 = 3kj log
log A2 = 3kj log
Φj (τj′′ )
+ o(kj ).
(τj′′ )1/3
(19.140)
(19.141)
(19.142)
By (11.16) and τj′′ > 1,
Φj (τj′′ )
Φj (τj′′ )
Φj (τj )
6
<
τj
τj′′
(τj′′ )1/3
and
Φj (τj′ )
Φj (τj′′ )
Φj (τj′′ )
6
<
.
(τj′ )2/3
(τj′′ )2/3
(τj′′ )1/3
Hence, the constant multiplying kj is larger in (19.142) than in (19.140) and
(19.141), so by choosing kj large enough, we obtain A2 > jA1 and A2 > jA0 ,
and thus
P(B3kj −1,3kj has 2 boxes with kj balls) =
A2
2
> 1 − . (19.143)
A0 + A1 + A2
j
This constructs recursively the sequence (kj ) and thus S and w, and (19.143)
shows that for n in the subsequence (3kj )j , Bn−1,n w.h.p. has 2 boxes with n/3
balls each.
By Lemma 17.1, it follows that, for this subsequence, Tn w.h.p. has 2 nodes
with outdegrees n/3.
To summarise, we have found a weight sequence with 0 < ν < 1 such that,
with m = n − 1, for one subsequence
Y(1) /n → 0
(19.144)
and for another subsequence w.h.p.
Y(1) = Y(2) = n/3.
(19.145)
Hence, neither (19.106) nor (19.107) holds. (It is easy to modify the construction
such that for every ℓ > 1, there is a subsequence with Y(1) = · · · = Y(ℓ) =
n/(ℓ + 1).)
Example 19.38. Let S := {0} ∪ {2i : i > 0}. We will construct a weight
sequence w recursively with support supp(w) = S and ρ = 0. Let w0 = 1.
228
S. Janson
Let i > 0. If w0 , . . . , w2i−1 are fixed and we let w2i → ∞, then for every m
with 2i 6 m < 2i+1 and every n,
P(Bm,n contains a box with 2i balls) → 1.
(19.146)
Hence, we can recursively choose w2i so large that, for every i > 0, if 2i 6 m <
2i+1 and 2i 6 n 6 22i , then, by (11.3),
P(Bm,n contains a box with 2i balls) > 1 − i−1 .
(19.147)
We further take w2i > (2i )!; thus ρ = 0 and ν = 0.
Consider the tree case, m = n−1. Thus λ = 1. If 2i < n 6 2i+1 , then (19.147)
applies and shows that Bn−1,n w.h.p. contains a box with 2i balls, so w.h.p.
Y(1) = 2⌊log2 (n−1)⌋ = 2⌈log2 n⌉−1 .
(19.148)
Hence, Y(1) /n w.h.p. is a (non-random) value that oscillates between 12 and 1,
depending on the fractional part {log2 n} of log2 n. Consequently, (19.106) holds
for subsequences such that 0 6= {log2 n} → 0, but not in general.
Moreover, conditioned on the existence of a box with 2i balls, the remainder
of the allocation is a random allocation Bm−2i ,n−1 of the remaining m − 2i balls
in n−1 boxes. For example, if n = 2i+1 , so m = 2i+1 −1, we have m−2i = 2i −1,
and we can apply (19.147) again (with i−1) to see that w.h.p. Y(2) = 2i−1 = n/4.
Continuing in the same way we see that for n in the subsequence (2i ), we have,
for each fixed j, w.h.p.
Y(j) = 2−j n.
(19.149)
Hence neither (19.106) nor (19.107) holds in this case.
Similar results follow easily for other subsequences. For example, for n in the
subsequence (⌊r2i ⌋)i>1 , where 21 < r < 1 and r has the infinite binary expansion
r = 2−ℓ1 + 2−ℓ2 + . . . , with 1 = ℓ1 < ℓ2 < . . . , we have w.h.p. Y(j) = 2−ℓi ⌈n/r⌉
for each fixed j.
Example 19.39. Let again m = n − 1, so λ = 1. Taking wk = k! for k ∈
supp(w) = {0} ∪ {i! : i > 0}, we obtain an example with ρ = 0 and thus ν = 0
such that Y(1) /n → 0 for some subsequences, for example for n = i! (since then
Y(1) 6 (i − 1)!).
p
Problem 19.40. Is Y(1) /n −→ 0 possible when 0 < ν < λ? Example 19.37
shows that this is possible for a subsequence, but we conjecture that it is not
possible for the full sequence, and, a little stronger, that there always is some
ε > 0 and some subsequence along which Y(1) > εn w.h.p.
p
Problem 19.41. Is Y(1) /n −→ 0 possible when λ > ν = 0? (Example 19.39
shows that this is possible for a subsequence.)
We expect that bad behaviour as in the examples above only can occur for
quite irregular weight sequences, but we have no general result beyond Theorem 19.34. We formulate two natural problems.
Simply generated trees and random allocations
229
Problem 19.42. Suppose that wk > wk+1 for all (large) k. Does this imply
that (19.106)–(19.107) hold when λ > ν?
Problem 19.43. Suppose that wk+1 /wk → ∞ as k → ∞. (Hence, ρ = 0 and
ν = 0.) Does this imply that (19.106)–(19.107) hold when λ > ν?
19.7. Applications to random forests
We give some applications of the results above to the size of the largest tree(s)
in different types of random forests witn n trees and m > n nodes. We consider
only the case m/n → λ with 1 < λ < ∞; for simplicity we further assume that
m = λn + O(1), although this can be relaxed and, moreover, the general case
m/n → λ can be handled by using λn := m/n and the corresponding τn :=
τ (λn ) as in Theorem 11.6; for details and for results in the cases m = n + o(n)
and m/n → ∞, see Pavlov [94, 95, 96, 97], Kolchin [76], Luczak and Pittel [83],
Kazimirov and Pavlov [72] and Bernikovich and Pavlov [12].
The random forests considered here are described by balls-in-boxes with
weight sequences with w0 = 0 and w1 > 0, see Section 12. As usual, we use
(without further comments) the argument in Remark 11.8 to extend theorems
above to the case w0 = 0. (See Remark 19.21.)
We first consider random rooted forests as in Example 12.6. We have
wk =
1
k k−1
∼ √ k −3/2 ek ,
k!
2π
as k → ∞,
(19.150)
and thus wk+1 /wk → e as k → ∞. (Alternatively, we may use w
ek := e−k wk ∼
−1/2 −3/2
(2π)
k
, see (12.30)–(12.31) and Example 12.10.) Since ν = ∞, see Examples 12.6 and 12.10, λ < ν and Theorem 19.19 applies for any λ ∈ (1, ∞).
We have a = e and thus, by (12.28),
q := τ e =
λ − 1 1/λ
e
∈ (0, 1)
λ
(19.151)
and, consequently,
1 1
log(1/q) = − log q = − log 1 −
− > 0.
λ
λ
(19.152)
As k → ∞, by (12.22), (12.27) and (19.150),
πk =
wk τ k
λ
λ
=
wk τ k ∼ (2π)−1/2
k −3/2 q k .
Φ(τ )
λ−1
λ−1
(19.153)
It follows that πk(n) = Θ(1/n) for
k(n) =
log n − 32 log log n
+ O(1),
log(1/q)
(19.154)
230
S. Janson
and then (19.69) yields
λ log3/2 (1/q)
λ
k(n)−3/2 ∼ √
n log−3/2 n. (19.155)
N ∼ n√
2π(λ − 1)(1 − q)
2π(λ − 1)(1 − q)
Consequently, Theorem 19.19(ii) yields the following theorem for the maximal
tree size Y(1) ; this is due to Pavlov [94, 96] (in a slightly different formulation),
who also gives further results. We further use Theorem 19.16(i) to give a simple
estimate for the size Y(j) of the j:th largest tree. (More precise limit results for
Y(j) are also easily obtained from (19.50).)
Theorem 19.44. For a random rooted forest, with m = λn + O(1) where
1 < λ < ∞,
log n − 23 log log n + log b + W
d
,
(19.156)
Y(1) ≈
log(1/q)
where W has the Gumbel distribution (19.63) and
b := √
λ log3/2 (1/q)
2π(λ − 1)(1 − q)
(19.157)
with q given by (19.151)–(19.152).
Furthermore, Y(j) = Y(1) + Op (1) for each fixed j.
Next, let us, more generally, consider a random simply generated forest as in
Example 12.8, defined by a weight sequence w. Then the tree sizes in the random
forest are distributed as balls-in-boxes with the weight sequence (Zk )∞
k=0 , where
Zk is the partition function (2.5) for simply generated trees with weight sequence
w (and Z0 = 0).
We assume that ν(w) > 1; thus there exists τ1 > 0 such that Ψ(τ1 ) = 1, and
then w′ := (τ1k wk /Φ(τ1 ))k is an equivalent probability weight sequence with
expectation 1, see Lemma 4.2. (τ1 is the same as τ in Theorem 7.1, but here we
need to consider several different τ ’s so we modify the notation.) This probability
weight sequence w′ defines the same random forest, which thus can be realized as
a conditioned critical Galton–Watson forest. Recall from (4.10) and Theorem 7.1
that the probability distribution w′ has variance σ 2 = τ1 Ψ′ (τ1 ); we assume
that σ 2 is finite, which always holds if ν(w) > 1 and thus τ1 < ρ(w). We
further assume, for simplicity, that w has span 1. We then have the following
generalization of Theorem 19.44, see Pavlov [95, 96], where also further results
are given.
Theorem 19.45. Consider a simply generated random forest defined by a
weight sequence w, and assume that m = λn + O(1) where 1 < λ < ∞. Suppose
that ν(w) > 1 and span(w) = 1. Define τ1 > 0 by Ψ(τ1 ) = 1, and assume that
σ 2 := τ1 Ψ′ (τ1 ) < ∞ (this is automatic if ν(w) > 1). Define further τ2 > 0 by
Ψ(τ2 ) = 1 − 1/λ
(19.158)
Simply generated trees and random allocations
and let
q :=
Φ(τ1 )
τ2
.
·
Φ(τ2 )
τ1
231
(19.159)
Then 0 < q < 1 and
d
Y(1) ≈
log n −
3
2
log log n + log b + W
log(1/q)
,
(19.160)
where W has the Gumbel distribution (19.63) and
b :=
τ1 log3/2 (1/q)
√
.
τ2 2πσ 2 (1 − q)
(19.161)
Furthermore, Y(j) = Y(1) + Op (1) for each fixed j.
e = (w
Proof. Replace w by the equivalent probability weight sequence w
ek ) with
w
ek := τ2k wk /Φ(τ2 ). This probability weight sequence has expectation Ψ(τ2 ) < 1
by (4.9), and using it we realize the random forest as a conditioned subcritical
ek for w
e is by (4.3) and TheoGalton–Watson forest. The partition function Z
rem 18.11,
k−1
τ2k−1 Φ(τ1 )k −3/2
1
ek = τ2
√
Z
k
.
Z
∼
·
k
Φ(τ2 )k
2πσ 2 Φ(τ2 )k τ1k−1
(19.162)
ek ) is the distribution of the size of a Galton–Watson proMoreover, by (2.6), (Z
e Since this offspring distribution is subcritical
cess with offspring distribution w.
ek ) has finite mean
with expectation Ψ(τ2 ) < 1, the size distribution (Z
∞
X
k=0
ek =
kZ
1
= λ,
1 − Ψ(τ2 )
(19.163)
by our choice of τ2 .
The sizes of the trees in the random forest are distributed as balls-in-boxes
with the weight sequence (Zek ), see Example 12.8. We apply Theorem 19.19,
translating wk to Zek . By (19.162),
ek → a :=
Zek+1 /Z
τ2
Φ(τ1 )
,
·
Φ(τ2 )
τ1
as k → ∞.
(19.164)
ek )) τ in Theorem 19.19 is chosen
Note further that (with this weight sequence (Z
ek /Z(τ
e ) has expecsuch that the equivalent probability weight sequence τ k Z
tation λ. We have already constructed (Zek ) such that it is a probability weight
sequence with this expectation, see (19.163); hence we have τ = 1 and q = a,
which yields (19.159).
As in (19.154), πk(n) = Zek(n) = Θ(1/n) for
k(n) =
log n − 32 log log n
+ O(1),
log(1/q)
(19.165)
232
S. Janson
and then (19.69) yields, by (19.162),
τ1 log3/2 (1/q)
τ1
k(n)−3/2 ∼ √
n log−3/2 n.
N ∼ n√
2πσ 2 τ2 (1 − q)
τ2 2πσ 2 (1 − q)
(19.166)
The result (19.160) now follows from Theorem 19.19(ii). Finally, again, Theorem
19.16(i) gives the estimate for Y(j) .
Example 19.46. Consider a random ordered rooted forest. This is obtained by the weight sequence w_k = 1, see Example 12.8, and we have by (10.1)–(10.2) Φ(t) = 1/(1 − t) and Ψ(t) = t/(1 − t). Hence, τ_1 = 1/2 and σ^2 = 2 (see Example 10.1); furthermore, (19.158) is τ_2/(1 − τ_2) = 1 − 1/λ, which has the solution

    τ_2 = (λ − 1)/(2λ − 1).    (19.167)

Consequently, Theorem 19.45 says that (19.160) holds, with the parameters q and b given by, see (19.159) and (19.161),

    q = (τ_2(1 − τ_2))/(τ_1(1 − τ_1)) = 4τ_2(1 − τ_2) = 4λ(λ − 1)/(2λ − 1)^2 = 1 − 1/(2λ − 1)^2    (19.168)

and

    b = ((2λ − 1)^3/(4√π (λ − 1))) log^{3/2}(1/q).    (19.169)
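To make the formulas concrete, here is a small numerical check (added for illustration; the function name is ours) that the closed forms (19.167)–(19.169) agree with the general recipe (19.158)–(19.161) of Theorem 19.45 in this example, where Φ(t) = 1/(1 − t), τ_1 = 1/2 and σ^2 = 2:

    import math

    def params_ordered_forest(lam):
        """q and b for random ordered rooted forests (w_k = 1), lam > 1."""
        Phi = lambda t: 1.0 / (1.0 - t)
        tau1, sigma2 = 0.5, 2.0                    # Psi(tau1) = 1; sigma^2 = tau1 * Psi'(tau1)
        tau2 = (lam - 1.0) / (2.0 * lam - 1.0)     # solves Psi(tau2) = 1 - 1/lam, eq. (19.167)
        q = (tau2 / tau1) * Phi(tau1) / Phi(tau2)  # eq. (19.159)
        b = (tau1 * math.log(1.0 / q) ** 1.5
             / (tau2 * math.sqrt(2.0 * math.pi * sigma2) * (1.0 - q)))  # eq. (19.161)
        q2 = 4.0 * lam * (lam - 1.0) / (2.0 * lam - 1.0) ** 2           # eq. (19.168)
        b2 = ((2.0 * lam - 1.0) ** 3 / (4.0 * math.sqrt(math.pi) * (lam - 1.0))
              * math.log(1.0 / q2) ** 1.5)                              # eq. (19.169)
        return q, q2, b, b2

    print(params_ordered_forest(1.5))  # q = q2 = 0.75, b = b2 (about 0.348)

For instance, λ = 1.5 gives τ_2 = 1/4 and q = 3/4 by both routes.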
Example 19.47. The random rooted unlabelled forest in Example 12.11 is described by a weight sequence that also satisfies w_k ∼ c_1 k^{−3/2} ρ^{−k} as k → ∞, and we thus again obtain (19.160), although the parameters q and b now are implicitly defined using the generating function of the number of unlabelled rooted trees, see Pavlov [97].
Example 19.48. For the random recursive forest in Example 12.13, we have

    w_k = 1/k,    k ≥ 1.    (19.170)

Thus Theorem 19.19 applies with a = 1 and q = τ ∈ (0, 1) given by

    q/((1 − q)|log(1 − q)|) = λ,    (19.171)

see (12.56). (Recall that ν = ∞, so we can take any λ > 1 here.) In this case, see (12.55), π_{k(n)} = k(n)^{−1} q^{k(n)}/|log(1 − q)| = Θ(1/n) for

    k(n) = (log n − log log n)/log(1/q) + O(1),    (19.172)

cf. (19.154), and then (19.69) yields

    N ∼ (log(1/q)/((1 − q)|log(1 − q)|)) · n log^{−1} n.    (19.173)

Consequently, Theorem 19.19(ii) yields

    Y_{(1)} ≈^d (log n − log log n + log b + W)/log(1/q),    (19.174)

where W has the Gumbel distribution (19.63) and, using (19.171),

    b := log(1/q)/((1 − q)|log(1 − q)|) = λ log(1/q)/q.    (19.175)

We thus obtain a result similar to the cases above, but with a different coefficient for log log n in (19.174). See Pavlov and Loseva [98] for further results.
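Unlike the previous example, (19.171) has no closed-form solution, but the left-hand side is increasing in q (from 1 at q = 0 to ∞ as q → 1), so q can be found by bisection for any λ > 1. A minimal sketch (our own helper, added for illustration):

    import math

    def solve_q(lam, tol=1e-12):
        """Solve q/((1-q)|log(1-q)|) = lam for q in (0,1), eq. (19.171)."""
        f = lambda q: q / ((1.0 - q) * abs(math.log(1.0 - q)))
        lo, hi = 1e-12, 1.0 - 1e-12   # f(lo) is near 1 < lam; f(hi) is huge
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if f(mid) > lam:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2.0

    lam = 2.0
    q = solve_q(lam)
    b = lam * math.log(1.0 / q) / q   # eq. (19.175)
    print(q, b)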
If we consider the random unrooted forest in Example 12.7, we find different results. In this case, the tree sizes are described by balls-in-boxes with the weight sequence w_k = k^{k−2}/k!, k ≥ 1 (and w_0 = 0). Alternatively, we can use the probability weight sequence (12.40)

    w̃_k := 2w_k e^{−k} = 2k^{k−2} e^{−k}/k!,    (19.176)

which by Stirling's formula satisfies (12.41)

    w̃_k ∼ (2/√(2π)) k^{−5/2},    as k → ∞.    (19.177)
Since we now have ν = 2 < ∞, see Examples 12.7 and 12.10, there is a phase transition at λ = 2. We show in the theorem below that for λ < 2 we have a result similar to Theorems 19.44 and 19.45 with maximal tree size Y_{(1)} = O_p(log n), but for λ > 2 there is a unique giant tree with size of order n. At the phase transition, with m/n → 2, the result depends on the rate of convergence of m/n; if, for example, m = 2n exactly, the maximal size is of order n^{2/3}; see further Luczak and Pittel [83], where precise results for general m = m(n) are given. (By the proof below, (iii) in the following theorem holds as soon as m/n → λ > 2, but (i) and (ii) are more sensitive.)
Theorem 19.49. Consider a random unrooted forest, and assume that m = λn + O(1) where 1 < λ < ∞.
(i) If 1 < λ < 2, let

    q := 2 ((λ − 1)/λ) e^{2/λ−1}.    (19.178)

Then 0 < q < 1 and

    Y_{(1)} ≈^d (log n − (5/2) log log n + log b + W)/log(1/q),    (19.179)

where W has the Gumbel distribution (19.63) and

    b := λ^2 log^{5/2}(1/q) / (2√(2π) (λ − 1)(1 − q)).    (19.180)

Furthermore, Y_{(j)} = Y_{(1)} + O_p(1) for each fixed j.
(ii) If λ = 2, then

    Y_{(j)}/n^{2/3} →^d η_j    (19.181)

for each j, where η_j > 0 are some random variables. The distribution of η_1 is given by (19.97) with α = 3/2 and c = (2/π)^{1/2}.
(iii) If 2 < λ < ∞, then Y_{(1)} = (λ − 2)n + O_p(n^{2/3}). More precisely,

    n^{−2/3}(m − 2n − Y_{(1)}) →^d X,    (19.182)

where X is a 3/2-stable random variable with Laplace transform

    E e^{−tX} = exp((2^{5/2}/3) t^{3/2}),    Re t ≥ 0.    (19.183)

For j ≥ 2, Y_{(j)} = O_p(n^{2/3}), and n^{−2/3} Y_{(j)} →^d W_j, where W_2 has the Fréchet distribution

    P(W_2 ≤ x) = exp(−(2^{3/2}/(3√π)) x^{−3/2}),    x > 0,    (19.184)

and, more generally, W_j has the density function (19.117) with c′ = (2/π)^{1/2} and α = 3/2.

Note that the exponents 3/2, 1 and 5/2 in (19.150), (19.170) and (19.177) appear as coefficients of log log n in (19.156), (19.174) and (19.179), respectively.
Proof. (i): This is very similar to the proofs of Theorems 19.44 and 19.45. We use w_k = k^{k−2}/k!. Then, as for rooted forests and (19.150) above, w_{k+1}/w_k → e as k → ∞. Further, τ is given by (12.38), and thus q := τe is given by (19.178). It follows, cf. (19.154) and (19.177), that π_{k(n)} = Θ(1/n) for

    k(n) = (log n − (5/2) log log n)/log(1/q) + O(1),    (19.185)

and then (19.69) yields

    N ∼ n w_{k(n)} e^{−k(n)}/(Φ(τ)(1 − q)) ∼ n · λ^2/(2√(2π)(λ − 1)(1 − q)) · k(n)^{−5/2}
        ∼ (λ^2 log^{5/2}(1/q))/(2√(2π)(λ − 1)(1 − q)) · n log^{−5/2} n.    (19.186)

Hence Theorem 19.19(ii) yields (19.179).
(ii): We use the equivalent probability weight sequence (w̃_k) given by (19.176). By (19.177), it satisfies the assumptions in Example 19.27 with α = 3/2 and c = (2/π)^{1/2}; thus (19.181) follows from (19.91), and (19.97) in Remark 19.28 applies.
(iii): We use again the probability weight sequence (w̃_k) and apply Theorem 19.34. We have c′ = c = (2/π)^{1/2} by (19.176), and thus c′Γ(−3/2) = c′ (4/3)Γ(1/2) = 2^{5/2}/3 and c′/α = 2^{3/2}/(3√π).
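As a quick numeric illustration of the phase transition (our addition, not in the original), one can evaluate (19.178) across 1 < λ < 2 and observe q increasing to 1 as λ → 2, where part (i) breaks down and parts (ii)–(iii) take over:

    import math

    def q_unrooted(lam):
        """q from eq. (19.178) for the random unrooted forest, 1 < lam < 2."""
        return 2.0 * ((lam - 1.0) / lam) * math.exp(2.0 / lam - 1.0)

    for lam in (1.2, 1.5, 1.9, 1.99, 2.0):
        print(lam, q_unrooted(lam))   # about 0.65, 0.93, 0.999, ..., 1.0 at lam = 2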
Example 19.50. The random unrooted unlabelled forest (with labelled trees) in Example 12.11 is described by another weight sequence that satisfies w_k ∼ ck^{−5/2} ρ^{−k} as k → ∞, and we thus obtain a result similar to Theorem 19.49, although the parameters differ (they can be obtained from the generating function of the number of unlabelled trees); in particular, the phase transition appears when λ equals ν ≈ 2.0513, see Bernikovich and Pavlov [12] for details.
We do not know any corresponding results for random completely unlabelled
forests (n unlabelled trees consisting of m unlabelled nodes); as said in Example 12.11, they cannot be described by balls-in-boxes.
20. Large nodes in simply generated trees with ν < 1
In the tree case with ν < 1, the results in Section 19.6 show condensation in the form of one or, sometimes, several nodes with very large degree, together making up the "missing mass" of about (1 − ν)n. On the other hand, Theorem 7.1 shows condensation in a somewhat different form, with a limit tree T̂ having exactly one node of infinite degree. This node corresponds to a node with very large degree in T_n for n large but finite. How large is the degree? Why do we only see one node with very large degree in Theorem 7.1, but sometimes several nodes with large degrees above (Examples 19.37 and 19.38)?
The latter question is easily answered: recall that the convergence in Theorem 7.1 means convergence of the truncated trees ("left balls") T_n^{[m]}, see Lemma 6.3; thus we only see a small part of the tree close to the root, and the two pictures above are reconciled: if m is large but fixed, then in the set V(T_n) ∩ V^{[m]} of nodes, there is with probability close to 1 exactly one node with very large degree. (There may be several nodes with very large degree in the tree, but for any fixed m, w.h.p. at most one of them is in V^{[m]}.) Of course, to make this precise, we would have to define "very large", for example as below using a sequence Ω_n growing slowly to ∞ as in Lemma 19.32, but we are at the moment satisfied with an intuitive description.
To see how large the "very large" degree is, let us first look at the root. Lemma 15.7 says that the distribution of the root degree is the size-biased distribution of Y_1. We can write (15.7) as

    P(d^+_{T_n}(o) = d) = (d/(n − 1)) ∑_{i=1}^n P(Y_i = d) = (d/(n − 1)) ∑_{j=1}^n P(Y_{(j)} = d);    (20.1)

hence the distribution of the root degree can be described by: sample (Y_1, ..., Y_n) and then take Y_{(1)} with probability Y_{(1)}/(n − 1), Y_{(2)} with probability Y_{(2)}/(n − 1), and so on.
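This two-step description translates directly into a sampler; the following minimal sketch (our own helper, assuming a realized allocation (Y_1, ..., Y_n) with ∑_i Y_i = n − 1 is given) draws a value with the distribution of the root degree in (20.1):

    import random

    def sample_root_degree(Y):
        """Given box counts Y with sum(Y) = len(Y) - 1, return a draw from the
        size-biased law (20.1): box i is picked with probability Y[i]/(n-1)."""
        i = random.choices(range(len(Y)), weights=Y, k=1)[0]
        return Y[i]

In particular, the maximum Y_{(1)} is returned with probability Y_{(1)}/(n − 1), matching the discussion above.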
In particular, if Y_{(1)} = (1 − ν)n + o_p(n), then (20.1) implies

    P(d^+_{T_n}(o) = Y_{(1)}) = 1 − ν + o(1),    (20.2)

and comparing with Theorem 7.10 we see that w.h.p. either the root degree is small (more precisely, O_p(1)), or it is the maximum outdegree Y_{(1)}. However, we also see that if Y_{(1)} is not (1 − ν)n + o_p(n), then this conclusion does not hold; for example, in Example 19.38 for n in the subsequence (2^i) where (19.149) holds, for each fixed j,

    P(d^+_{T_n}(o) = 2^{−j} n) → 2^{−j}.    (20.3)
In the case ν = 0, we only have to consider the root, since the node with infinite degree in T̂ always is the root, but for 0 < ν < 1, the node with infinite degree in T̂ may be somewhere else. We shall see that it corresponds to a node in T_n with a large degree having (asymptotically) the same distribution as the root degree just considered, conditioned to be "large".
To make this precise, let Ω_n → ∞ be a fixed sequence which increases so slowly that Lemma 19.32(ii) holds. We say that an outdegree d^+(v) is large if it is greater than Ω_n; we then also say that the node v is large. (Note that by Lemma 19.32(ii), w.h.p. at least one large node exists.) For each n, let D̃_n be a random variable whose distribution is the size-biased distribution of a large outdegree, i.e. of (Y_1 | Y_1 > Ω_n):

    P(D̃_n = k) = (k P(Y_1 = k))/(∑_{l>Ω_n} l P(Y_1 = l)) = (k E N_k)/(∑_{l>Ω_n} l E N_l) = (k E N_k)/((1 − ν + o(1))n)    (20.4)

for k > Ω_n, and P(D̃_n = k) = 0 otherwise. Equivalently, in view of Lemma 15.7, D̃_n has the distribution of the root degree d^+_{T_n}(o) conditioned to be greater than Ω_n. See also (20.1), and note that if Y_{(1)} = (1 − ν)n + o_p(n), then D̃_n ≈^d Y_{(1)}, i.e., we may take D̃_n = Y_{(1)} w.h.p.; in this case (but not otherwise) we thus have D̃_n = (1 − ν)n + o_p(n).
Note that if Ω′_n is another such sequence, similarly defining a random variable D̃′_n, then ∑_{l>Ω_n} l P(Y_1 = l) ∼ (1 − ν)n ∼ ∑_{l>Ω′_n} l P(Y_1 = l), and it follows that D̃_n ≈^d D̃′_n; hence the choice of Ω_n will not matter below.
We claim that, w.h.p., the infinite outdegree in T̂ corresponds to an outdegree D̃_n in T_n. To formalise this, recall from Section 6 that we may consider our trees as subtrees of the infinite tree U_∞ with node set V_∞, and that the convergence of trees defined there means convergence of each d^+(v), see (6.6). Let T̂ be the random infinite tree defined in Section 5; we are in case (T2), and thus T̂ has a single node v with outdegree d^+_{T̂}(v) = ∞. We assume that D̃_n and T̂ are independent, and define the modified degree sequence

    d̃^+_{T̂}(v) := d^+_{T̂}(v)  if d^+_{T̂}(v) < ∞,    d̃^+_{T̂}(v) := D̃_n  if d^+_{T̂}(v) = ∞.    (20.5)

We thus change the single infinite value to the finite D̃_n, leaving all other values unchanged. (Note that d̃^+_{T̂}(v) may depend on n, since D̃_n does.) We then have the following theorem.
Theorem 20.1. For any finite set of nodes v_1, ..., v_ℓ ∈ V_∞,

    (d^+_{T_n}(v_1), ..., d^+_{T_n}(v_ℓ)) ≈^d (d̃^+_{T̂}(v_1), ..., d̃^+_{T̂}(v_ℓ)).    (20.6)
Proof. Let ε > 0, and let v* denote the unique node in T̂ with d^+_{T̂}(v*) = ∞. By increasing the set {v_1, ..., v_ℓ}, we may assume that it equals V^{[m]} (see Section 6) for some m, and that m is so large that P(v* ∈ V^{[m]}) > 1 − ε. We may then find K < ∞ such that

    P(d^+_{T̂}(v) ∈ (K, ∞) for some v ∈ V^{[m]}) < ε.

Since T_n →^d T̂ by Theorem 7.1, we may by the Skorohod coupling theorem [69, Theorem 4.30] assume that the random trees are coupled such that T_n → T̂ a.s., and thus d^+_{T_n}(v) → d^+_{T̂}(v) a.s. for every v. Then, for large n, with probability > 1 − 3ε, v* ∈ V^{[m]}, d^+_{T_n}(v) = d^+_{T̂}(v) = d̃^+_{T̂}(v) ≤ K for all v ∈ V^{[m]} \ {v*}, and d^+_{T_n}(v*) → d^+_{T̂}(v*) = ∞. We may assume that Ω_n → ∞ so slowly that furthermore P(d^+_{T_n}(v*) ≤ Ω_n) ≤ ε. (Recall that we may change Ω_n without affecting the result (20.6).)
Let n be so large that also Ω_n > m and Ω_n > K. It follows from Lemma 15.9 that for each choice of v′ ∈ V^{[m]} and numbers d(v) for v ∈ V^{[m]} \ v′, and k > Ω_n,

    P(d^+_{T_n}(v′) = k and d^+_{T_n}(v) = d(v) for v ∈ V^{[m]} \ {v′}) = (k + O(1)) C({d(v)}, v′, n) P(Y_1 = k)

for some constant C({d(v)}, v′, n) > 0 not depending on k; hence, by (20.4),

    P(d^+_{T_n}(v′) = k | d^+_{T_n}(v) = d(v) for v ∈ V^{[m]} \ {v′} and d^+_{T_n}(v′) > Ω_n)
        = (1 + o(1)) (k P(Y_1 = k))/(∑_{k>Ω_n} k P(Y_1 = k)) = (1 + o(1)) P(D̃_n = k).

There is only a finite number of choices of v′ and (d(v))_{v∈V^{[m]}\{v′}}, and it follows that we may choose the coupling of T_n and T̂ above such that also d^+_{T_n}(v*) = D̃_n w.h.p.; thus, with probability ≥ 1 − 4ε + o(1), d^+_{T_n}(v) = d̃^+_{T̂}(v) for all v ∈ V^{[m]}. The result follows since ε > 0 is arbitrary.
We give some variations of this result, where we replace d̃^+_{T̂}(v) by the degree sequences of some random trees obtained by modifying T̂. (Note that d̃^+_{T̂}(v) is not the degree sequence of a tree.)
First, let T̂_{1n} be the random tree obtained by pruning the tree T̂ at the node v* with infinite outdegree, keeping only the first D̃_n children of v*. Then T̂_{1n} is a locally finite tree, and, in fact, it is a.s. finite. The random tree T̂_{1n} can be constructed as T̂ in Section 5, starting with a spine, and then adding independent Galton–Watson trees to it, but now the number of children of a node in the spine is given by a finite random variable ξ̌_n with the distribution

    P(ξ̌_n = k) = P(ξ̂ = k) + P(ξ̂ = ∞) P(D̃_n = k) = kπ_k + (1 − ν) P(D̃_n = k).    (20.7)

The nodes not in the spine (the normal nodes) have offspring distribution (π_k) as before. (This holds also for the following modifications.)
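Since ∑_k kπ_k = μ = ν here (see (7.2)), the law (20.7) is a two-component mixture: with probability ν a draw from the finite part of the size-biased offspring distribution, and with probability 1 − ν an independent copy of D̃_n. A minimal sampling sketch (our names; sample_D_n is a user-supplied sampler for D̃_n):

    import random

    def sample_xi_check(pi, nu, sample_D_n):
        """Draw from (20.7), P(k) = k*pi[k] + (1-nu)*P(D_n = k), assuming the
        truncated list pi satisfies sum_k k*pi[k] = nu with 0 < nu < 1."""
        if random.random() < nu:
            ks = range(len(pi))
            return random.choices(ks, weights=[k * pi[k] for k in ks], k=1)[0]
        return sample_D_n()

(For k ≤ Ω_n the second component vanishes, so this indeed returns such k with probability kπ_k, as used in the proof of Lemma 20.3 below.)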
The spine in T̂_{1n} stops when we obtain ξ̂ = ∞, but we may also define another random tree T̂_{2n} by continuing the spine to infinity; this defines a random infinite but locally finite tree having an infinite spine; each node in the spine has a number of children with the distribution in (20.7), and the spine continues with a uniformly randomly chosen child. Equivalently, T̂_{2n} can be defined by a Galton–Watson process with normal and special nodes as in Section 5, but with the offspring distribution for special nodes changed from (5.2) to (20.7).
Finally, let Ŷ_n be a random variable with the size-biased distribution of Y_1:

    P(Ŷ_n = k) = (k P(Y_1 = k))/((n − 1)/n) = (k E N_k)/(n − 1),    (20.8)

recalling that ∑_k k N_k = n − 1; cf. (20.1) and (20.4). (Thus Ŷ_n =^d d^+_{T_n}(o) by Lemma 15.7 and (20.1).) Define the infinite, locally finite random tree T̂_{3n} by the same Galton–Watson process again, but now with offspring distribution Ŷ_n for special nodes. (This does not involve D̃_n or Ω_n.) Thus T̂_{3n} also has an infinite spine.
We then have the following version of Theorem 20.1, where we also use the metric δ_1 on T_lf defined by

    δ_1(T_1, T_2) := 1/sup{m ≥ 1 : d^+_{T_1}(v) = d^+_{T_2}(v) for v ∈ V^{[m]}}.    (20.9)

Theorem 20.2. For j = 1, 2, 3, and any finite set of nodes v_1, ..., v_ℓ ∈ V_∞,

    (d^+_{T_n}(v_1), ..., d^+_{T_n}(v_ℓ)) ≈^d (d^+_{T̂_{jn}}(v_1), ..., d^+_{T̂_{jn}}(v_ℓ)).    (20.10)

Equivalently, there is a coupling of T_n and T̂_{jn} such that δ_1(T_n, T̂_{jn}) → 0 as n → ∞.
Proof. If Ω_n > m we have D̃_n > m, and then the branches of T̂ pruned to make T̂_{1n} are all outside V^{[m]}; thus d^+_{T̂_{1n}}(v) = d̃^+_{T̂}(v) defined in (20.5) for all v ∈ V^{[m]}. Thus the result for T̂_{1n} follows from Theorem 20.1.
Next, for any given m, and for any endpoint x of the spine of T̂_{1n}, the probability that the continuation in T̂_{2n} of the spine contains some node in V^{[m]} is less than m/Ω_n = o(1); thus, w.h.p. T̂_{1n} and T̂_{2n} are equal on any V^{[m]}.
Finally, Lemma 20.3 below implies that we can couple T̂_{2n} and T̂_{3n} such that they w.h.p. agree on each V^{[m]}; then T̂_{1n} and T̂_{3n} are w.h.p. equal on each V^{[m]}.
Lemma 20.3. ξ̌_n ≈^d Ŷ_n.
Proof. For each fixed k, P(ξ̌_n = k) = kπ_k as soon as Ω_n > k, and P(Ŷ_n = k) → kπ_k by (20.8) and Theorem 11.7. Hence,

    |P(ξ̌_n = k) − P(Ŷ_n = k)| → 0.    (20.11)

By (20.7), (20.4) and (20.8), uniformly for k > Ω_n,

    P(ξ̌_n = k) = kπ_k + (1 − ν) (k P(Y_1 = k))/(1 − ν + o(1)) = kπ_k + (1 + o(1)) P(Ŷ_n = k);

hence

    ∑_{k>Ω_n} |P(ξ̌_n = k) − P(Ŷ_n = k)| ≤ ∑_{k>Ω_n} (kπ_k + o(1) P(Ŷ_n = k)) = ∑_{k>Ω_n} kπ_k + o(1).    (20.12)

Further, for any fixed K,

    ∑_{k=K+1}^{Ω_n} (P(ξ̌_n = k) − P(Ŷ_n = k))_+ ≤ ∑_{k=K+1}^{Ω_n} P(ξ̌_n = k) = ∑_{k=K+1}^{Ω_n} kπ_k.    (20.13)

Using Lemma 19.5(vii) together with (20.11) for k ≤ K, (20.12) and (20.13) we obtain

    d_TV(ξ̌_n, Ŷ_n) = ∑_{k=1}^∞ (P(ξ̌_n = k) − P(Ŷ_n = k))_+ ≤ ∑_{k=K+1}^∞ kπ_k + o(1).    (20.14)

Since K is arbitrary and ∑_{k=1}^∞ kπ_k < ∞, it follows that d_TV(ξ̌_n, Ŷ_n) → 0.
21. Further results and problems
21.1. Level widths
Let, as in Remark 5.6, l_k(T) denote the number of nodes with distance k to the root in a rooted tree T.
If ν ≥ 1, then T̂ is a locally finite tree, so all level widths l_k(T̂) are finite. It follows easily from the characterisation of convergence in Lemma 6.2 that, in this case, the functional l_k is continuous at T̂, and thus Theorem 7.1 implies (see Billingsley [15, Corollary 1, p. 31])

    l_k(T_n) →^d l_k(T̂) < ∞    (21.1)

for each k ≥ 0.
On the other hand, if ν < 1, then T̂ has a node with infinite outdegree; this node has a random distance L − 1 to the root, where L as in Section 5 is the length of the spine, and thus l_L(T̂) = ∞.
In the case 0 < ν < 1, we have π_0 < 1 and P(ξ ≥ 1) = 1 − π_0 > 0, so for any j, there is a positive probability that the Galton–Watson tree T has height at least j, and it follows that of the infinitely many copies of T that start in generation L, a.s. infinitely many will survive at least until generation L + j. Consequently, a.s., l_k(T̂) = ∞ for all k ≥ L, while l_k(T̂) < ∞ for k < L. It follows easily from Lemma 6.3 that in this case too, for each k ≥ 0, the mapping l_k, from trees to N_0 ∪ {∞}, is continuous at T̂. Consequently,

    l_k(T_n) →^d l_k(T̂) ≤ ∞,    k = 0, 1, ...,    (21.2)

with P(l_k(T̂) < ∞) = P(L > k) = ν^k. (Recall that μ = ν in this case by (7.2).)
When ν = 0, however, (21.2) does not always hold. By Example 5.1, T̂ is an infinite star, with l_1(T̂) = ∞ and l_k(T̂) = 0 for all k ≥ 2. By Theorem 7.10, l_1(T_n) = d^+_{T_n}(o) →^d ξ̂ = l_1(T̂), so (21.2) holds for k = 1 (and trivially for k = 0) in the case ν = 0 too (with l_1(T̂) = ∞). However, by Example 10.8, if w_k = k!, then l_2(T_n) →^d Po(1), so l_2(T_n) does not converge to l_2(T̂) = 0. Similarly, by Example 10.9, if j ≥ 2 and w_k = (k!)^α with 0 < α < 1/(j − 1), then the number of paths of length j attached to the root in T_n tends to ∞ (in probability), so l_j(T_n) →^p ∞, while l_j(T̂) = 0.
Turning to moments, we have for the expectation, by (5.8), E l_k(T̂) = ∞ if 0 < ν < 1 or σ^2 = ∞; in this case (21.1)–(21.2) and Fatou's lemma yield E l_k(T_n) → E l_k(T̂) = ∞.
If ν ≥ 1 and σ^2 < ∞, then (5.7) yields E l_k(T̂) = 1 + kσ^2 < ∞. In this case, for each fixed k, the random variables l_k(T_n), n ≥ 1, are uniformly integrable, and thus (21.1) implies E l_k(T_n) → E l_k(T̂), see Janson [59, Section 10]. (In the case ν > 1, this was shown already by Meir and Moon [85].) Consequently, for any w with ρ > 0 and any fixed k,

    E l_k(T_n) → E l_k(T̂) ≤ ∞.    (21.3)

(When ρ = 0, this is not always true, by the examples above.)
For higher moments, there remains a small gap. Let r ≥ 1. When 0 < ν < 1, (21.3) trivially implies E l_k(T_n)^r → E l_k(T̂)^r = ∞, so suppose ν ≥ 1. Then, by (5.2), E ξ̂^r = E ξ^{r+1}, so if E ξ^{r+1} = ∞, then E l_1(T̂)^r = ∞; moreover, each l_k(T̂), k ≥ 1, stochastically dominates ξ̂ (consider the offspring of the k:th node on the spine), and thus E l_k(T̂)^r = ∞ for every k ≥ 1. Consequently, again immediately by Fatou's lemma and (21.2), E l_k(T_n)^r → E l_k(T̂)^r = ∞. The only interesting case is thus when E ξ^{r+1} < ∞. If r ≥ 1 is an integer, it was shown in [59, Theorem 1.13] that E ξ^{r+1} < ∞ implies that E l_k(T_n)^r, n ≥ 1, are uniformly bounded for each k ≥ 1. We conjecture that, moreover, l_k(T_n)^r, n ≥ 1, are uniformly integrable, which by (21.1) would yield the following:

Conjecture 21.1. For every integer r ≥ 1 and every k ≥ 1, if ν > 0, then

    E l_k(T_n)^r → E l_k(T̂)^r ≤ ∞.    (21.4)

We further conjecture that this holds also for non-integer r > 0. One thus has to consider the case E ξ^{r+1} < ∞ only, and the result from [59] implies that (21.4) holds if E ξ^{r+2} < ∞, since then E l_k(T_n)^{⌊r⌋+1} are uniformly bounded.
21.2. Asymptotic normality
In Theorem 7.11, we proved that N_d, the number of nodes of outdegree d in the random tree T_n, satisfies N_d/n →^p π_d.
In our case Iα (ν > 1, or ν = 1 and σ^2 < ∞), Kolchin [76, Theorem 2.3.1] gives the much stronger result that the random variable N_d is asymptotically normal, for every d ≥ 0:

    (N_d − nπ_d)/√n →^d N(0, σ_d^2),    (21.5)

with

    σ_d^2 := π_d (1 − π_d − (d − 1)^2 π_d/σ^2).    (21.6)

(In fact, Kolchin [76] gives a local limit theorem which is a stronger version of (21.5).)
Under the assumption E ξ^3 < ∞, Janson [55, Example 3.4] gave another proof of (21.5), and showed further joint convergence for different d, with asymptotic covariances, using I_k := 1{ξ = k},

    σ_{kl}^2 = Cov(I_k, I_l) − Cov(I_k, ξ) Cov(I_l, ξ)/Var ξ = π_k δ_{kl} − π_k π_l − (k − 1)(l − 1) π_k π_l/σ^2.    (21.7)

Moreover, Janson [55] showed that if E ξ^r < ∞ for every r (which in particular holds when ν > 1, since then τ < ρ and ξ has some exponential moment), then convergence of all moments and joint moments holds in (21.5); in particular

    E N_k = nπ_k + o(n)  and  Cov(N_k, N_l) = nσ_{kl}^2 + o(n).    (21.8)

In the case ν > 1, Minami [89] and Drmota [33, Section 3.2.1] have given other proofs of the (joint) asymptotic normality using the saddle point method; Drmota [33] shows further the stronger moment estimates

    E N_d = nπ_d + O(1)  and  Var N_d = nσ_d^2 + O(1).    (21.9)
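To see (21.5)–(21.6) in action, one can use the balls-in-boxes correspondence of Section 15: the outdegree counts of T_n have the same distribution as those of n i.i.d. copies of ξ conditioned on their sum being n − 1. The following Monte Carlo sketch (our own code, assuming Po(1) offspring, so π_d = e^{−1}/d! and σ^2 = 1) estimates Var N_d/n by naive rejection and compares it with σ_d^2 from (21.6):

    import math, random

    def po1():
        """Po(1) sample by Knuth's multiplication method."""
        L, k, p = math.exp(-1.0), 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                return k
            k += 1

    def N_d(n, d):
        """N_d for the conditioned Galton-Watson tree with Po(1) offspring,
        via n i.i.d. Po(1) conditioned on total n - 1 (rejection sampling)."""
        while True:
            xs = [po1() for _ in range(n)]
            if sum(xs) == n - 1:
                return sum(1 for x in xs if x == d)

    n, d, reps = 100, 2, 400
    samples = [N_d(n, d) for _ in range(reps)]
    m = sum(samples) / reps
    v = sum((s - m) ** 2 for s in samples) / (reps - 1)
    pi_d = math.exp(-1.0) / math.factorial(d)
    print(v / n, pi_d * (1 - pi_d - (d - 1) ** 2 * pi_d))  # eq. (21.6), sigma^2 = 1

The two printed numbers should be of comparable size already for moderate n; the rejection step is wasteful but keeps the sketch short.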
Problem 21.2. Do these results hold in the case ν = 1, σ^2 < ∞ without extra moment conditions? Do they extend to the case ν = 1, σ^2 = ∞? What happens when 0 ≤ ν < 1?

Problem 21.3. Extend this to the more general case of balls-in-boxes as in Theorem 11.4. (We guess that the case 0 < λ < ν is easy by the methods in the references above, in particular [55] and [33, Section 3.2.1], but we have not checked the details.)
Problem 21.4. Extend this to the subtree counts in Theorem 7.12.
21.3. Height and width
We have studied the random trees T_n without any scaling. Since our mode of convergence really means that we consider only a finite number of generations at a time, we are really looking at the base of the tree, with the first generations. The results in this paper thus do not say anything about, for example, the height and width of T_n. (Recall that if T is a rooted tree, then the height H(T) := max{k : l_k(T) > 0}, the maximum distance from the root, and the width W(T) := max_k l_k(T), the largest size of a generation.) However, there are other known results.
In the case ν ≥ 1, σ^2 < ∞ (the case Iα in Section 8), it is well-known that both the height H(T_n) and the width W(T_n) of T_n typically are of order √n; more precisely,

    H(T_n)/√n →^d 2σ^{−1} X,    (21.10)

    W(T_n)/√n →^d σX,    (21.11)

where X is some strictly positive random variable (in fact, X equals the maximum of a standard Brownian excursion and has what is known as a theta distribution), see e.g. Kolchin [76], Aldous [4], Chassaing, Marckert and Yor [25], Janson [59] and Drmota [33]. There are also results for a single level, giving an asymptotic distribution for l_{k(n)}(T_n)/√n when the level k(n) ∼ a√n for some a > 0, see Kolchin [76, Theorem 2.4.5].
Since the variance σ^2 appears as a parameter in these results, we cannot expect any simple extensions to the case σ^2 = ∞, and even less to the case 0 ≤ ν < 1. Nevertheless, we conjecture that (21.10) and (21.11) extend formally at least to the case ν = 1 and σ^2 = ∞:

Conjecture 21.5. If ν = 1 and σ^2 = ∞, then H(T_n)/√n →^p 0.

Conjecture 21.6. If ν = 1 and σ^2 = ∞, then W(T_n)/√n →^p ∞.

Problem 21.7. Does ν < 1 imply that H(T_n)/√n →^p 0?

Problem 21.8. Does ν < 1 imply that W(T_n)/√n →^p ∞?
Furthermore, still in the case ν ≥ 1, σ^2 < ∞, Addario-Berry, Devroye and Janson [1] have shown sub-Gaussian tail estimates for the height and width:

    P(H(T_n) ≥ x√n) ≤ Ce^{−cx^2},    (21.12)

    P(W(T_n) ≥ x√n) ≤ Ce^{−cx^2},    (21.13)

uniformly in all x ≥ 0 and n ≥ 1 (with some positive constants C and c depending on π and thus on w). In view of (21.11), we cannot expect (21.13) to hold when σ^2 = ∞ (or when ν < 1), but we see no reason why (21.12) cannot hold; (21.10) suggests that H(T_n) typically is smaller when σ^2 = ∞.

Problem 21.9. Does (21.12) hold for any weight sequence w (with C and c depending on w, but not on x or n)?

It follows from (21.10)–(21.11) and (21.12)–(21.13) that E H(T_n)/√n and E W(T_n)/√n converge to positive numbers. (In fact, the limits are √(2π)/σ and √(π/2) σ, see e.g. Janson [61], where also joint moments are computed.)

Problem 21.10. What are the growth rates of E H(T_n) and E W(T_n) when σ^2 = ∞ or ν < 1?
21.4. Scaled trees
The results (21.10)–(21.11), as well as many other results on various asymptotics of T_n in the case ν ≥ 1, σ^2 < ∞, can be seen as consequences of the convergence of the tree T_n, after rescaling in a suitable sense in both height and width by √n, to the continuum random tree defined by Aldous [3, 4, 5], see also Le Gall [80]. (The continuum random tree is not an ordinary tree; it is a compact metric space.) This has been extended to the case σ^2 = ∞ when π is in the domain of attraction of a stable distribution, see e.g. Duquesne [34] and Le Gall [80, 81]; the limit is now a different random metric space called a stable tree.
Problem 21.11. Is there some kind of similar limiting object in the case ν < 1
(after suitable scaling)?
21.5. Random walks
Simple random walk on the infinite random tree T̂ has been studied by many authors in the critical case ν ≥ 1, in particular when σ^2 < ∞, see e.g. Kesten [74], Barlow and Kumagai [9], Durhuus, Jonsson and Wheater [35], Fujii and Kumagai [43], but also when σ^2 = ∞, see Croydon and Kumagai [30] (assuming attraction to a stable law).
A different approach is to study simple random walk on T_n and study asymptotics as n → ∞. For example, by rescaling the tree one can obtain convergence to a process on the continuum random tree (when σ^2 < ∞) or stable tree (assuming attraction to a stable law), see Croydon [28, 29].
For ν < 1, the simple random walk on T̂ does not make sense, since the tree has a node with infinite degree. Nevertheless, it might be interesting to study simple random walk on T_n and find asymptotics of interesting quantities as n → ∞.
21.6. Multi-type conditioned Galton–Watson trees
It seems likely that there are results similar to the ones in Section 7 for multitype Galton–Watson trees conditioned on the total size, or perhaps on the number of nodes of each type, and for corresponding generalizations of simply generated random trees. We are not aware of any such results, however, and leave
this as an open problem. (See Kurtz, Lyons, Pemantle and Peres [78] for related
results that presumably are useful.)
22. Different conditionings for Galton–Watson trees
One of the principal objects studied in this paper is the conditioned Galton–Watson tree (T | |T| = n), i.e. a Galton–Watson tree T conditioned on its total size being n; we then let n → ∞. This is one way to consider very large Galton–Watson trees, but there are also other similar conditionings. For comparison, we briefly consider two possibilities; see further Kennedy [73] and Aldous and Pitman [6]. We denote the offspring distribution by ξ and its probability generating function by Φ(t).
22.1. Conditioning on |T| ≥ n.

If E ξ ≤ 1, i.e., in the subcritical and critical cases, |T| < ∞ a.s., and thus T conditioned on |T| ≥ n is a mixture of (T | |T| = N) = T_N for N ≥ n. It follows immediately from Theorem 7.1 that (T | |T| ≥ n) →^d T̂ as n → ∞.
If E ξ > 1, i.e., in the supercritical case, on the other hand, the event |T| = ∞ has positive probability, and the events |T| ≥ n decrease to |T| = ∞. Consequently,

    (T | |T| ≥ n) →^d (T | |T| = ∞),    (22.1)

a supercritical Galton–Watson tree conditioned on non-extinction.
Remark 22.1. When T is supercritical, the conditioned Galton–Watson tree (T | |T| = ∞) in (22.1) can be constructed by a 2-type Galton–Watson process, somewhat similar to the construction of T̂ in Section 5: Let q := P(|T| < ∞) < 1 be the extinction probability, which is given by Φ(q) = q. Consider a Galton–Watson process T′ with individuals of two types, mortal and immortal, where a mortal gets only mortal children while an immortal may get both mortal and immortal children. The numbers ξ′ of mortal and ξ″ of immortal children are described by the probability generating functions

    E x^{ξ′} y^{ξ″} = Φ_m(x) := Φ_q(x) = Φ(qx)/q    (22.2)

for a mortal and

    E x^{ξ′} y^{ξ″} = Φ_i(x, y) := (Φ(qx + (1 − q)y) − Φ(qx))/(1 − q)    (22.3)

for an immortal (with the children coming in random order). Note that the subtree started by a mortal is subcritical (since Φ_m′(1) = Φ′(q) < 1, cf. (4.9)), and thus a.s. finite, while every immortal has at least one immortal child (since Φ_i(x, 0) = 0) and thus the subtree started by an immortal is infinite. It is easily verified that T conditioned on non-extinction equals this random tree T′ started with an immortal, while T conditioned on extinction equals T′ started with a mortal. (See Athreya and Ney [8, Section I.12], where this is stated in a somewhat different form.)
One important difference from T̂ is that T′ does not have a single spine; started with an immortal it has a.s. an uncountable number of infinite paths from the root.
Note that T̂ in the critical case can be seen as a limit case of this construction. If we let q ↗ 1, which requires that we really consider a sequence of different distributions with generating functions Φ^{(n)}(t) → Φ(t), then taking the limits in (22.2)–(22.3) gives for the limiting critical distribution the offspring generating functions Φ_m(x) = Φ(x) and Φ_i(x, y) = yΦ′(x), which indeed are the generating functions for the offspring distributions in Section 5 in the critical case (with mortal = normal and immortal = special), since E x^{ξ̂−1} y = yΦ′(x) = Φ_i(x, y) by (5.4).
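For a concrete offspring law, the two-type description in Remark 22.1 can be simulated directly: by (22.2)–(22.3), both types can be sampled by marking each of the ξ children immortal with probability 1 − q independently and conditioning on ξ″ = 0 (mortal parent) or ξ″ ≥ 1 (immortal parent). A minimal sketch (our code, assuming Po(c) offspring with c > 1, so Φ(t) = e^{c(t−1)}):

    import math, random

    def poisson(c):
        """Po(c) by Knuth's multiplication method (fine for moderate c)."""
        L, k, p = math.exp(-c), 0, 1.0
        while True:
            p *= random.random()
            if p <= L:
                return k
            k += 1

    def extinction_prob(c, iters=200):
        """Extinction probability: iterate q -> Phi(q) = exp(c(q-1)) from 0."""
        q = 0.0
        for _ in range(iters):
            q = math.exp(c * (q - 1.0))
        return q

    def immortal_offspring(c, q):
        """(xi', xi''): mortal/immortal child counts for an immortal node, a
        draw from Phi_i in (22.3): mark each of Po(c) children immortal with
        probability 1 - q, and reject until at least one immortal appears."""
        while True:
            xi = poisson(c)
            imm = sum(1 for _ in range(xi) if random.random() < 1.0 - q)
            if imm >= 1:
                return xi - imm, imm

    c = 1.5
    q = extinction_prob(c)   # here q is approximately 0.417
    print(q, immortal_offspring(c, q))

That the marking construction has the generating functions (22.2)–(22.3) is checked by computing E x^{ξ′} y^{ξ″} for the marked offspring and conditioning on ξ″ ≥ 1 (respectively ξ″ = 0).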
22.2. Conditioning on H(T) ≥ n.

To condition on the height H(T) being at least n is the same as conditioning on l_n(T) > 0, i.e., that the Galton–Watson process survives for at least n generations.
If E ξ > 1, i.e., in the supercritical case, the events l_n(T) > 0 decrease to |T| = ∞. Consequently,

    (T | H(T) ≥ n) = (T | l_n(T) > 0) →^d (T | |T| = ∞),    (22.4)

exactly as when conditioning on |T| ≥ n in (22.1). By Remark 22.1, the limit equals T′, started with an immortal.
In the subcritical and critical cases, the following result, proved by Kesten [74] (at least for E ξ = 1; see also Aldous and Pitman [6]), shows convergence to the size-biased Galton–Watson tree T* in Remark 5.7.

Theorem 22.2. Suppose that μ := E ξ ≤ 1. Then, as n → ∞,

    (T | H(T) ≥ n) = (T | l_n(T) > 0) →^d T*.    (22.5)
Proof. Let r_n := P(l_n(T) > 0), the probability of survival for at least n generations. Then r_n → 0 as n → ∞. Fix ℓ ≥ 0 and a tree t with height ℓ. Conditioned on T^{(ℓ)} = t, the remainder of the tree consists of l_ℓ(t) independent branches, each distributed as T, and thus, for n ≥ ℓ,

    P(T^{(ℓ)} = t | H(T) ≥ n) = P(T^{(ℓ)} = t and H(T) ≥ n)/P(H(T) ≥ n)
        = P(T^{(ℓ)} = t)(1 − (1 − r_{n−ℓ})^{l_ℓ(t)})/P(H(T) ≥ n).    (22.6)

Let T_f^{(ℓ)} be the set of finite trees of height ℓ. Summing (22.6) over t ∈ T_f^{(ℓ)} yields 1, and thus

    P(H(T) ≥ n) = ∑_{t∈T_f^{(ℓ)}} P(T^{(ℓ)} = t)(1 − (1 − r_{n−ℓ})^{l_ℓ(t)}).    (22.7)

Dividing by r_{n−ℓ}, and noting that for any N ≥ 1, (1 − (1 − r)^N)/r ↗ N as r ↘ 0, we find by monotone convergence

    P(H(T) ≥ n)/r_{n−ℓ} = ∑_{t∈T_f^{(ℓ)}} P(T^{(ℓ)} = t) (1 − (1 − r_{n−ℓ})^{l_ℓ(t)})/r_{n−ℓ}
        → ∑_{t∈T_f^{(ℓ)}} P(T^{(ℓ)} = t) l_ℓ(t) = E l_ℓ(T) = μ^ℓ.    (22.8)

Hence, by (22.6) and (5.11),

    P(T^{(ℓ)} = t | H(T) ≥ n) ∼ P(T^{(ℓ)} = t) l_ℓ(t) r_{n−ℓ}/P(H(T) ≥ n)
        → P(T^{(ℓ)} = t) l_ℓ(t)/μ^ℓ = P(T^{*(ℓ)} = t).    (22.9)

Thus, (T | H(T) ≥ n)^{(ℓ)} →^d T^{*(ℓ)}, and the result follows by (6.9).
Note that if E ξ = 1, then T* = T̂, see Remark 5.7, so the limits in Theorems 7.1 and 22.2 of T conditioned on |T| = n and on H(T) ≥ n are the same. However, in the subcritical case E ξ < 1, T* ≠ T̂; moreover, T* differs also from the limit in Theorem 7.1, which is T̂ for a conjugated distribution, and the same is true in the supercritical case. Hence, as remarked by Kennedy [73], conditioning on |T| = n and on H(T) ≥ n give similar results (in the sense that the limits as n → ∞ are the same) in the critical case, but quite different results in the subcritical and supercritical cases. Similarly, conditioning on |T| ≥ n and on H(T) ≥ n give quite different results in the subcritical case. Aldous and Pitman [6] remark that the two different limits as n → ∞ both can be intuitively interpreted as "T conditioned on being infinite", which shows that one has to be careful with such interpretations.
Acknowledgement
This research was started during a visit to NORDITA, Stockholm, during the program Random Geometry and Applications, 2010. I thank the participants, in particular Zdzislaw Burda, Bergfinnur Durhuus, Thordur Jonsson and Sigurdur Stefánsson, for stimulating discussions.
References
[1] L. Addario-Berry, L. Devroye & S. Janson, Sub-Gaussian tail
bounds for the width and height of conditioned Galton–Watson trees.
Ann. Probab., to appear. arXiv:1011.4121
[2] D. Aldous, Asymptotic fringe distributions for general families of random trees. Ann. Appl. Probab. 1 (1991), no. 2, 228–266. MR1102319
[3] D. Aldous, The continuum random tree I. Ann. Probab. 19 (1991), no.
1, 1–28. MR1085326
[4] D. Aldous, The continuum random tree II: an overview. Stochastic Analysis (Durham, 1990), 23–70, London Math. Soc. Lecture Note Ser. 167,
Cambridge Univ. Press, Cambridge, 1991. MR1166406
[5] D. Aldous, The continuum random tree III. Ann. Probab. 21 (1993), no.
1, 248–289. MR1207226
[6] D. Aldous & J. Pitman, Tree-valued Markov chains derived from
Galton–Watson processes. Ann. Inst. H. Poincaré Probab. Statist. 34
(1998), no. 5, 637–686. MR1641670
[7] R. Arratia, A. D. Barbour & S. Tavaré, Logarithmic Combinatorial
Structures: a Probabilistic Approach, EMS, Zürich, 2003. MR2032426
[8] K. B. Athreya & P. E. Ney, Branching Processes. Springer-Verlag,
Berlin, 1972. MR0373040
[9] M. T. Barlow & T. Kumagai, Random walk on the incipient infinite
cluster on trees. Illinois J. Math. 50 (2006), no. 1–4, 33–65. MR2247823
[10] D. Beihoffer, J. Hendry, A. Nijenhuis & S. Wagon, Faster algorithms for Frobenius numbers. Electron. J. Combin. 12 (2005), R27.
MR2156681
[11] J. Bennies & G. Kersting, A random walk approach to Galton–Watson
trees. J. Theoret. Probab. 13 (2000), no. 3, 777–803. MR1785529
[12] E. S. Bernikovich & Yu. L. Pavlov, On the maximum size of a tree
in a random unlabelled unrooted forest. Diskret. Mat. 23 (2011), no. 1,
3–20 (Russian). English transl.: Discrete Math. Appl. 21 (2011), no. 1,
1–21. MR2830693
[13] P. Bialas & Z. Burda, Phase transition in fluctuating branched geometry. Physics Letters B 384 (1996), 75–80. MR1410422
[14] P. Bialas, Z. Burda & D. Johnston, Condensation in the backgammon model. Nuclear Physics 493 (1997), 505–516.
[15] P. Billingsley, Convergence of Probability Measures. Wiley, New York,
1968. MR0233396
[16] N. H. Bingham, C. M. Goldie & J. L. Teugels, Regular Variation.
Cambridge Univ. Press, Cambridge, 1987. MR0898871
[17] C. W. Borchardt, Ueber eine der Interpolation entsprechende Darstellung der Eliminations-Resultante. J. reine und angewandte Mathematik
57 (1860), 111–121.
[18] É. Borel, Sur l’emploi du théorème de Bernoulli pour faciliter le calcul
d’une infinité de coefficients. Application au problème de l’attente à un
guichet. C. R. Acad. Sci. Paris 214 (1942), 452–456. MR0008126
[19] A. V. Boyd, Formal power series and the total progeny in a branching
process. J. Math. Anal. Appl. 34 (1971), 565–566. MR0281270
[20] V. E. Britikov, Asymptotic number of forests from unrooted trees. Mat.
Zametki 43 (1988), no. 5, 672–684, 703 (Russian). English transl.: Math.
Notes 43 (1988), no. 5–6, 387–394. MR0954351
[21] R. Carr, W. M. Y. Goh & E. Schmutz, The maximum degree in a
random tree and related problems. Random Struct. Alg. 5 (1994), no. 1,
13–24. MR1248172
[22] A. Cayley, A theorem on trees. Quart. J. Math. 23 (1889), 376–378.
[23] P. Chassaing & B. Durhuus, Local limit of labeled trees and expected
volume growth in a random quadrangulation. Ann. Probab. 34 (2006), no.
3, 879–917. MR2243873
[24] P. Chassaing & G. Louchard, Phase transition for parking blocks,
Brownian excursion and coalescence. Random Struct. Alg. 21 (2002), no.
1, 76–119. MR1913079
[25] P. Chassaing, J.-F. Marckert & M. Yor, The height and width
of simple trees. Mathematics and Computer Science (Versailles, 2000),
17–30, Trends Math., Birkhäuser, Basel, 2000. MR1798284
[26] R. M. Corless, G. H. Gonnet, D. E. Hare, D. J. Jeffrey & D.
E. Knuth, On the Lambert W function. Adv. Comput. Math. 5 (1996),
no. 4, 329–359. MR1414285
[27] H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités. Les sommes et les fonctions de variables aléatoires, Actualités Scientifiques et Industrielles 736, Hermann, Paris, 1938, pp. 5–23.
[28] D. Croydon, Convergence of simple random walks on random discrete
trees to Brownian motion on the continuum random tree. Ann. Inst. H.
Poincaré Probab. Statist. 44 (2008), no. 6, 987–1019. MR2469332
[29] D. Croydon, Scaling limits for simple random walks on random ordered
graph trees. Adv. Appl. Probab. 42 (2010), no. 2, 528–558. MR2675115
[30] D. Croydon & T. Kumagai, Random walks on Galton–Watson trees
with infinite variance offspring distribution conditioned to survive. Electron. J. Probab. 13 (2008), no. 51, 1419–1441. MR2438812
[31] A. Dembo & O. Zeitouni, Large Deviations Techniques and Applications. 2nd ed., Springer, New York, 1998. MR1619036
[32] L. Devroye, Branching processes and their applications in the analysis of
tree structures and tree algorithms. Probabilistic Methods for Algorithmic
Discrete Mathematics, eds. M. Habib, C. McDiarmid, J. Ramirez and B.
Reed, Springer, Berlin, 1998, pp. 249–314. MR1678582
[33] M. Drmota, Random Trees, Springer, Vienna, 2009. MR2484382
[34] T. Duquesne, A limit theorem for the contour process of conditioned Galton–Watson trees. Ann. Probab. 31 (2003), no. 2, 996–1027.
MR1964956
[35] B. Durhuus, T. Jonsson & J. F. Wheater, The spectral dimension
of generic trees. J. Stat. Phys. 128 (2007), 1237–1260. MR2348795
[36] M. Dwass, The total progeny in a branching process and a related random
walk. J. Appl. Probab. 6 (1969), 682–686. MR0253433
[37] F. Eggenberger & G. Pólya, Über die Statistik verketteter Vorgänge.
Zeitschrift Angew. Math. Mech. 3 (1923), 279–289.
[38] W. Feller, An Introduction to Probability Theory and its Applications,
Volume I, 2nd ed., Wiley, New York, 1957. MR0088081
[39] W. Feller, An Introduction to Probability Theory and its Applications,
Volume II, 2nd ed., Wiley, New York, 1971. MR0270403
[40] P. Flajolet & R. Sedgewick, Analytic Combinatorics. Cambridge
Univ. Press, Cambridge, UK, 2009. MR2483235
[41] S. Franz & F. Ritort, Dynamical solution of a model without energy
barriers. Europhysics Letters 31 (1995), 507–512
[42] S. Franz & F. Ritort, Glassy mean-field dynamics of the backgammon
model. J. Stat. Phys. 85 (1996), 131–150.
[43] I. Fujii & T. Kumagai, Heat kernel estimates on the incipient infinite cluster for critical branching processes. Proceedings of German–Japanese Symposium in Kyoto 2006, RIMS Kôkyûroku Bessatsu B6 (2008), pp. 8–95. MR2407556
[44] J. Geiger, Elementary new proofs of classical limit theorems for Galton–Watson processes. J. Appl. Probab. 36 (1999), no. 2, 301–309. MR1724856
[45] J. Geiger & L. Kauffmann, The shape of large Galton–Watson trees with possibly infinite variance. Random Struct. Alg. 25 (2004), no. 3, 311–335. MR2086163
[46] B. V. Gnedenko & A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables. Gosudarstv. Izdat. Tehn.-Teor. Lit., Moscow–Leningrad, 1949 (Russian). English transl.: Addison-Wesley, Cambridge, Mass., 1954. MR0062975
[47] G. R. Grimmett, Random labelled trees and their branching networks. J. Austral. Math. Soc. Ser. A 30 (1980/81), no. 2, 229–237. MR0607933
[48] G. R. Grimmett, The Random-Cluster Model. Springer, Berlin, 2006. MR2243761
[49] A. Gut, Probability: A Graduate Course. Springer, New York, 2005. MR2125120
[50] G. H. Hardy, J. E. Littlewood & G. Pólya, Inequalities. 2nd ed., Cambridge, at the University Press, 1952. MR0046395
[51] T. E. Harris, A lower bound for the critical probability in a certain percolation process. Proc. Cambridge Philos. Soc. 56 (1960), 13–20. MR0115221
[52] L. Holst, Two conditional limit theorems with applications. Ann. Statist. 7 (1979), no. 3, 551–557. MR0527490
[53] L. Holst, A unified approach to limit theorems for urn models. J. Appl. Probab. 16 (1979), 154–162. MR0520945
[54] I. A. Ibragimov & Yu. V. Linnik, Independent and Stationary Sequences of Random Variables. Nauka, Moscow, 1965 (Russian). English transl.: Wolters-Noordhoff Publishing, Groningen, 1971. MR0322926
[55] S. Janson, Moment convergence in conditional limit theorems. J. Appl. Probab. 38 (2001), no. 2, 421–437. MR1834751
[56] S. Janson, Asymptotic distribution for the cost of linear probing hashing. Random Struct. Alg. 19 (2001), no. 3–4, 438–471. MR1871562
[57] S. Janson, Cycles and unicyclic components in random graphs. Combin. Probab. Comput. 12 (2003), 27–52. MR1967484
[58] S. Janson, Functional limit theorems for multitype branching processes and generalized Pólya urns. Stochastic Process. Appl. 110 (2004), no. 2, 177–245. MR2040966
[59] S. Janson, Random cutting and records in deterministic and random trees. Random Struct. Alg. 29 (2006), no. 2, 139–179. MR2245498
[60] S. Janson, Rounding of continuous random variables and oscillatory asymptotics. Ann. Probab. 34 (2006), no. 5, 1807–1826. MR2271483
[61] S. Janson, On the asymptotic joint distribution of height and width in random trees. Studia Sci. Math. Hungar. 45 (2008), no. 4, 451–467. MR2641443
[62] S. Janson, Probability asymptotics: notes on notation. Institut Mittag-Leffler Report 12, 2009 spring. arXiv:1108.3924
[63] S. Janson, Stable distributions. Unpublished notes, 2011.
arXiv:1112.0220
[64] S. Janson, T. Jonsson & S. Ö. Stefánsson, Random trees with superexponential branching weights. J. Phys. A: Math. Theor. 44 (2011),
485002.
[65] S. Janson, T. Luczak & A. Ruciński, Random Graphs. Wiley, New
York, 2000. MR1782847
[66] N. L. Johnson & S. Kotz, Urn Models and their Application. Wiley,
New York, 1977. MR0488211
[67] T. Jonsson & S. Ö. Stefánsson, Condensation in nongeneric trees. J.
Stat. Phys. 142 (2011), no. 2, 277–313. MR2764126
[68] O. Kallenberg, Random Measures. Akademie-Verlag, Berlin, 1983.
MR0818219
[69] O. Kallenberg, Foundations of Modern Probability. 2nd ed., Springer,
New York, 2002. MR1876169
[70] N. I. Kazimirov, On some conditions for absence of a giant component in
the generalized allocation scheme. Diskret. Mat. 14 (2002), no. 2, 107–118
(Russian). English transl.: Discrete Math. Appl. 12 (2002), no. 3, 291–302.
MR1937012
[71] N. I. Kazimirov, Emergence of a giant component in a random permutation with a given number of cycles. Diskret. Mat. 15 (2003), no. 3,
145–159 (Russian). English transl.: Discrete Math. Appl. 13 (2003), no. 5,
523–535. MR2021211
[72] N. I. Kazimirov & Yu. L. Pavlov, A remark on the Galton–Watson
forests. Diskret. Mat. 12 (2000), no. 1, 47–59 (Russian). English transl.:
Discrete Math. Appl. 10 (2000), no. 1, 49–62. MR1778765
[73] D. P. Kennedy, The Galton–Watson process conditioned on the total
progeny. J. Appl. Probab. 12 (1975), 800–806. MR0386042
[74] H. Kesten, Subdiffusive behavior of random walk on a random cluster. Ann. Inst. H. Poincaré Probab. Statist. 22 (1986), no. 4, 425–487.
MR0871905
[75] D. E. Knuth, The Art of Computer Programming. Vol. 3: Sorting and
Searching. 2nd ed., Addison-Wesley, Reading, Mass., 1998. MR0445948
[76] V. F. Kolchin, Random Mappings. Nauka, Moscow, 1984 (Russian).
English transl.: Optimization Software, New York, 1986. MR0865130
[77] V. F. Kolchin, B. A. Sevast’yanov & V. P. Chistyakov, Random
Allocations. Nauka, Moscow, 1976 (Russian). English transl.: Winston,
Washington, D.C., 1978. MR0471016
[78] T. Kurtz, R. Lyons, R. Pemantle & Y. Peres, A conceptual proof of
the Kesten–Stigum Theorem for multi-type branching processes. Classical and Modern Branching Processes (Minneapolis, MN, 1994), IMA Vol.
Math. Appl., 84, Springer, New York, 1997, pp. 181–185. MR1601737
[79] J.-L. Lagrange, Nouvelle méthode pour résoudre les équations littérales
par le moyen des séries. Mémoires de l’Académie royale des Sciences et
Belles-Lettres de Berlin, XXIV (1770), 5–73.
[80] J.-F. Le Gall, Random trees and applications. Probab. Surveys 2 (2005),
245–311. MR2203728
[81] J.-F. Le Gall, Random real trees. Ann. Fac. Sci. Toulouse Math. (6)
15 (2006), no. 1, 35–62. MR2225746
[82] M. R. Leadbetter, G. Lindgren & H. Rootzén, Extremes and
Related Properties of Random Sequences and Processes. Springer-Verlag,
New York, 1983. MR0691492
[83] T. Luczak & B. Pittel, Components of random forests. Combin.
Probab. Comput. 1 (1992), no. 1, 35–52. MR1167294
[84] R. Lyons, R. Pemantle & Y. Peres, Conceptual proofs of L log L
criteria for mean behavior of branching processes. Ann. Probab. 23 (1995),
no. 3, 1125–1138. MR1349164
[85] A. Meir & J. W. Moon, On the altitude of nodes in random trees.
Canad. J. Math., 30 (1978), 997–1015. MR0506256
[86] A. Meir & J. W. Moon, On the maximum out-degree in random trees.
Australas. J. Combin. 2 (1990), 147–156. MR1126991
[87] A. Meir & J. W. Moon, On nodes of large out-degree in random trees.
Congr. Numer. 82 (1991), 3–13. MR1152053
[88] A. Meir & J. W. Moon, A note on trees with concentrated maximum
degrees. Utilitas Math. 42 (1992), 61–64. Corrigendum: Utilitas Math. 43
(1993), 253. MR1199088
[89] N. Minami, On the number of vertices with a given degree in a Galton–
Watson tree. Adv. Appl. Probab. 37 (2005), no. 1, 229–264. MR2135161
[90] J. W. Moon, On the maximum degree in a random tree. Michigan Math.
J. 15 (1968), 429–432. MR0233729
[91] J. Neveu, Arbres et processus de Galton–Watson. Ann. Inst. H. Poincaré
Probab. Statist. 22 (1986), no. 2, 199–207. MR0850756
[92] R. Otter, The number of trees. Ann. of Math. (2) 49 (1948), 583–599.
MR0025715
[93] R. Otter, The multiplicative process. Ann. Math. Statistics 20 (1949),
206–224. MR0030716
[94] Yu. L. Pavlov, The asymptotic distribution of maximum tree size in a
random forest. Teor. Verojatnost. i Primenen. 22 (1977), no. 3, 523–533
(Russian). English transl.: Th. Probab. Appl. 22 (1977), no. 3, 509–520.
MR0461619
[95] Yu. L. Pavlov, The limit distributions of the maximum size of a tree in
a random forest. Diskret. Mat. 7 (1995), no. 3, 19–32 (Russian). English
transl.: Discrete Math. Appl. 5 (1995), no. 4, 301–315. MR1361491
[96] Yu. L. Pavlov, Random Forests. Karelian Centre Russian Acad. Sci.,
Petrozavodsk, 1996 (Russian). English transl.: VSP, Zeist, The Netherlands, 2000. MR1651128
[97] Yu. L. Pavlov, Limit theorems on sizes of trees in a random unlabelled
forest. Diskret. Mat. 17 (2005), no. 2, 70–86 (Russian). English transl.:
Discrete Math. Appl. 15 (2005), no. 2, 153–170. MR2167801
[98] Yu. L. Pavlov & E. A. Loseva, Limit distributions of the maximum
size of a tree in a random recursive forest. Diskret. Mat. 14 (2002), no. 1,
60–74 (Russian). English transl.: Discrete Math. Appl. 12 (2002), no. 1,
45–59. MR1919856
[99] J. Pitman, Enumerations of trees and forests related to branching processes and random walks. Microsurveys in Discrete Probability (Princeton,
NJ, 1997), DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 41, Amer. Math. Soc., Providence, RI, 1998, pp. 163–180.
MR1630413
[100] F. Ritort, Glassiness in a model without energy barriers. Physical Review Letters 75 (1995), 1190–1193.
[101] W. Rudin, Real and Complex Analysis. McGraw-Hill, London, 1970.
[102] S. Sagitov & M. C. Serra, Multitype Bienaymé–Galton–Watson processes escaping extinction. Adv. Appl. Probab. 41 (2009), no. 1, 225–246.
MR2514952
[103] R. P. Stanley, Enumerative Combinatorics, Volume 2. Cambridge Univ.
Press, Cambridge, 1999. MR1676282
[104] J. J. Sylvester, On the change of systems of independent variables,
Quart J. Math. 1 (1857), 42–56.
[105] L. Takács, A generalization of the ballot problem and its application
in the theory of queues. J. Amer. Statist. Assoc. 57 (1962), 327–337.
MR0138139
[106] L. Takács, Ballots, queues and random graphs. J. Appl. Probab. 26
(1989), no. 1, 103–112. MR0981255
[107] J.C. Tanner, A derivation of the Borel distribution. Biometrika 48
(1961), 222–224. MR0125648
[108] J. G. Wendel, Left-continuous random walk and the Lagrange expansion. Amer. Math. Monthly 82 (1975), 494–499. MR0381000
[109] Herbert S. Wilf, generatingfunctionology. 2nd ed., Academic Press,
1994. MR1277813