RANDOM GRAPHS

MAXIME BERGERON
Abstract. We give a nearly identical account (adding a generous amount of details and
indicating an error) of the paper where Erdős and Rényi first analyzed the evolution of
random graphs [ER59].
1. Introduction
Our main object of study will be a random graph $\Gamma_{n,N}$. By this we shall mean a graph on $n$ vertices along with $N$ edges chosen randomly out of the possible $\binom{n}{2}$ edges. We will usually denote a particular graph on $n$ vertices having $N$ edges as $G_{n,N}$. Furthermore, throughout most of this note $N$ will be equal to $N_c$ as defined by:
\[ N_c := \left\lfloor \tfrac{1}{2}\, n \log n + cn \right\rfloor. \tag{1} \]
This emulates the idea of a threshold function for the property of being completely connected.
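To make the model concrete, here is a minimal Python sketch of how one might compute the edge count from equation (1) and draw one sample of the random graph. The function names `threshold_edges` and `sample_graph`, and the labelling of the vertices as 0, ..., n − 1, are our own illustrative assumptions and do not come from [ER59].

```python
import math
import random


def threshold_edges(n, c):
    """Return N_c = floor(n*log(n)/2 + c*n), following equation (1)."""
    return math.floor(0.5 * n * math.log(n) + c * n)


def sample_graph(n, num_edges, rng=random):
    """Draw num_edges distinct edges uniformly from the C(n, 2) possible ones.

    Edges are stored as pairs (u, v) with u < v. Repeated draws are rejected,
    so every edge set of the requested size is equally likely.
    """
    if not 0 <= num_edges <= n * (n - 1) // 2:
        raise ValueError("num_edges must lie between 0 and C(n, 2)")
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(n), 2)  # two distinct vertices
        edges.add((min(u, v), max(u, v)))
    return edges
```

For instance, `sample_graph(1000, threshold_edges(1000, 0.0))` returns one realisation with $c = 0$. Rejection sampling is a simple choice that works well here because $N_c$ is far smaller than $\binom{n}{2}$.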
Definition. We say a random graph Γn,Nc is of type A when for some 0 ≤ k ≤ n it consists
of a connected graph on n − k vertices along with k isolated points. Graphs which are not
of type A will be said to be of type Ā and the probability that Γn,Nc is of type Ā shall be
denoted by P (Ā, n, Nc ).
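The definition of type A can be tested mechanically: a graph is of type A exactly when at most one of its connected components has more than one vertex. The sketch below is one possible way to check this with a union-find structure; the names `component_sizes` and `is_type_A` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def component_sizes(n, edges):
    """Sizes of the connected components of a graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    counts = Counter(find(v) for v in range(n))
    return sorted(counts.values(), reverse=True)


def is_type_A(n, edges):
    """True when the graph is one connected piece plus isolated vertices."""
    nontrivial = [s for s in component_sizes(n, edges) if s > 1]
    return len(nontrivial) <= 1
```

A path on three vertices together with two isolated vertices is of type A, whereas two disjoint edges are not.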
The following lemma is the key to all of the later results.
Lemma 1.1.
\[ \lim_{n \to \infty} P(\bar{A}, n, N_c) = 0. \]
In other words, when n is large enough, almost all random graphs Γn,Nc are of type A.
That is, they are made up of a single connected component and a bunch of isolated points.
This will be used to prove the following theorems:
Completely Connected. If $P_0(n, N_c)$ denotes the probability of $\Gamma_{n,N_c}$ being completely connected, then:
\[ \lim_{n \to \infty} P_0(n, N_c) = e^{-e^{-2c}}. \]
Greatest Connected Component. If Pk (n, Nc ) denotes the probability that the greatest
connected component of Γn,Nc consists of n − k points for some 0 ≤ k ≤ n, then:
\[ \lim_{n \to \infty} P_k(n, N_c) = \frac{(e^{-2c})^k \, e^{-e^{-2c}}}{k!}. \]
In other words, the number of points outside the greatest component of Γn,Nc is distributed
in the limit according to Poisson’s law with mean value e−2c .
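As a worked example of the limiting law, the following short Python sketch evaluates the Poisson probabilities with mean $e^{-2c}$. The function name `limit_pk` is our own; the code only evaluates the stated limit and does not simulate any graphs.

```python
import math


def limit_pk(k, c):
    """Limiting probability that exactly k vertices lie outside the GCC."""
    lam = math.exp(-2.0 * c)  # Poisson mean e^{-2c}
    return lam ** k * math.exp(-lam) / math.factorial(k)
```

With $c = 0$ the mean is 1, so the limiting probabilities of zero and of one vertex outside the greatest component both equal $e^{-1} \approx 0.3679$.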
Disjoint Connected Components. If $\Pi_k(n, N_c)$ denotes the probability that $\Gamma_{n,N_c}$ consists of exactly $k + 1$ disjoint connected components (in particular $\Pi_0(n, N_c) = P_0(n, N_c)$), then:
\[ \lim_{n \to \infty} \Pi_k(n, N_c) = \frac{(e^{-2c})^k \, e^{-e^{-2c}}}{k!}. \]
Namely, the number of disjoint components of $\Gamma_{n,N_c}$, diminished by one, is distributed in the limit according to Poisson's law with mean value $e^{-2c}$.
Markov Process. Let the edges of a random graph on the vertices v1 , v2 , . . . , vn be chosen
successively among all possible edges in such a manner that at each stage, every edge
which has not yet been chosen has the same probability to be chosen as the next, and let
us continue this process until the graph becomes completely connected. Let ηn denote the
number of edges of the resulting connected random graph Γ. Then, we have:
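\[ P\left(\eta_n = \left\lfloor \tfrac{1}{2}\, n \log n \right\rfloor + l\right) \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \]
for $|l| = O(n)$ and
\[ \lim_{n \to \infty} P\left( \frac{\eta_n - \frac{1}{2}\, n \log n}{n} < x \right) = e^{-e^{-2x}}. \]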
In other words, for fixed $x$, the probability that the graph has become completely connected by the time $N_x$ edges have been added converges to $e^{-e^{-2x}}$ as $n \to \infty$, and this limit tends to 1 as $x \to \infty$. This corresponds to phase 4 in the evolution of a random graph as outlined in [ER60], the sequel to the paper of Erdős and Rényi.
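The edge-by-edge process just described is straightforward to simulate. The sketch below shuffles all possible edges (which is equivalent to choosing each next edge uniformly among those not yet chosen) and stops as soon as a union-find structure reports a single component. The names `edges_until_connected` and `normalised` are our own, and the approach is only practical for moderate $n$ since it lists all $\binom{n}{2}$ edges.

```python
import math
import random


def edges_until_connected(n, rng):
    """Simulate the process once and return eta_n, the number of edges used."""
    if n <= 1:
        return 0
    all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    rng.shuffle(all_edges)  # a uniformly random order of all edges
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for count, (u, v) in enumerate(all_edges, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
            if components == 1:
                return count
    raise RuntimeError("unreachable: the complete graph is connected")


def normalised(n, eta):
    """The rescaled quantity (eta_n - n*log(n)/2) / n from the theorem."""
    return (eta - 0.5 * n * math.log(n)) / n
```

Repeating `normalised(n, edges_until_connected(n, rng))` many times and comparing the empirical distribution function at a point $x$ with $e^{-e^{-2x}}$ gives a rough experimental view of the limit, although convergence in $n$ is slow.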
2. The Key Lemma
In the sequel we shall abbreviate Greatest Connected Component by GCC.
Lemma 2.1 (Key Lemma). Almost all graphs $\Gamma_{n,N_c}$ consist of a single connected component on $n - k$ vertices together with $k$ isolated vertices, for some $0 \leq k \leq n$. More precisely,
\[ \lim_{n \to \infty} P(\bar{A}, n, N_c) = 0. \]
Remark We recreate here the proof as it appeared in the original paper. In doing so,
we highlight a harmless mistake which can be fixed as indicated in [GS81]. However, one
should be aware that modern approaches yield much shorter proofs [KR97].
Proof. Fix some $M \gg 0$ to be specified later and divide the graphs $G_{n,N_c}$ into two classes:
(1) $E_M := \{ G_{n,N_c}$ whose GCC contains $\geq n - M$ vertices $\}$
(2) $\bar{E}_M := \{ G_{n,N_c}$ whose GCC contains $< n - M$ vertices $\}$.
Since this definition depends on $n$ and $N_c$ we let $N(\bar{E}_M, n, N_c) := |\bar{E}_M|$. Suppose now that $G_{n,N_c} \in \bar{E}_M$ has $r$ connected components of $l_1, \ldots, l_r$ vertices. In this case,
\[ \sum_{i=1}^{r} l_i = n \quad \text{and} \quad \sum_{i=1}^{r} \binom{l_i}{2} \geq N_c \tag{2} \]
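where the inequality follows because each of the $N_c$ edges must lie within one of the connected components. Thus, if $L := \max_i l_i$, then $\binom{l_i}{2} \leq \frac{L-1}{2}\, l_i$ for every $i$, and summing over $i$ we have that
\[ \frac{L-1}{2} \geq \frac{N_c}{n} \tag{3} \]
and consequently $L > \frac{2N_c}{n}$. This can also be seen because the average degree of a vertex cannot exceed $L - 1$. As such,
\[ \frac{2N_c}{n} < L < n - M \implies M < n - \frac{2N_c}{n} \tag{4} \]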
because a graph in $\bar{E}_M$ has strictly more than $M$ vertices outside its GCC, while the GCC itself has exactly $L$ vertices. These preliminaries yield the following upper bound, where we sum over all possible numbers $s$ of vertices that do not belong to the GCC:
\[ N(\bar{E}_M, n, N_c) \leq \sum_{M < s < n - \frac{2N_c}{n}} \binom{n}{s} \binom{\binom{n}{2} - s(n-s)}{N_c}. \tag{5} \]
Indeed, if the n − s vertices belonging to a GCC are fixed, the s(n − s) potential edges
between the GCC and the rest of the graph are not in the graph. There is an inequality
instead of equality because we are not imposing any “connectedness” restrictions on GCC,
hence over-counting.
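Now, if $P(\bar{E}_M, n, N_c)$ denotes the probability that a graph $\Gamma_{n,N_c}$ lies in $\bar{E}_M$, then:
\[ P(\bar{E}_M, n, N_c) = \frac{N(\bar{E}_M, n, N_c)}{\binom{\binom{n}{2}}{N_c}} \leq \sum_{M < s < n - \frac{2N_c}{n}} \binom{n}{s} \frac{\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}}. \tag{6} \]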
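For $n \gg 0$, the authors claimed that using elementary estimations one could obtain
\[ \binom{n}{s} \frac{\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \leq \frac{e^{(3-2c)s}}{s!} \quad \text{when } s \leq n/2 \tag{7} \]
and
\[ \binom{n}{s} \frac{\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \leq \frac{e^{(3-2c)(n-s)}}{(n-s)!} \quad \text{when } s \geq n/2. \tag{8} \]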
Remark This is actually false as stated (see the Appendix) but it will be assumed in order to complete the proof as in the original paper. The error seems to have gone unnoticed until a 1981 paper of Godehardt and Steinbach [GS81], where the proof was corrected using a modified estimate which later appeared as Exercise 7.15 in Bollobás' book on random graphs [Bol01]. In modern random graph theory, this lemma has a very short proof using the binomial random graph model, as indicated in [KR97].
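Now, combining equations (6), (7) and (8) we have for $n \gg 0$ that
\[ P(\bar{E}_M, n, N_c) \leq \sum_{M < s \leq \frac{n}{2}} \frac{e^{(3-2c)s}}{s!} + \sum_{\frac{n}{2} \leq s < n - \frac{2N_c}{n}} \frac{e^{(3-2c)(n-s)}}{(n-s)!} = \sum_{M < s \leq \frac{n}{2}} \frac{e^{(3-2c)s}}{s!} + \sum_{\frac{2N_c}{n} < s \leq \frac{n}{2}} \frac{e^{(3-2c)s}}{s!}, \]
where the second sum has been reindexed by replacing $n - s$ with $s$,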
and letting the tails go to infinity,
\[ P(\bar{E}_M, n, N_c) \leq \sum_{s > M} \frac{e^{(3-2c)s}}{s!} + \sum_{s > \frac{2N_c}{n}} \frac{e^{(3-2c)s}}{s!}. \tag{9} \]
Thus, choosing $M := \log \log n$, both $M$ and $\frac{2N_c}{n}$ tend to infinity with $n$. Both sums are then tails of the convergent Taylor series of $\exp(e^{3-2c})$ (the factorial eventually dominates the exponential), so we have
\[ \lim_{n \to \infty} P(\bar{E}_M, n, N_c) = 0. \tag{10} \]
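Since $P(\bar{A}, n, N_c) \leq P(\bar{E}_M, n, N_c) + P(\bar{A} \cap E_M, n, N_c)$, to complete the proof of the lemma it suffices to verify the following claim:
Claim 2.2. $\lim_{n \to \infty} P(\bar{A} \cap E_M, n, N_c) = 0$.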
Suppose we have fixed the $n - s$ vertices of a GCC in a graph $G \in \bar{A} \cap E_M$. If there are $r$ edges between the $s$ vertices not in the GCC then $r \geq 1$, since $G \in \bar{A}$, and we can choose these edges in $\binom{\binom{s}{2}}{r}$ ways. On the other hand, there are $N_c - r$ edges within the GCC of $n - s$ vertices, which can be chosen in at most $\binom{\binom{n-s}{2}}{N_c - r}$ ways (this is an upper bound since we do not ensure that the GCC is actually connected). Finally, since $G \in E_M$, the GCC has at least $n - \log \log n$ vertices so it misses at most $\log \log n$ vertices. Moreover, since $G \in \bar{A}$, the GCC misses at least two vertices so
\[ P(\bar{A} \cap E_M, n, N_c) \leq \sum_{s=2}^{\log \log n} \binom{n}{s} \left[ \sum_{r=1}^{\binom{s}{2}} \frac{\binom{\binom{s}{2}}{r} \binom{\binom{n-s}{2}}{N_c - r}}{\binom{\binom{n}{2}}{N_c}} \right]. \tag{11} \]
Finally, using our previous bounds in (7) and (8) we have by various estimates that
\[ P(\bar{A} \cap E_M, n, N_c) \leq \frac{\log n}{n} \sum_{s=2}^{\log \log n} \frac{2^{\binom{s}{2}} e^{-2sc}}{s!} \leq \frac{e^{e^{-2c}}\, e^{(\log \log n)^2/2}}{n} \xrightarrow{n \to \infty} 0. \tag{12} \]
This completes the proof of the claim.
3. Main Results
Theorem 3.1 (Completely Connected). If P0 (n, Nc ) denotes the probability of Γn,Nc being
completely connected, then:
\[ \lim_{n \to \infty} P_0(n, N_c) = e^{-e^{-2c}}. \tag{13} \]
Proof. Let us denote by $G_{n,N_c}$ some fixed graph on $n$ vertices with $N_c$ edges. Set $N_0(n, N_c)$ to denote the number of completely connected graphs $G_{n,N_c}$ or, equivalently, the number of graphs $G_{n,N_c}$ which are of type $A$ with no isolated vertices. Let then $N_0'(n, N_c)$ denote the number of graphs $G_{n,N_c}$ with no isolated points (including those of type $\bar{A}$). We have a convenient formula for this latter quantity:
\[ N_0'(n, N_c) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \tag{14} \]
which follows from an application of the inclusion-exclusion principle as follows:
Let $A_i$ denote the set of graphs $G_{n,N_c}$ where vertex $i$ is isolated and notice that
\[ |A_{i_1} \cap \ldots \cap A_{i_k}| = \binom{\binom{n-k}{2}}{N_c} \]
because the edges need to "avoid" the isolated vertices. The inclusion-exclusion principle states that
\[ \left| \bigcup_{i=1}^{n} A_i \right| = \sum_{k=1}^{n} (-1)^{k+1} \left( \sum_{1 \leq i_1 < \ldots < i_k \leq n} |A_{i_1} \cap \ldots \cap A_{i_k}| \right) \]
\[ = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} |A_{i_1} \cap \ldots \cap A_{i_k}| = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \]
and consequently
\[ N_0'(n, N_c) = \binom{\binom{n}{2}}{N_c} - \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \]
as claimed. Note that the left hand side is contained between any two consecutive partial
sums of the right hand side (inclusion-exclusion overestimate-underestimate).
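The inclusion-exclusion count of graphs with no isolated vertices is easy to sanity-check by exhaustive enumeration when $n$ is small. The sketch below compares the alternating sum with a brute-force count; the function names are our own, and the edge count `N` is an arbitrary positive integer rather than specifically $N_c$.

```python
from itertools import combinations
from math import comb


def no_isolated_formula(n, N):
    """Alternating sum over k of (-1)^k C(n,k) C(C(n-k,2), N)."""
    return sum(
        (-1) ** k * comb(n, k) * comb(comb(n - k, 2), N)
        for k in range(n + 1)
    )


def no_isolated_bruteforce(n, N):
    """Count N-edge graphs on n labelled vertices with no isolated vertex."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for graph in combinations(all_edges, N):
        covered = {v for edge in graph for v in edge}
        if len(covered) == n:
            count += 1
    return count
```

For example, with $n = 4$ and $N = 2$ both functions return 3, the number of perfect matchings on four labelled vertices.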
Now, for any fixed value of $k$ we have:
Claim 3.2.
\[ \lim_{n \to \infty} \binom{n}{k} \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = \frac{e^{-2kc}}{k!} \tag{15} \]
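so we obtain that
\[ \lim_{n \to \infty} \frac{N_0'(n, N_c)}{\binom{\binom{n}{2}}{N_c}} = \sum_{k=0}^{\infty} \frac{(-1)^k e^{-2kc}}{k!} = \sum_{k=0}^{\infty} \frac{(-e^{-2c})^k}{k!} = e^{-e^{-2c}} \]
where passing to the limit term by term is justified by the alternating partial-sum bounds noted above, and the last equality follows from the power series definition of the exponential. But clearly,
\[ 0 \leq \frac{N_0'(n, N_c) - N_0(n, N_c)}{\binom{\binom{n}{2}}{N_c}} \leq P(\bar{A}, n, N_c) \xrightarrow{n \to \infty} 0 \]
so an application of the key lemma concludes the proof.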
The following argument is due to Byron Schmuland.
Proof of Equation (15). First, notice that
\[ \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) = \prod_{i=0}^{k-1} \frac{n-i}{n} = \frac{1}{n^k} \prod_{i=0}^{k-1} (n - i) = \frac{k!}{n^k} \binom{n}{k} \]
and that
\[ \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = \frac{\binom{n-k}{2}! \cdot \left(\binom{n}{2} - N_c\right)!}{\left(\binom{n-k}{2} - N_c\right)! \cdot \binom{n}{2}!} = \prod_{j = \binom{n-k}{2} - N_c + 1}^{\binom{n}{2} - N_c} j \;\cdot \prod_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j}. \]
Both products are over $\binom{n}{2} - \binom{n-k}{2}$ terms so in fact
\[ \prod_{i = \binom{n-k}{2} + 1}^{\binom{n}{2}} (i - N_c) \prod_{i = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{i} = \prod_{i = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{i - N_c}{i}. \]
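We may therefore modify our original expression to obtain
\[ k! \binom{n}{k} \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = n^k \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right) \prod_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \left(1 - \frac{N_c}{j}\right) \approx n^k \prod_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \left(1 - \frac{N_c}{j}\right). \tag{16} \]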
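Taking logarithms of the right hand side,
\[ k \log(n) + \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \log\left(1 - \frac{N_c}{j}\right) \approx k \log(n) - N_c \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j}. \tag{17} \]
This holds because $-\log(1 - x) - x = x^2/2 + x^3/3 + \ldots \leq x^2$ for $0 \leq x \leq 1/2$, and $N_c/j \to 0$ uniformly over the range of summation as $n \to \infty$. Now, for $n \gg 0$:
\[ \left| \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \log\left(1 - \frac{N_c}{j}\right) + N_c \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j} \right| \leq N_c^2 \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j^2} \leq N_c^2\, \frac{\binom{n}{2} - \binom{n-k}{2}}{\binom{n-k}{2}^2} \]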
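where the right hand inequality came from bounding the sum by its largest term times the number of summands. The right hand side of the inequality goes to zero because
\[ N_c^2\, \frac{\binom{n}{2} - \binom{n-k}{2}}{\binom{n-k}{2}^2} = \frac{N_c^2}{\binom{n-k}{2}} \left( \frac{\binom{n}{2}}{\binom{n-k}{2}} - 1 \right) = \frac{N_c^2}{\binom{n-k}{2}} \cdot \frac{2kn - k^2 - k}{(n-k)(n-k-1)} \]
\[ = N_c^2\, \frac{2(2kn - k^2 - k)}{(n-k)^2 (n-k-1)^2} \ll (n \log n)^2 \cdot \frac{1}{n^3} = \frac{(\log n)^2}{n} \to 0. \]
Coming back to equation (17), notice that $\sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j} \approx \frac{2k}{n}$ because bounding from above we have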
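\[ \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j} \leq \frac{1}{\binom{n-k}{2}} \left[ \binom{n}{2} - \binom{n-k}{2} \right] = \frac{\binom{n}{2}}{\binom{n-k}{2}} - 1 \approx \frac{2k}{n} \]
and bounding from below we have
\[ \sum_{j = \binom{n-k}{2} + 1}^{\binom{n}{2}} \frac{1}{j} \geq \frac{1}{\binom{n}{2}} \left[ \binom{n}{2} - \binom{n-k}{2} \right] = \frac{\binom{n-k}{2}}{\binom{n}{2}} \cdot \frac{1}{\binom{n-k}{2}} \left[ \binom{n}{2} - \binom{n-k}{2} \right] \approx 1 \cdot \frac{2k}{n}. \]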
Substituting back into equation (17) we now have
\[ \log\left( k! \binom{n}{k} \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} \right) \approx k \log(n) - \left( \tfrac{1}{2}\, n \log n + cn \right) \frac{2k}{n} = k \log(n) - k \log(n) - 2kc = -2kc \]
so taking the exponential of both sides completes this rather tiresome proof.
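The limit in equation (15) can also be observed numerically. The sketch below evaluates the product form from equation (16) exactly as a sum of logarithms, which avoids computing enormous binomial coefficients; it returns $k!$ times the left hand side of (15), to be compared with $e^{-2kc}$. The function name `scaled_claim_lhs` is our own, and the number of terms summed grows like $kn$, so this is only a rough check for moderate $n$.

```python
import math


def scaled_claim_lhs(n, k, c):
    """k! * C(n,k) * C(C(n-k,2), N_c) / C(C(n,2), N_c), via logarithms."""
    N_c = math.floor(0.5 * n * math.log(n) + c * n)
    lo = math.comb(n - k, 2)
    hi = math.comb(n, 2)
    # log of k! * C(n, k) = log of n (n-1) ... (n-k+1)
    log_value = sum(math.log(n - i) for i in range(k))
    # log of the binomial ratio, written as a product of (1 - N_c / j)
    log_value += sum(math.log1p(-N_c / j) for j in range(lo + 1, hi + 1))
    return math.exp(log_value)
```

For $n = 10^5$, $k = 2$ and $c = 0.5$, the value returned agrees with $e^{-2} \approx 0.1353$ to within about one percent, consistent with an error term of order $(\log n)^2 / n$.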
Theorem 3.3 (Greatest Connected Component). If Pk (n, Nc ) denotes the probability of
the greatest connected component of Γn,Nc consisting of n − k points for 0 ≤ k ≤ n, then:
\[ \lim_{n \to \infty} P_k(n, N_c) = \frac{(e^{-2c})^k \, e^{-e^{-2c}}}{k!}. \tag{18} \]
Namely, the number of points outside the greatest component of Γn,Nc is distributed in the
limit according to Poisson’s law with mean value e−2c .
Proof. It follows from the key lemma that in the limit we need only consider graphs of type $A$. The number of graphs $G_{n,N_c}$ of type $A$ having a connected component of precisely $n - k$ points is equal to the number of completely connected graphs $G_{n-k,N_c}$ multiplied by $\binom{n}{k}$. As such, we have that
\[ P_k(n, N_c) \sim \frac{\binom{n}{k} \cdot |\{ G_{n-k,N_c} : G_{n-k,N_c} \text{ is connected} \}|}{\binom{\binom{n}{2}}{N_c}} = \frac{\binom{n}{k} \cdot \binom{\binom{n-k}{2}}{N_c} \cdot P_0(n - k, N_c)}{\binom{\binom{n}{2}}{N_c}} \]
or
\[ P_k(n, N_c) \sim \left[ \binom{n}{k} \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} \right] \cdot P_0(n - k, N_c). \]
Now, the bracketed term on the right hand side is the quantity in equation (15), so it converges to $\frac{e^{-2kc}}{k!}$ as $n \to \infty$. On the other hand,
\[ \lim_{n \to \infty} \frac{N_c - \frac{1}{2}(n - k) \log(n - k)}{n - k} = c \]
so, for constant $k$, the edge count $N_c = N_c(n)$ plays the same role for graphs on $n - k$ vertices as it does for graphs on $n$ vertices: the corresponding value of $c$ changes only by $o(1)$. Hence, by equation (13) of the Completely Connected theorem, $P_0(n - k, N_c) \sim P_0(n, N_c) \to e^{-e^{-2c}}$.
Theorem 3.4 (Disjoint Connected Components). If $\Pi_k(n, N_c)$ denotes the probability that $\Gamma_{n,N_c}$ consists of exactly $k + 1$ disjoint connected components (in particular $\Pi_0(n, N_c) = P_0(n, N_c)$), then:
\[ \lim_{n \to \infty} \Pi_k(n, N_c) = \frac{(e^{-2c})^k \, e^{-e^{-2c}}}{k!}. \tag{19} \]
Proof. It follows from the key lemma that we may restrict ourselves to graphs of type $A$. As such, in the limit, graphs with $k + 1$ disjoint connected components have a single connected component of more than one vertex and $k$ isolated vertices, so $\Pi_k(n, N_c) \sim P_k(n, N_c)$ and the result follows by the Greatest Connected Component theorem.
Theorem 3.5 (Markov Process). Let the edges of a random graph on the vertices v1 , v2 , . . . , vn
be chosen successively among all possible edges in such a manner that at each stage, every
edge which has not yet been chosen has the same probability to be chosen as the next, and
let us continue this process until the graph becomes completely connected. If ηn denotes the
number of edges of the resulting connected random graph Γ, then:
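\[ P\left(\eta_n = \left\lfloor \tfrac{1}{2}\, n \log n \right\rfloor + l\right) \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \tag{20} \]
for $|l| = O(n)$ and
\[ \lim_{n \to \infty} P\left( \frac{\eta_n - \frac{1}{2}\, n \log n}{n} < x \right) = e^{-e^{-2x}}. \tag{21} \]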
Proof. To prove equation (20), let us reduce the probability to some others we already know. If $\eta_n = \lfloor \frac{1}{2} n \log n \rfloor + l = N + 1$, then just before choosing the last edge we had a disconnected graph $G_{n,N}$ which could be made completely connected by adding a single edge. By the key lemma, in the limit such a $G_{n,N}$ consists of the disjoint union of a connected graph on $n - 1$ vertices and a single isolated vertex. Given this setup, the last edge can be chosen in $n - 1$ ways among the remaining $\binom{n}{2} - N$ edges to obtain a connected graph $G_{n,N+1}$. Since
\[ \frac{n - 1}{\binom{n}{2} - N} \sim \frac{n - 1}{\frac{n^2 - n \log n}{2}} \sim \frac{n}{n^2/2} = \frac{2}{n} \]
we see that
\[ P\left(\eta_n = \left\lfloor \tfrac{1}{2}\, n \log n \right\rfloor + l\right) \sim P_1(n, N)\, \frac{2}{n} \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \]
where $P_1(n, N)$ is the probability of obtaining a graph as described previously. Since $N = \lfloor \frac{1}{2} n \log n \rfloor + l - 1$ and $l \in O(n)$, we may write $l - 1 = cn$, so that $c = \frac{l-1}{n} \approx \frac{l}{n}$. By equation (18) with $k = 1$, $P_1(n, N) \sim e^{-2c} e^{-e^{-2c}}$, which justifies the right hand side of the claimed asymptotic.
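To prove equation (21), notice that if $\eta_n = \lfloor \frac{1}{2} n \log n \rfloor + l$ then $P(\eta_n - \frac{1}{2} n \log n < nx)$ is asymptotically the sum of the probabilities given by equation (20) where $l < nx$. In other words:
\[ P\left(\eta_n - \tfrac{1}{2}\, n \log n < nx\right) \sim \sum_{l < nx} \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}}. \]
Changing variables by $t := \frac{2l}{n}$, we recognize the sum as a Riemann sum with mesh $\Delta t = \frac{2}{n}$, so we have
\[ \sum_{l < nx} \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} = \sum_{t < 2x} e^{-t - e^{-t}}\, \Delta t \sim \int_{-\infty}^{2x} e^{-t - e^{-t}}\, dt = \left[ e^{-e^{-t}} \right]_{-\infty}^{2x} = e^{-e^{-2x}}. \]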
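The passage from the sum to the integral can be checked numerically. The sketch below adds up the terms of equation (20) over $l < nx$, starting from $l = -\lfloor \frac{1}{2} n \log n \rfloor$ (the smallest value for which $\eta_n$ could be nonnegative), and compares the result with the claimed limit. The function names are our own illustrative choices.

```python
import math


def riemann_sum(n, x):
    """Sum of (2/n) * exp(-2l/n - exp(-2l/n)) over integers l < n*x."""
    lower = -math.floor(0.5 * n * math.log(n))
    upper = math.ceil(n * x)  # range() stops at upper - 1, which is < n*x
    total = 0.0
    for l in range(lower, upper):
        t = 2.0 * l / n
        total += (2.0 / n) * math.exp(-t - math.exp(-t))
    return total


def gumbel_limit(x):
    """The limiting value exp(-exp(-2x)) from equation (21)."""
    return math.exp(-math.exp(-2.0 * x))
```

Already for $n = 2000$ the two quantities differ by well under $10^{-2}$ at $x = -0.5$, $0$ and $1$; the terms with very negative $l$ are doubly exponentially small and contribute nothing in floating point.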
4. Epilogue
Above, we have explored the structure of a random graph having $N(n) = \frac{1}{2}\, n \log n + cn$ edges as done in [ER59]. In the subsequent paper [ER60] a broader picture of random graphs $\Gamma_{n,N(n)}$ was uncovered, describing their evolution in terms of five major phases.
4.1. Phase 1. When $N(n) = o(n)$, that is $\frac{N(n)}{n} \to 0$ as $n \to \infty$, $\Gamma_{n,N(n)}$ is almost surely made up exclusively of small components that are trees.
4.2. Phase 2. When N (n) ∼ αn and 0 < α < 1/2, Γn,N (n) starts containing cycles of
any fixed order with probability tending to a positive limit. Each component of Γn,N (n)
is almost surely a single cycle or a tree. The GCC of Γn,N (n) remains a tree containing
roughly log n − log log n vertices.
4.3. Phase 3. When $N(n) \sim \alpha n$ and $\alpha = 1/2$, a dramatic change occurs. The GCC of $\Gamma_{n,N(n)}$ now consists almost surely of on the order of $n^{2/3}$ vertices, and as soon as $\alpha > 1/2$, the GCC of $\Gamma_{n,N(n)}$ consists almost surely of approximately $G(\alpha) \cdot n$ vertices, where $G(1/2) = 0$ and $G(\alpha) \to 1$ as $\alpha \to \infty$.
Remark During this phase, all the tree components of Γn,N (n) progressively melt into the
GCC with smaller trees surviving the longest outside of it.
4.4. Phase 4. When $N(n) \sim \alpha n \log n$ and $\alpha \leq 1/2$ the graph $\Gamma_{n,N(n)}$ eventually becomes almost surely connected. In particular, if $N(n) = \frac{1}{2}\, n \log n + cn + o(n)$, $\Gamma_{n,N(n)}$ is almost surely of type $A$ and the probability that it is completely connected tends to $e^{-e^{-2c}}$, which goes to 1 as $c \to \infty$ (this is the content of the Completely Connected theorem).
4.5. Phase 5. When $N(n) \sim \omega(n)\, n \log n$ with $\omega(n) \to \infty$, $\Gamma_{n,N(n)}$ is almost surely connected and the degrees of all vertices are almost surely asymptotically equal, so the graph is asymptotically regular.
5. Appendix
The following argument is due to Byron Schmuland. The first (false) bound claimed in the proof of the key lemma is equivalent to
\[ \left( s! \binom{n}{s} \frac{\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \right)^{1/s} \leq e^{3-2c} \]
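and can be written in product form as
\[ \left( \prod_{i=0}^{s-1} (n - i) \right)^{1/s} \left( \prod_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \left(1 - \frac{N_c}{j}\right) \right)^{1/s} \leq e^{3-2c}. \tag{22} \]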
We may assume that
\[ 0 \leq N_c \leq \frac{1}{3} \left( \binom{n}{2} - (n/2)^2 \right). \]
This is possible since $cn$ grows like a multiple of $n$, $N_c$ grows like a multiple of $n \log(n)$, while $\binom{n}{2} - (n/2)^2$ grows like a multiple of $n^2$. We begin with the factor $\left( \prod_{i=0}^{s-1} (n - i) \right)^{1/s}$ and, for $s \leq n/2$, bound its logarithm from below to obtain:
\[ \frac{1}{s} \sum_{i=0}^{s-1} \log(n - i) \geq \log(n/2) = \log(n) - \log(2). \tag{23} \]
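The logarithm of the other factor
\[ \left( \prod_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \left(1 - \frac{N_c}{j}\right) \right)^{1/s} \]
is
\[ \frac{1}{s} \sum_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \log\left(1 - \frac{N_c}{j}\right). \]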
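From the inequality $\log(1 - x) \geq -4x/3$ for $0 \leq x \leq 1/3$ and our assumption on $N_c$ we see that
\[ \frac{-4N_c}{3s} \sum_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \frac{1}{j} \leq \frac{1}{s} \sum_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \log\left(1 - \frac{N_c}{j}\right) \]
and using the integral test we obtain
\[ \frac{-4N_c}{3s} \log\left( \frac{\binom{n}{2}}{\binom{n}{2} - s(n-s)} \right) = \frac{-4N_c}{3s} \int_{\binom{n}{2} - s(n-s)}^{\binom{n}{2}} \frac{dt}{t} \leq \frac{-4N_c}{3s} \sum_{j = \binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \frac{1}{j}. \]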
Substituting $-N_c \geq -(\frac{1}{2}\, n \log(n) + cn)$ and $s = n/2$ gives a lower bound of
\[ -\frac{4}{3} (\log(n) + 2c) \log\left( \frac{2(n-1)}{n-2} \right) \approx -\frac{4}{3} \log(n) \log(2) - \frac{8c}{3} \log(2). \tag{24} \]
Since $\frac{4}{3} \log(2) \approx 0.9242$, by adding equations (24) and (23) we find that the left hand side of equation (22) grows at least like $n^{0.0758}$ as $n$ goes to infinity, so the inequality would appear to be false.
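The failure of the claimed bound (7) at $s = n/2$ can also be seen by direct numerical evaluation. The sketch below computes, per unit of $s$, the logarithm of the left hand side of (7) minus the logarithm of its right hand side, using `math.lgamma` for the log-binomial coefficients. A positive return value means the bound is violated. The function names are our own, and the floating-point evaluation is a rough check rather than a proof.

```python
import math


def log_binom(a, b):
    """Natural logarithm of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def bound7_excess(n, c):
    """(log LHS - log RHS) / s for the claimed bound (7), taken at s = n // 2."""
    s = n // 2
    N_c = math.floor(0.5 * n * math.log(n) + c * n)
    total = n * (n - 1) // 2  # C(n, 2)
    log_lhs = (
        log_binom(n, s)
        + log_binom(total - s * (n - s), N_c)
        - log_binom(total, N_c)
    )
    log_rhs = (3 - 2 * c) * s - math.lgamma(s + 1)  # log of e^{(3-2c)s} / s!
    return (log_lhs - log_rhs) / s
```

With $c = 0$, the excess is negative for $n = 1000$ (the bound holds there) but positive for $n = 10^6$, in line with the slow growth in $n$ derived above.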
References
[Bol01] B. Bollobás, Random graphs, vol. 73, Cambridge Univ. Press, 2001.
[ER59] P. Erdős and A. Rényi, On random graphs I, Publ. Math. Debrecen 6 (1959), 290–297.
[ER60] P. Erdős and A. Rényi, On the evolution of random graphs, Akad. Kiadó, 1960.
[GS81] E. Godehardt and J. Steinbach, On a lemma of P. Erdős and A. Rényi about random graphs, Publ. Math. Debrecen 8 (1981), 271–273.
[KR97] M. Karoński and A. Ruciński, The origins of the theory of random graphs, Algorithms and Combinatorics 13 (1997), 311–336.