RANDOM GRAPHS

MAXIME BERGERON

Abstract. We give a nearly identical account (adding a generous amount of detail and indicating an error) of the paper in which Erdős and Rényi first analyzed the evolution of random graphs [ER59].

1. Introduction

Our main object of study will be a random graph $\Gamma_{n,N}$. By this we shall mean a graph on $n$ vertices along with $N$ edges chosen randomly out of the $\binom{n}{2}$ possible edges. We will usually denote a particular graph on $n$ vertices having $N$ edges by $G_{n,N}$. Furthermore, throughout most of this note $N$ will be equal to $N_c$, defined by
\[ N_c := \lfloor \tfrac{1}{2} n \log n + cn \rfloor. \tag{1} \]
This emulates the idea of a threshold function for the property of being completely connected.

Definition. We say a random graph $\Gamma_{n,N_c}$ is of type $A$ when for some $0 \le k \le n$ it consists of a connected graph on $n - k$ vertices along with $k$ isolated points. Graphs which are not of type $A$ will be said to be of type $\bar{A}$, and the probability that $\Gamma_{n,N_c}$ is of type $\bar{A}$ shall be denoted by $P(\bar{A}, n, N_c)$.

The following lemma is the key to all of the later results.

Lemma 1.1.
\[ \lim_{n\to\infty} P(\bar{A}, n, N_c) = 0. \]

In other words, when $n$ is large enough, almost all random graphs $\Gamma_{n,N_c}$ are of type $A$. That is, they are made up of a single connected component and a number of isolated points. This will be used to prove the following theorems:

Completely Connected. If $P_0(n, N_c)$ denotes the probability of $\Gamma_{n,N_c}$ being completely connected, then:
\[ \lim_{n\to\infty} P_0(n, N_c) = e^{-e^{-2c}}. \]

Greatest Connected Component. If $P_k(n, N_c)$ denotes the probability that the greatest connected component of $\Gamma_{n,N_c}$ consists of $n - k$ points for some $0 \le k \le n$, then:
\[ \lim_{n\to\infty} P_k(n, N_c) = \frac{(e^{-2c})^k e^{-e^{-2c}}}{k!}. \]
In other words, the number of points outside the greatest component of $\Gamma_{n,N_c}$ is distributed in the limit according to Poisson's law with mean value $e^{-2c}$.

Disjoint Connected Components. If $\Pi_k(n, N_c)$ denotes the probability that $\Gamma_{n,N_c}$ consists of exactly $k + 1$ disjoint connected components (in particular $\Pi_0(n, N_c) = P_0(n, N_c)$), then:
\[ \lim_{n\to\infty} \Pi_k(n, N_c) = \frac{(e^{-2c})^k e^{-e^{-2c}}}{k!}. \]
Namely, the number of disjoint components of $\Gamma_{n,N_c}$, diminished by one, is distributed in the limit according to Poisson's law with mean value $e^{-2c}$.

Markov Process. Let the edges of a random graph on the vertices $v_1, v_2, \ldots, v_n$ be chosen successively among all possible edges in such a manner that at each stage, every edge which has not yet been chosen has the same probability of being chosen next, and let us continue this process until the graph becomes completely connected. Let $\eta_n$ denote the number of edges of the resulting connected random graph $\Gamma$. Then, we have:
\[ P\bigl(\eta_n = \lfloor \tfrac{1}{2} n \log n \rfloor + l\bigr) \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \]
for $|l| = O(n)$ and
\[ \lim_{n\to\infty} P\left( \frac{\eta_n - \frac{1}{2} n \log n}{n} < x \right) = e^{-e^{-2x}}. \]
In the limit, the probability that the graph becomes completely connected after adding $N_x$ edges converges to $e^{-e^{-2x}}$, which converges to 1 as $x \to \infty$. This corresponds to phase 4 in the evolution of a graph as outlined in the sequel [ER60] to this paper of Erdős and Rényi.

2. The Key Lemma

In the sequel we shall abbreviate Greatest Connected Component by GCC.

Lemma 2.1 (Key Lemma). Almost all graphs consist of a single connected component of $n - k$ vertices and $k$ isolated vertices. More precisely,
\[ \lim_{n\to\infty} P(\bar{A}, n, N_c) = 0. \]

Remark. We recreate here the proof as it appeared in the original paper. In doing so, we highlight a harmless mistake which can be fixed as indicated in [GS81]. However, one should be aware that modern approaches yield much shorter proofs [KR97].

Proof. Fix some $M \gg 0$ to be specified later and divide the graphs $G_{n,N_c}$ into two classes:
(1) $E_M := \{ G_{n,N_c} \text{ whose GCC contains} \ge n - M \text{ vertices} \}$
(2) $\bar{E}_M := \{ G_{n,N_c} \text{ whose GCC contains} < n - M \text{ vertices} \}$.
Since this definition depends on $n$ and $N_c$ we let $N(\bar{E}_M, n, N_c) := |\bar{E}_M|$.
Suppose now that $G_{n,N_c} \in \bar{E}_M$ has $r$ connected components of $l_1, \ldots, l_r$ vertices. In this case,
\[ \sum_{i=1}^{r} l_i = n \quad\text{and}\quad \sum_{i=1}^{r} \binom{l_i}{2} \ge N_c \tag{2} \]
where the inequality follows because each of the $N_c$ edges must lie within one of the connected components. Thus, if $L := \max_i l_i$, we have that
\[ \frac{L-1}{2}\, n \ge N_c \tag{3} \]
and consequently $L > \frac{2N_c}{n}$. This can also be seen because the average degree of a vertex cannot exceed $L - 1$. As such,
\[ \frac{2N_c}{n} < L < n - M \implies M < n - \frac{2N_c}{n} \tag{4} \]
because more than $M$ vertices lie outside the GCC, which itself has $L$ vertices. These preliminaries yield the following upper bound, where we are summing over all possible numbers $s$ of vertices that do not belong to the GCC:
\[ N(\bar{E}_M, n, N_c) \le \sum_{M < s < n - \frac{2N_c}{n}} \binom{n}{s} \binom{\binom{n}{2} - s(n-s)}{N_c}. \tag{5} \]
Indeed, if the $n - s$ vertices belonging to a GCC are fixed, the $s(n - s)$ potential edges between the GCC and the rest of the graph are not in the graph. There is an inequality instead of an equality because we are not imposing any connectedness restriction on the GCC, hence over-counting. Now, if $P(\bar{E}_M, n, N_c)$ denotes the probability that a graph $\Gamma_{n,N_c}$ belongs to $\bar{E}_M$, then:
\[ P(\bar{E}_M, n, N_c) = \frac{N(\bar{E}_M, n, N_c)}{\binom{\binom{n}{2}}{N_c}} \le \sum_{M < s < n - \frac{2N_c}{n}} \frac{\binom{n}{s}\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}}. \tag{6} \]
For $n \gg 0$, the authors claimed that using elementary estimations one could obtain
\[ \frac{\binom{n}{s}\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \le \frac{e^{(3-2c)s}}{s!} \quad\text{when } s \le n/2 \tag{7} \]
and
\[ \frac{\binom{n}{s}\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \le \frac{e^{(3-2c)(n-s)}}{(n-s)!} \quad\text{when } s \ge n/2. \tag{8} \]

Remark. This is actually false as stated (see the Appendix), but it will be assumed in order to complete the proof as in the original paper. The error seems to have gone unnoticed until a 1981 paper of Godehardt and Steinbach [GS81], where the proof was corrected using a modified estimate which later appeared as exercise 7.15 in Bollobás' book on random graphs [Bol01]. In modern random graph theory, this lemma has a very short proof using the binomial random graph model, as indicated in [KR97].
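The failure of estimate (7) near $s = n/2$ can also be observed numerically. The following is a minimal sketch, not part of the original argument: it evaluates $\frac{1}{s}\log\bigl(s!\cdot\text{LHS of (7)}\bigr)$ with log-gamma functions and compares it with $3 - 2c$, which is what (7) would require as an upper bound. The function names `log_binom` and `log_lhs_root` are hypothetical helpers introduced only for this illustration, and floating-point log-gamma is assumed to be accurate enough at these sizes.

```python
import math


def log_binom(a: float, b: float) -> float:
    """Natural log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_lhs_root(n: int, s: int, c: float) -> float:
    """(1/s) * log( s! * C(n,s) * C(C(n,2) - s(n-s), N_c) / C(C(n,2), N_c) ).

    Estimate (7) is equivalent to this quantity being at most 3 - 2c.
    """
    n_c = math.floor(0.5 * n * math.log(n) + c * n)  # N_c from equation (1)
    total = n * (n - 1) // 2                         # C(n, 2)
    log_value = (
        math.lgamma(s + 1)                           # log s!
        + log_binom(n, s)
        + log_binom(total - s * (n - s), n_c)
        - log_binom(total, n_c)
    )
    return log_value / s


if __name__ == "__main__":
    c = 0.0
    for n in (10**3, 10**4, 10**5, 10**6):
        value = log_lhs_root(n, n // 2, c)
        print(f"n={n:>8}  (1/s)*log(...)={value:.4f}  bound 3-2c={3 - 2 * c:.4f}")
```

With $c = 0$ and $s = n/2$, the computed quantity should stay below 3 for $n = 10^3$ and exceed it for $n = 10^6$, in line with the slow polynomial growth derived in the Appendix.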
Now, combining equations (6), (7) and (8), and reindexing the second sum by $t = n - s$, we have for $n \gg 0$ that
\[ P(\bar{E}_M, n, N_c) \le \sum_{M < s \le \frac{n}{2}} \frac{e^{(3-2c)s}}{s!} + \sum_{\frac{n}{2} \le s < n - \frac{2N_c}{n}} \frac{e^{(3-2c)(n-s)}}{(n-s)!} \le \sum_{M < s \le \frac{n}{2}} \frac{e^{(3-2c)s}}{s!} + \sum_{\frac{2N_c}{n} < t \le \frac{n}{2}} \frac{e^{(3-2c)t}}{t!} \]
and letting the tails go to infinity,
\[ P(\bar{E}_M, n, N_c) \le \sum_{M < s} \frac{e^{(3-2c)s}}{s!} + \sum_{\frac{2N_c}{n} < s} \frac{e^{(3-2c)s}}{s!}. \tag{9} \]
Thus, choose $M := \log\log n$. Both sums are tails of the convergent series $\sum_{s \ge 0} e^{(3-2c)s}/s! = \exp(e^{3-2c})$ (the factorial eventually dominates the exponential), and both lower limits $M$ and $2N_c/n$ tend to infinity with $n$, so
\[ \lim_{n\to\infty} P(\bar{E}_M, n, N_c) = 0. \tag{10} \]
To complete the proof of the lemma it suffices to verify the following claim:

Claim 2.2. $\lim_{n\to\infty} P(\bar{A}E_M, n, N_c) = 0$.

Suppose we have fixed the $n - s$ vertices of a GCC in a graph $G \in \bar{A}E_M$. If there are $r$ edges between the $s$ vertices not in the GCC then, since $G \in \bar{A}$, we have $r \ge 1$, and these edges can be chosen in $\binom{\binom{s}{2}}{r}$ ways. On the other hand, there are $N_c - r$ edges within the GCC of $n - s$ vertices, which can be chosen in at most $\binom{\binom{n-s}{2}}{N_c - r}$ ways (this is an upper bound since we do not ensure that the GCC is actually connected). Finally, since $G \in E_M$, the GCC has at least $n - \log\log n$ vertices, so it misses at most $\log\log n$ vertices. Moreover, since $G \in \bar{A}$, the GCC misses at least two vertices, so
\[ P(\bar{A}E_M, n, N_c) \le \sum_{s=2}^{\log\log n} \binom{n}{s} \sum_{r=1}^{\binom{s}{2}} \frac{\binom{\binom{s}{2}}{r}\binom{\binom{n-s}{2}}{N_c - r}}{\binom{\binom{n}{2}}{N_c}}. \tag{11} \]
Finally, using our previous bounds in (7) and (8) we have by various estimates that
\[ P(\bar{A}E_M, n, N_c) \le \frac{2\log n}{n} \sum_{s=2}^{\log\log n} \frac{2^{\binom{s}{2}} e^{-2sc}}{s!} \le \frac{2\log n}{n}\, e^{e^{-2c}} e^{(\log\log n)^2/2} \xrightarrow{n\to\infty} 0. \tag{12} \]
This completes the proof of the claim.

3. Main Results

Theorem 3.1 (Completely Connected). If $P_0(n, N_c)$ denotes the probability of $\Gamma_{n,N_c}$ being completely connected, then:
\[ \lim_{n\to\infty} P_0(n, N_c) = e^{-e^{-2c}}. \tag{13} \]

Proof. Let us denote by $G_{n,N_c}$ some fixed graph on $n$ vertices with $N_c$ edges. Let $N_0(n, N_c)$ denote the number of completely connected graphs $G_{n,N_c}$ or, equivalently, the number of graphs $G_{n,N_c}$ which are of type $A$ with no isolated vertices.
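As a numerical aside on the limit (13), independent of the counting argument in the proof, one can sample graphs with exactly $N_c$ edges and record how often they are connected. The sketch below is an illustration under stated assumptions rather than anything taken from [ER59]: the helper names `edge_count`, `sample_is_connected` and `estimate_p0` are hypothetical, edges are drawn one at a time with repeats rejected (so each unused edge is equally likely at every step), and connectivity is tracked with a simple union-find structure.

```python
import math
import random


def edge_count(n: int, c: float) -> int:
    """N_c = floor(n*log(n)/2 + c*n), as in equation (1)."""
    return math.floor(0.5 * n * math.log(n) + c * n)


def sample_is_connected(n: int, num_edges: int, rng: random.Random) -> bool:
    """Draw num_edges distinct edges uniformly at random; report connectivity."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = set()
    components = n
    while len(chosen) < num_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue  # no loops
        edge = (min(u, v), max(u, v))
        if edge in chosen:
            continue  # reject repeats so every unused edge is equally likely
        chosen.add(edge)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def estimate_p0(n: int, c: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P_0(n, N_c) from `trials` independent samples."""
    rng = random.Random(seed)
    m = edge_count(n, c)
    hits = sum(sample_is_connected(n, m, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n, c = 200, 0.0
    print("simulated P_0:", estimate_p0(n, c, trials=1000))
    print("limit (13)   :", math.exp(-math.exp(-2 * c)))
```

For moderate $n$ the simulated frequency is only expected to be near the limiting value, since (13) is an asymptotic statement and the estimate carries sampling noise.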
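Let then $N_0'(n, N_c)$ denote the number of graphs $G_{n,N_c}$ with no isolated points (including those of type $\bar{A}$). We have a convenient formula for this latter quantity:
\[ N_0'(n, N_c) = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \tag{14} \]
which follows from an application of the inclusion-exclusion principle as follows. Let $A_i$ denote the set of graphs $G_{n,N_c}$ where vertex $i$ is isolated and notice that
\[ |A_{i_1} \cap \ldots \cap A_{i_k}| = \binom{\binom{n-k}{2}}{N_c} \]
because the edges need to avoid the isolated vertices. The inclusion-exclusion principle states that
\[ \Bigl|\bigcup_{i=1}^{n} A_i\Bigr| = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \ldots < i_k \le n} |A_{i_1} \cap \ldots \cap A_{i_k}| = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \]
and consequently
\[ N_0'(n, N_c) = \binom{\binom{n}{2}}{N_c} - \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} = \sum_{k=0}^{n} (-1)^{k} \binom{n}{k} \binom{\binom{n-k}{2}}{N_c} \]
as claimed. Note that the left hand side is contained between any two consecutive partial sums of the right hand side (the partial sums of inclusion-exclusion alternately overestimate and underestimate). Now, for any fixed value of $k$ we have:

Claim 3.2.
\[ \lim_{n\to\infty} \frac{\binom{n}{k}\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = \frac{e^{-2kc}}{k!} \tag{15} \]
so we obtain that
\[ \lim_{n\to\infty} \frac{N_0'(n, N_c)}{\binom{\binom{n}{2}}{N_c}} = \sum_{k=0}^{\infty} \frac{(-1)^k e^{-2kc}}{k!} = \sum_{k=0}^{\infty} \frac{(-e^{-2c})^k}{k!} = e^{-e^{-2c}} \]
where the last equality follows from the power series definition of the exponential. But clearly,
\[ 0 \le \frac{N_0'(n, N_c) - N_0(n, N_c)}{\binom{\binom{n}{2}}{N_c}} \le P(\bar{A}, n, N_c) \xrightarrow{n\to\infty} 0 \]
so an application of the key lemma concludes the proof.

The following argument is due to Byron Schmuland.

Proof of Equation (15). First, notice that
\[ \prod_{i=0}^{k-1} \Bigl(1 - \frac{i}{n}\Bigr) = \prod_{i=0}^{k-1} \frac{n-i}{n} = \frac{1}{n^k} \prod_{i=0}^{k-1} (n - i) = \frac{k!}{n^k} \binom{n}{k} \]
and that
\[ \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = \frac{\binom{n-k}{2}! \cdot \bigl(\binom{n}{2} - N_c\bigr)!}{\bigl(\binom{n-k}{2} - N_c\bigr)! \cdot \binom{n}{2}!} = \prod_{j=\binom{n-k}{2} - N_c + 1}^{\binom{n}{2} - N_c} j \;\cdot \prod_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j}. \]
Both products are over $\binom{n}{2} - \binom{n-k}{2}$ terms, so in fact
\[ \frac{\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = \prod_{i=\binom{n-k}{2}+1}^{\binom{n}{2}} (i - N_c) \prod_{i=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{i} = \prod_{i=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{i - N_c}{i}. \]
We may therefore modify our original expression to obtain
\[ k!\, \frac{\binom{n}{k}\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} = n^k \prod_{i=0}^{k-1} \Bigl(1 - \frac{i}{n}\Bigr) \prod_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \Bigl(1 - \frac{N_c}{j}\Bigr) \approx n^k \prod_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \Bigl(1 - \frac{N_c}{j}\Bigr). \tag{16} \]
Taking logarithms of the right hand side,
\[ k \log(n) + \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \log\Bigl(1 - \frac{N_c}{j}\Bigr) \approx k \log(n) - N_c \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j}. \tag{17} \]
This holds because $0 \le -\log(1 - x) - x = x^2/2 + x^3/3 + \ldots \le x^2$ for $0 \le x \le 1/2$ and $\lim_{n\to\infty} N_c/j = 0$. Now, for $n \gg 0$:
\[ \Biggl| \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \log\Bigl(1 - \frac{N_c}{j}\Bigr) + N_c \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j} \Biggr| \le N_c^2 \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j^2} \le N_c^2\, \frac{\binom{n}{2} - \binom{n-k}{2}}{\binom{n-k}{2}^2} \]
where the right hand inequality comes from bounding the sum by its largest term times the number of summands. The right hand side of the inequality goes to zero because
\[ N_c^2\, \frac{\binom{n}{2} - \binom{n-k}{2}}{\binom{n-k}{2}^2} = \frac{N_c^2}{\binom{n-k}{2}} \Biggl( \frac{\binom{n}{2}}{\binom{n-k}{2}} - 1 \Biggr) = \frac{N_c^2}{\binom{n-k}{2}} \cdot \frac{2kn - k^2 - k}{(n-k)(n-k-1)} = N_c^2\, \frac{2(2kn - k^2 - k)}{(n-k)^2 (n-k-1)^2} \lesssim (n \log n)^2 \cdot \frac{1}{n^3} = \frac{(\log n)^2}{n} \to 0. \]
Coming back to equation (17), notice that $\sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j} \approx \frac{2k}{n}$, because bounding from above we have
\[ \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j} \le \frac{\binom{n}{2} - \binom{n-k}{2}}{\binom{n-k}{2}} = \frac{\binom{n}{2}}{\binom{n-k}{2}} - 1 \approx \frac{2k}{n} \]
and bounding from below we have
\[ \sum_{j=\binom{n-k}{2}+1}^{\binom{n}{2}} \frac{1}{j} \ge \frac{1}{\binom{n}{2}} \Biggl[ \binom{n}{2} - \binom{n-k}{2} \Biggr] = \frac{\binom{n-k}{2}}{\binom{n}{2}} \Biggl[ \frac{\binom{n}{2}}{\binom{n-k}{2}} - 1 \Biggr] \approx 1 \cdot \frac{2k}{n}. \]
Both bounds equal $\frac{2k}{n} + O(1/n^2)$, so the error remains negligible after multiplication by $N_c$. Substituting back into equation (17) we now have
\[ \log\Biggl( k!\, \frac{\binom{n}{k}\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} \Biggr) \approx k \log(n) - \Bigl(\tfrac{1}{2} n \log n + cn\Bigr) \frac{2k}{n} = k \log(n) - k \log(n) - 2kc \]
so taking the exponential of both sides completes this rather tiresome proof.

Theorem 3.3 (Greatest Connected Component). If $P_k(n, N_c)$ denotes the probability of the greatest connected component of $\Gamma_{n,N_c}$ consisting of $n - k$ points for $0 \le k \le n$, then:
\[ \lim_{n\to\infty} P_k(n, N_c) = \frac{(e^{-2c})^k e^{-e^{-2c}}}{k!}. \tag{18} \]
Namely, the number of points outside the greatest component of $\Gamma_{n,N_c}$ is distributed in the limit according to Poisson's law with mean value $e^{-2c}$.

Proof. It follows from the key lemma that in the limit we need only consider graphs of type $A$.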
The number of graphs $G_{n,N_c}$ of type $A$ having a connected component of precisely $n - k$ points is equal to the number of completely connected graphs $G_{n-k,N_c}$ multiplied by $\binom{n}{k}$. As such, we have that
\[ P_k(n, N_c) \sim \frac{\binom{n}{k} \cdot |\{G_{n-k,N_c} : G_{n-k,N_c} \text{ is connected}\}|}{\binom{\binom{n}{2}}{N_c}} = \Biggl[ \frac{\binom{n}{k}\binom{\binom{n-k}{2}}{N_c}}{\binom{\binom{n}{2}}{N_c}} \Biggr] \cdot P_0(n - k, N_c). \]
Now, the bracketed term on the right hand side corresponds to equation (15), so it converges to $\frac{e^{-2kc}}{k!}$ as $n \to \infty$. On the other hand,
\[ \lim_{n\to\infty} \frac{N_c - \frac{1}{2}(n - k)\log(n - k)}{n - k} = c \]
so the difference in the limit between $N_c(n)$ and $N_c(n - k)$ for a constant $k$ is negligible. Hence, by equation (13) of the Completely Connected theorem, $P_0(n - k, N_c) \sim P_0(n, N_c) \to e^{-e^{-2c}}$.

Theorem 3.4 (Disjoint Connected Components). If $\Pi_k(n, N_c)$ denotes the probability that $\Gamma_{n,N_c}$ consists of exactly $k + 1$ disjoint connected components (in particular $\Pi_0(n, N_c) = P_0(n, N_c)$), then:
\[ \lim_{n\to\infty} \Pi_k(n, N_c) = \frac{(e^{-2c})^k e^{-e^{-2c}}}{k!}. \tag{19} \]

Proof. It follows from the key lemma that we may restrict ourselves to graphs of type $A$. As such, in the limit, graphs with $k + 1$ disjoint connected components have a single connected component of more than one vertex and $k$ isolated vertices, so $\Pi_k(n, N_c) \sim P_k(n, N_c)$ and the result follows by the Greatest Connected Component theorem (Theorem 3.3).

Theorem 3.5 (Markov Process). Let the edges of a random graph on the vertices $v_1, v_2, \ldots, v_n$ be chosen successively among all possible edges in such a manner that at each stage, every edge which has not yet been chosen has the same probability of being chosen next, and let us continue this process until the graph becomes completely connected. If $\eta_n$ denotes the number of edges of the resulting connected random graph $\Gamma$, then:
\[ P\bigl(\eta_n = \lfloor \tfrac{1}{2} n \log n \rfloor + l\bigr) \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \tag{20} \]
for $|l| = O(n)$ and
\[ \lim_{n\to\infty} P\left( \frac{\eta_n - \frac{1}{2} n \log n}{n} < x \right) = e^{-e^{-2x}}. \tag{21} \]

Proof.
To prove equation (20), let us reduce the probability to others we already know. If $\eta_n = \lfloor \frac{1}{2} n \log n \rfloor + l = N + 1$, then just before choosing the last edge we had a disconnected graph $G_{n,N}$ which could be made completely connected by adding a single edge. By the key lemma, in the limit such a $G_{n,N}$ consists of the disjoint union of a connected graph on $n - 1$ vertices and a single isolated vertex. Given this setup, the last edge can be chosen in $n - 1$ ways among the remaining $\binom{n}{2} - N$ edges to obtain a connected graph $G_{n,N+1}$. Since
\[ \frac{n-1}{\binom{n}{2} - N} \sim \frac{n-1}{\binom{n}{2} - \frac{1}{2} n \log n} \sim \frac{n}{n^2/2} = \frac{2}{n} \]
we see that
\[ P\bigl(\eta_n = \lfloor \tfrac{1}{2} n \log n \rfloor + l\bigr) \sim \frac{2}{n}\, P_1(n, N) \sim \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} \]
where $P_1(n, N)$ is the probability of obtaining a graph as described previously. Since $N = \lfloor \frac{1}{2} n \log n \rfloor + l - 1$ and $l \in O(n)$, we may write $l - 1 = cn$, that is $c = \frac{l-1}{n} \approx \frac{l}{n}$, which justifies the right hand side of the claimed asymptotic.

To prove equation (21), notice that $P(\eta_n - \frac{1}{2} n \log n < nx)$ is asymptotically the sum of the probabilities given by equation (20) over all $l < nx$. In other words:
\[ P\bigl(\eta_n - \tfrac{1}{2} n \log n < nx\bigr) \sim \sum_{l < nx} \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}}. \]
Changing variables by $t := \frac{2l}{n}$, so that consecutive values of $t$ differ by $\Delta t = \frac{2}{n}$, we recognize the sum as a Riemann sum, so we have
\[ \sum_{l < nx} \frac{2}{n}\, e^{-\frac{2l}{n} - e^{-\frac{2l}{n}}} = \sum_{t < 2x} e^{-t - e^{-t}}\, \Delta t \sim \int_{-\infty}^{2x} e^{-t - e^{-t}}\, dt = \Bigl[ e^{-e^{-t}} \Bigr]_{-\infty}^{2x} = e^{-e^{-2x}}. \]

4. Epilogue

Above, we have explored the structure of a random graph having $N(n) = \frac{1}{2} n \log n + cn$ edges, as done in [ER59]. In the subsequent paper [ER60] a broader picture of random graphs $\Gamma_{n,N(n)}$ was uncovered, describing their evolution in terms of five major phases.

4.1. Phase 1. When $N(n) = o(n)$, that is $\frac{N(n)}{n} \xrightarrow{n\to\infty} 0$, $\Gamma_{n,N(n)}$ is almost surely made up exclusively of small components that are trees.

4.2. Phase 2. When $N(n) \sim \alpha n$ and $0 < \alpha < 1/2$, $\Gamma_{n,N(n)}$ starts containing cycles of any fixed order with probability tending to a positive limit. Each component of $\Gamma_{n,N(n)}$ is almost surely either a tree or contains exactly one cycle.
The GCC of $\Gamma_{n,N(n)}$ remains a tree containing roughly $\log n - \log\log n$ vertices.

4.3. Phase 3. When $N(n) \sim \alpha n$ and $\alpha = 1/2$, a dramatic change occurs. The GCC of $\Gamma_{n,N(n)}$ now consists almost surely of on the order of $n^{2/3}$ vertices, and as soon as $\alpha > 1/2$, the GCC of $\Gamma_{n,N(n)}$ consists almost surely of $G(\alpha) \cdot n$ vertices, where $G(1/2) = 0$ and $G(\alpha) \to 1$ as $\alpha \to \infty$.

Remark. During this phase, all the tree components of $\Gamma_{n,N(n)}$ progressively melt into the GCC, with smaller trees surviving the longest outside of it.

4.4. Phase 4. When $N(n) \sim \alpha n \log n$ and $\alpha \le 1/2$, the graph $\Gamma_{n,N(n)}$ eventually becomes almost surely connected. In particular, if $N(n) = \frac{1}{2} n \log n + cn + o(n)$, $\Gamma_{n,N(n)}$ is almost surely of type $A$ and the probability that it is completely connected tends to $e^{-e^{-2c}}$, which goes to 1 as $c \to \infty$ (this is the content of the Completely Connected theorem).

4.5. Phase 5. When $N(n) \sim w(n)(n \log n)$ with $w(n) \to \infty$, $\Gamma_{n,N(n)}$ is almost surely connected and the degrees of its vertices are almost surely asymptotically equal, so the graph is asymptotically regular.

5. Appendix

The following argument is due to Byron Schmuland. The first (false) bound claimed in the proof of the key lemma is equivalent to
\[ \Biggl( s!\, \frac{\binom{n}{s}\binom{\binom{n}{2} - s(n-s)}{N_c}}{\binom{\binom{n}{2}}{N_c}} \Biggr)^{1/s} \le e^{3-2c} \]
and can be written in product form as
\[ \prod_{i=0}^{s-1} (n - i)^{1/s} \prod_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \Bigl(1 - \frac{N_c}{j}\Bigr)^{1/s} \le e^{3-2c}. \tag{22} \]
We may assume that
\[ 0 \le N_c \le \frac{1}{3} \Biggl[ \binom{n}{2} - (n/2)^2 \Biggr]. \]
This is possible since $cn$ grows like a multiple of $n$ and $N_c$ grows like a multiple of $n \log(n)$, while $\binom{n}{2} - (n/2)^2$ grows like a multiple of $n^2$. We begin with the factor $\prod_{i=0}^{s-1} (n - i)^{1/s}$ and, for $s \le n/2$, bound its logarithm from below to obtain:
\[ \frac{1}{s} \sum_{i=0}^{s-1} \log(n - i) \ge \log(n/2) = \log(n) - \log(2). \tag{23} \]
The logarithm of the other factor
\[ \prod_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \Bigl(1 - \frac{N_c}{j}\Bigr)^{1/s} \]
is
\[ \frac{1}{s} \sum_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \log\Bigl(1 - \frac{N_c}{j}\Bigr). \]
From the inequality $\log(1 - x) \ge -4x/3$ for $0 \le x \le 1/3$ and our assumption on $N_c$ we see that
\[ \frac{-4N_c}{3s} \sum_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \frac{1}{j} \le \frac{1}{s} \sum_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \log\Bigl(1 - \frac{N_c}{j}\Bigr) \]
and using the integral test we obtain
\[ \frac{-4N_c}{3s} \log\Biggl( \frac{\binom{n}{2}}{\binom{n}{2} - s(n-s)} \Biggr) = \frac{-4N_c}{3s} \int_{\binom{n}{2} - s(n-s)}^{\binom{n}{2}} \frac{dt}{t} \le \frac{-4N_c}{3s} \sum_{j=\binom{n}{2} - s(n-s) + 1}^{\binom{n}{2}} \frac{1}{j}. \]
Substituting $-N_c \ge -(\frac{1}{2} n \log(n) + cn)$ and $s = n/2$ gives a lower bound of
\[ -\frac{4}{3} (\log(n) + 2c) \log\Bigl( \frac{2(n-1)}{n-2} \Bigr) \approx -\frac{4}{3} \log(n) \log(2) - \frac{8c}{3} \log(2). \tag{24} \]
Since $\frac{4}{3} \log(2) \approx 0.9242$, by adding equations (24) and (23) we find that the left hand side of equation (22) grows at least like $n^{0.0758}$ as $n$ goes to infinity, so the inequality would appear to be false.

References

[Bol01] B. Bollobás, Random graphs, vol. 73, Cambridge Univ. Pr., 2001.
[ER59] P. Erdős and A. Rényi, On random graphs I, Publ. Math. Debrecen 6 (1959), 290–297.
[ER60] P. Erdős and A. Rényi, On the evolution of random graphs, Akad. Kiadó, 1960.
[GS81] E. Godehardt and J. Steinbach, On a lemma of P. Erdős and A. Rényi about random graphs, Publ. Math. Debrecen 8 (1981), 271–273.
[KR97] M. Karonski and A. Rucinski, The origins of the theory of random graphs, Algorithms and Combinatorics 13 (1997), 311–336.