Hashing Out Random Graphs

Nick Jones • Sean Porter • Erik Weyers • Andy Schieber • Jon Kroening

Introduction
• We will look at some applications of probability in computer science: hash functions and random graphs.

Hash Functions
• We want to place a set of n records, denoted r1, r2, ..., rn, into m locations (m > n), with at most one record in each location.
• A hashing function is a function that maps record values into the m locations.
• We use a sequence of hash functions, denoted h1, h2, h3, ..., to place the records ri in the m locations.
• The records are placed sequentially: r1 goes into location h1(r1); in general, rk is tried at h1(rk), then h2(rk), h3(rk), ..., until an empty location is found.
• Every time we are unsuccessful in placing a record (because the chosen location is already full), a collision occurs.
• Let the random variable X denote the total number of collisions that occur when placing the n records.
• We would like to find E[X] and Var(X).
• These values are hard to compute exactly, but we can derive an approximate formula for each.
• To do this we define some auxiliary random variables:

    Yk = number of collisions in placing rk,   so   X = Σ_{k=1}^{n} Yk = Y1 + Y2 + ... + Yn

    Zk = Yk + 1, which is geometric with p = (m − k + 1)/m
    (when rk is placed, k − 1 locations are already occupied, so each attempt succeeds with probability (m − k + 1)/m)

• Therefore X = Z1 + Z2 + ... + Zn − n.
• We can then find E[Zk]:

    E[Zk] = 1/p = m/(m − k + 1)

    E[X] = E[Z1] + E[Z2] + ... + E[Zn] − n
         = −n + Σ_{k=1}^{n} E[Zk]
         = −n + Σ_{k=1}^{n} m/(m − k + 1)
         = −n + m ( 1/m + 1/(m − 1) + ... + 1/(m − n + 1) )
         ≈ −n + m ∫_{m−n+1}^{m} dx/x
         ≈ −n + m log( m/(m − n + 1) )

• We would also like to find Var(X). Since Zk is geometric,

    Var(Zk) = (1 − p)/p² = m(k − 1)/(m − k + 1)²

    Var(X) = Σ_{k=1}^{n} Var(Zk)
           = m Σ_{k=1}^{n} (k − 1)/(m − k + 1)²
           = m [ 1/(m − 1)² + 2/(m − 2)² + ... + (n − 1)/(m − n + 1)² ]
           ≈ m ∫_{1}^{n−1} x/(m − x)² dx

• We now have formulas for E[X] and Var(X):

    E[X] ≈ m log( m/(m − n + 1) ) − n
    Var(X) ≈ m ∫_{1}^{n−1} x/(m − x)² dx
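The expectation formula is easy to sanity-check by simulation. The sketch below is not from the original slides; it assumes the successive hash values h1(rk), h2(rk), ... behave like independent uniform draws over the m locations, which is exactly the assumption behind the geometric Zk. The function name and the n = 50, m = 100 example are illustrative choices.

```python
import math
import random

def simulate_collisions(n, m, trials=10_000, seed=0):
    """Estimate E[X]: place n records sequentially into m locations,
    re-hashing (uniformly at random) whenever the chosen location is full,
    and count the resulting collisions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupied = set()
        collisions = 0
        for _ in range(n):            # place r1, ..., rn in order
            while True:               # try h1, h2, h3, ... until an empty slot
                loc = rng.randrange(m)
                if loc in occupied:
                    collisions += 1   # the chosen location was already full
                else:
                    occupied.add(loc)
                    break
        total += collisions
    return total / trials

if __name__ == "__main__":
    n, m = 50, 100
    print("simulated E[X]      :", simulate_collisions(n, m))
    print("m*log(m/(m-n+1)) - n:", m * math.log(m / (m - n + 1)) - n)
```

For n = 50 and m = 100 the approximation gives about 17.3 collisions, and the simulated average should land close to that.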
Alfréd Rényi (March 20, 1921 – February 1, 1970)
• The Hungarian mathematician spent six months in hiding after being forced into a Fascist labor camp in 1944.
• During that time he rescued his parents from a Budapest prison by dressing up in a soldier's uniform.
• He received his Ph.D. at the University of Szeged in Hungary.
• Rényi worked with Erdős on random graphs, and they published joint work.
• He worked on number theory and graph theory, which led him to results about measures of the dependence of random variables.

Paul Erdős (born March 26, 1913)
• May have been the most prolific mathematician of all time: he wrote or co-authored over 1,475 papers.
• Erdős was born to two high-school math teachers.
• His mother kept him out of school until his teen years because she feared its influence.
• At home he did mental arithmetic, and at three he could multiply numbers in his head.
• Fortified by espresso, Erdős did math for 19 hours a day, 7 days a week.
• He devoted his life to a single narrow mission: uncovering mathematical truth.
• He traveled around for six decades with a suitcase, looking for mathematicians to pick his brain.
• His motto was "Another roof, another proof."
• "Property is a nuisance."
• "Erdős posed and solved thorny problems in number theory and other areas and founded the field of discrete mathematics, which is a foundation of computer science."
• Awarded his doctorate in 1934 at Pázmány Péter University in Budapest.

Graphs
• A graph consists of a set V of elements called vertices and a set E of pairs of vertices called edges.
• A sequence of vertices i, i1, i2, ..., ik, j for which (i, i1), (i1, i2), ..., (ik, j) ∈ E is called a path from i to j.

Connected Graphs
• A graph is said to be connected if there is a path between each pair of vertices.
• If a graph is not connected, it is called disconnected.

Random Graphs
• In a random graph, we start with a set of vertices and put in edges at random, thus creating paths.
• An interesting question is to find P(graph is connected), the probability that every vertex can be reached along some path.

James Stirling
Who was James Stirling?
• Lived 1692–1770.
• His family was Roman Catholic in Protestant Britain and supported the Jacobite cause.
• Matriculated at Balliol College, Oxford.
• Believed to have studied and matriculated at two other universities, but this is not certain.
• Did not graduate because, owing to his Jacobite beliefs, he refused to take the required oath.
• Spent years studying, traveling, and making friends with people such as Sir Isaac Newton and Nicolaus (I) Bernoulli.

Methodus Differentialis
• Stirling became a teacher in London.
• There he wrote the book Methodus Differentialis in 1730.
• The book's purpose is to speed up the convergence of series.
• Stirling's Formula is recorded in this book, in Example 2 of Proposition 28:

    n! ≈ √(2πn) (n/e)^n = √(2πn) n^n e^(−n)

Stirling's Formula
• Used to approximate n!.
• It is an asymptotic expansion; the full series does not converge.
• Can be used as a lower bound for n!.
• The relative error is extremely small, and the bigger the number inserted, the smaller the relative error.

Stirling's Formula Error Probability
• About 8.00% off for 1!
• About 0.80% off for 10!
• About 0.08% off for 100!
• Etc.
• The relative error is close to 1/(12n), so if the formula is multiplied by 1 + 1/(12n) it only gets better, with errors only at the 1/n² level.
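The error percentages quoted above are easy to reproduce. The short check below is not part of the original slides; it uses Python's math.factorial as the exact reference and also shows the effect of the 1 + 1/(12n) correction factor.

```python
import math

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 10, 100):
    exact = math.factorial(n)
    basic_err = (exact - stirling(n)) / exact                       # roughly 1/(12n)
    corrected_err = (exact - stirling(n) * (1 + 1 / (12 * n))) / exact
    print(f"n={n:3d}  basic error {basic_err:8.4%}   with (1 + 1/(12n)) factor {corrected_err:10.6%}")
```

The basic errors come out near 8%, 0.8%, and 0.08% as stated, and the corrected values are smaller by roughly another factor of n.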
Probability Background
• Normal distribution and the Central Limit Theorem
• Poisson distribution
• Multinomial distribution

The Normal Distribution
• A continuous random variable X with pdf

    f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),   −∞ < x < ∞

  is called normal; we write X ~ N(μ, σ²).
• Note: when the mean μ = 0 and the standard deviation σ = 1, we get the standard normal random variable, Z ~ N(0, 1).

Central Limit Theorem
• If X1, X2, ... are independent and identically distributed with common mean μ and standard deviation σ, then

    lim_{n→∞} P( (Σ_{i=1}^{n} Xi − nμ)/(σ√n) < x ) = (1/√(2π)) ∫_{−∞}^{x} e^(−y²/2) dy

• If Sn = Σ_{i=1}^{n} Xi and n is large, then Sn is approximately normal.
• If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1).

Poisson Distribution
• The Poisson pmf arises as a limit of the binomial (with np held fixed at λ):

    p(x) = lim_{n→∞} C(n, x) p^x (1 − p)^(n−x) = λ^x e^(−λ)/x!,   x = 0, 1, 2, ...

• The mean and the variance are both equal to λ.

Multinomial Distribution
• n independent identical trials with outcomes A1, A2, ..., Ak having probabilities P1, P2, ..., Pk.
• Define Xi = number of times Ai occurs, i = 1, ..., k, so X1 + X2 + ... + Xk = n. Then

    P{X1 = n1, X2 = n2, ..., Xk = nk} = ( n!/(n1! n2! ... nk!) ) P1^(n1) P2^(n2) ... Pk^(nk)

  where n is the sum of the ni.

Connected Graphs
• Recall: a random graph G consists of vertices V = {1, 2, ..., n} and random variables X(i), i = 1, ..., n, with probabilities Pj (Σ_j Pj = 1) such that P{X(i) = j} = Pj.
• The set of random edges is then E = {(i, X(i)) : i = 1, ..., n}; (i, X(i)) is the edge emanating from vertex i.
• The question is: P{graph is connected} = ?
• A special case: suppose vertex 1 is "dead" (it doesn't spawn an edge). With n = 2 and P1 + P2 = 1, the graph is connected exactly when vertex 2's edge goes to vertex 1, so P{graph connected} = P1.

Dead Vertex Lemma
• Consider a random graph consisting of vertices 0, 1, 2, ..., r and edges (i, Yi), i = 1, 2, ..., r, where the Yi are independent and P{Yi = j} = Qj, j = 0, 1, ..., r, with Σ_{j=0}^{r} Qj = 1. Then

    P{graph connected} = Q0

  (vertex 0 is the "dead" vertex; it spawns no edge).
(diagram: an example graph illustrating the Dead Vertex Lemma)

Maximal Non-Self-Intersecting (MNSI) Paths
• Consider the maximal non-self-intersecting path emanating from vertex 1:

    1, X(1), X²(1), ..., X^(k−1)(1),   where X^k(1) = X(X^(k−1)(1))

(diagram: an example MNSI path with k = 3)
• Define

    N = min{ k : X^k(1) ∈ {1, X(1), ..., X^(k−1)(1)} }

  and set

    W = P1 + Σ_{i=1}^{N−1} P_{X^i(1)}

(diagram: an example MNSI path with k = 4)
• Collapsing the vertices on the MNSI path into a single "dead" vertex (of total probability W) and applying the Dead Vertex Lemma to the remaining vertices gives

    P{graph connected | N, 1, X(1), ..., X^(N−1)(1)} = W

Conditional Probability
• The idea of conditional probability:

    P{event} = Σ_scenarios P{event | scenario} · P{scenario}

• Expectations of discrete random variables are probability-weighted averages:

    E(X) = Σ_x x · P{X = x}

• Taking expectations over the scenarios (N, 1, X(1), ..., X^(N−1)(1)):

    E(W) = Σ_scenarios P{graph connected | N, 1, X(1), ..., X^(N−1)(1)} · P{N, 1, X(1), ..., X^(N−1)(1)}
         = P{graph connected}

• Special case of interest: equiprobable vertices, Pj = 1/n. Then

    W = N/n,   so   E[W] = E[N]/n,   with   E[N] = Σ_{i=0}^{n−1} P{N > i}

• Computing E[W]:

    E[W] = (1/n) Σ_{i=0}^{n−1} P{N > i}
         = (1/n) Σ_{i=0}^{n−1} (n − 1)(n − 2)···(n − i)/n^i
         = (1/n) Σ_{i=0}^{n−1} (n − 1)! / ( n^i (n − i − 1)! )
         = ( (n − 1)!/n^n ) Σ_{i=0}^{n−1} n^(n−i−1)/(n − i − 1)!

  Letting j = n − i − 1,

    E[W] = ( (n − 1)!/n^n ) Σ_{j=0}^{n−1} n^j/j!

Poisson Distribution
• Suppose X is Poisson with mean λ = n:

    P{X = k} = λ^k e^(−λ)/k! = n^k e^(−n)/k!

• So

    P{X < n} = Σ_{k=0}^{n−1} P{X = k} = Σ_{k=0}^{n−1} n^k e^(−n)/k! = e^(−n) Σ_{j=0}^{n−1} n^j/j!

Central Limit Theorem
• Recall X = X1 + X2 + ... + Xn with each Xi Poisson of mean 1, so X is Poisson with mean n.
• By the Central Limit Theorem, for large n, X is approximately N(n, n) (mean n, variance n), so

    P(X < n) ≈ 1/2   (asymptotically)

• Therefore

    e^(−n) Σ_{j=0}^{n−1} n^j/j! ≈ 1/2,   i.e.,   Σ_{j=0}^{n−1} n^j/j! ≈ e^n/2

Conditional Probability
• Recall Stirling's Formula:

    n! ≈ √(2πn) n^n e^(−n),   so   (n − 1)! ≈ √(2π(n − 1)) (n − 1)^(n−1) e^(−(n−1))

• Recall E[W] = ( (n − 1)!/n^n ) Σ_{j=0}^{n−1} n^j/j!. Substituting both approximations,

    E[W] ≈ ( √(2π(n − 1)) (n − 1)^(n−1) e^(−(n−1)) / n^n ) · ( e^n/2 )
         = ( √(2π(n − 1)) / 2 ) · (e/n) · ( (n − 1)/n )^(n−1)
         = ( √(2π) √(n − 1) e / (2n) ) · ( 1 − 1/n )^(n−1)

• Using lim_{n→∞} (1 + x/n)^n = e^x with x = −1, we have (1 − 1/n)^(n−1) ≈ e^(−1) for large n, so

    E[W] ≈ √(2π) √(n − 1) · e · e^(−1) / (2n)
         = √(2π(n − 1)) / (2n)
         ≈ √(2π) / (2√n)
         = √( π/(2n) )

• Conclusion:

    P{graph is connected} = E[W] ≈ √( π/(2n) )
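A Monte Carlo check of this conclusion (a sketch, not part of the original slides): in the equiprobable model each vertex i spawns one edge to a uniformly chosen X(i), and the resulting undirected graph is tested for connectivity. The exact value is also computed from E[W] = (1/n) Σ_{i=0}^{n−1} P{N > i}, which equals the ((n−1)!/n^n) Σ n^j/j! formula above but is numerically stabler; the function names and sample sizes are arbitrary choices.

```python
import math
import random

def is_connected(x):
    """Is the undirected graph with edge set {(i, x[i]) : i = 0..n-1} connected?"""
    n = len(x)
    adj = [[] for _ in range(n)]
    for i, j in enumerate(x):
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:                      # depth-first search from vertex 0
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def simulate(n, trials=20_000, seed=1):
    """Empirical P{graph connected} when each X(i) is uniform on the n vertices."""
    rng = random.Random(seed)
    hits = sum(is_connected([rng.randrange(n) for _ in range(n)]) for _ in range(trials))
    return hits / trials

def exact(n):
    """E[W] = (1/n) * sum_{i=0}^{n-1} prod_{t=1}^{i} (n-t)/n,
    equivalent to ((n-1)!/n^n) * sum_{j=0}^{n-1} n^j/j!."""
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= (n - i) / n
        total += term
    return total / n

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(f"n={n:4d}  simulated {simulate(n):.4f}  exact {exact(n):.4f}"
              f"  sqrt(pi/(2n)) {math.sqrt(math.pi / (2 * n)):.4f}")
```

For small n the exact value sits a little below √(π/(2n)) (for example, n = 2 gives exactly 3/4), and the gap shrinks as n grows, as the asymptotic derivation predicts.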
Thank You
"The first sign of senility is when a man forgets his theorems. The second is when he forgets to zip up. The third is when he forgets to zip down." -- Paul Erdős

References
• http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Erdos.html
• http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Renyi.html
• http://www.lassp.cornell.edu/sethna/Cracks/Stirling.html
• http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Stirling.html