Hashing Out Random Graphs
Nick Jones, Sean Porter, Erik Weyers, Andy Schieber, Jon Kroening
Introduction
We will look at two applications of probability in computer science: hash functions and random graphs.
Hash Functions
We want to place a set of n records, denoted $r_1, r_2, \dots, r_n$, into m locations, where m > n and each location holds at most one record.
A hash function is a function that maps record values into the m locations.
We use a sequence of hash functions, denoted $h_1, h_2, h_3, \dots$, to map the records $r_i$ into the m locations.
The records are placed sequentially: to place record $r_k$, we first try location $h_1(r_k)$; if that location is occupied, we try $h_2(r_k)$, then $h_3(r_k)$, and so on, until an empty location is found.
Each unsuccessful attempt to place a record (because the chosen location is already occupied) is called a collision.
We let the random variable X denote the total number of collisions that occur in placing the n records.
We would like to find E[X] and Var(X).
These values are hard to compute exactly, but we can derive an approximate formula for each.
To do this we define some auxiliary random variables. Let $Y_k$ be the number of collisions in placing $r_k$, so that
$$X = \sum_{k=1}^{n} Y_k = Y_1 + Y_2 + \cdots + Y_n.$$
When $r_k$ is placed, $k-1$ locations are already occupied, so (treating each hash value as uniform over the m locations) each attempt succeeds with probability $p = (m-k+1)/m$. Hence the number of attempts
$$Z_k = Y_k + 1$$
is geometric with $p = (m-k+1)/m$, and therefore
$$X = Z_1 + Z_2 + \cdots + Z_n - n.$$
We can then find $E[Z_k]$:
$$E[Z_k] = \frac{1}{p} = \frac{m}{m-k+1}.$$
Then
$$E[X] = E[Z_1] + E[Z_2] + \cdots + E[Z_n] - n = -n + \sum_{k=1}^{n} \frac{m}{m-k+1}$$
$$= -n + m\left(\frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1}\right) \approx -n + m \int_{m-n+1}^{m} \frac{dx}{x} = -n + m \log\left(\frac{m}{m-n+1}\right).$$
So
$$E[X] \approx m \log\left(\frac{m}{m-n+1}\right) - n.$$
We would also like to find Var(X). Since $Z_k$ is geometric with $p = (m-k+1)/m$,
$$\operatorname{Var}(Z_k) = \frac{1-p}{p^2} = \frac{m(k-1)}{(m-k+1)^2}.$$
Then
$$\operatorname{Var}(X) = \sum_{k=1}^{n} \operatorname{Var}(Z_k) = m \sum_{k=1}^{n} \frac{k-1}{(m-k+1)^2} = m\left(\frac{1}{(m-1)^2} + \frac{2}{(m-2)^2} + \cdots + \frac{n-1}{(m-n+1)^2}\right)$$
$$\approx m \int_{m-n+1}^{m} \frac{m-x}{x^2}\, dx.$$
We now have approximate formulas for E[X] and Var(X):
$$E[X] \approx m \log\left(\frac{m}{m-n+1}\right) - n, \qquad \operatorname{Var}(X) \approx m \int_{m-n+1}^{m} \frac{m-x}{x^2}\, dx.$$
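As a sanity check, here is a short Python simulation of our own (not from the original slides) that models each hash function as an independent uniform choice of location and compares the empirical mean number of collisions with the formula above; the names and parameters are illustrative.

```python
import math
import random

def simulate_collisions(n, m, trials=10_000):
    """Empirical mean number of collisions when placing n records in m locations."""
    total = 0
    for _ in range(trials):
        occupied = set()
        for _ in range(n):  # place each record in turn
            # Each successive hash function is modeled as a fresh uniform location.
            while True:
                loc = random.randrange(m)
                if loc not in occupied:
                    occupied.add(loc)
                    break
                total += 1  # chosen location already full: a collision
    return total / trials

n, m = 50, 100
print(f"simulated E[X] ~ {simulate_collisions(n, m):.2f}")
print(f"formula   E[X] ~ {m * math.log(m / (m - n + 1)) - n:.2f}")
```

The two numbers agree only roughly for small m, since the formula replaces the harmonic sum with an integral.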
Alfréd Rényi
March 20, 1921 – February 1, 1970 (died at age 48).
The Hungarian mathematician was forced into a Fascist labor camp in 1944, escaped, and spent six months in hiding.
During that time he rescued his parents from a Budapest prison by dressing up in a soldier's uniform.
He earned his Ph.D. at the University of Szeged in Hungary.
Rényi worked with Erdős on random graphs, and they published joint work.
He worked on number theory and graph theory, which led him to results about measures of the dependency of random variables.
Paul Erdős
“A mathematician is a machine for turning coffee into theorems.”
Born March 26, 1913.
He may have been the most prolific mathematician of all time, having written or co-authored over 1,475 papers.
Erdős was born to two high school math teachers.
His mother kept him out of school until his teen years because she feared its influence.
At home he did mental arithmetic, and at three he could multiply numbers in his head.
Fortified by espresso, Erdős did math for 19 hours a day, 7 days a week.
He devoted his life to a single narrow mission: uncovering mathematical truth.
He traveled around for six decades with a suitcase, looking for mathematicians to pick his brain.
His mottos were “Another roof, another proof” and “Property is a nuisance.”
“Erdős posed and solved thorny problems in number theory and other areas and founded the field of discrete mathematics, which is a foundation of computer science.”
He was awarded his doctorate in 1934 at Pázmány Péter University in Budapest.
Graphs
A graph consists of a set V of elements called vertices and a set E of pairs of vertices called edges.
A sequence of vertices $i, i_1, i_2, \dots, i_k, j$ for which $(i, i_1), (i_1, i_2), \dots, (i_k, j) \in E$ is called a path from i to j.
Connected Graphs
A graph is said to be connected if there is a path between each pair of vertices.
If a graph is not connected, it is called disconnected.
Random Graphs
In a random graph, we start with a set of vertices and insert edges at random, creating paths.
An interesting question: what is P{graph is connected}, the probability that there is a path between every pair of vertices?
James Stirling
Who was James Stirling?
He lived 1692–1770.
His family was Roman Catholic in Protestant England and supported the Jacobite cause.
He matriculated at Balliol College, Oxford, and is believed to have studied and matriculated at two other universities, though this is not certain.
He did not graduate because he refused to take an oath, owing to his Jacobite beliefs.
He spent years studying, traveling, and making friends with people such as Sir Isaac Newton and Nicolaus (I) Bernoulli.
Methodus Differentialis
Stirling became a teacher in London. There he wrote the book Methodus Differentialis in 1730.
The book's purpose was to speed up the convergence of series.
Stirling's formula is recorded in this book, in Example 2 of Proposition 28:
$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}.$$
Stirling’s Formula
$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$$
It is used to approximate n!.
It is an asymptotic expansion; the full series does not converge.
It can be used to approximate a lower bound in a series.
The percentage error is extremely low, and the larger the number inserted, the lower the percentage error.
Stirling’s Formula Error
About 8.00% off for 1!
About 0.80% off for 10!
About 0.08% off for 100!
Etc.
The relative error is close to $1/(12n)$, so if the formula is multiplied by $1 + 1/(12n)$, it only gets better, with errors of order $1/n^2$.
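These error figures are easy to verify numerically; the following Python check is our own sketch, not part of the original slides:

```python
import math

def stirling(n):
    """Stirling's approximation to n!."""
    return math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)

for n in (1, 10, 100):
    exact = math.factorial(n)
    approx = stirling(n)
    corrected = approx * (1 + 1 / (12 * n))  # first correction term
    print(f"n={n:3d}  error={abs(exact - approx) / exact:.4%}"
          f"  corrected error={abs(exact - corrected) / exact:.6%}")
```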
Probability Background
Normal Distribution and Central Limit Theorem
Poisson Distribution
Multinomial Distribution
The Normal Distribution
A continuous random variable X with pdf
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
is called normal.
Normal Distribution
It can be shown that $E[X] = \mu$ and $\operatorname{Var}(X) = \sigma^2$, so we write $X \sim N(\mu, \sigma^2)$.
Note: when the mean is 0 and the standard deviation is 1, we get the standard normal random variable $Z \sim N(0,1)$.
Central Limit Theorem
If $X_1, X_2, \dots$ are independent and identically distributed with common mean $\mu$ and standard deviation $\sigma$, then
$$\lim_{n \to \infty} P\left(\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}} \le x\right) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy.$$
Central Limit Theorem
n
If Sn   x i , n is large then,
i 1
Sn is approximat ely normal
If X ~ N( ,  ) then,
x μ
Z
~ N(0,1)
σ
Poisson Distribution
nx
n x
p(x)  lim   p (1 p)
n   x 
p(x) 
λ
λe
x
, x  0,1,2...
x!
Mean and variance both equal to λ
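As a quick numerical illustration (our own, not from the slides) of this binomial-to-Poisson limit with $\lambda = np$ held fixed:

```python
import math

lam, x = 3.0, 2  # fixed lambda = n*p; evaluate the pmf at x

def binom_pmf(n, p, x):
    """Binomial probability mass at x."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

poisson_pmf = lam**x * math.exp(-lam) / math.factorial(x)
for n in (10, 100, 1000):
    print(f"n={n:5d}  binomial={binom_pmf(n, lam / n, x):.6f}  poisson={poisson_pmf:.6f}")
```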
Multinomial Distribution
Consider n independent, identical trials, where each trial results in one of the events $A_1, A_2, \dots, A_k$ with probabilities $P_1, P_2, \dots, P_k$.
Define $X_i$ = the number of times $A_i$ occurs, $i = 1, \dots, k$ (so $X_1 + X_2 + \cdots + X_k = n$). Then
$$P\{X_1 = n_1, X_2 = n_2, \dots, X_k = n_k\} = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, P_1^{n_1} P_2^{n_2} \cdots P_k^{n_k},$$
where n is the sum of the $n_i$.
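A one-line numerical example (ours, not from the slides) of the multinomial formula:

```python
import math

def multinomial_pmf(counts, probs):
    """n!/(n1!...nk!) * P1^n1 ... Pk^nk."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p**c for p, c in zip(probs, counts))

# e.g. 10 rolls of a fair die: P{faces 1-4 each appear twice, faces 5-6 once}
print(multinomial_pmf([2, 2, 2, 2, 1, 1], [1 / 6] * 6))
```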
Connected Graphs
Recall: a random graph G consists of vertices $V = \{1, 2, \dots, n\}$ and random variables $X(i)$, $i = 1, \dots, n$, with probabilities
$$P\{X(i) = j\} = P_j, \qquad \sum_j P_j = 1.$$
Connected Graphs
The set of random edges is then
$$E = \{(i, X(i)) : i = 1, \dots, n\},$$
where $(i, X(i))$ is the edge emanating from vertex i.
Connected Graphs
What is P{graph is connected}?
A special case: suppose n = 2 and vertex 1 is 'dead' (it does not spawn an edge), with $P_1 + P_2 = 1$. The graph is connected exactly when vertex 2's edge points to vertex 1, so
$$P\{\text{graph connected}\} = P_1.$$
[Figure: two vertices with edge probabilities $P_1$ and $P_2$.]
Dead Vertex Lemma
Consider a random graph consisting of vertices $0, 1, 2, \dots, r$ and edges $(i, Y_i)$, $i = 1, 2, \dots, r$, where the $Y_i$ are independent and $P\{Y_i = j\} = Q_j$, $j = 0, 1, \dots, r$.
If $\sum_{j=0}^{r} Q_j = 1$, then
$$P\{\text{graph connected}\} = Q_0.$$
Dead Vertex Lemma
[Figure: an example random graph on six vertices illustrating the lemma.]
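The lemma is easy to check numerically. Here is a small Python sketch of ours (not from the slides) that builds the random graph with dead vertex 0 and compares the empirical connection probability with $Q_0$; the distribution Q below is an arbitrary illustrative choice.

```python
import random

def is_connected(edges, num_vertices):
    """Depth-first search over the undirected version of the edge set."""
    adj = {v: [] for v in range(num_vertices)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_vertices

def dead_vertex_trial(Q):
    """Vertices 0..r; vertex 0 spawns no edge; vertex i >= 1 spawns (i, Y_i), P{Y_i = j} = Q[j]."""
    r = len(Q) - 1
    edges = [(i, random.choices(range(r + 1), weights=Q)[0]) for i in range(1, r + 1)]
    return is_connected(edges, r + 1)

Q = [0.3, 0.2, 0.25, 0.25]  # Q[0] = 0.3; vertices 0..3
trials = 100_000
hits = sum(dead_vertex_trial(Q) for _ in range(trials))
print(f"simulated P(connected) = {hits / trials:.4f}  vs  Q0 = {Q[0]}")
```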
Maximal Non-Self-Intersecting (MNSI) Paths
Consider the maximal non-self-intersecting path emanating from vertex 1:
$$1, X(1), X^2(1), \dots, X^{k-1}(1), \qquad \text{where } X^k(1) = X(X^{k-1}(1)).$$
[Figure: an example path with k = 3.]
Maximal Non-Self-Intersecting (MNSI) Paths
Define
$$N = \min\{k : X^k(1) \in \{1, X(1), \dots, X^{k-1}(1)\}\}$$
and set
$$W = P_1 + \sum_{i=1}^{N-1} P_{X^i(1)}.$$
Maximal Non-Self-Intersecting (MNSI) Paths
[Figure: an example path with k = 4.]
Collapsing the MNSI path into a single 'dead' vertex and applying the Dead Vertex Lemma gives
$$P\{\text{graph connected} \mid N, 1, X(1), \dots, X^{N-1}(1)\} = W.$$
Conditional Probability
The idea of conditional probability:
$$P\{\text{event}\} = \sum_{\text{scenarios}} P\{\text{event} \mid \text{scenario}\} \cdot P\{\text{scenario}\}.$$
Expectations of discrete random variables are conditional probability averages:
$$E(X) = \sum_{x} x \cdot P\{X = x\}.$$
Conditional Probability
Taking expectations, where the scenarios are the values of $(N, 1, X(1), \dots, X^{N-1}(1))$:
$$E(W) = \sum_{\text{scenarios}} P\{\text{graph connected} \mid N, 1, X(1), \dots, X^{N-1}(1)\}\, P\{N, 1, X(1), \dots, X^{N-1}(1)\} = P\{\text{graph connected}\}.$$
Conditional Probability
Special case of interest: $P_j = \frac{1}{n}$ (equiprobable vertices). Then
$$W = \frac{N}{n}, \qquad E[W] = \frac{E[N]}{n},$$
and
$$E[N] = \sum_{i=0}^{n-1} P\{N > i\}.$$
Conditional Probability
$$E[W] = \frac{1}{n} \sum_{i=0}^{n-1} P\{N > i\} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)(n-2)\cdots(n-i)}{n^i} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{n^i\, (n-i-1)!}.$$
Conditional Probability
$$\frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{n^i\, (n-i-1)!} = \frac{(n-1)!}{n^n} \sum_{i=0}^{n-1} \frac{n^{n-i-1}}{(n-i-1)!}.$$
Let $j = n - i - 1$:
$$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}.$$
Poisson Distribution
Suppose X is Poisson with mean $\lambda = n$:
$$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!} = e^{-n} \frac{n^k}{k!}.$$
Poisson Distribution
Then
$$P\{X < n\} = \sum_{k=0}^{n-1} P\{X = k\} = \sum_{k=0}^{n-1} e^{-n} \frac{n^k}{k!} = e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!},$$
which is exactly the sum appearing in $E[W]$.
Central Limit Theorem
Recall: we can write $X = X_1 + X_2 + \cdots + X_n$, where each $X_i$ is Poisson with mean 1, so that X is Poisson with mean n.
By the Central Limit Theorem, for large n, X is approximately $N(n, n)$, so
$$P(X < n) \to \frac{1}{2} \quad \text{(asymptotically)}.$$
Therefore
$$e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \to \frac{1}{2}, \qquad \text{i.e.} \qquad \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{e^n}{2}.$$
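This limit is easy to confirm numerically; the following Python check is our own sketch, with each term computed in log space to avoid underflow:

```python
import math

def poisson_cdf_below(n):
    """P(X < n) for X ~ Poisson(n), i.e. e^{-n} * sum_{j=0}^{n-1} n^j / j!."""
    return sum(math.exp(j * math.log(n) - n - math.lgamma(j + 1)) for j in range(n))

for n in (10, 100, 1000):
    print(f"n={n:5d}  P(X < n) = {poisson_cdf_below(n):.5f}")  # approaches 1/2
```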
Conditional Probability
Recall Stirling's formula:
$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}, \qquad (n-1)! \approx \sqrt{2\pi(n-1)}\, (n-1)^{n-1} e^{-(n-1)}.$$
Recall
$$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}.$$
So by substitution,
$$E[W] \approx \frac{\sqrt{2\pi(n-1)}\, (n-1)^{n-1}\, e^{-(n-1)}\, e^n}{2\, n^n}.$$
Conditional Probability
n 1




2 ( n  1) 2 e
2n 2
2 (n  1) n
n 1
2


(
n

1
)
e
n
2
n
2
e
n
 (( n  1) n) 
2
n 1
2
 (1  (1 n)) n  e
2
n 1
x n
lim (1  )  e x
n 
n
Conditional Probability
2 1
 e
 2 e
n 1
2

2 n 1
2
=
2 n

P{graph is connected}  E[W ] 
2n
1 2
= 
2
n
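As a final check, here is a short Python simulation of our own (not from the original slides) that estimates P{graph connected} in the equiprobable model, where each vertex i spawns one edge $(i, X(i))$ with $X(i)$ uniform on $\{1, \dots, n\}$, and compares it with $\sqrt{\pi/(2n)}$:

```python
import math
import random

def is_connected(edges, n):
    """Depth-first search over the undirected version of the edge set on vertices 1..n."""
    adj = {v: [] for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {1}, [1]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def trial(n):
    """Each vertex i spawns one edge (i, X(i)) with X(i) uniform on 1..n."""
    edges = [(i, random.randint(1, n)) for i in range(1, n + 1)]
    return is_connected(edges, n)

n, trials = 50, 20_000
hits = sum(trial(n) for _ in range(trials))
print(f"simulated P(connected) = {hits / trials:.4f}")
print(f"sqrt(pi / (2n))        = {math.sqrt(math.pi / (2 * n)):.4f}")
```

For n = 50 the asymptotic value is about 0.177; the simulated probability should land nearby.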
Thank You
“The first sign of senility is when a man forgets his theorems. The second is when he forgets to zip up. The third is when he forgets to zip down.”
-- Paul Erdős