Hashing Out Random Graphs
Nick Jones • Sean Porter • Erik Weyers • Andy Schieber • Jon Kroening
Introduction
• We will look at two applications of probability in computer science: hash functions and random graphs.
Hash Functions
• We want to map a set of n records, denoted $r_1, r_2, \ldots, r_n$, into m locations, m > n, with at most one record in each location.
• A hashing function is a function that
maps the record values into the m
locations.
• We use a sequence of hash functions, denoted $h_1, h_2, h_3, \ldots$, to map the records $r_i$ into the m locations.
• The records are placed sequentially: to place $r_k$, we try location $h_1(r_k)$, then $h_2(r_k)$, then $h_3(r_k)$, and so on, until an empty location is found.
• Every time we are unsuccessful in placing a record (because the location tried is already full), a collision occurs.
• We will let the random variable X
denote the number of collisions that
occur when placing n records.
• We would like to find E[X] and
Var(X).
• These values are hard to compute directly, but we can derive a formula for each of them.
• In order to do this we need to define some other random variables.
• Let $Y_k$ = the number of collisions in placing $r_k$, so that
  $X = \sum_{k=1}^{n} Y_k = Y_1 + Y_2 + \cdots + Y_n$
• Let $Z_k = Y_k + 1$, the number of attempts needed to place $r_k$. Since $k-1$ locations are already occupied when $r_k$ is placed, $Z_k$ is geometric with $p = (m-k+1)/m$.
• Therefore,
  $X = Z_1 + Z_2 + \cdots + Z_n - n$
• We can then find $E[Z_k]$:
  $E[Z_k] = \frac{1}{p} = \frac{m}{m-k+1}$
• Summing,
  $E[X] = E[Z_1] + E[Z_2] + \cdots + E[Z_n] - n = -n + \sum_{k=1}^{n} \frac{m}{m-k+1}$
  $= -n + m\left(\frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1}\right)$
  $\approx -n + m \int_{m-n+1}^{m} \frac{dx}{x} \approx -n + m \log \frac{m}{m-n+1}$
• So
  $E[X] \approx m \log \frac{m}{m-n+1} - n$
• We would also like to find Var(X).
$\mathrm{Var}(Z_k) = \frac{1-p}{p^2} = \frac{m(k-1)}{(m-k+1)^2}$
$\mathrm{Var}(X) = \sum_{k=1}^{n} \mathrm{Var}(Z_k) = m \sum_{k=1}^{n} \frac{k-1}{(m-k+1)^2}$
$= m \left( \frac{1}{(m-1)^2} + \frac{2}{(m-2)^2} + \cdots + \frac{n-1}{[m-(n-1)]^2} \right)$
$\approx m \int_{1}^{n-1} \frac{x}{(m-x)^2}\, dx$
We now know the formulas for $E[X]$ and $\mathrm{Var}(X)$:
$E[X] \approx m \log \frac{m}{m-n+1} - n$
$\mathrm{Var}(X) \approx m \int_{1}^{n-1} \frac{x}{(m-x)^2}\, dx$
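As a sanity check, here is a minimal Python sketch (ours, not part of the original slides) that simulates the sequential hashing process, assuming each probe lands on an independently uniform random location, and compares the sample mean and variance of the collision count with the two approximations above. All function names here are our own.

```python
import math
import random

def collisions(n, m, rng):
    """Place n records into m slots; each probe hits a uniform random slot.
    Returns the total number of collisions (probes that hit a full slot)."""
    occupied = [False] * m
    total = 0
    for _ in range(n):
        while True:
            slot = rng.randrange(m)      # h_1(r), h_2(r), ... modeled as uniform
            if occupied[slot]:
                total += 1               # slot already full: a collision
            else:
                occupied[slot] = True
                break
    return total

def approx_mean(n, m):
    """E[X] ~ m log(m/(m-n+1)) - n."""
    return m * math.log(m / (m - n + 1)) - n

def approx_var(n, m):
    """Var(X) ~ m * integral_1^{n-1} x/(m-x)^2 dx,
    using the antiderivative F(x) = m/(m-x) + log(m-x)."""
    F = lambda x: m / (m - x) + math.log(m - x)
    return m * (F(n - 1) - F(1))

rng = random.Random(0)
n, m, trials = 500, 1000, 2000
xs = [collisions(n, m, rng) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / (trials - 1)
print(f"E[X]:   sample {mean:.1f} vs approx {approx_mean(n, m):.1f}")
print(f"Var(X): sample {var:.1f}  vs approx {approx_var(n, m):.1f}")
```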
Alfred Renyi
March 30, 1921 – Feb. 1, 1970
49 years old
• The Hungarian mathematician spent six months in hiding after being forced into a Fascist labor camp in 1944
• During that time he rescued his parents from a Budapest prison by dressing up in a soldier's uniform
• He got his Ph.D. at the University of
Szeged in Hungary
• Renyi worked with Erdös on random graphs, and they published joint work
• He worked on number theory and graph theory, which led him to results about measures of the dependency of random variables
Paul Erdös
Born: March 26, 1913
May have been the most prolific
mathematician of all time
Wrote and co-authored over
1475 papers
Erdös was born to two high school
math teachers
His mother kept him out of school
until his teen years because she
feared its influence
At home he did mental arithmetic and
at three he could multiply numbers
in his head
Fortified by espresso, Erdös did math
for 19 hours a day, 7 days a week
He devoted his life to a single narrow
mission: uncovering mathematical
truth
He traveled around for six decades
with a suitcase, looking for
mathematicians to pick his brain
His motto was:
“Another roof, another proof”
• “Property is a nuisance”
• “Erdös posed and solved thorny
problems in number theory and
other areas and founded the field of
discrete mathematics which is a
foundation of computer science”
• Awarded his doctorate in 1934 at
the University of Pazmany Peter in
Budapest
Graphs
• A graph consists of a set of elements
V called vertices and a set E of pairs
of vertices called edges
• A path from i to j is a sequence of vertices $i, i_1, i_2, \ldots, i_k, j$ for which $(i, i_1), (i_1, i_2), \ldots, (i_k, j) \in E$
Connected Graphs
• A graph is said to be connected if
there is a path between each pair of
vertices
• If a graph is not connected it is called
disconnected
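To make the definition concrete, here is a small Python sketch (ours, not from the slides) that decides connectivity for a graph given as a vertex set and an edge list, using breadth-first search.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if there is a path between every pair of vertices."""
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for u, v in edges:                 # edges are unordered pairs of vertices
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:                       # standard breadth-first search
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)  # connected iff every vertex was reached

print(is_connected({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))  # True
print(is_connected({1, 2, 3, 4}, [(1, 2), (3, 4)]))          # False (disconnected)
```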
Random Graphs
• In a random graph, we start with a set
of vertices and put in edges at
random, thus creating paths
• An interesting question, then, is to find P{graph is connected}, the probability that there is a path between every pair of vertices in the set
James Stirling
Who is James Stirling?
• Lived 1692 – 1770.
• His family was Roman Catholic in Protestant England.
• His family supported the Jacobite cause.
• Matriculated at Balliol College, Oxford.
• Believed to have studied and matriculated at two other universities, but this is not certain.
• Did not graduate because he refused to take an oath, on account of his Jacobite beliefs.
• Spent years studying, traveling, and making
friends with people such as Sir Isaac Newton
and Nicolaus(I) Bernoulli.
Methodus Differentialis
• Stirling became a teacher in London.
• There he wrote the book Methodus
Differentialis in 1730.
• The book's purpose was to speed up the convergence of series.
• Stirling’s Formula is recorded in this
book in Example 2 of Proposition 28.
$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$
Stirling’s Formula
$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$
• Used to approximate $n!$
• Is an asymptotic expansion.
• Does not converge.
• Can be used to approximate a lower bound in a series.
• The percentage error is extremely low.
• The larger n is, the lower the percentage error.
Stirling’s Formula Error
• About 8.00% off for 1!
• About 0.80% off for 10!
• About 0.08% off for 100!
• Etc…
• The ratio $n!\,/\,(\sqrt{2\pi n}\, n^n e^{-n})$ is close to $1 + \frac{1}{12n}$, so if the formula is multiplied by $1 + \frac{1}{12n}$ it only gets better, with errors only of order $\frac{1}{n^2}$.
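These error figures are easy to reproduce. The following Python sketch (ours) computes the relative error of the basic formula and of the corrected form with the $1 + \frac{1}{12n}$ factor:

```python
import math

def stirling(n):
    """Basic Stirling approximation to n!."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

def stirling_corrected(n):
    """Stirling with the first correction term 1 + 1/(12n)."""
    return stirling(n) * (1 + 1 / (12 * n))

for n in (1, 10, 100):
    exact = math.factorial(n)
    err = abs(exact - stirling(n)) / exact
    err_corr = abs(exact - stirling_corrected(n)) / exact
    print(f"{n:>3}!  basic error {100 * err:.3f}%   corrected error {100 * err_corr:.5f}%")
```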
Probability Background
• Normal Distribution and Central Limit Theorem
• Poisson Distribution
• Multinomial Distribution
The Normal Distribution
• A continuous random variable X with pdf
  $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}, \qquad -\infty < x < \infty$
  is called normal
Normal Distribution
• It can be shown that $X \sim N(\mu, \sigma^2)$; that is, the mean of X is $\mu$ and the variance is $\sigma^2$
Normal Distribution
• Note: When the mean = 0 and
standard deviation = 1, we get the
standard normal random variable
• Z~N(0,1)
Central Limit Theorem
• If $X_1, X_2, \ldots$ are independent and identically distributed with common mean $\mu$ and standard deviation $\sigma$, then
  $\lim_{n \to \infty} P\left( \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}} < x \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$
Central Limit Theorem
• If $S_n = \sum_{i=1}^{n} X_i$ and n is large, then $S_n$ is approximately normal
• If $X \sim N(\mu, \sigma^2)$, then
  $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$
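As an illustration (our own sketch, not from the slides), the snippet below standardizes sums of i.i.d. Uniform(0,1) variables, whose mean is 1/2 and standard deviation $1/\sqrt{12}$, and compares the empirical distribution with the standard normal CDF:

```python
import math
import random

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

rng = random.Random(1)
n, trials = 50, 20000
mu, sigma = 0.5, math.sqrt(1 / 12)     # mean and sd of Uniform(0, 1)

# Standardized sums (S_n - n*mu) / (sigma * sqrt(n))
zs = [(sum(rng.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
      for _ in range(trials)]

for x in (-1.0, 0.0, 1.0):
    empirical = sum(z < x for z in zs) / trials
    print(f"P(Z < {x:+.1f}): empirical {empirical:.3f} vs normal {phi(x):.3f}")
```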
Poisson Distribution
• The Poisson pmf arises as a limit of the binomial (with $np \to \lambda$ as $n \to \infty$):
  $p(x) = \lim_{n \to \infty} \binom{n}{x} p^x (1-p)^{n-x}$
• $p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$
• Mean and variance both equal $\lambda$
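A quick numerical check (ours) of the binomial-to-Poisson limit, holding $\lambda = np$ fixed while n grows:

```python
import math

lam = 3.0
for n in (10, 100, 1000):
    p = lam / n                         # keep lambda = n*p fixed
    for x in (0, 3, 6):
        binom = math.comb(n, x) * p ** x * (1 - p) ** (n - x)
        poisson = lam ** x * math.exp(-lam) / math.factorial(x)
        print(f"n={n:>4} x={x}: binomial {binom:.5f} vs Poisson {poisson:.5f}")
```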
Multinomial Distribution
• n independent identical trials of events $A_1, A_2, \ldots, A_k$ with probabilities $P_1, P_2, \ldots, P_k$
• Define $X_i$ = number of times $A_i$ occurs, $i = 1, \ldots, k$
• Note that $X_1 + X_2 + \cdots + X_k = n$; then,
Multinomial Distribution
$P\{X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k\} = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, P_1^{n_1} P_2^{n_2} \cdots P_k^{n_k}$
• where $n = \sum_i n_i$
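For example, the short Python function below (our own sketch, with the hypothetical name multinomial_pmf) evaluates this pmf; the usage line computes the chance that 10 rolls of a fair die produce the face counts (2, 2, 2, 2, 1, 1).

```python
import math

def multinomial_pmf(counts, probs):
    """P{X_1 = n_1, ..., X_k = n_k} for n = sum(counts) trials."""
    n = sum(counts)
    coef = math.factorial(n)            # n! / (n_1! n_2! ... n_k!)
    for c in counts:
        coef //= math.factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):     # P_1^{n_1} ... P_k^{n_k}
        prob *= p ** c
    return coef * prob

# Probability that 10 fair-die rolls give exactly the counts (2,2,2,2,1,1)
print(multinomial_pmf([2, 2, 2, 2, 1, 1], [1 / 6] * 6))
```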
Connected Graphs
• Recall: a random graph G consists of vertices $V = \{1, 2, \ldots, n\}$ and random variables $X(i)$, $i = 1, \ldots, n$, with probabilities $P_j$ $\left(\sum_j P_j = 1\right)$ such that $P\{X(i) = j\} = P_j$
Connected Graphs
• The set of random edges is then $E = \{(i, X(i)) : i = 1, \ldots, n\}$; that is, $(i, X(i))$ is the edge emanating from vertex i
Connected Graphs
• What is the probability that a random graph is connected, P{graph is connected}?
• A special case: suppose vertex 1 is 'dead' (it doesn't spawn an edge)
• For n = 2, vertex 2's edge goes to vertex 1 with probability $P_1$ or loops back to itself with probability $P_2$, where $P_1 + P_2 = 1$, so
  P{graph connected} = $P_1$
Dead Vertex Lemma
• Consider a random graph consisting of vertices $0, 1, 2, \ldots, r$ and edges $(i, Y_i)$, $i = 1, 2, \ldots, r$, where the $Y_i$ are independent and $P\{Y_i = j\} = Q_j$, $j = 0, 1, \ldots, r$ (so vertex 0 is dead)
• If $\sum_{j=0}^{r} Q_j = 1$, then P{graph connected} = $Q_0$
Dead Vertex Lemma
[Figure: an example graph on six numbered vertices illustrating the lemma]
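The lemma is easy to test by simulation. Below is a minimal Python sketch (ours); for concreteness it takes $Q$ uniform, so $Q_j = 1/(r+1)$ for every j, which is an assumption of this example rather than part of the lemma.

```python
import random
from collections import deque

def connected_frequency(r, trials, rng):
    """Vertices 0..r; each i in 1..r spawns an undirected edge (i, Y_i)
    with Y_i uniform on 0..r (so Q_j = 1/(r+1) for every j)."""
    hits = 0
    for _ in range(trials):
        adj = {v: [] for v in range(r + 1)}
        for i in range(1, r + 1):
            y = rng.randrange(r + 1)
            adj[i].append(y)
            adj[y].append(i)
        seen, queue = {0}, deque([0])
        while queue:                      # BFS from the dead vertex 0
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        hits += (len(seen) == r + 1)
    return hits / trials

rng = random.Random(2)
r = 5
print("simulated :", connected_frequency(r, 100000, rng))
print("lemma Q_0 :", 1 / (r + 1))
```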
Maximal Non-Self-Intersecting (MNSI)
• Consider the maximal non-self-intersecting path emanating from vertex 1:
  $1, X(1), X^2(1), \ldots, X^k(1)$, where $X^k(1) = X(X^{k-1}(1))$
[Figure: an example path on five numbered vertices with k = 3]
Maximal Non-Self-Intersecting (MNSI)
• Define
  $N = \min\left(k : X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\}\right)$
  and set
  $W = P_1 + \sum_{i=1}^{N-1} P_{X^i(1)}$
Maximal Non-Self-Intersecting (MNSI)
[Figure: an example path on seven numbered vertices with k = 4]
• Applying the Dead Vertex Lemma with the MNSI path playing the role of the dead vertex,
  $P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} = W$
Conditional Probability
The idea of conditional probability:
$P\{\text{event}\} = \sum_{\text{scenarios}} P\{\text{event} \mid \text{scenario}\} \cdot P\{\text{scenario}\}$
Expectations of discrete random variables are conditional probability averages:
$E(X) = \sum_{x} x \cdot P\{X = x\}$
Conditional Probability
Taking expectations:
$E(W) = \sum_{\text{scenarios}} P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} \cdot P\{N, 1, X(1), \ldots, X^{N-1}(1)\}$
$= P\{\text{graph connected}\}$
Conditional Probability
Special case of interest: equiprobable vertices, $P_j = \frac{1}{n}$. Then
$W = \frac{N}{n}, \qquad E[W] = \frac{1}{n}\, E[N], \qquad E[N] = \sum_{i=0}^{n-1} P\{N > i\}$
Conditional Probability
$E[W] = \frac{1}{n} \sum_{i=0}^{n-1} P\{N > i\} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)(n-2)\cdots(n-i)}{n^i} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{n^i\,(n-i-1)!}$
(each of the first i steps of the path must land on a fresh vertex, which happens with probability $(n-1)/n,\ (n-2)/n,\ \ldots,\ (n-i)/n$ in turn)
Conditional Probability
$= \frac{(n-1)!}{n^n} \sum_{i=0}^{n-1} \frac{n^{\,n-i-1}}{(n-i-1)!}$
Let $j = n - i - 1$. Then
$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$
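This closed form can be checked by simulating the stopping time N directly (our own sketch, assuming equiprobable vertices as above) and comparing $E[N]/n$ with $\frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} n^j / j!$:

```python
import random
from math import factorial

def simulate_EW(n, trials, rng):
    """Estimate E[W] = E[N]/n by walking 1, X(1), X^2(1), ... until a repeat."""
    total = 0
    for _ in range(trials):
        seen = {1}
        while True:
            v = rng.randrange(1, n + 1)   # next step of the path, uniform on {1,...,n}
            if v in seen:
                break
            seen.add(v)
        total += len(seen)                # |seen| equals the stopping time N
    return total / (trials * n)

def exact_EW(n):
    """Closed form (n-1)!/n^n * sum_{j=0}^{n-1} n^j/j!."""
    s = sum(n ** j / factorial(j) for j in range(n))
    return factorial(n - 1) / n ** n * s

rng = random.Random(3)
n = 20
print("simulated  E[W]:", simulate_EW(n, 100000, rng))
print("closed-form E[W]:", exact_EW(n))
```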
Poisson Distribution
Suppose X is Poisson with mean $\lambda = n$:
$P\{X = k\} = \frac{\lambda^k e^{-\lambda}}{k!} = \frac{n^k e^{-n}}{k!}$
Poisson Distribution
So
$P\{X < n\} = \sum_{k=0}^{n-1} P\{X = k\} = \sum_{k=0}^{n-1} \frac{n^k e^{-n}}{k!} = e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$
Central Limit Theorem
Recall: $X = X_1 + X_2 + \cdots + X_n$ with each $X_i$ Poisson of mean 1, so X is Poisson with mean n.
By the Central Limit Theorem, for large n,
$X \approx N(n, n)$
and so $P(X < n) \approx \frac{1}{2}$ (asymptotically). Therefore
$e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \to \frac{1}{2}, \qquad \text{i.e.} \qquad \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{e^n}{2}$
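A quick numeric check of this limit (ours); the Poisson terms are accumulated with the recurrence $t_j = t_{j-1} \cdot n / j$ to avoid overflowing floating point:

```python
import math

def poisson_cdf_below_mean(n):
    """Compute e^{-n} * sum_{j=0}^{n-1} n^j / j! iteratively."""
    term = math.exp(-n)   # j = 0 term: e^{-n}
    total = term
    for j in range(1, n):
        term *= n / j     # moves from n^{j-1}/(j-1)! to n^j/j!
        total += term
    return total

for n in (10, 100, 400):
    print(f"n={n:>4}: P(X < n) = {poisson_cdf_below_mean(n):.4f}  (limit 1/2)")
```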
Conditional Probability
Recall Stirling's formula:
$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$
$(n-1)! \approx \sqrt{2\pi (n-1)}\, (n-1)^{n-1} e^{-(n-1)}$
Recall $E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{(n-1)!}{n^n} \cdot \frac{e^n}{2}$
So by substitution,
$E[W] \approx \frac{\sqrt{2\pi (n-1)}\, (n-1)^{n-1}\, e^{-(n-1)}\, e^n}{2\, n^n}$
Conditional Probability
$E[W] \approx \frac{\sqrt{2\pi (n-1)}\, (n-1)^{n-1}\, e}{2\, n^n} = \frac{e \sqrt{2\pi (n-1)}}{2n} \cdot \left(\frac{n-1}{n}\right)^{n-1} = \frac{e \sqrt{2\pi (n-1)}}{2n} \cdot \left(1 - \frac{1}{n}\right)^{n-1}$
Using $\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x$, we have $\left(1 - \frac{1}{n}\right)^{n-1} \approx e^{-1}$.
Conditional Probability
Replacing $\sqrt{n-1}$ by $\sqrt{n}$ for large n,
$E[W] \approx \frac{e \sqrt{2\pi n}}{2n} \cdot e^{-1} = \frac{\sqrt{2\pi n}}{2n} = \frac{\sqrt{2\pi}}{2\sqrt{n}} = \frac{1}{2} \sqrt{\frac{2\pi}{n}}$
$P\{\text{graph is connected}\} = E[W] \approx \sqrt{\frac{\pi}{2n}}$
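Finally, the asymptotic answer itself can be tested by simulation. This Python sketch (ours, not from the slides) draws uniform random graphs with edge set $\{(i, X(i))\}$ and compares the observed connection frequency with $\sqrt{\pi / 2n}$; agreement improves as n grows, since the formula is asymptotic.

```python
import math
import random
from collections import deque

def connected(n, rng):
    """One uniform random graph: each vertex i gets an edge (i, X(i)),
    X(i) uniform on {1, ..., n}. Returns True if the graph is connected."""
    adj = {v: [] for v in range(1, n + 1)}
    for i in range(1, n + 1):
        x = rng.randrange(1, n + 1)
        adj[i].append(x)
        adj[x].append(i)
    seen, queue = {1}, deque([1])
    while queue:                          # BFS over the undirected edges
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n

rng = random.Random(4)
trials = 20000
for n in (10, 50, 200):
    freq = sum(connected(n, rng) for _ in range(trials)) / trials
    print(f"n={n:>4}: simulated {freq:.4f} vs sqrt(pi/2n) {math.sqrt(math.pi / (2 * n)):.4f}")
```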
Thank You
“The first sign of senility is when a
man forgets his theorems. The
second is when he forgets to zip
up. The third is when he forgets to
zip down.”
--Paul Erdös
References
• http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Erdos.html
• http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Renyi.html
• http://www.lassp.cornell.edu/sethna/Cracks/Stirling.html
• http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Stirling.html