CS 312: Graphs: BFS and DFS
Dan Sheldon
February 5, 2015

Plan
- Gale-Shapley running time
- Graphs
  - Motivation and definitions
  - Graph traversal: BFS and DFS
Running Time of Gale-Shapley?

Initially all colleges and students are free
while some college is free and hasn't made offers to every student do
    Choose such a college c
    Let s be the highest-ranked student to whom c has not made an offer
    if s is free then
        c and s become engaged
    else if s is engaged to c' but prefers c to c' then
        c' becomes free
        c and s become engaged
    else
        c remains free
    end if
end while

- O(n^2) iterations (n colleges, n students): each iteration is one offer, and each college makes at most one offer to each student. Are all statements inside the loop constant time?
- Running time depends on implementation details and data structures (e.g., how to "choose such a college c").

Data Structures
- Q: How should we think about data structures when designing algorithms?
- A: Most of the time, as black boxes with running-time guarantees (e.g., "find an element in O(log n) time").
- Good news: we don't need to remember the implementation details of data structures.
- Bad news: they may seem opaque.
Review: Lists and Arrays

Operation      Array                      List
Get ith entry  O(1)                       O(i)
Find element   O(n); O(log n) if sorted   O(n)
Insert/delete  O(n)                       O(1), given a pointer to the position

Note: O(1) = constant number of steps.

What Data Structures to Use For G-S?
Need to do each of the following in O(1) time:
- Find a free college c
- Find the next student s in the preference list of c
- Find the current college c' of s
- Check if s likes c' better than c
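A minimal Python sketch of the loop with O(1) work per iteration, assuming n colleges and n students numbered 0..n-1 with complete preference lists. `collections.deque` stands in for the freeColleges linked list, and `next_`, `current`, and `ranking` play the roles of the Next, Current, and Ranking arrays; the function and parameter names are hypothetical. With complete lists a free college always has a student left to make an offer to, so the loop simply runs while any college is free.

```python
from collections import deque

def gale_shapley(college_pref, student_pref):
    """College-proposing Gale-Shapley with O(1) work per loop iteration.
    college_pref[c][i] = student in position i on c's list (CollegePref)
    student_pref[s][i] = college in position i on s's list
    Returns current, where current[s] = college matched to student s."""
    n = len(college_pref)
    # ranking[s][c] = position of college c on s's list (Ranking), built in O(n^2)
    ranking = [[0] * n for _ in range(n)]
    for s in range(n):
        for pos, c in enumerate(student_pref[s]):
            ranking[s][c] = pos
    free_colleges = deque(range(n))    # freeColleges
    next_ = [0] * n                    # Next[c]: position of c's next offer
    current = [None] * n               # Current[s]: s's college, or None if free
    while free_colleges:
        c = free_colleges[0]           # find a free college: O(1)
        s = college_pref[c][next_[c]]  # next student on c's list: O(1)
        next_[c] += 1
        c2 = current[s]                # s's current college c': O(1)
        if c2 is None:                 # s is free
            current[s] = c
            free_colleges.popleft()
        elif ranking[s][c] < ranking[s][c2]:  # s prefers c to c': O(1)
            current[s] = c
            free_colleges.popleft()
            free_colleges.append(c2)   # c' becomes free
        # else: s rejects c, and c stays at the front of free_colleges
    return current

print(gale_shapley([[0, 1], [0, 1]], [[1, 0], [1, 0]]))  # [1, 0]
```

Building `ranking` costs O(n^2) once; after that each of the at most n^2 iterations is O(1), so the whole algorithm runs in O(n^2) time.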
What Data Structures to Use For G-S?
Input: preference lists stored as 2-D arrays, e.g.,
CollegePref[c, i] = student in position i on c's list

Operation                                     Data structure
Find a free college c                         linked list: freeColleges
Find next student s in preference list of c   array: i = Next[c], s = CollegePref[c, i]
Find current college c' of s                  1-D array: Current[s]
Check if s likes c' better than c             2-D array: Ranking[s, c] = position of c on s's list; compare Ranking[s, c] with Ranking[s, c']

Another Example: Heapsort
Input: unsorted array A[] of length n
Let Q be a heap-based priority queue
for i = 1 to n do
    Insert(Q, A[i])
end for
for i = 1 to n do
    A[i] = ExtractMin(Q)
end for

Running time: (n × Insert) + (n × ExtractMin)
- O(n log n) if both operations are O(log n)
Example on board
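A Python sketch of the two-loop heapsort above, using the standard library `heapq` module (a binary min-heap stored in a plain list) as the heap-based priority queue: `heapq.heappush` plays Insert and `heapq.heappop` plays ExtractMin, each O(log n). Like the pseudocode, it fills a separate heap Q, so it uses O(n) extra space, unlike the in-place heapsort variant.

```python
import heapq

def heapsort(a):
    """Sort list a in place via a heap-based priority queue Q."""
    q = []
    for x in a:                        # n x Insert, O(log n) each
        heapq.heappush(q, x)
    for i in range(len(a)):            # n x ExtractMin, O(log n) each
        a[i] = heapq.heappop(q)
    return a

print(heapsort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]
```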
Graphs
- Motivation and Definitions
- Breadth-First Search (BFS) and Depth-First Search (DFS)

Undirected Graph
Undirected graph G = (V, E)
- V = nodes (vertices)
- E = edges between pairs of nodes
- Captures pairwise relationships between objects
- Graph size parameters: n = |V|, m = |E|

Example (figure on slide):
V = {1, 2, 3, 4, 5}
E = {(1,2), (1,4), (1,5), (2,3), (2,4), (3,5)}
n = 5, m = 6
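The slides do not fix a representation for G; one common choice, assumed here, is an adjacency list: a dict mapping each node to the list of its neighbors. The helper name `adjacency_list` is hypothetical. This sketch builds it for the example graph above.

```python
def adjacency_list(nodes, edges):
    """Undirected graph as a dict: node -> list of neighbors.
    Each edge (u, v) is recorded twice, once at each endpoint."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)]
adj = adjacency_list(V, E)
n, m = len(V), len(E)
print(n, m)      # 5 6
print(adj[1])    # [2, 4, 5]
```

Because each edge appears in two lists, the list lengths sum to 2m (here 12).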
Graphs
- Facebook: how many "degrees of separation" are there between me and Barack Obama?
  [Figure: first page of "Four Degrees of Separation" by Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna, dated January 6, 2012 (arXiv:1111.4570v3 [cs.SI])]
- Google Maps: what is the shortest driving route from South Hadley to Florida?
- And many more...
Terminology
If e = (u, v) is an edge, then:
(1) u is a neighbor of v
(2) u is adjacent to v
(3) e is incident on u and v
(4) u and v are the endpoints of e
Example: 1 is adjacent to 2; (1,2) is incident on 1 and 2.

Definitions
Path: A path is a sequence P of nodes v1, v2, ..., vk-1, vk with the property that each consecutive pair vi, vi+1 is joined by an edge in E.
- 1-4-2 is a path.
- 1-3-4 is NOT a path (1 and 3 are not joined by an edge).

Distance: The distance from u to v is the minimum number of edges in any path from u to v.
- distance(1,2) = 1
- distance(1,3) = 2

Cycle: A cycle is a path v1, v2, ..., vk-1, vk in which v1 = vk, k > 2, and the first k-1 nodes are all distinct.
- 1-2-4-1 is a cycle.
- 1-2-4 is NOT a cycle (it does not return to 1).
- 1-2-4-1-5 is NOT a cycle (it ends at 5, not 1).
- 1-2-4-1-5-3-2-1 is NOT a cycle (nodes 1 and 2 repeat among the first k-1 nodes).

Connectivity: An undirected graph is connected if for every pair of nodes u and v, there is a path between u and v.
[Figure: two graphs on nodes 1-5; the first is a connected graph, the second is NOT a connected graph]
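A small Python sketch that checks the path and cycle definitions literally, assuming edges are given as a list of pairs and using the example graph from the "Undirected Graph" slide (which the figures here appear to reuse). The helper names `is_path` and `is_cycle` are hypothetical.

```python
def is_path(edges, seq):
    """Path: every consecutive pair seq[i], seq[i+1] is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}   # undirected: (u, v) == (v, u)
    return all(frozenset((seq[i], seq[i + 1])) in edge_set
               for i in range(len(seq) - 1))

def is_cycle(edges, seq):
    """Cycle: a path v1..vk with v1 == vk, k > 2, and v1..v(k-1) all distinct."""
    k = len(seq)
    return (k > 2 and seq[0] == seq[-1]
            and len(set(seq[:-1])) == k - 1
            and is_path(edges, seq))

E = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)]
print(is_path(E, [1, 4, 2]), is_path(E, [1, 3, 4]))       # True False
print(is_cycle(E, [1, 2, 4, 1]), is_cycle(E, [1, 2, 4]))  # True False
print(is_cycle(E, [1, 2, 4, 1, 5, 3, 2, 1]))              # False
```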
Trees
A tree is an undirected graph that is connected and does not contain a cycle.
[Figure: two graphs on nodes 1-5; the first is a tree, the second is NOT a tree]

(Upside-down) Trees
Parents, descendants, ancestors?
[Photo: http://www.offbeattravel.com/MoCA.html]
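A Python sketch of a tree test that follows the definition directly (connected and no cycle), assuming a simple graph (no self-loops or parallel edges) with at least one node; `is_tree` is a hypothetical name. It explores from one node, reports a cycle if it ever reaches an already-discovered node by a second route, and at the end checks that every node was reached.

```python
def is_tree(nodes, edges):
    """Tree = connected + no cycle (assumes a simple graph with >= 1 node)."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(nodes))
    parent = {start: None}          # discovered node -> node it was reached from
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v == parent[u]:
                continue            # the edge we arrived on; not a new route
            if v in parent:
                return False        # second route to v: the graph has a cycle
            parent[v] = u
            stack.append(v)
    return len(parent) == len(adj)  # connected iff every node was discovered

# Hypothetical example edges (not necessarily the tree drawn on the slide)
print(is_tree([1, 2, 3, 4, 5], [(1, 2), (1, 4), (1, 5), (2, 3)]))  # True
print(is_tree([1, 2, 3, 4, 5], [(1, 2), (3, 4)]))                  # False
```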
Review Definitions
What to know: n, m, neighbor, incident, path, distance, cycle, connected, tree
Example on board

Graph Traversal
Is a graph connected?
- For a small graph (nodes 1-5): easy
- For a large graph: hmmm...
Approach: explore outward from an arbitrary starting node s to find all nodes reachable from s (its connected component). The graph is connected exactly when this component contains every node.

Is a Graph Connected?
Algorithm 1: Breadth-first search (BFS). Explore outward by distance.
[Figure: example graph on nodes a, b, c, d, e]
- Start at a.
- Visit all nodes at distance 1 from a.
- Visit all nodes at distance 2 from a.
Breadth-First Search
Layers
- L0 = {s}
- L1 = all neighbors of L0
- L2 = all nodes with an edge to L1 that don't belong to L0 or L1
- ...
- Li+1 = nodes with an edge to Li that don't belong to an earlier layer:
  $L_{i+1} = \{ v : \exists (u, v) \in E,\ u \in L_i,\ v \notin L_0 \cup \dots \cup L_i \}$
Observation: Li consists of all nodes at distance exactly i from s. There is a path from s to t if and only if t appears in some layer.

BFS Tree
If we keep only the edges traversed while doing a breadth-first search (for each node, the edge along which it was first discovered), we will have a tree.
Example on board
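A minimal Python sketch of BFS, assuming the adjacency-list dict representation and the example graph on nodes 1-5; `bfs` and `group_layers` are hypothetical names. A FIFO queue (`collections.deque`) dequeues all of L0, then all of L1, and so on, so `layer[v]` is v's layer index, which equals its distance from s. `parent[v]` records the edge along which v was first discovered, i.e., the BFS tree. The last line checks the BFS tree property (the layers of the endpoints of every edge differ by at most 1) on this graph.

```python
from collections import deque

def bfs(adj, s):
    """BFS from s over an adjacency-list dict.
    layer[v] = i  means v is in L_i, i.e. at distance i from s.
    parent[v] = u means v was first reached along edge (u, v) (a BFS tree edge)."""
    layer = {s: 0}
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()          # FIFO: all of L_i is dequeued before L_(i+1)
        for v in adj[u]:
            if v not in layer:       # v is not in any earlier layer
                layer[v] = layer[u] + 1
                parent[v] = u
                queue.append(v)
    return layer, parent

def group_layers(layer):
    """Turn {node: i} into the list [L_0, L_1, ...] of node sets."""
    layers = [set() for _ in range(max(layer.values()) + 1)]
    for v, i in layer.items():
        layers[i].add(v)
    return layers

# Example graph from the "Undirected Graph" slide
E = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)]
adj = {v: [] for v in range(1, 6)}
for x, y in E:
    adj[x].append(y)
    adj[y].append(x)

layer, parent = bfs(adj, 1)
print(group_layers(layer))                                # [{1}, {2, 4, 5}, {3}]
print(all(abs(layer[x] - layer[y]) <= 1 for x, y in E))   # True
```

Layer 2 = {3} agrees with distance(1,3) = 2 from the Definitions slide.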
BFS Tree
Property. Let T be a BFS tree of G = (V, E), and let (x, y) be an edge of G. Then the layers of x and y differ by at most 1.
Example (figure on slide): Layer 0: {a}; Layer 1: {b, c, d}; Layer 2: {e}
Proof on board

A More General Strategy
To explore the connected component, repeatedly add any node v for which
- (u, v) is an edge, and
- u is explored, but v is not.
Picture on board
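A Python sketch of the general strategy, assuming the adjacency-list dict representation; `connected_component` and `is_connected` are hypothetical names. The strategy leaves open which explored node to expand next; popping from a stack is an arbitrary choice here (it is not exactly the recursive DFS order), and a FIFO queue would give BFS. Whatever the choice, the set of nodes found is the connected component of s.

```python
def connected_component(adj, s):
    """Generic exploration from s: repeatedly take an edge (u, v) with u explored
    and v not yet explored, and explore v. Returns every node reachable from s."""
    explored = {s}
    to_scan = [s]                   # explored nodes whose edges are not yet scanned
    while to_scan:
        u = to_scan.pop()           # any rule for choosing u works; a stack is used here
        for v in adj[u]:
            if v not in explored:   # (u, v) is an edge, u explored, v not
                explored.add(v)
                to_scan.append(v)
    return explored

def is_connected(adj):
    """Connected iff the component of an arbitrary node contains all nodes."""
    s = next(iter(adj))
    return len(connected_component(adj, s)) == len(adj)

adj = {1: [2, 4, 5], 2: [1, 3, 4], 3: [2, 5], 4: [1, 2], 5: [1, 3]}
print(is_connected(adj))                       # True
print(is_connected({1: [2], 2: [1], 3: []}))   # False
```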
Is a Graph Connected?
Algorithm 2: Depth-first search (DFS). Keep exploring from the most recently added node until you have to backtrack.
[Figure: step-by-step DFS on the example graph with nodes a, b, c, d, e]

DFS Algorithm
DFS(u)
    Mark u as "Explored"
    for each edge (u, v) incident to u do
        if v is not marked "Explored" then
            Recursively invoke DFS(v)
        end if
    end for
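A minimal Python translation of the DFS(u) pseudocode, assuming the adjacency-list dict for the example graph on nodes 1-5. It additionally records parent[v] = u whenever DFS(u) invokes DFS(v); these edges form the DFS tree. The helper `non_tree_edges_ok` checks the DFS tree theorem: every edge of G that is not in T joins a node and one of its ancestors. The names `dfs`, `is_ancestor`, and `non_tree_edges_ok` are hypothetical, and because Python's default recursion limit is about 1000, this recursive version suits only small graphs.

```python
def dfs(adj, s):
    """Recursive DFS from s; returns parent pointers of the DFS tree."""
    explored = set()
    parent = {s: None}

    def visit(u):                      # DFS(u)
        explored.add(u)                # Mark u as "Explored"
        for v in adj[u]:               # for each edge (u, v) incident to u
            if v not in explored:      # if v is not marked "Explored"
                parent[v] = u          # (u, v) becomes a DFS tree edge
                visit(v)               # recursively invoke DFS(v)

    visit(s)
    return parent

def is_ancestor(parent, x, y):
    """True if x lies on the tree path from y up to the root (x == y counts)."""
    while y is not None:
        if y == x:
            return True
        y = parent[y]
    return False

def non_tree_edges_ok(adj, parent):
    """Check: every edge (x, y) of G not in tree T joins an ancestor/descendant pair."""
    for x in parent:
        for y in adj[x]:
            if parent[y] == x or parent[x] == y:
                continue               # (x, y) is a tree edge
            if not (is_ancestor(parent, x, y) or is_ancestor(parent, y, x)):
                return False
    return True

adj = {1: [2, 4, 5], 2: [1, 3, 4], 3: [2, 5], 4: [1, 2], 5: [1, 3]}
parent = dfs(adj, 1)
print(parent)                          # {1: None, 2: 1, 3: 2, 5: 3, 4: 2}
print(non_tree_edges_ok(adj, parent))  # True
```

On the same graph, the BFS tree from node 1 (nodes 2, 4, 5 have parent 1 and node 3 has parent 2) fails this check: the non-tree edge (2, 4) joins two nodes in the same layer, and neither is an ancestor of the other.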
Depth First Search
Theorem: Let T be a depth-first search tree. Let x and y be 2 nodes in the tree. Let (x, y) be an edge that is in G but not in T. Then either x is an ancestor of y or y is an ancestor of x in T.
[Figure: a DFS tree on nodes a, b, c, d, e]
Proof?

Summary
- Definitions
  - G = (V, E), n = |V|, m = |E|
  - neighbor, incident, cycle, path, connected
- BFS and DFS
  - Two ways to traverse a graph; each produces a tree
  - BFS tree: shallow and wide ("bushy")
  - DFS tree: deep and narrow ("scraggly")