BFS and DFS
• BFS and DFS in directed graphs
• BFS in undirected graphs
• An improved undirected BFS-algorithm
The Buffered Repository Tree (BRT)
• Stores key-value pairs (k,v)
• Supported operations:
• INSERT(k,v) inserts a new pair (k,v) into T
• EXTRACT(k) extracts all pairs with key k
• Complexity:
• INSERT: O((1/B)log2(N/B)) amortized
• EXTRACT: O(log2(N/B) + K/B) amortized
(K = number of reported elements)
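The interface above can be illustrated with a minimal in-memory sketch. It models only the semantics of INSERT and EXTRACT, not the buffered tree layout or its I/O bounds; the class name `RepositoryModel` is an assumption made for illustration.

```python
from collections import defaultdict


class RepositoryModel:
    """In-memory stand-in for the BRT interface (semantics only, no I/O model)."""

    def __init__(self):
        self._pairs = defaultdict(list)

    def insert(self, k, v):
        # INSERT(k,v): store one more pair under key k.
        self._pairs[k].append(v)

    def extract(self, k):
        # EXTRACT(k): report AND remove all pairs with key k.
        return [(k, v) for v in self._pairs.pop(k, [])]


T = RepositoryModel()
T.insert("a", 1)
T.insert("b", 2)
T.insert("a", 3)
first = T.extract("a")    # both pairs with key "a"
second = T.extract("a")   # nothing left under "a"
```

Note that EXTRACT removes what it reports, so a second EXTRACT with the same key returns nothing until new pairs are inserted.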
The Buffered Repository Tree (BRT)
Main memory
Disk
• (2,4)-tree
• Leaves store between B/4 and B elements
• Internal nodes have buffers of size B
• Root in main memory, rest on disk
INSERT(k,v)
[Figure: BRT with root in main memory and remaining nodes on disk]
• O(X/B) I/Os to empty a buffer of size X ≥ B
• Amortized charge per element and level: O(1/B)
• Height of tree: O(log2(N/B))
• Insertion cost: O((1/B)log2(N/B)) amortized
EXTRACT(k)
[Figure: BRT with root in main memory and remaining nodes on disk; elements with key k highlighted]
• Number of traversed nodes: O(log2(N/B) + K/B)
• I/Os per node: O(1)
• Cost of operation: O(log2(N/B) + K/B)
• But careful with the removal of extracted elements
Cost of Rebalancing
• O(N/B) leaf creations and deletions
• O(N/B) node splits, fusions, merges
Each such operation costs O(1) I/Os
⇒ O(N/B) I/Os for rebalancing
Theorem: The BRT supports INSERT and EXTRACT
operations in O((1/B)log2(N/B)) and
O(log2(N/B) + K/B) I/Os amortized.
Directed DFS
• Algorithm proceeds as internal memory algorithm:
• Use stack to determine order in which vertices are
visited
• For current vertex v:
• Find unvisited out-neighbor w
• Push w on the stack
• Continue search at w
• If no unvisited out-neighbor exists
• Remove v from stack
• Continue search at v’s parent
• Stack operations cost O(N/B) I/Os
• Problem: Finding an unvisited vertex
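The internal-memory procedure described above can be sketched as follows. This is a plain explicit-stack DFS, assuming the graph is a dictionary `adj` that maps every vertex to its list of out-neighbors; the names `dfs_order` and `next_idx` are illustrative. The membership test on `visited` is exactly the step that becomes expensive in external memory.

```python
def dfs_order(adj, s):
    """Return the vertices in the order an explicit-stack DFS first visits them."""
    visited = {s}
    order = [s]
    stack = [s]
    next_idx = {v: 0 for v in adj}   # position of the next out-edge to examine
    while stack:
        v = stack[-1]
        out = adj[v]
        # Find an unvisited out-neighbor w of the current vertex v.
        while next_idx[v] < len(out) and out[next_idx[v]] in visited:
            next_idx[v] += 1
        if next_idx[v] < len(out):
            w = out[next_idx[v]]
            visited.add(w)
            order.append(w)
            stack.append(w)          # continue the search at w
        else:
            stack.pop()              # continue the search at v's parent
    return order


graph = {1: [2, 3], 2: [4], 3: [4], 4: [1]}
order = dfs_order(graph, 1)
```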
Directed DFS
• Data structures:
• BRT T
• Stores directed edges (v,w) with key v
• Priority queues P(v), one per vertex
• Stores unexplored out-edges of v
• Invariant:
Not in P(v)
In P(v) and in T
In P(v), but not in T
Directed DFS
• Finding next vertex w after vertex v (cost per step; total over the whole algorithm):
• EXTRACT(v): Retrieve red edges from T
O(log2(|E|/B) + K1/B); total O(|V|log2(|E|/B) + |E|/B)
• Remove these edges from P(v) using DELETE
O(sort(K1)); total O(|V| + sort(|E|))
• Retrieve next edge using DELETEMIN on P(v)
O((1/B)logm(|E|/B)); total O(sort(|E|))
• Insert in-edges of w into T
O(1 + (K2/B)log2(|E|/B)); total O((|E|/B)log2(|E|/B))
• Push w on the stack
O(1/B) amortized; total O(|V|/B)
(K1 = number of edges extracted from T, K2 = number of in-edges of w)
Total:
O((|V| + |E|/B)log2(|E|/B))
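The interplay of T and the queues P(v) can be made concrete with a small in-memory sketch. Python dictionaries and sets stand in for the BRT and the priority queues, so only the logic is modeled, not the I/O cost; the function name `directed_dfs` is an assumption, and the graph is assumed to be simple with every vertex present as a key of `adj`. Note that the search never asks whether a vertex is visited.

```python
from collections import defaultdict


def directed_dfs(adj, s):
    """DFS order from s, using T and P(v) instead of a visited test."""
    in_edges = defaultdict(list)
    for v, outs in adj.items():
        for w in outs:
            in_edges[w].append(v)
    P = {v: set(outs) for v, outs in adj.items()}   # unexplored out-edges of v
    T = defaultdict(list)                           # stand-in for the BRT, key = tail
    order, stack = [], []

    def visit(w):
        # Insert all in-edges (u,w) of w into T with key u.
        for u in in_edges[w]:
            T[u].append(w)
        order.append(w)
        stack.append(w)

    visit(s)
    while stack:
        v = stack[-1]
        # EXTRACT(v): out-edges of v whose heads were visited in the meantime.
        for w in T.pop(v, []):
            P[v].discard(w)          # DELETE from P(v)
        if P[v]:
            w = min(P[v])            # DELETEMIN on P(v)
            P[v].remove(w)
            visit(w)
        else:
            stack.pop()              # no unexplored edge left: back to the parent
    return order


graph = {1: [2, 3], 2: [4], 3: [4], 4: [1]}
order = directed_dfs(graph, 1)
```

Every edge remaining in P(v) after the EXTRACT step leads to an unvisited vertex, which is why DELETEMIN can be followed immediately by a push.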
Directed DFS + BFS
• BFS can be solved using same algorithm
• Only modification: Use queue (FIFO) instead of stack
Theorem: Depth first-search and breadth-first search
in a directed graph G = (V,E) can be solved in
O((|V|+|E|/B)log2(|E|/B)) I/Os.
Exercise: Convince yourself that the priority queues
P(v) are not necessary in the case of BFS.
Undirected BFS
Partition graph into levels L(0), L(1), ...
around source:
L(0), L(1), L(2), L(3)
Observation: For v ∈ L(i), all its neighbors are in
L(i – 1) ∪ L(i) ∪ L(i + 1).
Build BFS-tree level by level:
• Initially, L(0) = {r}
• Given levels L(i – 1) and L(i):
• Let X(i) = set of all neighbors of vertices in L(i)
• Let L(i + 1) = X(i) \ (L(i – 1) ∪ L(i))
Undirected BFS
Constructing L(i + 1):
• Retrieve adjacency lists of vertices in L(i) ⇒ X(i)
• Sort X(i)
• Scan L(i – 1), L(i), and X(i) to
• Remove duplicates from X(i)
• Compute X(i) \ (L(i – 1) ∪ L(i))
Complexity: O(|L(i)| + sort(|L(i – 1)| + |X(i)|)) I/Os per level
⇒ O(|V| + sort(|E|)) I/Os in total
Theorem: Breadth-first search in an undirected graph
G = (V,E) can be solved in O(|V| + sort(|E|)) I/Os.
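The level-by-level construction can be sketched in a few lines. This is an in-memory illustration only: Python's `sorted` and set operations stand in for the external sort and scan steps, and the name `bfs_levels` is an assumption.

```python
def bfs_levels(adj, r):
    """Levels L(0), L(1), ... of an undirected graph given as adjacency lists."""
    levels = [[r]]
    prev, cur = [], [r]
    while cur:
        # X(i): concatenated adjacency lists of L(i), sorted, duplicates removed.
        x = sorted(set(w for v in cur for w in adj[v]))
        # L(i+1) = X(i) \ (L(i-1) ∪ L(i)); older levels never need to be checked.
        exclude = set(prev) | set(cur)
        nxt = [w for w in x if w not in exclude]
        if nxt:
            levels.append(nxt)
        prev, cur = cur, nxt
    return levels


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = bfs_levels(adj, 0)
```

Only the two most recent levels are consulted when filtering X(i), which is what the observation about neighbors of L(i) permits.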
A Faster BFS-Algorithm
Problem with simple BFS-algorithm:
• Random accesses to retrieve adjacency lists
Idea for a faster algorithm:
• Load more than one adjacency list at a time
• Reduces number of random accesses
• Causes edges to be involved in more than one
iteration of the algorithm
Trade-off
A Faster BFS-Algorithm (Randomized)
• Let 0 < m < 1 be a parameter (specified later)
• Two phases:
• Build m|V| disjoint clusters of diameter O(1/m)
• Perform modified version of SIMPLEBFS
• Clusters C1,...,Cq formed using BFS from randomly
chosen set V’ = {r1,...,rq} of masters
• Vertex is chosen as a master with probability m
(coin flip)
Observation: E[|V’|] = m|V|. That is, the expected
number of clusters is m|V|.
Forming Clusters (Randomized)
• Apply SIMPLEBFS to form clusters
• L(0) = V’
• v ∈ Ci if v is a descendant of ri
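A minimal sketch of the cluster formation follows, assuming a connected graph and assuming that the source s is always made a master so that at least one cluster exists. The multi-source BFS is done in memory with a queue; the names `form_clusters` and `cluster_of` are illustrative.

```python
import random
from collections import deque


def form_clusters(adj, s, m, seed=0):
    """Pick masters by coin flips and assign every vertex to a master's cluster."""
    rng = random.Random(seed)
    # Each vertex is a master with probability m; s is always one (assumption).
    masters = [v for v in adj if v == s or rng.random() < m]
    cluster_of = {r: r for r in masters}
    frontier = deque(masters)              # L(0) = V'
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in cluster_of:        # w becomes a descendant of v's master
                cluster_of[w] = cluster_of[v]
                frontier.append(w)
    return masters, cluster_of


# Path graph on 10 vertices.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
masters, cluster_of = form_clusters(path, 0, 0.3)
```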
Forming Clusters (Randomized)
Lemma: The expected diameter of a cluster is 2/m.
[Figure: path through vertices v1, v2, ..., vk, with x and s marked]
• E[k] ≤ 1/m
Corollary: The clusters are formed in expected
O((1/m)sort(|E|)) I/Os.
Forming Clusters (Randomized)
• Form files F1,...,Fq, one per cluster
Fi = concatenation of adjacency lists of vertices in Ci
• Augment every edge (v,w) ∈ Fi with the start position
of file Fj s.t. w ∈ Cj:
• Edge = triple (v,w,pj)
The BFS-Phase
• Maintain a sorted pool H of edges s.t. adjacency lists
of vertices in L(i) are contained in H
• Scan L(i) and H to find vertices in L(i) whose
adjacency lists are not in H: O((|L(i)| + |H|)/B)
• Form list of start positions of files containing these
adjacency lists and remove duplicates: O(sort(|L(i)|))
• Retrieve files, sort them, and merge resulting list H’
with H: O(K + sort(|H’|) + |H|/B)
• Scan L(i) and H to build X(i): O((|L(i)| + |H|)/B)
• Construct L(i + 1) from L(i – 1), L(i), and X(i) as
before: O(sort(|L(i)| + |L(i–1)| + |X(i)|))
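The role of the pool H can be illustrated with an in-memory sketch. It assumes a precomputed mapping `cluster_of` from vertices to cluster identifiers, uses a dictionary for H instead of a sorted edge file, and counts file retrievals in `loads`; all of these names are assumptions for illustration.

```python
def clustered_bfs(adj, cluster_of, r):
    """BFS levels from r, loading whole cluster files into the pool H on demand."""
    files = {}                       # F_i: adjacency lists of all vertices in cluster i
    for v, c in cluster_of.items():
        files.setdefault(c, {})[v] = list(adj[v])
    H = {}                           # pool: vertex -> adjacency list
    loads = 0                        # number of files retrieved
    levels, prev, cur = [[r]], set(), {r}
    while cur:
        # Retrieve the files of all clusters whose lists are needed but not in H.
        for c in {cluster_of[v] for v in cur if v not in H}:
            H.update(files[c])
            loads += 1
        # Build X(i) from H and remove the used adjacency lists from the pool.
        x = set()
        for v in cur:
            x.update(H.pop(v))
        nxt = x - prev - cur         # L(i+1) = X(i) \ (L(i-1) ∪ L(i))
        if nxt:
            levels.append(sorted(nxt))
        prev, cur = cur, nxt
    return levels, loads


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
cluster_of = {0: 0, 1: 0, 2: 2, 3: 2, 4: 2}
levels, loads = clustered_bfs(adj, cluster_of, 0)
```

In this sketch each cluster file is fetched once, so the number of random accesses is bounded by the number of clusters rather than the number of vertices.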
The BFS-Phase
I/O-complexity of single step:
• O(K + |H|/B +
sort(|H’| + |L(i – 1)| + |L(i)| + |X(i)|))
Expected I/O-complexity:
O(m|V| + |E|/(mB) + sort(|E|))
• Choose 1/m = max{1, √(|V|B/|E|)}, i.e., m = min{1, √(|E|/(|V|B))}
Theorem: BFS in an undirected graph G = (V,E) can
be solved in O(sort(|E|) + (1 + √(|V|B/|E|)) · scan(|E|)) I/Os.
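The choice of m comes from balancing the two m-dependent terms of the expected bound O(m|V| + |E|/(mB) + sort(|E|)). A small numeric sketch, with hypothetical helper names `choose_m` and `cost_terms`, shows that the balanced value makes both terms equal and that m is capped at 1 for dense graphs.

```python
import math


def choose_m(num_vertices, num_edges, B):
    """Value of m that balances m|V| against |E|/(mB), capped at 1."""
    return min(1.0, math.sqrt(num_edges / (num_vertices * B)))


def cost_terms(num_vertices, num_edges, B, m):
    """The two m-dependent terms of the expected bound (sort(|E|) does not depend on m)."""
    return m * num_vertices, num_edges / (m * B)


m_sparse = choose_m(10**6, 4 * 10**6, 10**4)
random_access_term, pool_scan_term = cost_terms(10**6, 4 * 10**6, 10**4, m_sparse)
m_dense = choose_m(100, 10**6, 100)
```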
Single Source Shortest Paths
• The tournament tree
• SSSP in undirected graphs
• SSSP in planar graphs
Single Source Shortest Paths
Need:
• I/O-efficient
priority queue
• I/O-efficient
method to update
only unvisited
vertices
The Tournament Tree
= I/O-efficient priority queue
• Supports:
• INSERT(x,p)
• DELETE(x)
• DELETEMIN
• DECREASEKEY(x,p)
• All operations take O((1/B)log2(N/B)) I/Os amortized
Note: N = size of the universe ≠ # elements in the tree
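The four operations can be modeled in memory with a binary heap and lazy deletion. This sketch captures only the semantics of the interface, not the tournament tree's structure or its I/O bounds; the class name `PriorityQueueModel` is an assumption, and INSERT and DECREASEKEY are both treated as an update that keeps the smaller priority.

```python
import heapq


class PriorityQueueModel:
    """In-memory stand-in for the tournament tree interface (semantics only)."""

    def __init__(self):
        self._heap = []   # (priority, element); may contain stale entries
        self._prio = {}   # current priority of every stored element

    def update(self, x, p):
        # Insert x, or lower its priority if p is smaller than the stored one.
        if x not in self._prio or p < self._prio[x]:
            self._prio[x] = p
            heapq.heappush(self._heap, (p, x))

    insert = update
    decrease_key = update

    def delete(self, x):
        # Lazy deletion: the heap entry becomes stale and is skipped later.
        self._prio.pop(x, None)

    def delete_min(self):
        while self._heap:
            p, x = heapq.heappop(self._heap)
            if self._prio.get(x) == p:   # skip stale entries
                del self._prio[x]
                return x, p
        return None


q = PriorityQueueModel()
q.insert("a", 5)
q.insert("b", 3)
q.decrease_key("a", 1)
q.delete("b")
smallest = q.delete_min()
nothing = q.delete_min()
```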
The Tournament Tree
[Figure: tournament tree with root in main memory and remaining nodes on disk]
• Static binary tree over all elements in the universe
• Elements map to leaves, M elements per leaf
• Internal nodes store between M/2 and M elements
• Internal nodes have signal buffers of size M
• Root in main memory, rest on disk
The Tournament Tree
Main memory
Disk
• Elements stored at each node are sorted by priority
• Elements at node v have smaller priority than
elements at v’s descendants
• Convention: x ∈ T if and only if p(x) is finite
The Tournament Tree
Deletions
• Operation DELETE(x) ⇒ signal DELETE(x)
[Figure: node v storing x; signals DELETE(x) and UPDATE(x,∞)]
The Tournament Tree
Insertions and Updates
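• Operations INSERT(x,p) and DECREASEKEY(x,p)
⇒ signal UPDATE(x,p)
[Figure: signal UPDATE(x,p) arriving at node v with child w]
• x is stored at v with current priority p’:
• If p < p’: Update
• If p ≥ p’: Do nothing
• x is not stored at v and all elements at v have priority < p:
• Forward signal to w
• x is not stored at v and at least one element at v has priority ≥ p:
• Insert x
• Send DELETE(x) to w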
The Tournament Tree
Handling Overflow
• Let y be element with highest priority py
• Send signal PUSH(y,py) to appropriate child of v
[Figure: element y pushed from node v to its child w]
The Tournament Tree
Keeping the Nodes Filled
[Figure: node v and its child w]
O(M/B) I/Os to move M/2 elements one level up the tree
The Tournament Tree
Signal Propagation
Main memory
Disk
• Scan v’s signal buffer, partition the signals into sets Xu and Xw
• Load u into memory, apply signals in Xu to u,
insert signals into u’s signal buffer
• Do the same for w
• O((|X| + M)/B) = O(|X|/B) I/Os
The Tournament Tree
Analysis
• Elements travel up the tree
• Cost: O(1/B) I/Os amortized per element and level
• O((K/B)log2(N/B)) I/Os for K operations
• Signals travel down the tree
• Cost: O(1/B) I/Os amortized per signal and level
• O(K) signals for K operations
• O((K/B)log2(N/B)) I/Os
Theorem: The tournament tree supports INSERT,
DELETE, DELETEMIN, and DECREASEKEY operations in
O((1/B)log2(N/B)) I/Os amortized.
Single Source Shortest Paths
Modified Dijkstra:
• Retrieve next vertex v from priority queue Q using
DELETEMIN
• Retrieve v’s adjacency list
• Update distances of all of v’s neighbors, except
predecessor u on the path from s to v
• Repeat
• O(|V| + (|E|/B)log2(|V|/B)) I/Os using tournament tree
Single Source Shortest Paths
Problem:
u
v
Observation: If v performs a spurious update of u,
u has tried to update v before.
• Record this update attempt of u on v by inserting u
into another priority queue Q’
Priority: d(s,u) + w({u,v})
Single Source Shortest Paths
Second modification:
• Retrieve next vertex using two DELETEMIN’s,
one on Q, one on Q’
• Let (x,px) be the element retrieved from Q,
let (y,py) be the element retrieved from Q’
• If px ≤ py: re-insert (y,py) into Q’ and proceed as
normal
• If py < px: re-insert (x,px) into Q and perform a
DELETE(y) on Q
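The two-queue scheme can be tried out with an in-memory sketch. It assumes positive edge weights and pairwise distinct distances from s, uses a lazy-deletion heap as a stand-in for the tournament tree, and peeks at Q’ instead of deleting and re-inserting its minimum. The predecessor exclusion is omitted, since updates of the predecessor are also cancelled through Q’. The names `UpdatableQueue` and `sssp` are illustrative.

```python
import heapq


class UpdatableQueue:
    """Stand-in for the priority queue Q (semantics only, no I/O model)."""

    def __init__(self):
        self._heap, self._prio = [], {}

    def update(self, x, p):
        # INSERT or DECREASEKEY: keep the smaller priority.
        if x not in self._prio or p < self._prio[x]:
            self._prio[x] = p
            heapq.heappush(self._heap, (p, x))

    def delete(self, x):
        self._prio.pop(x, None)

    def delete_min(self):
        while self._heap:
            p, x = heapq.heappop(self._heap)
            if self._prio.get(x) == p:   # skip stale entries
                del self._prio[x]
                return p, x
        return None


def sssp(adj, s):
    """Distances from s; no 'already visited' lookup is used to guard updates."""
    Q = UpdatableQueue()
    Q.update(s, 0)
    Q2 = []       # Q': recorded update attempts, entries (d(s,u) + w({u,v}), u)
    dist = {}     # output only
    while True:
        top = Q.delete_min()
        if top is None:
            break
        px, x = top
        if Q2 and Q2[0][0] < px:
            # py < px: re-insert (x,px) into Q and perform DELETE(y) on Q.
            Q.update(x, px)
            _, y = heapq.heappop(Q2)
            Q.delete(y)
            continue
        assert x not in dist, "a spurious update survived"
        dist[x] = px
        for w, weight in adj[x]:
            Q.update(w, px + weight)                # may be a spurious update
            heapq.heappush(Q2, (px + weight, x))    # record the attempt of x on w
    return dist


adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2), (3, 10)],
    2: [(0, 4), (1, 2), (3, 5)],
    3: [(2, 5), (1, 10)],
}
dist = sssp(adj, 0)
```

The assertion inside the loop checks that no vertex is retrieved twice, which is the property the deletions triggered by Q’ are meant to guarantee.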
Single Source Shortest Paths
Lemma: A spurious update is removed from Q before
the targeted vertex can be retrieved using
DELETEMIN.
u
v
• Event A: Spurious update happens (“time”: d(s,v))
• Event B: Vertex u is deleted by retrieval of u from Q’
(“time”: d(s,u) + w(e))
• Event C: Vertex u is retrieved from Q using
DELETEMIN operation (“time”: d(s,v) + w(e))
Single Source Shortest Paths
• Assume that all vertices have different distance from
source s
⇒ d(u) < d(v)
• d(v) ≤ d(u) + w(e) < d(v) + w(e)
• Sequence of events: A → B → C
Theorem: The single source shortest path problem on
an undirected graph G = (V,E) can be solved in
O(|V| + (|E|/B)log2(|V|/B)) I/Os.
Planar Graphs
• Shortest paths in planar graphs
• Planar separators
• Planar DFS
Shortest Paths in Planar Graphs
s
GR
Shortest Paths in Planar Graphs
Observation: For every separator vertex v, the
distances from s to v in G and GR are the same.
s
v
s
v
The distances from s to all separator vertices can be
computed in GR.
Shortest Paths in Planar Graphs
Observation: For every vertex v in Gi,
dist(s,v) = min{dist(s,x) + dist(x,v) : x ∈ ∂Gi}, where ∂Gi is the set of boundary (separator) vertices of Gi.
s
v
Can compute dist(s,v) in the following graph:
s
Shortest Paths in Planar Graphs
Three main steps:
• Solve all-pairs shortest paths in subgraphs Gi
• Compute shortest paths from s to separator vertices
in GR
• Compute shortest paths from s to all remaining
vertices
Shortest Paths in Planar Graphs
Regular h-partition:
• O(N/h) subgraphs G1,...,Gr
• Each Gi has size at most h
• Each Gi has boundary size at most √h
• Total number of separator vertices O(N/√h)
• Number of boundary sets is O(N/h)
Shortest Paths in Planar Graphs
Three main steps:
• Solve all-pairs shortest paths in subgraphs Gi
• Compute shortest paths from s to separator vertices
in GR
• Compute shortest paths from s to all remaining
vertices
• Assume the given partition is a regular B²-partition
⇒ Steps 1 and 3 take O(scan(N)) I/Os
⇒ Graph GR has O(N/B) vertices and O(N) edges
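The size bounds for GR can be checked by substituting h = B² into the properties of a regular h-partition. The derivation below is a sketch under two assumptions that are not spelled out on the slide: each subgraph has O(√h) boundary vertices, and GR contains one edge for every pair of boundary vertices of the same subgraph.

```latex
\begin{align*}
\text{number of subgraphs} &= O(N/h) = O(N/B^2),\\
\text{vertices of } G_R &= O(N/\sqrt{h}) = O(N/B),\\
\text{edges of } G_R &= O\!\left(\frac{N}{h}\cdot(\sqrt{h})^2\right)
  = O\!\left(\frac{N}{B^2}\cdot B^2\right) = O(N).
\end{align*}
```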
Shortest Paths in Planar Graphs
Data structures:
• List L storing tentative distances of all vertices
• Priority queue Q storing vertices with their tentative
distances as priorities
One step:
• Retrieve next vertex v using DELETEMIN
• Get distances of v’s neighbors from L
• Update their distances in Q using DELETE and INSERT
O(N + sort(N)) I/Os
Shortest Paths in Planar Graphs
• One I/O per boundary set
• Each boundary set is touched O(B) times:
• Once per vertex on the boundary of the region
• O(N/B²) boundary sets ⇒ O(N/B) I/Os
Planar Separator
Goal: Compute a separator S of size O(N/√h) whose
removal partitions G into subgraphs of size at most h.
Basic idea:
• Compute hierarchy of log(DB) graphs of
geometrically decreasing size using graph contraction
• Compute a separator of the smallest graph
• Undo the contractions and maintain the separator
while doing this
Assumption: M = Ω(h log² B)
Planar Separator
[Figure: graph hierarchy G0, G1, G2]
Planar Separator
Properties:
• All Gi are planar
• |Gi+1| ≤ |Gi|/2
• Every vertex in Gi+1 represents only a constant
number of vertices in Gi
• Every vertex in Gi+1 represents at most 2^(i+2) vertices
in G0
• r = log2(DB) graphs G0,…,Gr
⇒ |Gr| = O(N/(DB))
Planar Separator
[Figure: graph hierarchy G0, G1, G2]
Planar Separator
• Compute separator Sr of Gr:
• Sr partitions Gr into connected components
of size at most h log²(DB)
• Takes O(|Gr|) = O(N/B) I/Os [AD96]
Planar Separator
• Compute Si from Si+1:
• Let S′i be the set of vertices in Gi represented by
the vertices in Si+1
• Connected components of Gi – S′i have size at
most c·h log²(DB)
• Partition every connected component of size
more than h log²(DB) into components of size
at most h log²(DB) ⇒ separator Si
• Takes O(sort(|Gi|)) I/Os:
• Connected components O(sort(|Gi|))
• Partitioning happens in internal memory
• Total: O(sort(N)) I/Os
Planar Separator
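• Separator S0 partitions G0 into connected components
of size at most h log²(DB)
• Size of S0, where S″i denotes the separator vertices added at level i and every vertex of Gi represents at most 2^(i+1) vertices of G0:
\begin{align*}
|S_0| &\le \sum_{i=0}^{r} 2^{i+1}\,|S''_i| \\
      &= \sum_{i=0}^{r} 2^{i+1}\, O\bigl(|G_i| / (\sqrt{h}\,\log(DB))\bigr) \\
      &= \sum_{i=0}^{r} O\bigl(N / (\sqrt{h}\,\log(DB))\bigr) \\
      &= O\bigl(N/\sqrt{h}\bigr)
\end{align*}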
Planar Separator
• Compute a superset S of S0 so that no connected
component of G – S has size more than h:
• Partition every connected component of G – S0
separately in internal memory
• Total number of extra separator vertices is
O(N/√h)
• Extra cost: O(sort(N)) I/Os
Theorem: A separator S of size O(N/√h) whose
removal partitions G into subgraphs of size at most h
can be obtained in O(sort(N)) I/Os, provided that
M = Ω(h log² B).
Building the Graph Hierarchy
Properties:
• All Gi are planar
• |Gi+1| ≤ |Gi|/2
• Every vertex in Gi+1 represents only a constant
number of vertices in Gi
• Every vertex in Gi+1 represents at most 2^(i+2) vertices
in G0
• Build Gi+1 from Gi by
• Contracting edges
• Merging vertices of degree 2 with the same
neighbors
Building the Graph Hierarchy
Iterative approach:
• Extract set of edges that can be contracted
• Contract subset of these edges to reduce number of
vertices by a factor of two
• Repeat until no contractible edges remain
Problem:
• Standard graph contraction procedure may contract
too many vertices into a single vertex.
Building the Graph Hierarchy
Solution:
• Compute maximal matching of contractible subgraph
• Contract edges in the matching
New problem:
• We may not contract sufficient number of edges to
reduce number of vertices by a constant factor
Two-stage contraction:
• Contract maximal matching
• Contract edges between matched and unmatched
vertices
Building the Graph Hierarchy
Why is this two-stage approach good?
• No unmatched vertex remains in contractible
subgraph
• Every matched vertex represents at least two vertices
before the contraction
⇒ Size of graph reduces by a factor of two
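The two stages can be sketched as follows. This is an in-memory illustration with a greedy maximal matching; it deliberately ignores the bound on how many vertices a single vertex may represent, which the actual construction has to respect. The function name `two_stage_contraction` and the representative map `rep` are assumptions.

```python
def two_stage_contraction(vertices, edges):
    """Map every vertex to the representative it is contracted into."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    rep = {}
    # Stage 1: greedy maximal matching; each matched edge is contracted.
    for u, v in edges:
        if u != v and u not in rep and v not in rep:
            rep[u] = rep[v] = u
    matched = set(rep)
    # Stage 2: contract every unmatched vertex into a matched neighbor.
    # By maximality, all neighbors of an unmatched vertex are matched.
    for v in vertices:
        if v not in rep:
            target = next((w for w in adj[v] if w in matched), None)
            rep[v] = rep[target] if target is not None else v
    return rep


# Path 0 - 1 - 2 - 3 - 4
rep = two_stage_contraction(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)])
```

On the path example, five vertices collapse into two groups, so every non-isolated vertex ends up in a group of size at least two.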
If a single iteration takes O(sort(|Gi|)) I/Os, the
whole construction of Gi+1 from Gi takes
O(sort(|Gi|)) I/Os
A Single Contraction Phase
• Maximal matching can be computed and contracted
in O(sort(|H|)) I/Os, where H is the current
contractible subgraph
• Bipartite contraction:
• Takes O(sort(|H|)) I/Os using buffer tree as priority
queue
Building the Graph Hierarchy
Lemma: Graph Gi+1 can be constructed from Gi in
O(sort(|Gi|)) I/Os.
Corollary: The whole graph hierarchy can be built in
O(sort(|G0|)) = O(sort(N)) I/Os.
Planar DFS
s
Level 0
Level 1
Level 2
Planar DFS
s
Planar DFS
Observation: Every cycle in the i-th layer is a
boundary cycle of graph Gi.
Level > i
Level < i
Every bicomp of a layer is a cycle.
DFS in a Layer
Planar DFS
• DFS in a single layer Hi takes O(sort(|Hi|)) I/Os:
• Compute the bicomps
• Root the bicomp tree
• Remove one of the edges incident to parent
cutpoint in each cycle
Total I/O-complexity: O(sort(N))
Planar DFS
v
Gi
Planar DFS
r
Building the Face-on-Vertex Graph
Lower Bounds and Open Problems
• Lower bounds
• List ranking, BFS, DFS, and shortest paths
• Connected and biconnected components
• Open problems
Lower Bounds
Split Proximate Neighbors
1 2 3 4
5 6 7 8
7 8 1 3
5 6 4 2
3 3 6 6
5 5 2 2
7 7 1 1
4 4 8 8
Lower Bounds
Split Proximate Neighbors
Lemma: Split proximate neighbors requires
Ω(perm(N)) I/Os.
1 2 3 4 5 6 7 8
7 8 1 3 5 6 4 2
1 2 3 4
5 6 7 8
7 8 1 3
5 6 4 2
3 3 6 6
5 5 2 2
7 7 1 1
4 4 8 8
3 3 6 6
5 5 2 2
7 7 1 1
4 4 8 8
1 2 3 4
5 6 7 8
7 8 1 3
5 6 4 2
I(N)
O(scan(N))
I(N)
Total: O(I(N) + scan(N)) = O(I(N))
⇒ I(N) = Ω(perm(N))
Lower Bounds
List Ranking
• Consider general algorithms for weighted list ranking
• Algorithm is only allowed to use associativity of sum
operator
Algorithm can be made to have the following
property:
• For every vertex v, v and succ(v) are both in main
memory at some point during the course of the
algorithm
Note: The lower bound we show does not hold for
unweighted list ranking or weighted list ranking over
groups.
Lower Bounds
List Ranking
1 2 3 4
5 6 7 8
7 8 1 3
5 6 4 2
• When both copies of x are in main memory, move to
buffer of size B
• When buffer full, flush to disk
• Split proximate neighbors could be solved in
O(I(N) + scan(N)) I/Os
⇒ I(N) = Ω(perm(N))
Lower Bounds
List Ranking, BFS, DFS, and Shortest Paths
Theorem: List ranking requires Ω(perm(N)) I/Os.
• List ranking can be solved using BFS, DFS, or SSSP
from the head of the list.
Theorem: BFS, DFS, and SSSP require Ω(perm(N))
I/Os.
Note: Again, lower bound holds only for algorithms
that compute distances from source only by adding
path lengths.
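For orientation, the bounds scan(N), sort(N), and perm(N) used in these statements can be evaluated numerically. The sketch below uses the standard I/O-model definitions scan(N) = N/B, sort(N) = (N/B) log_{M/B}(N/B), and perm(N) = min(N, sort(N)), ignoring constant factors; the function names are illustrative.

```python
import math


def scan_io(N, B):
    """scan(N): number of blocks touched when reading N items sequentially."""
    return math.ceil(N / B)


def sort_io(N, M, B):
    """sort(N) = (N/B) * log_{M/B}(N/B), with the logarithm taken as at least 1."""
    n, m = N / B, M / B
    return n * max(1.0, math.log(n, m))


def perm_io(N, M, B):
    """perm(N) = min(N, sort(N)): permuting is never harder than sorting."""
    return min(N, sort_io(N, M, B))


big = perm_io(10**9, 10**6, 10**3)    # sort(N) is the smaller term here
tiny_blocks = perm_io(16, 4, 1)       # with B = 1, moving items one by one wins
```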
Lower Bounds
Segmented Duplicate Elimination
• Let P ≤ N ≤ P²
S: 17 20 19 22 | 18 19 23 20 | 19 22 20 18 | 20 23 17 19
(segments of length P/2 each)
• Elements drawn from interval [2P+1,3P]
• Construct Boolean array C[2P+1..3P] s.t.
C[i] = 1 iff i ∈ S
Proposition: Segmented duplicate elimination requires
Ω(perm(N)) I/Os.
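The required output can be made concrete on the example sequence. The sketch below computes the Boolean array C directly in memory, so it illustrates the problem statement rather than an I/O-efficient solution; P = 8 is inferred from the value range [2P+1, 3P] = [17, 24], and the function name `presence_array` is an assumption.

```python
def presence_array(S, P):
    """Return C as a list where C[i - (2P+1)] = 1 iff i occurs in S."""
    lo, hi = 2 * P + 1, 3 * P
    C = [0] * P
    for x in S:
        assert lo <= x <= hi, "elements must come from [2P+1, 3P]"
        C[x - lo] = 1
    return C


S = [17, 20, 19, 22, 18, 19, 23, 20, 19, 22, 20, 18, 20, 23, 17, 19]
C = presence_array(S, 8)   # entries for the values 17, 18, ..., 24
```

In the example, the values 21 and 24 never occur, so their entries in C are 0.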
Lower Bounds
Connected Components
17 20 19 22 18 19 23 20 19 22 20 18 20 23 17 19
S1
S2
S3
1
2
3
4
• Graph construction O(scan(N)) I/Os
• |V| = Θ(P), |E| = N
S4
17
18
19
20
21
22
23
24
Lower Bounds
Connected and Biconnected Components
Theorem: Computing the connected components of a
graph G = (V,E) requires Ω(perm(|E|)) I/Os.
Theorem: Computing the biconnected components of
a graph G = (V,E) requires Ω(perm(|E|)) I/Os.
More Classes of Sparse Graphs
• Grid graphs
• Separators: Size O(N/√h) in O(sort(N)) I/Os
• BFS/SSSP: O(sort(N))
• DFS: O(N/√B)
• Graphs of bounded treewidth
• Separators: O(N/h) in O(sort(N)) I/Os
• BFS/SSSP: O(sort(N))
• DFS: ???
Open Problems
• Optimal separators for grid graphs
• DFS
• Grid graphs
• Graphs of bounded treewidth
• Semi-external shortest paths
• Optimal connectivity
• Optimal BFS, DFS, and shortest paths or lower
bounds
• Directed graphs
• Topological sorting
• Strongly connected components