Cache-Oblivious Priority Queue and Graph Algorithm Applications

advertisement
I/O-Algorithms
Lars Arge
Spring 2012
March 6, 2012
I/O-algorithms
I/O-Model
D
Block I/O
M
• Parameters
N = # elements in problem instance
B = # elements that fits in disk block
M = # elements that fits in main memory
T = # output size in searching problem
• We often assume that M>B2
P
Lars Arge
• I/O: Movement of block between memory
and disk
2
I/O-algorithms
Fundamental Bounds
Internal
N
External
• Sorting:
• Permuting
N log N
N
B
N
log M B NB
min{ N , NB log M B
• Searching:
log 2 N
log B N
• Scanning:
Lars Arge
N
B
N
}
B
3
I/O-algorithms
Fundamental Data Structures
• B-trees: Node degree (B)  queries in O(log B N  T B)
– Rebalancing using split/fuse  updates in O(log B N )
• Weight-balanced B-tress: Weight rather than degree constraint
 Ω(w(v)) updates below v between rebalancing operations on v
• Persistent B-trees:
– Update in current version in O(log B N )
– Search in all previous versions in O(log B N  T B)
• Buffer trees
– Batching of operations to obtain O( B1 log M B NB ) bounds
 O( NB log M NB ) construction algorithms
B
Lars Arge
4
I/O-algorithms
Last time: Interval management
• Maintain N intervals with unique endpoints dynamically such that
stabbing query with point x can be answered efficiently
x
• Static solution: Persistent B-tree
– Linear space and O(log B N  T B) query
• Dynamic solution: External interval tree
– O(log B N ) update
Lars Arge
5
I/O-algorithms
Internal Interval Tree
• Base tree on endpoints – “slab” Xv associated with each node v
• Interval stored in highest node v where it contains midpoint of Xv
• Intervals Iv associated with v stored in
– Left slab list sorted by left endpoint (search tree)
– Right slab list sorted by right endpoint (search tree)
 Linear space and O(log N) update
Lars Arge
6
I/O-algorithms
Internal Interval Tree
x
• Query with x on left side of midpoint of Xroot
– Search left slab list left-right until finding non-stabbed interval
– Recurse in left child
 O(log N+T) query bound
Lars Arge
7
I/O-algorithms
Externalizing Interval Tree
• Natural idea:
– Block tree
– Use B-tree for slab lists
• Number of stabbed intervals in large slab list may be small (or zero)
– We can be forced to do I/O in each of O(log N) nodes
Lars Arge
8
I/O-algorithms
Externalizing Interval Tree
( B )
multislab
• Idea:
– Decrease fan-out to ( B )  height remains O(log B N )
– ( B ) slabs define (B) multislabs
– Interval stored in two slab lists (as before) and one multislab list
– Intervals in small multislab lists collected in underflow structure
– Query answered in v by looking at 2 slab lists and not O(log N)
Lars Arge
9
I/O-algorithms
External Interval Tree
• Linear space, O(log B N  T B) query, O(log B N ) update
• General solution techniques:
– Filtering: Charge part of query cost to output
– Bootstrapping:
* Use O(B2) size structure in each internal node
* Constructed using persistence
* Dynamic using global rebuilding
– Weight-balanced B-tree: Split/fuse in amortized O(1)
Lars Arge
10
I/O-algorithms
Last time: Three-Sided Range Queries
• Interval management: “1.5 dimensional” search
x1
x2
(x1,x2)
(x,x)
x
• More general 2d problem: Dynamic 3-sidede range searching
– Maintain set of points in plane such
that given query (q1, q2, q3), all points
q3
(x,y) with q1  x  q2 and y  q3 can
be found efficiently
q1
q2
• Linear space, O(logB N+T/B) query static solution using persistence
– Dynamic: External priority search tree
Lars Arge
11
I/O-algorithms
Internal Priority Search Tree
9
16.20
4
5,6
16
19,9
5
9,4
1
1,2
1
4
4,1
5
13
13,3
9
13
19
20,3
16
19
20
• Base tree on x-coordinates with nodes augmented with points
• Heap on y-coordinates
– Decreasing y values on root-leaf path
– (x,y) on path from root to leaf holding x
– If v holds point then parent(v) holds point
 Linear space and O(log N) update
Lars Arge
12
I/O-algorithms
Internal Priority Search Tree
9
16.20
4
4
5,6
19
4
16
19,9
5
9,4
1
1,2
1
4
4,1
5
13
13,3
9
13
19
20,3
16
19
20
• Query with (q1, q2, q3) starting at root v:
– Report point in v if satisfying query
– Visit both children of v if point reported
– Always visit child(s) of v on path(s) to q1 and q2
 O(log N+T) query
Lars Arge
13
I/O-algorithms
Externalizing Priority Search Tree
9
16.20
4
5,6
16
19,9
5
9,4
1
1,2
1
4
4,1
5
13
13,3
9
13
19
20,3
16
19
20
• Natural idea: Block tree
• Problem:
– O(log B N ) I/Os to follow paths to to q1 and q2
– But O(T) I/Os may be used to visit other nodes (“overshooting”)
 O(log B N  T ) query
Lars Arge
14
I/O-algorithms
Externalizing Priority Search Tree
9
16.20
4
5,6
16
19,9
5
9,4
1
1,2
1
4
4,1
5
13
13,3
9
13
19
20,3
16
19
20
• Solution idea:
– Store B points in each node 
* O(B2) points stored in each supernode
* B output points can pay for “overshooting”
– Bootstrapping:
* Store O(B2) points in each supernode in static structure
Lars Arge
15
I/O-algorithms
Summary/Conclusion: Priority Search Tree
• We have now discussed structures for special cases of twodimensional range searching
– Space: O(N/B)
q
q3
T
– Query: O(log B N  B)
– Updates: O(log B N )
q1
q
q2
• Cannot be obtained for general (4-sided) 2d range searching:
logB N
) space
– O(logcB N ) query requires Ω( NB log log
B
BN
– O ( NB ) space requires ( N B ) query
q4
q3
q1
Lars Arge
q2
16
I/O-algorithms
External Range Tree
• Base tree: Weight balanced tree with branching parameter Θ(log B N )
and leaf parameter B on x-coordinates

logB N
O(log logB N N )  O( log log
) height
N
B
B
• Points below each node stored in 4 linear space secondary structures:
– “Right” priority search tree
(log B N )
– “Left” priority search tree
– B-tree on y-coordinates
– Interval (priority search) tree

logB N
O( NB log log
)
N space
B
Lars Arge
B
17
I/O-algorithms
External Range Tree
• Secondary interval tree:
– Connect points in each slab in y-order
– Project obtained segments in y-axis
(log B N )
– Intervals stored in interval tree
* Interval augmented with pointer to corresponding points in ycoordinate B-tree in corresponding child node
Lars Arge
18
I/O-algorithms
External Range Tree
• Query with (q1, q2, q3 , q4) answered in top node with q1 and q2 in
different slabs v1 and v2
• Points in slab v1
– Found with 3-sided query in v1
(log B N )
using right priority search tree
• Points in slab v2
– Found with 3-sided query in v2
using left priority search tree
v1
v2
• Points in slabs between v1 and v2
– Answer stabbing query with q3 using interval tree
 first point above q3 in each of the O(log B N ) slabs
– Find points using y-coordinate B-tree in O(log B N ) slabs
Lars Arge
19
I/O-algorithms
External Range Tree
• Query analysis:
– O(log B N ) I/Os to find relevant node
– O(log B N  T B) I/Os to answer two 3-sided queries
– O(log B N  logB N B )  O(log B N ) I/Os to query interval tree
– O(log B N  T B) I/Os to traverse O(log B N ) B-trees

O(log B N  T B) I/Os
(log B N )
v1
Lars Arge
v2
20
I/O-algorithms
External Range Tree
• Insert:
– Insert x-coordinate in weight-balanced B-tree
* Split of v can be performed in O(w(v) log B w(v)) I/Os
log2B N
 O(
) I/Os
logB logB N
logB N
) nodes on one
– Update secondary structures in all O( log log
B
BN
root-leaf path
(log B N )
* Update priority search trees
* Update interval tree
* Update B-tree
log2B N
 O( log log N ) I/Os
B
B
• Delete:
v1
v2
– Similar and using global rebuilding
Lars Arge
21
I/O-algorithms
Summary: External Range Tree
log N
B
• 2d range searching in O( NB log log
) space
B
BN
– O(log B N  T B) I/O query
log2B N
– O( log log N ) I/O update
B
B
• Optimal among O(log B N  T B) query structures
q4
q3
q1
Lars Arge
q2
22
I/O-algorithms
kdB-tree
• kd-tree:
– Recursive subdivision of point-set into two half using
vertical/horizontal line
– Horizontal line on even levels, vertical on uneven levels
– One point in each leaf

Linear space and logarithmic height
Lars Arge
23
I/O-algorithms
kd-Tree: Query
• Query
– Recursively visit nodes corresponding to regions intersecting query
– Report point in trees/nodes completely contained in query
• Query analysis
– Horizontal line intersect Q(N) = 2+2Q(N/4) = O( N ) regions
– Query covers T regions
 O( N  T ) I/Os worst-case
Lars Arge
24
I/O-algorithms
kdB-tree
• kdB-tree:
– Stop subdivision when leaf contains between B/2 and B points
– BFS-blocking of internal nodes
• Query as before
– Analysis as before but each region now contains Θ(B) points

O( N B  T B) I/O query
Lars Arge
25
I/O-algorithms
Construction of kdB-tree
• Simple O( NB log 2 NB ) algorithm
– Find median of y-coordinates (construct root)
– Distribute point based on median
– Recursively build subtrees
– Construct BFS-blocking top-down
• Idea in improved O( NB log M B NB ) algorithm
– Construct log M B levels at a time using O(N/B) I/Os
Lars Arge
26
I/O-algorithms
Construction of kdB-tree
• Sort N points by x- and by y-coordinates using O( NB log M B NB ) I/Os
• Building log M B levels ( M B nodes) in O(N/B) I/Os:
1. Construct M B by M B grid
with N M B points in each slab
2. Count number of points in each
grid cell and store in memory
3. Find slab s with median x-coordinate
4. Scan slab s to find median x-coordinate and construct node
5. Split slab containing median x-coordinate and update counts
6. Recurse on each side of median x-coordinate using grid (step 3)
 Grid grows to M B  M B  M B  ( M B ) during algorithm
 Each node constructed in O( N /( M B  B)) I/Os
Lars Arge
27
I/O-algorithms
kdB-tree
• kdB-tree:
– Linear space
– Query in O( N B  T B) I/Os
– Construction in O( NB log M / B NB ) I/Os
– Point search in O(log B N ) I/Os
• Dynamic?
– Deletions relatively easily in O(log 2B N ) I/Os (partial rebuilding)
Lars Arge
28
I/O-algorithms
kdB-tree Insertion using Logarithmic Method
• Partition pointset S into subsets S0, S1, … Slog N, |Si| = 2i or |Si| = 0
• Build kdB-tree Di on Si
..................................
2
0
2

log N
1
2
2
2
log N
• Query: Query each Di 
O( 2 B  Ti B )  O( N B  T B)
i 0
i 1 j
i
• Insert: Find first empty Di and construct Di out of 1   j 0 2  2
elements in S0,S1, … Si-1
i
– O( 2B log M / B 2B ) I/Os  O( B1 log M / B NB ) per moved point
– Point moved O(log N) times
2
 O( B1 log M / B NB log N )  O(log B N ) I/Os amortized
i
Lars Arge
i
29
I/O-algorithms
kdB-tree Insertion and Deletion
• Insert: Use logarithmic method ignoring deletes
• Delete: Simply delete point p from relevant Di
– i can be calculated based on # insertions since p was inserted
– # insertions calculated by storing insertion number of each point
in separate B-tree

..................................
extra
update
cost
O(log B N )
2
0
2
1
2
2
2
log N
• To maintain O(log N) structures Di
– Perform global rebuild after every Θ(N) updates

O( B1 log M / B NB )  O(log B N ) extra update cost
Lars Arge
30
I/O-algorithms
Summary: kdB-tree
• 2d range searching in O(N/B) space
– Query in O( N B  T B) I/Os
– Construction in O( NB log M / B NB ) I/Os
– Updates in O(log 2B N ) I/Os
q4
q3
• Optimal query among linear space structures
Lars Arge
q1
q2
31
I/O-algorithms
O-Tree Structure
• O-tree:
– B-tree on ( N B log B N ) vertical slabs
– B-tree on ( N B log B N ) horizontal slabs in each vertical slab
2
– kdB-tree on ( N N
2 )  ( B log B N ) points in each leaf
(
log N )
B
B
N
N
B
B
log B N
log 2B N
B log 2B N
Lars Arge
32
I/O-algorithms
O-Tree Query
• Perform rangesearch with q1 and q2 in vertical B-tree
– Query all kdB-trees in leaves of two horizontal B-trees with xinterval intersected but not spanned by query
– Perform rangesearch with q3 and q4 in horizontal B-trees with xinterval spanned by query
* Query all kdB-trees with range intersected by query
N
N
B
B
log B N
log 2B N
B log 2B N
Lars Arge
33
I/O-algorithms
O-Tree Query Analysis
• Vertical B-tree query: O(log B ( N B log B N ))  O( N B )
• Query of all kdB-trees in leaves of two horizontal B-trees:
O( N B log B N )  O( B log 2B N B )  O( TB )  O( N B  TB )
• Query O ( N B log B N ) horizontal B-trees:
O( N B log B N )  O(log B ( N B log B N ))  O( N B )
• Query 2  O( N B log B N ) kdB-trees not completely in query
2  O( N B log B N )  O( B log 2B N B )  O( TB )  O( N B  TB )
• Query in kdB-trees completely
contained in query: O( TB )

O( N B  TB ) I/Os
Lars Arge
34
I/O-algorithms
O-Tree Update
• Insert:
– Search in vertical B-tree: O(log B N ) I/Os
– Search in horizontal B-tree: O(log B N ) I/Os
– Insert in kdB-tree: O(log2B ( B log 2B N ))  O(log B N ) I/Os
• Use global rebuilding when structures grow too big/small
– B-trees not contain ( N B log B N ) elements
– kdB-trees not contain ( B log 2B N ) elements

O(log B N ) I/Os
• Deletes can be handled
in O(log B N ) I/Os similarly
Lars Arge
35
I/O-algorithms
Summary: O-Tree
• 2d range searching in linear space
– O( N B  TB ) I/O query
– O(log B N ) I/O update
• Optimal among structures
using linear space
q4
q3
q1
q2
• Can easily be extended to work in d-dimensions
1 1 d
T
N
O
((
)

)
with optimal query bound
B
B
Lars Arge
36
I/O-algorithms
Summary/Conclusion: 3 and 4-sided Queries
• 3-sided 2d range searching: External priority search tree
– O(log B N  T B) query, O ( NB ) space, O(log B N ) update
q4
q3
q3
q1
q2
q1
q2
• General (4-sided) 2d range searching:
logB N
N
T
O
(log
N

)
O
(
) space,
– External2 range tree:
B
B query,
B logB logB N
logB N
O( log log
) update
B
BN
– O-tree: O ( N B  T B ) query, O ( NB ) space, O(log B N ) update
Lars Arge
37
I/O-algorithms
Summary/Conclusion: Tools and Techniques
• Tools:
– B-trees
– Persistent B-trees
– Buffer trees
– Logarithmic method
– Weight-balanced B-trees
– Global rebuilding
• Techniques:
– Bootstrapping
– Filtering
(x,x)
q3
q1
q4
q3
q1
Lars Arge
q2
q2
38
I/O-algorithms
References
• External Memory Geometric Data Structures
Lecture notes by Lars Arge.
– Section 8-9
Lars Arge
39
Download