I/O-Algorithms
Lars Arge
Spring 2012
April 17, 2012

I/O-Model

[Figure: processor P and main memory M, connected to disk D by block I/O]

• Parameters
  N = # elements in problem instance
  B = # elements that fit in a disk block
  M = # elements that fit in main memory
  T = output size in searching problems
• We often assume that $M > B^2$
• I/O: Movement of a block between memory and disk

Fundamental Bounds

                 Internal        External
• Scanning:      $N$             $\frac{N}{B}$
• Sorting:       $N \log N$      $\frac{N}{B} \log_{M/B} \frac{N}{B}$
• Permuting:     $N$             $\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\}$
• Searching:     $\log_2 N$      $\log_B N$

• Note:
  – Linear I/O: $O(N/B)$
  – Permuting not linear
  – Permuting and sorting bounds are equal in all practical cases
  – B factor VERY important: $\frac{N}{B} < \frac{N}{B} \log_{M/B} \frac{N}{B} \ll N$
  – Cannot sort optimally with search tree

Scalability Problems: Block Access Matters

• Example: Traversing linked list (List ranking)
  – Array size N = 10 elements
  – Disk block size B = 2 elements
  – Main memory size M = 4 elements (2 blocks)

  Algorithm 1 (layout 1 5 2 6 3 8 9 4 7 10): N = 10 I/Os
  Algorithm 2 (layout 1 2 10 9 5 6 3 4 8 7): N/B = 5 I/Os

• Difference between N and N/B is large since block size is large
  – Example: $N = 256 \times 10^6$, $B = 8000$, 1 ms disk access time
    N I/Os take $256 \times 10^3$ sec = 4266 min = 71 hr
    N/B I/Os take 256/8 sec = 32 sec

List Ranking

• Problem:
  – Given N-vertex linked list stored in array
  – Compute rank (number in list) of each vertex

[Figure: example linked list stored in an array]

• One of the simplest graph problems one can think of
• Straightforward O(N) internal algorithm
  – Also uses O(N) I/Os in external memory
• Much harder to get $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ external algorithm

List Ranking

• We will solve a more general problem:
  – Given N-vertex linked list with edge-weights stored in array
  – Compute sum of weights (rank) from start for each vertex
• List ranking: All edge weights one

[Figure: the example list with weight 1 on every edge]

• Note: Weight stored in array entry together with edge (next vertex)
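The straightforward internal algorithm and the N = 10, B = 2, M = 4 example can be tried out with a small Python sketch. It ranks a weighted list by pointer chasing and counts block transfers for the resulting access sequence. The names `rank_list`, `count_ios` and `from_layout` are hypothetical, and two things are assumptions not stated on the slides: memory uses LRU block replacement, and the head gets rank 0 (add 1 for "number in list").

```python
from collections import OrderedDict

def rank_list(succ, weight, head):
    """Weighted list ranking by pointer chasing (internal O(N) algorithm).

    succ[i]   = array index of the successor of entry i (None at the tail)
    weight[i] = weight of the edge leaving entry i
    Returns (rank, order): rank[i] is the sum of edge weights from the
    head to entry i, order is the sequence of array indices visited.
    """
    rank = [None] * len(succ)
    order = []
    v, r = head, 0
    while v is not None:
        rank[v] = r          # assumption: the head has rank 0
        order.append(v)
        r += weight[v]
        v = succ[v]
    return rank, order

def count_ios(order, B, M):
    """Count block reads for an access sequence of array indices.

    Assumption: memory holds M // B blocks and evicts the least
    recently used block (the slides do not fix a replacement policy).
    """
    cache = OrderedDict()
    ios = 0
    for i in order:
        block = i // B
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            ios += 1                       # one I/O to fetch the block
            cache[block] = True
            if len(cache) > M // B:
                cache.popitem(last=False)  # evict least recently used
    return ios

def from_layout(layout):
    """Build successor pointers from a layout such as 1 5 2 6 ...,
    where layout[i] is the list position of the vertex stored at index i."""
    pos = {r: i for i, r in enumerate(layout)}
    succ = [pos.get(r + 1) for r in layout]  # None for the last vertex
    return succ, pos[1]

for layout in ([1, 5, 2, 6, 3, 8, 9, 4, 7, 10],
               [1, 2, 10, 9, 5, 6, 3, 4, 8, 7]):
    succ, head = from_layout(layout)
    rank, order = rank_list(succ, [1] * len(layout), head)
    print(layout, "->", count_ios(order, B=2, M=4), "I/Os")
```

Under these assumptions the first layout costs 10 I/Os and the second costs 5, matching the N and N/B counts in the example. The ranking code is the same in both runs; only the placement of vertices in blocks changes.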
List Ranking

[Figure: example weighted list illustrating the algorithm]

• Algorithm:
  1. Find and mark independent set of vertices
  2. “Bridge-out” independent set: Add new edges
  3. Recursively rank resulting list
  4. “Bridge-in” independent set: Compute rank of independent set
• Steps 1, 2 and 4 in $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os
• Independent set of size $\alpha N$ for $0 < \alpha \le 1$
  ⇒ $T(N) = T((1-\alpha)N) + O(\frac{N}{B} \log_{M/B} \frac{N}{B}) = O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os

List Ranking: Bridge-out/in

[Figure: example list with independent set vertices bridged out]

• Obtain information (edge or rank) of successor
  – Make copy of original list
  – Sort original list by successor id
  – Scan original and copy together to obtain successor information
  – Sort modified original list by id
  ⇒ $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/Os

List Ranking: Independent Set

• Easy to design $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ randomized algorithm:
  – Scan list and flip a coin for each vertex
  – Independent set is the vertices with heads whose successor has tails
  ⇒ Independent set of expected size N/4

[Figure: example list with coin flips]

• Deterministic algorithm:
  – 3-color vertices (no vertex has the same color as its predecessor/successor)
  – Independent set is the vertices with the most popular color
  ⇒ Independent set of size at least N/3
• $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ 3-coloring ⇒ $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ I/O algorithm

List Ranking: 3-coloring

• Algorithm:
  – Consider forward and backward lists (heads/tails in two lists)
  – Color forward lists (except tail) alternately red and blue
  – Color backward lists (except tail) alternately green and blue
  ⇒ 3-coloring

[Figure: example list split into forward and backward lists]

List Ranking: Forward List Coloring

• Identify heads and tails
• For each head, insert red element in priority-queue (priority = position)
• Repeatedly:
  – Extract minimal element from queue
  – Access and color corresponding element in list
  – Insert opposite color element corresponding to successor in queue

[Figure: example list during forward coloring]

• Scan of list
• O(N) priority-queue operations
  ⇒ $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$
I/Os

Summary: List Ranking

• Simplest graph problem: Traverse linked list

[Figure: example linked list stored in an array]

• Very easy O(N) algorithm in internal memory
• Much more difficult $O(\frac{N}{B} \log_{M/B} \frac{N}{B})$ external memory
  – Finding independent set via 3-coloring
  – Bridging vertices in/out
• Permuting bound $O(\min\{N, \frac{N}{B} \log_{M/B} \frac{N}{B}\})$ best possible
  – Also true for other graph problems

Summary: List Ranking

• External list ranking algorithm similar to PRAM algorithm
  – Sometimes external algorithms by “PRAM algorithm simulation”
• Forward list coloring algorithm example of “time forward processing”
  – Use external priority-queue to send information “forward in time” to vertices to be processed later

[Figure: example list during forward coloring]

Algorithms on Trees

TBD

References

• External-Memory Graph Algorithms
  Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. Proc. SODA'95
  – Section 3-6
• I/O-Efficient Graph Algorithms
  Norbert Zeh. Lecture notes
  – Section 2-4
• Cache-Oblivious Priority Queue and Graph Algorithm Applications
  L. Arge, M. Bender, E. Demaine, B. Holland-Minkley and I. Munro. SICOMP, 36(6), 2007
  – Section 3.1-3.2
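Worked sketch: List Ranking by contraction

The four steps of the list ranking algorithm (independent set, bridge-out, recursive ranking, bridge-in) can be sketched in memory as follows. This is a minimal illustration of the control flow only, not an I/O-efficient implementation: dictionary lookups stand in for the sort and scan passes, and the independent set comes from the randomized coin-flip rule rather than 3-coloring. The name `contract_rank`, the fixed seed, the choice to keep the head out of the independent set, and the convention that the head has rank 0 are all assumptions made for this example.

```python
import random

def contract_rank(succ, weight, head, rng=None):
    """Rank a weighted linked list by independent-set contraction.

    succ[v]   = successor of vertex v (None at the tail)
    weight[v] = weight of the edge leaving v (unused for the tail)
    Returns a dict: vertex -> sum of edge weights from the head.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed, for a repeatable illustration

    # Base case: a tiny list is ranked by direct pointer chasing.
    if len(succ) <= 2:
        rank, v, r = {}, head, 0
        while v is not None:
            rank[v] = r
            r += weight[v]
            v = succ[v]
        return rank

    # Step 1: coin flips. A vertex joins the independent set if it has
    # heads and its successor has tails. Assumption: the head is kept
    # out so the contracted list starts at the same vertex.
    indep = set()
    while not indep:  # retry until at least one vertex is selected
        heads = {v: rng.random() < 0.5 for v in succ}
        indep = {v for v in succ
                 if v != head and heads[v]
                 and succ[v] is not None and not heads[succ[v]]}

    # Step 2: bridge-out. A predecessor of a selected vertex s gets a
    # new edge that skips s and carries the sum of both edge weights.
    new_succ, new_weight = {}, {}
    for v in succ:
        if v in indep:
            continue
        s = succ[v]
        if s in indep:
            new_succ[v] = succ[s]
            new_weight[v] = weight[v] + weight[s]
        else:
            new_succ[v] = s
            new_weight[v] = weight[v]

    # Step 3: recursively rank the shorter list.
    rank = contract_rank(new_succ, new_weight, head, rng)

    # Step 4: bridge-in. A selected vertex gets the rank of its
    # predecessor plus the weight of the original edge between them.
    for v in succ:
        s = succ[v]
        if s in indep:
            rank[s] = rank[v] + weight[v]
    return rank
```

Because a selected vertex has heads and its successor has tails, no two selected vertices are adjacent, so each bridged edge skips exactly one vertex. In the external-memory version, each lookup of a successor or predecessor in steps 2 and 4 is done with the copy, sort and scan passes from the bridge-out/in slide.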