Linear Time Selection These lecture slides are adapted from CLRS 10.3. Princeton University • COS 423 • Theory of Algorithms • Spring 2002 • Kevin Wayne Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? 1 2 3 4 5 6 8 7 9 10 11 12 2 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 2 Decreases total cost by (n2-n1) 3 4 5 6 8 7 9 10 11 12 3 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 2 3 4 5 6 8 7 9 10 11 12 4 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? n1 = nodes on top. n2 = nodes on bottom. 1 2 3 4 5 6 8 7 9 10 11 12 5 Where to Build a Road? Given (x,y) coordinates of N houses, where should you build road parallel to x-axis to minimize construction cost of building driveways? Solution: put street at median of y coordinates. 1 2 3 4 5 6 8 7 9 10 11 12 6 Order Statistics Given N linearly ordered elements, find ith smallest element. Min: i = 1 Max: i = N Median: i = (N+1) / 2 and (N+1) / 2 O(N) for min or max. O(N log N) comparisons by sorting. O(N log i) with heaps. Can we do in worst-case O(N) comparisons? Surprisingly, yes. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973) Assumption to make presentation cleaner. All items have distinct values. 7 Select Similar to quicksort, but throw away useless "half" at each iteration. Select ith smallest element from a1, a2, . . . , aN. Select (ith, N, a1, a2, . . . , aN) x Partition(N, a1, a2, ..., aN) k rank(x) if (i == k) return x x = partition element Want to choose x so that x is (roughly) the ith largest. else if (i < k) b[] all items of a[] less than x return Select(ith, k-1, b1, b2, ..., bk-1) else if (i > k) c[] all items of a[] greater than x return Select((i-k)th, N-k, c1, c2, ..., cN-k) 8 Partition Partition(). Divide N elements into N/5 groups of 5 elements each, plus extra. 29 10 38 37 02 55 18 24 34 35 36 22 44 52 11 53 12 13 43 20 04 27 28 23 06 26 40 19 01 46 31 49 08 14 09 05 03 54 30 48 47 32 51 21 45 39 50 15 25 16 41 17 33 07 N = 54 9 Partition Partition(). Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. 29 14 10 09 38 05 37 03 02 55 12 18 01 24 17 34 20 35 04 36 22 44 10 52 06 11 53 25 12 16 13 43 24 20 31 04 07 27 28 23 06 38 26 15 40 19 01 18 46 43 31 32 49 35 08 14 29 09 39 05 50 03 26 54 53 30 48 41 47 46 32 33 51 49 21 45 39 44 50 52 15 37 25 54 16 53 41 48 17 47 33 34 07 51 10 Partition Partition(). Divide N elements into N/5 groups of 5 elements each, plus extra. Brute force sort each of the 5-element groups. Find x = "median of medians" by Select() on N/5 medians. 14 09 05 03 02 12 01 17 20 04 36 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51 11 Select Select (ith, N, a1, a2, . . . , aN) if (N is small) use mergesort Divide a[] into groups of 5, and let m1, m2, ..., mN/5 be list of medians. x Select(N/10, m1, m2, ..., mN/5) median of medians k rank(x) if (i == k) return x // Case 1 else if (i < k) // Case 2 b[] all items of a[] less than x return Select(ith, k-1, b1, b2, ..., bk-1) else if (i > k) // Case 3 c[] all items of a[] greater than x return Select((i-k)th, N-k, c1, c2, ..., cN-k) 12 Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians x – median of medians at least N / 5 / 2 = N / 10 medians x 14 09 05 03 02 12 01 17 20 04 36 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51 13 Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians x at least N / 5 / 2 = N / 10 medians x At least 3 N / 10 elements x. – median of medians 14 09 05 03 02 12 01 17 20 04 36 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51 14 Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians x at least N / 5 / 2 = N / 10 medians x At least 3 N / 10 elements x. At least 3 N / 10 elements x. – median of medians 14 09 05 03 02 12 01 17 20 04 36 22 10 06 11 25 16 13 24 31 07 27 28 23 38 15 40 19 18 43 32 35 08 29 39 50 26 53 30 41 46 33 49 21 45 44 52 37 54 53 48 47 34 51 15 Selection Analysis Crux of proof: delete roughly 30% of elements by partitioning. At least 1/2 of 5 element medians x at least N / 5 / 2 = N / 10 medians x At least 3 N / 10 elements x. At least 3 N / 10 elements x. – Select() called recursively (Case 2 or 3) with at most N - 3 N / 10 elements. C(N) = # comparisons on a file of size N. C (N ) C N / 5 median of medians C (N 3 N / 10 ) recursive select O (N ) insertion sort Now, solve recurrence. Apply master theorem? Assume N is a power of 2? Assume C(N) is monotone non-decreasing? 16 Selection Analysis Analysis of selection recurrence. T(N) = # comparisons on a file of size N. T(N) is monotone, but C(N) is not! 20cN T( N ) T ( N / 5 ) T ( N 3 N / 10 ) cN if N 50 otherwise Claim: T(N) 20cN. Base case: N < 50. Inductive hypothesis: assume true for 1, 2, . . . , N-1. Induction step: for N 50, we have: T (N ) T ( N / 5 ) T (N 3 N / 10 ) cN 20c N / 5 20c N 3 N / 10 cN 20c (N / 5) 20c (N ) 20c (N / 4) cN For n 50, 3 N / 10 N / 4. 20cN 17 Linear Time Selection Postmortem Practical considerations. Constant (currently) too large to be useful. Practical variant: choose random partition element. – O(N) expected running time ala quicksort. Open problem: guaranteed O(N) with better constant. Quicksort. Worst case O(N log N) if always partition on median. Justifies practical variants: median-of-3, median-of-5. 18