Linear Time Selection These lecture slides are adapted from CLRS 10.3.

advertisement
Linear Time Selection
These lecture slides are adapted
from CLRS 10.3.
Princeton University • COS 423 • Theory of Algorithms • Spring 2002 • Kevin Wayne
Where to Build a Road?
Given (x,y) coordinates of N houses, where should you build road
parallel to x-axis to minimize construction cost of building driveways?
1
2
3
4
5
6
8
7
9
10
11
12
2
Where to Build a Road?
Given (x,y) coordinates of N houses, where should you build road
parallel to x-axis to minimize construction cost of building driveways?
n1 = nodes on top.


n2 = nodes on bottom.
1
2
Decreases total cost by (n2-n1) 
3
4

5
6
8
7
9
10
11
12
3
Where to Build a Road?
Given (x,y) coordinates of N houses, where should you build road
parallel to x-axis to minimize construction cost of building driveways?
n1 = nodes on top.


n2 = nodes on bottom.
1
2
3
4
5
6
8
7
9
10
11
12
4
Where to Build a Road?
Given (x,y) coordinates of N houses, where should you build road
parallel to x-axis to minimize construction cost of building driveways?
n1 = nodes on top.


n2 = nodes on bottom.
1
2
3
4
5
6
8
7
9
10
11
12
5
Where to Build a Road?
Given (x,y) coordinates of N houses, where should you build road
parallel to x-axis to minimize construction cost of building driveways?
Solution: put street at median of y coordinates.
1
2
3
4
5
6
8
7
9
10
11
12
6
Order Statistics
Given N linearly ordered elements, find ith smallest element.
Min: i = 1


Max: i = N

Median: i = (N+1) / 2 and (N+1) / 2

O(N) for min or max.

O(N log N) comparisons by sorting.

O(N log i) with heaps.
Can we do in worst-case O(N) comparisons?
Surprisingly, yes. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973)

Assumption to make presentation cleaner.
All items have distinct values.

7
Select
Similar to quicksort, but throw away useless "half" at each iteration.
Select ith smallest element from a1, a2, . . . , aN.

Select (ith, N, a1, a2, . . . , aN)
x  Partition(N, a1, a2, ..., aN)
k  rank(x)
if (i == k)
return x
x = partition element
Want to choose x so that
x is (roughly) the ith largest.
else if (i < k)
b[]  all items of a[] less than x
return Select(ith, k-1, b1, b2, ..., bk-1)
else if (i > k)
c[]  all items of a[] greater than x
return Select((i-k)th, N-k, c1, c2, ..., cN-k)
8
Partition
Partition().

Divide N elements into N/5 groups of 5 elements each, plus extra.
29
10
38
37
02
55
18
24
34
35
36
22
44
52
11
53
12
13
43
20
04
27
28
23
06
26
40
19
01
46
31
49
08
14
09
05
03
54
30
48
47
32
51
21
45
39
50
15
25
16
41
17
33
07
N = 54
9
Partition
Partition().

Divide N elements into N/5 groups of 5 elements each, plus extra.

Brute force sort each of the 5-element groups.
29
14
10
09
38
05
37
03
02
55
12
18
01
24
17
34
20
35
04
36
22
44
10
52
06
11
53
25
12
16
13
43
24
20
31
04
07
27
28
23
06
38
26
15
40
19
01
18
46
43
31
32
49
35
08
14
29
09
39
05
50
03
26
54
53
30
48
41
47
46
32
33
51
49
21
45
39
44
50
52
15
37
25
54
16
53
41
48
17
47
33
34
07
51
10
Partition
Partition().

Divide N elements into N/5 groups of 5 elements each, plus extra.

Brute force sort each of the 5-element groups.

Find x = "median of medians" by Select() on N/5 medians.
14
09
05
03
02
12
01
17
20
04
36
22
10
06
11
25
16
13
24
31
07
27
28
23
38
15
40
19
18
43
32
35
08
29
39
50
26
53
30
41
46
33
49
21
45
44
52
37
54
53
48
47
34
51
11
Select
Select (ith, N, a1, a2, . . . , aN)
if (N is small) use mergesort
Divide a[] into groups of 5, and let
m1, m2, ..., mN/5 be list of medians.
x  Select(N/10, m1, m2, ..., mN/5)
median of medians
k  rank(x)
if (i == k)
return x
// Case 1
else if (i < k) // Case 2
b[]  all items of a[] less than x
return Select(ith, k-1, b1, b2, ..., bk-1)
else if (i > k) // Case 3
c[]  all items of a[] greater than x
return Select((i-k)th, N-k, c1, c2, ..., cN-k)
12
Selection Analysis
Crux of proof: delete roughly 30% of elements by partitioning.
At least 1/2 of 5 element medians  x

–
median of
medians
at least  N / 5 / 2 = N / 10 medians  x
14
09
05
03
02
12
01
17
20
04
36
22
10
06
11
25
16
13
24
31
07
27
28
23
38
15
40
19
18
43
32
35
08
29
39
50
26
53
30
41
46
33
49
21
45
44
52
37
54
53
48
47
34
51
13
Selection Analysis
Crux of proof: delete roughly 30% of elements by partitioning.
At least 1/2 of 5 element medians  x

at least  N / 5 / 2 = N / 10 medians  x
At least 3 N / 10 elements  x.
–

median of
medians
14
09
05
03
02
12
01
17
20
04
36
22
10
06
11
25
16
13
24
31
07
27
28
23
38
15
40
19
18
43
32
35
08
29
39
50
26
53
30
41
46
33
49
21
45
44
52
37
54
53
48
47
34
51
14
Selection Analysis
Crux of proof: delete roughly 30% of elements by partitioning.
At least 1/2 of 5 element medians  x


at least  N / 5 / 2 = N / 10 medians  x
At least 3 N / 10 elements  x.

At least 3 N / 10 elements  x.
–
median of
medians
14
09
05
03
02
12
01
17
20
04
36
22
10
06
11
25
16
13
24
31
07
27
28
23
38
15
40
19
18
43
32
35
08
29
39
50
26
53
30
41
46
33
49
21
45
44
52
37
54
53
48
47
34
51
15
Selection Analysis
Crux of proof: delete roughly 30% of elements by partitioning.
At least 1/2 of 5 element medians  x


at least  N / 5 / 2 = N / 10 medians  x
At least 3 N / 10 elements  x.

At least 3 N / 10 elements  x.
–
 Select() called recursively (Case 2 or 3) with at most
N - 3 N / 10 elements.
C(N) = # comparisons on a file of size N.
C (N ) 
C  N / 5  


median of medians
 C  (N  3 N / 10  ) 



recursive select
O (N )



insertion sort
Now, solve recurrence.
Apply master theorem?


Assume N is a power of 2?

Assume C(N) is monotone non-decreasing?
16
Selection Analysis
Analysis of selection recurrence.
T(N) = # comparisons on a file of size  N.


T(N) is monotone, but C(N) is not!
 20cN
T( N )  
 T (  N / 5 )  T ( N  3  N / 10 )  cN
if N  50
otherwise
Claim: T(N)  20cN.
Base case: N < 50.


Inductive hypothesis: assume true for 1, 2, . . . , N-1.

Induction step: for N  50, we have:
T (N )  T ( N / 5  )  T (N  3 N / 10  )  cN
 20c N / 5   20c  N  3 N / 10    cN
 20c (N / 5)  20c (N )  20c (N / 4)  cN
For n  50,
3 N / 10  N / 4.
 20cN
17
Linear Time Selection Postmortem
Practical considerations.
Constant (currently) too large to be useful.


Practical variant: choose random partition element.
–

O(N) expected running time ala quicksort.
Open problem: guaranteed O(N) with better constant.
Quicksort.
Worst case O(N log N) if always partition on median.


Justifies practical variants: median-of-3, median-of-5.
18
Download