Randomized Algorithms for Selection and Sorting

Analysis of Algorithms
Prepared by
John Reif, Ph.D.
Randomized Algorithms for
Selection and Sorting
a) Randomized Sampling
b) Selection by Randomized Sampling
c) Sorting by Randomized Splitting:
Quicksort and Multisample Sorts
Readings
• Main Reading Selection:
– CLR, Chapter 9
Comparison Problems
• Input:
set X of N distinct keys
total ordering < over X
• Problems:
1) For each key x ∈ X,
rank(x,X) = |{x’ ∈ X | x’ < x}| + 1
2) For each index i ∈ {1, …, N},
select(i,X) = the key x ∈ X where i = rank(x,X)
3) Sort(X) = (x1, x2, …, xN) where xi = select(i,X)
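The three problems above can be stated directly as naive, quadratic-time reference implementations; the helper names below are mine, not from the notes:

```python
def rank(x, X):
    """rank(x, X) = |{x' in X : x' < x}| + 1."""
    return sum(1 for xp in X if xp < x) + 1

def select(i, X):
    """select(i, X) = the key x in X with rank(x, X) = i (1-indexed)."""
    for x in X:
        if rank(x, X) == i:
            return x
    raise ValueError("index out of range")

def sort_by_select(X):
    """Sort(X) = (select(1, X), ..., select(N, X))."""
    return [select(i, X) for i in range(1, len(X) + 1)]
```

The rest of the notes develop randomized algorithms that compute the same answers with far fewer comparisons.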
Randomized Comparison Tree
Model
Algorithm Samplerank_s(x,X)
begin
Let S be a random sample of X−{x} of size s
output 1 + (N/s)·[rank(x,S) − 1]
end
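A minimal Python sketch of Samplerank_s; the list representation and the `rng` parameter (for reproducibility) are my assumptions:

```python
import random

def samplerank(x, X, s, rng=random):
    """Estimate rank(x, X) from a random sample S of X - {x} of size s."""
    others = [y for y in X if y != x]
    S = rng.sample(others, s)                    # random sample of X - {x}
    rank_in_S = sum(1 for y in S if y < x) + 1   # rank(x, S)
    N = len(X)
    return 1 + (N / s) * (rank_in_S - 1)         # scale sample rank up to X
```

For the minimum key no sampled key is smaller, so the estimate is exactly 1; for the maximum all s sampled keys are smaller, so the estimate is 1 + N.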
Algorithm Samplerank s(x,X)
(cont’d)
• Lemma 1
– The expected value of Samplerank_s(x,X) is rank(x,X)
Algorithm Samplerank s(x,X)
(cont’d)
Proof
Let k = rank(x,X). For a random y ∈ X,
Prob(y < x) = (k−1)/N
Hence, for S a random sample of X−{x} of size s,
E[rank(x,S)] = 1 + s·(k−1)/N
Solving for k, we get
rank(x,X) = k = 1 + (N/s)·E[rank(x,S) − 1] = E[Samplerank_s(x,X)]
More Precise Bounds on
Randomized Sampling
• Let S be a random sample of X of size s
• Let ri = rank(select(i,S), X)
Lemma 2
Prob( | ri − i·N/(s+1) | > c·(N/s)·√(α·s·log N) ) ≤ N^(−α)
More Precise Bounds on
Randomized Sampling (cont’d)
• Proof
We can bound ri by a Beta distribution, implying
mean(ri) = i·N/(s+1)
Var(ri) ≤ i·(s−i+1)·N² / ( (s+1)²·(s+2) )
• Weak bounds follow from Chebyshev’s inequality
• The tighter bounds follow from Chernoff bounds
Subdivision by Random Sampling
• Let S be a random sample of X of size s
• Let k1, k2, …, ks be the elements of S in sorted order
• These elements subdivide X into s+1 subsets:
X1 = {x ∈ X | x ≤ k1}
X2 = {x ∈ X | k1 < x ≤ k2}
X3 = {x ∈ X | k2 < x ≤ k3}
…
Xs+1 = {x ∈ X | x > ks}
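The subdivision above can be sketched in Python; routing each key to its block with `bisect` is my implementation choice:

```python
import bisect
import random

def subdivide(X, s, rng=random):
    """Split X into the s+1 blocks X_1..X_{s+1} determined by the sorted
    keys k_1 < ... < k_s of a random sample S, with boundaries inclusive
    on the right as above: block j holds x with k_j < x <= k_{j+1}."""
    S = sorted(rng.sample(X, s))
    blocks = [[] for _ in range(s + 1)]
    for x in X:
        # bisect_left gives the index of the first sample key >= x,
        # which is exactly x's block index
        blocks[bisect.bisect_left(S, x)].append(x)
    return blocks
```

By construction every key of block j is smaller than every key of block j+1, so sorting each block and concatenating yields sorted order.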
Subdivision by Random Sampling
(cont’d)
• How even are these subdivisions?
Subdivision by Random Sampling
(cont’d)
• Lemma 3
If a random sample S of X is of size s and X is of size N,
then S divides X into subsets each of size ≤ α·((N−1)/s)·ln(N),
with prob ≥ 1 − N^(−α)
Subdivision by Random Sampling
(cont’d)
Proof
• The number of (s+1)-partitions of X is
(N−1 choose s) ~ (N−1)^s / s!
• The number of partitions of X with one block of size ≥ v is
(N−v−1 choose s) ~ (N−v−1)^s / s!
Subdivision by Random Sampling
(cont’d)
• So the probability of a random (s+1)-partition having a block of size ≥ v is
(N−v−1 choose s) / (N−1 choose s) ~ ( (N−v−1)/(N−1) )^s = (1 − 1/Y)^s for Y = (N−1)/v
(1 − 1/Y)^s = ( (1 − 1/Y)^Y )^(s/Y) ~ e^(−s/Y) = e^(−s·v/(N−1)) ≤ N^(−α),
since (1 − 1/Y)^Y ≤ e^(−1),
if v = α·((N−1)/s)·ln N
s
Randomized Algorithms for
Selection
• “canonical selection algorithm”
• Algorithm canselect(i,X): input set X of N keys, index i ∈ {1, …, N}
[0] if N = 1 then output X
[1] select a bracket B ⊆ X so that select(i,X) ∈ B with high prob.
[2] let i1 be the number of keys less than every element of B
[3] output canselect(i − i1, B)
• B is found by random sampling
Hoare’s Selection Algorithm
• Algorithm Hselect(i,X) where 1 ≤ i ≤ N
begin
if X = {x} then output x else
choose a random splitter k ∈ X
let B = {x ∈ X | x < k}
if |B| ≥ i then output Hselect(i, B)
else output Hselect(i − |B|, X − B)
end
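A direct Python transcription of Hselect, as a sketch; the `rng` argument is my addition for reproducibility:

```python
import random

def hselect(i, X, rng=random):
    """Hoare's randomized selection: return the key of rank i in X (1-indexed)."""
    if len(X) == 1:
        return X[0]
    k = rng.choice(X)               # random splitter
    B = [x for x in X if x < k]     # keys strictly below the splitter
    if len(B) >= i:
        return hselect(i, B, rng)
    # otherwise the answer lies in X - B = {x : x >= k},
    # where it has rank i - |B|
    return hselect(i - len(B), [x for x in X if x >= k], rng)
```

The output is correct for any sequence of random splitter choices; only the running time is random.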
Hoare’s Selection Algorithm (cont’d)
• Sequential time bound: T(i,N) has mean
T(i,N) = N + (1/N)·[ Σ_{j=1}^{i−1} T(i−j, N−j) + Σ_{j=i+1}^{N} T(i, j−1) ]
= 2N + min(i, N−i) + o(N)
Hoare’s Selection Algorithm (cont’d)
• The random splitter k ∈ X gives Hselect(i,X) two cases: recurse on B
or on X − B
Hoare’s Selection Algorithm (cont’d)
• Inefficient: each recursive call requires N comparisons, but only
reduces the average problem size to ½·N
Improved Randomized Selection
• By Floyd and Rivest
• Algorithm FRselect(i,X)
begin
if X = {x} then output x else
choose k1, k2 ∈ X such that k1 < k2
let r1 = rank(k1, X), r2 = rank(k2, X)
if r1 > i then FRselect(i, {x ∈ X | x < k1})
else if r2 > i then FRselect(i − r1, {x ∈ X | k1 ≤ x ≤ k2})
else FRselect(i − r2, {x ∈ X | x > k2})
end
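The following Python sketch follows FRselect's structure. The sample size, the Δ offset, and the handling of boundary indices are illustrative assumptions here; the precise choices of k1, k2 are analyzed on the next slides.

```python
import math
import random

def fr_select(i, X, rng=random):
    """Sketch of Floyd-Rivest selection: pick k1 < k2 from a small random
    sample so that select(i, X) likely lies between them, then recurse."""
    N = len(X)
    if N <= 10:
        return sorted(X)[i - 1]          # tiny base case
    s = max(2, min(N - 1, int(N ** (2 / 3))))      # illustrative sample size
    S = sorted(rng.sample(X, s))
    delta = int(math.sqrt(s * math.log(N))) + 1    # illustrative offset
    j = i * (s + 1) // (N + 1)                     # expected sample rank of target
    k1 = S[max(0, j - delta)]
    k2 = S[min(s - 1, j + delta)]
    r1 = sum(1 for x in X if x < k1)     # keys strictly below k1
    r2 = sum(1 for x in X if x <= k2)    # keys up to and including k2
    if r1 >= i:
        return fr_select(i, [x for x in X if x < k1], rng)
    if r2 < i:
        return fr_select(i - r2, [x for x in X if x > k2], rng)
    return fr_select(i - r1, [x for x in X if k1 <= x <= k2], rng)
```

As with Hselect, the answer is correct for any random choices; the sampling only controls how often the (cheap) outer branches are taken.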
Choosing k1, k2
• We must choose k1, k2 so that, with high likelihood,
k1 ≤ select(i,X) ≤ k2
Choosing k1, k2 (cont’d)
• Choose a random sample S ⊆ X of size s
• Define
k1 = select( i·(s+1)/(N+1) − Δ, S )
k2 = select( i·(s+1)/(N+1) + Δ, S )
where Δ = d·√(s·log N), d = constant
Choosing k1, k2 (cont’d)
• Lemma 2 implies
Prob(r1 > i) < N^(−α)
and
Prob(r2 < i) < N^(−α)
where r1 = rank(k1, X), r2 = rank(k2, X)
Expected Time Bound
T(i,N) ≤ N + 2T(−,s)
+ Prob(r1 > i) · T(i, r1)
+ Prob(i > r2) · T(i − r2, N − r2)
+ Prob(r1 ≤ i ≤ r2) · T(i − r1, r2 − r1)
≤ N + 2T(−,s) + 2N·N^(−α) + T(i − r1, 2Δ·(N+1)/(s+1))
≤ N + min(i, N−i) + o(N)
if we set α ≥ 3/2 and s = N^(2/3)·log N = o(N)
Small Cost of Recursions
• Note
• With prob ≥ 1 − 2N^(−α), each recursive call costs only O(s) = o(N),
rather than N as in the previous algorithm
Randomized Sorting Algorithms
• “canonical sorting algorithm”
• Algorithm cansort(X)
begin
if X = {x} then output X else
choose a random sample S of X of size s
sort S
S subdivides X into s+1 subsets X1, X2, …, Xs+1
output cansort(X1) · cansort(X2) · … · cansort(Xs+1)
end
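A runnable sketch of cansort; the default sample size s and the use of `bisect` to route keys to blocks are my choices:

```python
import bisect
import random

def cansort(X, s=3, rng=random):
    """Canonical randomized sort (sketch): split X around a sorted random
    sample of size s and recurse on the s+1 blocks."""
    if len(X) <= s:
        return sorted(X)                 # base case: input too small to sample
    S = sorted(rng.sample(X, s))
    blocks = [[] for _ in range(s + 1)]
    for x in X:
        # first sample key >= x identifies x's block
        blocks[bisect.bisect_left(S, x)].append(x)
    out = []
    for b in blocks:                     # blocks are already in key order
        out.extend(cansort(b, s, rng))
    return out
```

Each block omits at least s − 1 of the sample keys, so the recursion always makes progress.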
Using Random Sampling to Aid
Sorting
• Problem:
– Must subdivide X into subsets of nearly
equal size to minimize number of
comparisons
– Solution: random sampling!
Hoare’s Randomized Sorting Algorithm
• Uses sample size s=1
• Algorithm quicksort(X)
begin
if |X| = 1 then output X else
choose a random splitter k ∈ X
output quicksort({x ∈ X | x < k}) · (k) · quicksort({x ∈ X | x > k})
end
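Quicksort as above, transcribed to Python (functional style on lists for clarity rather than in-place partitioning; the `rng` argument is my addition):

```python
import random

def quicksort(X, rng=random):
    """Hoare's randomized quicksort: random splitter, i.e. sample size s = 1."""
    if len(X) <= 1:
        return list(X)
    k = rng.choice(X)                    # random splitter
    return (quicksort([x for x in X if x < k], rng)
            + [k]
            + quicksort([x for x in X if x > k], rng))
```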
Expected Time Cost of Hoare’s Sort
T(N) ≤ N − 1 + (1/N)·Σ_{i=1}^{N} ( T(i−1) + T(N−i) )
≤ 2N·log N
• Inefficient: better to cut the problem size in half with high likelihood!
Improved Sort Using Random Sampling
• A better choice of splitter is k = select(N/2, X), the median of X
• Algorithm samplesort_s(X)
begin
if |X| = 1 then output X else
choose a random subset S of X of size s = N/log N
k ← select(s/2, S) [cost: o(N) time]
output samplesort_s({x ∈ X | x < k}) · (k) · samplesort_s({x ∈ X | x > k})
end
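A sketch of samplesort_s: the splitter is the median of a random sample of size about N/log N. Here the sample median is found by sorting the sample, which is simple but costs about s·log s; the notes instead call for an o(N)-time selection (e.g. FRselect) on S to keep the per-level overhead sublinear.

```python
import math
import random

def samplesort(X, rng=random):
    """Quicksort variant whose splitter is the median of a random sample
    of size about N / log N, so each side holds close to N/2 keys."""
    N = len(X)
    if N <= 2:
        return sorted(X)
    s = max(1, int(N / math.log2(N)))    # sample size s = N / log N
    S = rng.sample(X, s)
    k = sorted(S)[(s - 1) // 2]          # median of the sample
    return (samplesort([x for x in X if x < k], rng)
            + [k]
            + samplesort([x for x in X if x > k], rng))
```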
Randomized Approximation of the Median
• By Lemma 2, rank(k,X) is very nearly N/2, the median:
Prob( |rank(k,X) − N/2| ≥ d·√(α·N·log N) ) < N^(−α)
Improved Expected Time Bounds
T(N) ≤ 2T(N1) + N + o(N), where N1 ≤ N/2 + o(N)
⟹ T(N) ≤ N·log N·(1 + o(1)) ≈ log(N!)
• This is optimal for comparison trees!
Open Problems in Selection and
Sorting
1) Improve Randomized Algorithms to
exactly match lower bounds on
number of comparisons
2) Can we derandomize these
algorithms, i.e., give deterministic
algorithms with the same bounds?