ch4.notes.new

Sorting
list stored as an array
sort on a key that uniquely represents a record
count comparisons as the measure of work
determine whether the sort is in place
INSERTION SORT or push down sort
- Worst case
for each insertion the maximum number of comparisons is i - 1
W(n) = sum_{i=2}^{n} (i - 1) = n(n-1)/2
O(n^2)
- Average case
all keys unique
all permutations of keys are equally likely
i positions where x might go when the i-th key is inserted
1/i chance that the key belongs in any given position
so the average work for one insertion is
(1/i) * (sum_{j=1}^{i-1} j + (i - 1)) = (i+1)/2 - 1/i
so the amount of total work for all n-1 insertions is
A(n) = sum_{i=2}^{n} ((i+1)/2 - 1/i) ≈ n^2/4
(substituting j = i + 1 turns the sum of (i+1)/2 terms into sum_{j=3}^{n+1} j/2 = (n^2 + 3n - 4)/4)
Insertion sort is an in-place sort.
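A minimal runnable sketch in Python (my own, not from the text), counting key comparisons so the worst-case formula above can be checked:

def insertion_sort(a):
    # Sort list a in place; return the number of key comparisons.
    comps = 0
    for i in range(1, len(a)):        # insert a[i] into the sorted prefix a[0..i-1]
        x = a[i]
        j = i - 1
        while j >= 0:
            comps += 1                # compare x with a[j]
            if a[j] > x:
                a[j + 1] = a[j]       # shift the larger key right
                j -= 1
            else:
                break
        a[j + 1] = x                  # drop x into the open slot
    return comps

# Reversed input is the worst case and matches n(n-1)/2:
n = 10
assert insertion_sort(list(range(n, 0, -1))) == n * (n - 1) // 2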
- Lower Bound on the Behavior of Certain Sorting Algorithms
Let the basic operation be:
compare adjacent keys and swap if necessary
Show that all algorithms restricted to swapping adjacent keys must do a certain amount of work!
There are n! permutations on n items
There is one permutation for which the list is sorted.
The original list of keys is a permutation.
inversion: a pair of keys that appears out of order in the list
2, 4, 1, 5, 3 has inversions
(2,1), (4,1), (4,3), (5,3)
Such a sorting algorithm removes at most one inversion with each key comparison. So the
number of comparisons performed on an input is at least the number of inversions in it.
So consider inversions.
There is a permutation that has n(n-1)/2 inversions: the reversed list, e.g. (5,4,3,2,1).
So worst-case behavior must be Ω(n^2).
This is a lower bound on the worst case: it can't be done in fewer comparisons or less work.
Consider the average-case lower bound. Done by considering the average number of
inversions.
Each permutation has a transpose (the same keys in reverse order).
Every pair of keys is inverted in exactly one of the original permutation and its
transpose.
There are n(n-1)/2 such pairs of integers, thus on the average there will be n(n-1)/4
inversions. So the best we can do is about n^2/4. The lower bound!
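A brute-force inversion counter (a sketch of mine, not from the notes) confirming the example above and the n(n-1)/4 average for a small n:

from itertools import permutations
from math import factorial

def inversions(a):
    # Count pairs (i, j) with i < j and a[i] > a[j].
    return sum(1 for i in range(len(a))
                 for j in range(i + 1, len(a)) if a[i] > a[j])

print(inversions([2, 4, 1, 5, 3]))     # 4, the inversions listed above
print(inversions([5, 4, 3, 2, 1]))     # 10 = n(n-1)/2 for n = 5

# Average over all permutations of n = 5 keys is n(n-1)/4 = 5:
n = 5
total = sum(inversions(p) for p in permutations(range(n)))
print(total / factorial(n))            # 5.0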
Section 4.3 Quicksort next!
Quicksort and Mergesort: Divide and Conquer
Quicksort strategy: choose a pivot key, split the list so that smaller keys precede the pivot and larger keys follow it, then sort the two sublists recursively.
Worst case analysis:
Divides the list into pieces of size 0 and k-1 when the pivot is the smallest value.
Does Θ(n^2) work in the worst case: W(n) = sum_{k=2}^{n} (k - 1) = n(n-1)/2.
Quicksort escapes the adjacent-swap lower bound: one swap of distant keys can remove up to 2n-3 inversions, and a split removes many at once.
Example: 5 4 3 2 1 has 4+3+2+1 = 10 inversions; after a split, 4 3 2 1 5 has 3+2+1 = 6 inversions.
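A runnable sketch of the strategy (mine; the split here is a simple first-key partition, which may differ in detail from the book's Partition). An already-sorted input makes every split maximally uneven, giving the Θ(n^2) worst case:

def quicksort(a, lo=0, hi=None):
    # Sort a[lo..hi] in place, pivoting on the first key of each range.
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    i = lo                            # a[lo+1..i] holds keys smaller than the pivot
    for j in range(lo + 1, hi + 1):   # split: k-1 comparisons on k keys
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]         # pivot lands in its final position i
    quicksort(a, lo, i - 1)           # sublists of size i-lo and hi-i
    quicksort(a, i + 1, hi)

a = [5, 4, 3, 2, 1]
quicksort(a)
print(a)                              # [1, 2, 3, 4, 5]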
Average Behavior
all keys distinct
all permutations equally likely
k is the number of keys in a section to sort
A(k) is the average number of comparisons done for lists of size k
let x be the pivot, placed in the i-th position after split executes
split does k-1 comparisons
the sublists have i-1 and k-i keys respectively
each split point i has probability 1/k; initially k = n
This simplifies to
A(k) = k - 1 + (2/k) sum_{i=0}^{k-1} A(i)
Solve the recurrence relation by:
1) guess, with an induction proof (which establishes there are enough good cases that
the average behavior is realized)
2) direct manipulation
Divide and conquer - general technique
Prove there are enough permutations of numbers that allow quicksort to realize
the computed average performance:
that is, A(n) <= c n ln n for n >= 1
A(n) = n - 1 + (1/n) sum_{i=0}^{n-1} (A(i) + A(n-1-i))
So
A(n) = n - 1 + (2/n) sum_{i=1}^{n-1} A(i)
Proof: induction on n. Base case: n = 1, A(1) = 0 and
c * 1 * ln 1 = 0.
For n > 1 we know by the recurrence relation and the induction hypothesis that
A(n) <= n - 1 + (2/n) sum_{i=1}^{n-1} c i ln i
We can bound the sum by integrating:
sum_{i=1}^{n-1} c i ln i <= c integral_1^n x ln x dx
Equation 1.15 gives
integral_1^n x ln x dx = (1/2) n^2 ln n - (1/4) n^2 + 1/4
So
A(n) <= n - 1 + (2c/n) ((1/2) n^2 ln n - (1/4) n^2 + 1/4)
     = c n ln n + n (1 - c/2) - 1 + c/(2n)
     <= c n ln n   for c >= 2
ln n ≈ .693 lg n, which
makes the bound about 1.386 n lg n (taking c = 2).
---------------------------------------------------------------------------------------------------
Now solve the recurrence relation directly.
We know
A(n) = n - 1 + (2/n) sum_{i=1}^{n-1} A(i)
and
A(n-1) = n - 2 + (2/(n-1)) sum_{i=1}^{n-2} A(i)
Subtracting n-1 times the 2nd from n times the first gives
n A(n) - (n-1) A(n-1) = n(n-1) + 2 sum_{i=1}^{n-1} A(i) - (n-1)(n-2) - 2 sum_{i=1}^{n-2} A(i)
which is
n A(n) - (n-1) A(n-1) = n(n-1) - (n-1)(n-2) + 2 sum_{i=1}^{n-2} A(i) + 2 A(n-1) - 2 sum_{i=1}^{n-2} A(i)
and
n A(n) - (n-1) A(n-1) = n(n-1) - (n-1)(n-2) + 2 A(n-1)
So
n A(n) = (n-1) A(n-1) + (n-1)(n - (n-2)) + 2 A(n-1)
n A(n) = (n-1) A(n-1) + 2(n-1) + 2 A(n-1)
And
n A(n) = n A(n-1) - A(n-1) + 2(n-1) + 2 A(n-1)
n A(n) = n A(n-1) + 2(n-1) + A(n-1)
n A(n) = (n+1) A(n-1) + 2(n-1)
Divide by n(n+1):
A(n)/(n+1) = A(n-1)/n + 2(n-1)/(n(n+1))
if we let
B(n) = A(n)/(n+1)
the recurrence relation for B is
B(n) = B(n-1) + 2(n-1)/(n(n+1))
B(n-1) = B(n-2) + 2(n-2)/((n-1)n)
B(n-2) = B(n-3) + 2(n-3)/((n-2)(n-1))
so
B(n) = B(n-3) + 2(n-3)/((n-2)(n-1)) + 2(n-2)/((n-1)n) + 2(n-1)/(n(n+1))
So
show
B(n) = sum_{i=2}^{n} 2(i-1)/(i(i+1))
Since 2(i-1) = 2(i+1) - 4,
2(i-1)/(i(i+1)) = 2/i - 4/(i(i+1))
so
B(n) = 2 sum_{i=2}^{n} 1/i - 4 sum_{i=2}^{n} 1/(i(i+1))
(Thanks to Eun Jin)
The second sum telescopes: sum_{i=2}^{n} (1/i - 1/(i+1)) = 1/2 - 1/(n+1).
Which is
B(n) ≈ 2(ln n + 0.577) - 4n/(n+1)
From summation formula 1.11 on page 22
We have
A(n) = (n+1) B(n)
and therefore
A(n) ≈ 1.386 n lg n - 2.846 n
from
ln n ≈ .693 lg n, which makes 2 n ln n ≈ 1.386 n lg n
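A quick numeric check of this analysis (my own sketch): evaluate the recurrence directly and compare it with the closed form.

from math import log2

A = [0.0, 0.0]                          # A(0) = A(1) = 0
s = 0.0                                 # running sum A(1) + ... + A(n-1)
for n in range(2, 1001):
    s += A[n - 1]
    A.append(n - 1 + 2 * s / n)         # A(n) = n - 1 + (2/n) * sum

n = 1000
print(A[n])                             # value from the recurrence
print(1.386 * n * log2(n) - 2.846 * n)  # closed-form estimate; they agree closely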
Improvements on quicksort
- choose a different pivot key (e.g., the middle element)
- sort small sublists using a different method
- manage the recursion stack yourself
- stack the larger interval, process the smaller one first
General technique - solve several small instances
rather than one large one
recursively break the problem down
solve small cases
combine small solutions
S(n) steps done in a direct solution
D(n) steps done in breaking the problem down
C(n) steps to combine the solutions
Recurrence relation for the work done:
T(n) = D(n) + sum over the k pieces of T(size of piece) + C(n) for n > smallsize,
T(n) = S(n) for n <= smallsize
Example for quicksort
k = 2 -- divided in two pieces each time
smallsize = 1
S(1) = 0
D(n) = n-1 -- had to consider n-1 elements in the
interval
C(n) = 0
so
the recurrence relation gives O(n lg n) (when the splits are even)
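A sketch of mine that evaluates this work recurrence with quicksort's parameters, assuming perfectly even splits, to exhibit the O(n lg n) growth:

from math import log2

def T(n):
    # T(n) = D(n) + T(n1) + T(n2) + C(n), with D(n) = n - 1, C(n) = 0,
    # k = 2 even pieces of the remaining n - 1 keys, and S(1) = 0.
    if n <= 1:
        return 0
    half = (n - 1) // 2
    return (n - 1) + T(half) + T(n - 1 - half)

for n in (15, 255, 1023):
    print(n, T(n), round(n * log2(n)))   # T(n) tracks n lg n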
Merging Sorted Lists
2 sorted lists - merge into one
measure of work - comparison of keys
Worst case
n keys in list A, m keys in list B
does n+m-1 comparisons
the worst case is when neither list is exhausted until the end, so the last item of A or B goes into C only after a final comparison
Show that for n = m, 2n-1 comparisons is optimal.
Simply, the proof states that the last two items of each list
must be compared in order to determine the correct
ordering; thus the 2n-1 is optimal.
Space usage:
O(n) when m = n
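A sketch of the merge procedure (my code), counting comparisons; fully interleaved inputs realize the n+m-1 worst case:

def merge(A, B):
    # Merge sorted lists A and B into C; return (C, number of comparisons).
    C, i, j, comps = [], 0, 0, 0
    while i < len(A) and j < len(B):
        comps += 1                    # one key comparison per loop pass
        if A[i] <= B[j]:
            C.append(A[i]); i += 1
        else:
            C.append(B[j]); j += 1
    C.extend(A[i:])                   # one list is exhausted:
    C.extend(B[j:])                   # the rest is copied with no comparisons
    return C, comps

print(merge([1, 3, 5, 7], [2, 4, 6, 8]))   # 4 + 4 - 1 = 7 comparisons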
Now merge sort, since we know how to merge two lists.
A divide and conquer paradigm.
Similar analysis to quicksort, but quicksort does not
always divide the list evenly; merge sort will.
W(n) = W(floor(n/2)) + W(ceil(n/2)) + n - 1
W(1) = 0
This is a similar relation to what we had for quicksort, thus:
W(n) = n lg n - n + 1 (for n a power of 2)
A worst-case performance of
W(n) ∈ Θ(n lg n)
Merge sort is not an in-place sort.
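A sketch of merge sort (mine; it returns a new list, underscoring that the sort is not in place):

def merge_sort(a):
    # Return a sorted copy of a; uses O(n) extra space.
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2                  # always an even split, unlike quicksort
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge: at most n - 1 comparisons
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 4, 3, 2, 1]))     # [1, 2, 3, 4, 5]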
Section 4.7 Lower Bounds for Sorting by Comparison of
Keys
Consider a decision tree:
n keys x1, x2, ..., xn
each comparison has a 2-way branch
Example decision tree for n = 3 p. 178 Figure 4.15
[Figure 4.15: internal nodes compare key pairs (the root is 1:2, its children 2:3 and 1:3, and so on); the six leaves are the permutations 1,2,3 / 1,3,2 / 2,1,3 / 2,3,1 / 3,1,2 / 3,2,1]
Every sort algorithm will follow one path through the tree.
There must be n! leaves/paths for all possible permutations.
The number of comparisons on a given input is the depth of the leaf reached; the worst case is the height of the tree.
For the tree where n = 3, 3 comparisons is the worst case.
(2 + 3 + 3 + 2 + 3 + 3)/6 = 16/6 = 2 2/3 comparisons is the average performance.
Section 4.7.2
Derive the lower bound in terms of the number of leaves.
L - leaves in the tree
h - height of tree
then
L <= 2^h
Proof: induct on h.
When h = 0, L = 1 = 2^0.
Assume L <= 2^h; for a tree of height h + 1, each of the root's (at most 2) subtrees
has height at most h and so at most 2^h leaves, giving L <= 2^(h+1).
Now,
h >= ceil(lg L)
by taking the log of both sides and since h is an integer.
Prove the depth must be at least
ceil(lg n!)
In our decision tree L = n!
So, the number of comparisons needed to sort in the worst case
is at least
ceil(lg n!)
How close is
ceil(lg n!)
to
n lg n?
we know
n! >= n(n-1) ... ceil(n/2) >= (n/2)^(n/2)
so
lg n! >= (n/2) lg(n/2)
which is
Θ(n lg n)
To get a closer lower bound observe
lg n! = sum_{j=2}^{n} lg j
Equation 1.18 page 27 established
lg n! = sum_{j=2}^{n} lg j >= n lg n - n lg e
where e is the base of the natural log and lg e ≈ 1.443.
So the height of the decision tree is at least
n lg n - 1.443 n
and thus any sorting algorithm must do at least that much
work in the worst case.
Merge sort is close to optimal!
Consider n = 5: insertion sort does 10 comparisons,
merge sort does 8, but ceil(lg 5!) = ceil(lg 120) = 7, so the lower
bound is below what merge sort does. Try to find a sort of 5
keys in 7 comparisons in the worst case.
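A small table, computed by a sketch of mine, comparing the lower bound ceil(lg n!) with merge sort's worst case W(n) = W(floor(n/2)) + W(ceil(n/2)) + n - 1:

from math import ceil, factorial, log2

def W(n):
    # Merge sort worst-case comparisons.
    return 0 if n <= 1 else W(n // 2) + W(n - n // 2) + n - 1

for n in range(2, 9):
    print(n, ceil(log2(factorial(n))), W(n))
# At n = 5 the bound is 7 while merge sort does 8, as noted above.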
Section 4.7.3 Lower bound for Average Behavior
Consider the external path length (EPL)
Our decision trees are 2-trees - each node has 2 or 0 children
EPL is minimized if all leaves are on at most two adjacent
levels.
The average path length from the root to a leaf is EPL/L for
2-trees with L leaves.
We are looking for a lower bound on EPL. Trees that
minimize EPL are as balanced as possible. So 2-trees with L
leaves have external path length about L lg L. So we have
argued that the average path length to a leaf is at least lg L.
Therefore, the average number of comparisons done by any
algorithm to sort n items by comparison of keys is at least
lg(n!), which is about n lg n - 1.443 n.
That is, no algorithm can do better on the average.
Quicksort cannot guarantee good worst-case performance; merge
sort is not in place (it requires sizable extra storage). Heapsort
combines the best of each, but with a somewhat larger
constant.
Heap Sort
build heap
fix heap
Heapsort creates a partial order in a complete tree with some
rightmost nodes missing. Essentially a priority queue ADT.
S is a set of elements with integer keys.
T is a binary tree
h is the height of the tree
HEAP
a binary tree T is a heap iff
T is complete through depth h-1
all leaves are at depth h or h-1
all paths to a leaf at depth h are to the left of all paths to
a leaf at depth h-1; that is, T is a left-complete tree.
Partial order
T is a partial order tree iff every node's key is >= its children's keys.
heapSort(E, n)
    construct H from E, the set of n elements to be sorted
    for (i = n; i >= 1; i--)
        curMax = getMax(H)
        deleteMax(H)
        E[i] = curMax  // place max back into the array of elements

deleteMax(H)
    copy the rightmost element on the lowest level of H into K
    delete the rightmost element on the lowest level of H
    fixHeap(H, K)
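A runnable sketch of this scheme (my code; array-based and 0-indexed, so the index arithmetic differs from the 1-indexed pseudocode above):

def fix_heap(E, root, last, K):
    # Push key K down from position root in E[0..last] until the
    # partial order holds; children of i are 2i+1 and 2i+2.
    while 2 * root + 1 <= last:
        child = 2 * root + 1
        if child + 1 <= last and E[child + 1] > E[child]:
            child += 1                 # pick the larger child
        if K >= E[child]:
            break                      # partial order restored
        E[root] = E[child]             # promote the larger child
        root = child
    E[root] = K

def heap_sort(E):
    n = len(E)
    for i in range(n // 2 - 1, -1, -1):   # construct heap, bottom up
        fix_heap(E, i, n - 1, E[i])
    for last in range(n - 1, 0, -1):      # n - 1 deleteMax operations
        K = E[last]                       # rightmost element of the lowest level
        E[last] = E[0]                    # current max goes to its final slot
        fix_heap(E, 0, last - 1, K)

a = [21, 50, 12, 24, 30, 5, 6, 18, 3, 20]
heap_sort(a)
print(a)                                  # ascending order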
Fix Heap
[Figure: fixHeap example on a heap with keys 50, 24, 30, 21, 20, 12, 5, 6, 18, 3. After deleteMax removes 50, the rightmost key on the lowest level is inserted at the root and pushed down past the larger child at each level until the partial order is restored, leaving 30 at the root.]
The fixHeap procedure does 2h comparisons of keys in
the worst case on a heap with height h.
Construct heap starts at the bottom and calls fixHeap on
each subtree.
The loop iterates down to the depth of the tree, doing 2 comparisons
on each iteration (find the larger child, then compare it with K).
Worst case:
“construct heap” depends on the cost of fixHeap.
n is the number of keys
H is the heap structure
fixHeap requires about 2 lg n comparisons (lg n is the height
of the tree).
With r the number of nodes in the right subheap of H, we have
W(n) = W(n-r-1) + W(r) + 2 lg n for n > 1
(fix the left subtree, fix the right subtree, 2 lg n for each pass of fixHeap).
First we solve the above recurrence relation for
N = 2^d - 1
which is the case of complete binary trees; W(N) is an upper bound
on W(n) for almost-complete binary trees.
For N = 2^d - 1 the numbers of nodes in the right and left subtrees
are equal. The recurrence relation becomes:
W(N) = 2 W((N-1)/2) + 2 lg N
Applying the Master Theorem 3.17 with b = 2, c = 2, E = 1 and
f(N) = 2 lg N, it follows that W(N) = Θ(N),
i.e., the heap can be built in linear time.
The number of comparisons done by fixHeap on a heap with
k nodes is at most 2 floor(lg k), so the total over all n-1
deletions is at most
sum_{k=1}^{n-1} 2 floor(lg k) <= 2 (lg e) integral_1^n ln x dx
= 2 (lg e)(n ln n - n) = 2(n lg n - 1.443 n)
So, the heap construction phase does at most O(n)
comparisons and the deletions do at most about 2 n lg n in both the
average and worst case.