Part three
Greedy Algorithms
Heap
A heap is a data structure defined by two properties:
1) It is a complete binary tree (implemented using an array representation).
2) The values stored in a heap are partially ordered, i.e. there is a relationship between the value stored at any node and the values of its children.
(Complete binary trees have all levels except the bottom filled out completely, and the bottom level has all of its nodes filled in from left to right.)
Heaps are often used to implement priority queues and
for external sorting algorithms.
There are many situations, both in real life and in computing applications, where we wish to choose the next "most important" item from a collection of people, tasks, or objects.
Input :  19  2 46 16 12 54 64 22 17 66 37 35
Heap :   66 64 54 17 37 35 46  2 16 12 22 19
Sorted :  2 12 16 17 19 22 35 37 46 54 64 66
Algorithm HeapSort(x, n)
Input : x : array in the range 1..n
Output : x : the array in sorted order
  {BuildHeap(x)}
  x[1] ← read val
  for i ← 1 to n − 1
  {
    read val
    InsertToHeap(x, i, val)
  }
  for i ← n downTo 2
  {
    swap(x[1], x[i])
    RearrangeHeap(x, i − 1)
  }
end.
One way to build a heap is to insert the elements one at a time. Each insertion takes Θ(lg n) time in the worst case, since the value being inserted can move at most the distance from the bottom of the tree to the top. Inserting n values therefore costs Θ(n lg n).
Heapsort does at most 2(n − 1)⌊lg n⌋ comparisons of keys in the worst case; it uses fewer than 2n lg n comparisons to sort n elements. No extra working storage, except for one record position, is required (it sorts in place).
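As a minimal sketch, the same behaviour can be reproduced with Python's `heapq` module. Note that `heapq` is a min-heap, whereas the notes use a max-heap and swap the maximum to the end of the array; the sorted result is the same.

```python
import heapq

def heap_sort(values):
    """Sort by pushing every value onto a heap, then popping the minimum
    repeatedly. Each push/pop costs O(lg n), so the sort is O(n lg n)."""
    heap = []
    for v in values:                  # build the heap by successive insertion
        heapq.heappush(heap, v)
    return [heapq.heappop(heap) for _ in range(len(heap))]

# The example data from these notes.
data = [19, 2, 46, 16, 12, 54, 64, 22, 17, 66, 37, 35]
print(heap_sort(data))   # [2, 12, 16, 17, 19, 22, 35, 37, 46, 54, 64, 66]
```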
Algorithm InsertToHeap(a, n, x)
Input : a : array of size n representing a heap; x : a number
Output : a : new heap; n : new size of the heap
  n ← n + 1
  a[n] ← x
  child ← n
  parent ← ⌊n / 2⌋
  while parent ≥ 1
    if a[parent] < a[child] then
    {
      swap(a[parent], a[child])
      child ← parent
      parent ← ⌊parent / 2⌋
    }
    else
      parent ← 0       {stop the loop}
end.
Cost: the inserted value sifts up at most lg n levels, so one insertion is O(lg n); inserting n − 1 values gives Σ (i = 2 to n) Σ (j = 1 to lg n) 1 = O((n − 1) lg n).
Algorithm RearrangeHeap(a, n)
Input : a : array of size n representing a heap
Output : a : new heap; n : new size of the heap
  n ← n − 1
  parent ← 1
  child ← 2
  while child ≤ n − 1
  {
    if a[child] < a[child + 1] then
      child ← child + 1
    if a[child] > a[parent] then
    {
      swap(a[parent], a[child])
      parent ← child
      child ← child × 2
    }
    else
      child ← n       {stop the loop}
  }
end.
Cost: the root sifts down at most lg n levels, so one rearrangement is O(lg n); over all n extractions this gives Σ (i = n down to 2) Σ (j = 1 to lg n) 1 = O((n − 1) lg n).
Huffman Coding Tree
The space/time tradeoff suggests that one can often gain
an improvement in space requirements in exchange for a
penalty in running time. A typical example is storing
files on disk.
If the files are not actively used, the owner may wish to compress them to save space; they can then be uncompressed for use, which costs some time, but only once.
We can represent a set of items in a computer program by assigning a unique code to each item.
The ASCII coding scheme assigns a unique 8-bit value to each character. It takes ⌈lg 128⌉ = 7 bits to provide 128 unique codes for the 128 symbols of the ASCII character set. The eighth bit is used either to check for transmission errors, or to support extended ASCII codes with 128 additional characters.
The requirement of ⌈lg n⌉ bits to represent n unique code values assumes that the codes will all be the same length (a fixed-length coding scheme).
If all the characters were used equally often, a fixed-length coding scheme would be the most space-efficient method. But not all characters are used equally often.
It is possible to store the more frequent letters in shorter
codes, but the other characters would require longer
codes.
Huffman coding is an approach to assigning variable-length codes.
Building Huffman Coding Trees
A Huffman coding tree assigns codes to characters such that the length of the code depends on the relative frequency or weight of the corresponding character (it is a variable-length code).
The Huffman code for each letter is derived from a full binary tree called the Huffman tree. Each leaf corresponds to a letter. The goal is to build a tree with minimum external path weight.
The weighted path length of a leaf is its weight times its depth, so a letter with high weight should have low depth.
Process of Building the Huffman Tree
• First order the letters in a list by ascending weight (frequency).
• Remove the first two letters (the ones with lowest weight) from the list and assign them to leaves in what will become the Huffman tree.
• Assign these leaves as the children of an internal node whose weight is the sum of the weights of the two children.
• Put the sum back in the list in the correct place so as to preserve the order of the list.
• Repeat until only one item remains on the list.
This process will build a full Huffman tree.
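The steps above can be sketched with Python's `heapq` module. The node representation (nested tuples) and the tiebreak counter are illustrative choices, not part of the notes; the counter only keeps `heapq` from comparing payloads when weights are equal.

```python
import heapq

def build_huffman_tree(weights):
    """Greedy construction: repeatedly remove the two lightest subtrees
    and replace them with a parent whose weight is their sum.
    Entries are (weight, tiebreak, payload) tuples."""
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two lowest-weight subtrees
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    return heap[0]                            # (total_weight, tiebreak, root)

# The letter weights used in this example.
freqs = {'Z': 2, 'K': 7, 'F': 24, 'C': 32, 'U': 37, 'D': 42, 'L': 42, 'E': 120}
total, _, root = build_huffman_tree(freqs)
print(total)   # 306: the root's weight is the sum of all letter weights
```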
Letter :  Z   K   F   C   U   D   L   E
Weight :  2   7  24  32  37  42  42  120
Note
This is an example of a greedy algorithm because, at each step, the two subtrees with least weight are joined together.
Assigning Huffman Codes
After the Huffman tree is constructed, we start assigning codes to individual letters. Starting at the root, we assign either a 0 or a 1 to each edge in the tree: 0 to edges connecting a node with its left child, and 1 to edges connecting a node with its right child.
The Huffman code for a letter is simply a binary number
determined by the path from the root to the leaf
corresponding to that letter.
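As a sketch, here is the code assignment as a tree walk. The tree below is one Huffman tree for this example's letters, written out by hand so the snippet stands alone (the exact shape depends on how ties are broken); leaves are symbols and internal nodes are (left, right) pairs.

```python
# One Huffman tree for the example's weights: leaves are letters,
# internal nodes are (left, right) tuples.
tree = ('E', (('U', 'D'), ('L', ('C', (('Z', 'K'), 'F')))))

def assign_codes(node, prefix=""):
    """Walk the tree, appending 0 on a left edge and 1 on a right edge."""
    if isinstance(node, tuple):               # internal node
        left, right = node
        codes = assign_codes(left, prefix + "0")
        codes.update(assign_codes(right, prefix + "1"))
        return codes
    return {node: prefix}                     # leaf: the path is the code

codes = assign_codes(tree)
print(codes['E'], codes['Z'])   # 0 111100
```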
Letter   Frequency   Code     Bits
C        32          1110     4
D        42          101      3
E        120         0        1
F        24          11111    5
K        7           111101   6
L        42          110      3
U        37          100      3
Z        2           111100   6
Decoding the message is done by looking at the bits in
the coded string from left to right until a letter is
decoded.
This can be done using the Huffman tree in a reverse
process from that used to generate the codes.
We start from the root; we take branches depending on
the bit value (0 left, 1 right), until reaching a leaf node.
A set of codes is said to have the prefix property if no code in the set is a prefix of another. The prefix property guarantees that there is no ambiguity in how a bit string is decoded: once we reach the last bit of a code during the decoding process, we know which letter it is the code for.
Huffman codes have the prefix property, since any
prefix of a code will correspond to an internal node,
while all codes correspond to leaf nodes.
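A minimal decoding sketch, scanning bits left to right and restarting after every decoded letter. It uses a code dictionary instead of walking the tree, which is equivalent because of the prefix property; the codes are the ones from the table.

```python
# The codes from the table above.
codes = {'E': '0', 'U': '100', 'D': '101', 'L': '110',
         'C': '1110', 'Z': '111100', 'K': '111101', 'F': '11111'}

def decode(bits, codes):
    """Accumulate bits until they match a code; the prefix property
    guarantees the first match is the intended letter."""
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

message = "DUCK"
encoded = "".join(codes[c] for c in message)
print(decode(encoded, codes))    # DUCK
```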
The average expected cost per character is the sum, over all characters, of the cost of each character (Ci) times the probability of its occurring (Pi):
C1·P1 + C2·P2 + … + Cn·Pn
or, in terms of frequencies,
(C1·F1 + C2·F2 + … + Cn·Fn) / FT
where FT is the total frequency.
The expected cost per letter for the tree above is ≈ 2.57 bits, while a fixed-length code would require lg 8 = 3 bits per letter. Huffman coding is thus expected to save about 14% for this set of letters.
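The expected-cost figure can be checked directly from the table: sum the code lengths weighted by frequency, then divide by the total frequency.

```python
# Frequencies and code lengths from the table above.
freqs = {'C': 32, 'D': 42, 'E': 120, 'F': 24, 'K': 7, 'L': 42, 'U': 37, 'Z': 2}
bits  = {'C': 4,  'D': 3,  'E': 1,   'F': 5,  'K': 6, 'L': 3,  'U': 3,  'Z': 6}

total_bits = sum(bits[c] * freqs[c] for c in freqs)   # weighted path length
total_freq = sum(freqs.values())
print(total_bits, total_freq, round(total_bits / total_freq, 2))
# 785 306 2.57  -> about 2.57 bits per letter, vs. 3 for fixed-length
```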
Algorithm Huffman(s, f)
Input : s : a string of characters; f : an array of frequencies
Output : T : Huffman tree for s
  insert all characters into a heap H according to frequencies
  while H is not empty do
    if H contains only one character X then
      make X the root of T
    else
      pick two characters X and Y with lowest frequencies
      delete them from H
      replace X and Y with a new character Z whose
        frequency is the sum of the frequencies of X and Y
      insert Z into H
      make X and Y children of Z in T       {Z has no parent yet}
end.
Implementation
The operations required for Huffman's encoding are:
• insertions into a data structure;
• deletions of the two characters with minimal frequency from the heap;
• building the tree.
A heap is a good data structure for the first two operations, each of which requires O(lg n) steps in the worst case.
Complexity
• Building the tree takes constant time per node.
• Insertions and deletions take O(lg n) steps each.
• Overall, the running time of the algorithm is O(n lg n).
Shortest Path problem (Dijkstra algorithm)
Use an adjacency matrix representation, in which the edge lengths are the costs (distances, times…) associated with the edges. Then initialize an array called Dist to equal the 1st row of the edge matrix.
Algorithm Dijkstra
  S ← {1}
  initialize Dist to be the edges from vertex 1 (1st row of edges)
  for i ← 1 to v − 1
    choose a vertex w, which is not in S, for which Dist(w) is minimum
    add w to S
    for each vertex j still not in S
      Dist(j) ← min(Dist(j), Dist(w) + edge(w, j))
end.
0









30  50 40 100 

0 40    
 0  10 30 

 10 0   
  20 0 70 
    0 
Iter     S                w   Dist(2)  Dist(3)  Dist(4)  Dist(5)  Dist(6)
initial  {1}              -   30*      ∞        50       40       100
1        {1,2}            2   30       70       50       40*      100
2        {1,2,5}          5   30       70       50*      40       100
3        {1,2,5,4}        4   30       60*      50       40       100
4        {1,2,5,4,3}      3   30       60       50       40       90*
5        {1,2,5,4,3,6}    6   30       60       50       40       90
(* marks the minimum Dist chosen at that step.)
This algorithm is called a greedy algorithm, because at
each stage it simply does what is locally optimal.
If the graph is undirected, we can think of it as a directed
graph such that each undirected edge corresponds to two
directed edges in opposite directions with the same
length.
Computing the cost of the shortest path from v0 to each vertex requires O(n²) time.
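The pseudocode above can be transcribed almost line for line into Python. This sketch stores the vertices 0-based; the matrix is the one from the example.

```python
INF = float('inf')

# Adjacency matrix from the example (vertex 1 of the notes is index 0 here).
edges = [
    [0,   30,  INF, 50,  40,  100],
    [INF, 0,   40,  INF, INF, INF],
    [INF, INF, 0,   INF, 10,  30],
    [INF, INF, 10,  0,   INF, INF],
    [INF, INF, INF, 20,  0,   70],
    [INF, INF, INF, INF, INF, 0],
]

def dijkstra(edges, source=0):
    n = len(edges)
    in_s = {source}
    dist = list(edges[source])            # initialize Dist to the source's row
    for _ in range(n - 1):
        # choose the vertex w outside S with minimum Dist
        w = min((v for v in range(n) if v not in in_s), key=lambda v: dist[v])
        in_s.add(w)
        for j in range(n):                # relax edges leaving w
            if j not in in_s:
                dist[j] = min(dist[j], dist[w] + edges[w][j])
    return dist

print(dijkstra(edges))   # [0, 30, 60, 50, 40, 90], matching the table
```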
Minimum Cost Spanning Tree (Kruskal’s algorithm)
One approach to determine a min-cost spanning tree of a
graph is given by Kruskal.
We partition the set of vertices into |V| equivalence classes, each consisting of one vertex, and then process the edges in order of weight. Edges can be processed in order of weight by using a min-heap, which is faster than sorting the edges first.
Examples of applications where a solution to this
problem is useful include soldering the shortest set of
wires needed to connect a set of terminals on a circuit
board, and connecting a set of cities by telephone in such
a way as to require the least amount of wire.
Kruskal's algorithm is dominated by the time required to process the edges. The total cost of the algorithm is Θ(|E| lg |E|) in the worst case, and close to Θ(|V| lg |E|) in the average case.
Consider the weighted graph below:
Suppose the edges have been sorted into the following order:
Edge :  (1,2)  (1,3)  (2,3)  (6,7)  (3,4)  (5,6)  (5,7)  (1,4)  (3,5)
Cost :    1      1      1      1      2      2      2      3      4
The efficiency of the algorithm depends upon the
implementations chosen for the priority queue (heap) and
Union-Find.
A priority queue implemented by a heap requires O(lg n) for the enqueue (insertion) and dequeue (deletion) operations. A Union-Find structure implemented by a weight-balanced tree yields O(lg n) time.
Algorithm Kruskal
  T ← ∅
  while T contains fewer than n − 1 edges
    choose an edge (v, w) from E of lowest cost
    delete (v, w) from E
    if (v, w) does not create a cycle in T then
      add (v, w) to T
    else
      discard (v, w)
end.
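A sketch combining the pseudocode with the Union-Find idea discussed above: an edge creates a cycle exactly when its endpoints are already in the same equivalence class. The edge list is the sorted one from the example (vertices 1..7); the simple path-compressing Union-Find is an illustrative choice, not the notes' weight-balanced version.

```python
def kruskal(n, edges):
    """Build a minimum spanning tree of vertices 1..n from (cost, v, w)
    edges, using Union-Find to detect cycles."""
    parent = list(range(n + 1))

    def find(v):                          # path-compressing find
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for cost, v, w in sorted(edges):      # process edges by ascending cost
        rv, rw = find(v), find(w)
        if rv != rw:                      # (v, w) does not create a cycle
            parent[rv] = rw
            tree.append((v, w, cost))
        if len(tree) == n - 1:
            break
    return tree

edges = [(1, 1, 2), (1, 1, 3), (1, 2, 3), (1, 6, 7), (2, 3, 4),
         (2, 5, 6), (2, 5, 7), (3, 1, 4), (4, 3, 5)]
mst = kruskal(7, edges)
print(sum(c for _, _, c in mst))   # 11: total cost of the spanning tree
```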