Part three: Greedy Algorithms

Heap

A heap is a data structure defined by two properties:
1) It is a complete binary tree (implemented using an array representation).
2) The values stored in a heap are partially ordered, i.e. there is a relationship between the value stored at any node and the values of its children.
(A complete binary tree has all levels except the bottom filled out completely, and the bottom level has all of its nodes filled in from left to right.)

Heaps are often used to implement priority queues and in external sorting algorithms. There are many situations, both in real life and in computing applications, where we wish to choose the next "most important" item from a collection of people, tasks, or objects.

Example:
  Input:  19  2 46 16 12 54 64 22 17 66 37 35
  Heap:   66 64 54 17 37 35 46  2 16 12 22 19
  Sorted:  2 12 16 17 19 22 35 37 46 54 64 66

Algorithm HeapSort(x, n)
Input:  x : array in the range 1..n
Output: x : the array in sorted order
    read val
    x[1] ← val
    for i ← 1 to n − 1                  {BuildHeap}
    {   read val
        InsertToHeap(x, i, val)
    }
    for i ← n downto 2
    {   swap(x[1], x[i])
        Rearrange(x, i − 1)
    }
end.

One way to build a heap is to insert the elements one at a time. Each insertion takes O(lg n) time in the worst case, since the value being inserted can move at most the distance from the bottom of the tree to the top. Inserting n values therefore costs O(n lg n) in total.

HeapSort does at most 2(n − 1)⌈lg n⌉ comparisons of keys in the worst case; it uses fewer than 2n lg n comparisons to sort n elements. No extra working storage, except for one record position, is required (it sorts in place).

Algorithm InsertToHeap(a, n, x)
Input:  a : array of size n representing a heap; x : a number
Output: a : new heap; n : new size of the heap
    n ← n + 1
    a[n] ← x
    child ← n
    parent ← ⌊n / 2⌋
    while parent ≥ 1
    {   if a[parent] < a[child] then
        {   swap(a[parent], a[child])
            child ← parent
            parent ← ⌊parent / 2⌋
        }
        else parent ← 0                 {stop the loop}
    }
end.
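As an illustrative sketch, the routines above (HeapSort, InsertToHeap, and the Rearrange step it relies on) can be written in Python. Note that Python lists are 0-based, so a node at index i has children at 2i + 1 and 2i + 2, unlike the 1-based pseudocode; the function names mirror the pseudocode.

```python
def insert_to_heap(a, n, x):
    """Append x to the max-heap a[0:n] and sift it up (InsertToHeap)."""
    a.append(x)
    child = n
    while child > 0:
        parent = (child - 1) // 2
        if a[parent] < a[child]:
            a[parent], a[child] = a[child], a[parent]
            child = parent
        else:
            break                       # heap property restored: stop

def rearrange(a, n):
    """Sift the root of a[0:n] down to restore the max-heap property (Rearrange)."""
    parent = 0
    while 2 * parent + 1 < n:           # while the left child exists
        child = 2 * parent + 1
        if child + 1 < n and a[child] < a[child + 1]:
            child += 1                  # pick the larger of the two children
        if a[parent] < a[child]:
            a[parent], a[child] = a[child], a[parent]
            parent = child
        else:
            break                       # heap property restored: stop

def heap_sort(x):
    """Sort ascending: build a max-heap by insertion, then repeatedly
    swap the maximum to the end and rearrange the remaining heap."""
    heap = []
    for i, val in enumerate(x):
        insert_to_heap(heap, i, val)
    for i in range(len(heap) - 1, 0, -1):
        heap[0], heap[i] = heap[i], heap[0]
        rearrange(heap, i)
    return heap
```

Running `heap_sort` on the example array from the notes reproduces the sorted sequence shown above.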
The while loop in InsertToHeap iterates at most ⌈lg n⌉ + 1 times, so one insertion costs O(lg n); over the n − 1 insertions the total cost is O((n − 1) lg n).

Algorithm Rearrange(a, n)
Input:  a : array of size n representing a heap
Output: a : new heap; n : new size of the heap
    n ← n − 1
    parent ← 1
    child ← 2
    while child ≤ n − 1
    {   if a[child] < a[child + 1] then child ← child + 1
        if a[child] > a[parent] then
        {   swap(a[parent], a[child])
            parent ← child
            child ← 2 · child
        }
        else child ← n                  {stop the loop}
    }
end.

Likewise, each call to Rearrange costs O(lg n), so the n − 1 calls in the second phase of HeapSort cost O((n − 1) lg n) in total.

Huffman Coding Tree

The space/time tradeoff suggests that one can often gain an improvement in space requirements in exchange for a penalty in running time. A typical example is storing files on disk: if the files are not actively used, the owner may wish to compress them to save space; they can then be uncompressed for use, which costs some time, but only once.

We can represent a set of items in a computer program by assigning a unique code to each item. The ASCII coding scheme assigns a unique 8-bit value to each character. It takes lg 128 = 7 bits to provide 128 unique codes for the 128 symbols of the ASCII character set; the eighth bit is used either to check for transmission errors, or to support extended ASCII codes with an additional 128 characters.

Requiring ⌈lg n⌉ bits to represent n unique code values assumes that all the codes have the same length (a fixed-length coding scheme). If all characters were used equally often, a fixed-length coding scheme would be the most space-efficient method. But not all characters are used equally often, so it is possible to give the more frequent letters shorter codes, at the price of longer codes for the rarer characters. Huffman coding is an approach to assigning such variable-length codes.

Building Huffman Coding Trees

A Huffman coding tree assigns codes to characters such that the length of each code depends on the relative frequency, or weight, of the corresponding character (it is a variable-length code).
The Huffman code for each letter is derived from a full binary tree called the Huffman tree. Each leaf corresponds to a letter. The goal is to build a tree with minimum external path weight, where the weighted path length of a leaf is its weight times its depth. A letter with high weight should therefore have low depth.

Process of Building the Huffman Tree
First, order the letters in a list by ascending weight (frequency). Remove the first two letters (the ones with the lowest weight) from the list and assign them to leaves in what will become the Huffman tree. Make these leaves the children of an internal node whose weight is the sum of the weights of the two children. Put this sum back into the list, in the position necessary to preserve the order of the list. The process is repeated until only one item remains on the list. This process builds a full Huffman tree.

Example letters, ordered by ascending weight:
  Z (2), K (7), F (24), C (32), U (37), D (42), L (42), E (120)

Note: this is an example of a greedy algorithm, because at each step the two subtrees with least weight are joined together.

Assigning Huffman Codes
After the Huffman tree is constructed, we start assigning codes to individual letters. Starting at the root, we assign either a 0 or a 1 to each edge in the tree: 0 to edges connecting a node with its left child, and 1 to edges connecting a node with its right child. The Huffman code for a letter is then simply the binary number determined by the path from the root to the leaf corresponding to that letter.

  Letter   Frequency   Code     Bits
  C            32      1110      4
  D            42      101       3
  E           120      0         1
  F            24      11111     5
  K             7      111101    6
  L            42      110       3
  U            37      100       3
  Z             2      111100    6

Decoding a message is done by looking at the bits in the coded string from left to right until a letter is decoded. This uses the Huffman tree in the reverse of the process used to generate the codes: starting from the root, we take branches depending on the bit value (0 = left, 1 = right) until we reach a leaf node.
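As a small illustration, the code table above can be used directly for encoding and for the left-to-right decoding scan just described (a Python sketch; the `encode`/`decode` helper names are mine, and the dictionary lookup stands in for walking the tree):

```python
# Huffman code table from the notes
codes = {'C': '1110', 'D': '101', 'E': '0', 'F': '11111',
         'K': '111101', 'L': '110', 'U': '100', 'Z': '111100'}

def encode(text):
    """Concatenate the variable-length code of each letter."""
    return ''.join(codes[ch] for ch in text)

def decode(bits):
    """Scan the bits left to right, emitting a letter as soon as the
    accumulated buffer matches a code; the prefix property guarantees
    that the first match is the only possible one."""
    reverse = {code: letter for letter, code in codes.items()}
    decoded, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in reverse:           # reached a leaf of the Huffman tree
            decoded.append(reverse[buffer])
            buffer = ''
    return ''.join(decoded)
```

For example, `encode('DEED')` yields `'10100101'`, and decoding that string unambiguously recovers `'DEED'`.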
A set of codes is said to have the prefix property if no code in the set is a prefix of another. The prefix property guarantees that there is no ambiguity in how a bit string is decoded: once we reach the last bit of a code during the decoding process, we know which letter it is the code for. Huffman codes have the prefix property, since any proper prefix of a code corresponds to an internal node, while all codes correspond to leaf nodes.

The average expected cost per character is the sum, over all characters, of the cost of each character (Ci) times the probability of its occurrence (Pi):

  C1*P1 + C2*P2 + … + Cn*Pn

or, writing each probability as a frequency Fi divided by the total frequency FT:

  (C1*F1 + C2*F2 + … + Cn*Fn) / FT

The expected cost per letter for the tree above is 2.57, while a fixed-length code would require lg 8 = 3 bits per letter. Huffman coding thus saves about 14% for this set of letters.

Algorithm Huffman(s, f)
Input:  s : a string of characters; f : an array of frequencies
Output: T : a Huffman tree for s
    insert all characters into a heap H according to their frequencies
    while H is not empty
    {   if H contains only one character X then
            make X the root of T
        else
        {   pick two characters X and Y with the lowest frequencies
                and delete them from H
            replace X and Y with a new character Z whose frequency is
                the sum of the frequencies of X and Y
            insert Z into H
            make X and Y children of Z in T     {Z has no parent yet}
        }
    }
end.

Implementation
The operations required for Huffman's encoding are:
  - insertions into a data structure,
  - deletions of the two characters with minimal frequency from the heap,
  - building the tree.
A heap is a good data structure for the first two operations, each of which requires O(lg n) steps in the worst case.

Complexity
Building the tree takes constant time per node. Insertions and deletions take O(lg n) steps each. Overall, the running time of the algorithm is O(n lg n).
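A minimal Python sketch of this greedy construction, using the standard heapq module for the heap H (the function names and the tuple-based tree representation are my own; a counter is added to each heap entry only to break frequency ties without comparing trees):

```python
import heapq

def huffman(freqs):
    """Build a Huffman tree greedily: repeatedly merge the two
    lowest-frequency nodes into a new node whose frequency is their sum.
    A tree is either a single letter (leaf) or a (left, right) pair."""
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # two characters with the
        f2, _, t2 = heapq.heappop(heap)     # lowest frequencies
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    return heap[0][2]                       # the remaining item is the root

def assign_codes(tree, prefix=''):
    """0 on the edge to the left child, 1 on the edge to the right child."""
    if isinstance(tree, str):               # leaf: the path so far is the code
        return {tree: prefix or '0'}
    left, right = tree
    codes = {}
    codes.update(assign_codes(left, prefix + '0'))
    codes.update(assign_codes(right, prefix + '1'))
    return codes
```

With the frequencies from the table above, the total weighted code length comes out to 785 bits for 306 characters, i.e. the expected 2.57 bits per letter (ties among equal frequencies may shuffle which letter gets which code, but the total cost of an optimal tree is the same).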
Shortest Path Problem (Dijkstra's Algorithm)

Use an adjacency matrix representation, in which the edge lengths are the costs (distances, times, …) associated with the edges. Then initialize an array called Dist to equal the 1st row of the edge matrix.

Algorithm Dijkstra
    S ← {1}         {initialize Dist to the edges from vertex 1, the 1st row of edges}
    for i ← 1 to v − 1
    {   choose a vertex w, which is not in S, for which Dist[w] is minimal
        add w to S
        for each vertex j still not in S
            Dist[j] ← min(Dist[j], Dist[w] + edge[w, j])
    }
end.

Edge matrix of the worked example (∞ = no edge):

          1    2    3    4    5    6
  1  [    0   30    ∞   50   40  100 ]
  2  [    ∞    0   40    ∞    ∞    ∞ ]
  3  [    ∞    ∞    0   10    ∞   30 ]
  4  [    ∞    ∞   10    0   20    ∞ ]
  5  [    ∞    ∞    ∞    ∞    0   70 ]
  6  [    ∞    ∞    ∞    ∞    ∞    0 ]

  Iter     S               W   Dist(2)  Dist(3)  Dist(4)  Dist(5)  Dist(6)
  initial  {1}             -     30*       ∞       50       40      100
  1        {1,2}           2     30       70       50       40*     100
  2        {1,2,5}         5     30       70       50*      40      100
  3        {1,2,5,4}       4     30       60*      50       40      100
  4        {1,2,5,4,3}     3     30       60       50       40       90*
  5        {1,2,5,4,3,6}   6     30       60       50       40       90

(* marks the minimal Dist value among the vertices not yet in S, i.e. the vertex chosen as w in the next iteration.)

This algorithm is called a greedy algorithm because at each stage it simply does what is locally optimal. If the graph is undirected, we can think of it as a directed graph in which each undirected edge corresponds to two directed edges in opposite directions with the same length. Computing the cost of the shortest path from V0 to each vertex requires O(n²) time.

Minimum Cost Spanning Tree (Kruskal's Algorithm)

One approach to determining a minimum-cost spanning tree of a graph is due to Kruskal. We partition the set of vertices into |V| equivalence classes, each consisting of one vertex, and then process the edges in order of weight. Edges can be processed in order of weight by using a min-heap, which is faster than sorting the edges first. Examples of applications where a solution to this problem is useful include soldering the shortest set of wires needed to connect a set of terminals on a circuit board, and connecting a set of cities by telephone in such a way as to require the least amount of wire. Kruskal's algorithm is dominated by the time required to process the edges.
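The algorithm and the worked example can be sketched in Python as follows (the matrix mirrors the example's edge matrix, with `INF` standing for a missing edge; vertex 1 of the notes is index 0 here):

```python
INF = float('inf')

# Adjacency matrix of the worked example; row i gives the edge costs
# out of vertex i+1 (INF = no edge).
edges = [
    [0,   30,  INF, 50,  40,  100],
    [INF, 0,   40,  INF, INF, INF],
    [INF, INF, 0,   10,  INF, 30],
    [INF, INF, 10,  0,   20,  INF],
    [INF, INF, INF, INF, 0,   70],
    [INF, INF, INF, INF, INF, 0],
]

def dijkstra(edges):
    """Single-source shortest paths from vertex 0 (vertex 1 in the notes)."""
    n = len(edges)
    dist = list(edges[0])               # initialize Dist to the first row
    in_s = [False] * n
    in_s[0] = True                      # S = {1}
    for _ in range(n - 1):
        # choose the vertex w not in S for which Dist[w] is minimal
        w = min((v for v in range(n) if not in_s[v]), key=lambda v: dist[v])
        in_s[w] = True
        for j in range(n):              # relax edges out of w
            if not in_s[j]:
                dist[j] = min(dist[j], dist[w] + edges[w][j])
    return dist
```

The result matches the last row of the iteration table: distances 30, 60, 50, 40, 90 to vertices 2 through 6.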
The total cost of the algorithm is O(|E| lg |E|) in the worst case, and close to O(|V| lg |E|) in the average case.

Consider the weighted graph below, and suppose the edges have been sorted into the following order:

  Edge    Cost
  (1, 2)    1
  (1, 3)    1
  (2, 3)    1
  (6, 7)    1
  (3, 4)    2
  (5, 6)    2
  (5, 7)    2
  (1, 4)    3
  (3, 5)    4

The efficiency of the algorithm depends upon the implementations chosen for the priority queue (heap) and for Union-Find. A priority queue implemented by a heap requires O(lg n) time for the enqueue (insertion) and dequeue (deletion) operations. A Union-Find structure implemented by weight-balanced trees also yields O(lg n) time per operation.

Algorithm Kruskal
    T ← ∅
    while T contains less than n − 1 edges
    {   choose an edge (v, w) from E of lowest cost
        delete (v, w) from E
        if (v, w) does not create a cycle in T
            then add (v, w) to T
            else discard (v, w)
    }
end.
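A compact Python sketch of Kruskal's algorithm on the edge list above, using a simple Union-Find to detect cycles (sorting the edges up front instead of using a min-heap, for brevity; the helper names are mine):

```python
def kruskal(n, edges):
    """Minimum-cost spanning tree of an n-vertex graph (vertices 1..n).
    edges is a list of (v, w, cost) triples."""
    parent = list(range(n + 1))         # Union-Find: each vertex its own class

    def find(v):
        """Return the representative of v's equivalence class."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    mst = []
    for v, w, cost in sorted(edges, key=lambda e: e[2]):  # ascending cost
        rv, rw = find(v), find(w)
        if rv != rw:                    # (v, w) joins two components: no cycle
            parent[rv] = rw
            mst.append((v, w, cost))
        if len(mst) == n - 1:           # spanning tree complete
            break
    return mst

example_edges = [(1, 2, 1), (1, 3, 1), (2, 3, 1), (6, 7, 1), (3, 4, 2),
                 (5, 6, 2), (5, 7, 2), (1, 4, 3), (3, 5, 4)]
```

On the 7-vertex example, the algorithm keeps six edges and discards (2, 3), (5, 7), and (1, 4) as cycle-creating, for a total tree cost of 11.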