Assignment 2 (due at 10:30 a.m. on Friday of Week 10)
Question 1 (given in Tutorial 5)
Question 2 (given in Tutorial 7)
• If you do Question 1 only, you get 60 points.
• If you do Question 2 only, you get 90 points.
• If you correctly do both Question 1 and Question 2, you get 100 points.
• Bonus: 5 points will be given to those who write a Java program for the Huffman code algorithm.

Review of Lecture 1 to Lecture 6

Lecture 1: Some concepts
• Pseudocode, Abstract Data Type (ADT). (Page 60 of the textbook.)
• Stack. Give the ADT of a stack (slide 11 of Lecture 1). The interface is on slide 19.
  (Q: Is the interface equivalent to the ADT? Not really. The ADT also specifies how insertion and deletion behave, i.e., first in, last out.)
• Applications: parentheses matching.

Lecture 2: Linked list
• Singly linked list
• Doubly linked list
Just know how to set up a list. (Assignment 1)

Lecture 3: Analysis of Algorithms (important)
• Primitive operations
• Counting the number of primitive operations of an algorithm
• Big-O notation: 2n is O(n); 5n^2 + 10n + 11 is O(n^2).

Lecture 4: Tree
• Definition of a tree (slide 7)
• Tree terminology: root, internal node, external node (leaf), depth of a node, height of a node.
• Inorder traversal of a binary tree
• Tree ADT (slide 11), Binary tree ADT (slide 17)
• In terms of programming, understand TreeInExample1.java. (If tested in the exam, the Java code will be given. I do not want to give long code.)

Lecture 5: More on Trees
• Linked structure for a binary tree. Just understand the node.
• Preorder traversal for any tree
• Postorder traversal for any tree
• Array-based representation of a binary tree (slide 9)
• Algorithms for Depth() and Height() (slides 12-15)

Lecture 6: Priority Queue (Heaps)
Priority Queue ADT (slide 2)
Heap:
1. Definition of a heap.
2. What does “heap-order” mean?
3. Complete binary tree (what is a complete binary tree?)
4. The height of a complete binary tree with n nodes is O(log n).
5. Insert a node into a heap: running time O(log n).
6. removeMin: remove a node with minimum key.
   Running time: O(log n).
Array-based complete binary tree representation.
Show a sample exam paper.

Exercise:
• Give some trees and ask students to give the InOrder, PostOrder and PreOrder traversals.
• Tutorial 6, Question 2: using PreOrder.
• Given a complete binary tree, write the array representation.
• Given an array, draw the complete binary tree.
• Given a heap, show the steps of removeMin.
• Given a heap, show the steps to insert a node with key 3. (Do it for the tree version and for the array version.)
• Linear-time construction of a heap.

Huffman codes (Page 565, Chapter 12.4)
Binary character code: each character is represented by a unique binary string.
A data file can be coded in two ways:

                       a    b    c    d    e     f
frequency (%)          45   13   12   16   9     5
fixed-length code      000  001  010  011  100   101
variable-length code   0    101  100  111  1101  1100

For a file of 100 characters, the first way needs 100 × 3 = 300 bits.
The second way needs 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.

Variable-length code
Some care is needed to read the code.
Example: 001011101 with codewords a=0, b=00, c=01, d=11. Where to cut? 00 can be read as either aa or b.
Prefixes of 0011: 0, 00, 001, and 0011.
Prefix codes: no codeword is a prefix of any other codeword (prefix-free).
Prefix codes are simple to encode and decode.

Using the codewords in the table to encode and decode
Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords).
Decode: 001011101 = 0.0.101.1101 = aabe (use the tree for the variable-length code below).

Tree for the fixed-length code (0 = left edge, 1 = right edge):
100
  0: 86
     0: 58
        0: a:45
        1: b:13
     1: 28
        0: c:12
        1: d:16
  1: 14
     0: 14
        0: e:9
        1: f:5

Tree for the variable-length code (0 = left edge, 1 = right edge):
100
  0: a:45
  1: 55
     0: 25
        0: c:12
        1: b:13
     1: 30
        0: 14
           0: f:5
           1: e:9
        1: d:16

Binary tree
In the tree of an optimal code, every nonleaf node has two children.
The fixed-length code in our example is not optimal: in its tree, the node 14 directly under the root has only one child.
The total number of bits required to encode a file is
\[ B(T) = \sum_{c \in C} f(c) \, d_T(c) \]
• f(c): the frequency (number of occurrences) of c in the file
• d_T(c): the depth of c's leaf in the tree T

Constructing an optimal code
Formal definition of the problem:
Input: a set of characters C = {c1, c2, …, cn}; each c ∈ C has frequency f[c].
Output: a binary tree representing codewords so that the total number of bits required for the file is minimized.
Huffman proposed a greedy algorithm to solve the problem.

Steps of the algorithm on the example (each step lists the roots in the queue; x(l, r) is an internal node with frequency x and children l and r):
(a) f:5, e:9, c:12, b:13, d:16, a:45
(b) c:12, b:13, 14(f:5, e:9), d:16, a:45
(c) 14(f:5, e:9), d:16, 25(c:12, b:13), a:45
(d) 25(c:12, b:13), 30(14(f:5, e:9), d:16), a:45
(e) a:45, 55(25(c:12, b:13), 30(14(f:5, e:9), d:16))
(f) 100(a:45, 55(25(c:12, b:13), 30(14(f:5, e:9), d:16)))

HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)

The Huffman Algorithm
This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
C is a set of n characters, and each character c in C has a defined frequency f[c].
Q is a priority queue, keyed on f, used to identify the two least-frequent objects to merge together.
The result of the merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.

Time complexity
Lines 4-8 are executed n-1 times.
Each heap operation in Lines 4-8 takes O(lg n) time.
The total time required is O(n lg n).
Note: The details of the heap operations will not be tested. The time complexity O(n lg n) should be remembered.

Another example:
Characters and frequencies: e:4, a:6, c:6, b:9, d:11.
1. Merge e:4 and a:6 into a node with frequency 10.
2. Merge c:6 and b:9 into a node with frequency 15.
3. Merge 10 and d:11 into a node with frequency 21.
4. Merge 15 and 21 into the root with frequency 36.

Resulting tree (0 = left edge, 1 = right edge):
36
  0: 15
     0: c:6
     1: b:9
  1: 21
     0: 10
        0: e:4
        1: a:6
     1: d:11

Summary
Huffman code: Given a set of characters and their frequencies, you should be able to construct the binary tree for the Huffman code.
Proofs of why this algorithm gives an optimal solution are not required.
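
The HUFFMAN(C) pseudocode can be sketched in Java roughly as follows. This is a minimal sketch, not the course's reference solution: it assumes java.util.PriorityQueue (a binary heap) stands in for Q, and the names HuffmanSketch, Node, buildTree, collectCodes and cost are made up for illustration. It also assumes the input contains at least one character.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Sketch of HUFFMAN(C). All names here are illustrative, not from the lecture.
final class HuffmanSketch {

    // Leaves carry a character; internal nodes carry only the summed frequency.
    static final class Node {
        final char symbol;       // meaningful only for leaves
        final int freq;          // f[c] for a leaf, f[x] + f[y] for an internal node
        final Node left, right;  // both null for a leaf

        Node(char symbol, int freq, Node left, Node right) {
            this.symbol = symbol;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Builds the code tree bottom-up. Assumes freq has at least one entry.
    static Node buildTree(Map<Character, Integer> freq) {
        // Q: a min-priority queue keyed on frequency (pseudocode line 2).
        Comparator<Node> byFreq = (p, r) -> Integer.compare(p.freq, r.freq);
        PriorityQueue<Node> q = new PriorityQueue<>(byFreq);
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            q.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        int n = q.size();
        for (int i = 1; i <= n - 1; i++) {                 // pseudocode lines 3-8
            Node x = q.poll();                             // EXTRACT_MIN
            Node y = q.poll();                             // EXTRACT_MIN
            q.add(new Node('\0', x.freq + y.freq, x, y));  // INSERT
        }
        return q.poll();                                   // line 9: the root
    }

    // Walks the tree: a left edge appends 0, a right edge appends 1.
    static void collectCodes(Node node, String prefix, Map<Character, String> out) {
        if (node.isLeaf()) {
            // A one-character alphabet would give an empty codeword; use "0" instead.
            out.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        collectCodes(node.left, prefix + "0", out);
        collectCodes(node.right, prefix + "1", out);
    }

    // B(T): sum over all leaves of frequency times depth.
    static int cost(Node node, int depth) {
        if (node.isLeaf()) {
            return node.freq * depth;
        }
        return cost(node.left, depth + 1) + cost(node.right, depth + 1);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 45);
        freq.put('b', 13);
        freq.put('c', 12);
        freq.put('d', 16);
        freq.put('e', 9);
        freq.put('f', 5);

        Node root = buildTree(freq);
        Map<Character, String> codes = new TreeMap<>();
        collectCodes(root, "", codes);
        System.out.println(codes);          // {a=0, b=101, c=100, d=111, e=1101, f=1100}
        System.out.println(cost(root, 0));  // 224
    }
}
```

On the lecture's first example no two frequencies tie during the merges, so the sketch reproduces the variable-length codewords from the table. When frequencies tie (as with a:6 and c:6 in the second example), PriorityQueue may break the tie differently from the slides, so the tree can differ in shape while the total number of bits stays the same.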