VERSION: B

CSE 100 Midterm #1
Summer 2014: July 15

Problem   Topic                        Points Possible   Points Earned   Grader
1         Data Structure comparisons   15
2         BSTs                         10
3         Running Time Analysis        30
4         Huffman Coding               30
5         C++                          15
Total                                  100

This exam is closed book, closed notes. Write your name on every page, including reference and scratch paper. Scratch paper must be turned in at the end of the exam.

You have 80 minutes to complete this exam. Work to maximize points. If you don’t know the answer to a problem, move on and come back later. Most importantly, stay calm and don’t panic. You can do this.

Name:________________________________________
ID:___________________________________________

Exam versions of adjacent students MUST BE DIFFERENT. If your version is the same as your neighbor’s version, raise your hand.

Name of student to your LEFT:
Name of student to your RIGHT:
Exam version of student to your LEFT:
Exam version of student to your RIGHT:
(Write “N/A” if the seat immediately to your left or right is not occupied, or is a wall or aisle, etc.)

DO NOT OPEN THIS EXAM UNTIL YOU ARE INSTRUCTED TO DO SO.

1. Data Structure Comparisons [15 points]

Assume you have the choice of the following data structures: sorted array, sorted linked list, unsorted linked list, binary search tree, and heap. Choose the appropriate data structure if your algorithm repeatedly performs each of the following functions. Briefly justify your answer.

a. Searches for elements in a static data set (insertions and deletions are rare).

Use a sorted array or a binary search tree.
Reason: Both a sorted array and a (balanced) binary search tree support search in O(log n) time, while a sorted linked list, an unsorted linked list, and a heap take O(n) time to search. Because insertions and deletions are rare in a static data set, their running time does not need to be considered.

b. Searches for elements in a dynamic data set (insertions and deletions are frequent).
Use a binary search tree.
Reason: Both a sorted array and a (balanced) binary search tree support search in O(log n) time, while a sorted linked list, an unsorted linked list, and a heap take O(n) time to search. For a dynamic data set, a binary search tree is the better choice because insertion and deletion in a balanced BST take O(log n) time, which is much faster than the O(n) time a sorted array needs to shift elements.

c. Extracts the element with the minimum key value.

Choose a sorted linked list.
Reason: With the minimum kept at the head of the list, extracting it takes O(1) time, which is faster than the other structures.

2. Binary Search Trees [10 points]

For each of the following, state whether or not it is a legal Binary Search Tree (BST). If the tree is not a legal BST, state why not, annotating the tree where appropriate. For all the given trees, indicate whether they are balanced or not. Justify your answer.

[Figure: trees A and B. The extracted node labels read 52, 52, 12, 30, 75, 30, 12, 35; the tree shapes are not recoverable.]

a) Is tree (A) a legal BST (circle one)? Yes / No. If not, why not? [3 points]

b) Is tree (A) balanced (circle one)? Yes / No. Justify your answer. [2 points]
The balance factor of every node in tree A lies between -1 and 1. The tree cannot be rearranged to have a smaller height.

c) Is tree (B) a legal BST (circle one)? Yes / No. If not, why not? [2 points]
For any node in a BST, every element in its left subtree must be smaller than the node, and every element in its right subtree must be greater than the node. Consider the node 30: its right child is 12, which is smaller than 30. So B is not a legal BST.

d) Is tree (B) balanced (circle one)? Yes / No. Justify your answer. [3 points]
The balance factor of the node 52 is greater than 1. Currently, the tree height is 3. The tree can be rearranged to have height = 2.

3. Running Time Analysis [30 points]

a.
Write the most general equation for the average number of comparisons needed to find an element in a particular binary search tree with N nodes, where $d(x_i)$ is the depth of node $x_i$ in the tree and $p_i$ is the probability of searching for node $x_i$. [3 points]

Note: some students wrote

$\sum_{i=1}^{N} p_i \, d(x_i) = \frac{1}{N} \sum_{i=1}^{N} d(x_i)$

The assumption $p_i = 1/N$ was made in the lecture notes, but here you are asked for the most general equation, which is the left-hand side alone. If you wrote just the right-hand side, this is incorrect, since it is not the most general form.

b. Which of the following assumptions did you rely on in writing the above equation? [2 points]
i. The tree is approximately balanced.
ii. All nodes in the tree are equally likely to be searched for.
iii. All orders of insertions are equally likely to occur.
iv. All priorities are drawn from a uniform probability distribution.
v. None of the above.

No assumptions were made here. If you wrote the equation above and chose ii) as the answer, you received full credit. That was not what the question was asking, however.

c. Construct all possible binary search trees with the keys 2, 5, 7, 9 under the restriction that the second key inserted into the tree has to be even. [10 points]

d. Compute the average total depth over all trees that can be constructed with the keys (2, 5, 7, 9, 15) under the restriction that the first key inserted into the tree is even. You are given the following recursive relationship:

N*D(N) = (N+1)*D(N-1) + 2N - 1,

where D(N) is the expected total depth of all trees with N keys, under the assumption that all keys are equally likely to be inserted into the tree. [15 points]

The only even key is 2, so every such tree has 2 at the root, an empty left subtree, and a right subtree S containing 5, 7, 9, 15.

Applying the recurrence to the 4 keys in S:
D(2) = 1 + 2 = 3
D(3) = (4*3 + 5)/3 = 17/3
D(4) = (5*(17/3) + 7)/4 = 53/6

When S is attached below the root 2, each of its 4 nodes becomes one level deeper (+4), and the root itself has depth 1 (+1). The total depth is:
53/6 + 4 + 1 = 83/6

4.
Huffman coding [30 points]

i) Which of the binary trees is a better encoding scheme over the symbols {h, u, f, m, a, n} if all symbols had a non-zero frequency? Justify your answer. [5 points]

[Figure: candidate code trees A and B over the symbols h, u, f, m, a, n.]

B is better, because B is prefix free.

ii) Consider the following symbols with the given frequency distribution:

Symbol   Frequency   Code (see part a)   Frequency that would yield a worse lower bound on expected codeword length (see part d)
H        0.15        0000                0.167
U        0.2         01                  0.167
F        0.2         001                 0.167
M        0.4         1                   0.167
A        0.01        00011               0.167
N        0.04        00010               0.165

a. Draw the Huffman code tree below using the following conventions, and then use that tree to fill in the code table above: [15 points]
- The subtree with the lower frequency is always the right child when two trees are merged.
- The left child is always the 0 child, the right child is always the 1 child.
- Ties are broken using alphabetical ordering. In the case of a tie in frequency between two trees, the tree with the symbol that is earlier in the alphabet is the tree that is picked first to be merged. E.g., if trees with the symbols A and E had the same frequency, then the tree with A would be picked first.
- When merging two trees, the symbol that is alphabetically earlier is propagated up to the new root. If the trees have the same frequency, the tree with the symbol earlier in the alphabet is the left child of the root.

b. Using your tree, encode the following string. If you find extra bits at the end of the string just ignore them. [5 points]

HUFFMAN = 00000100100110001100010

c. What is the average code length of your Huffman code? [5 points]

4*0.15 + 2*0.2 + 3*0.2 + 1*0.4 + 5*0.01 + 5*0.04 = 2.25

d.
Fill in the final column of the table above with a frequency distribution which would lead to a worse theoretical minimum expected length per coded symbol (i.e., has a higher entropy) than the current frequency distribution. Hint: The theoretical minimum expected length per coded symbol was referred to as L_ave in your book and in class, but you don’t necessarily need to remember the exact formula to get the right answer here. [5 points]

Different correct answers are possible. A nearly uniform distribution works, since the uniform distribution maximizes entropy.

5. C++ Concepts [15 points]

a. Consider the following implementation of a node in the Huffman tree:

    #ifndef HCNODE_HPP
    #define HCNODE_HPP

    class HCNode {
    public:
        HCNode* parent;      // pointer to parent; null if root
        HCNode* child0;      // pointer to "0" child; null if leaf
        HCNode* child1;      // pointer to "1" child; null if leaf
        unsigned char symb;  // symbol
        int count;           // count/frequency of symbols in subtree

        bool operator<(HCNode const &) const;
    };

    #endif

    bool HCNode::operator<(HCNode const & other) const {
        if (count != other.count) return count > other.count;
        return symb < other.symb;
    }

Now consider the following code snippet:

    HCNode n1, n2, n3, n4;
    n1.count = 100; n1.symb = 'A';
    n2.count = 200; n2.symb = 'B';
    n3.count = 100; n3.symb = 'C';

I) For the above code snippet, what does each of the expressions given below evaluate to? Choose TRUE or FALSE: [4 points]
i) n1 < n2   FALSE
ii) n3 < n1   FALSE

II) Explain why the less-than operator was overloaded in the HCNode class. [1 point]
Because we need to compare two HCNode objects (for example, to order them by count), and C++ does not define operator< for user-defined classes.

b. Show the contents of the array ‘a’ before and after line 4 of the given code is executed. [10 points]

    int a[5] = {0, 1, 2, 3, 4};  // line 1
    int* p = a + 2;              // line 2
    int &ra = *(p + 1);          // line 3
    ra = 5;                      // line 4
    p = a;

before: 0 1 2 3 4
after:  0 1 2 5 4