ITCS 2214 Exam 2 Study Guide

1. What is a tree data structure?
Answer: In computer science, a tree is a widely used abstract data type (ADT), or a data structure implementing this ADT, that simulates a hierarchical tree structure, with a root value and subtrees of children, represented as a set of linked nodes. A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated and none points to the root. Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value assigned to each node. Both perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with node by node (rather than as a list of nodes and an adjacency list of edges between nodes, as one may represent a digraph, for instance). For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but in general, as a data structure, a given node contains only the list of its children and does not contain a reference to its parent (if any).

2. What is the root? Is the root at the top or bottom of a tree?
Answer: The topmost node in a tree is called the root node. Being the topmost node, the root node has no parent. It is the node at which operations on the tree commonly begin (although some algorithms begin with the leaf nodes and work up, ending at the root). All other nodes can be reached from it by following edges or links. (In the formal definition, each such path is also unique.) In diagrams, it is typically drawn at the top.
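The recursive definition from question 1 (a value plus a list of child references, with no reference back to the parent) can be sketched in Python as follows. This is only an illustration: the names TreeNode, add_child, and count_nodes are assumptions made for this example and do not come from the course material.

```python
class TreeNode:
    """One node of a general tree: a value plus references to its children."""

    def __init__(self, value):
        self.value = value
        self.children = []  # only children are stored; there is no parent reference

    def add_child(self, value):
        """Create a child node holding value, attach it, and return it."""
        child = TreeNode(value)
        self.children.append(child)
        return child


def count_nodes(node):
    """Count the nodes in the subtree rooted at node (the node itself included)."""
    return 1 + sum(count_nodes(child) for child in node.children)


# A small hypothetical tree: the root has two children, one of which has a child.
root = TreeNode("root")
a = root.add_child("a")
b = root.add_child("b")
a.add_child("a1")

print(count_nodes(root))  # 4
print(count_nodes(a))     # 2 (every node is the root of its own subtree)
```

Because count_nodes works on any node, it also shows that operations normally start at the root and reach every other node by following child links.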
In some trees, such as heaps, the root node has special properties. Every node in a tree can be seen as the root node of the subtree rooted at that node.

3. What is a node?
Answer: A node is a structure which may contain a value or a condition, or represent a separate data structure (which could be a tree of its own). Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn growing downwards). A node that has a child is called the child's parent node (or ancestor node, or superior). A node has at most one parent. Nodes that do not have any children are called leaf nodes. They are also referred to as terminal nodes.

Document1 3/19/2014 1

4. What is a leaf? What makes a node a leaf?
Answer: In computer science, a leaf node or external node is a node of a tree data structure that has zero child nodes. Often, leaf nodes are the nodes farthest from the root node. In a graph-theoretic tree, a leaf node is a vertex of degree 1 other than the root (except when the tree has only one vertex; then the root, too, is a leaf). Every tree has at least one leaf. A non-leaf node is called an internal node.

Some trees store data only in internal nodes, though this affects the dynamics of storing data in the tree. For example, with empty leaves, one can store an empty tree as a single leaf node. However, with leaves that can store data, it is impossible to store an empty tree unless one stores some kind of marker data in the leaf signifying that the leaf is empty (and thus that the tree is empty as well). Conversely, some trees store data only in the leaf nodes and use the internal nodes to hold other metadata, such as the range of values in the subtree rooted at that node. This type of tree is useful for range queries. Another example of this is a parse tree.
In this type of structure, the root node represents the starting symbol of a grammar, and all internal nodes represent derivations of non-terminals, which continue downward until concrete symbols are established. The leaves are the actual lexical tokens of the sentence.

5. What is the level of the root?
Answer: The root is at level 1 (or level 0, if levels are counted from zero). A related exercise: given a binary tree and a key, write a function that returns the level of the key. For the example tree (the figure is not reproduced here), if the input key is 3, the function should return 1; if the input key is 4, it should return 3. For a key that is not present in the tree, the function should return 0.

6. How do you determine the level of a node?
Answer: If the node is the root, its level is one (or zero, if that is how you count). If the node is not the root, its level is one greater than the level of its parent.

7. What is a path? What is the height of a tree?
Answer: A path is a sequence of nodes connected by child pointers; in a binary tree, the nodes followed from the root down to a leaf form a path. The height of a binary tree is defined as the number of nodes along the longest path from the root node to the deepest leaf node. Note that the height of an empty tree is 0.

8. What is a balanced tree?
Answer: A balanced binary tree is commonly defined as a binary tree in which the depths of the left and right subtrees of every node differ by 1 or less, although in general it is a binary tree where no leaf is much farther away from the root than any other leaf. (Different balancing schemes allow different definitions of "much farther".) Binary trees that are balanced according to this definition have a predictable depth (how many nodes are traversed from the root to a leaf, counting the root as node 0 and subsequent nodes as 1, 2, ..., n). Note that this depth counts from 0 at the root, unlike the height in question 7, which counts nodes. The depth grows in proportion to log2(n), where n is the number of nodes in the tree; when every level except possibly the last is full, it is exactly the integer part of log2(n), while other height-balanced trees can be slightly deeper. For example, for a balanced tree with only 1 node, log2(1) = 0, so the depth of the tree is 0.
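The level and height definitions from questions 5 to 7, and the "differ by 1 or less" balance test from question 8, can be sketched in Python as follows. The Node class, the function names, and the example tree are assumptions made for illustration: the guide's own figure is missing, so the tree below was simply chosen to agree with the stated answers (key 3 at level 1, key 4 at level 3).

```python
class Node:
    """A binary tree node with optional left and right children."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def level_of(node, key, level=1):
    """Return the level of key (the root is level 1), or 0 if key is absent."""
    if node is None:
        return 0
    if node.key == key:
        return level
    found = level_of(node.left, key, level + 1)
    if found:
        return found
    return level_of(node.right, key, level + 1)


def height(node):
    """Number of nodes on the longest root-to-leaf path; 0 for an empty tree."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if the subtree heights of every node differ by at most 1."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


# Hypothetical tree: 3 is the root, 2 and 5 are its children, 1 and 4 sit under 2.
tree = Node(3, Node(2, Node(1), Node(4)), Node(5))

print(level_of(tree, 3))   # 1
print(level_of(tree, 4))   # 3
print(level_of(tree, 99))  # 0 (key not present)
print(height(tree))        # 3
print(is_balanced(tree))   # True
```

This is_balanced recomputes heights at every node, which is simple to read but not the most efficient approach; it is meant only to mirror the definition.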
For a balanced tree with 100 nodes, log2(100) = 6.64, so it has a depth of 6.

9. What is a complete tree?
Answer: A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible. A tree is called an almost complete binary tree or nearly complete binary tree if the exception holds, i.e. the last level is not completely filled. This type of tree is used as a specialized data structure called a heap.

10. What is a full tree?
Answer: A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children. A full tree is sometimes ambiguously defined as a perfect tree. Physicists define a binary tree to mean a full binary tree.

11. What specific traits does a binary search tree have?
Answer: A binary search tree keeps its keys ordered: for every node, the keys in its left subtree are smaller than the node's key and the keys in its right subtree are larger. As a binary tree, it also has the following properties:
The number of nodes n in a perfect binary tree can be found using this formula: n = 2^{h+1} - 1, where h is the depth of the tree.
The number of nodes n in a binary tree of height h is at least n = h + 1 and at most n = 2^{h+1} - 1, where h is the depth of the tree.
The number of leaf nodes l in a perfect binary tree can be found using this formula: l = 2^h, where h is the depth of the tree.
The number of nodes n in a perfect binary tree can also be found using this formula: n = 2l - 1, where l is the number of leaf nodes in the tree.
The number of null links (i.e., absent children of nodes) in a complete binary tree of n nodes is n + 1.
The number of internal nodes (i.e., non-leaf nodes, or n - l) in a complete binary tree of n nodes is ⌊n/2⌋.
For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1.

12. What is a heap?
Answer: In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if A is a parent node of B, then the key of node A is ordered with respect to the key of node B, with the same ordering applying across the heap.
Either the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node (this kind of heap is called a max heap), or the keys of parent nodes are less than or equal to those of the children and the lowest key is in the root node (min heap). Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm heapsort.

Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their immediate parents. The maximum number of children each node can have depends on the type of heap, but in many types it is at most two, which is known as a "binary heap".

The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact priority queues are often referred to as "heaps", regardless of how they may be implemented. Note that despite the similarity of the name "heap" to "stack" and "queue", the latter two are abstract data types, while a heap is a specific data structure, and "priority queue" is the proper term for the abstract data type.

A heap data structure should not be confused with the heap which is a common name for the pool of memory from which dynamically allocated memory is allocated. The term was originally used only for the data structure.

13. What is the difference between a heap and an ordinary binary tree?
Answer: An ordinary binary tree imposes no ordering on its keys. A binary search tree uses the definition that, for every node, the keys to its left are smaller and the keys to its right are greater. The heap, by contrast, is a binary tree that uses the following definition: if A and B are nodes, where B is the child node of A, then the value (key) of A must be larger than or equal to the value (key) of B.
That is, key(A) ≥ key(B). This is the max-heap form; a min-heap reverses the inequality.

14. What is the difference between a minheap and a maxheap?
Answer: A min-heap is a binary tree such that the data contained in each node is less than (or equal to) the data in that node's children. A max-heap is a binary tree such that the data contained in each node is greater than (or equal to) the data in that node's children.

15. Describe the process to add an element to a minheap. At what location may an element be added to a minheap? A maxheap?
Answer: For a minheap: place the new element in the next available position in the array. Compare the new element with its parent. If the new element is smaller, then swap it with its parent. Continue this process until either the new element's parent is smaller than or equal to the new element, or the new element reaches the root (index 0 of the array). A maxheap adds at the same location (the next available position) and uses the same process with the comparison reversed.

16. Describe the process of removing an element from a minheap. At what location may an element be removed from a minheap? A maxheap?
Answer: For a minheap: place the root element in a variable to return later. Remove the last element in the deepest level and move it to the root. While the moved element has a value greater than at least one of its children, swap it with the smaller-valued child. Return the original root that was saved. In both kinds of heap, elements are removed only from the root; a maxheap uses the same process with the comparisons reversed (swap with the larger-valued child).

17. What is the relation of a hash table and a hashing function?
Answer: Hashing is a technique for performing insertion, deletion, and find operations in almost constant time. As a very simple example, an array whose index serves as the key is a hash table: each index (key) can be used to access its value in constant time. The mapping from key to index must be simple to compute and must help in identifying the associated value. A function that generates this kind of mapping is known as a hash function. A hash table (a.k.a. hash map) is a data structure that uses a hash function to compute, from a key, the location of the associated value.

18. What is a collision in a hash table?
Answer: In computing, a hash table (also hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found. Ideally, the hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions (different keys that are assigned by the hash function to the same bucket) will occur and must be accommodated in some way. In a well-dimensioned hash table, the average cost (number of instructions) for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key-value pairs, at constant average cost per operation. In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

19. What is a perfect hashing function?
Answer: A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. A perfect hash function has many of the same applications as other hash functions, but with the advantage that no collision resolution has to be implemented. A perfect hash function for a specific set S that can be evaluated in constant time, and with values in a small range, can be found by a randomized algorithm in a number of operations that is proportional to the size of S. Any perfect hash function suitable for use with a hash table requires at least a number of bits that is proportional to the size of S.
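To make collisions and perfect hashing concrete, here is a small Python sketch. It assumes integer keys and a simple "key mod m" hash; the function names and the sample key set are invented for this illustration, and the brute-force search below is not the randomized construction mentioned above, only a way to show what "no collisions on a fixed set S" means.

```python
def buckets(keys, m):
    """Group keys by bucket index key % m, so collisions become visible."""
    table = {}
    for key in keys:
        table.setdefault(key % m, []).append(key)
    return table


def find_collision_free_modulus(keys):
    """Smallest m >= len(keys) for which key % m differs for every key.

    Brute force; assumes keys is a non-empty collection of distinct integers.
    """
    m = len(keys)
    while len({key % m for key in keys}) < len(keys):
        m += 1
    return m


S = [10, 22, 34, 47]  # hypothetical fixed key set

print(buckets(S, 4))                   # {2: [10, 22, 34], 3: [47]}  three keys collide
print(find_collision_free_modulus(S))  # 7
print(buckets(S, 7))                   # {3: [10], 1: [22], 6: [34], 5: [47]}
```

With m = 7 every key in S gets its own bucket, so "key mod 7" acts as a perfect hash function for this particular set, though not a minimal one, since three of the seven slots stay empty.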
A perfect hash function with values in a limited range can be used for efficient lookup operations, by placing keys from S (or other associated values) in a table indexed by the output of the function. Using a perfect hash function is best in situations where there is a frequently queried large set, S, which is seldom updated. Efficient solutions to performing updates are known as dynamic perfect hashing, but these methods are relatively complicated to implement. A simple alternative to perfect hashing, which also allows dynamic updates, is cuckoo hashing.

Describe and contrast the following hashing function types:

Extraction: Using digit extraction, selected digits are extracted from the key and used as the address. For example, using a six-digit employee number to hash to a three-digit address (000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.
379452 → 394
121267 → 112

Division: Perhaps the simplest of all the methods of hashing an integer x is to divide x by M and then to use the remainder. This is called the division method of hashing. In this case, the hash function is h(x) = x mod M. Generally, this approach is quite good for just about any value of M. However, in certain situations some extra care is needed in the selection of a suitable value for M. For example, it is often convenient to make M an even number. But this means that h(x) is even if x is even, and h(x) is odd if x is odd. If all possible keys are equiprobable, then this is not a problem. However if, say, even keys are more likely than odd keys, the function will not spread the hashed values of those keys evenly.

Folding: In this method the key is interpreted as an integer using some radix (say 10). The integer is divided into segments, each segment except possibly the last having the same number of digits. These segments are then added to obtain the home address. As an example, consider the key 76123451001214.
Assume we are dividing keys into segments of 3 digits. The segments for our key are 761, 234, 510, 012, and 14. The home bucket is 761 + 234 + 510 + 012 + 14 = 1531. In a variant of this scheme, the digits in alternate segments are reversed before adding. This variant is called folding at the boundaries and the original version is called shift folding. Applying the folding at the boundaries method to the above example, the segments after digit reversal are 761, 432, 510, 210, and 14; the home bucket is 761 + 432 + 510 + 210 + 14 = 1927.

Mid-Square: In midsquare hashing, the key is squared and the address is selected from the middle of the squared number. The most obvious limitation of this method is the size of the key. Given a key of 6 digits, the product will be 12 digits, which is beyond the maximum integer size of many computers. Because most personal computers can handle a 9-digit integer, let's demonstrate the concept with keys of 4 digits. Given a key of 9452, the midsquare address calculation is shown below using a 4-digit address (0000 to 9999).
9452 * 9452 = 89340304 : address is 3403

As a variation on the midsquare method, we can select a portion of the key, such as the middle three digits, and then use them rather than the whole key. Doing so allows the method to be used when the key is too large to square. For example, for the keys in Figure 6, we can select the first three digits and then use the midsquare method as shown below. (We select the third, fourth, and fifth digits as the address.)
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
378845: 378 * 378 = 142884 → 288
160252: 160 * 160 = 025600 → 560
045128: 045 * 045 = 002025 → 202
Note that in the midsquare method, the same digits must be selected from the product.
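The extraction, division, folding, and mid-square methods described above can be sketched in Python to reproduce the worked numbers. The function names and parameters are assumptions made for this illustration; in particular, mid_square takes the digit positions explicitly because the guide picks digits 3 to 6 of an 8-digit square in one example and digits 3 to 5 of a 6-digit square in the other.

```python
def digit_extraction(key, positions=(1, 3, 4), width=6):
    """Build the address from selected digit positions (1-based, from the left)."""
    digits = str(key).zfill(width)
    return int("".join(digits[p - 1] for p in positions))


def division(key, m):
    """Division method: h(x) = x mod M."""
    return key % m


def _segments(key, size):
    """Split the decimal digits of key into groups of `size` digits."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def shift_fold(key, size=3):
    """Shift folding: add the segments as they are."""
    return sum(int(seg) for seg in _segments(key, size))


def boundary_fold(key, size=3):
    """Folding at the boundaries: reverse every second segment before adding."""
    segs = _segments(key, size)
    return sum(int(seg[::-1]) if i % 2 else int(seg) for i, seg in enumerate(segs))


def mid_square(key, start, length, width):
    """Square key, pad with leading zeros to `width` digits, take `length` digits
    beginning at 1-based position `start`."""
    squared = str(key * key).zfill(width)
    return int(squared[start - 1:start - 1 + length])


print(digit_extraction(379452))                    # 394
print(digit_extraction(121267))                    # 112
print(shift_fold(76123451001214))                  # 1531
print(boundary_fold(76123451001214))               # 1927
print(mid_square(9452, start=3, length=4, width=8))  # 3403
print(mid_square(121, start=3, length=3, width=6))   # 464 (from 014641)
```

Padding the square with leading zeros is what lets the same digit positions be used for every key.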
For that reason, we consider the product to have sufficient leading zeros to make it the full six digits.

Radix transformation method: Where the value or key is digital, the number base (or radix) can be changed, resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.

Digit hashing: A hashing function is referred to as digit analysis if it forms addresses by selecting and shifting digits or bits of the original keys. An analysis of a sample of the key set is performed to determine which key positions should be used in forming an address. This hashing technique has been used in conjunction with static key sets, i.e. key sets that do not change over time.

The Length Dependent Method: Another hashing technique which has been commonly used in table-handling applications is called the length dependent method. The length of the key is used along with some portion of the key to produce either a table address directly or, more commonly, an intermediate key which is used.

1. Describe circumstances in practice where one hashing method is better than another.

2. Dealing with collisions:

chaining (overflow): If the hash table entries are all full, then the hash table can increase the number of buckets that it has and then redistribute all the elements in the table. The hash function returns an integer, and the hash table takes that result modulo the size of the table so that it is sure to land in a bucket. Increasing the size therefore means rehashing every element with the new modulus, which, if you are lucky, might send colliding objects to different buckets.

chaining (links): When a collision occurs, elements with the same hash key will be chained together. A chain is simply a linked list of all the elements with the same hash key.
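Chaining with links can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name ChainedHashTable and its methods are invented here, Python lists stand in for the linked lists, and the built-in hash() followed by a modulo plays the role of the hash function.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, size=8):
        # Each slot refers to a chain of (key, value) pairs rather than one element.
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: overwrite
                return
        chain.append((key, value))       # new key: extend the chain

    def get(self, key):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(size=8)
table.put(1, "one")
table.put(9, "nine")   # 9 % 8 == 1, so this collides with key 1

print(table.get(1))          # one
print(table.get(9))          # nine
print(len(table.slots[1]))   # 2 (both entries share one chain)
```

Lookups walk only the chain for one slot, so performance stays good as long as the chains remain short.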
The hash table slots will no longer hold a table element. They will now hold the address of a table element.

linear probing: If faced with a collision, the linear probing table looks at subsequent table slots until the first free space is found. This traversal is known as probing the table; because it advances one element at a time, it is called linear probing. There are other kinds of probing; for example, in quadratic probing the distance from the original slot grows quadratically with each attempt (1, 4, 9, and so on) until a free space is found. Consider the situation mentioned above where data 'F' has the same hash code as data 'D'. In order to resolve the collision, the add algorithm will need to probe the table in order to find the first free space (after 'C'). If the probe loops back and finally reaches the same element that it started at, it means that the hash table is full and can no longer hold any more data. The addition operation will fail.

[Figure: binary tree with root 1 and nodes 1 through 9]

Given the above tree, what is the sequence in which the nodes would be visited?
Preorder traversal: 1 2 4 5 8 3 6 7 9
Inorder traversal: 4 2 8 5 1 6 3 7 9
Postorder traversal: 4 8 5 2 6 9 7 3 1 (follows from the preorder and inorder sequences above)
Level-order traversal: 1 2 3 4 5 6 7 8 9

[Figure: binary search tree with root 9 containing 9, 5, 15, 3, 8, 10, 6, 17, 20]

What would the above binary search tree look like if:
Node 11 were added? It would become the right child of 10.
Node 5 were removed? 3 would take the place of 5.

3. Balance the following binary search trees.

[Figure: binary search tree with root 9 containing 9, 5, 15, 17, 10, 20]
A left rotation at the root node 9 will balance it.

[Figure: binary search tree with root 9 containing 9, 5, 15, 3, 8, 6]
Here the extra depth lies under the right child of node 5 (the left-right case), so a single right rotation at 9 is not enough: rotate left at node 5 first, then rotate right at the root node 9.

4. Assume the following tree is an AVL tree. What is the balancing factor for each node? Add node 7.
Now what is the balancing factor for each node? What does that tell you about the tree?

[Figure: binary search tree with root 9 containing 9, 5, 15, 3, 19, 8, 6]

Node 7 will go to the right of 6.

Before adding node 7:
Root Node   Node 5   Node 8   Node 15
1           -1       1        -1

After adding node 7:
Root Node   Node 5   Node 8   Node 15   Node 6
2           -2       2        -1        -1

The balance factors of 2 and -2 show that the tree no longer satisfies the AVL condition (every balance factor must be -1, 0, or 1), so it has to be rebalanced with rotations.

Describe the use of a heap as a priority queue.
Answer: In earlier sections you learned about the first-in first-out data structure called a queue. One important variation of a queue is called a priority queue. A priority queue acts like a queue in that you dequeue an item by removing it from the front. However, in a priority queue the logical order of items inside the queue is determined by their priority. The highest priority items are at the front of the queue and the lowest priority items are at the back. Thus when you enqueue an item on a priority queue, the new item may move all the way to the front. We will see that the priority queue is a useful data structure for some of the graph algorithms we will study in the next chapter.

You can probably think of a couple of easy ways to implement a priority queue using sorting functions and lists. However, inserting into a list is O(n) and sorting a list is O(n log n). We can do better. The classic way to implement a priority queue is using a data structure called a binary heap. A binary heap will allow us to both enqueue and dequeue items in O(log n).

The binary heap is interesting to study because when we diagram the heap it looks a lot like a tree, but when we implement it we use only a single list as an internal representation. The binary heap has two common variations: the min heap, in which the smallest key is always at the front, and the max heap, in which the largest key value is always at the front. In this section we will implement the min heap. We leave a max heap implementation as an exercise.

Binary Heap Operations
The basic operations we will implement for our binary heap are as follows:
BinaryHeap() creates a new, empty, binary heap.
insert(k) adds a new item to the heap.
findMin() returns the item with the minimum key value, leaving the item in the heap.
delMin() returns the item with the minimum key value, removing the item from the heap.
isEmpty() returns true if the heap is empty, false otherwise.
size() returns the number of items in the heap.
buildHeap(list) builds a new heap from a list of keys.

1. Describe the use of a minheap for sorting. Would a minheap be useful for ascending or descending sorting?
Answer: Repeatedly removing the root of a min-heap returns the elements in ascending order. Best answer I can get: http://wiki.answers.com/Q/Is_an_array_that_is_in_sorted_order_a_min-heap?#slide=1

2. Relate a hash table to a linear search. To a binary search.
Answer: A hash table has these drawbacks:
1. As more data comes in, there is a high probability that collisions show up (the hash function maps different data to the same index). There are two ways to handle collisions. The first is separate chaining, which implements the hash table as an array of linked lists. In this case, the worst-case time for insertion, retrieval, or deletion is O(n), which happens when all input data are mapped to the same index; the lookup then degenerates into a linear search through one list. Besides, the hash table needs more space than the number of input items. The second way is open addressing (for example, linear probing). It does not consume more space than the input data, but in the worst case insertion and retrieval are still O(n), which is far slower than constant time.
2. You have to know the approximate size of the input data before initializing the hash table. Otherwise you need to resize the hash table, which is a very time-consuming operation. For example, your hash table size is 100 and then you want to insert the 101st element. Not only is the hash table enlarged to 150, but all elements in the hash table have to be rehashed. This insertion operation takes O(n).
3. The elements stored in a hash table are unsorted. In certain circumstances, we want data to be stored in sorted order, like contacts in a cell phone.

However, a binary search tree performs well against a hash table:
1. A binary search tree never meets a collision, which means a balanced binary search tree can guarantee that insertion, retrieval, and deletion take O(log(n)), as in a binary search, which is far faster than linear time. Besides, the space needed by the tree is exactly the same as the size of the input data.
2. You do not need to know the size of the input in advance.
3. All elements in the tree are sorted, so an in-order traversal visits them in order in O(n) time.

Useful Links: http://www.cs.cmu.edu/~adamchik/15-121/lectures/Binary%20Heaps/heaps.html
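The binary heap operations listed earlier (insert, findMin, delMin, isEmpty, size, buildHeap), together with the add and remove steps from questions 15 and 16, can be sketched in Python as follows. This is one possible min-heap layout, assuming the root is stored at index 0 of a list; the helper _sift_down and the attribute items are names invented for this sketch, and findMin and delMin assume the heap is not empty.

```python
class BinaryHeap:
    """Array-backed min-heap; the root (smallest key) is stored at index 0."""

    def __init__(self):
        self.items = []

    def insert(self, k):
        # Place the new item in the next free position, then swap it upward
        # while it is smaller than its parent.
        self.items.append(k)
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.items[i] >= self.items[parent]:
                break
            self.items[i], self.items[parent] = self.items[parent], self.items[i]
            i = parent

    def findMin(self):
        return self.items[0]

    def delMin(self):
        # Save the root, move the last item to the root, then swap it downward
        # with its smaller child until the heap property holds again.
        root = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            self._sift_down(0)
        return root

    def _sift_down(self, i):
        n = len(self.items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.items[child] < self.items[smallest]:
                    smallest = child
            if smallest == i:
                return
            self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
            i = smallest

    def isEmpty(self):
        return not self.items

    def size(self):
        return len(self.items)

    def buildHeap(self, keys):
        # Copy the keys, then sift down every non-leaf node, last one first.
        self.items = list(keys)
        for i in range(len(self.items) // 2 - 1, -1, -1):
            self._sift_down(i)


heap = BinaryHeap()
heap.buildHeap([9, 5, 15, 3, 8])
heap.insert(1)

drained = []
while not heap.isEmpty():
    drained.append(heap.delMin())
print(drained)  # [1, 3, 5, 8, 9, 15]
```

Draining the heap with delMin yields the keys in ascending order, which is how a min-heap serves both as a priority queue and as the basis of a sort.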