ITCS 2214 Exam 2 Study Guide
1. What is a tree data structure?
Answer: In computer science, a tree is a widely used abstract data type (ADT) or data
structure implementing this ADT that simulates a hierarchical tree structure, with a root
value and subtrees of children, represented as a set of linked nodes.
A tree data structure can be defined recursively (locally) as a collection of nodes (starting at
a root node), where each node is a data structure consisting of a value, together with a list
of references to nodes (the "children"), with the constraints that no reference is duplicated,
and none points to the root.
Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with
a value assigned to each node. Both these perspectives are useful: while a tree can be
analyzed mathematically as a whole, when actually represented as a data structure it is
usually represented and worked with separately by node (rather than as a list of nodes and
an adjacency list of edges between nodes, as one may represent a digraph, for instance).
For example, looking at a tree as a whole, one can talk about "the parent node" of a given
node, but in general as a data structure a given node only contains the list of its children,
and does not contain a reference to its parent (if any).
2. What is the root? Is the root at the top or bottom of a tree?
Answer: The topmost node in a tree is called the root node. Being the topmost node, the
root node will not have parents. It is the node at which operations on the tree commonly
begin (although some algorithms begin with the leaf nodes and work up ending at the root).
All other nodes can be reached from it by following edges or links. (In the formal definition,
each such path is also unique). In diagrams, it is typically drawn at the top. In some trees,
such as heaps, the root node has special properties. Every node in a tree can be seen as the
root node of the subtree rooted at that node.
3. What is a node?
Answer: A node is a structure which may contain a value, a condition, or represent a
separate data structure (which could be a tree of its own). Each node in a tree has zero or
more child nodes, which are below it in the tree (by convention, trees are drawn growing
downwards). A node that has a child is called the child's parent node (or ancestor node, or
Document1
3/19/2014
1
superior). A node has at most one parent. Nodes that do not have any children are called
leaf nodes. They are also referred to as terminal nodes.
4. What is a leaf? What makes a node a leaf?
Answer: In computer science, a leaf node or external node is a node of a tree data structure
that has zero child nodes. Often, leaf nodes are the nodes farthest from the root node. In
the graph theory tree, a leaf node is a vertex of degree 1 other than the root (except when
the tree has only one vertex; then the root, too, is a leaf). Every tree has at least one leaf. A
non-leaf node is called an internal node. Some trees only store data in internal nodes,
though this affects the dynamics of storing data in the tree. For example, with empty leaves,
one can store an empty tree with a single leaf node. However with leaves that can store
data, it is impossible to store an empty tree unless one stores some kind of marker data in
the leaf that signifies that the leaf is to be empty (and thus the tree to be empty as well).
Conversely, some trees only store data in the leaf nodes, and use the internal nodes to hold
other metadata, such as the range of values in the subtree rooted at that node. This type of
tree is useful for range queries. Another example of this is a parse tree. In this type of
structure, the root node represents the starting symbol of a grammar, and all internal nodes
represent derivations of non-terminals, which continue downward until concrete symbols
are established. The leaves are the actual lexical tokens of the sentence.
5. What is the level of the root?
Answer: Under the convention used in this guide, the root is at level 1 (some texts count
from 0 instead). A common exercise built on this idea: given a binary tree and a key, write
a function that returns the level of the key. For example, if key 3 is stored at the root, the
function should return 1; if key 4 is stored two levels below the root, it should return 3;
and for a key that is not present in the tree, it should return 0.
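The exercise above can be sketched in Python as follows. The `Node` class, the `level_of` name, and the example tree are assumptions made for illustration; the tree was chosen so that key 3 is at level 1 and key 4 is at level 3.

```python
class Node:
    """Hypothetical binary tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def level_of(root, key, level=1):
    """Return the level of key (the root is level 1), or 0 if key is absent."""
    if root is None:
        return 0
    if root.key == key:
        return level
    # Search the left subtree first, then fall back to the right subtree.
    found = level_of(root.left, key, level + 1)
    if found:
        return found
    return level_of(root.right, key, level + 1)


# Assumed example tree:
#         3
#        / \
#       2   5
#      / \
#     1   4
tree = Node(3, Node(2, Node(1), Node(4)), Node(5))
print(level_of(tree, 3))   # 1
print(level_of(tree, 4))   # 3
print(level_of(tree, 42))  # 0
```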
6. How do you determine the level of a node?
Answer: If the node is the root, then its level is one (or zero, if that is how you count).
If the node is not the root, then its level is one greater than the level of its parent.
7. What is a path? What is the height of a tree?
Answer: A path is a sequence of nodes connected by child pointers. In a binary tree, the
nodes along the child pointers from the root down to a leaf form a root-to-leaf path.
The height of a binary tree is defined here as the number of nodes along the longest path
from the root node to the deepest leaf node. Note that the height of an empty tree is 0.
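A minimal sketch of this node-counting definition of height, assuming a hypothetical `Node` class with `left` and `right` references:

```python
class Node:
    """Hypothetical binary tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(root):
    """Number of nodes on the longest root-to-leaf path; 0 for an empty tree."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


single = Node(1)
chain = Node(1, Node(2, Node(3)))   # three nodes in a straight line
print(height(None))    # 0
print(height(single))  # 1
print(height(chain))   # 3
```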
8. What is a balanced tree?
Answer: A balanced binary tree is commonly defined as a binary tree in which the depths of
the left and right subtrees of every node differ by 1 or less, although in general it is a binary
tree where no leaf is much farther away from the root than any other leaf. (Different
balancing schemes allow different definitions of "much farther".) Balanced binary trees
have a predictable depth (how many nodes are traversed from the root to a leaf, counting
the root as node 0 and subsequent nodes as 1, 2, ..., n). For a balanced tree that is packed
as tightly as possible (every level full except possibly the last), this depth (also called the
height) is equal to the integer part of log2(n), where n is the number of nodes in the tree;
looser schemes such as AVL trees keep the depth within a constant factor of log2(n). For
example, for a balanced tree with only 1 node, log2(1) = 0, so the depth of the tree is 0. For
a tightly packed balanced tree with 100 nodes, log2(100) = 6.64, so it has a depth of 6.
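The "differ by 1 or less" definition can be checked mechanically. The sketch below assumes a hypothetical `Node` class, and the helper names `balanced_height` and `is_balanced` are illustrative. The last two lines reproduce the log2 arithmetic from the answer.

```python
import math


class Node:
    """Hypothetical binary tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def balanced_height(root):
    """Return the node-count height, or -1 if any node's subtrees differ by more than 1."""
    if root is None:
        return 0
    left = balanced_height(root.left)
    right = balanced_height(root.right)
    if left < 0 or right < 0 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def is_balanced(root):
    return balanced_height(root) >= 0


balanced = Node(2, Node(1), Node(3))
skewed = Node(1, None, Node(2, None, Node(3)))
print(is_balanced(balanced))  # True
print(is_balanced(skewed))    # False

print(math.floor(math.log2(1)))    # 0
print(math.floor(math.log2(100)))  # 6
```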
9. What is a complete tree?
Answer: A complete binary tree is a binary tree in which every level, except possibly the
last, is completely filled, and all nodes are as far left as possible. A tree is called an almost
complete binary tree or nearly complete binary tree if the exception holds, i.e. the last level
is not completely filled. This type of tree is used as a specialized data structure called a
heap.
10. What is a full tree?
Answer: A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every
node other than the leaves has two children. A full tree is sometimes ambiguously defined
as a perfect tree. Physicists define a binary tree to mean a full binary tree.
11. What specific traits does a binary search tree have?
Answer: A binary search tree is a binary tree in which, for every node, the keys in its left
subtree are smaller than the node's key and the keys in its right subtree are larger.
Because it is a binary tree, the following properties also apply:
  The number of nodes n in a perfect binary tree can be found using this formula:
  n = 2^(h+1) - 1, where h is the depth of the tree.
  The number of nodes n in a binary tree of height h is at least n = h + 1 and at most
  n = 2^(h+1) - 1, where h is the depth of the tree.
  The number of leaf nodes l in a perfect binary tree can be found using this formula:
  l = 2^h, where h is the depth of the tree.
  The number of nodes n in a perfect binary tree can also be found using this formula:
  n = 2l - 1, where l is the number of leaf nodes in the tree.
  The number of null links (i.e., absent children of nodes) in a complete binary tree of
  n nodes is (n + 1).
  The number of internal nodes (i.e., non-leaf nodes, or n - l) in a complete binary tree
  of n nodes is ⌊n/2⌋.
  For any non-empty binary tree with n0 leaf nodes and n2 nodes of degree 2,
  n0 = n2 + 1.
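The defining trait of a binary search tree (smaller keys to the left, larger keys to the right, at every node) can be tested with a short recursive check. This is a sketch under the assumption of distinct keys and a hypothetical `Node` class; `is_bst` is an illustrative name.

```python
class Node:
    """Hypothetical binary tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_bst(node, low=None, high=None):
    """True if every key lies strictly between the bounds inherited from its ancestors."""
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys on the left must stay below node.key; keys on the right must stay above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(9, Node(5, Node(3), Node(8)), Node(15))
# 10 sits in the left subtree of 9, which violates the ordering.
invalid = Node(9, Node(5, None, Node(10)), Node(15))
print(is_bst(valid))    # True
print(is_bst(invalid))  # False
```

Passing bounds down the recursion matters: comparing each node only with its direct children would wrongly accept the second tree.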
12. What is a heap?
Answer: In computer science, a heap is a specialized tree-based data structure that satisfies
the heap property: If A is a parent node of B then the key of node A is ordered with respect
to the key of node B with the same ordering applying across the heap. Either the keys of
parent nodes are always greater than or equal to those of the children and the highest key
is in the root node (this kind of heap is called max heap) or the keys of parent nodes are less
than or equal to those of the children and the lowest key is in the root node (min heap).
Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in
the sorting algorithm heapsort. Note that there is no implied
ordering between siblings or cousins and no implied sequence for an in-order traversal (as
there would be in, e.g., a binary search tree). The heap relation mentioned above applies
only between nodes and their immediate parents. The maximum number of children each
node can have depends on the type of heap, but in many types it is at most two, which is
known as a "binary heap". The heap is one maximally efficient implementation of an
abstract data type called a priority queue, and in fact priority queues are often referred to
as "heaps", regardless of how they may be implemented. Note that despite the similarity of
the name "heap" to "stack" and "queue", the latter two are abstract data types, while a
heap is a specific data structure, and "priority queue" is the proper term for the abstract
data type. A heap data structure should not be confused with the heap which is a common
name for the pool of memory from which dynamically allocated memory is allocated. The
term was originally used only for the data structure.
13. What is the difference between a heap and an ordinary binary tree?
Answer: An ordinary binary tree places no ordering requirement on its keys. A binary search
tree orders keys from left to right: for every node, the keys in its left subtree are smaller
and the keys in its right subtree are larger. A heap, by contrast, orders keys only between
parent and child: if A and B are nodes, where B is a child of A, then in a max-heap the key
of A must be larger than or equal to the key of B. That is, key(A) ≥ key(B). (A min-heap
reverses the inequality.)
14. What is the difference between a minheap and a maxheap?
Answer: A min-heap is a binary tree such that: the data contained in each node is less than
(or equal to) the data in that node’s children.
A max-heap is a binary tree such that: the data contained in each node is greater than (or
equal to) the data in that node’s children.
15. Describe the process to add an element to a minheap. At what location may an element
be added to a minheap? A maxheap?
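Answer: In both a min-heap and a max-heap, a new element may only be added at the next
available position, which is the end of the array that stores the heap. For a min-heap:
  Place the new element in the next available position in the array.
  Compare the new element with its parent. If the new element is smaller, then swap it
  with its parent.
  Continue this process until either the new element's parent is smaller than or equal to
  the new element, or the new element reaches the root (index 0 of the array).
For a max-heap the steps are the same with the comparison reversed: swap while the new
element is larger than its parent.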
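The insertion steps above can be sketched on a plain Python list. The function name `heap_insert` is an assumption; the parent of index `i` is taken to be `(i - 1) // 2`, which matches a root stored at index 0.

```python
def heap_insert(heap, item):
    """Insert item into a min-heap stored as a list with the root at index 0."""
    heap.append(item)                # next available position
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:  # parent is already small enough: stop
            break
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent                   # keep moving up toward the root


heap = []
for value in [5, 3, 8, 1]:
    heap_insert(heap, value)
print(heap)     # [1, 3, 8, 5]
print(heap[0])  # 1
```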
16. Describe the process of removing an element from a minheap. At what location may an
element be removed from a minheap? A maxheap?
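Answer: In both a min-heap and a max-heap, elements are removed from the root. For a
min-heap:
  Place the root element in a variable to return later.
  Remove the last element in the deepest level and move it to the root.
  While the moved element has a value greater than at least one of its children, swap
  this value with the smaller-valued child.
  Return the original root that was saved.
For a max-heap the steps are the same with the comparison reversed: swap with the
larger-valued child while the moved element is smaller than at least one of its children.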
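A minimal sketch of the removal steps, assuming the min-heap is stored in a Python list with the root at index 0 and the children of index `i` at `2*i + 1` and `2*i + 2`. The name `heap_remove_min` is illustrative.

```python
def heap_remove_min(heap):
    """Remove and return the smallest element of a list-based min-heap."""
    if not heap:
        raise IndexError("remove from an empty heap")
    root = heap[0]                   # save the root to return later
    last = heap.pop()                # take the last element of the deepest level
    if heap:
        heap[0] = last               # move it to the root
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and heap[left] < heap[smallest]:
                smallest = left
            if right < n and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:        # neither child is smaller: done
                break
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest
    return root


heap = [1, 3, 8, 5]                  # already a valid min-heap
print(heap_remove_min(heap))         # 1
print(heap)                          # [3, 5, 8]
```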
17. What is the relation of a hash table and a hashing function?
Answer: Hashing is a technique for performing insertion, deletion, and find operations in
nearly constant time. As a very simple example, an array whose index serves as the key is
a hash table: each index (key) gives access to its value in constant time. The mapping from
key to index must be simple to compute and must help identify the associated value. The
function that generates this mapping is known as a hash function. A hash table (a.k.a.
hash map) is a data structure that uses a hash function to compute, from each key, the
index at which the associated value is stored.
18. What is a collision in a hash table?
Answer: In computing, a hash table (also hash map) is a data structure used to implement
an associative array, a structure that can map keys to values. A hash table uses a hash
function to compute an index into an array of buckets or slots, from which the correct value
can be found. Ideally, the hash function will assign each key to a unique bucket, but this
situation is rarely achievable in practice (usually some keys will hash to the same bucket).
Instead, most hash table designs assume that hash collisions (different keys that are
assigned by the hash function to the same bucket) will occur and must be accommodated
in some way. In a well-dimensioned hash table, the average cost (number of instructions) for
each lookup is independent of the number of elements stored in the table. Many hash table
designs also allow arbitrary insertions and deletions of key-value pairs, at constant average
cost per operation. In many situations, hash tables turn out to be more efficient than search
trees or any other table lookup structure. For this reason, they are widely used in many
kinds of computer software, particularly for associative arrays, database indexing, caches,
and sets.
19. What is a perfect hashing function?
Answer: A perfect hash function for a set S is a hash function that maps distinct elements in
S to a set of integers, with no collisions. A perfect hash function has many of the same
applications as other hash functions, but with the advantage that no collision resolution has
to be implemented.
A perfect hash function for a specific set S that can be evaluated in constant time, and with
values in a small range, can be found by a randomized algorithm in a number of operations
that is proportional to the size of S. Any perfect hash functions suitable for use with a hash
table require at least a number of bits that is proportional to the size of S. A perfect hash
function with values in a limited range can be used for efficient lookup operations, by
placing keys from S (or other associated values) in a table indexed by the output of the
function. Using a perfect hash function is best in situations where there is a frequently
queried large set, S, which is seldom updated. Efficient solutions to performing updates are
known as dynamic perfect hashing, but these methods are relatively complicated to
implement. A simple alternative to perfect hashing, which also allows dynamic updates, is
cuckoo hashing.
Describe and contrast the following hashing function types:
Extraction: Using digit extraction, selected digits are extracted from the key and used as the
address. For example, using a six-digit employee number to hash to a three-digit
address (000-999), we could select the first, third, and fourth digits (from left) and use them
as the address.
379452 -> 394
121267 -> 112
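A small sketch of digit extraction that reproduces the two examples above. The function name, the `positions` parameter (1-based, counted from the left), and the zero-padding to six digits are assumptions made for illustration.

```python
def extract_digits(key, positions=(1, 3, 4), width=6):
    """Build an address from selected digit positions (1-based, from the left)."""
    digits = str(key).zfill(width)   # pad so every key has the same number of digits
    return "".join(digits[p - 1] for p in positions)


print(extract_digits(379452))  # 394
print(extract_digits(121267))  # 112
```

The address is returned as a string so that leading zeros (for example "007") are preserved.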
Division: Perhaps the simplest of all the methods of hashing an integer x is to
divide x by M and then to use the remainder modulo M. This is called the division method of
hashing. In this case, the hash function is
h(x) = x mod M
Generally, this approach is quite good for just about any value of M. However, in certain
situations some extra care is needed in the selection of a suitable value for M. For example,
it is often convenient to make M an even number. But this means that h(x) is even if x is
even; and h(x) is odd if x is odd. If all possible keys are equiprobable, then this is not a
problem. However if, say, even keys are more likely than odd keys, the function
h(x) = x mod M will not spread the hashed values of those keys evenly.
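The parity effect described above is easy to see with a few sample keys. This sketch assumes integer keys; the sample keys and the table sizes 8 and 7 are arbitrary choices for illustration.

```python
def division_hash(x, m):
    """Division method: the remainder of x divided by the table size m."""
    return x % m


even_keys = [10, 24, 36, 48, 52]

# With an even M, even keys can only land on even addresses.
print([division_hash(k, 8) for k in even_keys])  # [2, 0, 4, 0, 4]

# With an odd M, the same keys can reach odd addresses as well.
print([division_hash(k, 7) for k in even_keys])  # [3, 3, 1, 6, 3]
```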
Folding:
In this method the key is interpreted as an integer using some radix (say 10). The integer is
divided into segments, each segment except possibly the last having the same number of
digits. These segments are then added to obtain the home address.
As an example, consider the key 76123451001214. Assume we are dividing keys into
segments of size 3 digits. The segments for our key are 761, 234, 510, 012, and 14. The
home bucket is 761 + 234 + 510 + 012 + 14 = 1531.
In a variant of this scheme, the digits in alternate segments are reversed before adding. This
variant is called folding at the boundaries and the original version is called shift folding.
Applying the folding at the boundaries method to the above example, the segments after
digit reversal are 761, 432, 510, 210, and 14; the home bucket is 761 + 432 + 510 + 210 + 14
= 1927.
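Both folding variants can be sketched in a few lines. The function names are assumptions; segments are cut from the left in groups of `size` digits, and in the boundary variant every second segment (the 2nd, 4th, and so on) is reversed before adding.

```python
def segments(key, size=3):
    """Split the decimal digits of key into groups of `size`, left to right."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def shift_fold(key, size=3):
    """Shift folding: add the segments as they are."""
    return sum(int(s) for s in segments(key, size))


def boundary_fold(key, size=3):
    """Folding at the boundaries: reverse the digits of alternate segments, then add."""
    total = 0
    for i, s in enumerate(segments(key, size)):
        total += int(s[::-1]) if i % 2 == 1 else int(s)
    return total


key = 76123451001214
print(segments(key))       # ['761', '234', '510', '012', '14']
print(shift_fold(key))     # 1531
print(boundary_fold(key))  # 1927
```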
Mid-Square: In midsquare hashing, the key is squared and the address is selected from the
middle of the squared number. The most obvious limitation of this method is the size of the
key. Given a key of 6 digits, the product will be 12 digits, which is beyond the maximum
integer size of many computers. Because most personal computers can handle a 9-digit
integer, let's demonstrate the concept with keys of 4 digits. Given a key of 9452, the
midsquare address calculation is shown below using a 4-digit address (0000 to 9999).
9452 * 9452 = 89340304: address is 3403
As a variation on the midsquare method, we can select a portion of the key, such as the
middle three digits, and then use them rather than the whole key. Doing so allows the
method to be used when the key is too large to square. For example, for the six-digit keys
below, we can select the first three digits and then use the midsquare method as shown.
(We select the third, fourth, and fifth digits of the product as the address.)
379452: 379 * 379 = 143641 -> 364
121267: 121 * 121 = 014641 -> 464
378845: 378 * 378 = 142884 -> 288
160252: 160 * 160 = 025600 -> 560
045128: 045 * 045 = 002025 -> 202
Note that in the midsquare method, the same digits must be selected from the product. For
that reason, we consider the product to have sufficient leading zeros to make it the full six
digits.
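The midsquare calculations above can be reproduced with a short helper. The function name and its parameters (`start` is the 1-based position of the first selected digit, `width` is the padded length of the product) are assumptions for illustration.

```python
def mid_square(key, start, length, width):
    """Square key, pad the product to `width` digits, and cut out `length` digits."""
    squared = str(key * key).zfill(width)   # leading zeros keep digit positions fixed
    return squared[start - 1:start - 1 + length]


# Whole 4-digit key, 8-digit product, middle four digits.
print(mid_square(9452, start=3, length=4, width=8))   # 3403

# First three digits of a 6-digit key, 6-digit product, digits 3 to 5.
for key in ["379452", "121267", "045128"]:
    print(key, mid_square(int(key[:3]), start=3, length=3, width=6))
# 379452 364
# 121267 464
# 045128 202
```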
Radix transformation method: Where the value or key is digital, the number base (or radix)
can be changed resulting in a different sequence of digits. (For example, a decimal
numbered key could be transformed into a hexadecimal numbered key.) High-order digits
could be discarded to fit a hash value of uniform length.
Digit hashing: A hashing function is referred to as digit analysis if it forms addresses by
selecting and shifting digits or bits of the original keys. An analysis of a sample of the key
set is performed to determine which key positions should be used in forming an address.
This hashing transformation technique has been used in conjunction with static key sets,
i.e., key sets that do not change over time.
The Length Dependent Method:
Another hashing technique which has been commonly used in table-handling applications is
called the length dependent method. The length of the key is used along with some portion
of the key to produce either a table address directly or, more commonly, an intermediate
key from which the table address is then derived.
1. Describe circumstances in practice where one hashing method is better than another.
2. Dealing with collisions:
chaining (overflow): If the hash table entries are all full, the hash table can increase
the number of buckets that it has and then redistribute all the elements in the table.
The hash function returns an integer, and the hash table takes that result modulo the
size of the table so that it is sure to land on a valid bucket. Increasing the size
therefore means rehashing: the modulo calculations are rerun with the new size, which,
if you are lucky, sends colliding objects to different buckets.
chaining (links): When a collision occurs, elements with the same hash key will
be chained together. A chain is simply a linked list of all the elements with the same
hash key. The hash table slots will no longer hold a table element. They will now hold
the address of a table element.
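A minimal sketch of chaining, under two simplifying assumptions: keys are non-negative integers hashed with `key % size`, and each chain is a Python list standing in for the linked list described above. The class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table whose slots each hold a chain of (key, value) pairs."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Assumed hash function for this sketch: integer key modulo table size.
        return key % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:             # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))   # new key: extend the chain

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = ChainedHashTable(size=8)
table.put(1, "a")
table.put(9, "b")                    # 9 % 8 == 1, so this collides with key 1
print(table.get(1), table.get(9))    # a b
print(len(table.buckets[1]))         # 2
```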
linear probing:
When a collision occurs, the linear probing table looks at the subsequent slots until the
first free space is found.
This traversal is known as probing the table; and as it goes by one element at a time, it is linear
probing.
There are other kinds of probing; for example, in quadratic probing the distance from the
original slot grows quadratically (1, 4, 9, etc.) until a free space is found.
Consider the situation mentioned above where data 'F' has the same hash code as data 'D'. In
order to resolve the collision, the add algorithm will need to probe the table in order to find the
first free space (after 'C').
If the probe loops back and finally reaches the same element that it started at, the hash
table is full and can no longer hold any more data. The addition operation will fail.
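A sketch of the add operation with linear probing, assuming non-negative integer keys hashed with `key % size`. The class name is illustrative, and raising `OverflowError` for a full table is a choice made for this example.

```python
class LinearProbingTable:
    """Open-addressing table that probes one slot at a time, wrapping around."""

    def __init__(self, size=5):
        self.slots = [None] * size

    def add(self, key):
        size = len(self.slots)
        start = key % size                # assumed hash function for this sketch
        for step in range(size):
            i = (start + step) % size     # wrap to the front after the last slot
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i                  # index where the key ended up
        # The probe came back to where it started: every slot is taken.
        raise OverflowError("hash table is full")


table = LinearProbingTable(size=5)
print(table.add(0))    # 0
print(table.add(5))    # 1  (slot 0 is taken, so probe to the next slot)
print(table.add(10))   # 2  (slots 0 and 1 are taken)
```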
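[Figure: binary tree with root 1. Node 1 has children 2 and 3; node 2 has children 4 and 5;
node 3 has children 6 and 7; node 8 is the left child of 5; node 9 is the right child of 7.]
Given the above tree, give the sequence in which the nodes would be visited:
Preorder traversal: 1 2 4 5 8 3 6 7 9
Inorder traversal: 4 2 8 5 1 6 3 7 9
Postorder traversal: 4 8 5 2 6 9 7 3 1
Level-order traversal: 1 2 3 4 5 6 7 8 9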
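The four traversals can be sketched as below. The tree shape built here is an assumption inferred from the listed traversals: 2 and 3 under 1, 4 and 5 under 2, 6 and 7 under 3, 8 as the left child of 5, and 9 as the right child of 7. The `Node` class and function names are hypothetical.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def preorder(n):
    return [] if n is None else [n.key] + preorder(n.left) + preorder(n.right)


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


def postorder(n):
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.key]


def level_order(root):
    order = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        order.append(node.key)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return order


# Assumed shape of the tree (see the lead-in above).
tree = Node(1,
            Node(2, Node(4), Node(5, Node(8))),
            Node(3, Node(6), Node(7, None, Node(9))))

print(preorder(tree))     # [1, 2, 4, 5, 8, 3, 6, 7, 9]
print(inorder(tree))      # [4, 2, 8, 5, 1, 6, 3, 7, 9]
print(postorder(tree))    # [4, 8, 5, 2, 6, 9, 7, 3, 1]
print(level_order(tree))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```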
[Figure: binary search tree with root 9 containing the keys 3, 5, 6, 8, 9, 10, 15, 17, 20]
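What would the above binary search tree look like if
Node 11 were added? 11 becomes the right child of 10.
Node 5 were removed? 3, the in-order predecessor of 5, takes the place of 5. (Using the
in-order successor, 6, instead would also give a valid binary search tree.)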
3. Balance the following binary search trees
[Figure: binary search tree with root 9 containing the keys 5, 9, 10, 15, 17, 20]
A left rotation at the root node 9 makes 15 the new root, and the tree is then balanced.
[Figure: binary search tree with root 9 containing the keys 3, 5, 6, 8, 9, 15]
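Assuming 3 and 8 are the children of 5 and 6 is the left child of 8, the root 9 is left-heavy
and its left child 5 is right-heavy, so a single right rotation is not enough. A left rotation at
node 5 followed by a right rotation at the root 9 balances the tree, leaving 8 as the new root.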
4. Assume the following tree is an AVL tree. What is the balancing factor for each node?
Add node 7. Now what is the balancing factor for each node? What does that tell you
about the tree?
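[Figure: AVL tree with root 9. Node 9 has children 5 and 15; node 5 has children 3 and 8;
node 15 has right child 19; node 6 is the left child of 8.]
Node 7 will go to the right of 6.
Balancing factors (height of left subtree minus height of right subtree):
Node         Before adding 7    After adding 7
Root (9)      1                  2
Node 5       -1                 -2
Node 8        1                  2
Node 15      -1                 -1
Node 6        0                 -1
After the insertion, nodes 9, 5, and 8 have a factor of 2 or -2. The tree therefore no longer
satisfies the AVL condition (every factor must be -1, 0, or 1) and must be rebalanced.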
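The balancing factors can be recomputed with a short script. This is a sketch under two assumptions: the factor is taken as height(left subtree) minus height(right subtree), and the tree has the shape produced by inserting 9, 5, 15, 3, 8, 19, 6 in that order with plain binary-search-tree insertion (no rebalancing). All names are illustrative.

```python
class Node:
    """Hypothetical binary search tree node used only for this sketch."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Plain BST insertion without any rebalancing."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def balance_factors(node, out=None):
    """Map each key to height(left) - height(right)."""
    if out is None:
        out = {}
    if node is not None:
        out[node.key] = height(node.left) - height(node.right)
        balance_factors(node.left, out)
        balance_factors(node.right, out)
    return out


root = None
for key in [9, 5, 15, 3, 8, 19, 6]:
    root = insert(root, key)
before = balance_factors(root)

root = insert(root, 7)               # 7 ends up as the right child of 6
after = balance_factors(root)

for key in [9, 5, 8, 15, 6]:
    print(key, before[key], after[key])
# 9 1 2
# 5 -1 -2
# 8 1 2
# 15 -1 -1
# 6 0 -1
```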
Describe the use of a heap as a priority queue.
Answer: In earlier sections you learned about the first-in first-out data structure called a
queue. One important variation of a queue is called a priority queue. A priority queue acts
like a queue in that you dequeue an item by removing it from the front. However, in a
priority queue the logical order of items inside a queue is determined by their priority. The
highest priority items are at the front of the queue and the lowest priority items are at the
back. Thus when you enqueue an item on a priority queue, the new item may move all the
way to the front. We will see that the priority queue is a useful data structure for some of
the graph algorithms we will study in the next chapter.
You can probably think of a couple of easy ways to implement a priority queue using sorting
functions and lists. However, inserting into a list is O(n) and sorting a list is O(n log n). We can
do better. The classic way to implement a priority queue is using a data structure called a
binary heap. A binary heap will allow us to both enqueue and dequeue items in O(log n).
The binary heap is interesting to study because when we diagram the heap it looks a lot like
a tree, but when we implement it we use only a single list as an internal representation. The
binary heap has two common variations: the min heap, in which the smallest key is always
at the front, and the max heap, in which the largest key value is always at the front. In this
section we will implement the min heap. We leave a max heap implementation as an
exercise.
Binary Heap Operations
The basic operations we will implement for our binary heap are as follows:
BinaryHeap() creates a new, empty, binary heap.
insert(k) adds a new item to the heap.
findMin() returns the item with the minimum key value, leaving item in the heap.
delMin() returns the item with the minimum key value, removing the item from the heap.
isEmpty() returns true if the heap is empty, false otherwise.
size() returns the number of items in the heap.
buildHeap(list) builds a new heap from a list of keys
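The operations listed above can be sketched as a thin wrapper around Python's standard `heapq` module, which maintains a min-heap inside a plain list. This is a substitute for writing the heap by hand, not the implementation the text goes on to build. The method names follow the list above, and the `(priority, task)` tuples in the usage example are an assumption.

```python
import heapq


class BinaryHeap:
    """Min-heap priority queue backed by a single list, using heapq."""

    def __init__(self):
        self._items = []

    def insert(self, k):
        heapq.heappush(self._items, k)

    def findMin(self):
        return self._items[0]            # smallest item, left in the heap

    def delMin(self):
        return heapq.heappop(self._items)

    def isEmpty(self):
        return not self._items

    def size(self):
        return len(self._items)

    def buildHeap(self, keys):
        self._items = list(keys)
        heapq.heapify(self._items)       # rearranges the list into heap order


pq = BinaryHeap()
pq.insert((2, "write report"))
pq.insert((1, "fix outage"))
pq.insert((3, "lunch"))
print(pq.findMin())   # (1, 'fix outage')
print(pq.delMin())    # (1, 'fix outage')
print(pq.size())      # 2
```

Tuples compare element by element, so the item with the lowest priority number comes out first.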
1. Describe the use of a minheap for sorting. Would a minheap be useful for ascending or
descending sorting?
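Answer: Repeatedly removing the root of a min-heap returns the elements from smallest to
largest, so a min-heap is naturally suited to ascending sorting. See also:
http://wiki.answers.com/Q/Is_an_array_that_is_in_sorted_order_a_min-heap?#slide=1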
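As a sketch of sorting with a min-heap: build the heap, then remove the minimum until the heap is empty. The example uses Python's standard `heapq` module for the heap operations; the function name is an assumption.

```python
import heapq


def heap_sort_ascending(values):
    """Sort by building a min-heap and repeatedly removing its root."""
    heap = list(values)      # copy so the caller's list is left untouched
    heapq.heapify(heap)
    # Each heappop removes the current minimum, so results come out ascending.
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort_ascending([5, 3, 8, 1, 2]))  # [1, 2, 3, 5, 8]
```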
2. Relate a hash table to a linear search. To a binary search.
Answer: A hash table aims for constant-time lookup, but it has drawbacks that can push
its cost toward that of a linear search:
1. As more data arrives, collisions become very likely (the hash function maps different
data to the same index). There are two ways to handle collisions. The first is separate
chaining, which implements the hash table as an array of linked lists. In this case, the
worst-case time for insertion, retrieval, or deletion is O(n), when all input data are mapped
to the same index. Besides, the hash table needs more space than the input data alone. The
second way is open addressing (for example, linear probing). It does not consume more
space than the table itself, but in the worst case insertion and retrieval are still O(n),
which is far slower than constant time.
2. You have to know the approximate size of the input data before initializing the hash
table. Otherwise you need to resize the hash table, which is a very time-consuming
operation. For example, your hash table size is 100 and then you want to insert the 101st
element. Not only is the hash table enlarged (say, to 150), but all elements in the hash
table have to be rehashed. This insertion operation takes O(n).
3. The elements stored in a hash table are unsorted. In certain circumstances, we want data
to be stored in sorted order, like contacts in a cell phone.
By contrast, a binary search tree, which like a binary search discards part of the remaining
data at every step, compares well against a hash table on these points:
1. A binary search tree never encounters collisions, so a balanced binary search tree can
guarantee that insertion, retrieval, and deletion take O(log(n)), which is far faster than
linear time. Besides, the space needed by the tree is proportional to the size of the input data.
2. You do not need to know the size of the input in advance.
3. All elements in the tree are kept in sorted order, so an in-order traversal lists them in
O(n) time.
Useful Links:
http://www.cs.cmu.edu/~adamchik/15-121/lectures/Binary%20Heaps/heaps.html