Question 4 - Huaxia Xia

CS 202 Homework 1
Group Members:
Huaxia Xia
Yuanfang Hu
Weihao Chuang
Helen
Question 1:
Problem 10.3-8
Let X[1..n] and Y[1..n] be two arrays, each containing n numbers in sorted order. We need to find the median of the combined array Z = X + Y. Assume there are no duplicates.
We use the SELECT algorithm of page 190, which selects an element in O(n) time. First we find the median of X, called x1 and x2 (where x1 = x2 if n is odd and x1 < x2 if n is even), i.e., the set x, and the median of Y, called y1 and y2 (y1 = y2 if n is odd and y1 < y2 if n is even), i.e., the set y. For X, Y, and Z, the elements greater than the median form the "above" set and the elements less than the median form the "below" set. We check the terminating condition for this initial case.
For each partition, we find the overlapping region of the medians: xmin is the least element of X such that xmin ≥ min(x1, x2, y1, y2), and xmax is the largest element of X such that xmax ≤ max(x1, x2, y1, y2). Excluding Case 1, one of xmin and xmax is a median (by inspection). ymin and ymax are found similarly. Finally, xsize = order(xmax) - order(xmin), and ysize is defined similarly.
Terminating condition.
Since Z has 2n elements, it has two medians, z1 and z2, which we check for. If the check fails, we continue the median search.
Case 1. Even n
X: 1 2 3 4 13 14 15 16
Y: 5 6 7 8 9 10 11 12
x = {4, 13}, and y = {8, 9}
We know that the median of X, and likewise the median of Y, has an equal number of elements above and below it. In this case x1 < y1 and x2 > y2, so y1 and y2 are the medians of Z. The elements of X below its median are also below the median of Y, and the elements of X above its median are also above the median of Y, so they contribute equally to Z's below and above sets. Taking y1 and y2 as the medians, Y's above and below sets likewise contribute to Z's above and below sets. Since X's and Y's above and below sets are balanced in size by the definition of the median, Z's above and below sets, which are the unions of X's and Y's, are balanced as well; hence y1 and y2 are the medians of Z. This check takes O(1) time.
Case 2. Odd n
X: 1 2 3 9 10
Y: 4 5 6 7 8
x = {3}, y = {6}
If xsize = 0 and ysize = 0, we halt. For x1 < y1, we will find that every element of X's above set is greater than y1 and every element of Y's below set is less than x1; hence, by superposition of the above and below sets, x1 and y1 are the medians of Z. If y1 < x1, the symmetric argument holds with the roles of X and Y exchanged. This check takes O(1) time.
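The terminating-condition argument above is hard to turn directly into code. For comparison, here is a sketch of a different but standard O(log n) technique for the same problem: binary search on i, the number of elements of X among the n smallest elements of Z. It is not the group's procedure; the name two_array_medians is hypothetical, and the sketch assumes equal-length sorted Python lists with no duplicates, as in the problem statement.

```python
def two_array_medians(X, Y):
    """Return (lower median, upper median) of the 2n elements of X and Y.

    Assumes X and Y are sorted lists of equal length n >= 1 with no duplicates.
    Binary search on i = number of elements of X among the n smallest of Z.
    """
    n = len(X)
    assert len(Y) == n and n >= 1
    inf = float("inf")
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2
        j = n - i  # the remaining n - i of the n smallest come from Y
        x_left = X[i - 1] if i > 0 else -inf
        x_right = X[i] if i < n else inf
        y_left = Y[j - 1] if j > 0 else -inf
        y_right = Y[j] if j < n else inf
        if x_left > y_right:
            hi = i - 1  # took too many elements from X
        elif y_left > x_right:
            lo = i + 1  # took too few elements from X
        else:
            # X[:i] + Y[:j] are exactly the n smallest elements of Z.
            return max(x_left, y_left), min(x_right, y_right)
    raise ValueError("inputs must be sorted")
```

For the Case 1 arrays it returns (8, 9); for the Case 2 arrays (X = 1 2 3 9 10, Y = 4 5 6 7 8) it returns (5, 6).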
Question 2: Multiplication of sparse polynomials
a. Suppose the two polynomials are $a_1X^{i_1} + a_2X^{i_2} + \dots + a_kX^{i_k}$ and $b_1X^{j_1} + b_2X^{j_2} + \dots + b_kX^{j_k}$.
We give two algorithms, each with time complexity $O(k^2 \log k)$.
1. The first algorithm:
i. Multiply each $a_sX^{i_s}$ by each $b_tX^{j_t}$, obtaining $k^2$ terms.
ii. Put the results into an array, using a hash function on the exponents to choose the indexes. Two results with the same exponent hash to the same place, so we add their coefficients and merge them into one term.
iii. After that, we have at most $k^2$ terms. Sort them by exponent using quicksort to obtain the final result.
Analysis:
In step (i), the time cost is $k^2$.
In step (ii), the number of distinct exponents is at most $\min(k^2, i_k + j_k + 1)$, and for sparse polynomials $k^2$ is the smaller of the two. Thus $k^2$ is the space needed for the array that holds the results, and also the time needed to initialize that space. Hashing and placing the items takes $k^2 + O(C)$ time, and adding the coefficients of items with the same exponent takes at most $k^2$ more.
In step (iii), let $k'$ be the number of distinct exponents. We cannot compute $k'$ in advance, but $k' \le k^2$, so the expected time of quicksort is at most about $1.7\,k^2 \log k^2 = 3.4\,k^2 \log k$.
In total, the time is approximately
$k^2 + k^2 + k^2 + O(C) + k^2 + 3.4\,k^2 \log k = 4k^2 + O(C) + 3.4\,k^2 \log k$,
so the time order is $O(k^2 \log k)$.
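A sketch of the first algorithm, assuming each polynomial is a list of (coefficient, exponent) pairs. A Python dict plays the role of the hash table keyed by exponent, and the built-in sort (Timsort, not quicksort) stands in for step (iii). With sort_output=False it corresponds to the unsorted variant of part b. The name multiply_hashed is hypothetical.

```python
def multiply_hashed(A, B, sort_output=True):
    """Multiply sparse polynomials given as lists of (coefficient, exponent) pairs.

    Step (i): form all k^2 products a_s*b_t with exponent i_s + j_t.
    Step (ii): a dict keyed by exponent merges terms with equal exponents.
    Step (iii): optionally sort the distinct terms by exponent.
    """
    by_exponent = {}
    for a, i in A:
        for b, j in B:
            by_exponent[i + j] = by_exponent.get(i + j, 0) + a * b
    terms = [(c, e) for e, c in by_exponent.items()]
    if sort_output:
        terms.sort(key=lambda term: term[1])
    return terms
```

For example, multiply_hashed([(1, 0), (2, 3)], [(3, 1), (1, 5)]), i.e. $(1 + 2X^3)(3X + X^5)$, returns [(3, 1), (6, 4), (1, 5), (2, 8)].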
2. The above algorithm does not use the fact that both the $i_s$ and the $j_t$ are in sorted order, so we try to find a more efficient algorithm.
Strategy:
Multiplying the second polynomial by a single term $a_sX^{i_s}$ yields a polynomial whose exponents are still in order. Multiplying the second polynomial by every term of the first polynomial therefore gives k sorted polynomials, and merging these k polynomials gives the final result. We merge them as in merge sort, using a heap to find the term with the minimum exponent among the k head terms.
Algorithm & implementation:
i.
We add two attributes to each term $a_sX^{i_s}$ of the first polynomial: P[s], the index t of the term $b_tX^{j_t}$ that $a_sX^{i_s}$ is currently multiplied by; and e[s], the exponent of the current result term, i.e., $i_s + j_{P[s]}$. In addition, a pointer r records which of the k polynomials the current minimum-exponent result term comes from.
For s = 1 to k do: P[s] = 1; e[s] = i[s] + j[1]; endfor
Build a min-heap of the k keys e[1], e[2], ..., e[k]. Clearly e[1] = i[1] + j[1] is the minimum, so save $a_1b_1X^{i_1+j_1}$ into the final result space and let r = 1.
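ii.
For n = 2 to k^2 do:
    P[r] = P[r] + 1
    if P[r] <= k then
        e[r] = i[r] + j[P[r]]
        root of the heap ← e[r]
    else
        remove the root from the heap    (polynomial r is exhausted)
    endif
    "Heapify" the heap and get the new minimum value e[s] at the root
    Save $a_sb_{P[s]}X^{i_s+j_{P[s]}}$ to the final result space (if its exponent equals that of the last saved term, add the coefficients instead)
    r = s
endfor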
Analysis:
In step (i), initializing P and e takes 2k time, and building the heap takes O(k) time.
In step (ii), each "heapify" takes about log k time, and the loop runs $k^2 - 1$ times, so this step takes about $k^2 \log k$ time.
In total, the time order is $O(k^2 \log k)$.
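A sketch of the second algorithm, using Python's heapq module as the min-heap. Heap entries are (e[s], s) pairs, P[s] is kept in a list, heapreplace plays the role of "root ← e[r], then heapify", and heappop removes a row once it is exhausted. Inputs are assumed to be lists of (coefficient, exponent) pairs sorted by exponent; that representation and the name multiply_heap are assumptions, not part of the original write-up.

```python
import heapq


def multiply_heap(A, B):
    """k-way merge of the sorted rows a_s * B, as in the second algorithm."""
    if not A or not B:
        return []
    P = [0] * len(A)  # P[s]: index into B for row s (0-based here)
    heap = [(A[s][1] + B[0][1], s) for s in range(len(A))]
    heapq.heapify(heap)  # O(k) heap construction
    result = []
    while heap:
        e, s = heap[0]  # minimum exponent e[s] at the root
        coeff = A[s][0] * B[P[s]][0]
        if result and result[-1][1] == e:
            result[-1] = (result[-1][0] + coeff, e)  # same exponent: add to last term
        else:
            result.append((coeff, e))
        P[s] += 1
        if P[s] < len(B):
            heapq.heapreplace(heap, (A[s][1] + B[P[s]][1], s))  # new e[s], then sift
        else:
            heapq.heappop(heap)  # row s is exhausted
    return result
```

Because the rows are merged in exponent order, equal exponents always arrive consecutively, which is why checking only the last saved term is enough.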
b. If neither the input nor the output polynomials need to be in sorted order, we use the first algorithm described above without its third step (sorting the $k^2$ terms). The time order of this algorithm is $O(k^2)$ (expected, because of hashing).
Question 3
10.3-6 from text (p192)
Solution:
Strategy:
We know that SELECT takes time cn. We divide the n elements into two buckets in time cn; recursively, we divide the elements into at most $2^{\lceil \lg k \rceil}$ buckets in at most $\lceil \lg k \rceil$ levels, each bucket (except possibly the last) of size $s = 2^{\lceil \lg(n/k) \rceil}$. After that, the i-th k-quantile, i.e., the $q$-th order statistic of the whole array with $q = \lceil in/k \rceil$, must be the $((q - 1) \bmod s + 1)$-th order statistic of the $\lceil q/s \rceil$-th bucket. We use SELECT on the appropriate bucket to select each of the k - 1 quantiles.
Algorithm:
K_QUANTILES(A, n, k)
begin
1   s = 2^⌈lg(n/k)⌉;  EQUAL_PATITION(A, 1, n, s)      /* implemented below */
2   for i = 1 to k - 1
3       q = ⌈i·n/k⌉;  b = ⌈q/s⌉
        K[i] ← SELECT(A, s·(b - 1) + 1, min(s·b, n), ((q - 1) mod s) + 1)      /* SELECT is described on page 190 */
4   end for
end
EQUAL_PATITION(A, p, r, s)
begin
5   m = 2^⌈lg(r - p + 1)⌉      /* the smallest power of 2 that is >= the size of the bucket */
6   if m > s then
7       use SELECT to find the (m/2)-th smallest element of A[p..r], and PARTITION(A, p, r) around it
8       EQUAL_PATITION(A, p, p + m/2 - 1, s)
9       EQUAL_PATITION(A, p + m/2, r, s)
10  endif
end
Analysis:
In K_QUANTILES, the time is spent in steps 1 and 3.
For step 1, EQUAL_PATITION is a recursive function with at most $\lceil \lg k \rceil$ levels of recursion. Each level runs SELECT (and PARTITION) on disjoint subarrays of total size at most n, so each level costs at most cn, and the total cost of step 1 is at most
$cn \lceil \lg k \rceil$.
For step 3, we run SELECT k - 1 times, each on a bucket of size at most $s = 2^{\lceil \lg(n/k) \rceil} < 2n/k$, so the time cost is at most
$cks < 2cn$.
In total, we need time of at most
$cn \lceil \lg k \rceil + 2cn$,
and so the time order is $O(n \lg k)$.
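A Python sketch of K_QUANTILES and EQUAL_PATITION, under loudly stated assumptions: the elements are distinct, buckets are built as new lists instead of partitioning A in place, ranks are taken as ⌈i·n/k⌉, and randomized quickselect (expected linear time) is substituted for the worst-case linear SELECT, so this sketch's O(n lg k) bound is expected rather than worst-case. The function names are hypothetical.

```python
import random


def select(a, i):
    """Return the i-th smallest (1-based) element of a list of distinct values.

    Randomized quickselect: a stand-in for the worst-case linear SELECT.
    """
    while True:
        pivot = random.choice(a)
        lower = [x for x in a if x < pivot]
        if i <= len(lower):
            a = lower
        elif i == len(lower) + 1:
            return pivot
        else:
            i -= len(lower) + 1
            a = [x for x in a if x > pivot]


def equal_partition(a, s):
    """Split a into buckets of size s (the last may be smaller), in increasing order.

    Mirrors EQUAL_PATITION: m is the smallest power of 2 >= len(a); if m > s,
    split around the (m/2)-th smallest element and recurse on both sides.
    """
    m = 1 << (len(a) - 1).bit_length()
    if m <= s:
        return [a]
    pivot = select(a, m // 2)
    lower = [x for x in a if x <= pivot]  # exactly m/2 elements
    upper = [x for x in a if x > pivot]
    return equal_partition(lower, s) + equal_partition(upper, s)


def k_quantiles(a, k):
    """Return the order statistics of ranks ceil(i*n/k) for i = 1..k-1."""
    n = len(a)
    s = 1 << (-(-n // k) - 1).bit_length()  # s = 2^ceil(lg(n/k))
    buckets = equal_partition(list(a), s)
    result = []
    for i in range(1, k):
        q = -(-i * n // k)   # global rank ceil(i*n/k)
        b = -(-q // s)       # 1-based bucket index ceil(q/s)
        r = (q - 1) % s + 1  # rank inside bucket b
        result.append(select(buckets[b - 1], r))
    return result
```

Every bucket except the last has exactly s elements, because each left piece produced by the split is a power of 2 that is at least s; that is what makes the bucket-index arithmetic valid.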
Question 4
Question 15:2-1
Successor, Predecessor: To support the Successor and Predecessor operations in O(1) time, we keep two extra pointers in each node, succPtr and predPtr. For the first node inserted into the tree, both are set to null. For each subsequent insertion, we insert the new node as a leaf and, before making any rotations, update the succPtrs and predPtrs as follows:
if key[newNode] > key[parent] {        // parent is the parent of newNode; newNode is its right child
    newNode.succPtr = parent.succPtr
    newNode.predPtr = parent
    if parent.succPtr != null
        parent.succPtr.predPtr = newNode
    parent.succPtr = newNode
}
else {                                 // newNode is the left child of parent
    newNode.predPtr = parent.predPtr
    newNode.succPtr = parent
    if parent.predPtr != null
        parent.predPtr.succPtr = newNode
    parent.predPtr = newNode
}
We then do the rotations, if needed. Rotations do not change the in-order sequence of the keys, so the pointers remain correct. Therefore, insertion still takes
O(log n + const.) = O(log n)
time.
For deletion, we first identify the node to be deleted as described in the text, and before actually removing it from the tree, we update the succPtrs and predPtrs:
if deleteNode.succPtr != null
    deleteNode.succPtr.predPtr = deleteNode.predPtr
if deleteNode.predPtr != null
    deleteNode.predPtr.succPtr = deleteNode.succPtr
// then set deleteNode's succPtr and predPtr to null if necessary
We can then delete the node from the tree and rotate if necessary. Deletion still takes
O(log n + const.) = O(log n)
time.
Then the Successor and Predecessor operations take only O(1) time. (Note that adding succPtr and predPtr amounts to building a doubly linked list, in key order, within the tree.)
Minimum, Maximum:
We keep two extra pointers for the tree, minPtr and maxPtr. The first node inserted into the tree is pointed to by both minPtr and maxPtr. For each subsequent insertion, we compare the new node with the nodes pointed to by minPtr and maxPtr. If the new node is greater than the node pointed to by maxPtr, maxPtr is set to the new node; if it is smaller than the node pointed to by minPtr, minPtr is set to the new node. After these comparisons, the new node is inserted into the tree using the normal insertion algorithm described in the text. Therefore, each insertion still takes
O(log n + 3) = O(log n)
time (the 3 accounts for 2 comparisons and at most one pointer update).
For deletion, we first check whether the node to be deleted is pointed to by maxPtr or minPtr. If it is pointed to by maxPtr, we make maxPtr point to its predecessor; this takes only O(1) time because each node now has a predPtr and a succPtr (see above). If it is pointed to by minPtr, we make minPtr point to its successor, which again takes only O(1) time. We then proceed with the modified deletion algorithm described above. Therefore, deletion still takes
O(log n + const.) = O(log n)
time, and the Minimum and Maximum operations take O(1) time.
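A minimal Python sketch of both augmentations, assuming distinct keys. To stay short it uses an unbalanced BST instead of a red-black tree; since rotations do not change the in-order sequence, the pred/succ threads would not be affected by them. Deletion follows the CLRS transplant style, which moves nodes rather than copying keys, so unlinking the deleted node from the thread is enough. The class and method names are illustrative; Successor and Predecessor are x.succ and x.pred, and Minimum and Maximum are t.min and t.max.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.pred = self.succ = None  # predPtr / succPtr


class ThreadedBST:
    """Unbalanced BST with predPtr/succPtr threads and minPtr/maxPtr (distinct keys)."""

    def __init__(self):
        self.root = self.min = self.max = None

    def insert(self, key):
        z, parent, x = Node(key), None, self.root
        while x is not None:
            parent, x = x, (x.left if key < x.key else x.right)
        z.parent = parent
        if parent is None:  # first node: minPtr and maxPtr both point to it
            self.root = self.min = self.max = z
            return z
        if key < parent.key:  # z is a left child: splice in just before parent
            parent.left = z
            z.pred, z.succ = parent.pred, parent
            if parent.pred is not None:
                parent.pred.succ = z
            parent.pred = z
        else:  # z is a right child: splice in just after parent
            parent.right = z
            z.pred, z.succ = parent, parent.succ
            if parent.succ is not None:
                parent.succ.pred = z
            parent.succ = z
        if key < self.min.key:
            self.min = z
        if key > self.max.key:
            self.max = z
        return z

    def _transplant(self, u, v):
        # Replace the subtree rooted at u with the subtree rooted at v.
        if u.parent is None:
            self.root = v
        elif u is u.parent.left:
            u.parent.left = v
        else:
            u.parent.right = v
        if v is not None:
            v.parent = u.parent

    def delete(self, z):
        nxt = z.succ  # in-order successor, found in O(1)
        # O(1) maintenance of minPtr/maxPtr and the threads.
        if z is self.max:
            self.max = z.pred
        if z is self.min:
            self.min = z.succ
        if z.pred is not None:
            z.pred.succ = z.succ
        if z.succ is not None:
            z.succ.pred = z.pred
        z.pred = z.succ = None
        # Structural deletion that moves nodes instead of copying keys.
        if z.left is None:
            self._transplant(z, z.right)
        elif z.right is None:
            self._transplant(z, z.left)
        else:
            y = nxt  # z has a right child, so its successor is that subtree's minimum
            if y.parent is not z:
                self._transplant(y, y.right)
                y.right = z.right
                y.right.parent = y
            self._transplant(z, y)
            y.left = z.left
            y.left.parent = y

    def keys_via_threads(self):
        # Walk minPtr -> succPtr -> ... to list the keys in order.
        out, x = [], self.min
        while x is not None:
            out.append(x.key)
            x = x.succ
        return out
```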
Question 5
5.3-4
Answer:
To list all intervals in T that overlap i, we give the following recursive algorithm, called initially as INTERVAL-SEARCH-ALL(root[T], i).
INTERVAL-SEARCH-ALL(x, i)
if x ≠ NIL
    if i overlaps int[x]
        print int[x]
    if left[x] ≠ NIL and max[left[x]] ≥ low[i]
        INTERVAL-SEARCH-ALL(left[x], i)
    if right[x] ≠ NIL and low[x] ≤ high[i] and max[right[x]] ≥ low[i]
        INTERVAL-SEARCH-ALL(right[x], i)
Description:
The idea is as follows. We walk the tree T and examine each node to see whether it overlaps i. While walking the tree, we prune unnecessary paths using the properties of T; we will show that we search at most k + 1 paths to find k intervals. At each node, if the node overlaps i, we print it. If intervals overlapping i may exist in its left subtree, we recursively search the left subtree; if they may exist in its right subtree, we recursively search the right subtree.
Analysis:
First, the running time is O(n): in the worst case all the conditional recursive calls happen, and we walk the whole tree in O(n) time.
Second, the running time is O(k log n), because:
i. Finding one interval in T that overlaps i takes O(log n) time, since it means walking a path from the root toward a leaf, and the height of the red-black tree T is at most 2 lg(n + 1).
ii. Each time we walk a path (cutting off the unnecessary lower part), there are two cases:
- We find at least one interval.
- We find no interval. Let x be the last node on the path. This means low[x] > high[i], since we guarantee low[i] ≤ high[x]. Therefore we can stop there, because the nodes after x in an in-order walk have low endpoints at least low[x]: an in-order walk sorts the nodes by their low endpoints.
So we search at most k + 1 paths to find k intervals, and the search takes O(k log n) time.
Since the running time is both O(n) and O(k log n), it is O(min(n, k log n)).
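A Python sketch of INTERVAL-SEARCH-ALL on closed intervals. For brevity the tree is an unbalanced BST keyed on the low endpoint with the max augmentation, not a red-black tree, so only the pruning logic (not the O(log n) height) carries over. The helpers insert and build are hypothetical, and matches are collected in a list instead of printed.

```python
class IntervalNode:
    """BST node keyed on the low endpoint; max is the largest high endpoint in its subtree."""

    def __init__(self, low, high):
        self.low, self.high, self.max = low, high, high
        self.left = self.right = None


def insert(root, low, high):
    # Plain BST insert that keeps the max field up to date on the way down.
    if root is None:
        return IntervalNode(low, high)
    root.max = max(root.max, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    return root


def build(intervals):
    root = None
    for low, high in intervals:
        root = insert(root, low, high)
    return root


def search_all(x, low, high, out=None):
    """Collect every stored interval that overlaps the closed interval [low, high]."""
    if out is None:
        out = []
    if x is None:
        return out
    if x.low <= high and low <= x.high:  # int[x] overlaps i
        out.append((x.low, x.high))
    if x.left is not None and x.left.max >= low:
        search_all(x.left, low, high, out)
    if x.right is not None and x.low <= high and x.right.max >= low:
        search_all(x.right, low, high, out)
    return out
```

For instance, with the intervals (16, 21), (8, 9), (25, 30), (5, 8), (15, 23), (17, 19), (26, 26), (0, 3), (6, 10), (19, 20) inserted in that order, search_all(root, 22, 25) should contain exactly (15, 23) and (25, 30).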
Question 6 (Data structure to keep track of a bank account.)
We can use a red-black tree (or any balanced BST). The key will be time, and we will store the operation (Enter a, or Cancel) information inside the nodes, i.e.,
struct Node {
    key;          // stores time t
    transaction;  // stores either "Cancel" or "Enter"
    amount;       // stores the value 'a' in Enter(t, a)
}
With this data structure, when we receive an operation Enter(t, a) or Cancel(t), we create a new node with the appropriate information and insert it into the tree. Since insertion into a red-black tree takes O(log n) time, each Enter(t, a) and Cancel(t) operation takes O(log n) time.
For the Balance(t) operation, we walk the tree in order until we reach a node whose time t_node > t, adding or subtracting amounts to the balance as we go. The balance accumulated when we reach the first node with t_node > t equals Balance(t), and we return this value. Since an in-order traversal of the tree takes O(n) time, Balance(t) takes O(n) time.
Proof that traversing the tree in order takes O(n) time:
To traverse the tree in order, we use the following algorithm:
traverse(root) {
    if the left subtree of root is not empty then
        traverse(left subtree of root)
    visit root
    if the right subtree of root is not empty then
        traverse(right subtree of root)
}
Since each node is passed at most twice (once on the way down into its subtree and once on the way back up to its parent), the time to traverse the tree is
O(2n) = O(n).
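A minimal Python sketch of the Node-based design, using an iterative in-order walk that stops at the first node with t_node > t. Several assumptions are not stated in this write-up and are labeled in the code too: Cancel(t) cancels an earlier Enter(t, a) made at the same time, there is at most one Enter per time, and Balance(t) counts operations with time ≤ t. An unbalanced BST stands in for the red-black tree; because equal keys are sent right, Cancel(t) appears after Enter(t, a) in the in-order walk, which the balance loop relies on. Class and method names are hypothetical.

```python
class TxNode:
    def __init__(self, key, transaction, amount=0):
        self.key = key                  # time t
        self.transaction = transaction  # "Enter" or "Cancel"
        self.amount = amount            # 'a' for Enter(t, a); unused for Cancel
        self.left = self.right = None


class Account:
    """Unbalanced BST keyed by time (a stand-in for the red-black tree)."""

    def __init__(self):
        self.root = None

    def _insert(self, node):
        if self.root is None:
            self.root = node
            return
        x = self.root
        while True:
            if node.key < x.key:
                if x.left is None:
                    x.left = node
                    return
                x = x.left
            else:  # equal keys go right, so Cancel(t) lands after Enter(t, a) in order
                if x.right is None:
                    x.right = node
                    return
                x = x.right

    def enter(self, t, a):
        self._insert(TxNode(t, "Enter", a))

    def cancel(self, t):
        # Assumption: cancels the Enter made earlier at the same time t.
        self._insert(TxNode(t, "Cancel"))

    def balance(self, t):
        """In-order walk that stops at the first node with key > t."""
        total, entered, stack, x = 0, {}, [], self.root
        while stack or x is not None:
            while x is not None:
                stack.append(x)
                x = x.left
            x = stack.pop()
            if x.key > t:
                break
            if x.transaction == "Enter":
                total += x.amount
                entered[x.key] = x.amount
            else:  # undo the Enter seen earlier at the same time
                total -= entered.pop(x.key, 0)
            x = x.right
        return total
```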
There is an alternative implementation of Node that our group felt did not balance the running times of the operations as well. This implementation of Node is as follows:
struct AlternativeNode {
    key;          // stores time
    balance;      // stores the balance after this operation is performed, i.e., Balance(t)
    transaction;  // stores either "Enter" or "Cancel"
    amount;       // stores 'a' in Enter(t, a)
}
With this implementation, finding Balance(t) takes only O(log n) time, since we only need to search for the last node with key ≤ t. (We want the last such node because two nodes may have the same key, for example Enter(t_i, a) and Cancel(t_i).) However, with AlternativeNode, each Enter and Cancel operation must insert a node with the Enter/Cancel transaction and also walk the entire tree to update the balance of each node. (Walking the entire tree is required because the operations may not arrive in time order.) This gives Enter/Cancel a time of
O(log n + n) = O(n).
So we would have 2 operations with O(n) time and 1 operation with O(log n) time, as opposed to 2 operations with O(log n) time and 1 operation with O(n) time.