CS 202 Homework 1
Group members: Huaxia Xia, Yuanfang Hu, Weihao Chuang, Helen

Question 1: Problem 10.3-8

Let X[1..n] and Y[1..n] be two arrays, each containing n numbers in sorted order. We need to find the median of the combined 2n-element array Z = X + Y. Assume no duplicates. We use the SELECT algorithm of page 190, which finds a given order statistic in O(n) time.

First we find the medians of X, called x1 and x2 (where x1 = x2 if n is odd, and x1 < x2 otherwise), and the medians of Y, called y1 and y2 (y1 = y2 if n is odd, and y1 < y2 otherwise). For each of X, Y, and Z, the elements greater than its median form its "above" set and the elements less than its median form its "below" set. We check the terminating condition for this initial case.

For each partition we find the overlapping region of the medians: xmin is the least element of X such that xmin >= min(x1, x2, y1, y2), and xmax is the greatest element of X such that xmax <= max(x1, x2, y1, y2). If we exclude Case 1 below, one of xmin or xmax will be a median (by inspection). Similarly we find ymin and ymax. Let xsize = order(xmax) - order(xmin), and similarly ysize.

Terminating condition: Z has 2n elements, so there are two medians z1 and z2 to check for. If the check fails, we continue the median search.

Case 1 (even n):
    X: 1 2 3 4 13 14 15 16
    Y: 5 6 7 8 9 10 11 12
    x = {4, 13}, y = {8, 9}

The median of each array has an equal number of elements above and below it. In this case x1 < y1 and x2 > y2, so y1 and y2 are the medians of Z. The reason: the elements of X above (below) the median of X are also above (below) the median of Y, so they contribute to Z's above (below) set; and Y's above and below sets, by the choice of y1 and y2 as medians, contribute correspondingly to Z's above and below sets. Since X's and Y's above and below sets are balanced in size by the definition of the median, Z's sets, which are the superposition of X's and Y's, are balanced as well; hence y1 and y2 are the medians. This check is O(1).

Case 2.
Odd n:
    X: 1 2 3 9 10
    Y: 4 5 6 7 8
    x = {3}, y = {4}

If xsize = 0 and ysize = 0, then we halt. For x1 < y1 we find that all elements of X's above set are greater than y1, and all elements of Y's below set are less than x1; hence, by superposition of the above and below sets, x1 and y1 are the medians of Z. The same holds symmetrically if y1 < x1. This check is O(1).

Question 2: Multiplication of sparse polynomials

a. Suppose the two polynomials are a1*X^i1 + a2*X^i2 + ... + ak*X^ik and b1*X^j1 + b2*X^j2 + ... + bk*X^jk. We give two algorithms, each with complexity O(k^2 * log k).

1. The first algorithm:
    i. Multiply each a_s*X^i_s by each b_t*X^j_t, obtaining k^2 terms.
    ii. Put the results into an array, using a hash function on the exponents to decide the indexes. Two results with the same exponent hash to the same place, so we can add their coefficients together and join them into one term.
    iii. At most k^2 terms remain. Sort them with quicksort to get the final result.

Analysis: Step (i) costs k^2 time. In step (ii), the number of distinct exponents is at most min(k^2, i_k + j_k + 1), and k^2 is the tighter bound for sparse polynomials. Thus k^2 is the space cost of the array holding the results, and also the time needed to initialize it. Hashing and placing the terms takes k^2 + O(C) time, and adding terms with equal exponents takes at most k^2 time. In step (iii), sorting the at most k^2 terms with quicksort costs an expected 1.7*k^2*log(k^2) = 3.4*k^2*log k. In total, the time is approximately

    k^2 + k^2 + k^2 + O(C) + k^2 + 3.4*k^2*log k = 4*k^2 + 3.4*k^2*log k,

and the time order is O(k^2 * log k).

2. The above algorithm does not use the condition that both the i_s and the j_t are in sorted order, so we try to find a more efficient algorithm. Strategy: observe that multiplying the second polynomial by a single term a_s*X^i_s yields a polynomial whose exponents are already in order.
Thus, by multiplying the second polynomial by each of the k terms of the first polynomial, we get k sorted polynomials. We merge the k polynomials together to get the final result, using a k-way merge with a heap to extract the exponent-minimum term from among the k head terms of the k polynomials.

Algorithm and implementation:

    i. We add two attributes to each term a_s*X^i_s of the first polynomial: P_s, the index t of the term b_t*X^j_t that a_s*X^i_s is currently being multiplied by; and e_s, the exponent of the current result term, i.e., i_s + j_{P_s}. Moreover, we keep a pointer r to the polynomial that holds the current exponent-minimum result term.

        for s = 1 to k do:
            P_s = 1; e_s = i_s + j_1
        endfor

        Build a min-heap of the k values e_1, e_2, ..., e_k. Certainly e_1 = i_1 + j_1 is the minimum, so save a_1*b_1*X^(i_1+j_1) into the result and let r = 1.

    ii. for n = 2 to k^2 do:
            P_r = P_r + 1
            if P_r <= k then
                e_r = i_r + j_{P_r}; place e_r at the root of the heap
            else
                remove r's entry from the heap
            endif
            Heapify the tree and get the new minimum value e_s
            Save a_s*b_{P_s}*X^(i_s + j_{P_s}) into the result (adding it to the last term if the exponents are equal)
            r = s
        endfor

Analysis: Step (i) takes 2k time to initialize P and e, and O(k) time to build the heap. In step (ii), each heapify takes about log k time, so the whole step takes about k^2 * log k. In total, the time order is O(k^2 * log k).

b. If neither the input nor the output polynomials need to be in order, we use the first algorithm above without its third step (sorting the k^2 terms). The time order is then O(k^2).

Question 3: Problem 10.3-6 from the text (p. 192)

Strategy: We know that the time cost of SELECT is cn. We divide the n elements into two buckets in cn time; recursively, after ⌈log k⌉ such steps we have 2^⌈log k⌉ buckets, each of size 2^⌈log(n/k)⌉.
After that, the requested i-th quantile, i.e., the (i*n/k)-th order statistic of the whole array, must be the (((i*n/k - 1) mod 2^⌈log(n/k)⌉) + 1)-th order statistic of the ⌈(i*n/k) / 2^⌈log(n/k)⌉⌉-th bucket. We use SELECT to select all k items.

Algorithm (writing S = 2^⌈log(n/k)⌉ for the bucket size):

K_QUANTILES(A, n, k)
begin
1    EQUAL_PARTITION(A, 1, n, S)    /* to be implemented below */
2    for i = 1 to k-1
3        K[i] = SELECT(A, S*(b-1)+1, S*b, ((i*n/k - 1) mod S) + 1),
             where b = ⌈(i*n/k) / S⌉    /* SELECT is described on page 190 */
4    end for
end

EQUAL_PARTITION(A, p, r, s)
begin
5    m = 2^⌈log(r-p+1)⌉    /* the smallest power of 2 >= the size of the range */
6    if m > s then
7        use SELECT to find the (m/2)-th order statistic and PARTITION(A, p, r) around it
8        EQUAL_PARTITION(A, p, p+m/2-1, s)
9        EQUAL_PARTITION(A, p+m/2, r, s)
10   endif
end

Analysis: In K_QUANTILES the time is spent in steps 1 and 3. For step 1, EQUAL_PARTITION is a recursive function with recursion depth ⌈log k⌉; the SELECTs at each level cost cn in total, so step 1 costs cn*log k. For step 3, we do k SELECTs on buckets of size S <= 2n/k, so the cost is at most c*k*S <= 2cn. In total we need cn*log k + 2cn time, so the time order is O(n*log k).

Question 4: Problem 15.2-1

Successor, Predecessor. To support the Successor and Predecessor operations in O(1) time, we keep two extra pointers in each node, succPtr and predPtr. For the first node inserted into the tree, both are set to null. For each subsequent insertion, we insert the new node and, before making any rotations, update the succPtrs and predPtrs:

if key[newNode] > key[parent] {    // parent is the parent of newNode
    newNode.succPtr = parent.succPtr
    newNode.predPtr = parent
    if parent.succPtr != null then parent.succPtr.predPtr = newNode
    parent.succPtr = newNode
} else {
    newNode.predPtr = parent.predPtr
    newNode.succPtr = parent
    if parent.predPtr != null then parent.predPtr.succPtr = newNode
    parent.predPtr = newNode
}

We then do the rotation if needed; rotations do not change the order of the keys, so the pointers stay valid. Therefore, the time for insertion remains O(log n + const.) = O(log n).
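As a sanity check, the pointer bookkeeping above can be sketched in Python on a plain (unbalanced) BST. This is our simplification: the class and field names (Node, succ, pred) are ours, and rebalancing is omitted, since in the red-black tree the same splice happens before the rotations and the rotations do not disturb the pointers.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.succ = self.pred = None   # the two extra pointers

def insert(root, key):
    """Insert key into the BST and splice the new node into the
    succ/pred chain with O(1) extra work. Returns the root."""
    node = Node(key)
    if root is None:
        return node
    parent = root
    while True:                        # ordinary BST descent to a leaf
        child = parent.left if key < parent.key else parent.right
        if child is None:
            break
        parent = child
    if key > parent.key:               # new node is a right child
        parent.right = node
        node.succ = parent.succ        # inherit parent's old successor
        node.pred = parent
        if parent.succ is not None:
            parent.succ.pred = node    # old successor's pred is now the new node
        parent.succ = node
    else:                              # new node is a left child
        parent.left = node
        node.pred = parent.pred        # inherit parent's old predecessor
        node.succ = parent
        if parent.pred is not None:
            parent.pred.succ = node
        parent.pred = node
    return root

# usage: Successor/Predecessor become single pointer reads
root = None
for k in [5, 2, 8, 1, 3, 7, 9]:
    root = insert(root, k)
```

Following the succ chain from the minimum visits the keys in sorted order, which is exactly the doubly linked list threaded through the tree.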
For deletion, we first identify the node to be deleted as described in the text, and before actually removing it from the tree we splice it out of the pointer chain:

    if deleteNode.succPtr != null then deleteNode.succPtr.predPtr = deleteNode.predPtr
    if deleteNode.predPtr != null then deleteNode.predPtr.succPtr = deleteNode.succPtr

We can then delete the node from the tree and rotate if necessary. The time for deletion remains O(log n + const.) = O(log n). The Successor and Predecessor operations then take only O(1) time. (Note that adding succPtr and predPtr amounts to building a doubly linked list within the tree.)

Minimum, Maximum. We keep two extra pointers to the tree, minPtr and maxPtr. The first node inserted into the tree is pointed to by both. For each subsequent insertion, we compare the new node with the nodes pointed to by minPtr and maxPtr: if the new node is greater than the node pointed to by maxPtr, maxPtr is set to the new node; if it is smaller than the node pointed to by minPtr, minPtr is set to the new node. After these comparisons, the node is inserted using the normal insertion algorithm described in the text. Each insertion therefore still takes O(log n + 3) = O(log n) time (the 3 covers the two comparisons and at most one pointer update).

For deletion, we first check whether the node to be deleted is pointed to by maxPtr or minPtr. If maxPtr points to it, we make maxPtr point to the node's predecessor; this takes only O(1) time, since every node now carries predPtr and succPtr (see above). We then proceed with the modified deletion described above. If minPtr points to the node, we update minPtr to point to the node's successor, which again takes only O(1) time thanks to the newly added pointers.
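The two comparisons per insertion can be sketched as follows; this is a minimal Python illustration with the actual O(log n) tree insertion elided, and with a bare stand-in node type (_N) of our own.

```python
class _N:
    """Bare stand-in for a tree node; the real node has child pointers etc."""
    def __init__(self, key):
        self.key = key

class Tree:
    def __init__(self):
        self.min_ptr = None   # node holding the smallest key
        self.max_ptr = None   # node holding the largest key

    def insert(self, node):
        # two comparisons and at most one pointer update: O(1) extra work
        if self.min_ptr is None or node.key < self.min_ptr.key:
            self.min_ptr = node
        if self.max_ptr is None or node.key > self.max_ptr.key:
            self.max_ptr = node
        # ... the normal O(log n) red-black insertion would go here ...

    def minimum(self):        # O(1)
        return self.min_ptr

    def maximum(self):        # O(1)
        return self.max_ptr

t = Tree()
for k in [4, 1, 9, 3]:
    t.insert(_N(k))
```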
We then proceed with the deletion algorithm described above. Therefore, the time for deletion still remains O(log n + const.) = O(log n), and the Minimum and Maximum operations take O(1) time.

Question 5: Problem 5.3-4

Answer: To list all intervals in T that overlap i, we give the following algorithm, called initially as INTERVAL-SEARCH-ALL(root[T], i):

INTERVAL-SEARCH-ALL(x, i)
1    if x != NIL
2        if i overlaps int[x]
3            print(x)
4        if left[x] != NIL and max[left[x]] >= low[i]
5            INTERVAL-SEARCH-ALL(left[x], i)
6        if right[x] != NIL and low[x] <= high[i] and max[right[x]] >= low[i]
7            INTERVAL-SEARCH-ALL(right[x], i)

Description: The idea is to walk the tree T, examining every node to see whether it overlaps i, and to prune the unnecessary paths using the properties of T. We will show that we search at most k+1 paths to find k intervals. For every node, if it overlaps i, we print it. If it is possible to find overlapping intervals in the left subtree, we recurse into the left subtree; likewise for the right subtree.

Analysis: First, the running time is T <= O(n), because in the worst case every conditional recursion fires and we walk the whole tree, which takes O(n) time. Second, T <= O(k*log n), because:

    i. The time to find one interval in T that overlaps i is O(log n): finding one node means walking a path from the root toward a leaf, and the height of T is no more than log n.

    ii. Every time we walk a path (cutting off the unnecessary lower part), there are two cases: either we find at least one interval, or we find none. In the latter case, let the leaf node reached be x; then low[x] > high[i], since before descending we guaranteed low[i] <= max[x]. We can therefore stop: the low endpoints of x's in-order successors must be greater than low[x], because an in-order walk visits the nodes in sorted order of their low endpoints.

So we search at most k+1 paths to find k intervals.
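The pruned recursive search above can be sketched in Python on a small hand-built interval tree. The (low, high) tuple representation and the helper names are ours; max is the usual augmentation, precomputed here at construction time.

```python
def overlaps(a, b):
    # closed intervals (low, high) overlap iff each starts before the other ends
    return a[0] <= b[1] and b[0] <= a[1]

class INode:
    """Interval-tree node keyed on the low endpoint; max is the largest
    high endpoint anywhere in this node's subtree."""
    def __init__(self, interval, left=None, right=None):
        self.int = interval
        self.left, self.right = left, right
        self.max = interval[1]
        for child in (left, right):
            if child is not None and child.max > self.max:
                self.max = child.max

def interval_search_all(x, i, out):
    if x is None:                               # NIL
        return
    if overlaps(x.int, i):
        out.append(x.int)                       # "print(x)"
    if x.left is not None and x.left.max >= i[0]:
        interval_search_all(x.left, i, out)
    if x.right is not None and x.int[0] <= i[1] and x.right.max >= i[0]:
        interval_search_all(x.right, i, out)

# a small hand-built tree (a valid BST on low endpoints)
root = INode((15, 23),
             INode((5, 8), INode((0, 3)), INode((6, 10))),
             INode((25, 30), INode((19, 20)), INode((26, 26))))
out = []
interval_search_all(root, (9, 16), out)
```

On this tree the query (9, 16) reports exactly the two overlapping intervals and never descends into the (0, 3) subtree, whose max endpoint 3 is below low[i] = 9.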
The time for this search is therefore O(k*log n). Since T <= O(n) and T <= O(k*log n), we have shown that T = O(min(n, k*log n)).

Question 6: (Data structure to keep track of a bank account.)

We can use a red-black tree (or any balanced BST). The key is the time t, and we store the operation (Enter a, or Cancel) inside each node:

struct Node {
    key;          // stores time t
    transaction;  // stores either "Cancel" or "Enter"
    amount;       // stores the value 'a' in Enter(t, a)
}

With this data structure, when we receive Enter(t, a) or Cancel(t), we create a node with the appropriate information and insert it into the tree. Since insertion into a red-black tree takes O(log n) time, each Enter(t, a) and Cancel(t) operation takes O(log n) time.

For the Balance(t) operation, we walk the tree in order until we reach a node with t_node > t, adding to or subtracting from a running balance as we walk. The balance accumulated by the time we reach a node with t_node > t equals Balance(t), and we return it. Since it takes O(n) time to traverse the tree in order, Balance(t) takes O(n) time.

Proof that traversing the tree in order takes O(n) time. We use the following algorithm:

traverse(root) {
    if left subtree of root is not empty then
        traverse(left subtree of root)
    walk over root
    if right subtree of root is not empty then
        traverse(right subtree of root)
}

Since each node is visited at most twice (once going down into its subtree and once coming back up), the time to traverse the tree is O(2n) = O(n).

There is an alternative implementation of each Node that our group felt does not balance the costs of the operations as well:

struct AlternativeNode {
    key;          // stores time
    balance;      // stores the balance after this operation is performed; this is Balance(t)
    transaction;  // stores either "Enter" or "Cancel"
    amount;       // stores 'a' in Enter(t, a)
}

With this implementation, finding Balance(t) takes only O(log n), since we need only search for the last node with key == t. (We want the last such node because two nodes may share a key, for example Enter(t_i, a) and Cancel(t_i).) However, with AlternativeNode, each Enter and Cancel operation must both insert a node and walk the entire tree to update the stored balance of every node. (Walking the entire tree is required because the operations may not arrive in time order.) This gives Enter and Cancel a cost of O(log n + n) = O(n). So we would have two operations costing O(n) and one costing O(log n), as opposed to two operations costing O(log n) and one costing O(n).
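To illustrate the semantics of the first (preferred) design, here is a hedged Python sketch. Python has no built-in red-black tree, so a sorted list maintained with bisect stands in for it; that makes Enter/Cancel O(n) in this sketch rather than O(log n), but the Balance(t) in-order walk is the same. The assumption that Cancel(t) undoes the Enter made at time t follows the shared-key example above.

```python
import bisect

class Account:
    """One record per operation, keyed by time, kept in sorted order."""
    ENTER, CANCEL = 0, 1   # CANCEL sorts after the ENTER with the same time

    def __init__(self):
        self.ops = []      # sorted list of (time, kind, amount) tuples

    def enter(self, t, a):
        bisect.insort(self.ops, (t, self.ENTER, a))

    def cancel(self, t):
        # Cancel(t) undoes the Enter made at time t (hence the shared key)
        bisect.insort(self.ops, (t, self.CANCEL, None))

    def balance(self, t):
        """In-order walk up to time t, adding Enters and undoing Cancels."""
        total = 0
        entered = {}       # time -> amount, to resolve Cancels during the walk
        for time, kind, amount in self.ops:
            if time > t:
                break
            if kind == self.ENTER:
                total += amount
                entered[time] = amount
            else:
                total -= entered.get(time, 0)
        return total

acct = Account()
acct.enter(1, 100)
acct.enter(3, 50)
acct.cancel(3)
acct.enter(5, 20)
```

Here Balance(4) is 100, because the deposit at time 3 has been canceled, while Balance(10) is 120.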