Binary Search and Binary Tree
• Binary Search
• Heap
• Binary Tree

Search on Data
• Search is one of the fundamental problems in computer science
• It consists of methods to quickly answer the question "is this item in the data?" (called a query)
  → one way is to use buckets and hashing
• Here we approach the problem not from the way the data is stored, but from the search method itself

Consult a Dictionary
• We want to find the position of a word in a dictionary
• How do we do this?
  + check all words one by one from the beginning
    → called a linear scan; O(n) time
  + open an arbitrary page; if the word is not there, check the former/latter half
    → faster than a linear scan; the candidate pages are narrowed down

Binary Search
• For conciseness, we assume that the data is a collection of numbers
• As preparation, sort the data; let s be the position (index) of the first number, and t that of the last
• For a query finding q, we first look at the center
  + if the center is q, answer its position
  + if not, compare q and the center to refine the area to be searched
(figure: s and t on the sorted array 1 3 7 8 9 11 13 17 18 19)

Refine the Search Area
• The center > q
  → q must be in the left side; set t to the position just before the center
• The center < q
  → q must be in the right side; set s to the position just after the center
• When t < s, stop
• The search space is halved again and again, iteratively

Computation Time for Binary Search
• In each iteration, the search area becomes half or less
  → after at most log2 n iterations, the search area has length one and the search terminates
  → the computation time is O(log2 n), which is optimal in the sense of complexity theory
• No large extra memory is needed, just two variables besides the O(n) input data (called "in place")
• So, very good (a C sketch of the procedure is given below)

Exercise
• On the following number sequence, perform a binary search for the queries 8, 17 and 19 (trace the movements of s and t):
  1 3 7 8 9 11 13 17 18 19

Weak Points of Array Data
• An array needs a long time (O(n)) to keep the increasing order under insertion and deletion at an arbitrary position
• If we use a list instead of an array, we can insert/delete in O(1) time, but it takes a long time (O(n)) to find the middle of the order
• In general, it is not trivial to be efficient for both search and insertion/deletion
• … however, there are some ways

Finding the Minimum
• As a first step, we aim at fast insertion/deletion and fast search for the minimum value
• Problem:
  + store several (many) numeric values
  + insertion of a new value into, and deletion of a value from, the data structure has to be done quickly
  + the minimum value among the stored values can be found quickly
• Generally, a data structure with these functions is called a heap

Determine the Winner
• Determine the fastest runner in a school
• They cannot all run at once, so each class determines its fastest runner; then we find the fastest among the class-fastest runners
• The class-fastest is also determined by splitting the students into smaller groups
• For determining the strongest football team, only two teams can play at once, so we use a knockout (tournament) system

Finding the Minimum
• Let us do the same for numeric values (a knockout system)
• … after the tournament is decided, the minimum may change when we modify a value; how can we update it?
• "A non-minimum value gets smaller" is easy; just compare the new value with the minimum, so for this we only have to keep the minimum
• When the minimum value increases (or we delete it), do we have to re-compute everything?
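The binary search described above can be written in a few lines of C. This is a minimal sketch over a sorted int array; the function name binary_search and the convention of returning -1 when q is absent are illustrative choices, not from the original slides.

  /* binary search for q in the sorted array data[0..n-1] */
  int binary_search ( int *data, int n, int q ){
    int s = 0, t = n-1, c;               /* s, t: current search area */
    while ( s <= t ){
      c = (s+t) / 2;                     /* look at the center */
      if ( data[c] == q ) return ( c );  /* found: answer the position */
      if ( data[c] > q ) t = c-1;        /* q must be in the left side */
      else s = c+1;                      /* q must be in the right side */
    }
    return ( -1 );                       /* t < s: q is not in the data */
  }

For the example array 1 3 7 8 9 11 13 17 18 19 and q = 17, the sketch inspects 9 and then 17, and returns index 7.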
Re-computation is NOT Whole
• Where do we have to re-compute when the minimum increases (and becomes non-minimum)? Actually, not everywhere
• Only the results above the modified value can change; the others never do
• Seen the other way: every result that has the modified value below it has to be checked

Time for Re-computation
• How long does the re-computation take?
  → it is linear in the height of the knockout tree (this tree is often called a heap tree)
• The number of nodes doubles at each level going down from the top
• So it takes at most log2 n + 1 steps to reach the bottom level
• The time for re-computation is O(log n)

Insertion and Deletion
• We keep the invariant that, everywhere in the tree, the left branch is never smaller than the right
• To insert a new value into the heap, we put it at the rightmost position of the bottom level (or at the leftmost position of a new level if there is no space)
• To delete a value, assign the value of the rightmost cell of the bottom level to the position to be deleted, and reduce the size by one
• Both need O(log n) time

Realize the Heap
• To realize the heap we need some way to represent the structure; shall we use cells & pointers, as for lists?
• That is indeed a good way: represent the adjacency relations by pointers to the parent, the left child, and the right child
• However, we can also do it without pointers

Structure by Array
• Trace the heap from top to bottom, each level from left to right, and give the nodes indices starting from 0
  → when #leaves is n, the array has 2n-1 cells (the largest index is 2n-2)
• Then the index of the parent/children can be computed arithmetically
(figure: nodes indexed 0 to 12, level by level)

The Index of an Adjacent Cell
• The index of the cell adjacent to cell i:
  + up (parent): (i-1)/2 (rounded down)
  + left-down (left child): i*2+1
  + right-down (right child): i*2+2
• If i ≥ n-1, the cell is a leaf and has no child

Structure of Heap
• The heap structure is composed of an array, the array size, and the current heap size
• A subroutine changes the value of cell i to a

  typedef struct {
    int *h;     // array for values
    int end;    // size of array
    int num;    // current size of heap
  } AHEAP;

  void AHEAP_chg ( AHEAP *H, int i, int a ){
    int j;
    H->h[i] = a;
    while ( i>0 ){
      j = i - 1 + (i%2)*2;          // j := sibling of i
      if ( H->h[j] < a ) a = H->h[j];
      i = (i-1) / 2;                // i := parent of i
      if ( H->h[i] == a ) break;    // no need to update
      H->h[i] = a;
    }
  }

Insert & Delete
• To insert, increase num and change the value of the last cell to a

  void AHEAP_ins ( AHEAP *H, int a ){
    H->num++;
    H->h[H->num*2-3] = H->h[(H->num*2-2)/2];
    AHEAP_chg ( H, H->num*2-2, a );
  }

  void AHEAP_del ( AHEAP *H, int i ){
    AHEAP_chg ( H, i, H->h[H->num*2-2] );
    AHEAP_chg ( H, (H->num*2-2)/2, H->h[H->num*2-3] );
    H->num--;
  }

Find the Cell of the Minimum Value
• Start from the top cell (cell i), and go down toward the child holding the minimum value

  int AHEAP_findmin ( AHEAP *H, int i ){
    if ( H->num <= 0 ) return (-1);
    while ( i < H->num-1 ){
      if ( H->h[i*2+1] == H->h[i] ) i = i*2+1;
      else i = i*2+2;
    }
    return ( i );
  }

Find All Values ≤ Threshold
• Find the leftmost leaf with value ≤ the threshold a

  int AHEAP_findlow_leftmost ( AHEAP *H, int a, int i ){
    if ( H->num <= 0 ) return (-1);
    if ( H->h[0] > a ) return (-1);
    while ( i < H->num-1 ){
      if ( H->h[i*2+1] <= a ) i = i*2+1;
      else i = i*2+2;
    }
    return ( i );
  }

• Find the next leaf to the right of cell i with value ≤ a

  int AHEAP_findlow_nxt ( AHEAP *H, int a, int i ){
    for ( ; i>0 ; i=(i-1)/2 ){
      if ( i%2 == 1 && H->h[i+1] <= a )
        return ( AHEAP_findlow_leftmost ( H, a, i+1 ) );
    }
    return (-1);
  }
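The two routines above can be combined to report every stored value that is at most a given threshold, from left to right. The following is a minimal usage sketch, assuming the heap H has already been filled; the wrapper name AHEAP_report_low is an illustrative choice, not from the original slides.

  #include <stdio.h>

  /* print all leaf values <= a in the knockout heap, left to right */
  void AHEAP_report_low ( AHEAP *H, int a ){
    int i = AHEAP_findlow_leftmost ( H, a, 0 );  /* leftmost leaf <= a, or -1 */
    while ( i != -1 ){
      printf ("%d\n", H->h[i]);
      i = AHEAP_findlow_nxt ( H, a, i );         /* next leaf to the right <= a */
    }
  }

Each reported value costs O(log n) time, so the total time is O(k log n) when k values are reported.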
Example of Usage
• Sort numbers (in increasing order):
  + insert all numbers into a heap
  + extract the minimum number repeatedly
• Clustering on a similarity graph (gather the nearest pairs, iteratively)

Ex. Huffman Tree
• We have n words (or similar items), each with a frequency
  + insert all frequencies into a heap
  + extract the two minimums and merge them into a node whose frequency is their sum (the two are its children, the merged node is their parent)
  + insert the new node into the heap
• Finally, we obtain a tree structure
• Assigning 0 to each left child and 1 to each right child, every word gets a 01 code, obtained by tracing the path from the root to it
• This code gives an optimal code assignment
(figure: Huffman tree for frequencies A=9, B=6, C=5, D=4, E=3, F=8)

Exercise: Heap
• Construct a heap with the following values, then insert the values 7, 2, and 13, one after another:
  4, 6, 8, 9, 11, 15, 17

Memory Efficiency
• 2n-1 cells are used to store n values
  → almost twice the space
• Is there a way to store the values more efficiently?
  → store values on the inner cells as well

Heap on Textbooks
• The heap in usual textbooks is of this type
• In the "usual heap" we keep the condition "a parent has a value smaller than its children"
  → the top cell always holds the minimum value
• We update the heap while keeping this condition, so the minimum stays easy to find

Update Heap
• Modification of a value is done by swapping parent and child whenever they are in the opposite relation, going up (or down) until the condition is satisfied
• Insertion is done by appending a cell at the right end
• Deletion is done by moving the right end cell to the deleted position, and decrementing the size
• Almost the same as the previous heap (a code sketch of insertion and delete-min built on the routines below is given after the exercise)

A Code for Value Change
• The heap structure is the same
• Modify the value of cell i to a

  typedef struct {
    int *h;     // array for values
    int end;    // size of array
    int num;    // current size of heap
  } HEAP;

  void HEAP_chg ( HEAP *H, int i, int a ){
    int aa = H->h[i];
    H->h[i] = a;
    if ( aa > a ) HEAP_up ( H, i );
    if ( aa < a ) HEAP_down ( H, i );
  }

Update Heap (upward)
• Go upward with swapping when the value decreases, and downward otherwise

  void HEAP_up ( HEAP *H, int i ){
    int a;
    while ( i>0 ){
      if ( H->h[(i-1)/2] <= H->h[i] ) break;
      a = H->h[(i-1)/2];  H->h[(i-1)/2] = H->h[i];  H->h[i] = a;
      i = (i-1)/2;
    }
  }

• The position of a value changes during updates, which is a disadvantage if we want to keep track of where each value is stored

Update Heap (downward)
• Increasing a value may violate the parent-child condition below it
• Then we have to swap parent and child; we choose the smaller child, and go further down

  void HEAP_down ( HEAP *H, int i ){
    int ii, a;
    while ( i < H->num/2 ){
      ii = i*2+1;
      if ( i*2+2 < H->num && H->h[ii] > H->h[ii+1] ) ii = ii+1;  // smaller child
      if ( H->h[ii] >= H->h[i] ) break;
      a = H->h[ii];  H->h[ii] = H->h[i];  H->h[i] = a;
      i = ii;
    }
  }

Find Values ≤ Threshold
• Relatively simple by using recursion

  void HEAP_findlow ( HEAP *H, int a, int i ){
    if ( i >= H->num ) return;
    if ( H->h[i] > a ) return;
    printf ("%d\n", H->h[i]);
    HEAP_findlow ( H, a, i*2+1 );
    HEAP_findlow ( H, a, i*2+2 );
  }

Exercise: Heap (2)
• Construct a usual heap with the following numbers, then insert the numbers 7, 2, and 13, one after another:
  4, 6, 8, 9, 11, 15, 17
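The slides describe insertion and deletion of the usual heap only in words; the following is a minimal sketch built on HEAP_up and HEAP_down above. The names HEAP_ins and HEAP_extractmin are illustrative, not from the original, and the sketch assumes the array still has room (H->num < H->end) and that the heap is non-empty when extracting.

  /* append at the right end, then restore the parent <= child condition upward */
  void HEAP_ins ( HEAP *H, int a ){
    H->h[H->num] = a;
    HEAP_up ( H, H->num );
    H->num++;
  }

  /* the top cell holds the minimum: move the right end cell to the top
     and push it down until the condition holds again */
  int HEAP_extractmin ( HEAP *H ){
    int min = H->h[0];
    H->num--;
    H->h[0] = H->h[H->num];
    HEAP_down ( H, 0 );
    return ( min );
  }

With these, the sorting usage described earlier becomes: call HEAP_ins for all n values, then HEAP_extractmin n times; the values come out in increasing order, in O(n log n) total time.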
Column: Speed of Heap in Practice
• A heap needs O(log n) time for one operation
• However, in practice it is 4 or 5 times slower than plain array operations, even with 1,000,000 cells (log2 1,000,000 ≈ 20)
• Why does this happen?

Column: Speed of Heap in Practice (2)
• A heap update involves operations from the root to a leaf
• Once it is done, the accessed cells are stored in cache memory and can be accessed quickly the next time
• After several updates, the upper part of the heap sits in the cache; only the lower part needs long memory access times
• The observed slowdown suggests that this lower, uncached part consists of about 4 or 5 levels

Here, Terminology on Trees
• (In graph theory) the structure composed of vertices (or nodes) and edges connecting pairs of vertices is called a graph
• A connected graph without a cycle (ring, circuit) is called a tree
• A tree with a designated top vertex, called the root, is a rooted tree
• For a vertex x of a rooted tree:
  + the vertices on the path between x and the root are the ancestors of x
  + the vertices having x as an ancestor are the descendants of x
  + the vertex adjacent to x that is an ancestor is the parent of x
  + the other vertices adjacent to x are the children of x
  + the tree composed of all descendants of x is the subtree rooted at x
• A vertex with no child is a leaf
• A vertex with some children is an inner vertex
• The distance to the root is the depth of a vertex
• The maximum depth over all vertices is the height (depth) of the tree
• A tree is a binary tree if every vertex has #children ≤ 2
• A tree is a full binary tree if every vertex has #children = 0 or 2

Find Any Value
• The heap is simple, which is good, but we also want to find an arbitrary value in the data quickly
• To perform binary search, a tree structure like the heap is good, but insertion/deletion take a long time if we must keep the increasing order
• To keep the ordering, we have to be able to delete/insert at any position quickly

When the Order is Kept
• If the values at the leaves are sorted, we can perform a binary search by going down the tree from the top
• To realize this, we write on each node the maximum value among its descendants
  → we can decide whether to go left or right by looking at this value
• This allows quick insertion/deletion, by permitting an ill-formed (unbalanced) tree:
  + to insert, we attach two children to the vertex whose value is the smallest one larger than the inserted value
  + to delete, we copy the sibling vertex onto the parent and delete both children

Skew Would Grow
• Search/update time is linear in the depth of the target leaf
• The operations are fast when the tree is balanced, so that its height is low, but they take a long time when the tree is skewed
  → this happens after many insertions at the same place
• To speed up the operations, we need some additional mechanism

Eliminate the Skew
• The optimal search time is O(log n)
• So we try to bound the time by c log n for some constant c
• Where deep leaves exist, shallow places must exist somewhere else
  → deepen the shallow areas and make the deep places shallow, while keeping the ordering
• This can be done by locally re-forming the tree, by rotating children and their parents

Balancing by Rotation
• Suppose a vertex has two child subtrees whose heights differ by two or more (say the left is higher)
• We swap the positions of the parent and the higher child (a rotation)
• By one rotation, the gap between the heights decreases by two

Bounding the Height
• By repeatedly applying rotations we keep, for every vertex, the heights of its two children differing by at most one
• Can we then say something about the height k?
  + there is at least one vertex of depth k-1 (in another branch, branching off at the root or at the root's child)
  + there are at least two vertices of depth k-2 (branching off at depth 2 or 3)
  + there are at least 2^(h-1) vertices of depth k-h (branching off at depth 2h or 2h+1)
  + ....
  → the number of vertices in the tree is at least 2^(k/3)
• If there are n leaves, the height is at most 3 log2 n = O(log n)
• Such a tree of height O(log n) is called a balanced tree

Time for Search
• "Finding a value" means tracing the path from the root to a leaf
• The time for the search is at most the depth of the tree
• When #leaves is n, the height is ≤ 3 log2 n = O(log n)
  → therefore the search time is O(log n)

Effects of a Rotation
• When we rotate the tree at a vertex x, does any new vertex appear at which we now have to rotate?
  + descendants of x: OK, the heights of their children do not change
  + vertices that are neither ancestors nor descendants of x: also OK
  + for the ancestors of x, the height of one child can change
• … so, if we rotate at a vertex, its ancestors may have to be rotated
  → we therefore rotate from the vertex up to the root, iteratively

Insertion and Deletion
• When we insert or delete a vertex, its ancestors may violate the balance condition
• The height increases/decreases by one, so one rotation per ancestor is sufficient
• Trace the ancestors from the operated vertex and perform a rotation where necessary (we can stop as soon as no rotation is needed at an ancestor)
• The height of the tree is O(log n) and a rotation takes constant time, so insertion and deletion with re-balancing can be done in O(log n) time

Rotation by Other Criteria
• New criterion: if the size of the subtree rooted at a grandchild is more than half of the size of the subtree rooted at the vertex, then rotate
• By rotating, the maximum size among the grandchildren's subtrees decreases by at least one
  → the subtree size is (at least) halved by going down two levels

The Height of the Tree
• Since the size is halved every two levels, we can go down only O(log n) levels
  → the height is at most 4 log2 n = O(log n) when #leaves is n

Insertion and Deletion
• This rotation does not affect any of its ancestors (the number of descendants is not changed by a rotation)
• Trace the ancestors from the operated vertex and perform a rotation where necessary (here we cannot stop even if no rotation is needed at some ancestor)
• The height of the tree is O(log n) and a rotation takes constant time, so insertion and deletion with re-balancing can be done in O(log n) time

Structure for a Binary Tree
• We need pointers in this case, since the shape of the binary tree is not uniform and regular
• For the rotation criteria, we keep the height and the size of the subtree rooted at each vertex
• As with lists, the cells can also be kept in an array (a code sketch of a rotation on this structure is given after the exercise)

  typedef struct btree_ {
    struct btree_ *p;   // -> parent
    struct btree_ *l;   // -> left child
    struct btree_ *r;   // -> right child
    int height;         // height of the subtree
    int size;           // size of the subtree
    int value;          // (max) value
  } BTREE;

Examples of Usage
• Dictionary data, storage for IDs
• Keyword search in a document …

Exercise: Binary Tree
• Rotate the vertices of the following tree where necessary (examine both criteria)
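The rotation itself is only drawn in the slides; below is a minimal sketch of a single right rotation on the BTREE structure above. It assumes the conventions that an empty subtree has height -1 and that size counts all vertices of the subtree; the helper names (BTREE_height, BTREE_size, BTREE_update, BTREE_rotate_right) are illustrative, not from the original. A left rotation is symmetric.

  #include <stddef.h>

  int BTREE_height ( BTREE *v ){ return ( v == NULL ? -1 : v->height ); }
  int BTREE_size   ( BTREE *v ){ return ( v == NULL ?  0 : v->size   ); }

  void BTREE_update ( BTREE *v ){       /* recompute the cached fields of v */
    int hl = BTREE_height ( v->l ), hr = BTREE_height ( v->r );
    v->height = 1 + ( hl > hr ? hl : hr );
    v->size = 1 + BTREE_size ( v->l ) + BTREE_size ( v->r );
  }

  /* rotate right at x: x's left child y moves up, x becomes y's right child,
     and y's old right subtree becomes x's left subtree */
  BTREE *BTREE_rotate_right ( BTREE *x ){
    BTREE *y = x->l, *p = x->p;
    x->l = y->r;  if ( y->r != NULL ) y->r->p = x;
    y->r = x;     x->p = y;
    y->p = p;
    if ( p != NULL ){ if ( p->l == x ) p->l = y; else p->r = y; }
    BTREE_update ( x );   /* x first, since it is now below y */
    BTREE_update ( y );
    return ( y );         /* the new root of this subtree */
  }

Only the cached height and size of the two rotated vertices need to be repaired, which is why one rotation is a constant-time operation, as used in the O(log n) bounds above.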
Many Children
• Each inner vertex of our binary tree always has two children
• Why two?
  + the update cost is optimal
  + search and update have the same cost
  + the operations on children are simple
• Can we gain anything by allowing more than two children?
• The 2-3 tree is an example; #children is 2 or 3
  + the depths of all leaves are the same
  + however, the operations on children are not so simple (choosing the minimum among three, splitting three into two, …)
• Can we increase the number further?

B-tree
• A tree is a B-tree if #children of every vertex is bounded by B
• There are good motivations for this
• Consider an HDD or a tape, where accessing a block is expensive, but reading a whole block takes not much longer than reading a single bit
  → the computation time depends on the number of blocks we access
• Then a simple solution is to increase the maximum number of children so that one node fits in a block (a possible node layout is sketched after the summary)

Update of a B-tree
• If the definition were "all vertices have exactly B children", the memory usage would be efficient; however, we would have to update the structure everywhere, frequently
• On the other hand, the efficiency drops if many vertices have few children
  → bound the number of children between B/2 and B
  → if a parent and its child, or two siblings, have at most B children in total, we merge them into one node
• By applying rotations, the height of the tree is bounded by O(log_{B/2} n)

Summary
• Binary search: the search area is halved, at most log n times
• Heap: simulate the update of a knockout system
• Binary tree: rotate at vertices to re-balance the tree
• B-tree: minimize the number of blocks to be accessed
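For concreteness, here is a minimal sketch (not from the slides) of how a B-tree node could be laid out so that one node fits in a block, together with the descent used for search. The constant B, the field names, and the function BNODE_descend are assumptions for illustration.

  #include <stddef.h>

  #define B 64                       /* assumed maximum number of children */

  typedef struct bnode_ {
    int num;                         /* current number of children at this node */
    int key[B-1];                    /* keys guiding the search: key[i] separates child[i] and child[i+1] */
    struct bnode_ *child[B];         /* child pointers; NULL at the leaves */
  } BNODE;

  /* go down from the root to the leaf whose range may contain q;
     each step reads exactly one node, i.e., one block */
  BNODE *BNODE_descend ( BNODE *v, int q ){
    int i;
    while ( v->child[0] != NULL ){
      for ( i=0 ; i < v->num-1 && v->key[i] < q ; i++ );   /* choose the child */
      v = v->child[i];
    }
    return ( v );
  }

Since the height is O(log_{B/2} n), such a descent touches only O(log_{B/2} n) blocks.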