B-Trees
Data Structures and Other Objects Using C++

This presentation shows the potential problems of unbalanced trees and one way to fix them.

The Problems of Unbalanced Trees
Solutions:
1. Periodically balance the tree (Project 9 in Chapter 10 on page 538)
2. Ensure that leaves cannot become too deep: the B-tree
[Figure: an unbalanced tree holding the entries 1, 2, 3, 4, 5]

B-Tree
Differences compared to a BST:
1. B-tree nodes can have many more than two children
2. Each node contains more than just a single entry
3. Rules ensure that leaves do not become too deep

B-Tree Rules
1. The root may have as few as 1 entry; every other node has at least MINIMUM entries
2. The maximum number of entries in a node is twice MINIMUM
3. The entries of each B-tree node are stored in a partially filled array, sorted in increasing order
4. The number of subtrees below a nonleaf node is 1 more than the number of entries in the node
5. For any nonleaf node:
   a. An entry at index i is greater than all the entries in subtree number i of the node
   b. An entry at index i is less than all the entries in subtree number i+1 of the node
6. Every leaf in a B-tree has the same depth

B-Tree Example
● B-tree of 10 integers (with MINIMUM set to 1)
   Root:               [6]
   Children of root:   [2, 4]  [9]
   Leaves below [2, 4]:  [1] [3] [5]
   Leaves below [9]:     [7, 8] [10]

B-Tree Illustrations
https://www.youtube.com/watch?v=coRJrcIYbF4
https://www.cs.usfca.edu/~galles/visualization/BTree.html

Set Example
The author uses a set implemented as a B-tree as an example.

Class Invariant
● The items in the set are stored in a B-tree
● The number of entries in the root is stored in the member variable data_count, and the number of subtrees is stored in the member variable child_count
● The root's entries are stored in data[0] through data[data_count - 1]
● If the root has subtrees, they are stored in sets pointed to by subset[0] through subset[child_count - 1]

Searching
Make a local variable i equal to the first index such that data[i] is not less than the target. If no such index exists, set i equal to data_count, indicating that all of the entries are less than the target.
if (we found the target at data[i])
    return 1;
else if (the root has no children)
    return 0;
else
    return subset[i]->count(target);

Searching Example
[Figure omitted]

Inserting
● Loose insertion might leave a node with MAXIMUM + 1 entries
● Fix the loose insertion problem later
   Before:                         [6, 17] with children [4] [12] [19, 22]
   After a loose insertion of 18:  [6, 17] with children [4] [12] [18, 19, 22]

Loose Inserting
Make a local variable i equal to the first index such that data[i] is not less than entry. If no such index exists, set i to data_count.
if (we found the new entry at data[i])
    2a. Return false with no further work (since the new entry is already in the set).
else if (the root has no children)
{
    2b. Add the new entry to the root at data[i]. (The original entries at data[i] and afterwards must be shifted right to make room for the new entry.) Return true to indicate that we added the entry.
}
else
{
    2c. Save the value from this recursive call: subset[i]->loose_insert(entry);
        Then check whether the root of subset[i] now has an excess entry; if so, fix that problem.
        Return the saved value from the recursive call.
}

Loose Inserting Example
MINIMUM = 1
   Before:                         [6, 17] with children [4] [12] [19, 22]
   After a loose insertion of 18:  [6, 17] with children [4] [12] [18, 19, 22]
   After fixing the excess entry:  [6, 17, 19] with children [4] [12] [18] [22]

Fixing Child with Excess Entry
● Split the child node into two nodes, each containing MINIMUM entries
● Pass the median entry up to the parent
MINIMUM = 2
   Before:
   Root:      [9, 28]
   Children:  [3, 6]  [13, 16, 19, 22, 25]  [33, 40]
   Leaves below [3, 6]:                [1, 2] [4, 5] [7, 8]
   Leaves below [13, 16, 19, 22, 25]:  [11, 12] [14, 15] [17, 18] [20, 21] [23, 24] [26, 27]
   Leaves below [33, 40]:              [31, 32] [34, 35] [50, 51]
   After:
   Root:      [9, 19, 28]
   Children:  [3, 6]  [13, 16]  [22, 25]  [33, 40]
   Leaves below [3, 6]:    [1, 2] [4, 5] [7, 8]
   Leaves below [13, 16]:  [11, 12] [14, 15] [17, 18]
   Leaves below [22, 25]:  [20, 21] [23, 24] [26, 27]
   Leaves below [33, 40]:  [31, 32] [34, 35] [50, 51]

Deleting From B-Tree
● Loose erase might leave the root of an internal subtree with fewer than MINIMUM entries
● Fix the problem later

Loose Erase From B-Tree
● Make a local variable i equal to the first index such that data[i] is not less than target. If there is no such index, set i equal to data_count.
4 possibilities:
● Root has no children & did not find the target
   • Done: the target is not in the tree
● Root has no children & found the target
   • Remove the entry and return true
● Root has children & did not find the target
   • Recursive call to subset[i]
● Root has children & found the target
   • Replace the target by the largest item from subset[i]

Loose Erase Example
MINIMUM = 2
   Before:
   Root:      [10, 28]
   Children:  [2, 5]  [13, 16, 19, 22]  [33, 40]
   Leaves below [2, 5]:            [0, 1] [3, 4] [6, 7, 8]
   Leaves below [13, 16, 19, 22]:  [11, 12] [14, 15] [17, 18] [20, 21] [23, 24, 26]
   Leaves below [33, 40]:          [31, 32] [34, 35] [50, 51]
   After the loose erase of 28:
   Root:      [10, 26]
   Children:  [2, 5]  [13, 16, 19, 22]  [33, 40]
   Leaves below [2, 5]:            [0, 1] [3, 4] [6, 7, 8]
   Leaves below [13, 16, 19, 22]:  [11, 12] [14, 15] [17, 18] [20, 21] [23, 24]
   Leaves below [33, 40]:          [31, 32] [34, 35] [50, 51]

Fix Shortage in Child
Case 1: Transfer an extra entry from subset[i-1] (applies when subset[i-1] has more than MINIMUM entries)
● Transfer data[i-1] down to the front of subset[i]
● Transfer the final item of subset[i-1] up to replace data[i-1]
● If subset[i-1] has children, transfer the final child of subset[i-1] to the front of subset[i]
Case 2: Transfer an extra entry from subset[i+1]; similar to Case 1

Fix Shortage Example
MINIMUM = 2
   Before:
   Root:      [10, 28]
   Children:  [2, 5]  [13, 16, 19, 22]  [33]
   Leaves below [2, 5]:            [0, 1] [3, 4] [6, 7, 8]
   Leaves below [13, 16, 19, 22]:  [11, 12] [14, 15] [17, 18] [20, 21] [23, 24, 26]
   Leaves below [33]:              [31, 32] [34, 35]
   After:
   Root:      [10, 22]
   Children:  [2, 5]  [13, 16, 19]  [28, 33]
   Leaves below [2, 5]:        [0, 1] [3, 4] [6, 7, 8]
   Leaves below [13, 16, 19]:  [11, 12] [14, 15] [17, 18] [20, 21]
   Leaves below [28, 33]:      [23, 24, 26] [31, 32] [34, 35]
The 22 has come up from the middle child. The 28 has come down from the root. The child [23, 24, 26] has been moved over.

Fix Shortage in Child
Case 3: Combine subset[i] with subset[i-1]
● Transfer data[i-1] down to the end of subset[i-1]
● Transfer all items & children from subset[i] to the end of subset[i-1]
● Delete the node subset[i] & shift subset[i+1] and any later subsets leftward to fill the gap
Case 4: Combine subset[i] with subset[i+1]; similar to Case 3

Fix Shortage Example
MINIMUM = 2
   Before:
   Root:      [10, 28]
   Children:  [2, 5]  [16, 19]  [33]
   Leaves below [2, 5]:    [0, 1] [3, 4] [6, 7, 8]
   Leaves below [16, 19]:  [14, 15] [17, 18] [20, 21]
   Leaves below [33]:      [31, 32] [34, 35]
   After:
   Root:      [10]
   Children:  [2, 5]  [16, 19, 28, 33]
   Leaves below [2, 5]:            [0, 1] [3, 4] [6, 7, 8]
   Leaves below [16, 19, 28, 33]:  [14, 15] [17, 18] [20, 21] [31, 32] [34, 35]

Time analysis
Insert, remove, and search times are roughly proportional to the depth of the tree in binary search trees, heaps, and B-trees. Binary search trees suffer, since a tree of n entries could have a depth as large as n - 1. Heaps and B-trees have depth at most proportional to log(n), so the operations for a heap or a B-tree are O(log n).

Summary
● A B-tree is a tree for storing entries in a manner that follows six rules.
● The tree algorithms that we have seen for binary search trees, heaps, and B-trees all have worst-case time complexity of O(d), where d is the depth of the tree.
● The depth of a heap or B-tree is never more than O(log n), where n is the number of entries.
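
Code Sketch
The searching, loose inserting, and excess-fixing steps from the earlier slides can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the author's set class: the names BTreeSet, first_not_less, and depth are hypothetical, the entries are plain ints, std::vector stands in for the partially filled arrays data and subset (so data.size() plays the role of data_count), and the way insert handles an excess entry in the root is one possible approach that the slides do not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of a B-tree set of ints (not the textbook's class).
class BTreeSet {
public:
    static constexpr std::size_t MINIMUM = 1;
    static constexpr std::size_t MAXIMUM = 2 * MINIMUM;

    // Searching slide: returns 1 if target is in the set, 0 otherwise.
    std::size_t count(int target) const {
        std::size_t i = first_not_less(target);
        if (i < data.size() && data[i] == target) return 1;
        if (subset.empty()) return 0;
        return subset[i]->count(target);
    }

    // Loose insert, then repair the root if it ended up with MAXIMUM + 1 entries.
    // Assumption: the root is repaired by pushing its contents into a new child
    // and splitting that child, which makes the tree one level deeper.
    bool insert(int entry) {
        if (!loose_insert(entry)) return false;
        if (data.size() > MAXIMUM) {
            auto child = std::make_unique<BTreeSet>();
            child->data.swap(data);
            child->subset.swap(subset);
            subset.push_back(std::move(child));
            fix_excess(0);
        }
        return true;
    }

    // Depth of the tree; a tree that is a single leaf has depth 0.
    std::size_t depth() const {
        return subset.empty() ? 0 : 1 + subset[0]->depth();
    }

private:
    std::vector<int> data;                          // sorted entries of this node
    std::vector<std::unique_ptr<BTreeSet>> subset;  // empty for a leaf

    // First index i such that data[i] is not less than target, else data.size().
    std::size_t first_not_less(int target) const {
        return std::lower_bound(data.begin(), data.end(), target) - data.begin();
    }

    // Loose Inserting slide, steps 2a, 2b, 2c.
    bool loose_insert(int entry) {
        std::size_t i = first_not_less(entry);
        if (i < data.size() && data[i] == entry) return false;   // 2a
        if (subset.empty()) {                                    // 2b
            data.insert(data.begin() + i, entry);
            return true;
        }
        bool added = subset[i]->loose_insert(entry);             // 2c
        if (subset[i]->data.size() > MAXIMUM) fix_excess(i);
        return added;
    }

    // Split subset[i] (holding MAXIMUM + 1 entries) into two nodes of MINIMUM
    // entries each and pass the median entry up into this node.
    void fix_excess(std::size_t i) {
        BTreeSet& left = *subset[i];
        assert(left.data.size() == MAXIMUM + 1);
        auto right = std::make_unique<BTreeSet>();
        int median = left.data[MINIMUM];
        right->data.assign(left.data.begin() + MINIMUM + 1, left.data.end());
        left.data.resize(MINIMUM);
        if (!left.subset.empty()) {
            // A nonleaf with MAXIMUM + 1 entries has MAXIMUM + 2 children;
            // the last MINIMUM + 1 of them move to the new right node.
            std::move(left.subset.begin() + MINIMUM + 1, left.subset.end(),
                      std::back_inserter(right->subset));
            left.subset.resize(MINIMUM + 1);
        }
        data.insert(data.begin() + i, median);
        subset.insert(subset.begin() + i + 1, std::move(right));
    }
};
```

With this sketch, inserting the integers 1 through 10 in increasing order leaves a tree of depth 2, whereas an unbalanced binary search tree given the same sequence would have depth 9. Erasing (loose erase and the four fix-shortage cases) is left out to keep the sketch short.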