CMPE126 Data Structures
B-Trees
Spring 2006 Copyright (c) All rights reserved Leonard Wesley

Why B-Trees?
• The trees studied so far are for storing data in memory.
• B-Trees are better suited for storing data in memory AND on secondary storage.
• Better suited for balancing data than some other tree ADTs.
• Can store multiple keys with the same value, unlike some other trees, such as AVL trees.

The Problem With Unbalanced Trees
• The levels are sparsely filled, resulting in deep paths.
• This defeats the purpose of binary trees.
[Figure: an unbalanced tree with nodes 1, 2, 3, 4, 5]

Possible Solutions To Unbalanced Trees
• Periodically balance the tree.
• Don’t let a tree get too unbalanced when inserting or deleting.
  • AVL Trees: Sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis in the early 1960s. (An in-memory solution; not ideally suited to secondary storage.)
  • B-Trees: Proposed by R. Bayer & E. M. McCreight (see pg. 542 Main & Savitch for ref.)

What Is A B-Tree?
• It is a type of “multiway” tree.
• It is NOT a binary search tree, nor is it a binary tree.
• It provides a fast way to index into a multilevel set of nodes.
• Each node in the B-Tree contains a sorted array of key values.

Motivation For Multiway Tree
• Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
• The basic I/O operation reads and writes whole blocks, rather than single bytes, between secondary storage and memory.
• The goal is to devise a multiway search tree that minimizes file accesses by exploiting block-sized disk reads.
• Each access to secondary storage costs approximately as much as 250K instructions, depending on the speed of the CPU.

ISAM
• ISAM = Indexed Sequential Access Method.
ISAM: The Idea
[Figure: a disk with labels Platter, Track, and Block (512, 1024, … bytes)]

ISAM: Index & Keys
[Figure: a block on a track, with labels Data, Block Key, and Block #]
• All data in the block will have keys ≤ the block key, or have keys ≥ the block key. Pick one inequality and stick with it.

ISAM: Block Index
• This index could be stored in memory.
Block Index:
  Block #   Key
  0         G
  1         K
  2         N

ISAM: Disk Index
• This index could also be stored in memory.
Disk Index (Disk 0 … Disk n):
  Disk #    Key
  0         G
  1         V
  2         X

ISAM: Insertion/Deletion
• Insertion:
  • Might involve moving data across blocks.
  • Can leave extra space when inserting into a block.
• Deletion:
  • Might involve contracting data across blocks.
  • Need not contract every time, i.e., leave some space for possible future expansion.

Multiway Search Tree (order m)
• A generalization of binary search trees.
• Each node has at most m children.
• If k ≤ m is the number of children, then the node has exactly k-1 keys.
• The tree is ordered.

Multiway Search Tree (cont.)
Nodes in a multiway tree:
[Figure: a node holding keys k1 k2 k3 k4 k5; the labeled subtrees hold keys < k1, k2 < keys < k3, and k5 < keys]

Definition Of A B-Tree
A B-Tree of order m is an m-way tree such that:
• All leaves are on the same level.
• All internal nodes except the root node are constrained to have at most m non-empty children and at least m/2 (rounded up) non-empty children.
• The root node has at most m non-empty children.

Three Important Properties Of B-Trees
• All nodes in the B-Tree are at least half full (the root node is an exception at times).
• The B-Tree is always balanced. That is, the same number of nodes must be read into memory to locate any key at a given level of the tree.
• A well-organized B-Tree will have just a small number of levels relative to the number of nodes.

Where Are B-Trees Used?
• B-Trees are commonly found in databases and file systems.
• B-Trees allow logarithmic-time insertions and deletions.
• They generally grow from the bottom upwards as elements are inserted, whereas most binary trees grow downward.

The Six Rules Governing B-Trees
R1: A B-Tree might be empty. If it is not, every node other than the root holds at least some specified MINIMUM number of entries.
R2: The MAXIMUM number of entries in a node is twice the MINIMUM.

The Six Rules Governing B-Trees (cont.)
R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the final used position of the array).
• The data in such an array can be stored in a block on a disk.
[Figure: a B-Tree node drawn as an array with indexes 0 … n-1, holding h, k, k*, n, …]
* B-Trees can support duplicate keys.

The Six Rules Governing B-Trees (cont.)
R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
[Figure: 4 entries in a non-leaf node and its 5 subtrees]
  Index:   0    1    2    3
  Entry:   45   55   67   82
  subtree 0: keys < 45
  subtree 1: keys > 45 & < 55
  subtree 2: keys > 55 & < 67
  subtree 3: keys > 67 & < 82
  subtree 4: keys > 82

The Six Rules Governing B-Trees (cont.)
R5: For any non-leaf node:
• An entry at index i is greater than all the entries in subtree i of the node, and
• An entry at index i is less than all the entries in subtree i+1 of the node.
R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).

Example B-Tree
MIN = 1, MAX = 2
[Figure: example B-Tree]
  root:      [30, 80]
  children:  [20]  [50, 60]  [90]
  leaves:    [10] [25]  |  [35, 40] [55] [72]  |  [82, 85] [95]

Searching For A Target In B-Trees
• Start with the root node and search for the target in the array at that node.
• If found, then done; return success.
• If the target is not in the root and there are no children, then also done, but return failure.
• If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree. Search key_array from left to right, up to data_count, and descend into subtree i for the first i with target < key_array[i]. If there is no such i, descend into subtree data_count.
• Repeat the process, treating the root of that subtree as the new root node.

Inserting Into A B-Tree
1. Add the new key to the appropriate leaf node.
2. Overflow?
   • No: done.
   • Yes: split the node into two nodes on the same level, and promote the median key.

Loose Insertion (pg. 551 Main & Savitch, one of several ways)
MIN = 1, MAX = 2
Before: root [6, 17]; leaves [4], [12], [19, 22]
Insert 18
After: root [6 | 17]; leaves [4], [12], [18 | 19 | 22]  <- excess entry (problem child)

Fixing A Loose Insertion
MIN = 1, MAX = 2
• Split the problem child, and promote its middle key to the parent node. The parent still has an excess.
  root [6, 17, 19]; leaves [4], [12], [18], [22]
• Fix the excess by repeating the process. Split the node and promote its middle key to a new root node.
  root [17]; children [6], [19]; leaves [4], [12], [18], [22]

Pseudo Code For Loose Insert
1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.
2. If (we found the new entry at data[i])
   a) Return false with no further work (since the new entry is already in the tree).
   else if (the root has no children)
   b) Add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
   else
   c) Save the value from this recursive call: subset[i]->loose_insert(entry); Then check whether the root of subset[i] now has an excess entry; if so, then fix that problem. Return the saved value from the recursive call.

Insert In-Class Exercise
MIN = 1, MAX = 2
Insert 5, then insert 7.
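The loose-insert pseudocode above can be sketched in C++ roughly as follows. This is a minimal illustration, not the Main & Savitch implementation: it assumes C++14 or later, uses std::vector in place of the fixed-size partially filled arrays, stores int entries, and the names Node, make_node, fix_excess, and btree_insert are chosen here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 1;            // slide example: MIN = 1
constexpr std::size_t MAXIMUM = 2 * MINIMUM;  // rule R2: MAX is twice MIN

// Assumed node layout: vectors stand in for the partially filled arrays.
struct Node {
    std::vector<int> data;                      // sorted entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty in a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Hypothetical helper for building small trees by hand.
std::unique_ptr<Node> make_node(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// Split child i of parent (which holds MAXIMUM + 1 entries) into two nodes
// and promote its middle entry into parent.
void fix_excess(Node& parent, std::size_t i) {
    Node& child = *parent.subset[i];
    assert(child.data.size() == MAXIMUM + 1);
    auto right = std::make_unique<Node>();
    const int middle = child.data[MINIMUM];
    right->data.assign(child.data.begin() + MINIMUM + 1, child.data.end());
    child.data.resize(MINIMUM);
    if (!child.is_leaf()) {  // the upper half of the children moves as well
        for (std::size_t k = MINIMUM + 1; k < child.subset.size(); ++k)
            right->subset.push_back(std::move(child.subset[k]));
        child.subset.resize(MINIMUM + 1);
    }
    parent.data.insert(parent.data.begin() + i, middle);
    parent.subset.insert(parent.subset.begin() + i + 1, std::move(right));
}

// Steps 1 and 2 of the pseudocode; may leave n itself with one excess entry.
bool loose_insert(Node& n, int entry) {
    const auto it = std::lower_bound(n.data.begin(), n.data.end(), entry);
    const std::size_t i = static_cast<std::size_t>(it - n.data.begin());
    if (it != n.data.end() && *it == entry) return false;  // 2a: already present
    if (n.is_leaf()) {                                      // 2b: add to this node
        n.data.insert(it, entry);
        return true;
    }
    const bool added = loose_insert(*n.subset[i], entry);   // 2c: recurse
    if (n.subset[i]->data.size() > MAXIMUM) fix_excess(n, i);
    return added;
}

// Top-level insert: if the root itself ends up with an excess entry,
// place it under a new empty root and split it there (the tree grows upward).
bool btree_insert(std::unique_ptr<Node>& root, int entry) {
    if (!loose_insert(*root, entry)) return false;
    if (root->data.size() > MAXIMUM) {
        auto new_root = std::make_unique<Node>();
        new_root->subset.push_back(std::move(root));
        fix_excess(*new_root, 0);
        root = std::move(new_root);
    }
    return true;
}
```

Building the tree from the Loose Insertion slide (root [6, 17]; leaves [4], [12], [19, 22]) with make_node and calling btree_insert(root, 18) should reproduce the two splits shown on the Fixing A Loose Insertion slides, ending with root [17], children [6] and [19], and leaves [4], [12], [18], [22].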
Exercise starting tree: root [6, 17]; leaves [4], [12], [19, 22]

Deleting From A B-Tree

Deleting From A B-Tree Example #1
Min = 1, Max = 2
Before: root [6, 17]; leaves [4], [12], [19, 22]
Delete 17
After: root [6]; leaves [4], [12], [19, 22]
• Violates B-Tree Rule 4 (# subtrees = # keys + 1).

Solution To Example #1
Min = 1, Max = 2
root [6, 19]; leaves [4], [12], [22]

Deleting From A B-Tree Example #2
Min = 2, Max = 4
Before: root [6, 17]; leaves [2, 4], [10, 12], [19, 22]
Delete 22
After: root [6, 17]; leaves [2, 4], [10, 12], [19]
• Violates the B-Tree property that a node's # keys is not less than MIN.

Solution #1 For Example #2
Min = 2, Max = 4
Case 3 solution: combine subset[i] with subset[i-1] if excess entries in siblings are not available (pg. 561 Main & Savitch).
Before: root [6, 17]; leaves [2, 4], [10, 12], [19]
After: root [6]; leaves [2, 4], [10, 12, 17, 19]

Solution #2 To Fix A Shortage
Min = 2, Max = 4
Case 1: Transfer an extra entry from subset[i-1] to subset[i] (pg. 560 Main & Savitch).
Before: root [6, 17]; leaves [2, 4], [10, 12, 15], [19]
After: root [6, 15]; leaves [2, 4], [10, 12], [17, 19]

Solution #3 To Fix A Shortage
Case 2: Transfer an extra entry from subset[i+1] (pg. 561 Main & Savitch).
Before: root [6, 17]; leaves [2, 4], [10], [19, 21, 22]
After: root [6, 19]; leaves [2, 4], [10, 17], [21, 22]

Deleting From A B-Tree (Loose Erase)
1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.
2. Deal with one of the following four possibilities:
   a. Root has no children, and we did not find the target (i.e., nothing to do).
   b. Root has no children, and we found the target. Just remove the target.
   c. Root has children, and we did not find the target in the root. Make a recursive call to search subset[i].
   d.
      Root has children, and we found the target in the root. Remove the largest entry from subset[i] and copy it into data[i].
Elaboration of 2c and 2d on the following slides …

Delete From B-Tree: Elaborate 2c
• Target not found in the root node, but the target might be in subset[i].
• Make the recursive call subset[i]->loose_erase(target).
• This will remove the target from subset[i] if it is in subset[i].
• If so, then subset[i] might have fewer than MIN entries. If so, then it needs to be fixed by calling fix_shortage(i) on the parent of subset[i]. (Will discuss later.)

Delete From B-Tree: Elaborate 2d
• Target is found in the root node, but cannot simply be removed because there are children.
• Go to subset[i] and remove the largest item in the subset (in Main & Savitch: subset[i]->remove_biggest(data[i])).
• Create a copy of this largest item and place it in data[i] (which contains the target). In effect, this removes the target.
• However, removing the largest item can cause a shortage in subset[i]. If so, call fix_shortage(i) on the parent of subset[i]. (Will discuss NOW!!)

Fix Shortage
Case 1: If subset[i-1] has extra entries, then transfer an entry to subset[i] (pg. 560 Main & Savitch).
• Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data. Shift over as necessary and update data_count.
• Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
• If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i], and update child_count.
Before: root [6, 17]; leaves [2, 4], [10, 12, 15], [19]
After: root [6, 15]; leaves [2, 4], [10, 12], [17, 19]

Fix Shortage (cont.)
Case 2: If subset[i+1] has extra entries, then transfer an entry to subset[i] (pg. 561 Main & Savitch). Similar to Case 1.
Before: root [6, 17]; leaves [2, 4], [10], [19, 21, 22]
After: root [6, 19]; leaves [2, 4], [10, 17], [21, 22]

Fix Shortage (cont.)
Case 3: Combine subset[i] with subset[i-1] (pg. 561 Main & Savitch). Applies if subset[i-1] is present (i.e., i > 0) but has only the minimum number of items/keys (i.e., no excess keys/items).
• Transfer data[i-1] down to the end of subset[i-1]->data … (see a, pg. 562)
• Transfer all of the items and children from subset[i] to the end of subset[i-1] … (see b, pg. 562)
• Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left … (see c, pg. 562)
Before (22 deleted): root [6, 17]; leaves [2, 4], [10, 12], [19]
After: root [6]; leaves [2, 4], [10, 12, 17, 19]

In-Class Delete Example #2
Go through the Loose Erase section in Main & Savitch, pg. 558.
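As a companion to the loose-erase steps and the three fix-shortage cases above, here is a hedged C++ sketch of the whole deletion path. It is an illustration under assumptions, not the textbook code: it assumes C++14 or later, uses std::vector instead of fixed arrays, stores int entries, and the names Node, make_node, and btree_erase are hypothetical. When subset[i] has no left sibling (i == 0) and the right sibling has nothing to spare, the sketch merges with the right sibling, a situation the slides do not cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // slide examples: Min = 2, Max = 4

// Assumed node layout: vectors stand in for the partially filled arrays.
struct Node {
    std::vector<int> data;                      // sorted entries
    std::vector<std::unique_ptr<Node>> subset;  // children; empty in a leaf
    bool is_leaf() const { return subset.empty(); }
};

// Hypothetical helper for building small trees by hand.
std::unique_ptr<Node> make_node(std::vector<int> keys) {
    auto n = std::make_unique<Node>();
    n->data = std::move(keys);
    return n;
}

// Repair child i of parent after it dropped below MINIMUM entries.
void fix_shortage(Node& parent, std::size_t i) {
    assert(i < parent.subset.size());
    Node& child = *parent.subset[i];
    if (i > 0 && parent.subset[i - 1]->data.size() > MINIMUM) {
        // Case 1: rotate an entry through the parent from the left sibling.
        Node& left = *parent.subset[i - 1];
        child.data.insert(child.data.begin(), parent.data[i - 1]);
        parent.data[i - 1] = left.data.back();
        left.data.pop_back();
        if (!left.is_leaf()) {
            child.subset.insert(child.subset.begin(), std::move(left.subset.back()));
            left.subset.pop_back();
        }
    } else if (i + 1 < parent.subset.size() &&
               parent.subset[i + 1]->data.size() > MINIMUM) {
        // Case 2: rotate an entry through the parent from the right sibling.
        Node& right = *parent.subset[i + 1];
        child.data.push_back(parent.data[i]);
        parent.data[i] = right.data.front();
        right.data.erase(right.data.begin());
        if (!right.is_leaf()) {
            child.subset.push_back(std::move(right.subset.front()));
            right.subset.erase(right.subset.begin());
        }
    } else {
        // Case 3: merge with the left sibling (or the right one when i == 0).
        const std::size_t l = (i > 0) ? i - 1 : i;
        Node& left = *parent.subset[l];
        Node& right = *parent.subset[l + 1];
        left.data.push_back(parent.data[l]);
        left.data.insert(left.data.end(), right.data.begin(), right.data.end());
        for (auto& c : right.subset) left.subset.push_back(std::move(c));
        parent.data.erase(parent.data.begin() + l);
        parent.subset.erase(parent.subset.begin() + l + 1);
    }
}

// Remove and return the largest entry in the subtree rooted at n.
int remove_biggest(Node& n) {
    if (n.is_leaf()) {
        const int biggest = n.data.back();
        n.data.pop_back();
        return biggest;
    }
    const std::size_t last = n.subset.size() - 1;
    const int biggest = remove_biggest(*n.subset[last]);
    if (n.subset[last]->data.size() < MINIMUM) fix_shortage(n, last);
    return biggest;
}

// Steps 1 and 2 of the loose-erase slide; may leave n itself short.
bool loose_erase(Node& n, int target) {
    const auto it = std::lower_bound(n.data.begin(), n.data.end(), target);
    const std::size_t i = static_cast<std::size_t>(it - n.data.begin());
    const bool found = it != n.data.end() && *it == target;
    if (n.is_leaf()) {
        if (found) n.data.erase(it);  // 2b; otherwise 2a, nothing to do
        return found;
    }
    bool removed = true;
    if (found) n.data[i] = remove_biggest(*n.subset[i]);  // 2d
    else removed = loose_erase(*n.subset[i], target);     // 2c
    if (n.subset[i]->data.size() < MINIMUM) fix_shortage(n, i);
    return removed;
}

// Top-level erase: shrink the tree when the root is left with no entries.
bool btree_erase(std::unique_ptr<Node>& root, int target) {
    if (!loose_erase(*root, target)) return false;
    if (root->data.empty() && !root->is_leaf()) {
        std::unique_ptr<Node> only_child = std::move(root->subset[0]);
        root = std::move(only_child);
    }
    return true;
}
```

With the Example #2 tree (root [6, 17]; leaves [2, 4], [10, 12], [19, 22]), btree_erase(root, 22) should take the Case 3 branch and leave root [6] with leaves [2, 4] and [10, 12, 17, 19]. Giving the middle leaf a spare entry, or the right leaf a spare entry before deleting from the middle one, exercises Case 1 and Case 2 in the same way as the slides.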