Notes on the B-Tree data structure - D. Goforth, May 2007 (revised from 2005)

The algorithms in these notes are based on the design described in Main, sec. 10.2.

Three core operations:

(i) search
    public boolean search (Comparable target)
    uses:
        private int firstGE (Comparable target)

(ii) add
    public void add (Comparable element)
    uses:
        private void looseAdd (Comparable element)
        private void fixExcess (int index)

(iii) remove
    public boolean remove (Comparable target)
    uses:
        private boolean looseRemove (Comparable target)
        private Comparable removeBiggest ()
        private void fixShortage (int index)

The node object:

    IntBalancedSet {
        Comparable[] data;          // elements of this node in sorted order, room for MAXIMUM + 1
        int dataCount;              // number of elements currently in data
        IntBalancedSet[] subset;    // child nodes, room for MAXIMUM + 2
        int childCount;             // number of children (0 for a leaf)
    }

MINIMUM and MAXIMUM are the minimum and maximum number of elements allowed in a node other than the root, with MAXIMUM = 2 * MINIMUM. The arrays have one spare slot so that looseAdd can leave a node one element over the limit until fixExcess repairs it. A leaf has childCount = 0; any other node has childCount = dataCount + 1. In the algorithms below, "target found" means i < dataCount and data[i].compareTo(target) == 0, where i = firstGE(target).

ALGORITHM search (target)      // in Main, search is contains()
1. BASE CASE
   Search for target in data array in this node
   i ← firstGE(target)
   IF (i < dataCount AND data[i].compareTo(target) == 0)
       return true
   IF NOT found and this is a leaf node
       return false
2. RECURSIVE CASE
   return subset[i].search(target)
___________________________________________

ALGORITHM firstGE (target)
1. Search for target in data array in range 0 to dataCount - 1 (binary search)
2. IF found at data[i]
       return i
3. IF NOT found
       return least i such that data[i] > target*

   * special case: if target > all values in data, return dataCount

ALGORITHM add (element)      // detailed version later
1. looseAdd(element)
2. IF (dataCount > MAXIMUM)      // root node has too many items in data
       fix the root node      // this uses fixExcess() as part of the solution
___________________________________________

ALGORITHM looseAdd (element)
1. BASE CASE
   Search for element in data array in this node
   i ← firstGE(element)
   IF found
       return      // element already in the set
   IF NOT found and this is a leaf node
       put element in data array in sorted order: insertData(i, element)
       return
2. RECURSIVE CASE
   subset[i].looseAdd(element)
3. IF subset[i].dataCount > MAXIMUM      // subset[i].data has too many items
       fixExcess(i)
4. return
___________________________________________

ALGORITHM fixExcess (i)
// precondition: subset[i] holds MAXIMUM + 1 elements (and MAXIMUM + 2 children if it is not a leaf)
1. midElement ← subset[i].data[MINIMUM]      // middle element of the array
2. newSubset ← new IntBalancedSet      // put 'right half' of subset[i] in newSubset
   newSubset.dataCount ← MINIMUM
   newSubset.data ← subset[i].data[MINIMUM+1, …, MAXIMUM]
   subset[i].dataCount ← MINIMUM
   (set the vacated references subset[i].data[MINIMUM, …, MAXIMUM] to null)
   IF subset[i] is not a leaf node      // move subtrees if any
       newSubset.childCount ← MINIMUM+1
       newSubset.subset ← subset[i].subset[MINIMUM+1, …, MAXIMUM+1]
       subset[i].childCount ← MINIMUM+1
       (set the moved references in subset[i].subset to null)
3. insertData(i, midElement)      // put the former middle element of subset[i] into this node's data array
4. insertSubset(i+1, newSubset)      // put the former 'right half' of subset[i] into the subset array next to it

ALGORITHM add (element)      (revised to show detail of step 2)
1. looseAdd(element)
2. IF (dataCount > MAXIMUM)      // root node has too many items in data
   2.1 newCopy ← new IntBalancedSet
   2.2 // move all root node contents to the new node
       newCopy.data ← data
       newCopy.dataCount ← dataCount
       newCopy.subset ← subset
       newCopy.childCount ← childCount
   2.3 // empty the root and make it the parent of newCopy
       data ← new array; subset ← new array      // the root must not share its arrays with newCopy
       dataCount ← 0
       childCount ← 1
       subset[0] ← newCopy
   2.4 // fix the problem of too many items in subset[0], i.e. newCopy
       fixExcess(0)

ALGORITHM remove (target)
1. answer ← looseRemove(target)
2. IF (dataCount == 0 AND childCount != 0)      // root node has no items but has a child
       eliminate a node, reducing the height of the tree: copy data, dataCount,
       subset and childCount from subset[0] into the root, so that the root
       reference itself stays the same
3. return answer
___________________________________________

ALGORITHM looseRemove (target)
1. BASE CASE
   i ← firstGE(target)
   IF this is a leaf node
   1.a IF target NOT found
           return FALSE
   1.b ELSE      // target found at i
           deleteData(i)
           return TRUE
2. RECURSIVE CASE (not a leaf node)
   2.c IF target NOT found
           foundTarget ← subset[i].looseRemove(target)
           IF foundTarget AND subset[i].dataCount < MINIMUM      // needs fixing: too few items in data array
               fixShortage(i)
           return foundTarget
   2.d ELSE      // target found at i in this node
           data[i] ← subset[i].removeBiggest()      // replace target with the largest element of its left subtree
           IF subset[i].dataCount < MINIMUM      // subset just became too small
               fixShortage(i)
           return TRUE
___________________________________________

ALGORITHM removeBiggest ()
1. IF (childCount == 0)      // leaf: remove and return the last element
       answer ← deleteData(dataCount-1)
2. ELSE      // if subtrees, get the biggest element from the rightmost subtree
       answer ← subset[childCount-1].removeBiggest()
       IF (subset[childCount-1].dataCount < MINIMUM)
           fixShortage(childCount-1)
3. return answer

ALGORITHM fixShortage (i)
// precondition: subset[i] holds MINIMUM - 1 elements
// a left sibling exists when i != 0; a right sibling exists when i < dataCount (i.e. i < childCount - 1)
1. // borrow a data element from the left sibling node
   IF (i != 0 AND subset[i-1].dataCount > MINIMUM)
       subset[i].insertData(0, data[i-1])
       data[i-1] ← subset[i-1].deleteData(subset[i-1].dataCount-1)
       IF (subset[i] is not a leaf)
           subset[i].insertSubset(0, subset[i-1].removeSubset(subset[i-1].childCount-1))
2. // borrow a data element from the right sibling node
   ELSE IF (i < dataCount AND subset[i+1].dataCount > MINIMUM)
       subset[i].insertData(subset[i].dataCount, data[i])
       data[i] ← subset[i+1].deleteData(0)
       IF (subset[i] is not a leaf)
           subset[i].insertSubset(subset[i].childCount, subset[i+1].removeSubset(0))
3. // combine subset[i] with its left sibling, since both are 'small'
   ELSE IF (i != 0 AND subset[i-1].dataCount == MINIMUM)
       subset[i-1].insertData(subset[i-1].dataCount, deleteData(i-1))
       copy data array and subset array from subset[i] to the end of subset[i-1]
       deleteSubset(i)
4. // combine subset[i] with its right sibling
   ELSE IF (i < dataCount AND subset[i+1].dataCount == MINIMUM)
       subset[i].insertData(subset[i].dataCount, deleteData(i))
       copy data array and subset array from subset[i+1] to the end of subset[i]
       deleteSubset(i+1)
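A minimal Java sketch of firstGE, assuming, as in the node object, that the elements occupy positions 0 to dataCount - 1 of a sorted array. The wrapper class FirstGE and the static generic signature are illustrative choices, not part of the notes; inside IntBalancedSet the method would be an instance method reading data and dataCount directly.

```java
// Hypothetical helper class: firstGE over the first dataCount entries of a sorted array.
class FirstGE {
    // Returns the index of target if present; otherwise the least i with data[i] > target;
    // returns dataCount when target is greater than every stored value.
    static <E extends Comparable<? super E>> int firstGE(E[] data, int dataCount, E target) {
        int lo = 0, hi = dataCount;                              // search range 0 .. dataCount - 1
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (data[mid].compareTo(target) < 0) lo = mid + 1;   // answer lies right of mid
            else hi = mid;                                       // data[mid] >= target: mid is a candidate
        }
        return lo;
    }
}
```

Unlike java.util.Arrays.binarySearch, which returns -(insertion point) - 1 for a missing key, this version returns the insertion point itself, which is exactly the child index that search and looseAdd descend into.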
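The sketch below puts search, add and remove together in one runnable Java class so the interaction of the helper algorithms can be tested. It is not Main's implementation: the class name BTreeSet, the use of ArrayList in place of the fixed-size data and subset arrays (so dataCount and childCount become the list sizes, and insertData/deleteData become List.add/List.remove), and the choice MINIMUM = 2 are all assumptions made to keep it short. Method names and the four cases of fixShortage follow the notes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the B-tree set in these notes. Assumptions: ArrayLists replace the fixed-size
// arrays (dataCount = data.size(), childCount = subset.size()); MINIMUM = 2 is arbitrary.
class BTreeSet<E extends Comparable<E>> {
    static final int MINIMUM = 2, MAXIMUM = 2 * MINIMUM;
    private final List<E> data = new ArrayList<>();
    private final List<BTreeSet<E>> subset = new ArrayList<>();
    private boolean isLeaf() { return subset.isEmpty(); }
    private boolean foundAt(int i, E t) { return i < data.size() && data.get(i).compareTo(t) == 0; }
    private int firstGE(E target) {  // binary search: least i with data[i] >= target, or data.size()
        int r = Collections.binarySearch(data, target);
        return r >= 0 ? r : -r - 1;
    }
    public boolean search(E target) {
        int i = firstGE(target);
        return foundAt(i, target) || (!isLeaf() && subset.get(i).search(target));
    }
    public void add(E element) {
        looseAdd(element);
        if (data.size() <= MAXIMUM) return;
        BTreeSet<E> newCopy = new BTreeSet<>();  // root overflow: move contents into a new child
        newCopy.data.addAll(data);
        newCopy.subset.addAll(subset);
        data.clear();                            // the root keeps its own (now empty) lists,
        subset.clear();                          // so it shares no storage with newCopy
        subset.add(newCopy);
        fixExcess(0);
    }
    private void looseAdd(E element) {
        int i = firstGE(element);
        if (foundAt(i, element)) return;         // already in the set
        if (isLeaf()) { data.add(i, element); return; }
        subset.get(i).looseAdd(element);
        if (subset.get(i).data.size() > MAXIMUM) fixExcess(i);
    }
    private void fixExcess(int i) {              // subset[i] holds MAXIMUM + 1 elements
        BTreeSet<E> child = subset.get(i), right = new BTreeSet<>();
        E midElement = child.data.get(MINIMUM);
        right.data.addAll(child.data.subList(MINIMUM + 1, MAXIMUM + 1));
        child.data.subList(MINIMUM, MAXIMUM + 1).clear();
        if (!child.isLeaf()) {
            right.subset.addAll(child.subset.subList(MINIMUM + 1, MAXIMUM + 2));
            child.subset.subList(MINIMUM + 1, MAXIMUM + 2).clear();
        }
        data.add(i, midElement);
        subset.add(i + 1, right);
    }
    public boolean remove(E target) {
        boolean answer = looseRemove(target);
        if (data.isEmpty() && subset.size() == 1) {  // empty root with one child: shrink height
            BTreeSet<E> only = subset.remove(0);
            data.addAll(only.data);
            subset.addAll(only.subset);
        }
        return answer;
    }
    private boolean looseRemove(E target) {
        int i = firstGE(target);
        boolean found = foundAt(i, target);
        if (isLeaf()) { if (found) data.remove(i); return found; }
        if (found) data.set(i, subset.get(i).removeBiggest());  // replace target by its predecessor
        else if (!subset.get(i).looseRemove(target)) return false;
        if (subset.get(i).data.size() < MINIMUM) fixShortage(i);
        return true;
    }
    private E removeBiggest() {
        if (isLeaf()) return data.remove(data.size() - 1);
        int last = subset.size() - 1;
        E answer = subset.get(last).removeBiggest();
        if (subset.get(last).data.size() < MINIMUM) fixShortage(last);
        return answer;
    }
    private void fixShortage(int i) {            // subset[i] holds MINIMUM - 1 elements
        BTreeSet<E> child = subset.get(i);
        if (i > 0 && subset.get(i - 1).data.size() > MINIMUM) {                   // 1. borrow from left
            BTreeSet<E> left = subset.get(i - 1);   // set() returns the old separator, which moves down
            child.data.add(0, data.set(i - 1, left.data.remove(left.data.size() - 1)));
            if (!child.isLeaf()) child.subset.add(0, left.subset.remove(left.subset.size() - 1));
        } else if (i < data.size() && subset.get(i + 1).data.size() > MINIMUM) {  // 2. borrow from right
            BTreeSet<E> right = subset.get(i + 1);
            child.data.add(data.set(i, right.data.remove(0)));
            if (!child.isLeaf()) child.subset.add(right.subset.remove(0));
        } else if (i > 0) {                                                       // 3. merge into left
            BTreeSet<E> left = subset.get(i - 1);
            left.data.add(data.remove(i - 1));
            left.data.addAll(child.data);
            left.subset.addAll(child.subset);
            subset.remove(i);
        } else {                                                                  // 4. absorb right
            BTreeSet<E> right = subset.remove(i + 1);
            child.data.add(data.remove(i));
            child.data.addAll(right.data);
            child.subset.addAll(right.subset);
        }
    }
}
```

For example, adding the integers 0 to 100 in a scrambled order and then removing the even ones should leave search(k) true exactly for the odd k. Using lists also makes step 2.3 of the revised add visible: the root clears its own lists after copying them into newCopy, so the two nodes never share storage.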