B-trees handout

Notes on the B-Tree data structure - D. Goforth, May 2007 (revised from 2005)
The algorithms in these notes are based on the design described in Main, sec.10.2
Three core operations:
(i) search
    public boolean search (Comparable target)
    uses:
        private int firstGE (Comparable target)
(ii) add
    public void add (Comparable target)
    uses:
        private void looseAdd (Comparable target)
        private void fixExcess (int index)
(iii) remove
    public boolean remove (Comparable target)
    uses:
        private boolean looseRemove (Comparable target)
        private Comparable removeBiggest ()
        private void fixShortage (int index)
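The node object:
IntBalancedSet{
    Comparable[] data;
    int dataCount;
    IntBalancedSet[] subset;
    int childCount;
}
The algorithms also use two class constants, MINIMUM and MAXIMUM, with MAXIMUM = 2*MINIMUM.
Every node except the root holds at least MINIMUM elements; no node holds more than MAXIMUM
(except briefly, in the middle of an add). A non-leaf node has childCount = dataCount + 1.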
ALGORITHM search (target) // in Main, search is contains()
1. BASE CASE
   Search for target in data array in this node
   i ← firstGE(target)
   IF (i < dataCount AND data[i] == target) return true // i may equal dataCount
   IF NOT found and this is a leaf node return false
2. RECURSIVE CASE
   Return subset[i].search(target)
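The search steps above can be sketched in Java as follows. This is only an illustration under simplifying assumptions: the class name SearchSketch and its constructor are made up for the example, elements are int rather than Comparable, and firstGE is a linear scan standing in for the binary search.

```java
// Illustrative sketch only: SearchSketch and its constructor are hypothetical
// names, and int elements replace the handout's Comparable elements.
class SearchSketch {
    int[] data;
    int dataCount;
    SearchSketch[] subset;
    int childCount;

    // Builds a node directly from its elements and (possibly zero) children.
    SearchSketch(int[] elements, SearchSketch... children) {
        data = elements;
        dataCount = elements.length;
        subset = children;
        childCount = children.length;
    }

    // Least i with data[i] >= target, or dataCount if there is none.
    // (Linear scan for brevity; the handout calls for a binary search.)
    int firstGE(int target) {
        int i = 0;
        while (i < dataCount && data[i] < target) {
            i++;
        }
        return i;
    }

    boolean search(int target) {
        int i = firstGE(target);
        // Base case: found in this node (guard against i == dataCount).
        if (i < dataCount && data[i] == target) {
            return true;
        }
        // Base case: not found and nowhere left to look.
        if (childCount == 0) {
            return false;
        }
        // Recursive case: target can only be in the subtree at index i.
        return subset[i].search(target);
    }
}
```

Note that the same index i serves both purposes: it is where the target would sit in data, and it is the subtree to descend into when the target is absent.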
___________________________________________
ALGORITHM firstGE (target)
1. Search for target in data array in range 0 to dataCount - 1 (binary search)
2. IF found at data[i]
   Return i
3. IF NOT found
   Return least i such that data[i] > target*
* special case: if target > all values in data, return dataCount
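As a hedged illustration of firstGE, here is one way the binary search could be written in Java. The static signature over a plain int array is an assumption made to keep the sketch self-contained; in the handout's design firstGE is a private instance method working on the node's own data array.

```java
// Illustrative sketch only: a static helper over an int array, not the
// instance method on Comparable elements that the handout describes.
class FirstGESketch {
    // Returns the least i in [0, dataCount) with data[i] >= target,
    // or dataCount when target is greater than every stored element.
    static int firstGE(int[] data, int dataCount, int target) {
        int lo = 0;
        int hi = dataCount;               // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (data[mid] < target) {
                lo = mid + 1;             // data[0..mid] are all too small
            } else {
                hi = mid;                 // data[mid] is a candidate answer
            }
        }
        return lo;
    }
}
```

Steps 2 and 3 of the algorithm collapse into one loop here: whether or not the target is present, the loop ends at the first position whose element is not smaller than the target.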
ALGORITHM add (element) // detailed version later
1. looseAdd(element)
2. IF (dataCount>MAXIMUM) // root node has too many items in data
   fix the root node // this uses fixExcess() as part of solution
___________________________________________
ALGORITHM looseAdd (element)
1. BASE CASE
   Search for element in data array in this node
   i ← firstGE(element)
   IF found return // element already in the set
   IF NOT found and this is a leaf node
      put element in data array in sorted order (insertData(i,element) )
      return
2. RECURSIVE CASE
   subset[i].looseAdd(element)
3. IF subset[i].dataCount>MAXIMUM // subset[i].data has too many items
      fixExcess(i)
4. Return
___________________________________________
ALGORITHM fixExcess (i)
1. midElement ← subset[i].data[MINIMUM] // middle element of array
2. newSubset ← new IntBalancedSet
   // put 'right half' of subset[i] in newSubset
   newSubset.dataCount ← MINIMUM
   newSubset.data ← subset[i].data[MINIMUM+1,…,MAXIMUM]
   subset[i].dataCount ← MINIMUM (make refs in data null for moved references)
   IF subset[i] is not a leaf node // move subtrees if any
      newSubset.childCount ← MINIMUM+1
      newSubset.subset ← subset[i].subset[MINIMUM+1,…,MAXIMUM+1]
      subset[i].childCount ← MINIMUM+1 (make refs in subset null for moved references)
3. insertData(i,midElement)
   // put the former middle element of the subset[i] into the data array
4. insertSubset(i+1,newSubset)
   // put the former 'right half' of subset[i] into the subset array next to it
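The index arithmetic in fixExcess can be checked with a small Java sketch that works on a bare array instead of a node. The class and method names are made up for this example, and MINIMUM = 2 is just an assumed value; the only fact taken from the design is MAXIMUM = 2*MINIMUM, so an overfull data array holds MAXIMUM+1 elements.

```java
import java.util.Arrays;

// Illustrative sketch only: SplitSketch and its methods are hypothetical
// names showing which indices go where when an overfull node is split.
class SplitSketch {
    static final int MINIMUM = 2;            // assumed value for the example
    static final int MAXIMUM = 2 * MINIMUM;

    // Elements that stay behind in subset[i]: indices 0 .. MINIMUM-1.
    static int[] leftHalf(int[] overfull) {
        return Arrays.copyOfRange(overfull, 0, MINIMUM);
    }

    // Element that moves up into the parent's data array: index MINIMUM.
    static int middle(int[] overfull) {
        return overfull[MINIMUM];
    }

    // Elements that move to newSubset: indices MINIMUM+1 .. MAXIMUM.
    static int[] rightHalf(int[] overfull) {
        return Arrays.copyOfRange(overfull, MINIMUM + 1, MAXIMUM + 1);
    }
}
```

With MINIMUM = 2, an overfull array {1, 2, 3, 4, 5} splits into {1, 2}, the middle element 3, and {4, 5}, so both halves end up with exactly MINIMUM elements.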
ALGORITHM add (element) (revised to show detail of step 2)
1. looseAdd(element)
2. IF (dataCount>MAXIMUM) // root node has too many items in data
   2.1 newCopy ← new IntBalancedSet
   2.2 // copy all root node contents to new node
       newCopy.data ← data
       newCopy.dataCount ← dataCount
       newCopy.subset ← subset
       newCopy.childCount ← childCount
   2.3 // empty the root and make it parent of newCopy
       // (root and newCopy must not share arrays: either copy the arrays in 2.2
       // or give the root new, empty arrays here)
       dataCount ← 0
       childCount ← 1
       subset[0] ← newCopy
   2.4 // fix the problem of too many items in subset[0], i.e. newCopy
       fixExcess(0)
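Putting add, looseAdd and fixExcess together, a minimal Java sketch might look like the following. It is an illustration, not the implementation from Main: the class name BTreeAddSketch and the inOrder helper are invented for the example, elements are int rather than Comparable, MINIMUM = 1 is an assumed value, and firstGE uses a linear scan. The arrays have one spare slot each so that a node can hold its temporary excess.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical class name, int elements,
// MINIMUM chosen arbitrarily for the example.
class BTreeAddSketch {
    static final int MINIMUM = 1;
    static final int MAXIMUM = 2 * MINIMUM;

    int[] data = new int[MAXIMUM + 1];                  // spare slot for excess
    int dataCount = 0;
    BTreeAddSketch[] subset = new BTreeAddSketch[MAXIMUM + 2];
    int childCount = 0;

    public void add(int element) {
        looseAdd(element);
        if (dataCount > MAXIMUM) {                      // root has too many items
            BTreeAddSketch newCopy = new BTreeAddSketch();
            newCopy.data = data;                        // newCopy takes over the old arrays
            newCopy.dataCount = dataCount;
            newCopy.subset = subset;
            newCopy.childCount = childCount;
            data = new int[MAXIMUM + 1];                // root gets fresh, empty arrays
            subset = new BTreeAddSketch[MAXIMUM + 2];
            dataCount = 0;
            childCount = 1;
            subset[0] = newCopy;
            fixExcess(0);
        }
    }

    private int firstGE(int target) {                   // linear scan for brevity
        int i = 0;
        while (i < dataCount && data[i] < target) i++;
        return i;
    }

    private void looseAdd(int element) {
        int i = firstGE(element);
        if (i < dataCount && data[i] == element) return; // already in the set
        if (childCount == 0) {                           // leaf: insert in sorted order
            insertData(i, element);
            return;
        }
        subset[i].looseAdd(element);
        if (subset[i].dataCount > MAXIMUM) fixExcess(i);
    }

    private void fixExcess(int i) {
        BTreeAddSketch child = subset[i];
        int midElement = child.data[MINIMUM];
        BTreeAddSketch newSubset = new BTreeAddSketch();
        // Move the right half of the child's data into newSubset.
        System.arraycopy(child.data, MINIMUM + 1, newSubset.data, 0, MINIMUM);
        newSubset.dataCount = MINIMUM;
        child.dataCount = MINIMUM;
        if (child.childCount > 0) {                      // move subtrees if any
            System.arraycopy(child.subset, MINIMUM + 1, newSubset.subset, 0, MINIMUM + 1);
            Arrays.fill(child.subset, MINIMUM + 1, MAXIMUM + 2, null);
            newSubset.childCount = MINIMUM + 1;
            child.childCount = MINIMUM + 1;
        }
        insertData(i, midElement);
        insertSubset(i + 1, newSubset);
    }

    private void insertData(int i, int element) {
        System.arraycopy(data, i, data, i + 1, dataCount - i);
        data[i] = element;
        dataCount++;
    }

    private void insertSubset(int i, BTreeAddSketch node) {
        System.arraycopy(subset, i, subset, i + 1, childCount - i);
        subset[i] = node;
        childCount++;
    }

    // Helper for inspection: appends all elements in sorted order.
    void inOrder(StringBuilder out) {
        for (int i = 0; i < dataCount; i++) {
            if (childCount > 0) subset[i].inOrder(out);
            out.append(data[i]).append(' ');
        }
        if (childCount > 0) subset[dataCount].inOrder(out);
    }
}
```

With MINIMUM = 1, adding 1 through 7 in order grows the tree twice at the root and leaves a root holding the single element 4 with two children. The root object itself never changes identity, which is why the excess is pushed down into newCopy rather than creating a new root above it.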
ALGORITHM remove (target)
1. looseRemove(target)
2. IF (dataCount==0 && childCount!=0) // root node has no items but has a child
   eliminate a node (reducing height of tree; be sure to keep root reference)
___________________________________________
ALGORITHM looseRemove (target)
1. BASE CASE
   i ← firstGE(target)
   IF this is a leaf node
   1.a IF target NOT found
          return FALSE
   1.b ELSE
          // target found at i
          deleteData(i)
          return TRUE
2. RECURSIVE CASE (not a leaf node)
   2.c IF target NOT found
          foundTarget ← subset[i].looseRemove(target)
          IF foundTarget AND subset[i].dataCount<MINIMUM
             // needs fixing - too few items in data array
             fixShortage(i)
          return foundTarget
   2.d ELSE
          // target found at i in this node
          data[i] ← subset[i].removeBiggest() // grab element to replace target
          IF subset[i].dataCount < MINIMUM // subset just became too small
             fixShortage(i)
          return TRUE
___________________________________________
ALGORITHM removeBiggest ()
1. IF (childCount==0) // remove last element if no children
      answer ← deleteData(dataCount-1)
2. ELSE
      // if subtrees, get biggest element from rightmost subtree
      answer ← subset[childCount-1].removeBiggest();
      IF (subset[childCount-1].dataCount<MINIMUM)
         fixShortage(childCount-1)
3. return answer
ALGORITHM fixShortage (i)
// dataCount and childCount without a prefix refer to this (parent) node;
// the last child has index childCount-1, which equals dataCount
1. // grab data element from left sibling node
   IF (i!=0 AND subset[i-1].dataCount>MINIMUM)
      subset[i].insertData(0,data[i-1])
      data[i-1] ← subset[i-1].deleteData(subset[i-1].dataCount-1)
      IF ( subset[i] not a leaf )
         subset[i].insertSubset(0,subset[i-1].removeSubset(subset[i-1].childCount-1))
2. // grab data element from right sibling node
   ELSE IF (i!=childCount-1 AND subset[i+1].dataCount>MINIMUM)
      subset[i].insertData(subset[i].dataCount,data[i])
      data[i] ← subset[i+1].deleteData(0)
      IF ( subset[i] not a leaf )
         subset[i].insertSubset(subset[i].childCount,subset[i+1].removeSubset(0))
3. // combine subset with left sibling since they're both 'small'
   ELSE IF (i!=0 AND subset[i-1].dataCount==MINIMUM)
      subset[i-1].insertData(subset[i-1].dataCount,deleteData(i-1))
      copy data array and subset array from subset[i] to end of subset[i-1]
      deleteSubset(i)
4. // combine subset with right sibling
   ELSE IF (i!=childCount-1 AND subset[i+1].dataCount==MINIMUM)
      subset[i].insertData(subset[i].dataCount,deleteData(i))
      copy data array and subset array from subset[i+1] to end of subset[i]
      deleteSubset(i+1)
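The four cases of fixShortage can be sketched compactly in Java if lists replace the fixed-size arrays. This is a simplified illustration under stated assumptions: ShortageSketch, node, and the borrow/merge helper names are invented for the example, elements are Integer, MINIMUM = 1 is an assumed value, and java.util.ArrayList stands in for the data and subset arrays (so list sizes play the role of dataCount and childCount).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, lists instead of arrays.
class ShortageSketch {
    static final int MINIMUM = 1;                        // assumed value
    final List<Integer> data = new ArrayList<>();
    final List<ShortageSketch> subset = new ArrayList<>();

    // Builds a node from its elements and (possibly zero) children.
    static ShortageSketch node(int[] elements, ShortageSketch... children) {
        ShortageSketch n = new ShortageSketch();
        for (int e : elements) n.data.add(e);
        for (ShortageSketch c : children) n.subset.add(c);
        return n;
    }

    void fixShortage(int i) {
        if (i > 0 && subset.get(i - 1).data.size() > MINIMUM) {
            borrowFromLeft(i);                           // case 1
        } else if (i < data.size() && subset.get(i + 1).data.size() > MINIMUM) {
            borrowFromRight(i);                          // case 2
        } else if (i > 0) {
            mergeWithLeft(i);                            // case 3
        } else {
            mergeWithLeft(i + 1);                        // case 4: fold right sibling into subset[i]
        }
    }

    // Rotate: parent's data[i-1] moves down, left sibling's biggest moves up.
    private void borrowFromLeft(int i) {
        ShortageSketch left = subset.get(i - 1);
        ShortageSketch small = subset.get(i);
        small.data.add(0, data.get(i - 1));
        data.set(i - 1, left.data.remove(left.data.size() - 1));
        if (!left.subset.isEmpty()) {                    // move one subtree along too
            small.subset.add(0, left.subset.remove(left.subset.size() - 1));
        }
    }

    // Rotate: parent's data[i] moves down, right sibling's smallest moves up.
    private void borrowFromRight(int i) {
        ShortageSketch small = subset.get(i);
        ShortageSketch right = subset.get(i + 1);
        small.data.add(data.get(i));
        data.set(i, right.data.remove(0));
        if (!right.subset.isEmpty()) {
            small.subset.add(right.subset.remove(0));
        }
    }

    // Merge subset[i] into subset[i-1], pulling the separator down from the parent.
    private void mergeWithLeft(int i) {
        ShortageSketch left = subset.get(i - 1);
        ShortageSketch gone = subset.remove(i);
        left.data.add(data.remove(i - 1));
        left.data.addAll(gone.data);
        left.subset.addAll(gone.subset);
    }
}
```

In this sketch case 4 reuses the merge helper, since combining subset[i] with its right sibling is the same operation as combining subset[i+1] with its left sibling. A merge removes one element from the parent, so the parent itself may fall short afterwards; that is why looseRemove and removeBiggest re-check dataCount after each recursive call.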