B-Trees

CMPE126 Data Structures
B-Trees
Spring 2006
Copyright (c) All rights reserved Leonard Wesley
Why B-Trees?
• Trees studied so far are for storing data in memory.
• B-Trees are better suited for storing data in memory AND on secondary storage.
• They are better suited for balancing data than some other tree ADTs.
• They can store multiple keys with the same value, unlike some other trees, such as AVL trees.
The Problem With Unbalanced Trees
[Figure: a degenerate binary tree in which the keys 1, 2, 3, 4, 5 form a single chain.]
The levels are sparsely filled, resulting in deep paths. This defeats the purpose of binary trees.
Possible Solutions To Unbalanced Trees
• Periodically balance the tree.
• Don’t let a tree get too unbalanced when inserting or deleting.
• AVL Trees: Sometimes called HB[1] trees. Invented by Adel’son-Vel’skii and Landis ~early 1960s. (An in-memory solution … not ideally suited to secondary storage.)
• B-Trees: Proposed by R. Bayer & E.M. Creight (see pg. 542 Main & Savitch for ref.)
What Is A B-Tree?
• It is a type of “multiway” tree.
• It is NOT a binary search tree, nor is it a binary tree.
• It provides a fast way to index into a multilevel set of nodes.
• Each node in the B-Tree contains a sorted array of key values.
Motivation For Multiway Tree
• Secondary storage (e.g., disks) is typically divided into equal-sized blocks (e.g., 512, 1024, …, 4096, … bytes).
• The basic I/O operation reads and writes whole blocks, rather than single bytes, between secondary storage and memory.
• The goal is to devise a multiway search tree that minimizes the number of disk accesses by making full use of each block read.
• Each access to secondary storage takes roughly as long as executing 250K instructions, depending on the speed of the CPU.
ISAM
• ISAM = Indexed Sequential Access Method.
ISAM: The Idea
[Figure: a disk platter divided into tracks; each track is divided into blocks of 512, 1024, … bytes.]
ISAM: Index & Keys
[Figure: a block on a track, holding data together with its block key and block number.]
• Either all data in the block have keys ≤ the block key, or all have keys ≥ the block key. Pick one inequality and stick with it.
ISAM: Block Index
This index could be stored in memory.
Block Index
Block #   Key
0         G
1         K
2         N
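A minimal C++ sketch of a lookup in an in-memory block index like the one above. It assumes the "≤ block key" convention (every key in a block is ≤ that block's key, so the block key is the largest key the block may hold) and single-character keys; `IndexEntry` and `find_block` are hypothetical names, not part of any ISAM product. A binary search over the sorted index identifies the one block that must be read from disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One row of the in-memory block index: a block key and its block number.
// Assumed convention: every key stored in a block is <= that block's key.
struct IndexEntry {
    char key;
    std::size_t block;
};

// Returns the number of the only block that could hold `target`, or
// std::nullopt if `target` is larger than every block key.
std::optional<std::size_t> find_block(const std::vector<IndexEntry>& block_index, char target) {
    [[maybe_unused]] const auto by_key =
        [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; };
    // Debug-build check of the precondition (O(n); compiled out with NDEBUG).
    assert(std::is_sorted(block_index.begin(), block_index.end(), by_key));
    // Binary search for the first entry whose block key is not less than the target.
    auto it = std::lower_bound(block_index.begin(), block_index.end(), target,
                               [](const IndexEntry& e, char t) { return e.key < t; });
    if (it == block_index.end()) return std::nullopt;  // target exceeds every block key
    return it->block;
}
```

With the index above (G, K, N), a target of H maps to block 1, N maps to block 2, and Z maps to no block at all.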
ISAM: Disk Index
This index could also be stored in memory.
[Figure: disks Disk 0, …, Disk n, located through the index below.]
Disk Index
Disk #   Key
0        G
1        V
2        X
…        …
ISAM: Insertion/Deletion
• Insertion:
  - Might involve moving data across blocks.
  - Can leave extra space when inserting into a block.
• Deletion:
  - Might involve contracting data across blocks.
  - Need not contract every time, i.e., leave some space for possible future expansion.
Multiway Search Tree (order m)
• A generalization of binary search trees.
• Each node has at most m children.
  - If k ≤ m is the number of children, then the node has exactly k - 1 keys.
  - The tree is ordered.
Multiway Search Tree (cont.)
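[Figure: a node in a multiway tree with keys k1 < k2 < k3 < k4 < k5 and six subtrees. The subtree to the left of k1 holds keys < k1, the subtree between k2 and k3 holds keys with k2 < key < k3, and the subtree to the right of k5 holds keys > k5.]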
Definition Of A B-Tree
• A B-Tree of order m is an m-way tree such that:
  - All leaves are on the same level.
  - All internal nodes except the root node are constrained to have at most m non-empty children and at least ⌈m/2⌉ non-empty children.
  - The root node has at most m non-empty children.
Three Important Properties Of B-Trees
• All nodes in the B-Tree are at least half-full (the root node is sometimes an exception).
• The B-tree is always balanced. That is, locating any key at a given level of the tree requires reading the same number of nodes into memory.
• A well-organized B-Tree has only a small number of levels relative to the number of nodes.
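The last property can be made concrete with a small calculation. This C++ sketch assumes the usual bounds: every non-root node has at least t children (t = ⌈m/2⌉ for a B-Tree of order m, or MINIMUM + 1 under the rules that follow), and a non-leaf root has at least 2. A tree with L levels then holds at least 2t^(L-1) - 1 keys, so the hypothetical helper `max_levels` returns the largest L that n keys can fill to that minimum.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the number of levels of a B-Tree holding n keys, assuming every
// non-root node has at least t children and a non-leaf root has at least 2.
// A tree with L levels contains at least 2 * t^(L-1) - 1 keys.
std::size_t max_levels(unsigned long long n, unsigned long long t) {
    assert(t >= 2);
    if (n == 0) return 0;             // empty tree
    std::size_t levels = 1;
    unsigned long long power = 1;     // t^(levels - 1)
    // Add a level while n keys could still fill one more level to the minimum.
    while (2 * (power * t) - 1 <= n) {
        power *= t;
        ++levels;
    }
    return levels;
}
```

For example, one million keys with at least 50 children per node need at most 4 levels (`max_levels(1000000, 50)` returns 4), while even a perfectly balanced binary search tree needs 20 levels for a million keys, since 2^19 = 524,288 < 1,000,000 ≤ 2^20.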
Where Are B-Trees Used?
• B-Trees are commonly found in database and file systems.
• B-Trees allow logarithmic-time searches, insertions, and deletions.
• They generally grow from the bottom upward as elements are inserted, whereas most binary trees grow downward.
The Six Rules Governing B-Trees
• R1: A B-Tree might be empty. If not, the root has at least one entry, and every other node has at least some specified MINIMUM number of entries.
• R2: The MAXIMUM number of entries in a node is twice the MINIMUM.
The Six Rules Governing B-Trees (cont)
• R3: The entries of each B-Tree node are stored in a partially filled array, sorted from the smallest entry (at index 0) to the largest entry (at the last used position of the array).
[Figure: a B-Tree node whose array holds the sorted entries h, k, k*, n, … at indices 0, 1, …, n-1.]
The data in such an array can be stored in a block on a disk.
* B-Trees can support duplicate keys (k* is a second entry with key k).
The Six Rules Governing B-Trees (cont)
• R4: The number of subtrees below a non-leaf node is always one more than the number of entries in the node.
[Figure: a non-leaf node with 4 entries, 45 55 67 82, at indices 0 to 3, and 5 subtrees:
  subtree 0: keys < 45
  subtree 1: keys > 45 and < 55
  subtree 2: keys > 55 and < 67
  subtree 3: keys > 67 and < 82
  subtree 4: keys > 82]
The Six Rules Governing B-Trees (cont)
• R5: For any non-leaf node:
  - An entry at index i is greater than all the entries in subtree i of the node, and
  - An entry at index i is less than all the entries in subtree i+1 of the node.
• R6: Every leaf node in a B-Tree has the same depth (i.e., all leaves are at the same level).
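Rules R1 to R6 can be checked mechanically. This C++ sketch assumes a hypothetical value-based `Node` in which `std::vector` members stand in for the fixed arrays and the `data_count`/`child_count` counters, and it treats keys as distinct (the loose insertion pseudocode rejects an entry that is already present); allowing duplicates would require relaxing the strict comparisons. `check` and `is_valid_btree` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// Checks the subtree rooted at `node`. `lo`/`hi` (null = unbounded) are the exclusive
// key bounds inherited from ancestors (R5); `leaf_depth` is the depth of the first
// leaf found, or -1 before any leaf has been seen (R6).
bool check(const Node& node, std::size_t minimum, bool is_root, int depth,
           int& leaf_depth, const int* lo, const int* hi) {
    const std::size_t n = node.data.size();
    if (n > 2 * minimum) return false;                             // R2
    if (!is_root && n < minimum) return false;                     // R1
    if (is_root && n == 0 && !node.subset.empty()) return false;   // R1: empty root only for empty tree
    for (std::size_t k = 0; k < n; ++k) {
        if (k > 0 && node.data[k - 1] >= node.data[k]) return false;                  // R3
        if ((lo && node.data[k] <= *lo) || (hi && node.data[k] >= *hi)) return false;  // R5
    }
    if (node.subset.empty()) {                                     // a leaf: R6
        if (leaf_depth < 0) leaf_depth = depth;
        return depth == leaf_depth;
    }
    if (node.subset.size() != n + 1) return false;                 // R4
    for (std::size_t i = 0; i <= n; ++i) {
        const int* child_lo = (i == 0) ? lo : &node.data[i - 1];
        const int* child_hi = (i == n) ? hi : &node.data[i];
        if (!check(node.subset[i], minimum, false, depth + 1, leaf_depth, child_lo, child_hi))
            return false;
    }
    return true;
}

bool is_valid_btree(const Node& root, std::size_t minimum) {
    assert(minimum >= 1);
    int leaf_depth = -1;
    return check(root, minimum, true, 0, leaf_depth, nullptr, nullptr);
}
```

For example, with `minimum = 1` the example B-Tree that follows passes, while a tree whose leaves sit at different depths fails R6.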
Example B-Tree
MIN = 1
MAX = 2
[Figure: a three-level B-Tree.
  Root: [30 80]
  Level 1: [20]  [50 60]  [90]
  Leaves: [10] [25] under 20; [35 40] [55] [72] under 50 60; [82 85] [95] under 90]
Searching For A Target In B-Trees
• Start with the root node and search for the target in the array at that node. If found, then done; return success.
• If the target is not in the root and there are no children, then also done, but return failure.
• If the target is not in the root node and there are children, then the target, if it exists, can only be in one subtree.
• Compare the target with the listed keys, searching key_array from left to right up to data_count, and traverse the first subtree i for which target < key_array[i] (or the last subtree, if the target is greater than every key).
  - Repeat the process at the root of that subtree.
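The search steps above translate into a short recursive function. This C++ sketch uses a hypothetical value-based `Node` (vectors instead of fixed arrays) and swaps the left-to-right scan for `std::lower_bound`, a binary search over the sorted entries; both find the same index i, the first entry not less than the target. Once equality is ruled out, subset[i] is exactly the first subtree with target < key_array[i], or the last subtree if no such key exists. `contains` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// Returns true if `target` occurs in the B-Tree rooted at `node`.
bool contains(const Node& node, int target) {
    // Rule R4: a non-leaf node has one more subtree than entries.
    assert(node.subset.empty() || node.subset.size() == node.data.size() + 1);
    // i = first index with data[i] >= target, or data.size() if there is none.
    const std::size_t i = static_cast<std::size_t>(
        std::lower_bound(node.data.begin(), node.data.end(), target) - node.data.begin());
    if (i < node.data.size() && node.data[i] == target) return true;  // found in this node
    if (node.subset.empty()) return false;                            // leaf: not present
    return contains(node.subset[i], target);  // the only subtree that can hold target
}
```

On the example B-Tree above, searching for 55 visits [30 80], then [50 60], then [55]; searching for 56 follows the same path and fails at the leaf.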
Inserting Into A B-Tree
[Flowchart]
• Add the new key to the appropriate leaf node.
• Overflow?
  - No: done.
  - Yes: split the node into two nodes on the same level, and promote the median key.
Loose Insertion
(pg. 551 Main & Savitch; one of several ways)
MIN = 1
MAX = 2
Before: root [6 17]; children [4] [12] [19 22]
Insert 18
After: root [6 | 17]; children [4] [12] [18 | 19 | 22]
The node [18 | 19 | 22] has an excess entry (the problem child).
Fixing A Loose Insertion
MIN = 1
MAX = 2
Step 1: Split the problem child, and promote its middle key (19) to the parent node. The parent still has an excess entry.
  root [6, 17, 19]; children [4] [12] [18] [22]
Step 2: Fix the excess by repeating the process. Split the node and promote its middle key (17) to a new root node.
  root [17]; children [6] [19]; leaves [4] [12] under 6 and [18] [22] under 19
Pseudo Code For Loose Insert
1. Make a local variable, i, equal to the first index such that data[i] is not less than the new entry to insert. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the new entry.
2. If (we found the new entry at data[i])
     a) Return false with no further work (since the new entry is already in the tree).
   else if (the root has no children)
     b) Add the new entry to the root at data[i]. The original entries at data[i] and afterwards must be shifted right to make room for the new entry. Return true to indicate that we added the entry.
   else
     c) Save the value from this recursive call:
          subset[i]->loose_insert(entry);
        Then check whether the root of subset[i] now has an excess entry; if so, fix that problem. Return the saved value from the recursive call.
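The following C++ sketch puts loose insertion and the excess fix together. It assumes `MINIMUM = 1` (as in the slide examples), a hypothetical value-based `Node` with `std::vector` members in place of the `data_count`/`child_count` arrays, and unique keys (step 2a). `fix_excess` and the wrapper `insert` are modeled on the split-and-promote idea from the earlier slides; the names are illustrative, not a library API.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 1;  // assumed; matches the slide examples
constexpr std::size_t MAXIMUM = 2 * MINIMUM;

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// subset[i] holds MAXIMUM + 1 entries: split it into two nodes of MINIMUM entries
// each and promote its middle entry into this node at data[i].
void fix_excess(Node& n, std::size_t i) {
    Node& child = n.subset[i];
    assert(child.data.size() == MAXIMUM + 1);
    Node right;
    right.data.assign(child.data.begin() + MINIMUM + 1, child.data.end());
    if (!child.subset.empty()) {  // a non-leaf also splits its MAXIMUM + 2 children
        right.subset.assign(std::make_move_iterator(child.subset.begin() + MINIMUM + 1),
                            std::make_move_iterator(child.subset.end()));
        child.subset.erase(child.subset.begin() + MINIMUM + 1, child.subset.end());
    }
    const int middle = child.data[MINIMUM];
    child.data.erase(child.data.begin() + MINIMUM, child.data.end());
    n.data.insert(n.data.begin() + i, middle);
    n.subset.insert(n.subset.begin() + i + 1, std::move(right));  // `child` is invalid after this
}

// Loose insertion (steps 1 and 2a-2c); may leave this node with MAXIMUM + 1 entries.
bool loose_insert(Node& n, int entry) {
    std::size_t i = 0;                                           // step 1
    while (i < n.data.size() && n.data[i] < entry) ++i;
    if (i < n.data.size() && n.data[i] == entry) return false;  // 2a: already present
    if (n.subset.empty()) {                                      // 2b: shift right and insert
        n.data.insert(n.data.begin() + i, entry);
        return true;
    }
    const bool added = loose_insert(n.subset[i], entry);         // 2c: recurse
    if (n.subset[i].data.size() > MAXIMUM) fix_excess(n, i);
    return added;
}

// Public insert: an excess entry in the root is fixed by placing the old root under
// a new, empty root and splitting it, which is how the tree grows upward.
bool insert(Node& root, int entry) {
    const bool added = loose_insert(root, entry);
    if (root.data.size() > MAXIMUM) {
        Node old_root = std::move(root);
        root = Node{};
        root.subset.push_back(std::move(old_root));
        fix_excess(root, 0);
    }
    return added;
}
```

Inserting 18 into the tree with root [6 17] and leaves [4] [12] [19 22] reproduces the earlier figures: 19 is promoted first, then 17 becomes the new root with children [6] and [19].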
Insert In Class Exercise
MIN = 1
MAX = 2
• Insert 5, then insert 7.
Starting tree: root [6 17]; children [4] [12] [19 22]
Deleting From A B-Tree
Deleting From A B-Tree Example #1
Min = 1
Max = 2
Before: root [6, 17]; children [4] [12] [19, 22]
Delete 17
After: root [6]; children [4] [12] [19, 22]
Violates # subtrees = # keys + 1 (B-Tree Rule 4)
Solution To Example #1
Min = 1
Max = 2
root [6, 19]; children [4] [12] [22]
Deleting From A B-Tree Example #2
Min = 2
Max = 4
Before: root [6, 17]; children [2, 4] [10, 12] [19, 22]
Delete 22
After: root [6, 17]; children [2, 4] [10, 12] [19]
The node [19] violates the B-Tree property # keys ≥ MIN.
Solution #1 For Example #2
Min = 2
Max = 4
Case 3 solution: if excess entries are not available in the siblings, combine subset[i] with subset[i-1] (pg. 561 Main & Savitch).
Before: root [6, 17]; children [2, 4] [10, 12] [19]
After: root [6]; children [2, 4] [10, 12, 17, 19]
Solution #2 To Fix A Shortage
• Case 1: Transfer an extra entry from subset[i-1] to subset[i]
• pg. 560 Main & Savitch
Min = 2
Max = 4
Before: root [6, 17]; children [2, 4] [10, 12, 15] [19]
After: root [6, 15]; children [2, 4] [10, 12] [17, 19]
Solution #3 To Fix A Shortage
• Case 2: Transfer an extra entry from subset[i+1] to subset[i]
• pg. 561 Main & Savitch
Before: root [6, 17]; children [2, 4] [10] [19, 21, 22]
After: root [6, 19]; children [2, 4] [10, 17] [21, 22]
Deleting From A B-Tree
(Loose Erase)
1. Make a local variable, i, equal to the first index such that data[i] is not less than the target to delete. If there is no such index, then set i equal to data_count, indicating that all of the entries are less than the target.
2. Deal with one of the following four possibilities:
   a. Root has no children, and we did not find the target (i.e., nothing to do).
   b. Root has no children, and we found the target. Just remove the target.
   c. Root has children, and we did not find the target in the root. Make a recursive call to search subset[i].
   d. Root has children, and we found the target in the root. Remove the largest entry from subset[i] and copy it into data[i], replacing the target.
Cases 2c and 2d are elaborated on the following slides.
Delete From B-Tree: Elaborate 2c
• Target not found in the root node, but the target might be in subset[i]. Make the recursive call
    subset[i]->loose_erase(target);
• This removes the target from subset[i] if it is there. If so, subset[i] might now have fewer than MIN entries, and it needs to be fixed. The current node does this by calling
    fix_shortage(i);
  Will discuss later.
Delete From B-Tree: Elaborate 2d
• Target is found in the root node, but it cannot simply be removed because there are children.
    subset[i]->remove_biggest(data[i]);
• Go to subset[i] and remove the largest item in the subset. Copy this largest item into data[i] (which contains the target). In effect, this removes the target. However, removing the largest item can cause a shortage in subset[i]. If so, call
    fix_shortage(i);
  Will discuss NOW!!
Fix Shortage
• Case 1: If subset[i-1] has extra entries, then transfer one of them to subset[i] (pg. 560 Main & Savitch):
  - Transfer data[i-1] (i.e., 17) down to the front of subset[i]->data. Shift entries over as necessary and update data_count.
  - Transfer the final item of subset[i-1] (i.e., 15) up to replace data[i-1], and update data_count.
  - If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of subset[i], and update the child counts.
Before: root [6, 17]; children [2, 4] [10, 12, 15] [19]
After: root [6, 15]; children [2, 4] [10, 12] [17, 19]
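A C++ sketch of Case 1, written as a free function over a hypothetical value-based `Node` (vectors stand in for the arrays and counts, and the minimum is passed in as `minimum`). `transfer_from_left` is an illustrative name; it performs the three transfers listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// Case 1: subset[i] is one entry short and subset[i-1] has more than `minimum`
// entries, so one entry rotates through the parent from left to right.
void transfer_from_left(Node& parent, std::size_t i, std::size_t minimum) {
    assert(i > 0 && parent.subset[i - 1].data.size() > minimum);
    Node& left = parent.subset[i - 1];
    Node& child = parent.subset[i];
    // data[i-1] moves down to the front of subset[i].
    child.data.insert(child.data.begin(), parent.data[i - 1]);
    // The final entry of subset[i-1] moves up to replace data[i-1].
    parent.data[i - 1] = left.data.back();
    left.data.pop_back();
    // If subset[i-1] has children, its final child moves to the front of subset[i].
    if (!left.subset.empty()) {
        child.subset.insert(child.subset.begin(), std::move(left.subset.back()));
        left.subset.pop_back();
    }
}
```

Applied with i = 2 and minimum = 2 to root [6, 17] with children [2, 4] [10, 12, 15] [19], it produces root [6, 15] with children [2, 4] [10, 12] [17, 19], as in the figure.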
Fix Shortage (cont.)
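• Case 2: If subset[i+1] has extra entries, then transfer one of them to subset[i] (pg. 561 Main & Savitch):
  - Similar to Case 1, in mirror image: data[i] moves down to the end of subset[i]->data, and the first item of subset[i+1] moves up to replace data[i].
Before: root [6, 17]; children [2, 4] [10] [19, 21, 22]
After: root [6, 19]; children [2, 4] [10, 17] [21, 22]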
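The mirror-image transfer can be sketched the same way. As before, the value-based `Node`, the `minimum` parameter, and the name `transfer_from_right` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// Case 2: subset[i] is one entry short and subset[i+1] has more than `minimum`
// entries, so one entry rotates through the parent from right to left.
void transfer_from_right(Node& parent, std::size_t i, std::size_t minimum) {
    assert(i + 1 < parent.subset.size() && parent.subset[i + 1].data.size() > minimum);
    Node& child = parent.subset[i];
    Node& right = parent.subset[i + 1];
    // data[i] moves down to the end of subset[i].
    child.data.push_back(parent.data[i]);
    // The first entry of subset[i+1] moves up to replace data[i].
    parent.data[i] = right.data.front();
    right.data.erase(right.data.begin());
    // If subset[i+1] has children, its first child moves to the end of subset[i].
    if (!right.subset.empty()) {
        child.subset.push_back(std::move(right.subset.front()));
        right.subset.erase(right.subset.begin());
    }
}
```

Applied with i = 1 and minimum = 2 to root [6, 17] with children [2, 4] [10] [19, 21, 22], it produces root [6, 19] with children [2, 4] [10, 17] [21, 22].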
Fix Shortage (cont.)
• Case 3: Combine subset[i] with subset[i-1] (pg. 561 Main & Savitch)
  - Applies if subset[i-1] is present (i.e., i > 0) but has only the minimum number of items/keys (i.e., no excess keys/items).
  - Transfer data[i-1] down to the end of subset[i-1]->data, shifting the remaining entries of data left … (see step a, pg. 562)
  - Transfer all of the items and children from subset[i] to the end of subset[i-1] … (see step b, pg. 562)
  - Delete the node subset[i] and shift subset[i+1], subset[i+2], and so on left … (see step c, pg. 562)
Deleted 22:
Before: root [6, 17]; children [2, 4] [10, 12] [19]
After: root [6]; children [2, 4] [10, 12, 17, 19]
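Putting the loose erase steps (1, 2a to 2d) together with the shortage cases gives the C++ sketch below. Assumptions: `MINIMUM = 2` to match Example #2, a hypothetical value-based `Node` with `std::vector` members, unique keys, and function names modeled on the slides (`loose_erase`, `fix_shortage`) plus a `remove_biggest` helper for step 2d. It also handles one situation the slides do not show: when i == 0 there is no subset[i-1], so the mirror image of Case 3 merges subset[1] into subset[0]. Finally, if the root is left with no entries and a single child, that child becomes the new root, so the tree shrinks from the top.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

constexpr std::size_t MINIMUM = 2;  // assumed; matches Example #2 (Min = 2, Max = 4)

// Hypothetical value-based node: sorted entries plus child subtrees (none for a leaf).
struct Node {
    std::vector<int> data;
    std::vector<Node> subset;
};

// Precondition: subset[i] has MINIMUM - 1 entries; everything else is valid.
void fix_shortage(Node& n, std::size_t i) {
    assert(n.subset[i].data.size() + 1 == MINIMUM);
    if (i > 0 && n.subset[i - 1].data.size() > MINIMUM) {                          // Case 1
        Node& left = n.subset[i - 1];
        Node& child = n.subset[i];
        child.data.insert(child.data.begin(), n.data[i - 1]);
        n.data[i - 1] = left.data.back();
        left.data.pop_back();
        if (!left.subset.empty()) {
            child.subset.insert(child.subset.begin(), std::move(left.subset.back()));
            left.subset.pop_back();
        }
    } else if (i + 1 < n.subset.size() && n.subset[i + 1].data.size() > MINIMUM) {  // Case 2
        Node& child = n.subset[i];
        Node& right = n.subset[i + 1];
        child.data.push_back(n.data[i]);
        n.data[i] = right.data.front();
        right.data.erase(right.data.begin());
        if (!right.subset.empty()) {
            child.subset.push_back(std::move(right.subset.front()));
            right.subset.erase(right.subset.begin());
        }
    } else {  // Case 3 (or its mirror image when i == 0): merge subset[j+1] into subset[j]
        const std::size_t j = (i > 0) ? i - 1 : 0;
        Node& left = n.subset[j];
        Node& right = n.subset[j + 1];
        left.data.push_back(n.data[j]);                                              // (a)
        left.data.insert(left.data.end(), right.data.begin(), right.data.end());     // (b)
        left.subset.insert(left.subset.end(), std::make_move_iterator(right.subset.begin()),
                           std::make_move_iterator(right.subset.end()));
        n.data.erase(n.data.begin() + j);
        n.subset.erase(n.subset.begin() + j + 1);                                    // (c)
    }
}

// Removes and returns the largest entry in this subtree; the subtree's root may be
// left one entry short, which the caller repairs.
int remove_biggest(Node& n) {
    if (n.subset.empty()) {
        const int biggest = n.data.back();
        n.data.pop_back();
        return biggest;
    }
    const int biggest = remove_biggest(n.subset.back());
    if (n.subset.back().data.size() < MINIMUM) fix_shortage(n, n.subset.size() - 1);
    return biggest;
}

// Loose erase (steps 1 and 2a-2d); this node may be left one entry short.
bool loose_erase(Node& n, int target) {
    std::size_t i = 0;                                        // step 1
    while (i < n.data.size() && n.data[i] < target) ++i;
    const bool found = i < n.data.size() && n.data[i] == target;
    if (n.subset.empty()) {
        if (!found) return false;                             // 2a: nothing to do
        n.data.erase(n.data.begin() + i);                     // 2b: remove from leaf
        return true;
    }
    bool removed = true;
    if (found) n.data[i] = remove_biggest(n.subset[i]);       // 2d: replace the target
    else removed = loose_erase(n.subset[i], target);          // 2c: recurse
    if (n.subset[i].data.size() < MINIMUM) fix_shortage(n, i);
    return removed;
}

// Public erase: if the root runs out of entries, its only child becomes the root.
bool erase(Node& root, int target) {
    const bool removed = loose_erase(root, target);
    if (root.data.empty() && root.subset.size() == 1) {
        Node only_child = std::move(root.subset.front());
        root = std::move(only_child);
    }
    return removed;
}
```

Deleting 22 from root [6, 17] with children [2, 4] [10, 12] [19, 22] triggers Case 3 and yields root [6] with children [2, 4] [10, 12, 17, 19], matching the figure.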
In Class Delete Example #2
Go through the Loose Erase section in Main & Savitch, pg. 558.