Chapter 14

B Trees
B-Trees are a specialized form of the tree data structure. We first review tree
structures and search trees, then discuss B-Trees and, later, B+ Trees.
Tree Structure Terminology Review







A tree is formed of nodes.
Each node except the root has exactly one parent, and zero or more child nodes.
The root is the only node without a parent node.
A node that does not have any child nodes is a leaf node.
A non-leaf node is an internal node.
The level of a node is one more than the level of its parent, with the root node
at level zero.
A subtree of a node consists of the node and all its descendant nodes (its child
nodes, their children, and so on).
[Figure: an example tree with root A at level 0 and nodes B through K arranged
on levels 1 to 3.]
Root node is A, and its child nodes are B, C, D.
Nodes E, C, G, H, J, K are leaf nodes.
A common way to implement a tree is to have as many pointers in a node as
there are children of the node.
A parent pointer can also be stored in each node.
Nodes usually contain some type of stored information. When a multilevel index
is implemented as a tree structure, the information includes values of the file's
indexing field that are used to guide the search for a record.
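The terminology above can be sketched as a small Python class. This is only an illustration: the class name TreeNode and its methods are made up for this example, and the placement of E under B is an assumption, since the text only states that B, C, and D are children of A.

```python
class TreeNode:
    """A general tree node: one pointer per child plus a parent pointer."""

    def __init__(self, value):
        self.value = value      # the stored information
        self.parent = None      # None for the root
        self.children = []      # one pointer per child; empty for a leaf

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def level(self):
        # The root is at level 0; any other node is one level below its parent.
        return 0 if self.parent is None else self.parent.level() + 1

    def is_leaf(self):
        return not self.children

    def subtree(self):
        # The node itself followed by all of its descendants.
        nodes = [self]
        for child in self.children:
            nodes.extend(child.subtree())
        return nodes


root = TreeNode("A")
b = root.add_child(TreeNode("B"))
c = root.add_child(TreeNode("C"))
d = root.add_child(TreeNode("D"))
e = b.add_child(TreeNode("E"))   # assumed placement, for illustration only

print(e.level())                              # level of E
print([n.value for n in b.subtree()])         # subtree rooted at B
```

Storing the parent pointer makes level() easy to compute by walking upward, at the cost of one extra pointer per node.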
Multilevel Indexes as Special Search Trees
Multilevel indexes can be thought of as a variation of a search tree (a special type
of tree that is used to guide the search for a record with record field value V).
Each node can have as many as fo pointers and fo key values, where fo is the index
fan-out (the blocking factor of the index).

The index values in each node guide the search to the next node, until it reaches
the data file block that contains the required records. By following a pointer, the
search is restricted at each level to a subtree of the search tree, and the nodes
not in the subtree are ignored.
Search Trees
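[Figure: a node of a search tree, <P1, K1, ..., Ki-1, Pi, Ki, ..., Kq-1, Pq>. The
subtree under P1 holds values X < K1, the subtree under Pi holds values
Ki-1 < X < Ki, and the subtree under Pq holds values Kq-1 < X.]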
A search tree of order p is a tree in which each node contains at most p-1 search
values and p pointers, in the order <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq>, where:
- q <= p;
- each Pi is a pointer to a child node, or a null pointer;
- and each Ki is a search value from some ordered set of values.
Two constraints must hold on the search tree:
1. Within each node, the key values are ordered: K1 < K2 < … < Kq-1.
2. For all values X in the subtree pointed to by Pi:
- For 1 < i < q, Ki-1 < X < Ki;
- For i = 1, X < K1; and
- For i = q, Kq-1 < X.
When searching for a value X, you follow the pointers, P, using the above conditions.
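A minimal sketch of this search in Python follows. The class SearchNode, the function search, and the small order-3 tree built at the end are all hypothetical and chosen only to illustrate the two constraints; they are not taken from the text.

```python
from bisect import bisect_left


class SearchNode:
    """A search tree node holding keys K1 < ... < Kq-1 and q pointers P1 ... Pq."""

    def __init__(self, keys, pointers=None):
        self.keys = keys
        # A null pointer is represented by None.
        self.pointers = pointers or [None] * (len(keys) + 1)


def search(node, x):
    """Return True if x is stored in the tree rooted at node."""
    while node is not None:
        # i = number of keys smaller than x, so keys[i-1] < x <= keys[i].
        i = bisect_left(node.keys, x)
        if i < len(node.keys) and node.keys[i] == x:
            return True
        # Otherwise x can only be in the subtree between keys[i-1] and keys[i].
        node = node.pointers[i]
    return False


# A hypothetical tree of order p = 3 (at most 2 keys and 3 pointers per node).
root = SearchNode([5, 9], [SearchNode([1, 3]), SearchNode([7]), SearchNode([12])])

print(search(root, 7))   # True
print(search(root, 4))   # False
```

At every node the search either finds X among the keys or follows exactly one pointer, so all other subtrees are skipped.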
[Figure: a search tree containing the values 1, 3, 5, 6, 7, 8, 9, 12.]
The values in the tree can be the values of one of the fields of the file, called
the search field. This is the same as the index field of a file. Each key value is
associated with a pointer, either to a record in the data file having that search key
value, or to the block containing the record with the search key value.
The tree in the first diagram is not balanced, meaning that leaf nodes can be found at
different levels. This is not an efficient organization, because some nodes may be at
very deep levels, requiring many block accesses to reach.
The B-Tree addresses this problem by specifying additional constraints.
B Trees
The B-Tree has additional constraints to ensure that the tree is always balanced, and
that the space wasted by deletion never becomes excessive.
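[Figure: a node of a B-Tree, <P1, <K1, Pr1>, P2, ..., <Ki-1, Pri-1>, Pi, <Ki, Pri>,
..., <Kq-1, Prq-1>, Pq>. Each Pi is a tree pointer and each Pri is a data pointer.
The subtrees under P1, Pi, and Pq hold values X < K1, Ki-1 < X < Ki, and Kq-1 < X
respectively.]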
The formal definition of a B-Tree of order p, when used as an access structure on a
key field to search for a record, is as follows:
1. Each internal node in the tree is of the form:
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq-1, Prq-1>, Pq>, where q <= p. Each Pi is a
tree pointer, a pointer to a node in the tree, and each Pri is a data pointer, a
pointer to the record whose search key field value is equal to Ki.
2. The key values K1, …, Kq-1 are ordered within each node.
3. For all search key values X in the subtree pointed at by Pi, the ith subtree, we
have:
- For 1 < i < q, Ki-1 < X < Ki;
- For i = 1, X < K1;
- For i = q, Kq-1 < X.
4. Each node has at most p tree pointers.
5. Each node except the root has at least ceil(p/2) tree pointers (p/2 rounded up).
The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q-1 search key field values and hence q-1
data pointers.
7. All leaf nodes are at the same level. Leaf nodes have the same structure as
internal nodes except that all of their tree pointers Pi are null.
[Figure: a B-Tree containing the values 1, 3, 5, 6, 7, 8, 9, 12.]
The above example shows a B-Tree of order 3.
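The rules of the definition can be checked mechanically. The sketch below is one possible checker, not a reference implementation: BTreeNode and leaf_depth are names invented here, the data pointers are placeholder strings, and the arrangement of the example values (root 5, 8 over three leaves) is an assumption, because only the values are listed.

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class BTreeNode:
    keys: list                                     # K1 < K2 < ... < Kq-1
    data_ptrs: list                                # Pr1 ... Prq-1, one per key
    children: list = field(default_factory=list)   # tree pointers; empty = all null (leaf)


def leaf_depth(node, p, lo=None, hi=None, is_root=True):
    """Check rules 2 to 7 below node; return the depth of its leaves or raise ValueError."""
    q = len(node.keys) + 1                         # number of tree pointers (rule 6)
    if len(node.data_ptrs) != q - 1:
        raise ValueError("each key needs exactly one data pointer")
    if q > p:                                      # rule 4
        raise ValueError("node has more than p tree pointers")
    if is_root:                                    # rule 5
        min_q = 2 if node.children else 1
    else:
        min_q = ceil(p / 2)
    if q < min_q:
        raise ValueError("node has too few tree pointers")
    if any(a >= b for a, b in zip(node.keys, node.keys[1:])):   # rule 2
        raise ValueError("keys are not in increasing order")
    if any((lo is not None and k <= lo) or (hi is not None and k >= hi)
           for k in node.keys):                    # rule 3
        raise ValueError("key outside the range allowed by the parent")
    if not node.children:
        return 0
    if len(node.children) != q:
        raise ValueError("an internal node needs q tree pointers")
    bounds = [lo] + node.keys + [hi]
    depths = {leaf_depth(c, p, bounds[i], bounds[i + 1], False)
              for i, c in enumerate(node.children)}
    if len(depths) != 1:                           # rule 7
        raise ValueError("leaves are on different levels")
    return depths.pop() + 1


# Assumed layout of the example values, with made-up record ids as data pointers.
leaves = [BTreeNode([1, 3], ["r1", "r3"]),
          BTreeNode([6, 7], ["r6", "r7"]),
          BTreeNode([9, 12], ["r9", "r12"])]
root = BTreeNode([5, 8], ["r5", "r8"], leaves)

print(leaf_depth(root, p=3))   # 1: all leaves are one level below the root
```

The same tree fails the check for p = 2, because every node would then hold more than p tree pointers.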
The example assumes the B-Tree access structure is built on a key field, so the
search values are unique. If the B-Tree is built on a non-key field, each data pointer
would instead point to a block, or cluster of blocks, containing pointers to the file
records, similar to option 3 for secondary indexes.







A B-Tree starts with a single root node, which is also a leaf node, at level 0.
Once the root node is full with p-1 search key values and another entry is
inserted, the root node splits evenly into two nodes at level 1. Only the middle
value is kept in the root.
When a non-root node is full and a new entry is inserted into it, the node is
split into two nodes at the same level, and the middle entry is moved to the
parent node along with two pointers to the split nodes.
If the parent node is full, it is also split.
Splitting can propagate all the way to the root, creating a new level if the root
is split.
If deletion of a value causes a node to be less than half full, it is combined with
its neighboring nodes. This can also propagate all the way to the root.
After numerous random insertions and deletions, the nodes are approximately
69 percent full once the number of values in the tree stabilizes. When this
happens, node splitting and combining occur only rarely.
Insert the following values into a B-tree:
5, 7, 4, 6, 8, 14, 2
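The insertion and splitting steps can be sketched in Python as follows. The exercise does not state the order of the tree, so order p = 3 is assumed here. The names Node, insert, and levels are invented for this sketch, data pointers are omitted, keys are assumed to be unique, and deletion is not covered.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf


def _insert(node, key, p):
    """Insert key below node. Return (middle, right) if node was split, else None."""
    i = bisect_left(node.keys, key)
    if node.children:
        split = _insert(node.children[i], key, p)
        if split is None:
            return None
        middle, right = split            # the child split: absorb its middle entry
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)
    if len(node.keys) <= p - 1:          # still within the p-1 key limit
        return None
    mid = len(node.keys) // 2            # overflow: split around the middle entry
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right


def insert(root, key, p):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, p)
    if split is None:
        return root
    middle, right = split                # the root split: the tree grows one level
    return Node([middle], [root, right])


def levels(root):
    """List the keys of every node, level by level."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


root = Node()
for value in [5, 7, 4, 6, 8, 14, 2]:
    root = insert(root, value, p=3)

print(levels(root))   # [[[5, 7]], [[2, 4], [6], [8, 14]]]
```

With p = 3, inserting 4 overflows the root and moves 5 up into a new root; inserting 8 later overflows the leaf holding 6 and 7 and moves 7 up to join 5. A different assumed order would give a different final tree.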
Calculating the order p of a B-Tree stored on disk
Example 4 from Text:
Suppose the search field is V = 9 bytes long, the disk block size is B = 512 bytes,
the record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. Each B-Tree
node can have at most p tree pointers, p-1 data pointers, and p-1 search key field
values. These must fit into a single disk block if each B-Tree node is to correspond
to a disk block.
To calculate p:
6p + (p - 1) * (9 + 7) <= 512
6p + 9p + 7p - 9 - 7 <= 512
22p - 16 <= 512
22p <= 528
p <= 24
Although p can be at most 24, we choose p = 23, because B-Tree nodes may contain
additional information used to manipulate the tree, such as the number of entries q
in the node and a pointer to the parent. Strictly, the block size should first be
reduced by the amount of extra space needed before p is calculated as above.
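The same calculation can be written as a short Python function. The function name max_order and the overhead parameter are introduced only for this sketch, and the 8 bytes of per-node overhead in the second call is a made-up figure to show the effect of reserving space.

```python
def max_order(block_size, key_size, record_ptr_size, block_ptr_size, overhead=0):
    """Largest p with p*P + (p - 1)*(V + Pr) <= B - overhead."""
    usable = block_size - overhead
    entry = key_size + record_ptr_size           # one <Ki, Pri> pair
    # Rearranged: p * (P + V + Pr) <= usable + (V + Pr)
    return (usable + entry) // (block_ptr_size + entry)


print(max_order(512, 9, 7, 6))               # 24, as derived above
print(max_order(512, 9, 7, 6, overhead=8))   # 23 with 8 bytes reserved (assumed figure)
```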
Example 5 from Text:
Suppose that the search field of Example 4 is a nonordering key field, and we
construct a B-Tree on this field. Assume that each node of the B-Tree is 69 percent
full. Each node will then have, on average, p * 0.69 = 23 * 0.69, or approximately
16, pointers and hence 15 search key values. The average fan-out is fo = 16. The
number of nodes, entries, and pointers at each level is then:
Level      Nodes    Entries    Pointers
Root       1        15         16
Level 1    16       240        256
Level 2    256      3840       4096
Level 3    4096     61,440
At each level, the number of entries is calculated by multiplying the total number of
pointers at the previous level by 15, the average number of entries in each node.
Hence, for the given block size, pointer size, and search key field size, a two-level
B-Tree holds 3840 + 240 + 15 = 4095 entries on average. A three-level B-Tree holds
65,535 entries on average.
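The per-level figures can be reproduced with a few lines of Python. The function name level_capacity is invented for this sketch, and rounding p * 0.69 to the nearest whole number of pointers is the same approximation used in the example.

```python
def level_capacity(order, fill, levels):
    """Return (level, nodes, entries, pointers) for each level, root = level 0."""
    fan_out = round(order * fill)        # average pointers per node (about 16)
    entries_per_node = fan_out - 1       # average search key values per node
    rows, nodes = [], 1
    for level in range(levels):
        rows.append((level, nodes, nodes * entries_per_node, nodes * fan_out))
        nodes *= fan_out                 # each pointer leads to one node below
    return rows


rows = level_capacity(order=23, fill=0.69, levels=4)
for row in rows:
    print(row)

print(sum(r[2] for r in rows[:3]))   # 4095 entries down to level 2
print(sum(r[2] for r in rows))       # 65535 entries down to level 3
```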