B+-trees

File Organizations
•Tree terms
•root, internal, leaf, subtree
•parent, child, sibling
•balanced, unbalanced
•b+-tree
- split on overflow; merge on underflow
- in practice it is usually 3 or 4 levels deep
•search, insert, delete algorithms
Fall 2015
R McFadyen
ACS - 3902
1
MySQL
(simplified syntax)
CREATE [UNIQUE] INDEX index_name [index_type]
ON tbl_name (index_col_name,...)
index_type: USING {BTREE | HASH}
Indexes are automatically created for:
PRIMARY KEY
A constraint that enforces entity integrity for a given column or columns
through a unique index. Only one PRIMARY KEY constraint can be
created per table.
UNIQUE
A constraint that provides entity integrity for a given column or columns
through a unique index. A table can have multiple UNIQUE constraints.
CLUSTERED
Creates an index in which the physical order of the rows is the same as the
indexed order of the rows.
Index entries that are logically close therefore refer to data that is
physically close together.
A table or view is allowed only one clustered index.
The primary key index is normally clustered.
Motivation (finding one record given its key)
•Scanning a file is time consuming
•B+-tree provides a short access path
[Figure: a B+-tree giving a short access path into a file of records stored in pages page1, page2, page3]
Motivation
•A B+-tree is a tree in which each node is a bucket.
•A B+-tree for a file is stored in a separate file.
•A file can have many B+-trees (one per index).
[Figure: a B+-tree, stored separately, pointing into buckets 1, 2 and 3 of a file of records]
b+-tree
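•based on the b-tree (the "b" is variously said to stand for Bayer, balanced, or Boeing)
•dynamic: the tree grows and shrinks as entries are inserted and deleted
[Figure: a root, one or more levels of internal nodes, and the leaf nodes at the bottom]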
Node structure for b+-tree of order p
non-leaf node (internal node or the root)
•$\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$, where $q \le p$
•keys are in sequence: $K_1 < K_2 < \cdots < K_{q-1}$ (i.e. it's an ordered set)
•for any key value $X$ in the subtree pointed to by $P_i$:
  •$K_{i-1} < X \le K_i$ for $1 < i < q$
  •$X \le K_1$ for $i = 1$
  •$K_{q-1} < X$ for $i = q$
•each internal node has at most $p$ pointers
•each node except the root must have at least $\lceil p/2 \rceil$ pointers
•the root, if it has children, must have at least 2 pointers
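A minimal Python sketch of the pointer-selection rule above, assuming an internal node's keys are held in a sorted list (the helper name `child_index` is hypothetical, not part of the course material). `bisect_left` counts the keys strictly smaller than X, which is exactly the 0-based index of the pointer to follow.

```python
from bisect import bisect_left


def child_index(keys: list, x) -> int:
    """0-based index of the pointer to follow for search key x.

    keys holds K1 < K2 < ... < K(q-1). The rule X <= K1 -> P1,
    K(i-1) < X <= Ki -> Pi, X > K(q-1) -> Pq means the pointer index
    equals the number of keys strictly less than x.
    """
    return bisect_left(keys, x)


# An internal node [7, 8] has three pointers P1, P2, P3
for x in (6, 7, 8, 9):
    print(x, "-> P%d" % (child_index([7, 8], x) + 1))
# prints 6 -> P1, 7 -> P1, 8 -> P2, 9 -> P3 (one per line)
```

A key equal to Ki goes to Pi (the left side of Ki), which is why `bisect_left` rather than `bisect_right` matches this convention.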
Node structure for b+-tree of order p
leaf node (terminal node)
•$\langle (K_1, Pr_1), (K_2, Pr_2), \ldots, (K_{q-1}, Pr_{q-1}), P_{next} \rangle$
•$K_1 < K_2 < \cdots < K_{q-1}$
•$Pr_i$ points to the record with key value $K_i$, or to the block containing that record
•each leaf (unless it is also the root) has at least $\lceil p/2 \rceil$ keys
•maximum of $p$ keys
•all leaves are at the same level (balanced)
•$P_{next}$ points to the next leaf in key sequence
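To show what Pnext is for, here is a minimal sketch of a range scan that walks the leaf chain. Assumptions: leaves are modelled by a hypothetical `Leaf` class holding only keys (the record pointers Pri are omitted), and the scan starts at the leftmost leaf instead of first descending the tree to the leaf that could contain the low end of the range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list                        # K1 < K2 < ... (record pointers omitted)
    next: Optional["Leaf"] = None     # Pnext: next leaf in ascending key order


def range_scan(first_leaf: Leaf, low, high) -> list:
    """Collect every key k with low <= k <= high by following Pnext."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:              # keys only increase along the chain: stop
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


l3 = Leaf(["Miranda", "Ramon"])
l2 = Leaf(["Diane", "Marshall"], l3)
l1 = Leaf(["Amy", "Cory"], l2)
print(range_scan(l1, "Cory", "Miranda"))  # ['Cory', 'Diane', 'Marshall', 'Miranda']
```

Because the leaves are chained in key order, a sequential or range retrieval never has to climb back up through the internal nodes.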
Example
•insert records with key values
Diane, Cory, Ramon, Amy, Miranda, Ahmed,
Marshall, Zena, Rhonda, Vincent, Hok
into a b+-tree with p=3.
An internal node has a minimum of 2 pointers and a maximum of 3
pointers; inserting a fourth causes a split.
A leaf has a minimum of 2 key/pointer pairs and a maximum of 3
key/pointer pairs; inserting a fourth causes a split.
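The limits quoted for p = 3 follow from the node-structure rules. A small sketch, assuming the p/2 minimums are rounded up (that is what yields a minimum of 2 for p = 3); `bplus_limits` is a hypothetical helper name.

```python
import math


def bplus_limits(p: int) -> dict:
    """Occupancy limits for a b+-tree of order p (minimums rounded up)."""
    return {
        "internal_max_pointers": p,
        "internal_min_pointers": math.ceil(p / 2),  # the root only needs 2
        "leaf_max_keys": p,
        "leaf_min_keys": math.ceil(p / 2),          # a leaf that is the root is exempt
    }


limits = bplus_limits(3)
print(limits)
# {'internal_max_pointers': 3, 'internal_min_pointers': 2, 'leaf_max_keys': 3, 'leaf_min_keys': 2}
print(4 > limits["leaf_max_keys"])  # True: a fourth key overflows a leaf
```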
Only leaf nodes at this point; a split is needed before there
are internal nodes.
insert Diane
[ Diane ]
The pointer paired with Diane points to the data (wherever the record for Diane
is actually stored); the leaf's last pointer points to the next leaf in ascending
key sequence.
insert Cory
[ Cory, Diane ]
Example
insert Ramon
Only leaf nodes at this point
[ Cory, Diane, Ramon ]
Inserting Amy causes the node to overflow:
[ Amy, Cory, Diane, Ramon ]
This must split.
Example
This is logically correct, but it exceeds the space available,
so it must split into two leaves:
[ Amy, Cory, Diane, Ramon ]
Split the node and promote a key value upwards (this must be Cory,
because it is the highest key value in the left subtree):
[ Amy, Cory, Diane, Ramon ]
Splitting the above results in:
Root: [ Cory ]
Leaves: [ Amy, Cory ]  [ Diane, Ramon ]
The tree has grown one level, from the bottom up.
Splitting Nodes
Any value being promoted upwards will come from the node that
is splitting.
•When a leaf splits, a ‘copy’ of a key value is promoted.
•When an internal node splits, the middle key value ‘moves’
from a child to a parent node.
There are three situations to be concerned with:
•a leaf splits,
•an internal node splits,
•a new root is generated.
Leaf splitting
When a leaf splits, a new leaf is allocated
•the original leaf is the left sibling, the new one is the right sibling
•key and pointer pairs of the overflowing node are redistributed: the left
sibling gets the smaller keys and the right sibling the larger ones
•a 'copy' of the key value which is the largest of the keys in the left sibling
is promoted to the parent
Example: insert 31
Before: parent [ 33 ]; leaves [ 12, 22, 33 ]  [ 44, 48, 55 ]
After: parent [ 22, 33 ]; leaves [ 12, 22 ]  [ 31, 33 ]  [ 44, 48, 55 ]
Two situations arise: either the parent exists or it does not.
If the parent exists, then a copy of the key value and the pointer to the right
sibling are promoted upwards. Otherwise, the b+-tree is just beginning to grow
...
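The leaf-split rule can be sketched as follows, with leaves as plain sorted lists of keys (record pointers omitted). The function name is hypothetical, and giving the left sibling the extra key when the count is odd is an assumption; with p = 3 an overflowing leaf always holds 4 keys, so it splits 2 and 2.

```python
from bisect import insort


def split_leaf(keys: list, new_key, p: int):
    """Insert new_key into a leaf and split the leaf if it overflows.

    Returns (left, right, promoted). On a split, promoted is a *copy* of
    the largest key in the left sibling: it also stays in the left leaf.
    Without a split, right and promoted are None.
    """
    keys = sorted(keys)                 # work on a copy
    insort(keys, new_key)
    if len(keys) <= p:
        return keys, None, None         # no overflow
    mid = (len(keys) + 1) // 2          # assumption: left gets the extra key if odd
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]


print(split_leaf([12, 22, 33], 31, p=3))  # ([12, 22], [31, 33], 22)
```

Because a copy of the promoted key stays in the left leaf, every key still appears at the leaf level, so the leaf chain remains a complete sorted sequence.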
Internal node splitting
If an internal node splits and it is not the root,
•insert the key and pointer and then determine the middle key
•a new 'right' sibling is allocated
•everything to its left stays in the left sibling
•everything to its right goes into the right sibling
•the middle key value along with the pointer to the new right sibling is
promoted to the parent (the middle key value 'moves' to the parent to become
the discriminator between the left and right siblings)
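Example: insert 26 into the internal node [ 22, 33 ]
Before: parent [ 55 ]; node [ 22, 33 ]
After: parent [ 26, 55 ]; left sibling [ 22 ]  right sibling [ 33 ]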
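A sketch of the internal-node split under the same list-based conventions. `split_internal` is a hypothetical helper; `new_child` stands for the pointer to the new right sibling created by the split one level down, so it is placed immediately to the right of the key being inserted. Child pointers are placeholder strings here.

```python
from bisect import bisect_left


def split_internal(keys: list, children: list, new_key, new_child, p: int):
    """Insert (new_key, new_child) into an internal node; split it if it
    now has more than p pointers.

    Returns (left_keys, left_children, right_keys, right_children, promoted).
    On a split the middle key *moves* up: it is in neither sibling.
    Without a split, the last three values are None.
    """
    i = bisect_left(keys, new_key)
    keys = keys[:i] + [new_key] + keys[i:]
    children = children[:i + 1] + [new_child] + children[i + 1:]
    if len(children) <= p:
        return keys, children, None, None, None
    m = len(keys) // 2                   # position of the middle key
    return keys[:m], children[:m + 1], keys[m + 1:], children[m + 1:], keys[m]


print(split_internal([22, 33], ["c1", "c2", "c3"], 26, "c2b", p=3))
# ([22], ['c1', 'c2'], [33], ['c2b', 'c3'], 26)
```

Unlike the leaf case, the promoted key 26 is not kept in either sibling; it becomes the discriminator in the parent.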
Internal node splitting
When a new root is formed, a key value and two pointers must
be placed into it.
Example: insert 56 into the root [ 26, 55 ]
Before: root [ 26, 55 ]
After: new root [ 55 ]; children [ 26 ]  [ 56 ]
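When the overflowing node is the root, it splits exactly as above, but a new root holding the middle key and two pointers is then created; this is how the tree gains a level. A sketch using a hypothetical `Node` dataclass, with child pointers shown as placeholder strings.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Node:
    keys: list
    children: list


def insert_into_root(root: Node, key, right_child, p: int) -> Node:
    """Insert (key, right_child) into the root and return the (possibly new) root."""
    i = bisect_left(root.keys, key)
    root.keys.insert(i, key)
    root.children.insert(i + 1, right_child)
    if len(root.children) <= p:
        return root                                  # no overflow: same root
    m = len(root.keys) // 2
    right = Node(root.keys[m + 1:], root.children[m + 1:])
    middle = root.keys[m]
    root.keys, root.children = root.keys[:m], root.children[:m + 1]
    return Node([middle], [root, right])             # one key, two pointers


new_root = insert_into_root(Node([26, 55], ["a", "b", "c"]), 56, "d", p=3)
print(new_root.keys, new_root.children[0].keys, new_root.children[1].keys)
# [55] [26] [56]
```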
A sample trace
Insert Diane, Cory, Ramon, Amy, Miranda,
Marshall, Zena, Rhonda, Vincent, Simon, Mary
into a b+-tree with p=3.
After inserting Diane, Cory, Ramon and Amy:
Root: [ Cory ]
Leaves: [ Amy, Cory ]  [ Diane, Ramon ]
Next: insert Miranda
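After inserting Miranda:
Root: [ Cory ]
Leaves: [ Amy, Cory ]  [ Diane, Miranda, Ramon ]
Next: insert Marshall
After inserting Marshall (the leaf splits and a copy of Marshall is promoted):
Root: [ Cory, Marshall ]
Leaves: [ Amy, Cory ]  [ Diane, Marshall ]  [ Miranda, Ramon ]
Next: insert Zena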
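After inserting Zena:
Root: [ Cory, Marshall ]
Leaves: [ Amy, Cory ]  [ Diane, Marshall ]  [ Miranda, Ramon, Zena ]
Next: insert Rhonda
After inserting Rhonda, the leaf splits and a copy of Ramon is promoted, so the root
logically holds [ Cory, Marshall, Ramon ] (four pointers) and must split:
Leaves: [ Amy, Cory ]  [ Diane, Marshall ]  [ Miranda, Ramon ]  [ Rhonda, Zena ]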
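After the root splits (Marshall moves up into a new root):
Root: [ Marshall ]
Internal: [ Cory ]  [ Ramon ]
Leaves: [ Amy, Cory ]  [ Diane, Marshall ]  [ Miranda, Ramon ]  [ Rhonda, Zena ]
Next: insert Vincent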
After inserting Vincent:
Root: [ Marshall ]
Internal: [ Cory ]  [ Ramon ]
Leaves: [ Amy, Cory ]  [ Diane, Marshall ]  [ Miranda, Ramon ]  [ Rhonda, Vincent, Zena ]
Next: insert Simon
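After inserting Simon (right subtree shown; the leaf splits and a copy of Simon is promoted):
Root: [ Marshall ]
Right internal node: [ Ramon, Simon ]
Its leaves: [ Miranda, Ramon ]  [ Rhonda, Simon ]  [ Vincent, Zena ]
Next: insert Mary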
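Putting the three split situations together, the following insert-only sketch replays the trace above up to Simon. It is an illustrative implementation, not the course's code: class names are hypothetical, record pointers are omitted, and there is no deletion. A key equal to a separator descends to its left, matching X ≤ Ki.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys = []        # K1 < K2 < ...
        self.children = []    # child Nodes (internal nodes only)
        self.next = None      # Pnext (leaves only)


class BPlusTree:
    """Insert-only b+-tree of order p; record pointers are omitted."""

    def __init__(self, p: int):
        self.p = p
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                   # the root split: add a new root
            promoted, right = split
            new_root = Node(leaf=False)
            new_root.keys = [promoted]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (promoted_key, new_right_sibling) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.p:
                return None
            mid = (len(node.keys) + 1) // 2
            right = Node(leaf=True)
            node.keys, right.keys = node.keys[:mid], node.keys[mid:]
            right.next, node.next = node.next, right
            return node.keys[-1], right         # leaf split: a copy goes up
        i = bisect_left(node.keys, key)         # X <= Ki descends left of Ki
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        promoted, new_child = split
        node.keys.insert(i, promoted)
        node.children.insert(i + 1, new_child)
        if len(node.children) <= self.p:
            return None
        m = len(node.keys) // 2
        right = Node(leaf=False)
        right.keys, right.children = node.keys[m + 1:], node.children[m + 1:]
        middle = node.keys[m]
        node.keys, node.children = node.keys[:m], node.children[:m + 1]
        return middle, right                    # internal split: the middle key moves up

    def leaves(self):
        """Key lists of all leaves, left to right, following Pnext."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        result = []
        while node is not None:
            result.append(node.keys)
            node = node.next
        return result


tree = BPlusTree(p=3)
for name in ["Diane", "Cory", "Ramon", "Amy", "Miranda",
             "Marshall", "Zena", "Rhonda", "Vincent", "Simon"]:
    tree.insert(name)
print(tree.root.keys)   # ['Marshall']
print(tree.leaves())
```

Each recursive call hands a (promoted key, new sibling) pair back to its caller, which mirrors the bottom-up growth in the trace: a split only ever affects the path from the leaf to the root.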
A sample b+-tree
p = 3, pleaf = 2
Root: [ 5 ]
Internal: [ 3 ]  [ 7, 8 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6, 7 ]  [ 8 ]  [ 9, 12 ]
(the leaf entries point to the records)
Searching a b+-tree
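- search for the record with key = 8:
Root: [ 5 ]  (8 > 5: follow the right pointer)
Internal: [ 3 ]  [ 7, 8 ]  (7 < 8 ≤ 8: follow the second pointer of [ 7, 8 ])
Leaves: [ 1, 3 ]  [ 5 ]  [ 6, 7 ]  [ 8 ]  [ 9, 12 ]  (8 found: follow its record pointer)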
The cost of retrieving this record is at most 4 disk accesses: 3 to the index component
and 1 to the data component. In an actual system it would usually be less because the
system would keep as much of the index as it can in main memory.
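The access count can be checked with a short sketch. The nested-dict tree below is the sample b+-tree as read from the figure (root 5; internal nodes 3 and 7, 8; leaves 1, 3 / 5 / 6, 7 / 8 / 9, 12); the helpers `leaf`, `internal` and `search` are hypothetical. `search` reports whether the key was found and how many index nodes were read; one more access fetches the record.

```python
from bisect import bisect_left


def leaf(*keys):
    return {"keys": list(keys)}


def internal(keys, children):
    return {"keys": keys, "children": children}


sample = internal([5], [
    internal([3], [leaf(1, 3), leaf(5)]),
    internal([7, 8], [leaf(6, 7), leaf(8), leaf(9, 12)]),
])


def search(root, key):
    """Return (found, index_nodes_read) for one root-to-leaf descent."""
    node, reads = root, 1
    while "children" in node:
        node = node["children"][bisect_left(node["keys"], key)]
        reads += 1
    return key in node["keys"], reads


found, reads = search(sample, 8)
print(found, reads, reads + 1)   # True 3 4
```

Every search in this tree reads exactly three index nodes, whether or not the key exists, because all leaves are at the same level.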
Entry deletion
- deletion sequence: 8, 12, 9, 7
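After deleting 8:
Root: [ 5 ]
Internal: [ 3 ]  [ 7, 9 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6, 7 ]  [ 9 ]  [ 12 ]
Deleting 8 results in underflow and key redistribution: 9 moves
from the right sibling into the emptied leaf, and the parent key becomes 9.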
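A sketch of the redistribution step. Assumptions: leaves are plain lists, the minimum leaf occupancy for pleaf = 2 is taken as ⌈2/2⌉ = 1 key (min_keys must be at least 1), and only the right sibling is consulted. `redistribute_from_right` is a hypothetical name; it returns None when the sibling cannot spare keys.

```python
def redistribute_from_right(left: list, right: list, min_keys: int):
    """Try to fix an underflowing leaf by borrowing from its right sibling.

    Returns (left, right, new_separator), where new_separator is the key
    the parent should now hold between the two leaves (the largest key in
    left), or None if borrowing would make the sibling underflow, in which
    case the leaves must merge instead.
    """
    needed = min_keys - len(left)
    if needed <= 0:
        return left, right, left[-1]       # no underflow: nothing to move
    if len(right) - needed < min_keys:
        return None
    left = left + right[:needed]           # the sibling's smallest keys move left
    right = right[needed:]
    return left, right, left[-1]


print(redistribute_from_right([], [9, 12], min_keys=1))  # ([9], [12], 9)
print(redistribute_from_right([], [6], min_keys=1))      # None
```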
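After deleting 12:
Root: [ 5 ]
Internal: [ 3 ]  [ 7 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6, 7 ]  [ 9 ]
12 is removed; its leaf is now empty, so that leaf and the parent key 9 are removed too.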
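After deleting 9:
Root: [ 5 ]
Internal: [ 3 ]  [ 6 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6 ]  [ 7 ]
9 is removed; its leaf underflows, so 7 is redistributed from the left
sibling and the parent key becomes 6.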
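After deleting 7:
Root: [ 5 ]
Internal: [ 3 ]  [ 6 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6 ]
Deleting 7 empties its leaf, and the sibling [ 6 ] has no key to spare,
so the pointer to the emptied leaf becomes useless.
Therefore, a merge at the level above the leaf level occurs.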
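When the sibling has no key to spare, the two leaves merge instead. A sketch, assuming list-based leaves and a parent represented by its key list and child list; `merge_with_left_sibling` is a hypothetical name and relinking Pnext is left out. It reports whether the parent now has fewer than ⌈p/2⌉ pointers, i.e. whether the underflow moves up a level.

```python
import math


def merge_with_left_sibling(parent_keys: list, parent_children: list, i: int, p: int) -> bool:
    """Merge leaf i into its left sibling (leaf i-1) under the same parent.

    The separator parent_keys[i-1] and the pointer to leaf i are removed
    from the parent. Returns True if the parent now has fewer than
    ceil(p/2) pointers, so the underflow must be handled one level up.
    (Pnext relinking is omitted: leaves here are plain lists.)
    """
    if i < 1:
        raise ValueError("leaf i needs a left sibling")
    parent_children[i - 1].extend(parent_children[i])  # leaf i's keys are all larger
    del parent_children[i]
    del parent_keys[i - 1]
    return len(parent_children) < math.ceil(p / 2)


# Deleting 7: the internal node [6] has leaves [6] and [] (the emptied leaf)
keys, children = [6], [[6], []]
print(merge_with_left_sibling(keys, children, 1, p=3), keys, children)  # True [] [[6]]
```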
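Root: [ 5 ]
A: the left internal node [ 3 ]; B: the leaves below A, [ 1, 3 ] and [ 5 ]
C: the leaf [ 6 ], now the only child of the right internal node
That right internal node's pointer becomes useless, and the corresponding node
should also be removed, so A and the right internal node merge.
For this merge, 5 will be taken as a key value in A, since
any key value in B is less than or equal to 5 but any key
value in C is larger than 5.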
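A sketch of this internal merge, in which the parent's separator moves down into the merged node. The function name and list representation are assumptions; pointers to leaves are shown as the leaves' key lists.

```python
def merge_internal(left_keys, left_children, separator, right_keys, right_children, p):
    """Merge two sibling internal nodes into one.

    The parent's separator moves down between them: every key below the
    left node is <= separator and every key below the right node is
    > separator. Raises ValueError if the result would exceed p pointers.
    """
    children = left_children + right_children
    if len(children) > p:
        raise ValueError("too many pointers to merge; redistribute instead")
    return left_keys + [separator] + right_keys, children


# A = [3] over leaves [1, 3] and [5]; the right node has one pointer, to C = [6]
keys, children = merge_internal([3], [[1, 3], [5]], 5, [], [[6]], p=3)
print(keys, children)   # [3, 5] [[1, 3], [5], [6]]
```

Once 5 has moved down, the old root has a single child, so the merged node becomes the new root and the tree shrinks by one level.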
After the merge (the tree has shrunk by one level):
Root: [ 3, 5 ]
Leaves: [ 1, 3 ]  [ 5 ]  [ 6 ]
b+-tree operations
•search - always the same search length - tree height+1
•retrieval - sequential access is facilitated - how?
•insert - may cause overflow - tree may grow
•delete - may cause underflow - tree may shrink
Aside:
What do you expect for storage utilization?
http://en.wikipedia.org/wiki/B-tree
https://en.wikipedia.org/wiki/B%2B_tree