B Trees

Data Structures
B‐Trees
Tzachi (Isaac) Rosen
Motivation
• When data is too large to fit in main memory, it expands to the disk.
– Disk access is highly expensive compared to a typical computer instruction
– The number of disk accesses will dominate the running time.
• Our goal is to devise a search tree that will minimize disk accesses.
Typical Disk Drive
B‐Tree
• A balanced search tree designed to work well on direct‐access secondary storage devices.
• A generalized search tree.
B‐Tree
• Each node corresponds to a block of data on the disk.
• Minimizes disk accesses.
– For example, a tree of height 2 can contain over one billion keys when each node holds about a thousand keys.
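The "over one billion keys" figure can be checked with a few lines of arithmetic. The sketch below assumes every node is completely full and holds 1000 keys (so 1001 children); that node size is an illustrative assumption, not a value stated on the slide, and `max_keys` is a hypothetical helper name.

```python
def max_keys(keys_per_node, height):
    """Keys in a completely full tree of the given height.

    Assumption for illustration: every node holds exactly keys_per_node keys,
    so every internal node has keys_per_node + 1 children.
    """
    branching = keys_per_node + 1
    # Number of nodes at depth d is branching**d; sum over depths 0..height.
    nodes = sum(branching ** depth for depth in range(height + 1))
    return nodes * keys_per_node


print(max_keys(1000, 2))  # 1003003000
```

With these assumed numbers, only two disk reads below the in-memory root are enough to reach any of roughly a billion keys.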
Definition
• A B‐tree T is a rooted tree (at root[T]) having the following properties:
1. Every node x has the following fields:
a. n[x], the number of keys currently stored in node x,
b. the n[x] keys themselves, stored in non‐decreasing order, so that key_1[x] ≤ key_2[x] ≤ ∙∙∙ ≤ key_{n[x]}[x],
c. leaf[x], a Boolean value that is TRUE if x is a leaf and FALSE if x is an internal node.
Definition
2. Each internal node x also contains n[x] + 1 pointers c_1[x], c_2[x], ..., c_{n[x]+1}[x] to its children.
– Leaf nodes have no children, so their c_i fields are undefined.
3. The keys key_i[x] separate the ranges of keys stored in each sub‐tree:
– if k_i is any key stored in the sub‐tree with root c_i[x], then k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ ∙∙∙ ≤ key_{n[x]}[x] ≤ k_{n[x]+1}.
4. All leaves have the same depth, which is the tree's height h.
Definition
5. There are lower and upper bounds on the number of keys a node can contain.
– These bounds can be expressed in terms of a fixed integer t ≥ 2 called the minimum degree of the B‐tree:
a. Every node other than the root must have at least t‐1 keys.
   – Every internal node other than the root thus has at least t children.
   – If the tree is nonempty, the root must have at least one key.
b. Every node can contain at most 2t‐1 keys.
   – Therefore, an internal node can have at most 2t children.
   – We say that a node is full if it contains exactly 2t–1 keys.
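The five properties can be read as invariants that a checker can verify. The following is a minimal in-memory sketch, not the representation used in the lecture: `BTreeNode`, `require`, and `check` are hypothetical names, keys are assumed to be integers, indices are 0-based, and a node is treated as a leaf when its child list is empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # the n[x] keys
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def require(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def check(node: BTreeNode, t: int, is_root: bool = True,
          lo: Optional[int] = None, hi: Optional[int] = None) -> int:
    """Verify properties 1-5 below `node` and return the subtree height."""
    n = len(node.keys)
    if is_root:
        min_keys = 0 if node.leaf else 1   # an empty tree is one empty leaf
    else:
        min_keys = t - 1                   # property 5a
    require(min_keys <= n <= 2 * t - 1, "key count out of bounds")
    require(node.keys == sorted(node.keys), "keys not in order")
    # Property 3: every key lies within the range set by the ancestors.
    require(all(lo is None or lo <= k for k in node.keys), "key below range")
    require(all(hi is None or k <= hi for k in node.keys), "key above range")
    if node.leaf:
        return 0
    require(len(node.children) == n + 1, "wrong number of children")
    bounds = [lo] + node.keys + [hi]
    heights = {check(child, t, False, bounds[j], bounds[j + 1])
               for j, child in enumerate(node.children)}
    require(len(heights) == 1, "leaves at different depths")  # property 4
    return heights.pop() + 1
```

Calling `check(root, t)` on a hand-built tree either returns its height or raises `ValueError` naming the first violated property.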
2‐3‐4 Tree
• The simplest B‐tree occurs when t = 2.
• Every internal node then has either 2, 3, or 4 children, and we have a 2‐3‐4 tree.
• In practice, however, much larger values of t are typically used.
Height
• Theorem:
If n ≥ 1, then for any n‐key B‐tree T of height h and minimum degree t ≥ 2,
$$h \le \log_t \frac{n+1}{2}.$$
• Proof:
If a B‐tree has height h, the number of its nodes is minimized when the root contains one key and all other nodes contain t ‐ 1 keys.
In this case, there are
1 node at the root (depth 0),
2 nodes at depth 1,
2t nodes at depth 2,
2t^2 nodes at depth 3,
and so on,
until at depth h there are 2t^{h‐1} nodes.
Height
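Thus, the number n of keys satisfies the inequality:
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^h - 1}{t-1} = 2t^h - 1,$$
which implies
$$t^h \le \frac{n+1}{2}, \qquad\text{so}\qquad h \le \log_t \frac{n+1}{2}.$$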
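The counting argument can be checked numerically. This sketch sums the minimum number of keys level by level and compares it with the closed form 2t^h − 1; `min_keys` and `max_height` are hypothetical helper names, and integer arithmetic is used instead of floating-point logarithms to avoid rounding issues.

```python
def min_keys(t, h):
    """Fewest keys a B-tree of height h and minimum degree t can hold."""
    # Root: 1 key. Depth i >= 1: 2 * t**(i-1) nodes with t - 1 keys each.
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def max_height(n, t):
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


# The level-by-level sum agrees with the closed form from the proof.
for t in range(2, 6):
    for h in range(0, 7):
        assert min_keys(t, h) == 2 * t ** h - 1
```

For instance, `max_height(10**9, 1001)` evaluates to 2 under this definition, which matches the earlier claim that a tree of height 2 is enough for about a billion keys when t is around a thousand.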
Basic Operations
• We always keep the root in main memory, so that a DISK‐READ on the root is never required.
• Any nodes that are passed as parameters must already have had a DISK‐READ performed on them.
• Any node that is changed must be written back with a DISK‐WRITE.
Searching
search (x, k)
    i = 1
    while (i ≤ n[x] & k > key_i[x]) do i = i + 1
    if (i ≤ n[x] & k = key_i[x]) then
        return (x, i)
    if (leaf[x]) then
        return null
    else
        diskRead(c_i[x])
        return search(c_i[x], k)
CPU time is O(th) = O(t log_t n).
Disk access is O(h) = O(log_t n).
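The search procedure translates almost line for line into Python. This is an in-memory sketch under stated assumptions: `Node` is a hypothetical class with plain lists for keys and children, indices are 0-based rather than 1-based, and the `diskRead` call is omitted because nothing is stored on disk here.

```python
class Node:
    """Hypothetical in-memory node: a leaf has an empty children list."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(x, k):
    """Return (node, index) locating k, or None if k is absent."""
    i = 0
    # Linear scan, as on the slide: O(t) work per node visited.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if not x.children:        # leaf reached without finding k
        return None
    # A disk-backed version would read x.children[i] from disk here.
    return search(x.children[i], k)


root = Node([10, 20], [Node([1, 5]), Node([12]), Node([25, 30, 40])])
node, index = search(root, 30)
print(node.keys, index)  # [25, 30, 40] 1
```

Replacing the linear scan with a binary search over `x.keys` would reduce the per-node CPU cost to O(log t) without changing the number of nodes visited.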
Insertion
Creating an Empty B‐tree
create (T)
    x = allocateNode()
    leaf[x] = TRUE
    n[x] = 0
    diskWrite(x)
    root[T] = x
CPU time is O(1).
Disk access is O(1).
Splitting a Node
splitChild (x, i, y)
    z = allocateNode(), leaf[z] ← leaf[y], n[z] ← t ‐ 1
    for (j = 1 to t – 1) do key_j[z] ← key_{j+t}[y]
    if (not leaf[y]) then for (j = 1 to t) do c_j[z] ← c_{j+t}[y]
    n[y] ← t ‐ 1
    for (j = n[x] + 1 downto i + 1) do c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for (j = n[x] downto i) do key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y], n[x] ← n[x] + 1
    diskWrite(y), diskWrite(z), diskWrite(x)
CPU time is O(t).
Disk access is O(1).
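With Python lists, the element-by-element copying loops of the split collapse into slices. The sketch below is an in-memory illustration only: `Node` and `split_child` are hypothetical names, indices are 0-based (so the median key_t[y] is `y.keys[t - 1]`), and the disk writes are left out.

```python
class Node:
    """Hypothetical in-memory node: a leaf has an empty children list."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    if len(y.keys) != 2 * t - 1:
        raise ValueError("child is not full")
    # z receives the largest t - 1 keys and, if y is internal, t children.
    z = Node(y.keys[t:], y.children[t:])
    median = y.keys[t - 1]
    # y keeps the smallest t - 1 keys and its first t children.
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # The median moves up into x, with z placed just right of y.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


parent = Node([], [Node([1, 2, 3])])
split_child(parent, 0, 2)
print(parent.keys, [c.keys for c in parent.children])  # [2] [[1], [3]]
```

The list insertions into `x` play the role of the two `downto` shifting loops in the pseudocode.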
Insertion
insert (T, k)
    r = root[T]
    if (n[r] = 2t – 1) then
        s = allocateNode(), root[T] = s,
        leaf[s] = false, n[s] = 0, c_1[s] = r
        splitChild(s, 1, r)
        r = s
    insertNonFull(r, k)
Insertion
insertNonFull (x, k)
    i = n[x]
    if (leaf[x]) then
        while (i ≥ 1 & k < key_i[x]) do key_{i+1}[x] ← key_i[x], i = i ‐ 1
        key_{i+1}[x] ← k
        n[x] ← n[x] + 1
        diskWrite(x)
    else
        while (i ≥ 1 & k < key_i[x]) do i = i – 1
        i = i + 1
        diskRead(c_i[x])
        if (n[c_i[x]] = 2t – 1) then
            splitChild (x, i, c_i[x])
            if (k > key_i[x]) then i = i + 1
        insertNonFull (c_i[x], k)
CPU time is O(th) = O(t log_t n).
Disk access is O(h) = O(log_t n).
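The insertion procedures can be combined into a small runnable class. This is a sketch under stated assumptions rather than the lecture's implementation: `BTree`, `Node`, and `inorder` are hypothetical names, everything lives in memory (no disk reads or writes), indices are 0-based, and `bisect_right` and `insort` from the standard library replace the explicit scanning and shifting loops.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self):
        self.keys = []
        self.children = []   # empty list means the node is a leaf


class BTree:
    def __init__(self, t):
        self.t = t
        self.root = Node()   # corresponds to create(T)

    def _split_child(self, x, i):
        t, y, z = self.t, x.children[i], Node()
        z.keys, z.children = y.keys[t:], y.children[t:]
        x.keys.insert(i, y.keys[t - 1])          # median moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k):
        if len(self.root.keys) == 2 * self.t - 1:
            s = Node()                           # tree grows at the top
            s.children.append(self.root)
            self.root = s
            self._split_child(s, 0)
        self._insert_nonfull(self.root, k)

    def _insert_nonfull(self, x, k):
        if not x.children:
            insort(x.keys, k)                    # sorted insert into a leaf
            return
        i = bisect_right(x.keys, k)              # child whose range holds k
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)
            if k > x.keys[i]:                    # k belongs right of new median
                i += 1
        self._insert_nonfull(x.children[i], k)


def inorder(x):
    """Return all keys below x in sorted order (for inspection)."""
    if not x.children:
        return list(x.keys)
    out = []
    for j, child in enumerate(x.children):
        out += inorder(child)
        if j < len(x.keys):
            out.append(x.keys[j])
    return out


tree = BTree(t=2)
for key in [5, 1, 9, 3, 7, 2, 8]:
    tree.insert(key)
print(inorder(tree.root))  # [1, 2, 3, 5, 7, 8, 9]
```

Because full nodes are split on the way down, the recursion never has to back up, which is what keeps the disk accesses at O(h) in the disk-backed version.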
Deletion
delete (x, k)
(1) if (the key k is in node x & x is a leaf)
        delete the key k from x.
(2a) else if (k is in x & x is internal &
        the child y that precedes k has at least t keys)
        recursively delete the predecessor k′ of k,
        replace k by k′ in x
(2b) Symmetrically, if (k is in x & x is internal &
        the child z that follows k has at least t keys)
        recursively delete the successor k′ of k,
        replace k by k′ in x
(2c) else if (k is in x & x is internal)
        (both y and z have only t ‐ 1 keys)
        merge k and all of z into y,
        recursively delete k from y
Deletion
(3) else if (k is not present in internal node x)
        determine the root r of the appropriate
        subtree that must contain k
    (3a) if (r has only t ‐ 1 keys but has an
            immediate sibling with at least t keys)
            give r an extra key by shifting a key from x down into r
            and a key from that sibling up into x
    (3b) else if (r and both of r's immediate siblings
            have t ‐ 1 keys)
            merge r with one sibling, moving a key from x down
            to become the median of the merged node
        finish by recurring on the appropriate child of x
CPU time is O(th) = O(t log_t n).
Disk access is O(h) = O(log_t n).
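The deletion cases can be sketched in Python as follows. This is one possible in-memory rendering under stated assumptions, not the lecture's code: `Node`, `merge`, `delete`, and `delete_key` are hypothetical names, keys are assumed distinct, indices are 0-based, disk operations are omitted, and the predecessor or successor is located with a separate walk before the recursive delete.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


def merge(x, i):
    """Fold x.keys[i] and x.children[i + 1] into x.children[i]."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children


def delete(x, k, t):
    """Delete k below x; x is the root or already has at least t keys."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    found = i < len(x.keys) and x.keys[i] == k
    if not x.children:                       # case 1: leaf
        if found:
            x.keys.pop(i)
        return
    if found:                                # case 2: k in internal node
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                 # 2a: use predecessor
            p = y
            while p.children:
                p = p.children[-1]
            x.keys[i] = p.keys[-1]
            delete(y, x.keys[i], t)
        elif len(z.keys) >= t:               # 2b: use successor
            s = z
            while s.children:
                s = s.children[0]
            x.keys[i] = s.keys[0]
            delete(z, x.keys[i], t)
        else:                                # 2c: merge, then delete from y
            merge(x, i)
            delete(y, k, t)
        return
    r = x.children[i]                        # case 3: k is below child r
    if len(r.keys) == t - 1:
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i + 1 < len(x.children) else None
        if left is not None and len(left.keys) >= t:      # 3a, from left
            r.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                r.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # 3a, from right
            r.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                r.children.append(right.children.pop(0))
        elif right is not None:                           # 3b, merge right
            merge(x, i)
        else:                                             # 3b, merge left
            merge(x, i - 1)
            r = left
    delete(r, k, t)


def delete_key(root, k, t):
    """Delete k and return the (possibly new) root."""
    delete(root, k, t)
    if not root.keys and root.children:      # root emptied by a merge
        return root.children[0]
    return root
```

A caller keeps the value returned by `delete_key` as the new root, since a merge at the top level can reduce the height of the tree by one.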