CSCI-GA.1170-001/003 Fundamental Algorithms
October 17, 2021
2-3-Trees
Lecturer: Yevgeniy Dodis
2-3-trees are one instance of a class of data structures called balanced trees. These data structures
provide an efficient worst-case implementation of the Dictionary abstract data type. Recall that a
dictionary supports the operations Search, Insert, and Delete on a set of items drawn from
an ordered collection U, called the universe. Balanced trees perform each operation in worst-case
time O(log n), where n is the number of items stored in the dictionary at the time of the call.
1 The Data Structure
In a 2-3 tree, all the leaves are at the same level and each internal node has 2 or 3 children. All
the records stored in the dictionary are held in the leaves. Recall that each record includes a key,
the keys being used to order the records. Traversing the leaves of the 2-3 tree in left-to-right order
yields the records in sorted order. Each internal node of the 2-3 tree stores a copy of the largest
key that appears in any one of the leaves below it. For ease of notation, we treat both the “guide
values” stored at internal nodes and the key values stored at leaf nodes as being stored in the
field .key of the node. We show an example of a 2-3 tree below:
96
├── 72: leaves 63, 72
├── 84: leaves 75, 82, 84
└── 96: leaves 91, 96
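As a rough illustration of the description above, the following Python sketch models a node with a single .key field and checks the stated invariants on the example tree. The names Node, leaves, check, and internal are hypothetical and chosen only for this sketch; the handout does not prescribe a representation.

```python
class Node:
    """A 2-3 tree node. A leaf has no children and holds a record's key."""

    def __init__(self, key=None, children=None):
        self.children = list(children or [])
        # An internal node's guide value is the largest key below it, which
        # is the .key of its rightmost child.
        self.key = self.children[-1].key if self.children else key

    def is_leaf(self):
        return not self.children


def leaves(node):
    """Return the leaf keys below `node` in left-to-right order."""
    if node.is_leaf():
        return [node.key]
    return [k for child in node.children for k in leaves(child)]


def check(node):
    """Return the height of `node`, asserting the 2-3 tree invariants."""
    if node.is_leaf():
        return 0
    assert len(node.children) in (2, 3), "internal node needs 2 or 3 children"
    heights = {check(child) for child in node.children}
    assert len(heights) == 1, "all leaves must be at the same level"
    keys = leaves(node)
    assert keys == sorted(keys), "leaves must be in sorted order"
    assert node.key == keys[-1], "guide must be the largest key below"
    return heights.pop() + 1


def internal(*kids):
    """Hypothetical helper: build an internal node, wrapping bare keys as leaves."""
    return Node(children=[k if isinstance(k, Node) else Node(k) for k in kids])


# The example tree from the text: root 96 with internal children 72, 84, 96.
root = internal(internal(63, 72), internal(75, 82, 84), internal(91, 96))
print(leaves(root))  # [63, 72, 75, 82, 84, 91, 96]
print(check(root))   # 2
```

Computing each guide value from the rightmost child keeps the "largest key below" rule in one place, at the cost of assuming children are always supplied in sorted order.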
A 2-3 tree storing n records (that is, with n leaves) has a height between ⌈log_3 n⌉ and ⌊log_2 n⌋. A good
exercise is to prove this statement formally. Thus, the length of any path from the root to a leaf is
O(log n).
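One possible line of argument for the height bounds, given only as a sketch: take n to be the number of leaves and h the height (the number of edges on any root-to-leaf path). Since every internal node has 2 or 3 children and all leaves lie at depth h, the number of leaves is at least 2^h and at most 3^h, and h is an integer:

```latex
% h = height of the tree, n = number of leaves (assumed n >= 1)
2^{h} \le n \le 3^{h}
\;\Longrightarrow\;
\log_3 n \le h \le \log_2 n
\;\Longrightarrow\;
\lceil \log_3 n \rceil \le h \le \lfloor \log_2 n \rfloor
```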
2 Operations
Search. A search in a 2-3 tree proceeds in a manner very similar to that of an ordinary binary
search tree. The details and the proof of the desired logarithmic running time are omitted from
this handout and left to the reader as a simple exercise. Indeed, it is very helpful to
write out pseudocode that implements this search process.
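One way the search could look, as a minimal Python sketch rather than the handout's own pseudocode: at each internal node, descend into the leftmost child whose guide value is at least the search key. The Node class and the internal helper are assumptions made for this example.

```python
class Node:
    """Hypothetical node type: leaves have no children; .key is key or guide."""

    def __init__(self, key=None, children=None):
        self.children = list(children or [])
        self.key = self.children[-1].key if self.children else key


def internal(*kids):
    """Hypothetical helper: build an internal node, wrapping bare keys as leaves."""
    return Node(children=[k if isinstance(k, Node) else Node(k) for k in kids])


def search(root, key):
    """Return the leaf holding `key`, or None if the key is absent."""
    node = root
    if node is None:
        return None
    while node.children:
        # Descend into the leftmost child whose guide is >= key.
        for child in node.children:
            if key <= child.key:
                node = child
                break
        else:
            return None  # key is larger than every key in this subtree
    return node if node.key == key else None


# The example tree from Section 1.
root = internal(internal(63, 72), internal(75, 82, 84), internal(91, 96))
print(search(root, 82).key)  # 82
print(search(root, 83))      # None
```

Each loop iteration does a constant amount of work and moves one level down, so the running time is proportional to the height of the tree.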
Insertion. An insertion begins by performing a search to determine where the item would be
located (if it were present in the tree). The item is inserted as a leaf at the resulting location. The
new leaf's parent p now has either three or four children. If it has three children, the insertion
is complete. Otherwise, p is replaced by two nodes p1 and p2, where the two leftmost children of
p are placed under p1 and the two rightmost children under p2. Of course, the left-to-right order
of the children is maintained. This operation is called a node partition and is repeated at each
successively higher level of the tree along the path from the inserted item to the root, as required
to remove nodes with four children. A special case arises if the root is replaced by two nodes; then
a new root node is created, and its children are the two nodes newly formed from the old root. The
details on how to update the copies of keys stored at internal nodes are straightforward and left
as an exercise for the reader. Since a constant amount of work is done at each level along a single
root-to-leaf path, insertion takes O(log n) time.
Let us look at an example to visualize the insertion process. To the above tree, we will insert
element 73. We begin by finding the location of 73 in this tree and insert a leaf node corresponding
to this new record.
96
├── 72: leaves 63, 72
├── 84: leaves 73, 75, 82, 84
└── 96: leaves 91, 96
However, this insertion creates a violation at the internal node labeled 84. To account for this, we
will have to partition the node to create two internal nodes, each with two of the four children.
This yields the following tree:
96
├── 72: leaves 63, 72
├── 75: leaves 73, 75
├── 84: leaves 82, 84
└── 96: leaves 91, 96
This node partition now leaves the node 96 with four children, so another node partition is required.
Since the node 96 is the root of the tree, the partition creates a new root and the height of the
resulting tree increases by 1.
96
├── 75
│   ├── 72: leaves 63, 72
│   └── 75: leaves 73, 75
└── 96
    ├── 84: leaves 82, 84
    └── 96: leaves 91, 96
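The insertion procedure and the example above can be sketched in Python as follows. This is one possible recursive formulation under stated assumptions, not the handout's own code: the names Node, internal, shape, insert, and _insert are hypothetical, a node that is partitioned keeps its two leftmost children while a new right sibling takes the two rightmost, and duplicate keys are ignored.

```python
class Node:
    """Hypothetical node type: leaves have no children; .key is key or guide."""

    def __init__(self, key=None, children=None):
        self.children = list(children or [])
        self.key = self.children[-1].key if self.children else key

    def is_leaf(self):
        return not self.children


def internal(*kids):
    """Hypothetical helper: build an internal node, wrapping bare keys as leaves."""
    return Node(children=[k if isinstance(k, Node) else Node(k) for k in kids])


def shape(node):
    """Nested (guide, children) view of the tree, handy for inspection."""
    if node.is_leaf():
        return node.key
    return (node.key, [shape(c) for c in node.children])


def _insert(node, key):
    """Insert below internal `node`; return a new right sibling if it was split."""
    # Choose the leftmost child whose guide is >= key, else the last child.
    i = next((j for j, c in enumerate(node.children) if key <= c.key),
             len(node.children) - 1)
    child = node.children[i]
    if child.is_leaf():
        if child.key == key:
            return None  # assumption: duplicates are ignored
        node.children.insert(i if key < child.key else i + 1, Node(key))
    else:
        extra = _insert(child, key)
        if extra is not None:
            node.children.insert(i + 1, extra)
    right = None
    if len(node.children) == 4:  # node partition
        right = Node(children=node.children[2:])
        node.children = node.children[:2]
    node.key = node.children[-1].key  # refresh the guide value
    return right


def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if root.is_leaf():
        if root.key == key:
            return root
        return Node(children=sorted([root, Node(key)], key=lambda n: n.key))
    extra = _insert(root, key)
    # If the root was partitioned, a new root takes the two halves as children.
    return Node(children=[root, extra]) if extra is not None else root


root = internal(internal(63, 72), internal(75, 82, 84), internal(91, 96))
root = insert(root, 73)
print(shape(root))
```

Running the sketch on the example tree reproduces the final tree shown above: a new root 96 whose children are the nodes 75 and 96.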
Deletion. A deletion proceeds as follows. Again, a search is performed to find the item to be
deleted, and the leaf containing the item is removed. At this point the parent p has either one or two
children. If it has two children, the deletion is complete. Otherwise, p has only one child, and p’s siblings
are checked: if p has a left sibling with three children, let s be that sibling; otherwise, if p has a
right sibling with three children, let s be that sibling. If such an s exists, node s gives p the child
closest to p’s child. If no such s exists, p and its left sibling are merged into a single node (if p
has no left sibling, it is merged with its right sibling instead). This
process is repeated at each successively higher level of the tree along the path from the deleted
item to the root, as required to remove nodes with one child. If the root ends up with one child,
the root is simply removed and its sole child becomes the new root.
Again, it is useful to visualize the deletion process. We will look at the effect of deleting 75
from the following tree:
96
├── 72
│   ├── 40: leaves 32, 40
│   ├── 61: leaves 45, 55, 61
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaves 75, 84
    └── 96: leaves 91, 96
We begin by searching for the node 75.
96
├── 72
│   ├── 40: leaves 32, 40
│   ├── 61: leaves 45, 55, 61
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaves 75, 84
    └── 96: leaves 91, 96
Once the node is found, it is deleted, which yields the following tree.
96
├── 72
│   ├── 40: leaves 32, 40
│   ├── 61: leaves 45, 55, 61
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaf 84
    └── 96: leaves 91, 96
However, that deletion leaves the internal node labeled 84 in violation of the invariants (it has only
one child). Therefore, it now looks to its siblings. It has exactly one sibling, which has only two
children rather than three, so the only option is to merge the two nodes. This yields the following tree:
96
├── 72
│   ├── 40: leaves 32, 40
│   ├── 61: leaves 45, 55, 61
│   └── 72: leaves 63, 72
└── 96
    └── 96: leaves 84, 91, 96
The above merger, however, created a violation at the parent level. That node now looks to its left
sibling which has three children and borrows the rightmost child of the left sibling to preserve the
invariants. This gives us the final tree:
96
├── 61
│   ├── 40: leaves 32, 40
│   └── 61: leaves 45, 55, 61
└── 96
    ├── 72: leaves 63, 72
    └── 96: leaves 84, 91, 96
It is also useful to look at the case where the delete operation causes the tree to reduce its height
by 1. For example, consider the following input tree:
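96
├── 72
│   ├── 40: leaves 32, 40
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaves 75, 84
    └── 96: leaves 91, 96
The deletion of node 75 produces the following sequence of operations:
(1) The leaf 75 is located.
96
├── 72
│   ├── 40: leaves 32, 40
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaves 75, 84
    └── 96: leaves 91, 96
(2) The leaf 75 is deleted, leaving the node 84 with one child.
96
├── 72
│   ├── 40: leaves 32, 40
│   └── 72: leaves 63, 72
└── 96
    ├── 84: leaf 84
    └── 96: leaves 91, 96
(3) The node 84 is merged with its sibling, leaving their parent with one child.
96
├── 72
│   ├── 40: leaves 32, 40
│   └── 72: leaves 63, 72
└── 96
    └── 96: leaves 84, 91, 96
(4) The parent is merged with its left sibling, leaving the root with one child.
96
└── 96
    ├── 40: leaves 32, 40
    ├── 72: leaves 63, 72
    └── 96: leaves 84, 91, 96
(5) The root is removed and its sole child becomes the new root.
96
├── 40: leaves 32, 40
├── 72: leaves 63, 72
└── 96: leaves 84, 91, 96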
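The deletion procedure and both examples above can be sketched in Python as follows. This is a hedged illustration rather than the handout's own code: the names Node, internal, shape, delete, _delete, and _fix are hypothetical, only the immediately adjacent siblings are considered for borrowing, and a node with no left sibling is merged with its right sibling (as happens to the node 84 in the examples).

```python
class Node:
    """Hypothetical node type: leaves have no children; .key is key or guide."""

    def __init__(self, key=None, children=None):
        self.children = list(children or [])
        self.key = self.children[-1].key if self.children else key

    def is_leaf(self):
        return not self.children


def internal(*kids):
    """Hypothetical helper: build an internal node, wrapping bare keys as leaves."""
    return Node(children=[k if isinstance(k, Node) else Node(k) for k in kids])


def shape(node):
    """Nested (guide, children) view of the tree, handy for inspection."""
    if node.is_leaf():
        return node.key
    return (node.key, [shape(c) for c in node.children])


def _fix(parent, i):
    """Repair parent.children[i], which has been left with exactly one child."""
    p = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.children) == 3:
        p.children.insert(0, left.children.pop())   # borrow left's rightmost child
    elif right is not None and len(right.children) == 3:
        p.children.append(right.children.pop(0))    # borrow right's leftmost child
    elif left is not None:
        left.children.append(p.children[0])         # merge p into its left sibling
        del parent.children[i]
    else:
        right.children.insert(0, p.children[0])     # assumption: no left sibling
        del parent.children[i]
    for n in (left, p, right):                      # refresh affected guide values
        if n is not None:
            n.key = n.children[-1].key


def _delete(node, key):
    """Delete `key` below internal `node`; the caller repairs `node` if needed."""
    i = next((j for j, c in enumerate(node.children) if key <= c.key), None)
    if i is None:
        return  # key is larger than every key below this node
    child = node.children[i]
    if child.is_leaf():
        if child.key == key:
            del node.children[i]
    else:
        _delete(child, key)
        if len(child.children) == 1:
            _fix(node, i)
    node.key = node.children[-1].key


def delete(root, key):
    """Delete `key` (if present) and return the (possibly new) root."""
    if root is None:
        return None
    if root.is_leaf():
        return None if root.key == key else root
    _delete(root, key)
    # A root left with a single child is removed; the child becomes the root.
    return root.children[0] if len(root.children) == 1 else root


first = internal(internal(internal(32, 40), internal(45, 55, 61), internal(63, 72)),
                 internal(internal(75, 84), internal(91, 96)))
second = internal(internal(internal(32, 40), internal(63, 72)),
                  internal(internal(75, 84), internal(91, 96)))
print(shape(delete(first, 75)))
print(shape(delete(second, 75)))
```

On the first tree the sketch performs a merge followed by a borrow from the left sibling; on the second it performs two merges and then removes the root, reducing the height by 1.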
3 An Alternative Definition
Instead of storing a single key at each internal node, one can store either one or two keys at
such nodes. A node with two children stores the maximum key appearing in a leaf of the subtree
rooted at its left child; a node with three children additionally stores the maximum key appearing
in a leaf of the subtree rooted at its middle child.
With these keys stored at internal nodes, the search procedure may be somewhat more efficient,
especially if nodes are stored as records on disk: all of the information necessary to choose which
child to examine next is stored in the node itself. However, keeping this information properly
maintained is slightly more tedious.
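To illustrate why the alternative layout lets a search pick the next child without inspecting the children themselves, here is a minimal Python sketch. The class names Leaf and Internal and the helper max_key are assumptions made for this example; the child index is found with the standard library function bisect_left.

```python
from bisect import bisect_left


class Leaf:
    """Hypothetical leaf type holding one record's key."""

    def __init__(self, key):
        self.key = key


def max_key(node):
    """Largest key in the subtree rooted at `node`."""
    return node.key if isinstance(node, Leaf) else max_key(node.children[-1])


class Internal:
    """Hypothetical internal node storing one key per child except the last."""

    def __init__(self, *children):
        self.children = list(children)
        # One key for a 2-child node (max of the left subtree); two keys for
        # a 3-child node (max of the left and of the middle subtree).
        self.keys = [max_key(c) for c in self.children[:-1]]


def search(node, key):
    """Return the leaf holding `key`, or None if the key is absent."""
    while isinstance(node, Internal):
        # The index of the first stored key >= `key` is the child to visit;
        # if every stored key is smaller, this selects the last child.
        node = node.children[bisect_left(node.keys, key)]
    return node if node.key == key else None


# The example tree from Section 1 in the alternative representation.
root = Internal(Internal(Leaf(63), Leaf(72)),
                Internal(Leaf(75), Leaf(82), Leaf(84)),
                Internal(Leaf(91), Leaf(96)))
print(root.keys)             # [72, 84]
print(search(root, 82).key)  # 82
print(search(root, 100))     # None
```

The decision at each level uses only the keys stored in the current node, which is the property that matters when each node is a separate record on disk.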