CSCI-GA.1170-001/003 Fundamental Algorithms
October 17, 2021
2-3-Trees
Lecturer: Yevgeniy Dodis

2-3-trees are one instance of a class of data structures called balanced trees. These data structures provide an efficient worst-case implementation of the Dictionary abstract data type. Recall that a dictionary supports the operations Search, Insert, and Delete on a set of items drawn from an ordered collection U, called the universe. Balanced trees perform each operation in worst-case time O(log n), where n is the number of items stored in the dictionary at the time of the call.

1 The Data Structure

In a 2-3 tree, all the leaves are at the same level and each internal node has 2 or 3 children. All the records stored in the dictionary are held in the leaves. Recall that each record includes a key, the keys being used to order the records. Traversing the leaves of the 2-3 tree in left-to-right order yields the records in sorted order. Each internal node of the 2-3 tree stores a copy of the largest key that appears in any of the leaves below it. For ease of notation, we assume that both the “guide values” stored at internal nodes and the key values stored at leaf nodes are kept in the field .key of the node. We show an example of a 2-3 tree below:

    root:      [96]
    internal:  [72]      [84]         [96]
    leaves:    63 72     75 82 84     91 96

A 2-3 tree storing n records (that is, with n leaves) has a height between ⌈log₃ n⌉ and ⌊log₂ n⌋. A good exercise would be to try to formally prove this statement. Thus, the length of any path from the root to a leaf is O(log n).

2 Operations

Search. A search in a 2-3 tree proceeds in a manner very similar to that of an ordinary binary search tree. The details and the proof of the desired logarithmic running time are omitted from this handout and left to the reader as a simple exercise. Indeed, it would be immensely helpful to write pseudocode that implements this search process.

Insertion.
An insertion begins by performing a search to determine where the item would be located (if it were present in the tree). The item is inserted as a leaf at the resulting location. The new leaf's parent p now has either three or four children. If it has three children, the insertion is complete. Otherwise, p is replaced by two nodes p1 and p2, where the two leftmost children of p are placed under p1 and the two rightmost children under p2. Of course, the left-to-right order of the children is maintained. This operation is called a node partition and is repeated at each successively higher level of the tree along the path from the inserted item to the root, as required to remove nodes with four children. A special case arises if the root is replaced by two nodes: a new root node is then created, and its children are the two nodes newly formed from the old root. The details of how to update the copies of keys stored at internal nodes are straightforward and left as an exercise for the reader. It is easily seen that insertion takes O(log n) time.

Let us look at an example to visualize the insertion process. To the above tree, we will insert element 73. We begin by finding the location of 73 in this tree and insert a leaf node corresponding to this new record.

    root:      [96]
    internal:  [72]      [84]            [96]
    leaves:    63 72     73 75 82 84     91 96

However, this insertion creates a violation at the internal node labeled 84, which now has four children. To fix this, we partition the node into two internal nodes, each with two of the four children. This yields the following tree:

    root:      [96]
    internal:  [72]      [75]      [84]      [96]
    leaves:    63 72     73 75     82 84     91 96

This node partition now leaves the node 96 with four children, which requires another node partition. However, since the node 96 is the root of the tree, the resulting tree has its height increased by 1.

    root:      [96]
    level 1:   [75]                [96]
    level 2:   [72]      [75]      [84]      [96]
    leaves:    63 72     73 75     82 84     91 96

Deletion. A deletion proceeds as follows. Again, a search is performed to find the item to be deleted. The leaf containing the item is deleted.
At this point the parent p has either one or two children. If it has two children, deletion is complete. Otherwise, if p has only one child, p's siblings are checked: if p has a left sibling with three children, let s be that sibling; otherwise, if p has a right sibling with three children, let s be that sibling. If such an s exists, node s gives p the child closest to p's child. If no such s exists, p is merged with its left sibling (or with its right sibling, if p has no left sibling) into a single node. This process is repeated at each successively higher level of the tree along the path from the deleted item to the root, as required to remove nodes with one child. If the root ends up with one child, the root is simply removed and its sole child becomes the new root.

Again, it is useful to visualize the deletion process. We will look at the effect of deleting 75 from the following tree:

    root:      [96]
    level 1:   [72]                              [96]
    level 2:   [40]      [61]         [72]       [84]      [96]
    leaves:    32 40     45 55 61     63 72      75 84     91 96

We begin by searching for the node 75. The search follows the guide values 96, 96, 84 from the root down to the leaf 75. Upon finding the node, it is deleted, which yields the following tree.

    root:      [96]
    level 1:   [72]                              [96]
    level 2:   [40]      [61]         [72]       [84]      [96]
    leaves:    32 40     45 55 61     63 72      84        91 96

However, that deletion leaves the internal node labeled 84 in violation: it has only one child. Therefore, it now looks to its siblings. It has exactly one sibling, which has only two children, as opposed to three, so the two nodes must be merged. This yields the following tree:

    root:      [96]
    level 1:   [72]                              [96]
    level 2:   [40]      [61]         [72]       [96]
    leaves:    32 40     45 55 61     63 72      84 91 96

The above merger, however, created a violation at the parent level: the level-1 node labeled 96 now has only one child. That node looks to its left sibling, which has three children, and borrows the rightmost child of the left sibling to preserve the invariants. This gives us the final tree:

    root:      [96]
    level 1:   [61]                   [96]
    level 2:   [40]      [61]         [72]       [96]
    leaves:    32 40     45 55 61     63 72      84 91 96

It is also useful to look at the case where the delete operation causes the tree to reduce its height by 1.
For example, consider the following input tree:

    root:      [96]
    level 1:   [72]                [96]
    level 2:   [40]      [72]      [84]      [96]
    leaves:    32 40     63 72     75 84     91 96

The deletion of node 75 produces the following sequence of operations. The search locates the leaf 75 under the internal node 84, and the leaf is removed, leaving node 84 with one child:

    root:      [96]
    level 1:   [72]                [96]
    level 2:   [40]      [72]      [84]      [96]
    leaves:    32 40     63 72     84        91 96

The only sibling of node 84 has two children, so the two nodes are merged:

    root:      [96]
    level 1:   [72]                [96]
    level 2:   [40]      [72]      [96]
    leaves:    32 40     63 72     84 91 96

The level-1 node 96 now has one child, and its left sibling has only two children, so these two nodes are merged as well:

    root:      [96]
    level 1:   [96]
    level 2:   [40]      [72]      [96]
    leaves:    32 40     63 72     84 91 96

Finally, the root has a single child, so the root is removed and its child becomes the new root:

    root:      [96]
    level 1:   [40]      [72]      [96]
    leaves:    32 40     63 72     84 91 96

3 An Alternative Definition

Instead of storing a single key at each internal node, one can store either one or two keys at such nodes: a node with two children stores the maximum key appearing in a leaf of the subtree rooted at its left child, and a node with three children additionally stores the maximum key appearing in a leaf of the subtree rooted at its middle child. By storing these keys at internal nodes, the search procedure may be somewhat more efficient, especially if nodes are stored as records on disk: all of the information necessary to choose which child to examine next is stored in the node itself. However, keeping this information properly maintained is slightly more tedious.
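To make the search under this alternative definition concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names Leaf and Internal, the field names keys and children, and the function name search are invented for this example and do not come from the handout. The sample tree is the seven-record example from Section 1, relabeled so that each internal node stores one key (two children) or two keys (three children).

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    # A leaf holds one record; only its key matters for this sketch.
    key: int


@dataclass
class Internal:
    # keys[i] is the maximum key in the subtree rooted at children[i],
    # for every child except the last one (hypothetical layout).
    keys: List[int]
    children: List[Union["Internal", "Leaf"]]


def search(node: Union[Internal, Leaf], key: int) -> Optional[Leaf]:
    """Return the leaf holding `key`, or None if the key is absent.

    Assumes a non-empty tree. At each internal node the choice of child
    uses only the keys stored in that node itself.
    """
    while isinstance(node, Internal):
        i = 0
        # Skip every child whose subtree maximum is smaller than the key.
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]
    return node if node.key == key else None


# The example tree with leaves 63 72 | 75 82 84 | 91 96.
root = Internal(
    keys=[72, 84],
    children=[
        Internal(keys=[63], children=[Leaf(63), Leaf(72)]),
        Internal(keys=[75, 82], children=[Leaf(75), Leaf(82), Leaf(84)]),
        Internal(keys=[91], children=[Leaf(91), Leaf(96)]),
    ],
)

found = search(root, 82)
missing = search(root, 73)
```

In this sketch, search(root, 82) descends through the middle child of the root and returns the leaf with key 82, while search(root, 73) ends at the leaf 75 and reports that the key is absent. Each step inspects at most two keys, so the running time is proportional to the height of the tree.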