Red-Black Trees
CS 583 Analysis of Algorithms
7/1/2016 CS583 Fall'06: Red-Black Trees

Outline
• Red-Black Trees
  – Definitions
  – Rotations
• Augmenting Data Structures
  – Definitions
  – Dynamic order statistics
  – Determining the rank of an element
  – Maintaining subtree sizes

Definitions
• A red-black tree is a binary search tree with one extra item per node: its color, which can be either RED or BLACK.
  – By constraining the colors of nodes, red-black trees ensure the following balancing rule:
    • No path from the root to a leaf is more than twice as long as any other such path.
  – Each node contains the following fields:
    • color, key, left, right, and parent.
  – If a child of a node does not exist, the corresponding pointer is NIL.
    • NILs are treated as leaf nodes, called external nodes; all other, key-bearing nodes are internal nodes.

Red-Black Trees Properties
• Red-black trees must satisfy the following properties:
  1) Every node is either red or black.
  2) The root is black.
  3) Every leaf (NIL) is black.
  4) If a node is red, then both its children are black.
  5) For each node, all paths from the node to descendant leaves contain the same number of black nodes.
• We use a single sentinel nil[T] to represent NIL.
  – Its color field is BLACK, and all other fields are set to arbitrary values.
  – All pointers to NIL are replaced by pointers to the sentinel nil[T].

Height of the Tree
The black-height of a node x, denoted bh(x), is the number of black nodes on any path from, but not including, x down to a leaf. By property 5, this notion is well defined.

Lemma 13.1 A red-black tree with n internal nodes has height at most $2 \lg(n+1)$.

Proof. First, show that the subtree rooted at any node x contains at least $2^{bh(x)} - 1$ internal nodes. We prove this by induction on the height of x. If the height of x is 0, then x is nil[T], so bh(x) = 0 and the subtree contains $0 = 2^{0} - 1$ internal nodes.

Height of the Tree (cont.)
Now consider an internal node x with two children. Each child has a black-height of bh(x) (if it is RED) or bh(x) - 1 (if it is BLACK). Since each child has smaller height than x, the inductive hypothesis applies: each child's subtree contains at least $2^{bh(x)-1} - 1$ internal nodes (the bound for a BLACK child, which is the smaller of the two). Thus, the subtree rooted at x contains at least $2(2^{bh(x)-1} - 1) + 1 = 2^{bh(x)} - 1$ internal nodes, which proves the claim.

Note that if x has only one internal child y, then y cannot be BLACK (that would violate property 5). Hence y is RED and bh(y) = bh(x), so the number N(x) of internal nodes in the subtree rooted at x satisfies $N(x) \ge (2^{bh(x)} - 1) + 1$.

Height of the Tree (cont.)
To complete the proof, let h be the height of the tree. According to property 4, at least half the nodes on any simple path from the root to a leaf, not including the root, must be black. (Each red node on the path is followed by a black child.) Consequently, the black-height of the root is at least h/2, and applying the claim to the root gives:

$n \ge 2^{h/2} - 1 \iff 2^{h/2} \le n + 1 \iff (h/2) \lg 2 \le \lg(n+1) \iff h \le 2 \lg(n+1)$

Rotations
• The insert and delete operations, when run on a red-black tree, take O(lg n) time.
  – However, they modify the tree, which may violate the red-black tree properties.
  – To restore those properties, we must change the colors of some nodes and the pointer structure.
• The pointer structure is changed through rotation, a local operation in a search tree that preserves the binary-search-tree property.
  – There are two kinds of rotations: left and right.

Left Rotation
The left rotation on node x assumes its right child y is not nil[T]. It "pivots" around the link from x to y:
- It makes y the new root of the subtree.
- x becomes y's left child.
- y's former left child becomes x's right child.

    x                    y
   / \                  / \
  a   y      --->      x   c
     / \              / \
    b   c            a   b

A rotation preserves the BST property: key[a] <= key[x] <= key[b] <= key[y] <= key[c], where a, b, and c stand for arbitrary nodes in the corresponding subtrees.

Left Rotation: Pseudocode
Left-Rotate(T,x)
  y = right[x]
  right[x] = left[y]
  if left[y] <> nil[T]
    parent[left[y]] = x
  parent[y] = parent[x]
  if parent[x] = nil[T]
    root[T] = y
  else
    if x = left[parent[x]]     // x is left child
      left[parent[x]] = y
    else                       // x is right child
      right[parent[x]] = y
  left[y] = x
  parent[x] = y

The rotation runs in $\Theta(1)$ time; only pointers are changed, and all other fields remain the same.

Augmenting Data Structures
• Many software engineering problems can be solved by using "textbook" data structures such as doubly linked lists, hash tables, or binary search trees.
  – For example, the C++ STL is sufficient for many financial algorithms.
• In some situations, however, a straightforward data structure is not sufficient.
  – It is very rare that an entirely new data structure has to be invented.
  – More often, it suffices to augment a standard data structure by storing additional information in it.
    • This is not always straightforward, because the new information must be updated and maintained by the ordinary operations on the structure.

Dynamic Order Statistics
• Recall that the ith order statistic of a set of n elements is the element with the ith smallest key.
  – We saw that any order statistic can be retrieved in O(n) time from an unordered set.
  – Now we will augment a red-black tree to determine any order statistic in O(lg n) time.
• The rank of an element is its position in the linear order of the set.
  – It can also be determined in O(lg n) time in an augmented red-black tree.

Order-Statistics Tree
• An order-statistics tree T is a red-black tree with additional information stored in each node.
  – In addition to key[x], color[x], p[x], left[x], and right[x], each node x has another field, size[x] (p[x] is the parent pointer, written parent[x] in Left-Rotate).
    • This field contains the number of internal nodes in the subtree rooted at x (including x itself).
  – If we define size[nil[T]] = 0 (for the sentinel), then:
    • size[x] = size[left[x]] + size[right[x]] + 1
• We do not require keys to be distinct.
  – This creates ambiguity when determining the rank of an element.
  – The convention is to define the rank of an element as its position in an inorder tree walk.

Retrieving the ith Smallest Element
The procedure below returns a pointer to the node containing the ith smallest key in the subtree rooted at x.

OS-Select(x,i)
  r = size[left[x]] + 1
  if i = r
    return x
  else
    if i < r
      return OS-Select(left[x], i)
    else
      return OS-Select(right[x], i-r)

Each recursive call goes down one level in the tree, so the total time is proportional to the height of the tree, which is O(lg n) for a red-black tree. Thus, the running time of OS-Select is O(lg n).

Determining the Rank
The procedure below returns the position of x in the linear order determined by an inorder tree walk of T.

OS-Rank(T, x)
1 r = size[left[x]] + 1
2 y = x
3 while y <> root[T]
4   if y = right[p[y]]
5     r = r + size[left[p[y]]] + 1
6   y = p[y]
7 return r

The rank of x can be viewed as the number of nodes preceding x in an inorder tree walk, plus 1 for x itself.

Invariant: At the start of each iteration of the while loop (lines 3-6), r is the rank of key[x] in the subtree rooted at y.

Determining the Rank: Correctness
• Initialization:
  – Prior to the first iteration, y = x and r is the rank of x in the subtree rooted at x.
• Maintenance:
  – At the end of each iteration, y is set to p[y]. Hence we must show that r becomes the rank of key[x] in the subtree rooted at p[y].
    • If y is a left child, then neither p[y] nor any node in its right subtree precedes x, so no additional nodes need to be counted.
    • Otherwise, y is a right child, and we need to add all nodes in p[y]'s left subtree plus p[y] itself (line 5).
• Termination:
  – The loop terminates when y = root[T], hence r is the rank of key[x] in the entire tree.

Determining the Rank: Performance
• Each iteration of the while loop takes $\Theta(1)$ time.
• Node y goes up one level in the tree with each iteration.
• Hence, the running time of OS-Rank is at worst proportional to the height of the tree: O(lg n) on an n-node order-statistics tree.

Maintaining Subtree Sizes
• The size field in each node makes it possible to compute order-statistics information quickly.
• This field must be maintained by both the insertion and deletion operations on red-black trees without affecting the asymptotic running time of these operations.
• The insertion operation consists of two phases:
  – The first phase walks down the tree from the root and inserts the new node as a child of an existing node.
    • Simply increment size[x] for each node x on the path traversed.
  – The second phase goes back up the tree, changing colors and performing rotations.
    • A rotation changes the size field of only the two nodes it pivots on.
    • Since at most two rotations are needed, only constant time is added, which does not affect the asymptotic running time.

Maintaining Subtree Sizes: Left Rotation
Left-Rotate(T,x)
  y = right[x]
  right[x] = left[y]
  if left[y] <> nil[T]
    parent[left[y]] = x
  parent[y] = parent[x]
  if parent[x] = nil[T]
    root[T] = y
  else
    if x = left[parent[x]]     // x is left child
      left[parent[x]] = y
    else                       // x is right child
      right[parent[x]] = y
  left[y] = x
  parent[x] = y
  size[y] = size[x]            // y now roots the subtree x used to root
  size[x] = size[left[x]] + size[right[x]] + 1
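
The size-maintaining Left-Rotate, OS-Select, and OS-Rank procedures can be sketched in Python roughly as follows. This is a minimal sketch, not a full red-black tree: it assumes a plain BST insert with no coloring or fix-up, which is enough to exercise the size field. The names Node, Tree, insert, left_rotate, os_select, and os_rank are illustrative and do not come from the slides.

```python
class Node:
    """Tree node carrying the extra size field (colors omitted in this sketch)."""
    def __init__(self, key=None):
        self.key = key
        self.left = self.right = self.parent = None
        self.size = 0              # 0 for the sentinel; real nodes get 1 on insert


class Tree:
    def __init__(self):
        self.nil = Node()          # single sentinel nil[T], with size 0
        self.root = self.nil


def insert(T, key):
    """Assumption: plain BST insert without red-black fix-up.
    Increments size[x] for every node x on the path walked down."""
    z = Node(key)
    z.left = z.right = T.nil
    z.size = 1
    y, x = T.nil, T.root
    while x is not T.nil:
        x.size += 1                # z will end up inside x's subtree
        y = x
        x = x.left if key < x.key else x.right
    z.parent = y
    if y is T.nil:
        T.root = z
    elif key < y.key:
        y.left = z
    else:
        y.right = z
    return z


def left_rotate(T, x):
    """Left-Rotate plus the two size updates; assumes x.right is not the sentinel."""
    y = x.right
    x.right = y.left
    if y.left is not T.nil:
        y.left.parent = x
    y.parent = x.parent
    if x.parent is T.nil:
        T.root = y
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x
    x.parent = y
    y.size = x.size                            # y takes over x's old subtree
    x.size = x.left.size + x.right.size + 1    # recompute x from its new children


def os_select(x, i):
    """Node holding the i-th smallest key (1-based) in the subtree rooted at x."""
    r = x.left.size + 1
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


def os_rank(T, x):
    """Position of x in an inorder walk of T (1-based)."""
    r = x.left.size + 1
    y = x
    while y is not T.root:
        if y is y.parent.right:
            r += y.parent.left.size + 1
        y = y.parent
    return r


# Usage: build a right-leaning chain, then rotate at the root.
T = Tree()
nodes = {k: insert(T, k) for k in [10, 20, 30, 40, 50]}
left_rotate(T, T.root)                 # 20 becomes the root; sizes stay consistent
print(os_select(T.root, 3).key)        # 30
print(os_rank(T, nodes[40]))           # 4
```

Because the sentinel has size 0, neither os_select nor os_rank needs a special case for missing children, which is the same reason the slides define size[nil[T]] = 0.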